pandas offers two ways to select a single column of data: brackets or dot notation. In this article, I recommend using the brackets instead of dot notation for the following ten reasons.
Let’s begin by creating a small DataFrame with a few columns.
import pandas as pd
df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})
df
Let’s select the name column with dot notation, which many pandas users like.
>>> df.name
0 Niko
1 Penelope
2 Aria
We can also select it with the brackets:
df['name']
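As a quick sanity check, the sketch below rebuilds the example DataFrame and confirms that, for an identifier-like label such as name, both forms return the same Series. The variable names by_dot and by_bracket are illustrative only.

```python
import pandas as pd

# Rebuild the example DataFrame from the article
df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

by_dot = df.name          # dot notation
by_bracket = df['name']   # brackets

# Series.equals compares index and values element by element
print(by_dot.equals(by_bracket))  # True
```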
You might think it doesn’t matter, but the following reasons might persuade you otherwise. Here are my 10 reasons for using the brackets instead of dot notation.
Selecting the average score column, whose name contains a space, is only possible with the brackets. Attempting it with dot notation raises a syntax error:
>>> df.average score
SyntaxError: invalid syntax
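A minimal sketch of the bracket version, assuming the same example DataFrame: the brackets accept any string label, spaces included.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

# The label is just a string inside the brackets, so the space is harmless
scores = df['average score']
print(scores.tolist())  # [10, 5, 3]
```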
If a column has the same name as a DataFrame method, dot notation returns the method, not the column. Here, max is both a column name and a DataFrame method:
>>> df.max
<bound method DataFrame.max of ...>
To select the column, you must use the brackets:
df['max']
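The sketch below, again rebuilding the example DataFrame, makes the difference explicit: df.max is a callable method, while df['max'] is the column of data.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

# Dot notation finds the DataFrame method before any column
print(callable(df.max))               # True
print(isinstance(df.max, pd.Series))  # False

# The brackets always look up the column label
print(df['max'].tolist())             # [99, 100, 3]
```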
If you assign a column name to a variable, you can only select that column with the brackets.
>>> col = 'name'
>>> df.col
AttributeError: 'DataFrame' object has no attribute 'col'
>>> df[col] # works
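Holding column names in variables comes up whenever you loop over columns. A small sketch, assuming the example DataFrame; cols_to_check and maxima are illustrative names, and int() converts the NumPy scalars so the result prints as plain Python integers.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

cols_to_check = ['average score', 'max']

# df[col] uses the value stored in col; df.col would look for a
# column literally named "col"
maxima = {col: int(df[col].max()) for col in cols_to_check}
print(maxima)  # {'average score': 10, 'max': 100}
```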
I don’t recommend using non-string column names, but if you have them, the brackets are the only way to select them. First, we need to add a non-string column to our DataFrame. The following creates a new column with the integer 0 as its name.
df[0] = 5
df
Attempting to select this column with df.0 is a syntax error. You must use df[0]. Again, it’s bad practice to use non-strings as column names, but if your DataFrame does have them, you’ll need the brackets to select them.
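A minimal sketch of selecting the integer-labeled column, assuming the example DataFrame rebuilt from scratch:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

df[0] = 5  # column labeled with the integer 0; the scalar fills every row

print(df.columns.tolist())  # ['name', 'average score', 'max', 0]
print(df[0].tolist())       # [5, 5, 5]
```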
You cannot create new columns with dot notation. For instance, df.new_col = 99 does not create a column; it only sets a new attribute named new_col on your DataFrame with the value 99. You must use the brackets: df['new_col'] = 99.
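The sketch below shows this on the example DataFrame. After the dot assignment, new_col is stored as an ordinary Python attribute and does not appear in df.columns; only the bracket assignment adds a column, broadcasting the scalar 99 to every row. The name column_after_dot is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

# Dot assignment stores a plain Python attribute on the object
df.new_col = 99
column_after_dot = 'new_col' in df.columns
print(column_after_dot)         # False

# Bracket assignment creates a real column and broadcasts the scalar
df['new_col'] = 99
print('new_col' in df.columns)  # True
print(df['new_col'].tolist())   # [99, 99, 99]
```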
Selecting multiple columns is only possible with the brackets.
df[['name', 'max']]
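Passing a list inside the brackets returns a DataFrame rather than a Series. A short check, assuming the example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

# A list of labels selects several columns, in the order given
subset = df[['name', 'max']]
print(type(subset).__name__)    # DataFrame
print(subset.columns.tolist())  # ['name', 'max']
```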
The brackets can accomplish every task that dot notation can. Dot notation offers nothing unique.
The brackets work in every column-selection situation. Always using the same syntax for the same task helps readability.
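To make the previous two points concrete, here is a rough, hypothetical helper, dot_accessible, that approximates when df.<col> would return a column: the label must be a string, a valid Python identifier, not a keyword, and not already an attribute of the DataFrame class. It is only an approximation of pandas' attribute lookup, not its exact rule, but it shows that the brackets reach every column while dot notation reaches only some.

```python
import keyword

import pandas as pd


def dot_accessible(col):
    """Hypothetical, approximate check of whether df.<col> returns the column."""
    return (
        isinstance(col, str)
        and col.isidentifier()            # rules out 'average score'
        and not keyword.iskeyword(col)    # rules out names like 'class'
        and not hasattr(pd.DataFrame, col)  # methods such as max win
    )


df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})
df[0] = 5

# The brackets work for every label; dot notation only for some
result = {col: dot_accessible(col) for col in df.columns}
selected = [df[col] for col in df.columns]
print(result)
```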
Many people know that tab completion works with dot notation in most editors. For example, typing df.n and then pressing tab can fill in the rest of a method or column name. Tab completion also works inside the brackets: typing df['n and pressing tab fills in the column name as well.
Tab completion also works when you want to call a method on the column. Typing df['name']. and pressing tab reveals all the available Series methods.
Throughout Python, the brackets are the canonical way to select a subset of data from an object. Strings, tuples, lists, dictionaries, and NumPy arrays all use brackets for selection.
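For comparison, here are a few built-in objects and a NumPy array, all indexed with brackets. NumPy is the only import assumed beyond the standard library, and the variable names are illustrative.

```python
import numpy as np

word = 'pandas'
print(word[1:3])     # an

point = (3, 4, 5)
print(point[0])      # 3

items = [10, 20, 30, 40]
print(items[-2:])    # [30, 40]

ages = {'Niko': 10, 'Aria': 3}
print(ages['Aria'])  # 3

arr = np.array([[1, 2], [3, 4]])
print(arr[1, 0])     # 3 (row 1, column 0)
```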