Use the brackets to select a single pandas DataFrame column and not dot notation

pandas Sep 13, 2019
 

pandas offers two ways to select a single column of data: the brackets or dot notation. In this article, I suggest using the brackets and not dot notation for the following ten reasons.

  1. Select column names with spaces
  2. Select column names that have the same name as methods
  3. Select columns with variables
  4. Select non-string columns
  5. Set new columns
  6. Select multiple columns
  7. Dot notation is a strict subset of the brackets
  8. Use one way which works for all situations
  9. Auto-completion works in the brackets and following it
  10. Brackets are the canonical way to select subsets for all objects

Selecting a single column

Let’s begin by creating a small DataFrame with a few columns.

import pandas as pd
df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})
df
 

Let’s select the name column with dot notation. Many pandas users like dot notation.

>>> df.name
0        Niko
1    Penelope
2        Aria
Name: name, dtype: object

We can also select it with the brackets.

df['name']

You might think it doesn’t matter, but the following reasons might persuade you otherwise. Here are my 10 reasons for using the brackets instead of dot notation.

1. Select column names with spaces

Selecting the column average score is only possible with the brackets. Dot notation raises a syntax error, because a Python attribute name cannot contain a space.

>>> df.average score
SyntaxError: invalid syntax
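As a minimal sketch, rebuilding the same small DataFrame from above, the brackets take the column name as an ordinary string, so the space causes no trouble. The variable name avg is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

# The column name is just a string, so a space inside it is fine
avg = df['average score']
print(avg.tolist())  # [10, 5, 3]
```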

2. Select column names that have the same name as methods

If a column has the same name as a DataFrame method, then dot notation returns the method and not the column. max is both a column name and a method.

>>> df.max
<bound method DataFrame.max of ...>

You must use the brackets.

df['max']
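Here is a short sketch of the difference, again rebuilding the example DataFrame. It checks what each form returns rather than relying on the printed output.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

# Dot notation finds the DataFrame.max method, not the column
print(callable(df.max))                   # True

# The brackets look the name up among the columns
print(isinstance(df['max'], pd.Series))   # True
print(df['max'].tolist())                 # [99, 100, 3]
```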

3. Select columns with variables

If you have assigned a column name to a variable, you can only use the brackets.

>>> col = 'name'
>>> df.col
AttributeError: 'DataFrame' object has no attribute 'col'
>>> df[col] # works
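This matters most when the column name is not known until runtime, such as inside a function or a loop. The sketch below uses a hypothetical helper, column_mean, that I made up for illustration; it is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

def column_mean(frame, col):
    """Return the mean of the column whose name is stored in `col`."""
    # frame.col would look for a column literally named 'col'
    return frame[col].mean()

# The same code works for any column name held in the loop variable
for col in ['average score', 'max']:
    print(col, column_mean(df, col))
```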

4. Select non-string columns

I don’t suggest using non-string column names, but using the brackets is the only way to select such columns. First, we need to add a non-string column to our DataFrame. The following creates a new column with the integer 0 as its name.

df[0] = 5
df

Attempting to select this column with df.0 is a syntax error. You must use df[0]. Again, it’s bad practice to use non-strings as column names, but if you do have them in your DataFrame, you’ll need to use brackets to select them.
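A small self-contained sketch of the same idea, using a one-column DataFrame to keep it short:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria']})

# Create a column named with the integer 0; the scalar 5 fills every row
df[0] = 5

# The brackets select it by the same integer label
print(df[0].tolist())        # [5, 5, 5]
print(df.columns.tolist())   # ['name', 0]
```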

5. Set new columns

You cannot create new columns with dot notation. For instance, df.new_col = 99 does not create a column. It just sets a new attribute named new_col on your DataFrame with the value 99. You must use the brackets: df['new_col'] = 99.
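You can check this for yourself by looking at the columns after each assignment. In this sketch, the names created_by_dot and created_by_brackets are just illustrative variables.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria']})

# Dot notation only sets a Python attribute on the object
df.new_col = 99
created_by_dot = 'new_col' in df.columns

# The brackets actually add a column
df['new_col'] = 99
created_by_brackets = 'new_col' in df.columns

print(created_by_dot, created_by_brackets)  # False True
```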

6. Select multiple columns

Selecting multiple columns is only possible with the brackets, by passing them a list of column names.

df[['name', 'max']]
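One detail worth seeing in code: a list inside the brackets returns a DataFrame, even when the list has a single item, while a plain string returns a Series. A brief sketch with the example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

# A list of names returns a DataFrame with just those columns
subset = df[['name', 'max']]
print(type(subset).__name__)        # DataFrame
print(subset.columns.tolist())      # ['name', 'max']

# A one-item list is still a DataFrame; a bare string gives a Series
print(type(df[['name']]).__name__)  # DataFrame
print(type(df['name']).__name__)    # Series
```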

7. Dot notation is a strict subset of the brackets

The brackets are able to accomplish all the tasks of dot notation. There is nothing that dot notation gives you that is unique to it.
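For a column where dot notation does work, the two forms return the same data. A quick check with the example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})

# Series.equals compares the values of the two results
same = df.name.equals(df['name'])
print(same)  # True
```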

8. Use one way which works for all situations

The brackets work in 100% of situations involving column selection. Always using the same code to do the same thing helps with readability.

9. Auto-completion works in the brackets and following it

Many people are aware that tab completion in most editors works with dot notation. For example, writing df.n and then pressing tab can fill in the rest of the method or column name. This is also possible within the brackets. Writing df['n and pressing tab will fill in the column name as well.

Tab completion also works if you want to call a method on the column. Writing df['name']. and pressing tab will reveal all the possible Series methods.

10. Brackets are the canonical way to select subsets for all objects

Python objects that hold data use the brackets as the canonical way to select a subset of that data. Whether it’s strings, tuples, lists, dictionaries, or numpy arrays, brackets are used to select subsets from them.
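To make the comparison concrete, here is a small sketch that uses the brackets on each of those types. The variable names and values are made up for illustration.

```python
import numpy as np

text = 'pandas'
tup = (10, 20, 30)
lst = ['a', 'b', 'c']
dct = {'name': 'Niko', 'max': 99}
arr = np.array([1, 2, 3, 4])

# The same bracket syntax selects from every one of these objects
print(text[0])     # p
print(tup[1])      # 20
print(lst[:2])     # ['a', 'b']
print(dct['max'])  # 99
print(arr[1:3])    # [2 3]
```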

