Minimally Sufficient Pandas Cheat Sheet

pandas Jan 03, 2020

This article summarizes the very detailed guide presented in Minimally Sufficient Pandas.

Begin Mastering Data Science Now for Free!

Take my free Intro to Pandas course to begin your journey mastering data analysis with Python.

What is Minimally Sufficient Pandas?

  • It is a small subset of the library that is sufficient to accomplish nearly everything that it has to offer.
  • It allows you to focus on doing data analysis and not the syntax

How will Minimally Sufficient Pandas benefit you?

  • All common data analysis tasks will use the same syntax
  • Fewer commands will be easier to commit to memory
  • Your code will be easier to understand by others and by you
  • It will be easier to put Pandas code in production
  • It reduces the chance of landing on a Pandas bug.

Specific Guidance

Selecting a Single Column of Data

Use the brackets and not dot notation to select a single column of data because the dot notation cannot column names with spaces, those that collide with...

Continue Reading...

Minimally Sufficient Pandas

pandas Jan 01, 2020

In this article, I will offer an opinionated perspective on how to best use the Pandas library for data analysis. My objective is to argue that only a small subset of the library is sufficient to complete nearly all of the data analysis tasks that one will encounter. This minimally sufficient subset of the library will benefit both beginners and professionals using Pandas. Not everyone will agree with the suggestions I lay forward, but they are how I teach and how I use the library myself. If you disagree or have any of your own suggestions, please leave them in the comments below.

By the end of this article you will:

  • Know why limiting Pandas to a small subset will keep your focus on the actual data analysis and not on the syntax
  • Have specific guidelines for taking a single approach to completing a variety of common data analysis tasks with Pandas

Begin Mastering Data Science Now for Free!

Take my...

Continue Reading...

The Craziness of Subset Selection in Pandas

pandas Nov 21, 2019

Selecting subsets of data in pandas is not a trivial task as there are numerous ways to do the same thing. Different pandas users select data in different ways, so these options can be overwhelming. I wrote a long frou-part series on it to clarify how its done. For instance, take a look at the following options for selecting a single column of data (assuming it’s the first column):

  • df[‘colname’]
  • df[[‘colname’]]
  • df.colname
  • df.loc[:, ‘colname’]
  • df.iloc[:, 0]
  • df.get(‘colname’)

Begin Mastering Data Science Now for Free!

Take my free Intro to Pandas course to begin your journey mastering data analysis with Python.

Summary of this post

In this post, I want to cover a single edge case of subset selection that I believe most pandas users will be unaware of what it does and how it works. Let’s say we have a DataFrame df and issue the following subset selection.

df[1, 2]

Deceptively simple

This appears to...

Continue Reading...

Use the brackets to select a single pandas DataFrame column and not dot notation

pandas Sep 13, 2019
 

pandas offers its users two choices to select a single column of data and that is with either brackets or dot notation. In this article, I suggest using the brackets and not dot notation for the following ten reasons.

  1. Select column names with spaces
  2. Select column names that have the same name as methods
  3. Select columns with variables
  4. Select non-string columns
  5. Set new columns
  6. Select multiple columns
  7. Dot notation is a strict subset of the brackets
  8. Use one way which works for all situations
  9. Auto-completion works in the brackets and following it
  10. Brackets are the canonical way to select subsets for all objects

Begin Mastering Data Science Now for Free!

Take my free Intro to Pandas course to begin your journey mastering data analysis with Python.

Selecting a single column

Let’s begin by creating a small DataFrame with a few columns

import pandas as pd
df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
'average score': [10, 5, 3],
'max': [99, 100, 3]})
df
 

...

Continue Reading...

The Five-Step Process for Data Exploration in a Jupyter Notebook

pandas Aug 07, 2019

Video available

I also have a video from the Dunder Data YouTube channel where I demonstrate this entire process. I believe this is a post that is better viewed as a demonstration, so if you have the time see the video below.

Tutorial

A major pain point for beginners is writing too many lines of code in a single cell. When you are learning, you need to get feedback on every single line of code that you write and verify that it is in fact correct. Only once you have verified the result should you move on to the next line of code.

To help increase your ability to do data exploration in Jupyter Notebooks, I recommend the following five-step process:

  1. Write and execute a single line of code to explore your data
  2. Verify that this line of code works by inspecting the output
  3. Assign the result to a variable
  4. Within the same cell, in a second line output the head of the DataFrame or Series
  5. Continue to the next cell. Do not add more lines of code to the cell

Apply to every part of...

Continue Reading...

Pandas Cookbook — Develop Powerful Routines for Exploring Real-World Datasets

pandas Jul 18, 2019

In this article, I will discuss the overall approach I took to writing Pandas Cookbook along with highlights of each chapter.

I have a new book titled Master Data Analysis with Python that is far superior to Pandas Cookbook. It contains over 300 exercises and projects to reinforce all the material and will receive continuous updates indefinitely. If you are interested in Pandas Cookbook, I would strongly suggest to purchase Master Data Analysis with Python instead.

All Access Pass!

If you want to learn python, data analysis, and machine learning, then the All Access Pass will provide you access to all my current and future material for one low price.

I had three main guiding principles when writing the book:

  • Use of real-world datasets
  • Focus on doing data analysis
  • Writing modern, idiomatic pandas

First, I wanted you, the reader, to explore real-world datasets and not randomly...

Continue Reading...

Python for Data Analysis — A Critical Line-by-Line Review

book review pandas python Jul 09, 2019

In this post, I will offer my review of the book, Python for Data Analysis (2nd edition) by Wes McKinney. My name is Ted Petrou and I am an expert at pandas and author of the recently released Pandas Cookbook. I thoroughly read through PDA and created a very long, review that is available on github. This post provides some of the highlights from that full review.

What is a critical line-by-line review?

I read this book as if I was the only technical reviewer and I was counted on to find all the possible errors. Every single line of code was scrutinized and explored to see if a better solution existed. Having spent nearly every day of the last 18 months writing and talking about pandas, I have formed strong opinions about how it should be used. This critical examination lead to me finding fault with quite a large percentage of the code.

Begin Mastering Data Science Now for Free!

Take my free Intro to Pandas course to begin your journey mastering data analysis...

Continue Reading...

Selecting Subsets of Data in Pandas: Part 4

pandas Jun 30, 2019

This is the fourth and final part of the series “Selecting Subsets of Data in Pandas”. Pandas offers a wide variety of options for subset selection, which necessitates multiple articles. This series is broken down into the following topics.

  1. Selection with [].loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Begin Mastering Data Science Now for Free!

Take my free Intro to Pandas course to begin your journey mastering data analysis with Python.

In-Person Courses

Social Media

I frequently post my python data science thoughts on social media. Follow me!

Corporate Training

If you have a group at your company looking to learn directly from an expert who understands how to teach and motivate students, let me know by filling out the form on this page.

Learning what not to do

In all...

Continue Reading...
Close

50% Complete

Register for a free account

Upon registration, you'll get access to three free courses and the discount code to purchase the All Access Pass for 50% off through Cyber Monday (12/2)