Finding the Percentage of Missing Values in a Pandas DataFrame

Uncategorized Nov 01, 2019

In this tutorial, we will cover an efficient and straightforward method for finding the percentage of missing values in a Pandas DataFrame. This tutorial is available as a video on YouTube.

Begin Mastering Data Science Now for Free!

Take my free Intro to Pandas course to begin your journey mastering data analysis with Python.

Non-intuitive Solution

The final solution to this problem is not quite intuitive for most people when they first encounter it. We will slowly build up to it...
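One common way to compute this is sketched below. It relies on the fact that the mean of a boolean column equals the fraction of True values. The DataFrame `df` and its column names are invented for illustration, and this is not necessarily the exact solution the tutorial builds up to.

```python
import numpy as np
import pandas as pd

# Hypothetical data, invented for illustration only
df = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "city": ["Houston", "Austin", None, "Dallas"],
    "score": [1.0, 2.0, 3.0, 4.0],
})

# isna() returns a boolean DataFrame marking missing values;
# the mean of booleans is the fraction that are True
pct_missing = df.isna().mean() * 100
print(pct_missing)
```

The result is a Series indexed by column name, so it can be sorted or filtered like any other Series.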

Continue Reading...

Animating NBA Games with matplotlib and pandas

Uncategorized Oct 01, 2019

In this tutorial, Game 7 of the 2016 NBA Finals will be animated with Matplotlib one shot at a time within a Jupyter Notebook. This tutorial integrates many different topics including:

  • Using the developer tools of a browser to discover non-public APIs
  • Using the requests library to get data into a Pandas DataFrame
  • Creating a static visual representation of an NBA court with shots from a game using Matplotlib
  • Creating a Matplotlib animation showing the description, score, time...
Continue Reading...

Dunder Data Challenge #3 - Optimal Solution

dunder data challenges Sep 17, 2019

In this article, I will present an ‘optimal’ solution to Dunder Data Challenge #3. Please refer to that article for the problem setup. Work on this challenge directly in a Jupyter Notebook right now by clicking this link.

Naive Solution — Custom function with apply

The naive solution was presented in detail in the previous article. The end result was a massive custom function containing many boolean filters used to find specific subsets of data to...
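To illustrate the general pattern being described, here is a minimal sketch of a custom function passed to a groupby `apply`, with boolean filters selecting subsets of each group. The data, column names, and the function `custom_agg` are all hypothetical and much smaller than anything in the actual challenge.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration only
df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "product": ["a", "b", "a", "a", "b"],
    "revenue": [100, 200, 50, 150, 300],
})

def custom_agg(group):
    # Each boolean filter picks out one subset of the group's rows
    is_a = group["product"] == "a"
    is_large = group["revenue"] >= 150
    # Returning a Series yields one output column per entry
    return pd.Series({
        "revenue_a": group.loc[is_a, "revenue"].sum(),
        "n_large": is_large.sum(),
    })

# The custom function is called once per region
result = df.groupby("region")[["product", "revenue"]].apply(custom_agg)
print(result)
```

Because the function runs in Python once per group, this style tends to become slow as the number of groups grows, which is the usual motivation for looking for an alternative.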

Continue Reading...

Use the brackets to select a single pandas DataFrame column and not dot notation

pandas Sep 13, 2019
 

pandas offers two ways to select a single column of data: brackets or dot notation. In this article, I suggest using the brackets and not dot notation for the following ten reasons.

  1. Select column names with spaces
  2. Select column names that have the same name as methods
  3. Select columns with variables
  4. Select non-string columns
  5. Set new columns
  6. Select multiple columns
  7. Dot notation is a strict subset of the brackets
  8. Use one way which works for all situations
  9. ...
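
Several of the reasons listed above can be seen in a short sketch. The DataFrame and its column names are hypothetical, chosen only to trigger the cases where dot notation does not work.

```python
import pandas as pd

# Hypothetical DataFrame, invented for illustration only
df = pd.DataFrame({
    "first name": ["Ann", "Bob"],  # column name containing a space
    "count": [3, 5],               # same name as the DataFrame.count method
    2019: [1.5, 2.5],              # non-string column name
})

names = df["first name"]   # df.first name would be a syntax error
counts = df["count"]       # df.count refers to the method, not the column
col = 2019
by_variable = df[col]      # brackets accept a variable holding the name

df["total"] = df["count"] * 2          # brackets can set a new column
subset = df[["first name", "count"]]   # a list selects multiple columns
```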
Continue Reading...

Dunder Data Challenge #3 - Naive Solution

dunder data challenges Sep 12, 2019

To view the problem setup, go to the Dunder Data Challenge #3 post. This post will contain the solution.

Master Data Analysis with Python

Master Data Analysis with Python is an extremely comprehensive course that will help you learn pandas to do data analysis.

I believe that it is the best possible resource available for learning how to do data analysis with pandas, and I provide a 30-day 100% money-back guarantee if you are not satisfied.

Solution

I will first present a naive solution that...

Continue Reading...

Dunder Data Challenge #3 - Multiple Custom Grouping Aggregations

dunder data challenges Sep 09, 2019

Welcome to the third edition of the Dunder Data Challenge series designed to help you learn Python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).

This challenge is going to be fairly difficult, but should answer a question that many pandas users face — What is the best way to perform a groupby that does many custom aggregations? In this context, a ‘custom...

Continue Reading...

Dunder Data Challenge #2 - Explain the 1,000x Speed Difference when taking the Mean

dunder data challenges Sep 08, 2019

Welcome to the second edition of the Dunder Data Challenge series designed to help you learn Python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).

In this challenge, your goal is to explain why taking the mean of the following DataFrame is more than 1,000x faster when setting the parameter numeric_only to True.
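
For readers unfamiliar with the parameter, the sketch below shows what `numeric_only=True` does to the result. The DataFrame here is hypothetical (the one used in the challenge is not shown in this excerpt), and the example makes no attempt to reproduce or explain the speed difference.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame; not the one used in the challenge
df = pd.DataFrame({
    "x": np.arange(5, dtype="float64"),
    "y": np.arange(5, dtype="int64"),
    "label": list("abcde"),
})

# With numeric_only=True, only the numeric columns are included in the result
means = df.mean(numeric_only=True)
print(means)
```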

Learn Data Science with Python

I have...

Continue Reading...

Dunder Data Challenge #1 - Optimize Custom Grouping Function

dunder data challenges Sep 07, 2019

This is the first edition of the Dunder Data Challenge series designed to help you learn Python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook thanks to Binder (mybinder.org).

In this challenge, your goal is to find the fastest solution while only using the Pandas library.

...
Continue Reading...

From pandas to scikit-learn - An Exciting New Workflow

Uncategorized Sep 05, 2019

Scikit-Learn’s new integration with Pandas

Scikit-Learn will make one of its biggest upgrades in recent years with its mammoth version 0.20 release. For many data scientists, a typical workflow consists of using Pandas to do exploratory data analysis before moving to scikit-learn for machine learning. This new release will make the process simpler, more feature-rich, robust, and standardized.

Become an Expert

If you want to be trusted to make decisions using...

Continue Reading...

The Five-Step Process for Data Exploration in a Jupyter Notebook

pandas Aug 07, 2019

Video available

I also have a video on the Dunder Data YouTube channel where I demonstrate this entire process. I believe this post is better viewed as a demonstration, so if you have the time, watch the video below.

Tutorial

A major pain point for beginners is writing too many lines of code in a single cell. When you are learning, you need to get feedback on every single line of code that you write and verify that it is in fact correct. Only once you have verified the...

Continue Reading...