Minimally Sufficient Pandas

pandas Jan 01, 2020

In this article, I will offer an opinionated perspective on how to best use the Pandas library for data analysis. My objective is to argue that only a small subset of the library is sufficient to complete nearly all of the data analysis tasks that one will encounter. This minimally sufficient subset of the library will benefit both beginners and professionals using Pandas. Not everyone will agree with the suggestions I put forward, but they are how...

Continue Reading...

Recreate Tesla Cybertruck in Matplotlib - Dunder Data Challenge #6 Solution

dunder data challenges Dec 10, 2019
 

In this post, we recreate the new Tesla Cybertruck using matplotlib and animate it so that it drives. Our goal is to recreate the image below.

Click the video at the top of this post to view the animation and final solution.

Begin Mastering Data Science Now for Free!

Take my free Intro to Pandas course to begin your journey mastering data analysis with Python.

Tutorial

A tutorial will now follow that describes the recreation. It will discuss the...

Continue Reading...

Dunder Data Challenge #6 — Recreate the Tesla Cybertruck with Matplotlib

dunder data challenges Nov 26, 2019

In this challenge, you will recreate the Tesla Cybertruck unveiled last week using matplotlib. All challenges are available to be completed in your browser in a Jupyter Notebook now thanks to Binder (mybinder.org).

Challenge

Use matplotlib to recreate the Tesla Cybertruck image above.

Extra Challenge

Add animation so that it drives off the screen.
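As a starting point for the extra challenge, here is a minimal sketch of moving a shape across the axes with matplotlib's FuncAnimation. The polygon coordinates, axis limits, and step size are placeholders I made up for illustration; they are not the Cybertruck outline or the values used in the solution post.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; remove this line in a notebook

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from matplotlib.patches import Polygon

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 5)

# Hypothetical placeholder outline, NOT the real truck shape
base_xy = np.array([[0.5, 1.0], [3.5, 1.0], [3.5, 1.6], [1.8, 2.2], [0.5, 1.6]])
truck = Polygon(base_xy, closed=True, color="silver")
ax.add_patch(truck)

STEP = 0.2  # assumed horizontal distance moved per frame


def update(frame):
    # Shift every vertex to the right; y coordinates stay unchanged
    truck.set_xy(base_xy + np.array([STEP * frame, 0.0]))
    return (truck,)


# 60 frames * 0.2 units = 12 units, enough to leave the 10-unit-wide axes
anim = FuncAnimation(fig, update, frames=60, interval=50, blit=True)
```

In a Jupyter Notebook, the animation can then be displayed with `HTML(anim.to_jshtml())` from `IPython.display`.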

I’m still working on this challenge myself. My current recreation is below:

 

Become a pandas...

Continue Reading...

Dunder Data Challenge #5 Solution

dunder data challenges Nov 25, 2019

This post presents a solution to Dunder Data Challenge #5 — Keeping Values Within the Interquartile Range.

All challenges may be worked in a Jupyter Notebook right now thanks to Binder (mybinder.org).

Solution

We begin by finding the first and third quartiles of each stock using the quantile method. This is an aggregation which returns a single value for each column by default. Set the first parameter, q, to a float between 0 and 1 to...
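A minimal sketch of this first step, using a small made-up table in place of the challenge data (the column names and prices are hypothetical):

```python
import pandas as pd

# Hypothetical closing prices; the challenge uses a much larger stocks table
stocks = pd.DataFrame({
    "AAA": [10.0, 12.0, 14.0, 16.0, 18.0],
    "BBB": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# q is a float between 0 and 1; each call returns one value per column
first_quartile = stocks.quantile(0.25)
third_quartile = stocks.quantile(0.75)

print(first_quartile)
print(third_quartile)
```

Each result is a Series indexed by the column names, which makes it easy to compare against the original DataFrame column by column.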

Continue Reading...

The Craziness of Subset Selection in Pandas

pandas Nov 21, 2019

Selecting subsets of data in pandas is not a trivial task as there are numerous ways to do the same thing. Different pandas users select data in different ways, so these options can be overwhelming. I wrote a long four-part series on it to clarify how it's done. For instance, take a look at the following options for selecting a single column of data (assuming it’s the first column):

  • df['colname']
  • df[['colname']]
  • df.colname
  • df.loc[:,...
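To make the differences concrete, here is a small sketch on a made-up DataFrame. The loc and iloc lines are my own illustration of label-based and position-based selection, since the list above is cut off; they are not necessarily the exact forms the full post lists.

```python
import pandas as pd

# Hypothetical data; "colname" is deliberately the first column
df = pd.DataFrame({"colname": [1, 2, 3], "other": [4, 5, 6]})

as_series = df["colname"]        # single brackets return a Series
as_frame = df[["colname"]]       # a list of names returns a one-column DataFrame
by_attribute = df.colname        # Series; only works when the name is a valid identifier
by_label = df.loc[:, "colname"]  # Series, selected by label
by_position = df.iloc[:, 0]      # Series, selected by integer position

print(type(as_series).__name__, type(as_frame).__name__)
```

All of the Series results hold the same values; only `df[['colname']]` keeps the two-dimensional DataFrame shape.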
Continue Reading...

Dunder Data Challenge #5 — Keeping Values Within the Interquartile Range

dunder data challenges Nov 14, 2019

In this challenge, you are given a table of closing stock prices for 10 different stocks with data going back as far as 1999. For each stock, calculate the interquartile range (IQR). Return a DataFrame that satisfies the following conditions:

  • Keep values as they are if they are within the IQR
  • For values lower than the first quartile, make them equal to the exact value of the first quartile
  • For values higher than the third quartile, make them equal to the exact value of the...
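One possible way to meet these conditions is sketched below with the DataFrame clip method, on a small made-up table. This is an assumption about a workable approach, not necessarily the one used in the published solution, and the column names and prices are hypothetical.

```python
import pandas as pd

# Hypothetical closing prices standing in for the challenge data
prices = pd.DataFrame({
    "AAA": [10.0, 12.0, 14.0, 16.0, 18.0],
    "BBB": [1.0, 2.0, 3.0, 4.0, 5.0],
})

q1 = prices.quantile(0.25)  # first quartile of each column
q3 = prices.quantile(0.75)  # third quartile of each column

# axis=1 aligns the quartile Series with the columns, so each stock
# is clipped to its own [q1, q3] range
clipped = prices.clip(lower=q1, upper=q3, axis=1)

print(clipped)
```

Values already between the two quartiles pass through unchanged, while anything outside is replaced by the nearer quartile.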
Continue Reading...

Dunder Data Challenge #4 - Solution

dunder data challenges Nov 13, 2019

In this post, I detail the solution to Dunder Data Challenge #4 — Finding the Date of the Largest Percentage Stock Price Drop.

Solution

To begin, we need to find the percentage drop for each stock for each day. pandas has a built-in method for this called pct_change. By default, it finds the percentage change between the current value and the one immediately above it. Like most DataFrame methods, it treats each column independently from the others.
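A minimal sketch of pct_change on a small made-up table (tickers, dates, and prices are hypothetical). The final idxmin step is my own assumption about how the date of the largest drop could then be found; the excerpt above does not confirm it.

```python
import pandas as pd

# Hypothetical closing prices indexed by trading date
prices = pd.DataFrame(
    {
        "AAA": [100.0, 110.0, 99.0, 108.9],
        "BBB": [50.0, 40.0, 44.0, 44.0],
    },
    index=pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-04", "2019-01-07"]),
)

# Change relative to the row immediately above; the first row has no
# previous value, so it is missing
daily_change = prices.pct_change()

# Assumed follow-up step: index label of the smallest (most negative)
# change per column; missing values are skipped by default
worst_day = daily_change.idxmin()

print(daily_change)
print(worst_day)
```

Because each column is treated independently, the two stocks can have their worst day on different dates.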

If we call it on our current...

Continue Reading...

Dunder Data Challenge #4 - Finding the Date of the Largest Percentage Stock Price Drop

dunder data challenges Nov 12, 2019

In this challenge, you are given a table of closing stock prices for 10 different stocks with data going back as far as 1999. For each stock, find the date where it had its largest one-day percentage loss.

Begin working this challenge now in a Jupyter Notebook thanks to Binder (mybinder.org). The data is found in the stocks10.csv file with the ticker symbol as a column name. The Dunder Data Challenges GitHub...

Continue Reading...

Finding the Percentage of Missing Values in a Pandas DataFrame

Uncategorized Nov 01, 2019

In this tutorial, we will cover an efficient and straightforward method for finding the percentage of missing values in a Pandas DataFrame. This tutorial is available as a video on YouTube.

Non-intuitive Solution

The final solution to this problem is not quite intuitive for most people when they first encounter it. We will slowly build up to it...
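For orientation, here is a sketch of one common approach: take the mean of the boolean mask returned by isna. Whether this is the exact solution the tutorial builds up to is an assumption on my part, and the DataFrame below is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing values
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", None, "z", "w"],
    "c": [1, 2, 3, 4],
})

# isna gives True/False per cell; True counts as 1 and False as 0,
# so the column mean is the fraction of missing values
pct_missing = df.isna().mean() * 100

print(pct_missing)
```

The step that tends to surprise people is taking the mean of booleans, which works because pandas treats True as 1 and False as 0 in arithmetic.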

Continue Reading...

Animating NBA Games with matplotlib and pandas

Uncategorized Oct 01, 2019

In this tutorial, Game 7 of the 2016 NBA Finals will be animated with Matplotlib one shot at a time within a Jupyter Notebook. This tutorial integrates many different topics including:

  • Using the developer tools of a browser to discover non-public APIs
  • Using the requests library to get data into a Pandas DataFrame
  • Creating a static visual representation of an NBA court with shots from a game using Matplotlib
  • Creating a Matplotlib animation showing the description, score, time...
Continue Reading...