## Recreate Tesla Cybertruck in Matplotlib - Dunder Data Challenge #6 Solution

Dec 10, 2019

In this post, we recreate the new Tesla Cybertruck using matplotlib and animate it so that it drives. Our goal is to recreate the image below. Click the video at the top of this post to view the animation and final solution.

#### Tutorial

The tutorial below describes the recreation and covers the following topics:

• Figure and Axes setup
• Animation

Understanding these topics should give you enough to start animating your own figures in matplotlib.

##### Figure and Axes setup

We first create a matplotlib Figure and Axes, remove the axis labels and tick marks, and set the x and y axis limits. The `fill_between` method is used to set two different background colors.

```python
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

fig, ax = plt.subplots(figsize=(16, 8))
ax.axis('off')
ax.set_ylim(0, 1)
ax.set_xlim(0, 2)
ax.fill_between(x=[0, 2], y1=.36, y2=1, color='black')
ax.fill_between(x=[0, 2], y1=...
```
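The post's code is cut off above. As a hedged sketch of where the two topics lead (not the original solution), the following fills both background bands and slides a simplified polygon across the axes with `matplotlib.animation.FuncAnimation`. The lower band's color, the polygon coordinates, and the `update` function are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation
from matplotlib.patches import Polygon

fig, ax = plt.subplots(figsize=(16, 8))
ax.axis("off")
ax.set_xlim(0, 2)
ax.set_ylim(0, 1)

# Two background bands; the lower band's color is an assumption
ax.fill_between(x=[0, 2], y1=0.36, y2=1, color="black")
ax.fill_between(x=[0, 2], y1=0, y2=0.36, color="dimgray")

# Hypothetical, much-simplified truck body drawn as a single polygon
body_xy = np.array([[0.2, 0.4], [0.6, 0.6], [1.4, 0.55], [1.4, 0.4]])
body = Polygon(body_xy, closed=True, color="silver")
ax.add_patch(body)


def update(frame):
    """Shift the body 0.05 x-units to the right for each frame."""
    body.set_xy(body_xy + [0.05 * frame, 0])
    return (body,)


# 40 frames, 50 ms apart; keep a reference so the animation is not garbage-collected
anim = FuncAnimation(fig, update, frames=40, interval=50)
```

In Jupyter, the result can be displayed with `IPython.display.HTML(anim.to_jshtml())`.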

## Dunder Data Challenge #6 — Recreate the Tesla Cybertruck with Matplotlib

Nov 26, 2019

In this challenge, you will recreate the Tesla Cybertruck unveiled last week using matplotlib.

All challenges can be completed in your browser in a Jupyter Notebook thanks to Binder (mybinder.org).

#### Challenge

Use matplotlib to recreate the Tesla Cybertruck image above.

#### Extra challenge

Add animation so that it drives off the screen.

I’m still working on this challenge myself. My current recreation is below:

### Become a pandas expert

If you are looking to completely master the pandas library and become a trusted expert for doing data science work, check out my book Master Data Analysis with Python. It comes with over 300 exercises with detailed solutions covering the pandas library in-depth.

## Dunder Data Challenge #5 Solution

Nov 25, 2019

This post presents a solution to Dunder Data Challenge #5 — Keeping Values Within the Interquartile Range.

All challenges may be worked on in a Jupyter Notebook right now thanks to Binder (mybinder.org).

### Solution

We begin by finding the first and third quartiles of each stock using the `quantile` method. This is an aggregation which returns a single value for each column by default. Set the first parameter, `q`, to a float between 0 and 1 to represent the quantile. Below, we create two variables to hold the first and third quartiles (also known as the 25th and 75th percentiles) and output their results to the screen.

```python
import pandas as pd

stocks = pd.read_csv('../data/stocks10.csv', index_col='date', parse_dates=['date'])
stocks.head()
```
```python
>>> lower = stocks.quantile(.25)
>>> upper = stocks.quantile(.75)
>>> lower
MSFT    19.1500
AAPL     3.9100
SLB     25.6200
AMZN    40.4600
TSLA    33.9375
XOM     32.6200
WMT     37.6200
T       14.5000
FB      62.3000
V...
```
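The remainder of the solution is cut off here. One way to apply the two quartile Series is `DataFrame.clip`, which accepts Series thresholds and aligns them along `axis`. The sketch below is not necessarily the solution the post presents; it uses a made-up table whose tickers `AAA` and `BBB` are hypothetical.

```python
import pandas as pd

# Made-up closing prices standing in for stocks10.csv
stocks = pd.DataFrame(
    {"AAA": [1.0, 2.0, 3.0, 4.0, 5.0],
     "BBB": [10.0, 20.0, 30.0, 40.0, 50.0]},
    index=pd.date_range("2019-01-01", periods=5, name="date"),
)

lower = stocks.quantile(.25)  # first quartile of each column
upper = stocks.quantile(.75)  # third quartile of each column

# axis=1 aligns the lower/upper Series with the column labels, so every
# column is clipped to its own interquartile range
clipped = stocks.clip(lower, upper, axis=1)
```

Values below the first quartile become the first quartile, values above the third quartile become the third quartile, and everything in between is left untouched.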

## Dunder Data Challenge #5 — Keeping Values Within the Interquartile Range

Nov 14, 2019

In this challenge, you are given a table of closing stock prices for 10 different stocks with data going back as far as 1999. For each stock, calculate the interquartile range (IQR). Return a DataFrame that satisfies the following conditions:

• Keep values as they are if they are within the IQR
• For values lower than the first quartile, make them equal to the exact value of the first quartile
• For values higher than the third quartile, make them equal to the exact value of the third quartile

Start this challenge in a Jupyter Notebook right now thanks to Binder (mybinder.org).

```python
import pandas as pd

stocks = pd.read_csv('../data/stocks10.csv', index_col='date', parse_dates=['date'])
stocks.head()
```

#### Challenge

There is a straightforward solution that completes this challenge in a single line of readable code. Can you find it?

### Become a pandas expert

If you are looking to completely master the pandas library and become a trusted expert for doing data science work,...

## Dunder Data Challenge #4 - Solution

Nov 13, 2019

In this post, I detail the solution to Dunder Data Challenge #4 — Finding the Date of the Largest Percentage Stock Price Drop.

#### Solution

To begin, we need to find the percentage drop for each stock for each day. pandas has a built-in method for this called `pct_change`. By default, it finds the percentage change between the current value and the one immediately above it. Like most DataFrame methods, it treats each column independently from the others.
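To see what `pct_change` computes, here is a tiny made-up table with two hypothetical tickers. Each value is compared with the value directly above it in the same column, and the first row is `NaN` because it has nothing above it.

```python
import pandas as pd

# Made-up prices for two hypothetical tickers
prices = pd.DataFrame({"AAA": [100.0, 110.0, 99.0],
                       "BBB": [50.0, 25.0, 50.0]})

# new / previous - 1, computed column by column
changes = prices.pct_change()
```

For `AAA`, the second row is about 0.10 (a 10% gain) and the third about -0.10; for `BBB`, the second row is -0.5 (a 50% drop) and the third is 1.0 (a 100% gain).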

If we call it on our current DataFrame, we’ll get an error because it does not work on our date column. Let’s re-read the data, converting the date column to a datetime and placing it in the index.

```python
import pandas as pd

stocks = pd.read_csv('../data/stocks10.csv', parse_dates=['date'], index_col='date')
stocks.head()
```

Placing the date column in the index is a key part of this challenge that makes our solution quite a bit nicer. Let’s now call the `pct_change` method to get the percentage change for each trading day.

`...`
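The rest of the solution is cut off above. A minimal loop-free sketch, assuming made-up data with hypothetical tickers: `idxmin` returns, for each column, the index label of its smallest value, and because the dates are in the index, that label is the date of the largest one-day percentage drop. The same idea extends to the extra challenge with `agg(["idxmin", "min"])`. This may differ from the post's own code.

```python
import pandas as pd

# Made-up closing prices; the real data lives in stocks10.csv
stocks = pd.DataFrame(
    {"AAA": [100.0, 90.0, 95.0, 60.0],
     "BBB": [50.0, 40.0, 44.0, 42.0]},
    index=pd.DatetimeIndex(
        ["2019-01-02", "2019-01-03", "2019-01-04", "2019-01-07"], name="date"
    ),
)

daily = stocks.pct_change()

# Date of each column's most negative change; the leading NaN is skipped
worst_dates = daily.idxmin()

# Extra challenge: one row for the date and one row for the drop itself
summary = daily.agg(["idxmin", "min"])
```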

## Dunder Data Challenge #4 - Finding the Date of the Largest Percentage Stock Price Drop

Nov 12, 2019

In this challenge, you are given a table of closing stock prices for 10 different stocks with data going back as far as 1999. For each stock, find the date where it had its largest one-day percentage loss.


Begin working on this challenge now in a Jupyter Notebook thanks to Binder (mybinder.org). The data is found in the `stocks10.csv` file with the ticker symbol as a column name.

The Dunder Data Challenges GitHub repository also contains all of the challenges.

#### Challenge

Can you return a Series that has the ticker symbols in the index and the date where the largest percentage price drop happened as the values? There is a nice, fast solution that uses just a minimal amount of code without any loops.

#### Extra challenge

Can you return a DataFrame with the ticker symbol as the columns with a row for the date and another row for the percentage price drop?

## Dunder Data Challenge #3 - Optimal Solution

Sep 17, 2019

In this article, I will present an ‘optimal’ solution to Dunder Data Challenge #3. Please refer to that article for the problem setup. Work on this challenge directly in a Jupyter Notebook right now by clicking this link.

### Naive Solution — Custom function with apply

The naive solution was presented in detail in the previous article. The end result was a massive custom function containing many boolean filters used to find specific subsets of data to aggregate. For each group, a Series was returned with 11 values. Each of these values became a new column in the resulting DataFrame. Let’s take a look at the custom function:

This naive solution takes nearly 4 seconds to run.
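The custom function itself is not reproduced in this excerpt. As a hedged, small-scale illustration of the pattern (not the author's code), the sketch below builds a hypothetical table with `group`, `kind` and `value` columns and aggregates it with a custom function passed to groupby `apply`. It then computes the same result with one common faster alternative: precompute the filtered values as new columns and use only built-in aggregations. Whether this matches the author's ‘optimal’ solution is an assumption.

```python
import pandas as pd

# Hypothetical toy data; the challenge's real dataset is not reproduced here
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "kind": ["x", "y", "x", "x", "y"],
    "value": [1, 2, 3, 4, 5],
})


def summarize(g):
    """Naive style: boolean filters inside one custom function per group."""
    return pd.Series({
        "x_total": g.loc[g["kind"] == "x", "value"].sum(),
        "y_total": g.loc[g["kind"] == "y", "value"].sum(),
        "max_value": g["value"].max(),
    })


# Each key of the returned Series becomes a column of the result
naive = df.groupby("group")[["kind", "value"]].apply(summarize)

# Alternative: compute the filtered values as columns first, then let a
# single groupby call use built-in aggregations only
vectorized = (
    df.assign(
        x_total=df["value"].where(df["kind"] == "x", 0),
        y_total=df["value"].where(df["kind"] == "y", 0),
    )
    .groupby("group")
    .agg(
        x_total=("x_total", "sum"),
        y_total=("y_total", "sum"),
        max_value=("value", "max"),
    )
)
```

On a toy table the timing difference is negligible. The gap tends to grow with the number of groups because `apply` calls the Python function once per group, while the built-in aggregations run in compiled code.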

## Use the brackets to select a single pandas DataFrame column and not dot notation

Sep 13, 2019

pandas offers its users two ways to select a single column of data: brackets or dot notation. In this article, I suggest using the brackets and not dot notation for the following ten reasons.

1. Select column names with spaces
2. Select column names that have the same name as methods
3. Select columns with variables
4. Select non-string columns
5. Set new columns
6. Select multiple columns
7. Dot notation is a strict subset of the brackets
8. Use one way which works for all situations
9. Auto-completion works in the brackets and following it
10. Brackets are the canonical way to select subsets for all objects

#### Selecting a single column

Let’s begin by creating a small DataFrame with a few columns.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Niko', 'Penelope', 'Aria'],
                   'average score': [10, 5, 3],
                   'max': [99, 100, 3]})
df
```

Let’s select the `name` column with dot notation. Many pandas users like dot notation.

```python
>>> df.name
0        Niko
1    Penelope
2        Aria
```
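Several reasons from the list can be checked directly on a DataFrame like the one above. The short sketch below re-creates `df` so it runs on its own; the `total` column and the `numeric` DataFrame are hypothetical additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Niko", "Penelope", "Aria"],
                   "average score": [10, 5, 3],
                   "max": [99, 100, 3]})

scores = df["average score"]   # 1. a space in the name: no dot-notation equivalent
max_col = df["max"]            # 2. df.max is the DataFrame.max method, not this column
col = "name"
names = df[col]                # 3. the column name is held in a variable
subset = df[["name", "max"]]   # 6. several columns at once

# 5. brackets create a new column
df["total"] = df["average score"] + df["max"]

# 4. non-string labels: numeric[0] works, numeric.0 is a syntax error
numeric = pd.DataFrame({0: [1, 2], 1: [3, 4]})
first = numeric[0]
```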

...

## Dunder Data Challenge #3 - Naive Solution

Sep 12, 2019

To view the problem setup, go to the Dunder Data Challenge #3 post. This post contains the solution.

### Become an Expert

I will first present a naive solution that returns the correct results but is extremely slow. It uses a large custom function with the groupby `apply` method. Using the groupby `apply` method has the potential to capsize your program, as its performance can be awful.

One of my first attempts at using a groupby `apply` to solve a complex grouping problem resulted in a computation that took about eight hours to finish. The dataset was fairly large, at around a million rows, but could still easily fit in memory. I eventually ended up solving the problem using SAS (and not pandas) and shrank the execution...

## Dunder Data Challenge #3 - Multiple Custom Grouping Aggregations

Sep 09, 2019

Welcome to the third edition of the Dunder Data Challenge series designed to help you learn Python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).

This challenge is going to be fairly difficult, but should answer a question that many pandas users face: what is the best way to perform a groupby that does many custom aggregations? In this context, a ‘custom aggregation’ is defined as one that is not directly available in pandas and for which you must write a custom function.
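As a hedged illustration of that definition (hypothetical data, not part of the challenge), a custom aggregation can be any Python function that reduces a Series to a single value. pandas named aggregation lets it sit alongside built-in aggregations in one `agg` call, each output column written as `name=(input_column, function)`.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": [1, 4, 2, 10]})


def value_range(s):
    """Custom aggregation: spread between the largest and smallest value."""
    return s.max() - s.min()


# Named aggregation: output_column=(input_column, function_or_name)
result = df.groupby("group").agg(
    mean_value=("value", "mean"),
    value_range=("value", value_range),
)
```

Group `a` gets a mean of 2.5 and a range of 3; group `b` gets a mean of 6.0 and a range of 8.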

In Dunder Data Challenge #1, a single aggregation, which required a custom grouping function, was the desired result. In this challenge, you’ll need to return several aggregations when grouping. There are a few different solutions to this problem, but depending on how you arrive at your solution, there can be enormous performance differences. I am...
