Nov 25, 2019

This post presents a solution to Dunder Data Challenge #5 — Keeping Values Within the Interquartile Range.

All challenges may be worked in a Jupyter Notebook right now thanks to Binder (mybinder.org).

We begin by finding the first and third quartiles of each stock using the `quantile`

method. This is an **aggregation** which returns a single value for each column by default. Set the first parameter, `q`

to a float between 0 and 1 to represent the quantile. Below, we create two variables to hold the first and third quartiles (also known as the 25th and 75th percentiles) and output their results to the screen.

`import pandas as pd`

stocks = pd.read_csv('../data/stocks10.csv', index_col='date',

parse_dates=['date'])

stocks.head()

>>> lower = stocks.quantile(.25)

>>> upper = stocks.quantile(.75)

>>> lower

MSFT 19.1500

AAPL 3.9100

SLB 25.6200

AMZN 40.4600

TSLA 33.9375

XOM 32.6200

WMT 37.6200

T 14.5000

FB 62.3000

V 19.4750

Name: 0.25, dtype: float64

>>> upper

MSFT 39.2600

AAPL 90.5900

SLB 66.2900

AMZN 362.7000

TSLA 260.4700

XOM 71.8100

WMT 65.1500

T 26.2300

FB 162.3050

V 80.3375

Name: 0.75, dtype: float64

We now use the `clip`

method which trims values in a DataFrame at the given threshold. It has two parameters `lower`

and `upper`

which can either be a single value or a sequence of values. We set each parameter to the Series containing the appropriate quartile. The `clip`

method requires that we use the `axis`

parameter to inform pandas which direction to align the given sequence. We align with the columns.

`stocks_final `**=** stocks.clip(lower, upper, axis**=**'columns')

stocks_final.head()

Let’s verify that each column contains the correct values by taking the min and max of each one.

`stocks_final.agg(['min', 'max'])`

Using one line of code, we can pass the Series containing the quartiles directly to the `clip`

method.

`stocks.clip(stocks.quantile(.25), stocks.quantile(.75), `

axis**=**'columns')

This is just for fun, but you can pass the `quantile`

method a list to return multiple quantiles on each column.

`stocks.quantile([.25, .75])`

pandas default iteration is over the column names. But, numpy defaults its iteration by row. We can use this knowledge to unpack each of the first two rows as the first two parameters in the `clip`

method after using the `values`

attribute to get the numpy array from the DataFrame.

`stocks.clip(`*****stocks.quantile([.25, .75]).values, axis**=**'columns')

If you are looking to completely master the pandas library and become a trusted expert for doing data science work, check out my book Master Data Analysis with Python. It comes with over 300 exercises with detailed solutions covering the pandas library in-depth.

Upon registration, you'll get access to the following free courses:

- Python Installation
- Intro to Jupyter Notebooks
- Intro to Pandas
- Python and Pandas Challenges