Selecting subsets of data in pandas is not a trivial task, as there are numerous ways to do the same thing. Different pandas users select data in different ways, so these options can be overwhelming. I wrote a long four-part series on it to clarify how it's done. For instance, take a look at the following options for selecting a single column of data (assuming it’s the first column):
In this challenge, you are given a table of closing stock prices for 10 different stocks with data going back as far as 1999. For each stock, calculate the interquartile range (IQR). Return a DataFrame that satisfies the following conditions:
In this post, I detail the solution to Dunder Data Challenge #4 — Finding the Date of the Largest Percentage Stock Price Drop.
To begin, we need to find the percentage drop for each stock for each day. pandas has a built-in method for this called pct_change. By default, it finds the percentage change between the current value and the one immediately above it. Like most DataFrame methods, it treats each column independently from the others.
If we call it on our current...
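As a minimal sketch of how pct_change behaves, the toy example below uses made-up tickers and prices (not the challenge data). Pairing it with idxmin to locate the worst day is an assumption about one reasonable approach, not necessarily the full solution from the post.

```python
import pandas as pd

# Hypothetical closing prices; tickers and values are invented for illustration.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0, 99.0], "BBB": [50.0, 45.0, 54.0, 27.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]),
)

# Each column is processed independently. The first row has no previous
# value to compare against, so it becomes NaN.
returns = prices.pct_change()

# For each column, idxmin gives the index label (here, the date) of the
# smallest value, i.e. the most negative one-day change.
worst_day = returns.idxmin()

# The size of that drop, as a fraction (-0.5 means a 50% loss).
worst_drop = returns.min()
```

Because the dates are in the index, idxmin returns dates directly, with no extra lookup step.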
In this challenge, you are given a table of closing stock prices for 10 different stocks with data going back as far as 1999. For each stock, find the date where it had its largest one-day percentage loss.
Begin working on this challenge now in a Jupyter Notebook thanks to Binder (mybinder.org). The data is found in the stocks10.csv file, with each ticker symbol as a column name. The Dunder Data Challenges GitHub...
In this tutorial, we will cover an efficient and straightforward method for finding the percentage of missing values in a pandas DataFrame. This tutorial is available as a video on YouTube.
Take my free Intro to Pandas course to begin your journey mastering data analysis with Python.
The final solution to this problem is not quite intuitive for most people when they first encounter it. We will slowly build up to it...
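For orientation, here is one common way to compute the percentage of missing values per column. It is a sketch on a made-up DataFrame, and it may or may not match the exact solution the tutorial builds up to. It relies on the fact that the mean of a boolean column equals the fraction of True values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some missing values in columns "a" and "b".
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, np.nan],
        "b": ["x", "y", None, "z"],
        "c": [1, 2, 3, 4],
    }
)

# isna() yields a boolean DataFrame; mean() treats True as 1 and False as 0,
# so the column mean is the fraction of missing values.
missing_pct = df.isna().mean() * 100
```

Here missing_pct is a Series indexed by column name, holding 50.0 for "a", 25.0 for "b", and 0.0 for "c".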
In this tutorial, game 7 of the 2016 NBA finals will be animated with Matplotlib one shot at a time within a Jupyter Notebook. This tutorial integrates many different topics including:
In this article, I will present an ‘optimal’ solution to Dunder Data Challenge #3. Please refer to that article for the problem setup. Work on this challenge directly in a Jupyter Notebook right now by clicking this link.
The naive solution was presented in detail in the previous article. The end result was a massive custom function containing many boolean filters used to find specific subsets of data to...
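The general shape of that naive pattern, a custom function that applies boolean filters inside each group, can be sketched on toy data as follows. The column names, the data, and the summarize function are all hypothetical and much smaller than the actual challenge.

```python
import pandas as pd

# Hypothetical data; not the challenge dataset.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],
        "kind": ["x", "y", "x", "x"],
        "value": [1, 2, 3, 4],
    }
)


def summarize(g):
    """Return several aggregations for one group as a Series."""
    return pd.Series(
        {
            # Plain aggregation over the whole group.
            "total": g["value"].sum(),
            # Aggregation over a boolean-filtered subset of the group.
            "total_x": g.loc[g["kind"] == "x", "value"].sum(),
        }
    )


# Selecting the columns explicitly keeps the grouping column out of the
# frame passed to the function.
result = df.groupby("group")[["kind", "value"]].apply(summarize)
```

Each key of the returned Series becomes a column of result, with one row per group. The function runs once per group in Python, which is a large part of why this approach gets slow as the number of groups grows.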
pandas offers its users two ways to select a single column of data: brackets or dot notation. In this article, I suggest using brackets rather than dot notation, for the following ten reasons.
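To make the difference concrete, the sketch below shows two well-known cases where dot notation does not behave like brackets. The DataFrame and its column names are invented, and these cases are not necessarily worded the same way as the ten reasons in the article.

```python
import pandas as pd

# Hypothetical data with one ordinary column name, one name that collides
# with a DataFrame method, and one name containing a space.
df = pd.DataFrame(
    {"price": [1.5, 2.5], "count": [10, 20], "unit cost": [0.1, 0.2]}
)

# For an ordinary column name, both notations return the same Series.
by_bracket = df["price"]
by_dot = df.price

# "count" is also a DataFrame method, so dot notation returns the method,
# not the column. Brackets always return the column.
dot_count = df.count
bracket_count = df["count"]

# A column name with a space cannot be written with dot notation at all;
# brackets are the only option.
unit_cost = df["unit cost"]
```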
To view the problem setup, go to the Dunder Data Challenge #3 post. This post will contain the solution.
Master Data Analysis with Python is an extremely comprehensive course that will help you learn pandas to do data analysis.
I believe that it is the best possible resource available for learning how to do data analysis with pandas, and I provide a 30-day 100% money-back guarantee if you are not satisfied.
I will first present a naive solution that...
Welcome to the third edition of the Dunder Data Challenge series designed to help you learn Python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).
This challenge is going to be fairly difficult, but should answer a question that many pandas users face — What is the best way to perform a groupby that does many custom aggregations? In this context, a ‘custom...