Welcome to the third edition of the Dunder Data Challenge series designed to help you learn python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).
This challenge is fairly difficult, but it should answer a question that many pandas users face: what is the best way to perform a groupby that does many custom aggregations? In this context, a ‘custom...
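For readers new to the problem, here is a minimal sketch of one common pattern for many custom aggregations: a function that returns a pandas Series, applied once per group. It is not necessarily the challenge's intended or fastest answer. The sales data and the column names store, price, and qty are hypothetical. Because summarize runs as ordinary Python once per group, this pattern can be slow when there are many groups.

```python
import pandas as pd

# Hypothetical sales data
df = pd.DataFrame({
    "store": ["A", "A", "B", "B", "B"],
    "price": [10.0, 20.0, 5.0, 15.0, 10.0],
    "qty": [1, 3, 2, 2, 6],
})

def summarize(g: pd.DataFrame) -> pd.Series:
    # Each key becomes one output column; an aggregation may combine several input columns
    revenue = (g["price"] * g["qty"]).sum()
    return pd.Series({
        "revenue": revenue,
        "weighted_avg_price": revenue / g["qty"].sum(),
        "price_range": g["price"].max() - g["price"].min(),
    })

# Select the value columns explicitly so the grouping column is not passed to summarize
result = df.groupby("store")[["price", "qty"]].apply(summarize)
print(result)
```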
Welcome to the second edition of the Dunder Data Challenge series designed to help you learn python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).
In this challenge, your goal is to explain why taking the mean of the following DataFrame is more than 1,000x faster when setting the parameter numeric_only to True.
I have...
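The challenge's own DataFrame is not shown in this excerpt, so the sketch below uses a hypothetical frame that mixes numeric and string columns. It only shows how numeric_only=True restricts the mean to numeric columns, and an equivalent explicit selection with select_dtypes. It does not explain the speed difference, which is the point of the challenge. Default behavior for non-numeric columns has changed across pandas versions, so passing numeric_only=True explicitly is the safer habit.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame: two numeric columns plus one string column
df = pd.DataFrame({
    "a": np.arange(1_000, dtype="float64"),
    "b": np.ones(1_000),
    "name": ["x"] * 1_000,
})

# Aggregate only the numeric columns; the string column is skipped
means = df.mean(numeric_only=True)

# Equivalent explicit approach: select numeric columns first, then aggregate
means_explicit = df.select_dtypes(include="number").mean()
print(means)
```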
This is the first edition of the Dunder Data Challenge series designed to help you learn python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook thanks to Binder (mybinder.org).
In this challenge, your goal is to find the fastest solution while only using the Pandas library.
Take my free Intro to Pandas course to begin your journey mastering data analysis with Python.
...Scikit-Learn will make one of its biggest upgrades in recent years with its mammoth version 0.20 release. For many data scientists, a typical workflow consists of using Pandas to do exploratory data analysis before moving to scikit-learn for machine learning. This new release will make the process simpler, more feature-rich, robust, and standardized.
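To show the kind of pandas-to-scikit-learn workflow this release targets, here is a minimal sketch assuming scikit-learn 0.20 or later. That release added ColumnTransformer and SimpleImputer and let OneHotEncoder accept string categories directly. The toy data and the column names size and color are hypothetical, and this is one illustrative pipeline, not the article's own example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one numeric column with a missing value, one string column
X = pd.DataFrame({
    "size": [1.0, 2.0, None, 4.0],
    "color": ["red", "blue", "red", "green"],
})
y = [10.0, 20.0, 30.0, 40.0]

# Route each DataFrame column (selected by name) to its own preprocessing step
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), ["size"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])

# Chain preprocessing and the model so both are fit together on the raw DataFrame
model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])
model.fit(X, y)
preds = model.predict(X)
```

The ColumnTransformer lets the preprocessing refer to pandas column names directly, so no manual conversion to a NumPy array is needed before fitting.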
If you want to be trusted to make decisions using...
I also have a video on the Dunder Data YouTube channel where I demonstrate this entire process. I think this post is better viewed as a demonstration, so if you have the time, watch the video below.
A major pain point for beginners is writing too many lines of code in a single cell. When you are learning, you need to get feedback on every single line of code that you write and verify that it is in fact correct. Only once you have verified the...
In this article, I will discuss the overall approach I took to writing Pandas Cookbook along with highlights of each chapter.
I have a new book titled Master Data Analysis with Python that is far superior to Pandas Cookbook. It contains over 300 exercises and projects to reinforce all the material and will receive continuous updates indefinitely. If you are interested in Pandas Cookbook, I would strongly suggest purchasing Master...
In this post, I will offer my review of the book Python for Data Analysis (2nd edition) by Wes McKinney, hereafter PDA. My name is Ted Petrou, and I am an expert at pandas and the author of the recently released Pandas Cookbook. I read PDA thoroughly and wrote a very long review that is available on GitHub. This post provides some of the highlights from that full review.
I read this book as if I were the only technical reviewer, and I was...
In this tutorial, I will describe a process for setting up a lean and robust Python data science environment on your system. By the end of the tutorial, your system will be set up such that:
This is the fourth and final part of the series “Selecting Subsets of Data in Pandas”. Pandas offers a wide variety of options for subset selection, which necessitates multiple articles. This series is broken down into the following topics.
[], .loc and .iloc
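As a quick reminder of the three selection tools named above, here is a minimal sketch on a small hypothetical DataFrame (the names, columns, and values are made up for illustration). Note that .loc slices by label, while .iloc uses integer positions with an exclusive stop, like Python lists.

```python
import pandas as pd

# Hypothetical DataFrame with a string index
df = pd.DataFrame(
    {"state": ["NY", "TX", "FL"], "age": [30, 2, 12], "height": [165, 70, 120]},
    index=["Jane", "Niko", "Aaron"],
)

# [] with a single column label selects that column as a Series
ages = df["age"]

# .loc selects by label: rows first, then columns
jane_niko = df.loc[["Jane", "Niko"], ["state", "age"]]

# .iloc selects by integer position: first two rows, every column from position 1 on
first_two = df.iloc[:2, 1:]
```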
This is part three of a four-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection, which necessitates multiple articles. This series is broken down into the following topics.
[], .loc and .iloc