Dunder Data Challenge #3 - Multiple Custom Grouping Aggregations

dunder data challenges Sep 09, 2019

Welcome to the third edition of the Dunder Data Challenge series designed to help you learn python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).

This challenge is going to be fairly difficult, but should answer a question that many pandas users face — What is the best way to perform a groupby that does many custom aggregations? In this context, a ‘custom...

Continue Reading...

Dunder Data Challenge #2 - Explain the 1,000x Speed Difference when taking the Mean

dunder data challenges Sep 08, 2019

Welcome to the second edition of the Dunder Data Challenge series designed to help you learn python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).

In this challenge, your goal is to explain why taking the mean of the following DataFrame is more than 1,000x faster when setting the parameter   numeric_only to True

Learn Data Science with Python

I have...

Continue Reading...

Dunder Data Challenge #1 - Optimize Custom Grouping Function

dunder data challenges Sep 07, 2019

This is the first edition of the Dunder Data Challenge series designed to help you learn python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook thanks to Binder (mybinder.org).

In this challenge, your goal is to find the fastest solution while only using the Pandas library.

Begin Mastering Data Science Now for Free!

Take my free Intro to Pandas course to begin your journey mastering data analysis with Python.

...
Continue Reading...

From pandas to scikit-learn - An Exciting New Workflow

Uncategorized Sep 05, 2019

 Scikit-Learn’s new integration with Pandas

Scikit-Learn will make one of its biggest upgrades in recent years with its mammoth version 0.20 release. For many data scientists, a typical workflow consists of using Pandas to do exploratory data analysis before moving to scikit-learn for machine learning. This new release will make the process simpler, more feature-rich, robust, and standardized.

Become an Expert

If you want to be trusted to make decisions using...

Continue Reading...

The Five-Step Process for Data Exploration in a Jupyter Notebook

pandas Aug 07, 2019

Video available

I also have a video from the Dunder Data YouTube channel where I demonstrate this entire process. I believe this is a post that is better viewed as a demonstration, so if you have the time see the video below.

Tutorial

A major pain point for beginners is writing too many lines of code in a single cell. When you are learning, you need to get feedback on every single line of code that you write and verify that it is in fact correct. Only once you have verified the...

Continue Reading...

Pandas Cookbook — Develop Powerful Routines for Exploring Real-World Datasets

pandas Jul 18, 2019

In this article, I will discuss the overall approach I took to writing Pandas Cookbook along with highlights of each chapter.

I have a new book titled Master Data Analysis with Python that is far superior to Pandas Cookbook. It contains over 300 exercises and projects to reinforce all the material and will receive continuous updates indefinitely. If you are interested in Pandas Cookbook, I would strongly suggest to purchase Master...

Continue Reading...

Python for Data Analysis — A Critical Line-by-Line Review

book review pandas python Jul 09, 2019

In this post, I will offer my review of the book, Python for Data Analysis (2nd edition) by Wes McKinney. My name is Ted Petrou and I am an expert at pandas and author of the recently released Pandas Cookbook. I thoroughly read through PDA and created a very long, review that is available on github. This post provides some of the highlights from that full review.

What is a critical line-by-line review?

I read this book as if I was the only technical reviewer and I was...

Continue Reading...

Anaconda is bloated — Set up a lean, robust data science environment with Miniconda and Conda-Forge

Uncategorized Jul 01, 2019

In this tutorial, I will describe a process for setting up a lean and robust Python data science environment on your system. By the end of the tutorial, your system will be set up such that:

  • Python is installed with only the most common and useful packages for data science
  • Conda is installed to manage packages and environments
  • You’ll have a single, robust environment which minimizes dependency issues by relying on the conda-forge channel

Begin Mastering Data Science Now...

Continue Reading...

Selecting Subsets of Data in Pandas: Part 4

pandas Jun 30, 2019

This is the fourth and final part of the series “Selecting Subsets of Data in Pandas”. Pandas offers a wide variety of options for subset selection, which necessitates multiple articles. This series is broken down into the following topics.

  1. Selection with [].loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Become an Expert

If you want to be trusted to make decisions using pandas and scikit-learn, you must become an...

Continue Reading...

Selecting Subsets of Data in Pandas: Part 3

Part 3: Assigning subsets of data

This is part three of a four-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection which necessitates multiple articles. This series is broken down into the following topics.

  1. Selection with [].loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Become an Expert

If you want to be trusted to make decisions...

Continue Reading...
Close

Register for a free account

Upon registration, you'll get access to the following free courses:

  • Python Installation
  • Intro to Jupyter Notebooks
  • Intro to Pandas
  • Python  and Pandas Challenges