Dunder Data Challenge #2 - Explain the 1,000x Speed Difference when taking the Mean

dunder data challenges Sep 08, 2019

Welcome to the second edition of the Dunder Data Challenge series designed to help you learn python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).

In this challenge, your goal is to explain why taking the mean of the following DataFrame is more than 1,000x faster when setting the parameter   numeric_only to True

Learn Data Science with Python

I have several online and in-person courses available on dunderdata.com to teach you Python, data science, and machine learning.

Online Courses

  • Master Data Analysis with Python — a comprehensive course with access to over 500 pages of text, 300 exercises, 13 hours of video, multiple projects, and detailed solutions
  • Exercise Python — master the fundamentals of Python with access to over 300 pages of text, 150 exercises, multiple projects and detailed solutions
  • Intro to...
Continue Reading...

Dunder Data Challenge #1 - Optimize Custom Grouping Function

dunder data challenges Sep 07, 2019

This is the first edition of the Dunder Data Challenge series designed to help you learn python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook thanks to Binder (mybinder.org).

In this challenge, your goal is to find the fastest solution while only using the Pandas library.

Become an Expert

The Challenge

The college_pop dataset contains the name, state, and population of all higher-ed institutions in the US and its territories. For each state, find the percentage of the total state population made up by the 5 largest colleges of that state. Below, you can inspect the first few rows of the...

Continue Reading...

Pandas Cookbook — Develop Powerful Routines for Exploring Real-World Datasets

pandas Jul 18, 2019

In this article, I will discuss the overall approach I took to writing Pandas Cookbook along with highlights of each chapter.

New Book — Master Data Analysis with Python

I have a new book titled Master Data Analysis with Python that is far superior to Pandas Cookbook. It contains over 300 exercises and projects to reinforce all the material and will receive continuous updates through 2020. If you are interested in Pandas Cookbook, I would strongly suggest to purchase Master Data Analysis with Python instead.

All Access Pass!

If you want to learn python, data analysis, and machine learning, then the All Access Pass! will provide you access to all my current and future material for one low price.

Pandas Cookbook Guiding Principles

I had three main guiding principles when writing the book:

  • Use of real-world datasets
  • Focus on doing data analysis
  • Writing modern, idiomatic pandas

First, I wanted you, the reader, to explore real-world datasets and not randomly...

Continue Reading...

Python for Data Analysis — A Critical Line-by-Line Review

book review pandas python Jul 09, 2019

In this post, I will offer my review of the book, Python for Data Analysis (2nd edition) by Wes McKinney. My name is Ted Petrou and I am an expert at pandas and author of the recently released Pandas Cookbook. I thoroughly read through PDA and created a very long, review that is available on github. This post provides some of the highlights from that full review.

What is a critical line-by-line review?

I read this book as if I was the only technical reviewer and I was counted on to find all the possible errors. Every single line of code was scrutinized and explored to see if a better solution existed. Having spent nearly every day of the last 18 months writing and talking about pandas, I have formed strong opinions about how it should be used. This critical examination lead to me finding fault with quite a large percentage of the code.

Review Focuses on Pandas

The main focus of PDA is on the pandas library but it does have material on basic Python, IPython...

Continue Reading...

Anaconda is bloated — Set up a lean, robust data science environment with Miniconda and Conda-Forge

Uncategorized Jul 01, 2019

In this tutorial, I will describe a process for setting up a lean and robust Python data science environment on your system. By the end of the tutorial, your system will be set up such that:

  • Python is installed with only the most common and useful packages for data science
  • Conda is installed to manage packages and environments
  • You’ll have a single, robust environment which minimizes dependency issues by relying on the conda-forge channel

Learn Data Science with Python

I have several online and in-person courses available on dunderdata.com to teach you Python, data science, and machine learning.

Online Courses

  • Master Data Analysis with Python — a comprehensive course with access to over 500 pages of text, 300 exercises, 13 hours of video, multiple projects, and detailed solutions
  • Exercise Python — master the fundamentals of Python with access to over 300 pages of text, 150 exercises, multiple projects and detailed solutions
  • Intro to Pandas...
Continue Reading...

Selecting Subsets of Data in Pandas: Part 4

pandas Jun 30, 2019

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook. All material will be contained in my Learn-Pandas Github repository.

This is the fourth and final part of the series “How to Select Subsets of Data in Pandas”. Pandas offers a wide variety of options for subset selection, which necessitates multiple articles. This series is broken down into the following topics.

  1. Selection with [].loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Learn Data Science with Python

I have several online and in-person courses available on dunderdata.com to teach you Python, data science, and machine learning.

Online Courses

  • Master Data Analysis with Python — a comprehensive course with access to over 500 pages of text, 300 exercises, 13 hours of video, multiple projects, and detailed solutions
  • ...
Continue Reading...

Selecting Subsets of Data in Pandas: Part 3

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook.

Part 3: Assigning subsets of data

This is part 3 of a 4-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection which necessitates multiple articles. This series is broken down into the following topics.

  1. Selection with [].loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Master Data Analysis with Python

Begin your journey mastering data analysis using python with my free Intro to Pandas series.

Assignment

When you see the word assign used during a discussion on programming, it usually means that a variable is set equal to some value. For most programming languages, this means using the equal sign. For instance, to assign the value 5 to the...

Continue Reading...

Selecting Subsets of Data in Pandas: Part 2

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook.

Part 2: Boolean Indexing

This is part 2 of a four-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection which necessitates multiple articles. This series is broken down into the following 4 topics.

  1. Selection with [].loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Master Data Analysis with Python

Begin your journey mastering data analysis using python with my free Intro to Pandas series.

Part 1 vs Part 2 subset selection

Part 1 of this series covered subset selection with [].loc and .iloc. All three of these indexers use either the row/column labels or their integer location to make selections. The actual data of...

Continue Reading...

Selecting Subsets of Data in Pandas: Part 1

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook.

Part 1: Selection with [ ].loc and .iloc

This is the beginning of a four-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection which necessitates multiple articles. This series is broken down into the following four topics.

  1. Selection with [].loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Master Data Analysis with Python

Begin your journey mastering data analysis using python with my free Intro to Pandas series.

Assumptions before we begin

These series of articles assume you have no knowledge of pandas, but that you understand the fundamentals of the Python programming language. It also assumes that you have installed pandas on your machine....

Continue Reading...
1 2
Close

50% Complete

Register for a free account

Upon registration, you'll get access to three free courses and the discount code to purchase the All Access Pass for 50% off through Cyber Monday (12/2)