Sep 08, 2019

Welcome to the second edition of the Dunder Data Challenge series designed to help you learn Python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).

In this challenge, your goal is to explain why taking the mean of the following DataFrame is more than 1,000x faster when setting the parameter `numeric_only` to `True`.
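As a minimal sketch of the call in question, the snippet below builds a small stand-in DataFrame (hypothetical, since the challenge's actual data is not reproduced here) with a few numeric columns and one string column, then takes the mean with `numeric_only=True`. How the default call treats the string column depends on your pandas version, so only the explicit form is shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the challenge uses its own DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 3)), columns=["a", "b", "c"])
df["label"] = "x"  # a non-numeric (object) column

# Restrict the reduction to numeric columns only.
means = df.mean(numeric_only=True)
print(means)
```

In a notebook, wrapping the call in `%timeit` is one way to measure the difference for yourself.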

I have several online and in-person courses available on dunderdata.com to teach you Python, data science, and machine learning.

- Master Data Analysis with Python — a comprehensive course with access to over 500 pages of text, 300 exercises, 13 hours of video, multiple projects, and detailed solutions
- Exercise Python — master the fundamentals of Python with access to over 300 pages of text, 150 exercises, multiple projects and detailed solutions
- Intro to...

Sep 07, 2019

This is the first edition of the Dunder Data Challenge series designed to help you learn Python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook thanks to Binder (mybinder.org).

In this challenge, your goal is to find the fastest solution while only using the Pandas library.

- My book Master Data Analysis with Python is the most comprehensive text on the market to learn data analysis using Python and comes with 300+ exercises and projects.
- Sign up for the **FREE** Intro to Pandas class
- Follow me on Twitter @TedPetrou for my daily data science tricks

The `college_pop` dataset contains the name, state, and population of all higher-ed institutions in the US and its territories. For each state, find the percentage of the total state population made up by the 5 largest colleges of that state. Below, you can inspect the first few rows of the...
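To make the task concrete, here is one possible pandas-only approach on a toy stand-in for `college_pop`. The column names (`name`, `state`, `population`) and the data are assumptions for illustration, and this is a readable baseline rather than a claim about the fastest solution.

```python
import pandas as pd

# Toy stand-in; the real dataset and its column names may differ.
college_pop = pd.DataFrame({
    "name": ["A1", "A2", "A3", "A4", "A5", "A6", "B1", "B2"],
    "state": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "population": [100, 90, 80, 70, 60, 50, 30, 20],
})

# Keep the 5 largest colleges within each state.
top5 = (
    college_pop.sort_values("population", ascending=False)
    .groupby("state")
    .head(5)
)

# Divide each state's top-5 total by its overall total.
top5_total = top5.groupby("state")["population"].sum()
state_total = college_pop.groupby("state")["population"].sum()
pct_top5 = top5_total / state_total * 100
print(pct_top5)
```

States with fewer than five colleges simply contribute all of their rows, so their percentage is 100.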

Jul 18, 2019

In this article, I will discuss the overall approach I took to writing Pandas Cookbook along with highlights of each chapter.

I have a new book titled Master Data Analysis with Python that is far superior to Pandas Cookbook. It contains over 300 exercises and projects to reinforce all the material and will receive continuous updates through 2020. If you are interested in Pandas Cookbook, I would strongly suggest purchasing Master Data Analysis with Python instead.

If you want to learn Python, data analysis, and machine learning, then the All Access Pass! will give you access to all my current and future material for one low price.

I had three main guiding principles when writing the book:

- Use of real-world datasets
- Focus on doing data analysis
- Writing modern, idiomatic pandas

First, I wanted you, the reader, to explore real-world datasets and not randomly...

Jul 09, 2019

In this post, I will offer my review of the book Python for Data Analysis (2nd edition) by Wes McKinney. My name is Ted Petrou and I am an expert at pandas and author of the recently released Pandas Cookbook. I thoroughly read through PDA and created a very long review that is available on GitHub. This post provides some of the highlights from that full review.

I read this book as if I were the only technical reviewer and I was counted on to find all the possible errors. Every single line of code was scrutinized and explored to see if a better solution existed. Having spent nearly every day of the last 18 months writing and talking about pandas, I have formed strong opinions about how it should be used. This critical examination led to me finding fault with quite a large percentage of the code.

The main focus of PDA is on the pandas library but it does have material on basic Python, IPython...

Jul 01, 2019

In this tutorial, I will describe a process for setting up a lean and robust Python data science environment on your system. By the end of the tutorial, your system will be set up such that:

- Python is installed with only the most common and useful packages for data science
- Conda is installed to manage packages and environments
- You’ll have a single, robust environment which minimizes dependency issues by relying on the conda-forge channel

Jun 30, 2019

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook. All material will be contained in my Learn-Pandas GitHub repository.

This is the fourth and final part of the series “How to Select Subsets of Data in Pandas”. Pandas offers a wide variety of options for subset selection, which necessitates multiple articles. This series is broken down into the following topics.

- Selection with `[]`, `.loc` and `.iloc`

- Boolean indexing
- Assigning subsets of data
- How NOT to select subsets of data

Jun 30, 2019

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook.

This is part 3 of a 4-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection which necessitates multiple articles. This series is broken down into the following topics.

- Selection with `[]`, `.loc` and `.iloc`

- Boolean indexing
- Assigning subsets of data
- How NOT to select subsets of data

Begin your journey mastering data analysis using Python with my free Intro to Pandas series.

When you see the word **assign** used during a discussion on programming, it usually means that a variable is set equal to some value. For most programming languages, this means using the equal sign. For instance, to assign the value 5 to the...
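As a brief sketch of what assignment looks like in practice, the snippet below first assigns a plain variable and then assigns new values to subsets of a small DataFrame. The DataFrame, its labels, and its values are made up for illustration.

```python
import pandas as pd

x = 5  # ordinary assignment: the name x now refers to the value 5

# Hypothetical example data.
df = pd.DataFrame(
    {"age": [30, 25, 41], "score": [1.0, 2.0, 3.0]},
    index=["ann", "bob", "cy"],
)

# Assign a new value to a single cell, selected by row and column label.
df.loc["bob", "age"] = 26

# Assign one value to every row that satisfies a boolean condition.
df.loc[df["age"] > 28, "score"] = 0.0
print(df)
```

Doing the selection and the assignment in a single `.loc` call keeps the operation on the original DataFrame rather than on a temporary copy.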

Jun 30, 2019

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook.

This is part 2 of a four-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection which necessitates multiple articles. This series is broken down into the following 4 topics.

- Selection with `[]`, `.loc` and `.iloc`

- Boolean indexing
- Assigning subsets of data
- How NOT to select subsets of data

Part 1 of this series covered subset selection with `[]`, `.loc` and `.iloc`. All three of these **indexers** use either the row/column labels or their integer location to make selections. The actual **data** of...
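For readers who have not seen Part 1, here is a short sketch of the three indexers on a made-up DataFrame (the labels and values are hypothetical). It shows selection by label with `[]` and `.loc`, and by integer location with `.iloc`.

```python
import pandas as pd

# Hypothetical example data with string row labels.
df = pd.DataFrame(
    {"age": [30, 25, 41], "city": ["Austin", "Boston", "Denver"]},
    index=["ann", "bob", "cy"],
)

city_col = df["city"]            # [] selects a column by its label
bob_age = df.loc["bob", "age"]   # .loc selects by row and column labels
first_city = df.iloc[0, 1]       # .iloc selects by integer location

print(city_col.tolist(), bob_age, first_city)
```

None of these selections look at the values stored in the DataFrame, only at labels or positions.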

Jun 05, 2019

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook.

`[ ]`, `.loc` and `.iloc`

This is the beginning of a four-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection which necessitates multiple articles. This series is broken down into the following four topics.

- Selection with `[]`, `.loc` and `.iloc`

- Boolean indexing
- Assigning subsets of data
- How NOT to select subsets of data

This series of articles assumes you have no knowledge of pandas, but that you understand the fundamentals of the Python programming language. It also assumes that you have installed pandas on your machine....
