Pandas Cookbook — Develop Powerful Routines for Exploring Real-World Datasets

pandas Jul 18, 2019

In this article, I will discuss the overall approach I took to writing Pandas Cookbook along with highlights of each chapter.

New Book — Master Data Analysis with Python

I have a new book titled Master Data Analysis with Python that is far superior to Pandas Cookbook. It contains over 300 exercises and projects to reinforce all the material and will receive continuous updates through 2020. If you are interested in Pandas Cookbook, I strongly suggest purchasing Master Data Analysis with Python instead.

All Access Pass!

If you want to learn Python, data analysis, and machine learning, then the All Access Pass! will give you access to all my current and future material for one low price.

Pandas Cookbook Guiding Principles

I had three main guiding principles when writing the book:

  • Use of real-world datasets
  • Focus on doing data analysis
  • Writing modern, idiomatic pandas

First, I wanted you, the reader, to explore real-world datasets and not randomly...

Continue Reading...

Python for Data Analysis — A Critical Line-by-Line Review

book review pandas python Jul 09, 2019

In this post, I will offer my review of the book Python for Data Analysis (2nd edition) by Wes McKinney, referred to below as PDA. My name is Ted Petrou and I am an expert at pandas and author of the recently released Pandas Cookbook. I thoroughly read through PDA and created a very long review that is available on GitHub. This post provides some of the highlights from that full review.

What is a critical line-by-line review?

I read this book as if I were the only technical reviewer and was counted on to find every possible error. Every single line of code was scrutinized and explored to see if a better solution existed. Having spent nearly every day of the last 18 months writing and talking about pandas, I have formed strong opinions about how it should be used. This critical examination led me to find fault with quite a large percentage of the code.

Review Focuses on Pandas

The main focus of PDA is on the pandas library but it does have material on basic Python, IPython...

Continue Reading...

Anaconda is bloated — Set up a lean, robust data science environment with Miniconda and Conda-Forge

Uncategorized Jul 01, 2019

In this tutorial, I will describe a process for setting up a lean and robust Python data science environment on your system. By the end of the tutorial, your system will be set up such that:

  • Python is installed with only the most common and useful packages for data science
  • Conda is installed to manage packages and environments
  • You’ll have a single, robust environment which minimizes dependency issues by relying on the conda-forge channel

Become an Expert

I am extraordinarily dedicated to producing the absolute best content for doing data science using Python. For all my courses and live training visit Dunder Data.

Continue Reading...

Selecting Subsets of Data in Pandas: Part 3

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook.

Part 3: Assigning subsets of data

This is part 3 of a four-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection, which necessitates multiple articles. This series is broken down into the following topics.

  1. Selection with [], .loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Assignment

When you see the word assign used during a discussion on programming, it usually means that a variable is set equal to some value. For most programming languages, this means using the equal sign. For instance, to assign the value 5 to the variable x in Python, we do the following:

>>> x = 5

This is formally called an assignment statement. More generally, we can...

Continue Reading...

Selecting Subsets of Data in Pandas: Part 2

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook.

Part 2: Boolean Indexing

This is part 2 of a four-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection which necessitates multiple articles. This series is broken down into the following 4 topics.

  1. Selection with [], .loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Part 1 vs Part 2 subset selection

Part 1 of this series covered subset selection with [], .loc and .iloc. All three of these indexers use either the row/column labels or their integer location to make selections. The actual data of the Series/DataFrame is not used at all during the selection.
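As a quick reminder of what selecting by label or by integer location looks like, here is a minimal sketch. The DataFrame, its index labels, and its column names are made up for illustration and do not come from the article's dataset.

```python
import pandas as pd

# Hypothetical data: the names, states, and ages are invented for this sketch
df = pd.DataFrame(
    {"state": ["TX", "CA", "NY"], "age": [30, 45, 27]},
    index=["Ana", "Bob", "Cy"],
)

# [] on a DataFrame selects a column by its label
ages = df["age"]

# .loc selects by row and column labels
age_by_label = df.loc["Bob", "age"]

# .iloc selects by integer location (second row, second column)
age_by_position = df.iloc[1, 1]

# .iloc with a slice: the first two rows, whatever their labels are
first_two_rows = df.iloc[0:2]
```

In every case the selection depends only on the labels or positions, never on the values stored in the DataFrame.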

In Part 2 of this series, on boolean indexing, we will select...

Continue Reading...

Selecting Subsets of Data in Pandas: Part 1

This article is available as a Jupyter Notebook complete with exercises at the bottom to practice and detailed solutions in another notebook.

Part 1: Selection with [ ], .loc and .iloc

This is the beginning of a four-part series on how to select subsets of data from a pandas DataFrame or Series. Pandas offers a wide variety of options for subset selection which necessitates multiple articles. This series is broken down into the following four topics.

  1. Selection with [], .loc and .iloc
  2. Boolean indexing
  3. Assigning subsets of data
  4. How NOT to select subsets of data

Learn More

If you’d like to learn more and support my work:

Assumptions before we begin

This series of articles assumes you have no knowledge of pandas, but that you understand the fundamentals...

Continue Reading...