Dunder Data Pandas Challenge Solutions 1-5

Nov 22, 2021

Solutions to the first five Dunder Data Pandas Challenges are presented below. Try the challenges yourself at python.dunderdata.com before reading the solutions. Video solutions for every challenge are available on YouTube. The challenges use the following two datasets.

import pandas as pd

emp = pd.read_csv('../data/employee.csv', parse_dates=['hire_date'])
emp.head(3)

movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head(3)
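The two read_csv arguments matter for later challenges: parse_dates turns hire_date into a datetime column, so it can be compared with dates in Challenge 4, and index_col moves the movie titles into the row index, so rows can be selected by title in Challenge 2. The sketch below shows both effects on small hypothetical CSV snippets standing in for employee.csv and movie.csv (the real files have more rows and columns).

```python
import io

import pandas as pd

# Hypothetical stand-in for employee.csv (only the columns used in the challenges)
emp_csv = io.StringIO(
    "dept,title,salary,hire_date\n"
    "Police,Officer,50000,2010-03-01\n"
    "Fire,Firefighter,48000,2009-07-15\n"
)
emp_demo = pd.read_csv(emp_csv, parse_dates=['hire_date'])
# parse_dates converts hire_date to datetime64 instead of leaving it as text
print(emp_demo.dtypes)

# Hypothetical stand-in for movie.csv
movie_csv = io.StringIO(
    "title,year,content_rating,duration\n"
    "Avatar,2009,PG-13,178\n"
    "Spectre,2015,PG-13,148\n"
)
movie_demo = pd.read_csv(movie_csv, index_col='title')
# index_col makes the titles row labels, so .loc can select rows by title
print(movie_demo.loc['Spectre', 'year'])  # 2015
```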

Challenge 1 (1pt)

Select the first five rows for columns title and salary.

def exercise_1(df):
    """
    Parameters
    ----------
    df: DataFrame
    
    Returns
    -------
    DataFrame
    """
    # YOUR CODE HERE
    return df[['title', 'salary']].head()

Video solution to challenge #1
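The same selection can be spelled several ways. The sketch below uses a small hypothetical DataFrame in place of employee.csv and checks that selecting columns with brackets, with an explicit .loc/.iloc pair, or after taking the rows first all give the same result.

```python
import pandas as pd

# Hypothetical toy data: the two requested columns plus one extra
df = pd.DataFrame({
    'title': [f'Title {i}' for i in range(8)],
    'salary': [40000 + 1000 * i for i in range(8)],
    'dept': ['A'] * 8,
})

a = df[['title', 'salary']].head()           # solution from the post
b = df.loc[:, ['title', 'salary']].iloc[:5]  # explicit label/position equivalent
c = df.head()[['title', 'salary']]           # rows first, then columns

print(a.equals(b) and a.equals(c))  # True
```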

Challenge 2 (1pt)

Select the year, content_rating, and duration columns for all movies between “The Lion King” and “The Little Prince”, inclusive.

def exercise_2(df):
    """
    Parameters
    ----------
    df: DataFrame
    
    Returns
    -------
    DataFrame
    """
    # YOUR CODE HERE
    cols = ['year', 'content_rating', 'duration']
    return df.loc['The Lion King':'The Little Prince', cols]

Video solution to challenge #2
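Label slicing with .loc includes both endpoints and follows the order of the index, not alphabetical order. The sketch below uses a few hypothetical movie rows, deliberately not sorted by title, and compares the label slice with a positional .iloc slice, where the stop position has to be bumped by one because .iloc excludes it.

```python
import pandas as pd

# Hypothetical movie rows, deliberately NOT in alphabetical order
movie = pd.DataFrame(
    {'year': [2009, 1994, 2013, 2015, 2016],
     'content_rating': ['PG-13', 'G', 'PG', 'PG', 'PG'],
     'duration': [178, 88, 102, 108, 108],
     'imdb_score': [7.9, 8.5, 7.5, 7.7, 8.1]},
    index=pd.Index(['Avatar', 'The Lion King', 'Frozen',
                    'The Little Prince', 'Zootopia'], name='title'),
)
cols = ['year', 'content_rating', 'duration']

# .loc label slice: both endpoints included, rows taken in index order
by_label = movie.loc['The Lion King':'The Little Prince', cols]

# Positional equivalent: .iloc excludes the stop position, hence the + 1
start = movie.index.get_loc('The Lion King')
stop = movie.index.get_loc('The Little Prince') + 1
by_position = movie.iloc[start:stop][cols]

print(by_label.index.tolist())       # ['The Lion King', 'Frozen', 'The Little Prince']
print(by_label.equals(by_position))  # True
```

Note that "Frozen" is returned even though it does not fall between the two titles alphabetically; what counts is its position in the index.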

Challenge 3 (1pt)

Select every 100th row beginning with the first row.

def exercise_3(df):
    """
    Parameters
    ----------
    df: DataFrame
    
    Returns
    -------
    DataFrame
    """
    # YOUR CODE HERE
    return df.iloc[::100]

Video solution to challenge #3
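Because .iloc works on integer positions, the step in ::100 counts rows regardless of what the index labels are. The sketch below uses a hypothetical 250-row frame whose labels are multiples of 10 and checks the slice against a boolean mask built from row positions.

```python
import numpy as np
import pandas as pd

# Hypothetical 250-row frame with non-default labels (0, 10, 20, ...)
df = pd.DataFrame({'value': np.arange(250)}, index=np.arange(250) * 10)

every_100th = df.iloc[::100]  # positions 0, 100, 200

# Equivalent boolean mask built from row positions, not labels
mask = np.arange(len(df)) % 100 == 0

print(every_100th.equals(df[mask]))  # True
print(every_100th['value'].tolist())  # [0, 100, 200]
```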

Challenge 4 (1pt)

Return the first 10 employees hired on or after March 1, 2010. Use kind="mergesort" if sorting.

def exercise_4(df):
    """
    Parameters
    ----------
    df: DataFrame
    
    Returns
    -------
    DataFrame
    """
    # YOUR CODE HERE
    # ISO date format (YYYY-MM-DD) avoids any month/day ambiguity
    return (df.query('hire_date >= "2010-03-01"')
              .sort_values('hire_date', kind='mergesort')
              .head(10))

Video solution to challenge #4
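The same solution can be written with a boolean mask and a pd.Timestamp cutoff instead of a query string. The hypothetical data below includes two employees hired on the same day to show why kind="mergesort" matters: pandas' default sort (quicksort) is not stable, while mergesort keeps tied rows in their original order, so the result is deterministic.

```python
import pandas as pd

# Hypothetical employees; Ann and Dee share a hire date
emp = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cal', 'Dee', 'Eve'],
    'hire_date': pd.to_datetime(['2011-05-02', '2009-12-31', '2010-03-01',
                                 '2011-05-02', '2010-02-28']),
})

cutoff = pd.Timestamp('2010-03-01')
result = (emp[emp['hire_date'] >= cutoff]               # >= keeps March 1 itself
          .sort_values('hire_date', kind='mergesort')  # stable: Ann stays before Dee
          .head(10))

print(result['name'].tolist())  # ['Cal', 'Ann', 'Dee']
```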

Challenge 5 (2pts)

Return the highest-paid employee in each department. Return just the dept, title, and salary columns. Use kind="mergesort" if sorting.

def exercise_5(df):
    """
    Parameters
    ----------
    df: DataFrame
    
    Returns
    -------
    DataFrame
    """
    # YOUR CODE HERE
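    cols = ['dept', 'title', 'salary']
    # mergesort is stable, so salary ties keep their original row order;
    # drop_duplicates then keeps the first (highest-paid) row per dept
    return (df[cols].sort_values('salary', ascending=False,
                                  kind='mergesort')
                    .drop_duplicates(subset=['dept']))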

Video solution to challenge #5
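After the descending sort, drop_duplicates keeps the first row it sees for each department, which is the highest-paid one. The sketch below, on hypothetical data with a salary tie in one department, compares that approach with two alternatives: groupby(...).head(1) on the same sorted frame, and groupby(...).idxmax(), which returns the index label of the first maximum in each group without sorting. The idxmax result comes back in department order rather than salary order, so the last comparison sorts both by index first.

```python
import pandas as pd

# Hypothetical employees; Sergeant and Lieutenant tie for the top Police salary
emp = pd.DataFrame({
    'dept': ['Police', 'Fire', 'Police', 'Fire', 'Police'],
    'title': ['Officer', 'Captain', 'Sergeant', 'Firefighter', 'Lieutenant'],
    'salary': [60000, 70000, 80000, 50000, 80000],
})
cols = ['dept', 'title', 'salary']
ranked = emp[cols].sort_values('salary', ascending=False, kind='mergesort')

# Approach from the post: keep the first row per dept after the stable sort
top_sorted = ranked.drop_duplicates(subset=['dept'])

# Equivalent: first row of each group in the already-sorted frame
top_grouped = ranked.groupby('dept').head(1)

# Alternative without sorting: index label of the first maximum per dept
top_idxmax = emp.loc[emp.groupby('dept')['salary'].idxmax(), cols]

print(top_sorted['title'].tolist())                             # ['Sergeant', 'Captain']
print(top_sorted.equals(top_grouped))                           # True
print(top_sorted.sort_index().equals(top_idxmax.sort_index()))  # True
```

In both tie-breaking schemes the earlier row (Sergeant) wins, because the stable sort and idxmax both favor the first occurrence.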

Lots more challenges!

New Dunder Data Python and Pandas challenges are released each weekday and cover a wide variety of topics. They will get more difficult over time. Take them for free at python.dunderdata.com.