Animating NBA Games with matplotlib and pandas

Oct 01, 2019

In this tutorial, Game 7 of the 2016 NBA Finals will be animated with Matplotlib, one shot at a time, within a Jupyter Notebook. This tutorial covers several topics, including:

  • Using the developer tools of a browser to discover non-public APIs
  • Using the requests library to get data into a Pandas DataFrame
  • Creating a static visual representation of an NBA court with shots from a game using Matplotlib
  • Creating a Matplotlib animation showing the description, score, time remaining, and player image for each shot taken

Full Tutorial on GitHub

Only a portion of the code is included in this post. Please visit the Matplotlib Tutorials page of the official Dunder Data GitHub organization to download the complete tutorial.

Jump to Results

Below is the video that is created at the very end of the tutorial.

Motivation: Game 7 of NBA finals — June 19, 2016

Game 7 between the Cleveland Cavaliers and the Golden State Warriors, held on June 19, 2016, was very exciting, with the outcome decided at the very end. I have seen shot-positional data before and thought it would be a fun challenge to animate the entire game using Matplotlib.

Get started by visiting the game page

The NBA has a great website at nba.com with most of its data on the stats page. Once you have navigated to the stats page, click on the scores tab and then use the calendar to select the date June 19, 2016. You should see the image below:

Shot Positional Data

The NBA tracks almost every piece of data imaginable during the game. In this tutorial, we will look at the shot positional data, which consists of the following:

  • The location of the shot on the court
  • Whether the shot was made or missed
  • The player attempting the shot
  • The time remaining in the game

Get the data

From the scores page above, click on Box Score. You will get many traditional stats for both teams.

Notice the message for Shot Chart

There is a message above the box score that tells us how to find the Shot Chart. Click the number 82 in the FGA column to get the shot chart for the Cleveland Cavaliers. All of Cleveland’s field goal attempts (shots) for the entire game are displayed in the following image.

Getting the shot chart data in a Pandas DataFrame

The following section is based on a great blog post by Greg Reda on how to reverse engineer an API.

No Web Scraping — Finding the internal API

Oftentimes, we have to scrape the raw HTML to get the data we want, but in this case the data is fetched through an internal API that the NBA maintains. As far as I know, the NBA does not document this API publicly. So, to get the data, we will take a peek at the requests being made as the page loads.

Using the developer tools

All major browsers have tools for web developers. In Google Chrome, you can open the developer tools from the main menu (the three dots in the top right corner) by selecting More Tools and then Developer Tools, or by right-clicking the page and selecting Inspect.

Find requests in Network tab

Click on the Network tab in the developer tools and select XHR below it. XHR stands for XMLHttpRequest and is used to fetch XML or JSON data.

Refresh Page

The request list will probably be blank when you first open it, as seen in the image above. Refresh the page and all the XHR requests will show up.

Select the shotchartdetail request

There are a lot of requests, but the one we are looking for is shotchartdetail. Click on it, then click on the Preview tab and expand the parameters.

All of the nba.com APIs

Eli Uriegas has put together a document containing all the possible API endpoints and their parameters. Since this is not a public API, it is subject to change, and you might need to repeat the procedure above to make sure you are making the right request.

View all the data

Go back to the Headers tab and copy and paste the entire Request URL into a new browser tab.

JSON Data

Your browser should display the response as JSON data.

Assign URL to string — For all teams

The above URL returns the shot chart data for just the Cleveland Cavaliers. If we modify it by setting the TeamID parameter to 0, we get the data for both teams. We assign this URL to a variable.

game7_url = '''http://stats.nba.com/stats/shotchartdetail?\
AheadBehind=&CFID=&CFPARAMS=&ClutchTime=\
...
&VsPlayerID4=&VsPlayerID5=&VsTeamID='''
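If you would rather not edit the copied URL by hand, the TeamID change can also be made in code. The sketch below uses only the standard library's urllib.parse. The helper name set_query_param is hypothetical, and the shortened URL is illustrative only: the real Request URL has many more parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_param(url, name, value):
    """Hypothetical helper: return url with query parameter `name` set to `value`.

    Blank parameters such as AheadBehind= are kept, because the copied
    Request URL contains many of them.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params[name] = value
    return urlunsplit(parts._replace(query=urlencode(params)))


# Illustrative, heavily shortened URL (most real parameters omitted)
cavs_url = ('http://stats.nba.com/stats/shotchartdetail?'
            'AheadBehind=&GameID=0041500407&TeamID=1610612739')
both_teams_url = set_query_param(cavs_url, 'TeamID', '0')
print(both_teams_url)
```

Keeping blank values matters here: dropping them would silently change the request the API receives.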

Get JSON data in Python with Requests library

The Requests third-party library makes it easy to get data from the web. In order to make this particular request, you will need to provide a header. It appears that providing the User-Agent is enough for our request to be accepted.

To find the header values, look back at the Network tab in the developer tools and open the Headers tab. Scroll down until you see Request Headers. You can put these values into a dictionary and pass it to the get function. Below, we only use the user agent. Find out more from this Stack Overflow post.

import requests

headers = {'User-Agent': '<your user agent>'}
# game7_url already contains all of the query parameters
r = requests.get(game7_url, headers=headers)
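Because this is not a public API, a request can hang or be rejected. A slightly more defensive sketch prepares the request first, so you can inspect the final URL and headers without sending anything, then sends it with a timeout and raises on HTTP errors. The helper names and the 30-second timeout are assumptions, not part of the original tutorial.

```python
import requests


def prepare_stats_request(url, user_agent, params=None):
    """Hypothetical helper: build, but do not send, a GET request with a User-Agent."""
    headers = {'User-Agent': user_agent}
    return requests.Request('GET', url, params=params, headers=headers).prepare()


def fetch_stats_json(url, user_agent, params=None, timeout=30):
    """Hypothetical helper: send the prepared request and return the decoded JSON."""
    prepared = prepare_stats_request(url, user_agent, params)
    with requests.Session() as session:
        response = session.send(prepared, timeout=timeout)
    # Fail loudly on 4xx/5xx instead of trying to parse an error page
    response.raise_for_status()
    return response.json()
```

For example, json_data = fetch_stats_json(game7_url, '<your user agent>') would combine fetching and calling r.json() into one step.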

Get the JSON data from the requests object into a Pandas DataFrame

Use the json method of the response object to retrieve the data. It returns a Python dictionary whose resultSets key holds a list of tables. Each table has a headers list (the column names) and a rowSet list (the rows), which we pass to the Pandas DataFrame constructor.

import pandas as pd

json_data = r.json()
result = json_data['resultSets'][0]
columns = result['headers']
data = result['rowSet']
df_shots = pd.DataFrame(data=data, columns=columns)
df_shots.head()
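The same conversion is needed again for the play-by-play data, so it may be worth wrapping in a small function. In the responses I have looked at, each entry of resultSets also carries a name key, which is safer to select on than a position. Treat that key, the helper name result_set_to_df, and the toy data below as assumptions to check against your own response.

```python
import pandas as pd


def result_set_to_df(json_data, name=None):
    """Hypothetical helper: convert one entry of json_data['resultSets'] to a DataFrame.

    With name=None the first result set is used, like
    json_data['resultSets'][0]; otherwise the entry whose 'name' matches.
    """
    result_sets = json_data['resultSets']
    if name is None:
        result = result_sets[0]
    else:
        matches = [rs for rs in result_sets if rs.get('name') == name]
        if not matches:
            raise KeyError(f'no result set named {name!r}')
        result = matches[0]
    return pd.DataFrame(data=result['rowSet'], columns=result['headers'])


# Toy response with the same shape as the real one (values are made up)
toy_json = {'resultSets': [
    {'name': 'Shot_Chart_Detail',
     'headers': ['PLAYER_ID', 'SHOT_TYPE', 'SHOT_MADE_FLAG'],
     'rowSet': [[1, '2PT Field Goal', 1], [2, '3PT Field Goal', 0]]},
]}
toy_shots = result_set_to_df(toy_json, name='Shot_Chart_Detail')
```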

Examine the types of shots

Find the count of each type of shot.

>>> df_shots['SHOT_TYPE'].value_counts()
2PT Field Goal    99
3PT Field Goal    66
Name: SHOT_TYPE, dtype: int64

Missing Free Throw Data

Unfortunately, this API does not include free throws, which means we need to make another request to a different endpoint. If we look at the free throws attempted (FTA) column in the box score, there is no link to click on, so we need to go hunting on a different page.

Play by Play

If you go back to the page that showed just the score of the game, you will see a link for Play by Play. This section has a description of every event in the game, including free throws.

Inspect this page again

By repeating the same procedure as above, we locate the desired endpoint, playbyplayv2, along with the following URL:

# each backslash must be the last character on its line so the lines join
play_by_play_url = '''https://stats.nba.com/stats/playbyplayv2?\
EndPeriod=10&EndRange=55800&GameID=0041500407\
&RangeType=2&Season=2015-16\
&SeasonType=Playoffs&StartPeriod=1\
&StartRange=0'''
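To replicate this for other games, the query string can instead be built from a dictionary and handed to requests through its params argument. In this sketch the names PBP_BASE_URL and play_by_play_params are hypothetical; the parameter names and values are exactly those in the URL above. GameID is kept as a string so its leading zeros survive.

```python
import requests

PBP_BASE_URL = 'https://stats.nba.com/stats/playbyplayv2'


def play_by_play_params(game_id, season, season_type='Playoffs'):
    """Hypothetical helper: playbyplayv2 parameters, copied from the Game 7 URL.

    game_id must be a string; an int would drop the leading zeros.
    """
    return {
        'EndPeriod': 10,
        'EndRange': 55800,
        'GameID': game_id,
        'RangeType': 2,
        'Season': season,
        'SeasonType': season_type,
        'StartPeriod': 1,
        'StartRange': 0,
    }


# Build (without sending) the request to check the final URL
pbp_params = play_by_play_params('0041500407', '2015-16')
prepared = requests.Request('GET', PBP_BASE_URL, params=pbp_params).prepare()
print(prepared.url)
```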

Use requests again to fetch data

We replicate our previous work (along with some transformations not shown) to get the play by play data in a Pandas DataFrame.

r = requests.get(play_by_play_url, headers=headers)
pbp_json_data = r.json()
...
df_pbp.head()


Finding Free Throws

This play by play data has every single recorded play during the game. By manually looking through the data, I noticed that when EVENTMSGTYPE was equal to 3, a free throw was attempted. Let’s create a new DataFrame of just free throws.

df_ft = df_pbp[df_pbp['EVENTMSGTYPE'] == 3]
df_ft.head()
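The transformations that turn df_ft into shot-like rows are not shown in this post. The sketch below is one hypothetical way to do it: it gives free throws the SHOT_TYPE, SHOT_MADE_FLAG, MINUTES_REMAINING and SECONDS_REMAINING columns that the shot chart data uses, so the two can later be concatenated. It assumes the play-by-play columns HOMEDESCRIPTION, VISITORDESCRIPTION, PCTIMESTRING ('MM:SS'), PLAYER1_ID and PLAYER1_TEAM_ID, and that missed free throws contain the word MISS in their description. Check these against df_pbp.columns and a few rows before relying on it.

```python
import pandas as pd


def free_throws_to_shots(df_pbp):
    """Hypothetical reshaping of play-by-play free throws into shot-style rows."""
    df_ft = df_pbp[df_pbp['EVENTMSGTYPE'] == 3]
    # Assumed: only one of the two description columns is filled for a play
    desc = df_ft['HOMEDESCRIPTION'].fillna(df_ft['VISITORDESCRIPTION'])
    # Assumed: the game clock is a 'MM:SS' string
    clock = df_ft['PCTIMESTRING'].str.split(':', expand=True).astype(int)
    return pd.DataFrame({
        'PERIOD': df_ft['PERIOD'],
        'MINUTES_REMAINING': clock[0],
        'SECONDS_REMAINING': clock[1],
        'PLAYER_ID': df_ft['PLAYER1_ID'],
        'TEAM_ID': df_ft['PLAYER1_TEAM_ID'],
        'SHOT_TYPE': 'Free Throw',
        # Assumed: misses are marked with 'MISS' in the description
        'SHOT_MADE_FLAG': (~desc.str.contains('MISS', na=False)).astype(int),
        'DESCRIPTION': desc,
    })

# Usage: df_ft2 = free_throws_to_shots(df_pbp)
# Court location and home/visitor columns would still need to be added.
```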

Concatenate shot and free throw data

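After some transformations not shown here (which produce df_shots2 and df_ft2), we combine the field goals and free throws into a single DataFrame.

>>> df_shots_all = pd.concat([df_shots2, df_ft2], sort=False,
...                          ignore_index=True)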
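Concatenation simply stacks the free throws after the field goals, but the animation steps through the rows in order, so the combined data needs to be in game order. A minimal sketch, assuming every row has PERIOD, MINUTES_REMAINING and SECONDS_REMAINING; since the game clock counts down, the last two are sorted in descending order. The helper name is hypothetical.

```python
import pandas as pd


def sort_by_game_clock(df):
    """Hypothetical helper: order rows by period, then by the clock counting down."""
    return (df.sort_values(['PERIOD', 'MINUTES_REMAINING', 'SECONDS_REMAINING'],
                           ascending=[True, False, False])
              .reset_index(drop=True))
```

Resetting the index keeps positional lookups such as iloc in step with the new order.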

Add a column for points

We can map the shot type to points with a dictionary and create a new column. We multiply by the shot made flag to only track points if the shot was made.

>>> points = {'Free Throw': 1,
...           '2PT Field Goal': 2,
...           '3PT Field Goal': 3}
>>> # multiplying by SHOT_MADE_FLAG (1 or 0) zeroes out missed shots
>>> df_shots_all['POINTS'] = (df_shots_all['SHOT_TYPE'].map(points)
...                           * df_shots_all['SHOT_MADE_FLAG'])
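To see the arithmetic on a small scale, here is the same mapping applied to a toy DataFrame with made-up values, followed by the per-team sum used in the sanity check. Made shots keep their point value and misses become 0.

```python
import pandas as pd

points = {'Free Throw': 1, '2PT Field Goal': 2, '3PT Field Goal': 3}

# Made-up shots for illustration
toy = pd.DataFrame({
    'TEAM_NAME': ['Home', 'Home', 'Away', 'Away'],
    'SHOT_TYPE': ['3PT Field Goal', 'Free Throw', '2PT Field Goal', '3PT Field Goal'],
    'SHOT_MADE_FLAG': [1, 1, 1, 0],
})
# Point value of the attempt times 1 (made) or 0 (missed)
toy['POINTS'] = toy['SHOT_TYPE'].map(points) * toy['SHOT_MADE_FLAG']
totals = toy.groupby('TEAM_NAME')['POINTS'].sum()
print(toy['POINTS'].tolist())  # [3, 1, 2, 0]
print(totals.tolist())         # [2, 4]  (Away, Home)
```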

Does the data make sense?

Let’s do a sanity check and calculate the total points scored by each team.

>>> df_shots_all.groupby('TEAM_NAME').agg({'POINTS': 'sum'})

All good: these totals match the final score of the game.

Visualization

Finally, let’s work on visualizing our data. The columns LOC_X and LOC_Y contain the point on the court where the shot was taken.

Use Pandas to plot

>>> df_shots_all.plot('LOC_X', 'LOC_Y', kind='scatter')

Examining Shot Locations

Looking at the ranges of the LOC_X and LOC_Y columns, it seems apparent that the units are tenths of a foot (feet multiplied by 10), since a basketball court is 94 ft long by 50 ft wide. LOC_X runs across the width of the court, and LOC_Y runs along its length.

There are negative values for LOC_X, and more investigation shows that both columns are measured from the basket, which is located at LOC_X = LOC_Y = 0. The basket is about 4 feet from the baseline (LOC_Y of -40), which means that the maximum value for LOC_Y, at the far baseline, would be 940 - 40 = 900.

Transforming Visitor Shot Locations

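The same values for LOC_X and LOC_Y are used regardless of the team. To animate a game, we need to move the shots of one team to the other side of the court. Here df_shots_all2 is the combined data after some further transformations (not shown), such as adding a HOME_TEAM flag. We first shift LOC_Y by 40 so that x is measured from the left baseline. For the visiting team, we then subtract that x-location from 940, the length of the court (equivalently, we subtract LOC_Y from 900), which mirrors the shot across midcourt. Note that we also swap the axes so that our court is drawn wide rather than long.

import numpy as np

is_home = df_shots_all2['HOME_TEAM'] == 1
x = df_shots_all2['LOC_Y'] + 40        # distance from the left baseline
x = np.where(is_home, x, 940 - x)      # mirror visitor shots to the right half
df_shots_all2['LOC_X_NEW'] = x
df_shots_all2['LOC_Y_NEW'] = df_shots_all2['LOC_X']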
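A quick way to check the mapping is to push a few known points through it: the home basket (LOC_X = LOC_Y = 0) should land at the left backboard, x = 40; the visitor basket at the right backboard, x = 900; and a visitor shot from the far baseline (LOC_Y = -40) at the right edge, x = 940. Because x already includes the 40-unit shift, the visitor mirror has to subtract from the full court length of 940 for these to hold. The helper name to_court_coords is hypothetical.

```python
import numpy as np
import pandas as pd


def to_court_coords(df):
    """Hypothetical helper: new x/y columns on a 940 x 500 court (tenths of a foot).

    The home team shoots at the left basket and the visitor at the right one.
    """
    is_home = df['HOME_TEAM'] == 1
    x = df['LOC_Y'] + 40                   # distance from the left baseline
    x = np.where(is_home, x, 940 - x)      # mirror visitor shots across midcourt
    return pd.DataFrame({'LOC_X_NEW': x, 'LOC_Y_NEW': df['LOC_X']}, index=df.index)
```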

Build Static Visualization — Create Court

A great tutorial on creating an NBA court with shots can be viewed here. Matplotlib lines and patches are used to create the court.

import matplotlib.pyplot as plt
from matplotlib.patches import Arc, Circle


def create_court():
    # Set up the figure; full_home_team and full_visitor_team are
    # team-name strings defined earlier in the full notebook
    fig = plt.figure(figsize=(16, 8))
    ax = fig.add_axes([.2, .1, .6, .8], frame_on=False,
                      xticks=[], yticks=[])

    # Draw the borders of the court (units are tenths of a foot)
    ax.set_xlim(-20, 960)
    ax.vlines([0, 940], -250, 250)
    ax.hlines([-250, 250], 0, 940)
    # Outer and inner lane lines, free throw lines, midcourt line, backboards
    ax.hlines([-80, -80, 80, 80], [0, 750] * 2, [190, 940] * 2)
    ax.hlines([-60, -60, 60, 60], [0, 750] * 2, [190, 940] * 2)
    ax.vlines([190, 750], -80, 80)
    ax.vlines(470, -250, 250)
    ax.vlines([40, 900], -30, 30)

    # Add the free throw circles, three point lines and arcs,
    # rims and midcourt circle
    ax.add_patch(Arc((190, 0), 120, 120, theta1=-90, theta2=90))
    ax.add_patch(Arc((190, 0), 120, 120, theta1=90, theta2=-90))
    ax.add_patch(Arc((750, 0), 120, 120, theta1=90, theta2=-90))
    ax.add_patch(Arc((750, 0), 120, 120, theta1=-90, theta2=90))
    ax.hlines([-220, -220, 220, 220], [0, 800] * 2, [140, 940] * 2)
    ax.add_patch(Arc((47.5, 0), 475, 475, theta1=-67.5, theta2=67.5))
    ax.add_patch(Arc((892.5, 0), 475, 475, theta1=112.5, theta2=-112.5))
    ax.add_patch(Arc((47.5, 0), 15, 15, theta1=0, theta2=360))
    ax.add_patch(Arc((892.5, 0), 15, 15, theta1=0, theta2=360))
    ax.add_patch(Circle((470, 0), 60, facecolor='none', edgecolor='black', lw=2))

    # Text for score, time and description
    ax.text(20, 270, f"{full_home_team} 0",
            fontsize=16, fontweight='bold', label='home')
    ax.text(680, 270, f"{full_visitor_team} 0",
            fontsize=16, fontweight='bold', label='visitor')
    ax.text(0, -270, "Q:1 12:00", fontsize=14, label='time')
    ax.text(200, -270, "", fontsize=14, label='description')
    return fig, ax


fig, ax = create_court()

Add Shot Data

A Matplotlib scatterplot displays each team’s shots in a different color; filled-in circles are shots that were made. When the data parameter is given, Matplotlib lets us use strings to refer to column names. To take advantage of this, we create two new columns, FACECOLOR and EDGECOLOR, and simply use their names for the face and edge color parameters.

fig, ax = create_court()
missed = df_shots_all2['SHOT_MADE_FLAG'] == 0
# home team shots are blue, visitor shots are red
edgecolor = df_shots_all2['HOME_TEAM'].map({0: 'r', 1: 'b'})
facecolor = edgecolor.copy()
facecolor[missed] = 'none'   # hollow markers for missed shots
df_shots_all2['FACECOLOR'] = facecolor
df_shots_all2['EDGECOLOR'] = edgecolor
ax.scatter('LOC_X_NEW', 'LOC_Y_NEW', marker='o', s=120,
           facecolors='FACECOLOR', edgecolors='EDGECOLOR',
           lw=2, data=df_shots_all2)

Player Images

We can add a player’s image to the visualization every time that player takes a shot. By clicking on individual player images on nba.com, I discovered the URL pattern for the correct player image for a given year. It requires the team ID, year, and player ID, so we first find each unique team and player combination.

unique_players = (df_shots_all2[['TEAM_ID', 'PLAYER_ID']]
                  .drop_duplicates())
unique_players.head()
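The image URL pattern itself is not reproduced in this post, so the sketch below takes the loading step as an input. It builds the nested dictionary player_image_data[team_id][player_id] that the animation code reads, calling a loader function for each unique team and player. download_image shows one possible loader: it fetches the bytes with requests and decodes them with plt.imread from an in-memory buffer, because recent Matplotlib versions no longer accept URLs in imread. Both helper names are hypothetical, and format='png' assumes PNG images.

```python
from collections import defaultdict
from io import BytesIO

import matplotlib.pyplot as plt
import requests


def download_image(url, headers=None, timeout=30):
    """Hypothetical loader: download an image and decode it into a NumPy array."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return plt.imread(BytesIO(response.content), format='png')


def build_player_image_data(unique_players, load_image):
    """Hypothetical helper: map team_id -> player_id -> image array."""
    player_image_data = defaultdict(dict)
    pairs = unique_players[['TEAM_ID', 'PLAYER_ID']].itertuples(index=False)
    for team_id, player_id in pairs:
        player_image_data[team_id][player_id] = load_image(team_id, player_id)
    return dict(player_image_data)


# Usage, where image_url(team_id, player_id) is a hypothetical function you
# write from the URL pattern (and year) found on nba.com:
# player_image_data = build_player_image_data(
#     unique_players,
#     lambda team_id, player_id: download_image(image_url(team_id, player_id)))
```

Passing the loader in keeps the dictionary-building logic testable without any network access.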

Get image array for each player

We use Matplotlib’s imread function to convert each image into a NumPy array of RGBA values and store the arrays in a dictionary. A new Matplotlib Axes object is created for each team to display its players. The image below shows the result after setting a player image for each team.
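One way to set up those two Axes is sketched below. The rectangles are hypothetical: create_court() places the court at [.2, .1, .6, .8], which leaves the outer 20% of the figure free on each side. Each Axes starts with a fully transparent placeholder image, and the animation later swaps in real photos with set_data. The placeholder copies the height and width of a sample photo because imshow fixes the Axes limits from the first image it is given; this assumes all player photos share one size.

```python
import numpy as np


def add_player_axes(fig, sample_image):
    """Hypothetical helper: add an image Axes on each side of the court.

    Returns the AxesImage objects (im_left, im_right) that receive set_data calls.
    """
    # The court Axes spans x from .2 to .8, so use the side margins
    ax_left = fig.add_axes([0, .55, .2, .3], frame_on=False, xticks=[], yticks=[])
    ax_right = fig.add_axes([.8, .55, .2, .3], frame_on=False, xticks=[], yticks=[])
    # Fully transparent RGBA placeholder with the photo's height and width
    placeholder = np.zeros(sample_image.shape[:2] + (4,))
    im_left = ax_left.imshow(placeholder)
    im_right = ax_right.imshow(placeholder)
    return im_left, im_right

# Usage: im_left, im_right = add_player_axes(fig, some_player_image)
```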

Animation

To create animations in Matplotlib inside a Jupyter notebook, two things must happen. First, the magic command %matplotlib notebook must be run. This changes Matplotlib’s backend to allow for interactive plots. Second, the FuncAnimation class must be imported from matplotlib.animation; it repeatedly calls a user-defined function to update the objects in the plot.

Provide a function to create a new plot for every frame

The main parameter passed to FuncAnimation is a function that gets called once for each frame and is used to update the figure. The function update is created to do the following:

  • Remove the old player image data and replace it with the current player’s image
  • Plot a single new shot as a one-point scatter plot
  • Update the score and time remaining
  • Display the play by play description

from matplotlib.animation import FuncAnimation


def update(frame_number):
    # remove old player image data
    im_left.set_data([[]])
    im_right.set_data([[]])
    # frame 0 shows the empty court; frame n shows the nth shot
    if frame_number == 0:
        return
    # get next shot data
    current_row = df_shots_all2.iloc[frame_number - 1]
    # plot shot as a one-point scatter plot
    ax.scatter('LOC_X_NEW', 'LOC_Y_NEW', marker='o', s=120,
               facecolors='FACECOLOR', edgecolors='EDGECOLOR',
               lw=2, data=current_row)
    # update scores
    team_type = 'home' if current_row['HOME_TEAM'] == 1 else 'visitor'
    scores[team_type] += current_row['POINTS']
    texts['home'].set_text(f"{full_home_team} {scores['home']}")
    texts['visitor'].set_text(f"{full_visitor_team} {scores['visitor']}")
    # update time remaining
    per = current_row['PERIOD']
    mr = current_row['MINUTES_REMAINING']
    sr = current_row['SECONDS_REMAINING']
    texts['time'].set_text(f"Q:{per} {mr:02d}:{sr:02d}")
    texts['description'].set_text(current_row['DESCRIPTION'])

    # get player image data and show it on the shooting team's side
    team_id = current_row['TEAM_ID']
    player_id = current_row['PLAYER_ID']
    image_array = player_image_data[team_id][player_id]
    if team_type == 'home':
        im_left.set_data(image_array)
    else:
        im_right.set_data(image_array)


animation = FuncAnimation(fig, func=update,
                          frames=len(df_shots_all2) + 1,
                          init_func=init, interval=10, repeat=False)
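The update function relies on a few names the post does not define: texts, scores, and the init function passed as init_func. A minimal sketch of how they could be created, assuming the Text objects carry the labels 'home', 'visitor', 'time' and 'description' given to them in create_court(). The factory make_init is hypothetical and would be called before FuncAnimation is created. Resetting the scores inside init keeps them from being counted twice if FuncAnimation calls init_func again, for example when the animation is saved.

```python
def make_init(ax, im_left, im_right, full_home_team, full_visitor_team):
    """Hypothetical setup for the texts and scores that update() uses.

    Returns (init, texts, scores); init resets everything to the tip-off state.
    """
    # Find the Text objects by the labels given to them in create_court()
    texts = {text.get_label(): text for text in ax.texts}
    # Running totals that update() adds each shot's POINTS to
    scores = {'home': 0, 'visitor': 0}

    def init():
        scores['home'] = 0
        scores['visitor'] = 0
        texts['home'].set_text(f"{full_home_team} 0")
        texts['visitor'].set_text(f"{full_visitor_team} 0")
        texts['time'].set_text("Q:1 12:00")
        texts['description'].set_text("")
        # Clear both player images
        im_left.set_data([[]])
        im_right.set_data([[]])

    return init, texts, scores


# Usage, before creating the FuncAnimation:
# init, texts, scores = make_init(ax, im_left, im_right,
#                                 full_home_team, full_visitor_team)
```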

Saving the animation

Notice that the result was saved to the animation variable. To save the animation to a file, we call its save method, which needs a file name and, optionally, a MovieWriter instance. If no writer is provided, it defaults to the writer named by the 'animation.writer' key of the rcParams dictionary.

# my default movie writer
>>> plt.rcParams['animation.writer']
'ffmpeg'

Matplotlib can show you a list of all the available writers: import its writers registry and call its list method.

>>> from matplotlib.animation import writers
>>> writers.list()
['pillow', 'ffmpeg', 'ffmpeg_file', 'imagemagick', 'imagemagick_file', 'html']

We can then import one of the writer classes directly, create an instance of it and pass it to the save method. Here we choose to create our movie with 3 frames every 4 seconds.

>>> from matplotlib.animation import FFMpegWriter
>>> writer = FFMpegWriter(fps=.75)
>>> animation.save('nba_game_animation.mp4', writer=writer)
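FFMpegWriter only works if the ffmpeg program is installed. If it might not be, one option is to check with the writer class’s isAvailable method and fall back to PillowWriter, which writes an animated GIF instead of an MP4. The helper name choose_writer and the GIF fallback are assumptions, not part of the original tutorial.

```python
from matplotlib.animation import FFMpegWriter, PillowWriter


def choose_writer(basename, fps=0.75):
    """Hypothetical helper: (writer, filename), MP4 via ffmpeg if installed, else GIF."""
    if FFMpegWriter.isAvailable():
        return FFMpegWriter(fps=fps), f'{basename}.mp4'
    # Pillow ships with Matplotlib, so this fallback should always work
    return PillowWriter(fps=fps), f'{basename}.gif'


# Usage with the animation created above:
# writer, filename = choose_writer('nba_game_animation')
# animation.save(filename, writer=writer)
```

Keeping fps=0.75 preserves the pace chosen above, 3 frames every 4 seconds, whichever writer is used.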

Embedding video into a Jupyter Notebook

After successfully saving the file, you can embed it directly in the notebook with the HTML video tag. We use HTML from the IPython.display module to do the actual embedding.

from IPython.display import HTML
HTML("""<video width="600" height="450" controls>
<source src="nba_game_animation.mp4" type="video/mp4">
</video>""")

Replication

The actual notebook used for this tutorial contains quite a bit more code than appears in this post. You should be able to replicate this with any other NBA game.

Summary

This tutorial covered a wide range of topics including:

  • Using the browser developer tools to uncover non-public APIs
  • Using the requests library to get JSON data into a Pandas DataFrame
  • Building a static representation of the court and shots with patches, lines, and a scatterplot
  • Adding two more Axes objects to our figure to hold images
  • Using FuncAnimation with a custom function that updates the figure each frame
  • Saving the animation to a file with a movie writer
