Lab 2: The NBA Bubble

OIDD 245 Tambe

1. Overview

A. NBA Analytics

Compared with other sports, such as baseball, it has been difficult to “quantify” and study what happens during basketball games. Superior instrumentation, such as digital tracking of the basketballs used during games, has made this easier, and many NBA coaches now have access to a wealth of data which can be used to study, analyze, and improve what happens on the court. As a result, the NBA has gone all-in on analytics. In one interesting example of how analytics is changing coaching strategy, the Wall Street Journal profiled a high school team in Minnesota that "never takes a bad shot".

B. What do managers want?

Like all managers, NBA general managers want value: the best performance-to-dollar ratio. An example of using data to improve this ratio that some of you are probably familiar with is Moneyball (made famous by a Michael Lewis book and, later, by a movie starring Brad Pitt and Jonah Hill), in which the budget-constrained Oakland A’s, led by GM Billy Beane, used statistics to perform better than their financial resources would have predicted.

For example, the following graph illustrates both a) how important a factor a Major League Baseball team’s budget is for predicting team performance, and b) how far beyond expectations the Oakland A’s were able to perform given their limited budget. These types of analyses inform the question of where hidden value can be found for professional sports teams.

C. Data analytics and organizational change

As a side note, sports analytics is a good place to think about one of the biggest impediments to data-driven decision making in modern business, which is organizational change. In organizations moving towards data collection and analysis, there is a “battle” being waged between the “quants” and those who make decisions based on intuition. This has historically been true of marketing and finance (e.g. quant vs discretionary trading) and is now becoming true in domains such as HR, sports, entertainment, and so on.

In these industries, many senior decision-makers remain skeptical of the value of data for informing decisions (although they are growing fewer in number each year), partly because data-driven processes can be flawed, and partly because such processes threaten their power within the organization. This tension plays out in one of my favorite scenes from Moneyball.

Charles Barkley, a famous and outspoken retired NBA player, captures the sentiments of many in his quote below. (More background on the Barkley interview, along with a response from Mark Cuban, is available online.)

Analytics don't work at all. It's just some crap that people who were really smart made up to try to get in the game because they had no talent. Because they had no talent to be able to play, so smart guys wanted to fit in, so they made up a term called analytics. Analytics don't work.

What analytics did the Miami Heat have? What analytics did the Chicago Bulls have? What analytics do the Spurs have? They have the best players, coaching staffs who make players better. Like I say, the Rockets sucked for a long time. So, they went out and paid James Harden a lot of money. Then they went out and got Dwight Howard, they got better. They had Chandler Parsons, this year they got Ariza. The NBA is about talent.

All these guys who run these organizations who talk about analytics, they have one thing in common: They're a bunch of guys who ain't never played the game, they never got the girls in high school, and they just want to get in the game.

-- Charles Barkley, 2015

As a generation of workers entering an increasingly data-driven economy, it is important to be aware of the kinds of turf battles around data-driven decision making happening in industries from healthcare to strategy to finance.

2. Objective: Who benefited from the NBA Bubble?

For this assignment, you are asked to prepare an analysis of how player performance changed in the NBA “bubble” (more information on this below). It is hard to summarize all of the value a basketball player provides to a team in one performance number, but people try. For this assignment, we will use the plus-minus statistic as a metric. A higher number indicates better performance. Feel free to explore the details of this number if you are interested, but it is not necessary for this lab.
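For those who are curious, the sketch below shows how a raw plus-minus number is commonly computed: the team's points minus the opponent's points, summed over the stretches a player was on the court. The stint data and the function name `plus_minus` are hypothetical, and the data source you use for this lab may report an adjusted variant rather than this raw version.

```python
def plus_minus(stints):
    """Sum (team points - opponent points) over a player's on-court stints.

    `stints` is a list of (team_points, opponent_points) pairs, one pair per
    stretch of play during which the player was on the court.
    """
    return sum(team - opponent for team, opponent in stints)


# Hypothetical game: three stints on the court.
stints = [(12, 8), (20, 25), (15, 10)]
game_plus_minus = plus_minus(stints)
print(game_plus_minus)  # (12-8) + (20-25) + (15-10) = 4
```

A positive value means the player's team outscored the opponent while that player was on the court.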

Unlike the last assignment, this one requires you to collect the player data yourself. The lab is a fairly complete, end-to-end data project that takes you through the entire data pipeline. It requires you to:

  1. Collect data from online sources
  2. Merge data sources using Tableau Prep
  3. Prep data using Tableau Prep, Excel, or another tool
  4. Use Tableau Desktop to create visual summaries of the data

The specific deliverables are described in detail further below.

3. Data context: What was the NBA Bubble?

From Wikipedia:

The 2020 NBA Bubble, also referred to as the Disney Bubble or Orlando Bubble, was the isolation zone at Walt Disney World in Bay Lake, Florida, near Orlando, that was created by the National Basketball Association (NBA) to protect its players from the COVID-19 pandemic during the final eight games of the 2019–20 regular season and throughout the 2020 NBA playoffs. Twenty-two of the thirty NBA teams were invited to participate, with games being held behind closed doors at the ESPN Wide World of Sports Complex and the teams staying at Disney World hotels.

The bubble was a $190 million investment by the NBA to protect its 2019–20 season, which was initially suspended by the pandemic on March 11, 2020. The bubble recouped an estimated $1.5 billion in revenue. In June, the NBA approved the plan to resume the season at Disney World, inviting the 22 teams that were within six games of a playoff spot when the season was suspended. Although initially receiving a mixed reaction from players and coaches, the teams worked together to use the bubble as a platform for the Black Lives Matter movement.

The NBA Bubble can be viewed as a natural experiment that changed some aspects of the game while leaving others the same. For example, it removed home-court advantage, eliminated the physical burden of traveling to other cities before a game, and restricted players’ movements and access to family and friends while in the bubble. As such, some players may have thrived more under these new conditions than others.

4. Deliverables

You are encouraged to work with others, but must submit your own assignment and list the names of anyone you work with. Through Canvas, submit a .pdf document that includes visualizations or charts satisfying the following four objectives:

A. Salary and performance before the shutdown

Create a scatterplot that plots 2019-2020 salary against players’ plus-minus for the 2019-2020 regular season games that took place before teams entered the bubble.

B. Regular season performance vs bubble performance

Create a scatterplot that would allow a coach or manager to visualize, for NBA players in the data, their plus-minus statistic for games played before the shutdown along one axis and the plus-minus statistic for games played in the bubble on the other axis.

C. Player age and bubble performance

Given the conditions in the bubble – restricted movement, common food services, separation from family – one could argue that younger players would benefit more from the “dorm-like” atmosphere in the bubble. Alternatively, eliminating the physical toll associated with travel and away games may have benefited older players more.

5. Steps

This lab consists of several steps, so it pays to stay organized and to work step by step. The tools we have used in class to this point are sufficient to complete the assignment. The steps to follow are outlined below.

Step 1: Collect the data.

Some web scraping tools work better than others for different websites. You can choose which you like, and if one of them does not work for the data you are trying to retrieve, try another! Wherever possible, start by trying to just cut and paste the relevant data because it is fast and often works. For this lab, you will need data on players and their performance statistics (outside and inside the bubble), salaries, ages, and free agency status. Create .csv files from the following data sources:

Step 2: Prep and merge the data

After extracting the data files, you will need to clean them and bring them together. You can use Tableau Prep to do this (or another tool, like OpenRefine, if you prefer). The key things to focus on are removing unnecessary data from each file, giving columns meaningful names so they are easier to work with, and formatting field values as needed so you can match them with the corresponding values in other files. For instance, since we are merging different files on player name, you will need to make sure that the names match across files. This requires that they use the same case (uppercase, lowercase, or title case) and the same order (e.g. first name and then last name). If you have problems matching two files, check for trailing or leading whitespace in the names in the problematic file.
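If you prefer to script the name cleanup rather than do it in Tableau Prep, the following is a minimal sketch of the checks described above: trimming whitespace, putting names in first-last order, and using one consistent case. The function name `normalize_name` is hypothetical, and the accent-stripping step is an extra assumption (sources may spell the same name with or without accents); it is not something the lab requires.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Return a lowercase 'first last' key for matching names across files."""
    name = raw.strip()  # remove leading and trailing whitespace
    # Convert "Last, First" to "First Last" when exactly one comma is present.
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        name = f"{first} {last}"
    # Assumption: drop accents so accented and unaccented spellings match.
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    # Collapse internal runs of whitespace and use a single case.
    return re.sub(r"\s+", " ", name).lower()


print(normalize_name("  James, LeBron "))  # lebron james
print(normalize_name("LEBRON   JAMES"))    # lebron james
```

Applying the same function to the name column of every file before merging gives each player one consistent key, whatever format the original source used.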

Merge the different data sources you need for each question into a single Tableau sheet by joining the data sets in Tableau. Information on how to join two data sets can be found in Tableau's documentation. You will need to think carefully about how data sets should be joined (inner join, left join, etc.) to make sure you are retaining the right data.
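To make the difference between join types concrete, here is a small sketch in plain Python rather than Tableau. The rows, the column names (`player`, `pm_regular`, `pm_bubble`), and the helper `join_on_name` are all hypothetical, and the sketch assumes each player appears at most once in the right-hand data set.

```python
def join_on_name(left, right, how="inner"):
    """Join two lists of row dicts on the 'player' key.

    how="inner" keeps only players found in both data sets;
    how="left" keeps every row of `left`, matched or not.
    """
    if how not in ("inner", "left"):
        raise ValueError("how must be 'inner' or 'left'")
    right_by_name = {row["player"]: row for row in right}
    joined = []
    for row in left:
        match = right_by_name.get(row["player"])
        if match is not None:
            joined.append({**row, **match})  # combine columns from both rows
        elif how == "left":
            joined.append(dict(row))  # keep the row; right-hand columns absent
    return joined


# Hypothetical data: "player b" did not play in the bubble.
pre_bubble = [
    {"player": "player a", "pm_regular": 3.1},
    {"player": "player b", "pm_regular": -1.4},
]
bubble = [{"player": "player a", "pm_bubble": 5.0}]

inner = join_on_name(pre_bubble, bubble, how="inner")
left = join_on_name(pre_bubble, bubble, how="left")
print(len(inner), len(left))  # 1 2
```

The inner join drops the player who has no bubble games, while the left join keeps that player with no bubble value. Which behavior you want depends on the question each chart is meant to answer.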

After the data sets are merged, you should have information on player name, plus-minus statistic in the regular season, plus-minus statistic in the bubble, free-agency status, salary, and age for the 350+ players who played in the bubble. Due to different choices made during the cleaning and merge, people may end up with slightly different numbers of players in their final data set. Normally, we would then want to go back and understand where players get lost in the process (e.g. name mismatches, first-season players who are in one data set but not the other, etc.), but for the purposes of this lab, as long as the matching process yields a data set with at least 250 players, you can move to the next step.
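Although it is not required for this lab, a quick way to see where players get lost is to list the names that appear in one file but not the other. The sketch below assumes the names have already been cleaned to a common format; the name lists and the helper `unmatched_names` are hypothetical.

```python
def unmatched_names(names_a, names_b):
    """Return (only_in_a, only_in_b) as sorted lists of names."""
    set_a, set_b = set(names_a), set(names_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Hypothetical cleaned name columns from two files.
stats_names = ["player a", "player b", "player c"]
salary_names = ["player a", "player c", "player d"]

only_stats, only_salary = unmatched_names(stats_names, salary_names)
print(only_stats)   # ['player b']
print(only_salary)  # ['player d']
```

Scanning the two lists side by side usually reveals whether a mismatch comes from a spelling difference or from a player who is genuinely missing from one source.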

Step 3: Analyze the data in Tableau

Using Tableau, create charts to satisfy the project deliverables described earlier in this document. Images of these charts (screenshots or image exports embedded into the .pdf document you will submit) will be the final deliverables. As with all projects in this class, it is important that your results are well presented. Graphs should be easy to read and well labeled. Format them on the page so that they are well spaced out. Points will be deducted for sloppy submissions or results that are difficult to read or understand.