Lab 1: Bike Sharing

OIDD 2450 Tambe

1. Project background

GOAL: To familiarize yourself with Tableau as a tool for data analysis and visualization.


An important application area for big data analysis is transportation and logistics systems.

Some of the largest data science investments are in the transport and logistics industries: e.g. Uber, Lyft, FedEx, & Amazon. A sample job listing from this industry:

Citi Bike has been a controversial initiative that has faced operational issues such as:

Data can be useful for addressing these issues, and Citi Bike affiliate Motivate has been analyzing data in this area. (Motivate has since been acquired by Lyft.)


2. The Citi Bike data

Monthly Citi Bike usage data are available here. The data are organized as a series of CSV files. Excel, Access, Tableau, and other popular data packages can open .csv files directly.

Begin by working with the ridership files from Oct 2019 and Oct 2020 from the link above. If your computer struggles with the larger files, I have generated smaller versions of these files to make them slightly easier and quicker to work with. Data files for this session:

Take note of the data dictionary for the operational data collected by Citi Bike. Please note the coding for gender (0/1/2) and subscriber type!
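As an optional cross-check outside Tableau, here is a minimal Python sketch of decoding the gender codes. It assumes the coding 0 = unknown, 1 = male, 2 = female and a column named `gender`; confirm both against the data dictionary and the header of your own file. The sample rows are made up for illustration and are not real Citi Bike data.

```python
import csv
import io
from collections import Counter

# Assumed coding; verify against the Citi Bike data dictionary.
GENDER_LABELS = {"0": "unknown", "1": "male", "2": "female"}


def count_by_gender(csv_file):
    """Count trips per decoded gender label from an open CSV file object."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Any code not in the mapping is treated as "unknown".
        label = GENDER_LABELS.get(row["gender"].strip(), "unknown")
        counts[label] += 1
    return counts


# Made-up sample standing in for a monthly trip file.
sample = io.StringIO(
    "tripduration,usertype,gender\n"
    "300,Subscriber,1\n"
    "620,Subscriber,2\n"
    "1500,Customer,0\n"
    "410,Subscriber,2\n"
)
counts = count_by_gender(sample)
print(dict(counts))
```

To run this on a real file, replace `sample` with `open("your_file.csv", newline="")`, where the file name is whichever extract you downloaded.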


3. Lab Objectives and Deliverables

Guidelines for deliverables:


A. Create charts describing gender and Citi Bike usage

A key business imperative for ride-sharing companies is having a broad demographic customer base. For example, San Francisco has said that it will limit permits for scooter-sharing companies unless they become more inclusive by gender and race. For Citi Bike usage in the 2019 data file, create visualizations illustrating how women differ from men along the following four dimensions. For the choice of visualization, use your best judgment as to what most clearly illustrates the comparison. Please drop (i.e., filter out) the "unknown" category in gender.


B. Build a “data product” that advises decision-makers on how to improve gender balance for Citi Bike.

The COVID-19-related changes to city life created a natural experiment that appears to have increased gender balance in the Citi Bike network. This provides an opportunity to derive insights into policies that could shift gender balance in service usage on a more long-term basis.

  1. By comparing the aggregate activity in the October 2019 and October 2020 data sets, provide visual evidence on the extent of this shift in usage by gender. You do not need to combine the data sets for this exercise; you can analyze each month separately and visually compare the charts you make for the two months.
  1. Using the more detailed activity data, provide evidence for an actionable recommendation as to changes that either policy makers (e.g. NYC local government) or Citi Bike managers can make to ensure this shift persists after a return to normal. Alternatively, you can provide evidence that this shift is likely to be only temporary and that no investments should be made. In either case, generate a visualization or a set of visualizations from the data that tells a story to Citi Bike decision makers about what might be driving these changes. To do this, you will need to go beyond the aggregate data and examine shifts in usage by gender across the two time periods by location, time, age, or other attributes. Examples of investments you might suggest include adding new features to the bikes, stations, or roads. Be as creative as you like, but your recommendation should be evidence-based. The data are unlikely to provide iron-clad evidence for any intervention, but they should support it.
  1. Provide a brief description (no more than a paragraph) of your visualization and what it suggests. (As you work on this exercise, consider how large data sets such as these may be shifting decision-making behaviors inside organizations, setting up a battle between the intuition of experts and the analyses produced by data scientists. This will be a recurring theme in this course.)
  1. It is important that this exercise provide a sharp recommendation or answer to the above question. For instance, if your suggestion is how to improve the product or service, you should suggest a specific improvement to their roads or bikes or pricing, based on the data you can analyze.
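The aggregate comparison in the first step above can be sanity-checked with a short calculation. The sketch below is only an illustration: the trip counts are placeholders, not real Citi Bike figures, and `female_share` is a hypothetical helper name. It drops the "unknown" category before computing the share, in line with the filtering used in part A.

```python
def female_share(counts):
    """Share of trips taken by women among trips with a known gender."""
    known = counts.get("male", 0) + counts.get("female", 0)
    if known == 0:
        return float("nan")  # no trips with known gender
    return counts.get("female", 0) / known


# Placeholder counts per month; substitute the counts from your own data.
oct_2019 = {"male": 75, "female": 25, "unknown": 10}
oct_2020 = {"male": 50, "female": 50, "unknown": 20}

share_2019 = female_share(oct_2019)
share_2020 = female_share(oct_2020)
shift = share_2020 - share_2019  # change in the female share of known-gender trips

print(f"2019: {share_2019:.1%}  2020: {share_2020:.1%}  shift: {shift:+.1%}")
```

The same function can be applied to counts broken out by station, hour, or age band to see where any shift is concentrated.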

C. Data-driven operations: The case of Lyft

Towards the end of 2018, Lyft made an approximately $100 million investment into Citi Bike. Citi Bike planned to use this money to add thousands of bikes to its network. Some of these were to be electrically boosted e-assist bikes, which allow for "range expansion": riders can cover longer distances more easily.

D. Digital footprints and the pandemic

This question asks you to use digital trace data (i.e., the Citi Bike data) to learn something about the effects of the COVID-19 pandemic.

Using the Citi Bike data, provide a data-driven argument about how patterns of human activity (along any dimension except for gender) changed after the pandemic. To do so, use a set of Tableau visualizations from time periods before and after the pandemic (e.g. Oct 2019 and Oct 2022) to support a specific story you choose to tell about how usage patterns (i.e., rider behaviors) might have changed in NYC over this period. As above, you can try to use the full Oct 2022 file from the Citi Bike repo if your computer can handle it (the file labeled 202210-citibike-tripdata.csv.zip), but otherwise I have created a smaller extract from that time period.

This question is meant to be flexible: you can tell a story about any aspect of rider behavior related to time of day, geography, or demographic attributes EXCEPT for gender.
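One possible starting point is the distribution of trip start times by hour of day. The sketch below is a rough illustration under stated assumptions: the timestamp column is passed in as a parameter because the column names used here (`starttime` and `started_at`) are assumptions that may not match your files, timestamps are assumed to look like `YYYY-MM-DD HH:MM:SS`, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def trips_by_hour(csv_file, time_column):
    """Return the share of trips starting in each hour of the day (0-23)."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Assumes "YYYY-MM-DD HH:MM:SS[.fraction]"; take the HH part.
        hour = int(row[time_column].split(" ")[1].split(":")[0])
        counts[hour] += 1
    total = sum(counts.values())
    return {hour: counts[hour] / total for hour in sorted(counts)}


# Made-up samples; column names are assumptions, check your file headers.
before = io.StringIO(
    "starttime\n"
    "2019-10-01 08:05:00.1230\n"
    "2019-10-01 08:40:10.0000\n"
    "2019-10-01 18:15:00.0000\n"
    "2019-10-02 17:59:59.0000\n"
)
after = io.StringIO(
    "started_at\n"
    "2022-10-01 08:05:00\n"
    "2022-10-01 13:40:10\n"
    "2022-10-01 14:15:00\n"
    "2022-10-02 18:00:00\n"
)
shares_before = trips_by_hour(before, "starttime")
shares_after = trips_by_hour(after, "started_at")
print(shares_before)
print(shares_after)
```

Comparing shares rather than raw counts keeps the two periods comparable even when the extracts differ in size.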


Getting started: