Lab 1: Citi Bike

OIDD 245 Tambe

1. Project background

GOAL: To familiarize yourself with Tableau as a tool for data analysis and visualization.


An important application area for big data analysis is transportation and logistics systems.

Some of the largest data science investments are in the transport and logistics industries: e.g. Uber, Lyft, FedEx, & Amazon. A sample job listing from this industry:




Citi Bike has been a controversial initiative that has faced operational issues such as:

Data can be useful for addressing these issues, and Citi Bike affiliate Motivate has been analyzing data in this area! (Motivate has since been acquired by Lyft.)



2. The Citi Bike data

Monthly Citi Bike usage data are available here. The data are organized as a series of CSV files. Excel, Access, Tableau, and other popular data packages can open .csv files directly.

I have generated smaller versions of these files to make them slightly easier and quicker to work with. Data files for this session:

Take note of the data dictionary for the operational data collected by Citi Bike. Please note the coding for gender (0/1/2) and subscriber type!
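If you want to sanity-check the coding outside Tableau, the short Python sketch below decodes the gender field and counts rides by gender and user type. It assumes the coding published in the Citi Bike data dictionary (0 = unknown, 1 = male, 2 = female) and the column names `gender` and `usertype`; the sample rows are made up for illustration, and the column names in the lab files may differ.

```python
import csv
import io
from collections import Counter

# Gender coding as published in the Citi Bike data dictionary (assumed here).
GENDER_LABELS = {"0": "unknown", "1": "male", "2": "female"}

# Tiny made-up sample; the real files have more columns and many more rows.
SAMPLE_CSV = """tripduration,usertype,birth year,gender
634,Subscriber,1985,1
1210,Customer,1969,0
455,Subscriber,1992,2
980,Subscriber,1990,2
"""


def count_by_gender_and_usertype(csv_file):
    """Count rides for each (gender label, user type) pair."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Any code outside 0/1/2 is treated as unknown.
        label = GENDER_LABELS.get(row["gender"], "unknown")
        counts[(label, row["usertype"])] += 1
    return counts


counts = count_by_gender_and_usertype(io.StringIO(SAMPLE_CSV))
for (gender, usertype), n in sorted(counts.items()):
    print(f"{gender:8s} {usertype:11s} {n}")
```

To run this against a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle, for example `open(path, newline="")`, where `path` is wherever you saved the data.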


3. Lab Objectives and Deliverables

Guidelines for deliverables:


A. Create charts describing gender and Citi Bike use

A key business imperative for ride-sharing companies is having a broad demographic customer base. For example, SF has said that it will limit permits for scooter-sharing companies unless they become more inclusive by gender and race. For Citi Bike usage in the 2019 data file, create visualizations illustrating how women differ from men along the following four dimensions. For the choice of visualization, use your best judgment as to what most clearly illustrates the comparison. Please drop (e.g. filter out) the "unknown" category in gender.
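The lab itself is done in Tableau, but the filtering step can be illustrated in a few lines of Python. The sketch below drops the "unknown" gender code and compares median trip duration for men and women. Trip duration is used only as an illustrative measure, and the column names `tripduration` and `gender`, the 0/1/2 coding, and the sample rows are all assumptions.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Only male and female are mapped; code "0" (unknown) is deliberately left out.
GENDER_LABELS = {"1": "male", "2": "female"}

# Made-up sample rows for illustration only.
SAMPLE_CSV = """tripduration,gender
600,1
900,1
300,1
1200,2
800,2
450,0
"""


def median_duration_by_gender(csv_file):
    """Return median trip duration (seconds) per gender, dropping unknowns."""
    durations = defaultdict(list)
    for row in csv.DictReader(csv_file):
        label = GENDER_LABELS.get(row["gender"])
        if label is None:
            continue  # filter out the "unknown" category
        durations[label].append(int(row["tripduration"]))
    return {label: median(values) for label, values in durations.items()}


medians = median_duration_by_gender(io.StringIO(SAMPLE_CSV))
for label, value in sorted(medians.items()):
    print(label, value)
```

In Tableau, the equivalent is a filter on the gender field that excludes 0, followed by a median aggregation on trip duration.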


B. Build a “data product” that advises decision-makers on how to improve gender balance for Citi Bike.

The COVID-19-related changes to city life created a natural experiment that appears to have increased gender balance in the Citi Bike network. This provides an opportunity to derive insights into policies that could shift gender balance in service usage on a more long-term basis.

  1. By comparing the aggregate activity in the October 2019 and October 2020 data sets, provide visual evidence on the extent of this shift in usage by gender. You do not need to combine the data sets for this exercise; you can analyze each month separately and visually compare the charts you make for the two months.
  2. Using the more detailed activity data, provide evidence for an actionable recommendation as to changes policy makers can make to ensure this shift persists after a return to normal. Alternatively, you can choose to provide evidence that this shift is likely to be only temporary and that no investments should be made. In either case, generate a visualization or a set of visualizations from the data that tells a story to Citi Bike decision makers about what might be driving these changes. To do this, you will need to go beyond the aggregate data and examine shifts in usage by gender across the two time periods by location, time, age, or other attributes. Examples of investments you might suggest include adding new features to the bikes, stations, or roads. Be as creative as you like, but your recommendation should be evidence-based. The data are unlikely to provide iron-clad evidence for any intervention, but they should support it.
  3. Provide a brief description (no more than a paragraph) describing your visualization and what it suggests. (As you work on this exercise, consider how large data sets such as these may be shifting decision-making behaviors inside organizations, setting up a battle between the intuition of experts and the analyses produced by data scientists. This will be a recurring theme in this course.)
  4. It is important that this exercise provide a sharp recommendation or answer to the above question. For instance, if your suggestion is how to improve the product or service, you should suggest a specific improvement to their roads or bikes or pricing, based on the data you can analyze.
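As a rough illustration of going beyond the aggregate comparison, the Python sketch below computes the female share of rides by start hour for each month separately, so the two months can be compared side by side. It assumes columns named `starttime` (formatted `YYYY-MM-DD HH:MM:SS`) and `gender` (0/1/2 coding); the rows are made up and do not reflect the real data.

```python
import csv
import io
from collections import Counter

# Made-up sample rows standing in for the two monthly files.
OCT_2019 = """starttime,gender
2019-10-01 08:05:00,1
2019-10-01 08:20:00,1
2019-10-01 08:40:00,2
2019-10-01 14:10:00,1
2019-10-01 14:30:00,0
"""

OCT_2020 = """starttime,gender
2020-10-01 08:05:00,1
2020-10-01 08:45:00,2
2020-10-01 14:10:00,2
2020-10-01 14:30:00,2
2020-10-01 14:50:00,1
"""


def female_share_by_hour(csv_file):
    """Return {start hour: share of rides by women}, ignoring unknown gender."""
    total = Counter()
    female = Counter()
    for row in csv.DictReader(csv_file):
        if row["gender"] not in ("1", "2"):
            continue  # drop the "unknown" category
        # Assumes the hour sits at characters 11-12 of the timestamp.
        hour = int(row["starttime"][11:13])
        total[hour] += 1
        if row["gender"] == "2":
            female[hour] += 1
    return {hour: female[hour] / total[hour] for hour in sorted(total)}


share_2019 = female_share_by_hour(io.StringIO(OCT_2019))
share_2020 = female_share_by_hour(io.StringIO(OCT_2020))
for hour in sorted(set(share_2019) | set(share_2020)):
    print(hour, share_2019.get(hour), share_2020.get(hour))
```

The same pattern works for other attributes: replace the hour with a start station, an age band, or a user type to see where the shift is concentrated.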

C. Data-driven operations: The case of Lyft

Towards the end of 2018, Lyft made an approximately $100 million investment into Citi Bike. Citi Bike planned to use this money to add thousands of bikes to its network. Some of these were to be e-assist bikes, which are electrically boosted and allow for "range expansion": they let riders bike longer distances more easily.


Getting started: