Datathon 1: Speed Dating

OIDD 245 Tambe

What is a datathon?

A datathon is a timed workshop that asks researchers to turn information into knowledge. It’s a format modeled after hackathons. The difference is that datathons use research questions and datasets to advance knowledge, not to launch apps or new software. At a datathon, participants work in teams to frame a research question, create and implement a research design, mobilize data resources and present their findings.

Datathons are becoming increasingly common for the following reasons:

  1. Companies want to get people to work on their data and data problems
  2. Companies want to find smart people to hire
  3. Companies want to brand themselves as having interesting data and problems on which to work
  4. They are a good learning and community building exercise

Key objectives of our datathons

Data-driven interviews

The data-driven interview is an interview format that is becoming increasingly common. In such contexts, or in data consulting exercises in general, you are not given structured goals. Rather, you are given data sets and asked to do something with them. Deciding what to do can be as hard as or harder than the technical data work, and it often draws heavily on your domain expertise.

Rogati said each applicant for data science jobs at Jawbone gets three hours to make sense of mixed-up company data sets. The test can reveal if candidates possess “applied skills,” she said, not just statistical know-how.

Applied skills are becoming increasingly important. The following article (read it outside of class, if you are interested) lists these important skills for a data scientist:

  1. Statistical thinking
  2. Technical acumen
  3. Multi-modal communication skills
  4. Curiosity
  5. Creativity
  6. Grit

Note the importance of non-technical skills on this list. The answers come from the data, but the questions have to be formulated by the data scientist, and that requires some knowledge or expertise about the domain.

Ground Rules

Data Context

The data set for this exercise was generated from “speed dating” events conducted at Columbia University. In general, data on dating has provided significant raw material for data analysis. For instance, the data science team at OkCupid maintains a widely read blog about what it finds in its data. Obviously, people are interested.


In this exercise, you are asked to generate and illustrate a finding that others might find interesting. In other words, generate a finding (i.e., a story and supporting visualization) that might be "blog-worthy". The output should be a single, well-labeled image or set of images (collected into one graphic) and a brief description of what it shows.

Please submit your entry by 11:45 am for the 10:30 am section or 2:45 pm for the 1:30 pm section. Submit your visualization by uploading a screenshot of your image to a slide at the appropriate link (to be announced in class), and include a brief title and the names of the people you worked with on your slide. Alternatively, you can “export” the image from Tableau and upload it.

You will likely be asked to give an informal presentation of your entry, lasting less than 60 seconds, at the beginning of the next class session. A winning entry will be chosen from the presentations through an anonymous class poll.

Data Sources

There is only one data source to be used for this exercise, which is the speed dating data. However, you will also need an additional Word document that contains information about the fields in the data.

Participants in these speed dating trials engaged in four-minute conversations to determine whether or not they would be interested in meeting one another again. Each row represents one meeting between two individuals and includes partner ratings, the correlation in interests between the two people, whether a match was made, and where the meeting fell in the sequence. Moreover, the participants were surveyed, so there is data about their major, their hobbies, their preferences, hometown, race, and so on, as well as how they rate themselves in terms of attractiveness, how often they go out on dates, what they value in a date, and other similar assessments.
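
If you prefer to explore the data in code before building a visualization, the one-row-per-meeting layout lends itself to simple grouping. The sketch below is only an illustration: the column names `order` and `match` are assumptions (check the Word document for the actual field names), and the rows are made-up toy values, not results from the real data. It computes the share of meetings that ended in a match for each position in the sequence.

```python
from collections import defaultdict


def match_rate_by_order(rows, order_key="order", match_key="match"):
    """Return {meeting position: share of meetings that ended in a match}.

    `order_key` and `match_key` are hypothetical column names; replace them
    with the field names listed in the data documentation.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for row in rows:
        position = int(row[order_key])
        totals[position] += 1
        matches[position] += int(row[match_key])  # 1 = match, 0 = no match
    return {pos: matches[pos] / totals[pos] for pos in sorted(totals)}


# Made-up toy rows that only show the shape of the data; not real results.
toy_rows = [
    {"order": 1, "match": 1},
    {"order": 1, "match": 0},
    {"order": 2, "match": 0},
    {"order": 2, "match": 0},
    {"order": 3, "match": 1},
]

rates = match_rate_by_order(toy_rows)
for position, rate in rates.items():
    print(f"meeting #{position}: match rate {rate:.2f}")
```

With the real file, the rows could come from `csv.DictReader`, provided the two columns hold whole-number values. The resulting table of rates is the kind of summary you might then chart as a bar graph for your entry.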


For this exercise, working in teams is required!


Have fun!