Data project 2

OIDD 245 Tambe

Overview

Data project 2 is to be completed individually. It is meant to provide a platform for you to apply your R skills to a project where you have the flexibility to specify the structure and context. One goal is to complete a data project that may be of specific use to you in getting a future job (i.e., something you can talk about in an interview or showcase to a future employer). Another goal is to let you spend time on a project based in an industry context that is of specific interest to you, since as a group you are going to be entering different industries (e.g., consulting, finance, real estate, healthcare, technology). This can also be a “passion” project on a topic you are particularly interested in, which could be anything from protecting endangered species to Game of Thrones.

You are highly encouraged to pursue rich and creative data sources. There are many that are freely available on the web. Moreover, data from sources such as Facebook, Twitter, Yelp, and other companies can be harvested for analysis. In this class, we have covered web scraping, using APIs, text mining, and prediction, and we have surveyed some visualization techniques. You are welcome (and encouraged) to use packages and methods that we have not covered in class. If you are unsure whether a project you are considering is a good candidate, please ask me!

Choosing an audience and specifying an “interesting” question are important parts of the assignment (and of any good data science work). What makes for an interesting question can be subjective and is often domain-specific. It is therefore a good idea to seek feedback from friends, family, TAs, or me if you are stuck. I am happy to provide feedback over the next few weeks as you develop your projects.

Learning objectives

  1. Gain experience applying data skills and working with large data sets in R.
  2. Learn to appreciate the combinatorial nature of the possibilities that arise when combining data sources.
  3. Try to be creative in your projects. Many data scientists would argue that creativity, domain knowledge, and storytelling are as important as, or even more important than, skills such as R or Python when developing data products.

Project Requirements

Deliverables

Grading Criteria (125 pts)

Some sample projects from past years (may have had a different rubric)