Data Project 1

1. Overview
- Data project 1 is to be completed alone OR in groups of no more than two people. Grading standards will be higher for groups of two people.
- Deadline: The project is due on Mar 5th, 2021 by 5:00 PM.
- Project goal: Data project 1 gives you a platform to apply your data skills in a context where, unlike the labs, you have the freedom to specify the structure. A successful project will have fellow students thinking “This is interesting and compelling!” It therefore pays to think hard and creatively about your options; if you are unsure, consider collecting feedback from other students.
- Theme: As it is unlikely that Quaker Days will be held in-person this Spring, the goal is to develop a “data product” (e.g. link 1, link 2) that can paint a picture for admitted students about life at Penn or in Philadelphia. This restriction still leaves open a large number of options, as there are many things on which you could focus, ranging from mental health resources, to the best food trucks, to Philadelphia apartment listings.
- First deliverable: To encourage creativity, the first deliverable for this project is a short description of your project and of your key data sets, placed in a shared spreadsheet by 5:00 PM on Friday, February 26th (link to sheet for 10:30 am section, link to sheet for 1:30 pm section). This is meant 1) to inspire interest in the projects other students are working on, and 2) to make sure that your project does not overlap too closely with that of anyone else who has already posted on the spreadsheet (it is reasonable for many people to cover different aspects of large topics such as sports at Penn).
- This project is meant to give you the flexibility to work with data on something you feel passionate about. If you are unsure whether a project you have in mind is a good candidate, please ask.
- You are highly encouraged to pursue rich and creative data sources. There are many freely available on the web. Moreover, data from sources like Facebook, Twitter, Yelp, and other companies can be harvested for analysis. In this class, we have covered web scraping and have surveyed some visualization technologies; you are also welcome and encouraged to use methods that we have not covered in class.
2. Project Requirements
- A typical project involves the web scraping tools we covered in class, some data cleaning performed using Excel or Tableau Prep, the use of Tableau for visualizing the data, and Wix or Weebly for presenting the visualizations and story. Using multiple data sources is encouraged; many of the best projects come from combining data sources in unexpected ways. Rich and creative data sources are encouraged, many are freely available on the web, and you are welcome to go beyond the web scraping tools covered in class.
- Your output can include as many visualizations as you need to tell your story.
- You will not present this project in class but the projects will be shared online with other students in the class.
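For anyone who chooses to go beyond the class tools, combining two data sources on a shared column might look roughly like the Python sketch below. This is only an illustration, not a required method: the food truck names, locations, ratings, walking times, and the helper names read_rows and join_on are all hypothetical, and in a real project the two tables would come from scraped pages or downloaded files. The same join can be done in Excel or Tableau Prep.

```python
import csv
import io

# Hypothetical data source 1: ratings (made-up values for illustration only).
FOOD_TRUCKS_CSV = """name,location,rating
Truck A,Building X,4.5
Truck B,Building Y,4.1
Truck C,Building Z,3.9
"""

# Hypothetical data source 2: walking times, keyed by the same "location" column.
WALK_TIMES_CSV = """location,minutes_from_dorm
Building X,4
Building Y,12
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(left_rows, right_rows, key):
    """Left join: keep every left row and add the right-hand columns
    whenever a row with the same key value exists on the right."""
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        merged = dict(row)
        match = lookup.get(row[key], {})
        for column, value in match.items():
            if column != key:
                merged[column] = value
        joined.append(merged)
    return joined


trucks = read_rows(FOOD_TRUCKS_CSV)
walks = read_rows(WALK_TIMES_CSV)
combined = join_on(trucks, walks, "location")

for row in combined:
    # Rows without a match on the right simply lack the extra column.
    print(row["name"], row["rating"], row.get("minutes_from_dorm", "unknown"))
```

A left join is used here so that no row from the first source is silently dropped; rows with no match are easy to spot and can be handled during data cleaning.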
3. Deliverables
- The deliverable is a link to a web site where your project can be viewed. You are not expected to “code” a web site. See the next bullet point.
- Building a web site: There are many free services that make it easy to “build” a web site in the same way you might build a Word document (no web programming required). Preferred services are:
- Weebly
- Wix
- Google Sites
- Please note that Google Sites requires permissions to be set correctly to allow outside visitors. It is your responsibility to make sure these permissions are set correctly. There will be a minor but automatic point deduction if we cannot access your site and have to ask you to change permissions. To be safe, ask your mother/brother/cousin/friend from home whether, given the link, they can access the web site you have created.
- Other options are:
- Squarespace
- If you want to use Squarespace, please note that free trials expire after fourteen days, which is not appropriate for this submission. However, if you happen to be a Squarespace customer and so are not subject to this constraint, feel free to use it to host this project.
- Along with your data product, you should include the following information on your web site:
- Who you are
- What data sources you are using
4. Grading Criteria
The project is worth 50 pts total, divided in the following way:
- Uniqueness of the idea (5 pts)
- Creativity of the analysis, the data, and the use and presentation of the data (15 pts)
- Utility of the data product: how useful the data product is to the target audience (15 pts)
- Web site aesthetics, clarity, and functionality (15 pts)
5. Sample projects from earlier years (on a similar theme):
6. Learning objectives
- Gain experience with data skills and with working with large data sets.
- Appreciate the number of possibilities that arise when combining different data sources.