To creatively apply knowledge gained through the course of the semester to a substantial data analysis problem of your own choosing.
For your final project, you will find a dataset and apply your data analysis skills to a new problem based on that data. You will turn in a PDF report discussing your efforts. Do not include code in your report.
Your entry will be graded on the following elements:
The final project is designed to give you a chance to explore a data science project end-to-end, with minimal restrictions.
For this project, you must:
You are welcome to use any publicly available code on the internet to help you. For example, you may wish to use the Stan language to help you construct an HMC sampler. Other possibilities include PyMC, the Venture probabilistic programming language, BayesDB, etc.
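For orientation, the following is a minimal sketch of the kind of HMC sampler that tools such as Stan or PyMC construct for you. It assumes a one-dimensional target whose log density and gradient you can write by hand. The function name `hmc_sample` and all of its parameters are hypothetical, chosen for illustration only, and are not part of any of the libraries named above, which additionally handle gradient computation and step-size tuning automatically.

```python
import math
import random


def hmc_sample(log_prob, grad_log_prob, x0, n_samples,
               step_size=0.2, n_leapfrog=20, seed=0):
    """Draw samples from a 1-D density using basic Hamiltonian Monte Carlo.

    Illustrative only: fixed step size, unit mass, no warm-up or adaptation.
    """
    rng = random.Random(seed)
    samples = []
    x = x0
    for _ in range(n_samples):
        # Resample the auxiliary momentum from a standard normal.
        p = rng.gauss(0.0, 1.0)
        x_new, p_new = x, p

        # Leapfrog integration: half momentum step, alternating full
        # position and momentum steps, then a final half momentum step.
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for i in range(n_leapfrog):
            x_new += step_size * p_new
            if i < n_leapfrog - 1:
                p_new += step_size * grad_log_prob(x_new)
        p_new += 0.5 * step_size * grad_log_prob(x_new)

        # Metropolis correction using the change in total energy
        # (potential = -log_prob, kinetic = p^2 / 2).
        h_old = -log_prob(x) + 0.5 * p * p
        h_new = -log_prob(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


# Example target (an assumption for this sketch): a standard normal.
draws = hmc_sample(lambda x: -0.5 * x * x, lambda x: -x, x0=0.0, n_samples=5000)
```

For a real project, a library sampler is usually the better choice; a hand-written version like this is mainly useful for understanding what the library is doing.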
Your writeup should be a serious report on the dataset you chose, the problem you set out to solve, the technical approach you took (and your rationale for it), the results of any exploratory data analysis, and the results of your final model / inference / optimization algorithm.
Your writeup should discuss questions similar to those in your recommender engine report. This writeup must include five main sections:
CrowdFlower
KDD Cup
UCI repository
Kaggle (current and past)
Data.gov
AWS
World Bank
BYU CS478 datasets
data.utah.gov
Google Research
BYU DSC competition