cs401r_w2016:fp


Objective:

To creatively apply knowledge gained through the course of the semester to a substantial data analysis problem of your own choosing.


Deliverable:

For this project, you will apply your data analysis skills to a new problem. You will turn in a written report discussing your efforts and give a project presentation.


Grading standards:

Your project will be graded on the following elements (each sub-item percentage is a share of its parent item):

  • 75% Project writeup
    • 35% Exploratory data analysis
    • 35% Description of technical approach
    • 30% Analysis of performance of method
  • 25% Project presentation
    • 33% Clearly motivated problem
    • 33% Clear description of technical approach
    • 33% Clear presentation of results

Description:

The final project is designed to give you a chance to explore a data science project end-to-end, with minimal restrictions.

For this project, you must:

  • Select a dataset to analyze (perhaps one from Kaggle?)
  • Define a question or task to be performed
    • What is your goal in analyzing this dataset? Is it a prediction problem? Or are you searching for patterns?
    • If appropriate, define a cost function to be optimized
  • Choose an analysis strategy
  • If appropriate, define a model
  • If appropriate, choose an inference algorithm to answer your question, given a model
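As a rough illustration of how the steps above fit together, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the dataset is synthetic (`make_toy_dataset` is a hypothetical helper, not a real data source), the task is a simple prediction problem, the cost function is mean squared error, the model is a straight line, and the "inference algorithm" is closed-form least squares. A real project would substitute its own dataset, question, and method.

```python
import random


def make_toy_dataset(n=200, seed=0):
    """Hypothetical stand-in for a real dataset: y is roughly 2*x + 1 plus noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def predict(params, x):
    """Model: a straight line with parameters (slope, intercept)."""
    slope, intercept = params
    return slope * x + intercept


def mean_squared_error(params, xs, ys):
    """Cost function to be optimized: average squared prediction error."""
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_least_squares(xs, ys):
    """Fitting step: closed-form minimizer of the squared error for a line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# 1. Select a dataset and hold out part of it for evaluation.
xs, ys = make_toy_dataset()
train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]

# 2-5. Fit the model by minimizing the cost on the training split.
params = fit_least_squares(train_x, train_y)

# Compare against a trivial baseline that always predicts the training mean.
baseline = (0.0, sum(train_y) / len(train_y))
model_mse = mean_squared_error(params, test_x, test_y)
baseline_mse = mean_squared_error(baseline, test_x, test_y)
print("fitted (slope, intercept):", params)
print("held-out MSE, model vs. baseline:", model_mse, baseline_mse)
```

Comparing against a trivial baseline on held-out data is one simple way to support the "analysis of performance" part of a writeup; the right comparison depends on the problem you choose.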

You are welcome to use any publicly available code to help you. For example, you may wish to use the Stan language to construct a Hamiltonian Monte Carlo (HMC) sampler. Other possibilities include PyMC, the Venture probabilistic programming language, and BayesDB.
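To give a sense of what an HMC sampler does under the hood, here is a hand-written sketch in plain Python. It is not Stan or PyMC code and does not use their APIs. It assumes a deliberately simple toy model (observations drawn from a normal distribution with unknown mean `mu` and known unit variance, with a wide normal prior on `mu`), synthetic data, and hand-picked values for `step_size` and `n_leapfrog`. The tools named above handle gradients and tuning for you.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0):
    """Unnormalized log posterior: Normal(mu, 1) likelihood, Normal(0, prior_sd) prior."""
    log_lik = -0.5 * sum((y - mu) ** 2 for y in data)
    log_prior = -0.5 * (mu / prior_sd) ** 2
    return log_lik + log_prior


def grad_log_posterior(mu, data, prior_sd=10.0):
    """Derivative of log_posterior with respect to mu."""
    return sum(y - mu for y in data) - mu / prior_sd ** 2


def hmc_sample(data, n_samples=2000, n_warmup=500,
               step_size=0.02, n_leapfrog=10, seed=1):
    """Basic HMC for a single parameter; step_size and n_leapfrog are hand-picked."""
    rng = random.Random(seed)
    mu = 0.0
    samples = []
    accepted = 0
    total = n_warmup + n_samples
    for i in range(total):
        # Draw a fresh momentum and record the starting energy.
        p = rng.gauss(0.0, 1.0)
        current_h = -log_posterior(mu, data) + 0.5 * p * p

        # Leapfrog integration: half momentum step, full position steps
        # interleaved with full momentum steps, then a final half step.
        new_mu, new_p = mu, p
        new_p += 0.5 * step_size * grad_log_posterior(new_mu, data)
        for step in range(n_leapfrog):
            new_mu += step_size * new_p
            if step < n_leapfrog - 1:
                new_p += step_size * grad_log_posterior(new_mu, data)
        new_p += 0.5 * step_size * grad_log_posterior(new_mu, data)

        # Metropolis correction for the integrator's energy error.
        proposed_h = -log_posterior(new_mu, data) + 0.5 * new_p * new_p
        if rng.random() < math.exp(min(0.0, current_h - proposed_h)):
            mu = new_mu
            accepted += 1
        if i >= n_warmup:
            samples.append(mu)
    return samples, accepted / total


# Synthetic observations (an assumption for illustration; the true mean is 3.0).
data_rng = random.Random(0)
data = [data_rng.gauss(3.0, 1.0) for _ in range(50)]

samples, accept_rate = hmc_sample(data)
posterior_mean_estimate = sum(samples) / len(samples)

# This toy model has a closed-form posterior mean, which gives a sanity check.
analytic_mean = sum(data) / (len(data) + 1.0 / 10.0 ** 2)
print("HMC estimate:", posterior_mean_estimate)
print("analytic posterior mean:", analytic_mean)
print("acceptance rate:", accept_rate)
```

Checking a sampler against a model with a known answer, as done here, is a cheap way to catch bugs before applying it to a model where no closed form exists.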

Your writeup should be a serious report on the dataset you chose, the problem you set out to solve, the technical approach you took (and your rationale for it), the results of any exploratory data analysis, and the results of your final model / inference / optimization algorithm.

cs401r_w2016/fp.1459795464.txt.gz · Last modified: 2021/06/30 23:40 (external edit)