cs401r_w2016:fp

Differences

This shows you the differences between two versions of the page.

cs401r_w2016:fp [2016/04/04 18:44]
admin created
cs401r_w2016:fp [2021/06/30 23:42] (current)
Line 6:
 ====Deliverable:====
  
-For this lab, you will apply your data analysis skills to a new problem.  You will turn in a report discussing your efforts.
+For your final project, you will find a dataset and apply your data analysis skills to a new problem based on the data.  You will turn in a PDF report discussing your efforts. Do not include code in your report.
  
 ----
Line 41:
 Your writeup should be a serious report on the dataset you chose, the problem you set out to solve, the technical approach you took (and your rationale for it), the results of any exploratory data analysis, and the results of your final model / inference / optimization algorithm.
  
 +Your writeup should address questions similar to those in your recommender engine report, and it must include five main sections:
  
 +  - **A discussion of the dataset**
 +    - Where did it come from?  Who published it?
 +    - Who cares about this data?
 +  - **A discussion of the problem to be solved**
 +    - Is this a classification problem? ​ A regression problem?
 +    - Is it supervised? ​ Unsupervised?​
 +    - What sort of background knowledge do you have that you could bring to bear on this problem?
 +    - What other approaches have been tried? ​ How did they fare?
 +  - **A discussion of your exploration of the dataset**
 +    - Before you start coding, you should look at the data.  What does it include?  What patterns do you see?
 +    - Include any visualizations of the data you deem relevant.
 +  - **A clear, technical description of your approach.** ​ This section should include:
 +    - Background on the approach
 +    - Description of the model you use
 +    - Description of the inference / training algorithm you use
 +    - Description of how you partitioned your data into a test/​training split
 +  - **An analysis of how your approach worked on the dataset**
 +    - What was your final error (for example, RMSE) on your own train/test split?
 +    - Did you overfit? ​ How do you know?
 +    - Was your first algorithm the one you ultimately used for your submission? ​ Why did you (or didn't you) iterate your design?
 +    - Did you solve (or make any progress on) the problem you set out to solve?
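
The questions about the train/test split, RMSE, and overfitting can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic data, the straight-line model, and the helper names `train_test_split`, `rmse`, and `fit_line` are hypothetical and are not part of the assignment. The sketch uses only the Python standard library.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def fit_line(points):
    """Ordinary least squares fit of y = a*x + b to (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Synthetic data (hypothetical): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)) for x in range(100)]

# Hold out 20% of the rows; the model never sees them during fitting.
train, test = train_test_split(data, test_fraction=0.2, seed=0)
a, b = fit_line(train)

# Compare error on the rows used for fitting with error on held-out rows.
train_rmse = rmse([y for _, y in train], [a * x + b for x, _ in train])
test_rmse = rmse([y for _, y in test], [a * x + b for x, _ in test])
print(f"train RMSE = {train_rmse:.3f}, test RMSE = {test_rmse:.3f}")
```

A test RMSE that is much larger than the training RMSE is one common sign of overfitting; roughly equal values suggest the model generalizes to the held-out rows. For a classification problem, a metric such as accuracy would take the place of RMSE.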
 +
 +----
 +====Possible sources of interesting datasets====
 +
 +CrowdFlower
 +
 +KDD Cup
 +
 +UCI repository
 +
 +Kaggle (current and past)
 +
 +Data.gov
 +
 +AWS
 +
 +World Bank
 +
 +BYU CS478 datasets
 +
 +data.utah.gov
 +
 +Google Research
 +
 +BYU DSC competition
  
  
cs401r_w2016/fp.1459795464.txt.gz · Last modified: 2021/06/30 23:40 (external edit)