cs401r_w2016:lab12

Differences

This shows you the differences between two versions of the page.

cs401r_w2016:lab12 [2018/03/16 20:27]
wingated
cs401r_w2016:lab12 [2021/06/30 23:42] (current)
Line 10: Line 10:
   - A notebook containing your code, but we will not run it.   - A notebook containing your code, but we will not run it.
   - A set of predictions for a specific list of <user,movie> pairs, in a CSV file.   - A set of predictions for a specific list of <user,movie> pairs, in a CSV file.
-  - A report discussing your approach, how well it worked (in terms of RMSE), and any visualizations or patterns you found in the data.  PDF format, please!+  - A report discussing your approach, how well it worked (in terms of RMSE), and any visualizations or patterns you found in the data.  Markdown format, please!!
  
 We will run a small "competition" on your predictions: the three students with the best predictions will get 10% extra credit on this lab. We will run a small "competition" on your predictions: the three students with the best predictions will get 10% extra credit on this lab.
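The report is expected to discuss performance in terms of RMSE. As a minimal sketch, assuming you hold out part of the training ratings and have the true and predicted ratings as equal-length arrays, RMSE could be computed as follows. The function name `rmse` and the sample values are illustrative only and are not part of the lab's provided code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length rating arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up held-out ratings and a constant baseline that predicts 3 everywhere.
held_out = [4.0, 3.0, 5.0, 2.0]
baseline = [3.0, 3.0, 3.0, 3.0]
score = rmse(held_out, baseline)
```

Comparing a model's held-out RMSE against a constant baseline like this one gives a quick sense of whether the model has learned anything beyond the average rating.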
Line 21: Line 21:
 Your entry will be graded on the following elements: Your entry will be graded on the following elements:
  
-  * 100% Project writeup +  * 85% Project writeup 
-    * 35% Exploratory data analysis +    * 30% Exploratory data analysis 
-    * 35% Description of technical approach +    * 30% Description of technical approach 
-    * 30% Analysis of performance of method+    * 25% Analysis of performance of method 
 +  * 15% Submission of predictions csv file
   * 10% extra credit for the three top predictions   * 10% extra credit for the three top predictions
  
Line 48: Line 49:
 As part of this lab, you must submit a set of predictions.  You must provide predictions as a simple CSV file with two columns and 85,000 rows.  Each row has the form As part of this lab, you must submit a set of predictions.  You must provide predictions as a simple CSV file with two columns and 85,000 rows.  Each row has the form
  
-''testID,predicted rating''+''testID,predicted_rating''
  
 The ''testID'' field uniquely identifies each ''user,movie'' prediction pair in the predictions set. The ''testID'' field uniquely identifies each ''user,movie'' prediction pair in the predictions set.
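Before submitting, it may help to check that a predictions file matches the format described above. The following is a rough sketch, not an official checker: the function name `check_submission` is hypothetical, and it assumes the file starts with a header row naming the two columns `testID` and `predicted_rating`.

```python
import pandas as pd

EXPECTED_COLUMNS = ["testID", "predicted_rating"]

def check_submission(path, expected_rows=85000):
    """Return a list of problems found in a predictions CSV (empty if none)."""
    df = pd.read_csv(path)
    problems = []
    if list(df.columns) != EXPECTED_COLUMNS:
        problems.append("unexpected columns: %s" % list(df.columns))
        return problems  # the remaining checks rely on the column names
    if len(df) != expected_rows:
        problems.append("expected %d rows, found %d" % (expected_rows, len(df)))
    if df["testID"].duplicated().any():
        problems.append("duplicate testID values")
    if df["predicted_rating"].isna().any():
        problems.append("missing predicted_rating values")
    return problems
```

Calling `check_submission('predictions.csv')` returns an empty list when the file has the expected header, 85,000 rows, unique IDs, and no missing ratings.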
Line 88: Line 89:
 import seaborn import seaborn
 import pandas import pandas
 +import numpy as np
  
 ur = pandas.read_csv('user_ratedmovies_train.dat', sep='\t') ur = pandas.read_csv('user_ratedmovies_train.dat', sep='\t')
Line 104: Line 106:
  
 </code> </code>
 +
 +And here is some code that writes out the prediction file you will submit:
 +
 +<code python>
 +
 +import numpy as np
 +import pandas as pd
 +
 +# Load the list of test IDs to predict
 +pred_array = pd.read_table('predictions.dat')
 +test_ids = pred_array["testID"].to_numpy()
 +pred_array.head()
 +
 +N = pred_array.shape[0]
 +my_preds = np.zeros(N)
 +
 +for i in range(N):  # Prediction loop
 +    predicted_rating = 3
 +    my_preds[i] = predicted_rating  # This predicts everything as 3
 +
 +# Write one "testID,predicted_rating" row per prediction
 +with open('predictions.csv', 'w') as sfile:
 +    sfile.write('"testID","predicted_rating"\n')
 +    for i in range(N):
 +        sfile.write('%d,%.2f\n' % (test_ids[i], my_preds[i]))
 +
 +</code>
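The same file can also be written without an explicit loop. This is a minimal sketch, assuming the test IDs are integers as in the code above. The helper name `write_predictions` and the example IDs are illustrative. Note that pandas writes the header unquoted, which CSV readers parse the same way as the quoted header.

```python
import numpy as np
import pandas as pd

def write_predictions(test_ids, ratings, path="predictions.csv"):
    """Write testID,predicted_rating rows with ratings formatted to 2 decimals."""
    out = pd.DataFrame({
        "testID": np.asarray(test_ids, dtype=int),
        "predicted_rating": np.asarray(ratings, dtype=float),
    })
    out.to_csv(path, index=False, float_format="%.2f")
    return out

# Example with made-up IDs; in the lab, the IDs would come from predictions.dat.
example_ids = [0, 1, 2]
example_preds = np.full(len(example_ids), 3.0)  # predict 3 for everything
```

Calling `write_predictions(example_ids, example_preds)` writes a header row followed by one row per ID to `predictions.csv`.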
 +
cs401r_w2016/lab12.1521232054.txt.gz · Last modified: 2021/06/30 23:40 (external edit)