cs401r_w2016:lab12 (current revision 2021/06/30 23:42; previous revision 2018/03/16 20:27 by wingated)
  - A notebook containing your code (we will not run it).
  - A set of predictions for a specific list of <user,movie> pairs, in a CSV file.
  - A report discussing your approach, how well it worked (in terms of RMSE), and any visualizations or patterns you found in the data. Markdown format, please!
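The report asks how well your approach worked in terms of RMSE. You do not have the true ratings for the test pairs, so one common approach (an assumption here, not a lab requirement) is to hold out a random slice of the training ratings and measure RMSE on it. This is a minimal sketch; `rmse` and `holdout_split` are hypothetical helper names, not part of the lab materials:

<code python>
import numpy as np


def rmse(predicted, actual):
    """Root-mean-squared error between two equal-length sequences of ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    if predicted.shape != actual.shape:
        raise ValueError("predicted and actual must have the same shape")
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


def holdout_split(n_rows, test_fraction=0.1, seed=0):
    """Randomly split row indices 0..n_rows-1 into (train_idx, test_idx)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_fraction))
    return perm[n_test:], perm[:n_test]


# A constant prediction of 3 against four held-out ratings:
# errors are -1, 1, 0, -2, so RMSE = sqrt(6 / 4), roughly 1.22
print(rmse([3, 3, 3, 3], [4, 2, 3, 5]))
</code>

Fitting on the training indices and reporting `rmse` on the held-out indices gives a number you can compare across approaches in your report.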
We will run a small "competition" on your predictions: the three students with the best predictions will get 10% extra credit on this lab.
Your entry will be graded on the following elements:
  * 85% Project writeup
    * 30% Exploratory data analysis
    * 30% Description of technical approach
    * 25% Analysis of performance of method
  * 15% Submission of predictions CSV file
  * 10% extra credit for the three top predictions
As part of this lab, you must submit a set of predictions as a simple CSV file with two columns and 85,000 rows (one per test pair). Each row has the form
''testID,predicted_rating''
The ''testID'' field uniquely identifies each ''user,movie'' prediction pair in the predictions set.
import seaborn
import pandas
import numpy as np

# Load the tab-separated training ratings (sep must be passed by keyword in recent pandas)
ur = pandas.read_csv('user_ratedmovies_train.dat', sep='\t')
# ...
</code>
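The lab leaves the choice of method to you. As one possible starting point (a standard baseline, not the lab's prescribed method), the sketch below predicts each rating as a global mean plus a per-movie and a per-user bias. It assumes the training data loaded as ''ur'' has columns ''userID'', ''movieID'' and ''rating''; check ''ur.columns'' before relying on that. ''fit_baseline'', ''predict_baseline'' and the ''reg'' parameter are hypothetical names, and the default clipping range [0.5, 5.0] assumes MovieLens-style half-star ratings:

<code python>
import numpy as np
import pandas as pd


def fit_baseline(ratings, reg=10.0):
    """Fit a global mean plus regularized movie and user biases.

    Assumes `ratings` has columns 'userID', 'movieID' and 'rating'.
    Larger `reg` pulls the biases of rarely rated movies/users toward zero.
    """
    mu = ratings['rating'].mean()
    dev = ratings['rating'] - mu

    # Movie bias: shrunken average deviation from the global mean
    by_movie = dev.groupby(ratings['movieID'])
    movie_bias = by_movie.sum() / (by_movie.count() + reg)

    # User bias: shrunken average of what the movie bias leaves unexplained
    resid = dev - ratings['movieID'].map(movie_bias)
    by_user = resid.groupby(ratings['userID'])
    user_bias = by_user.sum() / (by_user.count() + reg)
    return mu, user_bias, movie_bias


def predict_baseline(model, user_ids, movie_ids, lo=0.5, hi=5.0):
    """Predict mu + user bias + movie bias, clipped to [lo, hi].

    Users or movies never seen in training get a bias of 0.
    """
    mu, user_bias, movie_bias = model
    b_u = pd.Series(user_ids).map(user_bias).fillna(0.0).to_numpy()
    b_m = pd.Series(movie_ids).map(movie_bias).fillna(0.0).to_numpy()
    return np.clip(mu + b_u + b_m, lo, hi)
</code>

For example, ''model = fit_baseline(ur)'' followed by ''predict_baseline(model, users, movies)'', given arrays of the test pairs' user and movie IDs, could stand in for the constant 3 in the prediction loop. The ''reg'' term keeps a movie with a single 5-star rating from getting a huge bias; tune it on held-out RMSE.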

Here is some code that writes out the prediction file you will submit. As written, it predicts a rating of 3 for every pair; replace the body of the prediction loop with your own model's predictions:

<code python>
import numpy as np
import pandas as pd

# Load the <user,movie> pairs to predict; each row has a unique testID
pred_array = pd.read_table('predictions.dat')
test_ids = pred_array["testID"].to_numpy()
print(pred_array.head())

N = pred_array.shape[0]
my_preds = np.zeros(N)

for i in range(N):  ### Prediction loop
    predicted_rating = 3  ### This predicts everything as 3
    my_preds[i] = predicted_rating

# Write the submission: a header row, then one "testID,predicted_rating" row per pair
with open('predictions.csv', 'w') as sfile:
    sfile.write('"testID","predicted_rating"\n')
    for i in range(N):
        sfile.write('%d,%.2f\n' % (test_ids[i], my_preds[i]))
</code>

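Before submitting, you may want to sanity-check the file against the format described earlier: a ''testID,predicted_rating'' header, then 85,000 two-column rows. This is a minimal sketch using Python's standard ''csv'' module; ''check_submission'' is a hypothetical helper, not part of the lab materials, and it checks only the file's structure, not the quality of the ratings:

<code python>
import csv


def check_submission(path, expected_rows=85000):
    """Sanity-check a predictions file: header, row count, numeric fields, unique IDs.

    Returns a list of problem descriptions (empty if none were found).
    """
    problems = []
    with open(path, newline='') as f:
        reader = csv.reader(f)
        # csv.reader strips the optional quotes around header fields
        header = next(reader, None)
        if header != ['testID', 'predicted_rating']:
            problems.append('unexpected header: %r' % (header,))
        seen = set()
        n = 0
        for lineno, row in enumerate(reader, start=2):
            n += 1
            if len(row) != 2:
                problems.append('line %d: expected 2 fields, got %d' % (lineno, len(row)))
                continue
            try:
                test_id = int(row[0])
                float(row[1])
            except ValueError:
                problems.append('line %d: non-numeric field' % lineno)
                continue
            if test_id in seen:
                problems.append('line %d: duplicate testID %d' % (lineno, test_id))
            seen.add(test_id)
    if n != expected_rows:
        problems.append('expected %d data rows, found %d' % (expected_rows, n))
    return problems
</code>

Running ''print(check_submission('predictions.csv'))'' after writing the file should print an empty list if the structure looks right.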