cs401r_w2016:lab14 [2018/02/12 21:59] sadler
===Part 4: Implementation of Subset of Regressors===
Please follow this description of the subset of regressors approach. In particular, on Monday we discussed partitioning your dataset into $m$ landmarks and the remaining $n$ data points. Don't do that. Instead, think of the $m$ landmarks as re-using points in your dataset -- so $m+n>n$. Your dataset contains $n$ training points, with $n$ x-values and $n$ y-values. Depending on your landmark selection algorithm, the $m$ landmarks could be the same as some of the training points. So, for example: if you have $n=1000$ training points and you randomly pick $m=5$ landmark points, you will effectively have $n+m=1005$ points, but $5$ of those are re-used.
So: in all of the math below, the number $n$ refers to **all** of your training data.
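To make the landmark convention concrete, here is a minimal sketch of one possible landmark-selection step. The variable names and the random-sampling strategy are illustrative assumptions, not part of the lab spec; the point is only that the landmarks are drawn from (and may coincide with) the full training set, which keeps all $n$ points.

```python
import numpy as np

# Illustrative sketch (names and data are placeholders, not from the lab):
# draw m landmarks from the n training points. The landmarks re-use
# training points -- the training set itself is NOT reduced to n - m points.
rng = np.random.default_rng(0)

n = 1000                        # all training points
X = rng.normal(size=(n, 1))     # x-values (placeholder data)
y = rng.normal(size=n)          # y-values (placeholder data)

m = 5                           # number of landmarks
landmark_idx = rng.choice(n, size=m, replace=False)
X_landmarks = X[landmark_idx]   # m landmarks, drawn from the same dataset

# All n points remain in the training set; the m landmarks overlap with it.
assert X.shape[0] == n
assert X_landmarks.shape[0] == m
```

In other words, the "effective" $n+m=1005$ points above are just the $1000$ training points plus $5$ re-used copies; no point is ever removed from the training set.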
sfile = open( 'mean_sub.csv', 'w' )
sfile.write( '"Id","Sales"\n' )
for id in range( 0, N ):