====Objective:====

To read current papers on DNN research and translate them into working models. To experiment with DNN-style regularization methods, including Dropout, Dropconnect, and L1 weight regularization.

----
**Important note**: the dropconnect paper has a somewhat more sophisticated inference method (that is, the method used at test time). **We will not use that method.** Instead, we will use the same inference approximation used by the Dropout paper -- we will simply scale things by the ''keep_probability''.

You should scan across the same values of ''keep_probability'', and you should generate a similar plot.

Dropconnect seems to want more training steps than dropout, so you should run the optimizer for 1500 iterations.
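
For concreteness, here is a minimal numpy sketch of that train/test behavior. The function and variable names are illustrative only (not part of the lab scaffold); in your TensorFlow graph you would express the same idea with the equivalent tensor ops.

<code python>
import numpy as np

def dropconnect_layer(x, W, b, keep_probability, train=True):
    # Illustrative helper: a fully-connected layer with dropconnect on W.
    if train:
        # Training: drop individual weights with a random binary mask;
        # each weight is kept with probability keep_probability.
        mask = (np.random.rand(*W.shape) < keep_probability).astype(W.dtype)
        return x.dot(W * mask) + b
    else:
        # Inference: no mask -- simply scale the weights by keep_probability,
        # the same approximation the Dropout paper uses at test time.
        return x.dot(W * keep_probability) + b
</code>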

**Part 3: implement L1 regularization**

For this part, you should implement L1 regularization on the weights. This will change your computation graph a bit, and specifically will change your cost function -- instead of optimizing just ''cross_entropy'', you should optimize ''cross_entropy + lam*regularizers'', where ''lam'' is the \lambda regularization parameter from the slides. You should regularize all of the weights and biases (six variables in total).

You should create a plot of test/training performance as you scan across values of lambda. You should test at least [0.1, 0.01, 0.001].

Note: unlike the dropout/dropconnect regularizers, you will probably not be able to improve test-time performance!
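
As a sanity check on the cost function, here is a small numpy sketch of that expression. The helper name and the ''variables'' list are made up for illustration; in your graph you would build the same sum with the corresponding TensorFlow ops over your six weight/bias variables.

<code python>
import numpy as np

def l1_cost(cross_entropy, variables, lam):
    # 'variables' stands in for the six weight/bias arrays of the network.
    # The L1 penalty is the sum of absolute values of every parameter.
    regularizers = sum(np.sum(np.abs(v)) for v in variables)
    return cross_entropy + lam * regularizers
</code>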

----

====Hints:====

To generate a random binary matrix, you can use ''np.random.rand'' to generate a matrix of random values between 0 and 1, and then only keep those above a certain threshold.
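
For example (a quick numpy sketch; the 100x100 shape and the ''keep_probability'' value are just placeholders):

<code python>
import numpy as np

keep_probability = 0.75             # placeholder value
threshold = 1.0 - keep_probability
# np.random.rand gives uniform values in [0, 1); keeping only the entries
# above the threshold yields a binary mask whose entries are 1 with
# probability keep_probability.
mask = (np.random.rand(100, 100) > threshold).astype(np.float32)
</code>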