- You must implement dropout (NOT using the pre-defined Tensorflow layers)
- You must implement dropconnect
- You must experiment with L1 weight regularization

You should turn in an iPython notebook that shows three plots, one for each of the regularization methods.
Line 36: | Line 36: | ||
This lab is a chance for you to start reading the literature on deep neural networks, and understand how to replicate methods from the literature. You will implement three different regularization methods, and will benchmark each one.
- | |||
- | Please note tat | ||
To help ensure that everyone is starting off on the same footing, you should download the following scaffold code:
For all three methods, we will run on a single, deterministic batch of the first 1000 images from the MNIST dataset. This will help us to overfit, and will hopefully be small enough not to tax your computers too much.
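If you are curious how such a batch might be constructed, here is one possible way to grab it with the standard TensorFlow MNIST tutorial loader. The scaffold code may already do this for you; the path and variable names below are only illustrative.

<code python>
# One possible way to grab a fixed, deterministic batch of the first 1000
# MNIST training images (the scaffold code may already handle this).
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Slicing (rather than calling next_batch) keeps the batch deterministic.
batch_xs = mnist.train.images[0:1000]
batch_ys = mnist.train.labels[0:1000]
</code>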
**Part 1: implement dropout**
For the first part of the lab, you should implement dropout. The paper upon which you should base your implementation is found at:

[[https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf|The dropout paper]]

The relevant equations are found in section 4 (pg 1933). You may also refer to the slides.
There are several notes to help you with this part:

- First, you should run the provided code as-is. It will overfit on the first 1000 images (how do you know this?). Record the test and training accuracy; this will be the "baseline" line in your plot.
- Second, you should add dropout to each of the ''h1'', ''h2'', and ''h3'' layers.
- You must consider carefully how to use tensorflow to implement dropout.
- Remember that when you test images (or when you compute training set accuracy), you must scale activations by the ''keep_probability'', as discussed in class and in the paper (a minimal sketch is given at the end of this part).
- You should use the Adam optimizer, and optimize for 150 steps.
Once you understand dropout, implementing it is not hard; you should only have to add ~10 lines of code.

Also note that because dropout involves some randomness, your curve may not match mine exactly; this is expected.
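To make the notes above concrete, here is a minimal sketch of one way to hand-roll dropout. It assumes the scaffold exposes an activation tensor such as ''h1'', and that you feed ''keep_prob'' and ''is_training'' yourself; those names are illustrative, not part of the scaffold.

<code python>
import tensorflow as tf

# Illustrative placeholders -- you will need to feed these yourself.
keep_prob = tf.placeholder(tf.float32, shape=[])   # e.g. 0.5 while training
is_training = tf.placeholder(tf.bool, shape=[])    # False when measuring accuracy

def dropout(h):
    def train():
        # Zero each unit independently with probability (1 - keep_prob).
        mask = tf.cast(tf.random_uniform(tf.shape(h)) < keep_prob, tf.float32)
        return h * mask
    def test():
        # Keep every unit, but scale activations by keep_prob,
        # as in section 4 of the paper.
        return h * keep_prob
    return tf.cond(is_training, train, test)

h1 = dropout(h1)   # and likewise for h2 and h3
</code>

Whether you use ''tf.cond'', a separate test-time graph, or something else is up to you; the key point is that units are dropped randomly during training and activations are scaled by the keep probability when you measure accuracy.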
**Part 2: implement dropconnect**
Dropconnect seems to want more training steps than dropout, so you should run the optimizer for 1500 iterations.
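As a starting point, here is a rough sketch of the training-time masking, assuming a fully connected layer with variables named ''x'', ''W1'', and ''b1'' (illustrative names; use whatever the scaffold defines). The only real change from dropout is that the Bernoulli mask is applied to the weights rather than to the activations; you still need to handle test time appropriately, analogous to what you did for dropout.

<code python>
import tensorflow as tf

keep_prob = tf.placeholder(tf.float32, shape=[])

def dropconnect(W):
    # Zero each individual weight independently with probability (1 - keep_prob).
    mask = tf.cast(tf.random_uniform(tf.shape(W)) < keep_prob, tf.float32)
    return W * mask

# Training-time forward pass for one layer (illustrative variable names):
h1 = tf.nn.relu(tf.matmul(x, dropconnect(W1)) + b1)
</code>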
**Part 3: implement L1 regularization**

For this part, you should implement L1 regularization on the weights. This will change your computation graph a bit, and specifically will change your cost function -- instead of optimizing just ''cross_entropy'', you should optimize ''cross_entropy + lam*regularizers'', where ''lam'' is the \lambda regularization parameter from the slides. You should regularize all of the weights and biases (six variables in total).

You should create a plot of test/training performance as you scan across values of lambda. You should test at least [0.1, 0.01, 0.001].
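Here is a minimal sketch of the modified cost, assuming the six variables are named ''W1, b1, W2, b2, W3, b3'' and the scaffold already defines ''cross_entropy'' (again, the names are illustrative -- use whatever the scaffold actually defines):

<code python>
import tensorflow as tf

lam = 0.01   # one of the lambda values you scan over (or feed it via a placeholder)

# L1 penalty: the sum of absolute values of every weight and bias.
regularizers = (tf.reduce_sum(tf.abs(W1)) + tf.reduce_sum(tf.abs(b1)) +
                tf.reduce_sum(tf.abs(W2)) + tf.reduce_sum(tf.abs(b2)) +
                tf.reduce_sum(tf.abs(W3)) + tf.reduce_sum(tf.abs(b3)))

loss = cross_entropy + lam * regularizers
train_step = tf.train.AdamOptimizer().minimize(loss)
</code>

Rebuilding the graph (or feeding ''lam'' through a placeholder) once per lambda value and recording the final train/test accuracy gives you the points for the scan plot.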