====Objective:====
  
To read current papers on DNN research and translate them into working models.  To experiment with DNN-style regularization methods, including Dropout, Dropconnect, and L1 weight regularization.
  
----
  - You must implement dropout (NOT using the pre-defined Tensorflow layers)
  - You must implement dropconnect
  - You must implement L1 weight regularization
  
You should turn in an IPython notebook that shows three plots, one for each of the regularization methods.
  - For dropout: a plot showing training / test performance as a function of the "keep probability".
  - For dropconnect: the same
  - For L1: a plot showing training / test performance as a function of the regularization strength, \lambda
  
An example of my training/test performance for dropout is shown at the right.
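
One way to produce such a plot is to wrap graph construction and training in a helper and scan over the hyperparameter.  The sketch below is only an illustration: ''train_and_eval'' is a hypothetical function you would write yourself (it should build the model for the given ''keep_prob'', train it, and return the final training and test accuracies), and the scan values are just examples.

<code python>
import matplotlib.pyplot as plt

def train_and_eval(keep_prob):
    """Placeholder: build the graph with this keep_prob, train it, and
    return (train_accuracy, test_accuracy).  Replace with your own code."""
    raise NotImplementedError

keep_probs = [0.1, 0.25, 0.5, 0.75, 1.0]   # example scan values
train_accs, test_accs = [], []

for kp in keep_probs:
    tr_acc, te_acc = train_and_eval(kp)
    train_accs.append(tr_acc)
    test_accs.append(te_acc)

plt.plot(keep_probs, train_accs, label="train accuracy")
plt.plot(keep_probs, test_accs, label="test accuracy")
plt.xlabel("keep probability")
plt.ylabel("accuracy")
plt.legend()
plt.show()
</code>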
  
----
  * 40% Correct implementation of Dropout
  * 30% Correct implementation of Dropconnect
  * 20% Correct implementation of L1 regularization
  * 10% Tidy and legible plots
  
  
This lab is a chance for you to start reading the literature on deep neural networks, and understand how to replicate methods from the literature.  You will implement three different regularization methods, and will benchmark each one.
  
To help ensure that everyone is starting off on the same footing, you should download the following scaffold code:
  
[[http://liftothers.org/byu/lab6_scaffold.py|Lab 6 scaffold code]]
  
For all three methods, we will run on a single, deterministic batch of the first 1000 images from the MNIST dataset.  This will help us to overfit, and will hopefully be small enough not to tax your computers too much.
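
The scaffold presumably sets this batch up for you, but for reference, grabbing a fixed batch with the MNIST helper that shipped with TensorFlow 1.x looked roughly like this (the data directory name is arbitrary):

<code python>
from tensorflow.examples.tutorials.mnist import input_data

# Load MNIST via the TF 1.x tutorial helper, with one-hot labels.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# A single, deterministic batch: the first 1000 training images and labels.
batch_xs = mnist.train.images[:1000]   # shape (1000, 784)
batch_ys = mnist.train.labels[:1000]   # shape (1000, 10)
</code>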
  
**Part 1: implement dropout**
For the first part of the lab, you should implement dropout.  The paper upon which you should base your implementation is found at:
  
[[https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf|The dropout paper]]
  
The relevant equations are found in section 4 (pg 1933).  You may also refer to the slides.
There are several notes to help you with this part:
  
  - First, you should run the provided code as-is.  It will overfit on the first 1000 images (how do you know this?).  Record the test and training accuracy; this will be the "baseline" line in your plot.
  - Second, you should add dropout to each of the ''h1'', ''h2'', and ''h3'' layers.
  - You must consider carefully how to use tensorflow to implement dropout (one possible structure is sketched after these notes).
  - Remember that when you test images (or when you compute training set accuracy), you must scale activations by the ''keep_probability'', as discussed in class and in the paper.
  - You should use the Adam optimizer, and optimize for 150 steps.
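
To make note 3 concrete, here is a minimal sketch of one way to structure a hand-rolled dropout layer.  It assumes the TensorFlow 1.x graph-mode API in use when this lab was written; the names ''h'', ''keep_prob'', and ''is_training'' are stand-ins for whatever your own code defines, and this is only one possible design, not a required one.

<code python>
import tensorflow as tf  # assumes the TensorFlow 1.x graph-mode API

def my_dropout(h, keep_prob, is_training):
    """Manual dropout on a layer's activations (no tf.nn.dropout).

    h:           activations of a hidden layer (e.g. h1, h2, or h3)
    keep_prob:   float or scalar tensor in (0, 1]
    is_training: scalar boolean tensor, e.g. a tf.placeholder(tf.bool)
    """
    # Bernoulli(keep_prob) mask: floor(keep_prob + U[0,1)) equals 1 with
    # probability keep_prob and 0 otherwise.
    mask = tf.floor(keep_prob + tf.random_uniform(tf.shape(h)))

    # Training: zero out units, with no rescaling.
    # Testing (and training-set accuracy): keep all units, scale by keep_prob.
    return tf.cond(is_training,
                   lambda: h * mask,
                   lambda: h * keep_prob)
</code>

With a helper like this, each of ''h1'', ''h2'', and ''h3'' gets wrapped in one call, and you feed ''is_training'' as false whenever you compute training-set or test-set accuracy.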
  
  
Once you understand dropout, implementing it is not hard; you should only have to add ~10 lines of code.

Also note that because dropout involves some randomness, your curve may not match mine exactly; this is expected.
  
**Part 2: implement dropconnect**
**Important note**: the dropconnect paper has a somewhat more sophisticated inference method (that is, the method used at test time).  **We will not use that method.** Instead, we will use the same inference approximation used by the Dropout paper -- we will simply scale things by the ''keep_probability''.
  
You should scan across the same values of ''keep_probability'', and you should generate a similar plot.
  
Dropconnect seems to want more training steps than dropout, so you should run the optimizer for 1500 iterations.
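
As a sketch only (again assuming the TensorFlow 1.x API, with hypothetical names), the change from dropout is that the Bernoulli mask is applied to the weight matrix rather than to the activations, and inference simply scales the weights by ''keep_prob'':

<code python>
import tensorflow as tf  # assumes the TensorFlow 1.x graph-mode API

def my_dropconnect(x, W, b, keep_prob, is_training):
    """One fully connected layer with dropconnect applied to its weights.

    x: the layer's input; W, b: that layer's weight matrix and bias.
    Returns the pre-activation; apply your nonlinearity to the result.
    """
    # Bernoulli(keep_prob) mask over individual weights, not activations.
    mask = tf.floor(keep_prob + tf.random_uniform(tf.shape(W)))

    return tf.cond(is_training,
                   lambda: tf.matmul(x, W * mask) + b,       # training: drop weights
                   lambda: tf.matmul(x, W * keep_prob) + b)  # inference: scale weights
</code>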
  
**Part 3: implement L1 regularization**

For this part, you should implement L1 regularization on the weights.  This will change your computation graph a bit, and specifically will change your cost function -- instead of optimizing just ''cross_entropy'', you should optimize ''cross_entropy + lam*regularizers'', where ''lam'' is the \lambda regularization parameter from the slides.  You should regularize all of the weights and biases (six variables in total).
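
For example, assuming the scaffold names its parameters ''W1, b1, W2, b2, W3, b3'' and already defines ''cross_entropy'' (adjust the names to match your code -- this is a sketch, not the required implementation), the regularized cost might be built like this:

<code python>
import tensorflow as tf  # assumes the TensorFlow 1.x graph-mode API

lam = 0.01  # the \lambda regularization strength you will scan over

# L1 penalty: sum of the absolute values of every weight and bias variable.
params = [W1, b1, W2, b2, W3, b3]   # the six scaffold variables (assumed names)
l1_penalty = tf.add_n([tf.reduce_sum(tf.abs(v)) for v in params])

# Optimize the penalized objective instead of cross_entropy alone.
loss = cross_entropy + lam * l1_penalty
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)
</code>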

You should create a plot of test/training performance as you scan across values of lambda.  You should test at least [0.1, 0.01, 0.001].

Note: unlike the dropout/dropconnect regularizers, you will probably not be able to improve test time performance!

----
====Hints:====

To generate a random binary matrix, you can use ''np.random.rand'' to generate a matrix of random values between 0 and 1, and then only keep those above a certain threshold.
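
Concretely, that hint amounts to something like the following (the shape and ''keep_prob'' value are arbitrary examples):

<code python>
import numpy as np

keep_prob = 0.75
# Each entry is kept (1.0) with probability keep_prob, dropped (0.0) otherwise.
mask = (np.random.rand(500, 300) > (1.0 - keep_prob)).astype(np.float32)
</code>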
  