====Objective:====

To read current papers on DNN research and translate them into working models. To experiment with DNN-style regularization methods, including Dropout, Dropconnect, and L1/L2 weight regularization.

----
====Deliverable:====

{{ :cs501r_f2016:lab6_do.png?direct&200|}}

For this lab, you will need to implement three different regularization methods from the literature, and explore the parameters of each.

  - You must implement dropout (NOT using the pre-defined TensorFlow layers)
  - You must implement dropconnect
  - You must experiment with L1/L2 weight regularization

You should turn in an IPython notebook that shows three plots, one for each of the regularization methods.

  - For dropout: a plot showing training / test performance as a function of the "keep probability".
  - For dropconnect: the same
  - For L1/L2: a plot showing training / test performance as a function of the regularization strength, \lambda

An example of my training/test performance is shown at the right.

----
====Grading standards:====

Your notebook will be graded on the following:

  * 40% Correct implementation of Dropout
  * 30% Correct implementation of Dropconnect
  * 20% Correct implementation of L1/L2 regularization
  * 10% Tidy and legible plots

----
====Description:====

This lab is a chance for you to start reading the literature on deep neural networks, and understand how to replicate methods from the literature. You will implement 4 different regularization methods (dropout, dropconnect, L1, and L2 weight regularization), and will benchmark each one.

Please note that, to help ensure that everyone is starting off on the same footing, you should download the following scaffold code:

**For all parts**

For all 4 methods, we will run on a single, deterministic batch of the first 1000 images from the MNIST dataset. This will help us to keep everyone's results consistent and comparable.

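As a sanity check on the data setup, here is a minimal sketch of grabbing such a fixed batch with the TF 1.x MNIST helper; the directory name is illustrative, and you should rely on whatever loading code the scaffold actually provides.

<code python>
# Sketch only -- the scaffold's own data-loading code may differ.
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Deterministic batch: the first 1000 training images and labels.
train_images = mnist.train.images[:1000]   # shape (1000, 784)
train_labels = mnist.train.labels[:1000]   # shape (1000, 10)

# We still evaluate on the full 10,000-image test set.
test_images = mnist.test.images
test_labels = mnist.test.labels
</code>
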
**Part 1: implement dropout**

For the first part of the lab, you should implement dropout. The paper upon which you should base your implementation is found at:

[[https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf|Dropout]]

The relevant equations are found in section 4 (pg 1933). You may also refer to the slides.

There are several notes to help you with this part:

  - First, you should run the provided code as-is. It will overfit on the first 1000 images (how do you know this?). Record the training and test accuracy of this baseline run.
  - Second, you should add dropout to each of the ''h1'', ''h2'', and ''h3'' layers.
  - You must consider carefully how to use TensorFlow to implement dropout.
  - Remember that you must scale activations by the ''keep_probability'', as discussed in class and in the paper; one possible structure is sketched after this list.
  - You should use the Adam optimizer, and optimize for 150 steps.

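Here is one possible way to structure the dropout computation, written in TF 1.x style. This is a sketch under assumptions, not the scaffold's actual code: the names ''x'', ''W1'', ''b1'', ''keep_prob'', ''is_training'', and the layer size are all illustrative.

<code python>
import tensorflow as tf

# Illustrative placeholders/variables -- your scaffold will define its own.
x = tf.placeholder(tf.float32, [None, 784])
keep_prob = tf.placeholder(tf.float32)    # the keep probability p
is_training = tf.placeholder(tf.bool)     # True while optimizing, False at test time

def my_dropout(acts, p, training):
    # Training: multiply by a random binary mask, dropping each unit with probability 1-p.
    mask = tf.cast(tf.random_uniform(tf.shape(acts)) < p, tf.float32)
    # Test time: the paper's inference approximation -- scale the activations by p.
    return tf.cond(training, lambda: acts * mask, lambda: acts * p)

W1 = tf.Variable(tf.truncated_normal([784, 500], stddev=0.1))
b1 = tf.Variable(tf.zeros([500]))
h1 = my_dropout(tf.nn.relu(tf.matmul(x, W1) + b1), keep_prob, is_training)
</code>
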
Note that although we are training on only the first 1000 images, we are testing on the entire 10,000 image test set.

In order to generate the final plot, you will need to scan across multiple values of the ''keep_probability''. You may wish to refactor the provided code in order to make this easier. You should test at least the values ''[ 0.1, 0.25, 0.5, 0.75, 1.0 ]''.

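One hypothetical way to organize the scan and the plot is sketched below; ''run_experiment'' is a function you would write yourself when refactoring (it is not part of the scaffold), and it is assumed to train the model and return the final training and test accuracies.

<code python>
import matplotlib.pyplot as plt

keep_probs = [0.1, 0.25, 0.5, 0.75, 1.0]
train_accs, test_accs = [], []
for kp in keep_probs:
    # Hypothetical helper: builds the graph, trains for 150 Adam steps at this
    # keep probability, and returns (train_accuracy, test_accuracy).
    train_acc, test_acc = run_experiment(keep_prob_value=kp, num_steps=150)
    train_accs.append(train_acc)
    test_accs.append(test_acc)

plt.plot(keep_probs, train_accs, label="train")
plt.plot(keep_probs, test_accs, label="test")
plt.xlabel("keep probability")
plt.ylabel("accuracy")
plt.legend()
plt.show()
</code>
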
Once you understand dropout, implementing it is not hard; you should only have to add ~10 lines of code.

**Part 2: implement dropconnect**

The specifications for this part are similar to part 1. Once you have implemented Dropout, it should be very easy to modify your code to perform dropconnect. The paper upon which you should base your implementation is

[[http://www.jmlr.org/proceedings/papers/v28/wan13.pdf|The dropconnect paper]]

**Important note**: the dropconnect paper has a somewhat more sophisticated inference method (that is, the method used at test time). **We will not use that method.** Instead, we will use the same inference approximation used by the Dropout paper -- we will simply scale things by the ''keep_probability''.

You should scan across the same values of ''keep_probability'', and you should generate the same plot.

Dropconnect seems to want more training steps than dropout, so you should run the optimizer for 1500 iterations.

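A sketch of the change relative to the dropout sketch in part 1 is shown below; it reuses the same illustrative names (''x'', ''W1'', ''b1'', ''keep_prob'', ''is_training'') and is not the scaffold's actual code.

<code python>
import tensorflow as tf

def my_dropconnect(W, p, training):
    # Training: zero each individual weight with probability 1-p (mask the weights,
    # not the activations).
    mask = tf.cast(tf.random_uniform(tf.shape(W)) < p, tf.float32)
    # Test time: the simple Dropout-style approximation (scale by p), NOT the
    # dropconnect paper's more sophisticated inference procedure.
    return tf.cond(training, lambda: W * mask, lambda: W * p)

h1 = tf.nn.relu(tf.matmul(x, my_dropconnect(W1, keep_prob, is_training)) + b1)
</code>
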
**Part 3: implement L1/L2 regularization**

For this part, you should implement both L1 and L2 regularization on the weights. This will change your computation graph a bit, and specifically will change your cost function -- instead of optimizing just ''cross_entropy'', you should optimize ''cross_entropy + lam*regularizers'', where ''lam'' is the \lambda regularization parameter from the slides. You should regularize all of the weights and biases (six variables in total).

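A minimal sketch of the modified cost is below, assuming the six variables are named ''W1'', ''W2'', ''W3'' and ''b1'', ''b2'', ''b3'' and that a ''cross_entropy'' op already exists in the graph; your scaffold's names may differ.

<code python>
import tensorflow as tf

# Illustrative: collect all six trainable variables from the network.
params = [W1, b1, W2, b2, W3, b3]

lam = tf.placeholder(tf.float32)   # the regularization strength \lambda

# L2: sum of squared entries; L1: sum of absolute values.
l2_reg = tf.add_n([tf.reduce_sum(tf.square(p)) for p in params])
l1_reg = tf.add_n([tf.reduce_sum(tf.abs(p)) for p in params])

# Optimize the regularized objective instead of cross_entropy alone
# (swap in l1_reg or l2_reg depending on which regularizer you are testing).
loss = cross_entropy + lam * l2_reg
train_step = tf.train.AdamOptimizer(0.001).minimize(loss)
</code>
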
You should create a plot of test/training performance as you scan across values of lambda. You should test at least ''[ 0.1, 0.01, 0.001 ]''.

Note: unlike the dropout/dropconnect regularizers, you will probably not be able to improve test time performance!

----
====Hints:====

To generate a random binary matrix, you can use ''np.random.rand'' to generate a matrix of random values between 0 and 1, and then only keep those above a certain threshold.
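
For example (a sketch; the matrix shape and keep probability here are arbitrary):

<code python>
import numpy as np

keep_prob = 0.75
# Keep entries whose uniform draw lies above the threshold 1 - keep_prob,
# so each entry survives with probability keep_prob.
mask = (np.random.rand(500, 500) > (1.0 - keep_prob)).astype(np.float32)
</code>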
  