====Objective:====
  
To read current papers on DNN research and translate them into working models.  To experiment with DNN-style regularization methods, including Dropout, Dropconnect, and L1 weight regularization.
  
----
  - You must implement dropout (NOT using the pre-defined Tensorflow layers)
  - You must implement dropconnect
  - You must implement L1 weight regularization
  
You should turn in an IPython notebook that shows three plots, one for each of the regularization methods.
  - For dropout: a plot showing training / test performance as a function of the "keep probability".
  - For dropconnect: the same
  - For L1: a plot showing training / test performance as a function of the regularization strength, \lambda
  
An example of my training/test performance for dropout is shown at the right.
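
One way to produce such a plot is to wrap graph construction and training in a helper and scan over the hyperparameter.  The sketch below is only an illustration: ''train_and_eval'' is a hypothetical function you would write yourself (it should build the model for the given ''keep_prob'', train it, and return the final training and test accuracies), and the scan values are just examples.

<code python>
import matplotlib.pyplot as plt

def train_and_eval(keep_prob):
    """Placeholder: build the graph with this keep_prob, train it, and
    return (train_accuracy, test_accuracy).  Replace with your own code."""
    raise NotImplementedError

keep_probs = [0.1, 0.25, 0.5, 0.75, 1.0]   # example scan values
train_accs, test_accs = [], []

for kp in keep_probs:
    tr_acc, te_acc = train_and_eval(kp)
    train_accs.append(tr_acc)
    test_accs.append(te_acc)

plt.plot(keep_probs, train_accs, label="train accuracy")
plt.plot(keep_probs, test_accs, label="test accuracy")
plt.xlabel("keep probability")
plt.ylabel("accuracy")
plt.legend()
plt.show()
</code>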
  
----
  * 40% Correct implementation of Dropout
  * 30% Correct implementation of Dropconnect
  * 20% Correct implementation of L1 regularization
  * 10% Tidy and legible plots
  
  
This lab is a chance for you to start reading the literature on deep neural networks, and understand how to replicate methods from the literature.  You will implement three different regularization methods, and will benchmark each one.
  
To help ensure that everyone is starting off on the same footing, you should download the following scaffold code:
  
[[http://liftothers.org/byu/lab6_scaffold.py|Lab 6 scaffold code]]
  
For all three methods, we will run on a single, deterministic batch of the first 1000 images from the MNIST dataset.  This will help us to overfit, and will hopefully be small enough not to tax your computers too much.
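
The scaffold presumably sets this batch up for you, but for reference, grabbing a fixed batch with the MNIST helper that shipped with TensorFlow 1.x looked roughly like this (the data directory name is arbitrary):

<code python>
from tensorflow.examples.tutorials.mnist import input_data

# Load MNIST via the TF 1.x tutorial helper, with one-hot labels.
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# A single, deterministic batch: the first 1000 training images and labels.
batch_xs = mnist.train.images[:1000]   # shape (1000, 784)
batch_ys = mnist.train.labels[:1000]   # shape (1000, 10)
</code>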
  
**Part 1: implement dropout**
For the first part of the lab, you should implement dropout.  The paper upon which you should base your implementation is found at:
  
[[https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf|The dropout paper]]
  
The relevant equations are found in section 4 (pg 1933).  You may also refer to the slides.
There are several notes to help you with this part:
  
  - First, you should run the provided code as-is.  It will overfit on the first 1000 images (how do you know this?).  Record the test and training accuracy; this will be the "baseline" line in your plot.
  - Second, you should add dropout to each of the ''h1'', ''h2'', and ''h3'' layers.
  - You must consider carefully how to use tensorflow to implement dropout (one possible structure is sketched after these notes).
  - Remember that when you test images (or when you compute training set accuracy), you must scale activations by the ''keep_probability'', as discussed in class and in the paper.
  - You should use the Adam optimizer, and optimize for 150 steps.
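
To make note 3 concrete, here is a minimal sketch of one way to structure a hand-rolled dropout layer.  It assumes the TensorFlow 1.x graph-mode API in use when this lab was written; the names ''h'', ''keep_prob'', and ''is_training'' are stand-ins for whatever your own code defines, and this is only one possible design, not a required one.

<code python>
import tensorflow as tf  # assumes the TensorFlow 1.x graph-mode API

def my_dropout(h, keep_prob, is_training):
    """Manual dropout on a layer's activations (no tf.nn.dropout).

    h:           activations of a hidden layer (e.g. h1, h2, or h3)
    keep_prob:   float or scalar tensor in (0, 1]
    is_training: scalar boolean tensor, e.g. a tf.placeholder(tf.bool)
    """
    # Bernoulli(keep_prob) mask: floor(keep_prob + U[0,1)) equals 1 with
    # probability keep_prob and 0 otherwise.
    mask = tf.floor(keep_prob + tf.random_uniform(tf.shape(h)))

    # Training: zero out units, with no rescaling.
    # Testing (and training-set accuracy): keep all units, scale by keep_prob.
    return tf.cond(is_training,
                   lambda: h * mask,
                   lambda: h * keep_prob)
</code>

With a helper like this, each of ''h1'', ''h2'', and ''h3'' gets wrapped in one call, and you feed ''is_training'' as false whenever you compute training-set or test-set accuracy.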
  
  
Once you understand dropout, implementing it is not hard; you should only have to add ~10 lines of code.

Also note that because dropout involves some randomness, your curve may not match mine exactly; this is expected.
  
**Part 2: implement dropconnect**
**Important note**: the dropconnect paper has a somewhat more sophisticated inference method (that is, the method used at test time).  **We will not use that method.** Instead, we will use the same inference approximation used by the Dropout paper -- we will simply scale things by the ''keep_probability''.
  
You should scan across the same values of ''keep_probability'', and you should generate a similar plot.
  
Dropconnect seems to want more training steps than dropout, so you should run the optimizer for 1500 iterations.
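
As a sketch only (again assuming the TensorFlow 1.x API, with hypothetical names), the change from dropout is that the Bernoulli mask is applied to the weight matrix rather than to the activations, and inference simply scales the weights by ''keep_prob'':

<code python>
import tensorflow as tf  # assumes the TensorFlow 1.x graph-mode API

def my_dropconnect(x, W, b, keep_prob, is_training):
    """One fully connected layer with dropconnect applied to its weights.

    x: the layer's input; W, b: that layer's weight matrix and bias.
    Returns the pre-activation; apply your nonlinearity to the result.
    """
    # Bernoulli(keep_prob) mask over individual weights, not activations.
    mask = tf.floor(keep_prob + tf.random_uniform(tf.shape(W)))

    return tf.cond(is_training,
                   lambda: tf.matmul(x, W * mask) + b,       # training: drop weights
                   lambda: tf.matmul(x, W * keep_prob) + b)  # inference: scale weights
</code>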
  
**Part 3: implement L1 regularization**

For this part, you should implement L1 regularization on the weights.  This will change your computation graph a bit, and specifically will change your cost function -- instead of optimizing just ''cross_entropy'', you should optimize ''cross_entropy + lam*regularizers'', where ''lam'' is the \lambda regularization parameter from the slides.  You should regularize all of the weights and biases (six variables in total).
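
For example, assuming the scaffold names its parameters ''W1, b1, W2, b2, W3, b3'' and already defines ''cross_entropy'' (adjust the names to match your code -- this is a sketch, not the required implementation), the regularized cost might be built like this:

<code python>
import tensorflow as tf  # assumes the TensorFlow 1.x graph-mode API

lam = 0.01  # the \lambda regularization strength you will scan over

# L1 penalty: sum of the absolute values of every weight and bias variable.
params = [W1, b1, W2, b2, W3, b3]   # the six scaffold variables (assumed names)
l1_penalty = tf.add_n([tf.reduce_sum(tf.abs(v)) for v in params])

# Optimize the penalized objective instead of cross_entropy alone.
loss = cross_entropy + lam * l1_penalty
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)
</code>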

You should create a plot of test/training performance as you scan across values of lambda.  You should test at least [0.1, 0.01, 0.001].

Note: unlike the dropout/dropconnect regularizers, you will probably not be able to improve test time performance!

----
====Hints:====

To generate a random binary matrix, you can use ''np.random.rand'' to generate a matrix of random values between 0 and 1, and then only keep those above a certain threshold.
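
Concretely, that hint amounts to something like the following (the shape and ''keep_prob'' value are arbitrary examples):

<code python>
import numpy as np

keep_prob = 0.75
# Each entry is kept (1.0) with probability keep_prob, dropped (0.0) otherwise.
mask = (np.random.rand(500, 300) > (1.0 - keep_prob)).astype(np.float32)
</code>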
  