====Objective:====
  
To read current papers on DNN research and translate them into working models. To experiment with DNN-style regularization methods, including Dropout, Dropconnect, and L1/L2 weight regularization.
  
----
====Deliverable:====
  
{{ :cs501r_f2016:lab6_do.png?direct&200|}}
  
For this lab, you will need to implement three different regularization methods from the literature, and explore the parameters of each.
  
  - You must implement dropout (NOT using the pre-defined Tensorflow layers)
  - You must implement dropconnect
  - You must implement L1 weight regularization
  
You should turn in an iPython notebook that shows three plots, one for each of the regularization methods:
  
  - For dropout: a plot showing training / test performance as a function of the "keep probability".
  - For dropconnect: the same kind of plot.
  - For L1: a plot showing training / test performance as a function of the regularization strength, \lambda.
  
An example of my training/test performance for dropout is shown at the right.
  
----
====Grading standards:====

Your notebook will be graded on the following:
  
  * 40% Correct implementation of Dropout
  * 30% Correct implementation of Dropconnect
  * 20% Correct implementation of L1 regularization
  * 10% Tidy and legible plots
  
----
====Description:====
  
This lab is a chance for you to start reading the literature on deep neural networks, and to understand how to replicate methods from the literature. You will implement 4 different regularization methods, and will benchmark each one.
  
To help ensure that everyone is starting off on the same footing, you should download the following scaffold code:

[[http://liftothers.org/byu/lab6_scaffold.py|Lab 6 scaffold code]]
  
For all 4 methods, we will run on a single, deterministic batch of the first 1000 images from the MNIST dataset. This will help us to overfit, and will hopefully be small enough not to tax your computers too much.
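
The scaffold handles the data for you, but if you want to see one possible way such a fixed batch could be constructed, here is a sketch (the path and variable names are purely illustrative, not necessarily what the scaffold uses):

<code python>
# A sketch of loading a fixed batch of the first 1000 MNIST images.
# The scaffold may already do something equivalent; names here are illustrative.
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

train_images = mnist.train.images[:1000]   # single, deterministic training batch
train_labels = mnist.train.labels[:1000]
test_images  = mnist.test.images           # full 10,000 image test set
test_labels  = mnist.test.labels
</code>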
  
**Part 1: implement dropout**
  
For the first part of the lab, you should implement dropout. The paper upon which you should base your implementation is found at:
  
[[https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf|The dropout paper]]
  
The relevant equations are found in section 4 (pg. 1933). You may also refer to the slides.
  
There are several notes to help you with this part:
  
  - First, you should run the provided code as-is. It will overfit on the first 1000 images (how do you know this?). Record the test and training accuracy; this will be the "baseline" line in your plot.
  - Second, you should add dropout to each of the ''h1'', ''h2'', and ''h3'' layers.
  - You must consider carefully how to use Tensorflow to implement dropout; one possible approach is sketched just after this list.
  - Remember that when you test images (or when you compute training set accuracy), you must scale activations by the ''keep_probability'', as discussed in class and in the paper.
  - You should use the Adam optimizer, and optimize for 150 steps.
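
Here is a minimal sketch of one way the mask could be built (this is not the official solution; ''h1'', ''keep_prob'', and ''is_training'' are illustrative names, and you should adapt the idea to the scaffold):

<code python>
import tensorflow as tf

def dropout(h, keep_prob, is_training):
    # Bernoulli(keep_prob) mask: floor(U + keep_prob) equals 1 with
    # probability keep_prob when U is uniform on [0, 1).
    mask = tf.floor(tf.random_uniform(tf.shape(h)) + keep_prob)
    return tf.cond(is_training,
                   lambda: h * mask,        # training: randomly drop activations
                   lambda: h * keep_prob)   # testing: scale activations instead

# Illustrative wiring (assumes h1 is the output of your first layer):
# keep_prob   = tf.placeholder(tf.float32)
# is_training = tf.placeholder(tf.bool, shape=[])
# h1_drop     = dropout(h1, keep_prob, is_training)
</code>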
  
Note that although we are training on only the first 1000 images, we are testing on the entire 10,000 image test set.
  
In order to generate the final plot, you will need to scan across multiple values of the ''keep_probability''. You may wish to refactor the provided code in order to make this easier. You should test at least the values ''[ 0.1, 0.25, 0.5, 0.75, 1.0 ]''.
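
One possible shape for the outer scan, assuming you have refactored your training code into a function (the name ''train_and_evaluate'' below is purely hypothetical):

<code python>
import matplotlib.pyplot as plt

# train_and_evaluate is a hypothetical function you would write: it builds the
# graph, trains with the given keep probability, and returns the final
# (train_accuracy, test_accuracy).
keep_probs = [0.1, 0.25, 0.5, 0.75, 1.0]
train_accs, test_accs = [], []

for kp in keep_probs:
    train_acc, test_acc = train_and_evaluate(keep_prob=kp)
    train_accs.append(train_acc)
    test_accs.append(test_acc)

plt.plot(keep_probs, train_accs, label="train")
plt.plot(keep_probs, test_accs, label="test")
plt.xlabel("keep probability")
plt.ylabel("accuracy")
plt.legend()
plt.show()
</code>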
  
Once you understand dropout, implementing it is not hard; you should only have to add ~10 lines of code.
  
Also note that because dropout involves some randomness, your curve may not match mine exactly; this is expected.
  
**Part 2: implement dropconnect**
  
The specifications for this part are similar to part 1. Once you have implemented Dropout, it should be very easy to modify your code to perform dropconnect. The paper upon which you should base your implementation is:
  
[[http://www.jmlr.org/proceedings/papers/v28/wan13.pdf|The dropconnect paper]]
  
**Important note**: the dropconnect paper has a somewhat more sophisticated inference method (that is, the method used at test time). **We will not use that method.** Instead, we will use the same inference approximation used by the Dropout paper -- we will simply scale things by the ''keep_probability''.
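
As a rough sketch (again with illustrative names, and again using the dropout-style scaling at test time rather than the paper's inference method), a dropconnect layer only differs from dropout in where the mask is applied:

<code python>
import tensorflow as tf

def dropconnect_layer(x, W, b, keep_prob, is_training):
    # Bernoulli(keep_prob) mask over the *weights*, not the activations.
    mask = tf.floor(tf.random_uniform(tf.shape(W)) + keep_prob)
    W_eff = tf.cond(is_training,
                    lambda: W * mask,        # training: drop individual connections
                    lambda: W * keep_prob)   # testing: scale weights instead
    return tf.nn.relu(tf.matmul(x, W_eff) + b)
</code>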
  
You should scan across the same values of ''keep_probability'', and you should generate a similar plot.
  
Dropconnect seems to want more training steps than dropout, so you should run the optimizer for 1500 iterations.
  
**Part 3: implement L1 regularization**
  
For this part, you should implement L1 regularization on the weights. This will change your computation graph a bit, and specifically will change your cost function -- instead of optimizing just ''cross_entropy'', you should optimize ''cross_entropy + lam*regularizers'', where ''lam'' is the \lambda regularization parameter from the slides. You should regularize all of the weights and biases (six variables in total).
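
As a sketch, assuming your weight and bias variables are named ''W1''/''b1'' through ''W3''/''b3'' (substitute whatever names the scaffold actually defines), the modified cost might look like:

<code python>
import tensorflow as tf

# Illustrative sketch of an L1-regularized cost; W1..W3, b1..b3, and
# cross_entropy are assumed to already exist in your graph under these names.
lam = 0.01   # the \lambda regularization strength; scan over [0.1, 0.01, 0.001]

regularizers = (tf.reduce_sum(tf.abs(W1)) + tf.reduce_sum(tf.abs(b1)) +
                tf.reduce_sum(tf.abs(W2)) + tf.reduce_sum(tf.abs(b2)) +
                tf.reduce_sum(tf.abs(W3)) + tf.reduce_sum(tf.abs(b3)))

cost = cross_entropy + lam * regularizers
train_step = tf.train.AdamOptimizer(0.001).minimize(cost)
</code>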
  
You should create a plot of test/training performance as you scan across values of \lambda. You should test at least the values ''[ 0.1, 0.01, 0.001 ]''.
 + 
Note: unlike the dropout/dropconnect regularizers, you will probably not be able to improve test-time performance!
  
----
====Hints:====
  
To generate a random binary matrix, you can use ''np.random.rand'' to generate a matrix of random values between 0 and 1, and then only keep those above a certain threshold.
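
For example, something along these lines (the matrix shape here is arbitrary):

<code python>
# One way to build a random binary mask with numpy, following the hint above.
import numpy as np

keep_prob = 0.5
# rand() gives uniform values in [0, 1); keeping those above (1 - keep_prob)
# yields a matrix of 0s and 1s where each entry is 1 with probability keep_prob.
mask = (np.random.rand(784, 500) > (1.0 - keep_prob)).astype(np.float32)
</code>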
  