====Objective:====
----
====Deliverable:====

{{ :cs501r_f2017:faces_samples.png?direct&200|}}
  
For this lab, you will need to implement a generative adversarial network and train it on a dataset of faces.
**NOTE:** this lab is complex.  Please read through **the entire spec** before diving in.
Also note that training on this dataset will likely take some time.  Please make sure you start early enough to let the training run long enough!

{{ :cs501r_f2017:faces_interpolate.png?direct&200|}}
  
----
====Grading standards:====
  * 20% Correct implementation of discriminator
  * 20% Correct implementation of generator
  * 50% Correct implementation of training algorithm
  * 10% Tidy and legible final image
  
In addition, we'll be able to create networks that generate neat images!
  
==Part 0: Implement a generator network==
  
One of the advantages of the "Improved WGAN Training" algorithm is that many different kinds of topologies can be used.  For this lab, I recommend one of three options:
  
  * The [[https://arxiv.org/pdf/1511.06434.pdf|DCGAN architecture]]; see Fig. 1
  * A [[https://arxiv.org/pdf/1512.03385|ResNet]]
  * Our reference implementation (sketched below) used 5 layers:
      * A fully connected layer
      * 4 transposed convolution layers, each followed by relu and batch norm layers (except for the final layer)
      * A final tanh
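
To make that layer list concrete, here is a minimal sketch of what such a generator might look like.  All layer sizes, strides, kernel sizes, and names (e.g. the 4x4x512 starting feature map and the ''g_'' prefix) are my assumptions for illustration, not requirements of the spec:

<code python>
import tensorflow as tf

def generator(z, batch_size=64, reuse=False):
    # Hypothetical 5-layer generator: a fully connected layer, then 4
    # transposed convolutions with relu + batch norm (none on the last),
    # followed by a tanh.  Produces 64x64x3 images in [-1, 1].
    with tf.variable_scope("gen", reuse=reuse):
        h = tf.layers.dense(z, 4 * 4 * 512, name="g_fc")
        h = tf.reshape(h, [batch_size, 4, 4, 512])
        h = tf.nn.relu(tf.layers.batch_normalization(h, training=True, name="g_bn0"))
        for i, filters in enumerate([256, 128, 64]):
            h = tf.layers.conv2d_transpose(h, filters, 5, strides=2,
                                           padding="same", name="g_dc%d" % i)
            h = tf.nn.relu(tf.layers.batch_normalization(h, training=True,
                                                         name="g_bn%d" % (i + 1)))
        # Final transposed convolution: no batch norm, tanh activation.
        h = tf.layers.conv2d_transpose(h, 3, 5, strides=2,
                                       padding="same", name="g_dc3")
        return tf.nn.tanh(h)
</code>

Note the ''g_'' prefix on every variable-creating layer; this will matter when we separate the trainable variables in Part 2.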
  
==Part 1: Implement a discriminator network==
  
Again, you are encouraged to use either a DCGAN-like architecture or a ResNet.
  
Our reference implementation used 4 convolution layers, each followed by a leaky relu (leak 0.2) and a batch norm layer (except no batch norm on the first layer).
  
Note that the discriminator simply outputs a single scalar value.  This value should be unconstrained (i.e., it can be positive or negative), so you should **not** use a relu/sigmoid on the output of your network.
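
As with the generator, here is a minimal sketch along these lines; the filter counts and kernel sizes are assumptions of mine:

<code python>
def lrelu(x, leak=0.2):
    # leaky relu with leak 0.2
    return tf.maximum(leak * x, x)

def discriminator(x, reuse=False):
    # Hypothetical 4-layer critic: leaky relu everywhere, batch norm on
    # all but the first layer, and an unconstrained scalar output (no
    # sigmoid!).
    with tf.variable_scope("disc", reuse=reuse):
        h = lrelu(tf.layers.conv2d(x, 64, 5, strides=2, padding="same", name="d_c0"))
        for i, filters in enumerate([128, 256, 512]):
            h = tf.layers.conv2d(h, filters, 5, strides=2, padding="same",
                                 name="d_c%d" % (i + 1))
            h = lrelu(tf.layers.batch_normalization(h, training=True,
                                                    name="d_bn%d" % (i + 1)))
        h = tf.layers.flatten(h)
        return tf.layers.dense(h, 1, name="d_out")  # one scalar per image
</code>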
  
==Part 2: Implement the Improved Wasserstein GAN training algorithm==
  
The implementation of the improved Wasserstein GAN training algorithm (hereafter called "WGAN-GP") is fairly straightforward, but involves a few new details about tensorflow:
  
  * **Gradient norm penalty.**  First of all, you must compute the gradient of the output of the discriminator with respect to x-hat.  To do this, you should use the ''tf.gradients'' function.  (A sketch appears after the code block below.)
  * **Reuse of variables.**  Remember that because the discriminator is being called multiple times, you must ensure that you do not create new copies of the variables.  Note that ''scope'' objects have a ''reuse_variables()'' function.
  * **Trainable variables.**  In the algorithm, two different Adam optimizers are created, one for the generator, and one for the discriminator.  You must make sure that each optimizer is only training the proper subset of variables!  There are multiple ways to accomplish this.  For example, you could use scopes, or construct the set of trainable variables by examining their names and seeing if they start with "d_" or "g_":
<code python>
t_vars = tf.trainable_variables()
self.d_vars = [var for var in t_vars if 'd_' in var.name]
self.g_vars = [var for var in t_vars if 'g_' in var.name]
</code>
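
For the gradient norm penalty itself, a minimal sketch might look like the following.  The names (''x_hat'', ''lambda_gp'', ''true_images'', ''sample_images'', ''batch_size'') and the reuse of my ''discriminator'' function from Part 1 are assumptions; ''lambda_gp'' is the penalty weight from the hyperparameter block below:

<code python>
# Critic outputs on real and sampled images.  The first call creates the
# variables; later calls must reuse them.
d_real = discriminator(true_images)
d_fake = discriminator(sample_images, reuse=True)

# Interpolate between real and generated images (x-hat in the paper).
eps = tf.random_uniform([batch_size, 1, 1, 1], 0.0, 1.0)
x_hat = eps * true_images + (1.0 - eps) * sample_images

# Penalize deviation of the critic's gradient norm (w.r.t. x_hat) from 1.
grads = tf.gradients(discriminator(x_hat, reuse=True), [x_hat])[0]
grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]))
grad_penalty = lambda_gp * tf.reduce_mean(tf.square(grad_norm - 1.0))

# WGAN-GP losses, framed as minimization.
d_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real) + grad_penalty
g_loss = -tf.reduce_mean(d_fake)
</code>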
  
I didn't try to optimize the hyperparameters; these are the values that I used:
  
<code python>
beta1 = 0.5      # paper suggests 0.0
beta2 = 0.999    # paper suggests 0.9
lambda_gp = 10   # gradient penalty weight ("lambda" is a reserved word in python)
ncritic = 1      # paper suggests 5
alpha = 0.0002   # learning rate; paper suggests 0.0001
m = 64           # batch size

batch_norm_decay = 0.9
batch_norm_epsilon = 1e-5
</code>
  
Changing the number of critic steps from 5 to 1 didn't seem to matter; changing the alpha parameter to 0.0001 didn't seem to matter; but changing beta1 and beta2 to the values suggested in the paper (0.0 and 0.9, respectively) seemed to make things a lot worse.
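
Putting the pieces together, the two optimizers and the alternating training loop might look something like this.  Again, this is only a sketch: ''d_loss'', ''g_loss'', and the variable lists come from the snippets above, and ''next_batch'', ''num_steps'', and the 100-dimensional ''z'' are hypothetical placeholders of mine:

<code python>
import numpy as np

# One Adam optimizer per network, each restricted to its own variables.
# (If you use tf.layers.batch_normalization, remember to also run the
# ops in tf.GraphKeys.UPDATE_OPS.)
d_optim = tf.train.AdamOptimizer(alpha, beta1=beta1, beta2=beta2) \
            .minimize(d_loss, var_list=self.d_vars)
g_optim = tf.train.AdamOptimizer(alpha, beta1=beta1, beta2=beta2) \
            .minimize(g_loss, var_list=self.g_vars)

for step in range(num_steps):
    # ncritic critic updates per generator update.
    for _ in range(ncritic):
        imgs = next_batch(m)                     # hypothetical data loader
        zs = np.random.uniform(-1, 1, (m, 100))  # assuming a 100-d z
        sess.run(d_optim, feed_dict={true_images: imgs, z: zs})
    zs = np.random.uniform(-1, 1, (m, 100))
    sess.run(g_optim, feed_dict={z: zs})
</code>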
  
==Part 3: Generating the final face images==
  
Your final deliverable is two images.  The first should be a set of randomly generated faces.  This is as simple as generating random ''z'' variables, and then running them through your generator.
  
For the second image, you must pick two random ''z'' values, then linearly interpolate between them (using about 8-10 steps).  Plot the face corresponding to each interpolated ''z'' value.
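
Here is a minimal sketch of the interpolation, assuming a 100-dimensional ''z'', a ''z'' placeholder, and a ''sample_images'' op for the generator output (names carried over from the sketches above):

<code python>
z0 = np.random.uniform(-1, 1, (1, 100))
z1 = np.random.uniform(-1, 1, (1, 100))

# 10 evenly spaced points on the line segment between z0 and z1.
zs = np.concatenate([(1.0 - t) * z0 + t * z1 for t in np.linspace(0.0, 1.0, 10)])

# If your graph hard-codes a batch size, pad zs up to that size first.
faces = sess.run(sample_images, feed_dict={z: zs})
</code>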
  
See the beginning of this lab spec for examples of both images.
  
----
====Hints and implementation notes:====
  
The reference implementation was trained for 8 hours on a GTX 1070.  It ran for 25 epochs (i.e., 25 full scans through all 200,000 images), with batches of size 64 (3125 batches / epoch).
  
That said, it might work with far fewer (i.e., 2) epochs...