====Objective:====

----
====Deliverable:====

{{ :cs501r_f2017:faces_samples.png?direct&200|}}
  
For this lab, you will need to implement a generative adversarial network, and train it to generate images of faces.
**NOTE:** this lab is complex.  Please read through **the entire spec** before diving in.

Also note that training on this dataset will likely take some time.  Please make sure you start early enough to run the training long enough!

{{ :cs501r_f2017:faces_interpolate.png?direct&200|}}
  
----
====Grading standards:====

  * 20% Correct implementation of discriminator
  * 20% Correct implementation of generator
  * 50% Correct implementation of training algorithm
  * 10% Tidy and legible final image
  
In addition, we'll be able to create networks that generate neat images!
  
==Part 0: Implement a generator network==

One of the advantages of the "Improved WGAN Training" algorithm is that many different kinds of topologies can be used.  For this lab, I recommend one of three options:

  * The [[https://arxiv.org/pdf/1511.06434.pdf|DCGAN architecture]] (see Fig. 1 of that paper)
  * A [[https://arxiv.org/pdf/1512.03385|ResNet]]
  * Our reference implementation, which used 5 layers (a sketch appears after this list):
      * A fully connected layer
      * 4 transposed convolution layers, each followed by a relu and a batch norm layer (except for the final layer)
      * A final tanh nonlinearity
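
Here is a minimal sketch of that third, reference-style option, written against the TensorFlow 1.x layers API.  The 100-dimensional ''z'', the 4x4x512 starting grid, the 5x5 kernels, and the 64x64x3 output size are all assumptions made for illustration, not requirements of the spec:

<code python>
import tensorflow as tf

def generator(z, training=True):
    # 1 fully connected layer + 4 transposed convolutions, roughly following the
    # reference description above.  The 100-dim z, the 4x4x512 starting grid,
    # the 5x5 kernels, and the 64x64x3 output are assumptions.
    with tf.variable_scope("g_net"):
        h = tf.layers.dense(z, 4 * 4 * 512)
        h = tf.reshape(h, [-1, 4, 4, 512])
        h = tf.nn.relu(tf.layers.batch_normalization(h, momentum=0.9, epsilon=1e-5, training=training))
        for filters in [256, 128, 64]:  # 4x4 -> 8x8 -> 16x16 -> 32x32
            h = tf.layers.conv2d_transpose(h, filters, 5, strides=2, padding="same")
            h = tf.nn.relu(tf.layers.batch_normalization(h, momentum=0.9, epsilon=1e-5, training=training))
        # final transposed convolution: 32x32 -> 64x64, no batch norm, tanh output
        h = tf.layers.conv2d_transpose(h, 3, 5, strides=2, padding="same")
        return tf.tanh(h)
</code>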
  
==Part 1: Implement a discriminator network==

Again, you are encouraged to use either a DCGAN-like architecture, or a ResNet.

Our reference implementation used 4 convolution layers, each followed by a leaky relu (leak 0.2) and a batch norm layer (except no batch norm on the first layer).

Note that the discriminator simply outputs a single scalar value.  This value should be unconstrained (ie, it can be positive or negative), so you should **not** use a relu/sigmoid on the output of your network.
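
As a concrete (but not required) example, here is a sketch in the same style as the generator sketch above: 4 strided convolutions with a leaky relu (leak 0.2) after each, batch norm on all but the first, and a single unconstrained scalar output.  The filter counts, 5x5 kernels, and 64x64x3 input size are again assumptions:

<code python>
import tensorflow as tf

def lrelu(x, leak=0.2):
    # leaky relu with a leak of 0.2
    return tf.maximum(x, leak * x)

def discriminator(imgs, reuse=False, training=True):
    # imgs is assumed to be [batch, 64, 64, 3].  reuse=True lets us call this
    # function several times while sharing a single set of variables (see Part 2).
    with tf.variable_scope("d_net", reuse=reuse):
        h = lrelu(tf.layers.conv2d(imgs, 64, 5, strides=2, padding="same"))  # 64x64 -> 32x32, no batch norm
        for filters in [128, 256, 512]:                                       # -> 16x16 -> 8x8 -> 4x4
            h = tf.layers.conv2d(h, filters, 5, strides=2, padding="same")
            h = lrelu(tf.layers.batch_normalization(h, momentum=0.9, epsilon=1e-5, training=training))
        h = tf.reshape(h, [-1, 4 * 4 * 512])
        return tf.layers.dense(h, 1)  # one scalar per image; no relu/sigmoid on the output
</code>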
  
==Part 2: Implement the Improved Wasserstein GAN training algorithm==

The implementation of the improved Wasserstein GAN training algorithm (hereafter called "WGAN-GP") is fairly straightforward, but involves a few new details about tensorflow:

  * **Gradient norm penalty.**  First of all, you must compute the gradient of the output of the discriminator with respect to x-hat.  To do this, you should use the ''tf.gradients'' function.  (A sketch of this term appears after the code block below.)
  * **Reuse of variables.**  Remember that because the discriminator is called multiple times, you must ensure that you do not create new copies of the variables.  Note that ''scope'' objects have a ''reuse_variables()'' function.
  * **Trainable variables.**  In the algorithm, two different Adam optimizers are created, one for the generator and one for the discriminator.  You must make sure that each optimizer is only training the proper subset of variables!  There are multiple ways to accomplish this.  For example, you could use scopes, or construct the set of trainable variables by examining their names and seeing if they start with "d_" or "g_":
<code python>
t_vars = tf.trainable_variables()
self.d_vars = [var for var in t_vars if 'd_' in var.name]
self.g_vars = [var for var in t_vars if 'g_' in var.name]
</code>
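
Here is one possible wiring of the gradient penalty and the variable reuse, assuming the ''generator'' and ''discriminator'' sketches from Parts 0 and 1, a 100-dimensional ''z'', and 64x64x3 images.  All of the tensor names are illustrative, not required:

<code python>
import tensorflow as tf

z = tf.placeholder(tf.float32, [None, 100])
true_images = tf.placeholder(tf.float32, [None, 64, 64, 3])

fake_images = generator(z)
d_real = discriminator(true_images)               # first call creates the d_net variables
d_fake = discriminator(fake_images, reuse=True)   # later calls reuse them instead of making copies

# x-hat is a random point on the line between each real image and its paired fake image
eps = tf.random_uniform([tf.shape(true_images)[0], 1, 1, 1], 0.0, 1.0)
x_hat = eps * true_images + (1.0 - eps) * fake_images
d_hat = discriminator(x_hat, reuse=True)

# penalize the critic when the gradient norm at x-hat drifts away from 1
grads = tf.gradients(d_hat, [x_hat])[0]
slopes = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]))
gradient_penalty = 10.0 * tf.reduce_mean(tf.square(slopes - 1.0))  # lambda = 10; see the values below
</code>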
  
I didn't try to optimize the hyperparameters; these are the values that I used:
  
<code python>
beta1 = 0.5      # the WGAN-GP paper suggests 0
beta2 = 0.999    # the WGAN-GP paper suggests 0.9
lam = 10         # gradient penalty coefficient ("lambda" is a reserved word in Python)
ncritic = 1      # the paper suggests 5
alpha = 0.0002   # learning rate; the paper suggests 0.0001
m = 64           # batch size

batch_norm_decay = 0.9
batch_norm_epsilon = 1e-5
</code>
  
Changing the number of critic steps from 5 to 1 didn't seem to matter; changing the alpha parameter to 0.0001 didn't seem to matter; but changing beta1 and beta2 to the values suggested in the paper (0.0 and 0.9, respectively) seemed to make things a lot worse.
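
Putting the pieces together, a sketch of the two loss functions and the two per-subset Adam optimizers might look like the following.  It reuses the tensors from the gradient penalty sketch above and the hyperparameter values just listed; none of the names are required by the spec (the scopes in the earlier sketches were named ''d_net'' and ''g_net'' so that the name filtering works):

<code python>
import tensorflow as tf

# The critic wants D(real) large and D(fake) small; the gradient penalty keeps it
# approximately 1-Lipschitz.  The generator wants D(fake) to be large.
d_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real) + gradient_penalty
g_loss = -tf.reduce_mean(d_fake)

# Each optimizer only updates its own subset of the trainable variables.
t_vars = tf.trainable_variables()
d_vars = [var for var in t_vars if var.name.startswith("d_")]
g_vars = [var for var in t_vars if var.name.startswith("g_")]

d_optim = tf.train.AdamOptimizer(alpha, beta1=beta1, beta2=beta2).minimize(d_loss, var_list=d_vars)
g_optim = tf.train.AdamOptimizer(alpha, beta1=beta1, beta2=beta2).minimize(g_loss, var_list=g_vars)
</code>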
  
==Part 3: Generating the final face images==

Your final deliverable is two images.  The first should be a set of randomly generated faces.  This is as simple as generating random ''z'' variables, and then running them through your generator.

For the second image, you must pick two random ''z'' values, then linearly interpolate between them (using about 8-10 steps).  Plot the face corresponding to each interpolated ''z'' value.

See the beginning of this lab spec for examples of both images.
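
As a rough sketch of how the interpolation image could be produced, assuming a running ''tf.Session'' named ''sess'', the ''z'' placeholder and ''fake_images'' op from the earlier sketches, a standard normal prior over ''z'', and matplotlib for plotting:

<code python>
import numpy as np
import matplotlib.pyplot as plt

# sess, z, and fake_images are assumed to come from the training sketches above
z0 = np.random.normal(size=(100,))
z1 = np.random.normal(size=(100,))

# 10 evenly spaced points on the line between z0 and z1
steps = 10
zs = np.stack([(1.0 - t) * z0 + t * z1 for t in np.linspace(0.0, 1.0, steps)])

imgs = sess.run(fake_images, feed_dict={z: zs})  # [steps, 64, 64, 3], in [-1, 1] from the tanh
imgs = (imgs + 1.0) / 2.0                        # rescale to [0, 1] for plotting

fig, axes = plt.subplots(1, steps, figsize=(2 * steps, 2))
for ax, img in zip(axes, imgs):
    ax.imshow(img)
    ax.axis("off")
fig.savefig("faces_interpolate.png", bbox_inches="tight")
</code>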
  
----
====Hints and implementation notes:====
  
The reference implementation was trained for 8 hours on a GTX 1070.  It ran for 25 epochs (ie, 25 full scans through all 200,000 images), with batches of size 64 (3125 batches / epoch).
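
For reference, the overall training loop has roughly the following shape.  ''next_batch'' and ''sample_z'' are assumed helper functions (for loading a batch of face images and sampling the prior over ''z''); they are not part of any provided code:

<code python>
num_epochs = 25
batches_per_epoch = 3125  # roughly 200,000 images / batch size of 64
ncritic = 1

for epoch in range(num_epochs):
    for step in range(batches_per_epoch):
        # take ncritic critic (discriminator) steps for every generator step
        for _ in range(ncritic):
            sess.run(d_optim, feed_dict={true_images: next_batch(64), z: sample_z(64)})
        sess.run(g_optim, feed_dict={z: sample_z(64)})
</code>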
  
That said, it might work with far fewer (ie, 2) epochs...