cs501r_f2016:lab7

WARNING: THIS LAB SPEC IS UNDER DEVELOPMENT.

Objective:

To learn about deconvolutions, variable sharing, trainable variables, and generative adversarial models.


Deliverable:

For this lab, you will need to implement a generative adversarial network (GAN). Specifically, we will be using the technique outlined in the paper Improved Training of Wasserstein GANs.

You should turn in an IPython notebook that shows two plots. The first plot should show random samples from the final generator. The second should show an interpolation between two faces, produced by interpolating in z space.

You must also turn in your code. It does not need to be in the notebook if it's easier to turn it in separately, but please zip your code and notebook together in a single zip file.

An example of my final samples is shown at the right.

NOTE: this lab is complex. Please read through the entire spec before diving in.


Grading standards:

Your code/image will be graded on the following:

  • 20% Correct implementation of discriminator
  • 20% Correct implementation of generator
  • 20% Correct implementation of loss functions
  • 20% Correct sharing of variables
  • 10% Correct training of subsets of variables
  • 10% Tidy and legible final image

Description:

This lab will help you develop several new TensorFlow skills, as well as understand some best practices needed for building large models. In addition, we'll be able to create networks that generate neat images!

The most important new concepts here are deconvolutions, variable reusing, and trainable variables. Deconvolutions are what we will use to map a z vector to an image. Because we'll want to refer to the discriminator in two different contexts, we'll want to reuse its variables (instead of creating two different discriminators!). And because we'll want to optimize the discriminator and generator separately, we'll need to be able to train on subsets of variables.

In the scaffold code, you will find the following:

  1. A small set of primitives for creating linear layers, convolution layers, and deconvolution layers.
  2. A few placeholders where you should put your models
  3. An optimization loop
  4. A bit of code to visualize samples from the model

An important part of this lab is reading this code, so please take the time to thoroughly read and understand what it's doing.

Let's dive in!


Part 0: naming your variables, and training on subsets of variables

Before filling in any code, we need to think ahead a bit. We're going to create a large-ish computation graph that describes everything about our GAN, including the generator and discriminator. However, when we train the discriminator, we'll want to adjust only the variables involved in the discriminator, and when we train the generator, we'll want to adjust only the variables involved in the generator.

How can we accomplish this? Well, TensorFlow has a handy function called trainable_variables that returns a list of all the trainable variables in your graph. By itself, this isn't quite enough – we still need to distinguish generator variables from discriminator variables.

Here's how I solved this problem: by naming my variables consistently, and then creating a list of only discriminator / generator variables. So, for example, here's how I set up a trainer that optimizes my discriminator loss function (d_loss) by tweaking only discriminator variables (d_vars):

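    # Collect every trainable variable, then keep only those named like discriminator variables
    t_vars = tf.trainable_variables()
    d_vars = [var for var in t_vars if 'd_' in var.name]
    # Minimize d_loss, but let the optimizer update only the variables in d_vars
    d_optim = tf.train.AdamOptimizer( 0.0002, beta1=0.5 ).minimize( d_loss, var_list=d_vars )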

The critical part is the var_list argument: I populated it with only the subset of variables I wanted this optimizer to update.
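
The name filter itself can be tried out without TensorFlow. Here's a minimal sketch using made-up variable names (they are assumptions, not the names the scaffold actually produces). It shows how the substring test partitions a list, and why a prefix test may be safer: 'd_' in name also matches any generator variable whose name happens to contain d_ somewhere in the middle.

```python
# Hypothetical names, standing in for [v.name for v in tf.trainable_variables()].
names = ["d_h0_conv/w:0", "d_h0_conv/b:0", "g_h1_lin/w:0", "g_hid_lin/w:0"]

# Substring test, as in the snippet above: also catches "g_hid_lin/w:0",
# because "hid_" contains "d_".
d_substring = [n for n in names if "d_" in n]

# Prefix test: keeps only the names that begin with "d_" (or "g_").
d_prefix = [n for n in names if n.startswith("d_")]
g_prefix = [n for n in names if n.startswith("g_")]

print(d_substring)
print(d_prefix)
print(g_prefix)
```

Note that a prefix test only works if the names really do start with d_ or g_. If you wrap your models in a variable scope, the scope name comes first, and you would filter on that instead.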

Note that for compatibility with the provided optimization code, you should name your train steps d_optim and g_optim.


Part 1: create your placeholders

What are the inputs to a GAN? At some point, we'll need to be able to pass in a z variable and some real images. So, you'll only need two placeholders in the entire computation graph! If you name them z and true_images, then your code will be compatible with the provided optimization loop.


Part 2: create your discriminator

To start, complete the disc_model function. This is the discriminator. Its job is to accept as input a batch of images (call it imgs), and output a batch of probabilities (where each probability is the probability of the image being a real image).

Your discriminator should have the following layers:

  1. H0: A 2d convolution on imgs with 32 filters, followed by a leaky relu
  2. H1: A 2d convolution on H0 with 64 filters, followed by a leaky relu
  3. H2: A linear layer from H1 to a 1024 dimensional vector, followed by a leaky relu
  4. H3: A linear layer mapping H2 to a single scalar (per image)
  5. The final output should be a sigmoid of H3.
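
For reference, the two nonlinearities in this list can be written in a few lines. This is a scalar, plain-Python sketch. The slope of 0.2 for the leaky relu is an assumption (a common choice), not a value given by this spec, and the scaffold may already provide its own version.

```python
import math

def leaky_relu(x, alpha=0.2):
    """Pass positive inputs through; scale negative inputs by alpha (assumed 0.2)."""
    return x if x > 0 else alpha * x

def sigmoid(x):
    """Squash a real number into (0, 1), so it can be read as a probability."""
    return 1.0 / (1.0 + math.exp(-x))

print(leaky_relu(3.0), leaky_relu(-3.0))
print(sigmoid(0.0))
```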

The hardest part of creating your discriminator will be getting all of the dimensions to line up. Here are a few hints to help you:

  1. The images that are passed in will have dimension of [None,784]. However, that's not compatible with a convolution! So, we need to reshape it. The first line of your function ought to be something like: imgs = tf.reshape( imgs, [ batch_size, 28, 28, 1 ] ). Note that it's 4-dimensional - that's important!
  2. Similarly, the output of the H1 layer will be a 4 dimensional tensor, but it needs to go through a linear layer to get mapped down to 1024 dimensions. The easiest way to accomplish this is to reshape H1 to be 2-dimensional, maybe something like: h1 = tf.reshape( h1, [ batch_size, -1 ] )
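
To check that the dimensions line up before writing any TensorFlow, you can trace the shapes by hand. The sketch below assumes the scaffold's convolution primitive uses a stride of 2 with SAME padding, and a batch size of 64; both are assumptions, so check the scaffold code for the real values. With SAME padding, a convolution's spatial output size is the input size divided by the stride, rounded up.

```python
import math

def conv_same_out(size, stride):
    """Spatial output size of a convolution with SAME padding: ceil(size / stride)."""
    return math.ceil(size / stride)

batch_size = 64   # assumed; use whatever the scaffold defines
stride = 2        # assumed stride of the scaffold's convolution primitive

flat_in = [batch_size, 784]                    # images as passed in
imgs = [batch_size, 28, 28, 1]                 # after the first reshape
h0 = [batch_size, conv_same_out(28, stride), conv_same_out(28, stride), 32]
h1 = [batch_size, conv_same_out(h0[1], stride), conv_same_out(h0[2], stride), 64]
h1_flat = [batch_size, h1[1] * h1[2] * h1[3]]  # what reshape(h1, [batch_size, -1]) yields
h2 = [batch_size, 1024]
h3 = [batch_size, 1]

for name, shape in [("imgs", imgs), ("h0", h0), ("h1", h1),
                    ("h1_flat", h1_flat), ("h2", h2), ("h3", h3)]:
    print(name, shape)
```

Under these assumptions the linear layer H2 sees 7*7*64 = 3136 input features per image. With a stride of 1 it would see 28*28*64 = 50176 instead, so the stride matters a lot for the size of that layer.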

Part 3: create your generator

Now, let's fill in the generator function. The generator's job is to accept a batch of z variables (each of dimension 100), and then return a batch of images (each image will be 28×28, but for compatibility with the discriminator, we will flatten it to a 784-dimensional vector).

Your generator should have the following layers:

  1. H1: A linear layer, mapping z to 128*7*7 features, followed by a relu
  2. D2: a deconvolution layer, mapping H1 to a tensor that is [batch_size,14,14,128], followed by a relu
  3. D3: a deconvolution layer, mapping D2 to a tensor that is [batch_size,28,28,1]
  4. The final output should be sigmoid of D3

Note that you should reshape the final output to be [batch_size,784] for compatibility with the discriminator.
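
The same kind of shape trace works for the generator. This sketch assumes a batch size of 64 and deconvolutions with a stride of 2 and SAME padding, in which case each deconvolution doubles the spatial size; both are assumptions, so check the scaffold's deconvolution primitive for what it actually expects.

```python
def deconv_same_out(size, stride):
    """Spatial output size of a transposed convolution with SAME padding: size * stride."""
    return size * stride

batch_size = 64   # assumed; use whatever the scaffold defines
z_dim = 100
stride = 2        # assumed stride of the scaffold's deconvolution primitive

z = [batch_size, z_dim]
h1_flat = [batch_size, 128 * 7 * 7]   # output of the linear layer
h1 = [batch_size, 7, 7, 128]          # reshaped so it can feed a deconvolution
d2 = [batch_size, deconv_same_out(7, stride), deconv_same_out(7, stride), 128]
d3 = [batch_size, deconv_same_out(d2[1], stride), deconv_same_out(d2[2], stride), 1]
out = [batch_size, d3[1] * d3[2] * d3[3]]   # flattened for the discriminator

for name, shape in [("z", z), ("h1_flat", h1_flat), ("h1", h1),
                    ("d2", d2), ("d3", d3), ("out", out)]:
    print(name, shape)
```

Note the extra reshape between H1 and D2: the linear layer produces a flat vector of 128*7*7 = 6272 features, which has to become a 4-dimensional tensor before it can go through a deconvolution.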


Part 4: create your loss functions and training ops

You should create two loss functions, one for the discriminator, and one for the generator. Refer to the slides on GANs for details on the loss functions. Note that the slides and the following discussion are framed in terms of maximizing, but for consistency with my code (and other labs), you may wish to frame your cost functions in terms of minimization.

This is possibly the hardest part of the lab, even though the code is relatively simple. Here's how we need to wire up all of the pieces:

  1. We need to pass the z variable into the generative model, and call the output sample_images
  2. We need to pass some true images into the discriminator, and get back some probabilities.
  3. We need to pass some sampled images into the discriminator, and get back some (different) probabilities.
  4. We need to construct a loss function for the discriminator that attempts to maximize the log of the output probabilities on the true images, plus the log of (1.0 - the output probabilities) on the sampled images; these two halves can be summed together
  5. We need to construct a loss function for the generator that attempts to maximize the log of the output probabilities on the sampled images
  6. For debugging purposes, I highly recommend you create an additional op called d_acc that calculates classification accuracy on a batch. This can just check the output probabilities of the discriminator on the real and sampled images, and see if they're greater (or less) than 0.5.
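
Here is a minimal plain-Python sketch of items 4 through 6, written in the minimization form (each loss is the negative of the quantity to be maximized). The function names and the probability values are made up for illustration; in your lab code these would be TensorFlow ops built from the discriminator's outputs.

```python
import math

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def d_loss(p_real, p_fake):
    """Negative of mean log D(x) plus mean log(1 - D(G(z)))."""
    return -(mean(math.log(p) for p in p_real) +
             mean(math.log(1.0 - p) for p in p_fake))

def g_loss(p_fake):
    """Negative of mean log D(G(z))."""
    return -mean(math.log(p) for p in p_fake)

def d_acc(p_real, p_fake):
    """Fraction of images the discriminator gets right at a 0.5 threshold."""
    correct = sum(p > 0.5 for p in p_real) + sum(p < 0.5 for p in p_fake)
    return correct / (len(p_real) + len(p_fake))

# Made-up discriminator outputs for 4 real and 4 sampled images.
p_real = [0.9, 0.8, 0.6, 0.4]
p_fake = [0.1, 0.3, 0.2, 0.7]
print(d_loss(p_real, p_fake), g_loss(p_fake), d_acc(p_real, p_fake))

# A discriminator that outputs 0.5 for everything.
undecided = [0.5, 0.5, 0.5, 0.5]
print(d_loss(undecided, undecided), g_loss(undecided))
```

A useful sanity check: a discriminator that outputs 0.5 for every image gives a discriminator loss of 2*ln(2), about 1.39, and a generator loss of ln(2), about 0.69. That is close to the first row of the sample output in Part 5, which is what you'd expect from a freshly initialized network.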

Here's the tricky part. Note that in wiring up our overall model, we need to use the discriminator twice - once on real images, and once on sampled images. You've already coded up a nice function that encapsulates the discriminator, but we don't want to just call it twice – that would create two copies of all of the variables.

Instead, we need to share variables – the idea is that we want to be able to call our discriminator function twice to be able to perform the same classification logic, but use the same variables each time. Tensorflow has a mechanism to help with this, which you should read about here.

Note that the provided layers already use “get_variable”, so sharing variables should be as straightforward as figuring out when to call the reuse_variables function!

I highly recommend using Tensorboard to visualize your final computation graph to make sure you got this right. Check out my computation graph image on the right - you can see the two discriminator blocks, and you can see that the same variables are feeding into both of them.


Part 5: Run it and generate your final image!

Assuming you've named all of your placeholders and ops properly, you can use the provided optimization code. It's set to run for 500 iterations, and print out some debugging information every 10 steps.

Note that the loop takes 3 steps for the generator for every 1 step taken by the discriminator! This is to help maintain the “balance of power” we talked about in class.
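
The provided loop isn't reproduced here, but the update schedule can be sketched as follows, assuming one discriminator update followed by three generator updates per iteration. The function and callback names are hypothetical; in the real loop each callback would be a session run of d_optim or g_optim with the appropriate feed.

```python
def train(num_steps, d_step, g_step, g_steps_per_d=3):
    """Run one discriminator update, then several generator updates, per iteration."""
    for step in range(num_steps):
        d_step()
        for _ in range(g_steps_per_d):
            g_step()

# Stand-in update functions that just count how often they are called.
counts = {"d": 0, "g": 0}

def fake_d_step():
    counts["d"] += 1

def fake_g_step():
    counts["g"] += 1

train(500, fake_d_step, fake_g_step)
print(counts)
```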

Assuming everything has gone well, you should see output something like this:

0       1.37 0.71 0.88
10      0.90 0.98 1.00
20      0.69 0.93 1.00
30      0.89 1.14 0.91
40      0.94 1.06 0.86
50      0.77 1.20 0.96
60      0.59 1.55 0.94
70      0.46 1.47 0.97
80      0.58 1.64 0.94
90      0.42 1.64 0.98
100     0.73 1.14 0.87
110     0.74 1.51 0.91
120     0.78 1.35 0.86
130     1.08 1.31 0.71
140     1.39 0.94 0.61
150     0.90 1.24 0.82
160     1.26 1.00 0.66
170     0.90 1.03 0.81
180     1.02 1.04 0.76
...
490     1.25 1.12 0.68

Note that we see the struggle between the generator and discriminator clearly here. The first column is the step number, the second column is the loss function for the discriminator, the third column is the loss function for the generator, and the final column is the discriminator's classification accuracy.

Initially, the discriminator is able to distinguish almost perfectly between true and fake images, but by the end of training, it's only running at 68% accuracy. Not bad!

Note that for your final image, you may need to train longer – I used 5000 steps, instead of 500.
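
For the interpolation plot requested in the deliverable, one simple approach is to pick two z vectors and blend linearly between them, then feed each blended vector through the generator. Here's a minimal sketch of the blending step only. The helper name, the uniform distribution on [-1, 1], and the choice of 8 steps are all assumptions; use whatever distribution your z is sampled from during training.

```python
import random

def interpolate(z_a, z_b, num_steps):
    """Return num_steps vectors moving linearly from z_a to z_b, endpoints included."""
    assert num_steps >= 2
    path = []
    for i in range(num_steps):
        t = i / (num_steps - 1)   # goes from 0.0 to 1.0
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path

random.seed(0)
z_dim = 100
z_a = [random.uniform(-1.0, 1.0) for _ in range(z_dim)]   # assumed z distribution
z_b = [random.uniform(-1.0, 1.0) for _ in range(z_dim)]

zs = interpolate(z_a, z_b, 8)
print(len(zs), len(zs[0]))
```

Each row of zs is one z vector, so the whole list can be passed to the generator as a single batch (if your graph uses a fixed batch_size, you may need to pad the batch to that size).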

Hint for debugging: if you ever see the cost function for the generator going higher and higher, it means that the discriminator is too powerful.

cs501r_f2016/lab7.txt · Last modified: 2021/06/30 23:42 (external edit)