This shows you the differences between two versions of the page.
cs501r_f2016:lab7 [2016/10/10 16:02] wingated |
cs501r_f2016:lab7 [2021/06/30 23:42] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====Objective:==== | ||
- | To learn about deconvolutions, variable sharing, trainable variables, | ||
- | and generative adversarial models. | ||
- | |||
- | ---- | ||
- | ====Deliverable:==== | ||
- | |||
- | {{ :cs501r_f2016:lab7_gan_results.png?200|}} | ||
- | |||
- | For this lab, you will need to implement a generative adversarial | ||
- | network (GAN). You will generate images that look like MNIST digits. | ||
- | |||
- | You should turn in an iPython notebook that shows a single plot, which | ||
- | will be samples from the final GAN. | ||
- | |||
- | An example of my final samples is shown at the right. | ||
- | |||
- | You are welcome to turn in your image and your code separately. | ||
- | |||
- | **NOTE:** this lab is complex. Please read through **the entire | ||
- | spec** before diving in. | ||
- | |||
- | ---- | ||
- | ====Grading standards:==== | ||
- | |||
- | Your code/image will be graded on the following: | ||
- | |||
- | * 20% Correct implementation of discriminator | ||
- | * 20% Correct implementation of generator | ||
- | * 20% Correct implementation of loss functions | ||
- | * 20% Correct sharing of variables | ||
- | * 10% Correct training of subsets of variables | ||
- | * 10% Tidy and legible final image | ||
- | |||
- | ---- | ||
- | ====Description:==== | ||
- | |||
- | This lab will help you develop several new TensorFlow skills, as well | |
- | as understand some best practices needed for building large models. | ||
- | In addition, we'll be able to create networks that generate neat images! | ||
- | |||
- | The most important new concepts here are //deconvolutions//, | ||
- | //variable reusing//, and //trainable variables//. Deconvolutions are | ||
- | what we will use to map a ''z'' vector to an image. Because we'll | ||
- | want to refer to the discriminator in two different contexts, we'll | ||
- | want to reuse its variables (instead of creating two different | ||
- | discriminators!). And because we'll want to optimize the | ||
- | discriminator and generator separately, we'll need to be able to train | ||
- | on subsets of variables. | ||
- | |||
- | This lab is a bit more complex than some of the others, so we are | ||
- | providing [[http://liftothers.org/byu/lab7_scaffold.py|some scaffold code]]. | |
- | |||
- | In the scaffold code, you will find the following: | ||
- | |||
- | - A small set of primitives for creating linear layers, convolution layers, and deconvolution layers. | ||
- | - A few placeholders where you should put your models | ||
- | - An optimization loop | ||
- | - A bit of code to visualize samples from the model | ||
- | |||
- | An important part of this lab is reading this code, so please take the | ||
- | time to thoroughly read and understand what it's doing. | ||
- | |||
- | Let's dive in! | ||
- | |||
- | ---- | ||
- | **Part 0: naming your variables, and training on subsets of variables** | ||
- | |||
- | Before filling in any code, we need to think ahead a bit. We're going | ||
- | to create a large-ish computation graph that describes everything | ||
- | about our GAN, including the generator and discriminator. However, | ||
- | when we train the discriminator, we'll want to adjust only the | ||
- | variables involved in the discriminator, and when we train the | ||
- | generator, we'll want to adjust only the variables involved in the | ||
- | generator. | ||
- | |||
- | How can we accomplish this? Well, TensorFlow has a handy function | |
- | called ''trainable_variables'' that returns a list of all the | |
- | trainable variables in your graph. By itself, this isn't quite enough -- we | |
- | still need to distinguish generator variables from discriminator | |
- | variables. | |
- | |||
- | Here's how I solved this problem: by naming my variables consistently, | ||
- | and then creating a list of only discriminator / generator variables. | ||
- | So, for example, here's how I set up a trainer that optimizes my | ||
- | discriminator loss function (''d_loss'') by tweaking only | ||
- | discriminator variables (''d_vars''): | ||
- | |||
- | <code python> | ||
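- | import tensorflow as tf | |
- | # Collect every trainable variable in the graph. | |
- | t_vars = tf.trainable_variables() | |
- | # Keep only the variables whose names mark them as discriminator variables. | |
- | d_vars = [var for var in t_vars if 'd_' in var.name] | |
- | d_optim = tf.train.AdamOptimizer( 0.0002, beta1=0.5 ).minimize( d_loss, var_list=d_vars ) | |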
- | </code> | ||
- | |||
- | The critical part is that I created the ''var_list'' populated with | ||
- | only a subset of the variables I needed. | ||
- | |||
- | Note that for compatibility with the provided optimization code, you | ||
- | should name your train steps ''d_optim'' and ''g_optim''. | ||
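The name-based filter above can be tried out without TensorFlow. The following is only a sketch: ''Var'' is a hypothetical stand-in for a TensorFlow variable (only its ''name'' attribute matters), and the variable names are invented for illustration. It shows one thing to watch for: a substring test such as '''d_' in var.name'' also matches any generator variable whose name happens to contain ''d_''.

```python
from collections import namedtuple

# Hypothetical stand-in for a TensorFlow variable; only .name is used here.
Var = namedtuple("Var", ["name"])

# Invented variable names, chosen to include a substring collision.
t_vars = [
    Var("d_h0_conv/w:0"),
    Var("d_h3_lin/b:0"),
    Var("g_h1_lin/w:0"),
    Var("g_embed_w:0"),  # generator variable whose name contains 'd_'
]

# Substring test, as in the lab's snippet: matches 'd_' anywhere in the name.
d_vars_substring = [v for v in t_vars if "d_" in v.name]

# Prefix test: matches only names that begin with the chosen prefix.
d_vars_prefix = [v for v in t_vars if v.name.startswith("d_")]
g_vars_prefix = [v for v in t_vars if v.name.startswith("g_")]

print([v.name for v in d_vars_substring])
print([v.name for v in d_vars_prefix])
print([v.name for v in g_vars_prefix])
```

Either test works as long as the naming is consistent. If your variables live inside a variable scope, the scope name becomes part of ''var.name'', so a prefix test would need to include it.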
- | |||
- | ---- | ||
- | **Part 1: create your placeholders** | ||
- | |||
- | What are the inputs to a GAN? At some point, we'll need to be able to | ||
- | pass in a ''z'' variable and some real images. So, you'll only need | ||
- | two placeholders in the entire computation graph! If you name them | ||
- | ''z'' and ''true_images'', then your code will be compatible with the | ||
- | provided optimization loop. | ||
- | |||
- | ---- | ||
- | **Part 2: create your discriminator** | ||
- | |||
- | To start, complete the ''disc_model'' function. This is the | ||
- | discriminator. Its job is to accept as input a batch of images (call | ||
- | it ''imgs''), and output a batch of probabilities (where each | ||
- | probability is the probability of the image being a **real** image). | ||
- | |||
- | Your discriminator should have the following layers: | ||
- | - ''H0'': A 2d convolution on ''imgs'' with 32 filters, followed by a leaky relu | ||
- | - ''H1'': A 2d convolution on ''H0'' with 64 filters, followed by a leaky relu | ||
- | - ''H2'': A linear layer from ''H1'' to a 1024 dimensional vector | ||
- | - ''H3'': A linear layer mapping ''H2'' to a single scalar (per image) | ||
- | - The final output should be a sigmoid of ''H3''. | ||
- | |||
- | The hardest part of creating your discriminator will be getting all of | ||
- | the dimensions to line up. Here are a few hints to help you: | ||
- | |||
- | - The images that are passed in will have a shape of ''[None,784]''. However, that's not compatible with a convolution! So, we need to reshape it. The first line of your function ought to be something like: ''imgs = tf.reshape( imgs, [ batch_size, 28, 28, 1 ] )''. Note that it's 4-dimensional - that's important! | |
- | - Similarly, the output of the ''H1'' layer will be a 4 dimensional tensor, but it needs to go through a linear layer to get mapped down to 1024 dimensions. The easiest way to accomplish this is to reshape ''H1'' to be 2-dimensional, maybe something like: ''h1 = tf.reshape( h1, [ batch_size, -1 ] )'' | ||
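To check that the dimensions line up before writing any TensorFlow code, the shape arithmetic can be done by hand. The sketch below assumes that the scaffold's convolution primitive uses a stride of 2 with SAME padding; the spec does not state this, so check the scaffold code. It also uses a hypothetical ''batch_size'' of 64. With SAME padding, the spatial output size is the input size divided by the stride, rounded up.

```python
import math

def same_conv_out(size, stride):
    """Spatial output size of a convolution with SAME padding."""
    return math.ceil(size / stride)

batch_size = 64  # hypothetical value
stride = 2       # assumption: check the scaffold's convolution primitive

# Input after the first reshape.
imgs_shape = (batch_size, 28, 28, 1)

# H0: 32 filters.
side0 = same_conv_out(imgs_shape[1], stride)
h0_shape = (batch_size, side0, side0, 32)

# H1: 64 filters.
side1 = same_conv_out(side0, stride)
h1_shape = (batch_size, side1, side1, 64)

# H1 flattened to 2 dimensions, ready for the linear layer H2.
h1_flat_shape = (batch_size, side1 * side1 * 64)

for name, shape in [("imgs", imgs_shape), ("H0", h0_shape),
                    ("H1", h1_shape), ("H1 flat", h1_flat_shape)]:
    print(name, shape)
```

Under these assumptions the linear layer ''H2'' maps from ''side1 * side1 * 64'' inputs per image to 1024 outputs. A different stride or padding changes that number, which is why reshaping with ''-1'' is convenient.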
- | |||
- | ---- | ||
- | **Part 3: create your generator** | ||
- | |||
- | Now, let's fill in the generator function. The generator's job is to | ||
- | accept a batch of ''z'' variables (each of dimension 100), and then | ||
- | return a batch of images (each image will be 28x28, but for | |
- | compatibility with the discriminator, we will flatten each one into a 784-dimensional vector). | |
- | |||
- | Your generator should have the following layers: | ||
- | - ''H1'': A linear layer, mapping ''z'' to 128*7*7 features, followed by a relu | ||
- | - ''D2'': A deconvolution layer, mapping ''H1'' to a tensor that is ''[batch_size,14,14,128]'', followed by a relu | |
- | - ''D3'': A deconvolution layer, mapping ''D2'' to a tensor that is ''[batch_size,28,28,1]'' | |
- | - The final output should be a sigmoid of ''D3''. | |
- | |||
- | Note that you should reshape ''D3'' to be ''[batch_size,784]'' for | |
- | compatibility with the discriminator. | |
- | |||
- | ---- | ||
- | **Part 4: create your loss functions and training ops** | ||
- | |||
- | {{ :cs501r_f2016:lab7_graph.png?200|}} | ||
- | |||
- | You should create two loss functions, one for the discriminator, and | ||
- | one for the generator. Refer to the slides on GANs for details on the | ||
- | loss functions. Note that the slides and the following discussion are | ||
- | framed in terms of maximizing, but for consistency with my code (and | ||
- | other labs), you may wish to frame your cost functions in terms of | ||
- | minimization. | ||
- | |||
- | This is possibly the hardest part of the lab, even though the code is | ||
- | relatively simple. Here's how we need to wire up all of the pieces: | ||
- | |||
- | - We need to pass the ''z'' variable into the generative model, and call the output ''sample_images'' | ||
- | - We need to pass some true images into the discriminator, and get back some probabilities. | ||
- | - We need to pass some sampled images into the discriminator, and get back some (different) probabilities. | ||
- | - We need to construct a loss function for the discriminator that attempts to maximize the log of the output probabilities on the true images and the log of (1.0 - the output probabilities) on the sampled images; these two halves can be summed together. | |
- | - We need to construct a loss function for the generator that attempts to maximize the log of the output probabilities on the sampled images | ||
- | - For debugging purposes, I highly recommend you create an additional op called ''d_acc'' that calculates classification accuracy on a batch. This can just check the output probabilities of the discriminator on the real and sampled images, and see if they're greater (or less) than 0.5. | ||
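The arithmetic behind these losses can be sketched in plain Python, framed as minimization. This is not the scaffold's code: ''gan_losses'', ''p_real'', ''p_fake'', and the small ''eps'' added inside the logarithms for numerical safety are all assumptions made for illustration. In your graph the same quantities would be built from TensorFlow ops on the two discriminator outputs.

```python
import math

def gan_losses(p_real, p_fake, eps=1e-8):
    """Return (d_loss, g_loss, d_acc) for two batches of discriminator outputs.

    p_real: probabilities the discriminator assigns to true images.
    p_fake: probabilities the discriminator assigns to sampled images.
    """
    n_real, n_fake = len(p_real), len(p_fake)

    # Discriminator: maximize log D(x) + log(1 - D(G(z))), i.e. minimize the negative.
    d_loss_real = -sum(math.log(p + eps) for p in p_real) / n_real
    d_loss_fake = -sum(math.log(1.0 - p + eps) for p in p_fake) / n_fake
    d_loss = d_loss_real + d_loss_fake

    # Generator: maximize log D(G(z)), i.e. minimize the negative.
    g_loss = -sum(math.log(p + eps) for p in p_fake) / n_fake

    # Accuracy: real images should score above 0.5, sampled images below 0.5.
    correct = sum(p > 0.5 for p in p_real) + sum(p < 0.5 for p in p_fake)
    d_acc = correct / (n_real + n_fake)
    return d_loss, g_loss, d_acc

# An undecided discriminator (0.5 everywhere) versus a confident one.
print(gan_losses([0.5, 0.5], [0.5, 0.5]))
print(gan_losses([0.9, 0.8], [0.1, 0.3]))
```

For an undecided discriminator that outputs 0.5 everywhere, ''d_loss'' is 2·ln 2 (about 1.39) and ''g_loss'' is ln 2 (about 0.69), which is close to the first row of the sample output in Part 5.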
- | |||
- | **Here's the tricky part**. Note that in wiring up our overall model, | ||
- | we need to use the discriminator twice - once on real images, and once | ||
- | on sampled images. You've already coded up a nice function that | ||
- | encapsulates the discriminator, but we don't want to just call it | ||
- | twice -- that would create two copies of all of the variables. | ||
- | |||
- | Instead, we need to //share variables// -- the idea is that we want to | ||
- | be able to call our discriminator function twice to be able to perform | ||
- | the same classification logic, but use the same variables each time. | ||
- | Tensorflow has a mechanism | ||
- | to help with this, which you should [[https://www.tensorflow.org/versions/r0.11/how_tos/variable_scope/index.html|read about here]]. | ||
- | |||
- | Note that the provided layers already use ''get_variable'', so sharing | |
- | variables should be as straightforward as figuring out when to call | ||
- | the ''reuse_variables'' function! | ||
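To build intuition for what reuse does, here is a toy, pure-Python model of the bookkeeping. It is not TensorFlow: ''ToyScope'', ''toy_disc'', and the variable name ''d_w'' are hypothetical, and the sketch mimics only the create-or-fetch behaviour of ''get_variable'' under a reuse flag.

```python
class ToyScope:
    """Hypothetical miniature of a variable scope: a name -> value store plus a reuse flag."""

    def __init__(self):
        self.store = {}
        self.reuse = False

    def reuse_variables(self):
        # From now on, get_variable fetches existing variables instead of creating them.
        self.reuse = True

    def get_variable(self, name, init):
        if self.reuse:
            # Reuse mode: the variable must already exist; return the same object.
            if name not in self.store:
                raise ValueError("variable %s does not exist" % name)
            return self.store[name]
        # Creation mode: refuse to silently create a second copy.
        if name in self.store:
            raise ValueError("variable %s already exists" % name)
        self.store[name] = init
        return init


def toy_disc(scope, x):
    """Stand-in discriminator: one weight vector fetched through the scope."""
    w = scope.get_variable("d_w", [0.5, -0.25])
    return sum(wi * xi for wi, xi in zip(w, x)), w


scope = ToyScope()
out_real, w_real = toy_disc(scope, [1.0, 2.0])  # first call creates d_w
scope.reuse_variables()                         # switch to reuse before the second call
out_fake, w_fake = toy_disc(scope, [3.0, 4.0])  # second call fetches the same d_w

print(w_real is w_fake, len(scope.store))
```

The corresponding TensorFlow pieces are ''tf.variable_scope'', ''tf.get_variable'', and the scope's ''reuse_variables'' method. The ordering is the point to take from the toy: build the discriminator on the real images first, switch the scope to reuse, and then build it on the sampled images.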
- | |||
- | I highly recommend using TensorBoard to visualize your final | |
- | computation graph to make sure you got this right. Check out my computation graph image on the right - you can see the two discriminator blocks, and you can see that the same variables are feeding into both of them. | ||
- | |||
- | ---- | ||
- | **Part 5: Run it and generate your final image!** | ||
- | |||
- | Assuming you've named all of your placeholders and ops properly, you | ||
- | can use the provided optimization code. It's set to run for 500 | ||
- | iterations, and print out some debugging information every 10 steps. | ||
- | |||
- | Note that the loop takes 3 steps for the generator for every 1 step | ||
- | taken by the discriminator! This is to help maintain the "balance of | ||
- | power" we talked about in class. | ||
- | |||
- | Assuming everything has gone well, you should see output something | ||
- | like this: | ||
- | |||
- | <code> | ||
- | 0 1.37 0.71 0.88 | ||
- | 10 0.90 0.98 1.00 | ||
- | 20 0.69 0.93 1.00 | ||
- | 30 0.89 1.14 0.91 | ||
- | 40 0.94 1.06 0.86 | ||
- | 50 0.77 1.20 0.96 | ||
- | 60 0.59 1.55 0.94 | ||
- | 70 0.46 1.47 0.97 | ||
- | 80 0.58 1.64 0.94 | ||
- | 90 0.42 1.64 0.98 | ||
- | 100 0.73 1.14 0.87 | ||
- | 110 0.74 1.51 0.91 | ||
- | 120 0.78 1.35 0.86 | ||
- | 130 1.08 1.31 0.71 | ||
- | 140 1.39 0.94 0.61 | ||
- | 150 0.90 1.24 0.82 | ||
- | 160 1.26 1.00 0.66 | ||
- | 170 0.90 1.03 0.81 | ||
- | 180 1.02 1.04 0.76 | ||
- | ... | ||
- | 490 1.25 1.12 0.68 | ||
- | </code> | ||
- | |||
- | Note that we see the struggle between the generator and discriminator | ||
- | clearly here. The first column represents the loss function for the | ||
- | discriminator, the second column is the loss function for the | ||
- | generator, and the final column is the discriminator's classification | |
- | accuracy. | ||
- | |||
- | Initially, the discriminator is able to distinguish almost perfectly | ||
- | between true and fake images, but by the end of training, it's only | ||
- | running at 68% accuracy. Not bad! | ||
- | |||
- | Note that for your final image, you may need to train longer -- I used | ||
- | 5000 steps, instead of 500. | ||
- | |||
- | **Hint for debugging**: if you ever see the cost function for the generator going higher and higher, it means that the discriminator is too powerful. |