  * The [[https://arxiv.org/pdf/1511.06434.pdf|DCGAN architecture]], see Fig. 1.
  * A [[https://arxiv.org/pdf/1512.03385|ResNet]].
  * Our reference implementation used 5 layers (see the sketch below):
      * A fully connected layer
      * 4 transposed convolution layers, each followed by a relu and a batch norm layer (except for the final layer)
      * A final tanh nonlinearity
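
For concreteness, here is a minimal sketch of such a generator in TensorFlow.  The layer widths, kernel sizes, and the 64x64x3 output shape are assumptions, not the reference implementation:

<code python>
import tensorflow as tf

def generator(z, reuse=False):
    # Hypothetical layer widths; only the overall structure (one FC layer,
    # 4 transposed convs, relu + batch norm on all but the last, then a
    # tanh) comes from the lab writeup.
    with tf.variable_scope("g_net", reuse=reuse):
        h = tf.layers.dense(z, 4 * 4 * 512)        # fully connected layer
        h = tf.reshape(h, [-1, 4, 4, 512])
        for filters in [256, 128, 64]:             # first 3 transposed convs
            h = tf.layers.conv2d_transpose(h, filters, 5, strides=2, padding="same")
            h = tf.layers.batch_normalization(h, training=True)
            h = tf.nn.relu(h)
        # final transposed conv: no relu or batch norm, just the tanh
        h = tf.layers.conv2d_transpose(h, 3, 5, strides=2, padding="same")
        return tf.nn.tanh(h)
</code>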
  
==Part 1: Implement a discriminator network==
Again, you are encouraged to use either a DCGAN-like architecture, or a ResNet.
  
Our reference implementation used 4 convolution layers, each followed by a leaky relu (leak 0.2) and a batch norm layer (except that there is no batch norm on the first layer).
  
Note that the discriminator simply outputs a single scalar value.  This value should be unconstrained (ie, can be positive or negative), so you should **not** use a relu/sigmoid on the output of your network.
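
Continuing the sketch from above, a minimal discriminator might look like this; the filter counts and the assumption of 64x64 inputs are mine:

<code python>
def discriminator(x, reuse=False):
    # Hypothetical filter counts; the 4-conv structure, the 0.2 leak, the
    # missing batch norm on the first layer, and the unconstrained scalar
    # output come from the lab writeup.
    with tf.variable_scope("d_net", reuse=reuse):
        h = x
        for i, filters in enumerate([64, 128, 256, 512]):
            h = tf.layers.conv2d(h, filters, 5, strides=2, padding="same")
            if i > 0:                    # no batch norm on the first layer
                h = tf.layers.batch_normalization(h, training=True)
            h = tf.maximum(0.2 * h, h)   # leaky relu with leak 0.2
        h = tf.reshape(h, [-1, 4 * 4 * 512])   # 64x64 input -> 4x4x512
        return tf.layers.dense(h, 1)     # single scalar, no relu/sigmoid
</code>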
  
==Part 2: Implement the Improved Wasserstein GAN training algorithm==
The implementation of the improved Wasserstein GAN training algorithm (hereafter called "WGAN-GP") is fairly straightforward, but involves a few new details about tensorflow:
  
  * **Gradient norm penalty.**  First of all, you must compute the gradient of the output of the discriminator with respect to x-hat.  To do this, you should use the ''tf.gradients'' function (a sketch appears after the code below).
  * **Reuse of variables.**  Remember that because the discriminator is being called multiple times, you must ensure that you do not create new copies of the variables.  Note that ''scope'' objects have a ''reuse_variables()'' function.
  * **Trainable variables.**  In the algorithm, two different Adam optimizers are created, one for the generator, and one for the discriminator.  You must make sure that each optimizer is only training the proper subset of variables!  There are multiple ways to accomplish this.  For example, you could use scopes, or construct the set of trainable variables by examining their names and seeing if they start with "d_" or "g_":
<code python>
# Collect each network's variables by name prefix.
t_vars = tf.trainable_variables()
self.d_vars = [var for var in t_vars if 'd_' in var.name]
self.g_vars = [var for var in t_vars if 'g_' in var.name]
</code>
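
To make the first two bullets concrete, here is a minimal sketch of the gradient penalty, assuming the hypothetical ''discriminator()'' helper sketched in Part 1 (which reuses variables by passing ''reuse'' into ''tf.variable_scope'', one of several ways to do it) and image batches ''x_real'' and ''x_fake''.  ''lam'' and ''m'' are the hyperparameters listed below:

<code python>
d_real = discriminator(x_real)               # first call creates the variables
d_fake = discriminator(x_fake, reuse=True)   # later calls must reuse them

eps = tf.random_uniform([m, 1, 1, 1], 0.0, 1.0)
x_hat = eps * x_real + (1.0 - eps) * x_fake  # random interpolates
d_hat = discriminator(x_hat, reuse=True)

grads = tf.gradients(d_hat, [x_hat])[0]      # gradient of D's output wrt x-hat
grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]))
penalty = lam * tf.reduce_mean(tf.square(grad_norm - 1.0))

d_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real) + penalty
g_loss = -tf.reduce_mean(d_fake)
</code>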

I didn't try to optimize the hyperparameters; these are the values that I used:

<code python>
beta1 = 0.5     # the paper suggests 0.0
beta2 = 0.999   # the paper suggests 0.9
lam = 10        # gradient penalty weight ("lambda" in the paper; renamed since lambda is a Python keyword)
ncritic = 1     # the paper suggests 5
alpha = 0.0002  # learning rate; the paper suggests 0.0001
m = 64          # batch size

batch_norm_decay = 0.9
batch_norm_epsilon = 1e-5
</code>

Changing the number of critic steps from 5 to 1 didn't seem to matter; changing the alpha parameter to 0.0001 didn't seem to matter; but changing beta1 and beta2 to the values suggested in the paper (0.0 and 0.9, respectively) seemed to make things a lot worse.
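
Putting the pieces together, here is a sketch of how the two optimizers might be constructed; ''d_loss''/''g_loss'' and the ''d_vars''/''g_vars'' lists come from the snippets above (assuming the same class context):

<code python>
# One Adam optimizer per network, each restricted to its own variables.
d_opt = tf.train.AdamOptimizer(learning_rate=alpha, beta1=beta1, beta2=beta2)
d_train = d_opt.minimize(d_loss, var_list=self.d_vars)

g_opt = tf.train.AdamOptimizer(learning_rate=alpha, beta1=beta1, beta2=beta2)
g_train = g_opt.minimize(g_loss, var_list=self.g_vars)
</code>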
  
==Part 3: Generating the final face images==
  
Your final deliverable is two images.  The first should be a set of randomly generated faces.  This is as simple as generating random ''z'' variables, and then running them through your generator.
  
For the second image, you must pick two random ''z'' values, then linearly interpolate between them (using about 8-10 steps).  Plot the face corresponding to each interpolated ''z'' value.
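
A minimal sketch of the interpolation; ''z_dim'', the session ''sess'', the placeholder ''z_ph'', and the generator output tensor ''g_out'' are all hypothetical names:

<code python>
import numpy as np

z0 = np.random.randn(z_dim)
z1 = np.random.randn(z_dim)
steps = np.linspace(0.0, 1.0, 10)    # about 8-10 interpolation steps
z_batch = np.stack([(1.0 - s) * z0 + s * z1 for s in steps])
faces = sess.run(g_out, feed_dict={z_ph: z_batch})   # one face per z
</code>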
  
The reference implementation was trained for 8 hours on a GTX 1070.  It ran for 25 epochs (ie, 25 scans through all 200,000 images), with batches of size 64 (3125 batches / epoch).

Although it might work with far fewer (ie, 2) epochs...