Differences

This shows you the differences between two versions of the page.

--- cs501r_f2018:lab8 [2018/10/26 20:58] shreeya
+++ cs501r_f2018:lab8 [2021/06/30 23:42] (current)
@@ Line 57: / Line 57: @@
       * A fully connected layer
       * 4 convolution transposed layers, followed by a batch norm layer and relu (except for the final layer)
-      * Followed by a sigmoid (because the true image is between 0 to 1 and you want your gen img to be between 0 and 1 too)
+      * Followed by a sigmoid (the true image values lie between 0 and 1, and you want your generated image to lie between 0 and 1 too)
 ==Part 1: Implement a discriminator network==
@@ Line 63: / Line 63: @@
 Again, you are encouraged to use either a DCGAN-like architecture, or a ResNet.
-Our reference implementation used 4 convolution layers, each followed by a batch norm layer and leaky relu (leak 0.2)(except for the final layer) Note: No batch norm on the first layer.
+Our reference implementation used 4 convolution layers, each followed by a batch norm layer and leaky relu (leak 0.2). No batch norm on the first layer.
 Note that the discriminator simply outputs a single scalar value.  This value should be unconstrained (i.e., it can be positive or negative), so you should **not** use a relu/sigmoid on the output of your network.
@@ Line 90: / Line 90: @@
     for p in disc_model.parameters():
       p.requires_grad = True
+    for p in gen_model.parameters():
+      p.requires_grad = False
     for n in range(critic_iters):
@@ Line 101: / Line 104: @@
     for p in disc_model.parameters():
       p.requires_grad = False
-      gen_optim.zero_grad()
-      # generate noise tensor z
+    for p in gen_model.parameters():
-      # calculate loss for gen
+      p.requires_grad = True
-      # call gloss.backward() and gen_optim.step()
+    gen_optim.zero_grad()
+    # generate noise tensor z
+    # calculate loss for gen
+    # call gloss.backward() and gen_optim.step()
 </code>
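The hunks above freeze one network while the other is being updated. Here is a minimal sketch of that pattern, not the reference implementation. The helper `set_requires_grad`, the tiny `nn.Linear` stand-ins for `gen_model` and `disc_model`, the tensor sizes, and the plain WGAN-style losses are all assumptions made for illustration. The gradient penalty that the `lambda` hyperparameter implies is omitted.

```python
import torch
from torch import nn


def set_requires_grad(model: nn.Module, flag: bool) -> None:
    """Turn gradient tracking on or off for every parameter of `model`."""
    for p in model.parameters():
        p.requires_grad = flag


# Hypothetical stand-ins for the lab's networks (real ones are convolutional).
gen_model = nn.Linear(8, 4)
disc_model = nn.Linear(4, 1)
gen_optim = torch.optim.Adam(gen_model.parameters(), lr=0.0002)
disc_optim = torch.optim.Adam(disc_model.parameters(), lr=0.0002)
critic_iters = 1

# Critic phase: train the discriminator, keep the generator frozen.
set_requires_grad(disc_model, True)
set_requires_grad(gen_model, False)
for n in range(critic_iters):
    disc_optim.zero_grad()
    real = torch.rand(16, 4)            # placeholder for a batch of real data
    fake = gen_model(torch.randn(16, 8))
    # Assumed WGAN-style critic loss, without the gradient penalty term.
    dloss = disc_model(fake).mean() - disc_model(real).mean()
    dloss.backward()
    disc_optim.step()

# Generator phase: freeze the discriminator, train the generator.
set_requires_grad(disc_model, False)
set_requires_grad(gen_model, True)
gen_optim.zero_grad()
z = torch.randn(16, 8)                  # noise tensor z
gloss = -disc_model(gen_model(z)).mean()  # assumed WGAN-style generator loss
gloss.backward()
gen_optim.step()
```

Gradients still flow through the frozen discriminator back to the generator's parameters during `gloss.backward()`. Only the accumulation into the discriminator's own parameters is skipped.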
@@ Line 120: / Line 127: @@
 ----
 ====Hints and implementation notes:====
+We have recently tried turning off the batchnorms in both the generator and discriminator, and have gotten good results -- you may want to start without them, and only add them if you need them.  Plus, it's faster without the batchnorms.
 The reference implementation was trained for 8 hours on a GTX 1070.  It ran for 25 epochs (ie, scan through all 200,000 images), with batches of size 64 (3125 batches / epoch).
-Although, it might work with far fewer ie 2 epochs...
+However, we were able to get reasonable (if blurry) faces after training for 2-3 hours.
 I didn't try to optimize the hyperparameters; these are the values that I used:
@@ Line 132: / Line 141: @@
 lambda = 10
 ncritic = 1 # 5
-alpha = 0.0002 # 0.0001
+learning_rate = 0.0002 # 0.0001
-m = 64
+batch_size = 200
-batch_norm decay=0.9
+batch_norm_decay=0.9
-batch_norm epsilon=1e-5
+batch_norm_epsilon=1e-5
 </code>
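As a quick sanity check on the batch counts, the sketch below computes batches per epoch for the 200,000-image figure quoted in the hints, using both the batch size of 64 mentioned there and the `batch_size = 200` listed above. The helper name `batches_per_epoch` is hypothetical.

```python
import math


def batches_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of batches needed to scan the dataset once (last batch may be partial)."""
    return math.ceil(num_images / batch_size)


print(batches_per_epoch(200_000, 64))   # 3125
print(batches_per_epoch(200_000, 200))  # 1000
```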
@@ Line 145: / Line 154: @@
 !wget --load-cookies cookies.txt 'https://docs.google.com/uc?export=download&confirm='"$(wget --save-cookies cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')"'&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM' -O img_align_celeba.zip
 !unzip -q img_align_celeba
+!mkdir test
+!mv img_align_celeba test
 </code>
