cs501r_f2018:lab8

====Dataset:====
  
The dataset you will be using is the [[http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html|"celebA" dataset]], a set of 202,599 face images of celebrities.  Each image is 178x218.  You should download the "aligned and cropped" version of the dataset.  [[https://drive.google.com/drive/folders/0B7EVK8r0v71pTUZsaXdaSnZBZzg|Here is a direct download link (1.4G)]], and [[https://www.dropbox.com/sh/8oqt9vytwxb3s4r/AAB06FXaQRUNtjW9ntaoPGvCa?dl=0&preview=README.txt|here is additional information about the dataset]].
  
  * Our reference implementation used 5 layers (see the sketch below):
      * A fully connected layer
      * 4 transposed convolution layers, each followed by a batch norm layer and a relu (except for the final layer)
      * Followed by a sigmoid (the true images have values between 0 and 1, and you want your generated images to lie in the same range)
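For concreteness, here is a minimal sketch of such a generator in PyTorch.  The 100-dimensional noise vector, the channel counts, and the 64x64 output size are illustrative assumptions, not the reference implementation's exact settings.

<code python>
import torch
import torch.nn as nn

# Minimal generator sketch: one fully connected layer, four transposed
# convolutions (batch norm + relu after all but the last), then a sigmoid.
# z_dim, channel counts, and the 64x64 output size are assumptions.
class Generator(nn.Module):
  def __init__(self, z_dim=100):
    super(Generator, self).__init__()
    self.fc = nn.Linear(z_dim, 512 * 4 * 4)
    self.net = nn.Sequential(
      nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # 4x4 -> 8x8
      nn.BatchNorm2d(256), nn.ReLU(),
      nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 8x8 -> 16x16
      nn.BatchNorm2d(128), nn.ReLU(),
      nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 16x16 -> 32x32
      nn.BatchNorm2d(64), nn.ReLU(),
      nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 32x32 -> 64x64, no bn/relu
      nn.Sigmoid())                                          # generated images in [0, 1]

  def forward(self, z):
    x = self.fc(z).view(-1, 512, 4, 4)
    return self.net(x)
</code>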
  
==Part 1: Implement a discriminator network==
Again, you are encouraged to use either a DCGAN-like architecture, or a ResNet.
  
Our reference implementation used 4 convolution layers, each followed by a batch norm layer and a leaky relu (leak 0.2), with no batch norm on the first layer.
  
Note that the discriminator simply outputs a single scalar value.  This value should be unconstrained (ie, it can be positive or negative), so you should **not** use a relu/sigmoid on the output of your network.
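Here is a matching sketch of a discriminator under the same assumptions (64x64x3 inputs, illustrative channel counts, and a final fully connected layer to produce the scalar output):

<code python>
import torch
import torch.nn as nn

# Minimal discriminator sketch: four convolutions with leaky relu (0.2),
# batch norm on all but the first, and a single unconstrained scalar output.
# The 64x64x3 input size, channel counts, and the final fully connected
# layer are assumptions.
class Discriminator(nn.Module):
  def __init__(self):
    super(Discriminator, self).__init__()
    self.net = nn.Sequential(
      nn.Conv2d(3, 64, 4, stride=2, padding=1),    # 64 -> 32 (no batch norm here)
      nn.LeakyReLU(0.2),
      nn.Conv2d(64, 128, 4, stride=2, padding=1),  # 32 -> 16
      nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
      nn.Conv2d(128, 256, 4, stride=2, padding=1), # 16 -> 8
      nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
      nn.Conv2d(256, 512, 4, stride=2, padding=1), # 8 -> 4
      nn.BatchNorm2d(512), nn.LeakyReLU(0.2))
    self.fc = nn.Linear(512 * 4 * 4, 1)            # no relu/sigmoid on the output

  def forward(self, x):
    h = self.net(x).view(x.size(0), -1)
    return self.fc(h)
</code>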
  * **Reuse of variables.**  Remember that because the discriminator is called multiple times, you must ensure that you do not create new copies of the variables.  Use ''requires_grad = True'' for the parameters of the discriminator.  An easy way to do this is to iterate over the discriminator model's parameters and set ''param.requires_grad = True''.
  * **Trainable variables.**  In the algorithm, two different Adam optimizers are created, one for the generator and one for the discriminator.  You must make sure that each optimizer is only training the proper subset of variables!
<code python>
# initialize your generator and discriminator models

# initialize a separate optimizer for the generator and the discriminator

# initialize your dataset and dataloader

for e in range(epochs):
  for true_img in trainloader:

    # --- train the discriminator ---

    # enable gradients on the discriminator, because you want to be able to
    # backprop through its parameters
    for p in disc_model.parameters():
      p.requires_grad = True

    # freeze the generator's parameters while the discriminator is updated
    for p in gen_model.parameters():
      p.requires_grad = False

    for n in range(critic_iters):
      disc_optim.zero_grad()

      # generate noise tensor z
      # calculate the discriminator loss: you will need autograd.grad
      # call dloss.backward() and disc_optim.step()

    # --- train the generator ---
    for p in disc_model.parameters():
      p.requires_grad = False

    for p in gen_model.parameters():
      p.requires_grad = True

    gen_optim.zero_grad()

    # generate noise tensor z
    # calculate the generator loss
    # call gloss.backward() and gen_optim.step()
</code>
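The ''autograd.grad'' comment above refers to the gradient penalty term in the discriminator loss.  Here is a minimal sketch of how that term could be computed, assuming the improved-WGAN style penalty suggested by the ''lambda'' and ''critic_iters'' hyperparameters below (the function name and argument names are hypothetical):

<code python>
import torch
from torch import autograd

# Sketch of a gradient penalty: score the discriminator on random
# interpolations between real and generated images and penalize gradients
# whose norm differs from 1.  Names and the lam default are assumptions.
def gradient_penalty(disc_model, true_img, fake_img, lam=10.0):
  eps = torch.rand(true_img.size(0), 1, 1, 1, device=true_img.device)
  interp = eps * true_img.detach() + (1 - eps) * fake_img.detach()
  interp.requires_grad_(True)
  scores = disc_model(interp)
  grads = autograd.grad(outputs=scores, inputs=interp,
                        grad_outputs=torch.ones_like(scores),
                        create_graph=True, retain_graph=True)[0]
  grads = grads.view(grads.size(0), -1)
  return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()
</code>

The discriminator loss would then combine the mean scores on generated and real images with this penalty, e.g. something like ''dloss = disc_model(fake_img).mean() - disc_model(true_img).mean() + gradient_penalty(disc_model, true_img, fake_img)''.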
  
  
----
====Hints and implementation notes:====
We have recently tried turning off the batchnorms in both the generator and discriminator, and have gotten good results -- you may want to start without them, and only add them if you need them.  Plus, it's faster without the batchnorms.
  
The reference implementation was trained for 8 hours on a GTX 1070.  It ran for 25 epochs (ie, each epoch is one scan through all 200,000 images), with batches of size 64 (3125 batches / epoch).
  
However, we were able to get reasonable (if blurry) faces after training for 2-3 hours.
  
I didn't try to optimize the hyperparameters; these are the values that I used:
<code>
lambda = 10
ncritic = 1 # 5
learning_rate = 0.0002 # 0.0001
batch_size = 200

batch_norm_decay = 0.9
batch_norm_epsilon = 1e-5
</code>
  
Changing the number of critic steps from 5 to 1 didn't seem to matter; changing alpha (the learning rate) to 0.0001 didn't seem to matter; but changing beta1 and beta2 to the values suggested in the paper (0.0 and 0.9, respectively) seemed to make things a lot worse.  Different sets of numbers may work well for different people, so play around and find the numbers that work well for you.
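For reference, setting up the two Adam optimizers with the learning rate above might look like the following sketch (the betas are left at PyTorch's defaults here, which is an assumption, since the exact values used are not listed):

<code python>
import torch.optim as optim

# One optimizer per network, each over only that network's parameters.
# lr comes from the hyperparameters above; betas are PyTorch's defaults,
# which is an assumption.
gen_optim = optim.Adam(gen_model.parameters(), lr=0.0002)
disc_optim = optim.Adam(disc_model.parameters(), lr=0.0002)
</code>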
This code should be helpful to get the data:
<code python>
# download the aligned & cropped celebA zip from Google Drive (the nested wget
# handles the download-confirmation token), then unpack it into test/
!wget --load-cookies cookies.txt 'https://docs.google.com/uc?export=download&confirm='"$(wget --save-cookies cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')"'&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM' -O img_align_celeba.zip
!unzip -q img_align_celeba.zip
!mkdir test
!mv img_align_celeba test
</code>
And here is an example of using the data in a dataset class:
<code python>
import os
import torchvision
from torchvision import transforms
from torch.utils.data import Dataset

class CelebaDataset(Dataset):
  def __init__(self, root, size=128, train=True):
    super(CelebaDataset, self).__init__()
    # ImageFolder expects root/<subfolder>/<image>, which is why the images
    # were moved into the test/ folder above
    self.dataset_folder = torchvision.datasets.ImageFolder(
        os.path.join(root),
        transform=transforms.Compose([transforms.Resize((size, size)),
                                      transforms.ToTensor()]))

  def __getitem__(self, index):
    # ImageFolder returns (image, label) pairs; we only need the image
    img = self.dataset_folder[index]
    return img[0]

  def __len__(self):
    return len(self.dataset_folder)
</code>
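A hypothetical usage, assuming the images were moved into the ''test'' folder as in the download snippet above:

<code python>
from torch.utils.data import DataLoader

# 'test' is the folder created by the download snippet above; pick whatever
# batch size you settle on (64 here is illustrative).
train_dataset = CelebaDataset('test', size=128)
trainloader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=2)
</code>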