====Dataset:====
The dataset you will be using is the [[http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html|"celebA" dataset]], a set of 202,599 face images of celebrities. Each image is 178x218. You should download the "aligned and cropped" version of the dataset. [[https://drive.google.com/drive/folders/0B7EVK8r0v71pTUZsaXdaSnZBZzg|Here is a direct download link (1.4G)]], and
[[https://www.dropbox.com/sh/8oqt9vytwxb3s4r/AAB06FXaQRUNtjW9ntaoPGvCa?dl=0&preview=README.txt|here is additional information about the dataset]].
  * Our reference implementation used 5 layers:
    * A fully connected layer
    * 4 transposed convolution layers, each followed by a batch norm layer and a relu (except for the final layer)
    * Followed by a sigmoid (the true images are between 0 and 1, and you want your generated images to be between 0 and 1 as well); see the sketch below
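
For reference, here is a minimal sketch of what such a generator could look like in PyTorch. The layer widths, the 100-dimensional noise vector, and the 64x64 output resolution are only illustrative assumptions -- adjust them to match your image size and architecture.
<code python>
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100):
        super(Generator, self).__init__()
        # fully connected layer maps the noise vector to a 4x4 feature map
        self.fc = nn.Linear(z_dim, 512 * 4 * 4)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),  # 4x4 -> 8x8
            nn.BatchNorm2d(256), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),     # 32x32 -> 64x64; no batch norm / relu on the final layer
            nn.Sigmoid(),                                          # output in [0, 1], like the real images
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 512, 4, 4)
        return self.net(x)
</code>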
==Part 1: Implement a discriminator network==
Again, you are encouraged to use either a DCGAN-like architecture, or a ResNet.
Our reference implementation used 4 convolution layers, each followed by a batch norm layer and a leaky relu (leak 0.2); there is no batch norm on the first layer.
Note that the discriminator simply outputs a single scalar value. This value should be unconstrained (i.e., it can be positive or negative), so you should **not** use a relu/sigmoid on the output of your network.
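
Here is a matching minimal sketch with illustrative layer sizes, again assuming 64x64 inputs; note the single unconstrained scalar output:
<code python>
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1),     # 64x64 -> 32x32; no batch norm on the first layer
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),   # 32x32 -> 16x16
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 4, stride=2, padding=1),  # 8x8 -> 4x4
            nn.BatchNorm2d(512), nn.LeakyReLU(0.2),
        )
        # single scalar score per image; no relu/sigmoid on the output
        self.fc = nn.Linear(512 * 4 * 4, 1)

    def forward(self, x):
        h = self.net(x).view(x.size(0), -1)
        return self.fc(h)
</code>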
  * **Reuse of variables.** Remember that because the discriminator is being called multiple times, you must ensure that you do not create new copies of the variables. Use ''requires_grad = True'' for the parameters of the discriminator; an easy way to do this is to iterate over the discriminator model's parameters and set ''param.requires_grad = True''.
  * **Trainable variables.** In the algorithm, two different Adam optimizers are created, one for the generator and one for the discriminator. You must make sure that each optimizer is only training the proper subset of variables! The skeleton below shows one way to structure this.

<code python>
# initialize your generator and discriminator models

# initialize a separate optimizer for the generator and for the discriminator

# initialize your dataset and dataloader

for e in range(epochs):
    for true_img in trainloader:

        ### train discriminator ###

        # we want to be able to backprop through the discriminator's parameters
        for p in disc_model.parameters():
            p.requires_grad = True

        for p in gen_model.parameters():
            p.requires_grad = False

        for n in range(critic_iters):
            disc_optim.zero_grad()

            # generate noise tensor z
            # calculate disc loss: you will need autograd.grad
            # call dloss.backward() and disc_optim.step()

        ### train generator ###
        for p in disc_model.parameters():
            p.requires_grad = False

        for p in gen_model.parameters():
            p.requires_grad = True

        gen_optim.zero_grad()

        # generate noise tensor z
        # calculate loss for gen
        # call gloss.backward() and gen_optim.step()
</code>
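
The ''autograd.grad'' hint above refers to the gradient penalty term in the discriminator (critic) loss. Below is a minimal sketch of one way to compute it; the function and argument names are illustrative, and ''lam'' corresponds to the lambda = 10 hyperparameter listed further down. In WGAN-GP, the full critic loss is this penalty plus the mean critic score on fake images minus the mean score on real images.
<code python>
import torch
from torch import autograd

def gradient_penalty(disc_model, real_imgs, fake_imgs, lam=10.0):
    # random interpolation between real and generated images
    eps = torch.rand(real_imgs.size(0), 1, 1, 1, device=real_imgs.device)
    interp = (eps * real_imgs + (1 - eps) * fake_imgs).requires_grad_(True)
    d_interp = disc_model(interp)
    # gradient of the critic output with respect to the interpolated images
    grads = autograd.grad(outputs=d_interp, inputs=interp,
                          grad_outputs=torch.ones_like(d_interp),
                          create_graph=True, retain_graph=True)[0]
    grads = grads.view(grads.size(0), -1)
    # penalize deviation of the gradient norm from 1
    return lam * ((grads.norm(2, dim=1) - 1) ** 2).mean()
</code>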
----
====Hints and implementation notes:====

We have recently tried turning off the batchnorms in both the generator and discriminator, and have gotten good results -- you may want to start without them, and only add them if you need them. Plus, it's faster without the batchnorms.
The reference implementation was trained for 8 hours on a GTX 1070. It ran for 25 epochs (i.e., 25 passes through all ~200,000 images), with batches of size 64 (3125 batches / epoch).
However, we were able to get reasonable (if blurry) faces after training for 2-3 hours.
I didn't try to optimize the hyperparameters; these are the values that I used:
lambda = 10
ncritic = 1 # 5
learning_rate = 0.0002 # 0.0001
batch_size = 200
batch_norm_decay = 0.9
batch_norm_epsilon = 1e-5
</code>
Changing the number of critic steps from 5 to 1 didn't seem to matter; changing the learning rate to 0.0001 didn't seem to matter; but changing beta1 and beta2 to the values suggested in the paper (0.0 and 0.9, respectively) seemed to make things a lot worse. A different set of numbers might work well for different people, so play around and use the values that work well for you.

This code should be helpful to get the data:
<code python>
!wget --load-cookies cookies.txt 'https://docs.google.com/uc?export=download&confirm='"$(wget --save-cookies cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')"'&id=0B7EVK8r0v71pZjFTYXZWM3FlRnM' -O img_align_celeba.zip
!unzip -q img_align_celeba
!mkdir test
!mv img_align_celeba test
</code>

And using the data in a dataset class:
<code python>
import os
import torchvision
from torch.utils.data import Dataset
from torchvision import transforms

class CelebaDataset(Dataset):
    def __init__(self, root, size=128, train=True):
        super(CelebaDataset, self).__init__()
        # ImageFolder loads the images and applies the resize / to-tensor transforms
        self.dataset_folder = torchvision.datasets.ImageFolder(
            os.path.join(root),
            transform=transforms.Compose([transforms.Resize((size, size)),
                                          transforms.ToTensor()]))

    def __getitem__(self, index):
        img = self.dataset_folder[index]
        return img[0]  # drop the (unused) class label returned by ImageFolder

    def __len__(self):
        return len(self.dataset_folder)
</code>
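
One possible way to wrap this in a dataloader for the training loop above (the ''test'' directory matches the folder created by the download commands, and the batch size is just a placeholder):
<code python>
from torch.utils.data import DataLoader

dataset = CelebaDataset(root="test", size=128)
trainloader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=2)
</code>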