====Objective:====

----
====Deliverable:====

{{ :cs501r_f2016:style1.png?300|}}

For this lab, you will need to implement the style transfer algorithm of [[https://arxiv.org/pdf/1508.06576v2.pdf|Gatys et al]].

You should turn in:
  - The final image that you generated
  - Your code

An example image that I generated is shown at the right.

----
====Grading standards:====

Your code will be graded on the following:

  * 35% Correct extraction of statistics
  * 35% Correct construction of cost function
  * 20% Correct initialization and optimization of image variable
  * 10% Awesome looking final image

----
====Description:====

For this lab, you should implement the style transfer algorithm referenced above. We are providing the following, [[https://www.dropbox.com/sh/tt0ctms12aumgui/AACRKSSof6kw-wi8vs1v8ls3a?dl=0|available from a dropbox folder]]:

  - lab10_scaffold.py - Lab 10 scaffolding code
  - vgg16.py - The VGG16 model
  - content.png - An example content image
  - style.png - An example style image

You will also need the VGG16 pre-trained weights:
  - [[http://liftothers.org/byu/vgg16_weights.npz|VGG16 weights]]

In the scaffolding code, you will find some examples of how to use the provided VGG model. (This model is a slightly modified version of [[https://www.cs.toronto.edu/~frossard/post/vgg16/|code available here]].)

**Note:** In class, we discussed how to construct a computation graph that reuses the VGG network 3 times (one each for the content, style, and optimization images). It turns out that you don't need to do that. In fact, we merely need to //evaluate// the VGG network on the content and style images, and save the resulting activations.

The activations can be used to construct a cost function directly. In other words, we don't need to keep the content/style VGG networks around, because we'll never back-propagate through them.
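
For example, here is a minimal sketch of that evaluate-and-save step, in TF1 style. The constructor call and layer attribute names (''conv4_2'', etc.) follow Frossard's original code, and ''content_img''/''style_img'' are assumed to be images you've already loaded; check the provided vgg16.py and scaffold for the real API, since the model file has been modified:

<code python>
import numpy as np
import tensorflow as tf
import vgg16   # the provided model file

sess = tf.Session()
imgs = tf.placeholder(tf.float32, [1, 224, 224, 3])
vgg = vgg16.vgg16(imgs, 'vgg16_weights.npz', sess)   # constructor signature assumed

# content_img / style_img: [224,224,3] float32 arrays loaded from the provided pngs.
# One forward pass each; we keep only the resulting numpy activations.
content_acts = sess.run(vgg.conv4_2, feed_dict={imgs: content_img[np.newaxis]})
style_layers = [vgg.conv1_1, vgg.conv2_1, vgg.conv3_1, vgg.conv4_1, vgg.conv5_1]
style_acts = sess.run(style_layers, feed_dict={imgs: style_img[np.newaxis]})
</code>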
The steps for completion of this lab are:

  - Run the VGG network on the content and style images. Save the resulting activations.
  - Construct a content loss function, based on the paper.
  - Construct a style loss function, based on the paper (sketched below):
    - For each layer specified in the paper (also noted in the code), you'll need to construct a Gram matrix.
    - That Gram matrix should match an equivalent Gram matrix computed on the style activations.
  - Construct an Adam optimizer, with a step size of 0.1.
  - Initialize all of your variables and reload your VGG weights.
  - Initialize your optimization image to be the content image (or another image of your choosing).
  - Optimize!
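
Here is a rough sketch of how the remaining steps might fit together, continuing the snippet above (''content_acts'' and ''style_acts'' were saved there). The layer choices and the 1/(4 N² M²) scaling follow Gatys et al.; the constructor, the ''load_weights'' call, and the content/style weighting are assumptions to check against the scaffold:

<code python>
def gram_np(a):
    # numpy Gram matrix of a saved [1,H,W,C] activation array
    F = a.reshape(-1, a.shape[3])
    return F.T.dot(F)

style_grams = [gram_np(a) for a in style_acts]

# The optimization image is the ONLY trainable thing; start at the content image.
opt_img = tf.Variable(content_img[np.newaxis].astype(np.float32), name='opt_img')

# Build one more copy of VGG, this time on the optimization image.
vgg_opt = vgg16.vgg16(opt_img, 'vgg16_weights.npz', sess)

def gram(acts):
    # TF Gram matrix: flatten [1,H,W,C] to F = [H*W, C], then G = F^T F
    c = acts.get_shape().as_list()[3]
    F = tf.reshape(acts, [-1, c])
    return tf.matmul(F, F, transpose_a=True)

# Content loss (eq. 1 of the paper) against the saved conv4_2 activations.
content_loss = 0.5 * tf.reduce_sum(tf.square(vgg_opt.conv4_2 - content_acts))

# Style loss (eqs. 4-5): per-layer Gram matrices must match the style Grams.
opt_style_layers = [vgg_opt.conv1_1, vgg_opt.conv2_1, vgg_opt.conv3_1,
                    vgg_opt.conv4_1, vgg_opt.conv5_1]
style_loss = 0.0
for acts, A in zip(opt_style_layers, style_grams):
    _, h, w, c = acts.get_shape().as_list()
    N, M = c, h * w
    E = tf.reduce_sum(tf.square(gram(acts) - A)) / (4.0 * N**2 * M**2)
    style_loss += E / len(opt_style_layers)   # equal layer weights w_l

# The content/style tradeoff (alpha/beta in the paper) is a knob to tune.
total_loss = content_loss + 1000.0 * style_loss

# Adam on the image only; then initialize everything and re-load the VGG
# weights, since the initializer clobbers them.
train_step = tf.train.AdamOptimizer(0.1).minimize(total_loss, var_list=[opt_img])
sess.run(tf.global_variables_initializer())
vgg_opt.load_weights('vgg16_weights.npz', sess)   # method name assumed

for i in range(6000):
    _, l = sess.run([train_step, total_loss])
    if i % 100 == 0:
        print(i, l)
</code>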

Some of these steps are already done in the scaffolding code.

Note that I ran my DNN for about 6000 steps to generate the image shown above. Here was my loss function over time:

<code>
ITER           LOSS        STYLE LOSS  CONTENT LOSS
   0  210537.875000   210537872.00000      0.000000
 100   73993.000000   67282552.000000   6710.441406
 200   47634.054688   39536856.000000   8097.196777
 300   36499.234375   28016930.000000   8482.302734
 400   30405.132812   21805504.000000   8599.625977
 500   26572.333984   17947418.000000   8624.916016
 600   23952.351562   15339518.000000   8612.833008
 700   22057.589844   13475838.000000   8581.751953
 800   20623.390625   12093137.000000   8530.253906
 900   19504.234375   11023667.000000   8480.566406
1000   18598.349609   10174618.000000   8423.731445
1100   17857.289062    9491233.000000   8366.055664
1200   17243.207031    8932358.000000   8310.849609
1300   16727.312500    8470261.000000   8257.049805
1400   16287.441406    8079912.500000   8207.528320
1500   15904.160156    7747010.500000   8157.148926
1600   15567.595703    7453235.500000   8114.359863
1700   15269.226562    7199946.500000   8069.279297
1800   15003.159180    6973264.000000   8029.895020
1900   14762.021484    6776666.500000   7985.354492
2000   14544.566406    6602410.000000   7942.156738
2100   14347.167969    6442019.000000   7905.148926
2200   14166.757812    6299105.500000   7867.651367
2300   13999.201172    6169558.500000   7829.643066
2400   13845.177734    6053753.000000   7791.424316
2500   13701.140625    5946503.500000   7754.636230
2600   13566.027344    5846906.000000   7719.121582
2700   13440.531250    5751874.500000   7688.655762
2800   13322.011719    5664197.500000   7657.814453
2900   13210.117188    5585183.000000   7624.934570
3000   13105.109375    5510268.000000   7594.841797
3100   13005.414062    5440027.500000   7565.385742
3200   12912.160156    5376126.000000   7536.033203
3300   12824.537109    5316451.500000   7508.085938
3400   12742.234375    5259337.500000   7482.895996
3500   12663.185547    5202367.500000   7460.817871
3600   12588.695312    5151772.000000   7436.922363
3700   12517.728516    5103315.000000   7414.413574
3800   12450.191406    5055678.000000   7394.513184
3900   12385.476562    5012455.000000   7373.021484
4000   12323.820312    4973657.000000   7350.163086
4100   12263.249023    4937481.000000   7325.767578
4200   12204.673828    4898750.000000   7305.923340
4300   12148.785156    4860086.000000   7288.698242
4400   12095.140625    4822883.500000   7272.257324
4500   12043.544922    4787642.500000   7255.902832
4600   11992.242188    4753499.500000   7238.742188
4700   11942.533203    4722825.500000   7219.708008
4800   11895.559570    4695372.500000   7200.187012
4900   11849.578125    4666181.000000   7183.397461
5000   11804.967773    4639222.500000   7165.745117
5100   11762.816406    4614679.500000   7148.136719
5200   11722.379883    4589744.000000   7132.635742
5300   11682.291016    4565345.000000   7116.945312
5400   11642.744141    4541704.500000   7101.039062
5500   11604.595703    4519445.000000   7085.149902
5600   11568.400391    4497892.000000   7070.507812
5700   11533.195312    4478154.000000   7055.040527
5800   11497.519531    4459191.000000   7038.328125
5900   11463.125977    4439539.000000   7023.586914
6000   11429.999023    4421518.000000   7008.480957
</code>

----
====Hints:====

You should make sure that if you initialize your image to the content image, and your loss function is strictly the content loss, then your loss is 0.0.
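
One cheap way to check this, reusing names from the sketches above:

<code python>
# With opt_img initialized to the content image, vgg_opt.conv4_2 should
# reproduce content_acts exactly, so the content loss alone should be 0.0.
sess.run(tf.global_variables_initializer())
vgg_opt.load_weights('vgg16_weights.npz', sess)
print(sess.run(content_loss))   # expect 0.0
</code>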

I found that it was important to clip pixel values to be in [0,255]. To do that, every 100 iterations I extracted the image, clipped it, and then assigned it back in.
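
In sketch form, with the assign op built once, outside the loop, so the graph doesn't grow every 100 steps (''opt_img'' and ''train_step'' are from the sketches above):

<code python>
# Pull the optimization image back into the valid [0,255] pixel range.
clip_op = tf.assign(opt_img, tf.clip_by_value(opt_img, 0.0, 255.0))

for i in range(6000):
    sess.run(train_step)
    if i % 100 == 0:
        sess.run(clip_op)
</code>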

...although now that I think about it, perhaps I should have been operating on whitened images from the beginning! You should probably try that.

----
====Bonus:====

There's no official extra credit for this lab, but have some fun with it! Try different content and different styles. See if you can get nicer, higher-resolution images out of it.

Also, take a look at the vgg16.py code. What happens if you swap out max pooling for average pooling?
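
That swap is a one-line change inside vgg16.py. The calls below mirror what typical Frossard-style VGG16 code looks like; the exact variable names in the provided file may differ:

<code python>
# Before: max pooling, as in the standard VGG16 definition.
self.pool1 = tf.nn.max_pool(self.conv1_2, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='SAME', name='pool1')

# After: average pooling, which Gatys et al. report gives smoother results.
self.pool1 = tf.nn.avg_pool(self.conv1_2, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='SAME', name='pool1')
</code>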
What difference does whitening the input images make?

Show me the awesome results you can generate!