cs501r_f2016:lab13

====Objective:====

====Deliverable:====
  
{{ :cs501r_f2016:style1.png?300|}}
  
For this lab, you will need to implement the style transfer algorithm of [[https://arxiv.org/pdf/1508.06576v2.pdf|Gatys et al]].

====Grading standards:====
  
Your code will be graded on the following:
  
  * 35% Correct extraction of statistics
  * 35% Correct construction of cost function
  * 20% Correct initialization and optimization of image variable
  * 10% Awesome-looking final image
  
----
====Description:====
  
For this lab, you should implement the style transfer algorithm referenced above.  We are providing the following, [[https://www.dropbox.com/sh/tt0ctms12aumgui/AACRKSSof6kw-wi8vs1v8ls3a?dl=0|available from a dropbox folder]]:

  - lab10_scaffold.py - Lab 10 scaffolding code
  - vgg16.py - The VGG16 model
  - content.png - An example content image
  - style.png - An example style image

You will also need the VGG16 pre-trained weights:
  
  - [[http://liftothers.org/byu/vgg16_weights.npz|VGG16 weights]]
  
In the scaffolding code, you will find some examples of how to use the provided VGG model.  (This model is a slightly modified version of [[https://www.cs.toronto.edu/~frossard/post/vgg16/|code available here]]).
  
**Note:** In class, we discussed how to construct a computation graph that reuses the VGG network 3 times (one each for the content, style, and optimization images).  It turns out that you don't need to do that.  In fact, we merely need to //evaluate// the VGG network on the content and style images, and save the resulting activations.

The activations can be used to construct a cost function directly.  In other words, we don't need to keep around the content/style VGG networks, because we'll never back-propagate through them.
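
To make that concrete, here is a minimal sketch of the "evaluate once, save the activations" idea.  It assumes the provided vgg16.py keeps the same class name, constructor, and layer attributes (''conv1_1'' ... ''conv5_1'', ''conv4_2'') as Frossard's original code, and that ''content_img'' and ''style_img'' are already-loaded 1x224x224x3 numpy arrays; check the scaffold for the exact names before copying anything.

<code python>
import tensorflow as tf
from vgg16 import vgg16                            # provided model; names below are assumptions

imgs = tf.placeholder(tf.float32, [1, 224, 224, 3])
sess = tf.Session()
vgg  = vgg16(imgs, 'vgg16_weights.npz', sess)      # constructor loads the pre-trained weights

content_layer = vgg.conv4_2                        # content layer used in the paper
style_layers  = [vgg.conv1_1, vgg.conv2_1, vgg.conv3_1,
                 vgg.conv4_1, vgg.conv5_1]         # style layers used in the paper

# One forward pass per image.  From here on these are plain numpy arrays,
# so nothing will ever back-propagate through this network.
content_acts = sess.run(content_layer, feed_dict={imgs: content_img})
style_acts   = sess.run(style_layers,  feed_dict={imgs: style_img})
</code>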
  
The steps for completion of this lab are:
  
  - Run the VGG network on the content and style images.  Save the activations.
  - Construct a content loss function, based on the paper (both losses are sketched in code just after this list)
  - Construct a style loss function, based on the paper
    - For each layer specified in the paper (also noted in the code), you'll need to construct a Gram matrix
    - That Gram matrix should match an equivalent Gram matrix computed on the style activations
  - Construct an Adam optimizer, step size 0.1
  - Initialize all of your variables and reload your VGG weights
  - Initialize your optimization image to be the content image (or another image of your choosing)
  - Optimize!
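
Here is one way the content and style losses might be set up, following the formulas in Gatys et al.  Treat it as a sketch that continues the one above: ''opt_img'' (the image variable you will optimize) and ''opt_vgg'' (a second VGG instance built on top of it) are assumed names, and the ''alpha''/''beta'' weighting is something you will need to tune yourself.

<code python>
# The image we optimize, and a VGG instance built on top of it (assumed names).
opt_img = tf.Variable(tf.truncated_normal([1, 224, 224, 3], stddev=20.0), name='opt_img')
opt_vgg = vgg16(opt_img, 'vgg16_weights.npz', sess)
opt_style_layers = [opt_vgg.conv1_1, opt_vgg.conv2_1, opt_vgg.conv3_1,
                    opt_vgg.conv4_1, opt_vgg.conv5_1]

def gram_matrix(acts):
    # acts: [1, H, W, C] activations -> [C, C] Gram matrix of feature correlations
    channels = acts.get_shape().as_list()[3]
    F = tf.reshape(acts, [-1, channels])             # (H*W) x C
    return tf.matmul(F, F, transpose_a=True)

# Content loss: squared error against the saved conv4_2 activations.
content_loss = 0.5 * tf.reduce_sum(tf.square(opt_vgg.conv4_2 - content_acts))

# Style loss: Gram-matrix mismatch, averaged over the chosen layers.
style_loss = 0.0
for opt_layer, style_act in zip(opt_style_layers, style_acts):
    _, h, w, c = opt_layer.get_shape().as_list()
    N, M = c, h * w
    G = gram_matrix(opt_layer)                       # Gram matrix of the optimization image
    A = gram_matrix(tf.constant(style_act))          # Gram matrix of the saved style activations
    style_loss += (1.0 / (4.0 * N ** 2 * M ** 2)) * tf.reduce_sum(tf.square(G - A))
style_loss /= len(opt_style_layers)

alpha, beta = 1e-3, 1.0                              # content/style trade-off; tune to taste
total_loss  = alpha * content_loss + beta * style_loss
</code>

The Gram matrix is simply the matrix of inner products between every pair of feature maps at a layer; it is the paper's summary statistic for style.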
  
Some of these steps are already done in the scaffolding code.
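
Continuing the same sketch, the optimizer setup and main loop might look like the following.  The ''var_list'' argument matters, since only the image variable should be trained, never the VGG weights, and the weights have to be reloaded after the global initializer wipes them (the ''load_weights'' method is Frossard's; the provided file may differ).

<code python>
# Adam, step size 0.1, optimizing only the image variable.
train_step = tf.train.AdamOptimizer(0.1).minimize(total_loss, var_list=[opt_img])

sess.run(tf.global_variables_initializer())
opt_vgg.load_weights('vgg16_weights.npz', sess)    # reload weights clobbered by the initializer
sess.run(opt_img.assign(content_img))              # start from the content image

for i in range(6000):
    _, loss = sess.run([train_step, total_loss])
    if i % 100 == 0:
        print(i, loss)
</code>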
  
Note that I ran my DNN for about 6000 steps to generate the image shown above.
  
Here was my loss function over time:
  
- +<​code>​ 
----- +ITER    LOSS            STYLE LOSS      CONTENT LOSS 
-====Writeup:​==== +0       ​210537.875000 ​  ​210537872.00000 0.000000 
- +100     ​73993.000000 ​   67282552.000000 6710.441406 
-As discussed in the "​Deliverable"​ section, your writeup must include the following: +200     ​47634.054688 ​   39536856.000000 8097.196777 
- +300     ​36499.234375 ​   28016930.000000 8482.302734 
-  ​- A description of your test/​training split +400     ​30405.132812 ​   21805504.000000 8599.625977 
-  ​- A description of your resnet architecture (layers, strides, nonlinearities,​ etc.) +500     ​26572.333984 ​   17947418.000000 8624.916016 
-  ​- How you assessed whether or not your architecture was working +600     ​23952.351562 ​   15339518.000000 8612.833008 
-  ​- The final performance of your classifier +700     ​22057.589844 ​   13475838.000000 8581.751953 
- +800     ​20623.390625 ​   12093137.000000 8530.253906 
-This writeup should be small - less than 1 page.  ​You don't need to wax eloquent.+900     ​19504.234375 ​   11023667.000000 8480.566406 
 +1000    18598.349609 ​   10174618.000000 8423.731445 
 +1100    17857.289062 ​   9491233.000000 ​ 8366.055664 
 +1200    17243.207031 ​   8932358.000000 ​ 8310.849609 
 +1300    16727.312500 ​   8470261.000000 ​ 8257.049805 
 +1400    16287.441406 ​   8079912.500000 ​ 8207.528320 
 +1500    15904.160156 ​   7747010.500000 ​ 8157.148926 
 +1600    15567.595703 ​   7453235.500000 ​ 8114.359863 
 +1700    15269.226562 ​   7199946.500000 ​ 8069.279297 
 +1800    15003.159180 ​   6973264.000000 ​ 8029.895020 
 +1900    14762.021484 ​   6776666.500000 ​ 7985.354492 
 +2000    14544.566406 ​   6602410.000000 ​ 7942.156738 
 +2100    14347.167969 ​   6442019.000000 ​ 7905.148926 
 +2200    14166.757812 ​   6299105.500000 ​ 7867.651367 
 +2300    13999.201172 ​   6169558.500000 ​ 7829.643066 
 +2400    13845.177734 ​   6053753.000000 ​ 7791.424316 
 +2500    13701.140625 ​   5946503.500000 ​ 7754.636230 
 +2600    13566.027344 ​   5846906.000000 ​ 7719.121582 
 +2700    13440.531250 ​   5751874.500000 ​ 7688.655762 
 +2800    13322.011719 ​   5664197.500000 ​ 7657.814453 
 +2900    13210.117188 ​   5585183.000000 ​ 7624.934570 
 +3000    13105.109375 ​   5510268.000000 ​ 7594.841797 
 +3100    13005.414062 ​   5440027.500000 ​ 7565.385742 
 +3200    12912.160156 ​   5376126.000000 ​ 7536.033203 
 +3300    12824.537109 ​   5316451.500000 ​ 7508.085938 
 +3400    12742.234375 ​   5259337.500000 ​ 7482.895996 
 +3500    12663.185547 ​   5202367.500000 ​ 7460.817871 
 +3600    12588.695312 ​   5151772.000000 ​ 7436.922363 
 +3700    12517.728516 ​   5103315.000000 ​ 7414.413574 
 +3800    12450.191406 ​   5055678.000000 ​ 7394.513184 
 +3900    12385.476562 ​   5012455.000000 ​ 7373.021484 
 +4000    12323.820312 ​   4973657.000000 ​ 7350.163086 
 +4100    12263.249023 ​   4937481.000000 ​ 7325.767578 
 +4200    12204.673828 ​   4898750.000000 ​ 7305.923340 
 +4300    12148.785156 ​   4860086.000000 ​ 7288.698242 
 +4400    12095.140625 ​   4822883.500000 ​ 7272.257324 
 +4500    12043.544922 ​   4787642.500000 ​ 7255.902832 
 +4600    11992.242188 ​   4753499.500000 ​ 7238.742188 
 +4700    11942.533203 ​   4722825.500000 ​ 7219.708008 
 +4800    11895.559570 ​   4695372.500000 ​ 7200.187012 
 +4900    11849.578125 ​   4666181.000000 ​ 7183.397461 
 +5000    11804.967773 ​   4639222.500000 ​ 7165.745117 
 +5100    11762.816406 ​   4614679.500000 ​ 7148.136719 
 +5200    11722.379883 ​   4589744.000000 ​ 7132.635742 
 +5300    11682.291016 ​   4565345.000000 ​ 7116.945312 
 +5400    11642.744141 ​   4541704.500000 ​ 7101.039062 
 +5500    11604.595703 ​   4519445.000000 ​ 7085.149902 
 +5600    11568.400391 ​   4497892.000000 ​ 7070.507812 
 +5700    11533.195312 ​   4478154.000000 ​ 7055.040527 
 +5800    11497.519531 ​   4459191.000000 ​ 7038.328125 
 +5900    11463.125977 ​   4439539.000000 ​ 7023.586914 
 +6000    11429.999023 ​   4421518.000000 ​ 7008.480957 
 +</​code>​
  
----
====Hints:====
  
You should make sure that if you initialize your image to the content image and your loss function is strictly the content loss, then your loss is exactly 0.0.
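
In code, reusing the assumed names from the sketches above, the check is just:

<code python>
# With the optimization image equal to the content image and only the content
# term active, the content loss should come out as 0.0.
sess.run(opt_img.assign(content_img))
print(sess.run(content_loss))   # expect 0.0
</code>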
  
I found that it was important to clip pixel values to be in [0,255].  To do that, every 100 iterations I extracted the image, clipped it, and then assigned it back in.
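
As a sketch on top of the training loop above: building a single clip op with ''tf.clip_by_value'' and running it every 100 iterations has the same effect as pulling the image out, clipping it in numpy, and assigning it back in.

<code python>
# Build the clip op once, outside the loop.
clip_op = opt_img.assign(tf.clip_by_value(opt_img, 0.0, 255.0))

for i in range(6000):
    _, loss = sess.run([train_step, total_loss])
    if i % 100 == 0:
        sess.run(clip_op)       # keep pixel values in [0, 255]
        print(i, loss)
</code>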
  
...although now that I think about it, perhaps I should have been operating on whitened images from the beginning!  You should probably try that.
  
----
====Bonus:====
  
There's no official extra credit for this lab, but have some fun with it!  Try different content and different styles.  See if you can get nicer, higher-resolution images out of it.
  
Also, take a look at the vgg16.py code.  What happens if you swap out max pooling for average pooling?
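
For reference, Frossard's version of vgg16.py uses 2x2 max pooling with stride 2, so the experiment is a one-line change per pooling layer (the provided file may differ slightly; ''conv_out'' below is just a stand-in tensor).  Gatys et al. report that average pooling gives slightly more appealing results.

<code python>
import tensorflow as tf

# Stand-in for one of the conv activations built inside vgg16.py.
conv_out = tf.placeholder(tf.float32, [1, 224, 224, 64])

# What the model does now: 2x2 max pooling, stride 2.
pooled_max = tf.nn.max_pool(conv_out, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='SAME')

# The average-pooling swap, with the same window and stride.
pooled_avg = tf.nn.avg_pool(conv_out, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='SAME')
</code>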
  
What difference does whitening the input images make?
  
Show me the awesome results you can generate!
  