====Objective:====

----
====Deliverable:====

{{ :cs501r_f2016:style1.png?300|}}

For this lab, you will need to implement the style transfer algorithm of [[https://arxiv.org/pdf/1508.06576v2.pdf|Gatys et al]].

You should turn in:
  - The final image that you generated
  - Your code

An example image that I generated is shown at the right.

----
====Grading standards:====

Your code will be graded on the following:

  * 35% Correct extraction of statistics
  * 35% Correct construction of cost function
  * 20% Correct initialization and optimization of image variable
  * 10% Awesome looking final image

----
====Description:====

For this lab, you should implement the style transfer algorithm referenced above. We are providing the following, [[https://www.dropbox.com/sh/tt0ctms12aumgui/AACRKSSof6kw-wi8vs1v8ls3a?dl=0|available from a dropbox folder]]:

  - lab10_scaffold.py - Lab 10 scaffolding code
  - vgg16.py - The VGG16 model
  - content.png - An example content image
  - style.png - An example style image

You will also need the VGG16 pre-trained weights:
  - [[http://liftothers.org/byu/vgg16_weights.npz|VGG16 weights]]

In the scaffolding code, you will find some examples of how to use the provided VGG model. (This model is a slightly modified version of [[https://www.cs.toronto.edu/~frossard/post/vgg16/|code available here]].)

**Note:** In class, we discussed how to construct a computation graph that reuses the VGG network 3 times (one each for the content, style, and optimization images). It turns out that you don't need to do that. In fact, we merely need to //evaluate// the VGG network on the content and style images, and save the resulting activations.

The activations can be used to construct a cost function directly. In other words, we don't need to keep the content/style VGG networks around, because we'll never back-propagate through them.
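
For example, here is a minimal sketch of that evaluate-and-save step, in TF1 style. The constructor call and layer attribute names (''conv4_2'', etc.) follow Frossard's original code, and ''content_img''/''style_img'' are assumed to be images you've already loaded; check the provided vgg16.py and scaffold for the real API, since the model file has been modified:

<code python>
import numpy as np
import tensorflow as tf
import vgg16   # the provided model file

sess = tf.Session()
imgs = tf.placeholder(tf.float32, [1, 224, 224, 3])
vgg = vgg16.vgg16(imgs, 'vgg16_weights.npz', sess)   # constructor signature assumed

# content_img / style_img: [224,224,3] float32 arrays loaded from the provided pngs.
# One forward pass each; we keep only the resulting numpy activations.
content_acts = sess.run(vgg.conv4_2, feed_dict={imgs: content_img[np.newaxis]})
style_layers = [vgg.conv1_1, vgg.conv2_1, vgg.conv3_1, vgg.conv4_1, vgg.conv5_1]
style_acts = sess.run(style_layers, feed_dict={imgs: style_img[np.newaxis]})
</code>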
The steps for completion of this lab are:

  - Run the VGG network on the content and style images. Save the resulting activations.
  - Construct a content loss function, based on the paper.
  - Construct a style loss function, based on the paper (sketched below):
    - For each layer specified in the paper (also noted in the code), you'll need to construct a Gram matrix.
    - That Gram matrix should match an equivalent Gram matrix computed on the style activations.
  - Construct an Adam optimizer, with a step size of 0.1.
  - Initialize all of your variables and reload your VGG weights.
  - Initialize your optimization image to be the content image (or another image of your choosing).
  - Optimize!
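
Here is a rough sketch of how the remaining steps might fit together, continuing the snippet above (''content_acts'' and ''style_acts'' were saved there). The layer choices and the 1/(4 N² M²) scaling follow Gatys et al.; the constructor, the ''load_weights'' call, and the content/style weighting are assumptions to check against the scaffold:

<code python>
def gram_np(a):
    # numpy Gram matrix of a saved [1,H,W,C] activation array
    F = a.reshape(-1, a.shape[3])
    return F.T.dot(F)

style_grams = [gram_np(a) for a in style_acts]

# The optimization image is the ONLY trainable thing; start at the content image.
opt_img = tf.Variable(content_img[np.newaxis].astype(np.float32), name='opt_img')

# Build one more copy of VGG, this time on the optimization image.
vgg_opt = vgg16.vgg16(opt_img, 'vgg16_weights.npz', sess)

def gram(acts):
    # TF Gram matrix: flatten [1,H,W,C] to F = [H*W, C], then G = F^T F
    c = acts.get_shape().as_list()[3]
    F = tf.reshape(acts, [-1, c])
    return tf.matmul(F, F, transpose_a=True)

# Content loss (eq. 1 of the paper) against the saved conv4_2 activations.
content_loss = 0.5 * tf.reduce_sum(tf.square(vgg_opt.conv4_2 - content_acts))

# Style loss (eqs. 4-5): per-layer Gram matrices must match the style Grams.
opt_style_layers = [vgg_opt.conv1_1, vgg_opt.conv2_1, vgg_opt.conv3_1,
                    vgg_opt.conv4_1, vgg_opt.conv5_1]
style_loss = 0.0
for acts, A in zip(opt_style_layers, style_grams):
    _, h, w, c = acts.get_shape().as_list()
    N, M = c, h * w
    E = tf.reduce_sum(tf.square(gram(acts) - A)) / (4.0 * N**2 * M**2)
    style_loss += E / len(opt_style_layers)   # equal layer weights w_l

# The content/style tradeoff (alpha/beta in the paper) is a knob to tune.
total_loss = content_loss + 1000.0 * style_loss

# Adam on the image only; then initialize everything and re-load the VGG
# weights, since the initializer clobbers them.
train_step = tf.train.AdamOptimizer(0.1).minimize(total_loss, var_list=[opt_img])
sess.run(tf.global_variables_initializer())
vgg_opt.load_weights('vgg16_weights.npz', sess)   # method name assumed

for i in range(6000):
    _, l = sess.run([train_step, total_loss])
    if i % 100 == 0:
        print(i, l)
</code>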

Some of these steps are already done in the scaffolding code.

Note that I ran my DNN for about 6000 steps to generate the image shown above. Here was my loss function over time:

<code>
ITER           LOSS        STYLE LOSS  CONTENT LOSS
   0  210537.875000   210537872.00000      0.000000
 100   73993.000000   67282552.000000   6710.441406
 200   47634.054688   39536856.000000   8097.196777
 300   36499.234375   28016930.000000   8482.302734
 400   30405.132812   21805504.000000   8599.625977
 500   26572.333984   17947418.000000   8624.916016
 600   23952.351562   15339518.000000   8612.833008
 700   22057.589844   13475838.000000   8581.751953
 800   20623.390625   12093137.000000   8530.253906
 900   19504.234375   11023667.000000   8480.566406
1000   18598.349609   10174618.000000   8423.731445
1100   17857.289062    9491233.000000   8366.055664
1200   17243.207031    8932358.000000   8310.849609
1300   16727.312500    8470261.000000   8257.049805
1400   16287.441406    8079912.500000   8207.528320
1500   15904.160156    7747010.500000   8157.148926
1600   15567.595703    7453235.500000   8114.359863
1700   15269.226562    7199946.500000   8069.279297
1800   15003.159180    6973264.000000   8029.895020
1900   14762.021484    6776666.500000   7985.354492
2000   14544.566406    6602410.000000   7942.156738
2100   14347.167969    6442019.000000   7905.148926
2200   14166.757812    6299105.500000   7867.651367
2300   13999.201172    6169558.500000   7829.643066
2400   13845.177734    6053753.000000   7791.424316
2500   13701.140625    5946503.500000   7754.636230
2600   13566.027344    5846906.000000   7719.121582
2700   13440.531250    5751874.500000   7688.655762
2800   13322.011719    5664197.500000   7657.814453
2900   13210.117188    5585183.000000   7624.934570
3000   13105.109375    5510268.000000   7594.841797
3100   13005.414062    5440027.500000   7565.385742
3200   12912.160156    5376126.000000   7536.033203
3300   12824.537109    5316451.500000   7508.085938
3400   12742.234375    5259337.500000   7482.895996
3500   12663.185547    5202367.500000   7460.817871
3600   12588.695312    5151772.000000   7436.922363
3700   12517.728516    5103315.000000   7414.413574
3800   12450.191406    5055678.000000   7394.513184
3900   12385.476562    5012455.000000   7373.021484
4000   12323.820312    4973657.000000   7350.163086
4100   12263.249023    4937481.000000   7325.767578
4200   12204.673828    4898750.000000   7305.923340
4300   12148.785156    4860086.000000   7288.698242
4400   12095.140625    4822883.500000   7272.257324
4500   12043.544922    4787642.500000   7255.902832
4600   11992.242188    4753499.500000   7238.742188
4700   11942.533203    4722825.500000   7219.708008
4800   11895.559570    4695372.500000   7200.187012
4900   11849.578125    4666181.000000   7183.397461
5000   11804.967773    4639222.500000   7165.745117
5100   11762.816406    4614679.500000   7148.136719
5200   11722.379883    4589744.000000   7132.635742
5300   11682.291016    4565345.000000   7116.945312
5400   11642.744141    4541704.500000   7101.039062
5500   11604.595703    4519445.000000   7085.149902
5600   11568.400391    4497892.000000   7070.507812
5700   11533.195312    4478154.000000   7055.040527
5800   11497.519531    4459191.000000   7038.328125
5900   11463.125977    4439539.000000   7023.586914
6000   11429.999023    4421518.000000   7008.480957
</code>

----
====Hints:====

You should make sure that if you initialize your image to the content image, and your loss function is strictly the content loss, then your loss is 0.0.
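
One cheap way to check this, reusing names from the sketches above:

<code python>
# With opt_img initialized to the content image, vgg_opt.conv4_2 should
# reproduce content_acts exactly, so the content loss alone should be 0.0.
sess.run(tf.global_variables_initializer())
vgg_opt.load_weights('vgg16_weights.npz', sess)
print(sess.run(content_loss))   # expect 0.0
</code>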

I found that it was important to clip pixel values to be in [0,255]. To do that, every 100 iterations I extracted the image, clipped it, and then assigned it back in.
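
In sketch form, with the assign op built once, outside the loop, so the graph doesn't grow every 100 steps (''opt_img'' and ''train_step'' are from the sketches above):

<code python>
# Pull the optimization image back into the valid [0,255] pixel range.
clip_op = tf.assign(opt_img, tf.clip_by_value(opt_img, 0.0, 255.0))

for i in range(6000):
    sess.run(train_step)
    if i % 100 == 0:
        sess.run(clip_op)
</code>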

...although now that I think about it, perhaps I should have been operating on whitened images from the beginning! You should probably try that.

----
====Bonus:====

There's no official extra credit for this lab, but have some fun with it! Try different content and different styles. See if you can get nicer, higher-resolution images out of it.

Also, take a look at the vgg16.py code. What happens if you swap out max pooling for average pooling?
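
That swap is a one-line change inside vgg16.py. The calls below mirror what typical Frossard-style VGG16 code looks like; the exact variable names in the provided file may differ:

<code python>
# Before: max pooling, as in the standard VGG16 definition.
self.pool1 = tf.nn.max_pool(self.conv1_2, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='SAME', name='pool1')

# After: average pooling, which Gatys et al. report gives smoother results.
self.pool1 = tf.nn.avg_pool(self.conv1_2, ksize=[1, 2, 2, 1],
                            strides=[1, 2, 2, 1], padding='SAME', name='pool1')
</code>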
What difference does whitening the input images make?

Show me the awesome results you can generate!