====Objective:====

To explore an alternative use of DNNs by implementing the style transfer algorithm.

----
====Deliverable:====

For this lab, you will need to implement the style transfer algorithm of [[https://arxiv.org/pdf/1508.06576v2.pdf|Gatys et al.]].

  - You must extract statistics from the content and style images (see the sketch below)
  - You must formulate an optimization problem over an input image
  - You must optimize the image to match both style and content
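
For step 1, the "statistics" in Gatys et al. are the raw feature-map activations at a chosen layer (for content) and the Gram matrices of the feature maps at several layers (for style). Here is a minimal NumPy sketch of the Gram computation (the array name and shape convention are assumptions, not part of the provided code):

<code python>
import numpy as np

def gram_matrix(acts):
    # acts: feature maps from one VGG layer, shape (1, H, W, C)
    _, h, w, c = acts.shape
    f = acts.reshape(h * w, c)  # flatten the spatial dimensions: (H*W, C)
    return f.T.dot(f)           # (C, C) matrix of filter co-activations
</code>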

You should turn in the following:

  - The final image that you generated
  - Your code

----
====Grading standards:====

Your notebook will be graded on the following:

  * 35% Correct implementation of Siamese network
  * 35% Correct implementation of ResNet
  * 20% Reasonable effort to find a good-performing topology
  * 10% Results writeup

----
====Description:====

For this lab, you should implement the style transfer algorithm referenced above. We are providing the following:

  - [[http://liftothers.org/byu/lab10_scaffold.py|Lab 10 scaffolding code]]
  - [[http://liftothers.org/byu/vgg16.py|The VGG16 model]]
  - [[http://liftothers.org/byu/vgg16_weights.npz|VGG16 weights]]
  - [[http://liftothers.org/byu/content.png|An example content image]]
  - [[http://liftothers.org/byu/style.png|An example style image]]

In the scaffolding code, you will find some examples of how to use the provided VGG model. (This model is a slightly modified version of the [[https://www.cs.toronto.edu/~frossard/post/vgg16/|code available here]].)

**Note:** In class, we discussed how to construct a computation graph that reuses the VGG network 3 times (once each for the content, style, and optimization images). It turns out that you don't need to do that. In fact, you merely need to //evaluate// the VGG network on the content and style images, and save the resulting activations.
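
Concretely, you can run the content and style images through VGG once each and keep the activations as plain NumPy arrays. A minimal sketch, assuming the provided model follows the Frossard interface (a ''vgg16(imgs, weights_file, sess)'' constructor and per-layer attributes such as ''vgg.conv4_2''); the layer choices below simply follow Gatys et al.:

<code python>
import numpy as np
import tensorflow as tf
from PIL import Image
from vgg16 import vgg16  # the provided model file

sess = tf.Session()
imgs = tf.placeholder(tf.float32, [None, 224, 224, 3])
vgg = vgg16(imgs, 'vgg16_weights.npz', sess)

# layer choices follow Gatys et al.; the attribute names assume the
# Frossard-style model (conv1_1, conv2_1, ..., conv5_1, conv4_2)
style_layers = [vgg.conv1_1, vgg.conv2_1, vgg.conv3_1, vgg.conv4_1, vgg.conv5_1]
content_layer = vgg.conv4_2

def load_image(fn):
    # resize to the network's input size; convert ensures 3 channels
    img = Image.open(fn).convert('RGB').resize((224, 224))
    return np.asarray(img, dtype=np.float32)[np.newaxis, :, :, :]

# evaluate the network ONCE per image and save the resulting activations;
# these constants become the targets of your optimization problem
content_acts = sess.run(content_layer, feed_dict={imgs: load_image('content.png')})
style_acts = sess.run(style_layers, feed_dict={imgs: load_image('style.png')})
</code>

From here, the optimization variable is the image itself (e.g., a ''tf.Variable'' of the same shape), and your loss compares its activations and Gram matrices against these saved targets.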

The steps for completion of this lab are:

  - Load all of the data. Create a test/training split.
  - Establish a baseline accuracy (i.e., if you randomly predict same/different, what accuracy do you achieve?)
  - Use TensorFlow to create your Siamese network.
    - Use ResNets to extract features from the images
    - Make sure that parameters are shared across both halves of the network! (See the sketch after this list.)
  - Train the network using an optimizer of your choice
    - You should use some sort of SGD.
    - You will need to sample same/different pairs.
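
In TensorFlow, sharing weights across the two halves is usually done with variable scopes. A minimal sketch (the tiny conv layer is only a stand-in for your ResNet tower; all names are hypothetical):

<code python>
import tensorflow as tf

def tower(x):
    # stand-in feature extractor; replace the body with your ResNet
    w = tf.get_variable('conv1_w', [3, 3, 1, 32],
                        initializer=tf.random_normal_initializer(stddev=0.01))
    h = tf.nn.relu(tf.nn.conv2d(x, w, [1, 2, 2, 1], 'SAME'))
    return tf.reshape(h, [tf.shape(h)[0], -1])

x1 = tf.placeholder(tf.float32, [None, 250, 250, 1])
x2 = tf.placeholder(tf.float32, [None, 250, 250, 1])

with tf.variable_scope('siamese') as scope:
    f1 = tower(x1)           # first call creates the variables
    scope.reuse_variables()  # second call reuses the SAME weights
    f2 = tower(x2)
</code>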

Note: you will NOT be graded on the accuracy of your final classifier, as long as you make a good faith effort to come up with something that performs reasonably well.

Your ResNet should extract a vector of features from each image. Those feature vectors should then be compared to calculate an "energy"; that energy should then be input into a contrastive loss function, as discussed in class.
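
For concreteness, here is a minimal sketch of one standard contrastive loss (the Hadsell-style hinge form, with Euclidean distance as the energy; the margin value is an arbitrary assumption), reusing ''f1''/''f2'' from the sketch above:

<code python>
def contrastive_loss(f1, f2, same, margin=1.0):
    # energy: Euclidean distance between the two feature vectors
    energy = tf.sqrt(tf.reduce_sum(tf.square(f1 - f2), 1) + 1e-8)
    # same is 1.0 when the two images show the same person, else 0.0
    loss_same = same * tf.square(energy)
    loss_diff = (1.0 - same) * tf.square(tf.maximum(0.0, margin - energy))
    return tf.reduce_mean(loss_same + loss_diff)

same = tf.placeholder(tf.float32, [None])
loss = contrastive_loss(f1, f2, same)
</code>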

Remember that your network should be symmetric, so if you swap input images, nothing should change.
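
This gives a quick sanity check: with shared weights and a symmetric energy, swapping the inputs should leave the loss unchanged (''batch_a'', ''batch_b'', and ''y'' are hypothetical test batches):

<code python>
l12 = sess.run(loss, feed_dict={x1: batch_a, x2: batch_b, same: y})
l21 = sess.run(loss, feed_dict={x1: batch_b, x2: batch_a, same: y})
assert abs(l12 - l21) < 1e-5
</code>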

Note that some people in the database only have one image. These images are still useful, however (why?), so don't just throw them away.

----
====Writeup:====

As discussed in the "Deliverable" section, your writeup must include the following:

  - A description of your test/training split
  - A description of your ResNet architecture (layers, strides, nonlinearities, etc.)
  - How you assessed whether or not your architecture was working
  - The final performance of your classifier

This writeup should be short: less than one page. You don't need to wax eloquent.

----
====Hints:====

To help you get started, here's a simple script that will load all of the images and calculate labels. It assumes that the face database has been unpacked in the current directory, and that there exists a file called ''list.txt'' that was generated with the following command:

<code bash>
find ./lfw2/ -name \*.jpg > list.txt
</code>

After running this code, the images will be in the ''data'' array, and the labels will be in the ''labels'' array:

<code python>
from PIL import Image
import numpy as np

#
# assumes list.txt is a list of filenames, formatted as
#
# ./lfw2//Aaron_Eckhart/Aaron_Eckhart_0001.jpg
# ./lfw2//Aaron_Guiel/Aaron_Guiel_0001.jpg
# ...
#

files = open( './list.txt' ).readlines()

# images are assumed to be 250x250 grayscale
# (note: ~6.6 GB as float64; use a smaller dtype if memory is tight)
data = np.zeros(( len(files), 250, 250 ))
labels = np.zeros(( len(files), 1 ))

# a little hash map mapping subjects to IDs
ids = {}
scnt = 0

# load in all of our images
ind = 0
for fn in files:

    # the subject name is the directory name; index 3 relies on the
    # double slash in the paths shown above
    subject = fn.split('/')[3]
    if subject not in ids:  # dict.has_key() is Python 2 only
        ids[ subject ] = scnt
        scnt += 1
    label = ids[ subject ]

    data[ ind, :, : ] = np.array( Image.open( fn.rstrip() ) )
    labels[ ind ] = label
    ind += 1

# data is (13233, 250, 250)
# labels is (13233, 1)
</code>
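
Building on this loader, here is one possible way to sample same/different pairs (the sampling strategy and names are assumptions, not requirements):

<code python>
import random

# group image indices by subject ID so pairs can be drawn quickly
by_subject = {}
for i in range(len(files)):
    by_subject.setdefault(int(labels[i, 0]), []).append(i)

# only subjects with at least two images can form a "same" pair
multi = [s for s in by_subject if len(by_subject[s]) >= 2]

def sample_pair(p_same=0.5):
    # returns (index_a, index_b, same_flag)
    if random.random() < p_same:
        s = random.choice(multi)
        a, b = random.sample(by_subject[s], 2)
        return a, b, 1.0
    else:
        s1, s2 = random.sample(list(by_subject), 2)
        return random.choice(by_subject[s1]), random.choice(by_subject[s2]), 0.0
</code>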