  * 35% Correct extraction of statistics
  * 45% Correct construction of loss function in a loss class (see the sketch after this list)
  * 10% Correct initialization and optimization of image variable in a dataset class
  * 10% Awesome looking final image
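To make the expected structure concrete, here is a minimal, hypothetical sketch of a loss module and an optimizable image variable. The names, the MSE criterion, and the Adam optimizer are illustrative assumptions; organize your own version into the loss class and dataset class the rubric asks for.

<code python>
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    """Hypothetical loss class: compares activations against a fixed target."""
    def __init__(self, target):
        super(ContentLoss, self).__init__()
        self.target = target.detach()  # detach so only the image is optimized

    def forward(self, input):
        return F.mse_loss(input, self.target)

# The generated image is the only tensor with requires_grad=True.
content = torch.rand(1, 3, 224, 224)          # stand-in for the content image
image = content.clone().requires_grad_(True)  # initialize from the content image
optimizer = torch.optim.Adam([image], lr=0.05)
</code>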
  
====Description:====
  
For this lab, you should implement the style transfer algorithm referenced above. To do this, you will need to unpack the given images. Since we want you to focus on implementing the paper and the loss function, we will give you the code for this.
  
<code python>
...
</code>
  
Or, after the images are uploaded to the local filesystem, you can use:

<code python>
from PIL import Image
import numpy as np

style_image = Image.open("style_image.png")
style_image = load_and_normalize(np.array(style_image)).unsqueeze(0)

content_image = Image.open("content_image.png")
content_image = load_and_normalize(np.array(content_image)).unsqueeze(0)
</code>
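''load_and_normalize'' comes from the code we give you, not from a library. If you are curious, a minimal sketch of such a helper might look like the following; the exact conversion and scaling choices here are assumptions, not the provided implementation.

<code python>
import numpy as np
import torch

def load_and_normalize(img_array):
    # Hypothetical sketch: convert an HWC uint8 image in [0, 255]
    # to a CHW float tensor in [0, 1].
    img = torch.from_numpy(img_array).float().permute(2, 0, 1)
    return img / 255.0
</code>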
**For reference on the network, we will give you a [[https://pytorch.org/tutorials/advanced/neural_style_tutorial.html#sphx-glr-download-advanced-neural-style-tutorial-py|PyTorch implementation]] to look at. You should not just copy-paste it; implement the steps described in the paper yourself. The tutorial uses more intuitive notation than the Gatys et al. paper and does a good job of explaining what each step does.**
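The heart of that implementation is the Gram-matrix style statistic. As a concrete reminder, here is a minimal sketch of the Gram matrix computation, closely following the referenced tutorial; normalizing by the layer's total size is one common convention, not the only one.

<code python>
import torch

def gram_matrix(activations):
    # activations: (batch, channels, height, width) features from one VGG layer
    b, c, h, w = activations.size()
    features = activations.view(b * c, h * w)  # one row per channel
    gram = torch.mm(features, features.t())    # channel-to-channel correlations
    return gram.div(b * c * h * w)             # normalize by the layer's size
</code>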
  
Additionally, the paper talks about the VGG network. The paper uses the VGG16 network, and we will do the same. You do NOT need to implement this! PyTorch has a VGG16 model in one of its libraries. Here we will show you how to access it.
<code python>
from torchvision import models

vgg = models.vgg16(pretrained=True)
</code>
  
However, pulling intermediate activations out of the model is awkward, so you can use the VGGIntermediate class we have created for you. It registers a forward hook on each layer you request, so that on every forward pass the output activations of those layers are written to a dictionary.
  
<code python>
import torch.nn as nn
from torchvision import models

class VGGIntermediate(nn.Module):
  def __init__(self, requested=[]):
    super(VGGIntermediate, self).__init__()
    self.intermediates = {}
    self.vgg = models.vgg16(pretrained=True).features.eval()
    for i, m in enumerate(self.vgg.children()):
      if isinstance(m, nn.ReLU):   # we want to set the relu layers to NOT do the relu in place,
        m.inplace = False          # since the model has a hard time going backwards on in-place functions.

      if i in requested:
        def curry(i):
          def hook(module, input, output):
            self.intermediates[i] = output
          return hook
        m.register_forward_hook(curry(i))

  def forward(self, x):
    self.vgg(x)
    return self.intermediates
</code>
  
To view all the layers in the network, you can ''print(VGGIntermediate(requested=requested_vals))''. The requested values are a list of the integer indices of the layers you are trying to access; you can find them by hand using the layer list included below, or (preferably) generate them with a list comprehension. To access the dictionary of the activation layers, you can use:
  
<code python>
vgg = VGGIntermediate(requested=requested_vals)
vgg.cuda()
layer = vgg(<image_name>.cuda())
print(layer.keys())
output_activation = layer[requested_vals[<i>]]
</code>
  
This makes the activation layers available under the dictionary key that corresponds to the layer's position in the VGG network you are using. (Note: in the example above, if ''requested_vals[<i>]'' is 0, then ''output_activation'' is the conv1_1 activation layer.)
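For example, you could build ''requested_vals'' from the layer names with a list comprehension. The style layers chosen here (conv1_1 through conv5_1) are the set used in Gatys et al.; pick whichever layers your implementation needs.

<code python>
layer_names = [
    "conv1_1", "relu1_1", "conv1_2", "relu1_2", "maxpool1",
    "conv2_1", "relu2_1", "conv2_2", "relu2_2", "maxpool2",
    "conv3_1", "relu3_1", "conv3_2", "relu3_2", "conv3_3", "relu3_3", "maxpool3",
    "conv4_1", "relu4_1", "conv4_2", "relu4_2", "conv4_3", "relu4_3", "maxpool4",
    "conv5_1", "relu5_1", "conv5_2", "relu5_2", "conv5_3", "relu5_3", "maxpool5",
]

style_layers = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
requested_vals = [i for i, name in enumerate(layer_names) if name in style_layers]
# requested_vals == [0, 5, 10, 17, 24]
</code>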
  
Additionally, for help with understanding the VGG16 model, here is a list of the layers contained in the vgg16 model. This should help you know which indices to use when you are trying to access the activation layers specified in the Gatys et al. paper.
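The indices below are the positions of each layer in ''models.vgg16(pretrained=True).features''.

<code>
 0 conv1_1    1 relu1_1    2 conv1_2    3 relu1_2    4 maxpool1
 5 conv2_1    6 relu2_1    7 conv2_2    8 relu2_2    9 maxpool2
10 conv3_1   11 relu3_1   12 conv3_2   13 relu3_2   14 conv3_3   15 relu3_3   16 maxpool3
17 conv4_1   18 relu4_1   19 conv4_2   20 relu4_2   21 conv4_3   22 relu4_3   23 maxpool4
24 conv5_1   25 relu5_1   26 conv5_2   27 relu5_2   28 conv5_3   29 relu5_3   30 maxpool5
</code>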
You are welcome and encouraged to submit any other style transfer photographs you have, as long as you also submit the required image. Show us the awesome results you can generate!
  
----
====Hints and usefulness:====
  
A former student contributed the following:
  
Normalizing the image at each timestep is critical. Here's what I did.

Some extra things I did in the code snippet (in case they are useful):

  * I changed the VGG code to use a dict, which in my opinion made things a lot easier.
  * I swapped the max pool layers for avg pool layers (rather hackily...).
  * I used a style scale of 500000 and a content scale of 1 (not in this code).

PS: if you try to use ''torchvision.transforms.Normalize()'' it won't work, because it is missing a ''forward()'' and thus a ''backward()'' as well...

<code python>
from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class Normalization(nn.Module):
    def __init__(
        self,
        mean=torch.tensor([0.485, 0.456, 0.406]).to(device),
        std=torch.tensor([0.229, 0.224, 0.225]).to(device),
    ):
        super(Normalization, self).__init__()
        # as_tensor avoids the warning torch.tensor() raises when handed a tensor.
        self.mean = torch.as_tensor(mean).view(-1, 1, 1)
        self.std = torch.as_tensor(std).view(-1, 1, 1)

    def forward(self, img):
        return (img - self.mean) / self.std


class VGGIntermediate(nn.Module):
    def __init__(self, requested=[], transforms=[Normalization()]):
        super(VGGIntermediate, self).__init__()

        self.transforms = transforms
        self.vgg = models.vgg16(pretrained=True).features.eval()

        layers_in_order = [
            "conv1_1", "relu1_1", "conv1_2", "relu1_2", "maxpool1",
            "conv2_1", "relu2_1", "conv2_2", "relu2_2", "maxpool2",
            "conv3_1", "relu3_1", "conv3_2", "relu3_2", "conv3_3", "relu3_3", "maxpool3",
            "conv4_1", "relu4_1", "conv4_2", "relu4_2", "conv4_3", "relu4_3", "maxpool4",
            "conv5_1", "relu5_1", "conv5_2", "relu5_2", "conv5_3", "relu5_3", "maxpool5",
        ]

        self.intermediates = OrderedDict()
        for layer_name, m in zip(layers_in_order, self.vgg.children()):
            if isinstance(m, nn.ReLU):
                # Out-of-place ReLU so autograd can differentiate through it.
                m.inplace = False
            elif isinstance(m, nn.MaxPool2d):
                # Swap max pooling for average pooling (rather hackily).
                # The m=m default binds the *current* module; a bare lambda
                # would see only the last value of m when finally called.
                m.forward = lambda x, m=m: F.avg_pool2d(
                    x, m.kernel_size, m.stride, m.padding
                )

            if layer_name in requested:
                # curry() pins layer_name down for each hook's closure.
                def curry(name):
                    def hook(module, input, output):
                        self.intermediates[name] = output

                    return hook

                m.register_forward_hook(curry(layer_name))

    def forward(self, x):
        for transform in self.transforms:
            x = transform(x)
        self.vgg(x)
        return self.intermediates
</code>
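To tie the pieces together, here is a rough sketch of how this class might be used in an optimization loop. The layer choices, the Adam optimizer, the iteration count, and the reuse of the hypothetical ''gram_matrix'' helper sketched earlier are all illustrative assumptions; the style and content scales are the ones suggested above.

<code python>
style_layers = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv5_1"]
content_layers = ["conv4_2"]

vgg = VGGIntermediate(requested=style_layers + content_layers).to(device)

# Precompute the (detached) targets from the style and content images.
style_acts = vgg(style_image.to(device))
style_targets = {k: gram_matrix(style_acts[k]).detach() for k in style_layers}
content_acts = vgg(content_image.to(device))
content_targets = {k: content_acts[k].detach() for k in content_layers}

# Optimize the image itself, starting from the content image.
image = content_image.clone().to(device).requires_grad_(True)
optimizer = torch.optim.Adam([image], lr=0.05)

for step in range(300):
    optimizer.zero_grad()
    acts = vgg(image)
    style_loss = sum(F.mse_loss(gram_matrix(acts[k]), style_targets[k])
                     for k in style_layers)
    content_loss = sum(F.mse_loss(acts[k], content_targets[k])
                       for k in content_layers)
    loss = 500000 * style_loss + 1 * content_loss  # scales suggested above
    loss.backward()
    optimizer.step()
</code>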
  