====Objectives:====

* Build and train a deep conv net
* Explore and implement various initialization techniques
* Implement a parameterized module in PyTorch
* Use a principled loss function

====Video Tutorial:====

https://youtu.be/3TAuTcx-VCc

----

====Deliverable:====

For this lab, you will submit an IPython notebook via Learning Suite. This is where you build your first deep neural network! For this lab, we'll combine several concepts that we've covered in class, including new layer types, initialization strategies, and an understanding of convolutions.

----

====Grading standards:====

* 30% Part 0: Successfully followed lab video and typed in code
* 20% Part 1: Re-implement Conv2D and CrossEntropy loss function
* 20% Part 2: Implement different initialization strategies
* 10% Part 3: Print parameters, plot train/test accuracy
* 10% Part 4: Convolution parameters quiz
* 10% Tidy and legible figures, including labeled axes where appropriate

----

====Detailed specs:====

**Part 0:** Watch and follow the video tutorial.

**Part 1:** Re-implement a Conv2D module with parameters and a CrossEntropy loss function. You will need to use:

* https://pytorch.org/docs/stable/nn.html#torch.nn.Parameter
* https://pytorch.org/docs/stable/nn.html#torch.nn.functional.conv2d
* https://pytorch.org/docs/stable/torch.html#torch.exp
* https://pytorch.org/docs/stable/torch.html#torch.log

**Part 2:** Implement a few initialization strategies, which can include Xavier initialization (sometimes called Glorot initialization), orthogonal initialization, and uniform random initialization. You can specify which strategy you want to use with a parameter. Helpful links include:

* https://hjweide.github.io/orthogonal-initialization-in-convolutional-layers (or the original paper: http://arxiv.org/abs/1312.6120)
* http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization

**Part 3:** Print the number of parameters in your network and plot the accuracy of your training and validation sets over time.
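Counting parameters is a one-liner once your layers register their weights as `nn.Parameter`. A minimal sketch (the toy network here is my own example, not the architecture you should submit):

```python
import torch.nn as nn

# A small example network; your lab network will be much deeper.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, padding=1),
)

# Every nn.Parameter registered on a module appears in model.parameters(),
# so summing numel() over them gives the total parameter count.
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 1898: (3*16*9 + 16) + (16*10*9 + 10)
```

This is also a quick sanity check that your custom Conv2D module from Part 1 registered its weights correctly: if it used plain tensors instead of `nn.Parameter`, they won't show up in this count (or in the optimizer).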
You should experiment with some deep networks and see if you can get a network with close to 1,000,000 parameters.

**Part 4:** Learn how convolution layers affect the shape of their outputs, and answer the following quiz questions. Include your answers in a new markdown cell in your Jupyter notebook.

Using a kernel size of 3x3, what should the settings of your 2d convolution be to produce the following mappings? (The first answer is given to you.)

* (c=3, h=10, w=10) => (c=10, h=8, w=8) : (out_channels=10, kernel_size=(3, 3), padding=(0, 0))
* (c=3, h=10, w=10) => (c=22, h=10, w=10) :
* (c=3, h=10, w=10) => (c=65, h=12, w=12) :
* (c=3, h=10, w=10) => (c=7, h=20, w=20) :

Using a kernel size of 5x5:

* (c=3, h=10, w=10) => (c=10, h=8, w=8) : (out_channels=10, kernel_size=(5, 5), padding=(1, 1))
* (c=3, h=10, w=10) => (c=100, h=10, w=10) :
* (c=3, h=10, w=10) => (c=23, h=12, w=12) :
* (c=3, h=10, w=10) => (c=5, h=24, w=24) :

Using a kernel size of 5x3:

* (c=3, h=10, w=10) => (c=10, h=8, w=8) :
* (c=3, h=10, w=10) => (c=100, h=10, w=10) :
* (c=3, h=10, w=10) => (c=23, h=12, w=12) :
* (c=3, h=10, w=10) => (c=5, h=24, w=24) :

Determine the kernel that requires the smallest padding size to make the following mappings possible:

* (c=3, h=10, w=10) => (c=10, h=9, w=7) :
* (c=3, h=10, w=10) => (c=22, h=10, w=10) :
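All of the mappings above follow from the standard convolution shape formula: out = (in + 2*padding - kernel) / stride + 1, applied separately to height and width (stride is 1 throughout this quiz). A small helper for checking your answers (the function name is my own, not part of the lab) which reproduces the two given answers:

```python
def conv2d_out_shape(h, w, kernel_size, padding, stride=(1, 1)):
    """Output (h, w) of a 2d convolution: out = (in + 2*pad - k) // stride + 1."""
    kh, kw = kernel_size
    ph, pw = padding
    sh, sw = stride
    return ((h + 2 * ph - kh) // sh + 1,
            (w + 2 * pw - kw) // sw + 1)

# First given answer: a 3x3 kernel with no padding maps 10x10 -> 8x8.
print(conv2d_out_shape(10, 10, kernel_size=(3, 3), padding=(0, 0)))  # (8, 8)

# Second given answer: a 5x5 kernel with padding 1 also maps 10x10 -> 8x8.
print(conv2d_out_shape(10, 10, kernel_size=(5, 5), padding=(1, 1)))  # (8, 8)
```

Note that the output channel count never appears in this formula: it is set directly by `out_channels` and is independent of the spatial dimensions.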