====CS501r, Fall 2016 - Deep Learning: Theory and Practice====

As big data and deep learning gain prominence in both industry and academia, the time seems ripe for a class focused exclusively on the theory and practice of deep learning, both to understand why deep learning has had such a tremendous impact across so many disciplines and to spur research excellence in deep learning at BYU.

===Learning activities===

This class will be a graduate-level coding class. Students will be exposed to the theoretical aspects of deep learning (including derivatives, regularization, and optimization theory), as well as practical strategies for training large-scale networks, leveraging hardware acceleration, distributing training across multiple machines, and coping with massive datasets. Students will engage the material primarily through weekly coding labs dedicated to implementing state-of-the-art techniques using modern deep learning software frameworks. The class will culminate in a substantial data analysis project.

===Preliminary syllabus and topics to be covered===

  - **Neuron-based models of computation**
    - Integrate-and-fire
    - Hodgkin-Huxley
    - Population codes
    - Schematic and organization of visual cortex
    - HMAX
  - **Basics of DNNs**
    - Convolution / deconvolution layers
    - Max-pooling layers
    - ReLU units
    - Softmax units (sketched after this outline)
    - Local response normalization / contrast normalization
  - **Regularization strategies**
    - Dropout (sketched after this outline)
    - DropConnect
    - Batch normalization
    - Adversarial networks
    - Data augmentation
  - **High-level implementation packages - pros and cons**
    - TensorFlow, Theano, Caffe, Keras, Torch, Mocha
  - **Case studies / existing networks and why they're interesting**
    - AlexNet
    - VGG
    - GoogLeNet / Inception
    - ZFNet
  - **Training & initialization**
    - Initialization strategies: Xavier (sketched after this outline), Gaussian, identity, sparse
    - Optimization theory and algorithms
    - Local minima; saddle points; plateaus
    - SGD
    - Rprop
    - RMSProp
    - Adagrad
    - Adam (an update-rule sketch follows this outline)
    - Higher-order algorithms (L-BFGS; Hessian-free; trust-region)
    - Nesterov and momentum
  - **Large-scale distributed learning**
    - Parameter servers
    - Asynchronous vs. synchronous architectures
  - **Temporal networks and how to train them**
    - Basic RNNs and backprop-through-time
    - LSTMs
    - Deep Memory Nets
  - **Application areas**
    - Deep reinforcement learning
    - NN models of style vs. content (deepart.io)
    - ImageNet classification
    - The Neural Turing Machine
    - Sentiment classification
    - Word embeddings
  - **Understanding and visualizing CNNs**
    - t-SNE embeddings
    - Deconvnets
    - Data gradients / inceptionism
  - **Misc**
    - Network compression
    - Low bit-precision networks
    - Sum-product networks
    - Evolutionary approaches to topology discovery
    - Spatial transformer networks
    - Network-in-network
    - Regions with CNNs (R-CNN)
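===Illustrative sketches===

To give a flavor of the weekly coding labs, the sketches below illustrate a few topics from the outline in plain NumPy. They are minimal illustrations under assumed shapes and hyperparameters, not lab specifications. First, a numerically stable softmax unit: subtracting the row maximum before exponentiating leaves the result unchanged but avoids overflow for large logits. The function name and example logits are hypothetical.

<code python>
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    # Subtracting the max is a no-op mathematically but prevents
    # exp() from overflowing when logits are large.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])  # illustrative logits
probs = softmax(logits)               # each row sums to 1
</code>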
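Next, one of the regularization strategies listed above: dropout, in its common "inverted" form. Units are zeroed at random during training and the survivors are rescaled by 1/keep_prob, so the layer needs no rescaling at test time. The keep probability and activation shape here are arbitrary illustration choices.

<code python>
import numpy as np

def dropout_forward(x, keep_prob=0.5, train=True):
    """Inverted dropout: zero random units at train time and rescale
    by 1/keep_prob so expected activations match test time."""
    if not train:
        return x  # no-op at test time under the inverted scheme
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask

h = np.random.randn(4, 8)                            # hypothetical hidden activations
h_train = dropout_forward(h, keep_prob=0.8)          # noisy, rescaled
h_test = dropout_forward(h, keep_prob=0.8, train=False)  # unchanged
</code>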
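Among the initialization strategies above, Xavier (Glorot) initialization scales the weight variance by the average of fan-in and fan-out so that signal magnitudes stay roughly constant across layers. Below is the uniform variant; the 784-to-256 layer size is an assumed example, not a course requirement.

<code python>
import numpy as np

def xavier_init(fan_in, fan_out):
    """Glorot/Xavier uniform initialization: sample weights from
    U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_init(784, 256)  # e.g., a hypothetical 784->256 dense layer
</code>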
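Finally, a single step of the Adam optimizer from the training section: exponential moving averages of the gradient and its square, with bias correction for their zero initialization. The hyperparameter defaults follow the original Adam paper; the toy objective f(w) = ||w||^2, with gradient 2w, exists only to make the sketch runnable.

<code python>
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters w given gradient grad at step t."""
    # First and second moment estimates (exponential moving averages).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimize f(w) = ||w||^2; w is driven toward [0, 0].
w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 201):
    w, m, v = adam_step(w, 2 * w, m, v, t)
</code>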