CS501r, Fall 2016 - Deep Learning: Theory and Practice

As big data and deep learning gain more prominence in both industry and academia, the time seems ripe for a class focused exclusively on the theory and practice of deep learning, both to understand why deep learning has had such a tremendous impact across so many disciplines, and also to spur research excellence in deep learning at BYU.

Learning activities

This class will be a graduate-level coding class. Students will be exposed to the theoretical aspects of deep learning (including derivatives, regularization, and optimization theory), as well as practical strategies for training large-scale networks, leveraging hardware acceleration, distributing training across multiple machines, and coping with massive datasets. Students will engage with the material primarily through weekly coding labs dedicated to implementing state-of-the-art techniques using modern deep learning software frameworks. The class will culminate with a substantial data analysis project.

Preliminary Syllabus and topics to be covered:

  1. Neuron-based models of computation
    1. Integrate-and-fire
    2. Hodgkin-Huxley
    3. Population codes
    4. Schematic and organization of visual cortex
    5. HMAX
  2. Basics of DNNs
    1. Convolution / deconvolution layers
    2. Max-pooling layers
    3. ReLU units
    4. Softmax units
    5. Local response normalization / contrast normalization
  3. Regularization strategies
    1. Dropout
    2. Dropconnect
    3. Batch normalization
    4. Adversarial networks
    5. Data augmentation
  4. High-level implementation packages - pros and cons
    1. TensorFlow, Theano, Caffe, Keras, Torch, Mocha
  5. Case studies / existing networks and why they're interesting
    1. AlexNet
    2. VGG
    3. GoogLeNet / Inception
    4. ZFNet
  6. Training & initialization
    1. Initialization strategies: Xavier, Gaussian, Identity, Sparse
    2. Optimization theory and algorithms
    3. Local minima; saddle points; plateaus
    4. SGD
    5. Rprop
    6. RMSProp
    7. Adagrad
    8. Adam
    9. Higher-order algorithms (L-BFGS; Hessian-free; trust-region)
    10. Nesterov and momentum
  7. Large-scale distributed learning
    1. Parameter servers
    2. Asynchronous vs. synchronous architectures
  8. Temporal networks and how to train them
    1. Basic RNNs and backpropagation through time
    2. LSTMs
    3. Deep Memory Nets
  9. Application areas
    1. Deep reinforcement learning
    2. NN models of style vs. content
    3. ImageNet classification
    4. The Neural Turing Machine
    5. Sentiment classification
    6. Word embeddings
  10. Understanding and visualizing CNNs
    1. t-SNE embeddings
    2. Deconvnets
    3. Data gradients / inceptionism
  11. Misc
    1. Network compression
    2. Low bit-precision networks
    3. Sum-product networks
    4. Evolutionary approaches to topology discovery
    5. Spatial transformer networks
    6. Network-in-network
    7. Regions with CNN features (R-CNN)
cs501r_f2016_desc.txt · Last modified: 2016/08/29 09:21 by admin