====CS501r, Fall 2016 - Deep Learning: Theory and Practice====

As big data and deep learning gain prominence in both industry and academia, the time seems ripe for a class focused exclusively on the theory and practice of deep learning, both to understand why deep learning has had such a tremendous impact across so many disciplines and to spur research excellence in deep learning at BYU.

===Learning activities===

This class will be a graduate-level coding class. Students will be exposed to the theoretical aspects of deep learning (including derivatives, regularization, and optimization theory), as well as practical strategies for training large-scale networks, leveraging hardware acceleration, distributing training across multiple machines, and coping with massive datasets. Students will engage the material primarily through weekly coding labs dedicated to implementing state-of-the-art techniques using modern deep learning software frameworks. The class will culminate in a substantial data analysis project.

===Preliminary syllabus and topics to be covered===

  - **Neuron-based models of computation**
    - Integrate-and-fire
    - Hodgkin-Huxley
    - Population codes
    - Schematic and organization of visual cortex
    - HMAX
  - **Basics of DNNs**
    - Convolution / deconvolution layers
    - Max-pooling layers
    - ReLU units
    - Softmax units (sketched after this outline)
    - Local response normalization / contrast normalization
  - **Regularization strategies**
    - Dropout (sketched after this outline)
    - DropConnect
    - Batch normalization
    - Adversarial networks
    - Data augmentation
  - **High-level implementation packages - pros and cons**
    - TensorFlow, Theano, Caffe, Keras, Torch, Mocha
  - **Case studies / existing networks and why they're interesting**
    - AlexNet
    - VGG
    - GoogLeNet / Inception
    - ZFNet
  - **Training & initialization**
    - Initialization strategies: Xavier (sketched after this outline), Gaussian, identity, sparse
    - Optimization theory and algorithms
    - Local minima; saddle points; plateaus
    - SGD
    - Rprop
    - RMSProp
    - Adagrad
    - Adam (an update-rule sketch follows this outline)
    - Higher-order algorithms (L-BFGS; Hessian-free; trust-region)
    - Nesterov and momentum
  - **Large-scale distributed learning**
    - Parameter servers
    - Asynchronous vs. synchronous architectures
  - **Temporal networks and how to train them**
    - Basic RNNs and backprop-through-time
    - LSTMs
    - Deep Memory Nets
  - **Application areas**
    - Deep reinforcement learning
    - NN models of style vs. content (deepart.io)
    - ImageNet classification
    - The Neural Turing Machine
    - Sentiment classification
    - Word embeddings
  - **Understanding and visualizing CNNs**
    - t-SNE embeddings
    - Deconvnets
    - Data gradients / inceptionism
  - **Misc**
    - Network compression
    - Low bit-precision networks
    - Sum-product networks
    - Evolutionary approaches to topology discovery
    - Spatial transformer networks
    - Network-in-network
    - Regions with CNNs (R-CNN)
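===Illustrative sketches===

To give a flavor of the weekly coding labs, the sketches below illustrate a few topics from the outline in plain NumPy. They are minimal illustrations under assumed shapes and hyperparameters, not lab specifications. First, a numerically stable softmax unit: subtracting the row maximum before exponentiating leaves the result unchanged but avoids overflow for large logits. The function name and example logits are hypothetical.

<code python>
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    # Subtracting the max is a no-op mathematically but prevents
    # exp() from overflowing when logits are large.
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1]])  # illustrative logits
probs = softmax(logits)               # each row sums to 1
</code>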
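Next, one of the regularization strategies listed above: dropout, in its common "inverted" form. Units are zeroed at random during training and the survivors are rescaled by 1/keep_prob, so the layer needs no rescaling at test time. The keep probability and activation shape here are arbitrary illustration choices.

<code python>
import numpy as np

def dropout_forward(x, keep_prob=0.5, train=True):
    """Inverted dropout: zero random units at train time and rescale
    by 1/keep_prob so expected activations match test time."""
    if not train:
        return x  # no-op at test time under the inverted scheme
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask

h = np.random.randn(4, 8)                            # hypothetical hidden activations
h_train = dropout_forward(h, keep_prob=0.8)          # noisy, rescaled
h_test = dropout_forward(h, keep_prob=0.8, train=False)  # unchanged
</code>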
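Among the initialization strategies above, Xavier (Glorot) initialization scales the weight variance by the average of fan-in and fan-out so that signal magnitudes stay roughly constant across layers. Below is the uniform variant; the 784-to-256 layer size is an assumed example, not a course requirement.

<code python>
import numpy as np

def xavier_init(fan_in, fan_out):
    """Glorot/Xavier uniform initialization: sample weights from
    U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_init(784, 256)  # e.g., a hypothetical 784->256 dense layer
</code>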
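Finally, a single step of the Adam optimizer from the training section: exponential moving averages of the gradient and its square, with bias correction for their zero initialization. The hyperparameter defaults follow the original Adam paper; the toy objective f(w) = ||w||^2, with gradient 2w, exists only to make the sketch runnable.

<code python>
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters w given gradient grad at step t."""
    # First and second moment estimates (exponential moving averages).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimize f(w) = ||w||^2; w is driven toward [0, 0].
w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 201):
    w, m, v = adam_step(w, 2 * w, m, v, t)
</code>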