====CS501r, Fall 2016 - Deep Learning: Theory and Practice====

As big data and deep learning gain more prominence in both industry
and academia, the time seems ripe for a class focused exclusively on
the theory and practice of deep learning, both to understand why deep
learning has had such a tremendous impact across so many disciplines
and to spur research excellence in deep learning at BYU.

===Learning activities===

This class will be a graduate-level coding class. Students will be
exposed to the theoretical aspects of deep learning (including
derivatives, regularization, and optimization theory), as well as
practical strategies for training large-scale networks, leveraging
hardware acceleration, distributing training across multiple machines,
and coping with massive datasets. Students will engage with the material
primarily through weekly coding labs dedicated to implementing
state-of-the-art techniques using modern deep learning software
frameworks. The class will culminate with a substantial data
analysis project.
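
To give a flavor of the labs, here is a minimal, hypothetical warm-up in the spirit of a first lab exercise: a linear softmax classifier trained with full-batch gradient descent in plain NumPy. The synthetic data, variable names, and hyperparameters are illustrative assumptions, not actual course material.

<code python>
import numpy as np

# Synthetic data: three Gaussian clusters in 2-D (an assumed toy dataset).
rng = np.random.RandomState(0)
k, n_per, d = 3, 100, 2
centers = rng.randn(k, d) * 3.0
X = np.vstack([centers[c] + rng.randn(n_per, d) for c in range(k)])
y = np.repeat(np.arange(k), n_per)

W = np.zeros((d, k))   # weights
b = np.zeros(k)        # biases
lr = 0.1               # learning rate

for step in range(200):
    # Forward pass: numerically stable softmax over the class logits.
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Backward pass: gradient of mean cross-entropy w.r.t. the logits.
    dlogits = probs.copy()
    dlogits[np.arange(len(y)), y] -= 1.0
    dlogits /= len(y)

    # Full-batch gradient descent update.
    W -= lr * (X.T @ dlogits)
    b -= lr * dlogits.sum(axis=0)

# Evaluate the trained classifier on the training data.
logits = X @ W + b
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
print(f"train accuracy: {(probs.argmax(axis=1) == y).mean():.2f}")
</code>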

===Preliminary syllabus and topics to be covered===

  - **Neuron-based models of computation**
    - Integrate-and-fire
    - Hodgkin-Huxley
    - Population codes
    - Schematic and organization of visual cortex
    - HMAX
  - **Basics of DNNs**
    - Convolution / deconvolution layers
    - Max-pooling layers
    - ReLU units
    - Softmax units
    - Local response normalization / contrast normalization
  - **Regularization strategies**
    - Dropout
    - DropConnect
    - Batch normalization
    - Adversarial networks
    - Data augmentation
  - **High-level implementation packages - pros and cons**
    - TensorFlow, Theano, Caffe, Keras, Torch, Mocha
  - **Case studies / existing networks and why they're interesting**
    - AlexNet
    - VGG
    - GoogLeNet / Inception
    - ZFNet
  - **Training & initialization**
    - Initialization strategies: Xavier, Gaussian, Identity, Sparse
    - Optimization theory and algorithms
    - Local minima; saddle points; plateaus
    - SGD
    - RPROP
    - RMSProp
    - Adagrad
    - Adam (a worked sketch follows this list)
    - Higher-order algorithms (L-BFGS; Hessian-free; trust-region)
    - Momentum and Nesterov momentum
  - **Large-scale distributed learning**
    - Parameter servers
    - Asynchronous vs. synchronous architectures
  - **Temporal networks and how to train them**
    - Basic RNNs and backpropagation through time
    - LSTMs
    - Deep Memory Nets
  - **Application areas**
    - Deep reinforcement learning
    - NN models of style vs. content (deepart.io)
    - ImageNet classification
    - The Neural Turing Machine
    - Sentiment classification
    - Word embeddings
  - **Understanding and visualizing CNNs**
    - t-SNE embeddings
    - Deconvnets
    - Data gradients / inceptionism
  - **Misc**
    - Network compression
    - Low bit-precision networks
    - Sum-product networks
    - Evolutionary approaches to topology discovery
    - Spatial transformer networks
    - Network-in-Network
    - Regions with CNN features (R-CNN)
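
As a small taste of the optimization unit, here is a hedged sketch of the Adam update rule on a toy quadratic objective. The hyperparameter values are the commonly cited defaults from the Adam paper; the objective and variable names are our own illustration, not course material.

<code python>
import numpy as np

# Toy objective f(w) = 0.5 * ||w - w_star||^2, with gradient (w - w_star).
w_star = np.array([3.0, -2.0])
w = np.zeros(2)

m = np.zeros_like(w)   # running estimate of the gradient's first moment
v = np.zeros_like(w)   # running estimate of the gradient's second moment
lr, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

for t in range(1, 501):
    g = w - w_star                       # exact gradient of the quadratic
    m = beta1 * m + (1 - beta1) * g      # update biased first moment
    v = beta2 * v + (1 - beta2) * g**2   # update biased second moment
    m_hat = m / (1 - beta1**t)           # bias-corrected first moment
    v_hat = v / (1 - beta2**t)           # bias-corrected second moment
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(w)  # converges toward w_star = [3, -2]
</code>

Replacing the update line with ''w -= lr * g'' recovers plain gradient descent, which makes it easy to compare the two optimizers on the same problem.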