=CS501r, Fall 2016 - Deep Learning: Theory and Practice=

As big data and deep learning gain more prominence in both industry
and academia, the time seems ripe for a class focused exclusively on
the theory and practice of deep learning, both to understand why deep
learning has had such a tremendous impact across so many disciplines,
and to spur research excellence in deep learning at BYU.

==Learning activities==

This class will be a graduate-level coding class. Students will be
exposed to the theoretical aspects of deep learning (including
derivatives, regularization, and optimization theory), as well as
practical strategies for training large-scale networks, leveraging
hardware acceleration, distributing training across multiple machines,
and coping with massive datasets. Students will engage with the material
primarily through weekly coding labs dedicated to implementing
state-of-the-art techniques using modern deep learning software
frameworks. The class will culminate in a substantial data
analysis project.

==Preliminary syllabus and topics to be covered==

  - **Basics of DNNs** (a short numpy sketch of these pieces follows this list)
    - Convolution layers
    - Max-pooling layers
    - ReLU units
    - Softmax units
    - Local response normalization / contrast normalization
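
A minimal numpy sketch of these building blocks, to give a feel for how little machinery is involved; the function names are illustrative, and the convolution is the naive single-channel "valid" cross-correlation that most frameworks actually compute:

<code python>
import numpy as np

def relu(x):
    # ReLU: pass positive values through, zero out the rest
    return np.maximum(0.0, x)

def softmax(x):
    # subtract the row max before exponentiating for numerical stability
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def maxpool_2x2(x):
    # x: (H, W, C) with even H and W; max over non-overlapping 2x2 windows
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def conv2d_valid(x, k):
    # naive "valid" convolution (really cross-correlation); x: (H, W), k: (kh, kw)
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out
</code>
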
  - **Regularization strategies** (see the dropout sketch after this list)
    - Dropout
    - DropConnect
    - Batch normalization
    - Adversarial networks
    - Data augmentation
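
Dropout in particular is only a few lines; here is a sketch of the "inverted dropout" formulation in numpy, where surviving activations are rescaled at training time so that test time needs no change:

<code python>
import numpy as np

def dropout(x, p_drop, train=True, rng=np.random):
    # inverted dropout: zero each unit with probability p_drop during
    # training and scale the survivors by 1/(1-p_drop) so the expected
    # activation is unchanged; at test time this is the identity
    if not train or p_drop == 0.0:
        return x
    mask = (rng.uniform(size=x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask
</code>
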
  - **High-level implementation packages - pros and cons**
    - TensorFlow, Theano, Caffe, Keras, Torch, Mocha
  - **Case studies / existing networks and why they're interesting**
    - AlexNet
    - VGG
    - GoogLeNet / Inception
    - ZFNet
  - **Training & initialization** (minimal versions of two of the update rules appear after this list)
    - Initialization strategies: Xavier, Gaussian, Identity, Sparse
    - Optimization theory and algorithms
    - Local minima; saddle points; plateaus
    - SGD
    - RPROP
    - RMSprop
    - AdaGrad
    - Adam
    - Higher-order algorithms (L-BFGS; Hessian-free; trust-region)
    - Nesterov and momentum
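
For concreteness, here are minimal numpy versions of two of these update rules, SGD with classical momentum and Adam; optimizer state (the momentum and moment estimates) is passed in and out explicitly, and the names are illustrative:

<code python>
import numpy as np

def sgd_momentum(w, g, v, lr=0.01, mu=0.9):
    # classical momentum: v is a decaying running sum of past gradients
    v = mu * v - lr * g
    return w + v, v

def adam(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: per-parameter step sizes from bias-corrected estimates of
    # the gradient's first moment (m) and second moment (v); t starts at 1
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
</code>
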
  - **Large-scale distributed learning** (a conceptual worker loop follows this list)
    - Parameter servers
    - Asynchronous vs. synchronous architectures
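
A conceptual sketch of what one asynchronous worker does against a parameter server; `ps`, `next_minibatch`, and `compute_gradient` are hypothetical stand-ins rather than any particular framework's API:

<code python>
def async_worker_loop(ps, next_minibatch, compute_gradient, steps):
    # ps.pull() / ps.push() are assumed methods on a parameter-server handle
    for _ in range(steps):
        w = ps.pull()                  # latest (possibly stale) weights
        g = compute_gradient(w, next_minibatch())
        ps.push(g)                     # server applies the gradient at once,
                                       # without waiting for other workers
</code>

In a synchronous architecture the server would instead wait for gradients from all workers and apply their average in lockstep.
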
  - **Temporal networks and how to train them** (a vanilla RNN forward pass is sketched below)
    - Basic RNNs and backprop-through-time
    - LSTMs
    - Deep Memory Nets
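
A minimal numpy forward pass for a vanilla RNN; backprop-through-time is just ordinary backprop applied to this loop after unrolling it over the sequence:

<code python>
import numpy as np

def rnn_forward(x_seq, h0, Wxh, Whh, bh):
    # vanilla RNN: h_t = tanh(Wxh @ x_t + Whh @ h_{t-1} + bh)
    h, hs = h0, []
    for x in x_seq:
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        hs.append(h)
    return hs  # one hidden state per time step
</code>
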
  - **Application areas**
    - Deep reinforcement learning
    - NN models of style vs. content (deepart.io)
    - ImageNet classification
    - The Neural Turing Machine
    - Sentiment classification
    - Word embeddings
  - **Understanding and visualizing CNNs** (a data-gradient sketch follows this list)
    - t-SNE embeddings
    - Deconvnets
    - Data gradients / inceptionism
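
A sketch of the "data gradient" idea (image-specific saliency) in TensorFlow 1.x-style graph mode; `images` and `logits` are assumed to be the input and pre-softmax output tensors of an existing model:

<code python>
import tensorflow as tf

def saliency_op(images, logits, class_index):
    # gradient of one class score with respect to the input pixels:
    # large-magnitude entries mark pixels that score is sensitive to
    score = logits[:, class_index]
    return tf.gradients(score, images)[0]
</code>
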
  - **Misc**
    - Network compression
    - Low bit-precision networks
    - Sum-product networks
    - Evolutionary approaches to topology discovery
    - Spatial transformer networks
    - Network-in-network
    - Regions with CNNs (R-CNN)