**Applications**

* BERT - https://arxiv.org/abs/1810.04805
* Machine Theory of Mind - http://arxiv.org/pdf/1802.07740v2.pdf
* Video-to-Video Synthesis
* Video Prediction via Selective Sampling
* Learning to decompose & disentangle representations for video prediction

**GANs / unsupervised**

* Composing graphical models with neural networks for structured representations and fast inference - https://arxiv.org/pdf/1603.06277.pdf
* IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis
* Wasserstein GAN
* Text adaptive GAN: Manipulating images with natural language

**Network design**

* Attention is all you need
* Neural Ordinary Differential Equations - https://arxiv.org/pdf/1806.07366.pdf
* Reversible neural networks - https://arxiv.org/abs/1807.03039 - https://arxiv.org/abs/1605.08803

**Foundations / Philosophy**

* Troubling trends in ML scholarship - https://arxiv.org/pdf/1807.03341
* A Theory of Local Learning, the Learning Channel, and the Optimality of Backpropagation - https://arxiv.org/pdf/1506.06472
* Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review - https://arxiv.org/pdf/1611.00740

**RL**

* Curiosity-driven exploration by self-supervised prediction
* Diversity is all you need: Learning skills without a reward function - http://arxiv.org/pdf/1802.06070v6.pdf
* World Models - https://arxiv.org/pdf/1803.10122v4.pdf

**Graph networks**

* Graph Neural Networks: A Review of Methods and Applications - https://arxiv.org/abs/1812.08434
* Relational inductive biases, deep learning, and graph networks - https://arxiv.org/pdf/1806.01261

**Optimization / training**

* Averaging weights leads to wider optima and better generalization - http://arxiv.org/pdf/1803.05407v2.pdf
* The loss surface of multilayer networks - https://arxiv.org/pdf/1412.0233
* Visualizing The Loss Landscape of Neural Nets - https://arxiv.org/pdf/1712.09913v3.pdf
* The Matrix Calculus You Need For Deep Learning - https://arxiv.org/pdf/1802.01528v3.pdf
* Group Norm - https://arxiv.org/pdf/1803.08494v3.pdf
* Kalman Normalization: Normalizing internal representations across network layers
* MetaReg: towards Domain Generalization using meta-regularization
* AutoAugment - https://arxiv.org/abs/1805.09501
* A Disciplined Approach To Neural Network Hyper-Parameters: part 1 - http://arxiv.org/pdf/1803.09820v2.pdf
* (Direct) Feedback alignment

**Geometric deep learning**

* Geometric deep learning: going beyond Euclidean data - https://arxiv.org/pdf/1611.08097.pdf
* Convolutional Neural Networks on Surfaces via Seamless Toric Covers
* SchNet: A continuous-filter convolutional neural network for modeling quantum interactions
* Deriving Neural Architectures from Sequence and Graph Kernels
* CayleyNets: Graph convolutional neural networks with complex rational spectral filters
* Deep Functional Maps: Structured Prediction for Dense Shape Correspondence
* Geometric matrix completion with recurrent multi-graph neural networks
* Neural Message Passing for Quantum Chemistry
* Deep Learning on Lie Groups for Skeleton-based Action Recognition

**Other**

* Bayesian neural networks?