**Applications**

  * BERT - https://arxiv.org/abs/1810.04805
  * Machine Theory of Mind - http://arxiv.org/pdf/1802.07740v2.pdf
  * Video-to-Video Synthesis
  * Video Prediction via Selective Sampling
  * Learning to decompose and disentangle representations for video prediction

**GANs / unsupervised**

  * Composing graphical models with neural networks for structured representations and fast inference - https://arxiv.org/pdf/1603.06277.pdf
  * IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis
  * Wasserstein GAN
  * Text adaptive GAN: Manipulating images with natural language

**Network design**

  * Attention is all you need
  * Neural Ordinary Differential Equations - https://arxiv.org/pdf/1806.07366.pdf
  * Reversible neural networks - https://arxiv.org/abs/1807.03039 - https://arxiv.org/abs/1605.08803
  
**Foundations / Philosophy**

  * Troubling trends in ML scholarship - https://arxiv.org/pdf/1807.03341
  * A Theory of Local Learning, the Learning Channel, and the Optimality of Backpropagation - https://arxiv.org/pdf/1506.06472
  * Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review - https://arxiv.org/pdf/1611.00740
  
**RL**

  * Curiosity-driven exploration by self-supervised prediction
  * Diversity is all you need: Learning skills without a reward function - http://arxiv.org/pdf/1802.06070v6.pdf
  * World Models - https://arxiv.org/pdf/1803.10122v4.pdf

**Graph networks**

  * Graph Neural Networks: A Review of Methods and Applications - https://arxiv.org/abs/1812.08434
  * Relational inductive biases, deep learning, and graph networks - https://arxiv.org/pdf/1806.01261

**Optimization / training**

  * Averaging weights leads to wider optima and better generalization - http://arxiv.org/pdf/1803.05407v2.pdf
  * The loss surfaces of multilayer networks - https://arxiv.org/pdf/1412.0233
  * Visualizing The Loss Landscape of Neural Nets - https://arxiv.org/pdf/1712.09913v3.pdf
  * The Matrix Calculus You Need For Deep Learning - https://arxiv.org/pdf/1802.01528v3.pdf
  * Group Norm - https://arxiv.org/pdf/1803.08494v3.pdf
  * Kalman Normalization: Normalizing internal representations across network layers
  * MetaReg: towards Domain Generalization using meta-regularization
  * AutoAugment - https://arxiv.org/abs/1805.09501
  * A Disciplined Approach To Neural Network Hyper-Parameters: part 1 - http://arxiv.org/pdf/1803.09820v2.pdf
  * (Direct) Feedback alignment

**Geometric deep learning**

  * Geometric deep learning: going beyond Euclidean data - https://arxiv.org/pdf/1611.08097.pdf
  * Convolutional Neural Networks on Surfaces via Seamless Toric Covers
  * SchNet: A continuous-filter convolutional neural network for modeling quantum interactions
  * Deriving Neural Architectures from Sequence and Graph Kernels
  * CayleyNets: Graph convolutional neural networks with complex rational spectral filters
  * Deep Functional Maps: Structured Prediction for Dense Shape Correspondence
  * Geometric matrix completion with recurrent multi-graph neural networks
  * Neural Message Passing for Quantum Chemistry
  * Deep Learning on Lie Groups for Skeleton-based Action Recognition
  
**Other**

  * Bayesian neural networks?