├── README.md
└── daily
    └── Aug28.markdown

/README.md:
--------------------------------------------------------------------------------
# tricks-used-in-deep-learning
Tricks used in deep learning, including papers read recently.

## Improving softmax

Gumbel-Softmax: [Categorical Reparameterization with Gumbel-Softmax](https://arxiv.org/abs/1611.01144)

Confidence penalty: [Regularizing Neural Networks by Penalizing Confident Output Distributions](https://arxiv.org/abs/1701.06548)

## Normalization

Weight normalization: [Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks](https://arxiv.org/abs/1602.07868)

Batch Renormalization: [Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models](https://arxiv.org/abs/1702.03275)

## Weight compression

[Soft weight-sharing for Neural Network compression](https://arxiv.org/abs/1702.04008)

## GAN

[GAN tricks](https://github.com/soumith/ganhacks)

[Wasserstein GAN](https://arxiv.org/abs/1701.07875): [my implementation](https://github.com/bobchennan/Wasserstein-GAN-Keras), [example on MNIST](https://gist.github.com/f0k/f3190ebba6c53887d598d03119ca2066)

[Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities](https://arxiv.org/abs/1701.06264)

## Matrix Factorization

[MCPCA](https://arxiv.org/abs/1702.05471v1)

## Feature representation

[Attentive Recurrent Comparators](https://arxiv.org/abs/1703.00767) ([code](https://github.com/pranv/ARC))

## Training

[Decoupled Neural Interfaces using Synthetic Gradients](https://arxiv.org/abs/1608.05343)

## Dropout

[Variational Dropout Sparsifies Deep Neural Networks](https://arxiv.org/abs/1701.05369) ([code](https://github.com/ars-ashuha/variational-dropout-sparsifies-dnn))

[Concrete Dropout](https://arxiv.org/abs/1705.07832)

## Transfer Learning

[Sobolev Training for Neural Networks](https://arxiv.org/abs/1706.04859)

## Face Recognition

[ArcFace](https://arxiv.org/abs/1801.07698)

## Adaptation

[Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/abs/1801.06146)

## Data Augmentation

[mixup: Beyond Empirical Risk Minimization](https://arxiv.org/abs/1710.09412)

[Random Erasing Data Augmentation](https://arxiv.org/abs/1708.04896)

[Manifold Mixup: Encouraging Meaningful On-Manifold Interpolation as a Regularizer](https://arxiv.org/abs/1806.05236)

## ODE

[Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations](https://arxiv.org/abs/1710.10121)

[Neural Ordinary Differential Equations](https://arxiv.org/abs/1806.07366)

## Temporal/Spatial information

[An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution](https://arxiv.org/abs/1807.03247)

[Attention Is All You Need](https://arxiv.org/abs/1706.03762)
--------------------------------------------------------------------------------
/daily/Aug28.markdown:
--------------------------------------------------------------------------------
**[Learning Hierarchical Features from Generative Models](https://arxiv.org/abs/1702.08396)**

A recurrent-like hierarchical VAE.

[Understanding Attentive Recurrent Comparators](https://medium.com/@sanyamagarwal/understanding-attentive-recurrent-comparators-ea1b741da5c3)

Attentive Recurrent Comparators are interesting: a single recurrent controller alternates attention between the two inputs, so each glimpse is conditioned on what has already been seen from both. A toy sketch follows.
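A minimal NumPy sketch of that alternating-glimpse loop, kept deliberately toy-sized: one recurrent controller predicts where to look, crops a small patch from image A, updates its state, then does the same on image B, and so on. The dimensions, the hard crop-style attention, and the vanilla RNN cell are illustrative assumptions, not the exact architecture from the paper.

```python
import numpy as np

# Toy sizes (assumed, not taken from the paper).
IMG, GLIMPSE, HIDDEN, STEPS = 32, 8, 64, 8

rng = np.random.default_rng(0)

# Recurrent controller parameters (vanilla RNN cell for simplicity).
W_h = rng.normal(0, 0.1, (HIDDEN, HIDDEN))
W_x = rng.normal(0, 0.1, (HIDDEN, GLIMPSE * GLIMPSE))
b_h = np.zeros(HIDDEN)
W_att = rng.normal(0, 0.1, (2, HIDDEN))  # hidden state -> glimpse centre (row, col)

def glimpse(image, h):
    """Crop a GLIMPSE x GLIMPSE patch at a location predicted from the hidden state."""
    centre = 1.0 / (1.0 + np.exp(-W_att @ h))        # squash to (0, 1)
    top = (centre * (IMG - GLIMPSE)).astype(int)     # top-left corner of the patch
    patch = image[top[0]:top[0] + GLIMPSE, top[1]:top[1] + GLIMPSE]
    return patch.reshape(-1)

def compare(img_a, img_b):
    """Run alternating glimpses over the two images and return the final state."""
    h = np.zeros(HIDDEN)
    for step in range(STEPS):
        source = img_a if step % 2 == 0 else img_b   # alternate between the two inputs
        x = glimpse(source, h)
        h = np.tanh(W_h @ h + W_x @ x + b_h)         # recurrent update
    return h

a, b = rng.random((IMG, IMG)), rng.random((IMG, IMG))
print(compare(a, b)[:5])
```

In the paper the final controller state feeds a small classifier that decides whether the two inputs match, and the glimpses use a differentiable attention window so the whole comparator can be trained end to end; here the state is simply returned and the crop is hard, just to show the alternation.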