├── README.md
└── daily
    └── Aug28.markdown

/README.md:
--------------------------------------------------------------------------------
# tricks-used-in-deep-learning
Tricks used in deep learning, including papers read recently.

## Improving softmax

Gumbel-Softmax: [Categorical Reparameterization with Gumbel-Softmax](https://arxiv.org/abs/1611.01144)

Confidence penalty: [Regularizing Neural Networks by Penalizing Confident Output Distributions](https://arxiv.org/abs/1701.06548)

## Normalization

Weight normalization: [Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks](https://arxiv.org/abs/1602.07868)

Batch Renormalization: [Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models](https://arxiv.org/abs/1702.03275)

## Weight compression

[Soft weight-sharing for Neural Network compression](https://arxiv.org/abs/1702.04008)

## GAN

[GAN tricks](https://github.com/soumith/ganhacks)

[Wasserstein GAN](https://arxiv.org/abs/1701.07875): [my implementation](https://github.com/bobchennan/Wasserstein-GAN-Keras), [example on MNIST](https://gist.github.com/f0k/f3190ebba6c53887d598d03119ca2066)

[Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities](https://arxiv.org/abs/1701.06264)

## Matrix Factorization

[MCPCA](https://arxiv.org/abs/1702.05471v1)

## Feature representation

[Attentive Recurrent Comparators](https://arxiv.org/abs/1703.00767) ([code](https://github.com/pranv/ARC))

## Training

[Decoupled Neural Interfaces using Synthetic Gradients](https://arxiv.org/abs/1608.05343)

## Dropout

[Variational Dropout Sparsifies Deep Neural Networks](https://arxiv.org/abs/1701.05369) ([code](https://github.com/ars-ashuha/variational-dropout-sparsifies-dnn))

[Concrete Dropout](https://arxiv.org/abs/1705.07832)

## Transfer Learning

[Sobolev Training for Neural Networks](https://arxiv.org/abs/1706.04859)

## Face Recognition

[ArcFace](https://arxiv.org/abs/1801.07698)

## Adaptation

[Universal Language Model Fine-tuning for Text Classification](https://arxiv.org/abs/1801.06146)

## Data Augmentation

[mixup: Beyond Empirical Risk Minimization](https://arxiv.org/abs/1710.09412)

[Random Erasing Data Augmentation](https://arxiv.org/abs/1708.04896)

[Manifold Mixup: Encouraging Meaningful On-Manifold Interpolation as a Regularizer](https://arxiv.org/abs/1806.05236)

## ODE

[Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations](https://arxiv.org/abs/1710.10121)

[Neural Ordinary Differential Equations](https://arxiv.org/abs/1806.07366)

## Temporal/Spatial information

[An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution](https://arxiv.org/abs/1807.03247)

[Attention Is All You Need](https://arxiv.org/abs/1706.03762)
--------------------------------------------------------------------------------
/daily/Aug28.markdown:
--------------------------------------------------------------------------------
**[Learning Hierarchical Features from Generative Models](https://arxiv.org/abs/1702.08396)**

A recurrent-like hierarchical VAE.

[Understanding Attentive Recurrent Comparators](https://medium.com/@sanyamagarwal/understanding-attentive-recurrent-comparators-ea1b741da5c3)

Attentive Recurrent Comparators are interesting: a single recurrent controller alternates attention between the two inputs, so each glimpse is conditioned on what has already been seen from both. A toy sketch follows.
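A minimal NumPy sketch of that alternating-glimpse loop, kept deliberately toy-sized: one recurrent controller predicts where to look, crops a small patch from image A, updates its state, then does the same on image B, and so on. The dimensions, the hard crop-style attention, and the vanilla RNN cell are illustrative assumptions, not the exact architecture from the paper.

```python
import numpy as np

# Toy sizes (assumed, not taken from the paper).
IMG, GLIMPSE, HIDDEN, STEPS = 32, 8, 64, 8

rng = np.random.default_rng(0)

# Recurrent controller parameters (vanilla RNN cell for simplicity).
W_h = rng.normal(0, 0.1, (HIDDEN, HIDDEN))
W_x = rng.normal(0, 0.1, (HIDDEN, GLIMPSE * GLIMPSE))
b_h = np.zeros(HIDDEN)
W_att = rng.normal(0, 0.1, (2, HIDDEN))  # hidden state -> glimpse centre (row, col)

def glimpse(image, h):
    """Crop a GLIMPSE x GLIMPSE patch at a location predicted from the hidden state."""
    centre = 1.0 / (1.0 + np.exp(-W_att @ h))        # squash to (0, 1)
    top = (centre * (IMG - GLIMPSE)).astype(int)     # top-left corner of the patch
    patch = image[top[0]:top[0] + GLIMPSE, top[1]:top[1] + GLIMPSE]
    return patch.reshape(-1)

def compare(img_a, img_b):
    """Run alternating glimpses over the two images and return the final state."""
    h = np.zeros(HIDDEN)
    for step in range(STEPS):
        source = img_a if step % 2 == 0 else img_b   # alternate between the two inputs
        x = glimpse(source, h)
        h = np.tanh(W_h @ h + W_x @ x + b_h)         # recurrent update
    return h

a, b = rng.random((IMG, IMG)), rng.random((IMG, IMG))
print(compare(a, b)[:5])
```

In the paper the final controller state feeds a small classifier that decides whether the two inputs match, and the glimpses use a differentiable attention window so the whole comparator can be trained end to end; here the state is simply returned and the crop is hard, just to show the alternation.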