# Distributed Deep Learning Reads

Compilation of literature related to distributed deep learning. Pull requests welcome :)

* [100-epoch ImageNet Training with AlexNet in 24 Minutes](https://arxiv.org/abs/1709.05011)
* [Accumulated Gradient Normalization](https://arxiv.org/abs/1710.02368)
* [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/pdf/1706.02677.pdf)
* [Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization](http://papers.nips.cc/paper/5751-asynchronous-parallel-stochastic-gradient-for-nonconvex-optimization.pdf)
* [Asynchrony begets Momentum, with an Application to Deep Learning](https://arxiv.org/abs/1605.09774)
* [Bandwidth Optimal All-reduce Algorithms for Clusters of Workstations](http://www.cs.fsu.edu/~xyuan/paper/09jpdc.pdf)
* [Bringing HPC Techniques to Deep Learning](http://research.baidu.com/bringing-hpc-techniques-deep-learning/)
* [Deep learning with Elastic Averaging SGD](https://arxiv.org/abs/1412.6651)
* [Distributed Delayed Stochastic Optimization](https://arxiv.org/abs/1104.5525)
* [Don't Decay the Learning Rate, Increase the Batch Size](https://arxiv.org/abs/1711.00489)
* [FireCaffe: near-linear acceleration of deep neural network training on compute clusters](https://arxiv.org/abs/1511.00175)
* [Heterogeneity-aware Distributed Parameter Servers](https://ds3lab.org/wp-content/uploads/2017/07/sigmod2017_jiang.pdf)
* [Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent](https://people.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf)
* [How to scale distributed deep learning?](https://arxiv.org/abs/1611.04581)
* [ImageNet Training in Minutes](https://arxiv.org/abs/1709.05011)
* [Joeri Hermans ADAG Blog](http://joerihermans.com/ramblings/distributed-deep-learning-part-1-an-introduction/)
* [Large Scale Distributed Deep Networks](https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf)
* [Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow](https://eng.uber.com/horovod/)
* [More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server](http://repository.cmu.edu/cgi/viewcontent.cgi?article=1163&context=machine_learning)
* [Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs](https://arxiv.org/abs/1606.04487)
* [On Parallelizability of Stochastic Gradient Descent for Speech DNNs](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ParallelSGD-ICASSP2014-published.pdf)
* [On Scalable Deep Learning and Parallelizing Gradient Descent](https://github.com/JoeriHermans/master-thesis/tree/master/thesis)
* [One weird trick for parallelizing convolutional neural networks](https://arxiv.org/abs/1404.5997)
* [Parallel training of DNNs with Natural Gradient and Parameter Averaging](https://arxiv.org/abs/1410.7455)
* [Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines](https://arxiv.org/abs/1512.06216)
* [PowerAI DDL](https://arxiv.org/abs/1708.02188)
* [Revisiting Distributed Synchronous SGD](https://arxiv.org/pdf/1604.00981.pdf)
* [Scalable Distributed DNN Training Using Commodity GPU Cloud Computing](https://s3-us-west-2.amazonaws.com/amazon.jobs-public-documents/strom_interspeech2015.pdf)
* [SparkNet: Training Deep Networks in Spark](https://arxiv.org/abs/1511.06051)
* [Staleness-aware Async-SGD for Distributed Deep Learning](https://arxiv.org/abs/1511.05950)
* [TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning](https://arxiv.org/abs/1705.07878)