# Distributed Deep Learning Reads

Compilation of literature related to distributed deep learning. Pull requests welcome :)

* [100-epoch ImageNet Training with AlexNet in 24 Minutes](https://arxiv.org/abs/1709.05011)
* [Accumulated Gradient Normalization](https://arxiv.org/abs/1710.02368)
* [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/pdf/1706.02677.pdf)
* [Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization](http://papers.nips.cc/paper/5751-asynchronous-parallel-stochastic-gradient-for-nonconvex-optimization.pdf)
* [Asynchrony begets Momentum, with an Application to Deep Learning](https://arxiv.org/abs/1605.09774)
* [Bandwidth Optimal All-reduce Algorithms for Clusters of Workstations](http://www.cs.fsu.edu/~xyuan/paper/09jpdc.pdf)
* [Bringing HPC Techniques to Deep Learning](http://research.baidu.com/bringing-hpc-techniques-deep-learning/)
* [Deep learning with Elastic Averaging SGD](https://arxiv.org/abs/1412.6651)
* [Distributed Delayed Stochastic Optimization](https://arxiv.org/abs/1104.5525)
* [Don't Decay the Learning Rate, Increase the Batch Size](https://arxiv.org/abs/1711.00489)
* [FireCaffe: near-linear acceleration of deep neural network training on compute clusters](https://arxiv.org/abs/1511.00175)
* [Heterogeneity-aware Distributed Parameter Servers](https://ds3lab.org/wp-content/uploads/2017/07/sigmod2017_jiang.pdf)
* [Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent](https://people.eecs.berkeley.edu/~brecht/papers/hogwildTR.pdf)
* [How to scale distributed deep learning?](https://arxiv.org/abs/1611.04581)
* [ImageNet Training in Minutes](https://arxiv.org/abs/1709.05011)
* [Joeri Hermans ADAG Blog](http://joerihermans.com/ramblings/distributed-deep-learning-part-1-an-introduction/)
* [Large Scale Distributed Deep Networks](https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf)
* [Meet Horovod: Uber’s Open Source Distributed Deep Learning Framework for TensorFlow](https://eng.uber.com/horovod/)
* [More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server](http://repository.cmu.edu/cgi/viewcontent.cgi?article=1163&context=machine_learning)
* [Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs](https://arxiv.org/abs/1606.04487)
* [On Parallelizability of Stochastic Gradient Descent for Speech DNNs](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ParallelSGD-ICASSP2014-published.pdf)
* [On Scalable Deep Learning and Parallelizing Gradient Descent](https://github.com/JoeriHermans/master-thesis/tree/master/thesis)
* [One weird trick for parallelizing convolutional neural networks](https://arxiv.org/abs/1404.5997)
* [Parallel training of DNNs with Natural Gradient and Parameter Averaging](https://arxiv.org/abs/1410.7455)
* [Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines](https://arxiv.org/abs/1512.06216)
* [PowerAI DDL](https://arxiv.org/abs/1708.02188)
* [Revisiting Distributed Synchronous SGD](https://arxiv.org/pdf/1604.00981.pdf)
* [Scalable Distributed DNN Training Using Commodity GPU Cloud Computing](https://s3-us-west-2.amazonaws.com/amazon.jobs-public-documents/strom_interspeech2015.pdf)
* [SparkNet: Training Deep Networks in Spark](https://arxiv.org/abs/1511.06051)
* [Staleness-aware Async-SGD for Distributed Deep Learning](https://arxiv.org/abs/1511.05950)
* [TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning](https://arxiv.org/abs/1705.07878)