├── README.md ├── dist_ml.md ├── dl_cnn.md ├── dl_opt.md ├── dl_sys.md ├── graph.md └── matrix_fact.md /README.md: -------------------------------------------------------------------------------- 1 | # Fast and Scalable Machine Learning: Algorithms and Systems 2 | 3 | 4 | This is a collection of papers on recent progress in machine learning and systems, including distributed machine learning, deep learning, and related topics. 5 | 6 | ## Contents 7 | 1. Deep Learning 8 | - [Convolutional Neural Networks](dl_cnn.md) 9 | - [ImageNet Models](dl_cnn.md#imagenet-models) 10 | - [Architecture Design](dl_cnn.md#architecture-design) 11 | - [Activation Functions](dl_cnn.md#activation-functions) 12 | - [Visualization](dl_cnn.md#visualization) 13 | - [Fast Convolution](dl_cnn.md#fast-convolution) 14 | - [Low-Rank Filter Approximation](dl_cnn.md#low-rank-filter-approximation) 15 | - [Low Precision](dl_cnn.md#low-precision) 16 | - [Parameter Pruning](dl_cnn.md#parameter-pruning) 17 | - [Transfer Learning](dl_cnn.md#transfer-learning) 18 | - [Theory](dl_cnn.md#theory) 19 | - [3D Data](dl_cnn.md#3d-data) 20 | - [Hardware](dl_cnn.md#hardware) 21 | - [Optimization for Deep Learning](dl_opt.md) 22 | - [Generalization](dl_opt.md#generalization) 23 | - [Loss Surface](dl_opt.md#loss-surface) 24 | - [Batch Size](dl_opt.md#batch-size) 25 | - [General](dl_opt.md#general) 26 | - [Adaptive Gradient Methods](dl_opt.md#adaptive-gradient-methods) 27 | - [Distributed Optimization](dl_opt.md#distributed-optimization) 28 | - [Initialization](dl_opt.md#initialization) 29 | - [Low Precision](dl_opt.md#low-precision) 30 | - [Normalization](dl_opt.md#normalization) 31 | - [Regularization](dl_opt.md#regularization) 32 | - [Meta Learning](dl_opt.md#meta-learning) 33 | - [Deep Learning Systems](dl_sys.md) 34 | - [General Frameworks](dl_sys.md#general-frameworks) 35 | - [Specific System](dl_sys.md#specific-system) 36 | - [Parallelization](dl_sys.md#parallelization) 37 | 2. Distributed Machine Learning 38 | - [Distributed Optimization](dist_ml.md#distributed-optimization) 39 | - [Distributed ML Systems](dist_ml.md#distributed-ml-systems) 40 | 3. 
Other Topics 41 | - [Matrix Factorization](matrix_fact.md) 42 | - [Graph Computation](graph.md) 43 | -------------------------------------------------------------------------------- /dist_ml.md: -------------------------------------------------------------------------------- 1 | # Distributed Machine Learning 2 | 3 | - [Distributed Optimization](#distributed-optimization) 4 | - [Distributed ML Systems](#distributed-ml-systems) 5 | 6 | ## Distributed Optimization 7 | - 2016 ICDM [Efficient Distributed SGD with Variance Reduction](https://arxiv.org/pdf/1512.02970.pdf) 8 | - 2016 KDD [Robust Large-Scale Machine Learning in the Cloud](http://www.kdd.org/kdd2016/papers/files/Paper_801.pdf) 9 | - 2015 KDD [Network Lasso: Clustering and Optimization in Large 10 | Graphs](http://web.stanford.edu/~hallac/Network_Lasso.pdf) 11 | - 2012 JMLR [Distributed Learning, Communication Complexity and Privacy](http://www.cs.cmu.edu/~avrim/Papers/DistLrn.pdf) 12 | - 2012 AISTATS [Protocols for Learning Classifiers on Distributed Data](https://www.cs.utah.edu/~jeffp/papers/distrib-learn-AIStat.pdf) 13 | - 2010 NIPS [Parallelized Stochastic Gradient Descent](http://martin.zinkevich.org/publications/nips2010.pdf) | [video](http://videosrv14.cs.washington.edu/info/videos/mp4/colloq/AAgarwal_140210.mp4) (One-Shot) 14 | - 2010 NAACL [Distributed Training Strategies for the Structured Perceptron](http://www.cslu.ogi.edu/~bedricks/courses/cs506-pslc/articles/week3/dpercep.pdf) 15 | - 2009 NIPS [Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models](http://www.ryanmcd.com/papers/efficient_maxentNIPS2009.pdf) 16 | - 2009 NIPS [Slow Learners are Fast](http://papers.nips.cc/paper/3888-slow-learners-are-fast.pdf) 17 | 18 | ### Communication Efficiency, Complexity, Delay, Latency 19 | - 2014 ATC [Exploiting bounded staleness to speed up Big Data analytics](https://www.usenix.org/system/files/conference/atc14/atc14-paper-cui.pdf) 20 | - 2014 NIPS [Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation](http://papers.nips.cc/paper/5386-fundamental-limits-of-online-and-distributed-algorithms-for-statistical-learning-and-estimation.pdf) 21 | - 2014 ICML [Communication-Efficient Distributed Optimization using an Approximate Newton-type Method](http://jmlr.org/proceedings/papers/v32/shamir14.pdf) 22 | - 2013 NIPS [Information-theoretic lower bounds for distributed statistical estimation with communication constraints](http://www.cs.berkeley.edu/~yuczhang/files/nips13_communication.pdf) 23 | - 2013 NIPS [Optimistic Concurrency Control for Distributed Unsupervised Learning](http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2013_5038.pdf) 24 | - 2013 SDM [Butterfly Mixing: Accelerating Incremental-Update Algorithms on Clusters](http://www.cs.berkeley.edu/~jfc/papers/13/butterflymixing.pdf) 25 | - 2012 NIPS [Communication-Efficient Algorithms for Statistical Optimization](http://papers.nips.cc/paper/4728-communication-efficient-algorithms-for-statistical-optimization.pdf) 26 | - 2011 NIPS [Distributed Delayed Stochastic Optimization](http://papers.nips.cc/paper/4247-distributed-delayed-stochastic-optimization.pdf) 27 | 28 | ### Distributed Mini-Batching 29 | 30 | - 2014 KDD [Efficient Mini-batch Training for Stochastic Optimization](http://www.cs.cmu.edu/~muli/file/minibatch_sgd.pdf) 31 | - 2012 JMLR [Optimal Distributed Online Prediction Using Mini-Batches](http://jmlr.org/papers/volume13/dekel12a/dekel12a.pdf) 32 | - 2011 ICML [Optimal Distributed Online 
Prediction](http://www.icml-2011.org/papers/404_icmlpaper.pdf) 33 | - 2011 NIPS [Better Mini-Batch Algorithms via Accelerated Gradient Methods](http://papers.nips.cc/paper/4432-better-mini-batch-algorithms-via-accelerated-gradient-methods.pdf) 34 | 35 | ### Distributed Consensus 36 | - 2016 ICLRW [Revisiting Distributed Synchronous SGD](http://arxiv.org/abs/1604.00981) 37 | - 2014 [Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers](http://web.stanford.edu/~boyd/papers/admm_distr_stats.html) (ADMM) 38 | - 2012 IEEE Trans. on Automatic Control [Dual Averaging for Distributed Optimization: 39 | Convergence Analysis and Network Scaling](http://www.eecs.berkeley.edu/~wainwrig/Papers/DucAgaWai12.pdf) 40 | - 2010 NIPS [Distributed Dual Averaging in Networks](https://web.stanford.edu/~jduchi/projects/DuchiAgWa10_nips.pdf) 41 | - 2009 IEEE Trans. on Automatic Control [Distributed subgradient methods for multi-agent optimization](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4749425) | [slides](http://groups.csail.mit.edu/tds/seminars/s09/MIT-talk.pdf) 42 | - 2008 Convex Optimization in Signal Processing and Communications [Cooperative Distributed Multi-Agent Optimization](https://asu.mit.edu/sites/default/files/documents/publications/Dist-chapter.pdf) 43 | 44 | 45 | ## Distributed ML Systems 46 | - 2014 APSys [A Scalable and Topology Configurable Protocol for Distributed Parameter Synchronization](http://research.microsoft.com/pubs/219927/main.pdf) 47 | - 2014 ICML Tutorial [Emerging Systems for Large-Scale Machine Learning](http://www.cs.berkeley.edu/~jegonzal/talks/icml14_sysml.pdf) 48 | - 2013 Distributed Computing [When distributed computation is communication expensive](http://arxiv.org/abs/1304.4636) 49 | 50 | ### MapReduce / AllReduce 51 | - 2014 JMLR [A Reliable Effective Terascale Linear Learning System](http://jmlr.org/papers/volume15/agarwal14a/agarwal14a.pdf) 52 | - 2010 NIPSW [MapReduce/Bigtable for Distributed Optimization](http://www.australianscience.com.au/research/google/36948.pdf) | [slides](http://lccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdf) 53 | - 2007 NIPS [Map-Reduce for Machine Learning on Multicore](http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2006_725.pdf) 54 | 55 | ### Parameter Servers 56 | - 2014 OSDI [Project Adam: Building an Efficient and Scalable Deep Learning Training System](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf) 57 | - 2014 OSDI [Scaling Distributed Machine Learning with the Parameter Server](http://www.cs.cmu.edu/~muli/file/parameter_server_osdi14.pdf) 58 | - 2014 NIPS [Communication Efficient Distributed Machine 59 | Learning with the Parameter Server](http://www.cs.cmu.edu/~muli/file/parameter_server_nips14.pdf) 60 | - 2013 NIPSW [Parameter Server for Distributed Machine Learning](http://www.cs.cmu.edu/~muli/file/ps.pdf) 61 | - 2013 NIPSW [Distributed Delayed Proximal Gradient Methods](http://www.cs.cmu.edu/~muli/file/ddp.pdf) 62 | - 2012 NIPS [Large Scale Distributed Deep Networks](http://static.googleusercontent.com/media/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf) (DistBelief) 63 | - 2010 VLDB [An Architecture for Parallel Topic Models](http://vldb.org/pvldb/vldb2010/papers/R63.pdf) 64 | 65 | ### Peer-to-Peer 66 | - 2015 EuroSys [MALT: Distributed Data-Parallelism for Existing ML Applications](http://www.nec-labs.com/~asim/papers/malt_eurosys15.pdf) 67 | 
-------------------------------------------------------------------------------- /dl_cnn.md: -------------------------------------------------------------------------------- 1 | # Convolutional Neural Networks 2 | 3 | - [ImageNet Models](#imagenet-models) 4 | - [Architecture Design](#architecture-design) 5 | - [Activation Functions](#activation-functions) 6 | - [Visualization](#visualization) 7 | - [Fast Convolution](#fast-convolution) 8 | - [Low-Rank Filter Approximation](#low-rank-filter-approximation) 9 | - [Low Precision](#low-precision) 10 | - [Parameter Pruning](#parameter-pruning) 11 | - [Transfer Learning](#transfer-learning) 12 | - [Theory](#theory) 13 | - [3D Data](#3d-data) 14 | - [Hardware](#hardware) 15 | 16 | ## ImageNet Models 17 | - 2017 CVPR [Xception: Deep Learning with Depthwise Separable Convolutions](http://openaccess.thecvf.com/content_cvpr_2017/papers/Chollet_Xception_Deep_Learning_CVPR_2017_paper.pdf) (Xception) 18 | - 2017 CVPR [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/pdf/1611.05431.pdf) (ResNeXt) 19 | - 2016 ECCV [Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027) (Pre-ResNet) 20 | - 2016 arXiv [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261) (Inception V4) 21 | - 2016 CVPR [Deep Residual Learning for Image Recognition](http://arxiv.org/abs/1512.03385) (ResNet) 22 | - 2015 arXiv [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) (Inception V3) 23 | - 2015 ICML [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://jmlr.org/proceedings/papers/v37/ioffe15.pdf) (Inception V2) 24 | - 2015 ICCV [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://research.microsoft.com/en-us/um/people/kahe/publications/iccv15imgnet.pdf) (PReLU) 25 | - 2015 ICLR [Very Deep Convolutional Networks for Large-Scale Image Recognition](http://arxiv.org/abs/1409.1556) (VGG) 26 | - 2015 CVPR [Going Deeper with Convolutions](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43022.pdf) (GoogLeNet/Inception V1) 27 | - 2012 NIPS [ImageNet Classification with Deep Convolutional Neural Networks](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (AlexNet) 28 | 29 | ## Architecture Design 30 | - 2018 arXiv [Regularized Evolution for Image Classifier Architecture Search](https://arxiv.org/pdf/1802.01548.pdf) 31 | - 2018 CVPR [Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/pdf/1707.07012.pdf) 32 | - 2017 arXiv [One Model To Learn Them All](https://arxiv.org/abs/1706.05137) 33 | - 2017 arXiv [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) 34 | - 2017 ICML [AdaNet: Adaptive Structural Learning of Artificial Neural Networks](https://arxiv.org/pdf/1607.01097.pdf) 35 | - 2017 ICML [Large-Scale Evolution of Image Classifiers](https://arxiv.org/abs/1703.01041) 36 | - 2017 CVPR [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/pdf/1611.05431.pdf) 37 | - 2017 CVPR [Densely Connected Convolutional Networks](http://arxiv.org/abs/1608.06993) 38 | - 2017 ICLR [Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer](https://openreview.net/pdf?id=B1ckMDqlg) 39 | - 2017 ICLR [Neural Architecture Search with 
Reinforcement Learning](https://openreview.net/pdf?id=r1Ue8Hcxg) 40 | - 2017 ICLR [Designing Neural Network Architectures using Reinforcement Learning](https://openreview.net/pdf?id=S1c2cvqee) 41 | - 2017 ICLR [Do Deep Convolutional Nets Really Need to be Deep and Convolutional?](https://arxiv.org/abs/1603.05691) 42 | - 2017 ICLR [Highway and Residual Networks learn Unrolled Iterative Estimation](https://arxiv.org/pdf/1612.07771.pdf) 43 | - 2016 NIPS [Residual Networks Behave Like Ensembles of Relatively Shallow Networks](https://arxiv.org/abs/1605.06431) 44 | - 2016 BMVC [Wide Residual Networks](http://arxiv.org/abs/1605.07146) 45 | - 2016 arXiv [Benefits of depth in neural networks](http://arxiv.org/abs/1602.04485) 46 | - 2016 AAAI [On the Depth of Deep Neural Networks: A Theoretical View](http://arxiv.org/abs/1506.05232) 47 | - 2016 arXiv [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size](http://arxiv.org/abs/1602.07360) 48 | - 2015 ICMLW [Highway Networks](http://arxiv.org/pdf/1505.00387v2.pdf) 49 | - 2015 CVPR [Convolutional Neural Networks at Constrained Time Cost](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/He_Convolutional_Neural_Networks_2015_CVPR_paper.pdf) 50 | - 2015 CVPR [Fully Convolutional Networks for Semantic Segmentation](https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf) 51 | - 2014 NIPS [Do Deep Nets Really Need to be Deep?](http://papers.nips.cc/paper/5484-do-deep-nets-really-need-to-be-deep.pdf) 52 | - 2014 ICLRW [Understanding Deep Architectures using a Recursive Convolutional Network](http://arxiv.org/abs/1312.1847) 53 | - 2013 ICML [Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures](http://jmlr.org/proceedings/papers/v28/bergstra13.pdf) 54 | - 2009 ICCV [What is the Best Multi-Stage Architecture for Object Recognition?](http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf) 55 | - 1995 NIPS [Simplifying Neural Nets by Discovering Flat Minima](https://papers.nips.cc/paper/899-simplifying-neural-nets-by-discovering-flat-minima.pdf) 56 | - 1994 T-NN [SVD-NET: An Algorithm that Automatically Selects Network Structure](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=286929) 57 | 58 | ## Activation Functions 59 | - 2017 arXiv [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515) (SELU) 60 | - 2016 ICLR [Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)](https://arxiv.org/pdf/1511.07289.pdf) (ELU) 61 | - 2015 arXiv [Empirical Evaluation of Rectified Activations in Convolutional Network](https://arxiv.org/pdf/1505.00853.pdf) (RReLU) 62 | - 2015 ICCV [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://research.microsoft.com/en-us/um/people/kahe/publications/iccv15imgnet.pdf) (PReLU) 63 | - 2013 ICML [Rectifier Nonlinearities Improve Neural Network Acoustic Models](https://pdfs.semanticscholar.org/367f/2c63a6f6a10b3b64b8729d601e69337ee3cc.pdf) 64 | - 2010 ICML [Rectified Linear Units Improve Restricted Boltzmann Machines](http://www.cs.toronto.edu/~fritz/absps/reluICML.pdf) (ReLU) 65 | 66 | ## Visualization 67 | - 2017 CVPR [Network Dissection: Quantifying Interpretability of Deep Visual Representations](http://netdissect.csail.mit.edu/final-network-dissection.pdf) 68 | - 2016 IJCV [Visualizing Deep Convolutional Neural Networks Using Natural Pre-Images](https://arxiv.org/pdf/1512.02017.pdf) 69 | - 2016 ICMLW [Multifaceted Feature 
Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks](http://www.evolvingai.org/files/mfv_icml_workshop_16.pdf) 70 | - 2016 CVPR [Inverting Visual Representations with Convolutional Networks](https://arxiv.org/pdf/1506.02753.pdf) 71 | - 2015 ICMLW [Understanding Neural Networks Through Deep Visualization](http://yosinski.com/media/papers/Yosinski__2015__ICML_DL__Understanding_Neural_Networks_Through_Deep_Visualization__.pdf) 72 | - 2015 CVPR [Understanding Deep Image Representations by Inverting Them](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf) 73 | - 2014 ECCV [Visualizing and Understanding Convolutional Networks](https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf) 74 | - 2014 ICLRW [Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps](https://arxiv.org/pdf/1312.6034.pdf) 75 | - 2009 [Visualizing Higher-Layer Features of a Deep Network](https://www.researchgate.net/profile/Aaron_Courville/publication/265022827_Visualizing_Higher-Layer_Features_of_a_Deep_Network/links/53ff82b00cf24c81027da530.pdf) 76 | 77 | ## Fast Convolution 78 | - 2017 ICML [Warped Convolutions: Efficient Invariance to Spatial Transformations](https://arxiv.org/pdf/1609.04382.pdf) 79 | - 2017 ICLR [Faster CNNs with Direct Sparse Convolutions and Guided Pruning](https://openreview.net/pdf?id=rJPcZ3txx) 80 | - 2016 NIPS [PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions](http://arxiv.org/abs/1504.08362) 81 | - 2016 CVPR [Fast Algorithms for Convolutional Neural Networks](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Lavin_Fast_Algorithms_for_CVPR_2016_paper.pdf) (Winograd) 82 | - 2015 CVPR [Sparse Convolutional Neural Networks](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Liu_Sparse_Convolutional_Neural_2015_CVPR_paper.pdf) 83 | 84 | ## Low-Rank Filter Approximation 85 | - 2016 ICLR [Convolutional Neural Networks with Low-rank Regularization](https://arxiv.org/abs/1511.06067) 86 | - 2016 ICLR [Training CNNs with Low-Rank Filters for Efficient Image Classification](http://arxiv.org/abs/1511.06744) 87 | - 2016 TPAMI [Accelerating Very Deep Convolutional Networks for Classification and Detection](https://arxiv.org/abs/1505.06798) 88 | - 2015 CVPR [Efficient and Accurate Approximations of Nonlinear Convolutional Networks](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Zhang_Efficient_and_Accurate_2015_CVPR_paper.pdf) 89 | - 2015 ICLR [Speeding-up convolutional neural networks using fine-tuned cp-decomposition](https://arxiv.org/pdf/1412.6553v3.pdf) 90 | - 2014 NIPS [Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation](http://papers.nips.cc/paper/5544-exploiting-linear-structure-within-convolutional-networks-for-efficient-evaluation.pdf) 91 | - 2014 BMVC [Speeding up Convolutional Neural Networks with Low Rank Expansions](https://arxiv.org/abs/1405.3866) 92 | - 2013 NIPS [Predicting Parameters in Deep Learning](https://papers.nips.cc/paper/5025-predicting-parameters-in-deep-learning.pdf) 93 | - 2013 CVPR [Learning Separable Filters](http://cvlabwww.epfl.ch/~lepetit/papers/rigamonti_cvpr13.pdf) 94 | 95 | ## Low Precision 96 | - 2018 AAAI [Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM](https://arxiv.org/pdf/1707.09870.pdf) 97 | - 2018 ICLR [Training and Inference with Integers in Deep Neural 
Networks](https://arxiv.org/pdf/1802.04680.pdf) 98 | - 2018 ICLR [Mixed Precision Training](https://arxiv.org/pdf/1710.03740.pdf) 99 | - 2018 ICLR [The High-Dimensional Geometry of Binary Neural Networks](https://arxiv.org/pdf/1705.07199.pdf) 100 | - 2017 arXiv [BitNet: Bit-Regularized Deep Neural Networks](https://arxiv.org/pdf/1708.04788.pdf) 101 | - 2017 arXiv [Gradient Descent for Spiking Neural Networks](https://arxiv.org/abs/1706.04698) 102 | - 2017 arXiv [ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks](https://arxiv.org/abs/1706.02393) 103 | - 2017 arXiv [Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework](https://arxiv.org/abs/1705.09283) 104 | - 2017 NIPS [Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks](https://arxiv.org/pdf/1711.02213.pdf) 105 | - 2017 NIPS [Training Quantized Nets: A Deeper Understanding](https://arxiv.org/abs/1706.02379) 106 | - 2017 NIPS [TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning](https://arxiv.org/abs/1705.07878) 107 | - 2017 ICML [Analytical Guarantees on Numerical Precision of Deep Neural Networks](http://proceedings.mlr.press/v70/sakr17a/sakr17a.pdf) 108 | - 2017 CVPR [Deep Learning with Low Precision by Half-wave Gaussian Quantization](https://arxiv.org/abs/1702.00953) 109 | - 2017 CVPR [Network Sketching: Exploiting Binary Structure in Deep CNNs](https://arxiv.org/pdf/1706.02021.pdf) 110 | - 2017 CVPR [Local Binary Convolutional Neural Networks](http://openaccess.thecvf.com/content_cvpr_2017/papers/Juefei-Xu_Local_Binary_Convolutional_CVPR_2017_paper.pdf) 111 | - 2017 ICLR [Towards the Limit of Network Quantization](https://openreview.net/pdf?id=rJ8uNptgl) 112 | - 2017 ICLR [Loss-aware Binarization of Deep Networks](https://openreview.net/pdf?id=S1oWlN9ll) 113 | - 2017 ICLR [Trained Ternary Quantization](https://openreview.net/pdf?id=S1_pAu9xl) 114 | - 2017 ICLR [Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights](https://openreview.net/pdf?id=HyQJ-mclg) 115 | - 2017 AAAI [How to Train a Compact Binary Neural Network with High Accuracy?
](https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14619/14454) 116 | - 2016 arXiv [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/abs/1609.07061) 117 | - 2016 arXiv [Accelerating Deep Convolutional Networks using low-precision and sparsity](https://arxiv.org/abs/1610.00324) 118 | - 2016 arXiv [Deep neural networks are robust to weight binarization and other non-linear distortions](https://arxiv.org/pdf/1606.01981.pdf) 119 | - 2016 ECCV [XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks](https://arxiv.org/pdf/1603.05279.pdf) 120 | - 2016 ICMLW [Overcoming Challenges in Fixed Point Training of Deep Convolutional Networks](https://arxiv.org/pdf/1607.02241.pdf) 121 | - 2016 ICML [Fixed Point Quantization of Deep Convolutional Networks](http://jmlr.org/proceedings/papers/v48/linb16.pdf) 122 | - 2016 NIPS [Binarized Neural Networks](https://papers.nips.cc/paper/6573-binarized-neural-networks.pdf) 123 | - 2016 arXiv [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](http://arxiv.org/abs/1602.02830) 124 | - 2016 CVPR [Quantized Convolutional Neural Networks for Mobile Devices](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Wu_Quantized_Convolutional_Neural_CVPR_2016_paper.pdf) 125 | - 2016 ICLR [Neural Networks with Few Multiplications](https://arxiv.org/abs/1510.03009) 126 | - 2015 arXiv [Resiliency of Deep Neural Networks under Quantization](https://arxiv.org/abs/1511.06488) 127 | - 2015 arXiv [Rounding Methods for Neural Networks with Low Resolution Synaptic Weights](https://arxiv.org/abs/1504.05767) 128 | - 2015 NIPS [Backpropagation for Energy-Efficient Neuromorphic Computing](https://papers.nips.cc/paper/5862-backpropagation-for-energy-efficient-neuromorphic-computing.pdf) 129 | - 2015 NIPS [BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations](https://papers.nips.cc/paper/5647-binaryconnect-training-deep-neural-networks-with-binary-weights-during-propagations.pdf) 130 | - 2015 ICMLW [Bitwise Neural Networks](http://minjekim.com/papers/icml2015_mkim.pdf) 131 | - 2015 ICML [Deep Learning with Limited Numerical Precision](http://www.jmlr.org/proceedings/papers/v37/gupta15.pdf) 132 | - 2015 ICLRW [Training deep neural networks with low precision multiplications](https://arxiv.org/abs/1412.7024) 133 | - 2015 arXiv [Training Binary Multilayer Neural Networks for Image Classification using Expectation Backpropagation](https://arxiv.org/abs/1503.03562) 134 | - 2014 NIPS [Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights](https://papers.nips.cc/paper/5269-expectation-backpropagation-parameter-free-training-of-multilayer-neural-networks-with-continuous-or-discrete-weights.pdf) 135 | - 2013 arXiv [Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation](https://arxiv.org/pdf/1308.3432.pdf) 136 | - 2011 NIPSW [Improving the speed of neural networks on CPUs](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf) 137 | - 1987 Combinatorica [Randomized rounding: A technique for provably good algorithms and algorithmic 
proofs](https://www.cs.auckland.ac.nz/~cthombor/Pubs/RandomRounding/RandomRounding1987.pdf) 138 | 139 | ## Parameter Pruning 140 | - 2018 ICLR [On the importance of single directions for generalization](https://arxiv.org/pdf/1803.06959.pdf) 141 | - 2018 ICLR [Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers](https://arxiv.org/pdf/1802.00124.pdf) 142 | - 2017 NIPS [Runtime Neural Pruning](https://papers.nips.cc/paper/6813-runtime-neural-pruning.pdf) 143 | - 2017 ICML [Beyond Filters: Compact Feature Map for Portable Deep Model](http://proceedings.mlr.press/v70/wang17m/wang17m.pdf) 144 | - 2017 ICLR [Soft Weight-Sharing for Neural Network Compression](https://openreview.net/pdf?id=HJGwcKclx) 145 | - 2017 ICLR [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://openreview.net/pdf?id=SJGCiw5gl) 146 | - 2017 ICLR [Pruning Filters for Efficient ConvNets](https://openreview.net/pdf?id=rJqFGTslg) 147 | - 2016 arXiv [Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning](https://arxiv.org/abs/1611.05128) 148 | - 2016 arXiv [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250) 149 | - 2016 NIPS [Learning the Number of Neurons in Deep Networks](https://rsu.forge.nicta.com.au/people/jalvarez/LNN/AlvarezSalzmannNIPS16.pdf) 150 | - 2016 NIPS [Learning Structured Sparsity in Deep Neural Networks](https://arxiv.org/abs/1608.03665) \[[code](https://github.com/wenwei202/caffe/tree/scnn)\] 151 | - 2016 NIPS [Dynamic Network Surgery for Efficient DNNs](https://arxiv.org/abs/1608.04493) 152 | - 2016 ECCV [Less is More: Towards Compact CNNs](https://static-content.springer.com/esm/chp%3A10.1007%2F978-3-319-46493-0_40/MediaObjects/419976_1_En_40_MOESM1_ESM.pdf) 153 | - 2016 CVPR [Fast ConvNets Using Group-wise Brain Damage](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Lebedev_Fast_ConvNets_Using_CVPR_2016_paper.pdf) 154 | - 2016 ICLR [Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding](http://arxiv.org/abs/1510.00149) 155 | - 2016 ICLR [Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications](http://arxiv.org/abs/1511.06530) 156 | - 2015 arXiv [Structured Pruning of Deep Convolutional Neural Networks](http://arxiv.org/abs/1512.08571) 157 | - 2015 IEEE Access [Channel-Level Acceleration of Deep Face Representations](http://ieeexplore.ieee.org/document/7303876/) 158 | - 2015 BMVC [Data-free parameter pruning for Deep Neural Networks](http://arxiv.org/abs/1507.06149) 159 | - 2015 ICML [Compressing Neural Networks with the Hashing Trick](http://jmlr.org/proceedings/papers/v37/chenc15.pdf) 160 | - 2015 ICCV [Deep Fried Convnets](http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Yang_Deep_Fried_Convnets_ICCV_2015_paper.pdf) 161 | - 2015 ICCV [An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections](http://felixyu.org/pdf/ICCV15_circulant.pdf) 162 | - 2015 NIPS [Learning both Weights and Connections for Efficient Neural Networks](http://arxiv.org/abs/1506.02626) 163 | - 2015 ICLR [FitNets: Hints for Thin Deep Nets](http://arxiv.org/pdf/1412.6550v4.pdf) 164 | - 2014 arXiv [Compressing Deep Convolutional Networks using Vector Quantization](http://arxiv.org/abs/1412.6115) 165 | - 2014 NIPSW [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531) 166 | - 1995 ISANN 
[Evaluating Pruning Methods](http://publications.idiap.ch/downloads/papers/1995/thimm-pruning-hop.pdf) 167 | - 1993 T-NN [Pruning Algorithms--A Survey](http://axon.cs.byu.edu/~martinez/classes/678/Papers/Reed_PruningSurvey.pdf) 168 | - 1989 NIPS [Optimal Brain Damage](http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf) 169 | 170 | ## Transfer Learning 171 | - 2016 arXiv [What makes ImageNet good for transfer learning?](https://arxiv.org/abs/1608.08614) 172 | - 2014 NIPS [How transferable are features in deep neural networks?](https://arxiv.org/pdf/1411.1792v1.pdf) 173 | - 2014 CVPR [CNN Features off-the-shelf: an Astounding Baseline for Recognition](http://www.cv-foundation.org/openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf) 174 | - 2014 ICML [DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition](http://proceedings.mlr.press/v32/donahue14.pdf) 175 | 176 | ## Theory 177 | - 2018 ICLR [When is a Convolutional Filter Easy to Learn?](https://arxiv.org/pdf/1709.06129.pdf) 178 | - 2017 arXiv [Opening the black box of Deep Neural Networks via Information](https://arxiv.org/pdf/1703.00810.pdf) 179 | - 2017 ICML [On the Expressive Power of Deep Neural Networks](https://arxiv.org/pdf/1606.05336v6.pdf) 180 | - 2017 ICML [A Closer Look at Memorization in Deep Networks](https://arxiv.org/pdf/1706.05394.pdf) 181 | - 2017 ICML [An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis](https://arxiv.org/pdf/1703.00560.pdf) 182 | - 2016 NIPS [Exponential expressivity in deep neural networks through transient chaos](https://papers.nips.cc/paper/6322-exponential-expressivity-in-deep-neural-networks-through-transient-chaos.pdf) 183 | - 2016 arXiv [Understanding Deep Convolutional Networks](https://arxiv.org/pdf/1601.04920.pdf) 184 | - 2014 NIPS [On the number of linear regions of deep neural networks](http://papers.nips.cc/paper/5422-on-the-number-of-linear-regions-of-deep-neural-networks.pdf) 185 | - 2014 ICML [Provable Bounds for Learning Some Deep Representations](http://proceedings.mlr.press/v32/arora14.pdf) 186 | - 2014 ICLR [On the number of response regions of deep feed forward networks with piece-wise linear activations](https://arxiv.org/pdf/1312.6098.pdf) 187 | - 2014 ICLR [Revisiting natural gradient for deep networks](https://arxiv.org/pdf/1301.3584.pdf) 188 | 189 | ## 3D Data 190 | - 2017 NIPS [PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space](https://arxiv.org/pdf/1706.02413.pdf) 191 | - 2017 ICCV [Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs](https://arxiv.org/pdf/1703.09438.pdf) 192 | - 2017 SIGGRAPH [O-CNN: Octree-based Convolutional Neural Network for Understanding 3D Shapes](http://wang-ps.github.io/O-CNN_files/CNN3D.pdf) 193 | - 2017 CVPR [PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation](https://arxiv.org/pdf/1612.00593.pdf) 194 | - 2017 CVPR [OctNet: Learning Deep 3D Representations at High Resolutions](https://arxiv.org/pdf/1611.05009.pdf) 195 | - 2016 NIPS [FPNN: Field Probing Neural Networks for 3D Data](https://papers.nips.cc/paper/6416-fpnn-field-probing-neural-networks-for-3d-data.pdf) 196 | - 2016 NIPS [Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling](https://jiajunwu.com/papers/3dgan_nips.pdf) 197 | - 2015 ICCV [Multi-view Convolutional Neural Networks for 3D Shape 
Recognition](http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Su_Multi-View_Convolutional_Neural_ICCV_2015_paper.pdf) 198 | - 2015 BMVC [Sparse 3D convolutional neural networks](http://www.bmva.org/bmvc/2015/papers/paper150/paper150.pdf) 199 | - 2015 CVPR [3D ShapeNets: A Deep Representation for Volumetric Shapes](http://3dshapenets.cs.princeton.edu/paper.pdf) 200 | 201 | ## Hardware 202 | - 2017 ISCA [In-Datacenter Performance Analysis of a Tensor Processing Unit](https://arxiv.org/pdf/1704.04760.pdf) (TPU) 203 | - 2017 ISVLSI [YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights](https://arxiv.org/pdf/1606.05487.pdf) 204 | - 2017 ASPLOS [SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing](https://arxiv.org/pdf/1611.05939.pdf) 205 | - 2017 FPGA [Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks](http://jaewoong.org/pubs/fpga17-next-generation-dnns.pdf) 206 | - 2015 NIPS Tutorial [High-Performance Hardware for Machine Learning](https://media.nips.cc/Conferences/2015/tutorialslides/Dally-NIPS-Tutorial-2015.pdf) 207 | -------------------------------------------------------------------------------- /dl_opt.md: -------------------------------------------------------------------------------- 1 | # Optimization for Deep Learning 2 | 3 | - [Generalization](#generalization) 4 | - [Loss Surface](#loss-surface) 5 | - [Batch Size](#batch-size) 6 | - [General](#general) 7 | - [Adaptive Gradient Methods](#adaptive-gradient-methods) 8 | - [Distributed Optimization](#distributed-optimization) 9 | - [Initialization](#initialization) 10 | - [Low Precision](#low-precision) 11 | - [Normalization](#normalization) 12 | - [Regularization](#regularization) 13 | - [Meta Learning](#meta-learning) 14 | 15 | ## Generalization 16 | - 2018 ICLR [Sensitivity and Generalization in Neural Networks: an Empirical Study](https://openreview.net/pdf?id=HJC2SzZCW) 17 | - 2018 arXiv [On Characterizing the Capacity of Neural Networks using Algebraic Topology](https://arxiv.org/pdf/1802.04443.pdf) 18 | - 2017 arXiv [Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data](https://arxiv.org/pdf/1703.11008.pdf) 19 | - 2017 NIPS [Exploring Generalization in Deep Learning](https://arxiv.org/pdf/1706.08947.pdf) 20 | - 2017 NIPS [Train longer, generalize better: closing the generalization gap in large batch training of neural networks](http://papers.nips.cc/paper/6770-train-longer-generalize-better-closing-the-generalization-gap-in-large-batch-training-of-neural-networks.pdf) 21 | - 2017 ICML [A Closer Look at Memorization in Deep Networks](https://arxiv.org/pdf/1706.05394.pdf) 22 | - 2017 ICLR [Understanding deep learning requires rethinking generalization](https://openreview.net/pdf?id=Sy8gdB9xx) 23 | 24 | ## Loss Surface 25 | - 2018 NIPS [Visualizing the Loss Landscape of Neural Nets](https://arxiv.org/pdf/1712.09913.pdf) 26 | - 2018 ICML [Essentially No Barriers in Neural Network Energy Landscape](https://arxiv.org/pdf/1803.00885.pdf) 27 | - 2018 arXiv [Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs](https://arxiv.org/abs/1802.10026) 28 | - 2018 ICML [Optimization Landscape and Expressivity of Deep CNNs](http://proceedings.mlr.press/v80/nguyen18a/nguyen18a.pdf) 29 | - 2018 ICLR [Measuring the Intrinsic Dimension of Objective Landscapes](https://arxiv.org/pdf/1804.08838.pdf) 30 | - 2017 ICML [The Loss Surface of Deep and Wide Neural 
Networks](https://arxiv.org/pdf/1704.08045.pdf) 31 | - 2017 ICML [Geometry of Neural Network Loss Surfaces via Random Matrix Theory](http://proceedings.mlr.press/v70/pennington17a/pennington17a.pdf) 32 | - 2017 ICML [Sharp Minima Can Generalize For Deep Nets](https://arxiv.org/pdf/1703.04933.pdf) 33 | - 2017 ICLR [Entropy-SGD: Biasing Gradient Descent Into Wide Valleys](https://arxiv.org/pdf/1611.01838.pdf) 34 | - 2017 ICLR [On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima](https://openreview.net/pdf?id=H1oyRlYgg) 35 | - 2017 arXiv [An empirical analysis of the optimization of deep network loss surfaces](https://arxiv.org/pdf/1612.04010.pdf) 36 | - 2016 ICMLW [Visualizing Deep Network Training Trajectories with PCA](https://icmlviz.github.io/icmlviz2016/assets/papers/24.pdf) 37 | - 2016 ICLRW [Stuck in a What? Adventures in Weight Space](https://arxiv.org/pdf/1602.07320.pdf) 38 | - 2015 ICLR [Qualitatively Characterizing Neural Network Optimization Problems](https://arxiv.org/pdf/1412.6544.pdf) 39 | - 2015 AISTATS [The Loss Surfaces of Multilayer Networks](http://www.jmlr.org/proceedings/papers/v38/choromanska15.pdf) 40 | - 2014 NIPS [Identifying and attacking the saddle point problem in high-dimensional non-convex optimization](http://papers.nips.cc/paper/5486-identifying-and-attacking-the-saddle-point-problem-in-high-dimensional-non-convex-optimization.pdf) 41 | 42 | ## Batch Size 43 | - 2018 NIPS [Hessian-based Analysis of Large Batch Training and Robustness to Adversaries](https://arxiv.org/pdf/1802.08241.pdf) 44 | - 2018 ICLR [Don't Decay the Learning Rate, Increase the Batch Size](https://arxiv.org/pdf/1711.00489.pdf) 45 | - 2017 arXiv [Scaling SGD Batch Size to 32K for ImageNet Training](https://arxiv.org/pdf/1708.03888.pdf) 46 | - 2017 arXiv [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677) 47 | - 2017 ICML [Sharp Minima Can Generalize For Deep Nets](https://arxiv.org/abs/1703.04933) 48 | - 2017 ICLR [On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima](https://openreview.net/pdf?id=H1oyRlYgg) 49 | 50 | 51 | ## General 52 | - 2016 ICML [Train faster, generalize better: Stability of stochastic gradient descent](http://proceedings.mlr.press/v48/hardt16.pdf) 53 | - 2016 arXiv [Optimization Methods for Large-Scale Machine Learning](https://arxiv.org/abs/1606.04838) 54 | - 2016 Blog [An overview of gradient descent optimization algorithms](http://sebastianruder.com/optimizing-gradient-descent/index.html) 55 | - 2015 DL Summer School [Non-Smooth, Non-Finite, and Non-Convex Optimization](http://www.iro.umontreal.ca/~memisevr/dlss2015/2015_DLSS_NonSmoothNonFiniteNonConvex.pdf) 56 | - 2015 NIPS [Training Very Deep Networks](http://papers.nips.cc/paper/5850-training-very-deep-networks.pdf) 57 | - 2015 AISTATS [Deeply-Supervised Nets](http://jmlr.org/proceedings/papers/v38/lee15a.pdf) 58 | - 2014 OSLW [On the Computational Complexity of Deep Learning](http://lear.inrialpes.fr/workshop/osl2015/slides/osl2015_shalev_shwartz.pdf) 59 | - 2011 ICML [On Optimization Methods for Deep Learning](http://ai.stanford.edu/~quocle/LeNgiCoaLahProNg11.pdf) 60 | - 2010 AISTATS [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf) 61 | 62 | ## Adaptive Gradient Methods 63 | - 2017 NIPS [The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://arxiv.org/abs/1705.08292) 64 | - 2017 ICLR [SGDR: Stochastic Gradient 
Descent with Warm Restarts](https://openreview.net/pdf?id=Skq89Scxx) 65 | - 2015 ICLR [Adam: A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980) (Adam) 66 | - 2013 ICML [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/~fritz/absps/momentum.pdf) (NAG) 67 | - 2012 Lecture [RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) (RMSProp) 68 | - 2011 JMLR [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf) (Adagrad) 69 | 70 | ## Distributed Optimization 71 | - 2017 arXiv [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677) 72 | - 2017 NIPS [TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning](https://arxiv.org/pdf/1705.07878.pdf) 73 | - 2017 NIPS [QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding](https://arxiv.org/pdf/1610.02132.pdf) (QSGD) 74 | - 2016 ICML [Training Neural Networks Without Gradients: A Scalable ADMM Approach](http://jmlr.org/proceedings/papers/v48/taylor16.pdf) 75 | - 2016 IJCAI [Staleness-aware Async-SGD for Distributed Deep Learning](http://www.ijcai.org/Proceedings/16/Papers/335.pdf) 76 | - 2016 ICLRW [Revisiting Distributed Synchronous SGD](http://arxiv.org/abs/1604.00981) 77 | - 2016 Thesis [Distributed Stochastic Optimization for Deep Learning](https://cs.nyu.edu/media/publications/zhang_sixin.pdf) (EASGD) 78 | - 2015 NIPS [Deep learning with Elastic Averaging SGD](https://www.cs.nyu.edu/~zsx/nips2015.pdf) (EASGD) 79 | - 2015 ICLR [Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging](http://arxiv.org/abs/1410.7455) 80 | 81 | ## Initialization 82 | - 2016 NIPS [Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity](http://papers.nips.cc/paper/6427-toward-deeper-understanding-of-neural-networks-the-power-of-initialization-and-a-dual-view-on-expressivity.pdf) 83 | - 2016 ICLR [All You Need is a Good Init](https://arxiv.org/pdf/1511.06422.pdf) 84 | - 2016 ICLR [Data-dependent Initializations of Convolutional Neural Networks](https://arxiv.org/pdf/1511.06856.pdf) 85 | - 2015 ICCV [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://research.microsoft.com/en-us/um/people/kahe/publications/iccv15imgnet.pdf) (MSRAinit) 86 | - 2014 ICLR [Exact solutions to the nonlinear dynamics of learning in deep linear neural networks](https://arxiv.org/pdf/1312.6120.pdf) 87 | - 2013 ICML [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/~fritz/absps/momentum.pdf) 88 | - 2010 AISTATS [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf) (Xavier initialization) 89 | 90 | 91 | ## Low Precision 92 | - 2017 arXiv [Gradient Descent for Spiking Neural Networks](https://arxiv.org/abs/1706.04698) 93 | - 2017 arXiv [Training Quantized Nets: A Deeper Understanding](https://arxiv.org/abs/1706.02379) 94 | - 2017 arXiv [TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning](https://arxiv.org/abs/1705.07878) 95 | - 2017 ICML [ZipML: Training Linear Models with End-to-End Low Precision](http://proceedings.mlr.press/v70/zhang17e/zhang17e.pdf) 96 | - 2016 arXiv [QSGD: 
Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks](https://arxiv.org/pdf/1610.02132.pdf) 97 | - 2015 NIPS [Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms](https://pdfs.semanticscholar.org/a1d2/1f6c8eef605bf132179daf717a232774b375.pdf) 98 | - 2013 arXiv [Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation](https://arxiv.org/pdf/1308.3432.pdf) 99 | 100 | ## Noise 101 | - 2015 arXiv [Adding Gradient Noise Improves Learning for Very Deep Networks](http://arxiv.org/abs/1511.06807) 102 | 103 | ## Normalization 104 | - 2017 arXiv [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515) 105 | - 2017 arXiv [Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models](https://arxiv.org/abs/1702.03275) 106 | - 2016 NIPS [Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks](https://arxiv.org/pdf/1602.07868.pdf) 107 | - 2016 NIPS [Layer Normalization](https://arxiv.org/pdf/1607.06450.pdf) 108 | - 2016 ICML [Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks](https://arxiv.org/pdf/1603.01431.pdf) 109 | - 2016 ICLR [Data-Dependent Path Normalization in Neural Networks](http://arxiv.org/pdf/1511.06747v4.pdf) 110 | - 2015 NIPS [Path-SGD: Path-Normalized Optimization in Deep Neural Networks](http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2015_5797.pdf) 111 | - 2015 ICML [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://jmlr.org/proceedings/papers/v37/ioffe15.pdf) 112 | 113 | ## Regularization 114 | - 2017 arXiv [L2 Regularization versus Batch and Weight Normalization](https://arxiv.org/abs/1706.05350) 115 | - 2014 JMLR [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf) (Dropout) 116 | 117 | ## Meta-Learning 118 | - 2017 ICML [Neural Optimizer Search with Reinforcement Learning](https://arxiv.org/pdf/1709.07417.pdf) 119 | - 2017 ICML [Learned Optimizers that Scale and Generalize](https://arxiv.org/pdf/1703.04813.pdf) 120 | - 2017 ICML [Learning to Learn without Gradient Descent by Gradient Descent](http://www.cantab.net/users/yutian.chen/Publications/ChenEtAl_ICML17_L2L.pdf) 121 | - 2017 ICLR [Learning to Optimize](https://openreview.net/pdf?id=ry4Vrt5gl) 122 | - 2016 arXiv [Learning to reinforcement learn](https://arxiv.org/abs/1611.05763) 123 | - 2016 NIPSW [Learning to Learn for Global Optimization of Black Box Functions](https://arxiv.org/abs/1611.03824) 124 | - 2016 NIPS [Learning to learn by gradient descent by gradient descent](https://arxiv.org/abs/1606.04474) 125 | - 2016 ICML [Meta-learning with memory-augmented neural networks](http://proceedings.mlr.press/v48/santoro16.pdf) 126 | 127 | ## Hyperparameter 128 | - 2015 ICML [Gradient-based hyperparameter optimization through reversible learning](https://www.robots.ox.ac.uk/~vgg/rg/papers/MaclaurinICML15.pdf) 129 | 130 | ## Bayesian Optimization 131 | - 2012 NIPS [Practical Bayesian Optimization of Machine Learning Algorithms](https://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf) 132 | -------------------------------------------------------------------------------- /dl_sys.md: -------------------------------------------------------------------------------- 1 | # Deep Learning Systems 2 | - [General Frameworks](#general-frameworks) 3 | 
- [Specific System](#specific-system) 4 | - [Parallelization](#parallelization) 5 | 6 | ## General Frameworks 7 | - **[Caffe](http://caffe.berkeleyvision.org/)** 8 | 2015 [Large Scale Distributed Deep Learning on Hadoop Clusters](http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop) 9 | 2014 MM [Caffe: Convolutional Architecture for Fast Feature Embedding](http://arxiv.org/abs/1408.5093) 10 | 11 | - **[CNTK](https://www.cntk.ai/)** 12 | 2014 MSR-TR [An introduction to computational networks and the computational network toolkit](http://research.microsoft.com/apps/pubs/?id=226641) 13 | 2014 OSDI [Project Adam: Building an Efficient and Scalable Deep Learning Training System](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf) 14 | 15 | - **[MXNet](http://mxnet.dmlc.ml/en/latest/)** 16 | 2016 arXiv [Training Deep Nets with Sublinear Memory Cost](https://arxiv.org/abs/1604.06174) 17 | 2015 NIPSW [MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems](http://www.cs.cmu.edu/~muli/file/mxnet-learning-sys.pdf) [[GTC'16 Tutorial](http://www.cs.cmu.edu/~muli/file/mxnet_gtc16.pdf)] 18 | 2014 NIPSW [Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning](http://stanford.edu/~rezab/nips-2014workshop/submits/minerva.pdf) 19 | 2014 ICLR [Purine: A bi-graph based deep learning framework](http://arxiv.org/abs/1412.6249) 20 | 21 | - **[Neon](https://www.nervanasys.com/technology/neon/)** 22 | 2015 arXiv [Fast Algorithms for Convolutional Neural Networks](http://arxiv.org/abs/1509.09308) (Winograd) [[Blog]](http://www.nervanasys.com/winograd/) 23 | 24 | - **[Paddle](http://www.paddlepaddle.org/)** 25 | 26 | - **[PyTorch](http://pytorch.org/)** 27 | 28 | - **[Spark](https://github.com/amplab/SparkNet)** 29 | 2016 arXiv [SparkNet: Training Deep Networks in Spark](http://arxiv.org/abs/1511.06051) 30 | 2016 arXiv [DeepSpark: A Spark-Based Distributed Deep Learning Framework for Commodity Clusters](https://arxiv.org/abs/1602.08191) 31 | 32 | - **[TensorFlow](https://www.tensorflow.org/)** 33 | 2016 OSDI [TensorFlow: A system for large-scale machine learning](http://arxiv.org/abs/1605.08695) 34 | 2015 [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](http://download.tensorflow.org/paper/whitepaper2015.pdf) [[slides 1]](http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn-2015.pdf) [[slides 2]](http://vision.stanford.edu/teaching/cs231n/slides/jon_talk.pdf) 35 | 2014 NIPSW [Techniques and Systems for Training Large Neural Networks Quickly](http://stanford.edu/~rezab/nips2014workshop/slides/jeff.pdf) 36 | 2012 NIPS [Large Scale Distributed Deep Networks](http://static.googleusercontent.com/media/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf) (DistBelief) 37 | 38 | - **[Theano](http://deeplearning.net/software/theano/)** 39 | 2016 arXiv [Theano: A Python framework for fast computation of mathematical expressions](http://arxiv.org/abs/1605.02688) 40 | 41 | - **[Torch](http://torch.ch/)** 42 | 2016 NIPSW [Torchnet: An Open-Source Platform for (Deep) Learning Research](https://lvdmaaten.github.io/publications/papers/Torchnet_2016.pdf) 43 | 2011 NIPSW [Torch7: A Matlab-like Environment for Machine Learning](http://cs.nyu.edu/~koray/files/2011_torch7_nipsw.pdf) 44 | 45 | ## Specific System 46 | - 2016 ICML [Deep Speech 2: End-to-End Speech Recognition in English and 
Mandarin](http://arxiv.org/abs/1512.02595) 47 | - 2015 ICMLW [Massively Parallel Methods for Deep Reinforcement Learning](https://arxiv.org/abs/1507.04296) 48 | - 2015 arXiv [Deep Image: Scaling up Image Recognition](http://arxiv.org/abs/1501.02876) 49 | - 2013 ICML [Deep learning with COTS HPC systems](http://jmlr.org/proceedings/papers/v28/coates13.pdf) 50 | 51 | ## Parallelization 52 | - 2015 Intel [Single Node Caffe Scoring and Training on Intel® Xeon E5-Series Processors](https://software.intel.com/en-us/articles/single-node-caffe-scoring-and-training-on-intel-xeon-e5-series-processors) 53 | - 2015 arXiv [Caffe con Troll: Shallow Ideas to Speed Up Deep Learning](http://arxiv.org/abs/1504.04343) 54 | - 2015 ICMLW [Massively Parallel Methods for Deep Reinforcement Learning](https://arxiv.org/abs/1507.04296) 55 | - 2015 arXiv [Convolutional Neural Networks at Constrained Time Cost](http://arxiv.org/pdf/1412.1710v1.pdf) 56 | - 2014 arXiv [One weird trick for parallelizing convolutional neural networks](http://arxiv.org/pdf/1404.5997v2.pdf) 57 | - 2014 NIPS [On the Computational Efficiency of Training Neural Networks](http://papers.nips.cc/paper/5267-on-the-computational-efficiency-of-training-neural-networks.pdf) 58 | - 2011 NIPSW [Improving the speed of neural networks on CPUs](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf) 59 | -------------------------------------------------------------------------------- /graph.md: -------------------------------------------------------------------------------- 1 | # Large-Scale Graph Computation 2 | 3 | 4 | - 2015 CIKM [HDRF: Stream-Based Partitioning for Power-Law Graphs](http://www.fabiopetroni.com/Download/petroni2015HDRF.pdf) 5 | - 2015 VLDB [One Trillion Edges: Graph Processing at Facebook-Scale](http://www.vldb.org/pvldb/vol8/p1804-ching.pdf) 6 | - 2015 VLDB [A Scalable Distributed Graph Partitioner](http://www.eecs.harvard.edu/~dmargo/pubs/vldb15-paper.pdf) 7 | - 2015 SOSP [Chaos: Scale-out Graph Processing from Secondary Storage](http://sigops.org/sosp/sosp15/current/2015-Monterey/printable/089-roy.pdf) 8 | - 2015 EuroSys [PowerLyra: Differentiated Graph Computation and Partitioning 9 | on Skewed Graphs](http://ipads.se.sjtu.edu.cn/projects/powerlyra/powerlyra-eurosys-final.pdf) 10 | - 2014 VLDB [Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation](http://www.vldb.org/pvldb/vol8/p281-lu.pdf) 11 | - 2014 VLDB [An Experimental Comparison of Pregel-like Graph Processing Systems](http://www.vldb.org/pvldb/vol7/p1047-han.pdf) 12 | - 2014 OSDI [GraphX: Graph Processing in a Distributed Dataflow Framework](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-gonzalez.pdf) 13 | - 2013 SOSP [X-Stream: Edge-centric Graph Processing using Streaming Partitions](http://infoscience.epfl.ch/record/188535/files/paper.pdf) 14 | - 2012 OSDI [PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs](https://www.usenix.org/system/files/conference/osdi12/osdi12-final-167.pdf) 15 | - 2012 OSDI [GraphChi: Large-Scale Graph Computation on Just a PC](http://select.cs.cmu.edu/publications/paperdir/osdi2012-kyrola-blelloch-guestrin.pdf) 16 | - 2010 SIGMOD [Pregel: A System for Large-Scale Graph 
Processing](https://kowshik.github.io/JPregel/pregel_paper.pdf) 17 | - 2010 UAI [GraphLab: A New Framework For Parallel Machine Learning](http://www.select.cs.cmu.edu/publications/paperdir/uai2010-low-gonzalez-kyrola-bickson-guestrin-hellerstein.pdf) 18 | -------------------------------------------------------------------------------- /matrix_fact.md: -------------------------------------------------------------------------------- 1 | # Matrix Factorization 2 | 3 | 4 | - 2016 WSDM [DiFacto — Distributed Factorization Machines](http://www.cs.cmu.edu/~yuxiangw/docs/fm.pdf) 5 | - 2015 CIKM [HDRF: Stream-Based Partitioning for Power-Law Graphs](http://www.fabiopetroni.com/Download/petroni2015HDRF.pdf) 6 | - 2015 KDD [Fast and Robust Parallel SGD Matrix Factorization](http://dm.postech.ac.kr/MLGF-MF/fp352.pdf) (MLGF) 7 | - 2015 Facebook Eng [Recommending items to more than a billion people](https://code.facebook.com/posts/861999383875667/recommending-items-to-more-than-a-billion-people/) 8 | - 2015 arXiv [Generalized Low Rank Models](https://web.stanford.edu/~boyd/papers/pdf/glrm.pdf) 9 | - 2015 RecSys [Fast Differentially Private Matrix Factorization](http://arxiv.org/pdf/1505.01419v2.pdf) 10 | - 2015 arXiv [Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems](http://arxiv.org/abs/1411.1134) 11 | - 2015 ICML [PU Learning for Matrix Completion](http://arxiv.org/pdf/1411.6081v1.pdf) 12 | - 2014 JMLR workshop [A Fast Distributed Stochastic Gradient Descent Algorithm for Matrix Factorization](http://www.jmlr.org/proceedings/papers/v36/li14.pdf) 13 | - 2014 NIPSW [Elastic Distributed Bayesian Collaborative Filtering](http://stanford.edu/~rezab/nips-2014workshop/submits/distbayes.pdf) 14 | - 2014 NIPSW [Factorbird - a Parameter Server Approach to Distributed Matrix Factorization](http://stanford.edu/~rezab/papers/factorbird.pdf) (Factorbird) 15 | - 2014 RecSys [GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Completion via Graph Partitioning](http://dl.acm.org/citation.cfm?id=2645725) (GASGD) 16 | - 2014 VLDB [NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion](http://www.vldb.org/pvldb/vol7/p975-yun.pdf) (NOMAD) 17 | - 2013 WWW [Distributed Large-Scale Natural Graph Factorization](http://www.di.ens.fr/~shervashidze/papers/Ahmedetal13.pdf) 18 | - 2013 EDBT [Sparkler: Supporting Large-Scale Matrix Factorization](http://people.cs.umass.edu/~boduo/publications/2013EDBT-sparkler.pdf) 19 | - 2013 RecSys [A fast parallel SGD for matrix factorization in shared memory systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf.pdf) (FPSGD) 20 | - 2012 ICDM [Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](http://www.cs.utexas.edu/~rofuyu/papers/icdm-pmf.pdf) 21 | - 2012 ICDM [Distributed Matrix Completion](https://people.mpi-inf.mpg.de/~rgemulla/publications/teflioudi12completion.pdf) (DSGD++) 22 | - 2011 KDD [Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent](https://people.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf) (DSGD) 23 | - 2009 JMLR [Scalable Collaborative Filtering Approaches for Large Recommender Systems](http://www.jmlr.org/papers/volume10/takacs09a/takacs09a.pdf) 24 | 25 | ##### ALS 26 | - 2008 AAIM [Large-scale parallel collaborative filtering 27 | for the Netflix prize](http://www.grappa.univ-lille3.fr/~mary/cours/stats/centrale/reco/paper/MatrixFactorizationALS.pdf) 28 | 29 | ##### Nyström 
method 30 | - 2015 JMLR [Distributed Matrix Completion and Robust Factorization](http://web.stanford.edu/~lmackey/papers/dmcrf-jmlr15.pdf) 31 | - 2011 NIPS [Divide-and-Conquer Matrix Factorization](http://papers.nips.cc/paper/4486-divide-and-conquer-matrix-factorization.pdf) 32 | 33 | ##### NMF 34 | - 2011 KDD [Fast Coordinate Descent Methods with Variable Selection 35 | for Non-negative Matrix Factorization](http://www.cs.utexas.edu/users/inderjit/public_papers/nmf_kdd11.pdf) 36 | - 2010 WWW [Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce](http://research.microsoft.com/pubs/119077/DNMF.pdf) 37 | --------------------------------------------------------------------------------