├── README.md ├── dist_ml.md ├── dl_cnn.md ├── dl_opt.md ├── dl_sys.md ├── graph.md └── matrix_fact.md /README.md: -------------------------------------------------------------------------------- 1 | # Fast and Scalable Machine Learning: Algorithms and Systems 2 | 3 | 4 | This is a collection of papers on recent progress in machine learning and systems, including distributed machine learning, deep learning, and related topics. 5 | 6 | ## Contents 7 | 1. Deep Learning 8 | - [Convolutional Neural Networks](dl_cnn.md) 9 | - [ImageNet Models](dl_cnn.md#imagenet-models) 10 | - [Architecture Design](dl_cnn.md#architecture-design) 11 | - [Activation Functions](dl_cnn.md#activation-functions) 12 | - [Visualization](dl_cnn.md#visualization) 13 | - [Fast Convolution](dl_cnn.md#fast-convolution) 14 | - [Low-Rank Filter Approximation](dl_cnn.md#low-rank-filter-approximation) 15 | - [Low Precision](dl_cnn.md#low-precision) 16 | - [Parameter Pruning](dl_cnn.md#parameter-pruning) 17 | - [Transfer Learning](dl_cnn.md#transfer-learning) 18 | - [Theory](dl_cnn.md#theory) 19 | - [3D Data](dl_cnn.md#3d-data) 20 | - [Hardware](dl_cnn.md#hardware) 21 | - [Optimization for Deep Learning](dl_opt.md) 22 | - [Generalization](dl_opt.md#generalization) 23 | - [Loss Surface](dl_opt.md#loss-surface) 24 | - [Batch Size](dl_opt.md#batch-size) 25 | - [General](dl_opt.md#general) 26 | - [Adaptive Gradient Methods](dl_opt.md#adaptive-gradient-methods) 27 | - [Distributed Optimization](dl_opt.md#distributed-optimization) 28 | - [Initialization](dl_opt.md#initialization) 29 | - [Low Precision](dl_opt.md#low-precision) 30 | - [Normalization](dl_opt.md#normalization) 31 | - [Regularization](dl_opt.md#regularization) 32 | - [Meta Learning](dl_opt.md#meta-learning) 33 | - [Deep Learning Systems](dl_sys.md) 34 | - [General Frameworks](dl_sys.md#general-frameworks) 35 | - [Specific System](dl_sys.md#specific-system) 36 | - [Parallelization](dl_sys.md#parallelization) 37 | 2. Distributed Machine Learning 38 | - [Distributed Optimization](dist_ml.md#distributed-optimization) 39 | - [Distributed ML Systems](dist_ml.md#distributed-ml-systems) 40 | 3. 
Other Topics 41 | - [Matrix Factorization](matrix_fact.md) 42 | - [Graph Computation](graph.md) 43 | -------------------------------------------------------------------------------- /dist_ml.md: -------------------------------------------------------------------------------- 1 | # Distributed Machine Learning 2 | 3 | - [Distributed Optimization](#distributed-optimization) 4 | - [Distributed ML Systems](#distributed-ml-systems) 5 | 6 | ## Distributed Optimization 7 | - 2016 ICDM [Efficient Distributed SGD with Variance Reduction](https://arxiv.org/pdf/1512.02970.pdf) 8 | - 2016 KDD [Robust Large-Scale Machine Learning in the Cloud](http://www.kdd.org/kdd2016/papers/files/Paper_801.pdf) 9 | - 2015 KDD [Network Lasso: Clustering and Optimization in Large 10 | Graphs](http://web.stanford.edu/~hallac/Network_Lasso.pdf) 11 | - 2012 JMLR [Distributed Learning, Communication Complexity and Privacy](http://www.cs.cmu.edu/~avrim/Papers/DistLrn.pdf) 12 | - 2012 AISTATS [Protocols for Learning Classifiers on Distributed Data](https://www.cs.utah.edu/~jeffp/papers/distrib-learn-AIStat.pdf) 13 | - 2010 NIPS [Parallelized Stochastic Gradient Descent](http://martin.zinkevich.org/publications/nips2010.pdf) | [video](http://videosrv14.cs.washington.edu/info/videos/mp4/colloq/AAgarwal_140210.mp4) (One-Shot) 14 | - 2010 NAACL [Distributed Training Strategies for the Structured Perceptron](http://www.cslu.ogi.edu/~bedricks/courses/cs506-pslc/articles/week3/dpercep.pdf) 15 | - 2009 NIPS [Efficient Large-Scale Distributed Training of Conditional Maximum Entropy Models](http://www.ryanmcd.com/papers/efficient_maxentNIPS2009.pdf) 16 | - 2009 NIPS [Slow Learners are Fast](http://papers.nips.cc/paper/3888-slow-learners-are-fast.pdf) 17 | 18 | ### Communication Efficiency, Complexity, Delay, Latency 19 | - 2014 ATC [Exploiting bounded staleness to speed up Big Data analytics](https://www.usenix.org/system/files/conference/atc14/atc14-paper-cui.pdf) 20 | - 2014 NIPS [Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation](http://papers.nips.cc/paper/5386-fundamental-limits-of-online-and-distributed-algorithms-for-statistical-learning-and-estimation.pdf) 21 | - 2014 ICML [Communication-Efficient Distributed Optimization using an Approximate Newton-type Method](http://jmlr.org/proceedings/papers/v32/shamir14.pdf) 22 | - 2013 NIPS [Information-theoretic lower bounds for distributed statistical estimation with communication constraints](http://www.cs.berkeley.edu/~yuczhang/files/nips13_communication.pdf) 23 | - 2013 NIPS [Optimistic Concurrency Control for Distributed Unsupervised Learning](http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2013_5038.pdf) 24 | - 2013 SDM [Butterfly Mixing: Accelerating Incremental-Update Algorithms on Clusters](http://www.cs.berkeley.edu/~jfc/papers/13/butterflymixing.pdf) 25 | - 2012 NIPS [Communication-Efficient Algorithms for Statistical Optimization](http://papers.nips.cc/paper/4728-communication-efficient-algorithms-for-statistical-optimization.pdf) 26 | - 2011 NIPS [Distributed Delayed Stochastic Optimization](http://papers.nips.cc/paper/4247-distributed-delayed-stochastic-optimization.pdf) 27 | 28 | ### Distributed Mini-Batching 29 | 30 | - 2014 KDD [Efficient Mini-batch Training for Stochastic Optimization](http://www.cs.cmu.edu/~muli/file/minibatch_sgd.pdf) 31 | - 2012 JMLR [Optimal Distributed Online Prediction Using Mini-Batches](http://jmlr.org/papers/volume13/dekel12a/dekel12a.pdf) 32 | - 2011 ICML [Optimal Distributed Online 
Prediction](http://www.icml-2011.org/papers/404_icmlpaper.pdf) 33 | - 2011 NIPS [Better Mini-Batch Algorithms via Accelerated Gradient Methods](http://papers.nips.cc/paper/4432-better-mini-batch-algorithms-via-accelerated-gradient-methods.pdf) 34 | 35 | ### Distributed Consensus 36 | - 2016 ICLRW [Revisiting Distributed Synchronous SGD](http://arxiv.org/abs/1604.00981) 37 | - 2014 [Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers](http://web.stanford.edu/~boyd/papers/admm_distr_stats.html) (ADMM) 38 | - 2012 IEEE Trans. on Automatic Control [Dual Averaging for Distributed Optimization: 39 | Convergence Analysis and Network Scaling](http://www.eecs.berkeley.edu/~wainwrig/Papers/DucAgaWai12.pdf) 40 | - 2010 NIPS [Distributed Dual Averaging in Networks](https://web.stanford.edu/~jduchi/projects/DuchiAgWa10_nips.pdf) 41 | - 2009 IEEE Trans. on Automatic Control [Distributed subgradient methods for multi-agent optimization](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=4749425) | [slides](http://groups.csail.mit.edu/tds/seminars/s09/MIT-talk.pdf) 42 | - 2008 Convex Optimization in Signal Processing and Communications [Cooperative Distributed Multi-Agent Optimization](https://asu.mit.edu/sites/default/files/documents/publications/Dist-chapter.pdf) 43 | 44 | 45 | ## Distributed ML Systems 46 | - 2014 APSys [A Scalable and Topology Configurable Protocol for Distributed Parameter Synchronization](http://research.microsoft.com/pubs/219927/main.pdf) 47 | - 2014 ICML Tutorial [Emerging Systems for Large-Scale Machine Learning](http://www.cs.berkeley.edu/~jegonzal/talks/icml14_sysml.pdf) 48 | - 2013 Distributed Computing [When distributed computation is communication expensive](http://arxiv.org/abs/1304.4636) 49 | 50 | ### MapReduce / AllReduce 51 | - 2014 JMLR [A Reliable Effective Terascale Linear Learning System](http://jmlr.org/papers/volume15/agarwal14a/agarwal14a.pdf) 52 | - 2010 NIPSW [MapReduce/Bigtable for Distributed Optimization](http://www.australianscience.com.au/research/google/36948.pdf) | [slides](http://lccc.eecs.berkeley.edu/Slides/HallGiMa10_slides.pdf) 53 | - 2007 NIPS [Map-Reduce for Machine Learning on Multicore](http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2006_725.pdf) 54 | 55 | ### Parameter Servers 56 | - 2014 OSDI [Project Adam: Building an Efficient and Scalable Deep Learning Training System](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf) 57 | - 2014 OSDI [Scaling Distributed Machine Learning with the Parameter Server](http://www.cs.cmu.edu/~muli/file/parameter_server_osdi14.pdf) 58 | - 2014 NIPS [Communication Efficient Distributed Machine 59 | Learning with the Parameter Server](http://www.cs.cmu.edu/~muli/file/parameter_server_nips14.pdf) 60 | - 2013 NIPSW [Parameter Server for Distributed Machine Learning](http://www.cs.cmu.edu/~muli/file/ps.pdf) 61 | - 2013 NIPSW [Distributed Delayed Proximal Gradient Methods](http://www.cs.cmu.edu/~muli/file/ddp.pdf) 62 | - 2012 NIPS [Large Scale Distributed Deep Networks](http://static.googleusercontent.com/media/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf) (DistBelief) 63 | - 2010 VLDB [An Architecture for Parallel Topic Models](http://vldb.org/pvldb/vldb2010/papers/R63.pdf) 64 | 65 | ### Peer-to-Peer 66 | - 2015 EuroSys [MALT: Distributed Data-Parallelism for Existing ML Applications](http://www.nec-labs.com/~asim/papers/malt_eurosys15.pdf) 67 | 
-------------------------------------------------------------------------------- /dl_cnn.md: -------------------------------------------------------------------------------- 1 | # Convolutional Neural Networks 2 | 3 | - [ImageNet Models](#imagenet-models) 4 | - [Architecture Design](#architecture-design) 5 | - [Activation Functions](#activation-functions) 6 | - [Visualization](#visualization) 7 | - [Fast Convolution](#fast-convolution) 8 | - [Low-Rank Filter Approximation](#low-rank-filter-approximation) 9 | - [Low Precision](#low-precision) 10 | - [Parameter Pruning](#parameter-pruning) 11 | - [Transfer Learning](#transfer-learning) 12 | - [Theory](#theory) 13 | - [3D Data](#3d-data) 14 | - [Hardware](#hardware) 15 | 16 | ## ImageNet Models 17 | - 2017 CVPR [Xception: Deep Learning with Depthwise Separable Convolutions](http://openaccess.thecvf.com/content_cvpr_2017/papers/Chollet_Xception_Deep_Learning_CVPR_2017_paper.pdf) (Xception) 18 | - 2017 CVPR [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/pdf/1611.05431.pdf) (ResNeXt) 19 | - 2016 ECCV [Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027) (Pre-ResNet) 20 | - 2016 arXiv [Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning](http://arxiv.org/abs/1602.07261) (Inception V4) 21 | - 2016 CVPR [Deep Residual Learning for Image Recognition](http://arxiv.org/abs/1512.03385) (ResNet) 22 | - 2015 arXiv [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) (Inception V3) 23 | - 2015 ICML [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://jmlr.org/proceedings/papers/v37/ioffe15.pdf) (Inception V2) 24 | - 2015 ICCV [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://research.microsoft.com/en-us/um/people/kahe/publications/iccv15imgnet.pdf) (PReLU) 25 | - 2015 ICLR [Very Deep Convolutional Networks for Large-Scale Image Recognition](http://arxiv.org/abs/1409.1556) (VGG) 26 | - 2015 CVPR [Going Deeper with Convolutions](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43022.pdf) (GoogLeNet/Inception V1) 27 | - 2012 NIPS [ImageNet Classification with Deep Convolutional Neural Networks](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (AlexNet) 28 | 29 | ## Architecture Design 30 | - 2018 arXiv [Regularized Evolution for Image Classifier Architecture Search](https://arxiv.org/pdf/1802.01548.pdf) 31 | - 2018 CVPR [Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/pdf/1707.07012.pdf) 32 | - 2017 arXiv [One Model To Learn Them All](https://arxiv.org/abs/1706.05137) 33 | - 2017 arXiv [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861) 34 | - 2017 ICML [AdaNet: Adaptive Structural Learning of Artificial Neural Networks](https://arxiv.org/pdf/1607.01097.pdf) 35 | - 2017 ICML [Large-Scale Evolution of Image Classifiers](https://arxiv.org/abs/1703.01041) 36 | - 2017 CVPR [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/pdf/1611.05431.pdf) 37 | - 2017 CVPR [Densely Connected Convolutional Networks](http://arxiv.org/abs/1608.06993) 38 | - 2017 ICLR [Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer](https://openreview.net/pdf?id=B1ckMDqlg) 39 | - 2017 ICLR [Neural Architecture Search with 
Reinforcement Learning](https://openreview.net/pdf?id=r1Ue8Hcxg) 40 | - 2017 ICLR [Designing Neural Network Architectures using Reinforcement Learning](https://openreview.net/pdf?id=S1c2cvqee) 41 | - 2017 ICLR [Do Deep Convolutional Nets Really Need to be Deep and Convolutional?](https://arxiv.org/abs/1603.05691) 42 | - 2017 ICLR [Highway and Residual Networks learn Unrolled Iterative Estimation](https://arxiv.org/pdf/1612.07771.pdf) 43 | - 2016 NIPS [Residual Networks Behave Like Ensembles of Relatively Shallow Networks](https://arxiv.org/abs/1605.06431) 44 | - 2016 BMVC [Wide Residual Networks](http://arxiv.org/abs/1605.07146) 45 | - 2016 arXiv [Benefits of depth in neural networks](http://arxiv.org/abs/1602.04485) 46 | - 2016 AAAI [On the Depth of Deep Neural Networks: A Theoretical View](http://arxiv.org/abs/1506.05232) 47 | - 2016 arXiv [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size](http://arxiv.org/abs/1602.07360) 48 | - 2015 ICMLW [Highway Networks](http://arxiv.org/pdf/1505.00387v2.pdf) 49 | - 2015 CVPR [Convolutional Neural Networks at Constrained Time Cost](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/He_Convolutional_Neural_Networks_2015_CVPR_paper.pdf) 50 | - 2015 CVPR [Fully Convolutional Networks for Semantic Segmentation](https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf) 51 | - 2014 NIPS [Do Deep Nets Really Need to be Deep?](http://papers.nips.cc/paper/5484-do-deep-nets-really-need-to-be-deep.pdf) 52 | - 2014 ICLRW [Understanding Deep Architectures using a Recursive Convolutional Network](http://arxiv.org/abs/1312.1847) 53 | - 2013 ICML [Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures](http://jmlr.org/proceedings/papers/v28/bergstra13.pdf) 54 | - 2009 ICCV [What is the Best Multi-Stage Architecture for Object Recognition?](http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf) 55 | - 1995 NIPS [Simplifying Neural Nets by Discovering Flat Minima](https://papers.nips.cc/paper/899-simplifying-neural-nets-by-discovering-flat-minima.pdf) 56 | - 1994 T-NN [SVD-NET: An Algorithm that Automatically Selects Network Structure](http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=286929) 57 | 58 | ## Activation Functions 59 | - 2017 arXiv [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515) (SELU) 60 | - 2016 ICLR [Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)](https://arxiv.org/pdf/1511.07289.pdf) (ELU) 61 | - 2015 arXiv [Empirical Evaluation of Rectified Activations in Convolutional Network](https://arxiv.org/pdf/1505.00853.pdf) (RReLU) 62 | - 2015 ICCV [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://research.microsoft.com/en-us/um/people/kahe/publications/iccv15imgnet.pdf) (PReLU) 63 | - 2013 ICML [Rectifier Nonlinearities Improve Neural Network Acoustic Models](https://pdfs.semanticscholar.org/367f/2c63a6f6a10b3b64b8729d601e69337ee3cc.pdf) 64 | - 2010 ICML [Rectified Linear Units Improve Restricted Boltzmann Machines](http://www.cs.toronto.edu/~fritz/absps/reluICML.pdf) (ReLU) 65 | 66 | ## Visualization 67 | - 2017 CVPR [Network Dissection: Quantifying Interpretability of Deep Visual Representations](http://netdissect.csail.mit.edu/final-network-dissection.pdf) 68 | - 2016 IJCV [Visualizing Deep Convolutional Neural Networks Using Natural Pre-Images](https://arxiv.org/pdf/1512.02017.pdf) 69 | - 2016 ICMLW [Multifaceted Feature 
Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks](http://www.evolvingai.org/files/mfv_icml_workshop_16.pdf) 70 | - 2016 CVPR [Inverting Visual Representations with Convolutional Networks](https://arxiv.org/pdf/1506.02753.pdf) 71 | - 2015 ICMLW [Understanding Neural Networks Through Deep Visualization](http://yosinski.com/media/papers/Yosinski__2015__ICML_DL__Understanding_Neural_Networks_Through_Deep_Visualization__.pdf) 72 | - 2015 CVPR [Understanding Deep Image Representations by Inverting Them](https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf) 73 | - 2014 ECCV [Visualizing and Understanding Convolutional Networks](https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf) 74 | - 2014 ICLRW [Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps](https://arxiv.org/pdf/1312.6034.pdf) 75 | - 2009 [Visualizing Higher-Layer Features of a Deep Network](https://www.researchgate.net/profile/Aaron_Courville/publication/265022827_Visualizing_Higher-Layer_Features_of_a_Deep_Network/links/53ff82b00cf24c81027da530.pdf) 76 | 77 | ## Fast Convolution 78 | - 2017 ICML [Warped Convolutions: Efficient Invariance to Spatial Transformations](https://arxiv.org/pdf/1609.04382.pdf) 79 | - 2017 ICLR [Faster CNNs with Direct Sparse Convolutions and Guided Pruning](https://openreview.net/pdf?id=rJPcZ3txx) 80 | - 2016 NIPS [PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions](http://arxiv.org/abs/1504.08362) 81 | - 2016 CVPR [Fast Algorithms for Convolutional Neural Networks](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Lavin_Fast_Algorithms_for_CVPR_2016_paper.pdf) (Winograd) 82 | - 2015 CVPR [Sparse Convolutional Neural Networks](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Liu_Sparse_Convolutional_Neural_2015_CVPR_paper.pdf) 83 | 84 | ## Low-Rank Filter Approximation 85 | - 2016 ICLR [Convolutional Neural Networks with Low-rank Regularization](https://arxiv.org/abs/1511.06067) 86 | - 2016 ICLR [Training CNNs with Low-Rank Filters for Efficient Image Classification](http://arxiv.org/abs/1511.06744) 87 | - 2016 TPAMI [Accelerating Very Deep Convolutional Networks for Classification and Detection](https://arxiv.org/abs/1505.06798) 88 | - 2015 CVPR [Efficient and Accurate Approximations of Nonlinear Convolutional Networks](http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Zhang_Efficient_and_Accurate_2015_CVPR_paper.pdf) 89 | - 2015 ICLR [Speeding-up convolutional neural networks using fine-tuned cp-decomposition](https://arxiv.org/pdf/1412.6553v3.pdf) 90 | - 2014 NIPS [Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation](http://papers.nips.cc/paper/5544-exploiting-linear-structure-within-convolutional-networks-for-efficient-evaluation.pdf) 91 | - 2014 BMVC [Speeding up Convolutional Neural Networks with Low Rank Expansions](https://arxiv.org/abs/1405.3866) 92 | - 2013 NIPS [Predicting Parameters in Deep Learning](https://papers.nips.cc/paper/5025-predicting-parameters-in-deep-learning.pdf) 93 | - 2013 CVPR [Learning Separable Filters](http://cvlabwww.epfl.ch/~lepetit/papers/rigamonti_cvpr13.pdf) 94 | 95 | ## Low Precision 96 | - 2018 AAAI [Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM](https://arxiv.org/pdf/1707.09870.pdf) 97 | - 2018 ICLR [Training and Inference with Integers in Deep Neural 
Networks](https://arxiv.org/pdf/1802.04680.pdf) 98 | - 2018 ICLR [Mixed Precision Training](https://arxiv.org/pdf/1710.03740.pdf) 99 | - 2018 ICLR [The High-Dimensional Geometry of Binary Neural Networks](https://arxiv.org/pdf/1705.07199.pdf) 100 | - 2017 arXiv [BitNet: Bit-Regularized Deep Neural Networks](https://arxiv.org/pdf/1708.04788.pdf) 101 | - 2017 arXiv [Gradient Descent for Spiking Neural Networks](https://arxiv.org/abs/1706.04698) 102 | - 2017 arXiv [ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks](https://arxiv.org/abs/1706.02393) 103 | - 2017 arXiv [Gated XNOR Networks: Deep Neural Networks with Ternary Weights and Activations under a Unified Discretization Framework](https://arxiv.org/abs/1705.09283) 104 | - 2017 NIPS [Flexpoint: An Adaptive Numerical Format for Efficient Training of Deep Neural Networks](https://arxiv.org/pdf/1711.02213.pdf) 105 | - 2017 NIPS [Training Quantized Nets: A Deeper Understanding](https://arxiv.org/abs/1706.02379) 106 | - 2017 NIPS [TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning](https://arxiv.org/abs/1705.07878) 107 | - 2017 ICML [Analytical Guarantees on Numerical Precision of Deep Neural Networks](http://proceedings.mlr.press/v70/sakr17a/sakr17a.pdf) 108 | - 2017 CVPR [Deep Learning with Low Precision by Half-wave Gaussian Quantization](https://arxiv.org/abs/1702.00953) 109 | - 2017 CVPR [Network Sketching: Exploiting Binary Structure in Deep CNNs](https://arxiv.org/pdf/1706.02021.pdf) 110 | - 2017 CVPR [Local Binary Convolutional Neural Networks](http://openaccess.thecvf.com/content_cvpr_2017/papers/Juefei-Xu_Local_Binary_Convolutional_CVPR_2017_paper.pdf) 111 | - 2017 ICLR [Towards the Limit of Network Quantization](https://openreview.net/pdf?id=rJ8uNptgl) 112 | - 2017 ICLR [Loss-aware Binarization of Deep Networks](https://openreview.net/pdf?id=S1oWlN9ll) 113 | - 2017 ICLR [Trained Ternary Quantization](https://openreview.net/pdf?id=S1_pAu9xl) 114 | - 2017 ICLR [Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights](https://openreview.net/pdf?id=HyQJ-mclg) 115 | - 2017 AAAI [How to Train a Compact Binary Neural Network with High Accuracy?
](https://aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14619/14454) 116 | - 2016 arXiv [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/abs/1609.07061) 117 | - 2016 arXiv [Accelerating Deep Convolutional Networks using low-precision and sparsity](https://arxiv.org/abs/1610.00324) 118 | - 2016 arXiv [Deep neural networks are robust to weight binarization and other non-linear distortions](https://arxiv.org/pdf/1606.01981.pdf) 119 | - 2016 ECCV [XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks](https://arxiv.org/pdf/1603.05279.pdf) 120 | - 2016 ICMLW [Overcoming Challenges in Fixed Point Training of Deep Convolutional Networks](https://arxiv.org/pdf/1607.02241.pdf) 121 | - 2016 ICML [Fixed Point Quantization of Deep Convolutional Networks](http://jmlr.org/proceedings/papers/v48/linb16.pdf) 122 | - 2016 NIPS [Binarized Neural Networks](https://papers.nips.cc/paper/6573-binarized-neural-networks.pdf) 123 | - 2016 arXiv [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](http://arxiv.org/abs/1602.02830) 124 | - 2016 CVPR [Quantized Convolutional Neural Networks for Mobile Devices](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Wu_Quantized_Convolutional_Neural_CVPR_2016_paper.pdf) 125 | - 2016 ICLR [Neural Networks with Few Multiplications](https://arxiv.org/abs/1510.03009) 126 | - 2015 arXiv [Resiliency of Deep Neural Networks under Quantization](https://arxiv.org/abs/1511.06488) 127 | - 2015 arXiv [Rounding Methods for Neural Networks with Low Resolution Synaptic Weights](https://arxiv.org/abs/1504.05767) 128 | - 2015 NIPS [Backpropagation for Energy-Efficient Neuromorphic Computing](https://papers.nips.cc/paper/5862-backpropagation-for-energy-efficient-neuromorphic-computing.pdf) 129 | - 2015 NIPS [BinaryConnect: Training Deep Neural Networks with Binary Weights during Propagations](https://papers.nips.cc/paper/5647-binaryconnect-training-deep-neural-networks-with-binary-weights-during-propagations.pdf) 130 | - 2015 ICMLW [Bitwise Neural Networks](http://minjekim.com/papers/icml2015_mkim.pdf) 131 | - 2015 ICML [Deep Learning with Limited Numerical Precision](http://www.jmlr.org/proceedings/papers/v37/gupta15.pdf) 132 | - 2015 ICLRW [Training deep neural networks with low precision multiplications](https://arxiv.org/abs/1412.7024) 133 | - 2015 arXiv [Training Binary Multilayer Neural Networks for Image Classification using Expectation Backpropagation](https://arxiv.org/abs/1503.03562) 134 | - 2014 NIPS [Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights](https://papers.nips.cc/paper/5269-expectation-backpropagation-parameter-free-training-of-multilayer-neural-networks-with-continuous-or-discrete-weights.pdf) 135 | - 2013 arXiv [Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation](https://arxiv.org/pdf/1308.3432.pdf) 136 | - 2011 NIPSW [Improving the speed of neural networks on CPUs](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf) 137 | - 1987 Combinatorica [Randomized rounding: A technique for provably good algorithms and algorithmic 
proofs](https://www.cs.auckland.ac.nz/~cthombor/Pubs/RandomRounding/RandomRounding1987.pdf) 138 | 139 | ## Parameter Pruning 140 | - 2018 ICLR [On the importance of single directions for generalization](https://arxiv.org/pdf/1803.06959.pdf) 141 | - 2018 ICLR [Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers](https://arxiv.org/pdf/1802.00124.pdf) 142 | - 2017 NIPS [Runtime Neural Pruning](https://papers.nips.cc/paper/6813-runtime-neural-pruning.pdf) 143 | - 2017 ICML [Beyond Filters: Compact Feature Map for Portable Deep Model](http://proceedings.mlr.press/v70/wang17m/wang17m.pdf) 144 | - 2017 ICLR [Soft Weight-Sharing for Neural Network Compression](https://openreview.net/pdf?id=HJGwcKclx) 145 | - 2017 ICLR [Pruning Convolutional Neural Networks for Resource Efficient Inference](https://openreview.net/pdf?id=SJGCiw5gl) 146 | - 2017 ICLR [Pruning Filters for Efficient ConvNets](https://openreview.net/pdf?id=rJqFGTslg) 147 | - 2016 arXiv [Designing Energy-Efficient Convolutional Neural Networks using Energy-Aware Pruning](https://arxiv.org/abs/1611.05128) 148 | - 2016 arXiv [Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures](https://arxiv.org/abs/1607.03250) 149 | - 2016 NIPS [Learning the Number of Neurons in Deep Networks](https://rsu.forge.nicta.com.au/people/jalvarez/LNN/AlvarezSalzmannNIPS16.pdf) 150 | - 2016 NIPS [Learning Structured Sparsity in Deep Neural Networks](https://arxiv.org/abs/1608.03665) \[[code](https://github.com/wenwei202/caffe/tree/scnn)\] 151 | - 2016 NIPS [Dynamic Network Surgery for Efficient DNNs](https://arxiv.org/abs/1608.04493) 152 | - 2016 ECCV [Less is More: Towards Compact CNNs](https://static-content.springer.com/esm/chp%3A10.1007%2F978-3-319-46493-0_40/MediaObjects/419976_1_En_40_MOESM1_ESM.pdf) 153 | - 2016 CVPR [Fast ConvNets Using Group-wise Brain Damage](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Lebedev_Fast_ConvNets_Using_CVPR_2016_paper.pdf) 154 | - 2016 ICLR [Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding](http://arxiv.org/abs/1510.00149) 155 | - 2016 ICLR [Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications](http://arxiv.org/abs/1511.06530) 156 | - 2015 arXiv [Structured Pruning of Deep Convolutional Neural Networks](http://arxiv.org/abs/1512.08571) 157 | - 2015 IEEE Access [Channel-Level Acceleration of Deep Face Representations](http://ieeexplore.ieee.org/document/7303876/) 158 | - 2015 BMVC [Data-free parameter pruning for Deep Neural Networks](http://arxiv.org/abs/1507.06149) 159 | - 2015 ICML [Compressing Neural Networks with the Hashing Trick](http://jmlr.org/proceedings/papers/v37/chenc15.pdf) 160 | - 2015 ICCV [Deep Fried Convnets](http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Yang_Deep_Fried_Convnets_ICCV_2015_paper.pdf) 161 | - 2015 ICCV [An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections](http://felixyu.org/pdf/ICCV15_circulant.pdf) 162 | - 2015 NIPS [Learning both Weights and Connections for Efficient Neural Networks](http://arxiv.org/abs/1506.02626) 163 | - 2015 ICLR [FitNets: Hints for Thin Deep Nets](http://arxiv.org/pdf/1412.6550v4.pdf) 164 | - 2014 arXiv [Compressing Deep Convolutional Networks using Vector Quantization](http://arxiv.org/abs/1412.6115) 165 | - 2014 NIPSW [Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531) 166 | - 1995 ISANN 
[Evaluating Pruning Methods](http://publications.idiap.ch/downloads/papers/1995/thimm-pruning-hop.pdf) 167 | - 1993 T-NN [Pruning Algorithms--A Survey](http://axon.cs.byu.edu/~martinez/classes/678/Papers/Reed_PruningSurvey.pdf) 168 | - 1989 NIPS [Optimal Brain Damage](http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf) 169 | 170 | ## Transfer Learning 171 | - 2016 arXiv [What makes ImageNet good for transfer learning?](https://arxiv.org/abs/1608.08614) 172 | - 2014 NIPS [How transferable are features in deep neural networks?](https://arxiv.org/pdf/1411.1792v1.pdf) 173 | - 2014 CVPR [CNN Features off-the-shelf: an Astounding Baseline for Recognition](http://www.cv-foundation.org/openaccess/content_cvpr_workshops_2014/W15/papers/Razavian_CNN_Features_Off-the-Shelf_2014_CVPR_paper.pdf) 174 | - 2014 ICML [DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition](http://proceedings.mlr.press/v32/donahue14.pdf) 175 | 176 | ## Theory 177 | - 2018 ICLR [When is a Convolutional Filter Easy to Learn?](https://arxiv.org/pdf/1709.06129.pdf) 178 | - 2017 arXiv [Opening the black box of Deep Neural Networks via Information](https://arxiv.org/pdf/1703.00810.pdf) 179 | - 2017 ICML [On the Expressive Power of Deep Neural Networks](https://arxiv.org/pdf/1606.05336v6.pdf) 180 | - 2017 ICML [A Closer Look at Memorization in Deep Networks](https://arxiv.org/pdf/1706.05394.pdf) 181 | - 2017 ICML [An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis](https://arxiv.org/pdf/1703.00560.pdf) 182 | - 2016 NIPS [Exponential expressivity in deep neural networks through transient chaos](https://papers.nips.cc/paper/6322-exponential-expressivity-in-deep-neural-networks-through-transient-chaos.pdf) 183 | - 2016 arXiv [Understanding Deep Convolutional Networks](https://arxiv.org/pdf/1601.04920.pdf) 184 | - 2014 NIPS [On the number of linear regions of deep neural networks](http://papers.nips.cc/paper/5422-on-the-number-of-linear-regions-of-deep-neural-networks.pdf) 185 | - 2014 ICML [Provable Bounds for Learning Some Deep Representations](http://proceedings.mlr.press/v32/arora14.pdf) 186 | - 2014 ICLR [On the number of response regions of deep feed forward networks with piece-wise linear activations](https://arxiv.org/pdf/1312.6098.pdf) 187 | - 2014 ICLR [Revisiting natural gradient for deep networks](https://arxiv.org/pdf/1301.3584.pdf) 188 | 189 | ## 3D Data 190 | - 2017 NIPS [PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space](https://arxiv.org/pdf/1706.02413.pdf) 191 | - 2017 ICCV [Octree Generating Networks: Efficient Convolutional Architectures for High-resolution 3D Outputs](https://arxiv.org/pdf/1703.09438.pdf) 192 | - 2017 SIGGRAPH [O-CNN: Octree-based Convolutional Neural Network for Understanding 3D Shapes](http://wang-ps.github.io/O-CNN_files/CNN3D.pdf) 193 | - 2017 CVPR [PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation](https://arxiv.org/pdf/1612.00593.pdf) 194 | - 2017 CVPR [OctNet: Learning Deep 3D Representations at High Resolutions](https://arxiv.org/pdf/1611.05009.pdf) 195 | - 2016 NIPS [FPNN: Field Probing Neural Networks for 3D Data](https://papers.nips.cc/paper/6416-fpnn-field-probing-neural-networks-for-3d-data.pdf) 196 | - 2016 NIPS [Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling](https://jiajunwu.com/papers/3dgan_nips.pdf) 197 | - 2015 ICCV [Multi-view Convolutional Neural Networks for 3D Shape 
Recognition](http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Su_Multi-View_Convolutional_Neural_ICCV_2015_paper.pdf) 198 | - 2015 BMVC [Sparse 3D convolutional neural networks](http://www.bmva.org/bmvc/2015/papers/paper150/paper150.pdf) 199 | - 2015 CVPR [3D ShapeNets: A Deep Representation for Volumetric Shapes](http://3dshapenets.cs.princeton.edu/paper.pdf) 200 | 201 | ## Hardware 202 | - 2017 ISCA [In-Datacenter Performance Analysis of a Tensor Processing Unit](https://arxiv.org/pdf/1704.04760.pdf) (TPU) 203 | - 2017 ISVLSI [YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights](https://arxiv.org/pdf/1606.05487.pdf) 204 | - 2017 ASPLOS [SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing](https://arxiv.org/pdf/1611.05939.pdf) 205 | - 2017 FPGA [Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks](http://jaewoong.org/pubs/fpga17-next-generation-dnns.pdf) 206 | - 2015 NIPS Tutorial [High-Performance Hardware for Machine Learning](https://media.nips.cc/Conferences/2015/tutorialslides/Dally-NIPS-Tutorial-2015.pdf) 207 | -------------------------------------------------------------------------------- /dl_opt.md: -------------------------------------------------------------------------------- 1 | # Optimization for Deep Learning 2 | 3 | - [Generalization](#generalization) 4 | - [Loss Surface](#loss-surface) 5 | - [Batch Size](#batch-size) 6 | - [General](#general) 7 | - [Adaptive Gradient Methods](#adaptive-gradient-methods) 8 | - [Distributed Optimization](#distributed-optimization) 9 | - [Initialization](#initialization) 10 | - [Low Precision](#low-precision) 11 | - [Normalization](#normalization) 12 | - [Regularization](#regularization) 13 | - [Meta Learning](#meta-learning) 14 | 15 | ## Generalization 16 | - 2018 ICLR [Sensitivity and Generalization in Neural Networks: an Empirical Study](https://openreview.net/pdf?id=HJC2SzZCW) 17 | - 2018 arXiv [On Characterizing the Capacity of Neural Networks using Algebraic Topology](https://arxiv.org/pdf/1802.04443.pdf) 18 | - 2017 arXiv [Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data](https://arxiv.org/pdf/1703.11008.pdf) 19 | - 2017 NIPS [Exploring Generalization in Deep Learning](https://arxiv.org/pdf/1706.08947.pdf) 20 | - 2017 NIPS [Train longer, generalize better: closing the generalization gap in large batch training of neural networks](http://papers.nips.cc/paper/6770-train-longer-generalize-better-closing-the-generalization-gap-in-large-batch-training-of-neural-networks.pdf) 21 | - 2017 ICML [A Closer Look at Memorization in Deep Networks](https://arxiv.org/pdf/1706.05394.pdf) 22 | - 2017 ICLR [Understanding deep learning requires rethinking generalization](https://openreview.net/pdf?id=Sy8gdB9xx) 23 | 24 | ## Loss Surface 25 | - 2018 NIPS [Visualizing the Loss Landscape of Neural Nets](https://arxiv.org/pdf/1712.09913.pdf) 26 | - 2018 ICML [Essentially No Barriers in Neural Network Energy Landscape](https://arxiv.org/pdf/1803.00885.pdf) 27 | - 2018 arXiv [Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs](https://arxiv.org/abs/1802.10026) 28 | - 2018 ICML [Optimization Landscape and Expressivity of Deep CNNs](http://proceedings.mlr.press/v80/nguyen18a/nguyen18a.pdf) 29 | - 2018 ICLR [Measuring the Intrinsic Dimension of Objective Landscapes](https://arxiv.org/pdf/1804.08838.pdf) 30 | - 2017 ICML [The Loss Surface of Deep and Wide Neural 
Networks](https://arxiv.org/pdf/1704.08045.pdf) 31 | - 2017 ICML [Geometry of Neural Network Loss Surfaces via Random Matrix Theory](http://proceedings.mlr.press/v70/pennington17a/pennington17a.pdf) 32 | - 2017 ICML [Sharp Minima Can Generalize For Deep Nets](https://arxiv.org/pdf/1703.04933.pdf) 33 | - 2017 ICLR [Entropy-SGD: Biasing Gradient Descent Into Wide Valleys](https://arxiv.org/pdf/1611.01838.pdf) 34 | - 2017 ICLR [On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima](https://openreview.net/pdf?id=H1oyRlYgg) 35 | - 2017 arXiv [An empirical analysis of the optimization of deep network loss surfaces](https://arxiv.org/pdf/1612.04010.pdf) 36 | - 2016 ICMLW [Visualizing Deep Network Training Trajectories with PCA](https://icmlviz.github.io/icmlviz2016/assets/papers/24.pdf) 37 | - 2016 ICLRW [Stuck in a What? Adventures in Weight Space](https://arxiv.org/pdf/1602.07320.pdf) 38 | - 2015 ICLR [Qualitatively Characterizing Neural Network Optimization Problems](https://arxiv.org/pdf/1412.6544.pdf) 39 | - 2015 AISTATS [The Loss Surfaces of Multilayer Networks](http://www.jmlr.org/proceedings/papers/v38/choromanska15.pdf) 40 | - 2014 NIPS [Identifying and attacking the saddle point problem in high-dimensional non-convex optimization](http://papers.nips.cc/paper/5486-identifying-and-attacking-the-saddle-point-problem-in-high-dimensional-non-convex-optimization.pdf) 41 | 42 | ## Batch Size 43 | - 2018 NIPS [Hessian-based Analysis of Large Batch Training and Robustness to Adversaries](https://arxiv.org/pdf/1802.08241.pdf) 44 | - 2018 ICLR [Don't Decay the Learning Rate, Increase the Batch Size](https://arxiv.org/pdf/1711.00489.pdf) 45 | - 2017 arXiv [Scaling SGD Batch Size to 32K for ImageNet Training](https://arxiv.org/pdf/1708.03888.pdf) 46 | - 2017 arXiv [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677) 47 | - 2017 ICML [Sharp Minima Can Generalize For Deep Nets](https://arxiv.org/abs/1703.04933) 48 | - 2017 ICLR [On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima](https://openreview.net/pdf?id=H1oyRlYgg) 49 | 50 | 51 | ## General 52 | - 2016 ICML [Train faster, generalize better: Stability of stochastic gradient descent](http://proceedings.mlr.press/v48/hardt16.pdf) 53 | - 2016 arXiv [Optimization Methods for Large-Scale Machine Learning](https://arxiv.org/abs/1606.04838) 54 | - 2016 Blog [An overview of gradient descent optimization algorithms](http://sebastianruder.com/optimizing-gradient-descent/index.html) 55 | - 2015 DL Summer School [Non-Smooth, Non-Finite, and Non-Convex Optimization](http://www.iro.umontreal.ca/~memisevr/dlss2015/2015_DLSS_NonSmoothNonFiniteNonConvex.pdf) 56 | - 2015 NIPS [Training Very Deep Networks](http://papers.nips.cc/paper/5850-training-very-deep-networks.pdf) 57 | - 2015 AISTATS [Deeply-Supervised Nets](http://jmlr.org/proceedings/papers/v38/lee15a.pdf) 58 | - 2014 OSLW [On the Computational Complexity of Deep Learning](http://lear.inrialpes.fr/workshop/osl2015/slides/osl2015_shalev_shwartz.pdf) 59 | - 2011 ICML [On Optimization Methods for Deep Learning](http://ai.stanford.edu/~quocle/LeNgiCoaLahProNg11.pdf) 60 | - 2010 AISTATS [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf) 61 | 62 | ## Adaptive Gradient Methods 63 | - 2017 NIPS [The Marginal Value of Adaptive Gradient Methods in Machine Learning](https://arxiv.org/abs/1705.08292) 64 | - 2017 ICLR [SGDR: Stochastic Gradient 
Descent with Warm Restarts](https://openreview.net/pdf?id=Skq89Scxx) 65 | - 2015 ICLR [Adam: A Method for Stochastic Optimization](http://arxiv.org/abs/1412.6980) (Adam) 66 | - 2013 ICML [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/~fritz/absps/momentum.pdf) (NAG) 67 | - 2012 Lecture [RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) (RMSProp) 68 | - 2011 JMLR [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf) (Adagrad) 69 | 70 | ## Distributed Optimization 71 | - 2017 arXiv [Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour](https://arxiv.org/abs/1706.02677) 72 | - 2017 NIPS [TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning](https://arxiv.org/pdf/1705.07878.pdf) 73 | - 2017 NIPS [QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding](https://arxiv.org/pdf/1610.02132.pdf) (QSGD) 74 | - 2016 ICML [Training Neural Networks Without Gradients: A Scalable ADMM Approach](http://jmlr.org/proceedings/papers/v48/taylor16.pdf) 75 | - 2016 IJCAI [Staleness-aware Async-SGD for Distributed Deep Learning](http://www.ijcai.org/Proceedings/16/Papers/335.pdf) 76 | - 2016 ICLRW [Revisiting Distributed Synchronous SGD](http://arxiv.org/abs/1604.00981) 77 | - 2016 Thesis [Distributed Stochastic Optimization for Deep Learning](https://cs.nyu.edu/media/publications/zhang_sixin.pdf) (EASGD) 78 | - 2015 NIPS [Deep learning with Elastic Averaging SGD](https://www.cs.nyu.edu/~zsx/nips2015.pdf) (EASGD) 79 | - 2015 ICLR [Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging](http://arxiv.org/abs/1410.7455) 80 | 81 | ## Initialization 82 | - 2016 NIPS [Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity](http://papers.nips.cc/paper/6427-toward-deeper-understanding-of-neural-networks-the-power-of-initialization-and-a-dual-view-on-expressivity.pdf) 83 | - 2016 ICLR [All You Need is a Good Init](https://arxiv.org/pdf/1511.06422.pdf) 84 | - 2016 ICLR [Data-dependent Initializations of Convolutional Neural Networks](https://arxiv.org/pdf/1511.06856.pdf) 85 | - 2015 ICCV [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](http://research.microsoft.com/en-us/um/people/kahe/publications/iccv15imgnet.pdf) (MSRAinit) 86 | - 2014 ICLR [Exact solutions to the nonlinear dynamics of learning in deep linear neural networks](https://arxiv.org/pdf/1312.6120.pdf) 87 | - 2013 ICML [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/~fritz/absps/momentum.pdf) 88 | - 2010 AISTATS [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf) (Xavier initialization) 89 | 90 | 91 | ## Low Precision 92 | - 2017 arXiv [Gradient Descent for Spiking Neural Networks](https://arxiv.org/abs/1706.04698) 93 | - 2017 arXiv [Training Quantized Nets: A Deeper Understanding](https://arxiv.org/abs/1706.02379) 94 | - 2017 arXiv [TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning](https://arxiv.org/abs/1705.07878) 95 | - 2017 ICML [ZipML: Training Linear Models with End-to-End Low Precision](http://proceedings.mlr.press/v70/zhang17e/zhang17e.pdf) 96 | - 2016 arXiv [QSGD: 
Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks](https://arxiv.org/pdf/1610.02132.pdf) 97 | - 2015 NIPS [Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms](https://pdfs.semanticscholar.org/a1d2/1f6c8eef605bf132179daf717a232774b375.pdf) 98 | - 2013 arXiv [Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation](https://arxiv.org/pdf/1308.3432.pdf) 99 | 100 | ## Noise 101 | - 2015 arXiv [Adding Gradient Noise Improves Learning for Very Deep Networks](http://arxiv.org/abs/1511.06807) 102 | 103 | ## Normalization 104 | - 2017 arXiv [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515) 105 | - 2017 arXiv [Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models](https://arxiv.org/abs/1702.03275) 106 | - 2016 NIPS [Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks](https://arxiv.org/pdf/1602.07868.pdf) 107 | - 2016 NIPS [Layer Normalization](https://arxiv.org/pdf/1607.06450.pdf) 108 | - 2016 ICML [Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks](https://arxiv.org/pdf/1603.01431.pdf) 109 | - 2016 ICLR [Data-Dependent Path Normalization in Neural Networks](http://arxiv.org/pdf/1511.06747v4.pdf) 110 | - 2015 NIPS [Path-SGD: Path-Normalized Optimization in Deep Neural Networks](http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2015_5797.pdf) 111 | - 2015 ICML [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](http://jmlr.org/proceedings/papers/v37/ioffe15.pdf) 112 | 113 | ## Regularization 114 | - 2017 arXiv [L2 Regularization versus Batch and Weight Normalization](https://arxiv.org/abs/1706.05350) 115 | - 2014 JMLR [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf) (Dropout) 116 | 117 | ## Meta-Learning 118 | - 2017 ICML [Neural Optimizer Search with Reinforcement Learning](https://arxiv.org/pdf/1709.07417.pdf) 119 | - 2017 ICML [Learned Optimizers that Scale and Generalize](https://arxiv.org/pdf/1703.04813.pdf) 120 | - 2017 ICML [Learning to Learn without Gradient Descent by Gradient Descent](http://www.cantab.net/users/yutian.chen/Publications/ChenEtAl_ICML17_L2L.pdf) 121 | - 2017 ICLR [Learning to Optimize](https://openreview.net/pdf?id=ry4Vrt5gl) 122 | - 2016 arXiv [Learning to reinforcement learn](https://arxiv.org/abs/1611.05763) 123 | - 2016 NIPSW [Learning to Learn for Global Optimization of Black Box Functions](https://arxiv.org/abs/1611.03824) 124 | - 2016 NIPS [Learning to learn by gradient descent by gradient descent](https://arxiv.org/abs/1606.04474) 125 | - 2016 ICML [Meta-learning with memory-augmented neural networks](http://proceedings.mlr.press/v48/santoro16.pdf) 126 | 127 | ## Hyperparameter 128 | - 2015 ICML [Gradient-based hyperparameter optimization through reversible learning](https://www.robots.ox.ac.uk/~vgg/rg/papers/MaclaurinICML15.pdf) 129 | 130 | ## Bayesian Optimization 131 | - 2012 NIPS [Practical Bayesian Optimization of Machine Learning Algorithms](https://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.pdf) 132 | -------------------------------------------------------------------------------- /dl_sys.md: -------------------------------------------------------------------------------- 1 | # Deep Learning Systems 2 | - [General Frameworks](#general-frameworks) 3 | 
- [Specific System](#specific-system) 4 | - [Parallelization](#parallelization) 5 | 6 | ## General Frameworks 7 | - **[Caffe](http://caffe.berkeleyvision.org/)** 8 | 2015 [Large Scale Distributed Deep Learning on Hadoop Clusters](http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop) 9 | 2014 MM [Caffe: Convolutional Architecture for Fast Feature Embedding](http://arxiv.org/abs/1408.5093) 10 | 11 | - **[CNTK](https://www.cntk.ai/)** 12 | 2014 MSR-TR [An introduction to computational networks and the computational network toolkit](http://research.microsoft.com/apps/pubs/?id=226641) 13 | 2014 OSDI [Project Adam: Building an Efficient and Scalable Deep Learning Training System](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf) 14 | 15 | - **[MXNet](http://mxnet.dmlc.ml/en/latest/)** 16 | 2016 arXiv [Training Deep Nets with Sublinear Memory Cost](https://arxiv.org/abs/1604.06174) 17 | 2015 NIPSW [MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems](http://www.cs.cmu.edu/~muli/file/mxnet-learning-sys.pdf) [[GTC'16 Tutorial](http://www.cs.cmu.edu/~muli/file/mxnet_gtc16.pdf)] 18 | 2014 NIPSW [Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning](http://stanford.edu/~rezab/nips-2014workshop/submits/minerva.pdf) 19 | 2014 ICLR [Purine: A bi-graph based deep learning framework](http://arxiv.org/abs/1412.6249) 20 | 21 | - **[Neon](https://www.nervanasys.com/technology/neon/)** 22 | 2015 arXiv [Fast Algorithms for Convolutional Neural Networks](http://arxiv.org/abs/1509.09308) (Winograd) [[Blog]](http://www.nervanasys.com/winograd/) 23 | 24 | - **[Paddle](http://www.paddlepaddle.org/)** 25 | 26 | - **[PyTorch](http://pytorch.org/)** 27 | 28 | - **[Spark](https://github.com/amplab/SparkNet)** 29 | 2016 arXiv [SparkNet: Training Deep Networks in Spark](http://arxiv.org/abs/1511.06051) 30 | 2016 arXiv [DeepSpark: A Spark-Based Distributed Deep Learning Framework for Commodity Clusters](https://arxiv.org/abs/1602.08191) 31 | 32 | - **[TensorFlow](https://www.tensorflow.org/)** 33 | 2016 OSDI [TensorFlow: A system for large-scale machine learning](http://arxiv.org/abs/1605.08695) 34 | 2015 [TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems](http://download.tensorflow.org/paper/whitepaper2015.pdf) [[slides 1]](http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn-2015.pdf) [[slides 2]](http://vision.stanford.edu/teaching/cs231n/slides/jon_talk.pdf) 35 | 2014 NIPSW [Techniques and Systems for Training Large Neural Networks Quickly](http://stanford.edu/~rezab/nips2014workshop/slides/jeff.pdf) 36 | 2012 NIPS [Large Scale Distributed Deep Networks](http://static.googleusercontent.com/media/research.google.com/en/us/archive/large_deep_networks_nips2012.pdf) (DistBelief) 37 | 38 | - **[Theano](http://deeplearning.net/software/theano/)** 39 | 2016 arXiv [Theano: A Python framework for fast computation of mathematical expressions](http://arxiv.org/abs/1605.02688) 40 | 41 | - **[Torch](http://torch.ch/)** 42 | 2016 NIPSW [Torchnet: An Open-Source Platform for (Deep) Learning Research](https://lvdmaaten.github.io/publications/papers/Torchnet_2016.pdf) 43 | 2011 NIPSW [Torch7: A Matlab-like Environment for Machine Learning](http://cs.nyu.edu/~koray/files/2011_torch7_nipsw.pdf) 44 | 45 | ## Specific System 46 | - 2016 ICML [Deep Speech 2: End-to-End Speech Recognition in English and 
Mandarin](http://arxiv.org/abs/1512.02595) 47 | - 2015 ICMLW [Massively Parallel Methods for Deep Reinforcement Learning](https://arxiv.org/abs/1507.04296) 48 | - 2015 arXiv [Deep Image: Scaling up Image Recognition](http://arxiv.org/abs/1501.02876) 49 | - 2013 ICML [Deep learning with COTS HPC systems](http://jmlr.org/proceedings/papers/v28/coates13.pdf) 50 | 51 | ## Parallelization 52 | - 2015 Intel [Single Node Caffe Scoring and Training on Intel® Xeon E5-Series Processors](https://software.intel.com/en-us/articles/single-node-caffe-scoring-and-training-on-intel-xeon-e5-series-processors) 53 | - 2015 arXiv [Caffe con Troll: Shallow Ideas to Speed Up Deep Learning](http://arxiv.org/abs/1504.04343) 54 | - 2015 ICMLW [Massively Parallel Methods for Deep Reinforcement Learning](https://arxiv.org/abs/1507.04296) 55 | - 2015 arXiv [Convolutional Neural Networks at Constrained Time Cost](http://arxiv.org/pdf/1412.1710v1.pdf) 56 | - 2014 arXiv [One weird trick for parallelizing convolutional neural networks](http://arxiv.org/pdf/1404.5997v2.pdf) 57 | - 2014 NIPS [On the Computational Efficiency of Training Neural Networks](http://papers.nips.cc/paper/5267-on-the-computational-efficiency-of-training-neural-networks.pdf) 58 | - 2011 NIPSW [Improving the speed of neural networks on CPUs](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/37631.pdf) 59 | -------------------------------------------------------------------------------- /graph.md: -------------------------------------------------------------------------------- 1 | # Large-Scale Graph Computation 2 | 3 | 4 | - 2015 CIKM [HDRF: Stream-Based Partitioning for Power-Law Graphs](http://www.fabiopetroni.com/Download/petroni2015HDRF.pdf) 5 | - 2015 VLDB [One Trillion Edges: Graph Processing at Facebook-Scale](http://www.vldb.org/pvldb/vol8/p1804-ching.pdf) 6 | - 2015 VLDB [A Scalable Distributed Graph Partitioner](http://www.eecs.harvard.edu/~dmargo/pubs/vldb15-paper.pdf) 7 | - 2015 SOSP [Chaos: Scale-out Graph Processing from Secondary Storage](http://sigops.org/sosp/sosp15/current/2015-Monterey/printable/089-roy.pdf) 8 | - 2015 EuroSys [PowerLyra: Differentiated Graph Computation and Partitioning 9 | on Skewed Graphs](http://ipads.se.sjtu.edu.cn/projects/powerlyra/powerlyra-eurosys-final.pdf) 10 | - 2014 VLDB [Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation](http://www.vldb.org/pvldb/vol8/p281-lu.pdf) 11 | - 2014 VLDB [An Experimental Comparison of Pregel-like Graph Processing Systems](http://www.vldb.org/pvldb/vol7/p1047-han.pdf) 12 | - 2014 OSDI [GraphX: Graph Processing in a Distributed Dataflow Framework](https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-gonzalez.pdf) 13 | - 2013 SOSP [X-Stream: Edge-centric Graph Processing using Streaming Partitions](http://infoscience.epfl.ch/record/188535/files/paper.pdf) 14 | - 2012 OSDI [PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs](https://www.usenix.org/system/files/conference/osdi12/osdi12-final-167.pdf) 15 | - 2012 OSDI [GraphChi: Large-Scale Graph Computation on Just a PC](http://select.cs.cmu.edu/publications/paperdir/osdi2012-kyrola-blelloch-guestrin.pdf) 16 | - 2010 SIGMOD [Pregel: A System for Large-Scale Graph 
Processing](https://kowshik.github.io/JPregel/pregel_paper.pdf) 17 | - 2010 UAI [GraphLab: A New Framework For Parallel Machine Learning](http://www.select.cs.cmu.edu/publications/paperdir/uai2010-low-gonzalez-kyrola-bickson-guestrin-hellerstein.pdf) 18 | -------------------------------------------------------------------------------- /matrix_fact.md: -------------------------------------------------------------------------------- 1 | # Matrix Factorization 2 | 3 | 4 | - 2016 WSDM [DiFacto — Distributed Factorization Machines](http://www.cs.cmu.edu/~yuxiangw/docs/fm.pdf) 5 | - 2015 CIKM [HDRF: Stream-Based Partitioning for Power-Law Graphs](http://www.fabiopetroni.com/Download/petroni2015HDRF.pdf) 6 | - 2015 KDD [Fast and Robust Parallel SGD Matrix Factorization](http://dm.postech.ac.kr/MLGF-MF/fp352.pdf) (MLGF) 7 | - 2015 Facebook Eng [Recommending items to more than a billion people](https://code.facebook.com/posts/861999383875667/recommending-items-to-more-than-a-billion-people/) 8 | - 2015 arXiv [Generalized Low Rank Models](https://web.stanford.edu/~boyd/papers/pdf/glrm.pdf) 9 | - 2015 RecSys [Fast Differentially Private Matrix Factorization](http://arxiv.org/pdf/1505.01419v2.pdf) 10 | - 2015 arXiv [Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems](http://arxiv.org/abs/1411.1134) 11 | - 2015 ICML [PU Learning for Matrix Completion](http://arxiv.org/pdf/1411.6081v1.pdf) 12 | - 2014 JMLR workshop [A Fast Distributed Stochastic Gradient Descent Algorithm for Matrix Factorization](http://www.jmlr.org/proceedings/papers/v36/li14.pdf) 13 | - 2014 NIPSW [Elastic Distributed Bayesian Collaborative Filtering](http://stanford.edu/~rezab/nips-2014workshop/submits/distbayes.pdf) 14 | - 2014 NIPSW [Factorbird - a Parameter Server Approach to Distributed Matrix Factorization](http://stanford.edu/~rezab/papers/factorbird.pdf) (Factorbird) 15 | - 2014 RecSys [GASGD: Stochastic Gradient Descent for Distributed Asynchronous Matrix Completion via Graph Partitioning](http://dl.acm.org/citation.cfm?id=2645725) (GASGD) 16 | - 2014 VLDB [NOMAD: Non-locking, stOchastic Multi-machine algorithm for Asynchronous and Decentralized matrix completion](http://www.vldb.org/pvldb/vol7/p975-yun.pdf) (NOMAD) 17 | - 2013 WWW [Distributed Large-Scale Natural Graph Factorization](http://www.di.ens.fr/~shervashidze/papers/Ahmedetal13.pdf) 18 | - 2013 EDBT [Sparkler: Supporting Large-Scale Matrix Factorization](http://people.cs.umass.edu/~boduo/publications/2013EDBT-sparkler.pdf) 19 | - 2013 RecSys [A fast parallel SGD for matrix factorization in shared memory systems](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/libmf.pdf) (FPSGD) 20 | - 2012 ICDM [Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems](http://www.cs.utexas.edu/~rofuyu/papers/icdm-pmf.pdf) 21 | - 2012 ICDM [Distributed Matrix Completion](https://people.mpi-inf.mpg.de/~rgemulla/publications/teflioudi12completion.pdf) (DSGD++) 22 | - 2011 KDD [Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent](https://people.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf) (DSGD) 23 | - 2009 JMLR [Scalable Collaborative Filtering Approaches for Large Recommender Systems](http://www.jmlr.org/papers/volume10/takacs09a/takacs09a.pdf) 24 | 25 | ##### ALS 26 | - 2008 AAIM [Large-scale parallel collaborative filtering 27 | for the Netflix prize](http://www.grappa.univ-lille3.fr/~mary/cours/stats/centrale/reco/paper/MatrixFactorizationALS.pdf) 28 | 29 | ##### Nyström 
method 30 | - 2015 JMLR [Distributed Matrix Completion and Robust Factorization](http://web.stanford.edu/~lmackey/papers/dmcrf-jmlr15.pdf) 31 | - 2011 NIPS [Divide-and-Conquer Matrix Factorization](http://papers.nips.cc/paper/4486-divide-and-conquer-matrix-factorization.pdf) 32 | 33 | ##### NMF 34 | - 2011 KDD [Fast Coordinate Descent Methods with Variable Selection 35 | for Non-negative Matrix Factorization](http://www.cs.utexas.edu/users/inderjit/public_papers/nmf_kdd11.pdf) 36 | - 2010 WWW [Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce](http://research.microsoft.com/pubs/119077/DNMF.pdf) 37 | --------------------------------------------------------------------------------