├── .gitignore
├── README.md
└── notes
    └── interpret-cnn-compress.md

/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
.idea/*

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
This is a collection of papers on reducing model size and on ASIC/FPGA accelerators for machine learning, especially for deep-neural-network applications. (Inspired by [Embedded-Neural-Network](https://github.com/ZhishengWang/Embedded-Neural-Network).)

You can use the following materials as your entry point:
* [Efficient Processing of Deep Neural Networks: A Tutorial and Survey](https://arxiv.org/abs/1703.09039)
* the related work section of [Quantized Neural Networks](https://arxiv.org/abs/1609.07061)

# Terminologies

- **Structural pruning (compression)**: compress CNNs by removing "less important" filters.


# Network Compression

## Reduce Precision
[Deep neural networks are robust to weight binarization and other non-linear distortions](https://arxiv.org/abs/1606.01981) showed that DNNs can be robust to more than just weight binarization.
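
As a concrete illustration, here is a minimal NumPy sketch (not taken from any of the papers above) of binarizing a weight tensor to two values scaled by its mean absolute value, in the spirit of BinaryConnect and XNOR-Net:

```python
import numpy as np

def binarize(weights):
    """Constrain weights to {-alpha, +alpha} with a per-tensor scaling factor."""
    alpha = np.abs(weights).mean()   # scaling factor, XNOR-Net-style
    return alpha * np.sign(weights)

w = np.random.randn(256, 256).astype(np.float32)
w_bin = binarize(w)
rel_err = np.linalg.norm(w - w_bin) / np.linalg.norm(w)
print(f"distinct values: {np.unique(w_bin).size}, relative error: {rel_err:.2f}")
```

In practice, the training-oriented papers below keep full-precision weights for the gradient updates and only binarize them for the forward and backward passes.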

### Linear Quantization
* Fixed point
    * [1502]. [Deep Learning with Limited Numerical Precision](https://arxiv.org/abs/1502.02551)
    * [1610]. [QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks](https://arxiv.org/abs/1610.02132)
* Dynamic fixed point
    * [1412]. [Training deep neural networks with low precision multiplications](https://arxiv.org/abs/1412.7024)
    * [1604]. [Hardware-oriented approximation of convolutional neural networks](https://arxiv.org/abs/1604.03168)
    * [1608]. [Scalable and modularized RTL compilation of convolutional neural networks onto FPGA](http://ieeexplore.ieee.org/document/7577356/)
* Binary Quantization
    * Theory proof (EBP)
        * [1405]. [Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights](https://papers.nips.cc/paper/5269-expectation-backpropagation-parameter-free-training-of-multilayer-neural-networks-with-continuous-or-discrete-weights.pdf)
        * [1503]. [Training Binary Multilayer Neural Networks for Image Classification using Expectation Backpropagation](https://arxiv.org/abs/1503.03562)
        * [1505]. [Backpropagation for Energy-Efficient Neuromorphic Computing](https://papers.nips.cc/paper/5862-backpropagation-for-energy-efficient-neuromorphic-computing)
    * More practice with 1 bit
        * [1511]. [BinaryConnect: Training Deep Neural Networks with binary weights during propagations](https://arxiv.org/abs/1511.00363)
        * [1510]. [Neural Networks with Few Multiplications](https://arxiv.org/abs/1510.03009)
        * [1601]. [Bitwise Neural Networks](https://arxiv.org/abs/1601.06071)
        * [1602]. [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830)
        * [1603]. [XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks](https://arxiv.org/abs/1603.05279)
    * XNOR-Net-style approaches with slightly more bits (1~2 bits)
        * [1606]. [DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients](https://arxiv.org/abs/1606.06160)
        * [1608]. [Recurrent Neural Networks With Limited Numerical Precision](https://arxiv.org/abs/1608.06902)
        * [1609]. [Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations](https://arxiv.org/abs/1609.07061). (Text overlap with Binarized Neural Networks.)
        * [1702]. [Deep Learning with Low Precision by Half-wave Gaussian Quantization](https://arxiv.org/abs/1702.00953)
* Ternary Quantization
    * [1410]. [Fixed-point feedforward deep neural network design using weights +1, 0, and -1](http://ieeexplore.ieee.org/document/6986082/)
    * [1605]. [Ternary Weight Networks](https://arxiv.org/abs/1605.04711)
    * [1612]. [Trained Ternary Quantization](https://arxiv.org/abs/1612.01064)
* Other quantization approaches
    * [1412]. [Compressing Deep Convolutional Networks using Vector Quantization](https://arxiv.org/abs/1412.6115)
    * 1-Bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs.
    * Towards the Limit of Network Quantization.
    * Loss-aware Binarization of Deep Networks.


### Non-linear Quantization
* Log Domain Quantization
    * [1603]. [Convolutional neural networks using logarithmic data representation](https://arxiv.org/abs/1603.01025)
    * [1609]. [LogNet: Energy-efficient neural networks using logarithmic computation](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7953288)
    * [1702]. [Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights](https://arxiv.org/abs/1702.03044)
* Parameter Sharing
    * Structured Matrices
        * Structured Convolution Matrices for Energy-efficient Deep learning.
        * Structured Transforms for Small-Footprint Deep Learning.
        * An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections.
        * Theoretical Properties for Neural Networks with Weight Matrices of Low Displacement Rank.
    * Hashing
        * [1504]. [Compressing neural networks with the hashing trick](https://arxiv.org/abs/1504.04788)
        * Functional Hashing for Compressing Neural Networks.
        * [1510]. [Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding](https://arxiv.org/abs/1510.00149)
    * Learning compact recurrent neural networks.


## Reduce Number of Operations and Model Size
### Exploiting Activation Statistics
* To be updated.


### Network Pruning
Network pruning: a large fraction of the weights in a network is redundant and can be removed (i.e., set to zero) with little loss in accuracy.
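
A minimal sketch of this idea (assuming NumPy; the saliency measure here is simply weight magnitude, in the spirit of Han et al.'s pruning step, and the prune/retrain loop is omitted):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold          # keep only large-magnitude weights
    return weights * mask, mask

w = np.random.randn(512, 512).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print(f"fraction of weights kept: {mask.mean():.2f}")   # ~0.10
```

In the papers below, pruning is usually followed by fine-tuning (or an iterative prune/retrain loop) to recover accuracy, and the surviving sparse weights then need dedicated sparse formats or hardware (e.g., EIE, SCNN) to turn sparsity into real speedups.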

* Remove low-saliency weights
    * [9006]. [Optimal Brain Damage](http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf)
    * [1506]. [Learning both weights and connections for efficient neural network](https://arxiv.org/abs/1506.02626)
* Energy-based pruning
    * [1611]. [Designing energy-efficient convolutional neural networks using energy-aware pruning](https://arxiv.org/abs/1611.05128)
* Process sparse weights
    * [1402]. [A scalable sparse matrix-vector multiplication kernel for energy-efficient sparse-blas on FPGAs](https://dl.acm.org/citation.cfm?id=2554785)
    * [1510]. [Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding](https://arxiv.org/abs/1510.00149)
    * [1602]. [EIE: Efficient Inference Engine on Compressed Deep Neural Network](https://arxiv.org/abs/1602.01528)
    * [1705]. [SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks](https://arxiv.org/abs/1708.04485)
    * [1710]. [Efficient Methods and Hardware for Deep Learning, Ph.D. Thesis](https://purl.stanford.edu/qf934gh3708)
* Structured pruning
    * [1512]. [Structured Pruning of Deep Convolutional Neural Networks](https://arxiv.org/abs/1512.08571)
    * [1608]. [Learning Structured Sparsity in Deep Neural Networks](https://arxiv.org/abs/1608.03665)
    * [1705]. [Exploring the Regularity of Sparse Structure in Convolutional Neural Networks](https://arxiv.org/abs/1705.08922)

### Bayesian Network Pruning
- [1711]. Interpreting Convolutional Neural Networks Through Compression - [[notes](notes/interpret-cnn-compress.md)][[arXiv](https://arxiv.org/abs/1711.02329)]
- [1705]. Structural compression of convolutional neural networks based on greedy filter pruning - [[notes](notes/interpret-cnn-compress.md)][[arXiv](https://arxiv.org/abs/1705.07356)]


### Compact Network Architectures
* Before Training
    * Use 1x1 convolutional layers to reduce the number of channels (see the sketch after this section):
        * [1512]. [Rethinking the inception architecture for computer vision](https://arxiv.org/abs/1512.00567)
        * [1610]. [Xception: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/abs/1610.02357)
        * [1704]. [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications](https://arxiv.org/abs/1704.04861)
    * Bottleneck:
        * [1312]. [Network in network](https://arxiv.org/abs/1312.4400)
        * [1409]. [Going deeper with convolutions](https://arxiv.org/abs/1409.4842)
        * [1512]. [Deep residual learning for image recognition](https://arxiv.org/abs/1512.03385)
        * [1602]. [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size](https://arxiv.org/abs/1602.07360)
* After Training
    * Canonical Polyadic (CP) decomposition
        * [1404]. [Exploiting linear structure within convolutional networks for efficient evaluation](https://arxiv.org/abs/1404.0736)
        * [1412]. [Speeding-up convolutional neural networks using fine-tuned cp-decomposition](https://arxiv.org/abs/1412.6553)
    * Tucker decomposition
        * [1511]. [Compression of deep convolutional neural networks for fast and low power mobile applications](https://arxiv.org/abs/1511.06530)
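
The channel-reduction trick above can be made concrete with a small sketch (assuming PyTorch; the layer sizes are illustrative, not from any specific paper): a 1x1 convolution first shrinks the number of channels so that the expensive 3x3 convolution operates on a much smaller tensor, as in Inception/ResNet-style bottleneck blocks.

```python
import torch
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

# Direct 3x3 convolution on 256 channels.
plain = nn.Conv2d(256, 256, kernel_size=3, padding=1)

# Bottleneck: 1x1 reduce -> 3x3 on fewer channels -> 1x1 expand.
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.Conv2d(64, 256, kernel_size=1),
)

x = torch.randn(1, 256, 32, 32)
assert plain(x).shape == bottleneck(x).shape        # same input/output shape
print(n_params(plain), "vs", n_params(bottleneck))  # roughly 590k vs 70k parameters
```

Depthwise separable convolutions (Xception, MobileNets) push the same idea further by splitting the 3x3 spatial filtering and the cross-channel 1x1 mixing into two separate layers.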

### Knowledge Distillation
* [0600]. [Model compression](https://www.cs.cornell.edu/~caruana/compression.kdd06.pdf)
* [1312]. [Do deep nets really need to be deep?](https://arxiv.org/abs/1312.6184)
* [1412]. [Fitnets: Hints for thin deep nets](https://arxiv.org/abs/1412.6550)
* [1503]. [Distilling the knowledge in a neural network](https://arxiv.org/abs/1503.02531)
* Sequence-Level Knowledge Distillation.
* Like What You Like: Knowledge Distill via Neuron Selectivity Transfer.


# A Bit of Hardware
* [1402]. [Computing's Energy Problem (and what we can do about it)](http://ieeexplore.ieee.org/document/6757323/)

# Contributors
- [Tao Lin](https://github.com/IamTao)
- [Jun Lu](https://github.com/junlulocky)

--------------------------------------------------------------------------------
/notes/interpret-cnn-compress.md:
--------------------------------------------------------------------------------
## 1. Structural compression of convolutional neural networks based on greedy filter pruning - [[arXiv](https://arxiv.org/abs/1705.07356)]

## 2. Interpreting Convolutional Neural Networks Through Compression - [[arXiv](https://arxiv.org/abs/1711.02329)]

The authors propose classification accuracy reduction (CAR) as a filter-importance measure. In CAR-based structural compression, the filter whose removal reduces classification accuracy the least is pruned at each iteration. Afterwards, a fine-tuning pass is needed to recover accuracy.
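
A minimal sketch of one greedy CAR iteration (assuming PyTorch; this is not the authors' code, `model`, `conv_name`, and `evaluate` are hypothetical placeholders for the user's network, the targeted convolutional layer, and a validation-accuracy function, and filters are only zeroed out rather than physically removed):

```python
import copy
import torch

def car_prune_one_filter(model, conv_name, evaluate):
    """Greedily zero out the filter whose removal hurts validation accuracy the least."""
    conv = dict(model.named_modules())[conv_name]
    best_idx, best_acc = None, float("-inf")
    for i in range(conv.out_channels):
        candidate = copy.deepcopy(model)
        cand_conv = dict(candidate.named_modules())[conv_name]
        with torch.no_grad():
            cand_conv.weight[i].zero_()          # drop filter i
            if cand_conv.bias is not None:
                cand_conv.bias[i].zero_()
        acc = evaluate(candidate)                # accuracy without filter i
        if acc > best_acc:                       # smallest accuracy reduction wins
            best_idx, best_acc = i, acc
    with torch.no_grad():
        conv.weight[best_idx].zero_()
        if conv.bias is not None:
            conv.bias[best_idx].zero_()
    return best_idx, best_acc
```

Repeating this step, with fine-tuning in between, is roughly the greedy filter-pruning loop these two papers describe.

--------------------------------------------------------------------------------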