├── LICENSE
├── Awesome-Quantization-Papers
│   └── Awesome_Quantization_Papers.csv
└── README.md

/LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Zhen Dong 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Awesome-Quantization-Papers/Awesome_Quantization_Papers.csv: -------------------------------------------------------------------------------- 1 | Title,Publication,Bit,Quantizer,Finetune,Task,Special 2 | Fully integer-based quantization for mobile convolutional neural network inference,Neurocomputing 2021,,,,, 3 | S^3: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks,NeurIPS 2021,T,,,C, 4 | BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer,NeurIPS 2021,MP,,QAT,C, 5 | CBP: backpropagation with constraint on weight precision using a pseudo-Lagrange multiplier method[PyTorch],NeurIPS 2021,B/T/Uni,,QAT,C, 6 | "Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals",NeurIPS 2021,B/Uni,,,C, 7 | Learning Frequency Domain Approximation for Binary Neural Networks,NeurIPS 2021,B,,QAT,C, 8 | Post-Training Quantization for Vision Transformer,NeurIPS 2021,Uni,,PTQ,C, 9 | Post-Training Sparsity-Aware Quantization[PyTorch]:star:5,NeurIPS 2021,Uni,,PTQ,C, 10 | Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples[PyTorch]:star:2,NeurIPS 2021,Uni,,,C, 11 | Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes[PyTorch]:star:2,NeurIPS 2021,Uni,,QAT,C, 12 | QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning,NeurIPS 2021,B/Uni,,QAT,C, 13 | VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization,NeurIPS 2021,MP,PQ,QAT,Node Classification, 14 | Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression,EMNLP 2021,Uni,,QAT/PTQ,N, 15 | Compressing Word Embeddings via Deep Compositional Code Learning:fire:89[PyTorch]:star:81,EMNLP 2021,MP,,,N, 16 | Matching-oriented Embedding Quantization For Ad-hoc Retrieval[PyTorch],EMNLP 2021,MP,PQ,QAT,N, 17 | Understanding and Overcoming the Challenges of Efficient Transformer Quantization[PyTorch]:star:9,EMNLP 2021,Uni,,QAT/PTQ,N, 18 | Fully Quantized Image Super-Resolution Networks,MM 2021,,,,, 19 | VQMG: Hierarchical 
Vector Quantised and Multi-hops Graph Reasoning for Explicit Representation Learning,MM 2021,,,,, 20 | Cluster-Promoting Quantization With Bit-Drop for Minimizing Network Quantization Loss,ICCV 2021,T/Uni,LQ,,C, 21 | Distance-Aware Quantization,ICCV 2021,B/T/Uni,LQ,QAT,C, 22 | Dynamic Network Quantization for Efficient Video Inference,ICCV 2021,MP,,,Video Recognition, 23 | Generalizable Mixed-Precision Quantization via Attribution Rank Preservation[PyTorch]:star:15,ICCV 2021,MP,LQ,QAT,C, 24 | Improving Low-Precision Network Quantization via Bin Regularization,ICCV 2021,B/T/Uni,LQ,QAT,C, 25 | Improving Neural Network Efficiency via Post-Training Quantization With Adaptive Floating-Point[PyTorch],ICCV 2021,MP,Linear,PTQ,C, 26 | Integer-Arithmetic-Only Certified Robustness for Quantized Neural Networks,ICCV 2021,Uni,Linear,QAT,C, 27 | MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing,ICCV 2021,Uni,Linear,QAT/PTQ,C/O, 28 | Once Quantization-Aware Training: High Performance Extremely Low-Bit Architecture Search[PyTorch]:star:19,ICCV 2021,Uni,,QAT,C, 29 | Product Quantizer Aware Inverted Index for Scalable Nearest Neighbor Search,ICCV 2021,MP,PQ,QAT,C, 30 | RMSMP: A Novel Deep Neural Network Quantization Framework With Row-Wise Mixed Schemes and Multiple Precisions,ICCV 2021,MP,,QAT/PTQ,C/N, 31 | ReCU: Reviving the Dead Weights in Binary Neural Networks,ICCV 2021,B,,,C, 32 | Self-Supervised Product Quantization for Deep Unsupervised Image Retrieval[PyTorch]:star:22,ICCV 2021,Uni,PQ,QAT,C, 33 | Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks[PyTorch]:star:7,ICCV 2021,B,,,C, 34 | Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization,ICCV 2021,MP,Linear,PTQ,C, 35 | 1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed,ICML 2021,B,,,C/N, 36 | Accurate Post Training Quantization With Small Calibration Sets[PyTorch]:star:14,ICML 2021,MP,,PTQ,C/N, 37 | ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training,ICML 2021,MP,,,C, 38 | Communication-Efficient Distributed Optimization with Quantized Preconditioners,ICML 2021,,PQ,,, 39 | Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution,ICML 2021,MP,,QAT,C, 40 | Double-Win Quant: Aggressively Winning Robustness of Quantized Deep Neural Networks via Random Precision Training and Inference[PyTorch]:star:3,ICML 2021,MP,,,C, 41 | Estimation and Quantization of Expected Persistence Diagrams,ICML 2021,,,,, 42 | HAWQ-V3: Dyadic Neural Network Quantization[PyTorch]:star:193,ICML 2021,MP,Linear,,C, 43 | I-BERT: Integer-only BERT Quantization,ICML 2021,Uni,,,N, 44 | Quantization Algorithms for Random Fourier Features,ICML 2021,,,,, 45 | Soft then Hard: Rethinking the Quantization in Neural Image Compression,ICML 2021,,,,, 46 | Training Quantized Neural Networks to Global Optimality via Semidefinite Programming,ICML 2021,,,,, 47 | Vector Quantized Models for Planning,ICML 2021,,,,, 48 | Distribution-aware Adaptive Multi-bit Quantization,CVPR 2021,,,,, 49 | Layer importance estimation with imprinting for neural network quantization,CVPR 2021,,,,, 50 | Adaptive binary-ternary quantization,CVPR 2021,,,,, 51 | AQD: Towards Accurate Quantized Object Detection[PyTorch]:star:11,CVPR 2021,,,,, 52 | Automated Log-Scale Quantization for Low-Cost Deep Neural Networks,CVPR 2021,T,Log,QAT,C/S, 53 | Binary TTC: A Temporal Geofence for Autonomous Navigation,CVPR 2021,,,,, 54 | Diversifying Sample 
Generation for Accurate Data-Free Quantization,CVPR 2021,,,,, 55 | Generative Zero-shot Network Quantization,CVPR 2021,Uni,OptN,QAT,C, 56 | Improving Accuracy of Binary Neural Networks using Unbalanced Activation Distribution,CVPR 2021,,,,, 57 | Is In-Domain Data Really Needed? A Pilot Study on Cross-Domain Calibration for Network Quantization,CVPR 2021,Uni,OptN,PTQ,C/O/S, 58 | Learnable Companding Quantization for Accurate Low-Bit Neural Networks,CVPR 2021,,,,, 59 | Network Quantization With Element-Wise Gradient Scaling,CVPR 2021,,,,, 60 | Optimal Quantization Using Scaled Codebook,CVPR 2021,,,,, 61 | "Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks[PyTorch]:star:109",CVPR 2021,,,,, 62 | QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks,CVPR 2021,Uni,LQ,PTQ,C/O/S, 63 | Zero-shot Adversarial Quantization,CVPR 2021,Uni,Linear,PTQ,C/O, 64 | iVPF: Numerical Invertible Volume Preserving Flow for Efficient Lossless Compression,CVPR 2021,,,,, 65 | HAO: Hardware-aware neural Architecture Optimization for Efficient Inference,FCCM 2021,,,,, 66 | Bipointnet: Binary neural network for point clouds,ICLR 2021,,,,, 67 | Sparse quantized spectral clustering,ICLR 2021,,,,, 68 | Neural gradients are near-lognormal: improved quantized and sparse training,ICLR 2021,,,,, 69 | Multiprize lottery ticket hypothesis: Finding accurate binary neural networks by pruning a randomly weighted network,ICLR 2021,,,,, 70 | High-capacity expert binary networks,ICLR 2021,,,,, 71 | BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction[PyTorch]:star:58,ICLR 2021,Uni/MP,Linear,PTQ,C/O, 72 | BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization[PyTorch]:star:9,ICLR 2021,MP,Linear,QAT,C, 73 | Degree-Quant: Quantization-Aware Training for Graph Neural Networks[PyTorch]:star:17,ICLR 2021,Uni,Linear,QAT,C, 74 | Incremental few-shot learning via vector quantization in deep embedded space,ICLR 2021,,,,, 75 | Simple Augmentation Goes a Long Way: ADRL for DNN Quantization,ICLR 2021,MP,Linear,QAT,C, 76 | Training with Quantization Noise for Extreme Model Compression,ICLR 2021,Uni,PQ,QAT,C, 77 | CoDeNet: Algorithm-hardware Co-design for Deformable Convolution,FPGA 2021,,,,, 78 | A Survey of Quantization Methods for Efficient Neural Network Inference:fire:47,BLPCV 2021,,,,, 79 | Opq: Compressing deep neural networks with one-shot pruning quantization,AAAI 2021,,,,, 80 | A white paper on neural network quantization,arXiv 2021,,,,, 81 | Kdlsq-bert: A quantized bert combining knowledge distillation with learned step size quantization,arXiv 2021,,,,, 82 | Pruning and quantization for deep neural network acceleration: A survey,arXiv 2021,,,,, 83 | Confounding tradeoffs for neural network quantization,arXiv 2021,,,,, 84 | Dynamic precision analog computing for neural networks,arXiv 2021,,,,, 85 | Boolnet: Minimizing the energy consumption of binary neural networks,arXiv 2021,,,,, 86 | Quantization-aware pruning for efficient low latency neural network inference,arXiv 2021,,,,, 87 | Any-Precision Deep Neural Networks[PyTorch]:star:33,arXiv 2021,MP,,,, 88 | Phoenix: A Low-Precision Floating-Point Quantization Oriented Architecture for Convolutional Neural Networks,TVLSI 2020,,,,, 89 | Hierarchical Binary CNNs for Landmark Localization with Limited Resources,TPAMI 2020,B,,,, 90 | Deep Neural Network Compression by In-Parallel Pruning-Quantization,TPAMI 2020,,,,, 91 | Towards Efficient U-Nets: A Coupled and Quantized Approach,TPAMI 2020,,,,, 92 
| SIMBA: A Skyrmionic In-Memory Binary Neural Network Accelerator,TMAG 2020,B,,,, 93 | Design of High Robustness BNN Inference Accelerator Based on Binary Memristors,TED 2020,B,,,, 94 | A Resource-Efficient Inference Accelerator for Binary Convolutional Neural Networks,TCSII 2020,,,,, 95 | "Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations:fire:126",JSAIT 2020,,,,,Gradient 96 | An Energy-Efficient and High Throughput in-Memory Computing Bit-Cell With Excellent Robustness Under Process Variations for Binary Neural Network,ACCESS 2020,B,,,, 97 | CP-NAS: Child-Parent Neural Architecture Search for Binary Neural Networks,IJCAI 2020,B,,,, 98 | Towards Fully 8-bit Integer Inference for the Transformer Model,IJCAI 2020,,,,, 99 | Soft Threshold Ternary Networks,IJCAI 2020,,,,, 100 | Overflow Aware Quantization: Accelerating Neural Network Inference by Low-bit Multiply-Accumulate Operations,IJCAI 2020,,,,, 101 | Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks,IJCAI 2020,,,,, 102 | Fully Nested Neural Network for Adaptive Compression and Quantization,IJCAI 2020,,,,, 103 | Path sample-analytic gradient estimators for stochastic binary networks,NeurIPS 2020,,,,, 104 | Efficient exact verification of binarized neural networks,NeurIPS 2020,,,,, 105 | Comparing fisher information regularization with distillation for dnn quantization,NeurIPS 2020,,,,, 106 | Position-based scaled gradient for model quantization and sparse training,NeurIPS 2020,,,,, 107 | Flexor: Trainable fractional quantization,NeurIPS 2020,,,,, 108 | Adaptive Gradient Quantization for Data-Parallel SGD[PyTorch]:star:13,NeurIPS 2020,T,,QAT,C, 109 | Bayesian Bits: Unifying Quantization and Pruning,NeurIPS 2020,MP,,QAT/PTQ,C, 110 | "Distribution-free binary classification: prediction sets, confidence intervals and calibration",NeurIPS 2020,B,,,, 111 | FleXOR: Trainable Fractional Quantization,NeurIPS 2020,,,,, 112 | HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks:fire:60,NeurIPS 2020,MP,Linear,QAT,C/O, 113 | Hierarchical Quantized Autoencoders[PyTorch]:star:22,NeurIPS 2020,,Linear,,Image Compression, 114 | Position-based Scaled Gradient for Model Quantization and Pruning[PyTorch]:star:14,NeurIPS 2020,Uni,,PTQ,C, 115 | Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point,NeurIPS 2020,,,,, 116 | Quantized Variational Inference[PyTorch],NeurIPS 2020,,,,, 117 | Robust Quantization: One Model to Rule Them All,NeurIPS 2020,Uni,Linear,QAT/PTQ,C, 118 | Rotated Binary Neural Network[PyTorch]:star:63,NeurIPS 2020,B,,,C, 119 | Searching for Low-Bit Weights in Quantized Neural Networks[PyTorch]:star:20,NeurIPS 2020,,,,, 120 | Ultra-Low Precision 4-bit Training of Deep Neural Networks,NeurIPS 2020,Uni,,QAT,C, 121 | Universally Quantized Neural Compression,NeurIPS 2020,,,,, 122 | TernaryBERT: Distillation-aware Ultra-low Bit BERT,EMNLP 2020,T,OptN,QAT,N, 123 | DFQF: Data Free Quantization-aware Fine-tuning,ACML 2020,Uni,Linear,QAT,C, 124 | One weight bitwidth to rule them all,ECCV 2020,,,,, 125 | DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks,ECCV 2020,MP,,,C, 126 | Deep Transferring Quantization[PyTorch]:star:15,ECCV 2020,Uni,,,C, 127 | Differentiable Joint Pruning and Quantization for Hardware Efficiency,ECCV 2020,MP,,,C, 128 | Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes,ECCV 2020,MP,PQ,QAT,C, 129 | Generative Low-bitwidth Data Free 
Quantization[PyTorch]:star:23,ECCV 2020,Uni,Linear,QAT,C, 130 | HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs[PyTorch]:star:37,ECCV 2020,MP,LQ,QAT,C, 131 | PAMS: Quantized Super-Resolution via Parameterized Max Scale,ECCV 2020,Uni,,,C, 132 | Post-Training Piecewise Linear Quantization for Deep Neural Networks,ECCV 2020,T/Uni,Linear,PTQ,C, 133 | QuEST: Quantized Embedding Space for Transferring Knowledge,ECCV 2020,,,,, 134 | Quantization Guided JPEG Artifact Correction,ECCV 2020,,,,, 135 | Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization,ECCV 2020,MP,Linear,,C/O, 136 | Task-Aware Quantization Network for JPEG Image Compression,ECCV 2020,,,,, 137 | End to End Binarized Neural Networks for Text Classification,ACL 2020,B,,,, 138 | Differentiable Product Quantization for End-to-End Embedding Compression[PyTorch]:star:38,ICML 2020,MP,PQ,QAT,N, 139 | Don’t Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript,ICML 2020,MP,,,C, 140 | Moniqua: Modulo Quantized Communication in Decentralized SGD,ICML 2020,B/T,,,C, 141 | Online Learned Continual Compression with Adaptive Quantization Modules[PyTorch]:star:19,ICML 2020,,,,, 142 | Towards Accurate Post-training Network Quantization via Bit-Split and Stitching[PyTorch]:star:23,ICML 2020,T/Uni,OptN,PTQ,C/O, 143 | Training Binary Neural Networks through Learning with Noisy Supervision,ICML 2020,B,,QAT,C, 144 | Up or Down? Adaptive Rounding for Post-Training Quantization,ICML 2020,,,PTQ,, 145 | Variational Bayesian Quantization[PyTorch]:star:19,ICML 2020,,,,, 146 | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML,MLST 2020,,,,, 147 | Balanced binary neural networks with gated residual,ICASSP 2020,,,,, 148 | A Spatial RNN Codec for End-To-End Image Compression,CVPR 2020,,,,, 149 | "APQ: Joint Search for Network Architecture, Pruning and Quantization Policy",CVPR 2020,MP,Linear,QAT,C, 150 | AdaBits: Neural Network Quantization With Adaptive Bit-Widths,CVPR 2020,MP,Linear,PTQ,C, 151 | Adaptive Loss-Aware Quantization for Multi-Bit Networks,CVPR 2020,MP,LQ,QAT,C, 152 | Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach,CVPR 2020,MP,OptN,PTQ,C, 153 | Central Similarity Quantization for Efficient Image and Video Retrieval[PyTorch]:star:161,CVPR 2020,Uni,Linear,QAT,C, 154 | Data-Free Network Quantization With Adversarial Knowledge Distillation,CVPR 2020,Uni,Linear,QAT,C, 155 | Forward and Backward Information Retention for Accurate Binary Neural Networks[PyTorch]:star:133,CVPR 2020,,,,, 156 | Generalized Product Quantization Network for Semi-Supervised Image Retrieval,CVPR 2020,MP,LQ,QAT,C, 157 | LSQ+: Improving low-bit quantization through learnable offsets and better initialization,CVPR 2020,MP,LQ,QAT,C, 158 | M-LVC: Multiple Frames Prediction for Learned Video Compression[PyTorch]:star:51,CVPR 2020,,,,, 159 | OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression,CVPR 2020,,,,, 160 | Structured Compression by Weight Encryption for Unstructured Pruning and Quantization,CVPR 2020,T,OptN,QAT,C, 161 | Training Quantized Neural Networks With a Full-Precision Auxiliary Module,CVPR 2020,,,,, 162 | ZeroQ: A Novel Zero Shot Quantization Framework:fire:106[PyTorch]:star:188,CVPR 2020,MP,Linear,PTQ,C/O, 163 | Neural network quantization with adaptive bitwidths,CVPR 2020,,,,, 164 | BNNsplit: Binarized Neural Networks for embedded distributed FPGA-based computing systems,DATE 
2020,B,,,, 165 | PhoneBit: Efficient GPU-Accelerated Binary Neural Network Inference Engine for Mobile Phones,DATE 2020,B,,,, 166 | OrthrusPE: Runtime Reconfigurable Processing Elements for Binary Neural Networks,DATE 2020,B,,,, 167 | A Novel In-DRAM Accelerator Architecture for Binary Neural Network,COOLCHIPS 2020,,,,, 168 | BNN Pruning: Pruning Binary Neural Network Guided by Weight Flipping Frequency,ISQED 2020,B,,,, 169 | Training binary neural networks with real-to-binary convolutions:fire:66,ICLR 2020,,,,, 170 | Binaryduo: Reducing gradient mismatch in binary activation network by coupling binary activations,ICLR 2020,,,,, 171 | Dms: Differentiable dimension search for binary neural networks,ICLR 2020,,,,, 172 | Once-for-all: Train one network and specialize it for efficient deployment,ICLR 2020,,,,, 173 | Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks[PyTorch]:star:150,ICLR 2020,MP,,,C, 174 | And the Bit Goes Down: Revisiting the Quantization of Neural Networks:fire:64[PyTorch]:star:619,ICLR 2020,MP,PQ,,C, 175 | AutoQ: Automated Kernel-Wise Neural Network Quantization,ICLR 2020,MP,,,C, 176 | FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary,ICLR 2020,Uni,Linear,PTQ,C/O, 177 | Gradient $\ell_1$ Regularization for Quantization Robustness,ICLR 2020,Uni,Linear,,C, 178 | Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech,ICLR 2020,MP,PQ,,Speech, 179 | Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware,ICLR 2020,B/T/Uni,LQ,QAT,C/O, 180 | Mixed Precision DNNs: All you need is a good parametrization,ICLR 2020,MP,LQ,QAT,C, 181 | Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations,ICLR 2020,MP,,,C/N, 182 | Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks,ICLR 2020,Uni,,,C/N, 183 | MuBiNN: Multi-Level Binarized Recurrent Neural Network for EEG Signal Classification,ISCAS 2020,B,,,, 184 | Riptide: Fast End-to-End Binarized Neural Networks,SysML 2020,,,,, 185 | Adversarial Attack on Deep Product Quantization Network for Image Retrieval,AAAI 2020,,PQ,,Image Retrieval, 186 | Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers[PyTorch]:star:3,AAAI 2020,,PQ,,C, 187 | Embedding Compression with Isotropic Iterative Quantization,AAAI 2020,,,,N/Image Retrieval, 188 | HLHLp: Quantized Neural Networks Training for Reaching Flat Minima in Loss Surface,AAAI 2020,B,Linear,QAT,C/N, 189 | Indirect Stochastic Gradient Quantization and its Application in Distributed Deep Learning,AAAI 2020,,,,, 190 | Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search,AAAI 2020,,,,, 191 | Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT:fire:125,AAAI 2020,MP,Linear,QAT,N, 192 | Quantized Compressive Sampling of Stochastic Gradients for Efficient Communication in Distributed Deep Learning,AAAI 2020,,,,, 193 | RTN: Reparameterized Ternary Network,AAAI 2020,,,,, 194 | Towards Accurate Low Bit-width Quantization with Multiple Phase Adaptations,AAAI 2020,,,,, 195 | Towards Accurate Quantization and Pruning via Data-free Knowledge Transfer,AAAI 2020,MP,LQ,PTQ,C, 196 | Vector Quantization-Based Regularization for Autoencoders[PyTorch]:star:11,AAAI 2020,,,,, 197 | Training Binary Neural Networks using the Bayesian Learning Rule,CoRR 2020,B,,,, 198 | Integer quantization for deep learning inference: Principles and empirical 
evaluation,arXiv 2020,,,,, 199 | Wrapnet: Neural net inference with ultra-low-resolution arithmetic,arXiv 2020,,,,, 200 | Leveraging automated mixed-low-precision quantization for tiny edge microcontrollers,arXiv 2020,,,,, 201 | Biqgemm: matrix multiplication with lookup table for binary-coding-based quantized dnns,arXiv 2020,,,,, 202 | Near-lossless post-training quantization of deep neural networks via a piecewise linear approximation,arXiv 2020,,,,, 203 | Efficient execution of quantized deep learning models: A compiler approach,arXiv 2020,,,,, 204 | A statistical framework for low-bitwidth training of deep neural networks,arXiv 2020,,,,, 205 | What is the state of neural network pruning?,arXiv 2020,,,,, 206 | Language models are few-shot learners,arXiv 2020,,,,, 207 | Shifted and squeezed 8-bit floating point format for low-precision training of deep neural networks,arXiv 2020,,,,, 208 | Gradient l1 regularization for quantization robustness,arXiv 2020,,,,, 209 | BinaryBERT: Pushing the Limit of BERT Quantization,arXiv 2020,B,,,, 210 | Understanding Learning Dynamics of Binary Neural Networks via Information Bottleneck,arXiv 2020,B,,,, 211 | Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation,arXiv 2020,B,,,, 212 | RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks,arXiv 2020,B,,,, 213 | MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy?,arXiv 2020,B,,,, 214 | Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs,arXiv 2020,B,,,, 215 | Distillation Guided Residual Learning for Binary Convolutional Neural Networks,arXiv 2020,B,,,, 216 | How Does Batch Normalization Help Binary Training?,arXiv 2020,B,,,, 217 | Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming[PyTorch],arXiv 2020,MP,OptN,PTQ,C/N, 218 | A Product Engine for Energy-Efficient Execution of Binary Neural Networks Using Resistive Memories,VLSI-SoC 2019,,,,, 219 | Deep Binary Reconstruction for Cross-Modal Hashing:fire:78,TMM 2019,,,,, 220 | Compact Hash Code Learning With Binary Deep Neural Network,TM 2019,,,,, 221 | Recursive Binary Neural Network Training Model for Efficient Usage of On-Chip Memory,TCSI 2019,,,,, 222 | Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays,TCSI 2019,,,,, 223 | An Energy-Efficient Reconfigurable Processor for Binary-and Ternary-Weight Neural Networks With Flexible Data Bit Width,JSSC 2019,,,,, 224 | A Review of Binarized Neural Networks,Electronics 2019,,,,, 225 | Binarized Neural Networks for Resource-Efficient Hashing with Minimizing Quantization Loss,IJCAI 2019,,,,, 226 | Binarized Collaborative Filtering with Distilling Graph Convolutional Network,IJCAI 2019,,,,, 227 | BiScaled-DNN: Quantizing Long-tailed Datastructures with Two Scale Factors for Deep Neural Networks,DAC 2019,,,,, 228 | Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization:fire:53,NeurIPS 2019,,,,, 229 | A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off[PyTorch]:star:12,NeurIPS 2019,,,,,Theory 230 | Bit Efficient Quantization for Deep Neural Networks,NeurIPS 2019,Uni,Linear/Log,PTQ,C, 231 | Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients,NeurIPS 2019,,,,, 232 | Dimension-Free Bounds for Low-Precision Training,NeurIPS 2019,,Log,,,Theory 233 | Double Quantization for Communication-Efficient Distributed Optimization,NeurIPS 2019,,,,, 234 | Focused 
Quantization for Sparse CNNs,NeurIPS 2019,Uni,LQ,,C, 235 | Generalization Error Analysis of Quantized Compressive Learning,NeurIPS 2019,,,,,Theory 236 | Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks:fire:56,NeurIPS 2019,Uni,,PTQ,C/O/N, 237 | MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization[PyTorch]:star:48,NeurIPS 2019,B,,QAT,C, 238 | Model Compression with Adversarial Robustness: A Unified Optimization Framework,NeurIPS 2019,Uni,,,C, 239 | Normalization Helps Training of Quantized LSTM,NeurIPS 2019,B/T/Uni,,,C/N, 240 | Post-training 4-bit quantization of convolution networks for rapid-deployment:fire:161[PyTorch]:star:163,NeurIPS 2019,T/Uni,,PTQ,C, 241 | "Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations:fire:126",NeurIPS 2019,B,Randomized/Sign,,C, 242 | Using Neuroevolved Binary Neural Networks to solve reinforcement learning environments,APCCAS 2019,,,,, 243 | BinaryDenseNet: Developing an architecture for binary neural networks,ICCVW 2019,,,,, 244 | Low-bit quantization of neural networks for efficient inference:fire:112,ICCV 2019,,,,, 245 | Bayesian Optimized 1-Bit CNNs,ICCV 2019,B,,,C, 246 | Data-Free Quantization Through Weight Equalization and Bias Correction:fire:135,ICCV 2019,Uni,Linear,PTQ,C, 247 | Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks:fire:129,ICCV 2019,B/T/Uni,,,C, 248 | HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision:fire:155,ICCV 2019,MP,Linear,QAT,C, 249 | Proximal Mean-Field for Neural Network Quantization,ICCV 2019,B,,,C, 250 | Unsupervised Neural Quantization for Compressed-Domain Similarity Search[PyTorch]:star:28,ICCV 2019,MP,LQ,,Image Retrieval, 251 | Training Accurate Binary Neural Networks from Scratch,ICIP 2019,,,,, 252 | Binarized Depthwise Separable Neural Network for Object Tracking in FPGA,GLSVLSI 2019,,,,, 253 | Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design:fire:169[PyTorch]:star:152,ICML 2019,,,,, 254 | Improving Neural Network Quantization without Retraining using Outlier Channel Splitting:fire:114[PyTorch]:star:80,ICML 2019,Uni,Linear,PTQ,C, 255 | Lossless or Quantized Boosting with Integer Arithmetic,ICML 2019,,,,C, 256 | SWALP : Stochastic Weight Averaging in Low-Precision Training[PyTorch]:star:52,ICML 2019,Uni,Linear,,C, 257 | PXNOR: Perturbative Binary Neural Network,ROEDUNET 2019,,,,, 258 | Learning channel-wise interactions for binary convolutional neural networks,CVPR 2019,,,,, 259 | Circulant binary convolutional networks: Enhancing the performance of 1-bit dcnns with circulant back propagation,CVPR 2019,,,,, 260 | Fighting quantization bias with bias,CVPR 2019,,,,, 261 | A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks,CVPR 2019,B,,,C, 262 | Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?:fire:84,CVPR 2019,B,,,C, 263 | Compressing Unknown Images With Product Quantizer for Efficient Zero-Shot Classification,CVPR 2019,MP,PQ/LQ,,C/ZSL/GZSL, 264 | Deep Spherical Quantization for Image Search,CVPR 2019,,,,Image Search, 265 | End-To-End Supervised Product Quantization for Image Search and Retrieval,CVPR 2019,,PQ,,Image Search/Retrieval, 266 | Fully Quantized Network for Object Detection:fire:59,CVPR 2019,,,,, 267 | HAQ: Hardware-Aware Automated Quantization With Mixed Precision:fire:305[PyTorch]:star:243,CVPR 2019,MP,Linear/K,QAT,C, 268 | Learning 
Channel-Wise Interactions for Binary Convolutional Neural Networks,CVPR 2019,B,,,C, 269 | Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss:fire:168,CVPR 2019,T/Uni,LQ,,C, 270 | Quantization Networks:fire:84,CVPR 2019,Uni,,QAT,C/O, 271 | SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization,CVPR 2019,T/Uni,Linear,,C, 272 | Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network Using Truncated Gaussian Approximation,CVPR 2019,T,LQ,QAT,C, 273 | Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation:fire:91,CVPR 2019,B,,,C/S, 274 | Variational information distillation for knowledge transfer:fire:188,CVPR 2019,,,,, 275 | Proxylessnas: Direct neural architecture search on target task and hardware,ICLR 2019,,,,, 276 | Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks,ICLR 2019,MP,,QAT,C,Theory 277 | Analysis of Quantized Models,ICLR 2019,,,,,Theory 278 | Defensive Quantization: When Efficiency Meets Robustness:fire:81,ICLR 2019,,,,C, 279 | Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network,ICLR 2019,B/T/Uni,,,C,CoDesign 280 | From Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference,ICLR 2019,MP,PQ,QAT,N, 281 | Learning Recurrent Binary/Ternary Weights[PyTorch]:star:13,ICLR 2019,B/T,,QAT,C/N, 282 | On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks,ICLR 2019,,,,C,Theory 283 | Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm,ICLR 2019,,,,C, 284 | ProxQuant: Quantized Neural Networks via Proximal Operators:fire:56[PyTorch]:star:17,ICLR 2019,B,,QAT,C, 285 | Relaxed Quantization for Discretized Neural Networks:fire:74,ICLR 2019,,LQ,,C,Stochastic 286 | Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets:fire:81,ICLR 2019,,,,C,Theory 287 | Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA,FPGA 2019,,,,, 288 | Deep Neural Network Quantization via Layer-Wise Optimization using Limited Training Data[PyTorch]:star:30,AAAI 2019,Uni,,,C, 289 | Efficient Quantization for Compact Neural Networks with Binary Weights and Low Bitwidth Activations,AAAI 2019,B,Linear/Log,QAT,C, 290 | "Multi-Precision Quantized Neural Networks via Encoding Decomposition of {-1,+1}",AAAI 2019,MP,,,C/O, 291 | Similarity Preserving Deep Asymmetric Quantization for Image Retrieval,AAAI 2019,,,QAT,Image Retrieval, 292 | RBCN: Rectified Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs,CoRR 2019,B,,,, 293 | TentacleNet: A Pseudo-Ensemble Template for Accurate Binary Convolutional Neural Networks,CoRR 2019,B,,,, 294 | Improved training of binary networks for human pose estimation and image recognition,CoRR 2019,B,,,, 295 | Binarized Neural Architecture Search,CoRR 2019,,,,, 296 | Matrix and tensor decompositions for training binary neural networks,CoRR 2019,,,,, 297 | Back to Simplicity: How to Train Accurate BNNs from Scratch?,CoRR 2019,,,,, 298 | MoBiNet: A Mobile Binary Network for Image Classification,arXiv 2019,,,,, 299 | Training high-performance and large-scale deep neural networks with full 8-bit integers,arXiv 2019,,,,, 300 | Knowledge distillation for optimization of quantized deep neural networks,arXiv 2019,,,,, 301 | Accurate and compact convolutional neural networks with trained binarization,arXiv 
2019,,,,, 302 | Mixed precision training with 8-bit floating point,arXiv 2019,,,,, 303 | Additive powers-of-two quantization: An efficient nonuniform discretization for neural networks,arXiv 2019,,,,, 304 | Regularizing activation distribution for training binarized deep networks:fire:61,arXiv 2019,,,,, 305 | The knowledge within: Methods for data-free model compression:fire:50,arXiv 2019,,,,, 306 | Xnornet++: Improved binary neural networks,arXiv 2019,,,,, 307 | Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search,arXiv 2019,,,,, 308 | QKD: Quantization-aware Knowledge Distillation,arXiv 2019,,,,, 309 | daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices,arXiv 2019,,,,, 310 | Towards Unified INT8 Training for Convolutional Neural Network,arXiv 2019,,,,, 311 | BNN+: Improved Binary Network Training:fire:72,arXiv 2019,,,,, 312 | Learned Step Size Quantization:fire:129,arXiv 2019,MP,LQ,QAT,C, 313 | "Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization",arXiv 2019,Uni,Linear,PTQ,C, 314 | An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks,TVLSI 2018,,,,, 315 | FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks,TRETS 2018,,,,, 316 | Inference of quantized neural networks on heterogeneous all-programmable devices,NE 2018,,,,, 317 | A Deep Look into Logarithmic Quantization of Model Parameters in Neural Networks,IAIT 2018,,,,, 318 | Deterministic Binary Filters for Convolutional Neural Networks,IJCAI 2018,,,,, 319 | Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models,IJCAI 2018,,,,, 320 | A Quantization-Friendly Separable Convolution for MobileNets,EMC2 2018,,,,, 321 | Moonshine: Distilling with cheap convolutions,NeurIPS 2018,,,,, 322 | A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication:fire:104,NeurIPS 2018,,,,, 323 | GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training,NeurIPS 2018,,PQ,,C, 324 | Heterogeneous Bitwidth Binarization in Convolutional Neural Networks,NeurIPS 2018,MP,,,C, 325 | HitNet: Hybrid Ternary Recurrent Neural Network,NeurIPS 2018,T/Uni,,,N,Stochastic 326 | Scalable methods for 8-bit training of neural networks:fire:151[PyTorch]:star:191,NeurIPS 2018,Uni,,,C, 327 | Training Deep Neural Networks with 8-bit Floating Point Numbers:fire:213,NeurIPS 2018,Uni,,,C,Stochastic 328 | A survey of FPGA-based accelerators for convolutional neural networks,NCA 2018,,,,, 329 | BitStream: Efficient Computing Architecture for Real-Time Low-Power Inference of Binary Neural Networks on CPUs,MM 2018,,,,, 330 | FBNA: A Fully Binarized Neural Network Accelerator,FPL 2018,,,,, 331 | Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm:fire:222[Caffe&pytorch]:star:138,ECCV 2018,B,,,, 332 | LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks:fire:325[PyTorch]:star:207,ECCV 2018,B/Uni,LQ,QAT,C, 333 | LSQ++: Lower running time and higher recall in multi-codebook quantization,ECCV 2018,MP,LQ,,C, 334 | Learning Compression from limited unlabeled Data,ECCV 2018,Uni,,,C, 335 | Product Quantization Network for Fast Image Retrieval,ECCV 2018,Uni,PQ,,Image Retrieval, 336 | Quantization Mimic: Towards Very Tiny CNN for Object Detection:fire:55,ECCV 2018,Uni,,,O, 337 | Quantized 
Densely Connected U-Nets for Efficient Landmark Localization:fire:105,ECCV 2018,,,,, 338 | TBN: Convolutional Neural Network with Ternary Inputs and Binary Weights:fire:57,ECCV 2018,,,,, 339 | Training Binary Weight Networks via Semi-Binary Decomposition,ECCV 2018,,,,, 340 | Value-aware Quantization for Training and Inference of Neural Networks:fire:75,ECCV 2018,,,,, 341 | Gap-8: A risc-v soc for ai at the edge of the iot,ASAP 2018,,,,, 342 | Distilled binary neural network for monaural speech separation,IJCNN 2018,,,,, 343 | Fast object detection based on binary deep convolution neural networks,IJCNN 2018,,,,, 344 | Analysis and Implementation of Simple Dynamic Binary Neural Networks,IJCNN 2018,,,,, 345 | SIGNSGD: compressed optimisation for non-convex problems:fire:393[PyTorch]:star:54,ICML 2018,,,,,Gradient 346 | Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation:fire:59,CVPR 2018,,,,, 347 | Explicit loss-error-aware quantization for low-bit deep neural networks:fire:67,CVPR 2018,,,,, 348 | A biresolution spectral framework for product quantization,CVPR 2018,,,,, 349 | Amc: Automl for model compression and acceleration on mobile devices:fire:814,CVPR 2018,,,,, 350 | Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations,CVPR 2018,,,,, 351 | Modulated convolutional networks,CVPR 2018,B,,,, 352 | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference:fire:1013,CVPR 2018,,,,, 353 | SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks:fire:84[PyTorch]:star:31,CVPR 2018,T,,,, 354 | Towards Effective Low-bitwidth Convolutional Neural Networks:fire:121,CVPR 2018,,,QAT,, 355 | Two-Step Quantization for Low-bit Neural Networks:fire:72,CVPR 2018,,,,, 356 | CLIP-Q: Deep Network Compression Learning by In-parallel Pruning-Quantization:fire:165,CVPR 2018,,,,, 357 | BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU,IPDPS 2018,,,,, 358 | Mixed Precision Training of Convolutional Neural Networks using Integer Operations:fire:117,ICLR 2018,,,,, 359 | An empirical study of binary neural networks’ optimisation,ICLR 2018,,,,, 360 | Adaptive Quantization of Neural Networks,ICLR 2018,,,,, 361 | Alternating Multi-bit Quantization for Recurrent Neural Networks:fire:87,ICLR 2018,,,,, 362 | Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy:fire:208,ICLR 2018,,,,, 363 | Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking[PyTorch]:star:15,ICLR 2018,,,,,CoDesign 364 | Loss-aware Weight Quantization of Deep Networks:fire:94,ICLR 2018,,,,, 365 | Model compression via distillation and quantization:fire:353[PyTorch]:star:293,ICLR 2018,,,,, 366 | Training and Inference with Integers in Deep Neural Networks:fire:231[TensorFlow]:star:132,ICLR 2018,,,,, 367 | Variational Network Quantization:fire:58,ICLR 2018,,,,, 368 | WRPN: Wide Reduced-Precision Networks:fire:180,ICLR 2018,,,,, 369 | Adaptive Quantization for Deep Neural Network:fire:70,AAAI 2018,,,,, 370 | Deep Neural Network Compression with Single and Multiple Level Quantization:fire:65[PyTorch]:star:20,AAAI 2018,,,,, 371 | Distributed Composite Quantization,AAAI 2018,,,,, 372 | Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM:fire:207,AAAI 2018,,,,, 373 | From Hashing to CNNs: Training Binary Weight Networks via Hashing:fire:62,AAAI 2018,B,,,, 374 | Product Quantized Translation for Fast 
Nearest Neighbor Search,AAAI 2018,,,,, 375 | Quantized Memory-Augmented Neural Networks,AAAI 2018,,,,, 376 | ReBNet: Residual Binarized Neural Network,ISFPCCM 2018,,,,, 377 | LightNN: Filling the Gap between Conventional Deep Neural Networks and Binarized Networks,CoRR 2018,,,,, 378 | BinaryRelax: A Relaxation Approach For Training Deep Neural Networks With Quantized Weights,CoRR 2018,,,,, 379 | Learning low precision deep neural networks through regularization,arXiv 2018,,,,, 380 | Blended coarse gradient descent for full quantization of deep neural networks:fire:48,arXiv 2018,,,,, 381 | XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference:fire:72,arXiv 2018,,,,, 382 | A Survey on Methods and Theories of Quantized Neural Networks:fire:128,arXiv 2018,,,,, 383 | Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines,arXiv 2018,,,,, 384 | Discovering low-precision networks close to full-precision networks for efficient embedded inference:fire:83,arXiv 2018,,,,, 385 | On periodic functions as regularizers for quantization of neural networks,arXiv 2018,,,,, 386 | Rethinking floating point for deep learning:fire:95,arXiv 2018,,,,, 387 | Quantizing deep convolutional networks for efficient inference: A whitepaper:fire:425,arXiv 2018,,,,, 388 | Quantization for rapid deployment of deep neural networks,arXiv 2018,,,,, 389 | Simultaneously optimizing weight and quantizer of ternary neural network using truncated gaussian approximation,arXiv 2018,,,,, 390 | Uniq: Uniform noise injection for non-uniform quantization of neural networks,arXiv 2018,,,,, 391 | Training Competitive Binary Neural Networks from Scratch,arXiv 2018,,,,, 392 | Joint Neural Architecture Search and Quantization,arXiv 2018,,,,, 393 | BinaryRelax: A Relaxation Approach For Training Deep Neural Networks With Quantized Weights:fire:55,arXiv 2018,,,,, 394 | Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients:fire:1209[PyTorch],arXiv 2018,Uni,,,, 395 | Espresso: Efficient Forward Propagation for BCNNs,arXiv 2018,,,,,CoDesign 396 | Mixed Precision Training:fire:601,arXiv 2018,Uni,,,, 397 | PACT: Parameterized Clipping Activation for Quantized Neural Networks:fire:341,arXiv 2018,,,,, 398 | Terngrad: Ternary gradients to reduce communication in distributed deep learning:fire:649,NeurIPS 2017,,,,, 399 | QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding:fire:696,NeurIPS 2017,,,,,Gradient 400 | Towards Accurate Binary Convolutional Neural Network:fire:193[TensorFlow]:star:49,NeurIPS 2017,B,,,, 401 | Training Quantized Nets: A Deeper Understanding:fire:134,NeurIPS 2017,,,,, 402 | Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization[Caffe],BMVC 2017,,,,, 403 | Performance guaranteed network acceleration via high-order residual quantization,ICCV 2017,,,,, 404 | Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources:fire: 130[PyTorch]:star:207,ICCV 2017,B,,,, 405 | Performance Guaranteed Network Acceleration via High-Order Residual Quantization:fire:55,ICCV 2017,,,,, 406 | Binary Deep Neural Networks for Speech Recognition,Interspeech 2017,B,,,, 407 | Ternary neural networks for resource-efficient AI applications,IJCNN 2017,,,,, 408 | Fixed-point optimization of deep neural networks with adaptive step size retraining,ICASSP 2017,MP,,,, 409 | Deep Learning with Low Precision by Half-wave Gaussian 
Quantization:fire:288[Caffe]:star:118,CVPR 2017,,,,, 410 | Fixed-point Factorized Networks,CVPR 2017,T,,,,Factor 411 | Local Binary Convolutional Neural Networks:star:94:fire:156,CVPR 2017,,,,, 412 | Network Sketching: Exploiting Binary Structure in Deep CNNs:fire:71,CVPR 2017,B,,,, 413 | Weighted-Entropy-Based Quantization for Deep Neural Networks:fire:144,CVPR 2017,,Non,,, 414 | A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks,DC 2017,B,,,,CoDesign 415 | On-Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA,IPDPSW 2017,,,,,CoDesign 416 | Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights:fire:607[PyTorch]:star:181,ICLR 2017,T/Uni,Log,QAT,C, 417 | Learning Discrete Weights Using the Local Reparameterization Trick:fire:61,ICLR 2017,,,,,Stochastic 418 | Loss-aware Binarization of Deep Networks:fire:119[PyTorch]:star:18,ICLR 2017,B,,,, 419 | Soft Weight-Sharing for Neural Network Compression:fire:222:star:18,ICLR 2017,B,,,, 420 | Towards the Limit of Network Quantization:fire:114,ICLR 2017,Uni,,,, 421 | "FINN: A Framework for Fast, Scalable Binarized Neural Network Inference:fire:463",FPGA 2017,B,,,, 422 | How to train a compact binary neural network with high accuracy?:fire:205,AAAI 2017,,,,, 423 | Adaptive Quantization for Deep Neural Network:fire:67,AAAI 2017,MP,,,, 424 | The high-dimensional geometry of binary neural networks,CoRR 2017,,,,, 425 | BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet,CoRR 2017,,,,, 426 | Deep learning binary neural network on an FPGA,arXiv 2017,B,,,, 427 | FP-BNN: Binarized neural network on FPGA:fire:126,arXiv 2017,B,,,,CoDesign 428 | Accelerating Deep Convolutional Networks using low-precision and sparsity:fire:111,arXiv 2017,T,,,, 429 | Bit-regularized optimization of neural nets,arXiv 2017,,,,, 430 | Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks,arXiv 2017,,,,, 431 | Learning deep binary descriptor with multi-quantization:fire:97,arXiv 2017,,,,, 432 | Gxnor-net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework,arXiv 2017,,,,, 433 | Soft-to-hard vector quantization for end-to-end learning compressible representations,arXiv 2017,,,,, 434 | ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks[TensorFlow]:star:53,arXiv 2017,,,,, 435 | Ternary Neural Networks with Fine-Grained Quantization:fire:71,arXiv 2017,T,,,, 436 | Trained Ternary Quantization:fire:734,arXiv 2017,T,,,, 437 | Decision making with quantized priors leads to discrimination,JPROC 2016,,,,, 438 | Communication quantization for data-parallel training of deep neural networks:fire:130,MLHPC 2016,,,,, 439 | XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks:fire:3469[PyTorch]:star:807,ECCV 2016,B,,,, 440 | Overcoming challenges in fixed point training of deep convolutional networks,ICMLW 2016,,,,, 441 | Fixed point quantization of deep convolutional networks:fire:696,ICML 2016,,,,, 442 | "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding:fire:5045",CVPR 2016,,,,, 443 | Quantized convolutional neural networks for mobile devices:fire:270,CVPR 2016,,,,, 444 | Fixed-point Performance Analysis of Recurrent Neural Networks:fire:67,arXiv 2016,,,,, 445 | Qsgd: Randomized quantization for 
communication-optimal stochastic gradient descent:fire:801,arXiv 2016,,,,, 446 | Effective quantization methods for recurrent neural networks:fire:62,arXiv 2016,,,,, 447 | Sigma delta quantized networks,arXiv 2016,,,,, 448 | Recurrent neural networks with limited numerical precision:fire:65,arXiv 2016,,,,, 449 | Training bit fully convolutional network for fast semantic segmentation,arXiv 2016,,,,, 450 | Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations:fire:1347,arXiv 2016,,,,, 451 | Convolutional neural networks using logarithmic data representation:fire:320,arXiv 2016,,,,, 452 | Layer normalization:fire:4125,arXiv 2016,,,,, 453 | Binarized Neural Networks on the ImageNet Classification Task,arXiv 2016,B,,,, 454 | Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1:fire:1574[PyTorch]:star:252,arXiv 2016,B,,,, 455 | Deep neural networks are robust to weight binarization and other non-linear distortions:fire:77,arXiv 2016,,,,, 456 | Neural Networks with Few Multiplications:fire:258[PyTorch]:star:81,arXiv 2016,T,,,, 457 | Ternary weight networks:fire:647[Caffe]:star:63,arXiv 2016,T,,,, 458 | Batch normalization: Accelerating deep network training by reducing internal covariate shift:fire:32893,PMLR 2015,,,,, 459 | BinaryConnect: Training Deep Neural Networks with binary weights during propagations:fire:2267[PyTorch]:star:344,NeurIPS 2015,B,,,, 460 | Bitwise Neural Networks:fire:191,ICML 2015,B,,,, 461 | Compressing neural networks with hashing trick:fire:887,ICML 2015,,,,, 462 | Deep Learning with Limited Numerical Precision:fire:1378,ICML 2015,Uni,,,, 463 | Fixed point optimization of deep convolutional neural networks for object recognition:fire:226,ICASSP 2015,,,,, 464 | 8-Bit Approximations for Parallelism in Deep Learning:fire:114,ICLR 2015,,,,, 465 | Training deep neural networks with low precision multiplications:fire:498,ICLR 2015,,,,, 466 | Rounding methods for neural networks with low resolution synaptic weights,arXiv 2015,,,,, 467 | Training Binary Multilayer Neural Networks for Image Classification using Expectation Backpropagation:fire:50,arXiv 2015,,,,, 468 | Resiliency of Deep Neural Networks under quantizations:fire:123,arXiv 2015,,,,, 469 | "Fixed-point feedforward deep neural network design using weights +1, 0, and −1:fire:269",SiPS 2014,,,,, 470 | Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights:fire:190,NeurIPS 2014,,,,,Stochastic 471 | 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns:fire:679,Interspeech 2014,,,,, 472 | Compressing deep convolutional networks using vector quantization:fire:981,arXiv 2014,,,,, 473 | Lowrank matrix factorization for deep neural network training with high-dimensional output targets:fire:563,ICASSP 2013,,,,, 474 | Estimating or propagating gradients through stochastic neurons for conditional computation:fire:1346,arXiv 2013,,,,, 475 | Product quantization for nearest neighbor search:fire:2268,TPAMI 2010,,,,, 476 | An introduction to natural computation:fire:309,MITPress 1999,,,,, -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Awesome-Quantization-Papers [![Awesome](https://awesome.re/badge.svg)](https://awesome.re) 2 | 3 | This repo contains a comprehensive paper list of **Model 
Quantization** for efficient deep learning, collecting papers from AI conferences, journals, and arXiv. As a highlight, we categorize the papers by model structure and application scenario, and label the quantization methods with keywords.
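To make the keywords concrete, here is a minimal, illustrative sketch of the two ends of the spectrum defined in the keyword legend below: symmetric uniform quantization as used in the simplest **`PTQ`** pipelines, and sign-based binarization as used by **`Extreme`** methods. It is written against PyTorch; the function names and the 8-bit, per-tensor choices are our own simplifications for illustration, not the recipe of any particular paper in this list.

```python
# Minimal sketch only -- real methods in this list add calibration data,
# clipping, per-channel scales, mixed precision, finetuning, etc.
import torch

def quantize_uniform_symmetric(w: torch.Tensor, n_bits: int = 8):
    """Uniform PTQ: round weights to signed integers with one per-tensor scale."""
    qmax = 2 ** (n_bits - 1) - 1                 # 127 for 8 bits
    scale = w.abs().max() / qmax                 # scale taken from the tensor itself
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale               # dequantize as q.float() * scale

def binarize_sign(w: torch.Tensor):
    """`Extreme` (binary): keep only sign(w) plus one XNOR-Net-style scale."""
    alpha = w.abs().mean()                       # alpha = mean(|w|), as in XNOR-Net
    return torch.sign(w), alpha                  # reconstruct as sign(w) * alpha

w = torch.randn(64, 64)                          # stand-in for a pretrained weight
q, scale = quantize_uniform_symmetric(w)
print((w - q.float() * scale).abs().max())       # worst-case 8-bit rounding error
```

Non-uniform quantizers replace the single linear `scale` with learned or logarithmic grids, and mixed-precision (**`MP`**) methods choose `n_bits` per layer or per channel instead of globally.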
4 | 5 | This repo is actively maintained, and contributions in any form that make this list more comprehensive are welcome. Special thanks to collaborator [Zhikai Li](https://github.com/zkkli), and all researchers who have contributed to this repo!<br>
6 | 7 | If you find this repo useful, please consider **★STARing** and feel free to share it with others!
8 | 9 | **[Update: Mar, 2025]** Add new papers from ICLR-25.<br>
10 | **[Update: Nov, 2024]** Add new papers from ECCV-24 and NeurIPS-24.
11 | **[Update: Sep, 2024]** Add new papers from ICML-24 and IJCAI-24.
12 | **[Update: Jul, 2024]** Add new papers from CVPR-24.
13 | **[Update: May, 2024]** Add new papers from ICLR-24.
14 | **[Update: Apr, 2024]** Add new papers from AAAI-24.
15 | **[Update: Nov, 2023]** Add new papers from NeurIPS-23.
16 | **[Update: Oct, 2023]** Add new papers from ICCV-23.
17 | **[Update: Jul, 2023]** Add new papers from AAAI-23 and ICML-23.
18 | **[Update: Jun, 2023]** Add new arXiv papers uploaded in May 2023, especially from the hot LLM quantization field.<br>
19 | **[Update: Jun, 2023]** This repo is reborn! New style, better experience!<br>
20 | 21 | --- 22 | ## Overview 23 | 24 | - [Awesome-Quantization-Papers ](#awesome-quantization-papers-) 25 | - [Overview](#overview) 26 | - [Survey](#survey) 27 | - [Transformer-based Models](#transformer-based-models) 28 | - [Language Transformers](#language-transformers) 29 | - [Vision Transformers](#vision-transformers) 30 | - [Visual Generation](#visual-generation) 31 | - [Convolutional Neural Networks](#convolutional-neural-networks) 32 | - [Visual Generation](#visual-generation-1) 33 | - [Image Classification](#image-classification) 34 | - [Other Tasks](#other-tasks) 35 | - [Object Detection](#object-detection) 36 | - [Super Resolution](#super-resolution) 37 | - [Point Cloud](#point-cloud) 38 | - [References](#references) 39 | 40 | **Keywords**: **`PTQ`**: post-training quantization | **`Non-uniform`**: non-uniform quantization | **`MP`**: mixed-precision quantization | **`Extreme`**: binary or ternary quantization 41 | 42 | --- 43 | 44 | 45 | ## Survey 46 | - "A Survey of Quantization Methods for Efficient Neural Network Inference", Book Chapter: Low-Power Computer Vision, 2021. [[paper](https://arxiv.org/abs/2103.13630)] 47 | - "Full Stack Optimization of Transformer Inference: a Survey", arXiv, 2023. [[paper](https://arxiv.org/abs/2302.14017)] 48 | - "A White Paper on Neural Network Quantization", arXiv, 2021. [[paper](https://arxiv.org/abs/2106.08295)] 49 | - "Binary Neural Networks: A Survey", PR, 2020. [[Paper](https://arxiv.org/abs/2004.03333)] [**`Extreme`**] 50 | 51 | 52 | ## Transformer-based Models 53 | ### Language Transformers 54 | - "CBQ: Cross-Block Quantization for Large Language Models", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/28924)] 55 | - "SpinQuant: LLM Quantization with Learned Rotations", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/28338)] 56 | - "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/30168)] 57 | - "Q-VLM: Post-training Quantization for Large Vision-Language Models", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/94107)] 58 | - "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/96936)] 59 | - "QBB: Quantization with Binary Bases for LLMs", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/95634)] 60 | - "DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/93727)] 61 | - "ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/96563)] 62 | - "KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/93558)] 63 | - "Evaluating Quantized Large Language Models", ICML, 2024. [[paper](https://openreview.net/forum?id=DKKg5EFAFr)] 64 | - "SqueezeLLM: Dense-and-Sparse Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=0jpbpFia8m)] [**`PTQ`**] [**`Non-uniform`**] 65 | - "KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache", ICML, 2024. [[paper](https://openreview.net/forum?id=L057s2Rq8O)] 66 | - "LQER: Low-Rank Quantization Error Reconstruction for LLMs", ICML, 2024. 
[[paper](https://openreview.net/forum?id=dh8k41g775)] 67 | - "Extreme Compression of Large Language Models via Additive Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=5mCaITRTmO)] 68 | - "BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=DbyHDYslM7)] 69 | - "BiLLM: Pushing the Limit of Post-Training Quantization for LLMs", ICML, 2024. [[paper](https://openreview.net/forum?id=qOl2WWOqFg)] 70 | - "Compressing Large Language Models by Joint Sparsification and Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=sCGRhnuMUJ)] 71 | - "FrameQuant: Flexible Low-Bit Quantization for Transformers", ICML, 2024. [[paper](https://openreview.net/forum?id=xPypr0kufs)] [**`PTQ`**] 72 | - "OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=8Wuvhh0LYW)] 73 | - "LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=LzPWWPAdY4)] 74 | - "SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression", ICLR, 2024. [[paper](https://openreview.net/forum?id=Q1u25ahSuy)] [**`PTQ`**] 75 | - "QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=WvFoJccpo8)] 76 | - "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=FIplmUWdm3)] [**`PTQ`**] 77 | - "PB-LLM: Partially Binarized Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=BifeBRhikU)] [**`Extreme`**] 78 | - "AffineQuant: Affine Transformation Quantization for Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=of2rhALq8l)] 79 | - "Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=JzG7kSpjJk)] 80 | - "LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=gLARhFLE0F)] 81 | - "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29237)] 82 | - "Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29815)] 83 | - "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29860)] 84 | - "Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29908)] [**`PTQ`**] 85 | - "What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29765)] 86 | - "EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs", arXiv, 2024. [[paper](http://arxiv.org/abs/2403.02775)] 87 | - "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact", arXiv, 2024. 
[[paper](http://arxiv.org/abs/2403.01241)] 88 | - "FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.17985)] 89 | - "A Comprehensive Evaluation of Quantization Strategies for Large Language Models", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.16775)] 90 | - "GPTVQ: The Blessing of Dimensionality for LLM Quantization", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.15319)] 91 | - "APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.14866)] 92 | - "EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.10787)] 93 | - "RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.05628)] 94 | - "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.05445)] 95 | - "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization", arXiv, 2024. [[paper](http://arxiv.org/abs/2401.18079)] 96 | - "Extreme Compression of Large Language Models via Additive Quantization", arXiv, 2024. [[paper](http://arxiv.org/abs/2401.06118)] 97 | - "ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.08583)] [**`PTQ`**] 98 | - "CBQ: Cross-Block Quantization for Large Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.07950)] [**`PTQ`**] 99 | - "FP8-BERT: Post-Training Quantization for Transformer", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.05725)] [**`PTQ`**] 100 | - "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.05693)] 101 | - "SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.03788)] [**`PTQ`**] 102 | - "A Speed Odyssey for Deployable Quantization of LLMs", arXiv, 2023. [[paper](http://arxiv.org/abs/2311.09550)] 103 | - "AFPQ: Asymmetric Floating Point Quantization for LLMs", arXiv, 2023. [[paper](http://arxiv.org/abs/2311.01792)] 104 | - "Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization", arXiv, 2023. [[paper](http://arxiv.org/abs/2311.16442)] 105 | - "QLoRA: Efficient Finetuning of Quantized LLMs", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/71815)] [[code](https://github.com/artidoro/qlora)] 106 | - "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/69982)] [[code](https://github.com/jerry-chee/QuIP)] [**`PTQ`**] 107 | - "Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/72931)] 108 | - "QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources", arXiv, 2023. [[paper](https://arxiv.org/abs/2310.07147)] 109 | - "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2310.16795)] 110 | - "Atom: Low-bit Quantization for Efficient and Accurate LLM Serving", arXiv, 2023.
[[paper](http://arxiv.org/abs/2310.19102)] 111 | - "ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers", arXiv, 2023. [[paper](http://arxiv.org/abs/2310.17723)] 112 | - "LLM-FP4: 4-Bit Floating-Point Quantized Transformers", arXiv, 2023. [[paper](https://arxiv.org/abs/2310.16836)] 113 | - "TEQ: Trainable Equivalent Transformation for Quantization of LLMs", arXiv, 2023. [[paper](http://arxiv.org/abs/2310.10944)] 114 | - "Efficient Post-training Quantization with FP8 Formats", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.14592)] 115 | - "Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.13575)] 116 | - "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.05516)] 117 | - "Norm Tweaking: High-performance Low-bit Quantization of Large Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.02784)] 118 | - "Understanding the Impact of Post-Training Quantization on Large Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.05210)] 119 | - "QuantEase: Optimization-based Quantization for Language Models -- An Efficient and Intuitive Algorithm", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.01885)] 120 | - "FPTQ: Fine-grained Post-Training Quantization for Large Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2308.15987)] 121 | - "FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs", arXiv, 2023. [[paper](http://arxiv.org/abs/2308.09723)] [**`PTQ`**] 122 | - "Gradient-Based Post-Training Quantization: Challenging the Status Quo", arXiv, 2023. [[paper](http://arxiv.org/abs/2308.07662)] [**`PTQ`**] 123 | - "NUPES: Non-Uniform Post-Training Quantization via Power Exponent Search", arXiv, 2023. [[paper](http://arxiv.org/abs/2308.05600)] [**`Non-uniform`**] 124 | - "ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats", arXiv, 2023. [[paper](http://arxiv.org/abs/2307.09782)] 125 | - "Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2307.05972)] 126 | - "Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study", arXiv, 2023. [[paper](https://arxiv.org/abs/2307.08072)] 127 | - "INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation", arXiv, 2023. [[paper](https://arxiv.org/abs/2306.08162)] 128 | - "QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models", arXiv, 2023. [[paper](https://arxiv.org/abs/2307.03738)] [[code](https://github.com/IST-DASLab/QIGen)] 129 | - "OWQ: Lessons learned from activation outliers for weight quantization in large language models", arXiv, 2023. [[paper](http://arxiv.org/abs/2306.02272)] [**`PTQ`**] 130 | - "PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2306.00014)] 131 | - "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration", arXiv, 2023. [[paper](https://arxiv.org/abs/2306.00978)] [**`PTQ`**] 132 | - "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models", arXiv, 2023.
[[paper](https://arxiv.org/abs/2305.17888)] 133 | - "Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling", arXiv, 2023. [[paper](https://arxiv.org/abs/2304.09145)] [**`PTQ`**] 134 | - "RPTQ: Reorder-based Post-training Quantization for Large Language Models", arXiv, 2023. [[paper](https://arxiv.org/abs/2304.01089)] [[code](https://github.com/hahnyuan/rptq4llm)] [**`PTQ`**] 135 | - "The case for 4-bit precision: k-bit Inference Scaling Laws", ICML, 2023. [[paper](https://openreview.net/forum?id=i8tGb1ab1j)] 136 | - "Quantized Distributed Training of Large Models with Convergence Guarantees", ICML, 2023. [[paper](https://openreview.net/forum?id=Nqp8A5IDzq)] 137 | - "Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases", ICML, 2023. [[paper](https://openreview.net/forum?id=q1WGm3hItW)] 138 | - "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models", ICML, 2023. [[paper](https://arxiv.org/abs/2211.10438)] [[code](https://github.com/mit-han-lab/smoothquant)] [**`PTQ`**] 139 | - "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers", ICLR, 2023. [[paper](https://arxiv.org/abs/2210.17323)] [[code](https://github.com/IST-DASLab/gptq)] [**`PTQ`**] 140 | - "BiBERT: Accurate Fully Binarized BERT", ICLR, 2022. [[paper](https://openreview.net/forum?id=5xEgrl_5FAJ)] [[code](https://github.com/htqin/BiBERT)] [**`Extreme`**] 141 | - "BiT: Robustly Binarized Multi-distilled Transformer", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=55032)] [[code](https://github.com/facebookresearch/bit)] [**`Extreme`**] 142 | - "Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models", NeurIPS, 2022. [[paper](https://arxiv.org/abs/2209.13325)] [[code](https://github.com/wimh966/outlier_suppression)] [**`PTQ`**] 143 | - "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale", NeurIPS, 2022. [[paper](https://arxiv.org/abs/2208.07339)] [[code](https://github.com/timdettmers/bitsandbytes)] 144 | - "Towards Efficient Post-training Quantization of Pre-trained Language Models", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=53407)] [**`PTQ`**] 145 | - "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=54407)] [[code](https://github.com/microsoft/DeepSpeed)] [**`PTQ`**] 146 | - "Compression of Generative Pre-trained Language Models via Quantization", ACL, 2022. [[paper](https://aclanthology.org/2022.acl-long.331)] 147 | - "I-BERT: Integer-only BERT Quantization", ICML, 2021. [[paper](https://proceedings.mlr.press/v139/kim21d.html)] [[code](https://github.com/kssteven418/I-BERT)] 148 | - "BinaryBERT: Pushing the Limit of BERT Quantization", ACL, 2021. [[paper](https://arxiv.org/abs/2012.15701)] [[code](https://github.com/huawei-noah/Pretrained-Language-Model)] [**`Extreme`**] 149 | - "On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers", ACL, 2021. [[paper](https://aclanthology.org/2021.findings-acl.363)] 150 | - "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP, 2021. [[paper](https://arxiv.org/abs/2109.12948)] [[code](https://github.com/qualcomm-ai-research/transformer-quantization)] 151 | - "KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization", arXiv, 2021. [[paper](https://arxiv.org/abs/2101.05938)] 152 | - "TernaryBERT: Distillation-aware Ultra-low Bit BERT", EMNLP, 2020. [[paper](https://arxiv.org/abs/2009.12812)] [[code](https://github.com/huawei-noah/Pretrained-Language-Model)] [**`Extreme`**] 153 | - "Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation", EMNLP, 2020. [[paper](https://aclanthology.org/2020.findings-emnlp.433/)] 154 | - "GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference", MICRO, 2020. [[paper](https://arxiv.org/abs/2005.03842)] 155 | - "Towards Fully 8-bit Integer Inference for the Transformer Model", IJCAI, 2020. [[paper](https://www.ijcai.org/Proceedings/2020/0520.pdf)] 156 | - "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT", AAAI, 2020. [[paper](https://ojs.aaai.org/index.php/AAAI/article/download/6409/6265)] 157 | - "Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model", ICML, 2019. [[paper](https://arxiv.org/abs/1906.00532)] 158 | - "Q8BERT: Quantized 8Bit BERT", EMC2 Workshop, 2019. [[paper](https://www.emc2-ai.org/assets/docs/neurips-19/emc2-neurips19-paper-31.pdf)] 159 | 160 | [[Back to Overview](#overview)] 161 |
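A large fraction of the LLM entries above (the GPTQ/AWQ/OWQ line of work, among others) target the same weight-only PTQ setting: weights are quantized to 3-4 bits in small groups while activations stay in floating point. The sketch below shows only the shared round-to-nearest baseline these papers improve on (via Hessian-based rounding, activation-aware scaling, outlier handling, etc.); the function name and the group size of 128 are illustrative assumptions, not taken from any cited paper.

```python
import torch

def rtn_groupwise_quant(w: torch.Tensor, num_bits: int = 4, group_size: int = 128):
    """Round-to-nearest weight-only quantization with per-group scales.

    w: [out_features, in_features]; returns a dequantized ("fake-quant") copy.
    Symmetric grid: q in [-2^(b-1), 2^(b-1)-1], with w ≈ scale * q per group.
    """
    out_f, in_f = w.shape
    assert in_f % group_size == 0
    wg = w.reshape(out_f, in_f // group_size, group_size)
    qmax = 2 ** (num_bits - 1) - 1
    scale = wg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(wg / scale), -qmax - 1, qmax)
    return (q * scale).reshape(out_f, in_f)

w = torch.randn(256, 512)
w_q = rtn_groupwise_quant(w, num_bits=4, group_size=128)
print((w - w_q).abs().mean())  # mean rounding error the cited methods try to shrink
```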
162 | ### Vision Transformers 163 | - "CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/8434_ECCV_2024_paper.php)] [**`PTQ`**] 164 | - "AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/3969_ECCV_2024_paper.php)] [**`PTQ`**] 165 | - "PQ-SAM: Post-training Quantization for Segment Anything Model", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/1627_ECCV_2024_paper.php)] [**`PTQ`**] 166 | - "ERQ: Error Reduction for Post-Training Quantization of Vision Transformers", ICML, 2024. [[paper](https://openreview.net/forum?id=jKUWlgra9b)] [**`PTQ`**] 167 | - "Outlier-aware Slicing for Post-Training Quantization in Vision Transformer", ICML, 2024. [[paper](https://openreview.net/forum?id=Uh5XN9d2J4)] [**`PTQ`**] 168 | - "PTQ4SAM: Post-Training Quantization for Segment Anything", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Lv_PTQ4SAM_Post-Training_Quantization_for_Segment_Anything_CVPR_2024_paper.html)] [**`PTQ`**] 169 | - "Instance-Aware Group Quantization for Vision Transformers", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Moon_Instance-Aware_Group_Quantization_for_Vision_Transformers_CVPR_2024_paper.html)] [**`PTQ`**] 170 | - "Bi-ViT: Pushing the Limit of Vision Transformer Quantization", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/28109)] [**`Extreme`**] 171 | - "AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29487)] 172 | - "LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation", arXiv, 2024. [[paper](http://arxiv.org/abs/2401.11243)] [**`PTQ`**] [**`MP`**] 173 | - "MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer", arXiv, 2024.
[[paper](http://arxiv.org/abs/2401.14895)] [**`PTQ`**] [**`MP`**] 174 | - "I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_I-ViT_Integer-only_Quantization_for_Efficient_Vision_Transformer_Inference_ICCV_2023_paper.pdf)] [[code](https://github.com/zkkli/I-ViT)] 175 | - "RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_RepQ-ViT_Scale_Reparameterization_for_Post-Training_Quantization_of_Vision_Transformers_ICCV_2023_paper.pdf)] [[code](https://github.com/zkkli/RepQ-ViT)] [**`PTQ`**] 176 | - "QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_QD-BEV__Quantization-aware_View-guided_Distillation_for_Multi-view_3D_Object_Detection_ICCV_2023_paper.pdf)] 177 | - "BiViT: Extremely Compressed Binary Vision Transformers", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/He_BiViT_Extremely_Compressed_Binary_Vision_Transformers_ICCV_2023_paper.pdf)] [**`Extreme`**] 178 | - "Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Frumkin_Jumping_through_Local_Minima_Quantization_in_the_Loss_Landscape_of_ICCV_2023_paper.pdf)] 179 | - "PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/71880)] 180 | - "Oscillation-free Quantization for Low-bit Vision Transformers", ICML, 2023. [[paper](https://openreview.net/forum?id=DihXH24AdY)] [[code](https://github.com/nbasyl/OFQ)] 181 | - "PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers", TNNLS, 2023. [[paper](https://arxiv.org/abs/2209.05687)] 182 | - "Variation-aware Vision Transformer Quantization", arXiv, 2023. [[paper](http://arxiv.org/abs/2307.00331)] 183 | - "NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers", CVPR, 2023. [[paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_NoisyQuant_Noisy_Bias-Enhanced_Post-Training_Activation_Quantization_for_Vision_Transformers_CVPR_2023_paper.pdf)] [**`PTQ`**] 184 | - "Boost Vision Transformer with GPU-Friendly Sparsity and Quantization", CVPR, 2023. [[paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Yu_Boost_Vision_Transformer_With_GPU-Friendly_Sparsity_and_Quantization_CVPR_2023_paper.pdf)] 185 | - "Q-DETR: An Efficient Low-Bit Quantized Detection Transformer", CVPR, 2023. [[paper](http://openaccess.thecvf.com/content/CVPR2023/html/Xu_Q-DETR_An_Efficient_Low-Bit_Quantized_Detection_Transformer_CVPR_2023_paper.html)] 186 | - "Output Sensitivity-Aware DETR Quantization", 2023. [[paper](https://practical-dl.github.io/2023/extended_abstract/4/CameraReady/4.pdf)] 187 | - "Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction", arXiv, 2023. [[paper](https://arxiv.org/abs/2303.12557)] [**`PTQ`**] 188 | - "Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer", NeurIPS, 2022. [[paper](https://openreview.net/forum?id=fU-m9kQe0ke)] [[code](https://github.com/yanjingli0202/q-vit)] 189 | - "Patch Similarity Aware Data-Free Quantization for Vision Transformers", ECCV, 2022. 
[[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136710154.pdf)] [[code](https://github.com/zkkli/psaq-vit)] [**`PTQ`**] 190 | - "PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136720190.pdf)] [[code](https://github.com/hahnyuan/ptq4vit)] [**`PTQ`**] 191 | - "FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer", IJCAI, 2022. [[paper](https://arxiv.org/abs/2111.13824)] [[code](https://github.com/megvii-research/FQ-ViT)] [**`PTQ`**] 192 | - "Q-ViT: Fully Differentiable Quantization for Vision Transformer", arXiv, 2022. [[paper](https://arxiv.org/pdf/2201.07703.pdf)] 193 | - "Post-Training Quantization for Vision Transformer", NeurIPS, 2021. [[paper](https://openreview.net/forum?id=9TX5OsKJvm)] [**`PTQ`**] 194 | 195 | 196 | [[Back to Overview](#overview)] 197 | 198 | ### Visual Generation 199 | - "SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/27906)] 200 | - "ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/30429)] 201 | - "DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/29192)] 202 | - "PTQ4DiT: Post-training Quantization for Diffusion Transformers", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/95445)] [**`PTQ`**] 203 | 204 | [[Back to Overview](#overview)] 205 | 206 | ## Convolutional Neural Networks 207 | ### Visual Generation 208 | - "BiDM: Pushing the Limit of Quantization for Diffusion Models", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/93620)] 209 | - "BitsFusion: 1.99 bits Weight Quantization of Diffusion Model", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/96909)] 210 | - "Timestep-Aware Correction for Quantized Diffusion Models", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/8312_ECCV_2024_paper.php)] 211 | - "Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/7353_ECCV_2024_paper.php)] [**`PTQ`**] 212 | - "Memory-Efficient Fine-Tuning for Quantized Diffusion Model", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/2494_ECCV_2024_paper.php)] 213 | - "MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/2212_ECCV_2024_paper.php)] 214 | - "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Huang_TFMQ-DM_Temporal_Feature_Maintenance_Quantization_for_Diffusion_Models_CVPR_2024_paper.html)] [**`PTQ`**] 215 | - "Towards Accurate Post-training Quantization for Diffusion Models", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Wang_Towards_Accurate_Post-training_Quantization_for_Diffusion_Models_CVPR_2024_paper.html)] [**`PTQ`**] 216 | - "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models", ICLR, 2024. 
[[paper](https://openreview.net/forum?id=UmMa3UNDAz)] 217 | - "QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.03666)] 218 | - "Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models", arXiv, 2024. [[paper](http://arxiv.org/abs/2401.04585)] 219 | - "Efficient Quantization Strategies for Latent Diffusion Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.05431)] [**`PTQ`**] 220 | - "Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2311.06322)] 221 | - "Effective Quantization for Diffusion Models on CPUs", arXiv, 2023. [[paper](http://arxiv.org/abs/2311.16133)] 222 | - "PTQD: Accurate Post-Training Quantization for Diffusion Models", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/71314)] [**`PTQ`**] 223 | - "Q-DM: An Efficient Low-bit Quantized Diffusion Model", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/70279)] 224 | - "Temporal Dynamic Quantization for Diffusion Models", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/72396)] 225 | - "Q-diffusion: Quantizing Diffusion Models", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Q-Diffusion_Quantizing_Diffusion_Models_ICCV_2023_paper.pdf)] [[code](https://github.com/Xiuyu-Li/q-diffusion)] [**`PTQ`**] 226 | - "Towards Accurate Data-free Quantization for Diffusion Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2305.18723)] [**`PTQ`**] 227 | - "Post-training Quantization on Diffusion Models", CVPR, 2023. [[paper](http://openaccess.thecvf.com/content/CVPR2023/html/Shang_Post-Training_Quantization_on_Diffusion_Models_CVPR_2023_paper.html)] [[code](https://github.com/42Shawn/PTQ4DM)] [**`PTQ`**] 228 | 229 | [[Back to Overview](#overview)] 230 | 231 | ### Image Classification 232 | - "MetaAug: Meta-Data Augmentation for Post-Training Quantization", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/3914_ECCV_2024_paper.php)] 233 | - "Sharpness-Aware Data Generation for Zero-shot Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=8mKXMnhnFW)] 234 | - "A2Q+: Improving Accumulator-Aware Weight Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=mbx2pLK5Eq)] 235 | - "HyQ: Hardware-Friendly Post-Training Quantization for CNN-Transformer Hybrid Networks", IJCAI, 2024. [[paper](https://www.ijcai.org/proceedings/2024/474)] [**`PTQ`**] 236 | - "Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Tang_Retraining-Free_Model_Quantization_via_One-Shot_Weight-Coupling_Learning_CVPR_2024_paper.html)] [**`MP`**] 237 | - "Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Chen_Mixed-Precision_Quantization_for_Federated_Learning_on_Resource-Constrained_Heterogeneous_Devices_CVPR_2024_paper.html)] [**`MP`**] 238 | - "Enhancing Post-training Quantization Calibration through Contrastive Learning", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Shang_Enhancing_Post-training_Quantization_Calibration_through_Contrastive_Learning_CVPR_2024_paper.html)] [**`PTQ`**] 239 | - "Data-Free Quantization via Pseudo-label Filtering", CVPR, 2024.
[[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Fan_Data-Free_Quantization_via_Pseudo-label_Filtering_CVPR_2024_paper.html)] 240 | - "Make RepVGG Greater Again: A Quantization-Aware Approach", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29045)] 241 | - "MetaMix: Meta-State Precision Searcher for Mixed-Precision Activation Quantization", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29212)] [**`MP`**] 242 | - "Robustness-Guided Image Synthesis for Data-Free Quantization", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/28972)] 243 | - "PTMQ: Post-training Multi-Bit Quantization of Neural Networks", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29553)] [**`PTQ`**] 244 | - "Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs", arXiv, 2024. [[paper](http://arxiv.org/abs/2401.17544)] 245 | - "StableQ: Enhancing Data-Scarce Quantization with Text-to-Image Data", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.05272)] 246 | - "Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/71526)] [**`Extreme`**] 247 | - "TexQ: Zero-shot Network Quantization with Texture Feature Distribution Calibration", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/70325)] 248 | - "Overcoming Forgetting Catastrophe in Quantization-Aware Training", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Overcoming_Forgetting_Catastrophe_in_Quantization-Aware_Training_ICCV_2023_paper.pdf)] 249 | - "Causal-DFQ: Causality Guided Data-Free Network Quantization", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Shang_Causal-DFQ_Causality_Guided_Data-Free_Network_Quantization_ICCV_2023_paper.pdf)] [[code](https://github.com/42Shawn/Causal-DFQ)] 250 | - "DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_DenseShift_Towards_Accurate_and_Efficient_Low-Bit_Power-of-Two_Quantization_ICCV_2023_paper.pdf)] 251 | - "EQ-Net: Elastic Quantization Neural Networks", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_EQ-Net_Elastic_Quantization_Neural_Networks_ICCV_2023_paper.pdf)] [[code](https://github.com/xuke225/EQ-Net)] 252 | - "A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Colbert_A2Q_Accumulator-Aware_Quantization_with_Guaranteed_Overflow_Avoidance_ICCV_2023_paper.pdf)] 253 | - "EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Dong_EMQ_Evolving_Training-free_Proxies_for_Automated_Mixed_Precision_Quantization_ICCV_2023_paper.pdf)] [**`MP`**] 254 | - "Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Bai_Unified_Data-Free_Compression_Pruning_and_Quantization_without_Fine-Tuning_ICCV_2023_paper.pdf)] [**`PTQ`**] 255 | - "Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction", ICML, 2023.
[[paper](https://openreview.net/forum?id=m2S96Qf2R3)] [[code](https://github.com/SkoltechAI/fewbit)] 256 | - "FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization", ICML, 2023. [[paper](https://openreview.net/forum?id=EPnzNJTYsb)] [**`PTQ`**] 257 | - "Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning", PR, 2023. [[paper](http://arxiv.org/abs/2307.00498)] 258 | - "OMPQ: Orthogonal Mixed Precision Quantization", AAAI, 2023. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/26084)] [**`MP`**] 259 | - "Rethinking Data-Free Quantization as a Zero-Sum Game", AAAI, 2023. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/26136)] 260 | - "Quantized Feature Distillation for Network Quantization", AAAI, 2023. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/26354)] 261 | - "Resilient Binary Neural Network", AAAI, 2023. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/26261)] [**`Extreme`**] 262 | - "Fast and Accurate Binary Neural Networks Based on Depth-Width Reshaping", AAAI, 2023. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/26268)] [**`Extreme`**] 263 | - "Efficient Quantization-aware Training with Adaptive Coreset Selection", arXiv, 2023. [[paper](http://arxiv.org/abs/2306.07215)] 264 | - "One-Shot Model for Mixed-Precision Quantization", CVPR, 2023. [[paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Koryakovskiy_One-Shot_Model_for_Mixed-Precision_Quantization_CVPR_2023_paper.pdf)] [**`MP`**] 265 | - "Adaptive Data-Free Quantization", CVPR, 2023. [[paper](https://arxiv.org/abs/2303.06869)] 266 | - "Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization", CVPR, 2023. [[paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Lin_Bit-Shrinking_Limiting_Instantaneous_Sharpness_for_Improving_Post-Training_Quantization_CVPR_2023_paper.pdf)] [**`PTQ`**] 267 | - "Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective", CVPR, 2023. [[paper](https://arxiv.org/pdf/2303.11906.pdf)] [[code](https://github.com/bytedance/mrecg)] [**`PTQ`**] 268 | - "GENIE: Show Me the Data for Quantization", CVPR, 2023. [[paper](https://arxiv.org/abs/2212.04780)] [[code](https://github.com/SamsungLabs/Genie)] [**`PTQ`**] 269 | - "Bayesian asymmetric quantized neural networks", PR, 2023. [[paper](https://www.sciencedirect.com/science/article/pii/S0031320323001632)] 270 | - "Distribution-sensitive Information Retention for Accurate Binary Neural Network", IJCV, 2023. [[paper](https://arxiv.org/abs/2109.12338)] [**`Extreme`**] 271 | - "SDQ: Stochastic Differentiable Quantization with Mixed Precision", ICML, 2022. [[paper](https://proceedings.mlr.press/v162/huang22h.html)] [**`MP`**] 272 | - "Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks", ICML, 2022. [[paper](https://proceedings.mlr.press/v162/dong22a.html)] [[code](https://github.com/RunpeiDong/DGMS)] 273 | - "GACT: Activation Compressed Training for Generic Network Architectures", ICML, 2022. [[paper](https://proceedings.mlr.press/v162/liu22v.html)] [[code](https://github.com/LiuXiaoxuanPKU/GACT-ICML)] 274 | - "Overcoming Oscillations in Quantization-Aware Training", ICML, 2022. [[paper](https://proceedings.mlr.press/v162/nagel22a/nagel22a.pdf)] [[code](https://github.com/qualcomm-ai-research/oscillations-qat)] 275 | - "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation", CVPR, 2022. 
[[paper](https://arxiv.org/abs/2111.14826)] [[code](https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization)] [**`Non-uniform`**] 276 | - "Learnable Lookup Table for Neural Network Quantization", CVPR, 2022. [[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Learnable_Lookup_Table_for_Neural_Network_Quantization_CVPR_2022_paper.pdf)] [[code](https://github.com/The-Learning-And-Vision-Atelier-LAVA/LLT)] [**`Non-uniform`**] 277 | - "Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error", CVPR, 2022. [[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Jeon_Mr.BiQ_Post-Training_Non-Uniform_Quantization_Based_on_Minimizing_the_Reconstruction_Error_CVPR_2022_paper.pdf)] [**`PTQ`**] [**`Non-uniform`**] 278 | - "Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization", CVPR, 2022. [[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Chikin_Data-Free_Network_Compression_via_Parametric_Non-Uniform_Mixed_Precision_Quantization_CVPR_2022_paper.pdf)] [**`Non-uniform`**] [**`MP`**] 279 | - "IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization", CVPR, 2022. [[paper](https://openaccess.thecvf.com/content/CVPR2022/html/Zhong_IntraQ_Learning_Synthetic_Images_With_Intra-Class_Heterogeneity_for_Zero-Shot_Network_CVPR_2022_paper.html)] [[code](https://github.com/zysxmu/IntraQ)] 280 | - "Instance-Aware Dynamic Neural Network Quantization", CVPR, 2022. [[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_Instance-Aware_Dynamic_Neural_Network_Quantization_CVPR_2022_paper.pdf)] 281 | - "Leveraging Inter-Layer Dependency for Post-Training Quantization", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=54389)] [**`PTQ`**] 282 | - "Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=53476)] 283 | - "Entropy-Driven Mixed-Precision Quantization for Deep Network Design", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=54104)] [**`MP`**] 284 | - "Redistribution of Weights and Activations for AdderNet Quantization", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=54812)] 285 | - "FP8 Quantization: The Power of the Exponent", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=53073)] [[code](https://github.com/qualcomm-ai-research/fp8-quantization)] 286 | - "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=53412)] [[code](https://github.com/ist-daslab/obc)] [**`PTQ`**] 287 | - "ClimbQ: Class Imbalanced Quantization Enabling Robustness on Efficient Inferences", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=55162)] 288 | - "Non-Uniform Step Size Quantization for Accurate Post-Training Quantization", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136710657.pdf)] [**`PTQ`**] [**`Non-uniform`**] 289 | - "Towards Accurate Network Quantization with Equivalent Smooth Regularizer", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136710726.pdf)] 290 | - "BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks", ECCV, 2022. 
[[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136720017.pdf)] [[code](https://github.com/HanByulKim/BASQ)] 291 | - "RDO-Q: Extremely Fine-Grained Channel-Wise Quantization via Rate-Distortion Optimization", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136720156.pdf)] 292 | - "Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance", ECCV, 2022. [[paper](https://arxiv.org/abs/2203.08368)] [[code](https://github.com/1hunters/LIMPQ)] [**`MP`**] 293 | - "Symmetry Regularization and Saturating Nonlinearity for Robust Quantization", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136710207.pdf)] 294 | - "RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization", IJCAI, 2022. [[paper](https://www.ijcai.org/proceedings/2022/219)] [[code](https://github.com/billamihom/rapq)] [**`PTQ`**] 295 | - "MultiQuant: Training Once for Multi-bit Quantization of Neural Networks", IJCAI, 2022. [[paper](https://www.ijcai.org/proceedings/2022/504)] 296 | - "F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization", ICLR, 2022. [[paper](https://openreview.net/forum?id=_CfpJazzXT2)] 297 | - "8-bit Optimizers via Block-wise Quantization", ICLR, 2022. [[paper](https://openreview.net/forum?id=shpkpVXzo3h)] [[code](https://github.com/facebookresearch/bitsandbytes)] 298 | - "Information Bottleneck: Exact Analysis of (Quantized) Neural Networks", ICLR, 2022. [[paper](https://openreview.net/forum?id=kF9DZQQrU0w)] [[code](https://github.com/StephanLorenzen/ExactIBAnalysisInQNNs)] 299 | - "QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization", ICLR, 2022. [[paper](https://openreview.net/forum?id=ySQH0oDyp7)] [[code](https://github.com/wimh966/QDrop)] [**`PTQ`**] 300 | - "SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation", ICLR, 2022. [[paper](https://openreview.net/forum?id=JXhROKNZzOc)] [[code](https://github.com/clevercool/SQuant)] [**`PTQ`**] 301 | - "FILM-QNN: Efficient FPGA Acceleration of Deep Neural Networks with Intra-Layer, Mixed-Precision Quantization", FPGA, 2022. [[paper](https://dl.acm.org/doi/abs/10.1145/3490422.3502364)] [**`MP`**] 302 | - "Accurate Post Training Quantization with Small Calibration Sets", ICML, 2021. [[paper](http://proceedings.mlr.press/v139/hubara21a.html)] [[code](https://github.com/papers-submission/CalibTIP)] [**`PTQ`**] 303 | - "How Do Adam and Training Strategies Help BNNs Optimization?", ICML, 2021. [[paper](http://proceedings.mlr.press/v139/liu21t/liu21t.pdf)] [[code](https://github.com/liuzechun/AdamBNN)] 304 | - "ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training", ICML, 2021. [[paper](https://proceedings.mlr.press/v139/chen21z.html)] [[code](https://github.com/ucbrise/actnn)] 305 | - "HAWQ-V3: Dyadic Neural Network Quantization", ICML, 2021. [[paper](https://proceedings.mlr.press/v139/yao21a.html)] [[code](https://github.com/Zhen-Dong/HAWQ)] [**`MP`**] 306 | - "Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution", ICML, 2021. [[paper](https://proceedings.mlr.press/v139/zhang21r.html)] [**`MP`**] 307 | - "Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators", ICML, 2021.
[[paper](https://proceedings.mlr.press/v139/fu21d.html)] [[code](https://github.com/RICE-EIC/Auto-NBA)] 308 | - "Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples", NeurIPS, 2021. [[paper](https://openreview.net/forum?id=ejo1_Weiart)] [[code](https://github.com/iamkanghyunchoi/qimera)] 309 | - "Post-Training Sparsity-Aware Quantization", NeurIPS, 2021. [[paper](https://openreview.net/forum?id=qe9z54E_cqE)] [[code](https://github.com/gilshm/sparq)] [**`PTQ`**] 310 | - "Diversifying Sample Generation for Accurate Data-Free Quantization", CVPR, 2021. [[paper](https://arxiv.org/abs/2103.01049)] [**`PTQ`**] 311 | - "Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks", CVPR, 2021. [[paper](https://arxiv.org/abs/2010.15703)] [[code](https://github.com/uber-research/permute-quantize-finetune)] 312 | - "Learnable Companding Quantization for Accurate Low-bit Neural Networks", CVPR, 2021. [[paper](https://arxiv.org/abs/2103.07156)] 313 | - "Zero-shot Adversarial Quantization", CVPR, 2021. [[paper](https://arxiv.org/abs/2103.15263)] [[code](https://github.com/FLHonker/ZAQ-code)] 314 | - "Network Quantization with Element-wise Gradient Scaling", CVPR, 2021. [[paper](https://arxiv.org/abs/2104.00903)] [[code](https://github.com/cvlab-yonsei/EWGS)] 315 | - "High-Capacity Expert Binary Networks", ICLR, 2021. [[paper](https://openreview.net/forum?id=MxaY4FzOTa)] [[code](https://github.com/1adrianb/expert-binary-networks)] [**`Extreme`**] 316 | - "Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network", ICLR, 2021. [[paper](https://openreview.net/forum?id=U_mat0b9iv)] [[code](https://github.com/chrundle/biprop)] [**`Extreme`**] 317 | - "BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction", ICLR, 2021. [[paper](https://openreview.net/forum?id=POWv6hDd9XH)] [[code](https://github.com/yhhhli/BRECQ)] [**`PTQ`**] 318 | - "Neural gradients are near-lognormal: improved quantized and sparse training", ICLR, 2021. [[paper](https://openreview.net/forum?id=EoFNy62JGd)] 319 | - "Training with Quantization Noise for Extreme Model Compression", ICLR, 2021. [[paper](https://openreview.net/forum?id=dV19Yyi1fS3)] 320 | - "BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization", ICLR, 2021. [[paper](https://openreview.net/forum?id=TiXl51SCNw8)] [[code](https://github.com/yanghr/BSQ)] [**`MP`**] 321 | - "Simple Augmentation Goes a Long Way: ADRL for DNN Quantization", ICLR, 2021. [[paper](https://openreview.net/forum?id=Qr0aRliE_Hb)] 322 | - "Distribution Adaptive INT8 Quantization for Training CNNs", AAAI, 2021. [[paper](https://www.aaai.org/AAAI21Papers/AAAI-7144.ZhaoK.pdf)] 323 | - "Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks", AAAI, 2021. [[paper](https://arxiv.org/abs/2009.14502)] 324 | - "Optimizing Information Theory Based Bitwise Bottlenecks for Efficient Mixed-Precision Activation Quantization", AAAI, 2021. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/16474/16281)] [**`MP`**] 325 | - "OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization", AAAI, 2021.
[[paper](https://www.aaai.org/AAAI21Papers/AAAI-1054.HuP.pdf)] 326 | - "Scalable Verification of Quantized Neural Networks", AAAI, 2021. [[paper](https://arxiv.org/pdf/2012.08185)] [[code](https://github.com/mlech26l/qnn_robustness_benchmarks)] 327 | - "Uncertainty Quantification in CNN through the Bootstrap of Convex Neural Networks", AAAI, 2021. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/17434/17241)] 328 | - "FracBits: Mixed Precision Quantization via Fractional Bit-Widths", AAAI, 2021. [[paper](https://www.semanticscholar.org/paper/FracBits%3A-Mixed-Precision-Quantization-via-Yang-Jin/cb219432863778fa173925d51fbf02af1d17ad98)] [**`MP`**] 329 | - "Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision", AAAI, 2021. [[paper](https://arxiv.org/pdf/2002.09049)] [**`PTQ`**] [**`MP`**] 330 | - "ZeroQ: A Novel Zero Shot Quantization Framework", CVPR, 2020. [[paper](http://openaccess.thecvf.com/content_CVPR_2020/html/Cai_ZeroQ_A_Novel_Zero_Shot_Quantization_Framework_CVPR_2020_paper.html)] [[code](https://github.com/amirgholami/ZeroQ)] [**`PTQ`**] 331 | - "LSQ+: Improving Low-bit Quantization Through Learnable Offsets and Better Initialization", CVPR Workshop, 2020. [[paper](http://openaccess.thecvf.com/content_CVPRW_2020/html/w40/Bhalgat_LSQ_Improving_Low-Bit_Quantization_Through_Learnable_Offsets_and_Better_Initialization_CVPRW_2020_paper.html)] 332 | - "HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks", NeurIPS, 2020. [[paper](https://proceedings.neurips.cc/paper/2020/hash/d77c703536718b95308130ff2e5cf9ee-Abstract.html)] [**`MP`**] 333 | - "Learned Step Size Quantization", ICLR, 2020. [[paper](https://openreview.net/forum?id=rkgO66VKDS)] 334 | - "HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision", ICCV, 2019. [[paper](https://openaccess.thecvf.com/content_ICCV_2019/html/Dong_HAWQ_Hessian_AWare_Quantization_of_Neural_Networks_With_Mixed-Precision_ICCV_2019_paper.html)] [**`MP`**] 335 | - "Data-Free Quantization Through Weight Equalization and Bias Correction", ICCV, 2019. [[paper](https://openaccess.thecvf.com/content_ICCV_2019/html/Nagel_Data-Free_Quantization_Through_Weight_Equalization_and_Bias_Correction_ICCV_2019_paper.html)] [**`PTQ`**] 336 | - "HAQ: Hardware-Aware Automated Quantization with Mixed Precision", CVPR, 2019. [[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_HAQ_Hardware-Aware_Automated_Quantization_With_Mixed_Precision_CVPR_2019_paper.pdf)] [[code](https://github.com/mit-han-lab/haq)] [**`MP`**] 337 | - "PACT: Parameterized Clipping Activation for Quantized Neural Networks", arXiv, 2018. [[paper](https://arxiv.org/abs/1805.06085)] 338 | - "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", CVPR, 2018. [[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)] 339 | 340 | 341 | [[Back to Overview](#overview)] 342 |
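Many of the QAT entries in the classification list above (LSQ, PACT, EWGS, the oscillation papers, and others) build on the straight-through estimator (STE): the forward pass applies rounding and clipping, while the backward pass treats rounding as identity so gradients can reach the underlying float weights. A minimal PyTorch sketch of this shared building block follows; the class name and the fixed 4-bit grid are illustrative assumptions, not any cited paper's implementation.

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Quantize in forward; pass gradients straight through in backward."""

    @staticmethod
    def forward(ctx, x, scale, qmin, qmax):
        q = torch.round(x / scale)
        ctx.save_for_backward((q >= qmin) & (q <= qmax))
        return torch.clamp(q, qmin, qmax) * scale  # dequantized output

    @staticmethod
    def backward(ctx, grad_output):
        (in_range,) = ctx.saved_tensors
        # STE: identity gradient inside the clipping range, zero outside.
        return grad_output * in_range, None, None, None

x = torch.randn(8, requires_grad=True)
y = FakeQuantSTE.apply(x, torch.tensor(0.1), -8, 7)  # 4-bit symmetric grid
y.sum().backward()
print(x.grad)  # ones where x/scale stayed in range, zeros where it was clipped
```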
343 | ### Other Tasks 344 | #### Object Detection 345 | - "Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Ding_Reg-PTQ_Regression-specialized_Post-training_Quantization_for_Fully_Quantized_Object_Detector_CVPR_2024_paper.html)] [**`PTQ`**] 346 | - "Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric", arXiv, 2023. [[paper](https://arxiv.org/abs/2304.09785)] [**`PTQ`**] 347 | - "AQD: Towards Accurate Quantized Object Detection", CVPR, 2021. [[paper](http://arxiv.org/abs/2007.06919)] 348 | - "BiDet: An Efficient Binarized Object Detector", CVPR, 2020. [[paper](https://arxiv.org/abs/2003.03961)] [[code](https://github.com/ZiweiWangTHU/BiDet)] [**`Extreme`**] 349 | - "Fully Quantized Network for Object Detection", CVPR, 2019. [[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Fully_Quantized_Network_for_Object_Detection_CVPR_2019_paper.pdf)] 350 | 351 | [[Back to Overview](#overview)] 352 | 353 | #### Super Resolution 354 | - "Towards Robust Full Low-bit Quantization of Super Resolution Networks", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/9567_ECCV_2024_paper.php)] [**`PTQ`**] 355 | - "Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/2121_ECCV_2024_paper.php)] 356 | - "QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/72890)] 357 | - "Toward Accurate Post-Training Quantization for Image Super Resolution", CVPR, 2023. [[paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Tu_Toward_Accurate_Post-Training_Quantization_for_Image_Super_Resolution_CVPR_2023_paper.pdf)] [[code](https://github.com/huawei-noah/Efficient-Computing/tree/master/Quantization/PTQ4SR)] [**`PTQ`**] 358 | - "EBSR: Enhanced Binary Neural Network for Image Super-Resolution", arXiv, 2023. [[paper](https://arxiv.org/abs/2303.12270)] [**`Extreme`**] 359 | - "CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136670360.pdf)] [[code](https://github.com/cheeun/cadyq)] 361 | - "Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks", ECCV, 2022. [[paper](https://arxiv.org/abs/2203.03844)] [[code](https://github.com/zysxmu/ddtb)] 362 | - "DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks", WACV, 2022. [[paper](http://openaccess.thecvf.com/content/WACV2022/html/Hong_DAQ_Channel-Wise_Distribution-Aware_Quantization_for_Deep_Image_Super-Resolution_Networks_WACV_2022_paper.html)] [[code](https://github.com/Cheeun/DAQ-pytorch)] 363 | - "Fully Quantized Image Super-Resolution Networks", ACM MM, 2021. [[paper](https://arxiv.org/abs/2011.14265)] [[code](https://github.com/billhhh/FQSR)] 364 | - "PAMS: Quantized Super-Resolution via Parameterized Max Scale", ECCV, 2020. [[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123700562.pdf)] [[code](https://github.com/colorjam/PAMS)] 365 | - "Training Binary Neural Network without Batch Normalization for Image Super-Resolution", AAAI, 2021. [[paper](https://ojs.aaai.org/index.php/AAAI/article/download/16263/16070)] [**`Extreme`**] 366 | 367 | [[Back to Overview](#overview)] 368 | 369 | #### Point Cloud 370 | - "LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection", ICLR, 2024.
[[paper](https://openreview.net/forum?id=0d1gQI114C)] [**`PTQ`**] 371 | - "Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis", arXiv, 2023. [[paper](https://arxiv.org/abs/2303.15493)] [**`Extreme`**] 372 | - "BiPointNet: Binary Neural Network for Point Clouds", ICLR, 2021. [[paper](https://openreview.net/forum?id=9QLRCVysdlO)] [[code](https://github.com/htqin/BiPointNet)] [**`Extreme`**] 373 | 374 | [[Back to Overview](#overview)] 375 | 376 | 377 | 378 | --- 379 | 380 | ## References 381 | * Online Resources: 382 | * [MQBench (Benchmark)](http://mqbench.tech/) 383 | * [Awesome Model Quantization (GitHub)](https://github.com/htqin/awesome-model-quantization) 384 | * [Awesome Transformer Attention (GitHub)](https://github.com/cmhungsteve/Awesome-Transformer-Attention) 385 | 386 | 387 | --------------------------------------------------------------------------------