├── LICENSE
├── Awesome-Quantization-Papers
│   └── Awesome_Quantization_Papers.csv
└── README.md

/LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2022 Zhen Dong 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /Awesome-Quantization-Papers/Awesome_Quantization_Papers.csv: -------------------------------------------------------------------------------- 1 | Title,Publication,Bit,Quantizer,Finetune,Task,Special 2 | Fully integer-based quantization for mobile convolutional neural network inference,Neurocomputing 2021,,,,, 3 | S^3: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks,NeurIPS 2021,T,,,C, 4 | BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer,NeurIPS 2021,MP,,QAT,C, 5 | CBP: backpropagation with constraint on weight precision using a pseudo-Lagrange multiplier method[PyTorch],NeurIPS 2021,B/T/Uni,,QAT,C, 6 | "Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals",NeurIPS 2021,B/Uni,,,C, 7 | Learning Frequency Domain Approximation for Binary Neural Networks,NeurIPS 2021,B,,QAT,C, 8 | Post-Training Quantization for Vision Transformer,NeurIPS 2021,Uni,,PTQ,C, 9 | Post-Training Sparsity-Aware Quantization[PyTorch]:star:5,NeurIPS 2021,Uni,,PTQ,C, 10 | Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples[PyTorch]:star:2,NeurIPS 2021,Uni,,,C, 11 | Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes[PyTorch]:star:2,NeurIPS 2021,Uni,,QAT,C, 12 | QuPeD: Quantized Personalization via Distillation with Applications to Federated Learning,NeurIPS 2021,B/Uni,,QAT,C, 13 | VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization,NeurIPS 2021,MP,PQ,QAT,Node Classification, 14 | Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression,EMNLP 2021,Uni,,QAT/PTQ,N, 15 | Compressing Word Embeddings via Deep Compositional Code Learning:fire:89[PyTorch]:star:81,EMNLP 2021,MP,,,N, 16 | Matching-oriented Embedding Quantization For Ad-hoc Retrieval[PyTorch],EMNLP 2021,MP,PQ,QAT,N, 17 | Understanding and Overcoming the Challenges of Efficient Transformer Quantization[PyTorch]:star:9,EMNLP 2021,Uni,,QAT/PTQ,N, 18 | Fully Quantized Image Super-Resolution Networks,MM 2021,,,,, 19 | VQMG: Hierarchical 
Vector Quantised and Multi-hops Graph Reasoning for Explicit Representation Learning,MM 2021,,,,, 20 | Cluster-Promoting Quantization With Bit-Drop for Minimizing Network Quantization Loss,ICCV 2021,T/Uni,LQ,,C, 21 | Distance-Aware Quantization,ICCV 2021,B/T/Uni,LQ,QAT,C, 22 | Dynamic Network Quantization for Efficient Video Inference,ICCV 2021,MP,,,Video Recognition, 23 | Generalizable Mixed-Precision Quantization via Attribution Rank Preservation[PyTorch]:star:15,ICCV 2021,MP,LQ,QAT,C, 24 | Improving Low-Precision Network Quantization via Bin Regularization,ICCV 2021,B/T/Uni,LQ,QAT,C, 25 | Improving Neural Network Efficiency via Post-Training Quantization With Adaptive Floating-Point[PyTorch],ICCV 2021,MP,Linear,PTQ,C, 26 | Integer-Arithmetic-Only Certified Robustness for Quantized Neural Networks,ICCV 2021,Uni,Linear,QAT,C, 27 | MixMix: All You Need for Data-Free Compression Are Feature and Data Mixing,ICCV 2021,Uni,Linear,QAT/PTQ,C/O, 28 | Once Quantization-Aware Training: High Performance Extremely Low-Bit Architecture Search[PyTorch]:star:19,ICCV 2021,Uni,,QAT,C, 29 | Product Quantizer Aware Inverted Index for Scalable Nearest Neighbor Search,ICCV 2021,MP,PQ,QAT,C, 30 | RMSMP: A Novel Deep Neural Network Quantization Framework With Row-Wise Mixed Schemes and Multiple Precisions,ICCV 2021,MP,,QAT/PTQ,C/N, 31 | ReCU: Reviving the Dead Weights in Binary Neural Networks,ICCV 2021,B,,,C, 32 | Self-Supervised Product Quantization for Deep Unsupervised Image Retrieval[PyTorch]:star:22,ICCV 2021,Uni,PQ,QAT,C, 33 | Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks[PyTorch]:star:7,ICCV 2021,B,,,C, 34 | Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization,ICCV 2021,MP,Linear,PTQ,C, 35 | 1-bit Adam: Communication Efficient Large-Scale Training with Adam’s Convergence Speed,ICML 2021,B,,,C/N, 36 | Accurate Post Training Quantization With Small Calibration Sets[PyTorch]:star:14,ICML 2021,MP,,PTQ,C/N, 37 | ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training,ICML 2021,MP,,,C, 38 | Communication-Efficient Distributed Optimization with Quantized Preconditioners,ICML 2021,,PQ,,, 39 | Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution,ICML 2021,MP,,QAT,C, 40 | Double-Win Quant: Aggressively Winning Robustness of Quantized Deep Neural Networks via Random Precision Training and Inference[PyTorch]:star:3,ICML 2021,MP,,,C, 41 | Estimation and Quantization of Expected Persistence Diagrams,ICML 2021,,,,, 42 | HAWQ-V3: Dyadic Neural Network Quantization[PyTorch]:star:193,ICML 2021,MP,Linear,,C, 43 | I-BERT: Integer-only BERT Quantization,ICML 2021,Uni,,,N, 44 | Quantization Algorithms for Random Fourier Features,ICML 2021,,,,, 45 | Soft then Hard: Rethinking the Quantization in Neural Image Compression,ICML 2021,,,,, 46 | Training Quantized Neural Networks to Global Optimality via Semidefinite Programming,ICML 2021,,,,, 47 | Vector Quantized Models for Planning,ICML 2021,,,,, 48 | Distribution-aware Adaptive Multi-bit Quantization,CVPR 2021,,,,, 49 | Layer importance estimation with imprinting for neural network quantization,CVPR 2021,,,,, 50 | Adaptive binary-ternary quantization,CVPR 2021,,,,, 51 | AQD: Towards Accurate Quantized Object Detection[PyTorch]:star:11,CVPR 2021,,,,, 52 | Automated Log-Scale Quantization for Low-Cost Deep Neural Networks,CVPR 2021,T,Log,QAT,C/S, 53 | Binary TTC: A Temporal Geofence for Autonomous Navigation,CVPR 2021,,,,, 54 | Diversifying Sample 
Generation for Accurate Data-Free Quantization,CVPR 2021,,,,, 55 | Generative Zero-shot Network Quantization,CVPR 2021,Uni,OptN,QAT,C, 56 | Improving Accuracy of Binary Neural Networks using Unbalanced Activation Distribution,CVPR 2021,,,,, 57 | Is In-Domain Data Really Needed? A Pilot Study on Cross-Domain Calibration for Network Quantization,CVPR 2021,Uni,OptN,PTQ,C/O/S, 58 | Learnable Companding Quantization for Accurate Low-Bit Neural Networks,CVPR 2021,,,,, 59 | Network Quantization With Element-Wise Gradient Scaling,CVPR 2021,,,,, 60 | Optimal Quantization Using Scaled Codebook,CVPR 2021,,,,, 61 | "Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks[PyTorch]:star:109",CVPR 2021,,,,, 62 | QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks,CVPR 2021,Uni,LQ,PTQ,C/O/S, 63 | Zero-shot Adversarial Quantization,CVPR 2021,Uni,Linear,PTQ,C/O, 64 | iVPF: Numerical Invertible Volume Preserving Flow for Efficient Lossless Compression,CVPR 2021,,,,, 65 | HAO: Hardware-aware neural Architecture Optimization for Efficient Inference,FCCM 2021,,,,, 66 | Bipointnet: Binary neural network for point clouds,ICLR 2021,,,,, 67 | Sparse quantized spectral clustering,ICLR 2021,,,,, 68 | Neural gradients are near-lognormal: improved quantized and sparse training,ICLR 2021,,,,, 69 | Multiprize lottery ticket hypothesis: Finding accurate binary neural networks by pruning a randomly weighted network,ICLR 2021,,,,, 70 | High-capacity expert binary networks,ICLR 2021,,,,, 71 | BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction[PyTorch]:star:58,ICLR 2021,Uni/MP,Linear,PTQ,C/O, 72 | BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization[PyTorch]:star:9,ICLR 2021,MP,Linear,QAT,C, 73 | Degree-Quant: Quantization-Aware Training for Graph Neural Networks[PyTorch]:star:17,ICLR 2021,Uni,Linear,QAT,C, 74 | Incremental few-shot learning via vector quantization in deep embedded space,ICLR 2021,,,,, 75 | Simple Augmentation Goes a Long Way: ADRL for DNN Quantization,ICLR 2021,MP,Linear,QAT,C, 76 | Training with Quantization Noise for Extreme Model Compression,ICLR 2021,Uni,PQ,QAT,C, 77 | CoDeNet: Algorithm-hardware Co-design for Deformable Convolution,FPGA 2021,,,,, 78 | A Survey of Quantization Methods for Efficient Neural Network Inference:fire:47,BLPCV 2021,,,,, 79 | Opq: Compressing deep neural networks with one-shot pruning quantization,AAAI 2021,,,,, 80 | A white paper on neural network quantization,arXiv 2021,,,,, 81 | Kdlsq-bert: A quantized bert combining knowledge distillation with learned step size quantization,arXiv 2021,,,,, 82 | Pruning and quantization for deep neural network acceleration: A survey,arXiv 2021,,,,, 83 | Confounding tradeoffs for neural network quantization,arXiv 2021,,,,, 84 | Dynamic precision analog computing for neural networks,arXiv 2021,,,,, 85 | Boolnet: Minimizing the energy consumption of binary neural networks,arXiv 2021,,,,, 86 | Quantization-aware pruning for efficient low latency neural network inference,arXiv 2021,,,,, 87 | Any-Precision Deep Neural Networks[PyTorch]:star:33,arXiv 2021,MP,,,, 88 | Phoenix: A Low-Precision Floating-Point Quantization Oriented Architecture for Convolutional Neural Networks,TVLSI 2020,,,,, 89 | Hierarchical Binary CNNs for Landmark Localization with Limited Resources,TPAMI 2020,B,,,, 90 | Deep Neural Network Compression by In-Parallel Pruning-Quantization,TPAMI 2020,,,,, 91 | Towards Efficient U-Nets: A Coupled and Quantized Approach,TPAMI 2020,,,,, 92 
| SIMBA: A Skyrmionic In-Memory Binary Neural Network Accelerator,TMAG 2020,B,,,, 93 | Design of High Robustness BNN Inference Accelerator Based on Binary Memristors,TED 2020,B,,,, 94 | A Resource-Efficient Inference Accelerator for Binary Convolutional Neural Networks,TCSII 2020,,,,, 95 | "Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations:fire:126",JSAIT 2020,,,,,Gradient 96 | An Energy-Efficient and High Throughput in-Memory Computing Bit-Cell With Excellent Robustness Under Process Variations for Binary Neural Network,ACCESS 2020,B,,,, 97 | CP-NAS: Child-Parent Neural Architecture Search for Binary Neural Networks,IJCAI 2020,B,,,, 98 | Towards Fully 8-bit Integer Inference for the Transformer Model,IJCAI 2020,,,,, 99 | Soft Threshold Ternary Networks,IJCAI 2020,,,,, 100 | Overflow Aware Quantization: Accelerating Neural Network Inference by Low-bit Multiply-Accumulate Operations,IJCAI 2020,,,,, 101 | Direct Quantization for Training Highly Accurate Low Bit-width Deep Neural Networks,IJCAI 2020,,,,, 102 | Fully Nested Neural Network for Adaptive Compression and Quantization,IJCAI 2020,,,,, 103 | Path sample-analytic gradient estimators for stochastic binary networks,NeurIPS 2020,,,,, 104 | Efficient exact verification of binarized neural networks,NeurIPS 2020,,,,, 105 | Comparing fisher information regularization with distillation for dnn quantization,NeurIPS 2020,,,,, 106 | Position-based scaled gradient for model quantization and sparse training,NeurIPS 2020,,,,, 107 | Flexor: Trainable fractional quantization,NeurIPS 2020,,,,, 108 | Adaptive Gradient Quantization for Data-Parallel SGD[PyTorch]:star:13,NeurIPS 2020,T,,QAT,C, 109 | Bayesian Bits: Unifying Quantization and Pruning,NeurIPS 2020,MP,,QAT/PTQ,C, 110 | "Distribution-free binary classification: prediction sets, confidence intervals and calibration",NeurIPS 2020,B,,,, 111 | FleXOR: Trainable Fractional Quantization,NeurIPS 2020,,,,, 112 | HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks:fire:60,NeurIPS 2020,MP,Linear,QAT,C/O, 113 | Hierarchical Quantized Autoencoders[PyTorch]:star:22,NeurIPS 2020,,Linear,,Image Compression, 114 | Position-based Scaled Gradient for Model Quantization and Pruning[PyTorch]:star:14,NeurIPS 2020,Uni,,PTQ,C, 115 | Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point,NeurIPS 2020,,,,, 116 | Quantized Variational Inference[PyTorch],NeurIPS 2020,,,,, 117 | Robust Quantization: One Model to Rule Them All,NeurIPS 2020,Uni,Linear,QAT/PTQ,C, 118 | Rotated Binary Neural Network[PyTorch]:star:63,NeurIPS 2020,B,,,C, 119 | Searching for Low-Bit Weights in Quantized Neural Networks[PyTorch]:star:20,NeurIPS 2020,,,,, 120 | Ultra-Low Precision 4-bit Training of Deep Neural Networks,NeurIPS 2020,Uni,,QAT,C, 121 | Universally Quantized Neural Compression,NeurIPS 2020,,,,, 122 | TernaryBERT: Distillation-aware Ultra-low Bit BERT,EMNLP 2020,T,OptN,QAT,N, 123 | DFQF: Data Free Quantization-aware Fine-tuning,ACML 2020,Uni,Linear,QAT,C, 124 | One weight bitwidth to rule them all,ECCV 2020,,,,, 125 | DBQ: A Differentiable Branch Quantizer for Lightweight Deep Neural Networks,ECCV 2020,MP,,,C, 126 | Deep Transferring Quantization[PyTorch]:star:15,ECCV 2020,Uni,,,C, 127 | Differentiable Joint Pruning and Quantization for Hardware Efficiency,ECCV 2020,MP,,,C, 128 | Finding Non-Uniform Quantization Schemes using Multi-Task Gaussian Processes,ECCV 2020,MP,PQ,QAT,C, 129 | Generative Low-bitwidth Data Free 
Quantization[PyTorch]:star:23,ECCV 2020,Uni,Linear,QAT,C, 130 | HMQ: Hardware Friendly Mixed Precision Quantization Block for CNNs[PyTorch]:star:37,ECCV 2020,MP,LQ,QAT,C, 131 | PAMS: Quantized Super-Resolution via Parameterized Max Scale,ECCV 2020,Uni,,,C, 132 | Post-Training Piecewise Linear Quantization for Deep Neural Networks,ECCV 2020,T/Uni,Linear,PTQ,C, 133 | QuEST: Quantized Embedding Space for Transferring Knowledge,ECCV 2020,,,,, 134 | Quantization Guided JPEG Artifact Correction,ECCV 2020,,,,, 135 | Search What You Want: Barrier Panelty NAS for Mixed Precision Quantization,ECCV 2020,MP,Linear,,C/O, 136 | Task-Aware Quantization Network for JPEG Image Compression,ECCV 2020,,,,, 137 | End to End Binarized Neural Networks for Text Classification,ACL 2020,B,,,, 138 | Differentiable Product Quantization for End-to-End Embedding Compression[PyTorch]:star:38,ICML 2020,MP,PQ,QAT,N, 139 | Don’t Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript,ICML 2020,MP,,,C, 140 | Moniqua: Modulo Quantized Communication in Decentralized SGD,ICML 2020,B/T,,,C, 141 | Online Learned Continual Compression with Adaptive Quantization Modules[PyTorch]:star:19,ICML 2020,,,,, 142 | Towards Accurate Post-training Network Quantization via Bit-Split and Stitching[PyTorch]:star:23,ICML 2020,T/Uni,OptN,PTQ,C/O, 143 | Training Binary Neural Networks through Learning with Noisy Supervision,ICML 2020,B,,QAT,C, 144 | Up or Down? Adaptive Rounding for Post-Training Quantization,ICML 2020,,,PTQ,, 145 | Variational Bayesian Quantization[PyTorch]:star:19,ICML 2020,,,,, 146 | Compressing deep neural networks on FPGAs to binary and ternary precision with HLS4ML,MLST 2020,,,,, 147 | Balanced binary neural networks with gated residual,ICASSP 2020,,,,, 148 | A Spatial RNN Codec for End-To-End Image Compression,CVPR 2020,,,,, 149 | "APQ: Joint Search for Network Architecture, Pruning and Quantization Policy",CVPR 2020,MP,Linear,QAT,C, 150 | AdaBits: Neural Network Quantization With Adaptive Bit-Widths,CVPR 2020,MP,Linear,PTQ,C, 151 | Adaptive Loss-Aware Quantization for Multi-Bit Networks,CVPR 2020,MP,LQ,QAT,C, 152 | Automatic Neural Network Compression by Sparsity-Quantization Joint Learning: A Constrained Optimization-Based Approach,CVPR 2020,MP,OptN,PTQ,C, 153 | Central Similarity Quantization for Efficient Image and Video Retrieval[PyTorch]:star:161,CVPR 2020,Uni,Linear,QAT,C, 154 | Data-Free Network Quantization With Adversarial Knowledge Distillation,CVPR 2020,Uni,Linear,QAT,C, 155 | Forward and Backward Information Retention for Accurate Binary Neural Networks[PyTorch]:star:133,CVPR 2020,,,,, 156 | Generalized Product Quantization Network for Semi-Supervised Image Retrieval,CVPR 2020,MP,LQ,QAT,C, 157 | LSQ+: Improving low-bit quantization through learnable offsets and better initialization,CVPR 2020,MP,LQ,QAT,C, 158 | M-LVC: Multiple Frames Prediction for Learned Video Compression[PyTorch]:star:51,CVPR 2020,,,,, 159 | OctSqueeze: Octree-Structured Entropy Model for LiDAR Compression,CVPR 2020,,,,, 160 | Structured Compression by Weight Encryption for Unstructured Pruning and Quantization,CVPR 2020,T,OptN,QAT,C, 161 | Training Quantized Neural Networks With a Full-Precision Auxiliary Module,CVPR 2020,,,,, 162 | ZeroQ: A Novel Zero Shot Quantization Framework:fire:106[PyTorch]:star:188,CVPR 2020,MP,Linear,PTQ,C/O, 163 | Neural network quantization with adaptive bitwidths,CVPR 2020,,,,, 164 | BNNsplit: Binarized Neural Networks for embedded distributed FPGA-based computing systems,DATE 
2020,B,,,, 165 | PhoneBit: Efficient GPU-Accelerated Binary Neural Network Inference Engine for Mobile Phones,DATE 2020,B,,,, 166 | OrthrusPE: Runtime Reconfigurable Processing Elements for Binary Neural Networks,DATE 2020,B,,,, 167 | A Novel In-DRAM Accelerator Architecture for Binary Neural Network,COOLCHIPS 2020,,,,, 168 | BNN Pruning: Pruning Binary Neural Network Guided by Weight Flipping Frequency,ISQED 2020,B,,,, 169 | Training binary neural networks with real-to-binary convolutions:fire:66,ICLR 2020,,,,, 170 | Binaryduo: Reducing gradient mismatch in binary activation network by coupling binary activations,ICLR 2020,,,,, 171 | Dms: Differentiable dimension search for binary neural networks,ICLR 2020,,,,, 172 | Once-for-all: Train one network and specialize it for efficient deployment,ICLR 2020,,,,, 173 | Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks[PyTorch]:star:150,ICLR 2020,MP,,,C, 174 | And the Bit Goes Down: Revisiting the Quantization of Neural Networks:fire:64[PyTorch]:star:619,ICLR 2020,MP,PQ,,C, 175 | AutoQ: Automated Kernel-Wise Neural Network Quantization,ICLR 2020,MP,,,C, 176 | FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary,ICLR 2020,Uni,Linear,PTQ,C/O, 177 | Gradient $\ell_1$ Regularization for Quantization Robustness,ICLR 2020,Uni,Linear,,C, 178 | Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech,ICLR 2020,MP,PQ,,Speech, 179 | Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware,ICLR 2020,B/T/Uni,LQ,QAT,C/O, 180 | Mixed Precision DNNs: All you need is a good parametrization,ICLR 2020,MP,LQ,QAT,C, 181 | Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations,ICLR 2020,MP,,,C/N, 182 | Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks,ICLR 2020,Uni,,,C/N, 183 | MuBiNN: Multi-Level Binarized Recurrent Neural Network for EEG Signal Classification,ISCAS 2020,B,,,, 184 | Riptide: Fast End-to-End Binarized Neural Networks,SysML 2020,,,,, 185 | Adversarial Attack on Deep Product Quantization Network for Image Retrieval,AAAI 2020,,PQ,,Image Retrieval, 186 | Aggregated Learning: A Vector-Quantization Approach to Learning Neural Network Classifiers[PyTorch]:star:3,AAAI 2020,,PQ,,C, 187 | Embedding Compression with Isotropic Iterative Quantization,AAAI 2020,,,,N/Image Retrieval, 188 | HLHLp: Quantized Neural Networks Training for Reaching Flat Minima in Loss Surface,AAAI 2020,B,Linear,QAT,C/N, 189 | Indirect Stochastic Gradient Quantization and its Application in Distributed Deep Learning,AAAI 2020,,,,, 190 | Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search,AAAI 2020,,,,, 191 | Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT:fire:125,AAAI 2020,MP,Linear,QAT,N, 192 | Quantized Compressive Sampling of Stochastic Gradients for Efficient Communication in Distributed Deep Learning,AAAI 2020,,,,, 193 | RTN: Reparameterized Ternary Network,AAAI 2020,,,,, 194 | Towards Accurate Low Bit-width Quantization with Multiple Phase Adaptations,AAAI 2020,,,,, 195 | Towards Accurate Quantization and Pruning via Data-free Knowledge Transfer,AAAI 2020,MP,LQ,PTQ,C, 196 | Vector Quantization-Based Regularization for Autoencoders[PyTorch]:star:11,AAAI 2020,,,,, 197 | Training Binary Neural Networks using the Bayesian Learning Rule,CoRR 2020,B,,,, 198 | Integer quantization for deep learning inference: Principles and empirical 
evaluation,arXiv 2020,,,,, 199 | Wrapnet: Neural net inference with ultra-low-resolution arithmetic,arXiv 2020,,,,, 200 | Leveraging automated mixed-low-precision quantization for tiny edge microcontrollers,arXiv 2020,,,,, 201 | Biqgemm: matrix multiplication with lookup table for binary-coding-based quantized dnns,arXiv 2020,,,,, 202 | Near-lossless post-training quantization of deep neural networks via a piecewise linear approximation,arXiv 2020,,,,, 203 | Efficient execution of quantized deep learning models: A compiler approach,arXiv 2020,,,,, 204 | A statistical framework for low-bitwidth training of deep neural networks,arXiv 2020,,,,, 205 | What is the state of neural network pruning?,arXiv 2020,,,,, 206 | Language models are few-shot learners,arXiv 2020,,,,, 207 | Shifted and squeezed 8-bit floating point format for low-precision training of deep neural networks,arXiv 2020,,,,, 208 | Gradient l1 regularization for quantization robustness,arXiv 2020,,,,, 209 | BinaryBERT: Pushing the Limit of BERT Quantization,arXiv 2020,B,,,, 210 | Understanding Learning Dynamics of Binary Neural Networks via Information Bottleneck,arXiv 2020,B,,,, 211 | Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation,arXiv 2020,B,,,, 212 | RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks,arXiv 2020,B,,,, 213 | MeliusNet: Can Binary Neural Networks Achieve MobileNet-level Accuracy?,arXiv 2020,B,,,, 214 | Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs,arXiv 2020,B,,,, 215 | Distillation Guided Residual Learning for Binary Convolutional Neural Networks,arXiv 2020,B,,,, 216 | How Does Batch Normalization Help Binary Training?,arXiv 2020,B,,,, 217 | Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming[PyTorch],arXiv 2020,MP,OptN,PTQ,C/N, 218 | A Product Engine for Energy-Efficient Execution of Binary Neural Networks Using Resistive Memories,VLSI-SoC 2019,,,,, 219 | Deep Binary Reconstruction for Cross-Modal Hashing:fire:78,TMM 2019,,,,, 220 | Compact Hash Code Learning With Binary Deep Neural Network,TM 2019,,,,, 221 | Recursive Binary Neural Network Training Model for Efficient Usage of On-Chip Memory,TCSI 2019,,,,, 222 | Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays,TCSI 2019,,,,, 223 | An Energy-Efficient Reconfigurable Processor for Binary-and Ternary-Weight Neural Networks With Flexible Data Bit Width,JSSC 2019,,,,, 224 | A Review of Binarized Neural Networks,Electronics 2019,,,,, 225 | Binarized Neural Networks for Resource-Efficient Hashing with Minimizing Quantization Loss,IJCAI 2019,,,,, 226 | Binarized Collaborative Filtering with Distilling Graph Convolutional Network,IJCAI 2019,,,,, 227 | BiScaled-DNN: Quantizing Long-tailed Datastructures with Two Scale Factors for Deep Neural Networks,DAC 2019,,,,, 228 | Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization:fire:53,NeurIPS 2019,,,,, 229 | A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off[PyTorch]:star:12,NeurIPS 2019,,,,,Theory 230 | Bit Efficient Quantization for Deep Neural Networks,NeurIPS 2019,Uni,Linear/Log,PTQ,C, 231 | Communication-Efficient Distributed Learning via Lazily Aggregated Quantized Gradients,NeurIPS 2019,,,,, 232 | Dimension-Free Bounds for Low-Precision Training,NeurIPS 2019,,Log,,,Theory 233 | Double Quantization for Communication-Efficient Distributed Optimization,NeurIPS 2019,,,,, 234 | Focused 
Quantization for Sparse CNNs,NeurIPS 2019,Uni,LQ,,C, 235 | Generalization Error Analysis of Quantized Compressive Learning,NeurIPS 2019,,,,,Theory 236 | Hybrid 8-bit Floating Point (HFP8) Training and Inference for Deep Neural Networks:fire:56,NeurIPS 2019,Uni,,PTQ,C/O/N, 237 | MetaQuant: Learning to Quantize by Learning to Penetrate Non-differentiable Quantization[PyTorch]:star:48,NeurIPS 2019,B,,QAT,C, 238 | Model Compression with Adversarial Robustness: A Unified Optimization Framework,NeurIPS 2019,Uni,,,C, 239 | Normalization Helps Training of Quantized LSTM,NeurIPS 2019,B/T/Uni,,,C/N, 240 | Post-training 4-bit quantization of convolution networks for rapid-deployment:fire:161[PyTorch]:star:163,NeurIPS 2019,T/Uni,,PTQ,C, 241 | "Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations:fire:126",NeurIPS 2019,B,Randomized/Sign,,C, 242 | Using Neuroevolved Binary Neural Networks to solve reinforcement learning environments,APCCAS 2019,,,,, 243 | BinaryDenseNet: Developing an architecture for binary neural networks,ICCVW 2019,,,,, 244 | Low-bit quantization of neural networks for efficient inference:fire:112,ICCV 2019,,,,, 245 | Bayesian Optimized 1-Bit CNNs,ICCV 2019,B,,,C, 246 | Data-Free Quantization Through Weight Equalization and Bias Correction:fire:135,ICCV 2019,Uni,Linear,PTQ,C, 247 | Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks:fire:129,ICCV 2019,B/T/Uni,,,C, 248 | HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision:fire:155,ICCV 2019,MP,Linear,QAT,C, 249 | Proximal Mean-Field for Neural Network Quantization,ICCV 2019,B,,,C, 250 | Unsupervised Neural Quantization for Compressed-Domain Similarity Search[PyTorch]:star:28,ICCV 2019,MP,LQ,,Image Retrieval, 251 | Training Accurate Binary Neural Networks from Scratch,ICIP 2019,,,,, 252 | Binarized Depthwise Separable Neural Network for Object Tracking in FPGA,GLSVLSI 2019,,,,, 253 | Flow++: Improving Flow-Based Generative Models with Variational Dequantization and Architecture Design:fire:169[PyTorch]:star:152,ICML 2019,,,,, 254 | Improving Neural Network Quantization without Retraining using Outlier Channel Splitting:fire:114[PyTorch]:star:80,ICML 2019,Uni,Linear,PTQ,C, 255 | Lossless or Quantized Boosting with Integer Arithmetic,ICML 2019,,,,C, 256 | SWALP : Stochastic Weight Averaging in Low-Precision Training[PyTorch]:star:52,ICML 2019,Uni,Linear,,C, 257 | PXNOR: Perturbative Binary Neural Network,ROEDUNET 2019,,,,, 258 | Learning channel-wise interactions for binary convolutional neural networks,CVPR 2019,,,,, 259 | Circulant binary convolutional networks: Enhancing the performance of 1-bit dcnns with circulant back propagation,CVPR 2019,,,,, 260 | Fighting quantization bias with bias,CVPR 2019,,,,, 261 | A Main/Subsidiary Network Framework for Simplifying Binary Neural Networks,CVPR 2019,B,,,C, 262 | Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?:fire:84,CVPR 2019,B,,,C, 263 | Compressing Unknown Images With Product Quantizer for Efficient Zero-Shot Classification,CVPR 2019,MP,PQ/LQ,,C/ZSL/GZSL, 264 | Deep Spherical Quantization for Image Search,CVPR 2019,,,,Image Search, 265 | End-To-End Supervised Product Quantization for Image Search and Retrieval,CVPR 2019,,PQ,,Image Search/Retrieval, 266 | Fully Quantized Network for Object Detection:fire:59,CVPR 2019,,,,, 267 | HAQ: Hardware-Aware Automated Quantization With Mixed Precision:fire:305[PyTorch]:star:243,CVPR 2019,MP,Linear/K,QAT,C, 268 | Learning 
Channel-Wise Interactions for Binary Convolutional Neural Networks,CVPR 2019,B,,,C, 269 | Learning to Quantize Deep Networks by Optimizing Quantization Intervals With Task Loss:fire:168,CVPR 2019,T/Uni,LQ,,C, 270 | Quantization Networks:fire:84,CVPR 2019,Uni,,QAT,C/O, 271 | SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization,CVPR 2019,T/Uni,Linear,,C, 272 | Simultaneously Optimizing Weight and Quantizer of Ternary Neural Network Using Truncated Gaussian Approximation,CVPR 2019,T,LQ,QAT,C, 273 | Structured Binary Neural Networks for Accurate Image Classification and Semantic Segmentation:fire:91,CVPR 2019,B,,,C/S, 274 | Variational information distillation for knowledge transfer:fire:188,CVPR 2019,,,,, 275 | Proxylessnas: Direct neural architecture search on target task and hardware,ICLR 2019,,,,, 276 | Accumulation Bit-Width Scaling For Ultra-Low Precision Training Of Deep Networks,ICLR 2019,MP,,QAT,C,Theory 277 | Analysis of Quantized Models,ICLR 2019,,,,,Theory 278 | Defensive Quantization: When Efficiency Meets Robustness:fire:81,ICLR 2019,,,,C, 279 | Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network,ICLR 2019,B/T/Uni,,,C,CoDesign 280 | From Hard to Soft: Understanding Deep Network Nonlinearities via Vector Quantization and Statistical Inference,ICLR 2019,MP,PQ,QAT,N, 281 | Learning Recurrent Binary/Ternary Weights[PyTorch]:star:13,ICLR 2019,B/T,,QAT,C/N, 282 | On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks,ICLR 2019,,,,C,Theory 283 | Per-Tensor Fixed-Point Quantization of the Back-Propagation Algorithm,ICLR 2019,,,,C, 284 | ProxQuant: Quantized Neural Networks via Proximal Operators:fire:56[PyTorch]:star:17,ICLR 2019,B,,QAT,C, 285 | Relaxed Quantization for Discretized Neural Networks:fire:74,ICLR 2019,,LQ,,C,Stochastic 286 | Understanding Straight-Through Estimator in Training Activation Quantized Neural Nets:fire:81,ICLR 2019,,,,C,Theory 287 | Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA,FPGA 2019,,,,, 288 | Deep Neural Network Quantization via Layer-Wise Optimization using Limited Training Data[PyTorch]:star:30,AAAI 2019,Uni,,,C, 289 | Efficient Quantization for Compact Neural Networks with Binary Weights and Low Bitwidth Activations,AAAI 2019,B,Linear/Log,QAT,C, 290 | "Multi-Precision Quantized Neural Networks via Encoding Decomposition of {-1,+1}",AAAI 2019,MP,,,C/O, 291 | Similarity Preserving Deep Asymmetric Quantization for Image Retrieval,AAAI 2019,,,QAT,Image Retrieval, 292 | RBCN: Rectified Binary Convolutional Networks for Enhancing the Performance of 1-bit DCNNs,CoRR 2019,B,,,, 293 | TentacleNet: A Pseudo-Ensemble Template for Accurate Binary Convolutional Neural Networks,CoRR 2019,B,,,, 294 | Improved training of binary networks for human pose estimation and image recognition,CoRR 2019,B,,,, 295 | Binarized Neural Architecture Search,CoRR 2019,,,,, 296 | Matrix and tensor decompositions for training binary neural networks,CoRR 2019,,,,, 297 | Back to Simplicity: How to Train Accurate BNNs from Scratch?,CoRR 2019,,,,, 298 | MoBiNet: A Mobile Binary Network for Image Classification,arXiv 2019,,,,, 299 | Training high-performance and large-scale deep neural networks with full 8-bit integers,arXiv 2019,,,,, 300 | Knowledge distillation for optimization of quantized deep neural networks,arXiv 2019,,,,, 301 | Accurate and compact convolutional neural networks with trained binarization,arXiv 
2019,,,,, 302 | Mixed precision training with 8-bit floating point,arXiv 2019,,,,, 303 | Additive powers-of-two quantization: An efficient nonuniform discretization for neural networks,arXiv 2019,,,,, 304 | Regularizing activation distribution for training binarized deep networks:fire:61,arXiv 2019,,,,, 305 | The knowledge within: Methods for data-free model compression:fire:50,arXiv 2019,,,,, 306 | Xnornet++: Improved binary neural networks,arXiv 2019,,,,, 307 | Mixed Precision Quantization of ConvNets via Differentiable Neural Architecture Search,arXiv 2019,,,,, 308 | QKD: Quantization-aware Knowledge Distillation,arXiv 2019,,,,, 309 | daBNN: A Super Fast Inference Framework for Binary Neural Networks on ARM devices,arXiv 2019,,,,, 310 | Towards Unified INT8 Training for Convolutional Neural Network,arXiv 2019,,,,, 311 | BNN+: Improved Binary Network Training:fire:72,arXiv 2019,,,,, 312 | Learned Step Size Quantization:fire:129,arXiv 2019,MP,LQ,QAT,C, 313 | "Same, Same But Different: Recovering Neural Network Quantization Error Through Weight Factorization",arXiv 2019,Uni,Linear,PTQ,C, 314 | An Energy-Efficient Architecture for Binary Weight Convolutional Neural Networks,TVLSI 2018,,,,, 315 | FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks,TRETS 2018,,,,, 316 | Inference of quantized neural networks on heterogeneous all-programmable devices,NE 2018,,,,, 317 | A Deep Look into Logarithmic Quantization of Model Parameters in Neural Networks,IAIT 2018,,,,, 318 | Deterministic Binary Filters for Convolutional Neural Networks,IJCAI 2018,,,,, 319 | Planning in Factored State and Action Spaces with Learned Binarized Neural Network Transition Models,IJCAI 2018,,,,, 320 | A Quantization-Friendly Separable Convolution for MobileNets,EMC2 2018,,,,, 321 | Moonshine: Distilling with cheap convolutions,NeurIPS 2018,,,,, 322 | A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication:fire:104,NeurIPS 2018,,,,, 323 | GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training,NeurIPS 2018,,PQ,,C, 324 | Heterogeneous Bitwidth Binarization in Convolutional Neural Networks,NeurIPS 2018,MP,,,C, 325 | HitNet: Hybrid Ternary Recurrent Neural Network,NeurIPS 2018,T/Uni,,,N,Stochastic 326 | Scalable methods for 8-bit training of neural networks:fire:151[PyTorch]:star:191,NeurIPS 2018,Uni,,,C, 327 | Training Deep Neural Networks with 8-bit Floating Point Numbers:fire:213,NeurIPS 2018,Uni,,,C,Stochastic 328 | A survey of FPGA-based accelerators for convolutional neural networks,NCA 2018,,,,, 329 | BitStream: Efficient Computing Architecture for Real-Time Low-Power Inference of Binary Neural Networks on CPUs,MM 2018,,,,, 330 | FBNA: A Fully Binarized Neural Network Accelerator,FPL 2018,,,,, 331 | Bi-real net: Enhancing the performance of 1-bit cnns with improved representational capability and advanced training algorithm:fire:222[Caffe&pytorch]:star:138,ECCV 2018,B,,,, 332 | LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks:fire:325[PyTorch]:star:207,ECCV 2018,B/Uni,LQ,QAT,C, 333 | LSQ++: Lower running time and higher recall in multi-codebook quantization,ECCV 2018,MP,LQ,,C, 334 | Learning Compression from limited unlabeled Data,ECCV 2018,Uni,,,C, 335 | Product Quantization Network for Fast Image Retrieval,ECCV 2018,Uni,PQ,,Image Retrieval, 336 | Quantization Mimic: Towards Very Tiny CNN for Object Detection:fire:55,ECCV 2018,Uni,,,O, 337 | Quantized 
Densely Connected U-Nets for Efficient Landmark Localization:fire:105,ECCV 2018,,,,, 338 | TBN: Convolutional Neural Network with Ternary Inputs and Binary Weights:fire:57,ECCV 2018,,,,, 339 | Training Binary Weight Networks via Semi-Binary Decomposition,ECCV 2018,,,,, 340 | Value-aware Quantization for Training and Inference of Neural Networks:fire:75,ECCV 2018,,,,, 341 | Gap-8: A risc-v soc for ai at the edge of the iot,ASAP 2018,,,,, 342 | Distilled binary neural network for monaural speech separation,IJCNN 2018,,,,, 343 | Fast object detection based on binary deep convolution neural networks,IJCNN 2018,,,,, 344 | Analysis and Implementation of Simple Dynamic Binary Neural Networks,IJCNN 2018,,,,, 345 | SIGNSGD: compressed optimisation for non-convex problems:fire:393[PyTorch]:star:54,ICML 2018,,,,,Gradient 346 | Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation:fire:59,CVPR 2018,,,,, 347 | Explicit loss-error-aware quantization for low-bit deep neural networks:fire:67,CVPR 2018,,,,, 348 | A biresolution spectral framework for product quantization,CVPR 2018,,,,, 349 | Amc: Automl for model compression and acceleration on mobile devices:fire:814,CVPR 2018,,,,, 350 | Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations,CVPR 2018,,,,, 351 | Modulated convolutional networks,CVPR 2018,B,,,, 352 | Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference:fire:1013,CVPR 2018,,,,, 353 | SYQ: Learning Symmetric Quantization For Efficient Deep Neural Networks:fire:84[PyTorch]:star:31,CVPR 2018,T,,,, 354 | Towards Effective Low-bitwidth Convolutional Neural Networks:fire:121,CVPR 2018,,,QAT,, 355 | Two-Step Quantization for Low-bit Neural Networks:fire:72,CVPR 2018,,,,, 356 | CLIP-Q: Deep Network Compression Learning by In-parallel Pruning-Quantization:fire:165,CVPR 2018,,,,, 357 | BitFlow: Exploiting Vector Parallelism for Binary Neural Networks on CPU,IPDPS 2018,,,,, 358 | Mixed Precision Training of Convolutional Neural Networks using Integer Operations:fire:117,ICLR 2018,,,,, 359 | An empirical study of binary neural networks’ optimisation,ICLR 2018,,,,, 360 | Adaptive Quantization of Neural Networks,ICLR 2018,,,,, 361 | Alternating Multi-bit Quantization for Recurrent Neural Networks:fire:87,ICLR 2018,,,,, 362 | Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy:fire:208,ICLR 2018,,,,, 363 | Energy-Constrained Compression for Deep Neural Networks via Weighted Sparse Projection and Layer Input Masking[PyTorch]:star:15,ICLR 2018,,,,,CoDesign 364 | Loss-aware Weight Quantization of Deep Networks:fire:94,ICLR 2018,,,,, 365 | Model compression via distillation and quantization:fire:353[PyTorch]:star:293,ICLR 2018,,,,, 366 | Training and Inference with Integers in Deep Neural Networks:fire:231[TensorFlow]:star:132,ICLR 2018,,,,, 367 | Variational Network Quantization:fire:58,ICLR 2018,,,,, 368 | WRPN: Wide Reduced-Precision Networks:fire:180,ICLR 2018,,,,, 369 | Adaptive Quantization for Deep Neural Network:fire:70,AAAI 2018,,,,, 370 | Deep Neural Network Compression with Single and Multiple Level Quantization:fire:65[PyTorch]:star:20,AAAI 2018,,,,, 371 | Distributed Composite Quantization,AAAI 2018,,,,, 372 | Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM:fire:207,AAAI 2018,,,,, 373 | From Hashing to CNNs: Training Binary Weight Networks via Hashing:fire:62,AAAI 2018,B,,,, 374 | Product Quantized Translation for Fast 
Nearest Neighbor Search,AAAI 2018,,,,, 375 | Quantized Memory-Augmented Neural Networks,AAAI 2018,,,,, 376 | ReBNet: Residual Binarized Neural Network,ISFPCCM 2018,,,,, 377 | LightNN: Filling the Gap between Conventional Deep Neural Networks and Binarized Networks,CoRR 2018,,,,, 378 | BinaryRelax: A Relaxation Approach For Training Deep Neural Networks With Quantized Weights,CoRR 2018,,,,, 379 | Learning low precision deep neural networks through regularization,arXiv 2018,,,,, 380 | Blended coarse gradient descent for full quantization of deep neural networks:fire:48,arXiv 2018,,,,, 381 | XNOR Neural Engine: A Hardware Accelerator IP for 21.6-fJ/op Binary Neural Network Inference:fire:72,arXiv 2018,,,,, 382 | A Survey on Methods and Theories of Quantized Neural Networks:fire:128,arXiv 2018,,,,, 383 | Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines,arXiv 2018,,,,, 384 | Discovering low-precision networks close to full-precision networks for efficient embedded inference:fire:83,arXiv 2018,,,,, 385 | On periodic functions as regularizers for quantization of neural networks,arXiv 2018,,,,, 386 | Rethinking floating point for deep learning:fire:95,arXiv 2018,,,,, 387 | Quantizing deep convolutional networks for efficient inference: A whitepaper:fire:425,arXiv 2018,,,,, 388 | Quantization for rapid deployment of deep neural networks,arXiv 2018,,,,, 389 | Simultaneously optimizing weight and quantizer of ternary neural network using truncated gaussian approximation,arXiv 2018,,,,, 390 | Uniq: Uniform noise injection for non-uniform quantization of neural networks,arXiv 2018,,,,, 391 | Training Competitive Binary Neural Networks from Scratch,arXiv 2018,,,,, 392 | Joint Neural Architecture Search and Quantization,arXiv 2018,,,,, 393 | BinaryRelax: A Relaxation Approach For Training Deep Neural Networks With Quantized Weights:fire:55,arXiv 2018,,,,, 394 | Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients:fire:1209[PyTorch],arXiv 2018,Uni,,,, 395 | Espresso: Efficient Forward Propagation for BCNNs,arXiv 2018,,,,,CoDesign 396 | Mixed Precision Training:fire:601,arXiv 2018,Uni,,,, 397 | PACT: Parameterized Clipping Activation for Quantized Neural Networks:fire:341,arXiv 2018,,,,, 398 | Terngrad: Ternary gradients to reduce communication in distributed deep learning:fire:649,NeurIPS 2017,,,,, 399 | QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding:fire:696,NeurIPS 2017,,,,,Gradient 400 | Towards Accurate Binary Convolutional Neural Network:fire:193[TensorFlow]:star:49,NeurIPS 2017,B,,,, 401 | Training Quantized Nets: A Deeper Understanding:fire:134,NeurIPS 2017,,,,, 402 | Learning Accurate Low-Bit Deep Neural Networks with Stochastic Quantization[Caffe],BMVC 2017,,,,, 403 | Performance guaranteed network acceleration via high-order residual quantization,ICCV 2017,,,,, 404 | Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources:fire: 130[PyTorch]:star:207,ICCV 2017,B,,,, 405 | Performance Guaranteed Network Acceleration via High-Order Residual Quantization:fire:55,ICCV 2017,,,,, 406 | Binary Deep Neural Networks for Speech Recognition,Interspeech 2017,B,,,, 407 | Ternary neural networks for resource-efficient AI applications,IJCNN 2017,,,,, 408 | Fixed-point optimization of deep neural networks with adaptive step size retraining,ICASSP 2017,MP,,,, 409 | Deep Learning with Low Precision by Half-wave Gaussian 
Quantization:fire:288[Caffe]:star:118,CVPR 2017,,,,, 410 | Fixed-point Factorized Networks,CVPR 2017,T,,,,Factor 411 | Local Binary Convolutional Neural Networks:star:94:fire:156,CVPR 2017,,,,, 412 | Network Sketching: Exploiting Binary Structure in Deep CNNs:fire:71,CVPR 2017,B,,,, 413 | Weighted-Entropy-Based Quantization for Deep Neural Networks:fire:144,CVPR 2017,,Non,,, 414 | A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks,DC 2017,B,,,,CoDesign 415 | On-Chip Memory Based Binarized Convolutional Deep Neural Network Applying Batch Normalization Free Technique on an FPGA,IPDPSW 2017,,,,,CoDesign 416 | Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights:fire:607[PyTorch]:star:181,ICLR 2017,T/Uni,Log,QAT,C, 417 | Learning Discrete Weights Using the Local Reparameterization Trick:fire:61,ICLR 2017,,,,,Stochastic 418 | Loss-aware Binarization of Deep Networks:fire:119[PyTorch]:star:18,ICLR 2017,B,,,, 419 | Soft Weight-Sharing for Neural Network Compression:fire:222:star:18,ICLR 2017,B,,,, 420 | Towards the Limit of Network Quantization:fire:114,ICLR 2017,Uni,,,, 421 | "FINN: A Framework for Fast, Scalable Binarized Neural Network Inference:fire:463",FPGA 2017,B,,,, 422 | How to train a compact binary neural network with high accuracy?:fire:205,AAAI 2017,,,,, 423 | Adaptive Quantization for Deep Neural Network:fire:67,AAAI 2017,MP,,,, 424 | The high-dimensional geometry of binary neural networks,CoRR 2017,,,,, 425 | BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet,CoRR 2017,,,,, 426 | Deep learning binary neural network on an FPGA,arXiv 2017,B,,,, 427 | FP-BNN: Binarized neural network on FPGA:fire:126,arXiv 2017,B,,,,CoDesign 428 | Accelerating Deep Convolutional Networks using low-precision and sparsity:fire:111,arXiv 2017,T,,,, 429 | Bit-regularized optimization of neural nets,arXiv 2017,,,,, 430 | Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks,arXiv 2017,,,,, 431 | Learning deep binary descriptor with multi-quantization:fire:97,arXiv 2017,,,,, 432 | Gxnor-net: Training deep neural networks with ternary weights and activations without full-precision memory under a unified discretization framework,arXiv 2017,,,,, 433 | Soft-to-hard vector quantization for end-to-end learning compressible representations,arXiv 2017,,,,, 434 | ShiftCNN: Generalized Low-Precision Architecture for Inference of Convolutional Neural Networks[TensorFlow]:star:53,arXiv 2017,,,,, 435 | Ternary Neural Networks with Fine-Grained Quantization:fire:71,arXiv 2017,T,,,, 436 | Trained Ternary Quantization:fire:734,arXiv 2017,T,,,, 437 | Decision making with quantized priors leads to discrimination,JPROC 2016,,,,, 438 | Communication quantization for data-parallel training of deep neural networks:fire:130,MLHPC 2016,,,,, 439 | XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks:fire:3469[PyTorch]:star:807,ECCV 2016,B,,,, 440 | Overcoming challenges in fixed point training of deep convolutional networks,ICMLW 2016,,,,, 441 | Fixed point quantization of deep convolutional networks:fire:696,ICML 2016,,,,, 442 | "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding:fire:5045",CVPR 2016,,,,, 443 | Quantized convolutional neural networks for mobile devices:fire:270,CVPR 2016,,,,, 444 | Fixed-point Performance Analysis of Recurrent Neural Networks:fire:67,arXiv 2016,,,,, 445 | Qsgd: Randomized quantization for 
communication-optimal stochastic gradient descent:fire:801,arXiv 2016,,,,, 446 | Effective quantization methods for recurrent neural networks:fire:62,arXiv 2016,,,,, 447 | Sigma delta quantized networks,arXiv 2016,,,,, 448 | Recurrent neural networks with limited numerical precision:fire:65,arXiv 2016,,,,, 449 | Training bit fully convolutional network for fast semantic segmentation,arXiv 2016,,,,, 450 | Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations:fire:1347,arXiv 2016,,,,, 451 | Convolutional neural networks using logarithmic data representation:fire:320,arXiv 2016,,,,, 452 | Layer normalization:fire:4125,arXiv 2016,,,,, 453 | Binarized Neural Networks on the ImageNet Classification Task,arXiv 2016,B,,,, 454 | Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or −1:fire:1574[PyTorch]:star:252,arXiv 2016,B,,,, 455 | Deep neural networks are robust to weight binarization and other non-linear distortions:fire:77,arXiv 2016,,,,, 456 | Neural Networks with Few Multiplications:fire:258[PyTorch]:star:81,arXiv 2016,T,,,, 457 | Ternary weight networks:fire:647[Caffe]:star:63,arXiv 2016,T,,,, 458 | Batch normalization: Accelerating deep network training by reducing internal covariate shift:fire:32893,PMLR 2015,,,,, 459 | BinaryConnect: Training Deep Neural Networks with binary weights during propagations:fire:2267[PyTorch]:star:344,NeurIPS 2015,B,,,, 460 | Bitwise Neural Networks:fire:191,ICML 2015,B,,,, 461 | Compressing neural networks with hashing trick:fire:887,ICML 2015,,,,, 462 | Deep Learning with Limited Numerical Precision:fire:1378,ICML 2015,Uni,,,, 463 | Fixed point optimization of deep convolutional neural networks for object recognition:fire:226,ICASSP 2015,,,,, 464 | 8-Bit Approximations for Parallelism in Deep Learning:fire:114,ICLR 2015,,,,, 465 | Training deep neural networks with low precision multiplications:fire:498,ICLR 2015,,,,, 466 | Rounding methods for neural networks with low resolution synaptic weights,arXiv 2015,,,,, 467 | Training Binary Multilayer Neural Networks for Image Classification using Expectation Backpropagation:fire:50,arXiv 2015,,,,, 468 | Resiliency of Deep Neural Networks under quantizations:fire:123,arXiv 2015,,,,, 469 | "Fixed-point feedforward deep neural network design using weights +1, 0, and −1:fire:269",SiPS 2014,,,,, 470 | Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights:fire:190,NeurIPS 2014,,,,,Stochastic 471 | 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns:fire:679,Interspeech 2014,,,,, 472 | Compressing deep convolutional networks using vector quantization:fire:981,arXiv 2014,,,,, 473 | Lowrank matrix factorization for deep neural network training with high-dimensional output targets:fire:563,ICASSP 2013,,,,, 474 | Estimating or propagating gradients through stochastic neurons for conditional computation:fire:1346,arXiv 2013,,,,, 475 | Product quantization for nearest neighbor search:fire:2268,TPAMI 2010,,,,, 476 | An introduction to natural computation:fire:309,MITPress 1999,,,,, -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Awesome-Quantization-Papers [![Awesome](https://awesome.re/badge.svg)](https://awesome.re) 2 | 3 | This repo contains a comprehensive paper list of **Model 
Quantization** for efficient deep learning, collecting papers from AI conferences, journals, and arXiv. As a highlight, we categorize the papers by model structure and application scenario, and label the quantization methods with keywords.
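To make the keywords concrete, here is a minimal, illustrative sketch of the two ends of the spectrum defined in the keyword legend below: symmetric uniform quantization as used in the simplest **`PTQ`** pipelines, and sign-based binarization as used by **`Extreme`** methods. It is written against PyTorch; the function names and the 8-bit, per-tensor choices are our own simplifications for illustration, not the recipe of any particular paper in this list.

```python
# Minimal sketch only -- real methods in this list add calibration data,
# clipping, per-channel scales, mixed precision, finetuning, etc.
import torch

def quantize_uniform_symmetric(w: torch.Tensor, n_bits: int = 8):
    """Uniform PTQ: round weights to signed integers with one per-tensor scale."""
    qmax = 2 ** (n_bits - 1) - 1                 # 127 for 8 bits
    scale = w.abs().max() / qmax                 # scale taken from the tensor itself
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale               # dequantize as q.float() * scale

def binarize_sign(w: torch.Tensor):
    """`Extreme` (binary): keep only sign(w) plus one XNOR-Net-style scale."""
    alpha = w.abs().mean()                       # alpha = mean(|w|), as in XNOR-Net
    return torch.sign(w), alpha                  # reconstruct as sign(w) * alpha

w = torch.randn(64, 64)                          # stand-in for a pretrained weight
q, scale = quantize_uniform_symmetric(w)
print((w - q.float() * scale).abs().max())       # worst-case 8-bit rounding error
```

Non-uniform quantizers replace the single linear `scale` with learned or logarithmic grids, and mixed-precision (**`MP`**) methods choose `n_bits` per layer or per channel instead of globally.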
4 | 5 | This repo is actively maintained, and contributions in any form that make this list more comprehensive are welcome. Special thanks to collaborator [Zhikai Li](https://github.com/zkkli), and all researchers who have contributed to this repo!<br>
6 | 7 | If you find this repo useful, please consider **★STARing** and feel free to share it with others!
8 | 9 | **[Update: Mar, 2025]** Add new papers from ICLR-25.<br>
10 | **[Update: Nov, 2024]** Add new papers from ECCV-24 and NeurIPS-24.
11 | **[Update: Sep, 2024]** Add new papers from ICML-24 and IJCAI-24.
12 | **[Update: Jul, 2024]** Add new papers from CVPR-24.
13 | **[Update: May, 2024]** Add new papers from ICLR-24.
14 | **[Update: Apr, 2024]** Add new papers from AAAI-24.
15 | **[Update: Nov, 2023]** Add new papers from NeurIPS-23.
16 | **[Update: Oct, 2023]** Add new papers from ICCV-23.
17 | **[Update: Jul, 2023]** Add new papers from AAAI-23 and ICML-23.
18 | **[Update: Jun, 2023]** Add new arXiv papers uploaded in May 2023, especially from the hot LLM quantization field.<br>
19 | **[Update: Jun, 2023]** This repo is reborn! New style, better experience!<br>
20 | 21 | --- 22 | ## Overview 23 | 24 | - [Awesome-Quantization-Papers ](#awesome-quantization-papers-) 25 | - [Overview](#overview) 26 | - [Survey](#survey) 27 | - [Transformer-based Models](#transformer-based-models) 28 | - [Language Transformers](#language-transformers) 29 | - [Vision Transformers](#vision-transformers) 30 | - [Visual Generation](#visual-generation) 31 | - [Convolutional Neural Networks](#convolutional-neural-networks) 32 | - [Visual Generation](#visual-generation-1) 33 | - [Image Classification](#image-classification) 34 | - [Other Tasks](#other-tasks) 35 | - [Object Detection](#object-detection) 36 | - [Super Resolution](#super-resolution) 37 | - [Point Cloud](#point-cloud) 38 | - [References](#references) 39 | 40 | **Keywords**: **`PTQ`**: post-training quantization | **`Non-uniform`**: non-uniform quantization | **`MP`**: mixed-precision quantization | **`Extreme`**: binary or ternary quantization 41 | 42 | --- 43 | 44 | 45 | ## Survey 46 | - "A Survey of Quantization Methods for Efficient Neural Network Inference", Book Chapter: Low-Power Computer Vision, 2021. [[paper](https://arxiv.org/abs/2103.13630)] 47 | - "Full Stack Optimization of Transformer Inference: a Survey", arXiv, 2023. [[paper](https://arxiv.org/abs/2302.14017)] 48 | - "A White Paper on Neural Network Quantization", arXiv, 2021. [[paper](https://arxiv.org/abs/2106.08295)] 49 | - "Binary Neural Networks: A Survey", PR, 2020. [[Paper](https://arxiv.org/abs/2004.03333)] [**`Extreme`**] 50 | 51 | 52 | ## Transformer-based Models 53 | ### Language Transformers 54 | - "CBQ: Cross-Block Quantization for Large Language Models", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/28924)] 55 | - "SpinQuant: LLM Quantization with Learned Rotations", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/28338)] 56 | - "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/30168)] 57 | - "Q-VLM: Post-training Quantization for Large Vision-Language Models", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/94107)] 58 | - "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/96936)] 59 | - "QBB: Quantization with Binary Bases for LLMs", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/95634)] 60 | - "DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/93727)] 61 | - "ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/96563)] 62 | - "KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/93558)] 63 | - "Evaluating Quantized Large Language Models", ICML, 2024. [[paper](https://openreview.net/forum?id=DKKg5EFAFr)] 64 | - "SqueezeLLM: Dense-and-Sparse Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=0jpbpFia8m)] [**`PTQ`**] [**`Non-uniform`**] 65 | - "KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache", ICML, 2024. [[paper](https://openreview.net/forum?id=L057s2Rq8O)] 66 | - "LQER: Low-Rank Quantization Error Reconstruction for LLMs", ICML, 2024. 
[[paper](https://openreview.net/forum?id=dh8k41g775)] 67 | - "Extreme Compression of Large Language Models via Additive Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=5mCaITRTmO)] 68 | - "BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=DbyHDYslM7)] 69 | - "BiLLM: Pushing the Limit of Post-Training Quantization for LLMs", ICML, 2024. [[paper](https://openreview.net/forum?id=qOl2WWOqFg)] 70 | - "Compressing Large Language Models by Joint Sparsification and Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=sCGRhnuMUJ)] 71 | - "FrameQuant: Flexible Low-Bit Quantization for Transformers", ICML, 2024. [[paper](https://openreview.net/forum?id=xPypr0kufs)] [**`PTQ`**] 72 | - "OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=8Wuvhh0LYW)] 73 | - "LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=LzPWWPAdY4)] 74 | - "SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression", ICLR, 2024. [[paper](https://openreview.net/forum?id=Q1u25ahSuy)] [**`PTQ`**] 75 | - "QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=WvFoJccpo8)] 76 | - "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=FIplmUWdm3)] [**`PTQ`**] 77 | - "PB-LLM: Partially Binarized Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=BifeBRhikU)] [**`Extreme`**] 78 | - "AffineQuant: Affine Transformation Quantization for Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=of2rhALq8l)] 79 | - "Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=JzG7kSpjJk)] 80 | - "LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models", ICLR, 2024. [[paper](https://openreview.net/forum?id=gLARhFLE0F)] 81 | - "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29237)] 82 | - "Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29815)] 83 | - "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29860)] 84 | - "Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29908)] [**`PTQ`**] 85 | - "What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29765)] 86 | - "EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs", arXiv, 2024. [[paper](http://arxiv.org/abs/2403.02775)] 87 | - "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact", arXiv, 2024. 
[[paper](http://arxiv.org/abs/2403.01241)] 88 | - "FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.17985)] 89 | - "A Comprehensive Evaluation of Quantization Strategies for Large Language Models", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.16775)] 90 | - "GPTVQ: The Blessing of Dimensionality for LLM Quantization", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.15319)] 91 | - "APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.14866)] 92 | - "EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.10787)] 93 | - "RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.05628)] 94 | - "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.05445)] 95 | - "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization", arXiv, 2024. [[paper](http://arxiv.org/abs/2401.18079)] 96 | - "Extreme Compression of Large Language Models via Additive Quantization", arXiv, 2024. [[paper](http://arxiv.org/abs/2401.06118)] 97 | - "ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.08583)] [**`PTQ`**] 98 | - "CBQ: Cross-Block Quantization for Large Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.07950)] [**`PTQ`**] 99 | - "FP8-BERT: Post-Training Quantization for Transformer", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.05725)] [**`PTQ`**] 100 | - "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.05693)] 101 | - "SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.03788)] [**`PTQ`**] 102 | - "A Speed Odyssey for Deployable Quantization of LLMs", arXiv, 2023. [[paper](http://arxiv.org/abs/2311.09550)] 103 | - "AFPQ: Asymmetric Floating Point Quantization for LLMs", arXiv, 2023. [[paper](http://arxiv.org/abs/2311.01792)] 104 | - "Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization", arXiv, 2023. [[paper](http://arxiv.org/abs/2311.16442)] 105 | - "QLoRA: Efficient Finetuning of Quantized LLMs", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/71815)] [[code](https://github.com/artidoro/qlora)] 106 | - "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/69982)] [[code](https://github.com/jerry-chee/QuIP)] [**`PTQ`**] 107 | - "Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/72931)] 108 | - "QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources", arXiv, 2023. [[paper](https://arxiv.org/abs/2310.07147)] 109 | - "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2310.16795)] 110 | - "Atom: Low-bit Quantization for Efficient and Accurate LLM Serving", arXiv, 2023.
[[paper](http://arxiv.org/abs/2310.19102)] 111 | - "ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers", arXiv, 2023. [[paper](http://arxiv.org/abs/2310.17723)] 112 | - "LLM-FP4: 4-Bit Floating-Point Quantized Transformers", arXiv, 2023. [[paper](https://arxiv.org/abs/2310.16836)] 113 | - "TEQ: Trainable Equivalent Transformation for Quantization of LLMs", arXiv, 2023. [[paper](http://arxiv.org/abs/2310.10944)] 114 | - "Efficient Post-training Quantization with FP8 Formats", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.14592)] 115 | - "Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.13575)] 116 | - "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.05516)] 117 | - "Norm Tweaking: High-performance Low-bit Quantization of Large Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.02784)] 118 | - "Understanding the Impact of Post-Training Quantization on Large Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.05210)] 119 | - "QuantEase: Optimization-based Quantization for Language Models -- An Efficient and Intuitive Algorithm", arXiv, 2023. [[paper](http://arxiv.org/abs/2309.01885)] 120 | - "FPTQ: Fine-grained Post-Training Quantization for Large Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2308.15987)] 121 | - "FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs", arXiv, 2023. [[paper](http://arxiv.org/abs/2308.09723)] [**`PTQ`**] 122 | - "Gradient-Based Post-Training Quantization: Challenging the Status Quo", arXiv, 2023. [[paper](http://arxiv.org/abs/2308.07662)] [**`PTQ`**] 123 | - "NUPES: Non-Uniform Post-Training Quantization via Power Exponent Search", arXiv, 2023. [[paper](http://arxiv.org/abs/2308.05600)] [**`Non-uniform`**] 124 | - "ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats", arXiv, 2023. [[paper](http://arxiv.org/abs/2307.09782)] 125 | - "Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2307.05972)] 126 | - "Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study", arXiv, 2023. [[paper](https://arxiv.org/abs/2307.08072)] 127 | - "INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation", arXiv, 2023. [[paper](https://arxiv.org/abs/2306.08162)] 128 | - "QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models", arXiv, 2023. [[paper](https://arxiv.org/abs/2307.03738)] [[code](https://github.com/IST-DASLab/QIGen)] 129 | - "OWQ: Lessons learned from activation outliers for weight quantization in large language models", arXiv, 2023. [[paper](http://arxiv.org/abs/2306.02272)] [**`PTQ`**] 130 | - "PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2306.00014)] 131 | - "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration", arXiv, 2023. [[paper](https://arxiv.org/abs/2306.00978)] [**`PTQ`**] 132 | - "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models", arXiv, 2023.
[[paper](https://arxiv.org/abs/2305.17888)] 133 | - "Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling", arXiv, 2023. [[paper](https://arxiv.org/abs/2304.09145)] [**`PTQ`**] 134 | - "RPTQ: Reorder-based Post-training Quantization for Large Language Models", arXiv, 2023. [[paper](https://arxiv.org/abs/2304.01089)] [[code](https://github.com/hahnyuan/rptq4llm)] [**`PTQ`**] 135 | - "The case for 4-bit precision: k-bit Inference Scaling Laws", ICML, 2023. [[paper](https://openreview.net/forum?id=i8tGb1ab1j)] 136 | - "Quantized Distributed Training of Large Models with Convergence Guarantees", ICML, 2023. [[paper](https://openreview.net/forum?id=Nqp8A5IDzq)] 137 | - "Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases", ICML, 2023. [[paper](https://openreview.net/forum?id=q1WGm3hItW)] 138 | - "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models", ICML, 2023. [[paper](https://arxiv.org/abs/2211.10438)] [[code](https://github.com/mit-han-lab/smoothquant)] [**`PTQ`**] 139 | - "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers", ICLR, 2023. [[paper](https://arxiv.org/abs/2210.17323)] [[code](https://github.com/IST-DASLab/gptq)] [**`PTQ`**] 140 | - "BiBERT: Accurate Fully Binarized BERT", ICLR, 2022. [[paper](https://openreview.net/forum?id=5xEgrl_5FAJ)] [[code](https://github.com/htqin/BiBERT)] [**`Extreme`**] 141 | - "BiT: Robustly Binarized Multi-distilled Transformer", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=55032)] [[code](https://github.com/facebookresearch/bit)] [**`Extreme`**] 142 | - "Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models", NeurIPS, 2022. [[paper](https://arxiv.org/abs/2209.13325)] [[code](https://github.com/wimh966/outlier_suppression)] [**`PTQ`**] 143 | - "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale", NeurIPS, 2022. [[paper](https://arxiv.org/abs/2208.07339)] [[code](https://github.com/timdettmers/bitsandbytes)] 144 | - "Towards Efficient Post-training Quantization of Pre-trained Language Models", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=53407)] [**`PTQ`**] 145 | - "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=54407)] [[code](https://github.com/microsoft/DeepSpeed)] [**`PTQ`**] 146 | - "Compression of Generative Pre-trained Language Models via Quantization", ACL, 2022. [[paper](https://aclanthology.org/2022.acl-long.331)] 147 | - "I-BERT: Integer-only BERT Quantization", ICML, 2021. [[paper](https://proceedings.mlr.press/v139/kim21d.html)] [[code](https://github.com/kssteven418/I-BERT)] 148 | - "BinaryBERT: Pushing the Limit of BERT Quantization", ACL, 2021. [[paper](https://arxiv.org/abs/2012.15701)] [[code](https://github.com/huawei-noah/Pretrained-Language-Model)] [**`Extreme`**] 149 | - "On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers", ACL, 2021. [[paper](https://aclanthology.org/2021.findings-acl.363)] 150 | - "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP, 2021. [[paper](https://arxiv.org/abs/2109.12948)] [[code](https://github.com/qualcomm-ai-research/transformer-quantization)] 151 | - "KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization", arXiv, 2021. [[paper](https://arxiv.org/abs/2101.05938)] 152 | - "TernaryBERT: Distillation-aware Ultra-low Bit BERT", EMNLP, 2020. [[paper](https://arxiv.org/abs/2009.12812)] [[code](https://github.com/huawei-noah/Pretrained-Language-Model)] [**`Extreme`**] 153 | - "Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation", EMNLP, 2020. [[paper](https://aclanthology.org/2020.findings-emnlp.433/)] 154 | - "GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference", MICRO, 2020. [[paper](https://arxiv.org/abs/2005.03842)] 155 | - "Towards Fully 8-bit Integer Inference for the Transformer Model", IJCAI, 2020. [[paper](https://www.ijcai.org/Proceedings/2020/0520.pdf)] 156 | - "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT", AAAI, 2020. [[paper](https://ojs.aaai.org/index.php/AAAI/article/download/6409/6265)] 157 | - "Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model", ICML, 2019. [[paper](https://arxiv.org/abs/1906.00532)] 158 | - "Q8BERT: Quantized 8Bit BERT", EMC2 Workshop, 2019. [[paper](https://www.emc2-ai.org/assets/docs/neurips-19/emc2-neurips19-paper-31.pdf)] 159 | 160 | [[Back to Overview](#overview)] 161 |
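A large fraction of the LLM entries above (the GPTQ/AWQ/OWQ line of work, among others) target the same weight-only PTQ setting: weights are quantized to 3-4 bits in small groups while activations stay in floating point. The sketch below shows only the shared round-to-nearest baseline these papers improve on (via Hessian-based rounding, activation-aware scaling, outlier handling, etc.); the function name and the group size of 128 are illustrative assumptions, not taken from any cited paper.

```python
import torch

def rtn_groupwise_quant(w: torch.Tensor, num_bits: int = 4, group_size: int = 128):
    """Round-to-nearest weight-only quantization with per-group scales.

    w: [out_features, in_features]; returns a dequantized ("fake-quant") copy.
    Symmetric grid: q in [-2^(b-1), 2^(b-1)-1], with w ≈ scale * q per group.
    """
    out_f, in_f = w.shape
    assert in_f % group_size == 0
    wg = w.reshape(out_f, in_f // group_size, group_size)
    qmax = 2 ** (num_bits - 1) - 1
    scale = wg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(wg / scale), -qmax - 1, qmax)
    return (q * scale).reshape(out_f, in_f)

w = torch.randn(256, 512)
w_q = rtn_groupwise_quant(w, num_bits=4, group_size=128)
print((w - w_q).abs().mean())  # mean rounding error the cited methods try to shrink
```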
162 | ### Vision Transformers 163 | - "CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/8434_ECCV_2024_paper.php)] [**`PTQ`**] 164 | - "AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/3969_ECCV_2024_paper.php)] [**`PTQ`**] 165 | - "PQ-SAM: Post-training Quantization for Segment Anything Model", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/1627_ECCV_2024_paper.php)] [**`PTQ`**] 166 | - "ERQ: Error Reduction for Post-Training Quantization of Vision Transformers", ICML, 2024. [[paper](https://openreview.net/forum?id=jKUWlgra9b)] [**`PTQ`**] 167 | - "Outlier-aware Slicing for Post-Training Quantization in Vision Transformer", ICML, 2024. [[paper](https://openreview.net/forum?id=Uh5XN9d2J4)] [**`PTQ`**] 168 | - "PTQ4SAM: Post-Training Quantization for Segment Anything", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Lv_PTQ4SAM_Post-Training_Quantization_for_Segment_Anything_CVPR_2024_paper.html)] [**`PTQ`**] 169 | - "Instance-Aware Group Quantization for Vision Transformers", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Moon_Instance-Aware_Group_Quantization_for_Vision_Transformers_CVPR_2024_paper.html)] [**`PTQ`**] 170 | - "Bi-ViT: Pushing the Limit of Vision Transformer Quantization", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/28109)] [**`Extreme`**] 171 | - "AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29487)] 172 | - "LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation", arXiv, 2024. [[paper](http://arxiv.org/abs/2401.11243)] [**`PTQ`**] [**`MP`**] 173 | - "MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer", arXiv, 2024.
[[paper](http://arxiv.org/abs/2401.14895)] [**`PTQ`**] [**`MP`**] 174 | - "I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_I-ViT_Integer-only_Quantization_for_Efficient_Vision_Transformer_Inference_ICCV_2023_paper.pdf)] [[code](https://github.com/zkkli/I-ViT)] 175 | - "RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_RepQ-ViT_Scale_Reparameterization_for_Post-Training_Quantization_of_Vision_Transformers_ICCV_2023_paper.pdf)] [[code](https://github.com/zkkli/RepQ-ViT)] [**`PTQ`**] 176 | - "QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Zhang_QD-BEV__Quantization-aware_View-guided_Distillation_for_Multi-view_3D_Object_Detection_ICCV_2023_paper.pdf)] 177 | - "BiViT: Extremely Compressed Binary Vision Transformers", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/He_BiViT_Extremely_Compressed_Binary_Vision_Transformers_ICCV_2023_paper.pdf)] [**`Extreme`**] 178 | - "Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Frumkin_Jumping_through_Local_Minima_Quantization_in_the_Loss_Landscape_of_ICCV_2023_paper.pdf)] 179 | - "PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/71880)] 180 | - "Oscillation-free Quantization for Low-bit Vision Transformers", ICML, 2023. [[paper](https://openreview.net/forum?id=DihXH24AdY)] [[code](https://github.com/nbasyl/OFQ)] 181 | - "PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers", TNNLS, 2023. [[paper](https://arxiv.org/abs/2209.05687)] 182 | - "Variation-aware Vision Transformer Quantization", arXiv, 2023. [[paper](http://arxiv.org/abs/2307.00331)] 183 | - "NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers", CVPR, 2023. [[paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_NoisyQuant_Noisy_Bias-Enhanced_Post-Training_Activation_Quantization_for_Vision_Transformers_CVPR_2023_paper.pdf)] [**`PTQ`**] 184 | - "Boost Vision Transformer with GPU-Friendly Sparsity and Quantization", CVPR, 2023. [[paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Yu_Boost_Vision_Transformer_With_GPU-Friendly_Sparsity_and_Quantization_CVPR_2023_paper.pdf)] 185 | - "Q-DETR: An Efficient Low-Bit Quantized Detection Transformer", CVPR, 2023. [[paper](http://openaccess.thecvf.com/content/CVPR2023/html/Xu_Q-DETR_An_Efficient_Low-Bit_Quantized_Detection_Transformer_CVPR_2023_paper.html)] 186 | - "Output Sensitivity-Aware DETR Quantization", 2023. [[paper](https://practical-dl.github.io/2023/extended_abstract/4/CameraReady/4.pdf)] 187 | - "Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction", arXiv, 2023. [[paper](https://arxiv.org/abs/2303.12557)] [**`PTQ`**] 188 | - "Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer", NeurIPS, 2022. [[paper](https://openreview.net/forum?id=fU-m9kQe0ke)] [[code](https://github.com/yanjingli0202/q-vit)] 189 | - "Patch Similarity Aware Data-Free Quantization for Vision Transformers", ECCV, 2022. 
[[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136710154.pdf)] [[code](https://github.com/zkkli/psaq-vit)] [**`PTQ`**] 190 | - "PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136720190.pdf)] [[code](https://github.com/hahnyuan/ptq4vit)] [**`PTQ`**] 191 | - "FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer", IJCAI, 2022. [[paper](https://arxiv.org/abs/2111.13824)] [[code](https://github.com/megvii-research/FQ-ViT)] [**`PTQ`**] 192 | - "Q-ViT: Fully Differentiable Quantization for Vision Transformer", arXiv, 2022. [[paper](https://arxiv.org/pdf/2201.07703.pdf)] 193 | - "Post-Training Quantization for Vision Transformer", NeurIPS, 2021. [[paper](https://openreview.net/forum?id=9TX5OsKJvm)] [**`PTQ`**] 194 | 195 | 196 | [[Back to Overview](#overview)] 197 | 198 | ### Visual Generation 199 | - "SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/27906)] 200 | - "ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/30429)] 201 | - "DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models", ICLR, 2025. [[paper](https://iclr.cc/virtual/2025/poster/29192)] 202 | - "PTQ4DiT: Post-training Quantization for Diffusion Transformers", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/95445)] [**`PTQ`**] 203 | 204 | [[Back to Overview](#overview)] 205 | 206 | ## Convolutional Neural Networks 207 | ### Visual Generation 208 | - "BiDM: Pushing the Limit of Quantization for Diffusion Models", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/93620)] 209 | - "BitsFusion: 1.99 bits Weight Quantization of Diffusion Model", NeurIPS, 2024. [[paper](https://nips.cc/virtual/2024/poster/96909)] 210 | - "Timestep-Aware Correction for Quantized Diffusion Models", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/8312_ECCV_2024_paper.php)] 211 | - "Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/7353_ECCV_2024_paper.php)] [**`PTQ`**] 212 | - "Memory-Efficient Fine-Tuning for Quantized Diffusion Model", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/2494_ECCV_2024_paper.php)] 213 | - "MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/2212_ECCV_2024_paper.php)] 214 | - "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Huang_TFMQ-DM_Temporal_Feature_Maintenance_Quantization_for_Diffusion_Models_CVPR_2024_paper.html)] [**`PTQ`**] 215 | - "Towards Accurate Post-training Quantization for Diffusion Models", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Wang_Towards_Accurate_Post-training_Quantization_for_Diffusion_Models_CVPR_2024_paper.html)] [**`PTQ`**] 216 | - "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models", ICLR, 2024. 
[[paper](https://openreview.net/forum?id=UmMa3UNDAz)] 217 | - "QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning", arXiv, 2024. [[paper](http://arxiv.org/abs/2402.03666)] 218 | - "Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models", arXiv, 2024. [[paper](http://arxiv.org/abs/2401.04585)] 219 | - "Efficient Quantization Strategies for Latent Diffusion Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.05431)] [**`PTQ`**] 220 | - "Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2311.06322)] 221 | - "Effective Quantization for Diffusion Models on CPUs", arXiv, 2023. [[paper](http://arxiv.org/abs/2311.16133)] 222 | - "PTQD: Accurate Post-Training Quantization for Diffusion Models", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/71314)] [**`PTQ`**] 223 | - "Q-DM: An Efficient Low-bit Quantized Diffusion Model", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/70279)] 224 | - "Temporal Dynamic Quantization for Diffusion Models", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/72396)] 225 | - "Q-diffusion: Quantizing Diffusion Models", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_Q-Diffusion_Quantizing_Diffusion_Models_ICCV_2023_paper.pdf)] [[code](https://github.com/Xiuyu-Li/q-diffusion)] [**`PTQ`**] 226 | - "Towards Accurate Data-free Quantization for Diffusion Models", arXiv, 2023. [[paper](http://arxiv.org/abs/2305.18723)] [**`PTQ`**] 227 | - "Post-training Quantization on Diffusion Models", CVPR, 2023. [[paper](http://openaccess.thecvf.com/content/CVPR2023/html/Shang_Post-Training_Quantization_on_Diffusion_Models_CVPR_2023_paper.html)] [[code](https://github.com/42Shawn/PTQ4DM)] [**`PTQ`**] 228 | 229 | [[Back to Overview](#overview)] 230 | 231 | ### Image Classification 232 | - "MetaAug: Meta-Data Augmentation for Post-Training Quantization", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/3914_ECCV_2024_paper.php)] 233 | - "Sharpness-Aware Data Generation for Zero-shot Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=8mKXMnhnFW)] 234 | - "A2Q+: Improving Accumulator-Aware Weight Quantization", ICML, 2024. [[paper](https://openreview.net/forum?id=mbx2pLK5Eq)] 235 | - "HyQ: Hardware-Friendly Post-Training Quantization for CNN-Transformer Hybrid Networks", IJCAI, 2024. [[paper](https://www.ijcai.org/proceedings/2024/474)] [**`PTQ`**] 236 | - "Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Tang_Retraining-Free_Model_Quantization_via_One-Shot_Weight-Coupling_Learning_CVPR_2024_paper.html)] [**`MP`**] 237 | - "Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Chen_Mixed-Precision_Quantization_for_Federated_Learning_on_Resource-Constrained_Heterogeneous_Devices_CVPR_2024_paper.html)] [**`MP`**] 238 | - "Enhancing Post-training Quantization Calibration through Contrastive Learning", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Shang_Enhancing_Post-training_Quantization_Calibration_through_Contrastive_Learning_CVPR_2024_paper.html)] [**`PTQ`**] 239 | - "Data-Free Quantization via Pseudo-label Filtering", CVPR, 2024.
[[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Fan_Data-Free_Quantization_via_Pseudo-label_Filtering_CVPR_2024_paper.html)] 240 | - "Make RepVGG Greater Again: A Quantization-Aware Approach", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29045)] 241 | - "MetaMix: Meta-State Precision Searcher for Mixed-Precision Activation Quantization", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29212)] [**`MP`**] 242 | - "Robustness-Guided Image Synthesis for Data-Free Quantization", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/28972)] 243 | - "PTMQ: Post-training Multi-Bit Quantization of Neural Networks", AAAI, 2024. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/29553)] [**`PTQ`**] 244 | - "Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs", arXiv, 2024. [[paper](http://arxiv.org/abs/2401.17544)] 245 | - "StableQ: Enhancing Data-Scarce Quantization with Text-to-Image Data", arXiv, 2023. [[paper](http://arxiv.org/abs/2312.05272)] 246 | - "Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/71526)] [**`Extreme`**] 247 | - "TexQ: Zero-shot Network Quantization with Texture Feature Distribution Calibration", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/70325)] 248 | - "Overcoming Forgetting Catastrophe in Quantization-Aware Training", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Chen_Overcoming_Forgetting_Catastrophe_in_Quantization-Aware_Training_ICCV_2023_paper.pdf)] 249 | - "Causal-DFQ: Causality Guided Data-Free Network Quantization", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Shang_Causal-DFQ_Causality_Guided_Data-Free_Network_Quantization_ICCV_2023_paper.pdf)] [[code](https://github.com/42Shawn/Causal-DFQ)] 250 | - "DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Li_DenseShift_Towards_Accurate_and_Efficient_Low-Bit_Power-of-Two_Quantization_ICCV_2023_paper.pdf)] 251 | - "EQ-Net: Elastic Quantization Neural Networks", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_EQ-Net_Elastic_Quantization_Neural_Networks_ICCV_2023_paper.pdf)] [[code](https://github.com/xuke225/EQ-Net)] 252 | - "A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Colbert_A2Q_Accumulator-Aware_Quantization_with_Guaranteed_Overflow_Avoidance_ICCV_2023_paper.pdf)] 253 | - "EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Dong_EMQ_Evolving_Training-free_Proxies_for_Automated_Mixed_Precision_Quantization_ICCV_2023_paper.pdf)] [**`MP`**] 254 | - "Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning", ICCV, 2023. [[paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Bai_Unified_Data-Free_Compression_Pruning_and_Quantization_without_Fine-Tuning_ICCV_2023_paper.pdf)] [**`PTQ`**] 255 | - "Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction", ICML, 2023.
[[paper](https://openreview.net/forum?id=m2S96Qf2R3)] [[code](https://github.com/SkoltechAI/fewbit)] 256 | - "FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization", ICML, 2023. [[paper](https://openreview.net/forum?id=EPnzNJTYsb)] [**`PTQ`**] 257 | - "Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning", PR, 2023. [[paper](http://arxiv.org/abs/2307.00498)] 258 | - "OMPQ: Orthogonal Mixed Precision Quantization", AAAI, 2023. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/26084)] [**`MP`**] 259 | - "Rethinking Data-Free Quantization as a Zero-Sum Game", AAAI, 2023. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/26136)] 260 | - "Quantized Feature Distillation for Network Quantization", AAAI, 2023. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/26354)] 261 | - "Resilient Binary Neural Network", AAAI, 2023. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/26261)] [**`Extreme`**] 262 | - "Fast and Accurate Binary Neural Networks Based on Depth-Width Reshaping", AAAI, 2023. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/26268)] [**`Extreme`**] 263 | - "Efficient Quantization-aware Training with Adaptive Coreset Selection", arXiv, 2023. [[paper](http://arxiv.org/abs/2306.07215)] 264 | - "One-Shot Model for Mixed-Precision Quantization", CVPR, 2023. [[paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Koryakovskiy_One-Shot_Model_for_Mixed-Precision_Quantization_CVPR_2023_paper.pdf)] [**`MP`**] 265 | - "Adaptive Data-Free Quantization", CVPR, 2023. [[paper](https://arxiv.org/abs/2303.06869)] 266 | - "Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization", CVPR, 2023. [[paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Lin_Bit-Shrinking_Limiting_Instantaneous_Sharpness_for_Improving_Post-Training_Quantization_CVPR_2023_paper.pdf)] [**`PTQ`**] 267 | - "Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective", CVPR, 2023. [[paper](https://arxiv.org/pdf/2303.11906.pdf)] [[code](https://github.com/bytedance/mrecg)] [**`PTQ`**] 268 | - "GENIE: Show Me the Data for Quantization", CVPR, 2023. [[paper](https://arxiv.org/abs/2212.04780)] [[code](https://github.com/SamsungLabs/Genie)] [**`PTQ`**] 269 | - "Bayesian asymmetric quantized neural networks", PR, 2023. [[paper](https://www.sciencedirect.com/science/article/pii/S0031320323001632)] 270 | - "Distribution-sensitive Information Retention for Accurate Binary Neural Network", IJCV, 2023. [[paper](https://arxiv.org/abs/2109.12338)] [**`Extreme`**] 271 | - "SDQ: Stochastic Differentiable Quantization with Mixed Precision", ICML, 2022. [[paper](https://proceedings.mlr.press/v162/huang22h.html)] [**`MP`**] 272 | - "Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks", ICML, 2022. [[paper](https://proceedings.mlr.press/v162/dong22a.html)] [[code](https://github.com/RunpeiDong/DGMS)] 273 | - "GACT: Activation Compressed Training for Generic Network Architectures", ICML, 2022. [[paper](https://proceedings.mlr.press/v162/liu22v.html)] [[code](https://github.com/LiuXiaoxuanPKU/GACT-ICML)] 274 | - "Overcoming Oscillations in Quantization-Aware Training", ICML, 2022. [[paper](https://proceedings.mlr.press/v162/nagel22a/nagel22a.pdf)] [[code](https://github.com/qualcomm-ai-research/oscillations-qat)] 275 | - "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation", CVPR, 2022. 
[[paper](https://arxiv.org/abs/2111.14826)] [[code](https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization)] [**`Non-uniform`**] 276 | - "Learnable Lookup Table for Neural Network Quantization", CVPR, 2022. [[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Wang_Learnable_Lookup_Table_for_Neural_Network_Quantization_CVPR_2022_paper.pdf)] [[code](https://github.com/The-Learning-And-Vision-Atelier-LAVA/LLT)] [**`Non-uniform`**] 277 | - "Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error", CVPR, 2022. [[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Jeon_Mr.BiQ_Post-Training_Non-Uniform_Quantization_Based_on_Minimizing_the_Reconstruction_Error_CVPR_2022_paper.pdf)] [**`PTQ`**] [**`Non-uniform`**] 278 | - "Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization", CVPR, 2022. [[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Chikin_Data-Free_Network_Compression_via_Parametric_Non-Uniform_Mixed_Precision_Quantization_CVPR_2022_paper.pdf)] [**`Non-uniform`**] [**`MP`**] 279 | - "IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization", CVPR, 2022. [[paper](https://openaccess.thecvf.com/content/CVPR2022/html/Zhong_IntraQ_Learning_Synthetic_Images_With_Intra-Class_Heterogeneity_for_Zero-Shot_Network_CVPR_2022_paper.html)] [[code](https://github.com/zysxmu/IntraQ)] 280 | - "Instance-Aware Dynamic Neural Network Quantization", CVPR, 2022. [[paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_Instance-Aware_Dynamic_Neural_Network_Quantization_CVPR_2022_paper.pdf)] 281 | - "Leveraging Inter-Layer Dependency for Post-Training Quantization", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=54389)] [**`PTQ`**] 282 | - "Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=53476)] 283 | - "Entropy-Driven Mixed-Precision Quantization for Deep Network Design", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=54104)] [**`MP`**] 284 | - "Redistribution of Weights and Activations for AdderNet Quantization", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=54812)] 285 | - "FP8 Quantization: The Power of the Exponent", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=53073)] [[code](https://github.com/qualcomm-ai-research/fp8-quantization)] 286 | - "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=53412)] [[code](https://github.com/ist-daslab/obc)] [**`PTQ`**] 287 | - "ClimbQ: Class Imbalanced Quantization Enabling Robustness on Efficient Inferences", NeurIPS, 2022. [[paper](https://nips.cc/Conferences/2022/Schedule?showEvent=55162)] 288 | - "Non-Uniform Step Size Quantization for Accurate Post-Training Quantization", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136710657.pdf)] [**`PTQ`**] [**`Non-uniform`**] 289 | - "Towards Accurate Network Quantization with Equivalent Smooth Regularizer", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136710726.pdf)] 290 | - "BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks", ECCV, 2022. 
[[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136720017.pdf)] [[code](https://github.com/HanByulKim/BASQ)] 291 | - "RDO-Q: Extremely Fine-Grained Channel-Wise Quantization via Rate-Distortion Optimization", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136720156.pdf)] 292 | - "Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance", ECCV, 2022. [[paper](https://arxiv.org/abs/2203.08368)] [[code](https://github.com/1hunters/LIMPQ)] [**`MP`**] 293 | - "Symmetry Regularization and Saturating Nonlinearity for Robust Quantization", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136710207.pdf)] 294 | - "RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization", IJCAI, 2022. [[paper](https://www.ijcai.org/proceedings/2022/219)] [[code](https://github.com/billamihom/rapq)] [**`PTQ`**] 295 | - "MultiQuant: Training Once for Multi-bit Quantization of Neural Networks", IJCAI, 2022. [[paper](https://www.ijcai.org/proceedings/2022/504)] 296 | - "F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization", ICLR, 2022. [[paper](https://openreview.net/forum?id=_CfpJazzXT2)] 297 | - "8-bit Optimizers via Block-wise Quantization", ICLR, 2022. [[paper](https://openreview.net/forum?id=shpkpVXzo3h)] [[code](https://github.com/facebookresearch/bitsandbytes)] 298 | - "Information Bottleneck: Exact Analysis of (Quantized) Neural Networks", ICLR, 2022. [[paper](https://openreview.net/forum?id=kF9DZQQrU0w)] [[code](https://github.com/StephanLorenzen/ExactIBAnalysisInQNNs)] 299 | - "QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization", ICLR, 2022. [[paper](https://openreview.net/forum?id=ySQH0oDyp7)] [[code](https://github.com/wimh966/QDrop)] [**`PTQ`**] 300 | - "SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation", ICLR, 2022. [[paper](https://openreview.net/forum?id=JXhROKNZzOc)] [[code](https://github.com/clevercool/SQuant)] [**`PTQ`**] 301 | - "FILM-QNN: Efficient FPGA Acceleration of Deep Neural Networks with Intra-Layer, Mixed-Precision Quantization", FPGA, 2022. [[paper](https://dl.acm.org/doi/abs/10.1145/3490422.3502364)] [**`MP`**] 302 | - "Accurate Post Training Quantization with Small Calibration Sets", ICML, 2021. [[paper](http://proceedings.mlr.press/v139/hubara21a.html)] [[code](https://github.com/papers-submission/CalibTIP)] [**`PTQ`**] 303 | - "How Do Adam and Training Strategies Help BNNs Optimization?", ICML, 2021. [[paper](http://proceedings.mlr.press/v139/liu21t/liu21t.pdf)] [[code](https://github.com/liuzechun/AdamBNN)] 304 | - "ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training", ICML, 2021. [[paper](https://proceedings.mlr.press/v139/chen21z.html)] [[code](https://github.com/ucbrise/actnn)] 305 | - "HAWQ-V3: Dyadic Neural Network Quantization", ICML, 2021. [[paper](https://proceedings.mlr.press/v139/yao21a.html)] [[code](https://github.com/Zhen-Dong/HAWQ)] [**`MP`**] 306 | - "Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution", ICML, 2021. [[paper](https://proceedings.mlr.press/v139/zhang21r.html)] [**`MP`**] 307 | - "Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators", ICML, 2021.
[[paper](https://proceedings.mlr.press/v139/fu21d.html)] [[code](https://github.com/RICE-EIC/Auto-NBA)] 308 | - "Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples", NeurIPS, 2021. [[paper](https://openreview.net/forum?id=ejo1_Weiart)] [[code](https://github.com/iamkanghyunchoi/qimera)] 309 | - "Post-Training Sparsity-Aware Quantization", NeurIPS, 2021. [[paper](https://openreview.net/forum?id=qe9z54E_cqE)] [[code](https://github.com/gilshm/sparq)] [**`PTQ`**] 310 | - "Diversifying Sample Generation for Accurate Data-Free Quantization", CVPR, 2021. [[paper](https://arxiv.org/abs/2103.01049)] [**`PTQ`**] 311 | - "Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks", CVPR, 2021. [[paper](https://arxiv.org/abs/2010.15703)] [[code](https://github.com/uber-research/permute-quantize-finetune)] 312 | - "Learnable Companding Quantization for Accurate Low-bit Neural Networks", CVPR, 2021. [[paper](https://arxiv.org/abs/2103.07156)] 313 | - "Zero-shot Adversarial Quantization", CVPR, 2021. [[paper](https://arxiv.org/abs/2103.15263)] [[code](https://github.com/FLHonker/ZAQ-code)] 314 | - "Network Quantization with Element-wise Gradient Scaling", CVPR, 2021. [[paper](https://arxiv.org/abs/2104.00903)] [[code](https://github.com/cvlab-yonsei/EWGS)] 315 | - "High-Capacity Expert Binary Networks", ICLR, 2021. [[paper](https://openreview.net/forum?id=MxaY4FzOTa)] [[code](https://github.com/1adrianb/expert-binary-networks)] [**`Extreme`**] 316 | - "Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network", ICLR, 2021. [[paper](https://openreview.net/forum?id=U_mat0b9iv)] [[code](https://github.com/chrundle/biprop)] [**`Extreme`**] 317 | - "BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction", ICLR, 2021. [[paper](https://openreview.net/forum?id=POWv6hDd9XH)] [[code](https://github.com/yhhhli/BRECQ)] [**`PTQ`**] 318 | - "Neural gradients are near-lognormal: improved quantized and sparse training", ICLR, 2021. [[paper](https://openreview.net/forum?id=EoFNy62JGd)] 319 | - "Training with Quantization Noise for Extreme Model Compression", ICLR, 2021. [[paper](https://openreview.net/forum?id=dV19Yyi1fS3)] 320 | - "BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization", ICLR, 2021. [[paper](https://openreview.net/forum?id=TiXl51SCNw8)] [[code](https://github.com/yanghr/BSQ)] [**`MP`**] 321 | - "Simple Augmentation Goes a Long Way: ADRL for DNN Quantization", ICLR, 2021. [[paper](https://openreview.net/forum?id=Qr0aRliE_Hb)] 322 | - "Distribution Adaptive INT8 Quantization for Training CNNs", AAAI, 2021. [[paper](https://www.aaai.org/AAAI21Papers/AAAI-7144.ZhaoK.pdf)] 323 | - "Stochastic Precision Ensemble: Self-Knowledge Distillation for Quantized Deep Neural Networks", AAAI, 2021. [[paper](https://arxiv.org/abs/2009.14502)] 324 | - "Optimizing Information Theory Based Bitwise Bottlenecks for Efficient Mixed-Precision Activation Quantization", AAAI, 2021. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/16474/16281)] [**`MP`**] 325 | - "OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization", AAAI, 2021.
[[paper](https://www.aaai.org/AAAI21Papers/AAAI-1054.HuP.pdf)] 326 | - "Scalable Verification of Quantized Neural Networks", AAAI, 2021. [[paper](https://arxiv.org/pdf/2012.08185)] [[code](https://github.com/mlech26l/qnn_robustness_benchmarks)] 327 | - "Uncertainty Quantification in CNN through the Bootstrap of Convex Neural Networks", AAAI, 2021. [[paper](https://ojs.aaai.org/index.php/AAAI/article/view/17434/17241)] 328 | - "FracBits: Mixed Precision Quantization via Fractional Bit-Widths", AAAI, 2021. [[paper](https://www.semanticscholar.org/paper/FracBits%3A-Mixed-Precision-Quantization-via-Yang-Jin/cb219432863778fa173925d51fbf02af1d17ad98)] [**`MP`**] 329 | - "Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision", AAAI, 2021. [[paper](https://arxiv.org/pdf/2002.09049)] [**`PTQ`**] [**`MP`**] 330 | - "ZeroQ: A Novel Zero Shot Quantization Framework", CVPR, 2020. [[paper](http://openaccess.thecvf.com/content_CVPR_2020/html/Cai_ZeroQ_A_Novel_Zero_Shot_Quantization_Framework_CVPR_2020_paper.html)] [[code](https://github.com/amirgholami/ZeroQ)] [**`PTQ`**] 331 | - "LSQ+: Improving Low-bit Quantization Through Learnable Offsets and Better Initialization", CVPR Workshop, 2020. [[paper](http://openaccess.thecvf.com/content_CVPRW_2020/html/w40/Bhalgat_LSQ_Improving_Low-Bit_Quantization_Through_Learnable_Offsets_and_Better_Initialization_CVPRW_2020_paper.html)] 332 | - "HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks", NeurIPS, 2020. [[paper](https://proceedings.neurips.cc/paper/2020/hash/d77c703536718b95308130ff2e5cf9ee-Abstract.html)] [**`MP`**] 333 | - "Learned Step Size Quantization", ICLR, 2020. [[paper](https://openreview.net/forum?id=rkgO66VKDS)] 334 | - "HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision", ICCV, 2019. [[paper](https://openaccess.thecvf.com/content_ICCV_2019/html/Dong_HAWQ_Hessian_AWare_Quantization_of_Neural_Networks_With_Mixed-Precision_ICCV_2019_paper.html)] [**`MP`**] 335 | - "Data-Free Quantization Through Weight Equalization and Bias Correction", ICCV, 2019. [[paper](https://openaccess.thecvf.com/content_ICCV_2019/html/Nagel_Data-Free_Quantization_Through_Weight_Equalization_and_Bias_Correction_ICCV_2019_paper.html)] [**`PTQ`**] 336 | - "HAQ: Hardware-Aware Automated Quantization with Mixed Precision", CVPR, 2019. [[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_HAQ_Hardware-Aware_Automated_Quantization_With_Mixed_Precision_CVPR_2019_paper.pdf)] [[code](https://github.com/mit-han-lab/haq)] [**`MP`**] 337 | - "PACT: Parameterized Clipping Activation for Quantized Neural Networks", arXiv, 2018. [[paper](https://arxiv.org/abs/1805.06085)] 338 | - "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", CVPR, 2018. [[paper](https://openaccess.thecvf.com/content_cvpr_2018/papers/Jacob_Quantization_and_Training_CVPR_2018_paper.pdf)] 339 | 340 | 341 | [[Back to Overview](#overview)] 342 |
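Many of the QAT entries in the classification list above (LSQ, PACT, EWGS, the oscillation papers, and others) build on the straight-through estimator (STE): the forward pass applies rounding and clipping, while the backward pass treats rounding as identity so gradients can reach the underlying float weights. A minimal PyTorch sketch of this shared building block follows; the class name and the fixed 4-bit grid are illustrative assumptions, not any cited paper's implementation.

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Quantize in forward; pass gradients straight through in backward."""

    @staticmethod
    def forward(ctx, x, scale, qmin, qmax):
        q = torch.round(x / scale)
        ctx.save_for_backward((q >= qmin) & (q <= qmax))
        return torch.clamp(q, qmin, qmax) * scale  # dequantized output

    @staticmethod
    def backward(ctx, grad_output):
        (in_range,) = ctx.saved_tensors
        # STE: identity gradient inside the clipping range, zero outside.
        return grad_output * in_range, None, None, None

x = torch.randn(8, requires_grad=True)
y = FakeQuantSTE.apply(x, torch.tensor(0.1), -8, 7)  # 4-bit symmetric grid
y.sum().backward()
print(x.grad)  # ones where x/scale stayed in range, zeros where it was clipped
```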
343 | ### Other Tasks 344 | #### Object Detection 345 | - "Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector", CVPR, 2024. [[paper](https://openaccess.thecvf.com/content/CVPR2024/html/Ding_Reg-PTQ_Regression-specialized_Post-training_Quantization_for_Fully_Quantized_Object_Detector_CVPR_2024_paper.html)] [**`PTQ`**] 346 | - "Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric", arXiv, 2023. [[paper](https://arxiv.org/abs/2304.09785)] [**`PTQ`**] 347 | - "AQD: Towards Accurate Quantized Object Detection", CVPR, 2021. [[paper](http://arxiv.org/abs/2007.06919)] 348 | - "BiDet: An Efficient Binarized Object Detector", CVPR, 2020. [[paper](https://arxiv.org/abs/2003.03961)] [[code](https://github.com/ZiweiWangTHU/BiDet)] [**`Extreme`**] 349 | - "Fully Quantized Network for Object Detection", CVPR, 2019. [[paper](https://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Fully_Quantized_Network_for_Object_Detection_CVPR_2019_paper.pdf)] 350 | 351 | [[Back to Overview](#overview)] 352 | 353 | #### Super Resolution 354 | - "Towards Robust Full Low-bit Quantization of Super Resolution Networks", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/9567_ECCV_2024_paper.php)] [**`PTQ`**] 355 | - "Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks", ECCV, 2024. [[paper](https://www.ecva.net/papers/eccv_2024/papers_ECCV/html/2121_ECCV_2024_paper.php)] 356 | - "QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution", NeurIPS, 2023. [[paper](https://neurips.cc/virtual/2023/poster/72890)] 357 | - "Toward Accurate Post-Training Quantization for Image Super Resolution", CVPR, 2023. [[paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Tu_Toward_Accurate_Post-Training_Quantization_for_Image_Super_Resolution_CVPR_2023_paper.pdf)] [[code](https://github.com/huawei-noah/Efficient-Computing/tree/master/Quantization/PTQ4SR)] [**`PTQ`**] 358 | - "EBSR: Enhanced Binary Neural Network for Image Super-Resolution", arXiv, 2023. [[paper](https://arxiv.org/abs/2303.12270)] [**`Extreme`**] 359 | - "CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution", ECCV, 2022. [[paper](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136670360.pdf)] [[code](https://github.com/cheeun/cadyq)] 361 | - "Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks", ECCV, 2022. [[paper](https://arxiv.org/abs/2203.03844)] [[code](https://github.com/zysxmu/ddtb)] 362 | - "DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks", WACV, 2022. [[paper](http://openaccess.thecvf.com/content/WACV2022/html/Hong_DAQ_Channel-Wise_Distribution-Aware_Quantization_for_Deep_Image_Super-Resolution_Networks_WACV_2022_paper.html)] [[code](https://github.com/Cheeun/DAQ-pytorch)] 363 | - "Fully Quantized Image Super-Resolution Networks", ACM MM, 2021. [[paper](https://arxiv.org/abs/2011.14265)] [[code](https://github.com/billhhh/FQSR)] 364 | - "PAMS: Quantized Super-Resolution via Parameterized Max Scale", ECCV, 2020. [[paper](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123700562.pdf)] [[code](https://github.com/colorjam/PAMS)] 365 | - "Training Binary Neural Network without Batch Normalization for Image Super-Resolution", AAAI, 2021. [[paper](https://ojs.aaai.org/index.php/AAAI/article/download/16263/16070)] [**`Extreme`**] 366 | 367 | [[Back to Overview](#overview)] 368 | 369 | #### Point Cloud 370 | - "LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection", ICLR, 2024.
[[paper](https://openreview.net/forum?id=0d1gQI114C)] [**`PTQ`**] 371 | - "Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis", arXiv, 2023. [[paper](https://arxiv.org/abs/2303.15493)] [**`Extreme`**] 372 | - "BiPointNet: Binary Neural Network for Point Clouds", ICLR, 2021. [[paper](https://openreview.net/forum?id=9QLRCVysdlO)] [[code](https://github.com/htqin/BiPointNet)] [**`Extreme`**] 373 | 374 | [[Back to Overview](#overview)] 375 | 376 | 377 | 378 | --- 379 | 380 | ## References 381 | * Online Resources: 382 | * [MQBench (Benchmark)](http://mqbench.tech/) 383 | * [Awesome Model Quantization (GitHub)](https://github.com/htqin/awesome-model-quantization) 384 | * [Awesome Transformer Attention (GitHub)](https://github.com/cmhungsteve/Awesome-Transformer-Attention) 385 | 386 | 387 | --------------------------------------------------------------------------------