# Awesome Knowledge Distillation in Computer Vision

[TOC]

## Diffusion Knowledge Distillation

| Title | Venue | Note |
| ----- | ----- | ---- |
| A Comprehensive Survey on Knowledge Distillation of Diffusion Models | 2023 | Weijian Luo. [[pdf](https://arxiv.org/pdf/2304.04262.pdf)] |
| Knowledge distillation in iterative generative models for improved sampling speed | 2021 | Eric Luhman, Troy Luhman. [[pdf](https://arxiv.org/abs/2101.02388)] |
| Progressive Distillation for Fast Sampling of Diffusion Models | ICLR 2022 | Tim Salimans, Jonathan Ho. [[pdf](https://arxiv.org/abs/2202.00512)] |
| On Distillation of Guided Diffusion Models | CVPR 2023 | Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans. [[pdf](https://arxiv.org/abs/2210.03142)] |
| TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation | 2023 | David Berthelot, Arnaud Autef, Jierui Lin, Dian Ang Yap, Shuangfei Zhai, Siyuan Hu, Daniel Zheng, Walter Talbott, Eric Gu. [[pdf](https://arxiv.org/abs/2303.04248)] |
| BK-SDM: Architecturally Compressed Stable Diffusion for Efficient Text-to-Image Generation | ICML 2023 | Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, Shinkook Choi. [[pdf](https://openreview.net/forum?id=bOVydU0XKC)] |
| On Architectural Compression of Text-to-Image Diffusion Models | 2023 | Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, Shinkook Choi. [[pdf](https://arxiv.org/abs/2305.15798)] |
| Knowledge Diffusion for Distillation | 2023 | Tao Huang, Yuan Zhang, Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Chang Xu. [[pdf](https://www.researchgate.net/publication/371040763_Knowledge_Diffusion_for_Distillation)] |
| SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds | 2023 | Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, Jian Ren. [[pdf](https://snap-research.github.io/SnapFusion/)] |
| BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping | 2023 | Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Lingjie Liu, Josh Susskind. [[pdf](https://arxiv.org/abs/2306.05544)] |
| Consistency Models | 2023 | Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever. [[pdf](https://arxiv.org/abs/2303.01469)] |
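Most of the entries above share one mechanism: a student denoiser learns to reproduce in one step what the teacher needs several steps to produce. Below is a minimal sketch of that idea, not a faithful reimplementation of any single paper: the `teacher`/`student` epsilon-prediction networks and the `alphas` schedule are assumed placeholders, and the SNR weighting and output parameterizations used in Progressive Distillation are omitted.

```python
import math
import torch
import torch.nn.functional as F

def ddim_step(eps_model, z, t, s, alphas):
    """One deterministic DDIM step from timestep t to an earlier timestep s (eta = 0)."""
    a_t = alphas[t].view(-1, 1, 1, 1)
    a_s = alphas[s].view(-1, 1, 1, 1)
    eps = eps_model(z, t)                                      # assumed signature: eps = f(z_t, t)
    x0 = (z - (1 - a_t).sqrt() * eps) / a_t.sqrt()             # implied clean image
    return a_s.sqrt() * x0 + (1 - a_s).sqrt() * eps            # move to noise level s

def step_distill_loss(student, teacher, x0, alphas):
    """Student jumps t -> t-2 in one step; the target is the teacher's two half-steps."""
    b, T = x0.size(0), alphas.numel()
    t = torch.randint(2, T, (b,), device=x0.device)
    noise = torch.randn_like(x0)
    a_t = alphas[t].view(-1, 1, 1, 1)
    z_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * noise           # diffuse clean data to level t
    with torch.no_grad():                                      # teacher: two small steps
        z_mid = ddim_step(teacher, z_t, t, t - 1, alphas)
        z_tgt = ddim_step(teacher, z_mid, t - 1, t - 2, alphas)
    z_pred = ddim_step(student, z_t, t, t - 2, alphas)         # student: one big step
    return F.mse_loss(z_pred, z_tgt)

# Illustrative cumulative signal (alpha-bar) schedule, indexable by integer timestep.
alphas = torch.cos(torch.linspace(0, 0.99, 1000) * math.pi / 2) ** 2
```

In progressive distillation the distilled student then becomes the next round's teacher, halving the number of sampling steps each round.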
## Knowledge Distillation for Semantic Segmentation

| Title | Venue | Note |
| ----- | ----- | ---- |
| Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks | arXiv:1709.00513 | |
| Knowledge Distillation for Semantic Segmentation | | |
| Structured knowledge distillation for semantic segmentation | CVPR 2019 | |
| Intra-class feature variation distillation for semantic segmentation | ECCV 2020 | |
| Channel-wise knowledge distillation for dense prediction | ICCV 2021 | |
| Double Similarity Distillation for Semantic Image Segmentation | TIP 2021 | |
| Cross-Image Relational Knowledge Distillation for Semantic Segmentation | CVPR 2022 | |
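Several of the methods above replace per-pixel feature regression with a distribution match. A hedged sketch of a channel-wise KL loss in the spirit of "Channel-wise knowledge distillation for dense prediction" (the temperature and the 1x1 alignment convolution are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_wise_kd(feat_s, feat_t, tau=4.0):
    """KL between teacher and student spatial distributions, computed per channel.
    feat_s, feat_t: (N, C, H, W) feature maps (or per-class logit maps) with matching C."""
    n, c, h, w = feat_s.shape
    log_p_s = F.log_softmax(feat_s.view(n, c, -1) / tau, dim=-1)   # softmax over H*W
    p_t = F.softmax(feat_t.view(n, c, -1) / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="sum") * (tau ** 2) / (n * c)

# Typical use: align student channels to the teacher's with a 1x1 conv, then distill.
align = nn.Conv2d(128, 256, kernel_size=1)           # illustrative channel sizes
feat_s = torch.randn(2, 128, 64, 64)
feat_t = torch.randn(2, 256, 64, 64)
loss = channel_wise_kd(align(feat_s), feat_t.detach())
```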
## Knowledge Distillation for Object Detection

| Title | Venue | Note |
| ----- | ----- | ---- |
| Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks | arXiv:1709.00513 | |
| Mimicking very efficient network for object detection | CVPR 2017 | [pdf](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8100259) |
| Distilling object detectors with fine-grained feature imitation | CVPR 2019 | [pdf](https://arxiv.org/abs/1906.03609) |
| General instance distillation for object detection | CVPR 2021 | [pdf](https://arxiv.org/abs/2103.02340) |
| Distilling object detectors via decoupled features | CVPR 2021 | [pdf](https://arxiv.org/abs/2103.14475) |
| Distilling object detectors with feature richness | NeurIPS 2021 | [pdf](https://arxiv.org/abs/2111.00674) |
| Focal and global knowledge distillation for detectors | CVPR 2022 | [pdf](https://arxiv.org/abs/2111.11837) |
| Rank Mimicking and Prediction-guided Feature Imitation | AAAI 2022 | [pdf](https://ojs.aaai.org/index.php/AAAI/article/download/20018/version/18315/19777) |
| Prediction-Guided Distillation | ECCV 2022 | [pdf](https://arxiv.org/abs/2203.05469) |
| Masked Distillation with Receptive Tokens | ICLR 2023 | [pdf](https://arxiv.org/abs/2205.14589) |
| Structural Knowledge Distillation for Object Detection | NeurIPS 2022 | [OpenReview](https://openreview.net/forum?id=O3My0RK9s_R) |
| Dual Relation Knowledge Distillation for Object Detection | IJCAI 2023 | [pdf](https://arxiv.org/abs/2302.05637) |
| GLAMD: Global and Local Attention Mask Distillation for Object Detectors | ECCV 2022 | [ECVA](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136700456.pdf) |
| G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation | ICCV 2021 | [CVF](http://openaccess.thecvf.com/content/ICCV2021/html/Yao_G-DetKD_Towards_General_Distillation_Framework_for_Object_Detectors_via_Contrastive_ICCV_2021_paper.html) |
| PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient | NeurIPS 2022 | [OpenReview](https://openreview.net/forum?id=Q9dj3MzY1o7) |
| MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection | ECCV 2020 | [ECVA](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123590528.pdf) |
| LabelEnc: A New Intermediate Supervision Method for Object Detection | ECCV 2020 | [ECVA](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123700528.pdf) |

| Title | Venue | Note |
| ----- | ----- | ---- |
| HEtero-Assists Distillation for Heterogeneous Object Detectors | ECCV 2022 | HEAD |
| LGD: Label-Guided Self-Distillation for Object Detection | AAAI 2022 | LGD |
| When Object Detection Meets Knowledge Distillation: A Survey | TPAMI | |
| ScaleKD: Distilling Scale-Aware Knowledge in Small Object Detector | CVPR 2023 | ScaleKD |
| CrossKD: Cross-Head Knowledge Distillation for Dense Object Detection | arXiv:2306.11369 | CrossKD |
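A recurring recipe in the feature-imitation entries above: match the teacher's detection features only in regions covered by ground-truth boxes. A simplified single-level sketch (stride, channel sizes, and the box-to-mask rule are illustrative; the listed papers add adaptive or attention-derived masks):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def box_mask(boxes, feat_hw, stride):
    """Binary foreground mask on a feature map from ground-truth boxes (x1, y1, x2, y2)."""
    h, w = feat_hw
    mask = torch.zeros(h, w)
    for x1, y1, x2, y2 in boxes:
        xs, ys = int(x1 // stride), int(y1 // stride)
        xe, ye = int(x2 // stride) + 1, int(y2 // stride) + 1
        mask[max(ys, 0):min(ye, h), max(xs, 0):min(xe, w)] = 1.0
    return mask

def masked_imitation_loss(feat_s, feat_t, boxes_per_image, stride=8):
    """MSE between (adapted) student and teacher features, restricted to foreground."""
    n, c, h, w = feat_t.shape
    masks = torch.stack([box_mask(b, (h, w), stride) for b in boxes_per_image]).to(feat_t)
    masks = masks.unsqueeze(1)                               # (N, 1, H, W)
    diff = (feat_s - feat_t) ** 2 * masks
    return diff.sum() / (masks.sum() * c + 1e-6)

# Illustrative shapes: the student feature is projected to the teacher's channel dim first.
adapt = nn.Conv2d(128, 256, kernel_size=1)
feat_s, feat_t = torch.randn(1, 128, 100, 152), torch.randn(1, 256, 100, 152)
boxes = [torch.tensor([[32.0, 48.0, 256.0, 320.0]])]
loss = masked_imitation_loss(adapt(feat_s), feat_t.detach(), boxes)
```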
## Knowledge Distillation in Vision Transformers

| Title | Venue | Note |
| ----- | ----- | ---- |
| Training data-efficient image transformers & distillation through attention | ICML 2021 | |
| Co-advise: Cross inductive bias distillation | CVPR 2022 | |
| TinyViT: Fast Pretraining Distillation for Small Vision Transformers | arXiv:2207.10666 | |
| Attention Probe: Vision Transformer Distillation in the Wild | ICASSP 2022 | |
| DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers | CVPR 2022 | |
| Efficient vision transformers via fine-grained manifold distillation | NeurIPS 2022 | |
| Cross-Architecture Knowledge Distillation | arXiv:2207.05273 | |
| MiniViT: Compressing Vision Transformers with Weight Multiplexing | CVPR 2022 | |
| ViTKD: Practical Guidelines for ViT Feature Knowledge Distillation | arXiv 2022 | code |
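DeiT (the first entry) adds a distillation token that is supervised by the teacher's hard predictions. A condensed sketch of that loss, assuming the student already exposes separate class-token and distillation-token logits; the 50/50 weighting follows the paper's hard-distillation variant, and the toy tensors stand in for real model outputs:

```python
import torch
import torch.nn.functional as F

def deit_hard_distill_loss(logits_cls, logits_dist, teacher_logits, targets):
    """Half the loss supervises the class token with ground-truth labels; the other
    half supervises the distillation token with the teacher's argmax ('hard') labels."""
    loss_ce = F.cross_entropy(logits_cls, targets)
    teacher_labels = teacher_logits.argmax(dim=1)
    loss_kd = F.cross_entropy(logits_dist, teacher_labels)
    return 0.5 * loss_ce + 0.5 * loss_kd

# Toy shapes: batch of 8, 1000 classes; at inference DeiT averages the two heads' outputs.
logits_cls, logits_dist = torch.randn(8, 1000), torch.randn(8, 1000)
teacher_logits, targets = torch.randn(8, 1000), torch.randint(0, 1000, (8,))
loss = deit_hard_distill_loss(logits_cls, logits_dist, teacher_logits.detach(), targets)
```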
## Knowledge Distillation for Teacher-Student Gaps

| Title | Venue | Note |
| ----- | ----- | ---- |
| Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher | AAAI 2020 | |
| Search to Distill: Pearls are Everywhere but not the Eyes | CVPR 2020 | |
| Reducing the Teacher-Student Gap via Spherical Knowledge Distillation | arXiv 2020 | |
| Knowledge Distillation via the Target-aware Transformer | CVPR 2022 | |
| Decoupled Knowledge Distillation | CVPR 2022 | code |
| Prune Your Model Before Distill It | ECCV 2022 | code |
| Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again | NeurIPS 2022 | |
| Weighted Distillation with Unlabeled Examples | NeurIPS 2022 | |
| Respecting Transfer Gap in Knowledge Distillation | NeurIPS 2022 | |
| Knowledge Distillation from A Stronger Teacher | arXiv:2205.10536 | |
| Masked Generative Distillation | ECCV 2022 | code |
| Curriculum Temperature for Knowledge Distillation | AAAI 2023 | code |
| Knowledge distillation: A good teacher is patient and consistent | CVPR 2022 | |
| Knowledge Distillation with the Reused Teacher Classifier | CVPR 2022 | |
| Scaffolding a Student to Instill Knowledge | ICLR 2023 | |
| Function-Consistent Feature Distillation | ICLR 2023 | |
| Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation | ICLR 2023 | |
| Supervision Complexity and its Role in Knowledge Distillation | ICLR 2023 | |
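One listed remedy for the capacity gap, Decoupled Knowledge Distillation, splits the classic KD loss into a target/non-target part (TCKD) and a distribution over non-target classes (NCKD) so the two can be weighted independently. A condensed sketch of that decomposition (alpha, beta, and the temperature are illustrative defaults; consult the official code for the exact formulation):

```python
import torch
import torch.nn.functional as F

def dkd_loss(logits_s, logits_t, target, alpha=1.0, beta=8.0, tau=4.0):
    """Decoupled KD: KL on the 2-way target/non-target split plus KL over non-target classes."""
    gt = F.one_hot(target, logits_s.size(1)).float()

    p_s = F.softmax(logits_s / tau, dim=1)
    p_t = F.softmax(logits_t / tau, dim=1)
    # Target-class part: compare the binary (target, non-target) probability masses.
    b_s = torch.stack([(p_s * gt).sum(1), (p_s * (1 - gt)).sum(1)], dim=1)
    b_t = torch.stack([(p_t * gt).sum(1), (p_t * (1 - gt)).sum(1)], dim=1)
    tckd = F.kl_div(b_s.clamp_min(1e-8).log(), b_t, reduction="batchmean") * tau ** 2

    # Non-target part: push the target logit out, renormalise over the remaining classes.
    log_q_s = F.log_softmax(logits_s / tau - 1000.0 * gt, dim=1)
    q_t = F.softmax(logits_t / tau - 1000.0 * gt, dim=1)
    nckd = F.kl_div(log_q_s, q_t, reduction="batchmean") * tau ** 2

    return alpha * tckd + beta * nckd

logits_s, logits_t = torch.randn(16, 100), torch.randn(16, 100)
target = torch.randint(0, 100, (16,))
loss = dkd_loss(logits_s, logits_t.detach(), target)
```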
## Logits Knowledge Distillation

| Title | Venue | Note |
| ----- | ----- | ---- |
| Distilling the knowledge in a neural network | arXiv:1503.02531 | |
| Deep Model Compression: Distilling Knowledge from Noisy Teachers | arXiv:1610.09650 | |
| Semi-Supervised Knowledge Transfer for Deep Learning from Private Training Data | ICLR 2017 | |
| Knowledge Adaptation: Teaching to Adapt | arXiv:17022052 | |
| Learning from Multiple Teacher Networks | KDD 2017 | |
| Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results | NIPS 2017 | |
| Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students | arXiv:1805.551 | |
| Moonshine: Distilling with Cheap Convolutions | NIPS 2018 | |
| Positive-Unlabeled Compression on the Cloud | NIPS 2019 | |
| Variational Student: Learning Compact and Sparser Networks in Knowledge Distillation Framework | arXiv:1910.12061 | |
| Preparing Lessons: Improve Knowledge Distillation with Better Supervision | arXiv:1911.7471 | |
| Adaptive Regularization of Labels | arXiv:1908.5474 | |
| Learning Metrics from Teachers: Compact Networks for Image Embedding | CVPR 2019 | |
| Diversity with Cooperation: Ensemble Methods for Few-Shot Classification | ICCV 2019 | |
| Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher | arXiv:1902.03393 | |
| MEAL: Multi-Model Ensemble via Adversarial Learning | AAAI 2019 | |
| Revisit Knowledge Distillation: a Teacher-free Framework | CVPR 2020 [code] | |
| Ensemble Distribution Distillation | ICLR 2020 | |
| Noisy Collaboration in Knowledge Distillation | ICLR 2020 | |
| Self-training with Noisy Student improves ImageNet classification | CVPR 2020 | |
| QUEST: Quantized embedding space for transferring knowledge | CVPR 2020 (pre) | |
| Meta Pseudo Labels | ICML 2020 | |
| Subclass Distillation | ICML 2020 | |
| Boosting Self-Supervised Learning via Knowledge Transfer | CVPR 2018 | |
| Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model | CVPR 2020 [code] | |
| Regularizing Class-wise Predictions via Self-knowledge Distillation | CVPR 2020 [code] | |
| Rethinking Data Augmentation: Self-Supervision and Self-Distillation | ICLR 2020 | |
| What it Thinks is Important is Important: Robustness Transfers through Input Gradients | CVPR 2020 | |
| Role-Wise Data Augmentation for Knowledge Distillation | ICLR 2020 [code] | |
| Distilling Effective Supervision from Severe Label Noise | CVPR 2020 | |
| Learning with Noisy Class Labels for Instance Segmentation | ECCV 2020 | |
| Self-Distillation Amplifies Regularization in Hilbert Space | arXiv:2002.5715 | |
| MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers | arXiv:2002.10957 | |
| Hydra: Preserving Ensemble Diversity for Model Distillation | arXiv:20014694 | |
| Teacher-Class Network: A Neural Network Compression Mechanism | arXiv:2004.3281 | |
| Learning from a Lightweight Teacher for Efficient Knowledge Distillation | arXiv:2005.9163 | |
| Self-Distillation as Instance-Specific Label Smoothing | arXiv:2006.5065 | |
| Self-supervised Knowledge Distillation for Few-shot Learning | arXiv:2006.09785 | |
| Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation | arXiv:2007.1951 | |
| Few Sample Knowledge Distillation for Efficient Network Compression | CVPR 2020 | |
| Learning What and Where to Transfer | ICML 2019 | |
| Transferring Knowledge across Learning Processes | ICLR 2019 | |
| Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval | ICCV 2019 | |
| Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation | arXiv:1911.05329 | |
| Progressive Knowledge Distillation For Generative Modeling | ICLR 2020 | |
| Few Shot Network Compression via Cross Distillation | AAAI 2020 | |
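Nearly everything in this table builds on the softened-logits objective of "Distilling the knowledge in a neural network". A minimal reference sketch (the temperature and the CE/KD mixing weight are common illustrative defaults, not values prescribed by any one paper):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, tau=4.0, alpha=0.9):
    """Classic logit distillation: KL between temperature-softened distributions
    plus standard cross-entropy on the ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau ** 2                      # tau^2 keeps gradient magnitudes comparable
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard

s, t = torch.randn(32, 10), torch.randn(32, 10)
y = torch.randint(0, 10, (32,))
loss = kd_loss(s, t.detach(), y)
```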
## Intermediate Knowledge Distillation

| Title | Venue | Note |
| ----- | ----- | ---- |
| FitNets: Hints for Thin Deep Nets | arXiv:1412.6550 | |
| Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer | ICLR 2017 | |
| Knowledge Projection for Effective Design of Thinner and Faster Deep Neural Networks | arXiv:1710.9505 | |
| A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning | CVPR 2017 | |
| Paraphrasing complex network: Network compression via factor transfer | NIPS 2018 | |
| Knowledge transfer with Jacobian matching | ICML 2018 | |
| Like What You Like: Knowledge Distill via Neuron Selectivity Transfer | CVPR 2018 | |
| An Embarrassingly Simple Approach for Knowledge Distillation | MLR 2018 | |
| Self-supervised knowledge distillation using singular value decomposition | ECCV 2018 | |
| Learning Deep Representations with Probabilistic Knowledge Transfer | ECCV 2018 | |
| Correlation Congruence for Knowledge Distillation | ICCV 2019 | |
| Similarity-Preserving Knowledge Distillation | ICCV 2019 | |
| Variational Information Distillation for Knowledge Transfer | CVPR 2019 | |
| Knowledge Distillation via Instance Relationship Graph | CVPR 2019 | |
| Knowledge Distillation via Route Constrained Optimization | ICCV 2019 | |
| Stagewise Knowledge Distillation | arXiv:1911.6786 | |
| Distilling Object Detectors with Fine-grained Feature Imitation | CVPR 2019 | |
| Knowledge Squeezed Adversarial Network Compression | AAAI 2020 | |
| Knowledge Distillation from Internal Representations | AAAI 2020 | |
| Knowledge Flow: Improve Upon Your Teachers | ICLR 2019 | |
| LIT: Learned Intermediate Representation Training for Model Compression | ICML 2019 | |
| A Comprehensive Overhaul of Feature Distillation | ICCV 2019 | |
| Residual Knowledge Distillation | arXiv:2002.9168 | |
| Knowledge distillation via adaptive instance normalization | arXiv:2003.4289 | |
| Channel Distillation: Channel-Wise Attention for Knowledge Distillation | arXiv:2006.01683 | |
| Matching Guided Distillation | ECCV 2020 | |
| Differentiable Feature Aggregation Search for Knowledge Distillation | ECCV 2020 | |
| Local Correlation Consistency for Knowledge Distillation | ECCV 2020 | |
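Most feature-level ("hint") losses above follow the FitNets template: map a student layer onto the corresponding teacher layer and regress it. A minimal sketch (the 1x1 regressor, layer choice, and bilinear resizing are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """FitNets-style hint: a 1x1 conv regressor maps the student's intermediate
    feature map onto the teacher's, then an L2 loss matches the two."""
    def __init__(self, s_channels, t_channels):
        super().__init__()
        self.regressor = nn.Conv2d(s_channels, t_channels, kernel_size=1)

    def forward(self, feat_s, feat_t):
        feat_s = self.regressor(feat_s)
        if feat_s.shape[-2:] != feat_t.shape[-2:]:           # handle stride mismatches
            feat_s = F.interpolate(feat_s, size=feat_t.shape[-2:], mode="bilinear",
                                   align_corners=False)
        return F.mse_loss(feat_s, feat_t)

hint = HintLoss(s_channels=64, t_channels=256)
loss = hint(torch.randn(4, 64, 28, 28), torch.randn(4, 256, 14, 14).detach())
```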
## Online Knowledge Distillation

| Title | Venue | Note |
| ----- | ----- | ---- |
| Deep Mutual Learning | CVPR 2018 | |
| Born-Again Neural Networks | ICML 2018 | |
| Knowledge distillation by on-the-fly native ensemble | NIPS 2018 | |
| Collaborative learning for deep neural networks | NIPS 2018 | |
| Unifying Heterogeneous Classifiers with Distillation | CVPR 2019 | |
| Snapshot Distillation: Teacher-Student Optimization in One Generation | CVPR 2019 | |
| Deeply-supervised knowledge synergy | CVPR 2019 | |
| Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation | ICCV 2019 | |
| Distillation-Based Training for Multi-Exit Architectures | ICCV 2019 | |
| MSD: Multi-Self-Distillation Learning via Multi-classifiers within Deep Neural Networks | arXiv:1911.9418 | |
| FEED: Feature-level Ensemble for Knowledge Distillation | AAAI 2020 | |
| Stochasticity and Skip Connection Improve Knowledge Transfer | ICLR 2020 | |
| Online Knowledge Distillation with Diverse Peers | AAAI 2020 | |
| Online Knowledge Distillation via Collaborative Learning | CVPR 2020 | |
| Collaborative Learning for Faster StyleGAN Embedding | arXiv:20071758 | |
| Feature-map-level Online Adversarial Knowledge Distillation | ICML 2020 | |
| Knowledge Transfer via Dense Cross-layer Mutual-distillation | ECCV 2020 | |
| MetaDistiller: Network Self-boosting via Meta-learned Top-down Distillation | ECCV 2020 | |
| ResKD: Residual-Guided Knowledge Distillation | arXiv:2006.4719 | |
| Interactive Knowledge Distillation | arXiv:2007.1476 | |
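In the online setting there is no fixed teacher: peers are trained together and distill from one another at every step. A two-peer sketch in the spirit of Deep Mutual Learning (the toy linear models, optimizers, and equal loss weighting are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def peer_kl(p_from_logits, q_from_logits):
    """KL(q || p): q is the (detached) peer, p is the model being updated."""
    return F.kl_div(F.log_softmax(p_from_logits, dim=1),
                    F.softmax(q_from_logits.detach(), dim=1),
                    reduction="batchmean")

net_a = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # toy peers
net_b = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
opt_a = torch.optim.SGD(net_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(net_b.parameters(), lr=0.1)

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
for _ in range(2):                       # each peer minimises CE plus KL toward the other
    za, zb = net_a(x), net_b(x)
    loss_a = F.cross_entropy(za, y) + peer_kl(za, zb)
    loss_b = F.cross_entropy(zb, y) + peer_kl(zb, za)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
    opt_b.zero_grad(); loss_b.backward(); opt_b.step()
```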
## Understanding Knowledge Distillation

| Title | Venue | Note |
| ----- | ----- | ---- |
| Do deep nets really need to be deep? | NIPS 2014 | |
| When Does Label Smoothing Help? | NIPS 2019 | |
| Towards Understanding Knowledge Distillation | AAAI 2019 | |
| Harnessing Deep Neural Networks with Logic Rules | ACL 2016 | |
| Adaptive Regularization of Labels | arXiv:1908 | |
| Knowledge Isomorphism between Neural Networks | arXiv:1908 | |
| Understanding and Improving Knowledge Distillation | arXiv:2002.3532 | |
| The State of Knowledge Distillation for Classification | arXiv:1912.10850 | |
| Explaining Knowledge Distillation by Quantifying the Knowledge | CVPR 2020 | |
| DeepVID: deep visual interpretation and diagnosis for image classifiers via knowledge distillation | IEEE Trans. 2019 | |
| On the Unreasonable Effectiveness of Knowledge Distillation: Analysis in the Kernel Regime | arXiv:2003.13438 | |
| Why distillation helps: a statistical perspective | arXiv:2005.10419 | |
| Transferring Inductive Biases through Knowledge Distillation | arXiv:2006.555 | |
| Does label smoothing mitigate label noise? | ICML 2020 | Lukasik et al. |
| An Empirical Analysis of the Impact of Data Augmentation on Knowledge Distillation | arXiv:2006.3810 | |
| Does Adversarial Transferability Indicate Knowledge Transferability? | arXiv:2006.14512 | |
| On the Demystification of Knowledge Distillation: A Residual Network Perspective | arXiv:2006.16589 | |
| Teaching To Teach By Structured Dark Knowledge | ICLR 2020 | |
| Inter-Region Affinity Distillation for Road Marking Segmentation | CVPR 2020 [code] | |
| Heterogeneous Knowledge Distillation using Information Flow Modeling | CVPR 2020 [code] | |
| Local Correlation Consistency for Knowledge Distillation | ECCV 2020 | |
| Few-Shot Class-Incremental Learning | CVPR 2020 | |
| Unifying distillation and privileged information | ICLR 2016 | |

## Knowledge Distillation with Pruning, Quantization, and NAS

| Title | Venue | Note |
| ----- | ----- | ---- |
| Accelerating Convolutional Neural Networks with Dominant Convolutional Kernel and Knowledge Pre-regression | ECCV 2016 | |
| N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning | ICLR 2018 | |
| Slimmable Neural Networks | ICLR 2018 | |
| Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy | NIPS 2018 | |
| MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning | ICCV 2019 | |
| LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning | ICLR 2020 | |
| Pruning with hints: an efficient framework for model acceleration | ICLR 2020 | |
| Knapsack Pruning with Inner Distillation | arXiv:2002.8258 | |
| Training convolutional neural networks with cheap convolutions and online distillation | arXiv:1909.13063 | |
| Cooperative Pruning in Cross-Domain Deep Neural Network Compression | IJCAI 2019 | |
| QKD: Quantization-aware Knowledge Distillation | arXiv:1911.12491 | |
| Neural Network Pruning with Residual-Connections and Limited-Data | CVPR 2020 | |
| Training Quantized Neural Networks with a Full-precision Auxiliary Module | CVPR 2020 | |
| Towards Effective Low-bitwidth Convolutional Neural Networks | CVPR 2018 | |
| Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations | arXiv:19084680 | |
| Paying more attention to snapshots of Iterative Pruning: Improving Model Compression via Ensemble Distillation | arXiv:2006.11487 | |
| Knowledge Distillation Beyond Model Compression | arXiv:20071493 | |
| Teacher Guided Architecture Search | ICCV 2019 | |
| Distillation Guided Residual Learning for Binary Convolutional Neural Networks | ECCV 2020 | |
| MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution | ECCV 2020 | |
| Improving Neural Architecture Search Image Classifiers via Ensemble Learning | arXiv:19036236 | |
| Blockwisely Supervised Neural Architecture Search with Knowledge Distillation | arXiv:1911.13053 | |
| Towards Oracle Knowledge Distillation with Neural Architecture Search | AAAI 2020 | |
| Search for Better Students to Learn Distilled Knowledge | arXiv:2001.11612 | |
| Circumventing Outliers of AutoAugment with Knowledge Distillation | arXiv:2003.11342 | |
| Network Pruning via Transformable Architecture Search | NIPS 2019 | |
| Search to Distill: Pearls are Everywhere but not the Eyes | CVPR 2020 | |
| AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks | ICML 2020 [code] | |
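These compression techniques are usually combined with distillation rather than replaced by it, e.g. prune the student and fine-tune it against the dense model as teacher. A hedged sketch of magnitude pruning followed by KD fine-tuning using `torch.nn.utils.prune` (the toy models, sparsity level, and loss weights are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

# Toy student/teacher; in practice the teacher is the unpruned (or larger) model.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU(), nn.Linear(512, 10))
student.load_state_dict(teacher.state_dict())        # start from the dense weights

# 1) One-shot magnitude pruning: zero out 80% of each Linear layer's weights.
for module in student.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)

# 2) Fine-tune the sparse student under a distillation loss from the dense teacher.
opt = torch.optim.SGD(student.parameters(), lr=0.01)
x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
for _ in range(3):
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    soft = F.kl_div(F.log_softmax(s_logits / 4, dim=1),
                    F.softmax(t_logits / 4, dim=1), reduction="batchmean") * 16
    loss = 0.9 * soft + 0.1 * F.cross_entropy(s_logits, y)
    opt.zero_grad(); loss.backward(); opt.step()

# 3) Make the pruning permanent (folds the mask into the weight tensor).
for module in student.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")
```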
## Applications of Knowledge Distillation

| Sub | Title | Venue |
| --- | ----- | ----- |
| Graph | Graph-based Knowledge Distillation by Multi-head Attention Network | arXiv:19072226 |
| | Graph Representation Learning via Multi-task Knowledge Distillation | arXiv:19115700 |
| | Deep geometric knowledge distillation with graphs | arXiv:19113080 |
| | Better and faster: Knowledge transfer from multiple self-supervised learning tasks via graph distillation for video classification | IJCAI 2018 |
| | Distilling Knowledge from Graph Convolutional Networks | CVPR 2020 |
| | | |
| Face | Face model compression by distilling knowledge from neurons | AAAI 2016 |
| | MarginDistillation: distillation for margin-based softmax | arXiv:2003.2586 |
| | | |
| ReID | Distilled Person Re-Identification: Towards a More Scalable System | CVPR 2019 |
| | Robust Re-Identification by Multiple Views Knowledge Distillation | ECCV 2020 [code] |
| | | |
| Detection | Learning efficient object detection models with knowledge distillation | NIPS 2017 |
| | Distilling Object Detectors with Fine-grained Feature Imitation | CVPR 2019 |
| | Relation Distillation Networks for Video Object Detection | ICCV 2019 |
| | Learning Lightweight Face Detector with Knowledge Distillation | IEEE 2019 |
| | Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection | ICCV 2019 |
| | Learning Lightweight Lane Detection CNNs by Self Attention Distillation | ICCV 2019 |
| | A Multi-Task Mean Teacher for Semi-Supervised Shadow Detection | CVPR 2020 [code] |
| | Boosting Weakly Supervised Object Detection with Progressive Knowledge Transfer | ECCV 2020 |
| | Temporal Self-Ensembling Teacher for Semi-Supervised Object Detection | IEEE 2020 [code] |
| | Uninformed Students: Student-Teacher Anomaly Detection with Discriminative Latent Embeddings | CVPR 2020 |
| | Distilling Knowledge from Refinement in Multiple Instance Detection Networks | arXiv:2004.10943 |
| | Enabling Incremental Knowledge Transfer for Object Detection at the Edge | arXiv:2004.5746 |
| | | |
| Pose | DOPE: Distillation Of Part Experts for whole-body 3D pose estimation in the wild | ECCV 2020 |
| | Fast Human Pose Estimation | CVPR 2019 |
| | Distill Knowledge From NRSfM for Weakly Supervised 3D Pose Learning | ICCV 2019 |
| | | |
| Segmentation | ROAD: Reality Oriented Adaptation for Semantic Segmentation of Urban Scenes | CVPR 2018 |
| | Knowledge Distillation for Incremental Learning in Semantic Segmentation | arXiv:1911.3462 |
| | Geometry-Aware Distillation for Indoor Semantic Segmentation | CVPR 2019 |
| | Structured Knowledge Distillation for Semantic Segmentation | CVPR 2019 |
| | Self-similarity Student for Partial Label Histopathology Image Segmentation | ECCV 2020 |
| | Knowledge Distillation for Brain Tumor Segmentation | arXiv:2002.3688 |
| | | |
| Low-Vision | Lightweight Image Super-Resolution with Information Multi-distillation Network | ICCVW 2019 |
| | Collaborative Distillation for Ultra-Resolution Universal Style Transfer | CVPR 2020 [code] |
| | | |
| Video | Efficient Video Classification Using Fewer Frames | CVPR 2019 |
| | Relation Distillation Networks for Video Object Detection | ICCV 2019 |
| | Teacher Supervises Students How to Learn From Partially Labeled Images for Facial Landmark Detection | ICCV 2019 |
| | Progressive Teacher-student Learning for Early Action Prediction | CVPR 2019 |
| | MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization | arXiv:1910.12295 |
| | AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation | ICCV 2019 |
| | Dynamic Kernel Distillation for Efficient Pose Estimation in Videos | ICCV 2019 |
| | Online Model Distillation for Efficient Video Inference | ICCV 2019 |
| | Optical Flow Distillation: Towards Efficient and Stable Video Style Transfer | ECCV 2020 |
| | Adversarial Self-Supervised Learning for Semi-Supervised 3D Action Recognition | ECCV 2020 |
| | Object Relational Graph with Teacher-Recommended Learning for Video Captioning | CVPR 2020 |
| | Spatio-Temporal Graph for Video Captioning with Knowledge distillation | CVPR 2020 [code] |
| | TA-Student VQA: Multi-Agents Training by Self-Questioning | CVPR 2020 |

## Data-free Knowledge Distillation

| Title | Venue | Note |
| ----- | ----- | ---- |
| Data-Free Knowledge Distillation for Deep Neural Networks | NIPS 2017 | |
| Zero-Shot Knowledge Distillation in Deep Networks | ICML 2019 | |
| DAFL: Data-Free Learning of Student Networks | ICCV 2019 | |
| Zero-shot Knowledge Transfer via Adversarial Belief Matching | NIPS 2019 | |
| Dream Distillation: A Data-Independent Model Compression Framework | ICML 2019 | |
| Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion | CVPR 2020 | |
| Data-Free Adversarial Distillation | CVPR 2020 | |
| The Knowledge Within: Methods for Data-Free Model Compression | CVPR 2020 | |
| Knowledge Extraction with No Observable Data | NIPS 2019 | |
| Data-Free Knowledge Amalgamation via Group-Stack Dual-GAN | CVPR 2020 | |
| DeGAN: Data-Enriching GAN for Retrieving Representative Samples from a Trained Classifier | arXiv:1912.11960 | |
| Generative Low-bitwidth Data Free Quantization | arXiv:2003.3603 | |
| This dataset does not exist: training models from generated images | arXiv:1911.2888 | |
| MAZE: Data-Free Model Stealing Attack Using Zeroth-Order Gradient Estimation | arXiv:2005.3161 | |
| Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data | ECCV 2020 | |
| Billion-scale semi-supervised learning for image classification | arXiv:1905.00546 | |
| Data-free Parameter Pruning for Deep Neural Networks | arXiv:1507.6149 | |
| Data-Free Quantization Through Weight Equalization and Bias Correction | ICCV 2019 | |
| DAC: Data-free Automatic Acceleration of Convolutional Networks | WACV 2019 | |
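Data-free methods replace the training set with synthetic inputs the teacher responds to confidently. A compressed DAFL-style sketch: a generator is rewarded for confident and class-diverse teacher predictions, and the student is distilled on the generated batches (architectures, loss weights, and the alternating schedule are illustrative; DAFL additionally uses an activation-norm term, and DeepInversion-style methods match batch-norm statistics instead of training a generator):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Tiny generator: noise vector -> 32x32 RGB image."""
    def __init__(self, nz=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(nz, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z)

teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # stand-ins for real nets
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
for p in teacher.parameters():
    p.requires_grad_(False)                                          # teacher stays frozen
gen = Generator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(3):
    # Generator step: teacher should be confident (pseudo one-hot CE) and diverse over the batch.
    fake = gen(torch.randn(32, 100))
    t_logits = teacher(fake)
    p = F.softmax(t_logits, dim=1)
    one_hot = F.cross_entropy(t_logits, t_logits.argmax(dim=1))
    p_mean = p.mean(dim=0)
    diversity = (p_mean * p_mean.clamp_min(1e-8).log()).sum()   # negative entropy of mean prediction
    loss_g = one_hot + 5.0 * diversity
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # Student step: ordinary KD on freshly generated (frozen-generator) samples.
    with torch.no_grad():
        fake = gen(torch.randn(32, 100))
        t_logits = teacher(fake)
    s_logits = student(fake)
    loss_s = F.kl_div(F.log_softmax(s_logits, dim=1), F.softmax(t_logits, dim=1),
                      reduction="batchmean")
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```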
## Cross-modal Knowledge Distillation

| Title | Venue | Note |
| ----- | ----- | ---- |
| SoundNet: Learning Sound Representations from Unlabeled Video | ECCV 2016 | SoundNet architecture |
| Cross Modal Distillation for Supervision Transfer | CVPR 2016 | |
| Emotion recognition in speech using cross-modal transfer in the wild | ACM MM 2018 | |
| Through-Wall Human Pose Estimation Using Radio Signals | CVPR 2018 | |
| Compact Trilinear Interaction for Visual Question Answering | ICCV 2019 | |
| Cross-Modal Knowledge Distillation for Action Recognition | ICIP 2019 | |
| Learning to Map Nearly Anything | arXiv:1909.6928 | |
| Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image Retrieval | ICCV 2019 | |
| UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation | ICCV 2019 | |
| CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency | CVPR 2019 | |
| XD: Cross-lingual Knowledge Distillation for Polyglot Sentence Embeddings | | |
| Effective Domain Knowledge Transfer with Soft Fine-tuning | arXiv:1909.2236 | |
| ASR is all you need: cross-modal distillation for lip reading | arXiv:1911.12747 | |
| Knowledge distillation for semi-supervised domain adaptation | arXiv:1908.7355 | |
| Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition | arXiv:2001.1798 | |
| Cluster Alignment with a Teacher for Unsupervised Domain Adaptation | ICCV 2019 | |
| Attention Bridging Network for Knowledge Transfer | ICCV 2019 | |
| Unpaired Multi-modal Segmentation via Knowledge Distillation | arXiv:2001.3111 | |
| Multi-source Distilling Domain Adaptation | arXiv:1911.11554 | |
| Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing | CVPR 2020 | |
| Improving Semantic Segmentation via Self-Training | arXiv:2004.14960 | |
| Speech to Text Adaptation: Towards an Efficient Cross-Modal Distillation | arXiv:2005.8213 | |
| Joint Progressive Knowledge Distillation and Unsupervised Domain Adaptation | arXiv:2005.7839 | |
| Knowledge as Priors: Cross-Modal Knowledge Generalization for Datasets without Superior Knowledge | CVPR 2020 | |
| Large-Scale Domain Adaptation via Teacher-Student Learning | arXiv:1708.5466 | |
| Large Scale Audiovisual Learning of Sounds with Weakly Labeled Data | IJCAI 2020 | |
| Distilling Cross-Task Knowledge via Relationship Matching | CVPR 2020 [code] | |
| Modality distillation with multiple stream networks for action recognition | ECCV 2018 | |
| Domain Adaptation through Task Distillation | ECCV 2020 | |

## Adversarial Knowledge Distillation

| Title | Venue | Note |
| ----- | ----- | ---- |
| Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks | arXiv:1709.00513 | |
| KTAN: Knowledge Transfer Adversarial Network | arXiv:1810.08126 | |
| KDGAN: Knowledge Distillation with Generative Adversarial Networks | NIPS 2018 | |
| Adversarial Learning of Portable Student Networks | AAAI 2018 | |
| Adversarial Network Compression | ECCV 2018 | |
| Cross-Modality Distillation: A case for Conditional Generative Adversarial Networks | ICASSP 2018 | |
| Adversarial Distillation for Efficient Recommendation with External Knowledge | TOIS 2018 | |
| Training student networks for acceleration with conditional adversarial networks | BMVC 2018 | |
| DAFL: Data-Free Learning of Student Networks | ICCV 2019 | |
| MEAL: Multi-Model Ensemble via Adversarial Learning | AAAI 2019 | |
| Exploiting the Ground-Truth: An Adversarial Imitation Based Knowledge Distillation Approach for Event Detection | AAAI 2019 | |
| Adversarially Robust Distillation | AAAI 2020 | |
| GAN-Knowledge Distillation for one-stage Object Detection | arXiv:1906.08467 | |
| Lifelong GAN: Continual Learning for Conditional Image Generation | arXiv:1908.03884 | |
| Compressing GANs using Knowledge Distillation | arXiv:1902.00159 | |
| Feature-map-level Online Adversarial Knowledge Distillation | ICML 2020 | |
| MineGAN: effective knowledge transfer from GANs to target domains with few images | CVPR 2020 | |
| Distilling portable Generative Adversarial Networks for Image Translation | AAAI 2020 | |
| GAN Compression: Efficient Architectures for Interactive Conditional GANs | CVPR 2020 | |
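The common pattern in this last group: a discriminator learns to tell teacher outputs from student outputs, and the student gets an extra loss for fooling it. A minimal logit-level sketch (the discriminator, toy backbones, and loss weights are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
disc = nn.Sequential(nn.Linear(10, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))  # real=teacher, fake=student

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
for _ in range(3):
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)

    # Discriminator step: separate teacher logits (label 1) from student logits (label 0).
    d_real = disc(t_logits)
    d_fake = disc(s_logits.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Student step: task loss + fool the discriminator into labelling its logits as "teacher".
    d_out = disc(s_logits)
    loss_s = F.cross_entropy(s_logits, y) + 0.1 * bce(d_out, torch.ones_like(d_out))
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```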