├── logos
│   ├── HDU.png
│   ├── ITMO.jpg
│   ├── NTU.jpg
│   ├── SEU.jpg
│   ├── SRIBD.png
│   ├── XDU.jpg
│   ├── ZJU.png
│   └── CUHK-SZ.png
├── LICENSE
└── README.md
/logos/HDU.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection/HEAD/logos/HDU.png
--------------------------------------------------------------------------------
/logos/ITMO.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection/HEAD/logos/ITMO.jpg
--------------------------------------------------------------------------------
/logos/NTU.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection/HEAD/logos/NTU.jpg
--------------------------------------------------------------------------------
/logos/SEU.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection/HEAD/logos/SEU.jpg
--------------------------------------------------------------------------------
/logos/SRIBD.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection/HEAD/logos/SRIBD.png
--------------------------------------------------------------------------------
/logos/XDU.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection/HEAD/logos/XDU.jpg
--------------------------------------------------------------------------------
/logos/ZJU.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection/HEAD/logos/ZJU.png
--------------------------------------------------------------------------------
/logos/CUHK-SZ.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection/HEAD/logos/CUHK-SZ.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2025 孙逸飞
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 | [](#top)
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 | [](https://github.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection)
14 | [](https://github.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection)
15 | [](https://github.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection/issues)
16 | [](https://github.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection/blob/main/LICENSE)
17 | [](https://visitorbadge.io/status?path=https%3A%2F%2Fgithub.com%2Fdiaoquesang%2FPaper-List-for-Medical-Anomaly-Detection)
18 |
19 |
20 |
21 | **🦉 Contributors: [Yifei Sun (22' HDU-ITMO Undergraduate/26' ZJU PhD)](https://diaoquesang.github.io/), [Junhao Jia (23' HDU Undergraduate)](https://github.com/BeistMedAI), [Hao Zheng (22' HDU-ITMO Undergraduate)](https://github.com/267588), [Zhanghao Chen (25' SEU Master)](https://benny0323.github.io/bio/), [Yuzhi He (23' XDU Undergraduate)](https://github.com/Black0226), [Jinhong Wang (21' ZJU PhD)](https://wang-jinhong.github.io/), [Jincheng Li (23' NTU Undergraduate)](https://github.com/li00000011), [Chengzhi Gui (25' SJTU PhD)](https://github.com/Cooper-Gu).**
22 |
23 | **🎓 DeepWiki: [Generating GitHub Knowledge Base Documentation in One Click](https://deepwiki.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection).**
24 |
25 | **📦 Other resources: [1] [Bone Suppression in Chest X-Rays: A Deep Survey](https://github.com/diaoquesang/A-detailed-summarization-about-bone-suppression-in-Chest-X-rays), [2] [Paper List for Prototypical Learning](https://github.com/BeistMedAI/Paper-List-for-Prototypical-Learning), [3] [Paper List for Cell Detection](https://github.com/li00000011/Paper-List-for-Cell-Detection), [4] [Medical-AI-Guide](https://github.com/diaoquesang/Medical-AI-Guide/), [5] [Paper List for Medical Reasoning Large Language Models](https://github.com/HovChen/Paper-List-for-Medical-Reasoning-Large-Language-Models), [6] [Deep Learning for Virtual Staining of Label-Free Tissue: A Survey](https://github.com/diaoquesang/DL4VS), [7] [Paper Library for Neural and 3D Generation](https://github.com/Alita02384/Paper-Library-for-neural-and-3D-generation).**
26 |
27 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
36 |
37 |
38 |
39 | ## 📇 Contents
40 | - [**1. Solving "Identity Mapping"**](#s1)
41 | - [**2. Supervised Learning**](#s2)
42 | - [**3. Self-Supervised Learning**](#s3)
43 | - [**4. AE-Based Approaches**](#s4)
44 | - [**5. GAN-Based Approaches**](#s5)
45 | - [**6. Flow-Based Approaches**](#s6)
46 | - [**7. Diffusion-Based Approaches**](#s7)
47 | - [**8. Patch-Based Approaches**](#s8)
48 | - [**9. Foundation Model-Based Approaches**](#s9)
49 | - [**10. Multi-Modal Fusion**](#s10)
50 | - [**11. Knowledge Distillation**](#s11)
51 | - [**12. Correlation Learning**](#s12)
52 | - [**13. Anomaly Generation**](#s13)
53 | - [**14. Representation Learning**](#s14)
54 | - [**15. Matching Correction**](#s15)
55 | - [**16. Benchmarks, Surveys & Datasets**](#s16)
56 |
57 | ## ✏️ Tips
58 |
59 | - *: Papers for Non-Medical Anomaly Detection
60 |
61 | - :octocat:: Code
62 |
63 |
64 | ## 1. Solving "Identity Mapping"
65 |
66 | - [[CVPR 2025]](https://openaccess.thecvf.com/content/CVPR2025/html/Guo_Dinomaly_The_Less_Is_More_Philosophy_in_Multi-Class_Unsupervised_Anomaly_CVPR_2025_paper.html) **Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection** [:octocat:](https://github.com/guojiajeremy/dinomaly)
67 |
68 | *Guo, Jia and Lu, Shuai and Zhang, Weihang and Chen, Fang and Li, Huiqi and Liao, Hongen*
69 |
70 |
71 |
72 |
73 |
74 |
75 | 📋 Abstract (Click to Expand)
76 | Recent studies highlighted a practical setting of unsupervised anomaly detection (UAD) that builds a unified model for multi-class images. Despite various advancements addressing this challenging task, the detection performance under the multi-class setting still lags far behind state-of-the-art class-separated models. Our research aims to bridge this substantial performance gap. In this paper, we present Dinomaly, a minimalist reconstruction-based anomaly detection framework that harnesses pure Transformer architectures without relying on complex designs, additional modules, or specialized tricks. Given this powerful framework consisting of only Attentions and MLPs, we found four simple components that are essential to multi-class anomaly detection: (1) Scalable foundation Transformers that extracts universal and discriminative features, (2) Noisy Bottleneck where pre-existing Dropouts do all the noise injection tricks, (3) Linear Attention that naturally cannot focus, and (4) Loose Reconstruction that does not force layer-to-layer and point-by-point reconstruction. Extensive experiments are conducted across popular anomaly detection benchmarks including MVTec-AD, VisA, and Real-IAD. Our proposed Dinomaly achieves impressive image-level AUROC of 99.6%, 98.7%, and 89.3% on the three datasets respectively, which is not only superior to state-of-the-art multi-class UAD methods, but also achieves the most advanced class-separated UAD records.
77 |
78 |
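🧪 *A hedged sketch (ours, not the released Dinomaly code) of the scoring rule this family shares: decoder features are compared against frozen encoder features, and the per-location cosine distance becomes the anomaly map. Shapes and scales below are illustrative.*

```python
import torch
import torch.nn.functional as F

def cosine_anomaly_map(enc_feats, dec_feats, out_size=256):
    """Anomaly map from encoder/decoder feature discrepancy.

    enc_feats, dec_feats: lists of matching (B, C, H, W) feature tensors
    taken at several scales. Returns (B, out_size, out_size); higher =
    more anomalous.
    """
    maps = []
    for fe, fd in zip(enc_feats, dec_feats):
        # 1 - cosine similarity along channels, per spatial location
        d = 1.0 - F.cosine_similarity(fe, fd, dim=1)              # (B, H, W)
        d = F.interpolate(d.unsqueeze(1), size=(out_size, out_size),
                          mode="bilinear", align_corners=False)
        maps.append(d)
    return torch.stack(maps).mean(dim=0).squeeze(1)               # fuse scales

# Toy check with random "features" from two scales
enc = [torch.randn(2, 64, 32, 32), torch.randn(2, 128, 16, 16)]
dec = [f + 0.1 * torch.randn_like(f) for f in enc]  # near-perfect reconstruction
print(cosine_anomaly_map(enc, dec).shape)           # torch.Size([2, 256, 256])
```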
79 | - *[[NeurIPS 2022]](https://proceedings.neurips.cc/paper_files/paper/2022/file/1d774c112926348c3e25ea47d87c835b-Paper-Conference.pdf) **A Unified Model for Multi-class Anomaly Detection** [:octocat:](https://github.com/zhiyuanyou/uniad)
80 |
81 | *You, Zhiyuan and Cui, Lei and Shen, Yujun and Yang, Kai and Lu, Xin and Zheng, Yu and Le, Xinyi*
82 |
83 |
84 |
85 |
86 |
87 |
88 | 📋 Abstract (Click to Expand)
89 | Despite the rapid advance of unsupervised anomaly detection, existing methods require to train separate models for different objects. In this work, we present UniAD that accomplishes anomaly detection for multiple classes with a unified framework. Under such a challenging setting, popular reconstruction networks may fall into an "identical shortcut", where both normal and anomalous samples can be well recovered, and hence fail to spot outliers. To tackle this obstacle, we make three improvements. First, we revisit the formulations of fully-connected layer, convolutional layer, as well as attention layer, and confirm the important role of query embedding (i.e., within attention layer) in preventing the network from learning the shortcut. We therefore come up with a layer-wise query decoder to help model the multi-class distribution. Second, we employ a neighbor masked attention module to further avoid the information leak from the input feature to the reconstructed output feature. Third, we propose a feature jittering strategy that urges the model to recover the correct message even with noisy inputs. We evaluate our algorithm on MVTec-AD and CIFAR-10 datasets, where we surpass the state-of-the-art alternatives by a sufficiently large margin. For example, when learning a unified model for 15 categories in MVTec-AD, we surpass the second competitor on the tasks of both anomaly detection (from 88.1% to 96.5%) and anomaly localization (from 89.5% to 96.8%). Code is available at https://github.com/zhiyuanyou/UniAD.
90 |
91 |
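🧪 *UniAD's feature-jittering idea is small enough to sketch: Gaussian noise, scaled by each token's norm, is added to the feature tokens during training so the decoder must denoise rather than copy. The `scale` default here is illustrative, not the paper's tuned value.*

```python
import torch

def feature_jitter(tokens: torch.Tensor, scale: float = 20.0,
                   training: bool = True) -> torch.Tensor:
    """Add norm-scaled Gaussian noise to feature tokens (B, N, C).

    Reconstructing clean features from jittered ones discourages the
    trivial identity mapping ("identical shortcut").
    """
    if not training:
        return tokens
    noise = torch.randn_like(tokens)
    # tie the noise magnitude to each token's feature norm
    norm = tokens.norm(dim=-1, keepdim=True) / (tokens.shape[-1] ** 0.5)
    return tokens + noise * norm * (scale / 100.0)

x = torch.randn(2, 196, 256)      # e.g. 14x14 patch tokens from a backbone
print(feature_jitter(x).shape)    # torch.Size([2, 196, 256])
```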
92 | ## 2. Supervised Learning
93 |
94 | - [[Radiology 2025]](https://pubs.rsna.org/doi/10.1148/radiol.241629) **Cancer Detection in Breast MRI Screening via Explainable AI Anomaly Detection** [:octocat:](https://github.com/microsoft/breastMRI-fcdd)
95 |
96 | *Oviedo, Felipe and Kazerouni, Anum S. and Liznerski, Philipp and Xu, Yixi and Hirano, Michael and Vandermeulen, Robert A. and Kloft, Marius and Blum, Elyse and Alessio, Adam M. and Li, Christopher I. and Weeks, Bill and Dodhia, Rahul and Lavista Ferres, Juan M. and Rahbar, Habib and Partridge, Savannah C.*
97 |
98 |
99 |
100 |
101 |
102 |
103 | 📋 Abstract (Click to Expand)
104 | Background Artificial intelligence (AI) models hold potential to increase the accuracy and efficiency of breast MRI screening; however, existing models have not been rigorously evaluated in populations with low cancer prevalence and lack interpretability, both of which are essential for clinical adoption.
105 | Purpose To develop an explainable AI model for cancer detection at breast MRI that is effective in both high- and low-cancer-prevalence settings. Materials and Methods This retrospective study included 9738 breast MRI examinations from a single institution (2005–2022), with external testing in a publicly available multicenter dataset (221 examinations). In total, 9567 consecutive examinations were used to develop an explainable fully convolutional data description (FCDD) anomaly detection model to detect malignancies on contrast-enhanced MRI scans. Performance was evaluated in three cohorts: grouped cross-validation (for both balanced [20.0% malignant] and imbalanced [1.85% malignant] detection tasks), an internal independent test set (171 examinations), and an external dataset. Explainability was assessed through pixelwise comparisons with reference-standard malignancy annotations. Statistical significance was assessed using the Wilcoxon signed rank test. Results FCDD outperformed the benchmark binary cross-entropy (BCE) model in cross-validation for both balanced (mean area under the receiver operating characteristic curve [AUC] = 0.84 ± 0.01 [SD] vs 0.81 ± 0.01; P < .001) and imbalanced (mean AUC = 0.72 ± 0.03 vs 0.69 ± 0.03; P < .001) detection tasks. At a fixed 97% sensitivity in the imbalanced setting, mean specificity across folds was 13% for FCDD and 9% for BCE (P = .02). In the internal test set, FCDD outperformed BCE for balanced (mean AUC = 0.81 ± 0.02 vs 0.72 ± 0.02; P < .001) and imbalanced (mean AUC = 0.78 ± 0.05 vs 0.76 ± 0.01; P < .02) detection tasks. For model explainability, FCDD demonstrated better spatial agreement with reference-standard annotations than BCE (internal test set: mean pixelwise AUC = 0.92 ± 0.10 vs 0.81 ± 0.13; P < .001). External testing confirmed that FCDD performed well, and better than BCE, in the balanced detection task (AUC = 0.86 ± 0.01 vs 0.79 ± 0.01; P < .001). Conclusion The developed explainable AI model for cancer detection at breast MRI accurately depicted tumor location and outperformed commonly used models in both high- and low-cancer-prevalence scenarios.
106 |
107 |
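🧪 *The FCDD objective behind the breast-MRI model above is compact: a fully convolutional network output φ(x) is mapped through the pseudo-Huber transform √(φ(x)² + 1) − 1, and a one-class loss drives normal scores toward zero while pushing labeled anomalies away. A generic FCDD-style sketch, not the released `breastMRI-fcdd` code.*

```python
import torch

def fcdd_scores(phi: torch.Tensor) -> torch.Tensor:
    """Pseudo-Huber scores A(x) = sqrt(phi^2 + 1) - 1, per spatial location."""
    return torch.sqrt(phi ** 2 + 1.0) - 1.0

def fcdd_loss(phi: torch.Tensor, is_anomalous: torch.Tensor) -> torch.Tensor:
    """One-class objective on a (B, 1, H, W) network output.

    Normal samples (label 0) minimize the mean score; anomalous samples
    (label 1) are pushed away via a -log(1 - exp(-A)) term.
    """
    a = fcdd_scores(phi).flatten(1).mean(dim=1)            # (B,)
    y = is_anomalous.float()
    anomalous_term = -torch.log1p(-torch.exp(-a) + 1e-9)   # large when a ~ 0
    return ((1.0 - y) * a + y * anomalous_term).mean()

phi = torch.randn(4, 1, 28, 28)                 # toy network output
print(fcdd_loss(phi, torch.tensor([0, 0, 1, 1])).item())
```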
108 | - *[[CVPR 2024]](https://openaccess.thecvf.com/content/CVPR2024/html/Baitieva_Supervised_Anomaly_Detection_for_Complex_Industrial_Images_CVPR_2024_paper.html) **Supervised Anomaly Detection for Complex Industrial Images** [:octocat:](https://github.com/abc-125/segad)
109 |
110 | *Baitieva, Aimira and Hurych, David and Besnier, Victor and Bernard, Olivier*
111 |
112 |
113 |
114 |
115 |
116 |
117 | 📋 Abstract (Click to Expand)
118 | Automating visual inspection in industrial production lines is essential for increasing product quality across various industries. Anomaly detection (AD) methods serve as robust tools for this purpose. However, existing public datasets primarily consist of images without anomalies, limiting the practical application of AD methods in production settings. To address this challenge, we present (1) the Valeo Anomaly Dataset (VAD), a novel real-world industrial dataset comprising 5000 images, including 2000 instances of challenging real defects across more than 20 subclasses. Acknowledging that traditional AD methods struggle with this dataset, we introduce (2) Segmentation-based Anomaly Detector (SegAD). First, SegAD leverages anomaly maps as well as segmentation maps to compute local statistics. Next, SegAD uses these statistics and an optional supervised classifier score as input features for a Boosted Random Forest (BRF) classifier, yielding the final anomaly score. Our SegAD achieves state-of-the-art performance on both VAD (+2.1% AUROC) and the VisA dataset (+0.4% AUROC). The code and the models are publicly available.
119 |
120 |
121 | - [[CVPR 2019]](https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhang_Cascaded_Generative_and_Discriminative_Learning_for_Microcalcification_Detection_in_Breast_CVPR_2019_paper.pdf) **Cascaded Generative and Discriminative Learning for Microcalcification Detection in Breast Mammograms**
122 |
123 | *Zhang, Fandong and Luo, Ling and Sun, Xinwei and Zhou, Zhen and Li, Xiuli and Yu, Yizhou and Wang, Yizhou*
124 |
125 |
126 |
127 |
128 |
129 |
130 | 📋 Abstract (Click to Expand)
131 | Accurate microcalcification (mC) detection is of great importance due to its high proportion in early breast cancers. Most of the previous mC detection methods belong to discriminative models, where classifiers are exploited to distinguish mCs from other backgrounds. However, it is still challenging for these methods to tell the mCs from amounts of normal tissues because they are too tiny (at most 14 pixels). Generative methods can precisely model the normal tissues and regard the abnormal ones as outliers, while they fail to further distinguish the mCs from other anomalies, i.e., vessel calcifications. In this paper, we propose a hybrid approach by taking advantage of both generative and discriminative models. Firstly, a generative model named Anomaly Separation Network (ASN) is used to generate candidate mCs. ASN contains two major components. A deep convolutional encoder-decoder network is built to learn the image reconstruction mapping and a t-test loss function is designed to separate the distributions of the reconstruction residuals of mCs from normal tissues. Secondly, a discriminative model is cascaded to tell the mCs from the false positives. Finally, to verify the effectiveness of our method, we conduct experiments on both public and in-house datasets, which demonstrates that our approach outperforms previous state-of-the-art methods.
132 |
133 |
134 | ## 3. Self-Supervised Learning
135 |
136 | - *[[TPAMI 2024]](https://ieeexplore.ieee.org/abstract/document/10553645/) **MOODv2: Masked Image Modeling for Out-of-Distribution Detection** [:octocat:](https://github.com/dvlab-research/MOOD)
137 |
138 | *Li, Jingyao and Chen, Pengguang and Yu, Shaozuo and Liu, Shu and Jia, Jiaya*
139 |
140 |
141 |
142 |
143 |
144 |
145 | 📋 Abstract (Click to Expand)
146 | The crux of effective out-of-distribution (OOD) detection lies in acquiring a robust in-distribution (ID) representation, distinct from OOD samples. While previous methods predominantly leaned on recognition-based techniques for this purpose, they often resulted in shortcut learning, lacking comprehensive representations. In our study, we conducted a comprehensive analysis, exploring distinct pretraining tasks and employing various OOD score functions. The results highlight that the feature representations pre-trained through reconstruction yield a notable enhancement and narrow the performance gap among various score functions. This suggests that even simple score functions can rival complex ones when leveraging reconstruction-based pretext tasks. Reconstruction-based pretext tasks adapt well to various score functions. As such, it holds promising potential for further expansion. Our OOD detection framework, MOODv2, employs the masked image modeling pretext task. Without bells and whistles, MOODv2 impressively enhances 14.30% AUROC to 95.68% on ImageNet and achieves 99.98% on CIFAR-10.
147 |
148 |
149 | - *[[CVPR 2023]](http://openaccess.thecvf.com/content/CVPR2023/html/Li_Rethinking_Out-of-Distribution_OOD_Detection_Masked_Image_Modeling_Is_All_You_CVPR_2023_paper.html) **Rethinking Out-of-Distribution (OOD) Detection: Masked Image Modeling Is All You Need** [:octocat:](https://github.com/dvlab-research/MOOD)
150 |
151 | *Li, Jingyao and Chen, Pengguang and He, Zexin and Yu, Shaozuo and Liu, Shu and Jia, Jiaya*
152 |
153 |
154 |
155 |
156 |
157 |
158 | 📋 Abstract (Click to Expand)
159 | The core of out-of-distribution (OOD) detection is to learn the in-distribution (ID) representation, which is distinguishable from OOD samples. Previous work applied recognition-based methods to learn the ID features, which tend to learn shortcuts instead of comprehensive representations. In this work, we find surprisingly that simply using reconstruction-based methods could boost the performance of OOD detection significantly. We deeply explore the main contributors of OOD detection and find that reconstruction-based pretext tasks have the potential to provide a generally applicable and efficacious prior, which benefits the model in learning intrinsic data distributions of the ID dataset. Specifically, we take Masked Image Modeling as a pretext task for our OOD detection framework (MOOD). Without bells and whistles, MOOD outperforms previous SOTA of one-class OOD detection by 5.7%, multi-class OOD detection by 3.0%, and near-distribution OOD detection by 2.1%. It even defeats the 10-shot-per-class outlier exposure OOD detection, although we do not include any OOD samples for our detection.
160 |
161 |
162 | - [[MedIA 2023]](https://arxiv.org/pdf/2301.08330) **The Role of Noise in Denoising Models for Anomaly Detection in Medical Images** [:octocat:](https://github.com/antanaskascenas/denoisingae)
163 |
164 | *Kascenas, Antanas and Sanchez, Pedro and Schrempf, Patrick and Wang, Chaoyang and Clackett, William and Mikhael, Shadia S and Voisey, Jeremy P and Goatman, Keith and Weir, Alexander and Pugeault, Nicolas and others*
165 |
166 |
167 |
168 |
169 |
170 |
171 | 📋 Abstract (Click to Expand)
172 | Pathological brain lesions exhibit diverse appearance in brain images, in terms of intensity, texture, shape, size, and location. Comprehensive sets of data and annotations are difficult to acquire. Therefore, unsupervised anomaly detection approaches have been proposed using only normal data for training, with the aim of detecting outlier anomalous voxels at test time. Denoising methods, for instance classical denoising autoencoders (DAEs) and more recently emerging diffusion models, are a promising approach, however naive application of pixelwise noise leads to poor anomaly detection performance. We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes, with similar noise parameter adjustments giving good performance for both DAEs and diffusion models. Visual inspection of the reconstructions suggests that the training noise influences the trade-off between the extent of the detail that is reconstructed and the extent of erasure of anomalies, both of which contribute to better anomaly detection performance. We validate our findings on two real-world datasets (tumor detection in brain MRI and hemorrhage/ischemia/tumor detection in brain CT), showing good detection on diverse anomaly appearances. Overall, we find that a DAE trained with coarse noise is a fast and simple method that gives state-of-the-art accuracy. Diffusion models applied to anomaly detection are as yet in their infancy and provide a promising avenue for further research. Code for our DAE model and coarse noise is provided at: https://github.com/AntanasKascenas/DenoisingAE.
173 |
174 |
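🧪 *The central trick in Kascenas et al. is the corruption itself: Gaussian noise sampled on a coarse grid and upsampled to image size, so the denoising autoencoder must in-paint whole regions rather than smooth single pixels. A minimal sketch, with `res` and `std` standing in for the resolution and magnitude the paper tunes.*

```python
import torch
import torch.nn.functional as F

def coarse_noise(x: torch.Tensor, res: int = 16, std: float = 0.2) -> torch.Tensor:
    """Corrupt images (B, C, H, W) with low-resolution Gaussian noise.

    The noise is sampled on a coarse res x res grid and bilinearly
    upsampled, so each corrupted blob spans many pixels.
    """
    b, c, h, w = x.shape
    noise = torch.randn(b, c, res, res, device=x.device) * std
    noise = F.interpolate(noise, size=(h, w), mode="bilinear",
                          align_corners=False)
    return x + noise

# Train a DAE to map coarse_noise(x) back to x on healthy scans;
# at test time, |x - dae(x)| serves as the anomaly map.
x = torch.rand(2, 1, 128, 128)
print((coarse_noise(x) - x).abs().mean().item())
```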
175 | ## 4. AE-Based Approaches
176 |
177 | - [[MICCAI 2024]](https://arxiv.org/pdf/2403.09303) **Rethinking Autoencoders for Medical Anomaly Detection from a Theoretical Perspective** [:octocat:](https://github.com/caiyu6666/ae4ad)
178 |
179 | *Cai, Yu and Chen, Hao and Cheng, Kwang-Ting*
180 |
181 |
182 |
183 |
184 |
185 |
186 | 📋 Abstract (Click to Expand)
187 | Medical anomaly detection aims to identify abnormal findings using only normal training data, playing a crucial role in health screening and recognizing rare diseases. Reconstruction-based methods, particularly those utilizing autoencoders (AEs), are dominant in this field. They work under the assumption that AEs trained on only normal data cannot reconstruct unseen abnormal regions well, thereby enabling the anomaly detection based on reconstruction errors. However, this assumption does not always hold due to the mismatch between the reconstruction training objective and the anomaly detection task objective, rendering these methods theoretically unsound. This study focuses on providing a theoretical foundation for AE-based reconstruction methods in anomaly detection. By leveraging information theory, we elucidate the principles of these methods and reveal that the key to improving AE in anomaly detection lies in minimizing the information entropy of latent vectors. Experiments on four datasets with two image modalities validate the effectiveness of our theory. To the best of our knowledge, this is the first effort to theoretically clarify the principles and design philosophy of AE for anomaly detection. The code is available at https://github.com/caiyu6666/AE4AD.
188 |
189 |
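🧪 *For orientation, the baseline AE recipe this theory paper analyzes: train on healthy images only, then score test images by reconstruction residual. A deliberately tiny, illustrative PyTorch version.*

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Deliberately small convolutional AE for 1-channel, 64x64 images."""
    def __init__(self, latent: int = 32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.ReLU(),       # 64 -> 32
            nn.Conv2d(16, latent, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(latent, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

ae = TinyAE()
x = torch.rand(4, 1, 64, 64)                 # healthy-only training batch
loss = nn.functional.mse_loss(ae(x), x)      # reconstruction objective
anomaly_map = (ae(x) - x).abs()              # test-time residual map
print(loss.item(), anomaly_map.shape)
```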
190 | - [[ICLR 2023]](https://openreview.net/pdf?id=9OmCr1q54Z) **AE-FLOW: Autoencoders with Normalizing Flows for Medical Images Anomaly Detection**
191 |
192 | *Zhao, Yuzhong and Ding, Qiaoqiao and Zhang, Xiaoqun*
193 |
194 |
195 |
196 |
197 |
198 |
199 | 📋 Abstract (Click to Expand)
200 | Anomaly detection from medical images is an important task for clinical screening and diagnosis. In general, a large dataset of normal images are available while only few abnormal images can be collected in clinical practice. By mimicking the diagnosis process of radiologists, we attempt to tackle this problem by learning a tractable distribution of normal images and identify anomalies by differentiating the original image and the reconstructed normal image. More specifically, we propose a normalizing flow-based autoencoder for an efficient and tractable representation of normal medical images. The anomaly score consists of the likelihood originated from the normalizing flow and the reconstruction error of the autoencoder, which allows to identify the abnormality and provide an interpretability at both image and pixel levels. Experimental evaluation on two medical images datasets showed that the proposed model outperformed the other approaches by a large margin, which validated the effectiveness and robustness of the proposed method.
201 |
202 |
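🧪 *AE-FLOW's anomaly score fuses two signals: the flow's likelihood and the autoencoder's reconstruction error. A hedged sketch of such a fusion; the weighting `beta` is illustrative, not the paper's exact formulation.*

```python
import torch

def ae_flow_score(x, x_hat, log_likelihood, beta: float = 0.5):
    """Image-level anomaly score mixing flow likelihood and residual.

    x, x_hat:       (B, C, H, W) input and reconstruction
    log_likelihood: (B,) log p(z) from the flow bottleneck
    Higher score = more anomalous.
    """
    recon_err = (x - x_hat).pow(2).flatten(1).mean(dim=1)   # (B,)
    return beta * (-log_likelihood) + (1.0 - beta) * recon_err

x = torch.rand(3, 1, 64, 64)
x_hat = x + 0.05 * torch.randn_like(x)
log_pz = torch.randn(3) - 100.0          # toy flow log-likelihoods
print(ae_flow_score(x, x_hat, log_pz))
```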
203 | ## 5. GAN-Based Approaches
204 |
205 | - [[Neurocomputing 2025]](https://www.sciencedirect.com/science/article/pii/S0925231224015339) **Industrial and Medical Anomaly Detection Through Cycle-Consistent Adversarial Networks** [:octocat:](https://github.com/valdelch/cyclegans-anomalydetection)
206 |
207 | *Bougaham, Arnaud and Delchevalerie, Valentin and El Adoui, Mohammed and Frénay, Benoît*
208 |
209 |
210 |
211 |
212 |
213 |
214 | 📋 Abstract (Click to Expand)
215 | In this study, a new Anomaly Detection (AD) approach for industrial and medical images is proposed. This method leverages the theoretical strengths of unsupervised learning and the data availability of both normal and abnormal classes. Indeed, the AD is often formulated as an unsupervised task, implying only normal images during training. These normal images are devoted to be reconstructed through an autoencoder architecture, for instance. However, the information contained in abnormal data, when available, is also valuable for this reconstruction. The model would be able to identify its weaknesses by also learning how to transform an abnormal image into a normal one. This abnormal-to-normal reconstruction helps the entire model to learn better than a single normal-to-normal reconstruction. To be able to exploit abnormal images, the proposed method uses Cycle-Generative Adversarial Networks (Cycle-GAN) for (ab)normal-to-normal translation. After an input image has been reconstructed by the normal generator, an anomaly score quantifies the differences between the input and its reconstruction. Based on a threshold set to satisfy a business quality constraint, the input image is then flagged as normal or not. The proposed method is evaluated on industrial and medical datasets. The results demonstrate accurate performance with a zero false negative constraint compared to state-of-the-art methods. Quantitatively, our method reaches an accuracy under a zero false negative constraint of 79.89%, representing an improvement of about 17% compared to competitors. The code is available at https://github.com/ValDelch/CycleGANS-AnomalyDetection.
216 |
217 |
218 | - [[PR 2024]](https://ntnuopen.ntnu.no/ntnu-xmlui/bitstream/handle/11250/3178214/2229397_Article_.pdf?sequence=1) **Anomaly Detection via Gating Highway Connection for Retinal Fundus Images** [:octocat:](https://github.com/WentianZhang-ML/GatingAno)
219 |
220 | *Zhang, Wentian and Liu, Haozhe and Xie, Jinheng and Huang, Yawen and Zhang, Yu and Li, Yuexiang and Ramachandra, Raghavendra and Zheng, Yefeng*
221 |
222 |
223 |
224 |
225 |
226 |
227 | 📋 Abstract (Click to Expand)
228 | Since the labels for medical images are challenging to collect in real scenarios, especially for rare diseases, fully supervised methods cannot achieve robust performance for clinical anomaly detection. Recent research tried to tackle this problem by training the anomaly detection framework using only normal data. Reconstruction-based methods, e.g., auto-encoder, achieved impressive performances in the anomaly detection task. However, most existing methods adopted the straightforward backbone architecture (i.e., encoder-and-decoder) for image reconstruction. The design of a skip connection, which can directly transfer information between the encoder and decoder, is rarely used. Since the existing U-Net has demonstrated the effectiveness of skip connections for image reconstruction tasks, in this paper, we first use the dynamic gating strategy to achieve the usage of skip connections in existing reconstruction-based anomaly detection methods and then propose a novel gating highway connection module to adaptively integrate skip connections into the framework and boost its anomaly detection performance, namely GatingAno. Furthermore, we formulate an auxiliary task, namely histograms of oriented gradients (HOG) prediction, to encourage the framework to exploit contextual information from fundus images in a self-driven manner, which increases the robustness of feature representation extracted from the healthy samples. Last but not least, to improve the model generalization for anomalous data, we introduce an adversarial strategy for the training of our multi-task framework. Experimental results on the publicly available datasets, i.e., IDRiD and ADAM, validate the superiority of our method for detecting abnormalities in retinal fundus images. The source code is available at https://github.com/WentianZhang-ML/GatingAno.
229 |
230 |
231 | - [[CVPR 2023]](http://openaccess.thecvf.com/content/CVPR2023/papers/Xiang_SQUID_Deep_Feature_In-Painting_for_Unsupervised_Anomaly_Detection_CVPR_2023_paper.pdf) **SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection** [:octocat:](https://github.com/tiangexiang/squid)
232 |
233 | *Xiang, Tiange and Zhang, Yixiao and Lu, Yongyi and Yuille, Alan L and Zhang, Chaoyi and Cai, Weidong and Zhou, Zongwei*
234 |
235 |
236 |
237 |
238 |
239 |
240 | 📋 Abstract (Click to Expand)
241 | Radiography imaging protocols focus on particular body regions, therefore producing images of great similarity and yielding recurrent anatomical structures across patients. To exploit this structured information, we propose the use of Space-aware Memory Queues for In-painting and Detecting anomalies from radiography images (abbreviated as SQUID). We show that SQUID can taxonomize the ingrained anatomical structures into recurrent patterns; and in the inference, it can identify anomalies (unseen/modified patterns) in the image. SQUID surpasses 13 state-of-the-art methods in unsupervised anomaly detection by at least 5 points on two chest X-ray benchmark datasets measured by the Area Under the Curve (AUC). Additionally, we have created a new dataset (DigitAnatomy), which synthesizes the spatial correlation and consistent shape in chest anatomy. We hope DigitAnatomy can prompt the development, evaluation, and interpretability of anomaly detection methods.
242 |
243 |
244 | - [[MedIA 2019]](https://www.sciencedirect.com/science/article/pii/S1361841518302640) **f-AnoGAN: Fast Unsupervised Anomaly Detection with Generative Adversarial Networks** [:octocat:](https://github.com/A03ki/f-AnoGAN)
245 |
246 | *Schlegl, Thomas and Seeböck, Philipp and Waldstein, Sebastian M and Langs, Georg and Schmidt-Erfurth, Ursula*
247 |
248 |
249 |
250 |
251 |
252 |
253 | 📋 Abstract (Click to Expand)
254 | Obtaining expert labels in clinical imaging is difficult since exhaustive annotation is time-consuming. Furthermore, not all possibly relevant markers may be known and sufficiently well described a priori to even guide annotation. While supervised learning yields good results if expert labeled training data is available, the visual variability, and thus the vocabulary of findings, we can detect and exploit, is limited to the annotated lesions. Here, we present fast AnoGAN (f-AnoGAN), a generative adversarial network (GAN) based unsupervised learning approach capable of identifying anomalous images and image segments, that can serve as imaging biomarker candidates. We build a generative model of healthy training data, and propose and evaluate a fast mapping technique of new data to the GAN’s latent space. The mapping is based on a trained encoder, and anomalies are detected via a combined anomaly score based on the building blocks of the trained model – comprising a discriminator feature residual error and an image reconstruction error. In the experiments on optical coherence tomography data, we compare the proposed method with alternative approaches, and provide comprehensive empirical evidence that f-AnoGAN outperforms alternative approaches and yields high anomaly detection accuracy. In addition, a visual Turing test with two retina experts showed that the generated images are indistinguishable from real normal retinal OCT images. The f-AnoGAN code is available at https://github.com/tSchlegl/f-AnoGAN.
255 |
256 |
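🧪 *f-AnoGAN's score is easy to state in code: encode, regenerate, then sum an image residual with a discriminator-feature residual. `E`, `G`, and `D_features` below are toy stand-ins for the trained networks.*

```python
import torch
import torch.nn as nn

def f_anogan_score(x, E, G, D_features, kappa: float = 1.0):
    """A(x) = ||x - G(E(x))||^2 + kappa * ||f(x) - f(G(E(x)))||^2.

    E: encoder to latent space, G: generator, D_features: an intermediate
    feature layer of the discriminator; all trained on healthy data only.
    """
    with torch.no_grad():
        x_hat = G(E(x))
        img_res = (x - x_hat).pow(2).flatten(1).mean(dim=1)
        feat_res = (D_features(x) - D_features(x_hat)).pow(2).flatten(1).mean(dim=1)
    return img_res + kappa * feat_res

# Toy stand-ins so the sketch runs end to end
E = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 16))
G = nn.Sequential(nn.Linear(16, 64 * 64), nn.Unflatten(1, (1, 64, 64)))
D_features = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 8))
print(f_anogan_score(torch.rand(2, 1, 64, 64), E, G, D_features))
```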
257 | - [[IPMI 2017]](https://link.springer.com/chapter/10.1007/978-3-319-59050-9_12) **Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery** [:octocat:](https://github.com/seungjunlee96/AnoGAN-pytorch)
258 |
259 |
260 | *Schlegl, Thomas and Seeböck, Philipp and Waldstein, Sebastian M and Schmidt-Erfurth, Ursula and Langs, Georg*
261 |
262 |
263 |
264 |
265 |
266 |
267 | 📋 Abstract (Click to Expand)
268 | Obtaining models that capture imaging markers relevant for disease progression and treatment monitoring is challenging. Models are typically based on large amounts of data with annotated examples of known markers aiming at automating detection. High annotation effort and the limitation to a vocabulary of known markers limit the power of such approaches. Here, we perform unsupervised learning to identify anomalies in imaging data as candidates for markers. We propose AnoGAN, a deep convolutional generative adversarial network to learn a manifold of normal anatomical variability, accompanying a novel anomaly scoring scheme based on the mapping from image space to a latent space. Applied to new data, the model labels anomalies, and scores image patches indicating their fit into the learned distribution. Results on optical coherence tomography images of the retina demonstrate that the approach correctly identifies anomalous images, such as images containing retinal fluid or hyperreflective foci.
269 |
270 |
271 | ## 6. Flow-Based Approaches
272 |
273 | - *[[CVPR 2023]](https://openaccess.thecvf.com/content/CVPR2023/html/Lei_PyramidFlow_High-Resolution_Defect_Contrastive_Localization_Using_Pyramid_Normalizing_Flow_CVPR_2023_paper.html?ref=https://githubhelp.com) **PyramidFlow: High-Resolution Defect Contrastive Localization using Pyramid Normalizing Flow** [:octocat:](https://github.com/gasharper/PyramidFlow)
274 |
275 | *Lei, Jiarui and Hu, Xiaobo and Wang, Yue and Liu, Dong*
276 |
277 |
278 |
279 |
280 |
281 |
282 | 📋 Abstract (Click to Expand)
283 | During industrial processing, unforeseen defects may arise in products due to uncontrollable factors. Although unsupervised methods have been successful in defect localization, the usual use of pre-trained models results in low-resolution outputs, which damages visual performance. To address this issue, we propose PyramidFlow, the first fully normalizing flow method without pre-trained models that enables high-resolution defect localization. Specifically, we propose a latent template-based defect contrastive localization paradigm to reduce intra-class variance, as the pre-trained models do. In addition, PyramidFlow utilizes pyramid-like normalizing flows for multi-scale fusing and volume normalization to help generalization. Our comprehensive studies on MVTecAD demonstrate the proposed method outperforms the comparable algorithms that do not use external priors, even achieving state-of-the-art performance in more challenging BTAD scenarios.
284 |
285 |
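🧪 *Flow-based detectors share one scoring rule: an invertible network gives an exact log-likelihood via change of variables, log p(x) = log p(z) + log|det ∂z/∂x|, and low likelihood flags anomalies. A self-contained toy with a single RealNVP-style coupling layer (far smaller than PyramidFlow).*

```python
import math
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One RealNVP-style coupling layer on flattened feature vectors."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))   # predicts scale & shift

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                    # keep scales well-behaved
        z2 = x2 * torch.exp(s) + t
        log_det = s.sum(dim=1)               # log|det J| of the transform
        return torch.cat([x1, z2], dim=1), log_det

def nll_anomaly_score(feats, flow):
    """Negative log-likelihood under a standard-normal base distribution."""
    z, log_det = flow(feats)
    log_pz = -0.5 * (z ** 2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
    return -(log_pz + log_det)

flow = AffineCoupling(dim=32)
feats = torch.randn(4, 32)                   # e.g. pooled backbone features
print(nll_anomaly_score(feats, flow))
```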
286 | ## 7. Diffusion-Based Approaches
287 |
288 | - *[[CVPR 2025]](https://openaccess.thecvf.com/content/CVPR2025/html/Beizaee_Correcting_Deviations_from_Normality_A_Reformulated_Diffusion_Model_for_Multi-Class_CVPR_2025_paper.html) **Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection** [:octocat:](https://github.com/farzad-bz/DeCo-Diff)
289 |
290 | *Beizaee, Farzad and Lodygensky, Gregory A and Desrosiers, Christian and Dolz, Jose*
291 |
292 |
293 |
294 |
295 |
296 |
297 | 📋 Abstract (Click to Expand)
298 | Recent advances in diffusion models have spurred research into their application for Reconstruction-based unsupervised anomaly detection. However, these methods may struggle with maintaining structural integrity and recovering the anomaly-free content of abnormal regions, especially in multi-class scenarios. Furthermore, diffusion models are inherently designed to generate images from pure noise and struggle to selectively alter anomalous regions of an image while preserving normal ones. This leads to potential degradation of normal regions during reconstruction, hampering the effectiveness of anomaly detection. This paper introduces a reformulation of the standard diffusion model geared toward selective region alteration, allowing the accurate identification of anomalies. By modeling anomalies as noise in the latent space, our proposed Deviation correction diffusion (DeCo-Diff) model preserves the normal regions and encourages transformations exclusively on anomalous areas. This selective approach enhances the reconstruction quality, facilitating effective unsupervised detection and localization of anomaly regions. Comprehensive evaluations demonstrate the superiority of our method in accurately identifying and localizing anomalies in complex images, with pixel-level AUPRC improvements of 11-14% over state-of-the-art models on well-known anomaly detection datasets. The code is available at https://github.com/farzad-bz/DeCo-Diff.
299 |
300 |
301 | - [[WACV 2025]](https://www.researchgate.net/profile/Sudipta-Roy-9/publication/389540571_Self-Supervised_Anomaly_Segmentation_via_Diffusion_Models_with_Dynamic_Transformer_UNet/links/67c6e067461fb56424f04c9f/Self-Supervised-Anomaly-Segmentation-via-Diffusion-Models-with-Dynamic-Transformer-UNet.pdf) **Self-Supervised Anomaly Segmentation via Diffusion Models with Dynamic Transformer UNet** [:octocat:](https://github.com/MAXNORM8650/Annotsim)
302 |
303 | *Kumar, Komal and Chakraborty, Snehashis and Mahapatra, Dwarikanath and Bozorgtabar, Behzad and Roy, Sudipta*
304 |
305 |
306 |
307 |
308 |
309 |
310 | 📋 Abstract (Click to Expand)
311 | A robust anomaly detection mechanism should possess the capability to effectively remediate anomalies, restoring them to a healthy state, while preserving essential healthy information. Despite the efficacy of existing generative models in learning the underlying distribution of healthy reference data, they face primary challenges when it comes to efficiently repair larger anomalies or anomalies situated near high pixel-density regions. In this paper, we introduce a self-supervised anomaly detection method based on a diffusion model that samples from multi-frequency, four-dimensional simplex noise and makes predictions using our proposed Dynamic Transformer UNet (DTUNet). This simplex-based noise function helps address primary problems to some extent and is scalable for three-dimensional and colored images. In the evolution of ViT, our developed architecture serving as the backbone for the diffusion model, is tailored to treat time and noise image patches as tokens. We incorporate long skip connections bridging the shallow and deep layers, along with smaller skip connections within these layers. Furthermore, we integrate a partial diffusion Markov process, which reduces sampling time, thus enhancing scalability. Our method surpasses existing generative-based anomaly detection methods across three diverse datasets, which include BrainMRI, Brats2021, and the MVtec dataset. It achieves an average improvement of +10.1% in Dice coefficient, +10.4% in IOU, and +9.6% in AUC. Our source code is made publicly available on Github.
312 |
313 |
314 | - *[[ICML 2024]](https://proceedings.mlr.press/v235/li24u.html) **Vague Prototype-Oriented Diffusion Model for Multi-class Anomaly Detection**
315 |
316 | *Li, Yuxin and Feng, Yaoxuan and Chen, Bo and Chen, Wenchao and Wang, Yubiao and Hu, Xinyue and Sun, Baolin and Qu, Chunhui and Zhou, Mingyuan*
317 |
318 |
319 |
320 |
321 |
322 |
323 | 📋 Abstract (Click to Expand)
324 | Multi-class unsupervised anomaly detection aims to create a unified model for identifying anomalies in objects from multiple classes when only normal data is available. In such a challenging setting, widely used reconstruction-based networks persistently grapple with the "identical shortcut" problem, wherein the infiltration of abnormal information from the condition biases the output towards an anomalous distribution. In response to this critical challenge, we introduce a Vague Prototype-Oriented Diffusion Model (VPDM) that extracts only fundamental information from the condition to prevent the occurrence of the "identical shortcut" problem from the input layer. This model leverages prototypes that contain only vague information about the target as the initial condition. Subsequently, a novel conditional diffusion model is introduced to incrementally enhance details based on vague conditions. Finally, a Vague Prototype-Oriented Optimal Transport (VPOT) method is proposed to provide more accurate information about conditions. All these components are seamlessly integrated into a unified optimization objective. The effectiveness of our approach is demonstrated across diverse datasets, including the MVTec, VisA, and MPDD benchmarks, achieving state-of-the-art results.
325 |
326 |
327 | - [[TMI 2024]](https://arxiv.org/pdf/2308.02062) **Diffusion Models for Counterfactual Generation and Anomaly Detection in Brain Images** [:octocat:](https://github.com/alessandro-f/dif-fuse)
328 |
329 | *Fontanella, Alessandro and Mair, Grant and Wardlaw, Joanna and Trucco, Emanuele and Storkey, Amos*
330 |
331 |
332 |
333 |
334 |
335 |
336 | 📋 Abstract (Click to Expand)
337 | Segmentation masks of pathological areas are useful in many medical applications, such as brain tumour and stroke management. Moreover, healthy counterfactuals of diseased images can be used to enhance radiologists’ training files and to improve the interpretability of segmentation models. In this work, we present a weakly supervised method to generate a healthy version of a diseased image and then use it to obtain a pixel-wise anomaly map. To do so, we start by considering a saliency map that approximately covers the pathological areas, obtained with ACAT. Then, we propose a technique that allows to perform targeted modifications to these regions, while preserving the rest of the image. In particular, we employ a diffusion model trained on healthy samples and combine Denoising Diffusion Probabilistic Model (DDPM) and Denoising Diffusion Implicit Model (DDIM) at each step of the sampling process. DDPM is used to modify the areas affected by a lesion within the saliency map, while DDIM guarantees reconstruction of the normal anatomy outside of it. The two parts are also fused at each timestep, to guarantee the generation of a sample with a coherent appearance and a seamless transition between edited and unedited parts. We verify that when our method is applied to healthy samples, the input images are reconstructed without significant modifications. We compare our approach with alternative weakly supervised methods on the task of brain lesion segmentation, achieving the highest mean Dice and IoU scores among the models considered.
338 |
339 |
340 | - [[MICCAI 2024]](https://arxiv.org/pdf/2403.08464) **Diffusion Models with Implicit Guidance for Medical Anomaly Detection** [:octocat:](https://github.com/ci-ber/thor_ddpm)
341 |
342 | *Bercea, Cosmin I and Wiestler, Benedikt and Rueckert, Daniel and Schnabel, Julia A*
343 |
344 |
345 |
346 |
347 |
348 |
349 | 📋 Abstract (Click to Expand)
350 | Diffusion models have advanced unsupervised anomaly detection by improving the transformation of pathological images into pseudo-healthy equivalents. Nonetheless, standard approaches may compromise critical information during pathology removal, leading to restorations that do not align with unaffected regions in the original scans. Such discrepancies can inadvertently increase false positive rates and reduce specificity, complicating radiological evaluations. This paper introduces Temporal Harmonization for Optimal Restoration (THOR), which refines the reverse diffusion process by integrating implicit guidance through intermediate masks. THOR aims to preserve the integrity of healthy tissue details in reconstructed images, ensuring fidelity to the original scan in areas unaffected by pathology. Comparative evaluations reveal that THOR surpasses existing diffusion-based methods in retaining detail and precision in image restoration and detecting and segmenting anomalies in brain MRIs and wrist X-rays. Code: https://github.com/compai-lab/2024-miccai-bercea-thor.git.
351 |
352 |
353 | - [[UNSURE 2024]](https://link.springer.com/chapter/10.1007/978-3-031-73158-7_11) **Image-Conditioned Diffusion Models for Medical Anomaly Detection**
354 |
355 | *Baugh, Matthew and Reynaud, Hadrien and Marimont, Sergio Naval and Cechnicka, Sarah and Müller, Johanna P and Tarroni, Giacomo and Kainz, Bernhard*
356 |
357 |
358 |
359 |
360 |
361 |
362 | 📋 Abstract (Click to Expand)
363 | Generating pseudo-healthy reconstructions of images is an effective way to detect anomalies, as identifying the differences between the reconstruction and the original can localise arbitrary anomalies whilst also providing interpretability for an observer by displaying what the image ‘should’ look like. All existing reconstruction-based methods have a common shortcoming; they assume that models trained on purely normal data are incapable of reproducing pathologies yet also able to fully maintain healthy tissue. These implicit assumptions often fail, with models either not recovering normal regions or reproducing both the normal and abnormal features. We rectify this issue using image-conditioned diffusion models. Our model takes the input image as conditioning and is explicitly trained to correct synthetic anomalies introduced into healthy images, ensuring that it removes anomalies at test time. This conditioning allows the model to attend to the entire image without any loss of information, enabling it to replicate healthy regions with high fidelity. We evaluate our method across four datasets and define a new state-of-the-art performance for residual-based anomaly detection. Code is available at https://github.com/matt-baugh/img-cond-diffusion-model-ad.
364 |
365 |
366 | - [[CVPR 2022]](https://openaccess.thecvf.com/content/CVPR2022W/NTIRE/papers/Wyatt_AnoDDPM_Anomaly_Detection_With_Denoising_Diffusion_Probabilistic_Models_Using_Simplex_CVPRW_2022_paper.pdf) **AnoDDPM: Anomaly Detection with Denoising Diffusion Probabilistic Models using Simplex Noise** [:octocat:](https://github.com/julian-wyatt/anoddpm)
367 |
368 | *Wyatt, Julian and Leach, Adam and Schmon, Sebastian M and Willcocks, Chris G*
369 |
370 |
371 |
372 |
373 |
374 |
375 | 📋 Abstract (Click to Expand)
376 | Generative models have been shown to provide a powerful mechanism for anomaly detection by learning to model healthy or normal reference data which can subsequently be used as a baseline for scoring anomalies. In this work we consider denoising diffusion probabilistic models (DDPMs) for unsupervised anomaly detection. DDPMs have superior mode coverage over generative adversarial networks (GANs) and higher sample quality than variational autoencoders (VAEs). However, this comes at the expense of poor scalability and increased sampling times due to the long Markov chain sequences required. We observe that within reconstruction-based anomaly detection a full-length Markov chain diffusion is not required. This leads us to develop a novel partial diffusion anomaly detection strategy that scales to high-resolution imagery, named AnoDDPM. A secondary problem is that Gaussian diffusion fails to capture larger anomalies; therefore we develop a multi-scale simplex noise diffusion process that gives control over the target anomaly size. AnoDDPM with simplex noise is shown to significantly outperform both f-AnoGAN and Gaussian diffusion for the tumorous dataset of 22 T1-weighted MRI scans (CCBS Edinburgh) qualitatively and quantitatively (improvement of +25.5% Sørensen-Dice coefficient, +17.6% IoU and +7.4% AUC).
377 |
378 |
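🧪 *AnoDDPM's partial diffusion is worth sketching: the image is noised only to an intermediate step t < T and then denoised, so healthy structure survives while anomalies get painted over. A schematic DDPM loop; `eps_model` and `betas` are placeholders for a trained noise predictor and its schedule.*

```python
import torch

def partial_diffusion_reconstruct(x, eps_model, betas, t_start: int = 250):
    """Noise x only to step t_start, then run the DDPM reverse loop to x_0.

    x: (B, C, H, W); betas: (T,) schedule; eps_model(x_t, t) predicts the
    added noise. The anomaly map is |x - reconstruction|.
    """
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)                    # \bar{alpha}_t
    # forward: jump straight to x_{t_start}
    x_t = abar[t_start].sqrt() * x + (1 - abar[t_start]).sqrt() * torch.randn_like(x)
    # reverse: standard ancestral sampling from t_start down to 0
    for step in range(t_start, -1, -1):
        t = torch.full((x.shape[0],), step, device=x.device, dtype=torch.long)
        eps = eps_model(x_t, t)
        mean = (x_t - betas[step] / (1 - abar[step]).sqrt() * eps) / alphas[step].sqrt()
        noise = torch.randn_like(x) if step > 0 else torch.zeros_like(x)
        x_t = mean + betas[step].sqrt() * noise
    return x_t

eps_model = lambda x_t, t: torch.zeros_like(x_t)   # toy predictor for the sketch
betas = torch.linspace(1e-4, 0.02, 1000)
x = torch.rand(1, 1, 32, 32)
x0_hat = partial_diffusion_reconstruct(x, eps_model, betas)
print((x - x0_hat).abs().mean().item())            # anomaly-map statistic
```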
379 | - [[MICCAI 2022]](https://arxiv.org/pdf/2203.04306) **Diffusion Models for Medical Anomaly Detection** [:octocat:](https://github.com/JuliaWolleb/diffusion-anomaly)
380 |
381 | *Wolleb, Julia and Bieder, Florentin and Sandkühler, Robin and Cattin, Philippe C*
382 |
383 |
384 |
385 |
386 |
387 |
388 | 📋 Abstract (Click to Expand)
389 | In medical applications, weakly supervised anomaly detection methods are of great interest, as only image-level annotations are required for training. Current anomaly detection methods mainly rely on generative adversarial networks or autoencoder models. Those models are often complicated to train or have difficulties to preserve fine details in the image. We present a novel weakly supervised anomaly detection method based on denoising diffusion implicit models. We combine the deterministic iterative noising and denoising scheme with classifier guidance for image-to-image translation between diseased and healthy subjects. Our method generates very detailed anomaly maps without the need for a complex training procedure. We evaluate our method on the BRATS2020 dataset for brain tumor detection and the CheXpert dataset for detecting pleural effusions.
390 |
391 |
392 | ## 8. Patch-Based Approaches
393 |
394 | - [[MIDL 2024]](https://proceedings.mlr.press/v227/behrendt24a/behrendt24a.pdf) **Patched Diffusion Models for Unsupervised Anomaly Detection in Brain MRI** [:octocat:](https://github.com/finnbehrendt/patched-diffusion-models-uad)
395 |
396 | *Behrendt, Finn and Bhattacharya, Debayan and Krüger, Julia and Opfer, Roland and Schlaefer, Alexander*
397 |
398 |
399 |
400 |
401 |
402 |
403 | 📋 Abstract (Click to Expand)
404 | The use of supervised deep learning techniques to detect pathologies in brain MRI scans can be challenging due to the diversity of brain anatomy and the need for annotated data sets. An alternative approach is to use unsupervised anomaly detection, which only requires sample-level labels of healthy brains to create a reference representation. This reference representation can then be compared to unhealthy brain anatomy in a pixel-wise manner to identify abnormalities. To accomplish this, generative models are needed to create anatomically consistent MRI scans of healthy brains. While recent diffusion models have shown promise in this task, accurately generating the complex structure of the human brain remains a challenge. In this paper, we propose a method that reformulates the generation task of diffusion models as a patch-based estimation of healthy brain anatomy, using spatial context to guide and improve reconstruction. We evaluate our approach on data of tumors and multiple sclerosis lesions and demonstrate a relative improvement of 25.1% compared to existing baselines.
405 |
406 |
407 | - *[[CVPR 2021]](https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Glancing_at_the_Patch_Anomaly_Localization_With_Global_and_Local_CVPR_2021_paper.html?ref=https://githubhelp.com) **Glancing at the Patch: Anomaly Localization with Global and Local Feature Comparison**
408 |
409 | *Wang, Shenzhi and Wu, Liwei and Cui, Lei and Shen, Yujun*
410 |
411 |
412 |
413 |
414 |
415 |
416 | 📋 Abstract (Click to Expand)
417 | Anomaly localization, with the purpose to segment the anomalous regions within images, is challenging due to the large variety of anomaly types. Existing methods typically train deep models by treating the entire image as a whole yet put little effort into learning the local distribution, which is vital for this pixel-precise task. In this work, we propose an unsupervised patch-based approach that gives due consideration to both the global and local information. More concretely, we employ a Local-Net and Global-Net to extract features from any individual patch and its surrounding respectively. Global-Net is trained with the purpose to mimic the local feature such that we can easily detect an abnormal patch when its feature mismatches that from the context. We further introduce an Inconsistency Anomaly Detection (IAD) head and a Distortion Anomaly Detection (DAD) head to sufficiently spot the discrepancy between global and local features. A scoring function derived from the multi-head design facilitates high-precision anomaly localization. Extensive experiments on a couple of real-world datasets suggest that our approach outperforms state-of-the-art competitors by a sufficiently large margin.
418 |
419 |
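🧪 *Common to this family is carving the image into patches and scoring each one locally. A minimal sketch of patch-wise residual scoring with `torch.nn.functional.unfold`; patch size, stride, and the reconstruction source are illustrative.*

```python
import torch
import torch.nn.functional as F

def patch_anomaly_scores(x, x_hat, patch: int = 16, stride: int = 16):
    """Mean residual per patch.

    x, x_hat: (B, C, H, W) image and a model-provided reconstruction.
    Returns (B, num_patches); the max over patches gives an image-level
    score, the grid itself a coarse localization.
    """
    res = (x - x_hat).abs().mean(dim=1, keepdim=True)           # (B, 1, H, W)
    patches = F.unfold(res, kernel_size=patch, stride=stride)   # (B, p*p, N)
    return patches.mean(dim=1)                                  # (B, N)

x = torch.rand(2, 1, 128, 128)
x_hat = x.clone()
x_hat[:, :, 32:48, 32:48] += 0.5    # simulate one badly reconstructed region
scores = patch_anomaly_scores(x, x_hat)
print(scores.shape, scores.max(dim=1).values)   # (2, 64) patch grid
```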
420 | ## 9. Foundation Model-Based Approaches
421 |
422 | - [[CVPR 2025]](https://arxiv.org/pdf/2503.06661) **AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP** [:octocat:](https://github.com/mwxinnn/aa-clip)
423 |
424 | *Ma, Wenxin and Zhang, Xu and Yao, Qingsong and Tang, Fenghe and Wu, Chenxu and Li, Yingtai and Yan, Rui and Jiang, Zihang and Zhou, S Kevin*
425 |
426 |
427 |
428 |
429 |
430 |
431 | 📋 Abstract (Click to Expand)
432 | Anomaly detection (AD) identifies outliers for applications like defect and lesion detection. While CLIP shows promise for zero-shot AD tasks due to its strong generalization capabilities, its inherent Anomaly-Unawareness leads to limited discrimination between normal and abnormal features. To address this problem, we propose Anomaly-Aware CLIP (AA-CLIP), which enhances CLIP's anomaly discrimination ability in both text and visual spaces while preserving its generalization capability. AA-CLIP is achieved through a straightforward yet effective two-stage approach: it first creates anomaly-aware text anchors to differentiate normal and abnormal semantics clearly, then aligns patch-level visual features with these anchors for precise anomaly localization. This two-stage strategy, with the help of residual adapters, gradually adapts CLIP in a controlled manner, achieving effective AD while maintaining CLIP's class knowledge. Extensive experiments validate AA-CLIP as a resource-efficient solution for zero-shot AD tasks, achieving state-of-the-art results in industrial and medical applications. The code is available at https://github.com/Mwxinnn/AA-CLIP.
433 |
434 |
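🧪 *The zero-shot recipe these CLIP-based methods start from fits in a few lines: embed "normal"/"abnormal" text anchors, embed the image, and softmax over the two similarities. A sketch with the open-source `open_clip_torch` package; the prompts and checkpoint are illustrative, not AA-CLIP's learned anchors.*

```python
import torch
import open_clip  # pip install open_clip_torch

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

prompts = ["a photo of a healthy chest x-ray",          # illustrative anchors
           "a photo of a chest x-ray with a lesion"]

@torch.no_grad()
def zero_shot_anomaly_prob(images):
    """P(abnormal) for (B, 3, 224, 224) images already run through `preprocess`."""
    t = model.encode_text(tokenizer(prompts))
    v = model.encode_image(images)
    t = t / t.norm(dim=-1, keepdim=True)
    v = v / v.norm(dim=-1, keepdim=True)
    logits = 100.0 * v @ t.T                  # temperature-scaled similarities
    return logits.softmax(dim=-1)[:, 1]       # weight on the "abnormal" anchor

print(zero_shot_anomaly_prob(torch.rand(1, 3, 224, 224)))
```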
435 | - *[[AAAI 2025]](https://ojs.aaai.org/index.php/AAAI/article/view/32433) **LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction**
436 |
437 | *Jin, Er and Feng, Qihui and Mou, Yongli and Lakemeyer, Gerhard and Decker, Stefan and Simons, Oliver and Stegmaier, Johannes*
438 |
439 |
440 |
441 |
442 |
443 |
444 | 📋 Abstract (Click to Expand)
445 | Logical image understanding involves interpreting and reasoning about the relationships and consistency within an image's visual content. This capability is essential in applications such as industrial inspection, where logical anomaly detection is critical for maintaining high-quality standards and minimizing costly recalls. Previous research in anomaly detection (AD) has relied on prior knowledge for designing algorithms, which often requires extensive manual annotations, significant computing power, and large amounts of data for training. Autoregressive, multimodal Vision Language Models (AVLMs) offer a promising alternative due to their exceptional performance in visual reasoning across various domains. Despite this, their application to logical AD remains unexplored. In this work, we investigate using AVLMs for logical AD and demonstrate that they are well-suited to the task. Combining AVLMs with format embedding and a logic reasoner, we achieve SOTA performance on public benchmarks, MVTec LOCO AD, with an AUROC of 86.0% and an F1-max of 83.7% along with explanations of the anomalies. This significantly outperforms the existing SOTA method by 18.1% in AUROC and 4.6% in F1-max score.
446 |
447 |
448 | - *[[AAAI 2025]](https://ojs.aaai.org/index.php/AAAI/article/view/33420) **Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning**
449 |
450 | *Yang, Hui-Yue and Chen, Hui and Wang, Ao and Chen, Kai and Lin, Zijia and Tang, Yongliang and Gao, Pengcheng and Quan, Yuming and Han, Jungong and Ding, Guiguang*
451 |
452 |
453 |
454 |
455 |
456 |
457 | 📋 Abstract (Click to Expand)
458 | Segment Anything Model (SAM) has made great progress in anomaly segmentation tasks due to its impressive generalization ability. However, existing methods that directly apply SAM through prompting often overlook the domain shift issue, where SAM performs well on natural images but struggles in industrial scenarios. Parameter-Efficient Fine-Tuning (PEFT) offers a promising solution, but it may yield suboptimal performance by not adequately addressing the perception challenges during adaptation to anomaly images. In this paper, we propose a novel Self-Perception Tuning (SPT) method, aiming to enhance SAM's perception capability for anomaly segmentation. The SPT method incorporates a self-drafting tuning strategy, which generates an initial coarse draft of the anomaly mask, followed by a refinement process. Additionally, a visual-relation-aware adapter is introduced to improve the perception of discriminative relational information for mask generation. Extensive experimental results on several benchmark datasets demonstrate that our SPT method can significantly outperform baseline methods, validating its effectiveness.
459 |
460 |
461 | - [[MICCAI 2025]](https://arxiv.org/abs/2503.01020) **Delving into Out-of-Distribution Detection with Medical Vision-Language Models** [:octocat:](https://github.com/pyjulie/medical-vlms-ood-detection)
462 |
463 | *Ju, Lie and Zhou, Sijin and Zhou, Yukun and Lu, Huimin and Zhu, Zhuoting and Keane, Pearse A and Ge, Zongyuan*
464 |
465 |
466 |
467 |
468 |
469 |
470 | 📋 Abstract (Click to Expand)
471 | Recent advances in medical vision-language models (VLMs) demonstrate impressive performance in image classification tasks, driven by their strong zero-shot generalization capabilities. However, given the high variability and complexity inherent in medical imaging data, the ability of these models to detect out-of-distribution (OOD) data in this domain remains underexplored. In this work, we conduct the first systematic investigation into the OOD detection potential of medical VLMs. We evaluate state-of-the-art VLM-based OOD detection methods across a diverse set of medical VLMs, including both general-purpose and domain-specific models. To accurately reflect real-world challenges, we introduce a cross-modality evaluation pipeline for benchmarking full-spectrum OOD detection, rigorously assessing model robustness against both semantic shifts and covariate shifts. Furthermore, we propose a novel hierarchical prompt-based method that significantly enhances OOD detection performance. Extensive experiments are conducted to validate the effectiveness of our approach. The codes are available at https://github.com/PyJulie/Medical-VLMs-OOD-Detection.
472 |
473 |
474 | - [[NeurIPS 2024]](https://proceedings.neurips.cc/paper_files/paper/2024/hash/8f4477b086a9c97e30d1a0621ea6b2f5-Abstract-Conference.html) **One-to-Normal: Anomaly Personalization for Few-shot Anomaly Detection**
475 |
476 | *Li, Yiyue and Zhang, Shaoting and Li, Kang and Lao, Qicheng*
477 |
478 |
479 |
480 |
481 |
482 |
483 | 📋 Abstract (Click to Expand)
484 | Traditional Anomaly Detection (AD) methods have predominantly relied on unsupervised learning from extensive normal data. Recent AD methods have evolved with the advent of large pre-trained vision-language models, enhancing few-shot anomaly detection capabilities. However, these latest AD methods still exhibit limitations in accuracy improvement. One contributing factor is their direct comparison of a query image's features with those of few-shot normal images. This direct comparison often leads to a loss of precision and complicates the extension of these techniques to more complex domains—an area that remains underexplored in a more refined and comprehensive manner. To address these limitations, we introduce the anomaly personalization method, which performs a personalized one-to-normal transformation of query images using an anomaly-free customized generation model, ensuring close alignment with the normal manifold. Moreover, to further enhance the stability and robustness of prediction results, we propose a triplet contrastive anomaly inference strategy, which incorporates a comprehensive comparison between the query and generated anomaly-free data pool and prompt information. Extensive evaluations across eleven datasets in three domains demonstrate our model's effectiveness compared to the latest AD methods. Additionally, our method has been proven to transfer flexibly to other AD methods, with the generated image data effectively improving the performance of other AD methods.
485 |
486 |
487 | - [[CVPR 2024]](http://openaccess.thecvf.com/content/CVPR2024/papers/Zhu_Toward_Generalist_Anomaly_Detection_via_In-context_Residual_Learning_with_Few-shot_CVPR_2024_paper.pdf) **Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts** [:octocat:](https://github.com/mala-lab/inctrl)
488 |
489 | *Zhu, Jiawen and Pang, Guansong*
490 |
491 |
492 |
493 |
494 |
495 |
496 | 📋 Abstract (Click to Expand)
497 | This paper explores the problem of Generalist Anomaly Detection (GAD), aiming to train one single detection model that can generalize to detect anomalies in diverse datasets from different application domains without any further training on the target data. Some recent studies have shown that large pre-trained Visual-Language Models (VLMs) like CLIP have strong generalization capabilities on detecting industrial defects from various datasets, but their methods rely heavily on handcrafted text prompts about defects, making them difficult to generalize to anomalies in other applications, e.g., medical image anomalies or semantic anomalies in natural images. In this work, we propose to train a GAD model with few-shot normal images as sample prompts for AD on diverse datasets on the fly. To this end, we introduce a novel approach that learns an in-context residual learning model for GAD, termed InCTRL. It is trained on an auxiliary dataset to discriminate anomalies from normal samples based on a holistic evaluation of the residuals between query images and few-shot normal sample prompts. Regardless of the dataset, per definition of anomaly, larger residuals are expected for anomalies than normal samples, thereby enabling InCTRL to generalize across different domains without further training. Comprehensive experiments on nine AD datasets are performed to establish a GAD benchmark that encapsulates the detection of industrial defect anomalies, medical anomalies, and semantic anomalies in both one-vs-all and multi-class settings, on which InCTRL is the best performer and significantly outperforms state-of-the-art competing methods. Code is available at https://github.com/mala-lab/InCTRL.
498 |
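🧪 *Illustrative sketch (not from the paper):* the in-context residual idea can be pictured in a few lines: score a query by how poorly even its best-matching few-shot normal prompt explains it. This is a toy fixed rule under our own naming; InCTRL itself learns the discrimination over such residuals on an auxiliary dataset.

```python
import torch
import torch.nn.functional as F

def residual_score(query_feat, normal_prompts):
    """query_feat: (D,); normal_prompts: (K, D) features of K few-shot normal images.
    Scores the query by the residual to its best-matching normal prompt:
    anomalies should sit far from every normal sample, whatever the domain."""
    q = F.normalize(query_feat, dim=-1)
    p = F.normalize(normal_prompts, dim=-1)
    return 1.0 - (p @ q).max()               # residual to the closest normal prompt

normals = torch.randn(8, 512)
print(residual_score(normals[0] + 0.01 * torch.randn(512), normals))  # near 0 (normal-like)
print(residual_score(torch.randn(512), normals))                      # larger (anomalous)
```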
499 |
500 | - [[CVPR 2024]](https://arxiv.org/abs/2403.12570) **Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images** [:octocat:](https://github.com/mediabrain-sjtu/mvfa-ad)
501 |
502 | *Huang, Chaoqin and Jiang, Aofan and Feng, Jinghao and Zhang, Ya and Wang, Xinchao and Wang, Yanfeng*
503 |
504 |
505 |
506 |
507 |
508 |
509 | 📋 Abstract (Click to Expand)
510 | Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains. However, the substantial domain divergence between natural and medical images limits the effectiveness of these methodologies in medical anomaly detection. This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. This multi-level adaptation is guided by multi-level pixel-wise visual-language feature alignment loss functions, which recalibrate the model's focus from object semantics in natural imagery to anomaly identification in medical images. The adapted features exhibit improved generalization across various medical data types, even in zero-shot scenarios where the model encounters unseen medical modalities and anatomical regions during training. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models, with an average AUC improvement of 6.24% and 7.33% for anomaly classification, and 2.03% and 2.37% for anomaly segmentation, under the zero-shot and few-shot settings respectively. Source code is available at: https://github.com/MediaBrain-SJTU/MVFA-AD.
511 |
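🧪 *Illustrative sketch (not from the paper):* the residual adapters at the core of such frameworks are small bottleneck layers whose output is added back onto the frozen CLIP feature, so adaptation starts from the identity and cannot erase the pre-trained representation. A generic residual-adapter sketch; the dimensions and zero-initialization are illustrative assumptions, not MVFA's exact design.

```python
import torch
import torch.nn as nn

class ResidualAdapter(nn.Module):
    """Bottleneck MLP whose output is added back onto the frozen feature,
    so adaptation starts from the identity mapping."""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # zero-init: the residual starts at zero
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

tokens = torch.randn(1, 197, 768)       # frozen ViT tokens at one encoder level
print(ResidualAdapter()(tokens).shape)  # torch.Size([1, 197, 768])
```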
512 |
513 | - [[ICLR 2024]](https://openreview.net/forum?id=buC4E91xZE) **AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection** [:octocat:](https://github.com/zqhang/anomalyclip)
514 |
515 | *Zhou, Qihang and Pang, Guansong and Tian, Yu and He, Shibo and Chen, Jiming*
516 |
517 |
518 |
519 |
520 |
521 |
522 | 📋 Abstract (Click to Expand)
523 | Zero-shot anomaly detection (ZSAD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. It is a crucial task when training data is not accessible due to various concerns, e.g., data privacy, yet it is challenging since the models need to generalize to anomalies across different domains where the appearance of foreground objects, abnormal regions, and background features, such as defects/tumors on different products/organs, can vary significantly. Recently, large pre-trained vision-language models (VLMs), such as CLIP, have demonstrated strong zero-shot recognition ability in various vision tasks, including anomaly detection. However, their ZSAD performance is weak since the VLMs focus more on modeling the class semantics of the foreground objects rather than the abnormality/normality in the images. In this paper, we introduce a novel approach, namely AnomalyCLIP, to adapt CLIP for accurate ZSAD across different domains. The key insight of AnomalyCLIP is to learn object-agnostic text prompts that capture generic normality and abnormality in an image regardless of its foreground objects. This allows our model to focus on the abnormal image regions rather than the object semantics, enabling generalized normality and abnormality recognition on diverse types of objects. Large-scale experiments on 17 real-world anomaly detection datasets show that AnomalyCLIP achieves superior zero-shot performance of detecting and segmenting anomalies in datasets of highly diverse class semantics from various defect inspection and medical imaging domains. Code will be made available at https://github.com/zqhang/AnomalyCLIP.
524 |
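🧪 *Illustrative sketch (not from the paper):* object-agnostic prompt learning can be pictured as swapping hand-written class prompts for two learnable token sequences, one for generic normality and one for abnormality, trained while the CLIP backbone stays frozen. A schematic sketch; the module name, token count, and stand-in encoder are our assumptions, not AnomalyCLIP's code.

```python
import torch
import torch.nn as nn

class ObjectAgnosticPrompts(nn.Module):
    """Two learnable prompt token sequences ('normality'/'abnormality') with no
    class-name tokens, so the same prompts transfer across objects and datasets."""
    def __init__(self, n_tokens=12, dim=512):
        super().__init__()
        self.normal = nn.Parameter(0.02 * torch.randn(n_tokens, dim))
        self.abnormal = nn.Parameter(0.02 * torch.randn(n_tokens, dim))

    def forward(self, text_encoder):
        # text_encoder: any callable mapping (n_tokens, dim) -> (dim,);
        # in practice a frozen CLIP text transformer over the prompt tokens.
        return torch.stack([text_encoder(self.normal), text_encoder(self.abnormal)])

prompts = ObjectAgnosticPrompts()
anchors = prompts(lambda tokens: tokens.mean(dim=0))  # stand-in encoder for the demo
print(anchors.shape)                                  # torch.Size([2, 512])
```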
525 |
526 | - *[[ACM MM 2024]](https://dl.acm.org/doi/abs/10.1145/3664647.3681376) **SimCLIP: Refining Image-Text Alignment with Simple Prompts for Zero-/Few-shot Anomaly Detection** [:octocat:](https://github.com/CH-ORGI/SimCLIP)
527 |
528 | *Deng, Chenghao and Xu, Haote and Chen, Xiaolu and Xu, Haodi and Tu, Xiaotong and Ding, Xinghao and Huang, Yue*
529 |
530 |
531 |
532 |
533 |
534 |
535 | 📋 Abstract (Click to Expand)
536 | Recently, large pre-trained vision-language models, such as CLIP, have demonstrated significant potential in zero-/few-shot anomaly detection tasks. However, existing methods not only rely on expert knowledge to manually craft extensive text prompts but also suffer from a misalignment of high-level language features with fine-level vision features in anomaly segmentation tasks. In this paper, we propose a method, named SimCLIP, which focuses on refining the aforementioned misalignment problem through bidirectional adaptation of both Multi-Hierarchy Vision Adapter (MHVA) and Implicit Prompt Tuning (IPT). In this way, our approach requires only a simple binary prompt to efficiently accomplish anomaly classification and segmentation tasks in zero-shot scenarios. Furthermore, we introduce its few-shot extension, SimCLIP+, integrating the relational information among vision embeddings and skillfully merging the cross-modal synergy information between vision and language to address downstream anomaly detection tasks. Extensive experiments on two challenging datasets demonstrate the superior generalization capacity of our method compared to current SOTA approaches. Our code is available at https://github.com/CH-ORGI/SimCLIP.
537 |
538 |
539 | - [[MICCAI 2024]](https://arxiv.org/pdf/2405.11315) **MediCLIP: Adapting CLIP for Few-shot Medical Image Anomaly Detection** [:octocat:](https://github.com/cnulab/mediclip)
540 |
541 | *Zhang, Ximiao and Xu, Min and Qiu, Dehui and Yan, Ruixin and Lang, Ning and Zhou, Xiuzhuang*
542 |
543 |
544 |
545 |
546 |
547 |
548 | 📋 Abstract (Click to Expand)
549 | In the field of medical decision-making, precise anomaly detection in medical imaging plays a pivotal role in aiding clinicians. However, previous work is reliant on large-scale datasets for training anomaly detection models, which increases the development cost. This paper first focuses on the task of medical image anomaly detection in the few-shot setting, which is critically significant for the medical field where data collection and annotation are both very expensive. We propose an innovative approach, MediCLIP, which adapts the CLIP model to few-shot medical image anomaly detection through self-supervised fine-tuning. Although CLIP, as a vision-language model, demonstrates outstanding zero-/few-shot performance on various downstream tasks, it still falls short in the anomaly detection of medical images. To address this, we design a series of medical image anomaly synthesis tasks to simulate common disease patterns in medical imaging, transferring the powerful generalization capabilities of CLIP to the task of medical image anomaly detection. When only few-shot normal medical images are provided, MediCLIP achieves state-of-the-art performance in anomaly detection and location compared to other methods. Extensive experiments on three distinct medical anomaly detection tasks have demonstrated the superiority of our approach. The code is available at https://github.com/cnulab/MediCLIP.
550 |
551 |
552 | - *[[NeurIPS 2022]](https://proceedings.neurips.cc/paper_files/paper/2022/hash/e43a33994a28f746dcfd53eb51ed3c2d-Abstract-Conference.html) **Delving into Out-of-Distribution Detection with Vision-Language Representations** [:octocat:](https://github.com/deeplearning-wisc/mcm)
553 |
554 | *Ming, Yifei and Cai, Ziyang and Gu, Jiuxiang and Sun, Yiyou and Li, Wei and Li, Yixuan*
555 |
556 |
557 |
558 |
559 |
560 |
561 | 📋 Abstract (Click to Expand)
562 | Recognizing out-of-distribution (OOD) samples is critical for machine learning systems deployed in the open world. The vast majority of OOD detection methods are driven by a single modality (e.g., either vision or language), leaving the rich information in multi-modal representations untapped. Inspired by the recent success of vision-language pre-training, this paper enriches the landscape of OOD detection from a single-modal to a multi-modal regime. Particularly, we propose Maximum Concept Matching (MCM), a simple yet effective zero-shot OOD detection method based on aligning visual features with textual concepts. We contribute in-depth analysis and theoretical insights to understand the effectiveness of MCM. Extensive experiments demonstrate that MCM achieves superior performance on a wide variety of real-world tasks. MCM with vision-language features outperforms a common baseline with pure visual features on a hard OOD task with semantically similar classes by 13.1% (AUROC). Code is available at https://github.com/deeplearning-wisc/MCM.
563 |
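🧪 *Illustrative sketch (not from the paper's codebase):* MCM is compact enough to sketch directly from the description above: softmax the temperature-scaled cosine similarities between the image feature and the ID class text features, and take the maximum as the ID-ness score (a low maximum suggests OOD). The tensors below are random stand-ins for CLIP embeddings.

```python
import torch
import torch.nn.functional as F

def mcm_score(image_feat, class_text_feats, tau=1.0):
    """Maximum Concept Matching: image_feat (D,), class_text_feats (C, D).
    Returns the maximum softmax-scaled similarity; a low maximum means the
    image matches no in-distribution concept well, i.e. it is likely OOD."""
    img = F.normalize(image_feat, dim=-1)
    txt = F.normalize(class_text_feats, dim=-1)
    return ((txt @ img) / tau).softmax(dim=-1).max()

id_concepts = torch.randn(100, 512)   # stand-ins for 100 ID class text embeddings
print(mcm_score(torch.randn(512), id_concepts))
```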
564 |
565 | ## 10. Multi-Modal Fusion
566 |
567 | - *[[AAAI 2025]](https://ojs.aaai.org/index.php/AAAI/article/view/33349) **Unveiling Multi-View Anomaly Detection: Intra-view Decoupling and Inter-view Fusion**
568 |
569 | *Mao, Kai and Lian, Yiyang and Wang, Yangyang and Liu, Meiqin and Zheng, Nanning and Wei, Ping*
570 |
571 |
572 |
573 |
574 |
575 |
576 | 📋 Abstract (Click to Expand)
577 | Anomaly detection has garnered significant attention for its extensive industrial application value. Most existing methods focus on single-view scenarios and fail to detect anomalies hidden in blind spots, leaving a gap in addressing the demands of multi-view detection in practical applications. An ensemble of multiple single-view models is a typical way to tackle the multi-view situation, but it overlooks the correlations between different views. In this paper, we propose a novel multi-view anomaly detection framework, Intra-view Decoupling and Inter-view Fusion (IDIF), to explore correlations among views. Our method contains three key components: 1) a proposed Consistency Bottleneck module extracting the common features of different views through information compression and mutual information maximization; 2) an Implicit Voxel Construction module fusing features of different views with prior knowledge represented in the form of voxels; and 3) a View-wise Dropout training strategy enabling the model to learn how to cope with missing views during testing. The proposed IDIF achieves state-of-the-art performance on three datasets. Extensive ablation studies also demonstrate the superiority of our method.
578 |
579 |
580 | - *[[AAAI 2025]](https://arxiv.org/pdf/2412.17297) **Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective**
581 |
582 | *Long, Kaifang and Xie, Guoyang and Ma, Lianbo and Liu, Jiaqi and Lu, Zhichao*
583 |
584 |
585 |
586 |
587 |
588 |
589 | 📋 Abstract (Click to Expand)
590 | Existing efforts to boost multimodal fusion of 3D anomaly detection (3D-AD) primarily concentrate on devising more effective multimodal fusion strategies. However, little attention was devoted to analyzing the role of multimodal fusion architecture (topology) design in contributing to 3D-AD. In this paper, we aim to bridge this gap and present a systematic study on the impact of multimodal fusion architecture design on 3D-AD. This work considers the multimodal fusion architecture design at the intra-module fusion level, i.e., independent modality-specific modules, involving early, middle or late multimodal features with specific fusion operations, and also at the inter-module fusion level, i.e., the strategies to fuse those modules. In both cases, we first derive insights through theoretically and experimentally exploring how architectural designs influence 3D-AD. Then, we extend the SOTA neural architecture search (NAS) paradigm and propose 3D-ADNAS to simultaneously search across multimodal fusion strategies and modality-specific modules for the first time. Extensive experiments show that 3D-ADNAS obtains consistent improvements in 3D-AD across various model capacities in terms of accuracy, frame rate, and memory usage, and it exhibits great potential in dealing with few-shot 3D-AD tasks.
591 |
592 |
593 | - *[[AAAI 2025]](https://ojs.aaai.org/index.php/AAAI/article/view/34841) **RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations**
594 |
595 | *Zhang, Bin and Chen, Jinggang and Qu, Xiaoyang and Li, Guokuan and Lu, Kai and Wan, Jiguang and Xiao, Jing and Wang, Jianzong*
596 |
597 |
598 |
599 |
600 |
601 |
602 | 📋 Abstract (Click to Expand)
603 | Enabling object detectors to recognize out-of-distribution (OOD) objects is vital for building reliable systems. A primary obstacle stems from the fact that models frequently do not receive supervisory signals from unfamiliar data, leading to overly confident predictions regarding OOD objects. Despite previous progress that estimates OOD uncertainty based on the detection model and in-distribution (ID) samples, we explore using pre-trained vision-language representations for object-level OOD detection. We first discuss the limitations of applying image-level CLIP-based OOD detection methods to object-level scenarios. Building upon these insights, we propose RUNA, a novel framework that leverages a dual encoder architecture to capture rich contextual information and employs a regional uncertainty alignment mechanism to distinguish ID from OOD objects effectively. We introduce a few-shot fine-tuning approach that aligns region-level semantic representations to further improve the model's capability to discriminate between similar objects. Our experiments show that RUNA substantially surpasses state-of-the-art methods in object-level OOD detection, particularly in challenging scenarios with diverse and complex object instances.
604 |
605 |
606 | - [[INFFUS 2025]](https://www.sciencedirect.com/science/article/pii/S1566253524004093) **Adapting the Segment Anything Model for Multi-Modal Retinal Anomaly Detection and Localization** [:octocat:](https://github.com/Jingtao-Li-CVer/MMRAD)
607 |
608 | *Li, Jingtao and Chen, Ting and Wang, Xinyu and Zhong, Yanfei and Xiao, Xuan*
609 |
610 |
611 |
612 |
613 |
614 |
615 | 📋 Abstract (Click to Expand)
616 | The fusion of optical coherence tomography (OCT) and fundus modality information can provide a comprehensive diagnosis for retinal artery occlusion (RAO) disease, where OCT provides the cross-sectional examination of the fundus image. Given multi-modal retinal images, an anomaly diagnosis model can discriminate RAO without the need for real diseased samples. Despite this, previous studies have only focused on single-modal diagnosis, because of: 1) the lack of paired modality samples; and 2) the significant imaging differences, which make the fusion difficult with small-scale medical data. In this paper, we describe how we first built a multi-modal RAO dataset including both OCT and fundus modalities, which supports both the anomaly detection and localization tasks with pixel-level annotation. Motivated by the powerful generalization ability of the recent visual foundation model known as the Segment Anything Model (SAM), we adapted it for our task considering the small-scale property of retinal samples. Specifically, a modality-shared decoder with task-specific tokens is introduced to make SAM support the multi-modal image setting, which includes a mask token for the anomaly localization task at the pixel level and a fusion token for the anomaly detection task at the case level. Since SAM has little medical knowledge and lacks the learning of the “normal” concept, it is infeasible to localize RAO anomalies in the zero-shot manner. To integrate expert retinal knowledge while keeping the general segmentation knowledge, general anomaly simulation for both modalities and a low-level prompt-tuning strategy are introduced. The experiments conducted in this study show that the adapted model can surpass the state-of-the-art model by a large margin. This study sets the first benchmark for the multi-modal anomaly detection and localization tasks in the medical community. The code is available at https://github.com/Jingtao-Li-CVer/MMRAD.
617 |
618 |
619 | ## 11. Knowledge Distillation
620 |
621 | - *[[AAAI 2025]](https://ojs.aaai.org/index.php/AAAI/article/view/32243) **Filter or Compensate: Towards Invariant Representation from Distribution Shift for Anomaly Detection** [:octocat:](https://github.com/znchen666/fico)
622 |
623 | *Chen, Zining and Luo, Xingshuang and Wang, Weiqiu and Zhao, Zhicheng and Su, Fei and Men, Aidong*
624 |
625 |
626 |
627 |
628 |
629 |
630 | 📋 Abstract (Click to Expand)
631 | Recent Anomaly Detection (AD) methods have achieved great success with In-Distribution (ID) data. However, real-world data often exhibits distribution shift, causing huge performance decay on traditional AD methods. From this perspective, little previous work has explored AD under distribution shift, and distribution-invariant normality learning has been proposed based on the Reverse Distillation (RD) framework. However, we observe the misalignment issue between the teacher and the student network that causes detection failure, and thereby propose FiCo, Filter or Compensate, to address the distribution shift issue in AD. FiCo firstly compensates the distribution-specific information to reduce the misalignment between the teacher and student network via the Distribution-Specific Compensation (DiSCo) module, and secondly filters all abnormal information to capture distribution-invariant normality with the Distribution-Invariant Filter (DiIFi) module. Extensive experiments on three different AD benchmarks demonstrate the effectiveness of FiCo, which outperforms all existing state-of-the-art (SOTA) methods, and even achieves better results on the ID scenario compared with RD-based methods.
632 |
633 |
634 | - *[[AAAI 2025]](https://arxiv.org/pdf/2412.07579) **Unlocking the Potential of Reverse Distillation for Anomaly Detection** [:octocat:](https://github.com/hito2448/urd)
635 |
636 | *Liu, Xinyue and Wang, Jianyuan and Leng, Biao and Zhang, Shuo*
637 |
638 |
639 |
640 |
641 |
642 |
643 | 📋 Abstract (Click to Expand)
644 | Knowledge Distillation (KD) is a promising approach for unsupervised Anomaly Detection (AD). However, the student network's over-generalization often diminishes the crucial representation differences between teacher and student in anomalous regions, leading to detection failures. To address this problem, the widely accepted Reverse Distillation (RD) paradigm designs an asymmetric teacher and student network, using an encoder as teacher and a decoder as student. Yet, the design of RD does not ensure that the teacher encoder effectively distinguishes between normal and abnormal features or that the student decoder generates anomaly-free features. Additionally, the absence of skip connections results in a loss of fine details during feature reconstruction. To address these issues, we propose RD with Expert, which introduces a novel Expert-Teacher-Student network for simultaneous distillation of both the teacher encoder and student decoder. The added expert network enhances the student's ability to generate normal features and optimizes the teacher's differentiation between normal and abnormal features, reducing missed detections. Additionally, Guided Information Injection is designed to filter and transfer features from teacher to student, improving detail reconstruction and minimizing false positives. Experiments on several benchmarks prove that our method outperforms existing unsupervised AD methods under the RD paradigm, fully unlocking RD’s potential.
645 |
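🧪 *Illustrative sketch (not from the paper):* in any RD-style pipeline the anomaly map comes from teacher-student feature discrepancy; the expert network above changes how the pair is trained, not this scoring step. Below is a minimal multi-scale discrepancy map in the common RD formulation, sketched under our own naming.

```python
import torch
import torch.nn.functional as F

def rd_anomaly_map(teacher_feats, student_feats, out_size=(256, 256)):
    """teacher_feats / student_feats: lists of (B, C, H, W) features per scale.
    Per-pixel score = 1 - cosine similarity, upsampled and summed over scales;
    the student only learned to match the teacher on normal images, so large
    discrepancies mark anomalous regions."""
    amap = torch.zeros(teacher_feats[0].shape[0], 1, *out_size)
    for t, s in zip(teacher_feats, student_feats):
        d = 1.0 - F.cosine_similarity(t, s, dim=1).unsqueeze(1)  # (B, 1, H, W)
        amap += F.interpolate(d, size=out_size, mode="bilinear", align_corners=False)
    return amap

t = [torch.randn(1, 256, 64, 64), torch.randn(1, 512, 32, 32)]
s = [f + 0.01 * torch.randn_like(f) for f in t]  # student ~ teacher on normal input
print(rd_anomaly_map(t, s).shape)                # torch.Size([1, 1, 256, 256])
```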
646 |
647 | ## 12. Correlation Learning
648 |
649 | - [[TMI 2024]](https://ieeexplore.ieee.org/document/10680612) **Facing Differences of Similarity: Intra- and Inter-Correlation Unsupervised Learning for Chest X-Ray Anomaly Detection**
650 |
651 | *Xu, Shicheng and Li, Wei and Li, Zuoyong and Zhao, Tiesong and Zhang, Bob*
652 |
653 |
654 |
655 |
656 |
657 |
658 | 📋 Abstract (Click to Expand)
659 | Anomaly detection can significantly aid doctors in interpreting chest X-rays. The commonly used strategy involves utilizing the pre-trained network to extract features from normal data to establish feature representations. However, when a pre-trained network is applied to more detailed X-rays, differences of similarity can limit the robustness of these feature representations. Therefore, we propose an intra- and inter-correlation learning framework for chest X-ray anomaly detection. Firstly, to better leverage the similar anatomical structure information in chest X-rays, we introduce the Anatomical-Feature Pyramid Fusion Module for feature fusion. This module aims to obtain fusion features with both local details and global contextual information. These fusion features are initialized by a trainable feature mapper and stored in a feature bank to serve as centers for learning. Furthermore, to address the Facing Differences of Similarity (FDS) problem introduced by the pre-trained network, we propose an intra- and inter-correlation learning strategy: 1) We use intra-correlation learning to establish intra-correlation between mapped features of individual images and semantic centers, thereby initially discovering lesions; 2) We employ inter-correlation learning to establish inter-correlation between mapped features of different images, further mitigating the differences of similarity introduced by the pre-trained network, and achieving effective detection results even in diverse chest disease environments. Finally, a comparison with 18 state-of-the-art methods on three datasets demonstrates the superiority and effectiveness of the proposed method across various scenarios.
660 |
661 |
662 | ## 13. Anomaly Generation
663 |
664 | - *[[ICCV 2025]](https://arxiv.org/pdf/2410.14987) **SeaS: Few-Shot Industrial Anomaly Image Generation with Separation and Sharing Fine-Tuning** [:octocat:](https://github.com/HUST-SLOW/SeaS)
665 |
666 | *Dai, Zhewei and Zeng, Shilei and Liu, Haotian and Li, Xurui and Xue, Feng and Zhou, Yu*
667 |
668 |
669 |
670 |
671 |
672 |
673 | 📋 Abstract (Click to Expand)
674 | Current segmentation methods require many training images and precise masks, while insufficient anomaly images hinder their application in industrial scenarios. To address such an issue, we explore producing diverse anomalies and accurate pixel-wise annotations. By observing the real production lines, we find that anomalies vary randomly in shape and appearance, whereas products hold globally consistent patterns with slight local variations. Such a characteristic inspires us to develop a Separation and Sharing Fine-tuning (SeaS) approach using only a few abnormal and some normal images. Firstly, we propose the Unbalanced Abnormal (UA) Text Prompt tailored to industrial anomaly generation, consisting of one product token and several anomaly tokens. Then, for anomaly images, we propose a Decoupled Anomaly Alignment (DA) loss to bind the attributes of the anomalies to different anomaly tokens. Re-blending such attributes may produce never-seen anomalies, achieving a high diversity of anomalies. For normal images, we propose a Normal-image Alignment (NA) loss to learn the products' key features that are used to synthesize products with both global consistency and local variations. The two training processes are separated but conducted on a shared U-Net. Finally, SeaS produces high-fidelity annotations for the generated anomalies by fusing discriminative features of U-Net and high-resolution VAE features. Extensive evaluations on the challenging MVTec AD and MVTec 3D AD datasets demonstrate the effectiveness of our approach. For anomaly image generation, we achieve 1.88 on IS and 0.34 on IC-LPIPS on the MVTec AD dataset, and 1.95 on IS and 0.30 on IC-LPIPS on the MVTec 3D AD dataset. For the downstream task, using our generated anomaly image-mask pairs, three common segmentation methods achieve an average 11.17% improvement in IoU on the MVTec AD dataset, and a 15.49% enhancement in IoU on the MVTec 3D AD dataset.
675 |
676 |
677 | - [[CVPR 2025]](https://arxiv.org/abs/2406.01078) **Anomaly Anything: Promptable Unseen Visual Anomaly Generation** [:octocat:](https://github.com/EPFL-IMOS/AnomalyAny)
678 |
679 | *Sun, Han and Cao, Yunkang and Dong, Hao and Fink, Olga*
680 |
681 |
682 |
683 |
684 |
685 |
686 | 📋 Abstract (Click to Expand)
687 | Visual anomaly detection (AD) presents significant challenges due to the scarcity of anomalous data samples. While numerous works have been proposed to synthesize anomalous samples, these synthetic anomalies often lack authenticity or require extensive training data, limiting their applicability in real-world scenarios. In this work, we propose Anomaly Anything (AnomalyAny), a novel framework that leverages Stable Diffusion (SD)’s image generation capabilities to generate diverse and realistic unseen anomalies. By conditioning on a single normal sample during test time, AnomalyAny is able to generate unseen anomalies for arbitrary object types with text descriptions. Within AnomalyAny, we propose attention-guided anomaly optimization to direct SD’s attention on generating hard anomaly concepts. Additionally, we introduce prompt-guided anomaly refinement, incorporating detailed descriptions to further improve the generation quality. Extensive experiments on MVTec AD and VisA datasets demonstrate AnomalyAny’s ability in generating high-quality unseen anomalies and its effectiveness in enhancing downstream AD performance. Our demo and code are available at https://hansunhayden.github.io/AnomalyAny.github.io/.
688 |
689 |
690 | - *[[CVPR 2025]](https://openaccess.thecvf.com/content/CVPR2025W/SyntaGen/html/Zhao_AnomalyHybrid_A_Domain-agnostic_Generative_Framework_for_General_Anomaly_Detection_CVPRW_2025_paper.html) **AnomalyHybrid: A Domain-Agnostic Generative Framework for General Anomaly Detection**
691 |
692 | *Zhao, Ying*
693 |
694 |
695 |
696 |
697 |
698 |
699 | 📋 Abstract (Click to Expand)
700 | Anomaly generation is an effective way to mitigate data scarcity for the anomaly detection task. Most existing works shine at industrial anomaly generation with multiple specialists or large generative models, rarely generalizing to anomalies in other applications. In this paper, we present AnomalyHybrid, a domain-agnostic framework designed to generate authentic and diverse anomalies simply by combining the reference and target images. AnomalyHybrid is a Generative Adversarial Network (GAN)-based framework having two decoders that integrate the appearance of the reference image into the depth and edge structures of the target image respectively. With the help of the depth decoder, AnomalyHybrid achieves authentic generation, especially for anomalies with changing depth values, such as protrusions and dents. Moreover, it relaxes the fine-granularity structural control of the edge decoder and brings more diversity. Without using annotations, AnomalyHybrid is easily trained with sets of color, depth and edge maps of the same images having different augmentations. Extensive experiments carried out on the HeliconiusButterfly, MVTecAD and MVTec3D datasets demonstrate that AnomalyHybrid surpasses the GAN-based state-of-the-art on anomaly generation and its downstream anomaly classification, detection and segmentation tasks. On the MVTecAD dataset, AnomalyHybrid achieves 2.06/0.32 IS/LPIPS for anomaly generation, 52.6 Acc for anomaly classification with ResNet34, and 97.3/72.9 AP for image/pixel-level anomaly detection with a simple UNet.
701 |
702 |
703 | - *[[CVPR 2025]](https://openaccess.thecvf.com/content/CVPR2025/html/Jin_Dual-Interrelated_Diffusion_Model_for_Few-Shot_Anomaly_Image_Generation_CVPR_2025_paper.html) **Dual-Interrelated Diffusion Model for Few-Shot Anomaly Image Generation** [:octocat:](https://github.com/yinyjin/DualAnoDiff)
704 |
705 | *Jin, Ying and Peng, Jinlong and He, Qingdong and Hu, Teng and Wu, Jiafu and Chen, Hao and Wang, Haoxuan and Zhu, Wenbing and Chi, Mingmin and Liu, Jun and others*
706 |
707 |
708 |
709 |
710 |
711 |
712 | 📋 Abstract (Click to Expand)
713 | The performance of anomaly inspection in industrial manufacturing is constrained by the scarcity of anomaly data. To overcome this challenge, researchers have started employing anomaly generation approaches to augment the anomaly dataset. However, existing anomaly generation methods suffer from limited diversity in the generated anomalies and struggle to achieve a seamless blending of this anomaly with the original image. Moreover, the generated mask is usually not aligned with the generated anomaly. In this paper, we overcome these challenges from a new perspective, simultaneously generating a pair of the overall image and the corresponding anomaly part. We propose DualAnoDiff, a novel diffusion-based few-shot anomaly image generation model, which can generate diverse and realistic anomaly images by using a dual-interrelated diffusion model, where one of them is employed to generate the whole image while the other one generates the anomaly part. Moreover, we extract background and shape information to mitigate the distortion and blurriness phenomenon in few-shot image generation. Extensive experiments demonstrate the superiority of our proposed model over state-of-the-art methods in terms of diversity, realism and the accuracy of mask. Overall, our approach significantly improves the performance of downstream anomaly inspection tasks, including anomaly detection, anomaly localization, and anomaly classification tasks. Code will be made available.
714 |
715 |
716 | ## 14. Representation Learning
717 |
718 | - [[MICCAI 2025]](https://arxiv.org/abs/2505.21228) **Is Hyperbolic Space All You Need for Medical Anomaly Detection?** [:octocat:](https://hyperbolic-anomalies.github.io)
719 |
720 | *Gonzalez-Jimenez, Alvaro and Lionetti, Simone and Amruthalingam, Ludovic and Gottfrois, Philippe and Gröger, Fabian and Pouly, Marc and Navarini, Alexander A*
721 |
722 |
723 |
724 |
725 |
726 |
727 | 📋 Abstract (Click to Expand)
728 | Medical anomaly detection has emerged as a promising solution to challenges in data availability and labeling constraints. Traditional methods extract features from different layers of pre-trained networks in Euclidean space; however, Euclidean representations fail to effectively capture the hierarchical relationships within these features, leading to suboptimal anomaly detection performance. We propose a novel yet simple approach that projects feature representations into hyperbolic space, aggregates them based on confidence levels, and classifies samples as healthy or anomalous. Our experiments demonstrate that hyperbolic space consistently outperforms Euclidean-based frameworks, achieving higher AUROC scores at both image and pixel levels across multiple medical benchmark datasets. Additionally, we show that hyperbolic space exhibits resilience to parameter variations and excels in few-shot scenarios, where healthy images are scarce. These findings underscore the potential of hyperbolic space as a powerful alternative for medical anomaly detection. The project website can be found at https://hyperbolic-anomalies.github.io.
729 |
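🧪 *Illustrative sketch (our simplification, not the paper's code):* the core operations are standard: map Euclidean features onto the Poincaré ball with the exponential map at the origin, then measure geodesic distance, which grows rapidly near the boundary and thus encodes hierarchy. The paper additionally aggregates multi-layer features by confidence on top of primitives like these.

```python
import torch

def expmap0(x, c=1.0):
    """Exponential map at the origin of the Poincare ball (curvature c > 0):
    projects Euclidean vectors to points strictly inside the ball."""
    norm = x.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    return torch.tanh(c ** 0.5 * norm) * x / (c ** 0.5 * norm)

def poincare_dist(u, v, c=1.0):
    """Geodesic distance on the Poincare ball; distances blow up near the
    boundary, which is what lets the space encode hierarchies."""
    sq = lambda t: (t * t).sum(dim=-1)
    num = 2 * c * sq(u - v)
    den = ((1 - c * sq(u)) * (1 - c * sq(v))).clamp_min(1e-8)
    return torch.acosh(1 + num / den) / c ** 0.5

feats = 0.1 * torch.randn(4, 128)  # Euclidean features from a pre-trained net
z = expmap0(feats)                 # now inside the unit ball
print(poincare_dist(z[0], z[1]))
```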
730 |
731 | ## 15. Matching Correction
732 |
733 | - *[[ICML 2025]](https://arxiv.org/abs/2505.01476) **CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering** [:octocat:](https://github.com/ZHE-SAPI/CostFilter-AD)
734 |
735 | *Zhang, Zhe and Cai, Mingxiu and Wang, Hanxiao and Wu, Gaochang and Chai, Tianyou and Zhu, Xiatian*
736 |
737 |
738 |
739 |
740 |
741 |
742 | 📋 Abstract (Click to Expand)
743 | Unsupervised anomaly detection (UAD) seeks to localize the anomaly mask of an input image with respect to normal samples. Either by reconstructing normal counterparts (reconstruction-based) or by learning an image feature embedding space (embedding-based), existing approaches fundamentally rely on image-level or feature-level matching to derive anomaly scores. Often, such a matching process is inaccurate yet overlooked, leading to sub-optimal detection. To address this issue, we introduce the concept of cost filtering, borrowed from classical matching tasks, such as depth and flow estimation, into the UAD problem. We call this approach *CostFilter-AD*. Specifically, we first construct a matching cost volume between the input and normal samples, comprising two spatial dimensions and one matching dimension that encodes potential matches. To refine this, we propose a cost volume filtering network, guided by the input observation as an attention query across multiple feature layers, which effectively suppresses matching noise while preserving edge structures and capturing subtle anomalies. Designed as a generic post-processing plug-in, CostFilter-AD can be integrated with either reconstruction-based or embedding-based methods. Extensive experiments on MVTec-AD and VisA benchmarks validate the generic benefits of CostFilter-AD for both single- and multi-class UAD tasks. Code and models will be released at https://github.com/ZHE-SAPI/CostFilter-AD.
744 |
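🧪 *Illustrative sketch (our own simplification):* the first step the abstract describes, building the matching cost volume, is concrete enough to sketch: for every spatial position of the query feature map, store its distance to every position of a normal reference, yielding two spatial dimensions plus one matching dimension. The filtering network that denoises this volume is omitted here.

```python
import torch
import torch.nn.functional as F

def cost_volume(query, reference):
    """query, reference: (C, H, W) feature maps of the input and a normal sample.
    Returns an (H, W, H*W) volume of matching costs (1 - cosine similarity):
    two spatial dimensions plus one matching dimension, per the abstract."""
    C, H, W = query.shape
    q = F.normalize(query.reshape(C, -1), dim=0)      # (C, H*W)
    r = F.normalize(reference.reshape(C, -1), dim=0)  # (C, H*W)
    return (1.0 - q.T @ r).reshape(H, W, H * W)

vol = cost_volume(torch.randn(256, 32, 32), torch.randn(256, 32, 32))
print(vol.shape)  # torch.Size([32, 32, 1024])
```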
745 |
746 | ## 16. Benchmarks, Surveys & Datasets
747 |
748 | - *[[CVPR 2025]](https://openaccess.thecvf.com/content/CVPR2025/html/Zhu_Real-IAD_D3_A_Real-World_2DPseudo-3D3D_Dataset_for_Industrial_Anomaly_Detection_CVPR_2025_paper.html) **Real-IAD D3: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection** [:octocat:](https://github.com/Real-IAD-D3/main)
749 |
750 | *Zhu, Wenbing and Wang, Lidong and Zhou, Ziqing and Wang, Chengjie and Pan, Yurui and Zhang, Ruoyi and Chen, Zhuhao and Cheng, Linjie and Gao, Bin-Bin and Zhang, Jiangning and others*
751 |
752 |
753 |
754 |
755 |
756 |
757 | 📋 Abstract (Click to Expand)
758 | The increasing complexity of industrial anomaly detection (IAD) has positioned multimodal detection methods as a focal area of machine vision research. However, dedicated multimodal datasets specifically tailored for IAD remain limited. Pioneering datasets like MVTec 3D have laid essential groundwork in multimodal IAD by incorporating RGB+3D data, but still face challenges in bridging the gap with real industrial environments due to limitations in scale and resolution. To address these challenges, we introduce Real-IAD D3, a high-precision multimodal dataset that uniquely incorporates an additional pseudo-3D modality generated through photometric stereo, alongside high-resolution RGB images and micrometer-level 3D point clouds. Real-IAD D3 comprises industrial components with smaller dimensions and finer defects than existing datasets, offering diverse anomalies across modalities and presenting a more challenging benchmark for multimodal IAD research. With 20 product categories, the dataset offers significantly greater scale and diversity compared to current alternatives. Additionally, we introduce an effective approach that integrates RGB, point cloud, and pseudo-3D depth information to leverage the complementary strengths of each modality, enhancing detection performance. Our experiments highlight the importance of these modalities in boosting detection robustness and overall IAD performance. The Real-IAD D3 dataset will be publicly available to advance research and innovation in multimodal IAD. The dataset and code are publicly accessible for research purposes at https://realiad4ad.github.io/Real-IAD_D3.
759 |
760 |
761 | - [[Nature Communications 2025]](https://www.nature.com/articles/s41467-025-56321-y) **Evaluating Normative Representation Learning in Generative AI for Robust Anomaly Detection in Brain Imaging** [:octocat:](https://github.com/compai-lab/2024-ncomms-bercea.git)
762 |
763 | *Bercea, Cosmin I and Wiestler, Benedikt and Rueckert, Daniel and Schnabel, Julia A*
764 |
765 |
766 |
767 |
768 |
769 |
770 | 📋 Abstract (Click to Expand)
771 | Normative representation learning focuses on understanding the typical anatomical distributions from large datasets of medical scans from healthy individuals. Generative Artificial Intelligence (AI) leverages this attribute to synthesize images that accurately reflect these normative patterns. This capability enables AI models to effectively detect and correct anomalies in new, unseen pathological data without the need for expert labeling. Traditional evaluations often focus on anomaly detection performance alone, overlooking the crucial role of normative learning. In our analysis, we introduce novel metrics specifically designed to evaluate this facet in AI models. We apply these metrics across various generative AI frameworks, including advanced diffusion models, and rigorously test them against complex and diverse brain pathologies. In addition, we conduct a large multi-reader study to compare these metrics to experts’ evaluations. Our analysis demonstrates that models proficient in normative learning exhibit exceptional versatility, adeptly detecting a wide range of unseen medical conditions. Our code is available at https://github.com/compai-lab/2024-ncomms-bercea.git.
772 |
773 |
774 | - [[MedIA 2025]](https://arxiv.org/pdf/2404.04518) **MedIAnomaly: A Comparative Study of Anomaly Detection in Medical Images** [:octocat:](https://github.com/caiyu6666/medianomaly)
775 |
776 | *Cai, Yu and Zhang, Weiwen and Chen, Hao and Cheng, Kwang-Ting*
777 |
778 |
779 |
780 |
781 |
782 |
783 | 📋 Abstract (Click to Expand)
784 | Anomaly detection (AD) aims at detecting abnormal samples that deviate from the expected normal patterns. Generally, it can be trained merely on normal data, without a requirement for abnormal samples, and thereby plays an important role in the recognition of rare diseases and health screening in the medical domain. Despite the emergence of numerous methods for medical AD, we observe a lack of a fair and comprehensive evaluation, which causes ambiguous conclusions and hinders the development of this field. To address this problem, this paper builds a benchmark with unified comparison. Seven medical datasets with five image modalities, including chest X-rays, brain MRIs, retinal fundus images, dermatoscopic images, and histopathology whole slide images, are curated for extensive evaluation. Thirty typical AD methods, including reconstruction and self-supervised learning-based methods, are involved in comparison of image-level anomaly classification and pixel-level anomaly segmentation. Furthermore, for the first time, we formally explore the effect of key components in existing methods, clearly revealing unresolved challenges and potential future directions. The datasets and code are available at https://github.com/caiyu6666/MedIAnomaly.
785 |
786 |
787 | - *[[CVPR 2024]](https://openaccess.thecvf.com/content/CVPR2024/html/Wang_Real-IAD_A_Real-World_Multi-View_Dataset_for_Benchmarking_Versatile_Industrial_Anomaly_CVPR_2024_paper.html) **Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection** [:octocat:](https://github.com/Tencent/AnomalyDetection_Real-IAD)
788 |
789 | *Wang, Chengjie and Zhu, Wenbing and Gao, Bin-Bin and Gan, Zhenye and Zhang, Jiangning and Gu, Zhihao and Qian, Shuguang and Chen, Mingang and Ma, Lizhuang*
790 |
791 |
792 |
793 |
794 |
795 |
796 | 📋 Abstract (Click to Expand)
797 | Industrial anomaly detection (IAD) has garnered significant attention and experienced rapid development. However, the recent development of IAD approaches has encountered certain difficulties due to dataset limitations. On the one hand, most of the state-of-the-art methods have achieved saturation (over 99% in AUROC) on mainstream datasets such as MVTec, and the differences between methods cannot be well distinguished, leading to a significant gap between public datasets and actual application scenarios. On the other hand, the research on various new practical anomaly detection settings is limited by the scale of the dataset, posing a risk of overfitting in evaluation results. Therefore, we propose a large-scale, real-world, and multi-view industrial anomaly detection dataset named Real-IAD, which contains 150K high-resolution images of 30 different objects, an order of magnitude larger than existing datasets. It has a larger range of defect area and ratio proportions, making it more challenging than previous datasets. To make the dataset closer to real application scenarios, we adopted a multi-view shooting method and proposed sample-level evaluation metrics. In addition, beyond the general unsupervised anomaly detection setting, we propose a new setting for Fully Unsupervised Industrial Anomaly Detection (FUIAD) based on the observation that the yield rate in industrial production is usually greater than 60%, which has more practical application value. Finally, we report the results of popular IAD methods on the Real-IAD dataset, providing a highly challenging benchmark to promote the development of the IAD field.
798 |
799 |
800 | - [[CVPR 2024]](https://openaccess.thecvf.com/content/CVPR2024W/VAND/html/Bao_BMAD_Benchmarks_for_Medical_Anomaly_Detection_CVPRW_2024_paper.html) **BMAD: Benchmarks for Medical Anomaly Detection** [:octocat:](https://github.com/dorisbao/bmad)
801 |
802 | *Bao, Jinan and Sun, Hanshi and Deng, Hanqiu and He, Yinsheng and Zhang, Zhaoxiang and Li, Xingyu*
803 |
804 |
805 |
806 |
807 |
808 |
809 | 📋 Abstract (Click to Expand)
810 | Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision, with practical applications in industrial inspection, video surveillance, and medical diagnosis. In the field of medical imaging, AD plays a crucial role in identifying anomalies that may indicate rare diseases or conditions. However, despite its importance, there is currently a lack of a universal and fair benchmark for evaluating AD methods on medical images, which hinders the development of more generalized and robust AD methods in this specific domain. To address this gap, we present a comprehensive evaluation benchmark for assessing AD methods on medical images. This benchmark consists of six reorganized datasets from five medical domains (i.e., brain MRI, liver CT, retinal OCT, chest X-ray, and digital histopathology) and three key evaluation metrics, and includes a total of fifteen state-of-the-art AD algorithms. This standardized and well-curated medical benchmark with the well-structured codebase enables researchers to easily compare and evaluate different AD methods, and ultimately leads to the development of more effective and robust AD algorithms for medical imaging. More information on BMAD is available in our GitHub repository: https://github.com/DorisBao/BMAD.
811 |
812 |
813 | - [[arXiv 2024]](https://www.researchgate.net/profile/Haoyang-He-7/publication/381190391_ADer_A_Comprehensive_Benchmark_for_Multi-class_Visual_Anomaly_Detection/links/66dffcb7f84dd1716ce10dc4/ADer-A-Comprehensive-Benchmark-for-Multi-class-Visual-Anomaly-Detection.pdf) **ADer: A Comprehensive Benchmark for Multi-class Visual Anomaly Detection** [:octocat:](https://github.com/zhangzjn/ader)
814 |
815 | *Zhang, Jiangning and He, Haoyang and Gan, Zhenye and He, Qingdong and Cai, Yuxuan and Xue, Zhucun and Wang, Yabiao and Wang, Chengjie and Xie, Lei and Liu, Yong*
816 |
817 |
818 |
819 |
820 |
821 |
822 | 📋 Abstract (Click to Expand)
823 | Visual anomaly detection aims to identify anomalous regions in images through unsupervised learning paradigms, with increasing application demand and value in fields such as industrial inspection and medical lesion detection. Despite significant progress in recent years, there is a lack of comprehensive benchmarks to adequately evaluate the performance of various mainstream methods across different datasets under the practical multi-class setting. The absence of standardized experimental setups can lead to potential biases in training epochs, resolution, and metric results, resulting in erroneous conclusions. This paper addresses this issue by proposing a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework that is highly extensible for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. Additionally, we have open-sourced the GPU-assisted ADEval package to address the slow evaluation problem of metrics like time-consuming mAU-PRO on large-scale data, significantly reducing evaluation time by more than 1000-fold. Through extensive experimental results, we objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multiclass visual anomaly detection. We hope that ADer will become a valuable resource for researchers and practitioners in the field, promoting the development of more robust and generalizable anomaly detection systems. Full codes have been attached in Appendix and open-sourced at https://github.com/zhangzjn/ader.
824 |
825 |
826 | - [[CSUR 2021]](https://dl.acm.org/doi/abs/10.1145/3464423) **Deep Learning for Medical Anomaly Detection - A Survey**
827 |
828 | *Fernando, Tharindu and Gammulle, Harshala and Denman, Simon and Sridharan, Sridha and Fookes, Clinton*
829 |
830 |
831 |
832 |
833 |
834 |
835 | 📋 Abstract (Click to Expand)
836 | Machine learning–based medical anomaly detection is an important problem that has been extensively studied. Numerous approaches have been proposed across various medical application domains and we observe several similarities across these distinct applications. Despite this comparability, we observe a lack of structured organisation of these diverse research applications such that their advantages and limitations can be studied. The principal aim of this survey is to provide a thorough theoretical analysis of popular deep learning techniques in medical anomaly detection. In particular, we contribute a coherent and systematic review of state-of-the-art techniques, comparing and contrasting their architectural differences as well as training algorithms. Furthermore, we provide a comprehensive overview of deep model interpretation strategies that can be used to interpret model decisions. In addition, we outline the key limitations of existing deep medical anomaly detection techniques and propose key research directions for further investigation.
837 |
838 |
839 | - *[[arXiv 2019]](https://arxiv.org/abs/1901.03407) **Deep Learning for Anomaly Detection: A Survey**
840 |
841 | *Chalapathy, Raghavendra and Chawla, Sanjay*
842 |
843 |
844 |
845 |
846 |
847 |
848 | 📋 Abstract (Click to Expand)
849 | Anomaly detection is an important problem that has been well-studied within diverse research areas and application domains. The aim of this survey is two-fold: firstly, we present a structured and comprehensive overview of research methods in deep learning-based anomaly detection; furthermore, we review the adoption of these methods for anomaly detection across various application domains and assess their effectiveness. We have grouped state-of-the-art research techniques into different categories based on the underlying assumptions and approach adopted. Within each category, we outline the basic anomaly detection technique, along with its variants, and present key assumptions to differentiate between normal and anomalous behavior. For each category, we also present the advantages and limitations and discuss the computational complexity of the techniques in real application domains. Finally, we outline open issues in research and challenges faced while adopting these techniques.
850 |
851 |
852 | - *[[CSUR 2009]](https://dl.acm.org/doi/abs/10.1145/1541880.1541882) **Anomaly Detection: A Survey**
853 |
854 | *Chandola, Varun and Banerjee, Arindam and Kumar, Vipin*
855 |
856 |
857 |
858 |
859 |
860 |
861 | 📋 Abstract (Click to Expand)
862 | Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
863 |
864 |
865 | ## 💞 Citation
866 |
867 | ```
868 | @misc{sun2025,
869 | author = {Sun, Yifei and Jia, Junhao and Zheng, Hao and Chen, Zhanghao and He, Yuzhi and Wang, Jinhong and Li, Jincheng and Gui, Chengzhi},
870 | title = {Paper List for Medical Anomaly Detection},
871 | year = {2025},
872 | publisher = {GitHub},
873 | journal = {GitHub repository},
874 | howpublished = {\url{https://github.com/diaoquesang/Paper-List-for-Medical-Anomaly-Detection}}
875 | }
876 | ```
877 |
878 | ## 🥰 Star History
879 | [Star History Chart](https://star-history.com/#diaoquesang/Paper-List-for-Medical-Anomaly-Detection&Date)
880 |
881 | [Back to Top](#top)
883 |
--------------------------------------------------------------------------------