├── .gitignore
├── LICENSE
├── README.md
├── SETUP.md
├── data
│   └── mvtec_ad
│       └── .gitkeep
├── docker
│   ├── README.md
│   ├── cpu
│   │   ├── Dockerfile
│   │   ├── entrypoint.sh
│   │   └── installer.sh
│   └── gpu
│       ├── Dockerfile
│       ├── entrypoint.sh
│       └── installer.sh
├── docs
│   ├── MVTecAD_scores.md
│   └── figures
│       ├── .gitkeep
│       └── roc_curve.svg
├── experiments
│   ├── README.md
│   ├── figures
│   │   ├── MVTecAD_Breakdown_of_inference_time_on_CPU.svg
│   │   ├── MVTecAD_ResNet50_with_different_pretrainings.svg
│   │   ├── MVTecAD_averaged_image-level_roc_auc_score.svg
│   │   ├── MVTecAD_averaged_pixel-level_roc_auc_score.svg
│   │   ├── MVTecAD_image-level_roc_auc_score.svg
│   │   ├── MVTecAD_image-level_roc_auc_score_backbones.svg
│   │   ├── MVTecAD_pixel-level_roc_auc_score.svg
│   │   ├── MVTecAD_pixel-level_roc_auc_score_backbones.svg
│   │   ├── anomalous_area_visualization.jpg
│   │   ├── patchcore_sketch.jpg
│   │   └── samples_mvtec_ad.jpg
│   ├── summary_comparison_pretraining.md
│   ├── summary_comparison_with_backbones.md
│   └── summary_comparison_with_the_paper.md
├── main.py
├── main_mvtecad.py
├── patchcore
│   ├── dataset.py
│   ├── extractor.py
│   ├── knnsearch.py
│   ├── patchcore.py
│   └── utils.py
├── requirements.txt
└── run.sh

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
# Ignore dataset.
data/mvtec_ad/*
!data/mvtec_ad/.gitkeep

# Ignore output data.
*.jpg
*.npy
*.faiss
log.txt

# Ignore Python cache.
__pycache__

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2022 Tetsuya Ishikawa

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
PatchCore Anomaly Detection
================================================================================

This repository provides an unofficial PyTorch implementation
of the PatchCore anomaly detection model [1] and several additional experiments.

PatchCore is an anomaly detection algorithm that has the following features:

* uses a memory bank of nominal features extracted from a pre-trained
  backbone network (like SPADE and PaDiM), where the memory bank is
  coreset-subsampled to ensure low inference cost at higher performance,
* uses an approximate nearest neighbor search for evaluating the pixel-wise
  anomaly score at inference time,
* shows state-of-the-art performance on the MVTec AD dataset (as of Jun 2021).
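As a rough code-level illustration of these ideas, here is a minimal sketch with
random features standing in for real backbone features. This is not this
repository's API (see `patchcore/patchcore.py` for the actual implementation);
the shapes, the plain random subsampling, and all variable names are
illustrative assumptions:

```python
import faiss
import numpy as np

# Stand-in for patch features of nominal (good) images: one row per patch.
# In PatchCore these come from intermediate layers of a pre-trained CNN.
features = np.random.randn(10000, 512).astype(np.float32)

# Subsample the memory bank (the paper uses greedy coreset selection;
# plain random subsampling is used here only for brevity).
ratio = 0.01
keep = np.random.choice(len(features), int(ratio * len(features)), replace=False)
memory_bank = features[keep]

# Store the memory bank in a faiss index for fast nearest neighbor search.
index = faiss.IndexFlatL2(memory_bank.shape[1])
index.add(memory_bank)

# At inference time, each test patch is scored by its distance to the
# nearest patches in the memory bank.
test_patches = np.random.randn(28 * 28, 512).astype(np.float32)
distances, _ = index.search(test_patches, 3)    # shape: (n_patches, k)
anomaly_map = distances[:, 0].reshape(28, 28)   # pixel-wise anomaly scores
print("image-level anomaly score:", anomaly_map.max())
```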

Usage
--------------------------------------------------------------------------------

### Installation

The author recommends using Docker to keep your environment clean.
For example, you can create a new Docker container and enter it
with the following command in the root directory of this repository:

```console
docker run --rm -it -v `pwd`:/workspace -w /workspace --name patchcore tiskw/patchcore:cpu-2022-03-01
```

If you need GPU support, please use the Docker image with CUDA libraries:

```console
docker run --rm -it -v `pwd`:/workspace -w /workspace --name patchcore tiskw/patchcore:gpu-2022-03-01
```

See [this document](SETUP.md) for more details.

### Dataset

You need the MVTec AD dataset [2] if you want to reproduce [our experiments](experiments/README.md).
If you don't plan to use the dataset, you can skip this subsection.
You can download the MVTec AD dataset from
[the official website](https://www.mvtec.com/company/research/datasets/mvtec-ad)
(or [direct link to the data file](https://www.mydrive.ch/shares/38536/3830184030e49fe74747669442f0f282/download/420938113-1629952094/mvtec_anomaly_detection.tar.xz))
and put it under the `data/mvtec_ad/` directory.

First, move to the `data/mvtec_ad/` directory:

```console
cd data/mvtec_ad/
```

Then, run the following command to download the MVTec AD dataset:

```console
wget "https://www.mydrive.ch/shares/38536/3830184030e49fe74747669442f0f282/download/420938113-1629952094/mvtec_anomaly_detection.tar.xz"
```

Finally, extract the downloaded data:

```console
tar -xJf mvtec_anomaly_detection.tar.xz
```

See [this document](SETUP.md) for more details.


### Train and predict on your dataset

You can train and run predictions on your dataset using `main.py`.
In the following, we assume that:

- your training images (good images) are stored under the `data_train/` directory,
- your test images (good or failure images) are stored under the `data_test/` directory,
- the training result will be stored at `./index.faiss`,
- the test results will be dumped under the `output_test/` directory.

You can train your model with the following command:

```console
python3 main.py train -i data_train -t ./index.faiss
```

Then, you can predict anomaly scores for the test images with the following command:

```console
python3 main.py predict -i data_test -o output_test
```

In the output directory `output_test/`, two types of files will be dumped:
- `.jpg`: anomaly heatmap overlaid on the input image,
- `.npy`: matrix of the anomaly heatmap with shape `(height, width)`.

### Replicate the experiments

If you want to replicate [the experiments](experiments/README.md),
run the following commands in the root directory of this repository:

```console
python3 main_mvtecad.py runall
python3 main_mvtecad.py summ
```

The `python3 main_mvtecad.py runall` command will take quite a long time,
so it is a good idea to wrap it with `nohup`.
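For example (a typical invocation; the log file name is arbitrary):

```console
nohup python3 main_mvtecad.py runall > runall.log 2>&1 &
```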

### Anomalous area visualization

If you want a visualization of the anomalous area of each sample like
the following figure, you can try the `--contour` option in the prediction
step. The steps to generate the anomalous area visualization are
detailed below.

*(figure: anomalous area visualization)*

First, we assume that your training data is located under the `data_train/`
directory and that you have completed the training of the model:

```console
# Train the PatchCore model.
python3 main.py train -i data_train -t ./index.faiss
```

Next, we need to compute the threshold for determining the anomalous area
of each sample. You can compute the threshold with the following command:

```console
# Compute the threshold.
python3 main.py thresh -i data_train
```

Finally, you can get the anomalous area visualization with the following command,
where we assume that the test data is located under the `data_test/` directory
and `THRESH` is the threshold value computed in the previous step:

```console
# Visualize the contour map using the threshold value obtained above.
python3 main.py predict --contour THRESH -i data_test -o output_test
```


Experiments
--------------------------------------------------------------------------------

### Experiment 1: Comparison with the original paper on MVTec AD dataset

The following figures summarize the comparison of the anomaly detection
scores on the MVTec AD dataset [2] with the original PatchCore paper [1].
The performance of our implementation is quite close to the paper's scores,
so our implementation is unlikely to have a serious issue.

*(figures: image-level and pixel-level ROC AUC score comparison)*

See [this document](experiments/summary_comparison_with_the_paper.md)
for more details.

### Experiment 2: Comparison of backbone networks

We compared the image/pixel-level scores on the MVTec AD dataset with
different backbone networks. Some networks show a better speed/performance
tradeoff than Wide ResNet50 x2, which is used as the default backbone network
in the original paper.

*(figures: image-level and pixel-level ROC AUC scores for different backbones)*

See [this document](experiments/summary_comparison_with_backbones.md)
for more details.

### Experiment 3: Comparison of pre-trainings

We compared several differently pre-trained ResNet50 models as the backbone of PatchCore.
We hypothesized that a well-trained neural network achieves higher performance.
We tried the normal ImageNet pre-trained ResNet50, a DeepLabV3 ResNet50 pre-trained
on COCO, and the ResNet50-SSL/SWSL models that are pre-trained on ImageNet [3]
in a semi-supervised or unsupervised manner. The results are quite interesting;
however, we can basically say that the normal ImageNet pre-trained model is
good enough for PatchCore purposes.

*(figure: ResNet50 with different pre-trainings)*

See [this document](experiments/summary_comparison_pretraining.md)
for more details.


Notes
--------------------------------------------------------------------------------

* This implementation refers to another PatchCore implementation [6] which
  is released under the Apache 2.0 license. The author has learned a lot
  from that implementation.


References
--------------------------------------------------------------------------------

[1] K. Roth, L. Pemula, J. Zepeda, B. Scholkopf, T. Brox, and P. Gehler,
    "Towards Total Recall in Industrial Anomaly Detection",
    arXiv, 2021.
    [PDF](https://arxiv.org/pdf/2106.08265.pdf)

[2] P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger,
    "MVTec AD - A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection",
    CVPR, 2019.
    [PDF](https://openaccess.thecvf.com/content_CVPR_2019/papers/Bergmann_MVTec_AD_--_A_Comprehensive_Real-World_Dataset_for_Unsupervised_Anomaly_CVPR_2019_paper.pdf)

[3] I. Yalniz, H. Jegou, K. Chen, M. Paluri, and D. Mahajan,
    "Billion-scale semi-supervised learning for image classification",
    arXiv, 2019.
    [PDF](https://arxiv.org/pdf/1905.00546.pdf)

[4] E. Fix and J. Hodges,
    "Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties",
    USAF School of Aviation Medicine, Randolph Field, Texas, 1951.

[5] B. Scholkopf, R. Williamson, A. Smola, J. Shawe-Taylor, and J. Platt,
    "Support Vector Method for Novelty Detection",
    NIPS, 1999.
    [PDF](https://proceedings.neurips.cc/paper/1999/file/8725fb777f25776ffa9076e44fcfd776-Paper.pdf)

[6] [hcw-00/PatchCore_anomaly_detection](https://github.com/hcw-00/PatchCore_anomaly_detection), GitHub.

--------------------------------------------------------------------------------
/SETUP.md:
--------------------------------------------------------------------------------
Setting up
================================================================================

Installation
--------------------------------------------------------------------------------

### Using Docker (recommended)

If you don't want to pollute your development environment, it is a good idea
to run everything inside a Docker container. Our code is executable on the
Docker images below. Please download the Docker image to your computer with
the following command first:

```console
docker pull tiskw/patchcore:cpu-2022-01-29
```

You can create your Docker container with the following command:

```console
cd ROOT_DIRECTORY_OF_THIS_REPO
docker run --rm -it -v `pwd`:/work -w /work -u `id -u`:`id -g` --name patchcore tiskw/patchcore:cpu-2022-01-29
```

If you need GPU support, use the `tiskw/patchcore:gpu-2022-01-29` image instead,
and add the `--gpus all` option to the above `docker run` command.

### Installing on your environment (easier, but pollutes your development environment)

If you don't mind polluting your environment
(or you are already inside a Docker container),
just run the following command to install the required packages:

```console
cd ROOT_DIRECTORY_OF_THIS_REPO
pip3 install -r requirements.txt
```

If you need GPU support, open `requirements.txt`
and replace `faiss-cpu` with `faiss-gpu`.
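For example, the following one-liner performs that replacement (assuming GNU sed):

```console
sed -i 's/faiss-cpu/faiss-gpu/' requirements.txt
pip3 install -r requirements.txt
```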


Dataset
--------------------------------------------------------------------------------

Download the MVTec AD dataset from
[the official website](https://www.mvtec.com/company/research/datasets/mvtec-ad)
(or [direct link to the data file](https://www.mydrive.ch/shares/38536/3830184030e49fe74747669442f0f282/download/420938113-1629952094/mvtec_anomaly_detection.tar.xz))
and put the downloaded file (`mvtec_anomaly_detection.tar.xz`) under the `data/mvtec_ad`
directory. The following is an example of downloading the dataset from your terminal:

```console
cd ROOT_DIRECTORY_OF_THIS_REPO
cd data/mvtec_ad
wget "https://www.mydrive.ch/shares/38536/3830184030e49fe74747669442f0f282/download/420938113-1629952094/mvtec_anomaly_detection.tar.xz"
```

Then, extract the downloaded data in the `data/mvtec_ad` directory:

```console
cd ROOT_DIRECTORY_OF_THIS_REPO
cd data/mvtec_ad
tar -xJf mvtec_anomaly_detection.tar.xz
```

You have successfully extracted the dataset if the directory structure of your
`data/mvtec_ad/` looks like the following:

```console
data/mvtec_ad/
|-- bottle
|   |-- ground_truth
|   |   |-- broken_large
|   |   |-- broken_small
|   |   `-- contamination
|   |-- test
|   |   |-- broken_large
|   |   |-- broken_small
|   |   |-- contamination
|   |   `-- good
|   `-- train
|       `-- good
|-- cable
|   |-- ground_truth
|   |   |-- bent_wire
|   |   |-- cable_swap
|   |   |-- combined
|   |   |-- cut_inner_insulation
|   |   |-- cut_outer_insulation
...
```

The above is the output of the `tree -d --charset unicode data/mvtec_ad`
command on the author's environment.

--------------------------------------------------------------------------------
/data/mvtec_ad/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tiskw/patchcore-ad/de22d18a1bfb444558abcc922326b6ca17460a89/data/mvtec_ad/.gitkeep

--------------------------------------------------------------------------------
/docker/README.md:
--------------------------------------------------------------------------------
Docker images
================================================================================

This directory contains files for building the Docker images used in
this repository. All images are available from
[Dockerhub](https://hub.docker.com/r/tiskw/patchcore).

Build docker images
--------------------------------------------------------------------------------

### Docker image for CPU

```console
cd ROOT_DIRECTORY_OF_THIS_REPO/docker/cpu
docker build -t `date +"tiskw/patchcore:cpu-%Y-%m-%d"` .
```

### Docker image for GPU

```console
cd ROOT_DIRECTORY_OF_THIS_REPO/docker/gpu
docker build -t `date +"tiskw/patchcore:gpu-%Y-%m-%d"` .
```

--------------------------------------------------------------------------------
/docker/cpu/Dockerfile:
--------------------------------------------------------------------------------
FROM ubuntu:focal

MAINTAINER Tetsuya Ishikawa

# Set environment variables.
ENV DEBIAN_FRONTEND=noninteractive
ENV USERNAME=developer

# Copy and run the installer.
COPY installer.sh /installer.sh
RUN sh installer.sh

# Copy a shell script for dynamic user creation.
COPY entrypoint.sh /entrypoint.sh

# Unlock permissions for the above "entrypoint.sh".
RUN chmod u+s /usr/sbin/useradd /usr/sbin/groupadd

# Set locales.
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

ENTRYPOINT ["sh", "/entrypoint.sh"]
CMD ["/bin/bash"]

--------------------------------------------------------------------------------
/docker/cpu/entrypoint.sh:
--------------------------------------------------------------------------------
#!/bin/bash -e

USERNAME="developer"

USRID=$(id -u)
GRPID=$(id -g)

# Create group.
if [ x"$GRPID" != x"0" ]; then
    groupadd -g ${GRPID} ${USERNAME}
fi

# Create user.
if [ x"$USRID" != x"0" ]; then
    useradd -d /home/${USERNAME} -m -s /bin/bash -u ${USRID} -g ${GRPID} ${USERNAME}
fi

# Restore permissions.
sudo chmod u-s /usr/sbin/useradd /usr/sbin/groupadd

export HOME="/home/${USERNAME}"

# Quote "$@" so that arguments containing spaces are passed through intact.
exec "$@"

--------------------------------------------------------------------------------
/docker/cpu/installer.sh:
--------------------------------------------------------------------------------
#!/bin/sh

# Update and upgrade installed packages.
apt-get update
apt-get upgrade -y
apt-get install -y apt-utils

# Install necessary packages.
apt-get install -y sudo locales

# Install Python3.
apt-get install -y python3 python3-dev python3-distutils python3-pip

# Install OpenCV.
apt-get install -y libopencv-dev

# Install Python packages.
pip3 install opencv-python==4.5.5.62 scikit-learn==1.0.2 rich==11.1.0 faiss-cpu==1.7.2 \
             torch==1.10.2+cu113 torchvision==0.11.3+cu113 thop==0.0.31-2005241907 \
             -f https://download.pytorch.org/whl/cu113/torch_stable.html

# Set locale to UTF8.
locale-gen en_US.UTF-8

# Clear package cache.
apt-get clean
rm -rf /var/lib/apt/lists/*

# Enable sudo without password.
mkdir -p /etc/sudoers.d
echo "${USERNAME} ALL=NOPASSWD: ALL" >> /etc/sudoers.d/${USERNAME}

# Unlock permissions.
chmod u+s /usr/sbin/useradd && chmod u+s /usr/sbin/groupadd

--------------------------------------------------------------------------------
/docker/gpu/Dockerfile:
--------------------------------------------------------------------------------
FROM nvidia/cuda:11.3.1-devel-ubuntu20.04

MAINTAINER Tetsuya Ishikawa

# Set environment variables.
ENV DEBIAN_FRONTEND=noninteractive
ENV USERNAME=developer

# Copy and run the installer.
COPY installer.sh /installer.sh
RUN sh installer.sh

# Copy a shell script for dynamic user creation.
COPY entrypoint.sh /entrypoint.sh

# Unlock permissions for the above "entrypoint.sh".
RUN chmod u+s /usr/sbin/useradd /usr/sbin/groupadd

# Set locales.
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

ENTRYPOINT ["sh", "/entrypoint.sh"]
CMD ["/bin/bash"]

--------------------------------------------------------------------------------
/docker/gpu/entrypoint.sh:
--------------------------------------------------------------------------------
#!/bin/bash -e

USERNAME="developer"

USRID=$(id -u)
GRPID=$(id -g)

# Create group.
if [ x"$GRPID" != x"0" ]; then
    groupadd -g ${GRPID} ${USERNAME}
fi

# Create user.
if [ x"$USRID" != x"0" ]; then
    useradd -d /home/${USERNAME} -m -s /bin/bash -u ${USRID} -g ${GRPID} ${USERNAME}
fi

# Restore permissions.
sudo chmod u-s /usr/sbin/useradd /usr/sbin/groupadd

export HOME="/home/${USERNAME}"

# Quote "$@" so that arguments containing spaces are passed through intact.
exec "$@"

--------------------------------------------------------------------------------
/docker/gpu/installer.sh:
--------------------------------------------------------------------------------
#!/bin/sh

# Update and upgrade installed packages.
apt-get update
apt-get upgrade -y
apt-get install -y apt-utils

# Install necessary packages.
apt-get install -y sudo locales

# Install Python3.
apt-get install -y python3 python3-dev python3-distutils python3-pip

# Install OpenCV.
apt-get install -y libopencv-dev

# Install Python packages.
pip3 install opencv-python==4.5.5.62 scikit-learn==1.0.2 rich==11.1.0 faiss-gpu==1.7.2 \
             torch==1.10.2+cu113 torchvision==0.11.3+cu113 thop==0.0.31-2005241907 \
             -f https://download.pytorch.org/whl/cu113/torch_stable.html

# Set locale to UTF8.
locale-gen en_US.UTF-8

# Clear package cache.
apt-get clean
rm -rf /var/lib/apt/lists/*

# Enable sudo without password.
mkdir -p /etc/sudoers.d
echo "${USERNAME} ALL=NOPASSWD: ALL" >> /etc/sudoers.d/${USERNAME}

# Unlock permissions.
chmod u+s /usr/sbin/useradd && chmod u+s /usr/sbin/groupadd

--------------------------------------------------------------------------------
/docs/MVTecAD_scores.md:
--------------------------------------------------------------------------------
Scores used in the MVTec AD dataset
================================================================================

Image-level ROC AUC score
--------------------------------------------------------------------------------

The image-level ROC AUC score is a metric of image-wise anomaly detection
performance. This score is not mentioned in the MVTec AD dataset paper;
however, from an application point of view, it is quite important to measure
the image-wise performance of anomaly detection algorithms. Therefore,
this score is commonly used in image anomaly detection papers,
such as SPADE and PaDiM.

The definition of the image-level ROC AUC score is quite simple.
Let's assume that the target anomaly detection algorithm can compute an anomaly
value for each image. The image-level ROC AUC score is defined as the
AUC (Area Under the Curve) of the ROC (Receiver Operating Characteristic)
curve of the anomaly values computed per image by the target algorithm.

### What's a ROC curve?

Assume that the target model outputs an anomaly score (real value) for each
sample that has a ground truth label (for example, 0: good, 1: anomaly).
The ROC curve is a plot of TPR (True Positive Rate) versus
FPR (False Positive Rate) over all thresholds for the anomaly score.

*(figure: ROC curve)*
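As a worked example, both this score and the pixel-level variant below can be
computed with scikit-learn (already listed in `requirements.txt`); the labels
and scores here are made-up toy values:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Toy example: ground truth labels (0: good, 1: anomaly) and the
# image-level anomaly scores predicted by the model.
labels = [0, 0, 0, 1, 1, 1, 0, 1]
scores = [0.10, 0.35, 0.20, 0.80, 0.55, 0.90, 0.40, 0.30]

# Image-level ROC AUC score.
print("image-level ROC AUC:", roc_auc_score(labels, scores))

# The underlying ROC curve: TPR versus FPR over all score thresholds.
fpr, tpr, thresholds = roc_curve(labels, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")

# The pixel-level score is computed in exactly the same way, with one row
# per pixel instead of one row per image (flatten the ground truth masks
# and the anomaly maps before calling roc_auc_score).
```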

### What's AUC?

AUC is the normalized area under the ROC curve.
The minimum and maximum AUC scores are 0 and 1, respectively.

### Pros and Cons

* Quite intuitive for many applications.
* Insensitive to the anomaly location.


Pixel-level ROC AUC score
--------------------------------------------------------------------------------

The score computation is almost the same as for the image-level ROC AUC score,
but the base table is not an image-level table but a pixel-level table.

| Sample                      | Anomaly score |
|:---------------------------:|:-------------:|
| pixel (0, 0) of the image 0 | a(0, 0, 0)    |
| pixel (1, 0) of the image 0 | a(1, 0, 0)    |
| ...                         | ...           |

### Pros and Cons

* Sensitive to the anomaly location.
* Not intuitive for some applications.

--------------------------------------------------------------------------------
/docs/figures/.gitkeep:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tiskw/patchcore-ad/de22d18a1bfb444558abcc922326b6ca17460a89/docs/figures/.gitkeep

--------------------------------------------------------------------------------
/experiments/README.md:
--------------------------------------------------------------------------------
Experiments
================================================================================

This directory contains experiment results and their summaries.

Experiment 1: Comparison with the original paper
--------------------------------------------------------------------------------

- **Purpose**: Our purpose is to check that our implementation correctly
  reproduces the PatchCore algorithm.

- **What we've done**: Evaluate our implementation on the MVTec AD dataset
  and compare it with the results reported in the original paper.

- **Conclusion**: The scores of our implementation are quite close to
  the paper's scores. Therefore, our implementation is unlikely to have
  a serious issue.

See [this document](summary_comparison_with_the_paper.md) for details.

Experiment 2: Comparison of backbone networks
--------------------------------------------------------------------------------

- **Purpose**: It's quite easy to swap the backbone network in the PatchCore
  algorithm (default: Wide ResNet50 x2). It's meaningful to find a good
  backbone network that shows a good performance-speed tradeoff from
  an application viewpoint.

- **What we've done**: Try several backbone networks and evaluate their
  average image/pixel-level scores on the MVTec AD dataset.

- **Conclusion**: The smaller ResNets (ResNet18, ResNet34) show good enough
  scores despite their small computational cost. On the other hand, the very
  deep ResNets (ResNet101, ResNet152) show lower performance than ResNet50.
  Our current tentative hypothesis is that the features used in the PatchCore
  algorithm are too deep (too far from the input) and don't carry enough
  high-resolution (raw) features. In other words, we should add shallower
  features in the case of very deep neural networks
  like ResNet101/ResNet152 to exceed ResNet50's score.

*(figures: image-level and pixel-level ROC AUC scores for different backbones)*

See [this document](summary_comparison_with_backbones.md) for details.


Experiment 3: Comparison of pre-trainings
--------------------------------------------------------------------------------

- **Purpose**: We hypothesize that a well-trained neural network achieves
  higher performance on PatchCore anomaly detection.

- **What we've done**: We used several networks that are pre-trained on
  a different dataset or with a different method as the backbone of PatchCore,
  and evaluated their performance on the MVTec AD dataset.

- **Conclusion**: We found quite interesting observations; however,
  we cannot draw a conclusion at this moment. We would say that the normal
  ImageNet pre-trained model seems to be good enough for PatchCore purposes.

*(figure: ResNet50 with different pre-trainings)*

--------------------------------------------------------------------------------
/experiments/figures/MVTecAD_averaged_pixel-level_roc_auc_score.svg:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/experiments/figures/anomalous_area_visualization.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tiskw/patchcore-ad/de22d18a1bfb444558abcc922326b6ca17460a89/experiments/figures/anomalous_area_visualization.jpg

--------------------------------------------------------------------------------
/experiments/figures/patchcore_sketch.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tiskw/patchcore-ad/de22d18a1bfb444558abcc922326b6ca17460a89/experiments/figures/patchcore_sketch.jpg

--------------------------------------------------------------------------------
/experiments/figures/samples_mvtec_ad.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tiskw/patchcore-ad/de22d18a1bfb444558abcc922326b6ca17460a89/experiments/figures/samples_mvtec_ad.jpg

--------------------------------------------------------------------------------
/experiments/summary_comparison_pretraining.md:
--------------------------------------------------------------------------------
Comparison of pre-trainings
================================================================================

This directory contains experiment results of ResNet50 models that are
pre-trained on different datasets or with different methods.


Purpose
--------------------------------------------------------------------------------

We hypothesize that a well-trained neural network achieves higher performance.


What we've done
--------------------------------------------------------------------------------

We tried ResNet50 with ImageNet pre-training, DeepLabV3 pre-trained on COCO,
and the SSL/SWSL models pre-trained on ImageNet [3].
We used these networks as the backbone of PatchCore and evaluated their
performance on the MVTec AD dataset.


Conclusion
--------------------------------------------------------------------------------

We found quite interesting observations:

- The simple ImageNet pre-trained ResNet50 (ResNet50 in the following figure)
  shows lower performance in the image-level ROC AUC score; on the other hand,
  it shows higher performance in the pixel-level ROC AUC score.
- One possible explanation is that the simple ImageNet pre-trained model
  keeps more high-resolution (= raw) information in its features than the other
  models, which were trained on a more complex dataset or with a more complex
  method. High-resolution information may bring a higher pixel-level score
  (and a lower image-level score), so it can explain this phenomenon.

We cannot draw a conclusion at this moment and need more experiments; however,
I would say that the normal ImageNet pre-trained model seems to be good
enough for PatchCore purposes.

*(figure: ResNet50 with different pre-trainings)*
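To rerun this kind of experiment with a different pre-training, the
`--repo`/`--model` options of `main_mvtecad.py` can point at another torch.hub
entry; for example, the command below assumes the SSL model's published hub
repository and entry names (`facebookresearch/semi-supervised-ImageNet1K-models`,
`resnet50_ssl`), and the output directory is just a suggestion:

```console
python3 main_mvtecad.py train --category hazelnut \
    --repo facebookresearch/semi-supervised-ImageNet1K-models --model resnet50_ssl \
    --outdir experiments/data_resnet50_ssl/hazelnut
```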

--------------------------------------------------------------------------------
/experiments/summary_comparison_with_backbones.md:
--------------------------------------------------------------------------------
Comparison of backbone networks
================================================================================

This directory contains experiment results of different backbones
on the MVTec AD dataset.


Purpose
--------------------------------------------------------------------------------

It's quite easy to swap the backbone network in the PatchCore algorithm
(default: Wide ResNet50 x2). It's meaningful to find a good backbone network
which shows a good performance-speed tradeoff from an application viewpoint.


What we've done
--------------------------------------------------------------------------------

Try several backbone networks and evaluate their average
image/pixel-level scores on the MVTec AD dataset.

### Test Environment

- CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (56 cores)
- RAM: 128 GB


Conclusion
--------------------------------------------------------------------------------

The smaller ResNets (ResNet18, ResNet34) show good enough
scores despite their small computational cost. On the other hand, the very deep
ResNets (ResNet101, ResNet152) show lower performance than ResNet50.
Our current tentative hypothesis is that the features used in the PatchCore
algorithm are too deep (too far from the input) and don't carry enough
high-resolution (raw) input information. In other words, we should
add shallower features in the case of very deep neural networks
like ResNet101 and ResNet152 to exceed ResNet50's score.

*(figures: image-level and pixel-level ROC AUC scores for different backbones)*
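For reference, a run with a different backbone only needs the `--model` option
of `main_mvtecad.py` (the output directory below is just a suggestion):

```console
python3 main_mvtecad.py train --category hazelnut --model resnet18 \
    --outdir experiments/data_resnet18/hazelnut
```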

--------------------------------------------------------------------------------
/experiments/summary_comparison_with_the_paper.md:
--------------------------------------------------------------------------------
Comparison with the original paper
================================================================================

We reproduced the same experiments as the original paper,
namely the image-level and pixel-level ROC AUC scores on the MVTec AD dataset
(see tables S1 and S2 on page 15 of the paper),
to check that our code correctly implements the PatchCore algorithm.

Conclusion
--------------------------------------------------------------------------------

The average image-level and pixel-level ROC AUC scores of our code are quite
close to the paper's scores at a sampling ratio of 1%, so the author
thinks our code is unlikely to have a serious issue.

|           | Sampling ratio | Average image-level ROC AUC score | Average pixel-level ROC AUC score |
|:---------:|:--------------:|:---------------------------------:|:---------------------------------:|
| The paper | 1.0 %          | 99.0 %                            | 98.0 %                            |
| This repo | 1.0 %          | 98.8 %                            | 97.9 %                            |

Results
--------------------------------------------------------------------------------

*(figures: image-level and pixel-level ROC AUC scores per category)*

--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
"""
PatchCore anomaly detection: main script for user custom dataset
"""

# Import standard libraries.
import argparse

# Import third-party packages.
import numpy as np
import rich
import rich.progress
import torch

# Import custom modules.
from patchcore.dataset import MVTecADImageOnly
from patchcore.patchcore import PatchCore
from patchcore.utils import auto_threshold


def parse_args():
    """
    Parse command line arguments.
    """
    fmtcls = lambda prog: argparse.HelpFormatter(prog, max_help_position=50)
    parser = argparse.ArgumentParser(description=__doc__, add_help=False, formatter_class=fmtcls)

    # Required arguments.
    parser.add_argument("mode", choices=["train", "predict", "thresh"], help="running mode")

    # Optional arguments for dataset configuration.
    group1 = parser.add_argument_group("dataset options")
    group1.add_argument("-i", "--input", metavar="PATH", default="data/mvtec_ad/bottle/train",
                        help="input file/directory path")
    group1.add_argument("-o", "--output", metavar="PATH", default="output",
                        help="output file/directory path")
    group1.add_argument("-t", "--trained", metavar="PATH", default="index.faiss",
                        help="training results")
    group1.add_argument("-b", "--batch_size", metavar="INT", type=int, default=16,
                        help="training batch size")
    group1.add_argument("-l", "--load_size", metavar="INT", type=int, default=224,
                        help="size of loaded images")
    group1.add_argument("-n", "--input_size", metavar="INT", type=int, default=224,
                        help="size of images passed to NN model")

    # Optional arguments for neural network configuration.
    group2 = parser.add_argument_group("network options")
    group2.add_argument("-m", "--model", metavar="STR", default="wide_resnet50_2",
                        help="name of a neural network model")
    group2.add_argument("-r", "--repo", metavar="STR", default="pytorch/vision:v0.11.3",
                        help="repository of the neural network model")

    # Optional arguments for anomaly detection algorithm.
    group3 = parser.add_argument_group("algorithm options")
    group3.add_argument("-k", "--n_neighbors", metavar="INT", type=int, default=3,
                        help="number of neighbors to be searched")
    group3.add_argument("-s", "--sampling_ratio", metavar="FLT", type=float, default=0.01,
                        help="ratio of coreset sub-sampling")

    # Optional arguments for thresholding.
    group4 = parser.add_argument_group("thresholding options")
    group4.add_argument("-e", "--coef_sigma", metavar="FLT", type=float, default=5.0,
                        help="coefficient of sigma when computing threshold (= mean + coef * sigma)")

    # Optional arguments for visualization.
    group5 = parser.add_argument_group("visualization options")
    group5.add_argument("-c", "--contour", metavar="FLT", type=float, default=None,
                        help="visualize contour map instead of heatmap using the given threshold")

    # Other optional arguments.
    group6 = parser.add_argument_group("other options")
    group6.add_argument("-d", "--device", metavar="STR", default="auto",
                        help="device name (e.g. 'cuda')")
    group6.add_argument("-w", "--num_workers", metavar="INT", type=int, default=1,
                        help="number of available CPUs")
    group6.add_argument("-h", "--help", action="help",
                        help="show this help message and exit")

    return parser.parse_args()


def main(args):
    """
    Main function for running the training/test procedure.

    Args:
        args (argparse.Namespace): Parsed command line arguments.
    """
    rich.print(r"[yellow][Command line arguments][/yellow]")
    rich.print(vars(args))

    if args.device == "auto":
        args.device = "cuda" if torch.cuda.is_available() else "cpu"

    # Create PatchCore model instance.
    model = PatchCore(args.model, args.repo, args.device, args.sampling_ratio)

    # Arguments required for dataset creation.
    # These arguments are mainly used for the transformations applied to
    # the input images and ground truth images. Details of the transformations
    # are written in the MVTecAD dataset class (see patchcore/dataset.py).
    dataset_args = {
        "load_shape" : (args.load_size, args.load_size),
        "input_shape": (args.input_size, args.input_size),
        "im_mean"    : (0.485, 0.456, 0.406),
        "im_std"     : (0.229, 0.224, 0.225),
        # The above mean and standard deviation are the values of the ImageNet dataset.
        # These values are required because the NN models pre-trained with ImageNet
        # assume that the input image is normalized in terms of the ImageNet dataset.
    }

    if args.mode == "train":

        # Prepare dataset.
        dataset = MVTecADImageOnly(args.input, **dataset_args)

        # Train model.
        model.fit(dataset, args.batch_size, args.num_workers)

        # Save training result.
        model.save(args.trained)

    elif args.mode == "predict":

        # Load trained model.
        model.load(args.trained)

        # Prepare dataset.
        dataset = MVTecADImageOnly(args.input, **dataset_args)
        dloader = torch.utils.data.DataLoader(dataset, batch_size=1, num_workers=args.num_workers, pin_memory=True)

        for x, gt, label, filepath, x_type in rich.progress.track(dloader, description="Processing..."):

            # Run prediction and get anomaly heatmap.
            anomaly_map_rw = model.predict(x, args.n_neighbors)

            # Save anomaly heatmap (JPG image and NPY file).
            model.save_anomaly_map(args.output, anomaly_map_rw, filepath[0], x_type[0], contour=args.contour)

    elif args.mode == "thresh":

        # Load trained model.
        model.load(args.trained)

        # Prepare dataset.
        dataset = MVTecADImageOnly(args.input, **dataset_args)
        dloader = torch.utils.data.DataLoader(dataset, batch_size=1, num_workers=args.num_workers, pin_memory=True)

        # Initialize the anomaly scores.
        scores = list()

        # Compute max value of the anomaly heatmaps.
        for x, gt, label, filepath, x_type in rich.progress.track(dloader, description="Processing..."):

            # Run prediction and get anomaly heatmap.
            anomaly_map_rw = model.predict(x, args.n_neighbors)

            # Append the anomaly score.
            scores.append(np.max(anomaly_map_rw))

        # Compute threshold.
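        # `auto_threshold` (see patchcore/utils.py) presumably implements the rule
        # described in the --coef_sigma help text: threshold = mean + coef_sigma * std
        # over the max anomaly scores of the nominal training images.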
        thresh, score_mean, score_std = auto_threshold(scores, args.coef_sigma)

        print("Anomaly threshold = %f" % thresh)
        print(" - score_mean = %f" % score_mean)
        print(" - score_std  = %f" % score_std)


if __name__ == "__main__":
    main(parse_args())

--------------------------------------------------------------------------------
/main_mvtecad.py:
--------------------------------------------------------------------------------
"""
PatchCore anomaly detection: main script for MVTec AD dataset
"""

# Import standard libraries.
import argparse
import os
import pathlib
import subprocess

# Import third-party packages.
import rich
import rich.progress
import torch

# Import custom modules.
from patchcore.dataset import MVTecAD
from patchcore.patchcore import PatchCore


def parse_args():
    """
    Parse command line arguments.
    """
    fmtcls = lambda prog: argparse.HelpFormatter(prog, max_help_position=50)
    parser = argparse.ArgumentParser(description=__doc__, add_help=False, formatter_class=fmtcls)

    # Required arguments.
    parser.add_argument("mode", choices=["train", "test", "runall", "summ"],
                        help="running mode")

    # Common optional arguments.
    group0 = parser.add_argument_group("common options")
    group0.add_argument("--datadir", default="data/mvtec_ad",
                        help="path to MVTec AD dataset")

    # Optional arguments for training and test configuration.
    group1 = parser.add_argument_group("train/test options")
    group1.add_argument("--category", default="hazelnut",
                        help="data category (e.g. 'hazelnut')")
    group1.add_argument("--device", metavar="STR", default="auto",
                        help="device name (e.g. 'cuda')")
    group1.add_argument("--model", metavar="STR", default="wide_resnet50_2",
                        help="name of a neural network model")
    group1.add_argument("--repo", metavar="STR", default="pytorch/vision:v0.11.3",
                        help="repository of the neural network model")
    group1.add_argument("--n_neighbors", metavar="INT", type=int, default=3,
                        help="number of neighbors to be searched")
    group1.add_argument("--sampling_ratio", metavar="FLT", type=float, default=0.01,
                        help="ratio of coreset sub-sampling")
    group1.add_argument("--outdir", metavar="PATH", default="output",
                        help="output file/directory path")
    group1.add_argument("--batch_size", metavar="INT", type=int, default=16,
                        help="training batch size")
    group1.add_argument("--load_size", metavar="INT", type=int, default=256,
                        help="size of loaded images")
    group1.add_argument("--input_size", metavar="INT", type=int, default=224,
                        help="size of images passed to NN model")
    group1.add_argument("--num_workers", metavar="INT", type=int, default=1,
                        help="number of available CPUs")

    # Optional arguments for running experiments configuration.
    group2 = parser.add_argument_group("runall options")
    group2.add_argument("--dryrun", action="store_true",
                        help="only dump the commands and do nothing")
    group2.add_argument("--test_only", action="store_true",
                        help="run only the test procedure")
    group2.add_argument("--no_redirect", action="store_true",
                        help="do not redirect dump messages")

    # Other optional arguments.
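    # The parser is created with add_help=False (so that the options can be
    # grouped as above), hence -h/--help must be registered manually here.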
    group3 = parser.add_argument_group("other options")
    group3.add_argument("-h", "--help", action="help",
                        help="show this help message and exit")

    return parser.parse_args()


def main_traintest(args):
    """
    Main function for running the training/test procedure.

    Args:
        args (argparse.Namespace): Parsed command line arguments.
    """
    rich.print("[yellow]Mode[/yellow]: [green]" + args.mode + "[/green]")
    rich.print(r"[yellow][Command line arguments][/yellow]")
    rich.print(vars(args))

    if args.device == "auto":
        args.device = "cuda" if torch.cuda.is_available() else "cpu"

    # Create a path to the input dataset.
    dirpath_dataset = os.path.join(args.datadir, args.category)

    # Create PatchCore model instance.
    model = PatchCore(args.model, args.repo, args.device, args.sampling_ratio)

    # Arguments required for dataset creation.
    # These arguments are mainly used for the transformations applied to
    # the input images and ground truth images. Details of the transformations
    # are written in the MVTecAD dataset class (see patchcore/dataset.py).
    dataset_args = {
        "load_shape" : (args.load_size, args.load_size),
        "input_shape": (args.input_size, args.input_size),
        "im_mean"    : (0.485, 0.456, 0.406),
        "im_std"     : (0.229, 0.224, 0.225),
        # The above mean and standard deviation are the values of the ImageNet dataset.
        # These values are required because the NN models pre-trained with ImageNet
        # assume that the input image is normalized in terms of the ImageNet dataset.
    }

    # In training mode, run both training and test.
    if args.mode == "train":
        dataset = MVTecAD(dirpath_dataset, "train", **dataset_args)
        model.fit(dataset, args.batch_size, args.num_workers)
        model.save(os.path.join(args.outdir, "index.faiss"))

        dataset = MVTecAD(dirpath_dataset, "test", **dataset_args)
        model.score(dataset, args.n_neighbors, args.outdir, args.num_workers)

    # In test mode, run test only.
    elif args.mode == "test":
        dataset = MVTecAD(dirpath_dataset, "test", **dataset_args)
        model.load(os.path.join(args.outdir, "index.faiss"))
        model.score(dataset, args.n_neighbors, args.outdir, args.num_workers)


def main_runall(args):
    """
    Run all experiments.

    Args:
        args (argparse.Namespace): Parsed command line arguments.
    """
    rich.print("[yellow]Mode[/yellow]: [green]" + args.mode + "[/green]")
    rich.print("[yellow]Command line arguments:[/yellow]")
    rich.print(vars(args))

    dirpaths = [dirpath for dirpath in pathlib.Path(args.datadir).glob("*") if dirpath.is_dir()]

    for dirpath in sorted(dirpaths):

        program  = "python3 main_mvtecad.py " + ("test" if args.test_only else "train")
        category = dirpath.name
        model    = args.model
        repo     = args.repo
        outdir   = f"experiments/data_{model}/{category}"
        outfile  = outdir + "/log.txt"
        redirect = "" if args.no_redirect else f" > {outfile}"
        command  = f"{program} --category {category} --repo {repo} --model {model} --outdir {outdir} {redirect}"

        rich.print("[yellow]Running[/yellow]: " + command)

        # Run command.
        if not args.dryrun:
            os.makedirs(outdir, exist_ok=True)
            subprocess.run(command, shell=True)


def main_summarize(args):
    """
    Summarize experiment results.
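    This reads the `log.txt` files dumped under `experiments/data_*/<category>/`
    by the runall mode and prints one CSV table per score type to STDOUT.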

    Args:
        args (argparse.Namespace): Parsed command line arguments.
    """
    def glob_dir(path, pattern):
        """
        Yields only directories.
        """
        for target in path.glob(pattern):
            if target.is_dir():
                yield target

    def get_value(line):
        """
        Get a score value from a line string.
        """
        return float(line.strip().split(":")[-1].split()[0])

    def get_scores(filepath):
        """
        Get all scores from the given file and return them as a dict.
        """
        scores = dict()
        for line in filepath.open():
            if   line.startswith("Total pixel-level") : scores["pixel-level"] = get_value(line)
            elif line.startswith("Total image-level") : scores["image-level"] = get_value(line)
            elif line.startswith("Feature extraction"): scores["time-featex"] = get_value(line)
            elif line.startswith("Anomaly map search"): scores["time-anmaps"] = get_value(line)
            elif line.startswith("Total infer time")  : scores["time-itotal"] = get_value(line)
        return scores

    def get_results(root):
        """
        Create a dictionary which contains experiment results
        where the key order is `results[network][category][score]`.
        """
        results = dict()
        for dirpath in glob_dir(pathlib.Path(root), "data_*"):
            results[dirpath.name] = dict()
            for dirpath_cat in sorted(glob_dir(dirpath, "*")):
                results[dirpath.name][dirpath_cat.name] = get_scores(dirpath_cat / "log.txt")
        return results

    def print_results(results):
        """
        Print summary tables to STDOUT.
        """
        networks   = list(results.keys())
        categories = list(results[networks[0]].keys())
        scores     = list(results[networks[0]][categories[0]].keys())

        # Print a table (scores for each network).
        for score in scores:

            header = [score] + categories
            print(",".join(header))

            # Print a row (scores) for each network.
            for network in networks:
                row = [network] + [results[network][c][score] for c in categories]
                print(",".join(map(str, row)))

    # Get the results and print them.
    print_results(get_results("experiments"))


def main(args):
    """
    Entry point of this script.

    Args:
        args (argparse.Namespace): Parsed command line arguments.
    """
    if   args.mode in ["train", "test"]: main_traintest(args)
    elif args.mode in ["runall"]       : main_runall(args)
    elif args.mode in ["summ"]         : main_summarize(args)
    else                               : raise ValueError("unknown mode: " + args.mode)


if __name__ == "__main__":
    main(parse_args())

--------------------------------------------------------------------------------
/patchcore/dataset.py:
--------------------------------------------------------------------------------
"""
This module provides a PyTorch implementation of the MVTec AD dataset.
"""

# Import standard libraries.
import pathlib

# Import third-party packages.
import numpy as np
import PIL.Image
import torch
import torchvision


class MVTecAD(torch.utils.data.Dataset):
    """
    PyTorch implementation of the MVTec AD dataset [1].
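    Each item is a tuple of (image, ground_truth_mask, anomaly_flag,
    image_path, anomaly_name); see `__getitem__` below.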

    Reference:
    [1]
    """
    def __init__(self, root="data", split="train", transform_im=None, transform_gt=None,
                 load_shape=(256, 256), input_shape=(224, 224),
                 im_mean=(0.485, 0.456, 0.406), im_std=(0.229, 0.224, 0.225)):
        """
        Constructor of the MVTec AD dataset.

        Args:
            root         (str)  : Dataset directory.
            split        (str)  : The dataset split, supports `train` or `test`.
            transform_im (func) : Transform for the input image.
            transform_gt (func) : Transform for the ground truth image.
            load_shape   (tuple): Shape of the loaded image.
            input_shape  (tuple): Shape of the input image.
            im_mean      (tuple): Mean of image (3 channels) for image normalization.
            im_std       (tuple): Standard deviation of image (3 channels) for image normalization.

        Notes:
            The arguments `load_shape`, `input_shape`, `im_mean`, and `im_std` are used
            only if `transform_im` or `transform_gt` is None.
        """
        self.root = pathlib.Path(root)

        # Set directory paths of input and ground truth images.
        if split == "train":
            self.dir_im = self.root / "train"
            self.dir_gt = None
        elif split == "test":
            self.dir_im = self.root / "test"
            self.dir_gt = self.root / "ground_truth"

        # The value of `split` should be either "train" or "test".
        else: raise ValueError("Error: argument `split` should be 'train' or 'test'.")

        # Use the default transform if no transform is specified.
        args = (load_shape, input_shape, im_mean, im_std)
        self.transform_im = self.default_transform_im(*args) if transform_im is None else transform_im
        self.transform_gt = self.default_transform_gt(*args) if transform_gt is None else transform_gt

        self.paths_im, self.paths_gt, self.labels, self.anames = self.load_dataset(self.dir_im, self.dir_gt)

    def __getitem__(self, idx):
        """
        Returns the idx-th data.

        Args:
            idx (int): Index of the image to be returned.
        """
        path_im = self.paths_im[idx]  # Image file path.
        path_gt = self.paths_gt[idx]  # Ground truth file path.
        flag_an = self.labels[idx]    # Anomaly flag (good: 0, anomaly: 1).
        name_an = self.anames[idx]    # Anomaly name.

        # Load the input image.
        img = PIL.Image.open(str(path_im)).convert("RGB")

        # If good data, use zeros as the ground truth image.
        if flag_an == 0:
            igt = PIL.Image.fromarray(np.zeros(img.size[::-1], dtype=np.uint8))

        # Otherwise, load the ground truth data.
        elif flag_an == 1:
            igt = PIL.Image.open(str(path_gt)).convert("L")

        # The anomaly flag should be either 0 or 1.
        else: raise ValueError("Error: value of `flag_an` should be 0 or 1.")

        # The sizes of the input and ground truth images should be the same.
        assert img.size == igt.size, "image.size != igt.size !!!"

        # Apply transforms.
        img = self.transform_im(img)
        igt = self.transform_gt(igt)

        return (img, igt, flag_an, str(path_im), name_an)

    def __len__(self):
        """
        Returns the number of data.
        """
        return len(self.paths_im)

    def load_dataset(self, dir_im, dir_gt):
        """
        Load the dataset.

        Args:
            dir_im (pathlib.Path): Path to the input image directory.
            dir_gt (pathlib.Path): Path to the ground truth image directory.
        """
        paths_im = list()  # List of image file paths.
        paths_gt = list()  # List of ground truth file paths.
        flags_an = list()  # List of anomaly flags (good: 0, anomaly: 1).
        names_an = list()  # List of anomaly names.

        for subdir in sorted(dir_im.iterdir()):

            # The name of the sub directory is the same as the anomaly name.
            defect_name = subdir.name

            # Case 1: good data which have only input images.
            if defect_name == "good":

                # Get input image paths (good data doesn't have ground truth images).
                paths = sorted((dir_im / defect_name).glob("*.png"))

                # Update attributes.
                paths_im += paths
                paths_gt += len(paths) * [None]
                flags_an += len(paths) * [0]
                names_an += len(paths) * [defect_name]

            # Case 2: anomalous data which have both input and ground truth images.
            else:

                # Get input and ground truth image paths.
                paths1 = sorted((dir_im / defect_name).glob("*.png"))
                paths2 = sorted((dir_gt / defect_name).glob("*.png"))

                # Update attributes.
                paths_im += paths1
                paths_gt += paths2
                flags_an += len(paths1) * [1]
                names_an += len(paths2) * [defect_name]

        # The numbers of input images and ground truth images should be the same.
        assert len(paths_im) == len(paths_gt), "Something wrong with test and ground truth pair!"

        return (paths_im, paths_gt, flags_an, names_an)

    def default_transform_im(self, load_shape, input_shape, im_mean, im_std):
        """
        Returns the default transform for the input image of the MVTec AD dataset.
        """
        return torchvision.transforms.Compose([
            torchvision.transforms.Resize(load_shape),
            torchvision.transforms.ToTensor(),
            torchvision.transforms.CenterCrop(input_shape),
            torchvision.transforms.Normalize(mean=im_mean, std=im_std),
        ])

    def default_transform_gt(self, load_shape, input_shape, im_mean, im_std):
        """
        Returns the default transform for the ground truth image of the MVTec AD dataset.
        """
        return torchvision.transforms.Compose([
            torchvision.transforms.Resize(load_shape),
            torchvision.transforms.ToTensor(),
            torchvision.transforms.CenterCrop(input_shape),
        ])


class MVTecADImageOnly(torch.utils.data.Dataset):
    """
    Dataset class that is quite close to MVTecAD, but works
    when no ground truth image is available. This class is
    used for user defined datasets.
    """
    def __init__(self, root="data", transform=None,
                 load_shape=(256, 256), input_shape=(224, 224),
                 im_mean=(0.485, 0.456, 0.406), im_std=(0.229, 0.224, 0.225)):
        """
        Constructor of the image-only MVTec AD style dataset.

        Args:
            root        (str)  : Image directory.
            transform   (func) : Transform for the input image.
            load_shape  (tuple): Shape of the loaded image.
            input_shape (tuple): Shape of the input image.
            im_mean     (tuple): Mean of image (3 channels) for image normalization.
            im_std      (tuple): Standard deviation of image (3 channels) for image normalization.

        Notes:
            The arguments `load_shape`, `input_shape`, `im_mean`, and `im_std` are used
            only if `transform` is None.
        """
        self.root = pathlib.Path(root)

        # Use the default transform if no transform is specified.
200 | args = (load_shape, input_shape, im_mean, im_std)
201 | self.transform = self.default_transform(*args) if transform is None else transform
202 | 
203 | self.paths = sorted(path for path in self.root.glob("**/*") if path.suffix in [".jpg", ".png"])
204 | 
205 | def __getitem__(self, idx):
206 | """
207 | Returns the idx-th data.
208 | 
209 | Args:
210 | idx (int): Index of the image to be returned.
211 | """
212 | 
213 | # Load input image.
214 | path = self.paths[idx]
215 | img = PIL.Image.open(str(path)).convert("RGB")
216 | 
217 | # Apply transforms.
218 | img = self.transform(img)
219 | 
220 | # Return only the image while keeping the same tuple interface as the MVTecAD class.
221 | return (img, 0, 0, str(path), 0)
222 | 
223 | def __len__(self):
224 | """
225 | Returns the number of data samples.
226 | """
227 | return len(self.paths)
228 | 
229 | def default_transform(self, load_shape, input_shape, im_mean, im_std):
230 | """
231 | Returns the default transform for the input image of the MVTec AD dataset.
232 | """
233 | return torchvision.transforms.Compose([
234 | torchvision.transforms.Resize(load_shape),
235 | torchvision.transforms.ToTensor(),
236 | torchvision.transforms.CenterCrop(input_shape),
237 | torchvision.transforms.Normalize(mean=im_mean, std=im_std),
238 | ])
239 | 
240 | 
241 | if __name__ == "__main__":
242 | 
243 | # Test the training data.
244 | dataset = MVTecAD("data/mvtec_anomaly_detection/hazelnut", "train")
245 | print(dataset[0])
246 | 
247 | # Test the test data.
248 | dataset = MVTecAD("data/mvtec_anomaly_detection/hazelnut", "test")
249 | print(dataset[0])
250 | 
--------------------------------------------------------------------------------
/patchcore/extractor.py:
--------------------------------------------------------------------------------
1 | """
2 | This module provides a feature extractor which extracts intermediate features
3 | from a neural network model and reshapes them into a convenient format for PatchCore.
4 | """
5 | 
6 | # Import standard libraries.
7 | import warnings
8 | 
9 | # Import third-party packages.
10 | import numpy as np
11 | import rich
12 | import rich.progress
13 | import torch
14 | import thop
15 | 
16 | 
17 | # Ignore "UserWarning" because many UserWarnings will be raised
18 | # when calling the `thop.profile` function (because of custom layers).
19 | warnings.simplefilter("ignore", UserWarning)
20 | 
21 | 
22 | class FeatureExtractor:
23 | """
24 | A class to extract intermediate features from a NN model
25 | and reshape them into a convenient format for PatchCore.
26 | 
27 | Example:
28 | >>> model = "resnet50" # Or your custom model
29 | >>> dataset = MVTecAD()
30 | >>> extractor = FeatureExtractor(model, "pytorch/vision:v0.10.0", "cuda")
31 | >>> 
32 | >>> extractor.transform(dataset)
33 | """
34 | def __init__(self, model, repo, device):
35 | """
36 | Constructor of the FeatureExtractor class.
37 | 
38 | Args:
39 | model (str/torch.nn.Module): A base NN model.
40 | repo (str) : Repository name which provides the model.
41 | device (str) : Device name (e.g. "cuda").
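
Example:
    >>> # A minimal sketch; the model and repo names here are illustrative:
    >>> extractor = FeatureExtractor("resnet50", "pytorch/vision:v0.10.0", "cpu")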
42 | """
43 | def load_model(repo, model):
44 | try : return torch.hub.load(repo, model, verbose=False, pretrained=True)
45 | except Exception: return torch.hub.load(repo, model, verbose=False)
46 | 
47 | if isinstance(model, str):
48 | self.model = load_model(repo, model)
49 | self.name = model
50 | macs, pars = thop.profile(self.model, inputs=(torch.randn(1, 3, 224, 224),), verbose=False)
51 | rich.print("Model summary (assuming 3x224x224 input):")
52 | rich.print(" - [green]repo[/green]: [magenta]{:s}[/magenta]".format(repo))
53 | rich.print(" - [green]name[/green]: [magenta]{:s}[/magenta]".format(model))
54 | rich.print(" - [green]pars[/green]: [magenta]{:,}[/magenta]".format(int(pars)))
55 | rich.print(" - [green]macs[/green]: [magenta]{:,}[/magenta]".format(int(macs)))
56 | else:
57 | rich.print("Custom model specified")
58 | self.model = model
59 | self.name = model.__class__.__name__
60 | 
61 | # Send model to the device.
62 | self.device = device
63 | self.model.to(device)
64 | 
65 | # Freeze the model.
66 | self.model.eval()
67 | for param in self.model.parameters():
68 | param.requires_grad = False
69 | 
70 | # Embed hook functions into the model for extracting intermediate features.
71 | self.model = self.embed_hooks(self.model)
72 | 
73 | def forward(self, x):
74 | """
75 | Extract intermediate features from a single batch of data.
76 | Note that the output tensor is still a rank-4 tensor (not reshaped).
77 | 
78 | Args:
79 | x (torch.Tensor): Input tensor of shape (N, C_in, H_in, W_in).
80 | 
81 | Returns:
82 | embeddings (torch.Tensor): Embedding representation of shape (N, C_out, H_out, W_out).
83 | """
84 | def concat_embeddings(*xs):
85 | """
86 | Concatenate the given intermediate features with resizing.
87 | 
88 | Args:
89 | x[i] (torch.Tensor): Input tensor of the i-th argument with shape (N, C_i, H_i, W_i).
90 | 
91 | Returns:
92 | z (torch.Tensor): Concatenated tensor with shape (N, sum(C_i), max(H_i), max(W_i)).
93 | """
94 | # Compute maximum shape.
95 | H_max = max([x.shape[2] for x in xs])
96 | W_max = max([x.shape[3] for x in xs])
97 | 
98 | # Create resize function instance.
99 | resizer = torch.nn.Upsample(size=(H_max, W_max), mode="nearest")
100 | 
101 | # Apply resize function for all input tensors.
102 | zs = [resizer(x) for x in xs]
103 | 
104 | # Concatenate along the channel dimension and return.
105 | return torch.cat(zs, dim=1)
106 | 
107 | # Extract features using the hook mechanism.
108 | self.features = []
109 | _ = self.model(x.to(self.device))
110 | 
111 | # Apply smoothing (3x3 average pooling) to the features.
112 | smoothing = torch.nn.AvgPool2d(kernel_size=3, stride=1, padding=1)
113 | features = [smoothing(feature).cpu() for feature in self.features]
114 | 
115 | # Concatenate intermediate features.
116 | embedding = concat_embeddings(*features)
117 | 
118 | return embedding
119 | 
120 | def transform(self, data, description="Extracting...", **kwargs):
121 | """
122 | Extract features from the given data.
123 | This function can handle 2 types of input:
124 | (1) dataset (torch.utils.data.Dataset),
125 | (2) tensor (torch.Tensor),
126 | where the shape of the tensor is (N, C, H, W).
127 | 
128 | Args:
129 | data (Dataset/Tensor): Input data.
130 | description (str) : Message shown on the progress bar.
131 | kwargs (dict) : Keyword arguments for the DataLoader class constructor.
132 | 
133 | Returns:
134 | embeddings (np.ndarray): Embedding representations of shape (N*H*W, C).
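
Example:
    >>> # A sketch, assuming `dataset` yields MVTecAD-style 5-tuples;
    >>> # the batch size is illustrative:
    >>> feats = extractor.transform(dataset, batch_size=16)
    >>> feats.shape # -> (N * H_out * W_out, C_out)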
135 | """
136 | def flatten_NHW(tensor):
137 | """
138 | Flatten the given rank-4 tensor with shape (N, C, H, W)
139 | in terms of N, H and W, and return a matrix with shape (N*H*W, C).
140 | 
141 | Args:
142 | tensor (torch.Tensor): Tensor of shape (N, C, H, W).
143 | 
144 | Returns:
145 | matrix (torch.Tensor): Tensor of shape (N*H*W, C).
146 | """
147 | return tensor.permute((0, 2, 3, 1)).flatten(start_dim=0, end_dim=2)
148 | 
149 | # Case 1: input data is a dataset.
150 | if isinstance(data, torch.utils.data.Dataset):
151 | 
152 | # Create data loader.
153 | dataloader = torch.utils.data.DataLoader(data, **kwargs, pin_memory=True)
154 | 
155 | # Extract features for each batch.
156 | embeddings_list = list()
157 | for x, _, _, _, _ in rich.progress.track(dataloader, description=description):
158 | embedding = self.forward(x)
159 | embeddings_list.append(flatten_NHW(embedding).cpu().numpy())
160 | 
161 | # Concatenate the results of all batches and return.
162 | return np.concatenate(embeddings_list, axis=0)
163 | 
164 | # Case 2: input data is a single tensor (i.e. a single batch).
165 | elif isinstance(data, torch.Tensor):
166 | embedding = self.forward(data)
167 | return flatten_NHW(embedding).cpu().numpy()
168 | 
169 | def embed_hooks(self, model):
170 | """
171 | Embed hook functions into a NN model for extracting intermediate features.
172 | """
173 | # Hook function for capturing intermediate features.
174 | def hook(module, input, output):
175 | self.features.append(output)
176 | 
177 | RESNET_FAMILIES = ["resnet18", "resnet34", "resnet50", "resnet101", "resnet152",
178 | "resnext101_32x8d", "resnext50_32x4d",
179 | "wide_resnet50_2", "wide_resnet101_2"]
180 | 
181 | DEEPLAB_RESNET = ["deeplabv3_resnet50", "deeplabv3_resnet101"]
182 | 
183 | DENSENET_FAMILIES = ["densenet121", "densenet161", "densenet169", "densenet201"]
184 | 
185 | VGG_FAMILIES = ["vgg11", "vgg11_bn", "vgg13", "vgg13_bn",
186 | "vgg16", "vgg16_bn", "vgg19", "vgg19_bn"]
187 | 
188 | if self.name in RESNET_FAMILIES:
189 | model.layer2[-1].register_forward_hook(hook)
190 | model.layer3[-1].register_forward_hook(hook)
191 | 
192 | elif self.name in DEEPLAB_RESNET:
193 | model.backbone.layer2[-1].register_forward_hook(hook)
194 | model.backbone.layer3[-1].register_forward_hook(hook)
195 | 
196 | elif self.name in DENSENET_FAMILIES:
197 | model.features.denseblock2.register_forward_hook(hook)
198 | model.features.denseblock3.register_forward_hook(hook)
199 | 
200 | # In the case of VGG, register the 2nd and 3rd MaxPool layers counted from the bottom.
201 | elif self.name in VGG_FAMILIES:
202 | num_maxpool = 0
203 | for idx, module in reversed(list(enumerate(model.features))):
204 | if module.__class__.__name__ == "MaxPool2d":
205 | num_maxpool += 1
206 | if num_maxpool in [2, 3]:
207 | model.features[idx].register_forward_hook(hook)
208 | 
209 | # Network proposed by the following paper:
210 | # I. Zeki Yalniz, H. Jegou, K. Chen, M. Paluri, and D. Mahajan,
211 | # "Billion-scale semi-supervised learning for image classification", CVPR, 2019.
212 | #
213 | #
214 | # PyTorch Hub:
215 | # GitHub repo:
216 | elif self.name in ["resnet18_swsl", "resnet50_swsl", "resnet18_ssl", "resnet50_ssl"]:
217 | model.layer2.register_forward_hook(hook)
218 | model.layer3.register_forward_hook(hook)
219 | 
220 | # Raise an error if the given network is unknown.
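# (If you hit this error: to support another backbone, add a branch above
#  that registers `hook` on two mid-level feature blocks, mirroring the
#  ResNet case.)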
221 | else: raise RuntimeError("unknown neural network: no hooks registered")
222 | 
223 | return model
224 | 
--------------------------------------------------------------------------------
/patchcore/knnsearch.py:
--------------------------------------------------------------------------------
1 | """
2 | This module provides a k-NN searcher class using the faiss backend,
3 | and a sampling algorithm which returns a set of points that minimizes
4 | the maximum distance of any point to a center.
5 | """
6 | 
7 | # Import third-party packages.
8 | import faiss
9 | import numpy as np
10 | import rich
11 | import sklearn.metrics
12 | import sklearn.random_projection
13 | 
14 | 
15 | class KCenterGreedy:
16 | """
17 | Python implementation of the k-Center-Greedy method in [1].
18 | 
19 | Distance metric defaults to the L2 distance. Features used to calculate
20 | distances are either raw features or, if a model with a `transform` method
21 | is given, the output of `model.transform(X)`.
22 | 
23 | This algorithm can be extended to a robust k-centers algorithm that ignores
24 | a certain number of outlier datapoints. The resulting centers are a solution
25 | to a mixed integer programming problem.
26 | 
27 | Reference:
28 | [1] O. Sener and S. Savarese, "A Geometric Approach to Active Learning for
29 | Convolutional Neural Networks", arXiv, 2017.
30 | 
31 | 
32 | Notes:
33 | This code originally comes from the following code written by Google,
34 | which is released under the Apache License 2.0 (as of Jan 25, 2022):
35 | 
36 | 
37 | 
38 | 
39 | The following is the description of the license applied to this code.
40 | ---
41 | Copyright 2017 Google Inc.
42 | 
43 | Licensed under the Apache License, Version 2.0 (the "License");
44 | you may not use this file except in compliance with the License.
45 | You may obtain a copy of the License at
46 | 
47 | http://www.apache.org/licenses/LICENSE-2.0
48 | 
49 | Unless required by applicable law or agreed to in writing, software
50 | distributed under the License is distributed on an "AS IS" BASIS,
51 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
52 | See the License for the specific language governing permissions and
53 | limitations under the License.
54 | """
55 | def __init__(self, X, y, seed, metric="euclidean"):
56 | self.X = X
57 | self.y = y
58 | self.name = "kcenter"
59 | self.metric = metric
60 | self.min_distances = None
61 | self.n_obs = self.X.shape[0]
62 | self.already_selected = []
63 | 
64 | def update_distances(self, cluster_centers, only_new=True, reset_dist=False):
65 | """
66 | Update min distances given cluster centers.
67 | 
68 | Args:
69 | cluster_centers (list): indices of cluster centers
70 | only_new (bool): only calculate distance for newly selected points
71 | and update min_distances.
72 | reset_dist (bool): whether to reset the min_distances variable.
73 | """
74 | if reset_dist:
75 | self.min_distances = None
76 | 
77 | if only_new:
78 | cluster_centers = [d for d in cluster_centers if d not in self.already_selected]
79 | 
80 | if cluster_centers:
81 | 
82 | # Update min_distances for all examples given new cluster center.
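# (Each call performs one pairwise-distance evaluation of shape
#  (n_obs, len(cluster_centers)); keeping a running minimum makes
#  each greedy step linear in the number of observations.)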
83 | x = self.features[cluster_centers]
84 | dist = sklearn.metrics.pairwise_distances(self.features, x, metric=self.metric)
85 | 
86 | if self.min_distances is None:
87 | self.min_distances = np.min(dist, axis=1).reshape(-1,1)
88 | else:
89 | self.min_distances = np.minimum(self.min_distances, dist)
90 | 
91 | def select_batch(self, model, already_selected, N, **kwargs):
92 | """
93 | Diversity-promoting active learning method that greedily forms a batch
94 | to minimize the maximum distance to a cluster center among all unlabeled
95 | datapoints.
96 | 
97 | Args:
98 | model: model with scikit-like API with decision_function implemented
99 | already_selected: indices of datapoints already selected
100 | N: batch size
101 | 
102 | Returns:
103 | indices of points selected to minimize distance to cluster centers
104 | """
105 | # Assumes that the transform function takes in original data and not flattened data.
106 | if model is not None: self.features = model.transform(self.X)
107 | else : self.features = self.X.reshape((self.X.shape[0], -1))
108 | 
109 | # Compute distances.
110 | self.update_distances(already_selected, only_new=False, reset_dist=True)
111 | 
112 | # Initialize sampling results.
113 | new_batch = []
114 | 
115 | for _ in rich.progress.track(range(N), description="Sampling..."):
116 | 
117 | # Initialize centers with a randomly selected datapoint.
118 | if self.min_distances is None:
119 | ind = np.random.choice(np.arange(self.n_obs))
120 | 
121 | # Otherwise, choose the point that is farthest from the current centers.
122 | else:
123 | ind = np.argmax(self.min_distances)
124 | 
125 | # New examples should not be in already selected since those points
126 | # should have min_distance of zero to a cluster center.
127 | assert ind not in already_selected
128 | 
129 | self.update_distances([ind], only_new=True, reset_dist=False)
130 | 
131 | new_batch.append(ind)
132 | 
133 | # Memorize the already selected indices.
134 | self.already_selected = already_selected
135 | 
136 | # Print summaries.
137 | rich.print("Maximum distance from cluster centers: [magenta]%0.2f[/magenta]" % max(self.min_distances))
138 | rich.print("Initial number of features: [magenta]%d[/magenta]" % self.X.shape[0])
139 | rich.print("Sampled number of features: [magenta]%d[/magenta]" % len(new_batch))
140 | 
141 | return new_batch
142 | 
143 | 
144 | class KNNSearcher:
145 | """
146 | A class for k-NN search with dimension reduction (random projection)
147 | and subsampling (k-center greedy method) features.
148 | """
149 | def __init__(self, projection=True, subsampling=True, sampling_ratio=0.01):
150 | """
151 | Constructor of the KNNSearcher class.
152 | 
153 | Args:
154 | projection (bool) : Enable random projection if true.
155 | subsampling (bool) : Enable subsampling if true.
156 | sampling_ratio (float): Ratio of subsampling.
157 | """
158 | self.projection = projection
159 | self.subsampling = subsampling
160 | self.sampling_ratio = sampling_ratio
161 | 
162 | def fit(self, x):
163 | """
164 | Train the k-NN search model.
165 | 
166 | Args:
167 | x (np.ndarray): Training data of shape (n_samples, n_features).
168 | """
169 | # Apply random projection if specified. Random projection is used for reducing
170 | # dimension while roughly preserving distances. It makes the k-center greedy algorithm faster.
171 | if self.projection:
172 | 
173 | rich.print("Sparse random projection")
174 | 
175 | # If the number of features is much smaller than the number of samples, random
176 | # projection will fail due to the lack of features. In that case,
177 | # please increase the parameter `eps`, or just skip the random projection.
178 | projector = sklearn.random_projection.SparseRandomProjection(n_components="auto", eps=0.90)
179 | projector.fit(x)
180 | 
181 | # Print the shape of the random matrix: (n_features_after, n_features_before).
182 | shape = projector.components_.shape
183 | rich.print(" - [green]random matrix shape[/green]: [cyan]%s[/cyan]" % str(shape))
184 | 
185 | # Set None if random projection is not specified.
186 | else: projector = None
187 | 
188 | # Execute coreset subsampling.
189 | if self.subsampling:
190 | rich.print("Coreset subsampling")
191 | n_select = int(x.shape[0] * self.sampling_ratio)
192 | selector = KCenterGreedy(x, 0, 0)
193 | indices = selector.select_batch(projector, [], n_select)
194 | x = x[indices, :]
195 | 
196 | # Set up the nearest neighbour finder using the Faiss library.
197 | self.index = faiss.IndexFlatL2(x.shape[1])
198 | self.index.add(x)
199 | 
200 | def predict(self, x, k=3):
201 | """
202 | Run k-NN search prediction.
203 | 
204 | Args:
205 | x (np.ndarray): Query data of shape (n_samples, n_features).
206 | k (int) : Number of neighbors to be searched.
207 | 
208 | Returns:
209 | dists (np.ndarray): Distances between the query and searched data,
210 | where the shape is (n_samples, n_neighbors).
211 | indices (np.ndarray): List of indices to be searched of shape
212 | (n_samples, n_neighbors).
213 | """
214 | # The Faiss searcher requires a C-contiguous array as input,
215 | # therefore forcibly convert the input data to a contiguous array.
216 | x = np.ascontiguousarray(x)
217 | 
218 | # Run k-NN search.
219 | dists, indices = self.index.search(x, k=k)
220 | 
221 | return (dists, indices)
222 | 
223 | def load(self, filepath):
224 | self.index = faiss.read_index(filepath)
225 | # if torch.cuda.is_available():
226 | # res = faiss.StandardGpuResources()
227 | # self.index = faiss.index_cpu_to_gpu(res, 0, self.index)
228 | 
229 | def save(self, filepath):
230 | if hasattr(self, "index"): faiss.write_index(self.index, filepath)
231 | else : raise RuntimeError("this model is not trained yet")
--------------------------------------------------------------------------------
/patchcore/patchcore.py:
--------------------------------------------------------------------------------
1 | """
2 | This module provides a class for the PatchCore algorithm.
3 | """
4 | 
5 | # Import standard libraries.
6 | import os
7 | import pathlib
8 | 
9 | # Import third-party packages.
10 | import cv2 as cv
11 | import numpy as np
12 | import rich
13 | import rich.progress
14 | import sklearn.metrics
15 | import scipy.ndimage
16 | import torch
17 | 
18 | # Import custom modules.
19 | from patchcore.extractor import FeatureExtractor
20 | from patchcore.knnsearch import KNNSearcher
21 | from patchcore.utils import Timer
22 | 
23 | 
24 | class PatchCore:
25 | """
26 | PyTorch implementation of the PatchCore anomaly detection [1].
27 | 
28 | The PatchCore algorithm can be divided into 2 steps: (1) feature extraction from a
29 | NN model, and (2) k-NN search including coreset subsampling. Step (1) is implemented
30 | in `extractor.FeatureExtractor`, and step (2) in `knnsearch.KNNSearcher`.
31 | 
32 | Reference:
33 | [1] K. Roth, L. Pemula, J. Zepeda, B. Scholkopf, T. Brox, and P. Gehler,
34 | "Towards Total Recall in Industrial Anomaly Detection", arXiv, 2021.
35 | 
36 | """
37 | def __init__(self, model, repo, device, sampling_ratio=0.001):
38 | """
39 | Constructor of the PatchCore class.
40 | 
41 | Args:
42 | model (str/torch.nn.Module): A base NN model.
43 | repo (str) : Repository name which provides the model.
44 | device (str) : Device type used for NN inference.
45 | sampling_ratio (float) : Ratio of the coreset subsampling.
46 | 
47 | Notes:
48 | The arguments `model` and `repo` are passed to the `torch.hub.load`
49 | function if `model` is not a `torch.nn.Module` instance.
50 | """
51 | self.device = device
52 | 
53 | # Create feature extractor instance.
54 | self.extractor = FeatureExtractor(model, repo, device)
55 | 
56 | # Create k-NN searcher instance.
57 | self.searcher = KNNSearcher(sampling_ratio=sampling_ratio)
58 | 
59 | def fit(self, dataset, batch_size, num_workers=0):
60 | """
61 | Args:
62 | dataset (torch.utils.data.Dataset): Dataset.
63 | """
64 | # Step (1): feature extraction from the NN model.
65 | rich.print("\n[yellow][Training 1/2: feature extraction][/yellow]")
66 | embeddings = self.extractor.transform(dataset, batch_size=batch_size, num_workers=num_workers)
67 | rich.print("Embedding dimensions: [magenta]%s[/magenta]" % str(embeddings.shape))
68 | 
69 | # Step (2): preparation for k-NN search.
70 | rich.print("\n[yellow][Training 2/2: preparation for k-NN search][/yellow]")
71 | self.searcher.fit(embeddings)
72 | 
73 | def score(self, dataset, n_neighbors, dirpath_out, num_workers=0):
74 | """
75 | Args:
76 | dataset (torch.utils.data.Dataset): Dataset.
77 | n_neighbors (int): Number of neighbors to be computed in k-NN search.
78 | dirpath_out (str): Directory path to dump detection results.
79 | num_workers (int): Number of available CPUs.
80 | """
81 | dataloader = torch.utils.data.DataLoader(dataset, batch_size=1, num_workers=num_workers, pin_memory=True)
82 | 
83 | # Create timer instances.
84 | timer_feature_ext = Timer()
85 | timer_k_nn_search = Timer()
86 | 
87 | # Initialize results.
88 | list_true_px_lvl, list_true_im_lvl = (list(), list())
89 | list_pred_px_lvl, list_pred_im_lvl = (list(), list())
90 | 
91 | # Create the output directory if it does not exist.
92 | dirpath_images = pathlib.Path(dirpath_out) / "samples"
93 | dirpath_images.mkdir(parents=True, exist_ok=True)
94 | 
95 | rich.print("\n[yellow][Test: anomaly detection inference][/yellow]")
96 | for x, gt, label, filepath, x_type in rich.progress.track(dataloader, description="Processing..."):
97 | 
98 | # Extract embeddings.
99 | with timer_feature_ext:
100 | embedding = self.extractor.transform(x)
101 | 
102 | # Compute the nearest neighbor points and their scores (L2 distances).
103 | with timer_k_nn_search:
104 | score_patches, _ = self.searcher.predict(embedding, k=n_neighbors)
105 | 
106 | anomaly_map_rw, score = self.compute_anomaly_scores(score_patches, x.shape)
107 | 
108 | # Add pixel-level scores.
109 | list_true_px_lvl.extend(gt.cpu().numpy().astype(int).ravel())
110 | list_pred_px_lvl.extend(anomaly_map_rw.ravel())
111 | 
112 | # Add image-level scores.
113 | list_true_im_lvl.append(label.cpu().numpy()[0])
114 | list_pred_im_lvl.append(score)
115 | 
116 | # Save anomaly maps as images.
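# (One overlay JPEG and one raw `.npy` score map are written per test
#  image; see `save_anomaly_map` below for details.)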
117 | self.save_anomaly_map(dirpath_images, anomaly_map_rw, filepath[0], x_type[0])
118 | 
119 | rich.print("\n[yellow][Test: score calculation][/yellow]")
120 | image_auc = sklearn.metrics.roc_auc_score(list_true_im_lvl, list_pred_im_lvl)
121 | pixel_auc = sklearn.metrics.roc_auc_score(list_true_px_lvl, list_pred_px_lvl)
122 | rich.print("Total image-level auc-roc score: [magenta]%.6f[/magenta]" % image_auc)
123 | rich.print("Total pixel-level auc-roc score: [magenta]%.6f[/magenta]" % pixel_auc)
124 | 
125 | rich.print("\n[yellow][Test: inference time][/yellow]")
126 | t1 = timer_feature_ext.mean()
127 | t2 = timer_k_nn_search.mean()
128 | rich.print("Feature extraction: [magenta]%.4f sec/image[/magenta]" % t1)
129 | rich.print("Anomaly map search: [magenta]%.4f sec/image[/magenta]" % t2)
130 | rich.print("Total infer time : [magenta]%.4f sec/image[/magenta]" % (t1 + t2))
131 | 
132 | def predict(self, x, n_neighbors):
133 | """
134 | Returns prediction results.
135 | 
136 | Args:
137 | x (torch.Tensor): Input image tensor of shape (1, C, H, W).
138 | n_neighbors (int) : Number of neighbors to be returned.
139 | """
140 | # Extract embeddings.
141 | embedding = self.extractor.transform(x)
142 | 
143 | # Compute the nearest neighbor points and their scores (L2 distances).
144 | score_patches, _ = self.searcher.predict(embedding, k=n_neighbors)
145 | 
146 | # Compute the anomaly map and its re-weighting.
147 | anomaly_map_rw, _ = self.compute_anomaly_scores(score_patches, x.shape)
148 | 
149 | return anomaly_map_rw
150 | 
151 | def load(self, filepath):
152 | """
153 | Load a trained model.
154 | 
155 | Args:
156 | filepath (str): Path to the trained model file.
157 | """
158 | self.searcher.load(filepath)
159 | 
160 | def save(self, filepath):
161 | """
162 | Save the trained model.
163 | 
164 | Args:
165 | filepath (str): Path to the trained model file.
166 | """
167 | self.searcher.save(filepath)
168 | 
169 | def compute_anomaly_scores(self, score_patches, x_shape):
170 | """
171 | Returns the anomaly map and image-level score computed from the results of k-NN search.
172 | 
173 | Args:
174 | score_patches (np.ndarray): Results of k-NN search with shape (h*w, neighbors).
175 | x_shape (tuple) : Shape of the input image, i.e. (1, c, H, W).
176 | """
177 | # The anomaly map is defined as a map of L2 distances from the nearest neighbours.
178 | # NOTE: The magic number (28, 28) should be removed!
179 | anomaly_score = score_patches.reshape((28, 28, -1))
180 | 
181 | # Refine the anomaly map.
182 | anomaly_maps = [anomaly_score[:, :, n] for n in range(anomaly_score.shape[2])]
183 | anomaly_maps = [cv.resize(amap, (x_shape[3], x_shape[2])) for amap in anomaly_maps]
184 | anomaly_maps = [scipy.ndimage.gaussian_filter(amap, sigma=4) for amap in anomaly_maps]
185 | anomaly_maps = np.array(anomaly_maps, dtype=np.float64)
186 | anomaly_map = anomaly_maps[0, :, :]
187 | 
188 | # Anomaly map re-weighting.
189 | # We apply log-softmax-like processing to the computation of the scale factor
190 | # to avoid overflow of floating-point numbers.
191 | normalized_exp = np.exp(anomaly_maps - np.max(anomaly_maps, axis=0))
192 | anomaly_map_rw = anomaly_map * (1.0 - normalized_exp[0, :, :] / np.sum(normalized_exp, axis=0))
193 | 
194 | # Compute the image-level score.
195 | i, j = np.unravel_index(np.argmax(anomaly_map), anomaly_map.shape)
196 | score = anomaly_map_rw[i, j]
197 | 
198 | return (anomaly_map_rw, score)
199 | 
200 | def save_anomaly_map(self, dirpath, anomaly_map, filepath, x_type, contour=None):
201 | """
202 | Args:
203 | dirpath (str) : Output directory path.
204 | anomaly_map (np.ndarray): Anomaly map with the same size as the input image.
205 | filepath (str) : Path of the input image.
206 | x_type (str) : Anomaly type (e.g. "good", "crack", etc).
207 | contour (float) : Threshold of contour, or None.
208 | """
209 | def min_max_norm(image):
210 | a_min, a_max = image.min(), image.max()
211 | return (image - a_min) / (a_max - a_min)
212 | 
213 | def cvt2heatmap(gray):
214 | return cv.applyColorMap(np.uint8(gray), cv.COLORMAP_JET)
215 | 
216 | # Get output directory.
217 | dirpath = pathlib.Path(dirpath)
218 | dirpath.mkdir(parents=True, exist_ok=True)
219 | 
220 | # Get the image file name.
221 | filename = os.path.basename(filepath)
222 | 
223 | # Load the image file and resize it (note that cv.resize expects (width, height)).
224 | original_image = cv.imread(filepath)
225 | original_image = cv.resize(original_image, (anomaly_map.shape[1], anomaly_map.shape[0]))
226 | 
227 | # Visualize heat map.
228 | if contour is None:
229 | 
230 | # Normalize the anomaly map for easier visualization.
231 | anomaly_map_norm = cvt2heatmap(255 * min_max_norm(anomaly_map))
232 | 
233 | # Overlay the anomaly map onto the original image.
234 | output_image = (anomaly_map_norm / 2 + original_image / 2).astype(np.uint8)
235 | 
236 | # Visualize contour map.
237 | else:
238 | 
239 | # Additional smoothing for better contour visualization.
240 | anomaly_map = cv.GaussianBlur(anomaly_map, (5, 5), 3)
241 | 
242 | # Compute a binary map from which the contours are extracted.
243 | binary_map = np.where(anomaly_map >= contour, 255, 0).astype(np.uint8)
244 | 
245 | # Compute contours.
246 | contour_coord, hierarchy = cv.findContours(binary_map, cv.RETR_LIST, cv.CHAIN_APPROX_SIMPLE)
247 | 
248 | # Ignore small anomalies.
249 | h, w = original_image.shape[0:2]
250 | areas = [cv.contourArea(contour_pts) / w / h for contour_pts in contour_coord]
251 | contour_coord = [contour_pts for contour_pts, area in zip(contour_coord, areas) if area > 0.005]
252 | 
253 | # Initialize the contour image.
254 | contour_map = original_image.copy()
255 | 
256 | # Fill the inside of all contours.
257 | for idx in range(len(contour_coord)):
258 | contour_map = cv.fillPoly(contour_map, [contour_coord[idx][:,0,:]], (0,165,255))
259 | 
260 | # Overlay the filled image onto the original image.
261 | contour_map = (0.6 * original_image + 0.4 * contour_map).astype(np.uint8)
262 | 
263 | # Draw contour edges.
264 | output_image = cv.drawContours(contour_map, contour_coord, -1, (0,165,255), 2)
265 | 
266 | # Save the normalized anomaly map as an image.
267 | cv.imwrite(str(dirpath / f"{x_type}_{filename}.jpg"), output_image)
268 | 
269 | # Save the raw anomaly scores as an npy file.
270 | np.save(str(dirpath / f"{x_type}_{filename}.npy"), anomaly_map)
271 | 
--------------------------------------------------------------------------------
/patchcore/utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Utility functions for PatchCore.
3 | """
4 | 
5 | # Import standard libraries.
6 | import time
7 | 
8 | # Import third-party packages.
9 | import numpy as np
10 | 
11 | 
12 | class Timer:
13 | """
14 | A class for measuring elapsed time using the "with" statement.
15 | 
16 | Example:
17 | >>> # Create Timer instance.
18 | >>> timer = Timer()
19 | >>> 
20 | >>> # Repeat some procedure 100 times.
21 | >>> for _ in range(100):
22 | >>> with timer:
23 | >>> some_procedure()
24 | >>> 
25 | >>> # Print mean elapsed time.
26 | >>> print(timer.mean())
27 | """
28 | def __init__(self):
29 | self.times = list()
30 | 
31 | def __enter__(self):
32 | self.time_start = time.time()
33 | 
34 | def __exit__(self, exc_type, exc_value, traceback):
35 | self.time_end = time.time()
36 | self.times.append(self.time_end - self.time_start)
37 | 
38 | def mean(self):
39 | return sum(self.times) / len(self.times)
40 | 
41 | 
42 | def auto_threshold(scores_good, coef_sigma):
43 | """
44 | Compute a threshold value from the given good scores.
45 | 
46 | Args:
47 | scores_good (list) : List of anomaly scores for good samples.
48 | coef_sigma (float): Hyperparameter of the thresholding.
49 | """
50 | # Compute the mean/std of the anomaly scores.
51 | score_mean = np.mean(scores_good)
52 | score_std = np.std(scores_good)
53 | 
54 | # Compute the threshold.
55 | thresh = score_mean + coef_sigma * score_std
56 | 
57 | return (thresh, score_mean, score_std)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | faiss-cpu==1.7.1
2 | opencv-python==4.5.2.52
3 | rich==11.0.0
4 | scikit-learn==0.24.2
5 | torch==1.8.1
6 | torchvision==0.9.1
--------------------------------------------------------------------------------
/run.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 | 
3 | python3 main_mvtecad.py runall --model resnet18
4 | python3 main_mvtecad.py runall --model resnet34
5 | python3 main_mvtecad.py runall --model resnet50
6 | python3 main_mvtecad.py runall --model resnet101
7 | python3 main_mvtecad.py runall --model resnet152
8 | python3 main_mvtecad.py runall --model wide_resnet50_2
9 | 
10 | python3 main_mvtecad.py runall --model resnext50_32x4d
11 | python3 main_mvtecad.py runall --model resnext101_32x8d
12 | 
13 | python3 main_mvtecad.py runall --model densenet121
14 | python3 main_mvtecad.py runall --model densenet161
15 | python3 main_mvtecad.py runall --model densenet169
16 | python3 main_mvtecad.py runall --model densenet201
17 | 
18 | python3 main_mvtecad.py runall --model vgg11_bn
19 | python3 main_mvtecad.py runall --model vgg13_bn
20 | python3 main_mvtecad.py runall --model vgg16_bn
21 | python3 main_mvtecad.py runall --model vgg19_bn
22 | 
23 | python3 main_mvtecad.py runall --model deeplabv3_resnet50
24 | python3 main_mvtecad.py runall --test_only --repo facebookresearch/semi-supervised-ImageNet1K-models --model resnet50_swsl
25 | python3 main_mvtecad.py runall --test_only --repo facebookresearch/semi-supervised-ImageNet1K-models --model resnet50_ssl
--------------------------------------------------------------------------------