├── .github └── ISSUE_TEMPLATE │ └── add-defense.md ├── .gitignore ├── LICENSE ├── README.md ├── autoattack ├── __init__.py ├── autoattack.py ├── autopgd_base.py ├── checks.py ├── examples │ ├── eval.py │ ├── eval_tf1.py │ ├── eval_tf2.py │ ├── model_test.pt │ ├── resnet.py │ └── tf_model.weight.h5 ├── fab_base.py ├── fab_projections.py ├── fab_pt.py ├── fab_tf.py ├── other_utils.py ├── square.py ├── state.py ├── utils_tf.py └── utils_tf2.py ├── flags_doc.md └── setup.py /.github/ISSUE_TEMPLATE/add-defense.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Add defense 3 | about: Use this to have a new model added to the list of defenses 4 | title: Add [defense name] 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Paper**: {title and link} 11 | 12 | **Venue**: {if applicable, the venue where the paper appeared} 13 | 14 | **Dataset and threat model**: {dataset, norm and epsilon for robust accuracy} 15 | 16 | **Code**: {link to the code e.g. GitHub page, possibly including a script to run the evaluation} 17 | 18 | **Pre-trained model**: {link to model weights available for downloading} 19 | 20 | **Log file**: {link to log file of the evaluation} 21 | 22 | **Additional data**: {yes/no, whether extra data, other than the standard training set, is used} 23 | 24 | **Clean and robust accuracy**: {on the full test set} 25 | 26 | **Architecture**: {} 27 | 28 | **Description of the model/defense**: {more details about the method proposed, e.g. new loss, new adversarial training, faster training, compressed model, which are relevant contributions of the paper} 29 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | data/ 2 | /__pycache__/ 3 | */__pycache__/ 4 | 5 | build/ 6 | dist/ 7 | *.egg-info/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Francesco Croce 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AutoAttack 2 | 3 | "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"\ 4 | *Francesco Croce*, *Matthias Hein*\ 5 | ICML 2020\ 6 | [https://arxiv.org/abs/2003.01690](https://arxiv.org/abs/2003.01690) 7 | 8 | 9 | We propose to use an ensemble of four diverse attacks to reliably evaluate robustness: 10 | + **APGD-CE**, our new step size-free version of PGD on the cross-entropy, 11 | + **APGD-DLR**, our new step size-free version of PGD on the new DLR loss, 12 | + **FAB**, which minimizes the norm of the adversarial perturbations [(Croce & Hein, 2019)](https://arxiv.org/abs/1907.02044), 13 | + **Square Attack**, a query-efficient black-box attack [(Andriushchenko et al, 2019)](https://arxiv.org/abs/1912.00049). 14 | 15 | **Note**: we fix all the hyperparameters of the attacks, so no tuning is required to test every new classifier. 16 | 17 | ## News 18 | + [Sep 2021] 19 | + We add [automatic checks](https://github.com/fra31/auto-attack/blob/master/flags_doc.md) for potential cases where the standard version of AA might be non suitable or sufficient for robustness evaluation. 20 | + The evaluations of models on CIFAR-10 and CIFAR-100 are no longer maintained. Up-to-date leaderboards are available in [RobustBench](https://robustbench.github.io/). 21 | + [Mar 2021] A version of AutoAttack wrt L1, which includes the extensions of APGD and Square Attack [(Croce & Hein, 2021)](https://arxiv.org/abs/2103.01208), is available! 22 | + [Oct 2020] AutoAttack is used as standard evaluation in the new benchmark [RobustBench](https://robustbench.github.io/), which includes a [Model Zoo](https://github.com/RobustBench/robustbench) of the most robust classifiers! Note that this page and RobustBench's leaderboards are maintained simultaneously. 23 | + [Aug 2020] 24 | + **Updated version**: in order to *i)* scale AutoAttack (AA) to datasets with many classes and *ii)* have a faster and more accurate evaluation, we use APGD-DLR and FAB with their *targeted* versions. 25 | + We add the evaluation of models on CIFAR-100 wrt Linf and CIFAR-10 wrt L2. 26 | + [Jul 2020] A short version of the paper is accepted at [ICML'20 UDL workshop](https://sites.google.com/view/udlworkshop2020/) for a spotlight presentation! 27 | + [Jun 2020] The paper is accepted at ICML 2020! 28 | 29 | # Adversarial Defenses Evaluation 30 | We here list adversarial defenses, for many threat models, recently proposed and evaluated with the standard version of 31 | **AutoAttack (AA)**, including 32 | + *untargeted APGD-CE* (no restarts), 33 | + *targeted APGD-DLR* (9 target classes), 34 | + *targeted FAB* (9 target classes), 35 | + *Square Attack* (5000 queries). 36 | 37 | See below for the more expensive AutoAttack+ (AA+) and more options. 38 | 39 | We report the source of the model, i.e. if it is publicly *available*, if we received it from the *authors* or if we *retrained* it, the architecture, the clean accuracy and the reported robust accuracy (note that might be calculated on a subset of the test set or on different models trained with the same defense). The robust accuracy for AA is on the full test set. 40 | 41 | We plan to add new models as they appear and are made available. Feel free to suggest new defenses to test! 42 | 43 | **To have a model added**: please check [here](https://github.com/fra31/auto-attack/issues/new/choose). 44 | 45 | **Checkpoints**: many of the evaluated models are available and easily accessible at this [Model Zoo](https://github.com/RobustBench/robustbench). 46 | 47 | ## CIFAR-10 - Linf 48 | The robust accuracy is evaluated at `eps = 8/255`, except for those marked with * for which `eps = 0.031`, where `eps` is the maximal Linf-norm allowed for the adversarial perturbations. The `eps` used is the same set in the original papers.\ 49 | **Note**: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training). 50 | 51 | **Update**: this is no longer maintained, but an up-to-date leaderboard is available in [RobustBench](https://robustbench.github.io/). 52 | 53 | |# |paper |model |architecture |clean |report. |AA | 54 | |:---:|---|:---:|:---:|---:|---:|---:| 55 | |**1**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)‡| *available*| WRN-70-16| 91.10| 65.87| 65.88| 56 | |**2**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)‡| *available*| WRN-28-10| 89.48| 62.76| 62.80| 57 | |**3**| [(Wu et al., 2020a)](https://arxiv.org/abs/2010.01279)‡| *available*| WRN-34-15| 87.67| 60.65| 60.65| 58 | |**4**| [(Wu et al., 2020b)](https://arxiv.org/abs/2004.05884)‡| *available*| WRN-28-10| 88.25| 60.04| 60.04| 59 | |**5**| [(Carmon et al., 2019)](https://arxiv.org/abs/1905.13736)‡| *available*| WRN-28-10| 89.69| 62.5| 59.53| 60 | |**6**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)| *available*| WRN-70-16| 85.29| 57.14| 57.20| 61 | |**7**| [(Sehwag et al., 2020)](https://github.com/fra31/auto-attack/issues/7)‡| *available*| WRN-28-10| 88.98| -| 57.14| 62 | |**8**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)| *available*| WRN-34-20| 85.64| 56.82| 56.86| 63 | |**9**| [(Wang et al., 2020)](https://openreview.net/forum?id=rklOg6EFwS)‡| *available*| WRN-28-10| 87.50| 65.04| 56.29| 64 | |**10**| [(Wu et al., 2020b)](https://arxiv.org/abs/2004.05884)| *available*| WRN-34-10| 85.36| 56.17| 56.17| 65 | |**11**| [(Alayrac et al., 2019)](https://arxiv.org/abs/1905.13725)‡| *available*| WRN-106-8| 86.46| 56.30| 56.03| 66 | |**12**| [(Hendrycks et al., 2019)](https://arxiv.org/abs/1901.09960)‡| *available*| WRN-28-10| 87.11| 57.4| 54.92| 67 | |**13**| [(Pang et al., 2020c)](https://arxiv.org/abs/2010.00467)| *available*| WRN-34-20| 86.43| 54.39| 54.39| 68 | |**14**| [(Pang et al., 2020b)](https://arxiv.org/abs/2002.08619)| *available*| WRN-34-20| 85.14| -| 53.74| 69 | |**15**| [(Cui et al., 2020)](https://arxiv.org/abs/2011.11164)\*| *available*| WRN-34-20| 88.70| 53.57| 53.57| 70 | |**16**| [(Zhang et al., 2020b)](https://arxiv.org/abs/2002.11242)| *available*| WRN-34-10| 84.52| 54.36| 53.51| 71 | |**17**| [(Rice et al., 2020)](https://arxiv.org/abs/2002.11569)| *available*| WRN-34-20| 85.34| 58| 53.42| 72 | |**18**| [(Huang et al., 2020)](https://arxiv.org/abs/2002.10319)\*| *available*| WRN-34-10| 83.48| 58.03| 53.34| 73 | |**19**| [(Zhang et al., 2019b)](https://arxiv.org/abs/1901.08573)\*| *available*| WRN-34-10| 84.92| 56.43| 53.08| 74 | |**20**| [(Cui et al., 2020)](https://arxiv.org/abs/2011.11164)\*| *available*| WRN-34-10| 88.22| 52.86| 52.86| 75 | |**21**| [(Qin et al., 2019)](https://arxiv.org/abs/1907.02610v2)| *available*| WRN-40-8| 86.28| 52.81| 52.84| 76 | |**22**| [(Chen et al., 2020a)](https://arxiv.org/abs/2003.12862)| *available*| RN-50 (x3)| 86.04| 54.64| 51.56| 77 | |**23**| [(Chen et al., 2020b)](https://github.com/fra31/auto-attack/issues/26)| *available*| WRN-34-10| 85.32| 51.13| 51.12| 78 | |**24**| [(Sitawarin et al., 2020)](https://github.com/fra31/auto-attack/issues/23)| *available*| WRN-34-10| 86.84| 50.72| 50.72| 79 | |**25**| [(Engstrom et al., 2019)](https://github.com/MadryLab/robustness)| *available*| RN-50| 87.03| 53.29| 49.25| 80 | |**26**| [(Kumari et al., 2019)](https://arxiv.org/abs/1905.05186)| *available*| WRN-34-10| 87.80| 53.04| 49.12| 81 | |**27**| [(Mao et al., 2019)](http://papers.nips.cc/paper/8339-metric-learning-for-adversarial-robustness)| *available*| WRN-34-10| 86.21| 50.03| 47.41| 82 | |**28**| [(Zhang et al., 2019a)](https://arxiv.org/abs/1905.00877)| *retrained*| WRN-34-10| 87.20| 47.98| 44.83| 83 | |**29**| [(Madry et al., 2018)](https://arxiv.org/abs/1706.06083)| *available*| WRN-34-10| 87.14| 47.04| 44.04| 84 | |**30**| [(Pang et al., 2020a)](https://arxiv.org/abs/1905.10626)| *available*| RN-32| 80.89| 55.0| 43.48| 85 | |**31**| [(Wong et al., 2020)](https://arxiv.org/abs/2001.03994)| *available*| RN-18| 83.34| 46.06| 43.21| 86 | |**32**| [(Shafahi et al., 2019)](https://arxiv.org/abs/1904.12843)| *available*| WRN-34-10| 86.11| 46.19| 41.47| 87 | |**33**| [(Ding et al., 2020)](https://openreview.net/forum?id=HkeryxBtPB)| *available*| WRN-28-4| 84.36| 47.18| 41.44| 88 | |**34**| [(Atzmon et al., 2019)](https://arxiv.org/abs/1905.11911)\*| *available*| RN-18| 81.30| 43.17| 40.22| 89 | |**35**| [(Moosavi-Dezfooli et al., 2019)](http://openaccess.thecvf.com/content_CVPR_2019/html/Moosavi-Dezfooli_Robustness_via_Curvature_Regularization_and_Vice_Versa_CVPR_2019_paper)| *authors*| WRN-28-10| 83.11| 41.4| 38.50| 90 | |**36**| [(Zhang & Wang, 2019)](http://papers.nips.cc/paper/8459-defense-against-adversarial-attacks-using-feature-scattering-based-adversarial-training)| *available*| WRN-28-10| 89.98| 60.6| 36.64| 91 | |**37**| [(Zhang & Xu, 2020)](https://openreview.net/forum?id=Syejj0NYvr¬eId=Syejj0NYvr)| *available*| WRN-28-10| 90.25| 68.7| 36.45| 92 | |**38**| [(Jang et al., 2019)](http://openaccess.thecvf.com/content_ICCV_2019/html/Jang_Adversarial_Defense_via_Learning_to_Generate_Diverse_Attacks_ICCV_2019_paper.html)| *available*| RN-20| 78.91| 37.40| 34.95| 93 | |**39**| [(Kim & Wang, 2020)](https://openreview.net/forum?id=rJlf_RVKwr)| *available*| WRN-34-10| 91.51| 57.23| 34.22| 94 | |**40**| [(Wang & Zhang, 2019)](http://openaccess.thecvf.com/content_ICCV_2019/html/Wang_Bilateral_Adversarial_Training_Towards_Fast_Training_of_More_Robust_Models_ICCV_2019_paper.html)| *available*| WRN-28-10| 92.80| 58.6| 29.35| 95 | |**41**| [(Xiao et al., 2020)](https://arxiv.org/abs/1905.10510)\*| *available*| DenseNet-121| 79.28| 52.4| 18.50| 96 | |**42**| [(Jin & Rinard, 2020)](https://arxiv.org/abs/2003.04286v1) | [*available*](https://github.com/charlesjin/adversarial_regularization/blob/6a3704757dcc7c707ff38f8b9de6f2e9e27e0a89/pretrained/pretrained88.pth) | RN-18| 90.84| 71.22| 1.35| 97 | |**43**| [(Mustafa et al., 2019)](https://arxiv.org/abs/1904.00887)| *available*| RN-110| 89.16| 32.32| 0.28| 98 | |**44**| [(Chan et al., 2020)](https://arxiv.org/abs/1912.10185)| *retrained*| WRN-34-10| 93.79| 15.5| 0.26| 99 | 100 | ## CIFAR-100 - Linf 101 | The robust accuracy is computed at `eps = 8/255` in the Linf-norm, except for the models marked with * for which `eps = 0.031` is used. \ 102 | **Note**: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).\ 103 | \ 104 | **Update**: this is no longer maintained, but an up-to-date leaderboard is available in [RobustBench](https://robustbench.github.io/). 105 | 106 | |# |paper |model |architecture |clean |report. |AA | 107 | |:---:|---|:---:|:---:|---:|---:|---:| 108 | |**1**| [(Gowal et al. 2020)](https://arxiv.org/abs/2010.03593)‡| *available*| WRN-70-16| 69.15| 37.70| 36.88| 109 | |**2**| [(Cui et al., 2020)](https://arxiv.org/abs/2011.11164)\*| *available*| WRN-34-20| 62.55| 30.20| 30.20| 110 | |**3**| [(Gowal et al. 2020)](https://arxiv.org/abs/2010.03593)| *available*| WRN-70-16| 60.86| 30.67| 30.03| 111 | |**4**| [(Cui et al., 2020)](https://arxiv.org/abs/2011.11164)\*| *available*| WRN-34-10| 60.64| 29.33| 29.33| 112 | |**5**| [(Wu et al., 2020b)](https://arxiv.org/abs/2004.05884)| *available*| WRN-34-10| 60.38| 28.86| 28.86| 113 | |**6**| [(Hendrycks et al., 2019)](https://arxiv.org/abs/1901.09960)‡| *available*| WRN-28-10| 59.23| 33.5| 28.42| 114 | |**7**| [(Cui et al., 2020)](https://arxiv.org/abs/2011.11164)\*| *available*| WRN-34-10| 70.25| 27.16| 27.16| 115 | |**8**| [(Chen et al., 2020b)](https://github.com/fra31/auto-attack/issues/26)| *available*| WRN-34-10| 62.15| -| 26.94| 116 | |**9**| [(Sitawarin et al., 2020)](https://github.com/fra31/auto-attack/issues/22)| *available*| WRN-34-10| 62.82| 24.57| 24.57| 117 | |**10**| [(Rice et al., 2020)](https://arxiv.org/abs/2002.11569)| *available*| RN-18| 53.83| 28.1| 18.95| 118 | 119 | ## MNIST - Linf 120 | The robust accuracy is computed at `eps = 0.3` in the Linf-norm. 121 | 122 | |# |paper |model |clean |report. |AA | 123 | |:---:|---|:---:|---:|---:|---:| 124 | |**1**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)| *available*| 99.26| 96.38| 96.34| 125 | |**2**| [(Zhang et al., 2020a)](https://arxiv.org/abs/1906.06316)| *available*| 98.38| 96.38| 93.96| 126 | |**3**| [(Gowal et al., 2019)](https://arxiv.org/abs/1810.12715)| *available*| 98.34| 93.78| 92.83| 127 | |**4**| [(Zhang et al., 2019b)](https://arxiv.org/abs/1901.08573)| *available*| 99.48| 95.60| 92.81| 128 | |**5**| [(Ding et al., 2020)](https://openreview.net/forum?id=HkeryxBtPB)| *available*| 98.95| 92.59| 91.40| 129 | |**6**| [(Atzmon et al., 2019)](https://arxiv.org/abs/1905.11911)| *available*| 99.35| 97.35| 90.85| 130 | |**7**| [(Madry et al., 2018)](https://arxiv.org/abs/1706.06083)| *available*| 98.53| 89.62| 88.50| 131 | |**8**| [(Jang et al., 2019)](http://openaccess.thecvf.com/content_ICCV_2019/html/Jang_Adversarial_Defense_via_Learning_to_Generate_Diverse_Attacks_ICCV_2019_paper.html)| *available*| 98.47| 94.61| 87.99| 132 | |**9**| [(Wong et al., 2020)](https://arxiv.org/abs/2001.03994)| *available*| 98.50| 88.77| 82.93| 133 | |**10**| [(Taghanaki et al., 2019)](http://openaccess.thecvf.com/content_CVPR_2019/html/Taghanaki_A_Kernelized_Manifold_Mapping_to_Diminish_the_Effect_of_Adversarial_CVPR_2019_paper.html)| *retrained*| 98.86| 64.25| 0.00| 134 | 135 | ## CIFAR-10 - L2 136 | The robust accuracy is computed at `eps = 0.5` in the L2-norm.\ 137 | **Note**: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training). 138 | 139 | **Update**: this is no longer maintained, but an up-to-date leaderboard is available in [RobustBench](https://robustbench.github.io/). 140 | 141 | |# |paper |model |architecture |clean |report. |AA | 142 | |:---:|---|:---:|:---:|---:|---:|---:| 143 | |**1**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)‡| *available*| WRN-70-16| 94.74| -| 80.53| 144 | |**2**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)| *available*| WRN-70-16| 90.90| -| 74.50| 145 | |**3**| [(Wu et al., 2020b)](https://arxiv.org/abs/2004.05884)| *available*| WRN-34-10| 88.51| 73.66| 73.66| 146 | |**4**| [(Augustin et al., 2020)](https://arxiv.org/abs/2003.09461)‡| *authors*| RN-50| 91.08| 73.27| 72.91| 147 | |**5**| [(Engstrom et al., 2019)](https://github.com/MadryLab/robustness)| *available*| RN-50| 90.83| 70.11| 69.24| 148 | |**6**| [(Rice et al., 2020)](https://arxiv.org/abs/2002.11569)| *available*| RN-18| 88.67| 71.6| 67.68| 149 | |**7**| [(Rony et al., 2019)](https://arxiv.org/abs/1811.09600)| *available*| WRN-28-10| 89.05| 67.6| 66.44| 150 | |**8**| [(Ding et al., 2020)](https://openreview.net/forum?id=HkeryxBtPB)| *available*| WRN-28-4| 88.02| 66.18| 66.09| 151 | 152 | # How to use AutoAttack 153 | 154 | ### Installation 155 | 156 | ``` 157 | pip install git+https://github.com/fra31/auto-attack 158 | ``` 159 | 160 | ### PyTorch models 161 | Import and initialize AutoAttack with 162 | 163 | ```python 164 | from autoattack import AutoAttack 165 | adversary = AutoAttack(forward_pass, norm='Linf', eps=epsilon, version='standard') 166 | ``` 167 | 168 | where: 169 | + `forward_pass` returns the logits and takes input with components in [0, 1] (NCHW format expected), 170 | + `norm = ['Linf' | 'L2' | 'L1']` is the norm of the threat model, 171 | + `eps` is the bound on the norm of the adversarial perturbations, 172 | + `version = 'standard'` uses the standard version of AA. 173 | 174 | To apply the standard evaluation, where the attacks are run sequentially on batches of size `bs` of `images`, use 175 | 176 | ```python 177 | x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size) 178 | ``` 179 | 180 | To run the attacks individually, use 181 | 182 | ```python 183 | dict_adv = adversary.run_standard_evaluation_individual(images, labels, bs=batch_size) 184 | ``` 185 | 186 | which returns a dictionary with the adversarial examples found by each attack. 187 | 188 | To specify a subset of attacks add e.g. `adversary.attacks_to_run = ['apgd-ce']`. 189 | 190 | ### TensorFlow models 191 | To evaluate models implemented in TensorFlow 1.X, use 192 | 193 | ```python 194 | from autoattack import utils_tf 195 | model_adapted = utils_tf.ModelAdapter(logits, x_input, y_input, sess) 196 | 197 | from autoattack import AutoAttack 198 | adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True) 199 | ``` 200 | 201 | where: 202 | + `logits` is the tensor with the logits given by the model, 203 | + `x_input` is a placeholder for the input for the classifier (NHWC format expected), 204 | + `y_input` is a placeholder for the correct labels, 205 | + `sess` is a TF session. 206 | 207 | If TensorFlow's version is 2.X, use 208 | 209 | ```python 210 | from autoattack import utils_tf2 211 | model_adapted = utils_tf2.ModelAdapter(tf_model) 212 | 213 | from autoattack import AutoAttack 214 | adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True) 215 | ``` 216 | 217 | where: 218 | + `tf_model` is tf.keras model without activation function 'softmax' 219 | 220 | The evaluation can be run in the same way as done with PT models. 221 | 222 | ### Examples 223 | Examples of how to use AutoAttack can be found in `examples/`. To run the standard evaluation on a pretrained 224 | PyTorch model on CIFAR-10 use 225 | ``` 226 | python eval.py [--individual] --version=['standard' | 'plus'] 227 | ``` 228 | where the optional flags activate respectively the *individual* evaluations (all the attacks are run on the full test set) and the *version* of AA to use (see below). 229 | 230 | ## Other versions 231 | ### AutoAttack+ 232 | A more expensive evaluation can be used specifying `version='plus'` when initializing AutoAttack. This includes 233 | + *untargeted APGD-CE* (5 restarts), 234 | + *untargeted APGD-DLR* (5 restarts), 235 | + *untargeted FAB* (5 restarts), 236 | + *Square Attack* (5000 queries), 237 | + *targeted APGD-DLR* (9 target classes), 238 | + *targeted FAB* (9 target classes). 239 | 240 | ### Randomized defenses 241 | In case of classifiers with stochastic components one can combine AA with Expectation over Transformation (EoT) as in [(Athalye et al., 2018)](https://arxiv.org/abs/1802.00420) specifying `version='rand'` when initializing AutoAttack. 242 | This runs 243 | + *untargeted APGD-CE* (no restarts, 20 iterations for EoT), 244 | + *untargeted APGD-DLR* (no restarts, 20 iterations for EoT). 245 | 246 | ### Custom version 247 | It is possible to customize the attacks to run specifying `version='custom'` when initializing the attack and then, for example, 248 | ```python 249 | if args.version == 'custom': 250 | adversary.attacks_to_run = ['apgd-ce', 'fab'] 251 | adversary.apgd.n_restarts = 2 252 | adversary.fab.n_restarts = 2 253 | ``` 254 | 255 | ## Other options 256 | ### Random seed 257 | It is possible to fix the random seed used for the attacks with, e.g., `adversary.seed = 0`. In this case the same seed is used for all the attacks used, otherwise a different random seed is picked for each attack. 258 | 259 | ### Log results 260 | To log the intermediate results of the evaluation specify `log_path=/path/to/logfile.txt` when initializing the attack. 261 | 262 | ## Citation 263 | ``` 264 | @inproceedings{croce2020reliable, 265 | title = {Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks}, 266 | author = {Francesco Croce and Matthias Hein}, 267 | booktitle = {ICML}, 268 | year = {2020} 269 | } 270 | ``` 271 | 272 | ``` 273 | @inproceedings{croce2021mind, 274 | title={Mind the box: $l_1$-APGD for sparse adversarial attacks on image classifiers}, 275 | author={Francesco Croce and Matthias Hein}, 276 | booktitle={ICML}, 277 | year={2021} 278 | } 279 | ``` 280 | -------------------------------------------------------------------------------- /autoattack/__init__.py: -------------------------------------------------------------------------------- 1 | from .autoattack import AutoAttack 2 | -------------------------------------------------------------------------------- /autoattack/autoattack.py: -------------------------------------------------------------------------------- 1 | import math 2 | import time 3 | 4 | import numpy as np 5 | import torch 6 | 7 | from .other_utils import Logger 8 | from autoattack import checks 9 | from autoattack.state import EvaluationState 10 | 11 | 12 | class AutoAttack(): 13 | def __init__(self, model, norm='Linf', eps=.3, seed=None, verbose=True, 14 | attacks_to_run=[], version='standard', is_tf_model=False, 15 | device='cuda', log_path=None): 16 | self.model = model 17 | self.norm = norm 18 | assert norm in ['Linf', 'L2', 'L1'] 19 | self.epsilon = eps 20 | self.seed = seed 21 | self.verbose = verbose 22 | self.attacks_to_run = attacks_to_run 23 | self.version = version 24 | self.is_tf_model = is_tf_model 25 | self.device = device 26 | self.logger = Logger(log_path) 27 | 28 | if version in ['standard', 'plus', 'rand'] and attacks_to_run != []: 29 | raise ValueError("attacks_to_run will be overridden unless you use version='custom'") 30 | 31 | if not self.is_tf_model: 32 | from .autopgd_base import APGDAttack 33 | self.apgd = APGDAttack(self.model, n_restarts=5, n_iter=100, verbose=False, 34 | eps=self.epsilon, norm=self.norm, eot_iter=1, rho=.75, seed=self.seed, 35 | device=self.device, logger=self.logger) 36 | 37 | from .fab_pt import FABAttack_PT 38 | self.fab = FABAttack_PT(self.model, n_restarts=5, n_iter=100, eps=self.epsilon, seed=self.seed, 39 | norm=self.norm, verbose=False, device=self.device) 40 | 41 | from .square import SquareAttack 42 | self.square = SquareAttack(self.model, p_init=.8, n_queries=5000, eps=self.epsilon, norm=self.norm, 43 | n_restarts=1, seed=self.seed, verbose=False, device=self.device, resc_schedule=False) 44 | 45 | from .autopgd_base import APGDAttack_targeted 46 | self.apgd_targeted = APGDAttack_targeted(self.model, n_restarts=1, n_iter=100, verbose=False, 47 | eps=self.epsilon, norm=self.norm, eot_iter=1, rho=.75, seed=self.seed, device=self.device, 48 | logger=self.logger) 49 | 50 | else: 51 | from .autopgd_base import APGDAttack 52 | self.apgd = APGDAttack(self.model, n_restarts=5, n_iter=100, verbose=False, 53 | eps=self.epsilon, norm=self.norm, eot_iter=1, rho=.75, seed=self.seed, device=self.device, 54 | is_tf_model=True, logger=self.logger) 55 | 56 | from .fab_tf import FABAttack_TF 57 | self.fab = FABAttack_TF(self.model, n_restarts=5, n_iter=100, eps=self.epsilon, seed=self.seed, 58 | norm=self.norm, verbose=False, device=self.device) 59 | 60 | from .square import SquareAttack 61 | self.square = SquareAttack(self.model.predict, p_init=.8, n_queries=5000, eps=self.epsilon, norm=self.norm, 62 | n_restarts=1, seed=self.seed, verbose=False, device=self.device, resc_schedule=False) 63 | 64 | from .autopgd_base import APGDAttack_targeted 65 | self.apgd_targeted = APGDAttack_targeted(self.model, n_restarts=1, n_iter=100, verbose=False, 66 | eps=self.epsilon, norm=self.norm, eot_iter=1, rho=.75, seed=self.seed, device=self.device, 67 | is_tf_model=True, logger=self.logger) 68 | 69 | if version in ['standard', 'plus', 'rand']: 70 | self.set_version(version) 71 | 72 | def get_logits(self, x): 73 | if not self.is_tf_model: 74 | return self.model(x) 75 | else: 76 | return self.model.predict(x) 77 | 78 | def get_seed(self): 79 | return time.time() if self.seed is None else self.seed 80 | 81 | def run_standard_evaluation(self, 82 | x_orig, 83 | y_orig, 84 | bs=250, 85 | return_labels=False, 86 | state_path=None): 87 | if state_path is not None and state_path.exists(): 88 | state = EvaluationState.from_disk(state_path) 89 | if set(self.attacks_to_run) != state.attacks_to_run: 90 | raise ValueError("The state was created with a different set of attacks " 91 | "to run. You are probably using the wrong state file.") 92 | if self.verbose: 93 | self.logger.log("Restored state from {}".format(state_path)) 94 | self.logger.log("Since the state has been restored, **only** " 95 | "the adversarial examples from the current run " 96 | "are going to be returned.") 97 | else: 98 | state = EvaluationState(set(self.attacks_to_run), path=state_path) 99 | state.to_disk() 100 | if self.verbose and state_path is not None: 101 | self.logger.log("Created state in {}".format(state_path)) 102 | 103 | attacks_to_run = list(filter(lambda attack: attack not in state.run_attacks, self.attacks_to_run)) 104 | if self.verbose: 105 | self.logger.log('using {} version including {}.'.format(self.version, 106 | ', '.join(attacks_to_run))) 107 | if state.run_attacks: 108 | self.logger.log('{} was/were already run.'.format(', '.join(state.run_attacks))) 109 | 110 | # checks on type of defense 111 | if self.version != 'rand': 112 | checks.check_randomized(self.get_logits, x_orig[:bs].to(self.device), 113 | y_orig[:bs].to(self.device), bs=bs, logger=self.logger) 114 | n_cls = checks.check_range_output(self.get_logits, x_orig[:bs].to(self.device), 115 | logger=self.logger) 116 | checks.check_dynamic(self.model, x_orig[:bs].to(self.device), self.is_tf_model, 117 | logger=self.logger) 118 | checks.check_n_classes(n_cls, self.attacks_to_run, self.apgd_targeted.n_target_classes, 119 | self.fab.n_target_classes, logger=self.logger) 120 | 121 | with torch.no_grad(): 122 | # calculate accuracy 123 | n_batches = int(np.ceil(x_orig.shape[0] / bs)) 124 | if state.robust_flags is None: 125 | robust_flags = torch.zeros(x_orig.shape[0], dtype=torch.bool, device=x_orig.device) 126 | y_adv = torch.empty_like(y_orig) 127 | for batch_idx in range(n_batches): 128 | start_idx = batch_idx * bs 129 | end_idx = min( (batch_idx + 1) * bs, x_orig.shape[0]) 130 | 131 | x = x_orig[start_idx:end_idx, :].clone().to(self.device) 132 | y = y_orig[start_idx:end_idx].clone().to(self.device) 133 | output = self.get_logits(x).max(dim=1)[1] 134 | y_adv[start_idx: end_idx] = output 135 | correct_batch = y.eq(output) 136 | robust_flags[start_idx:end_idx] = correct_batch.detach().to(robust_flags.device) 137 | 138 | state.robust_flags = robust_flags 139 | robust_accuracy = torch.sum(robust_flags).item() / x_orig.shape[0] 140 | robust_accuracy_dict = {'clean': robust_accuracy} 141 | state.clean_accuracy = robust_accuracy 142 | 143 | if self.verbose: 144 | self.logger.log('initial accuracy: {:.2%}'.format(robust_accuracy)) 145 | else: 146 | robust_flags = state.robust_flags.to(x_orig.device) 147 | robust_accuracy = torch.sum(robust_flags).item() / x_orig.shape[0] 148 | robust_accuracy_dict = {'clean': state.clean_accuracy} 149 | if self.verbose: 150 | self.logger.log('initial clean accuracy: {:.2%}'.format(state.clean_accuracy)) 151 | self.logger.log('robust accuracy at the time of restoring the state: {:.2%}'.format(robust_accuracy)) 152 | 153 | x_adv = x_orig.clone().detach() 154 | startt = time.time() 155 | for attack in attacks_to_run: 156 | # item() is super important as pytorch int division uses floor rounding 157 | num_robust = torch.sum(robust_flags).item() 158 | 159 | if num_robust == 0: 160 | break 161 | 162 | n_batches = int(np.ceil(num_robust / bs)) 163 | 164 | robust_lin_idcs = torch.nonzero(robust_flags, as_tuple=False) 165 | if num_robust > 1: 166 | robust_lin_idcs.squeeze_() 167 | 168 | for batch_idx in range(n_batches): 169 | start_idx = batch_idx * bs 170 | end_idx = min((batch_idx + 1) * bs, num_robust) 171 | 172 | batch_datapoint_idcs = robust_lin_idcs[start_idx:end_idx] 173 | if len(batch_datapoint_idcs.shape) > 1: 174 | batch_datapoint_idcs.squeeze_(-1) 175 | x = x_orig[batch_datapoint_idcs, :].clone().to(self.device) 176 | y = y_orig[batch_datapoint_idcs].clone().to(self.device) 177 | 178 | # make sure that x is a 4d tensor even if there is only a single datapoint left 179 | if len(x.shape) == 3: 180 | x.unsqueeze_(dim=0) 181 | 182 | # run attack 183 | if attack == 'apgd-ce': 184 | # apgd on cross-entropy loss 185 | self.apgd.loss = 'ce' 186 | self.apgd.seed = self.get_seed() 187 | adv_curr = self.apgd.perturb(x, y) #cheap=True 188 | 189 | elif attack == 'apgd-dlr': 190 | # apgd on dlr loss 191 | self.apgd.loss = 'dlr' 192 | self.apgd.seed = self.get_seed() 193 | adv_curr = self.apgd.perturb(x, y) #cheap=True 194 | 195 | elif attack == 'fab': 196 | # fab 197 | self.fab.targeted = False 198 | self.fab.seed = self.get_seed() 199 | adv_curr = self.fab.perturb(x, y) 200 | 201 | elif attack == 'square': 202 | # square 203 | self.square.seed = self.get_seed() 204 | adv_curr = self.square.perturb(x, y) 205 | 206 | elif attack == 'apgd-t': 207 | # targeted apgd 208 | self.apgd_targeted.seed = self.get_seed() 209 | adv_curr = self.apgd_targeted.perturb(x, y) #cheap=True 210 | 211 | elif attack == 'fab-t': 212 | # fab targeted 213 | self.fab.targeted = True 214 | self.fab.n_restarts = 1 215 | self.fab.seed = self.get_seed() 216 | adv_curr = self.fab.perturb(x, y) 217 | 218 | else: 219 | raise ValueError('Attack not supported') 220 | 221 | output = self.get_logits(adv_curr).max(dim=1)[1] 222 | false_batch = ~y.eq(output).to(robust_flags.device) 223 | non_robust_lin_idcs = batch_datapoint_idcs[false_batch] 224 | robust_flags[non_robust_lin_idcs] = False 225 | state.robust_flags = robust_flags 226 | 227 | x_adv[non_robust_lin_idcs] = adv_curr[false_batch].detach().to(x_adv.device) 228 | y_adv[non_robust_lin_idcs] = output[false_batch].detach().to(x_adv.device) 229 | 230 | if self.verbose: 231 | num_non_robust_batch = torch.sum(false_batch) 232 | self.logger.log('{} - {}/{} - {} out of {} successfully perturbed'.format( 233 | attack, batch_idx + 1, n_batches, num_non_robust_batch, x.shape[0])) 234 | 235 | robust_accuracy = torch.sum(robust_flags).item() / x_orig.shape[0] 236 | robust_accuracy_dict[attack] = robust_accuracy 237 | state.add_run_attack(attack) 238 | if self.verbose: 239 | self.logger.log('robust accuracy after {}: {:.2%} (total time {:.1f} s)'.format( 240 | attack.upper(), robust_accuracy, time.time() - startt)) 241 | 242 | # check about square 243 | checks.check_square_sr(robust_accuracy_dict, logger=self.logger) 244 | state.to_disk(force=True) 245 | 246 | # final check 247 | if self.verbose: 248 | if self.norm == 'Linf': 249 | res = (x_adv - x_orig).abs().reshape(x_orig.shape[0], -1).max(1)[0] 250 | elif self.norm == 'L2': 251 | res = ((x_adv - x_orig) ** 2).reshape(x_orig.shape[0], -1).sum(-1).sqrt() 252 | elif self.norm == 'L1': 253 | res = (x_adv - x_orig).abs().reshape(x_orig.shape[0], -1).sum(dim=-1) 254 | self.logger.log('max {} perturbation: {:.5f}, nan in tensor: {}, max: {:.5f}, min: {:.5f}'.format( 255 | self.norm, res.max(), (x_adv != x_adv).sum(), x_adv.max(), x_adv.min())) 256 | self.logger.log('robust accuracy: {:.2%}'.format(robust_accuracy)) 257 | if return_labels: 258 | return x_adv, y_adv 259 | else: 260 | return x_adv 261 | 262 | def clean_accuracy(self, x_orig, y_orig, bs=250): 263 | n_batches = math.ceil(x_orig.shape[0] / bs) 264 | acc = 0. 265 | for counter in range(n_batches): 266 | x = x_orig[counter * bs:min((counter + 1) * bs, x_orig.shape[0])].clone().to(self.device) 267 | y = y_orig[counter * bs:min((counter + 1) * bs, x_orig.shape[0])].clone().to(self.device) 268 | output = self.get_logits(x) 269 | acc += (output.max(1)[1] == y).float().sum() 270 | 271 | if self.verbose: 272 | print('clean accuracy: {:.2%}'.format(acc / x_orig.shape[0])) 273 | 274 | return acc.item() / x_orig.shape[0] 275 | 276 | def run_standard_evaluation_individual(self, x_orig, y_orig, bs=250, return_labels=False): 277 | if self.verbose: 278 | print('using {} version including {}'.format(self.version, 279 | ', '.join(self.attacks_to_run))) 280 | 281 | l_attacks = self.attacks_to_run 282 | adv = {} 283 | verbose_indiv = self.verbose 284 | self.verbose = False 285 | 286 | for c in l_attacks: 287 | startt = time.time() 288 | self.attacks_to_run = [c] 289 | x_adv, y_adv = self.run_standard_evaluation(x_orig, y_orig, bs=bs, return_labels=True) 290 | if return_labels: 291 | adv[c] = (x_adv, y_adv) 292 | else: 293 | adv[c] = x_adv 294 | if verbose_indiv: 295 | acc_indiv = self.clean_accuracy(x_adv, y_orig, bs=bs) 296 | space = '\t \t' if c == 'fab' else '\t' 297 | self.logger.log('robust accuracy by {} {} {:.2%} \t (time attack: {:.1f} s)'.format( 298 | c.upper(), space, acc_indiv, time.time() - startt)) 299 | 300 | return adv 301 | 302 | def set_version(self, version='standard'): 303 | if self.verbose: 304 | print('setting parameters for {} version'.format(version)) 305 | 306 | if version == 'standard': 307 | self.attacks_to_run = ['apgd-ce', 'apgd-t', 'fab-t', 'square'] 308 | if self.norm in ['Linf', 'L2']: 309 | self.apgd.n_restarts = 1 310 | self.apgd_targeted.n_target_classes = 9 311 | elif self.norm in ['L1']: 312 | self.apgd.use_largereps = True 313 | self.apgd_targeted.use_largereps = True 314 | self.apgd.n_restarts = 5 315 | self.apgd_targeted.n_target_classes = 5 316 | self.fab.n_restarts = 1 317 | self.apgd_targeted.n_restarts = 1 318 | self.fab.n_target_classes = 9 319 | #self.apgd_targeted.n_target_classes = 9 320 | self.square.n_queries = 5000 321 | 322 | elif version == 'plus': 323 | self.attacks_to_run = ['apgd-ce', 'apgd-dlr', 'fab', 'square', 'apgd-t', 'fab-t'] 324 | self.apgd.n_restarts = 5 325 | self.fab.n_restarts = 5 326 | self.apgd_targeted.n_restarts = 1 327 | self.fab.n_target_classes = 9 328 | self.apgd_targeted.n_target_classes = 9 329 | self.square.n_queries = 5000 330 | if not self.norm in ['Linf', 'L2']: 331 | print('"{}" version is used with {} norm: please check'.format( 332 | version, self.norm)) 333 | 334 | elif version == 'rand': 335 | self.attacks_to_run = ['apgd-ce', 'apgd-dlr'] 336 | self.apgd.n_restarts = 1 337 | self.apgd.eot_iter = 20 338 | 339 | -------------------------------------------------------------------------------- /autoattack/autopgd_base.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020-present, Francesco Croce 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree 6 | # 7 | 8 | import time 9 | import torch 10 | import torch.nn as nn 11 | import torch.nn.functional as F 12 | import math 13 | import random 14 | 15 | from autoattack.other_utils import L0_norm, L1_norm, L2_norm 16 | from autoattack.checks import check_zero_gradients 17 | 18 | 19 | def L1_projection(x2, y2, eps1): 20 | ''' 21 | x2: center of the L1 ball (bs x input_dim) 22 | y2: current perturbation (x2 + y2 is the point to be projected) 23 | eps1: radius of the L1 ball 24 | 25 | output: delta s.th. ||y2 + delta||_1 <= eps1 26 | and 0 <= x2 + y2 + delta <= 1 27 | ''' 28 | 29 | x = x2.clone().float().view(x2.shape[0], -1) 30 | y = y2.clone().float().view(y2.shape[0], -1) 31 | sigma = y.clone().sign() 32 | u = torch.min(1 - x - y, x + y) 33 | #u = torch.min(u, epsinf - torch.clone(y).abs()) 34 | u = torch.min(torch.zeros_like(y), u) 35 | l = -torch.clone(y).abs() 36 | d = u.clone() 37 | 38 | bs, indbs = torch.sort(-torch.cat((u, l), 1), dim=1) 39 | bs2 = torch.cat((bs[:, 1:], torch.zeros(bs.shape[0], 1).to(bs.device)), 1) 40 | 41 | inu = 2*(indbs < u.shape[1]).float() - 1 42 | size1 = inu.cumsum(dim=1) 43 | 44 | s1 = -u.sum(dim=1) 45 | 46 | c = eps1 - y.clone().abs().sum(dim=1) 47 | c5 = s1 + c < 0 48 | c2 = c5.nonzero().squeeze(1) 49 | 50 | s = s1.unsqueeze(-1) + torch.cumsum((bs2 - bs) * size1, dim=1) 51 | 52 | if c2.nelement != 0: 53 | 54 | lb = torch.zeros_like(c2).float() 55 | ub = torch.ones_like(lb) *(bs.shape[1] - 1) 56 | 57 | #print(c2.shape, lb.shape) 58 | 59 | nitermax = torch.ceil(torch.log2(torch.tensor(bs.shape[1]).float())) 60 | counter2 = torch.zeros_like(lb).long() 61 | counter = 0 62 | 63 | while counter < nitermax: 64 | counter4 = torch.floor((lb + ub) / 2.) 65 | counter2 = counter4.type(torch.LongTensor) 66 | 67 | c8 = s[c2, counter2] + c[c2] < 0 68 | ind3 = c8.nonzero().squeeze(1) 69 | ind32 = (~c8).nonzero().squeeze(1) 70 | #print(ind3.shape) 71 | if ind3.nelement != 0: 72 | lb[ind3] = counter4[ind3] 73 | if ind32.nelement != 0: 74 | ub[ind32] = counter4[ind32] 75 | 76 | #print(lb, ub) 77 | counter += 1 78 | 79 | lb2 = lb.long() 80 | alpha = (-s[c2, lb2] -c[c2]) / size1[c2, lb2 + 1] + bs2[c2, lb2] 81 | d[c2] = -torch.min(torch.max(-u[c2], alpha.unsqueeze(-1)), -l[c2]) 82 | 83 | return (sigma * d).view(x2.shape) 84 | 85 | 86 | 87 | 88 | 89 | class APGDAttack(): 90 | """ 91 | AutoPGD 92 | https://arxiv.org/abs/2003.01690 93 | 94 | :param predict: forward pass function 95 | :param norm: Lp-norm of the attack ('Linf', 'L2', 'L0' supported) 96 | :param n_restarts: number of random restarts 97 | :param n_iter: number of iterations 98 | :param eps: bound on the norm of perturbations 99 | :param seed: random seed for the starting point 100 | :param loss: loss to optimize ('ce', 'dlr' supported) 101 | :param eot_iter: iterations for Expectation over Trasformation 102 | :param rho: parameter for decreasing the step size 103 | """ 104 | 105 | def __init__( 106 | self, 107 | predict, 108 | n_iter=100, 109 | norm='Linf', 110 | n_restarts=1, 111 | eps=None, 112 | seed=0, 113 | loss='ce', 114 | eot_iter=1, 115 | rho=.75, 116 | topk=None, 117 | verbose=False, 118 | device=None, 119 | use_largereps=False, 120 | is_tf_model=False, 121 | logger=None): 122 | """ 123 | AutoPGD implementation in PyTorch 124 | """ 125 | 126 | self.model = predict 127 | self.n_iter = n_iter 128 | self.eps = eps 129 | self.norm = norm 130 | self.n_restarts = n_restarts 131 | self.seed = seed 132 | self.loss = loss 133 | self.eot_iter = eot_iter 134 | self.thr_decr = rho 135 | self.topk = topk 136 | self.verbose = verbose 137 | self.device = device 138 | self.use_rs = True 139 | #self.init_point = None 140 | self.use_largereps = use_largereps 141 | #self.larger_epss = None 142 | #self.iters = None 143 | self.n_iter_orig = n_iter + 0 144 | self.eps_orig = eps + 0. 145 | self.is_tf_model = is_tf_model 146 | self.y_target = None 147 | self.logger = logger 148 | 149 | assert self.norm in ['Linf', 'L2', 'L1'] 150 | assert not self.eps is None 151 | 152 | ### set parameters for checkpoints 153 | self.n_iter_2 = max(int(0.22 * self.n_iter), 1) 154 | self.n_iter_min = max(int(0.06 * self.n_iter), 1) 155 | self.size_decr = max(int(0.03 * self.n_iter), 1) 156 | 157 | def init_hyperparam(self, x): 158 | 159 | if self.device is None: 160 | self.device = x.device 161 | self.orig_dim = list(x.shape[1:]) 162 | self.ndims = len(self.orig_dim) 163 | if self.seed is None: 164 | self.seed = time.time() 165 | 166 | def check_oscillation(self, x, j, k, y5, k3=0.75): 167 | t = torch.zeros(x.shape[1]).to(self.device) 168 | for counter5 in range(k): 169 | t += (x[j - counter5] > x[j - counter5 - 1]).float() 170 | 171 | return (t <= k * k3 * torch.ones_like(t)).float() 172 | 173 | def check_shape(self, x): 174 | return x if len(x.shape) > 0 else x.unsqueeze(0) 175 | 176 | def normalize(self, x): 177 | if self.norm == 'Linf': 178 | t = x.abs().view(x.shape[0], -1).max(1)[0] 179 | 180 | elif self.norm == 'L2': 181 | t = (x ** 2).view(x.shape[0], -1).sum(-1).sqrt() 182 | 183 | elif self.norm == 'L1': 184 | try: 185 | t = x.abs().view(x.shape[0], -1).sum(dim=-1) 186 | except: 187 | t = x.abs().reshape([x.shape[0], -1]).sum(dim=-1) 188 | 189 | return x / (t.view(-1, *([1] * self.ndims)) + 1e-12) 190 | 191 | def dlr_loss(self, x, y): 192 | x_sorted, ind_sorted = x.sort(dim=1) 193 | ind = (ind_sorted[:, -1] == y).float() 194 | u = torch.arange(x.shape[0]) 195 | 196 | return -(x[u, y] - x_sorted[:, -2] * ind - x_sorted[:, -1] * ( 197 | 1. - ind)) / (x_sorted[:, -1] - x_sorted[:, -3] + 1e-12) 198 | 199 | # 200 | 201 | def attack_single_run(self, x, y, x_init=None): 202 | if len(x.shape) < self.ndims: 203 | x = x.unsqueeze(0) 204 | y = y.unsqueeze(0) 205 | 206 | if self.norm == 'Linf': 207 | t = 2 * torch.rand(x.shape).to(self.device).detach() - 1 208 | x_adv = x + self.eps * torch.ones_like(x 209 | ).detach() * self.normalize(t) 210 | elif self.norm == 'L2': 211 | t = torch.randn(x.shape).to(self.device).detach() 212 | x_adv = x + self.eps * torch.ones_like(x 213 | ).detach() * self.normalize(t) 214 | elif self.norm == 'L1': 215 | t = torch.randn(x.shape).to(self.device).detach() 216 | delta = L1_projection(x, t, self.eps) 217 | x_adv = x + t + delta 218 | 219 | 220 | 221 | 222 | 223 | if not x_init is None: 224 | x_adv = x_init.clone() 225 | if self.norm == 'L1' and self.verbose: 226 | print('[custom init] L1 perturbation {:.5f}'.format( 227 | (x_adv - x).abs().view(x.shape[0], -1).sum(1).max())) 228 | 229 | 230 | x_adv = x_adv.clamp(0., 1.) 231 | x_best = x_adv.clone() 232 | x_best_adv = x_adv.clone() 233 | loss_steps = torch.zeros([self.n_iter, x.shape[0]] 234 | ).to(self.device) 235 | loss_best_steps = torch.zeros([self.n_iter + 1, x.shape[0]] 236 | ).to(self.device) 237 | acc_steps = torch.zeros_like(loss_best_steps) 238 | 239 | if not self.is_tf_model: 240 | if self.loss == 'ce': 241 | criterion_indiv = nn.CrossEntropyLoss(reduction='none') 242 | elif self.loss == 'ce-targeted-cfts': 243 | criterion_indiv = lambda x, y: -1. * F.cross_entropy(x, y, 244 | reduction='none') 245 | elif self.loss == 'dlr': 246 | criterion_indiv = self.dlr_loss 247 | elif self.loss == 'dlr-targeted': 248 | criterion_indiv = self.dlr_loss_targeted 249 | elif self.loss == 'ce-targeted': 250 | criterion_indiv = self.ce_loss_targeted 251 | else: 252 | raise ValueError('unknowkn loss') 253 | else: 254 | if self.loss == 'ce': 255 | criterion_indiv = self.model.get_logits_loss_grad_xent 256 | elif self.loss == 'dlr': 257 | criterion_indiv = self.model.get_logits_loss_grad_dlr 258 | elif self.loss == 'dlr-targeted': 259 | criterion_indiv = self.model.get_logits_loss_grad_target 260 | else: 261 | raise ValueError('unknowkn loss') 262 | 263 | 264 | x_adv.requires_grad_() 265 | grad = torch.zeros_like(x) 266 | for _ in range(self.eot_iter): 267 | if not self.is_tf_model: 268 | with torch.enable_grad(): 269 | logits = self.model(x_adv) 270 | loss_indiv = criterion_indiv(logits, y) 271 | loss = loss_indiv.sum() 272 | 273 | grad += torch.autograd.grad(loss, [x_adv])[0].detach() 274 | else: 275 | if self.y_target is None: 276 | logits, loss_indiv, grad_curr = criterion_indiv(x_adv, y) 277 | else: 278 | logits, loss_indiv, grad_curr = criterion_indiv(x_adv, y, 279 | self.y_target) 280 | grad += grad_curr 281 | 282 | grad /= float(self.eot_iter) 283 | grad_best = grad.clone() 284 | 285 | if self.loss in ['dlr', 'dlr-targeted']: 286 | # check if there are zero gradients 287 | check_zero_gradients(grad, logger=self.logger) 288 | 289 | acc = logits.detach().max(1)[1] == y 290 | acc_steps[0] = acc + 0 291 | loss_best = loss_indiv.detach().clone() 292 | 293 | alpha = 2. if self.norm in ['Linf', 'L2'] else 1. if self.norm in ['L1'] else 2e-2 294 | step_size = alpha * self.eps * torch.ones([x.shape[0], *( 295 | [1] * self.ndims)]).to(self.device).detach() 296 | x_adv_old = x_adv.clone() 297 | counter = 0 298 | k = self.n_iter_2 + 0 299 | n_fts = math.prod(self.orig_dim) 300 | if self.norm == 'L1': 301 | k = max(int(.04 * self.n_iter), 1) 302 | if x_init is None: 303 | topk = .2 * torch.ones([x.shape[0]], device=self.device) 304 | sp_old = n_fts * torch.ones_like(topk) 305 | else: 306 | topk = L0_norm(x_adv - x) / n_fts / 1.5 307 | sp_old = L0_norm(x_adv - x) 308 | #print(topk[0], sp_old[0]) 309 | adasp_redstep = 1.5 310 | adasp_minstep = 10. 311 | #print(step_size[0].item()) 312 | counter3 = 0 313 | 314 | loss_best_last_check = loss_best.clone() 315 | reduced_last_check = torch.ones_like(loss_best) 316 | n_reduced = 0 317 | 318 | u = torch.arange(x.shape[0], device=self.device) 319 | for i in range(self.n_iter): 320 | ### gradient step 321 | with torch.no_grad(): 322 | x_adv = x_adv.detach() 323 | grad2 = x_adv - x_adv_old 324 | x_adv_old = x_adv.clone() 325 | 326 | a = 0.75 if i > 0 else 1.0 327 | 328 | if self.norm == 'Linf': 329 | x_adv_1 = x_adv + step_size * torch.sign(grad) 330 | x_adv_1 = torch.clamp(torch.min(torch.max(x_adv_1, 331 | x - self.eps), x + self.eps), 0.0, 1.0) 332 | x_adv_1 = torch.clamp(torch.min(torch.max( 333 | x_adv + (x_adv_1 - x_adv) * a + grad2 * (1 - a), 334 | x - self.eps), x + self.eps), 0.0, 1.0) 335 | 336 | elif self.norm == 'L2': 337 | x_adv_1 = x_adv + step_size * self.normalize(grad) 338 | x_adv_1 = torch.clamp(x + self.normalize(x_adv_1 - x 339 | ) * torch.min(self.eps * torch.ones_like(x).detach(), 340 | L2_norm(x_adv_1 - x, keepdim=True)), 0.0, 1.0) 341 | x_adv_1 = x_adv + (x_adv_1 - x_adv) * a + grad2 * (1 - a) 342 | x_adv_1 = torch.clamp(x + self.normalize(x_adv_1 - x 343 | ) * torch.min(self.eps * torch.ones_like(x).detach(), 344 | L2_norm(x_adv_1 - x, keepdim=True)), 0.0, 1.0) 345 | 346 | elif self.norm == 'L1': 347 | grad_topk = grad.abs().view(x.shape[0], -1).sort(-1)[0] 348 | topk_curr = torch.clamp((1. - topk) * n_fts, min=0, max=n_fts - 1).long() 349 | grad_topk = grad_topk[u, topk_curr].view(-1, *[1]*(len(x.shape) - 1)) 350 | sparsegrad = grad * (grad.abs() >= grad_topk).float() 351 | x_adv_1 = x_adv + step_size * sparsegrad.sign() / ( 352 | L1_norm(sparsegrad.sign(), keepdim=True) + 1e-10) 353 | 354 | delta_u = x_adv_1 - x 355 | delta_p = L1_projection(x, delta_u, self.eps) 356 | x_adv_1 = x + delta_u + delta_p 357 | 358 | 359 | x_adv = x_adv_1 + 0. 360 | 361 | ### get gradient 362 | x_adv.requires_grad_() 363 | grad = torch.zeros_like(x) 364 | for _ in range(self.eot_iter): 365 | if not self.is_tf_model: 366 | with torch.enable_grad(): 367 | logits = self.model(x_adv) 368 | loss_indiv = criterion_indiv(logits, y) 369 | loss = loss_indiv.sum() 370 | 371 | grad += torch.autograd.grad(loss, [x_adv])[0].detach() 372 | else: 373 | if self.y_target is None: 374 | logits, loss_indiv, grad_curr = criterion_indiv(x_adv, y) 375 | else: 376 | logits, loss_indiv, grad_curr = criterion_indiv(x_adv, y, self.y_target) 377 | grad += grad_curr 378 | 379 | grad /= float(self.eot_iter) 380 | 381 | pred = logits.detach().max(1)[1] == y 382 | acc = torch.min(acc, pred) 383 | acc_steps[i + 1] = acc + 0 384 | ind_pred = (pred == 0).nonzero().squeeze() 385 | x_best_adv[ind_pred] = x_adv[ind_pred] + 0. 386 | if self.verbose: 387 | str_stats = ' - step size: {:.5f} - topk: {:.2f}'.format( 388 | step_size.mean(), topk.mean() * n_fts) if self.norm in ['L1'] else '' 389 | print('[m] iteration: {} - best loss: {:.6f} - robust accuracy: {:.2%}{}'.format( 390 | i, loss_best.sum(), acc.float().mean(), str_stats)) 391 | #print('pert {}'.format((x - x_best_adv).abs().view(x.shape[0], -1).sum(-1).max())) 392 | 393 | ### check step size 394 | with torch.no_grad(): 395 | y1 = loss_indiv.detach().clone() 396 | loss_steps[i] = y1 + 0 397 | ind = (y1 > loss_best).nonzero().squeeze() 398 | x_best[ind] = x_adv[ind].clone() 399 | grad_best[ind] = grad[ind].clone() 400 | loss_best[ind] = y1[ind] + 0 401 | loss_best_steps[i + 1] = loss_best + 0 402 | 403 | counter3 += 1 404 | 405 | if counter3 == k: 406 | if self.norm in ['Linf', 'L2']: 407 | fl_oscillation = self.check_oscillation(loss_steps, i, k, 408 | loss_best, k3=self.thr_decr) 409 | fl_reduce_no_impr = (1. - reduced_last_check) * ( 410 | loss_best_last_check >= loss_best).float() 411 | fl_oscillation = torch.max(fl_oscillation, 412 | fl_reduce_no_impr) 413 | reduced_last_check = fl_oscillation.clone() 414 | loss_best_last_check = loss_best.clone() 415 | 416 | if fl_oscillation.sum() > 0: 417 | ind_fl_osc = (fl_oscillation > 0).nonzero().squeeze() 418 | step_size[ind_fl_osc] /= 2.0 419 | n_reduced = fl_oscillation.sum() 420 | 421 | x_adv[ind_fl_osc] = x_best[ind_fl_osc].clone() 422 | grad[ind_fl_osc] = grad_best[ind_fl_osc].clone() 423 | 424 | k = max(k - self.size_decr, self.n_iter_min) 425 | 426 | elif self.norm == 'L1': 427 | sp_curr = L0_norm(x_best - x) 428 | fl_redtopk = (sp_curr / sp_old) < .95 429 | topk = sp_curr / n_fts / 1.5 430 | step_size[fl_redtopk] = alpha * self.eps 431 | step_size[~fl_redtopk] /= adasp_redstep 432 | step_size.clamp_(alpha * self.eps / adasp_minstep, alpha * self.eps) 433 | sp_old = sp_curr.clone() 434 | 435 | x_adv[fl_redtopk] = x_best[fl_redtopk].clone() 436 | grad[fl_redtopk] = grad_best[fl_redtopk].clone() 437 | 438 | counter3 = 0 439 | #k = max(k - self.size_decr, self.n_iter_min) 440 | 441 | # 442 | 443 | return (x_best, acc, loss_best, x_best_adv) 444 | 445 | def perturb(self, x, y=None, best_loss=False, x_init=None): 446 | """ 447 | :param x: clean images 448 | :param y: clean labels, if None we use the predicted labels 449 | :param best_loss: if True the points attaining highest loss 450 | are returned, otherwise adversarial examples 451 | """ 452 | 453 | assert self.loss in ['ce', 'dlr'] #'ce-targeted-cfts' 454 | if not y is None and len(y.shape) == 0: 455 | x.unsqueeze_(0) 456 | y.unsqueeze_(0) 457 | self.init_hyperparam(x) 458 | 459 | x = x.detach().clone().float().to(self.device) 460 | if not self.is_tf_model: 461 | y_pred = self.model(x).max(1)[1] 462 | else: 463 | y_pred = self.model.predict(x).max(1)[1] 464 | if y is None: 465 | #y_pred = self.predict(x).max(1)[1] 466 | y = y_pred.detach().clone().long().to(self.device) 467 | else: 468 | y = y.detach().clone().long().to(self.device) 469 | 470 | adv = x.clone() 471 | if self.loss != 'ce-targeted': 472 | acc = y_pred == y 473 | else: 474 | acc = y_pred != y 475 | loss = -1e10 * torch.ones_like(acc).float() 476 | if self.verbose: 477 | print('-------------------------- ', 478 | 'running {}-attack with epsilon {:.5f}'.format( 479 | self.norm, self.eps), 480 | '--------------------------') 481 | print('initial accuracy: {:.2%}'.format(acc.float().mean())) 482 | 483 | 484 | 485 | if self.use_largereps: 486 | epss = [3. * self.eps_orig, 2. * self.eps_orig, 1. * self.eps_orig] 487 | iters = [.3 * self.n_iter_orig, .3 * self.n_iter_orig, 488 | .4 * self.n_iter_orig] 489 | iters = [math.ceil(c) for c in iters] 490 | iters[-1] = self.n_iter_orig - sum(iters[:-1]) # make sure to use the given iterations 491 | if self.verbose: 492 | print('using schedule [{}x{}]'.format('+'.join([str(c 493 | ) for c in epss]), '+'.join([str(c) for c in iters]))) 494 | 495 | startt = time.time() 496 | if not best_loss: 497 | torch.random.manual_seed(self.seed) 498 | torch.cuda.random.manual_seed(self.seed) 499 | 500 | for counter in range(self.n_restarts): 501 | ind_to_fool = acc.nonzero().squeeze() 502 | if len(ind_to_fool.shape) == 0: 503 | ind_to_fool = ind_to_fool.unsqueeze(0) 504 | if ind_to_fool.numel() != 0: 505 | x_to_fool = x[ind_to_fool].clone() 506 | y_to_fool = y[ind_to_fool].clone() 507 | 508 | 509 | if not self.use_largereps: 510 | res_curr = self.attack_single_run(x_to_fool, y_to_fool) 511 | else: 512 | res_curr = self.decr_eps_pgd(x_to_fool, y_to_fool, epss, iters) 513 | best_curr, acc_curr, loss_curr, adv_curr = res_curr 514 | ind_curr = (acc_curr == 0).nonzero().squeeze() 515 | 516 | acc[ind_to_fool[ind_curr]] = 0 517 | adv[ind_to_fool[ind_curr]] = adv_curr[ind_curr].clone() 518 | if self.verbose: 519 | print('restart {} - robust accuracy: {:.2%}'.format( 520 | counter, acc.float().mean()), 521 | '- cum. time: {:.1f} s'.format( 522 | time.time() - startt)) 523 | 524 | return adv 525 | 526 | else: 527 | adv_best = x.detach().clone() 528 | loss_best = torch.ones([x.shape[0]]).to( 529 | self.device) * (-float('inf')) 530 | for counter in range(self.n_restarts): 531 | best_curr, _, loss_curr, _ = self.attack_single_run(x, y) 532 | ind_curr = (loss_curr > loss_best).nonzero().squeeze() 533 | adv_best[ind_curr] = best_curr[ind_curr] + 0. 534 | loss_best[ind_curr] = loss_curr[ind_curr] + 0. 535 | 536 | if self.verbose: 537 | print('restart {} - loss: {:.5f}'.format( 538 | counter, loss_best.sum())) 539 | 540 | return adv_best 541 | 542 | def decr_eps_pgd(self, x, y, epss, iters, use_rs=True): 543 | assert len(epss) == len(iters) 544 | assert self.norm in ['L1'] 545 | self.use_rs = False 546 | if not use_rs: 547 | x_init = None 548 | else: 549 | x_init = x + torch.randn_like(x) 550 | x_init += L1_projection(x, x_init - x, 1. * float(epss[0])) 551 | eps_target = float(epss[-1]) 552 | if self.verbose: 553 | print('total iter: {}'.format(sum(iters))) 554 | for eps, niter in zip(epss, iters): 555 | if self.verbose: 556 | print('using eps: {:.2f}'.format(eps)) 557 | self.n_iter = niter + 0 558 | self.eps = eps + 0. 559 | # 560 | if not x_init is None: 561 | x_init += L1_projection(x, x_init - x, 1. * eps) 562 | x_init, acc, loss, x_adv = self.attack_single_run(x, y, x_init=x_init) 563 | 564 | return (x_init, acc, loss, x_adv) 565 | 566 | class APGDAttack_targeted(APGDAttack): 567 | def __init__( 568 | self, 569 | predict, 570 | n_iter=100, 571 | norm='Linf', 572 | n_restarts=1, 573 | eps=None, 574 | seed=0, 575 | eot_iter=1, 576 | rho=.75, 577 | topk=None, 578 | n_target_classes=9, 579 | verbose=False, 580 | device=None, 581 | use_largereps=False, 582 | is_tf_model=False, 583 | logger=None): 584 | """ 585 | AutoPGD on the targeted DLR loss 586 | """ 587 | super(APGDAttack_targeted, self).__init__(predict, n_iter=n_iter, norm=norm, 588 | n_restarts=n_restarts, eps=eps, seed=seed, loss='dlr-targeted', 589 | eot_iter=eot_iter, rho=rho, topk=topk, verbose=verbose, device=device, 590 | use_largereps=use_largereps, is_tf_model=is_tf_model, logger=logger) 591 | 592 | self.y_target = None 593 | self.n_target_classes = n_target_classes 594 | 595 | def dlr_loss_targeted(self, x, y): 596 | x_sorted, ind_sorted = x.sort(dim=1) 597 | u = torch.arange(x.shape[0]) 598 | 599 | return -(x[u, y] - x[u, self.y_target]) / (x_sorted[:, -1] - .5 * ( 600 | x_sorted[:, -3] + x_sorted[:, -4]) + 1e-12) 601 | 602 | def ce_loss_targeted(self, x, y): 603 | return -1. * F.cross_entropy(x, self.y_target, reduction='none') 604 | 605 | 606 | def perturb(self, x, y=None, x_init=None): 607 | """ 608 | :param x: clean images 609 | :param y: clean labels, if None we use the predicted labels 610 | """ 611 | 612 | assert self.loss in ['dlr-targeted'] #'ce-targeted' 613 | if not y is None and len(y.shape) == 0: 614 | x.unsqueeze_(0) 615 | y.unsqueeze_(0) 616 | self.init_hyperparam(x) 617 | 618 | x = x.detach().clone().float().to(self.device) 619 | if not self.is_tf_model: 620 | y_pred = self.model(x).max(1)[1] 621 | else: 622 | y_pred = self.model.predict(x).max(1)[1] 623 | if y is None: 624 | #y_pred = self._get_predicted_label(x) 625 | y = y_pred.detach().clone().long().to(self.device) 626 | else: 627 | y = y.detach().clone().long().to(self.device) 628 | 629 | adv = x.clone() 630 | acc = y_pred == y 631 | if self.verbose: 632 | print('-------------------------- ', 633 | 'running {}-attack with epsilon {:.5f}'.format( 634 | self.norm, self.eps), 635 | '--------------------------') 636 | print('initial accuracy: {:.2%}'.format(acc.float().mean())) 637 | 638 | startt = time.time() 639 | 640 | torch.random.manual_seed(self.seed) 641 | torch.cuda.random.manual_seed(self.seed) 642 | 643 | # 644 | 645 | if self.use_largereps: 646 | epss = [3. * self.eps_orig, 2. * self.eps_orig, 1. * self.eps_orig] 647 | iters = [.3 * self.n_iter_orig, .3 * self.n_iter_orig, 648 | .4 * self.n_iter_orig] 649 | iters = [math.ceil(c) for c in iters] 650 | iters[-1] = self.n_iter_orig - sum(iters[:-1]) 651 | if self.verbose: 652 | print('using schedule [{}x{}]'.format('+'.join([str(c 653 | ) for c in epss]), '+'.join([str(c) for c in iters]))) 654 | 655 | for target_class in range(2, self.n_target_classes + 2): 656 | for counter in range(self.n_restarts): 657 | ind_to_fool = acc.nonzero().squeeze() 658 | if len(ind_to_fool.shape) == 0: 659 | ind_to_fool = ind_to_fool.unsqueeze(0) 660 | if ind_to_fool.numel() != 0: 661 | x_to_fool = x[ind_to_fool].clone() 662 | y_to_fool = y[ind_to_fool].clone() 663 | 664 | if not self.is_tf_model: 665 | output = self.model(x_to_fool) 666 | else: 667 | output = self.model.predict(x_to_fool) 668 | self.y_target = output.sort(dim=1)[1][:, -target_class] 669 | 670 | if not self.use_largereps: 671 | res_curr = self.attack_single_run(x_to_fool, y_to_fool) 672 | else: 673 | res_curr = self.decr_eps_pgd(x_to_fool, y_to_fool, epss, iters) 674 | best_curr, acc_curr, loss_curr, adv_curr = res_curr 675 | ind_curr = (acc_curr == 0).nonzero().squeeze() 676 | 677 | acc[ind_to_fool[ind_curr]] = 0 678 | adv[ind_to_fool[ind_curr]] = adv_curr[ind_curr].clone() 679 | if self.verbose: 680 | print('target class {}'.format(target_class), 681 | '- restart {} - robust accuracy: {:.2%}'.format( 682 | counter, acc.float().mean()), 683 | '- cum. time: {:.1f} s'.format( 684 | time.time() - startt)) 685 | 686 | return adv 687 | 688 | -------------------------------------------------------------------------------- /autoattack/checks.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import warnings 3 | import math 4 | import sys 5 | 6 | from autoattack.other_utils import L2_norm 7 | 8 | 9 | funcs = {'grad': 0, 10 | 'backward': 0, 11 | #'enable_grad': 0 12 | '_make_grads': 0, 13 | } 14 | 15 | checks_doc_path = 'flags_doc.md' 16 | 17 | 18 | def check_randomized(model, x, y, bs=250, n=5, alpha=1e-4, logger=None): 19 | acc = [] 20 | corrcl = [] 21 | outputs = [] 22 | with torch.no_grad(): 23 | for _ in range(n): 24 | output = model(x) 25 | corrcl_curr = (output.max(1)[1] == y).sum().item() 26 | corrcl.append(corrcl_curr) 27 | outputs.append(output / (L2_norm(output, keepdim=True) + 1e-10)) 28 | acc = [c != corrcl_curr for c in corrcl] 29 | max_diff = 0. 30 | for c in range(n - 1): 31 | for e in range(c + 1, n): 32 | diff = L2_norm(outputs[c] - outputs[e]) 33 | max_diff = max(max_diff, diff.max().item()) 34 | #print(diff.max().item(), max_diff) 35 | if any(acc) or max_diff > alpha: 36 | msg = 'it seems to be a randomized defense! Please use version="rand".' + \ 37 | f' See {checks_doc_path} for details.' 38 | if logger is None: 39 | warnings.warn(Warning(msg)) 40 | else: 41 | logger.log(f'Warning: {msg}') 42 | 43 | 44 | def check_range_output(model, x, alpha=1e-5, logger=None): 45 | with torch.no_grad(): 46 | output = model(x) 47 | fl = [output.max() < 1. + alpha, output.min() > -alpha, 48 | ((output.sum(-1) - 1.).abs() < alpha).all()] 49 | if all(fl): 50 | msg = 'it seems that the output is a probability distribution,' +\ 51 | ' please be sure that the logits are used!' + \ 52 | f' See {checks_doc_path} for details.' 53 | if logger is None: 54 | warnings.warn(Warning(msg)) 55 | else: 56 | logger.log(f'Warning: {msg}') 57 | return output.shape[-1] 58 | 59 | 60 | def check_zero_gradients(grad, logger=None): 61 | z = grad.view(grad.shape[0], -1).abs().sum(-1) 62 | #print(grad[0, :10]) 63 | if (z == 0).any(): 64 | msg = f'there are {(z == 0).sum()} points with zero gradient!' + \ 65 | ' This might lead to unreliable evaluation with gradient-based attacks.' + \ 66 | f' See {checks_doc_path} for details.' 67 | if logger is None: 68 | warnings.warn(Warning(msg)) 69 | else: 70 | logger.log(f'Warning: {msg}') 71 | 72 | 73 | def check_square_sr(acc_dict, alpha=.002, logger=None): 74 | if 'square' in acc_dict.keys() and len(acc_dict) > 2: 75 | acc = min([v for k, v in acc_dict.items() if k != 'square']) 76 | if acc_dict['square'] < acc - alpha: 77 | msg = 'Square Attack has decreased the robust accuracy of' + \ 78 | f' {acc - acc_dict["square"]:.2%}.' + \ 79 | ' This might indicate that the robustness evaluation using' +\ 80 | ' AutoAttack is unreliable. Consider running Square' +\ 81 | ' Attack with more iterations and restarts or an adaptive attack.' + \ 82 | f' See {checks_doc_path} for details.' 83 | if logger is None: 84 | warnings.warn(Warning(msg)) 85 | else: 86 | logger.log(f'Warning: {msg}') 87 | 88 | 89 | ''' from https://stackoverflow.com/questions/26119521/counting-function-calls-python ''' 90 | def tracefunc(frame, event, args): 91 | if event == 'call' and frame.f_code.co_name in funcs.keys(): 92 | funcs[frame.f_code.co_name] += 1 93 | 94 | 95 | def check_dynamic(model, x, is_tf_model=False, logger=None): 96 | if is_tf_model: 97 | msg = 'the check for dynamic defenses is not currently supported' 98 | else: 99 | msg = None 100 | sys.settrace(tracefunc) 101 | model(x) 102 | sys.settrace(None) 103 | #for k, v in funcs.items(): 104 | # print(k, v) 105 | if any([c > 0 for c in funcs.values()]): 106 | msg = 'it seems to be a dynamic defense! The evaluation' + \ 107 | ' with AutoAttack might be insufficient.' + \ 108 | f' See {checks_doc_path} for details.' 109 | if not msg is None: 110 | if logger is None: 111 | warnings.warn(Warning(msg)) 112 | else: 113 | logger.log(f'Warning: {msg}') 114 | #sys.settrace(None) 115 | 116 | 117 | def check_n_classes(n_cls, attacks_to_run, apgd_targets, fab_targets, 118 | logger=None): 119 | msg = None 120 | if 'apgd-dlr' in attacks_to_run or 'apgd-t' in attacks_to_run: 121 | if n_cls <= 2: 122 | msg = f'with only {n_cls} classes it is not possible to use the DLR loss!' 123 | elif n_cls == 3: 124 | msg = f'with only {n_cls} classes it is not possible to use the targeted DLR loss!' 125 | elif 'apgd-t' in attacks_to_run and \ 126 | apgd_targets + 1 > n_cls: 127 | msg = f'it seems that more target classes ({apgd_targets})' + \ 128 | f' than possible ({n_cls - 1}) are used in {"apgd-t".upper()}!' 129 | if 'fab-t' in attacks_to_run and fab_targets + 1 > n_cls: 130 | if msg is None: 131 | msg = f'it seems that more target classes ({apgd_targets})' + \ 132 | f' than possible ({n_cls - 1}) are used in FAB-T!' 133 | else: 134 | msg += f' Also, it seems that too many target classes ({apgd_targets})' + \ 135 | f' are used in {"fab-t".upper()} ({n_cls - 1} possible)!' 136 | if not msg is None: 137 | if logger is None: 138 | warnings.warn(Warning(msg)) 139 | else: 140 | logger.log(f'Warning: {msg}') 141 | 142 | 143 | -------------------------------------------------------------------------------- /autoattack/examples/eval.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | from pathlib import Path 4 | import warnings 5 | 6 | import torch 7 | import torch.nn as nn 8 | import torchvision.datasets as datasets 9 | import torch.utils.data as data 10 | import torchvision.transforms as transforms 11 | 12 | import sys 13 | sys.path.insert(0,'..') 14 | 15 | from resnet import * 16 | 17 | if __name__ == '__main__': 18 | parser = argparse.ArgumentParser() 19 | parser.add_argument('--data_dir', type=str, default='./data') 20 | parser.add_argument('--norm', type=str, default='Linf') 21 | parser.add_argument('--epsilon', type=float, default=8./255.) 22 | parser.add_argument('--model', type=str, default='./model_test.pt') 23 | parser.add_argument('--n_ex', type=int, default=1000) 24 | parser.add_argument('--individual', action='store_true') 25 | parser.add_argument('--save_dir', type=str, default='./results') 26 | parser.add_argument('--batch_size', type=int, default=500) 27 | parser.add_argument('--log_path', type=str, default='./log_file.txt') 28 | parser.add_argument('--version', type=str, default='standard') 29 | parser.add_argument('--state-path', type=Path, default=None) 30 | 31 | args = parser.parse_args() 32 | 33 | # load model 34 | model = ResNet18() 35 | ckpt = torch.load(args.model) 36 | model.load_state_dict(ckpt) 37 | model.cuda() 38 | model.eval() 39 | 40 | # load data 41 | transform_list = [transforms.ToTensor()] 42 | transform_chain = transforms.Compose(transform_list) 43 | item = datasets.CIFAR10(root=args.data_dir, train=False, transform=transform_chain, download=True) 44 | test_loader = data.DataLoader(item, batch_size=1000, shuffle=False, num_workers=0) 45 | 46 | # create save dir 47 | if not os.path.exists(args.save_dir): 48 | os.makedirs(args.save_dir) 49 | 50 | # load attack 51 | from autoattack import AutoAttack 52 | adversary = AutoAttack(model, norm=args.norm, eps=args.epsilon, log_path=args.log_path, 53 | version=args.version) 54 | 55 | l = [x for (x, y) in test_loader] 56 | x_test = torch.cat(l, 0) 57 | l = [y for (x, y) in test_loader] 58 | y_test = torch.cat(l, 0) 59 | 60 | # example of custom version 61 | if args.version == 'custom': 62 | adversary.attacks_to_run = ['apgd-ce', 'fab'] 63 | adversary.apgd.n_restarts = 2 64 | adversary.fab.n_restarts = 2 65 | 66 | # run attack and save images 67 | with torch.no_grad(): 68 | if not args.individual: 69 | adv_complete = adversary.run_standard_evaluation(x_test[:args.n_ex], y_test[:args.n_ex], 70 | bs=args.batch_size, state_path=args.state_path) 71 | 72 | torch.save({'adv_complete': adv_complete}, '{}/{}_{}_1_{}_eps_{:.5f}.pth'.format( 73 | args.save_dir, 'aa', args.version, adv_complete.shape[0], args.epsilon)) 74 | 75 | else: 76 | # individual version, each attack is run on all test points 77 | adv_complete = adversary.run_standard_evaluation_individual(x_test[:args.n_ex], 78 | y_test[:args.n_ex], bs=args.batch_size) 79 | 80 | torch.save(adv_complete, '{}/{}_{}_individual_1_{}_eps_{:.5f}_plus_{}_cheap_{}.pth'.format( 81 | args.save_dir, 'aa', args.version, args.n_ex, args.epsilon)) 82 | 83 | -------------------------------------------------------------------------------- /autoattack/examples/eval_tf1.py: -------------------------------------------------------------------------------- 1 | #%% 2 | from argparse import ArgumentParser 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | 7 | import torch 8 | import torch.nn as nn 9 | import torchvision.datasets as datasets 10 | import torch.utils.data as data 11 | import torchvision.transforms as transforms 12 | 13 | import sys 14 | #sys.path.insert(0,'..') 15 | 16 | from autoattack import AutoAttack, utils_tf 17 | # 18 | 19 | #%% 20 | class mnist_loader: 21 | def __init__(self): 22 | 23 | self.n_class = 10 24 | self.dim_x = 28 25 | self.dim_y = 28 26 | self.dim_z = 1 27 | self.img_min = 0.0 28 | self.img_max = 1.0 29 | self.epsilon = 0.3 30 | 31 | def download(self): 32 | (trainX, trainY), (testX, testY) = tf.keras.datasets.mnist.load_data() 33 | 34 | trainX = trainX.astype(np.float32) 35 | testX = testX.astype(np.float32) 36 | 37 | # ont-hot 38 | trainY = tf.keras.utils.to_categorical(trainY, self.n_class) 39 | testY = tf.keras.utils.to_categorical(testY , self.n_class) 40 | 41 | # get validation sets 42 | training_size = 55000 43 | validX = trainX[training_size:,:] 44 | validY = trainY[training_size:,:] 45 | 46 | trainX = trainX[:training_size,:] 47 | trainY = trainY[:training_size,:] 48 | 49 | # expand dimesion 50 | trainX = np.expand_dims(trainX, axis=3) 51 | validX = np.expand_dims(validX, axis=3) 52 | testX = np.expand_dims(testX , axis=3) 53 | 54 | return trainX, trainY, validX, validY, testX, testY 55 | 56 | def get_raw_data(self): 57 | return self.download() 58 | 59 | def get_normalized_data(self): 60 | trainX, trainY, validX, validY, testX, testY = self.get_raw_data() 61 | trainX = trainX / 255.0 * (self.img_max - self.img_min) + self.img_min 62 | validX = validX / 255.0 * (self.img_max - self.img_min) + self.img_min 63 | testX = testX / 255.0 * (self.img_max - self.img_min) + self.img_min 64 | trainY = trainY 65 | validY = validY 66 | testY = testY 67 | return trainX, trainY, validX, validY, testX, testY 68 | 69 | #%% 70 | def mnist_model(): 71 | # declare variables 72 | model_layers = [ tf.keras.layers.Input(shape=(28,28,1), name="model/input"), 73 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c1"), 74 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c2"), 75 | tf.keras.layers.MaxPooling2D(pool_size=(2, 2), name="clf/p1"), 76 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c3"), 77 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c4"), 78 | tf.keras.layers.MaxPooling2D(pool_size=(2, 2), name="clf/p2"), 79 | tf.keras.layers.Flatten(name="clf/f1"), 80 | tf.keras.layers.Dense(256, activation="relu", kernel_initializer='he_normal', name="clf/d1"), 81 | tf.keras.layers.Dense(10 , activation=None , kernel_initializer='he_normal', name="clf/d2"), 82 | tf.keras.layers.Activation('softmax', name="clf_output") 83 | ] 84 | 85 | # clf_model 86 | clf_model = tf.keras.Sequential() 87 | for ii in model_layers: 88 | clf_model.add(ii) 89 | 90 | clf_model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy']) 91 | clf_model.summary() 92 | 93 | return clf_model 94 | 95 | #%% 96 | def arg_parser(parser): 97 | 98 | parser.add_argument("--path" , dest ="path", type=str, default='./', help="path of tf.keras model's wieghts") 99 | args, unknown = parser.parse_known_args() 100 | if unknown: 101 | msg = " ".join(unknown) 102 | print('[Warning] Unrecognized arguments: {:s}'.format(msg) ) 103 | 104 | return args 105 | 106 | #%% 107 | if __name__ == '__main__': 108 | 109 | # get arguments 110 | parser = ArgumentParser() 111 | args = arg_parser(parser) 112 | 113 | # MODEL PATH 114 | MODEL_PATH = args.path 115 | 116 | # init tf/keras 117 | tf.compat.v1.keras.backend.clear_session() 118 | gpu_options = tf.compat.v1.GPUOptions(allow_growth=True) 119 | sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(gpu_options=gpu_options)) 120 | tf.compat.v1.keras.backend.set_session(sess) 121 | tf.compat.v1.keras.backend.set_learning_phase(0) 122 | 123 | # load data 124 | batch_size = 1000 125 | epsilon = mnist_loader().epsilon 126 | _, _, _, _, testX, testY = mnist_loader().get_normalized_data() 127 | 128 | # convert to pytorch format 129 | testY = np.argmax(testY, axis=1) 130 | torch_testX = torch.from_numpy( np.transpose(testX, (0, 3, 1, 2)) ).float().cuda() 131 | torch_testY = torch.from_numpy( testY ).float() 132 | 133 | # load model from saved weights 134 | print('[INFO] MODEL_PATH: {:s}'.format(MODEL_PATH) ) 135 | tf_model = mnist_model() 136 | tf_model.load_weights(MODEL_PATH) 137 | 138 | # remove 'softmax layer' and put it into adapter 139 | atk_model = tf.keras.models.Model(inputs=tf_model.input, outputs=tf_model.get_layer(index=-2).output) 140 | atk_model.summary() 141 | y_input = tf.placeholder(tf.int64, shape = [None]) 142 | x_input = atk_model.input 143 | logits = atk_model.output 144 | model_adapted = utils_tf.ModelAdapter(logits, x_input, y_input, sess) 145 | 146 | # run attack 147 | adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True) 148 | x_adv, y_adv = adversary.run_standard_evaluation(torch_testX, torch_testY, bs=batch_size, return_labels=True) 149 | np_x_adv = np.moveaxis(x_adv.cpu().numpy(), 1, 3) 150 | np.save("./output/mnist_adv.npy", np_x_adv) 151 | -------------------------------------------------------------------------------- /autoattack/examples/eval_tf2.py: -------------------------------------------------------------------------------- 1 | #%% 2 | from argparse import ArgumentParser 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | 7 | import torch 8 | import torch.nn as nn 9 | import torchvision.datasets as datasets 10 | import torch.utils.data as data 11 | import torchvision.transforms as transforms 12 | 13 | import sys 14 | sys.path.insert(0, '..') 15 | 16 | from autoattack import AutoAttack, utils_tf2 17 | 18 | 19 | #%% 20 | class mnist_loader: 21 | def __init__(self): 22 | 23 | self.n_class = 10 24 | self.dim_x = 28 25 | self.dim_y = 28 26 | self.dim_z = 1 27 | self.img_min = 0.0 28 | self.img_max = 1.0 29 | self.epsilon = 0.3 30 | 31 | def download(self): 32 | (trainX, trainY), (testX, testY) = tf.keras.datasets.mnist.load_data() 33 | 34 | trainX = trainX.astype(np.float32) 35 | testX = testX.astype(np.float32) 36 | 37 | # ont-hot 38 | trainY = tf.keras.utils.to_categorical(trainY, self.n_class) 39 | testY = tf.keras.utils.to_categorical(testY , self.n_class) 40 | 41 | # get validation sets 42 | training_size = 55000 43 | validX = trainX[training_size:,:] 44 | validY = trainY[training_size:,:] 45 | 46 | trainX = trainX[:training_size,:] 47 | trainY = trainY[:training_size,:] 48 | 49 | # expand dimesion 50 | trainX = np.expand_dims(trainX, axis=3) 51 | validX = np.expand_dims(validX, axis=3) 52 | testX = np.expand_dims(testX , axis=3) 53 | 54 | return trainX, trainY, validX, validY, testX, testY 55 | 56 | def get_raw_data(self): 57 | return self.download() 58 | 59 | def get_normalized_data(self): 60 | trainX, trainY, validX, validY, testX, testY = self.get_raw_data() 61 | trainX = trainX / 255.0 * (self.img_max - self.img_min) + self.img_min 62 | validX = validX / 255.0 * (self.img_max - self.img_min) + self.img_min 63 | testX = testX / 255.0 * (self.img_max - self.img_min) + self.img_min 64 | trainY = trainY 65 | validY = validY 66 | testY = testY 67 | return trainX, trainY, validX, validY, testX, testY 68 | 69 | #%% 70 | def mnist_model(): 71 | # declare variables 72 | model_layers = [ tf.keras.layers.Input(shape=(28,28,1), name="model/input"), 73 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c1"), 74 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c2"), 75 | tf.keras.layers.MaxPooling2D(pool_size=(2, 2), name="clf/p1"), 76 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c3"), 77 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c4"), 78 | tf.keras.layers.MaxPooling2D(pool_size=(2, 2), name="clf/p2"), 79 | tf.keras.layers.Flatten(name="clf/f1"), 80 | tf.keras.layers.Dense(256, activation="relu", kernel_initializer='he_normal', name="clf/d1"), 81 | tf.keras.layers.Dense(10 , activation=None , kernel_initializer='he_normal', name="clf/d2"), 82 | tf.keras.layers.Activation('softmax', name="clf_output") 83 | ] 84 | 85 | # clf_model 86 | clf_model = tf.keras.Sequential() 87 | for ii in model_layers: 88 | clf_model.add(ii) 89 | 90 | clf_model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy']) 91 | clf_model.summary() 92 | 93 | return clf_model 94 | 95 | #%% 96 | def arg_parser(parser): 97 | 98 | parser.add_argument("--path" , dest ="path", type=str, default='./autoattack/examples/tf_model.weight.h5', help="path of tf.keras model's wieghts") 99 | args, unknown = parser.parse_known_args() 100 | if unknown: 101 | msg = " ".join(unknown) 102 | print('[Warning] Unrecognized arguments: {:s}'.format(msg) ) 103 | 104 | return args 105 | 106 | #%% 107 | if __name__ == '__main__': 108 | 109 | # get arguments 110 | parser = ArgumentParser() 111 | args = arg_parser(parser) 112 | 113 | # MODEL PATH 114 | MODEL_PATH = args.path 115 | 116 | # init tf/keras 117 | gpus = tf.config.list_physical_devices('GPU') 118 | for gpu in gpus: 119 | tf.config.experimental.set_memory_growth(gpu, True) 120 | 121 | # load data 122 | batch_size = 1000 123 | epsilon = mnist_loader().epsilon 124 | _, _, _, _, testX, testY = mnist_loader().get_normalized_data() 125 | 126 | # convert to pytorch format 127 | testY = np.argmax(testY, axis=1) 128 | torch_testX = torch.from_numpy( np.transpose(testX, (0, 3, 1, 2)) ).float().cuda() 129 | torch_testY = torch.from_numpy( testY ).float() 130 | 131 | # load model from saved weights 132 | print('[INFO] MODEL_PATH: {:s}'.format(MODEL_PATH) ) 133 | tf_model = mnist_model() 134 | tf_model.load_weights(MODEL_PATH) 135 | 136 | # remove 'softmax layer' and put it into adapter 137 | atk_model = tf.keras.models.Model(inputs=tf_model.input, outputs=tf_model.get_layer(index=-2).output) 138 | atk_model.summary() 139 | model_adapted = utils_tf2.ModelAdapter(atk_model) 140 | 141 | # run attack 142 | adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True) 143 | x_adv, y_adv = adversary.run_standard_evaluation(torch_testX, torch_testY, bs=batch_size, return_labels=True) 144 | np_x_adv = np.moveaxis(x_adv.cpu().numpy(), 1, 3) 145 | np.save("./output/mnist_adv.npy", np_x_adv) 146 | -------------------------------------------------------------------------------- /autoattack/examples/model_test.pt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fra31/auto-attack/a39220048b3c9f2cca9a4d3a54604793c68eca7e/autoattack/examples/model_test.pt -------------------------------------------------------------------------------- /autoattack/examples/resnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class BasicBlock(nn.Module): 7 | expansion = 1 8 | 9 | def __init__(self, in_planes, planes, stride=1): 10 | super(BasicBlock, self).__init__() 11 | self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False) 12 | self.bn1 = nn.BatchNorm2d(planes) 13 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False) 14 | self.bn2 = nn.BatchNorm2d(planes) 15 | 16 | self.shortcut = nn.Sequential() 17 | if stride != 1 or in_planes != self.expansion * planes: 18 | self.shortcut = nn.Sequential( 19 | nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1, stride=stride, bias=False), 20 | nn.BatchNorm2d(self.expansion * planes) 21 | ) 22 | 23 | def forward(self, x): 24 | out = F.relu(self.bn1(self.conv1(x))) 25 | out = self.bn2(self.conv2(out)) 26 | out += self.shortcut(x) 27 | out = F.relu(out) 28 | return out 29 | 30 | 31 | class Bottleneck(nn.Module): 32 | expansion = 4 33 | 34 | def __init__(self, in_planes, planes, stride=1): 35 | super(Bottleneck, self).__init__() 36 | self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False) 37 | self.bn1 = nn.BatchNorm2d(planes) 38 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False) 39 | self.bn2 = nn.BatchNorm2d(planes) 40 | self.conv3 = nn.Conv2d(planes, self.expansion * planes, kernel_size=1, bias=False) 41 | self.bn3 = nn.BatchNorm2d(self.expansion * planes) 42 | 43 | self.shortcut = nn.Sequential() 44 | if stride != 1 or in_planes != self.expansion * planes: 45 | self.shortcut = nn.Sequential( 46 | nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1, stride=stride, bias=False), 47 | nn.BatchNorm2d(self.expansion * planes) 48 | ) 49 | 50 | def forward(self, x): 51 | out = F.relu(self.bn1(self.conv1(x))) 52 | out = F.relu(self.bn2(self.conv2(out))) 53 | out = self.bn3(self.conv3(out)) 54 | out += self.shortcut(x) 55 | out = F.relu(out) 56 | return out 57 | 58 | 59 | class ResNet(nn.Module): 60 | def __init__(self, block, num_blocks, num_classes=10): 61 | super(ResNet, self).__init__() 62 | self.in_planes = 64 63 | 64 | self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False) 65 | self.bn1 = nn.BatchNorm2d(64) 66 | self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1) 67 | self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2) 68 | self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2) 69 | self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2) 70 | self.linear = nn.Linear(512 * block.expansion, num_classes) 71 | 72 | def _make_layer(self, block, planes, num_blocks, stride): 73 | strides = [stride] + [1] * (num_blocks - 1) 74 | layers = [] 75 | for stride in strides: 76 | layers.append(block(self.in_planes, planes, stride)) 77 | self.in_planes = planes * block.expansion 78 | return nn.Sequential(*layers) 79 | 80 | def forward(self, x): 81 | out = F.relu(self.bn1(self.conv1(x))) 82 | out = self.layer1(out) 83 | out = self.layer2(out) 84 | out = self.layer3(out) 85 | out = self.layer4(out) 86 | out = F.avg_pool2d(out, 4) 87 | out = out.view(out.size(0), -1) 88 | out = self.linear(out) 89 | return out 90 | 91 | 92 | def ResNet18(): 93 | return ResNet(BasicBlock, [2, 2, 2, 2]) 94 | 95 | 96 | def ResNet34(): 97 | return ResNet(BasicBlock, [3, 4, 6, 3]) 98 | 99 | 100 | def ResNet50(): 101 | return ResNet(Bottleneck, [3, 4, 6, 3]) 102 | 103 | 104 | def ResNet101(): 105 | return ResNet(Bottleneck, [3, 4, 23, 3]) 106 | 107 | 108 | def ResNet152(): 109 | return ResNet(Bottleneck, [3, 8, 36, 3]) 110 | 111 | 112 | def test(): 113 | net = ResNet18() 114 | y = net(torch.randn(1, 3, 32, 32)) 115 | print(y.size()) 116 | -------------------------------------------------------------------------------- /autoattack/examples/tf_model.weight.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fra31/auto-attack/a39220048b3c9f2cca9a4d3a54604793c68eca7e/autoattack/examples/tf_model.weight.h5 -------------------------------------------------------------------------------- /autoattack/fab_base.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2019-present, Francesco Croce 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import time 14 | 15 | import torch 16 | 17 | from autoattack.fab_projections import projection_linf, projection_l2,\ 18 | projection_l1 19 | 20 | DEFAULT_EPS_DICT_BY_NORM = {'Linf': .3, 'L2': 1., 'L1': 5.0} 21 | 22 | 23 | class FABAttack(): 24 | """ 25 | Fast Adaptive Boundary Attack (Linf, L2, L1) 26 | https://arxiv.org/abs/1907.02044 27 | 28 | :param norm: Lp-norm to minimize ('Linf', 'L2', 'L1' supported) 29 | :param n_restarts: number of random restarts 30 | :param n_iter: number of iterations 31 | :param eps: epsilon for the random restarts 32 | :param alpha_max: alpha_max 33 | :param eta: overshooting 34 | :param beta: backward step 35 | """ 36 | 37 | def __init__( 38 | self, 39 | norm='Linf', 40 | n_restarts=1, 41 | n_iter=100, 42 | eps=None, 43 | alpha_max=0.1, 44 | eta=1.05, 45 | beta=0.9, 46 | loss_fn=None, 47 | verbose=False, 48 | seed=0, 49 | targeted=False, 50 | device=None, 51 | n_target_classes=9): 52 | """ FAB-attack implementation in pytorch """ 53 | 54 | self.norm = norm 55 | self.n_restarts = n_restarts 56 | self.n_iter = n_iter 57 | self.eps = eps if eps is not None else DEFAULT_EPS_DICT_BY_NORM[norm] 58 | self.alpha_max = alpha_max 59 | self.eta = eta 60 | self.beta = beta 61 | self.targeted = targeted 62 | self.verbose = verbose 63 | self.seed = seed 64 | self.target_class = None 65 | self.device = device 66 | self.n_target_classes = n_target_classes 67 | 68 | def check_shape(self, x): 69 | return x if len(x.shape) > 0 else x.unsqueeze(0) 70 | 71 | def _predict_fn(self, x): 72 | raise NotImplementedError("Virtual function.") 73 | 74 | def _get_predicted_label(self, x): 75 | raise NotImplementedError("Virtual function.") 76 | 77 | def get_diff_logits_grads_batch(self, imgs, la): 78 | raise NotImplementedError("Virtual function.") 79 | 80 | def get_diff_logits_grads_batch_targeted(self, imgs, la, la_target): 81 | raise NotImplementedError("Virtual function.") 82 | 83 | def attack_single_run(self, x, y=None, use_rand_start=False, is_targeted=False): 84 | """ 85 | :param x: clean images 86 | :param y: clean labels, if None we use the predicted labels 87 | :param is_targeted True if we ise targeted version. Targeted class is assigned by `self.target_class` 88 | """ 89 | 90 | if self.device is None: 91 | self.device = x.device 92 | self.orig_dim = list(x.shape[1:]) 93 | self.ndims = len(self.orig_dim) 94 | 95 | x = x.detach().clone().float().to(self.device) 96 | #assert next(self.predict.parameters()).device == x.device 97 | 98 | y_pred = self._get_predicted_label(x) 99 | if y is None: 100 | y = y_pred.detach().clone().long().to(self.device) 101 | else: 102 | y = y.detach().clone().long().to(self.device) 103 | pred = y_pred == y 104 | corr_classified = pred.float().sum() 105 | if self.verbose: 106 | print('Clean accuracy: {:.2%}'.format(pred.float().mean())) 107 | if pred.sum() == 0: 108 | return x 109 | pred = self.check_shape(pred.nonzero().squeeze()) 110 | 111 | if is_targeted: 112 | output = self._predict_fn(x) 113 | la_target = output.sort(dim=-1)[1][:, -self.target_class] 114 | la_target2 = la_target[pred].detach().clone() 115 | 116 | startt = time.time() 117 | # runs the attack only on correctly classified points 118 | im2 = x[pred].detach().clone() 119 | la2 = y[pred].detach().clone() 120 | if len(im2.shape) == self.ndims: 121 | im2 = im2.unsqueeze(0) 122 | bs = im2.shape[0] 123 | u1 = torch.arange(bs) 124 | adv = im2.clone() 125 | adv_c = x.clone() 126 | res2 = 1e10 * torch.ones([bs]).to(self.device) 127 | x1 = im2.clone() 128 | x0 = im2.clone().reshape([bs, -1]) 129 | 130 | if use_rand_start: 131 | if self.norm == 'Linf': 132 | t = 2 * torch.rand(x1.shape).to(self.device) - 1 133 | x1 = im2 + (torch.min(res2, 134 | self.eps * torch.ones(res2.shape) 135 | .to(self.device) 136 | ).reshape([-1, *[1]*self.ndims]) 137 | ) * t / (t.reshape([t.shape[0], -1]).abs() 138 | .max(dim=1, keepdim=True)[0] 139 | .reshape([-1, *[1]*self.ndims])) * .5 140 | elif self.norm == 'L2': 141 | t = torch.randn(x1.shape).to(self.device) 142 | x1 = im2 + (torch.min(res2, 143 | self.eps * torch.ones(res2.shape) 144 | .to(self.device) 145 | ).reshape([-1, *[1]*self.ndims]) 146 | ) * t / ((t ** 2) 147 | .view(t.shape[0], -1) 148 | .sum(dim=-1) 149 | .sqrt() 150 | .view(t.shape[0], *[1]*self.ndims)) * .5 151 | elif self.norm == 'L1': 152 | t = torch.randn(x1.shape).to(self.device) 153 | x1 = im2 + (torch.min(res2, 154 | self.eps * torch.ones(res2.shape) 155 | .to(self.device) 156 | ).reshape([-1, *[1]*self.ndims]) 157 | ) * t / (t.abs().view(t.shape[0], -1) 158 | .sum(dim=-1) 159 | .view(t.shape[0], *[1]*self.ndims)) / 2 160 | 161 | x1 = x1.clamp(0.0, 1.0) 162 | 163 | counter_iter = 0 164 | while counter_iter < self.n_iter: 165 | with torch.no_grad(): 166 | if is_targeted: 167 | df, dg = self.get_diff_logits_grads_batch_targeted(x1, la2, la_target2) 168 | else: 169 | df, dg = self.get_diff_logits_grads_batch(x1, la2) 170 | if self.norm == 'Linf': 171 | dist1 = df.abs() / (1e-12 + 172 | dg.abs() 173 | .reshape(dg.shape[0], dg.shape[1], -1) 174 | .sum(dim=-1)) 175 | elif self.norm == 'L2': 176 | dist1 = df.abs() / (1e-12 + (dg ** 2) 177 | .reshape(dg.shape[0], dg.shape[1], -1) 178 | .sum(dim=-1).sqrt()) 179 | elif self.norm == 'L1': 180 | dist1 = df.abs() / (1e-12 + dg.abs().reshape( 181 | [df.shape[0], df.shape[1], -1]).max(dim=2)[0]) 182 | else: 183 | raise ValueError('norm not supported') 184 | ind = dist1.min(dim=1)[1] 185 | dg2 = dg[u1, ind] 186 | b = (- df[u1, ind] + (dg2 * x1).reshape(x1.shape[0], -1) 187 | .sum(dim=-1)) 188 | w = dg2.reshape([bs, -1]) 189 | 190 | if self.norm == 'Linf': 191 | d3 = projection_linf( 192 | torch.cat((x1.reshape([bs, -1]), x0), 0), 193 | torch.cat((w, w), 0), 194 | torch.cat((b, b), 0)) 195 | elif self.norm == 'L2': 196 | d3 = projection_l2( 197 | torch.cat((x1.reshape([bs, -1]), x0), 0), 198 | torch.cat((w, w), 0), 199 | torch.cat((b, b), 0)) 200 | elif self.norm == 'L1': 201 | d3 = projection_l1( 202 | torch.cat((x1.reshape([bs, -1]), x0), 0), 203 | torch.cat((w, w), 0), 204 | torch.cat((b, b), 0)) 205 | d1 = torch.reshape(d3[:bs], x1.shape) 206 | d2 = torch.reshape(d3[-bs:], x1.shape) 207 | if self.norm == 'Linf': 208 | a0 = d3.abs().max(dim=1, keepdim=True)[0]\ 209 | .view(-1, *[1]*self.ndims) 210 | elif self.norm == 'L2': 211 | a0 = (d3 ** 2).sum(dim=1, keepdim=True).sqrt()\ 212 | .view(-1, *[1]*self.ndims) 213 | elif self.norm == 'L1': 214 | a0 = d3.abs().sum(dim=1, keepdim=True)\ 215 | .view(-1, *[1]*self.ndims) 216 | a0 = torch.max(a0, 1e-8 * torch.ones( 217 | a0.shape).to(self.device)) 218 | a1 = a0[:bs] 219 | a2 = a0[-bs:] 220 | alpha = torch.min(torch.max(a1 / (a1 + a2), 221 | torch.zeros(a1.shape) 222 | .to(self.device)), 223 | self.alpha_max * torch.ones(a1.shape) 224 | .to(self.device)) 225 | x1 = ((x1 + self.eta * d1) * (1 - alpha) + 226 | (im2 + d2 * self.eta) * alpha).clamp(0.0, 1.0) 227 | 228 | is_adv = self._get_predicted_label(x1) != la2 229 | 230 | if is_adv.sum() > 0: 231 | ind_adv = is_adv.nonzero().squeeze() 232 | ind_adv = self.check_shape(ind_adv) 233 | if self.norm == 'Linf': 234 | t = (x1[ind_adv] - im2[ind_adv]).reshape( 235 | [ind_adv.shape[0], -1]).abs().max(dim=1)[0] 236 | elif self.norm == 'L2': 237 | t = ((x1[ind_adv] - im2[ind_adv]) ** 2)\ 238 | .reshape(ind_adv.shape[0], -1).sum(dim=-1).sqrt() 239 | elif self.norm == 'L1': 240 | t = (x1[ind_adv] - im2[ind_adv])\ 241 | .abs().reshape(ind_adv.shape[0], -1).sum(dim=-1) 242 | adv[ind_adv] = x1[ind_adv] * (t < res2[ind_adv]).\ 243 | float().reshape([-1, *[1]*self.ndims]) + adv[ind_adv]\ 244 | * (t >= res2[ind_adv]).float().reshape( 245 | [-1, *[1]*self.ndims]) 246 | res2[ind_adv] = t * (t < res2[ind_adv]).float()\ 247 | + res2[ind_adv] * (t >= res2[ind_adv]).float() 248 | x1[ind_adv] = im2[ind_adv] + ( 249 | x1[ind_adv] - im2[ind_adv]) * self.beta 250 | 251 | counter_iter += 1 252 | 253 | ind_succ = res2 < 1e10 254 | if self.verbose: 255 | print('success rate: {:.0f}/{:.0f}' 256 | .format(ind_succ.float().sum(), corr_classified) + 257 | ' (on correctly classified points) in {:.1f} s' 258 | .format(time.time() - startt)) 259 | 260 | ind_succ = self.check_shape(ind_succ.nonzero().squeeze()) 261 | adv_c[pred[ind_succ]] = adv[ind_succ].clone() 262 | 263 | return adv_c 264 | 265 | def perturb(self, x, y): 266 | if self.device is None: 267 | self.device = x.device 268 | adv = x.clone() 269 | with torch.no_grad(): 270 | acc = self._predict_fn(x).max(1)[1] == y 271 | 272 | startt = time.time() 273 | 274 | torch.random.manual_seed(self.seed) 275 | torch.cuda.random.manual_seed(self.seed) 276 | 277 | if not self.targeted: 278 | for counter in range(self.n_restarts): 279 | ind_to_fool = acc.nonzero().squeeze() 280 | if len(ind_to_fool.shape) == 0: ind_to_fool = ind_to_fool.unsqueeze(0) 281 | if ind_to_fool.numel() != 0: 282 | x_to_fool, y_to_fool = x[ind_to_fool].clone(), y[ind_to_fool].clone() 283 | adv_curr = self.attack_single_run(x_to_fool, y_to_fool, use_rand_start=(counter > 0), is_targeted=False) 284 | 285 | acc_curr = self._predict_fn(adv_curr).max(1)[1] == y_to_fool 286 | if self.norm == 'Linf': 287 | res = (x_to_fool - adv_curr).abs().reshape(x_to_fool.shape[0], -1).max(1)[0] 288 | elif self.norm == 'L2': 289 | res = ((x_to_fool - adv_curr) ** 2).reshape(x_to_fool.shape[0], -1).sum(dim=-1).sqrt() 290 | elif self.norm == 'L1': 291 | res = (x_to_fool - adv_curr).abs().reshape(x_to_fool.shape[0], -1).sum(-1) 292 | acc_curr = torch.max(acc_curr, res > self.eps) 293 | 294 | ind_curr = (acc_curr == 0).nonzero().squeeze() 295 | acc[ind_to_fool[ind_curr]] = 0 296 | adv[ind_to_fool[ind_curr]] = adv_curr[ind_curr].clone() 297 | 298 | if self.verbose: 299 | print('restart {} - robust accuracy: {:.2%} at eps = {:.5f} - cum. time: {:.1f} s'.format( 300 | counter, acc.float().mean(), self.eps, time.time() - startt)) 301 | 302 | else: 303 | for target_class in range(2, self.n_target_classes + 2): 304 | self.target_class = target_class 305 | for counter in range(self.n_restarts): 306 | ind_to_fool = acc.nonzero().squeeze() 307 | if len(ind_to_fool.shape) == 0: ind_to_fool = ind_to_fool.unsqueeze(0) 308 | if ind_to_fool.numel() != 0: 309 | x_to_fool, y_to_fool = x[ind_to_fool].clone(), y[ind_to_fool].clone() 310 | adv_curr = self.attack_single_run(x_to_fool, y_to_fool, use_rand_start=(counter > 0), is_targeted=True) 311 | 312 | acc_curr = self._predict_fn(adv_curr).max(1)[1] == y_to_fool 313 | if self.norm == 'Linf': 314 | res = (x_to_fool - adv_curr).abs().reshape(x_to_fool.shape[0], -1).max(1)[0] 315 | elif self.norm == 'L2': 316 | res = ((x_to_fool - adv_curr) ** 2).reshape(x_to_fool.shape[0], -1).sum(dim=-1).sqrt() 317 | elif self.norm == 'L1': 318 | res = (x_to_fool - adv_curr).abs().reshape(x_to_fool.shape[0], -1).sum(-1) 319 | acc_curr = torch.max(acc_curr, res > self.eps) 320 | 321 | ind_curr = (acc_curr == 0).nonzero().squeeze() 322 | acc[ind_to_fool[ind_curr]] = 0 323 | adv[ind_to_fool[ind_curr]] = adv_curr[ind_curr].clone() 324 | 325 | if self.verbose: 326 | print('restart {} - target_class {} - robust accuracy: {:.2%} at eps = {:.5f} - cum. time: {:.1f} s'.format( 327 | counter, self.target_class, acc.float().mean(), self.eps, time.time() - startt)) 328 | 329 | return adv 330 | -------------------------------------------------------------------------------- /autoattack/fab_projections.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | from torch.nn import functional as F 5 | 6 | 7 | def projection_linf(points_to_project, w_hyperplane, b_hyperplane): 8 | device = points_to_project.device 9 | t, w, b = points_to_project, w_hyperplane.clone(), b_hyperplane.clone() 10 | 11 | sign = 2 * ((w * t).sum(1) - b >= 0) - 1 12 | w.mul_(sign.unsqueeze(1)) 13 | b.mul_(sign) 14 | 15 | a = (w < 0).float() 16 | d = (a - t) * (w != 0).float() 17 | 18 | p = a - t * (2 * a - 1) 19 | indp = torch.argsort(p, dim=1) 20 | 21 | b = b - (w * t).sum(1) 22 | b0 = (w * d).sum(1) 23 | 24 | indp2 = indp.flip((1,)) 25 | ws = w.gather(1, indp2) 26 | bs2 = - ws * d.gather(1, indp2) 27 | 28 | s = torch.cumsum(ws.abs(), dim=1) 29 | sb = torch.cumsum(bs2, dim=1) + b0.unsqueeze(1) 30 | 31 | b2 = sb[:, -1] - s[:, -1] * p.gather(1, indp[:, 0:1]).squeeze(1) 32 | c_l = b - b2 > 0 33 | c2 = (b - b0 > 0) & (~c_l) 34 | lb = torch.zeros(c2.sum(), device=device) 35 | ub = torch.full_like(lb, w.shape[1] - 1) 36 | nitermax = math.ceil(math.log2(w.shape[1])) 37 | 38 | indp_, sb_, s_, p_, b_ = indp[c2], sb[c2], s[c2], p[c2], b[c2] 39 | for counter in range(nitermax): 40 | counter4 = torch.floor((lb + ub) / 2) 41 | 42 | counter2 = counter4.long().unsqueeze(1) 43 | indcurr = indp_.gather(1, indp_.size(1) - 1 - counter2) 44 | b2 = (sb_.gather(1, counter2) - s_.gather(1, counter2) * p_.gather(1, indcurr)).squeeze(1) 45 | c = b_ - b2 > 0 46 | 47 | lb = torch.where(c, counter4, lb) 48 | ub = torch.where(c, ub, counter4) 49 | 50 | lb = lb.long() 51 | 52 | if c_l.any(): 53 | lmbd_opt = torch.clamp_min((b[c_l] - sb[c_l, -1]) / (-s[c_l, -1]), min=0).unsqueeze(-1) 54 | d[c_l] = (2 * a[c_l] - 1) * lmbd_opt 55 | 56 | lmbd_opt = torch.clamp_min((b[c2] - sb[c2, lb]) / (-s[c2, lb]), min=0).unsqueeze(-1) 57 | d[c2] = torch.min(lmbd_opt, d[c2]) * a[c2] + torch.max(-lmbd_opt, d[c2]) * (1 - a[c2]) 58 | 59 | return d * (w != 0).float() 60 | 61 | 62 | def projection_l2(points_to_project, w_hyperplane, b_hyperplane): 63 | device = points_to_project.device 64 | t, w, b = points_to_project, w_hyperplane.clone(), b_hyperplane 65 | 66 | c = (w * t).sum(1) - b 67 | ind2 = 2 * (c >= 0) - 1 68 | w.mul_(ind2.unsqueeze(1)) 69 | c.mul_(ind2) 70 | 71 | r = torch.max(t / w, (t - 1) / w).clamp(min=-1e12, max=1e12) 72 | r.masked_fill_(w.abs() < 1e-8, 1e12) 73 | r[r == -1e12] *= -1 74 | rs, indr = torch.sort(r, dim=1) 75 | rs2 = F.pad(rs[:, 1:], (0, 1)) 76 | rs.masked_fill_(rs == 1e12, 0) 77 | rs2.masked_fill_(rs2 == 1e12, 0) 78 | 79 | w3s = (w ** 2).gather(1, indr) 80 | w5 = w3s.sum(dim=1, keepdim=True) 81 | ws = w5 - torch.cumsum(w3s, dim=1) 82 | d = -(r * w) 83 | d.mul_((w.abs() > 1e-8).float()) 84 | s = torch.cat((-w5 * rs[:, 0:1], torch.cumsum((-rs2 + rs) * ws, dim=1) - w5 * rs[:, 0:1]), 1) 85 | 86 | c4 = s[:, 0] + c < 0 87 | c3 = (d * w).sum(dim=1) + c > 0 88 | c2 = ~(c4 | c3) 89 | 90 | lb = torch.zeros(c2.sum(), device=device) 91 | ub = torch.full_like(lb, w.shape[1] - 1) 92 | nitermax = math.ceil(math.log2(w.shape[1])) 93 | 94 | s_, c_ = s[c2], c[c2] 95 | for counter in range(nitermax): 96 | counter4 = torch.floor((lb + ub) / 2) 97 | counter2 = counter4.long().unsqueeze(1) 98 | c3 = s_.gather(1, counter2).squeeze(1) + c_ > 0 99 | lb = torch.where(c3, counter4, lb) 100 | ub = torch.where(c3, ub, counter4) 101 | 102 | lb = lb.long() 103 | 104 | if c4.any(): 105 | alpha = c[c4] / w5[c4].squeeze(-1) 106 | d[c4] = -alpha.unsqueeze(-1) * w[c4] 107 | 108 | if c2.any(): 109 | alpha = (s[c2, lb] + c[c2]) / ws[c2, lb] + rs[c2, lb] 110 | alpha[ws[c2, lb] == 0] = 0 111 | c5 = (alpha.unsqueeze(-1) > r[c2]).float() 112 | d[c2] = d[c2] * c5 - alpha.unsqueeze(-1) * w[c2] * (1 - c5) 113 | 114 | return d * (w.abs() > 1e-8).float() 115 | 116 | 117 | def projection_l1(points_to_project, w_hyperplane, b_hyperplane): 118 | device = points_to_project.device 119 | t, w, b = points_to_project, w_hyperplane.clone(), b_hyperplane 120 | 121 | c = (w * t).sum(1) - b 122 | ind2 = 2 * (c >= 0) - 1 123 | w.mul_(ind2.unsqueeze(1)) 124 | c.mul_(ind2) 125 | 126 | r = (1 / w).abs().clamp_max(1e12) 127 | indr = torch.argsort(r, dim=1) 128 | indr_rev = torch.argsort(indr) 129 | 130 | c6 = (w < 0).float() 131 | d = (-t + c6) * (w != 0).float() 132 | ds = torch.min(-w * t, w * (1 - t)).gather(1, indr) 133 | ds2 = torch.cat((c.unsqueeze(-1), ds), 1) 134 | s = torch.cumsum(ds2, dim=1) 135 | 136 | c2 = s[:, -1] < 0 137 | 138 | lb = torch.zeros(c2.sum(), device=device) 139 | ub = torch.full_like(lb, s.shape[1]) 140 | nitermax = math.ceil(math.log2(w.shape[1])) 141 | 142 | s_ = s[c2] 143 | for counter in range(nitermax): 144 | counter4 = torch.floor((lb + ub) / 2) 145 | counter2 = counter4.long().unsqueeze(1) 146 | c3 = s_.gather(1, counter2).squeeze(1) > 0 147 | lb = torch.where(c3, counter4, lb) 148 | ub = torch.where(c3, ub, counter4) 149 | 150 | lb2 = lb.long() 151 | 152 | if c2.any(): 153 | indr = indr[c2].gather(1, lb2.unsqueeze(1)).squeeze(1) 154 | u = torch.arange(0, w.shape[0], device=device).unsqueeze(1) 155 | u2 = torch.arange(0, w.shape[1], device=device, dtype=torch.float).unsqueeze(0) 156 | alpha = -s[c2, lb2] / w[c2, indr] 157 | c5 = u2 < lb.unsqueeze(-1) 158 | u3 = c5[u[:c5.shape[0]], indr_rev[c2]] 159 | d[c2] = d[c2] * u3.float() 160 | d[c2, indr] = alpha 161 | 162 | return d * (w.abs() > 1e-8).float() 163 | -------------------------------------------------------------------------------- /autoattack/fab_pt.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2019-present, Francesco Croce 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import time 14 | 15 | import torch 16 | 17 | from autoattack.other_utils import zero_gradients 18 | from autoattack.fab_base import FABAttack 19 | 20 | class FABAttack_PT(FABAttack): 21 | """ 22 | Fast Adaptive Boundary Attack (Linf, L2, L1) 23 | https://arxiv.org/abs/1907.02044 24 | 25 | :param predict: forward pass function 26 | :param norm: Lp-norm to minimize ('Linf', 'L2', 'L1' supported) 27 | :param n_restarts: number of random restarts 28 | :param n_iter: number of iterations 29 | :param eps: epsilon for the random restarts 30 | :param alpha_max: alpha_max 31 | :param eta: overshooting 32 | :param beta: backward step 33 | """ 34 | 35 | def __init__( 36 | self, 37 | predict, 38 | norm='Linf', 39 | n_restarts=1, 40 | n_iter=100, 41 | eps=None, 42 | alpha_max=0.1, 43 | eta=1.05, 44 | beta=0.9, 45 | loss_fn=None, 46 | verbose=False, 47 | seed=0, 48 | targeted=False, 49 | device=None, 50 | n_target_classes=9): 51 | """ FAB-attack implementation in pytorch """ 52 | 53 | self.predict = predict 54 | super().__init__(norm, 55 | n_restarts, 56 | n_iter, 57 | eps, 58 | alpha_max, 59 | eta, 60 | beta, 61 | loss_fn, 62 | verbose, 63 | seed, 64 | targeted, 65 | device, 66 | n_target_classes) 67 | 68 | def _predict_fn(self, x): 69 | return self.predict(x) 70 | 71 | def _get_predicted_label(self, x): 72 | with torch.no_grad(): 73 | outputs = self._predict_fn(x) 74 | _, y = torch.max(outputs, dim=1) 75 | return y 76 | 77 | def get_diff_logits_grads_batch(self, imgs, la): 78 | im = imgs.clone().requires_grad_() 79 | with torch.enable_grad(): 80 | y = self.predict(im) 81 | 82 | g2 = torch.zeros([y.shape[-1], *imgs.size()]).to(self.device) 83 | grad_mask = torch.zeros_like(y) 84 | for counter in range(y.shape[-1]): 85 | zero_gradients(im) 86 | grad_mask[:, counter] = 1.0 87 | y.backward(grad_mask, retain_graph=True) 88 | grad_mask[:, counter] = 0.0 89 | g2[counter] = im.grad.data 90 | 91 | g2 = torch.transpose(g2, 0, 1).detach() 92 | #y2 = self.predict(imgs).detach() 93 | y2 = y.detach() 94 | df = y2 - y2[torch.arange(imgs.shape[0]), la].unsqueeze(1) 95 | dg = g2 - g2[torch.arange(imgs.shape[0]), la].unsqueeze(1) 96 | df[torch.arange(imgs.shape[0]), la] = 1e10 97 | 98 | return df, dg 99 | 100 | def get_diff_logits_grads_batch_targeted(self, imgs, la, la_target): 101 | u = torch.arange(imgs.shape[0]) 102 | im = imgs.clone().requires_grad_() 103 | with torch.enable_grad(): 104 | y = self.predict(im) 105 | diffy = -(y[u, la] - y[u, la_target]) 106 | sumdiffy = diffy.sum() 107 | 108 | zero_gradients(im) 109 | sumdiffy.backward() 110 | graddiffy = im.grad.data 111 | df = diffy.detach().unsqueeze(1) 112 | dg = graddiffy.unsqueeze(1) 113 | 114 | return df, dg 115 | -------------------------------------------------------------------------------- /autoattack/fab_tf.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2019-present, Francesco Croce 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import torch 14 | from autoattack.fab_base import FABAttack 15 | 16 | 17 | class FABAttack_TF(FABAttack): 18 | """ 19 | Fast Adaptive Boundary Attack (Linf, L2, L1) 20 | https://arxiv.org/abs/1907.02044 21 | 22 | :param model: TF_model 23 | :param norm: Lp-norm to minimize ('Linf', 'L2', 'L1' supported) 24 | :param n_restarts: number of random restarts 25 | :param n_iter: number of iterations 26 | :param eps: epsilon for the random restarts 27 | :param alpha_max: alpha_max 28 | :param eta: overshooting 29 | :param beta: backward step 30 | """ 31 | 32 | def __init__( 33 | self, 34 | model, 35 | norm='Linf', 36 | n_restarts=1, 37 | n_iter=100, 38 | eps=None, 39 | alpha_max=0.1, 40 | eta=1.05, 41 | beta=0.9, 42 | loss_fn=None, 43 | verbose=False, 44 | seed=0, 45 | targeted=False, 46 | device=None, 47 | n_target_classes=9): 48 | """ FAB-attack implementation in TF2 """ 49 | 50 | self.model = model 51 | super().__init__(norm, 52 | n_restarts, 53 | n_iter, 54 | eps, 55 | alpha_max, 56 | eta, 57 | beta, 58 | loss_fn, 59 | verbose, 60 | seed, 61 | targeted, 62 | device, 63 | n_target_classes) 64 | 65 | def _predict_fn(self, x): 66 | return self.model.predict(x) 67 | 68 | def _get_predicted_label(self, x): 69 | with torch.no_grad(): 70 | outputs = self._predict_fn(x) 71 | _, y = torch.max(outputs, dim=1) 72 | return y 73 | 74 | def get_diff_logits_grads_batch(self, imgs, la): 75 | y2, g2 = self.model.grad_logits(imgs) 76 | df = y2 - y2[torch.arange(imgs.shape[0]), la].unsqueeze(1) 77 | dg = g2 - g2[torch.arange(imgs.shape[0]), la].unsqueeze(1) 78 | df[torch.arange(imgs.shape[0]), la] = 1e10 79 | 80 | return df, dg 81 | 82 | def get_diff_logits_grads_batch_targeted(self, imgs, la, la_target): 83 | df, dg = self.model.get_grad_diff_logits_target(imgs, la, la_target) 84 | df.unsqueeze_(1) 85 | dg.unsqueeze_(1) 86 | 87 | return df, dg 88 | -------------------------------------------------------------------------------- /autoattack/other_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import collections.abc as container_abcs 3 | 4 | import torch 5 | 6 | class Logger(): 7 | def __init__(self, log_path): 8 | self.log_path = log_path 9 | 10 | def log(self, str_to_log): 11 | print(str_to_log) 12 | if not self.log_path is None: 13 | with open(self.log_path, 'a') as f: 14 | f.write(str_to_log + '\n') 15 | f.flush() 16 | 17 | def check_imgs(adv, x, norm): 18 | delta = (adv - x).view(adv.shape[0], -1) 19 | if norm == 'Linf': 20 | res = delta.abs().max(dim=1)[0] 21 | elif norm == 'L2': 22 | res = (delta ** 2).sum(dim=1).sqrt() 23 | elif norm == 'L1': 24 | res = delta.abs().sum(dim=1) 25 | 26 | str_det = 'max {} pert: {:.5f}, nan in imgs: {}, max in imgs: {:.5f}, min in imgs: {:.5f}'.format( 27 | norm, res.max(), (adv != adv).sum(), adv.max(), adv.min()) 28 | print(str_det) 29 | 30 | return str_det 31 | 32 | def L1_norm(x, keepdim=False): 33 | z = x.abs().view(x.shape[0], -1).sum(-1) 34 | if keepdim: 35 | z = z.view(-1, *[1]*(len(x.shape) - 1)) 36 | return z 37 | 38 | def L2_norm(x, keepdim=False): 39 | z = (x ** 2).view(x.shape[0], -1).sum(-1).sqrt() 40 | if keepdim: 41 | z = z.view(-1, *[1]*(len(x.shape) - 1)) 42 | return z 43 | 44 | def L0_norm(x): 45 | return (x != 0.).view(x.shape[0], -1).sum(-1) 46 | 47 | def makedir(path): 48 | if not os.path.exists(path): 49 | os.makedirs(path) 50 | 51 | def zero_gradients(x): 52 | if isinstance(x, torch.Tensor): 53 | if x.grad is not None: 54 | x.grad.detach_() 55 | x.grad.zero_() 56 | elif isinstance(x, container_abcs.Iterable): 57 | for elem in x: 58 | zero_gradients(elem) 59 | -------------------------------------------------------------------------------- /autoattack/square.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020-present, Francesco Croce 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import torch 14 | import time 15 | import math 16 | import torch.nn.functional as F 17 | 18 | from autoattack.autopgd_base import L1_projection 19 | 20 | class SquareAttack(): 21 | """ 22 | Square Attack 23 | https://arxiv.org/abs/1912.00049 24 | 25 | :param predict: forward pass function 26 | :param norm: Lp-norm of the attack ('Linf', 'L2' supported) 27 | :param n_restarts: number of random restarts 28 | :param n_queries: max number of queries (each restart) 29 | :param eps: bound on the norm of perturbations 30 | :param seed: random seed for the starting point 31 | :param p_init: parameter to control size of squares 32 | :param loss: loss function optimized ('margin', 'ce' supported) 33 | :param resc_schedule adapt schedule of p to n_queries 34 | """ 35 | 36 | def __init__( 37 | self, 38 | predict, 39 | norm='Linf', 40 | n_queries=5000, 41 | eps=None, 42 | p_init=.8, 43 | n_restarts=1, 44 | seed=0, 45 | verbose=False, 46 | targeted=False, 47 | loss='margin', 48 | resc_schedule=True, 49 | device=None): 50 | """ 51 | Square Attack implementation in PyTorch 52 | """ 53 | 54 | self.predict = predict 55 | self.norm = norm 56 | self.n_queries = n_queries 57 | self.eps = eps 58 | self.p_init = p_init 59 | self.n_restarts = n_restarts 60 | self.seed = seed 61 | self.verbose = verbose 62 | self.targeted = targeted 63 | self.loss = loss 64 | self.rescale_schedule = resc_schedule 65 | self.device = device 66 | self.return_all = False 67 | 68 | def margin_and_loss(self, x, y): 69 | """ 70 | :param y: correct labels if untargeted else target labels 71 | """ 72 | 73 | logits = self.predict(x) 74 | xent = F.cross_entropy(logits, y, reduction='none') 75 | u = torch.arange(x.shape[0]) 76 | y_corr = logits[u, y].clone() 77 | logits[u, y] = -float('inf') 78 | y_others = logits.max(dim=-1)[0] 79 | 80 | if not self.targeted: 81 | if self.loss == 'ce': 82 | return y_corr - y_others, -1. * xent 83 | elif self.loss == 'margin': 84 | return y_corr - y_others, y_corr - y_others 85 | else: 86 | return y_others - y_corr, xent 87 | 88 | def init_hyperparam(self, x): 89 | assert self.norm in ['Linf', 'L2', 'L1'] 90 | assert not self.eps is None 91 | assert self.loss in ['ce', 'margin'] 92 | 93 | if self.device is None: 94 | self.device = x.device 95 | self.orig_dim = list(x.shape[1:]) 96 | self.ndims = len(self.orig_dim) 97 | if self.seed is None: 98 | self.seed = time.time() 99 | 100 | def random_target_classes(self, y_pred, n_classes): 101 | y = torch.zeros_like(y_pred) 102 | for counter in range(y_pred.shape[0]): 103 | l = list(range(n_classes)) 104 | l.remove(y_pred[counter]) 105 | t = self.random_int(0, len(l)) 106 | y[counter] = l[t] 107 | 108 | return y.long().to(self.device) 109 | 110 | def check_shape(self, x): 111 | return x if len(x.shape) == (self.ndims + 1) else x.unsqueeze(0) 112 | 113 | def random_choice(self, shape): 114 | t = 2 * torch.rand(shape).to(self.device) - 1 115 | return torch.sign(t) 116 | 117 | def random_int(self, low=0, high=1, shape=[1]): 118 | t = low + (high - low) * torch.rand(shape).to(self.device) 119 | return t.long() 120 | 121 | def normalize(self, x): 122 | if self.norm == 'Linf': 123 | t = x.abs().view(x.shape[0], -1).max(1)[0] 124 | return x / (t.view(-1, *([1] * self.ndims)) + 1e-12) 125 | 126 | elif self.norm == 'L2': 127 | t = (x ** 2).view(x.shape[0], -1).sum(-1).sqrt() 128 | return x / (t.view(-1, *([1] * self.ndims)) + 1e-12) 129 | 130 | elif self.norm == 'L1': 131 | t = x.abs().view(x.shape[0], -1).sum(dim=-1) 132 | return x / (t.view(-1, *([1] * self.ndims)) + 1e-12) 133 | 134 | def lp_norm(self, x): 135 | if self.norm == 'L2': 136 | t = (x ** 2).view(x.shape[0], -1).sum(-1).sqrt() 137 | return t.view(-1, *([1] * self.ndims)) 138 | 139 | elif self.norm == 'L1': 140 | t = x.abs().view(x.shape[0], -1).sum(dim=-1) 141 | return t.view(-1, *([1] * self.ndims)) 142 | 143 | def eta_rectangles(self, x, y): 144 | delta = torch.zeros([x, y]).to(self.device) 145 | x_c, y_c = x // 2 + 1, y // 2 + 1 146 | 147 | counter2 = [x_c - 1, y_c - 1] 148 | if self.norm == 'L2': 149 | for counter in range(0, max(x_c, y_c)): 150 | delta[max(counter2[0], 0):min(counter2[0] + (2*counter + 1), x), 151 | max(0, counter2[1]):min(counter2[1] + (2*counter + 1), y) 152 | ] += 1.0/(torch.Tensor([counter + 1]).view(1, 1).to( 153 | self.device) ** 2) 154 | counter2[0] -= 1 155 | counter2[1] -= 1 156 | 157 | delta /= (delta ** 2).sum(dim=(0, 1), keepdim=True).sqrt() 158 | 159 | elif self.norm == 'L1': 160 | for counter in range(0, max(x_c, y_c)): 161 | delta[max(counter2[0], 0):min(counter2[0] + (2*counter + 1), x), 162 | max(0, counter2[1]):min(counter2[1] + (2*counter + 1), y) 163 | ] += 1.0/(torch.Tensor([counter + 1]).view(1, 1).to( 164 | self.device) ** 4) 165 | counter2[0] -= 1 166 | counter2[1] -= 1 167 | 168 | delta /= delta.abs().sum(dim=(), keepdim=True) 169 | 170 | return delta 171 | 172 | def eta(self, s): 173 | if self.norm == 'L2': 174 | delta = torch.zeros([s, s]).to(self.device) 175 | delta[:s // 2] = self.eta_rectangles(s // 2, s) 176 | delta[s // 2:] = -1. * self.eta_rectangles(s - s // 2, s) 177 | delta /= (delta ** 2).sum(dim=(0, 1), keepdim=True).sqrt() 178 | 179 | elif self.norm == 'L1': 180 | delta = torch.zeros([s, s]).to(self.device) 181 | delta[:s // 2] = self.eta_rectangles(s // 2, s) 182 | delta[s // 2:] = -1. * self.eta_rectangles(s - s // 2, s) 183 | #delta = self.eta_rectangles(s, s) 184 | delta /= delta.abs().sum(dim=(), keepdim=True) 185 | #delta *= (torch.rand([1]) - .5).sign().to(self.device) 186 | 187 | if torch.rand([1]) > 0.5: 188 | delta = delta.permute([1, 0]) 189 | 190 | return delta 191 | 192 | def p_selection(self, it): 193 | """ schedule to decrease the parameter p """ 194 | 195 | if self.rescale_schedule: 196 | it = int(it / self.n_queries * 10000) 197 | 198 | if 10 < it <= 50: 199 | p = self.p_init / 2 200 | elif 50 < it <= 200: 201 | p = self.p_init / 4 202 | elif 200 < it <= 500: 203 | p = self.p_init / 8 204 | elif 500 < it <= 1000: 205 | p = self.p_init / 16 206 | elif 1000 < it <= 2000: 207 | p = self.p_init / 32 208 | elif 2000 < it <= 4000: 209 | p = self.p_init / 64 210 | elif 4000 < it <= 6000: 211 | p = self.p_init / 128 212 | elif 6000 < it <= 8000: 213 | p = self.p_init / 256 214 | elif 8000 < it: 215 | p = self.p_init / 512 216 | else: 217 | p = self.p_init 218 | 219 | return p 220 | 221 | def attack_single_run(self, x, y): 222 | with torch.no_grad(): 223 | adv = x.clone() 224 | c, h, w = x.shape[1:] 225 | n_features = c * h * w 226 | n_ex_total = x.shape[0] 227 | 228 | if self.verbose and h != w: 229 | print('square attack may not work properly for non-square image.') 230 | print('for details please refer to https://github.com/fra31/auto-attack/issues/95') 231 | 232 | 233 | if self.norm == 'Linf': 234 | x_best = torch.clamp(x + self.eps * self.random_choice( 235 | [x.shape[0], c, 1, w]), 0., 1.) 236 | margin_min, loss_min = self.margin_and_loss(x_best, y) 237 | n_queries = torch.ones(x.shape[0]).to(self.device) 238 | s_init = int(math.sqrt(self.p_init * n_features / c)) 239 | 240 | if (margin_min < 0.0).all(): 241 | return n_queries, x_best 242 | 243 | for i_iter in range(self.n_queries): 244 | idx_to_fool = (margin_min > 0.0).nonzero().squeeze() 245 | 246 | x_curr = self.check_shape(x[idx_to_fool]) 247 | x_best_curr = self.check_shape(x_best[idx_to_fool]) 248 | y_curr = y[idx_to_fool] 249 | if len(y_curr.shape) == 0: 250 | y_curr = y_curr.unsqueeze(0) 251 | margin_min_curr = margin_min[idx_to_fool] 252 | loss_min_curr = loss_min[idx_to_fool] 253 | 254 | p = self.p_selection(i_iter) 255 | s = max(int(round(math.sqrt(p * n_features / c))), 1) 256 | s = min(s, min(h, w)) 257 | vh = self.random_int(0, h - s) 258 | vw = self.random_int(0, w - s) 259 | new_deltas = torch.zeros([c, h, w]).to(self.device) 260 | new_deltas[:, vh:vh + s, vw:vw + s 261 | ] = 2. * self.eps * self.random_choice([c, 1, 1]) 262 | 263 | x_new = x_best_curr + new_deltas 264 | x_new = torch.min(torch.max(x_new, x_curr - self.eps), 265 | x_curr + self.eps) 266 | x_new = torch.clamp(x_new, 0., 1.) 267 | x_new = self.check_shape(x_new) 268 | 269 | margin, loss = self.margin_and_loss(x_new, y_curr) 270 | 271 | # update loss if new loss is better 272 | idx_improved = (loss < loss_min_curr).float() 273 | 274 | loss_min[idx_to_fool] = idx_improved * loss + ( 275 | 1. - idx_improved) * loss_min_curr 276 | 277 | # update margin and x_best if new loss is better 278 | # or misclassification 279 | idx_miscl = (margin <= 0.).float() 280 | idx_improved = torch.max(idx_improved, idx_miscl) 281 | 282 | margin_min[idx_to_fool] = idx_improved * margin + ( 283 | 1. - idx_improved) * margin_min_curr 284 | idx_improved = idx_improved.reshape([-1, 285 | *[1]*len(x.shape[:-1])]) 286 | x_best[idx_to_fool] = idx_improved * x_new + ( 287 | 1. - idx_improved) * x_best_curr 288 | n_queries[idx_to_fool] += 1. 289 | 290 | ind_succ = (margin_min <= 0.).nonzero().squeeze() 291 | if self.verbose and ind_succ.numel() != 0: 292 | print('{}'.format(i_iter + 1), 293 | '- success rate={}/{} ({:.2%})'.format( 294 | ind_succ.numel(), n_ex_total, 295 | float(ind_succ.numel()) / n_ex_total), 296 | '- avg # queries={:.1f}'.format( 297 | n_queries[ind_succ].mean().item()), 298 | '- med # queries={:.1f}'.format( 299 | n_queries[ind_succ].median().item()), 300 | '- loss={:.3f}'.format(loss_min.mean())) 301 | 302 | if ind_succ.numel() == n_ex_total: 303 | break 304 | 305 | elif self.norm == 'L2': 306 | delta_init = torch.zeros_like(x) 307 | s = h // 5 308 | sp_init = (h - s * 5) // 2 309 | vh = sp_init + 0 310 | for _ in range(h // s): 311 | vw = sp_init + 0 312 | for _ in range(w // s): 313 | delta_init[:, :, vh:vh + s, vw:vw + s] += self.eta( 314 | s).view(1, 1, s, s) * self.random_choice( 315 | [x.shape[0], c, 1, 1]) 316 | vw += s 317 | vh += s 318 | 319 | x_best = torch.clamp(x + self.normalize(delta_init 320 | ) * self.eps, 0., 1.) 321 | margin_min, loss_min = self.margin_and_loss(x_best, y) 322 | n_queries = torch.ones(x.shape[0]).to(self.device) 323 | s_init = int(math.sqrt(self.p_init * n_features / c)) 324 | 325 | if (margin_min < 0.0).all(): 326 | return n_queries, x_best 327 | 328 | for i_iter in range(self.n_queries): 329 | idx_to_fool = (margin_min > 0.0).nonzero().squeeze() 330 | 331 | x_curr = self.check_shape(x[idx_to_fool]) 332 | x_best_curr = self.check_shape(x_best[idx_to_fool]) 333 | y_curr = y[idx_to_fool] 334 | if len(y_curr.shape) == 0: 335 | y_curr = y_curr.unsqueeze(0) 336 | margin_min_curr = margin_min[idx_to_fool] 337 | loss_min_curr = loss_min[idx_to_fool] 338 | 339 | delta_curr = x_best_curr - x_curr 340 | p = self.p_selection(i_iter) 341 | s = max(int(round(math.sqrt(p * n_features / c))), 3) 342 | if s % 2 == 0: 343 | s += 1 344 | s = min(s, min(h, w)) 345 | 346 | vh = self.random_int(0, h - s) 347 | vw = self.random_int(0, w - s) 348 | new_deltas_mask = torch.zeros_like(x_curr) 349 | new_deltas_mask[:, :, vh:vh + s, vw:vw + s] = 1.0 350 | norms_window_1 = (delta_curr[:, :, vh:vh + s, vw:vw + s 351 | ] ** 2).sum(dim=(-2, -1), keepdim=True).sqrt() 352 | 353 | vh2 = self.random_int(0, h - s) 354 | vw2 = self.random_int(0, w - s) 355 | new_deltas_mask_2 = torch.zeros_like(x_curr) 356 | new_deltas_mask_2[:, :, vh2:vh2 + s, vw2:vw2 + s] = 1. 357 | 358 | norms_image = self.lp_norm(x_best_curr - x_curr) 359 | mask_image = torch.max(new_deltas_mask, new_deltas_mask_2) 360 | norms_windows = ((delta_curr * mask_image) ** 2).sum(dim=( 361 | -2, -1), keepdim=True).sqrt() 362 | 363 | new_deltas = torch.ones([x_curr.shape[0], c, s, s] 364 | ).to(self.device) 365 | new_deltas *= (self.eta(s).view(1, 1, s, s) * 366 | self.random_choice([x_curr.shape[0], c, 1, 1])) 367 | old_deltas = delta_curr[:, :, vh:vh + s, vw:vw + s] / ( 368 | 1e-12 + norms_window_1) 369 | new_deltas += old_deltas 370 | new_deltas = new_deltas / (1e-12 + (new_deltas ** 2).sum( 371 | dim=(-2, -1), keepdim=True).sqrt()) * (torch.max( 372 | (self.eps * torch.ones_like(new_deltas)) ** 2 - 373 | norms_image ** 2, torch.zeros_like(new_deltas)) / 374 | c + norms_windows ** 2).sqrt() 375 | delta_curr[:, :, vh2:vh2 + s, vw2:vw2 + s] = 0. 376 | delta_curr[:, :, vh:vh + s, vw:vw + s] = new_deltas + 0 377 | 378 | x_new = torch.clamp(x_curr + self.normalize(delta_curr 379 | ) * self.eps, 0. ,1.) 380 | x_new = self.check_shape(x_new) 381 | norms_image = self.lp_norm(x_new - x_curr) 382 | 383 | margin, loss = self.margin_and_loss(x_new, y_curr) 384 | 385 | # update loss if new loss is better 386 | idx_improved = (loss < loss_min_curr).float() 387 | 388 | loss_min[idx_to_fool] = idx_improved * loss + ( 389 | 1. - idx_improved) * loss_min_curr 390 | 391 | # update margin and x_best if new loss is better 392 | # or misclassification 393 | idx_miscl = (margin <= 0.).float() 394 | idx_improved = torch.max(idx_improved, idx_miscl) 395 | 396 | margin_min[idx_to_fool] = idx_improved * margin + ( 397 | 1. - idx_improved) * margin_min_curr 398 | idx_improved = idx_improved.reshape([-1, 399 | *[1]*len(x.shape[:-1])]) 400 | x_best[idx_to_fool] = idx_improved * x_new + ( 401 | 1. - idx_improved) * x_best_curr 402 | n_queries[idx_to_fool] += 1. 403 | 404 | ind_succ = (margin_min <= 0.).nonzero().squeeze() 405 | if self.verbose and ind_succ.numel() != 0: 406 | print('{}'.format(i_iter + 1), 407 | '- success rate={}/{} ({:.2%})'.format( 408 | ind_succ.numel(), n_ex_total, float( 409 | ind_succ.numel()) / n_ex_total), 410 | '- avg # queries={:.1f}'.format( 411 | n_queries[ind_succ].mean().item()), 412 | '- med # queries={:.1f}'.format( 413 | n_queries[ind_succ].median().item()), 414 | '- loss={:.3f}'.format(loss_min.mean())) 415 | 416 | assert (x_new != x_new).sum() == 0 417 | assert (x_best != x_best).sum() == 0 418 | 419 | if ind_succ.numel() == n_ex_total: 420 | break 421 | 422 | elif self.norm == 'L1': 423 | delta_init = torch.zeros_like(x) 424 | s = h // 5 425 | sp_init = (h - s * 5) // 2 426 | vh = sp_init + 0 427 | for _ in range(h // s): 428 | vw = sp_init + 0 429 | for _ in range(w // s): 430 | delta_init[:, :, vh:vh + s, vw:vw + s] += self.eta( 431 | s).view(1, 1, s, s) * self.random_choice( 432 | [x.shape[0], c, 1, 1]) 433 | vw += s 434 | vh += s 435 | 436 | #x_best = torch.clamp(x + self.normalize(delta_init 437 | # ) * self.eps, 0., 1.) 438 | r_best = L1_projection(x, delta_init, self.eps * (1. - 1e-6)) 439 | x_best = x + delta_init + r_best 440 | margin_min, loss_min = self.margin_and_loss(x_best, y) 441 | n_queries = torch.ones(x.shape[0]).to(self.device) 442 | s_init = int(math.sqrt(self.p_init * n_features / c)) 443 | 444 | if (margin_min < 0.0).all(): 445 | return n_queries, x_best 446 | 447 | for i_iter in range(self.n_queries): 448 | idx_to_fool = (margin_min > 0.0).nonzero().squeeze() 449 | 450 | x_curr = self.check_shape(x[idx_to_fool]) 451 | x_best_curr = self.check_shape(x_best[idx_to_fool]) 452 | y_curr = y[idx_to_fool] 453 | if len(y_curr.shape) == 0: 454 | y_curr = y_curr.unsqueeze(0) 455 | margin_min_curr = margin_min[idx_to_fool] 456 | loss_min_curr = loss_min[idx_to_fool] 457 | 458 | delta_curr = x_best_curr - x_curr 459 | p = self.p_selection(i_iter) 460 | s = max(int(round(math.sqrt(p * n_features / c))), 3) 461 | if s % 2 == 0: 462 | s += 1 463 | #pass 464 | s = min(s, min(h, w)) 465 | 466 | vh = self.random_int(0, h - s) 467 | vw = self.random_int(0, w - s) 468 | new_deltas_mask = torch.zeros_like(x_curr) 469 | new_deltas_mask[:, :, vh:vh + s, vw:vw + s] = 1.0 470 | norms_window_1 = delta_curr[:, :, vh:vh + s, vw:vw + s 471 | ].abs().sum(dim=(-2, -1), keepdim=True) 472 | 473 | vh2 = self.random_int(0, h - s) 474 | vw2 = self.random_int(0, w - s) 475 | new_deltas_mask_2 = torch.zeros_like(x_curr) 476 | new_deltas_mask_2[:, :, vh2:vh2 + s, vw2:vw2 + s] = 1. 477 | 478 | norms_image = self.lp_norm(x_best_curr - x_curr) 479 | mask_image = torch.max(new_deltas_mask, new_deltas_mask_2) 480 | norms_windows = (delta_curr * mask_image).abs().sum(dim=( 481 | -2, -1), keepdim=True) 482 | 483 | new_deltas = torch.ones([x_curr.shape[0], c, s, s] 484 | ).to(self.device) 485 | new_deltas *= (self.eta(s).view(1, 1, s, s) * 486 | self.random_choice([x_curr.shape[0], c, 1, 1])) 487 | old_deltas = delta_curr[:, :, vh:vh + s, vw:vw + s] / ( 488 | 1e-12 + norms_window_1) 489 | new_deltas += old_deltas 490 | new_deltas = new_deltas / (1e-12 + new_deltas.abs().sum( 491 | dim=(-2, -1), keepdim=True)) * (torch.max( 492 | self.eps * torch.ones_like(norms_image) - 493 | norms_image, torch.zeros_like(norms_image)) / 494 | c + norms_windows) * c 495 | delta_curr[:, :, vh2:vh2 + s, vw2:vw2 + s] = 0. 496 | delta_curr[:, :, vh:vh + s, vw:vw + s] = new_deltas + 0 497 | 498 | # 499 | #norms_image_old = self.lp_norm(delta_curr) 500 | r_curr = L1_projection(x_curr, delta_curr, self.eps * (1. - 1e-6)) 501 | x_new = x_curr + delta_curr + r_curr 502 | x_new = self.check_shape(x_new) 503 | norms_image = self.lp_norm(x_new - x_curr) 504 | 505 | margin, loss = self.margin_and_loss(x_new, y_curr) 506 | 507 | # update loss if new loss is better 508 | idx_improved = (loss < loss_min_curr).float() 509 | 510 | loss_min[idx_to_fool] = idx_improved * loss + ( 511 | 1. - idx_improved) * loss_min_curr 512 | 513 | # update margin and x_best if new loss is better 514 | # or misclassification 515 | idx_miscl = (margin <= 0.).float() 516 | idx_improved = torch.max(idx_improved, idx_miscl) 517 | 518 | margin_min[idx_to_fool] = idx_improved * margin + ( 519 | 1. - idx_improved) * margin_min_curr 520 | idx_improved = idx_improved.reshape([-1, 521 | *[1]*len(x.shape[:-1])]) 522 | x_best[idx_to_fool] = idx_improved * x_new + ( 523 | 1. - idx_improved) * x_best_curr 524 | n_queries[idx_to_fool] += 1. 525 | 526 | ind_succ = (margin_min <= 0.).nonzero().squeeze() 527 | if self.verbose and ind_succ.numel() != 0: 528 | print('{}'.format(i_iter + 1), 529 | '- success rate={}/{} ({:.2%})'.format( 530 | ind_succ.numel(), n_ex_total, float( 531 | ind_succ.numel()) / n_ex_total), 532 | '- avg # queries={:.1f}'.format( 533 | n_queries[ind_succ].mean().item()), 534 | '- med # queries={:.1f}'.format( 535 | n_queries[ind_succ].median().item()), 536 | '- loss={:.3f}'.format(loss_min.mean()), 537 | '- max pert={:.3f}'.format(norms_image.max().item()), 538 | #'- old pert={:.3f}'.format(norms_image_old.max().item()) 539 | ) 540 | 541 | assert (x_new != x_new).sum() == 0 542 | assert (x_best != x_best).sum() == 0 543 | 544 | if ind_succ.numel() == n_ex_total: 545 | break 546 | 547 | return n_queries, x_best 548 | 549 | def perturb(self, x, y=None): 550 | """ 551 | :param x: clean images 552 | :param y: untargeted attack -> clean labels, 553 | if None we use the predicted labels 554 | targeted attack -> target labels, if None random classes, 555 | different from the predicted ones, are sampled 556 | """ 557 | 558 | self.init_hyperparam(x) 559 | 560 | adv = x.clone() 561 | #adv_all = x.clone() 562 | if y is None: 563 | if not self.targeted: 564 | with torch.no_grad(): 565 | output = self.predict(x) 566 | y_pred = output.max(1)[1] 567 | y = y_pred.detach().clone().long().to(self.device) 568 | else: 569 | with torch.no_grad(): 570 | output = self.predict(x) 571 | n_classes = output.shape[-1] 572 | y_pred = output.max(1)[1] 573 | y = self.random_target_classes(y_pred, n_classes) 574 | else: 575 | y = y.detach().clone().long().to(self.device) 576 | 577 | if not self.targeted: 578 | acc = self.predict(x).max(1)[1] == y 579 | else: 580 | acc = self.predict(x).max(1)[1] != y 581 | 582 | startt = time.time() 583 | 584 | torch.random.manual_seed(self.seed) 585 | torch.cuda.random.manual_seed(self.seed) 586 | 587 | for counter in range(self.n_restarts): 588 | ind_to_fool = acc.nonzero().squeeze() 589 | if len(ind_to_fool.shape) == 0: 590 | ind_to_fool = ind_to_fool.unsqueeze(0) 591 | if ind_to_fool.numel() != 0: 592 | x_to_fool = x[ind_to_fool].clone() 593 | y_to_fool = y[ind_to_fool].clone() 594 | 595 | _, adv_curr = self.attack_single_run(x_to_fool, y_to_fool) 596 | 597 | output_curr = self.predict(adv_curr) 598 | if not self.targeted: 599 | acc_curr = output_curr.max(1)[1] == y_to_fool 600 | else: 601 | acc_curr = output_curr.max(1)[1] != y_to_fool 602 | ind_curr = (acc_curr == 0).nonzero().squeeze() 603 | 604 | acc[ind_to_fool[ind_curr]] = 0 605 | adv[ind_to_fool[ind_curr]] = adv_curr[ind_curr].clone() 606 | #adv_all[ind_to_fool] = adv_curr.clone() 607 | if self.verbose: 608 | print('restart {} - robust accuracy: {:.2%}'.format( 609 | counter, acc.float().mean()), 610 | '- cum. time: {:.1f} s'.format( 611 | time.time() - startt)) 612 | 613 | if not self.return_all: 614 | return adv 615 | else: 616 | print('returning final points') 617 | return adv_all 618 | 619 | -------------------------------------------------------------------------------- /autoattack/state.py: -------------------------------------------------------------------------------- 1 | import json 2 | from dataclasses import dataclass, field, asdict 3 | from datetime import datetime 4 | from pathlib import Path 5 | from typing import Optional, Set 6 | import warnings 7 | 8 | import torch 9 | 10 | 11 | @dataclass 12 | class EvaluationState: 13 | _attacks_to_run: Set[str] 14 | path: Optional[Path] = None 15 | _run_attacks: Set[str] = field(default_factory=set) 16 | _robust_flags: Optional[torch.Tensor] = None 17 | _last_saved: datetime = datetime(1, 1, 1) 18 | _SAVE_TIMEOUT: int = 60 19 | _clean_accuracy: float = float("nan") 20 | 21 | def to_disk(self, force: bool = False) -> None: 22 | seconds_since_last_save = (datetime.now() - 23 | self._last_saved).total_seconds() 24 | if self.path is None or (seconds_since_last_save < self._SAVE_TIMEOUT 25 | and not force): 26 | return 27 | self._last_saved = datetime.now() 28 | d = asdict(self) 29 | if self.robust_flags is not None: 30 | d["_robust_flags"] = d["_robust_flags"].cpu().tolist() 31 | d["_run_attacks"] = list(self._run_attacks) 32 | with self.path.open("w", ) as f: 33 | json.dump(d, f, default=str) 34 | 35 | @classmethod 36 | def from_disk(cls, path: Path) -> "EvaluationState": 37 | with path.open("r") as f: 38 | d = json.load(f) 39 | d["_robust_flags"] = torch.tensor(d["_robust_flags"], dtype=torch.bool) 40 | d["path"] = Path(d["path"]) 41 | if path != d["path"]: 42 | warnings.warn( 43 | UserWarning( 44 | "The given path is different from the one found in the state file." 45 | )) 46 | d["_last_saved"] = datetime.fromisoformat(d["_last_saved"]) 47 | return cls(**d) 48 | 49 | @property 50 | def robust_flags(self) -> Optional[torch.Tensor]: 51 | return self._robust_flags 52 | 53 | @robust_flags.setter 54 | def robust_flags(self, robust_flags: torch.Tensor) -> None: 55 | self._robust_flags = robust_flags 56 | self.to_disk(force=True) 57 | 58 | @property 59 | def run_attacks(self) -> Set[str]: 60 | return self._run_attacks 61 | 62 | def add_run_attack(self, attack: str) -> None: 63 | self._run_attacks.add(attack) 64 | self.to_disk() 65 | 66 | @property 67 | def attacks_to_run(self) -> Set[str]: 68 | return self._attacks_to_run 69 | 70 | @attacks_to_run.setter 71 | def attacks_to_run(self, _: Set[str]) -> None: 72 | raise ValueError("attacks_to_run cannot be set outside of the constructor") 73 | 74 | @property 75 | def clean_accuracy(self) -> float: 76 | return self._clean_accuracy 77 | 78 | @clean_accuracy.setter 79 | def clean_accuracy(self, accuracy) -> None: 80 | self._clean_accuracy = accuracy 81 | self.to_disk(force=True) 82 | 83 | @property 84 | def robust_accuracy(self) -> float: 85 | if self.robust_flags is None: 86 | raise ValueError("robust_flags is not set yet. Start the attack first.") 87 | if self.attacks_to_run - self.run_attacks: 88 | warnings.warn("You are checking `robust_accuracy` before all the attacks" 89 | " have been run.") 90 | return self.robust_flags.float().mean().item() -------------------------------------------------------------------------------- /autoattack/utils_tf.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import torch 4 | 5 | class ModelAdapter(): 6 | def __init__(self, logits, x, y, sess, num_classes=10): 7 | self.logits = logits 8 | self.sess = sess 9 | self.x_input = x 10 | self.y_input = y 11 | self.num_classes = num_classes 12 | 13 | # gradients of logits 14 | if num_classes <= 10: 15 | self.grads = [None] * num_classes 16 | for cl in range(num_classes): 17 | self.grads[cl] = tf.gradients(self.logits[:, cl], self.x_input)[0] 18 | 19 | # cross-entropy loss 20 | self.xent = tf.nn.sparse_softmax_cross_entropy_with_logits( 21 | logits=self.logits, labels=self.y_input) 22 | self.grad_xent = tf.gradients(self.xent, self.x_input)[0] 23 | 24 | # dlr loss 25 | self.dlr = dlr_loss(self.logits, self.y_input, num_classes=self.num_classes) 26 | self.grad_dlr = tf.gradients(self.dlr, self.x_input)[0] 27 | 28 | # targeted dlr loss 29 | self.y_target = tf.placeholder(tf.int64, shape=[None]) 30 | self.dlr_target = dlr_loss_targeted(self.logits, self.y_input, self.y_target, num_classes=self.num_classes) 31 | self.grad_target = tf.gradients(self.dlr_target, self.x_input)[0] 32 | 33 | self.la = tf.placeholder(tf.int64, shape=[None]) 34 | self.la_target = tf.placeholder(tf.int64, shape=[None]) 35 | la_mask = tf.one_hot(self.la, self.num_classes) 36 | la_target_mask = tf.one_hot(self.la_target, self.num_classes) 37 | la_logit = tf.reduce_sum(la_mask * self.logits, axis=1) 38 | la_target_logit = tf.reduce_sum(la_target_mask * self.logits, axis=1) 39 | self.diff_logits = la_target_logit - la_logit 40 | self.grad_diff_logits = tf.gradients(self.diff_logits, self.x_input)[0] 41 | 42 | def predict(self, x): 43 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 44 | y = self.sess.run(self.logits, {self.x_input: x2}) 45 | 46 | return torch.from_numpy(y).cuda() 47 | 48 | def grad_logits(self, x): 49 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 50 | logits, g2 = self.sess.run([self.logits, self.grads], {self.x_input: x2}) 51 | g2 = np.moveaxis(np.array(g2), 0, 1) 52 | g2 = np.transpose(g2, (0, 1, 4, 2, 3)) 53 | 54 | return torch.from_numpy(logits).cuda(), torch.from_numpy(g2).cuda() 55 | 56 | def get_grad_diff_logits_target(self, x, y=None, y_target=None): 57 | la = y.cpu().numpy() 58 | la_target = y_target.cpu().numpy() 59 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 60 | dl, g2 = self.sess.run([self.diff_logits, self.grad_diff_logits], {self.x_input: x2, self.la: la, self.la_target: la_target}) 61 | g2 = np.transpose(np.array(g2), (0, 3, 1, 2)) 62 | 63 | return torch.from_numpy(dl).cuda(), torch.from_numpy(g2).cuda() 64 | 65 | def get_logits_loss_grad_xent(self, x, y): 66 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 67 | y2 = y.clone().cpu().numpy() 68 | logits_val, loss_indiv_val, grad_val = self.sess.run([self.logits, self.xent, self.grad_xent], {self.x_input: x2, self.y_input: y2}) 69 | grad_val = np.moveaxis(grad_val, 3, 1) 70 | 71 | return torch.from_numpy(logits_val).cuda(), torch.from_numpy(loss_indiv_val).cuda(), torch.from_numpy(grad_val).cuda() 72 | 73 | def get_logits_loss_grad_dlr(self, x, y): 74 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 75 | y2 = y.clone().cpu().numpy() 76 | logits_val, loss_indiv_val, grad_val = self.sess.run([self.logits, self.dlr, self.grad_dlr], {self.x_input: x2, self.y_input: y2}) 77 | grad_val = np.moveaxis(grad_val, 3, 1) 78 | 79 | return torch.from_numpy(logits_val).cuda(), torch.from_numpy(loss_indiv_val).cuda(), torch.from_numpy(grad_val).cuda() 80 | 81 | def get_logits_loss_grad_target(self, x, y, y_target): 82 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 83 | y2 = y.clone().cpu().numpy() 84 | y_targ = y_target.clone().cpu().numpy() 85 | logits_val, loss_indiv_val, grad_val = self.sess.run([self.logits, self.dlr_target, self.grad_target], {self.x_input: x2, self.y_input: y2, self.y_target: y_targ}) 86 | grad_val = np.moveaxis(grad_val, 3, 1) 87 | 88 | return torch.from_numpy(logits_val).cuda(), torch.from_numpy(loss_indiv_val).cuda(), torch.from_numpy(grad_val).cuda() 89 | 90 | def dlr_loss(x, y, num_classes=10): 91 | x_sort = tf.contrib.framework.sort(x, axis=1) 92 | y_onehot = tf.one_hot(y, num_classes) 93 | ### TODO: adapt to the case when the point is already misclassified 94 | loss = -(x_sort[:, -1] - x_sort[:, -2]) / (x_sort[:, -1] - x_sort[:, -3] + 1e-12) 95 | 96 | return loss 97 | 98 | def dlr_loss_targeted(x, y, y_target, num_classes=10): 99 | x_sort = tf.contrib.framework.sort(x, axis=1) 100 | y_onehot = tf.one_hot(y, num_classes) 101 | y_target_onehot = tf.one_hot(y_target, num_classes) 102 | loss = -(tf.reduce_sum(x * y_onehot, axis=1) - tf.reduce_sum(x * y_target_onehot, axis=1)) / (x_sort[:, -1] - .5 * x_sort[:, -3] - .5 * x_sort[:, -4] + 1e-12) 103 | 104 | return loss 105 | -------------------------------------------------------------------------------- /autoattack/utils_tf2.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import torch 4 | 5 | class ModelAdapter(): 6 | def __init__(self, model, num_classes=10): 7 | """ 8 | Please note that model should be tf.keras model without activation function 'softmax' 9 | """ 10 | self.num_classes = num_classes 11 | self.tf_model = model 12 | self.data_format = self.__check_channel_ordering() 13 | 14 | def __tf_to_pt(self, tf_tensor): 15 | """ Private function 16 | Convert tf tensor to pt format 17 | 18 | Args: 19 | tf_tensor: (tf_tensor) TF tensor 20 | 21 | Retruns: 22 | pt_tensor: (pt_tensor) Pytorch tensor 23 | """ 24 | 25 | cpu_tensor = tf_tensor.numpy() 26 | pt_tensor = torch.from_numpy(cpu_tensor).cuda() 27 | 28 | return pt_tensor 29 | 30 | def set_data_format(self, data_format): 31 | """ 32 | Set data_format manually 33 | 34 | Args: 35 | data_format: A string, whose value should be either 'channels_last' or 'channels_first' 36 | """ 37 | 38 | if data_format != 'channels_last' or data_format != 'channels_first': 39 | raise ValueError("data_format should be either 'channels_last' or 'channels_first'") 40 | 41 | self.data_format = data_format 42 | 43 | 44 | def __check_channel_ordering(self): 45 | """ Private function 46 | Determinate TF model's channel ordering based on model's information. 47 | Default ordering is 'channels_last' in TF. 48 | However, 'channels_first' is used in Pytorch. 49 | 50 | Returns: 51 | data_format: A string, whose value should be either 'channels_last' or 'channels_first' 52 | """ 53 | 54 | data_format = None 55 | 56 | # Get the ordering of the dimensions in data from TF model 57 | for L in self.tf_model.layers: 58 | if isinstance(L, tf.keras.layers.Conv2D): 59 | print("[INFO] set data_format = '{:s}'".format(L.data_format)) 60 | data_format = L.data_format 61 | break 62 | 63 | # Guess the ordering of the dimensions in data by input dimensions which sould be 4-D tensor 64 | if data_format is None: 65 | print("[WARNING] Can not find Conv2D layer") 66 | input_shape = self.tf_model.input_shape 67 | 68 | # Assume that input is *colorful image* whose dimensions should be [batch_size, img_w, img_h, 3] 69 | if input_shape[3] == 3: 70 | print("[INFO] Because detecting input_shape[3] == 3, set data_format = 'channels_last'") 71 | data_format = 'channels_last' 72 | 73 | # Assume that input is *gray image* whose dimensions should be [batch_size, img_w, img_h, 1] 74 | elif input_shape[3] == 1: 75 | print("[INFO] Because detecting input_shape[3] == 1, set data_format = 'channels_last'") 76 | data_format = 'channels_last' 77 | 78 | # Assume that input is *colorful image* whose dimensions should be [batch_size, 3, img_w, img_h] 79 | elif input_shape[1] == 3: 80 | print("[INFO] Because detecting input_shape[1] == 3, set data_format = 'channels_first'") 81 | data_format = 'channels_first' 82 | 83 | # Assume that input is *gray image* whose dimensions should be [batch_size, 1, img_w, img_h] 84 | elif input_shape[1] == 1: 85 | print("[INFO] Because detecting input_shape[1] == 1, set data_format = 'channels_first'") 86 | data_format = 'channels_first' 87 | 88 | else: 89 | print("[ERROR] Unknow case") 90 | 91 | return data_format 92 | 93 | 94 | # Common function which may be called in tf.function # 95 | def __get_logits(self, x_input): 96 | """ Private function 97 | Get model's pre-softmax output in inference mode 98 | 99 | Args: 100 | x_input: (tf_tensor) Input data 101 | 102 | Returns: 103 | logits: (tf_tensor) Logits 104 | """ 105 | 106 | return self.tf_model(x_input, training=False) 107 | 108 | 109 | def __get_xent(self, logits, y_input): 110 | """ Private function 111 | Get cross entropy loss 112 | 113 | Args: 114 | logits: (tf_tensor) Logits. 115 | y_input: (tf_tensor) Label. 116 | 117 | Returns: 118 | xent: (tf_tensor) Cross entropy 119 | """ 120 | 121 | return tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y_input) 122 | 123 | 124 | def __get_dlr(self, logit, y_input): 125 | """ Private function 126 | Get DLR loss 127 | 128 | Args: 129 | logit: (tf_tensor) Logits 130 | y_input: (tf_tensor) Input label 131 | 132 | Returns: 133 | loss: (tf_tensor) DLR loss 134 | """ 135 | 136 | # logit 137 | logit_sort = tf.sort(logit, axis=1) 138 | 139 | # onthot_y 140 | y_onehot = tf.one_hot(y_input , self.num_classes, dtype=tf.float32) 141 | logit_y = tf.reduce_sum(y_onehot * logit, axis=1) 142 | 143 | # z_i 144 | logit_pred = tf.reduce_max(logit, axis=1) 145 | cond = (logit_pred == logit_y) 146 | z_i = tf.where(cond, logit_sort[:, -2], logit_sort[:, -1]) 147 | 148 | # loss 149 | z_y = logit_y 150 | z_p1 = logit_sort[:, -1] 151 | z_p3 = logit_sort[:, -3] 152 | 153 | loss = - (z_y - z_i) / (z_p1 - z_p3 + 1e-12) 154 | return loss 155 | 156 | 157 | def __get_dlr_target(self, logits, y_input, y_target): 158 | """ Private function 159 | Get targeted version of DLR loss 160 | 161 | Args: 162 | logit: (tf_tensor) Logits 163 | y_input: (tf_tensor) Input label 164 | y_target: (tf_tensor) Input targeted label 165 | 166 | Returns: 167 | loss: (tf_tensor) Targeted DLR loss 168 | """ 169 | 170 | x = logits 171 | x_sort = tf.sort(x, axis=1) 172 | y_onehot = tf.one_hot(y_input, self.num_classes) 173 | y_target_onehot = tf.one_hot(y_target, self.num_classes) 174 | loss = -(tf.reduce_sum(x * y_onehot, axis=1) - tf.reduce_sum(x * y_target_onehot, axis=1)) / (x_sort[:, -1] - .5 * x_sort[:, -3] - .5 * x_sort[:, -4] + 1e-12) 175 | 176 | return loss 177 | 178 | 179 | # function called by public API directly # 180 | @tf.function 181 | @tf.autograph.experimental.do_not_convert 182 | def __get_jacobian(self, x_input): 183 | """ Private function 184 | Get Jacoian 185 | 186 | Args: 187 | x_input: (tf_tensor) Input data 188 | 189 | Returns: 190 | jaconbian: (tf_tensor) Jacobian 191 | """ 192 | 193 | with tf.GradientTape(watch_accessed_variables=False) as g: 194 | g.watch(x_input) 195 | logits = self.__get_logits(x_input) 196 | 197 | jacobian = g.batch_jacobian(logits, x_input) 198 | 199 | return logits, jacobian 200 | 201 | 202 | @tf.function 203 | @tf.autograph.experimental.do_not_convert 204 | def __get_grad_xent(self, x_input, y_input): 205 | """ Private function 206 | Get gradient of cross entropy 207 | 208 | Args: 209 | x_input: (tf_tensor) Input data 210 | y_input: (tf_tensor) Input label 211 | 212 | Returns: 213 | logits: (tf_tensor) Logits 214 | xent: (tf_tensor) Cross entropy 215 | grad_xent: (tf_tensor) Gradient of cross entropy 216 | """ 217 | 218 | with tf.GradientTape(watch_accessed_variables=False) as g: 219 | g.watch(x_input) 220 | logits = self.__get_logits(x_input) 221 | xent = self.__get_xent(logits, y_input) 222 | 223 | grad_xent = g.gradient(xent, x_input) 224 | 225 | return logits, xent, grad_xent 226 | 227 | 228 | @tf.function 229 | @tf.autograph.experimental.do_not_convert 230 | def __get_grad_diff_logits_target(self, x, la, la_target): 231 | """ Private function 232 | Get difference of logits and corrospopnding gradient 233 | 234 | Args: 235 | x_input: (tf_tensor) Input data 236 | la: (tf_tensor) Input label 237 | la_target: (tf_tensor) Input targeted label 238 | 239 | Returns: 240 | difflogits: (tf_tensor) Difference of logits 241 | grad_diff: (tf_tensor) Gradient of difference of logits 242 | """ 243 | 244 | la_mask = tf.one_hot(la, self.num_classes) 245 | la_target_mask = tf.one_hot(la_target, self.num_classes) 246 | 247 | with tf.GradientTape(watch_accessed_variables=False) as g: 248 | g.watch(x) 249 | logits = self.__get_logits(x) 250 | difflogits = tf.reduce_sum((la_target_mask - la_mask) * logits, axis=1) 251 | 252 | grad_diff = g.gradient(difflogits, x) 253 | 254 | return difflogits, grad_diff 255 | 256 | 257 | @tf.function 258 | @tf.autograph.experimental.do_not_convert 259 | def __get_grad_dlr(self, x_input, y_input): 260 | """ Private function 261 | Get gradient of DLR loss 262 | 263 | Args: 264 | x_input: (tf_tensor) Input data 265 | y_input: (tf_tensor) Input label 266 | 267 | Returns: 268 | logits: (tf_tensor) Logits 269 | val_dlr: (tf_tensor) DLR loss 270 | grad_dlr: (tf_tensor) Gradient of DLR loss 271 | """ 272 | 273 | with tf.GradientTape(watch_accessed_variables=False) as g: 274 | g.watch(x_input) 275 | logits = self.__get_logits(x_input) 276 | val_dlr = self.__get_dlr(logits, y_input) 277 | 278 | grad_dlr = g.gradient(val_dlr, x_input) 279 | 280 | return logits, val_dlr, grad_dlr 281 | 282 | 283 | @tf.function 284 | @tf.autograph.experimental.do_not_convert 285 | def __get_grad_dlr_target(self, x_input, y_input, y_target): 286 | """ Private function 287 | Get gradient of targeted DLR loss 288 | 289 | Args: 290 | x_input: (tf_tensor) Input data 291 | y_input: (tf_tensor) Input label 292 | y_target: (tf_tensor) Input targeted label 293 | 294 | Returns: 295 | logits: (tf_tensor) Logits 296 | val_dlr: (tf_tensor) Targeted DLR loss 297 | grad_dlr: (tf_tensor) Gradient of targeted DLR loss 298 | """ 299 | 300 | with tf.GradientTape(watch_accessed_variables=False) as g: 301 | g.watch(x_input) 302 | logits = self.__get_logits(x_input) 303 | dlr_target = self.__get_dlr_target(logits, y_input, y_target) 304 | 305 | grad_target = g.gradient(dlr_target, x_input) 306 | 307 | return logits, dlr_target, grad_target 308 | 309 | 310 | # Public API # 311 | def predict(self, x): 312 | """ 313 | Get model's pre-softmax output in inference mode 314 | 315 | Args: 316 | x_input: (pytorch_tensor) Input data 317 | 318 | Returns: 319 | y: (pytorch_tensor) Pre-softmax output 320 | """ 321 | 322 | # Convert pt_tensor to tf format 323 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 324 | if self.data_format == 'channels_last': 325 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 326 | 327 | # Get result 328 | y = self.__get_logits(x2) 329 | 330 | # Convert result to pt format 331 | y = self.__tf_to_pt(y) 332 | 333 | return y 334 | 335 | 336 | def grad_logits(self, x): 337 | """ 338 | Get logits and gradient of logits 339 | 340 | Args: 341 | x: (pytorch_tensor) Input data 342 | 343 | Returns: 344 | logits: (pytorch_tensor) Logits 345 | g2: (pytorch_tensor) Jacobian 346 | """ 347 | 348 | # Convert pt_tensor to tf format 349 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 350 | if self.data_format == 'channels_last': 351 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 352 | 353 | # Get result 354 | logits, g2 = self.__get_jacobian(x2) 355 | 356 | # Convert result to pt format 357 | if self.data_format == 'channels_last': 358 | g2 = tf.transpose(g2, perm=[0,1,4,2,3]) 359 | logits = self.__tf_to_pt(logits) 360 | g2 = self.__tf_to_pt(g2) 361 | 362 | return logits, g2 363 | 364 | 365 | def get_logits_loss_grad_xent(self, x, y): 366 | """ 367 | Get gradient of cross entropy 368 | 369 | Args: 370 | x: (pytorch_tensor) Input data 371 | y: (pytorch_tensor) Input label 372 | 373 | Returns: 374 | logits_val: (pytorch_tensor) Logits 375 | loss_indiv_val: (pytorch_tensor) Cross entropy 376 | grad_val: (pytorch_tensor) Gradient of cross entropy 377 | """ 378 | 379 | # Convert pt_tensor to tf format 380 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 381 | y2 = tf.convert_to_tensor(y.cpu().numpy(), dtype=tf.int32) 382 | if self.data_format == 'channels_last': 383 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 384 | 385 | # Get result 386 | logits_val, loss_indiv_val, grad_val = self.__get_grad_xent(x2, y2) 387 | 388 | # Convert result to pt format 389 | if self.data_format == 'channels_last': 390 | grad_val = tf.transpose(grad_val, perm=[0,3,1,2]) 391 | logits_val = self.__tf_to_pt(logits_val) 392 | loss_indiv_val = self.__tf_to_pt(loss_indiv_val) 393 | grad_val = self.__tf_to_pt(grad_val) 394 | 395 | return logits_val, loss_indiv_val, grad_val 396 | 397 | 398 | def set_target_class(self, y, y_target): 399 | pass 400 | 401 | 402 | def get_grad_diff_logits_target(self, x, y, y_target): 403 | """ 404 | Get difference of logits and corrospopnding gradient 405 | 406 | Args: 407 | x: (pytorch_tensor) Input data 408 | y: (pytorch_tensor) Input label 409 | y_target: (pytorch_tensor) Input targeted label 410 | 411 | Returns: 412 | difflogits: (pytorch_tensor) Difference of logits 413 | g2: (pytorch_tensor) Gradient of difference of logits 414 | """ 415 | 416 | # Convert pt_tensor to tf format 417 | la = tf.convert_to_tensor(y.cpu().numpy(), dtype=tf.int32) 418 | la_target = tf.convert_to_tensor(y_target.cpu().numpy(), dtype=tf.int32) 419 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 420 | if self.data_format == 'channels_last': 421 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 422 | 423 | # Get result 424 | difflogits, g2 = self.__get_grad_diff_logits_target(x2, la, la_target) 425 | 426 | # Convert result to pt format 427 | if self.data_format == 'channels_last': 428 | g2 = tf.transpose(g2, perm=[0, 3, 1, 2]) 429 | difflogits = self.__tf_to_pt(difflogits) 430 | g2 = self.__tf_to_pt(g2) 431 | 432 | return difflogits, g2 433 | 434 | 435 | def get_logits_loss_grad_dlr(self, x, y): 436 | """ 437 | Get gradient of DLR loss 438 | 439 | Args: 440 | x: (pytorch_tensor) Input data 441 | y: (pytorch_tensor) Input label 442 | 443 | Returns: 444 | logits_val: (pytorch_tensor) Logits 445 | loss_indiv_val: (pytorch_tensor) DLR loss 446 | grad_val: (pytorch_tensor) Gradient of DLR loss 447 | """ 448 | 449 | # Convert pt_tensor to tf format 450 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 451 | y2 = tf.convert_to_tensor(y.cpu().numpy(), dtype=tf.int32) 452 | if self.data_format == 'channels_last': 453 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 454 | 455 | # Get result 456 | logits_val, loss_indiv_val, grad_val = self.__get_grad_dlr(x2, y2) 457 | 458 | # Convert result to pt format 459 | if self.data_format == 'channels_last': 460 | grad_val = tf.transpose(grad_val, perm=[0,3,1,2]) 461 | logits_val = self.__tf_to_pt(logits_val) 462 | loss_indiv_val = self.__tf_to_pt(loss_indiv_val) 463 | grad_val = self.__tf_to_pt(grad_val) 464 | 465 | return logits_val, loss_indiv_val, grad_val 466 | 467 | def get_logits_loss_grad_target(self, x, y, y_target): 468 | """ 469 | Get gradient of targeted DLR loss 470 | 471 | Args: 472 | x: (pytorch_tensor) Input data 473 | y: (pytorch_tensor) Input label 474 | y_target: (pytorch_tensor) Input targeted label 475 | 476 | Returns: 477 | logits_val: (pytorch_tensor) Logits 478 | loss_indiv_val: (pytorch_tensor) Targeted DLR loss 479 | grad_val: (pytorch_tensor) Gradient of targeted DLR loss 480 | """ 481 | 482 | # Convert pt_tensor to tf format 483 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 484 | y2 = tf.convert_to_tensor(y.cpu().numpy(), dtype=tf.int32) 485 | y_targ = tf.convert_to_tensor(y_target.cpu().numpy(), dtype=tf.int32) 486 | if self.data_format == 'channels_last': 487 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 488 | 489 | # Get result 490 | logits_val, loss_indiv_val, grad_val = self.__get_grad_dlr_target(x2, y2, y_targ) 491 | 492 | # Convert result to pt format 493 | if self.data_format == 'channels_last': 494 | grad_val = tf.transpose(grad_val, perm=[0,3,1,2]) 495 | logits_val = self.__tf_to_pt(logits_val) 496 | loss_indiv_val = self.__tf_to_pt(loss_indiv_val) 497 | grad_val = self.__tf_to_pt(grad_val) 498 | 499 | return logits_val, loss_indiv_val, grad_val 500 | -------------------------------------------------------------------------------- /flags_doc.md: -------------------------------------------------------------------------------- 1 | ## On the usage of AutoAttack 2 | 3 | We here describe cases where the standard version of AA might be non suitable or sufficient for robustness evaluation. While AA is designed to generalize across defenses, there are categories like 4 | randomized, non differentiable or dynamic defenses for which it cannot be applied in the standard version, since those rely on differet principles than commonly used robust models. In such cases, 5 | specific modifications or adaptive attacks [(Tramèr et al., 2020)](https://arxiv.org/abs/2002.08347) might be necessary. 6 | 7 | ## Checks 8 | We introduce a few automatic checks to warn the user in case the classifier presents behaviors typical of non standard models. Below we describe the type of flags which might be raised and provide 9 | some suggestions about how the robustness evaluation could be improved in the specific cases. Note that some of the checks are in line with the analyses and suggestions by recent works 10 | ([Carlini et al., 2019](https://arxiv.org/abs/1902.06705); [Croce et al., 2020](https://arxiv.org/abs/2010.09670); [Pintor et al., 2021](https://arxiv.org/abs/2106.09947)) which provide guidelines for 11 | evaluating robustness and detecting failures of attacks. 12 | 13 | ### Randomized defenses 14 | **Raised if** the clean accuracy of the classifier on a batch or the corresponding logits vary across multiple runs.\ 15 | **Explanation:** non deterministic classifiers need to be evaluated with specific techniques e.g. EoT [(Athalye et al., 2018)](http://proceedings.mlr.press/v80/athalye18a.html) and mislead 16 | standard attacks. We suggest to use AA with `version='rand'`, which inclueds APGD combined with EoT. Note that there might still be some random components 17 | in the network which however do not change the predictions or the logits beyond the chosen threshold. 18 | 19 | ### Softmax output is given 20 | **Raised if** the model outputs a probability distribution. \ 21 | **Explanation:** AA expects the model to return logits, i.e. pre-softmax output of the network. If this is not the case, although the classification is unaltered, 22 | there might be numerical instabilities which prevent the gradient-based attacks to perform well. 23 | 24 | ### Zero gradient 25 | **Raised if** the gradient at the (random) starting point of APGD is zero for any image when using the DLR loss. \ 26 | **Explanation:** zero gradient prevents progress in gradient-based iterative attacks. A source of it could be connected to the cross-entropy loss and the scale of the logits, but a remedy consists in 27 | using margin based losses ([Carlini & Wagner, 2017](https://ieeexplore.ieee.org/abstract/document/7958570); [Croce & Hein, 2020](https://arxiv.org/abs/2003.01690)). Vanishing gradients can be also due to specific 28 | components of the networks, like input quantization (see e.g. [here](https://github.com/fra31/auto-attack/issues/44)), which do not allow 29 | backpropagation. In this case one might use BPDA [(Athalye et al., 2018)](http://proceedings.mlr.press/v80/athalye18a.html), which approximates such functions with differentiable counterparts, or black-box attacks, especially those, like Square Attack, which do not rely on 30 | gradient estimation. 31 | 32 | ### Square Attack improves the robustness evaluation 33 | **Raised if** Square Attack reduces the robust accuracy yielded by the white-box attacks. \ 34 | **Explanation:** as mentioned by [Carlini et al. (2019)](https://arxiv.org/abs/1902.06705), black-box attacks performing better than white-box ones is one of the hints of overestimation of robustness. In this case one might run 35 | Square Attack with higher budget (more queries, random restarts) or design adaptive attacks, since it is likely that the tested defense has some features preventing standard gradient-based methods 36 | to be effective. 37 | 38 | ### Optimization at inference time (only PyTorch models) 39 | **Raised if** standard PyTorch functions for computing the gradients are called when running inference with the given classifier. \ 40 | **Explanation:** several defenses which include some optimization loop in the inference procedure have appeared. While AA can give a first estimation of the robustness, it is necessary in this case 41 | to design adaptive attacks, since such models usually modify the input before classifying it, which requires specific techniques for evaluation. Note that this check is non trivial to make automatic, 42 | and we invite the user to be aware that AA might be not the best option to evaluate dynamic defenses. -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import setuptools 2 | 3 | 4 | with open("README.md", "r", encoding="utf-8") as fh: 5 | long_description = fh.read() 6 | 7 | setuptools.setup( 8 | name="autoattack", 9 | version="0.1", 10 | author="Francesco Croce, Matthias Hein", 11 | author_email="francesco.croce@uni-tuebingen.de", 12 | description="This package provides the implementation of AutoAttack.", 13 | long_description=long_description, 14 | long_description_content_type="text/markdown", 15 | url="https://github.com/fra31/auto-attack", 16 | packages=setuptools.find_packages(), 17 | classifiers=[ 18 | "Programming Language :: Python :: 3", 19 | "License :: OSI Approved :: Apache Software License", 20 | "Operating System :: OS Independent", 21 | ], 22 | ) 23 | 24 | 25 | --------------------------------------------------------------------------------