├── .github └── ISSUE_TEMPLATE │ └── add-defense.md ├── .gitignore ├── LICENSE ├── README.md ├── autoattack ├── __init__.py ├── autoattack.py ├── autopgd_base.py ├── checks.py ├── examples │ ├── eval.py │ ├── eval_tf1.py │ ├── eval_tf2.py │ ├── model_test.pt │ ├── resnet.py │ └── tf_model.weight.h5 ├── fab_base.py ├── fab_projections.py ├── fab_pt.py ├── fab_tf.py ├── other_utils.py ├── square.py ├── state.py ├── utils_tf.py └── utils_tf2.py ├── flags_doc.md └── setup.py /.github/ISSUE_TEMPLATE/add-defense.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Add defense 3 | about: Use this to have a new model added to the list of defenses 4 | title: Add [defense name] 5 | labels: '' 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Paper**: {title and link} 11 | 12 | **Venue**: {if applicable, the venue where the paper appeared} 13 | 14 | **Dataset and threat model**: {dataset, norm and epsilon for robust accuracy} 15 | 16 | **Code**: {link to the code e.g. GitHub page, possibly including a script to run the evaluation} 17 | 18 | **Pre-trained model**: {link to model weights available for downloading} 19 | 20 | **Log file**: {link to log file of the evaluation} 21 | 22 | **Additional data**: {yes/no, whether extra data, other than the standard training set, is used} 23 | 24 | **Clean and robust accuracy**: {on the full test set} 25 | 26 | **Architecture**: {} 27 | 28 | **Description of the model/defense**: {more details about the method proposed, e.g. 
new loss, new adversarial training, faster training, compressed model, which are relevant contributions of the paper} 29 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | data/ 2 | /__pycache__/ 3 | */__pycache__/ 4 | 5 | build/ 6 | dist/ 7 | *.egg-info/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Francesco Croce 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # AutoAttack 2 | 3 | "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks"\ 4 | *Francesco Croce*, *Matthias Hein*\ 5 | ICML 2020\ 6 | [https://arxiv.org/abs/2003.01690](https://arxiv.org/abs/2003.01690) 7 | 8 | 9 | We propose to use an ensemble of four diverse attacks to reliably evaluate robustness: 10 | + **APGD-CE**, our new step-size-free version of PGD on the cross-entropy loss, 11 | + **APGD-DLR**, our new step-size-free version of PGD on the new DLR loss, 12 | + **FAB**, which minimizes the norm of the adversarial perturbations [(Croce & Hein, 2019)](https://arxiv.org/abs/1907.02044), 13 | + **Square Attack**, a query-efficient black-box attack [(Andriushchenko et al., 2019)](https://arxiv.org/abs/1912.00049). 14 | 15 | **Note**: we fix all the hyperparameters of the attacks, so no tuning is required to test every new classifier. 16 | 17 | ## News 18 | + [Sep 2021] 19 | + We add [automatic checks](https://github.com/fra31/auto-attack/blob/master/flags_doc.md) for potential cases where the standard version of AA might not be suitable or sufficient for robustness evaluation. 20 | + The evaluations of models on CIFAR-10 and CIFAR-100 are no longer maintained. Up-to-date leaderboards are available in [RobustBench](https://robustbench.github.io/). 21 | + [Mar 2021] A version of AutoAttack wrt L1, which includes the extensions of APGD and Square Attack [(Croce & Hein, 2021)](https://arxiv.org/abs/2103.01208), is available! 22 | + [Oct 2020] AutoAttack is used as the standard evaluation in the new benchmark [RobustBench](https://robustbench.github.io/), which includes a [Model Zoo](https://github.com/RobustBench/robustbench) of the most robust classifiers! Note that this page and RobustBench's leaderboards are maintained simultaneously.
23 | + [Aug 2020] 24 | + **Updated version**: in order to *i)* scale AutoAttack (AA) to datasets with many classes and *ii)* have a faster and more accurate evaluation, we use APGD-DLR and FAB in their *targeted* versions. 25 | + We add the evaluation of models on CIFAR-100 wrt Linf and CIFAR-10 wrt L2. 26 | + [Jul 2020] A short version of the paper is accepted at the [ICML'20 UDL workshop](https://sites.google.com/view/udlworkshop2020/) for a spotlight presentation! 27 | + [Jun 2020] The paper is accepted at ICML 2020! 28 | 29 | # Adversarial Defenses Evaluation 30 | Here we list recently proposed adversarial defenses, across several threat models, evaluated with the standard version of 31 | **AutoAttack (AA)**, which includes 32 | + *untargeted APGD-CE* (no restarts), 33 | + *targeted APGD-DLR* (9 target classes), 34 | + *targeted FAB* (9 target classes), 35 | + *Square Attack* (5000 queries). 36 | 37 | See below for the more expensive AutoAttack+ (AA+) and further options. 38 | 39 | We report the source of each model, i.e. whether it is publicly *available*, we received it from the *authors* or we *retrained* it, the architecture, the clean accuracy and the reported robust accuracy (note that the latter might be calculated on a subset of the test set or on different models trained with the same defense). The robust accuracy for AA is computed on the full test set. 40 | 41 | We plan to add new models as they appear and are made available. Feel free to suggest new defenses to test! 42 | 43 | **To have a model added**: please check [here](https://github.com/fra31/auto-attack/issues/new/choose). 44 | 45 | **Checkpoints**: many of the evaluated models are available and easily accessible at this [Model Zoo](https://github.com/RobustBench/robustbench). 46 | 47 | ## CIFAR-10 - Linf 48 | The robust accuracy is evaluated at `eps = 8/255`, except for the models marked with * for which `eps = 0.031`, where `eps` is the maximal Linf-norm allowed for the adversarial perturbations.
The `eps` used is the same as the one set in the original papers.\ 49 | **Note**: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training). 50 | 51 | **Update**: this is no longer maintained, but an up-to-date leaderboard is available in [RobustBench](https://robustbench.github.io/). 52 | 53 | |# |paper |model |architecture |clean |report. |AA | 54 | |:---:|---|:---:|:---:|---:|---:|---:| 55 | |**1**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)‡| *available*| WRN-70-16| 91.10| 65.87| 65.88| 56 | |**2**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)‡| *available*| WRN-28-10| 89.48| 62.76| 62.80| 57 | |**3**| [(Wu et al., 2020a)](https://arxiv.org/abs/2010.01279)‡| *available*| WRN-34-15| 87.67| 60.65| 60.65| 58 | |**4**| [(Wu et al., 2020b)](https://arxiv.org/abs/2004.05884)‡| *available*| WRN-28-10| 88.25| 60.04| 60.04| 59 | |**5**| [(Carmon et al., 2019)](https://arxiv.org/abs/1905.13736)‡| *available*| WRN-28-10| 89.69| 62.5| 59.53| 60 | |**6**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)| *available*| WRN-70-16| 85.29| 57.14| 57.20| 61 | |**7**| [(Sehwag et al., 2020)](https://github.com/fra31/auto-attack/issues/7)‡| *available*| WRN-28-10| 88.98| -| 57.14| 62 | |**8**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)| *available*| WRN-34-20| 85.64| 56.82| 56.86| 63 | |**9**| [(Wang et al., 2020)](https://openreview.net/forum?id=rklOg6EFwS)‡| *available*| WRN-28-10| 87.50| 65.04| 56.29| 64 | |**10**| [(Wu et al., 2020b)](https://arxiv.org/abs/2004.05884)| *available*| WRN-34-10| 85.36| 56.17| 56.17| 65 | |**11**| [(Alayrac et al., 2019)](https://arxiv.org/abs/1905.13725)‡| *available*| WRN-106-8| 86.46| 56.30| 56.03| 66 | |**12**| [(Hendrycks et al., 2019)](https://arxiv.org/abs/1901.09960)‡| *available*| WRN-28-10| 87.11| 57.4| 54.92| 67 | |**13**| [(Pang et al., 2020c)](https://arxiv.org/abs/2010.00467)| *available*| WRN-34-20| 86.43| 54.39| 54.39| 68 | |**14**| [(Pang et al.,
2020b)](https://arxiv.org/abs/2002.08619)| *available*| WRN-34-20| 85.14| -| 53.74| 69 | |**15**| [(Cui et al., 2020)](https://arxiv.org/abs/2011.11164)\*| *available*| WRN-34-20| 88.70| 53.57| 53.57| 70 | |**16**| [(Zhang et al., 2020b)](https://arxiv.org/abs/2002.11242)| *available*| WRN-34-10| 84.52| 54.36| 53.51| 71 | |**17**| [(Rice et al., 2020)](https://arxiv.org/abs/2002.11569)| *available*| WRN-34-20| 85.34| 58| 53.42| 72 | |**18**| [(Huang et al., 2020)](https://arxiv.org/abs/2002.10319)\*| *available*| WRN-34-10| 83.48| 58.03| 53.34| 73 | |**19**| [(Zhang et al., 2019b)](https://arxiv.org/abs/1901.08573)\*| *available*| WRN-34-10| 84.92| 56.43| 53.08| 74 | |**20**| [(Cui et al., 2020)](https://arxiv.org/abs/2011.11164)\*| *available*| WRN-34-10| 88.22| 52.86| 52.86| 75 | |**21**| [(Qin et al., 2019)](https://arxiv.org/abs/1907.02610v2)| *available*| WRN-40-8| 86.28| 52.81| 52.84| 76 | |**22**| [(Chen et al., 2020a)](https://arxiv.org/abs/2003.12862)| *available*| RN-50 (x3)| 86.04| 54.64| 51.56| 77 | |**23**| [(Chen et al., 2020b)](https://github.com/fra31/auto-attack/issues/26)| *available*| WRN-34-10| 85.32| 51.13| 51.12| 78 | |**24**| [(Sitawarin et al., 2020)](https://github.com/fra31/auto-attack/issues/23)| *available*| WRN-34-10| 86.84| 50.72| 50.72| 79 | |**25**| [(Engstrom et al., 2019)](https://github.com/MadryLab/robustness)| *available*| RN-50| 87.03| 53.29| 49.25| 80 | |**26**| [(Kumari et al., 2019)](https://arxiv.org/abs/1905.05186)| *available*| WRN-34-10| 87.80| 53.04| 49.12| 81 | |**27**| [(Mao et al., 2019)](http://papers.nips.cc/paper/8339-metric-learning-for-adversarial-robustness)| *available*| WRN-34-10| 86.21| 50.03| 47.41| 82 | |**28**| [(Zhang et al., 2019a)](https://arxiv.org/abs/1905.00877)| *retrained*| WRN-34-10| 87.20| 47.98| 44.83| 83 | |**29**| [(Madry et al., 2018)](https://arxiv.org/abs/1706.06083)| *available*| WRN-34-10| 87.14| 47.04| 44.04| 84 | |**30**| [(Pang et al., 2020a)](https://arxiv.org/abs/1905.10626)| 
*available*| RN-32| 80.89| 55.0| 43.48| 85 | |**31**| [(Wong et al., 2020)](https://arxiv.org/abs/2001.03994)| *available*| RN-18| 83.34| 46.06| 43.21| 86 | |**32**| [(Shafahi et al., 2019)](https://arxiv.org/abs/1904.12843)| *available*| WRN-34-10| 86.11| 46.19| 41.47| 87 | |**33**| [(Ding et al., 2020)](https://openreview.net/forum?id=HkeryxBtPB)| *available*| WRN-28-4| 84.36| 47.18| 41.44| 88 | |**34**| [(Atzmon et al., 2019)](https://arxiv.org/abs/1905.11911)\*| *available*| RN-18| 81.30| 43.17| 40.22| 89 | |**35**| [(Moosavi-Dezfooli et al., 2019)](http://openaccess.thecvf.com/content_CVPR_2019/html/Moosavi-Dezfooli_Robustness_via_Curvature_Regularization_and_Vice_Versa_CVPR_2019_paper)| *authors*| WRN-28-10| 83.11| 41.4| 38.50| 90 | |**36**| [(Zhang & Wang, 2019)](http://papers.nips.cc/paper/8459-defense-against-adversarial-attacks-using-feature-scattering-based-adversarial-training)| *available*| WRN-28-10| 89.98| 60.6| 36.64| 91 | |**37**| [(Zhang & Xu, 2020)](https://openreview.net/forum?id=Syejj0NYvr&noteId=Syejj0NYvr)| *available*| WRN-28-10| 90.25| 68.7| 36.45| 92 | |**38**| [(Jang et al., 2019)](http://openaccess.thecvf.com/content_ICCV_2019/html/Jang_Adversarial_Defense_via_Learning_to_Generate_Diverse_Attacks_ICCV_2019_paper.html)| *available*| RN-20| 78.91| 37.40| 34.95| 93 | |**39**| [(Kim & Wang, 2020)](https://openreview.net/forum?id=rJlf_RVKwr)| *available*| WRN-34-10| 91.51| 57.23| 34.22| 94 | |**40**| [(Wang & Zhang, 2019)](http://openaccess.thecvf.com/content_ICCV_2019/html/Wang_Bilateral_Adversarial_Training_Towards_Fast_Training_of_More_Robust_Models_ICCV_2019_paper.html)| *available*| WRN-28-10| 92.80| 58.6| 29.35| 95 | |**41**| [(Xiao et al., 2020)](https://arxiv.org/abs/1905.10510)\*| *available*| DenseNet-121| 79.28| 52.4| 18.50| 96 | |**42**| [(Jin & Rinard, 2020)](https://arxiv.org/abs/2003.04286v1) |
[*available*](https://github.com/charlesjin/adversarial_regularization/blob/6a3704757dcc7c707ff38f8b9de6f2e9e27e0a89/pretrained/pretrained88.pth) | RN-18| 90.84| 71.22| 1.35| 97 | |**43**| [(Mustafa et al., 2019)](https://arxiv.org/abs/1904.00887)| *available*| RN-110| 89.16| 32.32| 0.28| 98 | |**44**| [(Chan et al., 2020)](https://arxiv.org/abs/1912.10185)| *retrained*| WRN-34-10| 93.79| 15.5| 0.26| 99 | 100 | ## CIFAR-100 - Linf 101 | The robust accuracy is computed at `eps = 8/255` in the Linf-norm, except for the models marked with * for which `eps = 0.031` is used. \ 102 | **Note**: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training).\ 103 | \ 104 | **Update**: this is no longer maintained, but an up-to-date leaderboard is available in [RobustBench](https://robustbench.github.io/). 105 | 106 | |# |paper |model |architecture |clean |report. |AA | 107 | |:---:|---|:---:|:---:|---:|---:|---:| 108 | |**1**| [(Gowal et al. 2020)](https://arxiv.org/abs/2010.03593)‡| *available*| WRN-70-16| 69.15| 37.70| 36.88| 109 | |**2**| [(Cui et al., 2020)](https://arxiv.org/abs/2011.11164)\*| *available*| WRN-34-20| 62.55| 30.20| 30.20| 110 | |**3**| [(Gowal et al. 
2020)](https://arxiv.org/abs/2010.03593)| *available*| WRN-70-16| 60.86| 30.67| 30.03| 111 | |**4**| [(Cui et al., 2020)](https://arxiv.org/abs/2011.11164)\*| *available*| WRN-34-10| 60.64| 29.33| 29.33| 112 | |**5**| [(Wu et al., 2020b)](https://arxiv.org/abs/2004.05884)| *available*| WRN-34-10| 60.38| 28.86| 28.86| 113 | |**6**| [(Hendrycks et al., 2019)](https://arxiv.org/abs/1901.09960)‡| *available*| WRN-28-10| 59.23| 33.5| 28.42| 114 | |**7**| [(Cui et al., 2020)](https://arxiv.org/abs/2011.11164)\*| *available*| WRN-34-10| 70.25| 27.16| 27.16| 115 | |**8**| [(Chen et al., 2020b)](https://github.com/fra31/auto-attack/issues/26)| *available*| WRN-34-10| 62.15| -| 26.94| 116 | |**9**| [(Sitawarin et al., 2020)](https://github.com/fra31/auto-attack/issues/22)| *available*| WRN-34-10| 62.82| 24.57| 24.57| 117 | |**10**| [(Rice et al., 2020)](https://arxiv.org/abs/2002.11569)| *available*| RN-18| 53.83| 28.1| 18.95| 118 | 119 | ## MNIST - Linf 120 | The robust accuracy is computed at `eps = 0.3` in the Linf-norm. 121 | 122 | |# |paper |model |clean |report. 
|AA | 123 | |:---:|---|:---:|---:|---:|---:| 124 | |**1**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)| *available*| 99.26| 96.38| 96.34| 125 | |**2**| [(Zhang et al., 2020a)](https://arxiv.org/abs/1906.06316)| *available*| 98.38| 96.38| 93.96| 126 | |**3**| [(Gowal et al., 2019)](https://arxiv.org/abs/1810.12715)| *available*| 98.34| 93.78| 92.83| 127 | |**4**| [(Zhang et al., 2019b)](https://arxiv.org/abs/1901.08573)| *available*| 99.48| 95.60| 92.81| 128 | |**5**| [(Ding et al., 2020)](https://openreview.net/forum?id=HkeryxBtPB)| *available*| 98.95| 92.59| 91.40| 129 | |**6**| [(Atzmon et al., 2019)](https://arxiv.org/abs/1905.11911)| *available*| 99.35| 97.35| 90.85| 130 | |**7**| [(Madry et al., 2018)](https://arxiv.org/abs/1706.06083)| *available*| 98.53| 89.62| 88.50| 131 | |**8**| [(Jang et al., 2019)](http://openaccess.thecvf.com/content_ICCV_2019/html/Jang_Adversarial_Defense_via_Learning_to_Generate_Diverse_Attacks_ICCV_2019_paper.html)| *available*| 98.47| 94.61| 87.99| 132 | |**9**| [(Wong et al., 2020)](https://arxiv.org/abs/2001.03994)| *available*| 98.50| 88.77| 82.93| 133 | |**10**| [(Taghanaki et al., 2019)](http://openaccess.thecvf.com/content_CVPR_2019/html/Taghanaki_A_Kernelized_Manifold_Mapping_to_Diminish_the_Effect_of_Adversarial_CVPR_2019_paper.html)| *retrained*| 98.86| 64.25| 0.00| 134 | 135 | ## CIFAR-10 - L2 136 | The robust accuracy is computed at `eps = 0.5` in the L2-norm.\ 137 | **Note**: ‡ indicates models which exploit additional data for training (e.g. unlabeled data, pre-training). 138 | 139 | **Update**: this is no longer maintained, but an up-to-date leaderboard is available in [RobustBench](https://robustbench.github.io/). 140 | 141 | |# |paper |model |architecture |clean |report. 
|AA | 142 | |:---:|---|:---:|:---:|---:|---:|---:| 143 | |**1**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)‡| *available*| WRN-70-16| 94.74| -| 80.53| 144 | |**2**| [(Gowal et al., 2020)](https://arxiv.org/abs/2010.03593)| *available*| WRN-70-16| 90.90| -| 74.50| 145 | |**3**| [(Wu et al., 2020b)](https://arxiv.org/abs/2004.05884)| *available*| WRN-34-10| 88.51| 73.66| 73.66| 146 | |**4**| [(Augustin et al., 2020)](https://arxiv.org/abs/2003.09461)‡| *authors*| RN-50| 91.08| 73.27| 72.91| 147 | |**5**| [(Engstrom et al., 2019)](https://github.com/MadryLab/robustness)| *available*| RN-50| 90.83| 70.11| 69.24| 148 | |**6**| [(Rice et al., 2020)](https://arxiv.org/abs/2002.11569)| *available*| RN-18| 88.67| 71.6| 67.68| 149 | |**7**| [(Rony et al., 2019)](https://arxiv.org/abs/1811.09600)| *available*| WRN-28-10| 89.05| 67.6| 66.44| 150 | |**8**| [(Ding et al., 2020)](https://openreview.net/forum?id=HkeryxBtPB)| *available*| WRN-28-4| 88.02| 66.18| 66.09| 151 | 152 | # How to use AutoAttack 153 | 154 | ### Installation 155 | 156 | ``` 157 | pip install git+https://github.com/fra31/auto-attack 158 | ``` 159 | 160 | ### PyTorch models 161 | Import and initialize AutoAttack with 162 | 163 | ```python 164 | from autoattack import AutoAttack 165 | adversary = AutoAttack(forward_pass, norm='Linf', eps=epsilon, version='standard') 166 | ``` 167 | 168 | where: 169 | + `forward_pass` returns the logits and takes input with components in [0, 1] (NCHW format expected), 170 | + `norm = ['Linf' | 'L2' | 'L1']` is the norm of the threat model, 171 | + `eps` is the bound on the norm of the adversarial perturbations, 172 | + `version = 'standard'` uses the standard version of AA. 
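Since `forward_pass` must accept inputs with components in [0, 1], any input normalization has to live inside the model itself. A minimal sketch of how to fold normalization into the forward pass (not part of the package; the `NormalizedModel` wrapper and the toy linear classifier below are made up for illustration):

```python
# Sketch (hypothetical helper): AutoAttack expects `forward_pass` to map raw
# inputs in [0, 1] (NCHW) directly to logits, so preprocessing such as
# per-channel normalization must be wrapped into the model.
import torch
import torch.nn as nn

class NormalizedModel(nn.Module):
    """Wraps a classifier that expects normalized inputs, so the wrapped
    forward pass accepts raw inputs in [0, 1]."""
    def __init__(self, model, mean, std):
        super().__init__()
        self.model = model
        # buffers move together with the model under .to(device)
        self.register_buffer('mean', torch.tensor(mean).view(1, -1, 1, 1))
        self.register_buffer('std', torch.tensor(std).view(1, -1, 1, 1))

    def forward(self, x):
        return self.model((x - self.mean) / self.std)

# toy classifier standing in for a real network (hypothetical)
base = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
forward_pass = NormalizedModel(base, mean=[0.4914, 0.4822, 0.4465],
                               std=[0.2470, 0.2435, 0.2616])
logits = forward_pass(torch.rand(4, 3, 32, 32))  # components in [0, 1]
```

The resulting `forward_pass` can then be handed to `AutoAttack` as above, while the attack operates entirely in the unnormalized [0, 1] space.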
173 | 174 | To apply the standard evaluation, where the attacks are run sequentially on batches of size `bs` of `images`, use 175 | 176 | ```python 177 | x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size) 178 | ``` 179 | 180 | To run the attacks individually, use 181 | 182 | ```python 183 | dict_adv = adversary.run_standard_evaluation_individual(images, labels, bs=batch_size) 184 | ``` 185 | 186 | which returns a dictionary with the adversarial examples found by each attack. 187 | 188 | To run only a subset of the attacks, set e.g. `adversary.attacks_to_run = ['apgd-ce']`. 189 | 190 | ### TensorFlow models 191 | To evaluate models implemented in TensorFlow 1.X, use 192 | 193 | ```python 194 | from autoattack import utils_tf 195 | model_adapted = utils_tf.ModelAdapter(logits, x_input, y_input, sess) 196 | 197 | from autoattack import AutoAttack 198 | adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True) 199 | ``` 200 | 201 | where: 202 | + `logits` is the tensor with the logits given by the model, 203 | + `x_input` is a placeholder for the input of the classifier (NHWC format expected), 204 | + `y_input` is a placeholder for the correct labels, 205 | + `sess` is a TF session. 206 | 207 | For models implemented in TensorFlow 2.X, use 208 | 209 | ```python 210 | from autoattack import utils_tf2 211 | model_adapted = utils_tf2.ModelAdapter(tf_model) 212 | 213 | from autoattack import AutoAttack 214 | adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True) 215 | ``` 216 | 217 | where: 218 | + `tf_model` is a `tf.keras` model without a final 'softmax' activation, i.e. it returns logits. 219 | 220 | The evaluation can then be run in the same way as with PyTorch models. 221 | 222 | ### Examples 223 | Examples of how to use AutoAttack can be found in `examples/`.
To run the standard evaluation on a pretrained 224 | PyTorch model on CIFAR-10 use 225 | ``` 226 | python eval.py [--individual] --version=['standard' | 'plus'] 227 | ``` 228 | where the optional flag `--individual` runs each attack individually on the full test set and `--version` selects the version of AA to use (see below). 229 | 230 | ## Other versions 231 | ### AutoAttack+ 232 | A more expensive evaluation can be run by specifying `version='plus'` when initializing AutoAttack. This includes 233 | + *untargeted APGD-CE* (5 restarts), 234 | + *untargeted APGD-DLR* (5 restarts), 235 | + *untargeted FAB* (5 restarts), 236 | + *Square Attack* (5000 queries), 237 | + *targeted APGD-DLR* (9 target classes), 238 | + *targeted FAB* (9 target classes). 239 | 240 | ### Randomized defenses 241 | For classifiers with stochastic components, one can combine AA with Expectation over Transformation (EoT) as in [(Athalye et al., 2018)](https://arxiv.org/abs/1802.00420) by specifying `version='rand'` when initializing AutoAttack. 242 | This runs 243 | + *untargeted APGD-CE* (no restarts, 20 iterations for EoT), 244 | + *untargeted APGD-DLR* (no restarts, 20 iterations for EoT). 245 | 246 | ### Custom version 247 | The attacks to run can be customized by specifying `version='custom'` when initializing the attack and then setting, for example, 248 | ```python 249 | if args.version == 'custom': 250 | adversary.attacks_to_run = ['apgd-ce', 'fab'] 251 | adversary.apgd.n_restarts = 2 252 | adversary.fab.n_restarts = 2 253 | ``` 254 | 255 | ## Other options 256 | ### Random seed 257 | It is possible to fix the random seed used for the attacks with, e.g., `adversary.seed = 0`. In this case the same seed is used for all attacks; otherwise a different random seed is picked for each attack. 258 | 259 | ### Log results 260 | To log the intermediate results of the evaluation, specify `log_path=/path/to/logfile.txt` when initializing the attack.
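Whichever version is used, the returned `x_adv` should stay within the threat model. The perturbation-size sanity check that AutoAttack logs at the end of an evaluation can be reproduced standalone, as in this sketch (the `max_perturbation` helper and the random placeholder tensors are made up for illustration; `x_adv` stands in for real attack output):

```python
# Standalone sketch: verify that adversarial examples respect the threat
# model, mirroring the final norm check AutoAttack itself logs.
import torch

def max_perturbation(x, x_adv, norm='Linf'):
    """Largest per-example perturbation size under the given norm."""
    diff = (x_adv - x).reshape(x.shape[0], -1)
    if norm == 'Linf':
        res = diff.abs().max(dim=1)[0]
    elif norm == 'L2':
        res = (diff ** 2).sum(dim=1).sqrt()
    elif norm == 'L1':
        res = diff.abs().sum(dim=1)
    else:
        raise ValueError('unknown norm: {}'.format(norm))
    return res.max().item()

# placeholder tensors standing in for real data and attack output
eps = 8 / 255
x = torch.rand(16, 3, 32, 32)
delta = torch.empty_like(x).uniform_(-eps, eps)
x_adv = (x + delta).clamp(0., 1.)  # a valid Linf perturbation in [0, 1]

assert max_perturbation(x, x_adv, norm='Linf') <= eps + 1e-6
```

Checking this on the output of an evaluation (together with the absence of NaNs) is a cheap way to confirm the adversarial examples are valid for the chosen norm and `eps`.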
261 | 262 | ## Citation 263 | ``` 264 | @inproceedings{croce2020reliable, 265 | title = {Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks}, 266 | author = {Francesco Croce and Matthias Hein}, 267 | booktitle = {ICML}, 268 | year = {2020} 269 | } 270 | ``` 271 | 272 | ``` 273 | @inproceedings{croce2021mind, 274 | title={Mind the box: $l_1$-APGD for sparse adversarial attacks on image classifiers}, 275 | author={Francesco Croce and Matthias Hein}, 276 | booktitle={ICML}, 277 | year={2021} 278 | } 279 | ``` 280 | -------------------------------------------------------------------------------- /autoattack/__init__.py: -------------------------------------------------------------------------------- 1 | from .autoattack import AutoAttack 2 | -------------------------------------------------------------------------------- /autoattack/autoattack.py: -------------------------------------------------------------------------------- 1 | import math 2 | import time 3 | 4 | import numpy as np 5 | import torch 6 | 7 | from .other_utils import Logger 8 | from autoattack import checks 9 | from autoattack.state import EvaluationState 10 | 11 | 12 | class AutoAttack(): 13 | def __init__(self, model, norm='Linf', eps=.3, seed=None, verbose=True, 14 | attacks_to_run=[], version='standard', is_tf_model=False, 15 | device='cuda', log_path=None): 16 | self.model = model 17 | self.norm = norm 18 | assert norm in ['Linf', 'L2', 'L1'] 19 | self.epsilon = eps 20 | self.seed = seed 21 | self.verbose = verbose 22 | self.attacks_to_run = attacks_to_run 23 | self.version = version 24 | self.is_tf_model = is_tf_model 25 | self.device = device 26 | self.logger = Logger(log_path) 27 | 28 | if version in ['standard', 'plus', 'rand'] and attacks_to_run != []: 29 | raise ValueError("attacks_to_run will be overridden unless you use version='custom'") 30 | 31 | if not self.is_tf_model: 32 | from .autopgd_base import APGDAttack 33 | self.apgd = 
APGDAttack(self.model, n_restarts=5, n_iter=100, verbose=False, 34 | eps=self.epsilon, norm=self.norm, eot_iter=1, rho=.75, seed=self.seed, 35 | device=self.device, logger=self.logger) 36 | 37 | from .fab_pt import FABAttack_PT 38 | self.fab = FABAttack_PT(self.model, n_restarts=5, n_iter=100, eps=self.epsilon, seed=self.seed, 39 | norm=self.norm, verbose=False, device=self.device) 40 | 41 | from .square import SquareAttack 42 | self.square = SquareAttack(self.model, p_init=.8, n_queries=5000, eps=self.epsilon, norm=self.norm, 43 | n_restarts=1, seed=self.seed, verbose=False, device=self.device, resc_schedule=False) 44 | 45 | from .autopgd_base import APGDAttack_targeted 46 | self.apgd_targeted = APGDAttack_targeted(self.model, n_restarts=1, n_iter=100, verbose=False, 47 | eps=self.epsilon, norm=self.norm, eot_iter=1, rho=.75, seed=self.seed, device=self.device, 48 | logger=self.logger) 49 | 50 | else: 51 | from .autopgd_base import APGDAttack 52 | self.apgd = APGDAttack(self.model, n_restarts=5, n_iter=100, verbose=False, 53 | eps=self.epsilon, norm=self.norm, eot_iter=1, rho=.75, seed=self.seed, device=self.device, 54 | is_tf_model=True, logger=self.logger) 55 | 56 | from .fab_tf import FABAttack_TF 57 | self.fab = FABAttack_TF(self.model, n_restarts=5, n_iter=100, eps=self.epsilon, seed=self.seed, 58 | norm=self.norm, verbose=False, device=self.device) 59 | 60 | from .square import SquareAttack 61 | self.square = SquareAttack(self.model.predict, p_init=.8, n_queries=5000, eps=self.epsilon, norm=self.norm, 62 | n_restarts=1, seed=self.seed, verbose=False, device=self.device, resc_schedule=False) 63 | 64 | from .autopgd_base import APGDAttack_targeted 65 | self.apgd_targeted = APGDAttack_targeted(self.model, n_restarts=1, n_iter=100, verbose=False, 66 | eps=self.epsilon, norm=self.norm, eot_iter=1, rho=.75, seed=self.seed, device=self.device, 67 | is_tf_model=True, logger=self.logger) 68 | 69 | if version in ['standard', 'plus', 'rand']: 70 | 
self.set_version(version) 71 | 72 | def get_logits(self, x): 73 | if not self.is_tf_model: 74 | return self.model(x) 75 | else: 76 | return self.model.predict(x) 77 | 78 | def get_seed(self): 79 | return time.time() if self.seed is None else self.seed 80 | 81 | def run_standard_evaluation(self, 82 | x_orig, 83 | y_orig, 84 | bs=250, 85 | return_labels=False, 86 | state_path=None): 87 | if state_path is not None and state_path.exists(): 88 | state = EvaluationState.from_disk(state_path) 89 | if set(self.attacks_to_run) != state.attacks_to_run: 90 | raise ValueError("The state was created with a different set of attacks " 91 | "to run. You are probably using the wrong state file.") 92 | if self.verbose: 93 | self.logger.log("Restored state from {}".format(state_path)) 94 | self.logger.log("Since the state has been restored, **only** " 95 | "the adversarial examples from the current run " 96 | "are going to be returned.") 97 | else: 98 | state = EvaluationState(set(self.attacks_to_run), path=state_path) 99 | state.to_disk() 100 | if self.verbose and state_path is not None: 101 | self.logger.log("Created state in {}".format(state_path)) 102 | 103 | attacks_to_run = list(filter(lambda attack: attack not in state.run_attacks, self.attacks_to_run)) 104 | if self.verbose: 105 | self.logger.log('using {} version including {}.'.format(self.version, 106 | ', '.join(attacks_to_run))) 107 | if state.run_attacks: 108 | self.logger.log('{} was/were already run.'.format(', '.join(state.run_attacks))) 109 | 110 | # checks on type of defense 111 | if self.version != 'rand': 112 | checks.check_randomized(self.get_logits, x_orig[:bs].to(self.device), 113 | y_orig[:bs].to(self.device), bs=bs, logger=self.logger) 114 | n_cls = checks.check_range_output(self.get_logits, x_orig[:bs].to(self.device), 115 | logger=self.logger) 116 | checks.check_dynamic(self.model, x_orig[:bs].to(self.device), self.is_tf_model, 117 | logger=self.logger) 118 | checks.check_n_classes(n_cls, 
self.attacks_to_run, self.apgd_targeted.n_target_classes, 119 | self.fab.n_target_classes, logger=self.logger) 120 | 121 | with torch.no_grad(): 122 | # calculate accuracy 123 | n_batches = int(np.ceil(x_orig.shape[0] / bs)) 124 | if state.robust_flags is None: 125 | robust_flags = torch.zeros(x_orig.shape[0], dtype=torch.bool, device=x_orig.device) 126 | y_adv = torch.empty_like(y_orig) 127 | for batch_idx in range(n_batches): 128 | start_idx = batch_idx * bs 129 | end_idx = min( (batch_idx + 1) * bs, x_orig.shape[0]) 130 | 131 | x = x_orig[start_idx:end_idx, :].clone().to(self.device) 132 | y = y_orig[start_idx:end_idx].clone().to(self.device) 133 | output = self.get_logits(x).max(dim=1)[1] 134 | y_adv[start_idx: end_idx] = output 135 | correct_batch = y.eq(output) 136 | robust_flags[start_idx:end_idx] = correct_batch.detach().to(robust_flags.device) 137 | 138 | state.robust_flags = robust_flags 139 | robust_accuracy = torch.sum(robust_flags).item() / x_orig.shape[0] 140 | robust_accuracy_dict = {'clean': robust_accuracy} 141 | state.clean_accuracy = robust_accuracy 142 | 143 | if self.verbose: 144 | self.logger.log('initial accuracy: {:.2%}'.format(robust_accuracy)) 145 | else: 146 | robust_flags = state.robust_flags.to(x_orig.device) 147 | robust_accuracy = torch.sum(robust_flags).item() / x_orig.shape[0] 148 | robust_accuracy_dict = {'clean': state.clean_accuracy} 149 | if self.verbose: 150 | self.logger.log('initial clean accuracy: {:.2%}'.format(state.clean_accuracy)) 151 | self.logger.log('robust accuracy at the time of restoring the state: {:.2%}'.format(robust_accuracy)) 152 | 153 | x_adv = x_orig.clone().detach() 154 | startt = time.time() 155 | for attack in attacks_to_run: 156 | # item() is super important as pytorch int division uses floor rounding 157 | num_robust = torch.sum(robust_flags).item() 158 | 159 | if num_robust == 0: 160 | break 161 | 162 | n_batches = int(np.ceil(num_robust / bs)) 163 | 164 | robust_lin_idcs = torch.nonzero(robust_flags, 
as_tuple=False) 165 | if num_robust > 1: 166 | robust_lin_idcs.squeeze_() 167 | 168 | for batch_idx in range(n_batches): 169 | start_idx = batch_idx * bs 170 | end_idx = min((batch_idx + 1) * bs, num_robust) 171 | 172 | batch_datapoint_idcs = robust_lin_idcs[start_idx:end_idx] 173 | if len(batch_datapoint_idcs.shape) > 1: 174 | batch_datapoint_idcs.squeeze_(-1) 175 | x = x_orig[batch_datapoint_idcs, :].clone().to(self.device) 176 | y = y_orig[batch_datapoint_idcs].clone().to(self.device) 177 | 178 | # make sure that x is a 4d tensor even if there is only a single datapoint left 179 | if len(x.shape) == 3: 180 | x.unsqueeze_(dim=0) 181 | 182 | # run attack 183 | if attack == 'apgd-ce': 184 | # apgd on cross-entropy loss 185 | self.apgd.loss = 'ce' 186 | self.apgd.seed = self.get_seed() 187 | adv_curr = self.apgd.perturb(x, y) #cheap=True 188 | 189 | elif attack == 'apgd-dlr': 190 | # apgd on dlr loss 191 | self.apgd.loss = 'dlr' 192 | self.apgd.seed = self.get_seed() 193 | adv_curr = self.apgd.perturb(x, y) #cheap=True 194 | 195 | elif attack == 'fab': 196 | # fab 197 | self.fab.targeted = False 198 | self.fab.seed = self.get_seed() 199 | adv_curr = self.fab.perturb(x, y) 200 | 201 | elif attack == 'square': 202 | # square 203 | self.square.seed = self.get_seed() 204 | adv_curr = self.square.perturb(x, y) 205 | 206 | elif attack == 'apgd-t': 207 | # targeted apgd 208 | self.apgd_targeted.seed = self.get_seed() 209 | adv_curr = self.apgd_targeted.perturb(x, y) #cheap=True 210 | 211 | elif attack == 'fab-t': 212 | # fab targeted 213 | self.fab.targeted = True 214 | self.fab.n_restarts = 1 215 | self.fab.seed = self.get_seed() 216 | adv_curr = self.fab.perturb(x, y) 217 | 218 | else: 219 | raise ValueError('Attack not supported') 220 | 221 | output = self.get_logits(adv_curr).max(dim=1)[1] 222 | false_batch = ~y.eq(output).to(robust_flags.device) 223 | non_robust_lin_idcs = batch_datapoint_idcs[false_batch] 224 | robust_flags[non_robust_lin_idcs] = False 225 | 
state.robust_flags = robust_flags 226 | 227 | x_adv[non_robust_lin_idcs] = adv_curr[false_batch].detach().to(x_adv.device) 228 | y_adv[non_robust_lin_idcs] = output[false_batch].detach().to(x_adv.device) 229 | 230 | if self.verbose: 231 | num_non_robust_batch = torch.sum(false_batch) 232 | self.logger.log('{} - {}/{} - {} out of {} successfully perturbed'.format( 233 | attack, batch_idx + 1, n_batches, num_non_robust_batch, x.shape[0])) 234 | 235 | robust_accuracy = torch.sum(robust_flags).item() / x_orig.shape[0] 236 | robust_accuracy_dict[attack] = robust_accuracy 237 | state.add_run_attack(attack) 238 | if self.verbose: 239 | self.logger.log('robust accuracy after {}: {:.2%} (total time {:.1f} s)'.format( 240 | attack.upper(), robust_accuracy, time.time() - startt)) 241 | 242 | # check about square 243 | checks.check_square_sr(robust_accuracy_dict, logger=self.logger) 244 | state.to_disk(force=True) 245 | 246 | # final check 247 | if self.verbose: 248 | if self.norm == 'Linf': 249 | res = (x_adv - x_orig).abs().reshape(x_orig.shape[0], -1).max(1)[0] 250 | elif self.norm == 'L2': 251 | res = ((x_adv - x_orig) ** 2).reshape(x_orig.shape[0], -1).sum(-1).sqrt() 252 | elif self.norm == 'L1': 253 | res = (x_adv - x_orig).abs().reshape(x_orig.shape[0], -1).sum(dim=-1) 254 | self.logger.log('max {} perturbation: {:.5f}, nan in tensor: {}, max: {:.5f}, min: {:.5f}'.format( 255 | self.norm, res.max(), (x_adv != x_adv).sum(), x_adv.max(), x_adv.min())) 256 | self.logger.log('robust accuracy: {:.2%}'.format(robust_accuracy)) 257 | if return_labels: 258 | return x_adv, y_adv 259 | else: 260 | return x_adv 261 | 262 | def clean_accuracy(self, x_orig, y_orig, bs=250): 263 | n_batches = math.ceil(x_orig.shape[0] / bs) 264 | acc = 0. 
265 | for counter in range(n_batches): 266 | x = x_orig[counter * bs:min((counter + 1) * bs, x_orig.shape[0])].clone().to(self.device) 267 | y = y_orig[counter * bs:min((counter + 1) * bs, x_orig.shape[0])].clone().to(self.device) 268 | output = self.get_logits(x) 269 | acc += (output.max(1)[1] == y).float().sum() 270 | 271 | if self.verbose: 272 | print('clean accuracy: {:.2%}'.format(acc / x_orig.shape[0])) 273 | 274 | return acc.item() / x_orig.shape[0] 275 | 276 | def run_standard_evaluation_individual(self, x_orig, y_orig, bs=250, return_labels=False): 277 | if self.verbose: 278 | print('using {} version including {}'.format(self.version, 279 | ', '.join(self.attacks_to_run))) 280 | 281 | l_attacks = self.attacks_to_run 282 | adv = {} 283 | verbose_indiv = self.verbose 284 | self.verbose = False 285 | 286 | for c in l_attacks: 287 | startt = time.time() 288 | self.attacks_to_run = [c] 289 | x_adv, y_adv = self.run_standard_evaluation(x_orig, y_orig, bs=bs, return_labels=True) 290 | if return_labels: 291 | adv[c] = (x_adv, y_adv) 292 | else: 293 | adv[c] = x_adv 294 | if verbose_indiv: 295 | acc_indiv = self.clean_accuracy(x_adv, y_orig, bs=bs) 296 | space = '\t \t' if c == 'fab' else '\t' 297 | self.logger.log('robust accuracy by {} {} {:.2%} \t (time attack: {:.1f} s)'.format( 298 | c.upper(), space, acc_indiv, time.time() - startt)) 299 | 300 | return adv 301 | 302 | def set_version(self, version='standard'): 303 | if self.verbose: 304 | print('setting parameters for {} version'.format(version)) 305 | 306 | if version == 'standard': 307 | self.attacks_to_run = ['apgd-ce', 'apgd-t', 'fab-t', 'square'] 308 | if self.norm in ['Linf', 'L2']: 309 | self.apgd.n_restarts = 1 310 | self.apgd_targeted.n_target_classes = 9 311 | elif self.norm in ['L1']: 312 | self.apgd.use_largereps = True 313 | self.apgd_targeted.use_largereps = True 314 | self.apgd.n_restarts = 5 315 | self.apgd_targeted.n_target_classes = 5 316 | self.fab.n_restarts = 1 317 | 
self.apgd_targeted.n_restarts = 1 318 | self.fab.n_target_classes = 9 319 | #self.apgd_targeted.n_target_classes = 9 320 | self.square.n_queries = 5000 321 | 322 | elif version == 'plus': 323 | self.attacks_to_run = ['apgd-ce', 'apgd-dlr', 'fab', 'square', 'apgd-t', 'fab-t'] 324 | self.apgd.n_restarts = 5 325 | self.fab.n_restarts = 5 326 | self.apgd_targeted.n_restarts = 1 327 | self.fab.n_target_classes = 9 328 | self.apgd_targeted.n_target_classes = 9 329 | self.square.n_queries = 5000 330 | if not self.norm in ['Linf', 'L2']: 331 | print('"{}" version is used with {} norm: please check'.format( 332 | version, self.norm)) 333 | 334 | elif version == 'rand': 335 | self.attacks_to_run = ['apgd-ce', 'apgd-dlr'] 336 | self.apgd.n_restarts = 1 337 | self.apgd.eot_iter = 20 338 | 339 | -------------------------------------------------------------------------------- /autoattack/autopgd_base.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2020-present, Francesco Croce 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree 6 | # 7 | 8 | import time 9 | import torch 10 | import torch.nn as nn 11 | import torch.nn.functional as F 12 | import math 13 | import random 14 | 15 | from autoattack.other_utils import L0_norm, L1_norm, L2_norm 16 | from autoattack.checks import check_zero_gradients 17 | 18 | 19 | def L1_projection(x2, y2, eps1): 20 | ''' 21 | x2: center of the L1 ball (bs x input_dim) 22 | y2: current perturbation (x2 + y2 is the point to be projected) 23 | eps1: radius of the L1 ball 24 | 25 | output: delta s.th. 
||y2 + delta||_1 <= eps1 26 | and 0 <= x2 + y2 + delta <= 1 27 | ''' 28 | 29 | x = x2.clone().float().view(x2.shape[0], -1) 30 | y = y2.clone().float().view(y2.shape[0], -1) 31 | sigma = y.clone().sign() 32 | u = torch.min(1 - x - y, x + y) 33 | #u = torch.min(u, epsinf - torch.clone(y).abs()) 34 | u = torch.min(torch.zeros_like(y), u) 35 | l = -torch.clone(y).abs() 36 | d = u.clone() 37 | 38 | bs, indbs = torch.sort(-torch.cat((u, l), 1), dim=1) 39 | bs2 = torch.cat((bs[:, 1:], torch.zeros(bs.shape[0], 1).to(bs.device)), 1) 40 | 41 | inu = 2*(indbs < u.shape[1]).float() - 1 42 | size1 = inu.cumsum(dim=1) 43 | 44 | s1 = -u.sum(dim=1) 45 | 46 | c = eps1 - y.clone().abs().sum(dim=1) 47 | c5 = s1 + c < 0 48 | c2 = c5.nonzero().squeeze(1) 49 | 50 | s = s1.unsqueeze(-1) + torch.cumsum((bs2 - bs) * size1, dim=1) 51 | 52 | if c2.nelement() != 0: 53 | 54 | lb = torch.zeros_like(c2).float() 55 | ub = torch.ones_like(lb) *(bs.shape[1] - 1) 56 | 57 | #print(c2.shape, lb.shape) 58 | 59 | nitermax = torch.ceil(torch.log2(torch.tensor(bs.shape[1]).float())) 60 | counter2 = torch.zeros_like(lb).long() 61 | counter = 0 62 | 63 | while counter < nitermax: 64 | counter4 = torch.floor((lb + ub) / 2.)
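The routine above projects a perturbation onto the intersection of the L1 ball and the [0, 1] box via a batched binary search. As a sanity reference, here is a much simpler pure-Python sketch of the core idea — soft-thresholding projection onto the L1 ball alone, without the box constraint that the code above also enforces. The function name is illustrative and not part of the package:

```python
def project_l1_ball(v, radius):
    """Project the vector v onto the L1 ball of the given radius
    (soft-thresholding; see Duchi et al., 2008). Ignores box constraints."""
    # Already inside the ball: nothing to do.
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    # Find the threshold theta by projecting |v| onto the simplex,
    # then shrink each coordinate toward zero by theta, keeping signs.
    u = sorted((abs(x) for x in v), reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - radius) / i
        if ui > t:
            theta = t
    return [max(abs(x) - theta, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]
```

For example, `project_l1_ball([3.0, -1.0, 0.5], 2.0)` shrinks the vector until its L1 norm equals the radius, zeroing the small coordinates first — the same sparsity-inducing behavior the batched code above relies on.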
counter2 = counter4.type(torch.LongTensor) 66 | 67 | c8 = s[c2, counter2] + c[c2] < 0 68 | ind3 = c8.nonzero().squeeze(1) 69 | ind32 = (~c8).nonzero().squeeze(1) 70 | #print(ind3.shape) 71 | if ind3.nelement() != 0: 72 | lb[ind3] = counter4[ind3] 73 | if ind32.nelement() != 0: 74 | ub[ind32] = counter4[ind32] 75 | 76 | #print(lb, ub) 77 | counter += 1 78 | 79 | lb2 = lb.long() 80 | alpha = (-s[c2, lb2] -c[c2]) / size1[c2, lb2 + 1] + bs2[c2, lb2] 81 | d[c2] = -torch.min(torch.max(-u[c2], alpha.unsqueeze(-1)), -l[c2]) 82 | 83 | return (sigma * d).view(x2.shape) 84 | 85 | 86 | 87 | 88 | 89 | class APGDAttack(): 90 | """ 91 | AutoPGD 92 | https://arxiv.org/abs/2003.01690 93 | 94 | :param predict: forward pass function 95 | :param norm: Lp-norm of the attack ('Linf', 'L2', 'L1' supported) 96 | :param n_restarts: number of random restarts 97 | :param n_iter: number of iterations 98 | :param eps: bound on the norm of perturbations 99 | :param seed: random seed for the starting point 100 | :param loss: loss to optimize ('ce', 'dlr' supported) 101 | :param eot_iter: iterations for Expectation over Transformation 102 | :param rho: parameter for decreasing the step size 103 | """ 104 | 105 | def __init__( 106 | self, 107 | predict, 108 | n_iter=100, 109 | norm='Linf', 110 | n_restarts=1, 111 | eps=None, 112 | seed=0, 113 | loss='ce', 114 | eot_iter=1, 115 | rho=.75, 116 | topk=None, 117 | verbose=False, 118 | device=None, 119 | use_largereps=False, 120 | is_tf_model=False, 121 | logger=None): 122 | """ 123 | AutoPGD implementation in PyTorch 124 | """ 125 | 126 | self.model = predict 127 | self.n_iter = n_iter 128 | self.eps = eps 129 | self.norm = norm 130 | self.n_restarts = n_restarts 131 | self.seed = seed 132 | self.loss = loss 133 | self.eot_iter = eot_iter 134 | self.thr_decr = rho 135 | self.topk = topk 136 | self.verbose = verbose 137 | self.device = device 138 | self.use_rs = True 139 | #self.init_point = None 140 | self.use_largereps = use_largereps 141 |
#self.larger_epss = None 142 | #self.iters = None 143 | self.n_iter_orig = n_iter + 0 144 | self.eps_orig = eps + 0. 145 | self.is_tf_model = is_tf_model 146 | self.y_target = None 147 | self.logger = logger 148 | 149 | assert self.norm in ['Linf', 'L2', 'L1'] 150 | assert not self.eps is None 151 | 152 | ### set parameters for checkpoints 153 | self.n_iter_2 = max(int(0.22 * self.n_iter), 1) 154 | self.n_iter_min = max(int(0.06 * self.n_iter), 1) 155 | self.size_decr = max(int(0.03 * self.n_iter), 1) 156 | 157 | def init_hyperparam(self, x): 158 | 159 | if self.device is None: 160 | self.device = x.device 161 | self.orig_dim = list(x.shape[1:]) 162 | self.ndims = len(self.orig_dim) 163 | if self.seed is None: 164 | self.seed = time.time() 165 | 166 | def check_oscillation(self, x, j, k, y5, k3=0.75): 167 | t = torch.zeros(x.shape[1]).to(self.device) 168 | for counter5 in range(k): 169 | t += (x[j - counter5] > x[j - counter5 - 1]).float() 170 | 171 | return (t <= k * k3 * torch.ones_like(t)).float() 172 | 173 | def check_shape(self, x): 174 | return x if len(x.shape) > 0 else x.unsqueeze(0) 175 | 176 | def normalize(self, x): 177 | if self.norm == 'Linf': 178 | t = x.abs().view(x.shape[0], -1).max(1)[0] 179 | 180 | elif self.norm == 'L2': 181 | t = (x ** 2).view(x.shape[0], -1).sum(-1).sqrt() 182 | 183 | elif self.norm == 'L1': 184 | try: 185 | t = x.abs().view(x.shape[0], -1).sum(dim=-1) 186 | except: 187 | t = x.abs().reshape([x.shape[0], -1]).sum(dim=-1) 188 | 189 | return x / (t.view(-1, *([1] * self.ndims)) + 1e-12) 190 | 191 | def dlr_loss(self, x, y): 192 | x_sorted, ind_sorted = x.sort(dim=1) 193 | ind = (ind_sorted[:, -1] == y).float() 194 | u = torch.arange(x.shape[0]) 195 | 196 | return -(x[u, y] - x_sorted[:, -2] * ind - x_sorted[:, -1] * ( 197 | 1. 
- ind)) / (x_sorted[:, -1] - x_sorted[:, -3] + 1e-12) 198 | 199 | # 200 | 201 | def attack_single_run(self, x, y, x_init=None): 202 | if len(x.shape) < self.ndims: 203 | x = x.unsqueeze(0) 204 | y = y.unsqueeze(0) 205 | 206 | if self.norm == 'Linf': 207 | t = 2 * torch.rand(x.shape).to(self.device).detach() - 1 208 | x_adv = x + self.eps * torch.ones_like(x 209 | ).detach() * self.normalize(t) 210 | elif self.norm == 'L2': 211 | t = torch.randn(x.shape).to(self.device).detach() 212 | x_adv = x + self.eps * torch.ones_like(x 213 | ).detach() * self.normalize(t) 214 | elif self.norm == 'L1': 215 | t = torch.randn(x.shape).to(self.device).detach() 216 | delta = L1_projection(x, t, self.eps) 217 | x_adv = x + t + delta 218 | 219 | 220 | 221 | 222 | 223 | if not x_init is None: 224 | x_adv = x_init.clone() 225 | if self.norm == 'L1' and self.verbose: 226 | print('[custom init] L1 perturbation {:.5f}'.format( 227 | (x_adv - x).abs().view(x.shape[0], -1).sum(1).max())) 228 | 229 | 230 | x_adv = x_adv.clamp(0., 1.) 231 | x_best = x_adv.clone() 232 | x_best_adv = x_adv.clone() 233 | loss_steps = torch.zeros([self.n_iter, x.shape[0]] 234 | ).to(self.device) 235 | loss_best_steps = torch.zeros([self.n_iter + 1, x.shape[0]] 236 | ).to(self.device) 237 | acc_steps = torch.zeros_like(loss_best_steps) 238 | 239 | if not self.is_tf_model: 240 | if self.loss == 'ce': 241 | criterion_indiv = nn.CrossEntropyLoss(reduction='none') 242 | elif self.loss == 'ce-targeted-cfts': 243 | criterion_indiv = lambda x, y: -1. 
* F.cross_entropy(x, y, 244 | reduction='none') 245 | elif self.loss == 'dlr': 246 | criterion_indiv = self.dlr_loss 247 | elif self.loss == 'dlr-targeted': 248 | criterion_indiv = self.dlr_loss_targeted 249 | elif self.loss == 'ce-targeted': 250 | criterion_indiv = self.ce_loss_targeted 251 | else: 252 | raise ValueError('unknown loss') 253 | else: 254 | if self.loss == 'ce': 255 | criterion_indiv = self.model.get_logits_loss_grad_xent 256 | elif self.loss == 'dlr': 257 | criterion_indiv = self.model.get_logits_loss_grad_dlr 258 | elif self.loss == 'dlr-targeted': 259 | criterion_indiv = self.model.get_logits_loss_grad_target 260 | else: 261 | raise ValueError('unknown loss') 262 | 263 | 264 | x_adv.requires_grad_() 265 | grad = torch.zeros_like(x) 266 | for _ in range(self.eot_iter): 267 | if not self.is_tf_model: 268 | with torch.enable_grad(): 269 | logits = self.model(x_adv) 270 | loss_indiv = criterion_indiv(logits, y) 271 | loss = loss_indiv.sum() 272 | 273 | grad += torch.autograd.grad(loss, [x_adv])[0].detach() 274 | else: 275 | if self.y_target is None: 276 | logits, loss_indiv, grad_curr = criterion_indiv(x_adv, y) 277 | else: 278 | logits, loss_indiv, grad_curr = criterion_indiv(x_adv, y, 279 | self.y_target) 280 | grad += grad_curr 281 | 282 | grad /= float(self.eot_iter) 283 | grad_best = grad.clone() 284 | 285 | if self.loss in ['dlr', 'dlr-targeted']: 286 | # check if there are zero gradients 287 | check_zero_gradients(grad, logger=self.logger) 288 | 289 | acc = logits.detach().max(1)[1] == y 290 | acc_steps[0] = acc + 0 291 | loss_best = loss_indiv.detach().clone() 292 | 293 | alpha = 2. if self.norm in ['Linf', 'L2'] else 1.
if self.norm in ['L1'] else 2e-2 294 | step_size = alpha * self.eps * torch.ones([x.shape[0], *( 295 | [1] * self.ndims)]).to(self.device).detach() 296 | x_adv_old = x_adv.clone() 297 | counter = 0 298 | k = self.n_iter_2 + 0 299 | n_fts = math.prod(self.orig_dim) 300 | if self.norm == 'L1': 301 | k = max(int(.04 * self.n_iter), 1) 302 | if x_init is None: 303 | topk = .2 * torch.ones([x.shape[0]], device=self.device) 304 | sp_old = n_fts * torch.ones_like(topk) 305 | else: 306 | topk = L0_norm(x_adv - x) / n_fts / 1.5 307 | sp_old = L0_norm(x_adv - x) 308 | #print(topk[0], sp_old[0]) 309 | adasp_redstep = 1.5 310 | adasp_minstep = 10. 311 | #print(step_size[0].item()) 312 | counter3 = 0 313 | 314 | loss_best_last_check = loss_best.clone() 315 | reduced_last_check = torch.ones_like(loss_best) 316 | n_reduced = 0 317 | 318 | u = torch.arange(x.shape[0], device=self.device) 319 | for i in range(self.n_iter): 320 | ### gradient step 321 | with torch.no_grad(): 322 | x_adv = x_adv.detach() 323 | grad2 = x_adv - x_adv_old 324 | x_adv_old = x_adv.clone() 325 | 326 | a = 0.75 if i > 0 else 1.0 327 | 328 | if self.norm == 'Linf': 329 | x_adv_1 = x_adv + step_size * torch.sign(grad) 330 | x_adv_1 = torch.clamp(torch.min(torch.max(x_adv_1, 331 | x - self.eps), x + self.eps), 0.0, 1.0) 332 | x_adv_1 = torch.clamp(torch.min(torch.max( 333 | x_adv + (x_adv_1 - x_adv) * a + grad2 * (1 - a), 334 | x - self.eps), x + self.eps), 0.0, 1.0) 335 | 336 | elif self.norm == 'L2': 337 | x_adv_1 = x_adv + step_size * self.normalize(grad) 338 | x_adv_1 = torch.clamp(x + self.normalize(x_adv_1 - x 339 | ) * torch.min(self.eps * torch.ones_like(x).detach(), 340 | L2_norm(x_adv_1 - x, keepdim=True)), 0.0, 1.0) 341 | x_adv_1 = x_adv + (x_adv_1 - x_adv) * a + grad2 * (1 - a) 342 | x_adv_1 = torch.clamp(x + self.normalize(x_adv_1 - x 343 | ) * torch.min(self.eps * torch.ones_like(x).detach(), 344 | L2_norm(x_adv_1 - x, keepdim=True)), 0.0, 1.0) 345 | 346 | elif self.norm == 'L1': 347 | grad_topk 
= grad.abs().view(x.shape[0], -1).sort(-1)[0] 348 | topk_curr = torch.clamp((1. - topk) * n_fts, min=0, max=n_fts - 1).long() 349 | grad_topk = grad_topk[u, topk_curr].view(-1, *[1]*(len(x.shape) - 1)) 350 | sparsegrad = grad * (grad.abs() >= grad_topk).float() 351 | x_adv_1 = x_adv + step_size * sparsegrad.sign() / ( 352 | L1_norm(sparsegrad.sign(), keepdim=True) + 1e-10) 353 | 354 | delta_u = x_adv_1 - x 355 | delta_p = L1_projection(x, delta_u, self.eps) 356 | x_adv_1 = x + delta_u + delta_p 357 | 358 | 359 | x_adv = x_adv_1 + 0. 360 | 361 | ### get gradient 362 | x_adv.requires_grad_() 363 | grad = torch.zeros_like(x) 364 | for _ in range(self.eot_iter): 365 | if not self.is_tf_model: 366 | with torch.enable_grad(): 367 | logits = self.model(x_adv) 368 | loss_indiv = criterion_indiv(logits, y) 369 | loss = loss_indiv.sum() 370 | 371 | grad += torch.autograd.grad(loss, [x_adv])[0].detach() 372 | else: 373 | if self.y_target is None: 374 | logits, loss_indiv, grad_curr = criterion_indiv(x_adv, y) 375 | else: 376 | logits, loss_indiv, grad_curr = criterion_indiv(x_adv, y, self.y_target) 377 | grad += grad_curr 378 | 379 | grad /= float(self.eot_iter) 380 | 381 | pred = logits.detach().max(1)[1] == y 382 | acc = torch.min(acc, pred) 383 | acc_steps[i + 1] = acc + 0 384 | ind_pred = (pred == 0).nonzero().squeeze() 385 | x_best_adv[ind_pred] = x_adv[ind_pred] + 0. 
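The Linf branch of the gradient step above first takes a sign-gradient step projected onto the feasible set, then blends it with the previous displacement (momentum weight `a = 0.75` after the first iteration), projecting again. A scalar, pure-Python sketch of that update rule — the helper names are illustrative, not part of the package:

```python
def clamp(v, lo, hi):
    """Clip a scalar into [lo, hi]."""
    return min(max(v, lo), hi)

def apgd_linf_step(x, x_prev, grad, x_orig, eps, step_size, a=0.75):
    """One scalar APGD Linf step: sign-gradient ascent with momentum,
    projected onto the Linf ball around x_orig and the [0, 1] box."""
    sign = (grad > 0) - (grad < 0)
    # plain projected sign-gradient step
    z = clamp(x + step_size * sign, x_orig - eps, x_orig + eps)
    z = clamp(z, 0.0, 1.0)
    # momentum: blend the new step with the previous displacement
    x_new = x + a * (z - x) + (1.0 - a) * (x - x_prev)
    x_new = clamp(x_new, x_orig - eps, x_orig + eps)
    return clamp(x_new, 0.0, 1.0)
```

With `a=1.0` (as on the first iteration, where `a = 0.75 if i > 0 else 1.0`) the update reduces to the plain projected sign step.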
386 | if self.verbose: 387 | str_stats = ' - step size: {:.5f} - topk: {:.2f}'.format( 388 | step_size.mean(), topk.mean() * n_fts) if self.norm in ['L1'] else '' 389 | print('[m] iteration: {} - best loss: {:.6f} - robust accuracy: {:.2%}{}'.format( 390 | i, loss_best.sum(), acc.float().mean(), str_stats)) 391 | #print('pert {}'.format((x - x_best_adv).abs().view(x.shape[0], -1).sum(-1).max())) 392 | 393 | ### check step size 394 | with torch.no_grad(): 395 | y1 = loss_indiv.detach().clone() 396 | loss_steps[i] = y1 + 0 397 | ind = (y1 > loss_best).nonzero().squeeze() 398 | x_best[ind] = x_adv[ind].clone() 399 | grad_best[ind] = grad[ind].clone() 400 | loss_best[ind] = y1[ind] + 0 401 | loss_best_steps[i + 1] = loss_best + 0 402 | 403 | counter3 += 1 404 | 405 | if counter3 == k: 406 | if self.norm in ['Linf', 'L2']: 407 | fl_oscillation = self.check_oscillation(loss_steps, i, k, 408 | loss_best, k3=self.thr_decr) 409 | fl_reduce_no_impr = (1. - reduced_last_check) * ( 410 | loss_best_last_check >= loss_best).float() 411 | fl_oscillation = torch.max(fl_oscillation, 412 | fl_reduce_no_impr) 413 | reduced_last_check = fl_oscillation.clone() 414 | loss_best_last_check = loss_best.clone() 415 | 416 | if fl_oscillation.sum() > 0: 417 | ind_fl_osc = (fl_oscillation > 0).nonzero().squeeze() 418 | step_size[ind_fl_osc] /= 2.0 419 | n_reduced = fl_oscillation.sum() 420 | 421 | x_adv[ind_fl_osc] = x_best[ind_fl_osc].clone() 422 | grad[ind_fl_osc] = grad_best[ind_fl_osc].clone() 423 | 424 | k = max(k - self.size_decr, self.n_iter_min) 425 | 426 | elif self.norm == 'L1': 427 | sp_curr = L0_norm(x_best - x) 428 | fl_redtopk = (sp_curr / sp_old) < .95 429 | topk = sp_curr / n_fts / 1.5 430 | step_size[fl_redtopk] = alpha * self.eps 431 | step_size[~fl_redtopk] /= adasp_redstep 432 | step_size.clamp_(alpha * self.eps / adasp_minstep, alpha * self.eps) 433 | sp_old = sp_curr.clone() 434 | 435 | x_adv[fl_redtopk] = x_best[fl_redtopk].clone() 436 | grad[fl_redtopk] = 
grad_best[fl_redtopk].clone() 437 | 438 | counter3 = 0 439 | #k = max(k - self.size_decr, self.n_iter_min) 440 | 441 | # 442 | 443 | return (x_best, acc, loss_best, x_best_adv) 444 | 445 | def perturb(self, x, y=None, best_loss=False, x_init=None): 446 | """ 447 | :param x: clean images 448 | :param y: clean labels, if None we use the predicted labels 449 | :param best_loss: if True the points attaining highest loss 450 | are returned, otherwise adversarial examples 451 | """ 452 | 453 | assert self.loss in ['ce', 'dlr'] #'ce-targeted-cfts' 454 | if not y is None and len(y.shape) == 0: 455 | x.unsqueeze_(0) 456 | y.unsqueeze_(0) 457 | self.init_hyperparam(x) 458 | 459 | x = x.detach().clone().float().to(self.device) 460 | if not self.is_tf_model: 461 | y_pred = self.model(x).max(1)[1] 462 | else: 463 | y_pred = self.model.predict(x).max(1)[1] 464 | if y is None: 465 | #y_pred = self.predict(x).max(1)[1] 466 | y = y_pred.detach().clone().long().to(self.device) 467 | else: 468 | y = y.detach().clone().long().to(self.device) 469 | 470 | adv = x.clone() 471 | if self.loss != 'ce-targeted': 472 | acc = y_pred == y 473 | else: 474 | acc = y_pred != y 475 | loss = -1e10 * torch.ones_like(acc).float() 476 | if self.verbose: 477 | print('-------------------------- ', 478 | 'running {}-attack with epsilon {:.5f}'.format( 479 | self.norm, self.eps), 480 | '--------------------------') 481 | print('initial accuracy: {:.2%}'.format(acc.float().mean())) 482 | 483 | 484 | 485 | if self.use_largereps: 486 | epss = [3. * self.eps_orig, 2. * self.eps_orig, 1. 
* self.eps_orig] 487 | iters = [.3 * self.n_iter_orig, .3 * self.n_iter_orig, 488 | .4 * self.n_iter_orig] 489 | iters = [math.ceil(c) for c in iters] 490 | iters[-1] = self.n_iter_orig - sum(iters[:-1]) # make sure to use the given iterations 491 | if self.verbose: 492 | print('using schedule [{}x{}]'.format('+'.join([str(c 493 | ) for c in epss]), '+'.join([str(c) for c in iters]))) 494 | 495 | startt = time.time() 496 | if not best_loss: 497 | torch.random.manual_seed(self.seed) 498 | torch.cuda.random.manual_seed(self.seed) 499 | 500 | for counter in range(self.n_restarts): 501 | ind_to_fool = acc.nonzero().squeeze() 502 | if len(ind_to_fool.shape) == 0: 503 | ind_to_fool = ind_to_fool.unsqueeze(0) 504 | if ind_to_fool.numel() != 0: 505 | x_to_fool = x[ind_to_fool].clone() 506 | y_to_fool = y[ind_to_fool].clone() 507 | 508 | 509 | if not self.use_largereps: 510 | res_curr = self.attack_single_run(x_to_fool, y_to_fool) 511 | else: 512 | res_curr = self.decr_eps_pgd(x_to_fool, y_to_fool, epss, iters) 513 | best_curr, acc_curr, loss_curr, adv_curr = res_curr 514 | ind_curr = (acc_curr == 0).nonzero().squeeze() 515 | 516 | acc[ind_to_fool[ind_curr]] = 0 517 | adv[ind_to_fool[ind_curr]] = adv_curr[ind_curr].clone() 518 | if self.verbose: 519 | print('restart {} - robust accuracy: {:.2%}'.format( 520 | counter, acc.float().mean()), 521 | '- cum. time: {:.1f} s'.format( 522 | time.time() - startt)) 523 | 524 | return adv 525 | 526 | else: 527 | adv_best = x.detach().clone() 528 | loss_best = torch.ones([x.shape[0]]).to( 529 | self.device) * (-float('inf')) 530 | for counter in range(self.n_restarts): 531 | best_curr, _, loss_curr, _ = self.attack_single_run(x, y) 532 | ind_curr = (loss_curr > loss_best).nonzero().squeeze() 533 | adv_best[ind_curr] = best_curr[ind_curr] + 0. 534 | loss_best[ind_curr] = loss_curr[ind_curr] + 0. 
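When `use_largereps` is enabled, the epsilon schedule built above splits the iteration budget roughly 30%/30%/40% across radii `3*eps`, `2*eps` and `eps`, with the last phase absorbing the rounding remainder so the total matches `n_iter`. A small standalone sketch of that computation (the function name is illustrative):

```python
import math

def largereps_schedule(eps, n_iter):
    """Shrinking-epsilon schedule: three phases at 3*eps, 2*eps and eps,
    taking roughly 30%/30%/40% of the iteration budget."""
    epss = [3.0 * eps, 2.0 * eps, 1.0 * eps]
    iters = [math.ceil(0.3 * n_iter), math.ceil(0.3 * n_iter)]
    iters.append(n_iter - sum(iters))  # last phase absorbs the remainder
    return epss, iters
```

For instance, with `eps=12/255` and `n_iter=100` (the L1 defaults in `set_version`), the budget splits as 30 + 30 + 40 iterations.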
535 | 536 | if self.verbose: 537 | print('restart {} - loss: {:.5f}'.format( 538 | counter, loss_best.sum())) 539 | 540 | return adv_best 541 | 542 | def decr_eps_pgd(self, x, y, epss, iters, use_rs=True): 543 | assert len(epss) == len(iters) 544 | assert self.norm in ['L1'] 545 | self.use_rs = False 546 | if not use_rs: 547 | x_init = None 548 | else: 549 | x_init = x + torch.randn_like(x) 550 | x_init += L1_projection(x, x_init - x, 1. * float(epss[0])) 551 | eps_target = float(epss[-1]) 552 | if self.verbose: 553 | print('total iter: {}'.format(sum(iters))) 554 | for eps, niter in zip(epss, iters): 555 | if self.verbose: 556 | print('using eps: {:.2f}'.format(eps)) 557 | self.n_iter = niter + 0 558 | self.eps = eps + 0. 559 | # 560 | if not x_init is None: 561 | x_init += L1_projection(x, x_init - x, 1. * eps) 562 | x_init, acc, loss, x_adv = self.attack_single_run(x, y, x_init=x_init) 563 | 564 | return (x_init, acc, loss, x_adv) 565 | 566 | class APGDAttack_targeted(APGDAttack): 567 | def __init__( 568 | self, 569 | predict, 570 | n_iter=100, 571 | norm='Linf', 572 | n_restarts=1, 573 | eps=None, 574 | seed=0, 575 | eot_iter=1, 576 | rho=.75, 577 | topk=None, 578 | n_target_classes=9, 579 | verbose=False, 580 | device=None, 581 | use_largereps=False, 582 | is_tf_model=False, 583 | logger=None): 584 | """ 585 | AutoPGD on the targeted DLR loss 586 | """ 587 | super(APGDAttack_targeted, self).__init__(predict, n_iter=n_iter, norm=norm, 588 | n_restarts=n_restarts, eps=eps, seed=seed, loss='dlr-targeted', 589 | eot_iter=eot_iter, rho=rho, topk=topk, verbose=verbose, device=device, 590 | use_largereps=use_largereps, is_tf_model=is_tf_model, logger=logger) 591 | 592 | self.y_target = None 593 | self.n_target_classes = n_target_classes 594 | 595 | def dlr_loss_targeted(self, x, y): 596 | x_sorted, ind_sorted = x.sort(dim=1) 597 | u = torch.arange(x.shape[0]) 598 | 599 | return -(x[u, y] - x[u, self.y_target]) / (x_sorted[:, -1] - .5 * ( 600 | x_sorted[:, -3] + 
x_sorted[:, -4]) + 1e-12) 601 | 602 | def ce_loss_targeted(self, x, y): 603 | return -1. * F.cross_entropy(x, self.y_target, reduction='none') 604 | 605 | 606 | def perturb(self, x, y=None, x_init=None): 607 | """ 608 | :param x: clean images 609 | :param y: clean labels, if None we use the predicted labels 610 | """ 611 | 612 | assert self.loss in ['dlr-targeted'] #'ce-targeted' 613 | if not y is None and len(y.shape) == 0: 614 | x.unsqueeze_(0) 615 | y.unsqueeze_(0) 616 | self.init_hyperparam(x) 617 | 618 | x = x.detach().clone().float().to(self.device) 619 | if not self.is_tf_model: 620 | y_pred = self.model(x).max(1)[1] 621 | else: 622 | y_pred = self.model.predict(x).max(1)[1] 623 | if y is None: 624 | #y_pred = self._get_predicted_label(x) 625 | y = y_pred.detach().clone().long().to(self.device) 626 | else: 627 | y = y.detach().clone().long().to(self.device) 628 | 629 | adv = x.clone() 630 | acc = y_pred == y 631 | if self.verbose: 632 | print('-------------------------- ', 633 | 'running {}-attack with epsilon {:.5f}'.format( 634 | self.norm, self.eps), 635 | '--------------------------') 636 | print('initial accuracy: {:.2%}'.format(acc.float().mean())) 637 | 638 | startt = time.time() 639 | 640 | torch.random.manual_seed(self.seed) 641 | torch.cuda.random.manual_seed(self.seed) 642 | 643 | # 644 | 645 | if self.use_largereps: 646 | epss = [3. * self.eps_orig, 2. * self.eps_orig, 1. 
* self.eps_orig] 647 | iters = [.3 * self.n_iter_orig, .3 * self.n_iter_orig, 648 | .4 * self.n_iter_orig] 649 | iters = [math.ceil(c) for c in iters] 650 | iters[-1] = self.n_iter_orig - sum(iters[:-1]) 651 | if self.verbose: 652 | print('using schedule [{}x{}]'.format('+'.join([str(c 653 | ) for c in epss]), '+'.join([str(c) for c in iters]))) 654 | 655 | for target_class in range(2, self.n_target_classes + 2): 656 | for counter in range(self.n_restarts): 657 | ind_to_fool = acc.nonzero().squeeze() 658 | if len(ind_to_fool.shape) == 0: 659 | ind_to_fool = ind_to_fool.unsqueeze(0) 660 | if ind_to_fool.numel() != 0: 661 | x_to_fool = x[ind_to_fool].clone() 662 | y_to_fool = y[ind_to_fool].clone() 663 | 664 | if not self.is_tf_model: 665 | output = self.model(x_to_fool) 666 | else: 667 | output = self.model.predict(x_to_fool) 668 | self.y_target = output.sort(dim=1)[1][:, -target_class] 669 | 670 | if not self.use_largereps: 671 | res_curr = self.attack_single_run(x_to_fool, y_to_fool) 672 | else: 673 | res_curr = self.decr_eps_pgd(x_to_fool, y_to_fool, epss, iters) 674 | best_curr, acc_curr, loss_curr, adv_curr = res_curr 675 | ind_curr = (acc_curr == 0).nonzero().squeeze() 676 | 677 | acc[ind_to_fool[ind_curr]] = 0 678 | adv[ind_to_fool[ind_curr]] = adv_curr[ind_curr].clone() 679 | if self.verbose: 680 | print('target class {}'.format(target_class), 681 | '- restart {} - robust accuracy: {:.2%}'.format( 682 | counter, acc.float().mean()), 683 | '- cum. 
time: {:.1f} s'.format( 684 | time.time() - startt)) 685 | 686 | return adv 687 | 688 | -------------------------------------------------------------------------------- /autoattack/checks.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import warnings 3 | import math 4 | import sys 5 | 6 | from autoattack.other_utils import L2_norm 7 | 8 | 9 | funcs = {'grad': 0, 10 | 'backward': 0, 11 | #'enable_grad': 0 12 | '_make_grads': 0, 13 | } 14 | 15 | checks_doc_path = 'flags_doc.md' 16 | 17 | 18 | def check_randomized(model, x, y, bs=250, n=5, alpha=1e-4, logger=None): 19 | acc = [] 20 | corrcl = [] 21 | outputs = [] 22 | with torch.no_grad(): 23 | for _ in range(n): 24 | output = model(x) 25 | corrcl_curr = (output.max(1)[1] == y).sum().item() 26 | corrcl.append(corrcl_curr) 27 | outputs.append(output / (L2_norm(output, keepdim=True) + 1e-10)) 28 | acc = [c != corrcl_curr for c in corrcl] 29 | max_diff = 0. 30 | for c in range(n - 1): 31 | for e in range(c + 1, n): 32 | diff = L2_norm(outputs[c] - outputs[e]) 33 | max_diff = max(max_diff, diff.max().item()) 34 | #print(diff.max().item(), max_diff) 35 | if any(acc) or max_diff > alpha: 36 | msg = 'it seems to be a randomized defense! Please use version="rand".' + \ 37 | f' See {checks_doc_path} for details.' 38 | if logger is None: 39 | warnings.warn(Warning(msg)) 40 | else: 41 | logger.log(f'Warning: {msg}') 42 | 43 | 44 | def check_range_output(model, x, alpha=1e-5, logger=None): 45 | with torch.no_grad(): 46 | output = model(x) 47 | fl = [output.max() < 1. + alpha, output.min() > -alpha, 48 | ((output.sum(-1) - 1.).abs() < alpha).all()] 49 | if all(fl): 50 | msg = 'it seems that the output is a probability distribution,' +\ 51 | ' please be sure that the logits are used!' + \ 52 | f' See {checks_doc_path} for details.' 
53 | if logger is None: 54 | warnings.warn(Warning(msg)) 55 | else: 56 | logger.log(f'Warning: {msg}') 57 | return output.shape[-1] 58 | 59 | 60 | def check_zero_gradients(grad, logger=None): 61 | z = grad.view(grad.shape[0], -1).abs().sum(-1) 62 | #print(grad[0, :10]) 63 | if (z == 0).any(): 64 | msg = f'there are {(z == 0).sum()} points with zero gradient!' + \ 65 | ' This might lead to unreliable evaluation with gradient-based attacks.' + \ 66 | f' See {checks_doc_path} for details.' 67 | if logger is None: 68 | warnings.warn(Warning(msg)) 69 | else: 70 | logger.log(f'Warning: {msg}') 71 | 72 | 73 | def check_square_sr(acc_dict, alpha=.002, logger=None): 74 | if 'square' in acc_dict.keys() and len(acc_dict) > 2: 75 | acc = min([v for k, v in acc_dict.items() if k != 'square']) 76 | if acc_dict['square'] < acc - alpha: 77 | msg = 'Square Attack has decreased the robust accuracy by' + \ 78 | f' {acc - acc_dict["square"]:.2%}.' + \ 79 | ' This might indicate that the robustness evaluation using' +\ 80 | ' AutoAttack is unreliable. Consider running Square' +\ 81 | ' Attack with more iterations and restarts or an adaptive attack.' + \ 82 | f' See {checks_doc_path} for details.' 83 | if logger is None: 84 | warnings.warn(Warning(msg)) 85 | else: 86 | logger.log(f'Warning: {msg}') 87 | 88 | 89 | ''' from https://stackoverflow.com/questions/26119521/counting-function-calls-python ''' 90 | def tracefunc(frame, event, args): 91 | if event == 'call' and frame.f_code.co_name in funcs.keys(): 92 | funcs[frame.f_code.co_name] += 1 93 | 94 | 95 | def check_dynamic(model, x, is_tf_model=False, logger=None): 96 | if is_tf_model: 97 | msg = 'the check for dynamic defenses is not currently supported' 98 | else: 99 | msg = None 100 | sys.settrace(tracefunc) 101 | model(x) 102 | sys.settrace(None) 103 | #for k, v in funcs.items(): 104 | # print(k, v) 105 | if any([c > 0 for c in funcs.values()]): 106 | msg = 'it seems to be a dynamic defense!
The evaluation' + \ 107 | ' with AutoAttack might be insufficient.' + \ 108 | f' See {checks_doc_path} for details.' 109 | if not msg is None: 110 | if logger is None: 111 | warnings.warn(Warning(msg)) 112 | else: 113 | logger.log(f'Warning: {msg}') 114 | #sys.settrace(None) 115 | 116 | 117 | def check_n_classes(n_cls, attacks_to_run, apgd_targets, fab_targets, 118 | logger=None): 119 | msg = None 120 | if 'apgd-dlr' in attacks_to_run or 'apgd-t' in attacks_to_run: 121 | if n_cls <= 2: 122 | msg = f'with only {n_cls} classes it is not possible to use the DLR loss!' 123 | elif n_cls == 3: 124 | msg = f'with only {n_cls} classes it is not possible to use the targeted DLR loss!' 125 | elif 'apgd-t' in attacks_to_run and \ 126 | apgd_targets + 1 > n_cls: 127 | msg = f'it seems that more target classes ({apgd_targets})' + \ 128 | f' than possible ({n_cls - 1}) are used in {"apgd-t".upper()}!' 129 | if 'fab-t' in attacks_to_run and fab_targets + 1 > n_cls: 130 | if msg is None: 131 | msg = f'it seems that more target classes ({fab_targets})' + \ 132 | f' than possible ({n_cls - 1}) are used in FAB-T!' 133 | else: 134 | msg += f' Also, it seems that too many target classes ({fab_targets})' + \ 135 | f' are used in {"fab-t".upper()} ({n_cls - 1} possible)!'
136 | if msg is not None: 137 | if logger is None: 138 | warnings.warn(Warning(msg)) 139 | else: 140 | logger.log(f'Warning: {msg}') 141 | 142 | 143 | -------------------------------------------------------------------------------- /autoattack/examples/eval.py: -------------------------------------------------------------------------------- 1 | import os 2 | import argparse 3 | from pathlib import Path 4 | import warnings 5 | 6 | import torch 7 | import torch.nn as nn 8 | import torchvision.datasets as datasets 9 | import torch.utils.data as data 10 | import torchvision.transforms as transforms 11 | 12 | import sys 13 | sys.path.insert(0,'..') 14 | 15 | from resnet import * 16 | 17 | if __name__ == '__main__': 18 | parser = argparse.ArgumentParser() 19 | parser.add_argument('--data_dir', type=str, default='./data') 20 | parser.add_argument('--norm', type=str, default='Linf') 21 | parser.add_argument('--epsilon', type=float, default=8./255.) 22 | parser.add_argument('--model', type=str, default='./model_test.pt') 23 | parser.add_argument('--n_ex', type=int, default=1000) 24 | parser.add_argument('--individual', action='store_true') 25 | parser.add_argument('--save_dir', type=str, default='./results') 26 | parser.add_argument('--batch_size', type=int, default=500) 27 | parser.add_argument('--log_path', type=str, default='./log_file.txt') 28 | parser.add_argument('--version', type=str, default='standard') 29 | parser.add_argument('--state-path', type=Path, default=None) 30 | 31 | args = parser.parse_args() 32 | 33 | # load model 34 | model = ResNet18() 35 | ckpt = torch.load(args.model) 36 | model.load_state_dict(ckpt) 37 | model.cuda() 38 | model.eval() 39 | 40 | # load data 41 | transform_list = [transforms.ToTensor()] 42 | transform_chain = transforms.Compose(transform_list) 43 | item = datasets.CIFAR10(root=args.data_dir, train=False, transform=transform_chain, download=True) 44 | test_loader = data.DataLoader(item, batch_size=1000, shuffle=False, num_workers=0)
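The `check_square_sr` helper in checks.py above fires when the black-box Square Attack beats every white-box attack by more than `alpha`. A plain-Python sketch of that condition follows; the accuracy values are hypothetical, standing in for the per-attack robust accuracies collected in `acc_dict`:

```python
# Sketch of the condition tested by check_square_sr above.
# The accuracy values here are hypothetical stand-ins.
acc_dict = {'apgd-ce': 0.52, 'apgd-t': 0.50, 'fab-t': 0.51, 'square': 0.41}
alpha = 0.002  # tolerance, as in the default argument above

best_whitebox = min(v for k, v in acc_dict.items() if k != 'square')
unreliable = acc_dict['square'] < best_whitebox - alpha
# A black-box attack beating every white-box one by a clear margin is a
# typical symptom of gradient masking.
print(unreliable)  # → True
```

With these values the check fires, since 0.41 is more than `alpha` below the best white-box result of 0.50.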
45 | 46 | # create save dir 47 | if not os.path.exists(args.save_dir): 48 | os.makedirs(args.save_dir) 49 | 50 | # load attack 51 | from autoattack import AutoAttack 52 | adversary = AutoAttack(model, norm=args.norm, eps=args.epsilon, log_path=args.log_path, 53 | version=args.version) 54 | 55 | l = [x for (x, y) in test_loader] 56 | x_test = torch.cat(l, 0) 57 | l = [y for (x, y) in test_loader] 58 | y_test = torch.cat(l, 0) 59 | 60 | # example of custom version 61 | if args.version == 'custom': 62 | adversary.attacks_to_run = ['apgd-ce', 'fab'] 63 | adversary.apgd.n_restarts = 2 64 | adversary.fab.n_restarts = 2 65 | 66 | # run attack and save images 67 | with torch.no_grad(): 68 | if not args.individual: 69 | adv_complete = adversary.run_standard_evaluation(x_test[:args.n_ex], y_test[:args.n_ex], 70 | bs=args.batch_size, state_path=args.state_path) 71 | 72 | torch.save({'adv_complete': adv_complete}, '{}/{}_{}_1_{}_eps_{:.5f}.pth'.format( 73 | args.save_dir, 'aa', args.version, adv_complete.shape[0], args.epsilon)) 74 | 75 | else: 76 | # individual version, each attack is run on all test points 77 | adv_complete = adversary.run_standard_evaluation_individual(x_test[:args.n_ex], 78 | y_test[:args.n_ex], bs=args.batch_size) 79 | 80 | torch.save(adv_complete, '{}/{}_{}_individual_1_{}_eps_{:.5f}.pth'.format( 81 | args.save_dir, 'aa', args.version, args.n_ex, args.epsilon)) 82 | 83 | -------------------------------------------------------------------------------- /autoattack/examples/eval_tf1.py: -------------------------------------------------------------------------------- 1 | #%% 2 | from argparse import ArgumentParser 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | 7 | import torch 8 | import torch.nn as nn 9 | import torchvision.datasets as datasets 10 | import torch.utils.data as data 11 | import torchvision.transforms as transforms 12 | 13 | import sys 14 | #sys.path.insert(0,'..') 15 | 16 | from autoattack import AutoAttack,
utils_tf 17 | # 18 | 19 | #%% 20 | class mnist_loader: 21 | def __init__(self): 22 | 23 | self.n_class = 10 24 | self.dim_x = 28 25 | self.dim_y = 28 26 | self.dim_z = 1 27 | self.img_min = 0.0 28 | self.img_max = 1.0 29 | self.epsilon = 0.3 30 | 31 | def download(self): 32 | (trainX, trainY), (testX, testY) = tf.keras.datasets.mnist.load_data() 33 | 34 | trainX = trainX.astype(np.float32) 35 | testX = testX.astype(np.float32) 36 | 37 | # one-hot 38 | trainY = tf.keras.utils.to_categorical(trainY, self.n_class) 39 | testY = tf.keras.utils.to_categorical(testY , self.n_class) 40 | 41 | # get validation sets 42 | training_size = 55000 43 | validX = trainX[training_size:,:] 44 | validY = trainY[training_size:,:] 45 | 46 | trainX = trainX[:training_size,:] 47 | trainY = trainY[:training_size,:] 48 | 49 | # expand dimension 50 | trainX = np.expand_dims(trainX, axis=3) 51 | validX = np.expand_dims(validX, axis=3) 52 | testX = np.expand_dims(testX , axis=3) 53 | 54 | return trainX, trainY, validX, validY, testX, testY 55 | 56 | def get_raw_data(self): 57 | return self.download() 58 | 59 | def get_normalized_data(self): 60 | trainX, trainY, validX, validY, testX, testY = self.get_raw_data() 61 | trainX = trainX / 255.0 * (self.img_max - self.img_min) + self.img_min 62 | validX = validX / 255.0 * (self.img_max - self.img_min) + self.img_min 63 | testX = testX / 255.0 * (self.img_max - self.img_min) + self.img_min 64 | trainY = trainY 65 | validY = validY 66 | testY = testY 67 | return trainX, trainY, validX, validY, testX, testY 68 | 69 | #%% 70 | def mnist_model(): 71 | # declare variables 72 | model_layers = [ tf.keras.layers.Input(shape=(28,28,1), name="model/input"), 73 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c1"), 74 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c2"), 75 | tf.keras.layers.MaxPooling2D(pool_size=(2, 2),
name="clf/p1"), 76 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c3"), 77 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c4"), 78 | tf.keras.layers.MaxPooling2D(pool_size=(2, 2), name="clf/p2"), 79 | tf.keras.layers.Flatten(name="clf/f1"), 80 | tf.keras.layers.Dense(256, activation="relu", kernel_initializer='he_normal', name="clf/d1"), 81 | tf.keras.layers.Dense(10 , activation=None , kernel_initializer='he_normal', name="clf/d2"), 82 | tf.keras.layers.Activation('softmax', name="clf_output") 83 | ] 84 | 85 | # clf_model 86 | clf_model = tf.keras.Sequential() 87 | for ii in model_layers: 88 | clf_model.add(ii) 89 | 90 | clf_model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy']) 91 | clf_model.summary() 92 | 93 | return clf_model 94 | 95 | #%% 96 | def arg_parser(parser): 97 | 98 | parser.add_argument("--path" , dest ="path", type=str, default='./', help="path of tf.keras model's weights") 99 | args, unknown = parser.parse_known_args() 100 | if unknown: 101 | msg = " ".join(unknown) 102 | print('[Warning] Unrecognized arguments: {:s}'.format(msg) ) 103 | 104 | return args 105 | 106 | #%% 107 | if __name__ == '__main__': 108 | 109 | # get arguments 110 | parser = ArgumentParser() 111 | args = arg_parser(parser) 112 | 113 | # MODEL PATH 114 | MODEL_PATH = args.path 115 | 116 | # init tf/keras 117 | tf.compat.v1.keras.backend.clear_session() 118 | gpu_options = tf.compat.v1.GPUOptions(allow_growth=True) 119 | sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(gpu_options=gpu_options)) 120 | tf.compat.v1.keras.backend.set_session(sess) 121 | tf.compat.v1.keras.backend.set_learning_phase(0) 122 | 123 | # load data 124 | batch_size = 1000 125 | epsilon = mnist_loader().epsilon 126 | _, _, _, _, testX, testY = mnist_loader().get_normalized_data() 127 | 128 | # convert to pytorch format 129 | testY
= np.argmax(testY, axis=1) 130 | torch_testX = torch.from_numpy( np.transpose(testX, (0, 3, 1, 2)) ).float().cuda() 131 | torch_testY = torch.from_numpy( testY ).float() 132 | 133 | # load model from saved weights 134 | print('[INFO] MODEL_PATH: {:s}'.format(MODEL_PATH) ) 135 | tf_model = mnist_model() 136 | tf_model.load_weights(MODEL_PATH) 137 | 138 | # remove 'softmax layer' and put it into adapter 139 | atk_model = tf.keras.models.Model(inputs=tf_model.input, outputs=tf_model.get_layer(index=-2).output) 140 | atk_model.summary() 141 | y_input = tf.compat.v1.placeholder(tf.int64, shape = [None]) 142 | x_input = atk_model.input 143 | logits = atk_model.output 144 | model_adapted = utils_tf.ModelAdapter(logits, x_input, y_input, sess) 145 | 146 | # run attack 147 | adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True) 148 | x_adv, y_adv = adversary.run_standard_evaluation(torch_testX, torch_testY, bs=batch_size, return_labels=True) 149 | np_x_adv = np.moveaxis(x_adv.cpu().numpy(), 1, 3) 150 | np.save("./output/mnist_adv.npy", np_x_adv) 151 | -------------------------------------------------------------------------------- /autoattack/examples/eval_tf2.py: -------------------------------------------------------------------------------- 1 | #%% 2 | from argparse import ArgumentParser 3 | 4 | import numpy as np 5 | import tensorflow as tf 6 | 7 | import torch 8 | import torch.nn as nn 9 | import torchvision.datasets as datasets 10 | import torch.utils.data as data 11 | import torchvision.transforms as transforms 12 | 13 | import sys 14 | sys.path.insert(0, '..') 15 | 16 | from autoattack import AutoAttack, utils_tf2 17 | 18 | 19 | #%% 20 | class mnist_loader: 21 | def __init__(self): 22 | 23 | self.n_class = 10 24 | self.dim_x = 28 25 | self.dim_y = 28 26 | self.dim_z = 1 27 | self.img_min = 0.0 28 | self.img_max = 1.0 29 | self.epsilon = 0.3 30 | 31 | def download(self): 32 | (trainX, trainY), (testX, testY) =
tf.keras.datasets.mnist.load_data() 33 | 34 | trainX = trainX.astype(np.float32) 35 | testX = testX.astype(np.float32) 36 | 37 | # one-hot 38 | trainY = tf.keras.utils.to_categorical(trainY, self.n_class) 39 | testY = tf.keras.utils.to_categorical(testY , self.n_class) 40 | 41 | # get validation sets 42 | training_size = 55000 43 | validX = trainX[training_size:,:] 44 | validY = trainY[training_size:,:] 45 | 46 | trainX = trainX[:training_size,:] 47 | trainY = trainY[:training_size,:] 48 | 49 | # expand dimension 50 | trainX = np.expand_dims(trainX, axis=3) 51 | validX = np.expand_dims(validX, axis=3) 52 | testX = np.expand_dims(testX , axis=3) 53 | 54 | return trainX, trainY, validX, validY, testX, testY 55 | 56 | def get_raw_data(self): 57 | return self.download() 58 | 59 | def get_normalized_data(self): 60 | trainX, trainY, validX, validY, testX, testY = self.get_raw_data() 61 | trainX = trainX / 255.0 * (self.img_max - self.img_min) + self.img_min 62 | validX = validX / 255.0 * (self.img_max - self.img_min) + self.img_min 63 | testX = testX / 255.0 * (self.img_max - self.img_min) + self.img_min 64 | trainY = trainY 65 | validY = validY 66 | testY = testY 67 | return trainX, trainY, validX, validY, testX, testY 68 | 69 | #%% 70 | def mnist_model(): 71 | # declare variables 72 | model_layers = [ tf.keras.layers.Input(shape=(28,28,1), name="model/input"), 73 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c1"), 74 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c2"), 75 | tf.keras.layers.MaxPooling2D(pool_size=(2, 2), name="clf/p1"), 76 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c3"), 77 | tf.keras.layers.Conv2D(16, (3, 3), padding="same", activation="relu", kernel_initializer='he_normal', name="clf/c4"), 78 | tf.keras.layers.MaxPooling2D(pool_size=(2, 2),
name="clf/p2"), 79 | tf.keras.layers.Flatten(name="clf/f1"), 80 | tf.keras.layers.Dense(256, activation="relu", kernel_initializer='he_normal', name="clf/d1"), 81 | tf.keras.layers.Dense(10 , activation=None , kernel_initializer='he_normal', name="clf/d2"), 82 | tf.keras.layers.Activation('softmax', name="clf_output") 83 | ] 84 | 85 | # clf_model 86 | clf_model = tf.keras.Sequential() 87 | for ii in model_layers: 88 | clf_model.add(ii) 89 | 90 | clf_model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy']) 91 | clf_model.summary() 92 | 93 | return clf_model 94 | 95 | #%% 96 | def arg_parser(parser): 97 | 98 | parser.add_argument("--path" , dest ="path", type=str, default='./autoattack/examples/tf_model.weight.h5', help="path of tf.keras model's weights") 99 | args, unknown = parser.parse_known_args() 100 | if unknown: 101 | msg = " ".join(unknown) 102 | print('[Warning] Unrecognized arguments: {:s}'.format(msg) ) 103 | 104 | return args 105 | 106 | #%% 107 | if __name__ == '__main__': 108 | 109 | # get arguments 110 | parser = ArgumentParser() 111 | args = arg_parser(parser) 112 | 113 | # MODEL PATH 114 | MODEL_PATH = args.path 115 | 116 | # init tf/keras 117 | gpus = tf.config.list_physical_devices('GPU') 118 | for gpu in gpus: 119 | tf.config.experimental.set_memory_growth(gpu, True) 120 | 121 | # load data 122 | batch_size = 1000 123 | epsilon = mnist_loader().epsilon 124 | _, _, _, _, testX, testY = mnist_loader().get_normalized_data() 125 | 126 | # convert to pytorch format 127 | testY = np.argmax(testY, axis=1) 128 | torch_testX = torch.from_numpy( np.transpose(testX, (0, 3, 1, 2)) ).float().cuda() 129 | torch_testY = torch.from_numpy( testY ).float() 130 | 131 | # load model from saved weights 132 | print('[INFO] MODEL_PATH: {:s}'.format(MODEL_PATH) ) 133 | tf_model = mnist_model() 134 | tf_model.load_weights(MODEL_PATH) 135 | 136 | # remove 'softmax layer' and put it into adapter 137 | atk_model =
tf.keras.models.Model(inputs=tf_model.input, outputs=tf_model.get_layer(index=-2).output) 138 | atk_model.summary() 139 | model_adapted = utils_tf2.ModelAdapter(atk_model) 140 | 141 | # run attack 142 | adversary = AutoAttack(model_adapted, norm='Linf', eps=epsilon, version='standard', is_tf_model=True) 143 | x_adv, y_adv = adversary.run_standard_evaluation(torch_testX, torch_testY, bs=batch_size, return_labels=True) 144 | np_x_adv = np.moveaxis(x_adv.cpu().numpy(), 1, 3) 145 | np.save("./output/mnist_adv.npy", np_x_adv) 146 | -------------------------------------------------------------------------------- /autoattack/examples/model_test.pt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fra31/auto-attack/a39220048b3c9f2cca9a4d3a54604793c68eca7e/autoattack/examples/model_test.pt -------------------------------------------------------------------------------- /autoattack/examples/resnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class BasicBlock(nn.Module): 7 | expansion = 1 8 | 9 | def __init__(self, in_planes, planes, stride=1): 10 | super(BasicBlock, self).__init__() 11 | self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False) 12 | self.bn1 = nn.BatchNorm2d(planes) 13 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False) 14 | self.bn2 = nn.BatchNorm2d(planes) 15 | 16 | self.shortcut = nn.Sequential() 17 | if stride != 1 or in_planes != self.expansion * planes: 18 | self.shortcut = nn.Sequential( 19 | nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1, stride=stride, bias=False), 20 | nn.BatchNorm2d(self.expansion * planes) 21 | ) 22 | 23 | def forward(self, x): 24 | out = F.relu(self.bn1(self.conv1(x))) 25 | out = self.bn2(self.conv2(out)) 26 | out += self.shortcut(x) 27 | out = 
F.relu(out) 28 | return out 29 | 30 | 31 | class Bottleneck(nn.Module): 32 | expansion = 4 33 | 34 | def __init__(self, in_planes, planes, stride=1): 35 | super(Bottleneck, self).__init__() 36 | self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False) 37 | self.bn1 = nn.BatchNorm2d(planes) 38 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False) 39 | self.bn2 = nn.BatchNorm2d(planes) 40 | self.conv3 = nn.Conv2d(planes, self.expansion * planes, kernel_size=1, bias=False) 41 | self.bn3 = nn.BatchNorm2d(self.expansion * planes) 42 | 43 | self.shortcut = nn.Sequential() 44 | if stride != 1 or in_planes != self.expansion * planes: 45 | self.shortcut = nn.Sequential( 46 | nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1, stride=stride, bias=False), 47 | nn.BatchNorm2d(self.expansion * planes) 48 | ) 49 | 50 | def forward(self, x): 51 | out = F.relu(self.bn1(self.conv1(x))) 52 | out = F.relu(self.bn2(self.conv2(out))) 53 | out = self.bn3(self.conv3(out)) 54 | out += self.shortcut(x) 55 | out = F.relu(out) 56 | return out 57 | 58 | 59 | class ResNet(nn.Module): 60 | def __init__(self, block, num_blocks, num_classes=10): 61 | super(ResNet, self).__init__() 62 | self.in_planes = 64 63 | 64 | self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False) 65 | self.bn1 = nn.BatchNorm2d(64) 66 | self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1) 67 | self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2) 68 | self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2) 69 | self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2) 70 | self.linear = nn.Linear(512 * block.expansion, num_classes) 71 | 72 | def _make_layer(self, block, planes, num_blocks, stride): 73 | strides = [stride] + [1] * (num_blocks - 1) 74 | layers = [] 75 | for stride in strides: 76 | layers.append(block(self.in_planes, planes, stride)) 77 | self.in_planes = planes * block.expansion 
78 | return nn.Sequential(*layers) 79 | 80 | def forward(self, x): 81 | out = F.relu(self.bn1(self.conv1(x))) 82 | out = self.layer1(out) 83 | out = self.layer2(out) 84 | out = self.layer3(out) 85 | out = self.layer4(out) 86 | out = F.avg_pool2d(out, 4) 87 | out = out.view(out.size(0), -1) 88 | out = self.linear(out) 89 | return out 90 | 91 | 92 | def ResNet18(): 93 | return ResNet(BasicBlock, [2, 2, 2, 2]) 94 | 95 | 96 | def ResNet34(): 97 | return ResNet(BasicBlock, [3, 4, 6, 3]) 98 | 99 | 100 | def ResNet50(): 101 | return ResNet(Bottleneck, [3, 4, 6, 3]) 102 | 103 | 104 | def ResNet101(): 105 | return ResNet(Bottleneck, [3, 4, 23, 3]) 106 | 107 | 108 | def ResNet152(): 109 | return ResNet(Bottleneck, [3, 8, 36, 3]) 110 | 111 | 112 | def test(): 113 | net = ResNet18() 114 | y = net(torch.randn(1, 3, 32, 32)) 115 | print(y.size()) 116 | -------------------------------------------------------------------------------- /autoattack/examples/tf_model.weight.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/fra31/auto-attack/a39220048b3c9f2cca9a4d3a54604793c68eca7e/autoattack/examples/tf_model.weight.h5 -------------------------------------------------------------------------------- /autoattack/fab_base.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2019-present, Francesco Croce 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
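The `_make_layer` helper in resnet.py above builds each stage so that only its first block downsamples. A minimal sketch of that stride schedule; `stride_schedule` is a hypothetical helper mirroring the `strides = [stride] + [1] * (num_blocks - 1)` expression in `_make_layer`:

```python
# Hypothetical helper mirroring the stride expression in ResNet._make_layer
# above: the first block of a stage applies the stage stride (downsampling),
# the remaining blocks keep the resolution.
def stride_schedule(stride, num_blocks):
    return [stride] + [1] * (num_blocks - 1)

# ResNet18 stages use num_blocks = [2, 2, 2, 2] with stage strides 1, 2, 2, 2.
print(stride_schedule(2, 2))  # → [2, 1]
```

This is why `self.in_planes` is updated inside the loop: the second and later blocks of a stage consume the expanded channel count produced by the first.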
6 | # 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import time 14 | 15 | import torch 16 | 17 | from autoattack.fab_projections import projection_linf, projection_l2,\ 18 | projection_l1 19 | 20 | DEFAULT_EPS_DICT_BY_NORM = {'Linf': .3, 'L2': 1., 'L1': 5.0} 21 | 22 | 23 | class FABAttack(): 24 | """ 25 | Fast Adaptive Boundary Attack (Linf, L2, L1) 26 | https://arxiv.org/abs/1907.02044 27 | 28 | :param norm: Lp-norm to minimize ('Linf', 'L2', 'L1' supported) 29 | :param n_restarts: number of random restarts 30 | :param n_iter: number of iterations 31 | :param eps: epsilon for the random restarts 32 | :param alpha_max: alpha_max 33 | :param eta: overshooting 34 | :param beta: backward step 35 | """ 36 | 37 | def __init__( 38 | self, 39 | norm='Linf', 40 | n_restarts=1, 41 | n_iter=100, 42 | eps=None, 43 | alpha_max=0.1, 44 | eta=1.05, 45 | beta=0.9, 46 | loss_fn=None, 47 | verbose=False, 48 | seed=0, 49 | targeted=False, 50 | device=None, 51 | n_target_classes=9): 52 | """ FAB-attack implementation in pytorch """ 53 | 54 | self.norm = norm 55 | self.n_restarts = n_restarts 56 | self.n_iter = n_iter 57 | self.eps = eps if eps is not None else DEFAULT_EPS_DICT_BY_NORM[norm] 58 | self.alpha_max = alpha_max 59 | self.eta = eta 60 | self.beta = beta 61 | self.targeted = targeted 62 | self.verbose = verbose 63 | self.seed = seed 64 | self.target_class = None 65 | self.device = device 66 | self.n_target_classes = n_target_classes 67 | 68 | def check_shape(self, x): 69 | return x if len(x.shape) > 0 else x.unsqueeze(0) 70 | 71 | def _predict_fn(self, x): 72 | raise NotImplementedError("Virtual function.") 73 | 74 | def _get_predicted_label(self, x): 75 | raise NotImplementedError("Virtual function.") 76 | 77 | def get_diff_logits_grads_batch(self, imgs, la): 78 | raise NotImplementedError("Virtual function.") 79 | 80 | def 
get_diff_logits_grads_batch_targeted(self, imgs, la, la_target): 81 | raise NotImplementedError("Virtual function.") 82 | 83 | def attack_single_run(self, x, y=None, use_rand_start=False, is_targeted=False): 84 | """ 85 | :param x: clean images 86 | :param y: clean labels, if None we use the predicted labels 87 | :param is_targeted: True if we use the targeted version. Targeted class is assigned by `self.target_class` 88 | """ 89 | 90 | if self.device is None: 91 | self.device = x.device 92 | self.orig_dim = list(x.shape[1:]) 93 | self.ndims = len(self.orig_dim) 94 | 95 | x = x.detach().clone().float().to(self.device) 96 | #assert next(self.predict.parameters()).device == x.device 97 | 98 | y_pred = self._get_predicted_label(x) 99 | if y is None: 100 | y = y_pred.detach().clone().long().to(self.device) 101 | else: 102 | y = y.detach().clone().long().to(self.device) 103 | pred = y_pred == y 104 | corr_classified = pred.float().sum() 105 | if self.verbose: 106 | print('Clean accuracy: {:.2%}'.format(pred.float().mean())) 107 | if pred.sum() == 0: 108 | return x 109 | pred = self.check_shape(pred.nonzero().squeeze()) 110 | 111 | if is_targeted: 112 | output = self._predict_fn(x) 113 | la_target = output.sort(dim=-1)[1][:, -self.target_class] 114 | la_target2 = la_target[pred].detach().clone() 115 | 116 | startt = time.time() 117 | # runs the attack only on correctly classified points 118 | im2 = x[pred].detach().clone() 119 | la2 = y[pred].detach().clone() 120 | if len(im2.shape) == self.ndims: 121 | im2 = im2.unsqueeze(0) 122 | bs = im2.shape[0] 123 | u1 = torch.arange(bs) 124 | adv = im2.clone() 125 | adv_c = x.clone() 126 | res2 = 1e10 * torch.ones([bs]).to(self.device) 127 | x1 = im2.clone() 128 | x0 = im2.clone().reshape([bs, -1]) 129 | 130 | if use_rand_start: 131 | if self.norm == 'Linf': 132 | t = 2 * torch.rand(x1.shape).to(self.device) - 1 133 | x1 = im2 + (torch.min(res2, 134 | self.eps * torch.ones(res2.shape) 135 | .to(self.device) 136 | ).reshape([-1,
*[1]*self.ndims]) 137 | ) * t / (t.reshape([t.shape[0], -1]).abs() 138 | .max(dim=1, keepdim=True)[0] 139 | .reshape([-1, *[1]*self.ndims])) * .5 140 | elif self.norm == 'L2': 141 | t = torch.randn(x1.shape).to(self.device) 142 | x1 = im2 + (torch.min(res2, 143 | self.eps * torch.ones(res2.shape) 144 | .to(self.device) 145 | ).reshape([-1, *[1]*self.ndims]) 146 | ) * t / ((t ** 2) 147 | .view(t.shape[0], -1) 148 | .sum(dim=-1) 149 | .sqrt() 150 | .view(t.shape[0], *[1]*self.ndims)) * .5 151 | elif self.norm == 'L1': 152 | t = torch.randn(x1.shape).to(self.device) 153 | x1 = im2 + (torch.min(res2, 154 | self.eps * torch.ones(res2.shape) 155 | .to(self.device) 156 | ).reshape([-1, *[1]*self.ndims]) 157 | ) * t / (t.abs().view(t.shape[0], -1) 158 | .sum(dim=-1) 159 | .view(t.shape[0], *[1]*self.ndims)) / 2 160 | 161 | x1 = x1.clamp(0.0, 1.0) 162 | 163 | counter_iter = 0 164 | while counter_iter < self.n_iter: 165 | with torch.no_grad(): 166 | if is_targeted: 167 | df, dg = self.get_diff_logits_grads_batch_targeted(x1, la2, la_target2) 168 | else: 169 | df, dg = self.get_diff_logits_grads_batch(x1, la2) 170 | if self.norm == 'Linf': 171 | dist1 = df.abs() / (1e-12 + 172 | dg.abs() 173 | .reshape(dg.shape[0], dg.shape[1], -1) 174 | .sum(dim=-1)) 175 | elif self.norm == 'L2': 176 | dist1 = df.abs() / (1e-12 + (dg ** 2) 177 | .reshape(dg.shape[0], dg.shape[1], -1) 178 | .sum(dim=-1).sqrt()) 179 | elif self.norm == 'L1': 180 | dist1 = df.abs() / (1e-12 + dg.abs().reshape( 181 | [df.shape[0], df.shape[1], -1]).max(dim=2)[0]) 182 | else: 183 | raise ValueError('norm not supported') 184 | ind = dist1.min(dim=1)[1] 185 | dg2 = dg[u1, ind] 186 | b = (- df[u1, ind] + (dg2 * x1).reshape(x1.shape[0], -1) 187 | .sum(dim=-1)) 188 | w = dg2.reshape([bs, -1]) 189 | 190 | if self.norm == 'Linf': 191 | d3 = projection_linf( 192 | torch.cat((x1.reshape([bs, -1]), x0), 0), 193 | torch.cat((w, w), 0), 194 | torch.cat((b, b), 0)) 195 | elif self.norm == 'L2': 196 | d3 = projection_l2( 197 | 
torch.cat((x1.reshape([bs, -1]), x0), 0), 198 | torch.cat((w, w), 0), 199 | torch.cat((b, b), 0)) 200 | elif self.norm == 'L1': 201 | d3 = projection_l1( 202 | torch.cat((x1.reshape([bs, -1]), x0), 0), 203 | torch.cat((w, w), 0), 204 | torch.cat((b, b), 0)) 205 | d1 = torch.reshape(d3[:bs], x1.shape) 206 | d2 = torch.reshape(d3[-bs:], x1.shape) 207 | if self.norm == 'Linf': 208 | a0 = d3.abs().max(dim=1, keepdim=True)[0]\ 209 | .view(-1, *[1]*self.ndims) 210 | elif self.norm == 'L2': 211 | a0 = (d3 ** 2).sum(dim=1, keepdim=True).sqrt()\ 212 | .view(-1, *[1]*self.ndims) 213 | elif self.norm == 'L1': 214 | a0 = d3.abs().sum(dim=1, keepdim=True)\ 215 | .view(-1, *[1]*self.ndims) 216 | a0 = torch.max(a0, 1e-8 * torch.ones( 217 | a0.shape).to(self.device)) 218 | a1 = a0[:bs] 219 | a2 = a0[-bs:] 220 | alpha = torch.min(torch.max(a1 / (a1 + a2), 221 | torch.zeros(a1.shape) 222 | .to(self.device)), 223 | self.alpha_max * torch.ones(a1.shape) 224 | .to(self.device)) 225 | x1 = ((x1 + self.eta * d1) * (1 - alpha) + 226 | (im2 + d2 * self.eta) * alpha).clamp(0.0, 1.0) 227 | 228 | is_adv = self._get_predicted_label(x1) != la2 229 | 230 | if is_adv.sum() > 0: 231 | ind_adv = is_adv.nonzero().squeeze() 232 | ind_adv = self.check_shape(ind_adv) 233 | if self.norm == 'Linf': 234 | t = (x1[ind_adv] - im2[ind_adv]).reshape( 235 | [ind_adv.shape[0], -1]).abs().max(dim=1)[0] 236 | elif self.norm == 'L2': 237 | t = ((x1[ind_adv] - im2[ind_adv]) ** 2)\ 238 | .reshape(ind_adv.shape[0], -1).sum(dim=-1).sqrt() 239 | elif self.norm == 'L1': 240 | t = (x1[ind_adv] - im2[ind_adv])\ 241 | .abs().reshape(ind_adv.shape[0], -1).sum(dim=-1) 242 | adv[ind_adv] = x1[ind_adv] * (t < res2[ind_adv]).\ 243 | float().reshape([-1, *[1]*self.ndims]) + adv[ind_adv]\ 244 | * (t >= res2[ind_adv]).float().reshape( 245 | [-1, *[1]*self.ndims]) 246 | res2[ind_adv] = t * (t < res2[ind_adv]).float()\ 247 | + res2[ind_adv] * (t >= res2[ind_adv]).float() 248 | x1[ind_adv] = im2[ind_adv] + ( 249 | x1[ind_adv] - 
im2[ind_adv]) * self.beta 250 | 251 | counter_iter += 1 252 | 253 | ind_succ = res2 < 1e10 254 | if self.verbose: 255 | print('success rate: {:.0f}/{:.0f}' 256 | .format(ind_succ.float().sum(), corr_classified) + 257 | ' (on correctly classified points) in {:.1f} s' 258 | .format(time.time() - startt)) 259 | 260 | ind_succ = self.check_shape(ind_succ.nonzero().squeeze()) 261 | adv_c[pred[ind_succ]] = adv[ind_succ].clone() 262 | 263 | return adv_c 264 | 265 | def perturb(self, x, y): 266 | if self.device is None: 267 | self.device = x.device 268 | adv = x.clone() 269 | with torch.no_grad(): 270 | acc = self._predict_fn(x).max(1)[1] == y 271 | 272 | startt = time.time() 273 | 274 | torch.random.manual_seed(self.seed) 275 | torch.cuda.random.manual_seed(self.seed) 276 | 277 | if not self.targeted: 278 | for counter in range(self.n_restarts): 279 | ind_to_fool = acc.nonzero().squeeze() 280 | if len(ind_to_fool.shape) == 0: ind_to_fool = ind_to_fool.unsqueeze(0) 281 | if ind_to_fool.numel() != 0: 282 | x_to_fool, y_to_fool = x[ind_to_fool].clone(), y[ind_to_fool].clone() 283 | adv_curr = self.attack_single_run(x_to_fool, y_to_fool, use_rand_start=(counter > 0), is_targeted=False) 284 | 285 | acc_curr = self._predict_fn(adv_curr).max(1)[1] == y_to_fool 286 | if self.norm == 'Linf': 287 | res = (x_to_fool - adv_curr).abs().reshape(x_to_fool.shape[0], -1).max(1)[0] 288 | elif self.norm == 'L2': 289 | res = ((x_to_fool - adv_curr) ** 2).reshape(x_to_fool.shape[0], -1).sum(dim=-1).sqrt() 290 | elif self.norm == 'L1': 291 | res = (x_to_fool - adv_curr).abs().reshape(x_to_fool.shape[0], -1).sum(-1) 292 | acc_curr = torch.max(acc_curr, res > self.eps) 293 | 294 | ind_curr = (acc_curr == 0).nonzero().squeeze() 295 | acc[ind_to_fool[ind_curr]] = 0 296 | adv[ind_to_fool[ind_curr]] = adv_curr[ind_curr].clone() 297 | 298 | if self.verbose: 299 | print('restart {} - robust accuracy: {:.2%} at eps = {:.5f} - cum. 
time: {:.1f} s'.format( 300 | counter, acc.float().mean(), self.eps, time.time() - startt)) 301 | 302 | else: 303 | for target_class in range(2, self.n_target_classes + 2): 304 | self.target_class = target_class 305 | for counter in range(self.n_restarts): 306 | ind_to_fool = acc.nonzero().squeeze() 307 | if len(ind_to_fool.shape) == 0: ind_to_fool = ind_to_fool.unsqueeze(0) 308 | if ind_to_fool.numel() != 0: 309 | x_to_fool, y_to_fool = x[ind_to_fool].clone(), y[ind_to_fool].clone() 310 | adv_curr = self.attack_single_run(x_to_fool, y_to_fool, use_rand_start=(counter > 0), is_targeted=True) 311 | 312 | acc_curr = self._predict_fn(adv_curr).max(1)[1] == y_to_fool 313 | if self.norm == 'Linf': 314 | res = (x_to_fool - adv_curr).abs().reshape(x_to_fool.shape[0], -1).max(1)[0] 315 | elif self.norm == 'L2': 316 | res = ((x_to_fool - adv_curr) ** 2).reshape(x_to_fool.shape[0], -1).sum(dim=-1).sqrt() 317 | elif self.norm == 'L1': 318 | res = (x_to_fool - adv_curr).abs().reshape(x_to_fool.shape[0], -1).sum(-1) 319 | acc_curr = torch.max(acc_curr, res > self.eps) 320 | 321 | ind_curr = (acc_curr == 0).nonzero().squeeze() 322 | acc[ind_to_fool[ind_curr]] = 0 323 | adv[ind_to_fool[ind_curr]] = adv_curr[ind_curr].clone() 324 | 325 | if self.verbose: 326 | print('restart {} - target_class {} - robust accuracy: {:.2%} at eps = {:.5f} - cum. 
time: {:.1f} s'.format( 327 | counter, self.target_class, acc.float().mean(), self.eps, time.time() - startt)) 328 | 329 | return adv 330 | -------------------------------------------------------------------------------- /autoattack/fab_projections.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import torch 4 | from torch.nn import functional as F 5 | 6 | 7 | def projection_linf(points_to_project, w_hyperplane, b_hyperplane): 8 | device = points_to_project.device 9 | t, w, b = points_to_project, w_hyperplane.clone(), b_hyperplane.clone() 10 | 11 | sign = 2 * ((w * t).sum(1) - b >= 0) - 1 12 | w.mul_(sign.unsqueeze(1)) 13 | b.mul_(sign) 14 | 15 | a = (w < 0).float() 16 | d = (a - t) * (w != 0).float() 17 | 18 | p = a - t * (2 * a - 1) 19 | indp = torch.argsort(p, dim=1) 20 | 21 | b = b - (w * t).sum(1) 22 | b0 = (w * d).sum(1) 23 | 24 | indp2 = indp.flip((1,)) 25 | ws = w.gather(1, indp2) 26 | bs2 = - ws * d.gather(1, indp2) 27 | 28 | s = torch.cumsum(ws.abs(), dim=1) 29 | sb = torch.cumsum(bs2, dim=1) + b0.unsqueeze(1) 30 | 31 | b2 = sb[:, -1] - s[:, -1] * p.gather(1, indp[:, 0:1]).squeeze(1) 32 | c_l = b - b2 > 0 33 | c2 = (b - b0 > 0) & (~c_l) 34 | lb = torch.zeros(c2.sum(), device=device) 35 | ub = torch.full_like(lb, w.shape[1] - 1) 36 | nitermax = math.ceil(math.log2(w.shape[1])) 37 | 38 | indp_, sb_, s_, p_, b_ = indp[c2], sb[c2], s[c2], p[c2], b[c2] 39 | for counter in range(nitermax): 40 | counter4 = torch.floor((lb + ub) / 2) 41 | 42 | counter2 = counter4.long().unsqueeze(1) 43 | indcurr = indp_.gather(1, indp_.size(1) - 1 - counter2) 44 | b2 = (sb_.gather(1, counter2) - s_.gather(1, counter2) * p_.gather(1, indcurr)).squeeze(1) 45 | c = b_ - b2 > 0 46 | 47 | lb = torch.where(c, counter4, lb) 48 | ub = torch.where(c, ub, counter4) 49 | 50 | lb = lb.long() 51 | 52 | if c_l.any(): 53 | lmbd_opt = torch.clamp_min((b[c_l] - sb[c_l, -1]) / (-s[c_l, -1]), min=0).unsqueeze(-1) 54 | d[c_l] = (2 * a[c_l] 
- 1) * lmbd_opt 55 | 56 | lmbd_opt = torch.clamp_min((b[c2] - sb[c2, lb]) / (-s[c2, lb]), min=0).unsqueeze(-1) 57 | d[c2] = torch.min(lmbd_opt, d[c2]) * a[c2] + torch.max(-lmbd_opt, d[c2]) * (1 - a[c2]) 58 | 59 | return d * (w != 0).float() 60 | 61 | 62 | def projection_l2(points_to_project, w_hyperplane, b_hyperplane): 63 | device = points_to_project.device 64 | t, w, b = points_to_project, w_hyperplane.clone(), b_hyperplane 65 | 66 | c = (w * t).sum(1) - b 67 | ind2 = 2 * (c >= 0) - 1 68 | w.mul_(ind2.unsqueeze(1)) 69 | c.mul_(ind2) 70 | 71 | r = torch.max(t / w, (t - 1) / w).clamp(min=-1e12, max=1e12) 72 | r.masked_fill_(w.abs() < 1e-8, 1e12) 73 | r[r == -1e12] *= -1 74 | rs, indr = torch.sort(r, dim=1) 75 | rs2 = F.pad(rs[:, 1:], (0, 1)) 76 | rs.masked_fill_(rs == 1e12, 0) 77 | rs2.masked_fill_(rs2 == 1e12, 0) 78 | 79 | w3s = (w ** 2).gather(1, indr) 80 | w5 = w3s.sum(dim=1, keepdim=True) 81 | ws = w5 - torch.cumsum(w3s, dim=1) 82 | d = -(r * w) 83 | d.mul_((w.abs() > 1e-8).float()) 84 | s = torch.cat((-w5 * rs[:, 0:1], torch.cumsum((-rs2 + rs) * ws, dim=1) - w5 * rs[:, 0:1]), 1) 85 | 86 | c4 = s[:, 0] + c < 0 87 | c3 = (d * w).sum(dim=1) + c > 0 88 | c2 = ~(c4 | c3) 89 | 90 | lb = torch.zeros(c2.sum(), device=device) 91 | ub = torch.full_like(lb, w.shape[1] - 1) 92 | nitermax = math.ceil(math.log2(w.shape[1])) 93 | 94 | s_, c_ = s[c2], c[c2] 95 | for counter in range(nitermax): 96 | counter4 = torch.floor((lb + ub) / 2) 97 | counter2 = counter4.long().unsqueeze(1) 98 | c3 = s_.gather(1, counter2).squeeze(1) + c_ > 0 99 | lb = torch.where(c3, counter4, lb) 100 | ub = torch.where(c3, ub, counter4) 101 | 102 | lb = lb.long() 103 | 104 | if c4.any(): 105 | alpha = c[c4] / w5[c4].squeeze(-1) 106 | d[c4] = -alpha.unsqueeze(-1) * w[c4] 107 | 108 | if c2.any(): 109 | alpha = (s[c2, lb] + c[c2]) / ws[c2, lb] + rs[c2, lb] 110 | alpha[ws[c2, lb] == 0] = 0 111 | c5 = (alpha.unsqueeze(-1) > r[c2]).float() 112 | d[c2] = d[c2] * c5 - alpha.unsqueeze(-1) * w[c2] * (1 - c5) 
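`projection_linf` above returns a shift `d` such that `t + d` is the Linf-minimal point on the sign-normalized hyperplane `{x : <w, x> = b}` inside the `[0, 1]` box. When no box constraint binds, the optimum has a simple closed form: every coordinate with `w_i != 0` moves by the same signed step. A minimal sketch of that feasible case (the helper name is hypothetical, not part of the module):

```python
import torch

def linf_projection_no_box(t, w, b):
    # Illustration only: the Linf-minimal shift of t onto the hyperplane
    # {x : <w, x> = b}, ignoring the [0, 1] box, moves every coordinate
    # with w_i != 0 by the same signed step lambda = (b - <w, t>) / ||w||_1.
    lam = (b - (w * t).sum()) / w.abs().sum()
    return lam * torch.sign(w)

t = torch.full((3,), 0.5)   # point to project
w = torch.ones(3)           # hyperplane normal
b = torch.tensor(2.1)       # <w, t> = 1.5, so t violates the constraint
d = linf_projection_no_box(t, w, b)
# t + d = [0.7, 0.7, 0.7]: on the hyperplane, inside the box, and matching
# what projection_linf computes for this case.
```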
113 | 114 | return d * (w.abs() > 1e-8).float() 115 | 116 | 117 | def projection_l1(points_to_project, w_hyperplane, b_hyperplane): 118 | device = points_to_project.device 119 | t, w, b = points_to_project, w_hyperplane.clone(), b_hyperplane 120 | 121 | c = (w * t).sum(1) - b 122 | ind2 = 2 * (c >= 0) - 1 123 | w.mul_(ind2.unsqueeze(1)) 124 | c.mul_(ind2) 125 | 126 | r = (1 / w).abs().clamp_max(1e12) 127 | indr = torch.argsort(r, dim=1) 128 | indr_rev = torch.argsort(indr) 129 | 130 | c6 = (w < 0).float() 131 | d = (-t + c6) * (w != 0).float() 132 | ds = torch.min(-w * t, w * (1 - t)).gather(1, indr) 133 | ds2 = torch.cat((c.unsqueeze(-1), ds), 1) 134 | s = torch.cumsum(ds2, dim=1) 135 | 136 | c2 = s[:, -1] < 0 137 | 138 | lb = torch.zeros(c2.sum(), device=device) 139 | ub = torch.full_like(lb, s.shape[1]) 140 | nitermax = math.ceil(math.log2(w.shape[1])) 141 | 142 | s_ = s[c2] 143 | for counter in range(nitermax): 144 | counter4 = torch.floor((lb + ub) / 2) 145 | counter2 = counter4.long().unsqueeze(1) 146 | c3 = s_.gather(1, counter2).squeeze(1) > 0 147 | lb = torch.where(c3, counter4, lb) 148 | ub = torch.where(c3, ub, counter4) 149 | 150 | lb2 = lb.long() 151 | 152 | if c2.any(): 153 | indr = indr[c2].gather(1, lb2.unsqueeze(1)).squeeze(1) 154 | u = torch.arange(0, w.shape[0], device=device).unsqueeze(1) 155 | u2 = torch.arange(0, w.shape[1], device=device, dtype=torch.float).unsqueeze(0) 156 | alpha = -s[c2, lb2] / w[c2, indr] 157 | c5 = u2 < lb.unsqueeze(-1) 158 | u3 = c5[u[:c5.shape[0]], indr_rev[c2]] 159 | d[c2] = d[c2] * u3.float() 160 | d[c2, indr] = alpha 161 | 162 | return d * (w.abs() > 1e-8).float() 163 | -------------------------------------------------------------------------------- /autoattack/fab_pt.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2019-present, Francesco Croce 2 | # All rights reserved. 
3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 6 | # 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import time 14 | 15 | import torch 16 | 17 | from autoattack.other_utils import zero_gradients 18 | from autoattack.fab_base import FABAttack 19 | 20 | class FABAttack_PT(FABAttack): 21 | """ 22 | Fast Adaptive Boundary Attack (Linf, L2, L1) 23 | https://arxiv.org/abs/1907.02044 24 | 25 | :param predict: forward pass function 26 | :param norm: Lp-norm to minimize ('Linf', 'L2', 'L1' supported) 27 | :param n_restarts: number of random restarts 28 | :param n_iter: number of iterations 29 | :param eps: epsilon for the random restarts 30 | :param alpha_max: alpha_max 31 | :param eta: overshooting 32 | :param beta: backward step 33 | """ 34 | 35 | def __init__( 36 | self, 37 | predict, 38 | norm='Linf', 39 | n_restarts=1, 40 | n_iter=100, 41 | eps=None, 42 | alpha_max=0.1, 43 | eta=1.05, 44 | beta=0.9, 45 | loss_fn=None, 46 | verbose=False, 47 | seed=0, 48 | targeted=False, 49 | device=None, 50 | n_target_classes=9): 51 | """ FAB-attack implementation in pytorch """ 52 | 53 | self.predict = predict 54 | super().__init__(norm, 55 | n_restarts, 56 | n_iter, 57 | eps, 58 | alpha_max, 59 | eta, 60 | beta, 61 | loss_fn, 62 | verbose, 63 | seed, 64 | targeted, 65 | device, 66 | n_target_classes) 67 | 68 | def _predict_fn(self, x): 69 | return self.predict(x) 70 | 71 | def _get_predicted_label(self, x): 72 | with torch.no_grad(): 73 | outputs = self._predict_fn(x) 74 | _, y = torch.max(outputs, dim=1) 75 | return y 76 | 77 | def get_diff_logits_grads_batch(self, imgs, la): 78 | im = imgs.clone().requires_grad_() 79 | with torch.enable_grad(): 80 | y = self.predict(im) 81 | 82 | g2 = torch.zeros([y.shape[-1], *imgs.size()]).to(self.device) 83 | grad_mask = 
torch.zeros_like(y) 84 | for counter in range(y.shape[-1]): 85 | zero_gradients(im) 86 | grad_mask[:, counter] = 1.0 87 | y.backward(grad_mask, retain_graph=True) 88 | grad_mask[:, counter] = 0.0 89 | g2[counter] = im.grad.data 90 | 91 | g2 = torch.transpose(g2, 0, 1).detach() 92 | #y2 = self.predict(imgs).detach() 93 | y2 = y.detach() 94 | df = y2 - y2[torch.arange(imgs.shape[0]), la].unsqueeze(1) 95 | dg = g2 - g2[torch.arange(imgs.shape[0]), la].unsqueeze(1) 96 | df[torch.arange(imgs.shape[0]), la] = 1e10 97 | 98 | return df, dg 99 | 100 | def get_diff_logits_grads_batch_targeted(self, imgs, la, la_target): 101 | u = torch.arange(imgs.shape[0]) 102 | im = imgs.clone().requires_grad_() 103 | with torch.enable_grad(): 104 | y = self.predict(im) 105 | diffy = -(y[u, la] - y[u, la_target]) 106 | sumdiffy = diffy.sum() 107 | 108 | zero_gradients(im) 109 | sumdiffy.backward() 110 | graddiffy = im.grad.data 111 | df = diffy.detach().unsqueeze(1) 112 | dg = graddiffy.unsqueeze(1) 113 | 114 | return df, dg 115 | -------------------------------------------------------------------------------- /autoattack/fab_tf.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2019-present, Francesco Croce 2 | # All rights reserved. 3 | # 4 | # This source code is licensed under the license found in the 5 | # LICENSE file in the root directory of this source tree. 
6 | # 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | from __future__ import unicode_literals 12 | 13 | import torch 14 | from autoattack.fab_base import FABAttack 15 | 16 | 17 | class FABAttack_TF(FABAttack): 18 | """ 19 | Fast Adaptive Boundary Attack (Linf, L2, L1) 20 | https://arxiv.org/abs/1907.02044 21 | 22 | :param model: TF_model 23 | :param norm: Lp-norm to minimize ('Linf', 'L2', 'L1' supported) 24 | :param n_restarts: number of random restarts 25 | :param n_iter: number of iterations 26 | :param eps: epsilon for the random restarts 27 | :param alpha_max: alpha_max 28 | :param eta: overshooting 29 | :param beta: backward step 30 | """ 31 | 32 | def __init__( 33 | self, 34 | model, 35 | norm='Linf', 36 | n_restarts=1, 37 | n_iter=100, 38 | eps=None, 39 | alpha_max=0.1, 40 | eta=1.05, 41 | beta=0.9, 42 | loss_fn=None, 43 | verbose=False, 44 | seed=0, 45 | targeted=False, 46 | device=None, 47 | n_target_classes=9): 48 | """ FAB-attack implementation in TF2 """ 49 | 50 | self.model = model 51 | super().__init__(norm, 52 | n_restarts, 53 | n_iter, 54 | eps, 55 | alpha_max, 56 | eta, 57 | beta, 58 | loss_fn, 59 | verbose, 60 | seed, 61 | targeted, 62 | device, 63 | n_target_classes) 64 | 65 | def _predict_fn(self, x): 66 | return self.model.predict(x) 67 | 68 | def _get_predicted_label(self, x): 69 | with torch.no_grad(): 70 | outputs = self._predict_fn(x) 71 | _, y = torch.max(outputs, dim=1) 72 | return y 73 | 74 | def get_diff_logits_grads_batch(self, imgs, la): 75 | y2, g2 = self.model.grad_logits(imgs) 76 | df = y2 - y2[torch.arange(imgs.shape[0]), la].unsqueeze(1) 77 | dg = g2 - g2[torch.arange(imgs.shape[0]), la].unsqueeze(1) 78 | df[torch.arange(imgs.shape[0]), la] = 1e10 79 | 80 | return df, dg 81 | 82 | def get_diff_logits_grads_batch_targeted(self, imgs, la, la_target): 83 | df, dg = self.model.get_grad_diff_logits_target(imgs, la, la_target) 84 | df.unsqueeze_(1) 85 | 
dg.unsqueeze_(1) 86 | 87 | return df, dg 88 | -------------------------------------------------------------------------------- /autoattack/other_utils.py: -------------------------------------------------------------------------------- 1 | import os 2 | import collections.abc as container_abcs 3 | 4 | import torch 5 | 6 | class Logger(): 7 | def __init__(self, log_path): 8 | self.log_path = log_path 9 | 10 | def log(self, str_to_log): 11 | print(str_to_log) 12 | if not self.log_path is None: 13 | with open(self.log_path, 'a') as f: 14 | f.write(str_to_log + '\n') 15 | f.flush() 16 | 17 | def check_imgs(adv, x, norm): 18 | delta = (adv - x).view(adv.shape[0], -1) 19 | if norm == 'Linf': 20 | res = delta.abs().max(dim=1)[0] 21 | elif norm == 'L2': 22 | res = (delta ** 2).sum(dim=1).sqrt() 23 | elif norm == 'L1': 24 | res = delta.abs().sum(dim=1) 25 | 26 | str_det = 'max {} pert: {:.5f}, nan in imgs: {}, max in imgs: {:.5f}, min in imgs: {:.5f}'.format( 27 | norm, res.max(), (adv != adv).sum(), adv.max(), adv.min()) 28 | print(str_det) 29 | 30 | return str_det 31 | 32 | def L1_norm(x, keepdim=False): 33 | z = x.abs().view(x.shape[0], -1).sum(-1) 34 | if keepdim: 35 | z = z.view(-1, *[1]*(len(x.shape) - 1)) 36 | return z 37 | 38 | def L2_norm(x, keepdim=False): 39 | z = (x ** 2).view(x.shape[0], -1).sum(-1).sqrt() 40 | if keepdim: 41 | z = z.view(-1, *[1]*(len(x.shape) - 1)) 42 | return z 43 | 44 | def L0_norm(x): 45 | return (x != 0.).view(x.shape[0], -1).sum(-1) 46 | 47 | def makedir(path): 48 | if not os.path.exists(path): 49 | os.makedirs(path) 50 | 51 | def zero_gradients(x): 52 | if isinstance(x, torch.Tensor): 53 | if x.grad is not None: 54 | x.grad.detach_() 55 | x.grad.zero_() 56 | elif isinstance(x, container_abcs.Iterable): 57 | for elem in x: 58 | zero_gradients(elem) 59 | -------------------------------------------------------------------------------- /autoattack/square.py: -------------------------------------------------------------------------------- 1 
| # Copyright (c) 2020-present, Francesco Croce
2 | # All rights reserved.
3 | #
4 | # This source code is licensed under the license found in the
5 | # LICENSE file in the root directory of this source tree.
6 | #
7 | 
8 | from __future__ import absolute_import
9 | from __future__ import division
10 | from __future__ import print_function
11 | from __future__ import unicode_literals
12 | 
13 | import torch
14 | import time
15 | import math
16 | import torch.nn.functional as F
17 | 
18 | from autoattack.autopgd_base import L1_projection
19 | 
20 | class SquareAttack():
21 |     """
22 |     Square Attack
23 |     https://arxiv.org/abs/1912.00049
24 | 
25 |     :param predict: forward pass function
26 |     :param norm: Lp-norm of the attack ('Linf', 'L2', 'L1' supported)
27 |     :param n_restarts: number of random restarts
28 |     :param n_queries: max number of queries (each restart)
29 |     :param eps: bound on the norm of perturbations
30 |     :param seed: random seed for the starting point
31 |     :param p_init: parameter to control size of squares
32 |     :param loss: loss function optimized ('margin', 'ce' supported)
33 |     :param resc_schedule: adapt schedule of p to n_queries
34 |     """
35 | 
36 |     def __init__(
37 |             self,
38 |             predict,
39 |             norm='Linf',
40 |             n_queries=5000,
41 |             eps=None,
42 |             p_init=.8,
43 |             n_restarts=1,
44 |             seed=0,
45 |             verbose=False,
46 |             targeted=False,
47 |             loss='margin',
48 |             resc_schedule=True,
49 |             device=None):
50 |         """
51 |         Square Attack implementation in PyTorch
52 |         """
53 | 
54 |         self.predict = predict
55 |         self.norm = norm
56 |         self.n_queries = n_queries
57 |         self.eps = eps
58 |         self.p_init = p_init
59 |         self.n_restarts = n_restarts
60 |         self.seed = seed
61 |         self.verbose = verbose
62 |         self.targeted = targeted
63 |         self.loss = loss
64 |         self.rescale_schedule = resc_schedule
65 |         self.device = device
66 |         self.return_all = False
67 | 
68 |     def margin_and_loss(self, x, y):
69 |         """
70 |         :param y: correct labels if untargeted else target labels
71 |         """
72 | 
73 |         logits =
self.predict(x) 74 | xent = F.cross_entropy(logits, y, reduction='none') 75 | u = torch.arange(x.shape[0]) 76 | y_corr = logits[u, y].clone() 77 | logits[u, y] = -float('inf') 78 | y_others = logits.max(dim=-1)[0] 79 | 80 | if not self.targeted: 81 | if self.loss == 'ce': 82 | return y_corr - y_others, -1. * xent 83 | elif self.loss == 'margin': 84 | return y_corr - y_others, y_corr - y_others 85 | else: 86 | return y_others - y_corr, xent 87 | 88 | def init_hyperparam(self, x): 89 | assert self.norm in ['Linf', 'L2', 'L1'] 90 | assert not self.eps is None 91 | assert self.loss in ['ce', 'margin'] 92 | 93 | if self.device is None: 94 | self.device = x.device 95 | self.orig_dim = list(x.shape[1:]) 96 | self.ndims = len(self.orig_dim) 97 | if self.seed is None: 98 | self.seed = time.time() 99 | 100 | def random_target_classes(self, y_pred, n_classes): 101 | y = torch.zeros_like(y_pred) 102 | for counter in range(y_pred.shape[0]): 103 | l = list(range(n_classes)) 104 | l.remove(y_pred[counter]) 105 | t = self.random_int(0, len(l)) 106 | y[counter] = l[t] 107 | 108 | return y.long().to(self.device) 109 | 110 | def check_shape(self, x): 111 | return x if len(x.shape) == (self.ndims + 1) else x.unsqueeze(0) 112 | 113 | def random_choice(self, shape): 114 | t = 2 * torch.rand(shape).to(self.device) - 1 115 | return torch.sign(t) 116 | 117 | def random_int(self, low=0, high=1, shape=[1]): 118 | t = low + (high - low) * torch.rand(shape).to(self.device) 119 | return t.long() 120 | 121 | def normalize(self, x): 122 | if self.norm == 'Linf': 123 | t = x.abs().view(x.shape[0], -1).max(1)[0] 124 | return x / (t.view(-1, *([1] * self.ndims)) + 1e-12) 125 | 126 | elif self.norm == 'L2': 127 | t = (x ** 2).view(x.shape[0], -1).sum(-1).sqrt() 128 | return x / (t.view(-1, *([1] * self.ndims)) + 1e-12) 129 | 130 | elif self.norm == 'L1': 131 | t = x.abs().view(x.shape[0], -1).sum(dim=-1) 132 | return x / (t.view(-1, *([1] * self.ndims)) + 1e-12) 133 | 134 | def lp_norm(self, x): 135 
| if self.norm == 'L2': 136 | t = (x ** 2).view(x.shape[0], -1).sum(-1).sqrt() 137 | return t.view(-1, *([1] * self.ndims)) 138 | 139 | elif self.norm == 'L1': 140 | t = x.abs().view(x.shape[0], -1).sum(dim=-1) 141 | return t.view(-1, *([1] * self.ndims)) 142 | 143 | def eta_rectangles(self, x, y): 144 | delta = torch.zeros([x, y]).to(self.device) 145 | x_c, y_c = x // 2 + 1, y // 2 + 1 146 | 147 | counter2 = [x_c - 1, y_c - 1] 148 | if self.norm == 'L2': 149 | for counter in range(0, max(x_c, y_c)): 150 | delta[max(counter2[0], 0):min(counter2[0] + (2*counter + 1), x), 151 | max(0, counter2[1]):min(counter2[1] + (2*counter + 1), y) 152 | ] += 1.0/(torch.Tensor([counter + 1]).view(1, 1).to( 153 | self.device) ** 2) 154 | counter2[0] -= 1 155 | counter2[1] -= 1 156 | 157 | delta /= (delta ** 2).sum(dim=(0, 1), keepdim=True).sqrt() 158 | 159 | elif self.norm == 'L1': 160 | for counter in range(0, max(x_c, y_c)): 161 | delta[max(counter2[0], 0):min(counter2[0] + (2*counter + 1), x), 162 | max(0, counter2[1]):min(counter2[1] + (2*counter + 1), y) 163 | ] += 1.0/(torch.Tensor([counter + 1]).view(1, 1).to( 164 | self.device) ** 4) 165 | counter2[0] -= 1 166 | counter2[1] -= 1 167 | 168 | delta /= delta.abs().sum(dim=(), keepdim=True) 169 | 170 | return delta 171 | 172 | def eta(self, s): 173 | if self.norm == 'L2': 174 | delta = torch.zeros([s, s]).to(self.device) 175 | delta[:s // 2] = self.eta_rectangles(s // 2, s) 176 | delta[s // 2:] = -1. * self.eta_rectangles(s - s // 2, s) 177 | delta /= (delta ** 2).sum(dim=(0, 1), keepdim=True).sqrt() 178 | 179 | elif self.norm == 'L1': 180 | delta = torch.zeros([s, s]).to(self.device) 181 | delta[:s // 2] = self.eta_rectangles(s // 2, s) 182 | delta[s // 2:] = -1. 
* self.eta_rectangles(s - s // 2, s) 183 | #delta = self.eta_rectangles(s, s) 184 | delta /= delta.abs().sum(dim=(), keepdim=True) 185 | #delta *= (torch.rand([1]) - .5).sign().to(self.device) 186 | 187 | if torch.rand([1]) > 0.5: 188 | delta = delta.permute([1, 0]) 189 | 190 | return delta 191 | 192 | def p_selection(self, it): 193 | """ schedule to decrease the parameter p """ 194 | 195 | if self.rescale_schedule: 196 | it = int(it / self.n_queries * 10000) 197 | 198 | if 10 < it <= 50: 199 | p = self.p_init / 2 200 | elif 50 < it <= 200: 201 | p = self.p_init / 4 202 | elif 200 < it <= 500: 203 | p = self.p_init / 8 204 | elif 500 < it <= 1000: 205 | p = self.p_init / 16 206 | elif 1000 < it <= 2000: 207 | p = self.p_init / 32 208 | elif 2000 < it <= 4000: 209 | p = self.p_init / 64 210 | elif 4000 < it <= 6000: 211 | p = self.p_init / 128 212 | elif 6000 < it <= 8000: 213 | p = self.p_init / 256 214 | elif 8000 < it: 215 | p = self.p_init / 512 216 | else: 217 | p = self.p_init 218 | 219 | return p 220 | 221 | def attack_single_run(self, x, y): 222 | with torch.no_grad(): 223 | adv = x.clone() 224 | c, h, w = x.shape[1:] 225 | n_features = c * h * w 226 | n_ex_total = x.shape[0] 227 | 228 | if self.verbose and h != w: 229 | print('square attack may not work properly for non-square image.') 230 | print('for details please refer to https://github.com/fra31/auto-attack/issues/95') 231 | 232 | 233 | if self.norm == 'Linf': 234 | x_best = torch.clamp(x + self.eps * self.random_choice( 235 | [x.shape[0], c, 1, w]), 0., 1.) 
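`p_selection` above halves `p_init` at fixed checkpoints, after first rescaling the iteration counter so that any query budget follows the same curve as a reference 10000-query run. The piecewise branches collapse to counting how many checkpoints the rescaled counter has passed; a standalone sketch of the same schedule:

```python
def p_schedule(it, p_init=0.8, n_queries=5000, rescale=True):
    # Rescale the iteration counter to the reference 10000-query budget,
    # as p_selection does when resc_schedule=True.
    if rescale:
        it = int(it / n_queries * 10000)
    # p is halved once for each checkpoint the (rescaled) counter has passed.
    checkpoints = [10, 50, 200, 500, 1000, 2000, 4000, 6000, 8000]
    return p_init / 2 ** sum(it > t for t in checkpoints)
```

With the default 5000-query budget, `p_schedule(0)` is `0.8`, `p_schedule(100)` is `0.2` (the rescaled counter 200 has passed the 10 and 50 checkpoints), and `p_schedule(5000)` is `0.8 / 512`.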
236 | margin_min, loss_min = self.margin_and_loss(x_best, y) 237 | n_queries = torch.ones(x.shape[0]).to(self.device) 238 | s_init = int(math.sqrt(self.p_init * n_features / c)) 239 | 240 | if (margin_min < 0.0).all(): 241 | return n_queries, x_best 242 | 243 | for i_iter in range(self.n_queries): 244 | idx_to_fool = (margin_min > 0.0).nonzero().squeeze() 245 | 246 | x_curr = self.check_shape(x[idx_to_fool]) 247 | x_best_curr = self.check_shape(x_best[idx_to_fool]) 248 | y_curr = y[idx_to_fool] 249 | if len(y_curr.shape) == 0: 250 | y_curr = y_curr.unsqueeze(0) 251 | margin_min_curr = margin_min[idx_to_fool] 252 | loss_min_curr = loss_min[idx_to_fool] 253 | 254 | p = self.p_selection(i_iter) 255 | s = max(int(round(math.sqrt(p * n_features / c))), 1) 256 | s = min(s, min(h, w)) 257 | vh = self.random_int(0, h - s) 258 | vw = self.random_int(0, w - s) 259 | new_deltas = torch.zeros([c, h, w]).to(self.device) 260 | new_deltas[:, vh:vh + s, vw:vw + s 261 | ] = 2. * self.eps * self.random_choice([c, 1, 1]) 262 | 263 | x_new = x_best_curr + new_deltas 264 | x_new = torch.min(torch.max(x_new, x_curr - self.eps), 265 | x_curr + self.eps) 266 | x_new = torch.clamp(x_new, 0., 1.) 267 | x_new = self.check_shape(x_new) 268 | 269 | margin, loss = self.margin_and_loss(x_new, y_curr) 270 | 271 | # update loss if new loss is better 272 | idx_improved = (loss < loss_min_curr).float() 273 | 274 | loss_min[idx_to_fool] = idx_improved * loss + ( 275 | 1. - idx_improved) * loss_min_curr 276 | 277 | # update margin and x_best if new loss is better 278 | # or misclassification 279 | idx_miscl = (margin <= 0.).float() 280 | idx_improved = torch.max(idx_improved, idx_miscl) 281 | 282 | margin_min[idx_to_fool] = idx_improved * margin + ( 283 | 1. - idx_improved) * margin_min_curr 284 | idx_improved = idx_improved.reshape([-1, 285 | *[1]*len(x.shape[:-1])]) 286 | x_best[idx_to_fool] = idx_improved * x_new + ( 287 | 1. - idx_improved) * x_best_curr 288 | n_queries[idx_to_fool] += 1. 
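The acceptance step in the loop above is branch-free: a per-example 0/1 mask (`idx_improved`, forced to 1 on misclassification) is reshaped to broadcast over the image dimensions and blends the candidate with the incumbent best point. The pattern in isolation:

```python
import torch

# Two examples: the first candidate improves the loss, the second does not.
loss_new = torch.tensor([0.2, 0.9])
loss_old = torch.tensor([0.5, 0.5])
x_new = torch.ones(2, 3, 4, 4)   # candidate points
x_old = torch.zeros(2, 3, 4, 4)  # incumbent best points

idx_improved = (loss_new < loss_old).float()   # [1., 0.]
# Reshape to [n, 1, 1, 1] so the mask broadcasts over (c, h, w).
mask = idx_improved.reshape(-1, *[1] * (x_new.dim() - 1))
x_best = mask * x_new + (1. - mask) * x_old
# example 0 is replaced by the candidate; example 1 keeps the incumbent
```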
289 | 290 | ind_succ = (margin_min <= 0.).nonzero().squeeze() 291 | if self.verbose and ind_succ.numel() != 0: 292 | print('{}'.format(i_iter + 1), 293 | '- success rate={}/{} ({:.2%})'.format( 294 | ind_succ.numel(), n_ex_total, 295 | float(ind_succ.numel()) / n_ex_total), 296 | '- avg # queries={:.1f}'.format( 297 | n_queries[ind_succ].mean().item()), 298 | '- med # queries={:.1f}'.format( 299 | n_queries[ind_succ].median().item()), 300 | '- loss={:.3f}'.format(loss_min.mean())) 301 | 302 | if ind_succ.numel() == n_ex_total: 303 | break 304 | 305 | elif self.norm == 'L2': 306 | delta_init = torch.zeros_like(x) 307 | s = h // 5 308 | sp_init = (h - s * 5) // 2 309 | vh = sp_init + 0 310 | for _ in range(h // s): 311 | vw = sp_init + 0 312 | for _ in range(w // s): 313 | delta_init[:, :, vh:vh + s, vw:vw + s] += self.eta( 314 | s).view(1, 1, s, s) * self.random_choice( 315 | [x.shape[0], c, 1, 1]) 316 | vw += s 317 | vh += s 318 | 319 | x_best = torch.clamp(x + self.normalize(delta_init 320 | ) * self.eps, 0., 1.) 
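Here `self.normalize` divides by the per-example L2 norm, so `x + normalize(delta_init) * eps` starts each restart exactly on the eps-sphere (before the clamp to `[0, 1]`). A sketch of the normalize-then-scale step, assuming 4-D image batches:

```python
import torch

def scale_to_l2_eps(delta, eps):
    # Per-example L2 norm over all non-batch dims, guarded against zero
    # (the module uses a 1e-12 offset for the same purpose).
    norms = delta.flatten(1).norm(dim=1).clamp_min(1e-12)
    return delta / norms.view(-1, 1, 1, 1) * eps

delta = torch.randn(4, 3, 8, 8)
scaled = scale_to_l2_eps(delta, eps=0.5)
# every example now has L2 norm 0.5 (up to float rounding)
```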
321 | margin_min, loss_min = self.margin_and_loss(x_best, y) 322 | n_queries = torch.ones(x.shape[0]).to(self.device) 323 | s_init = int(math.sqrt(self.p_init * n_features / c)) 324 | 325 | if (margin_min < 0.0).all(): 326 | return n_queries, x_best 327 | 328 | for i_iter in range(self.n_queries): 329 | idx_to_fool = (margin_min > 0.0).nonzero().squeeze() 330 | 331 | x_curr = self.check_shape(x[idx_to_fool]) 332 | x_best_curr = self.check_shape(x_best[idx_to_fool]) 333 | y_curr = y[idx_to_fool] 334 | if len(y_curr.shape) == 0: 335 | y_curr = y_curr.unsqueeze(0) 336 | margin_min_curr = margin_min[idx_to_fool] 337 | loss_min_curr = loss_min[idx_to_fool] 338 | 339 | delta_curr = x_best_curr - x_curr 340 | p = self.p_selection(i_iter) 341 | s = max(int(round(math.sqrt(p * n_features / c))), 3) 342 | if s % 2 == 0: 343 | s += 1 344 | s = min(s, min(h, w)) 345 | 346 | vh = self.random_int(0, h - s) 347 | vw = self.random_int(0, w - s) 348 | new_deltas_mask = torch.zeros_like(x_curr) 349 | new_deltas_mask[:, :, vh:vh + s, vw:vw + s] = 1.0 350 | norms_window_1 = (delta_curr[:, :, vh:vh + s, vw:vw + s 351 | ] ** 2).sum(dim=(-2, -1), keepdim=True).sqrt() 352 | 353 | vh2 = self.random_int(0, h - s) 354 | vw2 = self.random_int(0, w - s) 355 | new_deltas_mask_2 = torch.zeros_like(x_curr) 356 | new_deltas_mask_2[:, :, vh2:vh2 + s, vw2:vw2 + s] = 1. 
357 | 358 | norms_image = self.lp_norm(x_best_curr - x_curr) 359 | mask_image = torch.max(new_deltas_mask, new_deltas_mask_2) 360 | norms_windows = ((delta_curr * mask_image) ** 2).sum(dim=( 361 | -2, -1), keepdim=True).sqrt() 362 | 363 | new_deltas = torch.ones([x_curr.shape[0], c, s, s] 364 | ).to(self.device) 365 | new_deltas *= (self.eta(s).view(1, 1, s, s) * 366 | self.random_choice([x_curr.shape[0], c, 1, 1])) 367 | old_deltas = delta_curr[:, :, vh:vh + s, vw:vw + s] / ( 368 | 1e-12 + norms_window_1) 369 | new_deltas += old_deltas 370 | new_deltas = new_deltas / (1e-12 + (new_deltas ** 2).sum( 371 | dim=(-2, -1), keepdim=True).sqrt()) * (torch.max( 372 | (self.eps * torch.ones_like(new_deltas)) ** 2 - 373 | norms_image ** 2, torch.zeros_like(new_deltas)) / 374 | c + norms_windows ** 2).sqrt() 375 | delta_curr[:, :, vh2:vh2 + s, vw2:vw2 + s] = 0. 376 | delta_curr[:, :, vh:vh + s, vw:vw + s] = new_deltas + 0 377 | 378 | x_new = torch.clamp(x_curr + self.normalize(delta_curr 379 | ) * self.eps, 0. ,1.) 380 | x_new = self.check_shape(x_new) 381 | norms_image = self.lp_norm(x_new - x_curr) 382 | 383 | margin, loss = self.margin_and_loss(x_new, y_curr) 384 | 385 | # update loss if new loss is better 386 | idx_improved = (loss < loss_min_curr).float() 387 | 388 | loss_min[idx_to_fool] = idx_improved * loss + ( 389 | 1. - idx_improved) * loss_min_curr 390 | 391 | # update margin and x_best if new loss is better 392 | # or misclassification 393 | idx_miscl = (margin <= 0.).float() 394 | idx_improved = torch.max(idx_improved, idx_miscl) 395 | 396 | margin_min[idx_to_fool] = idx_improved * margin + ( 397 | 1. - idx_improved) * margin_min_curr 398 | idx_improved = idx_improved.reshape([-1, 399 | *[1]*len(x.shape[:-1])]) 400 | x_best[idx_to_fool] = idx_improved * x_new + ( 401 | 1. - idx_improved) * x_best_curr 402 | n_queries[idx_to_fool] += 1. 
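The `assert (x_new != x_new).sum() == 0` sanity checks used in this loop rely on NaN being the only floating-point value that compares unequal to itself, so the sum counts NaN entries:

```python
import torch

t = torch.tensor([1.0, float('nan'), 3.0])
nan_count = (t != t).sum().item()  # NaN != NaN is True; finite values compare equal
# nan_count is 1 here, equivalent to torch.isnan(t).sum()
```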
403 | 404 | ind_succ = (margin_min <= 0.).nonzero().squeeze() 405 | if self.verbose and ind_succ.numel() != 0: 406 | print('{}'.format(i_iter + 1), 407 | '- success rate={}/{} ({:.2%})'.format( 408 | ind_succ.numel(), n_ex_total, float( 409 | ind_succ.numel()) / n_ex_total), 410 | '- avg # queries={:.1f}'.format( 411 | n_queries[ind_succ].mean().item()), 412 | '- med # queries={:.1f}'.format( 413 | n_queries[ind_succ].median().item()), 414 | '- loss={:.3f}'.format(loss_min.mean())) 415 | 416 | assert (x_new != x_new).sum() == 0 417 | assert (x_best != x_best).sum() == 0 418 | 419 | if ind_succ.numel() == n_ex_total: 420 | break 421 | 422 | elif self.norm == 'L1': 423 | delta_init = torch.zeros_like(x) 424 | s = h // 5 425 | sp_init = (h - s * 5) // 2 426 | vh = sp_init + 0 427 | for _ in range(h // s): 428 | vw = sp_init + 0 429 | for _ in range(w // s): 430 | delta_init[:, :, vh:vh + s, vw:vw + s] += self.eta( 431 | s).view(1, 1, s, s) * self.random_choice( 432 | [x.shape[0], c, 1, 1]) 433 | vw += s 434 | vh += s 435 | 436 | #x_best = torch.clamp(x + self.normalize(delta_init 437 | # ) * self.eps, 0., 1.) 438 | r_best = L1_projection(x, delta_init, self.eps * (1. 
- 1e-6)) 439 | x_best = x + delta_init + r_best 440 | margin_min, loss_min = self.margin_and_loss(x_best, y) 441 | n_queries = torch.ones(x.shape[0]).to(self.device) 442 | s_init = int(math.sqrt(self.p_init * n_features / c)) 443 | 444 | if (margin_min < 0.0).all(): 445 | return n_queries, x_best 446 | 447 | for i_iter in range(self.n_queries): 448 | idx_to_fool = (margin_min > 0.0).nonzero().squeeze() 449 | 450 | x_curr = self.check_shape(x[idx_to_fool]) 451 | x_best_curr = self.check_shape(x_best[idx_to_fool]) 452 | y_curr = y[idx_to_fool] 453 | if len(y_curr.shape) == 0: 454 | y_curr = y_curr.unsqueeze(0) 455 | margin_min_curr = margin_min[idx_to_fool] 456 | loss_min_curr = loss_min[idx_to_fool] 457 | 458 | delta_curr = x_best_curr - x_curr 459 | p = self.p_selection(i_iter) 460 | s = max(int(round(math.sqrt(p * n_features / c))), 3) 461 | if s % 2 == 0: 462 | s += 1 463 | #pass 464 | s = min(s, min(h, w)) 465 | 466 | vh = self.random_int(0, h - s) 467 | vw = self.random_int(0, w - s) 468 | new_deltas_mask = torch.zeros_like(x_curr) 469 | new_deltas_mask[:, :, vh:vh + s, vw:vw + s] = 1.0 470 | norms_window_1 = delta_curr[:, :, vh:vh + s, vw:vw + s 471 | ].abs().sum(dim=(-2, -1), keepdim=True) 472 | 473 | vh2 = self.random_int(0, h - s) 474 | vw2 = self.random_int(0, w - s) 475 | new_deltas_mask_2 = torch.zeros_like(x_curr) 476 | new_deltas_mask_2[:, :, vh2:vh2 + s, vw2:vw2 + s] = 1. 
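`L1_projection` (imported from `autopgd_base`) returns a correction `r` so that `delta + r` satisfies both the L1 budget and the `[0, 1]` box around `x`. Ignoring the box part, projection onto an L1 ball alone has a classical sort-based solution; a standalone sketch (not the actual helper) for a single flat vector:

```python
import torch

def project_l1_ball(v, radius):
    # Sort-based projection of a 1-D tensor onto {x : ||x||_1 <= radius}.
    if v.abs().sum() <= radius:
        return v.clone()  # already inside the ball
    u, _ = v.abs().sort(descending=True)
    cssv = u.cumsum(0) - radius
    k = torch.arange(1, v.numel() + 1, dtype=v.dtype)
    rho = (u * k > cssv).nonzero().max()   # last index where the threshold is active
    theta = cssv[rho] / (rho + 1.0)        # soft-threshold level
    return v.sign() * (v.abs() - theta).clamp_min(0.)

v = torch.tensor([3.0, -1.0, 0.5])
p = project_l1_ball(v, 2.0)
# p == [2., 0., 0.]: the L1 norm is reduced to exactly the radius
```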
477 | 478 | norms_image = self.lp_norm(x_best_curr - x_curr) 479 | mask_image = torch.max(new_deltas_mask, new_deltas_mask_2) 480 | norms_windows = (delta_curr * mask_image).abs().sum(dim=( 481 | -2, -1), keepdim=True) 482 | 483 | new_deltas = torch.ones([x_curr.shape[0], c, s, s] 484 | ).to(self.device) 485 | new_deltas *= (self.eta(s).view(1, 1, s, s) * 486 | self.random_choice([x_curr.shape[0], c, 1, 1])) 487 | old_deltas = delta_curr[:, :, vh:vh + s, vw:vw + s] / ( 488 | 1e-12 + norms_window_1) 489 | new_deltas += old_deltas 490 | new_deltas = new_deltas / (1e-12 + new_deltas.abs().sum( 491 | dim=(-2, -1), keepdim=True)) * (torch.max( 492 | self.eps * torch.ones_like(norms_image) - 493 | norms_image, torch.zeros_like(norms_image)) / 494 | c + norms_windows) * c 495 | delta_curr[:, :, vh2:vh2 + s, vw2:vw2 + s] = 0. 496 | delta_curr[:, :, vh:vh + s, vw:vw + s] = new_deltas + 0 497 | 498 | # 499 | #norms_image_old = self.lp_norm(delta_curr) 500 | r_curr = L1_projection(x_curr, delta_curr, self.eps * (1. - 1e-6)) 501 | x_new = x_curr + delta_curr + r_curr 502 | x_new = self.check_shape(x_new) 503 | norms_image = self.lp_norm(x_new - x_curr) 504 | 505 | margin, loss = self.margin_and_loss(x_new, y_curr) 506 | 507 | # update loss if new loss is better 508 | idx_improved = (loss < loss_min_curr).float() 509 | 510 | loss_min[idx_to_fool] = idx_improved * loss + ( 511 | 1. - idx_improved) * loss_min_curr 512 | 513 | # update margin and x_best if new loss is better 514 | # or misclassification 515 | idx_miscl = (margin <= 0.).float() 516 | idx_improved = torch.max(idx_improved, idx_miscl) 517 | 518 | margin_min[idx_to_fool] = idx_improved * margin + ( 519 | 1. - idx_improved) * margin_min_curr 520 | idx_improved = idx_improved.reshape([-1, 521 | *[1]*len(x.shape[:-1])]) 522 | x_best[idx_to_fool] = idx_improved * x_new + ( 523 | 1. - idx_improved) * x_best_curr 524 | n_queries[idx_to_fool] += 1. 
525 | 526 | ind_succ = (margin_min <= 0.).nonzero().squeeze() 527 | if self.verbose and ind_succ.numel() != 0: 528 | print('{}'.format(i_iter + 1), 529 | '- success rate={}/{} ({:.2%})'.format( 530 | ind_succ.numel(), n_ex_total, float( 531 | ind_succ.numel()) / n_ex_total), 532 | '- avg # queries={:.1f}'.format( 533 | n_queries[ind_succ].mean().item()), 534 | '- med # queries={:.1f}'.format( 535 | n_queries[ind_succ].median().item()), 536 | '- loss={:.3f}'.format(loss_min.mean()), 537 | '- max pert={:.3f}'.format(norms_image.max().item()), 538 | #'- old pert={:.3f}'.format(norms_image_old.max().item()) 539 | ) 540 | 541 | assert (x_new != x_new).sum() == 0 542 | assert (x_best != x_best).sum() == 0 543 | 544 | if ind_succ.numel() == n_ex_total: 545 | break 546 | 547 | return n_queries, x_best 548 | 549 | def perturb(self, x, y=None): 550 | """ 551 | :param x: clean images 552 | :param y: untargeted attack -> clean labels, 553 | if None we use the predicted labels 554 | targeted attack -> target labels, if None random classes, 555 | different from the predicted ones, are sampled 556 | """ 557 | 558 | self.init_hyperparam(x) 559 | 560 | adv = x.clone() 561 | #adv_all = x.clone() 562 | if y is None: 563 | if not self.targeted: 564 | with torch.no_grad(): 565 | output = self.predict(x) 566 | y_pred = output.max(1)[1] 567 | y = y_pred.detach().clone().long().to(self.device) 568 | else: 569 | with torch.no_grad(): 570 | output = self.predict(x) 571 | n_classes = output.shape[-1] 572 | y_pred = output.max(1)[1] 573 | y = self.random_target_classes(y_pred, n_classes) 574 | else: 575 | y = y.detach().clone().long().to(self.device) 576 | 577 | if not self.targeted: 578 | acc = self.predict(x).max(1)[1] == y 579 | else: 580 | acc = self.predict(x).max(1)[1] != y 581 | 582 | startt = time.time() 583 | 584 | torch.random.manual_seed(self.seed) 585 | torch.cuda.random.manual_seed(self.seed) 586 | 587 | for counter in range(self.n_restarts): 588 | ind_to_fool = 
acc.nonzero().squeeze()
589 |             if len(ind_to_fool.shape) == 0:
590 |                 ind_to_fool = ind_to_fool.unsqueeze(0)
591 |             if counter == 0:
592 |                 adv_all = adv.clone()  # initialized here so that `return adv_all` below is defined
593 |             if ind_to_fool.numel() != 0:
594 |                 x_to_fool = x[ind_to_fool].clone()
595 |                 y_to_fool = y[ind_to_fool].clone()
596 | 
597 |                 _, adv_curr = self.attack_single_run(x_to_fool, y_to_fool)
598 | 
599 |                 output_curr = self.predict(adv_curr)
600 |                 if not self.targeted:
601 |                     acc_curr = output_curr.max(1)[1] == y_to_fool
602 |                 else:
603 |                     acc_curr = output_curr.max(1)[1] != y_to_fool
604 |                 ind_curr = (acc_curr == 0).nonzero().squeeze()
605 | 
606 |                 acc[ind_to_fool[ind_curr]] = 0
607 |                 adv[ind_to_fool[ind_curr]] = adv_curr[ind_curr].clone()
608 |                 adv_all[ind_to_fool] = adv_curr.clone()
609 |                 if self.verbose:
610 |                     print('restart {} - robust accuracy: {:.2%}'.format(
611 |                         counter, acc.float().mean()),
612 |                         '- cum. time: {:.1f} s'.format(
613 |                         time.time() - startt))
614 | 
615 |         if not self.return_all:
616 |             return adv
617 |         else:
618 |             print('returning final points')
619 |             return adv_all
620 | 
--------------------------------------------------------------------------------
/autoattack/state.py:
--------------------------------------------------------------------------------
1 | import json
2 | from dataclasses import dataclass, field, asdict
3 | from datetime import datetime
4 | from pathlib import Path
5 | from typing import Optional, Set
6 | import warnings
7 | 
8 | import torch
9 | 
10 | 
11 | @dataclass
12 | class EvaluationState:
13 |     _attacks_to_run: Set[str]
14 |     path: Optional[Path] = None
15 |     _run_attacks: Set[str] = field(default_factory=set)
16 |     _robust_flags: Optional[torch.Tensor] = None
17 |     _last_saved: datetime = datetime(1, 1, 1)
18 |     _SAVE_TIMEOUT: int = 60
19 |     _clean_accuracy: float = float("nan")
20 | 
21 |     def to_disk(self, force: bool = False) -> None:
22 |         seconds_since_last_save = (datetime.now() -
23 |                                    self._last_saved).total_seconds()
24 |         if self.path is None or (seconds_since_last_save < self._SAVE_TIMEOUT
25 |                                  and not force):
26 |
return 27 | self._last_saved = datetime.now() 28 | d = asdict(self) 29 | if self.robust_flags is not None: 30 | d["_robust_flags"] = d["_robust_flags"].cpu().tolist() 31 | d["_run_attacks"] = list(self._run_attacks) 32 | with self.path.open("w") as f: 33 | json.dump(d, f, default=str) 34 | 35 | @classmethod 36 | def from_disk(cls, path: Path) -> "EvaluationState": 37 | with path.open("r") as f: 38 | d = json.load(f) 39 | d["_robust_flags"] = torch.tensor(d["_robust_flags"], dtype=torch.bool) if d["_robust_flags"] is not None else None 40 | d["path"] = Path(d["path"]) 41 | if path != d["path"]: 42 | warnings.warn( 43 | UserWarning( 44 | "The given path is different from the one found in the state file." 45 | )) 46 | d["_last_saved"] = datetime.fromisoformat(d["_last_saved"]) 47 | return cls(**d) 48 | 49 | @property 50 | def robust_flags(self) -> Optional[torch.Tensor]: 51 | return self._robust_flags 52 | 53 | @robust_flags.setter 54 | def robust_flags(self, robust_flags: torch.Tensor) -> None: 55 | self._robust_flags = robust_flags 56 | self.to_disk(force=True) 57 | 58 | @property 59 | def run_attacks(self) -> Set[str]: 60 | return self._run_attacks 61 | 62 | def add_run_attack(self, attack: str) -> None: 63 | self._run_attacks.add(attack) 64 | self.to_disk() 65 | 66 | @property 67 | def attacks_to_run(self) -> Set[str]: 68 | return self._attacks_to_run 69 | 70 | @attacks_to_run.setter 71 | def attacks_to_run(self, _: Set[str]) -> None: 72 | raise ValueError("attacks_to_run cannot be set outside of the constructor") 73 | 74 | @property 75 | def clean_accuracy(self) -> float: 76 | return self._clean_accuracy 77 | 78 | @clean_accuracy.setter 79 | def clean_accuracy(self, accuracy: float) -> None: 80 | self._clean_accuracy = accuracy 81 | self.to_disk(force=True) 82 | 83 | @property 84 | def robust_accuracy(self) -> float: 85 | if self.robust_flags is None: 86 | raise ValueError("robust_flags is not set yet. 
Start the attack first.") 87 | if self.attacks_to_run - self.run_attacks: 88 | warnings.warn("You are checking `robust_accuracy` before all the attacks" 89 | " have been run.") 90 | return self.robust_flags.float().mean().item() -------------------------------------------------------------------------------- /autoattack/utils_tf.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import torch 4 | 5 | class ModelAdapter(): 6 | def __init__(self, logits, x, y, sess, num_classes=10): 7 | self.logits = logits 8 | self.sess = sess 9 | self.x_input = x 10 | self.y_input = y 11 | self.num_classes = num_classes 12 | 13 | # gradients of logits 14 | if num_classes <= 10: 15 | self.grads = [None] * num_classes 16 | for cl in range(num_classes): 17 | self.grads[cl] = tf.gradients(self.logits[:, cl], self.x_input)[0] 18 | 19 | # cross-entropy loss 20 | self.xent = tf.nn.sparse_softmax_cross_entropy_with_logits( 21 | logits=self.logits, labels=self.y_input) 22 | self.grad_xent = tf.gradients(self.xent, self.x_input)[0] 23 | 24 | # dlr loss 25 | self.dlr = dlr_loss(self.logits, self.y_input, num_classes=self.num_classes) 26 | self.grad_dlr = tf.gradients(self.dlr, self.x_input)[0] 27 | 28 | # targeted dlr loss 29 | self.y_target = tf.placeholder(tf.int64, shape=[None]) 30 | self.dlr_target = dlr_loss_targeted(self.logits, self.y_input, self.y_target, num_classes=self.num_classes) 31 | self.grad_target = tf.gradients(self.dlr_target, self.x_input)[0] 32 | 33 | self.la = tf.placeholder(tf.int64, shape=[None]) 34 | self.la_target = tf.placeholder(tf.int64, shape=[None]) 35 | la_mask = tf.one_hot(self.la, self.num_classes) 36 | la_target_mask = tf.one_hot(self.la_target, self.num_classes) 37 | la_logit = tf.reduce_sum(la_mask * self.logits, axis=1) 38 | la_target_logit = tf.reduce_sum(la_target_mask * self.logits, axis=1) 39 | self.diff_logits = la_target_logit - la_logit 40 | 
self.grad_diff_logits = tf.gradients(self.diff_logits, self.x_input)[0] 41 | 42 | def predict(self, x): 43 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 44 | y = self.sess.run(self.logits, {self.x_input: x2}) 45 | 46 | return torch.from_numpy(y).cuda() 47 | 48 | def grad_logits(self, x): 49 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 50 | logits, g2 = self.sess.run([self.logits, self.grads], {self.x_input: x2}) 51 | g2 = np.moveaxis(np.array(g2), 0, 1) 52 | g2 = np.transpose(g2, (0, 1, 4, 2, 3)) 53 | 54 | return torch.from_numpy(logits).cuda(), torch.from_numpy(g2).cuda() 55 | 56 | def get_grad_diff_logits_target(self, x, y=None, y_target=None): 57 | la = y.cpu().numpy() 58 | la_target = y_target.cpu().numpy() 59 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 60 | dl, g2 = self.sess.run([self.diff_logits, self.grad_diff_logits], {self.x_input: x2, self.la: la, self.la_target: la_target}) 61 | g2 = np.transpose(np.array(g2), (0, 3, 1, 2)) 62 | 63 | return torch.from_numpy(dl).cuda(), torch.from_numpy(g2).cuda() 64 | 65 | def get_logits_loss_grad_xent(self, x, y): 66 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 67 | y2 = y.clone().cpu().numpy() 68 | logits_val, loss_indiv_val, grad_val = self.sess.run([self.logits, self.xent, self.grad_xent], {self.x_input: x2, self.y_input: y2}) 69 | grad_val = np.moveaxis(grad_val, 3, 1) 70 | 71 | return torch.from_numpy(logits_val).cuda(), torch.from_numpy(loss_indiv_val).cuda(), torch.from_numpy(grad_val).cuda() 72 | 73 | def get_logits_loss_grad_dlr(self, x, y): 74 | x2 = np.moveaxis(x.cpu().numpy(), 1, 3) 75 | y2 = y.clone().cpu().numpy() 76 | logits_val, loss_indiv_val, grad_val = self.sess.run([self.logits, self.dlr, self.grad_dlr], {self.x_input: x2, self.y_input: y2}) 77 | grad_val = np.moveaxis(grad_val, 3, 1) 78 | 79 | return torch.from_numpy(logits_val).cuda(), torch.from_numpy(loss_indiv_val).cuda(), torch.from_numpy(grad_val).cuda() 80 | 81 | def get_logits_loss_grad_target(self, x, y, y_target): 82 | x2 = 
np.moveaxis(x.cpu().numpy(), 1, 3) 83 | y2 = y.clone().cpu().numpy() 84 | y_targ = y_target.clone().cpu().numpy() 85 | logits_val, loss_indiv_val, grad_val = self.sess.run([self.logits, self.dlr_target, self.grad_target], {self.x_input: x2, self.y_input: y2, self.y_target: y_targ}) 86 | grad_val = np.moveaxis(grad_val, 3, 1) 87 | 88 | return torch.from_numpy(logits_val).cuda(), torch.from_numpy(loss_indiv_val).cuda(), torch.from_numpy(grad_val).cuda() 89 | 90 | def dlr_loss(x, y, num_classes=10): 91 | x_sort = tf.contrib.framework.sort(x, axis=1) 92 | y_onehot = tf.one_hot(y, num_classes) 93 | ### TODO: adapt to the case when the point is already misclassified 94 | loss = -(x_sort[:, -1] - x_sort[:, -2]) / (x_sort[:, -1] - x_sort[:, -3] + 1e-12) 95 | 96 | return loss 97 | 98 | def dlr_loss_targeted(x, y, y_target, num_classes=10): 99 | x_sort = tf.contrib.framework.sort(x, axis=1) 100 | y_onehot = tf.one_hot(y, num_classes) 101 | y_target_onehot = tf.one_hot(y_target, num_classes) 102 | loss = -(tf.reduce_sum(x * y_onehot, axis=1) - tf.reduce_sum(x * y_target_onehot, axis=1)) / (x_sort[:, -1] - .5 * x_sort[:, -3] - .5 * x_sort[:, -4] + 1e-12) 103 | 104 | return loss 105 | -------------------------------------------------------------------------------- /autoattack/utils_tf2.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | import torch 4 | 5 | class ModelAdapter(): 6 | def __init__(self, model, num_classes=10): 7 | """ 8 | Please note that the model should be a tf.keras model without a 'softmax' activation function 9 | """ 10 | self.num_classes = num_classes 11 | self.tf_model = model 12 | self.data_format = self.__check_channel_ordering() 13 | 14 | def __tf_to_pt(self, tf_tensor): 15 | """ Private function 16 | Convert tf tensor to pt format 17 | 18 | Args: 19 | tf_tensor: (tf_tensor) TF tensor 20 | 21 | Returns: 22 | pt_tensor: (pt_tensor) Pytorch tensor 23 | """ 24 | 25 | cpu_tensor 
= tf_tensor.numpy() 26 | pt_tensor = torch.from_numpy(cpu_tensor).cuda() 27 | 28 | return pt_tensor 29 | 30 | def set_data_format(self, data_format): 31 | """ 32 | Set data_format manually 33 | 34 | Args: 35 | data_format: A string, whose value should be either 'channels_last' or 'channels_first' 36 | """ 37 | 38 | if data_format != 'channels_last' and data_format != 'channels_first': 39 | raise ValueError("data_format should be either 'channels_last' or 'channels_first'") 40 | 41 | self.data_format = data_format 42 | 43 | 44 | def __check_channel_ordering(self): 45 | """ Private function 46 | Determine the TF model's channel ordering based on the model's information. 47 | Default ordering is 'channels_last' in TF. 48 | However, 'channels_first' is used in Pytorch. 49 | 50 | Returns: 51 | data_format: A string, whose value should be either 'channels_last' or 'channels_first' 52 | """ 53 | 54 | data_format = None 55 | 56 | # Get the ordering of the dimensions in data from TF model 57 | for L in self.tf_model.layers: 58 | if isinstance(L, tf.keras.layers.Conv2D): 59 | print("[INFO] set data_format = '{:s}'".format(L.data_format)) 60 | data_format = L.data_format 61 | break 62 | 63 | # Guess the ordering of the dimensions in data by input dimensions, which should be a 4-D tensor 64 | if data_format is None: 65 | print("[WARNING] Cannot find Conv2D layer") 66 | input_shape = self.tf_model.input_shape 67 | 68 | # Assume that input is *colorful image* whose dimensions should be [batch_size, img_w, img_h, 3] 69 | if input_shape[3] == 3: 70 | print("[INFO] Because detecting input_shape[3] == 3, set data_format = 'channels_last'") 71 | data_format = 'channels_last' 72 | 73 | # Assume that input is *gray image* whose dimensions should be [batch_size, img_w, img_h, 1] 74 | elif input_shape[3] == 1: 75 | print("[INFO] Because detecting input_shape[3] == 1, set data_format = 'channels_last'") 76 | data_format = 'channels_last' 77 | 78 | # Assume that input is *colorful image* whose 
dimensions should be [batch_size, 3, img_w, img_h] 79 | elif input_shape[1] == 3: 80 | print("[INFO] Because detecting input_shape[1] == 3, set data_format = 'channels_first'") 81 | data_format = 'channels_first' 82 | 83 | # Assume that input is *gray image* whose dimensions should be [batch_size, 1, img_w, img_h] 84 | elif input_shape[1] == 1: 85 | print("[INFO] Because detecting input_shape[1] == 1, set data_format = 'channels_first'") 86 | data_format = 'channels_first' 87 | 88 | else: 89 | print("[ERROR] Unknown case") 90 | 91 | return data_format 92 | 93 | 94 | # Common function which may be called in tf.function # 95 | def __get_logits(self, x_input): 96 | """ Private function 97 | Get model's pre-softmax output in inference mode 98 | 99 | Args: 100 | x_input: (tf_tensor) Input data 101 | 102 | Returns: 103 | logits: (tf_tensor) Logits 104 | """ 105 | 106 | return self.tf_model(x_input, training=False) 107 | 108 | 109 | def __get_xent(self, logits, y_input): 110 | """ Private function 111 | Get cross entropy loss 112 | 113 | Args: 114 | logits: (tf_tensor) Logits. 115 | y_input: (tf_tensor) Label. 
116 | 117 | Returns: 118 | xent: (tf_tensor) Cross entropy 119 | """ 120 | 121 | return tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=y_input) 122 | 123 | 124 | def __get_dlr(self, logit, y_input): 125 | """ Private function 126 | Get DLR loss 127 | 128 | Args: 129 | logit: (tf_tensor) Logits 130 | y_input: (tf_tensor) Input label 131 | 132 | Returns: 133 | loss: (tf_tensor) DLR loss 134 | """ 135 | 136 | # logit 137 | logit_sort = tf.sort(logit, axis=1) 138 | 139 | # onehot_y 140 | y_onehot = tf.one_hot(y_input, self.num_classes, dtype=tf.float32) 141 | logit_y = tf.reduce_sum(y_onehot * logit, axis=1) 142 | 143 | # z_i 144 | logit_pred = tf.reduce_max(logit, axis=1) 145 | cond = (logit_pred == logit_y) 146 | z_i = tf.where(cond, logit_sort[:, -2], logit_sort[:, -1]) 147 | 148 | # loss 149 | z_y = logit_y 150 | z_p1 = logit_sort[:, -1] 151 | z_p3 = logit_sort[:, -3] 152 | 153 | loss = - (z_y - z_i) / (z_p1 - z_p3 + 1e-12) 154 | return loss 155 | 156 | 157 | def __get_dlr_target(self, logits, y_input, y_target): 158 | """ Private function 159 | Get targeted version of DLR loss 160 | 161 | Args: 162 | logits: (tf_tensor) Logits 163 | y_input: (tf_tensor) Input label 164 | y_target: (tf_tensor) Input targeted label 165 | 166 | Returns: 167 | loss: (tf_tensor) Targeted DLR loss 168 | """ 169 | 170 | x = logits 171 | x_sort = tf.sort(x, axis=1) 172 | y_onehot = tf.one_hot(y_input, self.num_classes) 173 | y_target_onehot = tf.one_hot(y_target, self.num_classes) 174 | loss = -(tf.reduce_sum(x * y_onehot, axis=1) - tf.reduce_sum(x * y_target_onehot, axis=1)) / (x_sort[:, -1] - .5 * x_sort[:, -3] - .5 * x_sort[:, -4] + 1e-12) 175 | 176 | return loss 177 | 178 | 179 | # function called by public API directly # 180 | @tf.function 181 | @tf.autograph.experimental.do_not_convert 182 | def __get_jacobian(self, x_input): 183 | """ Private function 184 | Get Jacobian 185 | 186 | Args: 187 | x_input: (tf_tensor) Input data 188 | 189 | Returns: 190 | 
jacobian: (tf_tensor) Jacobian 191 | """ 192 | 193 | with tf.GradientTape(watch_accessed_variables=False) as g: 194 | g.watch(x_input) 195 | logits = self.__get_logits(x_input) 196 | 197 | jacobian = g.batch_jacobian(logits, x_input) 198 | 199 | return logits, jacobian 200 | 201 | 202 | @tf.function 203 | @tf.autograph.experimental.do_not_convert 204 | def __get_grad_xent(self, x_input, y_input): 205 | """ Private function 206 | Get gradient of cross entropy 207 | 208 | Args: 209 | x_input: (tf_tensor) Input data 210 | y_input: (tf_tensor) Input label 211 | 212 | Returns: 213 | logits: (tf_tensor) Logits 214 | xent: (tf_tensor) Cross entropy 215 | grad_xent: (tf_tensor) Gradient of cross entropy 216 | """ 217 | 218 | with tf.GradientTape(watch_accessed_variables=False) as g: 219 | g.watch(x_input) 220 | logits = self.__get_logits(x_input) 221 | xent = self.__get_xent(logits, y_input) 222 | 223 | grad_xent = g.gradient(xent, x_input) 224 | 225 | return logits, xent, grad_xent 226 | 227 | 228 | @tf.function 229 | @tf.autograph.experimental.do_not_convert 230 | def __get_grad_diff_logits_target(self, x, la, la_target): 231 | """ Private function 232 | Get difference of logits and corresponding gradient 233 | 234 | Args: 235 | x: (tf_tensor) Input data 236 | la: (tf_tensor) Input label 237 | la_target: (tf_tensor) Input targeted label 238 | 239 | Returns: 240 | difflogits: (tf_tensor) Difference of logits 241 | grad_diff: (tf_tensor) Gradient of difference of logits 242 | """ 243 | 244 | la_mask = tf.one_hot(la, self.num_classes) 245 | la_target_mask = tf.one_hot(la_target, self.num_classes) 246 | 247 | with tf.GradientTape(watch_accessed_variables=False) as g: 248 | g.watch(x) 249 | logits = self.__get_logits(x) 250 | difflogits = tf.reduce_sum((la_target_mask - la_mask) * logits, axis=1) 251 | 252 | grad_diff = g.gradient(difflogits, x) 253 | 254 | return difflogits, grad_diff 255 | 256 | 257 | @tf.function 258 | @tf.autograph.experimental.do_not_convert 259 
| def __get_grad_dlr(self, x_input, y_input): 260 | """ Private function 261 | Get gradient of DLR loss 262 | 263 | Args: 264 | x_input: (tf_tensor) Input data 265 | y_input: (tf_tensor) Input label 266 | 267 | Returns: 268 | logits: (tf_tensor) Logits 269 | val_dlr: (tf_tensor) DLR loss 270 | grad_dlr: (tf_tensor) Gradient of DLR loss 271 | """ 272 | 273 | with tf.GradientTape(watch_accessed_variables=False) as g: 274 | g.watch(x_input) 275 | logits = self.__get_logits(x_input) 276 | val_dlr = self.__get_dlr(logits, y_input) 277 | 278 | grad_dlr = g.gradient(val_dlr, x_input) 279 | 280 | return logits, val_dlr, grad_dlr 281 | 282 | 283 | @tf.function 284 | @tf.autograph.experimental.do_not_convert 285 | def __get_grad_dlr_target(self, x_input, y_input, y_target): 286 | """ Private function 287 | Get gradient of targeted DLR loss 288 | 289 | Args: 290 | x_input: (tf_tensor) Input data 291 | y_input: (tf_tensor) Input label 292 | y_target: (tf_tensor) Input targeted label 293 | 294 | Returns: 295 | logits: (tf_tensor) Logits 296 | val_dlr: (tf_tensor) Targeted DLR loss 297 | grad_dlr: (tf_tensor) Gradient of targeted DLR loss 298 | """ 299 | 300 | with tf.GradientTape(watch_accessed_variables=False) as g: 301 | g.watch(x_input) 302 | logits = self.__get_logits(x_input) 303 | dlr_target = self.__get_dlr_target(logits, y_input, y_target) 304 | 305 | grad_target = g.gradient(dlr_target, x_input) 306 | 307 | return logits, dlr_target, grad_target 308 | 309 | 310 | # Public API # 311 | def predict(self, x): 312 | """ 313 | Get model's pre-softmax output in inference mode 314 | 315 | Args: 316 | x_input: (pytorch_tensor) Input data 317 | 318 | Returns: 319 | y: (pytorch_tensor) Pre-softmax output 320 | """ 321 | 322 | # Convert pt_tensor to tf format 323 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 324 | if self.data_format == 'channels_last': 325 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 326 | 327 | # Get result 328 | y = self.__get_logits(x2) 329 | 330 | 
# Convert result to pt format 331 | y = self.__tf_to_pt(y) 332 | 333 | return y 334 | 335 | 336 | def grad_logits(self, x): 337 | """ 338 | Get logits and gradient of logits 339 | 340 | Args: 341 | x: (pytorch_tensor) Input data 342 | 343 | Returns: 344 | logits: (pytorch_tensor) Logits 345 | g2: (pytorch_tensor) Jacobian 346 | """ 347 | 348 | # Convert pt_tensor to tf format 349 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 350 | if self.data_format == 'channels_last': 351 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 352 | 353 | # Get result 354 | logits, g2 = self.__get_jacobian(x2) 355 | 356 | # Convert result to pt format 357 | if self.data_format == 'channels_last': 358 | g2 = tf.transpose(g2, perm=[0,1,4,2,3]) 359 | logits = self.__tf_to_pt(logits) 360 | g2 = self.__tf_to_pt(g2) 361 | 362 | return logits, g2 363 | 364 | 365 | def get_logits_loss_grad_xent(self, x, y): 366 | """ 367 | Get gradient of cross entropy 368 | 369 | Args: 370 | x: (pytorch_tensor) Input data 371 | y: (pytorch_tensor) Input label 372 | 373 | Returns: 374 | logits_val: (pytorch_tensor) Logits 375 | loss_indiv_val: (pytorch_tensor) Cross entropy 376 | grad_val: (pytorch_tensor) Gradient of cross entropy 377 | """ 378 | 379 | # Convert pt_tensor to tf format 380 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 381 | y2 = tf.convert_to_tensor(y.cpu().numpy(), dtype=tf.int32) 382 | if self.data_format == 'channels_last': 383 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 384 | 385 | # Get result 386 | logits_val, loss_indiv_val, grad_val = self.__get_grad_xent(x2, y2) 387 | 388 | # Convert result to pt format 389 | if self.data_format == 'channels_last': 390 | grad_val = tf.transpose(grad_val, perm=[0,3,1,2]) 391 | logits_val = self.__tf_to_pt(logits_val) 392 | loss_indiv_val = self.__tf_to_pt(loss_indiv_val) 393 | grad_val = self.__tf_to_pt(grad_val) 394 | 395 | return logits_val, loss_indiv_val, grad_val 396 | 397 | 398 | def set_target_class(self, y, y_target): 399 | 
pass 400 | 401 | 402 | def get_grad_diff_logits_target(self, x, y, y_target): 403 | """ 404 | Get difference of logits and corresponding gradient 405 | 406 | Args: 407 | x: (pytorch_tensor) Input data 408 | y: (pytorch_tensor) Input label 409 | y_target: (pytorch_tensor) Input targeted label 410 | 411 | Returns: 412 | difflogits: (pytorch_tensor) Difference of logits 413 | g2: (pytorch_tensor) Gradient of difference of logits 414 | """ 415 | 416 | # Convert pt_tensor to tf format 417 | la = tf.convert_to_tensor(y.cpu().numpy(), dtype=tf.int32) 418 | la_target = tf.convert_to_tensor(y_target.cpu().numpy(), dtype=tf.int32) 419 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 420 | if self.data_format == 'channels_last': 421 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 422 | 423 | # Get result 424 | difflogits, g2 = self.__get_grad_diff_logits_target(x2, la, la_target) 425 | 426 | # Convert result to pt format 427 | if self.data_format == 'channels_last': 428 | g2 = tf.transpose(g2, perm=[0, 3, 1, 2]) 429 | difflogits = self.__tf_to_pt(difflogits) 430 | g2 = self.__tf_to_pt(g2) 431 | 432 | return difflogits, g2 433 | 434 | 435 | def get_logits_loss_grad_dlr(self, x, y): 436 | """ 437 | Get gradient of DLR loss 438 | 439 | Args: 440 | x: (pytorch_tensor) Input data 441 | y: (pytorch_tensor) Input label 442 | 443 | Returns: 444 | logits_val: (pytorch_tensor) Logits 445 | loss_indiv_val: (pytorch_tensor) DLR loss 446 | grad_val: (pytorch_tensor) Gradient of DLR loss 447 | """ 448 | 449 | # Convert pt_tensor to tf format 450 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 451 | y2 = tf.convert_to_tensor(y.cpu().numpy(), dtype=tf.int32) 452 | if self.data_format == 'channels_last': 453 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 454 | 455 | # Get result 456 | logits_val, loss_indiv_val, grad_val = self.__get_grad_dlr(x2, y2) 457 | 458 | # Convert result to pt format 459 | if self.data_format == 'channels_last': 460 | grad_val = tf.transpose(grad_val, 
perm=[0,3,1,2]) 461 | logits_val = self.__tf_to_pt(logits_val) 462 | loss_indiv_val = self.__tf_to_pt(loss_indiv_val) 463 | grad_val = self.__tf_to_pt(grad_val) 464 | 465 | return logits_val, loss_indiv_val, grad_val 466 | 467 | def get_logits_loss_grad_target(self, x, y, y_target): 468 | """ 469 | Get gradient of targeted DLR loss 470 | 471 | Args: 472 | x: (pytorch_tensor) Input data 473 | y: (pytorch_tensor) Input label 474 | y_target: (pytorch_tensor) Input targeted label 475 | 476 | Returns: 477 | logits_val: (pytorch_tensor) Logits 478 | loss_indiv_val: (pytorch_tensor) Targeted DLR loss 479 | grad_val: (pytorch_tensor) Gradient of targeted DLR loss 480 | """ 481 | 482 | # Convert pt_tensor to tf format 483 | x2 = tf.convert_to_tensor(x.cpu().numpy(), dtype=tf.float32) 484 | y2 = tf.convert_to_tensor(y.cpu().numpy(), dtype=tf.int32) 485 | y_targ = tf.convert_to_tensor(y_target.cpu().numpy(), dtype=tf.int32) 486 | if self.data_format == 'channels_last': 487 | x2 = tf.transpose(x2, perm=[0,2,3,1]) 488 | 489 | # Get result 490 | logits_val, loss_indiv_val, grad_val = self.__get_grad_dlr_target(x2, y2, y_targ) 491 | 492 | # Convert result to pt format 493 | if self.data_format == 'channels_last': 494 | grad_val = tf.transpose(grad_val, perm=[0,3,1,2]) 495 | logits_val = self.__tf_to_pt(logits_val) 496 | loss_indiv_val = self.__tf_to_pt(loss_indiv_val) 497 | grad_val = self.__tf_to_pt(grad_val) 498 | 499 | return logits_val, loss_indiv_val, grad_val 500 | -------------------------------------------------------------------------------- /flags_doc.md: -------------------------------------------------------------------------------- 1 | ## On the usage of AutoAttack 2 | 3 | Here we describe cases where the standard version of AA might not be suitable or sufficient for robustness evaluation. 
While AA is designed to generalize across defenses, there are categories like 4 | randomized, non-differentiable or dynamic defenses to which it cannot be applied in its standard version, since those rely on different principles than commonly used robust models. In such cases, 5 | specific modifications or adaptive attacks [(Tramèr et al., 2020)](https://arxiv.org/abs/2002.08347) might be necessary. 6 | 7 | ## Checks 8 | We introduce a few automatic checks to warn the user in case the classifier presents behaviors typical of non-standard models. Below we describe the types of flags which might be raised and provide 9 | some suggestions about how the robustness evaluation could be improved in the specific cases. Note that some of the checks are in line with the analyses and suggestions of recent works 10 | ([Carlini et al., 2019](https://arxiv.org/abs/1902.06705); [Croce et al., 2020](https://arxiv.org/abs/2010.09670); [Pintor et al., 2021](https://arxiv.org/abs/2106.09947)) which provide guidelines for 11 | evaluating robustness and detecting failures of attacks. 12 | 13 | ### Randomized defenses 14 | **Raised if** the clean accuracy of the classifier on a batch or the corresponding logits vary across multiple runs.\ 15 | **Explanation:** non-deterministic classifiers mislead standard attacks and need to be evaluated with specific techniques, e.g. EoT [(Athalye et al., 2018)](http://proceedings.mlr.press/v80/athalye18a.html). 16 | We suggest using AA with `version='rand'`, which includes APGD combined with EoT. Note that there might still be some random components 17 | in the network which, however, do not change the predictions or the logits beyond the chosen threshold. 18 | 19 | ### Softmax output is given 20 | **Raised if** the model outputs a probability distribution. \ 21 | **Explanation:** AA expects the model to return logits, i.e. the pre-softmax output of the network. 
If this is not the case, although the classification is unaltered, 22 | there might be numerical instabilities which prevent the gradient-based attacks from performing well. 23 | 24 | ### Zero gradient 25 | **Raised if** the gradient at the (random) starting point of APGD is zero for any image when using the DLR loss. \ 26 | **Explanation:** a zero gradient prevents progress in gradient-based iterative attacks. One possible source is the interaction between the cross-entropy loss and the scale of the logits; a remedy consists in 27 | using margin-based losses ([Carlini & Wagner, 2017](https://ieeexplore.ieee.org/abstract/document/7958570); [Croce & Hein, 2020](https://arxiv.org/abs/2003.01690)). Vanishing gradients can also be due to specific 28 | components of the network, like input quantization (see e.g. [here](https://github.com/fra31/auto-attack/issues/44)), which do not allow 29 | backpropagation. In this case one might use BPDA [(Athalye et al., 2018)](http://proceedings.mlr.press/v80/athalye18a.html), which approximates such functions with differentiable counterparts, or black-box attacks, especially those, like Square Attack, that do not rely on 30 | gradient estimation. 31 | 32 | ### Square Attack improves the robustness evaluation 33 | **Raised if** Square Attack reduces the robust accuracy yielded by the white-box attacks. \ 34 | **Explanation:** as mentioned by [Carlini et al. (2019)](https://arxiv.org/abs/1902.06705), black-box attacks outperforming white-box ones is one of the hints that robustness is overestimated. In this case one might run 35 | Square Attack with a higher budget (more queries, random restarts) or design adaptive attacks, since it is likely that the tested defense has some features preventing standard gradient-based methods 36 | from being effective. 
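The zero-gradient failure mode described in the checks above can be reproduced outside of any framework. The sketch below is an editorial illustration, not part of the library: it uses plain NumPy in place of the TensorFlow/PyTorch tensors AA handles, and the helper names (`softmax`, `ce_grad`, `margin_grad`) are hypothetical. It shows the cross-entropy gradient underflowing to exactly zero for a confidently classified point once the logits are scaled up, while a margin-based loss keeps an informative gradient:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # standard numerical stabilization
    e = np.exp(z)
    return e / e.sum()

def ce_grad(z, y):
    """Gradient of cross-entropy w.r.t. the logits: softmax(z) - onehot(y)."""
    g = softmax(z)
    g[y] -= 1.0
    return g

def margin_grad(z, y):
    """Gradient w.r.t. the logits of the margin loss z_t - z_y (t = best wrong class)."""
    g = np.zeros_like(z)
    t = int(np.argmax(np.where(np.arange(z.size) == y, -np.inf, z)))
    g[t], g[y] = 1.0, -1.0
    return g

z = np.array([2.0, 1.0, -1.0], dtype=np.float32)
y = 0  # correctly classified point

print(np.linalg.norm(ce_grad(z, y)))            # informative gradient at moderate scale
print(np.linalg.norm(ce_grad(1e4 * z, y)))      # 0.0: softmax saturates in finite precision
print(np.linalg.norm(margin_grad(1e4 * z, y)))  # non-zero regardless of the logit scale
```

This is the motivation for falling back to margin-based losses such as DLR when the cross-entropy gradient vanishes.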
37 | 38 | ### Optimization at inference time (only PyTorch models) 39 | **Raised if** standard PyTorch functions for computing the gradients are called when running inference with the given classifier. \ 40 | **Explanation:** several defenses have appeared which include an optimization loop in the inference procedure. While AA can give a first estimate of the robustness, it is necessary in this case 41 | to design adaptive attacks, since such models usually modify the input before classifying it, which requires specific techniques for evaluation. Note that this check is non-trivial to make automatic, 42 | and we invite the user to be aware that AA might not be the best option to evaluate dynamic defenses. -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import setuptools 2 | 3 | 4 | with open("README.md", "r", encoding="utf-8") as fh: 5 | long_description = fh.read() 6 | 7 | setuptools.setup( 8 | name="autoattack", 9 | version="0.1", 10 | author="Francesco Croce, Matthias Hein", 11 | author_email="francesco.croce@uni-tuebingen.de", 12 | description="This package provides the implementation of AutoAttack.", 13 | long_description=long_description, 14 | long_description_content_type="text/markdown", 15 | url="https://github.com/fra31/auto-attack", 16 | packages=setuptools.find_packages(), 17 | classifiers=[ 18 | "Programming Language :: Python :: 3", 19 | "License :: OSI Approved :: MIT License", 20 | "Operating System :: OS Independent", 21 | ], 22 | ) 23 | 24 | 25 | --------------------------------------------------------------------------------
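As a companion to the randomized-defense check documented in `flags_doc.md`, here is a minimal sketch of the underlying idea: query the classifier more than once on the same batch and flag it if the logits vary. Everything here is a hypothetical illustration — `is_randomized`, the toy linear model, and the input-noise "defense" are not part of the package, and plain NumPy stands in for the PyTorch models AA actually evaluates:

```python
import numpy as np

def is_randomized(predict, x, n_runs=3, atol=1e-6):
    """Return True if the classifier's logits vary across identical queries."""
    ref = predict(x)
    return any(not np.allclose(predict(x), ref, atol=atol)
               for _ in range(n_runs - 1))

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 10))            # toy linear "classifier" weights

deterministic_model = lambda x: x @ W
# A toy "randomized defense": adds fresh input noise on every forward pass.
noisy_model = lambda x: (x + 0.1 * rng.standard_normal(x.shape)) @ W

x = rng.standard_normal((4, 8))             # a batch of 4 inputs
print(is_randomized(deterministic_model, x))  # False
print(is_randomized(noisy_model, x))          # True
```

A model flagged this way should be evaluated with `version='rand'` (APGD combined with EoT), as suggested in the documentation above.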