├── LICENSE
├── README.md
├── cifar10_data
│   ├── batches.meta
│   ├── data_batch_1
│   ├── data_batch_2
│   ├── data_batch_3
│   ├── data_batch_4
│   ├── data_batch_5
│   ├── readme.html
│   └── test_batch
├── cifar10_input.py
├── cifar10_model.py
├── cleverhans_models.py
├── config.json
├── config_cifar10.json
├── eval.py
├── eval_ch.py
├── eval_fb.py
├── model.py
├── pgd_attack.py
├── requirements.txt
├── scripts
│   ├── __init__.py
│   ├── eval_cifar_lps.py
│   ├── eval_cifar_spatial.py
│   ├── eval_mnist_lps.py
│   ├── eval_mnist_spatial.py
│   └── utils.py
└── train.py
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2018 ftramer
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Adversarial Training and Robustness for Multiple Perturbations
2 | 
3 | Code for the paper:
4 | 
5 | **Adversarial Training and Robustness for Multiple Perturbations**<br>
6 | *Florian Tramèr and Dan Boneh*
7 | Conference on Neural Information Processing Systems (NeurIPS), 2019
8 | https://arxiv.org/abs/1904.13000
9 | 
10 | Our work studies the scalability and effectiveness of adversarial training for achieving robustness against a combination of multiple types of adversarial examples.
11 | We currently implement multiple Lp-bounded attacks (L1, L2, Linf) as well as rotation-translation attacks, for both MNIST and CIFAR10.
12 | 
13 | Before training a model, edit the `config.json` file to specify the training, attack, and evaluation parameters. The given `config.json` file can be used as a basis for MNIST experiments, while the `config_cifar10.json` file has the appropriate hyperparameters for CIFAR10.
14 | 
15 | ## Training
16 | 
17 | To train, simply run:
18 | 
19 | ```bash
20 | python train.py output/dir/
21 | ```
22 | This will read the `config.json` file from the current directory, and save the trained model, the logs, and the original config file into `output/dir/`.
23 | 
24 | ## Evaluation
25 | 
26 | We performed a fairly thorough evaluation of the models we trained, using a wide range of attacks. Unfortunately, there is currently no single library implementing all of these attacks, so we combined several. Some attacks we implemented ourselves (different forms of PGD and rotation-translation); others are taken from [Cleverhans](https://github.com/tensorflow/cleverhans) and from [Foolbox](https://github.com/bethgelab/foolbox).
27 | Our [evaluation scripts](scripts/) can give you an idea of how we evaluate a model against all attacks.
28 | 
29 | ## Config options
30 | 
31 | Many hyperparameters in the `config.json` file are standard and self-explanatory.
32 | Specific to our work are the following parameters you may consider tuning:
33 | 
34 | * `"multi_attack_mode"`: When training against multiple attacks, this flag indicates whether to train on examples from all attacks (the default), or only on the worst example for each input (`"MAX"`). For the wide ResNet model on CIFAR10, the default option causes memory overflows because the resulting batches are too large. The `"HALF_BATCH_HALF_LR"` flag halves the batch size (and the learning rate accordingly) to avoid such overflows.
35 | 
36 | * `"attacks"`: This list specifies the attacks used for either training or evaluation (or both). The parameters are standard, except for our new L1 attack, which comes with a `"perc"` parameter that specifies the sparsity of the gradient updates (see the paper for details) and a step-size multiplier (`"a"`). The value of the `"perc"` parameter can be a range (e.g., `[80, 99]`), in which case the sparsity of each gradient update in an attack is sampled uniformly from that range. Each attack can also take a `"reps"` parameter (default: 1) that specifies the number of times the attack should be repeated.
37 | 
38 | * `"train_attacks"` and `"eval_attacks"`: Specify which of the attacks defined under `"attacks"` should be used for training or evaluation. These are lists of indices into `"attacks"`. For example, `"train_attacks": [0, 1, 2]` means that the first 3 defined attacks are used for training.<br>
39 | Our paper also defines a new type of *affine attack* that interpolates between two attack types. You can specify an affine attack via a tuple of attacks: e.g., `"eval_attacks": [0, [1, 2]]` will evaluate against the first attack, and against an affine attack that interpolates between the second and third attack. The weighting used by the affine attack can be specified by adding a `"weight"` parameter to the attack parameters. A config fragment combining these options is sketched below.
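As an illustration, a hypothetical `"attacks"` section could look as follows (the hyperparameter values are placeholders borrowed from the provided MNIST config, not tuned recommendations):

```json
"attacks": [
    {"type": "linf", "epsilon": 0.3, "k": 40, "a": 0.01, "random_start": true},
    {"type": "l1", "epsilon": 10, "k": 40, "a": 1.0, "perc": [80, 99], "random_start": true, "reps": 2},
    {"type": "l2", "epsilon": 2, "k": 40, "a": 0.1, "random_start": true}
],
"train_attacks": [0, 1, 2],
"eval_attacks": [0, [1, 2]]
```

Here, the L1 attack samples its gradient sparsity uniformly from `[80, 99]` and is repeated twice; training uses all three attacks, while evaluation runs the Linf attack plus an affine attack interpolating between the L1 and L2 attacks. A trained model can then be evaluated with `python eval.py output/dir/` (see `eval.py` below for additional flags such as `--epoch` and `--eval_train`).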
40 | 
41 | ## Acknowledgments
42 | Parts of the codebase are inspired by or directly borrowed from:
43 | * https://github.com/MadryLab/cifar10_challenge
44 | * https://github.com/MadryLab/adversarial_spatial
45 | 
46 | 
47 | ## Citation
48 | 
49 | If our code or our results are useful in your research, please consider citing:
50 | 
51 | ```bibtex
52 | @inproceedings{TB19,
53 |   author={Tram{\`e}r, Florian and Boneh, Dan},
54 |   title={Adversarial Training and Robustness for Multiple Perturbations},
55 |   booktitle={Conference on Neural Information Processing Systems (NeurIPS)},
56 |   year={2019},
57 |   howpublished={arXiv preprint arXiv:1904.13000},
58 |   url={https://arxiv.org/abs/1904.13000}
59 | }
60 | ```
61 | 
62 | 
--------------------------------------------------------------------------------
/cifar10_data/batches.meta:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ftramer/MultiRobustness/f51a75e07f06b010f34ee760d80fea05ba8ba785/cifar10_data/batches.meta
--------------------------------------------------------------------------------
/cifar10_data/data_batch_1:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ftramer/MultiRobustness/f51a75e07f06b010f34ee760d80fea05ba8ba785/cifar10_data/data_batch_1
--------------------------------------------------------------------------------
/cifar10_data/data_batch_2:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ftramer/MultiRobustness/f51a75e07f06b010f34ee760d80fea05ba8ba785/cifar10_data/data_batch_2
--------------------------------------------------------------------------------
/cifar10_data/data_batch_3:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ftramer/MultiRobustness/f51a75e07f06b010f34ee760d80fea05ba8ba785/cifar10_data/data_batch_3
--------------------------------------------------------------------------------
/cifar10_data/data_batch_4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ftramer/MultiRobustness/f51a75e07f06b010f34ee760d80fea05ba8ba785/cifar10_data/data_batch_4
--------------------------------------------------------------------------------
/cifar10_data/data_batch_5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ftramer/MultiRobustness/f51a75e07f06b010f34ee760d80fea05ba8ba785/cifar10_data/data_batch_5
--------------------------------------------------------------------------------
/cifar10_data/readme.html:
--------------------------------------------------------------------------------
1 | 
2 | 
--------------------------------------------------------------------------------
/cifar10_data/test_batch:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ftramer/MultiRobustness/f51a75e07f06b010f34ee760d80fea05ba8ba785/cifar10_data/test_batch -------------------------------------------------------------------------------- /cifar10_input.py: -------------------------------------------------------------------------------- 1 | """ 2 | Utilities for importing the CIFAR10 dataset. 3 | 4 | Each image in the dataset is a numpy array of shape (32, 32, 3), with the values 5 | being unsigned integers (i.e., in the range 0,1,...,255). 6 | """ 7 | 8 | from __future__ import absolute_import 9 | from __future__ import division 10 | from __future__ import print_function 11 | 12 | import os 13 | import pickle 14 | import random 15 | import sys 16 | import tensorflow as tf 17 | version = sys.version_info 18 | 19 | import numpy as np 20 | 21 | class CIFAR10Data(object): 22 | """ 23 | Unpickles the CIFAR10 dataset from a specified folder containing a pickled 24 | version following the format of Krizhevsky which can be found 25 | [here](https://www.cs.toronto.edu/~kriz/cifar.html). 26 | 27 | Inputs to constructor 28 | ===================== 29 | 30 | - path: path to the pickled dataset. The training data must be pickled 31 | into five files named data_batch_i for i = 1, ..., 5, containing 10,000 32 | examples each, the test data 33 | must be pickled into a single file called test_batch containing 10,000 34 | examples, and the 10 class names must be 35 | pickled into a file called batches.meta. The pickled examples should 36 | be stored as a tuple of two objects: an array of 10,000 32x32x3-shaped 37 | arrays, and an array of their 10,000 true labels. 38 | 39 | """ 40 | def __init__(self, path): 41 | train_filenames = ['data_batch_{}'.format(ii + 1) for ii in range(5)] 42 | eval_filename = 'test_batch' 43 | metadata_filename = 'batches.meta' 44 | 45 | train_images = np.zeros((50000, 32, 32, 3), dtype='uint8') 46 | train_labels = np.zeros(50000, dtype='int32') 47 | for ii, fname in enumerate(train_filenames): 48 | cur_images, cur_labels = self._load_datafile( 49 | os.path.join(path, fname)) 50 | train_images[ii * 10000 : (ii+1) * 10000, ...] = cur_images 51 | train_labels[ii * 10000 : (ii+1) * 10000, ...] = cur_labels 52 | eval_images, eval_labels = self._load_datafile( 53 | os.path.join(path, eval_filename)) 54 | 55 | with open(os.path.join(path, metadata_filename), 'rb') as fo: 56 | if version.major == 3: 57 | data_dict = pickle.load(fo, encoding='bytes') 58 | else: 59 | data_dict = pickle.load(fo) 60 | 61 | self.label_names = data_dict[b'label_names'] 62 | for ii in range(len(self.label_names)): 63 | self.label_names[ii] = self.label_names[ii].decode('utf-8') 64 | 65 | self.train_data = Dataset(train_images, train_labels) 66 | self.eval_data = Dataset(eval_images, eval_labels) 67 | 68 | @staticmethod 69 | def _load_datafile(filename): 70 | with open(filename, 'rb') as fo: 71 | if version.major == 3: 72 | data_dict = pickle.load(fo, encoding='bytes') 73 | else: 74 | data_dict = pickle.load(fo) 75 | 76 | assert data_dict[b'data'].dtype == np.uint8 77 | image_data = data_dict[b'data'] 78 | image_data = image_data.reshape((10000, 3, 32, 32)).transpose(0,2,3,1) 79 | return image_data, np.array(data_dict[b'labels']) 80 | 81 | class AugmentedCIFAR10Data(object): 82 | """ 83 | Data augmentation wrapper over a loaded dataset. 
84 | 
85 |     Inputs to constructor
86 |     =====================
87 |     - raw_cifar10data: the loaded CIFAR10 dataset, via the CIFAR10Data class
88 |     - sess: current tensorflow session
89 |     """
90 |     def __init__(self, raw_cifar10data, sess):
91 |         assert isinstance(raw_cifar10data, CIFAR10Data)
92 |         self.image_size = 32
93 | 
94 |         # create augmentation computational graph
95 |         self.x_input_placeholder = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
96 |         padded = tf.map_fn(lambda img: tf.image.resize_image_with_crop_or_pad(
97 |             img, self.image_size + 4, self.image_size + 4),
98 |             self.x_input_placeholder)
99 |         cropped = tf.map_fn(lambda img: tf.random_crop(img, [self.image_size,
100 |                                                              self.image_size,
101 |                                                              3]), padded)
102 |         flipped = tf.map_fn(lambda img: tf.image.random_flip_left_right(img), cropped)
103 |         self.augmented = flipped
104 | 
105 |         self.train_data = AugmentedDataset(raw_cifar10data.train_data, sess,
106 |                                            self.x_input_placeholder,
107 |                                            self.augmented)
108 |         self.eval_data = AugmentedDataset(raw_cifar10data.eval_data, sess,
109 |                                           self.x_input_placeholder,
110 |                                           self.augmented)
111 |         self.label_names = raw_cifar10data.label_names
112 | 
113 | 
114 | class Dataset(object):
115 |     """
116 |     Dataset object implementing a simple batching procedure.
117 |     """
118 |     def __init__(self, xs, ys):
119 |         self.xs = xs
120 |         self.n = xs.shape[0]
121 |         self.ys = ys
122 |         self.batch_start = 0
123 |         self.cur_order = np.random.permutation(self.n)
124 | 
125 |     def get_next_batch(self, batch_size, multiple_passes=False,
126 |                        reshuffle_after_pass=True):
127 |         if self.n < batch_size:
128 |             raise ValueError('Batch size can be at most the dataset size')
129 |         if not multiple_passes:
130 |             actual_batch_size = min(batch_size, self.n - self.batch_start)
131 |             if actual_batch_size <= 0:
132 |                 raise ValueError('Pass through the dataset is complete.')
133 |             batch_end = self.batch_start + actual_batch_size
134 |             batch_xs = self.xs[self.cur_order[self.batch_start : batch_end],...]
135 |             batch_ys = self.ys[self.cur_order[self.batch_start : batch_end],...]
136 |             self.batch_start += actual_batch_size
137 |             return batch_xs, batch_ys
138 |         actual_batch_size = min(batch_size, self.n - self.batch_start)
139 |         if actual_batch_size < batch_size:
140 |             if reshuffle_after_pass:
141 |                 self.cur_order = np.random.permutation(self.n)
142 |             self.batch_start = 0
143 |         batch_end = self.batch_start + batch_size
144 |         batch_xs = self.xs[self.cur_order[self.batch_start : batch_end], ...]
145 |         batch_ys = self.ys[self.cur_order[self.batch_start : batch_end], ...]
146 |         self.batch_start += batch_size  # advance by the full batch just returned (actual_batch_size is stale after a reshuffle and caused overlapping batches)
147 |         return batch_xs, batch_ys
148 | 
149 | 
150 | class AugmentedDataset(object):
151 |     """
152 |     Dataset object with built-in data augmentation. When performing
153 |     adversarial attacks, we cannot include data augmentation as part of the
154 |     model. If we did, the adversary would try to backprop through it.
155 |     """
156 |     def __init__(self, raw_datasubset, sess, x_input_placeholder,
157 |                  augmented):
158 |         self.sess = sess
159 |         self.raw_datasubset = raw_datasubset
160 |         self.x_input_placeholder = x_input_placeholder
161 |         self.augmented = augmented
162 | 
163 |     def get_next_batch(self, batch_size, multiple_passes=False,
164 |                        reshuffle_after_pass=True):
165 |         raw_batch = self.raw_datasubset.get_next_batch(batch_size,
166 |                                                        multiple_passes,
167 |                                                        reshuffle_after_pass)
168 |         images = raw_batch[0].astype(np.float32)
169 | 
170 |         # return both the raw and augmented input
171 |         # for adversarial training with rotation/translations, we start
172 |         # from the raw input to avoid compounding augmentations
173 |         return (raw_batch[0],
174 |                 self.sess.run(
175 |                     self.augmented,
176 |                     feed_dict={self.x_input_placeholder: raw_batch[0]}),
177 |                 raw_batch[1])
178 | 
179 | 
--------------------------------------------------------------------------------
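A minimal usage sketch for the loaders above (an illustration, not part of the repository; it assumes the pickled CIFAR10 batches sit in `cifar10_data/` as in the directory tree at the top):

```python
import tensorflow as tf
import cifar10_input

# load the raw dataset and pull one shuffled training batch
cifar = cifar10_input.CIFAR10Data('cifar10_data')
x_batch, y_batch = cifar.train_data.get_next_batch(128, multiple_passes=True)

with tf.Session() as sess:
    aug = cifar10_input.AugmentedCIFAR10Data(cifar, sess)
    # AugmentedDataset returns (raw images, augmented images, labels);
    # rotation/translation attacks start from the raw images (see above)
    x_raw, x_aug, ys = aug.train_data.get_next_batch(128, multiple_passes=True)
```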
/cifar10_model.py:
--------------------------------------------------------------------------------
1 | # based on https://github.com/tensorflow/models/tree/master/resnet
2 | from __future__ import absolute_import
3 | from __future__ import division
4 | from __future__ import print_function
5 | 
6 | import numpy as np
7 | import tensorflow as tf
8 | 
9 | class Model(object):
10 |   """ResNet model."""
11 | 
12 |   def __init__(self, config):
13 |     """ResNet constructor.
14 |     """
15 |     self._build_model(config)
16 | 
17 |   def add_internal_summaries(self):
18 |     pass
19 | 
20 |   def _stride_arr(self, stride):
21 |     """Map a stride scalar to the stride array for tf.nn.conv2d."""
22 |     return [1, stride, stride, 1]
23 | 
24 |   def _build_model(self, config, pad_mode='CONSTANT', pad_size=32):
25 |     """Build the core model within the graph."""
26 |     with tf.variable_scope('input'):
27 |       filters = config['filters']
28 | 
29 |       self.x_input = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
30 |       self.y_input = tf.placeholder(tf.int64, shape=None)
31 | 
32 |       self.transform = tf.placeholder_with_default(tf.zeros((tf.shape(self.x_input)[0], 3)), shape=[None, 3])
33 |       trans_x, trans_y, rot = tf.unstack(self.transform, axis=1)
34 |       rot *= np.pi / 180  # convert degrees to radians
35 | 
36 |       self.is_training = tf.placeholder_with_default(False, [])
37 | 
38 |       x = self.x_input
39 |       x = tf.pad(x, [[0,0], [16,16], [16,16], [0,0]], pad_mode)  # pad so the rotated/translated image does not clip
40 |       # rotate and translate the image; trans packs the 8-parameter projective transform expected by tf.contrib.image.transform
41 |       ones = tf.ones(shape=tf.shape(trans_x))
42 |       zeros = tf.zeros(shape=tf.shape(trans_x))
43 |       trans = tf.stack([ones, zeros, -trans_x,
44 |                         zeros, ones, -trans_y,
45 |                         zeros, zeros], axis=1)
46 |       x = tf.contrib.image.rotate(x, rot, interpolation='BILINEAR')
47 |       x = tf.contrib.image.transform(x, trans, interpolation='BILINEAR')
48 |       x = tf.image.resize_image_with_crop_or_pad(x, pad_size, pad_size)
49 | 
50 |     # everything below this point is generic (independent of spatial attacks)
51 |     self.x_image = x
52 |     x = tf.map_fn(lambda img: tf.image.per_image_standardization(img), x)
53 | 
54 |     x = self._conv('init_conv', x, 3, 3, 16, self._stride_arr(1))
55 | 
56 |     strides = [1, 2, 2]
57 |     activate_before_residual = [True, False, False]
58 |     res_func = self._residual
59 | 
60 |     with tf.variable_scope('unit_1_0'):
61 |       x = res_func(x, filters[0], filters[1], self._stride_arr(strides[0]),
62 |                    activate_before_residual[0])
63 |     for i in range(1, 5):
64 |       with tf.variable_scope('unit_1_%d' % i):
65 |         x = res_func(x, filters[1], filters[1], self._stride_arr(1), False)
66 | 
67 |     with tf.variable_scope('unit_2_0'):
68 |       x = res_func(x, filters[1], filters[2], self._stride_arr(strides[1]),
69 |                    activate_before_residual[1])
70 |     for i in range(1, 5):
71 |       with tf.variable_scope('unit_2_%d' % i):
72 |         x = res_func(x, filters[2], filters[2], self._stride_arr(1), False)
73 | 
74 |     with tf.variable_scope('unit_3_0'):
75 |       x = res_func(x, filters[2], filters[3], self._stride_arr(strides[2]),
76 |                    activate_before_residual[2])
77 |     for i in range(1, 5):
78 |       with tf.variable_scope('unit_3_%d' % i):
79 |         x = res_func(x, filters[3], filters[3], self._stride_arr(1), False)
80 | 
81 |     with tf.variable_scope('unit_last'):
82 |       x = self._batch_norm('final_bn', x)
83 |       x = self._relu(x, 0.1)
84 |       x = self._global_avg_pool(x)
85 | 
86 |     # uncomment to add an extra fc layer
87 |     #with tf.variable_scope('unit_fc'):
88 |     #  self.pre_softmax = self._fully_connected(x, 1024)
89 |     #  x = self._relu(x, 0.1)
90 | 
91 |     with tf.variable_scope('logit'):
92 |       self.pre_softmax = self._fully_connected(x, 10)
93 | 
94 |     self.predictions = tf.argmax(self.pre_softmax, 1)
95 |     self.correct_prediction = tf.equal(self.predictions, self.y_input)
96 |     self.num_correct = tf.reduce_sum(
97 |         tf.cast(self.correct_prediction, tf.int64))
98 |     self.accuracy = tf.reduce_mean(
99 |         tf.cast(self.correct_prediction, tf.float32))
100 | 
101 |     with tf.variable_scope('costs'):
102 |       self.y_xent = tf.nn.sparse_softmax_cross_entropy_with_logits(
103 |           logits=self.pre_softmax, labels=self.y_input)
104 |       self.xent = tf.reduce_sum(self.y_xent, name='y_xent')
105 |       self.mean_xent = tf.reduce_mean(self.y_xent)
106 |       self.weight_decay_loss = self._decay()
107 | 
108 |   def _batch_norm(self, name, x):
109 |     """Batch normalization."""
110 |     with tf.name_scope(name):
111 |       return tf.contrib.layers.batch_norm(
112 |           inputs=x,
113 |           decay=.9,
114 |           center=True,
115 |           scale=True,
116 |           activation_fn=None,
117 |           updates_collections=None,
118 |           is_training=self.is_training)
119 | 
120 |   def _residual(self, x, in_filter, out_filter, stride,
121 |                 activate_before_residual=False):
122 |     """Residual unit with 2 sub layers."""
123 |     if activate_before_residual:
124 |       with tf.variable_scope('shared_activation'):
125 |         x = self._batch_norm('init_bn', x)
126 |         x = self._relu(x, 0.1)
127 |         orig_x = x
128 |     else:
129 |       with tf.variable_scope('residual_only_activation'):
130 |         orig_x = x
131 |         x = self._batch_norm('init_bn', x)
132 |         x = self._relu(x, 0.1)
133 | 
134 |     with tf.variable_scope('sub1'):
135 |       x = self._conv('conv1', x, 3, in_filter, out_filter, stride)
136 | 
137 |     with tf.variable_scope('sub2'):
138 |       x = self._batch_norm('bn2', x)
139 |       x = self._relu(x, 0.1)
140 |       x = self._conv('conv2', x, 3, out_filter, out_filter, [1, 1, 1, 1])
141 | 
142 |     with tf.variable_scope('sub_add'):
143 |       if in_filter != out_filter:
144 |         orig_x = tf.nn.avg_pool(orig_x, stride, stride, 'VALID')
145 |         orig_x = tf.pad(
146 |             orig_x, [[0, 0], [0, 0], [0, 0],
147 |                      [(out_filter-in_filter)//2, (out_filter-in_filter)//2]])
148 |       x += orig_x
149 | 
150 |     tf.logging.debug('image after unit %s', x.get_shape())
151 |     return x
152 | 
153 |   def _decay(self):
154 |     """L2 weight decay loss."""
155 |     costs = []
156 |     for var in tf.trainable_variables():
157 |       if var.op.name.find('DW') >= 0:
158 |         costs.append(tf.nn.l2_loss(var))
159 |     return tf.add_n(costs)
160 | 
161 |   def _conv(self, name, x, filter_size, in_filters, out_filters, strides):
162 |     """Convolution."""
163 |     with tf.variable_scope(name):
164 |       n = filter_size * filter_size * out_filters
165 |       kernel = tf.get_variable(
166 |           'DW', [filter_size, filter_size, in_filters, out_filters],
167 |           tf.float32, initializer=tf.random_normal_initializer(
168 |               stddev=np.sqrt(2.0/n)))
169 |       return tf.nn.conv2d(x, kernel, strides, padding='SAME')
170 | 
171 |   def _relu(self, x, leakiness=0.0):
172 |     """Relu, with optional leaky support."""
173 |     return tf.where(tf.less(x, 0.0), leakiness * x, x, name='leaky_relu')
174 | 
175 |   def _fully_connected(self, x, out_dim):
176 |     """FullyConnected layer for final output."""
177 |     num_non_batch_dimensions = len(x.shape)
178 |     prod_non_batch_dimensions = 1
179 |     for ii in range(num_non_batch_dimensions - 1):
180 |       prod_non_batch_dimensions *= int(x.shape[ii + 1])
181 |     x = tf.reshape(x, [tf.shape(x)[0], -1])
182 |     w = tf.get_variable(
183 |         'DW', [prod_non_batch_dimensions, out_dim],
184 |         initializer=tf.uniform_unit_scaling_initializer(factor=1.0))
185 |     b = tf.get_variable('biases', [out_dim],
186 |                         initializer=tf.constant_initializer())
187 |     return tf.nn.xw_plus_b(x, w, b)
188 | 
189 |   def _global_avg_pool(self, x):
190 |     assert x.get_shape().ndims == 4
191 |     return tf.reduce_mean(x, [1, 2])
192 | 
193 | 
194 | 
--------------------------------------------------------------------------------
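The `transform` placeholder above expects one `(dx, dy, angle in degrees)` triple per image. A hypothetical inference-time sketch (the session `sess` and a batch `x_batch`/`y_batch` with pixel values in `[0, 255]` are assumed here, not defined by the repo):

```python
import numpy as np
from cifar10_model import Model

model = Model(config)  # config as loaded from config_cifar10.json (assumed)
# shift every image by (3, -2) pixels and rotate it by 15 degrees
trans = np.tile([[3.0, -2.0, 15.0]], (len(x_batch), 1))
acc = sess.run(model.accuracy,
               feed_dict={model.x_input: x_batch,
                          model.y_input: y_batch,
                          model.transform: trans,
                          model.is_training: False})
```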
/cleverhans_models.py:
--------------------------------------------------------------------------------
1 | from __future__ import absolute_import
2 | from __future__ import division
3 | from __future__ import print_function
4 | 
5 | from collections import OrderedDict
6 | from cleverhans.model import Model
7 | from cleverhans.utils import deterministic_dict
8 | from cleverhans.dataset import Factory, MNIST
9 | import numpy as np
10 | import tensorflow as tf
11 | from cleverhans.serial import NoRefModel
12 | 
13 | 
14 | class Layer(object):
15 | 
16 |     def get_output_shape(self):
17 |         return self.output_shape
18 | 
19 | 
20 | class ResNet(NoRefModel):
21 |     """ResNet model."""
22 | 
23 |     def __init__(self, layers, input_shape, scope=None):
24 |         """ResNet constructor.
25 |         :param layers: a list of layers in CleverHans format
26 |             each with set_input_shape() and fprop() methods.
27 |         :param input_shape: 4-tuple describing input shape (e.g. (None, 32, 32, 3))
28 |         :param scope: string name of scope for Variables
29 |         This works in two ways.
30 |         If scope is None, the variables are not put in a scope, and the
31 |         model is compatible with Saver.restore from the public downloads
32 |         for the CIFAR10 Challenge.
33 |         If the scope is a string, then Saver.restore won't work, but the
34 |         model functions as a picklable NoRefModel that finds its variables
35 |         based on the scope.
36 | """ 37 | super(ResNet, self).__init__(scope, 10, {}, scope is not None) 38 | if scope is None: 39 | before = list(tf.trainable_variables()) 40 | before_vars = list(tf.global_variables()) 41 | self.build(layers, input_shape) 42 | after = list(tf.trainable_variables()) 43 | after_vars = list(tf.global_variables()) 44 | self.params = [param for param in after if param not in before] 45 | self.vars = [var for var in after_vars if var not in before_vars] 46 | else: 47 | with tf.variable_scope(self.scope, reuse=tf.AUTO_REUSE): 48 | self.build(layers, input_shape) 49 | 50 | def get_vars(self): 51 | if hasattr(self, "vars"): 52 | return self.vars 53 | return super(ResNet, self).get_vars() 54 | 55 | def build(self, layers, input_shape): 56 | self.layer_names = [] 57 | self.layers = layers 58 | self.input_shape = input_shape 59 | if isinstance(layers[-1], Softmax): 60 | layers[-1].name = 'probs' 61 | layers[-2].name = 'logits' 62 | else: 63 | layers[-1].name = 'logits' 64 | for i, layer in enumerate(self.layers): 65 | if hasattr(layer, 'name'): 66 | name = layer.name 67 | else: 68 | name = layer.__class__.__name__ + str(i) 69 | layer.name = name 70 | self.layer_names.append(name) 71 | 72 | layer.set_input_shape(input_shape) 73 | input_shape = layer.get_output_shape() 74 | 75 | def make_input_placeholder(self): 76 | return tf.placeholder(tf.float32, (None, 32, 32, 3)) 77 | 78 | def make_label_placeholder(self): 79 | return tf.placeholder(tf.float32, (None, 10)) 80 | 81 | def fprop(self, x, set_ref=False): 82 | x = x * 255.0 83 | if self.scope is not None: 84 | with tf.variable_scope(self.scope, reuse=tf.AUTO_REUSE): 85 | return self._fprop(x, set_ref) 86 | return self._fprop(x, set_ref) 87 | 88 | def _fprop(self, x, set_ref=False): 89 | states = [] 90 | for layer in self.layers: 91 | if set_ref: 92 | layer.ref = x 93 | x = layer.fprop(x) 94 | assert x is not None 95 | states.append(x) 96 | states = dict(zip(self.layer_names, states)) 97 | return states 98 | 99 | def add_internal_summaries(self): 100 | pass 101 | 102 | 103 | def _stride_arr(stride): 104 | """Map a stride scalar to the stride array for tf.nn.conv2d.""" 105 | return [1, stride, stride, 1] 106 | 107 | 108 | class Input(Layer): 109 | 110 | def __init__(self): 111 | pass 112 | 113 | def set_input_shape(self, input_shape): 114 | batch_size, rows, cols, input_channels = input_shape 115 | # assert self.mode == 'train' or self.mode == 'eval' 116 | """Build the core model within the graph.""" 117 | input_shape = list(input_shape) 118 | input_shape[0] = 1 119 | dummy_batch = tf.zeros(input_shape) 120 | dummy_output = self.fprop(dummy_batch) 121 | output_shape = [int(e) for e in dummy_output.get_shape()] 122 | output_shape[0] = batch_size 123 | self.output_shape = tuple(output_shape) 124 | 125 | def fprop(self, x): 126 | with tf.variable_scope('input', reuse=tf.AUTO_REUSE): 127 | input_standardized = tf.map_fn( 128 | lambda img: tf.image.per_image_standardization(img), x) 129 | return _conv('init_conv', input_standardized, 130 | 3, 3, 16, _stride_arr(1)) 131 | 132 | class Conv2D(Layer): 133 | 134 | def __init__(self, filters): 135 | self.filters = filters 136 | 137 | assert filters == [16, 16, 32, 64] or filters == [16, 160, 320, 640] 138 | 139 | pass 140 | 141 | def set_input_shape(self, input_shape): 142 | batch_size, rows, cols, input_channels = input_shape 143 | 144 | # Uncomment the following codes to use w28-10 wide residual network. 
145 | # It is more memory efficient than very deep residual network and has 146 | # comparably good performance. 147 | # https://arxiv.org/pdf/1605.07146v1.pdf 148 | input_shape = list(input_shape) 149 | input_shape[0] = 1 150 | dummy_batch = tf.zeros(input_shape) 151 | dummy_output = self.fprop(dummy_batch) 152 | output_shape = [int(e) for e in dummy_output.get_shape()] 153 | output_shape[0] = batch_size 154 | self.output_shape = tuple(output_shape) 155 | 156 | def fprop(self, x): 157 | 158 | # Update hps.num_residual_units to 9 159 | strides = [1, 2, 2] 160 | activate_before_residual = [True, False, False] 161 | filters = self.filters 162 | 163 | res_func = _residual 164 | with tf.variable_scope('unit_1_0', reuse=tf.AUTO_REUSE): 165 | x = res_func(x, filters[0], filters[1], _stride_arr(strides[0]), 166 | activate_before_residual[0]) 167 | for i in range(1, 5): 168 | with tf.variable_scope(('unit_1_%d' % i), reuse=tf.AUTO_REUSE): 169 | x = res_func(x, filters[1], filters[1], 170 | _stride_arr(1), False) 171 | 172 | with tf.variable_scope(('unit_2_0'), reuse=tf.AUTO_REUSE): 173 | x = res_func(x, filters[1], filters[2], _stride_arr(strides[1]), 174 | activate_before_residual[1]) 175 | for i in range(1, 5): 176 | with tf.variable_scope(('unit_2_%d' % i), reuse=tf.AUTO_REUSE): 177 | x = res_func(x, filters[2], filters[2], 178 | _stride_arr(1), False) 179 | 180 | with tf.variable_scope(('unit_3_0'), reuse=tf.AUTO_REUSE): 181 | x = res_func(x, filters[2], filters[3], _stride_arr(strides[2]), 182 | activate_before_residual[2]) 183 | for i in range(1, 5): 184 | with tf.variable_scope(('unit_3_%d' % i), reuse=tf.AUTO_REUSE): 185 | x = res_func(x, filters[3], filters[3], 186 | _stride_arr(1), False) 187 | 188 | with tf.variable_scope(('unit_last'), reuse=tf.AUTO_REUSE): 189 | x = _batch_norm('final_bn', x) 190 | x = _relu(x, 0.1) 191 | x = _global_avg_pool(x) 192 | 193 | return x 194 | 195 | 196 | class Linear(Layer): 197 | 198 | def __init__(self, num_hid): 199 | self.num_hid = num_hid 200 | 201 | def set_input_shape(self, input_shape): 202 | batch_size, dim = input_shape 203 | self.input_shape = [batch_size, dim] 204 | self.dim = dim 205 | self.output_shape = [batch_size, self.num_hid] 206 | self.make_vars() 207 | 208 | def make_vars(self): 209 | with tf.variable_scope('logit', reuse=tf.AUTO_REUSE): 210 | w = tf.get_variable( 211 | 'DW', [self.dim, self.num_hid], 212 | initializer=tf.initializers.variance_scaling( 213 | distribution='uniform')) 214 | b = tf.get_variable('biases', [self.num_hid], 215 | initializer=tf.initializers.constant()) 216 | return w, b 217 | 218 | def fprop(self, x): 219 | w, b = self.make_vars() 220 | return tf.nn.xw_plus_b(x, w, b) 221 | 222 | 223 | def _batch_norm(name, x): 224 | """Batch normalization.""" 225 | with tf.name_scope(name): 226 | return tf.contrib.layers.batch_norm( 227 | inputs=x, 228 | decay=.9, 229 | center=True, 230 | scale=True, 231 | activation_fn=None, 232 | updates_collections=None, 233 | is_training=False) 234 | 235 | 236 | def _residual(x, in_filter, out_filter, stride, 237 | activate_before_residual=False): 238 | """Residual unit with 2 sub layers.""" 239 | if activate_before_residual: 240 | with tf.variable_scope('shared_activation', reuse=tf.AUTO_REUSE): 241 | x = _batch_norm('init_bn', x) 242 | x = _relu(x, 0.1) 243 | orig_x = x 244 | else: 245 | with tf.variable_scope('residual_only_activation', reuse=tf.AUTO_REUSE): 246 | orig_x = x 247 | x = _batch_norm('init_bn', x) 248 | x = _relu(x, 0.1) 249 | 250 | with tf.variable_scope('sub1', 
reuse=tf.AUTO_REUSE): 251 | x = _conv('conv1', x, 3, in_filter, out_filter, stride) 252 | 253 | with tf.variable_scope('sub2', reuse=tf.AUTO_REUSE): 254 | x = _batch_norm('bn2', x) 255 | x = _relu(x, 0.1) 256 | x = _conv('conv2', x, 3, out_filter, out_filter, [1, 1, 1, 1]) 257 | 258 | with tf.variable_scope('sub_add', reuse=tf.AUTO_REUSE): 259 | if in_filter != out_filter: 260 | orig_x = tf.nn.avg_pool(orig_x, stride, stride, 'VALID') 261 | orig_x = tf.pad( 262 | orig_x, [[0, 0], [0, 0], 263 | [0, 0], [(out_filter - in_filter) // 2, 264 | (out_filter - in_filter) // 2]]) 265 | x += orig_x 266 | 267 | tf.logging.debug('image after unit %s', x.get_shape()) 268 | return x 269 | 270 | 271 | def _decay(): 272 | """L2 weight decay loss.""" 273 | costs = [] 274 | for var in tf.trainable_variables(): 275 | if var.op.name.find('DW') > 0: 276 | costs.append(tf.nn.l2_loss(var)) 277 | return tf.add_n(costs) 278 | 279 | 280 | def _conv(name, x, filter_size, in_filters, out_filters, strides): 281 | """Convolution.""" 282 | with tf.variable_scope(name, reuse=tf.AUTO_REUSE): 283 | n = filter_size * filter_size * out_filters 284 | kernel = tf.get_variable( 285 | 'DW', [filter_size, filter_size, in_filters, out_filters], 286 | tf.float32, initializer=tf.random_normal_initializer( 287 | stddev=np.sqrt(2.0 / n))) 288 | return tf.nn.conv2d(x, kernel, strides, padding='SAME') 289 | 290 | 291 | def _relu(x, leakiness=0.0): 292 | """Relu, with optional leaky support.""" 293 | return tf.where(tf.less(x, 0.0), leakiness * x, x, name='leaky_relu') 294 | 295 | 296 | def _global_avg_pool(x): 297 | assert x.get_shape().ndims == 4 298 | return tf.reduce_mean(x, [1, 2]) 299 | 300 | 301 | class Softmax(Layer): 302 | 303 | def __init__(self): 304 | pass 305 | 306 | def set_input_shape(self, shape): 307 | self.input_shape = shape 308 | self.output_shape = shape 309 | 310 | def fprop(self, x): 311 | return tf.nn.softmax(x) 312 | 313 | 314 | class Flatten(Layer): 315 | 316 | def __init__(self): 317 | pass 318 | 319 | def set_input_shape(self, shape): 320 | self.input_shape = shape 321 | output_width = 1 322 | for factor in shape[1:]: 323 | output_width *= factor 324 | self.output_width = output_width 325 | self.output_shape = [None, output_width] 326 | 327 | def fprop(self, x): 328 | return tf.reshape(x, [-1, self.output_width]) 329 | 330 | 331 | def make_wresnet(nb_classes=10, input_shape=(None, 32, 32, 3), scope=None, filters=None): 332 | layers = [Input(), 333 | Conv2D(filters=filters), # the whole ResNet is basically created in this layer 334 | Flatten(), 335 | Linear(nb_classes), 336 | Softmax()] 337 | 338 | model = ResNet(layers, input_shape, scope) 339 | return model 340 | 341 | 342 | class MadryMNIST(Model): 343 | 344 | def __init__(self, nb_classes=10): 345 | # NOTE: for compatibility with Madry Lab downloadable checkpoints, 346 | # we cannot use scopes, give these variables names, etc. 
347 | 
348 |         """
349 |         self.conv1 = tf.layers.Conv2D(32, (5, 5), activation='relu', padding='same', name='conv1')
350 |         self.pool1 = tf.layers.MaxPooling2D((2, 2), (2, 2), padding='same')
351 |         self.conv2 = tf.layers.Conv2D(64, (5, 5), activation='relu', padding='same', name='conv2')
352 |         self.pool2 = tf.layers.MaxPooling2D((2, 2), (2, 2), padding='same')
353 |         self.fc1 = tf.layers.Dense(1024, activation='relu', name='fc1')
354 |         self.fc2 = tf.layers.Dense(10, name='fc2')
355 |         """
356 | 
357 |         keras_model = tf.keras.Sequential()
358 |         keras_model.add(tf.keras.layers.Conv2D(32, (5, 5), activation='relu', padding='same', name='conv1',
359 |                                                input_shape=(28, 28, 1)))
360 |         keras_model.add(tf.keras.layers.MaxPooling2D((2, 2), (2, 2), padding='same'))
361 |         keras_model.add(tf.keras.layers.Conv2D(64, (5, 5), activation='relu', padding='same', name='conv2'))
362 |         keras_model.add(tf.keras.layers.MaxPooling2D((2, 2), (2, 2), padding='same'))
363 |         keras_model.add(tf.keras.layers.Flatten())
364 |         keras_model.add(tf.keras.layers.Dense(1024, activation='relu', name='fc1'))
365 |         keras_model.add(tf.keras.layers.Dense(10, name='fc2'))
366 | 
367 |         self.keras_model = keras_model
368 |         Model.__init__(self, '', nb_classes, {})
369 |         self.dataset_factory = Factory(MNIST, {"center": False})
370 | 
371 |     def fprop(self, x):
372 | 
373 |         output = OrderedDict()
374 |         logits = self.keras_model(x)
375 | 
376 |         output = deterministic_dict(locals())
377 |         del output["self"]
378 |         output[self.O_PROBS] = tf.nn.softmax(logits=logits)
379 | 
380 |         return output
381 | 
--------------------------------------------------------------------------------
/config.json:
--------------------------------------------------------------------------------
1 | {
2 |   "_comment": "===== MODEL CONFIGURATION =====",
3 |   "data": "mnist",
4 |   "model_type": "cnn",
5 | 
6 |   "_comment": "===== TRAINING CONFIGURATION =====",
7 |   "random_seed": 0,
8 |   "max_num_training_steps": 6000,
9 |   "num_output_steps": 600,
10 |   "num_summary_steps": 600,
11 |   "num_checkpoint_steps": 600,
12 |   "training_batch_size": 100,
13 |   "step_size_schedule": [[0, 0.001], [3000, 0.0001]],
14 | 
15 |   "_comment": "===== EVAL CONFIGURATION =====",
16 |   "num_eval_examples": 100,
17 |   "eval_batch_size": 100,
18 |   "eval_on_cpu": false,
19 | 
20 |   "_comment": "=====ADVERSARIAL EXAMPLES CONFIGURATION=====",
21 | 
22 |   "_comment": "One of: '', 'MAX'",
23 |   "multi_attack_mode": "MAX",
24 |   "start_small": true,
25 |   "attacks": [
26 |     {"type": "linf", "epsilon": 0.3, "k": 10, "a": 0.01, "random_start": true},
27 |     {"type": "l2", "epsilon": 2, "k": 10, "a": 0.1, "random_start": true},
28 |     {"type": "l1", "epsilon": 10, "k": 10, "random_start": true, "perc": 99, "a": 1.0}
29 |   ],
30 |   "train_attacks": [0, 1, 2],
31 |   "eval_attacks": [0, 1, 2]
32 | }
33 | 
--------------------------------------------------------------------------------
/config_cifar10.json:
--------------------------------------------------------------------------------
1 | {
2 |   "_comment": "===== MODEL CONFIGURATION =====",
3 |   "data": "cifar10",
4 |   "data_path": "cifar10_data",
5 |   "model_type": "cnn",
6 |   "filters": [16, 160, 320, 640],
7 | 
8 |   "_comment": "===== TRAINING CONFIGURATION =====",
9 |   "random_seed": 0,
10 |   "max_num_training_steps": 80000,
11 |   "num_output_steps": 200,
12 |   "num_summary_steps": 5000,
13 |   "num_checkpoint_steps": 5000,
14 |   "training_batch_size": 128,
15 |   "step_size_schedule": [[0, 0.1], [40000, 0.01], [60000, 0.001]],
16 |   "weight_decay": 0.0002,
17 |   "momentum": 0.9,
18 | 
19 |   "_comment": "===== EVAL CONFIGURATION =====",
20 |   "num_eval_examples": 1000,
21 |   "eval_batch_size": 100,
22 |   "eval_on_cpu": false,
23 | 
24 |   "_comment": "=====ADVERSARIAL EXAMPLES CONFIGURATION=====",
25 |   "_comment": "One of: '', 'ALTERNATE', 'MAX', 'HALF_BATCH_HALF_LR'",
26 |   "multi_attack_mode": "MAX",
27 |   "attacks": [
28 |     {"type": "linf", "epsilon": 8.0, "k": 10, "random_start": true},
29 |     {"type": "l2", "epsilon": 80, "k": 40, "random_start": true},
30 |     {"type": "l1", "epsilon": 2000, "k": 100, "random_start": true, "perc": 99, "a": 2.0},
31 |     {"type": "RT", "spatial_limits": [3, 3, 30], "grid_granularity": [5, 5, 31], "random_tries": 10},
32 |     {"type": "RT", "spatial_limits": [3, 3, 30], "grid_granularity": [5, 5, 31], "random_tries": -1}
33 |   ],
34 |   "train_attacks": [0],
35 |   "eval_attacks": [0]
36 | }
37 | 
--------------------------------------------------------------------------------
/eval.py:
--------------------------------------------------------------------------------
1 | """
2 | Infinite evaluation loop going through the checkpoints in the model directory
3 | as they appear and evaluating them. Accuracy and average loss are printed and
4 | added as tensorboard summaries.
5 | """
6 | from __future__ import absolute_import
7 | from __future__ import division
8 | from __future__ import print_function
9 | 
10 | import matplotlib
11 | 
12 | matplotlib.use('Agg')
13 | import matplotlib.pyplot as plt
14 | import json
15 | import math
16 | import os
17 | 
18 | import numpy as np
19 | import tensorflow as tf
20 | import argparse
21 | 
22 | from pgd_attack import PGDAttack, compute_grad
23 | 
24 | rows = cols = 8
25 | 
26 | 
27 | def show_images(images, cols=1, figpath="figure.png"):
28 |     """Display a list of images in a single figure with matplotlib.
29 | 
30 |     Parameters
31 |     ---------
32 |     images: List of np.arrays compatible with plt.imshow.
33 | 
34 |     cols (Default = 1): Number of columns in figure (number of rows is
35 |     set to np.ceil(n_images/float(cols))).
36 | 
37 |     figpath (Default = "figure.png"): Path under which the resulting
38 |     figure is saved.
39 |     """
40 |     n_images = len(images)
41 |     fig = plt.figure()
42 |     for n, image in enumerate(images):
43 |         fig.add_subplot(int(np.ceil(n_images / float(cols))), cols, n + 1)  # rows x cols grid, as documented above
44 |         if image.ndim == 2:
45 |             plt.gray()
46 |         if np.max(image) > 1.0:
47 |             image = image.astype(np.uint8)
48 | 
49 |         plt.imshow(image)
50 |     plt.savefig(figpath)
51 |     plt.close()
52 | 
53 | 
54 | # A function for evaluating a single checkpoint
55 | def evaluate(model, eval_attacks, sess, config, plot=False, summary_writer=None, eval_train=False, eval_validation=False, verbose=True):
56 |     num_eval_examples = config['num_eval_examples']
57 |     eval_batch_size = config['eval_batch_size']
58 | 
59 |     dataset = config["data"]
60 |     assert dataset in ["mnist", "cifar10"]
61 | 
62 |     if dataset == "mnist":
63 |         from tensorflow.examples.tutorials.mnist import input_data
64 |         mnist = input_data.read_data_sets('MNIST_data', one_hot=False)
65 |         if "model_type" in config and config["model_type"] == "linear":
66 |             x_train = mnist.train.images
67 |             y_train = mnist.train.labels
68 |             x_test = mnist.test.images
69 |             y_test = mnist.test.labels
70 | 
71 |             pos_train = (y_train == 5) | (y_train == 7)
72 |             x_train = x_train[pos_train]
73 |             y_train = y_train[pos_train]
74 |             y_train = (y_train == 5).astype(np.int64)
75 |             pos_test = (y_test == 5) | (y_test == 7)
76 |             x_test = x_test[pos_test]
77 |             y_test = y_test[pos_test]
78 |             y_test = (y_test == 5).astype(np.int64)
79 | 
80 |             from tensorflow.contrib.learn.python.learn.datasets.mnist import DataSet
81 |             from tensorflow.contrib.learn.python.learn.datasets import base
82 | 
83 |             options = dict(dtype=tf.uint8, reshape=False, seed=None)
84 |             train = DataSet(x_train, y_train, **options)
85 |             test = DataSet(x_test, y_test, **options)
86 | 
87 |             mnist = base.Datasets(train=train, validation=None, test=test)
88 |     else:
89 |         import cifar10_input
90 |         data_path = config["data_path"]
91 |         cifar = cifar10_input.CIFAR10Data(data_path)
92 | 
93 |     np.random.seed(0)
94 |     tf.random.set_random_seed(0)
95 |     global_step = tf.contrib.framework.get_or_create_global_step()
96 | 
97 |     # Iterate over the samples batch-by-batch
98 |     num_batches = int(math.ceil(num_eval_examples / eval_batch_size))
99 |     total_xent_nat = 0.
100 |     total_xent_advs = np.zeros(len(eval_attacks), dtype=np.float32)
101 |     total_corr_nat = 0.
102 | total_corr_advs = [[] for _ in range(len(eval_attacks))] 103 | 104 | l1_norms = [[] for _ in range(len(eval_attacks))] 105 | l2_norms = [[] for _ in range(len(eval_attacks))] 106 | linf_norms = [[] for _ in range(len(eval_attacks))] 107 | 108 | for ibatch in range(num_batches): 109 | bstart = ibatch * eval_batch_size 110 | bend = min(bstart + eval_batch_size, num_eval_examples) 111 | 112 | if eval_train: 113 | if dataset == "mnist": 114 | x_batch = mnist.train.images[bstart:bend, :].reshape(-1, 28, 28, 1) 115 | y_batch = mnist.train.labels[bstart:bend] 116 | else: 117 | x_batch = cifar.train_data.xs[bstart:bend, :].astype(np.float32) 118 | y_batch = cifar.train_data.ys[bstart:bend] 119 | elif eval_validation: 120 | assert dataset == "cifar10" 121 | offset = len(cifar.eval_data.ys) - num_eval_examples 122 | x_batch = cifar.eval_data.xs[offset+bstart:offset+bend, :].astype(np.float32) 123 | y_batch = cifar.eval_data.ys[offset+bstart:offset+bend] 124 | 125 | else: 126 | if dataset == "mnist": 127 | x_batch = mnist.test.images[bstart:bend, :].reshape(-1, 28, 28, 1) 128 | y_batch = mnist.test.labels[bstart:bend] 129 | else: 130 | x_batch = cifar.eval_data.xs[bstart:bend, :].astype(np.float32) 131 | y_batch = cifar.eval_data.ys[bstart:bend] 132 | 133 | noop_trans = np.zeros([len(x_batch), 3]) 134 | dict_nat = {model.x_input: x_batch, 135 | model.y_input: y_batch, 136 | model.is_training: False, 137 | model.transform: noop_trans} 138 | 139 | cur_corr_nat, cur_xent_nat = sess.run( 140 | [model.num_correct, model.xent], 141 | feed_dict=dict_nat) 142 | 143 | total_xent_nat += cur_xent_nat 144 | total_corr_nat += cur_corr_nat 145 | 146 | for i, attack in enumerate(eval_attacks): 147 | x_batch_adv, adv_trans = attack.perturb(x_batch, y_batch, sess) 148 | 149 | dict_adv = {model.x_input: x_batch_adv, 150 | model.y_input: y_batch, 151 | model.is_training: False, 152 | model.transform: adv_trans if adv_trans is not None else np.zeros([len(x_batch), 3])} 153 | 154 | cur_corr_adv, cur_xent_adv, cur_corr_pred, cur_adv_images = \ 155 | sess.run([model.num_correct, model.xent, model.correct_prediction, model.x_image], 156 | feed_dict=dict_adv) 157 | 158 | total_xent_advs[i] += cur_xent_adv 159 | total_corr_advs[i].extend(cur_corr_pred) 160 | 161 | l1_norms[i].extend(np.sum(np.abs(x_batch_adv - x_batch).reshape(len(x_batch), -1), axis=-1)) 162 | l2_norms[i].extend(np.linalg.norm((x_batch_adv - x_batch).reshape(len(x_batch), -1), axis=-1)) 163 | linf_norms[i].extend(np.max(np.abs(x_batch_adv - x_batch).reshape(len(x_batch), -1), axis=-1)) 164 | 165 | avg_xent_nat = total_xent_nat / num_eval_examples 166 | acc_nat = total_corr_nat / num_eval_examples 167 | 168 | avg_xent_advs = total_xent_advs / num_eval_examples 169 | acc_advs = np.sum(total_corr_advs, axis=-1) / num_eval_examples 170 | 171 | if len(eval_attacks) > 0: 172 | tot_correct = np.bitwise_and.reduce(np.asarray(total_corr_advs), 0) 173 | assert len(tot_correct) == num_eval_examples 174 | any_acc = np.sum(tot_correct) / num_eval_examples 175 | 176 | if verbose: 177 | print('natural: {:.2f}%'.format(100 * acc_nat)) 178 | for i, attack in enumerate(eval_attacks): 179 | t = attack.name 180 | print('adversarial ({}):'.format(t)) 181 | print('\tacc: {:.2f}% '.format(100 * acc_advs[i])) 182 | print("\tmean(l1)={:.1f}, min(l1)={:.1f}, max(l1)={:.1f}".format( 183 | np.mean(l1_norms[i]), np.min(l1_norms[i]), np.max(l1_norms[i]))) 184 | print("\tmean(l2)={:.1f}, min(l2)={:.1f}, max(l2)={:.1f}".format( 185 | np.mean(l2_norms[i]), np.min(l2_norms[i]), 
np.max(l2_norms[i]))) 186 | print("\tmean(linf)={:.1f}, min(linf)={:.1f}, max(linf)={:.1f}".format( 187 | np.mean(linf_norms[i]), np.min(linf_norms[i]), np.max(linf_norms[i]))) 188 | 189 | print('avg nat loss: {:.2f}'.format(avg_xent_nat)) 190 | for i, attack in enumerate(eval_attacks): 191 | t = attack.name 192 | print('avg adv loss ({}): {:.2f}'.format(t, avg_xent_advs[i])) 193 | 194 | if len(eval_attacks) > 0: 195 | print("any attack: {:.2f}%".format(100 * any_acc)) 196 | 197 | if summary_writer: 198 | 199 | values = [ 200 | tf.Summary.Value(tag='xent nat', simple_value=avg_xent_nat), 201 | tf.Summary.Value(tag='accuracy nat', simple_value=acc_nat) 202 | ] 203 | if len(eval_attacks) > 0: 204 | values.append(tf.Summary.Value(tag='accuracy adv any', simple_value=any_acc)) 205 | 206 | for i, attack in enumerate(eval_attacks): 207 | t = attack.name 208 | adv_values = [ 209 | tf.Summary.Value(tag='xent adv eval ({})'.format(t), simple_value=avg_xent_advs[i]), 210 | tf.Summary.Value(tag='xent adv ({})'.format(t), simple_value=avg_xent_advs[i]), 211 | tf.Summary.Value(tag='accuracy adv eval ({})'.format(t), simple_value=acc_advs[i]), 212 | tf.Summary.Value(tag='accuracy adv ({})'.format(t), simple_value=acc_advs[i]) 213 | ] 214 | values.extend(adv_values) 215 | 216 | summary = tf.Summary(value=values) 217 | summary_writer.add_summary(summary, global_step.eval(sess)) 218 | 219 | return acc_nat, total_corr_advs 220 | 221 | 222 | if __name__ == "__main__": 223 | parser = argparse.ArgumentParser( 224 | description='Eval script options', 225 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 226 | parser.add_argument('model_dir', type=str, 227 | help='path to model directory') 228 | parser.add_argument('--epoch', type=int, default=None, 229 | help='specific epoch to load (default=latest)') 230 | parser.add_argument('--eval_train', help='evaluate on training set', 231 | action="store_true") 232 | parser.add_argument('--eval_cpu', help='evaluate on CPU', 233 | action="store_true") 234 | args = parser.parse_args() 235 | 236 | if args.eval_cpu: 237 | os.environ['CUDA_VISIBLE_DEVICES'] = '' 238 | 239 | model_dir = args.model_dir 240 | 241 | with open(model_dir + "/config.json") as config_file: 242 | config = json.load(config_file) 243 | 244 | eval_attack_configs = [np.asarray(config["attacks"])[i] for i in config["eval_attacks"]] 245 | print(eval_attack_configs) 246 | 247 | dataset = config["data"] 248 | if dataset == "mnist": 249 | from model import Model 250 | model = Model(config) 251 | 252 | x_min, x_max = 0.0, 1.0 253 | else: 254 | from cifar10_model import Model 255 | model = Model(config) 256 | x_min, x_max = 0.0, 255.0 257 | 258 | grad = compute_grad(model) 259 | eval_attacks = [PGDAttack(model, a_config, x_min, x_max, grad) for a_config in eval_attack_configs] 260 | 261 | global_step = tf.contrib.framework.get_or_create_global_step() 262 | 263 | if not os.path.exists(model_dir): 264 | os.makedirs(model_dir) 265 | eval_dir = os.path.join(model_dir, 'eval') 266 | if not os.path.exists(eval_dir): 267 | os.makedirs(eval_dir) 268 | 269 | saver = tf.train.Saver() 270 | 271 | if args.epoch is not None: 272 | ckpts = tf.train.get_checkpoint_state(model_dir).all_model_checkpoint_paths 273 | ckpt = [c for c in ckpts if c.endswith('checkpoint-{}'.format(args.epoch))] 274 | assert len(ckpt) == 1 275 | cur_checkpoint = ckpt[0] 276 | else: 277 | cur_checkpoint = tf.train.latest_checkpoint(model_dir) 278 | assert cur_checkpoint is not None 279 | 280 | config_tf = tf.ConfigProto() 281 | 
config_tf.gpu_options.allow_growth = True 282 | config_tf.gpu_options.per_process_gpu_memory_fraction = 1.0 283 | 284 | with tf.Session(config=config_tf) as sess: 285 | # Restore the checkpoint 286 | print('Evaluating checkpoint {}'.format(cur_checkpoint)) 287 | 288 | saver.restore(sess, cur_checkpoint) 289 | 290 | evaluate(model, eval_attacks, sess, config, plot=True, eval_train=args.eval_train) 291 | 292 | -------------------------------------------------------------------------------- /eval_ch.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import json 6 | import math 7 | import os 8 | 9 | import numpy as np 10 | import tensorflow as tf 11 | import argparse 12 | 13 | from cleverhans.utils import set_log_level 14 | from cleverhans.attacks import ElasticNetMethod, CarliniWagnerL2 15 | from cleverhans.evaluation import batch_eval 16 | import logging 17 | 18 | 19 | def one_hot(a, n_classes): 20 | res = np.zeros((len(a), n_classes), dtype=np.int64) 21 | res[np.arange(len(a)), a] = 1 22 | return res 23 | 24 | 25 | def evaluate_ch(model, config, sess, norm='l1', bound=None, verbose=True): 26 | dataset = config['data'] 27 | num_eval_examples = config['num_eval_examples'] 28 | eval_batch_size = config['eval_batch_size'] 29 | 30 | if dataset == "mnist": 31 | from tensorflow.examples.tutorials.mnist import input_data 32 | mnist = input_data.read_data_sets('MNIST_data', one_hot=False) 33 | X = mnist.test.images[0:num_eval_examples, :].reshape(-1, 28, 28, 1) 34 | Y = mnist.test.labels[0:num_eval_examples] 35 | x_image = tf.placeholder(tf.float32, shape=[None, 28, 28, 1]) 36 | else: 37 | import cifar10_input 38 | data_path = config["data_path"] 39 | cifar = cifar10_input.CIFAR10Data(data_path) 40 | X = cifar.eval_data.xs[0:num_eval_examples, :].astype(np.float32) / 255.0 41 | Y = cifar.eval_data.ys[0:num_eval_examples] 42 | x_image = tf.placeholder(tf.float32, shape=[None, 32, 32, 3]) 43 | assert norm == 'l1' 44 | 45 | if norm=='l2': 46 | attack = CarliniWagnerL2(model, sess) 47 | params = {'batch_size': eval_batch_size, 'binary_search_steps': 9} 48 | else: 49 | attack = ElasticNetMethod(model, sess, clip_min=0.0, clip_max=1.0) 50 | params = {'beta': 1e-2, 51 | 'decision_rule': 'L1', 52 | 'batch_size': eval_batch_size, 53 | 'learning_rate': 1e-2, 54 | 'max_iterations': 1000} 55 | 56 | if verbose: 57 | set_log_level(logging.DEBUG, name="cleverhans") 58 | 59 | y = tf.placeholder(tf.int64, shape=[None, 10]) 60 | params['y'] = y 61 | adv_x = attack.generate(x_image, **params) 62 | preds_adv = model.get_predicted_class(adv_x) 63 | preds_nat = model.get_predicted_class(x_image) 64 | 65 | all_preds, all_preds_adv, all_adv_x = batch_eval( 66 | sess, [x_image, y], [preds_nat, preds_adv, adv_x], [X, one_hot(Y, 10)], batch_size=eval_batch_size) 67 | 68 | print('acc nat', np.mean(all_preds == Y)) 69 | print('acc adv', np.mean(all_preds_adv == Y)) 70 | 71 | if dataset == "cifar10": 72 | X *= 255.0 73 | all_adv_x *= 255.0 74 | 75 | if norm == 'l2': 76 | lps = np.sqrt(np.sum(np.square(all_adv_x - X), axis=(1,2,3))) 77 | else: 78 | lps = np.sum(np.abs(all_adv_x - X), axis=(1,2,3)) 79 | print('mean lp: ', np.mean(lps)) 80 | for b in [bound, bound/2.0, bound/4.0, bound/8.0]: 81 | print('lp={}, acc={}'.format(b, np.mean((all_preds_adv == Y) | (lps > b)))) 82 | 83 | all_corr_adv = (all_preds_adv == Y) 84 | all_corr_nat = (all_preds == Y) 85 | return 
all_corr_nat, all_corr_adv, lps 86 | 87 | 88 | def get_model(config): 89 | dataset = config["data"] 90 | if dataset == "mnist": 91 | from cleverhans_models import MadryMNIST 92 | model = MadryMNIST() 93 | else: 94 | from cleverhans_models import make_wresnet 95 | model = make_wresnet(scope="a", filters=config["filters"]) 96 | 97 | return model 98 | 99 | 100 | def get_saver(config): 101 | dataset = config["data"] 102 | if dataset == "cifar10": 103 | # nasty hack 104 | gvars = tf.global_variables() 105 | saver = tf.train.Saver({v.name[2:-2]: v for v in gvars if v.name[:2] == "a/"}) 106 | else: 107 | saver = tf.train.Saver() 108 | return saver 109 | 110 | 111 | if __name__ == "__main__": 112 | parser = argparse.ArgumentParser( 113 | description='Eval script options', 114 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 115 | parser.add_argument('model_dir', type=str, 116 | help='path to model directory') 117 | parser.add_argument('--epoch', type=int, default=None, 118 | help='specific epoch to load (default=latest)') 119 | parser.add_argument('--eval_cpu', help='evaluate on CPU', 120 | action="store_true") 121 | parser.add_argument('--norm', help='norm to use', choices=['l1', 'l2'], default='l1') 122 | parser.add_argument('--bound', type=float, help='attack noise bound', default=None) 123 | 124 | args = parser.parse_args() 125 | 126 | if args.eval_cpu: 127 | os.environ['CUDA_VISIBLE_DEVICES'] = '' 128 | 129 | model_dir = args.model_dir 130 | 131 | with open(model_dir + "/config.json") as config_file: 132 | config = json.load(config_file) 133 | 134 | model = get_model(config) 135 | saver = get_saver(config) 136 | 137 | if args.epoch is not None: 138 | ckpts = tf.train.get_checkpoint_state(model_dir).all_model_checkpoint_paths 139 | ckpt = [c for c in ckpts if c.endswith('checkpoint-{}'.format(args.epoch))] 140 | assert len(ckpt) == 1 141 | cur_checkpoint = ckpt[0] 142 | else: 143 | cur_checkpoint = tf.train.latest_checkpoint(model_dir) 144 | 145 | assert cur_checkpoint is not None 146 | 147 | config_tf = tf.ConfigProto() 148 | config_tf.gpu_options.allow_growth = True 149 | config_tf.gpu_options.per_process_gpu_memory_fraction = 0.1 150 | 151 | with tf.Session(config=config_tf) as sess: 152 | # Restore the checkpoint 153 | print('Evaluating checkpoint {}'.format(cur_checkpoint)) 154 | saver.restore(sess, cur_checkpoint) 155 | 156 | evaluate_ch(model, config, sess, args.norm, args.bound) 157 | -------------------------------------------------------------------------------- /eval_fb.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import os 6 | import math 7 | import json 8 | from tqdm import tqdm 9 | 10 | import numpy as np 11 | import tensorflow as tf 12 | 13 | import argparse 14 | import foolbox 15 | 16 | 17 | 18 | def evaluate_fb(model, config, x_min, x_max, norm='l1', bound=None, verbose=True): 19 | fmodel = foolbox.models.TensorFlowModel(model.x_input, model.pre_softmax, (x_min, x_max)) 20 | 21 | if norm == 'l2': 22 | attack = foolbox.attacks.BoundaryAttack(fmodel) 23 | else: 24 | attack = foolbox.attacks.PointwiseAttack(fmodel) 25 | 26 | dataset = config["data"] 27 | num_eval_examples = config['num_eval_examples'] 28 | eval_batch_size = config['eval_batch_size'] 29 | 30 | if dataset == "mnist": 31 | from tensorflow.examples.tutorials.mnist import input_data 32 | mnist = input_data.read_data_sets('MNIST_data', 
one_hot=False) 33 | 34 | if "model_type" in config and config["model_type"] == "linear": 35 | x_train = mnist.train.images 36 | y_train = mnist.train.labels 37 | x_test = mnist.test.images 38 | y_test = mnist.test.labels 39 | 40 | pos_train = (y_train == 5) | (y_train == 7) 41 | x_train = x_train[pos_train] 42 | y_train = y_train[pos_train] 43 | y_train = (y_train == 5).astype(np.int64) 44 | pos_test = (y_test == 5) | (y_test == 7) 45 | x_test = x_test[pos_test] 46 | y_test = y_test[pos_test] 47 | y_test = (y_test == 5).astype(np.int64) 48 | 49 | from tensorflow.contrib.learn.python.learn.datasets.mnist import DataSet 50 | from tensorflow.contrib.learn.python.learn.datasets import base 51 | 52 | options = dict(dtype=tf.uint8, reshape=False, seed=None) 53 | train = DataSet(x_train, y_train, **options) 54 | test = DataSet(x_test, y_test, **options) 55 | 56 | mnist = base.Datasets(train=train, validation=None, test=test) 57 | 58 | else: 59 | import cifar10_input 60 | data_path = config["data_path"] 61 | cifar = cifar10_input.CIFAR10Data(data_path) 62 | 63 | # Iterate over the samples batch-by-batch 64 | num_batches = int(math.ceil(num_eval_examples / eval_batch_size)) 65 | all_corr_nat = [] 66 | all_corr_adv = [] 67 | lps = [] 68 | 69 | num_inconsistencies = 0 70 | num_solved_inconsistencies = 0 71 | 72 | pbar = tqdm(total=num_eval_examples) 73 | 74 | for ibatch in range(num_batches): 75 | bstart = ibatch * eval_batch_size 76 | bend = min(bstart + eval_batch_size, num_eval_examples) 77 | 78 | if dataset == "mnist": 79 | x_batch = mnist.test.images[bstart:bend, :].reshape(-1, 28, 28, 1) 80 | y_batch = mnist.test.labels[bstart:bend] 81 | else: 82 | x_batch = cifar.eval_data.xs[bstart:bend, :].astype(np.float32) 83 | y_batch = cifar.eval_data.ys[bstart:bend] 84 | 85 | adversarials = [] 86 | preds_adv = [] 87 | for x, y in zip(x_batch, y_batch): 88 | 89 | for trial in range(1): 90 | if norm == "l2": 91 | adversarial = attack(x, y, iterations=5000, max_directions=25) 92 | else: 93 | adversarial = attack(x, y) 94 | failed = False 95 | if adversarial is None: 96 | failed = True 97 | adversarial = x 98 | 99 | pred_adv = y 100 | if not failed: 101 | pred_adv = np.argmax(fmodel.predictions(adversarial)) 102 | if pred_adv == y: 103 | num_inconsistencies += 1 104 | if verbose: 105 | print("Inconsistency with l2 {:.3f}!".format(np.sqrt(np.sum(np.square(adversarial - x))))) 106 | new_adversarials = np.asarray([x + a * (adversarial - x) for a in [1.001, 1.005, 1.01, 1.05, 1.1]]) 107 | new_preds_adv = np.argmax(fmodel.batch_predictions(new_adversarials), axis=-1) 108 | 109 | if ((new_preds_adv == y)).all(): 110 | failed = True 111 | adversarial = x 112 | if verbose: 113 | print("Failed to resolve inconsistency!") 114 | else: 115 | adversarial = new_adversarials[np.argmin(new_preds_adv != y)] 116 | pred_adv = new_preds_adv[np.argmin(new_preds_adv != y)] 117 | num_solved_inconsistencies += 1 118 | if verbose: 119 | print("Solved inconsistency") 120 | 121 | if norm == 'l1': 122 | lp = np.sum(np.abs(adversarial - x)) 123 | else: 124 | lp = np.sqrt(np.sum(np.square(adversarial - x))) 125 | 126 | if verbose: 127 | print("trial {}".format(trial), lp, failed) 128 | 129 | if lp < bound: 130 | break 131 | lps.append(lp) 132 | adversarials.append(adversarial) 133 | preds_adv.append(pred_adv) 134 | if not verbose: 135 | pbar.update(n=1) 136 | 137 | preds = np.argmax(fmodel.batch_predictions(x_batch), axis=-1) 138 | all_corr_nat.extend(preds == y_batch) 139 | all_corr_adv.extend(preds_adv == y_batch) 140 | 141 | if 
verbose: 142 | all_corr_adv = np.asarray(all_corr_adv) 143 | all_corr_nat = np.asarray(all_corr_nat) 144 | lps = np.asarray(lps) 145 | print('acc adv w. bound', np.mean(all_corr_adv | ((lps > bound) & all_corr_nat))) 146 | 147 | pbar.close() 148 | 149 | all_corr_adv = np.asarray(all_corr_adv) 150 | all_corr_nat = np.asarray(all_corr_nat) 151 | lps = np.asarray(lps) 152 | 153 | acc_nat = np.mean(all_corr_nat) 154 | acc_adv = np.mean(all_corr_adv) 155 | print('acc_nat', acc_nat) 156 | print('acc_adv', acc_adv) 157 | print('min(lp)={:.2f}, max(lp)={:.2f}, mean(lp)={:.2f}, median(lp)={:.2f}'.format( 158 | np.min(lps), np.max(lps), np.mean(lps), np.median(lps))) 159 | print('acc adv w. bound', np.mean(all_corr_adv | ((lps > bound) & all_corr_nat))) 160 | 161 | print("num_inconsistencies", num_inconsistencies) 162 | print("num_solved_inconsistencies", num_solved_inconsistencies) 163 | 164 | return all_corr_nat, all_corr_adv, lps 165 | 166 | 167 | if __name__ == "__main__": 168 | parser = argparse.ArgumentParser( 169 | description='Eval script options', 170 | formatter_class=argparse.ArgumentDefaultsHelpFormatter) 171 | parser.add_argument('model_dir', type=str, 172 | help='path to model directory') 173 | parser.add_argument('--epoch', type=int, default=None, 174 | help='specific epoch to load (default=latest)') 175 | parser.add_argument('--eval_cpu', help='evaluate on CPU', 176 | action="store_true") 177 | parser.add_argument('--norm', help='norm to use', choices=['l1', 'l2'], default='l1') 178 | parser.add_argument('--bound', type=float, help='Foolbox pointwise attack noise bound', default=None) 179 | 180 | args = parser.parse_args() 181 | 182 | if args.eval_cpu: 183 | os.environ['CUDA_VISIBLE_DEVICES'] = '' 184 | 185 | model_dir = args.model_dir 186 | 187 | with open(model_dir + "/config.json") as config_file: 188 | config = json.load(config_file) 189 | 190 | dataset = config["data"] 191 | if dataset == "mnist": 192 | from model import Model 193 | model = Model(config) 194 | 195 | x_min, x_max = 0.0, 1.0 196 | else: 197 | from cifar10_model import Model 198 | model = Model(config) 199 | x_min, x_max = 0.0, 255.0 200 | 201 | saver = tf.train.Saver() 202 | if args.epoch is not None: 203 | ckpts = tf.train.get_checkpoint_state(model_dir).all_model_checkpoint_paths 204 | ckpt = [c for c in ckpts if c.endswith('checkpoint-{}'.format(args.epoch))] 205 | assert len(ckpt) == 1 206 | cur_checkpoint = ckpt[0] 207 | else: 208 | cur_checkpoint = tf.train.latest_checkpoint(model_dir) 209 | 210 | assert cur_checkpoint is not None 211 | 212 | config_tf = tf.ConfigProto() 213 | config_tf.gpu_options.allow_growth = True 214 | if dataset == "mnist": 215 | config_tf.gpu_options.per_process_gpu_memory_fraction = 0.1 216 | else: 217 | config_tf.gpu_options.per_process_gpu_memory_fraction = 0.1 218 | 219 | with tf.Session(config=config_tf) as sess: 220 | # Restore the checkpoint 221 | print('Evaluating checkpoint {}'.format(cur_checkpoint)) 222 | saver.restore(sess, cur_checkpoint) 223 | 224 | evaluate_fb(model, config, x_min, x_max, args.norm, args.bound) 225 | 226 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | """ 2 | The model is adapted from the tensorflow tutorial: 3 | https://www.tensorflow.org/get_started/mnist/pros 4 | """ 5 | from __future__ import absolute_import 6 | from __future__ import division 7 | from __future__ import print_function 8 | 9 | import tensorflow as tf 10 | 
import numpy as np 11 | 12 | 13 | class Model(object): 14 | def __init__(self, config): 15 | assert config["model_type"] in ["cnn", "linear"] 16 | self.is_training = tf.placeholder(tf.bool) 17 | self.x_input = tf.placeholder(tf.float32, shape = [None, 28, 28, 1]) 18 | self.y_input = tf.placeholder(tf.int64, shape = [None]) 19 | 20 | self.transform = tf.placeholder_with_default(tf.zeros((tf.shape(self.x_input)[0], 3)), shape=[None, 3]) 21 | trans_x, trans_y, rot = tf.unstack(self.transform, axis=1) 22 | rot *= np.pi / 180 # convert degrees to radians 23 | 24 | x = self.x_input 25 | 26 | #rotate and translate image 27 | ones = tf.ones(shape=tf.shape(trans_x)) 28 | zeros = tf.zeros(shape=tf.shape(trans_x)) 29 | trans = tf.stack([ones, zeros, -trans_x, 30 | zeros, ones, -trans_y, 31 | zeros, zeros], axis=1) 32 | x = tf.contrib.image.rotate(x, rot, interpolation='BILINEAR') 33 | x = tf.contrib.image.transform(x, trans, interpolation='BILINEAR') 34 | self.x_image = x 35 | 36 | ch = 1 37 | 38 | if config["model_type"] == "cnn": 39 | x.set_shape((None, 28, 28, 1)) 40 | x = tf.layers.conv2d(x, 32, (5, 5), activation='relu', padding='same', name='conv1') 41 | x = tf.layers.max_pooling2d(x, (2, 2), (2, 2), padding='same') 42 | x = tf.layers.conv2d(x, 64, (5, 5), activation='relu', padding='same', name='conv2') 43 | x = tf.layers.max_pooling2d(x, (2, 2), (2, 2), padding='same') 44 | 45 | x = tf.layers.flatten(x) 46 | #x = tf.layers.flatten(tf.transpose(x, (0, 3, 1, 2))) 47 | x = tf.layers.dense(x, 1024, activation='relu', name='fc1') 48 | self.pre_softmax = tf.layers.dense(x, 10, name='fc2') 49 | else: 50 | W_fc = self._weight_variable([784*ch, 2]) 51 | b_fc = self._bias_variable([2]) 52 | self.W = W_fc 53 | self.b = b_fc 54 | x_flat = tf.reshape(x, [-1, 784*ch]) 55 | self.pre_softmax = tf.matmul(x_flat, W_fc) + b_fc 56 | 57 | self.y_xent = tf.nn.sparse_softmax_cross_entropy_with_logits( 58 | labels=self.y_input, logits=self.pre_softmax) 59 | 60 | self.xent = tf.reduce_sum(self.y_xent) 61 | self.mean_xent = tf.reduce_mean(self.y_xent) 62 | 63 | self.y_pred = tf.argmax(self.pre_softmax, 1) 64 | 65 | self.correct_prediction = tf.equal(self.y_pred, self.y_input) 66 | 67 | self.num_correct = tf.reduce_sum(tf.cast(self.correct_prediction, tf.int64)) 68 | self.accuracy = tf.reduce_mean(tf.cast(self.correct_prediction, tf.float32)) 69 | 70 | @staticmethod 71 | def _weight_variable(shape): 72 | initial = tf.truncated_normal(shape, stddev=0.1) 73 | return tf.Variable(initial) 74 | 75 | @staticmethod 76 | def _bias_variable(shape): 77 | initial = tf.constant(0.1, shape = shape) 78 | return tf.Variable(initial) 79 | 80 | @staticmethod 81 | def _conv2d(x, W): 82 | return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME') 83 | 84 | @staticmethod 85 | def _max_pool_2x2( x): 86 | return tf.nn.max_pool(x, 87 | ksize = [1,2,2,1], 88 | strides=[1,2,2,1], 89 | padding='SAME') 90 | -------------------------------------------------------------------------------- /pgd_attack.py: -------------------------------------------------------------------------------- 1 | """ 2 | Implementation of attack methods. Running this file as a program will 3 | apply the attack to the model specified by the config file and store 4 | the examples in an .npy file. 
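Supported attack types are "linf", "l2" and "l1" PGD, as well as rotation-translation
("RT") search; several attacks can be combined, in which case each example's epsilon
budget is split across the attacks using randomly sampled weights (see uniform_weights
below). As an illustration, a single L1 attack configuration of the form used in
scripts/eval_mnist_lps.py looks like:

    {"type": "l1", "epsilon": 10, "k": 100, "random_start": True, "perc": 99, "a": 0.5}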
5 | """ 6 | from __future__ import absolute_import 7 | from __future__ import division 8 | from __future__ import print_function 9 | 10 | import tensorflow as tf 11 | import numpy as np 12 | from itertools import product 13 | from collections import Counter 14 | import json 15 | 16 | 17 | def uniform_weights(n_attacks, n_samples): 18 | x = np.random.uniform(size=(n_attacks, n_samples)) 19 | y = np.maximum(-np.log(x), 1e-8) 20 | return y / np.sum(y, axis=0, keepdims=True) 21 | 22 | 23 | def init_delta(x, attack, weight): 24 | if not attack["random_start"]: 25 | return np.zeros_like(x) 26 | 27 | assert len(weight) == len(x) 28 | eps = (attack["epsilon"] * weight).reshape(len(x), 1, 1, 1) 29 | 30 | if attack["type"] == "linf": 31 | return np.random.uniform(-eps, eps, x.shape) 32 | elif attack["type"] == "l2": 33 | r = np.random.randn(*x.shape) 34 | norm = np.linalg.norm(r.reshape(r.shape[0], -1), axis=-1).reshape(-1, 1, 1, 1) 35 | return (r / norm) * eps 36 | elif attack["type"] == "l1": 37 | r = np.random.laplace(size=x.shape) 38 | norm = np.linalg.norm(r.reshape(r.shape[0], -1), axis=-1, ord=1).reshape(-1, 1, 1, 1) 39 | return (r / norm) * eps 40 | else: 41 | raise ValueError("Unknown norm {}".format(attack["type"])) 42 | 43 | 44 | def delta_update(old_delta, g, x_adv, attack, x_min, x_max, weight, seed=None, t=None): 45 | assert len(weight) == len(x_adv) 46 | 47 | eps_w = attack["epsilon"] * weight 48 | eps = eps_w.reshape(len(x_adv), 1, 1, 1) 49 | 50 | if attack["type"] == "linf": 51 | a = attack.get('a', (2.5 * eps) / attack["k"]) 52 | new_delta = old_delta + a * np.sign(g) 53 | new_delta = np.clip(new_delta, -eps, eps) 54 | 55 | new_delta = np.clip(new_delta, x_min - (x_adv - old_delta), x_max - (x_adv - old_delta)) 56 | return new_delta 57 | 58 | elif attack["type"] == "l2": 59 | a = attack.get('a', (2.5 * eps) / attack["k"]) 60 | bad_pos = ((x_adv == x_max) & (g > 0)) | ((x_adv == x_min) & (g < 0)) 61 | g[bad_pos] = 0 62 | 63 | g = g.reshape(len(g), -1) 64 | g /= np.maximum(np.linalg.norm(g, axis=-1, keepdims=True), 1e-8) 65 | g = g.reshape(old_delta.shape) 66 | 67 | new_delta = old_delta + a * g 68 | new_delta_norm = np.linalg.norm(new_delta.reshape(len(new_delta), -1), axis=-1).reshape(-1, 1, 1, 1) 69 | new_delta = new_delta / np.maximum(new_delta_norm, 1e-8) * np.minimum(new_delta_norm, eps) 70 | new_delta = np.clip(new_delta, x_min - (x_adv - old_delta), x_max - (x_adv - old_delta)) 71 | return new_delta 72 | 73 | elif attack["type"] == "l1": 74 | _, h, w, ch = g.shape 75 | 76 | a = attack.get('a', 1.0) * x_max 77 | perc = attack.get('perc', 99) 78 | 79 | if perc == 'max': 80 | bad_pos = ((x_adv > (x_max - a)) & (g > 0)) | ((x_adv < a) & (g < 0)) | (x_adv < x_min) | (x_adv > x_max) 81 | g[bad_pos] = 0 82 | else: 83 | bad_pos = ((x_adv == x_max) & (g > 0)) | ((x_adv == x_min) & (g < 0)) 84 | g[bad_pos] = 0 85 | 86 | abs_grad = np.abs(g) 87 | sign = np.sign(g) 88 | 89 | if perc == 'max': 90 | grad_flat = abs_grad.reshape(len(abs_grad), -1) 91 | max_abs_grad = np.argmax(grad_flat, axis=-1) 92 | optimal_perturbation = np.zeros_like(grad_flat) 93 | optimal_perturbation[np.arange(len(grad_flat)), max_abs_grad] = 1.0 94 | optimal_perturbation = sign * optimal_perturbation.reshape(abs_grad.shape) 95 | else: 96 | if isinstance(perc, list): 97 | perc_low, perc_high = perc 98 | perc = np.random.RandomState(seed).uniform(low=perc_low, high=perc_high) 99 | 100 | max_abs_grad = np.percentile(abs_grad, perc, axis=(1, 2, 3), keepdims=True) 101 | tied_for_max = (abs_grad >= 
max_abs_grad).astype(np.float32) 102 | num_ties = np.sum(tied_for_max, (1, 2, 3), keepdims=True) 103 | optimal_perturbation = sign * tied_for_max / num_ties 104 | 105 | new_delta = old_delta + a * optimal_perturbation 106 | 107 | l1 = np.sum(np.abs(new_delta), axis=(1, 2, 3)) 108 | to_project = l1 > eps_w 109 | if np.any(to_project): 110 | n = np.sum(to_project) 111 | d = new_delta[to_project].reshape(n, -1) # n * N (N=h*w*ch) 112 | abs_d = np.abs(d) # n * N 113 | mu = -np.sort(-abs_d, axis=-1) # n * N 114 | cumsums = mu.cumsum(axis=-1) # n * N 115 | eps_d = eps_w[to_project] 116 | js = 1.0 / np.arange(1, h * w * ch + 1) 117 | temp = mu - js * (cumsums - np.expand_dims(eps_d, -1)) 118 | rho = np.sum(temp > 0, axis=-1) - 1 # 0-based index of the last positive entry (Duchi et al.'s rho) 119 | theta = 1.0 / (1 + rho) * (cumsums[range(n), rho] - eps_d) 120 | sgn = np.sign(d) 121 | d = sgn * np.maximum(abs_d - np.expand_dims(theta, -1), 0) 122 | new_delta[to_project] = d.reshape(-1, h, w, ch) 123 | 124 | new_delta = np.clip(new_delta, x_min - (x_adv - old_delta), x_max - (x_adv - old_delta)) 125 | return new_delta 126 | 127 | 128 | def compute_grad(model): 129 | label_mask = tf.one_hot(model.y_input, 130 | model.pre_softmax.get_shape().as_list()[-1], 131 | on_value=1.0, 132 | off_value=0.0, 133 | dtype=tf.float32) 134 | correct_logit = tf.reduce_sum(label_mask * model.pre_softmax, axis=1) 135 | wrong_logit = tf.reduce_max((1 - label_mask) * model.pre_softmax - 1e4 * label_mask, axis=1) 136 | loss = -(correct_logit - wrong_logit) 137 | return tf.gradients(loss, model.x_input)[0] 138 | 139 | 140 | def name(attack): 141 | return json.dumps(attack) 142 | 143 | 144 | class PGDAttack: 145 | def __init__(self, model, attack_config, x_min, x_max, grad, reps=1): 146 | """Attack parameter initialization. The attack performs k steps of 147 | size a, while always staying within epsilon from the initial 148 | point.""" 149 | print("new attack: ", attack_config) 150 | if isinstance(attack_config, dict): 151 | attack_config = [attack_config] 152 | 153 | self.model = model 154 | self.x_min = x_min 155 | self.x_max = x_max 156 | self.attack_config = attack_config 157 | self.names = [name(a) for a in attack_config] 158 | self.name = " - ".join(self.names) 159 | self.grad = grad 160 | self.reps = int(attack_config[0].get("reps", 1)) 161 | assert self.reps >= 1 162 | 163 | def perturb(self, x_nat, y, sess, x_nat_no_aug=None): 164 | 165 | if len(self.attack_config) == 0: 166 | return x_nat, None 167 | 168 | if x_nat_no_aug is None: 169 | x_nat_no_aug = x_nat 170 | 171 | n = len(x_nat) 172 | worst_x = np.copy(x_nat) 173 | worst_t = np.zeros([n, 3]) 174 | max_xent = np.zeros(n) 175 | all_correct = np.ones(n).astype(bool) 176 | 177 | for i in range(self.reps): 178 | if "weight" in self.attack_config[0]: 179 | weights = np.asarray([a["weight"] for a in self.attack_config]) 180 | weights = np.repeat(weights[:, np.newaxis], len(x_nat), axis=-1) 181 | else: 182 | weights = uniform_weights(len(self.attack_config), len(x_nat)) 183 | 184 | if self.attack_config[0]["type"] == "RT": 185 | assert np.all([a["type"] != "RT" for a in self.attack_config[1:]]) 186 | norm_attacks = self.attack_config[1:] 187 | norm_weights = weights[1:] 188 | x_adv, trans = self.grid_perturb(x_nat_no_aug, y, sess, self.attack_config[0], 189 | weights[0], norm_attacks, norm_weights) 190 | else: 191 | # rotation and translation attack should always come first 192 | assert np.all([a["type"] != "RT" for a in self.attack_config]) 193 | norm_attacks = self.attack_config 194 | x_adv = self.norm_perturb(x_nat, y, sess, 
norm_attacks, weights) 195 | trans = worst_t 196 | 197 | cur_xent, cur_correct = sess.run([self.model.y_xent, self.model.correct_prediction], 198 | feed_dict={self.model.x_input: x_adv, 199 | self.model.y_input: y, 200 | self.model.is_training: False, 201 | self.model.transform: trans}) 202 | cur_xent = np.asarray(cur_xent) 203 | cur_correct = np.asarray(cur_correct) 204 | 205 | idx = (cur_xent > max_xent) & (cur_correct == all_correct) 206 | idx = idx | (cur_correct < all_correct) 207 | max_xent = np.maximum(cur_xent, max_xent) 208 | all_correct = cur_correct & all_correct 209 | 210 | idx = np.expand_dims(idx, axis=-1) # shape (bsize, 1) 211 | worst_t = np.where(idx, trans, worst_t) # shape (bsize, 3) 212 | 213 | idx = np.expand_dims(idx, axis=-1) 214 | idx = np.expand_dims(idx, axis=-1) # shape (bsize, 1, 1, 1) 215 | worst_x = np.where(idx, x_adv, worst_x) # shape (bsize, h, w, ch) 216 | 217 | return worst_x, worst_t 218 | 219 | def grid_perturb(self, x_nat, y, sess, attack_config, weight, norm_attacks, norm_weights): 220 | random_tries = attack_config["random_tries"] 221 | n = len(x_nat) 222 | 223 | assert len(weight) == len(x_nat) 224 | # (3, 1) * n => (3, n) 225 | spatial_limits = np.asarray(attack_config["spatial_limits"])[:, np.newaxis] * weight 226 | 227 | if random_tries > 0: 228 | grids = np.zeros((n, random_tries)) 229 | else: 230 | # exhaustive grid 231 | # n * (num_x * num_y * num_rot) 232 | grids = [list(product(*list(np.linspace(-l, l, num=g) 233 | for l, g in zip(spatial_limits[:, i], attack_config["grid_granularity"])))) 234 | for i in range(len(x_nat))] 235 | grids = np.asarray(grids) 236 | 237 | worst_x = np.copy(x_nat) 238 | worst_t = np.zeros([n, 3]) 239 | max_xent = np.zeros(n) 240 | all_correct = np.ones(n).astype(bool) 241 | 242 | for idx in range(len(grids[0])): 243 | if random_tries > 0: 244 | t = [[np.random.uniform(-l, l) for l in spatial_limits[:, i]] for i in range(len(x_nat))] 245 | else: 246 | t = grids[:, idx] 247 | 248 | x = self.norm_perturb(x_nat, y, sess, norm_attacks, norm_weights, trans=t) 249 | 250 | curr_dict = {self.model.x_input: x, 251 | self.model.y_input: y, 252 | self.model.is_training: False, 253 | self.model.transform: t} 254 | 255 | cur_xent, cur_correct = sess.run([self.model.y_xent, 256 | self.model.correct_prediction], 257 | feed_dict=curr_dict) # shape (bsize,) 258 | cur_xent = np.asarray(cur_xent) 259 | cur_correct = np.asarray(cur_correct) 260 | 261 | # Select indices to update: we choose the misclassified transformation 262 | # of maximum xent (or just the highest xent if everything else is correct). 
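# Concretely: an example's stored worst case is replaced when the new
# transformation either flips a previously-correct prediction
# (cur_correct < all_correct), or leaves correctness unchanged while
# achieving a strictly higher cross-entropy loss.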
263 | idx = (cur_xent > max_xent) & (cur_correct == all_correct) 264 | idx = idx | (cur_correct < all_correct) 265 | max_xent = np.maximum(cur_xent, max_xent) 266 | all_correct = cur_correct & all_correct 267 | 268 | idx = np.expand_dims(idx, axis=-1) # shape (bsize, 1) 269 | worst_t = np.where(idx, t, worst_t) # shape (bsize, 3) 270 | 271 | idx = np.expand_dims(idx, axis=-1) 272 | idx = np.expand_dims(idx, axis=-1) # shape (bsize, 1, 1, 1) 273 | worst_x = np.where(idx, x, worst_x) # shape (bsize, h, w, ch) 274 | 275 | return worst_x, worst_t 276 | 277 | def norm_perturb(self, x_nat, y, sess, norm_attacks, norm_weights, trans=None): 278 | if len(norm_attacks) == 0: 279 | return x_nat 280 | 281 | x_min = self.x_min 282 | x_max = self.x_max 283 | 284 | if trans is None: 285 | trans = np.zeros([len(x_nat), 3]) 286 | 287 | iters = [a["k"] for a in norm_attacks] 288 | assert (np.all(np.asarray(iters) == iters[0])) 289 | 290 | deltas = np.asarray([init_delta(x_nat, attack, weight) 291 | for attack, weight in zip(norm_attacks, norm_weights)]) 292 | x_adv = np.clip(x_nat + np.sum(deltas, axis=0), x_min, x_max) 293 | 294 | 295 | # a seed that remains constant across attack iterations 296 | seed = np.random.randint(low=0, high=2**32-1) 297 | 298 | for i in range(np.sum(iters)): 299 | grad = sess.run(self.grad, feed_dict={self.model.x_input: x_adv, 300 | self.model.y_input: y, 301 | self.model.is_training: False, 302 | self.model.transform: trans}) 303 | 304 | deltas[i % len(norm_attacks)] = delta_update(deltas[i % len(norm_attacks)], 305 | grad, 306 | x_adv, 307 | norm_attacks[i % len(norm_attacks)], 308 | x_min, x_max, 309 | norm_weights[i % len(norm_attacks)], 310 | seed=seed, t=i+1) 311 | 312 | x_adv = np.clip(x_nat + np.sum(deltas, axis=0), x_min, x_max) 313 | 314 | return np.clip(x_nat + np.sum(deltas, axis=0), x_min, x_max) 315 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==0.6.1 2 | astor==0.7.1 3 | backports-abc==0.5 4 | backports.functools-lru-cache==1.5 5 | backports.shutil-get-terminal-size==1.0.0 6 | backports.weakref==1.0.post1 7 | bleach==3.1.0 8 | certifi==2018.11.29 9 | chardet==3.0.4 10 | cleverhans==3.0.1 11 | configparser==3.7.1 12 | cycler==0.10.0 13 | Cython==0.29.5 14 | decorator==4.3.2 15 | defusedxml==0.5.0 16 | entrypoints==0.3 17 | enum34==1.1.6 18 | foolbox==1.8.0 19 | funcsigs==1.0.2 20 | functools32==3.2.3.post2 21 | future==0.17.1 22 | futures==3.2.0 23 | gast==0.2.0 24 | gitdb2==2.0.5 25 | GitPython==2.1.11 26 | grpcio==1.17.1 27 | h5py==2.8.0 28 | idna==2.8 29 | ipaddress==1.0.22 30 | ipykernel==4.10.0 31 | ipython==5.8.0 32 | ipython-genutils==0.2.0 33 | ipywidgets==7.4.2 34 | Jinja2==2.10 35 | joblib==0.13.2 36 | jsonschema==2.6.0 37 | jupyter==1.0.0 38 | jupyter-client==5.2.4 39 | jupyter-console==5.2.0 40 | jupyter-core==4.4.0 41 | Keras==2.2.4 42 | Keras-Applications==1.0.6 43 | Keras-Preprocessing==1.0.5 44 | kiwisolver==1.0.1 45 | Markdown==3.0.1 46 | MarkupSafe==1.1.0 47 | matplotlib==2.2.3 48 | mistune==0.8.4 49 | mmdnn==0.2.5 50 | mnist==0.2.2 51 | mock==2.0.0 52 | nbconvert==5.4.1 53 | nbformat==4.4.0 54 | nose==1.3.7 55 | notebook==5.7.8 56 | numpy==1.16.1 57 | pandas==0.24.2 58 | pandocfilters==1.4.2 59 | pathlib2==2.3.3 60 | pbr==5.1.1 61 | pexpect==4.6.0 62 | pickleshare==0.7.5 63 | Pillow==5.4.1 64 | prometheus-client==0.5.0 65 | prompt-toolkit==1.0.15 66 | protobuf==3.6.1 67 | ptyprocess==0.6.0 68 | 
pycodestyle==2.5.0 69 | Pygments==2.3.1 70 | pyparsing==2.3.0 71 | python-dateutil==2.8.0 72 | pytz==2018.7 73 | PyYAML==5.1 74 | pyzmq==17.1.2 75 | qtconsole==4.4.3 76 | randomgen==1.16.0 77 | requests==2.21.0 78 | scandir==1.9.0 79 | scipy==1.2.0 80 | Send2Trash==1.5.0 81 | simplegeneric==0.8.1 82 | singledispatch==3.4.0.3 83 | six==1.12.0 84 | smmap2==2.0.5 85 | subprocess32==3.5.3 86 | tensorboard==1.12.1 87 | tensorflow-gpu==1.12.0 88 | tensorflow-probability==0.5.0 89 | termcolor==1.1.0 90 | terminado==0.8.1 91 | testpath==0.4.2 92 | torch==1.0.1.post2 93 | torchvision==0.2.1 94 | tornado==5.1.1 95 | tqdm==4.31.1 96 | traitlets==4.3.2 97 | urllib3==1.24.2 98 | wcwidth==0.1.7 99 | webencodings==0.5.1 100 | Werkzeug==0.15.3 101 | widgetsnbextension==3.4.2 102 | -------------------------------------------------------------------------------- /scripts/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ftramer/MultiRobustness/f51a75e07f06b010f34ee760d80fea05ba8ba785/scripts/__init__.py -------------------------------------------------------------------------------- /scripts/eval_cifar_lps.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | from pgd_attack import PGDAttack, compute_grad 4 | from cifar10_model import Model 5 | from scripts.utils import get_ckpt 6 | from eval_ch import evaluate_ch, get_model, get_saver 7 | from eval_fb import evaluate_fb 8 | from eval import evaluate 9 | from multiprocessing import Pool 10 | import sys 11 | 12 | 13 | models_slim = [ 14 | ] 15 | 16 | models_wide = [ 17 | ('path_to_model', -1), 18 | ] 19 | 20 | attack_configs = [ 21 | {"type": "linf", "epsilon": 4.0, "k": 100, "random_start": True, "reps": 20}, 22 | {"type": "linf", "epsilon": 4.0, "k": 1000, "random_start": True}, 23 | {"type": "l1", "epsilon": 2000, "k": 100, "random_start": True, "perc": 99, "a": 2.0, "reps": 20}, 24 | {"type": "l1", "epsilon": 2000, "k": 1000, "random_start": True, "perc": 99, "a": 2.0} 25 | ] 26 | 27 | outdir = "cifar_" + str(int(attack_configs[0]["epsilon"])) 28 | 29 | eval_config = {"data": "cifar10", 30 | "data_path": "cifar10_data", 31 | "num_eval_examples": 1000, 32 | "eval_batch_size": 100} 33 | 34 | eval_wide = sys.argv[1] == "wide" 35 | 36 | if eval_wide: 37 | models = models_wide 38 | eval_config["filters"] = [16, 160, 320, 640] 39 | else: 40 | models = models_slim 41 | eval_config["filters"] = [16, 16, 32, 64] 42 | 43 | model = Model(eval_config) 44 | grad = compute_grad(model) 45 | attacks = [PGDAttack(model, a_config, 0.0, 255.0, grad) for a_config in attack_configs] 46 | 47 | saver = tf.train.Saver() 48 | config_tf = tf.ConfigProto() 49 | config_tf.gpu_options.allow_growth = True 50 | config_tf.gpu_options.per_process_gpu_memory_fraction = 1.0 51 | 52 | nat_accs = np.zeros(len(models)) 53 | adv_accs = np.zeros((len(models), len(attacks) + 2)) 54 | 55 | any_attack = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 56 | any_l1 = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 57 | any_linf = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 58 | 59 | def worker((model_dir, epoch)): 60 | 61 | model_name = model_dir.split('/')[-1] + "_" + str(epoch) 62 | output_file = "results/{}/lps/{}_l1_fb.npy".format(outdir, model_name) 63 | 64 | try: 65 | all_corr_adv1 = np.load(output_file) 66 | return all_corr_adv1 67 | except: 68 | 69 | g 
= tf.Graph() 70 | with g.as_default(): 71 | config_tf = tf.ConfigProto() 72 | config_tf.gpu_options.allow_growth = True 73 | config_tf.gpu_options.per_process_gpu_memory_fraction = 0.2 74 | with tf.Session(graph=g, config=config_tf) as sess: 75 | model = Model(eval_config) 76 | 77 | saver = tf.train.Saver() 78 | ckpt = get_ckpt(model_dir, epoch) 79 | print("loading ", ckpt) 80 | saver.restore(sess, ckpt) 81 | 82 | # FB attacks 83 | print("Foolbox l1 attack") 84 | all_corr_nat1, all_corr_adv1, l1s = evaluate_fb(model, eval_config, 0.0, 255.0, norm='l1', bound=2000, verbose=False) 85 | all_corr_adv1 = all_corr_adv1 | ((l1s > 2000) & all_corr_nat1) 86 | 87 | np.save(output_file, all_corr_adv1) 88 | 89 | return all_corr_adv1 90 | 91 | pool = Pool(max(len(models), 4)) 92 | all_models_corr_adv1 = pool.map(worker, models) 93 | pool.close() 94 | pool.join() 95 | 96 | adv_accs[:, len(attacks)] = np.mean(all_models_corr_adv1, axis=-1) 97 | any_attack &= all_models_corr_adv1 98 | any_l1 &= all_models_corr_adv1 99 | 100 | print("DONE with FB!") 101 | 102 | print(nat_accs) 103 | print(adv_accs) 104 | print("any: ", np.mean(any_attack, axis=-1)) 105 | print("l1: ", np.mean(any_l1, axis=-1)) 106 | print("linf: ", np.mean(any_linf, axis=-1)) 107 | 108 | with tf.Session(config=config_tf) as sess: 109 | for m_idx, (model_dir, epoch) in enumerate(models): 110 | ckpt = get_ckpt(model_dir, epoch) 111 | saver.restore(sess, ckpt) 112 | 113 | print("starting...", model_dir, epoch) 114 | 115 | # lp attacks 116 | nat_acc, total_corr_advs = evaluate(model, attacks, sess, eval_config) 117 | nat_accs[m_idx] = nat_acc 118 | adv_acc = np.mean(total_corr_advs, axis=-1) 119 | adv_accs[m_idx, :len(attacks)] = adv_acc 120 | any_attack[m_idx] &= np.bitwise_and.reduce(np.asarray(total_corr_advs), 0) 121 | 122 | print(model_dir, adv_accs[m_idx]) 123 | model_name = models[m_idx][0].split('/')[-1] + "_" + str(models[m_idx][1]) 124 | for i, attack in enumerate(attacks): 125 | np.save("results/{}/lps/{}_{}.npy".format(outdir, model_name, attack.name), total_corr_advs[i]) 126 | 127 | if attack_configs[i]["type"] == "l1": 128 | any_l1[m_idx] &= total_corr_advs[i] 129 | else: 130 | any_linf[m_idx] &= total_corr_advs[i] 131 | 132 | print("DONE with PGD!") 133 | print(nat_accs) 134 | print(adv_accs) 135 | print("any: ", np.mean(any_attack, axis=-1)) 136 | print("l1: ", np.mean(any_l1, axis=-1)) 137 | print("linf: ", np.mean(any_linf, axis=-1)) 138 | 139 | tf.reset_default_graph() 140 | 141 | # Cleverhans attacks 142 | g2 = tf.Graph() 143 | with g2.as_default(): 144 | with tf.Session(graph=g2, config=config_tf) as sess2: 145 | model2 = get_model(eval_config) 146 | saver2 = get_saver(eval_config) 147 | 148 | for m_idx, (model_dir, epoch) in enumerate(models): 149 | ckpt = get_ckpt(model_dir, epoch) 150 | saver2.restore(sess2, ckpt) # use the saver built for graph g2 151 | 152 | print("starting...", model_dir, epoch) 153 | 154 | print("EAD") 155 | all_corr_nat1, all_corr_adv1, l1s = evaluate_ch(model2, eval_config, sess2, "l1", 2000, verbose=True) 156 | all_corr_adv1 = all_corr_adv1 | ((l1s > 2000) & all_corr_nat1) 157 | adv_accs[m_idx, len(attacks) + 1] = np.mean(all_corr_adv1) 158 | any_attack[m_idx] &= all_corr_adv1 159 | 160 | print(model_dir, adv_accs[m_idx]) 161 | 162 | model_name = models[m_idx][0].split('/')[-1] + "_" + str(models[m_idx][1]) 163 | any_l1[m_idx] &= all_corr_adv1 164 | np.save("results/{}/lps/{}_l1_ead.npy".format(outdir, model_name), all_corr_adv1) 165 | 166 | print(nat_accs) 167 | print(adv_accs) 168 | print("any: ", np.mean(any_attack, axis=-1)) 
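# Note: the any_* masks accumulate per-example robustness against the union of
# attacks; an example stays True only if no attack evaluated so far fooled the
# model on it, so the means printed here are union (worst-case) accuracies.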
169 | print("l1: ", np.mean(any_l1, axis=-1)) 170 | print("linf: ", np.mean(any_linf, axis=-1)) 171 | 172 | -------------------------------------------------------------------------------- /scripts/eval_cifar_spatial.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import numpy as np 3 | from pgd_attack import PGDAttack, compute_grad 4 | from cifar10_model import Model 5 | from scripts.utils import get_ckpt 6 | from eval import evaluate 7 | import sys 8 | 9 | 10 | models_slim = [ 11 | ] 12 | 13 | models_wide = [ 14 | ('path_to_model', -1), 15 | ] 16 | 17 | attack_configs = [ 18 | {"type": "linf", "epsilon": 4.0, "k": 100, "random_start": True, "reps": 20}, 19 | {"type": "linf", "epsilon": 4.0, "k": 1000, "random_start": True}, 20 | {"type": "RT", "spatial_limits": [3, 3, 30], "grid_granularity": [5, 5, 31], "random_tries": 10}, 21 | {"type": "RT", "spatial_limits": [3, 3, 30], "grid_granularity": [5, 5, 31], "random_tries": -1} 22 | ] 23 | 24 | outdir = "cifar_" + str(int(attack_configs[0]["epsilon"])) 25 | 26 | conf_slim = {"filters": [16, 16, 32, 64]} 27 | conf_wide = {"filters": [16, 160, 320, 640]} 28 | 29 | eval_wide = sys.argv[1] == "wide" 30 | 31 | if eval_wide: 32 | models = models_wide 33 | conf = conf_wide 34 | else: 35 | models = models_slim 36 | conf = conf_slim 37 | 38 | model = Model(conf) 39 | grad = compute_grad(model) 40 | attacks = [PGDAttack(model, a_config, 0.0, 255.0, grad) for a_config in attack_configs] 41 | 42 | saver = tf.train.Saver() 43 | config_tf = tf.ConfigProto() 44 | config_tf.gpu_options.allow_growth = True 45 | config_tf.gpu_options.per_process_gpu_memory_fraction = 1.0 46 | 47 | eval_config = {"data": "cifar10", 48 | "data_path": "cifar10_data", 49 | "num_eval_examples": 1000, 50 | "eval_batch_size": 100} 51 | 52 | nat_accs = np.zeros(len(models)) 53 | adv_accs = np.zeros((len(models), len(attacks))) 54 | 55 | any_attack = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 56 | any_rt = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 57 | any_linf = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 58 | 59 | with tf.Session(config=config_tf) as sess: 60 | for m_idx, (model_dir, epoch) in enumerate(models): 61 | ckpt = get_ckpt(model_dir, epoch) 62 | saver.restore(sess, ckpt) 63 | 64 | print("starting...", model_dir, epoch) 65 | 66 | # lp attacks 67 | nat_acc, total_corr_advs = evaluate(model, attacks, sess, eval_config) 68 | nat_accs[m_idx] = nat_acc 69 | adv_acc = np.mean(total_corr_advs, axis=-1) 70 | adv_accs[m_idx, :len(attacks)] = adv_acc 71 | any_attack[m_idx] &= np.bitwise_and.reduce(np.asarray(total_corr_advs), 0) 72 | 73 | print(model_dir, adv_accs[m_idx]) 74 | for i, attack in enumerate(attacks): 75 | model_name = models[m_idx][0].split('/')[-1] + "_" + str(models[m_idx][1]) 76 | np.save("results/{}/spatial/{}_{}.npy".format(outdir, model_name, attack.name), total_corr_advs[i]) 77 | 78 | if attack_configs[i]["type"] == "RT": 79 | any_rt[m_idx] &= total_corr_advs[i] 80 | else: 81 | any_linf[m_idx] &= total_corr_advs[i] 82 | 83 | print(nat_accs) 84 | print(adv_accs) 85 | print("any: ", np.mean(any_attack, axis=-1)) 86 | print("rt: ", np.mean(any_rt, axis=-1)) 87 | print("linf: ", np.mean(any_linf, axis=-1)) 88 | 89 | -------------------------------------------------------------------------------- /scripts/eval_mnist_lps.py: -------------------------------------------------------------------------------- 1 | 
import tensorflow as tf 2 | import numpy as np 3 | from pgd_attack import PGDAttack, compute_grad 4 | from model import Model 5 | from scripts.utils import get_ckpt 6 | from eval import evaluate 7 | from eval_fb import evaluate_fb 8 | from eval_ch import evaluate_ch, get_model, get_saver 9 | from eval_bapp import evaluate_bapp 10 | from multiprocessing import Pool 11 | import os 12 | 13 | 14 | models = [ 15 | ('path_to_model', -1), 16 | ] 17 | 18 | attack_configs = [ 19 | {"type": "linf", "epsilon": 0.3, "k": 100, "random_start": True, "reps": 40}, 20 | {"type": "l1", "epsilon": 10, "k": 100, "random_start": True, "perc": 99, "a": 0.5, "reps": 40}, 21 | {"type": "l2", "epsilon": 2, "k": 100, "random_start": True, "reps": 40}, 22 | ] 23 | 24 | model = Model({"model_type": "cnn"}) 25 | grad = compute_grad(model) 26 | attacks = [PGDAttack(model, a_config, 0.0, 1.0, grad) for a_config in attack_configs] 27 | 28 | saver = tf.train.Saver() 29 | config_tf = tf.ConfigProto() 30 | config_tf.gpu_options.allow_growth = True 31 | config_tf.gpu_options.per_process_gpu_memory_fraction = 0.5 32 | 33 | 34 | eval_config = {"data": "mnist", 35 | "num_eval_examples": 200, 36 | "eval_batch_size": 200} 37 | 38 | nat_accs = np.zeros(len(models)) 39 | adv_accs = np.zeros((len(models), len(attacks) + 5)) 40 | 41 | any_attack = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 42 | any_l1 = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 43 | any_l2 = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 44 | any_linf = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 45 | 46 | def worker((model_dir, epoch)): 47 | g = tf.Graph() 48 | with g.as_default(): 49 | config_tf = tf.ConfigProto() 50 | config_tf.gpu_options.allow_growth = True 51 | config_tf.gpu_options.per_process_gpu_memory_fraction = 0.2 52 | with tf.Session(graph=g, config=config_tf) as sess: 53 | model = Model({"model_type": "cnn"}) 54 | 55 | saver = tf.train.Saver() 56 | ckpt = get_ckpt(model_dir, epoch) 57 | saver.restore(sess, ckpt) 58 | 59 | # FB attacks 60 | print("Foolbox l1 attack") 61 | all_corr_nat1, all_corr_adv1, l1s = evaluate_fb(model, eval_config, 0.0, 1.0, norm='l1', bound=10, verbose=False) 62 | all_corr_adv1 = all_corr_adv1 | ((l1s > 10) & all_corr_nat1) 63 | 64 | print("Foolbox l2 attack") 65 | all_corr_nat2, all_corr_adv2, l2s = evaluate_fb(model, eval_config, 0.0, 1.0, norm='l2', bound=2.0, verbose=False) 66 | all_corr_adv2 = all_corr_adv2 | ((l2s > 2.0) & all_corr_nat2) 67 | return all_corr_adv1, all_corr_adv2 68 | 69 | 70 | pool = Pool(4) 71 | all_models_corr_adv = pool.map(worker, models) 72 | 73 | all_models_corr_adv1 = np.asarray([a[0] for a in all_models_corr_adv]) 74 | all_models_corr_adv2 = np.asarray([a[1] for a in all_models_corr_adv]) 75 | 76 | adv_accs[:, len(attacks)] = np.mean(all_models_corr_adv1, axis=-1) 77 | any_attack &= all_models_corr_adv1 78 | any_l1 &= all_models_corr_adv1 79 | adv_accs[:, len(attacks) + 1] = np.mean(all_models_corr_adv2, axis=-1) 80 | any_attack &= all_models_corr_adv2 81 | any_l2 &= all_models_corr_adv2 82 | 83 | print("DONE with FB!") 84 | 85 | print(nat_accs) 86 | print(adv_accs) 87 | print("any: ", np.mean(any_attack, axis=-1)) 88 | print("l1: ", np.mean(any_l1, axis=-1)) 89 | print("l2: ", np.mean(any_l2, axis=-1)) 90 | print("linf: ", np.mean(any_linf, axis=-1)) 91 | 92 | with tf.Session(config=config_tf) as sess: 93 | for m_idx, (model_dir, epoch) in enumerate(models): 94 | ckpt = 
get_ckpt(model_dir, epoch) 95 | saver.restore(sess, ckpt) 96 | 97 | print("starting...", model_dir) 98 | 99 | # lp attacks 100 | nat_acc, total_corr_advs = evaluate(model, attacks, sess, eval_config) 101 | nat_accs[m_idx] = nat_acc 102 | adv_acc = np.mean(total_corr_advs, axis=-1) 103 | adv_accs[m_idx, :len(attacks)] = adv_acc 104 | any_attack[m_idx] &= np.bitwise_and.reduce(np.asarray(total_corr_advs), 0) 105 | 106 | print(model_dir, adv_accs[m_idx]) 107 | model_name = model_dir.split('/')[-1] 108 | for i, attack in enumerate(attacks): 109 | np.save("results/mnist/{}_{}.npy".format(model_name, attack.name), total_corr_advs[i]) 110 | 111 | if attack_configs[i]["type"] == "l1": 112 | any_l1[m_idx] &= total_corr_advs[i] 113 | elif attack_configs[i]["type"] == "l2": 114 | any_l2[m_idx] &= total_corr_advs[i] 115 | else: 116 | any_linf[m_idx] &= total_corr_advs[i] 117 | 118 | print("DONE with PGD!") 119 | print(nat_accs) 120 | print(adv_accs) 121 | print("any: ", np.mean(any_attack, axis=-1)) 122 | print("l1: ", np.mean(any_l1, axis=-1)) 123 | print("l2: ", np.mean(any_l2, axis=-1)) 124 | print("linf: ", np.mean(any_linf, axis=-1)) 125 | 126 | with tf.Session(config=config_tf) as sess: 127 | for m_idx, (model_dir, epoch) in enumerate(models): 128 | ckpt = get_ckpt(model_dir, epoch) 129 | saver.restore(sess, ckpt) 130 | 131 | print("starting...", model_dir) 132 | all_corr_nat_inf, all_corr_adv_inf, l_infs = evaluate_bapp(sess, model, eval_config, 0, 1, 0.3, verbose=False) 133 | all_corr_adv_inf = all_corr_adv_inf | ((l_infs > 0.3) & all_corr_nat_inf) 134 | adv_accs[m_idx, len(attacks) + 2] = np.mean(all_corr_adv_inf) 135 | any_attack[m_idx] &= all_corr_adv_inf 136 | 137 | any_linf[m_idx] &= all_corr_adv_inf 138 | 139 | # Cleverhans attacks 140 | g2 = tf.Graph() 141 | with g2.as_default(): 142 | with tf.Session(graph=g2, config=config_tf) as sess2: 143 | model2 = get_model(eval_config) 144 | saver2 = get_saver(eval_config) 145 | 146 | for m_idx, (model_dir, epoch) in enumerate(models): 147 | ckpt = get_ckpt(model_dir, epoch) 148 | saver2.restore(sess2, ckpt) # use the saver built for graph g2 149 | 150 | print("starting...", model_dir) 151 | 152 | print("EAD") 153 | all_corr_nat1, all_corr_adv1, l1s = evaluate_ch(model2, eval_config, sess2, "l1", 10, verbose=False) 154 | all_corr_adv1 = all_corr_adv1 | ((l1s > 10) & all_corr_nat1) 155 | adv_accs[m_idx, len(attacks) + 3] = np.mean(all_corr_adv1) 156 | any_attack[m_idx] &= all_corr_adv1 157 | 158 | print("C&W") 159 | all_corr_nat2, all_corr_adv2, l2s = evaluate_ch(model2, eval_config, sess2, "l2", 2, verbose=False) 160 | all_corr_adv2 = all_corr_adv2 | ((l2s > 2.0) & all_corr_nat2) 161 | adv_accs[m_idx, len(attacks) + 4] = np.mean(all_corr_adv2) 162 | any_attack[m_idx] &= all_corr_adv2 163 | 164 | print(model_dir, adv_accs[m_idx]) 165 | 166 | model_name = model_dir.split('/')[-1] 167 | any_l1[m_idx] &= all_corr_adv1 168 | any_l2[m_idx] &= all_corr_adv2 169 | np.save("results/mnist/{}_l1_ead.npy".format(model_name), all_corr_adv1) 170 | np.save("results/mnist/{}_l2_cw.npy".format(model_name), all_corr_adv2) 171 | 172 | print(nat_accs) 173 | print(adv_accs) 174 | print("any: ", np.mean(any_attack, axis=-1)) 175 | print("l1: ", np.mean(any_l1, axis=-1)) 176 | print("l2: ", np.mean(any_l2, axis=-1)) 177 | print("linf: ", np.mean(any_linf, axis=-1)) 178 | -------------------------------------------------------------------------------- /scripts/eval_mnist_spatial.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | import 
numpy as np 3 | from pgd_attack import PGDAttack, compute_grad 4 | from model import Model 5 | from scripts.utils import get_ckpt 6 | from eval import evaluate 7 | 8 | 9 | models = [ 10 | ('path_to_model', -1), 11 | ] 12 | 13 | attack_configs = [ 14 | {"type": "linf", "epsilon": 0.3, "k": 100, "random_start": True, "reps": 40}, 15 | {"type": "linf", "epsilon": 0.3, "k": 1000, "random_start": True}, 16 | {"type": "RT", "spatial_limits": [3, 3, 30], "grid_granularity": [5, 5, 31], "random_tries": 10}, 17 | {"type": "RT", "spatial_limits": [3, 3, 30], "grid_granularity": [5, 5, 31], "random_tries": -1} 18 | ] 19 | 20 | model = Model({"model_type": "cnn"}) 21 | grad = compute_grad(model) 22 | attacks = [PGDAttack(model, a_config, 0.0, 1.0, grad) for a_config in attack_configs] 23 | 24 | saver = tf.train.Saver() 25 | config_tf = tf.ConfigProto() 26 | config_tf.gpu_options.allow_growth = True 27 | config_tf.gpu_options.per_process_gpu_memory_fraction = 0.2 28 | 29 | eval_config = {"data": "mnist", 30 | "model_type": "cnn", 31 | "num_eval_examples": 200, 32 | "eval_batch_size": 100} 33 | 34 | nat_accs = np.zeros(len(models)) 35 | adv_accs = np.zeros((len(models), len(attacks))) 36 | 37 | any_attack = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 38 | any_rt = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 39 | any_linf = np.ones((len(models), eval_config["num_eval_examples"])).astype(np.bool) 40 | 41 | with tf.Session(config=config_tf) as sess: 42 | for m_idx, (model_dir, epoch) in enumerate(models): 43 | ckpt = get_ckpt(model_dir, epoch) 44 | saver.restore(sess, ckpt) 45 | 46 | print("starting...", model_dir, epoch) 47 | 48 | # lp attacks 49 | nat_acc, total_corr_advs = evaluate(model, attacks, sess, eval_config) 50 | nat_accs[m_idx] = nat_acc 51 | adv_acc = np.mean(total_corr_advs, axis=-1) 52 | adv_accs[m_idx, :len(attacks)] = adv_acc 53 | any_attack[m_idx] &= np.bitwise_and.reduce(np.asarray(total_corr_advs), 0) 54 | 55 | print(model_dir, adv_accs[m_idx]) 56 | model_name = model_dir.split('/')[-1] + "_" + str(epoch) 57 | for i, attack in enumerate(attacks): 58 | np.save("results/mnist/spatial/{}_{}.npy".format(model_name, attack.name), total_corr_advs[i]) 59 | 60 | if attack_configs[i]["type"] == "RT": 61 | any_rt[m_idx] &= total_corr_advs[i] 62 | else: 63 | any_linf[m_idx] &= total_corr_advs[i] 64 | 65 | print(nat_accs) 66 | print(adv_accs) 67 | print("any: ", np.mean(any_attack, axis=-1)) 68 | print("rt: ", np.mean(any_rt, axis=-1)) 69 | print("linf: ", np.mean(any_linf, axis=-1)) 70 | -------------------------------------------------------------------------------- /scripts/utils.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | def get_ckpt(model_dir, epoch): 4 | if epoch is not None and epoch > 0: 5 | ckpts = tf.train.get_checkpoint_state(model_dir).all_model_checkpoint_paths 6 | ckpt = [c for c in ckpts if c.endswith('checkpoint-{}'.format(epoch))] 7 | assert len(ckpt) == 1 8 | cur_checkpoint = ckpt[0] 9 | else: 10 | cur_checkpoint = tf.train.latest_checkpoint(model_dir) 11 | return cur_checkpoint 12 | 13 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | """Trains a model, saving checkpoints and tensorboard summaries along 2 | the way.""" 3 | from __future__ import absolute_import 4 | from __future__ import division 5 | from __future__ 
import print_function 6 | 7 | from datetime import datetime 8 | import json 9 | import os 10 | import shutil 11 | from timeit import default_timer as timer 12 | 13 | import tensorflow as tf 14 | import numpy as np 15 | 16 | from pgd_attack import PGDAttack, compute_grad 17 | from eval import evaluate 18 | 19 | import sys 20 | import logging 21 | 22 | logging.getLogger('tensorflow').setLevel(logging.ERROR) 23 | 24 | model_dir = sys.argv[1] 25 | 26 | try: 27 | with open(model_dir + "/config.json") as config_file: 28 | config = json.load(config_file) 29 | print("opened previous config file") 30 | except IOError: 31 | with open("config.json") as config_file: 32 | config = json.load(config_file) 33 | 34 | # Setting up training parameters 35 | tf.set_random_seed(config['random_seed']) 36 | 37 | max_num_training_steps = config['max_num_training_steps'] 38 | num_output_steps = config['num_output_steps'] 39 | num_summary_steps = config['num_summary_steps'] 40 | num_checkpoint_steps = config['num_checkpoint_steps'] 41 | 42 | batch_size = config['training_batch_size'] 43 | 44 | dataset = config["data"] 45 | assert dataset in ["mnist", "cifar10"] 46 | 47 | num_train_attacks = len(config["train_attacks"]) 48 | multi_attack_mode = config["multi_attack_mode"] 49 | print("num_train_attacks", num_train_attacks) 50 | print("multi_attack_mode", multi_attack_mode) 51 | 52 | step_size_schedule = config['step_size_schedule'] 53 | step_size_schedule = np.asarray(step_size_schedule) 54 | 55 | # strategies for training with adversarial examples from K attacks: 56 | # 57 | # HALF_LR: Keeps the clean batch size fixed 58 | # (so the effective batch size is multiplied by K) and divides the learning rate by K 59 | # 60 | # HALF_BATCH: Divides the clean batch size by K (so the effective batch size remains unchanged). 61 | # This is necessary to avoid memory overflows with the wide ResNet model on CIFAR10 62 | # 63 | if "HALF_LR" in multi_attack_mode: 64 | step_size_schedule[:, 1] *= 1. / num_train_attacks 65 | if "HALF_BATCH" in multi_attack_mode or "ALTERNATE" in multi_attack_mode: 66 | step_size_schedule[:, 0] *= num_train_attacks 67 | max_num_training_steps *= num_train_attacks 68 | max_num_training_steps = int(max_num_training_steps) 69 | 70 | if "HALF_BATCH" in multi_attack_mode: 71 | batch_size *= 1. 
/ num_train_attacks 72 | batch_size = int(batch_size) 73 | print("batch_size", batch_size) 74 | 75 | boundaries = [int(sss[0]) for sss in step_size_schedule] 76 | boundaries = boundaries[1:] 77 | values = [sss[1] for sss in step_size_schedule] 78 | global_step = tf.contrib.framework.get_or_create_global_step() 79 | learning_rate = tf.train.piecewise_constant( 80 | tf.cast(global_step, tf.int32), boundaries, values) 81 | 82 | if dataset == "mnist": 83 | from tensorflow.examples.tutorials.mnist import input_data 84 | from model import Model 85 | 86 | # Setting up the data and the model 87 | mnist = input_data.read_data_sets('MNIST_data', one_hot=False) 88 | num_train_data = 60000 89 | if config["model_type"] == "linear": 90 | x_train = mnist.train.images 91 | y_train = mnist.train.labels 92 | x_test = mnist.test.images 93 | y_test = mnist.test.labels 94 | 95 | pos_train = (y_train == 5) | (y_train == 7) 96 | x_train = x_train[pos_train] 97 | y_train = y_train[pos_train] 98 | y_train = (y_train == 5).astype(np.int64) 99 | pos_test = (y_test == 5) | (y_test == 7) 100 | x_test = x_test[pos_test] 101 | y_test = y_test[pos_test] 102 | y_test = (y_test == 5).astype(np.int64) 103 | 104 | from tensorflow.contrib.learn.python.learn.datasets.mnist import DataSet 105 | from tensorflow.contrib.learn.python.learn.datasets import base 106 | 107 | options = dict(dtype=tf.uint8, reshape=False, seed=None) 108 | train = DataSet(x_train, y_train, **options) 109 | test = DataSet(x_test, y_test, **options) 110 | 111 | mnist = base.Datasets(train=train, validation=None, test=test) 112 | num_train_data = len(x_train) 113 | 114 | model = Model(config) 115 | x_min, x_max = 0.0, 1.0 116 | 117 | # Setting up the optimizer 118 | opt = tf.train.AdamOptimizer(learning_rate) 119 | gv = opt.compute_gradients(model.xent) 120 | train_step = opt.apply_gradients(gv, global_step=global_step) 121 | else: 122 | import cifar10_input 123 | from cifar10_model import Model 124 | 125 | weight_decay = config['weight_decay'] 126 | data_path = config['data_path'] 127 | momentum = config['momentum'] 128 | raw_cifar = cifar10_input.CIFAR10Data(data_path) 129 | num_train_data = 50000 130 | model = Model(config) 131 | x_min, x_max = 0.0, 255.0 132 | 133 | # Setting up the optimizer 134 | total_loss = model.mean_xent + weight_decay * model.weight_decay_loss 135 | opt = tf.train.MomentumOptimizer(learning_rate, momentum) 136 | gv = opt.compute_gradients(total_loss) 137 | train_step = opt.apply_gradients(gv, global_step=global_step) 138 | 139 | num_epochs = (max_num_training_steps * batch_size) // num_train_data 140 | print("num_epochs: {:d}".format(num_epochs)) 141 | print("max_num_training_steps", max_num_training_steps) 142 | print("step_size_schedule", step_size_schedule) 143 | 144 | # Set up adversary 145 | grad = compute_grad(model) 146 | train_attack_configs = [np.asarray(config["attacks"])[i] for i in config["train_attacks"]] 147 | eval_attack_configs = [np.asarray(config["attacks"])[i] for i in config["eval_attacks"]] 148 | train_attacks = [PGDAttack(model, a_config, x_min, x_max, grad) for a_config in train_attack_configs] 149 | 150 | # Optimization that works well on MNIST: do a first epoch with a lower epsilon 151 | start_small = config.get("start_small", False) 152 | if start_small: 153 | train_attack_configs_small = [a.copy() for a in train_attack_configs] 154 | for attack in train_attack_configs_small: 155 | if 'epsilon' in attack: 156 | attack['epsilon'] /= 3.0 157 | else: 158 | attack['spatial_limits'] = [s/3.0 for s in 
attack['spatial_limits']] 159 | train_attacks_small = [PGDAttack(model, a_config, x_min, x_max, grad) for a_config in train_attack_configs_small] 160 | print('start_small', start_small) 161 | 162 | eval_attacks = [PGDAttack(model, a_config, x_min, x_max, grad) for a_config in eval_attack_configs] 163 | 164 | # Setting up the Tensorboard and checkpoint outputs 165 | if not os.path.exists(model_dir): 166 | os.makedirs(model_dir) 167 | shutil.copy('config.json', model_dir) 168 | 169 | eval_dir = os.path.join(model_dir, 'eval') 170 | if not os.path.exists(eval_dir): 171 | os.makedirs(eval_dir) 172 | 173 | train_dir = os.path.join(model_dir, 'train') 174 | if not os.path.exists(train_dir): 175 | os.makedirs(train_dir) 176 | 177 | saver = tf.train.Saver(max_to_keep=100) 178 | tf.summary.scalar('accuracy adv train', model.accuracy, collections=['adv']) 179 | tf.summary.scalar('xent adv train', model.mean_xent, collections=['adv']) 180 | tf.summary.image('images adv train', model.x_image, collections=['adv']) 181 | adv_summaries = tf.summary.merge_all('adv') 182 | 183 | tf.summary.scalar('accuracy_nat_train', model.accuracy, collections=['nat']) 184 | tf.summary.scalar('xent_nat_train', model.mean_xent, collections=['nat']) 185 | tf.summary.scalar('learning_rate', learning_rate, collections=['nat']) 186 | nat_summaries = tf.summary.merge_all('nat') 187 | 188 | eval_summaries_train = [] 189 | for i, attack in enumerate(eval_attacks): 190 | a_type = attack.name 191 | tf.summary.scalar('accuracy adv train {}'.format(a_type), model.accuracy, collections=['adv_{}'.format(i)]) 192 | tf.summary.scalar('xent adv train {}'.format(a_type), model.mean_xent, collections=['adv_{}'.format(i)]) 193 | tf.summary.image('images adv train {}'.format(a_type), model.x_image, collections=['adv_{}'.format(i)]) 194 | eval_summaries_train.append(tf.summary.merge_all('adv_{}'.format(i))) 195 | 196 | config_tf = tf.ConfigProto() 197 | config_tf.gpu_options.allow_growth = True 198 | if dataset == "mnist": 199 | config_tf.gpu_options.per_process_gpu_memory_fraction = 0.2 200 | else: 201 | config_tf.gpu_options.per_process_gpu_memory_fraction = 1.0 202 | config_tf.allow_soft_placement = True 203 | 204 | with tf.Session(config=config_tf) as sess: 205 | if dataset == "cifar10": 206 | # initialize data augmentation 207 | cifar = cifar10_input.AugmentedCIFAR10Data(raw_cifar, sess) 208 | 209 | # Initialize the summary writer, global variables, and our time counter. 
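# (If model_dir already holds a checkpoint, it is restored below and training
# resumes from its last saved global step.)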
210 | summary_writer = tf.summary.FileWriter(train_dir, sess.graph) 211 | test_summary_writer = tf.summary.FileWriter(eval_dir) 212 | sess.run(tf.global_variables_initializer()) 213 | training_time = 0.0 214 | 215 | cur_checkpoint = tf.train.latest_checkpoint(model_dir) 216 | if cur_checkpoint is not None: 217 | saver.restore(sess, cur_checkpoint) 218 | else: 219 | print("no checkpoint to load") 220 | 221 | start_step = sess.run(global_step) 222 | 223 | # Main training loop 224 | for ii in range(start_step, max_num_training_steps + 1): 225 | curr_epoch = (ii * batch_size) // num_train_data 226 | 227 | if dataset == "mnist": 228 | x_batch, y_batch = mnist.train.next_batch(batch_size) 229 | x_batch = x_batch.reshape(-1, 28, 28, 1) 230 | x_batch_no_aug = x_batch 231 | else: 232 | x_batch_no_aug, x_batch, y_batch = cifar.train_data.get_next_batch(batch_size, multiple_passes=True) 233 | x_batch_no_aug = x_batch_no_aug.astype(np.float32) 234 | x_batch = x_batch.astype(np.float32) 235 | 236 | noop_trans = np.zeros([len(x_batch), 3]) 237 | 238 | if start_small and curr_epoch == 0: 239 | curr_train_attacks = train_attacks_small 240 | else: 241 | curr_train_attacks = train_attacks 242 | 243 | # Compute Adversarial Perturbations 244 | start = timer() 245 | if multi_attack_mode == "ALTERNATE": 246 | # alternate between attacks each batch (does not work very well) 247 | curr_attack = curr_train_attacks[ii % num_train_attacks] 248 | adv_outputs = [curr_attack.perturb(x_batch, y_batch, sess, x_nat_no_aug=x_batch_no_aug)] 249 | 250 | elif multi_attack_mode == "MAX": 251 | # choose best attack for each input 252 | adv_outputs = [attack.perturb(x_batch, y_batch, sess, x_nat_no_aug=x_batch_no_aug) for attack in curr_train_attacks] 253 | losses = np.zeros((num_train_attacks, len(x_batch))) 254 | for j in range(num_train_attacks): 255 | x = adv_outputs[j][0] 256 | t = adv_outputs[j][1] 257 | losses[j] = sess.run(model.y_xent, 258 | feed_dict={model.x_input: x, 259 | model.y_input: y_batch, 260 | model.is_training: False, 261 | model.transform: t if t is not None else noop_trans}) 262 | best_idx = np.argmax(losses, axis=0) # shape (batch_size,) 263 | best_x = np.asarray([adv_outputs[best_idx[j]][0][j] for j in range(len(x_batch))]) 264 | best_t = np.asarray([adv_outputs[best_idx[j]][1][j] for j in range(len(x_batch))]) 265 | adv_outputs = [(best_x, best_t)] 266 | 267 | else: 268 | # concatenate multiple attacks (default) 269 | adv_outputs = [attack.perturb(x_batch, y_batch, sess, x_nat_no_aug=x_batch_no_aug) for attack in curr_train_attacks] 270 | 271 | x_batch_advs = [a[0] for a in adv_outputs] 272 | all_trans = [a[1] if a[1] is not None else noop_trans for a in adv_outputs] 273 | end = timer() 274 | training_time += end - start 275 | 276 | nat_dict = {model.x_input: x_batch, 277 | model.y_input: y_batch, 278 | model.is_training: False, 279 | model.transform: noop_trans} 280 | 281 | if num_train_attacks > 0: 282 | x_batch_adv = np.concatenate(x_batch_advs) 283 | y_batch_adv = np.concatenate([y_batch for _ in range(len(x_batch_advs))]) 284 | trans_adv = np.concatenate(all_trans) 285 | 286 | adv_dict = {model.x_input: x_batch_adv, 287 | model.y_input: y_batch_adv, 288 | model.is_training: False, 289 | model.transform: trans_adv} 290 | else: 291 | adv_dict = nat_dict 292 | 293 | if ii % num_output_steps == 0: 294 | print('Step {} (epoch {}): ({})'.format(ii, curr_epoch, datetime.now())) 295 | if ii > 0: 296 | print(' {} examples per second'.format(num_output_steps * batch_size / training_time)) 297 | training_time 
= 0.0 298 | summary = sess.run(adv_summaries, feed_dict=adv_dict) 299 | summary_writer.add_summary(summary, global_step.eval(sess)) 300 | summary = sess.run(nat_summaries, feed_dict=nat_dict) 301 | summary_writer.add_summary(summary, global_step.eval(sess)) 302 | 303 | # Output to stdout and tensorboard summaries 304 | if ii % num_summary_steps == 0: 305 | nat_acc = sess.run(model.accuracy, feed_dict=nat_dict) 306 | print(' training nat accuracy {:.4}%'.format(nat_acc * 100)) 307 | 308 | for a_idx, attack in enumerate(eval_attacks): 309 | x_batch_adv_eval, trans_eval = attack.perturb(x_batch, y_batch, sess, x_nat_no_aug=x_batch_no_aug) 310 | 311 | adv_dict_eval = {model.x_input: x_batch_adv_eval, 312 | model.y_input: y_batch, 313 | model.is_training: False, 314 | model.transform: trans_eval if trans_eval is not None else noop_trans} 315 | 316 | adv_acc = sess.run(model.accuracy, feed_dict=adv_dict_eval) 317 | print(' training adv accuracy ({}) {:.4}%'.format(attack.name, adv_acc * 100)) 318 | 319 | summary = sess.run(eval_summaries_train[a_idx], feed_dict=adv_dict_eval) 320 | summary_writer.add_summary(summary, global_step.eval(sess)) 321 | 322 | evaluate(model, eval_attacks, sess, config, plot=False, 323 | summary_writer=test_summary_writer, eval_train=False) 324 | 325 | # Write a checkpoint 326 | if ii % num_checkpoint_steps == 0 and ii > 0: 327 | saver.save(sess, os.path.join(model_dir, 'checkpoint'), global_step=global_step) 328 | 329 | # Actual training step 330 | start = timer() 331 | adv_dict[model.is_training] = True 332 | _, curr_gv = sess.run([train_step, [g for (g, v) in gv]], feed_dict=adv_dict) 333 | end = timer() 334 | training_time += end - start 335 | --------------------------------------------------------------------------------