├── LICENSE ├── README.md ├── attack_utils.py ├── carlini.py ├── fgs.py ├── mnist.py ├── models └── .gitignore ├── simple_eval.py ├── tf_utils.py ├── train.py └── train_adv.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 ftramer 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Ensemble Adversarial Training 2 | 3 | This repository contains code to reproduce results from the paper: 4 | 5 | **Ensemble Adversarial Training: Attacks and Defenses**
6 | *Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh and Patrick McDaniel*
7 | ArXiv report: https://arxiv.org/abs/1705.07204 8 | 9 |
10 | 11 | ###### REQUIREMENTS 12 | 13 | The code was tested with Python 2.7.12, Tensorflow 1.0.1 and Keras 1.2.2. 14 | 15 | ###### EXPERIMENTS 16 | 17 | We start by training a few simple MNIST models. These are described in _mnist.py_. 18 | 19 | ``` 20 | python -m train models/modelA --type=0 21 | python -m train models/modelB --type=1 22 | python -m train models/modelC --type=2 23 | python -m train models/modelD --type=3 24 | ``` 25 | 26 | Then, we can use (standard) Adversarial Training or Ensemble Adversarial Training 27 | (we train for either 6 or 12 epochs in the paper). With Ensemble Adversarial 28 | Training, we additionally augment the training data with adversarial examples 29 | crafted from external pre-trained models (models A, C and D here): 30 | 31 | ``` 32 | python -m train_adv models/modelA_adv --type=0 --epochs=12 33 | python -m train_adv models/modelA_ens models/modelA models/modelC models/modelD --type=0 --epochs=12 34 | ``` 35 | 36 | The accuracy of the models on the MNIST test set can be computed using 37 | 38 | ``` 39 | python -m simple_eval test [model(s)] 40 | ``` 41 | 42 | To evaluate robustness to various attacks, we use 43 | 44 | ``` 45 | python -m simple_eval [attack] [source_model] [target_model(s)] [--parameters (opt)] 46 | ``` 47 | 48 | The attack can be: 49 | 50 | | Attack | Description | Parameters | 51 | | ------ | ----------- | ---------- | 52 | | fgs | Standard FGSM | *eps* (the norm of the perturbation) | 53 | |rand_fgs| Our FGSM variant that prepends the gradient computation by a random step | *eps* (the norm of the total perturbation); *alpha* (the norm of the random perturbation) | 54 | | ifgs | The iterative FGSM | *eps* (the norm of the perturbation); *steps* (the number of iterative FGSM steps) | 55 | | CW | The Carlini and Wagner attack | *eps* (the norm of the perturbation); *kappa* (attack confidence) | 56 | 57 | Note that due to GPU non-determinism, the obtained results may vary by a few 58 | percent compared to those reported in the paper. 59 | Nevertheless, we consistently observe the following: 60 | 61 | * Standard Adversarial Training performs worse on transferred FGSM 62 | examples than on a "direct" FGSM attack on the model due to a *gradient masking* effect. 63 | * Our RAND+FGSM attack outperforms the FGSM when applied to any model. The gap 64 | is particularly pronounced for the adversarially trained model. 65 | * Ensemble Adversarial Training is more robust than (standard) adversarial 66 | training to transferred examples computed using any of the attacks above. 67 | 68 | ###### CONTACT 69 | Questions and suggestions can be sent to tramer@cs.stanford.edu 70 | -------------------------------------------------------------------------------- /attack_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import keras.backend as K 3 | 4 | from tensorflow.python.platform import flags 5 | FLAGS = flags.FLAGS 6 | 7 | 8 | def linf_loss(X1, X2): 9 | return np.max(np.abs(X1 - X2), axis=(1, 2, 3)) 10 | 11 | 12 | def gen_adv_loss(logits, y, loss='logloss', mean=False): 13 | """ 14 | Generate the loss function. 15 | """ 16 | 17 | if loss == 'training': 18 | # use the model's output instead of the true labels to avoid 19 | # label leaking at training time 20 | y = K.cast(K.equal(logits, K.max(logits, 1, keepdims=True)), "float32") 21 | y = y / K.sum(y, 1, keepdims=True) 22 | out = K.categorical_crossentropy(logits, y, from_logits=True) 23 | elif loss == 'logloss': 24 | out = K.categorical_crossentropy(logits, y, from_logits=True) 25 | else: 26 | raise ValueError("Unknown loss: {}".format(loss)) 27 | 28 | if mean: 29 | out = K.mean(out) 30 | else: 31 | out = K.sum(out) 32 | return out 33 | 34 | 35 | def gen_grad(x, logits, y, loss='logloss'): 36 | """ 37 | Generate the gradient of the loss function. 38 | """ 39 | 40 | adv_loss = gen_adv_loss(logits, y, loss) 41 | 42 | # Define gradient of loss wrt input 43 | grad = K.gradients(adv_loss, [x])[0] 44 | return grad 45 | -------------------------------------------------------------------------------- /carlini.py: -------------------------------------------------------------------------------- 1 | ## li_attack.py -- attack a network optimizing for l_infinity distance 2 | ## 3 | ## Adapted from https://github.com/carlini/nn_robust_attacks 4 | ## 5 | ## Copyright (C) 2016, Nicholas Carlini . 6 | ## 7 | ## This program is licenced under the BSD 2-Clause licence, 8 | ## contained in the LICENCE file in this directory. 9 | 10 | import tensorflow as tf 11 | import numpy as np 12 | from tensorflow.python.platform import flags 13 | import keras.backend as K 14 | 15 | MAX_ITERATIONS = 1000 # number of iterations to perform gradient descent 16 | ABORT_EARLY = True # abort gradient descent upon first valid solution 17 | INITIAL_CONST = 1e-3 # the first value of c to start at 18 | LEARNING_RATE = 5e-3 # larger values converge faster to less accurate results 19 | LARGEST_CONST = 2e+1 # the largest value of c to go up to before giving up 20 | TARGETED = True # should we target one specific class? or just be wrong? 21 | CONST_FACTOR = 10.0 # f>1, rate at which we increase constant, smaller better 22 | CONFIDENCE = 0 # how strong the adversarial example should be 23 | EPS = 0.3 24 | 25 | FLAGS = flags.FLAGS 26 | 27 | 28 | class CarliniLi: 29 | def __init__(self, sess, model, 30 | targeted = TARGETED, learning_rate = LEARNING_RATE, 31 | max_iterations = MAX_ITERATIONS, abort_early = ABORT_EARLY, 32 | initial_const = INITIAL_CONST, largest_const = LARGEST_CONST, 33 | const_factor = CONST_FACTOR, confidence = CONFIDENCE, eps=EPS): 34 | """ 35 | The L_infinity optimized attack. 36 | Returns adversarial examples for the supplied model. 37 | targeted: True if we should perform a targetted attack, False otherwise. 38 | learning_rate: The learning rate for the attack algorithm. Smaller values 39 | produce better results but are slower to converge. 40 | max_iterations: The maximum number of iterations. Larger values are more 41 | accurate; setting too small will require a large learning rate and will 42 | produce poor results. 43 | abort_early: If true, allows early aborts if gradient descent gets stuck. 44 | initial_const: The initial tradeoff-constant to use to tune the relative 45 | importance of distance and confidence. Should be set to a very small 46 | value (but positive). 47 | largest_const: The largest constant to use until we report failure. Should 48 | be set to a very large value. 49 | reduce_const: If true, after each successful attack, make const smaller. 50 | decrease_factor: Rate at which we should decrease tau, less than one. 51 | Larger produces better quality results. 52 | const_factor: The rate at which we should increase the constant, when the 53 | previous constant failed. Should be greater than one, smaller is better. 54 | """ 55 | self.model = model 56 | self.sess = sess 57 | 58 | self.TARGETED = targeted 59 | self.LEARNING_RATE = learning_rate 60 | self.MAX_ITERATIONS = max_iterations 61 | self.ABORT_EARLY = abort_early 62 | self.INITIAL_CONST = initial_const 63 | self.LARGEST_CONST = largest_const 64 | self.const_factor = const_factor 65 | self.CONFIDENCE = confidence 66 | self.EPS = eps 67 | 68 | self.grad = self.gradient_descent(sess, model) 69 | 70 | def gradient_descent(self, sess, model): 71 | def compare(outputs, labels): 72 | y = np.argmax(labels) 73 | pred = np.argmax(outputs) 74 | 75 | if self.TARGETED: 76 | return (pred == y) 77 | else: 78 | return (pred != y) 79 | 80 | shape = (1, FLAGS.IMAGE_ROWS, FLAGS.IMAGE_COLS, FLAGS.NUM_CHANNELS) 81 | 82 | # the variable to optimize over 83 | modifier = tf.Variable(np.zeros(shape,dtype=np.float32)) 84 | 85 | tau = tf.placeholder(tf.float32, []) 86 | simg = tf.placeholder(tf.float32, shape) 87 | timg = tf.placeholder(tf.float32, shape) 88 | tlab = tf.placeholder(tf.float32, (1, FLAGS.NUM_CLASSES)) 89 | const = tf.placeholder(tf.float32, []) 90 | 91 | newimg = tf.clip_by_value(simg + modifier, 0, 1) 92 | 93 | output = model(newimg) 94 | orig_output = model(timg) 95 | 96 | real = tf.reduce_sum((tlab)*output) 97 | other = tf.reduce_max((1-tlab)*output - (tlab*10000)) 98 | 99 | if self.TARGETED: 100 | # if targetted, optimize for making the other class most likely 101 | loss1 = tf.maximum(0.0,other-real+self.CONFIDENCE) 102 | else: 103 | # if untargeted, optimize for making this class least likely. 104 | loss1 = tf.maximum(0.0,real-other+self.CONFIDENCE) 105 | 106 | # sum up the losses 107 | loss2 = tf.reduce_sum(tf.maximum(0.0, tf.abs(newimg-timg)-tau)) 108 | loss = const*loss1+loss2 109 | 110 | # setup the adam optimizer and keep track of variables we're creating 111 | start_vars = set(x.name for x in tf.global_variables()) 112 | optimizer = tf.train.AdamOptimizer(self.LEARNING_RATE) 113 | train = optimizer.minimize(loss, var_list=[modifier]) 114 | 115 | end_vars = tf.global_variables() 116 | new_vars = [x for x in end_vars if x.name not in start_vars] 117 | init = tf.variables_initializer(var_list=[modifier]+new_vars) 118 | 119 | def doit(oimgs, labs, starts, tt, CONST): 120 | prev_scores = None 121 | 122 | imgs = np.array(oimgs) 123 | starts = np.array(starts) 124 | 125 | # initialize the variables 126 | sess.run(init) 127 | while CONST < self.LARGEST_CONST: 128 | # try solving for each value of the constant 129 | #print('try const', CONST) 130 | for step in range(self.MAX_ITERATIONS): 131 | feed_dict={timg: imgs, 132 | tlab:labs, 133 | tau: tt, 134 | simg: starts, 135 | const: CONST, 136 | K.learning_phase(): 0} 137 | 138 | #if step % (self.MAX_ITERATIONS//10) == 0: 139 | # print(step, sess.run((loss,loss1,loss2),feed_dict=feed_dict)) 140 | 141 | # perform the update step 142 | _, works, linf_slack = sess.run([train, loss, loss2], feed_dict=feed_dict) 143 | 144 | # it worked 145 | if works < .0001*CONST and (self.ABORT_EARLY or step == CONST-1): 146 | get = sess.run(K.softmax(output), feed_dict=feed_dict) 147 | works = compare(get, labs) 148 | if works: 149 | scores, origscores, nimg = sess.run((output,orig_output,newimg),feed_dict=feed_dict) 150 | return scores, origscores, nimg, CONST 151 | 152 | # we didn't succeed, increase constant and try again 153 | 154 | if linf_slack >= 0.1 * self.EPS: 155 | # perturbation is too large 156 | if prev_scores is None: 157 | return prev_scores 158 | return prev_scores, prev_origscores, prev_nimg, CONST 159 | else: 160 | # didn't reach target confidence 161 | CONST *= self.const_factor 162 | 163 | prev_scores, prev_origscores, prev_nimg = sess.run((output,orig_output,newimg),feed_dict=feed_dict) 164 | 165 | scores, origscores, nimg = sess.run((output,orig_output,newimg),feed_dict=feed_dict) 166 | return scores, origscores, nimg, CONST 167 | 168 | return doit 169 | 170 | def attack(self, imgs, targets): 171 | """ 172 | Perform the L_0 attack on the given images for the given targets. 173 | If self.targeted is true, then the targets represents the target labels. 174 | If self.targeted is false, then targets are the original class labels. 175 | """ 176 | r = [] 177 | i = 0 178 | for img,target in zip(imgs, targets): 179 | print i 180 | r.extend(self.attack_single(img, target)) 181 | i += 1 182 | return np.array(r) 183 | 184 | def attack_single(self, img, target): 185 | """ 186 | Run the attack on a single image and label 187 | """ 188 | 189 | # the previous image 190 | prev = np.copy(img).reshape((1, FLAGS.IMAGE_ROWS, FLAGS.IMAGE_COLS, FLAGS.NUM_CHANNELS)) 191 | tau = self.EPS 192 | const = self.INITIAL_CONST 193 | 194 | res = self.grad([np.copy(img)], [target], np.copy(prev), tau, const) 195 | 196 | if res is None: 197 | # the attack failed, we return this as our final answer 198 | return prev 199 | 200 | scores, origscores, nimg, const = res 201 | prev = nimg 202 | return prev 203 | -------------------------------------------------------------------------------- /fgs.py: -------------------------------------------------------------------------------- 1 | 2 | import keras.backend as K 3 | from attack_utils import gen_grad 4 | 5 | 6 | def symbolic_fgs(x, grad, eps=0.3, clipping=True): 7 | """ 8 | FGSM attack. 9 | """ 10 | 11 | # signed gradient 12 | normed_grad = K.sign(grad) 13 | 14 | # Multiply by constant epsilon 15 | scaled_grad = eps * normed_grad 16 | 17 | # Add perturbation to original example to obtain adversarial example 18 | adv_x = K.stop_gradient(x + scaled_grad) 19 | 20 | if clipping: 21 | adv_x = K.clip(adv_x, 0, 1) 22 | return adv_x 23 | 24 | 25 | def iter_fgs(model, x, y, steps, eps): 26 | """ 27 | I-FGSM attack. 28 | """ 29 | 30 | adv_x = x 31 | 32 | # iteratively apply the FGSM with small step size 33 | for i in range(steps): 34 | logits = model(adv_x) 35 | grad = gen_grad(adv_x, logits, y) 36 | 37 | adv_x = symbolic_fgs(adv_x, grad, eps, True) 38 | return adv_x 39 | -------------------------------------------------------------------------------- /mnist.py: -------------------------------------------------------------------------------- 1 | from keras.datasets import mnist 2 | from keras.models import Sequential, model_from_json 3 | from keras.layers import Dense, Dropout, Activation, Flatten, Input 4 | from keras.layers import Convolution2D, MaxPooling2D 5 | from keras.preprocessing.image import ImageDataGenerator 6 | from keras.utils import np_utils 7 | 8 | import argparse 9 | import numpy as np 10 | 11 | from tensorflow.python.platform import flags 12 | FLAGS = flags.FLAGS 13 | 14 | 15 | def set_mnist_flags(): 16 | try: 17 | flags.DEFINE_integer('BATCH_SIZE', 64, 'Size of training batches') 18 | except argparse.ArgumentError: 19 | pass 20 | 21 | flags.DEFINE_integer('NUM_CLASSES', 10, 'Number of classification classes') 22 | flags.DEFINE_integer('IMAGE_ROWS', 28, 'Input row dimension') 23 | flags.DEFINE_integer('IMAGE_COLS', 28, 'Input column dimension') 24 | flags.DEFINE_integer('NUM_CHANNELS', 1, 'Input depth dimension') 25 | 26 | 27 | def data_mnist(one_hot=True): 28 | """ 29 | Preprocess MNIST dataset 30 | """ 31 | # the data, shuffled and split between train and test sets 32 | (X_train, y_train), (X_test, y_test) = mnist.load_data() 33 | 34 | X_train = X_train.reshape(X_train.shape[0], 35 | FLAGS.IMAGE_ROWS, 36 | FLAGS.IMAGE_COLS, 37 | FLAGS.NUM_CHANNELS) 38 | 39 | X_test = X_test.reshape(X_test.shape[0], 40 | FLAGS.IMAGE_ROWS, 41 | FLAGS.IMAGE_COLS, 42 | FLAGS.NUM_CHANNELS) 43 | 44 | X_train = X_train.astype('float32') 45 | X_test = X_test.astype('float32') 46 | X_train /= 255 47 | X_test /= 255 48 | print('X_train shape:', X_train.shape) 49 | print(X_train.shape[0], 'train samples') 50 | print(X_test.shape[0], 'test samples') 51 | 52 | print "Loaded MNIST test data." 53 | 54 | if one_hot: 55 | # convert class vectors to binary class matrices 56 | y_train = np_utils.to_categorical(y_train, FLAGS.NUM_CLASSES).astype(np.float32) 57 | y_test = np_utils.to_categorical(y_test, FLAGS.NUM_CLASSES).astype(np.float32) 58 | 59 | return X_train, y_train, X_test, y_test 60 | 61 | 62 | def modelA(): 63 | model = Sequential() 64 | model.add(Convolution2D(64, 5, 5, 65 | border_mode='valid', 66 | input_shape=(FLAGS.IMAGE_ROWS, 67 | FLAGS.IMAGE_COLS, 68 | FLAGS.NUM_CHANNELS))) 69 | model.add(Activation('relu')) 70 | 71 | model.add(Convolution2D(64, 5, 5)) 72 | model.add(Activation('relu')) 73 | 74 | model.add(Dropout(0.25)) 75 | 76 | model.add(Flatten()) 77 | model.add(Dense(128)) 78 | model.add(Activation('relu')) 79 | 80 | model.add(Dropout(0.5)) 81 | model.add(Dense(FLAGS.NUM_CLASSES)) 82 | return model 83 | 84 | 85 | def modelB(): 86 | model = Sequential() 87 | model.add(Dropout(0.2, input_shape=(FLAGS.IMAGE_ROWS, 88 | FLAGS.IMAGE_COLS, 89 | FLAGS.NUM_CHANNELS))) 90 | model.add(Convolution2D(64, 8, 8, 91 | subsample=(2, 2), 92 | border_mode='same')) 93 | model.add(Activation('relu')) 94 | 95 | model.add(Convolution2D(128, 6, 6, 96 | subsample=(2, 2), 97 | border_mode='valid')) 98 | model.add(Activation('relu')) 99 | 100 | model.add(Convolution2D(128, 5, 5, 101 | subsample=(1, 1))) 102 | model.add(Activation('relu')) 103 | 104 | model.add(Dropout(0.5)) 105 | 106 | model.add(Flatten()) 107 | model.add(Dense(FLAGS.NUM_CLASSES)) 108 | return model 109 | 110 | 111 | def modelC(): 112 | model = Sequential() 113 | model.add(Convolution2D(128, 3, 3, 114 | border_mode='valid', 115 | input_shape=(FLAGS.IMAGE_ROWS, 116 | FLAGS.IMAGE_COLS, 117 | FLAGS.NUM_CHANNELS))) 118 | model.add(Activation('relu')) 119 | 120 | model.add(Convolution2D(64, 3, 3)) 121 | model.add(Activation('relu')) 122 | 123 | model.add(Dropout(0.25)) 124 | 125 | model.add(Flatten()) 126 | model.add(Dense(128)) 127 | model.add(Activation('relu')) 128 | 129 | model.add(Dropout(0.5)) 130 | model.add(Dense(FLAGS.NUM_CLASSES)) 131 | return model 132 | 133 | 134 | def modelD(): 135 | model = Sequential() 136 | 137 | model.add(Flatten(input_shape=(FLAGS.IMAGE_ROWS, 138 | FLAGS.IMAGE_COLS, 139 | FLAGS.NUM_CHANNELS))) 140 | 141 | model.add(Dense(300, init='he_normal', activation='relu')) 142 | model.add(Dropout(0.5)) 143 | model.add(Dense(300, init='he_normal', activation='relu')) 144 | model.add(Dropout(0.5)) 145 | model.add(Dense(300, init='he_normal', activation='relu')) 146 | model.add(Dropout(0.5)) 147 | model.add(Dense(300, init='he_normal', activation='relu')) 148 | model.add(Dropout(0.5)) 149 | 150 | model.add(Dense(FLAGS.NUM_CLASSES)) 151 | return model 152 | 153 | 154 | def model_mnist(type=1): 155 | """ 156 | Defines MNIST model using Keras sequential model 157 | """ 158 | 159 | models = [modelA, modelB, modelC, modelD] 160 | 161 | return models[type]() 162 | 163 | 164 | def data_gen_mnist(X_train): 165 | datagen = ImageDataGenerator() 166 | 167 | datagen.fit(X_train) 168 | return datagen 169 | 170 | 171 | def load_model(model_path, type=1): 172 | 173 | try: 174 | with open(model_path+'.json', 'r') as f: 175 | json_string = f.read() 176 | model = model_from_json(json_string) 177 | except IOError: 178 | model = model_mnist(type=type) 179 | 180 | model.load_weights(model_path) 181 | return model 182 | -------------------------------------------------------------------------------- /models/.gitignore: -------------------------------------------------------------------------------- 1 | # Ignore everything in this directory 2 | * 3 | # Except this file 4 | !.gitignore 5 | -------------------------------------------------------------------------------- /simple_eval.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import tensorflow as tf 3 | import keras.backend as K 4 | from mnist import data_mnist, set_mnist_flags, load_model 5 | from fgs import symbolic_fgs, iter_fgs 6 | from carlini import CarliniLi 7 | from attack_utils import gen_grad 8 | from tf_utils import tf_test_error_rate, batch_eval 9 | from os.path import basename 10 | 11 | from tensorflow.python.platform import flags 12 | FLAGS = flags.FLAGS 13 | 14 | 15 | def main(attack, src_model_name, target_model_names): 16 | np.random.seed(0) 17 | tf.set_random_seed(0) 18 | 19 | flags.DEFINE_integer('BATCH_SIZE', 10, 'Size of batches') 20 | set_mnist_flags() 21 | 22 | x = K.placeholder((None, 23 | FLAGS.IMAGE_ROWS, 24 | FLAGS.IMAGE_COLS, 25 | FLAGS.NUM_CHANNELS)) 26 | 27 | y = K.placeholder((None, FLAGS.NUM_CLASSES)) 28 | 29 | _, _, X_test, Y_test = data_mnist() 30 | 31 | # source model for crafting adversarial examples 32 | src_model = load_model(src_model_name) 33 | 34 | # model(s) to target 35 | target_models = [None] * len(target_model_names) 36 | for i in range(len(target_model_names)): 37 | target_models[i] = load_model(target_model_names[i]) 38 | 39 | # simply compute test error 40 | if attack == "test": 41 | err = tf_test_error_rate(src_model, x, X_test, Y_test) 42 | print '{}: {:.1f}'.format(basename(src_model_name), err) 43 | 44 | for (name, target_model) in zip(target_model_names, target_models): 45 | err = tf_test_error_rate(target_model, x, X_test, Y_test) 46 | print '{}: {:.1f}'.format(basename(name), err) 47 | return 48 | 49 | eps = args.eps 50 | 51 | # take the random step in the RAND+FGSM 52 | if attack == "rand_fgs": 53 | X_test = np.clip( 54 | X_test + args.alpha * np.sign(np.random.randn(*X_test.shape)), 55 | 0.0, 1.0) 56 | eps -= args.alpha 57 | 58 | logits = src_model(x) 59 | grad = gen_grad(x, logits, y) 60 | 61 | # FGSM and RAND+FGSM one-shot attack 62 | if attack in ["fgs", "rand_fgs"]: 63 | adv_x = symbolic_fgs(x, grad, eps=eps) 64 | 65 | # iterative FGSM 66 | if attack == "ifgs": 67 | adv_x = iter_fgs(src_model, x, y, steps=args.steps, eps=args.eps/args.steps) 68 | 69 | # Carlini & Wagner attack 70 | if attack == "CW": 71 | X_test = X_test[0:1000] 72 | Y_test = Y_test[0:1000] 73 | 74 | cli = CarliniLi(K.get_session(), src_model, 75 | targeted=False, confidence=args.kappa, eps=args.eps) 76 | 77 | X_adv = cli.attack(X_test, Y_test) 78 | 79 | r = np.clip(X_adv - X_test, -args.eps, args.eps) 80 | X_adv = X_test + r 81 | 82 | err = tf_test_error_rate(src_model, x, X_adv, Y_test) 83 | print '{}->{}: {:.1f}'.format(basename(src_model_name), basename(src_model_name), err) 84 | 85 | for (name, target_model) in zip(target_model_names, target_models): 86 | err = tf_test_error_rate(target_model, x, X_adv, Y_test) 87 | print '{}->{}: {:.1f}'.format(basename(src_model_name), basename(name), err) 88 | 89 | return 90 | 91 | # compute the adversarial examples and evaluate 92 | X_adv = batch_eval([x, y], [adv_x], [X_test, Y_test])[0] 93 | 94 | # white-box attack 95 | err = tf_test_error_rate(src_model, x, X_adv, Y_test) 96 | print '{}->{}: {:.1f}'.format(basename(src_model_name), basename(src_model_name), err) 97 | 98 | # black-box attack 99 | for (name, target_model) in zip(target_model_names, target_models): 100 | err = tf_test_error_rate(target_model, x, X_adv, Y_test) 101 | print '{}->{}: {:.1f}'.format(basename(src_model_name), basename(name), err) 102 | 103 | 104 | if __name__ == "__main__": 105 | import argparse 106 | parser = argparse.ArgumentParser() 107 | parser.add_argument("attack", help="name of attack", 108 | choices=["test", "fgs", "ifgs", "rand_fgs", "CW"]) 109 | parser.add_argument("src_model", help="source model for attack") 110 | parser.add_argument('target_models', nargs='*', 111 | help='path to target model(s)') 112 | parser.add_argument("--eps", type=float, default=0.3, 113 | help="FGS attack scale") 114 | parser.add_argument("--alpha", type=float, default=0.05, 115 | help="RAND+FGSM random perturbation scale") 116 | parser.add_argument("--steps", type=int, default=10, 117 | help="Iterated FGS steps") 118 | parser.add_argument("--kappa", type=float, default=100, 119 | help="CW attack confidence") 120 | 121 | args = parser.parse_args() 122 | main(args.attack, args.src_model, args.target_models) 123 | -------------------------------------------------------------------------------- /tf_utils.py: -------------------------------------------------------------------------------- 1 | import keras.backend as K 2 | import numpy as np 3 | import tensorflow as tf 4 | from tensorflow.python.platform import flags 5 | from attack_utils import gen_adv_loss 6 | 7 | import time 8 | import sys 9 | 10 | FLAGS = flags.FLAGS 11 | EVAL_FREQUENCY = 100 12 | 13 | 14 | def batch_eval(tf_inputs, tf_outputs, numpy_inputs): 15 | """ 16 | A helper function that computes a tensor on numpy inputs by batches. 17 | From: https://github.com/openai/cleverhans/blob/master/cleverhans/utils_tf.py 18 | """ 19 | 20 | n = len(numpy_inputs) 21 | assert n > 0 22 | assert n == len(tf_inputs) 23 | m = numpy_inputs[0].shape[0] 24 | for i in range(1, n): 25 | assert numpy_inputs[i].shape[0] == m 26 | 27 | out = [] 28 | for _ in tf_outputs: 29 | out.append([]) 30 | 31 | for start in range(0, m, FLAGS.BATCH_SIZE): 32 | batch = start // FLAGS.BATCH_SIZE 33 | 34 | # Compute batch start and end indices 35 | start = batch * FLAGS.BATCH_SIZE 36 | end = start + FLAGS.BATCH_SIZE 37 | numpy_input_batches = [numpy_input[start:end] 38 | for numpy_input in numpy_inputs] 39 | cur_batch_size = numpy_input_batches[0].shape[0] 40 | assert cur_batch_size <= FLAGS.BATCH_SIZE 41 | for e in numpy_input_batches: 42 | assert e.shape[0] == cur_batch_size 43 | 44 | feed_dict = dict(zip(tf_inputs, numpy_input_batches)) 45 | feed_dict[K.learning_phase()] = 0 46 | numpy_output_batches = K.get_session().run(tf_outputs, 47 | feed_dict=feed_dict) 48 | for e in numpy_output_batches: 49 | assert e.shape[0] == cur_batch_size, e.shape 50 | for out_elem, numpy_output_batch in zip(out, numpy_output_batches): 51 | out_elem.append(numpy_output_batch) 52 | 53 | out = [np.concatenate(x, axis=0) for x in out] 54 | for e in out: 55 | assert e.shape[0] == m, e.shape 56 | return out 57 | 58 | 59 | def tf_train(x, y, model, X_train, Y_train, generator, x_advs=None): 60 | old_vars = set(tf.all_variables()) 61 | train_size = Y_train.shape[0] 62 | 63 | # Generate cross-entropy loss for training 64 | logits = model(x) 65 | preds = K.softmax(logits) 66 | l1 = gen_adv_loss(logits, y, mean=True) 67 | 68 | # add adversarial training loss 69 | if x_advs is not None: 70 | idx = tf.placeholder(dtype=np.int32) 71 | logits_adv = model(tf.stack(x_advs)[idx]) 72 | l2 = gen_adv_loss(logits_adv, y, mean=True) 73 | loss = 0.5*(l1+l2) 74 | else: 75 | l2 = tf.constant(0) 76 | loss = l1 77 | 78 | optimizer = tf.train.AdamOptimizer().minimize(loss) 79 | 80 | # Run all the initializers to prepare the trainable parameters. 81 | K.get_session().run(tf.initialize_variables( 82 | set(tf.all_variables()) - old_vars)) 83 | start_time = time.time() 84 | print('Initialized!') 85 | 86 | # Loop through training steps. 87 | num_steps = int(FLAGS.NUM_EPOCHS * train_size + FLAGS.BATCH_SIZE - 1) // FLAGS.BATCH_SIZE 88 | 89 | step = 0 90 | for (batch_data, batch_labels) \ 91 | in generator.flow(X_train, Y_train, batch_size=FLAGS.BATCH_SIZE): 92 | 93 | if len(batch_data) < FLAGS.BATCH_SIZE: 94 | k = FLAGS.BATCH_SIZE - len(batch_data) 95 | batch_data = np.concatenate([batch_data, X_train[0:k]]) 96 | batch_labels = np.concatenate([batch_labels, Y_train[0:k]]) 97 | 98 | feed_dict = {x: batch_data, 99 | y: batch_labels, 100 | K.learning_phase(): 1} 101 | 102 | # choose source of adversarial examples at random 103 | # (for ensemble adversarial training) 104 | if x_advs is not None: 105 | feed_dict[idx] = np.random.randint(len(x_advs)) 106 | 107 | # Run the graph 108 | _, curr_loss, curr_l1, curr_l2, curr_preds, _ = \ 109 | K.get_session().run([optimizer, loss, l1, l2, preds] 110 | + [model.updates], 111 | feed_dict=feed_dict) 112 | 113 | if step % EVAL_FREQUENCY == 0: 114 | elapsed_time = time.time() - start_time 115 | start_time = time.time() 116 | print('Step %d (epoch %.2f), %.2f s' % 117 | (step, float(step) * FLAGS.BATCH_SIZE / train_size, 118 | elapsed_time)) 119 | print('Minibatch loss: %.3f (%.3f, %.3f)' % (curr_loss, curr_l1, curr_l2)) 120 | 121 | print('Minibatch error: %.1f%%' % error_rate(curr_preds, batch_labels)) 122 | 123 | sys.stdout.flush() 124 | 125 | step += 1 126 | if step == num_steps: 127 | break 128 | 129 | 130 | def tf_test_error_rate(model, x, X_test, y_test): 131 | """ 132 | Compute test error. 133 | """ 134 | assert len(X_test) == len(y_test) 135 | 136 | # Predictions for the test set 137 | eval_prediction = K.softmax(model(x)) 138 | 139 | predictions = batch_eval([x], [eval_prediction], [X_test])[0] 140 | 141 | return error_rate(predictions, y_test) 142 | 143 | 144 | def error_rate(predictions, labels): 145 | """ 146 | Return the error rate in percent. 147 | """ 148 | 149 | assert len(predictions) == len(labels) 150 | 151 | return 100.0 - (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0]) 152 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | 2 | import keras 3 | from keras import backend as K 4 | from tensorflow.python.platform import flags 5 | from keras.models import save_model 6 | 7 | from tf_utils import tf_train, tf_test_error_rate 8 | from mnist import * 9 | 10 | 11 | FLAGS = flags.FLAGS 12 | 13 | 14 | def main(model_name, model_type): 15 | np.random.seed(0) 16 | assert keras.backend.backend() == "tensorflow" 17 | set_mnist_flags() 18 | 19 | flags.DEFINE_bool('NUM_EPOCHS', args.epochs, 'Number of epochs') 20 | 21 | # Get MNIST test data 22 | X_train, Y_train, X_test, Y_test = data_mnist() 23 | 24 | data_gen = data_gen_mnist(X_train) 25 | 26 | x = K.placeholder((None, 27 | FLAGS.IMAGE_ROWS, 28 | FLAGS.IMAGE_COLS, 29 | FLAGS.NUM_CHANNELS 30 | )) 31 | 32 | y = K.placeholder(shape=(None, FLAGS.NUM_CLASSES)) 33 | 34 | model = model_mnist(type=model_type) 35 | 36 | # Train an MNIST model 37 | tf_train(x, y, model, X_train, Y_train, data_gen) 38 | 39 | # Finally print the result! 40 | test_error = tf_test_error_rate(model, x, X_test, Y_test) 41 | print('Test error: %.1f%%' % test_error) 42 | save_model(model, model_name) 43 | json_string = model.to_json() 44 | with open(model_name+'.json', 'wr') as f: 45 | f.write(json_string) 46 | 47 | 48 | if __name__ == '__main__': 49 | import argparse 50 | parser = argparse.ArgumentParser() 51 | parser.add_argument("model", help="path to model") 52 | parser.add_argument("--type", type=int, help="model type", default=1) 53 | parser.add_argument("--epochs", type=int, default=6, help="number of epochs") 54 | args = parser.parse_args() 55 | 56 | main(args.model, args.type) 57 | -------------------------------------------------------------------------------- /train_adv.py: -------------------------------------------------------------------------------- 1 | import keras 2 | from keras import backend as K 3 | from tensorflow.python.platform import flags 4 | from keras.models import save_model 5 | 6 | from mnist import * 7 | from tf_utils import tf_train, tf_test_error_rate 8 | from attack_utils import gen_grad 9 | from fgs import symbolic_fgs 10 | 11 | FLAGS = flags.FLAGS 12 | 13 | 14 | def main(model_name, adv_model_names, model_type): 15 | np.random.seed(0) 16 | assert keras.backend.backend() == "tensorflow" 17 | set_mnist_flags() 18 | 19 | flags.DEFINE_bool('NUM_EPOCHS', args.epochs, 'Number of epochs') 20 | 21 | # Get MNIST test data 22 | X_train, Y_train, X_test, Y_test = data_mnist() 23 | 24 | data_gen = data_gen_mnist(X_train) 25 | 26 | x = K.placeholder(shape=(None, 27 | FLAGS.IMAGE_ROWS, 28 | FLAGS.IMAGE_COLS, 29 | FLAGS.NUM_CHANNELS)) 30 | 31 | y = K.placeholder(shape=(FLAGS.BATCH_SIZE, FLAGS.NUM_CLASSES)) 32 | 33 | eps = args.eps 34 | 35 | # if src_models is not None, we train on adversarial examples that come 36 | # from multiple models 37 | adv_models = [None] * len(adv_model_names) 38 | for i in range(len(adv_model_names)): 39 | adv_models[i] = load_model(adv_model_names[i]) 40 | 41 | model = model_mnist(type=model_type) 42 | 43 | x_advs = [None] * (len(adv_models) + 1) 44 | 45 | for i, m in enumerate(adv_models + [model]): 46 | logits = m(x) 47 | grad = gen_grad(x, logits, y, loss='training') 48 | x_advs[i] = symbolic_fgs(x, grad, eps=eps) 49 | 50 | # Train an MNIST model 51 | tf_train(x, y, model, X_train, Y_train, data_gen, x_advs=x_advs) 52 | 53 | # Finally print the result! 54 | test_error = tf_test_error_rate(model, x, X_test, Y_test) 55 | print('Test error: %.1f%%' % test_error) 56 | save_model(model, model_name) 57 | json_string = model.to_json() 58 | with open(model_name+'.json', 'wr') as f: 59 | f.write(json_string) 60 | 61 | if __name__ == '__main__': 62 | import argparse 63 | parser = argparse.ArgumentParser() 64 | parser.add_argument("model", help="path to model") 65 | parser.add_argument('adv_models', nargs='*', 66 | help='path to adv model(s)') 67 | parser.add_argument("--type", type=int, help="model type", default=0) 68 | parser.add_argument("--epochs", type=int, default=12, 69 | help="number of epochs") 70 | parser.add_argument("--eps", type=float, default=0.3, 71 | help="FGS attack scale") 72 | 73 | args = parser.parse_args() 74 | main(args.model, args.adv_models, args.type) 75 | --------------------------------------------------------------------------------