├── LICENSE
├── README.md
├── attack_utils.py
├── carlini.py
├── fgs.py
├── mnist.py
├── models
│   └── .gitignore
├── simple_eval.py
├── tf_utils.py
├── train.py
└── train_adv.py
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2017 ftramer
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Ensemble Adversarial Training
2 |
3 | This repository contains code to reproduce results from the paper:
4 |
5 | **Ensemble Adversarial Training: Attacks and Defenses**
6 | *Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Dan Boneh and Patrick McDaniel*
7 | arXiv report: https://arxiv.org/abs/1705.07204
8 |
9 |
10 |
11 | ###### REQUIREMENTS
12 |
13 | The code was tested with Python 2.7.12, TensorFlow 1.0.1, and Keras 1.2.2.
14 |
15 | ###### EXPERIMENTS
16 |
17 | We start by training a few simple MNIST models. These are described in _mnist.py_.
18 |
19 | ```
20 | python -m train models/modelA --type=0
21 | python -m train models/modelB --type=1
22 | python -m train models/modelC --type=2
23 | python -m train models/modelD --type=3
24 | ```
25 |
26 | Then, we can use (standard) Adversarial Training or Ensemble Adversarial Training
27 | (we train for either 6 or 12 epochs in the paper). With Ensemble Adversarial
28 | Training, we additionally augment the training data with adversarial examples
29 | crafted from external pre-trained models (models A, C and D here):
30 |
31 | ```
32 | python -m train_adv models/modelA_adv --type=0 --epochs=12
33 | python -m train_adv models/modelA_ens models/modelA models/modelC models/modelD --type=0 --epochs=12
34 | ```
35 |
36 | The accuracy of the models on the MNIST test set can be computed using
37 |
38 | ```
39 | python -m simple_eval test [model(s)]
40 | ```
41 |
42 | To evaluate robustness to various attacks, we use
43 |
44 | ```
45 | python -m simple_eval [attack] [source_model] [target_model(s)] [--parameters (opt)]
46 | ```
47 |
48 | The attack can be:
49 |
50 | | Attack | Description | Parameters |
51 | | ------ | ----------- | ---------- |
52 | | fgs | Standard FGSM | *eps* (the L∞ norm of the perturbation) |
53 | | rand_fgs | Our FGSM variant that prepends a random step to the gradient computation | *eps* (the L∞ norm of the total perturbation); *alpha* (the L∞ norm of the random perturbation) |
54 | | ifgs | The iterative FGSM | *eps* (the L∞ norm of the perturbation); *steps* (the number of iterative FGSM steps) |
55 | | CW | The Carlini and Wagner attack | *eps* (the L∞ norm of the perturbation); *kappa* (the attack confidence) |
56 |
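For example, RAND+FGSM simply takes a random step of norm *alpha* before an
FGSM step of norm *eps - alpha*. A minimal sketch of this logic, mirroring
what _simple_eval.py_ does (it assumes `model`, `X_test` and `Y_test` are a
loaded Keras model and the MNIST test data, and that the MNIST and
BATCH_SIZE flags have been defined as in _simple_eval.py_):

```
import numpy as np
import keras.backend as K
from attack_utils import gen_grad
from fgs import symbolic_fgs
from tf_utils import batch_eval

eps, alpha = 0.3, 0.05

x = K.placeholder((None, 28, 28, 1))
y = K.placeholder((None, 10))

# random step of L-infinity norm alpha, clipped to the valid pixel range
X_rand = np.clip(X_test + alpha * np.sign(np.random.randn(*X_test.shape)),
                 0.0, 1.0)

# FGSM step of norm eps - alpha, starting from the perturbed points
grad = gen_grad(x, model(x), y)
adv_x = symbolic_fgs(x, grad, eps=eps - alpha)
X_adv = batch_eval([x, y], [adv_x], [X_rand, Y_test])[0]
```
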
57 | Note that due to GPU non-determinism, the obtained results may vary by a few
58 | percent compared to those reported in the paper.
59 | Nevertheless, we consistently observe the following:
60 |
61 | * Standard Adversarial Training performs worse on transferred FGSM
62 | examples than on a "direct" FGSM attack on the model due to a *gradient masking* effect.
63 | * Our RAND+FGSM attack outperforms the FGSM when applied to any model. The gap
64 | is particularly pronounced for the adversarially trained model.
65 | * Ensemble Adversarial Training is more robust than (standard) adversarial
66 | training to transferred examples computed using any of the attacks above.
67 |
68 | ###### CONTACT
69 | Questions and suggestions can be sent to tramer@cs.stanford.edu
70 |
--------------------------------------------------------------------------------
/attack_utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import keras.backend as K
3 |
4 | from tensorflow.python.platform import flags
5 | FLAGS = flags.FLAGS
6 |
7 |
8 | def linf_loss(X1, X2):
9 | return np.max(np.abs(X1 - X2), axis=(1, 2, 3))
10 |
11 |
12 | def gen_adv_loss(logits, y, loss='logloss', mean=False):
13 | """
14 | Generate the loss function.
15 | """
16 |
17 | if loss == 'training':
18 | # use the model's output instead of the true labels to avoid
19 | # label leaking at training time
20 | y = K.cast(K.equal(logits, K.max(logits, 1, keepdims=True)), "float32")
21 | y = y / K.sum(y, 1, keepdims=True)
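# y is now the model's own argmax prediction, one-hot encoded
# (ties are split uniformly by the normalization above)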
22 | out = K.categorical_crossentropy(logits, y, from_logits=True)
23 | elif loss == 'logloss':
24 | out = K.categorical_crossentropy(logits, y, from_logits=True)
25 | else:
26 | raise ValueError("Unknown loss: {}".format(loss))
27 |
28 | if mean:
29 | out = K.mean(out)
30 | else:
31 | out = K.sum(out)
32 | return out
33 |
34 |
35 | def gen_grad(x, logits, y, loss='logloss'):
36 | """
37 | Generate the gradient of the loss function.
38 | """
39 |
40 | adv_loss = gen_adv_loss(logits, y, loss)
41 |
42 | # Define gradient of loss wrt input
43 | grad = K.gradients(adv_loss, [x])[0]
44 | return grad
45 |
--------------------------------------------------------------------------------
/carlini.py:
--------------------------------------------------------------------------------
1 | ## li_attack.py -- attack a network optimizing for l_infinity distance
2 | ##
3 | ## Adapted from https://github.com/carlini/nn_robust_attacks
4 | ##
5 | ## Copyright (C) 2016, Nicholas Carlini.
6 | ##
7 | ## This program is licenced under the BSD 2-Clause licence,
8 | ## contained in the LICENCE file in this directory.
9 |
10 | import tensorflow as tf
11 | import numpy as np
12 | from tensorflow.python.platform import flags
13 | import keras.backend as K
14 |
15 | MAX_ITERATIONS = 1000 # number of iterations to perform gradient descent
16 | ABORT_EARLY = True # abort gradient descent upon first valid solution
17 | INITIAL_CONST = 1e-3 # the first value of c to start at
18 | LEARNING_RATE = 5e-3 # larger values converge faster to less accurate results
19 | LARGEST_CONST = 2e+1 # the largest value of c to go up to before giving up
20 | TARGETED = True # should we target one specific class? or just be wrong?
21 | CONST_FACTOR = 10.0 # f>1, rate at which we increase constant, smaller better
22 | CONFIDENCE = 0 # how strong the adversarial example should be
23 | EPS = 0.3
24 |
25 | FLAGS = flags.FLAGS
26 |
27 |
28 | class CarliniLi:
29 | def __init__(self, sess, model,
30 | targeted = TARGETED, learning_rate = LEARNING_RATE,
31 | max_iterations = MAX_ITERATIONS, abort_early = ABORT_EARLY,
32 | initial_const = INITIAL_CONST, largest_const = LARGEST_CONST,
33 | const_factor = CONST_FACTOR, confidence = CONFIDENCE, eps=EPS):
34 | """
35 | The L_infinity optimized attack.
36 | Returns adversarial examples for the supplied model.
37 | targeted: True if we should perform a targeted attack, False otherwise.
38 | learning_rate: The learning rate for the attack algorithm. Smaller values
39 | produce better results but are slower to converge.
40 | max_iterations: The maximum number of iterations. Larger values are more
41 | accurate; setting too small will require a large learning rate and will
42 | produce poor results.
43 | abort_early: If true, allows early aborts if gradient descent gets stuck.
44 | initial_const: The initial tradeoff-constant to use to tune the relative
45 | importance of distance and confidence. Should be set to a very small
46 | value (but positive).
47 | largest_const: The largest constant to use until we report failure. Should
48 | be set to a very large value.
52 | const_factor: The rate at which we should increase the constant, when the
53 | previous constant failed. Should be greater than one, smaller is better.
54 | """
55 | self.model = model
56 | self.sess = sess
57 |
58 | self.TARGETED = targeted
59 | self.LEARNING_RATE = learning_rate
60 | self.MAX_ITERATIONS = max_iterations
61 | self.ABORT_EARLY = abort_early
62 | self.INITIAL_CONST = initial_const
63 | self.LARGEST_CONST = largest_const
64 | self.const_factor = const_factor
65 | self.CONFIDENCE = confidence
66 | self.EPS = eps
67 |
68 | self.grad = self.gradient_descent(sess, model)
69 |
70 | def gradient_descent(self, sess, model):
71 | def compare(outputs, labels):
72 | y = np.argmax(labels)
73 | pred = np.argmax(outputs)
74 |
75 | if self.TARGETED:
76 | return (pred == y)
77 | else:
78 | return (pred != y)
79 |
80 | shape = (1, FLAGS.IMAGE_ROWS, FLAGS.IMAGE_COLS, FLAGS.NUM_CHANNELS)
81 |
82 | # the variable to optimize over
83 | modifier = tf.Variable(np.zeros(shape,dtype=np.float32))
84 |
85 | tau = tf.placeholder(tf.float32, [])
86 | simg = tf.placeholder(tf.float32, shape)
87 | timg = tf.placeholder(tf.float32, shape)
88 | tlab = tf.placeholder(tf.float32, (1, FLAGS.NUM_CLASSES))
89 | const = tf.placeholder(tf.float32, [])
90 |
91 | newimg = tf.clip_by_value(simg + modifier, 0, 1)
92 |
93 | output = model(newimg)
94 | orig_output = model(timg)
95 |
96 | real = tf.reduce_sum((tlab)*output)
97 | other = tf.reduce_max((1-tlab)*output - (tlab*10000))
98 |
99 | if self.TARGETED:
100 | # if targeted, optimize for making the other class most likely
101 | loss1 = tf.maximum(0.0,other-real+self.CONFIDENCE)
102 | else:
103 | # if untargeted, optimize for making this class least likely.
104 | loss1 = tf.maximum(0.0,real-other+self.CONFIDENCE)
105 |
106 | # sum up the losses
107 | loss2 = tf.reduce_sum(tf.maximum(0.0, tf.abs(newimg-timg)-tau))
108 | loss = const*loss1+loss2
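# loss1 is a hinge on the logit margin and drives misclassification;
# loss2 penalizes any pixel deviating from the original image by more
# than tau, enforcing the L_infinity budget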
109 |
110 | # setup the adam optimizer and keep track of variables we're creating
111 | start_vars = set(x.name for x in tf.global_variables())
112 | optimizer = tf.train.AdamOptimizer(self.LEARNING_RATE)
113 | train = optimizer.minimize(loss, var_list=[modifier])
114 |
115 | end_vars = tf.global_variables()
116 | new_vars = [x for x in end_vars if x.name not in start_vars]
117 | init = tf.variables_initializer(var_list=[modifier]+new_vars)
118 |
119 | def doit(oimgs, labs, starts, tt, CONST):
120 | prev_scores = None
121 |
122 | imgs = np.array(oimgs)
123 | starts = np.array(starts)
124 |
125 | # initialize the variables
126 | sess.run(init)
127 | while CONST < self.LARGEST_CONST:
128 | # try solving for each value of the constant
129 | #print('try const', CONST)
130 | for step in range(self.MAX_ITERATIONS):
131 | feed_dict={timg: imgs,
132 | tlab:labs,
133 | tau: tt,
134 | simg: starts,
135 | const: CONST,
136 | K.learning_phase(): 0}
137 |
138 | #if step % (self.MAX_ITERATIONS//10) == 0:
139 | # print(step, sess.run((loss,loss1,loss2),feed_dict=feed_dict))
140 |
141 | # perform the update step
142 | _, works, linf_slack = sess.run([train, loss, loss2], feed_dict=feed_dict)
143 |
144 | # it worked
145 | if works < .0001*CONST and (self.ABORT_EARLY or step == CONST-1):
146 | get = sess.run(K.softmax(output), feed_dict=feed_dict)
147 | works = compare(get, labs)
148 | if works:
149 | scores, origscores, nimg = sess.run((output,orig_output,newimg),feed_dict=feed_dict)
150 | return scores, origscores, nimg, CONST
151 |
152 | # we didn't succeed, increase constant and try again
153 |
154 | if linf_slack >= 0.1 * self.EPS:
155 | # perturbation is too large
156 | if prev_scores is None:
157 | return None  # the attack never succeeded; report failure
158 | return prev_scores, prev_origscores, prev_nimg, CONST
159 | else:
160 | # didn't reach target confidence
161 | CONST *= self.const_factor
162 |
163 | prev_scores, prev_origscores, prev_nimg = sess.run((output,orig_output,newimg),feed_dict=feed_dict)
164 |
165 | scores, origscores, nimg = sess.run((output,orig_output,newimg),feed_dict=feed_dict)
166 | return scores, origscores, nimg, CONST
167 |
168 | return doit
169 |
170 | def attack(self, imgs, targets):
171 | """
172 | Perform the L_infinity attack on the given images for the given targets.
173 | If self.targeted is true, then the targets represents the target labels.
174 | If self.targeted is false, then targets are the original class labels.
175 | """
176 | r = []
177 | i = 0
178 | for img,target in zip(imgs, targets):
179 | print i
180 | r.extend(self.attack_single(img, target))
181 | i += 1
182 | return np.array(r)
183 |
184 | def attack_single(self, img, target):
185 | """
186 | Run the attack on a single image and label
187 | """
188 |
189 | # the previous image
190 | prev = np.copy(img).reshape((1, FLAGS.IMAGE_ROWS, FLAGS.IMAGE_COLS, FLAGS.NUM_CHANNELS))
191 | tau = self.EPS
192 | const = self.INITIAL_CONST
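# note: in this adaptation tau is fixed to the eps budget, rather than
# being decreased iteratively after each success as in the original attack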
193 |
194 | res = self.grad([np.copy(img)], [target], np.copy(prev), tau, const)
195 |
196 | if res is None:
197 | # the attack failed, we return this as our final answer
198 | return prev
199 |
200 | scores, origscores, nimg, const = res
201 | prev = nimg
202 | return prev
203 |
--------------------------------------------------------------------------------
/fgs.py:
--------------------------------------------------------------------------------
1 |
2 | import keras.backend as K
3 | from attack_utils import gen_grad
4 |
5 |
6 | def symbolic_fgs(x, grad, eps=0.3, clipping=True):
7 | """
8 | FGSM attack.
9 | """
10 |
11 | # signed gradient
12 | normed_grad = K.sign(grad)
13 |
14 | # Multiply by constant epsilon
15 | scaled_grad = eps * normed_grad
16 |
17 | # Add perturbation to original example to obtain adversarial example
18 | adv_x = K.stop_gradient(x + scaled_grad)
19 |
20 | if clipping:
21 | adv_x = K.clip(adv_x, 0, 1)
22 | return adv_x
23 |
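# Usage sketch (cf. simple_eval.py): given a placeholder `x`, labels `y` and
# a Keras model, a symbolic FGSM example can be built with
#     logits = model(x)
#     grad = gen_grad(x, logits, y)
#     adv_x = symbolic_fgs(x, grad, eps=0.3)
# and materialized with tf_utils.batch_eval([x, y], [adv_x], [X, Y]).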
24 |
25 | def iter_fgs(model, x, y, steps, eps):
26 | """
27 | I-FGSM attack.
28 | """
29 |
30 | adv_x = x
31 |
32 | # iteratively apply the FGSM with small step size
33 | for i in range(steps):
34 | logits = model(adv_x)
35 | grad = gen_grad(adv_x, logits, y)
36 |
37 | adv_x = symbolic_fgs(adv_x, grad, eps, True)
38 | return adv_x
39 |
--------------------------------------------------------------------------------
/mnist.py:
--------------------------------------------------------------------------------
1 | from keras.datasets import mnist
2 | from keras.models import Sequential, model_from_json
3 | from keras.layers import Dense, Dropout, Activation, Flatten, Input
4 | from keras.layers import Convolution2D, MaxPooling2D
5 | from keras.preprocessing.image import ImageDataGenerator
6 | from keras.utils import np_utils
7 |
8 | import argparse
9 | import numpy as np
10 |
11 | from tensorflow.python.platform import flags
12 | FLAGS = flags.FLAGS
13 |
14 |
15 | def set_mnist_flags():
16 | try:
17 | flags.DEFINE_integer('BATCH_SIZE', 64, 'Size of training batches')
18 | except argparse.ArgumentError:
19 | pass
20 |
21 | flags.DEFINE_integer('NUM_CLASSES', 10, 'Number of classification classes')
22 | flags.DEFINE_integer('IMAGE_ROWS', 28, 'Input row dimension')
23 | flags.DEFINE_integer('IMAGE_COLS', 28, 'Input column dimension')
24 | flags.DEFINE_integer('NUM_CHANNELS', 1, 'Input depth dimension')
25 |
26 |
27 | def data_mnist(one_hot=True):
28 | """
29 | Preprocess MNIST dataset
30 | """
31 | # the data, shuffled and split between train and test sets
32 | (X_train, y_train), (X_test, y_test) = mnist.load_data()
33 |
34 | X_train = X_train.reshape(X_train.shape[0],
35 | FLAGS.IMAGE_ROWS,
36 | FLAGS.IMAGE_COLS,
37 | FLAGS.NUM_CHANNELS)
38 |
39 | X_test = X_test.reshape(X_test.shape[0],
40 | FLAGS.IMAGE_ROWS,
41 | FLAGS.IMAGE_COLS,
42 | FLAGS.NUM_CHANNELS)
43 |
44 | X_train = X_train.astype('float32')
45 | X_test = X_test.astype('float32')
46 | X_train /= 255
47 | X_test /= 255
48 | print('X_train shape:', X_train.shape)
49 | print(X_train.shape[0], 'train samples')
50 | print(X_test.shape[0], 'test samples')
51 |
52 | print "Loaded MNIST test data."
53 |
54 | if one_hot:
55 | # convert class vectors to binary class matrices
56 | y_train = np_utils.to_categorical(y_train, FLAGS.NUM_CLASSES).astype(np.float32)
57 | y_test = np_utils.to_categorical(y_test, FLAGS.NUM_CLASSES).astype(np.float32)
58 |
59 | return X_train, y_train, X_test, y_test
60 |
61 |
62 | def modelA():
63 | model = Sequential()
64 | model.add(Convolution2D(64, 5, 5,
65 | border_mode='valid',
66 | input_shape=(FLAGS.IMAGE_ROWS,
67 | FLAGS.IMAGE_COLS,
68 | FLAGS.NUM_CHANNELS)))
69 | model.add(Activation('relu'))
70 |
71 | model.add(Convolution2D(64, 5, 5))
72 | model.add(Activation('relu'))
73 |
74 | model.add(Dropout(0.25))
75 |
76 | model.add(Flatten())
77 | model.add(Dense(128))
78 | model.add(Activation('relu'))
79 |
80 | model.add(Dropout(0.5))
81 | model.add(Dense(FLAGS.NUM_CLASSES))
82 | return model
83 |
84 |
85 | def modelB():
86 | model = Sequential()
87 | model.add(Dropout(0.2, input_shape=(FLAGS.IMAGE_ROWS,
88 | FLAGS.IMAGE_COLS,
89 | FLAGS.NUM_CHANNELS)))
90 | model.add(Convolution2D(64, 8, 8,
91 | subsample=(2, 2),
92 | border_mode='same'))
93 | model.add(Activation('relu'))
94 |
95 | model.add(Convolution2D(128, 6, 6,
96 | subsample=(2, 2),
97 | border_mode='valid'))
98 | model.add(Activation('relu'))
99 |
100 | model.add(Convolution2D(128, 5, 5,
101 | subsample=(1, 1)))
102 | model.add(Activation('relu'))
103 |
104 | model.add(Dropout(0.5))
105 |
106 | model.add(Flatten())
107 | model.add(Dense(FLAGS.NUM_CLASSES))
108 | return model
109 |
110 |
111 | def modelC():
112 | model = Sequential()
113 | model.add(Convolution2D(128, 3, 3,
114 | border_mode='valid',
115 | input_shape=(FLAGS.IMAGE_ROWS,
116 | FLAGS.IMAGE_COLS,
117 | FLAGS.NUM_CHANNELS)))
118 | model.add(Activation('relu'))
119 |
120 | model.add(Convolution2D(64, 3, 3))
121 | model.add(Activation('relu'))
122 |
123 | model.add(Dropout(0.25))
124 |
125 | model.add(Flatten())
126 | model.add(Dense(128))
127 | model.add(Activation('relu'))
128 |
129 | model.add(Dropout(0.5))
130 | model.add(Dense(FLAGS.NUM_CLASSES))
131 | return model
132 |
133 |
134 | def modelD():
135 | model = Sequential()
136 |
137 | model.add(Flatten(input_shape=(FLAGS.IMAGE_ROWS,
138 | FLAGS.IMAGE_COLS,
139 | FLAGS.NUM_CHANNELS)))
140 |
141 | model.add(Dense(300, init='he_normal', activation='relu'))
142 | model.add(Dropout(0.5))
143 | model.add(Dense(300, init='he_normal', activation='relu'))
144 | model.add(Dropout(0.5))
145 | model.add(Dense(300, init='he_normal', activation='relu'))
146 | model.add(Dropout(0.5))
147 | model.add(Dense(300, init='he_normal', activation='relu'))
148 | model.add(Dropout(0.5))
149 |
150 | model.add(Dense(FLAGS.NUM_CLASSES))
151 | return model
152 |
153 |
154 | def model_mnist(type=1):
155 | """
156 | Defines MNIST model using Keras sequential model
157 | """
158 |
159 | models = [modelA, modelB, modelC, modelD]
160 |
161 | return models[type]()
162 |
163 |
164 | def data_gen_mnist(X_train):
165 | datagen = ImageDataGenerator()
166 |
167 | datagen.fit(X_train)
168 | return datagen
169 |
170 |
171 | def load_model(model_path, type=1):
172 |
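# try to restore the architecture from the saved JSON; if unavailable,
# rebuild it from the model type and load the weights into it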
173 | try:
174 | with open(model_path+'.json', 'r') as f:
175 | json_string = f.read()
176 | model = model_from_json(json_string)
177 | except IOError:
178 | model = model_mnist(type=type)
179 |
180 | model.load_weights(model_path)
181 | return model
182 |
--------------------------------------------------------------------------------
/models/.gitignore:
--------------------------------------------------------------------------------
1 | # Ignore everything in this directory
2 | *
3 | # Except this file
4 | !.gitignore
5 |
--------------------------------------------------------------------------------
/simple_eval.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import tensorflow as tf
3 | import keras.backend as K
4 | from mnist import data_mnist, set_mnist_flags, load_model
5 | from fgs import symbolic_fgs, iter_fgs
6 | from carlini import CarliniLi
7 | from attack_utils import gen_grad
8 | from tf_utils import tf_test_error_rate, batch_eval
9 | from os.path import basename
10 |
11 | from tensorflow.python.platform import flags
12 | FLAGS = flags.FLAGS
13 |
14 |
15 | def main(attack, src_model_name, target_model_names):
16 | np.random.seed(0)
17 | tf.set_random_seed(0)
18 |
19 | flags.DEFINE_integer('BATCH_SIZE', 10, 'Size of batches')
20 | set_mnist_flags()
21 |
22 | x = K.placeholder((None,
23 | FLAGS.IMAGE_ROWS,
24 | FLAGS.IMAGE_COLS,
25 | FLAGS.NUM_CHANNELS))
26 |
27 | y = K.placeholder((None, FLAGS.NUM_CLASSES))
28 |
29 | _, _, X_test, Y_test = data_mnist()
30 |
31 | # source model for crafting adversarial examples
32 | src_model = load_model(src_model_name)
33 |
34 | # model(s) to target
35 | target_models = [None] * len(target_model_names)
36 | for i in range(len(target_model_names)):
37 | target_models[i] = load_model(target_model_names[i])
38 |
39 | # simply compute test error
40 | if attack == "test":
41 | err = tf_test_error_rate(src_model, x, X_test, Y_test)
42 | print '{}: {:.1f}'.format(basename(src_model_name), err)
43 |
44 | for (name, target_model) in zip(target_model_names, target_models):
45 | err = tf_test_error_rate(target_model, x, X_test, Y_test)
46 | print '{}: {:.1f}'.format(basename(name), err)
47 | return
48 |
49 | eps = args.eps
50 |
51 | # take the random step in the RAND+FGSM
52 | if attack == "rand_fgs":
53 | X_test = np.clip(
54 | X_test + args.alpha * np.sign(np.random.randn(*X_test.shape)),
55 | 0.0, 1.0)
56 | eps -= args.alpha
57 |
58 | logits = src_model(x)
59 | grad = gen_grad(x, logits, y)
60 |
61 | # FGSM and RAND+FGSM one-shot attack
62 | if attack in ["fgs", "rand_fgs"]:
63 | adv_x = symbolic_fgs(x, grad, eps=eps)
64 |
65 | # iterative FGSM
66 | if attack == "ifgs":
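# the total eps budget is split evenly across the FGSM steps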
67 | adv_x = iter_fgs(src_model, x, y, steps=args.steps, eps=args.eps/args.steps)
68 |
69 | # Carlini & Wagner attack
70 | if attack == "CW":
71 | X_test = X_test[0:1000]
72 | Y_test = Y_test[0:1000]
73 |
74 | cli = CarliniLi(K.get_session(), src_model,
75 | targeted=False, confidence=args.kappa, eps=args.eps)
76 |
77 | X_adv = cli.attack(X_test, Y_test)
78 |
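# project the adversarial examples back onto the L-infinity ball of radius eps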
79 | r = np.clip(X_adv - X_test, -args.eps, args.eps)
80 | X_adv = X_test + r
81 |
82 | err = tf_test_error_rate(src_model, x, X_adv, Y_test)
83 | print '{}->{}: {:.1f}'.format(basename(src_model_name), basename(src_model_name), err)
84 |
85 | for (name, target_model) in zip(target_model_names, target_models):
86 | err = tf_test_error_rate(target_model, x, X_adv, Y_test)
87 | print '{}->{}: {:.1f}'.format(basename(src_model_name), basename(name), err)
88 |
89 | return
90 |
91 | # compute the adversarial examples and evaluate
92 | X_adv = batch_eval([x, y], [adv_x], [X_test, Y_test])[0]
93 |
94 | # white-box attack
95 | err = tf_test_error_rate(src_model, x, X_adv, Y_test)
96 | print '{}->{}: {:.1f}'.format(basename(src_model_name), basename(src_model_name), err)
97 |
98 | # black-box attack
99 | for (name, target_model) in zip(target_model_names, target_models):
100 | err = tf_test_error_rate(target_model, x, X_adv, Y_test)
101 | print '{}->{}: {:.1f}'.format(basename(src_model_name), basename(name), err)
102 |
103 |
104 | if __name__ == "__main__":
105 | import argparse
106 | parser = argparse.ArgumentParser()
107 | parser.add_argument("attack", help="name of attack",
108 | choices=["test", "fgs", "ifgs", "rand_fgs", "CW"])
109 | parser.add_argument("src_model", help="source model for attack")
110 | parser.add_argument('target_models', nargs='*',
111 | help='path to target model(s)')
112 | parser.add_argument("--eps", type=float, default=0.3,
113 | help="FGS attack scale")
114 | parser.add_argument("--alpha", type=float, default=0.05,
115 | help="RAND+FGSM random perturbation scale")
116 | parser.add_argument("--steps", type=int, default=10,
117 | help="Iterated FGS steps")
118 | parser.add_argument("--kappa", type=float, default=100,
119 | help="CW attack confidence")
120 |
121 | args = parser.parse_args()
122 | main(args.attack, args.src_model, args.target_models)
123 |
--------------------------------------------------------------------------------
/tf_utils.py:
--------------------------------------------------------------------------------
1 | import keras.backend as K
2 | import numpy as np
3 | import tensorflow as tf
4 | from tensorflow.python.platform import flags
5 | from attack_utils import gen_adv_loss
6 |
7 | import time
8 | import sys
9 |
10 | FLAGS = flags.FLAGS
11 | EVAL_FREQUENCY = 100
12 |
13 |
14 | def batch_eval(tf_inputs, tf_outputs, numpy_inputs):
15 | """
16 | A helper function that computes a tensor on numpy inputs by batches.
17 | From: https://github.com/openai/cleverhans/blob/master/cleverhans/utils_tf.py
18 | """
19 |
20 | n = len(numpy_inputs)
21 | assert n > 0
22 | assert n == len(tf_inputs)
23 | m = numpy_inputs[0].shape[0]
24 | for i in range(1, n):
25 | assert numpy_inputs[i].shape[0] == m
26 |
27 | out = []
28 | for _ in tf_outputs:
29 | out.append([])
30 |
31 | for start in range(0, m, FLAGS.BATCH_SIZE):
32 | # Compute the batch end index
33 | # (the final batch may be smaller than BATCH_SIZE)
36 | end = start + FLAGS.BATCH_SIZE
37 | numpy_input_batches = [numpy_input[start:end]
38 | for numpy_input in numpy_inputs]
39 | cur_batch_size = numpy_input_batches[0].shape[0]
40 | assert cur_batch_size <= FLAGS.BATCH_SIZE
41 | for e in numpy_input_batches:
42 | assert e.shape[0] == cur_batch_size
43 |
44 | feed_dict = dict(zip(tf_inputs, numpy_input_batches))
45 | feed_dict[K.learning_phase()] = 0
46 | numpy_output_batches = K.get_session().run(tf_outputs,
47 | feed_dict=feed_dict)
48 | for e in numpy_output_batches:
49 | assert e.shape[0] == cur_batch_size, e.shape
50 | for out_elem, numpy_output_batch in zip(out, numpy_output_batches):
51 | out_elem.append(numpy_output_batch)
52 |
53 | out = [np.concatenate(x, axis=0) for x in out]
54 | for e in out:
55 | assert e.shape[0] == m, e.shape
56 | return out
57 |
58 |
59 | def tf_train(x, y, model, X_train, Y_train, generator, x_advs=None):
60 | old_vars = set(tf.all_variables())
61 | train_size = Y_train.shape[0]
62 |
63 | # Generate cross-entropy loss for training
64 | logits = model(x)
65 | preds = K.softmax(logits)
66 | l1 = gen_adv_loss(logits, y, mean=True)
67 |
68 | # add adversarial training loss
69 | if x_advs is not None:
70 | idx = tf.placeholder(dtype=np.int32)
71 | logits_adv = model(tf.stack(x_advs)[idx])
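# `idx` picks which source's adversarial examples feed the adversarial
# loss; it is drawn at random for every minibatch in the training loop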
72 | l2 = gen_adv_loss(logits_adv, y, mean=True)
73 | loss = 0.5*(l1+l2)
74 | else:
75 | l2 = tf.constant(0)
76 | loss = l1
77 |
78 | optimizer = tf.train.AdamOptimizer().minimize(loss)
79 |
80 | # Run all the initializers to prepare the trainable parameters.
81 | K.get_session().run(tf.initialize_variables(
82 | set(tf.all_variables()) - old_vars))
83 | start_time = time.time()
84 | print('Initialized!')
85 |
86 | # Loop through training steps.
87 | num_steps = int(FLAGS.NUM_EPOCHS * train_size + FLAGS.BATCH_SIZE - 1) // FLAGS.BATCH_SIZE
88 |
89 | step = 0
90 | for (batch_data, batch_labels) \
91 | in generator.flow(X_train, Y_train, batch_size=FLAGS.BATCH_SIZE):
92 |
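# the generator may yield a short final batch; pad it with the first
# training points so every step sees exactly BATCH_SIZE examples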
93 | if len(batch_data) < FLAGS.BATCH_SIZE:
94 | k = FLAGS.BATCH_SIZE - len(batch_data)
95 | batch_data = np.concatenate([batch_data, X_train[0:k]])
96 | batch_labels = np.concatenate([batch_labels, Y_train[0:k]])
97 |
98 | feed_dict = {x: batch_data,
99 | y: batch_labels,
100 | K.learning_phase(): 1}
101 |
102 | # choose source of adversarial examples at random
103 | # (for ensemble adversarial training)
104 | if x_advs is not None:
105 | feed_dict[idx] = np.random.randint(len(x_advs))
106 |
107 | # Run the graph
108 | _, curr_loss, curr_l1, curr_l2, curr_preds, _ = \
109 | K.get_session().run([optimizer, loss, l1, l2, preds]
110 | + [model.updates],
111 | feed_dict=feed_dict)
112 |
113 | if step % EVAL_FREQUENCY == 0:
114 | elapsed_time = time.time() - start_time
115 | start_time = time.time()
116 | print('Step %d (epoch %.2f), %.2f s' %
117 | (step, float(step) * FLAGS.BATCH_SIZE / train_size,
118 | elapsed_time))
119 | print('Minibatch loss: %.3f (%.3f, %.3f)' % (curr_loss, curr_l1, curr_l2))
120 |
121 | print('Minibatch error: %.1f%%' % error_rate(curr_preds, batch_labels))
122 |
123 | sys.stdout.flush()
124 |
125 | step += 1
126 | if step == num_steps:
127 | break
128 |
129 |
130 | def tf_test_error_rate(model, x, X_test, y_test):
131 | """
132 | Compute test error.
133 | """
134 | assert len(X_test) == len(y_test)
135 |
136 | # Predictions for the test set
137 | eval_prediction = K.softmax(model(x))
138 |
139 | predictions = batch_eval([x], [eval_prediction], [X_test])[0]
140 |
141 | return error_rate(predictions, y_test)
142 |
143 |
144 | def error_rate(predictions, labels):
145 | """
146 | Return the error rate in percent.
147 | """
148 |
149 | assert len(predictions) == len(labels)
150 |
151 | return 100.0 - (100.0 * np.sum(np.argmax(predictions, 1) == np.argmax(labels, 1)) / predictions.shape[0])
152 |
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 |
2 | import keras
3 | from keras import backend as K
4 | from tensorflow.python.platform import flags
5 | from keras.models import save_model
6 |
7 | from tf_utils import tf_train, tf_test_error_rate
8 | from mnist import *
9 |
10 |
11 | FLAGS = flags.FLAGS
12 |
13 |
14 | def main(model_name, model_type):
15 | np.random.seed(0)
16 | assert keras.backend.backend() == "tensorflow"
17 | set_mnist_flags()
18 |
19 | flags.DEFINE_integer('NUM_EPOCHS', args.epochs, 'Number of epochs')
20 |
21 | # Get MNIST data
22 | X_train, Y_train, X_test, Y_test = data_mnist()
23 |
24 | data_gen = data_gen_mnist(X_train)
25 |
26 | x = K.placeholder((None,
27 | FLAGS.IMAGE_ROWS,
28 | FLAGS.IMAGE_COLS,
29 | FLAGS.NUM_CHANNELS
30 | ))
31 |
32 | y = K.placeholder(shape=(None, FLAGS.NUM_CLASSES))
33 |
34 | model = model_mnist(type=model_type)
35 |
36 | # Train an MNIST model
37 | tf_train(x, y, model, X_train, Y_train, data_gen)
38 |
39 | # Finally print the result!
40 | test_error = tf_test_error_rate(model, x, X_test, Y_test)
41 | print('Test error: %.1f%%' % test_error)
42 | save_model(model, model_name)
43 | json_string = model.to_json()
44 | with open(model_name+'.json', 'w') as f:
45 | f.write(json_string)
46 |
47 |
48 | if __name__ == '__main__':
49 | import argparse
50 | parser = argparse.ArgumentParser()
51 | parser.add_argument("model", help="path to model")
52 | parser.add_argument("--type", type=int, help="model type", default=1)
53 | parser.add_argument("--epochs", type=int, default=6, help="number of epochs")
54 | args = parser.parse_args()
55 |
56 | main(args.model, args.type)
57 |
--------------------------------------------------------------------------------
/train_adv.py:
--------------------------------------------------------------------------------
1 | import keras
2 | from keras import backend as K
3 | from tensorflow.python.platform import flags
4 | from keras.models import save_model
5 |
6 | from mnist import *
7 | from tf_utils import tf_train, tf_test_error_rate
8 | from attack_utils import gen_grad
9 | from fgs import symbolic_fgs
10 |
11 | FLAGS = flags.FLAGS
12 |
13 |
14 | def main(model_name, adv_model_names, model_type):
15 | np.random.seed(0)
16 | assert keras.backend.backend() == "tensorflow"
17 | set_mnist_flags()
18 |
19 | flags.DEFINE_integer('NUM_EPOCHS', args.epochs, 'Number of epochs')
20 |
21 | # Get MNIST data
22 | X_train, Y_train, X_test, Y_test = data_mnist()
23 |
24 | data_gen = data_gen_mnist(X_train)
25 |
26 | x = K.placeholder(shape=(None,
27 | FLAGS.IMAGE_ROWS,
28 | FLAGS.IMAGE_COLS,
29 | FLAGS.NUM_CHANNELS))
30 |
31 | y = K.placeholder(shape=(FLAGS.BATCH_SIZE, FLAGS.NUM_CLASSES))
32 |
33 | eps = args.eps
34 |
35 | # if adv_model_names is non-empty, we additionally train on adversarial
36 | # examples crafted from these static pre-trained models
37 | adv_models = [None] * len(adv_model_names)
38 | for i in range(len(adv_model_names)):
39 | adv_models[i] = load_model(adv_model_names[i])
40 |
41 | model = model_mnist(type=model_type)
42 |
43 | x_advs = [None] * (len(adv_models) + 1)
44 |
45 | for i, m in enumerate(adv_models + [model]):
46 | logits = m(x)
47 | grad = gen_grad(x, logits, y, loss='training')
48 | x_advs[i] = symbolic_fgs(x, grad, eps=eps)
49 |
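# x_advs now holds symbolic FGSM examples from each pre-trained model, plus
# white-box examples from the model being trained (the last entry);
# tf_train draws one of these sources at random for every minibatch
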
50 | # Train an MNIST model
51 | tf_train(x, y, model, X_train, Y_train, data_gen, x_advs=x_advs)
52 |
53 | # Finally print the result!
54 | test_error = tf_test_error_rate(model, x, X_test, Y_test)
55 | print('Test error: %.1f%%' % test_error)
56 | save_model(model, model_name)
57 | json_string = model.to_json()
58 | with open(model_name+'.json', 'w') as f:
59 | f.write(json_string)
60 |
61 | if __name__ == '__main__':
62 | import argparse
63 | parser = argparse.ArgumentParser()
64 | parser.add_argument("model", help="path to model")
65 | parser.add_argument('adv_models', nargs='*',
66 | help='path to adv model(s)')
67 | parser.add_argument("--type", type=int, help="model type", default=0)
68 | parser.add_argument("--epochs", type=int, default=12,
69 | help="number of epochs")
70 | parser.add_argument("--eps", type=float, default=0.3,
71 | help="FGS attack scale")
72 |
73 | args = parser.parse_args()
74 | main(args.model, args.adv_models, args.type)
75 |
--------------------------------------------------------------------------------