├── img ├── ex_adv_mnist.pdf └── adv-training-not-working.pdf ├── setup.tex ├── README.org ├── LICENSE ├── src ├── figure_1.py ├── table_2.py ├── table_1_mnist.py ├── table_1_cifar10.py ├── figure_2.py └── table_1_svhn.py └── adv-clean-not-twins.org /img/ex_adv_mnist.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gongzhitaao/adversarial-classifier/HEAD/img/ex_adv_mnist.pdf -------------------------------------------------------------------------------- /img/adv-training-not-working.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gongzhitaao/adversarial-classifier/HEAD/img/adv-training-not-working.pdf -------------------------------------------------------------------------------- /setup.tex: -------------------------------------------------------------------------------- 1 | \usepackage{booktabs} 2 | \usepackage{authblk} 3 | \usepackage{mathtools} 4 | \usepackage{amssymb} 5 | \usepackage{natbib} 6 | \usepackage{physics} 7 | \usepackage{subfigure} 8 | \usepackage{times} 9 | \usepackage{tikz} 10 | 11 | \DeclareMathOperator*{\argmin}{argmin} 12 | \DeclareMathOperator*{\argmax}{argmax} 13 | \DeclareMathOperator*{\sign}{sign} 14 | \newcommand\pred[1]{\overline{#1}} 15 | \newcommand\given{\:\vert\:} 16 | 17 | % \usepackage{icml2017} 18 | 19 | % Employ this version of the ``usepackage'' statement after the paper has 20 | % been accepted, when creating the final version. This will set the 21 | % note in the first column to ``Proceedings of the...'' 22 | \usepackage[accepted]{icml2017} 23 | -------------------------------------------------------------------------------- /README.org: -------------------------------------------------------------------------------- 1 | #+TITLE: Adversarial Classifier 2 | 3 | This repo contains the code to reproduce the experiment in our paper 4 | (https://arxiv.org/abs/1704.04960), specifically the code to generate 5 | the figures and tables. 6 | 7 | * Code Dependencies 8 | :PROPERTIES: 9 | :CUSTOM_ID: sec:code-dependencies 10 | :END: 11 | 12 | 1. Python 3.6+ 13 | 2. Keras https://keras.io/ 14 | 3. Tensorflow https://tensorflow.org 15 | 4. tensorflow-adversarial 16 | https://github.com/gongzhitaao/tensorflow-adversarial 17 | 18 | * Datasets 19 | :PROPERTIES: 20 | :CUSTOM_ID: sec:datasets 21 | :END: 22 | 23 | We used MNIST, CIFAR10 and SVHN in our experiment. 24 | 25 | * Document 26 | :PROPERTIES: 27 | :CUSTOM_ID: sec:document 28 | :END: 29 | 30 | The paper itself is written in Org mode (http://orgmode.org/) 31 | employing the ICML2017 LaTeX template, with slight modification, i.e., 32 | removing the conference information. 
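* Running the Experiments
:PROPERTIES:
:CUSTOM_ID: sec:running-the-experiments
:END:

Each script under =src/= is standalone: it loads (or downloads) its
dataset, trains its model(s), and writes checkpoints to =model/=,
cached adversarial data to =data/=, and figures to =img/= relative to
its working directory. The driver below is a sketch only (it assumes
the tensorflow-adversarial repository is importable as =attacks=,
e.g., cloned or symlinked into =src/=, and that =python= is Python 3):

#+BEGIN_SRC python
  # hypothetical driver, not part of the original scripts
  import subprocess

  scripts = ['figure_1.py', 'figure_2.py', 'table_1_mnist.py',
             'table_1_cifar10.py', 'table_1_svhn.py', 'table_2.py']
  for script in scripts:
      # each script is self-contained; outputs land under src/
      subprocess.run(['python', script], cwd='src', check=True)
#+END_SRC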
33 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 gongzhitaao 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining 6 | a copy of this software and associated documentation files (the 7 | "Software"), to deal in the Software without restriction, including 8 | without limitation the rights to use, copy, modify, merge, publish, 9 | distribute, sublicense, and/or sell copies of the Software, and to 10 | permit persons to whom the Software is furnished to do so, subject to 11 | the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be 14 | included in all copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /src/figure_1.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from keras import backend as K 9 | from keras.datasets import mnist 10 | from keras.models import Sequential, load_model 11 | from keras.layers import Dense, Dropout, Activation, Flatten 12 | from keras.layers import Convolution2D, MaxPooling2D 13 | from keras.utils import np_utils 14 | 15 | import matplotlib 16 | matplotlib.use('Agg') 17 | import matplotlib.pyplot as plt 18 | import matplotlib.gridspec as gridspec 19 | 20 | from attacks.fgsm import fgsm 21 | 22 | 23 | img_rows = 28 24 | img_cols = 28 25 | img_chas = 1 26 | input_shape = (img_rows, img_cols, img_chas) 27 | nb_classes = 10 28 | 29 | 30 | print('\nLoading mnist') 31 | (X_train, y_train), (X_test, y_test) = mnist.load_data() 32 | 33 | X_train = X_train.astype('float32') / 255. 34 | X_test = X_test.astype('float32') / 255. 
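# pixel values are now scaled to [0, 1], so the FGSM step size eps=0.02
# used below corresponds to roughly 2% of the dynamic range per iteration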
35 | 36 | X_train = X_train.reshape(-1, img_rows, img_cols, img_chas) 37 | X_test = X_test.reshape(-1, img_rows, img_cols, img_chas) 38 | 39 | # one hot encoding 40 | y_train = np_utils.to_categorical(y_train, nb_classes) 41 | z0 = y_test.copy() 42 | y_test = np_utils.to_categorical(y_test, nb_classes) 43 | 44 | 45 | sess = tf.InteractiveSession() 46 | K.set_session(sess) 47 | 48 | 49 | if False: 50 | print('\nLoading model') 51 | model = load_model('model/figure_1.h5') 52 | else: 53 | print('\nBuilding model') 54 | model = Sequential([ 55 | Convolution2D(32, 3, 3, input_shape=input_shape), 56 | Activation('relu'), 57 | Convolution2D(32, 3, 3), 58 | Activation('relu'), 59 | MaxPooling2D(pool_size=(2, 2)), 60 | Dropout(0.25), 61 | Flatten(), 62 | Dense(128), 63 | Activation('relu'), 64 | Dropout(0.5), 65 | Dense(10), 66 | Activation('softmax')]) 67 | 68 | model.compile(optimizer='adam', loss='categorical_crossentropy', 69 | metrics=['accuracy']) 70 | 71 | print('\nTraining model') 72 | model.fit(X_train, y_train, nb_epoch=10) 73 | 74 | print('\nSaving model') 75 | os.makedirs('model', exist_ok=True) 76 | model.save('model/figure_1.h5') 77 | 78 | 79 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chas)) 80 | x_adv = fgsm(model, x, nb_epoch=9, eps=0.02) 81 | 82 | 83 | print('\nTest against clean data') 84 | score = model.evaluate(X_test, y_test) 85 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 86 | 87 | 88 | if False: 89 | print('\nLoading adversarial data') 90 | X_adv = np.load('data/figure_1.npy') 91 | else: 92 | print('\nGenerating adversarial data') 93 | nb_sample = X_test.shape[0] 94 | batch_size = 128 95 | nb_batch = int(np.ceil(nb_sample/batch_size)) 96 | X_adv = np.empty(X_test.shape) 97 | for batch in range(nb_batch): 98 | print('batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 99 | start = batch * batch_size 100 | end = min(nb_sample, start+batch_size) 101 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 102 | K.learning_phase(): 0}) 103 | X_adv[start:end] = tmp 104 | 105 | os.makedirs('data', exist_ok=True) 106 | np.save('data/figure_1.npy', X_adv) 107 | 108 | 109 | print('\nTest against adversarial data') 110 | score = model.evaluate(X_adv, y_test) 111 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 112 | 113 | 114 | print('\nMake predictions') 115 | y1 = model.predict(X_test) 116 | z1 = np.argmax(y1, axis=1) 117 | y2 = model.predict(X_adv) 118 | z2 = np.argmax(y2, axis=1) 119 | 120 | print('\nSelecting figures') 121 | X_tmp = np.empty((2, 10, 28, 28)) 122 | y_proba = np.empty((2, 10, 10)) 123 | for i in range(10): 124 | print('Target {0}'.format(i)) 125 | ind, = np.where(np.all([z0==i, z1==i, z2!=i], axis=0)) 126 | cur = np.random.choice(ind) 127 | X_tmp[0][i] = np.squeeze(X_test[cur]) 128 | X_tmp[1][i] = np.squeeze(X_adv[cur]) 129 | y_proba[0][i] = y1[cur] 130 | y_proba[1][i] = y2[cur] 131 | 132 | 133 | print('\nPlotting results') 134 | fig = plt.figure(figsize=(10, 3)) 135 | gs = gridspec.GridSpec(2, 10, wspace=0.1, hspace=0.1) 136 | 137 | label = np.argmax(y_proba, axis=2) 138 | proba = np.max(y_proba, axis=2) 139 | for i in range(10): 140 | for j in range(2): 141 | ax = fig.add_subplot(gs[j, i]) 142 | ax.imshow(X_tmp[j][i], cmap='gray', interpolation='none') 143 | ax.set_xticks([]) 144 | ax.set_yticks([]) 145 | ax.set_xlabel('{0} ({1:.2f})'.format(label[j][i], 146 | proba[j][i]), 147 | fontsize=12) 148 | 149 | print('\nSaving figure') 150 | gs.tight_layout(fig) 151 | os.makedirs('img', exist_ok=True) 152 | 
plt.savefig('img/figure_1.pdf') 153 | -------------------------------------------------------------------------------- /src/table_2.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from keras import backend as K 9 | from keras.datasets import cifar10 10 | from keras.models import Sequential, load_model 11 | from keras.layers import Dense, Dropout, Activation, Flatten 12 | from keras.layers import Convolution2D, MaxPooling2D 13 | from keras.layers import LeakyReLU 14 | from keras.callbacks import EarlyStopping 15 | from keras.utils import np_utils 16 | 17 | from attacks.fgsm import fgsm 18 | 19 | 20 | 21 | img_rows = 32 22 | img_cols = 32 23 | img_chan = 3 24 | input_shape=(img_rows, img_cols, img_chan) 25 | nb_classes = 10 26 | 27 | print('\nLoading cifar10') 28 | (X_train, y_train), (X_test, y_test) = cifar10.load_data() 29 | X_train = X_train.astype('float32') / 255 30 | X_test = X_test.astype('float32') / 255 31 | print('\nX_train shape:', X_train.shape) 32 | print('X_test shape:', X_train.shape) 33 | 34 | y_train = np_utils.to_categorical(y_train, nb_classes) 35 | y_test = np_utils.to_categorical(y_test, nb_classes) 36 | print('\ny_train shape:', y_train.shape) 37 | print('y_test shape:', y_test.shape) 38 | 39 | 40 | sess = tf.InteractiveSession() 41 | K.set_session(sess) 42 | 43 | 44 | if False: 45 | print('\nLoading model0') 46 | model0 = load_model('model/table_2_model0.h5') 47 | else: 48 | print('\nBuilding model0') 49 | model0 = Sequential([ 50 | Convolution2D(32, 3, 3, border_mode='same', 51 | input_shape=input_shape), 52 | LeakyReLU(alpha=0.2), 53 | Convolution2D(32, 3, 3), 54 | LeakyReLU(alpha=0.2), 55 | MaxPooling2D(pool_size=(2,2)), 56 | Dropout(0.2), 57 | Convolution2D(64, 3, 3, border_mode='same'), 58 | LeakyReLU(alpha=0.2), 59 | Convolution2D(64, 3, 3), 60 | LeakyReLU(alpha=0.2), 61 | MaxPooling2D(pool_size=(2, 2)), 62 | Dropout(0.2), 63 | Convolution2D(128, 3, 3, border_mode='same'), 64 | LeakyReLU(alpha=0.2), 65 | Convolution2D(128, 3, 3), 66 | LeakyReLU(alpha=0.2), 67 | MaxPooling2D(pool_size=(2, 2)), 68 | Dropout(0.5), 69 | Flatten(), 70 | Dense(512), 71 | Activation('relu'), 72 | Dropout(0.5), 73 | Dense(nb_classes), 74 | Activation('softmax')]) 75 | 76 | model0.compile(loss='categorical_crossentropy', 77 | optimizer='adam', metrics=['accuracy']) 78 | 79 | earlystopping = EarlyStopping(monitor='val_loss', patience=5, 80 | verbose=1) 81 | model0.fit(X_train, y_train, nb_epoch=100, validation_split=0.1, 82 | callbacks=[earlystopping]) 83 | 84 | print('\nSaving model0') 85 | os.makedirs('model', exist_ok=True) 86 | model0.save('model/table_2_model0.h5') 87 | 88 | 89 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chan)) 90 | eps = tf.placeholder(tf.float32, ()) 91 | x_adv = fgsm(model0, x, nb_epoch=9, eps=eps) 92 | 93 | 94 | print('\nTesting against clean test data') 95 | score = model0.evaluate(X_test, y_test) 96 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 97 | 98 | 99 | if False: 100 | for EPS in [0.01, 0.03, 0.1, 0.3]: 101 | print('\nBuilding X_train_adv with eps={0:.2f}'.format(EPS)) 102 | nb_sample = X_train.shape[0] 103 | batch_size = 128 104 | nb_batch = int(np.ceil(nb_sample/batch_size)) 105 | X_train_adv = np.empty(X_train.shape) 106 | for batch in range(nb_batch): 107 | print(' batch {0}/{1}'.format(batch+1, 
nb_batch), 108 | end='\r') 109 | start = batch * batch_size 110 | end = min(nb_sample, start+batch_size) 111 | tmp = sess.run(x_adv, feed_dict={x: X_train[start:end], 112 | eps: EPS, 113 | K.learning_phase(): 0}) 114 | X_train_adv[start:end] = tmp 115 | 116 | print('\nBuilding X_test_adv with eps={0:.2f}'.format(EPS)) 117 | nb_sample = X_test.shape[0] 118 | nb_batch = int(np.ceil(nb_sample/batch_size)) 119 | X_test_adv = np.empty(X_test.shape) 120 | for batch in range(nb_batch): 121 | print(' batch {0}/{1}'.format(batch+1, nb_batch), 122 | end='\r') 123 | start = batch * batch_size 124 | end = min(nb_sample, start+batch_size) 125 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 126 | eps: EPS, 127 | K.learning_phase(): 0}) 128 | X_test_adv[start:end] = tmp 129 | 130 | print('\nSaving adversarial images') 131 | os.makedirs('data/', exist_ok=True) 132 | np.savez('data/table_2_{0:.2f}.npz'.format(EPS), 133 | X_train_adv=X_train_adv, X_test_adv=X_test_adv) 134 | 135 | 136 | print('\nTesting against adversarial test data') 137 | score = model0.evaluate(X_test_adv, y_test) 138 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 139 | 140 | 141 | y0_test = np.zeros((y_test.shape[0], 1)) 142 | y1_test = np.ones((y_test.shape[0], 1)) 143 | 144 | 145 | if False: 146 | print('\nLoading model1') 147 | model1 = load_model('model/table_2_model1.h5') 148 | else: 149 | print('\nLoading adversarial data with eps=0.03') 150 | db = np.load('data/table_2_0.03.npz') 151 | X_train_adv = db['X_train_adv'] 152 | X_test_adv = db['X_test_adv'] 153 | 154 | print('\nPreparing clean/adversarial mixed dataset') 155 | X_all_train = np.vstack([X_train, X_train_adv]) 156 | y_all_train = np.vstack([np.zeros([X_train.shape[0], 1]), 157 | np.ones([X_train_adv.shape[0], 1])]) 158 | 159 | ind = np.random.permutation(X_all_train.shape[0]) 160 | X_all_train = X_all_train[ind] 161 | y_all_train = y_all_train[ind] 162 | 163 | print('\nBuilding model1') 164 | model1 = Sequential([ 165 | Convolution2D(32, 3, 3, border_mode='same', 166 | input_shape=input_shape), 167 | LeakyReLU(alpha=0.2), 168 | Convolution2D(32, 3, 3), 169 | LeakyReLU(alpha=0.2), 170 | MaxPooling2D(pool_size=(2,2)), 171 | Dropout(0.2), 172 | Convolution2D(64, 3, 3, border_mode='same'), 173 | LeakyReLU(alpha=0.2), 174 | Convolution2D(64, 3, 3), 175 | LeakyReLU(alpha=0.2), 176 | MaxPooling2D(pool_size=(2, 2)), 177 | Flatten(), 178 | Dense(256), 179 | Activation('relu'), 180 | Dropout(0.5), 181 | Dense(1), 182 | Activation('sigmoid')]) 183 | 184 | model1.compile(loss='binary_crossentropy', 185 | optimizer='adam', metrics=['accuracy']) 186 | 187 | os.makedirs('model', exist_ok=True) 188 | model1.fit(X_all_train, y_all_train, nb_epoch=2, 189 | validation_split=0.1) 190 | 191 | print('\nSaving model1') 192 | model1.save('model/table_2_model1.h5') 193 | 194 | 195 | for EPS in [0.01, 0.03, 0.1, 0.3]: 196 | print('\nLoading adversarial data with eps={0:.2f}'.format(EPS)) 197 | db = np.load('data/table_2_{0:.2f}.npz'.format(EPS)) 198 | X_train_adv = db['X_train_adv'] 199 | X_test_adv = db['X_test_adv'] 200 | 201 | print('\nTesting against clean test data') 202 | score = model1.evaluate(X_test, y0_test) 203 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 204 | 205 | print('\nTesting against adversarial test data') 206 | score = model1.evaluate(X_test_adv, y1_test) 207 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 208 | -------------------------------------------------------------------------------- 
/src/table_1_mnist.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from keras import backend as K 9 | from keras.datasets import mnist 10 | from keras.models import Sequential, load_model 11 | from keras.layers import Dense, Dropout, Activation, Flatten 12 | from keras.layers import Convolution2D, MaxPooling2D 13 | from keras.utils import np_utils 14 | 15 | from attacks.fgsm import fgsm 16 | 17 | 18 | 19 | img_rows = 28 20 | img_cols = 28 21 | img_chan = 1 22 | input_shape=(img_rows, img_cols, img_chan) 23 | nb_classes = 10 24 | 25 | 26 | (X_train, y_train), (X_test, y_test) = mnist.load_data() 27 | X_train = np.reshape(X_train, (-1, img_rows, img_cols, img_chan)) 28 | X_test = np.reshape(X_test, (-1, img_rows, img_cols, img_chan)) 29 | X_train = X_train.astype('float32') / 255 30 | X_test = X_test.astype('float32') / 255 31 | print('\nX_train shape:', X_train.shape) 32 | print('X_test shape:', X_test.shape) 33 | 34 | y_train = np_utils.to_categorical(y_train, nb_classes) 35 | y_test = np_utils.to_categorical(y_test, nb_classes) 36 | 37 | 38 | sess = tf.InteractiveSession() 39 | K.set_session(sess) 40 | 41 | 42 | if False: 43 | print('\nLoading model0') 44 | model0 = load_model('model/table_1_mnist_model0.h5') 45 | else: 46 | print('\nBuilding model0') 47 | model0 = Sequential([ 48 | Convolution2D(32, 3, 3, border_mode='same', 49 | input_shape=input_shape), 50 | Activation('relu'), 51 | Convolution2D(32, 3, 3), 52 | Activation('relu'), 53 | MaxPooling2D(pool_size=(2, 2)), 54 | Dropout(0.25), 55 | Flatten(), 56 | Dense(128), 57 | Activation('relu'), 58 | Dropout(0.5), 59 | Dense(10), 60 | Activation('softmax')]) 61 | 62 | model0.compile(loss='categorical_crossentropy', 63 | optimizer='adam', metrics=['accuracy']) 64 | 65 | print('\nTraining model0') 66 | model0.fit(X_train, y_train, nb_epoch=10) 67 | 68 | print('\nSaving model0') 69 | os.makedirs('model', exist_ok=True) 70 | model0.save('model/table_1_mnist_model0.h5') 71 | 72 | 73 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chan)) 74 | eps = tf.placeholder(tf.float32, ()) 75 | x_adv = fgsm(model0, x, nb_epoch=9, eps=eps) 76 | 77 | 78 | print('\nTesting against clean test data') 79 | score = model0.evaluate(X_test, y_test) 80 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 81 | 82 | 83 | EPS = 0.02 84 | 85 | if False: 86 | print('\nLoading adversarial images') 87 | db = np.load('data/table_1_mnist_{0:.4f}.npz'.format(EPS)) 88 | X_train_adv = db['X_train_adv'] 89 | X_test_adv = db['X_test_adv'] 90 | else: 91 | print('\nBuilding X_train_adv') 92 | nb_sample = X_train.shape[0] 93 | batch_size = 128 94 | nb_batch = int(np.ceil(nb_sample/batch_size)) 95 | X_train_adv = np.empty(X_train.shape) 96 | for batch in range(nb_batch): 97 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 98 | start = batch * batch_size 99 | end = min(nb_sample, start+batch_size) 100 | tmp = sess.run(x_adv, feed_dict={x: X_train[start:end], 101 | eps: EPS, 102 | K.learning_phase(): 0}) 103 | X_train_adv[start:end] = tmp 104 | 105 | print('\nBuilding X_test_adv') 106 | nb_sample = X_test.shape[0] 107 | nb_batch = int(np.ceil(nb_sample/batch_size)) 108 | X_test_adv = np.empty(X_test.shape) 109 | for batch in range(nb_batch): 110 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 111 | start = batch * 
batch_size 112 | end = min(nb_sample, start+batch_size) 113 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 114 | eps: EPS, 115 | K.learning_phase(): 0}) 116 | X_test_adv[start:end] = tmp 117 | 118 | print('\nSaving adversarial images') 119 | os.makedirs('data/', exist_ok=True) 120 | np.savez('data/table_1_mnist_{0:.4f}.npz'.format(EPS), 121 | X_train_adv=X_train_adv, X_test_adv=X_test_adv) 122 | 123 | print('\nTesting against adversarial test data') 124 | score = model0.evaluate(X_test_adv, y_test) 125 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 126 | 127 | 128 | print('\nPlotting random adversarial data') 129 | 130 | print('\nMaking predictions') 131 | z0 = np.argmax(y_test, axis=1) 132 | y1 = model0.predict(X_test) 133 | z1 = np.argmax(y1, axis=1) 134 | y2 = model0.predict(X_test_adv) 135 | z2 = np.argmax(y2, axis=1) 136 | 137 | print('\nSelecting figures') 138 | X_tmp = np.empty((2, nb_classes, img_rows, img_cols)) 139 | y_proba = np.empty((2, nb_classes, nb_classes)) 140 | for i in range(10): 141 | print('Target {0}'.format(i)) 142 | ind, = np.where(np.all([z0==i, z1==i, z2!=i], axis=0)) 143 | cur = np.random.choice(ind) 144 | X_tmp[0][i] = np.squeeze(X_test[cur]) 145 | X_tmp[1][i] = np.squeeze(X_test_adv[cur]) 146 | y_proba[0][i] = y1[cur] 147 | y_proba[1][i] = y2[cur] 148 | 149 | 150 | print('\nPlotting results') 151 | fig = plt.figure(figsize=(10, 3)) 152 | gs = gridspec.GridSpec(2, 10, wspace=0.1, hspace=0.1) 153 | 154 | label = np.argmax(y_proba, axis=2) 155 | proba = np.max(y_proba, axis=2) 156 | for i in range(10): 157 | for j in range(2): 158 | ax = fig.add_subplot(gs[j, i]) 159 | ax.imshow(X_tmp[j][i], interpolation='none') 160 | ax.set_xticks([]) 161 | ax.set_yticks([]) 162 | ax.set_xlabel('{0} ({1:.2f})'.format(label[j][i], 163 | proba[j][i]), 164 | fontsize=12) 165 | 166 | print('\nSaving figure') 167 | gs.tight_layout(fig) 168 | os.makedirs('img', exist_ok=True) 169 | plt.savefig('img/table_1_mnist.pdf') 170 | 171 | 172 | print('\nPreparing clean/adversarial mixed dataset') 173 | X_all_train = np.vstack([X_train, X_train_adv]) 174 | y_all_train = np.vstack([np.zeros([X_train.shape[0], 1]), 175 | np.ones([X_train_adv.shape[0], 1])]) 176 | 177 | y0_test = np.zeros((y_test.shape[0], 1)) 178 | y1_test = np.ones((y_test.shape[0], 1)) 179 | 180 | ind = np.random.permutation(X_all_train.shape[0]) 181 | X_all_train = X_all_train[ind] 182 | y_all_train = y_all_train[ind] 183 | 184 | 185 | if False: 186 | print('\nLoading model1') 187 | model1 = load_model('model/table_1_mnist_model1.h5') 188 | else: 189 | print('\nBuilding model1') 190 | model1 = Sequential([ 191 | Convolution2D(32, 3, 3, input_shape=input_shape), 192 | Activation('relu'), 193 | Convolution2D(32, 3, 3), 194 | Activation('relu'), 195 | MaxPooling2D(pool_size=(2, 2)), 196 | Dropout(0.25), 197 | Flatten(), 198 | Dense(128), 199 | Activation('relu'), 200 | Dropout(0.5), 201 | Dense(1), 202 | Activation('sigmoid')]) 203 | 204 | model1.compile(loss='binary_crossentropy', 205 | optimizer='adam', metrics=['accuracy']) 206 | 207 | print('\nTraining model1') 208 | os.makedirs('model', exist_ok=True) 209 | model1.fit(X_all_train, y_all_train, nb_epoch=2, 210 | validation_split=0.1) 211 | 212 | print('\nSaving model1') 213 | model1.save('model/table_1_mnist_model1.h5') 214 | 215 | 216 | # x1_adv = fgsm(model1, x, nb_epoch=4, eps=0.2) 217 | 218 | print('\nTesting against clean test data') 219 | score = model1.evaluate(X_test, y0_test) 220 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], 
score[1])) 221 | 222 | 223 | print('\nTesting against adversarial test data') 224 | score = model1.evaluate(X_test_adv, y1_test) 225 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 226 | 227 | 228 | # print('\nDisguising clean test data') 229 | # nb_sample = X_test.shape[0] 230 | # batch_size = 128 231 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 232 | # X_test_adv1 = np.empty(X_test.shape) 233 | # for batch in range(nb_batch): 234 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 235 | # start = batch * batch_size 236 | # end = min(nb_sample, start+batch_size) 237 | # tmp = sess.run(x1_adv, feed_dict={x: X_test[start:end], 238 | # K.learning_phase(): 0}) 239 | # X_test_adv1[start:end] = tmp 240 | 241 | 242 | # print('\nTesting against disguised clean data') 243 | # score = model1.evaluate(X_test_adv1, y0_test) 244 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 245 | 246 | 247 | # print('\nDisguising adversarial test data') 248 | # nb_sample = X_test_adv.shape[0] 249 | # batch_size = 128 250 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 251 | # X_test_adv2 = np.empty(X_test_adv.shape) 252 | # for batch in range(nb_batch): 253 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 254 | # start = batch * batch_size 255 | # end = min(nb_sample, start+batch_size) 256 | # tmp = sess.run(x1_adv, feed_dict={x: X_test_adv[start:end], 257 | # K.learning_phase(): 0}) 258 | # X_test_adv2[start:end] = tmp 259 | 260 | 261 | # print('\nTesting against disguised adversarial data') 262 | # score = model1.evaluate(X_test_adv2, y1_test) 263 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 264 | -------------------------------------------------------------------------------- /src/table_1_cifar10.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from keras import backend as K 9 | from keras.datasets import cifar10 10 | from keras.models import Sequential, load_model 11 | from keras.layers import Dense, Dropout, Activation, Flatten 12 | from keras.layers import Convolution2D, MaxPooling2D 13 | from keras.layers import LeakyReLU 14 | from keras.callbacks import EarlyStopping 15 | from keras.utils import np_utils 16 | 17 | from attacks.fgsm import fgsm 18 | 19 | 20 | 21 | img_rows = 32 22 | img_cols = 32 23 | img_chan = 3 24 | input_shape=(img_rows, img_cols, img_chan) 25 | nb_classes = 10 26 | 27 | (X_train, y_train), (X_test, y_test) = cifar10.load_data() 28 | X_train = X_train.astype('float32') / 255 29 | X_test = X_test.astype('float32') / 255 30 | print('\nX_train shape:', X_train.shape) 31 | print('X_test shape:', X_train.shape) 32 | 33 | y_train = np_utils.to_categorical(y_train, nb_classes) 34 | y_test = np_utils.to_categorical(y_test, nb_classes) 35 | 36 | 37 | sess = tf.InteractiveSession() 38 | K.set_session(sess) 39 | 40 | 41 | if False: 42 | print('\nLoading model0') 43 | model0 = load_model('model/table_1_cifar10_model0.h5') 44 | else: 45 | print('\nBuilding model0') 46 | model0 = Sequential([ 47 | Convolution2D(32, 3, 3, border_mode='same', 48 | input_shape=input_shape), 49 | LeakyReLU(alpha=0.2), 50 | Convolution2D(32, 3, 3), 51 | LeakyReLU(alpha=0.2), 52 | MaxPooling2D(pool_size=(2,2)), 53 | Dropout(0.2), 54 | Convolution2D(64, 3, 3, border_mode='same'), 55 | LeakyReLU(alpha=0.2), 56 | 
Convolution2D(64, 3, 3), 57 | LeakyReLU(alpha=0.2), 58 | MaxPooling2D(pool_size=(2, 2)), 59 | Dropout(0.2), 60 | Convolution2D(128, 3, 3, border_mode='same'), 61 | LeakyReLU(alpha=0.2), 62 | Convolution2D(128, 3, 3), 63 | LeakyReLU(alpha=0.2), 64 | MaxPooling2D(pool_size=(2, 2)), 65 | Dropout(0.5), 66 | Flatten(), 67 | Dense(512), 68 | Activation('relu'), 69 | Dropout(0.5), 70 | Dense(nb_classes), 71 | Activation('softmax')]) 72 | 73 | model0.compile(loss='categorical_crossentropy', 74 | optimizer='adam', metrics=['accuracy']) 75 | 76 | earlystopping = EarlyStopping(monitor='val_loss', patience=5, 77 | verbose=1) 78 | model0.fit(X_train, y_train, nb_epoch=100, validation_split=0.1, 79 | callbacks=[earlystopping]) 80 | 81 | print('\nSaving model0') 82 | os.makedirs('model', exist_ok=True) 83 | model0.save('model/table_1_cifar10_model0.h5') 84 | 85 | 86 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chan)) 87 | eps = tf.placeholder(tf.float32, ()) 88 | x_adv = fgsm(model0, x, nb_epoch=8, eps=eps) 89 | 90 | 91 | print('\nTesting against clean test data') 92 | score = model0.evaluate(X_test, y_test) 93 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 94 | 95 | 96 | EPS = 0.01 97 | 98 | if False: 99 | print('\nLoading adversarial images') 100 | db = np.load('data/table_1_cifar10_{0:.4f}.npz'.format(EPS)) 101 | X_train_adv = db['X_train_adv'] 102 | X_test_adv = db['X_test_adv'] 103 | else: 104 | print('\nBuilding X_train_adv') 105 | nb_sample = X_train.shape[0] 106 | batch_size = 128 107 | nb_batch = int(np.ceil(nb_sample/batch_size)) 108 | X_train_adv = np.empty(X_train.shape) 109 | for batch in range(nb_batch): 110 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 111 | start = batch * batch_size 112 | end = min(nb_sample, start+batch_size) 113 | tmp = sess.run(x_adv, feed_dict={x: X_train[start:end], 114 | eps: EPS, 115 | K.learning_phase(): 0}) 116 | X_train_adv[start:end] = tmp 117 | 118 | print('\nBuilding X_test_adv') 119 | nb_sample = X_test.shape[0] 120 | nb_batch = int(np.ceil(nb_sample/batch_size)) 121 | X_test_adv = np.empty(X_test.shape) 122 | for batch in range(nb_batch): 123 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 124 | start = batch * batch_size 125 | end = min(nb_sample, start+batch_size) 126 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 127 | eps: EPS, 128 | K.learning_phase(): 0}) 129 | X_test_adv[start:end] = tmp 130 | 131 | print('\nSaving adversarial images') 132 | os.makedirs('data/', exist_ok=True) 133 | np.savez('data/table_1_cifar10_{0:.4f}.npz'.format(EPS), 134 | X_train_adv=X_train_adv, X_test_adv=X_test_adv) 135 | 136 | 137 | print('\nTesting against adversarial test data') 138 | score = model0.evaluate(X_test_adv, y_test) 139 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 140 | 141 | 142 | print('\nPlotting random adversarial data') 143 | 144 | print('\nMaking predictions') 145 | z0 = np.argmax(y_test, axis=1) 146 | y1 = model0.predict(X_test) 147 | z1 = np.argmax(y1, axis=1) 148 | y2 = model0.predict(X_test_adv) 149 | z2 = np.argmax(y2, axis=1) 150 | 151 | print('\nSelecting figures') 152 | X_tmp = np.empty((2, nb_classes, img_rows, img_cols, img_chan)) 153 | y_proba = np.empty((2, nb_classes, nb_classes)) 154 | for i in range(10): 155 | print('Target {0}'.format(i)) 156 | ind, = np.where(np.all([z0==i, z1==i, z2!=i], axis=0)) 157 | cur = np.random.choice(ind) 158 | X_tmp[0][i] = X_test[cur] 159 | X_tmp[1][i] = X_test_adv[cur] 160 | y_proba[0][i] = y1[cur] 161 | 
y_proba[1][i] = y2[cur] 162 | 163 | 164 | print('\nPlotting results') 165 | fig = plt.figure(figsize=(10, 3)) 166 | gs = gridspec.GridSpec(2, 10, wspace=0.1, hspace=0.1) 167 | 168 | label = np.argmax(y_proba, axis=2) 169 | proba = np.max(y_proba, axis=2) 170 | for i in range(10): 171 | for j in range(2): 172 | ax = fig.add_subplot(gs[j, i]) 173 | ax.imshow(X_tmp[j][i], interpolation='none') 174 | ax.set_xticks([]) 175 | ax.set_yticks([]) 176 | ax.set_xlabel('{0} ({1:.2f})'.format(label[j][i], 177 | proba[j][i]), 178 | fontsize=12) 179 | 180 | print('\nSaving figure') 181 | gs.tight_layout(fig) 182 | os.makedirs('img', exist_ok=True) 183 | plt.savefig('img/table_1_cifar10.pdf') 184 | 185 | 186 | print('\nPreparing clean/adversarial mixed dataset') 187 | 188 | X_all_train = np.vstack([X_train, X_train_adv]) 189 | y_all_train = np.vstack([np.zeros([X_train.shape[0], 1]), 190 | np.ones([X_train_adv.shape[0], 1])]) 191 | 192 | y0_test = np.zeros((y_test.shape[0], 1)) 193 | y1_test = np.ones((y_test.shape[0], 1)) 194 | 195 | ind = np.random.permutation(X_all_train.shape[0]) 196 | X_all_train = X_all_train[ind] 197 | y_all_train = y_all_train[ind] 198 | 199 | 200 | if False: 201 | print('\nLoading model1') 202 | model1 = load_model('model/table_1_cifar10_model1.h5') 203 | else: 204 | print('\nBuilding model1') 205 | model1 = Sequential([ 206 | Convolution2D(32, 3, 3, border_mode='same', 207 | input_shape=input_shape), 208 | LeakyReLU(alpha=0.2), 209 | Convolution2D(32, 3, 3), 210 | LeakyReLU(alpha=0.2), 211 | MaxPooling2D(pool_size=(2,2)), 212 | Dropout(0.2), 213 | Convolution2D(64, 3, 3, border_mode='same'), 214 | LeakyReLU(alpha=0.2), 215 | Convolution2D(64, 3, 3), 216 | LeakyReLU(alpha=0.2), 217 | MaxPooling2D(pool_size=(2, 2)), 218 | Flatten(), 219 | Dense(256), 220 | Activation('relu'), 221 | Dropout(0.5), 222 | Dense(1), 223 | Activation('sigmoid')]) 224 | 225 | model1.compile(loss='binary_crossentropy', 226 | optimizer='adam', metrics=['accuracy']) 227 | 228 | os.makedirs('model', exist_ok=True) 229 | model1.fit(X_all_train, y_all_train, nb_epoch=2, 230 | validation_split=0.1) 231 | 232 | print('\nSaving model1') 233 | model1.save('model/table_1_cifar10_model1.h5') 234 | 235 | 236 | # x1_adv = fgsm(model1, x, nb_epoch=4, eps=0.01) 237 | 238 | print('\nTesting against clean test data') 239 | score = model1.evaluate(X_test, y0_test) 240 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 241 | 242 | 243 | print('\nTesting against adversarial test data') 244 | score = model1.evaluate(X_test_adv, y1_test) 245 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 246 | 247 | 248 | # print('\nDisguising clean test data') 249 | # nb_sample = X_test.shape[0] 250 | # batch_size = 128 251 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 252 | # X_test_adv1 = np.empty(X_test.shape) 253 | # for batch in range(nb_batch): 254 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 255 | # start = batch * batch_size 256 | # end = min(nb_sample, start+batch_size) 257 | # tmp = sess.run(x1_adv, feed_dict={x: X_test[start:end], 258 | # K.learning_phase(): 0}) 259 | # X_test_adv1[start:end] = tmp 260 | 261 | 262 | # print('\nTesting against disguised clean data') 263 | # score = model1.evaluate(X_test_adv1, y0_test) 264 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 265 | 266 | 267 | # print('\nDisguising adversarial test data') 268 | # nb_sample = X_test_adv.shape[0] 269 | # batch_size = 128 270 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 271 
| # X_test_adv2 = np.empty(X_test_adv.shape) 272 | # for batch in range(nb_batch): 273 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 274 | # start = batch * batch_size 275 | # end = min(nb_sample, start+batch_size) 276 | # tmp = sess.run(x1_adv, feed_dict={x: X_test_adv[start:end], 277 | # K.learning_phase(): 0}) 278 | # X_test_adv2[start:end] = tmp 279 | 280 | 281 | # print('\nTesting against disguised adversarial data') 282 | # score = model1.evaluate(X_test_adv2, y1_test) 283 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 284 | -------------------------------------------------------------------------------- /src/figure_2.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from keras import backend as K 9 | from keras.datasets import mnist 10 | from keras.models import Sequential, load_model 11 | from keras.layers import Dense, Dropout, Activation, Flatten 12 | from keras.layers import Convolution2D, MaxPooling2D 13 | from keras.utils import np_utils 14 | 15 | import matplotlib 16 | matplotlib.use('Agg') 17 | import matplotlib.pyplot as plt 18 | import matplotlib.gridspec as gridspec 19 | 20 | from attacks.fgsm import fgsm 21 | 22 | 23 | 24 | def random_orthogonal(i): 25 | """Return a random vector orthogonal to i.""" 26 | v = np.random.random(i.shape) 27 | i /= np.linalg.norm(i) 28 | a = np.dot(v, i) / np.dot(i, i) 29 | j = v - a*i 30 | b = np.linalg.norm(j) 31 | j /= b 32 | return j, (a, i) 33 | 34 | 35 | img_rows = 28 36 | img_cols = 28 37 | img_chan = 1 38 | nb_classes = 10 39 | input_shape=(img_rows, img_cols, img_chan) 40 | 41 | 42 | print('\nLoading mnist') 43 | (X_train, y_train), (X_test, y_test) = mnist.load_data() 44 | 45 | X_train = X_train.astype('float32') / 255. 46 | X_test = X_test.astype('float32') / 255. 
47 | 48 | X_train = X_train.reshape(-1, img_rows, img_cols, img_chan) 49 | X_test = X_test.reshape(-1, img_rows, img_cols, img_chan) 50 | print('\nX_train shape:', X_train.shape) 51 | print('y_train shape:', y_train.shape) 52 | 53 | # one hot encoding 54 | y_train = np_utils.to_categorical(y_train, nb_classes) 55 | y_test = np_utils.to_categorical(y_test, nb_classes) 56 | 57 | 58 | sess = tf.InteractiveSession() 59 | K.set_session(sess) 60 | 61 | 62 | if False: 63 | print('\nLoading model0') 64 | model0 = load_model('model/figure_2_model0.h5') 65 | else: 66 | print('\nBuilding model0') 67 | model0 = Sequential([ 68 | Convolution2D(32, 3, 3, input_shape=input_shape), 69 | Activation('relu'), 70 | Convolution2D(32, 3, 3), 71 | Activation('relu'), 72 | MaxPooling2D(pool_size=(2, 2)), 73 | # Dropout(0.25), 74 | Flatten(), 75 | Dense(128), 76 | Activation('relu'), 77 | # Dropout(0.5), 78 | Dense(10), 79 | Activation('softmax')]) 80 | 81 | model0.compile(optimizer='adam', loss='categorical_crossentropy', 82 | metrics=['accuracy']) 83 | 84 | print('\nTraining model0') 85 | model0.fit(X_train, y_train, nb_epoch=10) 86 | 87 | print('\nSaving model0') 88 | os.makedirs('model', exist_ok=True) 89 | model0.save('model/figure_2_model0.h5') 90 | 91 | 92 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chan)) 93 | y = tf.placeholder(tf.int32, (None, )) 94 | x_adv = fgsm(model0, x, eps=0.25, nb_epoch=1) 95 | 96 | 97 | print('\nTesting against clean data') 98 | score = model0.evaluate(X_test, y_test) 99 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 100 | 101 | 102 | if False: 103 | print('\nLoading adversarial datasets') 104 | X_adv = np.load('data/figure_2.npy') 105 | else: 106 | print('\nGenerating adversarial') 107 | batch_size = 64 108 | X_adv = np.empty(X_test.shape) 109 | nb_sample = X_test.shape[0] 110 | nb_batch = int(np.ceil(nb_sample/batch_size)) 111 | for batch in range(nb_batch): 112 | print('batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 113 | start = batch * batch_size 114 | end = min(nb_sample, start+batch_size) 115 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 116 | K.learning_phase(): 0}) 117 | X_adv[start:end] = tmp 118 | 119 | print('\nSaving adversarials') 120 | os.makedirs('data', exist_ok=True) 121 | np.save('data/figure_2.npy', X_adv) 122 | 123 | 124 | print('\nTesting against adversarial data') 125 | score = model0.evaluate(X_adv, y_test) 126 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 127 | 128 | 129 | if False: 130 | print('\nLoading model1') 131 | model1 = load_model('model/figure_2_model1.h5') 132 | else: 133 | print('\nBuilding model1') 134 | model1 = Sequential([ 135 | Convolution2D(32, 3, 3, input_shape=input_shape), 136 | Activation('relu'), 137 | Convolution2D(32, 3, 3), 138 | Activation('relu'), 139 | MaxPooling2D(pool_size=(2, 2)), 140 | Dropout(0.25), 141 | Flatten(), 142 | Dense(128), 143 | Activation('relu'), 144 | Dropout(0.5), 145 | Dense(10), 146 | Activation('softmax')]) 147 | 148 | model1.compile(loss='categorical_crossentropy', 149 | optimizer='adam', metrics=['accuracy']) 150 | 151 | x_adv_tmp = fgsm(model1, x, eps=0.3, nb_epoch=1) 152 | 153 | print('\nDummy testing') 154 | model1.evaluate(X_test[:10], y_test[:10], verbose=0) 155 | 156 | 157 | print('\nPreparing training/validation dataset') 158 | validation_split = 0.1 159 | N = int(X_train.shape[0]*validation_split) 160 | X_tmp_train, X_tmp_val = X_train[:-N], X_train[-N:] 161 | y_tmp_train, y_tmp_val = y_train[:-N], y_train[-N:] 162 | 
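# The manual training loop below performs adversarial training for model1:
# each mini-batch is augmented on the fly with FGSM examples (eps=0.3, one
# step) generated from model1's current weights, so the model always trains
# on a 50/50 mix of clean and adversarial images.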
163 | 164 | print('\nTraining model1') 165 | nb_epoch = 10 166 | batch_size = 64 167 | nb_sample = X_tmp_train.shape[0] 168 | nb_batch = int(np.ceil(nb_sample/batch_size)) 169 | for epoch in range(nb_epoch): 170 | print('Epoch {0}/{1}'.format(epoch+1, nb_epoch)) 171 | for batch in range(nb_batch): 172 | print(' batch {0}/{1} '.format(batch+1, nb_batch), 173 | end='\r', flush=True) 174 | start = batch * batch_size 175 | end = min(nb_sample, start+batch_size) 176 | X_tmp_adv = sess.run(x_adv_tmp, feed_dict={ 177 | x: X_tmp_train[start:end], K.learning_phase(): 0}) 178 | y_tmp_adv = y_tmp_train[start:end] 179 | X_batch = np.vstack((X_tmp_train[start:end], X_tmp_adv)) 180 | y_batch = np.vstack((y_tmp_train[start:end], y_tmp_adv)) 181 | score = model1.train_on_batch(X_batch, y_batch) 182 | score = model1.evaluate(X_tmp_val, y_tmp_val) 183 | print(' loss: {0:.4f} acc: {1:.4f}' 184 | .format(score[0], score[1])) 185 | 186 | print('\nSaving model1') 187 | os.makedirs('model', exist_ok=True) 188 | model1.save('model/figure_2_model1.h5') 189 | 190 | 191 | print('\nTesting against adversarial') 192 | score = model1.evaluate(X_adv, y_test) 193 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 194 | 195 | 196 | print('\nPreparing predictions') 197 | y0_0 = model0.predict(X_test) 198 | y0_1 = model0.predict(X_adv) 199 | 200 | y1_0 = model1.predict(X_test) 201 | y1_1 = model1.predict(X_adv) 202 | 203 | z_test = np.argmax(y_test, axis=1) 204 | 205 | z0_0 = np.argmax(y0_0, axis=1) 206 | z0_1 = np.argmax(y0_1, axis=1) 207 | 208 | z1_0 = np.argmax(y1_0, axis=1) 209 | z1_1 = np.argmax(y1_1, axis=1) 210 | 211 | p0_0 = np.max(y0_0, axis=1) 212 | p0_1 = np.max(y0_1, axis=1) 213 | 214 | p1_0 = np.max(y1_0, axis=1) 215 | p1_1 = np.max(y1_1, axis=1) 216 | 217 | img_rows = 41 218 | img_cols = 41 219 | img_chan = 4 220 | 221 | p_filter = np.all([p0_0>0.5, p0_1>0.5, p1_1>0.5], axis=0) 222 | 223 | print('\nGenerating figure') 224 | fig = plt.figure(figsize=(8, 1)) 225 | gs = gridspec.GridSpec(1, 10, wspace=0.15) 226 | 227 | for label in range(10): 228 | print('Label {0}'.format(label)) 229 | gs0 = gridspec.GridSpecFromSubplotSpec(4, 3, 230 | subplot_spec=gs[label], 231 | wspace=0.05, hspace=0.1) 232 | ax = fig.add_subplot(gs0[:3, :]) 233 | ax0 = fig.add_subplot(gs0[3, 0]) 234 | ax1 = fig.add_subplot(gs0[3, 1]) 235 | ax2 = fig.add_subplot(gs0[3, 2]) 236 | 237 | img = np.empty((img_rows, img_cols, img_chan)) 238 | ind, = np.where(np.all([p_filter, 239 | z_test==label, z0_0==label, 240 | z0_1!=label, z1_1==label], 241 | axis=0)) 242 | 243 | cur = np.random.choice(ind) 244 | X_i = np.squeeze(X_test[cur]) 245 | X_adv_i = np.squeeze(X_adv[cur]) 246 | 247 | i = X_adv_i.flatten() - X_i.flatten() 248 | j, (a, i) = random_orthogonal(i) 249 | 250 | D = np.amax([1.5 * np.absolute(a), 251 | 0.5 / np.linalg.norm(i, ord=np.inf)]) 252 | 253 | eps_i = np.linspace(-D, D, img_cols) 254 | eps_j = np.linspace(D, -D, img_rows) 255 | 256 | cnt = 0 257 | tmpr, tmpc = 0, 0 258 | 259 | for r, ej in enumerate(eps_j): 260 | for c, ei in enumerate(eps_i): 261 | 262 | X_tmp = np.clip(X_i.flatten()+ej*j+ei*i, 0, 1) 263 | X_tmp = np.reshape(X_tmp, (1, 28, 28, 1)) 264 | y0_tmp = model0.predict(X_tmp) 265 | z0_tmp = np.argmax(y0_tmp) 266 | y1_tmp = model1.predict(X_tmp) 267 | z1_tmp = np.argmax(y1_tmp) 268 | 269 | if z0_tmp == label and z1_tmp == label: 270 | # correct prediction in both cases 271 | color = [1, 1, 1, 1] 272 | elif z0_tmp == label: 273 | # correct prediction after normal training 274 | color = [1, 0, 0, 0.1] 275 | elif 
z1_tmp == label: 276 | # correct prediction after adv training 277 | color = [0, 1, 0, 0.1] 278 | else: 279 | # incorrect prediction in both cases 280 | color = [0.1, 0.1, 0.1, 0.1] 281 | cnt += 1 282 | if np.random.random() < 1./cnt: 283 | tmpr, tmpc = r, c 284 | img[r, c] = color 285 | 286 | # the original datum 287 | r = img_rows // 2 288 | c = img_rows // 2 289 | img[r, c] = [0, 0, 0, 1] 290 | 291 | # adversarial datum 292 | r = img_rows // 2 293 | c = int((np.linalg.norm(X_adv_i-X_i)-eps_i[0]) / (2*D) * img_cols) 294 | img[r, c] = [1.0, 0.65, 0, 1] 295 | 296 | # random adversarial datum 297 | img[tmpr, tmpc] = [0, 0, 1, 1] 298 | 299 | 300 | ax.imshow(img, interpolation='none') 301 | ax.set_xticks([]) 302 | ax.set_yticks([]) 303 | 304 | ax0.imshow(X_i, cmap='gray', interpolation='none') 305 | ax0.set_xticks([]) 306 | ax0.set_yticks([]) 307 | 308 | ax1.imshow(X_adv_i, cmap='gray', interpolation='none') 309 | ax1.set_xticks([]) 310 | ax1.set_yticks([]) 311 | 312 | X_tmp = np.clip(X_i.flatten()+eps_j[tmpr]*j+eps_i[tmpc]*i, 313 | 0, 1) 314 | X_tmp = np.reshape(X_tmp, (28, 28)) 315 | ax2.imshow(X_tmp, cmap='gray', interpolation='none') 316 | ax2.set_xticks([]) 317 | ax2.set_yticks([]) 318 | 319 | 320 | gs.tight_layout(fig, pad=0) 321 | os.makedirs('img', exist_ok=True) 322 | plt.savefig('img/figure_2.pdf') 323 | -------------------------------------------------------------------------------- /src/table_1_svhn.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | from urllib.request import urlopen 6 | 7 | import numpy as np 8 | import tensorflow as tf 9 | 10 | from keras import backend as K 11 | from keras.models import Sequential, load_model 12 | from keras.layers import Dense, Dropout, Activation, Flatten 13 | from keras.layers import Convolution2D, MaxPooling2D 14 | from keras.layers import LeakyReLU 15 | from keras.callbacks import EarlyStopping 16 | from keras.utils import np_utils 17 | 18 | from scipy.io import loadmat 19 | 20 | import matplotlib 21 | matplotlib.use('Agg') 22 | import matplotlib.pyplot as plt 23 | import matplotlib.gridspec as gridspec 24 | 25 | from attacks.fgsm import fgsm 26 | 27 | 28 | 29 | img_rows = 32 30 | img_cols = 32 31 | img_chan = 3 32 | nb_classes = 10 33 | input_shape = (img_rows, img_cols, img_chan) 34 | 35 | 36 | def maybe_download(): 37 | train_url = 'http://ufldl.stanford.edu/housenumbers/train_32x32.mat' 38 | test_url = 'http://ufldl.stanford.edu/housenumbers/test_32x32.mat' 39 | os.makedirs('data', exist_ok=True) 40 | if not os.path.exists('data/svhn_train_32x32.mat'): 41 | print('\nDownloading svhn training data') 42 | with urlopen(train_url) as response,\ 43 | open('data/svhn_train_32x32.mat', 'wb') as w: 44 | data = response.read() 45 | w.write(data) 46 | if not os.path.exists('data/svhn_test_32x32.mat'): 47 | print('\nDownloading svhn test data') 48 | with urlopen(test_url) as response,\ 49 | open('data/svhn_test_32x32.mat', 'wb') as w: 50 | data = response.read() 51 | w.write(data) 52 | 53 | 54 | maybe_download() 55 | 56 | print('\nLoading SVHN dataset') 57 | db = loadmat('data/svhn_train_32x32.mat') 58 | X_train, y_train = db['X'], db['y'] 59 | X_train = X_train.transpose(3, 0, 1, 2) 60 | db = loadmat('data/svhn_test_32x32.mat') 61 | X_test, y_test = db['X'], db['y'] 62 | X_test = X_test.transpose(3, 0, 1, 2) 63 | 64 | 65 | X_train = X_train.astype('float32') / 128 - 1. 
66 | X_test = X_test.astype('float32') / 128 -1. 67 | print('\nX_train shape:', X_train.shape) 68 | print('X_test shape:', X_test.shape) 69 | 70 | 71 | y_train[10==y_train] = 0 72 | y_test[10==y_test] = 0 73 | y_train = np_utils.to_categorical(y_train, nb_classes) 74 | y_test = np_utils.to_categorical(y_test, nb_classes) 75 | 76 | 77 | sess = tf.InteractiveSession() 78 | K.set_session(sess) 79 | 80 | 81 | if False: 82 | print('\nLoading model0') 83 | model0 = load_model('model/table_1_svhn_model0.h5') 84 | else: 85 | print('\nBuilding model0') 86 | model0 = Sequential([ 87 | Convolution2D(32, 3, 3, border_mode='same', 88 | input_shape=input_shape), 89 | LeakyReLU(alpha=0.2), 90 | Convolution2D(32, 3, 3), 91 | LeakyReLU(alpha=0.2), 92 | MaxPooling2D(pool_size=(2,2)), 93 | Dropout(0.2), 94 | Convolution2D(64, 3, 3, border_mode='same'), 95 | LeakyReLU(alpha=0.2), 96 | Convolution2D(64, 3, 3), 97 | LeakyReLU(alpha=0.2), 98 | MaxPooling2D(pool_size=(2, 2)), 99 | Dropout(0.3), 100 | Convolution2D(128, 3, 3, border_mode='same'), 101 | LeakyReLU(alpha=0.2), 102 | Convolution2D(128, 3, 3), 103 | LeakyReLU(alpha=0.2), 104 | MaxPooling2D(pool_size=(2, 2)), 105 | Dropout(0.4), 106 | Flatten(), 107 | Dense(512), 108 | Activation('relu'), 109 | Dropout(0.5), 110 | Dense(nb_classes), 111 | Activation('softmax')]) 112 | 113 | model0.compile(loss='categorical_crossentropy', 114 | optimizer="adam", metrics=['accuracy']) 115 | 116 | 117 | earlystopping = EarlyStopping(monitor='val_loss', patience=5, 118 | verbose=1) 119 | model0.fit(X_train, y_train, nb_epoch=100, validation_split=0.1, 120 | callbacks=[earlystopping]) 121 | 122 | print('\nSaving model0') 123 | os.makedirs('model', exist_ok=True) 124 | model0.save('model/table_1_svhn_model0.h5') 125 | 126 | 127 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chan)) 128 | eps = tf.placeholder(tf.float32, ()) 129 | x_adv = fgsm(model0, x, nb_epoch=9, eps=eps, clip_min=-1., 130 | clip_max=1.) 
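# SVHN inputs were scaled to [-1, 1] above (unlike the [0, 1] scaling in the
# other scripts), hence the wider clip_min/clip_max bounds here; nb_epoch
# appears to control the number of iterative FGSM steps taken.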
131 | 132 | 133 | print('\nTesting against clean test data') 134 | score = model0.evaluate(X_test, y_test) 135 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 136 | 137 | 138 | EPS = 0.01 139 | 140 | if False: 141 | print('\nLoading adversarial dataset') 142 | db = np.load('data/table_1_svhn_{0:.4f}.npz'.format(EPS)) 143 | X_train_adv = db['X_train_adv'] 144 | X_test_adv = db['X_test_adv'] 145 | else: 146 | print('\nBuilding X_train_adv') 147 | nb_sample = X_train.shape[0] 148 | batch_size = 128 149 | nb_batch = int(np.ceil(nb_sample/batch_size)) 150 | X_train_adv = np.empty(X_train.shape) 151 | for batch in range(nb_batch): 152 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 153 | start = batch * batch_size 154 | end = min(nb_sample, start+batch_size) 155 | tmp = sess.run(x_adv, feed_dict={x: X_train[start:end], 156 | eps: EPS, 157 | K.learning_phase(): 0}) 158 | X_train_adv[start:end] = tmp 159 | 160 | print('\nBuilding X_test_adv') 161 | nb_sample = X_test.shape[0] 162 | nb_batch = int(np.ceil(nb_sample/batch_size)) 163 | X_test_adv = np.empty(X_test.shape) 164 | for batch in range(nb_batch): 165 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 166 | start = batch * batch_size 167 | end = min(nb_sample, start+batch_size) 168 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 169 | eps: EPS, 170 | K.learning_phase(): 0}) 171 | X_test_adv[start:end] = tmp 172 | 173 | print('\nSaving adversarial dataset') 174 | os.makedirs('data', exist_ok=True) 175 | np.savez('data/table_1_svhn_{0:.4f}.npz'.format(EPS), 176 | X_train_adv=X_train_adv, X_test_adv=X_test_adv) 177 | 178 | 179 | print('\nTesting against adversarial test data') 180 | score = model0.evaluate(X_test_adv, y_test) 181 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 182 | 183 | 184 | print('\nPlotting random adversarial data') 185 | 186 | print('\nMaking predictions') 187 | z0 = np.argmax(y_test, axis=1) 188 | y1 = model0.predict(X_test) 189 | z1 = np.argmax(y1, axis=1) 190 | y2 = model0.predict(X_test_adv) 191 | z2 = np.argmax(y2, axis=1) 192 | 193 | print('\nSelecting figures') 194 | X_tmp = np.empty((2, nb_classes, img_rows, img_cols, img_chan)) 195 | y_proba = np.empty((2, nb_classes, nb_classes)) 196 | for i in range(10): 197 | print('Target {0}'.format(i)) 198 | ind, = np.where(np.all([z0==i, z1==i, z2!=i], axis=0)) 199 | cur = np.random.choice(ind) 200 | X_tmp[0][i] = X_test[cur] 201 | X_tmp[1][i] = X_test_adv[cur] 202 | y_proba[0][i] = y1[cur] 203 | y_proba[1][i] = y2[cur] 204 | 205 | 206 | print('\nPlotting results') 207 | fig = plt.figure(figsize=(10, 3)) 208 | gs = gridspec.GridSpec(2, 10, wspace=0.1, hspace=0.1) 209 | 210 | label = np.argmax(y_proba, axis=2) 211 | proba = np.max(y_proba, axis=2) 212 | for i in range(10): 213 | for j in range(2): 214 | ax = fig.add_subplot(gs[j, i]) 215 | ax.imshow(X_tmp[j][i], interpolation='none') 216 | ax.set_xticks([]) 217 | ax.set_yticks([]) 218 | ax.set_xlabel('{0} ({1:.2f})'.format(label[j][i], 219 | proba[j][i]), 220 | fontsize=12) 221 | 222 | print('\nSaving figure') 223 | gs.tight_layout(fig) 224 | os.makedirs('img', exist_ok=True) 225 | plt.savefig('img/table_1_svhn.pdf') 226 | 227 | 228 | print('\nPreparing clean/adversarial mixed dataset') 229 | X_all_train = np.vstack([X_train, X_train_adv]) 230 | y_all_train = np.vstack([np.zeros([X_train.shape[0], 1]), 231 | np.ones([X_train_adv.shape[0], 1])]) 232 | 233 | y0_test = np.zeros((y_test.shape[0], 1)) 234 | y1_test = np.ones((y_test.shape[0], 1)) 235 | 236 | 
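# model1 below is the binary detector: it is trained to output 0 for clean
# images and 1 for their adversarial counterparts; the permutation that
# follows shuffles the two stacked halves of the training set together.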
ind = np.random.permutation(X_all_train.shape[0]) 237 | X_all_train = X_all_train[ind] 238 | y_all_train = y_all_train[ind] 239 | 240 | 241 | if False: 242 | print('\nLoading model1') 243 | model1 = load_model('model/table_1_svhn_model1.h5') 244 | else: 245 | print('\nBuilding model1') 246 | model1 = Sequential([ 247 | Convolution2D(32, 3, 3, border_mode='same', 248 | input_shape=input_shape), 249 | LeakyReLU(alpha=0.2), 250 | Convolution2D(32, 3, 3), 251 | LeakyReLU(alpha=0.2), 252 | MaxPooling2D(pool_size=(2,2)), 253 | Dropout(0.2), 254 | Convolution2D(64, 3, 3, border_mode='same'), 255 | LeakyReLU(alpha=0.2), 256 | Convolution2D(64, 3, 3), 257 | LeakyReLU(alpha=0.2), 258 | MaxPooling2D(pool_size=(2, 2)), 259 | Dropout(0.3), 260 | Convolution2D(128, 3, 3, border_mode='same'), 261 | LeakyReLU(alpha=0.2), 262 | Convolution2D(128, 3, 3), 263 | LeakyReLU(alpha=0.2), 264 | MaxPooling2D(pool_size=(2, 2)), 265 | Dropout(0.4), 266 | Flatten(), 267 | Dense(512), 268 | Activation('relu'), 269 | Dropout(0.5), 270 | Dense(1), 271 | Activation('sigmoid')]) 272 | 273 | model1.compile(loss='binary_crossentropy', 274 | optimizer="adam", metrics=['accuracy']) 275 | 276 | print('\nTraining model1') 277 | model1.fit(X_all_train, y_all_train, nb_epoch=2, 278 | validation_split=0.1) 279 | 280 | print('\nSaving model1') 281 | os.makedirs('model', exist_ok=True) 282 | model1.save('model/table_1_svhn_model1.h5') 283 | 284 | 285 | # x1_adv = fgsm(model1, x, nb_epoch=4, eps=0.2) 286 | 287 | print('\nTesting against clean test data') 288 | score = model1.evaluate(X_test, y0_test) 289 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 290 | 291 | 292 | print('\nTesting against adversarial test data') 293 | score = model1.evaluate(X_test_adv, y1_test) 294 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 295 | 296 | 297 | # print('\nDisguising clean test data') 298 | # nb_sample = X_test.shape[0] 299 | # batch_size = 128 300 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 301 | # X_test_adv1 = np.empty(X_test.shape) 302 | # for batch in range(nb_batch): 303 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 304 | # start = batch * batch_size 305 | # end = min(nb_sample, start+batch_size) 306 | # tmp = sess.run(x1_adv, feed_dict={x: X_test[start:end], 307 | # K.learning_phase(): 0}) 308 | # X_test_adv1[start:end] = tmp 309 | 310 | 311 | # print('\nTesting against disguised clean data') 312 | # score = model1.evaluate(X_test_adv1, y0_test) 313 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 314 | 315 | 316 | # print('\nDisguising adversarial test data') 317 | # nb_sample = X_test_adv.shape[0] 318 | # batch_size = 128 319 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 320 | # X_test_adv2 = np.empty(X_test_adv.shape) 321 | # for batch in range(nb_batch): 322 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 323 | # start = batch * batch_size 324 | # end = min(nb_sample, start+batch_size) 325 | # tmp = sess.run(x1_adv, feed_dict={x: X_test_adv[start:end], 326 | # K.learning_phase(): 0}) 327 | # X_test_adv2[start:end] = tmp 328 | 329 | 330 | # print('\nTesting against disguised adversarial data') 331 | # score = model1.evaluate(X_test_adv2, y1_test) 332 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 333 | -------------------------------------------------------------------------------- /adv-clean-not-twins.org: -------------------------------------------------------------------------------- 1 | #+OPTIONS: ^:{} 
toc:nil hideblocks title:nil num:2 2 | #+LATEX_HEADER: \input{setup.tex} 3 | 4 | # ICML specific typesetting 5 | #+BEGIN_EXPORT latex 6 | 7 | % The \icmltitle you define below is probably too long as a header. 8 | % Therefore, a short form for the running title is supplied here: 9 | % \icmltitlerunning{a shorter running title} 10 | 11 | \twocolumn[ 12 | \icmltitle{Adversarial and Clean Data Are Not Twins} 13 | 14 | \begin{icmlauthorlist} 15 | \icmlauthor{Zhitao Gong}{au} 16 | \icmlauthor{Wenlu Wang}{au} 17 | \icmlauthor{Wei-Shinn Ku}{au} 18 | \end{icmlauthorlist} 19 | 20 | \icmlaffiliation{au}{Auburn University, Auburn, AL} 21 | 22 | \icmlcorrespondingauthor{Zhitao Gong}{gong@auburn.edu} 23 | 24 | % You may provide any keywords that you find helpful for describing 25 | % your paper; these are used to populate the "keywords" metadata in 26 | % the PDF but will not be shown in the document 27 | 28 | \icmlkeywords{adversarial, deep neural network} 29 | 30 | \vskip 0.3in 31 | ] 32 | 33 | \printAffiliationsAndNotice{} 34 | 35 | #+END_EXPORT 36 | 37 | #+BEGIN_abstract 38 | 39 | Adversarial attack has cast a shadow on the massive success of deep 40 | neural networks. Despite being almost visually identical to the clean 41 | data, the adversarial images can fool deep neural networks into wrong 42 | predictions with very high confidence. In this paper, however, we 43 | show that we can build a simple binary classifier separating the 44 | adversarial apart from the clean data with accuracy over 99%. We also 45 | empirically show that the binary classifier is robust to a 46 | second-round adversarial attack. In other words, it is difficult to 47 | disguise adversarial samples to bypass the binary classifier. Further 48 | more, we empirically investigate the generalization limitation which 49 | lingers on all current defensive methods, including the binary 50 | classifier approach. And we hypothesize that this is the result of 51 | intrinsic property of adversarial crafting algorithms. 52 | 53 | #+END_abstract 54 | 55 | * Introduction 56 | :PROPERTIES: 57 | :CUSTOM_ID: sec:introduction 58 | :END: 59 | 60 | Deep neural networks have been successfully adopted to many life 61 | critical areas, e.g., skin cancer detection 62 | cite:esteva2017-dermatologist, auto-driving cite:santana2016-learning, 63 | traffic sign classification cite:ciresan2012-multi, etc. A recent 64 | study cite:szegedy2013-intriguing, however, discovered that deep 65 | neural networks are susceptible to adversarial images. Figure 66 | ref:fig:adv-example shows an example of adversarial images generated 67 | via fast gradient sign method 68 | cite:kurakin2016-adversarial,kurakin2016-adversarial-1 on MNIST. As 69 | we can see that although the adversarial and original clean images are 70 | almost identical from the perspective of human beings, the deep neural 71 | network will produce wrong predictions with very high confidence. 72 | Similar techniques can easily fool the image system into mistaking a 73 | stop sign for a yield sign, a dog for a automobile, for example. When 74 | leveraged by malicious users, these adversarial images pose a great 75 | threat to the deep neural network systems. 76 | 77 | #+ATTR_LaTeX: :float multicolumn 78 | #+CAPTION: The adversarial images (second row) are generated from the first row via iterative FGSM. The label of each image is shown below with prediction probability in parenthesis. Our model achieves less then 1% error rate on the clean data. 
79 | #+NAME: fig:adv-example
80 | [[file:img/ex_adv_mnist.pdf]]
81 | 
82 | Although adversarial and clean images appear visually indiscernible,
83 | their subtle differences can successfully fool deep neural
84 | networks. This means that deep neural networks are sensitive to these
85 | subtle differences. So an intuitive question to ask is: can we
86 | leverage these subtle differences to distinguish between adversarial
87 | and clean images? Our experiment suggests the answer is positive. In
88 | this paper we demonstrate that a simple binary classifier can separate
89 | the adversarial images from the original clean images with very high
90 | accuracy (over 99%). However, we also show that the binary classifier
91 | approach suffers from a generalization limitation, i.e., it is
92 | sensitive 1) to the hyper-parameter used in crafting the adversarial
93 | dataset, and 2) to the adversarial crafting algorithm. In addition,
94 | we discovered that this limitation is shared among other
95 | proposed defenses against adversarial attacks, e.g., defensive
96 | retraining cite:huang2015-learning,kurakin2016-adversarial-1,
97 | knowledge distillation cite:papernot2015-distillation, etc. We
98 | empirically investigate the limitation and propose the hypothesis that
99 | the adversarial and original datasets are, in effect, two completely
100 | /different/ datasets, despite being visually similar.
101 | 
102 | This article is organized as follows. In Section
103 | ref:sec:related-work, we give an overview of the current research on
104 | adversarial attacks and defenses, with a focus on deep neural networks.
105 | This is followed by a brief summary of the state-of-the-art
106 | adversarial crafting algorithms in Section
107 | ref:sec:crafting-adversarials. Section ref:sec:experiment presents
108 | our experiment results and detailed discussions. We conclude in
109 | Section ref:sec:conclusion.
110 | 
111 | * Related Work
112 | :PROPERTIES:
113 | :CUSTOM_ID: sec:related-work
114 | :END:
115 | 
116 | The adversarial image attack on deep neural networks was first
117 | investigated in cite:szegedy2013-intriguing. The authors discovered
118 | that when some imperceptible, carefully chosen noise is added, an
119 | image may be wrongly classified with high confidence by a well-trained
120 | deep neural network. They also proposed an adversarial crafting
121 | algorithm based on optimization, which we briefly summarize in Section
122 | ref:sec:crafting-adversarials, as well as the hypothesis that
123 | adversarial samples exist as a result of the high nonlinearity of
124 | deep neural network models.
125 | 
126 | However, cite:goodfellow2014-explaining proposed a counter-intuitive
127 | hypothesis explaining the cause of adversarial samples. They argued
128 | that adversarial samples are caused by the models being too /linear/,
129 | rather than /nonlinear/. They proposed two adversarial crafting
130 | algorithms based on this hypothesis, i.e., the fast gradient sign
131 | method (FGSM) and the least-likely class method (LLCM)
132 | cite:goodfellow2014-explaining. The least-likely class method was
133 | later generalized to the target class gradient sign method (TGSM) in
134 | cite:kurakin2016-adversarial.
135 | 
136 | cite:papernot2015-limitations proposed another gradient-based
137 | adversarial algorithm, the Jacobian-based saliency map approach
138 | (JSMA), which can successfully alter the label of an image to any
139 | desired category.
140 | 
141 | Adversarial images have been shown to be transferable among deep
142 | neural networks cite:szegedy2013-intriguing,kurakin2016-adversarial.
143 | This poses a great threat to current learning systems in that the
144 | attacker does not need knowledge of the target system. Instead, the
145 | attacker can train a different model to create adversarial samples
146 | that remain effective against the target deep neural networks. What's
147 | worse, cite:papernot2016-transferability has shown that adversarial
148 | samples are even transferable among different machine learning
149 | techniques, e.g., deep neural networks, support vector machines,
150 | decision trees, logistic regression, etc.
151 | 
152 | Small steps have been made towards the defense against adversarial
153 | images. cite:kurakin2016-adversarial shows that some image
154 | transformations, e.g., Gaussian noise, Gaussian filtering, JPEG
155 | compression, etc., can effectively recover over 80% of the adversarial
156 | images. However, in our experiment, the image transformation defense
157 | does not perform well on images with low resolution, e.g., MNIST.
158 | Knowledge distillation is also shown to be an effective method against
159 | most adversarial images cite:papernot2015-distillation. The
160 | restrictions of defensive knowledge distillation are 1) that it only
161 | applies to models that produce categorical probabilities, and 2) that
162 | it needs model retraining. Adversarial training
163 | cite:kurakin2016-adversarial-1,huang2015-learning was also shown to
164 | greatly enhance model robustness to adversarial samples. However, as
165 | discussed in Section ref:subsec:generalization-limitation, defensive
166 | distillation and adversarial training suffer from what we call the
167 | generalization limitation. Our experiment suggests that this is
168 | an intrinsic property of adversarial datasets.
169 | 
170 | * Crafting Adversarials
171 | :PROPERTIES:
172 | :CUSTOM_ID: sec:crafting-adversarials
173 | :END:
174 | 
175 | There are mainly two categories of algorithms for generating
176 | adversarial samples: model-independent and model-dependent. We
177 | briefly summarize these two classes of methods in this section.
178 | 
179 | By convention, we use \(X\) to represent the input image set (usually
180 | a 3-dimensional tensor) and \(Y\) to represent the label set, usually
181 | one-hot encoded. Lowercase letters represent an individual data
182 | sample, e.g., \(x\) for one input image. A subscript on a data sample
183 | denotes one of its elements, e.g., \(x_i\) denotes one pixel in the
184 | image and \(y_i\) the probability of the \(i\)-th target class.
185 | \(f\) denotes the model, \(\theta\) the model parameters, and \(J\)
186 | the loss function. We use the superscript /adv/ to denote
187 | adversarial-related variables, e.g., \(x^{adv}\) for one adversarial
188 | image. \(\delta x\) denotes the adversarial noise for one image,
189 | i.e., \(x^{adv} = x + \delta x\). For clarity, we also include the
190 | model used to craft the adversarial samples where necessary, e.g.,
191 | \(x^{adv(f_1)}\) denotes the adversarial samples created with model
192 | \(f_1\). \(\mathbb{D}\) denotes the image value domain, usually
193 | \([0, 1]\) or \([0, 255]\). Finally, \(\epsilon\) is a scalar that
194 | controls the scale of the adversarial noise, another hyper-parameter to choose.
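
To make the notation concrete, constructing an adversarial image from
a clean image amounts to a few lines of NumPy. This is only a minimal
illustrative sketch (the function name =apply_noise= is ours and is
not part of our experiment code); it simply applies
\(x^{adv} = x + \delta x\) and clips the result back into the image
domain \(\mathbb{D}\).

#+BEGIN_SRC python
import numpy as np

def apply_noise(x, delta_x, domain=(0.0, 1.0)):
    """Form x_adv = x + delta_x and clip it back into the domain D."""
    lo, hi = domain
    return np.clip(x + delta_x, lo, hi)

# Hypothetical usage: delta_x comes from any crafting algorithm,
# e.g., epsilon * sign(gradient) for FGSM.
# x_adv = apply_noise(x, delta_x)
#+END_SRC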
195 | 
196 | ** Model Independent Method
197 | 
198 | A box-constrained minimization algorithm based on L-BFGS was the first
199 | algorithm proposed to generate adversarial data
200 | cite:szegedy2013-intriguing. Concretely, we want to find the smallest
201 | (in the sense of \(L^2\)-norm) noise \(\delta x\) such that the
202 | adversarial image belongs to a different category, i.e.,
203 | \(f(x^{adv})\neq f(x)\).
204 | #+BEGIN_EXPORT latex
205 | \begin{equation} \label{eq:guided-walk}
206 | \begin{split}
207 | \delta x &= \argmin_r c\norm{r}_\infty + J(x+r, y^{adv})\\
208 | &\text{ s.t. } x+r\in \mathbb{D}
209 | \end{split}
210 | \end{equation}
211 | #+END_EXPORT
212 | 
213 | ** Model Dependent Methods
214 | 
215 | There are mainly three methods that rely on the model gradient, i.e.,
216 | the fast gradient sign method (FGSM) cite:kurakin2016-adversarial, the
217 | target class method (TGSM)
218 | cite:kurakin2016-adversarial,kurakin2016-adversarial-1, and the
219 | Jacobian-based saliency map approach (JSMA)
220 | cite:papernot2015-limitations. We will see in Section
221 | ref:sec:experiment that although they all produce highly disguising
222 | adversarial samples, FGSM and TGSM produce /compatible/ adversarial
223 | datasets which are completely /different/ from those generated via JSMA.
224 | 
225 | *** Fast Gradient Sign Method (FGSM)
226 | 
227 | FGSM modifies the input towards the direction in which \(J\)
228 | increases, i.e., \(\dv*{J(x, y^{adv})}{x}\), as shown in Equation
229 | ref:eq:fgsm.
230 | #+BEGIN_EXPORT latex
231 | \begin{equation} \label{eq:fgsm}
232 | \delta x = \epsilon\sign\left(\dv{J(x, \pred{y})}{x}\right)
233 | \end{equation}
234 | #+END_EXPORT
235 | 
236 | Originally, cite:kurakin2016-adversarial proposes to generate
237 | adversarial samples by using the true label, i.e., \(y^{adv} =
238 | y^{true}\), which has been shown to suffer from the label leaking
239 | problem cite:kurakin2016-adversarial-1. Instead of true labels,
240 | cite:kurakin2016-adversarial-1 proposes to use the /predicted/ label,
241 | i.e., \(\pred{y} = f(x)\), to generate adversarial examples.
242 | 
243 | This method can also be applied iteratively, as shown in Equation
244 | ref:eq:fgsm-iter. Iterative FGSM has a much higher success rate than
245 | the one-step FGSM. However, the iterative version is less robust to
246 | image transformations cite:kurakin2016-adversarial.
247 | #+BEGIN_EXPORT latex
248 | \begin{equation} \label{eq:fgsm-iter}
249 | \begin{split}
250 | x^{adv}_{k+1} &= x^{adv}_k + \epsilon\sign\left(\dv{J(x^{adv}_k, \pred{y_k})}{x}\right)\\
251 | x^{adv}_0 &= x\\
252 | \pred{y_k} &= f(x^{adv}_k)
253 | \end{split}
254 | \end{equation}
255 | #+END_EXPORT
256 | 
257 | *** Target Class Gradient Sign Method (TGSM)
258 | 
259 | This method modifies the input towards the direction in which
260 | \(p(y^{adv}\given x)\) increases.
261 | #+BEGIN_EXPORT latex
262 | \begin{equation} \label{eq:tcm}
263 | \delta x = -\epsilon\sign\left(\dv{J(x, y^{adv})}{x}\right)
264 | \end{equation}
265 | #+END_EXPORT
266 | 
267 | Originally, this method was proposed as the least-likely class method
268 | cite:kurakin2016-adversarial, where \(y^{adv}\) was chosen as the
269 | least-likely class predicted by the model, as shown in Equation
270 | ref:eq:llcm-y.
271 | #+BEGIN_EXPORT latex
272 | \begin{equation} \label{eq:llcm-y}
273 | y^{adv} = \text{OneHotEncode}\left(\argmin f(x)\right)
274 | \end{equation}
275 | #+END_EXPORT
276 | 
277 | It was later extended to a more general case where \(y^{adv}\) could
278 | be any desired target class cite:kurakin2016-adversarial-1.
279 | 
280 | # The following table belongs to the "Efficiency and Robustness of the
281 | # Classifier" section, placed here only for typesetting.
282 | 
283 | #+BEGIN_EXPORT latex
284 | \begin{table*}[htbp]
285 | \caption{\label{tbl:accuracy-summary}
286 | Accuracy on adversarial samples generated with FGSM/TGSM.}
287 | \centering
288 | \begin{tabular}{lcrrcrrrr}
289 | \toprule
290 | & \phantom{a} & \multicolumn{2}{c}{\(f_1\)} & \phantom{a} & \multicolumn{4}{c}{\(f_2\)} \\
291 | \cmidrule{3-4} \cmidrule{6-9}
292 | Dataset && \(X_{test}\) & \(X^{adv(f_1)}_{test}\) && \(X_{test}\) & \(X^{adv(f_1)}_{test}\) & \(\{X_{test}\}^{adv(f_2)}\) & \(\{X^{adv(f_1)}_{test}\}^{adv(f_2)}\) \\
293 | \midrule
294 | MNIST && 0.9914 & 0.0213 && 1.00 & 1.00 & 0.00 & 1.00\\
295 | CIFAR10 && 0.8279 & 0.1500 && 0.99 & 1.00 & 0.01 & 1.00\\
296 | SVHN && 0.9378 & 0.2453 && 1.00 & 1.00 & 0.00 & 1.00\\
297 | \bottomrule
298 | \end{tabular}
299 | \end{table*}
300 | 
301 | #+END_EXPORT
302 | 
303 | # #+CAPTION: Accuracy on adversarial samples generated with FGSM/TGSM.
304 | # #+NAME: tbl:accuracy-summary
305 | # #+ATTR_LaTeX: :booktabs true :align l|rr|rrrr :float multicolumn
306 | # |         | \(f_1\)      |                         |              | \(f_2\)                 |                             |                                        |
307 | # |---------+--------------+-------------------------+--------------+-------------------------+-----------------------------+----------------------------------------|
308 | # | Dataset | \(X_{test}\) | \(X^{adv(f_1)}_{test}\) | \(X_{test}\) | \(X^{adv(f_1)}_{test}\) | \(\{X_{test}\}^{adv(f_2)}\) | \(\{X^{adv(f_1)}_{test}\}^{adv(f_2)}\) |
309 | # |---------+--------------+-------------------------+--------------+-------------------------+-----------------------------+----------------------------------------|
310 | # | MNIST   | 0.9914       | 0.0213                  | 1.00         | 1.00                    | 0.00                        | 1.00                                   |
311 | # | CIFAR10 | 0.8279       | 0.1500                  | 0.99         | 1.00                    | 0.01                        | 1.00                                   |
312 | # | SVHN    | 0.9378       | 0.2453                  | 1.00         | 1.00                    | 0.00                        | 1.00                                   |
313 | 
314 | *** Jacobian-based Saliency Map Approach (JSMA)
315 | 
316 | Similar to the target class method, JSMA cite:papernot2015-limitations
317 | allows one to specify the desired target class. However, instead of
318 | adding noise to the whole input, JSMA changes only one pixel at a
319 | time. A /saliency score/ is calculated for each pixel, and the pixel
320 | with the highest score is chosen to be perturbed.
321 | #+BEGIN_EXPORT latex
322 | \begin{equation} \label{eq:jsma-saliency}
323 | \begin{split}
324 | s(x_i) &= \begin{cases}
325 | 0 & \text{ if } s_t < 0 \text{ or } s_o > 0\\
326 | s_t\abs{s_o} & \text{ otherwise}
327 | \end{cases}\\
328 | s_t &= \pdv{y_t}{x_i}\qquad s_o = \sum_{j\neq t}\pdv{y_j}{x_i}
329 | \end{split}
330 | \end{equation}
331 | #+END_EXPORT
332 | 
333 | Concretely, \(s_t\) is the Jacobian value of the desired target class
334 | \(y_t\) w.r.t. an individual pixel, and \(s_o\) is the sum of the
335 | Jacobian values of all non-target classes. Intuitively, the saliency
336 | score indicates the sensitivity of each output class w.r.t. each
337 | individual pixel. We want to perturb the pixel towards the direction
338 | where \(p(y_t\given x)\) increases the most.
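
Before moving on to the experiments, the gradient-sign perturbations
of Equations ref:eq:fgsm and ref:eq:tcm can be summarized in a short
TensorFlow sketch. This is only an illustration under TensorFlow 1.x
conventions; the function name =gradient_sign_noise= and its arguments
are ours, and our actual experiments use the =fgsm= routine from the
tensorflow-adversarial library instead.

#+BEGIN_SRC python
import tensorflow as tf

def gradient_sign_noise(model, x, eps=0.03, targeted=False, y_target=None):
    """Symbolic FGSM/TGSM noise for an input tensor x (TF 1.x style).

    FGSM follows +sign(dJ/dx) computed with the model's own predicted
    label (to avoid label leaking); TGSM follows -sign(dJ/dx) computed
    with the desired target label y_target.
    """
    y_prob = model(x)                          # predicted probabilities
    if targeted:
        y_ref, direction = y_target, -1.0
    else:
        nb_classes = int(y_prob.get_shape()[1])
        y_ref = tf.one_hot(tf.argmax(y_prob, axis=1), nb_classes)
        direction = 1.0
    # cross-entropy loss J(x, y_ref)
    loss = -tf.reduce_sum(y_ref * tf.log(y_prob + 1e-12), axis=1)
    grad, = tf.gradients(loss, x)
    return direction * eps * tf.sign(grad)
#+END_SRC

The adversarial image is then \(x^{adv}\), i.e., \(x\) plus this noise
clipped back into \(\mathbb{D}\); repeating the update with a small
\(\epsilon\) gives the iterative variant of Equation ref:eq:fgsm-iter.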
339 | 
340 | * Experiment
341 | :PROPERTIES:
342 | :CUSTOM_ID: sec:experiment
343 | :END:
344 | 
345 | Generally, we follow the steps below to test the effectiveness and
346 | limitation of the binary classifier approach.
347 | 
348 | 1. Train a deep neural network \(f_1\) on the original clean training
349 |    data \(X_{train}\), and craft adversarial datasets from the original
350 |    clean data, \(X_{train}\to X^{adv(f_1)}_{train}\), \(X_{test}\to
351 |    X^{adv(f_1)}_{test}\). \(f_1\) is used to generate the attacking
352 |    adversarial dataset that we want to filter out.
353 | 2. Train a binary classifier \(f_2\) on the combined (shuffled)
354 |    training data \(\{X_{train}, X^{adv(f_1)}_{train}\}\), where
355 |    \(X_{train}\) is labeled 0 and \(X^{adv(f_1)}_{train}\) is labeled 1.
356 | 3. Test the accuracy of \(f_2\) on \(X_{test}\) and
357 |    \(X^{adv(f_1)}_{test}\), respectively.
358 | 4. Construct second-round adversarial test data, \(\{X_{test},
359 |    X^{adv(f_1)}_{test}\}\to \{X_{test},
360 |    X^{adv(f_1)}_{test}\}^{adv(f_2)}\), and test the accuracy of \(f_2\)
361 |    on this new adversarial dataset. Concretely, we want to test whether
362 |    we could find adversarial samples 1) that can successfully bypass
363 |    the binary classifier \(f_2\), and 2) that can still fool the
364 |    target model \(f_1\) if they bypass the binary classifier. Since
365 |    adversarial datasets are shown to be transferable among different
366 |    machine learning techniques cite:papernot2016-transferability, the
367 |    binary classifier approach would be seriously flawed if \(f_2\)
368 |    failed this second-round attacking test.
369 | 
370 | The code to reproduce our experiment is available at
371 | https://github.com/gongzhitaao/adversarial-classifier; a minimal
372 | illustrative sketch of step 2 is given at the end of this subsection.
373 | 
374 | ** Efficiency and Robustness of the Classifier
375 | 
376 | We evaluate the binary classifier approach on the MNIST, CIFAR10, and
377 | SVHN datasets. On all of the datasets, the binary classifier achieved
378 | accuracy over 99% and was shown to be robust to a second-round
379 | adversarial attack. The results are summarized in Table
380 | ref:tbl:accuracy-summary. Each column denotes the model accuracy on
381 | the corresponding dataset. The direct conclusions from Table
382 | ref:tbl:accuracy-summary are summarized as follows.
383 | 1. Accuracy on \(X_{test}\) and \(X^{adv(f_1)}_{test}\) suggests that
384 |    the binary classifier is very effective at separating adversarial
385 |    from clean data. In our experiment, the accuracy on \(X_{test}\)
386 |    is always near 1, while the accuracy on \(X^{adv(f_1)}_{test}\) is
387 |    either near 1 (successful) or near 0 (unsuccessful), which means
388 |    that the classifier either detects the subtle difference completely
389 |    or fails completely. We did not observe any values in between.
390 | 2. Accuracy on \(\{X^{adv(f_1)}_{test}\}^{adv(f_2)}\) suggests that we
391 |    were not successful in disguising adversarial samples to bypass
392 |    the classifier. In other words, the binary classifier approach is
393 |    robust to a second-round adversarial attack.
394 | 3. Accuracy on \(\{X_{test}\}^{adv(f_2)}\) suggests that in case of
395 |    the second-round attack, the binary classifier has a very high
396 |    false-negative rate. In other words, it tends to recognize all the
397 |    disguised samples as adversarial. In our opinion, this does not pose
398 |    a problem, since our main focus is to block adversarial samples.
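
For reference, the core of step 2 above, i.e., mixing the clean and
adversarial data and fitting the binary classifier \(f_2\), reduces to
a few lines of Keras. The sketch below is illustrative only: the
helper names are ours, and the \(f_2\) used in our experiments is a
deeper convolutional network (see the repository scripts for the full
details).

#+BEGIN_SRC python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten

def build_binary_dataset(X_clean, X_adv):
    """Label clean data 0 and adversarial data 1, then shuffle."""
    X_all = np.concatenate([X_clean, X_adv], axis=0)
    y_all = np.concatenate([np.zeros(len(X_clean)), np.ones(len(X_adv))])
    ind = np.random.permutation(len(X_all))
    return X_all[ind], y_all[ind]

def build_f2(input_shape):
    """A deliberately small stand-in for the binary classifier f_2."""
    model = Sequential([
        Flatten(input_shape=input_shape),
        Dense(512, activation='relu'),
        Dense(1, activation='sigmoid')])       # P(adversarial | x)
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

# Hypothetical usage:
# X_all, y_all = build_binary_dataset(X_train, X_adv_train)
# f2 = build_f2(X_all.shape[1:])
# f2.fit(X_all, y_all, validation_split=0.1)
#+END_SRC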
399 | 
400 | ** Generalization Limitation
401 | :PROPERTIES:
402 | :CUSTOM_ID: subsec:generalization-limitation
403 | :END:
404 | 
405 | Before we become too optimistic about the performance of the binary
406 | classifier approach, however, we note that it suffers from the
407 | /generalization limitation/.
408 | 1. When trained to recognize adversarial datasets generated via
409 |    FGSM/TGSM, the binary classifier is sensitive to the
410 |    hyper-parameter \(\epsilon\).
411 | 2. The binary classifier is also sensitive to the adversarial crafting
412 |    algorithm.
413 | 
414 | In our experiment, the aforementioned limitations also apply to
415 | adversarial training cite:kurakin2016-adversarial-1,huang2015-learning
416 | and defensive distillation cite:papernot2015-distillation.
417 | 
418 | *** Sensitivity to \(\epsilon\)
419 | 
420 | Table ref:tbl:eps-sensitivity-cifar10 summarizes our tests on CIFAR10.
421 | For brevity, we use \(\eval{f_2}_{\epsilon=\epsilon_0}\) to denote
422 | that the classifier \(f_2\) is trained on adversarial data generated
423 | on \(f_1\) with \(\epsilon=\epsilon_0\). The binary classifier is
424 | trained on clean data mixed with adversarial data generated
425 | via FGSM with \(\epsilon=0.03\). Then we re-generate adversarial
426 | datasets via FGSM/TGSM with different \(\epsilon\) values.
427 | 
428 | #+BEGIN_EXPORT latex
429 | \begin{table}[htbp]
430 | \caption{\label{tbl:eps-sensitivity-cifar10}
431 | \(\epsilon\) sensitivity on CIFAR10}
432 | \centering
433 | \begin{tabular}{lcll}
434 | \toprule
435 | & \phantom{a} & \multicolumn{2}{c}{\(\eval{f_2}_{\epsilon=0.03}\)} \\
436 | \cmidrule{3-4}
437 | \(\epsilon\) && \(X_{test}\) & \(X^{adv(f_1)}_{test}\)\\
438 | \midrule
439 | 0.3 && 0.9996 & 1.0000\\
440 | 0.1 && 0.9996 & 1.0000\\
441 | 0.03 && 0.9996 & 0.9997\\
442 | 0.01 && 0.9996 & \textbf{0.0030}\\
443 | \bottomrule
444 | \end{tabular}
445 | \end{table}
446 | #+END_EXPORT
447 | 
448 | # #+CAPTION: \(\epsilon\) sensitivity on CIFAR10
449 | # #+NAME: tbl:eps-sensitivity-cifar10
450 | # #+ATTR_LaTeX: :booktabs true :align r|rr
451 | # |              | \(\eval{f_2}_{\epsilon=0.03}\) |                         |
452 | # |--------------+--------------------------------+-------------------------|
453 | # | \(\epsilon\) | \(X_{test}\)                   | \(X^{adv(f_1)}_{test}\) |
454 | # |--------------+--------------------------------+-------------------------|
455 | # | 0.3          | 0.9996                         | 1.0000                  |
456 | # | 0.1          | 0.9996                         | 1.0000                  |
457 | # | 0.03         | 0.9996                         | 0.9997                  |
458 | # | 0.01         | 0.9996                         | *0.0030*                |
459 | 
460 | As shown in Table ref:tbl:eps-sensitivity-cifar10,
461 | \(\eval{f_2}_{\epsilon=\epsilon_0}\) can correctly filter out
462 | adversarial datasets generated with \(\epsilon\geq\epsilon_0\), but
463 | fails when the adversarial data are generated with
464 | \(\epsilon<\epsilon_0\). Results on MNIST and SVHN are similar. This
465 | phenomenon was also observed in defensive retraining
466 | cite:kurakin2016-adversarial-1. To overcome this issue, they proposed
467 | to use mixed \(\epsilon\) values to generate the adversarial datasets.
468 | However, Table ref:tbl:eps-sensitivity-cifar10 suggests that
469 | adversarial datasets generated with smaller \(\epsilon\) are
470 | /supersets/ of those generated with larger \(\epsilon\). This
471 | could be well explained by the linearity hypothesis
472 | cite:kurakin2016-adversarial,warde-farley2016-adversarial. The same
473 | conclusion also applies to adversarial training.
In our experiment,
474 | the results of defensive retraining are similar to those of the
475 | binary classifier approach.
476 | 
477 | *** Disparity among Adversarial Samples
478 | 
479 | #+ATTR_LaTeX: :float multicolumn
480 | #+CAPTION: Adversarial training \cite{huang2015-learning,kurakin2016-adversarial-1} does not work. This is a church window plot \cite{warde-farley2016-adversarial}. Each pixel \((i, j)\) (row index and column index pair) represents a data point \(\tilde{x}\) in the input space and \(\tilde{x} = x + \vb{h}\epsilon_j + \vb{v}\epsilon_i\), where \(\vb{h}\) is the direction computed by FGSM and \(\vb{v}\) is a random direction orthogonal to \(\vb{h}\). The \(\epsilon\) ranges over \([-0.5, 0.5]\) and \(\epsilon_{(\cdot)}\) is the interpolated value in between. The central black dot \tikz[baseline=-0.5ex]{\draw[fill=black] (0,0) circle (0.3ex)} represents the original data point \(x\), the orange dot (on the right of the center dot) \tikz[baseline=-0.5ex]{\draw[fill=orange,draw=none] (0,0) circle (0.3ex)} represents the last adversarial sample created from \(x\) via FGSM that is used in the adversarial training, and the blue dot \tikz[baseline=-0.5ex]{\draw[fill=blue,draw=none] (0,0) circle (0.3ex)} represents a random adversarial sample created from \(x\) that cannot be recognized with adversarial training. The three digits below each image, from left to right, are the data samples that correspond to the black dot, orange dot and blue dot, respectively. \tikz[baseline=0.5ex]{\draw (0,0) rectangle (2.5ex,2.5ex)} ( \tikz[baseline=0.5ex]{\draw[fill=black,opacity=0.1] (0,0) rectangle (2.5ex,2.5ex)} ) represents the data samples that are always correctly (incorrectly) recognized by the model. \tikz[baseline=0.5ex]{\draw[fill=red,opacity=0.1] (0,0) rectangle (2.5ex,2.5ex)} represents the adversarial samples that can be correctly recognized without adversarial training only, i.e., the side effect of adversarial training. And \tikz[baseline=0.5ex]{\draw[fill=green,opacity=0.1] (0,0) rectangle (2.5ex,2.5ex)} represents the data points that were correctly recognized with adversarial training only.
481 | #+NAME: fig:adv-training-not-working
482 | [[file:img/adv-training-not-working.pdf]]
483 | 
484 | In our experiment, we also discovered that the binary classifier is
485 | sensitive to the algorithm used to generate the adversarial
486 | datasets.
487 | 
488 | Specifically, the binary classifier trained on the FGSM adversarial
489 | dataset achieves good accuracy (over 99%) on FGSM adversarial data,
490 | but not on adversarial data generated via JSMA, and vice versa.
491 | However, when the binary classifier is trained on a mixed adversarial
492 | dataset from FGSM and JSMA, it performs well (with accuracy over 99%)
493 | on both datasets. This suggests that FGSM and JSMA generate
494 | adversarial datasets that are /far away/ from each other. This claim
495 | is vague without precisely defining what /far away/ means. In our
496 | opinion, they are /far away/ in the same way that CIFAR10 is /far
497 | away/ from SVHN. A well-trained model on CIFAR10 will perform poorly
498 | on SVHN, and vice versa. However, a model well trained on the mixed
499 | dataset of CIFAR10 and SVHN will perform just as well, if not better,
500 | on both datasets, as if it were trained solely on one dataset.
501 | 
502 | The adversarial datasets generated via FGSM and TGSM are, however,
503 | /compatible/ with each other.
In other words, the classifier
504 | trained on one adversarial dataset performs well on adversarial
505 | samples from the other algorithm. They are compatible in the same way
506 | that the training set and test set are compatible: we usually expect a
507 | properly trained model to generalize well to unseen data from the
508 | same distribution, e.g., the test dataset.
509 | 
510 | In effect, it is not just FGSM and JSMA that are incompatible. We can
511 | generate adversarial data samples by a linear combination of the
512 | direction computed by FGSM and another random orthogonal direction, as
513 | illustrated in the church window plot cite:warde-farley2016-adversarial
514 | in Figure ref:fig:adv-training-not-working. Figure
515 | ref:fig:adv-training-not-working visually shows the effect of
516 | adversarial training cite:kurakin2016-adversarial-1. Each image
517 | represents adversarial samples generated from /one/ data sample, which
518 | is shown as a black dot in the center of each image; the last
519 | adversarial sample used in adversarial training is shown as an
520 | orange dot (on the right of the black dot, i.e., in the direction
521 | computed by FGSM). The green area represents the adversarial samples
522 | that cannot be correctly recognized without adversarial training but
523 | can be correctly recognized with adversarial training. The red area
524 | represents data samples that can be correctly recognized without
525 | adversarial training but cannot be correctly recognized with
526 | adversarial training. In other words, it represents the side effect
527 | of adversarial training, i.e., slightly reducing the model accuracy.
528 | The white (gray) area represents the data samples that are always
529 | correctly (incorrectly) recognized with or without adversarial
530 | training.
531 | 
532 | As we can see from Figure ref:fig:adv-training-not-working,
533 | adversarial training does make the model more robust against the
534 | adversarial sample (and, to some extent, the adversarial samples
535 | around it) used for training (green area). However, it does not rule
536 | out all adversarial samples. There are still adversarial samples
537 | (gray area) that are not affected by the adversarial training.
538 | Furthermore, we observe that the green area is largely distributed
539 | along the horizontal direction, i.e., the FGSM direction.
540 | cite:nguyen2014-deep observed similar results for fooling images: in
541 | their experiment, after adversarial training with fooling images,
542 | deep neural network models were more robust against a limited set of
543 | fooling images, but could still easily be fooled by other fooling images.
544 | 
545 | * Conclusion
546 | :PROPERTIES:
547 | :CUSTOM_ID: sec:conclusion
548 | :END:
549 | 
550 | We show in this paper that the binary classifier is a simple yet
551 | effective and robust way to separate adversarial images from the
552 | original clean images. Its advantage over defensive retraining and
553 | distillation is that it serves as a preprocessing step without
554 | assumptions about the model it protects. In addition, it can be
555 | readily deployed without any modification of the underlying systems.
556 | However, as we empirically showed in the experiment, the binary
557 | classifier approach, defensive retraining, and distillation all suffer
558 | from the generalization limitation. For future work, we plan to extend
559 | our current work in two directions.
First, we want to investigate the
560 | disparity between different adversarial crafting methods and its
561 | effect on the generated adversarial space. Second, we will carefully
562 | examine the cause of adversarial samples, since intuitively the
563 | linearity hypothesis does not seem right to us.
564 | 
565 | 
566 | #+LaTeX: \bibliographystyle{icml2017}
567 | #+LaTeX: \bibliography{/home/gongzhitaao/Dropbox/bibliography/nn}
568 | 
--------------------------------------------------------------------------------