├── img ├── ex_adv_mnist.pdf └── adv-training-not-working.pdf ├── setup.tex ├── README.org ├── LICENSE ├── src ├── figure_1.py ├── table_2.py ├── table_1_mnist.py ├── table_1_cifar10.py ├── figure_2.py └── table_1_svhn.py └── adv-clean-not-twins.org /img/ex_adv_mnist.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gongzhitaao/adversarial-classifier/HEAD/img/ex_adv_mnist.pdf -------------------------------------------------------------------------------- /img/adv-training-not-working.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gongzhitaao/adversarial-classifier/HEAD/img/adv-training-not-working.pdf -------------------------------------------------------------------------------- /setup.tex: -------------------------------------------------------------------------------- 1 | \usepackage{booktabs} 2 | \usepackage{authblk} 3 | \usepackage{mathtools} 4 | \usepackage{amssymb} 5 | \usepackage{natbib} 6 | \usepackage{physics} 7 | \usepackage{subfigure} 8 | \usepackage{times} 9 | \usepackage{tikz} 10 | 11 | \DeclareMathOperator*{\argmin}{argmin} 12 | \DeclareMathOperator*{\argmax}{argmax} 13 | \DeclareMathOperator*{\sign}{sign} 14 | \newcommand\pred[1]{\overline{#1}} 15 | \newcommand\given{\:\vert\:} 16 | 17 | % \usepackage{icml2017} 18 | 19 | % Employ this version of the ``usepackage'' statement after the paper has 20 | % been accepted, when creating the final version. This will set the 21 | % note in the first column to ``Proceedings of the...'' 22 | \usepackage[accepted]{icml2017} 23 | -------------------------------------------------------------------------------- /README.org: -------------------------------------------------------------------------------- 1 | #+TITLE: Adversarial Classifier 2 | 3 | This repo contains the code to reproduce the experiment in our paper 4 | (https://arxiv.org/abs/1704.04960), specifically the code to generate 5 | the figures and tables. 6 | 7 | * Code Dependencies 8 | :PROPERTIES: 9 | :CUSTOM_ID: sec:code-dependencies 10 | :END: 11 | 12 | 1. Python 3.6+ 13 | 2. Keras https://keras.io/ 14 | 3. Tensorflow https://tensorflow.org 15 | 4. tensorflow-adversarial 16 | https://github.com/gongzhitaao/tensorflow-adversarial 17 | 18 | * Datasets 19 | :PROPERTIES: 20 | :CUSTOM_ID: sec:datasets 21 | :END: 22 | 23 | We used MNIST, CIFAR10 and SVHN in our experiment. 24 | 25 | * Document 26 | :PROPERTIES: 27 | :CUSTOM_ID: sec:document 28 | :END: 29 | 30 | The paper itself is written in Org mode (http://orgmode.org/) 31 | employing the ICML2017 LaTeX template, with slight modification, i.e., 32 | removing the conference information. 
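* Running the Experiments
:PROPERTIES:
:CUSTOM_ID: sec:running-the-experiments
:END:

Each script under =src/= is standalone: it loads (or downloads) its
dataset, trains its model(s), and writes checkpoints to =model/=,
cached adversarial data to =data/=, and figures to =img/= relative to
its working directory. The driver below is a sketch only (it assumes
the tensorflow-adversarial repository is importable as =attacks=,
e.g., cloned or symlinked into =src/=, and that =python= is Python 3):

#+BEGIN_SRC python
  # hypothetical driver, not part of the original scripts
  import subprocess

  scripts = ['figure_1.py', 'figure_2.py', 'table_1_mnist.py',
             'table_1_cifar10.py', 'table_1_svhn.py', 'table_2.py']
  for script in scripts:
      # each script is self-contained; outputs land under src/
      subprocess.run(['python', script], cwd='src', check=True)
#+END_SRC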
33 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 gongzhitaao 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining 6 | a copy of this software and associated documentation files (the 7 | "Software"), to deal in the Software without restriction, including 8 | without limitation the rights to use, copy, modify, merge, publish, 9 | distribute, sublicense, and/or sell copies of the Software, and to 10 | permit persons to whom the Software is furnished to do so, subject to 11 | the following conditions: 12 | 13 | The above copyright notice and this permission notice shall be 14 | included in all copies or substantial portions of the Software. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 19 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 20 | LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 21 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 22 | WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /src/figure_1.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from keras import backend as K 9 | from keras.datasets import mnist 10 | from keras.models import Sequential, load_model 11 | from keras.layers import Dense, Dropout, Activation, Flatten 12 | from keras.layers import Convolution2D, MaxPooling2D 13 | from keras.utils import np_utils 14 | 15 | import matplotlib 16 | matplotlib.use('Agg') 17 | import matplotlib.pyplot as plt 18 | import matplotlib.gridspec as gridspec 19 | 20 | from attacks.fgsm import fgsm 21 | 22 | 23 | img_rows = 28 24 | img_cols = 28 25 | img_chas = 1 26 | input_shape = (img_rows, img_cols, img_chas) 27 | nb_classes = 10 28 | 29 | 30 | print('\nLoading mnist') 31 | (X_train, y_train), (X_test, y_test) = mnist.load_data() 32 | 33 | X_train = X_train.astype('float32') / 255. 34 | X_test = X_test.astype('float32') / 255. 
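# pixel values are now scaled to [0, 1], so the FGSM step size eps=0.02
# used below corresponds to roughly 2% of the dynamic range per iteration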
35 | 36 | X_train = X_train.reshape(-1, img_rows, img_cols, img_chas) 37 | X_test = X_test.reshape(-1, img_rows, img_cols, img_chas) 38 | 39 | # one hot encoding 40 | y_train = np_utils.to_categorical(y_train, nb_classes) 41 | z0 = y_test.copy() 42 | y_test = np_utils.to_categorical(y_test, nb_classes) 43 | 44 | 45 | sess = tf.InteractiveSession() 46 | K.set_session(sess) 47 | 48 | 49 | if False: 50 | print('\nLoading model') 51 | model = load_model('model/figure_1.h5') 52 | else: 53 | print('\nBuilding model') 54 | model = Sequential([ 55 | Convolution2D(32, 3, 3, input_shape=input_shape), 56 | Activation('relu'), 57 | Convolution2D(32, 3, 3), 58 | Activation('relu'), 59 | MaxPooling2D(pool_size=(2, 2)), 60 | Dropout(0.25), 61 | Flatten(), 62 | Dense(128), 63 | Activation('relu'), 64 | Dropout(0.5), 65 | Dense(10), 66 | Activation('softmax')]) 67 | 68 | model.compile(optimizer='adam', loss='categorical_crossentropy', 69 | metrics=['accuracy']) 70 | 71 | print('\nTraining model') 72 | model.fit(X_train, y_train, nb_epoch=10) 73 | 74 | print('\nSaving model') 75 | os.makedirs('model', exist_ok=True) 76 | model.save('model/figure_1.h5') 77 | 78 | 79 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chas)) 80 | x_adv = fgsm(model, x, nb_epoch=9, eps=0.02) 81 | 82 | 83 | print('\nTest against clean data') 84 | score = model.evaluate(X_test, y_test) 85 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 86 | 87 | 88 | if False: 89 | print('\nLoading adversarial data') 90 | X_adv = np.load('data/figure_1.npy') 91 | else: 92 | print('\nGenerating adversarial data') 93 | nb_sample = X_test.shape[0] 94 | batch_size = 128 95 | nb_batch = int(np.ceil(nb_sample/batch_size)) 96 | X_adv = np.empty(X_test.shape) 97 | for batch in range(nb_batch): 98 | print('batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 99 | start = batch * batch_size 100 | end = min(nb_sample, start+batch_size) 101 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 102 | K.learning_phase(): 0}) 103 | X_adv[start:end] = tmp 104 | 105 | os.makedirs('data', exist_ok=True) 106 | np.save('data/figure_1.npy', X_adv) 107 | 108 | 109 | print('\nTest against adversarial data') 110 | score = model.evaluate(X_adv, y_test) 111 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 112 | 113 | 114 | print('\nMake predictions') 115 | y1 = model.predict(X_test) 116 | z1 = np.argmax(y1, axis=1) 117 | y2 = model.predict(X_adv) 118 | z2 = np.argmax(y2, axis=1) 119 | 120 | print('\nSelecting figures') 121 | X_tmp = np.empty((2, 10, 28, 28)) 122 | y_proba = np.empty((2, 10, 10)) 123 | for i in range(10): 124 | print('Target {0}'.format(i)) 125 | ind, = np.where(np.all([z0==i, z1==i, z2!=i], axis=0)) 126 | cur = np.random.choice(ind) 127 | X_tmp[0][i] = np.squeeze(X_test[cur]) 128 | X_tmp[1][i] = np.squeeze(X_adv[cur]) 129 | y_proba[0][i] = y1[cur] 130 | y_proba[1][i] = y2[cur] 131 | 132 | 133 | print('\nPlotting results') 134 | fig = plt.figure(figsize=(10, 3)) 135 | gs = gridspec.GridSpec(2, 10, wspace=0.1, hspace=0.1) 136 | 137 | label = np.argmax(y_proba, axis=2) 138 | proba = np.max(y_proba, axis=2) 139 | for i in range(10): 140 | for j in range(2): 141 | ax = fig.add_subplot(gs[j, i]) 142 | ax.imshow(X_tmp[j][i], cmap='gray', interpolation='none') 143 | ax.set_xticks([]) 144 | ax.set_yticks([]) 145 | ax.set_xlabel('{0} ({1:.2f})'.format(label[j][i], 146 | proba[j][i]), 147 | fontsize=12) 148 | 149 | print('\nSaving figure') 150 | gs.tight_layout(fig) 151 | os.makedirs('img', exist_ok=True) 152 | 
plt.savefig('img/figure_1.pdf') 153 | -------------------------------------------------------------------------------- /src/table_2.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from keras import backend as K 9 | from keras.datasets import cifar10 10 | from keras.models import Sequential, load_model 11 | from keras.layers import Dense, Dropout, Activation, Flatten 12 | from keras.layers import Convolution2D, MaxPooling2D 13 | from keras.layers import LeakyReLU 14 | from keras.callbacks import EarlyStopping 15 | from keras.utils import np_utils 16 | 17 | from attacks.fgsm import fgsm 18 | 19 | 20 | 21 | img_rows = 32 22 | img_cols = 32 23 | img_chan = 3 24 | input_shape=(img_rows, img_cols, img_chan) 25 | nb_classes = 10 26 | 27 | print('\nLoading cifar10') 28 | (X_train, y_train), (X_test, y_test) = cifar10.load_data() 29 | X_train = X_train.astype('float32') / 255 30 | X_test = X_test.astype('float32') / 255 31 | print('\nX_train shape:', X_train.shape) 32 | print('X_test shape:', X_train.shape) 33 | 34 | y_train = np_utils.to_categorical(y_train, nb_classes) 35 | y_test = np_utils.to_categorical(y_test, nb_classes) 36 | print('\ny_train shape:', y_train.shape) 37 | print('y_test shape:', y_test.shape) 38 | 39 | 40 | sess = tf.InteractiveSession() 41 | K.set_session(sess) 42 | 43 | 44 | if False: 45 | print('\nLoading model0') 46 | model0 = load_model('model/table_2_model0.h5') 47 | else: 48 | print('\nBuilding model0') 49 | model0 = Sequential([ 50 | Convolution2D(32, 3, 3, border_mode='same', 51 | input_shape=input_shape), 52 | LeakyReLU(alpha=0.2), 53 | Convolution2D(32, 3, 3), 54 | LeakyReLU(alpha=0.2), 55 | MaxPooling2D(pool_size=(2,2)), 56 | Dropout(0.2), 57 | Convolution2D(64, 3, 3, border_mode='same'), 58 | LeakyReLU(alpha=0.2), 59 | Convolution2D(64, 3, 3), 60 | LeakyReLU(alpha=0.2), 61 | MaxPooling2D(pool_size=(2, 2)), 62 | Dropout(0.2), 63 | Convolution2D(128, 3, 3, border_mode='same'), 64 | LeakyReLU(alpha=0.2), 65 | Convolution2D(128, 3, 3), 66 | LeakyReLU(alpha=0.2), 67 | MaxPooling2D(pool_size=(2, 2)), 68 | Dropout(0.5), 69 | Flatten(), 70 | Dense(512), 71 | Activation('relu'), 72 | Dropout(0.5), 73 | Dense(nb_classes), 74 | Activation('softmax')]) 75 | 76 | model0.compile(loss='categorical_crossentropy', 77 | optimizer='adam', metrics=['accuracy']) 78 | 79 | earlystopping = EarlyStopping(monitor='val_loss', patience=5, 80 | verbose=1) 81 | model0.fit(X_train, y_train, nb_epoch=100, validation_split=0.1, 82 | callbacks=[earlystopping]) 83 | 84 | print('\nSaving model0') 85 | os.makedirs('model', exist_ok=True) 86 | model0.save('model/table_2_model0.h5') 87 | 88 | 89 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chan)) 90 | eps = tf.placeholder(tf.float32, ()) 91 | x_adv = fgsm(model0, x, nb_epoch=9, eps=eps) 92 | 93 | 94 | print('\nTesting against clean test data') 95 | score = model0.evaluate(X_test, y_test) 96 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 97 | 98 | 99 | if False: 100 | for EPS in [0.01, 0.03, 0.1, 0.3]: 101 | print('\nBuilding X_train_adv with eps={0:.2f}'.format(EPS)) 102 | nb_sample = X_train.shape[0] 103 | batch_size = 128 104 | nb_batch = int(np.ceil(nb_sample/batch_size)) 105 | X_train_adv = np.empty(X_train.shape) 106 | for batch in range(nb_batch): 107 | print(' batch {0}/{1}'.format(batch+1, 
nb_batch), 108 | end='\r') 109 | start = batch * batch_size 110 | end = min(nb_sample, start+batch_size) 111 | tmp = sess.run(x_adv, feed_dict={x: X_train[start:end], 112 | eps: EPS, 113 | K.learning_phase(): 0}) 114 | X_train_adv[start:end] = tmp 115 | 116 | print('\nBuilding X_test_adv with eps={0:.2f}'.format(EPS)) 117 | nb_sample = X_test.shape[0] 118 | nb_batch = int(np.ceil(nb_sample/batch_size)) 119 | X_test_adv = np.empty(X_test.shape) 120 | for batch in range(nb_batch): 121 | print(' batch {0}/{1}'.format(batch+1, nb_batch), 122 | end='\r') 123 | start = batch * batch_size 124 | end = min(nb_sample, start+batch_size) 125 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 126 | eps: EPS, 127 | K.learning_phase(): 0}) 128 | X_test_adv[start:end] = tmp 129 | 130 | print('\nSaving adversarial images') 131 | os.makedirs('data/', exist_ok=True) 132 | np.savez('data/table_2_{0:.2f}.npz'.format(EPS), 133 | X_train_adv=X_train_adv, X_test_adv=X_test_adv) 134 | 135 | 136 | print('\nTesting against adversarial test data') 137 | score = model0.evaluate(X_test_adv, y_test) 138 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 139 | 140 | 141 | y0_test = np.zeros((y_test.shape[0], 1)) 142 | y1_test = np.ones((y_test.shape[0], 1)) 143 | 144 | 145 | if False: 146 | print('\nLoading model1') 147 | model1 = load_model('model/table_2_model1.h5') 148 | else: 149 | print('\nLoading adversarial data with eps=0.03') 150 | db = np.load('data/table_2_0.03.npz') 151 | X_train_adv = db['X_train_adv'] 152 | X_test_adv = db['X_test_adv'] 153 | 154 | print('\nPreparing clean/adversarial mixed dataset') 155 | X_all_train = np.vstack([X_train, X_train_adv]) 156 | y_all_train = np.vstack([np.zeros([X_train.shape[0], 1]), 157 | np.ones([X_train_adv.shape[0], 1])]) 158 | 159 | ind = np.random.permutation(X_all_train.shape[0]) 160 | X_all_train = X_all_train[ind] 161 | y_all_train = y_all_train[ind] 162 | 163 | print('\nBuilding model1') 164 | model1 = Sequential([ 165 | Convolution2D(32, 3, 3, border_mode='same', 166 | input_shape=input_shape), 167 | LeakyReLU(alpha=0.2), 168 | Convolution2D(32, 3, 3), 169 | LeakyReLU(alpha=0.2), 170 | MaxPooling2D(pool_size=(2,2)), 171 | Dropout(0.2), 172 | Convolution2D(64, 3, 3, border_mode='same'), 173 | LeakyReLU(alpha=0.2), 174 | Convolution2D(64, 3, 3), 175 | LeakyReLU(alpha=0.2), 176 | MaxPooling2D(pool_size=(2, 2)), 177 | Flatten(), 178 | Dense(256), 179 | Activation('relu'), 180 | Dropout(0.5), 181 | Dense(1), 182 | Activation('sigmoid')]) 183 | 184 | model1.compile(loss='binary_crossentropy', 185 | optimizer='adam', metrics=['accuracy']) 186 | 187 | os.makedirs('model', exist_ok=True) 188 | model1.fit(X_all_train, y_all_train, nb_epoch=2, 189 | validation_split=0.1) 190 | 191 | print('\nSaving model1') 192 | model1.save('model/table_2_model1.h5') 193 | 194 | 195 | for EPS in [0.01, 0.03, 0.1, 0.3]: 196 | print('\nLoading adversarial data with eps={0:.2f}'.format(EPS)) 197 | db = np.load('data/table_2_{0:.2f}.npz'.format(EPS)) 198 | X_train_adv = db['X_train_adv'] 199 | X_test_adv = db['X_test_adv'] 200 | 201 | print('\nTesting against clean test data') 202 | score = model1.evaluate(X_test, y0_test) 203 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 204 | 205 | print('\nTesting against adversarial test data') 206 | score = model1.evaluate(X_test_adv, y1_test) 207 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 208 | -------------------------------------------------------------------------------- 
/src/table_1_mnist.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from keras import backend as K 9 | from keras.datasets import mnist 10 | from keras.models import Sequential, load_model 11 | from keras.layers import Dense, Dropout, Activation, Flatten 12 | from keras.layers import Convolution2D, MaxPooling2D 13 | from keras.utils import np_utils 14 | 15 | from attacks.fgsm import fgsm 16 | 17 | 18 | 19 | img_rows = 28 20 | img_cols = 28 21 | img_chan = 1 22 | input_shape=(img_rows, img_cols, img_chan) 23 | nb_classes = 10 24 | 25 | 26 | (X_train, y_train), (X_test, y_test) = mnist.load_data() 27 | X_train = np.reshape(X_train, (-1, img_rows, img_cols, img_chan)) 28 | X_test = np.reshape(X_test, (-1, img_rows, img_cols, img_chan)) 29 | X_train = X_train.astype('float32') / 255 30 | X_test = X_test.astype('float32') / 255 31 | print('\nX_train shape:', X_train.shape) 32 | print('X_test shape:', X_test.shape) 33 | 34 | y_train = np_utils.to_categorical(y_train, nb_classes) 35 | y_test = np_utils.to_categorical(y_test, nb_classes) 36 | 37 | 38 | sess = tf.InteractiveSession() 39 | K.set_session(sess) 40 | 41 | 42 | if False: 43 | print('\nLoading model0') 44 | model0 = load_model('model/table_1_mnist_model0.h5') 45 | else: 46 | print('\nBuilding model0') 47 | model0 = Sequential([ 48 | Convolution2D(32, 3, 3, border_mode='same', 49 | input_shape=input_shape), 50 | Activation('relu'), 51 | Convolution2D(32, 3, 3), 52 | Activation('relu'), 53 | MaxPooling2D(pool_size=(2, 2)), 54 | Dropout(0.25), 55 | Flatten(), 56 | Dense(128), 57 | Activation('relu'), 58 | Dropout(0.5), 59 | Dense(10), 60 | Activation('softmax')]) 61 | 62 | model0.compile(loss='categorical_crossentropy', 63 | optimizer='adam', metrics=['accuracy']) 64 | 65 | print('\nTraining model0') 66 | model0.fit(X_train, y_train, nb_epoch=10) 67 | 68 | print('\nSaving model0') 69 | os.makedirs('model', exist_ok=True) 70 | model0.save('model/table_1_mnist_model0.h5') 71 | 72 | 73 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chan)) 74 | eps = tf.placeholder(tf.float32, ()) 75 | x_adv = fgsm(model0, x, nb_epoch=9, eps=eps) 76 | 77 | 78 | print('\nTesting against clean test data') 79 | score = model0.evaluate(X_test, y_test) 80 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 81 | 82 | 83 | EPS = 0.02 84 | 85 | if False: 86 | print('\nLoading adversarial images') 87 | db = np.load('data/table_1_mnist_{0:.4f}.npz'.format(EPS)) 88 | X_train_adv = db['X_train_adv'] 89 | X_test_adv = db['X_test_adv'] 90 | else: 91 | print('\nBuilding X_train_adv') 92 | nb_sample = X_train.shape[0] 93 | batch_size = 128 94 | nb_batch = int(np.ceil(nb_sample/batch_size)) 95 | X_train_adv = np.empty(X_train.shape) 96 | for batch in range(nb_batch): 97 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 98 | start = batch * batch_size 99 | end = min(nb_sample, start+batch_size) 100 | tmp = sess.run(x_adv, feed_dict={x: X_train[start:end], 101 | eps: EPS, 102 | K.learning_phase(): 0}) 103 | X_train_adv[start:end] = tmp 104 | 105 | print('\nBuilding X_test_adv') 106 | nb_sample = X_test.shape[0] 107 | nb_batch = int(np.ceil(nb_sample/batch_size)) 108 | X_test_adv = np.empty(X_test.shape) 109 | for batch in range(nb_batch): 110 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 111 | start = batch * 
batch_size 112 | end = min(nb_sample, start+batch_size) 113 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 114 | eps: EPS, 115 | K.learning_phase(): 0}) 116 | X_test_adv[start:end] = tmp 117 | 118 | print('\nSaving adversarial images') 119 | os.makedirs('data/', exist_ok=True) 120 | np.savez('data/table_1_mnist_{0:.4f}.npz'.format(EPS), 121 | X_train_adv=X_train_adv, X_test_adv=X_test_adv) 122 | 123 | print('\nTesting against adversarial test data') 124 | score = model0.evaluate(X_test_adv, y_test) 125 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 126 | 127 | 128 | print('\nPlotting random adversarial data') 129 | 130 | print('\nMaking predictions') 131 | z0 = np.argmax(y_test, axis=1) 132 | y1 = model0.predict(X_test) 133 | z1 = np.argmax(y1, axis=1) 134 | y2 = model0.predict(X_test_adv) 135 | z2 = np.argmax(y2, axis=1) 136 | 137 | print('\nSelecting figures') 138 | X_tmp = np.empty((2, nb_classes, img_rows, img_cols)) 139 | y_proba = np.empty((2, nb_classes, nb_classes)) 140 | for i in range(10): 141 | print('Target {0}'.format(i)) 142 | ind, = np.where(np.all([z0==i, z1==i, z2!=i], axis=0)) 143 | cur = np.random.choice(ind) 144 | X_tmp[0][i] = np.squeeze(X_test[cur]) 145 | X_tmp[1][i] = np.squeeze(X_test_adv[cur]) 146 | y_proba[0][i] = y1[cur] 147 | y_proba[1][i] = y2[cur] 148 | 149 | 150 | print('\nPlotting results') 151 | fig = plt.figure(figsize=(10, 3)) 152 | gs = gridspec.GridSpec(2, 10, wspace=0.1, hspace=0.1) 153 | 154 | label = np.argmax(y_proba, axis=2) 155 | proba = np.max(y_proba, axis=2) 156 | for i in range(10): 157 | for j in range(2): 158 | ax = fig.add_subplot(gs[j, i]) 159 | ax.imshow(X_tmp[j][i], interpolation='none') 160 | ax.set_xticks([]) 161 | ax.set_yticks([]) 162 | ax.set_xlabel('{0} ({1:.2f})'.format(label[j][i], 163 | proba[j][i]), 164 | fontsize=12) 165 | 166 | print('\nSaving figure') 167 | gs.tight_layout(fig) 168 | os.makedirs('img', exist_ok=True) 169 | plt.savefig('img/table_1_mnist.pdf') 170 | 171 | 172 | print('\nPreparing clean/adversarial mixed dataset') 173 | X_all_train = np.vstack([X_train, X_train_adv]) 174 | y_all_train = np.vstack([np.zeros([X_train.shape[0], 1]), 175 | np.ones([X_train_adv.shape[0], 1])]) 176 | 177 | y0_test = np.zeros((y_test.shape[0], 1)) 178 | y1_test = np.ones((y_test.shape[0], 1)) 179 | 180 | ind = np.random.permutation(X_all_train.shape[0]) 181 | X_all_train = X_all_train[ind] 182 | y_all_train = y_all_train[ind] 183 | 184 | 185 | if False: 186 | print('\nLoading model1') 187 | model1 = load_model('model/table_1_mnist_model1.h5') 188 | else: 189 | print('\nBuilding model1') 190 | model1 = Sequential([ 191 | Convolution2D(32, 3, 3, input_shape=input_shape), 192 | Activation('relu'), 193 | Convolution2D(32, 3, 3), 194 | Activation('relu'), 195 | MaxPooling2D(pool_size=(2, 2)), 196 | Dropout(0.25), 197 | Flatten(), 198 | Dense(128), 199 | Activation('relu'), 200 | Dropout(0.5), 201 | Dense(1), 202 | Activation('sigmoid')]) 203 | 204 | model1.compile(loss='binary_crossentropy', 205 | optimizer='adam', metrics=['accuracy']) 206 | 207 | print('\nTraining model1') 208 | os.makedirs('model', exist_ok=True) 209 | model1.fit(X_all_train, y_all_train, nb_epoch=2, 210 | validation_split=0.1) 211 | 212 | print('\nSaving model1') 213 | model1.save('model/table_1_mnist_model1.h5') 214 | 215 | 216 | # x1_adv = fgsm(model1, x, nb_epoch=4, eps=0.2) 217 | 218 | print('\nTesting against clean test data') 219 | score = model1.evaluate(X_test, y0_test) 220 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], 
score[1])) 221 | 222 | 223 | print('\nTesting against adversarial test data') 224 | score = model1.evaluate(X_test_adv, y1_test) 225 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 226 | 227 | 228 | # print('\nDisguising clean test data') 229 | # nb_sample = X_test.shape[0] 230 | # batch_size = 128 231 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 232 | # X_test_adv1 = np.empty(X_test.shape) 233 | # for batch in range(nb_batch): 234 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 235 | # start = batch * batch_size 236 | # end = min(nb_sample, start+batch_size) 237 | # tmp = sess.run(x1_adv, feed_dict={x: X_test[start:end], 238 | # K.learning_phase(): 0}) 239 | # X_test_adv1[start:end] = tmp 240 | 241 | 242 | # print('\nTesting against disguised clean data') 243 | # score = model1.evaluate(X_test_adv1, y0_test) 244 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 245 | 246 | 247 | # print('\nDisguising adversarial test data') 248 | # nb_sample = X_test_adv.shape[0] 249 | # batch_size = 128 250 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 251 | # X_test_adv2 = np.empty(X_test_adv.shape) 252 | # for batch in range(nb_batch): 253 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 254 | # start = batch * batch_size 255 | # end = min(nb_sample, start+batch_size) 256 | # tmp = sess.run(x1_adv, feed_dict={x: X_test_adv[start:end], 257 | # K.learning_phase(): 0}) 258 | # X_test_adv2[start:end] = tmp 259 | 260 | 261 | # print('\nTesting against disguised adversarial data') 262 | # score = model1.evaluate(X_test_adv2, y1_test) 263 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 264 | -------------------------------------------------------------------------------- /src/table_1_cifar10.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from keras import backend as K 9 | from keras.datasets import cifar10 10 | from keras.models import Sequential, load_model 11 | from keras.layers import Dense, Dropout, Activation, Flatten 12 | from keras.layers import Convolution2D, MaxPooling2D 13 | from keras.layers import LeakyReLU 14 | from keras.callbacks import EarlyStopping 15 | from keras.utils import np_utils 16 | 17 | from attacks.fgsm import fgsm 18 | 19 | 20 | 21 | img_rows = 32 22 | img_cols = 32 23 | img_chan = 3 24 | input_shape=(img_rows, img_cols, img_chan) 25 | nb_classes = 10 26 | 27 | (X_train, y_train), (X_test, y_test) = cifar10.load_data() 28 | X_train = X_train.astype('float32') / 255 29 | X_test = X_test.astype('float32') / 255 30 | print('\nX_train shape:', X_train.shape) 31 | print('X_test shape:', X_train.shape) 32 | 33 | y_train = np_utils.to_categorical(y_train, nb_classes) 34 | y_test = np_utils.to_categorical(y_test, nb_classes) 35 | 36 | 37 | sess = tf.InteractiveSession() 38 | K.set_session(sess) 39 | 40 | 41 | if False: 42 | print('\nLoading model0') 43 | model0 = load_model('model/table_1_cifar10_model0.h5') 44 | else: 45 | print('\nBuilding model0') 46 | model0 = Sequential([ 47 | Convolution2D(32, 3, 3, border_mode='same', 48 | input_shape=input_shape), 49 | LeakyReLU(alpha=0.2), 50 | Convolution2D(32, 3, 3), 51 | LeakyReLU(alpha=0.2), 52 | MaxPooling2D(pool_size=(2,2)), 53 | Dropout(0.2), 54 | Convolution2D(64, 3, 3, border_mode='same'), 55 | LeakyReLU(alpha=0.2), 56 | 
Convolution2D(64, 3, 3), 57 | LeakyReLU(alpha=0.2), 58 | MaxPooling2D(pool_size=(2, 2)), 59 | Dropout(0.2), 60 | Convolution2D(128, 3, 3, border_mode='same'), 61 | LeakyReLU(alpha=0.2), 62 | Convolution2D(128, 3, 3), 63 | LeakyReLU(alpha=0.2), 64 | MaxPooling2D(pool_size=(2, 2)), 65 | Dropout(0.5), 66 | Flatten(), 67 | Dense(512), 68 | Activation('relu'), 69 | Dropout(0.5), 70 | Dense(nb_classes), 71 | Activation('softmax')]) 72 | 73 | model0.compile(loss='categorical_crossentropy', 74 | optimizer='adam', metrics=['accuracy']) 75 | 76 | earlystopping = EarlyStopping(monitor='val_loss', patience=5, 77 | verbose=1) 78 | model0.fit(X_train, y_train, nb_epoch=100, validation_split=0.1, 79 | callbacks=[earlystopping]) 80 | 81 | print('\nSaving model0') 82 | os.makedirs('model', exist_ok=True) 83 | model0.save('model/table_1_cifar10_model0.h5') 84 | 85 | 86 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chan)) 87 | eps = tf.placeholder(tf.float32, ()) 88 | x_adv = fgsm(model0, x, nb_epoch=8, eps=eps) 89 | 90 | 91 | print('\nTesting against clean test data') 92 | score = model0.evaluate(X_test, y_test) 93 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 94 | 95 | 96 | EPS = 0.01 97 | 98 | if False: 99 | print('\nLoading adversarial images') 100 | db = np.load('data/table_1_cifar10_{0:.4f}.npz'.format(EPS)) 101 | X_train_adv = db['X_train_adv'] 102 | X_test_adv = db['X_test_adv'] 103 | else: 104 | print('\nBuilding X_train_adv') 105 | nb_sample = X_train.shape[0] 106 | batch_size = 128 107 | nb_batch = int(np.ceil(nb_sample/batch_size)) 108 | X_train_adv = np.empty(X_train.shape) 109 | for batch in range(nb_batch): 110 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 111 | start = batch * batch_size 112 | end = min(nb_sample, start+batch_size) 113 | tmp = sess.run(x_adv, feed_dict={x: X_train[start:end], 114 | eps: EPS, 115 | K.learning_phase(): 0}) 116 | X_train_adv[start:end] = tmp 117 | 118 | print('\nBuilding X_test_adv') 119 | nb_sample = X_test.shape[0] 120 | nb_batch = int(np.ceil(nb_sample/batch_size)) 121 | X_test_adv = np.empty(X_test.shape) 122 | for batch in range(nb_batch): 123 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 124 | start = batch * batch_size 125 | end = min(nb_sample, start+batch_size) 126 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 127 | eps: EPS, 128 | K.learning_phase(): 0}) 129 | X_test_adv[start:end] = tmp 130 | 131 | print('\nSaving adversarial images') 132 | os.makedirs('data/', exist_ok=True) 133 | np.savez('data/table_1_cifar10_{0:.4f}.npz'.format(EPS), 134 | X_train_adv=X_train_adv, X_test_adv=X_test_adv) 135 | 136 | 137 | print('\nTesting against adversarial test data') 138 | score = model0.evaluate(X_test_adv, y_test) 139 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 140 | 141 | 142 | print('\nPlotting random adversarial data') 143 | 144 | print('\nMaking predictions') 145 | z0 = np.argmax(y_test, axis=1) 146 | y1 = model0.predict(X_test) 147 | z1 = np.argmax(y1, axis=1) 148 | y2 = model0.predict(X_test_adv) 149 | z2 = np.argmax(y2, axis=1) 150 | 151 | print('\nSelecting figures') 152 | X_tmp = np.empty((2, nb_classes, img_rows, img_cols, img_chan)) 153 | y_proba = np.empty((2, nb_classes, nb_classes)) 154 | for i in range(10): 155 | print('Target {0}'.format(i)) 156 | ind, = np.where(np.all([z0==i, z1==i, z2!=i], axis=0)) 157 | cur = np.random.choice(ind) 158 | X_tmp[0][i] = X_test[cur] 159 | X_tmp[1][i] = X_test_adv[cur] 160 | y_proba[0][i] = y1[cur] 161 | 
y_proba[1][i] = y2[cur] 162 | 163 | 164 | print('\nPlotting results') 165 | fig = plt.figure(figsize=(10, 3)) 166 | gs = gridspec.GridSpec(2, 10, wspace=0.1, hspace=0.1) 167 | 168 | label = np.argmax(y_proba, axis=2) 169 | proba = np.max(y_proba, axis=2) 170 | for i in range(10): 171 | for j in range(2): 172 | ax = fig.add_subplot(gs[j, i]) 173 | ax.imshow(X_tmp[j][i], interpolation='none') 174 | ax.set_xticks([]) 175 | ax.set_yticks([]) 176 | ax.set_xlabel('{0} ({1:.2f})'.format(label[j][i], 177 | proba[j][i]), 178 | fontsize=12) 179 | 180 | print('\nSaving figure') 181 | gs.tight_layout(fig) 182 | os.makedirs('img', exist_ok=True) 183 | plt.savefig('img/table_1_cifar10.pdf') 184 | 185 | 186 | print('\nPreparing clean/adversarial mixed dataset') 187 | 188 | X_all_train = np.vstack([X_train, X_train_adv]) 189 | y_all_train = np.vstack([np.zeros([X_train.shape[0], 1]), 190 | np.ones([X_train_adv.shape[0], 1])]) 191 | 192 | y0_test = np.zeros((y_test.shape[0], 1)) 193 | y1_test = np.ones((y_test.shape[0], 1)) 194 | 195 | ind = np.random.permutation(X_all_train.shape[0]) 196 | X_all_train = X_all_train[ind] 197 | y_all_train = y_all_train[ind] 198 | 199 | 200 | if False: 201 | print('\nLoading model1') 202 | model1 = load_model('model/table_1_cifar10_model1.h5') 203 | else: 204 | print('\nBuilding model1') 205 | model1 = Sequential([ 206 | Convolution2D(32, 3, 3, border_mode='same', 207 | input_shape=input_shape), 208 | LeakyReLU(alpha=0.2), 209 | Convolution2D(32, 3, 3), 210 | LeakyReLU(alpha=0.2), 211 | MaxPooling2D(pool_size=(2,2)), 212 | Dropout(0.2), 213 | Convolution2D(64, 3, 3, border_mode='same'), 214 | LeakyReLU(alpha=0.2), 215 | Convolution2D(64, 3, 3), 216 | LeakyReLU(alpha=0.2), 217 | MaxPooling2D(pool_size=(2, 2)), 218 | Flatten(), 219 | Dense(256), 220 | Activation('relu'), 221 | Dropout(0.5), 222 | Dense(1), 223 | Activation('sigmoid')]) 224 | 225 | model1.compile(loss='binary_crossentropy', 226 | optimizer='adam', metrics=['accuracy']) 227 | 228 | os.makedirs('model', exist_ok=True) 229 | model1.fit(X_all_train, y_all_train, nb_epoch=2, 230 | validation_split=0.1) 231 | 232 | print('\nSaving model1') 233 | model1.save('model/table_1_cifar10_model1.h5') 234 | 235 | 236 | # x1_adv = fgsm(model1, x, nb_epoch=4, eps=0.01) 237 | 238 | print('\nTesting against clean test data') 239 | score = model1.evaluate(X_test, y0_test) 240 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 241 | 242 | 243 | print('\nTesting against adversarial test data') 244 | score = model1.evaluate(X_test_adv, y1_test) 245 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 246 | 247 | 248 | # print('\nDisguising clean test data') 249 | # nb_sample = X_test.shape[0] 250 | # batch_size = 128 251 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 252 | # X_test_adv1 = np.empty(X_test.shape) 253 | # for batch in range(nb_batch): 254 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 255 | # start = batch * batch_size 256 | # end = min(nb_sample, start+batch_size) 257 | # tmp = sess.run(x1_adv, feed_dict={x: X_test[start:end], 258 | # K.learning_phase(): 0}) 259 | # X_test_adv1[start:end] = tmp 260 | 261 | 262 | # print('\nTesting against disguised clean data') 263 | # score = model1.evaluate(X_test_adv1, y0_test) 264 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 265 | 266 | 267 | # print('\nDisguising adversarial test data') 268 | # nb_sample = X_test_adv.shape[0] 269 | # batch_size = 128 270 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 271 
| # X_test_adv2 = np.empty(X_test_adv.shape) 272 | # for batch in range(nb_batch): 273 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 274 | # start = batch * batch_size 275 | # end = min(nb_sample, start+batch_size) 276 | # tmp = sess.run(x1_adv, feed_dict={x: X_test_adv[start:end], 277 | # K.learning_phase(): 0}) 278 | # X_test_adv2[start:end] = tmp 279 | 280 | 281 | # print('\nTesting against disguised adversarial data') 282 | # score = model1.evaluate(X_test_adv2, y1_test) 283 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 284 | -------------------------------------------------------------------------------- /src/figure_2.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | import numpy as np 6 | import tensorflow as tf 7 | 8 | from keras import backend as K 9 | from keras.datasets import mnist 10 | from keras.models import Sequential, load_model 11 | from keras.layers import Dense, Dropout, Activation, Flatten 12 | from keras.layers import Convolution2D, MaxPooling2D 13 | from keras.utils import np_utils 14 | 15 | import matplotlib 16 | matplotlib.use('Agg') 17 | import matplotlib.pyplot as plt 18 | import matplotlib.gridspec as gridspec 19 | 20 | from attacks.fgsm import fgsm 21 | 22 | 23 | 24 | def random_orthogonal(i): 25 | """Return a random vector orthogonal to i.""" 26 | v = np.random.random(i.shape) 27 | i /= np.linalg.norm(i) 28 | a = np.dot(v, i) / np.dot(i, i) 29 | j = v - a*i 30 | b = np.linalg.norm(j) 31 | j /= b 32 | return j, (a, i) 33 | 34 | 35 | img_rows = 28 36 | img_cols = 28 37 | img_chan = 1 38 | nb_classes = 10 39 | input_shape=(img_rows, img_cols, img_chan) 40 | 41 | 42 | print('\nLoading mnist') 43 | (X_train, y_train), (X_test, y_test) = mnist.load_data() 44 | 45 | X_train = X_train.astype('float32') / 255. 46 | X_test = X_test.astype('float32') / 255. 
47 | 48 | X_train = X_train.reshape(-1, img_rows, img_cols, img_chan) 49 | X_test = X_test.reshape(-1, img_rows, img_cols, img_chan) 50 | print('\nX_train shape:', X_train.shape) 51 | print('y_train shape:', y_train.shape) 52 | 53 | # one hot encoding 54 | y_train = np_utils.to_categorical(y_train, nb_classes) 55 | y_test = np_utils.to_categorical(y_test, nb_classes) 56 | 57 | 58 | sess = tf.InteractiveSession() 59 | K.set_session(sess) 60 | 61 | 62 | if False: 63 | print('\nLoading model0') 64 | model0 = load_model('model/figure_2_model0.h5') 65 | else: 66 | print('\nBuilding model0') 67 | model0 = Sequential([ 68 | Convolution2D(32, 3, 3, input_shape=input_shape), 69 | Activation('relu'), 70 | Convolution2D(32, 3, 3), 71 | Activation('relu'), 72 | MaxPooling2D(pool_size=(2, 2)), 73 | # Dropout(0.25), 74 | Flatten(), 75 | Dense(128), 76 | Activation('relu'), 77 | # Dropout(0.5), 78 | Dense(10), 79 | Activation('softmax')]) 80 | 81 | model0.compile(optimizer='adam', loss='categorical_crossentropy', 82 | metrics=['accuracy']) 83 | 84 | print('\nTraining model0') 85 | model0.fit(X_train, y_train, nb_epoch=10) 86 | 87 | print('\nSaving model0') 88 | os.makedirs('model', exist_ok=True) 89 | model0.save('model/figure_2_model0.h5') 90 | 91 | 92 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chan)) 93 | y = tf.placeholder(tf.int32, (None, )) 94 | x_adv = fgsm(model0, x, eps=0.25, nb_epoch=1) 95 | 96 | 97 | print('\nTesting against clean data') 98 | score = model0.evaluate(X_test, y_test) 99 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 100 | 101 | 102 | if False: 103 | print('\nLoading adversarial datasets') 104 | X_adv = np.load('data/figure_2.npy') 105 | else: 106 | print('\nGenerating adversarial') 107 | batch_size = 64 108 | X_adv = np.empty(X_test.shape) 109 | nb_sample = X_test.shape[0] 110 | nb_batch = int(np.ceil(nb_sample/batch_size)) 111 | for batch in range(nb_batch): 112 | print('batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 113 | start = batch * batch_size 114 | end = min(nb_sample, start+batch_size) 115 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 116 | K.learning_phase(): 0}) 117 | X_adv[start:end] = tmp 118 | 119 | print('\nSaving adversarials') 120 | os.makedirs('data', exist_ok=True) 121 | np.save('data/figure_2.npy', X_adv) 122 | 123 | 124 | print('\nTesting against adversarial data') 125 | score = model0.evaluate(X_adv, y_test) 126 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 127 | 128 | 129 | if False: 130 | print('\nLoading model1') 131 | model1 = load_model('model/figure_2_model1.h5') 132 | else: 133 | print('\nBuilding model1') 134 | model1 = Sequential([ 135 | Convolution2D(32, 3, 3, input_shape=input_shape), 136 | Activation('relu'), 137 | Convolution2D(32, 3, 3), 138 | Activation('relu'), 139 | MaxPooling2D(pool_size=(2, 2)), 140 | Dropout(0.25), 141 | Flatten(), 142 | Dense(128), 143 | Activation('relu'), 144 | Dropout(0.5), 145 | Dense(10), 146 | Activation('softmax')]) 147 | 148 | model1.compile(loss='categorical_crossentropy', 149 | optimizer='adam', metrics=['accuracy']) 150 | 151 | x_adv_tmp = fgsm(model1, x, eps=0.3, nb_epoch=1) 152 | 153 | print('\nDummy testing') 154 | model1.evaluate(X_test[:10], y_test[:10], verbose=0) 155 | 156 | 157 | print('\nPreparing training/validation dataset') 158 | validation_split = 0.1 159 | N = int(X_train.shape[0]*validation_split) 160 | X_tmp_train, X_tmp_val = X_train[:-N], X_train[-N:] 161 | y_tmp_train, y_tmp_val = y_train[:-N], y_train[-N:] 162 | 
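# The manual training loop below performs adversarial training for model1:
# each mini-batch is augmented on the fly with FGSM examples (eps=0.3, one
# step) generated from model1's current weights, so the model always trains
# on a 50/50 mix of clean and adversarial images.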
163 | 164 | print('\nTraining model1') 165 | nb_epoch = 10 166 | batch_size = 64 167 | nb_sample = X_tmp_train.shape[0] 168 | nb_batch = int(np.ceil(nb_sample/batch_size)) 169 | for epoch in range(nb_epoch): 170 | print('Epoch {0}/{1}'.format(epoch+1, nb_epoch)) 171 | for batch in range(nb_batch): 172 | print(' batch {0}/{1} '.format(batch+1, nb_batch), 173 | end='\r', flush=True) 174 | start = batch * batch_size 175 | end = min(nb_sample, start+batch_size) 176 | X_tmp_adv = sess.run(x_adv_tmp, feed_dict={ 177 | x: X_tmp_train[start:end], K.learning_phase(): 0}) 178 | y_tmp_adv = y_tmp_train[start:end] 179 | X_batch = np.vstack((X_tmp_train[start:end], X_tmp_adv)) 180 | y_batch = np.vstack((y_tmp_train[start:end], y_tmp_adv)) 181 | score = model1.train_on_batch(X_batch, y_batch) 182 | score = model1.evaluate(X_tmp_val, y_tmp_val) 183 | print(' loss: {0:.4f} acc: {1:.4f}' 184 | .format(score[0], score[1])) 185 | 186 | print('\nSaving model1') 187 | os.makedirs('model', exist_ok=True) 188 | model1.save('model/figure_2_model1.h5') 189 | 190 | 191 | print('\nTesting against adversarial') 192 | score = model1.evaluate(X_adv, y_test) 193 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 194 | 195 | 196 | print('\nPreparing predictions') 197 | y0_0 = model0.predict(X_test) 198 | y0_1 = model0.predict(X_adv) 199 | 200 | y1_0 = model1.predict(X_test) 201 | y1_1 = model1.predict(X_adv) 202 | 203 | z_test = np.argmax(y_test, axis=1) 204 | 205 | z0_0 = np.argmax(y0_0, axis=1) 206 | z0_1 = np.argmax(y0_1, axis=1) 207 | 208 | z1_0 = np.argmax(y1_0, axis=1) 209 | z1_1 = np.argmax(y1_1, axis=1) 210 | 211 | p0_0 = np.max(y0_0, axis=1) 212 | p0_1 = np.max(y0_1, axis=1) 213 | 214 | p1_0 = np.max(y1_0, axis=1) 215 | p1_1 = np.max(y1_1, axis=1) 216 | 217 | img_rows = 41 218 | img_cols = 41 219 | img_chan = 4 220 | 221 | p_filter = np.all([p0_0>0.5, p0_1>0.5, p1_1>0.5], axis=0) 222 | 223 | print('\nGenerating figure') 224 | fig = plt.figure(figsize=(8, 1)) 225 | gs = gridspec.GridSpec(1, 10, wspace=0.15) 226 | 227 | for label in range(10): 228 | print('Label {0}'.format(label)) 229 | gs0 = gridspec.GridSpecFromSubplotSpec(4, 3, 230 | subplot_spec=gs[label], 231 | wspace=0.05, hspace=0.1) 232 | ax = fig.add_subplot(gs0[:3, :]) 233 | ax0 = fig.add_subplot(gs0[3, 0]) 234 | ax1 = fig.add_subplot(gs0[3, 1]) 235 | ax2 = fig.add_subplot(gs0[3, 2]) 236 | 237 | img = np.empty((img_rows, img_cols, img_chan)) 238 | ind, = np.where(np.all([p_filter, 239 | z_test==label, z0_0==label, 240 | z0_1!=label, z1_1==label], 241 | axis=0)) 242 | 243 | cur = np.random.choice(ind) 244 | X_i = np.squeeze(X_test[cur]) 245 | X_adv_i = np.squeeze(X_adv[cur]) 246 | 247 | i = X_adv_i.flatten() - X_i.flatten() 248 | j, (a, i) = random_orthogonal(i) 249 | 250 | D = np.amax([1.5 * np.absolute(a), 251 | 0.5 / np.linalg.norm(i, ord=np.inf)]) 252 | 253 | eps_i = np.linspace(-D, D, img_cols) 254 | eps_j = np.linspace(D, -D, img_rows) 255 | 256 | cnt = 0 257 | tmpr, tmpc = 0, 0 258 | 259 | for r, ej in enumerate(eps_j): 260 | for c, ei in enumerate(eps_i): 261 | 262 | X_tmp = np.clip(X_i.flatten()+ej*j+ei*i, 0, 1) 263 | X_tmp = np.reshape(X_tmp, (1, 28, 28, 1)) 264 | y0_tmp = model0.predict(X_tmp) 265 | z0_tmp = np.argmax(y0_tmp) 266 | y1_tmp = model1.predict(X_tmp) 267 | z1_tmp = np.argmax(y1_tmp) 268 | 269 | if z0_tmp == label and z1_tmp == label: 270 | # correct prediction in both cases 271 | color = [1, 1, 1, 1] 272 | elif z0_tmp == label: 273 | # correct prediction after normal training 274 | color = [1, 0, 0, 0.1] 275 | elif 
z1_tmp == label: 276 | # correct prediction after adv training 277 | color = [0, 1, 0, 0.1] 278 | else: 279 | # incorrect prediction in both cases 280 | color = [0.1, 0.1, 0.1, 0.1] 281 | cnt += 1 282 | if np.random.random() < 1./cnt: 283 | tmpr, tmpc = r, c 284 | img[r, c] = color 285 | 286 | # the original datum 287 | r = img_rows // 2 288 | c = img_rows // 2 289 | img[r, c] = [0, 0, 0, 1] 290 | 291 | # adversarial datum 292 | r = img_rows // 2 293 | c = int((np.linalg.norm(X_adv_i-X_i)-eps_i[0]) / (2*D) * img_cols) 294 | img[r, c] = [1.0, 0.65, 0, 1] 295 | 296 | # random adversarial datum 297 | img[tmpr, tmpc] = [0, 0, 1, 1] 298 | 299 | 300 | ax.imshow(img, interpolation='none') 301 | ax.set_xticks([]) 302 | ax.set_yticks([]) 303 | 304 | ax0.imshow(X_i, cmap='gray', interpolation='none') 305 | ax0.set_xticks([]) 306 | ax0.set_yticks([]) 307 | 308 | ax1.imshow(X_adv_i, cmap='gray', interpolation='none') 309 | ax1.set_xticks([]) 310 | ax1.set_yticks([]) 311 | 312 | X_tmp = np.clip(X_i.flatten()+eps_j[tmpr]*j+eps_i[tmpc]*i, 313 | 0, 1) 314 | X_tmp = np.reshape(X_tmp, (28, 28)) 315 | ax2.imshow(X_tmp, cmap='gray', interpolation='none') 316 | ax2.set_xticks([]) 317 | ax2.set_yticks([]) 318 | 319 | 320 | gs.tight_layout(fig, pad=0) 321 | os.makedirs('img', exist_ok=True) 322 | plt.savefig('img/figure_2.pdf') 323 | -------------------------------------------------------------------------------- /src/table_1_svhn.py: -------------------------------------------------------------------------------- 1 | import os 2 | # supress tensorflow logging other than errors 3 | os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 4 | 5 | from urllib.request import urlopen 6 | 7 | import numpy as np 8 | import tensorflow as tf 9 | 10 | from keras import backend as K 11 | from keras.models import Sequential, load_model 12 | from keras.layers import Dense, Dropout, Activation, Flatten 13 | from keras.layers import Convolution2D, MaxPooling2D 14 | from keras.layers import LeakyReLU 15 | from keras.callbacks import EarlyStopping 16 | from keras.utils import np_utils 17 | 18 | from scipy.io import loadmat 19 | 20 | import matplotlib 21 | matplotlib.use('Agg') 22 | import matplotlib.pyplot as plt 23 | import matplotlib.gridspec as gridspec 24 | 25 | from attacks.fgsm import fgsm 26 | 27 | 28 | 29 | img_rows = 32 30 | img_cols = 32 31 | img_chan = 3 32 | nb_classes = 10 33 | input_shape = (img_rows, img_cols, img_chan) 34 | 35 | 36 | def maybe_download(): 37 | train_url = 'http://ufldl.stanford.edu/housenumbers/train_32x32.mat' 38 | test_url = 'http://ufldl.stanford.edu/housenumbers/test_32x32.mat' 39 | os.makedirs('data', exist_ok=True) 40 | if not os.path.exists('data/svhn_train_32x32.mat'): 41 | print('\nDownloading svhn training data') 42 | with urlopen(train_url) as response,\ 43 | open('data/svhn_train_32x32.mat', 'wb') as w: 44 | data = response.read() 45 | w.write(data) 46 | if not os.path.exists('data/svhn_test_32x32.mat'): 47 | print('\nDownloading svhn test data') 48 | with urlopen(test_url) as response,\ 49 | open('data/svhn_test_32x32.mat', 'wb') as w: 50 | data = response.read() 51 | w.write(data) 52 | 53 | 54 | maybe_download() 55 | 56 | print('\nLoading SVHN dataset') 57 | db = loadmat('data/svhn_train_32x32.mat') 58 | X_train, y_train = db['X'], db['y'] 59 | X_train = X_train.transpose(3, 0, 1, 2) 60 | db = loadmat('data/svhn_test_32x32.mat') 61 | X_test, y_test = db['X'], db['y'] 62 | X_test = X_test.transpose(3, 0, 1, 2) 63 | 64 | 65 | X_train = X_train.astype('float32') / 128 - 1. 
66 | X_test = X_test.astype('float32') / 128 -1. 67 | print('\nX_train shape:', X_train.shape) 68 | print('X_test shape:', X_test.shape) 69 | 70 | 71 | y_train[10==y_train] = 0 72 | y_test[10==y_test] = 0 73 | y_train = np_utils.to_categorical(y_train, nb_classes) 74 | y_test = np_utils.to_categorical(y_test, nb_classes) 75 | 76 | 77 | sess = tf.InteractiveSession() 78 | K.set_session(sess) 79 | 80 | 81 | if False: 82 | print('\nLoading model0') 83 | model0 = load_model('model/table_1_svhn_model0.h5') 84 | else: 85 | print('\nBuilding model0') 86 | model0 = Sequential([ 87 | Convolution2D(32, 3, 3, border_mode='same', 88 | input_shape=input_shape), 89 | LeakyReLU(alpha=0.2), 90 | Convolution2D(32, 3, 3), 91 | LeakyReLU(alpha=0.2), 92 | MaxPooling2D(pool_size=(2,2)), 93 | Dropout(0.2), 94 | Convolution2D(64, 3, 3, border_mode='same'), 95 | LeakyReLU(alpha=0.2), 96 | Convolution2D(64, 3, 3), 97 | LeakyReLU(alpha=0.2), 98 | MaxPooling2D(pool_size=(2, 2)), 99 | Dropout(0.3), 100 | Convolution2D(128, 3, 3, border_mode='same'), 101 | LeakyReLU(alpha=0.2), 102 | Convolution2D(128, 3, 3), 103 | LeakyReLU(alpha=0.2), 104 | MaxPooling2D(pool_size=(2, 2)), 105 | Dropout(0.4), 106 | Flatten(), 107 | Dense(512), 108 | Activation('relu'), 109 | Dropout(0.5), 110 | Dense(nb_classes), 111 | Activation('softmax')]) 112 | 113 | model0.compile(loss='categorical_crossentropy', 114 | optimizer="adam", metrics=['accuracy']) 115 | 116 | 117 | earlystopping = EarlyStopping(monitor='val_loss', patience=5, 118 | verbose=1) 119 | model0.fit(X_train, y_train, nb_epoch=100, validation_split=0.1, 120 | callbacks=[earlystopping]) 121 | 122 | print('\nSaving model0') 123 | os.makedirs('model', exist_ok=True) 124 | model0.save('model/table_1_svhn_model0.h5') 125 | 126 | 127 | x = tf.placeholder(tf.float32, (None, img_rows, img_cols, img_chan)) 128 | eps = tf.placeholder(tf.float32, ()) 129 | x_adv = fgsm(model0, x, nb_epoch=9, eps=eps, clip_min=-1., 130 | clip_max=1.) 
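# SVHN inputs were scaled to [-1, 1] above (unlike the [0, 1] scaling in the
# other scripts), hence the wider clip_min/clip_max bounds here; nb_epoch
# appears to control the number of iterative FGSM steps taken.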
131 | 132 | 133 | print('\nTesting against clean test data') 134 | score = model0.evaluate(X_test, y_test) 135 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 136 | 137 | 138 | EPS = 0.01 139 | 140 | if False: 141 | print('\nLoading adversarial dataset') 142 | db = np.load('data/table_1_svhn_{0:.4f}.npz'.format(EPS)) 143 | X_train_adv = db['X_train_adv'] 144 | X_test_adv = db['X_test_adv'] 145 | else: 146 | print('\nBuilding X_train_adv') 147 | nb_sample = X_train.shape[0] 148 | batch_size = 128 149 | nb_batch = int(np.ceil(nb_sample/batch_size)) 150 | X_train_adv = np.empty(X_train.shape) 151 | for batch in range(nb_batch): 152 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 153 | start = batch * batch_size 154 | end = min(nb_sample, start+batch_size) 155 | tmp = sess.run(x_adv, feed_dict={x: X_train[start:end], 156 | eps: EPS, 157 | K.learning_phase(): 0}) 158 | X_train_adv[start:end] = tmp 159 | 160 | print('\nBuilding X_test_adv') 161 | nb_sample = X_test.shape[0] 162 | nb_batch = int(np.ceil(nb_sample/batch_size)) 163 | X_test_adv = np.empty(X_test.shape) 164 | for batch in range(nb_batch): 165 | print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 166 | start = batch * batch_size 167 | end = min(nb_sample, start+batch_size) 168 | tmp = sess.run(x_adv, feed_dict={x: X_test[start:end], 169 | eps: EPS, 170 | K.learning_phase(): 0}) 171 | X_test_adv[start:end] = tmp 172 | 173 | print('\nSaving adversarial dataset') 174 | os.makedirs('data', exist_ok=True) 175 | np.savez('data/table_1_svhn_{0:.4f}.npz'.format(EPS), 176 | X_train_adv=X_train_adv, X_test_adv=X_test_adv) 177 | 178 | 179 | print('\nTesting against adversarial test data') 180 | score = model0.evaluate(X_test_adv, y_test) 181 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 182 | 183 | 184 | print('\nPlotting random adversarial data') 185 | 186 | print('\nMaking predictions') 187 | z0 = np.argmax(y_test, axis=1) 188 | y1 = model0.predict(X_test) 189 | z1 = np.argmax(y1, axis=1) 190 | y2 = model0.predict(X_test_adv) 191 | z2 = np.argmax(y2, axis=1) 192 | 193 | print('\nSelecting figures') 194 | X_tmp = np.empty((2, nb_classes, img_rows, img_cols, img_chan)) 195 | y_proba = np.empty((2, nb_classes, nb_classes)) 196 | for i in range(10): 197 | print('Target {0}'.format(i)) 198 | ind, = np.where(np.all([z0==i, z1==i, z2!=i], axis=0)) 199 | cur = np.random.choice(ind) 200 | X_tmp[0][i] = X_test[cur] 201 | X_tmp[1][i] = X_test_adv[cur] 202 | y_proba[0][i] = y1[cur] 203 | y_proba[1][i] = y2[cur] 204 | 205 | 206 | print('\nPlotting results') 207 | fig = plt.figure(figsize=(10, 3)) 208 | gs = gridspec.GridSpec(2, 10, wspace=0.1, hspace=0.1) 209 | 210 | label = np.argmax(y_proba, axis=2) 211 | proba = np.max(y_proba, axis=2) 212 | for i in range(10): 213 | for j in range(2): 214 | ax = fig.add_subplot(gs[j, i]) 215 | ax.imshow(X_tmp[j][i], interpolation='none') 216 | ax.set_xticks([]) 217 | ax.set_yticks([]) 218 | ax.set_xlabel('{0} ({1:.2f})'.format(label[j][i], 219 | proba[j][i]), 220 | fontsize=12) 221 | 222 | print('\nSaving figure') 223 | gs.tight_layout(fig) 224 | os.makedirs('img', exist_ok=True) 225 | plt.savefig('img/table_1_svhn.pdf') 226 | 227 | 228 | print('\nPreparing clean/adversarial mixed dataset') 229 | X_all_train = np.vstack([X_train, X_train_adv]) 230 | y_all_train = np.vstack([np.zeros([X_train.shape[0], 1]), 231 | np.ones([X_train_adv.shape[0], 1])]) 232 | 233 | y0_test = np.zeros((y_test.shape[0], 1)) 234 | y1_test = np.ones((y_test.shape[0], 1)) 235 | 236 | 
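# model1 below is the binary detector: it is trained to output 0 for clean
# images and 1 for their adversarial counterparts; the permutation that
# follows shuffles the two stacked halves of the training set together.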
ind = np.random.permutation(X_all_train.shape[0]) 237 | X_all_train = X_all_train[ind] 238 | y_all_train = y_all_train[ind] 239 | 240 | 241 | if False: 242 | print('\nLoading model1') 243 | model1 = load_model('model/table_1_svhn_model1.h5') 244 | else: 245 | print('\nBuilding model1') 246 | model1 = Sequential([ 247 | Convolution2D(32, 3, 3, border_mode='same', 248 | input_shape=input_shape), 249 | LeakyReLU(alpha=0.2), 250 | Convolution2D(32, 3, 3), 251 | LeakyReLU(alpha=0.2), 252 | MaxPooling2D(pool_size=(2,2)), 253 | Dropout(0.2), 254 | Convolution2D(64, 3, 3, border_mode='same'), 255 | LeakyReLU(alpha=0.2), 256 | Convolution2D(64, 3, 3), 257 | LeakyReLU(alpha=0.2), 258 | MaxPooling2D(pool_size=(2, 2)), 259 | Dropout(0.3), 260 | Convolution2D(128, 3, 3, border_mode='same'), 261 | LeakyReLU(alpha=0.2), 262 | Convolution2D(128, 3, 3), 263 | LeakyReLU(alpha=0.2), 264 | MaxPooling2D(pool_size=(2, 2)), 265 | Dropout(0.4), 266 | Flatten(), 267 | Dense(512), 268 | Activation('relu'), 269 | Dropout(0.5), 270 | Dense(1), 271 | Activation('sigmoid')]) 272 | 273 | model1.compile(loss='binary_crossentropy', 274 | optimizer="adam", metrics=['accuracy']) 275 | 276 | print('\nTraining model1') 277 | model1.fit(X_all_train, y_all_train, nb_epoch=2, 278 | validation_split=0.1) 279 | 280 | print('\nSaving model1') 281 | os.makedirs('model', exist_ok=True) 282 | model1.save('model/table_1_svhn_model1.h5') 283 | 284 | 285 | # x1_adv = fgsm(model1, x, nb_epoch=4, eps=0.2) 286 | 287 | print('\nTesting against clean test data') 288 | score = model1.evaluate(X_test, y0_test) 289 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 290 | 291 | 292 | print('\nTesting against adversarial test data') 293 | score = model1.evaluate(X_test_adv, y1_test) 294 | print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 295 | 296 | 297 | # print('\nDisguising clean test data') 298 | # nb_sample = X_test.shape[0] 299 | # batch_size = 128 300 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 301 | # X_test_adv1 = np.empty(X_test.shape) 302 | # for batch in range(nb_batch): 303 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 304 | # start = batch * batch_size 305 | # end = min(nb_sample, start+batch_size) 306 | # tmp = sess.run(x1_adv, feed_dict={x: X_test[start:end], 307 | # K.learning_phase(): 0}) 308 | # X_test_adv1[start:end] = tmp 309 | 310 | 311 | # print('\nTesting against disguised clean data') 312 | # score = model1.evaluate(X_test_adv1, y0_test) 313 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 314 | 315 | 316 | # print('\nDisguising adversarial test data') 317 | # nb_sample = X_test_adv.shape[0] 318 | # batch_size = 128 319 | # nb_batch = int(np.ceil(nb_sample/batch_size)) 320 | # X_test_adv2 = np.empty(X_test_adv.shape) 321 | # for batch in range(nb_batch): 322 | # print(' batch {0}/{1}'.format(batch+1, nb_batch), end='\r') 323 | # start = batch * batch_size 324 | # end = min(nb_sample, start+batch_size) 325 | # tmp = sess.run(x1_adv, feed_dict={x: X_test_adv[start:end], 326 | # K.learning_phase(): 0}) 327 | # X_test_adv2[start:end] = tmp 328 | 329 | 330 | # print('\nTesting against disguised adversarial data') 331 | # score = model1.evaluate(X_test_adv2, y1_test) 332 | # print('\nloss: {0:.4f} acc: {1:.4f}'.format(score[0], score[1])) 333 | -------------------------------------------------------------------------------- /adv-clean-not-twins.org: -------------------------------------------------------------------------------- 1 | #+OPTIONS: ^:{} 
toc:nil hideblocks title:nil num:2 2 | #+LATEX_HEADER: \input{setup.tex} 3 | 4 | # ICML specific typesetting 5 | #+BEGIN_EXPORT latex 6 | 7 | % The \icmltitle you define below is probably too long as a header. 8 | % Therefore, a short form for the running title is supplied here: 9 | % \icmltitlerunning{a shorter running title} 10 | 11 | \twocolumn[ 12 | \icmltitle{Adversarial and Clean Data Are Not Twins} 13 | 14 | \begin{icmlauthorlist} 15 | \icmlauthor{Zhitao Gong}{au} 16 | \icmlauthor{Wenlu Wang}{au} 17 | \icmlauthor{Wei-Shinn Ku}{au} 18 | \end{icmlauthorlist} 19 | 20 | \icmlaffiliation{au}{Auburn University, Auburn, AL} 21 | 22 | \icmlcorrespondingauthor{Zhitao Gong}{gong@auburn.edu} 23 | 24 | % You may provide any keywords that you find helpful for describing 25 | % your paper; these are used to populate the "keywords" metadata in 26 | % the PDF but will not be shown in the document 27 | 28 | \icmlkeywords{adversarial, deep neural network} 29 | 30 | \vskip 0.3in 31 | ] 32 | 33 | \printAffiliationsAndNotice{} 34 | 35 | #+END_EXPORT 36 | 37 | #+BEGIN_abstract 38 | 39 | Adversarial attack has cast a shadow on the massive success of deep 40 | neural networks. Despite being almost visually identical to the clean 41 | data, the adversarial images can fool deep neural networks into wrong 42 | predictions with very high confidence. In this paper, however, we 43 | show that we can build a simple binary classifier separating the 44 | adversarial apart from the clean data with accuracy over 99%. We also 45 | empirically show that the binary classifier is robust to a 46 | second-round adversarial attack. In other words, it is difficult to 47 | disguise adversarial samples to bypass the binary classifier. Further 48 | more, we empirically investigate the generalization limitation which 49 | lingers on all current defensive methods, including the binary 50 | classifier approach. And we hypothesize that this is the result of 51 | intrinsic property of adversarial crafting algorithms. 52 | 53 | #+END_abstract 54 | 55 | * Introduction 56 | :PROPERTIES: 57 | :CUSTOM_ID: sec:introduction 58 | :END: 59 | 60 | Deep neural networks have been successfully adopted to many life 61 | critical areas, e.g., skin cancer detection 62 | cite:esteva2017-dermatologist, auto-driving cite:santana2016-learning, 63 | traffic sign classification cite:ciresan2012-multi, etc. A recent 64 | study cite:szegedy2013-intriguing, however, discovered that deep 65 | neural networks are susceptible to adversarial images. Figure 66 | ref:fig:adv-example shows an example of adversarial images generated 67 | via fast gradient sign method 68 | cite:kurakin2016-adversarial,kurakin2016-adversarial-1 on MNIST. As 69 | we can see that although the adversarial and original clean images are 70 | almost identical from the perspective of human beings, the deep neural 71 | network will produce wrong predictions with very high confidence. 72 | Similar techniques can easily fool the image system into mistaking a 73 | stop sign for a yield sign, a dog for a automobile, for example. When 74 | leveraged by malicious users, these adversarial images pose a great 75 | threat to the deep neural network systems. 76 | 77 | #+ATTR_LaTeX: :float multicolumn 78 | #+CAPTION: The adversarial images (second row) are generated from the first row via iterative FGSM. The label of each image is shown below with prediction probability in parenthesis. Our model achieves less then 1% error rate on the clean data. 
79 | #+NAME: fig:adv-example
80 | [[file:img/ex_adv_mnist.pdf]]
81 | 
82 | Although adversarial and clean images appear visually indiscernible,
83 | their subtle differences can successfully fool deep neural
84 | networks. This means that deep neural networks are sensitive to these
85 | subtle differences. So an intuitive question to ask is: can we
86 | leverage these subtle differences to distinguish between adversarial
87 | and clean images? Our experiment suggests the answer is positive. In
88 | this paper we demonstrate that a simple binary classifier can separate
89 | the adversarial images from the original clean images with very high
90 | accuracy (over 99%). However, we also show that the binary classifier
91 | approach suffers from a generalization limitation, i.e., it is
92 | sensitive 1) to the hyper-parameter used in crafting the adversarial
93 | dataset, and 2) to the adversarial crafting algorithm. In addition,
94 | we discovered that this limitation is shared among other
95 | proposed defenses against adversarial attacks, e.g., defensive
96 | retraining cite:huang2015-learning,kurakin2016-adversarial-1,
97 | knowledge distillation cite:papernot2015-distillation, etc. We
98 | empirically investigate the limitation and propose the hypothesis that
99 | the adversarial and original datasets are, in effect, two completely
100 | /different/ datasets, despite being visually similar.
101 | 
102 | This article is organized as follows. In Section
103 | ref:sec:related-work, we give an overview of the current research on
104 | adversarial attacks and defenses, with a focus on deep neural networks.
105 | This is followed by a brief summary of the state-of-the-art
106 | adversarial crafting algorithms in Section
107 | ref:sec:crafting-adversarials. Section ref:sec:experiment presents
108 | our experiment results and detailed discussions. We conclude in
109 | Section ref:sec:conclusion.
110 | 
111 | * Related Work
112 | :PROPERTIES:
113 | :CUSTOM_ID: sec:related-work
114 | :END:
115 | 
116 | The adversarial image attack on deep neural networks was first
117 | investigated in cite:szegedy2013-intriguing. The authors discovered
118 | that when some imperceptible, carefully chosen noise is added, an
119 | image may be wrongly classified with high confidence by a well-trained
120 | deep neural network. They also proposed an adversarial crafting
121 | algorithm based on optimization, which we briefly summarize in Section
122 | ref:sec:crafting-adversarials, as well as the hypothesis that
123 | adversarial samples exist as a result of the high nonlinearity of
124 | deep neural network models.
125 | 
126 | However, cite:goodfellow2014-explaining proposed a counter-intuitive
127 | hypothesis explaining the cause of adversarial samples. They argued
128 | that adversarial samples are caused by the models being too /linear/,
129 | rather than /nonlinear/. They proposed two adversarial crafting
130 | algorithms based on this hypothesis, i.e., the fast gradient sign
131 | method (FGSM) and the least-likely class method (LLCM)
132 | cite:goodfellow2014-explaining. The least-likely class method was
133 | later generalized to the target class gradient sign method (TGSM) in
134 | cite:kurakin2016-adversarial.
135 | 
136 | cite:papernot2015-limitations proposed another gradient-based
137 | adversarial algorithm, the Jacobian-based saliency map approach
138 | (JSMA), which can successfully alter the label of an image to any
139 | desired category.
140 | 
141 | Adversarial images have been shown to be transferable among deep
142 | neural networks cite:szegedy2013-intriguing,kurakin2016-adversarial.
143 | This poses a great threat to current learning systems in that the
144 | attacker does not need knowledge of the target system. Instead, the
145 | attacker can train a different model to create adversarial samples
146 | that remain effective against the target deep neural networks. What's
147 | worse, cite:papernot2016-transferability has shown that adversarial
148 | samples are even transferable among different machine learning
149 | techniques, e.g., deep neural networks, support vector machines,
150 | decision trees, logistic regression, etc.
151 | 
152 | Small steps have been made towards the defense against adversarial
153 | images. cite:kurakin2016-adversarial shows that some image
154 | transformations, e.g., Gaussian noise, Gaussian filtering, JPEG
155 | compression, etc., can effectively recover over 80% of the adversarial
156 | images. However, in our experiment, the image transformation defense
157 | does not perform well on images with low resolution, e.g., MNIST.
158 | Knowledge distillation is also shown to be an effective method against
159 | most adversarial images cite:papernot2015-distillation. The
160 | restrictions of defensive knowledge distillation are 1) that it only
161 | applies to models that produce categorical probabilities, and 2) that
162 | it needs model retraining. Adversarial training
163 | cite:kurakin2016-adversarial-1,huang2015-learning was also shown to
164 | greatly enhance model robustness to adversarial samples. However, as
165 | discussed in Section ref:subsec:generalization-limitation, defensive
166 | distillation and adversarial training suffer from what we call the
167 | generalization limitation. Our experiment suggests that this is
168 | an intrinsic property of adversarial datasets.
169 | 
170 | * Crafting Adversarials
171 | :PROPERTIES:
172 | :CUSTOM_ID: sec:crafting-adversarials
173 | :END:
174 | 
175 | There are mainly two categories of algorithms for generating
176 | adversarial samples: model-independent and model-dependent. We
177 | briefly summarize these two classes of methods in this section.
178 | 
179 | By convention, we use \(X\) to represent the input image set (usually
180 | a 3-dimensional tensor) and \(Y\) to represent the label set, usually
181 | one-hot encoded. Lowercase letters represent an individual data
182 | sample, e.g., \(x\) for one input image. A subscript on a data sample
183 | denotes one of its elements, e.g., \(x_i\) denotes one pixel in the
184 | image and \(y_i\) the probability of the \(i\)-th target class.
185 | \(f\) denotes the model, \(\theta\) the model parameters, and \(J\)
186 | the loss function. We use the superscript /adv/ to denote
187 | adversarial-related variables, e.g., \(x^{adv}\) for one adversarial
188 | image. \(\delta x\) denotes the adversarial noise for one image,
189 | i.e., \(x^{adv} = x + \delta x\). For clarity, we also include the
190 | model used to craft the adversarial samples where necessary, e.g.,
191 | \(x^{adv(f_1)}\) denotes the adversarial samples created with model
192 | \(f_1\). \(\mathbb{D}\) denotes the image value domain, usually
193 | \([0, 1]\) or \([0, 255]\). Finally, \(\epsilon\) is a scalar that
194 | controls the scale of the adversarial noise, another hyper-parameter to choose.
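
To make the notation concrete, constructing an adversarial image from
a clean image amounts to a few lines of NumPy. This is only a minimal
illustrative sketch (the function name =apply_noise= is ours and is
not part of our experiment code); it simply applies
\(x^{adv} = x + \delta x\) and clips the result back into the image
domain \(\mathbb{D}\).

#+BEGIN_SRC python
import numpy as np

def apply_noise(x, delta_x, domain=(0.0, 1.0)):
    """Form x_adv = x + delta_x and clip it back into the domain D."""
    lo, hi = domain
    return np.clip(x + delta_x, lo, hi)

# Hypothetical usage: delta_x comes from any crafting algorithm,
# e.g., epsilon * sign(gradient) for FGSM.
# x_adv = apply_noise(x, delta_x)
#+END_SRC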
195 | 
196 | ** Model Independent Method
197 | 
198 | A box-constrained minimization algorithm based on L-BFGS was the first
199 | algorithm proposed to generate adversarial data
200 | cite:szegedy2013-intriguing. Concretely, we want to find the smallest
201 | (in the sense of \(L^2\)-norm) noise \(\delta x\) such that the
202 | adversarial image belongs to a different category, i.e.,
203 | \(f(x^{adv})\neq f(x)\).
204 | #+BEGIN_EXPORT latex
205 | \begin{equation} \label{eq:guided-walk}
206 | \begin{split}
207 | \delta x &= \argmin_r c\norm{r}_\infty + J(x+r, y^{adv})\\
208 | &\text{ s.t. } x+r\in \mathbb{D}
209 | \end{split}
210 | \end{equation}
211 | #+END_EXPORT
212 | 
213 | ** Model Dependent Methods
214 | 
215 | There are mainly three methods that rely on the model gradient, i.e.,
216 | the fast gradient sign method (FGSM) cite:kurakin2016-adversarial, the
217 | target class method (TGSM)
218 | cite:kurakin2016-adversarial,kurakin2016-adversarial-1, and the
219 | Jacobian-based saliency map approach (JSMA)
220 | cite:papernot2015-limitations. We will see in Section
221 | ref:sec:experiment that although they all produce highly disguising
222 | adversarial samples, FGSM and TGSM produce /compatible/ adversarial
223 | datasets which are completely /different/ from those generated via JSMA.
224 | 
225 | *** Fast Gradient Sign Method (FGSM)
226 | 
227 | FGSM modifies the input towards the direction in which \(J\)
228 | increases, i.e., \(\dv*{J(x, y^{adv})}{x}\), as shown in Equation
229 | ref:eq:fgsm.
230 | #+BEGIN_EXPORT latex
231 | \begin{equation} \label{eq:fgsm}
232 | \delta x = \epsilon\sign\left(\dv{J(x, \pred{y})}{x}\right)
233 | \end{equation}
234 | #+END_EXPORT
235 | 
236 | Originally, cite:kurakin2016-adversarial proposes to generate
237 | adversarial samples by using the true label, i.e., \(y^{adv} =
238 | y^{true}\), which has been shown to suffer from the label leaking
239 | problem cite:kurakin2016-adversarial-1. Instead of true labels,
240 | cite:kurakin2016-adversarial-1 proposes to use the /predicted/ label,
241 | i.e., \(\pred{y} = f(x)\), to generate adversarial examples.
242 | 
243 | This method can also be applied iteratively, as shown in Equation
244 | ref:eq:fgsm-iter. Iterative FGSM has a much higher success rate than
245 | the one-step FGSM. However, the iterative version is less robust to
246 | image transformations cite:kurakin2016-adversarial.
247 | #+BEGIN_EXPORT latex
248 | \begin{equation} \label{eq:fgsm-iter}
249 | \begin{split}
250 | x^{adv}_{k+1} &= x^{adv}_k + \epsilon\sign\left(\dv{J(x^{adv}_k, \pred{y_k})}{x}\right)\\
251 | x^{adv}_0 &= x\\
252 | \pred{y_k} &= f(x^{adv}_k)
253 | \end{split}
254 | \end{equation}
255 | #+END_EXPORT
256 | 
257 | *** Target Class Gradient Sign Method (TGSM)
258 | 
259 | This method modifies the input towards the direction in which
260 | \(p(y^{adv}\given x)\) increases.
261 | #+BEGIN_EXPORT latex
262 | \begin{equation} \label{eq:tcm}
263 | \delta x = -\epsilon\sign\left(\dv{J(x, y^{adv})}{x}\right)
264 | \end{equation}
265 | #+END_EXPORT
266 | 
267 | Originally, this method was proposed as the least-likely class method
268 | cite:kurakin2016-adversarial, where \(y^{adv}\) was chosen as the
269 | least-likely class predicted by the model, as shown in Equation
270 | ref:eq:llcm-y.
271 | #+BEGIN_EXPORT latex
272 | \begin{equation} \label{eq:llcm-y}
273 | y^{adv} = \text{OneHotEncode}\left(\argmin f(x)\right)
274 | \end{equation}
275 | #+END_EXPORT
276 | 
277 | It was later extended to a more general case where \(y^{adv}\) could
278 | be any desired target class cite:kurakin2016-adversarial-1.
279 | 
280 | # The following table belongs to the "Efficiency and Robustness of the
281 | # Classifier" section, placed here only for typesetting.
282 | 
283 | #+BEGIN_EXPORT latex
284 | \begin{table*}[htbp]
285 | \caption{\label{tbl:accuracy-summary}
286 | Accuracy on adversarial samples generated with FGSM/TGSM.}
287 | \centering
288 | \begin{tabular}{lcrrcrrrr}
289 | \toprule
290 | & \phantom{a} & \multicolumn{2}{c}{\(f_1\)} & \phantom{a} & \multicolumn{4}{c}{\(f_2\)} \\
291 | \cmidrule{3-4} \cmidrule{6-9}
292 | Dataset && \(X_{test}\) & \(X^{adv(f_1)}_{test}\) && \(X_{test}\) & \(X^{adv(f_1)}_{test}\) & \(\{X_{test}\}^{adv(f_2)}\) & \(\{X^{adv(f_1)}_{test}\}^{adv(f_2)}\) \\
293 | \midrule
294 | MNIST && 0.9914 & 0.0213 && 1.00 & 1.00 & 0.00 & 1.00\\
295 | CIFAR10 && 0.8279 & 0.1500 && 0.99 & 1.00 & 0.01 & 1.00\\
296 | SVHN && 0.9378 & 0.2453 && 1.00 & 1.00 & 0.00 & 1.00\\
297 | \bottomrule
298 | \end{tabular}
299 | \end{table*}
300 | 
301 | #+END_EXPORT
302 | 
303 | # #+CAPTION: Accuracy on adversarial samples generated with FGSM/TGSM.
304 | # #+NAME: tbl:accuracy-summary
305 | # #+ATTR_LaTeX: :booktabs true :align l|rr|rrrr :float multicolumn
306 | # |         | \(f_1\)      |                         |              | \(f_2\)                 |                             |                                        |
307 | # |---------+--------------+-------------------------+--------------+-------------------------+-----------------------------+----------------------------------------|
308 | # | Dataset | \(X_{test}\) | \(X^{adv(f_1)}_{test}\) | \(X_{test}\) | \(X^{adv(f_1)}_{test}\) | \(\{X_{test}\}^{adv(f_2)}\) | \(\{X^{adv(f_1)}_{test}\}^{adv(f_2)}\) |
309 | # |---------+--------------+-------------------------+--------------+-------------------------+-----------------------------+----------------------------------------|
310 | # | MNIST   | 0.9914       | 0.0213                  | 1.00         | 1.00                    | 0.00                        | 1.00                                   |
311 | # | CIFAR10 | 0.8279       | 0.1500                  | 0.99         | 1.00                    | 0.01                        | 1.00                                   |
312 | # | SVHN    | 0.9378       | 0.2453                  | 1.00         | 1.00                    | 0.00                        | 1.00                                   |
313 | 
314 | *** Jacobian-based Saliency Map Approach (JSMA)
315 | 
316 | Similar to the target class method, JSMA cite:papernot2015-limitations
317 | allows one to specify the desired target class. However, instead of
318 | adding noise to the whole input, JSMA changes only one pixel at a
319 | time. A /saliency score/ is calculated for each pixel, and the pixel
320 | with the highest score is chosen to be perturbed.
321 | #+BEGIN_EXPORT latex
322 | \begin{equation} \label{eq:jsma-saliency}
323 | \begin{split}
324 | s(x_i) &= \begin{cases}
325 | 0 & \text{ if } s_t < 0 \text{ or } s_o > 0\\
326 | s_t\abs{s_o} & \text{ otherwise}
327 | \end{cases}\\
328 | s_t &= \pdv{y_t}{x_i}\qquad s_o = \sum_{j\neq t}\pdv{y_j}{x_i}
329 | \end{split}
330 | \end{equation}
331 | #+END_EXPORT
332 | 
333 | Concretely, \(s_t\) is the Jacobian value of the desired target class
334 | \(y_t\) w.r.t. an individual pixel, and \(s_o\) is the sum of the
335 | Jacobian values of all non-target classes. Intuitively, the saliency
336 | score indicates the sensitivity of each output class w.r.t. each
337 | individual pixel. We want to perturb the pixel towards the direction
338 | where \(p(y_t\given x)\) increases the most.
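
Before moving on to the experiments, the gradient-sign perturbations
of Equations ref:eq:fgsm and ref:eq:tcm can be summarized in a short
TensorFlow sketch. This is only an illustration under TensorFlow 1.x
conventions; the function name =gradient_sign_noise= and its arguments
are ours, and our actual experiments use the =fgsm= routine from the
tensorflow-adversarial library instead.

#+BEGIN_SRC python
import tensorflow as tf

def gradient_sign_noise(model, x, eps=0.03, targeted=False, y_target=None):
    """Symbolic FGSM/TGSM noise for an input tensor x (TF 1.x style).

    FGSM follows +sign(dJ/dx) computed with the model's own predicted
    label (to avoid label leaking); TGSM follows -sign(dJ/dx) computed
    with the desired target label y_target.
    """
    y_prob = model(x)                          # predicted probabilities
    if targeted:
        y_ref, direction = y_target, -1.0
    else:
        nb_classes = int(y_prob.get_shape()[1])
        y_ref = tf.one_hot(tf.argmax(y_prob, axis=1), nb_classes)
        direction = 1.0
    # cross-entropy loss J(x, y_ref)
    loss = -tf.reduce_sum(y_ref * tf.log(y_prob + 1e-12), axis=1)
    grad, = tf.gradients(loss, x)
    return direction * eps * tf.sign(grad)
#+END_SRC

The adversarial image is then \(x^{adv}\), i.e., \(x\) plus this noise
clipped back into \(\mathbb{D}\); repeating the update with a small
\(\epsilon\) gives the iterative variant of Equation ref:eq:fgsm-iter.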
339 | 
340 | * Experiment
341 | :PROPERTIES:
342 | :CUSTOM_ID: sec:experiment
343 | :END:
344 | 
345 | Generally, we follow the steps below to test the effectiveness and
346 | limitation of the binary classifier approach.
347 | 
348 | 1. Train a deep neural network \(f_1\) on the original clean training
349 |    data \(X_{train}\), and craft adversarial datasets from the original
350 |    clean data, \(X_{train}\to X^{adv(f_1)}_{train}\), \(X_{test}\to
351 |    X^{adv(f_1)}_{test}\). \(f_1\) is used to generate the attacking
352 |    adversarial dataset that we want to filter out.
353 | 2. Train a binary classifier \(f_2\) on the combined (shuffled)
354 |    training data \(\{X_{train}, X^{adv(f_1)}_{train}\}\), where
355 |    \(X_{train}\) is labeled 0 and \(X^{adv(f_1)}_{train}\) is labeled 1.
356 | 3. Test the accuracy of \(f_2\) on \(X_{test}\) and
357 |    \(X^{adv(f_1)}_{test}\), respectively.
358 | 4. Construct second-round adversarial test data, \(\{X_{test},
359 |    X^{adv(f_1)}_{test}\}\to \{X_{test},
360 |    X^{adv(f_1)}_{test}\}^{adv(f_2)}\), and test the accuracy of \(f_2\)
361 |    on this new adversarial dataset. Concretely, we want to test whether
362 |    we could find adversarial samples 1) that can successfully bypass
363 |    the binary classifier \(f_2\), and 2) that can still fool the
364 |    target model \(f_1\) if they bypass the binary classifier. Since
365 |    adversarial datasets are shown to be transferable among different
366 |    machine learning techniques cite:papernot2016-transferability, the
367 |    binary classifier approach would be seriously flawed if \(f_2\)
368 |    failed this second-round attacking test.
369 | 
370 | The code to reproduce our experiment is available at
371 | https://github.com/gongzhitaao/adversarial-classifier; a minimal
372 | illustrative sketch of step 2 is given at the end of this subsection.
373 | 
374 | ** Efficiency and Robustness of the Classifier
375 | 
376 | We evaluate the binary classifier approach on the MNIST, CIFAR10, and
377 | SVHN datasets. On all of the datasets, the binary classifier achieved
378 | accuracy over 99% and was shown to be robust to a second-round
379 | adversarial attack. The results are summarized in Table
380 | ref:tbl:accuracy-summary. Each column denotes the model accuracy on
381 | the corresponding dataset. The direct conclusions from Table
382 | ref:tbl:accuracy-summary are summarized as follows.
383 | 1. Accuracy on \(X_{test}\) and \(X^{adv(f_1)}_{test}\) suggests that
384 |    the binary classifier is very effective at separating adversarial
385 |    from clean data. In our experiment, the accuracy on \(X_{test}\)
386 |    is always near 1, while the accuracy on \(X^{adv(f_1)}_{test}\) is
387 |    either near 1 (successful) or near 0 (unsuccessful), which means
388 |    that the classifier either detects the subtle difference completely
389 |    or fails completely. We did not observe any values in between.
390 | 2. Accuracy on \(\{X^{adv(f_1)}_{test}\}^{adv(f_2)}\) suggests that we
391 |    were not successful in disguising adversarial samples to bypass
392 |    the classifier. In other words, the binary classifier approach is
393 |    robust to a second-round adversarial attack.
394 | 3. Accuracy on \(\{X_{test}\}^{adv(f_2)}\) suggests that in case of
395 |    the second-round attack, the binary classifier has a very high
396 |    false-negative rate. In other words, it tends to recognize all the
397 |    disguised samples as adversarial. In our opinion, this does not pose
398 |    a problem, since our main focus is to block adversarial samples.
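
For reference, the core of step 2 above, i.e., mixing the clean and
adversarial data and fitting the binary classifier \(f_2\), reduces to
a few lines of Keras. The sketch below is illustrative only: the
helper names are ours, and the \(f_2\) used in our experiments is a
deeper convolutional network (see the repository scripts for the full
details).

#+BEGIN_SRC python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten

def build_binary_dataset(X_clean, X_adv):
    """Label clean data 0 and adversarial data 1, then shuffle."""
    X_all = np.concatenate([X_clean, X_adv], axis=0)
    y_all = np.concatenate([np.zeros(len(X_clean)), np.ones(len(X_adv))])
    ind = np.random.permutation(len(X_all))
    return X_all[ind], y_all[ind]

def build_f2(input_shape):
    """A deliberately small stand-in for the binary classifier f_2."""
    model = Sequential([
        Flatten(input_shape=input_shape),
        Dense(512, activation='relu'),
        Dense(1, activation='sigmoid')])       # P(adversarial | x)
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

# Hypothetical usage:
# X_all, y_all = build_binary_dataset(X_train, X_adv_train)
# f2 = build_f2(X_all.shape[1:])
# f2.fit(X_all, y_all, validation_split=0.1)
#+END_SRC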
399 | 
400 | ** Generalization Limitation
401 | :PROPERTIES:
402 | :CUSTOM_ID: subsec:generalization-limitation
403 | :END:
404 | 
405 | Before we become too optimistic about the performance of the binary
406 | classifier approach, however, we note that it suffers from the
407 | /generalization limitation/.
408 | 1. When trained to recognize adversarial datasets generated via
409 |    FGSM/TGSM, the binary classifier is sensitive to the
410 |    hyper-parameter \(\epsilon\).
411 | 2. The binary classifier is also sensitive to the adversarial crafting
412 |    algorithm.
413 | 
414 | In our experiment, the aforementioned limitations also apply to
415 | adversarial training cite:kurakin2016-adversarial-1,huang2015-learning
416 | and defensive distillation cite:papernot2015-distillation.
417 | 
418 | *** Sensitivity to \(\epsilon\)
419 | 
420 | Table ref:tbl:eps-sensitivity-cifar10 summarizes our tests on CIFAR10.
421 | For brevity, we use \(\eval{f_2}_{\epsilon=\epsilon_0}\) to denote
422 | that the classifier \(f_2\) is trained on adversarial data generated
423 | on \(f_1\) with \(\epsilon=\epsilon_0\). The binary classifier is
424 | trained on clean data mixed with adversarial data generated
425 | via FGSM with \(\epsilon=0.03\). Then we re-generate adversarial
426 | datasets via FGSM/TGSM with different \(\epsilon\) values.
427 | 
428 | #+BEGIN_EXPORT latex
429 | \begin{table}[htbp]
430 | \caption{\label{tbl:eps-sensitivity-cifar10}
431 | \(\epsilon\) sensitivity on CIFAR10}
432 | \centering
433 | \begin{tabular}{lcll}
434 | \toprule
435 | & \phantom{a} & \multicolumn{2}{c}{\(\eval{f_2}_{\epsilon=0.03}\)} \\
436 | \cmidrule{3-4}
437 | \(\epsilon\) && \(X_{test}\) & \(X^{adv(f_1)}_{test}\)\\
438 | \midrule
439 | 0.3 && 0.9996 & 1.0000\\
440 | 0.1 && 0.9996 & 1.0000\\
441 | 0.03 && 0.9996 & 0.9997\\
442 | 0.01 && 0.9996 & \textbf{0.0030}\\
443 | \bottomrule
444 | \end{tabular}
445 | \end{table}
446 | #+END_EXPORT
447 | 
448 | # #+CAPTION: \(\epsilon\) sensitivity on CIFAR10
449 | # #+NAME: tbl:eps-sensitivity-cifar10
450 | # #+ATTR_LaTeX: :booktabs true :align r|rr
451 | # |              | \(\eval{f_2}_{\epsilon=0.03}\) |                         |
452 | # |--------------+--------------------------------+-------------------------|
453 | # | \(\epsilon\) | \(X_{test}\)                   | \(X^{adv(f_1)}_{test}\) |
454 | # |--------------+--------------------------------+-------------------------|
455 | # | 0.3          | 0.9996                         | 1.0000                  |
456 | # | 0.1          | 0.9996                         | 1.0000                  |
457 | # | 0.03         | 0.9996                         | 0.9997                  |
458 | # | 0.01         | 0.9996                         | *0.0030*                |
459 | 
460 | As shown in Table ref:tbl:eps-sensitivity-cifar10,
461 | \(\eval{f_2}_{\epsilon=\epsilon_0}\) can correctly filter out
462 | adversarial datasets generated with \(\epsilon\geq\epsilon_0\), but
463 | fails when the adversarial data are generated with
464 | \(\epsilon<\epsilon_0\). Results on MNIST and SVHN are similar. This
465 | phenomenon was also observed in defensive retraining
466 | cite:kurakin2016-adversarial-1. To overcome this issue, they proposed
467 | to use mixed \(\epsilon\) values to generate the adversarial datasets.
468 | However, Table ref:tbl:eps-sensitivity-cifar10 suggests that
469 | adversarial datasets generated with smaller \(\epsilon\) are
470 | /supersets/ of those generated with larger \(\epsilon\). This
471 | could be well explained by the linearity hypothesis
472 | cite:kurakin2016-adversarial,warde-farley2016-adversarial. The same
473 | conclusion also applies to adversarial training.
In our experiment,
474 | the results of defensive retraining are similar to those of the
475 | binary classifier approach.
476 | 
477 | *** Disparity among Adversarial Samples
478 | 
479 | #+ATTR_LaTeX: :float multicolumn
480 | #+CAPTION: Adversarial training \cite{huang2015-learning,kurakin2016-adversarial-1} does not work. This is a church window plot \cite{warde-farley2016-adversarial}. Each pixel \((i, j)\) (row index and column index pair) represents a data point \(\tilde{x}\) in the input space and \(\tilde{x} = x + \vb{h}\epsilon_j + \vb{v}\epsilon_i\), where \(\vb{h}\) is the direction computed by FGSM and \(\vb{v}\) is a random direction orthogonal to \(\vb{h}\). The \(\epsilon\) ranges over \([-0.5, 0.5]\) and \(\epsilon_{(\cdot)}\) is the interpolated value in between. The central black dot \tikz[baseline=-0.5ex]{\draw[fill=black] (0,0) circle (0.3ex)} represents the original data point \(x\), the orange dot (on the right of the center dot) \tikz[baseline=-0.5ex]{\draw[fill=orange,draw=none] (0,0) circle (0.3ex)} represents the last adversarial sample created from \(x\) via FGSM that is used in the adversarial training, and the blue dot \tikz[baseline=-0.5ex]{\draw[fill=blue,draw=none] (0,0) circle (0.3ex)} represents a random adversarial sample created from \(x\) that cannot be recognized with adversarial training. The three digits below each image, from left to right, are the data samples that correspond to the black dot, orange dot and blue dot, respectively. \tikz[baseline=0.5ex]{\draw (0,0) rectangle (2.5ex,2.5ex)} ( \tikz[baseline=0.5ex]{\draw[fill=black,opacity=0.1] (0,0) rectangle (2.5ex,2.5ex)} ) represents the data samples that are always correctly (incorrectly) recognized by the model. \tikz[baseline=0.5ex]{\draw[fill=red,opacity=0.1] (0,0) rectangle (2.5ex,2.5ex)} represents the adversarial samples that can be correctly recognized without adversarial training only, i.e., the side effect of adversarial training. And \tikz[baseline=0.5ex]{\draw[fill=green,opacity=0.1] (0,0) rectangle (2.5ex,2.5ex)} represents the data points that were correctly recognized with adversarial training only.
481 | #+NAME: fig:adv-training-not-working
482 | [[file:img/adv-training-not-working.pdf]]
483 | 
484 | In our experiment, we also discovered that the binary classifier is
485 | sensitive to the algorithm used to generate the adversarial
486 | datasets.
487 | 
488 | Specifically, the binary classifier trained on the FGSM adversarial
489 | dataset achieves good accuracy (over 99%) on FGSM adversarial data,
490 | but not on adversarial data generated via JSMA, and vice versa.
491 | However, when the binary classifier is trained on a mixed adversarial
492 | dataset from FGSM and JSMA, it performs well (with accuracy over 99%)
493 | on both datasets. This suggests that FGSM and JSMA generate
494 | adversarial datasets that are /far away/ from each other. This claim
495 | is vague without precisely defining what /far away/ means. In our
496 | opinion, they are /far away/ in the same way that CIFAR10 is /far
497 | away/ from SVHN. A well-trained model on CIFAR10 will perform poorly
498 | on SVHN, and vice versa. However, a model well trained on the mixed
499 | dataset of CIFAR10 and SVHN will perform just as well, if not better,
500 | on both datasets, as if it were trained solely on one dataset.
501 | 
502 | The adversarial datasets generated via FGSM and TGSM are, however,
503 | /compatible/ with each other.
In other words, the classifier
504 | trained on one adversarial dataset performs well on adversarial
505 | samples from the other algorithm. They are compatible in the same way
506 | that the training set and test set are compatible: we usually expect a
507 | properly trained model to generalize well to unseen data from the
508 | same distribution, e.g., the test dataset.
509 | 
510 | In effect, it is not just FGSM and JSMA that are incompatible. We can
511 | generate adversarial data samples by a linear combination of the
512 | direction computed by FGSM and another random orthogonal direction, as
513 | illustrated in the church window plot cite:warde-farley2016-adversarial
514 | in Figure ref:fig:adv-training-not-working. Figure
515 | ref:fig:adv-training-not-working visually shows the effect of
516 | adversarial training cite:kurakin2016-adversarial-1. Each image
517 | represents adversarial samples generated from /one/ data sample, which
518 | is shown as a black dot in the center of each image; the last
519 | adversarial sample used in adversarial training is shown as an
520 | orange dot (on the right of the black dot, i.e., in the direction
521 | computed by FGSM). The green area represents the adversarial samples
522 | that cannot be correctly recognized without adversarial training but
523 | can be correctly recognized with adversarial training. The red area
524 | represents data samples that can be correctly recognized without
525 | adversarial training but cannot be correctly recognized with
526 | adversarial training. In other words, it represents the side effect
527 | of adversarial training, i.e., slightly reducing the model accuracy.
528 | The white (gray) area represents the data samples that are always
529 | correctly (incorrectly) recognized with or without adversarial
530 | training.
531 | 
532 | As we can see from Figure ref:fig:adv-training-not-working,
533 | adversarial training does make the model more robust against the
534 | adversarial sample (and, to some extent, the adversarial samples
535 | around it) used for training (green area). However, it does not rule
536 | out all adversarial samples. There are still adversarial samples
537 | (gray area) that are not affected by the adversarial training.
538 | Furthermore, we observe that the green area is largely distributed
539 | along the horizontal direction, i.e., the FGSM direction.
540 | cite:nguyen2014-deep observed similar results for fooling images: in
541 | their experiment, after adversarial training with fooling images,
542 | deep neural network models were more robust against a limited set of
543 | fooling images, but could still easily be fooled by other fooling images.
544 | 
545 | * Conclusion
546 | :PROPERTIES:
547 | :CUSTOM_ID: sec:conclusion
548 | :END:
549 | 
550 | We show in this paper that the binary classifier is a simple yet
551 | effective and robust way to separate adversarial images from the
552 | original clean images. Its advantage over defensive retraining and
553 | distillation is that it serves as a preprocessing step without
554 | assumptions about the model it protects. In addition, it can be
555 | readily deployed without any modification of the underlying systems.
556 | However, as we empirically showed in the experiment, the binary
557 | classifier approach, defensive retraining, and distillation all suffer
558 | from the generalization limitation. For future work, we plan to extend
559 | our current work in two directions.
First, we want to investigate the
560 | disparity between different adversarial crafting methods and its
561 | effect on the generated adversarial space. Second, we will carefully
562 | examine the cause of adversarial samples, since intuitively the
563 | linearity hypothesis does not seem right to us.
564 | 
565 | 
566 | #+LaTeX: \bibliographystyle{icml2017}
567 | #+LaTeX: \bibliography{/home/gongzhitaao/Dropbox/bibliography/nn}
568 | 
--------------------------------------------------------------------------------