├── LICENSE
├── README.md
├── backend_extra.py
├── eigenpro.py
├── kernels.py
├── layers.py
├── mnist.py
├── optimizers.py
├── run_mnist.py
├── utils.py
└── wrapper.py

/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# EigenPro2

## Introduction
EigenPro2 provides fast, scalable, and accurate training for kernel machines.
A detailed description of the method can be found in the paper
["Learning kernels that adapt to GPU"](https://arxiv.org/abs/1806.06144).

## Requirements: Tensorflow (>=1.2.1) and Keras (==2.0.8)
```
pip install tensorflow tensorflow-gpu keras
```
Follow the [Tensorflow installation guide](https://www.tensorflow.org/install/install_linux) for Virtualenv setup.


## Case 1: Quick Shell Script
For a quick test on the MNIST dataset, execute the following command in a bash shell:
```
CUDA_VISIBLE_DEVICES=0 python run_mnist.py --kernel=Gaussian --s=5 --mem_gb=12 --epochs 1 2 3 4 5
```

The arguments specify that we use a Gaussian kernel with bandwidth 5 on a GPU with 12 GB memory.
The train and test (here, val) errors are evaluated after each of the first five epochs:
```
SVD time: 2.82, adjusted k: 277, s1: 0.15, new s1: 6.66e-04
n_subsample=2000, mG=2000, eta=751.35, bs=1432, s1=1.53e-01, delta=0.05
train error: 0.30%  val error: 1.53% (1 epochs, 1.71 seconds)  train l2: 4.99e-03  val l2: 7.88e-03
train error: 0.04%  val error: 1.43% (2 epochs, 3.25 seconds)  train l2: 2.79e-03  val l2: 6.60e-03
train error: 0.02%  val error: 1.30% (3 epochs, 4.70 seconds)  train l2: 1.78e-03  val l2: 6.04e-03
train error: 0.00%  val error: 1.23% (4 epochs, 6.24 seconds)  train l2: 1.21e-03  val l2: 5.66e-03
train error: 0.00%  val error: 1.28% (5 epochs, 7.68 seconds)  train l2: 9.40e-04  val l2: 5.60e-03
```
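The script also accepts the other kernels in `kernels.py` and the optional flags `--q`, `--bs`, and `--n_subsample` (see `run_mnist.py`). For instance, a hypothetical run with the Laplacian kernel (the bandwidth here is illustrative, not tuned):
```
CUDA_VISIBLE_DEVICES=0 python run_mnist.py --kernel=Laplacian --s=10 --mem_gb=12 --epochs 1 5
```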

## Case 2: Interactive Python Console
When using a Python console, we can start by loading the dataset.
In this example, we load the MNIST dataset and transform its multiclass (10-class)
labels into 10 binary labels (one-hot encoding).
```
import keras, mnist
n_class = 10  # number of classes
(x_train, y_train), (x_test, y_test) = mnist.load()
y_train = keras.utils.to_categorical(y_train, n_class)
y_test = keras.utils.to_categorical(y_test, n_class)
x_train, y_train, x_test, y_test = x_train.astype('float32'), \
    y_train.astype('float32'), x_test.astype('float32'), y_test.astype('float32')
```
Then specify the kernel function (a Gaussian kernel with bandwidth 5):
```
import kernels, wrapper
kernel = wrapper.set_f_args(kernels.Gaussian, s=5)
```

Next, initialize a kernel machine that trains with the EigenPro iteration built on the given Gaussian kernel:
```
from eigenpro import EigenPro
model = EigenPro(kernel, x_train, n_class, mem_gb=12)
```
To train the model, call the fit method:
```
res = model.fit(x_train=x_train, y_train=y_train, x_val=x_test, y_val=y_test, epochs=[1, 2, 5, 10])
```
Finally, to make predictions on new inputs, use the predict method:
```
scores = model.predict(x_test)
```
To calculate the accuracy, map the binary labels back to a multiclass label:
```
import numpy as np
np.mean(np.argmax(scores, axis=1) == np.argmax(y_test, axis=1))
```

--------------------------------------------------------------------------------
/backend_extra.py:
--------------------------------------------------------------------------------
import tensorflow as tf
from tensorflow.python.client import device_lib


def scatter_update(ref, indices, updates):
    """Update the value of `ref` at `indices` to `updates`."""
    return tf.scatter_update(ref, indices, updates)


def hasGPU():
    devs = device_lib.list_local_devices()
    return any(dev.device_type == u'GPU' for dev in devs)
--------------------------------------------------------------------------------
/eigenpro.py:
--------------------------------------------------------------------------------
import numpy as np
import scipy.linalg
import tensorflow as tf
import time

from keras import backend as K
from keras.layers import Dense, Input
from keras.models import Model

import utils
from backend_extra import scatter_update
from layers import KernelEmbedding
from optimizers import PSGD


def pre_eigenpro_f(feat, phi, q, n, mG, alpha, min_q=5, seed=1):
    """Prepare the gradient map f for EigenPro and calculate
    the scale factor for the step size such that the update rule,
        p <- p - eta * g
    becomes,
        p <- p - scale * eta * (g - f(g))

    Arguments:
        feat: feature matrix.
        phi: feature map or kernel function.
        q: top-q eigensystem for constructing the EigenPro iteration/kernel.
        n: number of training points.
        mG: maximum batch size corresponding to GPU memory.
        alpha: exponential factor (<= 1) for the eigenvalue ratio.
        min_q: minimum q when q is calculated automatically (q is None).
        seed: random seed.

    Returns:
        f: tensor function.
        scale: factor that rescales the step size.
        s1: largest eigenvalue.
        beta: largest k(x, x) for the EigenPro kernel.
    """
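    # Sketch of the math implemented below: with top eigenpairs (s_i, v_i)
    # of the subsampled, normalized kernel matrix, the correction map is
    #     f(g) = V diag((1 - (s_q / s_i)^alpha) / s_i) V^T g',
    # where g' is the gradient projected onto the kernel features. It damps
    # the top-q eigendirections, so the step size can safely be enlarged by
    # scale = (s_1 / s_q)^alpha to match the flattened spectrum.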
    np.random.seed(seed)  # set random seed for subsamples
    start = time.time()
    n_sample, d = feat.shape

    if q is None:
        svd_q = min(n_sample - 1, 1000)
    else:
        svd_q = q

    _s, _V = nystrom_kernel_svd(feat, phi, svd_q)

    # Choose q such that the batch size is bounded by
    # the subsample size and the memory size.
    # Keep the original q if it is pre-specified.
    qmG = np.sum(np.power(1 / _s, alpha) < min(n_sample / 5, mG)) - 1
    if q is None:
        max_m = min(max(n_sample / 5, mG), n_sample)
        q = np.sum(np.power(1 / _s, alpha) < max_m) - 1
        q = max(q, min_q)

    _s, _sq, _V = _s[:q-1], _s[q-1], _V[:, :q-1]

    s = K.constant(_s)
    V = utils.loadvar_in_sess(_V.astype('float32'))
    sq = K.constant(_sq)

    scale = np.power(_s[0] / _sq, alpha, dtype='float32')
    D = (1 - K.pow(sq / s, np.float32(alpha))) / s

    pre_f = lambda g, kfeat: K.dot(
        V * D, K.transpose(K.dot(K.dot(K.transpose(g), kfeat), V)))
    s1 = _s[0]
    print("SVD time: %.2f, q: %d, adjusted q: %d, s1: %.2f, new s1: %.2e" %
          (time.time() - start, qmG, q, _s[0], s1 / scale))

    kxx = 1 - np.sum(_V ** 2, axis=1) * n_sample  # assumes k(x, x) = 1
    beta = np.max(kxx)

    return pre_f, scale, s1, beta


def asm_eigenpro_f(pre_f, kfeat, inx):
    """Assemble the map for the EigenPro iteration."""
    def eigenpro_f(p, g, eta):
        inx_t = K.constant(inx, dtype='int32')

        kinx = tf.gather(kfeat, inx_t, axis=1)
        pinx = K.gather(p, inx_t)
        update_p = pinx + eta * pre_f(g, kinx)
        new_p = scatter_update(p, inx, update_p)
        return new_p
    return eigenpro_f


def nystrom_kernel_svd(X, kernel_f, q, bs=512):
    """Compute the top eigensystem of a kernel matrix using the Nystrom method.

    Arguments:
        X: data matrix of shape (n_sample, n_feature).
        kernel_f: kernel tensor function k(X, Y).
        q: top-q eigensystem.
        bs: batch size.

    Returns:
        s: top eigenvalues of shape (q,).
        U: (rescaled) top eigenvectors of shape (n_sample, q).
    """

    m, d = X.shape

    # Assemble the kernel function evaluator.
    input_shape = (d,)
    x = Input(shape=input_shape, dtype='float32',
              name='feat-for-nystrom')
    K_t = KernelEmbedding(kernel_f, X)(x)
    kernel_tf = Model(x, K_t)

    K_mat = kernel_tf.predict(X, batch_size=bs)  # avoid shadowing the backend K
    W = K_mat / m
    w, V = scipy.linalg.eigh(W, eigvals=(m - q, m - 1))
    U1r, s = V[:, ::-1], w[::-1][:q]
    NU = np.float32(U1r[:, :q] / np.sqrt(m))

    return s, NU
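

# Example (hypothetical shapes): for a subsample X of shape (2000, 784) and
# the wrapped Gaussian kernel from the README, nystrom_kernel_svd(X, kernel, 160)
# returns eigenvalues s of shape (160,) and rescaled eigenvectors of shape
# (2000, 160); pre_eigenpro_f then keeps the top q - 1 of them.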


class EigenPro(object):

    def __init__(self, kernel, centers, n_label, mem_gb,
                 n_subsample=None, q=None, bs=None,
                 metric='accuracy', scale=.5, seed=1):
        """Assemble a learner using the EigenPro iteration/kernel.

        Arguments:
            kernel: kernel tensor function k(X, Y).
            centers: kernel centers of shape (n_center, n_feature).
            n_label: number of labels.
            mem_gb: GPU memory in GB.
            n_subsample: number of subsamples for the preconditioner.
            q: top-q eigensystem for the preconditioner.
            bs: mini-batch size.
            metric: keras metric, e.g., 'accuracy'.
            scale: step size factor (0.5 for mse loss).
            seed: random seed.
        """

        n, d = centers.shape
        if n_subsample is None:
            if n < 100000:
                n_subsample = min(2000, n)
            else:
                n_subsample = 12000

        mem_bytes = (mem_gb - 0.6) * 1024**3  # reserve 600MB
        # The factor 3 is due to the tensorflow implementation.
        bsizes = np.arange(n_subsample)
        mem_usages = ((d + 2 * n_label + 3 * bsizes) * n + n_subsample * 1000) * 4
        mG = np.sum(mem_usages < mem_bytes)  # device-dependent batch size

        # Calculate batch/step size for the improved EigenPro iteration.
        np.random.seed(seed)
        pinx = np.random.choice(n, n_subsample, replace=False).astype('int32')
        kf, gap, s1, beta = pre_eigenpro_f(
            centers[pinx], kernel, q, n, mG, alpha=.95, seed=seed)
        new_s1 = s1 / gap  # gap is the step-size rescaling factor

        if bs is None:
            bs = min(np.int32(beta / new_s1 + 1), mG)

        if bs < beta / new_s1 + 1:
            eta = bs / beta
        elif bs < n:
            eta = 2 * bs / (beta + (bs - 1) * new_s1)
        else:
            eta = 0.95 * 2 / new_s1
        eta = scale * eta

        print("n_subsample=%d, mG=%d, eta=%.2f, bs=%d, s1=%.2e, beta=%.2f" %
              (n_subsample, mG, eta, bs, s1, beta))
        eta = np.float32(eta * n_label)  # compensate for mse averaging over n_label outputs

        # Assemble the kernel model.
        ix = Input(shape=(d + 1,), dtype='float32', name='indexed-feat')
        x, index = utils.separate_index(ix)  # features, sample_id
        kfeat = KernelEmbedding(kernel, centers,
                                input_shape=(d,))(x)

        y = Dense(n_label, input_shape=(n,),
                  activation='linear',
                  kernel_initializer='zeros',
                  use_bias=False)(kfeat)
        model = Model(ix, y)
        model.compile(
            loss='mse',
            optimizer=PSGD(pred_t=y, index_t=index, eta=eta,
                           eigenpro_f=asm_eigenpro_f(kf, kfeat, pinx)),
            metrics=[metric])

        self.n_label = n_label
        self.seed = seed
        self.bs = bs
        self.model = model

    def fit(self, x_train, y_train, x_val, y_val, epochs,
            n_sample=10000, seed=1):
        """Train the model.

        Arguments:
            x_train: feature matrix of shape (n_train, n_feature).
            y_train: label matrix of shape (n_train, n_label).
            x_val: feature matrix for validation.
            y_val: label matrix for validation.
            epochs: list of epochs at which the error is calculated.
            n_sample: number of subsamples used to estimate the train error.
            seed: random seed.

        Returns:
            res: dictionary with key: epoch,
                value: (train_error, test_error, train_time).
        """
        assert self.n_label == y_train.shape[1]
        np.random.seed(seed)

        x_train = utils.add_index(x_train)
        x_val = utils.add_index(x_val)

        bs = self.bs
        res = dict()

        initial_epoch = 0
        train_sec = 0  # training time in seconds
        n, _ = x_train.shape
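
        # Note that `epochs` is a list of checkpoints, not a count: for
        # epochs=[1, 2, 5], the loop below trains for 1 epoch, then 1 more,
        # then 3 more, evaluating train/val error at each checkpoint.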
        # Subsample training data for fast estimation of the training loss.
        inx = np.random.choice(n, min(n, n_sample), replace=False)
        x_sample, y_sample = x_train[inx], y_train[inx]

        for epoch in epochs:
            start = time.time()
            for _ in range(epoch - initial_epoch):
                epoch_ids = np.random.choice(n, n // bs * bs, replace=False)
                for batch_ids in np.array_split(epoch_ids, n // bs):
                    x_batch, y_batch = x_train[batch_ids], y_train[batch_ids]
                    self.model.train_on_batch(x_batch, y_batch)

            train_sec += time.time() - start
            tr_score = self.model.evaluate(x_sample, y_sample, batch_size=bs, verbose=0)
            tv_score = self.model.evaluate(x_val, y_val, batch_size=bs, verbose=0)
            print("train error: %.2f%%\tval error: %.2f%% (%d epochs, %.2f seconds)\t"
                  "train l2: %.2e\tval l2: %.2e" %
                  ((1 - tr_score[1]) * 100, (1 - tv_score[1]) * 100, epoch, train_sec,
                   tr_score[0], tv_score[0]))
            res[epoch] = (tr_score, tv_score, train_sec)
            initial_epoch = epoch

        return res

    def predict(self, x_feat):
        """Predict regression scores.

        Argument:
            x_feat: feature matrix of shape (?, n_feature).

        Returns:
            score matrix of shape (?, n_label).
        """
        return self.model.predict(utils.add_index(x_feat), batch_size=self.bs)
--------------------------------------------------------------------------------
/kernels.py:
--------------------------------------------------------------------------------
import numpy as np

from keras import backend as K


def D2(X, Y, Y2=None, YT=None):
    """Calculate the pairwise (squared) distance.

    Arguments:
        X: of shape (n_sample, n_feature).
        Y: of shape (n_center, n_feature).
        Y2: of shape (1, n_center).
        YT: of shape (n_feature, n_center).

    Returns:
        pairwise squared distances of shape (n_sample, n_center).
    """
    X2 = K.sum(K.square(X), axis=1, keepdims=True)
    if Y2 is None:
        if X is Y:
            Y2 = X2
        else:
            Y2 = K.sum(K.square(Y), axis=1, keepdims=True)
        Y2 = K.reshape(Y2, (1, K.shape(Y)[0]))
    if YT is None:
        YT = K.transpose(Y)
    d2 = K.reshape(X2, (K.shape(X)[0], 1)) \
        + Y2 - 2 * K.dot(X, YT)  # x2 + y2 - 2xy
    return d2


def Gaussian(X, Y, s, dist2_f=D2):
    """Gaussian kernel.

    Arguments:
        X: of shape (n_sample, n_feature).
        Y: of shape (n_center, n_feature).
        s: kernel bandwidth.

    Returns:
        kernel matrix of shape (n_sample, n_center).
    """
    assert s > 0

    d2 = dist2_f(X, Y)
    gamma = np.float32(1. / (2 * s ** 2))
    G = K.exp(-gamma * K.clip(d2, 0, None))
    return G


def Laplacian(X, Y, s, dist2_f=D2):
    """Laplacian kernel.

    Arguments:
        X: of shape (n_sample, n_feature).
        Y: of shape (n_center, n_feature).
        s: kernel bandwidth.

    Returns:
        kernel matrix of shape (n_sample, n_center).
    """
    assert s > 0

    d2 = K.clip(dist2_f(X, Y), 0, None)
    d = K.sqrt(d2)
    G = K.exp(-d / s)
    return G
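

# Usage sketch: each kernel is a tensor function over Keras/TF tensors, e.g.
#     G = Gaussian(K.constant(X), K.constant(Xc), s=5.0)
# yields an (n_sample, n_center) Gram tensor; `wrapper.set_f_args` is used
# elsewhere in this repo to bind the bandwidth s in advance.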


def Cauchy(X, Y, s, dist2_f=D2):
    """Cauchy kernel.

    Arguments:
        X: of shape (n_sample, n_feature).
        Y: of shape (n_center, n_feature).
        s: kernel bandwidth.

    Returns:
        kernel matrix of shape (n_sample, n_center).
    """
    assert s > 0

    d2 = dist2_f(X, Y)
    s2 = np.float32(s ** 2)
    G = 1 / (1 + K.clip(d2, 0, None) / s2)  # 1 / (1 + d^2 / s^2)
    return G


def Dispersal(X, Y, s, gamma, dist2_f=D2):
    """Dispersal kernel.

    Arguments:
        X: of shape (n_sample, n_feature).
        Y: of shape (n_center, n_feature).
        s: kernel bandwidth.
        gamma: dispersal factor.

    Returns:
        kernel matrix of shape (n_sample, n_center).
    """
    assert s > 0

    d2 = K.clip(dist2_f(X, Y), 0, None)
    d = K.pow(d2, gamma / 2.)
    G = K.exp(-d / np.float32(s))
    return G
--------------------------------------------------------------------------------
/layers.py:
--------------------------------------------------------------------------------
from keras import backend as K
from keras.engine.topology import Layer
import numpy as np

from kernels import D2
import utils


class KernelEmbedding(Layer):
    """Generate kernel features.

    Arguments:
        kernel_f: kernel function k(x, y).
        centers: matrix of shape (n_center, n_feature).
    """

    def __init__(self, kernel_f, centers, **kwargs):
        self.kernel_f = kernel_f
        self._centers = centers
        self.n_center = centers.shape[0]
        super(KernelEmbedding, self).__init__(**kwargs)

    def build(self, input_shape):
        # Centers are stored transposed, of shape (n_feature, n_center).
        self.centers = utils.loadvar_in_sess(self._centers.T.astype('float32'), name='kernel.centers')
        center2 = K.eval(K.sum(K.square(self.centers), axis=0, keepdims=True)).astype('float32')
        center2_t = utils.loadvar_in_sess(np.reshape(center2, (1, -1)), name='kernel.centers.norm')
        self.dist2_f = lambda x, y: D2(x, None, Y2=center2_t, YT=self.centers)  # pre-computed norms for centers

        super(KernelEmbedding, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, x):
        embed = self.kernel_f(x, None, self.dist2_f)
        return embed

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.n_center)


def rff(X, W):
    """Calculate random Fourier features according to the paper,
    'Random Features for Large-Scale Kernel Machines'.

    Arguments:
        X: data matrix of shape (n, D).
        W: weight matrix of shape (D, d).

    Returns:
        feature matrix of shape (n, 2 * d).
    """

    d = K.get_variable_shape(W)[1]
    dot = K.dot(X, W)  # of shape (n, d)
    RF = K.concatenate([K.cos(dot), K.sin(dot)], axis=1) / np.sqrt(d, dtype='float32')
    return RF
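

# For a Gaussian kernel with bandwidth s, the classic construction samples W
# with i.i.d. N(0, 1 / s^2) entries, so that rff(X, W) rff(Y, W)^T
# approximates exp(-|x - y|^2 / (2 s^2)). A minimal sketch (hypothetical
# shapes, numpy weights loaded as constants):
#     W = (np.random.randn(784, 512) / s).astype('float32')
#     Z = rff(K.constant(X), K.constant(W))  # of shape (n, 1024)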


class RFF(Layer):
    """Generate random Fourier features.

    Arguments:
        weights: of shape (D, d).
    """

    def __init__(self, weights, **kwargs):
        self._weights = weights
        self.d = weights.shape[1]
        super(RFF, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='rff-weight',
                                 shape=self._weights.shape,
                                 initializer=(lambda shape: self._weights),
                                 trainable=False)
        super(RFF, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, x):
        embed = rff(x, self.W)
        return embed

    def compute_output_shape(self, input_shape):
        return (input_shape[0], 2 * self.d)
--------------------------------------------------------------------------------
/mnist.py:
--------------------------------------------------------------------------------
import numpy as np

from keras.datasets.mnist import load_data


def unit_range_normalize(X):
    """Scale each feature to [0, 1]; constant features map to 0."""
    min_ = np.min(X, axis=0)
    max_ = np.max(X, axis=0)
    diff_ = max_ - min_
    diff_[diff_ <= 0.0] = np.maximum(1.0, min_[diff_ <= 0.0])  # avoid division by zero
    SX = (X - min_) / diff_
    return SX


def load():
    # input image dimensions
    img_rows, img_cols = 28, 28

    # the data, shuffled and split between train and test sets
    (x_train, y_train), (x_test, y_test) = load_data()

    x_train = x_train.reshape(x_train.shape[0], img_rows * img_cols)
    x_test = x_test.reshape(x_test.shape[0], img_rows * img_cols)

    x_train = x_train.astype('float32') / 255
    x_test = x_test.astype('float32') / 255

    x_train = unit_range_normalize(x_train)
    x_test = unit_range_normalize(x_test)
    print("Load MNIST dataset.")
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    return (x_train, y_train), (x_test, y_test)
--------------------------------------------------------------------------------
/optimizers.py:
--------------------------------------------------------------------------------
import numpy as np
from keras import backend as K
from keras.optimizers import Optimizer

from backend_extra import scatter_update


def nesterov(p0, p1, rmax=.95):
    """Nesterov method.

    Arguments:
        p0: weight parameter tensor variable.
        p1: updated weight parameter tensor.
        rmax: maximum momentum term weight in [0, 1].

    Returns:
        p2: parameter tensor adjusted by the Nesterov method.
        updates: a list of tensor updates.
    """

    p = K.variable(p0, name='nesterov.orig.p', dtype='float32')
    r = K.constant(-rmax, dtype='float32')

    p2 = (1 - r) * p1 + r * p
    updates = [K.update(p, p1)]
    return p2, updates
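

# With the default rmax = 0.95, the update above evaluates to
#     p2 = 1.95 * p1 - 0.95 * p0,
# i.e., it extrapolates from the previous iterate p0 through the new iterate
# p1, the usual Nesterov-style momentum step; the stored variable p tracks
# the latest iterate across steps.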


class PSGD(Optimizer):
    """Primal stochastic gradient descent optimizer.

    Arguments:
        pred_t: tensor. Prediction result.
        index_t: tensor. Mini-batch indices for primal updates.
        eta: float >= 0. Step size.
        eigenpro_f: map from the gradient of the original kernel to that
            of the EigenPro kernel.
        nesterov_r: Nesterov parameter.
    """

    def __init__(self, pred_t, index_t, eta=0.01,
                 eigenpro_f=None, nesterov_r=None, **kwargs):
        super(PSGD, self).__init__(**kwargs)
        self.eta = K.variable(eta, name='eta')
        self.pred_t = pred_t
        self.index_t = index_t
        self.eigenpro_f = eigenpro_f
        self.nesterov_r = nesterov_r

    def get_updates(self, loss, params):
        self.updates = []
        # Gradient w.r.t. the predictions (the primal/functional gradient),
        # not w.r.t. the parameters.
        grads = self.get_gradients(loss, [self.pred_t])

        eta = self.eta
        index = self.index_t
        eigenpro_f = self.eigenpro_f

        for p, g in zip(params, grads):
            update_p = K.gather(p, index) - eta * g
            new_p = scatter_update(p, index, update_p)

            if eigenpro_f:
                new_p = eigenpro_f(new_p, g, eta)

            if self.nesterov_r is not None:
                new_p, updates = nesterov(p, new_p, rmax=self.nesterov_r)
                self.updates += updates

            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'eta': float(K.get_value(self.eta))}
        base_config = super(PSGD, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
--------------------------------------------------------------------------------
/run_mnist.py:
--------------------------------------------------------------------------------
'''Train kernel methods on the MNIST dataset.
Requires tensorflow (>=1.2.1) and a GPU device.
Run command:
    CUDA_VISIBLE_DEVICES=0 python run_mnist.py --kernel=Gaussian --s=5 --mem_gb=12 --epochs 1 2 3 4 5
'''
from __future__ import print_function

import argparse
import keras
import numpy as np
import warnings

from distutils.version import StrictVersion

import kernels
import mnist
import utils
import wrapper

from eigenpro import EigenPro
from backend_extra import hasGPU


assert StrictVersion(keras.__version__) >= StrictVersion('2.0.8'), \
    "Requires Keras (>=2.0.8)."

if StrictVersion(keras.__version__) > StrictVersion('2.0.8'):
    warnings.warn('\n\nThis code has been tested with Keras 2.0.8. '
                  'If the\ncurrent version (%s) fails, '
                  'switch to 2.0.8 by command,\n\n'
                  '\tpip install Keras==2.0.8\n\n' % (keras.__version__), Warning)

assert keras.backend.backend() == u'tensorflow', \
    "Requires Tensorflow (>=1.2.1)."
assert hasGPU(), "Requires GPU."


parser = argparse.ArgumentParser(description='Run tests.')
parser.add_argument('--kernel', type=str, default='Gaussian',
                    help='kernel function (e.g., Gaussian, Laplacian, or Cauchy)')
parser.add_argument('-s', '--s', type=np.float32, help="kernel bandwidth", required=True)
parser.add_argument('-mem_gb', '--mem_gb', type=np.float32, help="GPU memory in GB", required=True)
parser.add_argument('-epochs', '--epochs', nargs='+', type=int,
                    help="epochs to calculate errors, e.g., --epochs 1 2 3 4 5", required=True)

parser.add_argument('-q', '--q', type=np.int32, default=None,
                    help="use the top-q eigensystem for the EigenPro iteration/kernel")
parser.add_argument('-bs', '--bs', type=np.int32, default=None,
                    help="size of mini-batch")
parser.add_argument('-n_subsample', '--n_subsample', type=np.int32, default=None,
                    help="subsample size")

args = parser.parse_args()
args_dict = vars(args)
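
# Optional: utils.enable_xla() would turn on TensorFlow's XLA JIT in the
# Keras session before the model is built, which may speed up the kernel
# evaluations on some GPUs (untested assumption; see utils.py).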

# Load dataset.
n_class = 10  # number of classes
(x_train, y_train), (x_test, y_test) = mnist.load()
y_train = keras.utils.to_categorical(y_train, n_class)
y_test = keras.utils.to_categorical(y_test, n_class)
x_train, y_train, x_test, y_test = x_train.astype('float32'), \
    y_train.astype('float32'), x_test.astype('float32'), y_test.astype('float32')

# Choose the kernel function.
s = args_dict['s']  # kernel bandwidth
if args_dict['kernel'] == 'Gaussian':
    kernel = wrapper.set_f_args(kernels.Gaussian, s=s)

elif args_dict['kernel'] == 'Laplacian':
    kernel = wrapper.set_f_args(kernels.Laplacian, s=s)

elif args_dict['kernel'] == 'Cauchy':
    kernel = wrapper.set_f_args(kernels.Cauchy, s=s)

else:
    raise Exception("Unknown kernel function - %s. "
                    "Try Gaussian, Laplacian, or Cauchy."
                    % args_dict['kernel'])

# Initialize and train the model.
model = EigenPro(kernel, x_train, n_class,
                 mem_gb=args_dict['mem_gb'],
                 n_subsample=args_dict['n_subsample'],
                 q=args_dict['q'],
                 bs=args_dict['bs'])
model.fit(x_train, y_train,
          x_val=x_test, y_val=y_test,
          epochs=args_dict['epochs'])

utils.reset()
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
from __future__ import absolute_import

import gc
import numpy as np
import tensorflow as tf

from keras import backend as K
from keras.layers import Lambda


def enable_xla():
    """Enable XLA optimization in the default session of Keras."""
    config = tf.ConfigProto()
    config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
    K.set_session(tf.Session(config=config))


def loadvar(array, trainable=False, name=None):
    """Load a numpy array into a tensorflow variable, avoiding the 2GB
    limit on graph constants.

    Arguments:
        array: numpy array.
        trainable: boolean.
        name: variable name.

    Returns:
        var: tensorflow variable.
        load_var: function that loads the array in a given session.
    """
    placeholder = tf.placeholder(dtype=array.dtype, shape=array.shape)
    var = tf.Variable(placeholder, trainable=trainable,
                      collections=[], name=name)

    load_var = lambda sess: sess.run(var.initializer, feed_dict={placeholder: array})
    return var, load_var


def loadvar_in_sess(array, trainable=False, sess=None, name=None):
    var, load_var = loadvar(array, trainable, name)
    if sess is None:
        sess = K.get_session()  # Keras default session
    load_var(sess)
    return var


def add_index(X):
    """Append the sample index as the last feature to the data matrix.

    Arguments:
        X: matrix of shape (n_sample, n_feat).

    Returns:
        matrix of shape (n_sample, n_feat+1).
    """
    inx = np.reshape(np.arange(X.shape[0]), (-1, 1))
    return np.hstack([X, inx])
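

# Example: add_index(np.zeros((3, 2))) appends the row index as a final
# column:
#     array([[0., 0., 0.],
#            [0., 0., 1.],
#            [0., 0., 2.]])
# separate_index below recovers the features and the int32 index tensor.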


def separate_index(IX):
    """Separate the index feature from the indexed tensor matrix.

    Arguments:
        IX: matrix of shape (n_sample, n_feat+1).

    Returns:
        X: matrix of shape (n_sample, n_feat).
        index: vector of shape (n_sample,).
    """
    X = Lambda(lambda x: x[:, :-1])(IX)
    index = Lambda(lambda x: x[:, -1])(IX)
    return X, K.cast(index, dtype='int32')


def reset():
    """Reset the Keras session and release the GPU memory."""
    K.clear_session()
    reload(K)  # Python 2 builtin; on Python 3 use importlib.reload
    gc.collect()
--------------------------------------------------------------------------------
/wrapper.py:
--------------------------------------------------------------------------------
from __future__ import absolute_import

import inspect


def set_f_args(f, **default_args):
    """Set the default argument values of f and return
    a corresponding function.
    """
    def _set_f_default_args(f, args, default_args, kwargs):
        if inspect.isclass(f):
            # remove the self
            num_args = f.__init__.__code__.co_argcount - 1
            arg_names = f.__init__.__code__.co_varnames[1:num_args + 1]
        else:
            num_args = f.__code__.co_argcount
            arg_names = f.__code__.co_varnames[:num_args]

        cursor = 0
        merged_args = []
        for arg in arg_names:
            if arg == 'self':
                continue
            if arg in default_args:
                merged_args.append(default_args[arg])
            elif arg in kwargs:
                merged_args.append(kwargs[arg])
            else:
                merged_args.append(args[cursor])
                cursor += 1
        merged_args += list(args[cursor:])
        return f(*merged_args)

    # Since types.LambdaType is types.FunctionType,
    # we have to distinguish lambdas in a hacky way.
    if f.__name__ == '<lambda>':
        g = lambda *args, **kwargs: \
            f(*args, **dict(list(default_args.items())
                            + list(kwargs.items())))
    else:
        g = lambda *args, **kwargs: \
            _set_f_default_args(f, args, default_args, kwargs)

    return g
--------------------------------------------------------------------------------