├── LICENSE
├── README.md
├── backend_extra.py
├── eigenpro.py
├── kernels.py
├── layers.py
├── mnist.py
├── optimizers.py
├── run_mnist.py
├── utils.py
└── wrapper.py

/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# EigenPro2

## Introduction
EigenPro2 provides fast, scalable, and accurate training for kernel machines.
A detailed description of the method can be found in the paper
["Learning kernels that adapt to GPU"](https://arxiv.org/abs/1806.06144).

## Requirements: Tensorflow (>=1.2.1) and Keras (==2.0.8)
```
pip install tensorflow tensorflow-gpu keras
```
Follow the [Tensorflow installation guide](https://www.tensorflow.org/install/install_linux) for Virtualenv setup.


## Case 1: Quick Shell Script
For a quick test on the MNIST dataset, execute the following command in a bash shell:
```
CUDA_VISIBLE_DEVICES=0 python run_mnist.py --kernel=Gaussian --s=5 --mem_gb=12 --epochs 1 2 3 4 5
```

The arguments specify that we use a Gaussian kernel with bandwidth 5 on a GPU with 12 GB memory.
The train and test (here, val) errors are evaluated after each of the first five epochs:
```
SVD time: 2.82, adjusted k: 277, s1: 0.15, new s1: 6.66e-04
n_subsample=2000, mG=2000, eta=751.35, bs=1432, s1=1.53e-01, delta=0.05
train error: 0.30%  val error: 1.53% (1 epochs, 1.71 seconds)  train l2: 4.99e-03  val l2: 7.88e-03
train error: 0.04%  val error: 1.43% (2 epochs, 3.25 seconds)  train l2: 2.79e-03  val l2: 6.60e-03
train error: 0.02%  val error: 1.30% (3 epochs, 4.70 seconds)  train l2: 1.78e-03  val l2: 6.04e-03
train error: 0.00%  val error: 1.23% (4 epochs, 6.24 seconds)  train l2: 1.21e-03  val l2: 5.66e-03
train error: 0.00%  val error: 1.28% (5 epochs, 7.68 seconds)  train l2: 9.40e-04  val l2: 5.60e-03
```
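The script also accepts the other kernels in `kernels.py` and the optional flags `--q`, `--bs`, and `--n_subsample` (see `run_mnist.py`). For instance, a hypothetical run with the Laplacian kernel (the bandwidth here is illustrative, not tuned):
```
CUDA_VISIBLE_DEVICES=0 python run_mnist.py --kernel=Laplacian --s=10 --mem_gb=12 --epochs 1 5
```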

## Case 2: Interactive Python Console
When using a Python console, we can start by loading the dataset.
In this example, we load the MNIST dataset and transform its multiclass (10-class)
labels into 10 binary labels (one-hot encoding).
```
import keras, mnist
n_class = 10  # number of classes
(x_train, y_train), (x_test, y_test) = mnist.load()
y_train = keras.utils.to_categorical(y_train, n_class)
y_test = keras.utils.to_categorical(y_test, n_class)
x_train, y_train, x_test, y_test = x_train.astype('float32'), \
    y_train.astype('float32'), x_test.astype('float32'), y_test.astype('float32')
```
Then specify the kernel function (a Gaussian kernel with bandwidth 5):
```
import kernels, wrapper
kernel = wrapper.set_f_args(kernels.Gaussian, s=5)
```

Next, initialize a kernel machine that trains with the EigenPro iteration built on the given Gaussian kernel:
```
from eigenpro import EigenPro
model = EigenPro(kernel, x_train, n_class, mem_gb=12)
```
To train the model, call the fit method:
```
res = model.fit(x_train=x_train, y_train=y_train, x_val=x_test, y_val=y_test, epochs=[1, 2, 5, 10])
```
Finally, to make predictions on new inputs, use the predict method:
```
scores = model.predict(x_test)
```
To calculate the accuracy, map the binary labels back to a multiclass label:
```
import numpy as np
np.mean(np.argmax(scores, axis=1) == np.argmax(y_test, axis=1))
```

--------------------------------------------------------------------------------
/backend_extra.py:
--------------------------------------------------------------------------------
import tensorflow as tf
from tensorflow.python.client import device_lib


def scatter_update(ref, indices, updates):
    """Update the value of `ref` at `indices` to `updates`."""
    return tf.scatter_update(ref, indices, updates)


def hasGPU():
    devs = device_lib.list_local_devices()
    return any(dev.device_type == u'GPU' for dev in devs)
--------------------------------------------------------------------------------
/eigenpro.py:
--------------------------------------------------------------------------------
import numpy as np
import scipy.linalg
import tensorflow as tf
import time

from keras import backend as K
from keras.layers import Dense, Input
from keras.models import Model

import utils
from backend_extra import scatter_update
from layers import KernelEmbedding
from optimizers import PSGD


def pre_eigenpro_f(feat, phi, q, n, mG, alpha, min_q=5, seed=1):
    """Prepare the gradient map f for EigenPro and calculate
    the scale factor for the step size such that the update rule,
        p <- p - eta * g
    becomes,
        p <- p - scale * eta * (g - f(g))

    Arguments:
        feat: feature matrix.
        phi: feature map or kernel function.
        q: top-q eigensystem for constructing the EigenPro iteration/kernel.
        n: number of training points.
        mG: maximum batch size corresponding to GPU memory.
        alpha: exponential factor (<= 1) for the eigenvalue ratio.
        min_q: minimum q when q is calculated automatically (q is None).
        seed: random seed.

    Returns:
        f: tensor function.
        scale: factor that rescales the step size.
        s1: largest eigenvalue.
        beta: largest k(x, x) for the EigenPro kernel.
    """
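    # Sketch of the math implemented below: with top eigenpairs (s_i, v_i)
    # of the subsampled, normalized kernel matrix, the correction map is
    #     f(g) = V diag((1 - (s_q / s_i)^alpha) / s_i) V^T g',
    # where g' is the gradient projected onto the kernel features. It damps
    # the top-q eigendirections, so the step size can safely be enlarged by
    # scale = (s_1 / s_q)^alpha to match the flattened spectrum.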
    np.random.seed(seed)  # set random seed for subsamples
    start = time.time()
    n_sample, d = feat.shape

    if q is None:
        svd_q = min(n_sample - 1, 1000)
    else:
        svd_q = q

    _s, _V = nystrom_kernel_svd(feat, phi, svd_q)

    # Choose q such that the batch size is bounded by
    # the subsample size and the memory size.
    # Keep the original q if it is pre-specified.
    qmG = np.sum(np.power(1 / _s, alpha) < min(n_sample / 5, mG)) - 1
    if q is None:
        max_m = min(max(n_sample / 5, mG), n_sample)
        q = np.sum(np.power(1 / _s, alpha) < max_m) - 1
        q = max(q, min_q)

    _s, _sq, _V = _s[:q-1], _s[q-1], _V[:, :q-1]

    s = K.constant(_s)
    V = utils.loadvar_in_sess(_V.astype('float32'))
    sq = K.constant(_sq)

    scale = np.power(_s[0] / _sq, alpha, dtype='float32')
    D = (1 - K.pow(sq / s, np.float32(alpha))) / s

    pre_f = lambda g, kfeat: K.dot(
        V * D, K.transpose(K.dot(K.dot(K.transpose(g), kfeat), V)))
    s1 = _s[0]
    print("SVD time: %.2f, q: %d, adjusted q: %d, s1: %.2f, new s1: %.2e" %
          (time.time() - start, qmG, q, _s[0], s1 / scale))

    kxx = 1 - np.sum(_V ** 2, axis=1) * n_sample  # assumes k(x, x) = 1
    beta = np.max(kxx)

    return pre_f, scale, s1, beta


def asm_eigenpro_f(pre_f, kfeat, inx):
    """Assemble the map for the EigenPro iteration."""
    def eigenpro_f(p, g, eta):
        inx_t = K.constant(inx, dtype='int32')

        kinx = tf.gather(kfeat, inx_t, axis=1)
        pinx = K.gather(p, inx_t)
        update_p = pinx + eta * pre_f(g, kinx)
        new_p = scatter_update(p, inx, update_p)
        return new_p
    return eigenpro_f


def nystrom_kernel_svd(X, kernel_f, q, bs=512):
    """Compute the top eigensystem of a kernel matrix using the Nystrom method.

    Arguments:
        X: data matrix of shape (n_sample, n_feature).
        kernel_f: kernel tensor function k(X, Y).
        q: top-q eigensystem.
        bs: batch size.

    Returns:
        s: top eigenvalues of shape (q,).
        U: (rescaled) top eigenvectors of shape (n_sample, q).
    """

    m, d = X.shape

    # Assemble the kernel function evaluator.
    input_shape = (d,)
    x = Input(shape=input_shape, dtype='float32',
              name='feat-for-nystrom')
    K_t = KernelEmbedding(kernel_f, X)(x)
    kernel_tf = Model(x, K_t)

    K_mat = kernel_tf.predict(X, batch_size=bs)  # avoid shadowing the backend K
    W = K_mat / m
    w, V = scipy.linalg.eigh(W, eigvals=(m - q, m - 1))
    U1r, s = V[:, ::-1], w[::-1][:q]
    NU = np.float32(U1r[:, :q] / np.sqrt(m))

    return s, NU
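

# Example (hypothetical shapes): for a subsample X of shape (2000, 784) and
# the wrapped Gaussian kernel from the README, nystrom_kernel_svd(X, kernel, 160)
# returns eigenvalues s of shape (160,) and rescaled eigenvectors of shape
# (2000, 160); pre_eigenpro_f then keeps the top q - 1 of them.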


class EigenPro(object):

    def __init__(self, kernel, centers, n_label, mem_gb,
                 n_subsample=None, q=None, bs=None,
                 metric='accuracy', scale=.5, seed=1):
        """Assemble a learner using the EigenPro iteration/kernel.

        Arguments:
            kernel: kernel tensor function k(X, Y).
            centers: kernel centers of shape (n_center, n_feature).
            n_label: number of labels.
            mem_gb: GPU memory in GB.
            n_subsample: number of subsamples for the preconditioner.
            q: top-q eigensystem for the preconditioner.
            bs: mini-batch size.
            metric: keras metric, e.g., 'accuracy'.
            scale: step size factor (0.5 for mse loss).
            seed: random seed.
        """

        n, d = centers.shape
        if n_subsample is None:
            if n < 100000:
                n_subsample = min(2000, n)
            else:
                n_subsample = 12000

        mem_bytes = (mem_gb - 0.6) * 1024**3  # reserve 600MB
        # The factor 3 is due to the tensorflow implementation.
        bsizes = np.arange(n_subsample)
        mem_usages = ((d + 2 * n_label + 3 * bsizes) * n + n_subsample * 1000) * 4
        mG = np.sum(mem_usages < mem_bytes)  # device-dependent batch size

        # Calculate batch/step size for the improved EigenPro iteration.
        np.random.seed(seed)
        pinx = np.random.choice(n, n_subsample, replace=False).astype('int32')
        kf, gap, s1, beta = pre_eigenpro_f(
            centers[pinx], kernel, q, n, mG, alpha=.95, seed=seed)
        new_s1 = s1 / gap  # gap is the step-size rescaling factor

        if bs is None:
            bs = min(np.int32(beta / new_s1 + 1), mG)

        if bs < beta / new_s1 + 1:
            eta = bs / beta
        elif bs < n:
            eta = 2 * bs / (beta + (bs - 1) * new_s1)
        else:
            eta = 0.95 * 2 / new_s1
        eta = scale * eta

        print("n_subsample=%d, mG=%d, eta=%.2f, bs=%d, s1=%.2e, beta=%.2f" %
              (n_subsample, mG, eta, bs, s1, beta))
        eta = np.float32(eta * n_label)  # compensate for mse averaging over n_label outputs

        # Assemble the kernel model.
        ix = Input(shape=(d + 1,), dtype='float32', name='indexed-feat')
        x, index = utils.separate_index(ix)  # features, sample_id
        kfeat = KernelEmbedding(kernel, centers,
                                input_shape=(d,))(x)

        y = Dense(n_label, input_shape=(n,),
                  activation='linear',
                  kernel_initializer='zeros',
                  use_bias=False)(kfeat)
        model = Model(ix, y)
        model.compile(
            loss='mse',
            optimizer=PSGD(pred_t=y, index_t=index, eta=eta,
                           eigenpro_f=asm_eigenpro_f(kf, kfeat, pinx)),
            metrics=[metric])

        self.n_label = n_label
        self.seed = seed
        self.bs = bs
        self.model = model

    def fit(self, x_train, y_train, x_val, y_val, epochs,
            n_sample=10000, seed=1):
        """Train the model.

        Arguments:
            x_train: feature matrix of shape (n_train, n_feature).
            y_train: label matrix of shape (n_train, n_label).
            x_val: feature matrix for validation.
            y_val: label matrix for validation.
            epochs: list of epochs at which the error is calculated.
            n_sample: number of subsamples used to estimate the train error.
            seed: random seed.

        Returns:
            res: dictionary with key: epoch,
                value: (train_error, test_error, train_time).
        """
        assert self.n_label == y_train.shape[1]
        np.random.seed(seed)

        x_train = utils.add_index(x_train)
        x_val = utils.add_index(x_val)

        bs = self.bs
        res = dict()

        initial_epoch = 0
        train_sec = 0  # training time in seconds
        n, _ = x_train.shape
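
        # Note that `epochs` is a list of checkpoints, not a count: for
        # epochs=[1, 2, 5], the loop below trains for 1 epoch, then 1 more,
        # then 3 more, evaluating train/val error at each checkpoint.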
        # Subsample training data for fast estimation of the training loss.
        inx = np.random.choice(n, min(n, n_sample), replace=False)
        x_sample, y_sample = x_train[inx], y_train[inx]

        for epoch in epochs:
            start = time.time()
            for _ in range(epoch - initial_epoch):
                epoch_ids = np.random.choice(n, n // bs * bs, replace=False)
                for batch_ids in np.array_split(epoch_ids, n // bs):
                    x_batch, y_batch = x_train[batch_ids], y_train[batch_ids]
                    self.model.train_on_batch(x_batch, y_batch)

            train_sec += time.time() - start
            tr_score = self.model.evaluate(x_sample, y_sample, batch_size=bs, verbose=0)
            tv_score = self.model.evaluate(x_val, y_val, batch_size=bs, verbose=0)
            print("train error: %.2f%%\tval error: %.2f%% (%d epochs, %.2f seconds)\t"
                  "train l2: %.2e\tval l2: %.2e" %
                  ((1 - tr_score[1]) * 100, (1 - tv_score[1]) * 100, epoch, train_sec,
                   tr_score[0], tv_score[0]))
            res[epoch] = (tr_score, tv_score, train_sec)
            initial_epoch = epoch

        return res

    def predict(self, x_feat):
        """Predict regression scores.

        Argument:
            x_feat: feature matrix of shape (?, n_feature).

        Returns:
            score matrix of shape (?, n_label).
        """
        return self.model.predict(utils.add_index(x_feat), batch_size=self.bs)
--------------------------------------------------------------------------------
/kernels.py:
--------------------------------------------------------------------------------
import numpy as np

from keras import backend as K


def D2(X, Y, Y2=None, YT=None):
    """Calculate the pairwise (squared) distance.

    Arguments:
        X: of shape (n_sample, n_feature).
        Y: of shape (n_center, n_feature).
        Y2: of shape (1, n_center).
        YT: of shape (n_feature, n_center).

    Returns:
        pairwise squared distances of shape (n_sample, n_center).
    """
    X2 = K.sum(K.square(X), axis=1, keepdims=True)
    if Y2 is None:
        if X is Y:
            Y2 = X2
        else:
            Y2 = K.sum(K.square(Y), axis=1, keepdims=True)
        Y2 = K.reshape(Y2, (1, K.shape(Y)[0]))
    if YT is None:
        YT = K.transpose(Y)
    d2 = K.reshape(X2, (K.shape(X)[0], 1)) \
        + Y2 - 2 * K.dot(X, YT)  # x2 + y2 - 2xy
    return d2


def Gaussian(X, Y, s, dist2_f=D2):
    """Gaussian kernel.

    Arguments:
        X: of shape (n_sample, n_feature).
        Y: of shape (n_center, n_feature).
        s: kernel bandwidth.

    Returns:
        kernel matrix of shape (n_sample, n_center).
    """
    assert s > 0

    d2 = dist2_f(X, Y)
    gamma = np.float32(1. / (2 * s ** 2))
    G = K.exp(-gamma * K.clip(d2, 0, None))
    return G


def Laplacian(X, Y, s, dist2_f=D2):
    """Laplacian kernel.

    Arguments:
        X: of shape (n_sample, n_feature).
        Y: of shape (n_center, n_feature).
        s: kernel bandwidth.

    Returns:
        kernel matrix of shape (n_sample, n_center).
    """
    assert s > 0

    d2 = K.clip(dist2_f(X, Y), 0, None)
    d = K.sqrt(d2)
    G = K.exp(-d / s)
    return G
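

# Usage sketch: each kernel is a tensor function over Keras/TF tensors, e.g.
#     G = Gaussian(K.constant(X), K.constant(Xc), s=5.0)
# yields an (n_sample, n_center) Gram tensor; `wrapper.set_f_args` is used
# elsewhere in this repo to bind the bandwidth s in advance.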


def Cauchy(X, Y, s, dist2_f=D2):
    """Cauchy kernel.

    Arguments:
        X: of shape (n_sample, n_feature).
        Y: of shape (n_center, n_feature).
        s: kernel bandwidth.

    Returns:
        kernel matrix of shape (n_sample, n_center).
    """
    assert s > 0

    d2 = dist2_f(X, Y)
    s2 = np.float32(s ** 2)
    G = 1 / (1 + K.clip(d2, 0, None) / s2)  # 1 / (1 + d^2 / s^2)
    return G


def Dispersal(X, Y, s, gamma, dist2_f=D2):
    """Dispersal kernel.

    Arguments:
        X: of shape (n_sample, n_feature).
        Y: of shape (n_center, n_feature).
        s: kernel bandwidth.
        gamma: dispersal factor.

    Returns:
        kernel matrix of shape (n_sample, n_center).
    """
    assert s > 0

    d2 = K.clip(dist2_f(X, Y), 0, None)
    d = K.pow(d2, gamma / 2.)
    G = K.exp(-d / np.float32(s))
    return G
--------------------------------------------------------------------------------
/layers.py:
--------------------------------------------------------------------------------
from keras import backend as K
from keras.engine.topology import Layer
import numpy as np

from kernels import D2
import utils


class KernelEmbedding(Layer):
    """Generate kernel features.

    Arguments:
        kernel_f: kernel function k(x, y).
        centers: matrix of shape (n_center, n_feature).
    """

    def __init__(self, kernel_f, centers, **kwargs):
        self.kernel_f = kernel_f
        self._centers = centers
        self.n_center = centers.shape[0]
        super(KernelEmbedding, self).__init__(**kwargs)

    def build(self, input_shape):
        # Centers are stored transposed, of shape (n_feature, n_center).
        self.centers = utils.loadvar_in_sess(self._centers.T.astype('float32'), name='kernel.centers')
        center2 = K.eval(K.sum(K.square(self.centers), axis=0, keepdims=True)).astype('float32')
        center2_t = utils.loadvar_in_sess(np.reshape(center2, (1, -1)), name='kernel.centers.norm')
        self.dist2_f = lambda x, y: D2(x, None, Y2=center2_t, YT=self.centers)  # pre-computed norms for centers

        super(KernelEmbedding, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, x):
        embed = self.kernel_f(x, None, self.dist2_f)
        return embed

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.n_center)


def rff(X, W):
    """Calculate random Fourier features according to the paper,
    'Random Features for Large-Scale Kernel Machines'.

    Arguments:
        X: data matrix of shape (n, D).
        W: weight matrix of shape (D, d).

    Returns:
        feature matrix of shape (n, 2 * d).
    """

    d = K.get_variable_shape(W)[1]
    dot = K.dot(X, W)  # of shape (n, d)
    RF = K.concatenate([K.cos(dot), K.sin(dot)], axis=1) / np.sqrt(d, dtype='float32')
    return RF
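

# For a Gaussian kernel with bandwidth s, the classic construction samples W
# with i.i.d. N(0, 1 / s^2) entries, so that rff(X, W) rff(Y, W)^T
# approximates exp(-|x - y|^2 / (2 s^2)). A minimal sketch (hypothetical
# shapes, numpy weights loaded as constants):
#     W = (np.random.randn(784, 512) / s).astype('float32')
#     Z = rff(K.constant(X), K.constant(W))  # of shape (n, 1024)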


class RFF(Layer):
    """Generate random Fourier features.

    Arguments:
        weights: of shape (D, d).
    """

    def __init__(self, weights, **kwargs):
        self._weights = weights
        self.d = weights.shape[1]
        super(RFF, self).__init__(**kwargs)

    def build(self, input_shape):
        self.W = self.add_weight(name='rff-weight',
                                 shape=self._weights.shape,
                                 initializer=(lambda shape: self._weights),
                                 trainable=False)
        super(RFF, self).build(input_shape)  # Be sure to call this somewhere!

    def call(self, x):
        embed = rff(x, self.W)
        return embed

    def compute_output_shape(self, input_shape):
        return (input_shape[0], 2 * self.d)
--------------------------------------------------------------------------------
/mnist.py:
--------------------------------------------------------------------------------
import numpy as np

from keras.datasets.mnist import load_data


def unit_range_normalize(X):
    """Scale each feature to [0, 1]; constant features map to 0."""
    min_ = np.min(X, axis=0)
    max_ = np.max(X, axis=0)
    diff_ = max_ - min_
    diff_[diff_ <= 0.0] = np.maximum(1.0, min_[diff_ <= 0.0])  # avoid division by zero
    SX = (X - min_) / diff_
    return SX


def load():
    # input image dimensions
    img_rows, img_cols = 28, 28

    # the data, shuffled and split between train and test sets
    (x_train, y_train), (x_test, y_test) = load_data()

    x_train = x_train.reshape(x_train.shape[0], img_rows * img_cols)
    x_test = x_test.reshape(x_test.shape[0], img_rows * img_cols)

    x_train = x_train.astype('float32') / 255
    x_test = x_test.astype('float32') / 255

    x_train = unit_range_normalize(x_train)
    x_test = unit_range_normalize(x_test)
    print("Load MNIST dataset.")
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

    return (x_train, y_train), (x_test, y_test)
--------------------------------------------------------------------------------
/optimizers.py:
--------------------------------------------------------------------------------
import numpy as np
from keras import backend as K
from keras.optimizers import Optimizer

from backend_extra import scatter_update


def nesterov(p0, p1, rmax=.95):
    """Nesterov method.

    Arguments:
        p0: weight parameter tensor variable.
        p1: updated weight parameter tensor.
        rmax: maximum momentum term weight in [0, 1].

    Returns:
        p2: parameter tensor adjusted by the Nesterov method.
        updates: a list of tensor updates.
    """

    p = K.variable(p0, name='nesterov.orig.p', dtype='float32')
    r = K.constant(-rmax, dtype='float32')

    p2 = (1 - r) * p1 + r * p
    updates = [K.update(p, p1)]
    return p2, updates
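

# With the default rmax = 0.95, the update above evaluates to
#     p2 = 1.95 * p1 - 0.95 * p0,
# i.e., it extrapolates from the previous iterate p0 through the new iterate
# p1, the usual Nesterov-style momentum step; the stored variable p tracks
# the latest iterate across steps.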


class PSGD(Optimizer):
    """Primal stochastic gradient descent optimizer.

    Arguments:
        pred_t: tensor. Prediction result.
        index_t: tensor. Mini-batch indices for primal updates.
        eta: float >= 0. Step size.
        eigenpro_f: map from the gradient of the original kernel to that
            of the EigenPro kernel.
        nesterov_r: Nesterov parameter.
    """

    def __init__(self, pred_t, index_t, eta=0.01,
                 eigenpro_f=None, nesterov_r=None, **kwargs):
        super(PSGD, self).__init__(**kwargs)
        self.eta = K.variable(eta, name='eta')
        self.pred_t = pred_t
        self.index_t = index_t
        self.eigenpro_f = eigenpro_f
        self.nesterov_r = nesterov_r

    def get_updates(self, loss, params):
        self.updates = []
        # Gradient w.r.t. the predictions (the primal/functional gradient),
        # not w.r.t. the parameters.
        grads = self.get_gradients(loss, [self.pred_t])

        eta = self.eta
        index = self.index_t
        eigenpro_f = self.eigenpro_f

        for p, g in zip(params, grads):
            update_p = K.gather(p, index) - eta * g
            new_p = scatter_update(p, index, update_p)

            if eigenpro_f:
                new_p = eigenpro_f(new_p, g, eta)

            if self.nesterov_r is not None:
                new_p, updates = nesterov(p, new_p, rmax=self.nesterov_r)
                self.updates += updates

            self.updates.append(K.update(p, new_p))
        return self.updates

    def get_config(self):
        config = {'eta': float(K.get_value(self.eta))}
        base_config = super(PSGD, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
--------------------------------------------------------------------------------
/run_mnist.py:
--------------------------------------------------------------------------------
'''Train kernel methods on the MNIST dataset.
Requires tensorflow (>=1.2.1) and a GPU device.
Run command:
    CUDA_VISIBLE_DEVICES=0 python run_mnist.py --kernel=Gaussian --s=5 --mem_gb=12 --epochs 1 2 3 4 5
'''
from __future__ import print_function

import argparse
import keras
import numpy as np
import warnings

from distutils.version import StrictVersion

import kernels
import mnist
import utils
import wrapper

from eigenpro import EigenPro
from backend_extra import hasGPU


assert StrictVersion(keras.__version__) >= StrictVersion('2.0.8'), \
    "Requires Keras (>=2.0.8)."

if StrictVersion(keras.__version__) > StrictVersion('2.0.8'):
    warnings.warn('\n\nThis code has been tested with Keras 2.0.8. '
                  'If the\ncurrent version (%s) fails, '
                  'switch to 2.0.8 by command,\n\n'
                  '\tpip install Keras==2.0.8\n\n' % (keras.__version__), Warning)

assert keras.backend.backend() == u'tensorflow', \
    "Requires Tensorflow (>=1.2.1)."
assert hasGPU(), "Requires GPU."


parser = argparse.ArgumentParser(description='Run tests.')
parser.add_argument('--kernel', type=str, default='Gaussian',
                    help='kernel function (e.g., Gaussian, Laplacian, or Cauchy)')
parser.add_argument('-s', '--s', type=np.float32, help="kernel bandwidth", required=True)
parser.add_argument('-mem_gb', '--mem_gb', type=np.float32, help="GPU memory in GB", required=True)
parser.add_argument('-epochs', '--epochs', nargs='+', type=int,
                    help="epochs to calculate errors, e.g., --epochs 1 2 3 4 5", required=True)

parser.add_argument('-q', '--q', type=np.int32, default=None,
                    help="use the top-q eigensystem for the EigenPro iteration/kernel")
parser.add_argument('-bs', '--bs', type=np.int32, default=None,
                    help="size of mini-batch")
parser.add_argument('-n_subsample', '--n_subsample', type=np.int32, default=None,
                    help="subsample size")

args = parser.parse_args()
args_dict = vars(args)
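
# Optional: utils.enable_xla() would turn on TensorFlow's XLA JIT in the
# Keras session before the model is built, which may speed up the kernel
# evaluations on some GPUs (untested assumption; see utils.py).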

# Load dataset.
n_class = 10  # number of classes
(x_train, y_train), (x_test, y_test) = mnist.load()
y_train = keras.utils.to_categorical(y_train, n_class)
y_test = keras.utils.to_categorical(y_test, n_class)
x_train, y_train, x_test, y_test = x_train.astype('float32'), \
    y_train.astype('float32'), x_test.astype('float32'), y_test.astype('float32')

# Choose the kernel function.
s = args_dict['s']  # kernel bandwidth
if args_dict['kernel'] == 'Gaussian':
    kernel = wrapper.set_f_args(kernels.Gaussian, s=s)

elif args_dict['kernel'] == 'Laplacian':
    kernel = wrapper.set_f_args(kernels.Laplacian, s=s)

elif args_dict['kernel'] == 'Cauchy':
    kernel = wrapper.set_f_args(kernels.Cauchy, s=s)

else:
    raise Exception("Unknown kernel function - %s. "
                    "Try Gaussian, Laplacian, or Cauchy."
                    % args_dict['kernel'])

# Initialize and train the model.
model = EigenPro(kernel, x_train, n_class,
                 mem_gb=args_dict['mem_gb'],
                 n_subsample=args_dict['n_subsample'],
                 q=args_dict['q'],
                 bs=args_dict['bs'])
model.fit(x_train, y_train,
          x_val=x_test, y_val=y_test,
          epochs=args_dict['epochs'])

utils.reset()
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
from __future__ import absolute_import

import gc
import numpy as np
import tensorflow as tf

from keras import backend as K
from keras.layers import Lambda


def enable_xla():
    """Enable XLA optimization in the default session of Keras."""
    config = tf.ConfigProto()
    config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
    K.set_session(tf.Session(config=config))


def loadvar(array, trainable=False, name=None):
    """Load a numpy array into a tensorflow variable, avoiding the 2GB
    limit on graph constants.

    Arguments:
        array: numpy array.
        trainable: boolean.
        name: variable name.

    Returns:
        var: tensorflow variable.
        load_var: function that loads the array in a given session.
    """
    placeholder = tf.placeholder(dtype=array.dtype, shape=array.shape)
    var = tf.Variable(placeholder, trainable=trainable,
                      collections=[], name=name)

    load_var = lambda sess: sess.run(var.initializer, feed_dict={placeholder: array})
    return var, load_var


def loadvar_in_sess(array, trainable=False, sess=None, name=None):
    var, load_var = loadvar(array, trainable, name)
    if sess is None:
        sess = K.get_session()  # Keras default session
    load_var(sess)
    return var


def add_index(X):
    """Append the sample index as the last feature to the data matrix.

    Arguments:
        X: matrix of shape (n_sample, n_feat).

    Returns:
        matrix of shape (n_sample, n_feat+1).
    """
    inx = np.reshape(np.arange(X.shape[0]), (-1, 1))
    return np.hstack([X, inx])
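

# Example: add_index(np.zeros((3, 2))) appends the row index as a final
# column:
#     array([[0., 0., 0.],
#            [0., 0., 1.],
#            [0., 0., 2.]])
# separate_index below recovers the features and the int32 index tensor.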


def separate_index(IX):
    """Separate the index feature from the indexed tensor matrix.

    Arguments:
        IX: matrix of shape (n_sample, n_feat+1).

    Returns:
        X: matrix of shape (n_sample, n_feat).
        index: vector of shape (n_sample,).
    """
    X = Lambda(lambda x: x[:, :-1])(IX)
    index = Lambda(lambda x: x[:, -1])(IX)
    return X, K.cast(index, dtype='int32')


def reset():
    """Reset the Keras session and release the GPU memory."""
    K.clear_session()
    reload(K)  # Python 2 builtin; on Python 3 use importlib.reload
    gc.collect()
--------------------------------------------------------------------------------
/wrapper.py:
--------------------------------------------------------------------------------
from __future__ import absolute_import

import inspect


def set_f_args(f, **default_args):
    """Set the default argument values of f and return
    a corresponding function.
    """
    def _set_f_default_args(f, args, default_args, kwargs):
        if inspect.isclass(f):
            # remove the self
            num_args = f.__init__.__code__.co_argcount - 1
            arg_names = f.__init__.__code__.co_varnames[1:num_args + 1]
        else:
            num_args = f.__code__.co_argcount
            arg_names = f.__code__.co_varnames[:num_args]

        cursor = 0
        merged_args = []
        for arg in arg_names:
            if arg == 'self':
                continue
            if arg in default_args:
                merged_args.append(default_args[arg])
            elif arg in kwargs:
                merged_args.append(kwargs[arg])
            else:
                merged_args.append(args[cursor])
                cursor += 1
        merged_args += list(args[cursor:])
        return f(*merged_args)

    # Since types.LambdaType is types.FunctionType,
    # we have to distinguish lambdas in a hacky way.
    if f.__name__ == '<lambda>':
        g = lambda *args, **kwargs: \
            f(*args, **dict(list(default_args.items())
                            + list(kwargs.items())))
    else:
        g = lambda *args, **kwargs: \
            _set_f_default_args(f, args, default_args, kwargs)

    return g
--------------------------------------------------------------------------------