├── .gitignore
├── MANIFEST.in
├── README.md
├── lmj
│   ├── __init__.py
│   └── rbm.py
├── setup.py
└── test
    ├── idx_reader.py
    └── mnist.py

/.gitignore:
--------------------------------------------------------------------------------
*~
#*#
*.so
*.py[oc]
*.egg-info
dist
build
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
include README.md
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# py-rbm

This is a small Python library that contains code for using and training
Restricted Boltzmann Machines (RBMs), the basic building blocks for many types
of deep belief networks. Variations available include the "standard" RBM (with
optional sparsity-based hidden layer learning); the temporal net introduced by
[Taylor, Hinton & Roweis][]; and convolutional nets with probabilistic
max-pooling described by [Lee, Grosse, Ranganath & Ng][].

Mostly I wrote the code to better understand the underlying algorithms. I don't
use it for anything at the moment, having moved on to using primarily [Theano][]
with [networks of rectified linear neurons](http://www.csri.utoronto.ca/~hinton/absps/reluICML.pdf)
(PDF). Still, there seems to be some interest in RBMs, so hopefully others will
find this package instructive, and maybe even useful!
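If you want to see the core idea before digging into `lmj/rbm.py`, here is a
rough, self-contained NumPy sketch of a single contrastive divergence (CD-1)
update for an RBM with binary visible units. It is an illustration under
simplifying assumptions (one sampled hidden state per example, no momentum, no
L2 decay, no sparsity target), not the code in this package; the `cd1_step`
function and its argument layout are made up for this example.

```python
import numpy as np


def sigmoid(x):
    '''Logistic sigmoid.'''
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(weights, vis_bias, hid_bias, batch, learning_rate=0.1, rng=None):
    '''Apply one CD-1 update in place and return the reconstruction error.

    Hypothetical helper for illustration only. Assumed shapes:
      weights:  (num_hidden, num_visible)
      vis_bias: (num_visible,)
      hid_bias: (num_hidden,)
      batch:    (batch_size, num_visible), values in [0, 1]
    '''
    if rng is None:
        rng = np.random.default_rng()
    # Up-pass: hidden probabilities given the data.
    h0 = sigmoid(batch @ weights.T + hid_bias)
    # Sample binary hidden states, then reconstruct the visible layer.
    h_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h_sample @ weights + vis_bias)
    # Up-pass again: hidden probabilities given the reconstruction.
    h1 = sigmoid(v1 @ weights.T + hid_bias)
    # Data correlations minus reconstruction correlations, batch-averaged.
    n = len(batch)
    weights += learning_rate * (h0.T @ batch - h1.T @ v1) / n
    vis_bias += learning_rate * (batch - v1).mean(axis=0)
    hid_bias += learning_rate * (h0 - h1).mean(axis=0)
    return float(np.mean((batch - v1) ** 2))
```

A training loop just calls `cd1_step` repeatedly on minibatches and watches the
returned error fall. Keep in mind that reconstruction error is only a rough
progress signal; it is not the quantity that contrastive divergence optimizes.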

[Taylor, Hinton & Roweis]: http://www.cs.nyu.edu/~gwtaylor/publications/nips2006mhmublv/
[Lee, Grosse, Ranganath & Ng]: http://cacm.acm.org/magazines/2011/10/131415-unsupervised-learning-of-hierarchical-representations-with-convolutional-deep-belief-networks/fulltext
[Theano]: http://deeplearning.net/software/theano/

## Installation

Just install using the included setup script:

    python setup.py install

Or you can install the package from the internets using pip:

    pip install lmj.rbm

## Testing

This library is definitely very alpha; so far I just have one main test that
encodes image data. To try things out, clone the source for this package and
install [glumpy][]:

    pip install glumpy

Then download the MNIST digits data from http://yann.lecun.com/exdb/mnist/ --
you'll need both the `train-images-idx3-ubyte.gz` and
`train-labels-idx1-ubyte.gz` files. Then run the test:

    python test/mnist.py \
        --images train-images-idx3-ubyte.gz \
        --labels train-labels-idx1-ubyte.gz

If you're feeling overconfident, go ahead and try out the gaussian visible
units:

    python test/mnist.py \
        --images train-images-idx3-ubyte.gz \
        --labels train-labels-idx1-ubyte.gz \
        --batch-size 257 \
        --l2 0.0001 \
        --learning-rate 0.2 \
        --momentum 0.5 \
        --sparsity 0.01 \
        --gaussian

The learning parameters can be a bit squirrely, but if things go right you
should see a number of images show up on your screen that represent the "basis
functions" that the network has learned when trying to auto-encode the MNIST
images you are feeding it.

You can also try running the test script with `--conv` to try out a
convolutional filterbank, but I'm not confident that the conv net test is
working correctly.
Anyway, if you're thinking of using conv nets for a project,
please have a look at [Theano][], or for a highly tuned GPU/C++ implementation,
[cuda-convnet](https://code.google.com/p/cuda-convnet/) by
[Alex Krizhevsky](http://www.cs.toronto.edu/~kriz/).

[glumpy]: http://code.google.com/p/glumpy/

## License

(The MIT License)

Copyright (c) 2011 Leif Johnson

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the 'Software'), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/lmj/__init__.py:
--------------------------------------------------------------------------------
__import__('pkg_resources').declare_namespace(__name__)
--------------------------------------------------------------------------------
/lmj/rbm.py:
--------------------------------------------------------------------------------
# Copyright (c) 2011 Leif Johnson
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

'''An implementation of several types of Restricted Boltzmann Machines.

This code is largely based on the Matlab code generously provided by Taylor,
Hinton and Roweis, and described in their 2006 NIPS paper, "Modeling Human
Motion Using Binary Latent Variables".
Their code and results are available online at
http://www.cs.nyu.edu/~gwtaylor/publications/nips2006mhmublv/.

There are more RBM implementations in this module than just the Taylor
"conditional RBM," though. The basic (non-temporal) RBM is based on the Taylor,
Hinton, and Roweis code, but stripped of the dynamic bias terms and refactored
into an object-oriented style. The convolutional RBM code is based on the 2009
ICML paper by Lee, Grosse, Ranganath and Ng, "Convolutional Deep Belief Networks
for Scalable Unsupervised Learning of Hierarchical Representations".

The mean-covariance RBM is based on XXX.

All implementations incorporate an option to train hidden unit biases using a
sparsity criterion, as described in the 2008 NIPS paper by Lee, Ekanadham and
Ng, "Sparse Deep Belief Net Model for Visual Area V2".

Most RBM implementations provide an option to treat visible units as either
binary or gaussian. Training networks with gaussian visible units is a tricky
dance of parameter-twiddling, but binary units seem quite stable in their
learning and convergence properties.

Finally, although I have tried to ensure that the code is correct, there are
probably many bugs, all of which are my own doing. I wrote this code to get a
better intuitive understanding for the RBM family of machine learning
algorithms, but I do not claim that the code is useful for a particular purpose
or produces state-of-the-art results. Mostly I hope that this code is readable
so that others can use it to better understand how this whole RBM thing works.
'''

import collections
import logging

import numpy as np
import numpy.random as rng


def sigmoid(eta):
    '''Return the logistic sigmoid function of the argument.'''
    return 1. / (1. + np.exp(-eta))


def identity(eta):
    '''Return the identity function of the argument.'''
    return eta


def bernoulli(p):
    '''Return an array of boolean samples from Bernoulli(p).

    Parameters
    ----------
    p : ndarray
        This array should contain values in [0, 1].

    Returns
    -------
    An array of the same shape as p. Each value in the result will be a boolean
    indicating whether a single Bernoulli trial succeeded for the corresponding
    element of `p`.
    '''
    return rng.rand(*p.shape) < p


class RBM(object):
    '''A Restricted Boltzmann Machine (RBM) is a probabilistic model of data.

    RBMs have two layers of variables (here called "units," in keeping with
    neural network terminology) -- a "visible" layer that models data in the
    world, and a "hidden" layer that is imagined to generate the data in the
    visible layer. RBMs are inspired by the (unrestricted) Boltzmann Machine, a
    model from statistical physics in which each unit is connected to all other
    units, and the states of unobserved variables can be inferred by using a
    sampling procedure.

    The full connectivity of an unrestricted Boltzmann Machine makes inference
    difficult, requiring sampling or some other approximate technique. RBMs
    restrict this more general model by requiring that the visible and hidden
    units form a fully connected, undirected, bipartite graph. In this way, each
    of the visible units is independent of the other visible units when
    conditioned on the state of the hidden layer, and each of the hidden units
    is independent of the others when conditioned on the state of the visible
    layer. This conditional independence makes inference tractable for the units
    in a single RBM.

    To "encode" a signal by determining the state of the hidden units given some
    visible data ("signal"),

    1. the signal is presented to the visible units, and
    2. the states of the hidden units are sampled from the conditional
       distribution given the visible data.

    To "decode" an encoding in the hidden units,

    3. the states of the visible units are sampled from the conditional
       distribution given the states of the hidden units.

    Once a signal has been encoded and then decoded,

    4. the sampled visible units can be compared directly with the original
       visible data.

    Training takes place by presenting a number of data points to the network,
    encoding the data, reconstructing it from the hidden states, and encoding
    the reconstruction in the hidden units again. Then, using contrastive
    divergence (Hinton 2002; Hinton & Salakhutdinov 2006), the gradient is
    approximated using the correlations between visible and hidden units in the
    first encoding and the same correlations in the second encoding.
    '''

    def __init__(self, num_visible, num_hidden, binary=True, scale=0.001):
        '''Initialize a restricted Boltzmann machine.

        Parameters
        ----------
        num_visible : int
            The number of visible units.

        num_hidden : int
            The number of hidden units.

        binary : bool
            True if the visible units are binary, False if the visible units are
            normally distributed.

        scale : float
            Sample initial weights and biases from a normal distribution with
            mean 0 and standard deviation `scale`.
        '''
        self.weights = scale * rng.randn(num_hidden, num_visible)
        self.hid_bias = scale * rng.randn(num_hidden, 1)
        self.vis_bias = scale * rng.randn(num_visible, 1)

        self._visible = sigmoid if binary else identity

    @property
    def num_hidden(self):
        return len(self.hid_bias)

    @property
    def num_visible(self):
        return len(self.vis_bias)

    def hidden_expectation(self, visible, bias=0.):
        '''Given a (batch size, visible units) array of visible data, return the
        expected hidden unit values as a (hidden units, batch size) array.'''
        return sigmoid(np.dot(self.weights, visible.T) + self.hid_bias + bias)

    def visible_expectation(self, hidden, bias=0.):
        '''Given a (hidden units, batch size) array of hidden states, return the
        expected visible unit values as a (batch size, visible units) array.'''
        return self._visible(np.dot(hidden.T, self.weights) + self.vis_bias.T + bias)

    def iter_passes(self, visible):
        '''Repeatedly pass the given visible layer up and then back down.

        Parameters
        ----------
        visible : ndarray
            The initial state of the visible layer.

        Returns
        -------
        Generates a sequence of (visible, hidden) states. The first pair will be
        the (original visible, resulting hidden) states, followed by pairs
        containing the values from (visible down-pass, hidden up-pass).
        '''
        while True:
            hidden = self.hidden_expectation(visible)
            yield visible, hidden
            visible = self.visible_expectation(bernoulli(hidden))

    def reconstruct(self, visible, passes=1):
        '''Reconstruct a given visible layer through the hidden layer.

        Parameters
        ----------
        visible : ndarray
            The initial state of the visible layer.

        passes : int
            The number of up- and down-passes.

        Returns
        -------
        An array containing the reconstructed visible layer after the specified
        number of up- and down-passes.
        '''
        for i, (visible, _) in enumerate(self.iter_passes(visible)):
            # i == 0 is the original input; each later step adds one
            # up-and-down pass through the hidden layer.
            if i == passes:
                return visible


class Trainer(object):
    '''Train an RBM using one step of contrastive divergence (CD-1).
    '''

    def __init__(self, rbm, momentum=0., l2=0., target_sparsity=None):
        '''Initialize a trainer for the given RBM.

        rbm: The RBM to train.
        momentum: The momentum coefficient for gradient updates.
        l2: The L2 (weight decay) coefficient applied to the weights.
        target_sparsity: If not None, train the hidden biases toward this mean
            hidden activation instead of using the contrastive divergence
            gradient for them.
        '''
        self.rbm = rbm
        self.momentum = momentum
        self.l2 = l2
        self.target_sparsity = target_sparsity

        self.grad_weights = np.zeros(rbm.weights.shape, float)
        self.grad_vis = np.zeros(rbm.vis_bias.shape, float)
        self.grad_hid = np.zeros(rbm.hid_bias.shape, float)

    def learn(self, visible, learning_rate=0.2):
        '''Calculate gradients for a batch of visible data and apply them.
        '''
        gradients = self.calculate_gradients(visible)
        self.apply_gradients(*gradients, learning_rate=learning_rate)

    def calculate_gradients(self, visible_batch):
        '''Calculate gradients for a batch of visible data.

        Returns a 3-tuple of gradients: weights, visible bias, hidden bias.

        visible_batch: A (batch size, visible units) array of visible data. Each
            row represents one visible data sample.
        '''
        passes = self.rbm.iter_passes(visible_batch)
        v0, h0 = next(passes)
        v1, h1 = next(passes)
        gw = (np.dot(h0, v0) - np.dot(h1, v1)) / len(visible_batch)
        gv = (v0 - v1).mean(axis=0)
        gh = (h0 - h1).mean(axis=1)
        if self.target_sparsity is not None:
            gh = self.target_sparsity - h0.mean(axis=1)

        logging.debug('displacement: %.3g, hidden std: %.3g',
                      np.linalg.norm(gv), h0.std(axis=1).mean())

        # return column vectors so the bias gradients match the bias shapes
        gv = gv.reshape(gv.shape[0], 1)
        gh = gh.reshape(gh.shape[0], 1)
        return gw, gv, gh

    def apply_gradients(self, weights, visible, hidden, learning_rate=0.2):
        '''Apply gradients (as returned by calculate_gradients) to the RBM.
        '''
        def update(name, g, _g, l2=0):
            target = getattr(self.rbm, name)
            # blend the new gradient with the previous update (momentum),
            # then apply L2 weight decay
            g *= 1 - self.momentum
            g += self.momentum * _g - l2 * target
            target += learning_rate * g
            # remember this update for the next momentum step
            _g[...] = g

        update('vis_bias', visible, self.grad_vis)
        update('hid_bias', hidden, self.grad_hid)
        update('weights', weights, self.grad_weights, self.l2)


class Temporal(RBM):
    '''An RBM that incorporates temporal (dynamic) visible and hidden biases.

    This RBM is based on work and code by Taylor, Hinton, and Roweis (2006).
    '''

    def __init__(self, num_visible, num_hidden, order, binary=True, scale=0.001):
        '''Initialize a temporal RBM.

        order: The number of frames of visible data that condition the dynamic
            biases.
        '''
        super(Temporal, self).__init__(
            num_visible, num_hidden, binary=binary, scale=scale)

        self.order = order

        self.vis_dyn_bias = scale * rng.randn(order, num_visible, num_visible)
        self.hid_dyn_bias = scale * rng.randn(order, num_hidden, num_visible) - 1.

    def iter_passes(self, frames):
        '''Repeatedly pass the given visible layer up and then back down.

        Generates the resulting sequence of (visible, hidden) states.

        frames: An (order, visible units) array containing frames of visible
            data to "prime" the network.
            The temporal order of the frames is
            assumed to be reversed, so frames[0] is the current visible frame,
            frames[1] is the previous frame, etc.
        '''
        vdb = self.vis_dyn_bias[0]
        vis_dyn_bias = collections.deque(
            [np.dot(self.vis_dyn_bias[i], f).T for i, f in enumerate(frames)],
            maxlen=self.order)

        hdb = self.hid_dyn_bias[0]
        hid_dyn_bias = collections.deque(
            [np.dot(self.hid_dyn_bias[i], f).T for i, f in enumerate(frames)],
            maxlen=self.order)

        visible = frames[0]
        while True:
            hidden = self.hidden_expectation(visible, sum(hid_dyn_bias))
            yield visible, hidden
            visible = self.visible_expectation(bernoulli(hidden), sum(vis_dyn_bias))
            vis_dyn_bias.append(np.dot(vdb, visible))
            hid_dyn_bias.append(np.dot(hdb, visible))


class TemporalTrainer(Trainer):
    '''Train a Temporal RBM using one step of contrastive divergence.
    '''

    def __init__(self, rbm, momentum=0.2, l2=0.1, target_sparsity=None):
        '''Initialize a trainer for the given Temporal RBM.
        '''
        super(TemporalTrainer, self).__init__(rbm, momentum, l2, target_sparsity)
        # names and shapes must match what apply_gradients updates
        self.grad_vis_dyn = np.zeros(rbm.vis_dyn_bias.shape, float)
        self.grad_hid_dyn = np.zeros(rbm.hid_dyn_bias.shape, float)

    def calculate_gradients(self, frames_batch):
        '''Calculate gradients using contrastive divergence.

        Returns a 5-tuple of gradients: weights, visible bias, hidden bias,
        dynamic visible bias, and dynamic hidden bias.

        frames_batch: An (order, visible units, batch size) array containing a
            batch of frames of visible data.

        Frames are assumed to be reversed temporally, across the order
        dimension, i.e., frames_batch[0] is the current visible frame in each
        batch element, frames_batch[1] is the previous visible frame,
        frames_batch[2] is the one before that, etc.
        '''
        order, _, batch_size = frames_batch.shape
        assert order == self.rbm.order

        vis_bias = sum(np.dot(self.rbm.vis_dyn_bias[i], f).T
                       for i, f in enumerate(frames_batch))
        hid_bias = sum(np.dot(self.rbm.hid_dyn_bias[i], f).T
                       for i, f in enumerate(frames_batch))

        v0 = frames_batch[0].T
        h0 = self.rbm.hidden_expectation(v0, hid_bias)
        v1 = self.rbm.visible_expectation(bernoulli(h0), vis_bias)
        h1 = self.rbm.hidden_expectation(v1, hid_bias)

        gw = (np.dot(h0.T, v0) - np.dot(h1.T, v1)) / batch_size
        gv = (v0 - v1).mean(axis=0)
        gh = (h0 - h1).mean(axis=0)

        gvd = np.zeros(self.rbm.vis_dyn_bias.shape, float)
        ghd = np.zeros(self.rbm.hid_dyn_bias.shape, float)
        v = v0 - self.rbm.vis_bias - vis_bias
        for i, f in enumerate(frames_batch):
            gvd[i] += np.dot(v.T, f)
            ghd[i] += np.dot(h0.T, f)
        v = v1 - self.rbm.vis_bias - vis_bias
        for i, f in enumerate(frames_batch):
            gvd[i] -= np.dot(v.T, f)
            ghd[i] -= np.dot(h1.T, f)

        return gw, gv, gh, gvd, ghd

    def apply_gradients(self, weights, visible, hidden, visible_dyn, hidden_dyn,
                        learning_rate=0.2):
        '''Apply gradients (as returned by calculate_gradients) to the RBM.
        '''
        def update(name, g, _g, l2=0):
            target = getattr(self.rbm, name)
            # blend the new gradient with the previous update (momentum),
            # then apply L2 weight decay
            g *= 1 - self.momentum
            g += self.momentum * _g - l2 * target
            target += learning_rate * g
            # remember this update for the next momentum step
            _g[...] = g

        update('vis_bias', visible, self.grad_vis)
        update('hid_bias', hidden, self.grad_hid)
        update('weights', weights, self.grad_weights, self.l2)
        update('vis_dyn_bias', visible_dyn, self.grad_vis_dyn, self.l2)
        update('hid_dyn_bias', hidden_dyn, self.grad_hid_dyn, self.l2)


import scipy.signal
convolve = scipy.signal.convolve


class Convolutional(RBM):
    '''An RBM with convolutional weights and probabilistic max-pooling.

    This RBM is based on Lee, Grosse, Ranganath and Ng (2009).
    '''

    def __init__(self, num_filters, filter_shape, pool_shape, binary=True, scale=0.001):
        '''Initialize a convolutional restricted Boltzmann machine.

        num_filters: The number of convolution filters.
        filter_shape: An ordered pair describing the shape of the filters.
        pool_shape: An ordered pair describing the shape of the pooling groups.
        binary: True if the visible units are binary, False if the visible units
            are normally distributed.
        scale: Scale initial values by this parameter.
        '''
        self.num_filters = num_filters

        self.weights = scale * rng.randn(num_filters, *filter_shape)
        # a 0-d array rather than a Python float, so that it has a shape and
        # can be updated in place by the trainer
        self.vis_bias = np.array(scale * rng.randn())
        self.hid_bias = scale * rng.randn(num_filters)

        self._visible = sigmoid if binary else identity
        self._pool_shape = pool_shape

    def _pool(self, hidden):
        '''Given (already exponentiated) hidden activity, sum it within each
        pooling group.'''
        _, r, c = hidden.shape
        rows, cols = self._pool_shape
        # callers pass exp(activation), so it must not be exponentiated again
        active = hidden.T
        pool = np.zeros(active.shape, float)
        for j in range(int(np.ceil(float(c) / cols))):
            cslice = slice(j * cols, (j + 1) * cols)
            for i in range(int(np.ceil(float(r) / rows))):
                mask = (cslice, slice(i * rows, (i + 1) * rows))
                pool[mask] = active[mask].sum(axis=0).sum(axis=0)
        return pool.T

    def pooled_expectation(self, visible, bias=0.):
        '''Given visible data, return the expected pooling unit values.'''
        activation = np.exp(np.array([
            convolve(visible, self.weights[k, ::-1, ::-1], 'valid')
            for k in range(self.num_filters)]).T + self.hid_bias + bias).T
        return 1. - 1. / (1. + self._pool(activation))

    def hidden_expectation(self, visible, bias=0.):
        '''Given visible data, return the expected hidden unit values.'''
        activation = np.exp(np.array([
            convolve(visible, self.weights[k, ::-1, ::-1], 'valid')
            for k in range(self.num_filters)]).T + self.hid_bias + bias).T
        return activation / (1. + self._pool(activation))

    def visible_expectation(self, hidden, bias=0.):
        '''Given hidden states, return the expected visible unit values.'''
        activation = sum(
            convolve(hidden[k], self.weights[k], 'full')
            for k in range(self.num_filters)) + self.vis_bias + bias
        return self._visible(activation)


class ConvolutionalTrainer(Trainer):
    '''Train a Convolutional RBM using one step of contrastive divergence.
    '''

    def calculate_gradients(self, visible):
        '''Calculate gradients for an instance of visible data.

        Returns a 3-tuple of gradients: weights, visible bias, hidden bias.

        visible: A single array of visible data.
        '''
        v0 = visible
        h0 = self.rbm.hidden_expectation(v0)
        v1 = self.rbm.visible_expectation(bernoulli(h0))
        h1 = self.rbm.hidden_expectation(v1)

        # h0.shape == h1.shape == (num_filters,
        #                          visible_rows - filter_rows + 1,
        #                          visible_columns - filter_columns + 1)
        # v0.shape == v1.shape == (visible_rows, visible_columns)

        gw = np.array([
            convolve(v0, h0[k, ::-1, ::-1], 'valid') -
            convolve(v1, h1[k, ::-1, ::-1], 'valid')
            for k in range(self.rbm.num_filters)])
        gv = (v0 - v1).sum()
        gh = (h0 - h1).sum(axis=-1).sum(axis=-1)
        if self.target_sparsity is not None:
            gh = self.target_sparsity - h0.mean(axis=-1).mean(axis=-1)

        logging.debug('displacement: %.3g, hidden activations: %.3g',
                      np.linalg.norm(gv), h0.mean(axis=-1).mean(axis=-1).std())

        return gw, gv, gh


class MeanCovariance(RBM):
    '''A mean-covariance RBM, with separate hidden units for the mean and the
    precision of the visible units.
    '''

    def __init__(self, num_visible, num_mean, num_precision, scale=0.001):
        '''Initialize a mean-covariance restricted Boltzmann machine.

        num_visible: The number of visible units.
        num_mean: The number of units in the hidden mean vector.
        num_precision: The number of units in the hidden precision vector.
        '''
        super(MeanCovariance, self).__init__(
            num_visible, num_mean, binary=False, scale=scale)

        # replace the hidden bias to reflect the precision units.
        self.hid_bias = scale * rng.randn(num_precision, 1)

        self.hid_mean = scale * rng.randn(num_mean, 1)

        self.hid_factor_u = scale * -abs(rng.randn(num_precision - 1))
        self.hid_factor_c = scale * -abs(rng.randn(num_precision))
        self.hid_factor_l = scale * -abs(rng.randn(num_precision - 1))

        self.vis_factor = scale * rng.randn(num_visible, num_precision)

    @property
    def hid_factor(self):
        '''Assemble the tridiagonal hidden factor matrix from its diagonals.'''
        return (np.diag(self.hid_factor_u, 1) +
                np.diag(self.hid_factor_c, 0) +
                np.diag(self.hid_factor_l, -1))

    def hidden_expectation(self, visible):
        '''Given visible data, return the expected hidden unit values.'''
        z = np.dot(visible.T, self.vis_factor)
        return sigmoid(np.dot(z * z, self.hid_factor).T + self.hid_bias)

    def visible_expectation(self, hidden):
        '''Given hidden states, return the expected visible unit values.'''
        z = np.diag(np.dot(-self.hid_factor.T, hidden))
        Sinv = np.dot(self.vis_factor, np.dot(z, self.vis_factor.T))
        # weights has shape (num_mean, num_visible), so transpose it here
        return np.dot(np.dot(np.linalg.pinv(Sinv), self.weights.T), self.hid_mean)


class MeanCovarianceTrainer(Trainer):
    '''Placeholder trainer for MeanCovariance RBMs; it adds nothing to Trainer yet.
    '''
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
import os
import setuptools

setuptools.setup(
    name='lmj.rbm',
    version='0.1.1',
    namespace_packages=['lmj'],
    packages=setuptools.find_packages(),
    author='Leif Johnson',
    author_email='leif@leifjohnson.net',
    description='A library of Restricted Boltzmann Machines',
    long_description=open(os.path.join(os.path.dirname(os.path.abspath(__file__)),
                                       'README.md')).read(),
    license='MIT',
    url='http://github.com/lmjohns3/py-rbm/',
    keywords=('deep-belief-network '
              'restricted-boltzmann-machine '
              'machine-learning'),
    # lmj/rbm.py imports scipy.signal at module level
    install_requires=['numpy', 'scipy'],
    classifiers=[
        'Development Status :: 3 - Alpha',
        'Intended Audience :: Science/Research',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Topic :: Scientific/Engineering :: Artificial Intelligence',
    ],
)
--------------------------------------------------------------------------------
/test/idx_reader.py:
--------------------------------------------------------------------------------
# Copyright (c) 2011 Leif Johnson
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

'''A Python library for reading IDX files from the MNIST handwritten digit database.'''

import gzip
import struct

import numpy as np


def iterimages(image_file, label_file=None, unzip=False):
    '''Iterate over labels and images from the MNIST handwritten digit dataset.

    This function generates (label, image) pairs, one for each image in the
    dataset. The image is represented as a numpy array of the pixel values, and
    the label is an integer (or None if no label file is given).

    image_file: The name of a binary IDX file to load image data from.
    label_file: The name of a binary IDX file to load label data from.
    unzip: If True, the binary files will be gunzipped automatically before
        reading.
    '''
    opener = gzip.open if unzip else open

    # check the label header
    label_count = None
    if label_file:
        with opener(label_file, 'rb') as handle:
            label_data = handle.read()
        magic, label_count = struct.unpack('>2i', label_data[:8])
        assert magic == 2049
        assert label_count > 0

    # check the image header
    with opener(image_file, 'rb') as handle:
        image_data = handle.read()
    magic, image_count, rows, columns = struct.unpack('>4i', image_data[:16])
    assert magic == 2051
    assert image_count > 0
    assert rows > 0
    assert columns > 0

    # check that the two files agree on cardinality
    assert label_count is None or image_count == label_count

    count = rows * columns
    for i in range(image_count):
        label = None
        if label_count:
            # labels follow the 8-byte header, one unsigned byte each
            label, = struct.unpack_from('B', label_data, 8 + i)

        # read at an offset instead of re-slicing (and copying) the whole buffer
        pixels = struct.unpack_from('%dB' % count, image_data, 16 + i * count)

        yield label, np.array(pixels).astype(float).reshape((rows, columns))


if __name__ == '__main__':
    import sys
    import glumpy

    iterator = iterimages(sys.argv[1], sys.argv[2], False)
    composites = [np.zeros((28, 28), 'f') for _ in range(10)]

    fig = glumpy.Figure()
    images_and_frames = []
    for i, c in enumerate(composites):
        frame = fig.add_figure(rows=2, cols=5, position=divmod(i, 2)).add_frame(aspect=1)
        images_and_frames.append((glumpy.Image(c), frame))

    @fig.event
    def on_draw():
        fig.clear()
        for image, frame in images_and_frames:
            image.update()
            frame.draw(x=frame.x, y=frame.y)
            image.draw(x=frame.x, y=frame.y, z=0, width=frame.width, height=frame.height)

    @fig.event
    def on_idle(dt):
        try:
            label, pixels = next(iterator)
        except StopIteration:
            sys.exit()
        composites[label] *= 0.3
        composites[label] += 0.7 * pixels
        fig.redraw()

    glumpy.show()
--------------------------------------------------------------------------------
/test/mnist.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

# Copyright (c) 2011 Leif Johnson
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
# IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

import argparse
import collections
import datetime
import glumpy
import logging
import numpy as np
import numpy.random as rng
import os
import pickle
import sys

import lmj.rbm
import idx_reader

FLAGS = argparse.ArgumentParser(
    conflict_handler='resolve',
    formatter_class=argparse.ArgumentDefaultsHelpFormatter)
FLAGS.add_argument('--model', metavar='FILE',
                   help='load saved model from FILE')
FLAGS.add_argument('-i', '--images', metavar='FILE',
                   help='load image data from FILE')
FLAGS.add_argument('-l', '--labels', metavar='FILE',
                   help='load image labels from FILE')
FLAGS.add_argument('-g', '--gaussian', action='store_true',
                   help='use gaussian visible units')
FLAGS.add_argument('-c', '--conv', action='store_true',
                   help='use a convolutional network')
FLAGS.add_argument('-r', '--learning-rate', type=float, default=0.1, metavar='K',
                   help='use a learning rate of K')
FLAGS.add_argument('-m', '--momentum', type=float, default=0.2, metavar='K',
                   help='use a learning momentum of K')
FLAGS.add_argument('--l2', type=float, default=0.001, metavar='K',
                   help='apply L2 regularization with weight K')
FLAGS.add_argument('-p', '--sparsity', type=float, default=0.1, metavar='K',
                   help='set a target sparsity of K for hidden units')
FLAGS.add_argument('-n', '--n', type=int, default=10,
                   help='use NxN hidden units')
FLAGS.add_argument('-b', '--batch-size', type=int, default=257, metavar='N',
                   help='process N images in one minibatch')


if __name__ == '__main__':
    logging.basicConfig(
        stream=sys.stdout,
        level=logging.INFO,
        format='%(levelname).1s %(asctime)s [%(module)s:%(lineno)d] %(message)s')

    args = FLAGS.parse_args()

    _visibles = np.zeros((args.n, 28, 28), dtype=np.float32)
    _hiddens = np.zeros((args.n, args.n), dtype=np.float32)
    _weights = np.zeros((args.n * args.n, 28, 28), dtype=np.float32)

    fig = glumpy.figure()

    visibles = [glumpy.image.Image(v) for v in _visibles]
    hiddens = glumpy.image.Image(_hiddens)
    weights = [glumpy.image.Image(w) for w in _weights]

    visible_frames = [
        fig.add_figure(args.n + 1, args.n, position=(args.n, r)).add_frame(aspect=1)
        for r in range(args.n)]

    weight_frames = [
        fig.add_figure(args.n + 1, args.n, position=(c, r)).add_frame(aspect=1)
        for r in range(args.n) for c in range(args.n)]

    loaded = False
    recent = collections.deque(maxlen=20)
    errors = [collections.deque(maxlen=20) for _ in range(10)]
    trainset = dict((i, []) for i in range(10))
    loader = idx_reader.iterimages(args.images, args.labels, True)

    Model = lmj.rbm.Convolutional if args.conv else lmj.rbm.RBM
    if args.model:
        with open(args.model, 'rb') as handle:
            rbm = pickle.load(handle)
    else:
        rbm = Model(28 * 28, args.n * args.n, not args.gaussian)

    Trainer = lmj.rbm.ConvolutionalTrainer if args.conv else lmj.rbm.Trainer
    trainer = Trainer(rbm, l2=args.l2, momentum=args.momentum, target_sparsity=args.sparsity)

    def get_pixels():
        # Stream images from disk until every digit class has more than 50
        # cached examples, then sample a random class and a random cached image.
        global loaded
        if not loaded and all(len(v) > 50 for v in trainset.values()):
            loaded = True

        if loaded:
            t = rng.randint(10)
            pixels = trainset[t][rng.randint(len(trainset[t]))]
        else:
            t, pixels = next(loader)
            trainset[t].append(pixels)

        # Signal the caller to retry until the normalization window is full.
        recent.append(pixels)
        if len(recent) < 20:
            raise RuntimeError

        return pixels

    def flatten(pixels):
        if not args.gaussian:
            # Binary units: threshold raw 0-255 intensities at 30.
            return pixels.reshape((1, 28 * 28)) > 30.
        r = np.array(recent)
        mu = r.mean(axis=0)
        sigma = np.clip(r.std(axis=0), 0.1, np.inf)
        return ((pixels - mu) / sigma).reshape((1, 28 * 28))

    def unflatten(flat):
        if not args.gaussian:
            return 256. * flat.reshape((28, 28))
        r = np.array(recent)
        mu = r.mean(axis=0)
        # Use the same clipped sigma as flatten() so this inverts it.
        sigma = np.clip(r.std(axis=0), 0.1, np.inf)
        return sigma * flat.reshape((28, 28)) + mu

    def learn():
        batch = np.zeros((args.batch_size, 28 * 28), 'd')
        for i in range(args.batch_size):
            # get_pixels() raises RuntimeError until `recent` holds 20 images.
            while True:
                try:
                    pixels = get_pixels()
                    break
                except RuntimeError:
                    pass
            flat = flatten(pixels)
            batch[i:i+1] = flat

        trainer.learn(batch, learning_rate=args.learning_rate)

        logging.debug('mean weight: %.3g, vis bias: %.3g, hid bias: %.3g',
                      rbm.weights.mean(), rbm.vis_bias.mean(), rbm.hid_bias.mean())

        return pixels, flat

    def update(pixels, flat):
        # Fill one visible frame per pass from rbm.iter_passes(), stopping
        # when the frames run out.
        for i, (v, h) in enumerate(rbm.iter_passes(flat)):
            if i == len(visibles):
                break
            _visibles[i] = unflatten(v)
        for image in visibles:
            image.update()

        _hiddens[:] = h.reshape((args.n, args.n))
        hiddens.update()

        for i in range(args.n * args.n):
            _weights[i] = rbm.weights[i].reshape((28, 28))
        for image in weights:
            image.update()

        fig.redraw()

    @fig.event
    def on_draw():
        fig.clear(0, 0, 0, 0)
        for f in weight_frames + visible_frames:
            f.draw(x=f.x, y=f.y)
        for f, w in zip(weight_frames, weights):
            w.draw(x=f.x, y=f.y, z=0, width=f.width, height=f.height)
        for f, v in zip(visible_frames, visibles):
            v.draw(x=f.x, y=f.y, z=0, width=f.width, height=f.height)

    @fig.event
    def on_idle(dt):
        update(*learn())

    @fig.event
    def on_key_press(key, modifiers):
        if key == glumpy.window.key.ESCAPE:
            sys.exit()
        if key == glumpy.window.key.S:
            fn = datetime.datetime.now().strftime('rbm-%Y%m%d-%H%M%S.p')
            with open(fn, 'wb') as handle:
                pickle.dump(rbm, handle)

    glumpy.show()

--------------------------------------------------------------------------------
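With `--gaussian`, `flatten()` in test/mnist.py standardizes each image against the per-pixel mean and standard deviation of the 20 most recent images (the `recent` deque), flooring the deviation at 0.1 so that constant pixels, such as the blank MNIST border, do not cause a division by zero. The sketch below isolates that transform and its inverse. It is a minimal illustration, not code from lmj.rbm: the helper names `window_stats`, `standardize` and `unstandardize` and the `min_sigma` parameter are hypothetical.

```python
import numpy as np


def window_stats(window, min_sigma=0.1):
    """Per-pixel mean and floored std over a stack of recent 28x28 images.

    Hypothetical helper; the floor mirrors np.clip(..., 0.1, np.inf) in mnist.py.
    """
    r = np.asarray(window, dtype=np.float64)
    mu = r.mean(axis=0)
    sigma = np.clip(r.std(axis=0), min_sigma, np.inf)
    return mu, sigma


def standardize(pixels, window, min_sigma=0.1):
    """Map one 28x28 image to a (1, 784) row of z-scores."""
    mu, sigma = window_stats(window, min_sigma)
    return ((pixels - mu) / sigma).reshape((1, 28 * 28))


def unstandardize(flat, window, min_sigma=0.1):
    """Invert standardize() with the same mean and floored sigma."""
    mu, sigma = window_stats(window, min_sigma)
    return sigma * flat.reshape((28, 28)) + mu


if __name__ == '__main__':
    rng = np.random.RandomState(0)
    # Hypothetical stand-in for `recent`: 20 random images whose first
    # column is always zero, like the MNIST border.
    window = rng.randint(0, 256, size=(20, 28, 28)).astype(np.float64)
    window[:, :, 0] = 0.0
    image = window[-1]
    flat = standardize(image, window)
    print(flat.shape)                                       # (1, 784)
    print(np.allclose(unstandardize(flat, window), image))  # True
```

Because both directions use the same floored sigma, the round trip reproduces the input up to floating-point error, including at the constant border pixels.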
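`get_pixels()` in test/mnist.py uses a simple warm-up scheme: it streams labelled images from the IDX files and caches them by digit until every class has more than 50 examples, then stops reading the files and instead picks a digit uniformly at random, followed by a random cached example of that digit. Here is a minimal standalone sketch of that scheme, assuming a plain iterator of `(label, pixels)` pairs. The class `BalancedSampler` and its parameters are hypothetical, and the sketch leaves out the 20-image `recent` window that the real function also fills.

```python
import itertools

import numpy as np


class BalancedSampler(object):
    """Hypothetical warm-up-then-resample sampler modelled on get_pixels().

    While any class has `min_per_class` or fewer cached examples, draws come
    from `stream` and are cached. After that the stream is no longer read:
    a class is chosen uniformly, then a cached example from that class.
    """

    def __init__(self, stream, n_classes=10, min_per_class=50, seed=None):
        self.stream = stream
        self.cache = dict((c, []) for c in range(n_classes))
        self.min_per_class = min_per_class
        self.rng = np.random.RandomState(seed)
        self.loaded = False

    def draw(self):
        if not self.loaded and all(
                len(v) > self.min_per_class for v in self.cache.values()):
            self.loaded = True
        if self.loaded:
            label = self.rng.randint(len(self.cache))
            examples = self.cache[label]
            return label, examples[self.rng.randint(len(examples))]
        label, pixels = next(self.stream)
        self.cache[label].append(pixels)
        return label, pixels


if __name__ == '__main__':
    # Synthetic stream: labels cycle 0..9, each image filled with its label.
    stream = ((i % 10, np.full((28, 28), i % 10, dtype=np.float32))
              for i in itertools.count())
    sampler = BalancedSampler(stream, seed=0)
    for _ in range(10 * 51):        # fill every class to 51 cached examples
        sampler.draw()
    label, pixels = sampler.draw()  # first draw served from the cache
    print(sampler.loaded)           # True
    print(pixels[0, 0] == label)    # True
```

Choosing the class first, rather than a random cached image, gives each digit equal weight in later minibatches even if the cached counts differ.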
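The `on_idle` handler in test/idx_reader.py blends each new digit into a per-class composite with `composites[label] *= 0.3` followed by `composites[label] += 0.7 * pixels`. That is an exponential moving average: the newest image gets weight 0.7, and an image seen k updates earlier gets `0.7 * 0.3**k`, so each panel mostly shows the last few examples of its digit rather than a long-run mean. A small sketch of the arithmetic, using a hypothetical `blend` helper and `keep` parameter:

```python
import numpy as np


def blend(composite, pixels, keep=0.3):
    """Hypothetical helper: composite <- keep * composite + (1 - keep) * pixels, in place."""
    composite *= keep
    composite += (1.0 - keep) * pixels
    return composite


if __name__ == '__main__':
    composite = np.zeros((28, 28), 'f')
    # Feed three constant images with values 1, 2 and 3 (oldest first).
    for value in (1.0, 2.0, 3.0):
        blend(composite, np.full((28, 28), value, dtype=np.float32))
    # Weights on the three images are 0.7 * 0.3**2, 0.7 * 0.3 and 0.7.
    expected = 0.7 * (0.09 * 1.0 + 0.3 * 2.0 + 1.0 * 3.0)
    print(np.allclose(composite, expected))  # True
```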