├── .gitignore
├── MANIFEST.in
├── README.md
├── lmj
    ├── __init__.py
    └── rbm.py
├── setup.py
└── test
    ├── idx_reader.py
    └── mnist.py


/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 | #*#
3 | *.so
4 | *.py[oc]
5 | *.egg-info
6 | dist
7 | build
8 | 


--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include README.md
2 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # py-rbm
 2 | 
 3 | This is a small Python library that contains code for using and training
 4 | Restricted Boltzmann Machines (RBMs), the basic building blocks for many types
 5 | of deep belief networks. Variations available include the "standard" RBM (with
 6 | optional sparsity-based hidden layer learning); the temporal net introduced by
 7 | [Taylor, Hinton & Roweis][]; and convolutional nets with probabilistic
 8 | max-pooling described by [Lee, Grosse, Ranganath & Ng][].
 9 | 
10 | Mostly I wrote the code to better understand the underlying algorithms. I don't
11 | use it for anything at the moment, having moved on to using primarily [Theano][]
12 | with [networks of rectified linear neurons](http://www.csri.utoronto.ca/~hinton/absps/reluICML.pdf)
13 | (PDF). Still, there seems to be some interest in RBMs, so hopefully others will
14 | find this package instructive, and maybe even useful!
15 | 
16 | [Taylor, Hinton & Roweis]: http://www.cs.nyu.edu/~gwtaylor/publications/nips2006mhmublv/
17 | [Lee, Grosse, Ranganath & Ng]: http://cacm.acm.org/magazines/2011/10/131415-unsupervised-learning-of-hierarchical-representations-with-convolutional-deep-belief-networks/fulltext
18 | [Theano]: http://deeplearning.net/software/theano/
19 | 
20 | ## Installation
21 | 
22 | Just install using the included setup script:
23 | 
24 |     python setup.py install
25 | 
26 | Or you can install the package from the internets using pip:
27 | 
28 |     pip install lmj.rbm
29 | 
30 | ## Testing
31 | 
32 | This library is definitely very alpha; so far I just have one main test that
33 | encodes image data. To try things out, clone the source for this package and
34 | install [glumpy][]:
35 | 
36 |     pip install glumpy
37 | 
38 | Then download the MNIST digits data from http://yann.lecun.com/exdb/mnist/ --
39 | you'll need both the `train-*-images.ubyte.gz` and `train-*-labels.ubyte.gz`
40 | files. Then run the test:
41 | 
42 |     python test/mnist.py \
43 |       --images *-images.ubyte.gz \
44 |       --labels *-labels.ubyte.gz
45 | 
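If the test script trips an assertion while reading the data, it can help to check the headers of the downloaded files first. The sketch below uses only the standard library and mirrors the magic numbers (2051 for images, 2049 for labels) and big-endian integer layout that `test/idx_reader.py` asserts on. The names `parse_idx_header` and `check_idx_file` are hypothetical helpers for illustration and are not part of this package.

```python
import gzip
import struct

IMAGE_MAGIC = 2051  # magic number test/idx_reader.py expects for image files
LABEL_MAGIC = 2049  # magic number test/idx_reader.py expects for label files


def parse_idx_header(data):
    '''Describe the IDX header found at the start of `data` (a bytes object).'''
    # Both file types start with two big-endian 32-bit ints: magic and count.
    magic, count = struct.unpack('>2i', data[:8])
    if magic == LABEL_MAGIC:
        return {'kind': 'labels', 'count': count}
    if magic == IMAGE_MAGIC:
        # Image files carry two more ints: rows and columns per image.
        rows, columns = struct.unpack('>2i', data[8:16])
        return {'kind': 'images', 'count': count,
                'rows': rows, 'columns': columns}
    raise ValueError('unexpected IDX magic number: %d' % magic)


def check_idx_file(path):
    '''Hypothetical helper: read only the header of a gzipped IDX file.'''
    with gzip.open(path, 'rb') as handle:
        return parse_idx_header(handle.read(16))
```

Calling `check_idx_file` on the images file and on the labels file should report the same `count` for both; a mismatch or a `ValueError` suggests a wrong or truncated download.
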
46 | If you're feeling overconfident, go ahead and try out the gaussian visible
47 | units:
48 | 
49 |     python test/mnist.py \
50 |       --images *-images.ubyte.gz \
51 |       --labels *-labels.ubyte.gz \
52 |       --batch-size 257 \
53 |       --l2 0.0001 \
54 |       --learning-rate 0.2 \
55 |       --momentum 0.5 \
56 |       --sparsity 0.01 \
57 |       --gaussian
58 | 
59 | The learning parameters can be a bit squirrely, but if things go right you
60 | should see a number of images show up on your screen that represent the "basis
61 | functions" that the network has learned when trying to auto-encode the MNIST
62 | images you are feeding it.
63 | 
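To make "auto-encode" concrete without the MNIST files, here is a toy sketch of one up-pass and down-pass through an RBM with binary units. It uses plain Python lists instead of numpy, so it is an illustration of the idea rather than the code in `lmj/rbm.py`. The `TinyRBM` class, its `seed` argument, and its zero-initialized biases are assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    '''Logistic sigmoid of a float.'''
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM(object):
    '''Hypothetical toy RBM with binary units, stored as nested lists.'''

    def __init__(self, num_visible, num_hidden, scale=0.001, seed=0):
        self.random = random.Random(seed)
        # weights[j][i] connects hidden unit j to visible unit i.
        self.weights = [
            [scale * self.random.gauss(0, 1) for _ in range(num_visible)]
            for _ in range(num_hidden)]
        self.hid_bias = [0.0] * num_hidden
        self.vis_bias = [0.0] * num_visible

    def hidden_expectation(self, visible):
        '''Up-pass: probability that each hidden unit is on.'''
        return [sigmoid(sum(w * v for w, v in zip(row, visible)) + b)
                for row, b in zip(self.weights, self.hid_bias)]

    def visible_expectation(self, hidden):
        '''Down-pass: probability that each visible unit is on.'''
        return [sigmoid(sum(self.weights[j][i] * h
                            for j, h in enumerate(hidden)) + b)
                for i, b in enumerate(self.vis_bias)]

    def sample(self, probabilities):
        '''Draw one Bernoulli sample per probability.'''
        return [1.0 if self.random.random() < p else 0.0
                for p in probabilities]

    def reconstruct(self, visible):
        '''Encode the data, sample hidden states, then decode them.'''
        hidden = self.sample(self.hidden_expectation(visible))
        return self.visible_expectation(hidden)


rbm = TinyRBM(num_visible=4, num_hidden=3)
reconstruction = rbm.reconstruct([1.0, 0.0, 1.0, 0.0])
```

With untrained, near-zero weights every entry of `reconstruction` sits close to 0.5; training is what pushes the reconstruction toward the input.
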
64 | You can also try running the test script with `--conv` to try out a
65 | convolutional filterbank, but I'm not confident that the conv net test is
66 | working correctly. Anyway, if you're thinking of using conv nets for a project,
67 | please have a look at [Theano][], or for a highly-tuned GPU/C++ implementation,
68 | https://code.google.com/p/cuda-convnet/ (by
69 | [Alex Krizhevsky](http://www.cs.toronto.edu/~kriz/)).
70 | 
71 | [glumpy]: http://code.google.com/p/glumpy/
72 | 
73 | ## License
74 | 
75 | (The MIT License)
76 | 
77 | Copyright (c) 2011 Leif Johnson <leif@leifjohnson.net>
78 | 
79 | Permission is hereby granted, free of charge, to any person obtaining a copy of
80 | this software and associated documentation files (the 'Software'), to deal in
81 | the Software without restriction, including without limitation the rights to
82 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
83 | the Software, and to permit persons to whom the Software is furnished to do so,
84 | subject to the following conditions:
85 | 
86 | The above copyright notice and this permission notice shall be included in all
87 | copies or substantial portions of the Software.
88 | 
89 | THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
90 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
91 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
92 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
93 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
94 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
95 | 


--------------------------------------------------------------------------------
/lmj/__init__.py:
--------------------------------------------------------------------------------
1 | __import__('pkg_resources').declare_namespace(__name__)
2 | 


--------------------------------------------------------------------------------
/lmj/rbm.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) 2011 Leif Johnson <leif@leifjohnson.net>
  2 | #
  3 | # Permission is hereby granted, free of charge, to any person obtaining a copy
  4 | # of this software and associated documentation files (the "Software"), to deal
  5 | # in the Software without restriction, including without limitation the rights
  6 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  7 | # copies of the Software, and to permit persons to whom the Software is
  8 | # furnished to do so, subject to the following conditions:
  9 | #
 10 | # The above copyright notice and this permission notice shall be included in all
 11 | # copies or substantial portions of the Software.
 12 | #
 13 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 14 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 15 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 16 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 17 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 18 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 19 | # SOFTWARE.
 20 | 
 21 | '''An implementation of several types of Restricted Boltzmann Machines.
 22 | 
 23 | This code is largely based on the Matlab generously provided by Taylor, Hinton
 24 | and Roweis, and described in their 2006 NIPS paper, "Modeling Human Motion Using
 25 | Binary Hidden Variables". Their code and results are available online at
 26 | http://www.cs.nyu.edu/~gwtaylor/publications/nips2006mhmublv/.
 27 | 
 28 | There are more RBM implementations in this module than just the Taylor
 29 | "conditional RBM," though. The basic (non-Temporal) RBM is based on the Taylor,
 30 | Hinton, and Roweis code, but stripped of the dynamic bias terms and refactored
 31 | into an object-oriented style. The convolutional RBM code is based on the 2009
 32 | ICML paper by Lee, Grosse, Ranganath and Ng, "Convolutional Deep Belief Networks
 33 | for Scalable Unsupervised Learning of Hierarchical Representations".
 34 | 
 35 | The mean-covariance RBM is based on XXX.
 36 | 
 37 | All implementations incorporate an option to train hidden unit biases using a
 38 | sparsity criterion, as described in the 2008 NIPS paper by Lee, Ekanadham and
 39 | Ng, "Sparse Deep Belief Net Model for Visual Area V2".
 40 | 
 41 | Most RBM implementations provide an option to treat visible units as either
 42 | binary or gaussian. Training networks with gaussian visible units is a tricky
 43 | dance of parameter-twiddling, but binary units seem quite stable in their
 44 | learning and convergence properties.
 45 | 
 46 | Finally, although I have tried to ensure that the code is correct, there are
 47 | probably many bugs, all of which are my own doing. I wrote this code to get a
 48 | better intuitive understanding for the RBM family of machine learning
 49 | algorithms, but I do not claim that the code is useful for a particular purpose
 50 | or produces state-of-the-art results. Mostly I hope that this code is readable
 51 | so that others can use it to better understand how this whole RBM thing works.
 52 | '''
 53 | 
 54 | import collections
 55 | import logging
 56 | import numpy as np
 57 | import numpy.random as rng
 58 | 
 59 | def sigmoid(eta):
 60 |     '''Return the logistic sigmoid function of the argument.'''
 61 |     return 1. / (1. + np.exp(-eta))
 62 | 
 63 | 
 64 | def identity(eta):
 65 |     '''Return the identity function of the argument.'''
 66 |     return eta
 67 | 
 68 | 
 69 | def bernoulli(p):
 70 |     '''Return an array of boolean samples from Bernoulli(p).
 71 | 
 72 |     Parameters
 73 |     ----------
 74 |     p : ndarray
 75 |         This array should contain values in [0, 1].
 76 | 
 77 |     Returns
 78 |     -------
 79 |     An array of the same shape as p. Each value in the result will be a boolean
 80 |     indicating whether a single Bernoulli trial succeeded for the corresponding
 81 |     element of `p`.
 82 |     '''
 83 |     return rng.rand(*p.shape) < p
 84 | 
 85 | 
 86 | class RBM(object):
 87 |     '''A Restricted Boltzmann Machine (RBM) is a probabilistic model of data.
 88 | 
 89 |     RBMs have two layers of variables (here called "units," in keeping with
 90 |     neural network terminology) -- a "visible" layer that models data in the
 91 |     world, and a "hidden" layer that is imagined to generate the data in the
 92 |     visible layer. RBMs are inspired by the (unrestricted) Boltzmann Machine, a
 93 |     model from statistical physics in which each unit is connected to all other
 94 |     units, and the states of unobserved variables can be inferred by using a
 95 |     sampling procedure.
 96 | 
 97 |     The full connectivity of an unrestricted Boltzmann Machine makes inference
 98 |     difficult, requiring sampling or some other approximate technique. RBMs
 99 |     restrict this more general model by requiring that the visible and hidden
100 |     units form a fully connected, undirected, bipartite graph. In this way, each
101 |     of the visible units is independent from the other visible units when
102 |     conditioned on the state of the hidden layer, and each of the hidden units
103 |     is independent of the others when conditioned on the state of the visible
104 |     layer. This conditional independence makes inference tractable for the units
105 |     in a single RBM.
106 | 
107 |     To "encode" a signal by determining the state of the hidden units given some
108 |     visible data ("signal"),
109 | 
110 |     1. the signal is presented to the visible units, and
111 |     2. the states of the hidden units are sampled from the conditional
112 |        distribution given the visible data.
113 | 
114 |     To "decode" an encoding in the hidden units,
115 | 
116 |     3. the states of the visible units are sampled from the conditional
117 |        distribution given the states of the hidden units.
118 | 
119 |     Once a signal has been encoded and then decoded,
120 | 
121 |     4. the sampled visible units can be compared directly with the original
122 |        visible data.
123 | 
124 |     Training takes place by presenting a number of data points to the network,
125 |     encoding the data, reconstructing it from the hidden states, and encoding
126 |     the reconstruction in the hidden units again. Then, using contrastive
127 |     divergence (Hinton 2002; Hinton & Salakhutdinov 2006), the gradient is
128 |     approximated using the correlations between visible and hidden units in the
129 |     first encoding and the same correlations in the second encoding.
130 |     '''
131 | 
132 |     def __init__(self, num_visible, num_hidden, binary=True, scale=0.001):
133 |         '''Initialize a restricted boltzmann machine.
134 | 
135 |         Parameters
136 |         ----------
137 |         num_visible : int
138 |             The number of visible units.
139 | 
140 |         num_hidden : int
141 |             The number of hidden units.
142 | 
143 |         binary : bool
144 |             True if the visible units are binary, False if the visible units are
145 |             normally distributed.
146 | 
147 |         scale : float
148 |             Sample initial weights and biases from N(0, scale^2).
149 |         '''
150 |         self.weights = scale * rng.randn(num_hidden, num_visible)
151 |         self.hid_bias = scale * rng.randn(num_hidden, 1)
152 |         self.vis_bias = scale * rng.randn(num_visible, 1)
153 | 
154 |         self._visible = sigmoid if binary else identity
155 | 
156 |     @property
157 |     def num_hidden(self):
158 |         return len(self.hid_bias)
159 | 
160 |     @property
161 |     def num_visible(self):
162 |         return len(self.vis_bias)
163 | 
164 |     def hidden_expectation(self, visible, bias=0.):
165 |         '''Given visible data, return the expected hidden unit values.'''
166 |         return sigmoid(np.dot(self.weights, visible.T) + self.hid_bias + bias)
167 | 
168 |     def visible_expectation(self, hidden, bias=0.):
169 |         '''Given hidden states, return the expected visible unit values.'''
170 |         return self._visible(np.dot(hidden.T, self.weights) + self.vis_bias.T + bias)
171 | 
172 |     def iter_passes(self, visible):
173 |         '''Repeatedly pass the given visible layer up and then back down.
174 | 
175 |         Parameters
176 |         ----------
177 |         visible : ndarray
178 |             The initial state of the visible layer.
179 | 
180 |         Returns
181 |         -------
182 |         Generates a sequence of (visible, hidden) states. The first pair will be
183 |         the (original visible, resulting hidden) states, followed by pairs
184 |         containing the values from (visible down-pass, hidden up-pass).
185 |         '''
186 |         while True:
187 |             hidden = self.hidden_expectation(visible)
188 |             yield visible, hidden
189 |             visible = self.visible_expectation(bernoulli(hidden))
190 | 
191 |     def reconstruct(self, visible, passes=1):
192 |         '''Reconstruct a given visible layer through the hidden layer.
193 | 
194 |         Parameters
195 |         ----------
196 |         visible : ndarray
197 |             The initial state of the visible layer.
198 | 
199 |         passes : int
200 |             The number of up- and down-passes.
201 | 
202 |         Returns
203 |         -------
204 |         An array containing the reconstructed visible layer after the specified
205 |         number of up- and down- passes.
206 |         '''
207 |         for i, (visible, _) in enumerate(self.iter_passes(visible)):
208 |             if i + 1 == passes:
209 |                 return visible
210 | 
211 | 
212 | class Trainer(object):
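213 |     '''Trains an RBM using contrastive divergence with optional momentum, L2
214 |     weight decay, and a sparsity target for the hidden units.'''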
215 | 
216 |     def __init__(self, rbm, momentum=0., l2=0., target_sparsity=None):
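217 |         '''Store the training hyperparameters and allocate zeroed buffers that
218 |         hold the most recent gradient for each RBM parameter.'''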
219 |         self.rbm = rbm
220 |         self.momentum = momentum
221 |         self.l2 = l2
222 |         self.target_sparsity = target_sparsity
223 | 
224 |         self.grad_weights = np.zeros(rbm.weights.shape, float)
225 |         self.grad_vis = np.zeros(rbm.vis_bias.shape, float)
226 |         self.grad_hid = np.zeros(rbm.hid_bias.shape, float)
227 | 
228 |     def learn(self, visible, learning_rate=0.2):
229 |         '''Calculate gradients for a batch of visible data and apply them.
230 |         '''
231 |         gradients = self.calculate_gradients(visible)
232 |         self.apply_gradients(*gradients, learning_rate=learning_rate)
233 | 
234 |     def calculate_gradients(self, visible_batch):
235 |         '''Calculate gradients for a batch of visible data.
236 | 
237 |         Returns a 3-tuple of gradients: weights, visible bias, hidden bias.
238 | 
239 |         visible_batch: A (batch size, visible units) array of visible data. Each
240 |           row represents one visible data sample.
241 |         '''
242 |         passes = self.rbm.iter_passes(visible_batch)
243 |         v0, h0 = next(passes)
244 |         v1, h1 = next(passes)
245 |         gw = (np.dot(h0, v0) - np.dot(h1, v1)) / len(visible_batch)
246 |         gv = (v0 - v1).mean(axis=0)
247 |         gh = (h0 - h1).mean(axis=1)
248 |         if self.target_sparsity is not None:
249 |             gh = self.target_sparsity - h0.mean(axis=1)
250 | 
251 |         logging.debug('displacement: %.3g, hidden std: %.3g',
252 |                       np.linalg.norm(gv), h0.std(axis=1).mean())
253 |         # make sure we pass ndarrays
254 |         gv = gv.reshape(gv.shape[0],1)
255 |         gh = gh.reshape(gh.shape[0],1)
256 |         return gw, gv, gh
257 | 
258 |     def apply_gradients(self, weights, visible, hidden, learning_rate=0.2):
259 |         '''Update the RBM parameters in place using the given gradients.
260 |         '''
261 |         def update(name, g, _g, l2=0):
262 |             target = getattr(self.rbm, name)
263 |             g *= 1 - self.momentum
264 |             g += self.momentum * (_g - l2 * target)
265 |             target += learning_rate * g
266 |             _g[:] = g
267 | 
268 |         update('vis_bias', visible, self.grad_vis)
269 |         update('hid_bias', hidden, self.grad_hid)
270 |         update('weights', weights, self.grad_weights, self.l2)
271 | 
272 | 
273 | class Temporal(RBM):
274 |     '''An RBM that incorporates temporal (dynamic) visible and hidden biases.
275 | 
276 |     This RBM is based on work and code by Taylor, Hinton, and Roweis (2006).
277 |     '''
278 | 
279 |     def __init__(self, num_visible, num_hidden, order, binary=True, scale=0.001):
280 |         '''Initialize a temporal RBM that uses `order` frames of visible data.
281 |         '''
282 |         super(Temporal, self).__init__(
283 |             num_visible, num_hidden, binary=binary, scale=scale)
284 | 
285 |         self.order = order
286 | 
287 |         self.vis_dyn_bias = scale * rng.randn(order, num_visible, num_visible)
288 |         self.hid_dyn_bias = scale * rng.randn(order, num_hidden, num_visible) - 1.
289 | 
290 |     def iter_passes(self, frames):
291 |         '''Repeatedly pass the given visible layer up and then back down.
292 | 
293 |         Generates the resulting sequence of (visible, hidden) states.
294 | 
295 |         frames: An (order, visible units) array containing frames of visible
296 |           data to "prime" the network. The temporal order of the frames is
297 |           assumed to be reversed, so frames[0] will be the current visible
298 |           frame, frames[1] is the previous frame, etc.
299 |         '''
300 |         vdb = self.vis_dyn_bias[0]
301 |         vis_dyn_bias = collections.deque(
302 |             [np.dot(self.vis_dyn_bias[i], f).T for i, f in enumerate(frames)],
303 |             maxlen=self.order)
304 | 
305 |         hdb = self.hid_dyn_bias[0]
306 |         hid_dyn_bias = collections.deque(
307 |             [np.dot(self.hid_dyn_bias[i], f).T for i, f in enumerate(frames)],
308 |             maxlen=self.order)
309 | 
310 |         visible = frames[0]
311 |         while True:
312 |             hidden = self.hidden_expectation(visible, sum(hid_dyn_bias))
313 |             yield visible, hidden
314 |             visible = self.visible_expectation(bernoulli(hidden), sum(vis_dyn_bias))
315 |             vis_dyn_bias.append(np.dot(vdb, visible))
316 |             hid_dyn_bias.append(np.dot(hdb, visible))
317 | 
318 | 
319 | class TemporalTrainer(Trainer):
320 |     '''Trains a Temporal RBM, including its dynamic visible and hidden biases.
321 |     '''
322 | 
323 |     def __init__(self, rbm, momentum=0.2, l2=0.1, target_sparsity=None):
324 |         '''Set up the trainer and allocate buffers for dynamic bias gradients.
325 |         '''
326 |         super(TemporalTrainer, self).__init__(rbm, momentum, l2, target_sparsity)
327 |         self.grad_vis_dyn = np.zeros(rbm.vis_dyn_bias.shape, float)
328 |         self.grad_hid_dyn = np.zeros(rbm.hid_dyn_bias.shape, float)
329 | 
330 |     def calculate_gradients(self, frames_batch):
331 |         '''Calculate gradients using contrastive divergence.
332 | 
333 |         Returns a 5-tuple of gradients: weights, visible bias, hidden bias,
334 |         dynamic visible bias, and dynamic hidden bias.
335 | 
336 |         frames_batch: An (order, visible units, batch size) array containing a
337 |           batch of frames of visible data.
338 | 
339 |           Frames are assumed to be reversed temporally, across the order
340 |           dimension, i.e., frames_batch[0] is the current visible frame in each
341 |           batch element, frames_batch[1] is the previous visible frame,
342 |           frames_batch[2] is the one before that, etc.
343 |         '''
344 |         order, _, batch_size = frames_batch.shape
345 |         assert order == self.rbm.order
346 | 
347 |         vis_bias = sum(np.dot(self.rbm.vis_dyn_bias[i], f).T for i, f in enumerate(frames_batch))
348 |         hid_bias = sum(np.dot(self.rbm.hid_dyn_bias[i], f).T for i, f in enumerate(frames_batch))
349 | 
350 |         v0 = frames_batch[0].T
351 |         h0 = self.rbm.hidden_expectation(v0, hid_bias)
352 |         v1 = self.rbm.visible_expectation(bernoulli(h0), vis_bias)
353 |         h1 = self.rbm.hidden_expectation(v1, hid_bias)
354 | 
355 |         gw = (np.dot(h0.T, v0) - np.dot(h1.T, v1)) / batch_size
356 |         gv = (v0 - v1).mean(axis=0)
357 |         gh = (h0 - h1).mean(axis=0)
358 | 
359 |         gvd = np.zeros(self.rbm.vis_dyn_bias.shape, float)
360 |         ghd = np.zeros(self.rbm.hid_dyn_bias.shape, float)
361 |         v = v0 - self.rbm.vis_bias - vis_bias
362 |         for i, f in enumerate(frames_batch):
363 |             gvd[i] += np.dot(v.T, f)
364 |             ghd[i] += np.dot(h0.T, f)
365 |         v = v1 - self.rbm.vis_bias - vis_bias
366 |         for i, f in enumerate(frames_batch):
367 |             gvd[i] -= np.dot(v.T, f)
368 |             ghd[i] -= np.dot(h1.T, f)
369 | 
370 |         return gw, gv, gh, gvd, ghd
371 | 
372 |     def apply_gradients(self, weights, visible, hidden, visible_dyn, hidden_dyn,
373 |                         learning_rate=0.2):
374 |         '''Update the RBM parameters in place using the given gradients.
375 |         '''
376 |         def update(name, g, _g, l2=0):
377 |             target = getattr(self.rbm, name)
378 |             g *= 1 - self.momentum
379 |             g += self.momentum * (_g - l2 * target)
380 |             target += learning_rate * g
381 |             _g[:] = g
382 | 
383 |         update('vis_bias', visible, self.grad_vis)
384 |         update('hid_bias', hidden, self.grad_hid)
385 |         update('weights', weights, self.grad_weights, self.l2)
386 |         update('vis_dyn_bias', visible_dyn, self.grad_vis_dyn, self.l2)
387 |         update('hid_dyn_bias', hidden_dyn, self.grad_hid_dyn, self.l2)
388 | 
389 | 
390 | import scipy.signal
391 | convolve = scipy.signal.convolve
392 | 
393 | 
394 | class Convolutional(RBM):
395 |     '''A convolutional RBM with probabilistic max-pooling (Lee et al., 2009).
396 |     '''
397 | 
398 |     def __init__(self, num_filters, filter_shape, pool_shape, binary=True, scale=0.001):
399 |         '''Initialize a convolutional restricted boltzmann machine.
400 | 
401 |         num_filters: The number of convolution filters.
402 |         filter_shape: An ordered pair describing the shape of the filters.
403 |         pool_shape: An ordered pair describing the shape of the pooling groups.
404 |         binary: True if the visible units are binary, False if the visible units
405 |           are normally distributed.
406 |         scale: Scale initial values by this parameter.
407 |         '''
408 |         self.num_filters = num_filters
409 | 
410 |         self.weights = scale * rng.randn(num_filters, *filter_shape)
411 |         self.vis_bias = scale * rng.randn()
412 |         self.hid_bias = scale * rng.randn(num_filters)
413 | 
414 |         self._visible = sigmoid if binary else identity
415 |         self._pool_shape = pool_shape
416 | 
417 |     def _pool(self, hidden):
418 |         '''Given activity in the hidden units, pool it into groups.'''
419 |         _, r, c = hidden.shape
420 |         rows, cols = self._pool_shape
421 |         active = np.exp(hidden.T)
422 |         pool = np.zeros(active.shape, float)
423 |         for j in range(int(np.ceil(float(c) / cols))):
424 |             cslice = slice(j * cols, (j + 1) * cols)
425 |             for i in range(int(np.ceil(float(r) / rows))):
426 |                 mask = (cslice, slice(i * rows, (i + 1) * rows))
427 |                 pool[mask] = active[mask].sum(axis=0).sum(axis=0)
428 |         return pool.T
429 | 
430 |     def pooled_expectation(self, visible, bias=0.):
431 |         '''Given visible data, return the expected pooling unit values.'''
432 |         activation = np.exp(np.array([
433 |             convolve(visible, self.weights[k, ::-1, ::-1], 'valid')
434 |             for k in range(self.num_filters)]).T + self.hid_bias + bias).T
435 |         return 1. - 1. / (1. + self._pool(activation))
436 | 
437 |     def hidden_expectation(self, visible, bias=0.):
438 |         '''Given visible data, return the expected hidden unit values.'''
439 |         activation = np.exp(np.array([
440 |             convolve(visible, self.weights[k, ::-1, ::-1], 'valid')
441 |             for k in range(self.num_filters)]).T + self.hid_bias + bias).T
442 |         return activation / (1. + self._pool(activation))
443 | 
444 |     def visible_expectation(self, hidden, bias=0.):
445 |         '''Given hidden states, return the expected visible unit values.'''
446 |         activation = sum(
447 |             convolve(hidden[k], self.weights[k], 'full')
448 |             for k in range(self.num_filters)) + self.vis_bias + bias
449 |         return self._visible(activation)
450 | 
451 | 
452 | class ConvolutionalTrainer(Trainer):
453 |     '''Trains a Convolutional RBM on one array of visible data at a time.
454 |     '''
455 | 
456 |     def calculate_gradients(self, visible):
457 |         '''Calculate gradients for an instance of visible data.
458 | 
459 |         Returns a 3-tuple of gradients: weights, visible bias, hidden bias.
460 | 
461 |         visible: A single array of visible data.
462 |         '''
463 |         v0 = visible
464 |         h0 = self.rbm.hidden_expectation(v0)
465 |         v1 = self.rbm.visible_expectation(bernoulli(h0))
466 |         h1 = self.rbm.hidden_expectation(v1)
467 | 
468 |         # h0.shape == h1.shape == (num_filters, visible_rows - filter_rows + 1, visible_columns - filter_columns + 1)
469 |         # v0.shape == v1.shape == (visible_rows, visible_columns)
470 | 
471 |         gw = np.array([
472 |             convolve(v0, h0[k, ::-1, ::-1], 'valid') -
473 |             convolve(v1, h1[k, ::-1, ::-1], 'valid')
474 |             for k in range(self.rbm.num_filters)])
475 |         gv = (v0 - v1).sum()
476 |         gh = (h0 - h1).sum(axis=-1).sum(axis=-1)
477 |         if self.target_sparsity is not None:
478 |             gh = self.target_sparsity - self.rbm.hidden_expectation(visible).mean(axis=-1).mean(axis=-1)
479 | 
480 |         logging.debug('displacement: %.3g, hidden activations: %.3g',
481 |                       np.linalg.norm(gv), h0.mean(axis=-1).mean(axis=-1).std())
482 | 
483 |         return gw, gv, gh
484 | 
485 | 
486 | class MeanCovariance(RBM):
487 |     '''A mean-covariance RBM with gaussian visible units.
488 |     '''
489 | 
490 |     def __init__(self, num_visible, num_mean, num_precision, scale=0.001):
491 |         '''Initialize a mean-covariance restricted boltzmann machine.
492 | 
493 |         num_visible: The number of visible units.
494 |         num_mean: The number of units in the hidden mean vector.
495 |         num_precision: The number of units in the hidden precision vector.
496 |         '''
497 |         super(MeanCovariance, self).__init__(
498 |             num_visible, num_mean, binary=False, scale=scale)
499 | 
500 |         # replace the hidden bias to reflect the precision units.
501 |         self.hid_bias = scale * rng.randn(num_precision, 1)
502 | 
503 |         self.hid_mean = scale * rng.randn(num_mean, 1)
504 | 
505 |         self.hid_factor_u = scale * -abs(rng.randn(num_precision - 1))
506 |         self.hid_factor_c = scale * -abs(rng.randn(num_precision))
507 |         self.hid_factor_l = scale * -abs(rng.randn(num_precision - 1))
508 | 
509 |         self.vis_factor = scale * rng.randn(num_visible, num_precision)
510 | 
511 |     @property
512 |     def hid_factor(self):
513 |         return (np.diag(self.hid_factor_u, 1) +
514 |                 np.diag(self.hid_factor_c, 0) +
515 |                 np.diag(self.hid_factor_l, -1))
516 | 
517 |     def hidden_expectation(self, visible):
518 |         '''Given visible data, return the expected hidden unit values.'''
519 |         z = np.dot(visible.T, self.vis_factor)
520 |         return sigmoid(np.dot(z * z, self.hid_factor).T + self.hid_bias)
521 | 
522 |     def visible_expectation(self, hidden):
523 |         '''Given hidden states, return the expected visible unit values.'''
524 |         z = np.diag(np.dot(-self.hid_factor.T, hidden))
525 |         Sinv = np.dot(self.vis_factor, np.dot(z, self.vis_factor.T))
526 |         return np.dot(np.dot(np.linalg.pinv(Sinv), self.weights), self.hid_mean)
527 | 
528 | 
529 | class MeanCovarianceTrainer(Trainer):
530 |     '''
531 |     '''
532 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | import os
 2 | import setuptools
 3 | 
 4 | setuptools.setup(
 5 |     name='lmj.rbm',
 6 |     version='0.1.1',
 7 |     namespace_packages=['lmj'],
 8 |     packages=setuptools.find_packages(),
 9 |     author='Leif Johnson',
10 |     author_email='leif@leifjohnson.net',
11 |     description='A library of Restricted Boltzmann Machines',
12 |     long_description=open(os.path.join(os.path.dirname(os.path.abspath(__file__)), 'README.md')).read(),
13 |     license='MIT',
14 |     url='http://github.com/lmjohns3/py-rbm/',
15 |     keywords=('deep-belief-network '
16 |               'restricted-boltzmann-machine '
17 |               'machine-learning'),
18 |     install_requires=['numpy', 'scipy'],
19 |     classifiers=[
20 |         'Development Status :: 3 - Alpha',
21 |         'Intended Audience :: Science/Research',
22 |         'License :: OSI Approved :: MIT License',
23 |         'Operating System :: OS Independent',
24 |         'Topic :: Scientific/Engineering :: Artificial Intelligence',
25 |         ],
26 |     )
27 | 


--------------------------------------------------------------------------------
/test/idx_reader.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) 2011 Leif Johnson <leif@leifjohnson.net>
  2 | #
  3 | # Permission is hereby granted, free of charge, to any person obtaining a copy
  4 | # of this software and associated documentation files (the "Software"), to deal
  5 | # in the Software without restriction, including without limitation the rights
  6 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  7 | # copies of the Software, and to permit persons to whom the Software is
  8 | # furnished to do so, subject to the following conditions:
  9 | #
 10 | # The above copyright notice and this permission notice shall be included in all
 11 | # copies or substantial portions of the Software.
 12 | #
 13 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 14 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 15 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 16 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 17 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 18 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 19 | # SOFTWARE.
 20 | 
 21 | '''A Python library for reading MNIST handwritten digit database (IDX) files.'''
 22 | 
 23 | import gzip
 24 | import numpy as np
 25 | import struct
 26 | 
 27 | 
 28 | def iterimages(image_file, label_file=None, unzip=False):
 29 |     '''Iterate over labels and images from the MNIST handwritten digit dataset.
 30 | 
 31 |     This function generates (label, image) pairs, one for each image in the
 32 |     dataset. The image is represented as a numpy array of the pixel values, and
 33 |     the label is an integer (or None when no label_file is given).
 34 | 
 35 |     image_file: The name of a binary IDX file to load image data from.
 36 |     label_file: The name of a binary IDX file to load label data from.
 37 |     unzip: If True, the binary files will be gunzipped automatically before
 38 |       reading.
 39 |     '''
 40 |     opener = gzip.open if unzip else open
 41 | 
 42 |     # check the label header
 43 |     label_count = None
 44 |     if label_file:
 45 |         handle = opener(label_file, 'rb')
 46 |         label_data = handle.read()
 47 |         handle.close()
 48 |         magic, label_count = struct.unpack('>2i', label_data[:8])
 49 |         assert magic == 2049
 50 |         assert label_count > 0
 51 |         label_data = label_data[8:]
 52 | 
 53 |     # check the image header
 54 |     handle = opener(image_file, 'rb')
 55 |     image_data = handle.read()
 56 |     handle.close()
 57 |     magic, image_count, rows, columns = struct.unpack('>4i', image_data[:16])
 58 |     assert magic == 2051
 59 |     assert image_count > 0
 60 |     assert rows > 0
 61 |     assert columns > 0
 62 |     image_data = image_data[16:]
 63 | 
 64 |     # check that the two files agree on cardinality
 65 |     assert label_count is None or image_count == label_count
 66 | 
 67 |     for _ in range(image_count):
 68 |         label = None
 69 |         if label_count:
 70 |             label, = struct.unpack('B', label_data[:1])
 71 |             label_data = label_data[1:]
 72 | 
 73 |         count = rows * columns
 74 |         pixels = struct.unpack('%dB' % count, image_data[:count])
 75 |         image_data = image_data[count:]
 76 | 
 77 |         yield label, np.array(pixels).astype(float).reshape((rows, columns))
 78 | 
 79 | 
 80 | if __name__ == '__main__':
 81 |     import sys
 82 |     import glumpy
 83 | 
 84 |     iterator = iterimages(sys.argv[1], sys.argv[2], False)
 85 |     composites = [np.zeros((28, 28), 'f') for _ in range(10)]
 86 | 
 87 |     fig = glumpy.Figure()
 88 |     images_and_frames = []
 89 |     for i, c in enumerate(composites):
 90 |         frame = fig.add_figure(rows=2, cols=5, position=divmod(i, 2)).add_frame(aspect=1)
 91 |         images_and_frames.append((glumpy.Image(c), frame))
 92 | 
 93 |     @fig.event
 94 |     def on_draw():
 95 |         fig.clear()
 96 |         for image, frame in images_and_frames:
 97 |             image.update()
 98 |             frame.draw(x=frame.x, y=frame.y)
 99 |             image.draw(x=frame.x, y=frame.y, z=0, width=frame.width, height=frame.height)
100 | 
101 |     @fig.event
102 |     def on_idle(dt):
103 |         try:
104 |             label, pixels = next(iterator)
105 |         except StopIteration:
106 |             sys.exit()
107 |         composites[label] *= 0.3
108 |         composites[label] += 0.7 * pixels
109 |         fig.redraw()
110 | 
111 |     glumpy.show()
112 | 


--------------------------------------------------------------------------------
/test/mnist.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | 
  3 | # Copyright (c) 2011 Leif Johnson <leif@leifjohnson.net>
  4 | #
  5 | # Permission is hereby granted, free of charge, to any person obtaining a copy
  6 | # of this software and associated documentation files (the "Software"), to deal
  7 | # in the Software without restriction, including without limitation the rights
  8 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  9 | # copies of the Software, and to permit persons to whom the Software is
 10 | # furnished to do so, subject to the following conditions:
 11 | #
 12 | # The above copyright notice and this permission notice shall be included in all
 13 | # copies or substantial portions of the Software.
 14 | #
 15 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 16 | # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 17 | # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 18 | # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 19 | # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 20 | # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 21 | # SOFTWARE.
 22 | 
 23 | import argparse
 24 | import collections
 25 | import datetime
 26 | import glumpy
 27 | import logging
 28 | import numpy as np
 29 | import numpy.random as rng
 30 | import os
 31 | import pickle
 32 | import sys
 33 | 
 34 | import lmj.rbm
 35 | import idx_reader
 36 | 
 37 | FLAGS = argparse.ArgumentParser(
 38 |         conflict_handler='resolve',
 39 |         formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 40 | FLAGS.add_argument('--model', metavar='FILE',
 41 |                  help='load saved model from FILE')
 42 | FLAGS.add_argument('-i', '--images', metavar='FILE',
 43 |                  help='load image data from FILE')
 44 | FLAGS.add_argument('-l', '--labels', metavar='FILE',
 45 |                  help='load image labels from FILE')
 46 | FLAGS.add_argument('-g', '--gaussian', action='store_true',
 47 |                  help='use gaussian visible units')
 48 | FLAGS.add_argument('-c', '--conv', action='store_true',
 49 |                  help='use a convolutional network')
 50 | FLAGS.add_argument('-r', '--learning-rate', type=float, default=0.1, metavar='K',
 51 |                  help='use a learning rate of K')
 52 | FLAGS.add_argument('-m', '--momentum', type=float, default=0.2, metavar='K',
 53 |                  help='use a learning momentum of K')
 54 | FLAGS.add_argument('--l2', type=float, default=0.001, metavar='K',
 55 |                  help='apply L2 regularization with weight K')
 56 | FLAGS.add_argument('-p', '--sparsity', type=float, default=0.1, metavar='K',
 57 |                  help='set a target sparsity of K for hidden units')
 58 | FLAGS.add_argument('-n', '--n', type=int, default=10,
 59 |                  help='use NxN hidden units')
 60 | FLAGS.add_argument('-b', '--batch-size', type=int, default=257, metavar='N',
 61 |                  help='process N images in one minibatch')
 62 | 
 63 | 
 64 | if __name__ == '__main__':
 65 |     logging.basicConfig(
 66 |         stream=sys.stdout,
 67 |         level=logging.INFO,
 68 |         format='%(levelname).1s %(asctime)s [%(module)s:%(lineno)d] %(message)s')
 69 | 
 70 |     args = FLAGS.parse_args()
 71 | 
 72 |     _visibles = np.zeros((args.n, 28, 28), dtype=np.float32)
 73 |     _hiddens = np.zeros((args.n, args.n), dtype=np.float32)
 74 |     _weights = np.zeros((args.n * args.n, 28, 28), dtype=np.float32)
 75 | 
 76 |     fig = glumpy.figure()
 77 | 
 78 |     visibles = [glumpy.image.Image(v) for v in _visibles]
 79 |     hiddens = glumpy.image.Image(_hiddens)
 80 |     weights = [glumpy.image.Image(w) for w in _weights]
 81 | 
 82 |     visible_frames = [
 83 |         fig.add_figure(args.n + 1, args.n, position=(args.n, r)).add_frame(aspect=1)
 84 |         for r in range(args.n)]
 85 | 
 86 |     weight_frames = [
 87 |         fig.add_figure(args.n + 1, args.n, position=(c, r)).add_frame(aspect=1)
 88 |         for r in range(args.n) for c in range(args.n)]
 89 | 
 90 |     loaded = False
 91 |     recent = collections.deque(maxlen=20)
 92 |     errors = [collections.deque(maxlen=20) for _ in range(10)]
 93 |     trainset = dict((i, []) for i in range(10))
 94 |     loader = idx_reader.iterimages(args.images, args.labels, True)
 95 | 
 96 |     Model = lmj.rbm.Convolutional if args.conv else lmj.rbm.RBM
 97 |     rbm = args.model and pickle.load(open(args.model, 'rb')) or Model(
 98 |         28 * 28, args.n * args.n, not args.gaussian)
 99 | 
100 |     Trainer = lmj.rbm.ConvolutionalTrainer if args.conv else lmj.rbm.Trainer
101 |     trainer = Trainer(rbm, l2=args.l2, momentum=args.momentum, target_sparsity=args.sparsity)
102 | 
103 |     def get_pixels():
104 |         global loaded
105 |         if not loaded and all(len(v) > 50 for v in trainset.values()):
106 |             loaded = True
107 | 
108 |         if loaded:
109 |             t = rng.randint(10)
110 |             pixels = trainset[t][rng.randint(len(trainset[t]))]
111 |         else:
112 |             t, pixels = next(loader)
113 |             trainset[t].append(pixels)
114 | 
115 |         recent.append(pixels)
116 |         if len(recent) < 20:
117 |             raise RuntimeError
118 | 
119 |         return pixels
120 | 
121 |     def flatten(pixels):
122 |         if not args.gaussian:
123 |             return pixels.reshape((1, 28 * 28)) > 30.
124 |         r = np.array(recent)
125 |         mu = r.mean(axis=0)
126 |         sigma = np.clip(r.std(axis=0), 0.1, np.inf)
127 |         return ((pixels - mu) / sigma).reshape((1, 28 * 28))
128 | 
129 |     def unflatten(flat):
130 |         if not args.gaussian:
131 |             return 256. * flat.reshape((28, 28))
132 |         r = np.array(recent)
133 |         mu = r.mean(axis=0)
134 |         sigma = np.clip(r.std(axis=0), 0.1, np.inf)  # match the clipping in flatten()
135 |         return sigma * flat.reshape((28, 28)) + mu
136 | 
137 |     def learn():
138 |         batch = np.zeros((args.batch_size, 28 * 28), 'd')
139 |         for i in range(args.batch_size):
140 |             while True:
141 |                 try:
142 |                     pixels = get_pixels()
143 |                     break
144 |                 except RuntimeError:
145 |                     pass
146 |             flat = flatten(pixels)
147 |             batch[i:i+1] = flat
148 | 
149 |         trainer.learn(batch, learning_rate=args.learning_rate)
150 | 
151 |         logging.debug('mean weight: %.3g, vis bias: %.3g, hid bias: %.3g',
152 |                       rbm.weights.mean(), rbm.vis_bias.mean(), rbm.hid_bias.mean())
153 | 
154 |         return pixels, flat
155 | 
156 |     def update(pixels, flat):
157 |         for i, (v, h) in enumerate(rbm.iter_passes(flat)):
158 |             if i == len(visibles):
159 |                 break
160 |             _visibles[i] = unflatten(v)
161 |             [v.update() for v in visibles]
162 | 
163 |             _hiddens[:] = h.reshape((args.n, args.n))
164 |             hiddens.update()
165 | 
166 |         for i in range(args.n * args.n):
167 |             _weights[i] = rbm.weights[i].reshape((28, 28))
168 |         [w.update() for w in weights]
169 | 
170 |         fig.redraw()
171 | 
172 |     @fig.event
173 |     def on_draw():
174 |         fig.clear(0, 0, 0, 0)
175 |         for f in weight_frames + visible_frames:
176 |             f.draw(x=f.x, y=f.y)
177 |         for f, w in zip(weight_frames, weights):
178 |             w.draw(x=f.x, y=f.y, z=0, width=f.width, height=f.height)
179 |         for f, v in zip(visible_frames, visibles):
180 |             v.draw(x=f.x, y=f.y, z=0, width=f.width, height=f.height)
181 | 
182 |     @fig.event
183 |     def on_idle(dt):
184 |         update(*learn())
185 | 
186 |     @fig.event
187 |     def on_key_press(key, modifiers):
188 |         if key == glumpy.window.key.ESCAPE:
189 |             sys.exit()
190 |         if key == glumpy.window.key.S:
191 |             fn = datetime.datetime.now().strftime('rbm-%Y%m%d-%H%M%S.p')
192 |             pickle.dump(rbm, open(fn, 'wb'))
193 | 
194 |     glumpy.show()
195 | 


--------------------------------------------------------------------------------