├── .gitignore
├── README.md
├── doc
│   ├── dae.bib
│   └── gsn.bib
├── image_tiler.py
├── images
│   ├── 1_layer_no_walkback
│   │   ├── .directory
│   │   ├── number_reconstruction200.png
│   │   ├── samples_epoch_100.png
│   │   ├── samples_epoch_150.png
│   │   ├── samples_epoch_200.png
│   │   └── samples_epoch_50.png
│   ├── 1_layer_walkback
│   │   ├── .directory
│   │   ├── number_reconstruction120.png
│   │   ├── samples_epoch_100.png
│   │   ├── samples_epoch_120.png
│   │   ├── samples_epoch_40.png
│   │   ├── samples_epoch_60.png
│   │   └── samples_epoch_80.png
│   └── 2_layers
│       ├── .directory
│       ├── number_reconstruction250.png
│       ├── samples_epoch_100.png
│       ├── samples_epoch_25.png
│       ├── samples_epoch_250.png
│       ├── samples_epoch_5.png
│       └── samples_epoch_50.png
├── likelihood_estimation_parzen.py
├── model.py
├── run_dae_no_walkback.py
├── run_dae_walkback.py
└── run_gsn.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.zip
2 | *.pkl
3 | *.npy
4 | *.linkinfo
5 | *.o
6 | *.orig
7 | *.pyc
8 | *.pyo
9 | *.so
10 | *.sw?
11 | *~
12 | *.aux
13 | *.log
14 | *.nav
15 | *.out
16 | *.pdf
17 | *.snm
18 | *.toc
19 | *.vrb
20 | .noseids
21 | \#*\#
22 | build
23 | compiled/*.cpp
24 | core.*
25 | cutils_ext.cpp
26 | dist
27 | doc/.build/
28 | html
29 | pdf
30 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | This package contains the accompanying code for the following two papers:
2 | 
3 | * \[1\] Yoshua Bengio, Éric Thibodeau-Laufer, Jason
4 |   Yosinski. [Deep Generative Stochastic Networks Trainable by Backprop](http://arxiv.org/abs/1306.1091). _arXiv
5 |   preprint arXiv:1306.1091._ ([PDF](http://arxiv.org/pdf/1306.1091v3),
6 |   [BibTeX](https://raw.github.com/yaoli/GSN/master/doc/gsn.bib))
7 | 
8 | * \[2\] Yoshua Bengio, Li Yao, Guillaume Alain, Pascal
9 |   Vincent. [Generalized Denoising Auto-Encoders as Generative Models](http://papers.nips.cc/paper/5023-generalized-denoising-auto-encoders-as-generative-models). _NIPS,
10 |   2013._ ([PDF](http://media.nips.cc/nipsbooks/nipspapers/paper_files/nips26/491.pdf),
11 |   [BibTeX](https://raw.github.com/yaoli/GSN/master/doc/dae.bib))
12 | 
13 | 
14 | 
15 | Setup
16 | ---------------------
17 | 
18 | #### Install Theano
19 | 
20 | Download Theano and make sure it's working properly. All the
21 | information you need can be found by following this link:
22 | http://deeplearning.net/software/theano/
23 | 
24 | #### Prepare the MNIST dataset
25 | 
26 | 1. Download the MNIST dataset from http://deeplearning.net/data/mnist/mnist.pkl.gz
27 | 
28 | 2. Unzip the file to generate mnist.pkl using `gunzip mnist.pkl.gz`
29 | 
30 | 3. (Optional) To visualize MNIST, run `python image_tiler.py`
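Once `mnist.pkl` is in the repository root, the snippet below gives a quick sanity check (a minimal sketch, not part of the repository; Python 2 like the rest of the code). It loads the dataset with the same `cPickle` call that `model.py` uses and tiles the first 100 training digits with `image_tiler.tile_raster_images`:

    # Load mnist.pkl (three (X, Y) pairs: train, valid, test) and
    # save a 10x10 tile image of the first 100 training digits.
    import cPickle
    import numpy
    from PIL import Image
    from image_tiler import tile_raster_images

    (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = \
        cPickle.load(open('mnist.pkl', 'rb'))
    print train_X.shape  # (50000, 784)

    tiles = tile_raster_images(train_X[:100], img_shape=(28, 28),
                               tile_shape=(10, 10), tile_spacing=(1, 1))
    Image.fromarray(numpy.uint8(tiles)).save('mnist_check.png')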
31 | 
32 | 
33 | 
34 | Reproducing the Results
35 | ---------------------
36 | 
37 | The commands below are given in two formats: the first runs on the
38 | GPU and the second on the CPU. Choose whichever is appropriate for
39 | your setup. Of course, the GPU versions will only work if Theano is
40 | being used on a machine with a compatible GPU (more about
41 | [using the GPU in Theano](http://deeplearning.net/software/theano/tutorial/using_gpu.html)).
42 | 
43 | 1. To run a two layer Generative Stochastic Network (paper \[1\])
44 | 
45 |         THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python run_gsn.py
46 |         THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python run_gsn.py
47 | 
48 | 2. To run a one layer Generalized Denoising Autoencoder with the walkback procedure (paper \[2\])
49 | 
50 |         THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python run_dae_walkback.py
51 |         THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python run_dae_walkback.py
52 | 
53 | 3. To run a one layer Generalized Denoising Autoencoder without the walkback procedure (paper \[2\])
54 | 
55 |         THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python run_dae_no_walkback.py
56 |         THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python run_dae_no_walkback.py
57 | 
58 | 4. To get the log-likelihood estimation and inpainting results (as described in paper \[1\])
59 | 
60 |    To test a trained model (generating inpainting pictures and the 10,000
61 |    samples used by the Parzen density estimator), run the command below
62 |    in the directory containing the `params_epoch_X.pkl` and `config`
63 |    files that were generated when the model was trained. If multiple
64 |    `params_epoch_X.pkl` files are present, the one with the largest
65 |    epoch number is used.
66 | 
67 |         THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python run_gsn.py --test_model 1
68 | 
69 | 
70 | 
71 | #### Important notes on running the code
72 | 
73 | * (1), (2) and (3) generate images for both the denoising and the
74 |   pseudo-Gibbs sampling, and save the parameters every 5 epochs. We have
75 |   provided some examples of the reconstructions and generated samples
76 |   (consecutive Gibbs samples) under the directory 'images/' for the 3
77 |   types of models. Just by looking at the pictures there, it is clear
78 |   that the 2-layer model beats the 1-layer model with walkback training,
79 |   which in turn beats the 1-layer model without walkback training.
80 | 
81 | * The code is written such that it produces better results on the
82 |   log-likelihood estimated by the Parzen density estimator than in our
83 |   paper \[2\]. For example, (2) produces a log-likelihood of around
84 |   150 and (3) around 50. Both numbers could be higher if the models
85 |   were trained longer. Treat these numbers with caution: the estimate
86 |   from the Parzen density estimator is biased and tends to prefer
87 |   rigid samples, so it can be high even when the generated images do
88 |   not look good. Trust the visualizations more. (A numpy sketch of the
89 |   estimator is given at the end of this section.)
90 | 
91 | * The code prints a lot of information to the screen, which is meant to
92 |   show training progress. You can safely ignore the warning messages
93 |   from Theano. Training has started once lines like the following are
94 |   printed:
95 | 
96 |       1 Train : 0.607192 Valid : 0.367054 Test : 0.364999 time : 20.40169 MeanVisB : -0.22522 W : ['0.024063', '0.022423']
97 |       2 Train : 0.302400 Valid : 0.277827 Test : 0.277751 time : 20.33490 MeanVisB : -0.32510 W : ['0.023877', '0.022512']
98 |       3 Train : 0.292427 Valid : 0.267693 Test : 0.268585 time : 20.45896 MeanVisB : -0.38779 W : ['0.023882', '0.022544']
99 |       4 Train : 0.268086 Valid : 0.267201 Test : 0.268247 time : 20.37603 MeanVisB : -0.43271 W : ['0.023856', '0.022535']
100 |       5 Train : 0.266533 Valid : 0.264087 Test : 0.265572 time : 20.26944 MeanVisB : -0.47086 W : ['0.023840', '0.022517']
101 | 
102 |   For each training epoch, the first 3 numbers are the cost on the
103 |   training, validation, and test sets, followed by the training time in
104 |   seconds (on an Nvidia GeForce GTX 580 GPU; roughly 300 seconds per
105 |   epoch on an Intel Core i7-2600K CPU @ 3.40GHz), then the mean of the
106 |   visible bias and the mean magnitude of the weights.
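As promised in the note on the Parzen estimator above, here is the quantity that `likelihood_estimation_parzen.py` computes, reduced to a short numpy sketch (`parzen_ll` is a hypothetical standalone helper written for this README; the repository's `theano_parzen` and `numpy_parzen` are the real implementations). One isotropic Gaussian is centered on each generated sample, and each test point is scored by the log of the mean kernel density:

    import numpy

    def parzen_ll(test_x, samples, sigma):
        # pairwise standardized differences: (n_test, n_samples, dim)
        a = (test_x[:, None, :] - samples[None, :, :]) / sigma
        e = -0.5 * (a ** 2).sum(2)
        # numerically stable log-mean-exp over the kernels
        m = e.max(1)
        log_mean = m + numpy.log(numpy.exp(e - m[:, None]).mean(1))
        # log normalization constant of a dim-dimensional Gaussian
        z = samples.shape[1] * numpy.log(sigma * numpy.sqrt(2 * numpy.pi))
        return log_mean - z  # one log-likelihood value per test point

The test mode calls this machinery with sigma = 0.20 on the 10,000 generated samples; the standalone script documents `python likelihood_estimation_parzen.py 0.23 MNIST` as example usage.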
107 | 
108 | #### Contact
109 | 
110 | Questions? Contact us: li.yao@umontreal.ca
111 | 
--------------------------------------------------------------------------------
/doc/dae.bib:
--------------------------------------------------------------------------------
1 | @INPROCEEDINGS{Bengio-et-al-NIPS2013,
2 |   author = {Bengio, Yoshua and Yao, Li and Alain, Guillaume and Vincent, Pascal},
3 |   title = {Generalized Denoising Auto-Encoders as Generative Models},
4 |   year = {2013},
5 |   organization = {NIPS},
6 |   booktitle = {Advances in Neural Information Processing Systems 26 (NIPS'13)},
7 |   crossref = {NIPS26}
8 | }
9 | 
--------------------------------------------------------------------------------
/doc/gsn.bib:
--------------------------------------------------------------------------------
1 | @TECHREPORT{Bengio+Laufer+Yosinski-arxiv-2013,
2 |   author = {Yoshua Bengio and {\'E}ric Thibodeau-Laufer and Jason Yosinski},
3 |   title = {Deep Generative Stochastic Networks Trainable by Backprop},
4 |   number = {arXiv:1306.1091},
5 |   year = {2013},
6 |   institution = {Universit{\'e} de Montr{\'e}al}
7 | }
8 | 
--------------------------------------------------------------------------------
/image_tiler.py:
--------------------------------------------------------------------------------
1 | """ This file contains different utility functions that are not connected
2 | in any way to the networks presented in the tutorials, but rather help in
3 | processing the outputs into a more understandable way.
4 | 
5 | For example, ``tile_raster_images`` helps in generating an easy-to-grasp
6 | image from a set of samples or weights.
7 | 
8 | """
9 | 
10 | 
11 | import numpy, os, cPickle
12 | from PIL import Image
13 | 
14 | def load_mnist():
15 |     path = '.'
16 |     # pickle files should be opened in binary mode
17 |     data = cPickle.load(open(os.path.join(path, 'mnist.pkl'), 'rb'))
18 |     return data
19 | 
20 | def scale_to_unit_interval(ndar, eps=1e-8):
21 |     """ Scales all values in the ndarray ndar to be between 0 and 1 """
22 |     ndar = ndar.copy()
23 |     ndar -= ndar.min()
24 |     ndar *= 1.0 / (ndar.max() + eps)
25 |     return ndar
26 | 
27 | 
28 | def tile_raster_images(X, img_shape, tile_shape, tile_spacing=(0, 0),
29 |                        scale_rows_to_unit_interval=True,
30 |                        output_pixel_vals=True):
31 |     """
32 |     Transform an array with one flattened image per row, into an array in
33 |     which images are reshaped and laid out like tiles on a floor.
34 | 
35 |     This function is useful for visualizing datasets whose rows are images,
36 |     and also columns of matrices for transforming those rows
37 |     (such as the first layer of a neural net).
38 | 
39 |     :type X: a 2-D ndarray or a tuple of 4 channels, elements of which can
40 |              be 2-D ndarrays or None;
41 |     :param X: a 2-D array in which every row is a flattened image.
42 | 
43 |     :type img_shape: tuple; (height, width)
44 |     :param img_shape: the original shape of each image
45 | 
46 |     :type tile_shape: tuple; (rows, cols)
47 |     :param tile_shape: the number of images to tile (rows, cols)
48 | 
49 |     :param output_pixel_vals: if output should be pixel values (i.e. uint8
50 |                               values) or floats
51 | 
52 |     :param scale_rows_to_unit_interval: if the values need to be scaled
53 |                                         to [0,1] before being plotted or not
54 | 
55 | 
56 |     :returns: array suitable for viewing as an image.
57 |               (See: `PIL.Image.fromarray`.)
58 |     :rtype: a 2-d array with the same dtype as X.
58 | 59 | """ 60 | 61 | assert len(img_shape) == 2 62 | assert len(tile_shape) == 2 63 | assert len(tile_spacing) == 2 64 | 65 | # The expression below can be re-written in a more C style as 66 | # follows : 67 | # 68 | # out_shape = [0,0] 69 | # out_shape[0] = (img_shape[0]+tile_spacing[0])*tile_shape[0] - 70 | # tile_spacing[0] 71 | # out_shape[1] = (img_shape[1]+tile_spacing[1])*tile_shape[1] - 72 | # tile_spacing[1] 73 | out_shape = [(ishp + tsp) * tshp - tsp for ishp, tshp, tsp 74 | in zip(img_shape, tile_shape, tile_spacing)] 75 | 76 | if isinstance(X, tuple): 77 | assert len(X) == 4 78 | # Create an output numpy ndarray to store the image 79 | if output_pixel_vals: 80 | out_array = numpy.zeros((out_shape[0], out_shape[1], 4), 81 | dtype='uint8') 82 | else: 83 | out_array = numpy.zeros((out_shape[0], out_shape[1], 4), 84 | dtype=X.dtype) 85 | 86 | #colors default to 0, alpha defaults to 1 (opaque) 87 | if output_pixel_vals: 88 | channel_defaults = [0, 0, 0, 255] 89 | else: 90 | channel_defaults = [0., 0., 0., 1.] 91 | 92 | for i in xrange(4): 93 | if X[i] is None: 94 | # if channel is None, fill it with zeros of the correct 95 | # dtype 96 | dt = out_array.dtype 97 | if output_pixel_vals: 98 | dt = 'uint8' 99 | out_array[:, :, i] = numpy.zeros(out_shape, 100 | dtype=dt) + channel_defaults[i] 101 | else: 102 | # use a recurrent call to compute the channel and store it 103 | # in the output 104 | out_array[:, :, i] = tile_raster_images( 105 | X[i], img_shape, tile_shape, tile_spacing, 106 | scale_rows_to_unit_interval, output_pixel_vals) 107 | return out_array 108 | 109 | else: 110 | # if we are dealing with only one channel 111 | H, W = img_shape 112 | Hs, Ws = tile_spacing 113 | 114 | # generate a matrix to store the output 115 | dt = X.dtype 116 | if output_pixel_vals: 117 | dt = 'uint8' 118 | out_array = numpy.zeros(out_shape, dtype=dt) 119 | 120 | for tile_row in xrange(tile_shape[0]): 121 | for tile_col in xrange(tile_shape[1]): 122 | if tile_row * tile_shape[1] + tile_col < X.shape[0]: 123 | this_x = X[tile_row * tile_shape[1] + tile_col] 124 | if scale_rows_to_unit_interval: 125 | # if we should scale values to be between 0 and 1 126 | # do this by calling the `scale_to_unit_interval` 127 | # function 128 | this_img = scale_to_unit_interval( 129 | this_x.reshape(img_shape)) 130 | else: 131 | this_img = this_x.reshape(img_shape) 132 | # add the slice to the corresponding position in the 133 | # output array 134 | c = 1 135 | if output_pixel_vals: 136 | c = 255 137 | out_array[ 138 | tile_row * (H + Hs): tile_row * (H + Hs) + H, 139 | tile_col * (W + Ws): tile_col * (W + Ws) + W 140 | ] = this_img * c 141 | return out_array 142 | 143 | def visualize_mnist(): 144 | (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = load_mnist() 145 | design_matrix = train_X 146 | images = design_matrix[0:2500, :] 147 | channel_length = 28 * 28 148 | to_visualize = images 149 | 150 | image_data = tile_raster_images(to_visualize, 151 | img_shape=[28,28], 152 | tile_shape=[50,50], 153 | tile_spacing=(2,2)) 154 | im_new = Image.fromarray(numpy.uint8(image_data)) 155 | im_new.save('samples_mnist.png') 156 | os.system('eog samples_mnist.png') 157 | 158 | if __name__ == '__main__': 159 | visualize_mnist() 160 | 161 | 162 | 163 | -------------------------------------------------------------------------------- /images/1_layer_no_walkback/.directory: -------------------------------------------------------------------------------- 1 | [Dolphin] 2 | 
AdditionalInfoV2=Details_Size,Details_Date,CustomizedDetails 3 | Timestamp=2013,6,17,17,27,51 4 | Version=2 5 | ViewMode=1 6 | -------------------------------------------------------------------------------- /images/1_layer_no_walkback/number_reconstruction200.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_no_walkback/number_reconstruction200.png -------------------------------------------------------------------------------- /images/1_layer_no_walkback/samples_epoch_100.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_no_walkback/samples_epoch_100.png -------------------------------------------------------------------------------- /images/1_layer_no_walkback/samples_epoch_150.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_no_walkback/samples_epoch_150.png -------------------------------------------------------------------------------- /images/1_layer_no_walkback/samples_epoch_200.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_no_walkback/samples_epoch_200.png -------------------------------------------------------------------------------- /images/1_layer_no_walkback/samples_epoch_50.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_no_walkback/samples_epoch_50.png -------------------------------------------------------------------------------- /images/1_layer_walkback/.directory: -------------------------------------------------------------------------------- 1 | [Dolphin] 2 | ShowPreview=true 3 | Timestamp=2013,6,17,17,25,18 4 | Version=2 5 | -------------------------------------------------------------------------------- /images/1_layer_walkback/number_reconstruction120.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/number_reconstruction120.png -------------------------------------------------------------------------------- /images/1_layer_walkback/samples_epoch_100.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/samples_epoch_100.png -------------------------------------------------------------------------------- /images/1_layer_walkback/samples_epoch_120.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/samples_epoch_120.png -------------------------------------------------------------------------------- /images/1_layer_walkback/samples_epoch_40.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/samples_epoch_40.png -------------------------------------------------------------------------------- /images/1_layer_walkback/samples_epoch_60.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/samples_epoch_60.png -------------------------------------------------------------------------------- /images/1_layer_walkback/samples_epoch_80.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/samples_epoch_80.png -------------------------------------------------------------------------------- /images/2_layers/.directory: -------------------------------------------------------------------------------- 1 | [Dolphin] 2 | ShowPreview=true 3 | Timestamp=2013,6,17,17,19,21 4 | Version=2 5 | -------------------------------------------------------------------------------- /images/2_layers/number_reconstruction250.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/number_reconstruction250.png -------------------------------------------------------------------------------- /images/2_layers/samples_epoch_100.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/samples_epoch_100.png -------------------------------------------------------------------------------- /images/2_layers/samples_epoch_25.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/samples_epoch_25.png -------------------------------------------------------------------------------- /images/2_layers/samples_epoch_250.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/samples_epoch_250.png -------------------------------------------------------------------------------- /images/2_layers/samples_epoch_5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/samples_epoch_5.png -------------------------------------------------------------------------------- /images/2_layers/samples_epoch_50.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/samples_epoch_50.png -------------------------------------------------------------------------------- /likelihood_estimation_parzen.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | 4 | import sys 5 | import os 6 | import numpy 7 | import cPickle, gzip 8 | import time 9 | 10 | import theano 11 | from theano import tensor as T 12 | from model import load_mnist 13 | 14 | def local_contrast_normalization(patches): 15 | 
patches = patches.reshape((patches.shape[0], -1)) 16 | patches -= patches.mean(axis=1)[:,None] 17 | 18 | patches_std = numpy.sqrt((patches**2).mean(axis=1)) 19 | 20 | min_divisor = (2*patches_std.min() + patches_std.mean()) / 3 21 | patches /= numpy.maximum(min_divisor, patches_std).reshape((patches.shape[0],1)) 22 | 23 | return patches 24 | 25 | 26 | def log_mean_exp(a): 27 | max_ = a.max(1) 28 | 29 | return max_ + T.log(T.exp(a - max_.dimshuffle(0, 'x')).mean(1)) 30 | 31 | 32 | def theano_parzen(mu, sigma): 33 | x = T.matrix() 34 | mu = theano.shared(mu) 35 | 36 | a = ( x.dimshuffle(0, 'x', 1) - mu.dimshuffle('x', 0, 1) ) / sigma 37 | 38 | E = log_mean_exp(-0.5*(a**2).sum(2)) 39 | 40 | Z = mu.shape[1] * T.log(sigma * numpy.sqrt(numpy.pi * 2)) 41 | 42 | return theano.function([x], E - Z) 43 | 44 | 45 | def numpy_parzen(x, mu, sigma): 46 | a = ( x[:, None, :] - mu[None, :, :] ) / sigma 47 | 48 | def log_mean(i): 49 | return i.max(1) + numpy.log(numpy.exp(i - i.max(1)[:, None]).mean(1)) 50 | 51 | return log_mean(-0.5 * (a**2).sum(2)) - mu.shape[1] * numpy.log(sigma * numpy.sqrt(numpy.pi * 2)) 52 | 53 | 54 | def get_ll(x, parzen, batch_size=10): 55 | inds = range(x.shape[0]) 56 | 57 | n_batches = int(numpy.ceil(float(len(inds)) / batch_size)) 58 | 59 | times = [] 60 | lls = [] 61 | for i in range(n_batches): 62 | begin = time.time() 63 | ll = parzen(x[inds[i::n_batches]]) 64 | end = time.time() 65 | 66 | times.append(end-begin) 67 | 68 | lls.extend(ll) 69 | 70 | if i % 10 == 0: 71 | print i, numpy.mean(times), numpy.mean(lls) 72 | 73 | return lls 74 | 75 | 76 | def main(sigma, dataset, sample_path='samples.npy'): 77 | 78 | # provide a .npy file where 10k generated samples are saved. 79 | filename = sample_path 80 | 81 | print 'loading samples from %s'%filename 82 | 83 | (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = load_mnist('.') 84 | 85 | samples = numpy.load(filename) 86 | 87 | parzen = theano_parzen(samples, sigma) 88 | 89 | test_ll = get_ll(test_X, parzen) 90 | 91 | print "Mean Log-Likelihood of test set = %.5f" % numpy.mean(test_ll) 92 | print "Std of Mean Log-Likelihood of test set = %.5f" % (numpy.std(test_ll) / 100) 93 | 94 | 95 | if __name__ == "__main__": 96 | # to use it on MNIST: python likelihood_estimation_parzen.py 0.23 MNIST 97 | main(float(sys.argv[1]), sys.argv[2]) 98 | 99 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | import numpy, os, sys, cPickle 2 | import theano 3 | import theano.tensor as T 4 | import theano.sandbox.rng_mrg as RNG_MRG 5 | import PIL.Image 6 | from collections import OrderedDict 7 | from image_tiler import * 8 | import time 9 | import argparse 10 | 11 | cast32 = lambda x : numpy.cast['float32'](x) 12 | trunc = lambda x : str(x)[:8] 13 | logit = lambda p : numpy.log(p / (1 - p) ) 14 | binarize = lambda x : cast32(x >= 0.5) 15 | sigmoid = lambda x : cast32(1. 
/ (1 + numpy.exp(-x))) 16 | 17 | def SaltAndPepper(X, rate=0.3): 18 | # Salt and pepper noise 19 | 20 | drop = numpy.arange(X.shape[1]) 21 | numpy.random.shuffle(drop) 22 | sep = int(len(drop)*rate) 23 | drop = drop[:sep] 24 | X[:, drop[:sep/2]]=0 25 | X[:, drop[sep/2:]]=1 26 | return X 27 | 28 | def get_shared_weights(n_in, n_out, interval, name): 29 | #val = numpy.random.normal(0, sigma_sqr, size=(n_in, n_out)) 30 | val = numpy.random.uniform(-interval, interval, size=(n_in, n_out)) 31 | val = cast32(val) 32 | val = theano.shared(value = val, name = name) 33 | return val 34 | 35 | def get_shared_bias(n, name, offset = 0): 36 | val = numpy.zeros(n) - offset 37 | val = cast32(val) 38 | val = theano.shared(value = val, name = name) 39 | return val 40 | 41 | def load_mnist(path): 42 | data = cPickle.load(open(os.path.join(path,'mnist.pkl'), 'r')) 43 | return data 44 | 45 | def load_mnist_binary(path): 46 | data = cPickle.load(open(os.path.join(path,'mnist.pkl'), 'r')) 47 | data = [list(d) for d in data] 48 | data[0][0] = (data[0][0] > 0.5).astype('float32') 49 | data[1][0] = (data[1][0] > 0.5).astype('float32') 50 | data[2][0] = (data[2][0] > 0.5).astype('float32') 51 | data = tuple([tuple(d) for d in data]) 52 | return data 53 | 54 | def load_tfd(path): 55 | import scipy.io as io 56 | data = io.loadmat(os.path.join(path, 'TFD_48x48.mat')) 57 | X = cast32(data['images'])/cast32(255) 58 | X = X.reshape((X.shape[0], X.shape[1] * X.shape[2])) 59 | labels = data['labs_ex'].flatten() 60 | labeled = labels != -1 61 | unlabeled = labels == -1 62 | train_X = X[unlabeled] 63 | valid_X = X[unlabeled][:100] # Stuf 64 | test_X = X[labeled] 65 | 66 | del data 67 | 68 | return (train_X, labels[unlabeled]), (valid_X, labels[unlabeled][:100]), (test_X, labels[labeled]) 69 | 70 | def experiment(state, channel): 71 | if state.test_model and 'config' in os.listdir('.'): 72 | print 'Loading local config file' 73 | config_file = open('config', 'r') 74 | config = config_file.readlines() 75 | try: 76 | config_vals = config[0].split('(')[1:][0].split(')')[:-1][0].split(', ') 77 | except: 78 | config_vals = config[0][3:-1].replace(': ','=').replace("'","").split(', ') 79 | config_vals = filter(lambda x:not 'jobman' in x and not '/' in x and not ':' in x and not 'experiment' in x, config_vals) 80 | 81 | for CV in config_vals: 82 | print CV 83 | if CV.startswith('test'): 84 | print 'Do not override testing switch' 85 | continue 86 | try: 87 | exec('state.'+CV) in globals(), locals() 88 | except: 89 | exec('state.'+CV.split('=')[0]+"='"+CV.split('=')[1]+"'") in globals(), locals() 90 | 91 | 92 | 93 | else: 94 | # Save the current configuration 95 | # Useful for logs/experiments 96 | print 'Saving config' 97 | f = open('config', 'w') 98 | f.write(str(state)) 99 | f.close() 100 | 101 | 102 | print state 103 | # Load the data, train = train+valid, and shuffle train 104 | # Targets are not used (will be misaligned after shuffling train 105 | if state.dataset == 'MNIST': 106 | (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = load_mnist(state.data_path) 107 | train_X = numpy.concatenate((train_X, valid_X)) 108 | 109 | elif state.dataset == 'MNIST_binary': 110 | (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = load_mnist_binary(state.data_path) 111 | train_X = numpy.concatenate((train_X, valid_X)) 112 | 113 | elif state.dataset == 'TFD': 114 | (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = load_tfd(state.data_path) 115 | 116 | N_input = train_X.shape[1] 117 | root_N_input = numpy.sqrt(N_input) 
118 | numpy.random.seed(1) 119 | numpy.random.shuffle(train_X) 120 | train_X = theano.shared(train_X) 121 | valid_X = theano.shared(valid_X) 122 | test_X = theano.shared(test_X) 123 | 124 | # Theano variables and RNG 125 | X = T.fmatrix() 126 | index = T.lscalar() 127 | MRG = RNG_MRG.MRG_RandomStreams(1) 128 | 129 | # Network and training specifications 130 | K = state.K # N hidden layers 131 | N = state.N # number of walkbacks 132 | layer_sizes = [N_input] + [state.hidden_size] * K # layer sizes, from h0 to hK (h0 is the visible layer) 133 | learning_rate = theano.shared(cast32(state.learning_rate)) # learning rate 134 | annealing = cast32(state.annealing) # exponential annealing coefficient 135 | momentum = theano.shared(cast32(state.momentum)) # momentum term 136 | 137 | # THEANO VARIABLES 138 | X = T.fmatrix() # Input of the graph 139 | index = T.lscalar() # index to minibatch 140 | MRG = RNG_MRG.MRG_RandomStreams(1) 141 | 142 | 143 | # PARAMETERS : weights list and bias list. 144 | # initialize a list of weights and biases based on layer_sizes 145 | weights_list = [get_shared_weights(layer_sizes[i], layer_sizes[i+1], numpy.sqrt(6. / (layer_sizes[i] + layer_sizes[i+1] )), 'W') for i in range(K)] 146 | bias_list = [get_shared_bias(layer_sizes[i], 'b') for i in range(K + 1)] 147 | 148 | if state.test_model: 149 | # Load the parameters of the last epoch 150 | # maybe if the path is given, load these specific attributes 151 | param_files = filter(lambda x:'params' in x, os.listdir('.')) 152 | max_epoch_idx = numpy.argmax([int(x.split('_')[-1].split('.')[0]) for x in param_files]) 153 | params_to_load = param_files[max_epoch_idx] 154 | PARAMS = cPickle.load(open(params_to_load,'r')) 155 | [p.set_value(lp.get_value(borrow=False)) for lp, p in zip(PARAMS[:len(weights_list)], weights_list)] 156 | [p.set_value(lp.get_value(borrow=False)) for lp, p in zip(PARAMS[len(weights_list):], bias_list)] 157 | 158 | # Util functions 159 | def dropout(IN, p = 0.5): 160 | noise = MRG.binomial(p = p, n = 1, size = IN.shape, dtype='float32') 161 | OUT = (IN * noise) / cast32(p) 162 | return OUT 163 | 164 | def add_gaussian_noise(IN, std = 1): 165 | print 'GAUSSIAN NOISE : ', std 166 | noise = MRG.normal(avg = 0, std = std, size = IN.shape, dtype='float32') 167 | OUT = IN + noise 168 | return OUT 169 | 170 | def corrupt_input(IN, p = 0.5): 171 | # salt and pepper? masking? 172 | noise = MRG.binomial(p = p, n = 1, size = IN.shape, dtype='float32') 173 | IN = IN * noise 174 | return IN 175 | 176 | def salt_and_pepper(IN, p = 0.2): 177 | # salt and pepper noise 178 | print 'DAE uses salt and pepper noise' 179 | a = MRG.binomial(size=IN.shape, n=1, 180 | p = 1 - p, 181 | dtype='float32') 182 | b = MRG.binomial(size=IN.shape, n=1, 183 | p = 0.5, 184 | dtype='float32') 185 | c = T.eq(a,0) * b 186 | return IN * a + c 187 | 188 | # Odd layer update function 189 | # just a loop over the odd layers 190 | def update_odd_layers(hiddens, noisy): 191 | for i in range(1, K+1, 2): 192 | print i 193 | if noisy: 194 | simple_update_layer(hiddens, None, i) 195 | else: 196 | simple_update_layer(hiddens, None, i, add_noise = False) 197 | 198 | # Even layer update 199 | # p_X_chain is given to append the p(X|...) 
at each update (one update = odd update + even update) 200 | def update_even_layers(hiddens, p_X_chain, noisy): 201 | for i in range(0, K+1, 2): 202 | print i 203 | if noisy: 204 | simple_update_layer(hiddens, p_X_chain, i) 205 | else: 206 | simple_update_layer(hiddens, p_X_chain, i, add_noise = False) 207 | 208 | # The layer update function 209 | # hiddens : list containing the symbolic theano variables [visible, hidden1, hidden2, ...] 210 | # layer_update will modify this list inplace 211 | # p_X_chain : list containing the successive p(X|...) at each update 212 | # update_layer will append to this list 213 | # add_noise : pre and post activation gaussian noise 214 | 215 | def simple_update_layer(hiddens, p_X_chain, i, add_noise=True): 216 | # Compute the dot product, whatever layer 217 | post_act_noise = 0 218 | 219 | if i == 0: 220 | hiddens[i] = T.dot(hiddens[i+1], weights_list[i].T) + bias_list[i] 221 | 222 | elif i == K: 223 | hiddens[i] = T.dot(hiddens[i-1], weights_list[i-1]) + bias_list[i] 224 | 225 | else: 226 | # next layer : layers[i+1], assigned weights : W_i 227 | # previous layer : layers[i-1], assigned weights : W_(i-1) 228 | hiddens[i] = T.dot(hiddens[i+1], weights_list[i].T) + T.dot(hiddens[i-1], weights_list[i-1]) + bias_list[i] 229 | 230 | # Add pre-activation noise if NOT input layer 231 | if i==1 and state.noiseless_h1: 232 | print '>>NO noise in first layer' 233 | add_noise = False 234 | 235 | # pre activation noise 236 | if i != 0 and add_noise: 237 | print 'Adding pre-activation gaussian noise' 238 | hiddens[i] = add_gaussian_noise(hiddens[i], state.hidden_add_noise_sigma) 239 | 240 | # ACTIVATION! 241 | if i == 0: 242 | print 'Sigmoid units' 243 | hiddens[i] = T.nnet.sigmoid(hiddens[i]) 244 | else: 245 | print 'Hidden units' 246 | hiddens[i] = hidden_activation(hiddens[i]) 247 | 248 | # post activation noise 249 | if i != 0 and add_noise: 250 | print 'Adding post-activation gaussian noise' 251 | hiddens[i] = add_gaussian_noise(hiddens[i], state.hidden_add_noise_sigma) 252 | 253 | # build the reconstruction chain 254 | if i == 0: 255 | # if input layer -> append p(X|...) 256 | p_X_chain.append(hiddens[i]) 257 | 258 | # sample from p(X|...) 
259 | if state.input_sampling: 260 | print 'Sampling from input' 261 | sampled = MRG.binomial(p = hiddens[i], size=hiddens[i].shape, dtype='float32') 262 | else: 263 | print '>>NO input sampling' 264 | sampled = hiddens[i] 265 | # add noise 266 | sampled = salt_and_pepper(sampled, state.input_salt_and_pepper) 267 | 268 | # set input layer 269 | hiddens[i] = sampled 270 | 271 | def update_layers(hiddens, p_X_chain, noisy = True): 272 | print 'odd layer update' 273 | update_odd_layers(hiddens, noisy) 274 | print 275 | print 'even layer update' 276 | update_even_layers(hiddens, p_X_chain, noisy) 277 | 278 | 279 | ''' F PROP ''' 280 | #X = T.fmatrix() 281 | if state.act == 'sigmoid': 282 | print 'Using sigmoid activation' 283 | hidden_activation = T.nnet.sigmoid 284 | elif state.act == 'rectifier': 285 | print 'Using rectifier activation' 286 | hidden_activation = lambda x : T.maximum(cast32(0), x) 287 | elif state.act == 'tanh': 288 | hidden_activation = lambda x : T.tanh(x) 289 | 290 | 291 | ''' Corrupt X ''' 292 | X_corrupt = salt_and_pepper(X, state.input_salt_and_pepper) 293 | 294 | ''' hidden layer init ''' 295 | 296 | hiddens = [X_corrupt] 297 | p_X_chain = [] 298 | print "Hidden units initialization" 299 | for w,b in zip(weights_list, bias_list[1:]): 300 | # init with zeros 301 | print "Init hidden units at zero before creating the graph" 302 | hiddens.append(T.zeros_like(T.dot(hiddens[-1], w))) 303 | 304 | # The layer update scheme 305 | print "Building the graph :", N,"updates" 306 | for i in range(N): 307 | update_layers(hiddens, p_X_chain) 308 | 309 | 310 | # COST AND GRADIENTS 311 | 312 | print 'Cost w.r.t p(X|...) at every step in the graph' 313 | #COST = T.mean(T.nnet.binary_crossentropy(reconstruction, X)) 314 | COST = [T.mean(T.nnet.binary_crossentropy(rX, X)) for rX in p_X_chain] 315 | show_COST = COST[-1] 316 | COST = numpy.sum(COST) 317 | 318 | params = weights_list + bias_list 319 | 320 | gradient = T.grad(COST, params) 321 | 322 | gradient_buffer = [theano.shared(numpy.zeros(x.get_value().shape, dtype='float32')) for x in params] 323 | 324 | m_gradient = [momentum * gb + (cast32(1) - momentum) * g for (gb, g) in zip(gradient_buffer, gradient)] 325 | g_updates = [(p, p - learning_rate * mg) for (p, mg) in zip(params, m_gradient)] 326 | b_updates = zip(gradient_buffer, m_gradient) 327 | 328 | updates = OrderedDict(g_updates + b_updates) 329 | 330 | f_cost = theano.function(inputs = [X], outputs = show_COST) 331 | 332 | indexed_batch = train_X[index * state.batch_size : (index+1) * state.batch_size] 333 | sampled_batch = MRG.binomial(p = indexed_batch, size = indexed_batch.shape, dtype='float32') 334 | 335 | f_learn = theano.function(inputs = [index], 336 | updates = updates, 337 | givens = {X : indexed_batch}, 338 | outputs = show_COST) 339 | 340 | f_test = theano.function(inputs = [X], 341 | outputs = [X_corrupt] + hiddens[0] + p_X_chain, 342 | on_unused_input = 'warn') 343 | 344 | 345 | ############# 346 | # Denoise some numbers : show number, noisy number, reconstructed number 347 | ############# 348 | import random as R 349 | R.seed(1) 350 | random_idx = numpy.array(R.sample(range(len(test_X.get_value())), 100)) 351 | numbers = test_X.get_value()[random_idx] 352 | 353 | f_noise = theano.function(inputs = [X], outputs = salt_and_pepper(X, state.input_salt_and_pepper)) 354 | noisy_numbers = f_noise(test_X.get_value()[random_idx]) 355 | 356 | # Recompile the graph without noise for reconstruction function 357 | hiddens_R = [X] 358 | p_X_chain_R = [] 359 | 360 | for w,b in 
zip(weights_list, bias_list[1:]): 361 | # init with zeros 362 | hiddens_R.append(T.zeros_like(T.dot(hiddens_R[-1], w))) 363 | 364 | # The layer update scheme 365 | for i in range(N): 366 | update_layers(hiddens_R, p_X_chain_R, noisy=False) 367 | 368 | f_recon = theano.function(inputs = [X], outputs = p_X_chain_R[-1]) 369 | 370 | 371 | ############ 372 | # Sampling # 373 | ############ 374 | 375 | # the input to the sampling function 376 | network_state_input = [X] + [T.fmatrix() for i in range(K)] 377 | 378 | # "Output" state of the network (noisy) 379 | # initialized with input, then we apply updates 380 | #network_state_output = network_state_input 381 | 382 | network_state_output = [X] + network_state_input[1:] 383 | 384 | visible_pX_chain = [] 385 | 386 | # ONE update 387 | update_layers(network_state_output, visible_pX_chain, noisy=True) 388 | 389 | if K == 1: 390 | f_sample_simple = theano.function(inputs = [X], outputs = visible_pX_chain[-1]) 391 | 392 | 393 | # WHY IS THERE A WARNING???? 394 | # because the first odd layers are not used -> directly computed FROM THE EVEN layers 395 | # unused input = warn 396 | f_sample2 = theano.function(inputs = network_state_input, outputs = network_state_output + visible_pX_chain, on_unused_input='warn') 397 | 398 | def sample_some_numbers_single_layer(): 399 | x0 = test_X.get_value()[:1] 400 | samples = [x0] 401 | x = f_noise(x0) 402 | for i in range(399): 403 | x = f_sample_simple(x) 404 | samples.append(x) 405 | x = numpy.random.binomial(n=1, p=x, size=x.shape).astype('float32') 406 | x = f_noise(x) 407 | return numpy.vstack(samples) 408 | 409 | def sampling_wrapper(NSI): 410 | out = f_sample2(*NSI) 411 | NSO = out[:len(network_state_output)] 412 | vis_pX_chain = out[len(network_state_output):] 413 | return NSO, vis_pX_chain 414 | 415 | def sample_some_numbers(N=400): 416 | # The network's initial state 417 | init_vis = test_X.get_value()[:1] 418 | 419 | noisy_init_vis = f_noise(init_vis) 420 | 421 | network_state = [[noisy_init_vis] + [numpy.zeros((1,len(b.get_value())), dtype='float32') for b in bias_list[1:]]] 422 | 423 | visible_chain = [init_vis] 424 | 425 | noisy_h0_chain = [noisy_init_vis] 426 | 427 | for i in range(N-1): 428 | 429 | # feed the last state into the network, compute new state, and obtain visible units expectation chain 430 | net_state_out, vis_pX_chain = sampling_wrapper(network_state[-1]) 431 | 432 | # append to the visible chain 433 | visible_chain += vis_pX_chain 434 | 435 | # append state output to the network state chain 436 | network_state.append(net_state_out) 437 | 438 | noisy_h0_chain.append(net_state_out[0]) 439 | 440 | return numpy.vstack(visible_chain), numpy.vstack(noisy_h0_chain) 441 | 442 | def plot_samples(epoch_number): 443 | to_sample = time.time() 444 | if K == 1: 445 | # one layer model 446 | V = sample_some_numbers_single_layer() 447 | else: 448 | V, H0 = sample_some_numbers() 449 | img_samples = PIL.Image.fromarray(tile_raster_images(V, (root_N_input,root_N_input), (20,20))) 450 | 451 | fname = 'samples_epoch_'+str(epoch_number)+'.png' 452 | img_samples.save(fname) 453 | print 'Took ' + str(time.time() - to_sample) + ' to sample 400 numbers' 454 | 455 | ############## 456 | # Inpainting # 457 | ############## 458 | def inpainting(digit): 459 | # The network's initial state 460 | 461 | # NOISE INIT 462 | init_vis = cast32(numpy.random.uniform(size=digit.shape)) 463 | 464 | #noisy_init_vis = f_noise(init_vis) 465 | #noisy_init_vis = cast32(numpy.random.uniform(size=init_vis.shape)) 466 | 467 | # 
INDEXES FOR VISIBLE AND NOISY PART
468 |         noise_idx = (numpy.arange(N_input) % root_N_input < (root_N_input/2))
469 |         fixed_idx = (numpy.arange(N_input) % root_N_input > (root_N_input/2))
470 |         # function to re-init the visible to the same noise
471 | 
472 |         # FUNCTION TO RESET HALF VISIBLE TO DIGIT
473 |         def reset_vis(V):
474 |             V[0][fixed_idx] = digit[0][fixed_idx]
475 |             return V
476 | 
477 |         # INIT DIGIT : NOISE and RESET HALF TO DIGIT
478 |         init_vis = reset_vis(init_vis)
479 | 
480 |         network_state = [[init_vis] + [numpy.zeros((1,len(b.get_value())), dtype='float32') for b in bias_list[1:]]]
481 | 
482 |         visible_chain = [init_vis]
483 | 
484 |         noisy_h0_chain = [init_vis]
485 | 
486 |         for i in range(49):
487 | 
488 |             # feed the last state into the network, compute the new state, and obtain the visible units expectation chain
489 |             net_state_out, vis_pX_chain = sampling_wrapper(network_state[-1])
490 | 
491 | 
492 |             # reset half the digit
493 |             net_state_out[0] = reset_vis(net_state_out[0])
494 |             vis_pX_chain[0] = reset_vis(vis_pX_chain[0])
495 | 
496 |             # append to the visible chain
497 |             visible_chain += vis_pX_chain
498 | 
499 |             # append state output to the network state chain
500 |             network_state.append(net_state_out)
501 | 
502 |             noisy_h0_chain.append(net_state_out[0])
503 | 
504 |         return numpy.vstack(visible_chain), numpy.vstack(noisy_h0_chain)
505 | 
506 | 
507 | 
508 | 
509 | 
510 |     def save_params(n, params):
511 |         print 'saving parameters...'
512 |         save_path = 'params_epoch_'+str(n)+'.pkl'
513 |         f = open(save_path, 'wb')
514 |         try:
515 |             cPickle.dump(params, f, protocol=cPickle.HIGHEST_PROTOCOL)
516 |         finally:
517 |             f.close()
518 | 
519 |     # TRAINING
520 |     n_epoch = state.n_epoch
521 |     batch_size = state.batch_size
522 |     STOP = False
523 |     counter = 0
524 | 
525 |     train_costs = []
526 |     valid_costs = []
527 |     test_costs = []
528 | 
529 |     if state.vis_init:
530 |         bias_list[0].set_value(logit(numpy.clip(train_X.get_value().mean(axis=0), 0.001, 0.9)))
531 | 
532 |     if state.test_model:
533 |         # If testing, do not train and go directly to generating samples, parzen window estimation, and inpainting
534 |         print 'Testing : skip training'
535 |         STOP = True
536 | 
537 | 
538 |     while not STOP:
539 |         counter += 1
540 |         t = time.time()
541 |         print counter,'\t',
542 | 
543 |         #train
544 |         train_cost = []
545 |         for i in range(len(train_X.get_value(borrow=True)) / batch_size):
546 |             #train_cost.append(f_learn(train_X[i * batch_size : (i+1) * batch_size]))
547 |             #training_idx = numpy.array(range(i*batch_size, (i+1)*batch_size), dtype='int32')
548 |             train_cost.append(f_learn(i))
549 |         train_cost = numpy.mean(train_cost)
550 |         train_costs.append(train_cost)
551 |         print 'Train : ',trunc(train_cost), '\t',
552 | 
553 | 
554 |         #valid, evaluated in minibatches of 100
555 |         valid_cost = []
556 |         for i in range(len(valid_X.get_value(borrow=True)) / 100):
557 |             valid_cost.append(f_cost(valid_X.get_value()[i * 100 : (i+1) * 100]))
558 |         valid_cost = numpy.mean(valid_cost)
559 |         #valid_cost = 123
560 |         valid_costs.append(valid_cost)
561 |         print 'Valid : ', trunc(valid_cost), '\t',
562 | 
563 |         #test, evaluated in minibatches of 100
564 |         test_cost = []
565 |         for i in range(len(test_X.get_value(borrow=True)) / 100):
566 |             test_cost.append(f_cost(test_X.get_value()[i * 100 : (i+1) * 100]))
567 |         test_cost = numpy.mean(test_cost)
568 |         test_costs.append(test_cost)
569 |         print 'Test : ', trunc(test_cost), '\t',
570 | 
571 |         if counter >= n_epoch:
572 |             STOP = True
573 | 
574 |         print 'time : ', trunc(time.time() - t),
575 | 
576 |         print 'MeanVisB : ', trunc(bias_list[0].get_value().mean()),
577 | 
578 |         print 'W : ', [trunc(abs(w.get_value(borrow=True)).mean()) for w in weights_list]
579 | 
580 |         if (counter % 5) == 0:
581 |             # Checking reconstruction: rows of original / noisy / denoised digits
582 |             reconstructed = f_recon(noisy_numbers)
583 |             # Stack rows of 10: original, noisy, reconstructed
584 |             stacked = numpy.vstack([numpy.vstack([numbers[i*10 : (i+1)*10], noisy_numbers[i*10 : (i+1)*10], reconstructed[i*10 : (i+1)*10]]) for i in range(10)])
585 | 
586 |             number_reconstruction = PIL.Image.fromarray(tile_raster_images(stacked, (root_N_input,root_N_input), (10,30)))
587 |             #epoch_number = reduce(lambda x,y : x + y, ['_'] * (4-len(str(counter)))) + str(counter)
588 |             number_reconstruction.save('number_reconstruction'+str(counter)+'.png')
589 | 
590 |             #sample_numbers(counter, 'seven')
591 |             plot_samples(counter)
592 | 
593 |             #save params
594 |             save_params(counter, params)
595 | 
596 |         # ANNEAL!
597 |         new_lr = learning_rate.get_value() * annealing
598 |         learning_rate.set_value(new_lr)
599 | 
600 |     # Save
601 |     state.train_costs = train_costs
602 |     state.valid_costs = valid_costs
603 |     state.test_costs = test_costs
604 | 
605 |     # If testing: generate samples, run the Parzen estimator, and inpaint
606 | 
607 |     # 10k samples
608 |     print 'Generating 10,000 samples'
609 |     samples, _ = sample_some_numbers(N=10000)
610 |     f_samples = 'samples.npy'
611 |     numpy.save(f_samples, samples)
612 |     print 'saved digits'
613 | 
614 | 
615 |     # parzen
616 |     print 'Evaluating parzen window'
617 |     import likelihood_estimation_parzen
618 |     likelihood_estimation_parzen.main(0.20,'mnist')
619 | 
620 |     # Inpainting
621 |     print 'Inpainting'
622 |     test_X = test_X.get_value()
623 | 
624 |     numpy.random.seed(2)
625 |     test_idx = numpy.arange(len(test_Y))
626 | 
627 |     for Iter in range(10):
628 | 
629 |         numpy.random.shuffle(test_idx)
630 |         test_X = test_X[test_idx]
631 |         test_Y = test_Y[test_idx]
632 | 
633 |         digit_idx = [(test_Y==i).argmax() for i in range(10)]
634 |         inpaint_list = []
635 | 
636 |         for idx in digit_idx:
637 |             DIGIT = test_X[idx:idx+1]
638 |             V_inpaint, H_inpaint = inpainting(DIGIT)
639 |             inpaint_list.append(V_inpaint)
640 | 
641 |         INPAINTING = numpy.vstack(inpaint_list)
642 | 
643 |         plot_inpainting = PIL.Image.fromarray(tile_raster_images(INPAINTING, (root_N_input,root_N_input), (10,50)))
644 | 
645 |         fname = 'inpainting_'+str(Iter)+'.png'
646 |         #fname = os.path.join(state.model_path, fname)
647 | 
648 |         plot_inpainting.save(fname)
649 | 
650 |         if False and __name__ == "__main__":
651 |             os.system('eog inpainting.png')
652 | 
653 | 
654 | 
655 | 
656 | if __name__ == '__main__':
657 |     # model.py is driven by the run_*.py scripts; running it directly
658 |     # only drops into a debugger for interactive exploration.
659 |     import ipdb; ipdb.set_trace()

--------------------------------------------------------------------------------
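Before moving on to the run scripts: the salt-and-pepper corruption that model.py applies to the inputs (`salt_and_pepper`, rate set by `--input_salt_and_pepper`) is compact enough to restate outside Theano. The helper below is a hypothetical numpy transcription of the same two-binomial construction, not part of the repository:

    import numpy

    def salt_and_pepper_numpy(X, p=0.4, rng=numpy.random):
        # keep[i,j] == 1 with probability 1-p: those pixels survive untouched
        keep = rng.binomial(n=1, p=1 - p, size=X.shape)
        # corrupted pixels are reset to 0 or 1 with equal probability
        coin = rng.binomial(n=1, p=0.5, size=X.shape)
        return X * keep + (1 - keep) * coin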
/run_dae_no_walkback.py:
--------------------------------------------------------------------------------
1 | """
2 | This script trains a single layer model known as a Generalized Denoising Auto-Encoder
3 | WITHOUT the walkback training procedure.
4 | 
5 | Reference paper:
6 | 'Generalized Denoising Auto-Encoders as Generative Models'
7 | Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent
8 | http://arxiv.org/abs/1305.6663
9 | """
10 | import argparse
11 | import model
12 | 
13 | def main():
14 |     parser = argparse.ArgumentParser()
15 |     # Add options here
16 |     parser.add_argument('--K', type=int, default=1)  # number of hidden layers
17 |     parser.add_argument('--N', type=int, default=1)  # number of walkbacks
18 |     parser.add_argument('--n_epoch', type=int, default=200)
19 |     parser.add_argument('--batch_size', type=int, default=100)
20 |     parser.add_argument('--hidden_add_noise_sigma', type=float, default=0)
21 |     parser.add_argument('--input_salt_and_pepper', type=float, default=0.4)
22 |     parser.add_argument('--learning_rate', type=float, default=10)
23 |     parser.add_argument('--momentum', type=float, default=0.)
24 |     parser.add_argument('--annealing', type=float, default=1.)
25 |     parser.add_argument('--hidden_size', type=int, default=2000)
26 |     parser.add_argument('--act', type=str, default='sigmoid')
27 |     parser.add_argument('--dataset', type=str, default='MNIST_binary')
28 |     parser.add_argument('--data_path', type=str, default='.')
29 | 
30 |     # argparse does not deal with bool
31 |     parser.add_argument('--vis_init', type=int, default=0)
32 |     parser.add_argument('--noiseless_h1', type=int, default=1)
33 |     parser.add_argument('--input_sampling', type=int, default=1)
34 |     parser.add_argument('--test_model', type=int, default=0)
35 | 
36 |     args = parser.parse_args()
37 | 
38 |     model.experiment(args, None)
39 | 
40 | if __name__ == '__main__':
41 |     main()

--------------------------------------------------------------------------------
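All three run scripts accept `--test_model 1`. In that mode `model.experiment` skips training and loads the newest checkpoint; its selection logic amounts to the following sketch (same `params_epoch_<n>.pkl` naming convention as model.py):

    import os
    import numpy

    # Pick the params_epoch_X.pkl with the largest epoch number X,
    # exactly as model.experiment() does before testing.
    param_files = [f for f in os.listdir('.') if 'params' in f]
    epochs = [int(f.split('_')[-1].split('.')[0]) for f in param_files]
    newest = param_files[numpy.argmax(epochs)]
    print 'would load', newest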
/run_dae_walkback.py:
--------------------------------------------------------------------------------
1 | """
2 | This script trains a single layer model known as a Generalized Denoising Auto-Encoder
3 | WITH 5 steps of 'walkback' training.
4 | 
5 | Reference paper:
6 | 'Generalized Denoising Auto-Encoders as Generative Models'
7 | Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent
8 | http://arxiv.org/abs/1305.6663
9 | 
10 | """
11 | import argparse
12 | import model
13 | 
14 | def main():
15 |     parser = argparse.ArgumentParser()
16 |     # Add options here
17 |     parser.add_argument('--K', type=int, default=1)  # number of hidden layers
18 |     parser.add_argument('--N', type=int, default=5)  # number of walkbacks
19 |     parser.add_argument('--n_epoch', type=int, default=500)
20 |     parser.add_argument('--batch_size', type=int, default=100)
21 |     parser.add_argument('--hidden_add_noise_sigma', type=float, default=0)
22 |     parser.add_argument('--input_salt_and_pepper', type=float, default=0.4)
23 |     parser.add_argument('--learning_rate', type=float, default=10)
24 |     parser.add_argument('--momentum', type=float, default=0.)
25 |     parser.add_argument('--annealing', type=float, default=1.)
26 |     parser.add_argument('--hidden_size', type=int, default=2000)
27 |     parser.add_argument('--act', type=str, default='sigmoid')
28 |     parser.add_argument('--dataset', type=str, default='MNIST_binary')
29 |     parser.add_argument('--data_path', type=str, default='.')
30 | 
31 |     # argparse does not deal with bool
32 |     parser.add_argument('--vis_init', type=int, default=0)
33 |     parser.add_argument('--noiseless_h1', type=int, default=1)
34 |     parser.add_argument('--input_sampling', type=int, default=1)
35 |     parser.add_argument('--test_model', type=int, default=0)
36 | 
37 |     args = parser.parse_args()
38 | 
39 |     model.experiment(args, None)
40 | 
41 | if __name__ == '__main__':
42 |     main()

--------------------------------------------------------------------------------
/run_gsn.py:
--------------------------------------------------------------------------------
1 | '''
2 | This script produces the model trained on MNIST discussed in the paper:
3 | 
4 | 'Deep Generative Stochastic Networks Trainable by Backprop'
5 | Yoshua Bengio, Eric Thibodeau-Laufer, Jason Yosinski
6 | http://arxiv.org/abs/1306.1091
7 | '''
8 | import argparse
9 | import model
10 | 
11 | def main():
12 |     parser = argparse.ArgumentParser()
13 |     # Add options here
14 | 
15 |     parser.add_argument('--K', type=int, default=2)  # number of hidden layers
16 |     parser.add_argument('--N', type=int, default=4)  # number of walkbacks
17 |     parser.add_argument('--n_epoch', type=int, default=1000)
18 |     parser.add_argument('--batch_size', type=int, default=100)
19 |     parser.add_argument('--hidden_add_noise_sigma', type=float, default=2)
20 |     parser.add_argument('--input_salt_and_pepper', type=float, default=0.4)
21 |     parser.add_argument('--learning_rate', type=float, default=0.25)
22 |     parser.add_argument('--momentum', type=float, default=0.5)
23 |     parser.add_argument('--annealing', type=float, default=0.995)
24 |     parser.add_argument('--hidden_size', type=int, default=1500)
25 |     parser.add_argument('--act', type=str, default='tanh')
26 |     parser.add_argument('--dataset', type=str, default='MNIST')
27 |     parser.add_argument('--data_path', type=str, default='.')
28 | 
29 |     # argparse does not deal with bool
30 |     parser.add_argument('--vis_init', type=int, default=0)
31 |     parser.add_argument('--noiseless_h1', type=int, default=1)
32 |     parser.add_argument('--input_sampling', type=int, default=1)
33 |     parser.add_argument('--test_model', type=int, default=0)
34 | 
35 |     args = parser.parse_args()
36 | 
37 |     print args.test_model
38 | 
39 |     model.experiment(args, None)
40 | 
41 | if __name__ == '__main__':
42 |     main()
43 | 
--------------------------------------------------------------------------------
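A closing note on sampling: the "consecutive Gibbs samples" saved under images/ come from the loop in model.py's `sample_some_numbers_single_layer` (and its multi-layer variant `sample_some_numbers`). Stripped of Theano, one chain looks like the sketch below, where `reconstruct` is a hypothetical stand-in for the trained network's one-step denoising function p(X | corrupted X):

    import numpy

    def pseudo_gibbs_chain(x0, reconstruct, p=0.4, steps=400, rng=numpy.random):
        """x0: (1, 784) starting digit; returns the chain of visible expectations."""
        def corrupt(v):
            # same salt-and-pepper noise as model.py's salt_and_pepper
            keep = rng.binomial(n=1, p=1 - p, size=v.shape)
            return v * keep + (1 - keep) * rng.binomial(n=1, p=0.5, size=v.shape)

        chain = [x0]
        x = corrupt(x0)
        for _ in range(steps - 1):
            px = reconstruct(x)                              # mean of p(X | ...)
            chain.append(px)
            x = rng.binomial(n=1, p=px).astype('float32')    # sample the visibles
            x = corrupt(x)                                   # corrupt again, repeat
        return numpy.vstack(chain)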