├── .gitignore
├── README.md
├── doc
│   ├── dae.bib
│   └── gsn.bib
├── image_tiler.py
├── images
│   ├── 1_layer_no_walkback
│   │   ├── .directory
│   │   ├── number_reconstruction200.png
│   │   ├── samples_epoch_100.png
│   │   ├── samples_epoch_150.png
│   │   ├── samples_epoch_200.png
│   │   └── samples_epoch_50.png
│   ├── 1_layer_walkback
│   │   ├── .directory
│   │   ├── number_reconstruction120.png
│   │   ├── samples_epoch_100.png
│   │   ├── samples_epoch_120.png
│   │   ├── samples_epoch_40.png
│   │   ├── samples_epoch_60.png
│   │   └── samples_epoch_80.png
│   └── 2_layers
│       ├── .directory
│       ├── number_reconstruction250.png
│       ├── samples_epoch_100.png
│       ├── samples_epoch_25.png
│       ├── samples_epoch_250.png
│       ├── samples_epoch_5.png
│       └── samples_epoch_50.png
├── likelihood_estimation_parzen.py
├── model.py
├── run_dae_no_walkback.py
├── run_dae_walkback.py
└── run_gsn.py

/.gitignore:
--------------------------------------------------------------------------------
1 | *.zip
2 | *.pkl
3 | *.npy
4 | *.linkinfo
5 | *.o
6 | *.orig
7 | *.pyc
8 | *.pyo
9 | *.so
10 | *.sw?
11 | *~
12 | *.aux
13 | *.log
14 | *.nav
15 | *.out
16 | *.pdf
17 | *.snm
18 | *.toc
19 | *.vrb
20 | .noseids
21 | \#*\#
22 | build
23 | compiled/*.cpp
24 | core.*
25 | cutils_ext.cpp
26 | dist
27 | doc/.build/
28 | html
29 | pdf
30 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | This package contains the accompanying code for the following two papers:
2 | 
3 | * \[1\] Yoshua Bengio, Éric Thibodeau-Laufer, Jason
4 |   Yosinski. [Deep Generative Stochastic Networks Trainable by Backprop](http://arxiv.org/abs/1306.1091). _arXiv
5 |   preprint arXiv:1306.1091._ ([PDF](http://arxiv.org/pdf/1306.1091v3),
6 |   [BibTeX](https://raw.github.com/yaoli/GSN/master/doc/gsn.bib))
7 | 
8 | * \[2\] Yoshua Bengio, Li Yao, Guillaume Alain, Pascal
9 |   Vincent. [Generalized Denoising Auto-Encoders as Generative Models](http://papers.nips.cc/paper/5023-generalized-denoising-auto-encoders-as-generative-models). _NIPS,
10 |   2013._ ([PDF](http://media.nips.cc/nipsbooks/nipspapers/paper_files/nips26/491.pdf),
11 |   [BibTeX](https://raw.github.com/yaoli/GSN/master/doc/dae.bib))
12 | 
13 | 
14 | 
15 | Setup
16 | ---------------------
17 | 
18 | #### Install Theano
19 | 
20 | Download Theano and make sure it's working properly. All the
21 | information you need can be found by following this link:
22 | http://deeplearning.net/software/theano/
23 | 
24 | #### Prepare the MNIST dataset
25 | 
26 | 1. Download the MNIST dataset from http://deeplearning.net/data/mnist/mnist.pkl.gz
27 | 
28 | 2. Unzip the file to generate mnist.pkl using `gunzip mnist.pkl.gz`
29 | 
30 | 3. (Optional) To visualize MNIST, run `python image_tiler.py`
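Once `mnist.pkl` is in the repository root, the snippet below gives a quick sanity check (a minimal sketch, not part of the repository; Python 2 like the rest of the code). It loads the dataset with the same `cPickle` call that `model.py` uses and tiles the first 100 training digits with `image_tiler.tile_raster_images`:

    # Load mnist.pkl (three (X, Y) pairs: train, valid, test) and
    # save a 10x10 tile image of the first 100 training digits.
    import cPickle
    import numpy
    from PIL import Image
    from image_tiler import tile_raster_images

    (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = \
        cPickle.load(open('mnist.pkl', 'rb'))
    print train_X.shape  # (50000, 784)

    tiles = tile_raster_images(train_X[:100], img_shape=(28, 28),
                               tile_shape=(10, 10), tile_spacing=(1, 1))
    Image.fromarray(numpy.uint8(tiles)).save('mnist_check.png')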
31 | 
32 | 
33 | 
34 | Reproducing the Results
35 | ---------------------
36 | 
37 | The commands below are given in two formats: the first runs on the
38 | GPU and the second on the CPU. Choose whichever is appropriate for
39 | your setup. Of course, the GPU versions will only work if Theano is
40 | being used on a machine with a compatible GPU (more about
41 | [using the GPU in Theano](http://deeplearning.net/software/theano/tutorial/using_gpu.html)).
42 | 
43 | 1. To run a two layer Generative Stochastic Network (paper \[1\])
44 | 
45 |         THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python run_gsn.py
46 |         THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python run_gsn.py
47 | 
48 | 2. To run a one layer Generalized Denoising Autoencoder with the walkback procedure (paper \[2\])
49 | 
50 |         THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python run_dae_walkback.py
51 |         THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python run_dae_walkback.py
52 | 
53 | 3. To run a one layer Generalized Denoising Autoencoder without the walkback procedure (paper \[2\])
54 | 
55 |         THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python run_dae_no_walkback.py
56 |         THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python run_dae_no_walkback.py
57 | 
58 | 4. To get the log-likelihood estimation and inpainting results (as described in paper \[1\])
59 | 
60 |    To test a trained model (generating inpainting pictures and the 10,000
61 |    samples used by the Parzen density estimator), run the command below
62 |    in the directory containing the `params_epoch_X.pkl` and `config`
63 |    files that were generated when the model was trained. If multiple
64 |    `params_epoch_X.pkl` files are present, the one with the largest
65 |    epoch number is used.
66 | 
67 |         THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python run_gsn.py --test_model 1
68 | 
69 | 
70 | 
71 | #### Important notes on running the code
72 | 
73 | * (1), (2) and (3) generate images for both the denoising and the
74 |   pseudo-Gibbs sampling, and save the parameters every 5 epochs. We have
75 |   provided some examples of the reconstructions and generated samples
76 |   (consecutive Gibbs samples) under the directory 'images/' for the 3
77 |   types of models. Just by looking at the pictures there, it is clear
78 |   that the 2-layer model beats the 1-layer model with walkback training,
79 |   which in turn beats the 1-layer model without walkback training.
80 | 
81 | * The code is written such that it produces better results on the
82 |   log-likelihood estimated by the Parzen density estimator than in our
83 |   paper \[2\]. For example, (2) produces a log-likelihood of around
84 |   150 and (3) around 50. Both numbers could be higher if the models
85 |   were trained longer. Treat these numbers with caution: the estimate
86 |   from the Parzen density estimator is biased and tends to prefer
87 |   rigid samples, so it can be high even when the generated images do
88 |   not look good. Trust the visualizations more. (A numpy sketch of the
89 |   estimator is given at the end of this section.)
90 | 
91 | * The code prints a lot of information to the screen, which is meant to
92 |   show training progress. You can safely ignore the warning messages
93 |   from Theano. Training has started once lines like the following are
94 |   printed:
95 | 
96 |       1 Train : 0.607192 Valid : 0.367054 Test : 0.364999 time : 20.40169 MeanVisB : -0.22522 W : ['0.024063', '0.022423']
97 |       2 Train : 0.302400 Valid : 0.277827 Test : 0.277751 time : 20.33490 MeanVisB : -0.32510 W : ['0.023877', '0.022512']
98 |       3 Train : 0.292427 Valid : 0.267693 Test : 0.268585 time : 20.45896 MeanVisB : -0.38779 W : ['0.023882', '0.022544']
99 |       4 Train : 0.268086 Valid : 0.267201 Test : 0.268247 time : 20.37603 MeanVisB : -0.43271 W : ['0.023856', '0.022535']
100 |       5 Train : 0.266533 Valid : 0.264087 Test : 0.265572 time : 20.26944 MeanVisB : -0.47086 W : ['0.023840', '0.022517']
101 | 
102 |   For each training epoch, the first 3 numbers are the cost on the
103 |   training, validation, and test sets, followed by the training time in
104 |   seconds (on an Nvidia GeForce GTX 580 GPU; roughly 300 seconds per
105 |   epoch on an Intel Core i7-2600K CPU @ 3.40GHz), then the mean of the
106 |   visible bias and the mean magnitude of the weights.
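As promised in the note on the Parzen estimator above, here is the quantity that `likelihood_estimation_parzen.py` computes, reduced to a short numpy sketch (`parzen_ll` is a hypothetical standalone helper written for this README; the repository's `theano_parzen` and `numpy_parzen` are the real implementations). One isotropic Gaussian is centered on each generated sample, and each test point is scored by the log of the mean kernel density:

    import numpy

    def parzen_ll(test_x, samples, sigma):
        # pairwise standardized differences: (n_test, n_samples, dim)
        a = (test_x[:, None, :] - samples[None, :, :]) / sigma
        e = -0.5 * (a ** 2).sum(2)
        # numerically stable log-mean-exp over the kernels
        m = e.max(1)
        log_mean = m + numpy.log(numpy.exp(e - m[:, None]).mean(1))
        # log normalization constant of a dim-dimensional Gaussian
        z = samples.shape[1] * numpy.log(sigma * numpy.sqrt(2 * numpy.pi))
        return log_mean - z  # one log-likelihood value per test point

The test mode calls this machinery with sigma = 0.20 on the 10,000 generated samples; the standalone script documents `python likelihood_estimation_parzen.py 0.23 MNIST` as example usage.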
107 | 
108 | #### Contact
109 | 
110 | Questions? Contact us: li.yao@umontreal.ca
111 | 
--------------------------------------------------------------------------------
/doc/dae.bib:
--------------------------------------------------------------------------------
1 | @INPROCEEDINGS{Bengio-et-al-NIPS2013,
2 |   author = {Bengio, Yoshua and Yao, Li and Alain, Guillaume and Vincent, Pascal},
3 |   title = {Generalized Denoising Auto-Encoders as Generative Models},
4 |   year = {2013},
5 |   organization = {NIPS},
6 |   booktitle = {Advances in Neural Information Processing Systems 26 (NIPS'13)},
7 |   crossref = {NIPS26}
8 | }
9 | 
--------------------------------------------------------------------------------
/doc/gsn.bib:
--------------------------------------------------------------------------------
1 | @TECHREPORT{Bengio+Laufer+Yosinski-arxiv-2013,
2 |   author = {Yoshua Bengio and {\'E}ric Thibodeau-Laufer and Jason Yosinski},
3 |   title = {Deep Generative Stochastic Networks Trainable by Backprop},
4 |   number = {arXiv:1306.1091},
5 |   year = {2013},
6 |   institution = {Universit{\'e} de Montr{\'e}al}
7 | }
8 | 
--------------------------------------------------------------------------------
/image_tiler.py:
--------------------------------------------------------------------------------
1 | """ This file contains different utility functions that are not connected
2 | in any way to the networks presented in the tutorials, but rather help in
3 | processing the outputs into a more understandable way.
4 | 
5 | For example, ``tile_raster_images`` helps in generating an easy-to-grasp
6 | image from a set of samples or weights.
7 | 
8 | """
9 | 
10 | 
11 | import numpy, os, cPickle
12 | from PIL import Image
13 | 
14 | def load_mnist():
15 |     path = '.'
16 |     # pickle files should be opened in binary mode
17 |     data = cPickle.load(open(os.path.join(path, 'mnist.pkl'), 'rb'))
18 |     return data
19 | 
20 | def scale_to_unit_interval(ndar, eps=1e-8):
21 |     """ Scales all values in the ndarray ndar to be between 0 and 1 """
22 |     ndar = ndar.copy()
23 |     ndar -= ndar.min()
24 |     ndar *= 1.0 / (ndar.max() + eps)
25 |     return ndar
26 | 
27 | 
28 | def tile_raster_images(X, img_shape, tile_shape, tile_spacing=(0, 0),
29 |                        scale_rows_to_unit_interval=True,
30 |                        output_pixel_vals=True):
31 |     """
32 |     Transform an array with one flattened image per row, into an array in
33 |     which images are reshaped and laid out like tiles on a floor.
34 | 
35 |     This function is useful for visualizing datasets whose rows are images,
36 |     and also columns of matrices for transforming those rows
37 |     (such as the first layer of a neural net).
38 | 
39 |     :type X: a 2-D ndarray or a tuple of 4 channels, elements of which can
40 |              be 2-D ndarrays or None;
41 |     :param X: a 2-D array in which every row is a flattened image.
42 | 
43 |     :type img_shape: tuple; (height, width)
44 |     :param img_shape: the original shape of each image
45 | 
46 |     :type tile_shape: tuple; (rows, cols)
47 |     :param tile_shape: the number of images to tile (rows, cols)
48 | 
49 |     :param output_pixel_vals: if output should be pixel values (i.e. uint8
50 |                               values) or floats
51 | 
52 |     :param scale_rows_to_unit_interval: if the values need to be scaled
53 |                                         to [0,1] before being plotted or not
54 | 
55 | 
56 |     :returns: array suitable for viewing as an image.
57 |               (See: `PIL.Image.fromarray`.)
58 |     :rtype: a 2-d array with the same dtype as X.
58 | 59 | """ 60 | 61 | assert len(img_shape) == 2 62 | assert len(tile_shape) == 2 63 | assert len(tile_spacing) == 2 64 | 65 | # The expression below can be re-written in a more C style as 66 | # follows : 67 | # 68 | # out_shape = [0,0] 69 | # out_shape[0] = (img_shape[0]+tile_spacing[0])*tile_shape[0] - 70 | # tile_spacing[0] 71 | # out_shape[1] = (img_shape[1]+tile_spacing[1])*tile_shape[1] - 72 | # tile_spacing[1] 73 | out_shape = [(ishp + tsp) * tshp - tsp for ishp, tshp, tsp 74 | in zip(img_shape, tile_shape, tile_spacing)] 75 | 76 | if isinstance(X, tuple): 77 | assert len(X) == 4 78 | # Create an output numpy ndarray to store the image 79 | if output_pixel_vals: 80 | out_array = numpy.zeros((out_shape[0], out_shape[1], 4), 81 | dtype='uint8') 82 | else: 83 | out_array = numpy.zeros((out_shape[0], out_shape[1], 4), 84 | dtype=X.dtype) 85 | 86 | #colors default to 0, alpha defaults to 1 (opaque) 87 | if output_pixel_vals: 88 | channel_defaults = [0, 0, 0, 255] 89 | else: 90 | channel_defaults = [0., 0., 0., 1.] 91 | 92 | for i in xrange(4): 93 | if X[i] is None: 94 | # if channel is None, fill it with zeros of the correct 95 | # dtype 96 | dt = out_array.dtype 97 | if output_pixel_vals: 98 | dt = 'uint8' 99 | out_array[:, :, i] = numpy.zeros(out_shape, 100 | dtype=dt) + channel_defaults[i] 101 | else: 102 | # use a recurrent call to compute the channel and store it 103 | # in the output 104 | out_array[:, :, i] = tile_raster_images( 105 | X[i], img_shape, tile_shape, tile_spacing, 106 | scale_rows_to_unit_interval, output_pixel_vals) 107 | return out_array 108 | 109 | else: 110 | # if we are dealing with only one channel 111 | H, W = img_shape 112 | Hs, Ws = tile_spacing 113 | 114 | # generate a matrix to store the output 115 | dt = X.dtype 116 | if output_pixel_vals: 117 | dt = 'uint8' 118 | out_array = numpy.zeros(out_shape, dtype=dt) 119 | 120 | for tile_row in xrange(tile_shape[0]): 121 | for tile_col in xrange(tile_shape[1]): 122 | if tile_row * tile_shape[1] + tile_col < X.shape[0]: 123 | this_x = X[tile_row * tile_shape[1] + tile_col] 124 | if scale_rows_to_unit_interval: 125 | # if we should scale values to be between 0 and 1 126 | # do this by calling the `scale_to_unit_interval` 127 | # function 128 | this_img = scale_to_unit_interval( 129 | this_x.reshape(img_shape)) 130 | else: 131 | this_img = this_x.reshape(img_shape) 132 | # add the slice to the corresponding position in the 133 | # output array 134 | c = 1 135 | if output_pixel_vals: 136 | c = 255 137 | out_array[ 138 | tile_row * (H + Hs): tile_row * (H + Hs) + H, 139 | tile_col * (W + Ws): tile_col * (W + Ws) + W 140 | ] = this_img * c 141 | return out_array 142 | 143 | def visualize_mnist(): 144 | (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = load_mnist() 145 | design_matrix = train_X 146 | images = design_matrix[0:2500, :] 147 | channel_length = 28 * 28 148 | to_visualize = images 149 | 150 | image_data = tile_raster_images(to_visualize, 151 | img_shape=[28,28], 152 | tile_shape=[50,50], 153 | tile_spacing=(2,2)) 154 | im_new = Image.fromarray(numpy.uint8(image_data)) 155 | im_new.save('samples_mnist.png') 156 | os.system('eog samples_mnist.png') 157 | 158 | if __name__ == '__main__': 159 | visualize_mnist() 160 | 161 | 162 | 163 | -------------------------------------------------------------------------------- /images/1_layer_no_walkback/.directory: -------------------------------------------------------------------------------- 1 | [Dolphin] 2 | 
AdditionalInfoV2=Details_Size,Details_Date,CustomizedDetails 3 | Timestamp=2013,6,17,17,27,51 4 | Version=2 5 | ViewMode=1 6 | -------------------------------------------------------------------------------- /images/1_layer_no_walkback/number_reconstruction200.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_no_walkback/number_reconstruction200.png -------------------------------------------------------------------------------- /images/1_layer_no_walkback/samples_epoch_100.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_no_walkback/samples_epoch_100.png -------------------------------------------------------------------------------- /images/1_layer_no_walkback/samples_epoch_150.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_no_walkback/samples_epoch_150.png -------------------------------------------------------------------------------- /images/1_layer_no_walkback/samples_epoch_200.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_no_walkback/samples_epoch_200.png -------------------------------------------------------------------------------- /images/1_layer_no_walkback/samples_epoch_50.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_no_walkback/samples_epoch_50.png -------------------------------------------------------------------------------- /images/1_layer_walkback/.directory: -------------------------------------------------------------------------------- 1 | [Dolphin] 2 | ShowPreview=true 3 | Timestamp=2013,6,17,17,25,18 4 | Version=2 5 | -------------------------------------------------------------------------------- /images/1_layer_walkback/number_reconstruction120.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/number_reconstruction120.png -------------------------------------------------------------------------------- /images/1_layer_walkback/samples_epoch_100.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/samples_epoch_100.png -------------------------------------------------------------------------------- /images/1_layer_walkback/samples_epoch_120.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/samples_epoch_120.png -------------------------------------------------------------------------------- /images/1_layer_walkback/samples_epoch_40.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/samples_epoch_40.png -------------------------------------------------------------------------------- /images/1_layer_walkback/samples_epoch_60.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/samples_epoch_60.png -------------------------------------------------------------------------------- /images/1_layer_walkback/samples_epoch_80.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/1_layer_walkback/samples_epoch_80.png -------------------------------------------------------------------------------- /images/2_layers/.directory: -------------------------------------------------------------------------------- 1 | [Dolphin] 2 | ShowPreview=true 3 | Timestamp=2013,6,17,17,19,21 4 | Version=2 5 | -------------------------------------------------------------------------------- /images/2_layers/number_reconstruction250.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/number_reconstruction250.png -------------------------------------------------------------------------------- /images/2_layers/samples_epoch_100.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/samples_epoch_100.png -------------------------------------------------------------------------------- /images/2_layers/samples_epoch_25.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/samples_epoch_25.png -------------------------------------------------------------------------------- /images/2_layers/samples_epoch_250.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/samples_epoch_250.png -------------------------------------------------------------------------------- /images/2_layers/samples_epoch_5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/samples_epoch_5.png -------------------------------------------------------------------------------- /images/2_layers/samples_epoch_50.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lisa-lab/GSN/05bb5114dcf3528740fdcb3f44ddfce5cab55d83/images/2_layers/samples_epoch_50.png -------------------------------------------------------------------------------- /likelihood_estimation_parzen.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # encoding: utf-8 3 | 4 | import sys 5 | import os 6 | import numpy 7 | import cPickle, gzip 8 | import time 9 | 10 | import theano 11 | from theano import tensor as T 12 | from model import load_mnist 13 | 14 | def local_contrast_normalization(patches): 15 | 
patches = patches.reshape((patches.shape[0], -1)) 16 | patches -= patches.mean(axis=1)[:,None] 17 | 18 | patches_std = numpy.sqrt((patches**2).mean(axis=1)) 19 | 20 | min_divisor = (2*patches_std.min() + patches_std.mean()) / 3 21 | patches /= numpy.maximum(min_divisor, patches_std).reshape((patches.shape[0],1)) 22 | 23 | return patches 24 | 25 | 26 | def log_mean_exp(a): 27 | max_ = a.max(1) 28 | 29 | return max_ + T.log(T.exp(a - max_.dimshuffle(0, 'x')).mean(1)) 30 | 31 | 32 | def theano_parzen(mu, sigma): 33 | x = T.matrix() 34 | mu = theano.shared(mu) 35 | 36 | a = ( x.dimshuffle(0, 'x', 1) - mu.dimshuffle('x', 0, 1) ) / sigma 37 | 38 | E = log_mean_exp(-0.5*(a**2).sum(2)) 39 | 40 | Z = mu.shape[1] * T.log(sigma * numpy.sqrt(numpy.pi * 2)) 41 | 42 | return theano.function([x], E - Z) 43 | 44 | 45 | def numpy_parzen(x, mu, sigma): 46 | a = ( x[:, None, :] - mu[None, :, :] ) / sigma 47 | 48 | def log_mean(i): 49 | return i.max(1) + numpy.log(numpy.exp(i - i.max(1)[:, None]).mean(1)) 50 | 51 | return log_mean(-0.5 * (a**2).sum(2)) - mu.shape[1] * numpy.log(sigma * numpy.sqrt(numpy.pi * 2)) 52 | 53 | 54 | def get_ll(x, parzen, batch_size=10): 55 | inds = range(x.shape[0]) 56 | 57 | n_batches = int(numpy.ceil(float(len(inds)) / batch_size)) 58 | 59 | times = [] 60 | lls = [] 61 | for i in range(n_batches): 62 | begin = time.time() 63 | ll = parzen(x[inds[i::n_batches]]) 64 | end = time.time() 65 | 66 | times.append(end-begin) 67 | 68 | lls.extend(ll) 69 | 70 | if i % 10 == 0: 71 | print i, numpy.mean(times), numpy.mean(lls) 72 | 73 | return lls 74 | 75 | 76 | def main(sigma, dataset, sample_path='samples.npy'): 77 | 78 | # provide a .npy file where 10k generated samples are saved. 79 | filename = sample_path 80 | 81 | print 'loading samples from %s'%filename 82 | 83 | (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = load_mnist('.') 84 | 85 | samples = numpy.load(filename) 86 | 87 | parzen = theano_parzen(samples, sigma) 88 | 89 | test_ll = get_ll(test_X, parzen) 90 | 91 | print "Mean Log-Likelihood of test set = %.5f" % numpy.mean(test_ll) 92 | print "Std of Mean Log-Likelihood of test set = %.5f" % (numpy.std(test_ll) / 100) 93 | 94 | 95 | if __name__ == "__main__": 96 | # to use it on MNIST: python likelihood_estimation_parzen.py 0.23 MNIST 97 | main(float(sys.argv[1]), sys.argv[2]) 98 | 99 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | import numpy, os, sys, cPickle 2 | import theano 3 | import theano.tensor as T 4 | import theano.sandbox.rng_mrg as RNG_MRG 5 | import PIL.Image 6 | from collections import OrderedDict 7 | from image_tiler import * 8 | import time 9 | import argparse 10 | 11 | cast32 = lambda x : numpy.cast['float32'](x) 12 | trunc = lambda x : str(x)[:8] 13 | logit = lambda p : numpy.log(p / (1 - p) ) 14 | binarize = lambda x : cast32(x >= 0.5) 15 | sigmoid = lambda x : cast32(1. 
/ (1 + numpy.exp(-x))) 16 | 17 | def SaltAndPepper(X, rate=0.3): 18 | # Salt and pepper noise 19 | 20 | drop = numpy.arange(X.shape[1]) 21 | numpy.random.shuffle(drop) 22 | sep = int(len(drop)*rate) 23 | drop = drop[:sep] 24 | X[:, drop[:sep/2]]=0 25 | X[:, drop[sep/2:]]=1 26 | return X 27 | 28 | def get_shared_weights(n_in, n_out, interval, name): 29 | #val = numpy.random.normal(0, sigma_sqr, size=(n_in, n_out)) 30 | val = numpy.random.uniform(-interval, interval, size=(n_in, n_out)) 31 | val = cast32(val) 32 | val = theano.shared(value = val, name = name) 33 | return val 34 | 35 | def get_shared_bias(n, name, offset = 0): 36 | val = numpy.zeros(n) - offset 37 | val = cast32(val) 38 | val = theano.shared(value = val, name = name) 39 | return val 40 | 41 | def load_mnist(path): 42 | data = cPickle.load(open(os.path.join(path,'mnist.pkl'), 'r')) 43 | return data 44 | 45 | def load_mnist_binary(path): 46 | data = cPickle.load(open(os.path.join(path,'mnist.pkl'), 'r')) 47 | data = [list(d) for d in data] 48 | data[0][0] = (data[0][0] > 0.5).astype('float32') 49 | data[1][0] = (data[1][0] > 0.5).astype('float32') 50 | data[2][0] = (data[2][0] > 0.5).astype('float32') 51 | data = tuple([tuple(d) for d in data]) 52 | return data 53 | 54 | def load_tfd(path): 55 | import scipy.io as io 56 | data = io.loadmat(os.path.join(path, 'TFD_48x48.mat')) 57 | X = cast32(data['images'])/cast32(255) 58 | X = X.reshape((X.shape[0], X.shape[1] * X.shape[2])) 59 | labels = data['labs_ex'].flatten() 60 | labeled = labels != -1 61 | unlabeled = labels == -1 62 | train_X = X[unlabeled] 63 | valid_X = X[unlabeled][:100] # Stuf 64 | test_X = X[labeled] 65 | 66 | del data 67 | 68 | return (train_X, labels[unlabeled]), (valid_X, labels[unlabeled][:100]), (test_X, labels[labeled]) 69 | 70 | def experiment(state, channel): 71 | if state.test_model and 'config' in os.listdir('.'): 72 | print 'Loading local config file' 73 | config_file = open('config', 'r') 74 | config = config_file.readlines() 75 | try: 76 | config_vals = config[0].split('(')[1:][0].split(')')[:-1][0].split(', ') 77 | except: 78 | config_vals = config[0][3:-1].replace(': ','=').replace("'","").split(', ') 79 | config_vals = filter(lambda x:not 'jobman' in x and not '/' in x and not ':' in x and not 'experiment' in x, config_vals) 80 | 81 | for CV in config_vals: 82 | print CV 83 | if CV.startswith('test'): 84 | print 'Do not override testing switch' 85 | continue 86 | try: 87 | exec('state.'+CV) in globals(), locals() 88 | except: 89 | exec('state.'+CV.split('=')[0]+"='"+CV.split('=')[1]+"'") in globals(), locals() 90 | 91 | 92 | 93 | else: 94 | # Save the current configuration 95 | # Useful for logs/experiments 96 | print 'Saving config' 97 | f = open('config', 'w') 98 | f.write(str(state)) 99 | f.close() 100 | 101 | 102 | print state 103 | # Load the data, train = train+valid, and shuffle train 104 | # Targets are not used (will be misaligned after shuffling train 105 | if state.dataset == 'MNIST': 106 | (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = load_mnist(state.data_path) 107 | train_X = numpy.concatenate((train_X, valid_X)) 108 | 109 | elif state.dataset == 'MNIST_binary': 110 | (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = load_mnist_binary(state.data_path) 111 | train_X = numpy.concatenate((train_X, valid_X)) 112 | 113 | elif state.dataset == 'TFD': 114 | (train_X, train_Y), (valid_X, valid_Y), (test_X, test_Y) = load_tfd(state.data_path) 115 | 116 | N_input = train_X.shape[1] 117 | root_N_input = numpy.sqrt(N_input) 
118 | numpy.random.seed(1) 119 | numpy.random.shuffle(train_X) 120 | train_X = theano.shared(train_X) 121 | valid_X = theano.shared(valid_X) 122 | test_X = theano.shared(test_X) 123 | 124 | # Theano variables and RNG 125 | X = T.fmatrix() 126 | index = T.lscalar() 127 | MRG = RNG_MRG.MRG_RandomStreams(1) 128 | 129 | # Network and training specifications 130 | K = state.K # N hidden layers 131 | N = state.N # number of walkbacks 132 | layer_sizes = [N_input] + [state.hidden_size] * K # layer sizes, from h0 to hK (h0 is the visible layer) 133 | learning_rate = theano.shared(cast32(state.learning_rate)) # learning rate 134 | annealing = cast32(state.annealing) # exponential annealing coefficient 135 | momentum = theano.shared(cast32(state.momentum)) # momentum term 136 | 137 | # THEANO VARIABLES 138 | X = T.fmatrix() # Input of the graph 139 | index = T.lscalar() # index to minibatch 140 | MRG = RNG_MRG.MRG_RandomStreams(1) 141 | 142 | 143 | # PARAMETERS : weights list and bias list. 144 | # initialize a list of weights and biases based on layer_sizes 145 | weights_list = [get_shared_weights(layer_sizes[i], layer_sizes[i+1], numpy.sqrt(6. / (layer_sizes[i] + layer_sizes[i+1] )), 'W') for i in range(K)] 146 | bias_list = [get_shared_bias(layer_sizes[i], 'b') for i in range(K + 1)] 147 | 148 | if state.test_model: 149 | # Load the parameters of the last epoch 150 | # maybe if the path is given, load these specific attributes 151 | param_files = filter(lambda x:'params' in x, os.listdir('.')) 152 | max_epoch_idx = numpy.argmax([int(x.split('_')[-1].split('.')[0]) for x in param_files]) 153 | params_to_load = param_files[max_epoch_idx] 154 | PARAMS = cPickle.load(open(params_to_load,'r')) 155 | [p.set_value(lp.get_value(borrow=False)) for lp, p in zip(PARAMS[:len(weights_list)], weights_list)] 156 | [p.set_value(lp.get_value(borrow=False)) for lp, p in zip(PARAMS[len(weights_list):], bias_list)] 157 | 158 | # Util functions 159 | def dropout(IN, p = 0.5): 160 | noise = MRG.binomial(p = p, n = 1, size = IN.shape, dtype='float32') 161 | OUT = (IN * noise) / cast32(p) 162 | return OUT 163 | 164 | def add_gaussian_noise(IN, std = 1): 165 | print 'GAUSSIAN NOISE : ', std 166 | noise = MRG.normal(avg = 0, std = std, size = IN.shape, dtype='float32') 167 | OUT = IN + noise 168 | return OUT 169 | 170 | def corrupt_input(IN, p = 0.5): 171 | # salt and pepper? masking? 172 | noise = MRG.binomial(p = p, n = 1, size = IN.shape, dtype='float32') 173 | IN = IN * noise 174 | return IN 175 | 176 | def salt_and_pepper(IN, p = 0.2): 177 | # salt and pepper noise 178 | print 'DAE uses salt and pepper noise' 179 | a = MRG.binomial(size=IN.shape, n=1, 180 | p = 1 - p, 181 | dtype='float32') 182 | b = MRG.binomial(size=IN.shape, n=1, 183 | p = 0.5, 184 | dtype='float32') 185 | c = T.eq(a,0) * b 186 | return IN * a + c 187 | 188 | # Odd layer update function 189 | # just a loop over the odd layers 190 | def update_odd_layers(hiddens, noisy): 191 | for i in range(1, K+1, 2): 192 | print i 193 | if noisy: 194 | simple_update_layer(hiddens, None, i) 195 | else: 196 | simple_update_layer(hiddens, None, i, add_noise = False) 197 | 198 | # Even layer update 199 | # p_X_chain is given to append the p(X|...) 
at each update (one update = odd update + even update) 200 | def update_even_layers(hiddens, p_X_chain, noisy): 201 | for i in range(0, K+1, 2): 202 | print i 203 | if noisy: 204 | simple_update_layer(hiddens, p_X_chain, i) 205 | else: 206 | simple_update_layer(hiddens, p_X_chain, i, add_noise = False) 207 | 208 | # The layer update function 209 | # hiddens : list containing the symbolic theano variables [visible, hidden1, hidden2, ...] 210 | # layer_update will modify this list inplace 211 | # p_X_chain : list containing the successive p(X|...) at each update 212 | # update_layer will append to this list 213 | # add_noise : pre and post activation gaussian noise 214 | 215 | def simple_update_layer(hiddens, p_X_chain, i, add_noise=True): 216 | # Compute the dot product, whatever layer 217 | post_act_noise = 0 218 | 219 | if i == 0: 220 | hiddens[i] = T.dot(hiddens[i+1], weights_list[i].T) + bias_list[i] 221 | 222 | elif i == K: 223 | hiddens[i] = T.dot(hiddens[i-1], weights_list[i-1]) + bias_list[i] 224 | 225 | else: 226 | # next layer : layers[i+1], assigned weights : W_i 227 | # previous layer : layers[i-1], assigned weights : W_(i-1) 228 | hiddens[i] = T.dot(hiddens[i+1], weights_list[i].T) + T.dot(hiddens[i-1], weights_list[i-1]) + bias_list[i] 229 | 230 | # Add pre-activation noise if NOT input layer 231 | if i==1 and state.noiseless_h1: 232 | print '>>NO noise in first layer' 233 | add_noise = False 234 | 235 | # pre activation noise 236 | if i != 0 and add_noise: 237 | print 'Adding pre-activation gaussian noise' 238 | hiddens[i] = add_gaussian_noise(hiddens[i], state.hidden_add_noise_sigma) 239 | 240 | # ACTIVATION! 241 | if i == 0: 242 | print 'Sigmoid units' 243 | hiddens[i] = T.nnet.sigmoid(hiddens[i]) 244 | else: 245 | print 'Hidden units' 246 | hiddens[i] = hidden_activation(hiddens[i]) 247 | 248 | # post activation noise 249 | if i != 0 and add_noise: 250 | print 'Adding post-activation gaussian noise' 251 | hiddens[i] = add_gaussian_noise(hiddens[i], state.hidden_add_noise_sigma) 252 | 253 | # build the reconstruction chain 254 | if i == 0: 255 | # if input layer -> append p(X|...) 256 | p_X_chain.append(hiddens[i]) 257 | 258 | # sample from p(X|...) 
259 | if state.input_sampling: 260 | print 'Sampling from input' 261 | sampled = MRG.binomial(p = hiddens[i], size=hiddens[i].shape, dtype='float32') 262 | else: 263 | print '>>NO input sampling' 264 | sampled = hiddens[i] 265 | # add noise 266 | sampled = salt_and_pepper(sampled, state.input_salt_and_pepper) 267 | 268 | # set input layer 269 | hiddens[i] = sampled 270 | 271 | def update_layers(hiddens, p_X_chain, noisy = True): 272 | print 'odd layer update' 273 | update_odd_layers(hiddens, noisy) 274 | print 275 | print 'even layer update' 276 | update_even_layers(hiddens, p_X_chain, noisy) 277 | 278 | 279 | ''' F PROP ''' 280 | #X = T.fmatrix() 281 | if state.act == 'sigmoid': 282 | print 'Using sigmoid activation' 283 | hidden_activation = T.nnet.sigmoid 284 | elif state.act == 'rectifier': 285 | print 'Using rectifier activation' 286 | hidden_activation = lambda x : T.maximum(cast32(0), x) 287 | elif state.act == 'tanh': 288 | hidden_activation = lambda x : T.tanh(x) 289 | 290 | 291 | ''' Corrupt X ''' 292 | X_corrupt = salt_and_pepper(X, state.input_salt_and_pepper) 293 | 294 | ''' hidden layer init ''' 295 | 296 | hiddens = [X_corrupt] 297 | p_X_chain = [] 298 | print "Hidden units initialization" 299 | for w,b in zip(weights_list, bias_list[1:]): 300 | # init with zeros 301 | print "Init hidden units at zero before creating the graph" 302 | hiddens.append(T.zeros_like(T.dot(hiddens[-1], w))) 303 | 304 | # The layer update scheme 305 | print "Building the graph :", N,"updates" 306 | for i in range(N): 307 | update_layers(hiddens, p_X_chain) 308 | 309 | 310 | # COST AND GRADIENTS 311 | 312 | print 'Cost w.r.t p(X|...) at every step in the graph' 313 | #COST = T.mean(T.nnet.binary_crossentropy(reconstruction, X)) 314 | COST = [T.mean(T.nnet.binary_crossentropy(rX, X)) for rX in p_X_chain] 315 | show_COST = COST[-1] 316 | COST = numpy.sum(COST) 317 | 318 | params = weights_list + bias_list 319 | 320 | gradient = T.grad(COST, params) 321 | 322 | gradient_buffer = [theano.shared(numpy.zeros(x.get_value().shape, dtype='float32')) for x in params] 323 | 324 | m_gradient = [momentum * gb + (cast32(1) - momentum) * g for (gb, g) in zip(gradient_buffer, gradient)] 325 | g_updates = [(p, p - learning_rate * mg) for (p, mg) in zip(params, m_gradient)] 326 | b_updates = zip(gradient_buffer, m_gradient) 327 | 328 | updates = OrderedDict(g_updates + b_updates) 329 | 330 | f_cost = theano.function(inputs = [X], outputs = show_COST) 331 | 332 | indexed_batch = train_X[index * state.batch_size : (index+1) * state.batch_size] 333 | sampled_batch = MRG.binomial(p = indexed_batch, size = indexed_batch.shape, dtype='float32') 334 | 335 | f_learn = theano.function(inputs = [index], 336 | updates = updates, 337 | givens = {X : indexed_batch}, 338 | outputs = show_COST) 339 | 340 | f_test = theano.function(inputs = [X], 341 | outputs = [X_corrupt] + hiddens[0] + p_X_chain, 342 | on_unused_input = 'warn') 343 | 344 | 345 | ############# 346 | # Denoise some numbers : show number, noisy number, reconstructed number 347 | ############# 348 | import random as R 349 | R.seed(1) 350 | random_idx = numpy.array(R.sample(range(len(test_X.get_value())), 100)) 351 | numbers = test_X.get_value()[random_idx] 352 | 353 | f_noise = theano.function(inputs = [X], outputs = salt_and_pepper(X, state.input_salt_and_pepper)) 354 | noisy_numbers = f_noise(test_X.get_value()[random_idx]) 355 | 356 | # Recompile the graph without noise for reconstruction function 357 | hiddens_R = [X] 358 | p_X_chain_R = [] 359 | 360 | for w,b in 
zip(weights_list, bias_list[1:]): 361 | # init with zeros 362 | hiddens_R.append(T.zeros_like(T.dot(hiddens_R[-1], w))) 363 | 364 | # The layer update scheme 365 | for i in range(N): 366 | update_layers(hiddens_R, p_X_chain_R, noisy=False) 367 | 368 | f_recon = theano.function(inputs = [X], outputs = p_X_chain_R[-1]) 369 | 370 | 371 | ############ 372 | # Sampling # 373 | ############ 374 | 375 | # the input to the sampling function 376 | network_state_input = [X] + [T.fmatrix() for i in range(K)] 377 | 378 | # "Output" state of the network (noisy) 379 | # initialized with input, then we apply updates 380 | #network_state_output = network_state_input 381 | 382 | network_state_output = [X] + network_state_input[1:] 383 | 384 | visible_pX_chain = [] 385 | 386 | # ONE update 387 | update_layers(network_state_output, visible_pX_chain, noisy=True) 388 | 389 | if K == 1: 390 | f_sample_simple = theano.function(inputs = [X], outputs = visible_pX_chain[-1]) 391 | 392 | 393 | # WHY IS THERE A WARNING???? 394 | # because the first odd layers are not used -> directly computed FROM THE EVEN layers 395 | # unused input = warn 396 | f_sample2 = theano.function(inputs = network_state_input, outputs = network_state_output + visible_pX_chain, on_unused_input='warn') 397 | 398 | def sample_some_numbers_single_layer(): 399 | x0 = test_X.get_value()[:1] 400 | samples = [x0] 401 | x = f_noise(x0) 402 | for i in range(399): 403 | x = f_sample_simple(x) 404 | samples.append(x) 405 | x = numpy.random.binomial(n=1, p=x, size=x.shape).astype('float32') 406 | x = f_noise(x) 407 | return numpy.vstack(samples) 408 | 409 | def sampling_wrapper(NSI): 410 | out = f_sample2(*NSI) 411 | NSO = out[:len(network_state_output)] 412 | vis_pX_chain = out[len(network_state_output):] 413 | return NSO, vis_pX_chain 414 | 415 | def sample_some_numbers(N=400): 416 | # The network's initial state 417 | init_vis = test_X.get_value()[:1] 418 | 419 | noisy_init_vis = f_noise(init_vis) 420 | 421 | network_state = [[noisy_init_vis] + [numpy.zeros((1,len(b.get_value())), dtype='float32') for b in bias_list[1:]]] 422 | 423 | visible_chain = [init_vis] 424 | 425 | noisy_h0_chain = [noisy_init_vis] 426 | 427 | for i in range(N-1): 428 | 429 | # feed the last state into the network, compute new state, and obtain visible units expectation chain 430 | net_state_out, vis_pX_chain = sampling_wrapper(network_state[-1]) 431 | 432 | # append to the visible chain 433 | visible_chain += vis_pX_chain 434 | 435 | # append state output to the network state chain 436 | network_state.append(net_state_out) 437 | 438 | noisy_h0_chain.append(net_state_out[0]) 439 | 440 | return numpy.vstack(visible_chain), numpy.vstack(noisy_h0_chain) 441 | 442 | def plot_samples(epoch_number): 443 | to_sample = time.time() 444 | if K == 1: 445 | # one layer model 446 | V = sample_some_numbers_single_layer() 447 | else: 448 | V, H0 = sample_some_numbers() 449 | img_samples = PIL.Image.fromarray(tile_raster_images(V, (root_N_input,root_N_input), (20,20))) 450 | 451 | fname = 'samples_epoch_'+str(epoch_number)+'.png' 452 | img_samples.save(fname) 453 | print 'Took ' + str(time.time() - to_sample) + ' to sample 400 numbers' 454 | 455 | ############## 456 | # Inpainting # 457 | ############## 458 | def inpainting(digit): 459 | # The network's initial state 460 | 461 | # NOISE INIT 462 | init_vis = cast32(numpy.random.uniform(size=digit.shape)) 463 | 464 | #noisy_init_vis = f_noise(init_vis) 465 | #noisy_init_vis = cast32(numpy.random.uniform(size=init_vis.shape)) 466 | 467 | # 
INDEXES FOR VISIBLE AND NOISY PART
468 |         noise_idx = (numpy.arange(N_input) % root_N_input < (root_N_input/2))
469 |         fixed_idx = (numpy.arange(N_input) % root_N_input > (root_N_input/2))
470 |         # function to re-init the visible to the same noise
471 | 
472 |         # FUNCTION TO RESET HALF VISIBLE TO DIGIT
473 |         def reset_vis(V):
474 |             V[0][fixed_idx] = digit[0][fixed_idx]
475 |             return V
476 | 
477 |         # INIT DIGIT : NOISE and RESET HALF TO DIGIT
478 |         init_vis = reset_vis(init_vis)
479 | 
480 |         network_state = [[init_vis] + [numpy.zeros((1,len(b.get_value())), dtype='float32') for b in bias_list[1:]]]
481 | 
482 |         visible_chain = [init_vis]
483 | 
484 |         noisy_h0_chain = [init_vis]
485 | 
486 |         for i in range(49):
487 | 
488 |             # feed the last state into the network, compute the new state, and obtain the visible units expectation chain
489 |             net_state_out, vis_pX_chain = sampling_wrapper(network_state[-1])
490 | 
491 | 
492 |             # reset half the digit
493 |             net_state_out[0] = reset_vis(net_state_out[0])
494 |             vis_pX_chain[0] = reset_vis(vis_pX_chain[0])
495 | 
496 |             # append to the visible chain
497 |             visible_chain += vis_pX_chain
498 | 
499 |             # append state output to the network state chain
500 |             network_state.append(net_state_out)
501 | 
502 |             noisy_h0_chain.append(net_state_out[0])
503 | 
504 |         return numpy.vstack(visible_chain), numpy.vstack(noisy_h0_chain)
505 | 
506 | 
507 | 
508 | 
509 | 
510 |     def save_params(n, params):
511 |         print 'saving parameters...'
512 |         save_path = 'params_epoch_'+str(n)+'.pkl'
513 |         f = open(save_path, 'wb')
514 |         try:
515 |             cPickle.dump(params, f, protocol=cPickle.HIGHEST_PROTOCOL)
516 |         finally:
517 |             f.close()
518 | 
519 |     # TRAINING
520 |     n_epoch = state.n_epoch
521 |     batch_size = state.batch_size
522 |     STOP = False
523 |     counter = 0
524 | 
525 |     train_costs = []
526 |     valid_costs = []
527 |     test_costs = []
528 | 
529 |     if state.vis_init:
530 |         bias_list[0].set_value(logit(numpy.clip(train_X.get_value().mean(axis=0), 0.001, 0.9)))
531 | 
532 |     if state.test_model:
533 |         # If testing, do not train and go directly to generating samples, parzen window estimation, and inpainting
534 |         print 'Testing : skip training'
535 |         STOP = True
536 | 
537 | 
538 |     while not STOP:
539 |         counter += 1
540 |         t = time.time()
541 |         print counter,'\t',
542 | 
543 |         #train
544 |         train_cost = []
545 |         for i in range(len(train_X.get_value(borrow=True)) / batch_size):
546 |             #train_cost.append(f_learn(train_X[i * batch_size : (i+1) * batch_size]))
547 |             #training_idx = numpy.array(range(i*batch_size, (i+1)*batch_size), dtype='int32')
548 |             train_cost.append(f_learn(i))
549 |         train_cost = numpy.mean(train_cost)
550 |         train_costs.append(train_cost)
551 |         print 'Train : ',trunc(train_cost), '\t',
552 | 
553 | 
554 |         #valid, evaluated in minibatches of 100
555 |         valid_cost = []
556 |         for i in range(len(valid_X.get_value(borrow=True)) / 100):
557 |             valid_cost.append(f_cost(valid_X.get_value()[i * 100 : (i+1) * 100]))
558 |         valid_cost = numpy.mean(valid_cost)
559 |         #valid_cost = 123
560 |         valid_costs.append(valid_cost)
561 |         print 'Valid : ', trunc(valid_cost), '\t',
562 | 
563 |         #test, evaluated in minibatches of 100
564 |         test_cost = []
565 |         for i in range(len(test_X.get_value(borrow=True)) / 100):
566 |             test_cost.append(f_cost(test_X.get_value()[i * 100 : (i+1) * 100]))
567 |         test_cost = numpy.mean(test_cost)
568 |         test_costs.append(test_cost)
569 |         print 'Test : ', trunc(test_cost), '\t',
570 | 
571 |         if counter >= n_epoch:
572 |             STOP = True
573 | 
574 |         print 'time : ', trunc(time.time() - t),
575 | 
576 |         print 'MeanVisB : ', trunc(bias_list[0].get_value().mean()),
577 | 
578 |         print 'W : ', [trunc(abs(w.get_value(borrow=True)).mean()) for w in weights_list]
579 | 
580 |         if (counter % 5) == 0:
581 |             # Checking reconstruction: rows of original / noisy / denoised digits
582 |             reconstructed = f_recon(noisy_numbers)
583 |             # Stack rows of 10: original, noisy, reconstructed
584 |             stacked = numpy.vstack([numpy.vstack([numbers[i*10 : (i+1)*10], noisy_numbers[i*10 : (i+1)*10], reconstructed[i*10 : (i+1)*10]]) for i in range(10)])
585 | 
586 |             number_reconstruction = PIL.Image.fromarray(tile_raster_images(stacked, (root_N_input,root_N_input), (10,30)))
587 |             #epoch_number = reduce(lambda x,y : x + y, ['_'] * (4-len(str(counter)))) + str(counter)
588 |             number_reconstruction.save('number_reconstruction'+str(counter)+'.png')
589 | 
590 |             #sample_numbers(counter, 'seven')
591 |             plot_samples(counter)
592 | 
593 |             #save params
594 |             save_params(counter, params)
595 | 
596 |         # ANNEAL!
597 |         new_lr = learning_rate.get_value() * annealing
598 |         learning_rate.set_value(new_lr)
599 | 
600 |     # Save
601 |     state.train_costs = train_costs
602 |     state.valid_costs = valid_costs
603 |     state.test_costs = test_costs
604 | 
605 |     # If testing: generate samples, run the Parzen estimator, and inpaint
606 | 
607 |     # 10k samples
608 |     print 'Generating 10,000 samples'
609 |     samples, _ = sample_some_numbers(N=10000)
610 |     f_samples = 'samples.npy'
611 |     numpy.save(f_samples, samples)
612 |     print 'saved digits'
613 | 
614 | 
615 |     # parzen
616 |     print 'Evaluating parzen window'
617 |     import likelihood_estimation_parzen
618 |     likelihood_estimation_parzen.main(0.20,'mnist')
619 | 
620 |     # Inpainting
621 |     print 'Inpainting'
622 |     test_X = test_X.get_value()
623 | 
624 |     numpy.random.seed(2)
625 |     test_idx = numpy.arange(len(test_Y))
626 | 
627 |     for Iter in range(10):
628 | 
629 |         numpy.random.shuffle(test_idx)
630 |         test_X = test_X[test_idx]
631 |         test_Y = test_Y[test_idx]
632 | 
633 |         digit_idx = [(test_Y==i).argmax() for i in range(10)]
634 |         inpaint_list = []
635 | 
636 |         for idx in digit_idx:
637 |             DIGIT = test_X[idx:idx+1]
638 |             V_inpaint, H_inpaint = inpainting(DIGIT)
639 |             inpaint_list.append(V_inpaint)
640 | 
641 |         INPAINTING = numpy.vstack(inpaint_list)
642 | 
643 |         plot_inpainting = PIL.Image.fromarray(tile_raster_images(INPAINTING, (root_N_input,root_N_input), (10,50)))
644 | 
645 |         fname = 'inpainting_'+str(Iter)+'.png'
646 |         #fname = os.path.join(state.model_path, fname)
647 | 
648 |         plot_inpainting.save(fname)
649 | 
650 |         if False and __name__ == "__main__":
651 |             os.system('eog inpainting.png')
652 | 
653 | 
654 | 
655 | 
656 | if __name__ == '__main__':
657 |     # model.py is driven by the run_*.py scripts; running it directly
658 |     # only drops into a debugger for interactive exploration.
659 |     import ipdb; ipdb.set_trace()

--------------------------------------------------------------------------------
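Before moving on to the run scripts: the salt-and-pepper corruption that model.py applies to the inputs (`salt_and_pepper`, rate set by `--input_salt_and_pepper`) is compact enough to restate outside Theano. The helper below is a hypothetical numpy transcription of the same two-binomial construction, not part of the repository:

    import numpy

    def salt_and_pepper_numpy(X, p=0.4, rng=numpy.random):
        # keep[i,j] == 1 with probability 1-p: those pixels survive untouched
        keep = rng.binomial(n=1, p=1 - p, size=X.shape)
        # corrupted pixels are reset to 0 or 1 with equal probability
        coin = rng.binomial(n=1, p=0.5, size=X.shape)
        return X * keep + (1 - keep) * coin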
/run_dae_no_walkback.py:
--------------------------------------------------------------------------------
1 | """
2 | This script trains a single layer model known as a Generalized Denoising Auto-Encoder
3 | WITHOUT the walkback training procedure.
4 | 
5 | Reference paper:
6 | 'Generalized Denoising Auto-Encoders as Generative Models'
7 | Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent
8 | http://arxiv.org/abs/1305.6663
9 | """
10 | import argparse
11 | import model
12 | 
13 | def main():
14 |     parser = argparse.ArgumentParser()
15 |     # Add options here
16 |     parser.add_argument('--K', type=int, default=1)  # number of hidden layers
17 |     parser.add_argument('--N', type=int, default=1)  # number of walkbacks
18 |     parser.add_argument('--n_epoch', type=int, default=200)
19 |     parser.add_argument('--batch_size', type=int, default=100)
20 |     parser.add_argument('--hidden_add_noise_sigma', type=float, default=0)
21 |     parser.add_argument('--input_salt_and_pepper', type=float, default=0.4)
22 |     parser.add_argument('--learning_rate', type=float, default=10)
23 |     parser.add_argument('--momentum', type=float, default=0.)
24 |     parser.add_argument('--annealing', type=float, default=1.)
25 |     parser.add_argument('--hidden_size', type=int, default=2000)
26 |     parser.add_argument('--act', type=str, default='sigmoid')
27 |     parser.add_argument('--dataset', type=str, default='MNIST_binary')
28 |     parser.add_argument('--data_path', type=str, default='.')
29 | 
30 |     # argparse does not deal with bool
31 |     parser.add_argument('--vis_init', type=int, default=0)
32 |     parser.add_argument('--noiseless_h1', type=int, default=1)
33 |     parser.add_argument('--input_sampling', type=int, default=1)
34 |     parser.add_argument('--test_model', type=int, default=0)
35 | 
36 |     args = parser.parse_args()
37 | 
38 |     model.experiment(args, None)
39 | 
40 | if __name__ == '__main__':
41 |     main()

--------------------------------------------------------------------------------
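All three run scripts accept `--test_model 1`. In that mode `model.experiment` skips training and loads the newest checkpoint; its selection logic amounts to the following sketch (same `params_epoch_<n>.pkl` naming convention as model.py):

    import os
    import numpy

    # Pick the params_epoch_X.pkl with the largest epoch number X,
    # exactly as model.experiment() does before testing.
    param_files = [f for f in os.listdir('.') if 'params' in f]
    epochs = [int(f.split('_')[-1].split('.')[0]) for f in param_files]
    newest = param_files[numpy.argmax(epochs)]
    print 'would load', newest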
/run_dae_walkback.py:
--------------------------------------------------------------------------------
1 | """
2 | This script trains a single layer model known as a Generalized Denoising Auto-Encoder
3 | WITH 5 steps of 'walkback' training.
4 | 
5 | Reference paper:
6 | 'Generalized Denoising Auto-Encoders as Generative Models'
7 | Yoshua Bengio, Li Yao, Guillaume Alain, Pascal Vincent
8 | http://arxiv.org/abs/1305.6663
9 | 
10 | """
11 | import argparse
12 | import model
13 | 
14 | def main():
15 |     parser = argparse.ArgumentParser()
16 |     # Add options here
17 |     parser.add_argument('--K', type=int, default=1)  # number of hidden layers
18 |     parser.add_argument('--N', type=int, default=5)  # number of walkbacks
19 |     parser.add_argument('--n_epoch', type=int, default=500)
20 |     parser.add_argument('--batch_size', type=int, default=100)
21 |     parser.add_argument('--hidden_add_noise_sigma', type=float, default=0)
22 |     parser.add_argument('--input_salt_and_pepper', type=float, default=0.4)
23 |     parser.add_argument('--learning_rate', type=float, default=10)
24 |     parser.add_argument('--momentum', type=float, default=0.)
25 |     parser.add_argument('--annealing', type=float, default=1.)
26 |     parser.add_argument('--hidden_size', type=int, default=2000)
27 |     parser.add_argument('--act', type=str, default='sigmoid')
28 |     parser.add_argument('--dataset', type=str, default='MNIST_binary')
29 |     parser.add_argument('--data_path', type=str, default='.')
30 | 
31 |     # argparse does not deal with bool
32 |     parser.add_argument('--vis_init', type=int, default=0)
33 |     parser.add_argument('--noiseless_h1', type=int, default=1)
34 |     parser.add_argument('--input_sampling', type=int, default=1)
35 |     parser.add_argument('--test_model', type=int, default=0)
36 | 
37 |     args = parser.parse_args()
38 | 
39 |     model.experiment(args, None)
40 | 
41 | if __name__ == '__main__':
42 |     main()

--------------------------------------------------------------------------------
/run_gsn.py:
--------------------------------------------------------------------------------
1 | '''
2 | This script produces the model trained on MNIST discussed in the paper:
3 | 
4 | 'Deep Generative Stochastic Networks Trainable by Backprop'
5 | Yoshua Bengio, Eric Thibodeau-Laufer, Jason Yosinski
6 | http://arxiv.org/abs/1306.1091
7 | '''
8 | import argparse
9 | import model
10 | 
11 | def main():
12 |     parser = argparse.ArgumentParser()
13 |     # Add options here
14 | 
15 |     parser.add_argument('--K', type=int, default=2)  # number of hidden layers
16 |     parser.add_argument('--N', type=int, default=4)  # number of walkbacks
17 |     parser.add_argument('--n_epoch', type=int, default=1000)
18 |     parser.add_argument('--batch_size', type=int, default=100)
19 |     parser.add_argument('--hidden_add_noise_sigma', type=float, default=2)
20 |     parser.add_argument('--input_salt_and_pepper', type=float, default=0.4)
21 |     parser.add_argument('--learning_rate', type=float, default=0.25)
22 |     parser.add_argument('--momentum', type=float, default=0.5)
23 |     parser.add_argument('--annealing', type=float, default=0.995)
24 |     parser.add_argument('--hidden_size', type=int, default=1500)
25 |     parser.add_argument('--act', type=str, default='tanh')
26 |     parser.add_argument('--dataset', type=str, default='MNIST')
27 |     parser.add_argument('--data_path', type=str, default='.')
28 | 
29 |     # argparse does not deal with bool
30 |     parser.add_argument('--vis_init', type=int, default=0)
31 |     parser.add_argument('--noiseless_h1', type=int, default=1)
32 |     parser.add_argument('--input_sampling', type=int, default=1)
33 |     parser.add_argument('--test_model', type=int, default=0)
34 | 
35 |     args = parser.parse_args()
36 | 
37 |     print args.test_model
38 | 
39 |     model.experiment(args, None)
40 | 
41 | if __name__ == '__main__':
42 |     main()
43 | 
--------------------------------------------------------------------------------
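A closing note on sampling: the "consecutive Gibbs samples" saved under images/ come from the loop in model.py's `sample_some_numbers_single_layer` (and its multi-layer variant `sample_some_numbers`). Stripped of Theano, one chain looks like the sketch below, where `reconstruct` is a hypothetical stand-in for the trained network's one-step denoising function p(X | corrupted X):

    import numpy

    def pseudo_gibbs_chain(x0, reconstruct, p=0.4, steps=400, rng=numpy.random):
        """x0: (1, 784) starting digit; returns the chain of visible expectations."""
        def corrupt(v):
            # same salt-and-pepper noise as model.py's salt_and_pepper
            keep = rng.binomial(n=1, p=1 - p, size=v.shape)
            return v * keep + (1 - keep) * rng.binomial(n=1, p=0.5, size=v.shape)

        chain = [x0]
        x = corrupt(x0)
        for _ in range(steps - 1):
            px = reconstruct(x)                              # mean of p(X | ...)
            chain.append(px)
            x = rng.binomial(n=1, p=px).astype('float32')    # sample the visibles
            x = corrupt(x)                                   # corrupt again, repeat
        return numpy.vstack(chain)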