├── .gitignore
├── LICENSE
├── README.md
├── lib
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── __pycache__
│   │   ├── __init__.cpython-35.pyc
│   │   ├── dbn.cpython-35.pyc
│   │   ├── deeplearning.cpython-35.pyc
│   │   ├── mlp.cpython-35.pyc
│   │   └── rbm.cpython-35.pyc
│   ├── dbn.py
│   ├── dbn.pyc
│   ├── deeplearning.py
│   ├── deeplearning.pyc
│   ├── mlp.py
│   ├── mlp.pyc
│   ├── rbm.py
│   └── rbm.pyc
├── notebooks
│   ├── data_proc.ipynb
│   ├── topic_modelling.ipynb
│   ├── train_dbn.ipynb
│   ├── train_sae.ipynb
│   └── train_sae_2000.ipynb
├── scripts_R
│   └── clustering.R
├── scripts_old
│   ├── UT_1_gibbs_sampling.ipynb
│   ├── Untitled.ipynb
│   ├── Untitled1.ipynb
│   ├── Untitled2.ipynb
│   ├── finalized_1_gibbs.ipynb
│   ├── gen_testing_1.ipynb
│   └── score_rsm.ipynb
└── scripts_python
    ├── 20news_dtm.py
    ├── data_proc_20news.py
    ├── pretrain_dbn.py
    ├── train_dbn.py
    ├── train_rsm.py
    └── train_sae.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | data/
2 | old_scripts/
3 | params*/
4 | trash/
5 | .ipynb*/
6 | pgo*/

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Deep Learning Topic Modelling
2 | This repo is a collection of neural network tools, built on top of the [Theano](http://deeplearning.net/software/theano/) framework, with the primary objective of performing topic modelling. Topic modelling is commonly approached using the [Latent Dirichlet Allocation](https://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf) (LDA) or [Latent Semantic Analysis](https://en.wikipedia.org/wiki/Latent_semantic_analysis) (LSA) algorithms. More recently, with the advent of modelling count data using [Restricted Boltzmann Machines](https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine) (RBMs), in this setting known as the [Replicated Softmax Model](https://papers.nips.cc/paper/3856-replicated-softmax-an-undirected-topic-model.pdf) (RSM), deep neural network models were adapted to perform topic modelling, with results empirically shown to be in better agreement with human semantic interpretations (see [[1](http://www.utstat.toronto.edu/~rsalakhu/papers/topics.pdf)]).
3 | 
4 | The model construction comprises three phases.
5 | 
6 | ![model overview](http://i.imgur.com/pVs5Rvb.png)_Image taken from [[1](http://www.utstat.toronto.edu/~rsalakhu/papers/topics.pdf)]_
7 | 
8 | 1. The first phase is to design the network architecture: an RSM models the input count data, and as many layers of RBMs as deemed reasonable are stacked on top to model the outputs of the RSM. The stacking of RBMs (and the RSM) leads to what is called a Deep Generative Model or, more specifically in this case, a [Deep Belief Network](http://deeplearning.net/tutorial/DBN.html) (DBN). Like a single-layered RSM or RBM, this multi-layered network is bidirectional: it can generate encoded outputs from input data and, more distinctly, generate 'input' data from encoded data. Unlike single-layered networks, however, multi-layered networks are more likely to generate input data that closely resembles the training data, owing to their ability to capture structure in high dimensions.
9 | 
10 | 
11 | 2. Once the network's architecture is defined, pre-training follows. Pre-training has empirically been shown to improve the accuracy (or other measures) of neural network models, and one of the main hypotheses justifying this phenomenon is that pre-training configures the network to start off at a more optimal point compared to a random initialization.
12 | 
13 | 
14 | 3. After pre-training, the DBN is unrolled to produce an [Auto-Encoder](http://deeplearning.net/tutorial/dA.html). Auto-Encoders take input data, reduce it to a lower-dimensional representation, and then reconstruct it to be as close as possible to its input form. This is effectively a form of data compression but, more importantly, it also means that the lower-dimensional representation holds sufficient information about its higher-dimensional input for reconstruction to be feasible. Once training, or more appropriately fine-tuning in this case, is completed, only the segment of the Auto-Encoder that produces the lower-dimensional output is retained. A sketch of all three phases in code follows this list.
15 | 
16 | 
17 | As these lower-dimensional representations of the input data are easier to work with, algorithms that establish similarities between data points can be applied to the compressed data to indirectly estimate similarities between the original inputs. For text data broken down into counts of words in documents, this dimension-reduction technique can be used as an alternative method of information retrieval or topic modelling.
18 | 
19 | 
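In code, the three phases map onto this repo's classes roughly as follows. This is a minimal sketch, not a recipe: the architecture, epoch counts and file paths are illustrative (they mirror the notebooks), and the `deepbeliefnet` and `autoencoder` wrappers are the ones defined in `lib/deeplearning.py`.

```python
import numpy as np
import theano
from lib.deeplearning import deepbeliefnet, autoencoder

# Load a document-term matrix (first column = class label, remainder = word counts).
dat = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header=1)
x = theano.shared(dat[:, 1:])

# Phases 1 & 2: define the architecture (RSM input layer + stacked RBMs) and pre-train it.
dbn = deepbeliefnet(architecture=[2000, 500, 500, 128], n_outs=20)
dbn.pretrain(input=x, pretraining_epochs=100, batch_size=800,
             output_path='params/dbn_params')

# Phase 3: unroll the pre-trained DBN into an auto-encoder and fine-tune it
# (opt_epochs picks which saved pre-training epoch to load for each layer).
ae = autoencoder(architecture=[2000, 500, 500, 128], opt_epochs=[99, 99, 99],
                 model_src='params/dbn_params')
ae.train(x, epochs=1000, batch_size=500, output_path='params/ae_params')
codes = ae.score(x)   # 128-dimensional representations of the documents
```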
20 | ## Codes
21 | Much of the code here is a modification of, and addition to, the libraries provided by the developers of Theano at http://deeplearning.net/tutorial/. While Theano may now have been slightly overshadowed by its more prominent counterpart, [TensorFlow](https://www.tensorflow.org/), the tutorials and codes at deeplearning.net still provide a good avenue for anyone who wants a deeper introduction to deep learning and its mechanics. Moreover, given the undeniable inspiration that TensorFlow drew from Theano, once Theano is mastered, the transition from Theano to TensorFlow should be almost seamless.
22 | 
23 | The main codes are found in the **lib** folder, where we have:
24 | 
25 | |no.| codes| description |
26 | |:-:|:-----:|:----|
27 | |1 | [rbm.py](https://github.com/krenova/DeepLearningTopicModels/blob/master/lib/rbm.py) | contains the RBM and RSM classes |
28 | |2 | [mlp.py](https://github.com/krenova/DeepLearningTopicModels/blob/master/lib/mlp.py) | contains the sigmoid (hidden) layer and logistic regression classes |
29 | |3 | [dbn.py](https://github.com/krenova/DeepLearningTopicModels/blob/master/lib/dbn.py) | the DBN class, which constructs the network functions for pre-training and fine-tuning |
30 | |4 | [deeplearning.py](https://github.com/krenova/DeepLearningTopicModels/blob/master/lib/deeplearning.py)| wrapper around the DBN class |
31 | 
32 | 
33 | ## Examples
34 | Examples of using the tools in this repo are written in Jupyter notebooks. The data for the examples can be sourced from
35 | http://qwone.com/~jason/20Newsgroups/20news-18828.tar.gz.
36 | 
37 | |no.| codes| description |
38 | |:-:|:-----:|:----|
39 | |1 | [data_proc.ipynb](https://github.com/krenova/DeepLearningTopicModels/blob/master/notebooks/data_proc.ipynb) | notebook to process the raw data (please change the data dir name accordingly) |
40 | |2 | [train_dbn.ipynb](https://github.com/krenova/DeepLearningTopicModels/blob/master/notebooks/train_dbn.ipynb) | demonstrates how to pre-train the DBN and subsequently turn it into a Multilayer Perceptron for document classification |
41 | |3 | [train_sae.ipynb](https://github.com/krenova/DeepLearningTopicModels/blob/master/notebooks/train_sae.ipynb) | trains the pre-trained model from train_dbn.ipynb as an Auto-Encoder; a condensed version appears below |
42 | |4 | [topic_modelling.ipynb](https://github.com/krenova/DeepLearningTopicModels/blob/master/notebooks/topic_modelling.ipynb)| clusters the lower-dimensional output of the Auto-Encoder (the clustering itself is done in R) |
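The following condenses the train_sae.ipynb flow into a single script. It is a sketch under assumptions: the pre-trained parameter files are assumed to already exist under `params/dbn_params`, the epoch picks `[900, 5, 10]` are the ones used in the notebook, and the output file name `data/ae_features.csv` is illustrative.

```python
import numpy as np
import pandas as pd
import theano
from lib.deeplearning import autoencoder

# Document-term matrix: column 0 holds the labels, the rest are word counts.
dat = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header=1)
labels, counts = dat[:, 0], dat[:, 1:]
x = theano.shared(counts)

# Load the pre-trained DBN weights into the auto-encoder and fine-tune it.
model = autoencoder(architecture=[2756, 500, 500, 128], opt_epochs=[900, 5, 10],
                    model_src='params/dbn_params')
model.train(x, epochs=110, batch_size=100, add_noise=16, output_path='params/ae_params')

# Encode every document into its 128-dimensional representation.
codes = model.score(x)

# Save the codes with their labels for clustering in scripts_R/clustering.R.
cols = ['_label_'] + ['bit' + str(i) for i in range(128)]
pd.DataFrame(np.c_[labels, codes], columns=cols).to_csv('data/ae_features.csv', index=False)
```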
43 | 
44 | 
45 | 
46 | ## Reading References
47 | 1. http://www.utstat.toronto.edu/~rsalakhu/papers/topics.pdf
48 | 2. http://deeplearning.net/tutorial/rbm.html
49 | 3. http://deeplearning.net/tutorial/DBN.html
50 | 4. http://deeplearning.net/tutorial/dA.html
51 | 5. http://deeplearning.net/tutorial/SdA.html

--------------------------------------------------------------------------------
/lib/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__init__.py

--------------------------------------------------------------------------------
/lib/__init__.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__init__.pyc

--------------------------------------------------------------------------------
/lib/__pycache__/__init__.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__pycache__/__init__.cpython-35.pyc

--------------------------------------------------------------------------------
/lib/__pycache__/dbn.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__pycache__/dbn.cpython-35.pyc

--------------------------------------------------------------------------------
/lib/__pycache__/deeplearning.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__pycache__/deeplearning.cpython-35.pyc

--------------------------------------------------------------------------------
/lib/__pycache__/mlp.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__pycache__/mlp.cpython-35.pyc

--------------------------------------------------------------------------------
/lib/__pycache__/rbm.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__pycache__/rbm.cpython-35.pyc
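For completeness before the library sources below, the supervised path from train_dbn.ipynb: the pre-trained DBN is fine-tuned with labels as an MLP classifier. Again a sketch; the epoch picks and paths are illustrative, and `n_outs=20` assumes the 20 newsgroup classes.

```python
import numpy as np
import theano
from lib.deeplearning import deepbeliefnet

dat = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header=1)
x = theano.shared(dat[:, 1:])                 # word counts
y = theano.shared(dat[:, 0].astype('int32'))  # newsgroup labels

# Rebuild the DBN and load the pre-trained layer-wise weights.
model = deepbeliefnet(architecture=[2000, 500, 500, 128], n_outs=20,
                      opt_epochs=[900, 5, 10], predefined_weights='params/dbn_params')

# Supervised fine-tuning with an internal train/valid/test split, then prediction.
model.train(x, y, split_prop=[0.65, 0.15, 0.20], training_epochs=100,
            batch_size=800, output_path='params/dbn_params_trained')
pred = model.predict(x)                       # predicted class per document
```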
--------------------------------------------------------------------------------
/lib/dbn.py:
--------------------------------------------------------------------------------
1 | """Deep Belief Network: an RSM input layer with RBMs stacked on top,
2 | adapted for topic modelling from http://deeplearning.net/tutorial/DBN.html"""
3 | from __future__ import print_function, division
4 | import os
5 | import sys
6 | import timeit
7 | 
8 | import numpy
9 | 
10 | import theano
11 | import theano.tensor as T
12 | from theano.sandbox.rng_mrg import MRG_RandomStreams
13 | 
14 | from lib.mlp import HiddenLayer, LogisticRegression
15 | from lib.rbm import RBM, RSM
16 | 
17 | 
18 | # start-snippet-1
19 | class DBN(object):
20 |     """Deep Belief Network
21 | 
22 |     A deep belief network is obtained by stacking several RBMs on top of each
23 |     other. The hidden layer of the RBM at layer `i` becomes the input of the
24 |     RBM at layer `i+1`. The first layer RBM gets as input the input of the
25 |     network, and the hidden layer of the last RBM represents the output. When
26 |     used for classification, the DBN is treated as an MLP, by adding a logistic
27 |     regression layer on top.
28 |     """
29 | 
30 |     def __init__(self, numpy_rng=None, theano_rng=None, n_ins=784,
31 |                  hidden_layers_sizes=[500, 500], n_outs=10):
32 |         """This class is made to support a variable number of layers.
33 | 
34 |         :type numpy_rng: numpy.random.RandomState
35 |         :param numpy_rng: numpy random number generator used to draw initial
36 |                     weights
37 | 
38 |         :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
39 |         :param theano_rng: Theano random generator; if None is given one is
40 |                     generated based on a seed drawn from `rng`
41 | 
42 |         :type n_ins: int
43 |         :param n_ins: dimension of the input to the DBN
44 | 
45 |         :type hidden_layers_sizes: list of ints
46 |         :param hidden_layers_sizes: intermediate layers size, must contain
47 |                     at least one value
48 | 
49 |         :type n_outs: int
50 |         :param n_outs: dimension of the output of the network
51 |         """
52 | 
53 |         self.sigmoid_layers = []
54 |         self.rbm_layers = []
55 |         self.params = []
56 |         self.params_rbm = []
57 |         self.n_layers = len(hidden_layers_sizes)
58 |         self.hidden_layers_sizes = hidden_layers_sizes
59 | 
60 |         assert self.n_layers > 0
61 | 
62 |         if not numpy_rng:
63 |             numpy_rng = numpy.random.RandomState(123)
64 |         if not theano_rng:
65 |             self.theano_rng = T.shared_randomstreams.RandomStreams(1234)
66 |         # allocate symbolic variables for the data
67 | 
68 |         # the data is presented as a matrix of word counts (one row per document)
69 |         self.x = T.matrix('x')
70 | 
71 |         # the labels are presented as 1D vector of [int] labels
72 |         self.y = T.ivector('y')
73 |         # end-snippet-1
74 |         # The DBN is an MLP, for which all weights of intermediate
75 |         # layers are shared with a different RBM. We will first
76 |         # construct the DBN as a deep multilayer perceptron, and when
77 |         # constructing each sigmoidal layer we also construct an RBM
78 |         # that shares weights with that layer. During pretraining we
79 |         # will train these RBMs (which will lead to changing the
80 |         # weights of the MLP as well). During finetuning we will finish
81 |         # training the DBN by doing stochastic gradient descent on the
82 |         # MLP.
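
        # For example, with n_ins=2000 and hidden_layers_sizes=[500, 500, 128]
        # (the architecture used in the notebooks), the loop below builds:
        #   layer 0: RSM  2000 -> 500   (models the raw word-count input)
        #   layer 1: RBM   500 -> 500
        #   layer 2: RBM   500 -> 128
        # Each HiddenLayer and its paired RSM/RBM share the same W and hbias,
        # so pre-training the RBM stack directly initializes the MLP weights.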
83 | 
84 |         for i in range(self.n_layers):
85 |             # construct the sigmoidal layer
86 | 
87 |             # the size of the input is either the number of hidden
88 |             # units of the layer below or the input size if we are on
89 |             # the first layer
90 |             if i == 0:
91 |                 input_size = n_ins
92 |             else:
93 |                 input_size = hidden_layers_sizes[i - 1]
94 | 
95 |             # the input to this layer is either the activation of the
96 |             # hidden layer below or the input of the DBN if you are on
97 |             # the first layer
98 |             if i == 0:
99 |                 layer_input = self.x
100 |             else:
101 |                 layer_input = self.sigmoid_layers[-1].output
102 |             print( 'Building layer: ' + str(i) )
103 |             print( '   Input units: ' + str(input_size) )
104 |             print( '   Output units: ' + str(hidden_layers_sizes[i]) )
105 |             sigmoid_layer = HiddenLayer(rng=numpy_rng,
106 |                                         input=layer_input,
107 |                                         n_in=input_size,
108 |                                         n_out=hidden_layers_sizes[i],
109 |                                         activation=T.nnet.sigmoid)
110 | 
111 |             # add the layer to our list of layers
112 |             self.sigmoid_layers.append(sigmoid_layer)
113 | 
114 |             # it's arguably a philosophical question... but we are
115 |             # going to only declare that the parameters of the
116 |             # sigmoid_layers are parameters of the DBN. The visible
117 |             # biases in the RBM are parameters of those RBMs, but not
118 |             # of the DBN.
119 |             self.params.extend(sigmoid_layer.params)
120 | 
121 |             # Construct an RBM that shares weights with this layer.
122 |             # The first layer will be an RSM for inputs of count data
123 |             # while the other hidden layers will be RBMs
124 |             if i == 0:
125 |                 rbm_layer = RSM(input=layer_input,
126 |                                 n_visible=input_size,
127 |                                 n_hidden=hidden_layers_sizes[i],
128 |                                 W=sigmoid_layer.W,
129 |                                 hbias=sigmoid_layer.b)
130 |             else:
131 |                 rbm_layer = RBM(numpy_rng=numpy_rng,
132 |                                 theano_rng=self.theano_rng,
133 |                                 input=layer_input,
134 |                                 n_visible=input_size,
135 |                                 n_hidden=hidden_layers_sizes[i],
136 |                                 W=sigmoid_layer.W,
137 |                                 hbias=sigmoid_layer.b)
138 |             self.rbm_layers.append(rbm_layer)
139 |             self.params_rbm.extend(rbm_layer.params)
140 | 
141 |         # We now need to add a logistic layer on top of the MLP
142 |         self.logLayer = LogisticRegression(
143 |             input=self.sigmoid_layers[-1].output,
144 |             n_in=hidden_layers_sizes[-1],
145 |             n_out=n_outs)
146 |         self.params.extend(self.logLayer.params)
147 | 
148 |         # compute the cost for second phase of training, defined as the
149 |         # negative log likelihood of the logistic regression (output) layer
150 |         self.finetune_cost = self.logLayer.negative_log_likelihood(self.y)
151 | 
152 |         # compute the gradients with respect to the model parameters
153 |         # symbolic variable that points to the number of errors made on the
154 |         # minibatch given by self.x and self.y
155 |         self.errors = self.logLayer.errors(self.y)
156 | 
157 |     def pretraining_functions(self, train_set_x, batch_size, k):
158 |         '''
159 |         Generates a list of functions for performing one step of
160 |         gradient descent at a given layer. The function will require
161 |         as input the minibatch index, and to train an RBM you just
162 |         need to iterate, calling the corresponding function on all
163 |         minibatch indexes.
164 | 
165 |         :type train_set_x: theano.tensor.TensorType
166 |         :param train_set_x: Shared var. 
that contains all datapoints used
167 |                             for training the RBM
168 |         :type batch_size: int
169 |         :param batch_size: size of a [mini]batch
170 |         :param k: number of Gibbs steps to do in CD-k / PCD-k
171 | 
172 |         '''
173 | 
174 |         # index to a [mini]batch
175 |         index = T.lscalar('index')  # index to a minibatch
176 |         learning_rate = T.scalar('lr')  # learning rate to use
177 | 
178 |         # beginning of a batch, given `index`
179 |         batch_begin = index * batch_size
180 |         # ending of a batch given `index`
181 |         batch_end = batch_begin + batch_size
182 | 
183 |         pretrain_fns = []
184 |         for rbm in self.rbm_layers:
185 | 
186 |             # get the cost and the updates list
187 |             # using PCD-k here (a persistent Gibbs chain) for training each RBM.
188 |             # TODO: change cost function to reconstruction error
189 |             persistent_chain = theano.shared(numpy.zeros((batch_size, rbm.n_hidden),
190 |                                                          dtype=theano.config.floatX),
191 |                                              borrow=True)
192 | 
193 |             cost, updates = rbm.get_cost_updates(learning_rate,
194 |                                                  persistent=persistent_chain, k=k)
195 | 
196 |             # compile the theano function
197 |             fn = theano.function(
198 |                 inputs=[index, theano.In(learning_rate, value=0.1)],
199 |                 outputs=cost,
200 |                 updates=updates,
201 |                 givens={
202 |                     self.x: train_set_x[batch_begin:batch_end]
203 |                 }
204 |             )
205 |             # append `fn` to the list of functions
206 |             pretrain_fns.append(fn)
207 | 
208 |         return pretrain_fns
209 | 
210 |     def auto_encoding(self, input,
211 |                       batch_size=500, learning_rate=None,
212 |                       add_noise=None, obj_fn='cross_entropy'):
213 | 
214 |         if learning_rate is None:
215 |             learning_rate = 1/batch_size
216 | 
217 |         # Encoding input data
218 |         train_set_x = input
219 |         N_input_x = train_set_x.shape[0]
220 |         if add_noise:
221 |             assert type(add_noise) == float or type(add_noise) == int, "'add_noise' must be either None, float or int"
222 |             noise = T.matrix('noise')
223 |             train_noise = self.theano_rng.normal(
224 |                 size=(N_input_x.eval(), self.hidden_layers_sizes[-1]),
225 |                 avg=0,
226 |                 std=add_noise,
227 |                 ndim=None
228 |             )
229 |         fwd_pass = self.x
230 | 
231 |         for i in range(self.n_layers):
232 |             _, fwd_pass = self.rbm_layers[i].propup(fwd_pass)
233 | 
234 |         # Optionally perturb the encoded data with noise
235 |         if add_noise:
236 |             fwd_pass += noise
237 | 
238 |         # Decoding encoded input data
239 |         for i in reversed(range(self.n_layers)):
240 |             _, fwd_pass = self.rbm_layers[i].propdown(fwd_pass)
241 | 
242 |         if obj_fn == 'cross_entropy':
243 |             # ------ Objective Function: multinomial cross entropy ------ #
244 |             #L = - T.sum(self.x * T.log(fwd_pass), axis=1)
245 |             x_normalized = self.x / self.rbm_layers[0].input_rSum[:,None]
246 |             L = - T.sum(x_normalized * T.log(fwd_pass), axis=1)
247 |         else:
248 |             # ------ Objective Function: square error ------ #
249 |             L = T.sum( T.pow( fwd_pass - (self.x/self.rbm_layers[0].input_rSum[:,None]), 2), axis=1)
250 | 
251 |         # mean cost
252 |         cost = T.mean(L)
253 | 
254 |         # compute the gradients of the cost of the `dA` with respect
255 |         # to its parameters
256 |         gparams = T.grad(cost, self.params_rbm)
257 | 
258 |         # generate the list of updates
259 |         updates = [
260 |             (param, param - learning_rate * gparam)
261 |             for param, gparam in zip(self.params_rbm, gparams)
262 |         ]
263 | 
264 |         index = T.lscalar('index')
265 |         if add_noise:
266 |             train_dae = theano.function(
267 |                 inputs=[index],
268 |                 outputs=cost,
269 |                 updates=updates,
270 |                 givens={
271 |                     self.x: train_set_x[index * batch_size: (index + 1) * batch_size],
272 |                     noise: train_noise[index * batch_size: (index + 1) * batch_size]
273 |                 }
274 |             )
275 |         else:
276 |             train_dae = 
theano.function(
277 |                 inputs=[index],
278 |                 outputs=cost,
279 |                 updates=updates,
280 |                 givens={
281 |                     self.x: train_set_x[index * batch_size: (index + 1) * batch_size]
282 |                 }
283 |             )
284 | 
285 |         return train_dae
286 | 
287 | 
288 | 
289 |     def apply_dropout(self, input, corruption_level):
290 |         return self.theano_rng.binomial(size=input.shape, n=1,
291 |                                         p=1 - corruption_level,
292 |                                         dtype=theano.config.floatX) * input
293 | 
294 |     def build_finetune_functions(self, x, y,
295 |                                  batch_size, learning_rate,
296 |                                  drop_out=None,
297 |                                  split_prop=[0.65, 0.15, 0.20]):
298 |         '''Generates a function `train` that implements one step of
299 |         finetuning, a function `validate` that computes the error on a
300 |         batch from the validation set, and a function `test` that
301 |         computes the error on a batch from the testing set
302 | 
303 |         :type x: theano.tensor.TensorType (shared)
304 |         :param x: shared variable containing the datapoints; it is
305 |                   randomly split into `train`, `valid` and `test`
306 |                   sets according to `split_prop`
307 |         :type y: theano.tensor.TensorType (shared)
308 |         :param y: shared variable containing the corresponding labels
309 |         :type batch_size: int
310 |         :param batch_size: size of a minibatch
311 |         :type learning_rate: float
312 |         :param learning_rate: learning rate used during finetune stage
313 | 
314 |         '''
315 |         assert y.shape[0].eval() == x.shape[0].eval(), "independent and target length do not match!"
316 |         assert len(split_prop) == 3 and type(split_prop) == list, \
317 |             "'split_prop' cannot have more or less than 3 inputs and must be in list format"
318 | 
319 |         N = y.shape[0].eval()
320 |         split_prop = numpy.array(split_prop)
321 |         split_prop = split_prop / split_prop.sum()
322 |         idx_rand = numpy.random.permutation(N)  # shuffle indices without replacement so the splits are disjoint
323 |         idx_train = idx_rand[:int(N*split_prop[0])]
324 |         idx_valid = idx_rand[len(idx_train):(len(idx_train)+int(N*split_prop[1]))]
325 |         idx_test = idx_rand[(len(idx_train)+len(idx_valid)):]
326 | 
327 |         (train_set_x, train_set_y) = (x[idx_train,], y[idx_train])
328 |         (valid_set_x, valid_set_y) = (x[idx_valid,], y[idx_valid])
329 |         (test_set_x, test_set_y) = (x[idx_test,], y[idx_test])
330 | 
331 |         # compute number of minibatches for training, validation and testing
332 |         n_train_batches = train_set_x.shape[0].eval()
333 |         n_train_batches //= batch_size
334 |         n_valid_batches = valid_set_x.shape[0].eval()
335 |         n_valid_batches //= batch_size
336 |         n_test_batches = test_set_x.shape[0].eval()
337 |         n_test_batches //= batch_size
338 | 
339 |         index = T.lscalar('index')  # index to a [mini]batch
340 | 
341 |         if drop_out:
342 | 
343 |             assert type(drop_out) == list, "'drop_out' variable must be None or a list of proportions"
344 |             assert len(drop_out) == (self.n_layers+1), "len of 'drop_out' list must equal number of hidden layers plus one (for the output layer)"
345 | 
346 |             x_dropout = T.matrix('x_dropout')
347 |             fwd_pass = self.x
348 | 
349 |             for i in range(self.n_layers):
350 |                 self.sigmoid_layers[i].input = self.apply_dropout(fwd_pass, drop_out[i])
351 |                 fwd_pass = self.sigmoid_layers[i].output
352 | 
353 |             self.logLayer.input = self.apply_dropout(fwd_pass, drop_out[self.n_layers])
354 |             finetune_cost_dropout = self.logLayer.negative_log_likelihood(self.y)
355 | 
356 |             # compute the gradients with respect to the model parameters
357 |             gparams = T.grad(finetune_cost_dropout, self.params)
358 | 
359 |             # compute list of fine-tuning updates
360 |             updates = []
361 |             for param, gparam in zip(self.params, gparams):
362 |                 updates.append((param, param - gparam * learning_rate))
363 | 
364 |             train_fn = 
theano.function( 365 | inputs=[index], 366 | outputs=finetune_cost_dropout, 367 | updates=updates, 368 | givens={ 369 | self.x: train_set_x[ 370 | index * batch_size: (index + 1) * batch_size 371 | ], 372 | self.y: train_set_y[ 373 | index * batch_size: (index + 1) * batch_size 374 | ] 375 | } 376 | ) 377 | 378 | else: 379 | 380 | # compute the gradients with respect to the model parameters 381 | gparams = T.grad(self.finetune_cost, self.params) 382 | 383 | # compute list of fine-tuning updates 384 | updates = [] 385 | for param, gparam in zip(self.params, gparams): 386 | updates.append((param, param - gparam * learning_rate)) 387 | 388 | train_fn = theano.function( 389 | inputs=[index], 390 | outputs=self.finetune_cost, 391 | updates=updates, 392 | givens={ 393 | self.x: train_set_x[ 394 | index * batch_size: (index + 1) * batch_size 395 | ], 396 | self.y: train_set_y[ 397 | index * batch_size: (index + 1) * batch_size 398 | ] 399 | } 400 | ) 401 | 402 | test_score_i = theano.function( 403 | [index], 404 | self.errors, 405 | givens={ 406 | self.x: test_set_x[ 407 | index * batch_size: (index + 1) * batch_size 408 | ], 409 | self.y: test_set_y[ 410 | index * batch_size: (index + 1) * batch_size 411 | ] 412 | } 413 | ) 414 | 415 | valid_score_i = theano.function( 416 | [index], 417 | self.errors, 418 | givens={ 419 | self.x: valid_set_x[ 420 | index * batch_size: (index + 1) * batch_size 421 | ], 422 | self.y: valid_set_y[ 423 | index * batch_size: (index + 1) * batch_size 424 | ] 425 | } 426 | ) 427 | 428 | # Create a function that scans the entire validation set 429 | def valid_score(): 430 | return [valid_score_i(i) for i in range(n_valid_batches)] 431 | 432 | # Create a function that scans the entire test set 433 | def test_score(): 434 | return [test_score_i(i) for i in range(n_test_batches)] 435 | 436 | return n_train_batches, train_fn, valid_score, test_score 437 | 438 | 439 | 440 | def predict(self, input, batch_size = 2000, prob = False): 441 | 442 | train_set_x = input 443 | N_input_x = train_set_x.shape[0] 444 | 445 | # compute number of minibatches for scoring 446 | if train_set_x.get_value(borrow=True).shape[0] % batch_size != 0: 447 | N_splits = int( numpy.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) + 1 ) 448 | else: 449 | N_splits = int( numpy.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) ) 450 | 451 | # allocate symbolic variables for the data 452 | index = T.lscalar() # index to a [mini]batch 453 | 454 | if prob: 455 | output = theano.function( 456 | inputs = [index], 457 | outputs = self.logLayer.p_y_given_x, 458 | givens={ 459 | self.x: train_set_x[index * batch_size: (index + 1) * batch_size] 460 | } 461 | ) 462 | else: 463 | output = theano.function( 464 | inputs = [index], 465 | outputs = self.logLayer.y_pred, 466 | givens={ 467 | self.x: train_set_x[index * batch_size: (index + 1) * batch_size] 468 | } 469 | ) 470 | 471 | return numpy.concatenate( [output(ii) for ii in range(N_splits)], axis=0 ) 472 | -------------------------------------------------------------------------------- /lib/dbn.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/dbn.pyc -------------------------------------------------------------------------------- /lib/deeplearning.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function, division 2 | 
import os
3 | import sys
4 | import timeit
5 | from six.moves import cPickle as pickle
6 | 
7 | import numpy as np
8 | import pandas as pd
9 | 
10 | import theano
11 | import theano.tensor as T
12 | 
13 | from lib.mlp import HiddenLayer, LogisticRegression
14 | from lib.rbm import RBM, RSM
15 | from lib.dbn import DBN
16 | 
17 | os.chdir('/home/ekhongl/Codes/DL - Topic Modelling')
18 | 
19 | class InitializationError(Exception):
20 |     '''raised when the input definitions for the corresponding class are conflicting'''
21 | 
22 | ########################################################################################################
23 | ## Deep Belief Net #####################################################################################
24 | ########################################################################################################
25 | class deepbeliefnet(object):
26 | 
27 |     def __init__(self, architecture=[2000, 500, 500, 128],
28 |                  opt_epochs=[], predefined_weights=None, n_outs=1):
29 | 
30 |         # ensure proper class initialization
31 |         assert len(architecture) > 1, "architecture definition must include both the hidden layers AND input layer"
32 | 
33 |         # numpy random generator
34 |         numpy_rng = np.random.RandomState(123)
35 | 
36 |         # reconstruct the DBN class
37 |         self.hidden_layers_sizes = architecture[1:]
38 |         self.n_layers = len(self.hidden_layers_sizes)
39 |         self.params = []      # params for the MLP
40 |         self.params_rbm = []  # params for the RBMs
41 |         self.n_ins = architecture[0]
42 |         self.n_outs = n_outs
43 |         print('... building the model')
44 |         self.dbn = DBN( numpy_rng=numpy_rng,
45 |                         n_ins=self.n_ins,
46 |                         n_outs=self.n_outs,
47 |                         hidden_layers_sizes=self.hidden_layers_sizes )
48 | 
49 |         # loading pre-trained weights
50 |         if predefined_weights is not None and len(opt_epochs) > 0:
51 |             self.new_model = False
52 | 
53 |             # load saved model
54 |             for i in range(self.n_layers):
55 |                 model_pkl = os.path.join(predefined_weights,
56 |                                          'dbn_layer' + str(i) + '_epoch_' + str(opt_epochs[i]) + '.pkl')
57 |                 self.dbn.rbm_layers[i].__setstate__(pickle.load(open(model_pkl, 'rb')))
58 |             # extract the model parameters
59 |             for i in range(self.n_layers):
60 |                 self.params_rbm.extend(self.dbn.rbm_layers[i].params)
61 | 
62 |             print('Pre-trained DBN model from "' + predefined_weights + '" loaded.')
63 | 
64 |         # loading fine-tuned weights
65 |         elif predefined_weights is not None and len(opt_epochs) == 0:
66 | 
67 |             self.dbn.params = pickle.load(open(predefined_weights, 'rb'))
68 |             for i in range(self.n_layers):
69 |                 self.dbn.sigmoid_layers[i].__setstate__ (self.dbn.params[(i*2):(i*2+2)])
70 |             self.dbn.logLayer.__setstate__ (self.dbn.params[(self.n_layers*2):(self.n_layers*2+2)])
71 | 
72 |             print('Fine-tuned (or MLP) model from "' + predefined_weights + '" loaded.')
73 | 
74 |         # error in class initialization
75 |         elif (predefined_weights is None and len(opt_epochs) > 0):
76 | 
77 |             raise InitializationError("'opt_epochs' and 'predefined_weights' must either both be provided or both be left empty")
78 | 
79 |         # creating a new generative model
80 |         else:
81 |             self.new_model = True
82 |             for i in range(self.n_layers):
83 |                 self.params_rbm.extend(self.dbn.rbm_layers[i].params)
84 | 
85 | 
86 | 
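    # Illustrative initialization modes (the paths and epoch picks are
    # examples taken from the notebooks, not fixed by the class):
    #   fresh model:      deepbeliefnet(architecture=[2000, 500, 500, 128], n_outs=20)
    #   pre-trained DBN:  deepbeliefnet(..., opt_epochs=[900, 5, 10],
    #                                   predefined_weights='params/dbn_params')
    #   fine-tuned MLP:   deepbeliefnet(..., predefined_weights='params/dbn_params_trained/trained_dbn.pkl')
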
87 |     def pretrain(self, input, pretraining_epochs=100, pretrain_lr=None,
88 |                  k=1, batch_size=800, output_path='params/dbn_params_test'):
89 | 
90 |         train_set_x = input
91 | 
92 |         #---------------------------------------------------------------------------------------#
93 |         # ensure class initialization matches input definition before function execution
94 |         assert train_set_x.get_value(borrow=True).shape[1] == self.n_ins, \
95 |             "Input data dimensions must match initialized dimensions!"
96 | 
97 |         if pretrain_lr is None:
98 |             pretrain_lr = [1/batch_size] * self.n_layers
99 |         else:
100 |             if type(pretrain_lr) is not list:
101 |                 pretrain_lr = [pretrain_lr] * self.n_layers
102 |             elif len(pretrain_lr) != self.n_layers:
103 |                 print('Warning: pretrain_lr length not equal to the number of hidden layers!')
104 |                 print('Reverting pretrain_lr to the default values (1/batch_size).')
105 |                 pretrain_lr = [1/batch_size] * self.n_layers
106 |         if type(pretraining_epochs) is not list:
107 |             pretraining_epochs = [pretraining_epochs] * self.n_layers
108 |         #---------------------------------------------------------------------------------------#
109 | 
110 |         # compute number of minibatches for training, validation and testing
111 |         n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size
112 | 
113 |         #########################
114 |         # PRETRAINING THE MODEL #
115 |         #########################
116 |         if not os.path.isdir(output_path):
117 |             os.makedirs(output_path)
118 | 
119 |         print('... getting the pretraining functions')
120 |         pretraining_fns = self.dbn.pretraining_functions(train_set_x=train_set_x,
121 |                                                          batch_size=batch_size,
122 |                                                          k=k)
123 | 
124 |         print('... pre-training the model')
125 |         start_time = timeit.default_timer()
126 | 
127 |         # Pre-train layer-wise
128 |         for i in range(self.dbn.n_layers):
129 | 
130 |             # go through pretraining epochs
131 |             lproxy = []
132 |             for epoch in range(pretraining_epochs[i]):
133 | 
134 |                 # go through the training set
135 |                 mean_cost = []
136 |                 for batch_index in range(n_train_batches):
137 |                     mean_cost.append(pretraining_fns[i](index=batch_index, \
138 |                                                         lr=pretrain_lr[i]))
139 | 
140 |                 # calculating the epoch's mean proxy likelihood value
141 |                 lproxy += [np.mean(mean_cost)]
142 |                 print('Pre-training layer %i, epoch %d, cost ' % (i, epoch), end=' ')
143 |                 print(lproxy[epoch])
144 | 
145 |                 # save the model parameters for each epoch
146 |                 epoch_pickle = output_path + '/dbn_layer' + str(i) + \
147 |                                '_epoch_' + str(epoch) + '.pkl'
148 |                 #path_epoch_pickle = os.path.join( os.getcwd(), epoch_pickle)
149 |                 pickle.dump( self.dbn.rbm_layers[i].__getstate__(), \
150 |                              open(epoch_pickle, 'wb') )
151 | 
152 |             # save the proxy likelihood profile
153 |             pd.DataFrame(data = {'likelihood_proxy' : lproxy} ). \
154 |                 to_csv( output_path + '/lproxy_layer_' + str(i) + '.csv', index = False)
155 | 
156 |             end_time = timeit.default_timer()
157 |             print('The pretraining for layer ' + str(i) +
158 |                   ' ran for %.2fm' % ((end_time - start_time) / 60.), file=sys.stderr)
159 | 
160 | 
161 |     def train(self, x, y, split_prop=[0.65, 0.15, 0.20], training_epochs=100,
162 |               batch_size=800, learning_rate=None, drop_out=None,
163 |               output_path='params/dbn_params_trained'):
164 | 
165 |         if learning_rate is None: learning_rate = 1/batch_size
166 | 
167 |         # get the training, validation and testing function for the model
168 |         print('... getting the finetuning functions')
169 |         n_train_batches, train_fn, validate_model, test_model = self.dbn.build_finetune_functions(
170 |             x=x,
171 |             y=y,
172 |             split_prop=split_prop,
173 |             batch_size=batch_size,
174 |             learning_rate=learning_rate,
175 |             drop_out=drop_out
176 |         )
177 | 
178 |         print('... 
finetuning the model') 179 | 180 | ######################### 181 | # TRAINING THE MODEL # 182 | ######################### 183 | if not os.path.isdir(output_path): 184 | os.makedirs(output_path) 185 | 186 | # look as this many examples regardless 187 | patience = 4 * n_train_batches 188 | 189 | # wait this much longer when a new best is found 190 | patience_increase = 2. 191 | 192 | # a relative improvement of this much is considered significant 193 | improvement_threshold = 0.995 194 | 195 | # go through this many minibatches before checking the network on 196 | # the validation set; in this case we check every epoch 197 | validation_frequency = min(n_train_batches, patience / 2) 198 | 199 | best_validation_loss = np.inf 200 | test_score = 0. 201 | start_time = timeit.default_timer() 202 | 203 | done_looping = False 204 | epoch = 0 205 | 206 | while (epoch < training_epochs) and (not done_looping): 207 | epoch = epoch + 1 208 | for minibatch_index in range(n_train_batches): 209 | 210 | train_fn(minibatch_index) 211 | iter = (epoch - 1) * n_train_batches + minibatch_index 212 | 213 | if (iter + 1) % validation_frequency == 0: 214 | 215 | validation_losses = validate_model() 216 | this_validation_loss = np.mean(validation_losses, dtype='float64') 217 | print('epoch %i, minibatch %i/%i, validation error %f %%' % ( 218 | epoch, 219 | minibatch_index + 1, 220 | n_train_batches, 221 | this_validation_loss * 100. 222 | ) 223 | ) 224 | 225 | # if we got the best validation score until now 226 | if this_validation_loss < best_validation_loss: 227 | 228 | # improve patience if loss improvement is good enough 229 | if (this_validation_loss < best_validation_loss * 230 | improvement_threshold): 231 | patience = max(patience, iter * patience_increase) 232 | 233 | # save best validation score and iteration number 234 | best_validation_loss = this_validation_loss 235 | best_iter = iter 236 | 237 | # test it on the test set 238 | test_losses = test_model() 239 | test_score = np.mean(test_losses, dtype='float64') 240 | print((' epoch %i, minibatch %i/%i, test error of ' 241 | 'best model %f %%') % 242 | (epoch, minibatch_index + 1, n_train_batches, 243 | test_score * 100.)) 244 | 245 | #-----------------------------------------------------------------------------# 246 | #----------- Saving the current best model -----------------------------------# 247 | #-----------------------------------------------------------------------------# 248 | print('Saving model...') 249 | 250 | tmp_params = [] 251 | for i in range(self.n_layers): 252 | tmp_params.extend( self.dbn.sigmoid_layers[i].__getstate__ () ) 253 | tmp_params.extend( self.dbn.logLayer.__getstate__ () ) 254 | 255 | pickle.dump( tmp_params, \ 256 | open(output_path +'/trained_dbn.pkl', 'wb') ) 257 | 258 | del tmp_params 259 | 260 | print('...model saved.') 261 | #-----------------------------------------------------------------------------# 262 | 263 | if patience <= iter: 264 | done_looping = True 265 | break 266 | 267 | end_time = timeit.default_timer() 268 | print(('Optimization complete with best validation score of %f %%, ' 269 | 'obtained at iteration %i, ' 270 | 'with test performance %f %%' 271 | ) % (best_validation_loss * 100., best_iter + 1, test_score * 100.)) 272 | print('The fine tuning ran for %.2fm' % ((end_time - start_time) / 60.), file=sys.stderr) 273 | 274 | def score(self, input, batch_size = 2000): 275 | 276 | x = T.matrix('x') 277 | self.dbn.rbm_layers[0].input_rSum = x.sum(axis=1) 278 | train_set_x = input 279 | N_input_x = 
train_set_x.shape[0] 280 | 281 | # compute number of minibatches for scoring 282 | if train_set_x.get_value(borrow=True).shape[0] % batch_size != 0: 283 | N_splits = int( np.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) + 1 ) 284 | else: 285 | N_splits = int( np.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) ) 286 | 287 | # allocate symbolic variables for the data 288 | index = T.lscalar() # index to a [mini]batch 289 | 290 | # input_rSum must be specified for the RSM layer 291 | activation = x 292 | for i in range(self.n_layers): 293 | _, activation = self.dbn.rbm_layers[i].propup(activation) 294 | 295 | # it is ok for a theano function to have no output 296 | # the purpose of train_rbm is solely to update the RBM parameters 297 | score = theano.function( 298 | inputs = [index], 299 | outputs = activation, 300 | givens={ 301 | x: train_set_x[index * batch_size: (index + 1) * batch_size] 302 | } 303 | ) 304 | 305 | return np.concatenate( [score(ii) for ii in range(N_splits)], axis=0 ) 306 | 307 | def predict(self, input, batch_size = 2000, prob = False): 308 | 309 | return self.dbn.predict(input = input, batch_size = batch_size, prob = prob) 310 | 311 | 312 | 313 | ######################################################################################################## 314 | ## Auto-Encoder ######################################################################################## 315 | ######################################################################################################## 316 | class autoencoder(object): 317 | 318 | def __init__(self, architecture = [], opt_epochs = [], model_src = 'params/dbn_params', param_type = 'dbn'): 319 | 320 | # ensure model source directory is valid 321 | assert type(model_src) == str or model_src is not None, "dir to load model parameters not indicated" 322 | if len(opt_epochs)>0: 323 | assert len(architecture) == (len(opt_epochs)+1) , "len of network inputs must be 1 more than len of hidden layers" 324 | 325 | # reconstruct the DBN class 326 | self.params = [] 327 | self.hidden_layers_sizes = architecture[1:] 328 | self.n_layers = len(self.hidden_layers_sizes) 329 | self.dbn = DBN( n_ins=architecture[0], 330 | hidden_layers_sizes = self.hidden_layers_sizes ) 331 | self.theano_rng = T.shared_randomstreams.RandomStreams(1234) 332 | 333 | if param_type == 'dbn': 334 | # load saved model 335 | print('Loading the pre-trained Deep Belief Net parameters...') 336 | for i in range(self.n_layers): 337 | model_pkl = os.path.join(model_src, 338 | 'dbn_layer' + str(i) + '_epoch_' + str(opt_epochs[i]) + '.pkl') 339 | self.dbn.rbm_layers[i].__setstate__(pickle.load(open(model_pkl, 'rb'))) 340 | # extract the model parameters 341 | self.params.extend(self.dbn.rbm_layers[i].params) 342 | print('...model loaded.') 343 | 344 | 345 | else: 346 | print('Loading the trained auto-encoder parameters.') 347 | print('...please ensure that the auto-encoder params matches the defined architecture.') 348 | for i in range(self.n_layers): 349 | model_pkl = model_src +'/ae_layer_' + str(i) + '.pkl' 350 | self.dbn.rbm_layers[i].__setstate__(pickle.load(open(model_pkl, 'rb'))) 351 | self.params.extend(self.dbn.rbm_layers[i].params) 352 | 353 | 354 | def get_corrupted_input(self, input, corruption_level): 355 | return self.theano_rng.binomial(size=input.shape, n=1, 356 | p=1 - corruption_level, 357 | dtype=theano.config.floatX) * input 358 | 359 | def get_cost_updates(self, learning_rate, add_noise = False, obj_fn = 'cross_entropy'): 360 | """ This 
function computes the cost and the updates for one training
361 |         step of the dA """
362 | 
363 |         # Encoding input data
364 |         fwd_pass = self.x
365 |         for i in range(self.n_layers):
366 |             _, fwd_pass = self.dbn.rbm_layers[i].propup(fwd_pass)
367 | 
368 |         # Optionally perturb the encoded data with noise
369 |         if add_noise:
370 |             fwd_pass += self.noise
371 | 
372 |         # Decoding encoded input data
373 |         for i in reversed(range(self.n_layers)):
374 |             _, fwd_pass = self.dbn.rbm_layers[i].propdown(fwd_pass)
375 | 
376 |         if obj_fn == 'cross_entropy':
377 |             # ------ Objective Function: multinomial cross entropy ------ #
378 |             #L = - T.sum(self.x * T.log(fwd_pass), axis=1)
379 |             x_normalized = self.x / self.dbn.rbm_layers[0].input_rSum[:,None]
380 |             L = - T.sum(x_normalized * T.log(fwd_pass), axis=1)
381 |         else:
382 |             # ------ Objective Function: square error ------ #
383 |             # rightfully, this should be followed by L = L / len(vocab), but linear scaling
384 |             # does not affect the search for minima and is therefore omitted
385 |             L = T.sum( T.pow( fwd_pass*self.dbn.rbm_layers[0].input_rSum[:,None] - self.x, 2), axis=1)
386 | 
387 |         # mean cost
388 |         cost = T.mean(L)
389 | 
390 |         # compute the gradients of the cost of the `dA` with respect
391 |         # to its parameters
392 |         gparams = T.grad(cost, self.params)
393 | 
394 |         # generate the list of updates
395 |         updates = [
396 |             (param, param - learning_rate * gparam)
397 |             for param, gparam in zip(self.params, gparams)
398 |         ]
399 | 
400 |         return (cost, updates)
401 | 
402 | 
403 |     def train(self, input, epochs=1000, batch_size=500, learning_rate=None, add_noise=None,
404 |               obj_fn='cross_entropy', output_path='params/ae_params'):
405 | 
406 |         N_input_x = input.get_value(borrow=True).shape[0]
407 | 
408 |         # compute number of minibatches for training
409 |         if N_input_x % batch_size != 0:
410 |             N_splits = int( np.floor(N_input_x / batch_size) + 1 )
411 |         else:
412 |             N_splits = int( np.floor(N_input_x / batch_size) )
413 | 
414 |         # get the autoencoding training function
415 |         print('... getting the finetuning functions')
416 |         train_dae = self.dbn.auto_encoding(
417 |             input=input,
418 |             batch_size=batch_size,
419 |             learning_rate=learning_rate,
420 |             add_noise=add_noise,
421 |             obj_fn=obj_fn
422 |         )
423 | 
424 |         print('... finetuning the model')
425 | 
426 |         #########################
427 |         #  TRAINING THE MODEL   #
428 |         #########################
429 |         if not os.path.isdir(output_path):
430 |             os.makedirs(output_path)
431 | 
432 |         start_time = timeit.default_timer()
433 | 
434 |         # go through training epochs
435 |         cost_profile = []
436 |         for epoch in range(epochs):
437 | 
438 |             # go through the training set
439 |             c = []
440 |             for batch_index in range(N_splits):
441 |                 c.append(train_dae(batch_index))
442 | 
443 |             # saving and printing iterations
444 |             cost_profile += [np.mean(c, dtype='float64')]
445 |             if epoch % 100 == 0:
446 |                 #-----------------------------------------------------------------------------#
447 |                 #----------- Saving the current best model -----------------------------------#
448 |                 #-----------------------------------------------------------------------------#
449 |                 print('Saving model...')
450 |                 # save the model parameters for all layers
451 |                 for i in range(self.n_layers):
452 |                     pickle.dump( self.dbn.rbm_layers[i].__getstate__(), \
453 |                                  open(output_path + '/ae_layer_' + str(i) + '.pkl', 'wb') )
454 |                 print('...model saved')
455 |                 # save the proxy likelihood profile
456 |                 pd.DataFrame(data = {'cross_entropy' : cost_profile} ). 
\ 457 | to_csv( output_path +'/cost_profile.csv', index = False) 458 | print('Training epoch %d, cost ' % epoch, cost_profile[epoch]) 459 | #-----------------------------------------------------------------------------# 460 | 461 | print('Saving model...') 462 | # save the model parameters for all layers 463 | for i in range(self.n_layers): 464 | pickle.dump( self.dbn.rbm_layers[i].__getstate__(), \ 465 | open(output_path +'/ae_layer_' + str(i) + '.pkl', 'wb') ) 466 | print('...model saved') 467 | # save the proxy likelihood profile 468 | pd.DataFrame(data = {'cross_entropy' : cost_profile} ). \ 469 | to_csv( output_path +'/cost_profile.csv', index = False) 470 | print('Training epoch %d, cost ' % epoch, cost_profile[epoch]) 471 | 472 | end_time = timeit.default_timer() 473 | 474 | training_time = (end_time - start_time) 475 | 476 | print(('Training ran for %.2fm' % (training_time / 60.)), file=sys.stderr) 477 | 478 | def score(self, input, batch_size = 2000): 479 | train_set_x = input 480 | N_input_x = train_set_x.shape[0] 481 | 482 | # compute number of minibatches for scoring 483 | if train_set_x.get_value(borrow=True).shape[0] % batch_size != 0: 484 | N_splits = int( np.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) + 1 ) 485 | else: 486 | N_splits = int( np.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) ) 487 | 488 | # allocate symbolic variables for the data 489 | index = T.lscalar() # index to a [mini]batch 490 | 491 | # input_rSum must be specified for the RSM layer 492 | x = T.matrix('x') 493 | self.dbn.rbm_layers[0].input_rSum = x.sum(axis=1) 494 | activation = x 495 | for i in range(self.n_layers): 496 | _, activation = self.dbn.rbm_layers[i].propup(activation) 497 | 498 | # it is ok for a theano function to have no output 499 | # the purpose of train_rbm is solely to update the RBM parameters 500 | score = theano.function( 501 | inputs = [index], 502 | outputs = activation, 503 | givens={ 504 | x: train_set_x[index * batch_size: (index + 1) * batch_size] 505 | } 506 | ) 507 | 508 | return np.concatenate( [score(ii) for ii in range(N_splits)], axis=0 ) -------------------------------------------------------------------------------- /lib/deeplearning.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/deeplearning.pyc -------------------------------------------------------------------------------- /lib/mlp.py: -------------------------------------------------------------------------------- 1 | """ 2 | This tutorial introduces the multilayer perceptron using Theano. 3 | 4 | A multilayer perceptron is a logistic regressor where 5 | instead of feeding the input to the logistic regression you insert a 6 | intermediate layer, called the hidden layer, that has a nonlinear 7 | activation function (usually tanh or sigmoid) . One can use many such 8 | hidden layers making the architecture deep. The tutorial will also tackle 9 | the problem of MNIST digit classification. 10 | 11 | .. math:: 12 | 13 | f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x))), 14 | 15 | References: 16 | 17 | - textbooks: "Pattern Recognition and Machine Learning" - 18 | Christopher M. 
Bishop, section 5 19 | 20 | """ 21 | 22 | from __future__ import print_function 23 | 24 | __docformat__ = 'restructedtext en' 25 | 26 | import os 27 | import sys 28 | import timeit 29 | 30 | import numpy 31 | 32 | import theano 33 | import theano.tensor as T 34 | 35 | 36 | # start-snippet-1 37 | class HiddenLayer(object): 38 | 39 | def __getstate__(self): 40 | weights = [p.get_value() for p in self.params] 41 | return weights 42 | 43 | def __setstate__(self, weights): 44 | i = iter(weights) 45 | for p in self.params: 46 | p.set_value(next(i)) 47 | 48 | def __init__(self, rng, input, n_in, n_out, W=None, b=None, 49 | activation=T.tanh): 50 | """ 51 | Typical hidden layer of a MLP: units are fully-connected and have 52 | sigmoidal activation function. Weight matrix W is of shape (n_in,n_out) 53 | and the bias vector b is of shape (n_out,). 54 | 55 | NOTE : The nonlinearity used here is tanh 56 | 57 | Hidden unit activation is given by: tanh(dot(input,W) + b) 58 | 59 | :type rng: numpy.random.RandomState 60 | :param rng: a random number generator used to initialize weights 61 | 62 | :type input: theano.tensor.dmatrix 63 | :param input: a symbolic tensor of shape (n_examples, n_in) 64 | 65 | :type n_in: int 66 | :param n_in: dimensionality of input 67 | 68 | :type n_out: int 69 | :param n_out: number of hidden units 70 | 71 | :type activation: theano.Op or function 72 | :param activation: Non linearity to be applied in the hidden 73 | layer 74 | """ 75 | self.input = input 76 | # end-snippet-1 77 | 78 | # `W` is initialized with `W_values` which is uniformely sampled 79 | # from sqrt(-6./(n_in+n_hidden)) and sqrt(6./(n_in+n_hidden)) 80 | # for tanh activation function 81 | # the output of uniform if converted using asarray to dtype 82 | # theano.config.floatX so that the code is runable on GPU 83 | # Note : optimal initialization of weights is dependent on the 84 | # activation function used (among other things). 85 | # For example, results presented in [Xavier10] suggest that you 86 | # should use 4 times larger initial weights for sigmoid 87 | # compared to tanh 88 | # We have no info for other function, so we use the same as 89 | # tanh. 90 | if W is None: 91 | W_values = numpy.asarray( 92 | rng.uniform( 93 | low=-numpy.sqrt(6. / (n_in + n_out)), 94 | high=numpy.sqrt(6. / (n_in + n_out)), 95 | size=(n_in, n_out) 96 | ), 97 | dtype=theano.config.floatX 98 | ) 99 | if activation == theano.tensor.nnet.sigmoid: 100 | W_values *= 4 101 | 102 | W = theano.shared(value=W_values, name='W', borrow=True) 103 | 104 | if b is None: 105 | b_values = numpy.zeros((n_out,), dtype=theano.config.floatX) 106 | b = theano.shared(value=b_values, name='b', borrow=True) 107 | 108 | self.W = W 109 | self.b = b 110 | 111 | lin_output = T.dot(input, self.W) + self.b 112 | self.output = ( 113 | lin_output if activation is None 114 | else activation(lin_output) 115 | ) 116 | # parameters of the model 117 | self.params = [self.W, self.b] 118 | 119 | """ 120 | This tutorial introduces logistic regression using Theano and stochastic 121 | gradient descent. 122 | 123 | Logistic regression is a probabilistic, linear classifier. It is parametrized 124 | by a weight matrix :math:`W` and a bias vector :math:`b`. Classification is 125 | done by projecting data points onto a set of hyperplanes, the distance to 126 | which is used to determine a class membership probability. 127 | 128 | Mathematically, this can be written as: 129 | 130 | .. 
math:: 131 | P(Y=i|x, W,b) &= softmax_i(W x + b) \\ 132 | &= \frac {e^{W_i x + b_i}} {\sum_j e^{W_j x + b_j}} 133 | 134 | 135 | The output of the model or prediction is then done by taking the argmax of 136 | the vector whose i'th element is P(Y=i|x). 137 | 138 | .. math:: 139 | 140 | y_{pred} = argmax_i P(Y=i|x,W,b) 141 | 142 | 143 | This tutorial presents a stochastic gradient descent optimization method 144 | suitable for large datasets. 145 | 146 | 147 | References: 148 | 149 | - textbooks: "Pattern Recognition and Machine Learning" - 150 | Christopher M. Bishop, section 4.3.2 151 | 152 | """ 153 | 154 | 155 | class LogisticRegression(object): 156 | """Multi-class Logistic Regression Class 157 | 158 | The logistic regression is fully described by a weight matrix :math:`W` 159 | and bias vector :math:`b`. Classification is done by projecting data 160 | points onto a set of hyperplanes, the distance to which is used to 161 | determine a class membership probability. 162 | """ 163 | 164 | def __getstate__(self): 165 | weights = [p.get_value() for p in self.params] 166 | return weights 167 | 168 | def __setstate__(self, weights): 169 | i = iter(weights) 170 | for p in self.params: 171 | p.set_value(next(i)) 172 | 173 | def __init__(self, input, n_in, n_out): 174 | """ Initialize the parameters of the logistic regression 175 | 176 | :type input: theano.tensor.TensorType 177 | :param input: symbolic variable that describes the input of the 178 | architecture (one minibatch) 179 | 180 | :type n_in: int 181 | :param n_in: number of input units, the dimension of the space in 182 | which the datapoints lie 183 | 184 | :type n_out: int 185 | :param n_out: number of output units, the dimension of the space in 186 | which the labels lie 187 | 188 | """ 189 | # start-snippet-1 190 | # initialize with 0 the weights W as a matrix of shape (n_in, n_out) 191 | self.W = theano.shared( 192 | value=numpy.zeros( 193 | (n_in, n_out), 194 | dtype=theano.config.floatX 195 | ), 196 | name='W', 197 | borrow=True 198 | ) 199 | # initialize the biases b as a vector of n_out 0s 200 | self.b = theano.shared( 201 | value=numpy.zeros( 202 | (n_out,), 203 | dtype=theano.config.floatX 204 | ), 205 | name='b', 206 | borrow=True 207 | ) 208 | 209 | # symbolic expression for computing the matrix of class-membership 210 | # probabilities 211 | # Where: 212 | # W is a matrix where column-k represent the separation hyperplane for 213 | # class-k 214 | # x is a matrix where row-j represents input training sample-j 215 | # b is a vector where element-k represent the free parameter of 216 | # hyperplane-k 217 | self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b) 218 | 219 | # symbolic description of how to compute prediction as class whose 220 | # probability is maximal 221 | self.y_pred = T.argmax(self.p_y_given_x, axis=1) 222 | # end-snippet-1 223 | 224 | # parameters of the model 225 | self.params = [self.W, self.b] 226 | 227 | # keep track of model input 228 | self.input = input 229 | 230 | def negative_log_likelihood(self, y): 231 | """Return the mean of the negative log-likelihood of the prediction 232 | of this model under a given target distribution. 233 | 234 | .. 
math:: 235 | 236 | \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) = 237 | \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|} 238 | \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\ 239 | \ell (\theta=\{W,b\}, \mathcal{D}) 240 | 241 | :type y: theano.tensor.TensorType 242 | :param y: corresponds to a vector that gives for each example the 243 | correct label 244 | 245 | Note: we use the mean instead of the sum so that 246 | the learning rate is less dependent on the batch size 247 | """ 248 | # start-snippet-2 249 | # y.shape[0] is (symbolically) the number of rows in y, i.e., 250 | # number of examples (call it n) in the minibatch 251 | # T.arange(y.shape[0]) is a symbolic vector which will contain 252 | # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of 253 | # Log-Probabilities (call it LP) with one row per example and 254 | # one column per class LP[T.arange(y.shape[0]),y] is a vector 255 | # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ..., 256 | # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is 257 | # the mean (across minibatch examples) of the elements in v, 258 | # i.e., the mean log-likelihood across the minibatch. 259 | return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y]) 260 | # end-snippet-2 261 | 262 | def errors(self, y): 263 | """Return a float representing the number of errors in the minibatch 264 | over the total number of examples of the minibatch ; zero one 265 | loss over the size of the minibatch 266 | 267 | :type y: theano.tensor.TensorType 268 | :param y: corresponds to a vector that gives for each example the 269 | correct label 270 | """ 271 | 272 | # check if y has same dimension of y_pred 273 | if y.ndim != self.y_pred.ndim: 274 | raise TypeError( 275 | 'y should have the same shape as self.y_pred', 276 | ('y', y.type, 'y_pred', self.y_pred.type) 277 | ) 278 | # check if y is of the correct datatype 279 | if y.dtype.startswith('int'): 280 | # the T.neq operator returns a vector of 0s and 1s, where 1 281 | # represents a mistake in prediction 282 | return T.mean(T.neq(self.y_pred, y)) 283 | else: 284 | raise NotImplementedError() 285 | 286 | # start-snippet-2 287 | class MLP(object): 288 | """Multi-Layer Perceptron Class 289 | 290 | A multilayer perceptron is a feedforward artificial neural network model 291 | that has one layer or more of hidden units and nonlinear activations. 292 | Intermediate layers usually have as activation function tanh or the 293 | sigmoid function (defined here by a ``HiddenLayer`` class) while the 294 | top layer is a softmax layer (defined here by a ``LogisticRegression`` 295 | class). 
296 | """ 297 | 298 | def __init__(self, rng, input, n_in, n_hidden, n_out): 299 | """Initialize the parameters for the multilayer perceptron 300 | 301 | :type rng: numpy.random.RandomState 302 | :param rng: a random number generator used to initialize weights 303 | 304 | :type input: theano.tensor.TensorType 305 | :param input: symbolic variable that describes the input of the 306 | architecture (one minibatch) 307 | 308 | :type n_in: int 309 | :param n_in: number of input units, the dimension of the space in 310 | which the datapoints lie 311 | 312 | :type n_hidden: int 313 | :param n_hidden: number of hidden units 314 | 315 | :type n_out: int 316 | :param n_out: number of output units, the dimension of the space in 317 | which the labels lie 318 | 319 | """ 320 | 321 | # Since we are dealing with a one hidden layer MLP, this will translate 322 | # into a HiddenLayer with a tanh activation function connected to the 323 | # LogisticRegression layer; the activation function can be replaced by 324 | # sigmoid or any other nonlinear function 325 | self.hiddenLayer = HiddenLayer( 326 | rng=rng, 327 | input=input, 328 | n_in=n_in, 329 | n_out=n_hidden, 330 | activation=T.tanh 331 | ) 332 | 333 | # The logistic regression layer gets as input the hidden units 334 | # of the hidden layer 335 | self.logRegressionLayer = LogisticRegression( 336 | input=self.hiddenLayer.output, 337 | n_in=n_hidden, 338 | n_out=n_out 339 | ) 340 | # end-snippet-2 start-snippet-3 341 | # L1 norm ; one regularization option is to enforce L1 norm to 342 | # be small 343 | self.L1 = ( 344 | abs(self.hiddenLayer.W).sum() 345 | + abs(self.logRegressionLayer.W).sum() 346 | ) 347 | 348 | # square of L2 norm ; one regularization option is to enforce 349 | # square of L2 norm to be small 350 | self.L2_sqr = ( 351 | (self.hiddenLayer.W ** 2).sum() 352 | + (self.logRegressionLayer.W ** 2).sum() 353 | ) 354 | 355 | # negative log likelihood of the MLP is given by the negative 356 | # log likelihood of the output of the model, computed in the 357 | # logistic regression layer 358 | self.negative_log_likelihood = ( 359 | self.logRegressionLayer.negative_log_likelihood 360 | ) 361 | # same holds for the function computing the number of errors 362 | self.errors = self.logRegressionLayer.errors 363 | 364 | # the parameters of the model are the parameters of the two layer it is 365 | # made out of 366 | self.params = self.hiddenLayer.params + self.logRegressionLayer.params 367 | # end-snippet-3 368 | 369 | # keep track of model input 370 | self.input = input -------------------------------------------------------------------------------- /lib/mlp.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/mlp.pyc -------------------------------------------------------------------------------- /lib/rbm.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/rbm.pyc -------------------------------------------------------------------------------- /notebooks/train_sae.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [ 10 | { 11 | "name": "stderr", 12 | "output_type": "stream", 13 | "text": 
[ 14 |     "Using gpu device 0: Tesla K40c (CNMeM is disabled, cuDNN 5105)\n",
15 |     "/home/ekhongl/.conda/envs/py3/lib/python3.5/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.\n",
16 |     "  warnings.warn(warn)\n"
17 |    ]
18 |   }
19 |  ],
20 |  "source": [
21 |   "from __future__ import print_function, division\n",
22 |   "import os\n",
23 |   "os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))\n",
24 |   "\n",
25 |   "import sys\n",
26 |   "import timeit\n",
27 |   "from six.moves import cPickle as pickle\n",
28 |   "\n",
29 |   "import numpy as np\n",
30 |   "import pandas as pd\n",
31 |   "\n",
32 |   "import theano\n",
33 |   "import theano.tensor as T\n",
34 |   "\n",
35 |   "from lib.deeplearning import autoencoder"
36 |  ]
37 | },
38 | {
39 |  "cell_type": "code",
40 |  "execution_count": 2,
41 |  "metadata": {
42 |   "collapsed": false
43 |  },
44 |  "outputs": [],
45 |  "source": [
46 |   "dat_x = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header = 1)\n",
47 |   "dat_y = dat_x[:,0]\n",
48 |   "dat_x = dat_x[:,1:]\n",
49 |   "vocab = np.genfromtxt('data/dtm_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]\n",
50 |   "test_input = theano.shared(dat_x)"
51 |  ]
52 | },
53 | {
54 |  "cell_type": "markdown",
55 |  "metadata": {},
56 |  "source": [
57 |   "## Loading weights pretrained from the Deep Belief Net (DBN) to the Autoencoder"
58 |  ]
59 | },
60 | {
61 |  "cell_type": "code",
62 |  "execution_count": 3,
63 |  "metadata": {
64 |   "collapsed": false,
65 |   "scrolled": true
66 |  },
67 |  "outputs": [
68 |   {
69 |    "name": "stdout",
70 |    "output_type": "stream",
71 |    "text": [
72 |     "Building layer: 0\n",
73 |     " Input units: 2756\n",
74 |     " Output units: 500\n",
75 |     "Building layer: 1\n",
76 |     " Input units: 500\n",
77 |     " Output units: 500\n",
78 |     "Building layer: 2\n",
79 |     " Input units: 500\n",
80 |     " Output units: 128\n"
81 |    ]
82 |   }
83 |  ],
84 |  "source": [
85 |   "model = autoencoder( architecture = [2756, 500, 500, 128], opt_epochs = [900,5,10], model_src = 'params/dbn_params')"
86 |  ]
87 | },
88 | {
89 |  "cell_type": "markdown",
90 |  "metadata": {},
91 |  "source": [
92 |   "## Training the Autoencoder"
93 |  ]
94 | },
95 | {
96 |  "cell_type": "code",
97 |  "execution_count": null,
98 |  "metadata": {
99 |   "collapsed": false,
100 |   "scrolled": true
101 |  },
102 |  "outputs": [
103 |   {
104 |    "name": "stdout",
105 |    "output_type": "stream",
106 |    "text": [
107 |     "... getting the finetuning functions\n",
108 |     "... 
finetuning the model\n" 109 | ] 110 | } 111 | ], 112 | "source": [ 113 | "model.train(test_input, batch_size = 100, epochs = 110, add_noise = 16, output_path = 'params/to_delete')" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "## Loading the trained Auto-Encoder" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 4, 126 | "metadata": { 127 | "collapsed": false 128 | }, 129 | "outputs": [ 130 | { 131 | "name": "stdout", 132 | "output_type": "stream", 133 | "text": [ 134 | "Building layer: 0\n", 135 | " Input units: 2000\n", 136 | " Output units: 500\n", 137 | "Building layer: 1\n", 138 | " Input units: 500\n", 139 | " Output units: 500\n", 140 | "Building layer: 2\n", 141 | " Input units: 500\n", 142 | " Output units: 128\n", 143 | "Loading the trained auto-encoder parameters.\n", 144 | "...please ensure that the auto-encoder params matches the defined architecture.\n" 145 | ] 146 | } 147 | ], 148 | "source": [ 149 | "model = autoencoder( architecture = [2000, 500, 500, 128], model_src = 'params_2000/ae_train', param_type = 'ae')" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "## Extracting features from the trained Auto-Encoder" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 8, 162 | "metadata": { 163 | "collapsed": false 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "output = model.score(test_input)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "## Saving the features extracted" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 8, 180 | "metadata": { 181 | "collapsed": false 182 | }, 183 | "outputs": [], 184 | "source": [ 185 | "colnames = ['bit'] * 128\n", 186 | "colnames = [colnames[i] + str(i) for i in range(128)]\n", 187 | "colnames.insert(0,'_label_')\n", 188 | "pd.DataFrame(data = np.c_[dat_y, output], \n", 189 | " columns = colnames). 
\\\n", 190 | "                to_csv( 'data/ae_features.csv', index = False)"
191 |    ]
192 |   },
193 |   {
194 |    "cell_type": "markdown",
195 |    "metadata": {},
196 |    "source": [
197 |     "# Visualizing the convergence behavior"
198 |    ]
199 |   },
200 |   {
201 |    "cell_type": "code",
202 |    "execution_count": 9,
203 |    "metadata": {
204 |     "collapsed": false
205 |    },
206 |    "outputs": [
207 |     {
208 |      "data": {
209 |       "image/png": "<base64-encoded PNG omitted: line plot of the auto-encoder fine-tuning cost decreasing and flattening out over training epochs>",
210 |       "text/plain": [
211 |        ""
212 |       ]
213 |      },
214 |      "metadata": {},
215 |      "output_type": "display_data"
216 |     }
217 |    ],
218 |    "source": [
219 |     "import numpy as np\n",
220 |     "import matplotlib.pyplot as plt\n",
221 |     "%matplotlib inline\n",
222 |     "plt_dat = np.genfromtxt('params_2000/ae_train/cost_profile.csv', delimiter=',', names = True)\n",
223 |     "plt.plot(plt_dat)\n",
224 |     "plt.show()"
225 |    ]
226 |   }
227 |  ],
228 |  "metadata": {
229 |   "anaconda-cloud": {},
230 |   "kernelspec": {
231 |    "display_name": "Python [conda env:py3]",
232 |    "language": "python",
233 |    "name": "conda-env-py3-py"
234 |   },
235 |   "language_info": {
236 |    "codemirror_mode": {
237 |     "name": "ipython",
238 |     "version": 3
239 |    },
240 |    "file_extension": ".py",
241 |    "mimetype": "text/x-python",
242 |    "name": "python",
243 |    "nbconvert_exporter": "python",
244 |    "pygments_lexer": "ipython3",
245 |    "version": "3.5.3"
246 |   }
247 |  },
248 |  "nbformat": 4,
249 |  "nbformat_minor": 2
250 | }
251 | -------------------------------------------------------------------------------- /notebooks/train_sae_2000.ipynb: --------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "code",
5 |    "execution_count": 1,
6 |    "metadata": {
7 |     "collapsed": false
8 |    },
9 |    "outputs": [
10 |     {
11 |      "name": "stderr",
12 |      "output_type": "stream",
13 |      "text": [
14 |       "Using gpu device 0: Tesla K40c (CNMeM is disabled, cuDNN 5105)\n",
15 |       "/home/ekhongl/.conda/envs/py3/lib/python3.5/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. 
If you see any problems, try updating Theano or downgrading cuDNN to version 5.\n",
16 |       " warnings.warn(warn)\n"
17 |      ]
18 |     }
19 |    ],
20 |    "source": [
21 |     "from __future__ import print_function, division\n",
22 |     "import os\n",
23 |     "os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))\n",
24 |     "\n",
25 |     "import sys\n",
26 |     "import timeit\n",
27 |     "from six.moves import cPickle as pickle\n",
28 |     "\n",
29 |     "import numpy as np\n",
30 |     "import pandas as pd\n",
31 |     "\n",
32 |     "import theano\n",
33 |     "import theano.tensor as T\n",
34 |     "\n",
35 |     "from lib.deeplearning import autoencoder"
36 |    ]
37 |   },
38 |   {
39 |    "cell_type": "code",
40 |    "execution_count": 2,
41 |    "metadata": {
42 |     "collapsed": false
43 |    },
44 |    "outputs": [],
45 |    "source": [
46 |     "dat_x = np.genfromtxt('data/dtm_2000_20news.csv', dtype='float32', delimiter=',', skip_header = 1)\n",
47 |     "dat_y = dat_x[:,0]\n",
48 |     "dat_x = dat_x[:,1:]\n",
49 |     "vocab = np.genfromtxt('data/dtm_2000_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]\n",
50 |     "test_input = theano.shared(dat_x)"
51 |    ]
52 |   },
53 |   {
54 |    "cell_type": "markdown",
55 |    "metadata": {},
56 |    "source": [
57 |     "## Loading weights pretrained from the Deep Belief Net (DBN) to the Autoencoder"
58 |    ]
59 |   },
60 |   {
61 |    "cell_type": "code",
62 |    "execution_count": 4,
63 |    "metadata": {
64 |     "collapsed": false,
65 |     "scrolled": true
66 |    },
67 |    "outputs": [
68 |     {
69 |      "name": "stdout",
70 |      "output_type": "stream",
71 |      "text": [
72 |       "Building layer: 0\n",
73 |       " Input units: 2000\n",
74 |       " Output units: 500\n",
75 |       "Building layer: 1\n",
76 |       " Input units: 500\n",
77 |       " Output units: 500\n",
78 |       "Building layer: 2\n",
79 |       " Input units: 500\n",
80 |       " Output units: 128\n"
81 |      ]
82 |     }
83 |    ],
84 |    "source": [
85 |     "model = autoencoder( architecture = [2000, 500, 500, 128], opt_epochs = [110,15,10], model_src = 'params_2000/dbn_params_pretrain')"
86 |    ]
87 |   },
88 |   {
89 |    "cell_type": "markdown",
90 |    "metadata": {},
91 |    "source": [
92 |     "## Training the Autoencoder"
93 |    ]
94 |   },
95 |   {
96 |    "cell_type": "code",
97 |    "execution_count": 5,
98 |    "metadata": {
99 |     "collapsed": false
100 |    },
101 |    "outputs": [
102 |     {
103 |      "name": "stdout",
104 |      "output_type": "stream",
105 |      "text": [
106 |       "... getting the finetuning functions\n",
107 |       "... 
finetuning the model\n", 108 | "Saving model...\n", 109 | "...model saved\n", 110 | "Training epoch 0, cost 7.79978342056\n", 111 | "Saving model...\n", 112 | "...model saved\n", 113 | "Training epoch 100, cost 7.48429107666\n", 114 | "Saving model...\n", 115 | "...model saved\n", 116 | "Training epoch 109, cost 7.46735124588\n" 117 | ] 118 | }, 119 | { 120 | "name": "stderr", 121 | "output_type": "stream", 122 | "text": [ 123 | "Training ran for 0.29m\n" 124 | ] 125 | } 126 | ], 127 | "source": [ 128 | "model.train(test_input, batch_size = 200, epochs = 110, add_noise = 16, output_path = 'params_2000/ae_train')" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "## Loading the trained Auto-Encoder" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 3, 141 | "metadata": { 142 | "collapsed": false 143 | }, 144 | "outputs": [ 145 | { 146 | "name": "stdout", 147 | "output_type": "stream", 148 | "text": [ 149 | "Building layer: 0\n", 150 | " Input units: 2000\n", 151 | " Output units: 500\n", 152 | "Building layer: 1\n", 153 | " Input units: 500\n", 154 | " Output units: 500\n", 155 | "Building layer: 2\n", 156 | " Input units: 500\n", 157 | " Output units: 128\n", 158 | "Loading the trained auto-encoder parameters.\n", 159 | "...please ensure that the auto-encoder params matches the defined architecture.\n" 160 | ] 161 | } 162 | ], 163 | "source": [ 164 | "model = autoencoder( architecture = [2000, 500, 500, 128], model_src = 'params_2000/ae_train_nonoise', param_type = 'ae')" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "## Extracting features from the trained Auto-Encoder" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 4, 177 | "metadata": { 178 | "collapsed": false 179 | }, 180 | "outputs": [], 181 | "source": [ 182 | "output = model.score(test_input)" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "## Saving the features extracted" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": 27, 195 | "metadata": { 196 | "collapsed": false 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "colnames = ['bit'] * 128\n", 201 | "colnames = [colnames[i] + str(i) for i in range(128)]\n", 202 | "colnames.insert(0,'_label_')\n", 203 | "pd.DataFrame(data = np.c_[dat_y, output], \n", 204 | " columns = colnames). 
\\\n", 204 | "                to_csv( 'data/ae_features_2000_nonoise.csv', index = False)"
205 |    ]
206 |   },
207 |   {
208 |    "cell_type": "markdown",
209 |    "metadata": {},
210 |    "source": [
211 |     "# Visualizing the convergence behavior"
212 |    ]
213 |   },
214 |   {
215 |    "cell_type": "code",
216 |    "execution_count": 24,
217 |    "metadata": {
218 |     "collapsed": false
219 |    },
220 |    "outputs": [
221 |     {
222 |      "data": {
223 |       "image/png":
224 |        "<base64-encoded PNG omitted: line plot of the fine-tuning cost profile over training epochs>"
, 225 | "text/plain": [ 226 | "" 227 | ] 228 | }, 229 | "metadata": {}, 230 | "output_type": "display_data" 231 | } 232 | ], 233 | "source": [ 234 | "import numpy as np\n", 235 | "import matplotlib.pyplot as plt\n", 236 | "%matplotlib inline\n", 237 | "plt_dat = np.genfromtxt('params_2000/ae_train_nonoise/cost_profile.csv', delimiter=',', names = True)\n", 238 | "plt.plot(plt_dat)\n", 239 | "plt.show()" 240 | ] 241 | } 242 | ], 243 | "metadata": { 244 | "anaconda-cloud": {}, 245 | "kernelspec": { 246 | "display_name": "Python [conda env:py3]", 247 | "language": "python", 248 | "name": "conda-env-py3-py" 249 | }, 250 | "language_info": { 251 | "codemirror_mode": { 252 | "name": "ipython", 253 | "version": 3 254 | }, 255 | "file_extension": ".py", 256 | "mimetype": "text/x-python", 257 | "name": "python", 258 | "nbconvert_exporter": "python", 259 | "pygments_lexer": "ipython3", 260 | "version": "3.5.3" 261 | } 262 | }, 263 | "nbformat": 4, 264 | "nbformat_minor": 2 265 | } 266 | -------------------------------------------------------------------------------- /scripts_R/clustering.R: -------------------------------------------------------------------------------- 1 | 2 | setwd('/home/ekhongl/Codes/DL - Topic Modelling') 3 | library(ggplot2) 4 | library(dplyr) 5 | 6 | 7 | 8 | 9 | 10 | 11 | #---------------------------------------------------------------- 12 | # [0] Data Prep 13 | #---------------------------------------------------------------- 14 | map_class <- function(x){ 15 | return( c(3,0,0,0,0,0,5,1,1,1,1,2,2,2,2,3,4,4,4,3)[0:19 %in% x] ) 16 | } 17 | 18 | Y_levels <- c('alt.atheism','comp.graphics','comp.os.ms-windows.misc','comp.sys.ibm.pc.hardware', 19 | 'comp.sys.mac.hardware','comp.windows.x','misc.forsale','rec.autos','rec.motorcycles', 20 | 'rec.sport.baseball','rec.sport.hockey','sci.crypt','sci.electronics','sci.med', 21 | 'sci.space','soc.religion.christian','talk.politics.guns','talk.politics.mideast', 22 | 'talk.politics.misc','talk.religion.misc') 23 | 24 | Y_overview_levels <- c('computer','hobby','science','religion','politics','sales') 25 | 26 | dat <- read.csv('data/ae_features_2000_nonoise.csv',check.names=FALSE) 27 | 28 | dat_raw <- read.table('data/raw_20news/20news.csv',header = T, sep=',', 29 | row.names = NULL, stringsAsFactors = FALSE) 30 | 31 | 32 | X <- dat[,2:ncol(dat)] 33 | Y <- as.factor(dat[,1]) 34 | Y_overview <- as.factor(sapply(Y,map_class)) 35 | levels(Y) <- Y_levels 36 | levels(Y_overview) <- Y_overview_levels 37 | #---------------------------------------------------------------- 38 | 39 | 40 | 41 | #---------------------------------------------------------------- 42 | # [1] Viewing the distribution of each neuron's output 43 | #---------------------------------------------------------------- 44 | for (i in 1:ncol(X)) { 45 | hist(X[,i],breaks=100) 46 | invisible(readline(prompt="Press [enter] to continue")) 47 | } 48 | 49 | hist(unlist(X),breaks=50) 50 | #---------------------------------------------------------------- 51 | 52 | 53 | 54 | 55 | #---------------------------------------------------------------- 56 | # [2] K-means clustering to segment the data 57 | #---------------------------------------------------------------- 58 | X.clust <- kmeans(X,centers = 20, nstart=2000, trace =1) 59 | X.clust_overview <- kmeans(X,centers = 6, nstart=2000, trace =1) 60 | 61 | for (i in 1:20) { 62 | plt_dat <- data.frame(labels = Y[X.clust$cluster == i]) %>% 63 | group_by(labels) %>% 64 | summarize( freq = n()) 65 | plt <- ggplot(plt_dat, aes(x=labels, y = freq)) + 
66 |     geom_bar(stat="identity") +
67 |     theme(axis.text.x = element_text(angle = 60, hjust = 1))
68 |   print(plt)
69 |   invisible(readline(prompt="Press [enter] to continue"))
70 |   rm(plt)
71 | }
72 | 
73 | for (i in sort(unique(X.clust_overview$cluster))) {
74 |   print( paste("----- Current cluster: ", i, " -----"))
75 |   plt_dat <- data.frame(labels = Y_overview[X.clust_overview$cluster == i]) %>%
76 |               group_by(labels) %>%
77 |               summarize( freq = n())
78 |   plt <- ggplot(plt_dat, aes(x=labels, y = freq)) + geom_bar(stat="identity")
79 |   print(plt)
80 |   invisible(readline(prompt="Press [enter] to continue"))
81 |   rm(plt)
82 | }
83 | 
84 | table(Y,X.clust$cluster)
85 | table(Y_overview,X.clust_overview$cluster)
86 | 
87 | write.csv(X.clust$cluster, 'data/clustered_output_nonoise.csv')
88 | write.csv(X.clust_overview$cluster, 'data/clustered_overview_output_nonoise.csv')
89 | #----------------------------------------------------------------
90 | 
91 | 
92 | save(Y,Y_overview, X.clust, X.clust_overview, file = "data/cluster_results.RData")
93 | 
94 | 
95 | # #----------------------------------------------------------------
96 | # # [3] multinomial mixture modelling to segment the data
97 | # #----------------------------------------------------------------
98 | # library(mixtools)
99 | # X_bin <- (X > 0.1)*1
100 | # 
101 | # mixout <- multmixEM(X_bin, lambda = NULL, theta = NULL, k = 6, 
102 | #                     maxit = 10000, epsilon = 1e-08, verb = TRUE)
103 | # 
104 | # mixout.clust <- apply( mixout$posterior, 1, function(x) which(x == max(x)) )
105 | # 
106 | # 
107 | # for (i in sort(unique(mixout.clust))) {
108 | #   print( paste("----- Current cluster: ", i, " -----"))
109 | #   plt_dat <- data.frame(labels = Y_overview[mixout.clust == i]) %>%
110 | #               group_by(labels) %>%
111 | #               summarize( freq = n())
112 | #   plt <- ggplot(plt_dat, aes(x=labels, y = freq)) + geom_bar(stat="identity")
113 | #   print(plt)
114 | #   invisible(readline(prompt="Press [enter] to continue"))
115 | #   rm(plt)
116 | # }
117 | # #----------------------------------------------------------------
118 | # 
119 | # 
120 | # 
121 | # 
122 | # #----------------------------------------------------------------
123 | # # [4] hierarchical clustering using hamming distance to segment the data
124 | # #----------------------------------------------------------------
125 | # library(e1071)
126 | # # import your binary data with read.table or read.delim; the following
127 | # # example uses random data
128 | # y <- matrix(sample(c(0,1), 100, replace=TRUE), 10, 10, 
129 | #             dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:10, sep="")))
130 | # disma <- hamming.distance(X_bin)
131 | # hr <- hclust(as.dist(disma))
132 | # plot(as.dendrogram(hr), edgePar=list(col=3, lwd=4), horiz=T)
133 | # 
134 | # hr.clust <- cutree(hr,k=6)
135 | # 
136 | # for (i in sort(unique(hr.clust))) {
137 | #   print( paste("----- Current cluster: ", i, " -----"))
138 | #   plt_dat <- data.frame(labels = Y_overview[hr.clust == i]) %>%
139 | #               group_by(labels) %>%
140 | #               summarize( freq = n())
141 | #   plt <- ggplot(plt_dat, aes(x=labels, y = freq)) + geom_bar(stat="identity")
142 | #   print(plt)
143 | #   invisible(readline(prompt="Press [enter] to continue"))
144 | # }
145 | # #----------------------------------------------------------------
146 | 
147 | 
148 | 
149 | 
150 | 
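The K-means segmentation in section [2] of the R script above has a direct Python analogue, and the same saved codes also support simple nearest-neighbour retrieval. A minimal sketch, assuming scikit-learn is installed (it is not a dependency of this repo) and the feature file written by the notebooks above:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

feats = pd.read_csv('data/ae_features_2000_nonoise.csv')
labels = feats.pop('_label_')
codes = feats.values

# 20 centroids to mirror the 20 newsgroup labels, as in the R script
km = KMeans(n_clusters=20, n_init=50, random_state=0).fit(codes)

# contingency table of discovered clusters against the true labels,
# the counterpart of table(Y, X.clust$cluster) above
print(pd.crosstab(labels, km.labels_))

# nearest-neighbour retrieval: L2-normalise the codes so that dot
# products become cosine similarities between documents
unit = codes / np.maximum(np.linalg.norm(codes, axis=1, keepdims=True), 1e-12)
sims = unit.dot(unit[0])         # similarity of every document to document 0
print(np.argsort(-sims)[1:6])    # indices of its five nearest neighbours

-------------------------------------------------------------------------------- /scripts_old/UT_1_gibbs_sampling.ipynb: --------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 | 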
"cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [ 10 | { 11 | "name": "stderr", 12 | "output_type": "stream", 13 | "text": [ 14 | "Using gpu device 0: Tesla K40c (CNMeM is disabled, cuDNN 5105)\n", 15 | "/home/ekhongl/.conda/envs/py3/lib/python3.5/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.\n", 16 | " warnings.warn(warn)\n" 17 | ] 18 | } 19 | ], 20 | "source": [ 21 | "import numpy as np\n", 22 | "import theano\n", 23 | "from theano import tensor as T\n", 24 | "theano_rng = T.shared_randomstreams.RandomStreams(1234)" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 2, 30 | "metadata": { 31 | "collapsed": false 32 | }, 33 | "outputs": [], 34 | "source": [ 35 | "W_values = np.array([[1,1],[1,1]], dtype=theano.config.floatX)\n", 36 | "bvis_values = np.array([0,0], dtype=theano.config.floatX)\n", 37 | "bhid_values = np.array([0,0], dtype=theano.config.floatX)\n", 38 | "W = theano.shared(W_values)\n", 39 | "vbias = theano.shared(bvis_values)\n", 40 | "hbias = theano.shared(bhid_values)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 3, 46 | "metadata": { 47 | "collapsed": true 48 | }, 49 | "outputs": [], 50 | "source": [ 51 | "def propup(vis, v0_doc_len):\n", 52 | " pre_sigmoid_activation = T.dot(vis, W) + T.dot(hbias,v0_doc_len).T #---------------------------[edited]\n", 53 | " return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]\n", 54 | "\n", 55 | "def sample_h_given_v(v0_sample, v0_doc_len):\n", 56 | " pre_sigmoid_h1, h1_mean = propup(v0_sample, v0_doc_len)\n", 57 | " h1_sample = theano_rng.binomial(size=h1_mean.shape,\n", 58 | " n=1, p=h1_mean,\n", 59 | " dtype=theano.config.floatX)\n", 60 | " return [pre_sigmoid_h1, h1_mean, h1_sample]\n", 61 | "\n", 62 | "def propdown(hid):\n", 63 | " pre_softmax_activation = T.dot(hid, W.T) + vbias #---------------------------[edited]\n", 64 | " return [pre_softmax_activation, T.nnet.softmax(pre_softmax_activation)]\n", 65 | "\n", 66 | "def sample_v_given_h(h0_sample, v0_doc_len):\n", 67 | " pre_softmax_v1, v1_mean = propdown(h0_sample)\n", 68 | " v1_sample = theano_rng.multinomial(size=None,\n", 69 | " n=v0_doc_len, pvals=v1_mean,\n", 70 | " dtype=theano.config.floatX) #---------------------------[edited]\n", 71 | " v1_doc_len = v1_sample.sum(axis=1).reshape([1,v1_sample.shape[0]])\n", 72 | " return [pre_softmax_v1, v1_mean, v1_sample, v1_doc_len]\n", 73 | "\n", 74 | "def gibbs_hvh(h0_sample, v0_doc_len):\n", 75 | " pre_softmax_v1, v1_mean, v1_sample, v1_doc_len = sample_v_given_h(h0_sample, v0_doc_len)\n", 76 | " pre_sigmoid_h1, h1_mean, h1_sample = sample_h_given_v(v1_sample, v0_doc_len)\n", 77 | " return [pre_sigmoid_h1, h1_mean, h1_sample,\n", 78 | " pre_softmax_v1, v1_mean, v1_sample, v1_doc_len] #---------------------------[edited]\n", 79 | "\n", 80 | "def gibbs_vhv(v0_sample, v0_doc_len):\n", 81 | " pre_sigmoid_h1, h1_mean, h1_sample = sample_h_given_v(v0_sample, v0_doc_len)\n", 82 | " pre_softmax_v1, v1_mean, v1_sample, v1_doc_len = sample_v_given_h(h1_sample, v0_doc_len)\n", 83 | " return [pre_sigmoid_h1, h1_mean, h1_sample,\n", 84 | " softmax_v1, v1_mean, v1_sample, v1_doc_len] #---------------------------[edited]" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 4, 90 | "metadata": { 91 | "collapsed": false 92 | }, 93 | 
"outputs": [], 94 | "source": [ 95 | "ipt = T.matrix()\n", 96 | "ipt_rSum = ipt.sum(axis=1).reshape([1,ipt.shape[0]])" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 5, 102 | "metadata": { 103 | "collapsed": false 104 | }, 105 | "outputs": [], 106 | "source": [ 107 | "pre_sigmoid_ph, ph_mean, ph_sample = sample_h_given_v(ipt, ipt_rSum)\n", 108 | "chain_start = ph_sample" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": 6, 114 | "metadata": { 115 | "collapsed": false 116 | }, 117 | "outputs": [], 118 | "source": [ 119 | "([pre_sigmoid_nhs,\n", 120 | " nh_means,\n", 121 | " nh_samples,\n", 122 | " pre_sigmoid_nvs,\n", 123 | " nv_means,\n", 124 | " nv_samples,\n", 125 | " nv_samples_sum], updates) = theano.scan(\n", 126 | " gibbs_hvh,\n", 127 | " outputs_info=[None, None, chain_start, None, None, None, ipt_rSum],\n", 128 | " n_steps=1,\n", 129 | " name=\"gibbs_hvh\" )" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 7, 135 | "metadata": { 136 | "collapsed": false 137 | }, 138 | "outputs": [], 139 | "source": [ 140 | "gibbs = theano.function( [ipt], outputs=[pre_sigmoid_nvs,\n", 141 | " nv_means,\n", 142 | " nv_samples,\n", 143 | " nv_samples_sum,\n", 144 | " pre_sigmoid_nhs,\n", 145 | " nh_means,\n", 146 | " nh_samples], updates=updates )" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 10, 152 | "metadata": { 153 | "collapsed": false 154 | }, 155 | "outputs": [ 156 | { 157 | "ename": "ValueError", 158 | "evalue": "dimension mismatch in args to gemm (1,2)x(2,2)->(2,1)\nApply node that caused the error: GpuGemm{no_inplace}(, TensorConstant{1.0}, GpuFromHost.0, GpuDimShuffle{1,0}.0, TensorConstant{1.0})\nToposort index: 26\nInputs types: [CudaNdarrayType(float32, matrix), TensorType(float32, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix), TensorType(float32, scalar)]\nInputs shapes: [(2, 1), (), (1, 2), (2, 2), ()]\nInputs strides: [(1, 0), (), (0, 1), (1, 2), ()]\nInputs values: [b'CudaNdarray([[ 0.]\\n [ 0.]])', array(1.0, dtype=float32), b'CudaNdarray([[ 1. 1.]])', b'CudaNdarray([[ 1. 1.]\\n [ 1. 1.]])', array(1.0, dtype=float32)]\nOutputs clients: [[GpuDimShuffle{0,1,x,x}(GpuGemm{no_inplace}.0), GpuDimShuffle{x,0,1}(GpuGemm{no_inplace}.0)]]\n\nHINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. 
If that does not work, Theano optimizations can be disabled with 'optimizer=None'.\nHINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.", 159 | "output_type": "error", 160 | "traceback": [ 161 | "ValueError                                Traceback (most recent call last)", 162 | "<ipython-input> in <module>(): ----> 2 gibbs(b)", 163 | "theano/compile/function_module.py in __call__(self, *args, **kwargs): --> 859 outputs = self.fn()", 164 | "ValueError: dimension mismatch in args to gemm (1,2)x(2,2)->(2,1)", 165 | "[terminal colour codes and repeated Theano frames omitted]" 173 | ] 174 | } 175 | ], 176 | "source": [ 177 | "b = np.array([[2,0,],[0,2],[1,1]], dtype = theano.config.floatX)\n", 178 | "gibbs(b)" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 22, 184 | "metadata": { 185 | "collapsed": false 186 | }, 187 | "outputs": [ 188 | { 189 | "ename": "ValueError", 190 | "evalue": "dimension mismatch in args to gemm (3,2)x(2,2)->(2,1)\nApply node that caused the error: GpuGemm{no_inplace}(, TensorConstant{1.0}, GpuFromHost.0, GpuDimShuffle{1,0}.0, TensorConstant{1.0})\nToposort index: 8\nInputs types: [CudaNdarrayType(float32, matrix), TensorType(float32, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix), TensorType(float32, scalar)]\nInputs shapes: [(2, 1), (), (3, 2), (2, 2), ()]\nInputs strides: [(1, 0), (), (2, 1), (1, 2), ()]\nInputs values: [b'CudaNdarray([[ 0.]\\n [ 0.]])', array(1.0, dtype=float32), 'not shown', b'CudaNdarray([[ 1. 1.]\\n [ 1. 1.]])', array(1.0, dtype=float32)]\nOutputs clients: [[GpuDimShuffle{0,1,x,x}(GpuGemm{no_inplace}.0), GpuDimShuffle{x,0,1}(GpuGemm{no_inplace}.0)]]\n\nHINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. 
If that does not work, Theano optimizations can be disabled with 'optimizer=None'.\nHINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.", 191 | "output_type": "error", 192 | "traceback": [ 193 | "ValueError                                Traceback (most recent call last)", 194 | "<ipython-input> in <module>(): ----> 2 [pre_sigmoid_h1, h1_mean, h1_sample, softmax_v1, v1_doc_len, v1_mean, v1_sample]= gibbs(b)", 195 | "theano/compile/function_module.py in __call__(self, *args, **kwargs): --> 859 outputs = self.fn()", 196 | "ValueError: dimension mismatch in args to gemm (3,2)x(2,2)->(2,1)", 197 | "[terminal colour codes and repeated Theano frames omitted]" 205 | ] 206 | } 207 | ], 208 | "source": [ 209 | "b = np.array([[2,0,],[0,2],[1,1]], dtype = theano.config.floatX)\n", 210 | "[pre_sigmoid_h1, h1_mean, h1_sample, softmax_v1, v1_doc_len, v1_mean, v1_sample]= gibbs(b)\n", 211 | "print(pre_sigmoid_h1)\n", 212 | "print('-------')\n", 213 | "print(h1_mean)\n", 214 | "print('-------')\n", 215 | "print(h1_sample)\n", 216 | "print('-------')\n", 217 | "print(softmax_v1)\n", 218 | "print('-------')\n", 219 | "print(v1_mean)\n", 220 | "print('-------')\n", 221 | "print(v1_sample)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 193, 227 | "metadata": { 228 | "collapsed": false 229 | }, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "text/plain": [ 234 | "(3, 2)" 235 | ] 236 | }, 237 | "execution_count": 193, 238 | "metadata": {}, 239 | "output_type": "execute_result" 240 | } 241 | ], 242 | "source": [ 243 | "v1_sample.shape\n", 244 | "h1_mean[None,:,:].shape\n", 245 | "h1_mean.shape[1:]\n", 246 | "h1_mean.reshape(h1_mean.shape[1:]).shape" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 11, 252 | "metadata": { 253 | "collapsed": false 254 | }, 255 | "outputs": [ 256 | { 257 | "data": { 258 | "text/plain": [ 259 | "array([[ 2.],\n", 260 | "       [ 2.]], dtype=float32)" 261 | ] 262 | }, 263 | "execution_count": 11, 264 | "metadata": {}, 265 | "output_type": "execute_result" 266 | } 267 | ], 268 | "source": [ 269 | "W_values.sum(axis=1).reshape([1,W_values.sum(axis=1).shape[0]]).T" 270 | ] 271 | } 272 | ], 273 | "metadata": { 274 | "anaconda-cloud": {}, 275 | "kernelspec": { 276 | "display_name": "Python [conda env:py3]", 277 | "language": "python", 278 | "name": "conda-env-py3-py" 279 | }, 280 | "language_info": { 281 | "codemirror_mode": { 282 | "name": "ipython", 283 | "version": 3 284 | }, 285 | "file_extension": ".py", 286 | "mimetype": "text/x-python", 287 | "name": "python", 288 | "nbconvert_exporter": "python", 289 | "pygments_lexer": "ipython3", 290 | "version": "3.5.2" 291 | } 292 | }, 293 | "nbformat": 4, 294 | "nbformat_minor": 2 295 | } 296 | 
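Both failures recorded above come down to the orientation of the document-length vector that the RSM threads through its Gibbs updates: ipt_rSum is built as a (1, batch) row, and the scaled hidden-bias term only lines up when the (n_hidden, 1) bias column is outer-multiplied with that row and transposed back to (batch, n_hidden). A minimal NumPy re-derivation of that shape bookkeeping follows; the 3-document, 2-word, 2-hidden sizes are toy assumptions mirroring the cells above, not values taken from the library code:

    import numpy as np

    rng = np.random.RandomState(1234)

    # toy batch: 3 documents over a 2-word vocabulary (same counts as `b` above)
    v = np.array([[2., 0.], [0., 2.], [1., 1.]], dtype='float32')
    W = np.ones((2, 2), dtype='float32')      # visible-to-hidden weights
    hbias = np.zeros(2, dtype='float32')      # hidden bias

    # document lengths kept as a (1, batch) row, mirroring
    # ipt_rSum = ipt.sum(axis=1).reshape([1, ipt.shape[0]])
    doc_len = v.sum(axis=1).reshape(1, -1)    # shape (1, 3)

    # RSM up-pass: each document's hidden bias is scaled by its length, so the
    # bias term is an (n_hidden, 1) x (1, batch) outer product, transposed back
    # to (batch, n_hidden) before being added to v.dot(W)
    pre_sigmoid = v.dot(W) + np.dot(hbias.reshape(-1, 1), doc_len).T   # (3, 2)
    h_mean = 1.0 / (1.0 + np.exp(-pre_sigmoid))                        # sigmoid
    h_sample = rng.binomial(n=1, p=h_mean).astype('float32')           # (3, 2)

    print(pre_sigmoid.shape, h_sample.shape)   # (3, 2) (3, 2)

Feeding the transposed (batch, 1) orientation into the same expressions is the kind of slip that yields the gemm dimension-mismatch errors above, which is exactly what the scratch notebooks below (Untitled1.ipynb and Untitled2.ipynb) go on to probe cell by cell.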
-------------------------------------------------------------------------------- /scripts_old/Untitled.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 98, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np\n", 12 | "import theano\n", 13 | "from theano import tensor as T\n", 14 | "\n", 15 | "\n", 16 | "W_values = np.array([[1,1],[1,1]], dtype=theano.config.floatX)\n", 17 | "bvis_values = np.array([0,0], dtype=theano.config.floatX)\n", 18 | "bhid_values = np.array([0,0], dtype=theano.config.floatX)\n", 19 | "\n", 20 | "W = theano.shared(W_values) # we assume that ``W_values`` contains the\n", 21 | " # initial values of your weight matrix\n", 22 | "bvis = theano.shared(bvis_values)\n", 23 | "bhid = theano.shared(bhid_values)\n", 24 | "\n", 25 | "trng = T.shared_randomstreams.RandomStreams(1234)\n", 26 | "\n", 27 | "def OneStep(vsample, W, bhid, bvis) :\n", 28 | " hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)\n", 29 | " hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)\n", 30 | " vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)\n", 31 | " return trng.binomial(size=vsample.shape, n=1, p=vmean,\n", 32 | " dtype=theano.config.floatX)\n", 33 | "\n", 34 | "sample = T.matrix()\n", 35 | "\n", 36 | "values, updates = theano.scan(OneStep, sequences=sample, non_sequences = [W, bhid, bvis], n_steps=sample.shape[0])\n", 37 | "\n", 38 | "gibbs10 = theano.function([sample], values, updates=updates)" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 99, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "tmp = np.array([[10,10],[-10,-10]], dtype = theano.config.floatX)" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 100, 55 | "metadata": { 56 | "collapsed": false 57 | }, 58 | "outputs": [ 59 | { 60 | "data": { 61 | "text/plain": [ 62 | "array([[ 1., 1.],\n", 63 | " [ 1., 0.]], dtype=float32)" 64 | ] 65 | }, 66 | "execution_count": 100, 67 | "metadata": {}, 68 | "output_type": "execute_result" 69 | } 70 | ], 71 | "source": [ 72 | "gibbs10(tmp)" 73 | ] 74 | } 75 | ], 76 | "metadata": { 77 | "anaconda-cloud": {}, 78 | "kernelspec": { 79 | "display_name": "Python [conda env:py3]", 80 | "language": "python", 81 | "name": "conda-env-py3-py" 82 | }, 83 | "language_info": { 84 | "codemirror_mode": { 85 | "name": "ipython", 86 | "version": 3 87 | }, 88 | "file_extension": ".py", 89 | "mimetype": "text/x-python", 90 | "name": "python", 91 | "nbconvert_exporter": "python", 92 | "pygments_lexer": "ipython3", 93 | "version": "3.5.2" 94 | } 95 | }, 96 | "nbformat": 4, 97 | "nbformat_minor": 2 98 | } 99 | -------------------------------------------------------------------------------- /scripts_old/Untitled1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 75, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np\n", 12 | "import theano\n", 13 | "from theano import tensor as T\n", 14 | "\n", 15 | "\n", 16 | "W_values = np.array([[1,2],[1,1]], dtype=theano.config.floatX)\n", 17 | "bvis_values = np.array([1,1], dtype=theano.config.floatX)\n", 18 | "bhid_values = np.array([2,3], dtype=theano.config.floatX)\n", 19 | "\n", 20 | "W = theano.shared(W_values) # we assume that ``W_values`` 
contains the\n", 21 | " # initial values of your weight matrix\n", 22 | "bvis = theano.shared(bvis_values)\n", 23 | "bhid = theano.shared(bhid_values)\n", 24 | "\n", 25 | "def t_propup(vis,vis_sum):\n", 26 | " pre_sigmoid_activation = T.dot(vis, W) + T.dot(bhid.reshape([1,bhid.shape[0]]).T,vis_sum).T\n", 27 | " return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]\n", 28 | "\n", 29 | "t_ipt = T.matrix()\n", 30 | "t_ipt_sum = t_ipt.sum(axis=1).reshape([1,t_ipt.shape[0]])\n", 31 | "\n", 32 | "t_results, t_updates = theano.scan( fn = t_propup, \n", 33 | " non_sequences = [t_ipt, t_ipt_sum],\n", 34 | " n_steps=1\n", 35 | " )\n", 36 | "\n", 37 | "tmp_f = theano.function( [t_ipt], t_results, updates = t_updates)" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 11, 43 | "metadata": { 44 | "collapsed": true 45 | }, 46 | "outputs": [], 47 | "source": [ 48 | "tmp = np.array([[10,10],[-10,-10],[-10,-10]], dtype = theano.config.floatX)" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 18, 54 | "metadata": { 55 | "collapsed": false 56 | }, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/plain": [ 61 | "[array([[[ 60., 20.],\n", 62 | " [-60., -20.],\n", 63 | " [-60., -20.]]], dtype=float32),\n", 64 | " array([[[ 1.00000000e+00, 1.00000000e+00],\n", 65 | " [ 8.75653169e-27, 2.06115347e-09],\n", 66 | " [ 8.75653169e-27, 2.06115347e-09]]], dtype=float32)]" 67 | ] 68 | }, 69 | "execution_count": 18, 70 | "metadata": {}, 71 | "output_type": "execute_result" 72 | } 73 | ], 74 | "source": [ 75 | "tmp_f(tmp)" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 78, 81 | "metadata": { 82 | "collapsed": false 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "def propdown(hid):\n", 87 | " pre_softmax_activation = T.dot(hid, W.T) + bvis #---------------------------[edited]\n", 88 | " return [pre_softmax_activation, T.nnet.softmax(pre_softmax_activation)]\n", 89 | "\n", 90 | "ipt = T.matrix()\n", 91 | "\n", 92 | "results, updates = theano.scan( fn = propdown, \n", 93 | " non_sequences = ipt,\n", 94 | " n_steps=1\n", 95 | " )\n", 96 | "\n", 97 | "tmp_f2 = theano.function( [ipt], results, updates = updates)\n" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 79, 103 | "metadata": { 104 | "collapsed": false 105 | }, 106 | "outputs": [ 107 | { 108 | "data": { 109 | "text/plain": [ 110 | "[array([[[ 4., 3.],\n", 111 | " [ 1., 1.],\n", 112 | " [ 3., 2.],\n", 113 | " [ 2., 2.]]], dtype=float32), array([[[ 0.7310586 , 0.26894143],\n", 114 | " [ 0.5 , 0.5 ],\n", 115 | " [ 0.7310586 , 0.26894143],\n", 116 | " [ 0.5 , 0.5 ]]], dtype=float32)]" 117 | ] 118 | }, 119 | "execution_count": 79, 120 | "metadata": {}, 121 | "output_type": "execute_result" 122 | } 123 | ], 124 | "source": [ 125 | "tmp_f2(np.array([[1,1],[0,0],[0,1],[1,0]], dtype = theano.config.floatX))" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 76, 131 | "metadata": { 132 | "collapsed": false 133 | }, 134 | "outputs": [ 135 | { 136 | "data": { 137 | "text/plain": [ 138 | "array([[ 43., 24.],\n", 139 | " [ 2., 3.],\n", 140 | " [ 4., 4.],\n", 141 | " [ 3., 4.]], dtype=float32)" 142 | ] 143 | }, 144 | "execution_count": 76, 145 | "metadata": {}, 146 | "output_type": "execute_result" 147 | } 148 | ], 149 | "source": [ 150 | "(T.dot(np.array([[1,20],[0,0],[0,1],[1,0]], dtype = theano.config.floatX), W.T) + bhid).eval()" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 77, 156 | "metadata": { 157 | 
"collapsed": false 158 | }, 159 | "outputs": [ 160 | { 161 | "data": { 162 | "text/plain": [ 163 | "array([[ 1.00000000e+00, 5.60279645e-09],\n", 164 | " [ 2.68941432e-01, 7.31058598e-01],\n", 165 | " [ 5.00000000e-01, 5.00000000e-01],\n", 166 | " [ 2.68941432e-01, 7.31058598e-01]], dtype=float32)" 167 | ] 168 | }, 169 | "execution_count": 77, 170 | "metadata": {}, 171 | "output_type": "execute_result" 172 | } 173 | ], 174 | "source": [ 175 | "T.nnet.softmax((T.dot(np.array([[1,20],[0,0],[0,1],[1,0]], dtype = theano.config.floatX), W.T) + bhid).eval()).eval()" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 39, 181 | "metadata": { 182 | "collapsed": false 183 | }, 184 | "outputs": [ 185 | { 186 | "data": { 187 | "text/plain": [ 188 | "array([[ 0.1748777 , 0.1748777 , 0.1748777 , 0.47536689]], dtype=float32)" 189 | ] 190 | }, 191 | "execution_count": 39, 192 | "metadata": {}, 193 | "output_type": "execute_result" 194 | } 195 | ], 196 | "source": [ 197 | "T.nnet.softmax(np.array([0,0,0,1], dtype = theano.config.floatX)).eval()" 198 | ] 199 | } 200 | ], 201 | "metadata": { 202 | "anaconda-cloud": {}, 203 | "kernelspec": { 204 | "display_name": "Python [conda env:py3]", 205 | "language": "python", 206 | "name": "conda-env-py3-py" 207 | }, 208 | "language_info": { 209 | "codemirror_mode": { 210 | "name": "ipython", 211 | "version": 3 212 | }, 213 | "file_extension": ".py", 214 | "mimetype": "text/x-python", 215 | "name": "python", 216 | "nbconvert_exporter": "python", 217 | "pygments_lexer": "ipython3", 218 | "version": "3.5.2" 219 | } 220 | }, 221 | "nbformat": 4, 222 | "nbformat_minor": 2 223 | } 224 | -------------------------------------------------------------------------------- /scripts_old/Untitled2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 59, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np\n", 12 | "import theano\n", 13 | "from theano import tensor as T\n", 14 | "theano_rng = T.shared_randomstreams.RandomStreams(1234)\n", 15 | "W_values = np.array([[1,1],[1,1]], dtype=theano.config.floatX)\n", 16 | "bvis_values = np.array([1,1], dtype=theano.config.floatX)\n", 17 | "bhid_values = np.array([1,1], dtype=theano.config.floatX)\n", 18 | "W = theano.shared(W_values)\n", 19 | "vbias = theano.shared(bvis_values)\n", 20 | "hbias = theano.shared(bhid_values)\n", 21 | "\n", 22 | "def propup(vis, v0_doc_len):\n", 23 | " pre_sigmoid_activation = T.dot(vis, W) + T.dot(hbias.reshape([1,hbias.shape[0]]).T,v0_doc_len).T #---------------------------[edited]\n", 24 | " return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]\n", 25 | "\n", 26 | "def sample_h_given_v(v0_sample, v0_doc_len):\n", 27 | " pre_sigmoid_h1, h1_mean = propup(v0_sample, v0_doc_len)\n", 28 | " h1_sample = theano_rng.binomial(size=h1_mean.shape,\n", 29 | " n=1, p=h1_mean,\n", 30 | " dtype=theano.config.floatX)\n", 31 | " return [pre_sigmoid_h1, h1_mean, h1_sample]\n", 32 | "\n", 33 | "ipt = T.matrix()\n", 34 | "ipt_rSum = ipt.sum(axis=1).reshape([1,ipt.shape[0]])\n", 35 | "\n", 36 | "[out_1,out_2,out_3], updates =theano.scan( sample_h_given_v,\n", 37 | " non_sequences =[ipt, ipt_rSum],\n", 38 | " n_steps=1,\n", 39 | " name=\"gibbs_hvh\" )\n", 40 | "\n", 41 | "hgv = theano.function( [ipt], outputs=[out_1,out_2,out_3])" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 60, 47 | "metadata": { 
48 | "collapsed": true 49 | }, 50 | "outputs": [], 51 | "source": [ 52 | "b = np.array([[1,6,],[1,3],[5,1]], dtype = theano.config.floatX)" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 61, 58 | "metadata": { 59 | "collapsed": false 60 | }, 61 | "outputs": [ 62 | { 63 | "name": "stdout", 64 | "output_type": "stream", 65 | "text": [ 66 | "[[[ 14. 14.]\n", 67 | " [ 8. 8.]\n", 68 | " [ 12. 12.]]]\n", 69 | "[[[ 0.99999917 0.99999917]\n", 70 | " [ 0.99966466 0.99966466]\n", 71 | " [ 0.9999938 0.9999938 ]]]\n", 72 | "[[[ 1. 1.]\n", 73 | " [ 1. 1.]\n", 74 | " [ 1. 1.]]]\n" 75 | ] 76 | } 77 | ], 78 | "source": [ 79 | "pre_sigmoid_ph, ph_mean, ph_sample = hgv(b)\n", 80 | "print(pre_sigmoid_ph)\n", 81 | "print(ph_mean)\n", 82 | "print(ph_sample)" 83 | ] 84 | } 85 | ], 86 | "metadata": { 87 | "anaconda-cloud": {}, 88 | "kernelspec": { 89 | "display_name": "Python [conda env:py3]", 90 | "language": "python", 91 | "name": "conda-env-py3-py" 92 | }, 93 | "language_info": { 94 | "codemirror_mode": { 95 | "name": "ipython", 96 | "version": 3 97 | }, 98 | "file_extension": ".py", 99 | "mimetype": "text/x-python", 100 | "name": "python", 101 | "nbconvert_exporter": "python", 102 | "pygments_lexer": "ipython3", 103 | "version": "3.5.2" 104 | } 105 | }, 106 | "nbformat": 4, 107 | "nbformat_minor": 2 108 | } 109 | -------------------------------------------------------------------------------- /scripts_old/gen_testing_1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 14, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [ 10 | { 11 | "data": { 12 | "text/plain": [ 13 | "array([[ 0.],\n", 14 | " [ 0.]])" 15 | ] 16 | }, 17 | "execution_count": 14, 18 | "metadata": {}, 19 | "output_type": "execute_result" 20 | } 21 | ], 22 | "source": [ 23 | "import numpy as np\n", 24 | "\n", 25 | "a1 = np.array(([1],[0]))\n", 26 | "a2 = np.array(([1],[0]))\n", 27 | "np.zeros([2, 2], 'float64')[a1,a2]" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 15, 33 | "metadata": { 34 | "collapsed": false 35 | }, 36 | "outputs": [ 37 | { 38 | "data": { 39 | "text/plain": [ 40 | "array([0])" 41 | ] 42 | }, 43 | "execution_count": 15, 44 | "metadata": {}, 45 | "output_type": "execute_result" 46 | } 47 | ], 48 | "source": [ 49 | "a1[-1]" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 176, 55 | "metadata": { 56 | "collapsed": false 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "import theano\n", 61 | "from theano import tensor as T\n", 62 | "\n", 63 | "\n", 64 | "\n", 65 | "W_values = np.random.randn(3,3)\n", 66 | "bvis_values = np.random.randn(3)\n", 67 | "bhid_values = np.random.randn(3)\n", 68 | "W = theano.shared(W_values) # we assume that ``W_values`` contains the\n", 69 | " # initial values of your weight matrix\n", 70 | "\n", 71 | "bvis = theano.shared(bvis_values)\n", 72 | "bhid = theano.shared(bhid_values)\n", 73 | "\n", 74 | "trng = T.shared_randomstreams.RandomStreams(1234)\n", 75 | "\n", 76 | "def OneStep(vsample) :\n", 77 | " hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)\n", 78 | " hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)\n", 79 | " vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)\n", 80 | " return trng.binomial(size=vsample.shape, n=1, p=vmean,\n", 81 | " dtype=theano.config.floatX)*5\n", 82 | "\n", 83 | "sample = theano.tensor.matrix()\n", 84 | "input = sample[:,:-2].flatten()\n", 85 | "\n", 86 | 
"values, updates = theano.scan(OneStep, outputs_info=input, n_steps=10)\n", 87 | "\n", 88 | "gibbs10 = theano.function([sample], values[-1], updates=updates)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 173, 94 | "metadata": { 95 | "collapsed": false 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "a = np.array([[1,2,3],[1,2,3],[1,2,3]], dtype = theano.config.floatX)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 174, 105 | "metadata": { 106 | "collapsed": false 107 | }, 108 | "outputs": [ 109 | { 110 | "data": { 111 | "text/plain": [ 112 | "array([ 0., 0., 5.], dtype=float32)" 113 | ] 114 | }, 115 | "execution_count": 174, 116 | "metadata": {}, 117 | "output_type": "execute_result" 118 | } 119 | ], 120 | "source": [ 121 | "gibbs10(a)" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 152, 127 | "metadata": { 128 | "collapsed": false 129 | }, 130 | "outputs": [ 131 | { 132 | "data": { 133 | "text/plain": [ 134 | "array([ 1., 1., 1.], dtype=float32)" 135 | ] 136 | }, 137 | "execution_count": 152, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "a[:,:-2].flatten()" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 156, 149 | "metadata": { 150 | "collapsed": false 151 | }, 152 | "outputs": [ 153 | { 154 | "data": { 155 | "text/plain": [ 156 | "array([ 6., 6., 6.], dtype=float32)" 157 | ] 158 | }, 159 | "execution_count": 156, 160 | "metadata": {}, 161 | "output_type": "execute_result" 162 | } 163 | ], 164 | "source": [ 165 | "a.sum(axis=1)" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 609, 171 | "metadata": { 172 | "collapsed": false 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "def MultSim( v, v_sum) :\n", 177 | " return trng.multinomial(size=None, \n", 178 | " n=v_sum, \n", 179 | " pvals=v/v_sum[:,None], \n", 180 | " dtype=theano.config.floatX)\n", 181 | "\n", 182 | "input = T.matrix()\n", 183 | "input_rSum = T.vector()\n", 184 | "\n", 185 | "values, updates = theano.scan(MultSim, outputs_info=input, non_sequences = input_rSum, n_steps=3)\n", 186 | "\n", 187 | "gibbs_mult = theano.function([input,input_rSum], values, updates=updates)\n", 188 | "\n" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 610, 194 | "metadata": { 195 | "collapsed": false 196 | }, 197 | "outputs": [], 198 | "source": [ 199 | "b = np.array([[2,0,0],[0,0,2],[1,1,1]], dtype = theano.config.floatX)\n", 200 | "out = gibbs_mult(b,b.sum(axis=1))" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 670, 206 | "metadata": { 207 | "collapsed": false 208 | }, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "(3, 3, 3)\n" 215 | ] 216 | }, 217 | { 218 | "data": { 219 | "text/plain": [ 220 | "array([[[ 2., 0., 0.],\n", 221 | " [ 0., 0., 2.],\n", 222 | " [ 1., 1., 1.]],\n", 223 | "\n", 224 | " [[ 2., 0., 0.],\n", 225 | " [ 0., 0., 2.],\n", 226 | " [ 0., 1., 2.]],\n", 227 | "\n", 228 | " [[ 2., 0., 0.],\n", 229 | " [ 0., 0., 2.],\n", 230 | " [ 0., 1., 2.]]], dtype=float32)" 231 | ] 232 | }, 233 | "execution_count": 670, 234 | "metadata": {}, 235 | "output_type": "execute_result" 236 | } 237 | ], 238 | "source": [ 239 | "print(out.shape)\n", 240 | "out\n" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": null, 246 | "metadata": { 247 | "collapsed": true 248 | }, 249 | "outputs": [], 250 | 
"source": [] 251 | } 252 | ], 253 | "metadata": { 254 | "anaconda-cloud": {}, 255 | "kernelspec": { 256 | "display_name": "Python [conda env:py3]", 257 | "language": "python", 258 | "name": "conda-env-py3-py" 259 | }, 260 | "language_info": { 261 | "codemirror_mode": { 262 | "name": "ipython", 263 | "version": 3 264 | }, 265 | "file_extension": ".py", 266 | "mimetype": "text/x-python", 267 | "name": "python", 268 | "nbconvert_exporter": "python", 269 | "pygments_lexer": "ipython3", 270 | "version": "3.5.2" 271 | } 272 | }, 273 | "nbformat": 4, 274 | "nbformat_minor": 2 275 | } 276 | -------------------------------------------------------------------------------- /scripts_old/score_rsm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 43, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np\n", 12 | "import pandas as pd\n", 13 | "\n", 14 | "from six.moves import cPickle as pickle\n", 15 | "\n", 16 | "import theano\n", 17 | "import theano.tensor as T\n", 18 | "\n", 19 | "from lib.rbm import RSM\n", 20 | "\n", 21 | "import matplotlib.pyplot as plt\n", 22 | "%matplotlib inline" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 53, 28 | "metadata": { 29 | "collapsed": false 30 | }, 31 | "outputs": [ 32 | { 33 | "data": { 34 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAh4AAAFkCAYAAABvkjJwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAAPYQAAD2EBqD+naQAAIABJREFUeJzt3XecXGXZ//HPlUYoIfRQpT6GGmCjGBUEBIJ0hEdgBQQk\nCEgNECmKhCZFCCpFfxi6sggoDyBIQBGiVEkAgYRQAiEb0hM2kISU3ev3xzWHnZ3d2Z3ZnZbZ7/v1\nmtfsnHPPfe45O3POde52zN0RERERKYUe5S6AiIiIdB8KPERERKRkFHiIiIhIySjwEBERkZJR4CEi\nIiIlo8BDRERESkaBh4iIiJSMAg8REREpGQUeIiIiUjIKPERERKRkVpjAw8xOM7MPzGyxmb1oZl8t\nd5lEREQkPytE4GFmRwLXA5cAOwOvA2PMbJ2yFkxERETyYivCTeLM7EXgJXc/K/XagKnAb9z92rIW\nTkRERHJW8TUeZtYbGAz8I1nmES39Hfh6ucolIiIi+etV7gLkYB2gJzAzY/lMYGBbbzCztYF9gQ+B\nz4tZOBERkSrTF9gMGOPucwud+YoQeHTGvsAfy10IERGRFdjRwL2FznRFCDzmAI3AgIzlA4AZWd7z\nIcAf/vAHttlmm+KVTFoYPnw4N9xwQ7mL0a1on5ee9nnpaZ+X1sSJEznmmGMgdS4ttIoPPNx9mZmN\nA/YCHoEvOpfuBfwmy9s+B9hmm22oqakpSTkF+vfvr/1dYtrnpad9Xnra52VTlK4KFR94pIwC7kwF\nIC8Dw4FVgDvLWSgRERHJzwoReLj7/ak5Oy4jmlheA/Z199nlLZmIiIjkY4UIPADc/RbglnKXQ0RE\nqsMnn8CyZbDuuuUuSfdS8fN4yIqjtra23EXodrTPS0/7vPSKtc+POQb22AOamoqSvWSxQsxcmi8z\nqwHGjRs3Th2SRESklalTYdNNwR0eeQQOOqjcJaoc48ePZ/DgwQCD3X18ofNXjYeIiHQ7d90FK68M\nNTXwy18WPv+FC+H55wufbzVQ4CEiIt1KUxPcfjsccQRcfDH861/w4ouF3cZJJ8Fuu0FDQ9fzGj0a\n/va3rudTKRR4iIhIt/LMM/DBB3DiiXDwwfDlLxe21uOhh6CuLgKc117ren6//CUMHx7NQtVAgYd0\n2WefwY47wq23lrskIqX1zjtwwAExOkJWHLffHsHGN78JPXrAuedGsPDuu13Pe84cOOUUOPDAaMoZ\nN65r+blDfT1MmhQ1M9VAgYd02QUXwH//C489Vu6SdOx3v4sDQzGcfTbcUoAB3/X1sOuuMCPbDQG6\nqY8/hgceKHcpmjU2wvHHw+OPV1c1eLFMmgRzC367sfx98gn8+c/wwx+CWSz7wQ9iSO3113c9/zPO\niCG6t94KO+3U9cDjk09g0aL4+/e/73r5KoECD+mSp5+Gm2+GLbeMNtJKrgqcNQtOPbU4P96FC+G3\nv4VRo7q+D269FZ57LqqDpdlll8GRR8ZohEpw003wwguw3nrwxBPlLk1lc4e994ZLLy13SeDeeyMw\nOO645mV9+8KZZ8Kdd8LMzPug5+Evf4H77oMbb4QNNoDBg7seeNTXx/Mhh8CDD8L8+V3LrxIo8JBO\n+/TTaCPdffc44c6aBVOmFHYbr74aY+1Hjux6XskPeMyYrueV6dlnYelSeP/9qP3prMZGuOOO+Ht8\nJwaxNTZ2ftv5mjgRhg2Lg3ixLVsWB133aDsvt8mT4aKL4LTT4IQT4jtV6Lkg3OE3vylM9X8xLV4c\nNYmLF2dPM3ly/P4K0d+hq267LZrH1l+/5fJTT4VevSKgzCWPE06AP/2pORCYMyfyOOQQ+P73Y9ng\nwdEc9+mnnS9vctz62c9g+XL4wx86n1fFcPeqewA1gI8bN86rTWOj+6hR7p98Uu6SuJ9yivuqq7q/\n/777rFnu4F5X1/V8m5rcn3rKfZ99Is9VVnFfeWX3zz7rWr6PPBL59erlvmBB18uZ7swz3TfZxH2N\nNdx/9rPO5/O3v0UZN9/cfe+9c3/fZ5/F/6NHD/eDDnIfOzb2YzHtt1+U9bHHirsdd/fHH49t7bCD\n+6B
Bxd9ee5qa3Pfay/1LX4rv0T//GWUbP76w23nmmch3m226/t0vlqYm9+OOi3LedVf2dHfcEWnW\nXLP438v2vPpqlOPhh9tef9ZZ7mut1f7+/vzzSLPWWpFXjx7u3/hGPNZay3369Oa0//1vpHn22c6X\n+dZbYxtLl7offnj8Bgq5D//4R/dTT41zS2LcuHEOOFDjRThHq8ZjBfP883DOOfDww+Utxz/+EVc5\n114LW2wR7aNbbNH1IWlTp8Iuu8A++8QVxH33Ra3H4sXRlt4VyZXD8uWFb8YYMwb22y96yP/5z53P\nZ/Ro2GGHqAYePz63ZpuXXoKdd4a774bzzoury299C4YMiVqCYtSCPPdc9GtYZZXYbrHV1cHAgXDF\nFVGj9MYbxd9mNrfdFt//3/8e+vWDb3wDVlut8M0to0bB5ptHLeLppxc270K56aaYD2P11WOfZPOv\nf0V/ivnzo69Oudx+OwwYEL/VtgwfHsNfb789ex4PPwzz5sVv4KOP4jg4YEDUbPzudy1rUrbZpusd\nTOvrI8/evWOI7htvwMsvdz6/dO5wzTVx3O1RymigGNFMuR9UcY3H+edHBH3RReUrQ0NDXO3tuWfL\nKPn733cfMqRreR95pPv667s/+WTLqH7nnd2/972u5X3RRVErsfnm7qed1rW80n3wQfxP/vzn5lqV\nt97KP59Zs9x793b/1a+a85kyJXv6pUvdL7nEvWdP969+1X3SpFje1BS1EHvsEXnsumukLZSmpsh7\n0CD3q69279u3uDVwixa5r7aa+8iR7kuWxFXl+ecXb3vtqa93X3119xNOaLn8kEPcv/Wtwm1n0iR3\nM/fRo6MmoaMahXJ45pn47p19tvu557pvtFH2K/H/+Z/mGrInnihtOROLF0eNy09+0n66o45y32IL\n9+XL214/dGjUbuRqyBD3o4/OPX2mH/7QfZdd4u/GRvdNN3U/8cTO55fuH/+I/8nf/95yebFrPMoe\nJBTlQ1Vx4LHttvFfO/zw9tNddJH7bbcVpwznnBNNLJMnt1z+m9+49+kTVZGd8dJL8dnaKvcvfhFN\nLgsXdi5v96gS/sY3okliq606n0+m3/0uDsCffBIHt9VWc7/ssvzzuf762H9z5sQJDtwfeih7+gMO\niO1eckn2wOLvf4+mpQsvzL882Tz1lH9RXV1f33yC7IpFi7If6B94ILaXBFannuq+8cYtg95imz7d\n/Z573L/2tQiM581ruf63v4393FYA1tTkPmKE+4sv5r69U091X2+9+D65ux9/fHz/J0zo/GcopI8+\ncl933bj4WLasuSls4sTWaadPj3X33huf4brrSlvWxYvd/9//c//yl+P38vbb7ad/+eUo71/+0nrd\nhx/G9z2fY+tpp7lvvXV+ZU43dKj7YYc1v77sstiPDQ2dzzNx0EFtN90o8FDg8YX334//2CabuG+/\nfftp11uvOG3h06fHFe7Ika3XJYFDPgfYRFOT++67u2+3XdsnoHffjbwfeCD/vBN77RW1Jg89FHm9\n/37n80p36KFRq5A46ij3HXfML4+mpmjLP+KI5tfrruv+85+3nX7GjPgMt97acd6/+EUcLP/5z/zK\nlK2cX/taXIElB6u9947/XVfstVfUorQVTBx2mHtNTfPrf/87PnshPk97pk93P++8+B1FpXT8X59+\nunXayZOzn6wee6z5vbkES3PmRJ+mSy9tXvbZZ/H92H773ILv998vTGB2001Rw3TIIfH3pEkRJH7l\nK1HrOWtWpPv006itu+mm1nkkgeO0aVEzd9xxXS9XLubOdb/iCvcBA+L7f9hhcYzKxW67uX/zm62X\njxwZFxaffpp7OW67Lbbf2X5l224bfcgSU6dGn4/f/a5z+SXeeSd7EKXAQ4HHF37967givvJK95VW\nyn6FOHt284Gyvj6/bXTUaWn4cPf+/d3nz2+9bsmSKNevf53fNt3dH33UO+youNNOzSfmzhg4MMr/\nySdx5fPb32YvS1snl7YsXerer5/75Zc3L3vwwfgs776be9mefz7e8+STzcv23df9wAPbTn/ffZH+\n4487znv58ggMNtooDsZdkTQBPfVU87K7745lH37YuTyTgBqi1izdJ5/Ed+qXv2xe1tTkvtlm7sOG\ndW57uTruuGhW+cEP3P/whwj22jNwoPuPftRyWRKobbZZfL777+94u1deGcF9clJPvPFGBCQnnJD9\nt79kSXzHwf2qqzreVnveey+2t88+0YzUu3fku/rqUb7Mw+tuu7l/97ut8znjjGi6cI9mg8GDs29z\n9OgIQLvadFdXFx29V1opajjfeSe/9ycXJ+kXUZ1t5njttchr7Nj83pdYfXX3a69tuezAA9vfj7k4\n7bS4uElq1dIp8FDg8YW9945qtyee8Hav2JPe8JBfFfhrr0UTxOmnt70+qe245JLseXz96+61tblv\n0z2qarfd1v3b324/8Lnyys43tzQ1RfPQ9dfH6113jZqKTDNmRLr+/XM7qY8dG/v5P/9pXvbZZ3HA\nvvrq3Mv3wx/GQS39KvWCCyJYaMtJJ8UVcK4++ijatw8/vPM94hsb46p9jz1a5vHpp/F/ueKKzuV7\n5ZWxz084IfJJ/17feWfs348+avmen/40/kdtHTQLoanJfYMNosYjV2edFbUA6ftmzJgo/5gx0cdh\n662zBw3u0Uy5/vqtA5hEMjpkxx2jfT7dlCkR5PTqFc/9+rUOXnLV1BS/x802a766//TTuDA455wI\nQDONHBkn+8zPt9NOzbUco0bFMSTbPkj6Je29d+f6Jc2bF8cfiP5iHQWL2Sxf7r7llpFH4sknI9/n\nnssvr6VLIwC64Yb8y9HQ4F80U6V7+OFYfsYZnQv4582L31q2GlUFHgo83D2+gL17u994Y3Nnxscf\nbzvtzTdH2sGDO+4LkvjTn+KLuPHGkfd997VOc8452Ws7EmefHZ038/H738c2X3ml/XTvvBPpHnww\nv/zd4woK4nO6Rw3F6qu3PriddVZ8xnXXza125aKL3NdZp3W19uGHR7VyLhYsiBNvZvPV/fdHmWfO\nbP2eLbfMv4NsUuXd2f4Yf/pTvP/f/2697phj4oo/36CmqSmCzqOPjhPbppu27LS8775xJZ1pwgT/\nokNvMbz5ZnPAkKtkKHTSD6OpKfoUfe1r8fcrr8T6u+/OnkcSaLXXl+PFFyPAB/eDD47mj0cfjcDy\nS1+K9bNnx/f7jDNyL3+60aPz//xJE9jLLzcv++STln2AkpN3WzUQS5ZEUHL44XH8Ov74/L5Pf/97\nHL/6948hol11001RM5qc2I86KgLHzgTuu+wSv5F8Jd/zzNqSxsY4Xqy5ZpSxtrbj42e6a6+N2vP0\nob/pFHisgIHH22/Hl7SznSzbkpw0PvggvnR9+2aPoH/84+grcemlbZ9c0y1fHh0PIb68CxdGlN+v\nX1S1JmbMiKv4bBFyImkCaOtk2ZbPPosry+9/P7f0O+7Y8iok
V8mJJDlpJv1R/vWv5jRTpsSP8Yor\nomo9lzkqBg9uu4bn3ns95+aH0aPj4Jw5giXp15I5CuDDDzt/0j3xxAgw77gjDmrtXX2nW7AgOuft\nt1/b65MTSvpJJxdJNXSyn5OOq7/9bVyt9+zpfsstbb+3pqbtqv1CuOGGuEpdtCj39yxaFL/LUaPi\ndTJi4K9/bU7z3e9Gs0Nbv8mmpuhPsv/+HW+rqSl+a5tuGjUcEB0F05vSrroq1uXT5OcefTH6948T\nfz6WLo3+D+lNPEmn06RjcNLRtK2+MM8951/UHt5zT/yd3s+lPVdfHen33LN17VhnffZZnNjPOSf2\na2aTXz5OPTW/GspE8nvI7MifXsYbb4yLvaSm6L//bT/PZcuin2B7fW0UeKyAgcfNN3unr8yzOe64\nlh1KBw2Ktsu27L57XK0nvbOzTV7T0BAjI3r0iAg4ieQbGuLg+JWvxFWIewyXW3311r35MyUnxbaq\nYtty2WVxsv/gg9zSX3FF1A7kc0Jwb26eSgKB5cuj01z6ZF/DhkVNx6efxr4YOjQO7NkmE5o507MO\nc2xoiM+VnITaM2RIXNlnamyMff6LX7RcfscdEah0pr/GZ59Fp7mkKW6VVeKq/LzzsveSX7w4Duir\nr559mPDy5e4bbpi9mS6bESOixij9RHzSSXECO//8CDyyNReMGhX7eN68+H+98050lDv33OgP0RX7\n7x8dXvO1777xvXGPZoOampZXyP/9b/zv2uoU/Pe/e5tDG9uzeHGcDG+6qfWV+KJFUQOQ7zD07343\nOmR25vu1//4tJ7678MLo6J6UranJfe212w4orr46/u/LlsXryy/P/vtKl9QSXXRR4Uc6XXhhXIRd\ncUUEcbleUGVKLi4yO6XW10eN32uvtf2+pGmto4vY5cujhnTgwDien3Za9v9fUnOZbZvuCjxWyMAj\nqUE44IDC5Ld8eRycL7igedn3vhcHtkzpP+zGxjiRpr8v3bBhcTL5299ar/vPf6K6c/jw5tqOiy/u\nuKxNTXHQymWekc8/j+0PH95x2sSkSd6pq/3kh58EUu5Rc5KMj580KU5y6bVI770XV7Dnntt2nkmt\nSLbqyoMOartnfLo33vB2R+t861utTxzHHttylEdnzJsXHWivuy5qm/r1iwNg5tXx0qVRnd+3b8ed\n49oKItrT2Bgnxh//uOXyTz5pbvL7zneyv//jj+Mgu8su0S8C4n+89tpxkvjJTzo34+eSJRHc5tNH\nJ5HUlCSBblvDoWtr4/Ml/VMWLoxagjXWaB2odFVyUn7hhdzSJx2jOzt67Prr47uSfLZdd23d3LvH\nHm0HQwccEB1ZE01NUUPXq1f2i7innor1w4YVZ0bUadPiONizZ9dq15IZU9NrWN2jiRFi7p62XH55\nHMNztWRJ/Kb79YsLq1tuiePT9Onxe/n447jQ2XPP9vNR4LECBh7HHht7tkeP+OJ2VTLiIb1T089+\nFk0UmZJhlskP9Zhj2h7a+dFH8YO65prs2x01KvLabbcIEHK9AjrkkNyuFpODc0dVg5l23DGasvJx\n6aUREKVLhrnNmRP5pZ8MEldfHQedtqbDPvbY9ofN3nln5N/eyKIzz4wDS3pAlO7ss6M/R6KpKTqc\n5tPpMRcTJ0ZTypprNo9YaWyMz9irV27ToifTQ+da2/Xss561z0jSX6KjtvpTTong7oILoozz58e+\nTEaGfOlL2afHzibpnN2Zw8fEifHeddeNWsm2rsAnTYpjw/XXR5PSBhvEb/H00zt/RZ3N8uVRjl13\n7fjEvHBh/EYOPbTzJ/HXX4/P/49/xG+pT5/WJ9XTT2/d7NDYGIFX5vw3S5e6/+//Rp7HHtuyxvX1\n1+ME+53vFHaCvEzJ8Ty9ySxfSQfT9H2R9Inp3Ttq+dpy8skxeWK+pk+PztpJzWbmo6PfhAKPFTDw\n2HPP+DH07du5q6ZMF14YV5Lp7fFJG2hm9XjSrpxM5PPHP8brzADorLPih97e2PKmphi2BbnVdiSu\nuioOCB31H/jxj6PXfL4Huc40t5x0UuvhZ1On+hdVtBCdXDMtXRoH7sGDW1aTNjZGFXJ7M2jOnx9V\nx9lqnBYtihP9iBHZ80hmrUyGFyY1Ptk6FnfFvHnRVNCzZxwgzzgjAqe2Ohpns9NOLSc7as+PftR6\nJE+6Dz/s2lXs++83z5Z5+OG5z6Pw05+23WE4F01N8Zk6qjVITgpmcXFQqDll2pIE+P/3f+2nS0ZK\ndDTBVnuSWtaLLmoe8ZV5GE4m3EtvPkgClmeeaZ1nU1P8Dvr3jyDt0Ufjt7vRRnFSLvR9lzK99178\nFpImoM766lcjiHGPY+POO8eyo47KPhPqAQdEzWlnvfFG7K/k8de/Rk1nR78rBR4rYOCx1VZRPX/0\n0XEVmc/Bs620O+wQcwmk+89//IuOWOmS2UOTH8ns2a0niZk5M7eOou5RG/DTn+Y3rv7pp6Ns7bWz\nNzVFDUP6xDi5Sk6+l1+e+77db7+oicm03XaR11ZbZb9qevHFOFD26BFByEknxZUZdDzfxwUXRPAx\nZ07rdUnwmHS8a0vSFJMckJMZMot1sF22LL67yZVRvpMU3XhjlK+jochLlkTQVcgZVdvS1BRt3/36\nRVCUy7w2u+zSuQ7MieHDY1vtBS7TpsV3//XXO7+dfOyzT4zIaK9MJ50Ux6uuOvLIGMlz5ZVtX4Ak\nnUjT+xjceGNc+bd3MVFf3xxIrrNO1GYVoka5VE45JZoz3WMmVYhjy5VXxkVgW8eyHXeMjqmlVtWB\nB/Ah0JT2aAR+kpFmE+AxYCEwA7gW6NFBvmULPJqaoqbjV79q7iyW67jvurp470knNXfiSzprZk48\nlIzv/sMfWi4/+eQIVNINGRLVlYmLLooag7ZOhoWwYEGcpNsbtjl+vOfdkS7deef5Fz35Z8/uOP0O\nO7Q9/DSZbClznHymd9+NGpFhw6KTr1mcOLM1kSRmzYoOnG3VGH3rW23300m3bFnLEUzf+15+94no\nrPvvj6aifH3ySXze9AnV2pJMRNbVTqC5+u9/oyf/Rhu136lu3ryOv7sdWb684+9FqbXXrOUeAckG\nG2Tvz5SP3/8+9mG2TtPJ0PZ77mledsQRuX2vm5riImqXXWKk2ook2S9Tp0Y/pGTU0P/9n7dZK+0e\n6a68srTldK/+wOMD4CJgXWC91GPltPU9gDeAMcAOwL7ALOCKDvItW+AxZ45/0cci35nuhgyJmykl\nHeX23TdOdL16tT3iYIMNWt+CfdddW/d/uPTSqKZctiyq/1dfvfB9BDINGtT+zJIjR0aZutI2+/DD\n8cPccMOOax7WWqv16BD3aJI677z8q9UXLMh9cqJkRFD6/Cdvv+059WFwj4PsscdGGddZp/X/vNIM\nGxYn+faa2o48snWAXGzTpkXnzdVWy95nJelc2d7N+VZESUfebHO/JDWohZiGPpk+HrJPKvelLzU3\nUyaTtZXrxn+lklx
sDRkSNUFJp/RkfqL0WYvdo/Ynl1E9xVDswKOUN8LN5jN3n+3us1KPxWnr9gW2\nBo529zfcfQxwMXCamfUqS2k7kNx6feON4zbDxx8Pf/oTLFzY/vvefDNuKX/11XEb7Lvvhlmz4jbp\ne+wRt53ONHAgTJrU/Nod3noLttuuZbr99otbPb/wAtxyCyxZAuec05VP2bGvfS0+TzaPPBLl6t27\n89s4+GB4/fXYD3vtBZde2na6RYviNtYbb9x63dZbwy9/mf8tofv1i1th5+K882DpUvjNb5qXjR4N\na60Fhx3W8ft33hlefTW+I3PmwLe/nV9ZS+2UU+I2248/3vb6Tz+N///RR5e2XBtuCGPHwp57wkEH\nwT33tE7z1FPw5S/Dl75U2rIVW48ecOSR8MADsHx56/V//SussQZ885td39bmm8cDYLfd2k6z/fbx\nfQaYPBmmT8+etlpstx306RPHxZ//PG51D7DFFtC3bxy7002bFs9tHbdWdJUQeFxgZnPMbLyZnWdm\nPdPWDQHecPc5acvGAP2BjNNrZZg6NZ6TL8txx8Fnn8GDD7b/vttug3XXhQMPjC/nscfCuHERLIwe\n3fZ7tt66ZeAxYwbMnx8/6nSDB0fef/4z3HAD/PCHsMEGnft8uRoyJH5IDQ2t19XXw/jxETh01UYb\nxclixAgYORI++qh1mnL/gNdfH04+Ofb9ggUR+N15J/zgB3HA6UhNDUycCI89BiutBF//etGL3CWD\nB8NXvgK/+13b62+/HRYvhqOOKm25AFZdFR56KH6XJ54YgUi6p56CffYpfblK4aij4mLmmWdar3v0\n0a5fCKTbe+84ju2yS9vrt98e3ngj/h47FswKE/RUsj59YKedIrA988zm5T17xrF8woSW6dMvYqtN\nuQOPXwNHAXsAvyOaXa5JW78+MDPjPTPT1lWc+vr4IiXR7OabxxXqHXdkf8+SJVHDcdxx8eVMmMUJ\nfNNN237fwIHw7rvQ1BSvk4g5s8ajRw/Yd1+48cYITEaM6Nxny8c++8Rnueaa1usefRR69YLvfKcw\n2+rZs/mH/OqrrdeXO/CA2OeLFsHNN8PDD0fNxUkn5fbenXeGxsZ47ze/mVuwUm6nngp/+xt88EHL\n5W++CRdcEOuzfa+LrWdP+H//L66wDzsM3n8/lk+eHI+hQ8tTrmIbPBi23BLq6lounzYtLgQOPLBw\n27rgArj33uzf1R12iIuEBQvgX/+CQYOixqXa3X03PPFEy+M8xDE7s8YjCTw22qg0ZSulggceZnaV\nmTW182g0sy8DuPuv3H2su7/p7rcC5wBnmFmB4u7Sq6+PKt2eafU2J5wAzz7bfIDL9PDD0RRw4on5\nbWvgwLhyTGpZ3norfuhbbNE67X77RYBy9NHN1aDFtMkmcfC57jp4++2W6x55BL71LVhzzcJtb8MN\nYb314gCaqRJ+wBttBMOGwfXXR5PLN78J226b23t32CG+T9OmVX4zS+LII6N58Pe/b16W1HJsuWXs\nh3Lq3TuaHdZcM5pdGhqitqNnz2jarEZmUFsLf/lLXOwkHnssPnehLgQgjkGHH559fVIr++abEXhU\nezNLYuDAto+/224bx+/oohjq6+P7ueqqpStfqRSjn8R1QDvX9wBMzrL8ZaJMmwHvEqNYvpqRJmlZ\nn9FRQYYPH07//v1bLKutraW2trajt3ZafX3rK+vDDoPTTosmk6uuav2e0aNh112jui0fAwfG89tv\nx9XjW29FHulBT2L//ePAcvHF+W2jK84/P9rRTz89Dupm0b7/9NNw7bWF3ZZZNElkCzzWXBNWWaWw\n28zX+efHifi556KpJVd9+8aB6Y03VpzAY9VVoynpttuiCaxPn+jr8v778J//wMorl7uE0cfmr3+N\n/khHHhn7eciQtvtTVYujjoIrroAnn4yAC6IG8pvfjP1RKslx6qmn4L33uk/gkc1220Xw+/HHzRdI\nbZ1LiqGuro66jGqwhrbayAupGD1WO/sAjgaWAf1Tr7+Ter1OWpofAfOB3u3kU7ZRLd/+dtvTAf/k\nJzE6JXMT7DviAAAgAElEQVT4aNIDvDNDF5cvj9nwfv3reP2Nb8TcIZXkscfi8yWTUCWjBooxYdKF\nF8YIl0ynnVb6ERTZnHxyDMNduDC/9x13XIzGKOYMjYWW3Jjvvvti6nDIfsO3cnrqqZinBVrfIbga\nbb99840NFy6M4dqdvflZV2y9dQxvho7nfal2yQ0h00e2HHJI9psyFlvVjmoxsyFmdpaZDTKzzc3s\naGAUcI+7J+HWk8AE4J5Uun2By4Gb3H1ZmYrervr6aGbIdMUVcbV6+OHRUTBxxx1xhfW//5v/tnr2\nhP/5n+hgmm1ES7ntvz8cemiMoklGM2y/fdvNQV1VUxNXDDMy6sKmTaucDlo33BD9UPKtfRkxAu66\nq3Cd/0phu+2iSe2aa6IZ8dBDY8RLpdl77+j/1KMHHHBAuUtTfLW10by7cGHUPn7+eXPtRyltv338\nNrfaqvid3Svd5pu3HtlSScetQitn59IlRMfSZ4A3gQuB64GTkwTu3gQcSEws9jxwN3AncElpi5ob\n9+zVY717w/33x7oDDoje5Y2N0cO/trbz7XjJkNqPP46qukoLPAB+9avo1Przn0d7ciFGs7Slpiae\nMzuY1tdXTgetlVfuXKfK7bbLbehtpTnllOZA67bbokmsEp16avwmv/KVcpek+I48Mjo6P/ZYNLNs\ntVWMtCi1HXaI5+7ezALNI1vSA49SNbWUQ9kCD3d/1d2/7u5rufuq7r69u1+bWZPh7lPd/UB3X83d\nB7j7+amApOLMnx8/6Gxflv7948e+aFFc/T38cES1w4Z1fptJ4JFtREsl2HRT+NnPIgCZO7d4gcfm\nm8c+zuznUc0/4Ep32GFwzDERdJeyD0FnrL12uUtQGltuCV/9aow6+etfo7ajHAFh0sFUgUdIH9my\ndCnMnFm9x61yD6etKrmMu95002hueO01+P73YccdY5hbZw0cGNt96aW4mi7FiJXOOPfcuKoaMCAO\nesXQVgfTZcuq+wdc6VZaKToYV/rcI91N0tzy8cflaWaB6FC/++4x4k4i8JgwIWrOp0+P52o9binw\nKKAk8Girj0e6XXaJg/GSJTGxVFeuNpKRLQ89BNtsk/8MnKWy0kpxoHvwweKWMTPwSH7AldLUIlIJ\njjgijjv9+0cAUA7rrReTma1fkTMylV76yJZqnjwMijOcttuqr4+Tai4/pMMPhw8/7PrUzEng8eqr\nMdtpJct3uHBn1NTEHBHz5kXVfrX/gEU6Y6ONYqK0DTdcsTosV7NkXp+33opme6je45YCjwKaOjV6\nZ/fKca8WYubGNdaI5ouZMyuzf0eppXcw3WsvBR4i2TzySOXWkHZH6SNbmpriflDVOqeMvnYFlG0o\nbbEltR4KPGJ48aqrNje31NfH64x55ES6vT59cr9IkuLr2TOayydMqP4O8Qo8
CqhcXxYFHs169owb\nMSWBx7RpUa1cqcM4RUQSycgWBR6Ss3J9WQYPjo5a5brpVqVJ72Ba7T9gEakeyT1bpk6t7uOWAo8C\ncS/fl2XYsLhfi9prQ01N3LX3008VeIjIimO77eKOva+/Xt3HLZ2qCqShIaYgLkcfj549C3un1xVd\nTU0Egq+/Xt3TDotIdUmayz//vLqPWwo8CkSjJyrHNtvEvCGvvNLcx0NEpNJttlnznZur+bilwCNP\nt98Ov/hF6+UKPCpH794waBA88QQsX67/iYisGJJ7tkB1H7cUeOTprrvgqqti1tF0U6fGyInufpfF\nSlFTA//8Z/xdzT9gEakuSXNLNR+3FHjkaeJE+OyzmOo3XX19BB2aBbAy1NTEjZagun/AIlJddt45\n5h2q9JsqdoUCjzzMnh0PiPuOpNPoicqSzGDauzess055yyIikqsf/xhefrm65x5S4JGHiRPjeZ99\nYrph9+Z1Cjwqy/bbx6yMG22kYcYisuLo2zfu5F3NdEjOw4QJcTI799wYLZF+F9Rqn/BlRdO3b7SV\n6n8iIlJZNFN/HiZMiHuBfPvbcXO2Rx6JWUOhfPdpkezOPrvcJRARkUyq8cjDhAkxpW3v3nDAAc39\nPBYsiFkydXVdWY4/Ph4iIlI5FHjkIQk8AA4+OGbGnDJFc3iIiIjkSoFHjubPh+nTmwOP73wnaj4e\neST6d4CaWkRERDqiwCNHyYiWJPBYffXo6/Hww1HjocnDREREOqbAI0cTJsSwzPRhTgcfDM8+C2++\nCQMGQJ8+5SufiIjIikCBR44mTIAtt4xhmomDD457gdxzj/p3iIiI5EKBR47SO5YmNt44ZsicO1f9\nO0RERHJRtMDDzC4ys+fMbKGZzcuSZhMzeyyVZoaZXWtmPTLSDDKzsWa22MymmNmIYpW5PW0FHgCH\nHBLPqvEQERHpWDFrPHoD9wO/bWtlKsB4nJjEbAhwHHA8cFlamn7AGOADoAYYAYw0s2FFLHcrCxbE\nyJW2Ao+DD45nBR4iIiIdK9rMpe5+KYCZHZclyb7A1sCe7j4HeMPMLgauNrOR7r4cOIYIYE5MvZ5o\nZjsD5wCji1X2TG+/Hc9tBR477gjDh8OBB5aqNCIiIiuucvbxGAK8kQo6EmOA/sB2aWnGpoKO9DQD\nzax/aYoZzSxmsPXWrdeZwahRbQclIiIi0lI5A4/1gZkZy2amrcs1TdFNmACbbQarrFKqLYqIiFSn\nvJpazOwq4Px2kjiwjbu/06VSFcjw4cPp379lxUhtbS21tbV55ZOtY6mIiMiKrK6ujrq6uhbLGhoa\nirrNfPt4XAfc0UGayTnmNQP4asayAWnrkucBHaTJ6oYbbqCmpibH4mQ3YQL87/92ORsREZGK0tbF\n+Pjx4xmc3Hq9CPIKPNx9LjC3QNt+AbjIzNZJ6+cxFGgAJqSlucLMerp7Y1qaSe5e3JAsZeFC+PBD\n1XiIiIgUQjHn8djEzHYENgV6mtmOqceqqSRPEgHGPam5OvYFLgducvdlqTT3AkuB281sWzM7EjgT\nuL5Y5c40aRK4K/AQEREphKINpyXm4/hB2uvxqec9iZEqTWZ2IDHPx/PAQuBO4JLkDe6+wMyGAjcD\nrwBzgJHuflsxCjxhAmy6Kay6astlANtsU4wtioiIdC/FnMfjBOCEDtJMBdqdAcPd3wR2L2DRstp1\nVxg4EJ58Evr1i2UTJsR06MlrERER6TzdqyVl4UKYPx9efDFmI120KJZrRIuIiEjhKPBImT07nn/+\nc3j5ZTjsMFiyRIGHiIhIIRWzj8cKZdaseP7ud2H33WH//WMI7fvvK/AQEREpFAUeKUmNx3rrwU47\nwV/+AoceCk1NCjxEREQKRU0tKUmNxzrrxPP++8P998Nuu8GgQeUrl4iISDVR4JEyaxasuSb06dO8\n7NBDYexYWG218pVLRESkmijwSJk1K5pZREREpHgUeKTMng3rrlvuUoiIiFQ3BR4pqvEQEREpPgUe\nKQo8REREik+BR4oCDxERkeJT4EHcfVZ9PERERIpPgQewYAEsXaoaDxERkWJT4EHz5GEKPERERIpL\ngQcKPEREREpFgQfNgYf6eIiIiBSXAg+iY2mPHrDWWuUuiYiISHVT4EHUeKyzDvTsWe6SiIiIVDcF\nHmgODxERkVJR4IECDxERkVJR4IEmDxMRESkVBR6oxkNERKRUFHigwENERKRUihZ4mNlFZvacmS00\ns3lZ0jRlPBrN7IiMNIPMbKyZLTazKWY2opDlbGyEOXMUeIiIiJRCryLm3Ru4H3gB+GE76Y4DngAs\n9fqTZIWZ9QPGAE8CJwM7AHeY2Xx3H12IQs6bB01N6uMhIiJSCkULPNz9UgAzO66DpA3uPjvLumOI\nAOZEd18OTDSznYFzgIIEHrNTW1aNh4iISPFVQh+Pm81stpm9ZGYnZKwbAoxNBR2JMcBAM+tfiI3r\nPi0iIiKlU8ymllxcDDwNLAKGAreY2aruflNq/frA5Iz3zExb19DVAijwEBERKZ28Ag8zuwo4v50k\nDmzj7u/kkp+7X5n28nUzWxUYAdyU5S0FN2sW9OkDq69eqi2KiIh0X/nWeFwH3NFBmswainy8DFxs\nZr3dfRkwAxiQkSZ5PaOjzIYPH07//i1bZGpra6mtrf3idTJ5mFnmu0VERKpbXV0ddXV1LZY1NHS5\nMaFdeQUe7j4XmFuksgDsDMxPBR0QI2KuMLOe7t6YWjYUmOTuHe6ZG264gZqamnbTaA4PERHprjIv\nxgHGjx/P4MGDi7bNovXxMLNNgLWATYGeZrZjatV77r7QzA4kai9eBD4nAooLgWvTsrkX+Dlwu5ld\nQwynPRM4q1DlVOAhIiJSOsXsXHoZ8IO01+NTz3sCY4FlwGnAKGIOj/eAs9Pn53D3BWY2FLgZeAWY\nA4x099sKVchZs2DzzQuVm4iIiLSnmPN4nABkDo9NXz+GGBrbUT5vArsXsGgtzJoFu+xSrNxFREQk\nXSXM41FWs2erqUVERKRUunXgsXQpzJ+vwENERKRUunXgMWdOPCvwEBERKY1uHXgks5bqBnEiIiKl\n0a0DD90gTkREpLS6deChGg8REZHS6vaBx6qrxkNERESKr9sHHmpmERERKZ1uHXgkN4gTERGR0ujW\ngYdqPEREREpLgYcCDxERkZJR4KHAQ0REpGS6feChPh4iIiKl020Dj0WLYOFC1XiIiIiUUrcNPDRr\nqYiISOl128AjmbVUgYeIiEjpdPvAQ308RERESqfbBh5JU4sCDxERkdLptoHHrFmwxhrQp0+5SyIi\nItJ9dOvAQ/07RERESqvbBh7Tp8OAAeUuhYiISPfSbQOP+nrYZJNyl0JERKR76baBx7RpsNFG5S6F\niIhI99ItAw/3qPHYeONyl0RERKR7KUrgYWabmtloM5tsZovM7F0zG2lmvTPSbWJmj5nZQjObYWbX\nmlmPjDSDzGysmS02sylmNqKr5Zs
7F5YsUeAhIiJSar2KlO/WgAEnAe8D2wOjgVWAnwCkAozHgY+B\nIcCGwD3AUuBnqTT9gDHAk8DJwA7AHWY2391Hd7Zw9fXxrMBDRESktIoSeLj7GCJgSHxoZtcBp5AK\nPIB9iQBlT3efA7xhZhcDV5vZSHdfDhwD9AZOTL2eaGY7A+cQgUynTJsWz+rjISIiUlql7OOxBjAv\n7fUQ4I1U0JEYA/QHtktLMzYVdKSnGWhm/TtbkPp66NkT1l+/szmIiIhIZ5Qk8DCzrYDTgd+lLV4f\nmJmRdGbaulzT5K2+HjbYIIIPERERKZ28mlrM7Crg/HaSOLCNu7+T9p6NgL8Bf3L32ztVyk4aPnw4\n/fu3rBipra2lvr5W/TtERKTbq6uro66ursWyhoaGom7T3D33xGZrA2t3kGxy0jRiZhsC/wSed/cT\nMvK6FDjI3WvSlm0GTAZ2dvfXzewuoJ+7H5aWZg/gH8Ba7t7m3jGzGmDcuHHjqKmpabV+6FBYfXV4\n8MEOPomIiEg3M378eAYPHgww2N3HFzr/vGo83H0uMDeXtKmajqeB/wA/bCPJC8BFZrZOWj+PoUAD\nMCEtzRVm1tPdG9PSTMoWdOSivj6CDxERESmtYs3jsSHwDDCFGMWynpkNMLP0u6M8SQQY96Tm6tgX\nuBy4yd2XpdLcSwyvvd3MtjWzI4Ezgeu7Uj5NHiYiIlIexZrHYx9gi9RjamqZEX1AegK4e5OZHQj8\nFngeWAjcCVySZOLuC8xsKHAz8AowBxjp7rd1tmALFsCnnyrwEBERKYdizeNxF3BXDummAgd2kOZN\nYPcCFU1zeIiIiJRRt7tXi2YtFRERKZ9uG3hsuGF5yyEiItIddcvAY731YKWVyl0SERGR7qfbBR7T\npql/h4iISLl0u8BDQ2lFRETKR4GHiIiIlIwCDxERESmZbhV4fP45zJ2rPh4iIiLl0q0Cj2TyMNV4\niIiIlEe3Cjw0eZiIiEh5dcvAQ00tIiIi5dGtAo9p06B/f1httXKXREREpHvqVoGHRrSIiIiUlwIP\nERERKRkFHiIiIlIy3SrwmDZNgYeIiEg5dZvAY9kymD5dI1pERETKqdsEHjNmgLtqPERERMqp2wQe\nmjxMRESk/BR4iIiISMl0m8Bj2jRYeWVYY41yl0RERKT76jaBRzKU1qzcJREREem+ul3gISIiIuVT\nlMDDzDY1s9FmNtnMFpnZu2Y20sx6Z6Rryng0mtkRGWkGmdlYM1tsZlPMbERnyqTAQ0REpPx6FSnf\nrQEDTgLeB7YHRgOrAD/JSHsc8EQqPcAnyQoz6weMAZ4ETgZ2AO4ws/nuPjqfAk2bBrvtlv8HERER\nkcIpSuDh7mOIgCHxoZldB5xC68Cjwd1nZ8nqGKA3cKK7LwcmmtnOwDlEIJOTpibNWioiIlIJStnH\nYw1gXhvLbzaz2Wb2kpmdkLFuCDA2FXQkxgADzax/rhuePTtmLlXgISIiUl7Famppwcy2Ak4nairS\nXQw8DSwChgK3mNmq7n5Tav36wOSM98xMW9eQy/aTOTw0XbqIiEh55RV4mNlVwPntJHFgG3d/J+09\nGwF/A/7k7re3SOx+ZdrL181sVWAEcBMF9Emq18jaaxcyVxEREclXvjUe1wF3dJDmixoKM9uQqNH4\nt7ufnEP+LwMXm1lvd18GzAAGZKRJXs/oKLPhw4fTv39/ZqbqSE4+GU44oZba2tociiIiIlLd6urq\nqKura7GsoSGnxoROM3cvTsZR0/E08B/gWM9hQ2b2U2C4u6+Ten0KcAUwwN0bU8t+ARzq7tu2k08N\nMG7cuHHU1NTwwANwxBEwf75mLhUREWnP+PHjGTx4MMBgdx9f6PyL0scjVdPxDPABMYplPUtNGeru\nM1NpDiRqL14EPif6eFwIXJuW1b3Az4HbzewaYjjtmcBZ+ZRn8eJ4Xnnlzn0eERERKYxidS7dB9gi\n9ZiaWmZEH5CeqdfLgNOAUal17wFnp8/P4e4LzGwocDPwCjAHGOnut+VTmMWLY6r0Pn06/4FERESk\n64o1j8ddwF0dpMmc6yNbujeB3btSnsWLo7ZD92kREREpr25xr5Yk8BAREZHyUuAhIiIiJaPAQ0RE\nREpGgYeIiIiUTLcIPD7/XIGHiIhIJegWgYdqPERERCqDAg8REREpmW4TePTtW+5SiIiISLcJPFTj\nISIiUn4KPERERKRkFHiIiIhIySjwEBERkZJR4CEiIiIlo8BDRERESkaBh4iIiJRM1Qce7go8RERE\nKkXVBx5Ll8azAg8REZHyq/rAY/HieFbgISIiUn4KPERERKRkuk3goXu1iIiIlF+3CTxU4yEiIlJ+\nCjxERESkZBR4iIiISMko8BAREZGSKVrgYWYPm9kUM1tsZh+b2d1mtkFGmk3M7DEzW2hmM8zsWjPr\nkZFmkJmNTeUzxcxG5FMOBR4iIiKVo5g1Hk8D3wO+DBwGbAk8kKxMBRiPA72AIcBxwPHAZWlp+gFj\ngA+AGmAEMNLMhuVaCAUeIiIilaNXsTJ291+nvZxqZlcDD5lZT3dvBPYFtgb2dPc5wBtmdjFwtZmN\ndPflwDFAb+DE1OuJZrYzcA4wOpdyKPAQERGpHCXp42FmawFHA8+lgg6IWo43UkFHYgzQH9guLc3Y\nVNCRnmagmfXPZduLF0OvXvEQERGR8ipq4GFmV5vZZ8AcYBPg0LTV6wMzM94yM21drmna9fnnqu0Q\nERGpFHnVA5jZVcD57SRxYBt3fyf1+lqiSWRT4BLgHuDATpSzU4YPH86sWf1ZsgQOPjiW1dbWUltb\nW6oiiIiIVKy6ujrq6upaLGtoaCjqNs3dc09stjawdgfJJmc0jSTv3QiYCnzd3V8ys0uBg9y9Ji3N\nZsBkYGd3f93M7gL6ufthaWn2AP4BrOXube4dM6sBxo0bN46//KWGe+6BKVNy/pgiIiLd1vjx4xk8\neDDAYHcfX+j886rxcPe5wNxObqtn6nml1PMLwEVmtk5aP4+hQAMwIS3NFWkdUpM0k7IFHZkWL1ZT\ni4iISKUoSh8PM9vFzE4zsx3N7Etm9m3gXuBdIpgAeJIIMO5JzdWxL3A5cJO7L0uluRdYCtxuZtua\n2ZHAmcD1uZZFgYeIiEjlKFbn0kXE3B1/B94Gfg+8BuyRBBXu3kT092gEngfuBu4k+oKQSrOAqOHY\nDHgF+CUw0t1vy7UgCjxEREQqR1EGmbr7m8BeOaSbSgedTVN57d7ZsijwEBERqRzd4l4tCjxEREQq\ngwIPERERKRkFHiIiIlIyCjxERESkZBR4iIiISMlUfeChe7WIiIhUjqoPPFTjISIiUjm6ReDRt2+5\nSyEiIiLQTQIP1XiIiIhUhqoOPJqaYMkSBR4iIiKVoqoDjyVL4lmBh4iISGVQ4CEiIiIlo8BDRERE\nSkaBh4iIiJSMAg8REREpmaoOPD7/PJ4VeIiIiFSGqg48VOMhIiJSWao68Fi6NJ4VeIiIiFSG
qg48\n1NQiIiJSWao68EiaWnSvFhERkcrQLQIP1XiIiIhUhqoPPPr0gR5V/SlFRERWHFV9Sv78c9V2iIiI\nVJKiBR5m9rCZTTGzxWb2sZndbWYbZKRpyng0mtkRGWkGmdnYVD5TzGxErmXQnWlFREQqSzFrPJ4G\nvgd8GTgM2BJ4oI10xwEDgPWBDYD/S1aYWT9gDPABUAOMAEaa2bBcCqDAQ0REpLL0KlbG7v7rtJdT\nzexq4CEz6+nujWnrGtx9dpZsjgF6Aye6+3JgopntDJwDjO6oDAo8REREKktJ+niY2VrA0cBzGUEH\nwM1mNtvMXjKzEzLWDQHGpoKOxBhgoJn172i7CjxEREQqS1EDDzO72sw+A+YAmwCHZiS5GDgC2Bt4\nELjFzE5PW78+MDPjPTPT1rVLnUtFREQqS16Bh5ld1UaH0MzOoV9Oe8u1wE7APkAjcE96fu5+pbu/\n4O6vu/svgWuIfhwFoRoPERGRypJvH4/rgDs6SDM5+cPd5wHzgPfM7G2ir8fX3P2lLO99GbjYzHq7\n+zJgBtHxNF3yekZHhX311eH07t2fgw9uXlZbW0ttbW1HbxUREal6dXV11NXVtVjW0NBQ1G3mFXi4\n+1xgbie31TP1vFI7aXYG5qeCDoAXgCsyOqQOBSa5e4d7Zsstb2CLLWq4775OllhERKSKtXUxPn78\neAYPHly0bRZlVIuZ7QJ8Ffg3MB/YCrgMeJcIJjCzA4naixeBz4mA4kKieSZxL/Bz4HYzuwbYATgT\nOCuXcixZovu0iIiIVJJiDaddRMzdMRJYFZgO/A24Mq02YxlwGjAKMOA94Gx3/2KYrLsvMLOhwM3A\nK0Qn1ZHuflsuhVAfDxERkcpSlMDD3d8E9uogzRhiaGwuee3emXIo8BAREaksuleLiIiIlExVBx6q\n8RAREaksCjxERESkZBR4iIiISMlUdeDR2KjAQ0REpJJUdeABCjxEREQqiQIPERERKRkFHiIiIlIy\nCjxERESkZKo+8NC9WkRERCpH1QceqvEQERGpHAo8REREpGQUeIiIiEjJKPAQERGRklHgISIiIiVT\n9YHHSiuVuwQiIiKSqOrAY6WVwKzcpRAREZFE1QceIiIiUjkUeIiIiEjJKPAQERGRkqnqwEPTpYuI\niFSWqg48+vQpdwlEREQkXVUHHmpqERERqSxVHXioqUVERKSyFD3wMLM+ZvaamTWZ2aCMdZuY2WNm\nttDMZpjZtWbWIyPNIDMba2aLzWyKmY3Idduq8RAREakspajxuBaoBzx9YSrAeBzoBQwBjgOOBy5L\nS9MPGAN8ANQAI4CRZjYslw0r8BAREaksRQ08zGw/YB/gPCBzDtF9ga2Bo939DXcfA1wMnGZmvVJp\njgF6Aye6+0R3vx/4DXBOLttX4CEiIlJZihZ4mNkA4FYieFjcRpIhwBvuPidt2RigP7BdWpqx7r48\nI81AM+vfURnUx0NERKSyFLPG4w7gFnd/Ncv69YGZGctmpq3LNU1WqvEQERGpLL06TtLMzK4Czm8n\niQPbAN8BVgOuSd7aqdJ10bPPDufgg1tWjNTW1lJbW1uO4oiIiFSUuro66urqWixraGgo6jbzCjyA\n64iajPZ8AOwJfB1YYi1vD/uKmf3R3U8AZgBfzXjvgNTzjLTnAR2kyeqgg27gpptqOkomIiLSLbV1\nMT5+/HgGDx5ctG3mFXi4+1xgbkfpzOwM4KdpizYk+mYcAbycWvYCcJGZrZPWz2Mo0ABMSEtzhZn1\ndPfGtDST3L3DkExNLSIiIpWlKH083L3e3SckD+Bdorllsrt/nEr2JBFg3JOaq2Nf4HLgJndflkpz\nL7AUuN3MtjWzI4EzgetzKYcCDxERkcpSyplLW8zj4e5NwIFAI/A8cDdwJ3BJWpoFRA3HZsArwC+B\nke5+Wy4bVOAhIiJSWfLt49Ep7j4F6NnG8qlE8NHee98Edu/MdhV4iIiIVBbdq0VERERKpqoDD9V4\niIiIVBYFHiIiIlIyCjxERESkZKo68FAfDxERkcpS1YGHajxEREQqiwIPERERKRkFHiIiIlIyVR14\nqI+HiIhIZanqwKNnq7lSRUREpJyqOvAwK3cJREREJF1VBx4iIiJSWRR4iIiISMko8BAREZGSUeAh\nIiIiJaPAQ0REREpGgYeIiIiUjAIPERERKRkFHiIiIlIyCjxERESkZBR4iIiISMko8BAREZGSUeAh\nBVNXV1fuInQ72uelp31eetrn1aXogYeZ9TGz18ysycwGZaxryng0mtkRGWkGmdlYM1tsZlPMbESx\nyyydo4ND6Wmfl572eelpn1eXXiXYxrVAPbBDlvXHAU8Ayb1kP0lWmFk/YAzwJHByKo87zGy+u48u\nWolFRESkKIoaeJjZfsA+wOHA/lmSNbj77CzrjgF6Aye6+3JgopntDJwDKPAQERFZwRStqcXMBgC3\nEsHD4naS3mxms83sJTM7IWPdEGBsKuhIjAEGmln/wpZYREREiq2YNR53ALe4+6tmtmmWNBcDTwOL\ngKHALWa2qrvflFq/PjA54z0z09Y1ZMm3L8DEiRM7W3bphIaGBsaPH1/uYnQr2uelp31eetrnpZV2\n7pr5tHYAAAYrSURBVOxblA24e84P4CqgqZ1HI/Bl4ExgLNAj9b7NUusHdZD/SGBK2usxwG8z0myT\n2s7AdvL5PuB66KGHHnrooUenH9/PJ0bI9ZFvjcd1RE1Gez4A9gS+Diwxs/R1r5jZH909s0kl8TJw\nsZn1dvdlwAxgQEaa5PWMdsowBjga+BD4vIPyioiISLO+RIXBmGJknlfg4e5zgbkdpTOzM4Cfpi3a\nkPgARxDBRTY7A/NTQQfAC8AVZtbT3RtTy4YCk9w9WzNLUs57OyqniIiItOn5YmVclD4e7l6f/trM\nFhLDZSe7+8epZQcStRcvErUSQ4ELieG3iXuBnwO3m9k1xHDaM4GzilFuERERKa5SzOOR8IzXy4DT\ngFFEUPIecHb6/BzuvsDMhgI3A68Ac4CR7n5baYosIiIihWSpzpgiIiIiRad7tYiIiEjJKPAQERGR\nkqm6wMPMTjOzD1I3lXvRzL5a7jJVCzO70MxeNrMFZjbTzB4ysy+3ke4yM/vYzBaZ2VNmtlU5yluN\nzOyC1A0VR2Us1z4vIDPb0MzuMbM5qX36upnVZKTRPi8QM+thZpeb2eTU/nzPzH7WRjrt804ys93M\n7BEzm5Y6hhzcRpp296+ZrWRmN6d+F5+a2YNmtl6+ZamqwMPMjgSuBy4hhua+Dowxs3XKWrDqsRtw\nI/A1YG/iPjpPmtnKSQIzOx84HfgRsAuwkPgf9Cl9catLKoj+EfG9Tl+ufV5AZrYG8BywBNiXmLTw\nXGB+Whrt88K6gLgR6I+BrYGfAD8xs9OTBNrnXbYq8Bqxj1t17sxx//4KOIC4/9q3iKky/px3SYox\nK1m5HsTQ3F+nvTbizrg/KXfZqvEBrEPMSLtr2rK
[base64 PNG data removed -- figure: likelihood proxy of the layer-2 RBM over pre-training epochs]\n",
      "text/plain": [
       ""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt_dat = np.genfromtxt('dbn_params/lproxy_layer_2.csv', delimiter=',', names = True)  # alternative: 'model_params/likelihood_proxy.csv'\n",
    "plt.plot(plt_dat)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "dat_x = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header = 1)\n",
    "dat_y = dat_x[:,0]\n",
    "dat_x = dat_x[:,1:]\n",
    "vocab = np.genfromtxt('data/dtm_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]\n",
    "test_input = theano.shared(dat_x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(18828, 2756)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat_x.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def score_rsm(input, learning_rate=0.002, \n",
    "              training_epochs=50, batch_size=400, \n",
    "              n_hidden=2000, model_src = 'model_params/rsm_epoch_80.pkl'):\n",
    "\n",
    "    train_set_x = input\n",
    "    N_input_x = train_set_x.shape[0]\n",
    "    \n",
    "    # compute the number of minibatches needed to score the full data set\n",
    "    N_splits = int( np.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) + 1 )\n",
    "\n",
    "    # allocate symbolic variables for the data\n",
    "    index = T.lscalar()    # index to a [mini]batch\n",
    "    x = T.matrix('x')      # each row is a bag-of-words count vector\n",
    "    \n",
    "    # construct the RSM class\n",
    "    rsm = RSM(input=x, n_visible=train_set_x.get_value(borrow=True).shape[1],\n",
    "              n_hidden=n_hidden)\n",
    "    \n",
    "    # ensure a path to a saved model is supplied\n",
    "    assert model_src is not None and type(model_src) == str\n",
    "    \n",
    "    # load the saved model parameters\n",
    "    rsm.__setstate__(pickle.load(open(model_src, 'rb')))\n",
    "    \n",
    "    # extract the hidden features w.r.t. the inputs; the second argument\n",
    "    # is the word count of each document\n",
    "    _, hid_extract = rsm.propup(x, x.sum(axis=1))\n",
    "    \n",
    "    # compile a function that returns the hidden activations for one\n",
    "    # minibatch of documents\n",
    "    score = theano.function(\n",
    "        inputs = [index],\n",
    "        outputs = hid_extract,\n",
    "        givens={\n",
    "            x: train_set_x[index * batch_size: (index + 1) * batch_size]\n",
    "        }\n",
    "    )\n",
    "    \n",
    "    return np.concatenate( [score(ii) for ii in range(N_splits)], axis=0 )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# the n_hidden argument is immaterial here: __setstate__ replaces the\n",
    "# parameters with the 2000-unit model loaded from disk\n",
    "rsm = score_rsm(input=test_input, n_hidden = 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(18828, 2000)"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rsm.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "878.73932"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rsm[18828-1].sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1.92498332e-38, 4.09554756e-38, 0.00000000e+00, ...,\n",
       "         1.00000000e+00, 0.00000000e+00, 1.00000000e+00],\n",
       "       [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,\n",
       "         1.00000000e+00, 0.00000000e+00, 1.00000000e+00]], dtype=float32)"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rsm[:2,]"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [conda env:py3]",
   "language": "python",
   "name": "conda-env-py3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
--------------------------------------------------------------------------------
/scripts_python/20news_dtm.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-


import os
import numpy as np
import pandas as pd
import ast
from sklearn.feature_extraction.text import CountVectorizer


# note: os.chdir does not expand '~' by itself
os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))

dat = pd.read_csv('data/clean_20news.csv', sep=",")

docs = [ast.literal_eval(doc) for doc in dat['document'].tolist()]
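# The 'document' column stores each token list as its string representation
# (data_proc_20news.py writes Python lists straight into the CSV), so
# ast.literal_eval above turns each row back into a list, e.g.
#   ast.literal_eval("['space', 'nasa', 'orbit']")  ->  ['space', 'nasa', 'orbit']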
all_words = [word for doc in docs for word in doc]
pd_all_words = pd.DataFrame({'words' : all_words})
# count each unique word's occurrences across the corpus
pd_unq_word_counts = pd.DataFrame({'count' : pd_all_words.groupby('words').size()}).reset_index().sort_values('count', ascending = False)
# keep only words appearing at least 150 times as the vocabulary
pd_unq_word_counts_filtered = pd_unq_word_counts.loc[pd_unq_word_counts['count'] >= 150]
list_unq_word_filtered = list( pd_unq_word_counts_filtered.iloc[:,0] )
len(list_unq_word_filtered)


vec = CountVectorizer(input = 'content', lowercase = False, vocabulary = list_unq_word_filtered)

# vectorize the corpus 500 documents at a time; each document is a token list,
# so summing the per-token rows gives that document's term-count vector
iters = list(range(0,len(docs),500))
iters.append(len(docs))
dtm = np.array([]).reshape(0,len(list_unq_word_filtered))
for i in range(len(iters)-1):
    dtm = np.concatenate( (dtm, list(map(lambda x: vec.fit_transform(x).toarray().sum(axis=0), docs[iters[i]:iters[i+1]] )) ), axis = 0)
    print(str(i))

# prepend the label column name without mutating the vocabulary list
colnames = ['_label_'] + list_unq_word_filtered

pd.DataFrame(data = np.c_[dat['label'].values, dtm],
             columns = colnames). \
    to_csv( 'data/dtm_20news.csv', index = False)
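# Sanity check (illustrative, assuming the script above ran to completion):
# reload the matrix just written and confirm the header and label column line up.
dtm_check = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header=1)
header = np.genfromtxt('data/dtm_20news.csv', dtype=str, delimiter=',', max_rows=1)
assert dtm_check.shape[1] == len(header)   # '_label_' plus one count column per word
print('document-term matrix shape:', dtm_check[:, 1:].shape)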
--------------------------------------------------------------------------------
/scripts_python/data_proc_20news.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 17 09:38:22 2017

@author: ekhongl
"""

import os
import numpy as np
import pandas as pd
import re

import nltk
#from nltk.tokenize import PunktSentenceTokenizer
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
ps = PorterStemmer()
lemma = WordNetLemmatizer()

from sklearn.feature_extraction.text import CountVectorizer

import gensim
from gensim import corpora, models


dir_wd = os.getcwd()

dir_src = os.path.join(dir_wd, 'data/raw_20news/20news-18828')

dir_src_classes = list( map(lambda x: os.path.join(dir_src, x ), os.listdir(dir_src)) )


dat = []
dat_y = []
dat_y_cat = []

for i in range(0,len(dir_src_classes)):

    print('Currently loading the following topic (iteration ' + str(i) + '):\n \t' + dir_src_classes[i])
    dir_src_classes_file = list( map(lambda x: os.path.join(dir_src_classes[i], x), os.listdir(dir_src_classes[i])) )

    for ii in range(0, len(dir_src_classes_file)):

        dat_y.append(i)

        with open(dir_src_classes_file[ii], 'r') as file:
            dat.append(file.read().replace('\n', ' '))

# export the raw documents together with their integer class labels
# ('document' holds the text, 'label' the class index)
pd.DataFrame( { 'document' : dat,
                'label' : dat_y}). \
    to_csv(os.path.join(dir_wd,'data/raw_20news/20news.csv'),
           index=False)

print('------- Data cleaning -------')
stopwords_en = stopwords.words('english')
dat_clean = []
for i in range(len(dat)):

    ''' tokenization and punctuation removal '''
    # uses nltk tokenization - e.g. shouldn't = [should, n't] instead of [shouldn, 't]
    tmp_doc = nltk.tokenize.word_tokenize(dat[i].lower())

    # split words separated by full stops
    tmp_doc_split = [w.split('.') for w in tmp_doc if len(w.split('.')) > 1]
    # flatten list
    tmp_doc_split = [i_sublist for i_list in tmp_doc_split for i_sublist in i_list]
    # clean split words
    tmp_doc_split = [w for w in tmp_doc_split if re.search('^[a-z]+$',w)]

    # drop punctuation
    tmp_doc_clean = [w for w in tmp_doc if re.search('^[a-z]+$',w)]
    tmp_doc_clean.extend(tmp_doc_split)

    ''' stop word removal '''
    tmp_doc_clean_stop = [w for w in tmp_doc_clean if w not in stopwords_en]
    # retain only words with 3 or more characters
    tmp_doc_clean_stop = [w for w in tmp_doc_clean_stop if len(w) > 2]

    ''' stemming (using the Porter algorithm) '''
    tmp_doc_clean_stop_stemmed = [ps.stem(w) for w in tmp_doc_clean_stop]
    dat_clean.append(tmp_doc_clean_stop_stemmed)

    # print progress
    if i % 100 == 0: print( 'Current progress: ' + str(i) + '/' + str(len(dat)) )

# save cleaned data
pd.DataFrame( { 'document' : dat_clean,
                'label' : dat_y}). \
    to_csv(os.path.join(dir_wd,'data/clean_20news.csv'),
           index=False)


# baseline LDA topic model over the cleaned corpus, for comparison with the
# neural models
dictionary = corpora.Dictionary(dat_clean)

# convert tokenized documents into a document-term matrix
dtm = [dictionary.doc2bow(doc) for doc in dat_clean]

# generate LDA model
ldamodel = gensim.models.ldamulticore.LdaMulticore( workers=3,
                                                    corpus = dtm,
                                                    id2word = dictionary,
                                                    num_topics=20,
                                                    passes=100,
                                                    random_state=123)


# ------------------------------------------------------------------------------
# exploratory scratch below -- kept for reference; anything relying on names
# not defined in this script is commented out so the file remains importable
# ------------------------------------------------------------------------------

# all_words = [word for doc in dat_clean for word in doc]
# pd_all_words = pd.DataFrame({'words' : all_words})
# pd_unq_word_counts = pd.DataFrame({'count' : pd_all_words.groupby('words').size()}).reset_index().sort_values('count', ascending = False)
# pd_unq_word_counts_filtered = pd_unq_word_counts.loc[pd_unq_word_counts['count'] >= 100]
# list_unq_word_filtered = list( pd_unq_word_counts_filtered.iloc[:,0] )

# vec = CountVectorizer(input = 'content', lowercase = False, vocabulary = list_unq_word_filtered)

# iters = list(range(0,len(dat_clean),500))
# iters.append(len(dat_clean))
# dtm = []
# for i in range(len(iters)-1):
#     dtm = np.concatenate((dtm, list(map(lambda x: vec.fit_transform(x).toarray().astype(np.int32), dat_clean[iters[i]:iters[i+1]] )) ), axis = 0)
#     print(str(i))

# word-count histogram of the corpus
# import matplotlib.pyplot as plt
# plt.hist(pd_all_words['words'].value_counts()[300:3000])
# plt.title("Plot of word counts")
# plt.xlabel("Value")
# plt.ylabel("Frequency")
# plt.show()

# alternative normalisation experiment (lemmatisation + POS tagging) --
# depends on names (raw_data, en_stop, RegexpTokenizer, pos_tag) not defined here
# p_stemmer = PorterStemmer()
# texts = []
# tokenizer = RegexpTokenizer(r'\w+')
# for i in raw_data:
#     raw_str = pd.Series.to_string(raw_data[i]).lower()
#     tokens = raw_str.split()
#     stopped_tokens = [w for w in tokens if not w in en_stop]
#     removedP_tokens = [w for w in stopped_tokens if re.search('^[a-z]+$',w)]
#     cleaned_tokens = [w for w in removedP_tokens if len(w) >= 3]
#     normalized = [lemma.lemmatize(w) for w in cleaned_tokens]
#     normalized_pos = pos_tag(normalized)
#     print(normalized_pos)
#     texts.append(normalized)

#------------------------------------------------------------------------------
# grouping of the 20 newsgroup labels into their top-level categories
# (see the illustrative helper below):
#   alt   0
#   comp  1,2,3,4,5
#   misc  6
#   rec   7,8,9,10
#   sci   11,12,13,14
#   soc   15
#   talk  16,17,18,19
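# Hypothetical helper, illustrating the mechanics of the grouping above. The
# actual 6-class mapping behind 'dtm_2000_20news_6class.csv' (consumed by
# scripts_python/train_dbn.py) is not shown in this snapshot, so the
# top-level-prefix grouping here is an example only.
prefix_of = {0: 'alt',
             1: 'comp', 2: 'comp', 3: 'comp', 4: 'comp', 5: 'comp',
             6: 'misc',
             7: 'rec', 8: 'rec', 9: 'rec', 10: 'rec',
             11: 'sci', 12: 'sci', 13: 'sci', 14: 'sci',
             15: 'soc',
             16: 'talk', 17: 'talk', 18: 'talk', 19: 'talk'}

def group_labels(labels):
    # map each 0-19 newsgroup id to a consecutive super-class id
    names = sorted(set(prefix_of.values()))
    return np.array([names.index(prefix_of[int(l)]) for l in labels])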
--------------------------------------------------------------------------------
/scripts_python/pretrain_dbn.py:
--------------------------------------------------------------------------------
from __future__ import print_function, division
import os
import sys
import timeit
from six.moves import cPickle as pickle

import numpy as np
import pandas as pd

import theano
import theano.tensor as T

from lib.deeplearning import deepbeliefnet

os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))

# load the 2000-word document-term matrix; the first column is the class label
dat_x = np.genfromtxt('data/dtm_2000_20news.csv', dtype='float32', delimiter=',', skip_header = 1)
dat_y = dat_x[:,0]
dat_x = dat_x[:,1:]
vocab = np.genfromtxt('data/dtm_2000_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]
x = theano.shared(dat_x)
y = T.cast(dat_y, dtype='int32')


# RSM input layer of 2000 visible units, followed by two 500-unit RBM layers
# and a 128-unit top layer
model = deepbeliefnet(architecture = [2000, 500, 500, 128], n_outs = 20)

# layer-wise pre-training; the RSM gets many more epochs than the upper RBMs
model.pretrain(input = x, pretraining_epochs= [1000,100,100], output_path = 'params/dbn_params_test')
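# Convergence check (illustrative): pre-training writes per-layer likelihood
# proxies that can be plotted afterwards. The 'lproxy_layer_<n>.csv' filenames
# are an assumption based on the notebook that plots
# 'dbn_params/lproxy_layer_2.csv'; adjust to whatever output_path produced.
def plot_likelihood_proxies(param_dir, n_layers=3):
    import matplotlib.pyplot as plt
    for layer in range(1, n_layers + 1):
        proxy = np.genfromtxt('%s/lproxy_layer_%d.csv' % (param_dir, layer),
                              delimiter=',', skip_header=1)
        plt.plot(proxy, label='layer %d' % layer)
    plt.legend()
    plt.xlabel('epoch')
    plt.ylabel('likelihood proxy')
    plt.show()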
--------------------------------------------------------------------------------
/scripts_python/train_dbn.py:
--------------------------------------------------------------------------------
from __future__ import print_function, division
import os
import sys
import timeit
from six.moves import cPickle as pickle

import numpy as np
import pandas as pd

import theano
import theano.tensor as T

from lib.deeplearning import deepbeliefnet

os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))

# 6-class document-term matrix; the first column is the class label
dat_x = np.genfromtxt('data/dtm_2000_20news_6class.csv', dtype='float32', delimiter=',', skip_header = 1)
dat_y = dat_x[:,0]
dat_x = dat_x[:,1:]
vocab = np.genfromtxt('data/dtm_2000_20news_6class.csv', dtype=str, delimiter=',', max_rows = 1)[1:]
x = theano.shared(dat_x)
y = T.cast(dat_y, dtype='int32')


# earlier experiments on the full 20-class, 2756-word matrix:
#model = deepbeliefnet(architecture = [2756, 500, 500, 128], opt_epochs = [900,5,10], n_outs = 20, predefined_weights = 'params/dbn_params')

#model.train(x=x, y=y, training_epochs = 10000, batch_size = 70, output_path = 'params/dbn_params_trained_long')

# theano_dropout
#model.train(x=x, y=y, training_epochs = 10000, learning_rate = (1/70)/2, batch_size = 120,
#            drop_out = [0.2, .5, .5, .5], output_path = 'params/dbn_params_dropout')

# theano_dropout2
#model.train(x=x, y=y, training_epochs = 10000, learning_rate = (1/70)/2, batch_size = 120,
#            drop_out = [0.2, .3, .4, .5], output_path = 'params/dbn_params_dropout2')


# fine-tune the 6-class model starting from the pre-trained weights
model = deepbeliefnet(n_outs = 6, architecture = [2000, 500, 500, 128],
                      opt_epochs = [110,15,10], predefined_weights = 'params_2000/dbn_params_pretrain')

# theano_6 class: dropout of 0.2 on the input layer and 0.5 on each hidden
# layer (see the dropout sketch below)
model.train(x=x, y=y, training_epochs = 10000, learning_rate = 1/160, batch_size = 50,
            drop_out = [0.2, .5, .5, .5], output_path = 'params_2000/dbn_params_dropout')
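# Illustrative only: what per-layer dropout probabilities such as
# [0.2, .5, .5, .5] conventionally mean -- each unit is kept with probability
# (1 - p) and the surviving activations are rescaled so their expectation is
# unchanged (inverted dropout). The actual implementation lives in lib/dbn.py
# and may differ in detail.
def dropout_sketch(activations, p_drop, rng=np.random):
    keep = rng.uniform(size=activations.shape) >= p_drop
    return activations * keep / (1.0 - p_drop)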
--------------------------------------------------------------------------------
/scripts_python/train_rsm.py:
--------------------------------------------------------------------------------
from __future__ import print_function

import timeit
from six.moves import cPickle as pickle

import numpy as np
import pandas as pd

import theano
import theano.tensor as T
import os
from lib.rbm import RSM

from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams


os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))


def test_rbm(input, learning_rate=1/1600,
             training_epochs=5000, batch_size=1600,
             n_hidden=1500, output_folder = 'model_params'):
    """
    Train a Replicated Softmax Model (RSM) on the 20 Newsgroups
    document-term matrix using persistent contrastive divergence.

    :param input: theano shared variable holding the document-term matrix

    :param learning_rate: learning rate used for training the RSM

    :param training_epochs: number of epochs used for training

    :param batch_size: size of a batch used to train the RSM

    :param n_hidden: number of hidden (topic) units

    :param output_folder: folder to which the per-epoch model pickles and
        the likelihood proxy are saved
    """
    train_set_x = input

    # compute number of minibatches for training
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size

    # allocate symbolic variables for the data
    index = T.lscalar()    # index to a [mini]batch
    x = T.matrix('x')      # each row is a bag-of-words count vector

    rng = np.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))

    # initialize storage for the persistent chain (state = hidden
    # layer of chain)
    persistent_chain = theano.shared(np.zeros((batch_size, n_hidden),
                                              dtype=theano.config.floatX),
                                     borrow=True)

    # construct the RSM class
    rsm = RSM(input=x, n_visible=train_set_x.get_value(borrow=True).shape[1],
              n_hidden=n_hidden)

    # get the cost and the gradient corresponding to one step of persistent CD
    cost, updates = rsm.get_cost_updates( lr=learning_rate,
                                          persistent=persistent_chain,
                                          k=1 )

    #################################
    #       Training the RSM        #
    #################################
    if not os.path.isdir(output_folder):
        os.makedirs(output_folder)

    # the returned cost is only monitored; the real work of train_rbm is
    # performed by the parameter updates
    train_rbm = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size]
        },
        name='train_rbm'
    )

    start_time = timeit.default_timer()

    # go through training epochs
    lproxy = []
    for epoch in range(training_epochs):

        # go through the training set
        mean_cost = []
        for batch_index in range(n_train_batches):
            mean_cost += [train_rbm(batch_index)]
            print('Current iteration: Epoch=' + str(epoch) + ', ' + 'Batch=' + str(batch_index))

        # save the model parameters for each epoch
        print('Saving model...')
        epoch_pickle = output_folder + '/rsm_epoch_' + str(epoch) + '.pkl'
        path_epoch_pickle = os.path.join(os.getcwd(), epoch_pickle)
        pickle.dump( rsm.__getstate__(), open(path_epoch_pickle, 'wb'))
        print('...model saved.')

        # record the average batch cost as a proxy for the likelihood
        lproxy += [np.mean(mean_cost)]
        print('Training epoch %d, likelihood proxy is ' % epoch, lproxy[epoch])

    end_time = timeit.default_timer()

    pretraining_time = end_time - start_time

    pd.DataFrame(data = {'likelihood_proxy' : lproxy} ). \
        to_csv(output_folder + '/likelihood_proxy.csv', index = False)

    print ('Training took %f minutes' % (pretraining_time / 60.))
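# Why the notebooks call rsm.propup(x, x.sum(axis=1)): in the Replicated
# Softmax Model the hidden bias is scaled by each document's word count D,
#
#     p(h_j = 1 | v) = sigmoid( D * a_j + sum_k v_k * W_kj )
#
# so the scoring code passes D as the second argument. A minimal numpy sketch
# (W and a are hypothetical weight/bias arrays):
def rsm_hidden_probs(v, W, a):
    D = v.sum(axis=1, keepdims=True)   # per-document word counts
    return 1.0 / (1.0 + np.exp(-(v.dot(W) + D * a)))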
# load the document-term matrix; the first column holds the class labels
dat_x = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header = 1)
dat_y = dat_x[:,0]
dat_x = dat_x[:,1:]
vocab = np.genfromtxt('data/dtm_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]
test_input = theano.shared(dat_x)

test_rbm(input = test_input)
--------------------------------------------------------------------------------
/scripts_python/train_sae.py:
--------------------------------------------------------------------------------
"""Fine-tunes the pre-trained deep belief network as a deep autoencoder.

The DBN is unrolled into an encoder-decoder pair and trained to reconstruct
its input, so that the 128-dimensional code layer becomes a compact
representation of each document.
"""
from __future__ import print_function, division
import os
import sys
import timeit
from six.moves import cPickle as pickle

import numpy as np
import pandas as pd

import theano
import theano.tensor as T

from lib.deeplearning import autoencoder

os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))

dat_x = np.genfromtxt('data/dtm_2000_20news.csv', dtype='float32', delimiter=',', skip_header = 1)
dat_y = dat_x[:,0]
dat_x = dat_x[:,1:]
vocab = np.genfromtxt('data/dtm_2000_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]
test_input = theano.shared(dat_x)

# earlier experiments on the 2756-word matrix:
#model = autoencoder( architecture = [2756, 500, 500, 128], opt_epochs = [900,5,10], model_src = 'params/dbn_params')

#model.train(test_input, batch_size = 50, learning_rate = 1/20000, epochs = 60000, obj_fn = 'mean_sq', output_path = 'params/ae_params_meansq2')

# theano 2
#model.train(test_input, batch_size = 200, learning_rate = 1/20000, epochs = 60000, output_path = 'params/ae_params_noise')

# theano 3
#model.train(test_input, batch_size = 500, learning_rate = 1/20000, epochs = 60000, output_path = 'params/ae_params_noise_2')

#-----------------------------------------------------------------------------------------

model = autoencoder( architecture = [2000, 500, 500, 128], opt_epochs = [110,15,10], model_src = 'params_2000/dbn_params_pretrain')

# theano 1
#model.train(test_input, add_noise = 16, batch_size = 500, learning_rate = 1/20000, epochs = 60000, \
#            output_path = 'params_2000/ae_train_noise')

# theano 2
model.train(test_input, batch_size = 200, learning_rate = 1/20000, epochs = 60000, \
            output_path = 'params_2000/ae_train_nonoise')
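# ------------------------------------------------------------------------------
# Illustrative follow-up, not part of the original script: once fine-tuning is
# complete, each document is represented by a 128-dimensional code, and
# documents can be compared directly in that space. `codes` below is assumed
# to be an (n_docs, 128) numpy array obtained from the trained encoder; the
# extraction call itself depends on the autoencoder class in
# lib/deeplearning.py and is not shown here.

def most_similar(codes, query_idx, top_n=10):
    # cosine similarity between the query document's code and all others
    norms = np.linalg.norm(codes, axis=1, keepdims=True)
    unit = codes / np.clip(norms, 1e-8, None)
    sims = unit.dot(unit[query_idx])
    return np.argsort(-sims)[1:top_n + 1]   # drop the query document itself
--------------------------------------------------------------------------------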