├── .gitignore
├── LICENSE
├── README.md
├── lib
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── __pycache__
│   │   ├── __init__.cpython-35.pyc
│   │   ├── dbn.cpython-35.pyc
│   │   ├── deeplearning.cpython-35.pyc
│   │   ├── mlp.cpython-35.pyc
│   │   └── rbm.cpython-35.pyc
│   ├── dbn.py
│   ├── dbn.pyc
│   ├── deeplearning.py
│   ├── deeplearning.pyc
│   ├── mlp.py
│   ├── mlp.pyc
│   ├── rbm.py
│   └── rbm.pyc
├── notebooks
│   ├── data_proc.ipynb
│   ├── topic_modelling.ipynb
│   ├── train_dbn.ipynb
│   ├── train_sae.ipynb
│   └── train_sae_2000.ipynb
├── scripts_R
│   └── clustering.R
├── scripts_old
│   ├── UT_1_gibbs_sampling.ipynb
│   ├── Untitled.ipynb
│   ├── Untitled1.ipynb
│   ├── Untitled2.ipynb
│   ├── finalized_1_gibbs.ipynb
│   ├── gen_testing_1.ipynb
│   └── score_rsm.ipynb
└── scripts_python
    ├── 20news_dtm.py
    ├── data_proc_20news.py
    ├── pretrain_dbn.py
    ├── train_dbn.py
    ├── train_rsm.py
    └── train_sae.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | data/
2 | old_scripts/
3 | params*/
4 | trash/
5 | .ipynb*/
6 | pgo*/

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Deep Learning Topic Modelling
2 | This repo is a collection of neural network tools, built on top of the [Theano](http://deeplearning.net/software/theano/) framework, with the primary objective of performing topic modelling. Topic modelling is commonly approached using the [Latent Dirichlet Allocation](https://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf) (LDA) or [Latent Semantic Analysis](https://en.wikipedia.org/wiki/Latent_semantic_analysis) (LSA) algorithms. More recently, with the advent of modelling count data using [Restricted Boltzmann Machines](https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine) (RBMs), in this setting known as the [Replicated Softmax Model](https://papers.nips.cc/paper/3856-replicated-softmax-an-undirected-topic-model.pdf) (RSM), deep neural network models were adapted to perform topic modelling, with results empirically shown to be in better agreement with human semantic interpretations (see [[1](http://www.utstat.toronto.edu/~rsalakhu/papers/topics.pdf)]).
3 | 
4 | The model construction comprises three phases.
5 | 
6 | ![model overview](http://i.imgur.com/pVs5Rvb.png)_Image taken from [[1](http://www.utstat.toronto.edu/~rsalakhu/papers/topics.pdf)]_
7 | 
8 | 1. The first phase is to design the network architecture: an RSM models the input count data, and as many layers of RBMs as deemed reasonable are stacked on top to model the outputs of the RSM. The stacking of RBMs (and the RSM) leads to what is called a Deep Generative Model or, more specifically in this case, a [Deep Belief Network](http://deeplearning.net/tutorial/DBN.html) (DBN). Like a single-layered RSM or RBM, this multi-layered network is bidirectional: it can generate encoded outputs from input data and, more distinctly, generate 'input' data from encoded data. Unlike single-layered networks, however, multi-layered networks are more likely to generate input data that closely resembles the training data, owing to their ability to capture structure in high dimensions.
9 | 
10 | 
11 | 2. Once the network's architecture is defined, pre-training follows. Pre-training has empirically been shown to improve the accuracy (or other measures) of neural network models, and one of the main hypotheses justifying this phenomenon is that pre-training configures the network to start off at a more optimal point compared to a random initialization.
12 | 
13 | 
14 | 3. After pre-training, the DBN is unrolled to produce an [Auto-Encoder](http://deeplearning.net/tutorial/dA.html). Auto-Encoders take input data, reduce it to a lower-dimensional representation, and then reconstruct it to be as close as possible to its input form. This is effectively a form of data compression but, more importantly, it also means that the lower-dimensional representation holds sufficient information about its higher-dimensional input for reconstruction to be feasible. Once training, or more appropriately fine-tuning in this case, is completed, only the segment of the Auto-Encoder that produces the lower-dimensional output is retained. A sketch of all three phases in code follows this list.
15 | 
16 | 
17 | As these lower-dimensional representations of the input data are easier to work with, algorithms that establish similarities between data points can be applied to the compressed data to indirectly estimate similarities between the original inputs. For text data broken down into counts of words in documents, this dimension-reduction technique can be used as an alternative method of information retrieval or topic modelling.
18 | 
19 | 
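In code, the three phases map onto this repo's classes roughly as follows. This is a minimal sketch, not a recipe: the architecture, epoch counts and file paths are illustrative (they mirror the notebooks), and the `deepbeliefnet` and `autoencoder` wrappers are the ones defined in `lib/deeplearning.py`.

```python
import numpy as np
import theano
from lib.deeplearning import deepbeliefnet, autoencoder

# Load a document-term matrix (first column = class label, remainder = word counts).
dat = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header=1)
x = theano.shared(dat[:, 1:])

# Phases 1 & 2: define the architecture (RSM input layer + stacked RBMs) and pre-train it.
dbn = deepbeliefnet(architecture=[2000, 500, 500, 128], n_outs=20)
dbn.pretrain(input=x, pretraining_epochs=100, batch_size=800,
             output_path='params/dbn_params')

# Phase 3: unroll the pre-trained DBN into an auto-encoder and fine-tune it
# (opt_epochs picks which saved pre-training epoch to load for each layer).
ae = autoencoder(architecture=[2000, 500, 500, 128], opt_epochs=[99, 99, 99],
                 model_src='params/dbn_params')
ae.train(x, epochs=1000, batch_size=500, output_path='params/ae_params')
codes = ae.score(x)   # 128-dimensional representations of the documents
```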
20 | ## Codes
21 | Much of the code here is a modification of, and addition to, the libraries provided by the developers of Theano at http://deeplearning.net/tutorial/. While Theano may now have been slightly overshadowed by its more prominent counterpart, [TensorFlow](https://www.tensorflow.org/), the tutorials and codes at deeplearning.net still provide a good avenue for anyone who wants a deeper introduction to deep learning and its mechanics. Moreover, given the undeniable inspiration that TensorFlow drew from Theano, once Theano is mastered, the transition from Theano to TensorFlow should be almost seamless.
22 | 
23 | The main codes are found in the **lib** folder, where we have:
24 | 
25 | |no.| codes| description |
26 | |:-:|:-----:|:----|
27 | |1 | [rbm.py](https://github.com/krenova/DeepLearningTopicModels/blob/master/lib/rbm.py) | contains the RBM and RSM classes |
28 | |2 | [mlp.py](https://github.com/krenova/DeepLearningTopicModels/blob/master/lib/mlp.py) | contains the sigmoid (hidden) layer and logistic regression classes |
29 | |3 | [dbn.py](https://github.com/krenova/DeepLearningTopicModels/blob/master/lib/dbn.py) | the DBN class, which constructs the network functions for pre-training and fine-tuning |
30 | |4 | [deeplearning.py](https://github.com/krenova/DeepLearningTopicModels/blob/master/lib/deeplearning.py)| wrapper around the DBN class |
31 | 
32 | 
33 | ## Examples
34 | Examples of using the tools in this repo are written in Jupyter notebooks. The data for the examples can be sourced from
35 | http://qwone.com/~jason/20Newsgroups/20news-18828.tar.gz.
36 | 
37 | |no.| codes| description |
38 | |:-:|:-----:|:----|
39 | |1 | [data_proc.ipynb](https://github.com/krenova/DeepLearningTopicModels/blob/master/notebooks/data_proc.ipynb) | notebook to process the raw data (please change the data dir name accordingly) |
40 | |2 | [train_dbn.ipynb](https://github.com/krenova/DeepLearningTopicModels/blob/master/notebooks/train_dbn.ipynb) | demonstrates how to pre-train the DBN and subsequently turn it into a Multilayer Perceptron for document classification |
41 | |3 | [train_sae.ipynb](https://github.com/krenova/DeepLearningTopicModels/blob/master/notebooks/train_sae.ipynb) | trains the pre-trained model from train_dbn.ipynb as an Auto-Encoder; a condensed version appears below |
42 | |4 | [topic_modelling.ipynb](https://github.com/krenova/DeepLearningTopicModels/blob/master/notebooks/topic_modelling.ipynb)| clusters the lower-dimensional output of the Auto-Encoder (the clustering itself is done in R) |
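The following condenses the train_sae.ipynb flow into a single script. It is a sketch under assumptions: the pre-trained parameter files are assumed to already exist under `params/dbn_params`, the epoch picks `[900, 5, 10]` are the ones used in the notebook, and the output file name `data/ae_features.csv` is illustrative.

```python
import numpy as np
import pandas as pd
import theano
from lib.deeplearning import autoencoder

# Document-term matrix: column 0 holds the labels, the rest are word counts.
dat = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header=1)
labels, counts = dat[:, 0], dat[:, 1:]
x = theano.shared(counts)

# Load the pre-trained DBN weights into the auto-encoder and fine-tune it.
model = autoencoder(architecture=[2756, 500, 500, 128], opt_epochs=[900, 5, 10],
                    model_src='params/dbn_params')
model.train(x, epochs=110, batch_size=100, add_noise=16, output_path='params/ae_params')

# Encode every document into its 128-dimensional representation.
codes = model.score(x)

# Save the codes with their labels for clustering in scripts_R/clustering.R.
cols = ['_label_'] + ['bit' + str(i) for i in range(128)]
pd.DataFrame(np.c_[labels, codes], columns=cols).to_csv('data/ae_features.csv', index=False)
```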
43 | 
44 | 
45 | 
46 | ## Reading References
47 | 1. http://www.utstat.toronto.edu/~rsalakhu/papers/topics.pdf
48 | 2. http://deeplearning.net/tutorial/rbm.html
49 | 3. http://deeplearning.net/tutorial/DBN.html
50 | 4. http://deeplearning.net/tutorial/dA.html
51 | 5. http://deeplearning.net/tutorial/SdA.html

--------------------------------------------------------------------------------
/lib/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__init__.py

--------------------------------------------------------------------------------
/lib/__init__.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__init__.pyc

--------------------------------------------------------------------------------
/lib/__pycache__/__init__.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__pycache__/__init__.cpython-35.pyc

--------------------------------------------------------------------------------
/lib/__pycache__/dbn.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__pycache__/dbn.cpython-35.pyc

--------------------------------------------------------------------------------
/lib/__pycache__/deeplearning.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__pycache__/deeplearning.cpython-35.pyc

--------------------------------------------------------------------------------
/lib/__pycache__/mlp.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__pycache__/mlp.cpython-35.pyc

--------------------------------------------------------------------------------
/lib/__pycache__/rbm.cpython-35.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/__pycache__/rbm.cpython-35.pyc
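For completeness before the library sources below, the supervised path from train_dbn.ipynb: the pre-trained DBN is fine-tuned with labels as an MLP classifier. Again a sketch; the epoch picks and paths are illustrative, and `n_outs=20` assumes the 20 newsgroup classes.

```python
import numpy as np
import theano
from lib.deeplearning import deepbeliefnet

dat = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header=1)
x = theano.shared(dat[:, 1:])                 # word counts
y = theano.shared(dat[:, 0].astype('int32'))  # newsgroup labels

# Rebuild the DBN and load the pre-trained layer-wise weights.
model = deepbeliefnet(architecture=[2000, 500, 500, 128], n_outs=20,
                      opt_epochs=[900, 5, 10], predefined_weights='params/dbn_params')

# Supervised fine-tuning with an internal train/valid/test split, then prediction.
model.train(x, y, split_prop=[0.65, 0.15, 0.20], training_epochs=100,
            batch_size=800, output_path='params/dbn_params_trained')
pred = model.predict(x)                       # predicted class per document
```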
--------------------------------------------------------------------------------
/lib/dbn.py:
--------------------------------------------------------------------------------
1 | """Deep Belief Network: an RSM input layer with RBMs stacked on top,
2 | adapted for topic modelling from http://deeplearning.net/tutorial/DBN.html"""
3 | from __future__ import print_function, division
4 | import os
5 | import sys
6 | import timeit
7 | 
8 | import numpy
9 | 
10 | import theano
11 | import theano.tensor as T
12 | from theano.sandbox.rng_mrg import MRG_RandomStreams
13 | 
14 | from lib.mlp import HiddenLayer, LogisticRegression
15 | from lib.rbm import RBM, RSM
16 | 
17 | 
18 | # start-snippet-1
19 | class DBN(object):
20 |     """Deep Belief Network
21 | 
22 |     A deep belief network is obtained by stacking several RBMs on top of each
23 |     other. The hidden layer of the RBM at layer `i` becomes the input of the
24 |     RBM at layer `i+1`. The first layer RBM gets as input the input of the
25 |     network, and the hidden layer of the last RBM represents the output. When
26 |     used for classification, the DBN is treated as an MLP, by adding a logistic
27 |     regression layer on top.
28 |     """
29 | 
30 |     def __init__(self, numpy_rng=None, theano_rng=None, n_ins=784,
31 |                  hidden_layers_sizes=[500, 500], n_outs=10):
32 |         """This class is made to support a variable number of layers.
33 | 
34 |         :type numpy_rng: numpy.random.RandomState
35 |         :param numpy_rng: numpy random number generator used to draw initial
36 |                     weights
37 | 
38 |         :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
39 |         :param theano_rng: Theano random generator; if None is given one is
40 |                     generated based on a seed drawn from `rng`
41 | 
42 |         :type n_ins: int
43 |         :param n_ins: dimension of the input to the DBN
44 | 
45 |         :type hidden_layers_sizes: list of ints
46 |         :param hidden_layers_sizes: intermediate layers size, must contain
47 |                     at least one value
48 | 
49 |         :type n_outs: int
50 |         :param n_outs: dimension of the output of the network
51 |         """
52 | 
53 |         self.sigmoid_layers = []
54 |         self.rbm_layers = []
55 |         self.params = []
56 |         self.params_rbm = []
57 |         self.n_layers = len(hidden_layers_sizes)
58 |         self.hidden_layers_sizes = hidden_layers_sizes
59 | 
60 |         assert self.n_layers > 0
61 | 
62 |         if not numpy_rng:
63 |             numpy_rng = numpy.random.RandomState(123)
64 |         if not theano_rng:
65 |             self.theano_rng = T.shared_randomstreams.RandomStreams(1234)
66 |         # allocate symbolic variables for the data
67 | 
68 |         # the data is presented as a matrix of word counts (one row per document)
69 |         self.x = T.matrix('x')
70 | 
71 |         # the labels are presented as 1D vector of [int] labels
72 |         self.y = T.ivector('y')
73 |         # end-snippet-1
74 |         # The DBN is an MLP, for which all weights of intermediate
75 |         # layers are shared with a different RBM. We will first
76 |         # construct the DBN as a deep multilayer perceptron, and when
77 |         # constructing each sigmoidal layer we also construct an RBM
78 |         # that shares weights with that layer. During pretraining we
79 |         # will train these RBMs (which will lead to changing the
80 |         # weights of the MLP as well). During finetuning we will finish
81 |         # training the DBN by doing stochastic gradient descent on the
82 |         # MLP.
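
        # For example, with n_ins=2000 and hidden_layers_sizes=[500, 500, 128]
        # (the architecture used in the notebooks), the loop below builds:
        #   layer 0: RSM  2000 -> 500   (models the raw word-count input)
        #   layer 1: RBM   500 -> 500
        #   layer 2: RBM   500 -> 128
        # Each HiddenLayer and its paired RSM/RBM share the same W and hbias,
        # so pre-training the RBM stack directly initializes the MLP weights.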
83 | 
84 |         for i in range(self.n_layers):
85 |             # construct the sigmoidal layer
86 | 
87 |             # the size of the input is either the number of hidden
88 |             # units of the layer below or the input size if we are on
89 |             # the first layer
90 |             if i == 0:
91 |                 input_size = n_ins
92 |             else:
93 |                 input_size = hidden_layers_sizes[i - 1]
94 | 
95 |             # the input to this layer is either the activation of the
96 |             # hidden layer below or the input of the DBN if you are on
97 |             # the first layer
98 |             if i == 0:
99 |                 layer_input = self.x
100 |             else:
101 |                 layer_input = self.sigmoid_layers[-1].output
102 |             print( 'Building layer: ' + str(i) )
103 |             print( '   Input units: ' + str(input_size) )
104 |             print( '   Output units: ' + str(hidden_layers_sizes[i]) )
105 |             sigmoid_layer = HiddenLayer(rng=numpy_rng,
106 |                                         input=layer_input,
107 |                                         n_in=input_size,
108 |                                         n_out=hidden_layers_sizes[i],
109 |                                         activation=T.nnet.sigmoid)
110 | 
111 |             # add the layer to our list of layers
112 |             self.sigmoid_layers.append(sigmoid_layer)
113 | 
114 |             # it's arguably a philosophical question... but we are
115 |             # going to only declare that the parameters of the
116 |             # sigmoid_layers are parameters of the DBN. The visible
117 |             # biases in the RBM are parameters of those RBMs, but not
118 |             # of the DBN.
119 |             self.params.extend(sigmoid_layer.params)
120 | 
121 |             # Construct an RBM that shares weights with this layer.
122 |             # The first layer will be an RSM for inputs of count data
123 |             # while the other hidden layers will be RBMs
124 |             if i == 0:
125 |                 rbm_layer = RSM(input=layer_input,
126 |                                 n_visible=input_size,
127 |                                 n_hidden=hidden_layers_sizes[i],
128 |                                 W=sigmoid_layer.W,
129 |                                 hbias=sigmoid_layer.b)
130 |             else:
131 |                 rbm_layer = RBM(numpy_rng=numpy_rng,
132 |                                 theano_rng=self.theano_rng,
133 |                                 input=layer_input,
134 |                                 n_visible=input_size,
135 |                                 n_hidden=hidden_layers_sizes[i],
136 |                                 W=sigmoid_layer.W,
137 |                                 hbias=sigmoid_layer.b)
138 |             self.rbm_layers.append(rbm_layer)
139 |             self.params_rbm.extend(rbm_layer.params)
140 | 
141 |         # We now need to add a logistic layer on top of the MLP
142 |         self.logLayer = LogisticRegression(
143 |             input=self.sigmoid_layers[-1].output,
144 |             n_in=hidden_layers_sizes[-1],
145 |             n_out=n_outs)
146 |         self.params.extend(self.logLayer.params)
147 | 
148 |         # compute the cost for second phase of training, defined as the
149 |         # negative log likelihood of the logistic regression (output) layer
150 |         self.finetune_cost = self.logLayer.negative_log_likelihood(self.y)
151 | 
152 |         # compute the gradients with respect to the model parameters
153 |         # symbolic variable that points to the number of errors made on the
154 |         # minibatch given by self.x and self.y
155 |         self.errors = self.logLayer.errors(self.y)
156 | 
157 |     def pretraining_functions(self, train_set_x, batch_size, k):
158 |         '''
159 |         Generates a list of functions for performing one step of
160 |         gradient descent at a given layer. The function will require
161 |         as input the minibatch index, and to train an RBM you just
162 |         need to iterate, calling the corresponding function on all
163 |         minibatch indexes.
164 | 
165 |         :type train_set_x: theano.tensor.TensorType
166 |         :param train_set_x: Shared var. 
that contains all datapoints used
167 |                             for training the RBM
168 |         :type batch_size: int
169 |         :param batch_size: size of a [mini]batch
170 |         :param k: number of Gibbs steps to do in CD-k / PCD-k
171 | 
172 |         '''
173 | 
174 |         # index to a [mini]batch
175 |         index = T.lscalar('index')  # index to a minibatch
176 |         learning_rate = T.scalar('lr')  # learning rate to use
177 | 
178 |         # beginning of a batch, given `index`
179 |         batch_begin = index * batch_size
180 |         # ending of a batch given `index`
181 |         batch_end = batch_begin + batch_size
182 | 
183 |         pretrain_fns = []
184 |         for rbm in self.rbm_layers:
185 | 
186 |             # get the cost and the updates list
187 |             # using PCD-k here (a persistent Gibbs chain) for training each RBM.
188 |             # TODO: change cost function to reconstruction error
189 |             persistent_chain = theano.shared(numpy.zeros((batch_size, rbm.n_hidden),
190 |                                                          dtype=theano.config.floatX),
191 |                                              borrow=True)
192 | 
193 |             cost, updates = rbm.get_cost_updates(learning_rate,
194 |                                                  persistent=persistent_chain, k=k)
195 | 
196 |             # compile the theano function
197 |             fn = theano.function(
198 |                 inputs=[index, theano.In(learning_rate, value=0.1)],
199 |                 outputs=cost,
200 |                 updates=updates,
201 |                 givens={
202 |                     self.x: train_set_x[batch_begin:batch_end]
203 |                 }
204 |             )
205 |             # append `fn` to the list of functions
206 |             pretrain_fns.append(fn)
207 | 
208 |         return pretrain_fns
209 | 
210 |     def auto_encoding(self, input,
211 |                       batch_size=500, learning_rate=None,
212 |                       add_noise=None, obj_fn='cross_entropy'):
213 | 
214 |         if learning_rate is None:
215 |             learning_rate = 1/batch_size
216 | 
217 |         # Encoding input data
218 |         train_set_x = input
219 |         N_input_x = train_set_x.shape[0]
220 |         if add_noise:
221 |             assert type(add_noise) == float or type(add_noise) == int, "'add_noise' must be either None, float or int"
222 |             noise = T.matrix('noise')
223 |             train_noise = self.theano_rng.normal(
224 |                 size=(N_input_x.eval(), self.hidden_layers_sizes[-1]),
225 |                 avg=0,
226 |                 std=add_noise,
227 |                 ndim=None
228 |             )
229 |         fwd_pass = self.x
230 | 
231 |         for i in range(self.n_layers):
232 |             _, fwd_pass = self.rbm_layers[i].propup(fwd_pass)
233 | 
234 |         # Optionally perturb the encoded data with noise
235 |         if add_noise:
236 |             fwd_pass += noise
237 | 
238 |         # Decoding encoded input data
239 |         for i in reversed(range(self.n_layers)):
240 |             _, fwd_pass = self.rbm_layers[i].propdown(fwd_pass)
241 | 
242 |         if obj_fn == 'cross_entropy':
243 |             # ------ Objective Function: multinomial cross entropy ------ #
244 |             #L = - T.sum(self.x * T.log(fwd_pass), axis=1)
245 |             x_normalized = self.x / self.rbm_layers[0].input_rSum[:,None]
246 |             L = - T.sum(x_normalized * T.log(fwd_pass), axis=1)
247 |         else:
248 |             # ------ Objective Function: square error ------ #
249 |             L = T.sum( T.pow( fwd_pass - (self.x/self.rbm_layers[0].input_rSum[:,None]), 2), axis=1)
250 | 
251 |         # mean cost
252 |         cost = T.mean(L)
253 | 
254 |         # compute the gradients of the cost of the `dA` with respect
255 |         # to its parameters
256 |         gparams = T.grad(cost, self.params_rbm)
257 | 
258 |         # generate the list of updates
259 |         updates = [
260 |             (param, param - learning_rate * gparam)
261 |             for param, gparam in zip(self.params_rbm, gparams)
262 |         ]
263 | 
264 |         index = T.lscalar('index')
265 |         if add_noise:
266 |             train_dae = theano.function(
267 |                 inputs=[index],
268 |                 outputs=cost,
269 |                 updates=updates,
270 |                 givens={
271 |                     self.x: train_set_x[index * batch_size: (index + 1) * batch_size],
272 |                     noise: train_noise[index * batch_size: (index + 1) * batch_size]
273 |                 }
274 |             )
275 |         else:
276 |             train_dae = 
theano.function(
277 |                 inputs=[index],
278 |                 outputs=cost,
279 |                 updates=updates,
280 |                 givens={
281 |                     self.x: train_set_x[index * batch_size: (index + 1) * batch_size]
282 |                 }
283 |             )
284 | 
285 |         return train_dae
286 | 
287 | 
288 | 
289 |     def apply_dropout(self, input, corruption_level):
290 |         return self.theano_rng.binomial(size=input.shape, n=1,
291 |                                         p=1 - corruption_level,
292 |                                         dtype=theano.config.floatX) * input
293 | 
294 |     def build_finetune_functions(self, x, y,
295 |                                  batch_size, learning_rate,
296 |                                  drop_out=None,
297 |                                  split_prop=[0.65, 0.15, 0.20]):
298 |         '''Generates a function `train` that implements one step of
299 |         finetuning, a function `validate` that computes the error on a
300 |         batch from the validation set, and a function `test` that
301 |         computes the error on a batch from the testing set
302 | 
303 |         :type x: theano.tensor.TensorType (shared)
304 |         :param x: shared variable containing the datapoints; it is
305 |                   randomly split into `train`, `valid` and `test`
306 |                   sets according to `split_prop`
307 |         :type y: theano.tensor.TensorType (shared)
308 |         :param y: shared variable containing the corresponding labels
309 |         :type batch_size: int
310 |         :param batch_size: size of a minibatch
311 |         :type learning_rate: float
312 |         :param learning_rate: learning rate used during finetune stage
313 | 
314 |         '''
315 |         assert y.shape[0].eval() == x.shape[0].eval(), "independent and target length do not match!"
316 |         assert len(split_prop) == 3 and type(split_prop) == list, \
317 |             "'split_prop' cannot have more or less than 3 inputs and must be in list format"
318 | 
319 |         N = y.shape[0].eval()
320 |         split_prop = numpy.array(split_prop)
321 |         split_prop = split_prop / split_prop.sum()
322 |         idx_rand = numpy.random.permutation(N)  # shuffle indices without replacement so the splits are disjoint
323 |         idx_train = idx_rand[:int(N*split_prop[0])]
324 |         idx_valid = idx_rand[len(idx_train):(len(idx_train)+int(N*split_prop[1]))]
325 |         idx_test = idx_rand[(len(idx_train)+len(idx_valid)):]
326 | 
327 |         (train_set_x, train_set_y) = (x[idx_train,], y[idx_train])
328 |         (valid_set_x, valid_set_y) = (x[idx_valid,], y[idx_valid])
329 |         (test_set_x, test_set_y) = (x[idx_test,], y[idx_test])
330 | 
331 |         # compute number of minibatches for training, validation and testing
332 |         n_train_batches = train_set_x.shape[0].eval()
333 |         n_train_batches //= batch_size
334 |         n_valid_batches = valid_set_x.shape[0].eval()
335 |         n_valid_batches //= batch_size
336 |         n_test_batches = test_set_x.shape[0].eval()
337 |         n_test_batches //= batch_size
338 | 
339 |         index = T.lscalar('index')  # index to a [mini]batch
340 | 
341 |         if drop_out:
342 | 
343 |             assert type(drop_out) == list, "'drop_out' variable must be None or a list of proportions"
344 |             assert len(drop_out) == (self.n_layers+1), "len of 'drop_out' list must equal number of hidden layers plus one (for the output layer)"
345 | 
346 |             x_dropout = T.matrix('x_dropout')
347 |             fwd_pass = self.x
348 | 
349 |             for i in range(self.n_layers):
350 |                 self.sigmoid_layers[i].input = self.apply_dropout(fwd_pass, drop_out[i])
351 |                 fwd_pass = self.sigmoid_layers[i].output
352 | 
353 |             self.logLayer.input = self.apply_dropout(fwd_pass, drop_out[self.n_layers])
354 |             finetune_cost_dropout = self.logLayer.negative_log_likelihood(self.y)
355 | 
356 |             # compute the gradients with respect to the model parameters
357 |             gparams = T.grad(finetune_cost_dropout, self.params)
358 | 
359 |             # compute list of fine-tuning updates
360 |             updates = []
361 |             for param, gparam in zip(self.params, gparams):
362 |                 updates.append((param, param - gparam * learning_rate))
363 | 
364 |             train_fn = 
theano.function( 365 | inputs=[index], 366 | outputs=finetune_cost_dropout, 367 | updates=updates, 368 | givens={ 369 | self.x: train_set_x[ 370 | index * batch_size: (index + 1) * batch_size 371 | ], 372 | self.y: train_set_y[ 373 | index * batch_size: (index + 1) * batch_size 374 | ] 375 | } 376 | ) 377 | 378 | else: 379 | 380 | # compute the gradients with respect to the model parameters 381 | gparams = T.grad(self.finetune_cost, self.params) 382 | 383 | # compute list of fine-tuning updates 384 | updates = [] 385 | for param, gparam in zip(self.params, gparams): 386 | updates.append((param, param - gparam * learning_rate)) 387 | 388 | train_fn = theano.function( 389 | inputs=[index], 390 | outputs=self.finetune_cost, 391 | updates=updates, 392 | givens={ 393 | self.x: train_set_x[ 394 | index * batch_size: (index + 1) * batch_size 395 | ], 396 | self.y: train_set_y[ 397 | index * batch_size: (index + 1) * batch_size 398 | ] 399 | } 400 | ) 401 | 402 | test_score_i = theano.function( 403 | [index], 404 | self.errors, 405 | givens={ 406 | self.x: test_set_x[ 407 | index * batch_size: (index + 1) * batch_size 408 | ], 409 | self.y: test_set_y[ 410 | index * batch_size: (index + 1) * batch_size 411 | ] 412 | } 413 | ) 414 | 415 | valid_score_i = theano.function( 416 | [index], 417 | self.errors, 418 | givens={ 419 | self.x: valid_set_x[ 420 | index * batch_size: (index + 1) * batch_size 421 | ], 422 | self.y: valid_set_y[ 423 | index * batch_size: (index + 1) * batch_size 424 | ] 425 | } 426 | ) 427 | 428 | # Create a function that scans the entire validation set 429 | def valid_score(): 430 | return [valid_score_i(i) for i in range(n_valid_batches)] 431 | 432 | # Create a function that scans the entire test set 433 | def test_score(): 434 | return [test_score_i(i) for i in range(n_test_batches)] 435 | 436 | return n_train_batches, train_fn, valid_score, test_score 437 | 438 | 439 | 440 | def predict(self, input, batch_size = 2000, prob = False): 441 | 442 | train_set_x = input 443 | N_input_x = train_set_x.shape[0] 444 | 445 | # compute number of minibatches for scoring 446 | if train_set_x.get_value(borrow=True).shape[0] % batch_size != 0: 447 | N_splits = int( numpy.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) + 1 ) 448 | else: 449 | N_splits = int( numpy.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) ) 450 | 451 | # allocate symbolic variables for the data 452 | index = T.lscalar() # index to a [mini]batch 453 | 454 | if prob: 455 | output = theano.function( 456 | inputs = [index], 457 | outputs = self.logLayer.p_y_given_x, 458 | givens={ 459 | self.x: train_set_x[index * batch_size: (index + 1) * batch_size] 460 | } 461 | ) 462 | else: 463 | output = theano.function( 464 | inputs = [index], 465 | outputs = self.logLayer.y_pred, 466 | givens={ 467 | self.x: train_set_x[index * batch_size: (index + 1) * batch_size] 468 | } 469 | ) 470 | 471 | return numpy.concatenate( [output(ii) for ii in range(N_splits)], axis=0 ) 472 | -------------------------------------------------------------------------------- /lib/dbn.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/dbn.pyc -------------------------------------------------------------------------------- /lib/deeplearning.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function, division 2 | 
import os
3 | import sys
4 | import timeit
5 | from six.moves import cPickle as pickle
6 | 
7 | import numpy as np
8 | import pandas as pd
9 | 
10 | import theano
11 | import theano.tensor as T
12 | 
13 | from lib.mlp import HiddenLayer, LogisticRegression
14 | from lib.rbm import RBM, RSM
15 | from lib.dbn import DBN
16 | 
17 | os.chdir('/home/ekhongl/Codes/DL - Topic Modelling')
18 | 
19 | class InitializationError(Exception):
20 |     '''raised when the input definitions for the corresponding class are conflicting'''
21 | 
22 | ########################################################################################################
23 | ## Deep Belief Net #####################################################################################
24 | ########################################################################################################
25 | class deepbeliefnet(object):
26 | 
27 |     def __init__(self, architecture=[2000, 500, 500, 128],
28 |                  opt_epochs=[], predefined_weights=None, n_outs=1):
29 | 
30 |         # ensure proper class initialization
31 |         assert len(architecture) > 1, "architecture definition must include both the hidden layers AND input layer"
32 | 
33 |         # numpy random generator
34 |         numpy_rng = np.random.RandomState(123)
35 | 
36 |         # reconstruct the DBN class
37 |         self.hidden_layers_sizes = architecture[1:]
38 |         self.n_layers = len(self.hidden_layers_sizes)
39 |         self.params = []      # params for the MLP
40 |         self.params_rbm = []  # params for the RBMs
41 |         self.n_ins = architecture[0]
42 |         self.n_outs = n_outs
43 |         print('... building the model')
44 |         self.dbn = DBN( numpy_rng=numpy_rng,
45 |                         n_ins=self.n_ins,
46 |                         n_outs=self.n_outs,
47 |                         hidden_layers_sizes=self.hidden_layers_sizes )
48 | 
49 |         # loading pre-trained weights
50 |         if predefined_weights is not None and len(opt_epochs) > 0:
51 |             self.new_model = False
52 | 
53 |             # load saved model
54 |             for i in range(self.n_layers):
55 |                 model_pkl = os.path.join(predefined_weights,
56 |                                          'dbn_layer' + str(i) + '_epoch_' + str(opt_epochs[i]) + '.pkl')
57 |                 self.dbn.rbm_layers[i].__setstate__(pickle.load(open(model_pkl, 'rb')))
58 |             # extract the model parameters
59 |             for i in range(self.n_layers):
60 |                 self.params_rbm.extend(self.dbn.rbm_layers[i].params)
61 | 
62 |             print('Pre-trained DBN model from "' + predefined_weights + '" loaded.')
63 | 
64 |         # loading fine-tuned weights
65 |         elif predefined_weights is not None and len(opt_epochs) == 0:
66 | 
67 |             self.dbn.params = pickle.load(open(predefined_weights, 'rb'))
68 |             for i in range(self.n_layers):
69 |                 self.dbn.sigmoid_layers[i].__setstate__ (self.dbn.params[(i*2):(i*2+2)])
70 |             self.dbn.logLayer.__setstate__ (self.dbn.params[(self.n_layers*2):(self.n_layers*2+2)])
71 | 
72 |             print('Fine-tuned (or MLP) model from "' + predefined_weights + '" loaded.')
73 | 
74 |         # error in class initialization
75 |         elif (predefined_weights is None and len(opt_epochs) > 0):
76 | 
77 |             raise InitializationError("'opt_epochs' and 'predefined_weights' must either both be provided or both be left empty")
78 | 
79 |         # creating a new generative model
80 |         else:
81 |             self.new_model = True
82 |             for i in range(self.n_layers):
83 |                 self.params_rbm.extend(self.dbn.rbm_layers[i].params)
84 | 
85 | 
86 | 
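    # Illustrative initialization modes (the paths and epoch picks are
    # examples taken from the notebooks, not fixed by the class):
    #   fresh model:      deepbeliefnet(architecture=[2000, 500, 500, 128], n_outs=20)
    #   pre-trained DBN:  deepbeliefnet(..., opt_epochs=[900, 5, 10],
    #                                   predefined_weights='params/dbn_params')
    #   fine-tuned MLP:   deepbeliefnet(..., predefined_weights='params/dbn_params_trained/trained_dbn.pkl')
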
87 |     def pretrain(self, input, pretraining_epochs=100, pretrain_lr=None,
88 |                  k=1, batch_size=800, output_path='params/dbn_params_test'):
89 | 
90 |         train_set_x = input
91 | 
92 |         #---------------------------------------------------------------------------------------#
93 |         # ensure class initialization matches input definition before function execution
94 |         assert train_set_x.get_value(borrow=True).shape[1] == self.n_ins, \
95 |             "Input data dimensions must match initialized dimensions!"
96 | 
97 |         if pretrain_lr is None:
98 |             pretrain_lr = [1/batch_size] * self.n_layers
99 |         else:
100 |             if type(pretrain_lr) is not list:
101 |                 pretrain_lr = [pretrain_lr] * self.n_layers
102 |             elif len(pretrain_lr) != self.n_layers:
103 |                 print('Warning: pretrain_lr length not equal to the number of hidden layers!')
104 |                 print('Reverting pretrain_lr to the default values (1/batch_size).')
105 |                 pretrain_lr = [1/batch_size] * self.n_layers
106 |         if type(pretraining_epochs) is not list:
107 |             pretraining_epochs = [pretraining_epochs] * self.n_layers
108 |         #---------------------------------------------------------------------------------------#
109 | 
110 |         # compute number of minibatches for training, validation and testing
111 |         n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size
112 | 
113 |         #########################
114 |         # PRETRAINING THE MODEL #
115 |         #########################
116 |         if not os.path.isdir(output_path):
117 |             os.makedirs(output_path)
118 | 
119 |         print('... getting the pretraining functions')
120 |         pretraining_fns = self.dbn.pretraining_functions(train_set_x=train_set_x,
121 |                                                          batch_size=batch_size,
122 |                                                          k=k)
123 | 
124 |         print('... pre-training the model')
125 |         start_time = timeit.default_timer()
126 | 
127 |         # Pre-train layer-wise
128 |         for i in range(self.dbn.n_layers):
129 | 
130 |             # go through pretraining epochs
131 |             lproxy = []
132 |             for epoch in range(pretraining_epochs[i]):
133 | 
134 |                 # go through the training set
135 |                 mean_cost = []
136 |                 for batch_index in range(n_train_batches):
137 |                     mean_cost.append(pretraining_fns[i](index=batch_index, \
138 |                                                         lr=pretrain_lr[i]))
139 | 
140 |                 # calculating the epoch's mean proxy likelihood value
141 |                 lproxy += [np.mean(mean_cost)]
142 |                 print('Pre-training layer %i, epoch %d, cost ' % (i, epoch), end=' ')
143 |                 print(lproxy[epoch])
144 | 
145 |                 # save the model parameters for each epoch
146 |                 epoch_pickle = output_path + '/dbn_layer' + str(i) + \
147 |                                '_epoch_' + str(epoch) + '.pkl'
148 |                 #path_epoch_pickle = os.path.join( os.getcwd(), epoch_pickle)
149 |                 pickle.dump( self.dbn.rbm_layers[i].__getstate__(), \
150 |                              open(epoch_pickle, 'wb') )
151 | 
152 |             # save the proxy likelihood profile
153 |             pd.DataFrame(data = {'likelihood_proxy' : lproxy} ). \
154 |                 to_csv( output_path + '/lproxy_layer_' + str(i) + '.csv', index = False)
155 | 
156 |             end_time = timeit.default_timer()
157 |             print('The pretraining for layer ' + str(i) +
158 |                   ' ran for %.2fm' % ((end_time - start_time) / 60.), file=sys.stderr)
159 | 
160 | 
161 |     def train(self, x, y, split_prop=[0.65, 0.15, 0.20], training_epochs=100,
162 |               batch_size=800, learning_rate=None, drop_out=None,
163 |               output_path='params/dbn_params_trained'):
164 | 
165 |         if learning_rate is None: learning_rate = 1/batch_size
166 | 
167 |         # get the training, validation and testing function for the model
168 |         print('... getting the finetuning functions')
169 |         n_train_batches, train_fn, validate_model, test_model = self.dbn.build_finetune_functions(
170 |             x=x,
171 |             y=y,
172 |             split_prop=split_prop,
173 |             batch_size=batch_size,
174 |             learning_rate=learning_rate,
175 |             drop_out=drop_out
176 |         )
177 | 
178 |         print('... 
finetuning the model') 179 | 180 | ######################### 181 | # TRAINING THE MODEL # 182 | ######################### 183 | if not os.path.isdir(output_path): 184 | os.makedirs(output_path) 185 | 186 | # look as this many examples regardless 187 | patience = 4 * n_train_batches 188 | 189 | # wait this much longer when a new best is found 190 | patience_increase = 2. 191 | 192 | # a relative improvement of this much is considered significant 193 | improvement_threshold = 0.995 194 | 195 | # go through this many minibatches before checking the network on 196 | # the validation set; in this case we check every epoch 197 | validation_frequency = min(n_train_batches, patience / 2) 198 | 199 | best_validation_loss = np.inf 200 | test_score = 0. 201 | start_time = timeit.default_timer() 202 | 203 | done_looping = False 204 | epoch = 0 205 | 206 | while (epoch < training_epochs) and (not done_looping): 207 | epoch = epoch + 1 208 | for minibatch_index in range(n_train_batches): 209 | 210 | train_fn(minibatch_index) 211 | iter = (epoch - 1) * n_train_batches + minibatch_index 212 | 213 | if (iter + 1) % validation_frequency == 0: 214 | 215 | validation_losses = validate_model() 216 | this_validation_loss = np.mean(validation_losses, dtype='float64') 217 | print('epoch %i, minibatch %i/%i, validation error %f %%' % ( 218 | epoch, 219 | minibatch_index + 1, 220 | n_train_batches, 221 | this_validation_loss * 100. 222 | ) 223 | ) 224 | 225 | # if we got the best validation score until now 226 | if this_validation_loss < best_validation_loss: 227 | 228 | # improve patience if loss improvement is good enough 229 | if (this_validation_loss < best_validation_loss * 230 | improvement_threshold): 231 | patience = max(patience, iter * patience_increase) 232 | 233 | # save best validation score and iteration number 234 | best_validation_loss = this_validation_loss 235 | best_iter = iter 236 | 237 | # test it on the test set 238 | test_losses = test_model() 239 | test_score = np.mean(test_losses, dtype='float64') 240 | print((' epoch %i, minibatch %i/%i, test error of ' 241 | 'best model %f %%') % 242 | (epoch, minibatch_index + 1, n_train_batches, 243 | test_score * 100.)) 244 | 245 | #-----------------------------------------------------------------------------# 246 | #----------- Saving the current best model -----------------------------------# 247 | #-----------------------------------------------------------------------------# 248 | print('Saving model...') 249 | 250 | tmp_params = [] 251 | for i in range(self.n_layers): 252 | tmp_params.extend( self.dbn.sigmoid_layers[i].__getstate__ () ) 253 | tmp_params.extend( self.dbn.logLayer.__getstate__ () ) 254 | 255 | pickle.dump( tmp_params, \ 256 | open(output_path +'/trained_dbn.pkl', 'wb') ) 257 | 258 | del tmp_params 259 | 260 | print('...model saved.') 261 | #-----------------------------------------------------------------------------# 262 | 263 | if patience <= iter: 264 | done_looping = True 265 | break 266 | 267 | end_time = timeit.default_timer() 268 | print(('Optimization complete with best validation score of %f %%, ' 269 | 'obtained at iteration %i, ' 270 | 'with test performance %f %%' 271 | ) % (best_validation_loss * 100., best_iter + 1, test_score * 100.)) 272 | print('The fine tuning ran for %.2fm' % ((end_time - start_time) / 60.), file=sys.stderr) 273 | 274 | def score(self, input, batch_size = 2000): 275 | 276 | x = T.matrix('x') 277 | self.dbn.rbm_layers[0].input_rSum = x.sum(axis=1) 278 | train_set_x = input 279 | N_input_x = 
train_set_x.shape[0] 280 | 281 | # compute number of minibatches for scoring 282 | if train_set_x.get_value(borrow=True).shape[0] % batch_size != 0: 283 | N_splits = int( np.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) + 1 ) 284 | else: 285 | N_splits = int( np.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) ) 286 | 287 | # allocate symbolic variables for the data 288 | index = T.lscalar() # index to a [mini]batch 289 | 290 | # input_rSum must be specified for the RSM layer 291 | activation = x 292 | for i in range(self.n_layers): 293 | _, activation = self.dbn.rbm_layers[i].propup(activation) 294 | 295 | # it is ok for a theano function to have no output 296 | # the purpose of train_rbm is solely to update the RBM parameters 297 | score = theano.function( 298 | inputs = [index], 299 | outputs = activation, 300 | givens={ 301 | x: train_set_x[index * batch_size: (index + 1) * batch_size] 302 | } 303 | ) 304 | 305 | return np.concatenate( [score(ii) for ii in range(N_splits)], axis=0 ) 306 | 307 | def predict(self, input, batch_size = 2000, prob = False): 308 | 309 | return self.dbn.predict(input = input, batch_size = batch_size, prob = prob) 310 | 311 | 312 | 313 | ######################################################################################################## 314 | ## Auto-Encoder ######################################################################################## 315 | ######################################################################################################## 316 | class autoencoder(object): 317 | 318 | def __init__(self, architecture = [], opt_epochs = [], model_src = 'params/dbn_params', param_type = 'dbn'): 319 | 320 | # ensure model source directory is valid 321 | assert type(model_src) == str or model_src is not None, "dir to load model parameters not indicated" 322 | if len(opt_epochs)>0: 323 | assert len(architecture) == (len(opt_epochs)+1) , "len of network inputs must be 1 more than len of hidden layers" 324 | 325 | # reconstruct the DBN class 326 | self.params = [] 327 | self.hidden_layers_sizes = architecture[1:] 328 | self.n_layers = len(self.hidden_layers_sizes) 329 | self.dbn = DBN( n_ins=architecture[0], 330 | hidden_layers_sizes = self.hidden_layers_sizes ) 331 | self.theano_rng = T.shared_randomstreams.RandomStreams(1234) 332 | 333 | if param_type == 'dbn': 334 | # load saved model 335 | print('Loading the pre-trained Deep Belief Net parameters...') 336 | for i in range(self.n_layers): 337 | model_pkl = os.path.join(model_src, 338 | 'dbn_layer' + str(i) + '_epoch_' + str(opt_epochs[i]) + '.pkl') 339 | self.dbn.rbm_layers[i].__setstate__(pickle.load(open(model_pkl, 'rb'))) 340 | # extract the model parameters 341 | self.params.extend(self.dbn.rbm_layers[i].params) 342 | print('...model loaded.') 343 | 344 | 345 | else: 346 | print('Loading the trained auto-encoder parameters.') 347 | print('...please ensure that the auto-encoder params matches the defined architecture.') 348 | for i in range(self.n_layers): 349 | model_pkl = model_src +'/ae_layer_' + str(i) + '.pkl' 350 | self.dbn.rbm_layers[i].__setstate__(pickle.load(open(model_pkl, 'rb'))) 351 | self.params.extend(self.dbn.rbm_layers[i].params) 352 | 353 | 354 | def get_corrupted_input(self, input, corruption_level): 355 | return self.theano_rng.binomial(size=input.shape, n=1, 356 | p=1 - corruption_level, 357 | dtype=theano.config.floatX) * input 358 | 359 | def get_cost_updates(self, learning_rate, add_noise = False, obj_fn = 'cross_entropy'): 360 | """ This 
function computes the cost and the updates for one training
361 |         step of the dA """
362 | 
363 |         # Encoding input data
364 |         fwd_pass = self.x
365 |         for i in range(self.n_layers):
366 |             _, fwd_pass = self.dbn.rbm_layers[i].propup(fwd_pass)
367 | 
368 |         # Optionally perturb the encoded data with noise
369 |         if add_noise:
370 |             fwd_pass += self.noise
371 | 
372 |         # Decoding encoded input data
373 |         for i in reversed(range(self.n_layers)):
374 |             _, fwd_pass = self.dbn.rbm_layers[i].propdown(fwd_pass)
375 | 
376 |         if obj_fn == 'cross_entropy':
377 |             # ------ Objective Function: multinomial cross entropy ------ #
378 |             #L = - T.sum(self.x * T.log(fwd_pass), axis=1)
379 |             x_normalized = self.x / self.dbn.rbm_layers[0].input_rSum[:,None]
380 |             L = - T.sum(x_normalized * T.log(fwd_pass), axis=1)
381 |         else:
382 |             # ------ Objective Function: square error ------ #
383 |             # rightfully, this should be followed by L = L / len(vocab), but linear scaling
384 |             # does not affect the search for minima and is therefore omitted
385 |             L = T.sum( T.pow( fwd_pass*self.dbn.rbm_layers[0].input_rSum[:,None] - self.x, 2), axis=1)
386 | 
387 |         # mean cost
388 |         cost = T.mean(L)
389 | 
390 |         # compute the gradients of the cost of the `dA` with respect
391 |         # to its parameters
392 |         gparams = T.grad(cost, self.params)
393 | 
394 |         # generate the list of updates
395 |         updates = [
396 |             (param, param - learning_rate * gparam)
397 |             for param, gparam in zip(self.params, gparams)
398 |         ]
399 | 
400 |         return (cost, updates)
401 | 
402 | 
403 |     def train(self, input, epochs=1000, batch_size=500, learning_rate=None, add_noise=None,
404 |               obj_fn='cross_entropy', output_path='params/ae_params'):
405 | 
406 |         N_input_x = input.get_value(borrow=True).shape[0]
407 | 
408 |         # compute number of minibatches for training
409 |         if N_input_x % batch_size != 0:
410 |             N_splits = int( np.floor(N_input_x / batch_size) + 1 )
411 |         else:
412 |             N_splits = int( np.floor(N_input_x / batch_size) )
413 | 
414 |         # get the autoencoding training function
415 |         print('... getting the finetuning functions')
416 |         train_dae = self.dbn.auto_encoding(
417 |             input=input,
418 |             batch_size=batch_size,
419 |             learning_rate=learning_rate,
420 |             add_noise=add_noise,
421 |             obj_fn=obj_fn
422 |         )
423 | 
424 |         print('... finetuning the model')
425 | 
426 |         #########################
427 |         #  TRAINING THE MODEL   #
428 |         #########################
429 |         if not os.path.isdir(output_path):
430 |             os.makedirs(output_path)
431 | 
432 |         start_time = timeit.default_timer()
433 | 
434 |         # go through training epochs
435 |         cost_profile = []
436 |         for epoch in range(epochs):
437 | 
438 |             # go through the training set
439 |             c = []
440 |             for batch_index in range(N_splits):
441 |                 c.append(train_dae(batch_index))
442 | 
443 |             # saving and printing iterations
444 |             cost_profile += [np.mean(c, dtype='float64')]
445 |             if epoch % 100 == 0:
446 |                 #-----------------------------------------------------------------------------#
447 |                 #----------- Saving the current best model -----------------------------------#
448 |                 #-----------------------------------------------------------------------------#
449 |                 print('Saving model...')
450 |                 # save the model parameters for all layers
451 |                 for i in range(self.n_layers):
452 |                     pickle.dump( self.dbn.rbm_layers[i].__getstate__(), \
453 |                                  open(output_path + '/ae_layer_' + str(i) + '.pkl', 'wb') )
454 |                 print('...model saved')
455 |                 # save the proxy likelihood profile
456 |                 pd.DataFrame(data = {'cross_entropy' : cost_profile} ). 
\ 457 | to_csv( output_path +'/cost_profile.csv', index = False) 458 | print('Training epoch %d, cost ' % epoch, cost_profile[epoch]) 459 | #-----------------------------------------------------------------------------# 460 | 461 | print('Saving model...') 462 | # save the model parameters for all layers 463 | for i in range(self.n_layers): 464 | pickle.dump( self.dbn.rbm_layers[i].__getstate__(), \ 465 | open(output_path +'/ae_layer_' + str(i) + '.pkl', 'wb') ) 466 | print('...model saved') 467 | # save the proxy likelihood profile 468 | pd.DataFrame(data = {'cross_entropy' : cost_profile} ). \ 469 | to_csv( output_path +'/cost_profile.csv', index = False) 470 | print('Training epoch %d, cost ' % epoch, cost_profile[epoch]) 471 | 472 | end_time = timeit.default_timer() 473 | 474 | training_time = (end_time - start_time) 475 | 476 | print(('Training ran for %.2fm' % (training_time / 60.)), file=sys.stderr) 477 | 478 | def score(self, input, batch_size = 2000): 479 | train_set_x = input 480 | N_input_x = train_set_x.shape[0] 481 | 482 | # compute number of minibatches for scoring 483 | if train_set_x.get_value(borrow=True).shape[0] % batch_size != 0: 484 | N_splits = int( np.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) + 1 ) 485 | else: 486 | N_splits = int( np.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) ) 487 | 488 | # allocate symbolic variables for the data 489 | index = T.lscalar() # index to a [mini]batch 490 | 491 | # input_rSum must be specified for the RSM layer 492 | x = T.matrix('x') 493 | self.dbn.rbm_layers[0].input_rSum = x.sum(axis=1) 494 | activation = x 495 | for i in range(self.n_layers): 496 | _, activation = self.dbn.rbm_layers[i].propup(activation) 497 | 498 | # it is ok for a theano function to have no output 499 | # the purpose of train_rbm is solely to update the RBM parameters 500 | score = theano.function( 501 | inputs = [index], 502 | outputs = activation, 503 | givens={ 504 | x: train_set_x[index * batch_size: (index + 1) * batch_size] 505 | } 506 | ) 507 | 508 | return np.concatenate( [score(ii) for ii in range(N_splits)], axis=0 ) -------------------------------------------------------------------------------- /lib/deeplearning.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/deeplearning.pyc -------------------------------------------------------------------------------- /lib/mlp.py: -------------------------------------------------------------------------------- 1 | """ 2 | This tutorial introduces the multilayer perceptron using Theano. 3 | 4 | A multilayer perceptron is a logistic regressor where 5 | instead of feeding the input to the logistic regression you insert a 6 | intermediate layer, called the hidden layer, that has a nonlinear 7 | activation function (usually tanh or sigmoid) . One can use many such 8 | hidden layers making the architecture deep. The tutorial will also tackle 9 | the problem of MNIST digit classification. 10 | 11 | .. math:: 12 | 13 | f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x))), 14 | 15 | References: 16 | 17 | - textbooks: "Pattern Recognition and Machine Learning" - 18 | Christopher M. 
Bishop, section 5 19 | 20 | """ 21 | 22 | from __future__ import print_function 23 | 24 | __docformat__ = 'restructedtext en' 25 | 26 | import os 27 | import sys 28 | import timeit 29 | 30 | import numpy 31 | 32 | import theano 33 | import theano.tensor as T 34 | 35 | 36 | # start-snippet-1 37 | class HiddenLayer(object): 38 | 39 | def __getstate__(self): 40 | weights = [p.get_value() for p in self.params] 41 | return weights 42 | 43 | def __setstate__(self, weights): 44 | i = iter(weights) 45 | for p in self.params: 46 | p.set_value(next(i)) 47 | 48 | def __init__(self, rng, input, n_in, n_out, W=None, b=None, 49 | activation=T.tanh): 50 | """ 51 | Typical hidden layer of a MLP: units are fully-connected and have 52 | sigmoidal activation function. Weight matrix W is of shape (n_in,n_out) 53 | and the bias vector b is of shape (n_out,). 54 | 55 | NOTE : The nonlinearity used here is tanh 56 | 57 | Hidden unit activation is given by: tanh(dot(input,W) + b) 58 | 59 | :type rng: numpy.random.RandomState 60 | :param rng: a random number generator used to initialize weights 61 | 62 | :type input: theano.tensor.dmatrix 63 | :param input: a symbolic tensor of shape (n_examples, n_in) 64 | 65 | :type n_in: int 66 | :param n_in: dimensionality of input 67 | 68 | :type n_out: int 69 | :param n_out: number of hidden units 70 | 71 | :type activation: theano.Op or function 72 | :param activation: Non linearity to be applied in the hidden 73 | layer 74 | """ 75 | self.input = input 76 | # end-snippet-1 77 | 78 | # `W` is initialized with `W_values` which is uniformely sampled 79 | # from sqrt(-6./(n_in+n_hidden)) and sqrt(6./(n_in+n_hidden)) 80 | # for tanh activation function 81 | # the output of uniform if converted using asarray to dtype 82 | # theano.config.floatX so that the code is runable on GPU 83 | # Note : optimal initialization of weights is dependent on the 84 | # activation function used (among other things). 85 | # For example, results presented in [Xavier10] suggest that you 86 | # should use 4 times larger initial weights for sigmoid 87 | # compared to tanh 88 | # We have no info for other function, so we use the same as 89 | # tanh. 90 | if W is None: 91 | W_values = numpy.asarray( 92 | rng.uniform( 93 | low=-numpy.sqrt(6. / (n_in + n_out)), 94 | high=numpy.sqrt(6. / (n_in + n_out)), 95 | size=(n_in, n_out) 96 | ), 97 | dtype=theano.config.floatX 98 | ) 99 | if activation == theano.tensor.nnet.sigmoid: 100 | W_values *= 4 101 | 102 | W = theano.shared(value=W_values, name='W', borrow=True) 103 | 104 | if b is None: 105 | b_values = numpy.zeros((n_out,), dtype=theano.config.floatX) 106 | b = theano.shared(value=b_values, name='b', borrow=True) 107 | 108 | self.W = W 109 | self.b = b 110 | 111 | lin_output = T.dot(input, self.W) + self.b 112 | self.output = ( 113 | lin_output if activation is None 114 | else activation(lin_output) 115 | ) 116 | # parameters of the model 117 | self.params = [self.W, self.b] 118 | 119 | """ 120 | This tutorial introduces logistic regression using Theano and stochastic 121 | gradient descent. 122 | 123 | Logistic regression is a probabilistic, linear classifier. It is parametrized 124 | by a weight matrix :math:`W` and a bias vector :math:`b`. Classification is 125 | done by projecting data points onto a set of hyperplanes, the distance to 126 | which is used to determine a class membership probability. 127 | 128 | Mathematically, this can be written as: 129 | 130 | .. 
math:: 131 | P(Y=i|x, W,b) &= softmax_i(W x + b) \\ 132 | &= \frac {e^{W_i x + b_i}} {\sum_j e^{W_j x + b_j}} 133 | 134 | 135 | The output of the model or prediction is then done by taking the argmax of 136 | the vector whose i'th element is P(Y=i|x). 137 | 138 | .. math:: 139 | 140 | y_{pred} = argmax_i P(Y=i|x,W,b) 141 | 142 | 143 | This tutorial presents a stochastic gradient descent optimization method 144 | suitable for large datasets. 145 | 146 | 147 | References: 148 | 149 | - textbooks: "Pattern Recognition and Machine Learning" - 150 | Christopher M. Bishop, section 4.3.2 151 | 152 | """ 153 | 154 | 155 | class LogisticRegression(object): 156 | """Multi-class Logistic Regression Class 157 | 158 | The logistic regression is fully described by a weight matrix :math:`W` 159 | and bias vector :math:`b`. Classification is done by projecting data 160 | points onto a set of hyperplanes, the distance to which is used to 161 | determine a class membership probability. 162 | """ 163 | 164 | def __getstate__(self): 165 | weights = [p.get_value() for p in self.params] 166 | return weights 167 | 168 | def __setstate__(self, weights): 169 | i = iter(weights) 170 | for p in self.params: 171 | p.set_value(next(i)) 172 | 173 | def __init__(self, input, n_in, n_out): 174 | """ Initialize the parameters of the logistic regression 175 | 176 | :type input: theano.tensor.TensorType 177 | :param input: symbolic variable that describes the input of the 178 | architecture (one minibatch) 179 | 180 | :type n_in: int 181 | :param n_in: number of input units, the dimension of the space in 182 | which the datapoints lie 183 | 184 | :type n_out: int 185 | :param n_out: number of output units, the dimension of the space in 186 | which the labels lie 187 | 188 | """ 189 | # start-snippet-1 190 | # initialize with 0 the weights W as a matrix of shape (n_in, n_out) 191 | self.W = theano.shared( 192 | value=numpy.zeros( 193 | (n_in, n_out), 194 | dtype=theano.config.floatX 195 | ), 196 | name='W', 197 | borrow=True 198 | ) 199 | # initialize the biases b as a vector of n_out 0s 200 | self.b = theano.shared( 201 | value=numpy.zeros( 202 | (n_out,), 203 | dtype=theano.config.floatX 204 | ), 205 | name='b', 206 | borrow=True 207 | ) 208 | 209 | # symbolic expression for computing the matrix of class-membership 210 | # probabilities 211 | # Where: 212 | # W is a matrix where column-k represent the separation hyperplane for 213 | # class-k 214 | # x is a matrix where row-j represents input training sample-j 215 | # b is a vector where element-k represent the free parameter of 216 | # hyperplane-k 217 | self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b) 218 | 219 | # symbolic description of how to compute prediction as class whose 220 | # probability is maximal 221 | self.y_pred = T.argmax(self.p_y_given_x, axis=1) 222 | # end-snippet-1 223 | 224 | # parameters of the model 225 | self.params = [self.W, self.b] 226 | 227 | # keep track of model input 228 | self.input = input 229 | 230 | def negative_log_likelihood(self, y): 231 | """Return the mean of the negative log-likelihood of the prediction 232 | of this model under a given target distribution. 233 | 234 | .. 
math:: 235 | 236 | \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) = 237 | \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|} 238 | \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\ 239 | \ell (\theta=\{W,b\}, \mathcal{D}) 240 | 241 | :type y: theano.tensor.TensorType 242 | :param y: corresponds to a vector that gives for each example the 243 | correct label 244 | 245 | Note: we use the mean instead of the sum so that 246 | the learning rate is less dependent on the batch size 247 | """ 248 | # start-snippet-2 249 | # y.shape[0] is (symbolically) the number of rows in y, i.e., 250 | # number of examples (call it n) in the minibatch 251 | # T.arange(y.shape[0]) is a symbolic vector which will contain 252 | # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of 253 | # Log-Probabilities (call it LP) with one row per example and 254 | # one column per class LP[T.arange(y.shape[0]),y] is a vector 255 | # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ..., 256 | # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is 257 | # the mean (across minibatch examples) of the elements in v, 258 | # i.e., the mean log-likelihood across the minibatch. 259 | return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y]) 260 | # end-snippet-2 261 | 262 | def errors(self, y): 263 | """Return a float representing the number of errors in the minibatch 264 | over the total number of examples of the minibatch ; zero one 265 | loss over the size of the minibatch 266 | 267 | :type y: theano.tensor.TensorType 268 | :param y: corresponds to a vector that gives for each example the 269 | correct label 270 | """ 271 | 272 | # check if y has same dimension of y_pred 273 | if y.ndim != self.y_pred.ndim: 274 | raise TypeError( 275 | 'y should have the same shape as self.y_pred', 276 | ('y', y.type, 'y_pred', self.y_pred.type) 277 | ) 278 | # check if y is of the correct datatype 279 | if y.dtype.startswith('int'): 280 | # the T.neq operator returns a vector of 0s and 1s, where 1 281 | # represents a mistake in prediction 282 | return T.mean(T.neq(self.y_pred, y)) 283 | else: 284 | raise NotImplementedError() 285 | 286 | # start-snippet-2 287 | class MLP(object): 288 | """Multi-Layer Perceptron Class 289 | 290 | A multilayer perceptron is a feedforward artificial neural network model 291 | that has one layer or more of hidden units and nonlinear activations. 292 | Intermediate layers usually have as activation function tanh or the 293 | sigmoid function (defined here by a ``HiddenLayer`` class) while the 294 | top layer is a softmax layer (defined here by a ``LogisticRegression`` 295 | class). 
296 | """ 297 | 298 | def __init__(self, rng, input, n_in, n_hidden, n_out): 299 | """Initialize the parameters for the multilayer perceptron 300 | 301 | :type rng: numpy.random.RandomState 302 | :param rng: a random number generator used to initialize weights 303 | 304 | :type input: theano.tensor.TensorType 305 | :param input: symbolic variable that describes the input of the 306 | architecture (one minibatch) 307 | 308 | :type n_in: int 309 | :param n_in: number of input units, the dimension of the space in 310 | which the datapoints lie 311 | 312 | :type n_hidden: int 313 | :param n_hidden: number of hidden units 314 | 315 | :type n_out: int 316 | :param n_out: number of output units, the dimension of the space in 317 | which the labels lie 318 | 319 | """ 320 | 321 | # Since we are dealing with a one hidden layer MLP, this will translate 322 | # into a HiddenLayer with a tanh activation function connected to the 323 | # LogisticRegression layer; the activation function can be replaced by 324 | # sigmoid or any other nonlinear function 325 | self.hiddenLayer = HiddenLayer( 326 | rng=rng, 327 | input=input, 328 | n_in=n_in, 329 | n_out=n_hidden, 330 | activation=T.tanh 331 | ) 332 | 333 | # The logistic regression layer gets as input the hidden units 334 | # of the hidden layer 335 | self.logRegressionLayer = LogisticRegression( 336 | input=self.hiddenLayer.output, 337 | n_in=n_hidden, 338 | n_out=n_out 339 | ) 340 | # end-snippet-2 start-snippet-3 341 | # L1 norm ; one regularization option is to enforce L1 norm to 342 | # be small 343 | self.L1 = ( 344 | abs(self.hiddenLayer.W).sum() 345 | + abs(self.logRegressionLayer.W).sum() 346 | ) 347 | 348 | # square of L2 norm ; one regularization option is to enforce 349 | # square of L2 norm to be small 350 | self.L2_sqr = ( 351 | (self.hiddenLayer.W ** 2).sum() 352 | + (self.logRegressionLayer.W ** 2).sum() 353 | ) 354 | 355 | # negative log likelihood of the MLP is given by the negative 356 | # log likelihood of the output of the model, computed in the 357 | # logistic regression layer 358 | self.negative_log_likelihood = ( 359 | self.logRegressionLayer.negative_log_likelihood 360 | ) 361 | # same holds for the function computing the number of errors 362 | self.errors = self.logRegressionLayer.errors 363 | 364 | # the parameters of the model are the parameters of the two layer it is 365 | # made out of 366 | self.params = self.hiddenLayer.params + self.logRegressionLayer.params 367 | # end-snippet-3 368 | 369 | # keep track of model input 370 | self.input = input -------------------------------------------------------------------------------- /lib/mlp.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/mlp.pyc -------------------------------------------------------------------------------- /lib/rbm.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/krenova/DeepLearningTopicModels/f13e4c71f65a56b26856327197a193d39c882c71/lib/rbm.pyc -------------------------------------------------------------------------------- /notebooks/train_sae.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [ 10 | { 11 | "name": "stderr", 12 | "output_type": "stream", 13 | "text": 
[ 14 |     "Using gpu device 0: Tesla K40c (CNMeM is disabled, cuDNN 5105)\n",
15 |     "/home/ekhongl/.conda/envs/py3/lib/python3.5/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.\n",
16 |     "  warnings.warn(warn)\n"
17 |    ]
18 |   }
19 |  ],
20 |  "source": [
21 |   "from __future__ import print_function, division\n",
22 |   "import os\n",
23 |   "os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))\n",
24 |   "\n",
25 |   "import sys\n",
26 |   "import timeit\n",
27 |   "from six.moves import cPickle as pickle\n",
28 |   "\n",
29 |   "import numpy as np\n",
30 |   "import pandas as pd\n",
31 |   "\n",
32 |   "import theano\n",
33 |   "import theano.tensor as T\n",
34 |   "\n",
35 |   "from lib.deeplearning import autoencoder"
36 |  ]
37 | },
38 | {
39 |  "cell_type": "code",
40 |  "execution_count": 2,
41 |  "metadata": {
42 |   "collapsed": false
43 |  },
44 |  "outputs": [],
45 |  "source": [
46 |   "dat_x = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header = 1)\n",
47 |   "dat_y = dat_x[:,0]\n",
48 |   "dat_x = dat_x[:,1:]\n",
49 |   "vocab = np.genfromtxt('data/dtm_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]\n",
50 |   "test_input = theano.shared(dat_x)"
51 |  ]
52 | },
53 | {
54 |  "cell_type": "markdown",
55 |  "metadata": {},
56 |  "source": [
57 |   "## Loading weights pretrained from the Deep Belief Net (DBN) to the Autoencoder"
58 |  ]
59 | },
60 | {
61 |  "cell_type": "code",
62 |  "execution_count": 3,
63 |  "metadata": {
64 |   "collapsed": false,
65 |   "scrolled": true
66 |  },
67 |  "outputs": [
68 |   {
69 |    "name": "stdout",
70 |    "output_type": "stream",
71 |    "text": [
72 |     "Building layer: 0\n",
73 |     " Input units: 2756\n",
74 |     " Output units: 500\n",
75 |     "Building layer: 1\n",
76 |     " Input units: 500\n",
77 |     " Output units: 500\n",
78 |     "Building layer: 2\n",
79 |     " Input units: 500\n",
80 |     " Output units: 128\n"
81 |    ]
82 |   }
83 |  ],
84 |  "source": [
85 |   "model = autoencoder( architecture = [2756, 500, 500, 128], opt_epochs = [900,5,10], model_src = 'params/dbn_params')"
86 |  ]
87 | },
88 | {
89 |  "cell_type": "markdown",
90 |  "metadata": {},
91 |  "source": [
92 |   "## Training the Autoencoder"
93 |  ]
94 | },
95 | {
96 |  "cell_type": "code",
97 |  "execution_count": null,
98 |  "metadata": {
99 |   "collapsed": false,
100 |   "scrolled": true
101 |  },
102 |  "outputs": [
103 |   {
104 |    "name": "stdout",
105 |    "output_type": "stream",
106 |    "text": [
107 |     "... getting the finetuning functions\n",
108 |     "... 
finetuning the model\n" 109 | ] 110 | } 111 | ], 112 | "source": [ 113 | "model.train(test_input, batch_size = 100, epochs = 110, add_noise = 16, output_path = 'params/to_delete')" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "## Loading the trained Auto-Encoder" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 4, 126 | "metadata": { 127 | "collapsed": false 128 | }, 129 | "outputs": [ 130 | { 131 | "name": "stdout", 132 | "output_type": "stream", 133 | "text": [ 134 | "Building layer: 0\n", 135 | " Input units: 2000\n", 136 | " Output units: 500\n", 137 | "Building layer: 1\n", 138 | " Input units: 500\n", 139 | " Output units: 500\n", 140 | "Building layer: 2\n", 141 | " Input units: 500\n", 142 | " Output units: 128\n", 143 | "Loading the trained auto-encoder parameters.\n", 144 | "...please ensure that the auto-encoder params matches the defined architecture.\n" 145 | ] 146 | } 147 | ], 148 | "source": [ 149 | "model = autoencoder( architecture = [2000, 500, 500, 128], model_src = 'params_2000/ae_train', param_type = 'ae')" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "## Extracting features from the trained Auto-Encoder" 157 | ] 158 | }, 159 | { 160 | "cell_type": "code", 161 | "execution_count": 8, 162 | "metadata": { 163 | "collapsed": false 164 | }, 165 | "outputs": [], 166 | "source": [ 167 | "output = model.score(test_input)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "## Saving the features extracted" 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 8, 180 | "metadata": { 181 | "collapsed": false 182 | }, 183 | "outputs": [], 184 | "source": [ 185 | "colnames = ['bit'] * 128\n", 186 | "colnames = [colnames[i] + str(i) for i in range(128)]\n", 187 | "colnames.insert(0,'_label_')\n", 188 | "pd.DataFrame(data = np.c_[dat_y, output], \n", 189 | " columns = colnames). 
\\\n", 190 | "                to_csv( 'data/ae_features.csv', index = False)"
191 |    ]
192 |   },
193 |   {
194 |    "cell_type": "markdown",
195 |    "metadata": {},
196 |    "source": [
197 |     "# Visualizing the convergence behavior"
198 |    ]
199 |   },
200 |   {
201 |    "cell_type": "code",
202 |    "execution_count": 9,
203 |    "metadata": {
204 |     "collapsed": false
205 |    },
206 |    "outputs": [
207 |     {
208 |      "data": {
209 |       "image/png": "<base64-encoded PNG omitted: line plot of the auto-encoder fine-tuning cost decreasing and flattening out over training epochs>",
210 |       "text/plain": [
211 |        ""
212 |       ]
213 |      },
214 |      "metadata": {},
215 |      "output_type": "display_data"
216 |     }
217 |    ],
218 |    "source": [
219 |     "import numpy as np\n",
220 |     "import matplotlib.pyplot as plt\n",
221 |     "%matplotlib inline\n",
222 |     "plt_dat = np.genfromtxt('params_2000/ae_train/cost_profile.csv', delimiter=',', names = True)\n",
223 |     "plt.plot(plt_dat)\n",
224 |     "plt.show()"
225 |    ]
226 |   }
227 |  ],
228 |  "metadata": {
229 |   "anaconda-cloud": {},
230 |   "kernelspec": {
231 |    "display_name": "Python [conda env:py3]",
232 |    "language": "python",
233 |    "name": "conda-env-py3-py"
234 |   },
235 |   "language_info": {
236 |    "codemirror_mode": {
237 |     "name": "ipython",
238 |     "version": 3
239 |    },
240 |    "file_extension": ".py",
241 |    "mimetype": "text/x-python",
242 |    "name": "python",
243 |    "nbconvert_exporter": "python",
244 |    "pygments_lexer": "ipython3",
245 |    "version": "3.5.3"
246 |   }
247 |  },
248 |  "nbformat": 4,
249 |  "nbformat_minor": 2
250 | }
251 | -------------------------------------------------------------------------------- /notebooks/train_sae_2000.ipynb: --------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "code",
5 |    "execution_count": 1,
6 |    "metadata": {
7 |     "collapsed": false
8 |    },
9 |    "outputs": [
10 |     {
11 |      "name": "stderr",
12 |      "output_type": "stream",
13 |      "text": [
14 |       "Using gpu device 0: Tesla K40c (CNMeM is disabled, cuDNN 5105)\n",
15 |       "/home/ekhongl/.conda/envs/py3/lib/python3.5/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. 
If you see any problems, try updating Theano or downgrading cuDNN to version 5.\n",
16 |       " warnings.warn(warn)\n"
17 |      ]
18 |     }
19 |    ],
20 |    "source": [
21 |     "from __future__ import print_function, division\n",
22 |     "import os\n",
23 |     "os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))\n",
24 |     "\n",
25 |     "import sys\n",
26 |     "import timeit\n",
27 |     "from six.moves import cPickle as pickle\n",
28 |     "\n",
29 |     "import numpy as np\n",
30 |     "import pandas as pd\n",
31 |     "\n",
32 |     "import theano\n",
33 |     "import theano.tensor as T\n",
34 |     "\n",
35 |     "from lib.deeplearning import autoencoder"
36 |    ]
37 |   },
38 |   {
39 |    "cell_type": "code",
40 |    "execution_count": 2,
41 |    "metadata": {
42 |     "collapsed": false
43 |    },
44 |    "outputs": [],
45 |    "source": [
46 |     "dat_x = np.genfromtxt('data/dtm_2000_20news.csv', dtype='float32', delimiter=',', skip_header = 1)\n",
47 |     "dat_y = dat_x[:,0]\n",
48 |     "dat_x = dat_x[:,1:]\n",
49 |     "vocab = np.genfromtxt('data/dtm_2000_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]\n",
50 |     "test_input = theano.shared(dat_x)"
51 |    ]
52 |   },
53 |   {
54 |    "cell_type": "markdown",
55 |    "metadata": {},
56 |    "source": [
57 |     "## Loading weights pretrained from the Deep Belief Net (DBN) to the Autoencoder"
58 |    ]
59 |   },
60 |   {
61 |    "cell_type": "code",
62 |    "execution_count": 4,
63 |    "metadata": {
64 |     "collapsed": false,
65 |     "scrolled": true
66 |    },
67 |    "outputs": [
68 |     {
69 |      "name": "stdout",
70 |      "output_type": "stream",
71 |      "text": [
72 |       "Building layer: 0\n",
73 |       " Input units: 2000\n",
74 |       " Output units: 500\n",
75 |       "Building layer: 1\n",
76 |       " Input units: 500\n",
77 |       " Output units: 500\n",
78 |       "Building layer: 2\n",
79 |       " Input units: 500\n",
80 |       " Output units: 128\n"
81 |      ]
82 |     }
83 |    ],
84 |    "source": [
85 |     "model = autoencoder( architecture = [2000, 500, 500, 128], opt_epochs = [110,15,10], model_src = 'params_2000/dbn_params_pretrain')"
86 |    ]
87 |   },
88 |   {
89 |    "cell_type": "markdown",
90 |    "metadata": {},
91 |    "source": [
92 |     "## Training the Autoencoder"
93 |    ]
94 |   },
95 |   {
96 |    "cell_type": "code",
97 |    "execution_count": 5,
98 |    "metadata": {
99 |     "collapsed": false
100 |    },
101 |    "outputs": [
102 |     {
103 |      "name": "stdout",
104 |      "output_type": "stream",
105 |      "text": [
106 |       "... getting the finetuning functions\n",
107 |       "... 
finetuning the model\n", 108 | "Saving model...\n", 109 | "...model saved\n", 110 | "Training epoch 0, cost 7.79978342056\n", 111 | "Saving model...\n", 112 | "...model saved\n", 113 | "Training epoch 100, cost 7.48429107666\n", 114 | "Saving model...\n", 115 | "...model saved\n", 116 | "Training epoch 109, cost 7.46735124588\n" 117 | ] 118 | }, 119 | { 120 | "name": "stderr", 121 | "output_type": "stream", 122 | "text": [ 123 | "Training ran for 0.29m\n" 124 | ] 125 | } 126 | ], 127 | "source": [ 128 | "model.train(test_input, batch_size = 200, epochs = 110, add_noise = 16, output_path = 'params_2000/ae_train')" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "## Loading the trained Auto-Encoder" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 3, 141 | "metadata": { 142 | "collapsed": false 143 | }, 144 | "outputs": [ 145 | { 146 | "name": "stdout", 147 | "output_type": "stream", 148 | "text": [ 149 | "Building layer: 0\n", 150 | " Input units: 2000\n", 151 | " Output units: 500\n", 152 | "Building layer: 1\n", 153 | " Input units: 500\n", 154 | " Output units: 500\n", 155 | "Building layer: 2\n", 156 | " Input units: 500\n", 157 | " Output units: 128\n", 158 | "Loading the trained auto-encoder parameters.\n", 159 | "...please ensure that the auto-encoder params matches the defined architecture.\n" 160 | ] 161 | } 162 | ], 163 | "source": [ 164 | "model = autoencoder( architecture = [2000, 500, 500, 128], model_src = 'params_2000/ae_train_nonoise', param_type = 'ae')" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "## Extracting features from the trained Auto-Encoder" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": 4, 177 | "metadata": { 178 | "collapsed": false 179 | }, 180 | "outputs": [], 181 | "source": [ 182 | "output = model.score(test_input)" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "## Saving the features extracted" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": 27, 195 | "metadata": { 196 | "collapsed": false 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "colnames = ['bit'] * 128\n", 201 | "colnames = [colnames[i] + str(i) for i in range(128)]\n", 202 | "colnames.insert(0,'_label_')\n", 203 | "pd.DataFrame(data = np.c_[dat_y, output], \n", 204 | " columns = colnames). 
\\\n", 204 | "                to_csv( 'data/ae_features_2000_nonoise.csv', index = False)"
205 |    ]
206 |   },
207 |   {
208 |    "cell_type": "markdown",
209 |    "metadata": {},
210 |    "source": [
211 |     "# Visualizing the convergence behavior"
212 |    ]
213 |   },
214 |   {
215 |    "cell_type": "code",
216 |    "execution_count": 24,
217 |    "metadata": {
218 |     "collapsed": false
219 |    },
220 |    "outputs": [
221 |     {
222 |      "data": {
223 |       "image/png":
224 |        "<base64-encoded PNG omitted: line plot of the fine-tuning cost profile over training epochs>"
, 225 | "text/plain": [ 226 | "" 227 | ] 228 | }, 229 | "metadata": {}, 230 | "output_type": "display_data" 231 | } 232 | ], 233 | "source": [ 234 | "import numpy as np\n", 235 | "import matplotlib.pyplot as plt\n", 236 | "%matplotlib inline\n", 237 | "plt_dat = np.genfromtxt('params_2000/ae_train_nonoise/cost_profile.csv', delimiter=',', names = True)\n", 238 | "plt.plot(plt_dat)\n", 239 | "plt.show()" 240 | ] 241 | } 242 | ], 243 | "metadata": { 244 | "anaconda-cloud": {}, 245 | "kernelspec": { 246 | "display_name": "Python [conda env:py3]", 247 | "language": "python", 248 | "name": "conda-env-py3-py" 249 | }, 250 | "language_info": { 251 | "codemirror_mode": { 252 | "name": "ipython", 253 | "version": 3 254 | }, 255 | "file_extension": ".py", 256 | "mimetype": "text/x-python", 257 | "name": "python", 258 | "nbconvert_exporter": "python", 259 | "pygments_lexer": "ipython3", 260 | "version": "3.5.3" 261 | } 262 | }, 263 | "nbformat": 4, 264 | "nbformat_minor": 2 265 | } 266 | -------------------------------------------------------------------------------- /scripts_R/clustering.R: -------------------------------------------------------------------------------- 1 | 2 | setwd('/home/ekhongl/Codes/DL - Topic Modelling') 3 | library(ggplot2) 4 | library(dplyr) 5 | 6 | 7 | 8 | 9 | 10 | 11 | #---------------------------------------------------------------- 12 | # [0] Data Prep 13 | #---------------------------------------------------------------- 14 | map_class <- function(x){ 15 | return( c(3,0,0,0,0,0,5,1,1,1,1,2,2,2,2,3,4,4,4,3)[0:19 %in% x] ) 16 | } 17 | 18 | Y_levels <- c('alt.atheism','comp.graphics','comp.os.ms-windows.misc','comp.sys.ibm.pc.hardware', 19 | 'comp.sys.mac.hardware','comp.windows.x','misc.forsale','rec.autos','rec.motorcycles', 20 | 'rec.sport.baseball','rec.sport.hockey','sci.crypt','sci.electronics','sci.med', 21 | 'sci.space','soc.religion.christian','talk.politics.guns','talk.politics.mideast', 22 | 'talk.politics.misc','talk.religion.misc') 23 | 24 | Y_overview_levels <- c('computer','hobby','science','religion','politics','sales') 25 | 26 | dat <- read.csv('data/ae_features_2000_nonoise.csv',check.names=FALSE) 27 | 28 | dat_raw <- read.table('data/raw_20news/20news.csv',header = T, sep=',', 29 | row.names = NULL, stringsAsFactors = FALSE) 30 | 31 | 32 | X <- dat[,2:ncol(dat)] 33 | Y <- as.factor(dat[,1]) 34 | Y_overview <- as.factor(sapply(Y,map_class)) 35 | levels(Y) <- Y_levels 36 | levels(Y_overview) <- Y_overview_levels 37 | #---------------------------------------------------------------- 38 | 39 | 40 | 41 | #---------------------------------------------------------------- 42 | # [1] Viewing the distribution of each neuron's output 43 | #---------------------------------------------------------------- 44 | for (i in 1:ncol(X)) { 45 | hist(X[,i],breaks=100) 46 | invisible(readline(prompt="Press [enter] to continue")) 47 | } 48 | 49 | hist(unlist(X),breaks=50) 50 | #---------------------------------------------------------------- 51 | 52 | 53 | 54 | 55 | #---------------------------------------------------------------- 56 | # [2] K-means clustering to segment the data 57 | #---------------------------------------------------------------- 58 | X.clust <- kmeans(X,centers = 20, nstart=2000, trace =1) 59 | X.clust_overview <- kmeans(X,centers = 6, nstart=2000, trace =1) 60 | 61 | for (i in 1:20) { 62 | plt_dat <- data.frame(labels = Y[X.clust$cluster == i]) %>% 63 | group_by(labels) %>% 64 | summarize( freq = n()) 65 | plt <- ggplot(plt_dat, aes(x=labels, y = freq)) + 
66 |     geom_bar(stat="identity") +
67 |     theme(axis.text.x = element_text(angle = 60, hjust = 1))
68 |   print(plt)
69 |   invisible(readline(prompt="Press [enter] to continue"))
70 |   rm(plt)
71 | }
72 | 
73 | for (i in sort(unique(X.clust_overview$cluster))) {
74 |   print( paste("----- Current cluster: ", i, " -----"))
75 |   plt_dat <- data.frame(labels = Y_overview[X.clust_overview$cluster == i]) %>%
76 |               group_by(labels) %>%
77 |               summarize( freq = n())
78 |   plt <- ggplot(plt_dat, aes(x=labels, y = freq)) + geom_bar(stat="identity")
79 |   print(plt)
80 |   invisible(readline(prompt="Press [enter] to continue"))
81 |   rm(plt)
82 | }
83 | 
84 | table(Y,X.clust$cluster)
85 | table(Y_overview,X.clust_overview$cluster)
86 | 
87 | write.csv(X.clust$cluster, 'data/clustered_output_nonoise.csv')
88 | write.csv(X.clust_overview$cluster, 'data/clustered_overview_output_nonoise.csv')
89 | #----------------------------------------------------------------
90 | 
91 | 
92 | save(Y,Y_overview, X.clust, X.clust_overview, file = "data/cluster_results.RData")
93 | 
94 | 
95 | # #----------------------------------------------------------------
96 | # # [3] multinomial mixture modelling to segment the data
97 | # #----------------------------------------------------------------
98 | # library(mixtools)
99 | # X_bin <- (X > 0.1)*1
100 | # 
101 | # mixout <- multmixEM(X_bin, lambda = NULL, theta = NULL, k = 6, 
102 | #                     maxit = 10000, epsilon = 1e-08, verb = TRUE)
103 | # 
104 | # mixout.clust <- apply( mixout$posterior, 1, function(x) which(x == max(x)) )
105 | # 
106 | # 
107 | # for (i in sort(unique(mixout.clust))) {
108 | #   print( paste("----- Current cluster: ", i, " -----"))
109 | #   plt_dat <- data.frame(labels = Y_overview[mixout.clust == i]) %>%
110 | #               group_by(labels) %>%
111 | #               summarize( freq = n())
112 | #   plt <- ggplot(plt_dat, aes(x=labels, y = freq)) + geom_bar(stat="identity")
113 | #   print(plt)
114 | #   invisible(readline(prompt="Press [enter] to continue"))
115 | #   rm(plt)
116 | # }
117 | # #----------------------------------------------------------------
118 | # 
119 | # 
120 | # 
121 | # 
122 | # #----------------------------------------------------------------
123 | # # [4] hierarchical clustering using hamming distance to segment the data
124 | # #----------------------------------------------------------------
125 | # library(e1071)
126 | # # import your binary data with read.table or read.delim; the following
127 | # # example uses random data
128 | # y <- matrix(sample(c(0,1), 100, replace=TRUE), 10, 10, 
129 | #             dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:10, sep="")))
130 | # disma <- hamming.distance(X_bin)
131 | # hr <- hclust(as.dist(disma))
132 | # plot(as.dendrogram(hr), edgePar=list(col=3, lwd=4), horiz=T)
133 | # 
134 | # hr.clust <- cutree(hr,k=6)
135 | # 
136 | # for (i in sort(unique(hr.clust))) {
137 | #   print( paste("----- Current cluster: ", i, " -----"))
138 | #   plt_dat <- data.frame(labels = Y_overview[hr.clust == i]) %>%
139 | #               group_by(labels) %>%
140 | #               summarize( freq = n())
141 | #   plt <- ggplot(plt_dat, aes(x=labels, y = freq)) + geom_bar(stat="identity")
142 | #   print(plt)
143 | #   invisible(readline(prompt="Press [enter] to continue"))
144 | # }
145 | # #----------------------------------------------------------------
146 | 
147 | 
148 | 
149 | 
150 | 
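The K-means segmentation in section [2] of the R script above has a direct Python analogue, and the same saved codes also support simple nearest-neighbour retrieval. A minimal sketch, assuming scikit-learn is installed (it is not a dependency of this repo) and the feature file written by the notebooks above:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

feats = pd.read_csv('data/ae_features_2000_nonoise.csv')
labels = feats.pop('_label_')
codes = feats.values

# 20 centroids to mirror the 20 newsgroup labels, as in the R script
km = KMeans(n_clusters=20, n_init=50, random_state=0).fit(codes)

# contingency table of discovered clusters against the true labels,
# the counterpart of table(Y, X.clust$cluster) above
print(pd.crosstab(labels, km.labels_))

# nearest-neighbour retrieval: L2-normalise the codes so that dot
# products become cosine similarities between documents
unit = codes / np.maximum(np.linalg.norm(codes, axis=1, keepdims=True), 1e-12)
sims = unit.dot(unit[0])         # similarity of every document to document 0
print(np.argsort(-sims)[1:6])    # indices of its five nearest neighbours

-------------------------------------------------------------------------------- /scripts_old/UT_1_gibbs_sampling.ipynb: --------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 | 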
"cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [ 10 | { 11 | "name": "stderr", 12 | "output_type": "stream", 13 | "text": [ 14 | "Using gpu device 0: Tesla K40c (CNMeM is disabled, cuDNN 5105)\n", 15 | "/home/ekhongl/.conda/envs/py3/lib/python3.5/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.\n", 16 | " warnings.warn(warn)\n" 17 | ] 18 | } 19 | ], 20 | "source": [ 21 | "import numpy as np\n", 22 | "import theano\n", 23 | "from theano import tensor as T\n", 24 | "theano_rng = T.shared_randomstreams.RandomStreams(1234)" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 2, 30 | "metadata": { 31 | "collapsed": false 32 | }, 33 | "outputs": [], 34 | "source": [ 35 | "W_values = np.array([[1,1],[1,1]], dtype=theano.config.floatX)\n", 36 | "bvis_values = np.array([0,0], dtype=theano.config.floatX)\n", 37 | "bhid_values = np.array([0,0], dtype=theano.config.floatX)\n", 38 | "W = theano.shared(W_values)\n", 39 | "vbias = theano.shared(bvis_values)\n", 40 | "hbias = theano.shared(bhid_values)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 3, 46 | "metadata": { 47 | "collapsed": true 48 | }, 49 | "outputs": [], 50 | "source": [ 51 | "def propup(vis, v0_doc_len):\n", 52 | " pre_sigmoid_activation = T.dot(vis, W) + T.dot(hbias,v0_doc_len).T #---------------------------[edited]\n", 53 | " return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]\n", 54 | "\n", 55 | "def sample_h_given_v(v0_sample, v0_doc_len):\n", 56 | " pre_sigmoid_h1, h1_mean = propup(v0_sample, v0_doc_len)\n", 57 | " h1_sample = theano_rng.binomial(size=h1_mean.shape,\n", 58 | " n=1, p=h1_mean,\n", 59 | " dtype=theano.config.floatX)\n", 60 | " return [pre_sigmoid_h1, h1_mean, h1_sample]\n", 61 | "\n", 62 | "def propdown(hid):\n", 63 | " pre_softmax_activation = T.dot(hid, W.T) + vbias #---------------------------[edited]\n", 64 | " return [pre_softmax_activation, T.nnet.softmax(pre_softmax_activation)]\n", 65 | "\n", 66 | "def sample_v_given_h(h0_sample, v0_doc_len):\n", 67 | " pre_softmax_v1, v1_mean = propdown(h0_sample)\n", 68 | " v1_sample = theano_rng.multinomial(size=None,\n", 69 | " n=v0_doc_len, pvals=v1_mean,\n", 70 | " dtype=theano.config.floatX) #---------------------------[edited]\n", 71 | " v1_doc_len = v1_sample.sum(axis=1).reshape([1,v1_sample.shape[0]])\n", 72 | " return [pre_softmax_v1, v1_mean, v1_sample, v1_doc_len]\n", 73 | "\n", 74 | "def gibbs_hvh(h0_sample, v0_doc_len):\n", 75 | " pre_softmax_v1, v1_mean, v1_sample, v1_doc_len = sample_v_given_h(h0_sample, v0_doc_len)\n", 76 | " pre_sigmoid_h1, h1_mean, h1_sample = sample_h_given_v(v1_sample, v0_doc_len)\n", 77 | " return [pre_sigmoid_h1, h1_mean, h1_sample,\n", 78 | " pre_softmax_v1, v1_mean, v1_sample, v1_doc_len] #---------------------------[edited]\n", 79 | "\n", 80 | "def gibbs_vhv(v0_sample, v0_doc_len):\n", 81 | " pre_sigmoid_h1, h1_mean, h1_sample = sample_h_given_v(v0_sample, v0_doc_len)\n", 82 | " pre_softmax_v1, v1_mean, v1_sample, v1_doc_len = sample_v_given_h(h1_sample, v0_doc_len)\n", 83 | " return [pre_sigmoid_h1, h1_mean, h1_sample,\n", 84 | " softmax_v1, v1_mean, v1_sample, v1_doc_len] #---------------------------[edited]" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 4, 90 | "metadata": { 91 | "collapsed": false 92 | }, 93 | 
"outputs": [], 94 | "source": [ 95 | "ipt = T.matrix()\n", 96 | "ipt_rSum = ipt.sum(axis=1).reshape([1,ipt.shape[0]])" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 5, 102 | "metadata": { 103 | "collapsed": false 104 | }, 105 | "outputs": [], 106 | "source": [ 107 | "pre_sigmoid_ph, ph_mean, ph_sample = sample_h_given_v(ipt, ipt_rSum)\n", 108 | "chain_start = ph_sample" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": 6, 114 | "metadata": { 115 | "collapsed": false 116 | }, 117 | "outputs": [], 118 | "source": [ 119 | "([pre_sigmoid_nhs,\n", 120 | " nh_means,\n", 121 | " nh_samples,\n", 122 | " pre_sigmoid_nvs,\n", 123 | " nv_means,\n", 124 | " nv_samples,\n", 125 | " nv_samples_sum], updates) = theano.scan(\n", 126 | " gibbs_hvh,\n", 127 | " outputs_info=[None, None, chain_start, None, None, None, ipt_rSum],\n", 128 | " n_steps=1,\n", 129 | " name=\"gibbs_hvh\" )" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 7, 135 | "metadata": { 136 | "collapsed": false 137 | }, 138 | "outputs": [], 139 | "source": [ 140 | "gibbs = theano.function( [ipt], outputs=[pre_sigmoid_nvs,\n", 141 | " nv_means,\n", 142 | " nv_samples,\n", 143 | " nv_samples_sum,\n", 144 | " pre_sigmoid_nhs,\n", 145 | " nh_means,\n", 146 | " nh_samples], updates=updates )" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 10, 152 | "metadata": { 153 | "collapsed": false 154 | }, 155 | "outputs": [ 156 | { 157 | "ename": "ValueError", 158 | "evalue": "dimension mismatch in args to gemm (1,2)x(2,2)->(2,1)\nApply node that caused the error: GpuGemm{no_inplace}(, TensorConstant{1.0}, GpuFromHost.0, GpuDimShuffle{1,0}.0, TensorConstant{1.0})\nToposort index: 26\nInputs types: [CudaNdarrayType(float32, matrix), TensorType(float32, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix), TensorType(float32, scalar)]\nInputs shapes: [(2, 1), (), (1, 2), (2, 2), ()]\nInputs strides: [(1, 0), (), (0, 1), (1, 2), ()]\nInputs values: [b'CudaNdarray([[ 0.]\\n [ 0.]])', array(1.0, dtype=float32), b'CudaNdarray([[ 1. 1.]])', b'CudaNdarray([[ 1. 1.]\\n [ 1. 1.]])', array(1.0, dtype=float32)]\nOutputs clients: [[GpuDimShuffle{0,1,x,x}(GpuGemm{no_inplace}.0), GpuDimShuffle{x,0,1}(GpuGemm{no_inplace}.0)]]\n\nHINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. 
If that does not work, Theano optimizations can be disabled with 'optimizer=None'.\nHINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.", 159 | "output_type": "error", 160 | "traceback": [ 161 | "ValueError                                Traceback (most recent call last)", 162 | "<ipython-input> in <module>(): ----> 2 gibbs(b)", 163 | "theano/compile/function_module.py in __call__(self, *args, **kwargs): --> 859 outputs = self.fn()", 164 | "ValueError: dimension mismatch in args to gemm (1,2)x(2,2)->(2,1)", 165 | "[terminal colour codes and repeated Theano frames omitted]" 173 | ] 174 | } 175 | ], 176 | "source": [ 177 | "b = np.array([[2,0,],[0,2],[1,1]], dtype = theano.config.floatX)\n", 178 | "gibbs(b)" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 22, 184 | "metadata": { 185 | "collapsed": false 186 | }, 187 | "outputs": [ 188 | { 189 | "ename": "ValueError", 190 | "evalue": "dimension mismatch in args to gemm (3,2)x(2,2)->(2,1)\nApply node that caused the error: GpuGemm{no_inplace}(, TensorConstant{1.0}, GpuFromHost.0, GpuDimShuffle{1,0}.0, TensorConstant{1.0})\nToposort index: 8\nInputs types: [CudaNdarrayType(float32, matrix), TensorType(float32, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix), TensorType(float32, scalar)]\nInputs shapes: [(2, 1), (), (3, 2), (2, 2), ()]\nInputs strides: [(1, 0), (), (2, 1), (1, 2), ()]\nInputs values: [b'CudaNdarray([[ 0.]\\n [ 0.]])', array(1.0, dtype=float32), 'not shown', b'CudaNdarray([[ 1. 1.]\\n [ 1. 1.]])', array(1.0, dtype=float32)]\nOutputs clients: [[GpuDimShuffle{0,1,x,x}(GpuGemm{no_inplace}.0), GpuDimShuffle{x,0,1}(GpuGemm{no_inplace}.0)]]\n\nHINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. 
If that does not work, Theano optimizations can be disabled with 'optimizer=None'.\nHINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.", 191 | "output_type": "error", 192 | "traceback": [ 193 | "ValueError                                Traceback (most recent call last)", 194 | "<ipython-input> in <module>(): ----> 2 [pre_sigmoid_h1, h1_mean, h1_sample, softmax_v1, v1_doc_len, v1_mean, v1_sample]= gibbs(b)", 195 | "theano/compile/function_module.py in __call__(self, *args, **kwargs): --> 859 outputs = self.fn()", 196 | "ValueError: dimension mismatch in args to gemm (3,2)x(2,2)->(2,1)", 197 | "[terminal colour codes and repeated Theano frames omitted]" 205 | ] 206 | } 207 | ], 208 | "source": [ 209 | "b = np.array([[2,0,],[0,2],[1,1]], dtype = theano.config.floatX)\n", 210 | "[pre_sigmoid_h1, h1_mean, h1_sample, softmax_v1, v1_doc_len, v1_mean, v1_sample]= gibbs(b)\n", 211 | "print(pre_sigmoid_h1)\n", 212 | "print('-------')\n", 213 | "print(h1_mean)\n", 214 | "print('-------')\n", 215 | "print(h1_sample)\n", 216 | "print('-------')\n", 217 | "print(softmax_v1)\n", 218 | "print('-------')\n", 219 | "print(v1_mean)\n", 220 | "print('-------')\n", 221 | "print(v1_sample)" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 193, 227 | "metadata": { 228 | "collapsed": false 229 | }, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "text/plain": [ 234 | "(3, 2)" 235 | ] 236 | }, 237 | "execution_count": 193, 238 | "metadata": {}, 239 | "output_type": "execute_result" 240 | } 241 | ], 242 | "source": [ 243 | "v1_sample.shape\n", 244 | "h1_mean[None,:,:].shape\n", 245 | "h1_mean.shape[1:]\n", 246 | "h1_mean.reshape(h1_mean.shape[1:]).shape" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 11, 252 | "metadata": { 253 | "collapsed": false 254 | }, 255 | "outputs": [ 256 | { 257 | "data": { 258 | "text/plain": [ 259 | "array([[ 2.],\n", 260 | "       [ 2.]], dtype=float32)" 261 | ] 262 | }, 263 | "execution_count": 11, 264 | "metadata": {}, 265 | "output_type": "execute_result" 266 | } 267 | ], 268 | "source": [ 269 | "W_values.sum(axis=1).reshape([1,W_values.sum(axis=1).shape[0]]).T" 270 | ] 271 | } 272 | ], 273 | "metadata": { 274 | "anaconda-cloud": {}, 275 | "kernelspec": { 276 | "display_name": "Python [conda env:py3]", 277 | "language": "python", 278 | "name": "conda-env-py3-py" 279 | }, 280 | "language_info": { 281 | "codemirror_mode": { 282 | "name": "ipython", 283 | "version": 3 284 | }, 285 | "file_extension": ".py", 286 | "mimetype": "text/x-python", 287 | "name": "python", 288 | "nbconvert_exporter": "python", 289 | "pygments_lexer": "ipython3", 290 | "version": "3.5.2" 291 | } 292 | }, 293 | "nbformat": 4, 294 | "nbformat_minor": 2 295 | } 296 | 
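Both failures recorded above come down to the orientation of the document-length vector that the RSM threads through its Gibbs updates: ipt_rSum is built as a (1, batch) row, and the scaled hidden-bias term only lines up when the (n_hidden, 1) bias column is outer-multiplied with that row and transposed back to (batch, n_hidden). A minimal NumPy re-derivation of that shape bookkeeping follows; the 3-document, 2-word, 2-hidden sizes are toy assumptions mirroring the cells above, not values taken from the library code:

    import numpy as np

    rng = np.random.RandomState(1234)

    # toy batch: 3 documents over a 2-word vocabulary (same counts as `b` above)
    v = np.array([[2., 0.], [0., 2.], [1., 1.]], dtype='float32')
    W = np.ones((2, 2), dtype='float32')      # visible-to-hidden weights
    hbias = np.zeros(2, dtype='float32')      # hidden bias

    # document lengths kept as a (1, batch) row, mirroring
    # ipt_rSum = ipt.sum(axis=1).reshape([1, ipt.shape[0]])
    doc_len = v.sum(axis=1).reshape(1, -1)    # shape (1, 3)

    # RSM up-pass: each document's hidden bias is scaled by its length, so the
    # bias term is an (n_hidden, 1) x (1, batch) outer product, transposed back
    # to (batch, n_hidden) before being added to v.dot(W)
    pre_sigmoid = v.dot(W) + np.dot(hbias.reshape(-1, 1), doc_len).T   # (3, 2)
    h_mean = 1.0 / (1.0 + np.exp(-pre_sigmoid))                        # sigmoid
    h_sample = rng.binomial(n=1, p=h_mean).astype('float32')           # (3, 2)

    print(pre_sigmoid.shape, h_sample.shape)   # (3, 2) (3, 2)

Feeding the transposed (batch, 1) orientation into the same expressions is the kind of slip that yields the gemm dimension-mismatch errors above, which is exactly what the scratch notebooks below (Untitled1.ipynb and Untitled2.ipynb) go on to probe cell by cell.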
-------------------------------------------------------------------------------- /scripts_old/Untitled.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 98, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np\n", 12 | "import theano\n", 13 | "from theano import tensor as T\n", 14 | "\n", 15 | "\n", 16 | "W_values = np.array([[1,1],[1,1]], dtype=theano.config.floatX)\n", 17 | "bvis_values = np.array([0,0], dtype=theano.config.floatX)\n", 18 | "bhid_values = np.array([0,0], dtype=theano.config.floatX)\n", 19 | "\n", 20 | "W = theano.shared(W_values) # we assume that ``W_values`` contains the\n", 21 | " # initial values of your weight matrix\n", 22 | "bvis = theano.shared(bvis_values)\n", 23 | "bhid = theano.shared(bhid_values)\n", 24 | "\n", 25 | "trng = T.shared_randomstreams.RandomStreams(1234)\n", 26 | "\n", 27 | "def OneStep(vsample, W, bhid, bvis) :\n", 28 | " hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)\n", 29 | " hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)\n", 30 | " vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)\n", 31 | " return trng.binomial(size=vsample.shape, n=1, p=vmean,\n", 32 | " dtype=theano.config.floatX)\n", 33 | "\n", 34 | "sample = T.matrix()\n", 35 | "\n", 36 | "values, updates = theano.scan(OneStep, sequences=sample, non_sequences = [W, bhid, bvis], n_steps=sample.shape[0])\n", 37 | "\n", 38 | "gibbs10 = theano.function([sample], values, updates=updates)" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 99, 44 | "metadata": { 45 | "collapsed": true 46 | }, 47 | "outputs": [], 48 | "source": [ 49 | "tmp = np.array([[10,10],[-10,-10]], dtype = theano.config.floatX)" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 100, 55 | "metadata": { 56 | "collapsed": false 57 | }, 58 | "outputs": [ 59 | { 60 | "data": { 61 | "text/plain": [ 62 | "array([[ 1., 1.],\n", 63 | " [ 1., 0.]], dtype=float32)" 64 | ] 65 | }, 66 | "execution_count": 100, 67 | "metadata": {}, 68 | "output_type": "execute_result" 69 | } 70 | ], 71 | "source": [ 72 | "gibbs10(tmp)" 73 | ] 74 | } 75 | ], 76 | "metadata": { 77 | "anaconda-cloud": {}, 78 | "kernelspec": { 79 | "display_name": "Python [conda env:py3]", 80 | "language": "python", 81 | "name": "conda-env-py3-py" 82 | }, 83 | "language_info": { 84 | "codemirror_mode": { 85 | "name": "ipython", 86 | "version": 3 87 | }, 88 | "file_extension": ".py", 89 | "mimetype": "text/x-python", 90 | "name": "python", 91 | "nbconvert_exporter": "python", 92 | "pygments_lexer": "ipython3", 93 | "version": "3.5.2" 94 | } 95 | }, 96 | "nbformat": 4, 97 | "nbformat_minor": 2 98 | } 99 | -------------------------------------------------------------------------------- /scripts_old/Untitled1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 75, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np\n", 12 | "import theano\n", 13 | "from theano import tensor as T\n", 14 | "\n", 15 | "\n", 16 | "W_values = np.array([[1,2],[1,1]], dtype=theano.config.floatX)\n", 17 | "bvis_values = np.array([1,1], dtype=theano.config.floatX)\n", 18 | "bhid_values = np.array([2,3], dtype=theano.config.floatX)\n", 19 | "\n", 20 | "W = theano.shared(W_values) # we assume that ``W_values`` 
contains the\n", 21 | " # initial values of your weight matrix\n", 22 | "bvis = theano.shared(bvis_values)\n", 23 | "bhid = theano.shared(bhid_values)\n", 24 | "\n", 25 | "def t_propup(vis,vis_sum):\n", 26 | " pre_sigmoid_activation = T.dot(vis, W) + T.dot(bhid.reshape([1,bhid.shape[0]]).T,vis_sum).T\n", 27 | " return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]\n", 28 | "\n", 29 | "t_ipt = T.matrix()\n", 30 | "t_ipt_sum = t_ipt.sum(axis=1).reshape([1,t_ipt.shape[0]])\n", 31 | "\n", 32 | "t_results, t_updates = theano.scan( fn = t_propup, \n", 33 | " non_sequences = [t_ipt, t_ipt_sum],\n", 34 | " n_steps=1\n", 35 | " )\n", 36 | "\n", 37 | "tmp_f = theano.function( [t_ipt], t_results, updates = t_updates)" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 11, 43 | "metadata": { 44 | "collapsed": true 45 | }, 46 | "outputs": [], 47 | "source": [ 48 | "tmp = np.array([[10,10],[-10,-10],[-10,-10]], dtype = theano.config.floatX)" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 18, 54 | "metadata": { 55 | "collapsed": false 56 | }, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/plain": [ 61 | "[array([[[ 60., 20.],\n", 62 | " [-60., -20.],\n", 63 | " [-60., -20.]]], dtype=float32),\n", 64 | " array([[[ 1.00000000e+00, 1.00000000e+00],\n", 65 | " [ 8.75653169e-27, 2.06115347e-09],\n", 66 | " [ 8.75653169e-27, 2.06115347e-09]]], dtype=float32)]" 67 | ] 68 | }, 69 | "execution_count": 18, 70 | "metadata": {}, 71 | "output_type": "execute_result" 72 | } 73 | ], 74 | "source": [ 75 | "tmp_f(tmp)" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 78, 81 | "metadata": { 82 | "collapsed": false 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "def propdown(hid):\n", 87 | " pre_softmax_activation = T.dot(hid, W.T) + bvis #---------------------------[edited]\n", 88 | " return [pre_softmax_activation, T.nnet.softmax(pre_softmax_activation)]\n", 89 | "\n", 90 | "ipt = T.matrix()\n", 91 | "\n", 92 | "results, updates = theano.scan( fn = propdown, \n", 93 | " non_sequences = ipt,\n", 94 | " n_steps=1\n", 95 | " )\n", 96 | "\n", 97 | "tmp_f2 = theano.function( [ipt], results, updates = updates)\n" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 79, 103 | "metadata": { 104 | "collapsed": false 105 | }, 106 | "outputs": [ 107 | { 108 | "data": { 109 | "text/plain": [ 110 | "[array([[[ 4., 3.],\n", 111 | " [ 1., 1.],\n", 112 | " [ 3., 2.],\n", 113 | " [ 2., 2.]]], dtype=float32), array([[[ 0.7310586 , 0.26894143],\n", 114 | " [ 0.5 , 0.5 ],\n", 115 | " [ 0.7310586 , 0.26894143],\n", 116 | " [ 0.5 , 0.5 ]]], dtype=float32)]" 117 | ] 118 | }, 119 | "execution_count": 79, 120 | "metadata": {}, 121 | "output_type": "execute_result" 122 | } 123 | ], 124 | "source": [ 125 | "tmp_f2(np.array([[1,1],[0,0],[0,1],[1,0]], dtype = theano.config.floatX))" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 76, 131 | "metadata": { 132 | "collapsed": false 133 | }, 134 | "outputs": [ 135 | { 136 | "data": { 137 | "text/plain": [ 138 | "array([[ 43., 24.],\n", 139 | " [ 2., 3.],\n", 140 | " [ 4., 4.],\n", 141 | " [ 3., 4.]], dtype=float32)" 142 | ] 143 | }, 144 | "execution_count": 76, 145 | "metadata": {}, 146 | "output_type": "execute_result" 147 | } 148 | ], 149 | "source": [ 150 | "(T.dot(np.array([[1,20],[0,0],[0,1],[1,0]], dtype = theano.config.floatX), W.T) + bhid).eval()" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 77, 156 | "metadata": { 157 | 
"collapsed": false 158 | }, 159 | "outputs": [ 160 | { 161 | "data": { 162 | "text/plain": [ 163 | "array([[ 1.00000000e+00, 5.60279645e-09],\n", 164 | " [ 2.68941432e-01, 7.31058598e-01],\n", 165 | " [ 5.00000000e-01, 5.00000000e-01],\n", 166 | " [ 2.68941432e-01, 7.31058598e-01]], dtype=float32)" 167 | ] 168 | }, 169 | "execution_count": 77, 170 | "metadata": {}, 171 | "output_type": "execute_result" 172 | } 173 | ], 174 | "source": [ 175 | "T.nnet.softmax((T.dot(np.array([[1,20],[0,0],[0,1],[1,0]], dtype = theano.config.floatX), W.T) + bhid).eval()).eval()" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 39, 181 | "metadata": { 182 | "collapsed": false 183 | }, 184 | "outputs": [ 185 | { 186 | "data": { 187 | "text/plain": [ 188 | "array([[ 0.1748777 , 0.1748777 , 0.1748777 , 0.47536689]], dtype=float32)" 189 | ] 190 | }, 191 | "execution_count": 39, 192 | "metadata": {}, 193 | "output_type": "execute_result" 194 | } 195 | ], 196 | "source": [ 197 | "T.nnet.softmax(np.array([0,0,0,1], dtype = theano.config.floatX)).eval()" 198 | ] 199 | } 200 | ], 201 | "metadata": { 202 | "anaconda-cloud": {}, 203 | "kernelspec": { 204 | "display_name": "Python [conda env:py3]", 205 | "language": "python", 206 | "name": "conda-env-py3-py" 207 | }, 208 | "language_info": { 209 | "codemirror_mode": { 210 | "name": "ipython", 211 | "version": 3 212 | }, 213 | "file_extension": ".py", 214 | "mimetype": "text/x-python", 215 | "name": "python", 216 | "nbconvert_exporter": "python", 217 | "pygments_lexer": "ipython3", 218 | "version": "3.5.2" 219 | } 220 | }, 221 | "nbformat": 4, 222 | "nbformat_minor": 2 223 | } 224 | -------------------------------------------------------------------------------- /scripts_old/Untitled2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 59, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np\n", 12 | "import theano\n", 13 | "from theano import tensor as T\n", 14 | "theano_rng = T.shared_randomstreams.RandomStreams(1234)\n", 15 | "W_values = np.array([[1,1],[1,1]], dtype=theano.config.floatX)\n", 16 | "bvis_values = np.array([1,1], dtype=theano.config.floatX)\n", 17 | "bhid_values = np.array([1,1], dtype=theano.config.floatX)\n", 18 | "W = theano.shared(W_values)\n", 19 | "vbias = theano.shared(bvis_values)\n", 20 | "hbias = theano.shared(bhid_values)\n", 21 | "\n", 22 | "def propup(vis, v0_doc_len):\n", 23 | " pre_sigmoid_activation = T.dot(vis, W) + T.dot(hbias.reshape([1,hbias.shape[0]]).T,v0_doc_len).T #---------------------------[edited]\n", 24 | " return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]\n", 25 | "\n", 26 | "def sample_h_given_v(v0_sample, v0_doc_len):\n", 27 | " pre_sigmoid_h1, h1_mean = propup(v0_sample, v0_doc_len)\n", 28 | " h1_sample = theano_rng.binomial(size=h1_mean.shape,\n", 29 | " n=1, p=h1_mean,\n", 30 | " dtype=theano.config.floatX)\n", 31 | " return [pre_sigmoid_h1, h1_mean, h1_sample]\n", 32 | "\n", 33 | "ipt = T.matrix()\n", 34 | "ipt_rSum = ipt.sum(axis=1).reshape([1,ipt.shape[0]])\n", 35 | "\n", 36 | "[out_1,out_2,out_3], updates =theano.scan( sample_h_given_v,\n", 37 | " non_sequences =[ipt, ipt_rSum],\n", 38 | " n_steps=1,\n", 39 | " name=\"gibbs_hvh\" )\n", 40 | "\n", 41 | "hgv = theano.function( [ipt], outputs=[out_1,out_2,out_3])" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 60, 47 | "metadata": { 
48 | "collapsed": true 49 | }, 50 | "outputs": [], 51 | "source": [ 52 | "b = np.array([[1,6,],[1,3],[5,1]], dtype = theano.config.floatX)" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 61, 58 | "metadata": { 59 | "collapsed": false 60 | }, 61 | "outputs": [ 62 | { 63 | "name": "stdout", 64 | "output_type": "stream", 65 | "text": [ 66 | "[[[ 14. 14.]\n", 67 | " [ 8. 8.]\n", 68 | " [ 12. 12.]]]\n", 69 | "[[[ 0.99999917 0.99999917]\n", 70 | " [ 0.99966466 0.99966466]\n", 71 | " [ 0.9999938 0.9999938 ]]]\n", 72 | "[[[ 1. 1.]\n", 73 | " [ 1. 1.]\n", 74 | " [ 1. 1.]]]\n" 75 | ] 76 | } 77 | ], 78 | "source": [ 79 | "pre_sigmoid_ph, ph_mean, ph_sample = hgv(b)\n", 80 | "print(pre_sigmoid_ph)\n", 81 | "print(ph_mean)\n", 82 | "print(ph_sample)" 83 | ] 84 | } 85 | ], 86 | "metadata": { 87 | "anaconda-cloud": {}, 88 | "kernelspec": { 89 | "display_name": "Python [conda env:py3]", 90 | "language": "python", 91 | "name": "conda-env-py3-py" 92 | }, 93 | "language_info": { 94 | "codemirror_mode": { 95 | "name": "ipython", 96 | "version": 3 97 | }, 98 | "file_extension": ".py", 99 | "mimetype": "text/x-python", 100 | "name": "python", 101 | "nbconvert_exporter": "python", 102 | "pygments_lexer": "ipython3", 103 | "version": "3.5.2" 104 | } 105 | }, 106 | "nbformat": 4, 107 | "nbformat_minor": 2 108 | } 109 | -------------------------------------------------------------------------------- /scripts_old/gen_testing_1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 14, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [ 10 | { 11 | "data": { 12 | "text/plain": [ 13 | "array([[ 0.],\n", 14 | " [ 0.]])" 15 | ] 16 | }, 17 | "execution_count": 14, 18 | "metadata": {}, 19 | "output_type": "execute_result" 20 | } 21 | ], 22 | "source": [ 23 | "import numpy as np\n", 24 | "\n", 25 | "a1 = np.array(([1],[0]))\n", 26 | "a2 = np.array(([1],[0]))\n", 27 | "np.zeros([2, 2], 'float64')[a1,a2]" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 15, 33 | "metadata": { 34 | "collapsed": false 35 | }, 36 | "outputs": [ 37 | { 38 | "data": { 39 | "text/plain": [ 40 | "array([0])" 41 | ] 42 | }, 43 | "execution_count": 15, 44 | "metadata": {}, 45 | "output_type": "execute_result" 46 | } 47 | ], 48 | "source": [ 49 | "a1[-1]" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 176, 55 | "metadata": { 56 | "collapsed": false 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "import theano\n", 61 | "from theano import tensor as T\n", 62 | "\n", 63 | "\n", 64 | "\n", 65 | "W_values = np.random.randn(3,3)\n", 66 | "bvis_values = np.random.randn(3)\n", 67 | "bhid_values = np.random.randn(3)\n", 68 | "W = theano.shared(W_values) # we assume that ``W_values`` contains the\n", 69 | " # initial values of your weight matrix\n", 70 | "\n", 71 | "bvis = theano.shared(bvis_values)\n", 72 | "bhid = theano.shared(bhid_values)\n", 73 | "\n", 74 | "trng = T.shared_randomstreams.RandomStreams(1234)\n", 75 | "\n", 76 | "def OneStep(vsample) :\n", 77 | " hmean = T.nnet.sigmoid(theano.dot(vsample, W) + bhid)\n", 78 | " hsample = trng.binomial(size=hmean.shape, n=1, p=hmean)\n", 79 | " vmean = T.nnet.sigmoid(theano.dot(hsample, W.T) + bvis)\n", 80 | " return trng.binomial(size=vsample.shape, n=1, p=vmean,\n", 81 | " dtype=theano.config.floatX)*5\n", 82 | "\n", 83 | "sample = theano.tensor.matrix()\n", 84 | "input = sample[:,:-2].flatten()\n", 85 | "\n", 86 | 
"values, updates = theano.scan(OneStep, outputs_info=input, n_steps=10)\n", 87 | "\n", 88 | "gibbs10 = theano.function([sample], values[-1], updates=updates)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 173, 94 | "metadata": { 95 | "collapsed": false 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "a = np.array([[1,2,3],[1,2,3],[1,2,3]], dtype = theano.config.floatX)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 174, 105 | "metadata": { 106 | "collapsed": false 107 | }, 108 | "outputs": [ 109 | { 110 | "data": { 111 | "text/plain": [ 112 | "array([ 0., 0., 5.], dtype=float32)" 113 | ] 114 | }, 115 | "execution_count": 174, 116 | "metadata": {}, 117 | "output_type": "execute_result" 118 | } 119 | ], 120 | "source": [ 121 | "gibbs10(a)" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 152, 127 | "metadata": { 128 | "collapsed": false 129 | }, 130 | "outputs": [ 131 | { 132 | "data": { 133 | "text/plain": [ 134 | "array([ 1., 1., 1.], dtype=float32)" 135 | ] 136 | }, 137 | "execution_count": 152, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "a[:,:-2].flatten()" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 156, 149 | "metadata": { 150 | "collapsed": false 151 | }, 152 | "outputs": [ 153 | { 154 | "data": { 155 | "text/plain": [ 156 | "array([ 6., 6., 6.], dtype=float32)" 157 | ] 158 | }, 159 | "execution_count": 156, 160 | "metadata": {}, 161 | "output_type": "execute_result" 162 | } 163 | ], 164 | "source": [ 165 | "a.sum(axis=1)" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 609, 171 | "metadata": { 172 | "collapsed": false 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "def MultSim( v, v_sum) :\n", 177 | " return trng.multinomial(size=None, \n", 178 | " n=v_sum, \n", 179 | " pvals=v/v_sum[:,None], \n", 180 | " dtype=theano.config.floatX)\n", 181 | "\n", 182 | "input = T.matrix()\n", 183 | "input_rSum = T.vector()\n", 184 | "\n", 185 | "values, updates = theano.scan(MultSim, outputs_info=input, non_sequences = input_rSum, n_steps=3)\n", 186 | "\n", 187 | "gibbs_mult = theano.function([input,input_rSum], values, updates=updates)\n", 188 | "\n" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 610, 194 | "metadata": { 195 | "collapsed": false 196 | }, 197 | "outputs": [], 198 | "source": [ 199 | "b = np.array([[2,0,0],[0,0,2],[1,1,1]], dtype = theano.config.floatX)\n", 200 | "out = gibbs_mult(b,b.sum(axis=1))" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": 670, 206 | "metadata": { 207 | "collapsed": false 208 | }, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "(3, 3, 3)\n" 215 | ] 216 | }, 217 | { 218 | "data": { 219 | "text/plain": [ 220 | "array([[[ 2., 0., 0.],\n", 221 | " [ 0., 0., 2.],\n", 222 | " [ 1., 1., 1.]],\n", 223 | "\n", 224 | " [[ 2., 0., 0.],\n", 225 | " [ 0., 0., 2.],\n", 226 | " [ 0., 1., 2.]],\n", 227 | "\n", 228 | " [[ 2., 0., 0.],\n", 229 | " [ 0., 0., 2.],\n", 230 | " [ 0., 1., 2.]]], dtype=float32)" 231 | ] 232 | }, 233 | "execution_count": 670, 234 | "metadata": {}, 235 | "output_type": "execute_result" 236 | } 237 | ], 238 | "source": [ 239 | "print(out.shape)\n", 240 | "out\n" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": null, 246 | "metadata": { 247 | "collapsed": true 248 | }, 249 | "outputs": [], 250 | 
"source": [] 251 | } 252 | ], 253 | "metadata": { 254 | "anaconda-cloud": {}, 255 | "kernelspec": { 256 | "display_name": "Python [conda env:py3]", 257 | "language": "python", 258 | "name": "conda-env-py3-py" 259 | }, 260 | "language_info": { 261 | "codemirror_mode": { 262 | "name": "ipython", 263 | "version": 3 264 | }, 265 | "file_extension": ".py", 266 | "mimetype": "text/x-python", 267 | "name": "python", 268 | "nbconvert_exporter": "python", 269 | "pygments_lexer": "ipython3", 270 | "version": "3.5.2" 271 | } 272 | }, 273 | "nbformat": 4, 274 | "nbformat_minor": 2 275 | } 276 | -------------------------------------------------------------------------------- /scripts_old/score_rsm.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 43, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import numpy as np\n", 12 | "import pandas as pd\n", 13 | "\n", 14 | "from six.moves import cPickle as pickle\n", 15 | "\n", 16 | "import theano\n", 17 | "import theano.tensor as T\n", 18 | "\n", 19 | "from lib.rbm import RSM\n", 20 | "\n", 21 | "import matplotlib.pyplot as plt\n", 22 | "%matplotlib inline" 23 | ] 24 | }, 25 | { 26 | "cell_type": "code", 27 | "execution_count": 53, 28 | "metadata": { 29 | "collapsed": false 30 | }, 31 | "outputs": [ 32 | { 33 | "data": { 34 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAh4AAAFkCAYAAABvkjJwAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAAPYQAAD2EBqD+naQAAIABJREFUeJzt3XecXGXZ//HPlUYoIfRQpT6GGmCjGBUEBIJ0hEdgBQQk\nCEgNECmKhCZFCCpFfxi6sggoDyBIQBGiVEkAgYRQAiEb0hM2kISU3ev3xzWHnZ3d2Z3ZnZbZ7/v1\nmtfsnHPPfe45O3POde52zN0RERERKYUe5S6AiIiIdB8KPERERKRkFHiIiIhIySjwEBERkZJR4CEi\nIiIlo8BDRERESkaBh4iIiJSMAg8REREpGQUeIiIiUjIKPERERKRkVpjAw8xOM7MPzGyxmb1oZl8t\nd5lEREQkPytE4GFmRwLXA5cAOwOvA2PMbJ2yFkxERETyYivCTeLM7EXgJXc/K/XagKnAb9z92rIW\nTkRERHJW8TUeZtYbGAz8I1nmES39Hfh6ucolIiIi+etV7gLkYB2gJzAzY/lMYGBbbzCztYF9gQ+B\nz4tZOBERkSrTF9gMGOPucwud+YoQeHTGvsAfy10IERGRFdjRwL2FznRFCDzmAI3AgIzlA4AZWd7z\nIcAf/vAHttlmm+KVTFoYPnw4N9xwQ7mL0a1on5ee9nnpaZ+X1sSJEznmmGMgdS4ttIoPPNx9mZmN\nA/YCHoEvOpfuBfwmy9s+B9hmm22oqakpSTkF+vfvr/1dYtrnpad9Xnra52VTlK4KFR94pIwC7kwF\nIC8Dw4FVgDvLWSgRERHJzwoReLj7/ak5Oy4jmlheA/Z199nlLZmIiIjkY4UIPADc/RbglnKXQ0RE\nqsMnn8CyZbDuuuUuSfdS8fN4yIqjtra23EXodrTPS0/7vPSKtc+POQb22AOamoqSvWSxQsxcmi8z\nqwHGjRs3Th2SRESklalTYdNNwR0eeQQOOqjcJaoc48ePZ/DgwQCD3X18ofNXjYeIiHQ7d90FK68M\nNTXwy18WPv+FC+H55wufbzVQ4CEiIt1KUxPcfjsccQRcfDH861/w4ouF3cZJJ8Fuu0FDQ9fzGj0a\n/va3rudTKRR4iIhIt/LMM/DBB3DiiXDwwfDlLxe21uOhh6CuLgKc117ren6//CUMHx7NQtVAgYd0\n2WefwY47wq23lrskIqX1zjtwwAExOkJWHLffHsHGN78JPXrAuedGsPDuu13Pe84cOOUUOPDAaMoZ\nN65r+blDfT1MmhQ1M9VAgYd02QUXwH//C489Vu6SdOx3v4sDQzGcfTbcUoAB3/X1sOuuMCPbDQG6\nqY8/hgceKHcpmjU2wvHHw+OPV1c1eLFMmgRzC367sfx98gn8+c/wwx+CWSz7wQ9iSO3113c9/zPO\niCG6t94KO+3U9cDjk09g0aL4+/e/73r5KoECD+mSp5+Gm2+GLbeMNtJKrgqcNQtOPbU4P96FC+G3\nv4VRo7q+D269FZ57LqqDpdlll8GRR8ZohEpw003wwguw3nrwxBPlLk1lc4e994ZLLy13SeDeeyMw\nOO645mV9+8KZZ8Kdd8LMzPug5+Evf4H77oMbb4QNNoDBg7seeNTXx/Mhh8CDD8L8+V3LrxIo8JBO\n+/TTaCPdffc44c6aBVOmFHYbr74aY+1Hjux6XskPeMyYrueV6dlnYelSeP/9qP3prMZGuOOO+Ht8\nJwaxNTZ2ftv5mjgRhg2Lg3ixLVsWB133aDsvt8mT4aKL4LTT4IQT4jtV6Lkg3OE3vylM9X8xLV4c\nNYmLF2dPM3ly/P4K0d+hq267LZrH1l+/5fJTT4VevSKgzCWPE06AP/2pORCYMyfyOOQQ+P73Y9ng\nwdEc9+mnnS9vctz62c9g+XL4wx86n1fFcPeqewA1gI8bN86rTWOj+6hR7p98Uu6SuJ9yivuqq7q/\n/777rFnu4F5X1/V8m5rcn3rKfZ99Is9VVnFfeWX3zz7rWr6PPBL59erlvmBB18uZ7swz3TfZxH2N\nNdx/9rPO5/O3v0UZN9/cfe+9c3/fZ5/F/6NHD/eDDnIfOzb2YzHtt1+U9bHHirsdd/fHH49t7bCD\n+6B
Bxd9ee5qa3Pfay/1LX4rv0T//GWUbP76w23nmmch3m226/t0vlqYm9+OOi3LedVf2dHfcEWnW\nXLP438v2vPpqlOPhh9tef9ZZ7mut1f7+/vzzSLPWWpFXjx7u3/hGPNZay3369Oa0//1vpHn22c6X\n+dZbYxtLl7offnj8Bgq5D//4R/dTT41zS2LcuHEOOFDjRThHq8ZjBfP883DOOfDww+Utxz/+EVc5\n114LW2wR7aNbbNH1IWlTp8Iuu8A++8QVxH33Ra3H4sXRlt4VyZXD8uWFb8YYMwb22y96yP/5z53P\nZ/Ro2GGHqAYePz63ZpuXXoKdd4a774bzzoury299C4YMiVqCYtSCPPdc9GtYZZXYbrHV1cHAgXDF\nFVGj9MYbxd9mNrfdFt//3/8e+vWDb3wDVlut8M0to0bB5ptHLeLppxc270K56aaYD2P11WOfZPOv\nf0V/ivnzo69Oudx+OwwYEL/VtgwfHsNfb789ex4PPwzz5sVv4KOP4jg4YEDUbPzudy1rUrbZpusd\nTOvrI8/evWOI7htvwMsvdz6/dO5wzTVx3O1RymigGNFMuR9UcY3H+edHBH3RReUrQ0NDXO3tuWfL\nKPn733cfMqRreR95pPv667s/+WTLqH7nnd2/972u5X3RRVErsfnm7qed1rW80n3wQfxP/vzn5lqV\nt97KP59Zs9x793b/1a+a85kyJXv6pUvdL7nEvWdP969+1X3SpFje1BS1EHvsEXnsumukLZSmpsh7\n0CD3q69279u3uDVwixa5r7aa+8iR7kuWxFXl+ecXb3vtqa93X3119xNOaLn8kEPcv/Wtwm1n0iR3\nM/fRo6MmoaMahXJ45pn47p19tvu557pvtFH2K/H/+Z/mGrInnihtOROLF0eNy09+0n66o45y32IL\n9+XL214/dGjUbuRqyBD3o4/OPX2mH/7QfZdd4u/GRvdNN3U/8cTO55fuH/+I/8nf/95yebFrPMoe\nJBTlQ1Vx4LHttvFfO/zw9tNddJH7bbcVpwznnBNNLJMnt1z+m9+49+kTVZGd8dJL8dnaKvcvfhFN\nLgsXdi5v96gS/sY3okliq606n0+m3/0uDsCffBIHt9VWc7/ssvzzuf762H9z5sQJDtwfeih7+gMO\niO1eckn2wOLvf4+mpQsvzL882Tz1lH9RXV1f33yC7IpFi7If6B94ILaXBFannuq+8cYtg95imz7d\n/Z573L/2tQiM581ruf63v4393FYA1tTkPmKE+4sv5r69U091X2+9+D65ux9/fHz/J0zo/GcopI8+\ncl933bj4WLasuSls4sTWaadPj3X33huf4brrSlvWxYvd/9//c//yl+P38vbb7ad/+eUo71/+0nrd\nhx/G9z2fY+tpp7lvvXV+ZU43dKj7YYc1v77sstiPDQ2dzzNx0EFtN90o8FDg8YX334//2CabuG+/\nfftp11uvOG3h06fHFe7Ika3XJYFDPgfYRFOT++67u2+3XdsnoHffjbwfeCD/vBN77RW1Jg89FHm9\n/37n80p36KFRq5A46ij3HXfML4+mpmjLP+KI5tfrruv+85+3nX7GjPgMt97acd6/+EUcLP/5z/zK\nlK2cX/taXIElB6u9947/XVfstVfUorQVTBx2mHtNTfPrf/87PnshPk97pk93P++8+B1FpXT8X59+\nunXayZOzn6wee6z5vbkES3PmRJ+mSy9tXvbZZ/H92H773ILv998vTGB2001Rw3TIIfH3pEkRJH7l\nK1HrOWtWpPv006itu+mm1nkkgeO0aVEzd9xxXS9XLubOdb/iCvcBA+L7f9hhcYzKxW67uX/zm62X\njxwZFxaffpp7OW67Lbbf2X5l224bfcgSU6dGn4/f/a5z+SXeeSd7EKXAQ4HHF37967givvJK95VW\nyn6FOHt284Gyvj6/bXTUaWn4cPf+/d3nz2+9bsmSKNevf53fNt3dH33UO+youNNOzSfmzhg4MMr/\nySdx5fPb32YvS1snl7YsXerer5/75Zc3L3vwwfgs776be9mefz7e8+STzcv23df9wAPbTn/ffZH+\n4487znv58ggMNtooDsZdkTQBPfVU87K7745lH37YuTyTgBqi1izdJ5/Ed+qXv2xe1tTkvtlm7sOG\ndW57uTruuGhW+cEP3P/whwj22jNwoPuPftRyWRKobbZZfL777+94u1deGcF9clJPvPFGBCQnnJD9\nt79kSXzHwf2qqzreVnveey+2t88+0YzUu3fku/rqUb7Mw+tuu7l/97ut8znjjGi6cI9mg8GDs29z\n9OgIQLvadFdXFx29V1opajjfeSe/9ycXJ+kXUZ1t5njttchr7Nj83pdYfXX3a69tuezAA9vfj7k4\n7bS4uElq1dIp8FDg8YW9945qtyee8Hav2JPe8JBfFfhrr0UTxOmnt70+qe245JLseXz96+61tblv\n0z2qarfd1v3b324/8Lnyys43tzQ1RfPQ9dfH6113jZqKTDNmRLr+/XM7qY8dG/v5P/9pXvbZZ3HA\nvvrq3Mv3wx/GQS39KvWCCyJYaMtJJ8UVcK4++ijatw8/vPM94hsb46p9jz1a5vHpp/F/ueKKzuV7\n5ZWxz084IfJJ/17feWfs348+avmen/40/kdtHTQLoanJfYMNosYjV2edFbUA6ftmzJgo/5gx0cdh\n662zBw3u0Uy5/vqtA5hEMjpkxx2jfT7dlCkR5PTqFc/9+rUOXnLV1BS/x802a766//TTuDA455wI\nQDONHBkn+8zPt9NOzbUco0bFMSTbPkj6Je29d+f6Jc2bF8cfiP5iHQWL2Sxf7r7llpFH4sknI9/n\nnssvr6VLIwC64Yb8y9HQ4F80U6V7+OFYfsYZnQv4582L31q2GlUFHgo83D2+gL17u994Y3Nnxscf\nbzvtzTdH2sGDO+4LkvjTn+KLuPHGkfd997VOc8452Ws7EmefHZ038/H738c2X3ml/XTvvBPpHnww\nv/zd4woK4nO6Rw3F6qu3PriddVZ8xnXXza125aKL3NdZp3W19uGHR7VyLhYsiBNvZvPV/fdHmWfO\nbP2eLbfMv4NsUuXd2f4Yf/pTvP/f/2697phj4oo/36CmqSmCzqOPjhPbppu27LS8775xJZ1pwgT/\nokNvMbz5ZnPAkKtkKHTSD6OpKfoUfe1r8fcrr8T6u+/OnkcSaLXXl+PFFyPAB/eDD47mj0cfjcDy\nS1+K9bNnx/f7jDNyL3+60aPz//xJE9jLLzcv++STln2AkpN3WzUQS5ZEUHL44XH8Ov74/L5Pf/97\nHL/6948hol11001RM5qc2I86KgLHzgTuu+wSv5F8Jd/zzNqSxsY4Xqy5ZpSxtrbj42e6a6+N2vP0\nob/pFHisgIHH22/Hl7SznSzbkpw0PvggvnR9+2aPoH/84+grcemlbZ9c0y1fHh0PIb68CxdGlN+v\nX1S1JmbMiKv4bBFyImkCaOtk2ZbPPosry+9/P7f0O+7Y8iok
V8mJJDlpJv1R/vWv5jRTpsSP8Yor\nomo9lzkqBg9uu4bn3ns95+aH0aPj4Jw5giXp15I5CuDDDzt/0j3xxAgw77gjDmrtXX2nW7AgOuft\nt1/b65MTSvpJJxdJNXSyn5OOq7/9bVyt9+zpfsstbb+3pqbtqv1CuOGGuEpdtCj39yxaFL/LUaPi\ndTJi4K9/bU7z3e9Gs0Nbv8mmpuhPsv/+HW+rqSl+a5tuGjUcEB0F05vSrroq1uXT5OcefTH6948T\nfz6WLo3+D+lNPEmn06RjcNLRtK2+MM8951/UHt5zT/yd3s+lPVdfHen33LN17VhnffZZnNjPOSf2\na2aTXz5OPTW/GspE8nvI7MifXsYbb4yLvaSm6L//bT/PZcuin2B7fW0UeKyAgcfNN3unr8yzOe64\nlh1KBw2Ktsu27L57XK0nvbOzTV7T0BAjI3r0iAg4ieQbGuLg+JWvxFWIewyXW3311r35MyUnxbaq\nYtty2WVxsv/gg9zSX3FF1A7kc0Jwb26eSgKB5cuj01z6ZF/DhkVNx6efxr4YOjQO7NkmE5o507MO\nc2xoiM+VnITaM2RIXNlnamyMff6LX7RcfscdEah0pr/GZ59Fp7mkKW6VVeKq/LzzsveSX7w4Duir\nr559mPDy5e4bbpi9mS6bESOixij9RHzSSXECO//8CDyyNReMGhX7eN68+H+98050lDv33OgP0RX7\n7x8dXvO1777xvXGPZoOampZXyP/9b/zv2uoU/Pe/e5tDG9uzeHGcDG+6qfWV+KJFUQOQ7zD07343\nOmR25vu1//4tJ7678MLo6J6UranJfe212w4orr46/u/LlsXryy/P/vtKl9QSXXRR4Uc6XXhhXIRd\ncUUEcbleUGVKLi4yO6XW10eN32uvtf2+pGmto4vY5cujhnTgwDien3Za9v9fUnOZbZvuCjxWyMAj\nqUE44IDC5Ld8eRycL7igedn3vhcHtkzpP+zGxjiRpr8v3bBhcTL5299ar/vPf6K6c/jw5tqOiy/u\nuKxNTXHQymWekc8/j+0PH95x2sSkSd6pq/3kh58EUu5Rc5KMj580KU5y6bVI770XV7Dnntt2nkmt\nSLbqyoMOartnfLo33vB2R+t861utTxzHHttylEdnzJsXHWivuy5qm/r1iwNg5tXx0qVRnd+3b8ed\n49oKItrT2Bgnxh//uOXyTz5pbvL7zneyv//jj+Mgu8su0S8C4n+89tpxkvjJTzo34+eSJRHc5tNH\nJ5HUlCSBblvDoWtr4/Ml/VMWLoxagjXWaB2odFVyUn7hhdzSJx2jOzt67Prr47uSfLZdd23d3LvH\nHm0HQwccEB1ZE01NUUPXq1f2i7innor1w4YVZ0bUadPiONizZ9dq15IZU9NrWN2jiRFi7p62XH55\nHMNztWRJ/Kb79YsLq1tuiePT9Onxe/n447jQ2XPP9vNR4LECBh7HHht7tkeP+OJ2VTLiIb1T089+\nFk0UmZJhlskP9Zhj2h7a+dFH8YO65prs2x01KvLabbcIEHK9AjrkkNyuFpODc0dVg5l23DGasvJx\n6aUREKVLhrnNmRP5pZ8MEldfHQedtqbDPvbY9ofN3nln5N/eyKIzz4wDS3pAlO7ss6M/R6KpKTqc\n5tPpMRcTJ0ZTypprNo9YaWyMz9irV27ToifTQ+da2/Xss561z0jSX6KjtvpTTong7oILoozz58e+\nTEaGfOlL2afHzibpnN2Zw8fEifHeddeNWsm2rsAnTYpjw/XXR5PSBhvEb/H00zt/RZ3N8uVRjl13\n7fjEvHBh/EYOPbTzJ/HXX4/P/49/xG+pT5/WJ9XTT2/d7NDYGIFX5vw3S5e6/+//Rp7HHtuyxvX1\n1+ME+53vFHaCvEzJ8Ty9ySxfSQfT9H2R9Inp3Ttq+dpy8skxeWK+pk+PztpJzWbmo6PfhAKPFTDw\n2HPP+DH07du5q6ZMF14YV5Lp7fFJG2hm9XjSrpxM5PPHP8brzADorLPih97e2PKmphi2BbnVdiSu\nuioOCB31H/jxj6PXfL4Huc40t5x0UuvhZ1On+hdVtBCdXDMtXRoH7sGDW1aTNjZGFXJ7M2jOnx9V\nx9lqnBYtihP9iBHZ80hmrUyGFyY1Ptk6FnfFvHnRVNCzZxwgzzgjAqe2Ohpns9NOLSc7as+PftR6\nJE+6Dz/s2lXs++83z5Z5+OG5z6Pw05+23WE4F01N8Zk6qjVITgpmcXFQqDll2pIE+P/3f+2nS0ZK\ndDTBVnuSWtaLLmoe8ZV5GE4m3EtvPkgClmeeaZ1nU1P8Dvr3jyDt0Ufjt7vRRnFSLvR9lzK99178\nFpImoM766lcjiHGPY+POO8eyo47KPhPqAQdEzWlnvfFG7K/k8de/Rk1nR78rBR4rYOCx1VZRPX/0\n0XEVmc/Bs620O+wQcwmk+89//IuOWOmS2UOTH8ns2a0niZk5M7eOou5RG/DTn+Y3rv7pp6Ns7bWz\nNzVFDUP6xDi5Sk6+l1+e+77db7+oicm03XaR11ZbZb9qevHFOFD26BFByEknxZUZdDzfxwUXRPAx\nZ07rdUnwmHS8a0vSFJMckJMZMot1sF22LL67yZVRvpMU3XhjlK+jochLlkTQVcgZVdvS1BRt3/36\nRVCUy7w2u+zSuQ7MieHDY1vtBS7TpsV3//XXO7+dfOyzT4zIaK9MJ50Ux6uuOvLIGMlz5ZVtX4Ak\nnUjT+xjceGNc+bd3MVFf3xxIrrNO1GYVoka5VE45JZoz3WMmVYhjy5VXxkVgW8eyHXeMjqmlVtWB\nB/Ah0JT2aAR+kpFmE+AxYCEwA7gW6NFBvmULPJqaoqbjV79q7iyW67jvurp470knNXfiSzprZk48\nlIzv/sMfWi4/+eQIVNINGRLVlYmLLooag7ZOhoWwYEGcpNsbtjl+vOfdkS7deef5Fz35Z8/uOP0O\nO7Q9/DSZbClznHymd9+NGpFhw6KTr1mcOLM1kSRmzYoOnG3VGH3rW23300m3bFnLEUzf+15+94no\nrPvvj6aifH3ySXze9AnV2pJMRNbVTqC5+u9/oyf/Rhu136lu3ryOv7sdWb684+9FqbXXrOUeAckG\nG2Tvz5SP3/8+9mG2TtPJ0PZ77mledsQRuX2vm5riImqXXWKk2ook2S9Tp0Y/pGTU0P/9n7dZK+0e\n6a68srTldK/+wOMD4CJgXWC91GPltPU9gDeAMcAOwL7ALOCKDvItW+AxZ45/0cci35nuhgyJmykl\nHeX23TdOdL16tT3iYIMNWt+CfdddW/d/uPTSqKZctiyq/1dfvfB9BDINGtT+zJIjR0aZutI2+/DD\n8cPccMOOax7WWqv16BD3aJI677z8q9UXLMh9cqJkRFD6/Cdvv+059WFwj4PsscdGGddZp/X/vNIM\nGxYn+faa2o48snWAXGzTpkXnzdVWy95nJelc2d7N+VZESUfebHO/JDWohZiGPpk+HrJPKvelLzU3\nUyaTtZXrxn+lklx
sDRkSNUFJp/RkfqL0WYvdo/Ynl1E9xVDswKOUN8LN5jN3n+3us1KPxWnr9gW2\nBo529zfcfQxwMXCamfUqS2k7kNx6feON4zbDxx8Pf/oTLFzY/vvefDNuKX/11XEb7Lvvhlmz4jbp\ne+wRt53ONHAgTJrU/Nod3noLttuuZbr99otbPb/wAtxyCyxZAuec05VP2bGvfS0+TzaPPBLl6t27\n89s4+GB4/fXYD3vtBZde2na6RYviNtYbb9x63dZbwy9/mf8tofv1i1th5+K882DpUvjNb5qXjR4N\na60Fhx3W8ft33hlefTW+I3PmwLe/nV9ZS+2UU+I2248/3vb6Tz+N///RR5e2XBtuCGPHwp57wkEH\nwT33tE7z1FPw5S/Dl75U2rIVW48ecOSR8MADsHx56/V//SussQZ885td39bmm8cDYLfd2k6z/fbx\nfQaYPBmmT8+etlpstx306RPHxZ//PG51D7DFFtC3bxy7002bFs9tHbdWdJUQeFxgZnPMbLyZnWdm\nPdPWDQHecPc5acvGAP2BjNNrZZg6NZ6TL8txx8Fnn8GDD7b/vttug3XXhQMPjC/nscfCuHERLIwe\n3fZ7tt66ZeAxYwbMnx8/6nSDB0fef/4z3HAD/PCHsMEGnft8uRoyJH5IDQ2t19XXw/jxETh01UYb\nxclixAgYORI++qh1mnL/gNdfH04+Ofb9ggUR+N15J/zgB3HA6UhNDUycCI89BiutBF//etGL3CWD\nB8NXvgK/+13b62+/HRYvhqOOKm25AFZdFR56KH6XJ54YgUi6p56CffYpfblK4aij4mLmmWdar3v0\n0a5fCKTbe+84ju2yS9vrt98e3ngj/h47FswKE/RUsj59YKedIrA988zm5T17xrF8woSW6dMvYqtN\nuQOPXwNHAXsAvyOaXa5JW78+MDPjPTPT1lWc+vr4IiXR7OabxxXqHXdkf8+SJVHDcdxx8eVMmMUJ\nfNNN237fwIHw7rvQ1BSvk4g5s8ajRw/Yd1+48cYITEaM6Nxny8c++8Rnueaa1usefRR69YLvfKcw\n2+rZs/mH/OqrrdeXO/CA2OeLFsHNN8PDD0fNxUkn5fbenXeGxsZ47ze/mVuwUm6nngp/+xt88EHL\n5W++CRdcEOuzfa+LrWdP+H//L66wDzsM3n8/lk+eHI+hQ8tTrmIbPBi23BLq6lounzYtLgQOPLBw\n27rgArj33uzf1R12iIuEBQvgX/+CQYOixqXa3X03PPFEy+M8xDE7s8YjCTw22qg0ZSulggceZnaV\nmTW182g0sy8DuPuv3H2su7/p7rcC5wBnmFmB4u7Sq6+PKt2eafU2J5wAzz7bfIDL9PDD0RRw4on5\nbWvgwLhyTGpZ3norfuhbbNE67X77RYBy9NHN1aDFtMkmcfC57jp4++2W6x55BL71LVhzzcJtb8MN\nYb314gCaqRJ+wBttBMOGwfXXR5PLN78J226b23t32CG+T9OmVX4zS+LII6N58Pe/b16W1HJsuWXs\nh3Lq3TuaHdZcM5pdGhqitqNnz2jarEZmUFsLf/lLXOwkHnssPnehLgQgjkGHH559fVIr++abEXhU\nezNLYuDAto+/224bx+/oohjq6+P7ueqqpStfqRSjn8R1QDvX9wBMzrL8ZaJMmwHvEqNYvpqRJmlZ\nn9FRQYYPH07//v1bLKutraW2trajt3ZafX3rK+vDDoPTTosmk6uuav2e0aNh112jui0fAwfG89tv\nx9XjW29FHulBT2L//ePAcvHF+W2jK84/P9rRTz89Dupm0b7/9NNw7bWF3ZZZNElkCzzWXBNWWaWw\n28zX+efHifi556KpJVd9+8aB6Y03VpzAY9VVoynpttuiCaxPn+jr8v778J//wMorl7uE0cfmr3+N\n/khHHhn7eciQtvtTVYujjoIrroAnn4yAC6IG8pvfjP1RKslx6qmn4L33uk/gkc1220Xw+/HHzRdI\nbZ1LiqGuro66jGqwhrbayAupGD1WO/sAjgaWAf1Tr7+Ter1OWpofAfOB3u3kU7ZRLd/+dtvTAf/k\nJzE6JXMT7DviAAAgAElEQVT4aNIDvDNDF5cvj9nwfv3reP2Nb8TcIZXkscfi8yWTUCWjBooxYdKF\nF8YIl0ynnVb6ERTZnHxyDMNduDC/9x13XIzGKOYMjYWW3Jjvvvti6nDIfsO3cnrqqZinBVrfIbga\nbb99840NFy6M4dqdvflZV2y9dQxvho7nfal2yQ0h00e2HHJI9psyFlvVjmoxsyFmdpaZDTKzzc3s\naGAUcI+7J+HWk8AE4J5Uun2By4Gb3H1ZmYrervr6aGbIdMUVcbV6+OHRUTBxxx1xhfW//5v/tnr2\nhP/5n+hgmm1ES7ntvz8cemiMoklGM2y/fdvNQV1VUxNXDDMy6sKmTaucDlo33BD9UPKtfRkxAu66\nq3Cd/0phu+2iSe2aa6IZ8dBDY8RLpdl77+j/1KMHHHBAuUtTfLW10by7cGHUPn7+eXPtRyltv338\nNrfaqvid3Svd5pu3HtlSScetQitn59IlRMfSZ4A3gQuB64GTkwTu3gQcSEws9jxwN3AncElpi5ob\n9+zVY717w/33x7oDDoje5Y2N0cO/trbz7XjJkNqPP46qukoLPAB+9avo1Przn0d7ciFGs7Slpiae\nMzuY1tdXTgetlVfuXKfK7bbLbehtpTnllOZA67bbokmsEp16avwmv/KVcpek+I48Mjo6P/ZYNLNs\ntVWMtCi1HXaI5+7ezALNI1vSA49SNbWUQ9kCD3d/1d2/7u5rufuq7r69u1+bWZPh7lPd/UB3X83d\nB7j7+amApOLMnx8/6Gxflv7948e+aFFc/T38cES1w4Z1fptJ4JFtREsl2HRT+NnPIgCZO7d4gcfm\nm8c+zuznUc0/4Ep32GFwzDERdJeyD0FnrL12uUtQGltuCV/9aow6+etfo7ajHAFh0sFUgUdIH9my\ndCnMnFm9x61yD6etKrmMu95002hueO01+P73YccdY5hbZw0cGNt96aW4mi7FiJXOOPfcuKoaMCAO\nesXQVgfTZcuq+wdc6VZaKToYV/rcI91N0tzy8cflaWaB6FC/++4x4k4i8JgwIWrOp0+P52o9binw\nKKAk8Girj0e6XXaJg/GSJTGxVFeuNpKRLQ89BNtsk/8MnKWy0kpxoHvwweKWMTPwSH7AldLUIlIJ\njjgijjv9+0cAUA7rrReTma1fkTMylV76yJZqnjwMijOcttuqr4+Tai4/pMMPhw8/7PrUzEng8eqr\nMdtpJct3uHBn1NTEHBHz5kXVfrX/gEU6Y6ONYqK0DTdcsTosV7NkXp+33opme6je45YCjwKaOjV6\nZ/fKca8WYubGNdaI5ouZMyuzf0eppXcw3WsvBR4i2TzySOXWkHZH6SNbmpriflDVOqeMvnYFlG0o\nbbEltR4KPGJ48aqrNje31NfH64x55ES6vT59cr9IkuLr2TOayydMqP4O8Qo8
CqhcXxYFHs169owb\nMSWBx7RpUa1cqcM4RUQSycgWBR6Ss3J9WQYPjo5a5brpVqVJ72Ba7T9gEakeyT1bpk6t7uOWAo8C\ncS/fl2XYsLhfi9prQ01N3LX3008VeIjIimO77eKOva+/Xt3HLZ2qCqShIaYgLkcfj549C3un1xVd\nTU0Egq+/Xt3TDotIdUmayz//vLqPWwo8CkSjJyrHNtvEvCGvvNLcx0NEpNJttlnznZur+bilwCNP\nt98Ov/hF6+UKPCpH794waBA88QQsX67/iYisGJJ7tkB1H7cUeOTprrvgqqti1tF0U6fGyInufpfF\nSlFTA//8Z/xdzT9gEakuSXNLNR+3FHjkaeJE+OyzmOo3XX19BB2aBbAy1NTEjZagun/AIlJddt45\n5h2q9JsqdoUCjzzMnh0PiPuOpNPoicqSzGDauzess055yyIikqsf/xhefrm65x5S4JGHiRPjeZ99\nYrph9+Z1Cjwqy/bbx6yMG22kYcYisuLo2zfu5F3NdEjOw4QJcTI799wYLZF+F9Rqn/BlRdO3b7SV\n6n8iIlJZNFN/HiZMiHuBfPvbcXO2Rx6JWUOhfPdpkezOPrvcJRARkUyq8cjDhAkxpW3v3nDAAc39\nPBYsiFkydXVdWY4/Ph4iIlI5FHjkIQk8AA4+OGbGnDJFc3iIiIjkSoFHjubPh+nTmwOP73wnaj4e\neST6d4CaWkRERDqiwCNHyYiWJPBYffXo6/Hww1HjocnDREREOqbAI0cTJsSwzPRhTgcfDM8+C2++\nCQMGQJ8+5SufiIjIikCBR44mTIAtt4xhmomDD457gdxzj/p3iIiI5EKBR47SO5YmNt44ZsicO1f9\nO0RERHJRtMDDzC4ys+fMbKGZzcuSZhMzeyyVZoaZXWtmPTLSDDKzsWa22MymmNmIYpW5PW0FHgCH\nHBLPqvEQERHpWDFrPHoD9wO/bWtlKsB4nJjEbAhwHHA8cFlamn7AGOADoAYYAYw0s2FFLHcrCxbE\nyJW2Ao+DD45nBR4iIiIdK9rMpe5+KYCZHZclyb7A1sCe7j4HeMPMLgauNrOR7r4cOIYIYE5MvZ5o\nZjsD5wCji1X2TG+/Hc9tBR477gjDh8OBB5aqNCIiIiuucvbxGAK8kQo6EmOA/sB2aWnGpoKO9DQD\nzax/aYoZzSxmsPXWrdeZwahRbQclIiIi0lI5A4/1gZkZy2amrcs1TdFNmACbbQarrFKqLYqIiFSn\nvJpazOwq4Px2kjiwjbu/06VSFcjw4cPp379lxUhtbS21tbV55ZOtY6mIiMiKrK6ujrq6uhbLGhoa\nirrNfPt4XAfc0UGayTnmNQP4asayAWnrkucBHaTJ6oYbbqCmpibH4mQ3YQL87/92ORsREZGK0tbF\n+Pjx4xmc3Hq9CPIKPNx9LjC3QNt+AbjIzNZJ6+cxFGgAJqSlucLMerp7Y1qaSe5e3JAsZeFC+PBD\n1XiIiIgUQjHn8djEzHYENgV6mtmOqceqqSRPEgHGPam5OvYFLgducvdlqTT3AkuB281sWzM7EjgT\nuL5Y5c40aRK4K/AQEREphKINpyXm4/hB2uvxqec9iZEqTWZ2IDHPx/PAQuBO4JLkDe6+wMyGAjcD\nrwBzgJHuflsxCjxhAmy6Kay6astlANtsU4wtioiIdC/FnMfjBOCEDtJMBdqdAcPd3wR2L2DRstp1\nVxg4EJ58Evr1i2UTJsR06MlrERER6TzdqyVl4UKYPx9efDFmI120KJZrRIuIiEjhKPBImT07nn/+\nc3j5ZTjsMFiyRIGHiIhIIRWzj8cKZdaseP7ud2H33WH//WMI7fvvK/AQEREpFAUeKUmNx3rrwU47\nwV/+AoceCk1NCjxEREQKRU0tKUmNxzrrxPP++8P998Nuu8GgQeUrl4iISDVR4JEyaxasuSb06dO8\n7NBDYexYWG218pVLRESkmijwSJk1K5pZREREpHgUeKTMng3rrlvuUoiIiFQ3BR4pqvEQEREpPgUe\nKQo8REREik+BR4oCDxERkeJT4EHcfVZ9PERERIpPgQewYAEsXaoaDxERkWJT4EHz5GEKPERERIpL\ngQcKPEREREpFgQfNgYf6eIiIiBSXAg+iY2mPHrDWWuUuiYiISHVT4EHUeKyzDvTsWe6SiIiIVDcF\nHmgODxERkVJR4IECDxERkVJR4IEmDxMRESkVBR6oxkNERKRUFHigwENERKRUihZ4mNlFZvacmS00\ns3lZ0jRlPBrN7IiMNIPMbKyZLTazKWY2opDlbGyEOXMUeIiIiJRCryLm3Ru4H3gB+GE76Y4DngAs\n9fqTZIWZ9QPGAE8CJwM7AHeY2Xx3H12IQs6bB01N6uMhIiJSCkULPNz9UgAzO66DpA3uPjvLumOI\nAOZEd18OTDSznYFzgIIEHrNTW1aNh4iISPFVQh+Pm81stpm9ZGYnZKwbAoxNBR2JMcBAM+tfiI3r\nPi0iIiKlU8ymllxcDDwNLAKGAreY2aruflNq/frA5Iz3zExb19DVAijwEBERKZ28Ag8zuwo4v50k\nDmzj7u/kkp+7X5n28nUzWxUYAdyU5S0FN2sW9OkDq69eqi2KiIh0X/nWeFwH3NFBmswainy8DFxs\nZr3dfRkwAxiQkSZ5PaOjzIYPH07//i1bZGpra6mtrf3idTJ5mFnmu0VERKpbXV0ddXV1LZY1NHS5\nMaFdeQUe7j4XmFuksgDsDMxPBR0QI2KuMLOe7t6YWjYUmOTuHe6ZG264gZqamnbTaA4PERHprjIv\nxgHGjx/P4MGDi7bNovXxMLNNgLWATYGeZrZjatV77r7QzA4kai9eBD4nAooLgWvTsrkX+Dlwu5ld\nQwynPRM4q1DlVOAhIiJSOsXsXHoZ8IO01+NTz3sCY4FlwGnAKGIOj/eAs9Pn53D3BWY2FLgZeAWY\nA4x099sKVchZs2DzzQuVm4iIiLSnmPN4nABkDo9NXz+GGBrbUT5vArsXsGgtzJoFu+xSrNxFREQk\nXSXM41FWs2erqUVERKRUunXgsXQpzJ+vwENERKRUunXgMWdOPCvwEBERKY1uHXgks5bqBnEiIiKl\n0a0DD90gTkREpLS6deChGg8REZHS6vaBx6qrxkNERESKr9sHHmpmERERKZ1uHXgkN4gTERGR0ujW\ngYdqPEREREpLgYcCDxERkZJR4KHAQ0REpGS6feChPh4iIiKl020Dj0WLYOFC1XiIiIiUUrcNPDRr\nqYiISOl128AjmbVUgYeIiEjpdPvAQ308RERESqfbBh5JU4sCDxERkdLptoHHrFmwxhrQp0+5SyIi\nItJ9dOvAQ/07RERESqvbBh7Tp8OAAeUuhYiISPfSbQOP+nrYZJNyl0JERKR76baBx7RpsNFG5S6F\niIhI99ItAw/3qPHYeONyl0RERKR7KUrgYWabmtloM5tsZovM7F0zG2lmvTPSbWJmj5nZQjObYWbX\nmlmPjDSDzGysmS02sylmNqKr5Zs
7F5YsUeAhIiJSar2KlO/WgAEnAe8D2wOjgVWAnwCkAozHgY+B\nIcCGwD3AUuBnqTT9gDHAk8DJwA7AHWY2391Hd7Zw9fXxrMBDRESktIoSeLj7GCJgSHxoZtcBp5AK\nPIB9iQBlT3efA7xhZhcDV5vZSHdfDhwD9AZOTL2eaGY7A+cQgUynTJsWz+rjISIiUlql7OOxBjAv\n7fUQ4I1U0JEYA/QHtktLMzYVdKSnGWhm/TtbkPp66NkT1l+/szmIiIhIZ5Qk8DCzrYDTgd+lLV4f\nmJmRdGbaulzT5K2+HjbYIIIPERERKZ28mlrM7Crg/HaSOLCNu7+T9p6NgL8Bf3L32ztVyk4aPnw4\n/fu3rBipra2lvr5W/TtERKTbq6uro66ursWyhoaGom7T3D33xGZrA2t3kGxy0jRiZhsC/wSed/cT\nMvK6FDjI3WvSlm0GTAZ2dvfXzewuoJ+7H5aWZg/gH8Ba7t7m3jGzGmDcuHHjqKmpabV+6FBYfXV4\n8MEOPomIiEg3M378eAYPHgww2N3HFzr/vGo83H0uMDeXtKmajqeB/wA/bCPJC8BFZrZOWj+PoUAD\nMCEtzRVm1tPdG9PSTMoWdOSivj6CDxERESmtYs3jsSHwDDCFGMWynpkNMLP0u6M8SQQY96Tm6tgX\nuBy4yd2XpdLcSwyvvd3MtjWzI4Ezgeu7Uj5NHiYiIlIexZrHYx9gi9RjamqZEX1AegK4e5OZHQj8\nFngeWAjcCVySZOLuC8xsKHAz8AowBxjp7rd1tmALFsCnnyrwEBERKYdizeNxF3BXDummAgd2kOZN\nYPcCFU1zeIiIiJRRt7tXi2YtFRERKZ9uG3hsuGF5yyEiItIddcvAY731YKWVyl0SERGR7qfbBR7T\npql/h4iISLl0u8BDQ2lFRETKR4GHiIiIlIwCDxERESmZbhV4fP45zJ2rPh4iIiLl0q0Cj2TyMNV4\niIiIlEe3Cjw0eZiIiEh5dcvAQ00tIiIi5dGtAo9p06B/f1httXKXREREpHvqVoGHRrSIiIiUlwIP\nERERKRkFHiIiIlIy3SrwmDZNgYeIiEg5dZvAY9kymD5dI1pERETKqdsEHjNmgLtqPERERMqp2wQe\nmjxMRESk/BR4iIiISMl0m8Bj2jRYeWVYY41yl0RERKT76jaBRzKU1qzcJREREem+ul3gISIiIuVT\nlMDDzDY1s9FmNtnMFpnZu2Y20sx6Z6Rryng0mtkRGWkGmdlYM1tsZlPMbERnyqTAQ0REpPx6FSnf\nrQEDTgLeB7YHRgOrAD/JSHsc8EQqPcAnyQoz6weMAZ4ETgZ2AO4ws/nuPjqfAk2bBrvtlv8HERER\nkcIpSuDh7mOIgCHxoZldB5xC68Cjwd1nZ8nqGKA3cKK7LwcmmtnOwDlEIJOTpibNWioiIlIJStnH\nYw1gXhvLbzaz2Wb2kpmdkLFuCDA2FXQkxgADzax/rhuePTtmLlXgISIiUl7Famppwcy2Ak4nairS\nXQw8DSwChgK3mNmq7n5Tav36wOSM98xMW9eQy/aTOTw0XbqIiEh55RV4mNlVwPntJHFgG3d/J+09\nGwF/A/7k7re3SOx+ZdrL181sVWAEcBMF9Emq18jaaxcyVxEREclXvjUe1wF3dJDmixoKM9uQqNH4\nt7ufnEP+LwMXm1lvd18GzAAGZKRJXs/oKLPhw4fTv39/ZqbqSE4+GU44oZba2tociiIiIlLd6urq\nqKura7GsoSGnxoROM3cvTsZR0/E08B/gWM9hQ2b2U2C4u6+Ten0KcAUwwN0bU8t+ARzq7tu2k08N\nMG7cuHHU1NTwwANwxBEwf75mLhUREWnP+PHjGTx4MMBgdx9f6PyL0scjVdPxDPABMYplPUtNGeru\nM1NpDiRqL14EPif6eFwIXJuW1b3Az4HbzewaYjjtmcBZ+ZRn8eJ4Xnnlzn0eERERKYxidS7dB9gi\n9ZiaWmZEH5CeqdfLgNOAUal17wFnp8/P4e4LzGwocDPwCjAHGOnut+VTmMWLY6r0Pn06/4FERESk\n64o1j8ddwF0dpMmc6yNbujeB3btSnsWLo7ZD92kREREpr25xr5Yk8BAREZHyUuAhIiIiJaPAQ0RE\nREpGgYeIiIiUTLcIPD7/XIGHiIhIJegWgYdqPERERCqDAg8REREpmW4TePTtW+5SiIiISLcJPFTj\nISIiUn4KPERERKRkFHiIiIhIySjwEBERkZJR4CEiIiIlo8BDRERESkaBh4iIiJRM1Qce7go8RERE\nKkXVBx5Ll8azAg8REZHyq/rAY/HieFbgISIiUn4KPERERKRkuk3goXu1iIiIlF+3CTxU4yEiIlJ+\nCjxERESkZBR4iIiISMko8BAREZGSKVrgYWYPm9kUM1tsZh+b2d1mtkFGmk3M7DEzW2hmM8zsWjPr\nkZFmkJmNTeUzxcxG5FMOBR4iIiKVo5g1Hk8D3wO+DBwGbAk8kKxMBRiPA72AIcBxwPHAZWlp+gFj\ngA+AGmAEMNLMhuVaCAUeIiIilaNXsTJ291+nvZxqZlcDD5lZT3dvBPYFtgb2dPc5wBtmdjFwtZmN\ndPflwDFAb+DE1OuJZrYzcA4wOpdyKPAQERGpHCXp42FmawFHA8+lgg6IWo43UkFHYgzQH9guLc3Y\nVNCRnmagmfXPZduLF0OvXvEQERGR8ipq4GFmV5vZZ8AcYBPg0LTV6wMzM94yM21drmna9fnnqu0Q\nERGpFHnVA5jZVcD57SRxYBt3fyf1+lqiSWRT4BLgHuDATpSzU4YPH86sWf1ZsgQOPjiW1dbWUltb\nW6oiiIiIVKy6ujrq6upaLGtoaCjqNs3dc09stjawdgfJJmc0jSTv3QiYCnzd3V8ys0uBg9y9Ji3N\nZsBkYGd3f93M7gL6ufthaWn2AP4BrOXube4dM6sBxo0bN46//KWGe+6BKVNy/pgiIiLd1vjx4xk8\neDDAYHcfX+j886rxcPe5wNxObqtn6nml1PMLwEVmtk5aP4+hQAMwIS3NFWkdUpM0k7IFHZkWL1ZT\ni4iISKUoSh8PM9vFzE4zsx3N7Etm9m3gXuBdIpgAeJIIMO5JzdWxL3A5cJO7L0uluRdYCtxuZtua\n2ZHAmcD1uZZFgYeIiEjlKFbn0kXE3B1/B94Gfg+8BuyRBBXu3kT092gEngfuBu4k+oKQSrOAqOHY\nDHgF+CUw0t1vy7UgCjxEREQqR1EGmbr7m8BeOaSbSgedTVN57d7ZsijwEBERqRzd4l4tCjxEREQq\ngwIPERERKRkFHiIiIlIyCjxERESkZBR4iIiISMlUfeChe7WIiIhUjqoPPFTjISIiUjm6ReDRt2+5\nSyEiIiLQTQIP1XiIiIhUhqoOPJqaYMkSBR4iIiKVoqoDjyVL4lmBh4iISGVQ4CEiIiIlo8BDRERE\nSkaBh4iIiJSMAg8REREpmaoOPD7/PJ4VeIiIiFSGqg48VOMhIiJSWao68Fi6NJ4VeIiIiFSG
qg48\n1NQiIiJSWao68EiaWnSvFhERkcrQLQIP1XiIiIhUhqoPPPr0gR5V/SlFRERWHFV9Sv78c9V2iIiI\nVJKiBR5m9rCZTTGzxWb2sZndbWYbZKRpyng0mtkRGWkGmdnYVD5TzGxErmXQnWlFREQqSzFrPJ4G\nvgd8GTgM2BJ4oI10xwEDgPWBDYD/S1aYWT9gDPABUAOMAEaa2bBcCqDAQ0REpLL0KlbG7v7rtJdT\nzexq4CEz6+nujWnrGtx9dpZsjgF6Aye6+3JgopntDJwDjO6oDAo8REREKktJ+niY2VrA0cBzGUEH\nwM1mNtvMXjKzEzLWDQHGpoKOxBhgoJn172i7CjxEREQqS1EDDzO72sw+A+YAmwCHZiS5GDgC2Bt4\nELjFzE5PW78+MDPjPTPT1rVLnUtFREQqS16Bh5ld1UaH0MzOoV9Oe8u1wE7APkAjcE96fu5+pbu/\n4O6vu/svgWuIfhwFoRoPERGRypJvH4/rgDs6SDM5+cPd5wHzgPfM7G2ir8fX3P2lLO99GbjYzHq7\n+zJgBtHxNF3yekZHhX311eH07t2fgw9uXlZbW0ttbW1HbxUREal6dXV11NXVtVjW0NBQ1G3mFXi4\n+1xgbie31TP1vFI7aXYG5qeCDoAXgCsyOqQOBSa5e4d7Zsstb2CLLWq4775OllhERKSKtXUxPn78\neAYPHly0bRZlVIuZ7QJ8Ffg3MB/YCrgMeJcIJjCzA4naixeBz4mA4kKieSZxL/Bz4HYzuwbYATgT\nOCuXcixZovu0iIiIVJJiDaddRMzdMRJYFZgO/A24Mq02YxlwGjAKMOA94Gx3/2KYrLsvMLOhwM3A\nK0Qn1ZHuflsuhVAfDxERkcpSlMDD3d8E9uogzRhiaGwuee3emXIo8BAREaksuleLiIiIlExVBx6q\n8RAREaksCjxERESkZBR4iIiISMlUdeDR2KjAQ0REpJJUdeABCjxEREQqiQIPERERKRkFHiIiIlIy\nCjxERESkZKo+8NC9WkRERCpH1QceqvEQERGpHAo8REREpGQUeIiIiEjJKPAQERGRklHgISIiIiVT\n9YHHSiuVuwQiIiKSqOrAY6WVwKzcpRAREZFE1QceIiIiUjkUeIiIiEjJKPAQERGRkqnqwEPTpYuI\niFSWqg48+vQpdwlEREQkXVUHHmpqERERqSxVHXioqUVERKSyFD3wMLM+ZvaamTWZ2aCMdZuY2WNm\nttDMZpjZtWbWIyPNIDMba2aLzWyKmY3Idduq8RAREakspajxuBaoBzx9YSrAeBzoBQwBjgOOBy5L\nS9MPGAN8ANQAI4CRZjYslw0r8BAREaksRQ08zGw/YB/gPCBzDtF9ga2Bo939DXcfA1wMnGZmvVJp\njgF6Aye6+0R3vx/4DXBOLttX4CEiIlJZihZ4mNkA4FYieFjcRpIhwBvuPidt2RigP7BdWpqx7r48\nI81AM+vfURnUx0NERKSyFLPG4w7gFnd/Ncv69YGZGctmpq3LNU1WqvEQERGpLL06TtLMzK4Czm8n\niQPbAN8BVgOuSd7aqdJ10bPPDufgg1tWjNTW1lJbW1uO4oiIiFSUuro66urqWixraGgo6jbzCjyA\n64iajPZ8AOwJfB1YYi1vD/uKmf3R3U8AZgBfzXjvgNTzjLTnAR2kyeqgg27gpptqOkomIiLSLbV1\nMT5+/HgGDx5ctG3mFXi4+1xgbkfpzOwM4KdpizYk+mYcAbycWvYCcJGZrZPWz2Mo0ABMSEtzhZn1\ndPfGtDST3L3DkExNLSIiIpWlKH083L3e3SckD+Bdorllsrt/nEr2JBFg3JOaq2Nf4HLgJndflkpz\nL7AUuN3MtjWzI4EzgetzKYcCDxERkcpSyplLW8zj4e5NwIFAI/A8cDdwJ3BJWpoFRA3HZsArwC+B\nke5+Wy4bVOAhIiJSWfLt49Ep7j4F6NnG8qlE8NHee98Edu/MdhV4iIiIVBbdq0VERERKpqoDD9V4\niIiIVBYFHiIiIlIyCjxERESkZKo68FAfDxERkcpS1YGHajxEREQqiwIPERERKRkFHiIiIlIyVR14\nqI+HiIhIZanqwKNnq7lSRUREpJyqOvAwK3cJREREJF1VBx4iIiJSWRR4iIiISMko8BAREZGSUeAh\nIiIiJaPAQ0REREpGgYeIiIiUjAIPERERKRkFHiIiIlIyCjxERESkZBR4iIiISMko8BAREZGSUeAh\nBVNXV1fuInQ72uelp31eetrn1aXogYeZ9TGz18ysycwGZaxryng0mtkRGWkGmdlYM1tsZlPMbESx\nyyydo4ND6Wmfl572eelpn1eXXiXYxrVAPbBDlvXHAU8Ayb1kP0lWmFk/YAzwJHByKo87zGy+u48u\nWolFRESkKIoaeJjZfsA+wOHA/lmSNbj77CzrjgF6Aye6+3JgopntDJwDKPAQERFZwRStqcXMBgC3\nEsHD4naS3mxms83sJTM7IWPdEGBsKuhIjAEGmln/wpZYREREiq2YNR53ALe4+6tmtmmWNBcDTwOL\ngKHALWa2qrvflFq/PjA54z0z09Y1ZMm3L8DEiRM7W3bphIaGBsaPH1/uYnQr2uelp31eetrnpZV2\n7pr5tHYAAAYrSURBVOxblA24e84P4CqgqZ1HI/Bl4ExgLNAj9b7NUusHdZD/SGBK2usxwG8z0myT\n2s7AdvL5PuB66KGHHnrooUenH9/PJ0bI9ZFvjcd1RE1Gez4A9gS+Diwxs/R1r5jZH909s0kl8TJw\nsZn1dvdlwAxgQEaa5PWMdsowBjga+BD4vIPyioiISLO+RIXBmGJknlfg4e5zgbkdpTOzM4Cfpi3a\nkPgARxDBRTY7A/NTQQfAC8AVZtbT3RtTy4YCk9w9WzNLUs57OyqniIiItOn5YmVclD4e7l6f/trM\nFhLDZSe7+8epZQcStRcvErUSQ4ELieG3iXuBnwO3m9k1xHDaM4GzilFuERERKa5SzOOR8IzXy4DT\ngFFEUPIecHb6/BzuvsDMhgI3A68Ac4CR7n5baYosIiIihWSpzpgiIiIiRad7tYiIiEjJKPAQERGR\nkqm6wMPMTjOzD1I3lXvRzL5a7jJVCzO70MxeNrMFZjbTzB4ysy+3ke4yM/vYzBaZ2VNmtlU5yluN\nzOyC1A0VR2Us1z4vIDPb0MzuMbM5qX36upnVZKTRPi8QM+thZpeb2eTU/nzPzH7WRjrt804ys93M\n7BEzm5Y6hhzcRpp296+ZrWRmN6d+F5+a2YNmtl6+ZamqwMPMjgSuBy4hhua+Dowxs3XKWrDqsRtw\nI/A1YG/iPjpPmtnKSQIzOx84HfgRsAuwkPgf9Cl9catLKoj+EfG9Tl+ufV5AZrYG8BywBNiXmLTw\nXGB+Whrt88K6gLgR6I+BrYGfAD8xs9OTBNrnXbYq8Bqxj1t17sxx//4KOIC4/9q3iKky/px3SYox\nK1m5HsTQ3F+nvTbizrg/KXfZqvEBrEPMSLtr2rK
[base64 PNG data removed -- figure: likelihood proxy of the layer-2 RBM over pre-training epochs]\n",
      "text/plain": [
       ""
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt_dat = np.genfromtxt('dbn_params/lproxy_layer_2.csv', delimiter=',', names = True)  # alternative: 'model_params/likelihood_proxy.csv'\n",
    "plt.plot(plt_dat)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "dat_x = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header = 1)\n",
    "dat_y = dat_x[:,0]\n",
    "dat_x = dat_x[:,1:]\n",
    "vocab = np.genfromtxt('data/dtm_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]\n",
    "test_input = theano.shared(dat_x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(18828, 2756)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dat_x.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def score_rsm(input, learning_rate=0.002, \n",
    "              training_epochs=50, batch_size=400, \n",
    "              n_hidden=2000, model_src = 'model_params/rsm_epoch_80.pkl'):\n",
    "\n",
    "    train_set_x = input\n",
    "    N_input_x = train_set_x.shape[0]\n",
    "    \n",
    "    # compute the number of minibatches needed to score the full data set\n",
    "    N_splits = int( np.floor(train_set_x.get_value(borrow=True).shape[0] / batch_size) + 1 )\n",
    "\n",
    "    # allocate symbolic variables for the data\n",
    "    index = T.lscalar()    # index to a [mini]batch\n",
    "    x = T.matrix('x')      # each row is a bag-of-words count vector\n",
    "    \n",
    "    # construct the RSM class\n",
    "    rsm = RSM(input=x, n_visible=train_set_x.get_value(borrow=True).shape[1],\n",
    "              n_hidden=n_hidden)\n",
    "    \n",
    "    # ensure a path to a saved model is supplied\n",
    "    assert model_src is not None and type(model_src) == str\n",
    "    \n",
    "    # load the saved model parameters\n",
    "    rsm.__setstate__(pickle.load(open(model_src, 'rb')))\n",
    "    \n",
    "    # extract the hidden features w.r.t. the inputs; the second argument\n",
    "    # is the word count of each document\n",
    "    _, hid_extract = rsm.propup(x, x.sum(axis=1))\n",
    "    \n",
    "    # compile a function that returns the hidden activations for one\n",
    "    # minibatch of documents\n",
    "    score = theano.function(\n",
    "        inputs = [index],\n",
    "        outputs = hid_extract,\n",
    "        givens={\n",
    "            x: train_set_x[index * batch_size: (index + 1) * batch_size]\n",
    "        }\n",
    "    )\n",
    "    \n",
    "    return np.concatenate( [score(ii) for ii in range(N_splits)], axis=0 )\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# the n_hidden argument is immaterial here: __setstate__ replaces the\n",
    "# parameters with the 2000-unit model loaded from disk\n",
    "rsm = score_rsm(input=test_input, n_hidden = 0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(18828, 2000)"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rsm.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "878.73932"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rsm[18828-1].sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 1.92498332e-38, 4.09554756e-38, 0.00000000e+00, ...,\n",
       "         1.00000000e+00, 0.00000000e+00, 1.00000000e+00],\n",
       "       [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, ...,\n",
       "         1.00000000e+00, 0.00000000e+00, 1.00000000e+00]], dtype=float32)"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "rsm[:2,]"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [conda env:py3]",
   "language": "python",
   "name": "conda-env-py3-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
--------------------------------------------------------------------------------
/scripts_python/20news_dtm.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-


import os
import numpy as np
import pandas as pd
import ast
from sklearn.feature_extraction.text import CountVectorizer


# note: os.chdir does not expand '~' by itself
os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))

dat = pd.read_csv('data/clean_20news.csv', sep=",")

docs = [ast.literal_eval(doc) for doc in dat['document'].tolist()]
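# The 'document' column stores each token list as its string representation
# (data_proc_20news.py writes Python lists straight into the CSV), so
# ast.literal_eval above turns each row back into a list, e.g.
#   ast.literal_eval("['space', 'nasa', 'orbit']")  ->  ['space', 'nasa', 'orbit']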
all_words = [word for doc in docs for word in doc]
pd_all_words = pd.DataFrame({'words' : all_words})
# count each unique word's occurrences across the corpus
pd_unq_word_counts = pd.DataFrame({'count' : pd_all_words.groupby('words').size()}).reset_index().sort_values('count', ascending = False)
# keep only words appearing at least 150 times as the vocabulary
pd_unq_word_counts_filtered = pd_unq_word_counts.loc[pd_unq_word_counts['count'] >= 150]
list_unq_word_filtered = list( pd_unq_word_counts_filtered.iloc[:,0] )
len(list_unq_word_filtered)


vec = CountVectorizer(input = 'content', lowercase = False, vocabulary = list_unq_word_filtered)

# vectorize the corpus 500 documents at a time; each document is a token list,
# so summing the per-token rows gives that document's term-count vector
iters = list(range(0,len(docs),500))
iters.append(len(docs))
dtm = np.array([]).reshape(0,len(list_unq_word_filtered))
for i in range(len(iters)-1):
    dtm = np.concatenate( (dtm, list(map(lambda x: vec.fit_transform(x).toarray().sum(axis=0), docs[iters[i]:iters[i+1]] )) ), axis = 0)
    print(str(i))

# prepend the label column name without mutating the vocabulary list
colnames = ['_label_'] + list_unq_word_filtered

pd.DataFrame(data = np.c_[dat['label'].values, dtm],
             columns = colnames). \
    to_csv( 'data/dtm_20news.csv', index = False)
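# Sanity check (illustrative, assuming the script above ran to completion):
# reload the matrix just written and confirm the header and label column line up.
dtm_check = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header=1)
header = np.genfromtxt('data/dtm_20news.csv', dtype=str, delimiter=',', max_rows=1)
assert dtm_check.shape[1] == len(header)   # '_label_' plus one count column per word
print('document-term matrix shape:', dtm_check[:, 1:].shape)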
--------------------------------------------------------------------------------
/scripts_python/data_proc_20news.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 17 09:38:22 2017

@author: ekhongl
"""

import os
import numpy as np
import pandas as pd
import re

import nltk
#from nltk.tokenize import PunktSentenceTokenizer
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
ps = PorterStemmer()
lemma = WordNetLemmatizer()

from sklearn.feature_extraction.text import CountVectorizer

import gensim
from gensim import corpora, models


dir_wd = os.getcwd()

dir_src = os.path.join(dir_wd, 'data/raw_20news/20news-18828')

dir_src_classes = list( map(lambda x: os.path.join(dir_src, x ), os.listdir(dir_src)) )


dat = []
dat_y = []
dat_y_cat = []

for i in range(0,len(dir_src_classes)):

    print('Currently loading the following topic (iteration ' + str(i) + '):\n \t' + dir_src_classes[i])
    dir_src_classes_file = list( map(lambda x: os.path.join(dir_src_classes[i], x), os.listdir(dir_src_classes[i])) )

    for ii in range(0, len(dir_src_classes_file)):

        dat_y.append(i)

        with open(dir_src_classes_file[ii], 'r') as file:
            dat.append(file.read().replace('\n', ' '))

# export the raw documents together with their integer class labels
# ('document' holds the text, 'label' the class index)
pd.DataFrame( { 'document' : dat,
                'label' : dat_y}). \
    to_csv(os.path.join(dir_wd,'data/raw_20news/20news.csv'),
           index=False)

print('------- Data cleaning -------')
stopwords_en = stopwords.words('english')
dat_clean = []
for i in range(len(dat)):

    ''' tokenization and punctuation removal '''
    # uses nltk tokenization - e.g. shouldn't = [should, n't] instead of [shouldn, 't]
    tmp_doc = nltk.tokenize.word_tokenize(dat[i].lower())

    # split words separated by full stops
    tmp_doc_split = [w.split('.') for w in tmp_doc if len(w.split('.')) > 1]
    # flatten list
    tmp_doc_split = [i_sublist for i_list in tmp_doc_split for i_sublist in i_list]
    # clean split words
    tmp_doc_split = [w for w in tmp_doc_split if re.search('^[a-z]+$',w)]

    # drop punctuation
    tmp_doc_clean = [w for w in tmp_doc if re.search('^[a-z]+$',w)]
    tmp_doc_clean.extend(tmp_doc_split)

    ''' stop word removal '''
    tmp_doc_clean_stop = [w for w in tmp_doc_clean if w not in stopwords_en]
    # retain only words with 3 or more characters
    tmp_doc_clean_stop = [w for w in tmp_doc_clean_stop if len(w) > 2]

    ''' stemming (using the Porter algorithm) '''
    tmp_doc_clean_stop_stemmed = [ps.stem(w) for w in tmp_doc_clean_stop]
    dat_clean.append(tmp_doc_clean_stop_stemmed)

    # print progress
    if i % 100 == 0: print( 'Current progress: ' + str(i) + '/' + str(len(dat)) )

# save cleaned data
pd.DataFrame( { 'document' : dat_clean,
                'label' : dat_y}). \
    to_csv(os.path.join(dir_wd,'data/clean_20news.csv'),
           index=False)


# baseline LDA topic model over the cleaned corpus, for comparison with the
# neural models
dictionary = corpora.Dictionary(dat_clean)

# convert tokenized documents into a document-term matrix
dtm = [dictionary.doc2bow(doc) for doc in dat_clean]

# generate LDA model
ldamodel = gensim.models.ldamulticore.LdaMulticore( workers=3,
                                                    corpus = dtm,
                                                    id2word = dictionary,
                                                    num_topics=20,
                                                    passes=100,
                                                    random_state=123)


# ------------------------------------------------------------------------------
# exploratory scratch below -- kept for reference; anything relying on names
# not defined in this script is commented out so the file remains importable
# ------------------------------------------------------------------------------

# all_words = [word for doc in dat_clean for word in doc]
# pd_all_words = pd.DataFrame({'words' : all_words})
# pd_unq_word_counts = pd.DataFrame({'count' : pd_all_words.groupby('words').size()}).reset_index().sort_values('count', ascending = False)
# pd_unq_word_counts_filtered = pd_unq_word_counts.loc[pd_unq_word_counts['count'] >= 100]
# list_unq_word_filtered = list( pd_unq_word_counts_filtered.iloc[:,0] )

# vec = CountVectorizer(input = 'content', lowercase = False, vocabulary = list_unq_word_filtered)

# iters = list(range(0,len(dat_clean),500))
# iters.append(len(dat_clean))
# dtm = []
# for i in range(len(iters)-1):
#     dtm = np.concatenate((dtm, list(map(lambda x: vec.fit_transform(x).toarray().astype(np.int32), dat_clean[iters[i]:iters[i+1]] )) ), axis = 0)
#     print(str(i))

# word-count histogram of the corpus
# import matplotlib.pyplot as plt
# plt.hist(pd_all_words['words'].value_counts()[300:3000])
# plt.title("Plot of word counts")
# plt.xlabel("Value")
# plt.ylabel("Frequency")
# plt.show()

# alternative normalisation experiment (lemmatisation + POS tagging) --
# depends on names (raw_data, en_stop, RegexpTokenizer, pos_tag) not defined here
# p_stemmer = PorterStemmer()
# texts = []
# tokenizer = RegexpTokenizer(r'\w+')
# for i in raw_data:
#     raw_str = pd.Series.to_string(raw_data[i]).lower()
#     tokens = raw_str.split()
#     stopped_tokens = [w for w in tokens if not w in en_stop]
#     removedP_tokens = [w for w in stopped_tokens if re.search('^[a-z]+$',w)]
#     cleaned_tokens = [w for w in removedP_tokens if len(w) >= 3]
#     normalized = [lemma.lemmatize(w) for w in cleaned_tokens]
#     normalized_pos = pos_tag(normalized)
#     print(normalized_pos)
#     texts.append(normalized)

#------------------------------------------------------------------------------
# grouping of the 20 newsgroup labels into their top-level categories
# (see the illustrative helper below):
#   alt   0
#   comp  1,2,3,4,5
#   misc  6
#   rec   7,8,9,10
#   sci   11,12,13,14
#   soc   15
#   talk  16,17,18,19
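# Hypothetical helper, illustrating the mechanics of the grouping above. The
# actual 6-class mapping behind 'dtm_2000_20news_6class.csv' (consumed by
# scripts_python/train_dbn.py) is not shown in this snapshot, so the
# top-level-prefix grouping here is an example only.
prefix_of = {0: 'alt',
             1: 'comp', 2: 'comp', 3: 'comp', 4: 'comp', 5: 'comp',
             6: 'misc',
             7: 'rec', 8: 'rec', 9: 'rec', 10: 'rec',
             11: 'sci', 12: 'sci', 13: 'sci', 14: 'sci',
             15: 'soc',
             16: 'talk', 17: 'talk', 18: 'talk', 19: 'talk'}

def group_labels(labels):
    # map each 0-19 newsgroup id to a consecutive super-class id
    names = sorted(set(prefix_of.values()))
    return np.array([names.index(prefix_of[int(l)]) for l in labels])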
--------------------------------------------------------------------------------
/scripts_python/pretrain_dbn.py:
--------------------------------------------------------------------------------
from __future__ import print_function, division
import os
import sys
import timeit
from six.moves import cPickle as pickle

import numpy as np
import pandas as pd

import theano
import theano.tensor as T

from lib.deeplearning import deepbeliefnet

os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))

# load the 2000-word document-term matrix; the first column is the class label
dat_x = np.genfromtxt('data/dtm_2000_20news.csv', dtype='float32', delimiter=',', skip_header = 1)
dat_y = dat_x[:,0]
dat_x = dat_x[:,1:]
vocab = np.genfromtxt('data/dtm_2000_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]
x = theano.shared(dat_x)
y = T.cast(dat_y, dtype='int32')


# RSM input layer of 2000 visible units, followed by two 500-unit RBM layers
# and a 128-unit top layer
model = deepbeliefnet(architecture = [2000, 500, 500, 128], n_outs = 20)

# layer-wise pre-training; the RSM gets many more epochs than the upper RBMs
model.pretrain(input = x, pretraining_epochs= [1000,100,100], output_path = 'params/dbn_params_test')
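# Convergence check (illustrative): pre-training writes per-layer likelihood
# proxies that can be plotted afterwards. The 'lproxy_layer_<n>.csv' filenames
# are an assumption based on the notebook that plots
# 'dbn_params/lproxy_layer_2.csv'; adjust to whatever output_path produced.
def plot_likelihood_proxies(param_dir, n_layers=3):
    import matplotlib.pyplot as plt
    for layer in range(1, n_layers + 1):
        proxy = np.genfromtxt('%s/lproxy_layer_%d.csv' % (param_dir, layer),
                              delimiter=',', skip_header=1)
        plt.plot(proxy, label='layer %d' % layer)
    plt.legend()
    plt.xlabel('epoch')
    plt.ylabel('likelihood proxy')
    plt.show()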
--------------------------------------------------------------------------------
/scripts_python/train_dbn.py:
--------------------------------------------------------------------------------
from __future__ import print_function, division
import os
import sys
import timeit
from six.moves import cPickle as pickle

import numpy as np
import pandas as pd

import theano
import theano.tensor as T

from lib.deeplearning import deepbeliefnet

os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))

# 6-class document-term matrix; the first column is the class label
dat_x = np.genfromtxt('data/dtm_2000_20news_6class.csv', dtype='float32', delimiter=',', skip_header = 1)
dat_y = dat_x[:,0]
dat_x = dat_x[:,1:]
vocab = np.genfromtxt('data/dtm_2000_20news_6class.csv', dtype=str, delimiter=',', max_rows = 1)[1:]
x = theano.shared(dat_x)
y = T.cast(dat_y, dtype='int32')


# earlier experiments on the full 20-class, 2756-word matrix:
#model = deepbeliefnet(architecture = [2756, 500, 500, 128], opt_epochs = [900,5,10], n_outs = 20, predefined_weights = 'params/dbn_params')

#model.train(x=x, y=y, training_epochs = 10000, batch_size = 70, output_path = 'params/dbn_params_trained_long')

# theano_dropout
#model.train(x=x, y=y, training_epochs = 10000, learning_rate = (1/70)/2, batch_size = 120,
#            drop_out = [0.2, .5, .5, .5], output_path = 'params/dbn_params_dropout')

# theano_dropout2
#model.train(x=x, y=y, training_epochs = 10000, learning_rate = (1/70)/2, batch_size = 120,
#            drop_out = [0.2, .3, .4, .5], output_path = 'params/dbn_params_dropout2')


# fine-tune the 6-class model starting from the pre-trained weights
model = deepbeliefnet(n_outs = 6, architecture = [2000, 500, 500, 128],
                      opt_epochs = [110,15,10], predefined_weights = 'params_2000/dbn_params_pretrain')

# theano_6 class: dropout of 0.2 on the input layer and 0.5 on each hidden
# layer (see the dropout sketch below)
model.train(x=x, y=y, training_epochs = 10000, learning_rate = 1/160, batch_size = 50,
            drop_out = [0.2, .5, .5, .5], output_path = 'params_2000/dbn_params_dropout')
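# Illustrative only: what per-layer dropout probabilities such as
# [0.2, .5, .5, .5] conventionally mean -- each unit is kept with probability
# (1 - p) and the surviving activations are rescaled so their expectation is
# unchanged (inverted dropout). The actual implementation lives in lib/dbn.py
# and may differ in detail.
def dropout_sketch(activations, p_drop, rng=np.random):
    keep = rng.uniform(size=activations.shape) >= p_drop
    return activations * keep / (1.0 - p_drop)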
--------------------------------------------------------------------------------
/scripts_python/train_rsm.py:
--------------------------------------------------------------------------------
from __future__ import print_function

import timeit
from six.moves import cPickle as pickle

import numpy as np
import pandas as pd

import theano
import theano.tensor as T
import os
from lib.rbm import RSM

from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams


os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))


def test_rbm(input, learning_rate=1/1600,
             training_epochs=5000, batch_size=1600,
             n_hidden=1500, output_folder = 'model_params'):
    """
    Train a Replicated Softmax Model (RSM) on the 20 Newsgroups
    document-term matrix using persistent contrastive divergence.

    :param input: theano shared variable holding the document-term matrix

    :param learning_rate: learning rate used for training the RSM

    :param training_epochs: number of epochs used for training

    :param batch_size: size of a batch used to train the RSM

    :param n_hidden: number of hidden (topic) units

    :param output_folder: folder to which the per-epoch model pickles and
        the likelihood proxy are saved
    """
    train_set_x = input

    # compute number of minibatches for training
    n_train_batches = train_set_x.get_value(borrow=True).shape[0] // batch_size

    # allocate symbolic variables for the data
    index = T.lscalar()    # index to a [mini]batch
    x = T.matrix('x')      # each row is a bag-of-words count vector

    rng = np.random.RandomState(123)
    theano_rng = RandomStreams(rng.randint(2 ** 30))

    # initialize storage for the persistent chain (state = hidden
    # layer of chain)
    persistent_chain = theano.shared(np.zeros((batch_size, n_hidden),
                                              dtype=theano.config.floatX),
                                     borrow=True)

    # construct the RSM class
    rsm = RSM(input=x, n_visible=train_set_x.get_value(borrow=True).shape[1],
              n_hidden=n_hidden)

    # get the cost and the gradient corresponding to one step of persistent CD
    cost, updates = rsm.get_cost_updates( lr=learning_rate,
                                          persistent=persistent_chain,
                                          k=1 )

    #################################
    #       Training the RSM        #
    #################################
    if not os.path.isdir(output_folder):
        os.makedirs(output_folder)

    # the returned cost is only monitored; the real work of train_rbm is
    # performed by the parameter updates
    train_rbm = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size]
        },
        name='train_rbm'
    )

    start_time = timeit.default_timer()

    # go through training epochs
    lproxy = []
    for epoch in range(training_epochs):

        # go through the training set
        mean_cost = []
        for batch_index in range(n_train_batches):
            mean_cost += [train_rbm(batch_index)]
            print('Current iteration: Epoch=' + str(epoch) + ', ' + 'Batch=' + str(batch_index))

        # save the model parameters for each epoch
        print('Saving model...')
        epoch_pickle = output_folder + '/rsm_epoch_' + str(epoch) + '.pkl'
        path_epoch_pickle = os.path.join(os.getcwd(), epoch_pickle)
        pickle.dump( rsm.__getstate__(), open(path_epoch_pickle, 'wb'))
        print('...model saved.')

        # record the average batch cost as a proxy for the likelihood
        lproxy += [np.mean(mean_cost)]
        print('Training epoch %d, likelihood proxy is ' % epoch, lproxy[epoch])

    end_time = timeit.default_timer()

    pretraining_time = end_time - start_time

    pd.DataFrame(data = {'likelihood_proxy' : lproxy} ). \
        to_csv(output_folder + '/likelihood_proxy.csv', index = False)

    print ('Training took %f minutes' % (pretraining_time / 60.))
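# Why the notebooks call rsm.propup(x, x.sum(axis=1)): in the Replicated
# Softmax Model the hidden bias is scaled by each document's word count D,
#
#     p(h_j = 1 | v) = sigmoid( D * a_j + sum_k v_k * W_kj )
#
# so the scoring code passes D as the second argument. A minimal numpy sketch
# (W and a are hypothetical weight/bias arrays):
def rsm_hidden_probs(v, W, a):
    D = v.sum(axis=1, keepdims=True)   # per-document word counts
    return 1.0 / (1.0 + np.exp(-(v.dot(W) + D * a)))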
# load the document-term matrix; the first column holds the class labels
dat_x = np.genfromtxt('data/dtm_20news.csv', dtype='float32', delimiter=',', skip_header = 1)
dat_y = dat_x[:,0]
dat_x = dat_x[:,1:]
vocab = np.genfromtxt('data/dtm_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]
test_input = theano.shared(dat_x)

test_rbm(input = test_input)
--------------------------------------------------------------------------------
/scripts_python/train_sae.py:
--------------------------------------------------------------------------------
"""Fine-tunes the pre-trained deep belief network as a deep autoencoder.

The DBN is unrolled into an encoder-decoder pair and trained to reconstruct
its input, so that the 128-dimensional code layer becomes a compact
representation of each document.
"""
from __future__ import print_function, division
import os
import sys
import timeit
from six.moves import cPickle as pickle

import numpy as np
import pandas as pd

import theano
import theano.tensor as T

from lib.deeplearning import autoencoder

os.chdir(os.path.expanduser('~/Codes/DL - Topic Modelling'))

dat_x = np.genfromtxt('data/dtm_2000_20news.csv', dtype='float32', delimiter=',', skip_header = 1)
dat_y = dat_x[:,0]
dat_x = dat_x[:,1:]
vocab = np.genfromtxt('data/dtm_2000_20news.csv', dtype=str, delimiter=',', max_rows = 1)[1:]
test_input = theano.shared(dat_x)

# earlier experiments on the 2756-word matrix:
#model = autoencoder( architecture = [2756, 500, 500, 128], opt_epochs = [900,5,10], model_src = 'params/dbn_params')

#model.train(test_input, batch_size = 50, learning_rate = 1/20000, epochs = 60000, obj_fn = 'mean_sq', output_path = 'params/ae_params_meansq2')

# theano 2
#model.train(test_input, batch_size = 200, learning_rate = 1/20000, epochs = 60000, output_path = 'params/ae_params_noise')

# theano 3
#model.train(test_input, batch_size = 500, learning_rate = 1/20000, epochs = 60000, output_path = 'params/ae_params_noise_2')

#-----------------------------------------------------------------------------------------

model = autoencoder( architecture = [2000, 500, 500, 128], opt_epochs = [110,15,10], model_src = 'params_2000/dbn_params_pretrain')

# theano 1
#model.train(test_input, add_noise = 16, batch_size = 500, learning_rate = 1/20000, epochs = 60000, \
#            output_path = 'params_2000/ae_train_noise')

# theano 2
model.train(test_input, batch_size = 200, learning_rate = 1/20000, epochs = 60000, \
            output_path = 'params_2000/ae_train_nonoise')
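# ------------------------------------------------------------------------------
# Illustrative follow-up, not part of the original script: once fine-tuning is
# complete, each document is represented by a 128-dimensional code, and
# documents can be compared directly in that space. `codes` below is assumed
# to be an (n_docs, 128) numpy array obtained from the trained encoder; the
# extraction call itself depends on the autoencoder class in
# lib/deeplearning.py and is not shown here.

def most_similar(codes, query_idx, top_n=10):
    # cosine similarity between the query document's code and all others
    norms = np.linalg.norm(codes, axis=1, keepdims=True)
    unit = codes / np.clip(norms, 1e-8, None)
    sims = unit.dot(unit[query_idx])
    return np.argsort(-sims)[1:top_n + 1]   # drop the query document itself
--------------------------------------------------------------------------------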