├── runtrain.lua
├── README.md
├── dA.py
├── NTP.ipynb
└── NTP2.ipynb

/runtrain.lua:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elbamos/NeuralTopicModels/HEAD/runtrain.lua
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
1 | # NeuralTopicModels
2 |
3 | This repo contains a WIP implementation of http://nlp.cs.rpi.edu/paper/AAAI15.pdf. It is here for sharing with collaborators -- it is not final or presentation-quality code.
4 |
5 | The NTM model is intended to work essentially as follows:
6 |
7 | The outputs are W1 and lt.
8 |
9 | * W1 -- An embedding giving, approximately, the distribution over topics for each document.
10 | * lt -- An embedding giving, approximately, the distribution over topics for each term.
11 |
12 | W1 and lt are calculated as follows:
13 |
14 | ### Pre-training
15 |
16 | `le`, in R^{n_terms * 300}, maps each term to the sum of the word2vec embeddings of the grams within the term. (A term may be an n-gram with n > 1.)
17 |
18 | `le` is mapped to `lt` by a sigmoid activation of the weight matrix W2: `lt` = sigmoid(`le` . W2).
19 |
20 | W2, in R^{300 * n_topics}, is pre-trained by auto-encoding `le` against itself.
21 |
22 | W1, in R^{n_docs * n_topics}, is pre-trained so that each document's embedding is the sum of the pre-trained `lt` activations of the terms contained in the document.
23 |
24 | ### Fine-tuning
25 |
26 | Each example is a combination of (a) a term, (b) a document containing the term, and (c) a random document that does not contain the term.
27 |
28 | `ld+` and `ld-`, in R^{n_topics}, are the softmax activations of the W1 rows for the positive and negative documents, respectively.
29 |
30 | `ls+` and `ls-` are then calculated. Each is a scalar representing the predicted probability that the term appears in the positive and negative document, respectively:
31 |
32 | `ls` = `lt` . `ld` (the dot product of the term's topic vector with the document's topic vector)
33 |
34 | The cost is then calculated as:
35 |
36 | c(g, d+, d-) = max(0., 0.5 - `ls+` + `ls-`)
37 |
38 | Thus, the algorithm seeks (a) an embedding for the documents and (b) a weight matrix mapping the term word2vec embeddings to topics such that, given any term and a document containing it, the predicted probability that the term appears in that document is at least 0.5 greater than the predicted probability that it appears in a randomly chosen document that does not contain it. (A small numerical sketch of the scoring, the cost, and the precision issue discussed below appears after the first experiment.)
39 |
40 | ### Issues
41 |
42 | In my testing -- a corpus of 1M documents and 30,000 terms with ~10M total grams, aiming for 128 topics -- I found that W1 and W2 both consistently converge toward 0, usually after only 1 epoch.
43 |
44 | * Theory: In debugging, I observed that `ls-` - `ls+` tends to be around 1e-8. After adding 0.5, I suspect that a 32-bit float represents the result as exactly 0.5, losing the difference.
45 |
46 | I suspect this causes every loss to be calculated as 0.5, which corrupts the gradients for W1 and W2.
47 |
48 | Experiments:
49 |
50 | * Pretraining W1 & W2 while ignoring the 0.5 separation: I tried this cost function:
51 |
52 | c(g, d+, d-) = mean(binary_crossentropy(`ls+`, 1), binary_crossentropy(`ls-`, 0))
53 |
54 | Result: Convergence toward zero.
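The following is a minimal numpy sketch of the scoring, the margin cost, and the float32-precision theory above. It is illustrative only -- the topic count, the random embeddings, and the 1e-8 gap are assumptions for demonstration, not code from this repo.

    # Sketch (assumed shapes): score one term against a positive and a negative document.
    import numpy as np

    n_topics = 128
    rng = np.random.RandomState(0)

    lt = rng.rand(n_topics).astype("float32")                     # values in (0, 1), like sigmoid term-topic activations
    ld_pos = rng.dirichlet(np.ones(n_topics)).astype("float32")   # sums to 1, like a softmax document embedding
    ld_neg = rng.dirichlet(np.ones(n_topics)).astype("float32")

    ls_pos = np.dot(lt, ld_pos)              # predicted score for the positive document
    ls_neg = np.dot(lt, ld_neg)              # predicted score for the negative document
    cost = max(0.0, 0.5 - ls_pos + ls_neg)   # margin cost c(g, d+, d-)

    # Precision check: a gap of ~1e-8 disappears next to 0.5 in float32,
    # because adjacent float32 values near 0.5 are about 6e-8 apart.
    print(np.float32(0.5) + np.float32(1e-8) == np.float32(0.5))  # True -> loss saturates at 0.5

If the check prints True, any example whose score gap is on that order contributes a constant 0.5 to the loss, which matches the behavior described in the theory above.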
55 |
56 | * Gradient enhancement: on the theory that the problem was underflow, I tried this cost function:
57 |
58 | c(g, d+, d-) = max(0., 0.5 + max(n_docs, 10 ** epoch) * (`ls-` - `ls+`))
59 |
60 | Result: Convergence toward zero.
61 |
62 | * Normalization: To keep the weights in W1 and W2 from collapsing toward zero, I tried:
63 |
64 | Modifying the formula for `lt` to softmax(softplus(`le`)). This is intended to prevent `lt` from approaching zero while encouraging greater differentiation among topics and terms.
65 |
66 | Enforcing a unit-norm constraint on W1, the document-topic embedding matrix.
67 |
68 | Result: Still in testing.
69 |
70 | * Optimization: I experimented with `adadelta` (which I have generally found very effective) instead of vanilla SGD.
71 |
72 | Result: W1 converged toward zero much more quickly than with vanilla SGD.
73 |
74 |
--------------------------------------------------------------------------------

/dA.py:
--------------------------------------------------------------------------------
1 | """
2 | This is the denoising auto-encoder class from deeplearning.net, modified as described in the accompanying IPython notebook: the biases are removed, the reconstruction activation is linear, and the cost is mean squared error.
3 | """
4 |
5 | import os
6 | import sys
7 | import timeit
8 |
9 | import numpy
10 |
11 | import theano
12 | import theano.tensor as T
13 | from theano.tensor.shared_randomstreams import RandomStreams
14 |
15 | from logistic_sgd import load_data
16 | from utils import tile_raster_images
17 |
18 | try:
19 |     import PIL.Image as Image
20 | except ImportError:
21 |     import Image
22 |
23 |
24 | class dA(object):
25 |     """Denoising Auto-Encoder class (dA)
26 |
27 |     A denoising autoencoder tries to reconstruct the input from a corrupted
28 |     version of it by first projecting it into a latent space and then
29 |     projecting it back into the input space. Please refer to Vincent et al., 2008
30 |     for more details. If x is the input, then equation (1) computes a partially
31 |     destroyed version of x by means of a stochastic mapping q_D. Equation (2)
32 |     computes the projection of the input into the latent space. Equation (3)
33 |     computes the reconstruction of the input, while equation (4) computes the
34 |     reconstruction error.
35 |
36 |     .. math::
37 |
38 |         \tilde{x} ~ q_D(\tilde{x}|x)                                     (1)
39 |
40 |         y = s(W \tilde{x} + b)                                           (2)
41 |
42 |         z = s(W' y + b')                                                 (3)
43 |
44 |         L(x, z) = -sum_{k=1}^d [x_k \log z_k + (1 - x_k) \log(1 - z_k)]  (4)
45 |
46 |     """
47 |
48 |     def __init__(
49 |         self,
50 |         numpy_rng,
51 |         theano_rng=None,
52 |         input=None,
53 |         n_visible=784,
54 |         n_hidden=500,
55 |         W=None
56 |     ):
57 |         """
58 |         Initialize the dA class by specifying the number of visible units (the
59 |         dimension d of the input), the number of hidden units (the dimension
60 |         d' of the latent or hidden space) and the corruption level. The
61 |         constructor also receives symbolic variables for the input, weights and
62 |         bias. Such symbolic variables are useful when, for example, the input
63 |         is the result of some computation, or when weights are shared between
64 |         the dA and an MLP layer. When dealing with SdAs this always happens:
65 |         the dA on layer 2 gets as input the output of the dA on layer 1,
66 |         and the weights of the dA are used in the second stage of training
67 |         to construct an MLP.
68 |
69 |         :type numpy_rng: numpy.random.RandomState
70 |         :param numpy_rng: numpy random number generator used to generate weights
71 |
72 |         :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
73 |         :param theano_rng: Theano random generator; if None is given, one is
74 |                      generated based on a seed drawn from `rng`
75 |
76 |         :type input: theano.tensor.TensorType
77 |         :param input: a symbolic description of the input, or None for a
78 |                       standalone dA
79 |
80 |         :type n_visible: int
81 |         :param n_visible: number of visible units
82 |
83 |         :type n_hidden: int
84 |         :param n_hidden: number of hidden units
85 |
86 |         :type W: theano.tensor.TensorType
87 |         :param W: Theano variable pointing to a set of weights that should be
88 |                   shared between the dA and another architecture; if the dA
89 |                   should be standalone, set this to None
90 |
91 |         """
92 |         self.n_visible = n_visible
93 |         self.n_hidden = n_hidden
94 |
95 |         # create a Theano random generator that gives symbolic random values
96 |         if not theano_rng:
97 |             theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
98 |
99 |         # note : W' was written as `W_prime` and b' as `b_prime`
100 |         if not W:
101 |             # W is initialized with `initial_W`, which is uniformly sampled
102 |             # from -4*sqrt(6./(n_visible+n_hidden)) to
103 |             # 4*sqrt(6./(n_hidden+n_visible)). The output of uniform is
104 |             # converted with asarray to dtype theano.config.floatX
105 |             # so that the code is runnable on a GPU
106 |             initial_W = numpy.asarray(
107 |                 numpy_rng.uniform(
108 |                     low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
109 |                     high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
110 |                     size=(n_visible, n_hidden)
111 |                 ),
112 |                 dtype=theano.config.floatX
113 |             )
114 |             W = theano.shared(value=initial_W, name='W', borrow=True)
115 |
116 |         self.W = W
117 |         # tied weights, therefore W_prime is W transpose
118 |         self.W_prime = self.W.T
119 |         self.theano_rng = theano_rng
120 |         # if no input is given, generate a variable representing the input
121 |         if input is None:
122 |             # we use a matrix because we expect a minibatch of several
123 |             # examples, each example being a row
124 |             self.x = T.dmatrix(name='input')
125 |         else:
126 |             self.x = input
127 |
128 |         self.params = [self.W]
129 |
130 |     def get_corrupted_input(self, input, corruption_level):
131 |         """This function keeps ``1-corruption_level`` of the input entries
132 |         the same and zeros out a randomly selected subset of size ``corruption_level``.
133 |         Note : the first argument of theano_rng.binomial is the shape (size) of the
134 |                random numbers it should produce,
135 |                the second argument is the number of trials,
136 |                the third argument is the probability of success of any trial
137 |
138 |         This will produce an array of 0s and 1s, where 1 appears with
139 |         probability 1 - ``corruption_level`` and 0 with probability
140 |         ``corruption_level``.
141 |
142 |         The binomial function returns an int64 data type by
143 |         default. int64 multiplied by the input
144 |         type (floatX) always returns float64. To keep all data
145 |         in floatX when floatX is float32, we set the dtype of
146 |         the binomial to floatX. As in our case the value of
147 |         the binomial is always 0 or 1, this doesn't change the
148 |         result. This is needed to allow the GPU to work
149 |         correctly, as it only supports float32 for now.
150 | 151 | """ 152 | return self.theano_rng.binomial(size=input.shape, n=1, 153 | p=1 - corruption_level, 154 | dtype=theano.config.floatX) * input 155 | 156 | def get_hidden_values(self, input): 157 | """ Computes the values of the hidden layer """ 158 | return T.nnet.sigmoid(T.dot(input, self.W)) 159 | 160 | def get_reconstructed_input(self, hidden): 161 | """Computes the reconstructed input given the values of the 162 | hidden layer 163 | 164 | """ 165 | return T.dot(hidden, self.W_prime) 166 | 167 | def get_cost_updates(self, corruption_level, learning_rate): 168 | """ This function computes the cost and the updates for one trainng 169 | step of the dA """ 170 | 171 | tilde_x = self.get_corrupted_input(self.x, corruption_level) 172 | y = self.get_hidden_values(tilde_x) 173 | z = self.get_reconstructed_input(y) 174 | # note : we sum over the size of a datapoint; if we are using 175 | # minibatches, L will be a vector, with one entry per 176 | # example in minibatch 177 | # L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1) 178 | L = T.sqr(self.x - z) 179 | # note : L is now a vector, where each element is the 180 | # cross-entropy cost of the reconstruction of the 181 | # corresponding example of the minibatch. We need to 182 | # compute the average of all these to get the cost of 183 | # the minibatch 184 | cost = T.mean(L) 185 | 186 | # compute the gradients of the cost of the `dA` with respect 187 | # to its parameters 188 | gparams = T.grad(cost, self.params) 189 | # generate the list of updates 190 | updates = [ 191 | (param, param - learning_rate * gparam) 192 | for param, gparam in zip(self.params, gparams) 193 | ] 194 | 195 | return (cost, updates) 196 | 197 | 198 | def test_dA(learning_rate=0.1, training_epochs=15, 199 | dataset='mnist.pkl.gz', 200 | batch_size=20, output_folder='dA_plots'): 201 | 202 | """ 203 | This demo is tested on MNIST 204 | 205 | :type learning_rate: float 206 | :param learning_rate: learning rate used for training the DeNosing 207 | AutoEncoder 208 | 209 | :type training_epochs: int 210 | :param training_epochs: number of epochs used for training 211 | 212 | :type dataset: string 213 | :param dataset: path to the picked dataset 214 | 215 | """ 216 | datasets = load_data(dataset) 217 | train_set_x, train_set_y = datasets[0] 218 | 219 | # compute number of minibatches for training, validation and testing 220 | n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size 221 | 222 | # start-snippet-2 223 | # allocate symbolic variables for the data 224 | index = T.lscalar() # index to a [mini]batch 225 | x = T.matrix('x') # the data is presented as rasterized images 226 | # end-snippet-2 227 | 228 | if not os.path.isdir(output_folder): 229 | os.makedirs(output_folder) 230 | os.chdir(output_folder) 231 | 232 | #################################### 233 | # BUILDING THE MODEL NO CORRUPTION # 234 | #################################### 235 | 236 | rng = numpy.random.RandomState(123) 237 | theano_rng = RandomStreams(rng.randint(2 ** 30)) 238 | 239 | da = dA( 240 | numpy_rng=rng, 241 | theano_rng=theano_rng, 242 | input=x, 243 | n_visible=28 * 28, 244 | n_hidden=500 245 | ) 246 | 247 | cost, updates = da.get_cost_updates( 248 | corruption_level=0., 249 | learning_rate=learning_rate 250 | ) 251 | 252 | train_da = theano.function( 253 | [index], 254 | cost, 255 | updates=updates, 256 | givens={ 257 | x: train_set_x[index * batch_size: (index + 1) * batch_size] 258 | } 259 | ) 260 | 261 | start_time = timeit.default_timer() 262 | 263 
| ############ 264 | # TRAINING # 265 | ############ 266 | 267 | # go through training epochs 268 | for epoch in xrange(training_epochs): 269 | # go through trainng set 270 | c = [] 271 | for batch_index in xrange(n_train_batches): 272 | c.append(train_da(batch_index)) 273 | 274 | print 'Training epoch %d, cost ' % epoch, numpy.mean(c) 275 | 276 | end_time = timeit.default_timer() 277 | 278 | training_time = (end_time - start_time) 279 | 280 | print >> sys.stderr, ('The no corruption code for file ' + 281 | os.path.split(__file__)[1] + 282 | ' ran for %.2fm' % ((training_time) / 60.)) 283 | image = Image.fromarray( 284 | tile_raster_images(X=da.W.get_value(borrow=True).T, 285 | img_shape=(28, 28), tile_shape=(10, 10), 286 | tile_spacing=(1, 1))) 287 | image.save('filters_corruption_0.png') 288 | 289 | # start-snippet-3 290 | ##################################### 291 | # BUILDING THE MODEL CORRUPTION 30% # 292 | ##################################### 293 | 294 | rng = numpy.random.RandomState(123) 295 | theano_rng = RandomStreams(rng.randint(2 ** 30)) 296 | 297 | da = dA( 298 | numpy_rng=rng, 299 | theano_rng=theano_rng, 300 | input=x, 301 | n_visible=28 * 28, 302 | n_hidden=500 303 | ) 304 | 305 | cost, updates = da.get_cost_updates( 306 | corruption_level=0.3, 307 | learning_rate=learning_rate 308 | ) 309 | 310 | train_da = theano.function( 311 | [index], 312 | cost, 313 | updates=updates, 314 | givens={ 315 | x: train_set_x[index * batch_size: (index + 1) * batch_size] 316 | } 317 | ) 318 | 319 | start_time = timeit.default_timer() 320 | 321 | ############ 322 | # TRAINING # 323 | ############ 324 | 325 | # go through training epochs 326 | for epoch in xrange(training_epochs): 327 | # go through trainng set 328 | c = [] 329 | for batch_index in xrange(n_train_batches): 330 | c.append(train_da(batch_index)) 331 | 332 | print 'Training epoch %d, cost ' % epoch, numpy.mean(c) 333 | 334 | end_time = timeit.default_timer() 335 | 336 | training_time = (end_time - start_time) 337 | 338 | print >> sys.stderr, ('The 30% corruption code for file ' + 339 | os.path.split(__file__)[1] + 340 | ' ran for %.2fm' % (training_time / 60.)) 341 | # end-snippet-3 342 | 343 | # start-snippet-4 344 | image = Image.fromarray(tile_raster_images( 345 | X=da.W.get_value(borrow=True).T, 346 | img_shape=(28, 28), tile_shape=(10, 10), 347 | tile_spacing=(1, 1))) 348 | image.save('filters_corruption_30.png') 349 | # end-snippet-4 350 | 351 | os.chdir('../') 352 | 353 | 354 | if __name__ == '__main__': 355 | test_dA() 356 | -------------------------------------------------------------------------------- /NTP.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import os\n", 12 | "import sys\n", 13 | "import timeit\n", 14 | "import numpy\n", 15 | "from keras.models import *\n", 16 | "from keras.layers.core import *\n", 17 | "from keras.layers.embeddings import *\n", 18 | "from keras.regularizers import l2\n", 19 | "from keras import backend as K\n", 20 | "from scipy.io import loadmat\n", 21 | "from scipy.io import savemat\n", 22 | "from keras.models import model_from_json\n", 23 | "from IPython.display import SVG\n", 24 | "from keras.utils.visualize_util import to_graph\n", 25 | "from keras.callbacks import ModelCheckpoint\n", 26 | "import theano\n", 27 | "import theano.tensor as T\n", 28 | "from 
theano.tensor.shared_randomstreams import RandomStreams\n", 29 | "to_path = \"./\"" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": { 36 | "collapsed": false 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "# Matrix of word2vec activations for each term in our dictionary, (n_terms, 300)\n", 41 | "term_matrix = loadmat(to_path + \"t1_termatrix.mat\", variable_names = \"target\").get(\"target\").astype(\"float32\")\n", 42 | "term_matrix.shape" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": null, 48 | "metadata": { 49 | "collapsed": false 50 | }, 51 | "outputs": [], 52 | "source": [ 53 | "# Pretrain W2\n", 54 | "# W2 is pretrained by autoencoding with the formula le' = sigmoid(le dot W2) dot t(W2)\n", 55 | "from dA import dA\n", 56 | "# The dA class from deeplearning.net was modified for this purpose. \n", 57 | "# In particular, the changes are:\n", 58 | "# 1. Biases are taken out. \n", 59 | "# 2. The visible layer activation function is changed from sigmoid(sigmoid(le dot W2) dot t(W2))\n", 60 | "# 3. mse is used as the cost function\n", 61 | "\n", 62 | "index = T.lscalar() \n", 63 | "x = T.matrix('x')\n", 64 | "rng = numpy.random.RandomState(123)\n", 65 | "theano_rng = RandomStreams(rng.randint(2 ** 30))\n", 66 | "pretrainer = dA(input = x, numpy_rng = rng, \n", 67 | " theano_rng = theano_rng, \n", 68 | " n_visible = 300, n_hidden = 128)\n", 69 | "cost, updates = pretrainer.get_cost_updates(\n", 70 | " corruption_level=0,\n", 71 | " learning_rate=0.01\n", 72 | " )\n", 73 | "train_data = theano.shared(name = \"trainer\", \n", 74 | " value = term_matrix, \n", 75 | " borrow = True)\n", 76 | "batch_size = 10\n", 77 | "train_da = theano.function(\n", 78 | " [index],\n", 79 | " cost,\n", 80 | " updates=updates,\n", 81 | " givens={\n", 82 | " x: train_data[index * batch_size: (index + 1) * batch_size]\n", 83 | " }\n", 84 | " )" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": null, 90 | "metadata": { 91 | "collapsed": false 92 | }, 93 | "outputs": [], 94 | "source": [ 95 | "# Ultimately, I trained for about 700 epochs\n", 96 | "# The measure of sparseness was used for validation because\n", 97 | "# of the risk of exploding/vanishing gradients\n", 98 | "n_epochs = 1000\n", 99 | "batch_size = 10\n", 100 | "batches = term_matrix.shape[0] // batch_size\n", 101 | "for epoch in xrange(n_epochs):\n", 102 | " c = []\n", 103 | " for batch_index in xrange(batches):\n", 104 | " c.append(train_da(batch_index))\n", 105 | " W2_pre = pretrainer.W.get_value(borrow = True)\n", 106 | " output = numpy.dot(term_matrix, W2_pre) \n", 107 | " output = 1 / (1 + numpy.exp(-output))\n", 108 | " sparseness = numpy.sum(output) / (output.shape[0] * output.shape[1])\n", 109 | " print 'Training epoch %d, cost %f, sparseness %f' % (epoch, numpy.mean(c), sparseness)" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": null, 115 | "metadata": { 116 | "collapsed": false 117 | }, 118 | "outputs": [], 119 | "source": [ 120 | "# The trained weight matrix is extracted, and hidden-unit activations\n", 121 | "# are extracted and saved to disk for pre-training W1. 
\n", 122 | "W2_pre = pretrainer.W.get_value(borrow = True)\n", 123 | "output = numpy.dot(term_matrix, W2_pre)\n", 124 | "savemat(\"./t1_ntm_pretrain.mat\", { 'activations' : output,\n", 125 | " 'W2' : W2_pre})\n", 126 | "(W2_pre.shape, W2_pre[0,:], output.shape)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": { 133 | "collapsed": false 134 | }, 135 | "outputs": [], 136 | "source": [ 137 | "#W2_pre = loadmat(to_path + \"t1_ntm_pretrain.mat\", variable_names = \"W2\").get(\"W2\").astype(\"float32\")" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": { 144 | "collapsed": false 145 | }, 146 | "outputs": [], 147 | "source": [ 148 | "# W1 was pretrained in R\n", 149 | "# For each document, its pre-trained embedding is the sum of the W1 activations for all terms found in the document\n", 150 | "pretrained_W1 = loadmat(to_path + \"t1_ntm_pret.mat\", variable_names = \"w1\").get(\"w1\").astype(\"float32\") " 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": { 157 | "collapsed": true 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "# The training set is (n_grams, 2 + n_epochs) matrix\n", 162 | "# Columns are:\n", 163 | "# 0. Index of a document (d_pos)\n", 164 | "# 1. Index of a term found in d_pos (g)\n", 165 | "# 2...(1 + n_epochs). Indices of randomly selected documents that do not contain g (d_neg)\n", 166 | "# d_negs were selected proportionate to the inverse of the number of terms in each document,\n", 167 | "# so documents get approximately the same number of total passes in each epoch\n", 168 | "\n", 169 | "examples = loadmat(to_path + \"t1_ntm_pret.mat\", variable_names = \"examples\").get(\"examples\")\n", 170 | "examples = numpy.vstack(tuple([examples[:,(0,1,x)] for x in range(2, examples.shape[1])]))" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": null, 176 | "metadata": { 177 | "collapsed": false 178 | }, 179 | "outputs": [], 180 | "source": [ 181 | "(n_docs, n_topics, n_terms, n_total_grams) = (pretrained_W1.shape[0], \n", 182 | " pretrained_W1.shape[1], \n", 183 | " term_matrix.shape[0], \n", 184 | " examples.shape[0])\n", 185 | "(n_docs, n_topics, n_terms, n_total_grams)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": { 192 | "collapsed": false, 193 | "scrolled": true 194 | }, 195 | "outputs": [], 196 | "source": [ 197 | "# Basic NTM model\n", 198 | "def build_ntm(w1_weights, \n", 199 | " w2_weights,\n", 200 | " term_matrix,\n", 201 | " W1_regularizer = l2(0.001), \n", 202 | " W2_regularizer = l2(0.001)\n", 203 | " ):\n", 204 | " \n", 205 | " n_docs = w1_weights.shape[0]\n", 206 | " n_topics = w2_weights.shape[1]\n", 207 | " n_terms = w2_weights.shape[0]\n", 208 | " \n", 209 | " ntm = Graph()\n", 210 | " \n", 211 | " ntm.add_input(name = \"d_pos\", input_shape = (1,), dtype = \"int\")\n", 212 | " ntm.add_input(name = \"d_neg\", input_shape = (1,), dtype = \"int\")\n", 213 | " ntm.add_shared_node(Embedding(input_dim = n_docs, \n", 214 | " output_dim = n_topics, \n", 215 | " weights = [w1_weights], \n", 216 | " W_regularizer = W1_regularizer,\n", 217 | " input_length = 1),\n", 218 | " name = \"topicmatrix\",\n", 219 | " inputs = [\"d_pos\", \"d_neg\"], \n", 220 | " outputs = [\"wd_pos\", \"wd_neg\"],\n", 221 | " merge_mode = None)\n", 222 | " ntm.add_node(Flatten(), name = \"wd_pos_\", input = \"wd_pos\")\n", 223 | " ntm.add_node(Flatten(), 
name = \"wd_neg_\", input = \"wd_neg\")\n", 224 | " ntm.add_node(Activation(\"softmax\"), name = \"ld_pos\", input = \"wd_pos_\")\n", 225 | " ntm.add_node(Activation(\"softmax\"), name = \"ld_neg\", input = \"wd_neg_\")\n", 226 | " \n", 227 | " ntm.add_input(name = \"g\", input_shape = (1,), dtype = \"int\")\n", 228 | " ntm.add_node(Embedding(input_dim = n_terms, \n", 229 | " output_dim = 300,\n", 230 | " weights = [term_matrix], \n", 231 | " trainable = False,\n", 232 | " input_length = 1), \n", 233 | " name = \"le\", input = \"g\")\n", 234 | " ntm.add_node(Flatten(), input = \"le\", name = \"le_\")\n", 235 | " ntm.add_node(Dense(n_topics, activation = \"sigmoid\", \n", 236 | " weights = [w2_weights, numpy.zeros(n_topics)], \n", 237 | " W_regularizer = W2_regularizer),\n", 238 | " name = \"lt\", input = \"le_\")\n", 239 | " \n", 240 | " ntm.add_node(Layer(),\n", 241 | " name = \"ls_pos\", \n", 242 | " inputs = [\"lt\", \"ld_pos\"], \n", 243 | " merge_mode = 'dot', dot_axes = -1)\n", 244 | " ntm.add_node(Layer(), \n", 245 | " name = \"ls_neg\", \n", 246 | " inputs = [\"lt\", \"ld_neg\"], \n", 247 | " merge_mode = 'dot', dot_axes = -1)\n", 248 | " return ntm" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": { 255 | "collapsed": false, 256 | "scrolled": true 257 | }, 258 | "outputs": [], 259 | "source": [ 260 | "# Train the model\n", 261 | "\n", 262 | "# Very large batch sizes are a good idea. \n", 263 | "# Even with very large batches, the its unlikely that many weights in W1 will be \n", 264 | "# triggered more than once each batch. But, W2 weights get updated from every row. \n", 265 | "# W2 therefore wants to overfit before W1 is finished training. Using large batch\n", 266 | "# sizes mitigates the effect. 
\n", 267 | "batch_size = 20000\n", 268 | "n_epochs = 10\n", 269 | "margin = 0.5\n", 270 | "ntm = build_ntm(\n", 271 | " w1_weights = pretrained_W1, \n", 272 | " w2_weights = W2_pre,\n", 273 | " term_matrix = term_matrix,\n", 274 | " W1_regularizer = l2(0.001), \n", 275 | " W2_regularizer = l2(0.001))\n", 276 | "\n", 277 | "def output_shape(input_shape):\n", 278 | " return (None, 1)\n", 279 | "\n", 280 | "def sumLam(x):\n", 281 | " return (margin + (x[1] - x[0]))\n", 282 | "\n", 283 | "summer = LambdaMerge(layers = [ntm.nodes[\"ls_pos\"], ntm.nodes[\"ls_neg\"] ], \n", 284 | " function = sumLam,\n", 285 | " output_shape = output_shape)\n", 286 | "ntm.add_node(summer, inputs = [\"ls_pos\", \"ls_neg\"], \n", 287 | " name = \"summed\")\n", 288 | "ntm.add_output(name = \"loss_out\", input= \"summed\")\n", 289 | "\n", 290 | "def rawloss(x_train, x_test):\n", 291 | " return x_train * x_test\n", 292 | "\n", 293 | "# Adadelta tended to converge more quickly than SGD\n", 294 | "ntm.compile(loss = {'loss_out' : rawloss},\n", 295 | " optimizer = 'Adadelta') \n", 296 | "\n", 297 | "checkpointer = ModelCheckpoint(filepath=\"./checkpointweights.hdf5\", verbose = 1, save_best_only=True)\n", 298 | "\n", 299 | "train_data = examples\n", 300 | "train_shape = (train_data.shape[0], 1)\n", 301 | "g = numpy.reshape(examples[:,1], train_shape)\n", 302 | "d_pos = numpy.reshape(examples[:,0], train_shape)\n", 303 | "d_neg = numpy.reshape(examples[:,2], train_shape)\n", 304 | " \n", 305 | "ntm.fit(data = {\n", 306 | " \"g\" : g, \n", 307 | " \"d_pos\" : d_pos, \n", 308 | " \"d_neg\" : d_neg,\n", 309 | " \"loss_out\" : numpy.reshape(numpy.ones(trainer.shape[0], \n", 310 | " dtype = theano.config.floatX), train_shape)\n", 311 | " }, callbacks = [checkpointer],\n", 312 | " validation_split = 0.005,\n", 313 | " nb_epoch = n_epochs, \n", 314 | " batch_size = batch_size)" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "metadata": { 321 | "collapsed": false 322 | }, 323 | "outputs": [], 324 | "source": [ 325 | "json_string = ntm.to_json()\n", 326 | "open('ntm_final.json', 'w').write(json_string)\n", 327 | "ntm.save_weights(to_path + 'ntm_finalweights_.h5', overwrite=True)" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": { 334 | "collapsed": true 335 | }, 336 | "outputs": [], 337 | "source": [ 338 | "# sNTM - Not fully tested\n", 339 | "n_categories = 3\n", 340 | "ntm.add_node(Dense(n_categories, activation = \"sigmoid\"), input = \"ld_pos\", name = \"ll\")\n", 341 | "ntm.add_output(name = \"label\", input = \"ll\")\n", 342 | "ntm.compile(loss = {'loss_out' : threshold,\n", 343 | " 'label' : 'categorical_crossentropy'}, \n", 344 | " optimizer = \"Adadelta\")" 345 | ] 346 | } 347 | ], 348 | "metadata": { 349 | "kernelspec": { 350 | "display_name": "Python 2", 351 | "language": "python", 352 | "name": "python2" 353 | }, 354 | "language_info": { 355 | "codemirror_mode": { 356 | "name": "ipython", 357 | "version": 2 358 | }, 359 | "file_extension": ".py", 360 | "mimetype": "text/x-python", 361 | "name": "python", 362 | "nbconvert_exporter": "python", 363 | "pygments_lexer": "ipython2", 364 | "version": "2.7.11" 365 | } 366 | }, 367 | "nbformat": 4, 368 | "nbformat_minor": 0 369 | } 370 | -------------------------------------------------------------------------------- /NTP2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 
6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [ 10 | { 11 | "name": "stdout", 12 | "output_type": "stream", 13 | "text": [ 14 | "Using Theano backend.\n" 15 | ] 16 | }, 17 | { 18 | "name": "stderr", 19 | "output_type": "stream", 20 | "text": [ 21 | "Using gpu device 0: GeForce GTX 980 (CNMeM is enabled)\n" 22 | ] 23 | } 24 | ], 25 | "source": [ 26 | "import os\n", 27 | "import sys\n", 28 | "import timeit\n", 29 | "import numpy\n", 30 | "from keras.models import *\n", 31 | "from keras.layers.core import *\n", 32 | "from keras.layers.embeddings import *\n", 33 | "from keras.optimizers import SGD,Adadelta,Adam\n", 34 | "from keras.regularizers import l2, l1l2\n", 35 | "from keras.constraints import unitnorm,nonneg\n", 36 | "from keras.layers.advanced_activations import ThresholdedReLU\n", 37 | "from keras import backend as K\n", 38 | "from scipy.io import loadmat\n", 39 | "from scipy.io import savemat\n", 40 | "from keras.models import model_from_json\n", 41 | "from IPython.display import SVG\n", 42 | "from keras.utils.visualize_util import to_graph\n", 43 | "from keras.callbacks import ModelCheckpoint,RemoteMonitor\n", 44 | "import theano\n", 45 | "import theano.tensor as T\n", 46 | "import h5py\n", 47 | "from theano.tensor.shared_randomstreams import RandomStreams\n", 48 | "to_path = \"./\"" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "metadata": { 55 | "collapsed": false 56 | }, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/plain": [ 61 | "(28956, 300)" 62 | ] 63 | }, 64 | "execution_count": 2, 65 | "metadata": {}, 66 | "output_type": "execute_result" 67 | } 68 | ], 69 | "source": [ 70 | "term_matrix = loadmat(to_path + \"t1_termatrix.mat\", variable_names = \"target\").get(\"target\").astype(\"float32\")\n", 71 | "term_matrix.shape" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 3, 77 | "metadata": { 78 | "collapsed": false 79 | }, 80 | "outputs": [], 81 | "source": [ 82 | "class SymmetricAutoencoder(Layer):\n", 83 | " '''AutoEncoder where reconstruction = reconstruction_activation(activation(x * W) * W')\n", 84 | " # Input shape\n", 85 | " 2D tensor with shape: `(nb_samples, input_dim)`.\n", 86 | " # Output shape\n", 87 | " 2D tensor with shape: `(nb_samples, input_dim)` if output_reconstruction = True,\n", 88 | " shape: `(nb_samples,output_dim)` if output_reconstruction = False\n", 89 | " # Arguments\n", 90 | " output_dim: int > 0.\n", 91 | " init: name of initialization function for the weights of the layer\n", 92 | " (see [initializations](../initializations.md)),\n", 93 | " or alternatively, Theano function to use for weights\n", 94 | " initialization. This parameter is only relevant\n", 95 | " if you don't pass a `weights` argument.\n", 96 | " activation: name of activation function to use\n", 97 | " (see [activations](../activations.md)),\n", 98 | " or alternatively, elementwise Theano function.\n", 99 | " If you don't specify anything, no activation is applied\n", 100 | " (ie. \"linear\" activation: a(x) = x).\n", 101 | " weights: list of numpy arrays to set as initial weights.\n", 102 | " The list should have 1 element, of shape `(input_dim, output_dim)`.\n", 103 | " output_reconstruction: Whether, when not being trained, the output of the \n", 104 | " layer should be the reconstructed input, or the hidden layer activations.\n", 105 | " W_regularizer: instance of [WeightRegularizer](../regularizers.md)\n", 106 | " (eg. 
L1 or L2 regularization), applied to the main weights matrix.\n", 107 | " activity_regularizer: instance of [ActivityRegularizer](../regularizers.md),\n", 108 | " applied to the network output.\n", 109 | " W_constraint: instance of the [constraints](../constraints.md) module\n", 110 | " (eg. maxnorm, nonneg), applied to the main weights matrix.\n", 111 | " input_dim: dimensionality of the input (integer).\n", 112 | " This argument (or alternatively, the keyword argument `input_shape`)\n", 113 | " is required when using this layer as the first layer in a model.\n", 114 | " '''\n", 115 | " input_ndim = 2\n", 116 | "\n", 117 | " def __init__(self, output_dim, init='glorot_uniform', activation='linear',\n", 118 | " reconstruction_activation='linear', weights=None,\n", 119 | " W_regularizer=None, b_regularizer=None, activity_regularizer=None,\n", 120 | " output_reconstruction=False,\n", 121 | " W_constraint=None, b_constraint=None, input_dim=None, **kwargs):\n", 122 | " self.init = initializations.get(init)\n", 123 | " self.activation = activations.get(activation)\n", 124 | " self.reconstruction_activation = activations.get(reconstruction_activation)\n", 125 | " self.output_reconstruction = output_reconstruction\n", 126 | " self.output_dim = output_dim\n", 127 | " self.pretrain = True\n", 128 | "\n", 129 | " self.W_regularizer = regularizers.get(W_regularizer)\n", 130 | " self.b_regularizer = regularizers.get(b_regularizer)\n", 131 | " self.activity_regularizer = regularizers.get(activity_regularizer)\n", 132 | "\n", 133 | " self.W_constraint = constraints.get(W_constraint)\n", 134 | " self.b_constraint = constraints.get(b_constraint)\n", 135 | " self.constraints = [self.W_constraint, self.b_constraint]\n", 136 | "\n", 137 | " self.initial_weights = weights\n", 138 | "\n", 139 | " self.input_dim = input_dim\n", 140 | " if self.input_dim:\n", 141 | " kwargs['input_shape'] = (self.input_dim,)\n", 142 | " self.input = K.placeholder(ndim=2)\n", 143 | " super(SymmetricAutoencoder, self).__init__(**kwargs)\n", 144 | "\n", 145 | " def build(self):\n", 146 | " input_dim = self.input_shape[1]\n", 147 | "\n", 148 | " self.W = self.init((input_dim, self.output_dim))\n", 149 | "\n", 150 | " self.params = [self.W]\n", 151 | "\n", 152 | " self.regularizers = []\n", 153 | " if self.W_regularizer:\n", 154 | " self.W_regularizer.set_param(self.W)\n", 155 | " self.regularizers.append(self.W_regularizer)\n", 156 | "\n", 157 | " if self.activity_regularizer:\n", 158 | " self.activity_regularizer.set_layer(self)\n", 159 | " self.regularizers.append(self.activity_regularizer)\n", 160 | "\n", 161 | " if self.initial_weights is not None:\n", 162 | " self.set_weights(self.initial_weights)\n", 163 | " del self.initial_weights\n", 164 | "\n", 165 | " @property\n", 166 | " def output_shape(self):\n", 167 | " if self.pretrain or self.output_reconstruction: \n", 168 | " return self.input_shape\n", 169 | " else:\n", 170 | " return (self.input_shape[0], self.output_dim)\n", 171 | "\n", 172 | " def get_output(self, train=False):\n", 173 | " X = self.get_input(train)\n", 174 | " if self.pretrain or self.output_reconstruction: \n", 175 | " output = self.reconstruction_activation(K.dot(self.activation(K.dot(X, self.W)), K.transpose(self.W)))\n", 176 | " return output \n", 177 | " else:\n", 178 | " output = self.activation(K.dot(X, self.W))\n", 179 | " return output\n", 180 | "\n", 181 | " def get_config(self):\n", 182 | " config = {'name': self.__class__.__name__,\n", 183 | " 'output_dim': self.output_dim,\n", 184 | " 'init': 
self.init.__name__,\n", 185 | " 'activation': self.activation.__name__,\n", 186 | " 'reconstruction_activation': self.reconstruction_activation.__name__,\n", 187 | " 'W_regularizer': self.W_regularizer.get_config() if self.W_regularizer else None,\n", 188 | " 'activity_regularizer': self.activity_regularizer.get_config() if self.activity_regularizer else None,\n", 189 | " 'W_constraint': self.W_constraint.get_config() if self.W_constraint else None,\n", 190 | " 'input_dim': self.input_dim}\n", 191 | " base_config = super(SymmetricAutoencoder, self).get_config()\n", 192 | " return dict(list(base_config.items()) + list(config.items()))" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 4, 198 | "metadata": { 199 | "collapsed": false 200 | }, 201 | "outputs": [], 202 | "source": [ 203 | "encoder = Sequential()\n", 204 | "encoder.add(Embedding(\n", 205 | " input_dim = term_matrix.shape[0], \n", 206 | " output_dim = 300,\n", 207 | " weights = [term_matrix], \n", 208 | " trainable = False,\n", 209 | " input_length = 1)\n", 210 | " )\n", 211 | "encoder.add(Flatten())\n", 212 | "encoder.add(SymmetricAutoencoder(\n", 213 | " activation = 'sigmoid',\n", 214 | " reconstruction_activation = 'linear',\n", 215 | " output_dim=40\n", 216 | " ))\n", 217 | "inputs = numpy.reshape(numpy.arange(term_matrix.shape[0]), (term_matrix.shape[0], 1))\n", 218 | "outputs = term_matrix" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": { 225 | "collapsed": false 226 | }, 227 | "outputs": [ 228 | { 229 | "name": "stdout", 230 | "output_type": "stream", 231 | "text": [ 232 | "Epoch 1/1000\n", 233 | "28956/28956 [==============================] - 36s - loss: 0.0145 \n", 234 | "Epoch 2/1000\n", 235 | "28956/28956 [==============================] - 35s - loss: 0.0115 \n", 236 | "Epoch 3/1000\n", 237 | "28956/28956 [==============================] - 34s - loss: 0.0111 \n", 238 | "Epoch 4/1000\n", 239 | "28956/28956 [==============================] - 45s - loss: 0.0110 \n", 240 | "Epoch 5/1000\n", 241 | "28956/28956 [==============================] - 42s - loss: 0.0109 \n", 242 | "Epoch 6/1000\n", 243 | "28956/28956 [==============================] - 35s - loss: 0.0109 \n", 244 | "Epoch 7/1000\n", 245 | "28956/28956 [==============================] - 36s - loss: 0.0109 \n", 246 | "Epoch 8/1000\n", 247 | "28956/28956 [==============================] - 44s - loss: 0.0108 \n", 248 | "Epoch 9/1000\n", 249 | "28956/28956 [==============================] - 37s - loss: 0.0108 \n", 250 | "Epoch 10/1000\n", 251 | "28956/28956 [==============================] - 37s - loss: 0.0108 \n", 252 | "Epoch 11/1000\n", 253 | "28956/28956 [==============================] - 37s - loss: 0.0108 \n", 254 | "Epoch 12/1000\n", 255 | "28956/28956 [==============================] - 35s - loss: 0.0108 \n", 256 | "Epoch 13/1000\n", 257 | "28956/28956 [==============================] - 42s - loss: 0.0108 \n", 258 | "Epoch 14/1000\n", 259 | "28956/28956 [==============================] - 43s - loss: 0.0108 \n", 260 | "Epoch 15/1000\n", 261 | "28956/28956 [==============================] - 39s - loss: 0.0108 \n", 262 | "Epoch 16/1000\n", 263 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 264 | "Epoch 17/1000\n", 265 | "28956/28956 [==============================] - 39s - loss: 0.0107 \n", 266 | "Epoch 18/1000\n", 267 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 268 | "Epoch 19/1000\n", 269 | "28956/28956 
[==============================] - 45s - loss: 0.0107 \n", 270 | "Epoch 20/1000\n", 271 | "28956/28956 [==============================] - 39s - loss: 0.0107 \n", 272 | "Epoch 21/1000\n", 273 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 274 | "Epoch 22/1000\n", 275 | "28956/28956 [==============================] - 39s - loss: 0.0107 \n", 276 | "Epoch 23/1000\n", 277 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 278 | "Epoch 24/1000\n", 279 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 280 | "Epoch 25/1000\n", 281 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 282 | "Epoch 26/1000\n", 283 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 284 | "Epoch 27/1000\n", 285 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 286 | "Epoch 28/1000\n", 287 | "28956/28956 [==============================] - 47s - loss: 0.0107 \n", 288 | "Epoch 29/1000\n", 289 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 290 | "Epoch 30/1000\n", 291 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 292 | "Epoch 31/1000\n", 293 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 294 | "Epoch 32/1000\n", 295 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 296 | "Epoch 33/1000\n", 297 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 298 | "Epoch 34/1000\n", 299 | "28956/28956 [==============================] - 41s - loss: 0.0107 \n", 300 | "Epoch 35/1000\n", 301 | "28956/28956 [==============================] - 47s - loss: 0.0107 \n", 302 | "Epoch 36/1000\n", 303 | "28956/28956 [==============================] - 41s - loss: 0.0107 \n", 304 | "Epoch 37/1000\n", 305 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 306 | "Epoch 38/1000\n", 307 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 308 | "Epoch 39/1000\n", 309 | "28956/28956 [==============================] - 44s - loss: 0.0107 \n", 310 | "Epoch 40/1000\n", 311 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 312 | "Epoch 41/1000\n", 313 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 314 | "Epoch 42/1000\n", 315 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 316 | "Epoch 90/1000\n", 317 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 318 | "Epoch 93/1000\n", 319 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 320 | "Epoch 98/1000\n", 321 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 322 | "Epoch 101/1000\n", 323 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 324 | "Epoch 106/1000\n", 325 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 326 | "Epoch 109/1000\n", 327 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 328 | "Epoch 112/1000\n", 329 | "28956/28956 [==============================] - 40s - loss: 0.0107 \n", 330 | "Epoch 117/1000\n", 331 | "28956/28956 [==============================] - 40s - loss: 0.0107 \n", 332 | "Epoch 122/1000\n", 333 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 334 | "Epoch 127/1000\n", 335 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 336 | "Epoch 130/1000\n", 337 | "28956/28956 [==============================] - 35s - 
loss: 0.0107 \n", 338 | "Epoch 135/1000\n", 339 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 340 | "Epoch 138/1000\n", 341 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 342 | "Epoch 141/1000\n", 343 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 344 | "Epoch 149/1000\n", 345 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 346 | "Epoch 152/1000\n", 347 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 348 | "Epoch 157/1000\n", 349 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 350 | "Epoch 160/1000\n", 351 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 352 | "Epoch 163/1000\n", 353 | "28956/28956 [==============================] - 42s - loss: 0.0107 \n", 354 | "Epoch 170/1000\n", 355 | "28956/28956 [==============================] - 39s - loss: 0.0107 \n", 356 | "Epoch 174/1000\n", 357 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 358 | "Epoch 179/1000\n", 359 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 360 | "Epoch 182/1000\n", 361 | "28956/28956 [==============================] - 39s - loss: 0.0107 \n", 362 | "Epoch 187/1000\n", 363 | "28956/28956 [==============================] - 42s - loss: 0.0107 \n", 364 | "Epoch 194/1000\n", 365 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 366 | "Epoch 199/1000\n", 367 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 368 | "Epoch 202/1000\n", 369 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 370 | "Epoch 207/1000\n", 371 | "28956/28956 [==============================] - 36s - loss: 0.0107 \n", 372 | "Epoch 212/1000\n", 373 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 374 | "Epoch 217/1000\n", 375 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 376 | "Epoch 222/1000\n", 377 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 378 | "Epoch 227/1000\n", 379 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 380 | "Epoch 232/1000\n", 381 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 382 | "Epoch 235/1000\n", 383 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 384 | "Epoch 238/1000\n", 385 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 386 | "Epoch 243/1000\n", 387 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 388 | "Epoch 246/1000\n", 389 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 390 | "Epoch 257/1000\n", 391 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 392 | "Epoch 260/1000\n", 393 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 394 | "Epoch 265/1000\n", 395 | "28956/28956 [==============================] - 36s - loss: 0.0107 \n", 396 | "Epoch 268/1000\n", 397 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 398 | "Epoch 276/1000\n", 399 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 400 | "Epoch 284/1000\n", 401 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 402 | "Epoch 289/1000\n", 403 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 404 | "Epoch 300/1000\n", 405 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 
406 | "Epoch 303/1000\n", 407 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 408 | "Epoch 306/1000\n", 409 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 410 | "Epoch 309/1000\n", 411 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 412 | "Epoch 312/1000\n", 413 | "28956/28956 [==============================] - 31s - loss: 0.0107 \n", 414 | "Epoch 315/1000\n", 415 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 416 | "Epoch 318/1000\n", 417 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 418 | "Epoch 321/1000\n", 419 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 420 | "Epoch 324/1000\n", 421 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 422 | "Epoch 327/1000\n", 423 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 424 | "Epoch 330/1000\n", 425 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 426 | "Epoch 356/1000\n", 427 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 428 | "Epoch 359/1000\n", 429 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 430 | "Epoch 362/1000\n", 431 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 432 | "Epoch 365/1000\n", 433 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 434 | "Epoch 368/1000\n", 435 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 436 | "Epoch 371/1000\n", 437 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 438 | "Epoch 374/1000\n", 439 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 440 | "Epoch 397/1000\n", 441 | "28956/28956 [==============================] - 36s - loss: 0.0106 \n", 442 | "Epoch 408/1000\n", 443 | "28956/28956 [==============================] - 36s - loss: 0.0107 \n", 444 | "Epoch 413/1000\n", 445 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 446 | "Epoch 454/1000\n", 447 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 448 | "Epoch 457/1000\n", 449 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 450 | "Epoch 460/1000\n", 451 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 452 | "Epoch 463/1000\n", 453 | "28956/28956 [==============================] - 34s - loss: 0.0106 \n", 454 | "Epoch 504/1000\n", 455 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 456 | "Epoch 526/1000\n", 457 | "28956/28956 [==============================] - 56s - loss: 0.0106 \n", 458 | "Epoch 530/1000\n", 459 | "28956/28956 [==============================] - 46s - loss: 0.0106 \n", 460 | "Epoch 537/1000\n", 461 | "28956/28956 [==============================] - 33s - loss: 0.0106 \n", 462 | "Epoch 550/1000\n", 463 | "28956/28956 [==============================] - 43s - loss: 0.0106 \n", 464 | "Epoch 555/1000\n", 465 | "28956/28956 [==============================] - 40s - loss: 0.0106 \n", 466 | "Epoch 562/1000\n", 467 | "28956/28956 [==============================] - 41s - loss: 0.0106 \n", 468 | "Epoch 564/1000\n", 469 | "28956/28956 [==============================] - 38s - loss: 0.0106 \n", 470 | "Epoch 569/1000\n", 471 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 472 | "Epoch 574/1000\n", 473 | "28956/28956 [==============================] - 48s - loss: 0.0106 \n", 474 | "Epoch 
584/1000\n", 475 | "28956/28956 [==============================] - 44s - loss: 0.0106 \n", 476 | "Epoch 586/1000\n", 477 | "28956/28956 [==============================] - 50s - loss: 0.0106 \n", 478 | "Epoch 588/1000\n", 479 | "28956/28956 [==============================] - 57s - loss: 0.0106 \n", 480 | "Epoch 592/1000\n", 481 | "28956/28956 [==============================] - 136s - loss: 0.0106 \n", 482 | "Epoch 594/1000\n", 483 | "28956/28956 [==============================] - 114s - loss: 0.0106 \n", 484 | "Epoch 597/1000\n", 485 | "28956/28956 [==============================] - 100s - loss: 0.0106 \n", 486 | "Epoch 601/1000\n", 487 | "28956/28956 [==============================] - 37s - loss: 0.0106 \n", 488 | "Epoch 603/1000\n", 489 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 490 | "Epoch 617/1000\n", 491 | "28956/28956 [==============================] - 40s - loss: 0.0106 \n", 492 | "Epoch 622/1000\n", 493 | "28956/28956 [==============================] - 43s - loss: 0.0106 \n", 494 | "Epoch 630/1000\n", 495 | "28956/28956 [==============================] - 42s - loss: 0.0106 \n", 496 | "Epoch 640/1000\n", 497 | "28956/28956 [==============================] - 37s - loss: 0.0106 \n", 498 | "Epoch 645/1000\n", 499 | "28956/28956 [==============================] - 43s - loss: 0.0106 \n", 500 | "Epoch 650/1000\n", 501 | "28956/28956 [==============================] - 38s - loss: 0.0106 \n", 502 | "Epoch 655/1000\n", 503 | "28956/28956 [==============================] - 38s - loss: 0.0106 \n", 504 | "Epoch 660/1000\n", 505 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 506 | "Epoch 668/1000\n", 507 | "28956/28956 [==============================] - 37s - loss: 0.0106 \n", 508 | "Epoch 683/1000\n", 509 | "28956/28956 [==============================] - 37s - loss: 0.0106 \n", 510 | "Epoch 688/1000\n", 511 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 512 | "Epoch 699/1000\n", 513 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 514 | "Epoch 715/1000\n", 515 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 516 | "Epoch 718/1000\n", 517 | "28956/28956 [==============================] - 37s - loss: 0.0106 \n", 518 | "Epoch 726/1000\n", 519 | "28956/28956 [==============================] - 33s - loss: 0.0106 \n", 520 | "Epoch 740/1000\n", 521 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 522 | "Epoch 751/1000\n", 523 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 524 | "Epoch 763/1000\n", 525 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 526 | "Epoch 766/1000\n", 527 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 528 | "Epoch 769/1000\n", 529 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 530 | "Epoch 772/1000\n", 531 | "28956/28956 [==============================] - 34s - loss: 0.0106 \n", 532 | "Epoch 775/1000\n", 533 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 534 | "Epoch 807/1000\n", 535 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 536 | "Epoch 810/1000\n", 537 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 538 | "Epoch 813/1000\n", 539 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 540 | "Epoch 816/1000\n", 541 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 542 | "Epoch 822/1000\n", 
543 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 544 | "Epoch 859/1000\n", 545 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 546 | "Epoch 869/1000\n", 547 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 548 | "Epoch 872/1000\n", 549 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 550 | "Epoch 875/1000\n", 551 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 552 | "Epoch 916/1000\n", 553 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 554 | "Epoch 954/1000\n", 555 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 556 | "Epoch 957/1000\n", 557 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 558 | "Epoch 960/1000\n", 559 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 560 | "Epoch 986/1000\n", 561 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 562 | "Epoch 989/1000\n", 563 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 564 | "Epoch 992/1000\n", 565 | " 8077/28956 [=======>......................] - ETA: 26s - loss: 0.0105" 566 | ] 567 | } 568 | ], 569 | "source": [ 570 | "encoder.compile(loss = 'mse', optimizer = 'Adadelta')\n", 571 | "\n", 572 | "history = encoder.fit(inputs, outputs, nb_epoch = 1000, batch_size = 1)" 573 | ] 574 | }, 575 | { 576 | "cell_type": "code", 577 | "execution_count": 6, 578 | "metadata": { 579 | "collapsed": true 580 | }, 581 | "outputs": [], 582 | "source": [ 583 | "encoder.save_weights(\"W1_pretrain_40.hdf5\")\n", 584 | "#encoder.load_weights(\"W1_pretrain_Adam_1000_loss_0037.hdf5\")" 585 | ] 586 | }, 587 | { 588 | "cell_type": "code", 589 | "execution_count": 31, 590 | "metadata": { 591 | "collapsed": false 592 | }, 593 | "outputs": [ 594 | { 595 | "data": { 596 | "text/plain": [ 597 | "(28956, 300)" 598 | ] 599 | }, 600 | "execution_count": 31, 601 | "metadata": {}, 602 | "output_type": "execute_result" 603 | } 604 | ], 605 | "source": [ 606 | "encoder.output_reconstruction = False\n", 607 | "encoder.pretrain = False\n", 608 | "activations = encoder.predict(inputs, batch_size = 15000)\n", 609 | "#savemat(\"./t1_ntm_pretrain.mat\", { 'activations' : activations,\n", 610 | "# 'W2' : encoder.get_weights()[1]})\n", 611 | "activations.shape\n", 612 | "import h5py\n", 613 | "h5f = h5py.File(\"activations.hdf5\")\n", 614 | "h5f.create_dataset('activations', data = activations)\n", 615 | "h5f.close()" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": 23, 621 | "metadata": { 622 | "collapsed": false 623 | }, 624 | "outputs": [ 625 | { 626 | "data": { 627 | "text/plain": [ 628 | "(300, 40)" 629 | ] 630 | }, 631 | "execution_count": 23, 632 | "metadata": {}, 633 | "output_type": "execute_result" 634 | } 635 | ], 636 | "source": [ 637 | "#get initial weights for W2 from the autoencoder\n", 638 | "#pretrained_W2 = encoder.get_weights()[1]\n", 639 | "#pretrained_W2 = loadmat(to_path + \"t1_ntm_pretrain.mat\", variable_names = \"W2\").get(\"W2\").astype(\"float32\")\n", 640 | "h5w2 = h5py.File('W1_pretrain_40.hdf5', 'r')\n", 641 | "h5w2['/layer_2'].items()\n", 642 | "pretrained_W2 = h5w2['layer_2/param_0'][:]\n", 643 | "h5w2.close()\n", 644 | "pretrained_W2.shape" 645 | ] 646 | }, 647 | { 648 | "cell_type": "code", 649 | "execution_count": 4, 650 | "metadata": { 651 | "collapsed": false 652 | }, 653 | "outputs": [], 654 | "source": [ 655 | "#get initial weights for 
W1 that were pretrained in R based on the autoencoder activations\n", 656 | "#pretrained_W1 = loadmat(to_path + \"t1_ntm_pret.mat\", variable_names = \"w1\").get(\"w1\").astype(\"float32\") \n", 657 | "\n", 658 | "examples = loadmat(to_path + \"t1_ntm_pret.mat\", variable_names = \"examples\").get(\"examples\")\n", 659 | "# Take the multiple sets and combine them into one big super-epoch\n", 660 | "examples = numpy.vstack(tuple([examples[:,(0,1,x)] for x in range(2, examples.shape[1])]))" 661 | ] 662 | }, 663 | { 664 | "cell_type": "code", 665 | "execution_count": 5, 666 | "metadata": { 667 | "collapsed": false 668 | }, 669 | "outputs": [ 670 | { 671 | "data": { 672 | "text/plain": [ 673 | "(954905, 40)" 674 | ] 675 | }, 676 | "execution_count": 5, 677 | "metadata": {}, 678 | "output_type": "execute_result" 679 | } 680 | ], 681 | "source": [ 682 | "#pretrained_W1 = loadmat(to_path + \"t1_ntm_w1.mat\", variable_names = \"w1\").get(\"w1\").astype(\"float32\") \n", 683 | "h5w1 = h5py.File('w1_pretrain.hdf5', 'r')\n", 684 | "pretrained_W1 = numpy.transpose(h5w1['w1'][:])\n", 685 | "h5w1.close()\n", 686 | "pretrained_W1.shape" 687 | ] 688 | }, 689 | { 690 | "cell_type": "code", 691 | "execution_count": 6, 692 | "metadata": { 693 | "collapsed": false 694 | }, 695 | "outputs": [ 696 | { 697 | "data": { 698 | "text/plain": [ 699 | "(954905, 40, 28956, 1)" 700 | ] 701 | }, 702 | "execution_count": 6, 703 | "metadata": {}, 704 | "output_type": "execute_result" 705 | } 706 | ], 707 | "source": [ 708 | "(n_docs, n_topics, n_terms, n_epochs) = (pretrained_W1.shape[0], \n", 709 | " pretrained_W1.shape[1], \n", 710 | " term_matrix.shape[0], \n", 711 | " examples.shape[1] - 2)\n", 712 | "(n_docs, n_topics, n_terms, n_epochs)" 713 | ] 714 | }, 715 | { 716 | "cell_type": "code", 717 | "execution_count": 24, 718 | "metadata": { 719 | "collapsed": true 720 | }, 721 | "outputs": [], 722 | "source": [ 723 | "class DenseNoBias(Layer):\n", 724 | " '''Fully connected NN layer with no bias term.\n", 725 | " # Input shape\n", 726 | " 2D tensor with shape: `(nb_samples, input_dim)`.\n", 727 | " # Output shape\n", 728 | " 2D tensor with shape: `(nb_samples, output_dim)`.\n", 729 | " # Arguments\n", 730 | " output_dim: int > 0.\n", 731 | " init: name of initialization function for the weights of the layer\n", 732 | " (see [initializations](../initializations.md)),\n", 733 | " or alternatively, Theano function to use for weights\n", 734 | " initialization. This parameter is only relevant\n", 735 | " if you don't pass a `weights` argument.\n", 736 | " activation: name of activation function to use\n", 737 | " (see [activations](../activations.md)),\n", 738 | " or alternatively, elementwise Theano function.\n", 739 | " If you don't specify anything, no activation is applied\n", 740 | " (ie. \"linear\" activation: a(x) = x).\n", 741 | " weights: list of numpy arrays to set as initial weights.\n", 742 | " The list should have 1 element, of shape `(input_dim, output_dim)`.\n", 743 | " W_regularizer: instance of [WeightRegularizer](../regularizers.md)\n", 744 | " (eg. L1 or L2 regularization), applied to the main weights matrix.\n", 745 | " activity_regularizer: instance of [ActivityRegularizer](../regularizers.md),\n", 746 | " applied to the network output.\n", 747 | " W_constraint: instance of the [constraints](../constraints.md) module\n", 748 | " (eg. 
maxnorm, nonneg), applied to the main weights matrix.\n", 749 | " input_dim: dimensionality of the input (integer).\n", 750 | " This argument (or alternatively, the keyword argument `input_shape`)\n", 751 | " is required when using this layer as the first layer in a model.\n", 752 | " '''\n", 753 | " input_ndim = 2\n", 754 | "\n", 755 | " def __init__(self, output_dim, init='glorot_uniform', activation='linear', weights=None,\n", 756 | " W_regularizer=None, activity_regularizer=None,\n", 757 | " W_constraint=None, input_dim=None, **kwargs):\n", 758 | " self.init = initializations.get(init)\n", 759 | " self.activation = activations.get(activation)\n", 760 | " self.output_dim = output_dim\n", 761 | "\n", 762 | " self.W_regularizer = regularizers.get(W_regularizer)\n", 763 | " self.activity_regularizer = regularizers.get(activity_regularizer)\n", 764 | "\n", 765 | " self.W_constraint = constraints.get(W_constraint)\n", 766 | " self.constraints = [self.W_constraint]\n", 767 | "\n", 768 | " self.initial_weights = weights\n", 769 | "\n", 770 | " self.input_dim = input_dim\n", 771 | " if self.input_dim:\n", 772 | " kwargs['input_shape'] = (self.input_dim,)\n", 773 | " self.input = K.placeholder(ndim=2)\n", 774 | " super(DenseNoBias, self).__init__(**kwargs)\n", 775 | "\n", 776 | " def build(self):\n", 777 | " input_dim = self.input_shape[1]\n", 778 | "\n", 779 | " self.W = self.init((input_dim, self.output_dim))\n", 780 | "\n", 781 | " self.params = [self.W]\n", 782 | "\n", 783 | " self.regularizers = []\n", 784 | " if self.W_regularizer:\n", 785 | " self.W_regularizer.set_param(self.W)\n", 786 | " self.regularizers.append(self.W_regularizer)\n", 787 | "\n", 788 | " if self.activity_regularizer:\n", 789 | " self.activity_regularizer.set_layer(self)\n", 790 | " self.regularizers.append(self.activity_regularizer)\n", 791 | "\n", 792 | " if self.initial_weights is not None:\n", 793 | " self.set_weights(self.initial_weights)\n", 794 | " del self.initial_weights\n", 795 | "\n", 796 | " @property\n", 797 | " def output_shape(self):\n", 798 | " return (self.input_shape[0], self.output_dim)\n", 799 | "\n", 800 | " def get_output(self, train=False):\n", 801 | " X = self.get_input(train)\n", 802 | " output = self.activation(K.dot(X, self.W))\n", 803 | " return output\n", 804 | "\n", 805 | " def get_config(self):\n", 806 | " config = {'name': self.__class__.__name__,\n", 807 | " 'output_dim': self.output_dim,\n", 808 | " 'init': self.init.__name__,\n", 809 | " 'activation': self.activation.__name__,\n", 810 | " 'W_regularizer': self.W_regularizer.get_config() if self.W_regularizer else None,\n", 811 | " 'activity_regularizer': self.activity_regularizer.get_config() if self.activity_regularizer else None,\n", 812 | " 'W_constraint': self.W_constraint.get_config() if self.W_constraint else None,\n", 813 | " 'input_dim': self.input_dim}\n", 814 | " base_config = super(DenseNoBias, self).get_config()\n", 815 | " return dict(list(base_config.items()) + list(config.items()))" 816 | ] 817 | }, 818 | { 819 | "cell_type": "code", 820 | "execution_count": 25, 821 | "metadata": { 822 | "collapsed": false, 823 | "scrolled": true 824 | }, 825 | "outputs": [], 826 | "source": [ 827 | "# Build the actual training model\n", 828 | "\n", 829 | "def build_ntm(term_matrix = term_matrix, \n", 830 | " pre_W1 = pretrained_W1, \n", 831 | " pre_W2 = pretrained_W2, \n", 832 | " W2_l2 = 0.001\n", 833 | " ):\n", 834 | " \n", 835 | " n_docs = pretrained_W1.shape[0]\n", 836 | " n_topics = pretrained_W1.shape[1]\n", 837 | " n_terms = 
term_matrix.shape[0]\n", 838 | " \n", 839 | " ntm = Graph()\n", 840 | " \n", 841 | " ntm.add_input(name = \"g\", input_shape = (1,), dtype = \"int\")\n", 842 | " ntm.add_node(Embedding(input_dim = n_terms, \n", 843 | " output_dim = 300,\n", 844 | " weights = [term_matrix], \n", 845 | " trainable = False,\n", 846 | " input_length = 1), \n", 847 | " name = \"le\", input = \"g\")\n", 848 | " ntm.add_node(Flatten(), input = \"le\", name = \"le_\")\n", 849 | " ntm.add_node(DenseNoBias(n_topics, activation = \"sigmoid\", \n", 850 | " weights = [pre_W2], \n", 851 | " W_regularizer = l2(W2_l2)\n", 852 | " ),\n", 853 | " name = \"lt\", input = \"le_\")\n", 854 | " \n", 855 | " ntm.add_input(name = \"d_pos\", input_shape = (1,), dtype = \"int\")\n", 856 | " ntm.add_input(name = \"d_neg\", input_shape = (1,), dtype = \"int\")\n", 857 | " ntm.add_shared_node(Embedding(input_dim = n_docs, \n", 858 | " output_dim = n_topics, \n", 859 | " weights = [pre_W1], \n", 860 | " input_length = 1),\n", 861 | " name = \"topicmatrix\",\n", 862 | " inputs = [\"d_pos\", \"d_neg\"], \n", 863 | " outputs = [\"wd_pos\", \"wd_neg\"],\n", 864 | " merge_mode = None)\n", 865 | " ntm.add_node(Flatten(), name = \"wd_pos_\", input = \"wd_pos\")\n", 866 | " ntm.add_node(Flatten(), name = \"wd_neg_\", input = \"wd_neg\")\n", 867 | " ntm.add_node(Activation(\"softmax\"), name = \"ld_pos\", input = \"wd_pos_\")\n", 868 | " ntm.add_node(Activation(\"softmax\"), name = \"ld_neg\", input = \"wd_neg_\")\n", 869 | " \n", 870 | " ntm.add_node(Layer(),\n", 871 | " name = \"ls_pos\", \n", 872 | " inputs = [\"lt\", \"ld_pos\"], \n", 873 | " merge_mode = 'dot', dot_axes = -1)# , create_output = True)\n", 874 | " ntm.add_node(Layer(), \n", 875 | " name = \"ls_neg\", \n", 876 | " inputs = [\"lt\", \"ld_neg\"], \n", 877 | " merge_mode = 'dot', dot_axes = -1)#, create_output = True)\n", 878 | " return ntm\n", 879 | "\n", 880 | "def add_fine_tuning(ntm = None):\n", 881 | " import theano.tensor as T\n", 882 | " def output_shape(input_shape):\n", 883 | " return (None, 1)\n", 884 | " \n", 885 | " def sub_merge(layers):\n", 886 | " import theano.tensor as T\n", 887 | "# ls_pos = T.dot(layers[0], layers[1].T)\n", 888 | "# ls_neg = T.dot(layers[0], layers[2].T)\n", 889 | " ls_pos = layers[0]\n", 890 | " ls_neg = layers[1]\n", 891 | " #less = #T.mul(40000000,T.add(ls_neg, ls_pos))\n", 892 | " less = T.sub(ls_neg, ls_pos)\n", 893 | " return T.add(0.5, less)\n", 894 | "\n", 895 | " #def sumLam(x):\n", 896 | " # return (0.5 + (x[1] - x[0]))\n", 897 | "\n", 898 | " summer = LambdaMerge(layers = [ntm.nodes[\"ls_pos\"], \n", 899 | " ntm.nodes[\"ls_neg\"]], \n", 900 | " function = sub_merge,\n", 901 | " output_shape = output_shape)\n", 902 | " ntm.add_node(summer, inputs = [\"ls_pos\", \"ls_neg\"], name = \"summed\", create_output = True)\n", 903 | "\n", 904 | " return ntm\n", 905 | "\n", 906 | "\n", 907 | "#SVG(to_graph(ntm).create(prog='dot', format='svg'))" 908 | ] 909 | }, 910 | { 911 | "cell_type": "code", 912 | "execution_count": null, 913 | "metadata": { 914 | "collapsed": false, 915 | "scrolled": true 916 | }, 917 | "outputs": [ 918 | { 919 | "name": "stdout", 920 | "output_type": "stream", 921 | "text": [ 922 | "Train on 45794004 samples, validate on 934572 samples\n", 923 | "Epoch 1/20\n", 924 | " 330000/45794004 [..............................] 
- ETA: 10931s - loss: 0.6916" 925 | ] 926 | } 927 | ], 928 | "source": [ 929 | "# Fine-tuning\n", 930 | "ntm = build_ntm(W2_l2 = 0.001)\n", 931 | "ntm = add_fine_tuning(ntm)\n", 932 | "\n", 933 | "#def rawloss(x_train, x_test):\n", 934 | "# return x_train * x_test\n", 935 | "def maxloss(y_true, y_predict):\n", 936 | " return K.maximum(y_true,y_predict)\n", 937 | "# return T.maximum(0., T.mul(y_true,y_predict ))\n", 938 | "\n", 939 | "#ntm.load_weights(\"cpw4_starte0_batch10000_sgd001_e_01_0.499998.hdf5\")\n", 940 | "\n", 941 | "ntm.compile(loss = {'summed' : maxloss#, \n", 942 | " # 'ls_pos' : 'binary_crossentropy', \n", 943 | " # 'ls_neg' : 'binary_crossentropy'\n", 944 | " },\n", 945 | " optimizer = SGD(lr = 0.01))\n", 946 | "\n", 947 | "checkpointer = ModelCheckpoint(filepath=\"./cpw5_starte0_batch10000_sgd001_e_{epoch:02d}_{val_loss:.6f}.hdf5\", \n", 948 | " monitor = 'val_loss', verbose = 1, save_best_only=False)\n", 949 | "\n", 950 | "train_shape = (examples.shape[0], 1)\n", 951 | "trainer = examples \n", 952 | " \n", 953 | "historylog = ntm.fit(data = {\n", 954 | " \"g\" : numpy.reshape(trainer[:,1], train_shape), \n", 955 | " \"d_pos\" : numpy.reshape(trainer[:,0], train_shape), \n", 956 | " \"d_neg\" : numpy.reshape(trainer[:,2], train_shape),\n", 957 | " \"summed\" : numpy.reshape(numpy.zeros(trainer.shape[0], dtype = theano.config.floatX),\n", 958 | " train_shape)#, \n", 959 | "# \"ls_pos\" : numpy.reshape(numpy.ones(trainer.shape[0], dtype = theano.config.floatX),\n", 960 | "# train_shape),\n", 961 | "# \"ls_neg\" : numpy.reshape(numpy.zeros(trainer.shape[0], dtype = theano.config.floatX),\n", 962 | "# train_shape)\n", 963 | " }, callbacks = [checkpointer],\n", 964 | " validation_split = 0.02,\n", 965 | " nb_epoch = 20, \n", 966 | " batch_size = 10000)" 967 | ] 968 | }, 969 | { 970 | "cell_type": "code", 971 | "execution_count": null, 972 | "metadata": { 973 | "collapsed": false 974 | }, 975 | "outputs": [], 976 | "source": [ 977 | "ntm.load_weights(\"cpw_new_startepoch0_00_0.5000.hdf5\")\n", 978 | "idxs = numpy.random.choice(trainer.shape[0], 200000, replace = False)\n", 979 | "tester = trainer[idxs,:]\n", 980 | "tester_shape = (tester.shape[0], 1)\n", 981 | "ntm.evaluate(data = {\n", 982 | " \"g\" : numpy.reshape(tester[:,1], tester_shape), \n", 983 | " \"d_pos\" : numpy.reshape(tester[:,0], tester_shape), \n", 984 | " \"d_neg\" : numpy.reshape(tester[:,2], tester_shape),\n", 985 | " \"loss_out\" : numpy.reshape(numpy.ones(tester.shape[0], \n", 986 | " dtype = theano.config.floatX), tester_shape)\n", 987 | " }, batch_size = 20000)" 988 | ] 989 | }, 990 | { 991 | "cell_type": "code", 992 | "execution_count": null, 993 | "metadata": { 994 | "collapsed": false 995 | }, 996 | "outputs": [], 997 | "source": [ 998 | "0.49998338818550109,0.49998418092727659,0.49998493790626525,0.49998548328876496,0.4999860256910324,0.49998660981655119,0.49998704195022581,0.49998756051063536" 999 | ] 1000 | }, 1001 | { 1002 | "cell_type": "code", 1003 | "execution_count": null, 1004 | "metadata": { 1005 | "collapsed": false 1006 | }, 1007 | "outputs": [], 1008 | "source": [ 1009 | "[(x, type(ntm.nodes[x]), ntm.nodes[x].output_shape) for x in ntm.nodes]" 1010 | ] 1011 | }, 1012 | { 1013 | "cell_type": "code", 1014 | "execution_count": null, 1015 | "metadata": { 1016 | "collapsed": false 1017 | }, 1018 | "outputs": [], 1019 | "source": [ 1020 | "json_string = ntm.to_json()\n", 1021 | "open('ntm_final.json', 'w').write(json_string)\n", 1022 | "ntm.save_weights(to_path + 'ntm_finalweights_.h5', 
overwrite=True)" 1023 | ] 1024 | }, 1025 | { 1026 | "cell_type": "code", 1027 | "execution_count": null, 1028 | "metadata": { 1029 | "collapsed": false 1030 | }, 1031 | "outputs": [], 1032 | "source": [ 1033 | "weights = ntm.get_weights()\n", 1034 | "(weights[0].shape, weights[1].shape, weights[2].shape, \n", 1035 | " weights[3].shape)" 1036 | ] 1037 | }, 1038 | { 1039 | "cell_type": "code", 1040 | "execution_count": null, 1041 | "metadata": { 1042 | "collapsed": false 1043 | }, 1044 | "outputs": [], 1045 | "source": [ 1046 | "w = ntm.nodes[\"lt\"].get_weights()\n", 1047 | "(w[0].shape, w[1].shape)" 1048 | ] 1049 | }, 1050 | { 1051 | "cell_type": "code", 1052 | "execution_count": null, 1053 | "metadata": { 1054 | "collapsed": false 1055 | }, 1056 | "outputs": [], 1057 | "source": [ 1058 | "softies = weights[0][100000,:]\n", 1059 | "numpy.exp(softies)/numpy.sum(numpy.exp(softies))" 1060 | ] 1061 | }, 1062 | { 1063 | "cell_type": "code", 1064 | "execution_count": null, 1065 | "metadata": { 1066 | "collapsed": false 1067 | }, 1068 | "outputs": [], 1069 | "source": [ 1070 | "1 / (1 + numpy.exp( - weights[2][100,:]))" 1071 | ] 1072 | }, 1073 | { 1074 | "cell_type": "code", 1075 | "execution_count": null, 1076 | "metadata": { 1077 | "collapsed": true 1078 | }, 1079 | "outputs": [], 1080 | "source": [ 1081 | "# sNTM\n", 1082 | "def rawloss(x_train, x_test):\n", 1083 | " return x_train * x_test\n", 1084 | "n_categories = 3\n", 1085 | "ntm.add_node(Dense(n_categories, activation = \"sigmoid\"), input = \"ld_pos\", name = \"ll\")\n", 1086 | "ntm.add_output(name = \"label\", input = \"ll\")\n", 1087 | "ntm.compile(loss = {'loss_out' : threshold,\n", 1088 | " 'label' : 'categorical_crossentropy'}, \n", 1089 | " optimizer = \"Adadelta\")\n", 1090 | "\n", 1091 | "checkpointer = ModelCheckpoint(filepath=\"./cpw_new_smallbatch_sgd001_epoch_{epoch:02d}_{val_loss:.5f}.hdf5\", \n", 1092 | " monitor = 'val_loss', verbose = 1, save_best_only=False)\n", 1093 | "\n", 1094 | "train_shape = (examples.shape[0], 1)\n", 1095 | "trainer = examples \n", 1096 | " \n", 1097 | "historylog = ntm.fit(data = {\n", 1098 | " \"g\" : numpy.reshape(trainer[:,1], train_shape), \n", 1099 | " \"d_pos\" : numpy.reshape(trainer[:,0], train_shape), \n", 1100 | " \"d_neg\" : numpy.reshape(trainer[:,2], train_shape),\n", 1101 | " \"loss_out\" : numpy.reshape(numpy.ones(trainer.shape[0], \n", 1102 | " dtype = theano.config.floatX), train_shape)\n", 1103 | " # Need to add something here for the labels\n", 1104 | " }, callbacks = [checkpointer],\n", 1105 | " validation_split = 0.02,\n", 1106 | " nb_epoch = 20, \n", 1107 | " batch_size = 10)" 1108 | ] 1109 | } 1110 | ], 1111 | "metadata": { 1112 | "kernelspec": { 1113 | "display_name": "Python 2", 1114 | "language": "python", 1115 | "name": "python2" 1116 | }, 1117 | "language_info": { 1118 | "codemirror_mode": { 1119 | "name": "ipython", 1120 | "version": 2 1121 | }, 1122 | "file_extension": ".py", 1123 | "mimetype": "text/x-python", 1124 | "name": "python", 1125 | "nbconvert_exporter": "python", 1126 | "pygments_lexer": "ipython2", 1127 | "version": "2.7.11" 1128 | } 1129 | }, 1130 | "nbformat": 4, 1131 | "nbformat_minor": 0 1132 | } 1133 | --------------------------------------------------------------------------------
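
A note on what the fine-tuning graph above actually computes: the `summed` node outputs 0.5 + (`ls-` - `ls+`), the target fed to `fit` for that output is all zeros, and the custom objective `maxloss(y_true, y_pred) = K.maximum(y_true, y_pred)` therefore evaluates to max(0, 0.5 + `ls-` - `ls+`), i.e. the margin cost. The sketch below restates that forward pass and cost in plain NumPy; the names and shapes here (`hinge_cost`, the toy matrices) are illustrative stand-ins, not code taken from the notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def hinge_cost(term_vec, W2, W1, d_pos, d_neg, margin=0.5):
    """max(0, margin - ls_pos + ls_neg), as produced by the `summed` node plus `maxloss`."""
    lt = sigmoid(term_vec @ W2)   # term-topic activation (the `lt` node)
    ld_pos = softmax(W1[d_pos])   # topic distribution of the positive document (`ld_pos`)
    ld_neg = softmax(W1[d_neg])   # topic distribution of the negative document (`ld_neg`)
    ls_pos = lt @ ld_pos          # dot-merge score for the positive document (`ls_pos`)
    ls_neg = lt @ ld_neg          # dot-merge score for the negative document (`ls_neg`)
    return max(0.0, margin + ls_neg - ls_pos)

# Toy example: 5 terms with 300-dim summed word2vec embeddings, 4 topics, 3 documents.
rng = np.random.RandomState(0)
toy_terms = rng.randn(5, 300).astype("float32")  # stands in for term_matrix
toy_W2 = rng.randn(300, 4).astype("float32")     # stands in for pretrained_W2
toy_W1 = rng.randn(3, 4).astype("float32")       # stands in for pretrained_W1
print(hinge_cost(toy_terms[2], toy_W2, toy_W1, d_pos=0, d_neg=1))
```

Because the zero target makes `y_true` vanish inside `K.maximum`, the loss reported by `fit` during fine-tuning is exactly this clamped margin averaged over the batch, which is why it starts near 0.5 and moves only by the size of `ls+` - `ls-`.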
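
The evaluation losses recorded a few cells above plateau around 0.49998, so almost all of the reported loss is the 0.5 constant added in the `summed` node and the score difference contributes only on the order of 1e-5. A quick, purely illustrative float32 check (not code from the notebook): any per-example score difference smaller than roughly 3e-8 is rounded away entirely once it is added onto 0.5.

```python
import numpy as np

# Spacing between adjacent float32 values at 0.5 is ~6e-8, so increments below
# about half that are lost when accumulated onto the 0.5 margin term.
print(np.finfo(np.float32).eps * 0.5)                          # ~5.96e-08
print(np.float32(0.5) + np.float32(1e-8) == np.float32(0.5))   # True
```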