├── runtrain.lua
├── README.md
├── dA.py
├── NTP.ipynb
└── NTP2.ipynb

/runtrain.lua:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/elbamos/NeuralTopicModels/HEAD/runtrain.lua
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
1 | # NeuralTopicModels
2 |
3 | This repo contains a WIP implementation of http://nlp.cs.rpi.edu/paper/AAAI15.pdf. It is here for sharing with collaborators -- it is not final or presentation-quality code.
4 |
5 | The NTM model is intended to work essentially as follows:
6 |
7 | The outputs are W1 and lt.
8 |
9 | * W1 -- An embedding giving, approximately, the distribution over topics for each document.
10 | * lt -- An embedding giving, approximately, the distribution over topics for each term.
11 |
12 | W1 and lt are calculated as follows:
13 |
14 | ### Pre-training
15 |
16 | `le`, in R^{n_terms * 300}, maps each term to the sum of the word2vec embeddings of the grams within the term. (A term may be an n-gram with n > 1.)
17 |
18 | `le` is mapped to `lt` by a sigmoid activation of the weight matrix W2: `lt` = sigmoid(`le` . W2).
19 |
20 | W2, in R^{300 * n_topics}, is pre-trained by auto-encoding `le` against itself.
21 |
22 | W1, in R^{n_docs * n_topics}, is pre-trained so that each document's embedding is the sum of the pre-trained `lt` activations of the terms contained in the document.
23 |
24 | ### Fine-tuning
25 |
26 | Each example is a combination of (a) a term, (b) a document containing the term, and (c) a random document that does not contain the term.
27 |
28 | `ld+` and `ld-`, in R^{n_topics}, are the softmax activations of the W1 rows for the positive and negative documents, respectively.
29 |
30 | `ls+` and `ls-` are then calculated. Each is a scalar representing the predicted probability that the term appears in the positive and negative document, respectively:
31 |
32 | `ls` = `lt` . `ld` (the dot product of the term's topic vector with the document's topic vector)
33 |
34 | The cost is then calculated as:
35 |
36 | c(g, d+, d-) = max(0., 0.5 - `ls+` + `ls-`)
37 |
38 | Thus, the algorithm seeks (a) an embedding for the documents and (b) a weight matrix mapping the term word2vec embeddings to topics such that, given any term and a document containing it, the predicted probability that the term appears in that document is at least 0.5 greater than the predicted probability that it appears in a randomly chosen document that does not contain it. (A small numerical sketch of the scoring, the cost, and the precision issue discussed below appears after the first experiment.)
39 |
40 | ### Issues
41 |
42 | In my testing -- a corpus of 1M documents and 30,000 terms with ~10M total grams, aiming for 128 topics -- I found that W1 and W2 both consistently converge toward 0, usually after only 1 epoch.
43 |
44 | * Theory: In debugging, I observed that `ls-` - `ls+` tends to be around 1e-8. After adding 0.5, I suspect that a 32-bit float represents the result as exactly 0.5, losing the difference.
45 |
46 | I suspect this causes every loss to be calculated as 0.5, which corrupts the gradients for W1 and W2.
47 |
48 | Experiments:
49 |
50 | * Pretraining W1 & W2 while ignoring the 0.5 separation: I tried this cost function:
51 |
52 | c(g, d+, d-) = mean(binary_crossentropy(`ls+`, 1), binary_crossentropy(`ls-`, 0))
53 |
54 | Result: Convergence toward zero.
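The following is a minimal numpy sketch of the scoring, the margin cost, and the float32-precision theory above. It is illustrative only -- the topic count, the random embeddings, and the 1e-8 gap are assumptions for demonstration, not code from this repo.

    # Sketch (assumed shapes): score one term against a positive and a negative document.
    import numpy as np

    n_topics = 128
    rng = np.random.RandomState(0)

    lt = rng.rand(n_topics).astype("float32")                     # values in (0, 1), like sigmoid term-topic activations
    ld_pos = rng.dirichlet(np.ones(n_topics)).astype("float32")   # sums to 1, like a softmax document embedding
    ld_neg = rng.dirichlet(np.ones(n_topics)).astype("float32")

    ls_pos = np.dot(lt, ld_pos)              # predicted score for the positive document
    ls_neg = np.dot(lt, ld_neg)              # predicted score for the negative document
    cost = max(0.0, 0.5 - ls_pos + ls_neg)   # margin cost c(g, d+, d-)

    # Precision check: a gap of ~1e-8 disappears next to 0.5 in float32,
    # because adjacent float32 values near 0.5 are about 6e-8 apart.
    print(np.float32(0.5) + np.float32(1e-8) == np.float32(0.5))  # True -> loss saturates at 0.5

If the check prints True, any example whose score gap is on that order contributes a constant 0.5 to the loss, which matches the behavior described in the theory above.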
55 |
56 | * Gradient enhancement: on the theory that the problem was underflow, I tried this cost function:
57 |
58 | c(g, d+, d-) = max(0., 0.5 + max(n_docs, 10 ** epoch) * (`ls-` - `ls+`))
59 |
60 | Result: Convergence toward zero.
61 |
62 | * Normalization: To keep the weights in W1 and W2 from collapsing toward zero, I tried:
63 |
64 | Modifying the formula for `lt` to softmax(softplus(`le`)). This is intended to prevent `lt` from approaching zero while encouraging greater differentiation among topics and terms.
65 |
66 | Enforcing a unit-norm constraint on W1, the document-topic embedding matrix.
67 |
68 | Result: Still in testing.
69 |
70 | * Optimization: I experimented with `adadelta` (which I have generally found very effective) instead of vanilla SGD.
71 |
72 | Result: W1 converged toward zero much more quickly than with vanilla SGD.
73 |
74 |
--------------------------------------------------------------------------------

/dA.py:
--------------------------------------------------------------------------------
1 | """
2 | This is the denoising auto-encoder class from deeplearning.net, modified as described in the accompanying IPython notebook: the biases are removed, the reconstruction activation is linear, and the cost is mean squared error.
3 | """
4 |
5 | import os
6 | import sys
7 | import timeit
8 |
9 | import numpy
10 |
11 | import theano
12 | import theano.tensor as T
13 | from theano.tensor.shared_randomstreams import RandomStreams
14 |
15 | from logistic_sgd import load_data
16 | from utils import tile_raster_images
17 |
18 | try:
19 |     import PIL.Image as Image
20 | except ImportError:
21 |     import Image
22 |
23 |
24 | class dA(object):
25 |     """Denoising Auto-Encoder class (dA)
26 |
27 |     A denoising autoencoder tries to reconstruct the input from a corrupted
28 |     version of it by first projecting it into a latent space and then
29 |     projecting it back into the input space. Please refer to Vincent et al., 2008
30 |     for more details. If x is the input, then equation (1) computes a partially
31 |     destroyed version of x by means of a stochastic mapping q_D. Equation (2)
32 |     computes the projection of the input into the latent space. Equation (3)
33 |     computes the reconstruction of the input, while equation (4) computes the
34 |     reconstruction error.
35 |
36 |     .. math::
37 |
38 |         \tilde{x} ~ q_D(\tilde{x}|x)                                     (1)
39 |
40 |         y = s(W \tilde{x} + b)                                           (2)
41 |
42 |         z = s(W' y + b')                                                 (3)
43 |
44 |         L(x, z) = -sum_{k=1}^d [x_k \log z_k + (1 - x_k) \log(1 - z_k)]  (4)
45 |
46 |     """
47 |
48 |     def __init__(
49 |         self,
50 |         numpy_rng,
51 |         theano_rng=None,
52 |         input=None,
53 |         n_visible=784,
54 |         n_hidden=500,
55 |         W=None
56 |     ):
57 |         """
58 |         Initialize the dA class by specifying the number of visible units (the
59 |         dimension d of the input), the number of hidden units (the dimension
60 |         d' of the latent or hidden space) and the corruption level. The
61 |         constructor also receives symbolic variables for the input, weights and
62 |         bias. Such symbolic variables are useful when, for example, the input
63 |         is the result of some computation, or when weights are shared between
64 |         the dA and an MLP layer. When dealing with SdAs this always happens:
65 |         the dA on layer 2 gets as input the output of the dA on layer 1,
66 |         and the weights of the dA are used in the second stage of training
67 |         to construct an MLP.
68 |
69 |         :type numpy_rng: numpy.random.RandomState
70 |         :param numpy_rng: numpy random number generator used to generate weights
71 |
72 |         :type theano_rng: theano.tensor.shared_randomstreams.RandomStreams
73 |         :param theano_rng: Theano random generator; if None is given, one is
74 |                      generated based on a seed drawn from `rng`
75 |
76 |         :type input: theano.tensor.TensorType
77 |         :param input: a symbolic description of the input, or None for a
78 |                       standalone dA
79 |
80 |         :type n_visible: int
81 |         :param n_visible: number of visible units
82 |
83 |         :type n_hidden: int
84 |         :param n_hidden: number of hidden units
85 |
86 |         :type W: theano.tensor.TensorType
87 |         :param W: Theano variable pointing to a set of weights that should be
88 |                   shared between the dA and another architecture; if the dA
89 |                   should be standalone, set this to None
90 |
91 |         """
92 |         self.n_visible = n_visible
93 |         self.n_hidden = n_hidden
94 |
95 |         # create a Theano random generator that gives symbolic random values
96 |         if not theano_rng:
97 |             theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
98 |
99 |         # note : W' was written as `W_prime` and b' as `b_prime`
100 |         if not W:
101 |             # W is initialized with `initial_W`, which is uniformly sampled
102 |             # from -4*sqrt(6./(n_visible+n_hidden)) to
103 |             # 4*sqrt(6./(n_hidden+n_visible)). The output of uniform is
104 |             # converted with asarray to dtype theano.config.floatX
105 |             # so that the code is runnable on a GPU
106 |             initial_W = numpy.asarray(
107 |                 numpy_rng.uniform(
108 |                     low=-4 * numpy.sqrt(6. / (n_hidden + n_visible)),
109 |                     high=4 * numpy.sqrt(6. / (n_hidden + n_visible)),
110 |                     size=(n_visible, n_hidden)
111 |                 ),
112 |                 dtype=theano.config.floatX
113 |             )
114 |             W = theano.shared(value=initial_W, name='W', borrow=True)
115 |
116 |         self.W = W
117 |         # tied weights, therefore W_prime is W transpose
118 |         self.W_prime = self.W.T
119 |         self.theano_rng = theano_rng
120 |         # if no input is given, generate a variable representing the input
121 |         if input is None:
122 |             # we use a matrix because we expect a minibatch of several
123 |             # examples, each example being a row
124 |             self.x = T.dmatrix(name='input')
125 |         else:
126 |             self.x = input
127 |
128 |         self.params = [self.W]
129 |
130 |     def get_corrupted_input(self, input, corruption_level):
131 |         """This function keeps ``1-corruption_level`` of the input entries
132 |         the same and zeros out a randomly selected subset of size ``corruption_level``.
133 |         Note : the first argument of theano_rng.binomial is the shape (size) of the
134 |                random numbers it should produce,
135 |                the second argument is the number of trials,
136 |                the third argument is the probability of success of any trial
137 |
138 |         This will produce an array of 0s and 1s, where 1 appears with
139 |         probability 1 - ``corruption_level`` and 0 with probability
140 |         ``corruption_level``.
141 |
142 |         The binomial function returns an int64 data type by
143 |         default. int64 multiplied by the input
144 |         type (floatX) always returns float64. To keep all data
145 |         in floatX when floatX is float32, we set the dtype of
146 |         the binomial to floatX. As in our case the value of
147 |         the binomial is always 0 or 1, this doesn't change the
148 |         result. This is needed to allow the GPU to work
149 |         correctly, as it only supports float32 for now.
150 | 151 | """ 152 | return self.theano_rng.binomial(size=input.shape, n=1, 153 | p=1 - corruption_level, 154 | dtype=theano.config.floatX) * input 155 | 156 | def get_hidden_values(self, input): 157 | """ Computes the values of the hidden layer """ 158 | return T.nnet.sigmoid(T.dot(input, self.W)) 159 | 160 | def get_reconstructed_input(self, hidden): 161 | """Computes the reconstructed input given the values of the 162 | hidden layer 163 | 164 | """ 165 | return T.dot(hidden, self.W_prime) 166 | 167 | def get_cost_updates(self, corruption_level, learning_rate): 168 | """ This function computes the cost and the updates for one trainng 169 | step of the dA """ 170 | 171 | tilde_x = self.get_corrupted_input(self.x, corruption_level) 172 | y = self.get_hidden_values(tilde_x) 173 | z = self.get_reconstructed_input(y) 174 | # note : we sum over the size of a datapoint; if we are using 175 | # minibatches, L will be a vector, with one entry per 176 | # example in minibatch 177 | # L = - T.sum(self.x * T.log(z) + (1 - self.x) * T.log(1 - z), axis=1) 178 | L = T.sqr(self.x - z) 179 | # note : L is now a vector, where each element is the 180 | # cross-entropy cost of the reconstruction of the 181 | # corresponding example of the minibatch. We need to 182 | # compute the average of all these to get the cost of 183 | # the minibatch 184 | cost = T.mean(L) 185 | 186 | # compute the gradients of the cost of the `dA` with respect 187 | # to its parameters 188 | gparams = T.grad(cost, self.params) 189 | # generate the list of updates 190 | updates = [ 191 | (param, param - learning_rate * gparam) 192 | for param, gparam in zip(self.params, gparams) 193 | ] 194 | 195 | return (cost, updates) 196 | 197 | 198 | def test_dA(learning_rate=0.1, training_epochs=15, 199 | dataset='mnist.pkl.gz', 200 | batch_size=20, output_folder='dA_plots'): 201 | 202 | """ 203 | This demo is tested on MNIST 204 | 205 | :type learning_rate: float 206 | :param learning_rate: learning rate used for training the DeNosing 207 | AutoEncoder 208 | 209 | :type training_epochs: int 210 | :param training_epochs: number of epochs used for training 211 | 212 | :type dataset: string 213 | :param dataset: path to the picked dataset 214 | 215 | """ 216 | datasets = load_data(dataset) 217 | train_set_x, train_set_y = datasets[0] 218 | 219 | # compute number of minibatches for training, validation and testing 220 | n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size 221 | 222 | # start-snippet-2 223 | # allocate symbolic variables for the data 224 | index = T.lscalar() # index to a [mini]batch 225 | x = T.matrix('x') # the data is presented as rasterized images 226 | # end-snippet-2 227 | 228 | if not os.path.isdir(output_folder): 229 | os.makedirs(output_folder) 230 | os.chdir(output_folder) 231 | 232 | #################################### 233 | # BUILDING THE MODEL NO CORRUPTION # 234 | #################################### 235 | 236 | rng = numpy.random.RandomState(123) 237 | theano_rng = RandomStreams(rng.randint(2 ** 30)) 238 | 239 | da = dA( 240 | numpy_rng=rng, 241 | theano_rng=theano_rng, 242 | input=x, 243 | n_visible=28 * 28, 244 | n_hidden=500 245 | ) 246 | 247 | cost, updates = da.get_cost_updates( 248 | corruption_level=0., 249 | learning_rate=learning_rate 250 | ) 251 | 252 | train_da = theano.function( 253 | [index], 254 | cost, 255 | updates=updates, 256 | givens={ 257 | x: train_set_x[index * batch_size: (index + 1) * batch_size] 258 | } 259 | ) 260 | 261 | start_time = timeit.default_timer() 262 | 263 
| ############ 264 | # TRAINING # 265 | ############ 266 | 267 | # go through training epochs 268 | for epoch in xrange(training_epochs): 269 | # go through trainng set 270 | c = [] 271 | for batch_index in xrange(n_train_batches): 272 | c.append(train_da(batch_index)) 273 | 274 | print 'Training epoch %d, cost ' % epoch, numpy.mean(c) 275 | 276 | end_time = timeit.default_timer() 277 | 278 | training_time = (end_time - start_time) 279 | 280 | print >> sys.stderr, ('The no corruption code for file ' + 281 | os.path.split(__file__)[1] + 282 | ' ran for %.2fm' % ((training_time) / 60.)) 283 | image = Image.fromarray( 284 | tile_raster_images(X=da.W.get_value(borrow=True).T, 285 | img_shape=(28, 28), tile_shape=(10, 10), 286 | tile_spacing=(1, 1))) 287 | image.save('filters_corruption_0.png') 288 | 289 | # start-snippet-3 290 | ##################################### 291 | # BUILDING THE MODEL CORRUPTION 30% # 292 | ##################################### 293 | 294 | rng = numpy.random.RandomState(123) 295 | theano_rng = RandomStreams(rng.randint(2 ** 30)) 296 | 297 | da = dA( 298 | numpy_rng=rng, 299 | theano_rng=theano_rng, 300 | input=x, 301 | n_visible=28 * 28, 302 | n_hidden=500 303 | ) 304 | 305 | cost, updates = da.get_cost_updates( 306 | corruption_level=0.3, 307 | learning_rate=learning_rate 308 | ) 309 | 310 | train_da = theano.function( 311 | [index], 312 | cost, 313 | updates=updates, 314 | givens={ 315 | x: train_set_x[index * batch_size: (index + 1) * batch_size] 316 | } 317 | ) 318 | 319 | start_time = timeit.default_timer() 320 | 321 | ############ 322 | # TRAINING # 323 | ############ 324 | 325 | # go through training epochs 326 | for epoch in xrange(training_epochs): 327 | # go through trainng set 328 | c = [] 329 | for batch_index in xrange(n_train_batches): 330 | c.append(train_da(batch_index)) 331 | 332 | print 'Training epoch %d, cost ' % epoch, numpy.mean(c) 333 | 334 | end_time = timeit.default_timer() 335 | 336 | training_time = (end_time - start_time) 337 | 338 | print >> sys.stderr, ('The 30% corruption code for file ' + 339 | os.path.split(__file__)[1] + 340 | ' ran for %.2fm' % (training_time / 60.)) 341 | # end-snippet-3 342 | 343 | # start-snippet-4 344 | image = Image.fromarray(tile_raster_images( 345 | X=da.W.get_value(borrow=True).T, 346 | img_shape=(28, 28), tile_shape=(10, 10), 347 | tile_spacing=(1, 1))) 348 | image.save('filters_corruption_30.png') 349 | # end-snippet-4 350 | 351 | os.chdir('../') 352 | 353 | 354 | if __name__ == '__main__': 355 | test_dA() 356 | -------------------------------------------------------------------------------- /NTP.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import os\n", 12 | "import sys\n", 13 | "import timeit\n", 14 | "import numpy\n", 15 | "from keras.models import *\n", 16 | "from keras.layers.core import *\n", 17 | "from keras.layers.embeddings import *\n", 18 | "from keras.regularizers import l2\n", 19 | "from keras import backend as K\n", 20 | "from scipy.io import loadmat\n", 21 | "from scipy.io import savemat\n", 22 | "from keras.models import model_from_json\n", 23 | "from IPython.display import SVG\n", 24 | "from keras.utils.visualize_util import to_graph\n", 25 | "from keras.callbacks import ModelCheckpoint\n", 26 | "import theano\n", 27 | "import theano.tensor as T\n", 28 | "from 
theano.tensor.shared_randomstreams import RandomStreams\n", 29 | "to_path = \"./\"" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": { 36 | "collapsed": false 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "# Matrix of word2vec activations for each term in our dictionary, (n_terms, 300)\n", 41 | "term_matrix = loadmat(to_path + \"t1_termatrix.mat\", variable_names = \"target\").get(\"target\").astype(\"float32\")\n", 42 | "term_matrix.shape" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": null, 48 | "metadata": { 49 | "collapsed": false 50 | }, 51 | "outputs": [], 52 | "source": [ 53 | "# Pretrain W2\n", 54 | "# W2 is pretrained by autoencoding with the formula le' = sigmoid(le dot W2) dot t(W2)\n", 55 | "from dA import dA\n", 56 | "# The dA class from deeplearning.net was modified for this purpose. \n", 57 | "# In particular, the changes are:\n", 58 | "# 1. Biases are taken out. \n", 59 | "# 2. The visible layer activation function is changed from sigmoid(sigmoid(le dot W2) dot t(W2))\n", 60 | "# 3. mse is used as the cost function\n", 61 | "\n", 62 | "index = T.lscalar() \n", 63 | "x = T.matrix('x')\n", 64 | "rng = numpy.random.RandomState(123)\n", 65 | "theano_rng = RandomStreams(rng.randint(2 ** 30))\n", 66 | "pretrainer = dA(input = x, numpy_rng = rng, \n", 67 | " theano_rng = theano_rng, \n", 68 | " n_visible = 300, n_hidden = 128)\n", 69 | "cost, updates = pretrainer.get_cost_updates(\n", 70 | " corruption_level=0,\n", 71 | " learning_rate=0.01\n", 72 | " )\n", 73 | "train_data = theano.shared(name = \"trainer\", \n", 74 | " value = term_matrix, \n", 75 | " borrow = True)\n", 76 | "batch_size = 10\n", 77 | "train_da = theano.function(\n", 78 | " [index],\n", 79 | " cost,\n", 80 | " updates=updates,\n", 81 | " givens={\n", 82 | " x: train_data[index * batch_size: (index + 1) * batch_size]\n", 83 | " }\n", 84 | " )" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": null, 90 | "metadata": { 91 | "collapsed": false 92 | }, 93 | "outputs": [], 94 | "source": [ 95 | "# Ultimately, I trained for about 700 epochs\n", 96 | "# The measure of sparseness was used for validation because\n", 97 | "# of the risk of exploding/vanishing gradients\n", 98 | "n_epochs = 1000\n", 99 | "batch_size = 10\n", 100 | "batches = term_matrix.shape[0] // batch_size\n", 101 | "for epoch in xrange(n_epochs):\n", 102 | " c = []\n", 103 | " for batch_index in xrange(batches):\n", 104 | " c.append(train_da(batch_index))\n", 105 | " W2_pre = pretrainer.W.get_value(borrow = True)\n", 106 | " output = numpy.dot(term_matrix, W2_pre) \n", 107 | " output = 1 / (1 + numpy.exp(-output))\n", 108 | " sparseness = numpy.sum(output) / (output.shape[0] * output.shape[1])\n", 109 | " print 'Training epoch %d, cost %f, sparseness %f' % (epoch, numpy.mean(c), sparseness)" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": null, 115 | "metadata": { 116 | "collapsed": false 117 | }, 118 | "outputs": [], 119 | "source": [ 120 | "# The trained weight matrix is extracted, and hidden-unit activations\n", 121 | "# are extracted and saved to disk for pre-training W1. 
\n", 122 | "W2_pre = pretrainer.W.get_value(borrow = True)\n", 123 | "output = numpy.dot(term_matrix, W2_pre)\n", 124 | "savemat(\"./t1_ntm_pretrain.mat\", { 'activations' : output,\n", 125 | " 'W2' : W2_pre})\n", 126 | "(W2_pre.shape, W2_pre[0,:], output.shape)" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": null, 132 | "metadata": { 133 | "collapsed": false 134 | }, 135 | "outputs": [], 136 | "source": [ 137 | "#W2_pre = loadmat(to_path + \"t1_ntm_pretrain.mat\", variable_names = \"W2\").get(\"W2\").astype(\"float32\")" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": { 144 | "collapsed": false 145 | }, 146 | "outputs": [], 147 | "source": [ 148 | "# W1 was pretrained in R\n", 149 | "# For each document, its pre-trained embedding is the sum of the W1 activations for all terms found in the document\n", 150 | "pretrained_W1 = loadmat(to_path + \"t1_ntm_pret.mat\", variable_names = \"w1\").get(\"w1\").astype(\"float32\") " 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": { 157 | "collapsed": true 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "# The training set is (n_grams, 2 + n_epochs) matrix\n", 162 | "# Columns are:\n", 163 | "# 0. Index of a document (d_pos)\n", 164 | "# 1. Index of a term found in d_pos (g)\n", 165 | "# 2...(1 + n_epochs). Indices of randomly selected documents that do not contain g (d_neg)\n", 166 | "# d_negs were selected proportionate to the inverse of the number of terms in each document,\n", 167 | "# so documents get approximately the same number of total passes in each epoch\n", 168 | "\n", 169 | "examples = loadmat(to_path + \"t1_ntm_pret.mat\", variable_names = \"examples\").get(\"examples\")\n", 170 | "examples = numpy.vstack(tuple([examples[:,(0,1,x)] for x in range(2, examples.shape[1])]))" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": null, 176 | "metadata": { 177 | "collapsed": false 178 | }, 179 | "outputs": [], 180 | "source": [ 181 | "(n_docs, n_topics, n_terms, n_total_grams) = (pretrained_W1.shape[0], \n", 182 | " pretrained_W1.shape[1], \n", 183 | " term_matrix.shape[0], \n", 184 | " examples.shape[0])\n", 185 | "(n_docs, n_topics, n_terms, n_total_grams)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": { 192 | "collapsed": false, 193 | "scrolled": true 194 | }, 195 | "outputs": [], 196 | "source": [ 197 | "# Basic NTM model\n", 198 | "def build_ntm(w1_weights, \n", 199 | " w2_weights,\n", 200 | " term_matrix,\n", 201 | " W1_regularizer = l2(0.001), \n", 202 | " W2_regularizer = l2(0.001)\n", 203 | " ):\n", 204 | " \n", 205 | " n_docs = w1_weights.shape[0]\n", 206 | " n_topics = w2_weights.shape[1]\n", 207 | " n_terms = w2_weights.shape[0]\n", 208 | " \n", 209 | " ntm = Graph()\n", 210 | " \n", 211 | " ntm.add_input(name = \"d_pos\", input_shape = (1,), dtype = \"int\")\n", 212 | " ntm.add_input(name = \"d_neg\", input_shape = (1,), dtype = \"int\")\n", 213 | " ntm.add_shared_node(Embedding(input_dim = n_docs, \n", 214 | " output_dim = n_topics, \n", 215 | " weights = [w1_weights], \n", 216 | " W_regularizer = W1_regularizer,\n", 217 | " input_length = 1),\n", 218 | " name = \"topicmatrix\",\n", 219 | " inputs = [\"d_pos\", \"d_neg\"], \n", 220 | " outputs = [\"wd_pos\", \"wd_neg\"],\n", 221 | " merge_mode = None)\n", 222 | " ntm.add_node(Flatten(), name = \"wd_pos_\", input = \"wd_pos\")\n", 223 | " ntm.add_node(Flatten(), 
name = \"wd_neg_\", input = \"wd_neg\")\n", 224 | " ntm.add_node(Activation(\"softmax\"), name = \"ld_pos\", input = \"wd_pos_\")\n", 225 | " ntm.add_node(Activation(\"softmax\"), name = \"ld_neg\", input = \"wd_neg_\")\n", 226 | " \n", 227 | " ntm.add_input(name = \"g\", input_shape = (1,), dtype = \"int\")\n", 228 | " ntm.add_node(Embedding(input_dim = n_terms, \n", 229 | " output_dim = 300,\n", 230 | " weights = [term_matrix], \n", 231 | " trainable = False,\n", 232 | " input_length = 1), \n", 233 | " name = \"le\", input = \"g\")\n", 234 | " ntm.add_node(Flatten(), input = \"le\", name = \"le_\")\n", 235 | " ntm.add_node(Dense(n_topics, activation = \"sigmoid\", \n", 236 | " weights = [w2_weights, numpy.zeros(n_topics)], \n", 237 | " W_regularizer = W2_regularizer),\n", 238 | " name = \"lt\", input = \"le_\")\n", 239 | " \n", 240 | " ntm.add_node(Layer(),\n", 241 | " name = \"ls_pos\", \n", 242 | " inputs = [\"lt\", \"ld_pos\"], \n", 243 | " merge_mode = 'dot', dot_axes = -1)\n", 244 | " ntm.add_node(Layer(), \n", 245 | " name = \"ls_neg\", \n", 246 | " inputs = [\"lt\", \"ld_neg\"], \n", 247 | " merge_mode = 'dot', dot_axes = -1)\n", 248 | " return ntm" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": { 255 | "collapsed": false, 256 | "scrolled": true 257 | }, 258 | "outputs": [], 259 | "source": [ 260 | "# Train the model\n", 261 | "\n", 262 | "# Very large batch sizes are a good idea. \n", 263 | "# Even with very large batches, the its unlikely that many weights in W1 will be \n", 264 | "# triggered more than once each batch. But, W2 weights get updated from every row. \n", 265 | "# W2 therefore wants to overfit before W1 is finished training. Using large batch\n", 266 | "# sizes mitigates the effect. 
\n", 267 | "batch_size = 20000\n", 268 | "n_epochs = 10\n", 269 | "margin = 0.5\n", 270 | "ntm = build_ntm(\n", 271 | " w1_weights = pretrained_W1, \n", 272 | " w2_weights = W2_pre,\n", 273 | " term_matrix = term_matrix,\n", 274 | " W1_regularizer = l2(0.001), \n", 275 | " W2_regularizer = l2(0.001))\n", 276 | "\n", 277 | "def output_shape(input_shape):\n", 278 | " return (None, 1)\n", 279 | "\n", 280 | "def sumLam(x):\n", 281 | " return (margin + (x[1] - x[0]))\n", 282 | "\n", 283 | "summer = LambdaMerge(layers = [ntm.nodes[\"ls_pos\"], ntm.nodes[\"ls_neg\"] ], \n", 284 | " function = sumLam,\n", 285 | " output_shape = output_shape)\n", 286 | "ntm.add_node(summer, inputs = [\"ls_pos\", \"ls_neg\"], \n", 287 | " name = \"summed\")\n", 288 | "ntm.add_output(name = \"loss_out\", input= \"summed\")\n", 289 | "\n", 290 | "def rawloss(x_train, x_test):\n", 291 | " return x_train * x_test\n", 292 | "\n", 293 | "# Adadelta tended to converge more quickly than SGD\n", 294 | "ntm.compile(loss = {'loss_out' : rawloss},\n", 295 | " optimizer = 'Adadelta') \n", 296 | "\n", 297 | "checkpointer = ModelCheckpoint(filepath=\"./checkpointweights.hdf5\", verbose = 1, save_best_only=True)\n", 298 | "\n", 299 | "train_data = examples\n", 300 | "train_shape = (train_data.shape[0], 1)\n", 301 | "g = numpy.reshape(examples[:,1], train_shape)\n", 302 | "d_pos = numpy.reshape(examples[:,0], train_shape)\n", 303 | "d_neg = numpy.reshape(examples[:,2], train_shape)\n", 304 | " \n", 305 | "ntm.fit(data = {\n", 306 | " \"g\" : g, \n", 307 | " \"d_pos\" : d_pos, \n", 308 | " \"d_neg\" : d_neg,\n", 309 | " \"loss_out\" : numpy.reshape(numpy.ones(trainer.shape[0], \n", 310 | " dtype = theano.config.floatX), train_shape)\n", 311 | " }, callbacks = [checkpointer],\n", 312 | " validation_split = 0.005,\n", 313 | " nb_epoch = n_epochs, \n", 314 | " batch_size = batch_size)" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "metadata": { 321 | "collapsed": false 322 | }, 323 | "outputs": [], 324 | "source": [ 325 | "json_string = ntm.to_json()\n", 326 | "open('ntm_final.json', 'w').write(json_string)\n", 327 | "ntm.save_weights(to_path + 'ntm_finalweights_.h5', overwrite=True)" 328 | ] 329 | }, 330 | { 331 | "cell_type": "code", 332 | "execution_count": null, 333 | "metadata": { 334 | "collapsed": true 335 | }, 336 | "outputs": [], 337 | "source": [ 338 | "# sNTM - Not fully tested\n", 339 | "n_categories = 3\n", 340 | "ntm.add_node(Dense(n_categories, activation = \"sigmoid\"), input = \"ld_pos\", name = \"ll\")\n", 341 | "ntm.add_output(name = \"label\", input = \"ll\")\n", 342 | "ntm.compile(loss = {'loss_out' : threshold,\n", 343 | " 'label' : 'categorical_crossentropy'}, \n", 344 | " optimizer = \"Adadelta\")" 345 | ] 346 | } 347 | ], 348 | "metadata": { 349 | "kernelspec": { 350 | "display_name": "Python 2", 351 | "language": "python", 352 | "name": "python2" 353 | }, 354 | "language_info": { 355 | "codemirror_mode": { 356 | "name": "ipython", 357 | "version": 2 358 | }, 359 | "file_extension": ".py", 360 | "mimetype": "text/x-python", 361 | "name": "python", 362 | "nbconvert_exporter": "python", 363 | "pygments_lexer": "ipython2", 364 | "version": "2.7.11" 365 | } 366 | }, 367 | "nbformat": 4, 368 | "nbformat_minor": 0 369 | } 370 | -------------------------------------------------------------------------------- /NTP2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 
6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [ 10 | { 11 | "name": "stdout", 12 | "output_type": "stream", 13 | "text": [ 14 | "Using Theano backend.\n" 15 | ] 16 | }, 17 | { 18 | "name": "stderr", 19 | "output_type": "stream", 20 | "text": [ 21 | "Using gpu device 0: GeForce GTX 980 (CNMeM is enabled)\n" 22 | ] 23 | } 24 | ], 25 | "source": [ 26 | "import os\n", 27 | "import sys\n", 28 | "import timeit\n", 29 | "import numpy\n", 30 | "from keras.models import *\n", 31 | "from keras.layers.core import *\n", 32 | "from keras.layers.embeddings import *\n", 33 | "from keras.optimizers import SGD,Adadelta,Adam\n", 34 | "from keras.regularizers import l2, l1l2\n", 35 | "from keras.constraints import unitnorm,nonneg\n", 36 | "from keras.layers.advanced_activations import ThresholdedReLU\n", 37 | "from keras import backend as K\n", 38 | "from scipy.io import loadmat\n", 39 | "from scipy.io import savemat\n", 40 | "from keras.models import model_from_json\n", 41 | "from IPython.display import SVG\n", 42 | "from keras.utils.visualize_util import to_graph\n", 43 | "from keras.callbacks import ModelCheckpoint,RemoteMonitor\n", 44 | "import theano\n", 45 | "import theano.tensor as T\n", 46 | "import h5py\n", 47 | "from theano.tensor.shared_randomstreams import RandomStreams\n", 48 | "to_path = \"./\"" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "metadata": { 55 | "collapsed": false 56 | }, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/plain": [ 61 | "(28956, 300)" 62 | ] 63 | }, 64 | "execution_count": 2, 65 | "metadata": {}, 66 | "output_type": "execute_result" 67 | } 68 | ], 69 | "source": [ 70 | "term_matrix = loadmat(to_path + \"t1_termatrix.mat\", variable_names = \"target\").get(\"target\").astype(\"float32\")\n", 71 | "term_matrix.shape" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 3, 77 | "metadata": { 78 | "collapsed": false 79 | }, 80 | "outputs": [], 81 | "source": [ 82 | "class SymmetricAutoencoder(Layer):\n", 83 | " '''AutoEncoder where reconstruction = reconstruction_activation(activation(x * W) * W')\n", 84 | " # Input shape\n", 85 | " 2D tensor with shape: `(nb_samples, input_dim)`.\n", 86 | " # Output shape\n", 87 | " 2D tensor with shape: `(nb_samples, input_dim)` if output_reconstruction = True,\n", 88 | " shape: `(nb_samples,output_dim)` if output_reconstruction = False\n", 89 | " # Arguments\n", 90 | " output_dim: int > 0.\n", 91 | " init: name of initialization function for the weights of the layer\n", 92 | " (see [initializations](../initializations.md)),\n", 93 | " or alternatively, Theano function to use for weights\n", 94 | " initialization. This parameter is only relevant\n", 95 | " if you don't pass a `weights` argument.\n", 96 | " activation: name of activation function to use\n", 97 | " (see [activations](../activations.md)),\n", 98 | " or alternatively, elementwise Theano function.\n", 99 | " If you don't specify anything, no activation is applied\n", 100 | " (ie. \"linear\" activation: a(x) = x).\n", 101 | " weights: list of numpy arrays to set as initial weights.\n", 102 | " The list should have 1 element, of shape `(input_dim, output_dim)`.\n", 103 | " output_reconstruction: Whether, when not being trained, the output of the \n", 104 | " layer should be the reconstructed input, or the hidden layer activations.\n", 105 | " W_regularizer: instance of [WeightRegularizer](../regularizers.md)\n", 106 | " (eg. 
L1 or L2 regularization), applied to the main weights matrix.\n", 107 | " activity_regularizer: instance of [ActivityRegularizer](../regularizers.md),\n", 108 | " applied to the network output.\n", 109 | " W_constraint: instance of the [constraints](../constraints.md) module\n", 110 | " (eg. maxnorm, nonneg), applied to the main weights matrix.\n", 111 | " input_dim: dimensionality of the input (integer).\n", 112 | " This argument (or alternatively, the keyword argument `input_shape`)\n", 113 | " is required when using this layer as the first layer in a model.\n", 114 | " '''\n", 115 | " input_ndim = 2\n", 116 | "\n", 117 | " def __init__(self, output_dim, init='glorot_uniform', activation='linear',\n", 118 | " reconstruction_activation='linear', weights=None,\n", 119 | " W_regularizer=None, b_regularizer=None, activity_regularizer=None,\n", 120 | " output_reconstruction=False,\n", 121 | " W_constraint=None, b_constraint=None, input_dim=None, **kwargs):\n", 122 | " self.init = initializations.get(init)\n", 123 | " self.activation = activations.get(activation)\n", 124 | " self.reconstruction_activation = activations.get(reconstruction_activation)\n", 125 | " self.output_reconstruction = output_reconstruction\n", 126 | " self.output_dim = output_dim\n", 127 | " self.pretrain = True\n", 128 | "\n", 129 | " self.W_regularizer = regularizers.get(W_regularizer)\n", 130 | " self.b_regularizer = regularizers.get(b_regularizer)\n", 131 | " self.activity_regularizer = regularizers.get(activity_regularizer)\n", 132 | "\n", 133 | " self.W_constraint = constraints.get(W_constraint)\n", 134 | " self.b_constraint = constraints.get(b_constraint)\n", 135 | " self.constraints = [self.W_constraint, self.b_constraint]\n", 136 | "\n", 137 | " self.initial_weights = weights\n", 138 | "\n", 139 | " self.input_dim = input_dim\n", 140 | " if self.input_dim:\n", 141 | " kwargs['input_shape'] = (self.input_dim,)\n", 142 | " self.input = K.placeholder(ndim=2)\n", 143 | " super(SymmetricAutoencoder, self).__init__(**kwargs)\n", 144 | "\n", 145 | " def build(self):\n", 146 | " input_dim = self.input_shape[1]\n", 147 | "\n", 148 | " self.W = self.init((input_dim, self.output_dim))\n", 149 | "\n", 150 | " self.params = [self.W]\n", 151 | "\n", 152 | " self.regularizers = []\n", 153 | " if self.W_regularizer:\n", 154 | " self.W_regularizer.set_param(self.W)\n", 155 | " self.regularizers.append(self.W_regularizer)\n", 156 | "\n", 157 | " if self.activity_regularizer:\n", 158 | " self.activity_regularizer.set_layer(self)\n", 159 | " self.regularizers.append(self.activity_regularizer)\n", 160 | "\n", 161 | " if self.initial_weights is not None:\n", 162 | " self.set_weights(self.initial_weights)\n", 163 | " del self.initial_weights\n", 164 | "\n", 165 | " @property\n", 166 | " def output_shape(self):\n", 167 | " if self.pretrain or self.output_reconstruction: \n", 168 | " return self.input_shape\n", 169 | " else:\n", 170 | " return (self.input_shape[0], self.output_dim)\n", 171 | "\n", 172 | " def get_output(self, train=False):\n", 173 | " X = self.get_input(train)\n", 174 | " if self.pretrain or self.output_reconstruction: \n", 175 | " output = self.reconstruction_activation(K.dot(self.activation(K.dot(X, self.W)), K.transpose(self.W)))\n", 176 | " return output \n", 177 | " else:\n", 178 | " output = self.activation(K.dot(X, self.W))\n", 179 | " return output\n", 180 | "\n", 181 | " def get_config(self):\n", 182 | " config = {'name': self.__class__.__name__,\n", 183 | " 'output_dim': self.output_dim,\n", 184 | " 'init': 
self.init.__name__,\n", 185 | " 'activation': self.activation.__name__,\n", 186 | " 'reconstruction_activation': self.reconstruction_activation.__name__,\n", 187 | " 'W_regularizer': self.W_regularizer.get_config() if self.W_regularizer else None,\n", 188 | " 'activity_regularizer': self.activity_regularizer.get_config() if self.activity_regularizer else None,\n", 189 | " 'W_constraint': self.W_constraint.get_config() if self.W_constraint else None,\n", 190 | " 'input_dim': self.input_dim}\n", 191 | " base_config = super(SymmetricAutoencoder, self).get_config()\n", 192 | " return dict(list(base_config.items()) + list(config.items()))" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 4, 198 | "metadata": { 199 | "collapsed": false 200 | }, 201 | "outputs": [], 202 | "source": [ 203 | "encoder = Sequential()\n", 204 | "encoder.add(Embedding(\n", 205 | " input_dim = term_matrix.shape[0], \n", 206 | " output_dim = 300,\n", 207 | " weights = [term_matrix], \n", 208 | " trainable = False,\n", 209 | " input_length = 1)\n", 210 | " )\n", 211 | "encoder.add(Flatten())\n", 212 | "encoder.add(SymmetricAutoencoder(\n", 213 | " activation = 'sigmoid',\n", 214 | " reconstruction_activation = 'linear',\n", 215 | " output_dim=40\n", 216 | " ))\n", 217 | "inputs = numpy.reshape(numpy.arange(term_matrix.shape[0]), (term_matrix.shape[0], 1))\n", 218 | "outputs = term_matrix" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": { 225 | "collapsed": false 226 | }, 227 | "outputs": [ 228 | { 229 | "name": "stdout", 230 | "output_type": "stream", 231 | "text": [ 232 | "Epoch 1/1000\n", 233 | "28956/28956 [==============================] - 36s - loss: 0.0145 \n", 234 | "Epoch 2/1000\n", 235 | "28956/28956 [==============================] - 35s - loss: 0.0115 \n", 236 | "Epoch 3/1000\n", 237 | "28956/28956 [==============================] - 34s - loss: 0.0111 \n", 238 | "Epoch 4/1000\n", 239 | "28956/28956 [==============================] - 45s - loss: 0.0110 \n", 240 | "Epoch 5/1000\n", 241 | "28956/28956 [==============================] - 42s - loss: 0.0109 \n", 242 | "Epoch 6/1000\n", 243 | "28956/28956 [==============================] - 35s - loss: 0.0109 \n", 244 | "Epoch 7/1000\n", 245 | "28956/28956 [==============================] - 36s - loss: 0.0109 \n", 246 | "Epoch 8/1000\n", 247 | "28956/28956 [==============================] - 44s - loss: 0.0108 \n", 248 | "Epoch 9/1000\n", 249 | "28956/28956 [==============================] - 37s - loss: 0.0108 \n", 250 | "Epoch 10/1000\n", 251 | "28956/28956 [==============================] - 37s - loss: 0.0108 \n", 252 | "Epoch 11/1000\n", 253 | "28956/28956 [==============================] - 37s - loss: 0.0108 \n", 254 | "Epoch 12/1000\n", 255 | "28956/28956 [==============================] - 35s - loss: 0.0108 \n", 256 | "Epoch 13/1000\n", 257 | "28956/28956 [==============================] - 42s - loss: 0.0108 \n", 258 | "Epoch 14/1000\n", 259 | "28956/28956 [==============================] - 43s - loss: 0.0108 \n", 260 | "Epoch 15/1000\n", 261 | "28956/28956 [==============================] - 39s - loss: 0.0108 \n", 262 | "Epoch 16/1000\n", 263 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 264 | "Epoch 17/1000\n", 265 | "28956/28956 [==============================] - 39s - loss: 0.0107 \n", 266 | "Epoch 18/1000\n", 267 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 268 | "Epoch 19/1000\n", 269 | "28956/28956 
[==============================] - 45s - loss: 0.0107 \n", 270 | "Epoch 20/1000\n", 271 | "28956/28956 [==============================] - 39s - loss: 0.0107 \n", 272 | "Epoch 21/1000\n", 273 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 274 | "Epoch 22/1000\n", 275 | "28956/28956 [==============================] - 39s - loss: 0.0107 \n", 276 | "Epoch 23/1000\n", 277 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 278 | "Epoch 24/1000\n", 279 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 280 | "Epoch 25/1000\n", 281 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 282 | "Epoch 26/1000\n", 283 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 284 | "Epoch 27/1000\n", 285 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 286 | "Epoch 28/1000\n", 287 | "28956/28956 [==============================] - 47s - loss: 0.0107 \n", 288 | "Epoch 29/1000\n", 289 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 290 | "Epoch 30/1000\n", 291 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 292 | "Epoch 31/1000\n", 293 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 294 | "Epoch 32/1000\n", 295 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 296 | "Epoch 33/1000\n", 297 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 298 | "Epoch 34/1000\n", 299 | "28956/28956 [==============================] - 41s - loss: 0.0107 \n", 300 | "Epoch 35/1000\n", 301 | "28956/28956 [==============================] - 47s - loss: 0.0107 \n", 302 | "Epoch 36/1000\n", 303 | "28956/28956 [==============================] - 41s - loss: 0.0107 \n", 304 | "Epoch 37/1000\n", 305 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 306 | "Epoch 38/1000\n", 307 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 308 | "Epoch 39/1000\n", 309 | "28956/28956 [==============================] - 44s - loss: 0.0107 \n", 310 | "Epoch 40/1000\n", 311 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 312 | "Epoch 41/1000\n", 313 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 314 | "Epoch 42/1000\n", 315 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 316 | "Epoch 90/1000\n", 317 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 318 | "Epoch 93/1000\n", 319 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 320 | "Epoch 98/1000\n", 321 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 322 | "Epoch 101/1000\n", 323 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 324 | "Epoch 106/1000\n", 325 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 326 | "Epoch 109/1000\n", 327 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 328 | "Epoch 112/1000\n", 329 | "28956/28956 [==============================] - 40s - loss: 0.0107 \n", 330 | "Epoch 117/1000\n", 331 | "28956/28956 [==============================] - 40s - loss: 0.0107 \n", 332 | "Epoch 122/1000\n", 333 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 334 | "Epoch 127/1000\n", 335 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 336 | "Epoch 130/1000\n", 337 | "28956/28956 [==============================] - 35s - 
loss: 0.0107 \n", 338 | "Epoch 135/1000\n", 339 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 340 | "Epoch 138/1000\n", 341 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 342 | "Epoch 141/1000\n", 343 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 344 | "Epoch 149/1000\n", 345 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 346 | "Epoch 152/1000\n", 347 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 348 | "Epoch 157/1000\n", 349 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 350 | "Epoch 160/1000\n", 351 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 352 | "Epoch 163/1000\n", 353 | "28956/28956 [==============================] - 42s - loss: 0.0107 \n", 354 | "Epoch 170/1000\n", 355 | "28956/28956 [==============================] - 39s - loss: 0.0107 \n", 356 | "Epoch 174/1000\n", 357 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 358 | "Epoch 179/1000\n", 359 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 360 | "Epoch 182/1000\n", 361 | "28956/28956 [==============================] - 39s - loss: 0.0107 \n", 362 | "Epoch 187/1000\n", 363 | "28956/28956 [==============================] - 42s - loss: 0.0107 \n", 364 | "Epoch 194/1000\n", 365 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 366 | "Epoch 199/1000\n", 367 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 368 | "Epoch 202/1000\n", 369 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 370 | "Epoch 207/1000\n", 371 | "28956/28956 [==============================] - 36s - loss: 0.0107 \n", 372 | "Epoch 212/1000\n", 373 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 374 | "Epoch 217/1000\n", 375 | "28956/28956 [==============================] - 37s - loss: 0.0107 \n", 376 | "Epoch 222/1000\n", 377 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 378 | "Epoch 227/1000\n", 379 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 380 | "Epoch 232/1000\n", 381 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 382 | "Epoch 235/1000\n", 383 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 384 | "Epoch 238/1000\n", 385 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 386 | "Epoch 243/1000\n", 387 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 388 | "Epoch 246/1000\n", 389 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 390 | "Epoch 257/1000\n", 391 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 392 | "Epoch 260/1000\n", 393 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 394 | "Epoch 265/1000\n", 395 | "28956/28956 [==============================] - 36s - loss: 0.0107 \n", 396 | "Epoch 268/1000\n", 397 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 398 | "Epoch 276/1000\n", 399 | "28956/28956 [==============================] - 35s - loss: 0.0107 \n", 400 | "Epoch 284/1000\n", 401 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 402 | "Epoch 289/1000\n", 403 | "28956/28956 [==============================] - 34s - loss: 0.0107 \n", 404 | "Epoch 300/1000\n", 405 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 
406 | "Epoch 303/1000\n", 407 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 408 | "Epoch 306/1000\n", 409 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 410 | "Epoch 309/1000\n", 411 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 412 | "Epoch 312/1000\n", 413 | "28956/28956 [==============================] - 31s - loss: 0.0107 \n", 414 | "Epoch 315/1000\n", 415 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 416 | "Epoch 318/1000\n", 417 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 418 | "Epoch 321/1000\n", 419 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 420 | "Epoch 324/1000\n", 421 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 422 | "Epoch 327/1000\n", 423 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 424 | "Epoch 330/1000\n", 425 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 426 | "Epoch 356/1000\n", 427 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 428 | "Epoch 359/1000\n", 429 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 430 | "Epoch 362/1000\n", 431 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 432 | "Epoch 365/1000\n", 433 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 434 | "Epoch 368/1000\n", 435 | "28956/28956 [==============================] - 32s - loss: 0.0107 \n", 436 | "Epoch 371/1000\n", 437 | "28956/28956 [==============================] - 33s - loss: 0.0107 \n", 438 | "Epoch 374/1000\n", 439 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 440 | "Epoch 397/1000\n", 441 | "28956/28956 [==============================] - 36s - loss: 0.0106 \n", 442 | "Epoch 408/1000\n", 443 | "28956/28956 [==============================] - 36s - loss: 0.0107 \n", 444 | "Epoch 413/1000\n", 445 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 446 | "Epoch 454/1000\n", 447 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 448 | "Epoch 457/1000\n", 449 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 450 | "Epoch 460/1000\n", 451 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 452 | "Epoch 463/1000\n", 453 | "28956/28956 [==============================] - 34s - loss: 0.0106 \n", 454 | "Epoch 504/1000\n", 455 | "28956/28956 [==============================] - 38s - loss: 0.0107 \n", 456 | "Epoch 526/1000\n", 457 | "28956/28956 [==============================] - 56s - loss: 0.0106 \n", 458 | "Epoch 530/1000\n", 459 | "28956/28956 [==============================] - 46s - loss: 0.0106 \n", 460 | "Epoch 537/1000\n", 461 | "28956/28956 [==============================] - 33s - loss: 0.0106 \n", 462 | "Epoch 550/1000\n", 463 | "28956/28956 [==============================] - 43s - loss: 0.0106 \n", 464 | "Epoch 555/1000\n", 465 | "28956/28956 [==============================] - 40s - loss: 0.0106 \n", 466 | "Epoch 562/1000\n", 467 | "28956/28956 [==============================] - 41s - loss: 0.0106 \n", 468 | "Epoch 564/1000\n", 469 | "28956/28956 [==============================] - 38s - loss: 0.0106 \n", 470 | "Epoch 569/1000\n", 471 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 472 | "Epoch 574/1000\n", 473 | "28956/28956 [==============================] - 48s - loss: 0.0106 \n", 474 | "Epoch 
584/1000\n", 475 | "28956/28956 [==============================] - 44s - loss: 0.0106 \n", 476 | "Epoch 586/1000\n", 477 | "28956/28956 [==============================] - 50s - loss: 0.0106 \n", 478 | "Epoch 588/1000\n", 479 | "28956/28956 [==============================] - 57s - loss: 0.0106 \n", 480 | "Epoch 592/1000\n", 481 | "28956/28956 [==============================] - 136s - loss: 0.0106 \n", 482 | "Epoch 594/1000\n", 483 | "28956/28956 [==============================] - 114s - loss: 0.0106 \n", 484 | "Epoch 597/1000\n", 485 | "28956/28956 [==============================] - 100s - loss: 0.0106 \n", 486 | "Epoch 601/1000\n", 487 | "28956/28956 [==============================] - 37s - loss: 0.0106 \n", 488 | "Epoch 603/1000\n", 489 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 490 | "Epoch 617/1000\n", 491 | "28956/28956 [==============================] - 40s - loss: 0.0106 \n", 492 | "Epoch 622/1000\n", 493 | "28956/28956 [==============================] - 43s - loss: 0.0106 \n", 494 | "Epoch 630/1000\n", 495 | "28956/28956 [==============================] - 42s - loss: 0.0106 \n", 496 | "Epoch 640/1000\n", 497 | "28956/28956 [==============================] - 37s - loss: 0.0106 \n", 498 | "Epoch 645/1000\n", 499 | "28956/28956 [==============================] - 43s - loss: 0.0106 \n", 500 | "Epoch 650/1000\n", 501 | "28956/28956 [==============================] - 38s - loss: 0.0106 \n", 502 | "Epoch 655/1000\n", 503 | "28956/28956 [==============================] - 38s - loss: 0.0106 \n", 504 | "Epoch 660/1000\n", 505 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 506 | "Epoch 668/1000\n", 507 | "28956/28956 [==============================] - 37s - loss: 0.0106 \n", 508 | "Epoch 683/1000\n", 509 | "28956/28956 [==============================] - 37s - loss: 0.0106 \n", 510 | "Epoch 688/1000\n", 511 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 512 | "Epoch 699/1000\n", 513 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 514 | "Epoch 715/1000\n", 515 | "28956/28956 [==============================] - 35s - loss: 0.0106 \n", 516 | "Epoch 718/1000\n", 517 | "28956/28956 [==============================] - 37s - loss: 0.0106 \n", 518 | "Epoch 726/1000\n", 519 | "28956/28956 [==============================] - 33s - loss: 0.0106 \n", 520 | "Epoch 740/1000\n", 521 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 522 | "Epoch 751/1000\n", 523 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 524 | "Epoch 763/1000\n", 525 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 526 | "Epoch 766/1000\n", 527 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 528 | "Epoch 769/1000\n", 529 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 530 | "Epoch 772/1000\n", 531 | "28956/28956 [==============================] - 34s - loss: 0.0106 \n", 532 | "Epoch 775/1000\n", 533 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 534 | "Epoch 807/1000\n", 535 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 536 | "Epoch 810/1000\n", 537 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 538 | "Epoch 813/1000\n", 539 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 540 | "Epoch 816/1000\n", 541 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 542 | "Epoch 822/1000\n", 
543 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 544 | "Epoch 859/1000\n", 545 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 546 | "Epoch 869/1000\n", 547 | "28956/28956 [==============================] - 32s - loss: 0.0106 \n", 548 | "Epoch 872/1000\n", 549 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 550 | "Epoch 875/1000\n", 551 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 552 | "Epoch 916/1000\n", 553 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 554 | "Epoch 954/1000\n", 555 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 556 | "Epoch 957/1000\n", 557 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 558 | "Epoch 960/1000\n", 559 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 560 | "Epoch 986/1000\n", 561 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 562 | "Epoch 989/1000\n", 563 | "28956/28956 [==============================] - 31s - loss: 0.0106 \n", 564 | "Epoch 992/1000\n", 565 | " 8077/28956 [=======>......................] - ETA: 26s - loss: 0.0105" 566 | ] 567 | } 568 | ], 569 | "source": [ 570 | "encoder.compile(loss = 'mse', optimizer = 'Adadelta')\n", 571 | "\n", 572 | "history = encoder.fit(inputs, outputs, nb_epoch = 1000, batch_size = 1)" 573 | ] 574 | }, 575 | { 576 | "cell_type": "code", 577 | "execution_count": 6, 578 | "metadata": { 579 | "collapsed": true 580 | }, 581 | "outputs": [], 582 | "source": [ 583 | "encoder.save_weights(\"W1_pretrain_40.hdf5\")\n", 584 | "#encoder.load_weights(\"W1_pretrain_Adam_1000_loss_0037.hdf5\")" 585 | ] 586 | }, 587 | { 588 | "cell_type": "code", 589 | "execution_count": 31, 590 | "metadata": { 591 | "collapsed": false 592 | }, 593 | "outputs": [ 594 | { 595 | "data": { 596 | "text/plain": [ 597 | "(28956, 300)" 598 | ] 599 | }, 600 | "execution_count": 31, 601 | "metadata": {}, 602 | "output_type": "execute_result" 603 | } 604 | ], 605 | "source": [ 606 | "encoder.output_reconstruction = False\n", 607 | "encoder.pretrain = False\n", 608 | "activations = encoder.predict(inputs, batch_size = 15000)\n", 609 | "#savemat(\"./t1_ntm_pretrain.mat\", { 'activations' : activations,\n", 610 | "# 'W2' : encoder.get_weights()[1]})\n", 611 | "activations.shape\n", 612 | "import h5py\n", 613 | "h5f = h5py.File(\"activations.hdf5\")\n", 614 | "h5f.create_dataset('activations', data = activations)\n", 615 | "h5f.close()" 616 | ] 617 | }, 618 | { 619 | "cell_type": "code", 620 | "execution_count": 23, 621 | "metadata": { 622 | "collapsed": false 623 | }, 624 | "outputs": [ 625 | { 626 | "data": { 627 | "text/plain": [ 628 | "(300, 40)" 629 | ] 630 | }, 631 | "execution_count": 23, 632 | "metadata": {}, 633 | "output_type": "execute_result" 634 | } 635 | ], 636 | "source": [ 637 | "#get initial weights for W2 from the autoencoder\n", 638 | "#pretrained_W2 = encoder.get_weights()[1]\n", 639 | "#pretrained_W2 = loadmat(to_path + \"t1_ntm_pretrain.mat\", variable_names = \"W2\").get(\"W2\").astype(\"float32\")\n", 640 | "h5w2 = h5py.File('W1_pretrain_40.hdf5', 'r')\n", 641 | "h5w2['/layer_2'].items()\n", 642 | "pretrained_W2 = h5w2['layer_2/param_0'][:]\n", 643 | "h5w2.close()\n", 644 | "pretrained_W2.shape" 645 | ] 646 | }, 647 | { 648 | "cell_type": "code", 649 | "execution_count": 4, 650 | "metadata": { 651 | "collapsed": false 652 | }, 653 | "outputs": [], 654 | "source": [ 655 | "#get initial weights for 
W1 that were pretrained in R based on the autoencoder activations\n", 656 | "#pretrained_W1 = loadmat(to_path + \"t1_ntm_pret.mat\", variable_names = \"w1\").get(\"w1\").astype(\"float32\") \n", 657 | "\n", 658 | "examples = loadmat(to_path + \"t1_ntm_pret.mat\", variable_names = \"examples\").get(\"examples\")\n", 659 | "# Take the multiple sets and combine them into one big super-epoch\n", 660 | "examples = numpy.vstack(tuple([examples[:,(0,1,x)] for x in range(2, examples.shape[1])]))" 661 | ] 662 | }, 663 | { 664 | "cell_type": "code", 665 | "execution_count": 5, 666 | "metadata": { 667 | "collapsed": false 668 | }, 669 | "outputs": [ 670 | { 671 | "data": { 672 | "text/plain": [ 673 | "(954905, 40)" 674 | ] 675 | }, 676 | "execution_count": 5, 677 | "metadata": {}, 678 | "output_type": "execute_result" 679 | } 680 | ], 681 | "source": [ 682 | "#pretrained_W1 = loadmat(to_path + \"t1_ntm_w1.mat\", variable_names = \"w1\").get(\"w1\").astype(\"float32\") \n", 683 | "h5w1 = h5py.File('w1_pretrain.hdf5', 'r')\n", 684 | "pretrained_W1 = numpy.transpose(h5w1['w1'][:])\n", 685 | "h5w1.close()\n", 686 | "pretrained_W1.shape" 687 | ] 688 | }, 689 | { 690 | "cell_type": "code", 691 | "execution_count": 6, 692 | "metadata": { 693 | "collapsed": false 694 | }, 695 | "outputs": [ 696 | { 697 | "data": { 698 | "text/plain": [ 699 | "(954905, 40, 28956, 1)" 700 | ] 701 | }, 702 | "execution_count": 6, 703 | "metadata": {}, 704 | "output_type": "execute_result" 705 | } 706 | ], 707 | "source": [ 708 | "(n_docs, n_topics, n_terms, n_epochs) = (pretrained_W1.shape[0], \n", 709 | " pretrained_W1.shape[1], \n", 710 | " term_matrix.shape[0], \n", 711 | " examples.shape[1] - 2)\n", 712 | "(n_docs, n_topics, n_terms, n_epochs)" 713 | ] 714 | }, 715 | { 716 | "cell_type": "code", 717 | "execution_count": 24, 718 | "metadata": { 719 | "collapsed": true 720 | }, 721 | "outputs": [], 722 | "source": [ 723 | "class DenseNoBias(Layer):\n", 724 | " '''Fully connected NN layer with no bias term.\n", 725 | " # Input shape\n", 726 | " 2D tensor with shape: `(nb_samples, input_dim)`.\n", 727 | " # Output shape\n", 728 | " 2D tensor with shape: `(nb_samples, output_dim)`.\n", 729 | " # Arguments\n", 730 | " output_dim: int > 0.\n", 731 | " init: name of initialization function for the weights of the layer\n", 732 | " (see [initializations](../initializations.md)),\n", 733 | " or alternatively, Theano function to use for weights\n", 734 | " initialization. This parameter is only relevant\n", 735 | " if you don't pass a `weights` argument.\n", 736 | " activation: name of activation function to use\n", 737 | " (see [activations](../activations.md)),\n", 738 | " or alternatively, elementwise Theano function.\n", 739 | " If you don't specify anything, no activation is applied\n", 740 | " (ie. \"linear\" activation: a(x) = x).\n", 741 | " weights: list of numpy arrays to set as initial weights.\n", 742 | " The list should have 1 element, of shape `(input_dim, output_dim)`.\n", 743 | " W_regularizer: instance of [WeightRegularizer](../regularizers.md)\n", 744 | " (eg. L1 or L2 regularization), applied to the main weights matrix.\n", 745 | " activity_regularizer: instance of [ActivityRegularizer](../regularizers.md),\n", 746 | " applied to the network output.\n", 747 | " W_constraint: instance of the [constraints](../constraints.md) module\n", 748 | " (eg. 
maxnorm, nonneg), applied to the main weights matrix.\n", 749 | " input_dim: dimensionality of the input (integer).\n", 750 | " This argument (or alternatively, the keyword argument `input_shape`)\n", 751 | " is required when using this layer as the first layer in a model.\n", 752 | " '''\n", 753 | " input_ndim = 2\n", 754 | "\n", 755 | " def __init__(self, output_dim, init='glorot_uniform', activation='linear', weights=None,\n", 756 | " W_regularizer=None, activity_regularizer=None,\n", 757 | " W_constraint=None, input_dim=None, **kwargs):\n", 758 | " self.init = initializations.get(init)\n", 759 | " self.activation = activations.get(activation)\n", 760 | " self.output_dim = output_dim\n", 761 | "\n", 762 | " self.W_regularizer = regularizers.get(W_regularizer)\n", 763 | " self.activity_regularizer = regularizers.get(activity_regularizer)\n", 764 | "\n", 765 | " self.W_constraint = constraints.get(W_constraint)\n", 766 | " self.constraints = [self.W_constraint]\n", 767 | "\n", 768 | " self.initial_weights = weights\n", 769 | "\n", 770 | " self.input_dim = input_dim\n", 771 | " if self.input_dim:\n", 772 | " kwargs['input_shape'] = (self.input_dim,)\n", 773 | " self.input = K.placeholder(ndim=2)\n", 774 | " super(DenseNoBias, self).__init__(**kwargs)\n", 775 | "\n", 776 | " def build(self):\n", 777 | " input_dim = self.input_shape[1]\n", 778 | "\n", 779 | " self.W = self.init((input_dim, self.output_dim))\n", 780 | "\n", 781 | " self.params = [self.W]\n", 782 | "\n", 783 | " self.regularizers = []\n", 784 | " if self.W_regularizer:\n", 785 | " self.W_regularizer.set_param(self.W)\n", 786 | " self.regularizers.append(self.W_regularizer)\n", 787 | "\n", 788 | " if self.activity_regularizer:\n", 789 | " self.activity_regularizer.set_layer(self)\n", 790 | " self.regularizers.append(self.activity_regularizer)\n", 791 | "\n", 792 | " if self.initial_weights is not None:\n", 793 | " self.set_weights(self.initial_weights)\n", 794 | " del self.initial_weights\n", 795 | "\n", 796 | " @property\n", 797 | " def output_shape(self):\n", 798 | " return (self.input_shape[0], self.output_dim)\n", 799 | "\n", 800 | " def get_output(self, train=False):\n", 801 | " X = self.get_input(train)\n", 802 | " output = self.activation(K.dot(X, self.W))\n", 803 | " return output\n", 804 | "\n", 805 | " def get_config(self):\n", 806 | " config = {'name': self.__class__.__name__,\n", 807 | " 'output_dim': self.output_dim,\n", 808 | " 'init': self.init.__name__,\n", 809 | " 'activation': self.activation.__name__,\n", 810 | " 'W_regularizer': self.W_regularizer.get_config() if self.W_regularizer else None,\n", 811 | " 'activity_regularizer': self.activity_regularizer.get_config() if self.activity_regularizer else None,\n", 812 | " 'W_constraint': self.W_constraint.get_config() if self.W_constraint else None,\n", 813 | " 'input_dim': self.input_dim}\n", 814 | " base_config = super(DenseNoBias, self).get_config()\n", 815 | " return dict(list(base_config.items()) + list(config.items()))" 816 | ] 817 | }, 818 | { 819 | "cell_type": "code", 820 | "execution_count": 25, 821 | "metadata": { 822 | "collapsed": false, 823 | "scrolled": true 824 | }, 825 | "outputs": [], 826 | "source": [ 827 | "# Build the actual training model\n", 828 | "\n", 829 | "def build_ntm(term_matrix = term_matrix, \n", 830 | " pre_W1 = pretrained_W1, \n", 831 | " pre_W2 = pretrained_W2, \n", 832 | " W2_l2 = 0.001\n", 833 | " ):\n", 834 | " \n", 835 | " n_docs = pretrained_W1.shape[0]\n", 836 | " n_topics = pretrained_W1.shape[1]\n", 837 | " n_terms = 
term_matrix.shape[0]\n", 838 | " \n", 839 | " ntm = Graph()\n", 840 | " \n", 841 | " ntm.add_input(name = \"g\", input_shape = (1,), dtype = \"int\")\n", 842 | " ntm.add_node(Embedding(input_dim = n_terms, \n", 843 | " output_dim = 300,\n", 844 | " weights = [term_matrix], \n", 845 | " trainable = False,\n", 846 | " input_length = 1), \n", 847 | " name = \"le\", input = \"g\")\n", 848 | " ntm.add_node(Flatten(), input = \"le\", name = \"le_\")\n", 849 | " ntm.add_node(DenseNoBias(n_topics, activation = \"sigmoid\", \n", 850 | " weights = [pre_W2], \n", 851 | " W_regularizer = l2(W2_l2)\n", 852 | " ),\n", 853 | " name = \"lt\", input = \"le_\")\n", 854 | " \n", 855 | " ntm.add_input(name = \"d_pos\", input_shape = (1,), dtype = \"int\")\n", 856 | " ntm.add_input(name = \"d_neg\", input_shape = (1,), dtype = \"int\")\n", 857 | " ntm.add_shared_node(Embedding(input_dim = n_docs, \n", 858 | " output_dim = n_topics, \n", 859 | " weights = [pre_W1], \n", 860 | " input_length = 1),\n", 861 | " name = \"topicmatrix\",\n", 862 | " inputs = [\"d_pos\", \"d_neg\"], \n", 863 | " outputs = [\"wd_pos\", \"wd_neg\"],\n", 864 | " merge_mode = None)\n", 865 | " ntm.add_node(Flatten(), name = \"wd_pos_\", input = \"wd_pos\")\n", 866 | " ntm.add_node(Flatten(), name = \"wd_neg_\", input = \"wd_neg\")\n", 867 | " ntm.add_node(Activation(\"softmax\"), name = \"ld_pos\", input = \"wd_pos_\")\n", 868 | " ntm.add_node(Activation(\"softmax\"), name = \"ld_neg\", input = \"wd_neg_\")\n", 869 | " \n", 870 | " ntm.add_node(Layer(),\n", 871 | " name = \"ls_pos\", \n", 872 | " inputs = [\"lt\", \"ld_pos\"], \n", 873 | " merge_mode = 'dot', dot_axes = -1)# , create_output = True)\n", 874 | " ntm.add_node(Layer(), \n", 875 | " name = \"ls_neg\", \n", 876 | " inputs = [\"lt\", \"ld_neg\"], \n", 877 | " merge_mode = 'dot', dot_axes = -1)#, create_output = True)\n", 878 | " return ntm\n", 879 | "\n", 880 | "def add_fine_tuning(ntm = None):\n", 881 | " import theano.tensor as T\n", 882 | " def output_shape(input_shape):\n", 883 | " return (None, 1)\n", 884 | " \n", 885 | " def sub_merge(layers):\n", 886 | " import theano.tensor as T\n", 887 | "# ls_pos = T.dot(layers[0], layers[1].T)\n", 888 | "# ls_neg = T.dot(layers[0], layers[2].T)\n", 889 | " ls_pos = layers[0]\n", 890 | " ls_neg = layers[1]\n", 891 | " #less = #T.mul(40000000,T.add(ls_neg, ls_pos))\n", 892 | " less = T.sub(ls_neg, ls_pos)\n", 893 | " return T.add(0.5, less)\n", 894 | "\n", 895 | " #def sumLam(x):\n", 896 | " # return (0.5 + (x[1] - x[0]))\n", 897 | "\n", 898 | " summer = LambdaMerge(layers = [ntm.nodes[\"ls_pos\"], \n", 899 | " ntm.nodes[\"ls_neg\"]], \n", 900 | " function = sub_merge,\n", 901 | " output_shape = output_shape)\n", 902 | " ntm.add_node(summer, inputs = [\"ls_pos\", \"ls_neg\"], name = \"summed\", create_output = True)\n", 903 | "\n", 904 | " return ntm\n", 905 | "\n", 906 | "\n", 907 | "#SVG(to_graph(ntm).create(prog='dot', format='svg'))" 908 | ] 909 | }, 910 | { 911 | "cell_type": "code", 912 | "execution_count": null, 913 | "metadata": { 914 | "collapsed": false, 915 | "scrolled": true 916 | }, 917 | "outputs": [ 918 | { 919 | "name": "stdout", 920 | "output_type": "stream", 921 | "text": [ 922 | "Train on 45794004 samples, validate on 934572 samples\n", 923 | "Epoch 1/20\n", 924 | " 330000/45794004 [..............................] 
- ETA: 10931s - loss: 0.6916" 925 | ] 926 | } 927 | ], 928 | "source": [ 929 | "# Fine-tuning\n", 930 | "ntm = build_ntm(W2_l2 = 0.001)\n", 931 | "ntm = add_fine_tuning(ntm)\n", 932 | "\n", 933 | "#def rawloss(x_train, x_test):\n", 934 | "# return x_train * x_test\n", 935 | "def maxloss(y_true, y_predict):\n", 936 | " return K.maximum(y_true,y_predict)\n", 937 | "# return T.maximum(0., T.mul(y_true,y_predict ))\n", 938 | "\n", 939 | "#ntm.load_weights(\"cpw4_starte0_batch10000_sgd001_e_01_0.499998.hdf5\")\n", 940 | "\n", 941 | "ntm.compile(loss = {'summed' : maxloss#, \n", 942 | " # 'ls_pos' : 'binary_crossentropy', \n", 943 | " # 'ls_neg' : 'binary_crossentropy'\n", 944 | " },\n", 945 | " optimizer = SGD(lr = 0.01))\n", 946 | "\n", 947 | "checkpointer = ModelCheckpoint(filepath=\"./cpw5_starte0_batch10000_sgd001_e_{epoch:02d}_{val_loss:.6f}.hdf5\", \n", 948 | " monitor = 'val_loss', verbose = 1, save_best_only=False)\n", 949 | "\n", 950 | "train_shape = (examples.shape[0], 1)\n", 951 | "trainer = examples \n", 952 | " \n", 953 | "historylog = ntm.fit(data = {\n", 954 | " \"g\" : numpy.reshape(trainer[:,1], train_shape), \n", 955 | " \"d_pos\" : numpy.reshape(trainer[:,0], train_shape), \n", 956 | " \"d_neg\" : numpy.reshape(trainer[:,2], train_shape),\n", 957 | " \"summed\" : numpy.reshape(numpy.zeros(trainer.shape[0], dtype = theano.config.floatX),\n", 958 | " train_shape)#, \n", 959 | "# \"ls_pos\" : numpy.reshape(numpy.ones(trainer.shape[0], dtype = theano.config.floatX),\n", 960 | "# train_shape),\n", 961 | "# \"ls_neg\" : numpy.reshape(numpy.zeros(trainer.shape[0], dtype = theano.config.floatX),\n", 962 | "# train_shape)\n", 963 | " }, callbacks = [checkpointer],\n", 964 | " validation_split = 0.02,\n", 965 | " nb_epoch = 20, \n", 966 | " batch_size = 10000)" 967 | ] 968 | }, 969 | { 970 | "cell_type": "code", 971 | "execution_count": null, 972 | "metadata": { 973 | "collapsed": false 974 | }, 975 | "outputs": [], 976 | "source": [ 977 | "ntm.load_weights(\"cpw_new_startepoch0_00_0.5000.hdf5\")\n", 978 | "idxs = numpy.random.choice(trainer.shape[0], 200000, replace = False)\n", 979 | "tester = trainer[idxs,:]\n", 980 | "tester_shape = (tester.shape[0], 1)\n", 981 | "ntm.evaluate(data = {\n", 982 | " \"g\" : numpy.reshape(tester[:,1], tester_shape), \n", 983 | " \"d_pos\" : numpy.reshape(tester[:,0], tester_shape), \n", 984 | " \"d_neg\" : numpy.reshape(tester[:,2], tester_shape),\n", 985 | " \"loss_out\" : numpy.reshape(numpy.ones(tester.shape[0], \n", 986 | " dtype = theano.config.floatX), tester_shape)\n", 987 | " }, batch_size = 20000)" 988 | ] 989 | }, 990 | { 991 | "cell_type": "code", 992 | "execution_count": null, 993 | "metadata": { 994 | "collapsed": false 995 | }, 996 | "outputs": [], 997 | "source": [ 998 | "0.49998338818550109,0.49998418092727659,0.49998493790626525,0.49998548328876496,0.4999860256910324,0.49998660981655119,0.49998704195022581,0.49998756051063536" 999 | ] 1000 | }, 1001 | { 1002 | "cell_type": "code", 1003 | "execution_count": null, 1004 | "metadata": { 1005 | "collapsed": false 1006 | }, 1007 | "outputs": [], 1008 | "source": [ 1009 | "[(x, type(ntm.nodes[x]), ntm.nodes[x].output_shape) for x in ntm.nodes]" 1010 | ] 1011 | }, 1012 | { 1013 | "cell_type": "code", 1014 | "execution_count": null, 1015 | "metadata": { 1016 | "collapsed": false 1017 | }, 1018 | "outputs": [], 1019 | "source": [ 1020 | "json_string = ntm.to_json()\n", 1021 | "open('ntm_final.json', 'w').write(json_string)\n", 1022 | "ntm.save_weights(to_path + 'ntm_finalweights_.h5', 
overwrite=True)" 1023 | ] 1024 | }, 1025 | { 1026 | "cell_type": "code", 1027 | "execution_count": null, 1028 | "metadata": { 1029 | "collapsed": false 1030 | }, 1031 | "outputs": [], 1032 | "source": [ 1033 | "weights = ntm.get_weights()\n", 1034 | "(weights[0].shape, weights[1].shape, weights[2].shape, \n", 1035 | " weights[3].shape)" 1036 | ] 1037 | }, 1038 | { 1039 | "cell_type": "code", 1040 | "execution_count": null, 1041 | "metadata": { 1042 | "collapsed": false 1043 | }, 1044 | "outputs": [], 1045 | "source": [ 1046 | "w = ntm.nodes[\"lt\"].get_weights()\n", 1047 | "(w[0].shape, w[1].shape)" 1048 | ] 1049 | }, 1050 | { 1051 | "cell_type": "code", 1052 | "execution_count": null, 1053 | "metadata": { 1054 | "collapsed": false 1055 | }, 1056 | "outputs": [], 1057 | "source": [ 1058 | "softies = weights[0][100000,:]\n", 1059 | "numpy.exp(softies)/numpy.sum(numpy.exp(softies))" 1060 | ] 1061 | }, 1062 | { 1063 | "cell_type": "code", 1064 | "execution_count": null, 1065 | "metadata": { 1066 | "collapsed": false 1067 | }, 1068 | "outputs": [], 1069 | "source": [ 1070 | "1 / (1 + numpy.exp( - weights[2][100,:]))" 1071 | ] 1072 | }, 1073 | { 1074 | "cell_type": "code", 1075 | "execution_count": null, 1076 | "metadata": { 1077 | "collapsed": true 1078 | }, 1079 | "outputs": [], 1080 | "source": [ 1081 | "# sNTM\n", 1082 | "def rawloss(x_train, x_test):\n", 1083 | " return x_train * x_test\n", 1084 | "n_categories = 3\n", 1085 | "ntm.add_node(Dense(n_categories, activation = \"sigmoid\"), input = \"ld_pos\", name = \"ll\")\n", 1086 | "ntm.add_output(name = \"label\", input = \"ll\")\n", 1087 | "ntm.compile(loss = {'loss_out' : threshold,\n", 1088 | " 'label' : 'categorical_crossentropy'}, \n", 1089 | " optimizer = \"Adadelta\")\n", 1090 | "\n", 1091 | "checkpointer = ModelCheckpoint(filepath=\"./cpw_new_smallbatch_sgd001_epoch_{epoch:02d}_{val_loss:.5f}.hdf5\", \n", 1092 | " monitor = 'val_loss', verbose = 1, save_best_only=False)\n", 1093 | "\n", 1094 | "train_shape = (examples.shape[0], 1)\n", 1095 | "trainer = examples \n", 1096 | " \n", 1097 | "historylog = ntm.fit(data = {\n", 1098 | " \"g\" : numpy.reshape(trainer[:,1], train_shape), \n", 1099 | " \"d_pos\" : numpy.reshape(trainer[:,0], train_shape), \n", 1100 | " \"d_neg\" : numpy.reshape(trainer[:,2], train_shape),\n", 1101 | " \"loss_out\" : numpy.reshape(numpy.ones(trainer.shape[0], \n", 1102 | " dtype = theano.config.floatX), train_shape)\n", 1103 | " # Need to add something here for the labels\n", 1104 | " }, callbacks = [checkpointer],\n", 1105 | " validation_split = 0.02,\n", 1106 | " nb_epoch = 20, \n", 1107 | " batch_size = 10)" 1108 | ] 1109 | } 1110 | ], 1111 | "metadata": { 1112 | "kernelspec": { 1113 | "display_name": "Python 2", 1114 | "language": "python", 1115 | "name": "python2" 1116 | }, 1117 | "language_info": { 1118 | "codemirror_mode": { 1119 | "name": "ipython", 1120 | "version": 2 1121 | }, 1122 | "file_extension": ".py", 1123 | "mimetype": "text/x-python", 1124 | "name": "python", 1125 | "nbconvert_exporter": "python", 1126 | "pygments_lexer": "ipython2", 1127 | "version": "2.7.11" 1128 | } 1129 | }, 1130 | "nbformat": 4, 1131 | "nbformat_minor": 0 1132 | } 1133 | --------------------------------------------------------------------------------
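
A note on what the fine-tuning graph above actually computes: the `summed` node outputs 0.5 + (`ls-` - `ls+`), the target fed to `fit` for that output is all zeros, and the custom objective `maxloss(y_true, y_pred) = K.maximum(y_true, y_pred)` therefore evaluates to max(0, 0.5 + `ls-` - `ls+`), i.e. the margin cost. The sketch below restates that forward pass and cost in plain NumPy; the names and shapes here (`hinge_cost`, the toy matrices) are illustrative stand-ins, not code taken from the notebook.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def hinge_cost(term_vec, W2, W1, d_pos, d_neg, margin=0.5):
    """max(0, margin - ls_pos + ls_neg), as produced by the `summed` node plus `maxloss`."""
    lt = sigmoid(term_vec @ W2)   # term-topic activation (the `lt` node)
    ld_pos = softmax(W1[d_pos])   # topic distribution of the positive document (`ld_pos`)
    ld_neg = softmax(W1[d_neg])   # topic distribution of the negative document (`ld_neg`)
    ls_pos = lt @ ld_pos          # dot-merge score for the positive document (`ls_pos`)
    ls_neg = lt @ ld_neg          # dot-merge score for the negative document (`ls_neg`)
    return max(0.0, margin + ls_neg - ls_pos)

# Toy example: 5 terms with 300-dim summed word2vec embeddings, 4 topics, 3 documents.
rng = np.random.RandomState(0)
toy_terms = rng.randn(5, 300).astype("float32")  # stands in for term_matrix
toy_W2 = rng.randn(300, 4).astype("float32")     # stands in for pretrained_W2
toy_W1 = rng.randn(3, 4).astype("float32")       # stands in for pretrained_W1
print(hinge_cost(toy_terms[2], toy_W2, toy_W1, d_pos=0, d_neg=1))
```

Because the zero target makes `y_true` vanish inside `K.maximum`, the loss reported by `fit` during fine-tuning is exactly this clamped margin averaged over the batch, which is why it starts near 0.5 and moves only by the size of `ls+` - `ls-`.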
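
The evaluation losses recorded a few cells above plateau around 0.49998, so almost all of the reported loss is the 0.5 constant added in the `summed` node and the score difference contributes only on the order of 1e-5. A quick, purely illustrative float32 check (not code from the notebook): any per-example score difference smaller than roughly 3e-8 is rounded away entirely once it is added onto 0.5.

```python
import numpy as np

# Spacing between adjacent float32 values at 0.5 is ~6e-8, so increments below
# about half that are lost when accumulated onto the 0.5 margin term.
print(np.finfo(np.float32).eps * 0.5)                          # ~5.96e-08
print(np.float32(0.5) + np.float32(1e-8) == np.float32(0.5))   # True
```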