├── .gitignore ├── DL.md ├── DL ├── .gitignore ├── DL │ ├── __init__.py │ ├── datasets │ │ └── __init__.py │ ├── models │ │ ├── DBN.py │ │ ├── EmbeddingLayer.py │ │ ├── ForwardFeed.py │ │ ├── HiddenLayer.py │ │ ├── LSTM.py │ │ ├── MLP.py │ │ ├── RNN.py │ │ └── __init__.py │ ├── optimizers │ │ ├── __init__.py │ │ ├── adadelta.py │ │ ├── rmsprop.py │ │ └── sgd.py │ └── utils.py └── setup.py ├── README.md ├── THEANO.md ├── examples ├── dbn-mnist.py ├── lstm-imdb.py ├── mlp-mnist-adadelta.py ├── mlp-mnist-dropout.py ├── mlp-mnist-load.py ├── mlp-mnist-rmsprop.py ├── mlp-mnist-save.py ├── mlp-mnist-sgd.py ├── rnn-lag-binary.py ├── rnn-lag-real.py └── rnn-lag-softmax.py └── theano-tests ├── diag.py ├── dot.py ├── embedding-indexing.py ├── forward-feed-column-pooling.py ├── mean-pooling.py ├── random-streams-scan-clone.py ├── rnn-dropout.py └── tensor-shape-append.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store 2 | *.pyc 3 | *.pkl 4 | *.gz 5 | *.npz -------------------------------------------------------------------------------- /DL.md: -------------------------------------------------------------------------------- 1 | # Research Review 2 | 3 | Neural Networks (NNs) have been studied for decades. But it wasn't until 1986 that an efficient method for training them, called [backpropagation][1], was discovered. The idea behind it is simple and intuitive - use the chain rule to propagate error derivatives backwards through the network. 4 | 5 | A multilayer perceptron (MLP) with one hidden layer has been proven to be a [universal approximator][2]. That means an MLP can represent any arbitrary function if the hidden layer has enough units. However, it is extremely challenging to learn a set of parameters that generalizes well. Thus, researchers resorted to designing NN architectures that are more specific to certain problems. 6 | 7 | The first big success for NNs was the Convolutional Neural Network (CNN). It was designed to be used by the US Postal Service for [zipcode recognition][3]. CNNs are deep NNs, meaning they have more than one hidden layer. They also involve shared weights that are convolved against the previous layer. These networks have been [wildly successful for image recognition][4] by [hierarchically learning and composing low-level features into successively higher-level features][5]. 8 | 9 | 10 | Deep neural networks were still out of reach for a while: problems with the gradients. Autoencoders. Then momentum, RMSProp, etc. 11 | 12 | RNNs for NLP. Vanishing gradients. LSTM for long-term dependencies. 13 | 14 | Text Embedding. 15 | 16 | 17 | 18 | --- 19 | 20 | NNs are... They have certain properties... 21 | 22 | In 2006... autoencoders... pretraining... newer methods that don't need pretraining, but the concept is still the same. 23 | 24 | Can we bring this concept to RNNs for robotics? 25 | 26 | Generalization - NLP embedding 27 | 28 | 29 | 30 | 31 | --- 32 | 33 | "Unsupervised State Estimation using Deep Learning for Interactive Object Recognition" 34 | 35 | Outline: 36 | 37 | Deep learning: 38 | - unsupervised learning 39 | - autoencoders for pretraining 40 | - recurrent neural networks 41 | - recurrent autoencoder for ASR 42 | 43 | Goal: 44 | - unsupervised learning for state estimation 45 | - pretraining for supervised learning on the hidden state 46 | 47 | Models: 48 | - RNN for unsupervised 49 | - ARNN for unsupervised 50 | 51 | Train to predict the next observation.
Using the hidden state variables, do a supervised prediction of the die from h_t. For each action, given what we expect to see next, compute the likelihood for each die. Take the action that leads to the minimum entropy over these guesses. This is the optimal action. 52 | 53 | Likely to overfit. Use regularization and dropout. 54 | 55 | - RNN for supervised 56 | - ARNN for supervised 57 | 58 | Predict the object likelihood directly as opposed to going through this unsupervised middleman. How does the performance compare? 59 | 60 | Other Problems: 61 | 62 | Try to use this model on LiDAR SLAM to predict the room and navigate between rooms. 63 | 64 | 65 | 66 | [1]: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=JicYPdAAAAAJ&citation_for_view=JicYPdAAAAAJ:GFxP56DSvIMC 67 | [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.2647&rep=rep1&type=pdf 68 | [3]: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=WLN3QrAAAAAJ&citation_for_view=WLN3QrAAAAAJ:u-x6o8ySG0sC 69 | [4]: http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf 70 | [5]: http://ftp.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf 71 | 72 | [topo]: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/ 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | How could we use embedding to learn the 3D generalization of actions? 99 | You can't quite do it in this case because all the labels are entirely interchangeable based on a new experience. So what if, for every experience, we check whether we can predict correctly, and otherwise we train up a new RNN? Can we prove that this can generalize 3D geometries? Use embedding to give EVERY example its own unique set of observables and use embedding to get that down to a reasonable dimension! 100 | 101 | For one trial we have a 3 by 6 matrix to project into 3D. We run it through the USE. The predictions use the projection matrix transposed. Thus we have an "internal" model that always stays the same between trials. And for each trial, we need to learn its own projections into the internal model. Thus we fit experiences to the internal "notion" of reality. 102 | 103 | Thus we can compare experiences using these projections and learn to predict which die / experience we are closest to. My fear is that this will not learn a multimodal distribution of predictions at the beginning of a new trial. So maybe we must enforce that. 104 | 105 | Now using the 2D SLAM example, we can focus in on a different part of the problem. For 2D SLAM, the observations are always of the same nature, so we don't need to embed the observations for the reasons of the dice problem. But now we need to think about how we represent a multimodal distribution. The problem exists in both. How do we represent the fact that we may be in two places at once? I think the point of NNs is that we get this for free...
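To make the action-selection rule in the outline above concrete: pick the action whose predicted next observation leaves the lowest-entropy belief over which die we are handling. A minimal numpy sketch, assuming hypothetical `predict_next_obs` and `die_posterior` helpers that stand in for the RNN's next-observation prediction and the supervised read-out from h_t (neither exists in this repo yet):

```python
import numpy

def entropy(p):
    # Shannon entropy (in nats) of a discrete belief over the dice
    p = numpy.asarray(p, dtype=float)
    return -numpy.sum(p * numpy.log(p + 1e-12))

def choose_action(actions, h_t, predict_next_obs, die_posterior):
    # predict_next_obs(h_t, a): hypothetical RNN guess of the next observation
    # die_posterior(h_t, a, o): hypothetical supervised read-out p(die | h_t, a, o)
    best_action, best_entropy = None, numpy.inf
    for a in actions:
        expected_obs = predict_next_obs(h_t, a)        # what we expect to see next
        p_die = die_posterior(h_t, a, expected_obs)    # belief over the dice given that guess
        h = entropy(p_die)
        if h < best_entropy:                           # the most peaked belief wins
            best_action, best_entropy = a, h
    return best_action
```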
106 | 107 | TODO: -------------------------------------------------------------------------------- /DL/.gitignore: -------------------------------------------------------------------------------- 1 | *.egg-info 2 | dist/ 3 | *.pyc 4 | *.pkl 5 | *.gz -------------------------------------------------------------------------------- /DL/DL/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ccorcos/deep-learning/df5e3072077460d72d3281724cacefa97a3b2dfd/DL/DL/__init__.py -------------------------------------------------------------------------------- /DL/DL/datasets/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | import numpy 3 | import cPickle as pickle 4 | import os 5 | import urllib 6 | import gzip 7 | from ..utils import untuple 8 | 9 | datasetPath = '/'.join((__file__.split('/')[:-1]+[''])) 10 | 11 | def getDataset(name, url): 12 | name = datasetPath + name 13 | if not os.path.isfile(name): 14 | print "Retieving dataset from %s" % (url) 15 | urllib.urlretrieve(url, name) 16 | 17 | if not os.path.isfile(name): 18 | print "Cannot find dataset %s" % (name) 19 | 20 | if name[-2:] == 'gz': 21 | f = gzip.open(name, 'rb') 22 | data = pickle.load(f) 23 | f.close() 24 | return data 25 | else: 26 | f = open(name, 'rb') 27 | data = pickle.load(f) 28 | f.close() 29 | return data 30 | 31 | def mnist(): 32 | dataset = getDataset('mnist.pkl.gz', 'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz') 33 | return untuple(dataset) 34 | 35 | 36 | def imdb(validation_ratio=0.1, vocabulary_size=10000, maxlen=100): 37 | """ 38 | validation_ratio: ratio of training data set aside for validation 39 | vocabulary_size: Vocabulary size. Assuming the larger the word number, 40 | the less often it occurs. Unknown words are set to 1 41 | maxlen: Sequence longer then this get ignored 42 | """ 43 | 44 | train_set = getDataset('imdb.pkl', 'http://www.iro.umontreal.ca/~lisa/deep/data/imdb.pkl') 45 | test_set = getDataset('imdb.pkl', 'http://www.iro.umontreal.ca/~lisa/deep/data/imdb.pkl') 46 | 47 | # filter out the sequences longer than maxlen 48 | new_train_set_x = [] 49 | new_train_set_y = [] 50 | for x, y in zip(train_set[0], train_set[1]): 51 | if len(x) < maxlen: 52 | new_train_set_x.append(x) 53 | new_train_set_y.append(y) 54 | train_set = (new_train_set_x, new_train_set_y) 55 | del new_train_set_x, new_train_set_y 56 | 57 | 58 | # split training set into validation set 59 | train_set_x, train_set_y = train_set 60 | n_samples = len(train_set_x) 61 | sample_idx = numpy.random.permutation(n_samples) 62 | n_train = int(numpy.round(n_samples * (1. 
- validation_ratio))) 63 | valid_set_x = [train_set_x[s] for s in sample_idx[n_train:]] 64 | valid_set_y = [train_set_y[s] for s in sample_idx[n_train:]] 65 | train_set_x = [train_set_x[s] for s in sample_idx[:n_train]] 66 | train_set_y = [train_set_y[s] for s in sample_idx[:n_train]] 67 | train_set = (train_set_x, train_set_y) 68 | valid_set = (valid_set_x, valid_set_y) 69 | 70 | # all words outside the vocabulary are set to 1 71 | removeUnknownWords = lambda x: [[1 if word >= vocabulary_size else word for word in review] for review in x] 72 | 73 | test_set_x, test_set_y = test_set 74 | valid_set_x, valid_set_y = valid_set 75 | train_set_x, train_set_y = train_set 76 | 77 | train_set_x = removeUnknownWords(train_set_x) 78 | valid_set_x = removeUnknownWords(valid_set_x) 79 | test_set_x = removeUnknownWords(test_set_x) 80 | 81 | # sort the sequences by their length 82 | sortLength = lambda sequences: sorted(range(len(sequences)), key=lambda x: len(sequences[x])) 83 | 84 | sorted_index = sortLength(test_set_x) 85 | test_set_x = [test_set_x[i] for i in sorted_index] 86 | test_set_y = [test_set_y[i] for i in sorted_index] 87 | 88 | sorted_index = sortLength(valid_set_x) 89 | valid_set_x = [valid_set_x[i] for i in sorted_index] 90 | valid_set_y = [valid_set_y[i] for i in sorted_index] 91 | 92 | sorted_index = sortLength(train_set_x) 93 | train_set_x = [train_set_x[i] for i in sorted_index] 94 | train_set_y = [train_set_y[i] for i in sorted_index] 95 | 96 | # gather the dataset again 97 | train = (train_set_x, train_set_y) 98 | valid = (valid_set_x, valid_set_y) 99 | test = (test_set_x, test_set_y) 100 | 101 | dataset = [train, valid, test] 102 | return untuple(dataset) 103 | 104 | -------------------------------------------------------------------------------- /DL/DL/models/DBN.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | # import theano 5 | # import theano.tensor as T 6 | # import numpy 7 | from ForwardFeed import ForwardFeed 8 | from HiddenLayer import HiddenLayer 9 | from ..utils import * 10 | 11 | class DBN(object): 12 | """Deep Belief Network Class 13 | 14 | A Deep Belief network is a feedforward artificial neural network model 15 | that has many layers of hidden units and nonlinear activations. 16 | """ 17 | 18 | def __init__(self, rng, input, n_in, n_out, layer_sizes=[], dropout_rate=0, srng=None, activation='tanh', outputActivation='softmax', params=None): 19 | """Initialize the parameters for the multilayer perceptron 20 | 21 | rng: random number generator, e.g. 
numpy.random.RandomState(1234) 22 | 23 | input: theano.tensor matrix of shape (n_examples, n_in) 24 | 25 | n_in: int, dimensionality of input 26 | 27 | layer_sizes: array of ints, dimensionality of the hidden layers 28 | 29 | n_out: int, number of hidden units 30 | 31 | dropout_rate: float, if dropout_rate is non zero, then we implement a Dropout in the hidden layer 32 | 33 | activation: string, nonlinearity to be applied in the hidden layer 34 | """ 35 | 36 | ff = ForwardFeed( 37 | rng=rng, 38 | input=input, 39 | layer_sizes=[n_in] + layer_sizes, 40 | activation=activation, 41 | params=maybe(lambda: params[0]), 42 | dropout_rate=dropout_rate, 43 | srng=srng, 44 | ) 45 | 46 | outputLayer = HiddenLayer( 47 | rng=rng, 48 | input=ff.output, 49 | n_in=layer_sizes[-1], 50 | n_out=n_out, 51 | activation=outputActivation, 52 | params=maybe(lambda: params[1]) 53 | ) 54 | 55 | self.layers = [ff, outputLayer] 56 | 57 | self.params = layers_params(self.layers) 58 | self.L1 = layers_L1(self.layers) 59 | self.L2_sqr = layers_L2_sqr(self.layers) 60 | 61 | self.output = outputLayer.output 62 | -------------------------------------------------------------------------------- /DL/DL/models/EmbeddingLayer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from ..utils import * 8 | 9 | class EmbeddingLayer(object): 10 | def __init__(self, rng, input, n_in, n_out, sequenceData=True, onehot=False, params=None): 11 | # sequenceData tells us if the input dimension is (n_examples, n_timesteps) 12 | # or if it is (n_examples) 13 | # onhot tell us us if the input dimension is (n_examples, n_timesteps, n_in) or (n_examples, n_in) 14 | 15 | # the output of uniform if converted using asarray to dtype 16 | # theano.config.floatX so that the code is runable on GPU 17 | # [Xavier10] suggests that you should use 4 times larger initial 18 | # weights for sigmoid compared to tanh. 19 | W = None 20 | if params is not None: 21 | W = params[0] 22 | 23 | if W is None: 24 | W_values = numpy.asarray( 25 | rng.rand(n_in, n_out), 26 | dtype=theano.config.floatX 27 | ) 28 | 29 | W = theano.shared(value=W_values * 0.01, name='W', borrow=True) 30 | 31 | if onehot: 32 | self.output = T.dot(input, W) 33 | else: 34 | # change the last dimension to the projected dimension 35 | shape = T.concatenate([input.shape, [n_out]]) 36 | if sequenceData: 37 | self.output = W[input.flatten()].reshape(shape, ndim=3) 38 | else: 39 | self.output = W[input.flatten()].reshape(shape, ndim=2) 40 | 41 | self.params = [W] 42 | self.L1 = 0 43 | self.L2_sqr = 0 -------------------------------------------------------------------------------- /DL/DL/models/ForwardFeed.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | # import theano 5 | # import theano.tensor as T 6 | # import numpy 7 | from HiddenLayer import HiddenLayer 8 | from ..utils import * 9 | 10 | class ForwardFeed(object): 11 | """ForwardFeed Class 12 | 13 | This is just a chain of hidden layers. 14 | """ 15 | 16 | def __init__(self, rng, input, layer_sizes=[], dropout_rate=0, srng=None, params=None, activation='tanh'): 17 | """Initialize the parameters for the forward feed 18 | 19 | rng: random number generator, e.g. 
numpy.random.RandomState(1234) 20 | 21 | input: theano.tensor matrix of shape (n_examples, n_in) 22 | 23 | layer_sizes: array of ints, dimensionality of each layer size, input to output 24 | 25 | activation: string, nonlinearity to be applied in the hidden layer 26 | """ 27 | 28 | output = input 29 | layers = [] 30 | for i in range(0, len(layer_sizes)-1): 31 | hiddenLayer = HiddenLayer( 32 | rng=rng, 33 | input=output, 34 | params=maybe(lambda: params[i]), 35 | n_in=layer_sizes[i], 36 | n_out=layer_sizes[i+1], 37 | activation=activation) 38 | 39 | h = hiddenLayer.output 40 | if dropout_rate > 0: 41 | assert(srng is not None) 42 | h = dropout(srng, dropout_rate, h) 43 | 44 | output = h 45 | layers.append(hiddenLayer) 46 | 47 | self.layers = layers 48 | self.output = output 49 | 50 | self.params = layers_params(self.layers) 51 | self.L1 = layers_L1(self.layers) 52 | self.L2_sqr = layers_L2_sqr(self.layers) -------------------------------------------------------------------------------- /DL/DL/models/HiddenLayer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from ..utils import * 8 | 9 | class HiddenLayer(object): 10 | def __init__(self, rng, input, n_in, n_out, params=None, activation='tanh'): 11 | """ 12 | Typical hidden layer of a MLP: units are fully-connected and have 13 | sigmoidal (tanh actually) activation function. Weight matrix W is of shape (n_in, n_out) 14 | and the bias vector b is of shape (n_out,). 15 | 16 | rng: random number generator, e.g. numpy.random.RandomState(1234) 17 | 18 | input: theano.tensor matrix of shape (n_examples, n_in) 19 | 20 | n_in: int, dimensionality of input 21 | 22 | n_out: int, number of hidden units 23 | 24 | activation: string, nonlinearity to be applied in the hidden layer 25 | """ 26 | self.input = input 27 | 28 | # the output of uniform if converted using asarray to dtype 29 | # theano.config.floatX so that the code is runable on GPU 30 | # [Xavier10] suggests that you should use 4 times larger initial 31 | # weights for sigmoid compared to tanh. 32 | W = None 33 | b = None 34 | 35 | if params is not None: 36 | W = params[0] 37 | b = params[1] 38 | 39 | if W is None: 40 | W_values = numpy.asarray( 41 | rng.uniform( 42 | low=-numpy.sqrt(6. / (n_in + n_out)), 43 | high=numpy.sqrt(6. 
/ (n_in + n_out)), 44 | size=(n_in, n_out) 45 | ), 46 | dtype=theano.config.floatX 47 | ) 48 | if activation is 'sigmoid': 49 | W_values *= 4 50 | 51 | W = theano.shared(value=W_values, name='W', borrow=True) 52 | 53 | if b is None: 54 | b_values = numpy.zeros((n_out,), dtype=theano.config.floatX) 55 | b = theano.shared(value=b_values, name='b', borrow=True) 56 | 57 | 58 | self.output_linear = T.dot(input, W) + b 59 | self.output = (self.output_linear if activation is None else activations[activation](self.output_linear)) 60 | 61 | self.params = [W, b] 62 | self.weights = [W] 63 | 64 | self.L1 = compute_L1(self.weights) 65 | self.L2_sqr = compute_L2_sqr(self.weights) -------------------------------------------------------------------------------- /DL/DL/models/LSTM.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from ..utils import * 8 | 9 | """ 10 | 11 | Generic LSTM Architecture 12 | LSTM [Graves 2012] 13 | 14 | x is the input 15 | y is the output 16 | h is the memory cell 17 | 18 | g_i is the input gate 19 | c_i is the input candidate 20 | g_f is the forget gate 21 | g_o is the output gete 22 | 23 | s is the sigmoid function 24 | f is a nonlinear transfer function 25 | 26 | 27 | g_i_t = s(W_i * x_t + U_i * y_tm1 + b_i) 28 | c_i_t = f(W_c * x_t + U_c * y_tm1 + b_c) 29 | 30 | g_f_t = s(W_f * x_t + U_f * y_tm1 + b_f) 31 | h_t = g_i_t * c_i_t + g_f_t * h_tm1 32 | 33 | g_o_t = s(W_o * x_t + U_o * y_tm1 + V_o * h_t + b_o) 34 | y_t = g_o_t * f(h_t) 35 | 36 | """ 37 | 38 | 39 | class LSTM(object): 40 | """ 41 | A simplified LSTM Layer. 42 | http://deeplearning.net/tutorial/lstm.html 43 | 44 | For parallelization: 45 | g_o_t = s(W_o * x_t + U_o * y_tm1 + b_o) 46 | 47 | From the input and the previous output, we can compute the input, output, 48 | and forget gates along with the input candidate. Then we can compute the 49 | hidden memory units and the output. 
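(Note on the implementation below: W and U each hold the weights of all four of g_i, g_f, g_o, and c_i concatenated side by side, giving shape (n_units, 4*n_units), and b stacks the four bias vectors, so a single matrix product per timestep yields every gate pre-activation at once; the `cut` helper then slices the result back apart.)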
50 | 51 | h_tm1 ------ 52 | X ------------- X --▶ y_t 53 | --▶ g_o_t ------ | 54 | y_tm1 -- | | 55 | |-----▶ g_i_t -- | 56 | x_t ------------ | X ----- + --▶ h_t -- 57 | |--▶ c_i_t -- | 58 | | | 59 | --▶ g_f_t ------ | 60 | X -- 61 | h_tm1 ------ 62 | 63 | """ 64 | 65 | def __init__(self, rng, input, mask, n_units, activation='tanh', params=None): 66 | 67 | # LSTM weights 68 | W = None 69 | U = None 70 | b = None 71 | if params is not None: 72 | W = params[0] 73 | U = params[1] 74 | b = params[2] 75 | 76 | if W is None: 77 | # g_i, g_f, g_o, c_i 78 | W_values = numpy.concatenate([ortho_weight(n_units), 79 | ortho_weight(n_units), 80 | ortho_weight(n_units), 81 | ortho_weight(n_units)], axis=1) 82 | 83 | W = theano.shared(value=W_values, name='W', borrow=True) 84 | 85 | if U is None: 86 | U_values = numpy.concatenate([ortho_weight(n_units), 87 | ortho_weight(n_units), 88 | ortho_weight(n_units), 89 | ortho_weight(n_units)], axis=1) 90 | 91 | U = theano.shared(value=U_values, name='U', borrow=True) 92 | 93 | if b is None: 94 | b_values = numpy.zeros((4 * n_units,)).astype(theano.config.floatX) 95 | 96 | b = theano.shared(value=b_values, name='b', borrow=True) 97 | 98 | 99 | # cut out the gates after parallel matrix multiplication 100 | def cut(x, n, dim): 101 | return x[:, n * dim:(n + 1) * dim] 102 | 103 | f = activations[activation] 104 | s = activations['sigmoid'] 105 | def step(mask_t, xWb_t, y_tm1, h_tm1): 106 | # (n_examples, 4*n_units) 107 | pre_activation = T.dot(y_tm1, U) + xWb_t 108 | 109 | g_i_t = s(cut(pre_activation, 0, n_units)) 110 | c_i_t = f(cut(pre_activation, 3, n_units)) 111 | 112 | g_f_t = s(cut(pre_activation, 1, n_units)) 113 | h_t = g_f_t * h_tm1 + g_i_t * c_i_t 114 | # mask for valid inputs 115 | h_t = mask_t[:, None] * h_t + (1. - mask_t)[:, None] * h_tm1 116 | 117 | g_o_t = s(cut(pre_activation, 2, n_units)) 118 | y_t = g_o_t * f(h_t) 119 | # mask for valid inputs 120 | y_t = mask_t[:, None] * y_t + (1. - mask_t)[:, None] * y_tm1 121 | 122 | return y_t, h_t 123 | 124 | 125 | # input is initially (n_examples, n_timesteps, n_units) 126 | # we want to scan over timesteps! 
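# illustrative shapes (assuming, say, 2 examples, 3 timesteps, 4 units):
#   input (2, 3, 4) -> dimshuffle(1,0,2) -> (3, 2, 4) and mask (2, 3) -> (3, 2),
# so theano.scan hands step() one (n_examples, ...) slice per timestep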
127 | input = input.dimshuffle(1,0,2) 128 | # mask is (n_examples, maxlen) 129 | mask = mask.dimshuffle(1,0) 130 | 131 | # timesteps, samples, dimension 132 | # n_timesteps = input.shape[0] 133 | n_samples = input.shape[1] 134 | 135 | 136 | # efficiently compute the input gate, forget gate, 137 | # (n_timesteps, n_examples, 4 * n_units) 138 | xWb = T.dot(input, W) + b 139 | 140 | [y, h], updates = theano.scan(step, 141 | sequences=[mask, xWb], 142 | outputs_info=[T.alloc(numpy.asarray(0., dtype=theano.config.floatX), n_samples, n_units), 143 | T.alloc(numpy.asarray(0., dtype=theano.config.floatX), n_samples, n_units)]) 144 | # n_steps=n_timesteps) 145 | 146 | # swap the dimensions back to (n_examples, n_timesteps, n_units) 147 | h = h.dimshuffle(1,0,2) 148 | y = y.dimshuffle(1,0,2) 149 | 150 | self.params = [U, W, b] 151 | self.weights = [U, W] 152 | self.L1 = compute_L1(self.weights) 153 | self.L2_sqr = compute_L2_sqr(self.weights) 154 | 155 | self.output = y 156 | -------------------------------------------------------------------------------- /DL/DL/models/MLP.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | # import theano 5 | # import theano.tensor as T 6 | # import numpy 7 | from HiddenLayer import HiddenLayer 8 | from ..utils import * 9 | 10 | 11 | class MLP(object): 12 | """Multi-Layer Perceptron Class 13 | 14 | A multilayer perceptron is a feedforward artificial neural network model 15 | that has one layer or more of hidden units and nonlinear activations. 16 | Intermediate layers usually have as activation function tanh or the 17 | sigmoid function while the top layer is a softamx layer. 18 | """ 19 | 20 | def __init__(self, rng, input, n_in, n_hidden, n_out, srng=None, dropout_rate=0, activation='tanh', outputActivation='softmax', params=None): 21 | """Initialize the parameters for the multilayer perceptron 22 | 23 | rng: random number generator, e.g. 
numpy.random.RandomState(1234) 24 | 25 | input: theano.tensor matrix of shape (n_examples, n_in) 26 | 27 | n_in: int, dimensionality of input 28 | 29 | n_hidden: int, number of hidden units 30 | 31 | n_out: int, number of hidden units 32 | 33 | dropout_rate: float, if dropout_rate is non zero, then we implement a Dropout in the hidden layer 34 | 35 | activation: string, nonlinearity to be applied in the hidden layer 36 | 37 | """ 38 | 39 | hiddenLayer = HiddenLayer( 40 | rng=rng, 41 | input=input, 42 | n_in=n_in, 43 | n_out=n_hidden, 44 | activation=activation, 45 | params=maybe(lambda: params[0]) 46 | ) 47 | 48 | h = hiddenLayer.output 49 | if dropout_rate > 0: 50 | assert(srng is not None) 51 | h = dropout(srng, dropout_rate, h) 52 | 53 | outputLayer = HiddenLayer( 54 | rng=rng, 55 | input=h, 56 | n_in=n_hidden, 57 | n_out=n_out, 58 | activation=outputActivation, 59 | params=maybe(lambda: params[1]) 60 | ) 61 | 62 | self.layers = [hiddenLayer, outputLayer] 63 | self.params = layers_params(self.layers) 64 | self.L1 = layers_L1(self.layers) 65 | self.L2_sqr = layers_L2_sqr(self.layers) 66 | 67 | self.output = outputLayer.output 68 | -------------------------------------------------------------------------------- /DL/DL/models/RNN.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from ..utils import * 8 | from HiddenLayer import HiddenLayer 9 | 10 | 11 | class Recurrence(object): 12 | """A Reccurence Class which wraps an architecture into a recurrent one.""" 13 | 14 | def __init__(self, input, input_t, output_t, recurrent_t, recurrent_tm1, recurrent_0, updates=[]): 15 | """Initialize the recurrence class with the input, output, the recurrent variable 16 | and the initial recurrent variable. 17 | 18 | This compute in minibatches, so input is (n_examples, n_timesteps, n_in) 19 | input_t is (n_examples, n_in) 20 | 21 | """ 22 | 23 | # compute the recurrence 24 | def step(x_t, h_tm1): 25 | h_t = theano.clone(recurrent_t, replace=updates + [(input_t, x_t), (recurrent_tm1, h_tm1)]) 26 | y_t = theano.clone(output_t, replace=updates + [(recurrent_t, h_t)]) 27 | return h_t, y_t 28 | 29 | h0_t = T.extra_ops.repeat(recurrent_0[numpy.newaxis, :], input.shape[0], axis=0) 30 | 31 | [h, y], _ = theano.scan(step, 32 | sequences=[input.dimshuffle(1,0,2),], # swap the first two dimensions to scan over n_timesteps 33 | outputs_info=[h0_t, None]) 34 | 35 | # swap the dimensions back to (n_examples, n_timesteps, n_out) 36 | h = h.dimshuffle(1,0,2) 37 | y = y.dimshuffle(1,0,2) 38 | 39 | self.output = y 40 | self.recurrent = h 41 | 42 | 43 | 44 | class RNN(object): 45 | """Recurrent Neural Network Class 46 | 47 | A RNN looks a lot like an MLP but the hidden layer is recurrent, so the hidden 48 | layer receives the input and the itselft at the previous time step. RNNs can have 49 | "deep" transition, inputs, and outputs like so: 50 | 51 | 52 | (n_in) ----▶ (n_hidden) ----▶ (n_out) 53 | ▲ | 54 | | | 55 | | | 56 | -----{t-1}---- 57 | """ 58 | 59 | def __init__(self, rng, input, n_in, n_hidden, n_out, activation='tanh', outputActivation='softmax', params=None): 60 | """Initialize the parameters for the recurrent neural network 61 | 62 | rng: random number generator, e.g. 
numpy.random.RandomState(1234) 63 | 64 | input: theano.tensor matrix of shape (n_examples, n_timesteps, n_in) 65 | 66 | n_in: int, dimensionality of input 67 | 68 | n_hidden: int, number of hidden units 69 | 70 | n_out: int, number of hidden units 71 | 72 | dropout_rate: float, if dropout_rate is non zero, then we implement a Dropout all hidden layers 73 | 74 | activation: string, nonlinearity to be applied in the hidden layer 75 | """ 76 | 77 | # create the h0 prior 78 | h0 = None 79 | if params: 80 | h0 = params[0] 81 | else: 82 | h0_values = numpy.asarray( 83 | rng.uniform( 84 | low=-numpy.sqrt(6. / (n_in + n_out)), 85 | high=numpy.sqrt(6. / (n_in + n_out)), 86 | size=(n_hidden,) 87 | ), 88 | dtype=theano.config.floatX 89 | ) 90 | if activation is 'sigmoid': 91 | h0_values *= 4 92 | 93 | h0 = theano.shared(value=h0_values, name='h0', borrow=True) 94 | 95 | 96 | 97 | # Create the computation graph 98 | h_tm1 = T.matrix('h_tm1') # n_examples, n_hidden @ t-1 99 | x_t = T.matrix('x_t') # n_examples, n_in @ some specific time 100 | 101 | hiddenLayer = HiddenLayer( 102 | rng=rng, 103 | input= T.concatenate([x_t, h_tm1], axis=1), 104 | n_in=n_in+n_hidden, 105 | n_out=n_hidden, 106 | activation=activation, 107 | params=maybe(lambda: params[1]) 108 | ) 109 | 110 | h_t = hiddenLayer.output 111 | 112 | outputLayer = HiddenLayer( 113 | rng=rng, 114 | input=h_t, 115 | n_in=n_hidden, 116 | n_out=n_out, 117 | activation=outputActivation, 118 | params=maybe(lambda: params[1]) 119 | ) 120 | 121 | y_t = outputLayer.output 122 | 123 | self.layers = [hiddenLayer, outputLayer] 124 | self.params = [h0] + layers_params(self.layers) 125 | self.L1 = layers_L1(self.layers) 126 | self.L2_sqr = layers_L2_sqr(self.layers) 127 | 128 | recurrence = Recurrence( 129 | input=input, 130 | input_t=x_t, 131 | output_t=y_t, 132 | recurrent_t=h_t, 133 | recurrent_tm1=h_tm1, 134 | recurrent_0=h0, 135 | ) 136 | 137 | self.output = recurrence.output 138 | self.h = recurrence.recurrent 139 | -------------------------------------------------------------------------------- /DL/DL/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ccorcos/deep-learning/df5e3072077460d72d3281724cacefa97a3b2dfd/DL/DL/models/__init__.py -------------------------------------------------------------------------------- /DL/DL/optimizers/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | import time 8 | import random 9 | from ..utils import startTimer 10 | import sgd 11 | import rmsprop 12 | import adadelta 13 | 14 | optimizers = { 15 | 'sgd': sgd.sgd, 16 | 'rmsprop': rmsprop.rmsprop, 17 | 'adadelta': adadelta.adadelta, 18 | } 19 | 20 | def optimize(dataset=None, 21 | inputs=None, 22 | cost=None, 23 | params=None, 24 | errors=None, 25 | n_epochs=1000, 26 | batch_size=20, 27 | patience=10000, 28 | patience_increase=2, 29 | improvement_threshold=0.995, 30 | updates=[], 31 | test_batches=-1, 32 | print_cost=False, 33 | optimizer='rmsprop', 34 | **options): 35 | 36 | 37 | # index to a [mini]batch 38 | index = T.lscalar() 39 | 40 | train_set = dataset[0] 41 | valid_set = dataset[1] 42 | test_set = dataset[2] 43 | 44 | # compute number of minibatches for training, validation and testing 45 | n_train_batches = train_set[0].get_value(borrow=True).shape[0] / batch_size 46 | n_valid_batches = 
valid_set[0].get_value(borrow=True).shape[0] / batch_size 47 | n_test_batches = test_set[0].get_value(borrow=True).shape[0] / batch_size 48 | 49 | if n_train_batches == 0: 50 | n_train_batches = 1 51 | if n_valid_batches == 0: 52 | n_valid_batches = 1 53 | if n_test_batches == 0: 54 | n_test_batches = 1 55 | 56 | print "compiling test function" 57 | stop = startTimer("compiling test function") 58 | # compiling a Theano function that computes the mistakes that are made 59 | # by the model on a minibatch 60 | test_givens = list(updates) 61 | valid_givens = list(updates) 62 | train_givens = list(updates) 63 | for i in range(len(inputs)): 64 | test_givens.append((inputs[i], test_set[i][index * batch_size:(index + 1) * batch_size])) 65 | valid_givens.append((inputs[i], valid_set[i][index * batch_size:(index + 1) * batch_size])) 66 | train_givens.append((inputs[i], train_set[i][index * batch_size:(index + 1) * batch_size])) 67 | 68 | test_model = theano.function( 69 | inputs=[index], 70 | outputs=errors, 71 | givens=test_givens 72 | ) 73 | stop() 74 | 75 | print "compiling validate function" 76 | stop = startTimer("compiling validate function") 77 | validate_model = theano.function( 78 | inputs=[index], 79 | outputs=errors, 80 | givens=valid_givens 81 | ) 82 | stop() 83 | 84 | print "computing gradients" 85 | stop = startTimer("computing gradients") 86 | gparams = T.grad(cost, params) 87 | stop() 88 | 89 | 90 | updates = optimizers[optimizer]( 91 | params=params, 92 | gparams=gparams, 93 | **options 94 | ) 95 | 96 | print optimizer + ": compiling training function" 97 | stop = startTimer(optimizer + ": compiling training function") 98 | # compiling a Theano function `train_model` that returns the cost, but in 99 | # the same time updates the parameter of the model based on the rules 100 | # defined in `updates` 101 | train_model = theano.function( 102 | inputs=[index], 103 | outputs=cost, 104 | updates=updates, 105 | givens=train_givens 106 | ) 107 | stop() 108 | 109 | validation_frequency = min(n_train_batches, patience / 2) 110 | 111 | start_time = time.clock() 112 | 113 | best_validation_loss = numpy.inf 114 | best_iter = 0 115 | test_loss = 0. 116 | 117 | epoch = 0 118 | impatient = False 119 | 120 | print optimizer + ": optimizing..." 121 | try: 122 | while (epoch < n_epochs) and (not impatient): 123 | epoch = epoch + 1 124 | for minibatch_index in xrange(n_train_batches): 125 | minibatch_avg_cost = train_model(minibatch_index) 126 | if print_cost: 127 | print " cost: %0.05f" % minibatch_avg_cost 128 | 129 | # keep track of how many minibatches we've trained. Every so often, do a validation. 130 | iteration = (epoch - 1) * n_train_batches + minibatch_index 131 | if (iteration + 1) % validation_frequency is 0: 132 | # compute zero-one loss on validation set 133 | validation_losses = [validate_model(i) for i in xrange(n_valid_batches)] 134 | this_validation_loss = numpy.mean(validation_losses) 135 | print 'epoch %i, minibatch %i/%i, validation error %f %%' % (epoch, minibatch_index + 1, n_train_batches, this_validation_loss * 100.) 
136 | print ' iteration %i, patience %i' % (iteration, patience) 137 | 138 | # if we have a better validation then keep track of it 139 | if this_validation_loss < best_validation_loss: 140 | #improve patience if loss improvement is good enough 141 | if this_validation_loss < best_validation_loss * improvement_threshold: 142 | patience = max(patience, iteration * patience_increase) 143 | # keep track of the best validation 144 | best_validation_loss = this_validation_loss 145 | best_iter = iteration 146 | # remember the test loss as well 147 | test_losses = None 148 | if test_batches > 0: 149 | test_losses = [test_model(i) for i in random.sample(range(n_test_batches), test_batches)] 150 | else: 151 | test_losses = [test_model(i) for i in xrange(n_test_batches)] 152 | test_loss = numpy.mean(test_losses) 153 | print ' epoch %i, minibatch %i/%i, best test error %f %%' % (epoch, minibatch_index + 1, n_train_batches, test_loss * 100.) 154 | 155 | if patience <= iteration: 156 | impatient = True 157 | break 158 | 159 | except KeyboardInterrupt: 160 | print "" 161 | print "" 162 | print optimizer + ": optimization interupted" 163 | print "" 164 | 165 | end_time = time.clock() 166 | 167 | print 'Optimiztation complete' 168 | print 'The code run for %d epochs, with %f epochs/sec' % (epoch, 1. * epoch / (end_time - start_time)) 169 | print 'Best validation score of %f %% obtained at iteration %i, with test performance %f %%' % (best_validation_loss * 100., best_iter + 1, test_loss * 100.) 170 | print "" 171 | 172 | train_loss = None 173 | valid_loss = None 174 | test_loss = None 175 | 176 | try: 177 | print "computing model errors" 178 | print "" 179 | print " training..." 180 | train_losses = [train_model(i) for i in xrange(n_train_batches)] 181 | train_loss = numpy.mean(train_losses) 182 | print " validation..." 183 | valid_losses = [validate_model(i) for i in xrange(n_valid_batches)] 184 | valid_loss = numpy.mean(valid_losses) 185 | print " test..." 
186 | test_losses = [test_model(i) for i in xrange(n_test_batches)] 187 | test_loss = numpy.mean(test_losses) 188 | print "\n train: %0.05f \n validation: %0.05f \n test: %0.05f \n" % (train_loss, valid_loss, test_loss) 189 | 190 | except KeyboardInterrupt: 191 | print "computing model errors interrupted" 192 | 193 | return train_loss, valid_loss, test_loss 194 | -------------------------------------------------------------------------------- /DL/DL/optimizers/adadelta.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | 8 | def adadelta(params, gparams): 9 | 10 | # http://deeplearning.net/tutorial/code/lstm.py 11 | zipped_grads = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 12 | running_up2 = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 13 | running_grads2 = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 14 | 15 | zgup = [(zg, g) for zg, g in zip(zipped_grads, gparams)] 16 | rg2up = [(rg2, 0.95 * rg2 + 0.05 * (g ** 2)) for rg2, g in zip(running_grads2, gparams)] 17 | 18 | updir = [-T.sqrt(ru2 + 1e-6) / T.sqrt(rg2 + 1e-6) * zg for zg, ru2, rg2 in zip(zipped_grads, running_up2, running_grads2)] 19 | ru2up = [(ru2, 0.95 * ru2 + 0.05 * (ud ** 2)) for ru2, ud in zip(running_up2, updir)] 20 | param_up = [(p, p + ud) for p, ud in zip(params, updir)] 21 | 22 | updates = zgup + rg2up + ru2up + param_up 23 | 24 | return updates 25 | -------------------------------------------------------------------------------- /DL/DL/optimizers/rmsprop.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | 8 | def rmsprop(params, gparams): 9 | 10 | # http://deeplearning.net/tutorial/code/lstm.py 11 | zipped_grads = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 12 | running_grads = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 13 | running_grads2 = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 14 | 15 | zgup = [(zg, g) for zg, g in zip(zipped_grads, gparams)] 16 | rgup = [(rg, 0.95 * rg + 0.05 * g) for rg, g in zip(running_grads, gparams)] 17 | rg2up = [(rg2, 0.95 * rg2 + 0.05 * (g ** 2)) for rg2, g in zip(running_grads2, gparams)] 18 | 19 | updir = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 20 | updir_new = [(ud, 0.9 * ud - 1e-4 * zg / T.sqrt(rg2 - rg ** 2 + 1e-4)) for ud, zg, rg, rg2 in zip(updir, zipped_grads, running_grads, running_grads2)] 21 | param_up = [(p, p + udn[1]) for p, udn in zip(params, updir_new)] 22 | 23 | updates = zgup + rgup + rg2up + updir_new + param_up 24 | 25 | return updates -------------------------------------------------------------------------------- /DL/DL/optimizers/sgd.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | # import theano.tensor as T 6 | import numpy 7 | 8 | def sgd(params, gparams,learning_rate=0.01, momentum=0.1): 9 | 10 | """ 11 | stochastic gradient descent optimization with early stopping and momentum 12 | 13 | for vanilla gradient 
decent, set the patience to numpy.inf and momentum to 0 14 | 15 | early stopping criteria 16 | patience: look as this many examples regardless 17 | patience_increase: wait this much longer when a new best is found 18 | improvement_threshold: a relative improvement of this much is considered significant 19 | 20 | dataset is a list or tuple of length 3 including the training set, validation set 21 | and the test set. In each set, these must be a list or tuple of the inputs to 22 | the computational graph in the same order as the list of Theano.tensor variable 23 | that are passed in as inputs. The inputs to the graph must accept minibatches meaning 24 | that the first dimension is the number of training examples. 25 | 26 | """ 27 | 28 | 29 | momentums = [theano.shared(numpy.zeros(param.get_value(borrow=True).shape, dtype=theano.config.floatX)) for param in params] 30 | updates = [] 31 | for param, gparam, mom in zip(params, gparams, momentums): 32 | update = momentum * mom - learning_rate * gparam 33 | updates.append((mom, update)) 34 | updates.append((param, param + update)) 35 | 36 | return updates -------------------------------------------------------------------------------- /DL/DL/utils.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | import operator 8 | import time 9 | 10 | def load_data(dataset): 11 | ''' Loads the dataset to the GPU 12 | 13 | dataset = [train_set, valid_set, test_set] 14 | 15 | each set is a tuple (input, target) 16 | input is a matrix where rows are a sample 17 | target is a 1d array of what output should be 18 | ''' 19 | 20 | def shared_dataset(data, borrow=True): 21 | """ Function that loads the dataset into shared variables 22 | 23 | Create a shared dataset, copying the whole thing to the GPU. 24 | We dont want to copy each minibatch over one at a time. 25 | """ 26 | sharedData = [] 27 | for input in data: 28 | shared = theano.shared(numpy.asarray(input, 29 | dtype=theano.config.floatX), 30 | borrow=borrow) 31 | 32 | sharedData.append(shared) 33 | 34 | return sharedData 35 | 36 | test_set = shared_dataset(dataset[2]) 37 | valid_set = shared_dataset(dataset[1]) 38 | train_set = shared_dataset(dataset[0]) 39 | 40 | rval = [train_set, valid_set, test_set] 41 | return rval 42 | 43 | def maybe(func, otherwise=None): 44 | res = None 45 | try: 46 | res = func() 47 | except: 48 | return otherwise 49 | return res 50 | 51 | def flattenIterator(container): 52 | for i in container: 53 | if isinstance(i, list) or isinstance(i, tuple): 54 | for j in flatten(i): 55 | yield j 56 | else: 57 | yield i 58 | 59 | flatten = lambda x: list(flattenIterator(x)) 60 | 61 | fmt = lambda x: "{:12.8f}".format(x) 62 | 63 | def onehot(value, length): 64 | v = [0]*length 65 | v[value] = 1 66 | return v 67 | 68 | relu = lambda x: T.switch(x<0, 0, x) 69 | cappedrelu = lambda x: T.minimum(T.switch(x<0, 0, x), 6) 70 | sigmoid = T.nnet.sigmoid 71 | tanh = T.tanh 72 | # softmax = T.nnet.softmax 73 | 74 | # a differentiable version for HF that doesn't have some optimizations. 
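# subtracting the row-wise max before exponentiating leaves the result unchanged
# (the constant cancels in the normalization) but keeps exp() from overflowing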
75 | def softmax(x): 76 | e_x = T.exp(x - x.max(axis=1, keepdims=True)) 77 | out = e_x / e_x.sum(axis=1, keepdims=True) 78 | return out 79 | 80 | activations = { 81 | 'relu': relu, 82 | 'cappedrelu': cappedrelu, 83 | 'sigmoid': sigmoid, 84 | 'tanh': tanh, 85 | 'linear': lambda x: x, 86 | 'softmax': softmax 87 | } 88 | 89 | def compute_L1(weights): 90 | return reduce(operator.add, map(lambda x: abs(x).sum(), weights), 0) 91 | 92 | def compute_L2_sqr(weights): 93 | return reduce(operator.add, map(lambda x: (x ** 2).sum(), weights), 0) 94 | 95 | def layers_L1(layers): 96 | return reduce(operator.add, map(lambda x: x.L1, layers), 0) 97 | 98 | def layers_L2_sqr(layers): 99 | return reduce(operator.add, map(lambda x: x.L2_sqr, layers), 0) 100 | 101 | def layers_params(layers): 102 | return map(lambda x: x.params, layers) 103 | 104 | def mse(output, targets): 105 | return T.mean((output - targets) ** 2) 106 | 107 | def nll_binary(output, targets): 108 | # negative log likelihood based on binary cross entropy error 109 | return T.mean(T.nnet.binary_crossentropy(output, targets)) 110 | 111 | def nll_multiclass(output, targets): 112 | return -T.mean(T.log(output)[T.arange(targets.shape[0]), targets]) 113 | 114 | def nll_multiclass_timeseries(output, targets): 115 | # Theano's advanced indexing is limited 116 | # therefore we reshape our n_steps x n_seq x n_classes tensor3 of probs 117 | # to a (n_steps * n_seq) x n_classes matrix of probs 118 | # so that we can use advanced indexing (i.e. get the probs which 119 | # correspond to the true class) 120 | # the labels targets also must be flattened when we do this to use the 121 | # advanced indexing 122 | p_y = output 123 | p_y_m = T.reshape(p_y, (p_y.shape[0] * p_y.shape[1], -1)) 124 | y_f = targets.flatten(ndim=1) 125 | return -T.mean(T.log(p_y_m)[T.arange(p_y_m.shape[0]), y_f]) 126 | 127 | def pred_binary(output): 128 | return T.round(output) # round to {0,1} 129 | 130 | def pred_multiclass(output): 131 | return T.argmax(output, axis=-1) 132 | 133 | def pred_error(pred, targets): 134 | # check if y has same dimension of y_pred 135 | if targets.ndim != pred.ndim: 136 | raise TypeError('targets should have the same shape as pred', ('targets', targets.type, 'pred', pred.type)) 137 | 138 | # check if targets is of the correct datatype 139 | if targets.dtype.startswith('int'): 140 | # the T.neq operator returns a vector of 0s and 1s, where 1 141 | # represents a mistake in prediction 142 | return T.mean(T.neq(pred, targets)) 143 | 144 | def untuple(a): 145 | if isinstance(a, tuple): 146 | return untuple(list(a)) 147 | if isinstance(a, (numpy.ndarray, numpy.generic) ): 148 | return a 149 | if isinstance(a, list): 150 | for i in range(len(a)): 151 | a[i] = untuple(a[i]) 152 | return a 153 | 154 | # allow to specify the size separately -- sometimes an issue when cloning, scanning, etc. 
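# each unit is kept independently with probability 1 - dropout_rate; note that the
# surviving activations are not rescaled by 1/(1 - dropout_rate) in this implementation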
155 | # see the theano-tests/random-streams-scan-clone.py 156 | def dropout(srng, dropout_rate, inp, size=None): 157 | if size is None: 158 | size = inp.shape 159 | # p=1-p because 1's indicate keep and p is prob of dropping 160 | mask = srng.binomial(n=1, p=1-dropout_rate, size=size, dtype=theano.config.floatX) 161 | # The cast is important because int * float32 = float64 which pulls things off the gpu 162 | output = inp * mask 163 | return output 164 | 165 | # "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks" 166 | # http://arxiv.org/abs/1312.6120 167 | def ortho_weight(ndim): 168 | W = numpy.random.randn(ndim, ndim) 169 | u, s, v = numpy.linalg.svd(W) 170 | return u.astype(theano.config.floatX) 171 | 172 | 173 | def stopTimer(start, message): 174 | print message + " took %0.03f seconds" % (time.clock() - start) 175 | 176 | def startTimer(message): 177 | start = time.clock() 178 | return lambda: stopTimer(start, message) 179 | 180 | def sequencePadAndMask(seqs): 181 | """ 182 | takes a sequence of (n_samples, n_timeteps, ...) 183 | pads every sequences to the maxlen of timesteps for all sequences 184 | also produces a mask 185 | """ 186 | 187 | n_samples = len(seqs) 188 | lengths = map(len, seqs) 189 | maxlen = max(lengths) 190 | 191 | x = numpy.zeros((n_samples, maxlen)) 192 | x_mask = numpy.zeros((n_samples, maxlen)) 193 | 194 | for idx, s in enumerate(seqs): 195 | x[idx, 0:lengths[idx]] = s 196 | x_mask[idx, 0:lengths[idx]] = 1. 197 | 198 | return x, x_mask 199 | 200 | def datasetPadAndMask(dataset, sequenceIdx): 201 | """ 202 | for each set in the dataset, it find the sequence at the sequenceIdx 203 | and pads and masks it, then mutates the dataset and append the mask as 204 | the last value in the dataset 205 | """ 206 | for s in dataset: 207 | seq = s[sequenceIdx] 208 | paddedSeq, seqMask = sequencePadAndMask(seq) 209 | s[sequenceIdx] = paddedSeq 210 | s.append(seqMask) 211 | -------------------------------------------------------------------------------- /DL/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | 4 | setup( 5 | name='DL', 6 | version='0.1', 7 | description='Some deep learning tools.', 8 | long_description='Some deep learning tools.', 9 | classifiers=[ 10 | 'Development Status :: 2 - Pre-Alpha', 11 | 'Programming Language :: Python :: 2.7', 12 | 'Intended Audience :: Science/Research', 13 | ], 14 | keywords='deep learning', 15 | url='https://github.com/ccorcos/', 16 | author='Chet Corcos', 17 | author_email='ccorcos@gmail', 18 | license='MIT', 19 | packages=['DL'], 20 | install_requires=[ 21 | 'numpy', 22 | 'theano', 23 | ], 24 | test_suite='nose.collector', 25 | tests_require=['nose'], 26 | include_package_data=True, 27 | zip_safe=False 28 | ) 29 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Some Deep Learning models built with Theano 2 | 3 | 4 | ## To Do 5 | 6 | - save and reload model. 7 | 8 | LSTM 9 | - try on dice model 10 | 11 | RNN 12 | - tensor.alloc in RNN instead of repeat 13 | - modify rnn to handle varying length sequences with a mask 14 | - dropout? 
15 | 16 | MRNN 17 | - character prediction 18 | - hf optimization 19 | 20 | MUSE 21 | - USE with multiplicative units 22 | 23 | Theano-users 24 | - ask about rnn dropout 25 | - ask about ubuntu install script 26 | 27 | CNN 28 | - conv net on images or maybe even imagenet 29 | 30 | DA 31 | - denoising autoencoder on mnist 32 | 33 | Writing 34 | - write about deep learning, https://imgur.com/a/Hqolp 35 | 36 | TSNE 37 | - low dimensional visualization: http://lvdmaaten.github.io/tsne/ 38 | 39 | ## Getting Started 40 | 41 | Some examples use `sparkprob` to visualize probablity distributions at the commandline so you may need to install it 42 | 43 | pip install sparkprob 44 | 45 | All of the examples use the `DL` package. To use it: 46 | 47 | cd DL 48 | python setup.py develop 49 | 50 | To unlink this package when you are done: 51 | 52 | cd DL 53 | python setup.py develop --uninstall 54 | 55 | To load the datasets 56 | 57 | cd datasets 58 | curl -O http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz 59 | curl -O http://www.iro.umontreal.ca/~lisa/deep/data/imdb.pkl 60 | 61 | 62 | -------------------------------------------------------------------------------- /THEANO.md: -------------------------------------------------------------------------------- 1 | # Theano Notes 2 | 3 | ## Debugging 4 | [This is a good resource](http://deeplearning.net/software/theano/tutorial/debug_faq.html). 5 | 6 | Print out `.type` of a theano variable to get information about the tensor type. 7 | 8 | Also, try running your program with 9 | 10 | THEANO_FLAGS="optimizer=None" python program.py 11 | 12 | This will give you line numbers and more information. 13 | 14 | 15 | Also, for any symbolic variables defined, it helps to give them test values which can be used 16 | to test the functionality of the program as it goes so you can get a line number when it happens: 17 | 18 | x = T.matrix() 19 | x.tag.test_value = numpy.random.rand(10, 20) 20 | 21 | Then make sure you set the flag when you run it. 22 | 23 | THEANO_FLAGS="optimizer=None,compute_test_value=raise" python program.py 24 | 25 | THEANO_FLAGS="exception_verbosity=high" 26 | 27 | # Parallelization 28 | 29 | On Mac, you can use the GPU if you have a newer machine with an NVIDIA graphics card. 30 | 31 | Theano uses the OS X Accelerate framework for BLAS and other optimizations. 32 | 33 | 34 | -------------------------------------------------------------------------------- /examples/dbn-mnist.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.DBN import DBN 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import warnings 12 | import time 13 | 14 | warnings.simplefilter("ignore") 15 | 16 | print "An DBN on MNIST with dropout." 17 | print "loading MNIST" 18 | mnist = datasets.mnist() 19 | 20 | print "loading data to the GPU" 21 | dataset = load_data(mnist) 22 | 23 | print "creating the DBN" 24 | x = T.matrix('x') # input 25 | t = T.vector('t') # targets 26 | inputs = [x, t] 27 | # cast to an int. 
needs to be initially a float to load to the GPU 28 | it = t.astype('int64') 29 | 30 | rng = numpy.random.RandomState(int(time.time())) # random number generator 31 | srng = T.shared_randomstreams.RandomStreams(int(time.time())) 32 | 33 | # construct the DBN class 34 | dbn = DBN( 35 | rng=rng, 36 | input=x, 37 | n_in=28 * 28, 38 | layer_sizes=[200,200,200], 39 | n_out=10, 40 | dropout_rate=0.5, 41 | srng=srng 42 | ) 43 | 44 | # regularization 45 | L1_reg=0.00 46 | L2_reg=0.0001 47 | 48 | # cost function 49 | cost = ( 50 | nll_multiclass(dbn.output, it) 51 | + L1_reg * dbn.L1 52 | + L2_reg * dbn.L2_sqr 53 | ) 54 | 55 | pred = pred_multiclass(dbn.output) 56 | 57 | errors = pred_error(pred, it) 58 | 59 | params = flatten(dbn.params) 60 | 61 | print "training the dbn with rmsprop" 62 | 63 | optimize(dataset=dataset, 64 | inputs=inputs, 65 | cost=cost, 66 | params=params, 67 | errors=errors, 68 | n_epochs=100, 69 | batch_size=20, 70 | patience=10000, 71 | patience_increase=1.25, 72 | improvement_threshold=0.995, 73 | optimizer="rmsprop") 74 | 75 | print "compiling the prediction function" 76 | 77 | predict = theano.function(inputs=[x], outputs=pred) 78 | distribution = theano.function(inputs=[x], outputs=dbn.output) 79 | 80 | print "predicting the first 10 samples of the test dataset" 81 | print "predict:", predict(mnist[2][0][0:10]) 82 | print "answer: ", mnist[2][1][0:10] 83 | 84 | print "the output distribution should be slightly different each time due to dropout" 85 | print "distribution:", distribution(mnist[2][0][0:1]) 86 | print "distribution:", distribution(mnist[2][0][0:1]) 87 | print "distribution:", distribution(mnist[2][0][0:1]) 88 | print "distribution:", distribution(mnist[2][0][0:1]) 89 | print "distribution:", distribution(mnist[2][0][0:1]) -------------------------------------------------------------------------------- /examples/lstm-imdb.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.LSTM import LSTM 8 | from DL.models.EmbeddingLayer import EmbeddingLayer 9 | from DL.models.HiddenLayer import HiddenLayer 10 | from DL.optimizers import optimize 11 | from DL import datasets 12 | from DL.utils import * 13 | import time 14 | 15 | # hide warnings 16 | import warnings 17 | warnings.simplefilter("ignore") 18 | 19 | print "An LSTM with mean-pooling and embedded words on IMDB for sentiment analysis." 20 | print " x ---> x_emb ---> LSTM ---> meanPool ---> softmax ---> {1,0} sentiment" 21 | print "loading IMDB" 22 | 23 | dim_proj=128 # word embeding dimension and LSTM number of hidden units. 24 | vocabulary_size=10000 # Vocabulary size 25 | maxlen=100 # Sequence longer then this get ignored 26 | dropout_rate = 0.5 27 | validation_ratio=0.05 28 | 29 | 30 | # imdb has 3 elements, train, validation and test sets 31 | # the first input in each set is a matrix of (n_examples, n_timesteps) with a number representing each word 32 | # the second input is a vector of {0,1} sentiment 33 | imdb = datasets.imdb(validation_ratio=validation_ratio, vocabulary_size=vocabulary_size, maxlen=maxlen) 34 | 35 | # mutate the dataset to pad and mask the sequences 36 | # the sequences are the first input in the dataset 37 | # now each set consists of [padded_sequences, targets, sequence_mask] with shapes: 38 | # [(n_examples, maxlen), (n_examples), (n_examples, maxlen)] 39 | # note that the mask must remain float32! 
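# illustrative example: two reviews of lengths 2 and 3 are zero-padded to the longer one,
#   padded_sequences = [[5, 9, 0], [7, 2, 4]]  with  sequence_mask = [[1., 1., 0.], [1., 1., 1.]]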
40 | datasetPadAndMask(imdb, 0) 41 | 42 | print "loading data to the GPU" 43 | 44 | dataset = load_data(imdb) 45 | 46 | print "creating the LSTM" 47 | x = T.matrix('x') # input words, (n_examples, maxlen) 48 | t = T.vector('t') # targets 49 | mask = T.matrix('mask') # mask for valid words (n_examples, maxlen) 50 | 51 | inputs = [x, t, mask] # the mask comes last! 52 | 53 | ix = x.astype('int32') 54 | it = t.astype('int32') 55 | 56 | rng = numpy.random.RandomState(int(time.time())) # random number generator 57 | srng = T.shared_randomstreams.RandomStreams(int(time.time())) 58 | 59 | embeddingLayer = EmbeddingLayer( 60 | rng=rng, 61 | input=ix, 62 | n_in=vocabulary_size, 63 | n_out=dim_proj, 64 | onehot=False 65 | ) 66 | 67 | # (n_examples, n_timesteps, dim_proj) 68 | x_emb = embeddingLayer.output 69 | 70 | lstm = LSTM( 71 | rng=rng, 72 | input=x_emb, 73 | mask=mask, 74 | n_units=dim_proj, 75 | activation='tanh' 76 | ) 77 | 78 | # (n_examples, maxlen, dimproj) 79 | z = lstm.output 80 | 81 | # only get the active and mean mool. 82 | 83 | # mask[:, :, None].shape = (n_examples, maxlen, 1) 84 | # (z * mask[:, :, None]).shape = (n_examples, maxlen, dim_proj) 85 | # (z * mask[:, :, None]).sum(axis=1).shape = (n_examples, dim_proj) 86 | z = (z * mask[:, :, None]).sum(axis=1) 87 | 88 | # mask.sum(axis=1).shape = (n_examples,) 89 | # mask.sum(axis=1)[:, None].shape = (n_examples,1) 90 | meanPool = z / mask.sum(axis=1)[:, None] 91 | # meanPool is now (n_examples, dim_proj) 92 | 93 | meanPool_drop = dropout(srng, dropout_rate, meanPool) 94 | 95 | outputLayer = HiddenLayer( 96 | rng=rng, 97 | input=meanPool_drop, 98 | n_in=dim_proj, 99 | n_out=2, # {0,1} sentiment 100 | params=None, 101 | activation='softmax' 102 | ) 103 | 104 | y = outputLayer.output 105 | 106 | layers = [embeddingLayer, lstm, outputLayer] 107 | 108 | L1 = layers_L1(layers) 109 | L2_sqr = layers_L2_sqr(layers) 110 | 111 | # L1 = 0 112 | # L2_sqr = (lstm.params[0].get_value() ** 2).sum() 113 | 114 | # regularization 115 | L1_reg=0.00 116 | L2_reg=0.00 117 | 118 | # cost function 119 | cost = ( 120 | nll_multiclass(y, it) 121 | + L1_reg * L1 122 | + L2_reg * L2_sqr 123 | ) 124 | 125 | pred = pred_multiclass(y) 126 | 127 | errors = pred_error(pred, it) 128 | 129 | params = flatten(layers_params(layers)) 130 | 131 | print "training the LSTM with adadelta" 132 | optimize(dataset=dataset, 133 | inputs=inputs, 134 | cost=cost, 135 | params=params, 136 | errors=errors, 137 | n_epochs=200, 138 | batch_size=64, 139 | patience=1500, 140 | patience_increase=1.25, 141 | improvement_threshold=0.995, 142 | test_batches=1, 143 | print_cost=True, 144 | optimizer="adadelta") 145 | 146 | print "compiling the prediction function" 147 | predict = theano.function(inputs=[x, mask], outputs=pred) 148 | 149 | print "predicting the first 10 samples of the test dataset" 150 | print "predict:", predict(dataset[2][0].get_value()[0:10], dataset[2][-1].get_value()[0:10]) 151 | print "answer: ", dataset[2][1].get_value()[0:10] 152 | 153 | 154 | -------------------------------------------------------------------------------- /examples/mlp-mnist-adadelta.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | 13 | # hide warnings 14 | import warnings 15 | 
warnings.simplefilter("ignore") 16 | 17 | 18 | print "An MLP on MNIST." 19 | print "loading MNIST" 20 | mnist = datasets.mnist() 21 | 22 | print "loading data to the GPU" 23 | dataset = load_data(mnist) 24 | 25 | print "creating the MLP" 26 | x = T.matrix('x') # input 27 | t = T.vector('t') # targets 28 | inputs = [x, t] 29 | # cast to an int. needs to be initially a float to load to the GPU 30 | it = t.astype('int64') 31 | 32 | rng = numpy.random.RandomState(int(time.time())) # random number generator 33 | 34 | # construct the MLP class 35 | mlp = MLP( 36 | rng=rng, 37 | input=x, 38 | n_in=28 * 28, 39 | n_hidden=500, 40 | n_out=10 41 | ) 42 | 43 | # regularization 44 | L1_reg=0.00 45 | L2_reg=0.0001 46 | 47 | # cost function 48 | cost = ( 49 | nll_multiclass(mlp.output, it) 50 | + L1_reg * mlp.L1 51 | + L2_reg * mlp.L2_sqr 52 | ) 53 | 54 | pred = pred_multiclass(mlp.output) 55 | 56 | errors = pred_error(pred, it) 57 | 58 | params = flatten(mlp.params) 59 | 60 | print "training the MLP with adadelta" 61 | optimize(dataset=dataset, 62 | inputs=inputs, 63 | cost=cost, 64 | params=params, 65 | errors=errors, 66 | n_epochs=1000, 67 | batch_size=20, 68 | patience=5000, 69 | patience_increase=1.5, 70 | improvement_threshold=0.995, 71 | optimizer="adadelta") 72 | 73 | print "" 74 | print "compiling the prediction function" 75 | predict = theano.function(inputs=[x], outputs=pred) 76 | 77 | print "predicting the first 10 samples of the test dataset" 78 | print "predict:", predict(mnist[2][0][0:10]) 79 | print "answer: ", mnist[2][1][0:10] -------------------------------------------------------------------------------- /examples/mlp-mnist-dropout.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | # from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams 13 | 14 | # hide warnings 15 | import warnings 16 | warnings.simplefilter("ignore") 17 | 18 | 19 | print "An MLP with dropout on MNIST." 20 | print "loading MNIST" 21 | mnist = datasets.mnist() 22 | 23 | print "loading data to the GPU" 24 | dataset = load_data(mnist) 25 | 26 | print "creating the MLP" 27 | x = T.matrix('x') # input 28 | t = T.vector('t') # targets 29 | inputs = [x, t] 30 | # cast to an int. 
needs to be initially a float to load to the GPU 31 | it = t.astype('int64') 32 | 33 | rng = numpy.random.RandomState(int(time.time())) # random number generator 34 | # srng = RandomStreams(int(time.time())) 35 | srng = T.shared_randomstreams.RandomStreams(int(time.time())) 36 | 37 | # construct the MLP class 38 | mlp = MLP( 39 | rng=rng, 40 | input=x, 41 | n_in=28 * 28, 42 | n_hidden=500, 43 | n_out=10, 44 | dropout_rate=0.5, 45 | srng=srng 46 | ) 47 | 48 | # regularization 49 | L1_reg=0.00 50 | L2_reg=0.0001 51 | 52 | # cost function 53 | cost = ( 54 | nll_multiclass(mlp.output, it) 55 | + L1_reg * mlp.L1 56 | + L2_reg * mlp.L2_sqr 57 | ) 58 | 59 | pred = pred_multiclass(mlp.output) 60 | 61 | errors = pred_error(pred, it) 62 | 63 | params = flatten(mlp.params) 64 | 65 | print "training the MLP with rmsprop" 66 | optimize(dataset=dataset, 67 | inputs=inputs, 68 | cost=cost, 69 | params=params, 70 | errors=errors, 71 | n_epochs=1000, 72 | batch_size=20, 73 | patience=5000, 74 | patience_increase=1.5, 75 | improvement_threshold=0.995, 76 | optimizer="rmsprop") 77 | 78 | print "compiling the prediction function" 79 | predict = theano.function(inputs=[x], outputs=pred) 80 | distribution = theano.function(inputs=[x], outputs=mlp.output) 81 | 82 | print "predicting the first 10 samples of the test dataset" 83 | print "predict:", predict(mnist[2][0][0:10]) 84 | print "answer: ", mnist[2][1][0:10] 85 | 86 | print "with dropout, the output distributions should all be slightly different" 87 | print "predict:", distribution(mnist[2][0][0:1]) 88 | print "predict:", distribution(mnist[2][0][0:1]) 89 | print "predict:", distribution(mnist[2][0][0:1]) 90 | print "predict:", distribution(mnist[2][0][0:1]) 91 | -------------------------------------------------------------------------------- /examples/mlp-mnist-load.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print "An MLP on MNIST." 18 | print "loading MNIST" 19 | mnist = datasets.mnist() 20 | 21 | 22 | print "loading a previous model" 23 | l = numpy.load("saved.npz") 24 | lparams = l['params'] 25 | losses = l['results'] 26 | notes = l['notes'] 27 | 28 | print "model loaded" 29 | print notes 30 | print "errors" 31 | print " train: %0.05f" % losses[0] 32 | print " validation: %0.05f" % losses[1] 33 | print " test: %0.05f" % losses[2] 34 | print "" 35 | 36 | print "loading data to the GPU" 37 | dataset = load_data(mnist) 38 | 39 | print "creating the MLP" 40 | x = T.matrix('x') # input 41 | t = T.vector('t') # targets 42 | inputs = [x, t] 43 | # cast to an int. 
needs to be initially a float to load to the GPU 44 | it = t.astype('int64') 45 | 46 | rng = numpy.random.RandomState(int(time.time())) # random number generator 47 | 48 | # construct the MLP class 49 | mlp = MLP( 50 | rng=rng, 51 | input=x, 52 | n_in=28 * 28, 53 | n_hidden=500, 54 | n_out=10, 55 | params=lparams 56 | ) 57 | 58 | # regularization 59 | L1_reg=0.00 60 | L2_reg=0.0001 61 | 62 | # cost function 63 | cost = ( 64 | nll_multiclass(mlp.output, it) 65 | + L1_reg * mlp.L1 66 | + L2_reg * mlp.L2_sqr 67 | ) 68 | 69 | pred = pred_multiclass(mlp.output) 70 | 71 | errors = pred_error(pred, it) 72 | 73 | params = flatten(mlp.params) 74 | 75 | 76 | # print "training the MLP with rmsprop" 77 | # losses = optimize( 78 | # dataset=dataset, 79 | # inputs=inputs, 80 | # cost=cost, 81 | # params=params, 82 | # errors=errors, 83 | # n_epochs=5, 84 | # batch_size=20, 85 | # patience=5000, 86 | # patience_increase=1.5, 87 | # improvement_threshold=0.995, 88 | # optimizer='rmsprop' 89 | # ) 90 | 91 | print "compiling the prediction function" 92 | predict = theano.function(inputs=[x], outputs=pred) 93 | 94 | print "predicting the first 10 samples of the test dataset" 95 | print "predict:", predict(mnist[2][0][0:10]) 96 | print "answer: ", mnist[2][1][0:10] 97 | 98 | -------------------------------------------------------------------------------- /examples/mlp-mnist-rmsprop.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | 18 | print "An MLP on MNIST." 19 | print "loading MNIST" 20 | mnist = datasets.mnist() 21 | 22 | print "loading data to the GPU" 23 | dataset = load_data(mnist) 24 | 25 | print "creating the MLP" 26 | x = T.matrix('x') # input 27 | t = T.vector('t') # targets 28 | inputs = [x, t] 29 | # cast to an int. 
needs to be initially a float to load to the GPU 30 | it = t.astype('int64') 31 | 32 | rng = numpy.random.RandomState(int(time.time())) # random number generator 33 | 34 | # construct the MLP class 35 | mlp = MLP( 36 | rng=rng, 37 | input=x, 38 | n_in=28 * 28, 39 | n_hidden=500, 40 | n_out=10 41 | ) 42 | 43 | # regularization 44 | L1_reg=0.00 45 | L2_reg=0.0001 46 | 47 | # cost function 48 | cost = ( 49 | nll_multiclass(mlp.output, it) 50 | + L1_reg * mlp.L1 51 | + L2_reg * mlp.L2_sqr 52 | ) 53 | 54 | pred = pred_multiclass(mlp.output) 55 | 56 | errors = pred_error(pred, it) 57 | 58 | params = flatten(mlp.params) 59 | 60 | 61 | print "training the MLP with rmsprop" 62 | optimize(dataset=dataset, 63 | inputs=inputs, 64 | cost=cost, 65 | params=params, 66 | errors=errors, 67 | n_epochs=1000, 68 | batch_size=20, 69 | patience=5000, 70 | patience_increase=1.5, 71 | improvement_threshold=0.995, 72 | optimizer="rmsprop") 73 | 74 | print "compiling the prediction function" 75 | predict = theano.function(inputs=[x], outputs=pred) 76 | 77 | print "predicting the first 10 samples of the test dataset" 78 | print "predict:", predict(mnist[2][0][0:10]) 79 | print "answer: ", mnist[2][1][0:10] -------------------------------------------------------------------------------- /examples/mlp-mnist-save.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print "An MLP on MNIST." 18 | print "loading MNIST" 19 | mnist = datasets.mnist() 20 | 21 | print "loading data to the GPU" 22 | dataset = load_data(mnist) 23 | 24 | print "creating the MLP" 25 | x = T.matrix('x') # input 26 | t = T.vector('t') # targets 27 | inputs = [x, t] 28 | # cast to an int. needs to be initially a float to load to the GPU 29 | it = t.astype('int64') 30 | 31 | rng = numpy.random.RandomState(int(time.time())) # random number generator 32 | 33 | # construct the MLP class 34 | mlp = MLP( 35 | rng=rng, 36 | input=x, 37 | n_in=28 * 28, 38 | n_hidden=500, 39 | n_out=10 40 | ) 41 | 42 | # regularization 43 | L1_reg=0.00 44 | L2_reg=0.0001 45 | 46 | # cost function 47 | cost = ( 48 | nll_multiclass(mlp.output, it) 49 | + L1_reg * mlp.L1 50 | + L2_reg * mlp.L2_sqr 51 | ) 52 | 53 | pred = pred_multiclass(mlp.output) 54 | 55 | errors = pred_error(pred, it) 56 | 57 | params = flatten(mlp.params) 58 | 59 | 60 | print "training the MLP with rmsprop" 61 | losses = optimize( 62 | dataset=dataset, 63 | inputs=inputs, 64 | cost=cost, 65 | params=params, 66 | errors=errors, 67 | n_epochs=2, 68 | batch_size=20, 69 | patience=5000, 70 | patience_increase=1.5, 71 | improvement_threshold=0.995, 72 | optimizer='rmsprop' 73 | ) 74 | 75 | print "compiling the prediction function" 76 | predict = theano.function(inputs=[x], outputs=pred) 77 | 78 | print "predicting the first 10 samples of the test dataset" 79 | print "predict:", predict(mnist[2][0][0:10]) 80 | print "answer: ", mnist[2][1][0:10] 81 | 82 | 83 | print "saving..." 
84 | numpy.savez("saved.npz", results=losses, params=mlp.params, notes="Just a vanilla mlp on MNIST...") 85 | 86 | -------------------------------------------------------------------------------- /examples/mlp-mnist-sgd.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print "An MLP on MNIST." 18 | print "loading MNIST" 19 | mnist = datasets.mnist() 20 | 21 | print "loading data to the GPU" 22 | dataset = load_data(mnist) 23 | 24 | print "creating the MLP" 25 | x = T.matrix('x') # input 26 | t = T.vector('t') # targets 27 | inputs = [x, t] 28 | 29 | # cast to an int. needs to be initially a float to load to the GPU 30 | it = t.astype('int64') 31 | 32 | rng = numpy.random.RandomState(int(time.time())) # random number generator 33 | 34 | # construct the MLP class 35 | mlp = MLP( 36 | rng=rng, 37 | input=x, 38 | n_in=28 * 28, 39 | n_hidden=500, 40 | n_out=10 41 | ) 42 | 43 | # regularization 44 | L1_reg=0.00 45 | L2_reg=0.0001 46 | 47 | # cost function 48 | cost = ( 49 | nll_multiclass(mlp.output, it) 50 | + L1_reg * mlp.L1 51 | + L2_reg * mlp.L2_sqr 52 | ) 53 | 54 | pred = pred_multiclass(mlp.output) 55 | 56 | errors = pred_error(pred, it) 57 | 58 | params = flatten(mlp.params) 59 | 60 | print "training the MLP with sgd" 61 | optimize(dataset=dataset, 62 | inputs=inputs, 63 | cost=cost, 64 | params=params, 65 | errors=errors, 66 | learning_rate=0.01, 67 | momentum=0.2, 68 | n_epochs=1000, 69 | batch_size=20, 70 | patience=1000, 71 | patience_increase=1.5, 72 | improvement_threshold=0.995, 73 | optimizer="sgd") 74 | 75 | print "compiling the prediction function" 76 | predict = theano.function(inputs=[x], outputs=pred) 77 | 78 | print "predicting the first 10 samples of the test dataset" 79 | print "predict:", predict(mnist[2][0][0:10]) 80 | print "answer: ", mnist[2][1][0:10] -------------------------------------------------------------------------------- /examples/rnn-lag-binary.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.RNN import RNN 8 | from DL.optimizers import optimize 9 | from DL.utils import * 10 | import time 11 | import matplotlib.pyplot as plt 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print "Testing an RNN with binary outputs" 18 | print "Generating lag test data..." 
19 | 20 | n_hidden = 30 21 | n_in = 5 22 | n_out = 2 23 | n_steps = 11 24 | n_seq = 100 25 | 26 | # simple lag test 27 | seq = numpy.random.randn(n_seq, n_steps, n_in) 28 | targets = numpy.zeros((n_seq, n_steps, n_out)) 29 | 30 | # whether lag 1 (dim 3) is greater than lag 2 (dim 0) 31 | targets[:, 2:, 0] = numpy.cast[numpy.int](seq[:, 1:-1, 3] > seq[:, :-2, 0]) 32 | 33 | # whether product of lag 1 (dim 4) and lag 1 (dim 2) 34 | # is less than lag 2 (dim 0) 35 | targets[:, 2:, 1] = numpy.cast[numpy.int]((seq[:, 1:-1, 4] * seq[:, 1:-1, 2]) > seq[:, :-2, 0]) 36 | 37 | # split into training, validation, and test 38 | trainIdx = int(numpy.floor(4./6.*n_seq)) 39 | validIdx = int(numpy.floor(5./6.*n_seq)) 40 | 41 | lagData = ((seq[0:trainIdx,:,:], targets[0:trainIdx,:,:]), 42 | (seq[trainIdx:validIdx,:,:], targets[trainIdx:validIdx,:,:]), 43 | (seq[validIdx:,:,:], targets[validIdx:,:,:])) 44 | 45 | print "loading data to the GPU" 46 | # if you change this to int32, make you change the target tensor type! 47 | dataset = load_data(lagData) 48 | 49 | print "creating the RNN" 50 | x = T.tensor3('x') # input 51 | t = T.tensor3('t') # targets 52 | inputs = [x,t] 53 | # cast to an int. needs to be initially a float to load to the GPU 54 | it = t.astype('int64') 55 | 56 | rng = numpy.random.RandomState(int(time.time())) # random number generator 57 | 58 | rnn = RNN(rng=rng, 59 | input=x, 60 | n_in=n_in, 61 | n_hidden=n_hidden, 62 | n_out=n_out, 63 | activation='tanh', 64 | outputActivation='sigmoid' 65 | ) 66 | 67 | # regularization 68 | L1_reg=0.00 69 | L2_reg=0.0001 70 | 71 | # cost function 72 | cost = ( 73 | nll_binary(rnn.output, it) 74 | + L1_reg * rnn.L1 75 | + L2_reg * rnn.L2_sqr 76 | ) 77 | 78 | pred = pred_binary(rnn.output) 79 | 80 | errors = pred_error(pred, it) 81 | 82 | params = flatten(rnn.params) 83 | 84 | print "training the rnn with rmsprop" 85 | 86 | optimize(dataset=dataset, 87 | inputs=inputs, 88 | cost=cost, 89 | params=params, 90 | errors=errors, 91 | n_epochs=5000, 92 | batch_size=20, 93 | patience=1000, 94 | patience_increase=1.5, 95 | improvement_threshold=0.995, 96 | optimizer="rmsprop") 97 | 98 | print "compiling the prediction function" 99 | 100 | predict = theano.function(inputs=[x], outputs=rnn.output) 101 | 102 | print "predicting the first 10 samples of the training dataset" 103 | 104 | seqs = xrange(10) 105 | for seq_num in seqs: 106 | fig = plt.figure() 107 | ax1 = plt.subplot(211) 108 | plt.plot(seq[seq_num]) 109 | ax1.set_title('input') 110 | ax2 = plt.subplot(212) 111 | true_targets = plt.step(xrange(n_steps), targets[seq_num], marker='o') 112 | 113 | guess = predict(seq[seq_num:seq_num+1])[0] 114 | guessed_targets = plt.step(xrange(n_steps), guess) 115 | plt.setp(guessed_targets, linestyle='--', marker='d') 116 | for i, x in enumerate(guessed_targets): 117 | x.set_color(true_targets[i].get_color()) 118 | ax2.set_ylim((-0.1, 1.1)) 119 | ax2.set_title('solid: true output, dashed: model output (prob)') 120 | 121 | plt.show() -------------------------------------------------------------------------------- /examples/rnn-lag-real.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.RNN import RNN 8 | from DL.optimizers import optimize 9 | from DL.utils import * 10 | import time 11 | import matplotlib.pyplot as plt 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print 
"Testing an RNN with linear outputs" 18 | print "Generating lag test data..." 19 | n_hidden = 30 20 | n_in = 5 21 | n_out = 3 22 | n_steps = 11 23 | n_seq = 100 24 | 25 | numpy.random.seed(0) 26 | # simple lag test 27 | seq = numpy.random.randn(n_seq, n_steps, n_in) 28 | targets = numpy.zeros((n_seq, n_steps, n_out)) 29 | 30 | targets[:, 1:, 0] = seq[:, :-1, 3] # delayed 1 31 | targets[:, 1:, 1] = seq[:, :-1, 2] # delayed 1 32 | targets[:, 2:, 2] = seq[:, :-2, 0] # delayed 2 33 | 34 | targets += 0.01 * numpy.random.standard_normal(targets.shape) 35 | 36 | # split into training, validation, and test 37 | trainIdx = int(numpy.floor(4./6.*n_seq)) 38 | validIdx = int(numpy.floor(5./6.*n_seq)) 39 | 40 | lagData = ((seq[0:trainIdx,:,:], targets[0:trainIdx,:,:]), 41 | (seq[trainIdx:validIdx,:,:], targets[trainIdx:validIdx,:,:]), 42 | (seq[validIdx:,:,:], targets[validIdx:,:,:])) 43 | 44 | print "loading data to the GPU" 45 | dataset = load_data(lagData) 46 | 47 | print "creating the RNN" 48 | x = T.tensor3('x') # input 49 | t = T.tensor3('t') # targets 50 | inputs = [x,t] 51 | rng = numpy.random.RandomState(int(time.time())) # random number generator 52 | 53 | rnn = RNN(rng=rng, 54 | input=x, 55 | n_in=n_in, 56 | n_hidden=n_hidden, 57 | n_out=n_out, 58 | activation='tanh', 59 | outputActivation='linear' 60 | ) 61 | 62 | # regularization 63 | L1_reg=0.00 64 | L2_reg=0.0001 65 | 66 | # cost function 67 | cost = ( 68 | mse(rnn.output, t) 69 | + L1_reg * rnn.L1 70 | + L2_reg * rnn.L2_sqr 71 | ) 72 | 73 | errors = mse(rnn.output, t) 74 | 75 | params = flatten(rnn.params) 76 | 77 | print "training the rnn with rmsprop" 78 | 79 | optimize(dataset=dataset, 80 | inputs=inputs, 81 | cost=cost, 82 | params=params, 83 | errors=errors, 84 | n_epochs=5000, 85 | batch_size=20, 86 | patience=1000, 87 | patience_increase=1.5, 88 | improvement_threshold=0.995, 89 | optimizer="rmsprop") 90 | 91 | print "compiling the prediction function" 92 | 93 | predict = theano.function(inputs=[x], outputs=rnn.output) 94 | 95 | print "predicting the first sample of the training dataset" 96 | 97 | fig = plt.figure() 98 | ax1 = plt.subplot(211) 99 | plt.plot(seq[0]) 100 | ax1.set_title('input') 101 | 102 | ax2 = plt.subplot(212) 103 | true_targets = plt.plot(targets[0]) 104 | 105 | guess = predict(seq[0:1])[0] 106 | guessed_targets = plt.plot(guess, linestyle='--') 107 | for i, x in enumerate(guessed_targets): 108 | x.set_color(true_targets[i].get_color()) 109 | 110 | ax2.set_title('solid: true output, dashed: model output') 111 | plt.show() 112 | -------------------------------------------------------------------------------- /examples/rnn-lag-softmax.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.RNN import RNN 8 | from DL.optimizers import optimize 9 | from DL.utils import * 10 | import time 11 | import matplotlib.pyplot as plt 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print "Testing an RNN with softmax outputs" 18 | print "Generating lag test data..." 
19 | 20 | n_hidden = 100 21 | n_in = 5 22 | n_steps = 10 23 | n_seq = 100 24 | n_classes = 3 25 | n_out = n_classes # restricted to single softmax per time step 26 | 27 | # simple lag test 28 | seq = numpy.random.randn(n_seq, n_steps, n_in) 29 | targets = numpy.zeros((n_seq, n_steps), dtype=numpy.int) 30 | 31 | thresh = 0.5 32 | # if lag 1 (dim 3) is greater than lag 2 (dim 0) + thresh 33 | # class 1 34 | # if lag 1 (dim 3) is less than lag 2 (dim 0) - thresh 35 | # class 2 36 | # if lag 2(dim0) - thresh <= lag 1 (dim 3) <= lag2(dim0) + thresh 37 | # class 0 38 | targets[:, 2:][seq[:, 1:-1, 3] > seq[:, :-2, 0] + thresh] = 1 39 | targets[:, 2:][seq[:, 1:-1, 3] < seq[:, :-2, 0] - thresh] = 2 40 | #targets[:, 2:, 0] = numpy.cast[numpy.int](seq[:, 1:-1, 3] > seq[:, :-2, 0]) 41 | 42 | # split into training, validation, and test 43 | trainIdx = int(numpy.floor(4./6.*n_seq)) 44 | validIdx = int(numpy.floor(5./6.*n_seq)) 45 | 46 | lagData = ((seq[0:trainIdx,:,:], targets[0:trainIdx,:]), 47 | (seq[trainIdx:validIdx,:,:], targets[trainIdx:validIdx,:]), 48 | (seq[validIdx:,:,:], targets[validIdx:,:])) 49 | 50 | print "loading data to the GPU" 51 | dataset = load_data(lagData) 52 | 53 | print "creating the RNN" 54 | x = T.tensor3('x') # input 55 | t = T.matrix('t') # targets 56 | inputs = [x,t] 57 | # cast to an int. needs to be initially a float to load to the GPU 58 | it = t.astype('int64') 59 | 60 | rng = numpy.random.RandomState(int(time.time())) # random number generator 61 | 62 | rnn = RNN(rng=rng, 63 | input=x, 64 | n_in=n_in, 65 | n_hidden=n_hidden, 66 | n_out=n_out, 67 | activation='tanh', 68 | outputActivation='softmax' 69 | ) 70 | 71 | # regularization 72 | L1_reg=0.00 73 | L2_reg=0.0001 74 | 75 | # cost function 76 | cost = ( 77 | nll_multiclass_timeseries(rnn.output, it) 78 | + L1_reg * rnn.L1 79 | + L2_reg * rnn.L2_sqr 80 | ) 81 | 82 | pred = pred_multiclass(rnn.output) 83 | 84 | errors = pred_error(pred, it) 85 | 86 | params = flatten(rnn.params) 87 | 88 | print "training the rnn with rmsprop" 89 | 90 | optimize(dataset=dataset, 91 | inputs=inputs, 92 | cost=cost, 93 | params=params, 94 | errors=errors, 95 | n_epochs=5000, 96 | batch_size=100, 97 | patience=1000, 98 | patience_increase=2., 99 | improvement_threshold=0.9995, 100 | optimizer="rmsprop") 101 | 102 | print "compiling the prediction function" 103 | 104 | predict = theano.function(inputs=[x], outputs=rnn.output) 105 | 106 | print "predicting the first 10 samples of the training dataset" 107 | 108 | seqs = xrange(10) 109 | for seq_num in seqs: 110 | fig = plt.figure() 111 | ax1 = plt.subplot(211) 112 | plt.plot(seq[seq_num]) 113 | ax1.set_title('input') 114 | ax2 = plt.subplot(212) 115 | 116 | # blue line will represent true classes 117 | true_targets = plt.step(xrange(n_steps), targets[seq_num], marker='o') 118 | 119 | # show probabilities (in b/w) output by model 120 | guess = predict(seq[seq_num:seq_num+1])[0] 121 | guessed_probs = plt.imshow(guess.T, interpolation='nearest', cmap='gray') 122 | ax2.set_title('blue: true class, grayscale: probs assigned by model') 123 | 124 | plt.show() -------------------------------------------------------------------------------- /theano-tests/diag.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | 7 | x = T.matrix() 8 | y = T.vector() 9 | 10 | d = T.diag(x) 11 | z = T.diag(y) 12 | 13 | 14 | matrixDiags = theano.function(inputs=[x], outputs=d) 15 | vectorDiag 
= theano.function(inputs=[y], outputs=z) 16 | 17 | print matrixDiags([[1,2,3],[4,5,6], [7,8,9]]) 18 | print vectorDiag([1,2,3]) 19 | -------------------------------------------------------------------------------- /theano-tests/dot.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | 8 | 9 | y = T.vector() 10 | n_x = 2 11 | n_y = 3 12 | W_xy = numpy.random.randn(n_x, n_y) 13 | z = T.dot(W_xy, y) 14 | 15 | dot = theano.function(inputs=[y], outputs=z) 16 | 17 | print dot(numpy.random.randn(n_y)) 18 | 19 | 20 | q = T.vector() 21 | w = T.vector() 22 | 23 | vecDot = theano.function(inputs=[q,w], outputs=T.dot(q,w)) 24 | vecDot2 = theano.function(inputs=[q,w], outputs=T.dot(w,q)) 25 | vecMult = theano.function(inputs=[q,w], outputs=w*q) 26 | vecMult2 = theano.function(inputs=[q,w], outputs=q*w) 27 | 28 | 29 | print vecDot([1,2], [2,3]) 30 | print vecDot2([1,2], [2,3]) 31 | print vecMult([1,2], [2,3]) 32 | print vecMult2([1,2], [2,3]) 33 | -------------------------------------------------------------------------------- /theano-tests/embedding-indexing.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.utils import * 8 | 9 | input = T.imatrix('input') 10 | 11 | n_in = 5 12 | n_out = 4 13 | 14 | W = theano.shared(numpy.random.randn(n_in, n_out)) 15 | 16 | theano.printing.debugprint(input.shape[:-1], print_type=True) 17 | 18 | shape = T.concatenate([input.shape, [n_out]]) 19 | 20 | theano.printing.debugprint(shape, print_type=True) 21 | 22 | 23 | output = W[input.flatten()].reshape(shape, ndim=3) 24 | 25 | r = theano.function(inputs=[input], outputs=output) 26 | 27 | s = theano.function(inputs=[input], outputs=shape) 28 | 29 | # 1 example 30 | # 2 timesteps 31 | # ints representing 32 | print r([ 33 | [0,1,3,2,4], 34 | [0,1,3,2,4] 35 | ]) -------------------------------------------------------------------------------- /theano-tests/forward-feed-column-pooling.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | import operator 8 | from utils import * 9 | 10 | x = T.vector() 11 | y = T.vector() 12 | z = T.vector() 13 | 14 | X = T.stacklists([x,y,z]) 15 | 16 | M = T.max(X, axis=0) 17 | 18 | together = theano.function(inputs=[x,y,z], outputs=X) 19 | maximum = theano.function(inputs=[x,y,z], outputs=M) 20 | 21 | print together([1,2,3],[4,5,6], [7,8,9]) 22 | print maximum([1,2,3],[4,5,6], [7,8,9]) 23 | -------------------------------------------------------------------------------- /theano-tests/mean-pooling.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | import operator 8 | from DL.utils import * 9 | 10 | 11 | n_examples = 4 12 | maxlen = 3 13 | dim_proj = 2 14 | 15 | mask = numpy.random.randn(n_examples, maxlen) 16 | z = numpy.random.randn(n_examples, maxlen, dim_proj) 17 | 18 | 19 | # only get the active and mean mool. 
20 | 21 | # mask[:, :, None].shape = (n_examples, maxlen, 1) 22 | # (z * mask[:, :, None]).shape = (n_examples, maxlen, dim_proj) 23 | # (z * mask[:, :, None]).sum(axis=1).shape = (n_examples, dim_proj) 24 | z = (z * mask[:, :, None]).sum(axis=1) 25 | # mask.sum(axis=1).shape = (n_examples,) 26 | # mask.sum(axis=1)[:, None].shape = (n_examples,1) 27 | z = z / mask.sum(axis=1)[:, None] 28 | # z is now (n_examples, dim_proj) -------------------------------------------------------------------------------- /theano-tests/random-streams-scan-clone.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy 4 | import time 5 | 6 | srng = T.shared_randomstreams.RandomStreams(int(time.time())) 7 | 8 | 9 | input = T.vector('input') 10 | inputs = T.matrix('inputs') 11 | 12 | 13 | def dropout(srng, dropout_rate, inp, size=None): 14 | if size is None: 15 | size = inp.shape 16 | # mask = srng.binomial(n=1, p=1-dropout_rate, size=(10,), dtype=theano.config.floatX) 17 | mask = srng.binomial(n=1, p=1-dropout_rate, size=size, dtype=theano.config.floatX) 18 | out = inp * mask 19 | return out 20 | 21 | output = dropout(srng, 0.5, input, inputs[0].shape) 22 | 23 | def step(x_t): 24 | upd = [(input, x_t)] 25 | y_t = theano.clone(output, replace=upd) 26 | return y_t 27 | 28 | outputs, _ = theano.scan(step, sequences=inputs, outputs_info=[None]) 29 | 30 | predict = theano.function(inputs=[inputs], outputs=outputs) 31 | 32 | print predict(numpy.random.randn(2,10)) 33 | 34 | g = theano.function(inputs=[inputs, outputs], outputs=T.grad(outputs, inputs)) 35 | 36 | print g 37 | 38 | # # define tensor variables 39 | # X = T.matrix("X") 40 | # W = T.matrix("W") 41 | # b_sym = T.vector("b_sym") 42 | 43 | # # define shared random stream 44 | # trng = T.shared_randomstreams.RandomStreams(1234) 45 | # d=trng.binomial(size=W[1].shape) 46 | 47 | # results, updates = theano.scan(lambda v: T.tanh(T.dot(v, W) + b_sym) * d, sequences=X) 48 | # compute_with_bnoise = theano.function(inputs=[X, W, b_sym], outputs=[results], 49 | # updates=updates, allow_input_downcast=True) 50 | # x = numpy.eye(10, 2, dtype=theano.config.floatX) 51 | # w = numpy.ones((2, 2), dtype=theano.config.floatX) 52 | # b = numpy.ones((2), dtype=theano.config.floatX) 53 | 54 | # print compute_with_bnoise(x, w, b) -------------------------------------------------------------------------------- /theano-tests/rnn-dropout.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy 4 | import time 5 | 6 | # Make a simple hidden layer: 7 | n_in = 10 8 | n_out = 20 9 | dropout_rate = 0.5 10 | 11 | rng = numpy.random.RandomState(int(time.time())) # random number generator 12 | 13 | input = T.matrix('input') 14 | 15 | W_values = numpy.asarray( 16 | rng.uniform( 17 | low=-numpy.sqrt(6. / (n_in + n_out)), 18 | high=numpy.sqrt(6. 
/ (n_in + n_out)), 19 | size=(n_in, n_out) 20 | ), 21 | dtype=theano.config.floatX 22 | ) 23 | 24 | W = theano.shared(value=W_values, name='W', borrow=True) 25 | 26 | b_values = numpy.zeros((n_out,), dtype=theano.config.floatX) 27 | b = theano.shared(value=b_values, name='b', borrow=True) 28 | 29 | output = T.tanh(T.dot(input, W) + b) 30 | 31 | updates = {} 32 | if dropout_rate > 0: 33 | # p=1-p because 1's indicate keep and p is prob of dropping 34 | srng = theano.tensor.shared_randomstreams.RandomStreams(int(time.time())) 35 | # mask = T.imatrix('mask') 36 | mask = srng.binomial(n=1, p=1-dropout_rate, size=output.shape) 37 | # The cast is important because int * float32 = float64 which pulls things off the gpu 38 | output = output * T.cast(mask, theano.config.floatX) 39 | updates = srng.updates() 40 | 41 | 42 | x = T.tensor3('x') 43 | 44 | def step(x_t, x_tm1): 45 | replace = [(input, x_t)] 46 | replace += updates 47 | x_tp1 = theano.clone(output, replace=replace) 48 | return x_tp1 + x_tm1 49 | 50 | z, _ = theano.scan(step, sequences=[x[1:]], outputs_info=[x[0]]) 51 | 52 | print z 53 | 54 | predit = theano.function(inputs=[x], outputs=z) -------------------------------------------------------------------------------- /theano-tests/tensor-shape-append.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy 4 | 5 | 6 | # x = T.matrix('x') 7 | # y = T.tensor3('y') 8 | 9 | # sx = x.shape 10 | # sy = y.shape 11 | 12 | # shape2 = theano.function(inputs=[x], outputs=sx) 13 | # shape3 = theano.function(inputs=[y], outputs=sy) 14 | 15 | # print shape2(numpy.random.randn(2,3)) 16 | # print shape3(numpy.random.randn(2,3,4)) 17 | 18 | # raise 19 | 20 | x = T.matrix('x') 21 | y = T.tensor3('y') 22 | 23 | sx = x.shape 24 | sy = y.shape 25 | 26 | sx = T.concatenate([sx[:-1], [10]]) 27 | sy = T.concatenate([sy[:-1], [10]]) 28 | 29 | shape2 = theano.function(inputs=[x], outputs=sx) 30 | shape3 = theano.function(inputs=[y], outputs=sy) 31 | 32 | print shape2(numpy.random.randn(2,3)) 33 | print shape3(numpy.random.randn(2,3,4)) --------------------------------------------------------------------------------
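
A small numpy sanity check of the trick that embedding-indexing.py and tensor-shape-append.py exercise together: look up rows of an embedding matrix with a flattened integer matrix, then reshape back with the embedding width appended to the input shape. This is only an illustrative sketch with made-up sizes (W, idx, n_in, n_out here are placeholders, not part of the DL package); plain numpy fancy indexing gives the same result in one step.

import numpy

n_in = 5   # vocabulary size
n_out = 4  # embedding width

W = numpy.random.randn(n_in, n_out)          # embedding matrix
idx = numpy.random.randint(0, n_in, (2, 3))  # (n_examples, n_timesteps) word ids

# flatten the ids, index rows of W, then restore the leading dimensions
out = W[idx.flatten()].reshape(idx.shape + (n_out,))

# fancy indexing is equivalent
assert numpy.allclose(out, W[idx])

print out.shape  # (2, 3, 4)

The symbolic version in embedding-indexing.py has to build the target shape with T.concatenate because the input shape is only known at run time, which is the same pattern tensor-shape-append.py exercises.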