├── .gitignore ├── DL.md ├── DL ├── .gitignore ├── DL │ ├── __init__.py │ ├── datasets │ │ └── __init__.py │ ├── models │ │ ├── DBN.py │ │ ├── EmbeddingLayer.py │ │ ├── ForwardFeed.py │ │ ├── HiddenLayer.py │ │ ├── LSTM.py │ │ ├── MLP.py │ │ ├── RNN.py │ │ └── __init__.py │ ├── optimizers │ │ ├── __init__.py │ │ ├── adadelta.py │ │ ├── rmsprop.py │ │ └── sgd.py │ └── utils.py └── setup.py ├── README.md ├── THEANO.md ├── examples ├── dbn-mnist.py ├── lstm-imdb.py ├── mlp-mnist-adadelta.py ├── mlp-mnist-dropout.py ├── mlp-mnist-load.py ├── mlp-mnist-rmsprop.py ├── mlp-mnist-save.py ├── mlp-mnist-sgd.py ├── rnn-lag-binary.py ├── rnn-lag-real.py └── rnn-lag-softmax.py └── theano-tests ├── diag.py ├── dot.py ├── embedding-indexing.py ├── forward-feed-column-pooling.py ├── mean-pooling.py ├── random-streams-scan-clone.py ├── rnn-dropout.py └── tensor-shape-append.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.DS_Store 2 | *.pyc 3 | *.pkl 4 | *.gz 5 | *.npz -------------------------------------------------------------------------------- /DL.md: -------------------------------------------------------------------------------- 1 | # Research Review 2 | 3 | Neural Networks (NNs) have been studied for decades. But it wasn't until 1986 that an efficient method for training them, called [backpropagation][1], was discovered. The idea behind it is simple and intuitive - use the chain rule to propagate error derivatives backwards through the network. 4 | 5 | A multilayer perceptron (MLP) with one hidden layer has been proven to be a [universal approximator][2]. That means an MLP can represent any arbitrary function if the hidden layer has enough units. However, it is extremely challenging to learn a set of parameters that generalizes well. Thus, researchers resorted to designing NN architectures that are more specific to certain problems. 6 | 7 | The first big success for NNs was the Convolutional Neural Network (CNN). It was designed to be used by the US Postal Service for [zipcode recognition][3]. CNNs are deep NNs, meaning they have more than one hidden layer. They also involve shared weights that are convolved against the previous layer. These networks have been [wildly successful for image recognition][4] by [hierarchically learning and composing low-level features into successively higher-level features][5]. 8 | 9 | 10 | Deep neural networks were still out of reach for a while: problems with the gradients. Autoencoders. Then momentum, RMSProp, etc. 11 | 12 | RNNs for NLP. Vanishing gradients. LSTM for long-term dependencies. 13 | 14 | Text Embedding. 15 | 16 | 17 | 18 | --- 19 | 20 | NNs are... They have certain properties... 21 | 22 | In 2006... autoencoders... pretraining... newer methods that don't need pretraining, but the concept is still the same. 23 | 24 | Can we bring this concept to RNNs for robotics? 25 | 26 | Generalization - NLP embedding 27 | 28 | 29 | 30 | 31 | --- 32 | 33 | "Unsupervised State Estimation using Deep Learning for Interactive Object Recognition" 34 | 35 | Outline: 36 | 37 | Deep learning: 38 | - unsupervised learning 39 | - autoencoders for pretraining 40 | - recurrent neural networks 41 | - recurrent autoencoder for ASR 42 | 43 | Goal: 44 | - unsupervised learning for state estimation 45 | - pretraining for supervised learning on the hidden state 46 | 47 | Models: 48 | - RNN for unsupervised 49 | - ARNN for unsupervised 50 | 51 | Train to predict the next observation.
Using the hidden state variables, do a supervised prediction of the die from h_t. For each action, given what we expect to see next, compute the likelihood for each die. Take the action that leads to the minimum entropy over these guesses. This is the optimal action. 52 | 53 | Likely to overfit. Use regularization and dropout. 54 | 55 | - RNN for supervised 56 | - ARNN for supervised 57 | 58 | Predict the object likelihood directly as opposed to going through this unsupervised middleman. How does the performance compare? 59 | 60 | Other Problems: 61 | 62 | Try to use this model on LiDAR SLAM to predict the room and navigate between rooms. 63 | 64 | 65 | 66 | [1]: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=JicYPdAAAAAJ&citation_for_view=JicYPdAAAAAJ:GFxP56DSvIMC 67 | [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.101.2647&rep=rep1&type=pdf 68 | [3]: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=WLN3QrAAAAAJ&citation_for_view=WLN3QrAAAAAJ:u-x6o8ySG0sC 69 | [4]: http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf 70 | [5]: http://ftp.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf 71 | 72 | [topo]: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/ 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | How could we use embedding to learn the 3D generalization of actions? 99 | You can't quite do it in this case because all the labels are entirely interchangeable based on a new experience. So what if, for every experience, we check whether we can predict correctly, and otherwise we train up a new RNN? Can we prove that this can generalize 3D geometries? Use embedding to give EVERY example its own unique set of observables and use embedding to get that down to a reasonable dimension! 100 | 101 | For one trial we have a 3 by 6 matrix to project into 3D. We run it through the USE. The predictions use the projection matrix transposed. Thus we have an "internal" model that always stays the same between trials. And for each trial, we need to learn its own projections into the internal model. Thus we fit experiences to the internal "notion" of reality. 102 | 103 | Thus we can compare experiences using these projections and learn to predict which die / experience we are closest to. My fear is that this will not learn a multimodal distribution of predictions at the beginning of a new trial. So maybe we must enforce that. 104 | 105 | Now using the 2D SLAM example, we can focus in on a different part of the problem. For 2D SLAM, the observations are always of the same nature, so we don't need to embed the observations for the reasons of the dice problem. But now we need to think about how we represent a multimodal distribution. The problem exists in both. How do we represent the fact that we may be in two places at once? I think the point of NNs is that we get this for free...
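To make the action-selection rule in the outline above concrete: pick the action whose predicted next observation leaves the lowest-entropy belief over which die we are handling. A minimal numpy sketch, assuming hypothetical `predict_next_obs` and `die_posterior` helpers that stand in for the RNN's next-observation prediction and the supervised read-out from h_t (neither exists in this repo yet):

```python
import numpy

def entropy(p):
    # Shannon entropy (in nats) of a discrete belief over the dice
    p = numpy.asarray(p, dtype=float)
    return -numpy.sum(p * numpy.log(p + 1e-12))

def choose_action(actions, h_t, predict_next_obs, die_posterior):
    # predict_next_obs(h_t, a): hypothetical RNN guess of the next observation
    # die_posterior(h_t, a, o): hypothetical supervised read-out p(die | h_t, a, o)
    best_action, best_entropy = None, numpy.inf
    for a in actions:
        expected_obs = predict_next_obs(h_t, a)        # what we expect to see next
        p_die = die_posterior(h_t, a, expected_obs)    # belief over the dice given that guess
        h = entropy(p_die)
        if h < best_entropy:                           # the most peaked belief wins
            best_action, best_entropy = a, h
    return best_action
```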
106 | 107 | TODO: -------------------------------------------------------------------------------- /DL/.gitignore: -------------------------------------------------------------------------------- 1 | *.egg-info 2 | dist/ 3 | *.pyc 4 | *.pkl 5 | *.gz -------------------------------------------------------------------------------- /DL/DL/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ccorcos/deep-learning/df5e3072077460d72d3281724cacefa97a3b2dfd/DL/DL/__init__.py -------------------------------------------------------------------------------- /DL/DL/datasets/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | import numpy 3 | import cPickle as pickle 4 | import os 5 | import urllib 6 | import gzip 7 | from ..utils import untuple 8 | 9 | datasetPath = '/'.join((__file__.split('/')[:-1]+[''])) 10 | 11 | def getDataset(name, url): 12 | name = datasetPath + name 13 | if not os.path.isfile(name): 14 | print "Retieving dataset from %s" % (url) 15 | urllib.urlretrieve(url, name) 16 | 17 | if not os.path.isfile(name): 18 | print "Cannot find dataset %s" % (name) 19 | 20 | if name[-2:] == 'gz': 21 | f = gzip.open(name, 'rb') 22 | data = pickle.load(f) 23 | f.close() 24 | return data 25 | else: 26 | f = open(name, 'rb') 27 | data = pickle.load(f) 28 | f.close() 29 | return data 30 | 31 | def mnist(): 32 | dataset = getDataset('mnist.pkl.gz', 'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz') 33 | return untuple(dataset) 34 | 35 | 36 | def imdb(validation_ratio=0.1, vocabulary_size=10000, maxlen=100): 37 | """ 38 | validation_ratio: ratio of training data set aside for validation 39 | vocabulary_size: Vocabulary size. Assuming the larger the word number, 40 | the less often it occurs. Unknown words are set to 1 41 | maxlen: Sequence longer then this get ignored 42 | """ 43 | 44 | train_set = getDataset('imdb.pkl', 'http://www.iro.umontreal.ca/~lisa/deep/data/imdb.pkl') 45 | test_set = getDataset('imdb.pkl', 'http://www.iro.umontreal.ca/~lisa/deep/data/imdb.pkl') 46 | 47 | # filter out the sequences longer than maxlen 48 | new_train_set_x = [] 49 | new_train_set_y = [] 50 | for x, y in zip(train_set[0], train_set[1]): 51 | if len(x) < maxlen: 52 | new_train_set_x.append(x) 53 | new_train_set_y.append(y) 54 | train_set = (new_train_set_x, new_train_set_y) 55 | del new_train_set_x, new_train_set_y 56 | 57 | 58 | # split training set into validation set 59 | train_set_x, train_set_y = train_set 60 | n_samples = len(train_set_x) 61 | sample_idx = numpy.random.permutation(n_samples) 62 | n_train = int(numpy.round(n_samples * (1. 
- validation_ratio))) 63 | valid_set_x = [train_set_x[s] for s in sample_idx[n_train:]] 64 | valid_set_y = [train_set_y[s] for s in sample_idx[n_train:]] 65 | train_set_x = [train_set_x[s] for s in sample_idx[:n_train]] 66 | train_set_y = [train_set_y[s] for s in sample_idx[:n_train]] 67 | train_set = (train_set_x, train_set_y) 68 | valid_set = (valid_set_x, valid_set_y) 69 | 70 | # all words outside the vocabulary are set to 1 71 | removeUnknownWords = lambda x: [[1 if word >= vocabulary_size else word for word in review] for review in x] 72 | 73 | test_set_x, test_set_y = test_set 74 | valid_set_x, valid_set_y = valid_set 75 | train_set_x, train_set_y = train_set 76 | 77 | train_set_x = removeUnknownWords(train_set_x) 78 | valid_set_x = removeUnknownWords(valid_set_x) 79 | test_set_x = removeUnknownWords(test_set_x) 80 | 81 | # sort the sequences by their length 82 | sortLength = lambda sequences: sorted(range(len(sequences)), key=lambda x: len(sequences[x])) 83 | 84 | sorted_index = sortLength(test_set_x) 85 | test_set_x = [test_set_x[i] for i in sorted_index] 86 | test_set_y = [test_set_y[i] for i in sorted_index] 87 | 88 | sorted_index = sortLength(valid_set_x) 89 | valid_set_x = [valid_set_x[i] for i in sorted_index] 90 | valid_set_y = [valid_set_y[i] for i in sorted_index] 91 | 92 | sorted_index = sortLength(train_set_x) 93 | train_set_x = [train_set_x[i] for i in sorted_index] 94 | train_set_y = [train_set_y[i] for i in sorted_index] 95 | 96 | # gather the dataset again 97 | train = (train_set_x, train_set_y) 98 | valid = (valid_set_x, valid_set_y) 99 | test = (test_set_x, test_set_y) 100 | 101 | dataset = [train, valid, test] 102 | return untuple(dataset) 103 | 104 | -------------------------------------------------------------------------------- /DL/DL/models/DBN.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | # import theano 5 | # import theano.tensor as T 6 | # import numpy 7 | from ForwardFeed import ForwardFeed 8 | from HiddenLayer import HiddenLayer 9 | from ..utils import * 10 | 11 | class DBN(object): 12 | """Deep Belief Network Class 13 | 14 | A Deep Belief network is a feedforward artificial neural network model 15 | that has many layers of hidden units and nonlinear activations. 16 | """ 17 | 18 | def __init__(self, rng, input, n_in, n_out, layer_sizes=[], dropout_rate=0, srng=None, activation='tanh', outputActivation='softmax', params=None): 19 | """Initialize the parameters for the multilayer perceptron 20 | 21 | rng: random number generator, e.g. 
numpy.random.RandomState(1234) 22 | 23 | input: theano.tensor matrix of shape (n_examples, n_in) 24 | 25 | n_in: int, dimensionality of input 26 | 27 | layer_sizes: array of ints, dimensionality of the hidden layers 28 | 29 | n_out: int, number of hidden units 30 | 31 | dropout_rate: float, if dropout_rate is non zero, then we implement a Dropout in the hidden layer 32 | 33 | activation: string, nonlinearity to be applied in the hidden layer 34 | """ 35 | 36 | ff = ForwardFeed( 37 | rng=rng, 38 | input=input, 39 | layer_sizes=[n_in] + layer_sizes, 40 | activation=activation, 41 | params=maybe(lambda: params[0]), 42 | dropout_rate=dropout_rate, 43 | srng=srng, 44 | ) 45 | 46 | outputLayer = HiddenLayer( 47 | rng=rng, 48 | input=ff.output, 49 | n_in=layer_sizes[-1], 50 | n_out=n_out, 51 | activation=outputActivation, 52 | params=maybe(lambda: params[1]) 53 | ) 54 | 55 | self.layers = [ff, outputLayer] 56 | 57 | self.params = layers_params(self.layers) 58 | self.L1 = layers_L1(self.layers) 59 | self.L2_sqr = layers_L2_sqr(self.layers) 60 | 61 | self.output = outputLayer.output 62 | -------------------------------------------------------------------------------- /DL/DL/models/EmbeddingLayer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from ..utils import * 8 | 9 | class EmbeddingLayer(object): 10 | def __init__(self, rng, input, n_in, n_out, sequenceData=True, onehot=False, params=None): 11 | # sequenceData tells us if the input dimension is (n_examples, n_timesteps) 12 | # or if it is (n_examples) 13 | # onhot tell us us if the input dimension is (n_examples, n_timesteps, n_in) or (n_examples, n_in) 14 | 15 | # the output of uniform if converted using asarray to dtype 16 | # theano.config.floatX so that the code is runable on GPU 17 | # [Xavier10] suggests that you should use 4 times larger initial 18 | # weights for sigmoid compared to tanh. 19 | W = None 20 | if params is not None: 21 | W = params[0] 22 | 23 | if W is None: 24 | W_values = numpy.asarray( 25 | rng.rand(n_in, n_out), 26 | dtype=theano.config.floatX 27 | ) 28 | 29 | W = theano.shared(value=W_values * 0.01, name='W', borrow=True) 30 | 31 | if onehot: 32 | self.output = T.dot(input, W) 33 | else: 34 | # change the last dimension to the projected dimension 35 | shape = T.concatenate([input.shape, [n_out]]) 36 | if sequenceData: 37 | self.output = W[input.flatten()].reshape(shape, ndim=3) 38 | else: 39 | self.output = W[input.flatten()].reshape(shape, ndim=2) 40 | 41 | self.params = [W] 42 | self.L1 = 0 43 | self.L2_sqr = 0 -------------------------------------------------------------------------------- /DL/DL/models/ForwardFeed.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | # import theano 5 | # import theano.tensor as T 6 | # import numpy 7 | from HiddenLayer import HiddenLayer 8 | from ..utils import * 9 | 10 | class ForwardFeed(object): 11 | """ForwardFeed Class 12 | 13 | This is just a chain of hidden layers. 14 | """ 15 | 16 | def __init__(self, rng, input, layer_sizes=[], dropout_rate=0, srng=None, params=None, activation='tanh'): 17 | """Initialize the parameters for the forward feed 18 | 19 | rng: random number generator, e.g. 
numpy.random.RandomState(1234) 20 | 21 | input: theano.tensor matrix of shape (n_examples, n_in) 22 | 23 | layer_sizes: array of ints, dimensionality of each layer size, input to output 24 | 25 | activation: string, nonlinearity to be applied in the hidden layer 26 | """ 27 | 28 | output = input 29 | layers = [] 30 | for i in range(0, len(layer_sizes)-1): 31 | hiddenLayer = HiddenLayer( 32 | rng=rng, 33 | input=output, 34 | params=maybe(lambda: params[i]), 35 | n_in=layer_sizes[i], 36 | n_out=layer_sizes[i+1], 37 | activation=activation) 38 | 39 | h = hiddenLayer.output 40 | if dropout_rate > 0: 41 | assert(srng is not None) 42 | h = dropout(srng, dropout_rate, h) 43 | 44 | output = h 45 | layers.append(hiddenLayer) 46 | 47 | self.layers = layers 48 | self.output = output 49 | 50 | self.params = layers_params(self.layers) 51 | self.L1 = layers_L1(self.layers) 52 | self.L2_sqr = layers_L2_sqr(self.layers) -------------------------------------------------------------------------------- /DL/DL/models/HiddenLayer.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from ..utils import * 8 | 9 | class HiddenLayer(object): 10 | def __init__(self, rng, input, n_in, n_out, params=None, activation='tanh'): 11 | """ 12 | Typical hidden layer of a MLP: units are fully-connected and have 13 | sigmoidal (tanh actually) activation function. Weight matrix W is of shape (n_in, n_out) 14 | and the bias vector b is of shape (n_out,). 15 | 16 | rng: random number generator, e.g. numpy.random.RandomState(1234) 17 | 18 | input: theano.tensor matrix of shape (n_examples, n_in) 19 | 20 | n_in: int, dimensionality of input 21 | 22 | n_out: int, number of hidden units 23 | 24 | activation: string, nonlinearity to be applied in the hidden layer 25 | """ 26 | self.input = input 27 | 28 | # the output of uniform if converted using asarray to dtype 29 | # theano.config.floatX so that the code is runable on GPU 30 | # [Xavier10] suggests that you should use 4 times larger initial 31 | # weights for sigmoid compared to tanh. 32 | W = None 33 | b = None 34 | 35 | if params is not None: 36 | W = params[0] 37 | b = params[1] 38 | 39 | if W is None: 40 | W_values = numpy.asarray( 41 | rng.uniform( 42 | low=-numpy.sqrt(6. / (n_in + n_out)), 43 | high=numpy.sqrt(6. 
/ (n_in + n_out)), 44 | size=(n_in, n_out) 45 | ), 46 | dtype=theano.config.floatX 47 | ) 48 | if activation is 'sigmoid': 49 | W_values *= 4 50 | 51 | W = theano.shared(value=W_values, name='W', borrow=True) 52 | 53 | if b is None: 54 | b_values = numpy.zeros((n_out,), dtype=theano.config.floatX) 55 | b = theano.shared(value=b_values, name='b', borrow=True) 56 | 57 | 58 | self.output_linear = T.dot(input, W) + b 59 | self.output = (self.output_linear if activation is None else activations[activation](self.output_linear)) 60 | 61 | self.params = [W, b] 62 | self.weights = [W] 63 | 64 | self.L1 = compute_L1(self.weights) 65 | self.L2_sqr = compute_L2_sqr(self.weights) -------------------------------------------------------------------------------- /DL/DL/models/LSTM.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from ..utils import * 8 | 9 | """ 10 | 11 | Generic LSTM Architecture 12 | LSTM [Graves 2012] 13 | 14 | x is the input 15 | y is the output 16 | h is the memory cell 17 | 18 | g_i is the input gate 19 | c_i is the input candidate 20 | g_f is the forget gate 21 | g_o is the output gete 22 | 23 | s is the sigmoid function 24 | f is a nonlinear transfer function 25 | 26 | 27 | g_i_t = s(W_i * x_t + U_i * y_tm1 + b_i) 28 | c_i_t = f(W_c * x_t + U_c * y_tm1 + b_c) 29 | 30 | g_f_t = s(W_f * x_t + U_f * y_tm1 + b_f) 31 | h_t = g_i_t * c_i_t + g_f_t * h_tm1 32 | 33 | g_o_t = s(W_o * x_t + U_o * y_tm1 + V_o * h_t + b_o) 34 | y_t = g_o_t * f(h_t) 35 | 36 | """ 37 | 38 | 39 | class LSTM(object): 40 | """ 41 | A simplified LSTM Layer. 42 | http://deeplearning.net/tutorial/lstm.html 43 | 44 | For parallelization: 45 | g_o_t = s(W_o * x_t + U_o * y_tm1 + b_o) 46 | 47 | From the input and the previous output, we can compute the input, output, 48 | and forget gates along with the input candidate. Then we can compute the 49 | hidden memory units and the output. 
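(Note on the implementation below: W and U each hold the weights of all four of g_i, g_f, g_o, and c_i concatenated side by side, giving shape (n_units, 4*n_units), and b stacks the four bias vectors, so a single matrix product per timestep yields every gate pre-activation at once; the `cut` helper then slices the result back apart.)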
50 | 51 | h_tm1 ------ 52 | X ------------- X --▶ y_t 53 | --▶ g_o_t ------ | 54 | y_tm1 -- | | 55 | |-----▶ g_i_t -- | 56 | x_t ------------ | X ----- + --▶ h_t -- 57 | |--▶ c_i_t -- | 58 | | | 59 | --▶ g_f_t ------ | 60 | X -- 61 | h_tm1 ------ 62 | 63 | """ 64 | 65 | def __init__(self, rng, input, mask, n_units, activation='tanh', params=None): 66 | 67 | # LSTM weights 68 | W = None 69 | U = None 70 | b = None 71 | if params is not None: 72 | W = params[0] 73 | U = params[1] 74 | b = params[2] 75 | 76 | if W is None: 77 | # g_i, g_f, g_o, c_i 78 | W_values = numpy.concatenate([ortho_weight(n_units), 79 | ortho_weight(n_units), 80 | ortho_weight(n_units), 81 | ortho_weight(n_units)], axis=1) 82 | 83 | W = theano.shared(value=W_values, name='W', borrow=True) 84 | 85 | if U is None: 86 | U_values = numpy.concatenate([ortho_weight(n_units), 87 | ortho_weight(n_units), 88 | ortho_weight(n_units), 89 | ortho_weight(n_units)], axis=1) 90 | 91 | U = theano.shared(value=U_values, name='U', borrow=True) 92 | 93 | if b is None: 94 | b_values = numpy.zeros((4 * n_units,)).astype(theano.config.floatX) 95 | 96 | b = theano.shared(value=b_values, name='b', borrow=True) 97 | 98 | 99 | # cut out the gates after parallel matrix multiplication 100 | def cut(x, n, dim): 101 | return x[:, n * dim:(n + 1) * dim] 102 | 103 | f = activations[activation] 104 | s = activations['sigmoid'] 105 | def step(mask_t, xWb_t, y_tm1, h_tm1): 106 | # (n_examples, 4*n_units) 107 | pre_activation = T.dot(y_tm1, U) + xWb_t 108 | 109 | g_i_t = s(cut(pre_activation, 0, n_units)) 110 | c_i_t = f(cut(pre_activation, 3, n_units)) 111 | 112 | g_f_t = s(cut(pre_activation, 1, n_units)) 113 | h_t = g_f_t * h_tm1 + g_i_t * c_i_t 114 | # mask for valid inputs 115 | h_t = mask_t[:, None] * h_t + (1. - mask_t)[:, None] * h_tm1 116 | 117 | g_o_t = s(cut(pre_activation, 2, n_units)) 118 | y_t = g_o_t * f(h_t) 119 | # mask for valid inputs 120 | y_t = mask_t[:, None] * y_t + (1. - mask_t)[:, None] * y_tm1 121 | 122 | return y_t, h_t 123 | 124 | 125 | # input is initially (n_examples, n_timesteps, n_units) 126 | # we want to scan over timesteps! 
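# illustrative shapes (assuming, say, 2 examples, 3 timesteps, 4 units):
#   input (2, 3, 4) -> dimshuffle(1,0,2) -> (3, 2, 4) and mask (2, 3) -> (3, 2),
# so theano.scan hands step() one (n_examples, ...) slice per timestep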
127 | input = input.dimshuffle(1,0,2) 128 | # mask is (n_examples, maxlen) 129 | mask = mask.dimshuffle(1,0) 130 | 131 | # timesteps, samples, dimension 132 | # n_timesteps = input.shape[0] 133 | n_samples = input.shape[1] 134 | 135 | 136 | # efficiently compute the input gate, forget gate, 137 | # (n_timesteps, n_examples, 4 * n_units) 138 | xWb = T.dot(input, W) + b 139 | 140 | [y, h], updates = theano.scan(step, 141 | sequences=[mask, xWb], 142 | outputs_info=[T.alloc(numpy.asarray(0., dtype=theano.config.floatX), n_samples, n_units), 143 | T.alloc(numpy.asarray(0., dtype=theano.config.floatX), n_samples, n_units)]) 144 | # n_steps=n_timesteps) 145 | 146 | # swap the dimensions back to (n_examples, n_timesteps, n_units) 147 | h = h.dimshuffle(1,0,2) 148 | y = y.dimshuffle(1,0,2) 149 | 150 | self.params = [U, W, b] 151 | self.weights = [U, W] 152 | self.L1 = compute_L1(self.weights) 153 | self.L2_sqr = compute_L2_sqr(self.weights) 154 | 155 | self.output = y 156 | -------------------------------------------------------------------------------- /DL/DL/models/MLP.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | # import theano 5 | # import theano.tensor as T 6 | # import numpy 7 | from HiddenLayer import HiddenLayer 8 | from ..utils import * 9 | 10 | 11 | class MLP(object): 12 | """Multi-Layer Perceptron Class 13 | 14 | A multilayer perceptron is a feedforward artificial neural network model 15 | that has one layer or more of hidden units and nonlinear activations. 16 | Intermediate layers usually have as activation function tanh or the 17 | sigmoid function while the top layer is a softamx layer. 18 | """ 19 | 20 | def __init__(self, rng, input, n_in, n_hidden, n_out, srng=None, dropout_rate=0, activation='tanh', outputActivation='softmax', params=None): 21 | """Initialize the parameters for the multilayer perceptron 22 | 23 | rng: random number generator, e.g. 
numpy.random.RandomState(1234) 24 | 25 | input: theano.tensor matrix of shape (n_examples, n_in) 26 | 27 | n_in: int, dimensionality of input 28 | 29 | n_hidden: int, number of hidden units 30 | 31 | n_out: int, number of hidden units 32 | 33 | dropout_rate: float, if dropout_rate is non zero, then we implement a Dropout in the hidden layer 34 | 35 | activation: string, nonlinearity to be applied in the hidden layer 36 | 37 | """ 38 | 39 | hiddenLayer = HiddenLayer( 40 | rng=rng, 41 | input=input, 42 | n_in=n_in, 43 | n_out=n_hidden, 44 | activation=activation, 45 | params=maybe(lambda: params[0]) 46 | ) 47 | 48 | h = hiddenLayer.output 49 | if dropout_rate > 0: 50 | assert(srng is not None) 51 | h = dropout(srng, dropout_rate, h) 52 | 53 | outputLayer = HiddenLayer( 54 | rng=rng, 55 | input=h, 56 | n_in=n_hidden, 57 | n_out=n_out, 58 | activation=outputActivation, 59 | params=maybe(lambda: params[1]) 60 | ) 61 | 62 | self.layers = [hiddenLayer, outputLayer] 63 | self.params = layers_params(self.layers) 64 | self.L1 = layers_L1(self.layers) 65 | self.L2_sqr = layers_L2_sqr(self.layers) 66 | 67 | self.output = outputLayer.output 68 | -------------------------------------------------------------------------------- /DL/DL/models/RNN.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from ..utils import * 8 | from HiddenLayer import HiddenLayer 9 | 10 | 11 | class Recurrence(object): 12 | """A Reccurence Class which wraps an architecture into a recurrent one.""" 13 | 14 | def __init__(self, input, input_t, output_t, recurrent_t, recurrent_tm1, recurrent_0, updates=[]): 15 | """Initialize the recurrence class with the input, output, the recurrent variable 16 | and the initial recurrent variable. 17 | 18 | This compute in minibatches, so input is (n_examples, n_timesteps, n_in) 19 | input_t is (n_examples, n_in) 20 | 21 | """ 22 | 23 | # compute the recurrence 24 | def step(x_t, h_tm1): 25 | h_t = theano.clone(recurrent_t, replace=updates + [(input_t, x_t), (recurrent_tm1, h_tm1)]) 26 | y_t = theano.clone(output_t, replace=updates + [(recurrent_t, h_t)]) 27 | return h_t, y_t 28 | 29 | h0_t = T.extra_ops.repeat(recurrent_0[numpy.newaxis, :], input.shape[0], axis=0) 30 | 31 | [h, y], _ = theano.scan(step, 32 | sequences=[input.dimshuffle(1,0,2),], # swap the first two dimensions to scan over n_timesteps 33 | outputs_info=[h0_t, None]) 34 | 35 | # swap the dimensions back to (n_examples, n_timesteps, n_out) 36 | h = h.dimshuffle(1,0,2) 37 | y = y.dimshuffle(1,0,2) 38 | 39 | self.output = y 40 | self.recurrent = h 41 | 42 | 43 | 44 | class RNN(object): 45 | """Recurrent Neural Network Class 46 | 47 | A RNN looks a lot like an MLP but the hidden layer is recurrent, so the hidden 48 | layer receives the input and the itselft at the previous time step. RNNs can have 49 | "deep" transition, inputs, and outputs like so: 50 | 51 | 52 | (n_in) ----▶ (n_hidden) ----▶ (n_out) 53 | ▲ | 54 | | | 55 | | | 56 | -----{t-1}---- 57 | """ 58 | 59 | def __init__(self, rng, input, n_in, n_hidden, n_out, activation='tanh', outputActivation='softmax', params=None): 60 | """Initialize the parameters for the recurrent neural network 61 | 62 | rng: random number generator, e.g. 
numpy.random.RandomState(1234) 63 | 64 | input: theano.tensor matrix of shape (n_examples, n_timesteps, n_in) 65 | 66 | n_in: int, dimensionality of input 67 | 68 | n_hidden: int, number of hidden units 69 | 70 | n_out: int, number of hidden units 71 | 72 | dropout_rate: float, if dropout_rate is non zero, then we implement a Dropout all hidden layers 73 | 74 | activation: string, nonlinearity to be applied in the hidden layer 75 | """ 76 | 77 | # create the h0 prior 78 | h0 = None 79 | if params: 80 | h0 = params[0] 81 | else: 82 | h0_values = numpy.asarray( 83 | rng.uniform( 84 | low=-numpy.sqrt(6. / (n_in + n_out)), 85 | high=numpy.sqrt(6. / (n_in + n_out)), 86 | size=(n_hidden,) 87 | ), 88 | dtype=theano.config.floatX 89 | ) 90 | if activation is 'sigmoid': 91 | h0_values *= 4 92 | 93 | h0 = theano.shared(value=h0_values, name='h0', borrow=True) 94 | 95 | 96 | 97 | # Create the computation graph 98 | h_tm1 = T.matrix('h_tm1') # n_examples, n_hidden @ t-1 99 | x_t = T.matrix('x_t') # n_examples, n_in @ some specific time 100 | 101 | hiddenLayer = HiddenLayer( 102 | rng=rng, 103 | input= T.concatenate([x_t, h_tm1], axis=1), 104 | n_in=n_in+n_hidden, 105 | n_out=n_hidden, 106 | activation=activation, 107 | params=maybe(lambda: params[1]) 108 | ) 109 | 110 | h_t = hiddenLayer.output 111 | 112 | outputLayer = HiddenLayer( 113 | rng=rng, 114 | input=h_t, 115 | n_in=n_hidden, 116 | n_out=n_out, 117 | activation=outputActivation, 118 | params=maybe(lambda: params[1]) 119 | ) 120 | 121 | y_t = outputLayer.output 122 | 123 | self.layers = [hiddenLayer, outputLayer] 124 | self.params = [h0] + layers_params(self.layers) 125 | self.L1 = layers_L1(self.layers) 126 | self.L2_sqr = layers_L2_sqr(self.layers) 127 | 128 | recurrence = Recurrence( 129 | input=input, 130 | input_t=x_t, 131 | output_t=y_t, 132 | recurrent_t=h_t, 133 | recurrent_tm1=h_tm1, 134 | recurrent_0=h0, 135 | ) 136 | 137 | self.output = recurrence.output 138 | self.h = recurrence.recurrent 139 | -------------------------------------------------------------------------------- /DL/DL/models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ccorcos/deep-learning/df5e3072077460d72d3281724cacefa97a3b2dfd/DL/DL/models/__init__.py -------------------------------------------------------------------------------- /DL/DL/optimizers/__init__.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | import time 8 | import random 9 | from ..utils import startTimer 10 | import sgd 11 | import rmsprop 12 | import adadelta 13 | 14 | optimizers = { 15 | 'sgd': sgd.sgd, 16 | 'rmsprop': rmsprop.rmsprop, 17 | 'adadelta': adadelta.adadelta, 18 | } 19 | 20 | def optimize(dataset=None, 21 | inputs=None, 22 | cost=None, 23 | params=None, 24 | errors=None, 25 | n_epochs=1000, 26 | batch_size=20, 27 | patience=10000, 28 | patience_increase=2, 29 | improvement_threshold=0.995, 30 | updates=[], 31 | test_batches=-1, 32 | print_cost=False, 33 | optimizer='rmsprop', 34 | **options): 35 | 36 | 37 | # index to a [mini]batch 38 | index = T.lscalar() 39 | 40 | train_set = dataset[0] 41 | valid_set = dataset[1] 42 | test_set = dataset[2] 43 | 44 | # compute number of minibatches for training, validation and testing 45 | n_train_batches = train_set[0].get_value(borrow=True).shape[0] / batch_size 46 | n_valid_batches = 
valid_set[0].get_value(borrow=True).shape[0] / batch_size 47 | n_test_batches = test_set[0].get_value(borrow=True).shape[0] / batch_size 48 | 49 | if n_train_batches == 0: 50 | n_train_batches = 1 51 | if n_valid_batches == 0: 52 | n_valid_batches = 1 53 | if n_test_batches == 0: 54 | n_test_batches = 1 55 | 56 | print "compiling test function" 57 | stop = startTimer("compiling test function") 58 | # compiling a Theano function that computes the mistakes that are made 59 | # by the model on a minibatch 60 | test_givens = list(updates) 61 | valid_givens = list(updates) 62 | train_givens = list(updates) 63 | for i in range(len(inputs)): 64 | test_givens.append((inputs[i], test_set[i][index * batch_size:(index + 1) * batch_size])) 65 | valid_givens.append((inputs[i], valid_set[i][index * batch_size:(index + 1) * batch_size])) 66 | train_givens.append((inputs[i], train_set[i][index * batch_size:(index + 1) * batch_size])) 67 | 68 | test_model = theano.function( 69 | inputs=[index], 70 | outputs=errors, 71 | givens=test_givens 72 | ) 73 | stop() 74 | 75 | print "compiling validate function" 76 | stop = startTimer("compiling validate function") 77 | validate_model = theano.function( 78 | inputs=[index], 79 | outputs=errors, 80 | givens=valid_givens 81 | ) 82 | stop() 83 | 84 | print "computing gradients" 85 | stop = startTimer("computing gradients") 86 | gparams = T.grad(cost, params) 87 | stop() 88 | 89 | 90 | updates = optimizers[optimizer]( 91 | params=params, 92 | gparams=gparams, 93 | **options 94 | ) 95 | 96 | print optimizer + ": compiling training function" 97 | stop = startTimer(optimizer + ": compiling training function") 98 | # compiling a Theano function `train_model` that returns the cost, but in 99 | # the same time updates the parameter of the model based on the rules 100 | # defined in `updates` 101 | train_model = theano.function( 102 | inputs=[index], 103 | outputs=cost, 104 | updates=updates, 105 | givens=train_givens 106 | ) 107 | stop() 108 | 109 | validation_frequency = min(n_train_batches, patience / 2) 110 | 111 | start_time = time.clock() 112 | 113 | best_validation_loss = numpy.inf 114 | best_iter = 0 115 | test_loss = 0. 116 | 117 | epoch = 0 118 | impatient = False 119 | 120 | print optimizer + ": optimizing..." 121 | try: 122 | while (epoch < n_epochs) and (not impatient): 123 | epoch = epoch + 1 124 | for minibatch_index in xrange(n_train_batches): 125 | minibatch_avg_cost = train_model(minibatch_index) 126 | if print_cost: 127 | print " cost: %0.05f" % minibatch_avg_cost 128 | 129 | # keep track of how many minibatches we've trained. Every so often, do a validation. 130 | iteration = (epoch - 1) * n_train_batches + minibatch_index 131 | if (iteration + 1) % validation_frequency is 0: 132 | # compute zero-one loss on validation set 133 | validation_losses = [validate_model(i) for i in xrange(n_valid_batches)] 134 | this_validation_loss = numpy.mean(validation_losses) 135 | print 'epoch %i, minibatch %i/%i, validation error %f %%' % (epoch, minibatch_index + 1, n_train_batches, this_validation_loss * 100.) 
136 | print ' iteration %i, patience %i' % (iteration, patience) 137 | 138 | # if we have a better validation then keep track of it 139 | if this_validation_loss < best_validation_loss: 140 | #improve patience if loss improvement is good enough 141 | if this_validation_loss < best_validation_loss * improvement_threshold: 142 | patience = max(patience, iteration * patience_increase) 143 | # keep track of the best validation 144 | best_validation_loss = this_validation_loss 145 | best_iter = iteration 146 | # remember the test loss as well 147 | test_losses = None 148 | if test_batches > 0: 149 | test_losses = [test_model(i) for i in random.sample(range(n_test_batches), test_batches)] 150 | else: 151 | test_losses = [test_model(i) for i in xrange(n_test_batches)] 152 | test_loss = numpy.mean(test_losses) 153 | print ' epoch %i, minibatch %i/%i, best test error %f %%' % (epoch, minibatch_index + 1, n_train_batches, test_loss * 100.) 154 | 155 | if patience <= iteration: 156 | impatient = True 157 | break 158 | 159 | except KeyboardInterrupt: 160 | print "" 161 | print "" 162 | print optimizer + ": optimization interupted" 163 | print "" 164 | 165 | end_time = time.clock() 166 | 167 | print 'Optimiztation complete' 168 | print 'The code run for %d epochs, with %f epochs/sec' % (epoch, 1. * epoch / (end_time - start_time)) 169 | print 'Best validation score of %f %% obtained at iteration %i, with test performance %f %%' % (best_validation_loss * 100., best_iter + 1, test_loss * 100.) 170 | print "" 171 | 172 | train_loss = None 173 | valid_loss = None 174 | test_loss = None 175 | 176 | try: 177 | print "computing model errors" 178 | print "" 179 | print " training..." 180 | train_losses = [train_model(i) for i in xrange(n_train_batches)] 181 | train_loss = numpy.mean(train_losses) 182 | print " validation..." 183 | valid_losses = [validate_model(i) for i in xrange(n_valid_batches)] 184 | valid_loss = numpy.mean(valid_losses) 185 | print " test..." 
186 | test_losses = [test_model(i) for i in xrange(n_test_batches)] 187 | test_loss = numpy.mean(test_losses) 188 | print "\n train: %0.05f \n validation: %0.05f \n test: %0.05f \n" % (train_loss, valid_loss, test_loss) 189 | 190 | except KeyboardInterrupt: 191 | print "computing model errors interrupted" 192 | 193 | return train_loss, valid_loss, test_loss 194 | -------------------------------------------------------------------------------- /DL/DL/optimizers/adadelta.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | 8 | def adadelta(params, gparams): 9 | 10 | # http://deeplearning.net/tutorial/code/lstm.py 11 | zipped_grads = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 12 | running_up2 = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 13 | running_grads2 = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 14 | 15 | zgup = [(zg, g) for zg, g in zip(zipped_grads, gparams)] 16 | rg2up = [(rg2, 0.95 * rg2 + 0.05 * (g ** 2)) for rg2, g in zip(running_grads2, gparams)] 17 | 18 | updir = [-T.sqrt(ru2 + 1e-6) / T.sqrt(rg2 + 1e-6) * zg for zg, ru2, rg2 in zip(zipped_grads, running_up2, running_grads2)] 19 | ru2up = [(ru2, 0.95 * ru2 + 0.05 * (ud ** 2)) for ru2, ud in zip(running_up2, updir)] 20 | param_up = [(p, p + ud) for p, ud in zip(params, updir)] 21 | 22 | updates = zgup + rg2up + ru2up + param_up 23 | 24 | return updates 25 | -------------------------------------------------------------------------------- /DL/DL/optimizers/rmsprop.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | 8 | def rmsprop(params, gparams): 9 | 10 | # http://deeplearning.net/tutorial/code/lstm.py 11 | zipped_grads = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 12 | running_grads = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 13 | running_grads2 = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 14 | 15 | zgup = [(zg, g) for zg, g in zip(zipped_grads, gparams)] 16 | rgup = [(rg, 0.95 * rg + 0.05 * g) for rg, g in zip(running_grads, gparams)] 17 | rg2up = [(rg2, 0.95 * rg2 + 0.05 * (g ** 2)) for rg2, g in zip(running_grads2, gparams)] 18 | 19 | updir = [theano.shared(p.get_value() * numpy.asarray(0., dtype=theano.config.floatX)) for p in params] 20 | updir_new = [(ud, 0.9 * ud - 1e-4 * zg / T.sqrt(rg2 - rg ** 2 + 1e-4)) for ud, zg, rg, rg2 in zip(updir, zipped_grads, running_grads, running_grads2)] 21 | param_up = [(p, p + udn[1]) for p, udn in zip(params, updir_new)] 22 | 23 | updates = zgup + rgup + rg2up + updir_new + param_up 24 | 25 | return updates -------------------------------------------------------------------------------- /DL/DL/optimizers/sgd.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | # import theano.tensor as T 6 | import numpy 7 | 8 | def sgd(params, gparams,learning_rate=0.01, momentum=0.1): 9 | 10 | """ 11 | stochastic gradient descent optimization with early stopping and momentum 12 | 13 | for vanilla gradient 
decent, set the patience to numpy.inf and momentum to 0 14 | 15 | early stopping criteria 16 | patience: look as this many examples regardless 17 | patience_increase: wait this much longer when a new best is found 18 | improvement_threshold: a relative improvement of this much is considered significant 19 | 20 | dataset is a list or tuple of length 3 including the training set, validation set 21 | and the test set. In each set, these must be a list or tuple of the inputs to 22 | the computational graph in the same order as the list of Theano.tensor variable 23 | that are passed in as inputs. The inputs to the graph must accept minibatches meaning 24 | that the first dimension is the number of training examples. 25 | 26 | """ 27 | 28 | 29 | momentums = [theano.shared(numpy.zeros(param.get_value(borrow=True).shape, dtype=theano.config.floatX)) for param in params] 30 | updates = [] 31 | for param, gparam, mom in zip(params, gparams, momentums): 32 | update = momentum * mom - learning_rate * gparam 33 | updates.append((mom, update)) 34 | updates.append((param, param + update)) 35 | 36 | return updates -------------------------------------------------------------------------------- /DL/DL/utils.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | import operator 8 | import time 9 | 10 | def load_data(dataset): 11 | ''' Loads the dataset to the GPU 12 | 13 | dataset = [train_set, valid_set, test_set] 14 | 15 | each set is a tuple (input, target) 16 | input is a matrix where rows are a sample 17 | target is a 1d array of what output should be 18 | ''' 19 | 20 | def shared_dataset(data, borrow=True): 21 | """ Function that loads the dataset into shared variables 22 | 23 | Create a shared dataset, copying the whole thing to the GPU. 24 | We dont want to copy each minibatch over one at a time. 25 | """ 26 | sharedData = [] 27 | for input in data: 28 | shared = theano.shared(numpy.asarray(input, 29 | dtype=theano.config.floatX), 30 | borrow=borrow) 31 | 32 | sharedData.append(shared) 33 | 34 | return sharedData 35 | 36 | test_set = shared_dataset(dataset[2]) 37 | valid_set = shared_dataset(dataset[1]) 38 | train_set = shared_dataset(dataset[0]) 39 | 40 | rval = [train_set, valid_set, test_set] 41 | return rval 42 | 43 | def maybe(func, otherwise=None): 44 | res = None 45 | try: 46 | res = func() 47 | except: 48 | return otherwise 49 | return res 50 | 51 | def flattenIterator(container): 52 | for i in container: 53 | if isinstance(i, list) or isinstance(i, tuple): 54 | for j in flatten(i): 55 | yield j 56 | else: 57 | yield i 58 | 59 | flatten = lambda x: list(flattenIterator(x)) 60 | 61 | fmt = lambda x: "{:12.8f}".format(x) 62 | 63 | def onehot(value, length): 64 | v = [0]*length 65 | v[value] = 1 66 | return v 67 | 68 | relu = lambda x: T.switch(x<0, 0, x) 69 | cappedrelu = lambda x: T.minimum(T.switch(x<0, 0, x), 6) 70 | sigmoid = T.nnet.sigmoid 71 | tanh = T.tanh 72 | # softmax = T.nnet.softmax 73 | 74 | # a differentiable version for HF that doesn't have some optimizations. 
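# subtracting the row-wise max before exponentiating leaves the result unchanged
# (the constant cancels in the normalization) but keeps exp() from overflowing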
75 | def softmax(x): 76 | e_x = T.exp(x - x.max(axis=1, keepdims=True)) 77 | out = e_x / e_x.sum(axis=1, keepdims=True) 78 | return out 79 | 80 | activations = { 81 | 'relu': relu, 82 | 'cappedrelu': cappedrelu, 83 | 'sigmoid': sigmoid, 84 | 'tanh': tanh, 85 | 'linear': lambda x: x, 86 | 'softmax': softmax 87 | } 88 | 89 | def compute_L1(weights): 90 | return reduce(operator.add, map(lambda x: abs(x).sum(), weights), 0) 91 | 92 | def compute_L2_sqr(weights): 93 | return reduce(operator.add, map(lambda x: (x ** 2).sum(), weights), 0) 94 | 95 | def layers_L1(layers): 96 | return reduce(operator.add, map(lambda x: x.L1, layers), 0) 97 | 98 | def layers_L2_sqr(layers): 99 | return reduce(operator.add, map(lambda x: x.L2_sqr, layers), 0) 100 | 101 | def layers_params(layers): 102 | return map(lambda x: x.params, layers) 103 | 104 | def mse(output, targets): 105 | return T.mean((output - targets) ** 2) 106 | 107 | def nll_binary(output, targets): 108 | # negative log likelihood based on binary cross entropy error 109 | return T.mean(T.nnet.binary_crossentropy(output, targets)) 110 | 111 | def nll_multiclass(output, targets): 112 | return -T.mean(T.log(output)[T.arange(targets.shape[0]), targets]) 113 | 114 | def nll_multiclass_timeseries(output, targets): 115 | # Theano's advanced indexing is limited 116 | # therefore we reshape our n_steps x n_seq x n_classes tensor3 of probs 117 | # to a (n_steps * n_seq) x n_classes matrix of probs 118 | # so that we can use advanced indexing (i.e. get the probs which 119 | # correspond to the true class) 120 | # the labels targets also must be flattened when we do this to use the 121 | # advanced indexing 122 | p_y = output 123 | p_y_m = T.reshape(p_y, (p_y.shape[0] * p_y.shape[1], -1)) 124 | y_f = targets.flatten(ndim=1) 125 | return -T.mean(T.log(p_y_m)[T.arange(p_y_m.shape[0]), y_f]) 126 | 127 | def pred_binary(output): 128 | return T.round(output) # round to {0,1} 129 | 130 | def pred_multiclass(output): 131 | return T.argmax(output, axis=-1) 132 | 133 | def pred_error(pred, targets): 134 | # check if y has same dimension of y_pred 135 | if targets.ndim != pred.ndim: 136 | raise TypeError('targets should have the same shape as pred', ('targets', targets.type, 'pred', pred.type)) 137 | 138 | # check if targets is of the correct datatype 139 | if targets.dtype.startswith('int'): 140 | # the T.neq operator returns a vector of 0s and 1s, where 1 141 | # represents a mistake in prediction 142 | return T.mean(T.neq(pred, targets)) 143 | 144 | def untuple(a): 145 | if isinstance(a, tuple): 146 | return untuple(list(a)) 147 | if isinstance(a, (numpy.ndarray, numpy.generic) ): 148 | return a 149 | if isinstance(a, list): 150 | for i in range(len(a)): 151 | a[i] = untuple(a[i]) 152 | return a 153 | 154 | # allow to specify the size separately -- sometimes an issue when cloning, scanning, etc. 
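# each unit is kept independently with probability 1 - dropout_rate; note that the
# surviving activations are not rescaled by 1/(1 - dropout_rate) in this implementation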
155 | # see the theano-tests/random-streams-scan-clone.py 156 | def dropout(srng, dropout_rate, inp, size=None): 157 | if size is None: 158 | size = inp.shape 159 | # p=1-p because 1's indicate keep and p is prob of dropping 160 | mask = srng.binomial(n=1, p=1-dropout_rate, size=size, dtype=theano.config.floatX) 161 | # The cast is important because int * float32 = float64 which pulls things off the gpu 162 | output = inp * mask 163 | return output 164 | 165 | # "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks" 166 | # http://arxiv.org/abs/1312.6120 167 | def ortho_weight(ndim): 168 | W = numpy.random.randn(ndim, ndim) 169 | u, s, v = numpy.linalg.svd(W) 170 | return u.astype(theano.config.floatX) 171 | 172 | 173 | def stopTimer(start, message): 174 | print message + " took %0.03f seconds" % (time.clock() - start) 175 | 176 | def startTimer(message): 177 | start = time.clock() 178 | return lambda: stopTimer(start, message) 179 | 180 | def sequencePadAndMask(seqs): 181 | """ 182 | takes a sequence of (n_samples, n_timeteps, ...) 183 | pads every sequences to the maxlen of timesteps for all sequences 184 | also produces a mask 185 | """ 186 | 187 | n_samples = len(seqs) 188 | lengths = map(len, seqs) 189 | maxlen = max(lengths) 190 | 191 | x = numpy.zeros((n_samples, maxlen)) 192 | x_mask = numpy.zeros((n_samples, maxlen)) 193 | 194 | for idx, s in enumerate(seqs): 195 | x[idx, 0:lengths[idx]] = s 196 | x_mask[idx, 0:lengths[idx]] = 1. 197 | 198 | return x, x_mask 199 | 200 | def datasetPadAndMask(dataset, sequenceIdx): 201 | """ 202 | for each set in the dataset, it find the sequence at the sequenceIdx 203 | and pads and masks it, then mutates the dataset and append the mask as 204 | the last value in the dataset 205 | """ 206 | for s in dataset: 207 | seq = s[sequenceIdx] 208 | paddedSeq, seqMask = sequencePadAndMask(seq) 209 | s[sequenceIdx] = paddedSeq 210 | s.append(seqMask) 211 | -------------------------------------------------------------------------------- /DL/setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | 4 | setup( 5 | name='DL', 6 | version='0.1', 7 | description='Some deep learning tools.', 8 | long_description='Some deep learning tools.', 9 | classifiers=[ 10 | 'Development Status :: 2 - Pre-Alpha', 11 | 'Programming Language :: Python :: 2.7', 12 | 'Intended Audience :: Science/Research', 13 | ], 14 | keywords='deep learning', 15 | url='https://github.com/ccorcos/', 16 | author='Chet Corcos', 17 | author_email='ccorcos@gmail', 18 | license='MIT', 19 | packages=['DL'], 20 | install_requires=[ 21 | 'numpy', 22 | 'theano', 23 | ], 24 | test_suite='nose.collector', 25 | tests_require=['nose'], 26 | include_package_data=True, 27 | zip_safe=False 28 | ) 29 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Some Deep Learning models built with Theano 2 | 3 | 4 | ## To Do 5 | 6 | - save and reload model. 7 | 8 | LSTM 9 | - try on dice model 10 | 11 | RNN 12 | - tensor.alloc in RNN instead of repeat 13 | - modify rnn to handle varying length sequences with a mask 14 | - dropout? 
15 | 16 | MRNN 17 | - character prediction 18 | - hf optimization 19 | 20 | MUSE 21 | - USE with multiplicative units 22 | 23 | Theano-users 24 | - ask about rnn dropout 25 | - ask about ubuntu install script 26 | 27 | CNN 28 | - conv net on images or maybe even imagenet 29 | 30 | DA 31 | - denoising autoencoder on mnist 32 | 33 | Writing 34 | - write about deep learning, https://imgur.com/a/Hqolp 35 | 36 | TSNE 37 | - low dimensional visualization: http://lvdmaaten.github.io/tsne/ 38 | 39 | ## Getting Started 40 | 41 | Some examples use `sparkprob` to visualize probablity distributions at the commandline so you may need to install it 42 | 43 | pip install sparkprob 44 | 45 | All of the examples use the `DL` package. To use it: 46 | 47 | cd DL 48 | python setup.py develop 49 | 50 | To unlink this package when you are done: 51 | 52 | cd DL 53 | python setup.py develop --uninstall 54 | 55 | To load the datasets 56 | 57 | cd datasets 58 | curl -O http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz 59 | curl -O http://www.iro.umontreal.ca/~lisa/deep/data/imdb.pkl 60 | 61 | 62 | -------------------------------------------------------------------------------- /THEANO.md: -------------------------------------------------------------------------------- 1 | # Theano Notes 2 | 3 | ## Debugging 4 | [This is a good resource](http://deeplearning.net/software/theano/tutorial/debug_faq.html). 5 | 6 | Print out `.type` of a theano variable to get information about the tensor type. 7 | 8 | Also, try running your program with 9 | 10 | THEANO_FLAGS="optimizer=None" python program.py 11 | 12 | This will give you line numbers and more information. 13 | 14 | 15 | Also, for any symbolic variables defined, it helps to give them test values which can be used 16 | to test the functionality of the program as it goes so you can get a line number when it happens: 17 | 18 | x = T.matrix() 19 | x.tag.test_value = numpy.random.rand(10, 20) 20 | 21 | Then make sure you set the flag when you run it. 22 | 23 | THEANO_FLAGS="optimizer=None,compute_test_value=raise" python program.py 24 | 25 | THEANO_FLAGS="exception_verbosity=high" 26 | 27 | # Parallelization 28 | 29 | On Mac, you can use the GPU if you have a newer machine with an NVIDIA graphics card. 30 | 31 | Theano uses the OS X Accelerate framework for BLAS and other optimizations. 32 | 33 | 34 | -------------------------------------------------------------------------------- /examples/dbn-mnist.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.DBN import DBN 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import warnings 12 | import time 13 | 14 | warnings.simplefilter("ignore") 15 | 16 | print "An DBN on MNIST with dropout." 17 | print "loading MNIST" 18 | mnist = datasets.mnist() 19 | 20 | print "loading data to the GPU" 21 | dataset = load_data(mnist) 22 | 23 | print "creating the DBN" 24 | x = T.matrix('x') # input 25 | t = T.vector('t') # targets 26 | inputs = [x, t] 27 | # cast to an int. 
needs to be initially a float to load to the GPU 28 | it = t.astype('int64') 29 | 30 | rng = numpy.random.RandomState(int(time.time())) # random number generator 31 | srng = T.shared_randomstreams.RandomStreams(int(time.time())) 32 | 33 | # construct the DBN class 34 | dbn = DBN( 35 | rng=rng, 36 | input=x, 37 | n_in=28 * 28, 38 | layer_sizes=[200,200,200], 39 | n_out=10, 40 | dropout_rate=0.5, 41 | srng=srng 42 | ) 43 | 44 | # regularization 45 | L1_reg=0.00 46 | L2_reg=0.0001 47 | 48 | # cost function 49 | cost = ( 50 | nll_multiclass(dbn.output, it) 51 | + L1_reg * dbn.L1 52 | + L2_reg * dbn.L2_sqr 53 | ) 54 | 55 | pred = pred_multiclass(dbn.output) 56 | 57 | errors = pred_error(pred, it) 58 | 59 | params = flatten(dbn.params) 60 | 61 | print "training the dbn with rmsprop" 62 | 63 | optimize(dataset=dataset, 64 | inputs=inputs, 65 | cost=cost, 66 | params=params, 67 | errors=errors, 68 | n_epochs=100, 69 | batch_size=20, 70 | patience=10000, 71 | patience_increase=1.25, 72 | improvement_threshold=0.995, 73 | optimizer="rmsprop") 74 | 75 | print "compiling the prediction function" 76 | 77 | predict = theano.function(inputs=[x], outputs=pred) 78 | distribution = theano.function(inputs=[x], outputs=dbn.output) 79 | 80 | print "predicting the first 10 samples of the test dataset" 81 | print "predict:", predict(mnist[2][0][0:10]) 82 | print "answer: ", mnist[2][1][0:10] 83 | 84 | print "the output distribution should be slightly different each time due to dropout" 85 | print "distribution:", distribution(mnist[2][0][0:1]) 86 | print "distribution:", distribution(mnist[2][0][0:1]) 87 | print "distribution:", distribution(mnist[2][0][0:1]) 88 | print "distribution:", distribution(mnist[2][0][0:1]) 89 | print "distribution:", distribution(mnist[2][0][0:1]) -------------------------------------------------------------------------------- /examples/lstm-imdb.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.LSTM import LSTM 8 | from DL.models.EmbeddingLayer import EmbeddingLayer 9 | from DL.models.HiddenLayer import HiddenLayer 10 | from DL.optimizers import optimize 11 | from DL import datasets 12 | from DL.utils import * 13 | import time 14 | 15 | # hide warnings 16 | import warnings 17 | warnings.simplefilter("ignore") 18 | 19 | print "An LSTM with mean-pooling and embedded words on IMDB for sentiment analysis." 20 | print " x ---> x_emb ---> LSTM ---> meanPool ---> softmax ---> {1,0} sentiment" 21 | print "loading IMDB" 22 | 23 | dim_proj=128 # word embeding dimension and LSTM number of hidden units. 24 | vocabulary_size=10000 # Vocabulary size 25 | maxlen=100 # Sequence longer then this get ignored 26 | dropout_rate = 0.5 27 | validation_ratio=0.05 28 | 29 | 30 | # imdb has 3 elements, train, validation and test sets 31 | # the first input in each set is a matrix of (n_examples, n_timesteps) with a number representing each word 32 | # the second input is a vector of {0,1} sentiment 33 | imdb = datasets.imdb(validation_ratio=validation_ratio, vocabulary_size=vocabulary_size, maxlen=maxlen) 34 | 35 | # mutate the dataset to pad and mask the sequences 36 | # the sequences are the first input in the dataset 37 | # now each set consists of [padded_sequences, targets, sequence_mask] with shapes: 38 | # [(n_examples, maxlen), (n_examples), (n_examples, maxlen)] 39 | # note that the mask must remain float32! 
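# illustrative example: two reviews of lengths 2 and 3 are zero-padded to the longer one,
#   padded_sequences = [[5, 9, 0], [7, 2, 4]]  with  sequence_mask = [[1., 1., 0.], [1., 1., 1.]]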
40 | datasetPadAndMask(imdb, 0) 41 | 42 | print "loading data to the GPU" 43 | 44 | dataset = load_data(imdb) 45 | 46 | print "creating the LSTM" 47 | x = T.matrix('x') # input words, (n_examples, maxlen) 48 | t = T.vector('t') # targets 49 | mask = T.matrix('mask') # mask for valid words (n_examples, maxlen) 50 | 51 | inputs = [x, t, mask] # the mask comes last! 52 | 53 | ix = x.astype('int32') 54 | it = t.astype('int32') 55 | 56 | rng = numpy.random.RandomState(int(time.time())) # random number generator 57 | srng = T.shared_randomstreams.RandomStreams(int(time.time())) 58 | 59 | embeddingLayer = EmbeddingLayer( 60 | rng=rng, 61 | input=ix, 62 | n_in=vocabulary_size, 63 | n_out=dim_proj, 64 | onehot=False 65 | ) 66 | 67 | # (n_examples, n_timesteps, dim_proj) 68 | x_emb = embeddingLayer.output 69 | 70 | lstm = LSTM( 71 | rng=rng, 72 | input=x_emb, 73 | mask=mask, 74 | n_units=dim_proj, 75 | activation='tanh' 76 | ) 77 | 78 | # (n_examples, maxlen, dimproj) 79 | z = lstm.output 80 | 81 | # only get the active and mean mool. 82 | 83 | # mask[:, :, None].shape = (n_examples, maxlen, 1) 84 | # (z * mask[:, :, None]).shape = (n_examples, maxlen, dim_proj) 85 | # (z * mask[:, :, None]).sum(axis=1).shape = (n_examples, dim_proj) 86 | z = (z * mask[:, :, None]).sum(axis=1) 87 | 88 | # mask.sum(axis=1).shape = (n_examples,) 89 | # mask.sum(axis=1)[:, None].shape = (n_examples,1) 90 | meanPool = z / mask.sum(axis=1)[:, None] 91 | # meanPool is now (n_examples, dim_proj) 92 | 93 | meanPool_drop = dropout(srng, dropout_rate, meanPool) 94 | 95 | outputLayer = HiddenLayer( 96 | rng=rng, 97 | input=meanPool_drop, 98 | n_in=dim_proj, 99 | n_out=2, # {0,1} sentiment 100 | params=None, 101 | activation='softmax' 102 | ) 103 | 104 | y = outputLayer.output 105 | 106 | layers = [embeddingLayer, lstm, outputLayer] 107 | 108 | L1 = layers_L1(layers) 109 | L2_sqr = layers_L2_sqr(layers) 110 | 111 | # L1 = 0 112 | # L2_sqr = (lstm.params[0].get_value() ** 2).sum() 113 | 114 | # regularization 115 | L1_reg=0.00 116 | L2_reg=0.00 117 | 118 | # cost function 119 | cost = ( 120 | nll_multiclass(y, it) 121 | + L1_reg * L1 122 | + L2_reg * L2_sqr 123 | ) 124 | 125 | pred = pred_multiclass(y) 126 | 127 | errors = pred_error(pred, it) 128 | 129 | params = flatten(layers_params(layers)) 130 | 131 | print "training the LSTM with adadelta" 132 | optimize(dataset=dataset, 133 | inputs=inputs, 134 | cost=cost, 135 | params=params, 136 | errors=errors, 137 | n_epochs=200, 138 | batch_size=64, 139 | patience=1500, 140 | patience_increase=1.25, 141 | improvement_threshold=0.995, 142 | test_batches=1, 143 | print_cost=True, 144 | optimizer="adadelta") 145 | 146 | print "compiling the prediction function" 147 | predict = theano.function(inputs=[x, mask], outputs=pred) 148 | 149 | print "predicting the first 10 samples of the test dataset" 150 | print "predict:", predict(dataset[2][0].get_value()[0:10], dataset[2][-1].get_value()[0:10]) 151 | print "answer: ", dataset[2][1].get_value()[0:10] 152 | 153 | 154 | -------------------------------------------------------------------------------- /examples/mlp-mnist-adadelta.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | 13 | # hide warnings 14 | import warnings 15 | 
warnings.simplefilter("ignore") 16 | 17 | 18 | print "An MLP on MNIST." 19 | print "loading MNIST" 20 | mnist = datasets.mnist() 21 | 22 | print "loading data to the GPU" 23 | dataset = load_data(mnist) 24 | 25 | print "creating the MLP" 26 | x = T.matrix('x') # input 27 | t = T.vector('t') # targets 28 | inputs = [x, t] 29 | # cast to an int. needs to be initially a float to load to the GPU 30 | it = t.astype('int64') 31 | 32 | rng = numpy.random.RandomState(int(time.time())) # random number generator 33 | 34 | # construct the MLP class 35 | mlp = MLP( 36 | rng=rng, 37 | input=x, 38 | n_in=28 * 28, 39 | n_hidden=500, 40 | n_out=10 41 | ) 42 | 43 | # regularization 44 | L1_reg=0.00 45 | L2_reg=0.0001 46 | 47 | # cost function 48 | cost = ( 49 | nll_multiclass(mlp.output, it) 50 | + L1_reg * mlp.L1 51 | + L2_reg * mlp.L2_sqr 52 | ) 53 | 54 | pred = pred_multiclass(mlp.output) 55 | 56 | errors = pred_error(pred, it) 57 | 58 | params = flatten(mlp.params) 59 | 60 | print "training the MLP with adadelta" 61 | optimize(dataset=dataset, 62 | inputs=inputs, 63 | cost=cost, 64 | params=params, 65 | errors=errors, 66 | n_epochs=1000, 67 | batch_size=20, 68 | patience=5000, 69 | patience_increase=1.5, 70 | improvement_threshold=0.995, 71 | optimizer="adadelta") 72 | 73 | print "" 74 | print "compiling the prediction function" 75 | predict = theano.function(inputs=[x], outputs=pred) 76 | 77 | print "predicting the first 10 samples of the test dataset" 78 | print "predict:", predict(mnist[2][0][0:10]) 79 | print "answer: ", mnist[2][1][0:10] -------------------------------------------------------------------------------- /examples/mlp-mnist-dropout.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | # from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams 13 | 14 | # hide warnings 15 | import warnings 16 | warnings.simplefilter("ignore") 17 | 18 | 19 | print "An MLP with dropout on MNIST." 20 | print "loading MNIST" 21 | mnist = datasets.mnist() 22 | 23 | print "loading data to the GPU" 24 | dataset = load_data(mnist) 25 | 26 | print "creating the MLP" 27 | x = T.matrix('x') # input 28 | t = T.vector('t') # targets 29 | inputs = [x, t] 30 | # cast to an int. 
needs to be initially a float to load to the GPU 31 | it = t.astype('int64') 32 | 33 | rng = numpy.random.RandomState(int(time.time())) # random number generator 34 | # srng = RandomStreams(int(time.time())) 35 | srng = T.shared_randomstreams.RandomStreams(int(time.time())) 36 | 37 | # construct the MLP class 38 | mlp = MLP( 39 | rng=rng, 40 | input=x, 41 | n_in=28 * 28, 42 | n_hidden=500, 43 | n_out=10, 44 | dropout_rate=0.5, 45 | srng=srng 46 | ) 47 | 48 | # regularization 49 | L1_reg=0.00 50 | L2_reg=0.0001 51 | 52 | # cost function 53 | cost = ( 54 | nll_multiclass(mlp.output, it) 55 | + L1_reg * mlp.L1 56 | + L2_reg * mlp.L2_sqr 57 | ) 58 | 59 | pred = pred_multiclass(mlp.output) 60 | 61 | errors = pred_error(pred, it) 62 | 63 | params = flatten(mlp.params) 64 | 65 | print "training the MLP with rmsprop" 66 | optimize(dataset=dataset, 67 | inputs=inputs, 68 | cost=cost, 69 | params=params, 70 | errors=errors, 71 | n_epochs=1000, 72 | batch_size=20, 73 | patience=5000, 74 | patience_increase=1.5, 75 | improvement_threshold=0.995, 76 | optimizer="rmsprop") 77 | 78 | print "compiling the prediction function" 79 | predict = theano.function(inputs=[x], outputs=pred) 80 | distribution = theano.function(inputs=[x], outputs=mlp.output) 81 | 82 | print "predicting the first 10 samples of the test dataset" 83 | print "predict:", predict(mnist[2][0][0:10]) 84 | print "answer: ", mnist[2][1][0:10] 85 | 86 | print "with dropout, the output distributions should all be slightly different" 87 | print "predict:", distribution(mnist[2][0][0:1]) 88 | print "predict:", distribution(mnist[2][0][0:1]) 89 | print "predict:", distribution(mnist[2][0][0:1]) 90 | print "predict:", distribution(mnist[2][0][0:1]) 91 | -------------------------------------------------------------------------------- /examples/mlp-mnist-load.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print "An MLP on MNIST." 18 | print "loading MNIST" 19 | mnist = datasets.mnist() 20 | 21 | 22 | print "loading a previous model" 23 | l = numpy.load("saved.npz") 24 | lparams = l['params'] 25 | losses = l['results'] 26 | notes = l['notes'] 27 | 28 | print "model loaded" 29 | print notes 30 | print "errors" 31 | print " train: %0.05f" % losses[0] 32 | print " validation: %0.05f" % losses[1] 33 | print " test: %0.05f" % losses[2] 34 | print "" 35 | 36 | print "loading data to the GPU" 37 | dataset = load_data(mnist) 38 | 39 | print "creating the MLP" 40 | x = T.matrix('x') # input 41 | t = T.vector('t') # targets 42 | inputs = [x, t] 43 | # cast to an int. 
needs to be initially a float to load to the GPU 44 | it = t.astype('int64') 45 | 46 | rng = numpy.random.RandomState(int(time.time())) # random number generator 47 | 48 | # construct the MLP class 49 | mlp = MLP( 50 | rng=rng, 51 | input=x, 52 | n_in=28 * 28, 53 | n_hidden=500, 54 | n_out=10, 55 | params=lparams 56 | ) 57 | 58 | # regularization 59 | L1_reg=0.00 60 | L2_reg=0.0001 61 | 62 | # cost function 63 | cost = ( 64 | nll_multiclass(mlp.output, it) 65 | + L1_reg * mlp.L1 66 | + L2_reg * mlp.L2_sqr 67 | ) 68 | 69 | pred = pred_multiclass(mlp.output) 70 | 71 | errors = pred_error(pred, it) 72 | 73 | params = flatten(mlp.params) 74 | 75 | 76 | # print "training the MLP with rmsprop" 77 | # losses = optimize( 78 | # dataset=dataset, 79 | # inputs=inputs, 80 | # cost=cost, 81 | # params=params, 82 | # errors=errors, 83 | # n_epochs=5, 84 | # batch_size=20, 85 | # patience=5000, 86 | # patience_increase=1.5, 87 | # improvement_threshold=0.995, 88 | # optimizer='rmsprop' 89 | # ) 90 | 91 | print "compiling the prediction function" 92 | predict = theano.function(inputs=[x], outputs=pred) 93 | 94 | print "predicting the first 10 samples of the test dataset" 95 | print "predict:", predict(mnist[2][0][0:10]) 96 | print "answer: ", mnist[2][1][0:10] 97 | 98 | -------------------------------------------------------------------------------- /examples/mlp-mnist-rmsprop.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | 18 | print "An MLP on MNIST." 19 | print "loading MNIST" 20 | mnist = datasets.mnist() 21 | 22 | print "loading data to the GPU" 23 | dataset = load_data(mnist) 24 | 25 | print "creating the MLP" 26 | x = T.matrix('x') # input 27 | t = T.vector('t') # targets 28 | inputs = [x, t] 29 | # cast to an int. 
needs to be initially a float to load to the GPU 30 | it = t.astype('int64') 31 | 32 | rng = numpy.random.RandomState(int(time.time())) # random number generator 33 | 34 | # construct the MLP class 35 | mlp = MLP( 36 | rng=rng, 37 | input=x, 38 | n_in=28 * 28, 39 | n_hidden=500, 40 | n_out=10 41 | ) 42 | 43 | # regularization 44 | L1_reg=0.00 45 | L2_reg=0.0001 46 | 47 | # cost function 48 | cost = ( 49 | nll_multiclass(mlp.output, it) 50 | + L1_reg * mlp.L1 51 | + L2_reg * mlp.L2_sqr 52 | ) 53 | 54 | pred = pred_multiclass(mlp.output) 55 | 56 | errors = pred_error(pred, it) 57 | 58 | params = flatten(mlp.params) 59 | 60 | 61 | print "training the MLP with rmsprop" 62 | optimize(dataset=dataset, 63 | inputs=inputs, 64 | cost=cost, 65 | params=params, 66 | errors=errors, 67 | n_epochs=1000, 68 | batch_size=20, 69 | patience=5000, 70 | patience_increase=1.5, 71 | improvement_threshold=0.995, 72 | optimizer="rmsprop") 73 | 74 | print "compiling the prediction function" 75 | predict = theano.function(inputs=[x], outputs=pred) 76 | 77 | print "predicting the first 10 samples of the test dataset" 78 | print "predict:", predict(mnist[2][0][0:10]) 79 | print "answer: ", mnist[2][1][0:10] -------------------------------------------------------------------------------- /examples/mlp-mnist-save.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print "An MLP on MNIST." 18 | print "loading MNIST" 19 | mnist = datasets.mnist() 20 | 21 | print "loading data to the GPU" 22 | dataset = load_data(mnist) 23 | 24 | print "creating the MLP" 25 | x = T.matrix('x') # input 26 | t = T.vector('t') # targets 27 | inputs = [x, t] 28 | # cast to an int. needs to be initially a float to load to the GPU 29 | it = t.astype('int64') 30 | 31 | rng = numpy.random.RandomState(int(time.time())) # random number generator 32 | 33 | # construct the MLP class 34 | mlp = MLP( 35 | rng=rng, 36 | input=x, 37 | n_in=28 * 28, 38 | n_hidden=500, 39 | n_out=10 40 | ) 41 | 42 | # regularization 43 | L1_reg=0.00 44 | L2_reg=0.0001 45 | 46 | # cost function 47 | cost = ( 48 | nll_multiclass(mlp.output, it) 49 | + L1_reg * mlp.L1 50 | + L2_reg * mlp.L2_sqr 51 | ) 52 | 53 | pred = pred_multiclass(mlp.output) 54 | 55 | errors = pred_error(pred, it) 56 | 57 | params = flatten(mlp.params) 58 | 59 | 60 | print "training the MLP with rmsprop" 61 | losses = optimize( 62 | dataset=dataset, 63 | inputs=inputs, 64 | cost=cost, 65 | params=params, 66 | errors=errors, 67 | n_epochs=2, 68 | batch_size=20, 69 | patience=5000, 70 | patience_increase=1.5, 71 | improvement_threshold=0.995, 72 | optimizer='rmsprop' 73 | ) 74 | 75 | print "compiling the prediction function" 76 | predict = theano.function(inputs=[x], outputs=pred) 77 | 78 | print "predicting the first 10 samples of the test dataset" 79 | print "predict:", predict(mnist[2][0][0:10]) 80 | print "answer: ", mnist[2][1][0:10] 81 | 82 | 83 | print "saving..." 
84 | numpy.savez("saved.npz", results=losses, params=mlp.params, notes="Just a vanilla mlp on MNIST...") 85 | 86 | -------------------------------------------------------------------------------- /examples/mlp-mnist-sgd.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.MLP import MLP 8 | from DL.optimizers import optimize 9 | from DL import datasets 10 | from DL.utils import * 11 | import time 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print "An MLP on MNIST." 18 | print "loading MNIST" 19 | mnist = datasets.mnist() 20 | 21 | print "loading data to the GPU" 22 | dataset = load_data(mnist) 23 | 24 | print "creating the MLP" 25 | x = T.matrix('x') # input 26 | t = T.vector('t') # targets 27 | inputs = [x, t] 28 | 29 | # cast to an int. needs to be initially a float to load to the GPU 30 | it = t.astype('int64') 31 | 32 | rng = numpy.random.RandomState(int(time.time())) # random number generator 33 | 34 | # construct the MLP class 35 | mlp = MLP( 36 | rng=rng, 37 | input=x, 38 | n_in=28 * 28, 39 | n_hidden=500, 40 | n_out=10 41 | ) 42 | 43 | # regularization 44 | L1_reg=0.00 45 | L2_reg=0.0001 46 | 47 | # cost function 48 | cost = ( 49 | nll_multiclass(mlp.output, it) 50 | + L1_reg * mlp.L1 51 | + L2_reg * mlp.L2_sqr 52 | ) 53 | 54 | pred = pred_multiclass(mlp.output) 55 | 56 | errors = pred_error(pred, it) 57 | 58 | params = flatten(mlp.params) 59 | 60 | print "training the MLP with sgd" 61 | optimize(dataset=dataset, 62 | inputs=inputs, 63 | cost=cost, 64 | params=params, 65 | errors=errors, 66 | learning_rate=0.01, 67 | momentum=0.2, 68 | n_epochs=1000, 69 | batch_size=20, 70 | patience=1000, 71 | patience_increase=1.5, 72 | improvement_threshold=0.995, 73 | optimizer="sgd") 74 | 75 | print "compiling the prediction function" 76 | predict = theano.function(inputs=[x], outputs=pred) 77 | 78 | print "predicting the first 10 samples of the test dataset" 79 | print "predict:", predict(mnist[2][0][0:10]) 80 | print "answer: ", mnist[2][1][0:10] -------------------------------------------------------------------------------- /examples/rnn-lag-binary.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.RNN import RNN 8 | from DL.optimizers import optimize 9 | from DL.utils import * 10 | import time 11 | import matplotlib.pyplot as plt 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print "Testing an RNN with binary outputs" 18 | print "Generating lag test data..." 
19 | 20 | n_hidden = 30 21 | n_in = 5 22 | n_out = 2 23 | n_steps = 11 24 | n_seq = 100 25 | 26 | # simple lag test 27 | seq = numpy.random.randn(n_seq, n_steps, n_in) 28 | targets = numpy.zeros((n_seq, n_steps, n_out)) 29 | 30 | # whether lag 1 (dim 3) is greater than lag 2 (dim 0) 31 | targets[:, 2:, 0] = numpy.cast[numpy.int](seq[:, 1:-1, 3] > seq[:, :-2, 0]) 32 | 33 | # whether product of lag 1 (dim 4) and lag 1 (dim 2) 34 | # is less than lag 2 (dim 0) 35 | targets[:, 2:, 1] = numpy.cast[numpy.int]((seq[:, 1:-1, 4] * seq[:, 1:-1, 2]) > seq[:, :-2, 0]) 36 | 37 | # split into training, validation, and test 38 | trainIdx = int(numpy.floor(4./6.*n_seq)) 39 | validIdx = int(numpy.floor(5./6.*n_seq)) 40 | 41 | lagData = ((seq[0:trainIdx,:,:], targets[0:trainIdx,:,:]), 42 | (seq[trainIdx:validIdx,:,:], targets[trainIdx:validIdx,:,:]), 43 | (seq[validIdx:,:,:], targets[validIdx:,:,:])) 44 | 45 | print "loading data to the GPU" 46 | # if you change this to int32, make you change the target tensor type! 47 | dataset = load_data(lagData) 48 | 49 | print "creating the RNN" 50 | x = T.tensor3('x') # input 51 | t = T.tensor3('t') # targets 52 | inputs = [x,t] 53 | # cast to an int. needs to be initially a float to load to the GPU 54 | it = t.astype('int64') 55 | 56 | rng = numpy.random.RandomState(int(time.time())) # random number generator 57 | 58 | rnn = RNN(rng=rng, 59 | input=x, 60 | n_in=n_in, 61 | n_hidden=n_hidden, 62 | n_out=n_out, 63 | activation='tanh', 64 | outputActivation='sigmoid' 65 | ) 66 | 67 | # regularization 68 | L1_reg=0.00 69 | L2_reg=0.0001 70 | 71 | # cost function 72 | cost = ( 73 | nll_binary(rnn.output, it) 74 | + L1_reg * rnn.L1 75 | + L2_reg * rnn.L2_sqr 76 | ) 77 | 78 | pred = pred_binary(rnn.output) 79 | 80 | errors = pred_error(pred, it) 81 | 82 | params = flatten(rnn.params) 83 | 84 | print "training the rnn with rmsprop" 85 | 86 | optimize(dataset=dataset, 87 | inputs=inputs, 88 | cost=cost, 89 | params=params, 90 | errors=errors, 91 | n_epochs=5000, 92 | batch_size=20, 93 | patience=1000, 94 | patience_increase=1.5, 95 | improvement_threshold=0.995, 96 | optimizer="rmsprop") 97 | 98 | print "compiling the prediction function" 99 | 100 | predict = theano.function(inputs=[x], outputs=rnn.output) 101 | 102 | print "predicting the first 10 samples of the training dataset" 103 | 104 | seqs = xrange(10) 105 | for seq_num in seqs: 106 | fig = plt.figure() 107 | ax1 = plt.subplot(211) 108 | plt.plot(seq[seq_num]) 109 | ax1.set_title('input') 110 | ax2 = plt.subplot(212) 111 | true_targets = plt.step(xrange(n_steps), targets[seq_num], marker='o') 112 | 113 | guess = predict(seq[seq_num:seq_num+1])[0] 114 | guessed_targets = plt.step(xrange(n_steps), guess) 115 | plt.setp(guessed_targets, linestyle='--', marker='d') 116 | for i, x in enumerate(guessed_targets): 117 | x.set_color(true_targets[i].get_color()) 118 | ax2.set_ylim((-0.1, 1.1)) 119 | ax2.set_title('solid: true output, dashed: model output (prob)') 120 | 121 | plt.show() -------------------------------------------------------------------------------- /examples/rnn-lag-real.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.RNN import RNN 8 | from DL.optimizers import optimize 9 | from DL.utils import * 10 | import time 11 | import matplotlib.pyplot as plt 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print 
"Testing an RNN with linear outputs" 18 | print "Generating lag test data..." 19 | n_hidden = 30 20 | n_in = 5 21 | n_out = 3 22 | n_steps = 11 23 | n_seq = 100 24 | 25 | numpy.random.seed(0) 26 | # simple lag test 27 | seq = numpy.random.randn(n_seq, n_steps, n_in) 28 | targets = numpy.zeros((n_seq, n_steps, n_out)) 29 | 30 | targets[:, 1:, 0] = seq[:, :-1, 3] # delayed 1 31 | targets[:, 1:, 1] = seq[:, :-1, 2] # delayed 1 32 | targets[:, 2:, 2] = seq[:, :-2, 0] # delayed 2 33 | 34 | targets += 0.01 * numpy.random.standard_normal(targets.shape) 35 | 36 | # split into training, validation, and test 37 | trainIdx = int(numpy.floor(4./6.*n_seq)) 38 | validIdx = int(numpy.floor(5./6.*n_seq)) 39 | 40 | lagData = ((seq[0:trainIdx,:,:], targets[0:trainIdx,:,:]), 41 | (seq[trainIdx:validIdx,:,:], targets[trainIdx:validIdx,:,:]), 42 | (seq[validIdx:,:,:], targets[validIdx:,:,:])) 43 | 44 | print "loading data to the GPU" 45 | dataset = load_data(lagData) 46 | 47 | print "creating the RNN" 48 | x = T.tensor3('x') # input 49 | t = T.tensor3('t') # targets 50 | inputs = [x,t] 51 | rng = numpy.random.RandomState(int(time.time())) # random number generator 52 | 53 | rnn = RNN(rng=rng, 54 | input=x, 55 | n_in=n_in, 56 | n_hidden=n_hidden, 57 | n_out=n_out, 58 | activation='tanh', 59 | outputActivation='linear' 60 | ) 61 | 62 | # regularization 63 | L1_reg=0.00 64 | L2_reg=0.0001 65 | 66 | # cost function 67 | cost = ( 68 | mse(rnn.output, t) 69 | + L1_reg * rnn.L1 70 | + L2_reg * rnn.L2_sqr 71 | ) 72 | 73 | errors = mse(rnn.output, t) 74 | 75 | params = flatten(rnn.params) 76 | 77 | print "training the rnn with rmsprop" 78 | 79 | optimize(dataset=dataset, 80 | inputs=inputs, 81 | cost=cost, 82 | params=params, 83 | errors=errors, 84 | n_epochs=5000, 85 | batch_size=20, 86 | patience=1000, 87 | patience_increase=1.5, 88 | improvement_threshold=0.995, 89 | optimizer="rmsprop") 90 | 91 | print "compiling the prediction function" 92 | 93 | predict = theano.function(inputs=[x], outputs=rnn.output) 94 | 95 | print "predicting the first sample of the training dataset" 96 | 97 | fig = plt.figure() 98 | ax1 = plt.subplot(211) 99 | plt.plot(seq[0]) 100 | ax1.set_title('input') 101 | 102 | ax2 = plt.subplot(212) 103 | true_targets = plt.plot(targets[0]) 104 | 105 | guess = predict(seq[0:1])[0] 106 | guessed_targets = plt.plot(guess, linestyle='--') 107 | for i, x in enumerate(guessed_targets): 108 | x.set_color(true_targets[i].get_color()) 109 | 110 | ax2.set_title('solid: true output, dashed: model output') 111 | plt.show() 112 | -------------------------------------------------------------------------------- /examples/rnn-lag-softmax.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.models.RNN import RNN 8 | from DL.optimizers import optimize 9 | from DL.utils import * 10 | import time 11 | import matplotlib.pyplot as plt 12 | 13 | # hide warnings 14 | import warnings 15 | warnings.simplefilter("ignore") 16 | 17 | print "Testing an RNN with softmax outputs" 18 | print "Generating lag test data..." 
19 | 20 | n_hidden = 100 21 | n_in = 5 22 | n_steps = 10 23 | n_seq = 100 24 | n_classes = 3 25 | n_out = n_classes # restricted to single softmax per time step 26 | 27 | # simple lag test 28 | seq = numpy.random.randn(n_seq, n_steps, n_in) 29 | targets = numpy.zeros((n_seq, n_steps), dtype=numpy.int) 30 | 31 | thresh = 0.5 32 | # if lag 1 (dim 3) is greater than lag 2 (dim 0) + thresh 33 | # class 1 34 | # if lag 1 (dim 3) is less than lag 2 (dim 0) - thresh 35 | # class 2 36 | # if lag 2(dim0) - thresh <= lag 1 (dim 3) <= lag2(dim0) + thresh 37 | # class 0 38 | targets[:, 2:][seq[:, 1:-1, 3] > seq[:, :-2, 0] + thresh] = 1 39 | targets[:, 2:][seq[:, 1:-1, 3] < seq[:, :-2, 0] - thresh] = 2 40 | #targets[:, 2:, 0] = numpy.cast[numpy.int](seq[:, 1:-1, 3] > seq[:, :-2, 0]) 41 | 42 | # split into training, validation, and test 43 | trainIdx = int(numpy.floor(4./6.*n_seq)) 44 | validIdx = int(numpy.floor(5./6.*n_seq)) 45 | 46 | lagData = ((seq[0:trainIdx,:,:], targets[0:trainIdx,:]), 47 | (seq[trainIdx:validIdx,:,:], targets[trainIdx:validIdx,:]), 48 | (seq[validIdx:,:,:], targets[validIdx:,:])) 49 | 50 | print "loading data to the GPU" 51 | dataset = load_data(lagData) 52 | 53 | print "creating the RNN" 54 | x = T.tensor3('x') # input 55 | t = T.matrix('t') # targets 56 | inputs = [x,t] 57 | # cast to an int. needs to be initially a float to load to the GPU 58 | it = t.astype('int64') 59 | 60 | rng = numpy.random.RandomState(int(time.time())) # random number generator 61 | 62 | rnn = RNN(rng=rng, 63 | input=x, 64 | n_in=n_in, 65 | n_hidden=n_hidden, 66 | n_out=n_out, 67 | activation='tanh', 68 | outputActivation='softmax' 69 | ) 70 | 71 | # regularization 72 | L1_reg=0.00 73 | L2_reg=0.0001 74 | 75 | # cost function 76 | cost = ( 77 | nll_multiclass_timeseries(rnn.output, it) 78 | + L1_reg * rnn.L1 79 | + L2_reg * rnn.L2_sqr 80 | ) 81 | 82 | pred = pred_multiclass(rnn.output) 83 | 84 | errors = pred_error(pred, it) 85 | 86 | params = flatten(rnn.params) 87 | 88 | print "training the rnn with rmsprop" 89 | 90 | optimize(dataset=dataset, 91 | inputs=inputs, 92 | cost=cost, 93 | params=params, 94 | errors=errors, 95 | n_epochs=5000, 96 | batch_size=100, 97 | patience=1000, 98 | patience_increase=2., 99 | improvement_threshold=0.9995, 100 | optimizer="rmsprop") 101 | 102 | print "compiling the prediction function" 103 | 104 | predict = theano.function(inputs=[x], outputs=rnn.output) 105 | 106 | print "predicting the first 10 samples of the training dataset" 107 | 108 | seqs = xrange(10) 109 | for seq_num in seqs: 110 | fig = plt.figure() 111 | ax1 = plt.subplot(211) 112 | plt.plot(seq[seq_num]) 113 | ax1.set_title('input') 114 | ax2 = plt.subplot(212) 115 | 116 | # blue line will represent true classes 117 | true_targets = plt.step(xrange(n_steps), targets[seq_num], marker='o') 118 | 119 | # show probabilities (in b/w) output by model 120 | guess = predict(seq[seq_num:seq_num+1])[0] 121 | guessed_probs = plt.imshow(guess.T, interpolation='nearest', cmap='gray') 122 | ax2.set_title('blue: true class, grayscale: probs assigned by model') 123 | 124 | plt.show() -------------------------------------------------------------------------------- /theano-tests/diag.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | 7 | x = T.matrix() 8 | y = T.vector() 9 | 10 | d = T.diag(x) 11 | z = T.diag(y) 12 | 13 | 14 | matrixDiags = theano.function(inputs=[x], outputs=d) 15 | vectorDiag 
= theano.function(inputs=[y], outputs=z) 16 | 17 | print matrixDiags([[1,2,3],[4,5,6], [7,8,9]]) 18 | print vectorDiag([1,2,3]) 19 | -------------------------------------------------------------------------------- /theano-tests/dot.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | 8 | 9 | y = T.vector() 10 | n_x = 2 11 | n_y = 3 12 | W_xy = numpy.random.randn(n_x, n_y) 13 | z = T.dot(W_xy, y) 14 | 15 | dot = theano.function(inputs=[y], outputs=z) 16 | 17 | print dot(numpy.random.randn(n_y)) 18 | 19 | 20 | q = T.vector() 21 | w = T.vector() 22 | 23 | vecDot = theano.function(inputs=[q,w], outputs=T.dot(q,w)) 24 | vecDot2 = theano.function(inputs=[q,w], outputs=T.dot(w,q)) 25 | vecMult = theano.function(inputs=[q,w], outputs=w*q) 26 | vecMult2 = theano.function(inputs=[q,w], outputs=q*w) 27 | 28 | 29 | print vecDot([1,2], [2,3]) 30 | print vecDot2([1,2], [2,3]) 31 | print vecMult([1,2], [2,3]) 32 | print vecMult2([1,2], [2,3]) 33 | -------------------------------------------------------------------------------- /theano-tests/embedding-indexing.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | from DL.utils import * 8 | 9 | input = T.imatrix('input') 10 | 11 | n_in = 5 12 | n_out = 4 13 | 14 | W = theano.shared(numpy.random.randn(n_in, n_out)) 15 | 16 | theano.printing.debugprint(input.shape[:-1], print_type=True) 17 | 18 | shape = T.concatenate([input.shape, [n_out]]) 19 | 20 | theano.printing.debugprint(shape, print_type=True) 21 | 22 | 23 | output = W[input.flatten()].reshape(shape, ndim=3) 24 | 25 | r = theano.function(inputs=[input], outputs=output) 26 | 27 | s = theano.function(inputs=[input], outputs=shape) 28 | 29 | # 1 example 30 | # 2 timesteps 31 | # ints representing 32 | print r([ 33 | [0,1,3,2,4], 34 | [0,1,3,2,4] 35 | ]) -------------------------------------------------------------------------------- /theano-tests/forward-feed-column-pooling.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | import operator 8 | from utils import * 9 | 10 | x = T.vector() 11 | y = T.vector() 12 | z = T.vector() 13 | 14 | X = T.stacklists([x,y,z]) 15 | 16 | M = T.max(X, axis=0) 17 | 18 | together = theano.function(inputs=[x,y,z], outputs=X) 19 | maximum = theano.function(inputs=[x,y,z], outputs=M) 20 | 21 | print together([1,2,3],[4,5,6], [7,8,9]) 22 | print maximum([1,2,3],[4,5,6], [7,8,9]) 23 | -------------------------------------------------------------------------------- /theano-tests/mean-pooling.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # coding: utf-8 3 | 4 | import theano 5 | import theano.tensor as T 6 | import numpy 7 | import operator 8 | from DL.utils import * 9 | 10 | 11 | n_examples = 4 12 | maxlen = 3 13 | dim_proj = 2 14 | 15 | mask = numpy.random.randn(n_examples, maxlen) 16 | z = numpy.random.randn(n_examples, maxlen, dim_proj) 17 | 18 | 19 | # only get the active and mean mool. 
20 | 21 | # mask[:, :, None].shape = (n_examples, maxlen, 1) 22 | # (z * mask[:, :, None]).shape = (n_examples, maxlen, dim_proj) 23 | # (z * mask[:, :, None]).sum(axis=1).shape = (n_examples, dim_proj) 24 | z = (z * mask[:, :, None]).sum(axis=1) 25 | # mask.sum(axis=1).shape = (n_examples,) 26 | # mask.sum(axis=1)[:, None].shape = (n_examples,1) 27 | z = z / mask.sum(axis=1)[:, None] 28 | # z is now (n_examples, dim_proj) -------------------------------------------------------------------------------- /theano-tests/random-streams-scan-clone.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy 4 | import time 5 | 6 | srng = T.shared_randomstreams.RandomStreams(int(time.time())) 7 | 8 | 9 | input = T.vector('input') 10 | inputs = T.matrix('inputs') 11 | 12 | 13 | def dropout(srng, dropout_rate, inp, size=None): 14 | if size is None: 15 | size = inp.shape 16 | # mask = srng.binomial(n=1, p=1-dropout_rate, size=(10,), dtype=theano.config.floatX) 17 | mask = srng.binomial(n=1, p=1-dropout_rate, size=size, dtype=theano.config.floatX) 18 | out = inp * mask 19 | return out 20 | 21 | output = dropout(srng, 0.5, input, inputs[0].shape) 22 | 23 | def step(x_t): 24 | upd = [(input, x_t)] 25 | y_t = theano.clone(output, replace=upd) 26 | return y_t 27 | 28 | outputs, _ = theano.scan(step, sequences=inputs, outputs_info=[None]) 29 | 30 | predict = theano.function(inputs=[inputs], outputs=outputs) 31 | 32 | print predict(numpy.random.randn(2,10)) 33 | 34 | g = theano.function(inputs=[inputs, outputs], outputs=T.grad(outputs, inputs)) 35 | 36 | print g 37 | 38 | # # define tensor variables 39 | # X = T.matrix("X") 40 | # W = T.matrix("W") 41 | # b_sym = T.vector("b_sym") 42 | 43 | # # define shared random stream 44 | # trng = T.shared_randomstreams.RandomStreams(1234) 45 | # d=trng.binomial(size=W[1].shape) 46 | 47 | # results, updates = theano.scan(lambda v: T.tanh(T.dot(v, W) + b_sym) * d, sequences=X) 48 | # compute_with_bnoise = theano.function(inputs=[X, W, b_sym], outputs=[results], 49 | # updates=updates, allow_input_downcast=True) 50 | # x = numpy.eye(10, 2, dtype=theano.config.floatX) 51 | # w = numpy.ones((2, 2), dtype=theano.config.floatX) 52 | # b = numpy.ones((2), dtype=theano.config.floatX) 53 | 54 | # print compute_with_bnoise(x, w, b) -------------------------------------------------------------------------------- /theano-tests/rnn-dropout.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy 4 | import time 5 | 6 | # Make a simple hidden layer: 7 | n_in = 10 8 | n_out = 20 9 | dropout_rate = 0.5 10 | 11 | rng = numpy.random.RandomState(int(time.time())) # random number generator 12 | 13 | input = T.matrix('input') 14 | 15 | W_values = numpy.asarray( 16 | rng.uniform( 17 | low=-numpy.sqrt(6. / (n_in + n_out)), 18 | high=numpy.sqrt(6. 
/ (n_in + n_out)), 19 | size=(n_in, n_out) 20 | ), 21 | dtype=theano.config.floatX 22 | ) 23 | 24 | W = theano.shared(value=W_values, name='W', borrow=True) 25 | 26 | b_values = numpy.zeros((n_out,), dtype=theano.config.floatX) 27 | b = theano.shared(value=b_values, name='b', borrow=True) 28 | 29 | output = T.tanh(T.dot(input, W) + b) 30 | 31 | updates = {} 32 | if dropout_rate > 0: 33 | # p=1-p because 1's indicate keep and p is prob of dropping 34 | srng = theano.tensor.shared_randomstreams.RandomStreams(int(time.time())) 35 | # mask = T.imatrix('mask') 36 | mask = srng.binomial(n=1, p=1-dropout_rate, size=output.shape) 37 | # The cast is important because int * float32 = float64 which pulls things off the gpu 38 | output = output * T.cast(mask, theano.config.floatX) 39 | updates = srng.updates() 40 | 41 | 42 | x = T.tensor3('x') 43 | 44 | def step(x_t, x_tm1): 45 | replace = [(input, x_t)] 46 | replace += updates 47 | x_tp1 = theano.clone(output, replace=replace) 48 | return x_tp1 + x_tm1 49 | 50 | z, _ = theano.scan(step, sequences=[x[1:]], outputs_info=[x[0]]) 51 | 52 | print z 53 | 54 | predit = theano.function(inputs=[x], outputs=z) -------------------------------------------------------------------------------- /theano-tests/tensor-shape-append.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy 4 | 5 | 6 | # x = T.matrix('x') 7 | # y = T.tensor3('y') 8 | 9 | # sx = x.shape 10 | # sy = y.shape 11 | 12 | # shape2 = theano.function(inputs=[x], outputs=sx) 13 | # shape3 = theano.function(inputs=[y], outputs=sy) 14 | 15 | # print shape2(numpy.random.randn(2,3)) 16 | # print shape3(numpy.random.randn(2,3,4)) 17 | 18 | # raise 19 | 20 | x = T.matrix('x') 21 | y = T.tensor3('y') 22 | 23 | sx = x.shape 24 | sy = y.shape 25 | 26 | sx = T.concatenate([sx[:-1], [10]]) 27 | sy = T.concatenate([sy[:-1], [10]]) 28 | 29 | shape2 = theano.function(inputs=[x], outputs=sx) 30 | shape3 = theano.function(inputs=[y], outputs=sy) 31 | 32 | print shape2(numpy.random.randn(2,3)) 33 | print shape3(numpy.random.randn(2,3,4)) --------------------------------------------------------------------------------
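
A small numpy sanity check of the trick that embedding-indexing.py and tensor-shape-append.py exercise together: look up rows of an embedding matrix with a flattened integer matrix, then reshape back with the embedding width appended to the input shape. This is only an illustrative sketch with made-up sizes (W, idx, n_in, n_out here are placeholders, not part of the DL package); plain numpy fancy indexing gives the same result in one step.

import numpy

n_in = 5   # vocabulary size
n_out = 4  # embedding width

W = numpy.random.randn(n_in, n_out)          # embedding matrix
idx = numpy.random.randint(0, n_in, (2, 3))  # (n_examples, n_timesteps) word ids

# flatten the ids, index rows of W, then restore the leading dimensions
out = W[idx.flatten()].reshape(idx.shape + (n_out,))

# fancy indexing is equivalent
assert numpy.allclose(out, W[idx])

print out.shape  # (2, 3, 4)

The symbolic version in embedding-indexing.py has to build the target shape with T.concatenate because the input shape is only known at run time, which is the same pattern tensor-shape-append.py exercises.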