├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── examples
│   ├── associative-recall-task.py
│   ├── copy-task.py
│   ├── dyck-words-task.py
│   ├── repeat-copy-task.py
│   ├── reversed-copy-task.py
│   ├── sort-task.py
│   └── upsidedown-copy-task.py
├── models
│   ├── README.rst
│   ├── associative-recall.npy
│   ├── copy.npy
│   ├── dyck-words.npy
│   ├── repeat-copy.npy
│   ├── reversed-copy.npy
│   ├── sort.npy
│   └── upsidedown-copy.npy
├── ntm
│   ├── __init__.py
│   ├── controllers.py
│   ├── heads.py
│   ├── init.py
│   ├── layers.py
│   ├── memory.py
│   ├── nonlinearities.py
│   ├── similarities.py
│   ├── test
│   │   ├── test_heads.py
│   │   ├── test_layers.py
│   │   └── test_similarities.py
│   └── updates.py
├── requirements.txt
├── setup.py
└── utils
    ├── __init__.py
    ├── generators.py
    └── visualization.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | 
6 | # C extensions
7 | *.so
8 | 
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | 
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 | 
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 | 
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *,cover
47 | .hypothesis/
48 | 
49 | # Translations
50 | *.mo
51 | *.pot
52 | 
53 | # Django stuff:
54 | *.log
55 | local_settings.py
56 | 
57 | # Flask stuff:
58 | instance/
59 | .webassets-cache
60 | 
61 | # Scrapy stuff:
62 | .scrapy
63 | 
64 | # Sphinx documentation
65 | docs/_build/
66 | 
67 | # PyBuilder
68 | target/
69 | 
70 | # Jupyter Notebook
71 | .ipynb_checkpoints
72 | 
73 | # pyenv
74 | .python-version
75 | 
76 | # celery beat schedule file
77 | celerybeat-schedule
78 | 
79 | # SageMath parsed files
80 | *.sage.py
81 | 
82 | # dotenv
83 | .env
84 | 
85 | # virtualenv
86 | .venv
87 | venv/
88 | ENV/
89 | 
90 | # Spyder project settings
91 | .spyderproject
92 | 
93 | # Rope project settings
94 | .ropeproject
95 | 
96 | ## Temporary
97 | tmp/
98 | data/
99 | img/
100 | notebooks/
101 | animation/
102 | models/learning_curves/
103 | snapshots/
104 | 
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 | 
3 | ## Our Pledge
4 | 
5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
6 | 7 | ## Our Standards 8 | 9 | Examples of behavior that contributes to creating a positive environment include: 10 | 11 | * Using welcoming and inclusive language 12 | * Being respectful of differing viewpoints and experiences 13 | * Gracefully accepting constructive criticism 14 | * Focusing on what is best for the community 15 | * Showing empathy towards other community members 16 | 17 | Examples of unacceptable behavior by participants include: 18 | 19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 20 | * Trolling, insulting/derogatory comments, and personal or political attacks 21 | * Public or private harassment 22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 23 | * Other conduct which could reasonably be considered inappropriate in a professional setting 24 | 25 | ## Our Responsibilities 26 | 27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 28 | 29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 30 | 31 | ## Scope 32 | 33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 34 | 35 | ## Enforcement 36 | 37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at support@snips.ai. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 38 | 39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 40 | 41 | ## Attribution 42 | 43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] 44 | 45 | [homepage]: http://contributor-covenant.org 46 | [version]: http://contributor-covenant.org/version/1/4/ 47 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | How to Contribute 2 | ================= 3 | 4 | Contributions are welcome! Not familiar with the codebase yet? No problem! 5 | There are many ways to contribute to open source projects: reporting bugs, 6 | helping with the documentation, spreading the word and of course, adding 7 | new features and patches. 8 | 9 | Getting Started 10 | --------------- 11 | * Make sure you have a GitHub account. 
12 | * Open a [new issue](https://github.com/snipsco/ntm-lasagne/issues), assuming one does not already exist.
13 | * Clearly describe the issue, including steps to reproduce when it is a bug.
14 | 
15 | Making Changes
16 | --------------
17 | * Fork this repository.
18 | * Create a feature branch from where you want to base your work.
19 | * Make commits of logical units (if needed, rebase your feature branch before
20 |   submitting it).
21 | * Check for unnecessary whitespace with ``git diff --check`` before committing.
22 | * Make sure your commit messages are well formatted.
23 | * If your commit fixes an open issue, reference it in the commit message (e.g. `#15`).
24 | * Run all the tests (if any) to ensure nothing else was accidentally broken.
25 | 
26 | These guidelines also apply when helping with documentation.
27 | 
28 | Submitting Changes
29 | ------------------
30 | * Push your changes to a feature branch in your fork of the repository.
31 | * Submit a `Pull Request`.
32 | * Wait for maintainer feedback.
33 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 | 
3 | Copyright (c) 2015-2016 Tristan Deleu
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # NTM-Lasagne
2 | 
3 | [![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/snipsco/ntm-lasagne/master/LICENSE)
4 | 
5 | NTM-Lasagne is a library to create Neural Turing Machines (NTMs) in [Theano](http://deeplearning.net/software/theano/) using the [Lasagne](http://lasagne.readthedocs.org/) library. If you want to learn more about NTMs, check out our [blog post](https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315#.63t84s5r5).
6 | 
7 | This library features:
8 | - A Neural Turing Machine layer `NTMLayer`, whose components (controller, heads, memory) are fully customizable.
9 | - Two types of controllers: a feed-forward `DenseController` and a "vanilla" recurrent `RecurrentController`.
10 | - A dashboard to visualize the inner workings of the NTM.
11 | - Generators to sample examples from algorithmic tasks (see the sketch below).
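For instance, the generators in `utils.generators` are iterables that yield numbered `(input, target)` batches; a minimal sketch, mirroring `examples/copy-task.py`:

```python
from utils.generators import CopyTask

# Binary sequences of width 8 and length up to 5, one example per batch
generator = CopyTask(batch_size=1, max_iter=1000000, size=8,
                     max_length=5, end_marker=True)
for i, (example_input, example_output) in generator:
    pass  # feed the batch to a compiled training function here
```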
12 | 
13 | ## Getting started
14 | To avoid any conflict with your existing Python setup, and to keep this project self-contained, it is suggested to work in a virtual environment with [`virtualenv`](http://docs.python-guide.org/en/latest/dev/virtualenvs/). To install `virtualenv`:
15 | ```bash
16 | sudo pip install --upgrade virtualenv
17 | ```
18 | 
19 | Create a virtual environment called `venv`, activate it, and install the requirements given by `requirements.txt`. NTM-Lasagne requires the bleeding-edge version of Lasagne; check the [Lasagne installation instructions](http://lasagne.readthedocs.org/en/latest/user/installation.html#bleeding-edge-version) for details. The latest version of [Lasagne](https://github.com/Lasagne/Lasagne/) is included in `requirements.txt`.
20 | ```bash
21 | virtualenv venv
22 | source venv/bin/activate
23 | pip install -r requirements.txt
24 | pip install .
25 | ```
26 | 
27 | ## Example
28 | Here is a minimal example defining an `NTMLayer`:
29 | 
30 | ```python
31 | # Neural Turing Machine Layer
32 | memory = Memory((128, 20), memory_init=lasagne.init.Constant(1e-6),
33 |                 learn_init=False, name='memory')
34 | controller = DenseController(l_input, memory_shape=(128, 20),
35 |                              num_units=100, num_reads=1,
36 |                              nonlinearity=lasagne.nonlinearities.rectify,
37 |                              name='controller')
38 | heads = [
39 |     WriteHead(controller, num_shifts=3, memory_shape=(128, 20),
40 |               nonlinearity_key=lasagne.nonlinearities.rectify,
41 |               nonlinearity_add=lasagne.nonlinearities.rectify,
42 |               learn_init=False, name='write'),
43 |     ReadHead(controller, num_shifts=3, memory_shape=(128, 20),
44 |              nonlinearity_key=lasagne.nonlinearities.rectify,
45 |              learn_init=False, name='read')
46 | ]
47 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads)
48 | ```
49 | 
50 | For more detailed examples, check the [`examples` folder](examples/). If you would like to train a Neural Turing Machine on one of these examples, simply run the corresponding script, e.g.
51 | 
52 | ```
53 | PYTHONPATH=. python examples/copy-task.py
54 | ```
55 | 
56 | ## Tests
57 | This project has a few basic tests. To run them, run `py.test` from the project folder:
58 | ```bash
59 | venv/bin/py.test ntm -vv
60 | ```
61 | 
62 | ## Known issues
63 | Graph optimization is computationally intensive. If you encounter suspiciously long compilation times (more than a few minutes), you may need to increase the amount of memory allocated (if you are running in a virtual machine). Alternatively, turning off the swap may help for debugging (with `swapoff`/`swapon`).
64 | 
65 | Note: an unlucky initialisation of the parameters might lead to a diverging solution (`NaN` scores).
66 | 
67 | ## Paper
68 | Alex Graves, Greg Wayne, Ivo Danihelka, *Neural Turing Machines*, [[arXiv](https://arxiv.org/abs/1410.5401)]
69 | 
70 | ## Contributing
71 | 
72 | Please see the [Contribution Guidelines](https://github.com/snipsco/ntm-lasagne/blob/master/CONTRIBUTING.md).
73 | 
74 | ## Copyright
75 | 
76 | This library is provided by [Snips](https://www.snips.ai) as Open Source software. See [LICENSE](https://github.com/snipsco/ntm-lasagne/blob/master/LICENSE) for more information.
77 | -------------------------------------------------------------------------------- /examples/associative-recall-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import DenseController 16 | from ntm.heads import WriteHead, ReadHead 17 | from ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import AssociativeRecallTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | 24 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)): 25 | 26 | # Input Layer 27 | l_input = InputLayer((batch_size, None, size + 2), input_var=input_var) 28 | _, seqlen, _ = l_input.input_var.shape 29 | 30 | # Neural Turing Machine Layer 31 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 32 | controller = DenseController(l_input, memory_shape=memory_shape, 33 | num_units=num_units, num_reads=1, 34 | nonlinearity=lasagne.nonlinearities.rectify, 35 | name='controller') 36 | heads = [ 37 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 38 | nonlinearity_key=lasagne.nonlinearities.rectify, 39 | nonlinearity_add=lasagne.nonlinearities.rectify), 40 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 41 | nonlinearity_key=lasagne.nonlinearities.rectify) 42 | ] 43 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 44 | 45 | # Output Layer 46 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 47 | l_output_dense = DenseLayer(l_output_reshape, num_units=size + 2, nonlinearity=lasagne.nonlinearities.sigmoid, \ 48 | name='dense') 49 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 2)) 50 | 51 | return l_output, l_ntm 52 | 53 | 54 | if __name__ == '__main__': 55 | # Define the input and expected output variable 56 | input_var, target_var = T.tensor3s('input', 'target') 57 | # The generator to sample examples from 58 | generator = AssociativeRecallTask(batch_size=1, max_iter=1000000, size=8, max_num_items=6, \ 59 | min_item_length=1, max_item_length=3) 60 | # The model (1-layer Neural Turing Machine) 61 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, 62 | size=generator.size, num_units=100, memory_shape=(128, 20)) 63 | # The generated output variable and the loss function 64 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. 
- 1e-6) 65 | loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var)) 66 | # Create the update expressions 67 | params = lasagne.layers.get_all_params(l_output, trainable=True) 68 | learning_rate = theano.shared(1e-4) 69 | updates = lasagne.updates.adam(loss, params, learning_rate=learning_rate) 70 | # Compile the function for a training step, as well as the prediction function and 71 | # a utility function to get the inner details of the NTM 72 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 73 | ntm_fn = theano.function([input_var], pred_var) 74 | ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True)) 75 | 76 | # Training 77 | try: 78 | scores, all_scores = [], [] 79 | for i, (example_input, example_output) in generator: 80 | score = train_fn(example_input, example_output) 81 | scores.append(score) 82 | all_scores.append(score) 83 | if i % 500 == 0: 84 | mean_scores = np.mean(scores) 85 | if mean_scores < 0.01: 86 | learning_rate.set_value(1e-5) 87 | print 'Batch #%d: %.6f' % (i, mean_scores) 88 | scores = [] 89 | except KeyboardInterrupt: 90 | pass 91 | 92 | # Visualization 93 | def marker1(params): 94 | return params['num_items'] * (params['item_length'] + 1) 95 | def marker2(params): 96 | return (params['num_items'] + 1) * (params['item_length'] + 1) 97 | markers = [ 98 | { 99 | 'location': marker1, 100 | 'style': {'color': 'red', 'ls': '-'} 101 | }, 102 | { 103 | 'location': marker2, 104 | 'style': {'color': 'green', 'ls': '-'} 105 | } 106 | ] 107 | 108 | dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \ 109 | memory_shape=(128, 20), markers=markers, cmap='bone') 110 | 111 | # Example 112 | params = generator.sample_params() 113 | dashboard.sample(**params) 114 | -------------------------------------------------------------------------------- /examples/copy-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import DenseController 16 | from ntm.heads import WriteHead, ReadHead 17 | from ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import CopyTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)): 24 | 25 | # Input Layer 26 | l_input = InputLayer((batch_size, None, size + 1), input_var=input_var) 27 | _, seqlen, _ = l_input.input_var.shape 28 | 29 | # Neural Turing Machine Layer 30 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = DenseController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | 
nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 46 | l_output_dense = DenseLayer(l_output_reshape, num_units=size + 1, nonlinearity=lasagne.nonlinearities.sigmoid, \ 47 | name='dense') 48 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 1)) 49 | 50 | return l_output, l_ntm 51 | 52 | 53 | if __name__ == '__main__': 54 | # Define the input and expected output variable 55 | input_var, target_var = T.tensor3s('input', 'target') 56 | # The generator to sample examples from 57 | generator = CopyTask(batch_size=1, max_iter=1000000, size=8, max_length=5, end_marker=True) 58 | # The model (1-layer Neural Turing Machine) 59 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, \ 60 | size=generator.size, num_units=100, memory_shape=(128, 20)) 61 | # The generated output variable and the loss function 62 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. - 1e-6) 63 | loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var)) 64 | # Create the update expressions 65 | params = lasagne.layers.get_all_params(l_output, trainable=True) 66 | updates = graves_rmsprop(loss, params, learning_rate=1e-3) 67 | # Compile the function for a training step, as well as the prediction function and 68 | # a utility function to get the inner details of the NTM 69 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 70 | ntm_fn = theano.function([input_var], pred_var) 71 | ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True)) 72 | 73 | # Training 74 | try: 75 | scores, all_scores = [], [] 76 | for i, (example_input, example_output) in generator: 77 | score = train_fn(example_input, example_output) 78 | scores.append(score) 79 | all_scores.append(score) 80 | if i % 500 == 0: 81 | mean_scores = np.mean(scores) 82 | print 'Batch #%d: %.6f' % (i, mean_scores) 83 | scores = [] 84 | except KeyboardInterrupt: 85 | pass 86 | 87 | # Visualization 88 | markers = [ 89 | { 90 | 'location': (lambda params: params['length']), 91 | 'style': {'color': 'red'} 92 | } 93 | ] 94 | 95 | dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \ 96 | memory_shape=(128, 20), markers=markers, cmap='bone') 97 | 98 | # Example 99 | params = generator.sample_params() 100 | dashboard.sample(**params) 101 | -------------------------------------------------------------------------------- /examples/dyck-words-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import DenseController 16 | from ntm.heads import WriteHead, ReadHead 17 | from ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import DyckWordsTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | def model(input_var, batch_size=1, num_units=100, memory_shape=(128, 20)): 24 | 25 | # Input Layer 26 | l_input = InputLayer((batch_size, None, 1), input_var=input_var) 27 | _, seqlen, 
_ = l_input.input_var.shape 28 | 29 | # Neural Turing Machine Layer 30 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = DenseController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 46 | l_output_dense = DenseLayer(l_output_reshape, num_units=1, nonlinearity=lasagne.nonlinearities.sigmoid, \ 47 | name='dense') 48 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, 1)) 49 | 50 | return l_output, l_ntm 51 | 52 | 53 | if __name__ == '__main__': 54 | # Define the input and expected output variable 55 | input_var, target_var = T.tensor3s('input', 'target') 56 | # The generator to sample examples from 57 | generator = DyckWordsTask(batch_size=1, max_iter=1000000, max_length=5) 58 | # The model (1-layer Neural Turing Machine) 59 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, 60 | num_units=100, memory_shape=(128, 20)) 61 | # The generated output variable and the loss function 62 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. - 1e-6) 63 | loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var)) 64 | # Create the update expressions 65 | params = lasagne.layers.get_all_params(l_output, trainable=True) 66 | updates = lasagne.updates.adam(loss, params, learning_rate=5e-4) 67 | # Compile the function for a training step, as well as the prediction function and 68 | # a utility function to get the inner details of the NTM 69 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 70 | ntm_fn = theano.function([input_var], pred_var) 71 | ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True)) 72 | 73 | # Training 74 | try: 75 | scores, all_scores = [], [] 76 | for i, (example_input, example_output) in generator: 77 | score = train_fn(example_input, example_output) 78 | scores.append(score) 79 | all_scores.append(score) 80 | if i % 500 == 0: 81 | mean_scores = np.mean(scores) 82 | if mean_scores < 1e-4 and generator.max_length < 20: 83 | generator.max_length *= 2 84 | print 'Batch #%d: %.6f' % (i, mean_scores) 85 | scores = [] 86 | except KeyboardInterrupt: 87 | pass 88 | -------------------------------------------------------------------------------- /examples/repeat-copy-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import DenseController 16 | from ntm.heads import WriteHead, ReadHead 17 | from 
ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import RepeatCopyTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)): 24 | 25 | # Input Layer 26 | l_input = InputLayer((batch_size, None, size + 2), input_var=input_var) 27 | _, seqlen, _ = l_input.input_var.shape 28 | 29 | # Neural Turing Machine Layer 30 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = DenseController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 46 | l_output_dense = DenseLayer(l_output_reshape, num_units=size + 2, nonlinearity=lasagne.nonlinearities.sigmoid, \ 47 | name='dense') 48 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 2)) 49 | 50 | return l_output, l_ntm 51 | 52 | 53 | if __name__ == '__main__': 54 | # Define the input and expected output variable 55 | input_var, target_var = T.tensor3s('input', 'target') 56 | # The generator to sample examples from 57 | generator = RepeatCopyTask(batch_size=1, max_iter=1000000, size=8, min_length=3, \ 58 | max_length=5, max_repeats=5, unary=True, end_marker=True) 59 | # The model (1-layer Neural Turing Machine) 60 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, 61 | size=generator.size, num_units=100, memory_shape=(128, 20)) 62 | # The generated output variable and the loss function 63 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. 
- 1e-6) 64 | loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var)) 65 | # Create the update expressions 66 | params = lasagne.layers.get_all_params(l_output, trainable=True) 67 | updates = lasagne.updates.adam(loss, params, learning_rate=5e-4) 68 | # Compile the function for a training step, as well as the prediction function and 69 | # a utility function to get the inner details of the NTM 70 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 71 | ntm_fn = theano.function([input_var], pred_var) 72 | ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True)) 73 | 74 | # Training 75 | try: 76 | scores, all_scores = [], [] 77 | for i, (example_input, example_output) in generator: 78 | score = train_fn(example_input, example_output) 79 | scores.append(score) 80 | all_scores.append(score) 81 | if i % 500 == 0: 82 | mean_scores = np.mean(scores) 83 | print 'Batch #%d: %.6f' % (i, mean_scores) 84 | scores = [] 85 | except KeyboardInterrupt: 86 | pass 87 | 88 | # Visualization 89 | def marker(generator): 90 | def marker_(params): 91 | num_repeats_length = params['repeats'] if generator.unary else 1 92 | return params['length'] + num_repeats_length 93 | return marker_ 94 | markers = [ 95 | { 96 | 'location': marker(generator), 97 | 'style': {'color': 'red', 'ls': '-'} 98 | } 99 | ] 100 | 101 | dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \ 102 | memory_shape=(128, 20), markers=markers, cmap='bone') 103 | 104 | # Example 105 | params = generator.sample_params() 106 | dashboard.sample(**params) 107 | -------------------------------------------------------------------------------- /examples/reversed-copy-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import RecurrentController 16 | from ntm.heads import WriteHead, ReadHead 17 | from ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import ReversedCopyTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)): 24 | 25 | # Input Layer 26 | l_input = InputLayer((batch_size, None, size + 1), input_var=input_var) 27 | _, seqlen, _ = l_input.input_var.shape 28 | 29 | # Neural Turing Machine Layer 30 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = RecurrentController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 
|     l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units))
46 |     l_output_dense = DenseLayer(l_output_reshape, num_units=size + 1, nonlinearity=lasagne.nonlinearities.sigmoid, \
47 |                                 name='dense')
48 |     l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 1))
49 | 
50 |     return l_output, l_ntm
51 | 
52 | 
53 | if __name__ == '__main__':
54 |     # Define the input and expected output variable
55 |     input_var, target_var = T.tensor3s('input', 'target')
56 |     # The generator to sample examples from
57 |     generator = ReversedCopyTask(batch_size=1, max_iter=1000000, size=8, max_length=5, end_marker=True)
58 |     # The model (1-layer Neural Turing Machine)
59 |     l_output, l_ntm = model(input_var, batch_size=generator.batch_size, \
60 |         size=generator.size, num_units=100, memory_shape=(128, 20))
61 |     # The generated output variable and the loss function
62 |     pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. - 1e-6)
63 |     loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var))
64 |     # Create the update expressions
65 |     params = lasagne.layers.get_all_params(l_output, trainable=True)
66 |     updates = graves_rmsprop(loss, params, learning_rate=1e-3)
67 |     # Compile the function for a training step, as well as the prediction function and
68 |     # a utility function to get the inner details of the NTM
69 |     train_fn = theano.function([input_var, target_var], loss, updates=updates)
70 |     ntm_fn = theano.function([input_var], pred_var)
71 |     ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True))
72 | 
73 |     # Training
74 |     try:
75 |         scores, all_scores = [], []
76 |         for i, (example_input, example_output) in generator:
77 |             score = train_fn(example_input, example_output)
78 |             scores.append(score)
79 |             all_scores.append(score)
80 |             if i % 500 == 0:
81 |                 mean_scores = np.mean(scores)
82 |                 print 'Batch #%d: %.6f' % (i, mean_scores)
83 |                 scores = []
84 |     except KeyboardInterrupt:
85 |         pass
86 | 
87 |     # Visualization
88 |     markers = [
89 |         {
90 |             'location': (lambda params: params['length']),
91 |             'style': {'color': 'red'}
92 |         }
93 |     ]
94 | 
95 |     dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \
96 |                           memory_shape=(128, 20), markers=markers, cmap='bone')
97 | 
98 |     # Example
99 |     params = generator.sample_params()
100 |     dashboard.sample(**params)
101 | 
--------------------------------------------------------------------------------
/examples/sort-task.py:
--------------------------------------------------------------------------------
1 | import theano
2 | import theano.tensor as T
3 | import numpy as np
4 | import matplotlib.pyplot as plt
5 | 
6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer
7 | import lasagne.layers
8 | import lasagne.nonlinearities
9 | import lasagne.updates
10 | import lasagne.objectives
11 | import lasagne.init
12 | 
13 | from ntm.layers import NTMLayer
14 | from ntm.memory import Memory
15 | from ntm.controllers import GRUController
16 | from ntm.heads import WriteHead, ReadHead
17 | from ntm.updates import graves_rmsprop
18 | 
19 | from utils.generators import SortTask
20 | from utils.visualization import Dashboard
21 | 
22 | 
23 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)):
24 | 
25 |     # Input Layer
26 |     l_input = InputLayer((batch_size, None, size + 1), input_var=input_var)
27 |     _, seqlen, _ = l_input.input_var.shape
28 | 
29 |     # Neural Turing Machine Layer
30 |     memory = Memory(memory_shape, name='memory',
memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = GRUController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 46 | l_output_dense = DenseLayer(l_output_reshape, num_units=size + 1, nonlinearity=lasagne.nonlinearities.sigmoid, \ 47 | name='dense') 48 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 1)) 49 | 50 | return l_output, l_ntm 51 | 52 | 53 | if __name__ == '__main__': 54 | # Define the input and expected output variable 55 | input_var, target_var = T.tensor3s('input', 'target') 56 | # The generator to sample examples from 57 | generator = SortTask(batch_size=1, max_iter=1000000, size=8, max_length=5, end_marker=True) 58 | # The model (1-layer Neural Turing Machine) 59 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, \ 60 | size=generator.size, num_units=100, memory_shape=(128, 20)) 61 | # The generated output variable and the loss function 62 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. - 1e-6) 63 | loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var)) 64 | # Create the update expressions 65 | params = lasagne.layers.get_all_params(l_output, trainable=True) 66 | updates = graves_rmsprop(loss, params, learning_rate=1e-3) 67 | # Compile the function for a training step, as well as the prediction function and 68 | # a utility function to get the inner details of the NTM 69 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 70 | ntm_fn = theano.function([input_var], pred_var) 71 | ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True)) 72 | 73 | # Training 74 | try: 75 | scores, all_scores = [], [] 76 | for i, (example_input, example_output) in generator: 77 | score = train_fn(example_input, example_output) 78 | scores.append(score) 79 | all_scores.append(score) 80 | if i % 500 == 0: 81 | mean_scores = np.mean(scores) 82 | print 'Batch #%d: %.6f' % (i, mean_scores) 83 | scores = [] 84 | except KeyboardInterrupt: 85 | pass 86 | 87 | # Visualization 88 | markers = [ 89 | { 90 | 'location': (lambda params: params['length']), 91 | 'style': {'color': 'red'} 92 | } 93 | ] 94 | 95 | dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \ 96 | memory_shape=(128, 20), markers=markers, cmap='bone') 97 | 98 | # Example 99 | params = generator.sample_params() 100 | dashboard.sample(**params) 101 | -------------------------------------------------------------------------------- /examples/upsidedown-copy-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 
10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import DenseController 16 | from ntm.heads import WriteHead, ReadHead 17 | from ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import UpsideDownCopyTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)): 24 | 25 | # Input Layer 26 | l_input = InputLayer((batch_size, None, size + 1), input_var=input_var) 27 | _, seqlen, _ = l_input.input_var.shape 28 | 29 | # Neural Turing Machine Layer 30 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = DenseController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 46 | l_output_dense = DenseLayer(l_output_reshape, num_units=size + 1, nonlinearity=lasagne.nonlinearities.sigmoid, \ 47 | name='dense') 48 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 1)) 49 | 50 | return l_output, l_ntm 51 | 52 | 53 | if __name__ == '__main__': 54 | # Define the input and expected output variable 55 | input_var, target_var = T.tensor3s('input', 'target') 56 | # The generator to sample examples from 57 | generator = UpsideDownCopyTask(batch_size=1, max_iter=1000000, size=8, max_length=5, end_marker=True) 58 | # The model (1-layer Neural Turing Machine) 59 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, \ 60 | size=generator.size, num_units=100, memory_shape=(128, 20)) 61 | # The generated output variable and the loss function 62 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. 
- 1e-6)
63 |     loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var))
64 |     # Create the update expressions
65 |     params = lasagne.layers.get_all_params(l_output, trainable=True)
66 |     updates = graves_rmsprop(loss, params, learning_rate=1e-3)
67 |     # Compile the function for a training step, as well as the prediction function and
68 |     # a utility function to get the inner details of the NTM
69 |     train_fn = theano.function([input_var, target_var], loss, updates=updates)
70 |     ntm_fn = theano.function([input_var], pred_var)
71 |     ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True))
72 | 
73 |     # Training
74 |     try:
75 |         scores, all_scores = [], []
76 |         for i, (example_input, example_output) in generator:
77 |             score = train_fn(example_input, example_output)
78 |             scores.append(score)
79 |             all_scores.append(score)
80 |             if i % 500 == 0:
81 |                 mean_scores = np.mean(scores)
82 |                 print 'Batch #%d: %.6f' % (i, mean_scores)
83 |                 scores = []
84 |     except KeyboardInterrupt:
85 |         pass
86 | 
87 |     # Visualization
88 |     markers = [
89 |         {
90 |             'location': (lambda params: params['length']),
91 |             'style': {'color': 'red'}
92 |         }
93 |     ]
94 | 
95 |     dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \
96 |                           memory_shape=(128, 20), markers=markers, cmap='bone')
97 | 
98 |     # Example
99 |     params = generator.sample_params()
100 |     dashboard.sample(**params)
101 | 
--------------------------------------------------------------------------------
/models/README.rst:
--------------------------------------------------------------------------------
1 | Models
2 | ======
3 | 
4 | Copy task
5 | ---------
6 | 
7 | General
8 | ^^^^^^^
9 | * **Batch size** : 1
10 | * **Architecture** : Single NTM layer + Dense output layer
11 | 
12 |   - *Memory shape* : ``(128, 20)``
13 |   - *Controller* : Dense controller
14 |   - *Heads* : 1 Read head + 1 Write head
15 | 
16 | * **Training examples**
17 | 
18 |   - *Size* : 8
19 |   - *Minimum Length* : 1
20 |   - *Maximum Length* : 5
21 | 
22 | Optimization
23 | ^^^^^^^^^^^^
24 | * **Objective** : Binary Cross-Entropy
25 | 
26 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
27 | 
28 | * **Learning algorithm** : A.
Graves' RMSProp 29 | 30 | - *Learning rate* : ``1e-3`` 31 | - *Chi* : ``0.95`` 32 | - *Alpha* : ``0.9`` 33 | - *Epsilon* : ``1e-4`` 34 | 35 | Parameters 36 | ^^^^^^^^^^ 37 | +------------------+--------------+---------------------+------------------+------------------+ 38 | | | Parameter | W (init) | b (init) | nonlinearity | 39 | +==================+==============+=====================+==================+==================+ 40 | | **Controller** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 41 | +------------------+--------------+---------------------+------------------+------------------+ 42 | | | ``sign`` | ``None`` | \- | \- | 43 | | **Read Head** +--------------+---------------------+------------------+------------------+ 44 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 45 | | +--------------+---------------------+------------------+------------------+ 46 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 47 | | +--------------+---------------------+------------------+------------------+ 48 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 49 | | +--------------+---------------------+------------------+------------------+ 50 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 51 | | +--------------+---------------------+------------------+------------------+ 52 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 53 | +------------------+--------------+---------------------+------------------+------------------+ 54 | | | ``sign`` | ``None`` | \- | \- | 55 | | **Write Head** +--------------+---------------------+------------------+------------------+ 56 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 57 | | +--------------+---------------------+------------------+------------------+ 58 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 59 | | +--------------+---------------------+------------------+------------------+ 60 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 61 | | +--------------+---------------------+------------------+------------------+ 62 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 63 | | +--------------+---------------------+------------------+------------------+ 64 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 65 | | +--------------+---------------------+------------------+------------------+ 66 | | | ``sign_add`` | ``None`` | \- | \- | 67 | | +--------------+---------------------+------------------+------------------+ 68 | | | ``add`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 69 | | +--------------+---------------------+------------------+------------------+ 70 | | | ``erase`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 71 | +------------------+--------------+---------------------+------------------+------------------+ 72 | | **Dense Layer** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid`` | 73 | +------------------+--------------+---------------------+------------------+------------------+ 74 | 75 | Initialization 76 | ^^^^^^^^^^^^^^ 77 | +------------------+---------------------+-------------+-------------------+ 78 | | | Initialization | Learn init? 
| Operation dropout |
79 | +==================+=====================+=============+===================+
80 | | **Memory** | ``GlorotUniform()`` | ``False`` | \- |
81 | +------------------+---------------------+-------------+-------------------+
82 | | **Read Head** | ``init.OneHot()`` | ``False`` | No |
83 | +------------------+---------------------+-------------+-------------------+
84 | | **Write Head** | ``init.OneHot()`` | ``False`` | No |
85 | +------------------+---------------------+-------------+-------------------+
86 | 
87 | 
88 | Repeat Copy task
89 | ----------------
90 | **Git commit** : ``90d72d6``
91 | 
92 | General
93 | ^^^^^^^
94 | * **Batch size** : 1
95 | * **Architecture** : Single NTM layer + Dense output layer
96 | 
97 |   - *Memory shape* : ``(128, 20)``
98 |   - *Controller* : Dense controller
99 |   - *Heads* : 1 Read head + 1 Write head
100 | 
101 | * **Training examples**
102 | 
103 |   - *Size* : 8
104 |   - *Minimum Length* : 3
105 |   - *Maximum Length* : 5
106 |   - *Minimum Repeat number* : 1
107 |   - *Maximum Repeat number* : 5
108 |   - *Unary* : ``True``
109 | 
110 | Optimization
111 | ^^^^^^^^^^^^
112 | * **Objective** : Binary Cross-Entropy
113 | 
114 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
115 | 
116 | * **Learning algorithm** : A. Graves' RMSProp
117 | 
118 |   - *Learning rate* : ``1e-3``
119 |   - *Chi* : ``0.95``
120 |   - *Alpha* : ``0.9``
121 |   - *Epsilon* : ``1e-4``
122 | 
123 | Parameters
124 | ^^^^^^^^^^
125 | +------------------+--------------+---------------------+------------------+------------------+
126 | | | Parameter | W (init) | b (init) | nonlinearity |
127 | +==================+==============+=====================+==================+==================+
128 | | **Controller** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
129 | +------------------+--------------+---------------------+------------------+------------------+
130 | | | ``sign`` | ``None`` | \- | \- |
131 | | **Read Head** +--------------+---------------------+------------------+------------------+
132 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
133 | | +--------------+---------------------+------------------+------------------+
134 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
135 | | +--------------+---------------------+------------------+------------------+
136 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
137 | | +--------------+---------------------+------------------+------------------+
138 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` |
139 | | +--------------+---------------------+------------------+------------------+
140 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1.
+ rectify`` | 141 | +------------------+--------------+---------------------+------------------+------------------+ 142 | | | ``sign`` | ``None`` | \- | \- | 143 | | **Write Head** +--------------+---------------------+------------------+------------------+ 144 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 145 | | +--------------+---------------------+------------------+------------------+ 146 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 147 | | +--------------+---------------------+------------------+------------------+ 148 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 149 | | +--------------+---------------------+------------------+------------------+ 150 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 151 | | +--------------+---------------------+------------------+------------------+ 152 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 153 | | +--------------+---------------------+------------------+------------------+ 154 | | | ``sign_add`` | ``None`` | \- | \- | 155 | | +--------------+---------------------+------------------+------------------+ 156 | | | ``add`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 157 | | +--------------+---------------------+------------------+------------------+ 158 | | | ``erase`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 159 | +------------------+--------------+---------------------+------------------+------------------+ 160 | | **Dense Layer** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid`` | 161 | +------------------+--------------+---------------------+------------------+------------------+ 162 | 163 | Initialization 164 | ^^^^^^^^^^^^^^ 165 | +------------------+---------------------+-------------+-------------------+ 166 | | | Initialization | Learn init? 
| Operation dropout |
167 | +==================+=====================+=============+===================+
168 | | **Memory** | ``GlorotUniform()`` | ``False`` | \- |
169 | +------------------+---------------------+-------------+-------------------+
170 | | **Read Head** | ``init.OneHot()`` | ``False`` | No |
171 | +------------------+---------------------+-------------+-------------------+
172 | | **Write Head** | ``init.OneHot()`` | ``False`` | No |
173 | +------------------+---------------------+-------------+-------------------+
174 | 
175 | 
176 | Associative Recall task
177 | -----------------------
178 | **Git commit** : ``3bd7512``
179 | 
180 | General
181 | ^^^^^^^
182 | * **Batch size** : 1
183 | * **Architecture** : Single NTM layer + Dense output layer
184 | 
185 |   - *Memory shape* : ``(128, 20)``
186 |   - *Controller* : Dense controller
187 |   - *Heads* : 1 Read head + 1 Write head
188 | 
189 | * **Training examples**
190 | 
191 |   - *Size* : 8
192 |   - *Minimum Item Length* : 1
193 |   - *Maximum Item Length* : 3
194 |   - *Minimum Number of Items* : 2
195 |   - *Maximum Number of Items* : 6
196 | 
197 | Optimization
198 | ^^^^^^^^^^^^
199 | * **Objective** : Binary Cross-Entropy
200 | 
201 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
202 | 
203 | * **Learning algorithm** : Adam
204 | 
205 |   - *Learning rate* : ``1e-4``
206 |   - *Beta1* : ``0.9``
207 |   - *Beta2* : ``0.999``
208 |   - *Epsilon* : ``1e-8``
209 | 
210 | Parameters
211 | ^^^^^^^^^^
212 | +------------------+--------------+---------------------+------------------+------------------+
213 | | | Parameter | W (init) | b (init) | nonlinearity |
214 | +==================+==============+=====================+==================+==================+
215 | | **Controller** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
216 | +------------------+--------------+---------------------+------------------+------------------+
217 | | | ``sign`` | ``None`` | \- | \- |
218 | | **Read Head** +--------------+---------------------+------------------+------------------+
219 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
220 | | +--------------+---------------------+------------------+------------------+
221 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
222 | | +--------------+---------------------+------------------+------------------+
223 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
224 | | +--------------+---------------------+------------------+------------------+
225 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` |
226 | | +--------------+---------------------+------------------+------------------+
227 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1.
+ rectify`` | 228 | +------------------+--------------+---------------------+------------------+------------------+ 229 | | | ``sign`` | ``None`` | \- | \- | 230 | | **Write Head** +--------------+---------------------+------------------+------------------+ 231 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 232 | | +--------------+---------------------+------------------+------------------+ 233 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 234 | | +--------------+---------------------+------------------+------------------+ 235 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 236 | | +--------------+---------------------+------------------+------------------+ 237 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 238 | | +--------------+---------------------+------------------+------------------+ 239 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 240 | | +--------------+---------------------+------------------+------------------+ 241 | | | ``sign_add`` | ``None`` | \- | \- | 242 | | +--------------+---------------------+------------------+------------------+ 243 | | | ``add`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 244 | | +--------------+---------------------+------------------+------------------+ 245 | | | ``erase`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 246 | +------------------+--------------+---------------------+------------------+------------------+ 247 | | **Dense Layer** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid`` | 248 | +------------------+--------------+---------------------+------------------+------------------+ 249 | 250 | Initialization 251 | ^^^^^^^^^^^^^^ 252 | +------------------+---------------------+-------------+-------------------+ 253 | | | Initialization | Learn init? | Operation dropout | 254 | +==================+=====================+=============+===================+ 255 | | **Memory** | ``Constant(1e-6)`` | ``False`` | \- | 256 | +------------------+---------------------+-------------+-------------------+ 257 | | **Read Head** | ``init.OneHot()`` | ``False`` | No | 258 | +------------------+---------------------+-------------+-------------------+ 259 | | **Write Head** | ``init.OneHot()`` | ``False`` | No | 260 | +------------------+---------------------+-------------+-------------------+ 261 | 262 | 263 | Dyck Words task 264 | --------------- 265 | **Git commit** : ``873deec`` 266 | 267 | General 268 | ^^^^^^^ 269 | * **Batch size** : 1 270 | * **Architecture** : Single NTM layer + Dense output layer 271 | 272 | - *Memory shape* : ``(128, 20)`` 273 | - *Controller* : Dense controller 274 | - *Heads* : 1 Read head + 1 Write head 275 | 276 | * **Training examples** 277 | 278 | - *Initial Maximum Semi-Length* : 5 279 | Double maximum semi-length every time the mean loss over 500 samples is below ``1e-4`` up to a maximum of 40. 
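A sketch of this curriculum schedule, adapted from the training loop in ``examples/dyck-words-task.py`` (the exact cap used for this particular model is an assumption here)::

    if i % 500 == 0:
        mean_scores = np.mean(scores)
        # Double the maximum semi-length once the task is nearly solved
        if mean_scores < 1e-4 and generator.max_length < 40:
            generator.max_length *= 2
        scores = []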
280 | 
281 | Optimization
282 | ^^^^^^^^^^^^
283 | * **Objective** : Binary Cross-Entropy
284 | 
285 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
286 | 
287 | * **Learning algorithm** : Adam
288 | 
289 |   - *Learning rate* : ``1e-3``
290 |   - *Beta1* : ``0.9``
291 |   - *Beta2* : ``0.999``
292 |   - *Epsilon* : ``1e-8``
293 | 
294 | Parameters
295 | ^^^^^^^^^^
296 | +------------------+--------------+---------------------+------------------+------------------+
297 | |                  | Parameter    | W (init)            | b (init)         | nonlinearity     |
298 | +==================+==============+=====================+==================+==================+
299 | | **Controller**   | \-           | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
300 | +------------------+--------------+---------------------+------------------+------------------+
301 | |                  | ``sign``     | ``None``            | \-               | \-               |
302 | | **Read Head**    +--------------+---------------------+------------------+------------------+
303 | |                  | ``key``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
304 | |                  +--------------+---------------------+------------------+------------------+
305 | |                  | ``beta``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
306 | |                  +--------------+---------------------+------------------+------------------+
307 | |                  | ``gate``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
308 | |                  +--------------+---------------------+------------------+------------------+
309 | |                  | ``shift``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax``      |
310 | |                  +--------------+---------------------+------------------+------------------+
311 | |                  | ``gamma``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` |
312 | +------------------+--------------+---------------------+------------------+------------------+
313 | |                  | ``sign``     | ``None``            | \-               | \-               |
314 | | **Write Head**   +--------------+---------------------+------------------+------------------+
315 | |                  | ``key``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
316 | |                  +--------------+---------------------+------------------+------------------+
317 | |                  | ``beta``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
318 | |                  +--------------+---------------------+------------------+------------------+
319 | |                  | ``gate``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
320 | |                  +--------------+---------------------+------------------+------------------+
321 | |                  | ``shift``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax``      |
322 | |                  +--------------+---------------------+------------------+------------------+
323 | |                  | ``gamma``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` |
324 | |                  +--------------+---------------------+------------------+------------------+
325 | |                  | ``sign_add`` | ``None``            | \-               | \-               |
326 | |                  +--------------+---------------------+------------------+------------------+
327 | |                  | ``add``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
328 | |                  +--------------+---------------------+------------------+------------------+
329 | |                  | ``erase``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
330 | +------------------+--------------+---------------------+------------------+------------------+
331 | | **Dense Layer**  | \-           | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid``      |
332 | +------------------+--------------+---------------------+------------------+------------------+
333 | 
334 | Initialization
335 | ^^^^^^^^^^^^^^
336 | +------------------+---------------------+-------------+-------------------+
337 | |                  | Initialization      | Learn init? | Operation dropout |
338 | +==================+=====================+=============+===================+
339 | | **Memory**       | ``Constant(1e-6)``  | ``False``   | \-                |
340 | +------------------+---------------------+-------------+-------------------+
341 | | **Read Head**    | ``init.OneHot()``   | ``False``   | No                |
342 | +------------------+---------------------+-------------+-------------------+
343 | | **Write Head**   | ``init.OneHot()``   | ``False``   | No                |
344 | +------------------+---------------------+-------------+-------------------+
345 | 
346 | 
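The parameter tables in these sections map directly onto constructor arguments of the classes defined in ``ntm/controllers.py`` and ``ntm/heads.py`` (reproduced later in this document). Below is a hedged sketch of that mapping for the dense-controller configurations; the input shape and ``num_units`` value are illustrative choices, not values taken from the tables:

.. code-block:: python

    import lasagne.init
    import lasagne.layers
    import lasagne.nonlinearities

    from ntm.controllers import DenseController
    from ntm.heads import WriteHead

    # (batch size, sequence length, features) -- batch size 1, as above.
    l_input = lasagne.layers.InputLayer((1, None, 8))

    # "Controller" row: GlorotUniform weights, zero biases, rectify.
    controller = DenseController(l_input, memory_shape=(128, 20),
                                 num_units=100, num_reads=1,
                                 W_in_to_hid=lasagne.init.GlorotUniform(),
                                 b_in_to_hid=lasagne.init.Constant(0.),
                                 nonlinearity=lasagne.nonlinearities.rectify)

    # "Write Head" rows: the defaults in ntm/heads.py already use
    # GlorotUniform / Constant(0.) and the nonlinearities listed in the
    # table (key, beta, gate, shift, gamma, add, erase), so no overrides
    # are needed here.
    write_head = WriteHead(controller, num_shifts=3, memory_shape=(128, 20))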
347 | Upside Down Copy task
348 | ---------------------
349 | 
350 | General
351 | ^^^^^^^
352 | * **Batch size** : 1
353 | * **Architecture** : Single NTM layer + Dense output layer
354 | 
355 |   - *Memory shape* : ``(128, 20)``
356 |   - *Controller* : Dense controller
357 |   - *Heads* : 1 Read head + 1 Write head
358 | 
359 | * **Training examples**
360 | 
361 |   - *Size* : 8
362 |   - *Minimum Length* : 1
363 |   - *Maximum Length* : 5
364 | 
365 | Optimization
366 | ^^^^^^^^^^^^
367 | * **Objective** : Binary Cross-Entropy
368 | 
369 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
370 | 
371 | * **Learning algorithm** : A.
Graves' RMSProp 372 | 373 | - *Learning rate* : ``1e-3`` 374 | - *Chi* : ``0.95`` 375 | - *Alpha* : ``0.9`` 376 | - *Epsilon* : ``1e-4`` 377 | 378 | Parameters 379 | ^^^^^^^^^^ 380 | +------------------+--------------+---------------------+------------------+------------------+ 381 | | | Parameter | W (init) | b (init) | nonlinearity | 382 | +==================+==============+=====================+==================+==================+ 383 | | **Controller** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 384 | +------------------+--------------+---------------------+------------------+------------------+ 385 | | | ``sign`` | ``None`` | \- | \- | 386 | | **Read Head** +--------------+---------------------+------------------+------------------+ 387 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 388 | | +--------------+---------------------+------------------+------------------+ 389 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 390 | | +--------------+---------------------+------------------+------------------+ 391 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 392 | | +--------------+---------------------+------------------+------------------+ 393 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 394 | | +--------------+---------------------+------------------+------------------+ 395 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 396 | +------------------+--------------+---------------------+------------------+------------------+ 397 | | | ``sign`` | ``None`` | \- | \- | 398 | | **Write Head** +--------------+---------------------+------------------+------------------+ 399 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 400 | | +--------------+---------------------+------------------+------------------+ 401 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 402 | | +--------------+---------------------+------------------+------------------+ 403 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 404 | | +--------------+---------------------+------------------+------------------+ 405 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 406 | | +--------------+---------------------+------------------+------------------+ 407 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 408 | | +--------------+---------------------+------------------+------------------+ 409 | | | ``sign_add`` | ``None`` | \- | \- | 410 | | +--------------+---------------------+------------------+------------------+ 411 | | | ``add`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 412 | | +--------------+---------------------+------------------+------------------+ 413 | | | ``erase`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 414 | +------------------+--------------+---------------------+------------------+------------------+ 415 | | **Dense Layer** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid`` | 416 | +------------------+--------------+---------------------+------------------+------------------+ 417 | 418 | Initialization 419 | ^^^^^^^^^^^^^^ 420 | +------------------+---------------------+-------------+-------------------+ 421 | | | Initialization | Learn init? 
| Operation dropout |
422 | +==================+=====================+=============+===================+
423 | | **Memory**       | ``GlorotUniform()`` | ``False``   | \-                |
424 | +------------------+---------------------+-------------+-------------------+
425 | | **Read Head**    | ``init.OneHot()``   | ``False``   | No                |
426 | +------------------+---------------------+-------------+-------------------+
427 | | **Write Head**   | ``init.OneHot()``   | ``False``   | No                |
428 | +------------------+---------------------+-------------+-------------------+
429 | 
430 | 
431 | Reversed Copy task
432 | ------------------
433 | 
434 | General
435 | ^^^^^^^
436 | * **Batch size** : 1
437 | * **Architecture** : Single NTM layer + Recurrent output layer
438 | 
439 |   - *Memory shape* : ``(128, 20)``
440 |   - *Controller* : Recurrent controller
441 |   - *Heads* : 1 Read head + 1 Write head
442 | 
443 | * **Training examples**
444 | 
445 |   - *Size* : 8
446 |   - *Minimum Length* : 1
447 |   - *Maximum Length* : 5
448 | 
449 | Optimization
450 | ^^^^^^^^^^^^
451 | * **Objective** : Binary Cross-Entropy
452 | 
453 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
454 | 
455 | * **Learning algorithm** : A. Graves' RMSProp (sketched below)
456 | 
457 |   - *Learning rate* : ``1e-3``
458 |   - *Chi* : ``0.95``
459 |   - *Alpha* : ``0.9``
460 |   - *Epsilon* : ``1e-4``
461 | 
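The four hyperparameters above belong to A. Graves' variant of RMSProp (*Generating Sequences with Recurrent Neural Networks*, 2013), which normalises each gradient by a running estimate of its standard deviation and applies momentum to the resulting step. A NumPy sketch of a single update follows; ``ntm/updates.py`` contains the Theano implementation actually used, so treat this as illustrative only:

.. code-block:: python

    import numpy as np

    def graves_rmsprop_step(param, grad, state,
                            learning_rate=1e-3, chi=0.95,
                            alpha=0.9, epsilon=1e-4):
        # ``state`` carries the running averages (n, g, delta) across steps.
        n, g, delta = state
        n = chi * n + (1. - chi) * grad ** 2   # running mean of grad^2
        g = chi * g + (1. - chi) * grad        # running mean of grad
        # n - g**2 estimates the gradient variance; alpha adds momentum.
        delta = alpha * delta - learning_rate * grad / np.sqrt(n - g ** 2 + epsilon)
        return param + delta, (n, g, delta)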
462 | Parameters
463 | ^^^^^^^^^^
464 | +----------------------+--------------+---------------------+------------------+------------------+
465 | |                      | Parameter    | W (init)            | b (init)         | nonlinearity     |
466 | +======================+==============+=====================+==================+==================+
467 | | **Controller**       | \-           | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
468 | +----------------------+--------------+---------------------+------------------+------------------+
469 | |                      | ``sign``     | ``None``            | \-               | \-               |
470 | | **Read Head**        +--------------+---------------------+------------------+------------------+
471 | |                      | ``key``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
472 | |                      +--------------+---------------------+------------------+------------------+
473 | |                      | ``beta``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
474 | |                      +--------------+---------------------+------------------+------------------+
475 | |                      | ``gate``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
476 | |                      +--------------+---------------------+------------------+------------------+
477 | |                      | ``shift``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax``      |
478 | |                      +--------------+---------------------+------------------+------------------+
479 | |                      | ``gamma``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` |
480 | +----------------------+--------------+---------------------+------------------+------------------+
481 | |                      | ``sign``     | ``None``            | \-               | \-               |
482 | | **Write Head**       +--------------+---------------------+------------------+------------------+
483 | |                      | ``key``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
484 | |                      +--------------+---------------------+------------------+------------------+
485 | |                      | ``beta``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
486 | |                      +--------------+---------------------+------------------+------------------+
487 | |                      | ``gate``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
488 | |                      +--------------+---------------------+------------------+------------------+
489 | |                      | ``shift``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax``      |
490 | |                      +--------------+---------------------+------------------+------------------+
491 | |                      | ``gamma``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` |
492 | |                      +--------------+---------------------+------------------+------------------+
493 | |                      | ``sign_add`` | ``None``            | \-               | \-               |
494 | |                      +--------------+---------------------+------------------+------------------+
495 | |                      | ``add``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
496 | |                      +--------------+---------------------+------------------+------------------+
497 | |                      | ``erase``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
498 | +----------------------+--------------+---------------------+------------------+------------------+
499 | | **Recurrent Layer**  | \-           | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid``      |
500 | +----------------------+--------------+---------------------+------------------+------------------+
501 | 
502 | Initialization
503 | ^^^^^^^^^^^^^^
504 | +------------------+---------------------+-------------+-------------------+
505 | |                  | Initialization      | Learn init?
| Operation dropout | 506 | +==================+=====================+=============+===================+ 507 | | **Memory** | ``GlorotUniform()`` | ``False`` | \- | 508 | +------------------+---------------------+-------------+-------------------+ 509 | | **Read Head** | ``init.OneHot()`` | ``False`` | No | 510 | +------------------+---------------------+-------------+-------------------+ 511 | | **Write Head** | ``init.OneHot()`` | ``False`` | No | 512 | +------------------+---------------------+-------------+-------------------+ 513 | -------------------------------------------------------------------------------- /models/associative-recall.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/associative-recall.npy -------------------------------------------------------------------------------- /models/copy.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/copy.npy -------------------------------------------------------------------------------- /models/dyck-words.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/dyck-words.npy -------------------------------------------------------------------------------- /models/repeat-copy.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/repeat-copy.npy -------------------------------------------------------------------------------- /models/reversed-copy.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/reversed-copy.npy -------------------------------------------------------------------------------- /models/sort.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/sort.npy -------------------------------------------------------------------------------- /models/upsidedown-copy.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/upsidedown-copy.npy -------------------------------------------------------------------------------- /ntm/__init__.py: -------------------------------------------------------------------------------- 1 | from . import controllers 2 | from . import heads 3 | from . import init 4 | from . import layers 5 | from . import memory 6 | from . import nonlinearities 7 | from . import similarities 8 | from . 
import updates -------------------------------------------------------------------------------- /ntm/controllers.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | 5 | from lasagne.layers import Layer 6 | from lasagne.layers.recurrent import Gate 7 | import lasagne.nonlinearities 8 | import lasagne.init 9 | 10 | 11 | class Controller(Layer): 12 | r""" 13 | The base class :class:`Controller` represents a generic controller 14 | for the Neural Turing Machine. The controller is a neural network 15 | (feed-forward or recurrent) making the interface between the 16 | incoming layer (eg. an instance of :class:`lasagne.layers.InputLayer`) 17 | and the NTM. 18 | 19 | Parameters 20 | ---------- 21 | incoming: a :class:`lasagne.layers.Layer` instance 22 | The layer feeding into the Neural Turing Machine. 23 | memory_shape: tuple 24 | Shape of the NTM's memory. 25 | num_units: int 26 | Number of hidden units in the controller. 27 | num_reads: int 28 | Number of read heads in the Neural Turing Machine. 29 | hid_init: callable, Numpy array or Theano shared variable 30 | Initializer for the initial hidden state (:math:`h_{0}`). 31 | learn_init: bool 32 | If ``True``, initial hidden values are learned. 33 | """ 34 | def __init__(self, incoming, memory_shape, num_units, num_reads, 35 | hid_init=lasagne.init.GlorotUniform(), 36 | learn_init=False, 37 | **kwargs): 38 | super(Controller, self).__init__(incoming, **kwargs) 39 | self.hid_init = self.add_param(hid_init, (1, num_units), 40 | name='hid_init', regularizable=False, trainable=learn_init) 41 | self.memory_shape = memory_shape 42 | self.num_units = num_units 43 | self.num_reads = num_reads 44 | 45 | def step(self, input, reads, hidden, state, *args, **kwargs): 46 | raise NotImplementedError 47 | 48 | def get_output_shape_for(self, input_shape): 49 | return (input_shape[0], self.num_units) 50 | 51 | 52 | class DenseController(Controller): 53 | r""" 54 | A fully connected (feed-forward) controller for the NTM. 55 | 56 | .. math :: 57 | h_t = \sigma(x_{t} W_{x} + r_{t} W_{r} + b_{x} + b_{r}) 58 | 59 | Parameters 60 | ---------- 61 | incoming: a :class:`lasagne.layers.Layer` instance 62 | The layer feeding into the Neural Turing Machine. 63 | memory_shape: tuple 64 | Shape of the NTM's memory. 65 | num_units: int 66 | Number of hidden units in the controller. 67 | num_reads: int 68 | Number of read heads in the Neural Turing Machine. 69 | W_in_to_hid: callable, Numpy array or Theano shared variable 70 | If callable, initializer for the weights between the 71 | input and the hidden state. Otherwise a matrix with 72 | shape ``(num_inputs, num_units)`` (:math:`W_{x}`). 73 | b_in_to_hid: callable, Numpy array, Theano shared variable or ``None`` 74 | If callable, initializer for the biases between the 75 | input and the hidden state. If ``None``, the controller 76 | has no bias between the input and the hidden state. Otherwise 77 | a 1D array with shape ``(num_units,)`` (:math:`b_{x}`). 78 | W_reads_to_hid: callable, Numpy array or Theano shared variable 79 | If callable, initializer for the weights between the 80 | read vector and the hidden state. Otherwise a matrix with 81 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`W_{r}`). 82 | b_reads_to_hid: callable, Numpy array, Theano shared variable or ``None`` 83 | If callable, initializer for the biases between the 84 | read vector and the hidden state. 
If ``None``, the controller 85 | has no bias between the read vector and the hidden state. 86 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`b_{r}`). 87 | nonlinearity: callable or ``None`` 88 | The nonlinearity that is applied to the controller. If ``None``, 89 | the controller will be linear (:math:`\sigma`). 90 | hid_init: callable, np.ndarray or theano.shared 91 | Initializer for the initial hidden state (:math:`h_{0}`). 92 | learn_init: bool 93 | If ``True``, initial hidden values are learned. 94 | """ 95 | def __init__(self, incoming, memory_shape, num_units, num_reads, 96 | W_in_to_hid=lasagne.init.GlorotUniform(), 97 | b_in_to_hid=lasagne.init.Constant(0.), 98 | W_reads_to_hid=lasagne.init.GlorotUniform(), 99 | b_reads_to_hid=lasagne.init.Constant(0.), 100 | nonlinearity=lasagne.nonlinearities.rectify, 101 | hid_init=lasagne.init.GlorotUniform(), 102 | learn_init=False, 103 | **kwargs): 104 | super(DenseController, self).__init__(incoming, memory_shape, num_units, 105 | num_reads, hid_init, learn_init, 106 | **kwargs) 107 | self.nonlinearity = (lasagne.nonlinearities.identity if 108 | nonlinearity is None else nonlinearity) 109 | 110 | def add_weight_and_bias_params(input_dim, W, b, name): 111 | return (self.add_param(W, (input_dim, self.num_units), 112 | name='W_{}'.format(name)), 113 | self.add_param(b, (self.num_units,), 114 | name='b_{}'.format(name)) if b is not None else None) 115 | num_inputs = int(np.prod(self.input_shape[2:])) 116 | # Inputs / Hidden parameters 117 | self.W_in_to_hid, self.b_in_to_hid = add_weight_and_bias_params(num_inputs, 118 | W_in_to_hid, b_in_to_hid, name='in_to_hid') 119 | # Read vectors / Hidden parameters 120 | self.W_reads_to_hid, self.b_reads_to_hid = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 121 | W_reads_to_hid, b_reads_to_hid, name='reads_to_hid') 122 | 123 | def step(self, input, reads, *args): 124 | if input.ndim > 2: 125 | input = input.flatten(2) 126 | if reads.ndim > 2: 127 | reads = reads.flatten(2) 128 | 129 | activation = T.dot(input, self.W_in_to_hid) + \ 130 | T.dot(reads, self.W_reads_to_hid) 131 | if self.b_in_to_hid is not None: 132 | activation += self.b_in_to_hid.dimshuffle('x', 0) 133 | if self.b_reads_to_hid is not None: 134 | activation += self.b_reads_to_hid.dimshuffle('x', 0) 135 | state = self.nonlinearity(activation) 136 | return state, state 137 | 138 | def outputs_info(self, batch_size): 139 | ones_vector = T.ones((batch_size, 1)) 140 | hid_init = T.dot(ones_vector, self.hid_init) 141 | hid_init = T.unbroadcast(hid_init, 0) 142 | return [hid_init, hid_init] 143 | 144 | 145 | class RecurrentController(Controller): 146 | r""" 147 | A "vanilla" recurrent controller for the NTM. 148 | 149 | .. math :: 150 | h_t = \sigma(x_{t} W_{x} + r_{t} W_{r} + 151 | h_{t-1} W_{h} + b_{x} + b_{r} + b_{h}) 152 | 153 | Parameters 154 | ---------- 155 | incoming: a :class:`lasagne.layers.Layer` instance 156 | The layer feeding into the Neural Turing Machine. 157 | memory_shape: tuple 158 | Shape of the NTM's memory. 159 | num_units: int 160 | Number of hidden units in the controller. 161 | num_reads: int 162 | Number of read heads in the Neural Turing Machine. 163 | W_in_to_hid: callable, Numpy array or Theano shared variable 164 | If callable, initializer for the weights between the 165 | input and the hidden state. Otherwise a matrix with 166 | shape ``(num_inputs, num_units)`` (:math:`W_{x}`). 
167 | b_in_to_hid: callable, Numpy array, Theano shared variable or ``None`` 168 | If callable, initializer for the biases between the 169 | input and the hidden state. If ``None``, the controller 170 | has no bias between the input and the hidden state. Otherwise 171 | a 1D array with shape ``(num_units,)`` (:math:`b_{x}`). 172 | W_reads_to_hid: callable, Numpy array or Theano shared variable 173 | If callable, initializer for the weights between the 174 | read vector and the hidden state. Otherwise a matrix with 175 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`W_{r}`). 176 | b_reads_to_hid: callable, Numpy array, Theano shared variable or ``None`` 177 | If callable, initializer for the biases between the 178 | read vector and the hidden state. If ``None``, the controller 179 | has no bias between the read vector and the hidden state. 180 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`b_{r}`). 181 | W_hid_to_hid: callable, Numpy array or Theano shared variable 182 | If callable, initializer for the weights in the hidden-to-hidden 183 | update. Otherwise a matrix with shape ``(num_units, num_units)`` 184 | (:math:`W_{h}`). 185 | b_hid_to_hid: callable, Numpy array, Theano shared variable or ``None`` 186 | If callable, initializer for the biases in the hidden-to-hidden 187 | update. If ``None``, the controller has no bias in the 188 | hidden-to-hidden update. Otherwise a 1D array with shape 189 | ``(num_units,)`` (:math:`b_{h}`). 190 | nonlinearity: callable or ``None`` 191 | The nonlinearity that is applied to the controller. If ``None``, 192 | the controller will be linear (:math:`\sigma`). 193 | hid_init: callable, np.ndarray or theano.shared 194 | Initializer for the initial hidden state (:math:`h_{0}`). 195 | learn_init: bool 196 | If ``True``, initial hidden values are learned. 
197 | """ 198 | def __init__(self, incoming, memory_shape, num_units, num_reads, 199 | W_in_to_hid=lasagne.init.GlorotUniform(), 200 | b_in_to_hid=lasagne.init.Constant(0.), 201 | W_reads_to_hid=lasagne.init.GlorotUniform(), 202 | b_reads_to_hid=lasagne.init.Constant(0.), 203 | W_hid_to_hid=lasagne.init.GlorotUniform(), 204 | b_hid_to_hid=lasagne.init.Constant(0.), 205 | nonlinearity=lasagne.nonlinearities.rectify, 206 | hid_init=lasagne.init.GlorotUniform(), 207 | learn_init=False, 208 | **kwargs): 209 | super(RecurrentController, self).__init__(incoming, memory_shape, num_units, 210 | num_reads, hid_init, learn_init, 211 | **kwargs) 212 | self.nonlinearity = (lasagne.nonlinearities.identity if 213 | nonlinearity is None else nonlinearity) 214 | 215 | def add_weight_and_bias_params(input_dim, W, b, name): 216 | return (self.add_param(W, (input_dim, self.num_units), 217 | name='W_{}'.format(name)), 218 | self.add_param(b, (self.num_units,), 219 | name='b_{}'.format(name)) if b is not None else None) 220 | num_inputs = int(np.prod(self.input_shape[2:])) 221 | # Inputs / Hidden parameters 222 | self.W_in_to_hid, self.b_in_to_hid = add_weight_and_bias_params(num_inputs, 223 | W_in_to_hid, b_in_to_hid, name='in_to_hid') 224 | # Read vectors / Hidden parameters 225 | self.W_reads_to_hid, self.b_reads_to_hid = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 226 | W_reads_to_hid, b_reads_to_hid, name='reads_to_hid') 227 | # Hidden / Hidden parameters 228 | self.W_hid_to_hid, self.b_hid_to_hid = add_weight_and_bias_params(self.num_units, 229 | W_hid_to_hid, b_hid_to_hid, name='hid_to_hid') 230 | 231 | def step(self, input, reads, hidden, *args): 232 | if input.ndim > 2: 233 | input = input.flatten(2) 234 | if reads.ndim > 2: 235 | reads = reads.flatten(2) 236 | 237 | activation = T.dot(input, self.W_in_to_hid) + \ 238 | T.dot(reads, self.W_reads_to_hid) + \ 239 | T.dot(hidden, self.W_hid_to_hid) 240 | if self.b_in_to_hid is not None: 241 | activation += self.b_in_to_hid.dimshuffle('x', 0) 242 | if self.b_reads_to_hid is not None: 243 | activation += self.b_reads_to_hid.dimshuffle('x', 0) 244 | if self.b_hid_to_hid is not None: 245 | activation += self.b_hid_to_hid.dimshuffle('x', 0) 246 | state = self.nonlinearity(activation) 247 | return state, state 248 | 249 | def outputs_info(self, batch_size): 250 | ones_vector = T.ones((batch_size, 1)) 251 | hid_init = T.dot(ones_vector, self.hid_init) 252 | hid_init = T.unbroadcast(hid_init, 0) 253 | return [hid_init, hid_init] 254 | 255 | class LSTMController(Controller): 256 | r""" 257 | An LSTM recurrent controller for the NTM. 258 | .. math :: 259 | input-gate = \sigma(x_{t} Wi_{x} + r_{t} Wi_{r} + 260 | h_{t-1} Wi_{h} + bi_{x} + bi_{r} + bi_{h}) 261 | forget-gate = \sigma(x_{t} Wf_{x} + r_{t} Wf_{r} + 262 | h_{t-1} Wf_{h} + bf_{x} + bf_{r} + bf_{h}) 263 | output-gate = \sigma(x_{t} Wo_{x} + r_{t} Wo_{r} + 264 | h_{t-1} Wo_{h} + bo_{x} + bo_{r} + bo_{h}) 265 | candidate-cell-state = \tanh(x_{t} Wc_{x} + r_{t} Wc_{r} + 266 | h_{t-1} Wc_{h} + bc_{x} + bc_{r} + bc_{h}) 267 | cell-state_{t} = cell-state_{t-1} \odot forget-gate + 268 | candidate-cell-state \odot input-gate 269 | h_{t} = \tanh(cell-state_{t}) \odot output-gate 270 | Parameters 271 | ---------- 272 | incoming: a :class:`lasagne.layers.Layer` instance 273 | The layer feeding into the Neural Turing Machine. 274 | memory_shape: tuple 275 | Shape of the NTM's memory. 276 | num_units: int 277 | Number of hidden units in the controller.
278 | num_reads: int 279 | Number of read heads in the Neural Turing Machine. 280 | W_in_to_input: callable, Numpy array or Theano shared variable 281 | If callable, initializer for the weights between the 282 | input and the input gate. Otherwise a matrix with 283 | shape ``(num_inputs, num_units)`` (:math:`Wi_{x}`). 284 | b_in_to_input: callable, Numpy array, Theano shared variable or ``None`` 285 | If callable, initializer for the biases between the 286 | input and the input gate. If ``None``, the controller 287 | has no bias between the input and the input gate. Otherwise 288 | a 1D array with shape ``(num_units,)`` (:math:`bi_{x}`). 289 | W_reads_to_input: callable, Numpy array or Theano shared variable 290 | If callable, initializer for the weights between the 291 | read vector and the input gate. Otherwise a matrix with 292 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wi_{r}`). 293 | b_reads_to_input: callable, Numpy array, Theano shared variable or ``None`` 294 | If callable, initializer for the biases between the 295 | read vector and the input gate. If ``None``, the controller 296 | has no bias between the read vector and the input gate. 297 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bi_{r}`). 298 | W_hid_to_input: callable, Numpy array or Theano shared variable 299 | If callable, initializer for the weights between the 300 | hidden state and the input gate. Otherwise a matrix with 301 | shape ``(num_units, num_units)`` (:math:`Wi_{h}`). 302 | b_hid_to_input: callable, Numpy array, Theano shared variable or ``None`` 303 | If callable, initializer for the biases between the 304 | hidden state and the input gate. If ``None``, the controller 305 | has no bias between the hidden state and the input gate. 306 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bi_{h}`). 307 | W_in_to_forget: callable, Numpy array or Theano shared variable 308 | If callable, initializer for the weights between the 309 | input and the forget gate. Otherwise a matrix with 310 | shape ``(num_inputs, num_units)`` (:math:`Wf_{x}`). 311 | b_in_to_forget: callable, Numpy array, Theano shared variable or ``None`` 312 | If callable, initializer for the biases between the 313 | input and the forget gate. If ``None``, the controller 314 | has no bias between the input and the forget gate. Otherwise 315 | a 1D array with shape ``(num_units,)`` (:math:`bf_{x}`). 316 | W_reads_to_forget: callable, Numpy array or Theano shared variable 317 | If callable, initializer for the weights between the 318 | read vector and the forget gate. Otherwise a matrix with 319 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wf_{r}`). 320 | b_reads_to_forget: callable, Numpy array, Theano shared variable or ``None`` 321 | If callable, initializer for the biases between the 322 | read vector and the forget gate. If ``None``, the controller 323 | has no bias between the read vector and the forget gate. 324 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bf_{r}`). 325 | W_hid_to_forget: callable, Numpy array or Theano shared variable 326 | If callable, initializer for the weights between the 327 | hidden state and the forget gate. Otherwise a matrix with 328 | shape ``(num_units, num_units)`` (:math:`Wf_{h}`). 329 | b_hid_to_forget: callable, Numpy array, Theano shared variable or ``None`` 330 | If callable, initializer for the biases between the 331 | hidden state and the forget gate. If ``None``, the controller 332 | has no bias between the hidden state and the forget gate. 
333 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bf_{h}`). 334 | W_in_to_output: callable, Numpy array or Theano shared variable 335 | If callable, initializer for the weights between the 336 | input and the output gate. Otherwise a matrix with 337 | shape ``(num_inputs, num_units)`` (:math:`Wo_{x}`). 338 | b_in_to_output: callable, Numpy array, Theano shared variable or ``None`` 339 | If callable, initializer for the biases between the 340 | input and the output gate. If ``None``, the controller 341 | has no bias between the input and the output gate. Otherwise 342 | a 1D array with shape ``(num_units,)`` (:math:`bo_{x}`). 343 | W_reads_to_output: callable, Numpy array or Theano shared variable 344 | If callable, initializer for the weights between the 345 | read vector and the output gate. Otherwise a matrix with 346 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wo_{r}`). 347 | b_reads_to_output: callable, Numpy array, Theano shared variable or ``None`` 348 | If callable, initializer for the biases between the 349 | read vector and the output gate. If ``None``, the controller 350 | has no bias between the read vector and the output gate. 351 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bo_{r}`). 352 | W_hid_to_output: callable, Numpy array or Theano shared variable 353 | If callable, initializer for the weights between the 354 | hidden state and the output gate. Otherwise a matrix with 355 | shape ``(num_units, num_units)`` (:math:`Wo_{h}`). 356 | b_hid_to_output: callable, Numpy array, Theano shared variable or ``None`` 357 | If callable, initializer for the biases between the 358 | hidden state and the output gate. If ``None``, the controller 359 | has no bias between the hidden state and the output gate. 360 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bo_{h}`). 361 | W_in_to_cell: callable, Numpy array or Theano shared variable 362 | If callable, initializer for the weights between the 363 | input and the cell state computation gate. Otherwise a matrix 364 | with shape ``(num_inputs, num_units)`` (:math:`Wc_{x}`). 365 | b_in_to_cell: callable, Numpy array, Theano shared variable or ``None`` 366 | If callable, initializer for the biases between the 367 | input and the cell state computation gate. If ``None``, 368 | the controller has no bias between the input and the cell 369 | state computation gate. Otherwise a 1D array with shape 370 | ``(num_units,)`` (:math:`bc_{x}`). 371 | W_reads_to_cell: callable, Numpy array or Theano shared variable 372 | If callable, initializer for the weights between the 373 | read vector and the cell state computation gate. Otherwise a matrix 374 | with shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wc_{r}`). 375 | b_reads_to_cell: callable, Numpy array, Theano shared variable or ``None`` 376 | If callable, initializer for the biases between the 377 | read vector and the cell state computation gate. If ``None``, 378 | the controller has no bias between the read vector and the cell 379 | state computation gate. Otherwise a 1D array with shape 380 | ``(num_units,)`` (:math:`bc_{r}`). 381 | W_hid_to_cell: callable, Numpy array or Theano shared variable 382 | If callable, initializer for the weights between the 383 | hidden state and the cell state computation gate. Otherwise a matrix 384 | with shape ``(num_units, num_units)`` (:math:`Wc_{h}`). 
385 | b_hid_to_cell: callable, Numpy array, Theano shared variable or ``None`` 386 | If callable, initializer for the biases between the 387 | hidden state and the cell state computation gate. If ``None``, 388 | the controller has no bias between the hidden state and the cell 389 | state computation gate. Otherwise a 1D array with shape 390 | ``(num_units,)`` (:math:`bc_{h}`). 391 | hid_init: callable, np.ndarray or theano.shared 392 | Initializer for the initial hidden state (:math:`h_{0}`). 393 | cell_init: callable, np.ndarray or theano.shared 394 | Initializer for the initial cell state (:math:`cell-state_{0}`). 395 | learn_init: bool 396 | If ``True``, initial hidden values are learned. 397 | """ 398 | def __init__(self, incoming, memory_shape, num_units, num_reads, 399 | W_in_to_input=lasagne.init.GlorotUniform(), 400 | b_in_to_input=lasagne.init.Constant(0.), 401 | W_reads_to_input=lasagne.init.GlorotUniform(), 402 | b_reads_to_input=lasagne.init.Constant(0.), 403 | W_hid_to_input=lasagne.init.GlorotUniform(), 404 | b_hid_to_input=lasagne.init.Constant(0.), 405 | W_in_to_forget=lasagne.init.GlorotUniform(), 406 | b_in_to_forget=lasagne.init.Constant(0.), 407 | W_reads_to_forget=lasagne.init.GlorotUniform(), 408 | b_reads_to_forget=lasagne.init.Constant(0.), 409 | W_hid_to_forget=lasagne.init.GlorotUniform(), 410 | b_hid_to_forget=lasagne.init.Constant(0.), 411 | W_in_to_output=lasagne.init.GlorotUniform(), 412 | b_in_to_output=lasagne.init.Constant(0.), 413 | W_reads_to_output=lasagne.init.GlorotUniform(), 414 | b_reads_to_output=lasagne.init.Constant(0.), 415 | W_hid_to_output=lasagne.init.GlorotUniform(), 416 | b_hid_to_output=lasagne.init.Constant(0.), 417 | W_in_to_cell=lasagne.init.GlorotUniform(), 418 | b_in_to_cell=lasagne.init.Constant(0.), 419 | W_reads_to_cell=lasagne.init.GlorotUniform(), 420 | b_reads_to_cell=lasagne.init.Constant(0.), 421 | W_hid_to_cell=lasagne.init.GlorotUniform(), 422 | b_hid_to_cell=lasagne.init.Constant(0.), 423 | nonlinearity=lasagne.nonlinearities.rectify, 424 | hid_init=lasagne.init.GlorotUniform(), 425 | cell_init=lasagne.init.Constant(0.), 426 | learn_init=False, 427 | **kwargs): 428 | super(LSTMController, self).__init__(incoming, memory_shape, num_units, 429 | num_reads, hid_init, learn_init, 430 | **kwargs) 431 | self.nonlinearity = (lasagne.nonlinearities.identity if 432 | nonlinearity is None else nonlinearity) 433 | self.cell_init = self.add_param(cell_init, (1, num_units), 434 | name='cell_init', regularizable=False, trainable=learn_init) 435 | 436 | def add_weight_and_bias_params(input_dim, W, b, name): 437 | return (self.add_param(W, (input_dim, self.num_units), 438 | name='W_{}'.format(name)), 439 | self.add_param(b, (self.num_units,), 440 | name='b_{}'.format(name)) if b is not None else None) 441 | num_inputs = int(np.prod(self.input_shape[2:])) 442 | # Inputs / Input Gate parameters 443 | self.W_in_to_input, self.b_in_to_input = add_weight_and_bias_params(num_inputs, 444 | W_in_to_input, b_in_to_input, name='in_to_input') 445 | # Read vectors / Input Gate parameters 446 | self.W_reads_to_input, self.b_reads_to_input = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 447 | W_reads_to_input, b_reads_to_input, name='reads_to_input') 448 | # Hidden / Input Gate parameters 449 | self.W_hid_to_input, self.b_hid_to_input = add_weight_and_bias_params(self.num_units, 450 | W_hid_to_input, b_hid_to_input, name='hid_to_input') 451 | # Inputs / Forget Gate parameters 452 | self.W_in_to_forget, self.b_in_to_forget = 
add_weight_and_bias_params(num_inputs, 453 | W_in_to_forget, b_in_to_forget, name='in_to_forget') 454 | # Read vectors / Forget Gate parameters 455 | self.W_reads_to_forget, self.b_reads_to_forget = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 456 | W_reads_to_forget, b_reads_to_forget, name='reads_to_forget') 457 | # Hidden / Forget Gate parameters 458 | self.W_hid_to_forget, self.b_hid_to_forget = add_weight_and_bias_params(self.num_units, 459 | W_hid_to_forget, b_hid_to_forget, name='hid_to_forget') 460 | # Inputs / Output Gate parameters 461 | self.W_in_to_output, self.b_in_to_output = add_weight_and_bias_params(num_inputs, 462 | W_in_to_output, b_in_to_output, name='in_to_output') 463 | # Read vectors / Output Gate parameters 464 | self.W_reads_to_output, self.b_reads_to_output = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 465 | W_reads_to_output, b_reads_to_output, name='reads_to_output') 466 | # Hidden / Output Gate parameters 467 | self.W_hid_to_output, self.b_hid_to_output = add_weight_and_bias_params(self.num_units, 468 | W_hid_to_output, b_hid_to_output, name='hid_to_output') 469 | # Inputs / Cell State parameters 470 | self.W_in_to_cell, self.b_in_to_cell = add_weight_and_bias_params(num_inputs, 471 | W_in_to_cell, b_in_to_cell, name='in_to_cell') 472 | # Read vectors / Cell State parameters 473 | self.W_reads_to_cell, self.b_reads_to_cell = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 474 | W_reads_to_cell, b_reads_to_cell, name='reads_to_cell') 475 | # Hidden / Cell State parameters 476 | self.W_hid_to_cell, self.b_hid_to_cell = add_weight_and_bias_params(self.num_units, 477 | W_hid_to_cell, b_hid_to_cell, name='hid_to_cell') 478 | 479 | def step(self, input, reads, hidden, cell, *args): 480 | if input.ndim > 2: 481 | input = input.flatten(2) 482 | if reads.ndim > 2: 483 | reads = reads.flatten(2) 484 | # Input Gate output computation 485 | activation = T.dot(input, self.W_in_to_input) + \ 486 | T.dot(reads, self.W_reads_to_input) + \ 487 | T.dot(hidden, self.W_hid_to_input) 488 | if self.b_in_to_input is not None: 489 | activation += self.b_in_to_input.dimshuffle('x', 0) 490 | if self.b_reads_to_input is not None: 491 | activation += self.b_reads_to_input.dimshuffle('x', 0) 492 | if self.b_hid_to_input is not None: 493 | activation += self.b_hid_to_input.dimshuffle('x', 0) 494 | input_gate = lasagne.nonlinearities.sigmoid(activation) 495 | # Forget Gate output computation 496 | activation = T.dot(input, self.W_in_to_forget) + \ 497 | T.dot(reads, self.W_reads_to_forget) + \ 498 | T.dot(hidden, self.W_hid_to_forget) 499 | if self.b_in_to_forget is not None: 500 | activation += self.b_in_to_forget.dimshuffle('x', 0) 501 | if self.b_reads_to_forget is not None: 502 | activation += self.b_reads_to_forget.dimshuffle('x', 0) 503 | if self.b_hid_to_forget is not None: 504 | activation += self.b_hid_to_forget.dimshuffle('x', 0) 505 | forget_gate = lasagne.nonlinearities.sigmoid(activation) 506 | # Output Gate output computation 507 | activation = T.dot(input, self.W_in_to_output) + \ 508 | T.dot(reads, self.W_reads_to_output) + \ 509 | T.dot(hidden, self.W_hid_to_output) 510 | if self.b_in_to_output is not None: 511 | activation += self.b_in_to_output.dimshuffle('x', 0) 512 | if self.b_reads_to_output is not None: 513 | activation += self.b_reads_to_output.dimshuffle('x', 0) 514 | if self.b_hid_to_output is not None: 515 | activation += self.b_hid_to_output.dimshuffle('x', 0) 516 | output_gate = 
lasagne.nonlinearities.sigmoid(activation) 517 | # New candidate cell state computation 518 | activation = T.dot(input, self.W_in_to_cell) + \ 519 | T.dot(reads, self.W_reads_to_cell) + \ 520 | T.dot(hidden, self.W_hid_to_cell) 521 | if self.b_in_to_cell is not None: 522 | activation += self.b_in_to_cell.dimshuffle('x', 0) 523 | if self.b_reads_to_cell is not None: 524 | activation += self.b_reads_to_cell.dimshuffle('x', 0) 525 | if self.b_hid_to_cell is not None: 526 | activation += self.b_hid_to_cell.dimshuffle('x', 0) 527 | candidate_cell_state = lasagne.nonlinearities.tanh(activation) 528 | # New cell state and hidden state computation 529 | cell_state = cell * forget_gate + candidate_cell_state * input_gate 530 | state = lasagne.nonlinearities.tanh(cell_state) * output_gate 531 | return state, cell_state 532 | 533 | def outputs_info(self, batch_size): 534 | ones_vector = T.ones((batch_size, 1)) 535 | hid_init = T.dot(ones_vector, self.hid_init) 536 | hid_init = T.unbroadcast(hid_init, 0) 537 | cell_init = T.dot(ones_vector, self.cell_init) 538 | cell_init = T.unbroadcast(cell_init, 0) 539 | return [hid_init, cell_init] 540 | 541 | class GRUController(Controller): 542 | r""" 543 | A GRU recurrent controller for the NTM. 544 | .. math :: 545 | update-gate = \sigma(x_{t} Wz_{x} + r_{t} Wz_{r} + 546 | h_{t-1} Wz_{h} + bz_{x} + bz_{r} + bz_{h}) 547 | reset-gate = \sigma(x_{t} Wr_{x} + r_{t} Wr_{r} + 548 | h_{t-1} Wr_{h} + br_{x} + br_{r} + br_{h}) 549 | s = \tanh(x_{t} Ws_{x} + r_{t} Ws_{r} + 550 | (h_{t-1} \odot reset-gate) Ws_{h}) 551 | h_{t} = (1 - update-gate) \odot s + update-gate \odot h_{t-1} 552 | Parameters 553 | ---------- 554 | incoming: a :class:`lasagne.layers.Layer` instance 555 | The layer feeding into the Neural Turing Machine. 556 | memory_shape: tuple 557 | Shape of the NTM's memory. 558 | num_units: int 559 | Number of hidden units in the controller. 560 | num_reads: int 561 | Number of read heads in the Neural Turing Machine. 562 | W_in_to_update: callable, Numpy array or Theano shared variable 563 | If callable, initializer for the weights between the 564 | input and the update gate. Otherwise a matrix with 565 | shape ``(num_inputs, num_units)`` (:math:`Wz_{x}`). 566 | b_in_to_update: callable, Numpy array, Theano shared variable or ``None`` 567 | If callable, initializer for the biases between the 568 | input and the update gate. If ``None``, the controller 569 | has no bias between the input and the update gate. Otherwise 570 | a 1D array with shape ``(num_units,)`` (:math:`bz_{x}`). 571 | W_reads_to_update: callable, Numpy array or Theano shared variable 572 | If callable, initializer for the weights between the 573 | read vector and the update gate. Otherwise a matrix with 574 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wz_{r}`). 575 | b_reads_to_update: callable, Numpy array, Theano shared variable or ``None`` 576 | If callable, initializer for the biases between the 577 | read vector and the update gate. If ``None``, the controller 578 | has no bias between the read vector and the update gate. 579 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bz_{r}`). 580 | W_hid_to_update: callable, Numpy array or Theano shared variable 581 | If callable, initializer for the weights between the 582 | hidden state and the update gate. Otherwise a matrix with 583 | shape ``(num_units, num_units)`` (:math:`Wz_{h}`).
584 | b_hid_to_update: callable, Numpy array, Theano shared variable or ``None`` 585 | If callable, initializer for the biases between the 586 | hidden state and the update gate. If ``None``, the controller 587 | has no bias between the hidden state and the update gate. 588 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bz_{h}`). 589 | W_in_to_reset: callable, Numpy array or Theano shared variable 590 | If callable, initializer for the weights between the 591 | input and the reset gate. Otherwise a matrix with 592 | shape ``(num_inputs, num_units)`` (:math:`Wr_{x}`). 593 | b_in_to_reset: callable, Numpy array, Theano shared variable or ``None`` 594 | If callable, initializer for the biases between the 595 | input and the reset gate. If ``None``, the controller 596 | has no bias between the input and the reset gate. Otherwise 597 | a 1D array with shape ``(num_units,)`` (:math:`br_{x}`). 598 | W_reads_to_reset: callable, Numpy array or Theano shared variable 599 | If callable, initializer for the weights between the 600 | read vector and the reset gate. Otherwise a matrix with 601 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wr_{r}`). 602 | b_reads_to_reset: callable, Numpy array, Theano shared variable or ``None`` 603 | If callable, initializer for the biases between the 604 | read vector and the reset gate. If ``None``, the controller 605 | has no bias between the read vector and the reset gate. 606 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`br_{r}`). 607 | W_hid_to_reset: callable, Numpy array or Theano shared variable 608 | If callable, initializer for the weights between the 609 | hidden state and the reset gate. Otherwise a matrix with 610 | shape ``(num_units, num_units)`` (:math:`Wr_{h}`). 611 | b_hid_to_reset: callable, Numpy array, Theano shared variable or ``None`` 612 | If callable, initializer for the biases between the 613 | hidden state and the reset gate. If ``None``, the controller 614 | has no bias between the hidden state and the reset gate. 615 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`br_{h}`). 616 | W_in_to_hid: callable, Numpy array or Theano shared variable 617 | If callable, initializer for the weights between the 618 | input and the hidden gate. Otherwise a matrix with 619 | shape ``(num_inputs, num_units)`` (:math:`Ws_{x}`). 620 | b_in_to_hid: callable, Numpy array, Theano shared variable or ``None`` 621 | If callable, initializer for the biases between the 622 | input and the hidden gate. If ``None``, the controller 623 | has no bias between the input and the hidden gate. Otherwise 624 | a 1D array with shape ``(num_units,)`` (:math:`bs_{x}`). 625 | W_reads_to_hid: callable, Numpy array or Theano shared variable 626 | If callable, initializer for the weights between the 627 | read vector and the hidden gate. Otherwise a matrix with 628 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Ws_{r}`). 629 | b_reads_to_hid: callable, Numpy array, Theano shared variable or ``None`` 630 | If callable, initializer for the biases between the 631 | read vector and the hidden gate. If ``None``, the controller 632 | has no bias between the read vector and the hidden gate. 633 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bs_{r}`). 634 | W_hid_to_hid: callable, Numpy array or Theano shared variable 635 | If callable, initializer for the weights between the 636 | hidden state and the hidden gate. Otherwise a matrix with 637 | shape ``(num_units, num_units)`` (:math:`Ws_{h}`). 
638 | b_hid_to_hid: callable, Numpy array, Theano shared variable or ``None`` 639 | If callable, initializer for the biases between the 640 | hidden state and the hidden gate. If ``None``, the controller 641 | has no bias between the hidden state and the hidden gate. 642 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bs_{h}`). 643 | hid_init: callable, np.ndarray or theano.shared 644 | Initializer for the initial hidden state (:math:`h_{0}`). 645 | learn_init: bool 646 | If ``True``, initial hidden values are learned. 647 | """ 648 | def __init__(self, incoming, memory_shape, num_units, num_reads, 649 | W_in_to_update=lasagne.init.GlorotUniform(), 650 | b_in_to_update=lasagne.init.Constant(0.), 651 | W_reads_to_update=lasagne.init.GlorotUniform(), 652 | b_reads_to_update=lasagne.init.Constant(0.), 653 | W_hid_to_update=lasagne.init.GlorotUniform(), 654 | b_hid_to_update=lasagne.init.Constant(0.), 655 | W_in_to_reset=lasagne.init.GlorotUniform(), 656 | b_in_to_reset=lasagne.init.Constant(0.), 657 | W_reads_to_reset=lasagne.init.GlorotUniform(), 658 | b_reads_to_reset=lasagne.init.Constant(0.), 659 | W_hid_to_reset=lasagne.init.GlorotUniform(), 660 | b_hid_to_reset=lasagne.init.Constant(0.), 661 | W_in_to_hid=lasagne.init.GlorotUniform(), 662 | b_in_to_hid=lasagne.init.Constant(0.), 663 | W_reads_to_hid=lasagne.init.GlorotUniform(), 664 | b_reads_to_hid=lasagne.init.Constant(0.), 665 | W_hid_to_hid=lasagne.init.GlorotUniform(), 666 | b_hid_to_hid=lasagne.init.Constant(0.), 667 | nonlinearity=lasagne.nonlinearities.rectify, 668 | hid_init=lasagne.init.GlorotUniform(), 669 | learn_init=False, 670 | **kwargs): 671 | super(GRUController, self).__init__(incoming, memory_shape, num_units, 672 | num_reads, hid_init, learn_init, 673 | **kwargs) 674 | self.nonlinearity = (lasagne.nonlinearities.identity if 675 | nonlinearity is None else nonlinearity) 676 | 677 | def add_weight_and_bias_params(input_dim, W, b, name): 678 | return (self.add_param(W, (input_dim, self.num_units), 679 | name='W_{}'.format(name)), 680 | self.add_param(b, (self.num_units,), 681 | name='b_{}'.format(name)) if b is not None else None) 682 | num_inputs = int(np.prod(self.input_shape[2:])) 683 | # Inputs / Update Gate parameters 684 | self.W_in_to_update, self.b_in_to_update = add_weight_and_bias_params(num_inputs, 685 | W_in_to_update, b_in_to_update, name='in_to_update') 686 | # Read vectors / Update Gate parameters 687 | self.W_reads_to_update, self.b_reads_to_update = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 688 | W_reads_to_update, b_reads_to_update, name='reads_to_update') 689 | # Hidden / Update Gate parameters 690 | self.W_hid_to_update, self.b_hid_to_update = add_weight_and_bias_params(self.num_units, 691 | W_hid_to_update, b_hid_to_update, name='hid_to_update') 692 | # Inputs / Reset Gate parameters 693 | self.W_in_to_reset, self.b_in_to_reset = add_weight_and_bias_params(num_inputs, 694 | W_in_to_reset, b_in_to_reset, name='in_to_reset') 695 | # Read vectors / Reset Gate parameters 696 | self.W_reads_to_reset, self.b_reads_to_reset = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 697 | W_reads_to_reset, b_reads_to_reset, name='reads_to_reset') 698 | # Hidden / Reset Gate parameters 699 | self.W_hid_to_reset, self.b_hid_to_reset = add_weight_and_bias_params(self.num_units, 700 | W_hid_to_reset, b_hid_to_reset, name='hid_to_reset') 701 | # Inputs / Hidden Gate parameters 702 | self.W_in_to_hid, self.b_in_to_hid = add_weight_and_bias_params(num_inputs, 703 | 
W_in_to_hid, b_in_to_hid, name='in_to_hid') 704 | # Read vectors / Hidden Gate parameters 705 | self.W_reads_to_hid, self.b_reads_to_hid = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 706 | W_reads_to_hid, b_reads_to_hid, name='reads_to_hid') 707 | # Hidden / Hidden Gate parameters 708 | self.W_hid_to_hid, self.b_hid_to_hid = add_weight_and_bias_params(self.num_units, 709 | W_hid_to_hid, b_hid_to_hid, name='hid_to_hid') 710 | 711 | def step(self, input, reads, hidden, *args): 712 | if input.ndim > 2: 713 | input = input.flatten(2) 714 | if reads.ndim > 2: 715 | reads = reads.flatten(2) 716 | # Update Gate output computation 717 | activation = T.dot(input, self.W_in_to_update) + \ 718 | T.dot(reads, self.W_reads_to_update) + \ 719 | T.dot(hidden, self.W_hid_to_update) 720 | if self.b_in_to_update is not None: 721 | activation += self.b_in_to_update.dimshuffle('x', 0) 722 | if self.b_reads_to_update is not None: 723 | activation += self.b_reads_to_update.dimshuffle('x', 0) 724 | if self.b_hid_to_update is not None: 725 | activation += self.b_hid_to_update.dimshuffle('x', 0) 726 | update_gate = lasagne.nonlinearities.sigmoid(activation) 727 | # Reset Gate output computation 728 | activation = T.dot(input, self.W_in_to_reset) + \ 729 | T.dot(reads, self.W_reads_to_reset) + \ 730 | T.dot(hidden, self.W_hid_to_reset) 731 | if self.b_in_to_reset is not None: 732 | activation += self.b_in_to_reset.dimshuffle('x', 0) 733 | if self.b_reads_to_reset is not None: 734 | activation += self.b_reads_to_reset.dimshuffle('x', 0) 735 | if self.b_hid_to_reset is not None: 736 | activation += self.b_hid_to_reset.dimshuffle('x', 0) 737 | reset_gate = lasagne.nonlinearities.sigmoid(activation) 738 | # Hidden Gate output computation 739 | activation = T.dot(input, self.W_in_to_hid) + \ 740 | T.dot(reads, self.W_reads_to_hid) + \ 741 | T.dot((hidden * reset_gate), self.W_hid_to_hid) 742 | if self.b_in_to_hid is not None: 743 | activation += self.b_in_to_hid.dimshuffle('x', 0) 744 | if self.b_reads_to_hid is not None: 745 | activation += self.b_reads_to_hid.dimshuffle('x', 0) 746 | if self.b_hid_to_hid is not None: 747 | activation += self.b_hid_to_hid.dimshuffle('x', 0) 748 | hidden_gate = lasagne.nonlinearities.tanh(activation) 749 | # New hidden state computation 750 | ones = T.ones(update_gate.shape) 751 | state = (ones - update_gate) * hidden_gate + update_gate * hidden 752 | return state, state 753 | 754 | def outputs_info(self, batch_size): 755 | ones_vector = T.ones((batch_size, 1)) 756 | hid_init = T.dot(ones_vector, self.hid_init) 757 | hid_init = T.unbroadcast(hid_init, 0) 758 | return [hid_init, hid_init] 759 | -------------------------------------------------------------------------------- /ntm/heads.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | from collections import OrderedDict 5 | 6 | from lasagne.layers import Layer, DenseLayer 7 | from lasagne.theano_extensions import padding 8 | import lasagne.init 9 | import lasagne.nonlinearities 10 | 11 | import similarities 12 | import nonlinearities 13 | import init 14 | 15 | 16 | class Head(Layer): 17 | r""" 18 | The base class :class:`Head` represents a generic head for the 19 | Neural Turing Machine. The heads are responsible for the read/write 20 | operations on the memory. An instance of :class:`Head` outputs a 21 | weight vector defined by 22 | 23 | .. 
math :: 24 | k_{t} &= \sigma_{key}(h_{t} W_{key} + b_{key})\\ 25 | \beta_{t} &= \sigma_{beta}(h_{t} W_{beta} + b_{beta})\\ 26 | g_{t} &= \sigma_{gate}(h_{t} W_{gate} + b_{gate})\\ 27 | s_{t} &= \sigma_{shift}(h_{t} W_{shift} + b_{shift})\\ 28 | \gamma_{t} &= \sigma_{gamma}(h_{t} W_{gamma} + b_{gamma}) 29 | 30 | .. math :: 31 | w_{t}^{c} &= softmax(\beta_{t} * K(k_{t}, M_{t}))\\ 32 | w_{t}^{g} &= g_{t} * w_{t}^{c} + (1 - g_{t}) * w_{t-1}\\ 33 | \tilde{w}_{t} &= s_{t} \ast w_{t}^{g}\\ 34 | w_{t} \propto \tilde{w}_{t}^{\gamma_{t}} 35 | 36 | Parameters 37 | ---------- 38 | controller: a :class:`Controller` instance 39 | The controller of the Neural Turing Machine. 40 | num_shifts: int 41 | Number of shifts allowed by the convolutional shift operation 42 | (centered on 0, eg. ``num_shifts=3`` represents shifts 43 | in [-1, 0, 1]). 44 | memory_shape: tuple 45 | Shape of the NTM's memory 46 | W_hid_to_key: callable, Numpy array or Theano shared variable 47 | If callable, initializer of the weights for the parameter 48 | :math:`k_{t}`. Otherwise a matrix with shape 49 | ``(controller.num_units, memory_shape[1])``. 50 | b_hid_to_key: callable, Numpy array, Theano shared variable or ``None`` 51 | If callable, initializer of the biases for the parameter 52 | :math:`k_{t}`. If ``None``, no bias. Otherwise a matrix 53 | with shape ``(memory_shape[1],)``. 54 | nonlinearity_key: callable or ``None`` 55 | The nonlinearity that is applied for parameter :math:`k_{t}`. If 56 | ``None``, the nonlinearity is ``identity``. 57 | W_hid_to_beta: callable, Numpy array or Theano shared variable 58 | b_hid_to_beta: callable, Numpy array, Theano shared variable or ``None`` 59 | nonlinearity_beta: callable or ``None`` 60 | Weights, biases and nonlinearity for parameter :math:`\beta_{t}`. 61 | W_hid_to_gate: callable, Numpy array or Theano shared variable 62 | b_hid_to_gate: callable, Numpy array, Theano shared variable or ``None`` 63 | nonlinearity_gate: callable or ``None`` 64 | Weights, biases and nonlinearity for parameter :math:`g_{t}`. 65 | W_hid_to_shift: callable, Numpy array or Theano shared variable 66 | b_hid_to_shift: callable, Numpy array, Theano shared variable or ``None`` 67 | nonlinearity_shift: callable or ``None`` 68 | Weights, biases and nonlinearity for parameter :math:`s_{t}`. 69 | W_hid_to_gamma: callable, Numpy array or Theano shared variable 70 | b_hid_to_gamma: callable, Numpy array, Theano shared variable or ``None`` 71 | nonlinearity_gamma: callable or ``None`` 72 | Weights, biases and nonlinearity for parameter :math:`\gamma_{t}` 73 | weights_init: callable, Numpy array or Theano shared variable 74 | Initializer for the initial weight vector (:math:`w_{0}`). 75 | learn_init: bool 76 | If ``True``, initial hidden values are learned. 
77 | """ 78 | def __init__(self, controller, num_shifts=3, memory_shape=(128, 20), 79 | W_hid_to_key=lasagne.init.GlorotUniform(), 80 | b_hid_to_key=lasagne.init.Constant(0.), 81 | nonlinearity_key=nonlinearities.ClippedLinear(low=0., high=1.), 82 | W_hid_to_beta=lasagne.init.GlorotUniform(), 83 | b_hid_to_beta=lasagne.init.Constant(0.), 84 | nonlinearity_beta=lasagne.nonlinearities.rectify, 85 | W_hid_to_gate=lasagne.init.GlorotUniform(), 86 | b_hid_to_gate=lasagne.init.Constant(0.), 87 | nonlinearity_gate=nonlinearities.hard_sigmoid, 88 | W_hid_to_shift=lasagne.init.GlorotUniform(), 89 | b_hid_to_shift=lasagne.init.Constant(0.), 90 | nonlinearity_shift=lasagne.nonlinearities.softmax, 91 | W_hid_to_gamma=lasagne.init.GlorotUniform(), 92 | b_hid_to_gamma=lasagne.init.Constant(0.), 93 | nonlinearity_gamma=lambda x: 1. + lasagne.nonlinearities.rectify(x), 94 | weights_init=init.OneHot(), 95 | learn_init=False, 96 | **kwargs): 97 | super(Head, self).__init__(controller, **kwargs) 98 | 99 | self.memory_shape = memory_shape 100 | self.name = kwargs.get('name', 'head') 101 | self.learn_init = learn_init 102 | 103 | # Key 104 | self.W_hid_to_key = self.add_param(W_hid_to_key, (1, self.input_shape[1], \ 105 | self.memory_shape[1]), name=self.name + '.key.W') 106 | self.b_hid_to_key = self.add_param(b_hid_to_key, (1, self.memory_shape[1]), \ 107 | name=self.name + '.key.b', regularizable=False) 108 | self.nonlinearity_key = nonlinearity_key 109 | # Beta 110 | self.W_hid_to_beta = self.add_param(W_hid_to_beta, (1, self.input_shape[1], \ 111 | 1), name=self.name + '.beta.W') 112 | self.b_hid_to_beta = self.add_param(b_hid_to_beta, (1, 1), \ 113 | name=self.name + '.beta.b', regularizable=False) 114 | self.nonlinearity_beta = nonlinearity_beta 115 | # Gate 116 | self.W_hid_to_gate = self.add_param(W_hid_to_gate, (1, self.input_shape[1], \ 117 | 1), name=self.name + '.gate.W') 118 | self.b_hid_to_gate = self.add_param(b_hid_to_gate, (1, 1), \ 119 | name=self.name + '.gate.b', regularizable=False) 120 | self.nonlinearity_gate = nonlinearity_gate 121 | # Shift 122 | self.num_shifts = num_shifts 123 | self.W_hid_to_shift = self.add_param(W_hid_to_shift, (1, self.input_shape[1], \ 124 | self.num_shifts), name=self.name + '.shift.W') 125 | self.b_hid_to_shift = self.add_param(b_hid_to_shift, (1, self.num_shifts), \ 126 | name=self.name + '.shift.b', regularizable=False) 127 | self.nonlinearity_shift = nonlinearity_shift 128 | # Gamma 129 | self.W_hid_to_gamma = self.add_param(W_hid_to_gamma, (1, self.input_shape[1], \ 130 | 1), name=self.name + '.gamma.W') 131 | self.b_hid_to_gamma = self.add_param(b_hid_to_gamma, (1, 1), \ 132 | name=self.name + '.gamma.b', regularizable=False) 133 | self.nonlinearity_gamma = nonlinearity_gamma 134 | 135 | self.weights_init = self.add_param( 136 | weights_init, (1, self.memory_shape[0]), 137 | name='weights_init', trainable=learn_init, regularizable=False) 138 | 139 | 140 | class WriteHead(Head): 141 | r""" 142 | Write head. In addition to the weight vector, the write head 143 | also outputs an add vector :math:`a_{t}` and an erase vector 144 | :math:`e_{t}` defined by 145 | 146 | .. math :: 147 | a_{t} &= \sigma_{a}(h_{t} W_{a} + b_{a}) 148 | e_{t} &= \sigma_{e}(h_{t} W_{e} + b_{e}) 149 | 150 | Parameters 151 | ---------- 152 | controller: a :class:`Controller` instance 153 | The controller of the Neural Turing Machine. 154 | num_shifts: int 155 | Number of shifts allowed by the convolutional shift operation 156 | (centered on 0, eg. 
``num_shifts=3`` represents shifts 157 | in [-1, 0, 1]). 158 | memory_shape: tuple 159 | Shape of the NTM's memory 160 | W_hid_to_key: callable, Numpy array or Theano shared variable 161 | b_hid_to_key: callable, Numpy array, Theano shared variable or ``None`` 162 | nonlinearity_key: callable or ``None`` 163 | Weights, biases and nonlinearity for parameter :math:`k_{t}`. 164 | W_hid_to_beta: callable, Numpy array or Theano shared variable 165 | b_hid_to_beta: callable, Numpy array, Theano shared variable or ``None`` 166 | nonlinearity_beta: callable or ``None`` 167 | Weights, biases and nonlinearity for parameter :math:`\beta_{t}`. 168 | W_hid_to_gate: callable, Numpy array or Theano shared variable 169 | b_hid_to_gate: callable, Numpy array, Theano shared variable or ``None`` 170 | nonlinearity_gate: callable or ``None`` 171 | Weights, biases and nonlinearity for parameter :math:`g_{t}`. 172 | W_hid_to_shift: callable, Numpy array or Theano shared variable 173 | b_hid_to_shift: callable, Numpy array, Theano shared variable or ``None`` 174 | nonlinearity_shift: callable or ``None`` 175 | Weights, biases and nonlinearity for parameter :math:`s_{t}`. 176 | W_hid_to_gamma: callable, Numpy array or Theano shared variable 177 | b_hid_to_gamma: callable, Numpy array, Theano shared variable or ``None`` 178 | nonlinearity_gamma: callable or ``None`` 179 | Weights, biases and nonlinearity for parameter :math:`\gamma_{t}` 180 | W_hid_to_erase: callable, Numpy array or Theano shared variable 181 | b_hid_to_erase: callable, Numpy array, Theano shared variable or ``None`` 182 | nonlinearity_erase: callable or ``None`` 183 | Weights, biases and nonlinearity for parameter :math:`e_{t}` 184 | W_hid_to_add: callable, Numpy array or Theano shared variable 185 | b_hid_to_add: callable, Numpy array, Theano shared variable or ``None`` 186 | nonlinearity_add: callable or ``None`` 187 | Weights, biases and nonlinearity for parameter :math:`a_{t}` 188 | weights_init: callable, Numpy array or Theano shared variable 189 | Initializer for the initial weight vector (:math:`w_{0}`). 190 | learn_init: bool 191 | If ``True``, initial hidden values are learned. 192 | """ 193 | def __init__(self, controller, num_shifts=3, memory_shape=(128, 20), 194 | W_hid_to_key=lasagne.init.GlorotUniform(), 195 | b_hid_to_key=lasagne.init.Constant(0.), 196 | nonlinearity_key=nonlinearities.ClippedLinear(low=0., high=1.), 197 | W_hid_to_beta=lasagne.init.GlorotUniform(), 198 | b_hid_to_beta=lasagne.init.Constant(0.), 199 | nonlinearity_beta=lasagne.nonlinearities.rectify, 200 | W_hid_to_gate=lasagne.init.GlorotUniform(), 201 | b_hid_to_gate=lasagne.init.Constant(0.), 202 | nonlinearity_gate=nonlinearities.hard_sigmoid, 203 | W_hid_to_shift=lasagne.init.GlorotUniform(), 204 | b_hid_to_shift=lasagne.init.Constant(0.), 205 | nonlinearity_shift=lasagne.nonlinearities.softmax, 206 | W_hid_to_gamma=lasagne.init.GlorotUniform(), 207 | b_hid_to_gamma=lasagne.init.Constant(0.), 208 | nonlinearity_gamma=lambda x: 1. 
+ lasagne.nonlinearities.rectify(x), 209 | W_hid_to_erase=lasagne.init.GlorotUniform(), 210 | b_hid_to_erase=lasagne.init.Constant(0.), 211 | nonlinearity_erase=nonlinearities.hard_sigmoid, 212 | W_hid_to_add=lasagne.init.GlorotUniform(), 213 | b_hid_to_add=lasagne.init.Constant(0.), 214 | nonlinearity_add=nonlinearities.ClippedLinear(low=0., high=1.), 215 | weights_init=init.OneHot(), 216 | learn_init=False, 217 | **kwargs): 218 | super(WriteHead, self).__init__(controller, num_shifts=num_shifts, memory_shape=memory_shape, 219 | W_hid_to_key=W_hid_to_key, b_hid_to_key=b_hid_to_key, nonlinearity_key=nonlinearity_key, 220 | W_hid_to_beta=W_hid_to_beta, b_hid_to_beta=b_hid_to_beta, nonlinearity_beta=nonlinearity_beta, 221 | W_hid_to_gate=W_hid_to_gate, b_hid_to_gate=b_hid_to_gate, nonlinearity_gate=nonlinearity_gate, 222 | W_hid_to_shift=W_hid_to_shift, b_hid_to_shift=b_hid_to_shift, nonlinearity_shift=nonlinearity_shift, 223 | W_hid_to_gamma=W_hid_to_gamma, b_hid_to_gamma=b_hid_to_gamma, nonlinearity_gamma=nonlinearity_gamma, 224 | weights_init=weights_init, learn_init=learn_init, **kwargs) 225 | # Erase 226 | self.W_hid_to_erase = self.add_param(W_hid_to_erase, (1, self.input_shape[1], \ 227 | self.memory_shape[1]), name=self.name + '.erase.W') 228 | self.b_hid_to_erase = self.add_param(b_hid_to_erase, (1, self.memory_shape[1]), \ 229 | name=self.name + '.erase.b', regularizable=False) 230 | self.nonlinearity_erase = nonlinearity_erase 231 | # Add 232 | self.W_hid_to_add = self.add_param(W_hid_to_add, (1, self.input_shape[1], \ 233 | self.memory_shape[1]), name=self.name + '.add.W') 234 | self.b_hid_to_add = self.add_param(b_hid_to_add, (1, self.memory_shape[1]), \ 235 | name=self.name + '.add.b', regularizable=False) 236 | self.nonlinearity_add = nonlinearity_add 237 | 238 | 239 | class ReadHead(Head): 240 | r""" 241 | Read head. 242 | 243 | Parameters 244 | ---------- 245 | controller: a :class:`Controller` instance 246 | The controller of the Neural Turing Machine. 247 | num_shifts: int 248 | Number of shifts allowed by the convolutional shift operation 249 | (centered on 0, eg. ``num_shifts=3`` represents shifts 250 | in [-1, 0, 1]). 251 | memory_shape: tuple 252 | Shape of the NTM's memory 253 | W_hid_to_key: callable, Numpy array or Theano shared variable 254 | b_hid_to_key: callable, Numpy array, Theano shared variable or ``None`` 255 | nonlinearity_key: callable or ``None`` 256 | Weights, biases and nonlinearity for parameter :math:`k_{t}`. 257 | W_hid_to_beta: callable, Numpy array or Theano shared variable 258 | b_hid_to_beta: callable, Numpy array, Theano shared variable or ``None`` 259 | nonlinearity_beta: callable or ``None`` 260 | Weights, biases and nonlinearity for parameter :math:`\beta_{t}`. 261 | W_hid_to_gate: callable, Numpy array or Theano shared variable 262 | b_hid_to_gate: callable, Numpy array, Theano shared variable or ``None`` 263 | nonlinearity_gate: callable or ``None`` 264 | Weights, biases and nonlinearity for parameter :math:`g_{t}`. 265 | W_hid_to_shift: callable, Numpy array or Theano shared variable 266 | b_hid_to_shift: callable, Numpy array, Theano shared variable or ``None`` 267 | nonlinearity_shift: callable or ``None`` 268 | Weights, biases and nonlinearity for parameter :math:`s_{t}`. 
269 | W_hid_to_gamma: callable, Numpy array or Theano shared variable 270 | b_hid_to_gamma: callable, Numpy array, Theano shared variable or ``None`` 271 | nonlinearity_gamma: callable or ``None`` 272 | Weights, biases and nonlinearity for parameter :math:`\gamma_{t}` 273 | weights_init: callable, Numpy array or Theano shared variable 274 | Initializer for the initial weight vector (:math:`w_{0}`). 275 | learn_init: bool 276 | If ``True``, initial hidden values are learned. 277 | """ 278 | def __init__(self, controller, num_shifts=3, memory_shape=(128, 20), 279 | W_hid_to_key=lasagne.init.GlorotUniform(), 280 | b_hid_to_key=lasagne.init.Constant(0.), 281 | nonlinearity_key=nonlinearities.ClippedLinear(low=0., high=1.), 282 | W_hid_to_beta=lasagne.init.GlorotUniform(), 283 | b_hid_to_beta=lasagne.init.Constant(0.), 284 | nonlinearity_beta=lasagne.nonlinearities.rectify, 285 | W_hid_to_gate=lasagne.init.GlorotUniform(), 286 | b_hid_to_gate=lasagne.init.Constant(0.), 287 | nonlinearity_gate=T.nnet.hard_sigmoid, 288 | W_hid_to_shift=lasagne.init.GlorotUniform(), 289 | b_hid_to_shift=lasagne.init.Constant(0.), 290 | nonlinearity_shift=lasagne.nonlinearities.softmax, 291 | W_hid_to_gamma=lasagne.init.GlorotUniform(), 292 | b_hid_to_gamma=lasagne.init.Constant(0.), 293 | nonlinearity_gamma=lambda x: 1. + lasagne.nonlinearities.rectify(x), 294 | weights_init=init.OneHot(), 295 | learn_init=False, 296 | **kwargs): 297 | super(ReadHead, self).__init__(controller, num_shifts=num_shifts, memory_shape=memory_shape, 298 | W_hid_to_key=W_hid_to_key, b_hid_to_key=b_hid_to_key, nonlinearity_key=nonlinearity_key, 299 | W_hid_to_beta=W_hid_to_beta, b_hid_to_beta=b_hid_to_beta, nonlinearity_beta=nonlinearity_beta, 300 | W_hid_to_gate=W_hid_to_gate, b_hid_to_gate=b_hid_to_gate, nonlinearity_gate=nonlinearity_gate, 301 | W_hid_to_shift=W_hid_to_shift, b_hid_to_shift=b_hid_to_shift, nonlinearity_shift=nonlinearity_shift, 302 | W_hid_to_gamma=W_hid_to_gamma, b_hid_to_gamma=b_hid_to_gamma, nonlinearity_gamma=nonlinearity_gamma, 303 | weights_init=weights_init, learn_init=learn_init, **kwargs) 304 | 305 | 306 | class HeadCollection(object): 307 | r""" 308 | The base class :class:`HeadCollection` represents a generic collection 309 | of heads. Each head is an instance of :class:`Head`. This allows to 310 | process the heads simultaneously if they have the same type. This should 311 | be limited to internal uses only. 312 | 313 | Parameters 314 | ---------- 315 | heads: a list of :class:`Head` instances 316 | List of the heads. 
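    Editor's note: the per-head parameters are created with a leading axis of
    size 1 (e.g. ``(1, num_units, memory_shape[1])``) precisely so that
    concatenating them along axis 0 lets a single ``T.dot`` evaluate every
    head at once. A small NumPy sketch of the shape arithmetic (illustration
    only, with arbitrary sizes):

        import numpy as np

        h = np.random.rand(16, 100)     # (batch, num_units) hidden states
        W = np.random.rand(4, 100, 20)  # four stacked (num_units, key_size) matrices
        k = np.dot(h, W)                # contracts h's last axis with W's middle axis
        assert k.shape == (16, 4, 20)   # one key per example and per head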
317 | """ 318 | def __init__(self, heads): 319 | self.heads = heads 320 | # QKFIX: Assume that all the heads have the same number of shifts and nonlinearities 321 | self.memory_shape = self.heads[0].memory_shape 322 | self.num_shifts = self.heads[0].num_shifts 323 | # Key 324 | self.W_hid_to_key = T.concatenate([head.W_hid_to_key for head in self.heads], axis=0) 325 | self.b_hid_to_key = T.concatenate([head.b_hid_to_key for head in self.heads], axis=0) 326 | self.nonlinearity_key = self.heads[0].nonlinearity_key 327 | # Beta 328 | self.W_hid_to_beta = T.concatenate([head.W_hid_to_beta for head in self.heads], axis=0) 329 | self.b_hid_to_beta = T.concatenate([head.b_hid_to_beta for head in self.heads], axis=0) 330 | self.nonlinearity_beta = self.heads[0].nonlinearity_beta 331 | # Gate 332 | self.W_hid_to_gate = T.concatenate([head.W_hid_to_gate for head in self.heads], axis=0) 333 | self.b_hid_to_gate = T.concatenate([head.b_hid_to_gate for head in self.heads], axis=0) 334 | self.nonlinearity_gate = self.heads[0].nonlinearity_gate 335 | # Shift 336 | self.W_hid_to_shift = T.concatenate([head.W_hid_to_shift for head in self.heads], axis=0) 337 | self.b_hid_to_shift = T.concatenate([head.b_hid_to_shift for head in self.heads], axis=0) 338 | self.nonlinearity_shift = self.heads[0].nonlinearity_shift 339 | # Gamma 340 | self.W_hid_to_gamma = T.concatenate([head.W_hid_to_gamma for head in self.heads], axis=0) 341 | self.b_hid_to_gamma = T.concatenate([head.b_hid_to_gamma for head in self.heads], axis=0) 342 | self.nonlinearity_gamma = self.heads[0].nonlinearity_gamma 343 | # Initialization 344 | self.weights_init = T.concatenate([head.weights_init for head in self.heads], axis=0) 345 | 346 | def get_params(self, **tags): 347 | params = [] 348 | for head in self.heads: 349 | params += head.get_params(**tags) 350 | 351 | return params 352 | 353 | def get_weights(self, h_t, w_tm1, M_t, **kwargs): 354 | batch_size = self.heads[0].input_shape[0] # QKFIX: Get the size of the batches from the 1st head 355 | num_heads = len(self.heads) 356 | k_t = self.nonlinearity_key(T.dot(h_t, self.W_hid_to_key) + self.b_hid_to_key) 357 | beta_t = self.nonlinearity_beta(T.dot(h_t, self.W_hid_to_beta) + self.b_hid_to_beta) 358 | g_t = self.nonlinearity_gate(T.dot(h_t, self.W_hid_to_gate) + self.b_hid_to_gate) 359 | # QKFIX: If the nonlinearity is softmax (which is usually the case), then the activations 360 | # need to be reshaped (T.nnet.softmax only accepts 2D inputs) 361 | try: 362 | s_t = self.nonlinearity_shift(T.dot(h_t, self.W_hid_to_shift) + self.b_hid_to_shift) 363 | except ValueError: 364 | shift_activation_t = T.dot(h_t, self.W_hid_to_shift) + self.b_hid_to_shift 365 | s_t = self.nonlinearity_shift(shift_activation_t.reshape((h_t.shape[0] * num_heads, self.num_shifts))) 366 | s_t = s_t.reshape(shift_activation_t.shape) 367 | gamma_t = self.nonlinearity_gamma(T.dot(h_t, self.W_hid_to_gamma) + self.b_hid_to_gamma) 368 | 369 | # Content Addressing (3.3.1) 370 | beta_t = T.addbroadcast(beta_t, 2) 371 | betaK = beta_t * similarities.cosine_similarity(k_t, M_t) 372 | w_c = lasagne.nonlinearities.softmax(betaK.flatten(ndim=2)) 373 | w_c = w_c.reshape(betaK.shape) 374 | 375 | # Interpolation (3.3.2) 376 | g_t = T.addbroadcast(g_t, 2) 377 | w_g = g_t * w_c + (1. - g_t) * w_tm1 378 | 379 | # Convolutional Shift (3.3.2) 380 | # NOTE: This library is using a flat (zero-padded) convolution instead of the circular 381 | # convolution from the original paper. In practice, this change has a minimal impact. 
382 | w_g_padded = w_g.reshape((h_t.shape[0] * num_heads, self.memory_shape[0])).dimshuffle(0, 'x', 'x', 1) 383 | conv_filter = s_t.reshape((h_t.shape[0] * num_heads, self.num_shifts)).dimshuffle(0, 'x', 'x', 1) 384 | pad = (self.num_shifts // 2, (self.num_shifts - 1) // 2) 385 | w_g_padded = padding.pad(w_g_padded, [pad], batch_ndim=3) 386 | convolution = T.nnet.conv2d(w_g_padded, conv_filter, 387 | input_shape=(None if batch_size is None else \ 388 | batch_size * num_heads, 1, 1, self.memory_shape[0] + pad[0] + pad[1]), 389 | filter_shape=(None if batch_size is None else \ 390 | batch_size * num_heads, 1, 1, self.num_shifts), 391 | subsample=(1, 1), 392 | border_mode='valid') 393 | w_tilde = convolution[T.arange(h_t.shape[0] * num_heads), T.arange(h_t.shape[0] * num_heads), 0, :] 394 | w_tilde = w_tilde.reshape((h_t.shape[0], num_heads, self.memory_shape[0])) 395 | 396 | # Sharpening (3.3.2) 397 | gamma_t = T.addbroadcast(gamma_t, 2) 398 | w = T.pow(w_tilde + 1e-6, gamma_t) 399 | w /= T.sum(w, axis=2).dimshuffle(0, 1, 'x') 400 | 401 | return w 402 | 403 | 404 | class ReadHeadCollection(HeadCollection): 405 | r""" 406 | Collection of read heads. 407 | 408 | Parameters 409 | ---------- 410 | heads: a list of :class:`ReadHead` instances 411 | List of the read heads. 412 | """ 413 | def __init__(self, heads): 414 | assert all([isinstance(head, ReadHead) for head in heads]) 415 | super(ReadHeadCollection, self).__init__(heads=heads) 416 | 417 | def read(self, w_tm1, M_t, **kwargs): 418 | r_t = T.batched_dot(w_tm1, M_t) 419 | 420 | return r_t.flatten(ndim=2) 421 | 422 | 423 | class WriteHeadCollection(HeadCollection): 424 | r""" 425 | Collection of write heads. 426 | 427 | Parameters 428 | ---------- 429 | heads: a list of :class:`WriteHead` instances 430 | List of the write heads. 431 | """ 432 | def __init__(self, heads): 433 | assert all([isinstance(head, WriteHead) for head in heads]) 434 | super(WriteHeadCollection, self).__init__(heads=heads) 435 | # Erase 436 | self.W_hid_to_erase = T.concatenate([head.W_hid_to_erase for head in self.heads], axis=0) 437 | self.b_hid_to_erase = T.concatenate([head.b_hid_to_erase for head in self.heads], axis=0) 438 | self.nonlinearity_erase = self.heads[0].nonlinearity_erase 439 | # Add 440 | self.W_hid_to_add = T.concatenate([head.W_hid_to_add for head in self.heads], axis=0) 441 | self.b_hid_to_add = T.concatenate([head.b_hid_to_add for head in self.heads], axis=0) 442 | self.nonlinearity_add = self.heads[0].nonlinearity_add 443 | 444 | def write(self, h_tm1, w_tm1, M_tm1, **kwargs): 445 | e_t = self.nonlinearity_erase(T.dot(h_tm1, self.W_hid_to_erase) + self.b_hid_to_erase) 446 | a_t = self.nonlinearity_add(T.dot(h_tm1, self.W_hid_to_add) + self.b_hid_to_add) 447 | # Erase 448 | M_tp1 = M_tm1 * T.prod(1 - w_tm1.dimshuffle(0, 1, 2, 'x') * e_t.dimshuffle(0, 1, 'x', 2), axis=1) 449 | # Add 450 | M_tp1 += T.sum(w_tm1.dimshuffle(0, 1, 2, 'x') * a_t.dimshuffle(0, 1, 'x', 2), axis=1) 451 | 452 | return M_tp1 453 | -------------------------------------------------------------------------------- /ntm/init.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | import lasagne.init 4 | from lasagne.utils import floatX 5 | 6 | 7 | class OneHot(lasagne.init.Initializer): 8 | """ 9 | Initialize the weights to one-hot vectors. 
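    For example, ``OneHot().sample((3, 5))`` yields::

        [[1., 0., 0., 0., 0.],
         [0., 1., 0., 0., 0.],
         [0., 0., 1., 0., 0.]]

    i.e. an identity block in the top-left corner and zeros elsewhere. Used
    as ``weights_init`` with shape ``(1, memory_shape[0])``, this makes each
    head initially attend to the first memory location.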
10 | """ 11 | def sample(self, shape): 12 | if len(shape) != 2: 13 | raise ValueError('The OneHot initializer ' 14 | 'only works with 2D arrays.') 15 | M = np.min(shape) 16 | arr = np.zeros(shape) 17 | arr[:M, :M] += 1 * np.eye(M) 18 | return floatX(arr) 19 | -------------------------------------------------------------------------------- /ntm/layers.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | 4 | from lasagne.layers import Layer 5 | 6 | from heads import ReadHead, WriteHead, ReadHeadCollection, WriteHeadCollection 7 | 8 | 9 | class NTMLayer(Layer): 10 | r""" 11 | A Neural Turing Machine layer. 12 | 13 | Parameters 14 | ---------- 15 | incoming: a :class:`lasagne.layers.Layer` instance 16 | The layer feeding into the Neural Turing Machine. This 17 | layer must match the incoming layer in the controller. 18 | memory: a :class:`Memory` instance 19 | The memory of the NTM. 20 | controller: a :class:`Controller` instance 21 | The controller of the NTM. 22 | heads: a list of :class:`Head` instances 23 | The read and write heads of the NTM. 24 | only_return_final: bool 25 | If ``True``, only return the final sequential output (e.g. 26 | for tasks where a single target value for the entire 27 | sequence is desired). In this case, Theano makes an 28 | optimization which saves memory. 29 | """ 30 | def __init__(self, incoming, 31 | memory, 32 | controller, 33 | heads, 34 | only_return_final=False, 35 | **kwargs): 36 | super(NTMLayer, self).__init__(incoming, **kwargs) 37 | 38 | self.memory = memory 39 | self.controller = controller 40 | self.heads = heads 41 | self.write_heads = WriteHeadCollection(heads=\ 42 | filter(lambda head: isinstance(head, WriteHead), heads)) 43 | self.read_heads = ReadHeadCollection(heads=\ 44 | filter(lambda head: isinstance(head, ReadHead), heads)) 45 | self.only_return_final = only_return_final 46 | 47 | def get_output_shape_for(self, input_shapes): 48 | if self.only_return_final: 49 | return (input_shapes[0], self.controller.num_units) 50 | else: 51 | return (input_shapes[0], input_shapes[1], self.controller.num_units) 52 | 53 | def get_params(self, **tags): 54 | params = super(NTMLayer, self).get_params(**tags) 55 | params += self.controller.get_params(**tags) 56 | params += self.memory.get_params(**tags) 57 | for head in self.heads: 58 | params += head.get_params(**tags) 59 | 60 | return params 61 | 62 | def get_output_for(self, input, get_details=False, **kwargs): 63 | 64 | input = input.dimshuffle(1, 0, 2) 65 | 66 | def step(x_t, M_tm1, h_tm1, state_tm1, ww_tm1, wr_tm1, *params): 67 | # Update the memory (using w_tm1 of the writing heads & M_tm1) 68 | M_t = self.write_heads.write(h_tm1, ww_tm1, M_tm1) 69 | 70 | # Get the read vector (using w_tm1 of the reading heads & M_t) 71 | r_t = self.read_heads.read(wr_tm1, M_t) 72 | 73 | # Apply the controller (using x_t, r_t & the requirements for the controller) 74 | h_t, state_t = self.controller.step(x_t, r_t, h_tm1, state_tm1) 75 | 76 | # Update the weights (using h_t, M_t & w_tm1) 77 | ww_t = self.write_heads.get_weights(h_t, ww_tm1, M_t) 78 | wr_t = self.read_heads.get_weights(h_t, wr_tm1, M_t) 79 | 80 | return [M_t, h_t, state_t, ww_t, wr_t] 81 | 82 | memory_init = T.tile(self.memory.memory_init, (input.shape[1], 1, 1)) 83 | memory_init = T.unbroadcast(memory_init, 0) 84 | 85 | write_weights_init = T.tile(self.write_heads.weights_init, (input.shape[1], 1, 1)) 86 | write_weights_init = T.unbroadcast(write_weights_init, 0) 87 | 
read_weights_init = T.tile(self.read_heads.weights_init, (input.shape[1], 1, 1)) 88 | read_weights_init = T.unbroadcast(read_weights_init, 0) 89 | 90 | non_seqs = self.controller.get_params() + self.memory.get_params() + \ 91 | self.write_heads.get_params() + self.read_heads.get_params() 92 | 93 | hids, _ = theano.scan( 94 | fn=step, 95 | sequences=input, 96 | outputs_info=[memory_init] + self.controller.outputs_info(input.shape[1]) + \ 97 | [write_weights_init, read_weights_init], 98 | non_sequences=non_seqs, 99 | strict=True) 100 | 101 | # dimshuffle back to (n_batch, n_time_steps, n_features) 102 | if get_details: 103 | hid_out = [ 104 | hids[0].dimshuffle(1, 0, 2, 3), 105 | hids[1].dimshuffle(1, 0, 2), 106 | hids[2].dimshuffle(1, 0, 2), 107 | hids[3].dimshuffle(1, 0, 2, 3), 108 | hids[4].dimshuffle(1, 0, 2, 3)] 109 | else: 110 | if self.only_return_final: 111 | hid_out = hids[1][-1] 112 | else: 113 | hid_out = hids[1].dimshuffle(1, 0, 2) 114 | 115 | return hid_out 116 | -------------------------------------------------------------------------------- /ntm/memory.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | 5 | from lasagne.layers import InputLayer 6 | import lasagne.init 7 | 8 | 9 | class Memory(InputLayer): 10 | r""" 11 | Memory of the Neural Turing Machine. 12 | 13 | Parameters 14 | ---------- 15 | memory_shape: tuple 16 | Shape of the NTM's memory. 17 | memory_init: callable, Numpy array or Theano shared variable 18 | Initializer for the initial state of the memory (:math:`M_{0}`). 19 | The initial state of the memory must be non-zero. 20 | learn_init: bool 21 | If ``True``, initial state of the memory is learned. 22 | """ 23 | def __init__(self, memory_shape, 24 | memory_init=lasagne.init.Constant(1e-6), 25 | learn_init=True, 26 | **kwargs): 27 | super(Memory, self).__init__(memory_shape, **kwargs) 28 | self.memory_init = self.add_param( 29 | memory_init, memory_shape, 30 | name='memory_init', trainable=learn_init, regularizable=False) 31 | -------------------------------------------------------------------------------- /ntm/nonlinearities.py: -------------------------------------------------------------------------------- 1 | import theano.tensor as T 2 | 3 | 4 | class ClippedLinear(object): 5 | """ 6 | Clipped linear activation. 7 | """ 8 | def __init__(self, low=0., high=1.): 9 | super(ClippedLinear, self).__init__() 10 | self.low = low 11 | self.high = high 12 | 13 | def __call__(self, x): 14 | return T.clip(x, self.low, self.high) 15 | 16 | def hard_sigmoid(x): 17 | return T.nnet.hard_sigmoid(x) -------------------------------------------------------------------------------- /ntm/similarities.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | 5 | 6 | def cosine_similarity(x, y, eps=1e-6): 7 | r""" 8 | Cosine similarity between a vector and each row of a base matrix. 9 | 10 | Parameters 11 | ---------- 12 | x: a 3D Theano variable 13 | Vector to compare to each row of the matrix y. 14 | y: a 3D Theano variable 15 | Matrix to be compared to 16 | eps: float 17 | Precision of the operation (necessary for differentiability). 18 | 19 | Return 20 | ------ 21 | z: a 3D Theano variable 22 | A vector whose components are the cosine similarities 23 | between x and each row of y. 
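    Editor's note on shapes: despite the vector/matrix wording above, both
    arguments are batched 3D tensors; for a ``(16, 4, 20)`` key and a
    ``(16, 128, 20)`` memory the result has shape ``(16, 4, 128)`` (one
    similarity per example, head and memory row). Keeping ``eps`` inside the
    square root keeps the denominator non-zero, so the expression remains
    differentiable even when a row of ``x`` or ``y`` is all zeros.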
24 | """ 25 | z = T.batched_dot(x, y.dimshuffle(0, 2, 1)) 26 | z /= T.sqrt(T.sum(x * x, axis=2).dimshuffle(0, 1, 'x') * T.sum(y * y, axis=2).dimshuffle(0, 'x', 1) + eps) 27 | 28 | return z 29 | -------------------------------------------------------------------------------- /ntm/test/test_heads.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | import theano 4 | import theano.tensor as T 5 | import numpy as np 6 | 7 | import lasagne.nonlinearities 8 | from lasagne.theano_extensions import padding 9 | 10 | 11 | def test_content_addressing(): 12 | from ntm.similarities import cosine_similarity 13 | beta_var, key_var, memory_var = T.tensor3s('beta', 'key', 'memory') 14 | 15 | beta_var = T.addbroadcast(beta_var, 2) 16 | betaK = beta_var * cosine_similarity(key_var, memory_var) 17 | w_c = lasagne.nonlinearities.softmax(betaK.reshape((16 * 4, 128))) 18 | w_c = w_c.reshape(betaK.shape) 19 | 20 | content_addressing_fn = theano.function([beta_var, key_var, memory_var], w_c) 21 | 22 | beta = np.random.rand(16, 4, 1) 23 | key = np.random.rand(16, 4, 20) 24 | memory = np.random.rand(16, 128, 20) 25 | 26 | weights = content_addressing_fn(beta, key, memory) 27 | weights_manual = np.zeros_like(weights) 28 | 29 | def softmax(x): 30 | y = np.exp(x.T - np.max(x, axis=1)) 31 | z = y / np.sum(y, axis=0) 32 | return z.T 33 | 34 | betaK_manual = np.zeros((16, 4, 128)) 35 | for i in range(16): 36 | for j in range(4): 37 | for k in range(128): 38 | betaK_manual[i, j, k] = beta[i, j, 0] * np.dot(key[i, j], \ 39 | memory[i, k]) / np.sqrt(np.sum(key[i, j] * key[i, j]) * \ 40 | np.sum(memory[i, k] * memory[i, k]) + 1e-6) 41 | for i in range(16): 42 | weights_manual[i] = softmax(betaK_manual[i]) 43 | 44 | assert weights.shape == (16, 4, 128) 45 | assert np.allclose(np.sum(weights, axis=2), np.ones((16, 4))) 46 | assert np.allclose(weights, weights_manual) 47 | 48 | 49 | def test_convolutional_shift(): 50 | weights_var, shift_var = T.tensor3s('weights', 'shift') 51 | num_shifts = 3 52 | 53 | weights_reshaped = weights_var.reshape((16 * 4, 128)) 54 | weights_reshaped = weights_reshaped.dimshuffle(0, 'x', 'x', 1) 55 | shift_reshaped = shift_var.reshape((16 * 4, num_shifts)) 56 | shift_reshaped = shift_reshaped.dimshuffle(0, 'x', 'x', 1) 57 | pad = (num_shifts // 2, (num_shifts - 1) // 2) 58 | weights_padded = padding.pad(weights_reshaped, [pad], batch_ndim=3) 59 | convolution = T.nnet.conv2d(weights_padded, shift_reshaped, 60 | input_shape=(16 * 4, 1, 1, 128 + pad[0] + pad[1]), 61 | filter_shape=(16 * 4, 1, 1, num_shifts), 62 | subsample=(1, 1), 63 | border_mode='valid') 64 | w_tilde = convolution[T.arange(16 * 4), T.arange(16 * 4), 0, :] 65 | w_tilde = w_tilde.reshape((16, 4, 128)) 66 | 67 | convolutional_shift_fn = theano.function([weights_var, shift_var], w_tilde) 68 | 69 | weights = np.random.rand(16, 4, 128) 70 | shift = np.random.rand(16, 4, 3) 71 | 72 | weight_tilde = convolutional_shift_fn(weights, shift) 73 | weight_tilde_manual = np.zeros_like(weight_tilde) 74 | 75 | for i in range(16): 76 | for j in range(4): 77 | for k in range(128): 78 | # Filters in T.nnet.conv2d are reversed 79 | if (k - 1) >= 0: 80 | weight_tilde_manual[i, j, k] += shift[i, j, 2] * weights[i, j, k - 1] 81 | weight_tilde_manual[i, j, k] += shift[i, j, 1] * weights[i, j, k] 82 | if (k + 1) < 128: 83 | weight_tilde_manual[i, j, k] += shift[i, j, 0] * weights[i, j, k + 1] 84 | 85 | assert weight_tilde.shape == (16, 4, 128) 86 | assert np.allclose(weight_tilde, weight_tilde_manual) 87 
| 88 | 89 | def test_sharpening(): 90 | weight_var, gamma_var = T.tensor3s('weight', 'gamma') 91 | 92 | gamma_var = T.addbroadcast(gamma_var, 2) 93 | w = T.pow(weight_var + 1e-6, gamma_var) 94 | w /= T.sum(w, axis=2).dimshuffle(0, 1, 'x') 95 | 96 | sharpening_fn = theano.function([weight_var, gamma_var], w) 97 | 98 | weights = np.random.rand(16, 4, 128) 99 | gamma = np.random.rand(16, 4, 1) 100 | 101 | weight_t = sharpening_fn(weights, gamma) 102 | weight_t_manual = np.zeros_like(weight_t) 103 | 104 | for i in range(16): 105 | for j in range(4): 106 | for k in range(128): 107 | weight_t_manual[i, j, k] = np.power(weights[i, j, k] + 1e-6, gamma[i, j]) 108 | weight_t_manual[i, j] /= np.sum(weight_t_manual[i, j]) 109 | 110 | assert weight_t.shape == (16, 4, 128) 111 | assert np.allclose(weight_t, weight_t_manual) -------------------------------------------------------------------------------- /ntm/test/test_layers.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | import theano 4 | import theano.tensor as T 5 | import numpy as np 6 | 7 | from lasagne.layers import InputLayer, ReshapeLayer, DenseLayer 8 | from lasagne.layers import get_output, get_all_param_values, set_all_param_values 9 | from ntm.layers import NTMLayer 10 | from ntm.heads import WriteHead, ReadHead 11 | from ntm.controllers import DenseController 12 | from ntm.memory import Memory 13 | 14 | 15 | def model(input_var, batch_size=1): 16 | l_input = InputLayer((batch_size, None, 8), input_var=input_var) 17 | batch_size_var, seqlen, _ = l_input.input_var.shape 18 | 19 | # Neural Turing Machine Layer 20 | memory = Memory((128, 20), name='memory') 21 | controller = DenseController(l_input, memory_shape=(128, 20), 22 | num_units=100, num_reads=1, name='controller') 23 | heads = [ 24 | WriteHead(controller, num_shifts=3, memory_shape=(128, 20), name='write'), 25 | ReadHead(controller, num_shifts=3, memory_shape=(128, 20), name='read') 26 | ] 27 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 28 | 29 | # Output Layer 30 | l_output_reshape = ReshapeLayer(l_ntm, (-1, 100)) 31 | l_output_dense = DenseLayer(l_output_reshape, num_units=8, name='dense') 32 | l_output = ReshapeLayer(l_output_dense, (batch_size_var if batch_size \ 33 | is None else batch_size, seqlen, 8)) 34 | 35 | return l_output 36 | 37 | 38 | def test_batch_size(): 39 | input_var01, input_var16 = T.tensor3s('input01', 'input16') 40 | l_output01 = model(input_var01, batch_size=1) 41 | l_output16 = model(input_var16, batch_size=16) 42 | 43 | # Share the parameters for both models 44 | params01 = get_all_param_values(l_output01) 45 | set_all_param_values(l_output16, params01) 46 | 47 | posterior_fn01 = theano.function([input_var01], get_output(l_output01)) 48 | posterior_fn16 = theano.function([input_var16], get_output(l_output16)) 49 | 50 | example_input = np.random.rand(16, 30, 8) 51 | example_output16 = posterior_fn16(example_input) 52 | example_output01 = np.zeros_like(example_output16) 53 | 54 | for i in range(16): 55 | example_output01[i] = posterior_fn01(example_input[i][np.newaxis, :, :]) 56 | 57 | assert example_output16.shape == (16, 30, 8) 58 | assert np.allclose(example_output16, example_output01, atol=1e-3) 59 | 60 | 61 | def test_batch_size_none(): 62 | input_var = T.tensor3('input') 63 | l_output = model(input_var, batch_size=None) 64 | posterior_fn = theano.function([input_var], get_output(l_output)) 65 | 66 | example_input = np.random.rand(16, 30, 8) 67 | example_output 
= posterior_fn(example_input) 68 | 69 | assert example_output.shape == (16, 30, 8) 70 | -------------------------------------------------------------------------------- /ntm/test/test_similarities.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | import theano 4 | import theano.tensor as T 5 | import numpy as np 6 | 7 | 8 | def test_cosine_similarity(): 9 | from ntm.similarities import cosine_similarity 10 | 11 | key_var, memory_var = T.tensor3s('key', 'memory') 12 | cosine_similarity_fn = theano.function([key_var, memory_var], \ 13 | cosine_similarity(key_var, memory_var, eps=1e-6)) 14 | 15 | test_key = np.random.rand(16, 4, 20) 16 | test_memory = np.random.rand(16, 128, 20) 17 | 18 | test_output = cosine_similarity_fn(test_key, test_memory) 19 | test_output_manual = np.zeros_like(test_output) 20 | 21 | for i in range(16): 22 | for j in range(4): 23 | for k in range(128): 24 | test_output_manual[i, j, k] = np.dot(test_key[i, j], test_memory[i, k]) / \ 25 | np.sqrt(np.sum(test_key[i, j] * test_key[i, j]) * np.sum(test_memory[i, k] * \ 26 | test_memory[i, k]) + 1e-6) 27 | 28 | assert np.allclose(test_output, test_output_manual) 29 | -------------------------------------------------------------------------------- /ntm/updates.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | 5 | from lasagne.updates import get_or_compute_grads 6 | from collections import OrderedDict 7 | 8 | def graves_rmsprop(loss_or_grads, params, learning_rate=1e-4, chi=0.95, alpha=0.9, epsilon=1e-4): 9 | r""" 10 | Alex Graves' RMSProp [1]_. 11 | 12 | .. math :: 13 | n_{i} &= \chi * n_{i-1} + (1 - \chi) * grad^{2}\\ 14 | g_{i} &= \chi * g_{i-1} + (1 - \chi) * grad\\ 15 | \Delta_{i} &= \alpha * \Delta_{i-1} - learning\_rate * grad / 16 | \sqrt{n_{i} - g_{i}^{2} + \epsilon}\\ 17 | w_{i} &= w_{i-1} + \Delta_{i} 18 | 19 | References 20 | ---------- 21 | .. [1] Graves, Alex. 22 | "Generating Sequences With Recurrent Neural Networks", p.23 23 | arXiv:1308.0850 24 | 25 | """ 26 | grads = get_or_compute_grads(loss_or_grads, params) 27 | updates = OrderedDict() 28 | 29 | for param, grad in zip(params, grads): 30 | value = param.get_value(borrow=True) 31 | n = theano.shared(np.zeros(value.shape, dtype=value.dtype), 32 | broadcastable=param.broadcastable) 33 | g = theano.shared(np.zeros(value.shape, dtype=value.dtype), 34 | broadcastable=param.broadcastable) 35 | delta = theano.shared(np.zeros(value.shape, dtype=value.dtype), 36 | broadcastable=param.broadcastable) 37 | n_ip1 = chi * n + (1. - chi) * grad ** 2 38 | g_ip1 = chi * g + (1.
- chi) * grad 39 | delta_ip1 = alpha * delta - learning_rate * grad / T.sqrt(n_ip1 - \ 40 | g_ip1 ** 2 + epsilon) 41 | updates[n] = n_ip1 42 | updates[g] = g_ip1 43 | updates[delta] = delta_ip1 44 | updates[param] = param + delta_ip1 45 | 46 | return updates 47 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | appdirs==1.4.3 2 | cycler==0.10.0 3 | functools32==3.2.3.post2 4 | matplotlib==2.0.0 5 | numpy==1.12.1 6 | packaging==16.8 7 | pandas  # used by utils/visualization.py 8 | py==1.4.33 9 | pyparsing==2.2.0 10 | pytest==3.0.7 11 | python-dateutil==2.6.0 12 | pytz==2016.10 13 | scipy==0.19.0 14 | six==1.10.0 15 | subprocess32==3.2.7 16 | Theano==0.9.0 17 | 18 | git+ssh://git@github.com/Lasagne/Lasagne.git -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | from setuptools import find_packages 3 | 4 | 5 | setup(name='NTM-Lasagne', 6 | version='0.3.0', 7 | description='Neural Turing Machines in Theano with Lasagne', 8 | author='Tristan Deleu', 9 | author_email='tristan.deleu@snips.ai', 10 | url='', 11 | download_url='', 12 | license='MIT', 13 | install_requires=[ 14 | 'numpy>=1.12.1', 15 | 'theano==0.9.0' 16 | ], 17 | packages=['ntm'], 18 | include_package_data=False, 19 | zip_safe=False) -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/utils/__init__.py -------------------------------------------------------------------------------- /utils/generators.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import numpy as np 3 | 4 | 5 | class Task(object): 6 | 7 | def __init__(self, max_iter=None, batch_size=1): 8 | self.max_iter = max_iter 9 | self.batch_size = batch_size 10 | self.num_iter = 0 11 | 12 | def __iter__(self): 13 | return self 14 | 15 | def __next__(self): 16 | return self.next() 17 | 18 | def next(self): 19 | if (self.max_iter is None) or (self.num_iter < self.max_iter): 20 | self.num_iter += 1 21 | params = self.sample_params() 22 | return (self.num_iter - 1), self.sample(**params) 23 | else: 24 | raise StopIteration() 25 | 26 | def sample_params(self): 27 | raise NotImplementedError() 28 | 29 | def sample(self): 30 | raise NotImplementedError() 31 | 32 | 33 | class CopyTask(Task): 34 | 35 | def __init__(self, size, max_length, min_length=1, max_iter=None, \ 36 | batch_size=1, end_marker=False): 37 | super(CopyTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 38 | self.size = size 39 | self.min_length = min_length 40 | self.max_length = max_length 41 | self.end_marker = end_marker 42 | 43 | def sample_params(self, length=None): 44 | if length is None: 45 | length = np.random.randint(self.min_length, self.max_length + 1) 46 | return {'length': length} 47 | 48 | def sample(self, length): 49 | sequence = np.random.binomial(1, 0.5, (self.batch_size, length, self.size)) 50 | example_input = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 51 | self.size + 1), dtype=theano.config.floatX) 52 | example_output = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 53 | self.size + 1), dtype=theano.config.floatX) 54 | 55 |
example_input[:, :length, :self.size] = sequence 56 | example_input[:, length, -1] = 1 57 | example_output[:, length + 1:2 * length + 1, :self.size] = sequence 58 | if self.end_marker: 59 | example_output[:, -1, -1] = 1 60 | 61 | return example_input, example_output 62 | 63 | 64 | class RepeatCopyTask(Task): 65 | 66 | def __init__(self, size, max_length, max_repeats=20, min_length=1, \ 67 | min_repeats=1, unary=False, max_iter=None, batch_size=1, end_marker=False): 68 | super(RepeatCopyTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 69 | self.size = size 70 | self.min_length = min_length 71 | self.max_length = max_length 72 | self.min_repeats = min_repeats 73 | self.max_repeats = max_repeats 74 | self.unary = unary 75 | self.end_marker = end_marker 76 | 77 | def sample_params(self, length=None, repeats=None): 78 | if length is None: 79 | length = np.random.randint(self.min_length, self.max_length + 1) 80 | if repeats is None: 81 | repeats = np.random.randint(self.min_repeats, self.max_repeats + 1) 82 | return {'length': length, 'repeats': repeats} 83 | 84 | def sample(self, length, repeats): 85 | sequence = np.random.binomial(1, 0.5, (self.batch_size, length, self.size)) 86 | num_repeats_length = repeats if self.unary else 1 87 | example_input = np.zeros((self.batch_size, (repeats + 1) * length + \ 88 | num_repeats_length + 1 + self.end_marker, self.size + 2), dtype=theano.config.floatX) 89 | example_output = np.zeros((self.batch_size, (repeats + 1) * length + \ 90 | num_repeats_length + 1 + self.end_marker, self.size + 2), dtype=theano.config.floatX) 91 | 92 | example_input[:, :length, :self.size] = sequence 93 | for j in range(repeats): 94 | example_output[:, (j + 1) * length + num_repeats_length + 1:\ 95 | (j + 2) * length + num_repeats_length + 1, :self.size] = sequence 96 | if self.unary: 97 | example_input[:, length:length + repeats, -2] = 1 98 | else: 99 | example_input[:, length, -2] = repeats / float(self.max_repeats) 100 | example_input[:, length + num_repeats_length, -1] = 1 101 | if self.end_marker: 102 | example_output[:, -1, -1] = 1 103 | 104 | return example_input, example_output 105 | 106 | 107 | class AssociativeRecallTask(Task): 108 | 109 | def __init__(self, size, max_item_length, max_num_items, \ 110 | min_item_length=1, min_num_items=2, max_iter=None, batch_size=1): 111 | super(AssociativeRecallTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 112 | self.size = size 113 | self.max_item_length = max_item_length 114 | self.max_num_items = max_num_items 115 | self.min_item_length = min_item_length 116 | self.min_num_items = min_num_items 117 | 118 | def sample_params(self, item_length=None, num_items=None): 119 | if item_length is None: 120 | item_length = np.random.randint(self.min_item_length, \ 121 | self.max_item_length + 1) 122 | if num_items is None: 123 | num_items = np.random.randint(self.min_num_items, \ 124 | self.max_num_items + 1) 125 | return {'item_length': item_length, 'num_items': num_items} 126 | 127 | def sample(self, item_length, num_items): 128 | def item_slice(j): 129 | slice_idx = j * (item_length + 1) + 1 130 | return slice(slice_idx, slice_idx + item_length) 131 | 132 | items = np.random.binomial(1, 0.5, (self.batch_size, item_length, self.size, num_items)) 133 | queries = np.random.randint(num_items - 1, size=self.batch_size) 134 | example_input = np.zeros((self.batch_size, (item_length + 1) * (num_items + 2), \ 135 | self.size + 2), dtype=theano.config.floatX) 136 | example_output = np.zeros((self.batch_size, 
(item_length + 1) * (num_items + 2), \ 137 | self.size + 2), dtype=theano.config.floatX) 138 | 139 | for j in range(num_items): 140 | example_input[:, j * (item_length + 1), -2] = 1 141 | example_input[:, item_slice(j), :self.size] = items[:,:,:,j] 142 | example_input[:, num_items * (item_length + 1), -1] = 1 143 | for batch in range(self.batch_size): 144 | example_input[batch, item_slice(num_items), :self.size] = items[batch,:,:,queries[batch]] 145 | example_output[batch, -item_length:, :self.size] = items[batch,:,:,queries[batch] + 1] 146 | example_input[:, (num_items + 1) * (item_length + 1), -1] = 1 147 | 148 | return example_input, example_output 149 | 150 | 151 | class DynamicNGramsTask(Task): 152 | 153 | def __init__(self, ngrams, max_length, min_length=1, max_iter=None, \ 154 | table=None, batch_size=1): 155 | super(DynamicNGramsTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 156 | self.ngrams = ngrams 157 | if table is None: 158 | table = self.make_table() 159 | self.table = table 160 | self.max_length = max(ngrams, max_length) 161 | self.min_length = min_length 162 | 163 | def make_table(self): 164 | return np.random.beta(0.5, 0.5, 1 << self.ngrams) 165 | 166 | def sample_params(self, length=None): 167 | if length is None: 168 | length = np.random.randint(self.min_length, self.max_length + 1) 169 | return {'length': length} 170 | 171 | def sample(self, length): 172 | sequence = np.zeros((self.batch_size, length + 1, 1), dtype=theano.config.floatX) 173 | head = np.random.binomial(1, 0.5, (self.batch_size, self.ngrams)) 174 | sequence[:, :self.ngrams, 0] = head 175 | index = np.dot(head, 1 << (np.arange(self.ngrams, 0, -1) - 1)) 176 | mask = (1 << (self.ngrams - 1)) - 1 177 | 178 | for j in range(self.ngrams, length + 1): 179 | b = np.random.binomial(1, self.table[index]) 180 | sequence[:, j, 0] = b 181 | index = ((index & mask) << 1) + b 182 | 183 | return sequence[:,:-1], sequence[:,1:] 184 | 185 | 186 | class DyckWordsTask(Task): 187 | 188 | def __init__(self, max_length, min_length=1, max_iter=None, batch_size=1): 189 | super(DyckWordsTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 190 | self.max_length = max_length 191 | self.min_length = min_length 192 | 193 | def sample_params(self, length=None): 194 | if length is None: 195 | length = np.random.randint(self.min_length, self.max_length + 1) 196 | return {'length': length} 197 | 198 | def sample(self, length): 199 | example_input = np.zeros((self.batch_size, 2 * length, 1), \ 200 | dtype=theano.config.floatX) 201 | example_output = np.zeros((self.batch_size, 2 * length, 1), \ 202 | dtype=theano.config.floatX) 203 | is_dyck_word = np.random.binomial(1, 0.5, self.batch_size)\ 204 | .astype(dtype=theano.config.floatX) 205 | 206 | for batch in range(self.batch_size): 207 | if is_dyck_word[batch]: 208 | word = self.get_random_dyck(length) 209 | else: 210 | word = self.get_random_non_dyck(length) 211 | example_input[batch, :, 0] = word 212 | example_output[batch, :, 0] = self.get_dyck_prefix(word) 213 | 214 | return example_input, example_output 215 | 216 | def get_dyck_prefix(self, word): 217 | def dyck_prefixes(prefixes_and_stack, u): 218 | prefixes, is_valid, stack = prefixes_and_stack 219 | if u: stack -= 1 220 | else: stack += 1 221 | if stack < 0: 222 | is_valid = False 223 | prefixes.append(is_valid and (stack == 0)) 224 | return (prefixes, is_valid, stack) 225 | 226 | prefixes, _, _ = reduce(dyck_prefixes, word, ([], True, 0)) 227 | return prefixes 228 | 229 | def get_random_dyck(self, n): 230 
| """ 231 | Return a random Dyck word of a given semilength `n` 232 | 233 | This algorithm is based on a conjugacy property between words in 234 | the language `L = S(u^n d^{n+1})` and *Dyck words* of length 2n, 235 | where `S` is the group of permutations. 236 | This 1-to-(2n+1) correspondance between these words is given by 237 | the cycle lemma: 238 | 239 | **Cycle Lemma**: Let `A = {u, d}` be a binary alphabet and `delta` 240 | a "height" function such that `delta(u) = +1` and `delta(d) = -1`. 241 | For any word `w` in `A^*` such that `delta(w) = -1`, there exists 242 | a unique factorization `w = w_1 w_2` satisfying 243 | - `w_1` is not empty; 244 | - `w_2 w_1` has the Lukasiewicz property, i.e. any strict left 245 | factor of `w_2 w_1` satisfies `delta(v) >= 0`. 246 | where we extend the definition of `delta` to words by summing the 247 | heights of every individual letter. 248 | 249 | To summarize, here is the pseudo-code for this algorithm: 250 | - Pick a random word `w` in the language `L = S(u^n d^{n+1})` 251 | - Apply the cycle lemma to find the unique conjugate of 252 | `w` having the Lukasiewicz property 253 | - Return its prefix of length 2n, which is a Dyck word 254 | 255 | See: [Fla09], Notes I.47 and I.49 (pp.75-77) 256 | 257 | [Fla09] Analytic Combinatorics, *Philippe Flajolet, Robert Sedgewick* 258 | 259 | """ 260 | # Get a random element in L = u^n d^{n+1} 261 | w = [0] * n + [1] * (n + 1) 262 | np.random.shuffle(w) 263 | 264 | # Get the unique conjugate of w having the Lukasiewicz property 265 | # (Cycle Lemma) 266 | min_height = (0, 0) 267 | stack = 0 268 | for i in range(2 * n): 269 | if w[i]: stack -= 1 270 | else: stack += 1 271 | if stack < min_height[1]: 272 | min_height = (i + 1, stack) 273 | min_idx = min_height[0] 274 | luka = w[min_idx:] + w[:min_idx] 275 | 276 | return luka[:-1] 277 | 278 | def get_random_non_dyck(self, n): 279 | """ 280 | Return a random balanced non-Dyck word of semilength `n` 281 | 282 | The algorithm is based on the bijection between words in the 283 | language `L = S(u^{n-1} d^{n+1})` and the balanced words of length 284 | 2n that are not Dyck words. This transformation is given by the 285 | reflection of the letters after the first letter that violates 286 | the Dyck property (i.e. the first right parenthesis that does 287 | not have a matching left counterpart). The reflexion transformation 288 | is defined by transforming any left parenthesis in a right one 289 | and vice-versa. 
290 | 291 | To summarize, here is the pseudo-code for this algorithm: 292 | - Pick a random word `w` in the language `L = S(u^{n-1} d^{n+1})` 293 | - Find the first letter violating the Dyck property 294 | - Apply the reflection transformation to the following letters 295 | """ 296 | w = [0] * (n - 1) + [1] * (n + 1) 297 | np.random.shuffle(w) 298 | 299 | stack, reflection = (0, False) 300 | for i in range(2 * n): 301 | if reflection: 302 | w[i] = 1 * (not w[i]) 303 | else: 304 | if w[i]: stack -= 1 305 | else: stack += 1 306 | reflection = (stack < 0) 307 | return w 308 | 309 | 310 | class UpsideDownCopyTask(Task): 311 | 312 | def __init__(self, size, max_length, min_length=1, max_iter=None, \ 313 | batch_size=1, end_marker=False): 314 | super(UpsideDownCopyTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 315 | self.size = size 316 | self.min_length = min_length 317 | self.max_length = max_length 318 | self.end_marker = end_marker 319 | 320 | def sample_params(self, length=None): 321 | if length is None: 322 | length = np.random.randint(self.min_length, self.max_length + 1) 323 | return {'length': length} 324 | 325 | def sample(self, length): 326 | sequence = np.random.binomial(1, 0.5, (self.batch_size, length, self.size)) 327 | example_input = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 328 | self.size + 1), dtype=theano.config.floatX) 329 | example_output = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 330 | self.size + 1), dtype=theano.config.floatX) 331 | index = 0 332 | reversed_sequence = np.empty(shape = sequence.shape) 333 | for inner in sequence: 334 | reversed_sequence[index] = np.fliplr(inner) 335 | index += 1 336 | example_input[:, :length, :self.size] = sequence 337 | example_input[:, length, -1] = 1 338 | example_output[:, length + 1:2 * length + 1, :self.size] = reversed_sequence 339 | if self.end_marker: 340 | example_output[:, -1, -1] = 1 341 | 342 | return example_input, example_output 343 | 344 | 345 | class ReversedCopyTask(Task): 346 | 347 | def __init__(self, size, max_length, min_length=1, max_iter=None, \ 348 | batch_size=1, end_marker=False): 349 | super(ReversedCopyTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 350 | self.size = size 351 | self.min_length = min_length 352 | self.max_length = max_length 353 | self.end_marker = end_marker 354 | 355 | def sample_params(self, length=None): 356 | if length is None: 357 | length = np.random.randint(self.min_length, self.max_length + 1) 358 | return {'length': length} 359 | 360 | def sample(self, length): 361 | sequence = np.random.binomial(1, 0.5, (self.batch_size, length, self.size)) 362 | example_input = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 363 | self.size + 1), dtype=theano.config.floatX) 364 | example_output = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 365 | self.size + 1), dtype=theano.config.floatX) 366 | index = 0 367 | reversed_sequence = np.empty(shape = sequence.shape) 368 | for inner in sequence: 369 | reversed_sequence[index] = np.flipud(inner) 370 | index += 1 371 | example_input[:, :length, :self.size] = sequence 372 | example_input[:, length, -1] = 1 373 | example_output[:, length + 1:2 * length + 1, :self.size] = reversed_sequence 374 | if self.end_marker: 375 | example_output[:, -1, -1] = 1 376 | 377 | return example_input, example_output 378 | 379 | class SortTask(Task): 380 | 381 | def __init__(self, size, max_length, min_length=1, max_iter=None, \ 382 | batch_size=1, end_marker=False): 383 
| super(SortTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 384 | self.size = size 385 | self.min_length = min_length 386 | self.max_length = max_length 387 | self.end_marker = end_marker 388 | 389 | def sample_params(self, length=None): 390 | if length is None: 391 | length = np.random.randint(self.min_length, self.max_length + 1) 392 | return {'length': length} 393 | 394 | def sample(self, length): 395 | sequence = np.random.binomial(1, 0.5, (self.batch_size, length, self.size)) 396 | example_input = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 397 | self.size + 1), dtype=theano.config.floatX) 398 | example_output = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 399 | self.size + 1), dtype=theano.config.floatX) 400 | index = 0 401 | sorted_sequence = np.empty(shape = sequence.shape) 402 | for inner in sequence: 403 | sorted_sequence[index] = inner[np.lexsort(inner.T[::-1])] 404 | index += 1 405 | example_input[:, :length, :self.size] = sequence 406 | example_input[:, length, -1] = 1 407 | example_output[:, length + 1:2 * length + 1, :self.size] = sorted_sequence 408 | if self.end_marker: 409 | example_output[:, -1, -1] = 1 410 | 411 | return example_input, example_output 412 | -------------------------------------------------------------------------------- /utils/visualization.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import pandas as pd 5 | 6 | import matplotlib 7 | import matplotlib.pyplot as plt 8 | 9 | class Dashboard(object): 10 | 11 | def __init__(self, ntm_fn, generator, memory_shape, ntm_layer_fn=None, \ 12 | cmap='bone', markers=[]): 13 | super(Dashboard, self).__init__() 14 | self.ntm_fn = ntm_fn 15 | self.ntm_layer_fn = ntm_layer_fn 16 | self.memory_shape = memory_shape 17 | self.generator = generator 18 | self.markers = markers 19 | self.cmap = cmap 20 | 21 | def sample(self, **params): 22 | params = self.generator.sample_params(**params) 23 | example_input, example_output = self.generator.sample(**params) 24 | self.show(example_input, example_output, params) 25 | 26 | def show(self, example_input, example_output, params): 27 | example_prediction = self.ntm_fn(example_input) 28 | num_columns = 1 29 | if self.ntm_layer_fn is not None: 30 | num_columns = 3 31 | example_ntm = self.ntm_layer_fn(example_input) 32 | subplot_shape = (3, num_columns) 33 | title_props = matplotlib.font_manager.FontProperties(weight='bold', \ 34 | size=9) 35 | 36 | ax1 = plt.subplot2grid(subplot_shape, (0, num_columns - 1)) 37 | ax1.imshow(example_input[0].T, interpolation='nearest', cmap=self.cmap, 38 | vmin=0.0, vmax=1.0) 39 | ax1.set_title('Input') 40 | ax1.title.set_font_properties(title_props) 41 | ax1.get_xaxis().set_visible(False) 42 | ax1.get_yaxis().set_visible(False) 43 | 44 | ax2 = plt.subplot2grid(subplot_shape, (1, num_columns - 1)) 45 | ax2.imshow(example_output[0].T, interpolation='nearest', cmap=self.cmap, 46 | vmin=0.0, vmax=1.0) 47 | ax2.set_title('Output') 48 | ax2.title.set_font_properties(title_props) 49 | ax2.get_xaxis().set_visible(False) 50 | ax2.get_yaxis().set_visible(False) 51 | 52 | ax3 = plt.subplot2grid(subplot_shape, (2, num_columns - 1)) 53 | ax3.imshow(example_prediction[0].T, interpolation='nearest', \ 54 | cmap=self.cmap, vmin=0.0, vmax=1.0) 55 | ax3.set_title('Prediction') 56 | ax3.title.set_font_properties(title_props) 57 | ax3.get_xaxis().set_visible(False) 58 | ax3.get_yaxis().set_visible(False) 59 | 60 | if self.ntm_layer_fn is not None: 61
| ax4 = plt.subplot2grid(subplot_shape, (0, 0), rowspan=3) 62 | ax4.imshow(example_ntm[3][0,:,0].T, interpolation='nearest', \ 63 | cmap=self.cmap, vmin=0.0, vmax=1.0) 64 | ax4.set_title('Write Weights') 65 | ax4.title.set_font_properties(title_props) 66 | ax4.get_xaxis().set_visible(False) 67 | for marker in self.markers: 68 | marker_style = marker.get('style', {}) 69 | ax4.plot([marker['location'](params), \ 70 | marker['location'](params)], [0, \ 71 | self.memory_shape[0] - 1], **marker_style) 72 | ax4.set_xlim([-0.5, example_input.shape[1] - 0.5]) 73 | ax4.set_ylim([-0.5, self.memory_shape[0] - 0.5]) 74 | ax4.tick_params(axis='y', labelsize=9) 75 | 76 | ax5 = plt.subplot2grid(subplot_shape, (0, 1), rowspan=3) 77 | ax5.imshow(example_ntm[4][0,:,0].T, interpolation='nearest', \ 78 | cmap=self.cmap, vmin=0.0, vmax=1.0) 79 | ax5.set_title('Read Weights') 80 | ax5.title.set_font_properties(title_props) 81 | ax5.get_xaxis().set_visible(False) 82 | for marker in self.markers: 83 | marker_style = marker.get('style', {}) 84 | ax5.plot([marker['location'](params), \ 85 | marker['location'](params)], [0, \ 86 | self.memory_shape[0] - 1], **marker_style) 87 | ax5.set_xlim([-0.5, example_input.shape[1] - 0.5]) 88 | ax5.set_ylim([-0.5, self.memory_shape[0] - 0.5]) 89 | ax5.tick_params(axis='y', labelsize=9) 90 | 91 | plt.show() 92 | 93 | 94 | def learning_curve(scores): 95 | sc = pd.Series(scores) 96 | ma = pd.rolling_mean(sc, window=500) 97 | 98 | ax = plt.subplot(1, 1, 1) 99 | ax.plot(sc.index, sc, color='lightgray') 100 | ax.plot(ma.index, ma, color='red') 101 | ax.set_yscale('log') 102 | ax.set_xlim(sc.index.min(), sc.index.max()) 103 | plt.show() 104 | --------------------------------------------------------------------------------
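Editor's usage sketch for the task generators above (illustration only; a real
training script would feed these batches to the NTM and an update rule such as
``graves_rmsprop``):

    from utils.generators import CopyTask

    generator = CopyTask(size=8, max_length=5, max_iter=3, batch_size=2)
    for i, (example_input, example_output) in generator:
        # input : (batch, 2 * length + 1, size + 1) -- the random sequence, a
        #         delimiter channel, then zeros while the copy is produced
        # output: same shape, all zeros until the delimiter, then the copy
        print(i, example_input.shape, example_output.shape)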