├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── examples
│   ├── associative-recall-task.py
│   ├── copy-task.py
│   ├── dyck-words-task.py
│   ├── repeat-copy-task.py
│   ├── reversed-copy-task.py
│   ├── sort-task.py
│   └── upsidedown-copy-task.py
├── models
│   ├── README.rst
│   ├── associative-recall.npy
│   ├── copy.npy
│   ├── dyck-words.npy
│   ├── repeat-copy.npy
│   ├── reversed-copy.npy
│   ├── sort.npy
│   └── upsidedown-copy.npy
├── ntm
│   ├── __init__.py
│   ├── controllers.py
│   ├── heads.py
│   ├── init.py
│   ├── layers.py
│   ├── memory.py
│   ├── nonlinearities.py
│   ├── similarities.py
│   ├── test
│   │   ├── test_heads.py
│   │   ├── test_layers.py
│   │   └── test_similarities.py
│   └── updates.py
├── requirements.txt
├── setup.py
└── utils
    ├── __init__.py
    ├── generators.py
    └── visualization.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 | 
6 | # C extensions
7 | *.so
8 | 
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | wheels/
24 | *.egg-info/
25 | .installed.cfg
26 | *.egg
27 | 
28 | # PyInstaller
29 | # Usually these files are written by a python script from a template
30 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
31 | *.manifest
32 | *.spec
33 | 
34 | # Installer logs
35 | pip-log.txt
36 | pip-delete-this-directory.txt
37 | 
38 | # Unit test / coverage reports
39 | htmlcov/
40 | .tox/
41 | .coverage
42 | .coverage.*
43 | .cache
44 | nosetests.xml
45 | coverage.xml
46 | *,cover
47 | .hypothesis/
48 | 
49 | # Translations
50 | *.mo
51 | *.pot
52 | 
53 | # Django stuff:
54 | *.log
55 | local_settings.py
56 | 
57 | # Flask stuff:
58 | instance/
59 | .webassets-cache
60 | 
61 | # Scrapy stuff:
62 | .scrapy
63 | 
64 | # Sphinx documentation
65 | docs/_build/
66 | 
67 | # PyBuilder
68 | target/
69 | 
70 | # Jupyter Notebook
71 | .ipynb_checkpoints
72 | 
73 | # pyenv
74 | .python-version
75 | 
76 | # celery beat schedule file
77 | celerybeat-schedule
78 | 
79 | # SageMath parsed files
80 | *.sage.py
81 | 
82 | # dotenv
83 | .env
84 | 
85 | # virtualenv
86 | .venv
87 | venv/
88 | ENV/
89 | 
90 | # Spyder project settings
91 | .spyderproject
92 | 
93 | # Rope project settings
94 | .ropeproject
95 | 
96 | ## Temporary
97 | tmp/
98 | data/
99 | img/
100 | notebooks/
101 | animation/
102 | models/learning_curves/
103 | snapshots/
104 | 
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 | 
3 | ## Our Pledge
4 | 
5 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation.
6 | 7 | ## Our Standards 8 | 9 | Examples of behavior that contributes to creating a positive environment include: 10 | 11 | * Using welcoming and inclusive language 12 | * Being respectful of differing viewpoints and experiences 13 | * Gracefully accepting constructive criticism 14 | * Focusing on what is best for the community 15 | * Showing empathy towards other community members 16 | 17 | Examples of unacceptable behavior by participants include: 18 | 19 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 20 | * Trolling, insulting/derogatory comments, and personal or political attacks 21 | * Public or private harassment 22 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 23 | * Other conduct which could reasonably be considered inappropriate in a professional setting 24 | 25 | ## Our Responsibilities 26 | 27 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 28 | 29 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 30 | 31 | ## Scope 32 | 33 | This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. 34 | 35 | ## Enforcement 36 | 37 | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at support@snips.ai. The project team will review and investigate all complaints, and will respond in a way that it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. 38 | 39 | Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. 40 | 41 | ## Attribution 42 | 43 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at [http://contributor-covenant.org/version/1/4][version] 44 | 45 | [homepage]: http://contributor-covenant.org 46 | [version]: http://contributor-covenant.org/version/1/4/ 47 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | How to Contribute 2 | ================= 3 | 4 | Contributions are welcome! Not familiar with the codebase yet? No problem! 5 | There are many ways to contribute to open source projects: reporting bugs, 6 | helping with the documentation, spreading the word and of course, adding 7 | new features and patches. 8 | 9 | Getting Started 10 | --------------- 11 | * Make sure you have a GitHub account. 
12 | * Open a [new issue](https://github.com/snipsco/ntm-lasagne/issues), assuming one does not already exist.
13 | * Clearly describe the issue, including steps to reproduce when it is a bug.
14 | 
15 | Making Changes
16 | --------------
17 | * Fork this repository.
18 | * Create a feature branch from where you want to base your work.
19 | * Make commits of logical units (if needed, rebase your feature branch before
20 |   submitting it).
21 | * Check for unnecessary whitespace with ``git diff --check`` before committing.
22 | * Make sure your commit messages are well formatted.
23 | * If your commit fixes an open issue, reference it in the commit message (e.g. `#15`).
24 | * Run all the tests (if any) to ensure nothing else was accidentally broken.
25 | 
26 | These guidelines also apply when helping with documentation.
27 | 
28 | Submitting Changes
29 | ------------------
30 | * Push your changes to a feature branch in your fork of the repository.
31 | * Submit a `Pull Request`.
32 | * Wait for maintainer feedback.
33 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 | 
3 | Copyright (c) 2015-2016 Tristan Deleu
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # NTM-Lasagne
2 | 
3 | [![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/snipsco/ntm-lasagne/master/LICENSE)
4 | 
5 | NTM-Lasagne is a library to create Neural Turing Machines (NTMs) in [Theano](http://deeplearning.net/software/theano/) using the [Lasagne](http://lasagne.readthedocs.org/) library. If you want to learn more about NTMs, check out our [blog post](https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315#.63t84s5r5).
6 | 
7 | This library features:
8 | - A Neural Turing Machine layer `NTMLayer`, whose components (controller, heads, memory) are fully customizable.
9 | - Two types of controllers: a feed-forward `DenseController` and a "vanilla" recurrent `RecurrentController`.
10 | - A dashboard to visualize the inner workings of the NTM.
11 | - Generators to sample examples from algorithmic tasks (see the sketch below).
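For instance, the generators in `utils.generators` are iterables that yield numbered `(input, target)` batches; a minimal sketch, mirroring `examples/copy-task.py`:

```python
from utils.generators import CopyTask

# Binary sequences of width 8 and length up to 5, one example per batch
generator = CopyTask(batch_size=1, max_iter=1000000, size=8,
                     max_length=5, end_marker=True)
for i, (example_input, example_output) in generator:
    pass  # feed the batch to a compiled training function here
```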
12 | 
13 | ## Getting started
14 | To avoid any conflict with your existing Python setup, and to keep this project self-contained, it is suggested to work in a virtual environment with [`virtualenv`](http://docs.python-guide.org/en/latest/dev/virtualenvs/). To install `virtualenv`:
15 | ```bash
16 | sudo pip install --upgrade virtualenv
17 | ```
18 | 
19 | Create a virtual environment called `venv`, activate it, and install the requirements given by `requirements.txt`. NTM-Lasagne requires the bleeding-edge version of Lasagne; check the [Lasagne installation instructions](http://lasagne.readthedocs.org/en/latest/user/installation.html#bleeding-edge-version) for details. The latest version of [Lasagne](https://github.com/Lasagne/Lasagne/) is included in `requirements.txt`.
20 | ```bash
21 | virtualenv venv
22 | source venv/bin/activate
23 | pip install -r requirements.txt
24 | pip install .
25 | ```
26 | 
27 | ## Example
28 | Here is a minimal example defining an `NTMLayer`:
29 | 
30 | ```python
31 | # Neural Turing Machine Layer
32 | memory = Memory((128, 20), memory_init=lasagne.init.Constant(1e-6),
33 |                 learn_init=False, name='memory')
34 | controller = DenseController(l_input, memory_shape=(128, 20),
35 |                              num_units=100, num_reads=1,
36 |                              nonlinearity=lasagne.nonlinearities.rectify,
37 |                              name='controller')
38 | heads = [
39 |     WriteHead(controller, num_shifts=3, memory_shape=(128, 20),
40 |               nonlinearity_key=lasagne.nonlinearities.rectify,
41 |               nonlinearity_add=lasagne.nonlinearities.rectify,
42 |               learn_init=False, name='write'),
43 |     ReadHead(controller, num_shifts=3, memory_shape=(128, 20),
44 |              nonlinearity_key=lasagne.nonlinearities.rectify,
45 |              learn_init=False, name='read')
46 | ]
47 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads)
48 | ```
49 | 
50 | For more detailed examples, check the [`examples` folder](examples/). If you would like to train a Neural Turing Machine on one of these examples, simply run the corresponding script, e.g.
51 | 
52 | ```
53 | PYTHONPATH=. python examples/copy-task.py
54 | ```
55 | 
56 | ## Tests
57 | This project has a few basic tests. To run them, run `py.test` from the project folder:
58 | ```bash
59 | venv/bin/py.test ntm -vv
60 | ```
61 | 
62 | ## Known issues
63 | Graph optimization is computationally intensive. If you encounter suspiciously long compilation times (more than a few minutes), you may need to increase the amount of memory allocated (if you are running in a virtual machine). Alternatively, turning off the swap may help for debugging (with `swapoff`/`swapon`).
64 | 
65 | Note: an unlucky initialisation of the parameters might lead to a diverging solution (`NaN` scores).
66 | 
67 | ## Paper
68 | Alex Graves, Greg Wayne, Ivo Danihelka, *Neural Turing Machines*, [[arXiv](https://arxiv.org/abs/1410.5401)]
69 | 
70 | ## Contributing
71 | 
72 | Please see the [Contribution Guidelines](https://github.com/snipsco/ntm-lasagne/blob/master/CONTRIBUTING.md).
73 | 
74 | ## Copyright
75 | 
76 | This library is provided by [Snips](https://www.snips.ai) as Open Source software. See [LICENSE](https://github.com/snipsco/ntm-lasagne/blob/master/LICENSE) for more information.
77 | -------------------------------------------------------------------------------- /examples/associative-recall-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import DenseController 16 | from ntm.heads import WriteHead, ReadHead 17 | from ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import AssociativeRecallTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | 24 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)): 25 | 26 | # Input Layer 27 | l_input = InputLayer((batch_size, None, size + 2), input_var=input_var) 28 | _, seqlen, _ = l_input.input_var.shape 29 | 30 | # Neural Turing Machine Layer 31 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 32 | controller = DenseController(l_input, memory_shape=memory_shape, 33 | num_units=num_units, num_reads=1, 34 | nonlinearity=lasagne.nonlinearities.rectify, 35 | name='controller') 36 | heads = [ 37 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 38 | nonlinearity_key=lasagne.nonlinearities.rectify, 39 | nonlinearity_add=lasagne.nonlinearities.rectify), 40 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 41 | nonlinearity_key=lasagne.nonlinearities.rectify) 42 | ] 43 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 44 | 45 | # Output Layer 46 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 47 | l_output_dense = DenseLayer(l_output_reshape, num_units=size + 2, nonlinearity=lasagne.nonlinearities.sigmoid, \ 48 | name='dense') 49 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 2)) 50 | 51 | return l_output, l_ntm 52 | 53 | 54 | if __name__ == '__main__': 55 | # Define the input and expected output variable 56 | input_var, target_var = T.tensor3s('input', 'target') 57 | # The generator to sample examples from 58 | generator = AssociativeRecallTask(batch_size=1, max_iter=1000000, size=8, max_num_items=6, \ 59 | min_item_length=1, max_item_length=3) 60 | # The model (1-layer Neural Turing Machine) 61 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, 62 | size=generator.size, num_units=100, memory_shape=(128, 20)) 63 | # The generated output variable and the loss function 64 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. 
- 1e-6) 65 | loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var)) 66 | # Create the update expressions 67 | params = lasagne.layers.get_all_params(l_output, trainable=True) 68 | learning_rate = theano.shared(1e-4) 69 | updates = lasagne.updates.adam(loss, params, learning_rate=learning_rate) 70 | # Compile the function for a training step, as well as the prediction function and 71 | # a utility function to get the inner details of the NTM 72 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 73 | ntm_fn = theano.function([input_var], pred_var) 74 | ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True)) 75 | 76 | # Training 77 | try: 78 | scores, all_scores = [], [] 79 | for i, (example_input, example_output) in generator: 80 | score = train_fn(example_input, example_output) 81 | scores.append(score) 82 | all_scores.append(score) 83 | if i % 500 == 0: 84 | mean_scores = np.mean(scores) 85 | if mean_scores < 0.01: 86 | learning_rate.set_value(1e-5) 87 | print 'Batch #%d: %.6f' % (i, mean_scores) 88 | scores = [] 89 | except KeyboardInterrupt: 90 | pass 91 | 92 | # Visualization 93 | def marker1(params): 94 | return params['num_items'] * (params['item_length'] + 1) 95 | def marker2(params): 96 | return (params['num_items'] + 1) * (params['item_length'] + 1) 97 | markers = [ 98 | { 99 | 'location': marker1, 100 | 'style': {'color': 'red', 'ls': '-'} 101 | }, 102 | { 103 | 'location': marker2, 104 | 'style': {'color': 'green', 'ls': '-'} 105 | } 106 | ] 107 | 108 | dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \ 109 | memory_shape=(128, 20), markers=markers, cmap='bone') 110 | 111 | # Example 112 | params = generator.sample_params() 113 | dashboard.sample(**params) 114 | -------------------------------------------------------------------------------- /examples/copy-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import DenseController 16 | from ntm.heads import WriteHead, ReadHead 17 | from ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import CopyTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)): 24 | 25 | # Input Layer 26 | l_input = InputLayer((batch_size, None, size + 1), input_var=input_var) 27 | _, seqlen, _ = l_input.input_var.shape 28 | 29 | # Neural Turing Machine Layer 30 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = DenseController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | 
nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 46 | l_output_dense = DenseLayer(l_output_reshape, num_units=size + 1, nonlinearity=lasagne.nonlinearities.sigmoid, \ 47 | name='dense') 48 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 1)) 49 | 50 | return l_output, l_ntm 51 | 52 | 53 | if __name__ == '__main__': 54 | # Define the input and expected output variable 55 | input_var, target_var = T.tensor3s('input', 'target') 56 | # The generator to sample examples from 57 | generator = CopyTask(batch_size=1, max_iter=1000000, size=8, max_length=5, end_marker=True) 58 | # The model (1-layer Neural Turing Machine) 59 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, \ 60 | size=generator.size, num_units=100, memory_shape=(128, 20)) 61 | # The generated output variable and the loss function 62 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. - 1e-6) 63 | loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var)) 64 | # Create the update expressions 65 | params = lasagne.layers.get_all_params(l_output, trainable=True) 66 | updates = graves_rmsprop(loss, params, learning_rate=1e-3) 67 | # Compile the function for a training step, as well as the prediction function and 68 | # a utility function to get the inner details of the NTM 69 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 70 | ntm_fn = theano.function([input_var], pred_var) 71 | ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True)) 72 | 73 | # Training 74 | try: 75 | scores, all_scores = [], [] 76 | for i, (example_input, example_output) in generator: 77 | score = train_fn(example_input, example_output) 78 | scores.append(score) 79 | all_scores.append(score) 80 | if i % 500 == 0: 81 | mean_scores = np.mean(scores) 82 | print 'Batch #%d: %.6f' % (i, mean_scores) 83 | scores = [] 84 | except KeyboardInterrupt: 85 | pass 86 | 87 | # Visualization 88 | markers = [ 89 | { 90 | 'location': (lambda params: params['length']), 91 | 'style': {'color': 'red'} 92 | } 93 | ] 94 | 95 | dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \ 96 | memory_shape=(128, 20), markers=markers, cmap='bone') 97 | 98 | # Example 99 | params = generator.sample_params() 100 | dashboard.sample(**params) 101 | -------------------------------------------------------------------------------- /examples/dyck-words-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import DenseController 16 | from ntm.heads import WriteHead, ReadHead 17 | from ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import DyckWordsTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | def model(input_var, batch_size=1, num_units=100, memory_shape=(128, 20)): 24 | 25 | # Input Layer 26 | l_input = InputLayer((batch_size, None, 1), input_var=input_var) 27 | _, seqlen, 
_ = l_input.input_var.shape 28 | 29 | # Neural Turing Machine Layer 30 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = DenseController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 46 | l_output_dense = DenseLayer(l_output_reshape, num_units=1, nonlinearity=lasagne.nonlinearities.sigmoid, \ 47 | name='dense') 48 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, 1)) 49 | 50 | return l_output, l_ntm 51 | 52 | 53 | if __name__ == '__main__': 54 | # Define the input and expected output variable 55 | input_var, target_var = T.tensor3s('input', 'target') 56 | # The generator to sample examples from 57 | generator = DyckWordsTask(batch_size=1, max_iter=1000000, max_length=5) 58 | # The model (1-layer Neural Turing Machine) 59 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, 60 | num_units=100, memory_shape=(128, 20)) 61 | # The generated output variable and the loss function 62 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. - 1e-6) 63 | loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var)) 64 | # Create the update expressions 65 | params = lasagne.layers.get_all_params(l_output, trainable=True) 66 | updates = lasagne.updates.adam(loss, params, learning_rate=5e-4) 67 | # Compile the function for a training step, as well as the prediction function and 68 | # a utility function to get the inner details of the NTM 69 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 70 | ntm_fn = theano.function([input_var], pred_var) 71 | ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True)) 72 | 73 | # Training 74 | try: 75 | scores, all_scores = [], [] 76 | for i, (example_input, example_output) in generator: 77 | score = train_fn(example_input, example_output) 78 | scores.append(score) 79 | all_scores.append(score) 80 | if i % 500 == 0: 81 | mean_scores = np.mean(scores) 82 | if mean_scores < 1e-4 and generator.max_length < 20: 83 | generator.max_length *= 2 84 | print 'Batch #%d: %.6f' % (i, mean_scores) 85 | scores = [] 86 | except KeyboardInterrupt: 87 | pass 88 | -------------------------------------------------------------------------------- /examples/repeat-copy-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import DenseController 16 | from ntm.heads import WriteHead, ReadHead 17 | from 
ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import RepeatCopyTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)): 24 | 25 | # Input Layer 26 | l_input = InputLayer((batch_size, None, size + 2), input_var=input_var) 27 | _, seqlen, _ = l_input.input_var.shape 28 | 29 | # Neural Turing Machine Layer 30 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = DenseController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 46 | l_output_dense = DenseLayer(l_output_reshape, num_units=size + 2, nonlinearity=lasagne.nonlinearities.sigmoid, \ 47 | name='dense') 48 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 2)) 49 | 50 | return l_output, l_ntm 51 | 52 | 53 | if __name__ == '__main__': 54 | # Define the input and expected output variable 55 | input_var, target_var = T.tensor3s('input', 'target') 56 | # The generator to sample examples from 57 | generator = RepeatCopyTask(batch_size=1, max_iter=1000000, size=8, min_length=3, \ 58 | max_length=5, max_repeats=5, unary=True, end_marker=True) 59 | # The model (1-layer Neural Turing Machine) 60 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, 61 | size=generator.size, num_units=100, memory_shape=(128, 20)) 62 | # The generated output variable and the loss function 63 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. 
- 1e-6) 64 | loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var)) 65 | # Create the update expressions 66 | params = lasagne.layers.get_all_params(l_output, trainable=True) 67 | updates = lasagne.updates.adam(loss, params, learning_rate=5e-4) 68 | # Compile the function for a training step, as well as the prediction function and 69 | # a utility function to get the inner details of the NTM 70 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 71 | ntm_fn = theano.function([input_var], pred_var) 72 | ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True)) 73 | 74 | # Training 75 | try: 76 | scores, all_scores = [], [] 77 | for i, (example_input, example_output) in generator: 78 | score = train_fn(example_input, example_output) 79 | scores.append(score) 80 | all_scores.append(score) 81 | if i % 500 == 0: 82 | mean_scores = np.mean(scores) 83 | print 'Batch #%d: %.6f' % (i, mean_scores) 84 | scores = [] 85 | except KeyboardInterrupt: 86 | pass 87 | 88 | # Visualization 89 | def marker(generator): 90 | def marker_(params): 91 | num_repeats_length = params['repeats'] if generator.unary else 1 92 | return params['length'] + num_repeats_length 93 | return marker_ 94 | markers = [ 95 | { 96 | 'location': marker(generator), 97 | 'style': {'color': 'red', 'ls': '-'} 98 | } 99 | ] 100 | 101 | dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \ 102 | memory_shape=(128, 20), markers=markers, cmap='bone') 103 | 104 | # Example 105 | params = generator.sample_params() 106 | dashboard.sample(**params) 107 | -------------------------------------------------------------------------------- /examples/reversed-copy-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import RecurrentController 16 | from ntm.heads import WriteHead, ReadHead 17 | from ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import ReversedCopyTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)): 24 | 25 | # Input Layer 26 | l_input = InputLayer((batch_size, None, size + 1), input_var=input_var) 27 | _, seqlen, _ = l_input.input_var.shape 28 | 29 | # Neural Turing Machine Layer 30 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = RecurrentController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 
|     l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units))
46 |     l_output_dense = DenseLayer(l_output_reshape, num_units=size + 1, nonlinearity=lasagne.nonlinearities.sigmoid, \
47 |                                 name='dense')
48 |     l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 1))
49 | 
50 |     return l_output, l_ntm
51 | 
52 | 
53 | if __name__ == '__main__':
54 |     # Define the input and expected output variable
55 |     input_var, target_var = T.tensor3s('input', 'target')
56 |     # The generator to sample examples from
57 |     generator = ReversedCopyTask(batch_size=1, max_iter=1000000, size=8, max_length=5, end_marker=True)
58 |     # The model (1-layer Neural Turing Machine)
59 |     l_output, l_ntm = model(input_var, batch_size=generator.batch_size, \
60 |         size=generator.size, num_units=100, memory_shape=(128, 20))
61 |     # The generated output variable and the loss function
62 |     pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. - 1e-6)
63 |     loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var))
64 |     # Create the update expressions
65 |     params = lasagne.layers.get_all_params(l_output, trainable=True)
66 |     updates = graves_rmsprop(loss, params, learning_rate=1e-3)
67 |     # Compile the function for a training step, as well as the prediction function and
68 |     # a utility function to get the inner details of the NTM
69 |     train_fn = theano.function([input_var, target_var], loss, updates=updates)
70 |     ntm_fn = theano.function([input_var], pred_var)
71 |     ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True))
72 | 
73 |     # Training
74 |     try:
75 |         scores, all_scores = [], []
76 |         for i, (example_input, example_output) in generator:
77 |             score = train_fn(example_input, example_output)
78 |             scores.append(score)
79 |             all_scores.append(score)
80 |             if i % 500 == 0:
81 |                 mean_scores = np.mean(scores)
82 |                 print 'Batch #%d: %.6f' % (i, mean_scores)
83 |                 scores = []
84 |     except KeyboardInterrupt:
85 |         pass
86 | 
87 |     # Visualization
88 |     markers = [
89 |         {
90 |             'location': (lambda params: params['length']),
91 |             'style': {'color': 'red'}
92 |         }
93 |     ]
94 | 
95 |     dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \
96 |                           memory_shape=(128, 20), markers=markers, cmap='bone')
97 | 
98 |     # Example
99 |     params = generator.sample_params()
100 |     dashboard.sample(**params)
101 | 
--------------------------------------------------------------------------------
/examples/sort-task.py:
--------------------------------------------------------------------------------
1 | import theano
2 | import theano.tensor as T
3 | import numpy as np
4 | import matplotlib.pyplot as plt
5 | 
6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer
7 | import lasagne.layers
8 | import lasagne.nonlinearities
9 | import lasagne.updates
10 | import lasagne.objectives
11 | import lasagne.init
12 | 
13 | from ntm.layers import NTMLayer
14 | from ntm.memory import Memory
15 | from ntm.controllers import GRUController
16 | from ntm.heads import WriteHead, ReadHead
17 | from ntm.updates import graves_rmsprop
18 | 
19 | from utils.generators import SortTask
20 | from utils.visualization import Dashboard
21 | 
22 | 
23 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)):
24 | 
25 |     # Input Layer
26 |     l_input = InputLayer((batch_size, None, size + 1), input_var=input_var)
27 |     _, seqlen, _ = l_input.input_var.shape
28 | 
29 |     # Neural Turing Machine Layer
30 |     memory = Memory(memory_shape, name='memory',
memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = GRUController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 46 | l_output_dense = DenseLayer(l_output_reshape, num_units=size + 1, nonlinearity=lasagne.nonlinearities.sigmoid, \ 47 | name='dense') 48 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 1)) 49 | 50 | return l_output, l_ntm 51 | 52 | 53 | if __name__ == '__main__': 54 | # Define the input and expected output variable 55 | input_var, target_var = T.tensor3s('input', 'target') 56 | # The generator to sample examples from 57 | generator = SortTask(batch_size=1, max_iter=1000000, size=8, max_length=5, end_marker=True) 58 | # The model (1-layer Neural Turing Machine) 59 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, \ 60 | size=generator.size, num_units=100, memory_shape=(128, 20)) 61 | # The generated output variable and the loss function 62 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. - 1e-6) 63 | loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var)) 64 | # Create the update expressions 65 | params = lasagne.layers.get_all_params(l_output, trainable=True) 66 | updates = graves_rmsprop(loss, params, learning_rate=1e-3) 67 | # Compile the function for a training step, as well as the prediction function and 68 | # a utility function to get the inner details of the NTM 69 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 70 | ntm_fn = theano.function([input_var], pred_var) 71 | ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True)) 72 | 73 | # Training 74 | try: 75 | scores, all_scores = [], [] 76 | for i, (example_input, example_output) in generator: 77 | score = train_fn(example_input, example_output) 78 | scores.append(score) 79 | all_scores.append(score) 80 | if i % 500 == 0: 81 | mean_scores = np.mean(scores) 82 | print 'Batch #%d: %.6f' % (i, mean_scores) 83 | scores = [] 84 | except KeyboardInterrupt: 85 | pass 86 | 87 | # Visualization 88 | markers = [ 89 | { 90 | 'location': (lambda params: params['length']), 91 | 'style': {'color': 'red'} 92 | } 93 | ] 94 | 95 | dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \ 96 | memory_shape=(128, 20), markers=markers, cmap='bone') 97 | 98 | # Example 99 | params = generator.sample_params() 100 | dashboard.sample(**params) 101 | -------------------------------------------------------------------------------- /examples/upsidedown-copy-task.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import matplotlib.pyplot as plt 5 | 6 | from lasagne.layers import InputLayer, DenseLayer, ReshapeLayer 7 | import lasagne.layers 8 | import lasagne.nonlinearities 9 | import lasagne.updates 
10 | import lasagne.objectives 11 | import lasagne.init 12 | 13 | from ntm.layers import NTMLayer 14 | from ntm.memory import Memory 15 | from ntm.controllers import DenseController 16 | from ntm.heads import WriteHead, ReadHead 17 | from ntm.updates import graves_rmsprop 18 | 19 | from utils.generators import UpsideDownCopyTask 20 | from utils.visualization import Dashboard 21 | 22 | 23 | def model(input_var, batch_size=1, size=8, num_units=100, memory_shape=(128, 20)): 24 | 25 | # Input Layer 26 | l_input = InputLayer((batch_size, None, size + 1), input_var=input_var) 27 | _, seqlen, _ = l_input.input_var.shape 28 | 29 | # Neural Turing Machine Layer 30 | memory = Memory(memory_shape, name='memory', memory_init=lasagne.init.Constant(1e-6), learn_init=False) 31 | controller = DenseController(l_input, memory_shape=memory_shape, 32 | num_units=num_units, num_reads=1, 33 | nonlinearity=lasagne.nonlinearities.rectify, 34 | name='controller') 35 | heads = [ 36 | WriteHead(controller, num_shifts=3, memory_shape=memory_shape, name='write', learn_init=False, 37 | nonlinearity_key=lasagne.nonlinearities.rectify, 38 | nonlinearity_add=lasagne.nonlinearities.rectify), 39 | ReadHead(controller, num_shifts=3, memory_shape=memory_shape, name='read', learn_init=False, 40 | nonlinearity_key=lasagne.nonlinearities.rectify) 41 | ] 42 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 43 | 44 | # Output Layer 45 | l_output_reshape = ReshapeLayer(l_ntm, (-1, num_units)) 46 | l_output_dense = DenseLayer(l_output_reshape, num_units=size + 1, nonlinearity=lasagne.nonlinearities.sigmoid, \ 47 | name='dense') 48 | l_output = ReshapeLayer(l_output_dense, (batch_size, seqlen, size + 1)) 49 | 50 | return l_output, l_ntm 51 | 52 | 53 | if __name__ == '__main__': 54 | # Define the input and expected output variable 55 | input_var, target_var = T.tensor3s('input', 'target') 56 | # The generator to sample examples from 57 | generator = UpsideDownCopyTask(batch_size=1, max_iter=1000000, size=8, max_length=5, end_marker=True) 58 | # The model (1-layer Neural Turing Machine) 59 | l_output, l_ntm = model(input_var, batch_size=generator.batch_size, \ 60 | size=generator.size, num_units=100, memory_shape=(128, 20)) 61 | # The generated output variable and the loss function 62 | pred_var = T.clip(lasagne.layers.get_output(l_output), 1e-6, 1. 
- 1e-6)
63 |     loss = T.mean(lasagne.objectives.binary_crossentropy(pred_var, target_var))
64 |     # Create the update expressions
65 |     params = lasagne.layers.get_all_params(l_output, trainable=True)
66 |     updates = graves_rmsprop(loss, params, learning_rate=1e-3)
67 |     # Compile the function for a training step, as well as the prediction function and
68 |     # a utility function to get the inner details of the NTM
69 |     train_fn = theano.function([input_var, target_var], loss, updates=updates)
70 |     ntm_fn = theano.function([input_var], pred_var)
71 |     ntm_layer_fn = theano.function([input_var], lasagne.layers.get_output(l_ntm, get_details=True))
72 | 
73 |     # Training
74 |     try:
75 |         scores, all_scores = [], []
76 |         for i, (example_input, example_output) in generator:
77 |             score = train_fn(example_input, example_output)
78 |             scores.append(score)
79 |             all_scores.append(score)
80 |             if i % 500 == 0:
81 |                 mean_scores = np.mean(scores)
82 |                 print 'Batch #%d: %.6f' % (i, mean_scores)
83 |                 scores = []
84 |     except KeyboardInterrupt:
85 |         pass
86 | 
87 |     # Visualization
88 |     markers = [
89 |         {
90 |             'location': (lambda params: params['length']),
91 |             'style': {'color': 'red'}
92 |         }
93 |     ]
94 | 
95 |     dashboard = Dashboard(generator=generator, ntm_fn=ntm_fn, ntm_layer_fn=ntm_layer_fn, \
96 |                           memory_shape=(128, 20), markers=markers, cmap='bone')
97 | 
98 |     # Example
99 |     params = generator.sample_params()
100 |     dashboard.sample(**params)
101 | 
--------------------------------------------------------------------------------
/models/README.rst:
--------------------------------------------------------------------------------
1 | Models
2 | ======
3 | 
4 | Copy task
5 | ---------
6 | 
7 | General
8 | ^^^^^^^
9 | * **Batch size** : 1
10 | * **Architecture** : Single NTM layer + Dense output layer
11 | 
12 |   - *Memory shape* : ``(128, 20)``
13 |   - *Controller* : Dense controller
14 |   - *Heads* : 1 Read head + 1 Write head
15 | 
16 | * **Training examples**
17 | 
18 |   - *Size* : 8
19 |   - *Minimum Length* : 1
20 |   - *Maximum Length* : 5
21 | 
22 | Optimization
23 | ^^^^^^^^^^^^
24 | * **Objective** : Binary Cross-Entropy
25 | 
26 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
27 | 
28 | * **Learning algorithm** : A.
Graves' RMSProp 29 | 30 | - *Learning rate* : ``1e-3`` 31 | - *Chi* : ``0.95`` 32 | - *Alpha* : ``0.9`` 33 | - *Epsilon* : ``1e-4`` 34 | 35 | Parameters 36 | ^^^^^^^^^^ 37 | +------------------+--------------+---------------------+------------------+------------------+ 38 | | | Parameter | W (init) | b (init) | nonlinearity | 39 | +==================+==============+=====================+==================+==================+ 40 | | **Controller** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 41 | +------------------+--------------+---------------------+------------------+------------------+ 42 | | | ``sign`` | ``None`` | \- | \- | 43 | | **Read Head** +--------------+---------------------+------------------+------------------+ 44 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 45 | | +--------------+---------------------+------------------+------------------+ 46 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 47 | | +--------------+---------------------+------------------+------------------+ 48 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 49 | | +--------------+---------------------+------------------+------------------+ 50 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 51 | | +--------------+---------------------+------------------+------------------+ 52 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 53 | +------------------+--------------+---------------------+------------------+------------------+ 54 | | | ``sign`` | ``None`` | \- | \- | 55 | | **Write Head** +--------------+---------------------+------------------+------------------+ 56 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 57 | | +--------------+---------------------+------------------+------------------+ 58 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 59 | | +--------------+---------------------+------------------+------------------+ 60 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 61 | | +--------------+---------------------+------------------+------------------+ 62 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 63 | | +--------------+---------------------+------------------+------------------+ 64 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 65 | | +--------------+---------------------+------------------+------------------+ 66 | | | ``sign_add`` | ``None`` | \- | \- | 67 | | +--------------+---------------------+------------------+------------------+ 68 | | | ``add`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 69 | | +--------------+---------------------+------------------+------------------+ 70 | | | ``erase`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 71 | +------------------+--------------+---------------------+------------------+------------------+ 72 | | **Dense Layer** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid`` | 73 | +------------------+--------------+---------------------+------------------+------------------+ 74 | 75 | Initialization 76 | ^^^^^^^^^^^^^^ 77 | +------------------+---------------------+-------------+-------------------+ 78 | | | Initialization | Learn init? 
| Operation dropout |
79 | +==================+=====================+=============+===================+
80 | | **Memory** | ``GlorotUniform()`` | ``False`` | \- |
81 | +------------------+---------------------+-------------+-------------------+
82 | | **Read Head** | ``init.OneHot()`` | ``False`` | No |
83 | +------------------+---------------------+-------------+-------------------+
84 | | **Write Head** | ``init.OneHot()`` | ``False`` | No |
85 | +------------------+---------------------+-------------+-------------------+
86 | 
87 | 
88 | Repeat Copy task
89 | ----------------
90 | **Git commit** : ``90d72d6``
91 | 
92 | General
93 | ^^^^^^^
94 | * **Batch size** : 1
95 | * **Architecture** : Single NTM layer + Dense output layer
96 | 
97 |   - *Memory shape* : ``(128, 20)``
98 |   - *Controller* : Dense controller
99 |   - *Heads* : 1 Read head + 1 Write head
100 | 
101 | * **Training examples**
102 | 
103 |   - *Size* : 8
104 |   - *Minimum Length* : 3
105 |   - *Maximum Length* : 5
106 |   - *Minimum Repeat number* : 1
107 |   - *Maximum Repeat number* : 5
108 |   - *Unary* : ``True``
109 | 
110 | Optimization
111 | ^^^^^^^^^^^^
112 | * **Objective** : Binary Cross-Entropy
113 | 
114 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
115 | 
116 | * **Learning algorithm** : A. Graves' RMSProp
117 | 
118 |   - *Learning rate* : ``1e-3``
119 |   - *Chi* : ``0.95``
120 |   - *Alpha* : ``0.9``
121 |   - *Epsilon* : ``1e-4``
122 | 
123 | Parameters
124 | ^^^^^^^^^^
125 | +------------------+--------------+---------------------+------------------+------------------+
126 | | | Parameter | W (init) | b (init) | nonlinearity |
127 | +==================+==============+=====================+==================+==================+
128 | | **Controller** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
129 | +------------------+--------------+---------------------+------------------+------------------+
130 | | | ``sign`` | ``None`` | \- | \- |
131 | | **Read Head** +--------------+---------------------+------------------+------------------+
132 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
133 | | +--------------+---------------------+------------------+------------------+
134 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
135 | | +--------------+---------------------+------------------+------------------+
136 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
137 | | +--------------+---------------------+------------------+------------------+
138 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` |
139 | | +--------------+---------------------+------------------+------------------+
140 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1.
+ rectify`` | 141 | +------------------+--------------+---------------------+------------------+------------------+ 142 | | | ``sign`` | ``None`` | \- | \- | 143 | | **Write Head** +--------------+---------------------+------------------+------------------+ 144 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 145 | | +--------------+---------------------+------------------+------------------+ 146 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 147 | | +--------------+---------------------+------------------+------------------+ 148 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 149 | | +--------------+---------------------+------------------+------------------+ 150 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 151 | | +--------------+---------------------+------------------+------------------+ 152 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 153 | | +--------------+---------------------+------------------+------------------+ 154 | | | ``sign_add`` | ``None`` | \- | \- | 155 | | +--------------+---------------------+------------------+------------------+ 156 | | | ``add`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 157 | | +--------------+---------------------+------------------+------------------+ 158 | | | ``erase`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 159 | +------------------+--------------+---------------------+------------------+------------------+ 160 | | **Dense Layer** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid`` | 161 | +------------------+--------------+---------------------+------------------+------------------+ 162 | 163 | Initialization 164 | ^^^^^^^^^^^^^^ 165 | +------------------+---------------------+-------------+-------------------+ 166 | | | Initialization | Learn init? 
| Operation dropout |
167 | +==================+=====================+=============+===================+
168 | | **Memory** | ``GlorotUniform()`` | ``False`` | \- |
169 | +------------------+---------------------+-------------+-------------------+
170 | | **Read Head** | ``init.OneHot()`` | ``False`` | No |
171 | +------------------+---------------------+-------------+-------------------+
172 | | **Write Head** | ``init.OneHot()`` | ``False`` | No |
173 | +------------------+---------------------+-------------+-------------------+
174 | 
175 | 
176 | Associative Recall task
177 | -----------------------
178 | **Git commit** : ``3bd7512``
179 | 
180 | General
181 | ^^^^^^^
182 | * **Batch size** : 1
183 | * **Architecture** : Single NTM layer + Dense output layer
184 | 
185 |   - *Memory shape* : ``(128, 20)``
186 |   - *Controller* : Dense controller
187 |   - *Heads* : 1 Read head + 1 Write head
188 | 
189 | * **Training examples**
190 | 
191 |   - *Size* : 8
192 |   - *Minimum Item Length* : 1
193 |   - *Maximum Item Length* : 3
194 |   - *Minimum Number of Items* : 2
195 |   - *Maximum Number of Items* : 6
196 | 
197 | Optimization
198 | ^^^^^^^^^^^^
199 | * **Objective** : Binary Cross-Entropy
200 | 
201 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
202 | 
203 | * **Learning algorithm** : Adam
204 | 
205 |   - *Learning rate* : ``1e-4``
206 |   - *Beta1* : ``0.9``
207 |   - *Beta2* : ``0.999``
208 |   - *Epsilon* : ``1e-8``
209 | 
210 | Parameters
211 | ^^^^^^^^^^
212 | +------------------+--------------+---------------------+------------------+------------------+
213 | | | Parameter | W (init) | b (init) | nonlinearity |
214 | +==================+==============+=====================+==================+==================+
215 | | **Controller** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
216 | +------------------+--------------+---------------------+------------------+------------------+
217 | | | ``sign`` | ``None`` | \- | \- |
218 | | **Read Head** +--------------+---------------------+------------------+------------------+
219 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
220 | | +--------------+---------------------+------------------+------------------+
221 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` |
222 | | +--------------+---------------------+------------------+------------------+
223 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
224 | | +--------------+---------------------+------------------+------------------+
225 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` |
226 | | +--------------+---------------------+------------------+------------------+
227 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1.
+ rectify`` | 228 | +------------------+--------------+---------------------+------------------+------------------+ 229 | | | ``sign`` | ``None`` | \- | \- | 230 | | **Write Head** +--------------+---------------------+------------------+------------------+ 231 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 232 | | +--------------+---------------------+------------------+------------------+ 233 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 234 | | +--------------+---------------------+------------------+------------------+ 235 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 236 | | +--------------+---------------------+------------------+------------------+ 237 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 238 | | +--------------+---------------------+------------------+------------------+ 239 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 240 | | +--------------+---------------------+------------------+------------------+ 241 | | | ``sign_add`` | ``None`` | \- | \- | 242 | | +--------------+---------------------+------------------+------------------+ 243 | | | ``add`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 244 | | +--------------+---------------------+------------------+------------------+ 245 | | | ``erase`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 246 | +------------------+--------------+---------------------+------------------+------------------+ 247 | | **Dense Layer** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid`` | 248 | +------------------+--------------+---------------------+------------------+------------------+ 249 | 250 | Initialization 251 | ^^^^^^^^^^^^^^ 252 | +------------------+---------------------+-------------+-------------------+ 253 | | | Initialization | Learn init? | Operation dropout | 254 | +==================+=====================+=============+===================+ 255 | | **Memory** | ``Constant(1e-6)`` | ``False`` | \- | 256 | +------------------+---------------------+-------------+-------------------+ 257 | | **Read Head** | ``init.OneHot()`` | ``False`` | No | 258 | +------------------+---------------------+-------------+-------------------+ 259 | | **Write Head** | ``init.OneHot()`` | ``False`` | No | 260 | +------------------+---------------------+-------------+-------------------+ 261 | 262 | 263 | Dyck Words task 264 | --------------- 265 | **Git commit** : ``873deec`` 266 | 267 | General 268 | ^^^^^^^ 269 | * **Batch size** : 1 270 | * **Architecture** : Single NTM layer + Dense output layer 271 | 272 | - *Memory shape* : ``(128, 20)`` 273 | - *Controller* : Dense controller 274 | - *Heads* : 1 Read head + 1 Write head 275 | 276 | * **Training examples** 277 | 278 | - *Initial Maximum Semi-Length* : 5 279 | Double maximum semi-length every time the mean loss over 500 samples is below ``1e-4`` up to a maximum of 40. 
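A sketch of this curriculum schedule, adapted from the training loop in ``examples/dyck-words-task.py`` (the exact cap used for this particular model is an assumption here)::

    if i % 500 == 0:
        mean_scores = np.mean(scores)
        # Double the maximum semi-length once the task is nearly solved
        if mean_scores < 1e-4 and generator.max_length < 40:
            generator.max_length *= 2
        scores = []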
280 | 
281 | Optimization
282 | ^^^^^^^^^^^^
283 | * **Objective** : Binary Cross-Entropy
284 | 
285 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
286 | 
287 | * **Learning algorithm** : Adam
288 | 
289 |   - *Learning rate* : ``1e-3``
290 |   - *Beta1* : ``0.9``
291 |   - *Beta2* : ``0.999``
292 |   - *Epsilon* : ``1e-8``
293 | 
294 | Parameters
295 | ^^^^^^^^^^
296 | +------------------+--------------+---------------------+------------------+------------------+
297 | |                  | Parameter    | W (init)            | b (init)         | nonlinearity     |
298 | +==================+==============+=====================+==================+==================+
299 | | **Controller**   | \-           | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
300 | +------------------+--------------+---------------------+------------------+------------------+
301 | |                  | ``sign``     | ``None``            | \-               | \-               |
302 | | **Read Head**    +--------------+---------------------+------------------+------------------+
303 | |                  | ``key``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
304 | |                  +--------------+---------------------+------------------+------------------+
305 | |                  | ``beta``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
306 | |                  +--------------+---------------------+------------------+------------------+
307 | |                  | ``gate``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
308 | |                  +--------------+---------------------+------------------+------------------+
309 | |                  | ``shift``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax``      |
310 | |                  +--------------+---------------------+------------------+------------------+
311 | |                  | ``gamma``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` |
312 | +------------------+--------------+---------------------+------------------+------------------+
313 | |                  | ``sign``     | ``None``            | \-               | \-               |
314 | | **Write Head**   +--------------+---------------------+------------------+------------------+
315 | |                  | ``key``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
316 | |                  +--------------+---------------------+------------------+------------------+
317 | |                  | ``beta``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
318 | |                  +--------------+---------------------+------------------+------------------+
319 | |                  | ``gate``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
320 | |                  +--------------+---------------------+------------------+------------------+
321 | |                  | ``shift``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax``      |
322 | |                  +--------------+---------------------+------------------+------------------+
323 | |                  | ``gamma``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` |
324 | |                  +--------------+---------------------+------------------+------------------+
325 | |                  | ``sign_add`` | ``None``            | \-               | \-               |
326 | |                  +--------------+---------------------+------------------+------------------+
327 | |                  | ``add``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
328 | |                  +--------------+---------------------+------------------+------------------+
329 | |                  | ``erase``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
330 | +------------------+--------------+---------------------+------------------+------------------+
331 | | **Dense Layer**  | \-           | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid``      |
332 | +------------------+--------------+---------------------+------------------+------------------+
333 | 
334 | Initialization
335 | ^^^^^^^^^^^^^^
336 | +------------------+---------------------+-------------+-------------------+
337 | |                  | Initialization      | Learn init? | Operation dropout |
338 | +==================+=====================+=============+===================+
339 | | **Memory**       | ``Constant(1e-6)``  | ``False``   | \-                |
340 | +------------------+---------------------+-------------+-------------------+
341 | | **Read Head**    | ``init.OneHot()``   | ``False``   | No                |
342 | +------------------+---------------------+-------------+-------------------+
343 | | **Write Head**   | ``init.OneHot()``   | ``False``   | No                |
344 | +------------------+---------------------+-------------+-------------------+
345 | 
346 | 
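The parameter tables in these sections map directly onto constructor arguments of the classes defined in ``ntm/controllers.py`` and ``ntm/heads.py`` (reproduced later in this document). Below is a hedged sketch of that mapping for the dense-controller configurations; the input shape and ``num_units`` value are illustrative choices, not values taken from the tables:

.. code-block:: python

    import lasagne.init
    import lasagne.layers
    import lasagne.nonlinearities

    from ntm.controllers import DenseController
    from ntm.heads import WriteHead

    # (batch size, sequence length, features) -- batch size 1, as above.
    l_input = lasagne.layers.InputLayer((1, None, 8))

    # "Controller" row: GlorotUniform weights, zero biases, rectify.
    controller = DenseController(l_input, memory_shape=(128, 20),
                                 num_units=100, num_reads=1,
                                 W_in_to_hid=lasagne.init.GlorotUniform(),
                                 b_in_to_hid=lasagne.init.Constant(0.),
                                 nonlinearity=lasagne.nonlinearities.rectify)

    # "Write Head" rows: the defaults in ntm/heads.py already use
    # GlorotUniform / Constant(0.) and the nonlinearities listed in the
    # table (key, beta, gate, shift, gamma, add, erase), so no overrides
    # are needed here.
    write_head = WriteHead(controller, num_shifts=3, memory_shape=(128, 20))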
347 | Upside Down Copy task
348 | ---------------------
349 | 
350 | General
351 | ^^^^^^^
352 | * **Batch size** : 1
353 | * **Architecture** : Single NTM layer + Dense output layer
354 | 
355 |   - *Memory shape* : ``(128, 20)``
356 |   - *Controller* : Dense controller
357 |   - *Heads* : 1 Read head + 1 Write head
358 | 
359 | * **Training examples**
360 | 
361 |   - *Size* : 8
362 |   - *Minimum Length* : 1
363 |   - *Maximum Length* : 5
364 | 
365 | Optimization
366 | ^^^^^^^^^^^^
367 | * **Objective** : Binary Cross-Entropy
368 | 
369 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
370 | 
371 | * **Learning algorithm** : A.
Graves' RMSProp 372 | 373 | - *Learning rate* : ``1e-3`` 374 | - *Chi* : ``0.95`` 375 | - *Alpha* : ``0.9`` 376 | - *Epsilon* : ``1e-4`` 377 | 378 | Parameters 379 | ^^^^^^^^^^ 380 | +------------------+--------------+---------------------+------------------+------------------+ 381 | | | Parameter | W (init) | b (init) | nonlinearity | 382 | +==================+==============+=====================+==================+==================+ 383 | | **Controller** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 384 | +------------------+--------------+---------------------+------------------+------------------+ 385 | | | ``sign`` | ``None`` | \- | \- | 386 | | **Read Head** +--------------+---------------------+------------------+------------------+ 387 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 388 | | +--------------+---------------------+------------------+------------------+ 389 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 390 | | +--------------+---------------------+------------------+------------------+ 391 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 392 | | +--------------+---------------------+------------------+------------------+ 393 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 394 | | +--------------+---------------------+------------------+------------------+ 395 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 396 | +------------------+--------------+---------------------+------------------+------------------+ 397 | | | ``sign`` | ``None`` | \- | \- | 398 | | **Write Head** +--------------+---------------------+------------------+------------------+ 399 | | | ``key`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 400 | | +--------------+---------------------+------------------+------------------+ 401 | | | ``beta`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 402 | | +--------------+---------------------+------------------+------------------+ 403 | | | ``gate`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 404 | | +--------------+---------------------+------------------+------------------+ 405 | | | ``shift`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax`` | 406 | | +--------------+---------------------+------------------+------------------+ 407 | | | ``gamma`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` | 408 | | +--------------+---------------------+------------------+------------------+ 409 | | | ``sign_add`` | ``None`` | \- | \- | 410 | | +--------------+---------------------+------------------+------------------+ 411 | | | ``add`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify`` | 412 | | +--------------+---------------------+------------------+------------------+ 413 | | | ``erase`` | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` | 414 | +------------------+--------------+---------------------+------------------+------------------+ 415 | | **Dense Layer** | \- | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid`` | 416 | +------------------+--------------+---------------------+------------------+------------------+ 417 | 418 | Initialization 419 | ^^^^^^^^^^^^^^ 420 | +------------------+---------------------+-------------+-------------------+ 421 | | | Initialization | Learn init? 
| Operation dropout |
422 | +==================+=====================+=============+===================+
423 | | **Memory**       | ``GlorotUniform()`` | ``False``   | \-                |
424 | +------------------+---------------------+-------------+-------------------+
425 | | **Read Head**    | ``init.OneHot()``   | ``False``   | No                |
426 | +------------------+---------------------+-------------+-------------------+
427 | | **Write Head**   | ``init.OneHot()``   | ``False``   | No                |
428 | +------------------+---------------------+-------------+-------------------+
429 | 
430 | 
431 | Reversed Copy task
432 | ------------------
433 | 
434 | General
435 | ^^^^^^^
436 | * **Batch size** : 1
437 | * **Architecture** : Single NTM layer + Recurrent output layer
438 | 
439 |   - *Memory shape* : ``(128, 20)``
440 |   - *Controller* : Recurrent controller
441 |   - *Heads* : 1 Read head + 1 Write head
442 | 
443 | * **Training examples**
444 | 
445 |   - *Size* : 8
446 |   - *Minimum Length* : 1
447 |   - *Maximum Length* : 5
448 | 
449 | Optimization
450 | ^^^^^^^^^^^^
451 | * **Objective** : Binary Cross-Entropy
452 | 
453 |   - Prediction truncated to ``[1e-10, 1 - 1e-10]``
454 | 
455 | * **Learning algorithm** : A. Graves' RMSProp (sketched below)
456 | 
457 |   - *Learning rate* : ``1e-3``
458 |   - *Chi* : ``0.95``
459 |   - *Alpha* : ``0.9``
460 |   - *Epsilon* : ``1e-4``
461 | 
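The four hyperparameters above belong to A. Graves' variant of RMSProp (*Generating Sequences with Recurrent Neural Networks*, 2013), which normalises each gradient by a running estimate of its standard deviation and applies momentum to the resulting step. A NumPy sketch of a single update follows; ``ntm/updates.py`` contains the Theano implementation actually used, so treat this as illustrative only:

.. code-block:: python

    import numpy as np

    def graves_rmsprop_step(param, grad, state,
                            learning_rate=1e-3, chi=0.95,
                            alpha=0.9, epsilon=1e-4):
        # ``state`` carries the running averages (n, g, delta) across steps.
        n, g, delta = state
        n = chi * n + (1. - chi) * grad ** 2   # running mean of grad^2
        g = chi * g + (1. - chi) * grad        # running mean of grad
        # n - g**2 estimates the gradient variance; alpha adds momentum.
        delta = alpha * delta - learning_rate * grad / np.sqrt(n - g ** 2 + epsilon)
        return param + delta, (n, g, delta)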
462 | Parameters
463 | ^^^^^^^^^^
464 | +----------------------+--------------+---------------------+------------------+------------------+
465 | |                      | Parameter    | W (init)            | b (init)         | nonlinearity     |
466 | +======================+==============+=====================+==================+==================+
467 | | **Controller**       | \-           | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
468 | +----------------------+--------------+---------------------+------------------+------------------+
469 | |                      | ``sign``     | ``None``            | \-               | \-               |
470 | | **Read Head**        +--------------+---------------------+------------------+------------------+
471 | |                      | ``key``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
472 | |                      +--------------+---------------------+------------------+------------------+
473 | |                      | ``beta``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
474 | |                      +--------------+---------------------+------------------+------------------+
475 | |                      | ``gate``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
476 | |                      +--------------+---------------------+------------------+------------------+
477 | |                      | ``shift``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax``      |
478 | |                      +--------------+---------------------+------------------+------------------+
479 | |                      | ``gamma``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` |
480 | +----------------------+--------------+---------------------+------------------+------------------+
481 | |                      | ``sign``     | ``None``            | \-               | \-               |
482 | | **Write Head**       +--------------+---------------------+------------------+------------------+
483 | |                      | ``key``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
484 | |                      +--------------+---------------------+------------------+------------------+
485 | |                      | ``beta``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
486 | |                      +--------------+---------------------+------------------+------------------+
487 | |                      | ``gate``     | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
488 | |                      +--------------+---------------------+------------------+------------------+
489 | |                      | ``shift``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``softmax``      |
490 | |                      +--------------+---------------------+------------------+------------------+
491 | |                      | ``gamma``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``1. + rectify`` |
492 | |                      +--------------+---------------------+------------------+------------------+
493 | |                      | ``sign_add`` | ``None``            | \-               | \-               |
494 | |                      +--------------+---------------------+------------------+------------------+
495 | |                      | ``add``      | ``GlorotUniform()`` | ``Constant(0.)`` | ``rectify``      |
496 | |                      +--------------+---------------------+------------------+------------------+
497 | |                      | ``erase``    | ``GlorotUniform()`` | ``Constant(0.)`` | ``hard_sigmoid`` |
498 | +----------------------+--------------+---------------------+------------------+------------------+
499 | | **Recurrent Layer**  | \-           | ``GlorotUniform()`` | ``Constant(0.)`` | ``sigmoid``      |
500 | +----------------------+--------------+---------------------+------------------+------------------+
501 | 
502 | Initialization
503 | ^^^^^^^^^^^^^^
504 | +------------------+---------------------+-------------+-------------------+
505 | |                  | Initialization      | Learn init?
| Operation dropout | 506 | +==================+=====================+=============+===================+ 507 | | **Memory** | ``GlorotUniform()`` | ``False`` | \- | 508 | +------------------+---------------------+-------------+-------------------+ 509 | | **Read Head** | ``init.OneHot()`` | ``False`` | No | 510 | +------------------+---------------------+-------------+-------------------+ 511 | | **Write Head** | ``init.OneHot()`` | ``False`` | No | 512 | +------------------+---------------------+-------------+-------------------+ 513 | -------------------------------------------------------------------------------- /models/associative-recall.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/associative-recall.npy -------------------------------------------------------------------------------- /models/copy.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/copy.npy -------------------------------------------------------------------------------- /models/dyck-words.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/dyck-words.npy -------------------------------------------------------------------------------- /models/repeat-copy.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/repeat-copy.npy -------------------------------------------------------------------------------- /models/reversed-copy.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/reversed-copy.npy -------------------------------------------------------------------------------- /models/sort.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/sort.npy -------------------------------------------------------------------------------- /models/upsidedown-copy.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/models/upsidedown-copy.npy -------------------------------------------------------------------------------- /ntm/__init__.py: -------------------------------------------------------------------------------- 1 | from . import controllers 2 | from . import heads 3 | from . import init 4 | from . import layers 5 | from . import memory 6 | from . import nonlinearities 7 | from . import similarities 8 | from . 
import updates -------------------------------------------------------------------------------- /ntm/controllers.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | 5 | from lasagne.layers import Layer 6 | from lasagne.layers.recurrent import Gate 7 | import lasagne.nonlinearities 8 | import lasagne.init 9 | 10 | 11 | class Controller(Layer): 12 | r""" 13 | The base class :class:`Controller` represents a generic controller 14 | for the Neural Turing Machine. The controller is a neural network 15 | (feed-forward or recurrent) making the interface between the 16 | incoming layer (eg. an instance of :class:`lasagne.layers.InputLayer`) 17 | and the NTM. 18 | 19 | Parameters 20 | ---------- 21 | incoming: a :class:`lasagne.layers.Layer` instance 22 | The layer feeding into the Neural Turing Machine. 23 | memory_shape: tuple 24 | Shape of the NTM's memory. 25 | num_units: int 26 | Number of hidden units in the controller. 27 | num_reads: int 28 | Number of read heads in the Neural Turing Machine. 29 | hid_init: callable, Numpy array or Theano shared variable 30 | Initializer for the initial hidden state (:math:`h_{0}`). 31 | learn_init: bool 32 | If ``True``, initial hidden values are learned. 33 | """ 34 | def __init__(self, incoming, memory_shape, num_units, num_reads, 35 | hid_init=lasagne.init.GlorotUniform(), 36 | learn_init=False, 37 | **kwargs): 38 | super(Controller, self).__init__(incoming, **kwargs) 39 | self.hid_init = self.add_param(hid_init, (1, num_units), 40 | name='hid_init', regularizable=False, trainable=learn_init) 41 | self.memory_shape = memory_shape 42 | self.num_units = num_units 43 | self.num_reads = num_reads 44 | 45 | def step(self, input, reads, hidden, state, *args, **kwargs): 46 | raise NotImplementedError 47 | 48 | def get_output_shape_for(self, input_shape): 49 | return (input_shape[0], self.num_units) 50 | 51 | 52 | class DenseController(Controller): 53 | r""" 54 | A fully connected (feed-forward) controller for the NTM. 55 | 56 | .. math :: 57 | h_t = \sigma(x_{t} W_{x} + r_{t} W_{r} + b_{x} + b_{r}) 58 | 59 | Parameters 60 | ---------- 61 | incoming: a :class:`lasagne.layers.Layer` instance 62 | The layer feeding into the Neural Turing Machine. 63 | memory_shape: tuple 64 | Shape of the NTM's memory. 65 | num_units: int 66 | Number of hidden units in the controller. 67 | num_reads: int 68 | Number of read heads in the Neural Turing Machine. 69 | W_in_to_hid: callable, Numpy array or Theano shared variable 70 | If callable, initializer for the weights between the 71 | input and the hidden state. Otherwise a matrix with 72 | shape ``(num_inputs, num_units)`` (:math:`W_{x}`). 73 | b_in_to_hid: callable, Numpy array, Theano shared variable or ``None`` 74 | If callable, initializer for the biases between the 75 | input and the hidden state. If ``None``, the controller 76 | has no bias between the input and the hidden state. Otherwise 77 | a 1D array with shape ``(num_units,)`` (:math:`b_{x}`). 78 | W_reads_to_hid: callable, Numpy array or Theano shared variable 79 | If callable, initializer for the weights between the 80 | read vector and the hidden state. Otherwise a matrix with 81 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`W_{r}`). 82 | b_reads_to_hid: callable, Numpy array, Theano shared variable or ``None`` 83 | If callable, initializer for the biases between the 84 | read vector and the hidden state. 
If ``None``, the controller 85 | has no bias between the read vector and the hidden state. 86 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`b_{r}`). 87 | nonlinearity: callable or ``None`` 88 | The nonlinearity that is applied to the controller. If ``None``, 89 | the controller will be linear (:math:`\sigma`). 90 | hid_init: callable, np.ndarray or theano.shared 91 | Initializer for the initial hidden state (:math:`h_{0}`). 92 | learn_init: bool 93 | If ``True``, initial hidden values are learned. 94 | """ 95 | def __init__(self, incoming, memory_shape, num_units, num_reads, 96 | W_in_to_hid=lasagne.init.GlorotUniform(), 97 | b_in_to_hid=lasagne.init.Constant(0.), 98 | W_reads_to_hid=lasagne.init.GlorotUniform(), 99 | b_reads_to_hid=lasagne.init.Constant(0.), 100 | nonlinearity=lasagne.nonlinearities.rectify, 101 | hid_init=lasagne.init.GlorotUniform(), 102 | learn_init=False, 103 | **kwargs): 104 | super(DenseController, self).__init__(incoming, memory_shape, num_units, 105 | num_reads, hid_init, learn_init, 106 | **kwargs) 107 | self.nonlinearity = (lasagne.nonlinearities.identity if 108 | nonlinearity is None else nonlinearity) 109 | 110 | def add_weight_and_bias_params(input_dim, W, b, name): 111 | return (self.add_param(W, (input_dim, self.num_units), 112 | name='W_{}'.format(name)), 113 | self.add_param(b, (self.num_units,), 114 | name='b_{}'.format(name)) if b is not None else None) 115 | num_inputs = int(np.prod(self.input_shape[2:])) 116 | # Inputs / Hidden parameters 117 | self.W_in_to_hid, self.b_in_to_hid = add_weight_and_bias_params(num_inputs, 118 | W_in_to_hid, b_in_to_hid, name='in_to_hid') 119 | # Read vectors / Hidden parameters 120 | self.W_reads_to_hid, self.b_reads_to_hid = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 121 | W_reads_to_hid, b_reads_to_hid, name='reads_to_hid') 122 | 123 | def step(self, input, reads, *args): 124 | if input.ndim > 2: 125 | input = input.flatten(2) 126 | if reads.ndim > 2: 127 | reads = reads.flatten(2) 128 | 129 | activation = T.dot(input, self.W_in_to_hid) + \ 130 | T.dot(reads, self.W_reads_to_hid) 131 | if self.b_in_to_hid is not None: 132 | activation += self.b_in_to_hid.dimshuffle('x', 0) 133 | if self.b_reads_to_hid is not None: 134 | activation += self.b_reads_to_hid.dimshuffle('x', 0) 135 | state = self.nonlinearity(activation) 136 | return state, state 137 | 138 | def outputs_info(self, batch_size): 139 | ones_vector = T.ones((batch_size, 1)) 140 | hid_init = T.dot(ones_vector, self.hid_init) 141 | hid_init = T.unbroadcast(hid_init, 0) 142 | return [hid_init, hid_init] 143 | 144 | 145 | class RecurrentController(Controller): 146 | r""" 147 | A "vanilla" recurrent controller for the NTM. 148 | 149 | .. math :: 150 | h_t = \sigma(x_{t} W_{x} + r_{t} W_{r} + 151 | h_{t-1} W_{h} + b_{x} + b_{r} + b_{h}) 152 | 153 | Parameters 154 | ---------- 155 | incoming: a :class:`lasagne.layers.Layer` instance 156 | The layer feeding into the Neural Turing Machine. 157 | memory_shape: tuple 158 | Shape of the NTM's memory. 159 | num_units: int 160 | Number of hidden units in the controller. 161 | num_reads: int 162 | Number of read heads in the Neural Turing Machine. 163 | W_in_to_hid: callable, Numpy array or Theano shared variable 164 | If callable, initializer for the weights between the 165 | input and the hidden state. Otherwise a matrix with 166 | shape ``(num_inputs, num_units)`` (:math:`W_{x}`). 
167 | b_in_to_hid: callable, Numpy array, Theano shared variable or ``None`` 168 | If callable, initializer for the biases between the 169 | input and the hidden state. If ``None``, the controller 170 | has no bias between the input and the hidden state. Otherwise 171 | a 1D array with shape ``(num_units,)`` (:math:`b_{x}`). 172 | W_reads_to_hid: callable, Numpy array or Theano shared variable 173 | If callable, initializer for the weights between the 174 | read vector and the hidden state. Otherwise a matrix with 175 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`W_{r}`). 176 | b_reads_to_hid: callable, Numpy array, Theano shared variable or ``None`` 177 | If callable, initializer for the biases between the 178 | read vector and the hidden state. If ``None``, the controller 179 | has no bias between the read vector and the hidden state. 180 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`b_{r}`). 181 | W_hid_to_hid: callable, Numpy array or Theano shared variable 182 | If callable, initializer for the weights in the hidden-to-hidden 183 | update. Otherwise a matrix with shape ``(num_units, num_units)`` 184 | (:math:`W_{h}`). 185 | b_hid_to_hid: callable, Numpy array, Theano shared variable or ``None`` 186 | If callable, initializer for the biases in the hidden-to-hidden 187 | update. If ``None``, the controller has no bias in the 188 | hidden-to-hidden update. Otherwise a 1D array with shape 189 | ``(num_units,)`` (:math:`b_{h}`). 190 | nonlinearity: callable or ``None`` 191 | The nonlinearity that is applied to the controller. If ``None``, 192 | the controller will be linear (:math:`\sigma`). 193 | hid_init: callable, np.ndarray or theano.shared 194 | Initializer for the initial hidden state (:math:`h_{0}`). 195 | learn_init: bool 196 | If ``True``, initial hidden values are learned. 
197 | """ 198 | def __init__(self, incoming, memory_shape, num_units, num_reads, 199 | W_in_to_hid=lasagne.init.GlorotUniform(), 200 | b_in_to_hid=lasagne.init.Constant(0.), 201 | W_reads_to_hid=lasagne.init.GlorotUniform(), 202 | b_reads_to_hid=lasagne.init.Constant(0.), 203 | W_hid_to_hid=lasagne.init.GlorotUniform(), 204 | b_hid_to_hid=lasagne.init.Constant(0.), 205 | nonlinearity=lasagne.nonlinearities.rectify, 206 | hid_init=lasagne.init.GlorotUniform(), 207 | learn_init=False, 208 | **kwargs): 209 | super(RecurrentController, self).__init__(incoming, memory_shape, num_units, 210 | num_reads, hid_init, learn_init, 211 | **kwargs) 212 | self.nonlinearity = (lasagne.nonlinearities.identity if 213 | nonlinearity is None else nonlinearity) 214 | 215 | def add_weight_and_bias_params(input_dim, W, b, name): 216 | return (self.add_param(W, (input_dim, self.num_units), 217 | name='W_{}'.format(name)), 218 | self.add_param(b, (self.num_units,), 219 | name='b_{}'.format(name)) if b is not None else None) 220 | num_inputs = int(np.prod(self.input_shape[2:])) 221 | # Inputs / Hidden parameters 222 | self.W_in_to_hid, self.b_in_to_hid = add_weight_and_bias_params(num_inputs, 223 | W_in_to_hid, b_in_to_hid, name='in_to_hid') 224 | # Read vectors / Hidden parameters 225 | self.W_reads_to_hid, self.b_reads_to_hid = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 226 | W_reads_to_hid, b_reads_to_hid, name='reads_to_hid') 227 | # Hidden / Hidden parameters 228 | self.W_hid_to_hid, self.b_hid_to_hid = add_weight_and_bias_params(self.num_units, 229 | W_hid_to_hid, b_hid_to_hid, name='hid_to_hid') 230 | 231 | def step(self, input, reads, hidden, *args): 232 | if input.ndim > 2: 233 | input = input.flatten(2) 234 | if reads.ndim > 2: 235 | reads = reads.flatten(2) 236 | 237 | activation = T.dot(input, self.W_in_to_hid) + \ 238 | T.dot(reads, self.W_reads_to_hid) + \ 239 | T.dot(hidden, self.W_hid_to_hid) 240 | if self.b_in_to_hid is not None: 241 | activation += self.b_in_to_hid.dimshuffle('x', 0) 242 | if self.b_reads_to_hid is not None: 243 | activation += self.b_reads_to_hid.dimshuffle('x', 0) 244 | if self.b_hid_to_hid is not None: 245 | activation += self.b_hid_to_hid.dimshuffle('x', 0) 246 | state = self.nonlinearity(activation) 247 | return state, state 248 | 249 | def outputs_info(self, batch_size): 250 | ones_vector = T.ones((batch_size, 1)) 251 | hid_init = T.dot(ones_vector, self.hid_init) 252 | hid_init = T.unbroadcast(hid_init, 0) 253 | return [hid_init, hid_init] 254 | 255 | class LSTMController(Controller): 256 | r""" 257 | An LSTM recurrent controller for the NTM. 258 | .. math :: 259 | input-gate = \sigma(x_{t} Wi_{x} + r_{t} Wi_{r} + 260 | h_{t-1} Wi_{h} + bi_{x} + bi_{r} + bi_{h}) 261 | forget-gate = \sigma(x_{t} Wf_{x} + r_{t} Wf_{r} + 262 | h_{t-1} Wf_{h} + bf_{x} + bf_{r} + bf_{h}) 263 | output-gate = \sigma(x_{t} Wo_{x} + r_{t} Wo_{r} + 264 | h_{t-1} Wo_{h} + bo_{x} + bo_{r} + bo_{h}) 265 | candidate-cell-state = \tanh(x_{t} Wc_{x} + r_{t} Wc_{r} + 266 | h_{t-1} Wc_{h} + bc_{x} + bc_{r} + bc_{h}) 267 | cell-state_{t} = cell-state_{t-1} \odot forget-gate + 268 | candidate-cell-state \odot input-gate 269 | h_{t} = \tanh(cell-state_{t}) \odot output-gate 270 | Parameters 271 | ---------- 272 | incoming: a :class:`lasagne.layers.Layer` instance 273 | The layer feeding into the Neural Turing Machine. 274 | memory_shape: tuple 275 | Shape of the NTM's memory. 276 | num_units: int 277 | Number of hidden units in the controller.
278 | num_reads: int 279 | Number of read heads in the Neural Turing Machine. 280 | W_in_to_input: callable, Numpy array or Theano shared variable 281 | If callable, initializer for the weights between the 282 | input and the input gate. Otherwise a matrix with 283 | shape ``(num_inputs, num_units)`` (:math:`Wi_{x}`). 284 | b_in_to_input: callable, Numpy array, Theano shared variable or ``None`` 285 | If callable, initializer for the biases between the 286 | input and the input gate. If ``None``, the controller 287 | has no bias between the input and the input gate. Otherwise 288 | a 1D array with shape ``(num_units,)`` (:math:`bi_{x}`). 289 | W_reads_to_input: callable, Numpy array or Theano shared variable 290 | If callable, initializer for the weights between the 291 | read vector and the input gate. Otherwise a matrix with 292 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wi_{r}`). 293 | b_reads_to_input: callable, Numpy array, Theano shared variable or ``None`` 294 | If callable, initializer for the biases between the 295 | read vector and the input gate. If ``None``, the controller 296 | has no bias between the read vector and the input gate. 297 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bi_{r}`). 298 | W_hid_to_input: callable, Numpy array or Theano shared variable 299 | If callable, initializer for the weights between the 300 | hidden state and the input gate. Otherwise a matrix with 301 | shape ``(num_units, num_units)`` (:math:`Wi_{h}`). 302 | b_hid_to_input: callable, Numpy array, Theano shared variable or ``None`` 303 | If callable, initializer for the biases between the 304 | hidden state and the input gate. If ``None``, the controller 305 | has no bias between the hidden state and the input gate. 306 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bi_{h}`). 307 | W_in_to_forget: callable, Numpy array or Theano shared variable 308 | If callable, initializer for the weights between the 309 | input and the forget gate. Otherwise a matrix with 310 | shape ``(num_inputs, num_units)`` (:math:`Wf_{x}`). 311 | b_in_to_forget: callable, Numpy array, Theano shared variable or ``None`` 312 | If callable, initializer for the biases between the 313 | input and the forget gate. If ``None``, the controller 314 | has no bias between the input and the forget gate. Otherwise 315 | a 1D array with shape ``(num_units,)`` (:math:`bf_{x}`). 316 | W_reads_to_forget: callable, Numpy array or Theano shared variable 317 | If callable, initializer for the weights between the 318 | read vector and the forget gate. Otherwise a matrix with 319 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wf_{r}`). 320 | b_reads_to_forget: callable, Numpy array, Theano shared variable or ``None`` 321 | If callable, initializer for the biases between the 322 | read vector and the forget gate. If ``None``, the controller 323 | has no bias between the read vector and the forget gate. 324 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bf_{r}`). 325 | W_hid_to_forget: callable, Numpy array or Theano shared variable 326 | If callable, initializer for the weights between the 327 | hidden state and the forget gate. Otherwise a matrix with 328 | shape ``(num_units, num_units)`` (:math:`Wf_{h}`). 329 | b_hid_to_forget: callable, Numpy array, Theano shared variable or ``None`` 330 | If callable, initializer for the biases between the 331 | hidden state and the forget gate. If ``None``, the controller 332 | has no bias between the hidden state and the forget gate. 
333 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bf_{h}`). 334 | W_in_to_output: callable, Numpy array or Theano shared variable 335 | If callable, initializer for the weights between the 336 | input and the output gate. Otherwise a matrix with 337 | shape ``(num_inputs, num_units)`` (:math:`Wo_{x}`). 338 | b_in_to_output: callable, Numpy array, Theano shared variable or ``None`` 339 | If callable, initializer for the biases between the 340 | input and the output gate. If ``None``, the controller 341 | has no bias between the input and the output gate. Otherwise 342 | a 1D array with shape ``(num_units,)`` (:math:`bo_{x}`). 343 | W_reads_to_output: callable, Numpy array or Theano shared variable 344 | If callable, initializer for the weights between the 345 | read vector and the output gate. Otherwise a matrix with 346 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wo_{r}`). 347 | b_reads_to_output: callable, Numpy array, Theano shared variable or ``None`` 348 | If callable, initializer for the biases between the 349 | read vector and the output gate. If ``None``, the controller 350 | has no bias between the read vector and the output gate. 351 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bo_{r}`). 352 | W_hid_to_output: callable, Numpy array or Theano shared variable 353 | If callable, initializer for the weights between the 354 | hidden state and the output gate. Otherwise a matrix with 355 | shape ``(num_units, num_units)`` (:math:`Wo_{h}`). 356 | b_hid_to_output: callable, Numpy array, Theano shared variable or ``None`` 357 | If callable, initializer for the biases between the 358 | hidden state and the output gate. If ``None``, the controller 359 | has no bias between the hidden state and the output gate. 360 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bo_{h}`). 361 | W_in_to_cell: callable, Numpy array or Theano shared variable 362 | If callable, initializer for the weights between the 363 | input and the cell state computation gate. Otherwise a matrix 364 | with shape ``(num_inputs, num_units)`` (:math:`Wc_{x}`). 365 | b_in_to_cell: callable, Numpy array, Theano shared variable or ``None`` 366 | If callable, initializer for the biases between the 367 | input and the cell state computation gate. If ``None``, 368 | the controller has no bias between the input and the cell 369 | state computation gate. Otherwise a 1D array with shape 370 | ``(num_units,)`` (:math:`bc_{x}`). 371 | W_reads_to_cell: callable, Numpy array or Theano shared variable 372 | If callable, initializer for the weights between the 373 | read vector and the cell state computation gate. Otherwise a matrix 374 | with shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wc_{r}`). 375 | b_reads_to_cell: callable, Numpy array, Theano shared variable or ``None`` 376 | If callable, initializer for the biases between the 377 | read vector and the cell state computation gate. If ``None``, 378 | the controller has no bias between the read vector and the cell 379 | state computation gate. Otherwise a 1D array with shape 380 | ``(num_units,)`` (:math:`bc_{r}`). 381 | W_hid_to_cell: callable, Numpy array or Theano shared variable 382 | If callable, initializer for the weights between the 383 | hidden state and the cell state computation gate. Otherwise a matrix 384 | with shape ``(num_units, num_units)`` (:math:`Wc_{h}`). 
385 | b_hid_to_cell: callable, Numpy array, Theano shared variable or ``None`` 386 | If callable, initializer for the biases between the 387 | hidden state and the cell state computation gate. If ``None``, 388 | the controller has no bias between the hidden state and the cell 389 | state computation gate. Otherwise a 1D array with shape 390 | ``(num_units,)`` (:math:`bc_{h}`). 391 | hid_init: callable, np.ndarray or theano.shared 392 | Initializer for the initial hidden state (:math:`h_{0}`). 393 | cell_init: callable, np.ndarray or theano.shared 394 | Initializer for the initial cell state (:math:`cell-state_{0}`). 395 | learn_init: bool 396 | If ``True``, initial hidden values are learned. 397 | """ 398 | def __init__(self, incoming, memory_shape, num_units, num_reads, 399 | W_in_to_input=lasagne.init.GlorotUniform(), 400 | b_in_to_input=lasagne.init.Constant(0.), 401 | W_reads_to_input=lasagne.init.GlorotUniform(), 402 | b_reads_to_input=lasagne.init.Constant(0.), 403 | W_hid_to_input=lasagne.init.GlorotUniform(), 404 | b_hid_to_input=lasagne.init.Constant(0.), 405 | W_in_to_forget=lasagne.init.GlorotUniform(), 406 | b_in_to_forget=lasagne.init.Constant(0.), 407 | W_reads_to_forget=lasagne.init.GlorotUniform(), 408 | b_reads_to_forget=lasagne.init.Constant(0.), 409 | W_hid_to_forget=lasagne.init.GlorotUniform(), 410 | b_hid_to_forget=lasagne.init.Constant(0.), 411 | W_in_to_output=lasagne.init.GlorotUniform(), 412 | b_in_to_output=lasagne.init.Constant(0.), 413 | W_reads_to_output=lasagne.init.GlorotUniform(), 414 | b_reads_to_output=lasagne.init.Constant(0.), 415 | W_hid_to_output=lasagne.init.GlorotUniform(), 416 | b_hid_to_output=lasagne.init.Constant(0.), 417 | W_in_to_cell=lasagne.init.GlorotUniform(), 418 | b_in_to_cell=lasagne.init.Constant(0.), 419 | W_reads_to_cell=lasagne.init.GlorotUniform(), 420 | b_reads_to_cell=lasagne.init.Constant(0.), 421 | W_hid_to_cell=lasagne.init.GlorotUniform(), 422 | b_hid_to_cell=lasagne.init.Constant(0.), 423 | nonlinearity=lasagne.nonlinearities.rectify, 424 | hid_init=lasagne.init.GlorotUniform(), 425 | cell_init=lasagne.init.Constant(0.), 426 | learn_init=False, 427 | **kwargs): 428 | super(LSTMController, self).__init__(incoming, memory_shape, num_units, 429 | num_reads, hid_init, learn_init, 430 | **kwargs) 431 | self.nonlinearity = (lasagne.nonlinearities.identity if 432 | nonlinearity is None else nonlinearity) 433 | self.cell_init = self.add_param(cell_init, (1, num_units), 434 | name='cell_init', regularizable=False, trainable=learn_init) 435 | 436 | def add_weight_and_bias_params(input_dim, W, b, name): 437 | return (self.add_param(W, (input_dim, self.num_units), 438 | name='W_{}'.format(name)), 439 | self.add_param(b, (self.num_units,), 440 | name='b_{}'.format(name)) if b is not None else None) 441 | num_inputs = int(np.prod(self.input_shape[2:])) 442 | # Inputs / Input Gate parameters 443 | self.W_in_to_input, self.b_in_to_input = add_weight_and_bias_params(num_inputs, 444 | W_in_to_input, b_in_to_input, name='in_to_input') 445 | # Read vectors / Input Gate parameters 446 | self.W_reads_to_input, self.b_reads_to_input = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 447 | W_reads_to_input, b_reads_to_input, name='reads_to_input') 448 | # Hidden / Input Gate parameters 449 | self.W_hid_to_input, self.b_hid_to_input = add_weight_and_bias_params(self.num_units, 450 | W_hid_to_input, b_hid_to_input, name='hid_to_input') 451 | # Inputs / Forget Gate parameters 452 | self.W_in_to_forget, self.b_in_to_forget = 
add_weight_and_bias_params(num_inputs, 453 | W_in_to_forget, b_in_to_forget, name='in_to_forget') 454 | # Read vectors / Forget Gate parameters 455 | self.W_reads_to_forget, self.b_reads_to_forget = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 456 | W_reads_to_forget, b_reads_to_forget, name='reads_to_forget') 457 | # Hidden / Forget Gate parameters 458 | self.W_hid_to_forget, self.b_hid_to_forget = add_weight_and_bias_params(self.num_units, 459 | W_hid_to_forget, b_hid_to_forget, name='hid_to_forget') 460 | # Inputs / Output Gate parameters 461 | self.W_in_to_output, self.b_in_to_output = add_weight_and_bias_params(num_inputs, 462 | W_in_to_output, b_in_to_output, name='in_to_output') 463 | # Read vectors / Output Gate parameters 464 | self.W_reads_to_output, self.b_reads_to_output = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 465 | W_reads_to_output, b_reads_to_output, name='reads_to_output') 466 | # Hidden / Output Gate parameters 467 | self.W_hid_to_output, self.b_hid_to_output = add_weight_and_bias_params(self.num_units, 468 | W_hid_to_output, b_hid_to_output, name='hid_to_output') 469 | # Inputs / Cell State parameters 470 | self.W_in_to_cell, self.b_in_to_cell = add_weight_and_bias_params(num_inputs, 471 | W_in_to_cell, b_in_to_cell, name='in_to_cell') 472 | # Read vectors / Cell State parameters 473 | self.W_reads_to_cell, self.b_reads_to_cell = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 474 | W_reads_to_cell, b_reads_to_cell, name='reads_to_cell') 475 | # Hidden / Cell State parameters 476 | self.W_hid_to_cell, self.b_hid_to_cell = add_weight_and_bias_params(self.num_units, 477 | W_hid_to_cell, b_hid_to_cell, name='hid_to_cell') 478 | 479 | def step(self, input, reads, hidden, cell, *args): 480 | if input.ndim > 2: 481 | input = input.flatten(2) 482 | if reads.ndim > 2: 483 | reads = reads.flatten(2) 484 | # Input Gate output computation 485 | activation = T.dot(input, self.W_in_to_input) + \ 486 | T.dot(reads, self.W_reads_to_input) + \ 487 | T.dot(hidden, self.W_hid_to_input) 488 | if self.b_in_to_input is not None: 489 | activation += self.b_in_to_input.dimshuffle('x', 0) 490 | if self.b_reads_to_input is not None: 491 | activation += self.b_reads_to_input.dimshuffle('x', 0) 492 | if self.b_hid_to_input is not None: 493 | activation += self.b_hid_to_input.dimshuffle('x', 0) 494 | input_gate = lasagne.nonlinearities.sigmoid(activation) 495 | # Forget Gate output computation 496 | activation = T.dot(input, self.W_in_to_forget) + \ 497 | T.dot(reads, self.W_reads_to_forget) + \ 498 | T.dot(hidden, self.W_hid_to_forget) 499 | if self.b_in_to_forget is not None: 500 | activation += self.b_in_to_forget.dimshuffle('x', 0) 501 | if self.b_reads_to_forget is not None: 502 | activation += self.b_reads_to_forget.dimshuffle('x', 0) 503 | if self.b_hid_to_forget is not None: 504 | activation += self.b_hid_to_forget.dimshuffle('x', 0) 505 | forget_gate = lasagne.nonlinearities.sigmoid(activation) 506 | # Output Gate output computation 507 | activation = T.dot(input, self.W_in_to_output) + \ 508 | T.dot(reads, self.W_reads_to_output) + \ 509 | T.dot(hidden, self.W_hid_to_output) 510 | if self.b_in_to_output is not None: 511 | activation += self.b_in_to_output.dimshuffle('x', 0) 512 | if self.b_reads_to_output is not None: 513 | activation += self.b_reads_to_output.dimshuffle('x', 0) 514 | if self.b_hid_to_output is not None: 515 | activation += self.b_hid_to_output.dimshuffle('x', 0) 516 | output_gate = 
lasagne.nonlinearities.sigmoid(activation) 517 | # New candidate cell state computation 518 | activation = T.dot(input, self.W_in_to_cell) + \ 519 | T.dot(reads, self.W_reads_to_cell) + \ 520 | T.dot(hidden, self.W_hid_to_cell) 521 | if self.b_in_to_cell is not None: 522 | activation += self.b_in_to_cell.dimshuffle('x', 0) 523 | if self.b_reads_to_cell is not None: 524 | activation += self.b_reads_to_cell.dimshuffle('x', 0) 525 | if self.b_hid_to_cell is not None: 526 | activation += self.b_hid_to_cell.dimshuffle('x', 0) 527 | candidate_cell_state = lasagne.nonlinearities.tanh(activation) 528 | # New cell state and hidden state computation 529 | cell_state = cell * forget_gate + candidate_cell_state * input_gate 530 | state = lasagne.nonlinearities.tanh(cell_state) * output_gate 531 | return state, cell_state 532 | 533 | def outputs_info(self, batch_size): 534 | ones_vector = T.ones((batch_size, 1)) 535 | hid_init = T.dot(ones_vector, self.hid_init) 536 | hid_init = T.unbroadcast(hid_init, 0) 537 | cell_init = T.dot(ones_vector, self.cell_init) 538 | cell_init = T.unbroadcast(cell_init, 0) 539 | return [hid_init, cell_init] 540 | 541 | class GRUController(Controller): 542 | r""" 543 | A GRU recurrent controller for the NTM. 544 | .. math :: 545 | update-gate = \sigma(x_{t} Wz_{x} + r_{t} Wz_{r} + 546 | h_{t-1} Wz_{h} + bz_{x} + bz_{r} + bz_{h}) 547 | reset-gate = \sigma(x_{t} Wr_{x} + r_{t} Wr_{r} + 548 | h_{t-1} Wr_{h} + br_{x} + br_{r} + br_{h}) 549 | s = \tanh(x_{t} Ws_{x} + r_{t} Ws_{r} + 550 | (h_{t-1} \odot reset-gate) Ws_{h}) 551 | h_{t} = (1 - update-gate) \odot s + update-gate \odot h_{t-1} 552 | Parameters 553 | ---------- 554 | incoming: a :class:`lasagne.layers.Layer` instance 555 | The layer feeding into the Neural Turing Machine. 556 | memory_shape: tuple 557 | Shape of the NTM's memory. 558 | num_units: int 559 | Number of hidden units in the controller. 560 | num_reads: int 561 | Number of read heads in the Neural Turing Machine. 562 | W_in_to_update: callable, Numpy array or Theano shared variable 563 | If callable, initializer for the weights between the 564 | input and the update gate. Otherwise a matrix with 565 | shape ``(num_inputs, num_units)`` (:math:`Wz_{x}`). 566 | b_in_to_update: callable, Numpy array, Theano shared variable or ``None`` 567 | If callable, initializer for the biases between the 568 | input and the update gate. If ``None``, the controller 569 | has no bias between the input and the update gate. Otherwise 570 | a 1D array with shape ``(num_units,)`` (:math:`bz_{x}`). 571 | W_reads_to_update: callable, Numpy array or Theano shared variable 572 | If callable, initializer for the weights between the 573 | read vector and the update gate. Otherwise a matrix with 574 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wz_{r}`). 575 | b_reads_to_update: callable, Numpy array, Theano shared variable or ``None`` 576 | If callable, initializer for the biases between the 577 | read vector and the update gate. If ``None``, the controller 578 | has no bias between the read vector and the update gate. 579 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bz_{r}`). 580 | W_hid_to_update: callable, Numpy array or Theano shared variable 581 | If callable, initializer for the weights between the 582 | hidden state and the update gate. Otherwise a matrix with 583 | shape ``(num_units, num_units)`` (:math:`Wz_{h}`).
584 | b_hid_to_update: callable, Numpy array, Theano shared variable or ``None`` 585 | If callable, initializer for the biases between the 586 | hidden state and the update gate. If ``None``, the controller 587 | has no bias between the hidden state and the update gate. 588 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bz_{h}`). 589 | W_in_to_reset: callable, Numpy array or Theano shared variable 590 | If callable, initializer for the weights between the 591 | input and the reset gate. Otherwise a matrix with 592 | shape ``(num_inputs, num_units)`` (:math:`Wr_{x}`). 593 | b_in_to_reset: callable, Numpy array, Theano shared variable or ``None`` 594 | If callable, initializer for the biases between the 595 | input and the reset gate. If ``None``, the controller 596 | has no bias between the input and the reset gate. Otherwise 597 | a 1D array with shape ``(num_units,)`` (:math:`br_{x}`). 598 | W_reads_to_reset: callable, Numpy array or Theano shared variable 599 | If callable, initializer for the weights between the 600 | read vector and the reset gate. Otherwise a matrix with 601 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Wr_{r}`). 602 | b_reads_to_reset: callable, Numpy array, Theano shared variable or ``None`` 603 | If callable, initializer for the biases between the 604 | read vector and the reset gate. If ``None``, the controller 605 | has no bias between the read vector and the reset gate. 606 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`br_{r}`). 607 | W_hid_to_reset: callable, Numpy array or Theano shared variable 608 | If callable, initializer for the weights between the 609 | hidden state and the reset gate. Otherwise a matrix with 610 | shape ``(num_units, num_units)`` (:math:`Wr_{h}`). 611 | b_hid_to_reset: callable, Numpy array, Theano shared variable or ``None`` 612 | If callable, initializer for the biases between the 613 | hidden state and the reset gate. If ``None``, the controller 614 | has no bias between the hidden state and the reset gate. 615 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`br_{h}`). 616 | W_in_to_hid: callable, Numpy array or Theano shared variable 617 | If callable, initializer for the weights between the 618 | input and the hidden gate. Otherwise a matrix with 619 | shape ``(num_inputs, num_units)`` (:math:`Ws_{x}`). 620 | b_in_to_hid: callable, Numpy array, Theano shared variable or ``None`` 621 | If callable, initializer for the biases between the 622 | input and the hidden gate. If ``None``, the controller 623 | has no bias between the input and the hidden gate. Otherwise 624 | a 1D array with shape ``(num_units,)`` (:math:`bs_{x}`). 625 | W_reads_to_hid: callable, Numpy array or Theano shared variable 626 | If callable, initializer for the weights between the 627 | read vector and the hidden gate. Otherwise a matrix with 628 | shape ``(num_reads * memory_shape[1], num_units)`` (:math:`Ws_{r}`). 629 | b_reads_to_hid: callable, Numpy array, Theano shared variable or ``None`` 630 | If callable, initializer for the biases between the 631 | read vector and the hidden gate. If ``None``, the controller 632 | has no bias between the read vector and the hidden gate. 633 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bs_{r}`). 634 | W_hid_to_hid: callable, Numpy array or Theano shared variable 635 | If callable, initializer for the weights between the 636 | hidden state and the hidden gate. Otherwise a matrix with 637 | shape ``(num_units, num_units)`` (:math:`Ws_{h}`). 
638 | b_hid_to_hid: callable, Numpy array, Theano shared variable or ``None`` 639 | If callable, initializer for the biases between the 640 | hidden state and the hidden gate. If ``None``, the controller 641 | has no bias between the hidden state and the hidden gate. 642 | Otherwise a 1D array with shape ``(num_units,)`` (:math:`bs_{h}`). 643 | hid_init: callable, np.ndarray or theano.shared 644 | Initializer for the initial hidden state (:math:`h_{0}`). 645 | learn_init: bool 646 | If ``True``, initial hidden values are learned. 647 | """ 648 | def __init__(self, incoming, memory_shape, num_units, num_reads, 649 | W_in_to_update=lasagne.init.GlorotUniform(), 650 | b_in_to_update=lasagne.init.Constant(0.), 651 | W_reads_to_update=lasagne.init.GlorotUniform(), 652 | b_reads_to_update=lasagne.init.Constant(0.), 653 | W_hid_to_update=lasagne.init.GlorotUniform(), 654 | b_hid_to_update=lasagne.init.Constant(0.), 655 | W_in_to_reset=lasagne.init.GlorotUniform(), 656 | b_in_to_reset=lasagne.init.Constant(0.), 657 | W_reads_to_reset=lasagne.init.GlorotUniform(), 658 | b_reads_to_reset=lasagne.init.Constant(0.), 659 | W_hid_to_reset=lasagne.init.GlorotUniform(), 660 | b_hid_to_reset=lasagne.init.Constant(0.), 661 | W_in_to_hid=lasagne.init.GlorotUniform(), 662 | b_in_to_hid=lasagne.init.Constant(0.), 663 | W_reads_to_hid=lasagne.init.GlorotUniform(), 664 | b_reads_to_hid=lasagne.init.Constant(0.), 665 | W_hid_to_hid=lasagne.init.GlorotUniform(), 666 | b_hid_to_hid=lasagne.init.Constant(0.), 667 | nonlinearity=lasagne.nonlinearities.rectify, 668 | hid_init=lasagne.init.GlorotUniform(), 669 | learn_init=False, 670 | **kwargs): 671 | super(GRUController, self).__init__(incoming, memory_shape, num_units, 672 | num_reads, hid_init, learn_init, 673 | **kwargs) 674 | self.nonlinearity = (lasagne.nonlinearities.identity if 675 | nonlinearity is None else nonlinearity) 676 | 677 | def add_weight_and_bias_params(input_dim, W, b, name): 678 | return (self.add_param(W, (input_dim, self.num_units), 679 | name='W_{}'.format(name)), 680 | self.add_param(b, (self.num_units,), 681 | name='b_{}'.format(name)) if b is not None else None) 682 | num_inputs = int(np.prod(self.input_shape[2:])) 683 | # Inputs / Update Gate parameters 684 | self.W_in_to_update, self.b_in_to_update = add_weight_and_bias_params(num_inputs, 685 | W_in_to_update, b_in_to_update, name='in_to_update') 686 | # Read vectors / Update Gate parameters 687 | self.W_reads_to_update, self.b_reads_to_update = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 688 | W_reads_to_update, b_reads_to_update, name='reads_to_update') 689 | # Hidden / Update Gate parameters 690 | self.W_hid_to_update, self.b_hid_to_update = add_weight_and_bias_params(self.num_units, 691 | W_hid_to_update, b_hid_to_update, name='hid_to_update') 692 | # Inputs / Reset Gate parameters 693 | self.W_in_to_reset, self.b_in_to_reset = add_weight_and_bias_params(num_inputs, 694 | W_in_to_reset, b_in_to_reset, name='in_to_reset') 695 | # Read vectors / Reset Gate parameters 696 | self.W_reads_to_reset, self.b_reads_to_reset = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 697 | W_reads_to_reset, b_reads_to_reset, name='reads_to_reset') 698 | # Hidden / Reset Gate parameters 699 | self.W_hid_to_reset, self.b_hid_to_reset = add_weight_and_bias_params(self.num_units, 700 | W_hid_to_reset, b_hid_to_reset, name='hid_to_reset') 701 | # Inputs / Hidden Gate parameters 702 | self.W_in_to_hid, self.b_in_to_hid = add_weight_and_bias_params(num_inputs, 703 | 
W_in_to_hid, b_in_to_hid, name='in_to_hid') 704 | # Read vectors / Hidden Gate parameters 705 | self.W_reads_to_hid, self.b_reads_to_hid = add_weight_and_bias_params(self.num_reads * self.memory_shape[1], 706 | W_reads_to_hid, b_reads_to_hid, name='reads_to_hid') 707 | # Hidden / Hidden Gate parameters 708 | self.W_hid_to_hid, self.b_hid_to_hid = add_weight_and_bias_params(self.num_units, 709 | W_hid_to_hid, b_hid_to_hid, name='hid_to_hid') 710 | 711 | def step(self, input, reads, hidden, *args): 712 | if input.ndim > 2: 713 | input = input.flatten(2) 714 | if reads.ndim > 2: 715 | reads = reads.flatten(2) 716 | # Update Gate output computation 717 | activation = T.dot(input, self.W_in_to_update) + \ 718 | T.dot(reads, self.W_reads_to_update) + \ 719 | T.dot(hidden, self.W_hid_to_update) 720 | if self.b_in_to_update is not None: 721 | activation += self.b_in_to_update.dimshuffle('x', 0) 722 | if self.b_reads_to_update is not None: 723 | activation += self.b_reads_to_update.dimshuffle('x', 0) 724 | if self.b_hid_to_update is not None: 725 | activation += self.b_hid_to_update.dimshuffle('x', 0) 726 | update_gate = lasagne.nonlinearities.sigmoid(activation) 727 | # Reset Gate output computation 728 | activation = T.dot(input, self.W_in_to_reset) + \ 729 | T.dot(reads, self.W_reads_to_reset) + \ 730 | T.dot(hidden, self.W_hid_to_reset) 731 | if self.b_in_to_reset is not None: 732 | activation += self.b_in_to_reset.dimshuffle('x', 0) 733 | if self.b_reads_to_reset is not None: 734 | activation += self.b_reads_to_reset.dimshuffle('x', 0) 735 | if self.b_hid_to_reset is not None: 736 | activation += self.b_hid_to_reset.dimshuffle('x', 0) 737 | reset_gate = lasagne.nonlinearities.sigmoid(activation) 738 | # Hidden Gate output computation 739 | activation = T.dot(input, self.W_in_to_hid) + \ 740 | T.dot(reads, self.W_reads_to_hid) + \ 741 | T.dot((hidden * reset_gate), self.W_hid_to_hid) 742 | if self.b_in_to_hid is not None: 743 | activation += self.b_in_to_hid.dimshuffle('x', 0) 744 | if self.b_reads_to_hid is not None: 745 | activation += self.b_reads_to_hid.dimshuffle('x', 0) 746 | if self.b_hid_to_hid is not None: 747 | activation += self.b_hid_to_hid.dimshuffle('x', 0) 748 | hidden_gate = lasagne.nonlinearities.tanh(activation) 749 | # New hidden state computation 750 | ones = T.ones(update_gate.shape) 751 | state = (ones - update_gate) * hidden_gate + update_gate * hidden 752 | return state, state 753 | 754 | def outputs_info(self, batch_size): 755 | ones_vector = T.ones((batch_size, 1)) 756 | hid_init = T.dot(ones_vector, self.hid_init) 757 | hid_init = T.unbroadcast(hid_init, 0) 758 | return [hid_init, hid_init] 759 | -------------------------------------------------------------------------------- /ntm/heads.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | from collections import OrderedDict 5 | 6 | from lasagne.layers import Layer, DenseLayer 7 | from lasagne.theano_extensions import padding 8 | import lasagne.init 9 | import lasagne.nonlinearities 10 | 11 | import similarities 12 | import nonlinearities 13 | import init 14 | 15 | 16 | class Head(Layer): 17 | r""" 18 | The base class :class:`Head` represents a generic head for the 19 | Neural Turing Machine. The heads are responsible for the read/write 20 | operations on the memory. An instance of :class:`Head` outputs a 21 | weight vector defined by 22 | 23 | .. 
math :: 24 | k_{t} &= \sigma_{key}(h_{t} W_{key} + b_{key})\\ 25 | \beta_{t} &= \sigma_{beta}(h_{t} W_{beta} + b_{beta})\\ 26 | g_{t} &= \sigma_{gate}(h_{t} W_{gate} + b_{gate})\\ 27 | s_{t} &= \sigma_{shift}(h_{t} W_{shift} + b_{shift})\\ 28 | \gamma_{t} &= \sigma_{gamma}(h_{t} W_{gamma} + b_{gamma}) 29 | 30 | .. math :: 31 | w_{t}^{c} &= softmax(\beta_{t} * K(k_{t}, M_{t}))\\ 32 | w_{t}^{g} &= g_{t} * w_{t}^{c} + (1 - g_{t}) * w_{t-1}\\ 33 | \tilde{w}_{t} &= s_{t} \ast w_{t}^{g}\\ 34 | w_{t} \propto \tilde{w}_{t}^{\gamma_{t}} 35 | 36 | Parameters 37 | ---------- 38 | controller: a :class:`Controller` instance 39 | The controller of the Neural Turing Machine. 40 | num_shifts: int 41 | Number of shifts allowed by the convolutional shift operation 42 | (centered on 0, eg. ``num_shifts=3`` represents shifts 43 | in [-1, 0, 1]). 44 | memory_shape: tuple 45 | Shape of the NTM's memory 46 | W_hid_to_key: callable, Numpy array or Theano shared variable 47 | If callable, initializer of the weights for the parameter 48 | :math:`k_{t}`. Otherwise a matrix with shape 49 | ``(controller.num_units, memory_shape[1])``. 50 | b_hid_to_key: callable, Numpy array, Theano shared variable or ``None`` 51 | If callable, initializer of the biases for the parameter 52 | :math:`k_{t}`. If ``None``, no bias. Otherwise a matrix 53 | with shape ``(memory_shape[1],)``. 54 | nonlinearity_key: callable or ``None`` 55 | The nonlinearity that is applied for parameter :math:`k_{t}`. If 56 | ``None``, the nonlinearity is ``identity``. 57 | W_hid_to_beta: callable, Numpy array or Theano shared variable 58 | b_hid_to_beta: callable, Numpy array, Theano shared variable or ``None`` 59 | nonlinearity_beta: callable or ``None`` 60 | Weights, biases and nonlinearity for parameter :math:`\beta_{t}`. 61 | W_hid_to_gate: callable, Numpy array or Theano shared variable 62 | b_hid_to_gate: callable, Numpy array, Theano shared variable or ``None`` 63 | nonlinearity_gate: callable or ``None`` 64 | Weights, biases and nonlinearity for parameter :math:`g_{t}`. 65 | W_hid_to_shift: callable, Numpy array or Theano shared variable 66 | b_hid_to_shift: callable, Numpy array, Theano shared variable or ``None`` 67 | nonlinearity_shift: callable or ``None`` 68 | Weights, biases and nonlinearity for parameter :math:`s_{t}`. 69 | W_hid_to_gamma: callable, Numpy array or Theano shared variable 70 | b_hid_to_gamma: callable, Numpy array, Theano shared variable or ``None`` 71 | nonlinearity_gamma: callable or ``None`` 72 | Weights, biases and nonlinearity for parameter :math:`\gamma_{t}` 73 | weights_init: callable, Numpy array or Theano shared variable 74 | Initializer for the initial weight vector (:math:`w_{0}`). 75 | learn_init: bool 76 | If ``True``, initial hidden values are learned. 
77 | """ 78 | def __init__(self, controller, num_shifts=3, memory_shape=(128, 20), 79 | W_hid_to_key=lasagne.init.GlorotUniform(), 80 | b_hid_to_key=lasagne.init.Constant(0.), 81 | nonlinearity_key=nonlinearities.ClippedLinear(low=0., high=1.), 82 | W_hid_to_beta=lasagne.init.GlorotUniform(), 83 | b_hid_to_beta=lasagne.init.Constant(0.), 84 | nonlinearity_beta=lasagne.nonlinearities.rectify, 85 | W_hid_to_gate=lasagne.init.GlorotUniform(), 86 | b_hid_to_gate=lasagne.init.Constant(0.), 87 | nonlinearity_gate=nonlinearities.hard_sigmoid, 88 | W_hid_to_shift=lasagne.init.GlorotUniform(), 89 | b_hid_to_shift=lasagne.init.Constant(0.), 90 | nonlinearity_shift=lasagne.nonlinearities.softmax, 91 | W_hid_to_gamma=lasagne.init.GlorotUniform(), 92 | b_hid_to_gamma=lasagne.init.Constant(0.), 93 | nonlinearity_gamma=lambda x: 1. + lasagne.nonlinearities.rectify(x), 94 | weights_init=init.OneHot(), 95 | learn_init=False, 96 | **kwargs): 97 | super(Head, self).__init__(controller, **kwargs) 98 | 99 | self.memory_shape = memory_shape 100 | self.name = kwargs.get('name', 'head') 101 | self.learn_init = learn_init 102 | 103 | # Key 104 | self.W_hid_to_key = self.add_param(W_hid_to_key, (1, self.input_shape[1], \ 105 | self.memory_shape[1]), name=self.name + '.key.W') 106 | self.b_hid_to_key = self.add_param(b_hid_to_key, (1, self.memory_shape[1]), \ 107 | name=self.name + '.key.b', regularizable=False) 108 | self.nonlinearity_key = nonlinearity_key 109 | # Beta 110 | self.W_hid_to_beta = self.add_param(W_hid_to_beta, (1, self.input_shape[1], \ 111 | 1), name=self.name + '.beta.W') 112 | self.b_hid_to_beta = self.add_param(b_hid_to_beta, (1, 1), \ 113 | name=self.name + '.beta.b', regularizable=False) 114 | self.nonlinearity_beta = nonlinearity_beta 115 | # Gate 116 | self.W_hid_to_gate = self.add_param(W_hid_to_gate, (1, self.input_shape[1], \ 117 | 1), name=self.name + '.gate.W') 118 | self.b_hid_to_gate = self.add_param(b_hid_to_gate, (1, 1), \ 119 | name=self.name + '.gate.b', regularizable=False) 120 | self.nonlinearity_gate = nonlinearity_gate 121 | # Shift 122 | self.num_shifts = num_shifts 123 | self.W_hid_to_shift = self.add_param(W_hid_to_shift, (1, self.input_shape[1], \ 124 | self.num_shifts), name=self.name + '.shift.W') 125 | self.b_hid_to_shift = self.add_param(b_hid_to_shift, (1, self.num_shifts), \ 126 | name=self.name + '.shift.b', regularizable=False) 127 | self.nonlinearity_shift = nonlinearity_shift 128 | # Gamma 129 | self.W_hid_to_gamma = self.add_param(W_hid_to_gamma, (1, self.input_shape[1], \ 130 | 1), name=self.name + '.gamma.W') 131 | self.b_hid_to_gamma = self.add_param(b_hid_to_gamma, (1, 1), \ 132 | name=self.name + '.gamma.b', regularizable=False) 133 | self.nonlinearity_gamma = nonlinearity_gamma 134 | 135 | self.weights_init = self.add_param( 136 | weights_init, (1, self.memory_shape[0]), 137 | name='weights_init', trainable=learn_init, regularizable=False) 138 | 139 | 140 | class WriteHead(Head): 141 | r""" 142 | Write head. In addition to the weight vector, the write head 143 | also outputs an add vector :math:`a_{t}` and an erase vector 144 | :math:`e_{t}` defined by 145 | 146 | .. math :: 147 | a_{t} &= \sigma_{a}(h_{t} W_{a} + b_{a}) 148 | e_{t} &= \sigma_{e}(h_{t} W_{e} + b_{e}) 149 | 150 | Parameters 151 | ---------- 152 | controller: a :class:`Controller` instance 153 | The controller of the Neural Turing Machine. 154 | num_shifts: int 155 | Number of shifts allowed by the convolutional shift operation 156 | (centered on 0, eg. 
``num_shifts=3`` represents shifts 157 | in [-1, 0, 1]). 158 | memory_shape: tuple 159 | Shape of the NTM's memory 160 | W_hid_to_key: callable, Numpy array or Theano shared variable 161 | b_hid_to_key: callable, Numpy array, Theano shared variable or ``None`` 162 | nonlinearity_key: callable or ``None`` 163 | Weights, biases and nonlinearity for parameter :math:`k_{t}`. 164 | W_hid_to_beta: callable, Numpy array or Theano shared variable 165 | b_hid_to_beta: callable, Numpy array, Theano shared variable or ``None`` 166 | nonlinearity_beta: callable or ``None`` 167 | Weights, biases and nonlinearity for parameter :math:`\beta_{t}`. 168 | W_hid_to_gate: callable, Numpy array or Theano shared variable 169 | b_hid_to_gate: callable, Numpy array, Theano shared variable or ``None`` 170 | nonlinearity_gate: callable or ``None`` 171 | Weights, biases and nonlinearity for parameter :math:`g_{t}`. 172 | W_hid_to_shift: callable, Numpy array or Theano shared variable 173 | b_hid_to_shift: callable, Numpy array, Theano shared variable or ``None`` 174 | nonlinearity_shift: callable or ``None`` 175 | Weights, biases and nonlinearity for parameter :math:`s_{t}`. 176 | W_hid_to_gamma: callable, Numpy array or Theano shared variable 177 | b_hid_to_gamma: callable, Numpy array, Theano shared variable or ``None`` 178 | nonlinearity_gamma: callable or ``None`` 179 | Weights, biases and nonlinearity for parameter :math:`\gamma_{t}` 180 | W_hid_to_erase: callable, Numpy array or Theano shared variable 181 | b_hid_to_erase: callable, Numpy array, Theano shared variable or ``None`` 182 | nonlinearity_erase: callable or ``None`` 183 | Weights, biases and nonlinearity for parameter :math:`e_{t}` 184 | W_hid_to_add: callable, Numpy array or Theano shared variable 185 | b_hid_to_add: callable, Numpy array, Theano shared variable or ``None`` 186 | nonlinearity_add: callable or ``None`` 187 | Weights, biases and nonlinearity for parameter :math:`a_{t}` 188 | weights_init: callable, Numpy array or Theano shared variable 189 | Initializer for the initial weight vector (:math:`w_{0}`). 190 | learn_init: bool 191 | If ``True``, initial hidden values are learned. 192 | """ 193 | def __init__(self, controller, num_shifts=3, memory_shape=(128, 20), 194 | W_hid_to_key=lasagne.init.GlorotUniform(), 195 | b_hid_to_key=lasagne.init.Constant(0.), 196 | nonlinearity_key=nonlinearities.ClippedLinear(low=0., high=1.), 197 | W_hid_to_beta=lasagne.init.GlorotUniform(), 198 | b_hid_to_beta=lasagne.init.Constant(0.), 199 | nonlinearity_beta=lasagne.nonlinearities.rectify, 200 | W_hid_to_gate=lasagne.init.GlorotUniform(), 201 | b_hid_to_gate=lasagne.init.Constant(0.), 202 | nonlinearity_gate=nonlinearities.hard_sigmoid, 203 | W_hid_to_shift=lasagne.init.GlorotUniform(), 204 | b_hid_to_shift=lasagne.init.Constant(0.), 205 | nonlinearity_shift=lasagne.nonlinearities.softmax, 206 | W_hid_to_gamma=lasagne.init.GlorotUniform(), 207 | b_hid_to_gamma=lasagne.init.Constant(0.), 208 | nonlinearity_gamma=lambda x: 1. 
+ lasagne.nonlinearities.rectify(x), 209 | W_hid_to_erase=lasagne.init.GlorotUniform(), 210 | b_hid_to_erase=lasagne.init.Constant(0.), 211 | nonlinearity_erase=nonlinearities.hard_sigmoid, 212 | W_hid_to_add=lasagne.init.GlorotUniform(), 213 | b_hid_to_add=lasagne.init.Constant(0.), 214 | nonlinearity_add=nonlinearities.ClippedLinear(low=0., high=1.), 215 | weights_init=init.OneHot(), 216 | learn_init=False, 217 | **kwargs): 218 | super(WriteHead, self).__init__(controller, num_shifts=num_shifts, memory_shape=memory_shape, 219 | W_hid_to_key=W_hid_to_key, b_hid_to_key=b_hid_to_key, nonlinearity_key=nonlinearity_key, 220 | W_hid_to_beta=W_hid_to_beta, b_hid_to_beta=b_hid_to_beta, nonlinearity_beta=nonlinearity_beta, 221 | W_hid_to_gate=W_hid_to_gate, b_hid_to_gate=b_hid_to_gate, nonlinearity_gate=nonlinearity_gate, 222 | W_hid_to_shift=W_hid_to_shift, b_hid_to_shift=b_hid_to_shift, nonlinearity_shift=nonlinearity_shift, 223 | W_hid_to_gamma=W_hid_to_gamma, b_hid_to_gamma=b_hid_to_gamma, nonlinearity_gamma=nonlinearity_gamma, 224 | weights_init=weights_init, learn_init=learn_init, **kwargs) 225 | # Erase 226 | self.W_hid_to_erase = self.add_param(W_hid_to_erase, (1, self.input_shape[1], \ 227 | self.memory_shape[1]), name=self.name + '.erase.W') 228 | self.b_hid_to_erase = self.add_param(b_hid_to_erase, (1, self.memory_shape[1]), \ 229 | name=self.name + '.erase.b', regularizable=False) 230 | self.nonlinearity_erase = nonlinearity_erase 231 | # Add 232 | self.W_hid_to_add = self.add_param(W_hid_to_add, (1, self.input_shape[1], \ 233 | self.memory_shape[1]), name=self.name + '.add.W') 234 | self.b_hid_to_add = self.add_param(b_hid_to_add, (1, self.memory_shape[1]), \ 235 | name=self.name + '.add.b', regularizable=False) 236 | self.nonlinearity_add = nonlinearity_add 237 | 238 | 239 | class ReadHead(Head): 240 | r""" 241 | Read head. 242 | 243 | Parameters 244 | ---------- 245 | controller: a :class:`Controller` instance 246 | The controller of the Neural Turing Machine. 247 | num_shifts: int 248 | Number of shifts allowed by the convolutional shift operation 249 | (centered on 0, eg. ``num_shifts=3`` represents shifts 250 | in [-1, 0, 1]). 251 | memory_shape: tuple 252 | Shape of the NTM's memory 253 | W_hid_to_key: callable, Numpy array or Theano shared variable 254 | b_hid_to_key: callable, Numpy array, Theano shared variable or ``None`` 255 | nonlinearity_key: callable or ``None`` 256 | Weights, biases and nonlinearity for parameter :math:`k_{t}`. 257 | W_hid_to_beta: callable, Numpy array or Theano shared variable 258 | b_hid_to_beta: callable, Numpy array, Theano shared variable or ``None`` 259 | nonlinearity_beta: callable or ``None`` 260 | Weights, biases and nonlinearity for parameter :math:`\beta_{t}`. 261 | W_hid_to_gate: callable, Numpy array or Theano shared variable 262 | b_hid_to_gate: callable, Numpy array, Theano shared variable or ``None`` 263 | nonlinearity_gate: callable or ``None`` 264 | Weights, biases and nonlinearity for parameter :math:`g_{t}`. 265 | W_hid_to_shift: callable, Numpy array or Theano shared variable 266 | b_hid_to_shift: callable, Numpy array, Theano shared variable or ``None`` 267 | nonlinearity_shift: callable or ``None`` 268 | Weights, biases and nonlinearity for parameter :math:`s_{t}`. 
269 | W_hid_to_gamma: callable, Numpy array or Theano shared variable 270 | b_hid_to_gamma: callable, Numpy array, Theano shared variable or ``None`` 271 | nonlinearity_gamma: callable or ``None`` 272 | Weights, biases and nonlinearity for parameter :math:`\gamma_{t}` 273 | weights_init: callable, Numpy array or Theano shared variable 274 | Initializer for the initial weight vector (:math:`w_{0}`). 275 | learn_init: bool 276 | If ``True``, initial hidden values are learned. 277 | """ 278 | def __init__(self, controller, num_shifts=3, memory_shape=(128, 20), 279 | W_hid_to_key=lasagne.init.GlorotUniform(), 280 | b_hid_to_key=lasagne.init.Constant(0.), 281 | nonlinearity_key=nonlinearities.ClippedLinear(low=0., high=1.), 282 | W_hid_to_beta=lasagne.init.GlorotUniform(), 283 | b_hid_to_beta=lasagne.init.Constant(0.), 284 | nonlinearity_beta=lasagne.nonlinearities.rectify, 285 | W_hid_to_gate=lasagne.init.GlorotUniform(), 286 | b_hid_to_gate=lasagne.init.Constant(0.), 287 | nonlinearity_gate=T.nnet.hard_sigmoid, 288 | W_hid_to_shift=lasagne.init.GlorotUniform(), 289 | b_hid_to_shift=lasagne.init.Constant(0.), 290 | nonlinearity_shift=lasagne.nonlinearities.softmax, 291 | W_hid_to_gamma=lasagne.init.GlorotUniform(), 292 | b_hid_to_gamma=lasagne.init.Constant(0.), 293 | nonlinearity_gamma=lambda x: 1. + lasagne.nonlinearities.rectify(x), 294 | weights_init=init.OneHot(), 295 | learn_init=False, 296 | **kwargs): 297 | super(ReadHead, self).__init__(controller, num_shifts=num_shifts, memory_shape=memory_shape, 298 | W_hid_to_key=W_hid_to_key, b_hid_to_key=b_hid_to_key, nonlinearity_key=nonlinearity_key, 299 | W_hid_to_beta=W_hid_to_beta, b_hid_to_beta=b_hid_to_beta, nonlinearity_beta=nonlinearity_beta, 300 | W_hid_to_gate=W_hid_to_gate, b_hid_to_gate=b_hid_to_gate, nonlinearity_gate=nonlinearity_gate, 301 | W_hid_to_shift=W_hid_to_shift, b_hid_to_shift=b_hid_to_shift, nonlinearity_shift=nonlinearity_shift, 302 | W_hid_to_gamma=W_hid_to_gamma, b_hid_to_gamma=b_hid_to_gamma, nonlinearity_gamma=nonlinearity_gamma, 303 | weights_init=weights_init, learn_init=learn_init, **kwargs) 304 | 305 | 306 | class HeadCollection(object): 307 | r""" 308 | The base class :class:`HeadCollection` represents a generic collection 309 | of heads. Each head is an instance of :class:`Head`. This allows to 310 | process the heads simultaneously if they have the same type. This should 311 | be limited to internal uses only. 312 | 313 | Parameters 314 | ---------- 315 | heads: a list of :class:`Head` instances 316 | List of the heads. 
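    Editor's note: the per-head parameters are created with a leading axis of
    size 1 (e.g. ``(1, num_units, memory_shape[1])``) precisely so that
    concatenating them along axis 0 lets a single ``T.dot`` evaluate every
    head at once. A small NumPy sketch of the shape arithmetic (illustration
    only, with arbitrary sizes):

        import numpy as np

        h = np.random.rand(16, 100)     # (batch, num_units) hidden states
        W = np.random.rand(4, 100, 20)  # four stacked (num_units, key_size) matrices
        k = np.dot(h, W)                # contracts h's last axis with W's middle axis
        assert k.shape == (16, 4, 20)   # one key per example and per head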
317 | """ 318 | def __init__(self, heads): 319 | self.heads = heads 320 | # QKFIX: Assume that all the heads have the same number of shifts and nonlinearities 321 | self.memory_shape = self.heads[0].memory_shape 322 | self.num_shifts = self.heads[0].num_shifts 323 | # Key 324 | self.W_hid_to_key = T.concatenate([head.W_hid_to_key for head in self.heads], axis=0) 325 | self.b_hid_to_key = T.concatenate([head.b_hid_to_key for head in self.heads], axis=0) 326 | self.nonlinearity_key = self.heads[0].nonlinearity_key 327 | # Beta 328 | self.W_hid_to_beta = T.concatenate([head.W_hid_to_beta for head in self.heads], axis=0) 329 | self.b_hid_to_beta = T.concatenate([head.b_hid_to_beta for head in self.heads], axis=0) 330 | self.nonlinearity_beta = self.heads[0].nonlinearity_beta 331 | # Gate 332 | self.W_hid_to_gate = T.concatenate([head.W_hid_to_gate for head in self.heads], axis=0) 333 | self.b_hid_to_gate = T.concatenate([head.b_hid_to_gate for head in self.heads], axis=0) 334 | self.nonlinearity_gate = self.heads[0].nonlinearity_gate 335 | # Shift 336 | self.W_hid_to_shift = T.concatenate([head.W_hid_to_shift for head in self.heads], axis=0) 337 | self.b_hid_to_shift = T.concatenate([head.b_hid_to_shift for head in self.heads], axis=0) 338 | self.nonlinearity_shift = self.heads[0].nonlinearity_shift 339 | # Gamma 340 | self.W_hid_to_gamma = T.concatenate([head.W_hid_to_gamma for head in self.heads], axis=0) 341 | self.b_hid_to_gamma = T.concatenate([head.b_hid_to_gamma for head in self.heads], axis=0) 342 | self.nonlinearity_gamma = self.heads[0].nonlinearity_gamma 343 | # Initialization 344 | self.weights_init = T.concatenate([head.weights_init for head in self.heads], axis=0) 345 | 346 | def get_params(self, **tags): 347 | params = [] 348 | for head in self.heads: 349 | params += head.get_params(**tags) 350 | 351 | return params 352 | 353 | def get_weights(self, h_t, w_tm1, M_t, **kwargs): 354 | batch_size = self.heads[0].input_shape[0] # QKFIX: Get the size of the batches from the 1st head 355 | num_heads = len(self.heads) 356 | k_t = self.nonlinearity_key(T.dot(h_t, self.W_hid_to_key) + self.b_hid_to_key) 357 | beta_t = self.nonlinearity_beta(T.dot(h_t, self.W_hid_to_beta) + self.b_hid_to_beta) 358 | g_t = self.nonlinearity_gate(T.dot(h_t, self.W_hid_to_gate) + self.b_hid_to_gate) 359 | # QKFIX: If the nonlinearity is softmax (which is usually the case), then the activations 360 | # need to be reshaped (T.nnet.softmax only accepts 2D inputs) 361 | try: 362 | s_t = self.nonlinearity_shift(T.dot(h_t, self.W_hid_to_shift) + self.b_hid_to_shift) 363 | except ValueError: 364 | shift_activation_t = T.dot(h_t, self.W_hid_to_shift) + self.b_hid_to_shift 365 | s_t = self.nonlinearity_shift(shift_activation_t.reshape((h_t.shape[0] * num_heads, self.num_shifts))) 366 | s_t = s_t.reshape(shift_activation_t.shape) 367 | gamma_t = self.nonlinearity_gamma(T.dot(h_t, self.W_hid_to_gamma) + self.b_hid_to_gamma) 368 | 369 | # Content Addressing (3.3.1) 370 | beta_t = T.addbroadcast(beta_t, 2) 371 | betaK = beta_t * similarities.cosine_similarity(k_t, M_t) 372 | w_c = lasagne.nonlinearities.softmax(betaK.flatten(ndim=2)) 373 | w_c = w_c.reshape(betaK.shape) 374 | 375 | # Interpolation (3.3.2) 376 | g_t = T.addbroadcast(g_t, 2) 377 | w_g = g_t * w_c + (1. - g_t) * w_tm1 378 | 379 | # Convolutional Shift (3.3.2) 380 | # NOTE: This library is using a flat (zero-padded) convolution instead of the circular 381 | # convolution from the original paper. In practice, this change has a minimal impact. 
382 | w_g_padded = w_g.reshape((h_t.shape[0] * num_heads, self.memory_shape[0])).dimshuffle(0, 'x', 'x', 1) 383 | conv_filter = s_t.reshape((h_t.shape[0] * num_heads, self.num_shifts)).dimshuffle(0, 'x', 'x', 1) 384 | pad = (self.num_shifts // 2, (self.num_shifts - 1) // 2) 385 | w_g_padded = padding.pad(w_g_padded, [pad], batch_ndim=3) 386 | convolution = T.nnet.conv2d(w_g_padded, conv_filter, 387 | input_shape=(None if batch_size is None else \ 388 | batch_size * num_heads, 1, 1, self.memory_shape[0] + pad[0] + pad[1]), 389 | filter_shape=(None if batch_size is None else \ 390 | batch_size * num_heads, 1, 1, self.num_shifts), 391 | subsample=(1, 1), 392 | border_mode='valid') 393 | w_tilde = convolution[T.arange(h_t.shape[0] * num_heads), T.arange(h_t.shape[0] * num_heads), 0, :] 394 | w_tilde = w_tilde.reshape((h_t.shape[0], num_heads, self.memory_shape[0])) 395 | 396 | # Sharpening (3.3.2) 397 | gamma_t = T.addbroadcast(gamma_t, 2) 398 | w = T.pow(w_tilde + 1e-6, gamma_t) 399 | w /= T.sum(w, axis=2).dimshuffle(0, 1, 'x') 400 | 401 | return w 402 | 403 | 404 | class ReadHeadCollection(HeadCollection): 405 | r""" 406 | Collection of read heads. 407 | 408 | Parameters 409 | ---------- 410 | heads: a list of :class:`ReadHead` instances 411 | List of the read heads. 412 | """ 413 | def __init__(self, heads): 414 | assert all([isinstance(head, ReadHead) for head in heads]) 415 | super(ReadHeadCollection, self).__init__(heads=heads) 416 | 417 | def read(self, w_tm1, M_t, **kwargs): 418 | r_t = T.batched_dot(w_tm1, M_t) 419 | 420 | return r_t.flatten(ndim=2) 421 | 422 | 423 | class WriteHeadCollection(HeadCollection): 424 | r""" 425 | Collection of write heads. 426 | 427 | Parameters 428 | ---------- 429 | heads: a list of :class:`WriteHead` instances 430 | List of the write heads. 431 | """ 432 | def __init__(self, heads): 433 | assert all([isinstance(head, WriteHead) for head in heads]) 434 | super(WriteHeadCollection, self).__init__(heads=heads) 435 | # Erase 436 | self.W_hid_to_erase = T.concatenate([head.W_hid_to_erase for head in self.heads], axis=0) 437 | self.b_hid_to_erase = T.concatenate([head.b_hid_to_erase for head in self.heads], axis=0) 438 | self.nonlinearity_erase = self.heads[0].nonlinearity_erase 439 | # Add 440 | self.W_hid_to_add = T.concatenate([head.W_hid_to_add for head in self.heads], axis=0) 441 | self.b_hid_to_add = T.concatenate([head.b_hid_to_add for head in self.heads], axis=0) 442 | self.nonlinearity_add = self.heads[0].nonlinearity_add 443 | 444 | def write(self, h_tm1, w_tm1, M_tm1, **kwargs): 445 | e_t = self.nonlinearity_erase(T.dot(h_tm1, self.W_hid_to_erase) + self.b_hid_to_erase) 446 | a_t = self.nonlinearity_add(T.dot(h_tm1, self.W_hid_to_add) + self.b_hid_to_add) 447 | # Erase 448 | M_tp1 = M_tm1 * T.prod(1 - w_tm1.dimshuffle(0, 1, 2, 'x') * e_t.dimshuffle(0, 1, 'x', 2), axis=1) 449 | # Add 450 | M_tp1 += T.sum(w_tm1.dimshuffle(0, 1, 2, 'x') * a_t.dimshuffle(0, 1, 'x', 2), axis=1) 451 | 452 | return M_tp1 453 | -------------------------------------------------------------------------------- /ntm/init.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | import lasagne.init 4 | from lasagne.utils import floatX 5 | 6 | 7 | class OneHot(lasagne.init.Initializer): 8 | """ 9 | Initialize the weights to one-hot vectors. 
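    For example, ``OneHot().sample((3, 5))`` yields::

        [[1., 0., 0., 0., 0.],
         [0., 1., 0., 0., 0.],
         [0., 0., 1., 0., 0.]]

    i.e. an identity block in the top-left corner and zeros elsewhere. Used
    as ``weights_init`` with shape ``(1, memory_shape[0])``, this makes each
    head initially attend to the first memory location.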
10 | """ 11 | def sample(self, shape): 12 | if len(shape) != 2: 13 | raise ValueError('The OneHot initializer ' 14 | 'only works with 2D arrays.') 15 | M = np.min(shape) 16 | arr = np.zeros(shape) 17 | arr[:M, :M] += 1 * np.eye(M) 18 | return floatX(arr) 19 | -------------------------------------------------------------------------------- /ntm/layers.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | 4 | from lasagne.layers import Layer 5 | 6 | from heads import ReadHead, WriteHead, ReadHeadCollection, WriteHeadCollection 7 | 8 | 9 | class NTMLayer(Layer): 10 | r""" 11 | A Neural Turing Machine layer. 12 | 13 | Parameters 14 | ---------- 15 | incoming: a :class:`lasagne.layers.Layer` instance 16 | The layer feeding into the Neural Turing Machine. This 17 | layer must match the incoming layer in the controller. 18 | memory: a :class:`Memory` instance 19 | The memory of the NTM. 20 | controller: a :class:`Controller` instance 21 | The controller of the NTM. 22 | heads: a list of :class:`Head` instances 23 | The read and write heads of the NTM. 24 | only_return_final: bool 25 | If ``True``, only return the final sequential output (e.g. 26 | for tasks where a single target value for the entire 27 | sequence is desired). In this case, Theano makes an 28 | optimization which saves memory. 29 | """ 30 | def __init__(self, incoming, 31 | memory, 32 | controller, 33 | heads, 34 | only_return_final=False, 35 | **kwargs): 36 | super(NTMLayer, self).__init__(incoming, **kwargs) 37 | 38 | self.memory = memory 39 | self.controller = controller 40 | self.heads = heads 41 | self.write_heads = WriteHeadCollection(heads=\ 42 | filter(lambda head: isinstance(head, WriteHead), heads)) 43 | self.read_heads = ReadHeadCollection(heads=\ 44 | filter(lambda head: isinstance(head, ReadHead), heads)) 45 | self.only_return_final = only_return_final 46 | 47 | def get_output_shape_for(self, input_shapes): 48 | if self.only_return_final: 49 | return (input_shapes[0], self.controller.num_units) 50 | else: 51 | return (input_shapes[0], input_shapes[1], self.controller.num_units) 52 | 53 | def get_params(self, **tags): 54 | params = super(NTMLayer, self).get_params(**tags) 55 | params += self.controller.get_params(**tags) 56 | params += self.memory.get_params(**tags) 57 | for head in self.heads: 58 | params += head.get_params(**tags) 59 | 60 | return params 61 | 62 | def get_output_for(self, input, get_details=False, **kwargs): 63 | 64 | input = input.dimshuffle(1, 0, 2) 65 | 66 | def step(x_t, M_tm1, h_tm1, state_tm1, ww_tm1, wr_tm1, *params): 67 | # Update the memory (using w_tm1 of the writing heads & M_tm1) 68 | M_t = self.write_heads.write(h_tm1, ww_tm1, M_tm1) 69 | 70 | # Get the read vector (using w_tm1 of the reading heads & M_t) 71 | r_t = self.read_heads.read(wr_tm1, M_t) 72 | 73 | # Apply the controller (using x_t, r_t & the requirements for the controller) 74 | h_t, state_t = self.controller.step(x_t, r_t, h_tm1, state_tm1) 75 | 76 | # Update the weights (using h_t, M_t & w_tm1) 77 | ww_t = self.write_heads.get_weights(h_t, ww_tm1, M_t) 78 | wr_t = self.read_heads.get_weights(h_t, wr_tm1, M_t) 79 | 80 | return [M_t, h_t, state_t, ww_t, wr_t] 81 | 82 | memory_init = T.tile(self.memory.memory_init, (input.shape[1], 1, 1)) 83 | memory_init = T.unbroadcast(memory_init, 0) 84 | 85 | write_weights_init = T.tile(self.write_heads.weights_init, (input.shape[1], 1, 1)) 86 | write_weights_init = T.unbroadcast(write_weights_init, 0) 87 | 
read_weights_init = T.tile(self.read_heads.weights_init, (input.shape[1], 1, 1)) 88 | read_weights_init = T.unbroadcast(read_weights_init, 0) 89 | 90 | non_seqs = self.controller.get_params() + self.memory.get_params() + \ 91 | self.write_heads.get_params() + self.read_heads.get_params() 92 | 93 | hids, _ = theano.scan( 94 | fn=step, 95 | sequences=input, 96 | outputs_info=[memory_init] + self.controller.outputs_info(input.shape[1]) + \ 97 | [write_weights_init, read_weights_init], 98 | non_sequences=non_seqs, 99 | strict=True) 100 | 101 | # dimshuffle back to (n_batch, n_time_steps, n_features) 102 | if get_details: 103 | hid_out = [ 104 | hids[0].dimshuffle(1, 0, 2, 3), 105 | hids[1].dimshuffle(1, 0, 2), 106 | hids[2].dimshuffle(1, 0, 2), 107 | hids[3].dimshuffle(1, 0, 2, 3), 108 | hids[4].dimshuffle(1, 0, 2, 3)] 109 | else: 110 | if self.only_return_final: 111 | hid_out = hids[1][-1] 112 | else: 113 | hid_out = hids[1].dimshuffle(1, 0, 2) 114 | 115 | return hid_out 116 | -------------------------------------------------------------------------------- /ntm/memory.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | 5 | from lasagne.layers import InputLayer 6 | import lasagne.init 7 | 8 | 9 | class Memory(InputLayer): 10 | r""" 11 | Memory of the Neural Turing Machine. 12 | 13 | Parameters 14 | ---------- 15 | memory_shape: tuple 16 | Shape of the NTM's memory. 17 | memory_init: callable, Numpy array or Theano shared variable 18 | Initializer for the initial state of the memory (:math:`M_{0}`). 19 | The initial state of the memory must be non-zero. 20 | learn_init: bool 21 | If ``True``, initial state of the memory is learned. 22 | """ 23 | def __init__(self, memory_shape, 24 | memory_init=lasagne.init.Constant(1e-6), 25 | learn_init=True, 26 | **kwargs): 27 | super(Memory, self).__init__(memory_shape, **kwargs) 28 | self.memory_init = self.add_param( 29 | memory_init, memory_shape, 30 | name='memory_init', trainable=learn_init, regularizable=False) 31 | -------------------------------------------------------------------------------- /ntm/nonlinearities.py: -------------------------------------------------------------------------------- 1 | import theano.tensor as T 2 | 3 | 4 | class ClippedLinear(object): 5 | """ 6 | Clipped linear activation. 7 | """ 8 | def __init__(self, low=0., high=1.): 9 | super(ClippedLinear, self).__init__() 10 | self.low = low 11 | self.high = high 12 | 13 | def __call__(self, x): 14 | return T.clip(x, self.low, self.high) 15 | 16 | def hard_sigmoid(x): 17 | return T.nnet.hard_sigmoid(x) -------------------------------------------------------------------------------- /ntm/similarities.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | 5 | 6 | def cosine_similarity(x, y, eps=1e-6): 7 | r""" 8 | Cosine similarity between a vector and each row of a base matrix. 9 | 10 | Parameters 11 | ---------- 12 | x: a 3D Theano variable 13 | Vector to compare to each row of the matrix y. 14 | y: a 3D Theano variable 15 | Matrix to be compared to 16 | eps: float 17 | Precision of the operation (necessary for differentiability). 18 | 19 | Return 20 | ------ 21 | z: a 3D Theano variable 22 | A vector whose components are the cosine similarities 23 | between x and each row of y. 
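    Editor's note on shapes: despite the vector/matrix wording above, both
    arguments are batched 3D tensors; for a ``(16, 4, 20)`` key and a
    ``(16, 128, 20)`` memory the result has shape ``(16, 4, 128)`` (one
    similarity per example, head and memory row). Keeping ``eps`` inside the
    square root keeps the denominator non-zero, so the expression remains
    differentiable even when a row of ``x`` or ``y`` is all zeros.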
24 | """ 25 | z = T.batched_dot(x, y.dimshuffle(0, 2, 1)) 26 | z /= T.sqrt(T.sum(x * x, axis=2).dimshuffle(0, 1, 'x') * T.sum(y * y, axis=2).dimshuffle(0, 'x', 1) + eps) 27 | 28 | return z 29 | -------------------------------------------------------------------------------- /ntm/test/test_heads.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | import theano 4 | import theano.tensor as T 5 | import numpy as np 6 | 7 | import lasagne.nonlinearities 8 | from lasagne.theano_extensions import padding 9 | 10 | 11 | def test_content_addressing(): 12 | from ntm.similarities import cosine_similarity 13 | beta_var, key_var, memory_var = T.tensor3s('beta', 'key', 'memory') 14 | 15 | beta_var = T.addbroadcast(beta_var, 2) 16 | betaK = beta_var * cosine_similarity(key_var, memory_var) 17 | w_c = lasagne.nonlinearities.softmax(betaK.reshape((16 * 4, 128))) 18 | w_c = w_c.reshape(betaK.shape) 19 | 20 | content_addressing_fn = theano.function([beta_var, key_var, memory_var], w_c) 21 | 22 | beta = np.random.rand(16, 4, 1) 23 | key = np.random.rand(16, 4, 20) 24 | memory = np.random.rand(16, 128, 20) 25 | 26 | weights = content_addressing_fn(beta, key, memory) 27 | weights_manual = np.zeros_like(weights) 28 | 29 | def softmax(x): 30 | y = np.exp(x.T - np.max(x, axis=1)) 31 | z = y / np.sum(y, axis=0) 32 | return z.T 33 | 34 | betaK_manual = np.zeros((16, 4, 128)) 35 | for i in range(16): 36 | for j in range(4): 37 | for k in range(128): 38 | betaK_manual[i, j, k] = beta[i, j, 0] * np.dot(key[i, j], \ 39 | memory[i, k]) / np.sqrt(np.sum(key[i, j] * key[i, j]) * \ 40 | np.sum(memory[i, k] * memory[i, k]) + 1e-6) 41 | for i in range(16): 42 | weights_manual[i] = softmax(betaK_manual[i]) 43 | 44 | assert weights.shape == (16, 4, 128) 45 | assert np.allclose(np.sum(weights, axis=2), np.ones((16, 4))) 46 | assert np.allclose(weights, weights_manual) 47 | 48 | 49 | def test_convolutional_shift(): 50 | weights_var, shift_var = T.tensor3s('weights', 'shift') 51 | num_shifts = 3 52 | 53 | weights_reshaped = weights_var.reshape((16 * 4, 128)) 54 | weights_reshaped = weights_reshaped.dimshuffle(0, 'x', 'x', 1) 55 | shift_reshaped = shift_var.reshape((16 * 4, num_shifts)) 56 | shift_reshaped = shift_reshaped.dimshuffle(0, 'x', 'x', 1) 57 | pad = (num_shifts // 2, (num_shifts - 1) // 2) 58 | weights_padded = padding.pad(weights_reshaped, [pad], batch_ndim=3) 59 | convolution = T.nnet.conv2d(weights_padded, shift_reshaped, 60 | input_shape=(16 * 4, 1, 1, 128 + pad[0] + pad[1]), 61 | filter_shape=(16 * 4, 1, 1, num_shifts), 62 | subsample=(1, 1), 63 | border_mode='valid') 64 | w_tilde = convolution[T.arange(16 * 4), T.arange(16 * 4), 0, :] 65 | w_tilde = w_tilde.reshape((16, 4, 128)) 66 | 67 | convolutional_shift_fn = theano.function([weights_var, shift_var], w_tilde) 68 | 69 | weights = np.random.rand(16, 4, 128) 70 | shift = np.random.rand(16, 4, 3) 71 | 72 | weight_tilde = convolutional_shift_fn(weights, shift) 73 | weight_tilde_manual = np.zeros_like(weight_tilde) 74 | 75 | for i in range(16): 76 | for j in range(4): 77 | for k in range(128): 78 | # Filters in T.nnet.conv2d are reversed 79 | if (k - 1) >= 0: 80 | weight_tilde_manual[i, j, k] += shift[i, j, 2] * weights[i, j, k - 1] 81 | weight_tilde_manual[i, j, k] += shift[i, j, 1] * weights[i, j, k] 82 | if (k + 1) < 128: 83 | weight_tilde_manual[i, j, k] += shift[i, j, 0] * weights[i, j, k + 1] 84 | 85 | assert weight_tilde.shape == (16, 4, 128) 86 | assert np.allclose(weight_tilde, weight_tilde_manual) 87 
| 88 | 89 | def test_sharpening(): 90 | weight_var, gamma_var = T.tensor3s('weight', 'gamma') 91 | 92 | gamma_var = T.addbroadcast(gamma_var, 2) 93 | w = T.pow(weight_var + 1e-6, gamma_var) 94 | w /= T.sum(w, axis=2).dimshuffle(0, 1, 'x') 95 | 96 | sharpening_fn = theano.function([weight_var, gamma_var], w) 97 | 98 | weights = np.random.rand(16, 4, 128) 99 | gamma = np.random.rand(16, 4, 1) 100 | 101 | weight_t = sharpening_fn(weights, gamma) 102 | weight_t_manual = np.zeros_like(weight_t) 103 | 104 | for i in range(16): 105 | for j in range(4): 106 | for k in range(128): 107 | weight_t_manual[i, j, k] = np.power(weights[i, j, k] + 1e-6, gamma[i, j]) 108 | weight_t_manual[i, j] /= np.sum(weight_t_manual[i, j]) 109 | 110 | assert weight_t.shape == (16, 4, 128) 111 | assert np.allclose(weight_t, weight_t_manual) -------------------------------------------------------------------------------- /ntm/test/test_layers.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | import theano 4 | import theano.tensor as T 5 | import numpy as np 6 | 7 | from lasagne.layers import InputLayer, ReshapeLayer, DenseLayer 8 | from lasagne.layers import get_output, get_all_param_values, set_all_param_values 9 | from ntm.layers import NTMLayer 10 | from ntm.heads import WriteHead, ReadHead 11 | from ntm.controllers import DenseController 12 | from ntm.memory import Memory 13 | 14 | 15 | def model(input_var, batch_size=1): 16 | l_input = InputLayer((batch_size, None, 8), input_var=input_var) 17 | batch_size_var, seqlen, _ = l_input.input_var.shape 18 | 19 | # Neural Turing Machine Layer 20 | memory = Memory((128, 20), name='memory') 21 | controller = DenseController(l_input, memory_shape=(128, 20), 22 | num_units=100, num_reads=1, name='controller') 23 | heads = [ 24 | WriteHead(controller, num_shifts=3, memory_shape=(128, 20), name='write'), 25 | ReadHead(controller, num_shifts=3, memory_shape=(128, 20), name='read') 26 | ] 27 | l_ntm = NTMLayer(l_input, memory=memory, controller=controller, heads=heads) 28 | 29 | # Output Layer 30 | l_output_reshape = ReshapeLayer(l_ntm, (-1, 100)) 31 | l_output_dense = DenseLayer(l_output_reshape, num_units=8, name='dense') 32 | l_output = ReshapeLayer(l_output_dense, (batch_size_var if batch_size \ 33 | is None else batch_size, seqlen, 8)) 34 | 35 | return l_output 36 | 37 | 38 | def test_batch_size(): 39 | input_var01, input_var16 = T.tensor3s('input01', 'input16') 40 | l_output01 = model(input_var01, batch_size=1) 41 | l_output16 = model(input_var16, batch_size=16) 42 | 43 | # Share the parameters for both models 44 | params01 = get_all_param_values(l_output01) 45 | set_all_param_values(l_output16, params01) 46 | 47 | posterior_fn01 = theano.function([input_var01], get_output(l_output01)) 48 | posterior_fn16 = theano.function([input_var16], get_output(l_output16)) 49 | 50 | example_input = np.random.rand(16, 30, 8) 51 | example_output16 = posterior_fn16(example_input) 52 | example_output01 = np.zeros_like(example_output16) 53 | 54 | for i in range(16): 55 | example_output01[i] = posterior_fn01(example_input[i][np.newaxis, :, :]) 56 | 57 | assert example_output16.shape == (16, 30, 8) 58 | assert np.allclose(example_output16, example_output01, atol=1e-3) 59 | 60 | 61 | def test_batch_size_none(): 62 | input_var = T.tensor3('input') 63 | l_output = model(input_var, batch_size=None) 64 | posterior_fn = theano.function([input_var], get_output(l_output)) 65 | 66 | example_input = np.random.rand(16, 30, 8) 67 | example_output 
= posterior_fn(example_input) 68 | 69 | assert example_output.shape == (16, 30, 8) 70 | -------------------------------------------------------------------------------- /ntm/test/test_similarities.py: -------------------------------------------------------------------------------- 1 | import pytest 2 | 3 | import theano 4 | import theano.tensor as T 5 | import numpy as np 6 | 7 | 8 | def test_cosine_similarity(): 9 | from ntm.similarities import cosine_similarity 10 | 11 | key_var, memory_var = T.tensor3s('key', 'memory') 12 | cosine_similarity_fn = theano.function([key_var, memory_var], \ 13 | cosine_similarity(key_var, memory_var, eps=1e-6)) 14 | 15 | test_key = np.random.rand(16, 4, 20) 16 | test_memory = np.random.rand(16, 128, 20) 17 | 18 | test_output = cosine_similarity_fn(test_key, test_memory) 19 | test_output_manual = np.zeros_like(test_output) 20 | 21 | for i in range(16): 22 | for j in range(4): 23 | for k in range(128): 24 | test_output_manual[i, j, k] = np.dot(test_key[i, j], test_memory[i, k]) / \ 25 | np.sqrt(np.sum(test_key[i, j] * test_key[i, j]) * np.sum(test_memory[i, k] * \ 26 | test_memory[i, k]) + 1e-6) 27 | 28 | assert np.allclose(test_output, test_output_manual) 29 | -------------------------------------------------------------------------------- /ntm/updates.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | 5 | from lasagne.updates import get_or_compute_grads 6 | from collections import OrderedDict 7 | 8 | def graves_rmsprop(loss_or_grads, params, learning_rate=1e-4, chi=0.95, alpha=0.9, epsilon=1e-4): 9 | r""" 10 | Alex Graves' RMSProp [1]_. 11 | 12 | .. math :: 13 | n_{i} &= \chi * n_{i-1} + (1 - \chi) * grad^{2}\\ 14 | g_{i} &= \chi * g_{i-1} + (1 - \chi) * grad\\ 15 | \Delta_{i} &= \alpha * \Delta_{i-1} - learning\_rate * grad / 16 | \sqrt{n_{i} - g_{i}^{2} + \epsilon}\\ 17 | w_{i} &= w_{i-1} + \Delta_{i} 18 | 19 | References 20 | ---------- 21 | .. [1] Graves, Alex. 22 | "Generating Sequences With Recurrent Neural Networks", p.23 23 | arXiv:1308.0850 24 | 25 | """ 26 | grads = get_or_compute_grads(loss_or_grads, params) 27 | updates = OrderedDict() 28 | 29 | for param, grad in zip(params, grads): 30 | value = param.get_value(borrow=True) 31 | n = theano.shared(np.zeros(value.shape, dtype=value.dtype), 32 | broadcastable=param.broadcastable) 33 | g = theano.shared(np.zeros(value.shape, dtype=value.dtype), 34 | broadcastable=param.broadcastable) 35 | delta = theano.shared(np.zeros(value.shape, dtype=value.dtype), 36 | broadcastable=param.broadcastable) 37 | n_ip1 = chi * n + (1. - chi) * grad ** 2 38 | g_ip1 = chi * g + (1.
- chi) * grad 39 | delta_ip1 = alpha * delta - learning_rate * grad / T.sqrt(n_ip1 - \ 40 | g_ip1 ** 2 + epsilon) 41 | updates[n] = n_ip1 42 | updates[g] = g_ip1 43 | updates[delta] = delta_ip1 44 | updates[param] = param + delta_ip1 45 | 46 | return updates 47 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | appdirs==1.4.3 2 | cycler==0.10.0 3 | functools32==3.2.3.post2 4 | matplotlib==2.0.0 5 | numpy==1.12.1 6 | packaging==16.8 7 | pandas  # used by utils/visualization.py 8 | py==1.4.33 9 | pyparsing==2.2.0 10 | pytest==3.0.7 11 | python-dateutil==2.6.0 12 | pytz==2016.10 13 | scipy==0.19.0 14 | six==1.10.0 15 | subprocess32==3.2.7 16 | Theano==0.9.0 17 | 18 | git+ssh://git@github.com/Lasagne/Lasagne.git -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | from setuptools import find_packages 3 | 4 | 5 | setup(name='NTM-Lasagne', 6 | version='0.3.0', 7 | description='Neural Turing Machines in Theano with Lasagne', 8 | author='Tristan Deleu', 9 | author_email='tristan.deleu@snips.ai', 10 | url='', 11 | download_url='', 12 | license='MIT', 13 | install_requires=[ 14 | 'numpy>=1.12.1', 15 | 'theano==0.9.0' 16 | ], 17 | packages=['ntm'], 18 | include_package_data=False, 19 | zip_safe=False) -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/snipsco/ntm-lasagne/65c950b01f52afb87cf3dccc963d8bbc5b1dbf69/utils/__init__.py -------------------------------------------------------------------------------- /utils/generators.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import numpy as np 3 | 4 | 5 | class Task(object): 6 | 7 | def __init__(self, max_iter=None, batch_size=1): 8 | self.max_iter = max_iter 9 | self.batch_size = batch_size 10 | self.num_iter = 0 11 | 12 | def __iter__(self): 13 | return self 14 | 15 | def __next__(self): 16 | return self.next() 17 | 18 | def next(self): 19 | if (self.max_iter is None) or (self.num_iter < self.max_iter): 20 | self.num_iter += 1 21 | params = self.sample_params() 22 | return (self.num_iter - 1), self.sample(**params) 23 | else: 24 | raise StopIteration() 25 | 26 | def sample_params(self): 27 | raise NotImplementedError() 28 | 29 | def sample(self): 30 | raise NotImplementedError() 31 | 32 | 33 | class CopyTask(Task): 34 | 35 | def __init__(self, size, max_length, min_length=1, max_iter=None, \ 36 | batch_size=1, end_marker=False): 37 | super(CopyTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 38 | self.size = size 39 | self.min_length = min_length 40 | self.max_length = max_length 41 | self.end_marker = end_marker 42 | 43 | def sample_params(self, length=None): 44 | if length is None: 45 | length = np.random.randint(self.min_length, self.max_length + 1) 46 | return {'length': length} 47 | 48 | def sample(self, length): 49 | sequence = np.random.binomial(1, 0.5, (self.batch_size, length, self.size)) 50 | example_input = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 51 | self.size + 1), dtype=theano.config.floatX) 52 | example_output = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 53 | self.size + 1), dtype=theano.config.floatX) 54 | 55 |
example_input[:, :length, :self.size] = sequence 56 | example_input[:, length, -1] = 1 57 | example_output[:, length + 1:2 * length + 1, :self.size] = sequence 58 | if self.end_marker: 59 | example_output[:, -1, -1] = 1 60 | 61 | return example_input, example_output 62 | 63 | 64 | class RepeatCopyTask(Task): 65 | 66 | def __init__(self, size, max_length, max_repeats=20, min_length=1, \ 67 | min_repeats=1, unary=False, max_iter=None, batch_size=1, end_marker=False): 68 | super(RepeatCopyTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 69 | self.size = size 70 | self.min_length = min_length 71 | self.max_length = max_length 72 | self.min_repeats = min_repeats 73 | self.max_repeats = max_repeats 74 | self.unary = unary 75 | self.end_marker = end_marker 76 | 77 | def sample_params(self, length=None, repeats=None): 78 | if length is None: 79 | length = np.random.randint(self.min_length, self.max_length + 1) 80 | if repeats is None: 81 | repeats = np.random.randint(self.min_repeats, self.max_repeats + 1) 82 | return {'length': length, 'repeats': repeats} 83 | 84 | def sample(self, length, repeats): 85 | sequence = np.random.binomial(1, 0.5, (self.batch_size, length, self.size)) 86 | num_repeats_length = repeats if self.unary else 1 87 | example_input = np.zeros((self.batch_size, (repeats + 1) * length + \ 88 | num_repeats_length + 1 + self.end_marker, self.size + 2), dtype=theano.config.floatX) 89 | example_output = np.zeros((self.batch_size, (repeats + 1) * length + \ 90 | num_repeats_length + 1 + self.end_marker, self.size + 2), dtype=theano.config.floatX) 91 | 92 | example_input[:, :length, :self.size] = sequence 93 | for j in range(repeats): 94 | example_output[:, (j + 1) * length + num_repeats_length + 1:\ 95 | (j + 2) * length + num_repeats_length + 1, :self.size] = sequence 96 | if self.unary: 97 | example_input[:, length:length + repeats, -2] = 1 98 | else: 99 | example_input[:, length, -2] = repeats / float(self.max_repeats) 100 | example_input[:, length + num_repeats_length, -1] = 1 101 | if self.end_marker: 102 | example_output[:, -1, -1] = 1 103 | 104 | return example_input, example_output 105 | 106 | 107 | class AssociativeRecallTask(Task): 108 | 109 | def __init__(self, size, max_item_length, max_num_items, \ 110 | min_item_length=1, min_num_items=2, max_iter=None, batch_size=1): 111 | super(AssociativeRecallTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 112 | self.size = size 113 | self.max_item_length = max_item_length 114 | self.max_num_items = max_num_items 115 | self.min_item_length = min_item_length 116 | self.min_num_items = min_num_items 117 | 118 | def sample_params(self, item_length=None, num_items=None): 119 | if item_length is None: 120 | item_length = np.random.randint(self.min_item_length, \ 121 | self.max_item_length + 1) 122 | if num_items is None: 123 | num_items = np.random.randint(self.min_num_items, \ 124 | self.max_num_items + 1) 125 | return {'item_length': item_length, 'num_items': num_items} 126 | 127 | def sample(self, item_length, num_items): 128 | def item_slice(j): 129 | slice_idx = j * (item_length + 1) + 1 130 | return slice(slice_idx, slice_idx + item_length) 131 | 132 | items = np.random.binomial(1, 0.5, (self.batch_size, item_length, self.size, num_items)) 133 | queries = np.random.randint(num_items - 1, size=self.batch_size) 134 | example_input = np.zeros((self.batch_size, (item_length + 1) * (num_items + 2), \ 135 | self.size + 2), dtype=theano.config.floatX) 136 | example_output = np.zeros((self.batch_size, 
(item_length + 1) * (num_items + 2), \ 137 | self.size + 2), dtype=theano.config.floatX) 138 | 139 | for j in range(num_items): 140 | example_input[:, j * (item_length + 1), -2] = 1 141 | example_input[:, item_slice(j), :self.size] = items[:,:,:,j] 142 | example_input[:, num_items * (item_length + 1), -1] = 1 143 | for batch in range(self.batch_size): 144 | example_input[batch, item_slice(num_items), :self.size] = items[batch,:,:,queries[batch]] 145 | example_output[batch, -item_length:, :self.size] = items[batch,:,:,queries[batch] + 1] 146 | example_input[:, (num_items + 1) * (item_length + 1), -1] = 1 147 | 148 | return example_input, example_output 149 | 150 | 151 | class DynamicNGramsTask(Task): 152 | 153 | def __init__(self, ngrams, max_length, min_length=1, max_iter=None, \ 154 | table=None, batch_size=1): 155 | super(DynamicNGramsTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 156 | self.ngrams = ngrams 157 | if table is None: 158 | table = self.make_table() 159 | self.table = table 160 | self.max_length = max(ngrams, max_length) 161 | self.min_length = min_length 162 | 163 | def make_table(self): 164 | return np.random.beta(0.5, 0.5, 1 << self.ngrams) 165 | 166 | def sample_params(self, length=None): 167 | if length is None: 168 | length = np.random.randint(self.min_length, self.max_length + 1) 169 | return {'length': length} 170 | 171 | def sample(self, length): 172 | sequence = np.zeros((self.batch_size, length + 1, 1), dtype=theano.config.floatX) 173 | head = np.random.binomial(1, 0.5, (self.batch_size, self.ngrams)) 174 | sequence[:, :self.ngrams, 0] = head 175 | index = np.dot(head, 1 << (np.arange(self.ngrams, 0, -1) - 1)) 176 | mask = (1 << (self.ngrams - 1)) - 1 177 | 178 | for j in range(self.ngrams, length + 1): 179 | b = np.random.binomial(1, self.table[index]) 180 | sequence[:, j, 0] = b 181 | index = ((index & mask) << 1) + b 182 | 183 | return sequence[:,:-1], sequence[:,1:] 184 | 185 | 186 | class DyckWordsTask(Task): 187 | 188 | def __init__(self, max_length, min_length=1, max_iter=None, batch_size=1): 189 | super(DyckWordsTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 190 | self.max_length = max_length 191 | self.min_length = min_length 192 | 193 | def sample_params(self, length=None): 194 | if length is None: 195 | length = np.random.randint(self.min_length, self.max_length + 1) 196 | return {'length': length} 197 | 198 | def sample(self, length): 199 | example_input = np.zeros((self.batch_size, 2 * length, 1), \ 200 | dtype=theano.config.floatX) 201 | example_output = np.zeros((self.batch_size, 2 * length, 1), \ 202 | dtype=theano.config.floatX) 203 | is_dyck_word = np.random.binomial(1, 0.5, self.batch_size)\ 204 | .astype(dtype=theano.config.floatX) 205 | 206 | for batch in range(self.batch_size): 207 | if is_dyck_word[batch]: 208 | word = self.get_random_dyck(length) 209 | else: 210 | word = self.get_random_non_dyck(length) 211 | example_input[batch, :, 0] = word 212 | example_output[batch, :, 0] = self.get_dyck_prefix(word) 213 | 214 | return example_input, example_output 215 | 216 | def get_dyck_prefix(self, word): 217 | def dyck_prefixes(prefixes_and_stack, u): 218 | prefixes, is_valid, stack = prefixes_and_stack 219 | if u: stack -= 1 220 | else: stack += 1 221 | if stack < 0: 222 | is_valid = False 223 | prefixes.append(is_valid and (stack == 0)) 224 | return (prefixes, is_valid, stack) 225 | 226 | prefixes, _, _ = reduce(dyck_prefixes, word, ([], True, 0)) 227 | return prefixes 228 | 229 | def get_random_dyck(self, n): 230 
| """ 231 | Return a random Dyck word of a given semilength `n` 232 | 233 | This algorithm is based on a conjugacy property between words in 234 | the language `L = S(u^n d^{n+1})` and *Dyck words* of length 2n, 235 | where `S` is the group of permutations. 236 | This 1-to-(2n+1) correspondance between these words is given by 237 | the cycle lemma: 238 | 239 | **Cycle Lemma**: Let `A = {u, d}` be a binary alphabet and `delta` 240 | a "height" function such that `delta(u) = +1` and `delta(d) = -1`. 241 | For any word `w` in `A^*` such that `delta(w) = -1`, there exists 242 | a unique factorization `w = w_1 w_2` satisfying 243 | - `w_1` is not empty; 244 | - `w_2 w_1` has the Lukasiewicz property, i.e. any strict left 245 | factor of `w_2 w_1` satisfies `delta(v) >= 0`. 246 | where we extend the definition of `delta` to words by summing the 247 | heights of every individual letter. 248 | 249 | To summarize, here is the pseudo-code for this algorithm: 250 | - Pick a random word `w` in the language `L = S(u^n d^{n+1})` 251 | - Apply the cycle lemma to find the unique conjugate of 252 | `w` having the Lukasiewicz property 253 | - Return its prefix of length 2n, which is a Dyck word 254 | 255 | See: [Fla09], Notes I.47 and I.49 (pp.75-77) 256 | 257 | [Fla09] Analytic Combinatorics, *Philippe Flajolet, Robert Sedgewick* 258 | 259 | """ 260 | # Get a random element in L = u^n d^{n+1} 261 | w = [0] * n + [1] * (n + 1) 262 | np.random.shuffle(w) 263 | 264 | # Get the unique conjugate of w having the Lukasiewicz property 265 | # (Cycle Lemma) 266 | min_height = (0, 0) 267 | stack = 0 268 | for i in range(2 * n): 269 | if w[i]: stack -= 1 270 | else: stack += 1 271 | if stack < min_height[1]: 272 | min_height = (i + 1, stack) 273 | min_idx = min_height[0] 274 | luka = w[min_idx:] + w[:min_idx] 275 | 276 | return luka[:-1] 277 | 278 | def get_random_non_dyck(self, n): 279 | """ 280 | Return a random balanced non-Dyck word of semilength `n` 281 | 282 | The algorithm is based on the bijection between words in the 283 | language `L = S(u^{n-1} d^{n+1})` and the balanced words of length 284 | 2n that are not Dyck words. This transformation is given by the 285 | reflection of the letters after the first letter that violates 286 | the Dyck property (i.e. the first right parenthesis that does 287 | not have a matching left counterpart). The reflexion transformation 288 | is defined by transforming any left parenthesis in a right one 289 | and vice-versa. 
290 | 291 | To summarize, here is the pseudo-code for this algorithm: 292 | - Pick a random word `w` in the language `L = S(u^{n-1} d^{n+1})` 293 | - Find the first letter violating the Dyck property 294 | - Apply the reflection transformation to the following letters 295 | """ 296 | w = [0] * (n - 1) + [1] * (n + 1) 297 | np.random.shuffle(w) 298 | 299 | stack, reflection = (0, False) 300 | for i in range(2 * n): 301 | if reflection: 302 | w[i] = 1 * (not w[i]) 303 | else: 304 | if w[i]: stack -= 1 305 | else: stack += 1 306 | reflection = (stack < 0) 307 | return w 308 | 309 | 310 | class UpsideDownCopyTask(Task): 311 | 312 | def __init__(self, size, max_length, min_length=1, max_iter=None, \ 313 | batch_size=1, end_marker=False): 314 | super(UpsideDownCopyTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 315 | self.size = size 316 | self.min_length = min_length 317 | self.max_length = max_length 318 | self.end_marker = end_marker 319 | 320 | def sample_params(self, length=None): 321 | if length is None: 322 | length = np.random.randint(self.min_length, self.max_length + 1) 323 | return {'length': length} 324 | 325 | def sample(self, length): 326 | sequence = np.random.binomial(1, 0.5, (self.batch_size, length, self.size)) 327 | example_input = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 328 | self.size + 1), dtype=theano.config.floatX) 329 | example_output = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 330 | self.size + 1), dtype=theano.config.floatX) 331 | index = 0 332 | reversed_sequence = np.empty(shape = sequence.shape) 333 | for inner in sequence: 334 | reversed_sequence[index] = np.fliplr(inner) 335 | index += 1 336 | example_input[:, :length, :self.size] = sequence 337 | example_input[:, length, -1] = 1 338 | example_output[:, length + 1:2 * length + 1, :self.size] = reversed_sequence 339 | if self.end_marker: 340 | example_output[:, -1, -1] = 1 341 | 342 | return example_input, example_output 343 | 344 | 345 | class ReversedCopyTask(Task): 346 | 347 | def __init__(self, size, max_length, min_length=1, max_iter=None, \ 348 | batch_size=1, end_marker=False): 349 | super(ReversedCopyTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 350 | self.size = size 351 | self.min_length = min_length 352 | self.max_length = max_length 353 | self.end_marker = end_marker 354 | 355 | def sample_params(self, length=None): 356 | if length is None: 357 | length = np.random.randint(self.min_length, self.max_length + 1) 358 | return {'length': length} 359 | 360 | def sample(self, length): 361 | sequence = np.random.binomial(1, 0.5, (self.batch_size, length, self.size)) 362 | example_input = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 363 | self.size + 1), dtype=theano.config.floatX) 364 | example_output = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 365 | self.size + 1), dtype=theano.config.floatX) 366 | index = 0 367 | reversed_sequence = np.empty(shape = sequence.shape) 368 | for inner in sequence: 369 | reversed_sequence[index] = np.flipud(inner) 370 | index += 1 371 | example_input[:, :length, :self.size] = sequence 372 | example_input[:, length, -1] = 1 373 | example_output[:, length + 1:2 * length + 1, :self.size] = reversed_sequence 374 | if self.end_marker: 375 | example_output[:, -1, -1] = 1 376 | 377 | return example_input, example_output 378 | 379 | class SortTask(Task): 380 | 381 | def __init__(self, size, max_length, min_length=1, max_iter=None, \ 382 | batch_size=1, end_marker=False): 383 
| super(SortTask, self).__init__(max_iter=max_iter, batch_size=batch_size) 384 | self.size = size 385 | self.min_length = min_length 386 | self.max_length = max_length 387 | self.end_marker = end_marker 388 | 389 | def sample_params(self, length=None): 390 | if length is None: 391 | length = np.random.randint(self.min_length, self.max_length + 1) 392 | return {'length': length} 393 | 394 | def sample(self, length): 395 | sequence = np.random.binomial(1, 0.5, (self.batch_size, length, self.size)) 396 | example_input = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 397 | self.size + 1), dtype=theano.config.floatX) 398 | example_output = np.zeros((self.batch_size, 2 * length + 1 + self.end_marker, \ 399 | self.size + 1), dtype=theano.config.floatX) 400 | index = 0 401 | sorted_sequence = np.empty(shape = sequence.shape) 402 | for inner in sequence: 403 | sorted_sequence[index] = inner[np.lexsort(inner.T[::-1])] 404 | index += 1 405 | example_input[:, :length, :self.size] = sequence 406 | example_input[:, length, -1] = 1 407 | example_output[:, length + 1:2 * length + 1, :self.size] = sorted_sequence 408 | if self.end_marker: 409 | example_output[:, -1, -1] = 1 410 | 411 | return example_input, example_output 412 | -------------------------------------------------------------------------------- /utils/visualization.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import theano.tensor as T 3 | import numpy as np 4 | import pandas as pd 5 | 6 | import matplotlib 7 | import matplotlib.pyplot as plt 8 | 9 | class Dashboard(object): 10 | 11 | def __init__(self, ntm_fn, generator, memory_shape, ntm_layer_fn=None, \ 12 | cmap='bone', markers=[]): 13 | super(Dashboard, self).__init__() 14 | self.ntm_fn = ntm_fn 15 | self.ntm_layer_fn = ntm_layer_fn 16 | self.memory_shape = memory_shape 17 | self.generator = generator 18 | self.markers = markers 19 | self.cmap = cmap 20 | 21 | def sample(self, **params): 22 | params = self.generator.sample_params(**params) 23 | example_input, example_output = self.generator.sample(**params) 24 | self.show(example_input, example_output, params) 25 | 26 | def show(self, example_input, example_output, params): 27 | example_prediction = self.ntm_fn(example_input) 28 | num_columns = 1 29 | if self.ntm_layer_fn is not None: 30 | num_columns = 3 31 | example_ntm = self.ntm_layer_fn(example_input) 32 | subplot_shape = (3, num_columns) 33 | title_props = matplotlib.font_manager.FontProperties(weight='bold', \ 34 | size=9) 35 | 36 | ax1 = plt.subplot2grid(subplot_shape, (0, num_columns - 1)) 37 | ax1.imshow(example_input[0].T, interpolation='nearest', cmap=self.cmap, 38 | vmin=0.0, vmax=1.0) 39 | ax1.set_title('Input') 40 | ax1.title.set_font_properties(title_props) 41 | ax1.get_xaxis().set_visible(False) 42 | ax1.get_yaxis().set_visible(False) 43 | 44 | ax2 = plt.subplot2grid(subplot_shape, (1, num_columns - 1)) 45 | ax2.imshow(example_output[0].T, interpolation='nearest', cmap=self.cmap, 46 | vmin=0.0, vmax=1.0) 47 | ax2.set_title('Output') 48 | ax2.title.set_font_properties(title_props) 49 | ax2.get_xaxis().set_visible(False) 50 | ax2.get_yaxis().set_visible(False) 51 | 52 | ax3 = plt.subplot2grid(subplot_shape, (2, num_columns - 1)) 53 | ax3.imshow(example_prediction[0].T, interpolation='nearest', \ 54 | cmap=self.cmap, vmin=0.0, vmax=1.0) 55 | ax3.set_title('Prediction') 56 | ax3.title.set_font_properties(title_props) 57 | ax3.get_xaxis().set_visible(False) 58 | ax3.get_yaxis().set_visible(False) 59 | 60 | if self.ntm_layer_fn is not None: 61
| ax4 = plt.subplot2grid(subplot_shape, (0, 0), rowspan=3) 62 | ax4.imshow(example_ntm[3][0,:,0].T, interpolation='nearest', \ 63 | cmap=self.cmap, vmin=0.0, vmax=1.0) 64 | ax4.set_title('Write Weights') 65 | ax4.title.set_font_properties(title_props) 66 | ax4.get_xaxis().set_visible(False) 67 | for marker in self.markers: 68 | marker_style = marker.get('style', {}) 69 | ax4.plot([marker['location'](params), \ 70 | marker['location'](params)], [0, \ 71 | self.memory_shape[0] - 1], **marker_style) 72 | ax4.set_xlim([-0.5, example_input.shape[1] - 0.5]) 73 | ax4.set_ylim([-0.5, self.memory_shape[0] - 0.5]) 74 | ax4.tick_params(axis='y', labelsize=9) 75 | 76 | ax5 = plt.subplot2grid(subplot_shape, (0, 1), rowspan=3) 77 | ax5.imshow(example_ntm[4][0,:,0].T, interpolation='nearest', \ 78 | cmap=self.cmap, vmin=0.0, vmax=1.0) 79 | ax5.set_title('Read Weights') 80 | ax5.title.set_font_properties(title_props) 81 | ax5.get_xaxis().set_visible(False) 82 | for marker in self.markers: 83 | marker_style = marker.get('style', {}) 84 | ax5.plot([marker['location'](params), \ 85 | marker['location'](params)], [0, \ 86 | self.memory_shape[0] - 1], **marker_style) 87 | ax5.set_xlim([-0.5, example_input.shape[1] - 0.5]) 88 | ax5.set_ylim([-0.5, self.memory_shape[0] - 0.5]) 89 | ax5.tick_params(axis='y', labelsize=9) 90 | 91 | plt.show() 92 | 93 | 94 | def learning_curve(scores): 95 | sc = pd.Series(scores) 96 | ma = pd.rolling_mean(sc, window=500) 97 | 98 | ax = plt.subplot(1, 1, 1) 99 | ax.plot(sc.index, sc, color='lightgray') 100 | ax.plot(ma.index, ma, color='red') 101 | ax.set_yscale('log') 102 | ax.set_xlim(sc.index.min(), sc.index.max()) 103 | plt.show() 104 | --------------------------------------------------------------------------------
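Editor's usage sketch for the task generators above (illustration only; a real
training script would feed these batches to the NTM and an update rule such as
``graves_rmsprop``):

    from utils.generators import CopyTask

    generator = CopyTask(size=8, max_length=5, max_iter=3, batch_size=2)
    for i, (example_input, example_output) in generator:
        # input : (batch, 2 * length + 1, size + 1) -- the random sequence, a
        #         delimiter channel, then zeros while the copy is produced
        # output: same shape, all zeros until the delimiter, then the copy
        print(i, example_input.shape, example_output.shape)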