├── .gitignore
├── LICENSE.txt
├── README.md
├── docs
    └── index.html
├── tagger.dy.py
├── tagger.pt.py
├── tagger.tf.py
└── visualise.py


/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | build/
 12 | develop-eggs/
 13 | dist/
 14 | downloads/
 15 | eggs/
 16 | .eggs/
 17 | lib/
 18 | lib64/
 19 | parts/
 20 | sdist/
 21 | var/
 22 | wheels/
 23 | *.egg-info/
 24 | .installed.cfg
 25 | *.egg
 26 | MANIFEST
 27 | 
 28 | # PyInstaller
 29 | #  Usually these files are written by a python script from a template
 30 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 31 | *.manifest
 32 | *.spec
 33 | 
 34 | # Installer logs
 35 | pip-log.txt
 36 | pip-delete-this-directory.txt
 37 | 
 38 | # Unit test / coverage reports
 39 | htmlcov/
 40 | .tox/
 41 | .coverage
 42 | .coverage.*
 43 | .cache
 44 | nosetests.xml
 45 | coverage.xml
 46 | *.cover
 47 | .hypothesis/
 48 | .pytest_cache/
 49 | 
 50 | # Translations
 51 | *.mo
 52 | *.pot
 53 | 
 54 | # Django stuff:
 55 | *.log
 56 | local_settings.py
 57 | db.sqlite3
 58 | 
 59 | # Flask stuff:
 60 | instance/
 61 | .webassets-cache
 62 | 
 63 | # Scrapy stuff:
 64 | .scrapy
 65 | 
 66 | # Sphinx documentation
 67 | docs/_build/
 68 | 
 69 | # PyBuilder
 70 | target/
 71 | 
 72 | # Jupyter Notebook
 73 | .ipynb_checkpoints
 74 | 
 75 | # pyenv
 76 | .python-version
 77 | 
 78 | # celery beat schedule file
 79 | celerybeat-schedule
 80 | 
 81 | # SageMath parsed files
 82 | *.sage.py
 83 | 
 84 | # Environments
 85 | .env
 86 | .venv
 87 | env/
 88 | venv/
 89 | ENV/
 90 | env.bak/
 91 | venv.bak/
 92 | 
 93 | # Spyder project settings
 94 | .spyderproject
 95 | .spyproject
 96 | 
 97 | # Rope project settings
 98 | .ropeproject
 99 | 
100 | # mkdocs documentation
101 | /site
102 | 
103 | # mypy
104 | .mypy_cache/
105 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2018, Jonathan K Kummerfeld <jkummerf@umich.edu>
 2 | 
 3 | Permission to use, copy, modify, and/or distribute this software for any
 4 | purpose with or without fee is hereby granted, provided that the above
 5 | copyright notice and this permission notice appear in all copies.
 6 | 
 7 | THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH
 8 | REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
 9 | FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT,
10 | INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
11 | LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
12 | OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
13 | PERFORMANCE OF THIS SOFTWARE.
14 | 
15 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Neural Part-of-Speech tagger examples
 2 | 
 3 | This repository contains the code for DyNet, PyTorch and Tensorflow versions of a reasonably good POS tagger.
 4 | It also contains a program to convert that code (and comments) into a website for easy reading and comparison.
 5 | 
 6 | For the website, go [here](http://jkk.name/neural-tagger-tutorial/).
 7 | 
 8 | To generate the site, install the Python library `pygments` and run:
 9 | 
10 | ```
11 | ./visualise.py tagger.dy.py tagger.pt.py tagger.tf.py > docs/index.html
12 | ```
13 | 


--------------------------------------------------------------------------------
/tagger.dy.py:
--------------------------------------------------------------------------------
  1 | #### We use argparse for processing command line arguments, random for shuffling our data, sys for flushing output, and numpy for handling vectors of data.
  2 | # DyNet Implementation
  3 | import argparse
  4 | import random
  5 | import sys
  6 | 
  7 | import numpy as np
  8 | 
  9 | #### Typically, we would make many of these constants command line arguments and tune using the development set. For simplicity, I have fixed their values here to match Jiang, Liang and Zhang (CoLing 2018).
 10 | PAD = "__PAD__"
 11 | UNK = "__UNK__"
 12 | DIM_EMBEDDING = 100 # DIM_EMBEDDING - number of dimensions in our word embeddings.
 13 | LSTM_HIDDEN = 100 # LSTM_HIDDEN - number of dimensions in the hidden vectors for the LSTM. Based on NCRFpp (200 in the paper, but 100 per direction in code) 
 14 | BATCH_SIZE = 10 # BATCH_SIZE - number of examples considered in each model update.
 15 | LEARNING_RATE = 0.015 # LEARNING_RATE - adjusts how rapidly model parameters change by rescaling the gradient vector.
 16 | LEARNING_DECAY_RATE = 0.05 # LEARNING_DECAY_RATE - part of a rescaling of the learning rate after each pass through the data.
 17 | EPOCHS = 100 # EPOCHS - number of passes through the data in training.
 18 | KEEP_PROB = 0.5 # KEEP_PROB - probability of keeping a value when applying dropout.
 19 | GLOVE = "../data/glove.6B.100d.txt" # GLOVE - location of glove vectors.
 20 | WEIGHT_DECAY = 1e-8 # WEIGHT_DECAY - part of a rescaling of weights when an update occurs.
 21 | 
 22 | #### Dynet library imports. The first allows us to configure DyNet from within code rather than on the command line: mem is the amount of system memory initially allocated (DyNet has its own memory management), autobatch toggles automatic parallelisation of computations, weight_decay rescales weights by (1 - decay) after every update, random_seed sets the seed for random number generation.
 23 | import dynet_config
 24 | dynet_config.set(mem=256, autobatch=0, weight_decay=WEIGHT_DECAY,random_seed=0)
 25 | # dynet_config.set_gpu() for when we want to run with GPUs
 26 | import dynet as dy 
 27 | 
 28 | ####
 29 | # Data reading
 30 | def read_data(filename):
 31 |     #### We are expecting a minor variation on the raw Penn Treebank data, with one line per sentence, tokens separated by spaces, and the tag for each token placed next to its word (the | works as a separator as it does not appear as a token).
 32 |     """Example input:
 33 |     Pierre|NNP Vinken|NNP ,|, 61|CD years|NNS old|JJ
 34 |     """
 35 |     content = []
 36 |     with open(filename) as data_src:
 37 |         for line in data_src:
 38 |             t_p = [w.split("|") for w in line.strip().split()]
 39 |             tokens = [v[0] for v in t_p]
 40 |             tags = [v[1] for v in t_p]
 41 |             content.append((tokens, tags))
 42 |     return content
 43 | 
 44 | def simplify_token(token):
 45 |     chars = []
 46 |     for char in token:
 47 |         #### Reduce sparsity by replacing all digits with 0.
 48 |         if char.isdigit():
 49 |             chars.append("0")
 50 |         else:
 51 |             chars.append(char)
 52 |     return ''.join(chars)
 53 | 
 54 | def main():
 55 |     #### For the purpose of this example we only have arguments for locations of the data.
 56 |     parser = argparse.ArgumentParser(description='POS tagger.')
 57 |     parser.add_argument('training_data')
 58 |     parser.add_argument('dev_data')
 59 |     args = parser.parse_args()
 60 | 
 61 |     train = read_data(args.training_data)
 62 |     dev = read_data(args.dev_data)
 63 | 
 64 |     #### These indices map from strings to integers, which we apply to the input for our model. UNK is added to our mapping so that there is a vector we can use when we encounter unknown words. The special PAD symbol is used in PyTorch and Tensorflow as part of shaping the data in a batch to be a consistent size. It is not needed for DyNet, but kept for consistency.
 65 |     # Make indices
 66 |     id_to_token = [PAD, UNK]
 67 |     token_to_id = {PAD: 0, UNK: 1}
 68 |     id_to_tag = [PAD]
 69 |     tag_to_id = {PAD: 0}
 70 |     #### The '+ dev' may seem like an error, but is done here for convenience. It means in the next section we will retain the GloVe embeddings that appear in dev but not train. They won't be updated during training, so it does not mean we are getting information we shouldn't. In practise I would simply keep all the GloVe embeddings to avoid any potential incorrect use of the evaluation data.
 71 |     for tokens, tags in train + dev:
 72 |         for token in tokens:
 73 |             token = simplify_token(token)
 74 |             if token not in token_to_id:
 75 |                 token_to_id[token] = len(token_to_id)
 76 |                 id_to_token.append(token)
 77 |         for tag in tags:
 78 |             if tag not in tag_to_id:
 79 |                 tag_to_id[tag] = len(tag_to_id)
 80 |                 id_to_tag.append(tag)
 81 |     NWORDS = len(token_to_id)
 82 |     NTAGS = len(tag_to_id)
 83 | 
 84 |     # Load pre-trained GloVe vectors
 85 |     #### I am assuming these are 100-dimensional GloVe embeddings in their standard format.
 86 |     pretrained = {}
 87 |     for line in open(GLOVE):
 88 |         parts = line.strip().split()
 89 |         word = parts[0]
 90 |         vector = [float(v) for v in parts[1:]]
 91 |         pretrained[word] = vector
 92 |     #### We need the word vectors as a list to initialise the embeddings. Each entry in the list corresponds to the token with that index.
 93 |     pretrained_list = []
 94 |     scale = np.sqrt(3.0 / DIM_EMBEDDING)
 95 |     for word in id_to_token:
 96 |         # apply lower() because all GloVe vectors are for lowercase words
 97 |         if word.lower() in pretrained:
 98 |             pretrained_list.append(np.array(pretrained[word.lower()]))
 99 |         else:
100 |             #### For words that do not appear in GloVe we generate a random vector (note, the choice of scale here is important and we follow Jiang, Liang and Zhang (CoLing 2018).
101 |             random_vector = np.random.uniform(-scale, scale, [DIM_EMBEDDING])
102 |             pretrained_list.append(random_vector)
103 | 
104 |     #### The most significant difference between the frameworks is how the model parameters and their execution is defined. In DyNet we define parameters here and then define computation as needed. In PyTorch we use a class with the parameters defined in the constructor and the computation defined in the forward() method. In Tensorflow we define both parameters and computation here.
105 |     # Model creation
106 |     ####
107 |     model = dy.ParameterCollection()
108 |     # Create word embeddings and initialise
109 |     #### Lookup parameters are a matrix that supports efficient sparse lookup.
110 |     pEmbedding = model.add_lookup_parameters((NWORDS, DIM_EMBEDDING))
111 |     pEmbedding.init_from_array(np.array(pretrained_list))
112 |     # Create LSTM parameters
113 |     #### Objects that create LSTM cells and the necessary parameters.
114 |     stdv = 1.0 / np.sqrt(LSTM_HIDDEN) # Needed to match PyTorch
115 |     f_lstm = dy.VanillaLSTMBuilder(1, DIM_EMBEDDING, LSTM_HIDDEN, model,
116 |             forget_bias=(np.random.random_sample() - 0.5) * 2 * stdv)
117 |     b_lstm = dy.VanillaLSTMBuilder(1, DIM_EMBEDDING, LSTM_HIDDEN, model,
118 |             forget_bias=(np.random.random_sample() - 0.5) * 2 * stdv)
119 |     # Create output layer
120 |     pOutput = model.add_parameters((NTAGS, 2 * LSTM_HIDDEN))
121 |     
122 |     # Set recurrent dropout values (not used in this case)
123 |     f_lstm.set_dropouts(0.0, 0.0)
124 |     b_lstm.set_dropouts(0.0, 0.0)
125 |     # Initialise LSTM parameters
126 |     #### To match PyTorch, we initialise the parameters with an unconventional approach.
127 |     f_lstm.get_parameters()[0][0].set_value(
128 |             np.random.uniform(-stdv, stdv, [4 * LSTM_HIDDEN, DIM_EMBEDDING]))
129 |     f_lstm.get_parameters()[0][1].set_value(
130 |             np.random.uniform(-stdv, stdv, [4 * LSTM_HIDDEN, LSTM_HIDDEN]))
131 |     f_lstm.get_parameters()[0][2].set_value(
132 |             np.random.uniform(-stdv, stdv, [4 * LSTM_HIDDEN]))
133 |     b_lstm.get_parameters()[0][0].set_value(
134 |             np.random.uniform(-stdv, stdv, [4 * LSTM_HIDDEN, DIM_EMBEDDING]))
135 |     b_lstm.get_parameters()[0][1].set_value(
136 |             np.random.uniform(-stdv, stdv, [4 * LSTM_HIDDEN, LSTM_HIDDEN]))
137 |     b_lstm.get_parameters()[0][2].set_value(
138 |             np.random.uniform(-stdv, stdv, [4 * LSTM_HIDDEN]))
139 | 
140 |     #### The trainer object is used to update the model.
141 |     # Create the trainer
142 |     trainer = dy.SimpleSGDTrainer(model, learning_rate=LEARNING_RATE)
143 |     #### DyNet clips gradients by default, which we disable here (this can have a big impact on performance).
144 |     trainer.set_clip_threshold(-1)
145 | 
146 |     #### To make the code match across the three versions, we group together some framework specific values needed when doing a pass over the data.
147 |     expressions = (pEmbedding, pOutput, f_lstm, b_lstm, trainer)
148 |     #### Main training loop, in which we shuffle the data, set the learning rate, do one complete pass over the training data, then evaluate on the development data.
149 |     for epoch in range(EPOCHS):
150 |         random.shuffle(train)
151 | 
152 |         ####
153 |         # Update learning rate
154 |         trainer.learning_rate = LEARNING_RATE / (1+ LEARNING_DECAY_RATE * epoch)
155 | 
156 |         #### Training pass.
157 |         loss, tacc = do_pass(train, token_to_id, tag_to_id, expressions, True)
158 |         #### Dev pass.
159 |         _, dacc = do_pass(dev, token_to_id, tag_to_id, expressions, False)
160 |         print("{} loss {} t-acc {} d-acc {}".format(epoch, loss, tacc, dacc))
161 | 
162 |     #### The syntax varies, but in all three cases either saving or loading the parameters of a model must be done after the model is defined.
163 |     # Save model
164 |     model.save("tagger.dy.model")
165 | 
166 |     # Load model
167 |     model.populate("tagger.dy.model")
168 | 
169 |     # Evaluation pass.
170 |     _, test_acc = do_pass(dev, token_to_id, tag_to_id, expressions, False)
171 |     print("Test Accuracy: {:.3f}".format(test_acc))
172 | 
173 | #### Inference (the same function for train and test).
174 | def do_pass(data, token_to_id, tag_to_id, expressions, train):
175 |     pEmbedding, pOutput, f_lstm, b_lstm, trainer = expressions
176 | 
177 | 
178 |     # Loop over batches
179 |     loss = 0
180 |     match = 0
181 |     total = 0
182 |     for start in range(0, len(data), BATCH_SIZE):
183 |         #### Form the batch and order it based on length (important for efficient processing in PyTorch).
184 |         batch = data[start : start + BATCH_SIZE]
185 |         batch.sort(key = lambda x: -len(x[0]))
186 |         #### Log partial results so we can conveniently check progress.
187 |         if start % 4000 == 0 and start > 0:
188 |             print(loss, match / total)
189 |             sys.stdout.flush()
190 | 
191 |         #### Start a new computation graph for this batch.
192 |         # Process batch
193 |         dy.renew_cg()
194 |         #### For each example, we will construct an expression that gives the loss.
195 |         loss_expressions = []
196 |         predicted = []
197 |         #### Convert tokens and tags from strings to numbers using the indices.
198 |         for n, (tokens, tags) in enumerate(batch):
199 |             token_ids = [token_to_id.get(simplify_token(t), 0) for t in tokens]
200 |             tag_ids = [tag_to_id[t] for t in tags]
201 | 
202 |             #### Now we define the computation to be performed with the model. Note that they are not applied yet, we are simply building the computation graph.
203 |             # Look up word embeddings
204 |             wembs = [dy.lookup(pEmbedding, w) for w in token_ids]
205 |             # Apply dropout
206 |             if train:
207 |                 wembs = [dy.dropout(w, 1.0 - KEEP_PROB) for w in wembs]
208 |             # Feed words into the LSTM
209 |             #### Create an expression for two LSTMs and feed in the embeddings (reversed in one case).
210 |             #### We pull out the output vector from the cell state at each step.
211 |             f_init = f_lstm.initial_state()
212 |             f_lstm_output = [x.output() for x in f_init.add_inputs(wembs)]
213 |             rev_embs = reversed(wembs)
214 |             b_init = b_lstm.initial_state()
215 |             b_lstm_output = [x.output() for x in b_init.add_inputs(rev_embs)]
216 | 
217 |             # For each output, calculate the output and loss
218 |             pred_tags = []
219 |             for f, b, t in zip(f_lstm_output, reversed(b_lstm_output), tag_ids):
220 |                 # Combine the outputs
221 |                 combined = dy.concatenate([f,b])
222 |                 # Apply dropout
223 |                 if train:
224 |                     combined = dy.dropout(combined, 1.0 - KEEP_PROB)
225 |                 # Matrix multiply to get scores for each tag
226 |                 r_t = pOutput * combined
227 |                 # Calculate cross-entropy loss
228 |                 if train:
229 |                     err = dy.pickneglogsoftmax(r_t, t)
230 |                     #### We are not actually evaluating the loss values here, instead we collect them together in a list. This enables DyNet's <a href="http://dynet.readthedocs.io/en/latest/tutorials_notebooks/Autobatching.html">autobatching</a>.
231 |                     loss_expressions.append(err)
232 |                 # Calculate the highest scoring tag
233 |                 #### This call to .npvalue() will lead to evaluation of the graph and so we don't actually get the benefits of autobatching. With some refactoring we could get the benefit back (simply keep the r_t expressions around and do this after the update), but that would have complicated this code.
234 |                 chosen = np.argmax(r_t.npvalue())
235 |                 pred_tags.append(chosen)
236 |             predicted.append(pred_tags)
237 | 
238 |         # combine the losses for the batch, do an update, and record the loss
239 |         if train:
240 |             loss_for_batch = dy.esum(loss_expressions)
241 |             loss_for_batch.backward()
242 |             trainer.update()
243 |             loss += loss_for_batch.scalar_value()
244 | 
245 |         ####
246 |         # Update the number of correct tags and total tags
247 |         for (_, g), a in zip(batch, predicted):
248 |             total += len(g)
249 |             for gt, at in zip(g, a):
250 |                 gt = tag_to_id[gt]
251 |                 if gt == at:
252 |                     match += 1
253 | 
254 |     return loss, match / total
255 | 
256 | if __name__ == '__main__':
257 |     main()
258 | 


--------------------------------------------------------------------------------
/tagger.pt.py:
--------------------------------------------------------------------------------
  1 | #### We use argparse for processing command line arguments, random for shuffling our data, sys for flushing output, and numpy for handling vectors of data.
  2 | # PyTorch Implementation
  3 | import argparse
  4 | import random
  5 | import sys
  6 | 
  7 | import numpy as np
  8 | 
  9 | #### Typically, we would make many of these constants command line arguments and tune using the development set. For simplicity, I have fixed their values here to match Jiang, Liang and Zhang (CoLing 2018).
 10 | PAD = "__PAD__"
 11 | UNK = "__UNK__"
 12 | DIM_EMBEDDING = 100 # DIM_EMBEDDING - number of dimensions in our word embeddings.
 13 | LSTM_HIDDEN = 100 # LSTM_HIDDEN - number of dimensions in the hidden vectors for the LSTM. Based on NCRFpp (200 in the paper, but 100 per direction in code) 
 14 | BATCH_SIZE = 10 # BATCH_SIZE - number of examples considered in each model update.
 15 | LEARNING_RATE = 0.015 # LEARNING_RATE - adjusts how rapidly model parameters change by rescaling the gradient vector.
 16 | LEARNING_DECAY_RATE = 0.05 # LEARNING_DECAY_RATE - part of a rescaling of the learning rate after each pass through the data.
 17 | EPOCHS = 100 # EPOCHS - number of passes through the data in training.
 18 | KEEP_PROB = 0.5 # KEEP_PROB - probability of keeping a value when applying dropout.
 19 | GLOVE = "../data/glove.6B.100d.txt" # GLOVE - location of glove vectors.
 20 | WEIGHT_DECAY = 1e-8 # WEIGHT_DECAY - part of a rescaling of weights when an update occurs.
 21 | 
 22 | #### PyTorch library import.
 23 | import torch
 24 | torch.manual_seed(0)
 25 | 
 26 | ####
 27 | # Data reading
 28 | def read_data(filename):
 29 |     #### We are expecting a minor variation on the raw Penn Treebank data, with one line per sentence, tokens separated by spaces, and the tag for each token placed next to its word (the | works as a separator as it does not appear as a token).
 30 |     """Example input:
 31 |     Pierre|NNP Vinken|NNP ,|, 61|CD years|NNS old|JJ
 32 |     """
 33 |     content = []
 34 |     with open(filename) as data_src:
 35 |         for line in data_src:
 36 |             t_p = [w.split("|") for w in line.strip().split()]
 37 |             tokens = [v[0] for v in t_p]
 38 |             tags = [v[1] for v in t_p]
 39 |             content.append((tokens, tags))
 40 |     return content
 41 | 
 42 | def simplify_token(token):
 43 |     chars = []
 44 |     for char in token:
 45 |         #### Reduce sparsity by replacing all digits with 0.
 46 |         if char.isdigit():
 47 |             chars.append("0")
 48 |         else:
 49 |             chars.append(char)
 50 |     return ''.join(chars)
 51 | 
 52 | def main():
 53 |     #### For the purpose of this example we only have arguments for locations of the data.
 54 |     parser = argparse.ArgumentParser(description='POS tagger.')
 55 |     parser.add_argument('training_data')
 56 |     parser.add_argument('dev_data')
 57 |     args = parser.parse_args()
 58 | 
 59 |     train = read_data(args.training_data)
 60 |     dev = read_data(args.dev_data)
 61 | 
 62 |     #### These indices map from strings to integers, which we apply to the input for our model. UNK is added to our mapping so that there is a vector we can use when we encounter unknown words. The special PAD symbol is used in PyTorch and Tensorflow as part of shaping the data in a batch to be a consistent size. It is not needed for DyNet, but kept for consistency.
 63 |     # Make indices
 64 |     id_to_token = [PAD, UNK]
 65 |     token_to_id = {PAD: 0, UNK: 1}
 66 |     id_to_tag = [PAD]
 67 |     tag_to_id = {PAD: 0}
 68 |     #### The '+ dev' may seem like an error, but is done here for convenience. It means in the next section we will retain the GloVe embeddings that appear in dev but not train. They won't be updated during training, so it does not mean we are getting information we shouldn't. In practise I would simply keep all the GloVe embeddings to avoid any potential incorrect use of the evaluation data.
 69 |     for tokens, tags in train + dev:
 70 |         for token in tokens:
 71 |             token = simplify_token(token)
 72 |             if token not in token_to_id:
 73 |                 token_to_id[token] = len(token_to_id)
 74 |                 id_to_token.append(token)
 75 |         for tag in tags:
 76 |             if tag not in tag_to_id:
 77 |                 tag_to_id[tag] = len(tag_to_id)
 78 |                 id_to_tag.append(tag)
 79 |     NWORDS = len(token_to_id)
 80 |     NTAGS = len(tag_to_id)
 81 | 
 82 |     # Load pre-trained GloVe vectors
 83 |     #### I am assuming these are 100-dimensional GloVe embeddings in their standard format.
 84 |     pretrained = {}
 85 |     for line in open(GLOVE):
 86 |         parts = line.strip().split()
 87 |         word = parts[0]
 88 |         vector = [float(v) for v in parts[1:]]
 89 |         pretrained[word] = vector
 90 |     #### We need the word vectors as a list to initialise the embeddings. Each entry in the list corresponds to the token with that index.
 91 |     pretrained_list = []
 92 |     scale = np.sqrt(3.0 / DIM_EMBEDDING)
 93 |     for word in id_to_token:
 94 |         # apply lower() because all GloVe vectors are for lowercase words
 95 |         if word.lower() in pretrained:
 96 |             pretrained_list.append(np.array(pretrained[word.lower()]))
 97 |         else:
 98 |             #### For words that do not appear in GloVe we generate a random vector (note, the choice of scale here is important and we follow Jiang, Liang and Zhang (CoLing 2018).
 99 |             random_vector = np.random.uniform(-scale, scale, [DIM_EMBEDDING])
100 |             pretrained_list.append(random_vector)
101 | 
102 |     #### The most significant difference between the frameworks is how the model parameters and their execution is defined. In DyNet we define parameters here and then define computation as needed. In PyTorch we use a class with the parameters defined in the constructor and the computation defined in the forward() method. In Tensorflow we define both parameters and computation here.
103 |     # Model creation
104 |     #### 
105 |     model = TaggerModel(NWORDS, NTAGS, pretrained_list, id_to_token)
106 |     # Create optimizer and configure the learning rate
107 |     optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE,
108 |             weight_decay=WEIGHT_DECAY)
109 |     #### The learning rate for each epoch is set by multiplying the initial rate by the factor produced by this function.
110 |     rescale_lr = lambda epoch: 1 / (1 + LEARNING_DECAY_RATE * epoch)
111 |     scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer,
112 |             lr_lambda=rescale_lr)
113 | 
114 |     #### To make the code match across the three versions, we group together some framework specific values needed when doing a pass over the data.
115 |     expressions = (model, optimizer)
116 |     #### Main training loop, in which we shuffle the data, set the learning rate, do one complete pass over the training data, then evaluate on the development data.
117 |     for epoch in range(EPOCHS):
118 |         random.shuffle(train)
119 | 
120 |         ####
121 |         # Update learning rate
122 |         #### First call to rescale_lr is with a 0, which is why this must be done before the pass over the data.
123 |         scheduler.step()
124 | 
125 |         #### Training mode (and evaluation mode below) do things like enable dropout components.
126 |         model.train() 
127 |         model.zero_grad()
128 |         #### Training pass.
129 |         loss, tacc = do_pass(train, token_to_id, tag_to_id, expressions,
130 |                 True)
131 | 
132 |         ####
133 |         model.eval() 
134 |         #### Dev pass.
135 |         _, dacc = do_pass(dev, token_to_id, tag_to_id, expressions, False)
136 |         print("{} loss {} t-acc {} d-acc {}".format(epoch, loss,
137 |             tacc, dacc))
138 | 
139 |     #### The syntax varies, but in all three cases either saving or loading the parameters of a model must be done after the model is defined.
140 |     # Save model
141 |     torch.save(model.state_dict(), "tagger.pt.model")
142 | 
143 |     # Load model
144 |     model.load_state_dict(torch.load('tagger.pt.model'))
145 | 
146 |     # Evaluation pass.
147 |     _, test_acc = do_pass(dev, token_to_id, tag_to_id, expressions, False)
148 |     print("Test Accuracy: {:.3f}".format(test_acc))
149 | 
150 | #### Neural network definition code. In PyTorch networks are defined using classes that extend Module.
151 | class TaggerModel(torch.nn.Module):
152 |     #### In the constructor we define objects that will do each of the computations.
153 |     def __init__(self, nwords, ntags, pretrained_list, id_to_token):
154 |         super().__init__()
155 | 
156 |         # Create word embeddings
157 |         pretrained_tensor = torch.FloatTensor(pretrained_list)
158 |         self.word_embedding = torch.nn.Embedding.from_pretrained(
159 |                 pretrained_tensor, freeze=False)
160 |         # Create input dropout parameter
161 |         self.word_dropout = torch.nn.Dropout(1 - KEEP_PROB)
162 |         # Create LSTM parameters
163 |         self.lstm = torch.nn.LSTM(DIM_EMBEDDING, LSTM_HIDDEN, num_layers=1,
164 |                 batch_first=True, bidirectional=True)
165 |         # Create output dropout parameter
166 |         self.lstm_output_dropout = torch.nn.Dropout(1 - KEEP_PROB)
167 |         # Create final matrix multiply parameters
168 |         self.hidden_to_tag = torch.nn.Linear(LSTM_HIDDEN * 2, ntags)
169 | 
170 |     def forward(self, sentences, labels, lengths, cur_batch_size):
171 |         max_length = sentences.size(1)
172 | 
173 |         # Look up word vectors
174 |         word_vectors = self.word_embedding(sentences)
175 |         # Apply dropout
176 |         dropped_word_vectors = self.word_dropout(word_vectors)
177 |         # Run the LSTM over the input, reshaping data for efficiency
178 |         #### Assuming the data is ordered longest to shortest, this provides a view of the data that fits with how cuDNN works.
179 |         packed_words = torch.nn.utils.rnn.pack_padded_sequence(
180 |                 dropped_word_vectors, lengths, True)
181 |         #### The None argument is an optional initial hidden state (default is a zero vector). The ignored return value contains the hidden states.
182 |         lstm_out, _ = self.lstm(packed_words, None)
183 |         #### Reverse the view shift made for cuDNN. Specifying total_length is not necessary in general (it can be inferred), but is necessary for parallel processing. The ignored return value contains the length of each sequence.
184 |         lstm_out, _ = torch.nn.utils.rnn.pad_packed_sequence(lstm_out,
185 |                 batch_first=True, total_length=max_length)
186 |         # Apply dropout
187 |         lstm_out_dropped = self.lstm_output_dropout(lstm_out)
188 |         # Matrix multiply to get scores for each tag
189 |         output_scores = self.hidden_to_tag(lstm_out_dropped)
190 | 
191 |         # Calculate loss and predictions
192 |         #### We reshape to [batch size * sequence length , ntags] for more efficient processing.
193 |         output_scores = output_scores.view(cur_batch_size * max_length, -1)
194 |         flat_labels = labels.view(cur_batch_size * max_length)
195 |         #### The ignore index refers to outputs to not score, which we use to ignore padding. 'reduction' defines how to combine the losses at each point in the sequence. The default is elementwise_mean, which would not do what we want.
196 |         loss_function = torch.nn.CrossEntropyLoss(ignore_index=0, reduction='sum')
197 |         loss = loss_function(output_scores, flat_labels)
198 |         predicted_tags  = torch.argmax(output_scores, 1)
199 |         #### Reshape to have dimensions [batch size , sequence length].
200 |         predicted_tags = predicted_tags.view(cur_batch_size, max_length)
201 |         return loss, predicted_tags
202 | 
203 | #### Inference (the same function for train and test).
204 | def do_pass(data, token_to_id, tag_to_id, expressions, train):
205 |     model, optimizer = expressions
206 | 
207 | 
208 |     # Loop over batches
209 |     loss = 0
210 |     match = 0
211 |     total = 0
212 |     for start in range(0, len(data), BATCH_SIZE):
213 |         #### Form the batch and order it based on length (important for efficient processing in PyTorch).
214 |         batch = data[start : start + BATCH_SIZE]
215 |         batch.sort(key = lambda x: -len(x[0]))
216 |         #### Log partial results so we can conveniently check progress.
217 |         if start % 4000 == 0 and start > 0:
218 |             print(loss, match / total)
219 |             sys.stdout.flush()
220 | 
221 |         ####
222 |         # Prepare inputs
223 |         #### Prepare input arrays, using .long() to cast the type from Tensor to LongTensor.
224 |         cur_batch_size = len(batch)
225 |         max_length = len(batch[0][0])
226 |         lengths = [len(v[0]) for v in batch]
227 |         input_array = torch.zeros((cur_batch_size, max_length)).long()
228 |         output_array = torch.zeros((cur_batch_size, max_length)).long()
229 |         #### Convert tokens and tags from strings to numbers using the indices.
230 |         for n, (tokens, tags) in enumerate(batch):
231 |             token_ids = [token_to_id.get(simplify_token(t), 0) for t in tokens]
232 |             tag_ids = [tag_to_id[t] for t in tags]
233 | 
234 |             #### Fill the arrays, leaving the remaining values as zero (our padding value).
235 |             input_array[n, :len(tokens)] = torch.LongTensor(token_ids)
236 |             output_array[n, :len(tags)] = torch.LongTensor(tag_ids)
237 | 
238 |         # Construct computation
239 |         #### Calling the model as a function will run its forward() function, which constructs the computations.
240 |         batch_loss, output = model(input_array, output_array, lengths,
241 |                 cur_batch_size)
242 | 
243 |         # Run computations
244 |         if train:
245 |             batch_loss.backward()
246 |             optimizer.step()
247 |             model.zero_grad()
248 |             #### To get the loss value we use .item().
249 |             loss += batch_loss.item()
250 |         #### Our output is an array (rather than a single value), so we use a different approach to get it into a usable form.
251 |         predicted = output.cpu().data.numpy()
252 | 
253 |         ####
254 |         # Update the number of correct tags and total tags
255 |         for (_, g), a in zip(batch, predicted):
256 |             total += len(g)
257 |             for gt, at in zip(g, a):
258 |                 gt = tag_to_id[gt]
259 |                 if gt == at:
260 |                     match += 1
261 | 
262 |     return loss, match / total
263 | 
264 | if __name__ == '__main__':
265 |     main()
266 | 


--------------------------------------------------------------------------------
/tagger.tf.py:
--------------------------------------------------------------------------------
  1 | #### We use argparse for processing command line arguments, random for shuffling our data, sys for flushing output, and numpy for handling vectors of data.
  2 | # Tensorflow Implementation
  3 | import argparse
  4 | import random
  5 | import sys
  6 | 
  7 | import numpy as np
  8 | 
  9 | #### Typically, we would make many of these constants command line arguments and tune using the development set. For simplicity, I have fixed their values here to match Jiang, Liang and Zhang (CoLing 2018).
 10 | PAD = "__PAD__"
 11 | UNK = "__UNK__"
 12 | DIM_EMBEDDING = 100 # DIM_EMBEDDING - number of dimensions in our word embeddings.
 13 | LSTM_HIDDEN = 100 # LSTM_HIDDEN - number of dimensions in the hidden vectors for the LSTM. Based on NCRFpp (200 in the paper, but 100 per direction in code) 
 14 | BATCH_SIZE = 10 # BATCH_SIZE - number of examples considered in each model update.
 15 | LEARNING_RATE = 0.015 # LEARNING_RATE - adjusts how rapidly model parameters change by rescaling the gradient vector.
 16 | LEARNING_DECAY_RATE = 0.05 # LEARNING_DECAY_RATE - part of a rescaling of the learning rate after each pass through the data.
 17 | EPOCHS = 100 # EPOCHS - number of passes through the data in training.
 18 | KEEP_PROB = 0.5 # KEEP_PROB - probability of keeping a value when applying dropout.
 19 | GLOVE = "../data/glove.6B.100d.txt" # GLOVE - location of glove vectors.
 20 | # WEIGHT_DECAY = 1e-8 Not used, see note at the bottom of the page
 21 | 
 22 | #### Tensorflow library import.
 23 | import tensorflow as tf
 24 | 
 25 | ####
 26 | # Data reading
 27 | def read_data(filename):
 28 |     #### We are expecting a minor variation on the raw Penn Treebank data, with one line per sentence, tokens separated by spaces, and the tag for each token placed next to its word (the | works as a separator as it does not appear as a token).
 29 |     """Example input:
 30 |     Pierre|NNP Vinken|NNP ,|, 61|CD years|NNS old|JJ
 31 |     """
 32 |     content = []
 33 |     with open(filename) as data_src:
 34 |         for line in data_src:
 35 |             t_p = [w.split("|") for w in line.strip().split()]
 36 |             tokens = [v[0] for v in t_p]
 37 |             tags = [v[1] for v in t_p]
 38 |             content.append((tokens, tags))
 39 |     return content
 40 | 
 41 | def simplify_token(token):
 42 |     chars = []
 43 |     for char in token:
 44 |         #### Reduce sparsity by replacing all digits with 0.
 45 |         if char.isdigit():
 46 |             chars.append("0")
 47 |         else:
 48 |             chars.append(char)
 49 |     return ''.join(chars)
 50 | 
 51 | def main():
 52 |     #### For the purpose of this example we only have arguments for locations of the data.
 53 |     parser = argparse.ArgumentParser(description='POS tagger.')
 54 |     parser.add_argument('training_data')
 55 |     parser.add_argument('dev_data')
 56 |     args = parser.parse_args()
 57 | 
 58 |     train = read_data(args.training_data)
 59 |     dev = read_data(args.dev_data)
 60 | 
 61 |     #### These indices map from strings to integers, which we apply to the input for our model. UNK is added to our mapping so that there is a vector we can use when we encounter unknown words. The special PAD symbol is used in PyTorch and Tensorflow as part of shaping the data in a batch to be a consistent size. It is not needed for DyNet, but kept for consistency.
 62 |     # Make indices
 63 |     id_to_token = [PAD, UNK]
 64 |     token_to_id = {PAD: 0, UNK: 1}
 65 |     id_to_tag = [PAD]
 66 |     tag_to_id = {PAD: 0}
 67 |     #### The '+ dev' may seem like an error, but is done here for convenience. It means in the next section we will retain the GloVe embeddings that appear in dev but not train. They won't be updated during training, so it does not mean we are getting information we shouldn't. In practise I would simply keep all the GloVe embeddings to avoid any potential incorrect use of the evaluation data.
 68 |     for tokens, tags in train + dev:
 69 |         for token in tokens:
 70 |             token = simplify_token(token)
 71 |             if token not in token_to_id:
 72 |                 token_to_id[token] = len(token_to_id)
 73 |                 id_to_token.append(token)
 74 |         for tag in tags:
 75 |             if tag not in tag_to_id:
 76 |                 tag_to_id[tag] = len(tag_to_id)
 77 |                 id_to_tag.append(tag)
 78 |     NWORDS = len(token_to_id)
 79 |     NTAGS = len(tag_to_id)
 80 | 
 81 |     # Load pre-trained GloVe vectors
 82 |     #### I am assuming these are 100-dimensional GloVe embeddings in their standard format.
 83 |     pretrained = {}
 84 |     for line in open(GLOVE):
 85 |         parts = line.strip().split()
 86 |         word = parts[0]
 87 |         vector = [float(v) for v in parts[1:]]
 88 |         pretrained[word] = vector
 89 |     #### We need the word vectors as a list to initialise the embeddings. Each entry in the list corresponds to the token with that index.
 90 |     pretrained_list = []
 91 |     scale = np.sqrt(3.0 / DIM_EMBEDDING)
 92 |     for word in id_to_token:
 93 |         # apply lower() because all GloVe vectors are for lowercase words
 94 |         if word.lower() in pretrained:
 95 |             pretrained_list.append(np.array(pretrained[word.lower()]))
 96 |         else:
 97 |             #### For words that do not appear in GloVe we generate a random vector (note, the choice of scale here is important and we follow Jiang, Liang and Zhang (CoLing 2018).
 98 |             random_vector = np.random.uniform(-scale, scale, [DIM_EMBEDDING])
 99 |             pretrained_list.append(random_vector)
100 | 
101 |     #### The most significant difference between the frameworks is how the model parameters and their execution is defined. In DyNet we define parameters here and then define computation as needed. In PyTorch we use a class with the parameters defined in the constructor and the computation defined in the forward() method. In Tensorflow we define both parameters and computation here.
102 |     # Model creation
103 |     ####
104 |     #### This line creates a new graph and makes it the default graph for operations to be registered to. It is not necessary here because we only have one graph, but is considered good practise (more discussion on <a href="https://stackoverflow.com/questions/39614938/why-do-we-need-tensorflow-tf-graph">Stackoverflow</a>.
105 |     with tf.Graph().as_default():
106 |         #### Placeholders are inputs/values that will be fed into the network each time it is run. We define their type, name, and shape (constant, 1D vector, 2D vector, etc). This includes what we normally think of as inputs (e.g. the tokens) as well as parameters we want to change at run time (e.g. the learning rate).
107 |         # Define inputs
108 |         e_input = tf.placeholder(tf.int32, [None, None], name='input')
109 |         e_lengths = tf.placeholder(tf.int32, [None], name='lengths')
110 |         e_mask = tf.placeholder(tf.int32, [None, None], name='mask')
111 |         e_gold_output = tf.placeholder(tf.int32, [None, None],
112 |                 name='gold_output')
113 |         e_keep_prob = tf.placeholder(tf.float32, name='keep_prob')
114 |         e_learning_rate = tf.placeholder(tf.float32, name='learning_rate')
115 | 
116 |         # Define word embedding
117 |         #### The embedding matrix is a variable (so they can shift in training), initialized with the vectors defined above.
118 |         glove_init = tf.constant_initializer(np.array(pretrained_list))
119 |         e_embedding = tf.get_variable("embedding", [NWORDS, DIM_EMBEDDING],
120 |                 initializer=glove_init)
121 |         e_embed = tf.nn.embedding_lookup(e_embedding, e_input)
122 | 
123 |         # Define LSTM cells
124 |         #### We create an LSTM cell, then wrap it in a class that applies dropout to the input and output.
125 |         e_cell_f = tf.contrib.rnn.BasicLSTMCell(LSTM_HIDDEN)
126 |         e_cell_f = tf.contrib.rnn.DropoutWrapper(e_cell_f,
127 |                 input_keep_prob=e_keep_prob, output_keep_prob=e_keep_prob)
128 |         # Recurrent dropout options
129 |         #### We are not using recurrent dropout, but it is a common enough feature of networks that it's good to see how it is done.
130 |         #        variational_recurrent=True, dtype=tf.float32,
131 |         #        input_size=DIM_EMBEDDING)
132 |         #### Similarly, multi-layer networks are a common use case. In Tensorflow, we would wrap a list of cells with a MultiRNNCell.
133 |         # Multi-layer cell creation
134 |         # e_cell_f = tf.contrib.rnn.MultiRNNCell([e_cell_f])
135 |         #### We are making a bidirectional network, so we need another cell for the reverse direction.
136 |         e_cell_b = tf.contrib.rnn.BasicLSTMCell(LSTM_HIDDEN)
137 |         e_cell_b = tf.contrib.rnn.DropoutWrapper(e_cell_b,
138 |                 input_keep_prob=e_keep_prob, output_keep_prob=e_keep_prob)
139 |         #### To use the cells we create a dynamic RNN. The 'dynamic' aspect means we can feed in the lengths of input sequences not counting padding and it will stop early.
140 |         e_initial_state_f = e_cell_f.zero_state(BATCH_SIZE, dtype=tf.float32)
141 |         e_initial_state_b = e_cell_f.zero_state(BATCH_SIZE, dtype=tf.float32)
142 |         e_lstm_outputs, e_final_state = tf.nn.bidirectional_dynamic_rnn(
143 |                 cell_fw=e_cell_f, cell_bw=e_cell_b, inputs=e_embed,
144 |                 initial_state_fw=e_initial_state_f,
145 |                 initial_state_bw=e_initial_state_b,
146 |                 sequence_length=e_lengths, dtype=tf.float32)
147 |         e_lstm_outputs_merged = tf.concat(e_lstm_outputs, 2)
148 | 
149 |         # Define output layer
150 |         #### Matrix multiply to get scores for each class.
151 |         e_predictions = tf.contrib.layers.fully_connected(e_lstm_outputs_merged,
152 |                 NTAGS, activation_fn=None)
153 |         # Define loss and update
154 |         #### Cross-entropy loss. The reduction flag is crucial (the default is to average over the sequence). The weights flag accounts for padding that makes all of the sequences the same length.
155 |         e_loss = tf.losses.sparse_softmax_cross_entropy(e_gold_output,
156 |                 e_predictions, weights=e_mask,
157 |                 reduction=tf.losses.Reduction.SUM)
158 |         e_train = tf.train.GradientDescentOptimizer(e_learning_rate).minimize(e_loss)
159 |         # Update with gradient clipping
160 |         #### If we wanted to do gradient clipping we would need to do the update in a few steps, first calculating the gradient, then modifying it before applying it.
161 |         # e_optimiser = tf.train.GradientDescentOptimizer(LEARNING_RATE)
162 |         # e_gradients = e_optimiser.compute_gradients(e_loss)
163 |         # e_clipped_gradients = [(tf.clip_by_value(grad, -5., 5.), var)
164 |         #         for grad, var in e_gradients]
165 |         # e_train = e_optimiser.apply_gradients(e_gradients)
166 | 
167 |         # Define output
168 |         e_auto_output = tf.argmax(e_predictions, 2, output_type=tf.int32)
169 | 
170 |         # Do training
171 |         #### Configure the system environment. By default Tensorflow uses all available GPUs and RAM. These lines limit the number of GPUs used and the amount of RAM. To limit which GPUs are used, set the environment variable CUDA_VISIBLE_DEVICES (e.g. "export CUDA_VISIBLE_DEVICES=0,1").
172 |         config = tf.ConfigProto(
173 |             device_count = {'GPU': 0},
174 |             gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction = 0.8)
175 |         )
176 |         #### A session runs the graph. We use a 'with' block to ensure it is closed, which frees various resources.
177 |         with tf.Session(config=config) as sess:
178 |             #### Run executes operations, in this case initializing the variables.
179 |             sess.run(tf.global_variables_initializer())
180 | 
181 |             #### To make the code match across the three versions, we group together some framework specific values needed when doing a pass over the data.
182 |             expressions = [
183 |                 e_auto_output, e_gold_output, e_input, e_keep_prob, e_lengths,
184 |                 e_loss, e_train, e_mask, e_learning_rate, sess
185 |             ]
186 |             #### Main training loop, in which we shuffle the data, set the learning rate, do one complete pass over the training data, then evaluate on the development data.
187 |             for epoch in range(EPOCHS):
188 |                 random.shuffle(train)
189 | 
190 |                 ####
191 |                 # Determine the current learning rate
192 |                 current_lr = LEARNING_RATE / (1+ LEARNING_DECAY_RATE * epoch)
193 | 
194 |                 #### Training pass.
195 |                 loss, tacc = do_pass(train, token_to_id, tag_to_id, expressions,
196 |                         True, current_lr)
197 |                 #### Dev pass.
198 |                 _, dacc = do_pass(dev, token_to_id, tag_to_id, expressions,
199 |                         False)
200 |                 print("{} loss {} t-acc {} d-acc {}".format(epoch, loss, tacc,
201 |                     dacc))
202 | 
203 |             #### The syntax varies, but in all three cases either saving or loading the parameters of a model must be done after the model is defined.
204 |             # Save model
205 |             saver = tf.train.Saver()
206 |             saver.save(sess, "./tagger.tf.model")
207 | 
208 |             # Load model
209 |             saver.restore(sess, "./tagger.tf.model")
210 | 
211 |             # Evaluation pass.
212 |             _, test_acc = do_pass(dev, token_to_id, tag_to_id, expressions,
213 |                     False)
214 |             print("Test Accuracy: {:.3f}".format(test_acc))
215 | 
216 | #### Inference (the same function for train and test).
217 | def do_pass(data, token_to_id, tag_to_id, expressions, train, lr=0.0):
218 |     e_auto_output, e_gold_output, e_input, e_keep_prob, e_lengths, e_loss, \
219 |             e_train, e_mask, e_learning_rate, session = expressions
220 | 
221 |     # Loop over batches
222 |     loss = 0
223 |     match = 0
224 |     total = 0
225 |     for start in range(0, len(data), BATCH_SIZE):
226 |         #### Form the batch and order it based on length (important for efficient processing in PyTorch).
227 |         batch = data[start : start + BATCH_SIZE]
228 |         batch.sort(key = lambda x: -len(x[0]))
229 |         #### Log partial results so we can conveniently check progress.
230 |         if start % 4000 == 0 and start > 0:
231 |             print(loss, match / total)
232 |             sys.stdout.flush()
233 | 
234 |         ####
235 |         # Add empty sentences to fill the batch
236 |         #### We add empty sentences because Tensorflow requires every batch to be the same size.
237 |         batch += [([], []) for _ in range(BATCH_SIZE - len(batch))]
238 |         # Prepare inputs
239 |         #### We do this here for convenience and to have greater alignment between implementations, but in practise it would be best to do this once in pre-processing.
240 |         max_length = len(batch[0][0])
241 |         input_array = np.zeros([len(batch), max_length])
242 |         output_array = np.zeros([len(batch), max_length])
243 |         lengths = np.array([len(v[0]) for v in batch])
244 |         mask = np.zeros([len(batch), max_length])
245 |         #### Convert tokens and tags from strings to numbers using the indices.
246 |         for n, (tokens, tags) in enumerate(batch):
247 |             token_ids = [token_to_id.get(simplify_token(t), 0) for t in tokens]
248 |             tag_ids = [tag_to_id[t] for t in tags]
249 | 
250 |             #### Fill the arrays, leaving the remaining values as zero (our padding value).
251 |             input_array[n, :len(tokens)] = token_ids
252 |             output_array[n, :len(tags)] = tag_ids
253 |             mask[n, :len(tokens)] = np.ones([len(tokens)])
254 |         #### We can't change the computation graph to disable dropout when not training, so we just change the keep probability.
255 |         cur_keep_prob = KEEP_PROB if train else 1.0
256 |         #### This dictionary contains values for all of the placeholders we defined.
257 |         feed = {
258 |                 e_input: input_array,
259 |                 e_gold_output: output_array,
260 |                 e_mask: mask,
261 |                 e_keep_prob: cur_keep_prob,
262 |                 e_lengths: lengths,
263 |                 e_learning_rate: lr
264 |         }
265 | 
266 |         # Define the computations needed
267 |         todo = [e_auto_output]
268 |         #### If we are not training we do not need to compute a loss and we do not want to do the update.
269 |         if train:
270 |             todo.append(e_loss)
271 |             todo.append(e_train)
272 |         # Run computations
273 |         outcomes = session.run(todo, feed_dict=feed)
274 |         # Get outputs
275 |         predicted = outcomes[0]
276 |         if train:
277 |             #### We do not request the e_train value because its work is done - it performed the update during its computation.
278 |             loss += outcomes[1]
279 | 
280 |         ####
281 |         # Update the number of correct tags and total tags
282 |         for (_, g), a in zip(batch, predicted):
283 |             total += len(g)
284 |             for gt, at in zip(g, a):
285 |                 gt = tag_to_id[gt]
286 |                 if gt == at:
287 |                     match += 1
288 | 
289 |     return loss, match / total
290 | 
291 | if __name__ == '__main__':
292 |     main()
293 | 


--------------------------------------------------------------------------------
/visualise.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | 
  3 | import sys
  4 | 
  5 | import pygments
  6 | from pygments.lexers import get_lexer_by_name
  7 | from pygments.formatters import HtmlFormatter
  8 | 
  9 | lexer = get_lexer_by_name("python", stripall=True)
 10 | formatter = HtmlFormatter(cssclass="source")
 11 | 
 12 | def highlight(raw_code):
 13 |     code = pygments.highlight(raw_code, lexer, formatter)
 14 |     if len(raw_code) > 0:
 15 |         if raw_code[-1] == '\n':
 16 |             code = code.split("</pre></div>")[0] +"\n</pre></div>"
 17 |         if raw_code[0] == ' ':
 18 |             indent = 0
 19 |             for i, char in enumerate(raw_code):
 20 |                 if char != ' ':
 21 |                     break
 22 |                 indent += 1
 23 |             parts = code.split("<span class")
 24 |             parts[0] += ' '*indent
 25 |             code = "<span class".join(parts)
 26 |     if code.startswith("""<div class="source">""") and code.endswith("</div>"):
 27 |         code = code[20:-6]
 28 |     return code
 29 | 
 30 | def print_comment_and_code(content, p0, p1, p2):
 31 |     comment0 = ''
 32 |     comment1 = ''
 33 |     comment2 = ''
 34 |     code0 = '&nbsp;'
 35 |     code1 = '&nbsp;'
 36 |     code2 = '&nbsp;'
 37 |     if p0 is not None:
 38 |         part = content[0][p0]
 39 |         comment0 = [v[0] for v in part if v[0] is not None and v[1] is None]
 40 |         code0 = '\n'.join([v[1] for v in part if v[1] is not None])
 41 |         code0 = highlight(code0)
 42 |     if p1 is not None:
 43 |         part = content[1][p1]
 44 |         comment1 = [v[0] for v in part if v[0] is not None and v[1] is None]
 45 |         code1 = '\n'.join([v[1] for v in part if v[1] is not None])
 46 |         code1 = highlight(code1)
 47 |     if p2 is not None:
 48 |         part = content[2][p2]
 49 |         comment2 = [v[0] for v in part if v[0] is not None and v[1] is None]
 50 |         code2 = '\n'.join([v[1] for v in part if v[1] is not None])
 51 |         code2 = highlight(code2)
 52 | 
 53 |     class_name = 'shared-content'
 54 |     comment = comment0
 55 |     if p0 is not None and p1 is None and p2 is None:
 56 |         if len(''.join(comment).strip()) > 0:
 57 |             class_name = 'dynet'
 58 |     elif p0 is None and p1 is not None and p2 is None:
 59 |         comment = comment1
 60 |         if len(''.join(comment).strip()) > 0:
 61 |             class_name = 'pytorch'
 62 |     elif p0 is None and p1 is None and p2 is not None:
 63 |         comment = comment2
 64 |         if len(''.join(comment).strip()) > 0:
 65 |             class_name = 'tensorflow'
 66 | 
 67 |     print("""<div class="outer {}">""".format(class_name))
 68 |     
 69 |     div_comment = """<span class="description {}">""".format(class_name)
 70 |     if len(''.join(comment).strip()) > 0:
 71 |         div_comment += "\n<br />\n".join(comment) +"<br /><br />"
 72 |     else:
 73 |         div_comment += "&nbsp;"
 74 |     div_comment += "</span>"
 75 | 
 76 |     if len(''.join(comment).strip()) > 0:
 77 |         if code0 == "&nbsp;":
 78 |             if code1 == "&nbsp;":
 79 |                 order = 'tfonly'
 80 |             else:
 81 |                 order = 'ptonly'
 82 |     
 83 |     div_code0 = """<code class="dynet">""" + code0 +"</code>"
 84 |     div_code1 = """<code class="pytorch">""" + code1 +"</code>"
 85 |     div_code2 = """<code class="tensorflow">""" + code2 + "</code>"
 86 |     if class_name == 'pytorch':
 87 |         div_code0 = """<code class="dynet pytorch-line">""" + code0 +"</code>"
 88 |         div_code1 = """<code class="pytorch">""" + code1 +"</code>"
 89 |         div_code2 = """<code class="tensorflow">""" + code2 + "</code>"
 90 |     elif class_name == 'tensorflow':
 91 |         div_code0 = """<code class="dynet tensorflow-line">""" + code0 +"</code>"
 92 |         div_code1 = """<code class="pytorch tensorflow-line">""" + code1 +"</code>"
 93 |         div_code2 = """<code class="tensorflow">""" + code2 + "</code>"
 94 | 
 95 |     print(div_comment, end="")
 96 |     print(div_code0, end="")
 97 |     print(div_code1, end="")
 98 |     print(div_code2, end="")
 99 | 
100 |     print("</div>")
101 | 
102 | def read_file(filename):
103 |     parts = [[]]
104 |     prev_comment = True
105 |     for line in open(filename):
106 |         line = line.strip('\n')
107 |         
108 |         # Update comment status
109 |         if line.strip().startswith("####"):
110 |             if not prev_comment:
111 |                 parts.append([])
112 |             prev_comment = True
113 |         else:
114 |             prev_comment = False
115 | 
116 |         # Divide up the line
117 |         comment = None
118 |         code = None
119 |         if line.strip().startswith("####"):
120 |             comment = line.strip()[4:].strip()
121 |         elif '#' in line and line.strip()[0] != '#':
122 |             comment = line.split("#")[-1]
123 |             code = line[:-len(comment)-1]
124 |             comment = comment.strip()
125 |         else:
126 |             code = line
127 | 
128 |         parts[-1].append((comment, code))
129 |     return parts
130 | 
131 | def match(part0, part1, do_comments=False):
132 |     if do_comments:
133 |         part0 = ' '.join([v[0].strip() for v in part0 if v[0] is not None and v[1] is None])
134 |         part1 = ' '.join([v[0].strip() for v in part1 if v[0] is not None and v[1] is None])
135 |         return part0 == part1 and part0.strip() != ''
136 |     else:
137 |         part0 = ' '.join([v[1].strip() for v in part0 if v[1] is not None])
138 |         part1 = ' '.join([v[1].strip() for v in part1 if v[1] is not None])
139 |         return part0 == part1
140 | 
141 | def align(content):
142 |     # Find parts in common between all three
143 |     matches = set()
144 |     for i0, part0 in enumerate(content[0]):
145 |         for i1, part1 in enumerate(content[1]):
146 |             if match(part0, part1):
147 |                 for i2, part2 in enumerate(content[2]):
148 |                     if match(part0, part2):
149 |                         matches.add((i0, i1, i2))
150 |             if match(part0, part1, True):
151 |                 for i2, part2 in enumerate(content[2]):
152 |                     if match(part0, part2, True):
153 |                         matches.add((i0, i1, i2))
154 |     matches = sorted(list(matches))
155 |     return matches
156 | 
157 | def main():
158 |     # Read data
159 |     content = [read_file(filename) for filename in sys.argv[1:]]
160 | 
161 |     # Work out aligned sections
162 |     matches = align(content)
163 | 
164 |     # Render
165 |     print(head)
166 | 
167 |     print("""<div class="main">""")
168 |     positions = [0 for _ in content]
169 |     for p0, p1, p2 in matches:
170 |         while positions[0] < p0:
171 |             print_comment_and_code(content, positions[0], None, None)
172 |             positions[0] += 1
173 |         while positions[1] < p1:
174 |             print_comment_and_code(content, None, positions[1], None)
175 |             positions[1] += 1
176 |         while positions[2] < p2:
177 |             print_comment_and_code(content, None, None, positions[2])
178 |             positions[2] += 1
179 |         print_comment_and_code(content, p0, p1, p2)
180 |         positions[0] += 1
181 |         positions[1] += 1
182 |         positions[2] += 1
183 | 
184 |     print("""<br /></div>""")
185 | 
186 |     print(tail)
187 | 
188 | ###style_dark = """
189 | ###  <style type="text/css">
190 | ###body {
191 | ###    background: #000000;
192 | ###    color: #FFFFFF;
193 | ###}
194 | ###h1 {
195 | ###    color: #EEEEEE;
196 | ###    background: #111111;
197 | ###    margin: 0px;
198 | ###    text-align: center;
199 | ###    padding: 10px;
200 | ###}
201 | ###div.buttons {
202 | ###    width: 100%;
203 | ###    display: flex;
204 | ###    justify-content: center;
205 | ###}
206 | ###div.main {
207 | ###    display: flex;
208 | ###    flex-direction: column;
209 | ###    align-items: center;     /* center items horizontally, in this case */
210 | ###    padding-top: 10px;
211 | ###}
212 | ###div.header-outer {
213 | ###    display: flex;
214 | ###    flex-direction: column;
215 | ###    align-items: center;
216 | ###    background: #111111;
217 | ###    letter-spacing: 1px;
218 | ###    line-height: 130%;
219 | ###    color: #e0e0e0;
220 | ###    margin: 0px;
221 | ###    padding: 10px;
222 | ###    font-size: large;
223 | ###}
224 | ###div.disqus {
225 | ###    max-width: 1000px;
226 | ###    margin: auto;
227 | ###}
228 | ###div.header {
229 | ###    max-width: 1000px;
230 | ###}
231 | ###div.outer {
232 | ###    clear: both;
233 | ###}
234 | ###div.description {
235 | ###    letter-spacing: 1px;
236 | ###    font-size: large;
237 | ###    color: #97cae0;
238 | ###    line-height: 112%;
239 | ###    width: 400px;
240 | ###    float: left;
241 | ###}
242 | ###code  {
243 | ###    background: #000000;
244 | ###    color: #FFFFFF;
245 | ###    float: right;
246 | ###    width: 100ch;
247 | ###    padding-left: 15px;
248 | ###}
249 | ###code.empty {
250 | ###    background: #666666;
251 | ###    margin-top: 8px;
252 | ###    max-height: 3px;
253 | ###}
254 | ###code.dynet {
255 | ###}
256 | ###code.pytorch {
257 | ###}
258 | ###code.tensorflow {
259 | ###}
260 | ###a {
261 | ###    color: #00a1d6;
262 | ###}
263 | ###.button {
264 | ###    cursor: pointer;
265 | ###    background-color: #008CBA;
266 | ###    border: 10px;
267 | ###    margin: 5px;
268 | ###    color: white;
269 | ###    padding: 15px 32px;
270 | ###    text-align: center;
271 | ###    text-decoration: none;
272 | ###    display: inline-block;
273 | ###    font-size: 16px;
274 | ###}
275 | ###td.linenos { background-color: #f0f0f0; padding-right: 10px; }
276 | ###span.lineno { background-color: #f0f0f0; padding: 0 5px 0 5px; }
277 | ###pre { line-height: 125%; font-family: Menlo, "Courier New", Courier, monospace; margin: 0 }
278 | ###body .hll { background-color: #ffffcc }
279 | ###body .c { color: #408080; } /* Comment */
280 | ###body .err { border: 1px solid #FF0000 } /* Error */
281 | ###body .k { color: #fceb60; } /* Keyword */
282 | ###body .o { color: #FFFFFF } /* Operator */
283 | ###body .ch { color: #36e6e8; } /* Comment.Hashbang */
284 | ###body .cm { color: #36e6e8; } /* Comment.Multiline */
285 | ###body .cp { color: #36e6e8 } /* Comment.Preproc */
286 | ###body .cpf { color: #36e6e8; } /* Comment.PreprocFile */
287 | ###body .c1 { color: #36e6e8; } /* Comment.Single */
288 | ###body .cs { color: #36e6e8; } /* Comment.Special */
289 | ###body .gd { color: #A00000 } /* Generic.Deleted */
290 | ###body .ge { font-style: italic } /* Generic.Emph */
291 | ###body .gr { color: #FF0000 } /* Generic.Error */
292 | ###body .gh { color: #000080; } /* Generic.Heading */
293 | ###body .gi { color: #00A000 } /* Generic.Inserted */
294 | ###body .go { color: #888888 } /* Generic.Output */
295 | ###body .gp { color: #000080; } /* Generic.Prompt */
296 | ###body .gs { font-weight: bold } /* Generic.Strong */
297 | ###body .gu { color: #800080; } /* Generic.Subheading */
298 | ###body .gt { color: #0044DD } /* Generic.Traceback */
299 | ###body .kc { color: #008000; } /* Keyword.Constant */
300 | ###body .kd { color: #008000; } /* Keyword.Declaration */
301 | ###body .kn { color: #84b0d9; } /* Keyword.Namespace */
302 | ###body .kp { color: #008000 } /* Keyword.Pseudo */
303 | ###body .kr { color: #008000; } /* Keyword.Reserved */
304 | ###body .kt { color: #B00040 } /* Keyword.Type */
305 | ###body .m { color: #BC94B7 } /* Literal.Number */
306 | ###body .s { color: #BC94B7 } /* Literal.String */
307 | ###body .na { color: #7D9029 } /* Name.Attribute */
308 | ###body .nb { color: #36e6e8 } /* Name.Builtin */
309 | ###body .nc { color: #36e6e8; } /* Name.Class */
310 | ###body .no { color: #880000 } /* Name.Constant */
311 | ###body .nd { color: #AA22FF } /* Name.Decorator */
312 | ###body .ni { color: #999999; } /* Name.Entity */
313 | ###body .ne { color: #D2413A; } /* Name.Exception */
314 | ###body .nf { color: #36e6e8 } /* Name.Function */
315 | ###body .nl { color: #A0A000 } /* Name.Label */
316 | ###body .nn { color: #FFFFFF; } /* Name.Namespace */
317 | ###body .nt { color: #008000; } /* Name.Tag */
318 | ###body .nv { color: #19177C } /* Name.Variable */
319 | ###body .ow { color: #fceb60; } /* Operator.Word */
320 | ###body .w { color: #bbbbbb } /* Text.Whitespace */
321 | ###body .mb { color: #BC94B7 } /* Literal.Number.Bin */
322 | ###body .mf { color: #BC94B7 } /* Literal.Number.Float */
323 | ###body .mh { color: #BC94B7 } /* Literal.Number.Hex */
324 | ###body .mi { color: #BC94B7 } /* Literal.Number.Integer */
325 | ###body .mo { color: #BC94B7 } /* Literal.Number.Oct */
326 | ###body .sa { color: #BC94B7 } /* Literal.String.Affix */
327 | ###body .sb { color: #BC94B7 } /* Literal.String.Backtick */
328 | ###body .sc { color: #BC94B7 } /* Literal.String.Char */
329 | ###body .dl { color: #BC94B7 } /* Literal.String.Delimiter */
330 | ###body .sd { color: #BC94B7; } /* Literal.String.Doc */
331 | ###body .s2 { color: #BC94B7 } /* Literal.String.Double */
332 | ###body .se { color: #BB6622; } /* Literal.String.Escape */
333 | ###body .sh { color: #BC94B7 } /* Literal.String.Heredoc */
334 | ###body .si { color: #BB6688; } /* Literal.String.Interpol */
335 | ###body .sx { color: #008000 } /* Literal.String.Other */
336 | ###body .sr { color: #BB6688 } /* Literal.String.Regex */
337 | ###body .s1 { color: #BC94B7 } /* Literal.String.Single */
338 | ###body .ss { color: #19177C } /* Literal.String.Symbol */
339 | ###body .bp { color: #36e6e8 } /* Name.Builtin.Pseudo */
340 | ###body .fm { color: #36e6e8 } /* Name.Function.Magic */
341 | ###body .vc { color: #FFFFFF } /* Name.Variable.Class */
342 | ###body .vg { color: #FFFFFF } /* Name.Variable.Global */
343 | ###body .vi { color: #FFFFFF } /* Name.Variable.Instance */
344 | ###body .vm { color: #FFFFFF } /* Name.Variable.Magic */
345 | ###body .il { color: #BC94B7 } /* Literal.Number.Integer.Long */
346 | ###</style>
347 | ###"""
348 | 
349 | style_light = """
350 |   <style type="text/css">
351 | body {
352 |     padding: 0px;
353 |     margin: 0px;
354 | }
355 | h1 {
356 |     margin: 0px;
357 |     text-align: center;
358 |     padding: 10px;
359 | }
360 | div.buttons {
361 |     width: 100%;
362 |     display: flex;
363 |     justify-content: center;
364 | }
365 | div.header-outer {
366 |     background: #f3f3f3;
367 |     display: flex;
368 |     flex-direction: column;
369 |     align-items: center;
370 |     line-height: 130%;
371 |     margin: 0px;
372 |     padding: 10px;
373 |     font-size: large;
374 | }
375 | div.header {
376 |     max-width: 1000px;
377 | }
378 | button {
379 |     cursor: pointer;
380 |     margin: 5px;
381 |     border: none;
382 |     padding: 15px 32px;
383 |     text-align: center;
384 |     text-decoration: none;
385 |     display: inline-block;
386 |     font-size: 16px;
387 |     outline: none;
388 | }
389 | #dybutton {
390 |     background-color: #f8fad2;
391 |     border: 3px solid #f8fad2;
392 | }
393 | #tfbutton {
394 |     background-color: #d0eaf0;
395 |     border: 3px solid #d0eaf0;
396 | }
397 | #ptbutton { 
398 |     background-color: #fbe1d3;
399 |     border: 3px solid #fbe1d3;
400 | }
401 | div.disqus {
402 |     max-width: 1000px;
403 |     margin: auto;
404 | }
405 | 
406 | div.main {
407 |     text-align: center;
408 |     padding-top: 10px;
409 | }
410 | div.outer {
411 |     white-space: nowrap;
412 |     display: none;
413 | }
414 | span.description {
415 |     text-align: left;
416 |     vertical-align: top;
417 |     font-size: large;
418 |     line-height: 112%;
419 |     width: 400px;
420 |     white-space: normal;
421 |     display: none;
422 | }
423 | span.description.dynet {
424 |     background-color: #f8fad2;
425 |     display: none;
426 | }
427 | span.description.pytorch {
428 |     background-color: #fbe1d3;
429 |     display: none;
430 | }
431 | span.description.tensorflow {
432 |     background-color: #d0eaf0;
433 |     display: none;
434 | }
435 | span.paragraph-start {
436 |     font-weight: bold;
437 | }
438 | code {
439 |     text-align: left;
440 |     vertical-align: top;
441 |     width: 100ch;
442 |     display: none;
443 |     padding-left: 5px;
444 |     padding-right: 5px;
445 | }
446 | code.pytorch-line {
447 |     display: inline-block;
448 |     background: #fbe1d3;
449 |     margin-top: 8px;
450 |     max-height: 3px;
451 | }
452 | code.tensorflow-line {
453 |     display: inline-block;
454 |     background: #d0eaf0;
455 |     margin-top: 8px;
456 |     max-height: 3px;
457 | }
458 | pre { }
459 | td.linenos { background-color: #f0f0f0; padding-right: 10px; }
460 | span.lineno { background-color: #f0f0f0; padding: 0 5px 0 5px; }
461 | pre { line-height: 125%; font-family: Menlo, "Courier New", Courier, monospace; margin: 0 }
462 | body .bp { color: #0a5ac2 } /* Name.Builtin.Pseudo */
463 | body .c1 { color: #6a727c } /* Comment.Single */
464 | body .fm { color: #0a5ac2 } /* Name.Function.Magic */
465 | body .k { color: #d63e4d } /* Keyword */
466 | body .kn { color: #d63e4d; } /* Keyword.Namespace */
467 | body .mf { color: #0a5ac2 } /* Literal.Number.Float */
468 | body .mi { color: #0a5ac2 } /* Literal.Number.Integer */
469 | body .n { color: #000000 } /* Variable */
470 | body .nb { color: #0a5ac2 } /* Name.Builtin */
471 | body .nc { color: #6f40bf } /* Name.Class */
472 | body .nf { color: #6f40bf } /* Name.Function */
473 | body .nn { color: #000000 } /* Name.Namespace */
474 | body .o { color: #d63e4d } /* Operator */
475 | body .ow { color: #d63e4d } /* Operator.Word */
476 | body .p { color: #000000 } /* Parentheses */
477 | body .s1 { color: #052e60 } /* Literal.String.Single */
478 | body .s2 { color: #052e60 } /* Literal.String.Double */
479 | body .sd { color: #052e60 } /* Literal.String.Doc */
480 | body .vm { color: #0a5ac2 } 
481 | 
482 | body .c { color: #999988 } /* Comment */
483 | body .cm { color: #6a727c } /* Comment.Multiline */
484 | body .cp { color: #6a727c } /* Comment.Preproc */
485 | body .cs { color: #6a727c } /* Comment.Special */
486 | body .err { color: #a61717 } /* Error */
487 | body .gd { color: #000000 } /* Generic.Deleted */
488 | body .ge { color: #000000 } /* Generic.Emph */
489 | body .gh { color: #999999 } /* Generic.Heading */
490 | body .gi { color: #000000 } /* Generic.Inserted */
491 | body .go { color: #888888 } /* Generic.Output */
492 | body .gp { color: #555555 } /* Generic.Prompt */
493 | body .gr { color: #aa0000 } /* Generic.Error */
494 | body .gs { } /* Generic.Strong */
495 | body .gt { color: #aa0000 } /* Generic.Traceback */
496 | body .gu { color: #aaaaaa } /* Generic.Subheading */
497 | body .hll { }
498 | body .kc { color: #000000; } /* Keyword.Constant */
499 | body .kd { color: #000000; } /* Keyword.Declaration */
500 | body .kp { color: #000000; } /* Keyword.Pseudo */
501 | body .kr { color: #000000; } /* Keyword.Reserved */
502 | body .kt { color: #445588; } /* Keyword.Type */
503 | body .m { color: #009999 } /* Literal.Number */
504 | body .no { color: #008080 } /* Name.Constant */
505 | body .s { color: #d01040 } /* Literal.String */
506 | body .na { color: #008080 } /* Name.Attribute */
507 | body .nd { color: #3c5d5d } /* Name.Decorator */
508 | body .ni { color: #800080 } /* Name.Entity */
509 | body .ne { color: #990000 } /* Name.Exception */
510 | body .nl { color: #990000 } /* Name.Label */
511 | body .nt { color: #000080 } /* Name.Tag */
512 | body .nv { color: #008080 } /* Name.Variable */
513 | body .w { color: #bbbbbb } /* Text.Whitespace */
514 | body .mh { color: #0a5ac2 } /* Literal.Number.Hex */
515 | body .mo { color: #0a5ac2 } /* Literal.Number.Oct */
516 | body .sb { color: #052e60 } /* Literal.String.Backtick */
517 | body .sc { color: #052e60 } /* Literal.String.Char */
518 | body .se { color: #052e60 } /* Literal.String.Escape */
519 | body .sh { color: #052e60 } /* Literal.String.Heredoc */
520 | body .si { color: #052e60 } /* Literal.String.Interpol */
521 | body .sx { color: #052e60 } /* Literal.String.Other */
522 | body .sr { color: #052e60 } /* Literal.String.Regex */
523 | body .ss { color: #052e60 } /* Literal.String.Symbol */
524 | body .vc { color: #008080 } /* Name.Variable.Class */
525 | body .vg { color: #008080 } /* Name.Variable.Global */
526 | body .vi { color: #008080 } /* Name.Variable.Instance */
527 | body .il { color: #009999 } /* Literal.Number.Integer.Long */
528 | </style>
529 | """
530 | 
531 | head = """<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
532 |    "http://www.w3.org/TR/html4/strict.dtd">
533 | <html>
534 | <head>
535 | <!-- Global site tag (gtag.js) - Google Analytics -->
536 | <script async src="https://www.googletagmanager.com/gtag/js?id=UA-19179423-2"></script>
537 | <script>
538 |   window.dataLayer = window.dataLayer || [];
539 |   function gtag(){dataLayer.push(arguments);}
540 |   gtag('js', new Date());
541 | 
542 |   gtag('config', 'UA-19179423-2');
543 | </script>
544 |   <title>Neural Tagger Implementations</title>
545 |   <meta http-equiv="content-type" content="text/html; charset=UTF-8">
546 | """+ style_light +"""
547 | </head>
548 | 
549 | <body>
550 | <div class="header-outer">
551 | <h1>Implementing a neural Part-of-Speech tagger</h1>
552 | <h3>by Jonathan K. Kummerfeld <a href="http://www.jkk.name/">[site]</a></h3>
553 | <div class=header>
554 | <p>
555 | DyNet, PyTorch and Tensorflow are complex frameworks with different ways of approaching neural network implementation and variations in default behaviour.
556 | This page is intended to show how to implement the same non-trivial model in all three.
557 | The design of the page is motivated by my own preference for a complete program with annotations, rather than the more common tutorial style of introducing code piecemeal in between discussion.
558 | The design of the code is also geared towards providing a complete picture of how things fit together.
559 | For a non-tutorial version of this code it would be better to use abstraction to improve flexibility, but that would have complicated the flow here.
560 | </p>
561 | <p>
562 | <span class="paragraph-start">Model:</span>
563 | The three implementations below all define a part-of-speech tagger with word embeddings initialised using GloVe, fed into a one-layer bidirectional LSTM, followed by a matrix multiplication to produce scores for tags.
564 | They all score ~97.2% on the development set of the Penn Treebank.
565 | The specific hyperparameter choices follows <a href="https://arxiv.org/abs/1806.04470">Yang, Liang, and Zhang (CoLing 2018)</a> and matches their performance for the setting without a CRF layer or character-based word embeddings.
566 | The <a href="https://github.com/jkkummerfeld/neural-tagger-tutorial">repository</a> for this page provides the code in runnable form.
567 | The only dependencies are the respective frameworks (DyNet <a href="https://github.com/clab/dynet/releases/tag/2.0.3">2.0.3</a>, PyTorch <a href="https://github.com/pytorch/pytorch/releases/tag/v0.4.1">0.4.1</a> and Tensorflow <a href="https://github.com/tensorflow/tensorflow/releases/tag/v1.9.0">1.9.0</a>).
568 | </p>
569 | <p>
570 | <span class="paragraph-start">Website usage:</span> Use the buttons to show one or more implementations and their associated comments (note, depending on your screen size you may need to scroll to see all the code).
571 | Matching or closely related content is aligned.
572 | Framework-specific comments are highlighted in a colour that matches their button and a line is used to make the link from the comment to the code clear.
573 | </p>
574 | <p>
575 | <span class="paragraph-start">New (2019) Runnable Version:</span> I have made a slightly modified version of the Tensorflow code available as a <a href="https://colab.research.google.com/drive/1z5oU0ZkYMOMH4R_JGGLs0e06YXv6AdO-">Google Colaboratory Notebook</a>.
576 | </p>
577 | <p>
578 | Making this helped me understand all three frameworks better. Hopefully you will find it informative too!
579 | </p>
580 | 
581 | <div class="buttons">
582 | <button class="button" id="dybutton" onmouseover="" onclick="toggleDyNet()">Show/Hide DyNet</button>
583 | <button class="button" id="ptbutton" onmouseover="" onclick="togglePyTorch()">Show/Hide PyTorch</button>
584 | <button class="button" id="tfbutton" onmouseover="" onclick="toggleTensorflow()">Show/Hide Tensorflow</button>
585 | </div>
586 | </div>
587 | 
588 | </div>
589 | 
590 | """
591 | 
592 | tail = """
593 | 
594 | <script>
595 | var dyShowing = false;
596 | var tfShowing = false;
597 | var ptShowing = false;
598 | function whichShowing() {
599 |     if (dyShowing && tfShowing && ptShowing) return "all";
600 |     else if (dyShowing && ptShowing) return "-t";
601 |     else if (dyShowing && tfShowing) return "-p";
602 |     else if (ptShowing && tfShowing) return "-d";
603 |     else if (dyShowing) return "d";
604 |     else if (ptShowing) return "p";
605 |     else if (tfShowing) return "t";
606 |     else return "-";
607 | }
608 | function toggleItem(toEdit, showing) {
609 |     if (
610 |         (showing === "d" && toEdit.classList.contains("dynet")) ||
611 |         (showing === "p" && toEdit.classList.contains("pytorch")) ||
612 |         (showing === "t" && toEdit.classList.contains("tensorflow")) ||
613 |         (showing === "-d" && toEdit.classList.contains("pytorch")) ||
614 |         (showing === "-p" && toEdit.classList.contains("dynet")) ||
615 |         (showing === "-t" && toEdit.classList.contains("dynet")) ||
616 |         (showing === "-d" && toEdit.classList.contains("tensorflow")) ||
617 |         (showing === "-p" && toEdit.classList.contains("tensorflow")) ||
618 |         (showing === "-t" && toEdit.classList.contains("pytorch")) ||
619 |         (showing != "-" && toEdit.classList.contains("shared-content")) ||
620 |         showing === "all"
621 |        ) {
622 |         if (toEdit.classList.contains('outer')) {
623 |             toEdit.style.display = "block";
624 |         } else {
625 |             toEdit.style.display = "inline-block";
626 |         }
627 |     } else {
628 |         toEdit.style.display = "none";
629 |     }
630 | }
631 | function toggleDyNet() {
632 |     dyShowing = ! dyShowing;
633 | 
634 |     var dybutton = document.getElementById("dybutton");
635 |     if (dyShowing) {
636 |         dybutton.style.backgroundColor = "#f1f5a4";
637 |         dybutton.style.border = "3px solid #777777";
638 |     } else {
639 |         dybutton.style.backgroundColor = "#f8fad2";
640 |         dybutton.style.border = "3px solid #f8fad2";
641 |     }
642 | 
643 |     toggleAll();
644 | }
645 | function togglePyTorch() {
646 |     ptShowing = ! ptShowing;
647 | 
648 |     var ptbutton = document.getElementById("ptbutton");
649 |     if (ptShowing) {
650 |         ptbutton.style.backgroundColor = "#f7c1a4";
651 |         ptbutton.style.border = "3px solid #777777";
652 |     } else {
653 |         ptbutton.style.backgroundColor = "#fbe1d3";
654 |         ptbutton.style.border = "3px solid #fbe1d3";
655 |     }
656 | 
657 |     toggleAll();
658 | }
659 | function toggleTensorflow() {
660 |     tfShowing = ! tfShowing;
661 | 
662 |     var tfbutton = document.getElementById("tfbutton");
663 |     if (tfShowing) {
664 |         tfbutton.style.backgroundColor = "#a9d9e4";
665 |         tfbutton.style.border = "3px solid #777777";
666 |     } else {
667 |         tfbutton.style.backgroundColor = "#d0eaf0";
668 |         tfbutton.style.border = "3px solid #d0eaf0";
669 |     }
670 | 
671 |     toggleAll();
672 | }
673 | function toggleAll() {
674 |     showing = whichShowing();
675 |     var dyitems = document.getElementsByClassName("dynet");
676 |     for (var i = dyitems.length - 1; i >= 0; i--) {
677 |         toggleItem(dyitems[i], showing);
678 |     }
679 |     var tfitems = document.getElementsByClassName("tensorflow");
680 |     for (var i = tfitems.length - 1; i >= 0; i--) {
681 |         toggleItem(tfitems[i], showing);
682 |     }
683 |     var ptitems = document.getElementsByClassName("pytorch");
684 |     for (var i = ptitems.length - 1; i >= 0; i--) {
685 |         toggleItem(ptitems[i], showing);
686 |     }
687 |     var allitems = document.getElementsByClassName("shared-content");
688 |     for (var i = allitems.length - 1; i >= 0; i--) {
689 |         toggleItem(allitems[i], showing);
690 |     }
691 | }
692 | </script>
693 | 
694 | <div class="header-outer">
695 | <div class="header">
696 | <p>
697 | This code was last updated in August 2018.
698 | If one of the frameworks has changed in a way that should be reflected here, please let me know!
699 | </p>
700 | <p>
701 | A few miscellaneous notes:
702 | <ul>
703 |     <li>PyTorch 0.4 does not support recurrent dropout directly. For an example of how to achieve it, see the LSTM and QRNN Language Model Toolkit's <a href="https://github.com/salesforce/awd-lstm-lm/blob/28683b20154fce8e5812aeb6403e35010348c3ea/weight_drop.py">WeightDrop class</a> and <a href="https://github.com/salesforce/awd-lstm-lm/blob/457a422eb46e970a6aad659ca815a04b3d074d6c/model.py#L22">how it is used</a>.</li>
704 |     <li>Tensorflow 1.9 does not support weight decay directly, but <a href="https://github.com/tensorflow/tensorflow/pull/17438">this pull request</a> appears to add support and will be part of 1.10.</li>
705 | </ul>
706 | </p>
707 | <p>
708 | And a few other gotchas I've come across:
709 | <ul>
710 |     <li>For PyTorch, consider running your code with these two environment variables set: "OMP_NUM_THREADS=1 and MKL_NUM_THREADS=1". The reason is that they prevent unnecessary thread creation by low-level matrix manipulation libraries. See <a href="https://twitter.com/honnibal/status/1067920534585917440">this twitter thread</a> for discussion.</li>
711 | </ul>
712 | </p>
713 | <p>
714 | I developed this code and webpage with help from many people and resources. In particular:
715 | <ul>
716 |     <li> Feedback from <a href="http://proebsting.cs.arizona.edu/">Todd Proebsting</a>, <a href="http://www.it.usyd.edu.au/~judy/">Judy Kay</a>, <a href="http://www.it.usyd.edu.au/~bob/">Bob Kummerfeld</a>, and members of the <a href="http://web.eecs.umich.edu/~wlasecki/croma.html">CROMA Lab</a>.</li>
717 |     <li> <a href="https://github.com/jiesutd/NCRFpp">NCRFpp</a>, the code associated with <a href="https://arxiv.org/abs/1806.04470">Yang, Liang, and Zhang (CoLing 2018)</a>, which was my starting point for PyTorch and my reference point when trying to check performance for the others.</li>
718 |     <li> Guillaume Genthial's blog post about <a href="https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html">Sequence Tagging with Tensorflow</a>. </li>
719 |     <li> The DyNet <a href="https://github.com/clab/dynet/blob/master/examples/tagger/bilstmtagger.py">example tagger</a>. </li>
720 | </ul>
721 | </p>
722 | </div>
723 | </div>
724 | 
725 | <div class="disqus">
726 | <div id="disqus_thread"></div>
727 | </div>
728 | <script>
729 | var disqus_config = function () {
730 |     this.page.url = 'http://jkk.name/neural-tagger-tutorial/';
731 |     this.page.identifier = '/neural-tagger-tutorial/';
732 |     this.page.title = 'Neural Tagger Example';
733 | };
734 | (function() { // DON'T EDIT BELOW THIS LINE
735 | var d = document, s = d.createElement('script');
736 | s.src = 'https://www-jkk-name.disqus.com/embed.js';
737 | s.setAttribute('data-timestamp', +new Date());
738 | (d.head || d.body).appendChild(s);
739 | })();
740 | </script>
741 | <noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
742 | </body>
743 | </html>"""
744 | 
745 | if __name__ == '__main__':
746 |     main()
747 | 
748 | 


--------------------------------------------------------------------------------