├── Elmo_tutorial.ipynb
├── README.md
├── Tsne_vis.png
├── tensorboard_vis.png
├── train_elmo_updated.py
└── training_updated.py
/Elmo_tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Elmo - A short Tutorial\n"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 2,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "%matplotlib inline\n",
17 | "import tensorflow as tf\n",
18 | "import tensorflow_hub as hub\n",
19 | "import matplotlib.pyplot as plt\n",
20 | "import numpy as np\n",
21 | "from sklearn.manifold import TSNE\n",
22 | "import os\n",
23 | "from tensorflow.contrib.tensorboard.plugins import projector"
24 | ]
25 | },
26 | {
27 | "cell_type": "markdown",
28 | "metadata": {},
29 | "source": [
30 | "## Pre Trained Elmo Model:"
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "metadata": {},
36 | "source": [
37 | "### Loading the Elmo Model\n",
38 | "\n",
39 | "\n",
40 | "The model trained on the One Billion Word Language Model Benchmark (http://www.statmt.org/lm-benchmark/) has been exposed on TensorFlow Hub. It can be loaded as:"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": null,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "elmo = hub.Module(\"https://tfhub.dev/google/elmo/2\", trainable=True)"
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "We set the trainable parameter to True when creating the module so that the 4 scalar mixing weights (as described in the paper) and all LSTM cell variables can be trained. In this setting, the module still keeps all other parameters fixed.\n",
57 | "Because the word representations are built from characters, the model can also produce an embedding for a word it has never seen, given the context."
58 | ]
59 | },
60 | {
61 | "cell_type": "markdown",
62 | "metadata": {},
63 | "source": [
64 | "## Structure\n",
65 | "The ELMo model consists of two files:\n",
66 | "\n",
67 | "**options.json** : The parameters/options with which the language model was trained.\n",
68 | "\n",
69 | "**weights.hdf5** : The weights file for the best model\n",
70 | "\n",
71 | "\n",
72 | "The input to the pre-trained model (elmo) above can be fed in two different ways:\n",
73 | " \n",
74 | "### 1. **Tokens**"
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "execution_count": null,
80 | "metadata": {},
81 | "outputs": [],
82 | "source": [
83 | "tokens_input = [[\"Argentina\", \"played\", \"football\", \"very\", \"well\",\"\",\"\",\"\",\"\"],\n",
84 | " [\"Brazil\",\"is\",\"a\",\"strong\",\"team\",\"\",\"\",\"\",\"\"],\n",
85 | " [\"Artists\",\"all\",\"over\",\"the\",\"world\",\"are\",\"attending\",\"the\",\"play\"],\n",
86 | " [\"Child\",\"is\",\"playing\",\"the\",\"guitar\",\"\",\"\",\"\",\"\"],\n",
87 | " [\"There\",\"was\",\"absolute\",\"silence\",\"during\",\"the\",\"play\",\"\",\"\"]]\n",
88 | "\n",
89 | "tokens_length = [5,5,9,5,7]\n",
90 | "embeddings = elmo(inputs={\"tokens\": tokens_input,\"sequence_len\": tokens_length},\n",
91 | " signature=\"tokens\",\n",
92 | " as_dict=True)[\"elmo\"]"
93 | ]
94 | },
95 | {
96 | "cell_type": "markdown",
97 | "metadata": {},
98 | "source": [
99 | "### 2. **Default**"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": null,
105 | "metadata": {},
106 | "outputs": [],
107 | "source": [
108 | "embeddings = elmo([\"Argentina played football very well\",\"Brazil is a strong team\", \n",
109 | " \"Artists all over the world are attending the play\", \n",
110 | " \"Child is playing the guitar\", \n",
111 | " \"There was absolute silence during the play\"],\n",
112 | " signature=\"default\",\n",
113 | " as_dict=True)[\"elmo\"]"
114 | ]
115 | },
116 | {
117 | "cell_type": "markdown",
118 | "metadata": {},
119 | "source": [
120 | "**Inputs**\n",
121 | "\n",
122 | "The module defines two signatures: tokens and default.\n",
123 | "\n",
124 | "With the tokens signature, the module takes tokenized sentences as input. The input tensor is a string tensor with shape [batch_size, max_length] and an int32 tensor with shape [batch_size] corresponding to the sentence length. The length input is necessary to exclude padding in the case of sentences with varying length.\n",
125 | "\n",
126 | "In the case of the example above:\n",
127 | "\n",
128 | "    The \"elmo\" output is a tensor of shape\n",
129 | "    (5, 9, 1024).\n",
130 | "    5 - the batch size\n",
131 | "    9 - the maximum sequence length in the batch\n",
132 | "    1024 - the dimension of each ELMo vector\n",
133 | "\n",
134 | "With the default signature, the module takes untokenized sentences as input. The input tensor is a string tensor with shape [batch_size]. The module tokenizes each string by splitting on spaces.\n",
135 | "\n",
136 | "\n",
137 | "\n",
138 | "\n",
139 | "**Outputs**\n",
140 | "\n",
141 | "The output (_embeddings_) is a dictionary with the following keys:\n",
142 | "\n",
143 | " - word_emb: the character-based word representations with shape [batch_size, max_length, 512].\n",
144 | " - lstm_outputs1: the first LSTM hidden state with shape [batch_size, max_length, 1024].\n",
145 | " - lstm_outputs2: the second LSTM hidden state with shape [batch_size, max_length, 1024].\n",
146 | " - elmo: the weighted sum of the 3 layers, where the weights are trainable. This tensor has shape [batch_size, max_length, 1024]\n",
147 | " - default: a fixed mean-pooling of all contextualized word representations with shape [batch_size, 1024].\n",
148 | " \n",
149 | " The \"elmo\" value is selected.\n"
150 | ]
151 | },
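{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, a fixed-length sentence embedding can be taken from the \"default\" key instead of \"elmo\" (a small sketch, reusing the module loaded above):\n",
"\n",
"    sentence_embeddings = elmo([\"Argentina played football very well\",\n",
"                                \"Brazil is a strong team\"],\n",
"                               signature=\"default\",\n",
"                               as_dict=True)[\"default\"]  # shape [2, 1024]\n"
]
},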
152 | {
153 | "cell_type": "code",
154 | "execution_count": null,
155 | "metadata": {},
156 | "outputs": [],
157 | "source": [
158 | "# initialise both the variables and the hub module's lookup tables\n",
159 | "init = [tf.global_variables_initializer(), tf.tables_initializer()]\n",
160 | "sess = tf.Session()\n",
161 | "sess.run(init)\n",
161 | "\n",
162 | "print(\"Argentina\")\n",
163 | "print(sess.run(embeddings[0][0]))\n",
164 | "\n",
165 | "print(\"played\")\n",
166 | "print(sess.run(embeddings[0][1]))\n",
167 | "\n",
168 | "print(\"play - Theatre\")\n",
169 | "print(sess.run(embeddings[4][6]))\n"
170 | ]
171 | },
172 | {
173 | "cell_type": "markdown",
174 | "metadata": {},
175 | "source": [
176 | "### Getting the Embeddings:\n",
177 | "\n",
178 | "Firstly, initialise the session:\n",
179 | "\n",
180 | "    init = [tf.global_variables_initializer(), tf.tables_initializer()]\n",
181 | " sess = tf.Session()\n",
182 | " sess.run(init)\n",
183 | "\n",
184 | "The embeddings for any token can be obtained using _embeddings_ and passing the position of the token.\n",
185 | "\n",
186 | "For Example:\n",
187 | "\n",
188 | "**Argentina** ([0][0])\n",
189 | "\n",
190 | " print(sess.run(embeddings[0][0]))\n",
191 | "\n",
192 | "**played** ([0][1])\n",
193 | "\n",
194 | " print(sess.run(embeddings[0][1]))\n",
195 | " \n",
196 | "**play** ([4][6])\n",
197 | "\n",
198 | " print(sess.run(embeddings[4][6]))\n"
199 | ]
200 | },
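{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that every `sess.run(embeddings[i][j])` call executes the graph again, which is slow for the ELMo module. When you need more than a couple of vectors it is cheaper to fetch the whole tensor once, e.g.:\n",
"\n",
"    all_vectors = sess.run(embeddings)   # numpy array of shape (5, 9, 1024)\n",
"    print(all_vectors[0][0])             # \"Argentina\"\n"
]
},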
201 | {
202 | "cell_type": "markdown",
203 | "metadata": {},
204 | "source": [
205 | "### Visualizing the Embeddings \n",
206 | "\n",
207 | "#### 1. **t-SNE**"
208 | ]
209 | },
210 | {
211 | "cell_type": "code",
212 | "execution_count": 15,
213 | "metadata": {},
214 | "outputs": [],
215 | "source": [
216 | "def tsne_plot():\n",
217 | "    \"Creates a t-SNE model of the token embeddings and plots it\"\n",
218 | " labels = []\n",
219 | " tokens = []\n",
220 | " \n",
221 | " import tensorflow as tf\n",
222 | " import sys\n",
223 | " np.set_printoptions(threshold=sys.maxsize)\n",
224 | "\n",
225 | " tokens_input = [[\"Argentina\", \"played\", \"football\", \"very\", \"well\",\"\",\"\",\"\",\"\"],\n",
226 | " [\"Brazil\",\"is\",\"a\",\"strong\",\"team\",\"\",\"\",\"\",\"\"],\n",
227 | " [\"Artists\",\"all\",\"over\",\"the\",\"world\",\"are\",\"attending\",\"the\",\"play\"],\n",
228 | " [\"Child\",\"is\",\"playing\",\"the\",\"guitar\",\"\",\"\",\"\",\"\"],\n",
229 | " [\"There\",\"was\",\"absolute\",\"silence\",\"during\",\"the\",\"play\",\"\",\"\"]]\n",
230 | "\n",
231 | " tokens_length = [5,5,9,5,7]\n",
232 | " embeddings = elmo(inputs={\"tokens\": tokens_input,\"sequence_len\": tokens_length},\n",
233 | " signature=\"tokens\",\n",
234 | " as_dict=True)[\"elmo\"]\n",
235 | "\n",
236 | "\n",
237 | "\n",
238 | "    init = [tf.global_variables_initializer(), tf.tables_initializer()]\n",
239 | " sess = tf.Session()\n",
240 | " sess.run(init)\n",
241 | " sent = [\"Argentina\", \"played\", \"football\", \"very\", \"well\"]\n",
242 | " sent1 = [\"Brazil\",\"is\",\"a\",\"strong\",\"team\"]\n",
243 | " sent2 = [\"Artists\",\"all\",\"over\",\"the\",\"world\",\"are\",\"attending\",\"the\",\"play\"]\n",
244 | " sent3 = [\"Child\",\"is\",\"playing\",\"the\",\"guitar\"]\n",
245 | " sent4 = [\"There\",\"was\",\"absolute\",\"silence\",\"during\",\"the\",\"play\"]\n",
246 | " \n",
247 | "\n",
248 | " for i in range(len(sent)):\n",
249 | " tokens.append(sess.run(embeddings[0][i]))\n",
250 | " labels.append(sent[i]) \n",
251 | " for i in range(len(sent1)):\n",
252 | " tokens.append(sess.run(embeddings[1][i]))\n",
253 | " labels.append(sent1[i])\n",
254 | " for i in range(len(sent2)):\n",
255 | " tokens.append(sess.run(embeddings[2][i]))\n",
256 | " labels.append(sent2[i])\n",
257 | " for i in range(len(sent3)):\n",
258 | " tokens.append(sess.run(embeddings[3][i]))\n",
259 | " labels.append(sent3[i]) \n",
260 | " for i in range(len(sent4)):\n",
261 | " tokens.append(sess.run(embeddings[4][i]))\n",
262 | " labels.append(sent4[i])\n",
263 | " \n",
264 | " tsne_model = TSNE(perplexity=6, n_components=2, init='random', n_iter=500)\n",
265 | " new_values = tsne_model.fit_transform(tokens)\n",
266 | "\n",
267 | " x = []\n",
268 | " y = []\n",
269 | " for value in new_values:\n",
270 | " x.append(value[0])\n",
271 | " y.append(value[1])\n",
272 | " \n",
273 | " plt.figure(figsize=(18, 12)) \n",
274 | " for i in range(len(x)):\n",
275 | " plt.scatter(x[i],y[i])\n",
276 | " plt.annotate(labels[i],\n",
277 | " xy=(x[i], y[i]),\n",
278 | " xytext=(5, 2),\n",
279 | " textcoords='offset points',\n",
280 | " ha='right',\n",
281 | " va='bottom')\n",
282 | " plt.show()"
283 | ]
284 | },
285 | {
286 | "cell_type": "code",
287 | "execution_count": 16,
288 | "metadata": {},
289 | "outputs": [
290 | {
291 | "data": {
292 | "image/png": "iVBORw0KGgoAAAANSUhEUgAABCAAAAKvCAYAAAC72r5SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzs3XtYVnW+///XEhFBFNRy56kB3Xm68eZGwCAFUbdBeU4tJ8+WznZqW803f+avsdxNTu2Rdg3Tr7yYScudYaaZ2eR4xjQxu1EkVIzRUCsyD18QFAxw/f7gsEPxQLJYIM/Hdc3FvT7rs9b9/njVXPbiczBM0xQAAAAAAICVmthdAAAAAAAAuPURQAAAAAAAAMsRQAAAAAAAAMsRQAAAAAAAAMsRQAAAAAAAAMsRQAAAAAAAAMsRQAAAAAAAAMsRQAAAAAAAAMsRQAAAAAAAAMs1tbuAG3HbbbeZAQEBdpcBAAAAAAAuk5qaeto0zduv169BBBABAQFyu912lwEAAAAAAC5jGMaxG+nHEgwAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AggAAAAAAGA5AgigjqxZs0aGYSgzM7Pa+7m5uXrjjTcqr7///nuNHTv2qu+7vD8AAAAA1GcEEEAdSUpKUv/+/bVixYor7pWWll4RKHTo0EGrVq266vsIIAAAAAA0JAQQQB0oKCjQ559/rrfeeqsygEhOTtbAgQP18MMPq3fv3nrmmWd05MgRuVwuzZkzR9nZ2QoKCpIkHThwQH379pXL5ZLT6VRWVtYV/XNychQdHS2Xy6WgoCDt2LHDziEDAAAAQBVN7S4AaAw++ugjxcXFqVu3bmrTpo327t0rSdqzZ48yMjIUGBio7OxsZWRkKC0tTZKUnZ1d+fzixYv1xBNPaMKECfrpp59UWlqql19+uUr/V155RbGxsXr22WdVWlqqCxcu1Pk4AQAAAOBqmAEB1IGkpCSNHz9ekjR+/HglJSVJkvr27avAwMDrPh8ZGak//vGP+q//+i8dO3ZM3t7eV/QJDw/X0qVLtWDBAn311Vdq2bJl7Q4CAAAAAG4CAQRgsTNnzmjr1q169NFHFRAQoEWLFun999+XaZpq0aLFDb3j4Ycf1scffyxvb2/FxsZq69atV/SJjo7WZ599po4dO2rSpElatmxZbQ8FAGpdQECATp8+LUny9fW1uRoAAGAlAgjAYqtWrdLkyZN17NgxZWdn68SJEwoMDNTOnTur9GvZsqXy8/OrfcfRo0fVpUsXzZ49WyNGjFB6evoV/Y8dO6Z27dppxowZeuSRRyqXeQAAAABAfUAAAVgsKSlJo0ePrtI2ZswYvffee1Xa2rZtq379+ikoKEhz5sypcu/9999XUFCQXC6XMjMzNXny5Cv6Jycny+VyKSQkRKtXr9YTTzxh+dgAoCZGjRql0NBQORwOJSYm2l0OAACoY4ZpmnbXcF1hYWGm2+22uwwAAHATzp49qzZt2qiwsFDh4eHavn27QkND5Xa7ddttt8nX11cFBQV2lwkAAGrIMIxU0zTDrtePUzCAW8BH+77Tog2H9X1uoTr4e2tObHeNCulod1kAUEVCQoLWrFkjSTpx4oSysrJsrggAANQlAgiggfto33ea9+FXKiwulSR9l1uoeR9+JUmEEADqjeTkZG3evFkpKSny8fFRTEyMioqK7C4LAADUIfaAABq4RRsOV4YPFQqLS7Vow2GbKgKAK+Xl5al169by8fFRZmamdu/ebXdJAACgjhFAAA3c97mFNWoHADvExcWppKRETqdT8+fPV0REhN0lAQCAOsYSDKCB6+Dvre+qCRs6+HvbUA0AVM/Ly0vr16+/oj07O7vyMxtQAgBwa2MGBNDAzYntLm9Pjypt3p4emhPb3aaKAOAGpK+UXg2SFviX/UxfaXdFAADAYsyAABq4io0mOQUDQIORvlJaN1sqLp+9lXei7FqSnA/aVxcAALCUYZqm3TVcV1hYmOl2u+0uAwAA1IZXg8pCh8v5dZaeyqj7egAAwE0xDCPVNM2w6/VjCQYAAKhbed/WrB0AANwSCCAAAEDd8utUs3YAAHBLIIAAAAB1a/BzkudlJ/V4epe1AwCAWxYBBAAAqFvOB6XhCWV7Psgo+zk8gQ0oAQC4xXEKBgAAqHvOBwkcAABoZJgBAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALEcAAQAAAAAALFdrAYRhGB6GYewzDOOT8utAwzC+MAwjyzCM9w3DaFbe7lV+/c/y+wG1VQNuHWfOnJHL5ZLL5dIdd9yhjh07yuVyyd/fX7169bK7PAAAAABADdXmDIgnJB362fV/SXrVNM27JP1fSY+Utz8i6f+apvmvkl4t7wdU0bZtW6WlpSktLU3//u//rqeeeqryukmTX/6PbUlJSS1WCQAAAAC4UbUSQBiG0UnSUEl/K782JA2StKq8yzuSRpV/Hll+rfL7g8v7AzektLRUM2bMkMPh0L333qvCwkJJ0pEjRxQXF6fQ0FBFRUUpMzNTkjR16lT97ne/08CBAzV37lydP39e06dPV3h4uEJCQrR27Vo7hwMAAAAAjUJtzYB4TdL/I+lS+XVbSbmmaVb8uvlbSR3LP3eUdEKSyu/nlfcHbkhWVpYee+wxHThwQP7+/lq9erUkaebMmfrLX/6i1NRUxcfH67e//W3lM19//bU2b96sV155RQsXLtSgQYP05Zdfatu2bZozZ47Onz9v13AAAAAAoFFoerMvMAxjmKQfTdNMNQwjpqK5mq7mDdz7+XtnSpopSXfeeefNlolbSGBgoFwulyQpNDR
U2dnZKigo0K5duzRu3LjKfhcvXqz8PG7cOHl4eEiSNm7cqI8//ljx8fGSpKKiIh0/flw9e/asw1EAAAAAQONy0wGEpH6SRhiGcb+k5pJaqWxGhL9hGE3LZzl0kvR9ef9vJXWW9K1hGE0l+Uk6e/lLTdNMlJQoSWFhYVcEFGi8vLy8Kj97eHiosLBQly5dkr+/v9LS0qp9pkWLFpWfTdPU6tWr1b17d8trBQAAAACUueklGKZpzjNNs5NpmgGSxkvaaprmBEnbJI0t7zZFUsVC+4/Lr1V+f6tpmgQMuCmtWrVSYGCgPvjgA0llIcP+/fur7RsbG6u//OUvqvjHbt++fXVWJwAAAAA0VrV5Csbl5kr6nWEY/1TZHg9vlbe/JaltefvvJD1jYQ1oRJYvX6633npLwcHBcjgcV91ccv78+SouLpbT6VRQUJDmz59fx5UCAAAAQONjNITJB2FhYabb7ba7DAAAAAAAcBnDMFJN0wy7Xr/a2AMCaDC+/uIHpaw9ooKzF+XbxkuRI7uq29132F0WAAAAANzyCCDQaHz9xQ/atjxTJT+VnRZbcPaiti3PlCRCCAAAAACwmJV7QAD1SsraI5XhQ4WSny4pZe0RmyoCAAAAgMaDAAKNRsHZizVqBwAAAADUHgIINBq+bbxq1A4AAAAAqD0EEGg0Ikd2VdNmVf+Rb9qsiSJHdrWpIgAAAABoPNiEEo1GxUaTnIIBAAAAAHWPAAKNSre77yBwAAAAAAAbsAQDAAAAAABYjgACAAAAAABYjgACAAAAAABYjgACAAAAAABYjgACAAAAAABYjgACAAAAAABYjgACAAAAAABYjgACAADgFvboo4/q4MGDkqSAgACdPn3a5ooAAI1VU7sLAAAAgHX+9re/2V0CAACSmAEBAABwyzh//ryGDh2q4OBgBQUF6f3331dMTIzcbvcVfd9991317dtXLpdLv/nNb1RaWipJ8vX11bPPPqvg4GBFRETo5MmTkqSTJ09q9OjRCg4OVnBwsHbt2nXN9wAAcDkCCAAAgFvEP/7xD3Xo0EH79+9XRkaG4uLiqu136NAhvf/++/r888+VlpYmDw8PLV++XFJZiBEREaH9+/crOjpaf/3rXyVJs2fP1oABA7R//37t3btXDofjmu8BAOByLMEAAAC4RfTu3VtPP/205s6dq2HDhikqKqraflu2bFFqaqrCw8MlSYWFhWrXrp0kqVmzZho2bJgkKTQ0VJs2bZIkbd26VcuWLZMkeXh4yM/PT//zP/9z1fcAAHA5AggAAIBbRLdu3ZSamqpPP/1U8+bN07333lttP9M0NWXKFL300ktX3PP09JRhGJLKgoaSkpKrft+13gMAwOVYggEAqFWjRo1SaGioHA6HEhMT7S4HaFS+//57+fj4aOLEiXr66ae1d+/eavsNHjxYq1at0o8//ihJOnv2rI4dO3bNdw8ePFhvvvmmJKm0tFTnzp37Re8BADReBBAAgFq1ZMkSpaamyu12KyEhQWfOnLG7JKDR+Oqrryo3hFy4cKF+//vfV9uvV69eevHFF3XvvffK6XRqyJAhysnJuea7//znP2vbtm3q3bu3QkNDdeDAgV/0HgBA42WYpml3DdcVFhZmVrd7MwCg/lmwYIHWrFkjScrOztaGDRsUERFhc1UAAACwimEYqaZphl2vH3tAAABqTXJysjZv3qyUlBT5+PgoJiZGRUVFdpcFwCLp6enasmWL8vLy5Ofnp8GDB8vpdNpdFgCgniKAAADUmry8PLVu3Vo+Pj7KzMzU7t277S4JgEXS09O1bt06FRcXSyr793/dunWSRAgBAKgWe0AAAGpNXFycSkpK5HQ6NX/+fJZeALewLVu2VIYPFYqLi7VlyxabKgIA1HfMgAAA1BovLy+tX7/e7jIA1IG8vLwatQMAQAABALhprAMHGh8/P79qwwY/Pz8bqgEANAQswQAA3JSKdeAV/yFSsQ48PT3d5soAWGnw4MHy9PSs0ubp6anBgwfbVBEAoL4jgAAA3BTWgQONk9Pp1PDhwytnPPj5+Wn48OHMfgIAXBVLMAAAN4V14EDj5XQ6CRwAADeMGRAAgJtytfXerAMHAADAzxFAAABuCuvAAQAAcCNYggEAuCkV0685BQMAAADXQgABALhprAMHAADA9bAEAwAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4AAgAAAAAAWI4Aohq5ubl644037C4DAAAAAIBbBgFENQggAAAAAACoXQQQ1XjmmWd05MgRuVwuzZkzR4sWLVJ4eLicTqeef/75yn6jRo1SaGioHA6HEhMTK9t9fX01d+5chYaG6t/+7d+0Z88excTEqEuXLvr444/tGBIAwEJ//OMfKz/XdoidnZ2toKAgSZLb7dbs2bNr7d0AAAB1iQCiGi+//LK6du2qtLQ0DRkyRFlZWdqzZ4/S0tKUmpqqzz77TJK0ZMkSpaamyu12KyEhQWfOnJEknT9/XjExMUpNTVXLli31+9//Xps2bdKaNWv03HPP2Tk0AIAFrAwgfi4sLEwJCQmWvBsAAMBqTe0uoL7buHGjNm7cqJCQEElSQUGBsrKyFB0drYSEBK1Zs0aSdOLECWVlZalt27Zq1qyZ4uLiJEm9e/eWl5eXPD091bt3b2VnZ9s1FABALRg1apROnDihoqIiPfHEEzp69KgKCwvlcrnkcDhUWlpaOYtuyJAhWrRokRYtWqSVK1fq4sWLGj16tP7zP/9T2dnZuu+++9S/f3/t2rVLHTt21Nq1a+Xt7a3U1FRNnz5dPj4+6t+/f+V3JycnKz4+Xp988okWLFig48eP6+jRozp+/LiefPLJytkRf/jDH7R8+XJ17txZt912m0JDQ/X000/b9UcGAAAgiQDiukzT1Lx58/Sb3/ymSntycrI2b96slJQU+fj4KCYmRkVFRZIkT09PGYYhSWrSpIm8vLwqP5eUlNTtAAAAtWrJkiVq06aNCgsLFR4eru3bt+v1119XWlqapLIlExkZGZXXGzdurJxJZ5qmRowYoc8++0x33nmnsrKylJSUpL/+9a968MEHtXr1ak2cOFHTpk3TX/7yFw0YMEBz5sy5ai2ZmZnatm2b8vPz1b17d82aNU
v79+/X6tWrtW/fPpWUlKhPnz4KDQ2tkz8bAACAa2EJRjVatmyp/Px8SVJsbKyWLFmigoICSdJ3332nH3/8UXl5eWrdurV8fHyUmZmp3bt321kyAKCOJCQkKDg4WBEREZWz367l5zPp+vTpo8zMzMpnAgMD5XK5JEmhoaHKzs5WXl6ecnNzNWDAAEnSpEmTrvruoUOHysvLS7fddpvatWunkydPaufOnRo5cqS8vb3VsmVLDR8+vJZGDgAAcHOYAVGNtm3bql+/fgoKCtJ9992nhx9+WJGRkZLKNph89913FRcXp8WLF8vpdKp79+6KiIiwuWoAgNWuNfvtaq42ky47O7tyhpwkeXh4qLCwUKZpVs6iu57Lny8pKZFpmjUYEQAAQN0hgLiK9957r8r1E088cUWf9evXV/tsxWwJSVqwYMFV7wEAGparzX7z9PRUcXGxPD09q8yik8pm0s2fP18TJkyQr6+vvvvuO3l6el71O/z9/eXn56edO3eqf//+Wr58eY1q7N+/v37zm99o3rx5Kikp0d///nfNmDHjlw0YAACgFhFAWGz1D2f10tEcfXexWB29PDWvS3uNuaON3WUBAH6Bq81+mzlzppxOp/r06aPly5dXmUW3aNEiHTp06IqZdB4eHlf9nqVLl1ZuQhkbG1ujGsPDwzVixAgFBwfrV7/6lcLCwuTn5/fLBw0AAFBLjIYwVTMsLMx0u912l1Fjq384q6cPn1Dhpf/9M/ZuYii+e2dCCACAZQoKCuTr66sLFy4oOjpaiYmJ6tOnj91lAQCAW5RhGKmmaYZdrx+bUFropaM5VcIHSSq8ZOqlozk2VQQAuJWd3/ejcl7eo4l9R8vRoZtcjmCNGTOG8AEAANQLLMGw0HcXi2vUDgDAL3V+34/K/TBLZvElvT7iOUmS4dlE/nF32VwZAABAGWZAWKijV/WbjF2tHQCAX+rchmyZxZeqtJnFl3RuQ7Y9BQEAAFyGAMJC87q0l3eTqkepeTcxNK9Le5sqAgDcqkpzL9aoHQAAoK4RQFhozB1tFN+9szp5ecqQ1MnLkw0oAQCW8PD3qlE7AABAXWMPCIuNuaMNgQMAwHKtYgMq94CoYHg2UavYAPuKAgAA+BkCCAAAbgEtQtpJKtsLojT3ojz8vdQqNqCyHQAAwG4EEAAA3CJahLQjcAAAAPUWe0AAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADL3XQAYRhGZ8MwthmGccgwjAOGYTxR3t7GMIxNhmFklf9sXd5uGIaRYBjGPw3DSDcMo8/N1gAAAAAAAOq32pgBUSLp/5im2VNShKTHDMPoJekZSVtM07xL0pbya0m6T9Jd5f+bKenNWqgBwC3gT3/6kxISEiRJTz31lAYNGiRJ2rJliyZOnKhZs2YpLCxMDodDzz//fOVzzzzzjHr16iWn06mnn37altoBAAAAXFvTm32BaZo5knLKP+cbhnFIUkdJIyXFlHd7R1KypLnl7ctM0zQl7TYMw98wjPbl7wHQiEVHR+uVV17R7Nmz5Xa7dfHiRRUXF2vnzp2KiorSuHHj1KZNG5WWlmrw4MFKT09Xp06dtGbNGmVmZsowDOXm5to9DAAAAADVqNU9IAzDCJAUIukLSf9SESqU/2xX3q2jpBM/e+zb8jYAjVxoaKhSU1OVn58vLy8vRUZGyu12a8eOHYqKitLKlSvVp08fhYSE6MCBAzp48KBatWql5s2b69FHH9WHH34oHx8fu4cBAAAAoBq1FkAYhuErabWkJ03TPHetrtW0mdW8b6ZhGG7DMNynTp2qrTIB1GOenp4KCAjQ0qVLdc899ygqKkrbtm3TkSNH5O3trfj4eG3ZskXp6ekaOnSoioqK1LRpU+3Zs0djxozRRx99pLi4OLuHAQAAAKAatRJAGIbhqbLwYblpmh+WN580DKN9+f32kn4sb/9WUuefPd5J0veXv9M0zUTTNMNM0wy7/fbba6NMAA1AdHS04uPjFR0draioKC1evFgul0vnzp1TixYt5Ofnp5MnT2r9+vWSpIKCAuXl5en+++/Xa6+9prS0NJtHAAAAAKA6N70HhGEYhqS3JB0yTfO/f3brY0lTJL1c/nPtz9ofNwxjhaS7JeWx/wOAClFRUVq4cKEiIyPVokULNW/eXFFRUQoODlZISIgcDoe6dOmifv36SZLy8/M1cuRIFRUVyTRNvfrqqzaPAAAAAEB1bjqAkNRP0iRJXxmGUfGrx/9XZcHDSsMwHpF0XNK48nufSrpf0j8lXZA0rRZqAHCLGDx4sIqLiyuvv/7668rPb7/99hX9D+3YpkfDHco/c1ot296mvl3urIsyAQAAANRQbZyCsVPV7+sgSYOr6W9KeuxmvxcADu3Ypo2Jr6vkp4uSpPzTp7Qx8XVJUs+ogXaWBgAAAOAytXoKBgDUpR0rllWGDxVKfrqoHSuW2VQRAAAAgKshgADQYOWfOV2jdgAAAAD2IYAA0GC1bHtbjdoBAAAA2IcAAkCDFTV+spo286rS1rSZl6LGT7apIgAAAABXUxunYACALSo2mtyxYlnlKRhR4yezASUAAABQDxFAAGjQekYNJHAAAAAAGgCWYAAAAKDO5Obm6o033pAkJScna9iwYTZXBACoKwQQAAAAqDM/DyAAAI0LAQQAAADqzDPPPKMjR47I5XJpzpw5Kigo0NixY9WjRw9NmDBBpmlKklJTUzVgwACFhoYqNjZWOTk5NlcOALhZBBAAAACoMy+//LK6du2qtLQ0LVq0SPv27dNrr72mgwcP6ujRo/r8889VXFys//iP/9CqVauUmpqq6dOn69lnn7W7dADATWITSgAAANimb9++6tSpkyTJ5XIpOztb/v7+ysjI0JAhQyRJpaWlat++vZ1lAgBqAQEEAAAAbOPl5VX52cPDQyUlJTJNUw6HQykpKTZWBgCobSzBAAAAQJ1p2bKl8vPzr9mne/fuOnXqVGUAUVxcrAMHDtRFeQAACzEDAgAAAHWmbdu26tevn4KCguTt7a1/+Zd/uaJPs2bNtGrVKs2ePVt5eXkqKSnRk08+KYfDYUPFAIDaYlTsNFyfhYWFmW632+4yAAAAAADAZQzDSDVNM+x6/ViCAQAAgHolb906ZQ0arEM9eylr0GDlrVtnd0kAgFrAEgwAAADUG3nr1iln/nMyi4okSSXff6+c+c9JkvyGD7ezNADATWIGBAAAAOqNH199rTJ8qGAWFenHV1+zqSIAQG0hgAAAAEC9UZKTU6N2AEDDQQABAACAeqNp+/Y1agcANBwEEADQAN1zzz12lwAAlmj31JMymjev0mY0b652Tz1pU0UAgNrCJpQA0ADt2rXL7hIAwBIVG03++OprKsnJUdP27dXuq
SfZgBIAbgEEEADQAPn6+qqgoEA5OTl66KGHdO7cOZWUlOjNN99UVFSU3eUBwE3xGz6cwAEAbkEEEADQgL333nuKjY3Vs88+q9LSUl24cMHukgAAAIBqEUAAQAMWHh6u6dOnq7i4WKNGjZLL5bK7JAAAAKBabEIJAA1YdHS0PvvsM3Xs2FGTJk3SsmXL7C4JAAAAqBYBBAA0YMeOHVO7du00Y8YMPfLII9q7d6/dJQEAAADVYgkGADRgycnJWrRokTw9PeXr68sMCAAAANRbzIBArfLw8JDL5VJwcLD69OlTa0cFPvroozp48KAkKSAgQKdPn66V9wINVUFBgSRpypQpysjI0L59+7Rjxw4FBgbaXBlwc2JiYuR2u69of/vtt/X444/bUBEAAKgtzIBArfL29lZaWpokacOGDZo3b562b99epU9paak8PDxq9N6//e1vtVYjcCs4v+9HnduQrdLci/KzXZXjAAAgAElEQVTw91Kr2AC1CGlnd1nATSktLbW7BAAAYCFmQMAy586dU+vWrSWVTRMfOHCgHn74YfXu3VuSNGrUKIWGhsrhcCgxMVGS9PHHH8vlcsnlcql79+6Vv8292m/EgMbo/L4flfthlkpzL0qSSnMvKvfDLJ3f96PNlaEx+9Of/qSEhARJ0lNPPaVBgwZJkrZs2aKJEycqKSlJvXv3VlBQkObOnVv5nK+vr5577jndfffdSklJqfLOpUuXqlu3bhowYIA+//zzuhsMAACwBDMgUKsKCwvlcrlUVFSknJwcbd26tfLenj17lJGRURkqLFmyRG3atFFhYaHCw8M1ZswYjRgxQiNGjJAkPfjggxowYIAt4wDqs3MbsmUWX6rSZhZf0rkN2cyCgG2io6P1yiuvaPbs2XK73bp48aKKi4u1c+dO3XXXXZo7d65SU1PVunVr3Xvvvfroo480atQonT9/XkFBQXrhhReqvC8nJ0fPP/+8UlNT5efnp4EDByokJMSm0QEAgNrADAjUqoolGJmZmfrHP/6hyZMnyzRNSVLfvn2rrE9PSEhQcHCwIiIidOLECWVlZVXe+9Of/iRvb2899thjdT4GoL6rmPlwo+1AXQgNDVVqaqry8/Pl5eWlyMhIud1u7dixQ/7+/oqJidHtt9+upk2basKECfrss88kle0dNGbMmCve98UXX1Q+06xZMz300EN1PSQAAFDLmAEBy0RGRur06dM6deqUJKlFixaV95KTk7V582alpKTIx8dHMTExKioqklQ2XfeDDz6o/MspgKo8/L2qDRs8/L1sqAYo4+npqYCAAC1dulT33HOPnE6ntm3bpiNHjujOO+9Uampqtc81b978qvsCGYZhZckAAKCOMQMClsnMzFRpaanatm17xb28vDy1bt1aPj4+yszM1O7duyVJx44d029/+1utXLlS3t7edV0y0CC0ig2Q4Vn1/74NzyZqFRtgT0FAuejoaMXHxys6OlpRUVFavHixXC6XIiIitH37dp0+fVqlpaVKSkq67hK7u+++W8nJyTpz5oyKi4v1wQcf1NEoAACAVZgBgVpVsQeEJJmmqXfeeafa32zFxcVp8eLFcjqd6t69uyIiIiSVHbN25swZjR49WpLUoUMHffrpp3U3AKABqNjngVMwUN9ERUVp4cKFioyMVIsWLdS8eXNFRUWpffv2eumllzRw4ECZpqn7779fI0eOvOa72rdvrwULFigyMlLt27dXnz59OCUDAIAGzqhYn1+fhYWFmZyAAABA4/H1Fz8oZe0RFZy9KN82Xooc2VXd7r7D7rIAAEA1DMNINU0z7Hr9mAGBeu3Qjm3asWKZ8s+cVsu2tylq/GT1jBpod1kAAAt9/cUP2rY8UyU/lZ32UnD2orYtz5QkQggAABow9oBAvXVoxzZtTHxd+adPSaap/NOntDHxdR3asc3u0gAAFkpZe6QyfKhQ8tMlpaw9YlNFAACgNhBAoN7asWKZSn6qutN/yU8XtWPFMpsqAgDUhYKz1R8pe7V2ALe+mJgY/dIl2ffff79yc3NruSIAvwRLMFBv5Z85XaN2AMCtwbeNV7Vhg28bjpoFUHNsaA7UH8yAQL3Vsu1tNWoHANwaIkd2VdNmVf+K0rRZE0WO7GpTRQDqSnZ2tnr06KEpU6bI6XRq7NixunDhQpU+s2bNUlhYmBwOh55//nlJ0pYtWypPUZOkTZs26YEHHpAkBQQE6PTp08rOzlbPnj01Y8YMORwO3XvvvSosLJQkffnll3I6nYqMjNScOXMUFBRURyMGGhcCCNRbUeMnq2mzqr/tatrMS1HjJ9tUEQCgLnS7+w4NnNCjcsaDbxsvDZzQgw0ogUbi8OHDmjlzptLT09WqVSu98cYbVe4vXLhQbrdb6enp2r59u9LT0zVo0CAdOnRIp06dkiQtXbpU06ZNu+LdWVlZeuyxx3TgwAH5+/tr9erVkqRp06Zp8eLFSklJqfYIeQC1gyUYqLcqTrvgFAwAaHy63X0HgQPQSHXu3Fn9+vWTJE2cOFEJCQlV7q9cuVKJiYkqKSlRTk6ODh48KKfTqUmTJundd9/VtGnTlJKSomXLrtw3LDAwUC6XS5IUGhqq7Oxs5ebmKj8/X/fcc48k6eGHH9Ynn3xi8SiBxokAAvVaz6iBBA4AAACNiGEYV73+5ptvFB8fry+//FKtW7fW1KlTVVRUJKlsFsPw4cPVvHlzjRs3Tk2bXvmfOl5e/zu71sPDQ4WFhTJN06KRALgcSzAAAAAA1BvHjx9XSkqKJCkpKUn9+/evvHfu3Dm1aNFCfn5+OnnypNavX195r0OHDurQoYNefPFFTZ069Ya/r3Xr1mrZsqV2794tSVqxYkXtDATAFQggAAAAANQbPXv21DvvvCOn06mzZ89q1qxZlfeCg4MVEhIih8Oh6dOnVy7VqDBhwgR17txZvXr1qtF3vvXWW5o5c6YiIyNlmqb8/PxqZSwAqjIawpSjsLAw85ee+wsAAACgYcjOztawYcOUkZHxi55//PHHFRISokceeaRGzxUUFMjX11eS9PLLLysnJ0d//vOff1ENQGNkGEaqaZph1+vHHhAAAAAAGrzQ0FC1aNFCr7zySs0eTF+pv8f/H7208XuVGJ761V299PbqDdYUCTRyzIAAAAAA0Dilr5TWzZaKC/+3zdNbGp4gOR+0ry6ggbnRGRDsAQEAuCWUlpbaXQIAoKHZ8kLV8EEqu97ygj31ALc4AggAQIMwatQohYaGyuFwKDExUZLk6+ur5557TnfffbdSUlKUmpqqAQMGKDQ0VLGxscrJybG5agBAvZb3bc3aAdwU9oAAADQIS5YsUZs2bVRYWKjw8HCNGTNG58+fV1BQkF544QUVFxdrwIABWrt2rW6//Xa9//77evbZZ7VkyRK7SwcA1Fd+naS8E9W3A6h1BBAAgAYhISFBa9askSSdOHFCWVlZ8vDw0JgxYyRJhw8fVkZGhoYMGSKpbElG+/btbasXANAADH6u+j0gBj9nX03ALYwAAgBQ7yUnJ2vz5s1KSUmRj4+PYmJiVFRUpObNm8vDw0OSZJqm
HA6HUlJSbK4WANBgVGw0ueWFsmUXfp3Kwgc2oAQsQQABAKj38vLy1Lp1a/n4+CgzM1O7d+++ok/37t116tQppaSkKDIyUsXFxfr666/lcDhsqBgA0GA4HyRwAOoIm1ACAOq9uLg4lZSUyOl0av78+YqIiLiiT7NmzbRq1SrNnTtXwcHBcrlc2rVrlw3VAgAAoDrMgAAA1HteXl5av379Fe0FBQWSpEM7tmnHimXKP3NaE3t3VdT4yeoZNbCuywQAAMA1MAMCANCgHdqxTRsTX1f+6VOSaSr/9CltTHxdh3Zss7s0AABuSExMjNxut2XvT05O1rBhwyx7P3CjCCAAAA3ajhXLVPLTxSptJT9d1I4Vy2yqCAAAANUhgAAANGj5Z07XqB0AALtkZ2erR48emjJlipxOp8aOHasLFy5U6TNr1iyFhYXJ4XDo+eeflyRt2bJFo0ePruyzadMmPfDAA5KkjRs3KjIyUn369NG4ceMqlyf+4x//UI8ePdS/f399+OGHdTRC4NoIIAAADVrLtrfVqB0AADsdPnxYM2fOVHp6ulq1aqU33nijyv2FCxfK7XYrPT1d27dvV3p6ugYNGqRDhw7p1KlTkqSlS5dq2rRpOn36tF588UVt3rxZe/fuVVhYmP77v/9bRUVFmjFjhtatW6cdO3bohx9+sGOowBUIIAAADVrU+Mlq2syrSlvTZl6KGj/ZpooAALi6zp07q1+/fpKkiRMnaufOnVXur1y5Un369FFISIgOHDiggwcPyjAMTZo0Se+++65yc3OVkpKi++67T7t379bBgwfVr18/uVwuvfPOOzp27JgyMzMVGBiou+66S4ZhaOLEiXYMFbgCp2AAABq0itMuKk7BaNn2Nk7BAADUW4ZhXPX6m2++UXx8vL788ku1bt1aU6dOVVFRkSRp2rRpGj58uJo3b65x48apadOmMk1TQ4YMUVJSUpV3pqWlXfE9QH1AAAEAaPB6Rg0kcAAANAjHjx9XSkqKIiMjlZSUpP79+2vdunWSpHPnzqlFixby8/PTyZMntX79esXExEiSOnTooA4dOujFF1/Upk2bJEkRERF67LHH9M9//lP/+q//qgsXLujbb79Vjx499M033+jIkSPq2rXrFQEFYBeWYAAAAABAHenZs6feeecdOZ1OnT17VrNmzaq8FxwcrJCQEDkcDk2fPr1yqUaFCRMmqHPnzurVq5ck6fbbb9fbb7+tX//613I6nYqIiFBmZqaaN2+uxMREDR06VP3799evfvWrOh0jcDWGaZp213BdYWFhppXn4gIAAACA1bKzszVs2DBlZGT8oucff/xxhYSE6JFHHrl2x/SV0pYXpLxvJb9O0uDnJOeDv+g7gRthGEaqaZph1+vHDIh6aM2aNTIMQ5mZmZZ/V3Z2tt57773Ka7fbrdmzZ1v+vQAAAABuXGhoqNLT06+/oWT6SmndbCnvhCSz7Oe62WXtgM2YAVEPPfjgg8rJydHgwYO1YMGCKvdKS0vl4eFRa9+VnJys+Ph4ffLJJ7X2TgAAAAA2eTWoPHy4jF9n6alfNvMCuB5mQDRQBQUF+vzzz/XWW29pxYoVkspCgoEDB+rhhx9W7969JUl/+MMf1KNHDw0ZMkS//vWvFR8fL0k6cuSI4uLiFBoaqqioqMpZFFOnTtXs2bN1zz33qEuXLlq1apUk6ZlnntGOHTvkcrn06quvKjk5WcOGDZMkLViwQNOnT1dMTIy6dOmihISEyjpHjRql0NBQORwOJSYm1tmfDwAAAIBryPu2Zu1AHeIUjHrmo48+UlxcnLp166Y2bdpo7969kqQ9e/YoIyNDgYGBcrvdWr16tfbt26eSkhL16dNHoaGhkqSZM2dq8eLFuuuuu/TFF1/ot7/9rbZu3SpJysnJ0c6dO5WZmakRI0Zo7Nixevnll6vMgEhOTq5ST2ZmprZt26b8/Hx1795ds2bNkqenp5YsWaI2bdqosLBQ4eHhGjNmjNq2bVt3f1AAAAAAruTX6SozIDrVfS3AZQgg6pmkpCQ9+eSTkqTx48crKSlJQ4cOVd++fRUYGChJ2rlzp0aOHClvb29J0vDhwyWVzZ7YtWuXxo0bV/m+ixcvVn4eNWqUmjRpol69eunkyZM3VM/QoUPl5eUlLy8vtWvXTidPnlSnTp2UkJCgNWvWSJJOnDihrKwsAggAAADAboOfK9vzobjwf9s8vcvaAZsRQNQjZ86c0datW5WRkSHDMFRaWirDMHT//ferRYsWlf2utm/HpUuX5O/vr7S0tGrve3l5Xfcd13rGw8NDJSUlSk5O1ubNm5WSkiIfHx/FxMSoqKjoht4HAAAAwEIVp11wCgbqIfaAqEdWrVqlyZMn69ixY8rOztaJEycUGBionTt3VunXv39/rVu3TkVFRSooKNDf//53SVKrVq0UGBioDz74QFJZyLB///5rfmfLli2Vn59fozrz8vLUunVr+fj4KDMzU7t3767R8wAAAAAs5HywbMPJBbllPwkfUE8QQNQjSUlJGj16dJW2MWPGVDkmU5LCw8M1YsQIBQcH64EHHlBYWJj8/PwkScuXL9dbb72l4OBgORwOrV279prf6XQ61bRpUwUHB+vVV1+9oTrj4uJUUlIip9Op+fPnKyIiogajBAAAAAA0RhzD2UAVFBTI19dXFy5cUHR0tBITE9WnTx+7ywJQB+655x7t2rXL7jIAAAAASTd+DCd7QDRQM2fO1MGDB1VUVKQpU6bUafjw0b7vtGjDYX2fW6gO/t6aE9tdo0I61tn3A40d4QMAAAAaIgKIBuryZRl15aN932neh1+psLhUkvRdbqHmffiVJBFCAHXE19dXBQUFysnJ0UMPPaRz586ppKREb775pqKiouwuDwAAAKgWe0CgRhZtOFwZPlQoLC7Vog2HbaoIaLzee+89xcbGKi0tTfv375fL5bK7JAAAAOCqmAGBGvk+t7BG7QCsEx4erunTp6u4uFijRo0igAAAAEC9xgwI1EgHf+8atQOwTnR0tD777DN17NhRkyZN0rJly+wuCQAAALgqAgjUyJzY7vL29KjS5u3poTmx3W2qCGi8jh07pnbt2mnGjBl65JFHtHfvXrtLAgAAAK6KJRiokYqNJjkFA7BfcnKyFi1aJE9PT/n6+jIDAgAAAPWaYZqm3TVcV1hYmOl2u+0uAwAAAAAAXMYwjFTTNMOu148ZEADQwHy07ztmIQEAAKDBIYAAgAbko33fad6HX1Ueh/tdbqHmffiVJBFCAAAAoF5jE0oAaEAWbThcGT5UKCwu1aINh22qCAAAALgxBBAA0IB8n1tYo3YAAACgviCAAIAGpIO/d43aAQAAgPqCAAIAGpA5sd3l7elRpc3b00NzYrvbVBEAAABwY9iEEgAakIqNJjkFAwAAAA0NAQQANDCjQjoSOAAAAKDBYQkGAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAA
AAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAAAAAACwHAEEAACAxWJiYuR2u+0uAwAAWxFAAAAAAAAAyxFAAAAA1JLs7Gz16NFDU6ZMkdPp1NixY3XhwoUqfWbNmqWwsDA5HA49//zzkqQtW7Zo9OjRlX02bdqkBx54oE5rBwDAarYFEIZhxBmGcdgwjH8ahvGMXXUAAADUpsOHD2vmzJlKT09Xq1at9MYbb1S5v3DhQrndbqWnp2v79u1KT0/XoEGDdOjQIZ06dUqStHTpUk2bNs2O8gEAsIwtAYRhGB6S/j9J90nqJenXhmH0sqMWAACA2tS5c2f169dPkjRx4kTt3Lmzyv2VK1eqT58+CgkJ0YEDB3Tw4EEZhqFJkybp3XffVW5urlJSUnTffffZUT4AAJZpatP39pX0T9M0j0qSYRgrJI2UdNCmegAAAGqFYRhXvf7mm28UHx+vL7/8Uq1bt9bUqVNVVFQkSZo2bZqGDx+u5s2ba9y4cWra1K6/pgEAYA27lmB0lHTiZ9fflrcBAAA0aMePH1dKSookKSkpSf3796+8d+7cObVo0UJ+fn46efKk1q9fX3mvQ4cO6tChg1588UVNnTq1rssGAMBydgUQRjVtZpUOhjHTMAy3YRjuivWQAAAA9V3Pnj31zjvvyOl06uzZs5o1a1blveDgYIWEhMjhcGj69OmVSzUqTJgwQZ07d1avXqxMBQDceuya2/etpM4/u+4k6fufdzBNM1FSoiSFhYVVCScAAADqqyZNmmjx4sVV2pKTkys/v/3221d9dufOnZoxY4ZFlQEAYC+7ZkB8KekuwzACDcNoJmm8pI9tqgUAAMA2OT+s1eefR6lbNy/t3Pk/GvxvfnaXBACAJWyZAWGaZolhGI9L2iDJQ9IS0zQP2FELAABAbQkICFBGRsYN98/5Ya0yM5/VpUuFenNxJ0nSN98skJeXp9rfMdKqMgEAsIVt2yubpvmppE/t+n4AAAC7HT0Sr0uXCqu0XbpUqKNH4gkgAAC3HLuWYAAAADR6RRdzatQOAEBDRgABAABgk+Ze7WvUDgBAQ0YAAQAAYJMuXZ9WkybeVdqaNPFWl65P21QRAADWsW0PCAAAgMauYp+Ho0fiVXQxR8292qtL16fZ/wEAcEsigAAAALBR+ztGEjgAABoFlmAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAAAADLEUAAAAAA+P/bu/fwqKp7/+PvRcAEAwKKSgAVoQoCxhAioggoVNEqKhYUixW12mqPiPbgUevTGn/W1lOxIu1Rq62XeqnaWKuIraiFCoqXBCgKigjNqUo4oggFFEzI+v2RSQoYbpLN5PJ+PU8eZ9ZeM/PdzHbP5JO11pakxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASEqr5cuXM3r0aLp160bPnj35xje+wd13382pp55aa/+LLrqIhQsXAtClSxc+/vjjL/UpLCxk4sSJidYtSZIkaec0T3cBkpquGCMjRoxg7NixPProowDMmzePKVOmbPUxv/nNb3ZXeZIkSZLqkCMgJKXN9OnTadGiBZdccklNW15eHgMHDmTt2rWMHDmSHj16MGbMGGKMABx33HEUFxd/6bluuukmunfvzte//nUWLVq02/ZBkiRJ0o5xBISktHnrrbfo27dvrdvmzp3LggUL6NixIwMGDODll1/m2GOPrbVvSUkJjz76KHPnzqWiooL8/PytPq8kSZKk9HAEhKR6qV+/fnTu3JlmzZqRl5dHaWnpVvvOnDmTESNGsOeee7LXXntx2mmn7b5CJUmSJO0QAwhJadOrVy9KSkpq3ZaZmVlzOyMjg4qKim0+VwihTmuTpIZsa9PVmprJkydz2GGHMWbMmJ163IwZM3jllVdq7p9//vkUFRXt8ONLS0vp3bt3zXNtbWFlSWpqDCAkpc2QIUPYsGED99xzT03bG2+8wd/+9redep5Bgwbx5JNP8vnnn7NmzZptLmIpSWo67rjjDp599lkefvjhnXrclgGEJKluGEBISpsQAk8++STPP/883bp1o1evXhQWFtKxY8edep78/HzOPvts8vLy+OY3v8nAgQMTqliS6pfS0lJ69OjB2LFjyc3NZeTIkXz22Web9bn00kspKCigV69eXH/99QC8+OKLjBgxoqbP888/z5lnnrlba0/aJZdcwtKlSznttNO49dZbOeOMM8jNzaV///7Mnz8fgJUrV36pvbS0lLvuuovbbruNvLw8Zs6cCcALL7zAwIEDOfTQQ3nmmWeAqn//gQMHkp+fT35+vqGFJG1HqF5Zvj4rKCiIDiOUJEnaXGlpKQcffDCzZs1iwIABXHjhhfTs2ZNnnnmGiRMnUlBQwMqVK9l7773ZuHEjQ4cOZfLkyRx++OEcdthhzJw5k3333ZdvfetbnHPOOQwfPjzdu1SnunTpQnFxMTfccAPt27fn+uuv569//Ss/+MEPmDdvHuPGjau1vbCwkFatWjFhwgSgagrG8uXLefbZZ1myZAnHH3887733HpWVlTRr1oysrCwWL17MOeecQ3FxMaWlpZx66qm89dZbzJgxg4kTJ9aEFpLUGIUQSmKMBdvr5wgISQ3eE8tXUvDKAnKmz6PglQU8sXxlukuSpN3mgAMOYMCAAQCce+65zJo1a7Ptjz/+OPn5+fTp04cFCxawcOFCQgh8+9vf5qGHHmLVqlXMnj2bk08+OR3l7xazZs3i29/+NlA1/e+TTz5h9erVW22vzVlnnUWzZs045JBD6Nq1K++88w7l5eVcfPHFHH744YwaNYqFCxfutn2SpIbIy3BKatCeWL6SCYve5/PKqtFcH2woZ8Ki9wH4Zoe901maJO0WWy7Cu+n9f/zjH0ycOJE33niDdu3acf7557N+/XoALrjgAoYPH05WVhajRo2iefPG+7WwthG/IYStttemtn/n2267jf3335+///3vVFZWkpWVVTcFS1Ij5QgISQ3az5aW1YQP1T6vjPxsaVmaKpKk3euf//wns2fPBuD3v/89xx57bM22f/3rX2RnZ9OmTRv+7//+jz//+c812zp27EjHjh35yU9+wvnnn7+7y96tBg0aVLMQ5YwZM2jfvj177bXXVttbt27NmjVrNnuOP/zhD1RWVrJkyRKWLl1K9+7dWb16NTk5OTRr1owHH3yQjRs37vZ9k6SGxABCUoP24YbynWqXpMbmsMMO44EHHiA3N5eVK1dy6aWX1mw74ogj6NOnD7169eLCCy+smapRbcyYMRxwwAH07Nlzd5e9WxUWFlJcXExubi7XXHMNDzzwwDbbhw8fzpNPPrnZIpTdu3dn8ODBnHzyydx1111kZWXx/e9/nwceeID+/fvz7rvvkp2dnbZ9lKSGwEUoJT
VoBa8s4INawobOmS0oPqZXGiqSpN1n08UOv4rLLruMPn368J3vfKeOK2vapi6dyu1zbmf5uuV0yO7A+PzxnNL1lHSXJUmJcRFKSU3CtV1zaNls83m5LZsFru2ak6aKJKl++9PcDxlw81/J7PA1Hpz6Env1Pj7dJTUqU5dOpfCVQsrWlRGJlK0ro/CVQqYunZru0iQp7RwBIanBe2L5Sn62tIwPN5TTKbMF13bNcQFKSarFn+Z+yLV/fJPPy/+9VkHLFhn87MzDOaNPpzRW1nicWHQiZeu+vA5RTnYO00ZOS0NFkpS8HR0B0XiXO5bUZHyzw94GDpK0A255btFm4QPA5+UbueW5RQYQdWT5uuU71S5JTYlTMCRJkpqIZas+36l27bwO2R12ql2SmhIDCEmSpCaiY9uWO9WunTc+fzxZGVmbtWVlZDE+f3yaKpKk+sMAQpIkqYm4alh3WrbI2KytZYsMrhrWPU0VNT6ndD2FwmMKycnOIRDIyc6h8JhCr4IhSbgGhCRJUpNRvc7DLc8tYtmqz+nYtiVXDevu+g917JSupxg4SFItDCAkSZKakDP6dDJwkCSlhVMwJEmqJ1atWsUdd9wBwIwZMzj11FPTXJEkSVLdMYCQJKme2DSAkCRJamwMICRJqieuueYalixZQl5eHldddRVr165l5MiR9OjRgzFjxhBjBKCkpITBgwfTt29fhg0bRllZWZorlyRJ2j4DCEmS6ombb76Zbt26MV18OScAACAASURBVG/ePG655Rbmzp3LpEmTWLhwIUuXLuXll1+mvLyccePGUVRURElJCRdeeCHXXXddukuXJEnaLhehlCSpnurXrx+dO3cGIC8vj9LSUtq2bctbb73FCSecAMDGjRvJyclJZ5mSJEk7xABCkqR6KjMzs+Z2RkYGFRUVxBjp1asXs2fPTmNlkiRJO88pGJIk1ROtW7dmzZo12+zTvXt3VqxYURNAlJeXs2DBgt1RniRJ0i5xBIQkSfXEPvvsw4ABA+jduzctW7Zk//33/1KfPfbYg6KiIi6//HJWr15NRUUFV1xxBb169UpDxZIkSTsuVK+oXZ8VFBTE4uLidJchSZIkSZK2EEIoiTEWbK+fIyAkSWpA5s+fz4svvsjq1atp06YNQ4cOJTc3N91lSZIkbZcBhCRJDcT8+fOZMmUK5eXlAKxevZopU6YAGEJIkqR6z0UoJUlqIF588cWa8KFaeXk5L774YpoqkiRJ2nEGEJIkNRCrV6/eqXZJkqT6xABCkqQGok2bNjvVLkmSVJ8YQEiS1EAMHTqUFi1abNbWokULhg4dmqaKJEmSdpyLUEqS1EBULzTpVTAkSVJDZAAhSVIDkpuba+AgSZIaJKdgSJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJIkSZKkxBlASJK0hVWrVnHHHXcAMGPGDE499dQ0VyRJktTwGUBIkrSFTQMISZIk1Q0DCEmStnDNNdewZMkS8vLyuOqqq1i7di0jR46kR48ejBkzhhgjACUlJQwePJi+ffsybNgwysrK0ly5JElK0qRJk/jss8/SXUaDZQAhSdIWbr75Zrp168a8efO45ZZbmDt3LpMmTWLhwoUsXbqUl19+mfLycsaNG0dRURElJSVceOGFXHfddekuXZIkJWhbAcTGjRt3czUNT/N0FyBJUn3Xr18/OnfuDEBeXh6lpaW0bduWt956ixNOOAGo+tKRk5OTzjIlSVIdWrduHWeddRYffPABGzduZNSoUSxbtozjjz+e9u3bM336dFq1asUPfvADnnvuOW699VY2bNjAhAkTqKio4Mgjj+TOO+8kMzOTLl26MHbsWKZMmUJ5eTl/+MMf6NGjBytWrOBb3/oWn3zyCUceeSR/+ctfKCkpoX379une/UQ4AkKSpO3IzMysuZ2RkUFFRQUxRnr16sW8efOYN28eb775JtOmTUtjlZIkqS795S9/oWPHjvz973/nrbfe4oorrqBjx45Mnz6d6dOnA1UhRe/evXnttdcoKCjg/PPP57HHHuPNN9+koqKCO++8s+b52rdvz5w5c7j00kuZOHEiADfccANDhgxhzpw5jBgxgn/+859p2dfdxQBCkqQttG7dmjVr1myzT/fu3VmxYgWzZ88GoLy8nAULFuyO8iQprYqLi7n88suBqisFvfLKK2muSErG4YcfzgsvvMDVV1/NzJkzadOmzZf6ZGRk8M1vfhOARYsWcfDBB3PooYcCMHbsWF566aWavmeeeSYAffv2pbS0FIBZs2YxevRoAE466STatWuX5C6lnVMwJEnawj777MOAAQPo3bs3LVu2ZP/99/9Snz322IOioiIuv/xyVq9eTUVFBVdccQW9evVKQ8WStPsUFBRQUFAAVAUQrVq14phjjtnhx1dUVNC8ub+GqP479NBDKSkp4dlnn+Xaa6/lxBNP/FKfrKwsMjIyAGoWqd6a6hGV1aMpd+QxjY3/50uSVItHHnmk1vZf/epXNbfz8vI2+8uGJDVUN954Iw8//DAHHHAA7du3p2/fvjzzzDNMnDiRgoICPv74YwoKCigtLWXGjBlMnDiRX/3qV9x1111kZGTw0EMP8ctf/pJVq1bxk5/8hC+++IJ99tmHhx9+mP3335/CwkKWLVtGaWkp7du33+o5VqpPli1bxt577825555Lq1atuP/++2tGSda2RkOPHj0oLS3lvffe42tf+xoPPvgggwcP3uZrHHvssTz++ONcffXVTJs2jU8//TSp3akXDCAkSfoKnli+kp8tLePDDeV0ymzBtV1z+GaHvdNdliTttOLiYp544gnmzp1LRUUF+fn59O3bd7uP69KlC5dccgmtWrViwoQJAHz66ae8+uqrhBD4zW9+w89//nNuvfVWoOrSxbNmzaJly5aJ7o9UV958802uuuoqmjVrRosWLbjzzjuZPXs2J598Mjk5OTXrQFTLysrivvvuY9SoUTWLUF5yySXbfI3rr7+ec845h8cee4zBgweTk5ND69atk9yttDKAkCRpJz2xfCUTFr3P55VVwyY/2FDOhEXvAxhCSGpwZs2axemnn14TDAwfPvwrP9cHH3zA2WefTVlZGV988QUHH3xwzbbTTjvN8EENyrBhwxg2bNhmbQUFBYwbN67m/tq1azfbPnToUObOnful56pe86H6OWbMmAFAmzZteO6552jevDmzZ89m+vTpmy1+3di4CKUkSTvpZ0vLasKHap9XRn62tCxNFUn113HHHUdxcTFQ9Rfzjz/+OM0VaUtbm4PevHlzKisrAVi/fv0OPde4ceO47LLLePPNN
/n1r3+92eOys7N3vVipkSkuuZeePdvSrVsmF154Ij+7+dx0l5QoAwhJknbShxvKd6pdkuqzY489lilTprB+/XrWrl3L1KlTgarAqKSkBICioqJaH7vlVYNWr15Np06dAHjggQcSrlxq2MqWP8UXX/wPd961P3ff05lf/c9+ZGXdT9nyp9JdWmIMICRJ2kmdMlvsVLvUGPz85z9n8uTJAFx55ZUMGTIEgBdffJFzzz2XadOmcfTRR5Ofn8+oUaO+NCxZ9deRRx7JaaedxhFHHMGZZ55JQUEBbdq0YcKECdx5550cc8wxWx25Mnz4cJ588kny8vKYOXMmhYWFjBo1ioEDB9a6SJ+kf1u6ZCKVlZ9v1lZZ+TlLl0xMU0XJCw3hsh8FBQWxeuieJEnptuUaEAAtmwUmdj/ANSDUaL366qvceuut/OEPf2DgwIFs2LCBl19+mZ/+9KdkZWUxdepU/vznP5Odnc1///d/s2HDBn784x9z3HHH1VxJoUuXLhQXF/uLaT20du1aWrVqxWeffcagQYO4++67yc/PT3dZUqP24l+/BtT2+3hg6JD3dnc5uySEUBJjLNhePxehlCRpJ1WHDF4FQ01J3759KSkpYc2aNWRmZpKfn09xcTEzZ87ktNNOY+HChQwYMACAL774gqOPPjrNFWtnfPe732XhwoWsX7+esWPH1kn4sHrKFD66bRIVZWU0z8lhvyuvoM0uLHApNTZZmTms37Cs1vbGygBCkqSv4Jsd9jZwUJPSokULunTpwn333ccxxxxDbm4u06dPZ8mSJRx88MGccMIJ/P73v093mfqKHnnkkTp9vtVTplD2ox8TU4tQVixbRtmPfgxgCCGldO02gXfeuW6zaRjNmrWka7cJaawqWa4BIUmSpB0yaNAgJk6cyKBBgxg4cCB33XUXeXl59O/fn5dffpn33qsaMvzZZ5/x7rvvprlapdNHt02qCR+qxfXr+ei2SWmqSKp/cjqcTo8eN5GV2REIZGV2pEePm8jpcHq6S0uMIyAkSZK0QwYOHMhNN93E0UcfTXZ2NllZWQwcOJB9992X+++/n3POOYcNGzYA8JOf/IRDDz00zRUrXSrKar8s8dbapaYqp8PpjTpw2JKLUEqSJEmqU4uHDKVi2Zfntjfv2JFD/vpiGiqSlCQXoZQkSVJavfvacmY/tYS1KzfQau9Mjj69G4ce1SHdZWk32O/KKzZbAwIgZGWx35VXpLEqSelmACFJkqQ69+5ry5n+8DtUfFEJwNqVG5j+8DsAhhBNQPVCk14FQ9KmDCAkSZJU52Y/taQmfKhW8UUls59aYgDRRLQZPtzAQdJmvAqGJEmS6tzalRt2ql2S1PgZQEiSJKnOtdo7c6faJUmNnwGEJEmS6tzRp3ej+R6bf9Vsvkczjj69W5oqkiSlm2tASJIkqc5Vr/PgVTAkSdUMICRJkpSIQ4/qYOAgSarhFAxJkrRNq1at4o477gBgxowZnHrqqWmuSJIkNUS7FECEEG4JIbwTQpgfQngyhNB2k23XhhDeCyEsCiEM26T9pFTbeyGEa3bl9SVJUvI2DSAkSZK+ql0dAfE80DvGmAu8C1wLEELoCYwGegEnAXeEEDJCCBnA/wAnAz2Bc1J9JUlSPXXNNdewZMkS8vLyuOqqq1i7di0jR46kR48ejBkzhhgjACUlJQwePJi+ffsybNgwysrK0ly5JEmqT3YpgIgxTosxVqTuvgp0Tt0+HXg0xrghxvgP4D2gX+rnvRjj0hjjF8Cjqb6SJKmeuvnmm+nWrRvz5s3jlltuYe7cuUyaNImFCxeydOlSXn75ZcrLyxk3bhxFRUWUlJRw4YUXct1116W7dEmSVI/U5SKUFwKPpW53oiqQqPZBqg3g/S3aj6rtyUII3wW+C3DggQfWYZmSJGlX9OvXj86dq/7mkJeXR2lpKW3btuWtt97ihBNOAGDjxo3k5OSks0xJklTPbDeACCG8ANS2fPF1McanUn2uAyqAh6sfVkv/SO0jLmJtrxtjvBu4G6CgoKDWPpIkaffLzMysuZ2RkUFFRQUxRnr16sXs2bPTWJkkSarPthtAxBi/vq3tIYSxwKnA0Fg9CbRqZMMBm3TrDCxL3d5auyRJqodat27NmjVrttmne/furFixgtmzZ3P00UdTXl7Ou+++S69evXZTlZIkqb7bpSkYIYSTgKuBwTHGzzbZ9DTwSAjhF0BH4BDgdapGRhwSQjgY+JCqhSq/tSs1SJKkZO2zzz4MGDCA3r1707JlS/bff/8v9dljjz0oKiri8ssvZ/Xq1VRUVHDFFVcYQEiSpBrh34MWvsKDQ3gPyAQ+STW9GmO8JLXtOqrWhagArogx/jnV/g1gEpAB3BtjvGl7r1NQUBCLi4u/cp2SJEmSJCkZIYSSGGPBdvvtSgCxuxhASJJUv01dOpXb59zO8nXL6ZDdgfH54zml6ynpLkuSJO0GOxpA1OVVMCRJUhM0delUCl8pZP3G9QCUrSuj8JVCAEMISZJUo7arUkiSJO2w2+fcXhM+VFu/cT23z7k9TRVJkqT6yABCkiTtkuXrlu9UuyRJapoMICRJ0i7pkN1hp9olSVLTZAAhSZJ2yfj88WRlZG3WlpWRxfj88WmqSJIk1UcuQilJknZJ9UKTXgVDkiRtiwGEJEnaZad0PcXAQZIkbZNTMCRJkiRJUuIMICRJkiRJUuIMICRJkiRJUuIMICRJkiRJUuIMICSpCSssLGTixIk73H/ZsmWMHDkywYokSZLUWBlASJJ2SEVFBR07dqSoqCjdpUiSJKkBMoCQpCbmpptuonv37nz9619n0aJFABx33HEUFxcD8PHHH9OlSxcA7r//fkaNGsXw4cM58cQTKS0tpXfv3jXbzjzzTE466SQOOeQQ/uu//qvmNX77299y6KGHctxxx3HxxRdz2WWX7d6dlCRJUr3TPN0FSJJ2n5KSEh599FHmzp1LRUUF+fn59O3bd5uPmT17NvPnz2fvvfemtLR0s23z5s1j7ty5ZGZm0r17d8aNG0dGRgY33ngjc+bMoXXr1gwZMoQjjjgiwb2SJElSQ2AAIUlNyMyZMxkxYgR77rknAKeddtp2H3PCCSew995717pt6NChtGnTBoCePXvyv//7v3z88ccMHjy45jGjRo3i3XffraM9kCRJUkPlFAxJamJCCF9qa968OZWVlQCsX79+s23Z2dlbfa7MzMya2xkZGVRUVBBjrKNKJUmS1JgYQEhSEzJo0CCefPJJPv/8c9asWcOUKVMA6NKlCyUlJQC7vMhkv379+Nvf/sann35KRUUFTzzxxC7XLUmSpIbPKRiS1ITk5+dz9tlnk5eXx0EHHcTAgQMBmDBhAmeddRYPPvggQ4YM2aXX6NSpEz/84Q856qij6NixIz179qyZpiFJkqSmKzSEobIFBQWxenV2SVL9t3btWlq1akVFRQUjRozgwgsvZMSIEekuS5IkSQkIIZTEGAu2188pGJKkOlW2/CkuvrgHX/taJt267cV++0fOOOOMdJclSZKkNHMKhiSpzpQtf4p33rmOiy7O
5KKLOwPQrFkpy//vaXI6nJ7m6iRJkpROjoCQJNWZpUsmUln5+WZtlZWfs3TJxDRVJEmSpPrCAEKSVGfWbyjbqXZJkiQ1HQYQkqQ6k5WZs1PtkiRJajoMICRJdaZrtwk0a9Zys7ZmzVrStduENFUkSZKk+sJFKCVJdaZ6ocmlSyayfkMZWZk5dO02wQUoJUmSZAAhSapbOR1ON3CQJEnSlzgFQ5IkSZIkJc4AQpIkSZIkJc4AQpIkSZIkJc4AQpIkSZIkJc4AQpIkSZIkJc4AQpIkSZIkJc4AQpIkSZIkJc4AQpIkSZIkJc4AQpIkSXUmxkhlZWW6y5Ak1UPN012AJEmS6p+rr76agw46iO9///sAFBYW0rp1ayorK3n88cfZsGEDI0aM4IYbbqC0tJSTTz6Z448/ntmzZ3PGGWewatUqbrvtNgDuuece3n77bX7xi1+kc5ckSWnmCAhJkiR9yejRo3nsscdq7j/++OPsu+++LF68mNdff5158+ZRUlLCSy+9BMCiRYs477zzmDt3LhMmTODpp5+mvLwcgPvuu48LLrggLfshSao/HAEhSZKkL+nTpw8fffQRy5YtY8WKFbRr14758+czbdo0+vTpA8DatWtZvHgxBx54IAcddBD9+/cHIDs7myFDhvDMM89w2GGHUV5ezuGHH57O3ZEk1QMGEJIkSarVyJEjKSoqYvny5YwePZrS0lKuvfZavve9723Wr7S0lOzs7M3aLrroIn7605/So0cPRz9IkgADCEmSJG3F6NGjufjii/n444/529/+xptvvsmPfvQjxowZQ6tWrfjwww9p0aJFrY896qijeP/995kzZw7z58/fzZVLkuojAwhJkiTVqlevXqxZs4ZOnTqRk5NDTk4Ob7/9NkcffTQArVq14qGHHiIjI6PWx5911lnMmzePdu3a7c6yJUn1VIgxpruG7SooKIjFxcXpLkOSJEk74dRTT+XKK69k6NCh6S5FkpSgEEJJjLFge/28CoYkSZLqzOopUygZOIgue2Sy8fXXKfjss3SXJEmqJ5yCIUmSpDqxesoUyn70Y/Zcv54/d+0KQNmPfgxAm+HD01maJKkecASEJEmS6sRHt00irl+/WVtcv56PbpuUpookSfWJAYQkSZLqREVZ2U61S5KaFgMISZIk1YnmOTk71S5JaloMICRJklQn9rvyCkJW1mZtISuL/a68Ik0VSZLqExehlCRJUp2oXmjyo9smUVFWRvOcHPa78goXoJQkAQYQkiRJqkNthg83cJAk1copGJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSZIkKXEGEJIkSVIj1KpVq11+jvvvv5/LLrtsm31KS0t55JFHdvm1JDV+BhCSJEmSvjIDCEk7ygBCkiRJauDOOOMM+vbtS69evbj77rtr2v/zP/+T/Px8hg4dyooVKwCYPHkyPXv2JDc3l9GjRwOwcuVKzjjjDHJzc+nfvz/z58//0mucf/75FBUV1dyvHmFxzTXXMHPmTPLy8rjtttvYuHEjV111FUceeSS5ubn8+te/TnLXJTUgBhCSJElSA3fvvfdSUlJCcXExkydP5pNPPmHdunXk5+czZ84cBg8ezA033ADAzTffzNy5c5k/fz533XUXANdffz19+vRh/vz5/PSnP+W8887b4de++eabGThwIPPmzePKK6/kt7/9LW3atOGNN97gjTfe4J577uEf//hHIvstqWExgJAkSZIauMmTJ3PEEUfQv39/3n//fRYvXkyzZs04++yzATj33HOZNWsWALm5uYwZM4aHHnqI5s2bAzBr1iy+/e1vAzBkyBA++eQTVq9e/ZVqmTZtGr/73e/Iy8vjqKOO4pNPPmHx4sV1sJeSGrrm6S5AkiRJ0lc3Y8YMXnjhBWbPns2ee+7Jcccdx/r167/UL4QAwNSpU3nppZd4+umnufHGG1mwYAExxq32r9a8eXMqKysBiDHyxRdf1FpPjJFf/vKXDBs2bFd3TVIj4wgISZIkqQFbvXo17dq1Y8899+Sdd97h1VdfBaCysrJmzYZHHnmEY489lsrKSt5//32OP/54fv7zn7Nq1SrWrl3LoEGDePjhh4GqQKN9+/bstddem71Oly5dKCkpAeCpp56ivLwcgNatW7NmzZqafsOGDePOO++s2f7uu++ybt26ZP8RJDUIjoCQJEmSGrCTTjqJu+66i9zcXLp3707//v0ByM7OZsGCBfTt25c2bdrw2GOPsXHjRs4991xWr15NjJErr7yStm3bUlhYyAUXXEBubi577rknDzzwwJde5+KLL+b000+nX79+DB06lOzsbKBqSkfz5s054ogjOP/88xk/fjylpaXk5+cTY2TfffflT3/60279N5FUP4XahlvVNwUFBbG4uDjdZUiSJEnahnVzP+Jfz5WycdUGMtpmstewLmT32S/dZUlKWAihJMZYsL1+joCQJEmStMvWzf2IVX9cTCyvWidi46oNrPpj1eKThhCSwDUgJEmSJNWBfz1XWhM+VIvllfzrudL0FCSp3qmTACKEMCGEEEMI7VP3QwhhcgjhvRDC/BBC/iZ9x4YQFqd+xtbF60uSJElKr42rNuxUu6SmZ5enYIQQDgBOAP65SfPJwCGpn6OAO4GjQgh7A9cDBUAESkIIT8cYP93VOiRJkiSlT0bbzFrDhoy2mWmoRlJ9VBcjIG4D/ouqQKHa6cDvYpVXgbYhhBxgGPB8jHFlKnR4HjipDmqQJEmSlEZ7DetCaLH5rxehRTP2GtYlPQVJqnd2aQRECOE04MMY499DCJtu6gS8v8n9D1JtW2uXJEmS1IBVLzTpVTAkbc12A4gQwgtAh1o2XQf8EDixtofV0ha30V7b634X+C7AgQceuL0yJUmSJKVZdp/9DBwkbdV2A4gY49draw8hHA4cDFSPfugMzAkh9KNqZMMBm3TvDCxLtR+3RfuMrbzu3cDdAAUFBbWGFJIkSZIkqWH4ymtAxBjfjDHuF2P
sEmPsQlW4kB9jXA48DZyXuhpGf2B1jLEMeA44MYTQLoTQjqrRE8/t+m5IkiRJkqT6bJevgrEVzwLfAN4DPgMuAIgxrgwh3Ai8ker3/2KMKxOqQZIkSZIk1RN1FkCkRkFU347Af2yl373AvXX1upIkSZIkqf6ri8twSpIkSZIkbZMBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCRJkiRJSpwBhCTVUzFGKisr012GJEmSVCcMICSpDv3iF7+gd+/e9O7dm0mTJnH11Vdzxx131GwvLCzk1ltvBeCWW27hyCOPJDc3l+uvvx6A0tJSDjvsML7//e+Tn5/P+++/n5b9kCRJkuqaAYQk1ZGSkhLuu+8+XnvtNV599VXuueceRo8ezWOPPVbT5/HHH2fUqFFMmzaNxYsX8/rrrzNv3jxKSkp46aWXAFi0aBHnnXcec+fO5aCDDkrX7kiSJEl1qnm6C5CkxmLWrFmMGDGC7OxsAM4880xmzpzJRx99xLJly1ixYgXt2rXjwAMPZPLkyUybNo0+ffoAsHbtWhYvXsyBBx7IQQcdRP/+/dO5K5IkSVKdM4CQpDoSY6y1feTIkRQVFbF8+XJGjx5d0/faa6/le9/73mZ9S0tLawIMSZIkqTFxCoYk1ZFBgwbxpz/9ic8++4x169bx5JNPMnDgQEaPHs2jjz5KUVERI0eOBGDYsGHce++9rF27FoAPP/yQjz76KJ3lS5IkSYlyBIQk1ZH8/HzOP/98+vXrB8BFF11UM8VizZo1dOrUiZycHABOPPFE3n77bY4++mgAWrVqxUMPPURGAScC2wAACLhJREFURkZ6ipckSZISFrY2ZLg+KSgoiMXFxekuQ5IkSZIkbSGEUBJjLNheP6dgSFI9MHXpVE4sOpHcB3I5sehEpi6dmu6SJEmSpDrlFAxJSrOpS6dS+Eoh6zeuB6BsXRmFrxQCcErXU9JYmSRJklR3HAEhSWl2+5zba8KHaus3ruf2ObenqSJJkiSp7hlASFKaLV+3fKfaJUmSpIbIAEKS0qxDdoedapckSZIaIgMISUqz8fnjycrI2qwtKyOL8fnj01SRJEmSVPdchFKS0qx6ocnb59zO8nXL6ZDdgfH5412AUpIkSY2KAYQk1QOndD3FwEGSJEmNmlMwJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4gwgJEmSJElS4kKMMd01bFcIYQXwv+muoxbtgY/TXYTqHY8LbcljQlvymFBtPC60JY8J1cbjQluqD8fEQTHGfbfXqUEEEPVVCKE4xliQ7jpUv3hcaEseE9qSx4Rq43GhLXlMqDYeF9pSQzomnIIhSZIkSZISZwAhSZIkSZISZwCxa+5OdwGqlzwutCWPCW3JY0K18bjQljwmVBuPC22pwRwTrgEhSZIkSZIS5wgISZIkSZKUOAOInRBCGBdCWBRCWBBC+Pkm7deGEN5LbRu2SftJqbb3QgjXpKdqJS2EMCGEEEMI7VP3Qwhhcup9nx9CyN+k79gQwuLUz9j0Va0khBBuCSG8k3rfnwwhtN1km+cJAb7nTVUI4YAQwvQQwtup7xHjU+17hxCeT30uPB9CaJdq3+pniRqXEEJGCGFuCOGZ1P2DQwivpY6Jx0IIe6TaM1P330tt75LOupWcEELbEEJR6jvF2yGEoz1XKIRwZerz460Qwu9DCFkN8XxhALGDQgjHA6cDuTHGXsDEVHtPYDTQCzgJuCP1QZIB/A9wMtATOCfVV41ICOEA4ATgn5s0nwwckvr5LnBnqu/ewPXAUUA/4PrqDw81Gs8DvWOMucC7wLXgeUL/5nvepFUA/xljPAzoD/xH6r2/BngxxngI8GLqPmzls0SN0njg7U3u/zdwW+qY+BT4Tqr9O8CnMcavAbel+qlxuh34S4yxB3AEVceH54omLITQCbgcKIgx9gYyqPpu2eDOFwYQO+5S4OYY4waAGONHqfbTgUdjjBtijP8A3qPql8t+wHsxxqUxxi+AR1N91bjcBvwXsOliKqcDv4tVXgXahhBygGHA8zHGlTHGT6n6ZfWk3V6xEhNjnBZjrEjdfRXonLrteULVfM+bqBhjWYxxTur2Gqp+oehE1fv/QKrbA8AZqdtb+yxRIxJC6AycAvwmdT8AQ4CiVJctj4nqY6UIGJrqr0YkhLAXMAj4LUCM8YsY4yo8VwiaAy1DCM2BPYEyGuD5wgBixx0KDEwNYflbCOHIVHsn4P1N+n2QattauxqJEMJpwIcxxr9vscljQgAXAn9O3faYUDXfc5EaCtsHeA3YP8ZYBlUhBbBfqpvHStMwiao/ZFSm7u8DrNokzN70fa85JlLbV6f6q3HpCqwA7ktNzflNCCEbzxVNWozxQ6pG4P+TquBhNVBCAzxfNE93AfVJCOEFoEMtm66j6t+qHVXDJo8EHg8hdAVqS5IitYc7XnKkgdnOMfFD4MTaHlZLW9xGuxqQbR0TMcanUn2uo2q49cPVD6ulv+eJpsnzQBMXQmgFPAFcEWP81zb+IOWx0siFEE4FPooxloQQjqturqVr3IFtajyaA/nAuBjjayGE2/n3dIvaeFw0Aalp26cDBwOrgD9QNf1mS/X+fGEAsYkY49e3ti2EcCnwx1h13dLXQwiVQHuqkqYDNunaGViWur21djUQWzsmQgiHU3UC+Hvqy2NnYE4IoR9bPyY+AI7bon1GnRetRG3rPAFVC40CpwJD47+vc+x5QtW2dSyokQshtKAqfHg4xvjHVPP/hRByYoxlqWHT1VM8PVYavwHAaSGEbwBZwF5UjYhoG0Jonvqr5abve/Ux8UFqCHYbYOXuL1sJ+wD4IMb4Wup+EVUBhOeKpu3rwD9ijCsAQgh/BI6hAZ4vnIKx4/5E1RwbQgiHAnsAHwNPA6NTK40eTNUCMK8DbwCHpFYm3YOqRUKeTkvlqnMxxjdjjPvFGLvEGLtQ9T95foxxOVXv83mpVYn7A6tTQ+WeA04MIbRLpZgnptrUSIQQTgKuBk6LMX62ySbPE6rme95Epebe/hZ4O8b4i002PQ1UXxVpLPDUJu21fZaokYgxXhtj7Jz6Hj
Ea+GuMcQwwHRiZ6rblMVF9rIxM9a8Xf9FU3Ul9l3w/hNA91TQUWIjniqbun0D/EMKeqc+T6uOiwZ0vHAGx4+4F7g0hvAV8AYxNvYkLQgiPU3UAVAD/EWPcCBBCuIyqXzAzgHtjjAvSU7p2s2eBb1C10OBnwAUAMcaVIYQbqfoFBOD/xRjrRRKpOvMrIBN4PjUy5tUY4yUxRs8TAqrmYfqeN1kDgG8Db4YQ5qXafgjcTNW0zu9Q9QVzVGpbrZ8lahKuBh4NIfwEmEtqMcLUfx8MIbxH1V8yR6epPiVvHPBwKqheStX//83wXNFkpabjFAFzqPouORe4G5hKAztfhHoShEiSJEmSpEbMKRiSJEmSJClxBhCSJEmSJClxBhCSJEmSJClxBhCSJEmSJClxBhCSJEmSJClxBhCSJEmSJClxBhCSJEmSJClxBhCSJEmSJClx/x+RWdEsI6vJnAAAAABJRU5ErkJggg==\n",
293 | "text/plain": [
294 | ""
295 | ]
296 | },
297 | "metadata": {},
298 | "output_type": "display_data"
299 | }
300 | ],
301 | "source": [
302 | "tsne_plot()"
303 | ]
304 | },
305 | {
306 | "cell_type": "markdown",
307 | "metadata": {},
308 | "source": [
309 | "We see that \"play\" has been used in different contexts and ELMo is able to distinguish between them."
310 | ]
311 | },
312 | {
313 | "cell_type": "markdown",
314 | "metadata": {},
315 | "source": [
316 | "#### 2. **Tensorboard**\n",
317 | "\n",
318 | "Visualizing using Tensorboard involves the following:\n"
319 | ]
320 | },
321 | {
322 | "cell_type": "markdown",
323 | "metadata": {},
324 | "source": [
325 | "- **Preparing the Batch** (embedding matrix)\n",
326 | " \n",
327 | " The batch is an embedding matrix of shape **(_tokens_ , embedding dimension)** where each entry is the embedding of the token at that position."
328 | ]
329 | },
330 | {
331 | "cell_type": "code",
332 | "execution_count": 121,
333 | "metadata": {},
334 | "outputs": [],
335 | "source": [
336 | "def prepare_batch():\n",
337 | " \"Creates matrix for embeddings\"\n",
338 | " \n",
339 | " sent = [\"Argentina\", \"played\", \"football\", \"very\", \"well\"]\n",
340 | " sent1 = [\"Brazil\",\"is\",\"a\",\"strong\",\"team\"]\n",
341 | " sent2 = [\"Artists\",\"all\",\"over\",\"the\",\"world\",\"are\",\"attending\",\"the\",\"play\"]\n",
342 | " sent3 = [\"Child\",\"is\",\"playing\",\"the\",\"guitar\"]\n",
343 | " sent4 = [\"There\",\"was\",\"absolute\",\"silence\",\"during\",\"the\",\"play\"]\n",
344 | " \n",
345 | " counter=0\n",
346 | " batch_xs = np.zeros((31,1024)) # of shape(tokens,embedding_dimension)\n",
347 | " \n",
348 | "    # row j of embeddings holds the vectors for sentence j, so index\n",
349 | "    # embeddings[j][i] for token i of sentence j\n",
350 | "    for i in range(len(sent)):\n",
351 | "        batch_xs[counter] = sess.run(embeddings[0][i])\n",
352 | "        counter = counter+1\n",
353 | "    for i in range(len(sent1)):\n",
354 | "        batch_xs[counter] = sess.run(embeddings[1][i])\n",
355 | "        counter = counter+1\n",
356 | "    for i in range(len(sent2)):\n",
357 | "        batch_xs[counter] = sess.run(embeddings[2][i])\n",
358 | "        counter = counter+1\n",
359 | "    for i in range(len(sent3)):\n",
360 | "        batch_xs[counter] = sess.run(embeddings[3][i])\n",
361 | "        counter = counter+1\n",
362 | "    for i in range(len(sent4)):\n",
363 | "        batch_xs[counter] = sess.run(embeddings[4][i])\n",
364 | "        counter = counter+1\n",
365 | "    return batch_xs"
368 | ]
369 | },
370 | {
371 | "cell_type": "markdown",
372 | "metadata": {},
373 | "source": [
374 | "- **Specifying the log_directory**\n",
375 | " The log_directory is the location where you save the metadata.tsv and where all the checkpoints will get created.\n",
376 | " \n",
377 | " \n",
378 | "- **Creating the metadata.tsv**\n",
379 | " This is a file which contains the index and token separated by a tab."
380 | ]
381 | },
382 | {
383 | "cell_type": "code",
384 | "execution_count": 61,
385 | "metadata": {},
386 | "outputs": [],
387 | "source": [
388 | "token_list = [\"Argentina\", \"played\", \"football\", \"very\", \"well\",\"Brazil\",\"is\",\"a\",\"strong\",\n",
389 | " \"team\",\"Artists\",\"all\",\"over\",\"the\",\"world\",\"are\",\"attending\",\"the\",\"play\",\n",
390 | " \"Child\",\"is\",\"playing\",\"the\",\"guitar\",\"There\",\"was\",\"absolute\",\"silence\",\n",
391 | " \"during\",\"the\",\"play\"]"
392 | ]
393 | },
394 | {
395 | "cell_type": "code",
396 | "execution_count": 106,
397 | "metadata": {},
398 | "outputs": [],
399 | "source": [
400 | "os.makedirs('log_dir', exist_ok=True)\n",
401 | "with open('log_dir/Elmo_metadata.tsv','w') as f:\n",
401 | " f.write(\"Index\\tLabel\\n\")\n",
402 | " for index,label in enumerate(token_list):\n",
403 | " f.write(\"%d\\t%s\\n\" % (index,label))"
404 | ]
405 | },
406 | {
407 | "cell_type": "code",
408 | "execution_count": 115,
409 | "metadata": {},
410 | "outputs": [],
411 | "source": [
412 | "LOG_DIR = 'log_dir'\n",
413 | "NAME_TO_VISUALISE_VARIABLE = \"elmoembedding\"\n",
414 | "TO_EMBED_COUNT = 25\n",
415 | "\n",
416 | "\n",
418 | "path_for_elmo_metadata = os.path.join(LOG_DIR,'Elmo_metadata.tsv')"
419 | ]
420 | },
421 | {
422 | "cell_type": "code",
423 | "execution_count": 116,
424 | "metadata": {},
425 | "outputs": [],
426 | "source": [
427 | "batch_xs = prepare_batch()\n",
428 | "embedding_var = tf.Variable(batch_xs, name=NAME_TO_VISUALISE_VARIABLE)\n",
428 | "summary_writer = tf.summary.FileWriter(LOG_DIR)"
429 | ]
430 | },
431 | {
432 | "cell_type": "code",
433 | "execution_count": 117,
434 | "metadata": {},
435 | "outputs": [],
436 | "source": [
437 | "config = projector.ProjectorConfig()\n",
438 | "embedding = config.embeddings.add()\n",
439 | "embedding.tensor_name = embedding_var.name\n",
440 | "\n",
441 | "# Specify where you find the metadata\n",
442 | "embedding.metadata_path = 'Elmo_metadata.tsv' #'metadata.tsv'\n",
443 | "\n",
444 | "# Say that you want to visualise the embeddings\n",
445 | "projector.visualize_embeddings(summary_writer, config)"
446 | ]
447 | },
448 | {
449 | "cell_type": "code",
450 | "execution_count": 118,
451 | "metadata": {},
452 | "outputs": [
453 | {
454 | "data": {
455 | "text/plain": [
456 | "'log_dir/model.ckpt-1'"
457 | ]
458 | },
459 | "execution_count": 118,
460 | "metadata": {},
461 | "output_type": "execute_result"
462 | }
463 | ],
464 | "source": [
465 | "sess = tf.InteractiveSession()\n",
466 | "sess.run(tf.global_variables_initializer())\n",
467 | "saver = tf.train.Saver()\n",
468 | "saver.save(sess, os.path.join(LOG_DIR, \"model.ckpt\"), 1)"
469 | ]
470 | },
471 | {
472 | "cell_type": "markdown",
473 | "metadata": {},
474 | "source": [
475 | "### Running TensorBoard:\n",
476 | "\n",
477 | " tensorboard --logdir=\"your_log_directory\"\n",
478 | " \n",
479 | "Tensorboard will be running at: http://localhost:6006 \n"
480 | ]
481 | },
482 | {
483 | "cell_type": "markdown",
484 | "metadata": {},
485 | "source": [
486 | "\n",
487 | "\n",
488 | "\n",
489 | "\n",
490 | "\n",
491 | "# Training Elmo Model on new data:\n",
492 | "\n",
493 | "\n",
494 | "\n",
495 | "To train and evaluate a biLM, you need to provide:\n",
496 | "\n",
497 | " a vocabulary file \n",
498 | " a set of training files \n",
499 | " a set of heldout files \n",
500 | "\n",
501 | "The vocabulary file is a text file with one token per line. It must also include the special tokens `<S>`, `</S>` and `<UNK>` (case sensitive).\n",
502 | "\n",
503 | "The vocabulary file should be sorted in descending order by token count in your training data. The first three lines should be the special tokens (`<S>`, `</S>` and `<UNK>`), then the most common token in the training data, ending with the least common token.\n",
504 | "\n",
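"For illustration, here is a minimal sketch of how such a vocabulary file could be produced from whitespace-tokenized training files (file names and paths are placeholders, not part of bilm-tf):\n",
"\n",
"    from collections import Counter\n",
"    import glob\n",
"\n",
"    counts = Counter()\n",
"    for path in glob.glob('training_data/*.txt'):\n",
"        with open(path) as f:\n",
"            for line in f:\n",
"                counts.update(line.split())\n",
"\n",
"    with open('vocab.txt', 'w') as f:\n",
"        f.write('<S>\\n</S>\\n<UNK>\\n')        # special tokens first\n",
"        for token, _ in counts.most_common():  # descending by count\n",
"            f.write(token + '\\n')\n",
"\n",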
505 | "The training data should be randomly split into many training files, each containing one slice of the data. Each file contains pre-tokenized and white space separated text, one sentence per line. Don't include the `<S>` or `</S>` tokens in your training data.\n",
506 | "\n",
507 | "\n",
508 | "Once done, git clone https://github.com/allenai/bilm-tf.git \n",
509 | "and run:\n",
510 | "\n",
511 | "    python bin/train_elmo.py --train_prefix='<path to training files>/*' --vocab_file <path to vocab file> --save_dir <path to checkpoint dir>\n",
512 | " \n",
513 | "To get the weights file, run:\n",
514 | " \n",
515 | "    python bin/dump_weights.py --save_dir /output_path/to/checkpoint --outfile /output_path/to/weights.hdf5\n",
516 | " \n",
517 | " \n",
518 | "In the save_dir, an options.json will be dumped, and the command above will give you the weights file; together these are what you need to create an ELMo model (options file and weights file).\n",
519 | "\n",
520 | "### To use ELMo programmatically:\n",
521 | "\n",
522 | "    from allennlp.modules.elmo import Elmo, batch_to_ids\n",
523 | "\n",
524 | "    options_file = \"path to options file\"\n",
525 | "    weight_file = \"path to weights file\"\n",
526 | "\n",
527 | "    elmo = Elmo(options_file, weight_file, 2, dropout=0)\n",
528 | "\n",
529 | "The 2 is an integer which represents `num_output_representations`.\n",
530 | "Typically `num_output_representations` is 1 or 2. For example, in the case of the SRL model in the paper, `num_output_representations=1`, where ELMo was included at\n",
531 | "the input token representation layer. In the case of the SQuAD model, `num_output_representations=2`, as ELMo was also included at the GRU output layer.\n",
532 | "\n",
533 | "\n",
534 | "Use `batch_to_ids` to convert sentences to character ids:\n",
535 | "\n",
536 | " sentences = [['First', 'sentence', '.'], ['Another', '.']]\n",
537 | " character_ids = batch_to_ids(sentences)\n",
538 | "\n",
539 | " embeddings = elmo(character_ids)\n",
540 | "\n",
541 | "    embeddings['elmo_representations'] is a length-two list of tensors.\n",
542 | "    Each element contains one layer of ELMo representations with shape\n",
543 | "    (2, 3, 1024).\n"
544 | ]
545 | },
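{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dictionary returned by `elmo(character_ids)` also carries a mask for padded positions. A small sketch of using one of the output layers downstream (this assumes the allennlp `Elmo` API as above):\n",
"\n",
"    out = elmo(character_ids)\n",
"    reps = out['elmo_representations']   # list of num_output_representations tensors\n",
"    mask = out['mask']                   # (batch, max_tokens), 1 for real tokens\n",
"\n",
"    # e.g. zero out padded positions before feeding a downstream model\n",
"    token_vectors = reps[0] * mask.unsqueeze(-1).float()\n"
]
},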
546 | {
547 | "cell_type": "markdown",
548 | "metadata": {},
549 | "source": [
550 | "# Incremental Learning\n",
551 | "\n",
552 | "In order to do incremental learning, one would want to update an existing weights file based on a new vocab file. The previously checkpointed model will have a different structure given the different vocab size.\n",
553 | "\n",
554 | "See: https://github.com/tensorflow/nmt/issues/134 \n",
555 | "\n",
556 | "### The problem:\n",
557 | "We have trained a model, created a checkpoint, and saved a weights file. Now we have new data with a new vocab file, and we build the same model structure. When we try to load the weights from the checkpointed file, we may encounter an error:\n",
558 | "\n",
559 | "**tensor shape mismatch**\n",
560 | "\n",
561 | "The most likely reason is a vocab size mismatch: the embedding layer stored in the checkpoint has shape (x, y), which does not align with the embedding layer of the new model, which has shape (m, y).\n",
562 | "\n",
563 | "### Solution\n",
564 | "Load all the layers weights except for the embedding layer and learn the weights of the embedding during the new training process.\n",
565 | "\n",
566 | "train with the same command as above with a new parameter:\n",
567 | "\n",
568 | " python bin/train_elmo_updated.py --train_prefix= --vocab_file --save_dir --restart_ckpt_file \n",
569 | " \n",
570 | "Replace _training.py_ within _bilm_ with the updated _training_updated.py_ provided at the root of this repository.\n",
571 | "\n",
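"The core of that change is to restore every variable except the token-embedding one, whose shape depends on the vocabulary size. A minimal sketch of the idea (variable names are illustrative; the actual names come from the bilm-tf graph):\n",
"\n",
"    # restore all weights except the embedding layer from the old checkpoint\n",
"    restore_vars = [v for v in tf.global_variables()\n",
"                    if 'embedding' not in v.name]\n",
"    loader = tf.train.Saver(restore_vars)\n",
"    loader.restore(sess, restart_ckpt_file)\n",
"    # the embedding layer keeps its fresh initialisation and is re-learned\n",
"\n",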
572 | "In the _train_elmo_updated.py_ within _bin_, set these options based on your data:\n",
573 | "\n",
574 | " batch_size = 128 # batch size for each GPU\n",
575 | " n_gpus = 3\n",
576 | "\n",
577 | " # number of tokens in training data \n",
578 | " n_train_tokens = \n",
579 | "\n",
580 | " options = {\n",
581 | " 'bidirectional': True,\n",
582 | " 'dropout': 0.1,\n",
583 | " 'all_clip_norm_val': 10.0,\n",
584 | " \n",
585 | " 'n_epochs': 10,\n",
586 | " 'n_train_tokens': n_train_tokens,\n",
587 | " 'batch_size': batch_size,\n",
588 | " 'n_tokens_vocab': vocab.size,\n",
589 | " 'unroll_steps': 20,\n",
590 | " 'n_negative_samples_batch': 8192,\n",
591 | "\n",
592 | "NOTE: Try running the training on a gpu as recommended, else the training will be slow.\n",
593 | "\n",
594 | "\n",
595 | "\n",
596 | "\n"
597 | ]
598 | },
599 | {
600 | "cell_type": "markdown",
601 | "metadata": {},
602 | "source": [
603 | "## Errors\n",
604 | "\n",
605 | "Errors you may run into:\n",
606 | "\n",
607 | "    **CUDNN_STATUS_INTERNAL_ERROR**\n",
608 | "    **failed to enqueue convolution/max pooling on stream**\n",
609 | " \n",
610 | "#### Solution\n",
611 | " try reducing the no. of paramters - lesser batch size etc.\n",
612 | " \n",
613 | "Kindly let me know more in the issues section if theres any.\n",
614 | " "
615 | ]
616 | },
617 | {
618 | "cell_type": "markdown",
619 | "metadata": {},
620 | "source": [
621 | "### Using Elmo Embedding layer in consequent models\n",
622 | "\n",
623 | "if you want to use Elmo Embedding layer in consequent model build refer :\n",
624 | "https://github.com/PrashantRanjan09/WordEmbeddings-Elmo-Fasttext-Word2Vec "
625 | ]
626 | }
627 | ],
628 | "metadata": {
629 | "kernelspec": {
630 | "display_name": "Python 3",
631 | "language": "python",
632 | "name": "python3"
633 | },
634 | "language_info": {
635 | "codemirror_mode": {
636 | "name": "ipython",
637 | "version": 3
638 | },
639 | "file_extension": ".py",
640 | "mimetype": "text/x-python",
641 | "name": "python",
642 | "nbconvert_exporter": "python",
643 | "pygments_lexer": "ipython3",
644 | "version": "3.6.4"
645 | }
646 | },
647 | "nbformat": 4,
648 | "nbformat_minor": 2
649 | }
650 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Elmo-Tutorial
2 |
3 | This is a short tutorial on using Deep contextualized word representations (ELMo) which is discussed in the paper https://arxiv.org/abs/1802.05365.
4 | This tutorial can help in using:
5 |
6 | * **Pre Trained Elmo Model** - refer _Elmo_tutorial.ipynb_
7 | * **Training an Elmo Model on your new data from scratch**
8 |
9 | To train and evaluate a biLM, you need to provide:
10 | * a vocabulary file
11 | * a set of training files
12 | * a set of heldout files
13 |
14 | The vocabulary file is a text file with one token per line. It must also include the special tokens `<S>`, `</S>` and `<UNK>`.
15 | The vocabulary file should be sorted in descending order by token count in your training data. The first three entries/lines should be the special tokens:
16 | `<S>`,
17 | `</S>` and
18 | `<UNK>`.
19 |
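For illustration, the first lines of a vocabulary file might look like this (everything after the three special tokens is just an example, in descending count order):

    <S>
    </S>
    <UNK>
    the
    ,
    .
    of
    and
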
20 | The training data should be randomly split into many training files, each containing one slice of the data. Each file contains pre-tokenized and white space separated text, one sentence per line.
21 |
22 | **Don't include the `<S>` or `</S>` tokens in your training data.**
23 |
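A training file is then just plain tokenized text, one sentence per line, for example:

    the team played football very well .
    there was absolute silence during the play .
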
24 | Once done, git clone **https://github.com/allenai/bilm-tf.git**
25 | and run:
26 |
27 |     python bin/train_elmo.py --train_prefix='/path/to/train/files/*' --vocab_file /path/to/vocab.txt --save_dir /output_path/to/checkpoint
28 |
29 | To get the weights file,
30 | run:
31 |
32 |     python bin/dump_weights.py --save_dir /output_path/to/checkpoint --outfile /output_path/to/weights.hdf5
33 |
34 | In the save dir, an options.json file will be dumped, and the above command will give you the weights file; together these two files (options.json and weights.hdf5) are required to create an Elmo model.
35 |
36 | For more information, refer to **Elmo_tutorial.ipynb**
37 |
38 |
39 | * ## Incremental Learning/Training
40 |
41 | To incrementally train an existing model with new data:
42 |
43 | While doing incremental training, first
44 | git clone https://github.com/allenai/bilm-tf.git
45 |
46 | Once done, replace _train_elmo.py_ within allenai/bilm-tf/bin/ with **train_elmo_updated.py** provided at the root of this repository.
47 |
48 | **Updated changes** :
49 |
50 | _train_elmo_updated.py_
51 |
52 | tf_save_dir = args.save_dir
53 | tf_log_dir = args.save_dir
54 |     train(options, data, n_gpus, tf_save_dir, tf_log_dir, restart_ckpt_file)
55 |
56 | if __name__ == '__main__':
57 | parser = argparse.ArgumentParser()
58 | parser.add_argument('--save_dir', help='Location of checkpoint files')
59 | parser.add_argument('--vocab_file', help='Vocabulary file')
60 | parser.add_argument('--train_prefix', help='Prefix for train files')
61 | parser.add_argument('--restart_ckpt_file', help='latest checkpoint file to start with')
62 |
63 | This adds an argument (--restart_ckpt_file) that accepts the path of the checkpointed file.
64 |
65 |
66 | Replace _training.py_ within allenai/bilm-tf/bilm/ with **training_updated.py** provided at the root of this repository.
67 | Also, make sure to put your embedding layer name in line 758 of **training_updated.py**:
68 |
69 | exclude = ['the embedding layer name you want to remove']
70 |
71 | **Updated changes** :
72 |
73 | _training_updated.py_
74 |
75 | # load the checkpoint data if needed
76 | if restart_ckpt_file is not None:
77 |             reader = tf.train.NewCheckpointReader(restart_ckpt_file)
78 | cur_vars = reader.get_variable_to_shape_map()
79 | exclude = ['the embedding layer name you want to remove']
80 | variables_to_restore = tf.contrib.slim.get_variables_to_restore(exclude=exclude)
81 | loader = tf.train.Saver(variables_to_restore)
82 |             # restore all remaining variables (and their weights) from the
83 |             # checkpoint we are restarting from
84 |             loader.restore(sess, restart_ckpt_file)
85 | with open(os.path.join(tf_save_dir, 'options.json'), 'w') as fout:
86 | fout.write(json.dumps(options))
87 |
88 | summary_writer = tf.summary.FileWriter(tf_log_dir, sess.graph)
89 |
90 | The code reads the checkpointed file, collects the current variables in the graph, excludes the layers listed in the _exclude_ variable, and restores the rest of the variables along with their associated weights.
91 |
92 | For training run:
93 |
94 |     python bin/train_elmo_updated.py --train_prefix='/path/to/new/train/files/*' --vocab_file /path/to/new/vocab.txt --save_dir /output_path/to/checkpoint --restart_ckpt_file /path/to/previous/checkpoint
95 |
96 |
97 | In the _train_elmo_updated.py_ within bin, set these options based on your data:
98 |
99 | batch_size = 128 # batch size for each GPU
100 | n_gpus = 3
101 |
102 | # number of tokens in training data
103 | n_train_tokens =
104 |
105 | options = {
106 | 'bidirectional': True,
107 | 'dropout': 0.1,
108 | 'all_clip_norm_val': 10.0,
109 |
110 | 'n_epochs': 10,
111 | 'n_train_tokens': n_train_tokens,
112 | 'batch_size': batch_size,
113 | 'n_tokens_vocab': vocab.size,
114 | 'unroll_steps': 20,
115 | 'n_negative_samples_batch': 8192,
116 |
117 |
118 | **Visualisation**
119 |
120 | Visualization of the word vectors using Elmo:
121 |
122 | * Tsne
123 | 
124 |
125 | * Tensorboard
126 |
127 | 
128 |
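A minimal sketch of how such a t-SNE plot can be produced from ELMo vectors (it uses the TF-Hub module from the tutorial; the word list is only an example):

    import tensorflow as tf
    import tensorflow_hub as hub
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    words = ["football", "team", "play", "guitar", "silence"]
    elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=False)
    # "default" gives one mean-pooled 1024-dim vector per input string
    embeddings = elmo(words, signature="default", as_dict=True)["default"]

    with tf.Session() as sess:
        sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
        vectors = sess.run(embeddings)            # shape (len(words), 1024)

    coords = TSNE(n_components=2, perplexity=2).fit_transform(vectors)
    for (x, y), w in zip(coords, words):
        plt.scatter(x, y)
        plt.annotate(w, (x, y))
    plt.show()
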
129 |
130 | ### Using the Elmo embedding layer in downstream models
131 | If you want to use the Elmo embedding layer when building downstream models, refer to: https://github.com/PrashantRanjan09/WordEmbeddings-Elmo-Fasttext-Word2Vec
132 |
--------------------------------------------------------------------------------
/Tsne_vis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PrashantRanjan09/Elmo-Tutorial/0992a5ccc72c0da60931f61f166b05490785684e/Tsne_vis.png
--------------------------------------------------------------------------------
/tensorboard_vis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/PrashantRanjan09/Elmo-Tutorial/0992a5ccc72c0da60931f61f166b05490785684e/tensorboard_vis.png
--------------------------------------------------------------------------------
/train_elmo_updated.py:
--------------------------------------------------------------------------------
1 | import argparse
2 |
3 | import numpy as np
4 |
5 | from bilm.training import train, load_options_latest_checkpoint, load_vocab
6 | from bilm.data import BidirectionalLMDataset
7 |
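# Example invocation (the paths are placeholders; see the README for details):
#   python bin/train_elmo_updated.py \
#       --train_prefix='/path/to/new/train/files/*' \
#       --vocab_file /path/to/new/vocab.txt \
#       --save_dir /output_path/to/checkpoint \
#       --restart_ckpt_file /path/to/previous/checkpoint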
8 |
9 | def main(args):
10 | # load the vocab
11 | vocab = load_vocab(args.vocab_file, 50)
12 | restart_ckpt_file = args.restart_ckpt_file
13 |
14 | # define the options
15 | batch_size = 128 # batch size for each GPU
16 | n_gpus = 3
17 |
18 | # number of tokens in training data (this for 1B Word Benchmark)
19 | n_train_tokens = 768648884
20 |
21 | options = {
22 | 'bidirectional': True,
23 |
24 | 'char_cnn': {'activation': 'relu',
25 | 'embedding': {'dim': 16},
26 | 'filters': [[1, 32],
27 | [2, 32],
28 | [3, 64],
29 | [4, 128],
30 | [5, 256],
31 | [6, 512],
32 | [7, 1024]],
33 | 'max_characters_per_token': 50,
34 | 'n_characters': 261,
35 | 'n_highway': 2},
36 |
37 | 'dropout': 0.1,
38 |
39 | 'lstm': {
40 | 'cell_clip': 3,
41 | 'dim': 4096,
42 | 'n_layers': 2,
43 | 'proj_clip': 3,
44 | 'projection_dim': 512,
45 | 'use_skip_connections': True},
46 |
47 | 'all_clip_norm_val': 10.0,
48 |
49 | 'n_epochs': 10,
50 | 'n_train_tokens': n_train_tokens,
51 | 'batch_size': batch_size,
52 | 'n_tokens_vocab': vocab.size,
53 | 'unroll_steps': 20,
54 | 'n_negative_samples_batch': 8192,
55 | }
56 |
57 | prefix = args.train_prefix
58 | data = BidirectionalLMDataset(prefix, vocab, test=False,
59 | shuffle_on_load=True)
60 |
61 | tf_save_dir = args.save_dir
62 | tf_log_dir = args.save_dir
63 |     train(options, data, n_gpus, tf_save_dir, tf_log_dir, restart_ckpt_file)
64 |
65 |
66 | if __name__ == '__main__':
67 | parser = argparse.ArgumentParser()
68 | parser.add_argument('--save_dir', help='Location of checkpoint files')
69 | parser.add_argument('--vocab_file', help='Vocabulary file')
70 | parser.add_argument('--train_prefix', help='Prefix for train files')
71 | parser.add_argument('--restart_ckpt_file', help='latest checkpoint file to start with')
72 |
73 | args = parser.parse_args()
74 | main(args)
75 |
--------------------------------------------------------------------------------
/training_updated.py:
--------------------------------------------------------------------------------
1 |
2 | '''
3 | Train and test bidirectional language models.
4 | '''
5 |
6 | import os
7 | import time
8 | import json
9 | import re
10 |
11 | import tensorflow as tf
12 | import numpy as np
13 |
14 | from tensorflow.python.ops.init_ops import glorot_uniform_initializer
15 |
16 | from .data import Vocabulary, UnicodeCharsVocabulary
17 |
18 | DTYPE = 'float32'
19 | DTYPE_INT = 'int64'
20 |
21 | tf.logging.set_verbosity(tf.logging.INFO)
22 |
23 |
24 | def print_variable_summary():
25 | import pprint
26 | variables = sorted([[v.name, v.get_shape()] for v in tf.global_variables()])
27 | pprint.pprint(variables)
28 |
29 |
30 | class LanguageModel(object):
31 | '''
32 | A class to build the tensorflow computational graph for NLMs
33 | All hyperparameters and model configuration is specified in a dictionary
34 | of 'options'.
35 | is_training is a boolean used to control behavior of dropout layers
36 | and softmax. Set to False for testing.
37 | The LSTM cell is controlled by the 'lstm' key in options
38 | Here is an example:
39 | 'lstm': {
40 | 'cell_clip': 5,
41 | 'dim': 4096,
42 | 'n_layers': 2,
43 | 'proj_clip': 5,
44 | 'projection_dim': 512,
45 | 'use_skip_connections': True},
46 | 'projection_dim' is assumed token embedding size and LSTM output size.
47 | 'dim' is the hidden state size.
48 | Set 'dim' == 'projection_dim' to skip a projection layer.
49 | '''
50 | def __init__(self, options, is_training):
51 | self.options = options
52 | self.is_training = is_training
53 | self.bidirectional = options.get('bidirectional', False)
54 |
55 | # use word or char inputs?
56 | self.char_inputs = 'char_cnn' in self.options
57 |
58 | # for the loss function
59 | self.share_embedding_softmax = options.get(
60 | 'share_embedding_softmax', False)
61 | if self.char_inputs and self.share_embedding_softmax:
62 | raise ValueError("Sharing softmax and embedding weights requires "
63 | "word input")
64 |
65 | self.sample_softmax = options.get('sample_softmax', True)
66 |
67 | self._build()
68 |
69 | def _build_word_embeddings(self):
70 | n_tokens_vocab = self.options['n_tokens_vocab']
71 | batch_size = self.options['batch_size']
72 | unroll_steps = self.options['unroll_steps']
73 |
74 | # LSTM options
75 | projection_dim = self.options['lstm']['projection_dim']
76 |
77 | # the input token_ids and word embeddings
78 | self.token_ids = tf.placeholder(DTYPE_INT,
79 | shape=(batch_size, unroll_steps),
80 | name='token_ids')
81 | # the word embeddings
82 | with tf.device("/cpu:0"):
83 | self.embedding_weights = tf.get_variable(
84 | "embedding", [n_tokens_vocab, projection_dim],
85 | dtype=DTYPE,
86 | )
87 | self.embedding = tf.nn.embedding_lookup(self.embedding_weights,
88 | self.token_ids)
89 |
90 | # if a bidirectional LM then make placeholders for reverse
91 | # model and embeddings
92 | if self.bidirectional:
93 | self.token_ids_reverse = tf.placeholder(DTYPE_INT,
94 | shape=(batch_size, unroll_steps),
95 | name='token_ids_reverse')
96 | with tf.device("/cpu:0"):
97 | self.embedding_reverse = tf.nn.embedding_lookup(
98 | self.embedding_weights, self.token_ids_reverse)
99 |
100 | def _build_word_char_embeddings(self):
101 | '''
102 | options contains key 'char_cnn': {
103 | 'n_characters': 60,
104 | # includes the start / end characters
105 | 'max_characters_per_token': 17,
106 | 'filters': [
107 | [1, 32],
108 | [2, 32],
109 | [3, 64],
110 | [4, 128],
111 | [5, 256],
112 | [6, 512],
113 | [7, 512]
114 | ],
115 | 'activation': 'tanh',
116 | # for the character embedding
117 | 'embedding': {'dim': 16}
118 | # for highway layers
119 | # if omitted, then no highway layers
120 | 'n_highway': 2,
121 | }
122 | '''
123 | batch_size = self.options['batch_size']
124 | unroll_steps = self.options['unroll_steps']
125 | projection_dim = self.options['lstm']['projection_dim']
126 |
127 | cnn_options = self.options['char_cnn']
128 | filters = cnn_options['filters']
129 | n_filters = sum(f[1] for f in filters)
130 | max_chars = cnn_options['max_characters_per_token']
131 | char_embed_dim = cnn_options['embedding']['dim']
132 | n_chars = cnn_options['n_characters']
133 | if cnn_options['activation'] == 'tanh':
134 | activation = tf.nn.tanh
135 | elif cnn_options['activation'] == 'relu':
136 | activation = tf.nn.relu
137 |
138 | # the input character ids
139 | self.tokens_characters = tf.placeholder(DTYPE_INT,
140 | shape=(batch_size, unroll_steps, max_chars),
141 | name='tokens_characters')
142 | # the character embeddings
143 | with tf.device("/cpu:0"):
144 | self.embedding_weights = tf.get_variable(
145 | "char_embed", [n_chars, char_embed_dim],
146 | dtype=DTYPE,
147 | initializer=tf.random_uniform_initializer(-1.0, 1.0)
148 | )
149 | # shape (batch_size, unroll_steps, max_chars, embed_dim)
150 | self.char_embedding = tf.nn.embedding_lookup(self.embedding_weights,
151 | self.tokens_characters)
152 |
153 | if self.bidirectional:
154 | self.tokens_characters_reverse = tf.placeholder(DTYPE_INT,
155 | shape=(batch_size, unroll_steps, max_chars),
156 | name='tokens_characters_reverse')
157 | self.char_embedding_reverse = tf.nn.embedding_lookup(
158 | self.embedding_weights, self.tokens_characters_reverse)
159 |
160 |
161 | # the convolutions
162 | def make_convolutions(inp, reuse):
163 | with tf.variable_scope('CNN', reuse=reuse) as scope:
164 | convolutions = []
165 | for i, (width, num) in enumerate(filters):
166 | if cnn_options['activation'] == 'relu':
167 | # He initialization for ReLU activation
168 | # with char embeddings init between -1 and 1
169 | #w_init = tf.random_normal_initializer(
170 | # mean=0.0,
171 | # stddev=np.sqrt(2.0 / (width * char_embed_dim))
172 | #)
173 |
174 | # Kim et al 2015, +/- 0.05
175 | w_init = tf.random_uniform_initializer(
176 | minval=-0.05, maxval=0.05)
177 | elif cnn_options['activation'] == 'tanh':
178 | # glorot init
179 | w_init = tf.random_normal_initializer(
180 | mean=0.0,
181 | stddev=np.sqrt(1.0 / (width * char_embed_dim))
182 | )
183 | w = tf.get_variable(
184 | "W_cnn_%s" % i,
185 | [1, width, char_embed_dim, num],
186 | initializer=w_init,
187 | dtype=DTYPE)
188 | b = tf.get_variable(
189 | "b_cnn_%s" % i, [num], dtype=DTYPE,
190 | initializer=tf.constant_initializer(0.0))
191 |
192 | conv = tf.nn.conv2d(
193 | inp, w,
194 | strides=[1, 1, 1, 1],
195 | padding="VALID") + b
196 | # now max pool
197 | conv = tf.nn.max_pool(
198 | conv, [1, 1, max_chars-width+1, 1],
199 | [1, 1, 1, 1], 'VALID')
200 |
201 | # activation
202 | conv = activation(conv)
203 | conv = tf.squeeze(conv, squeeze_dims=[2])
204 |
205 | convolutions.append(conv)
206 |
207 | return tf.concat(convolutions, 2)
208 |
209 | # for first model, this is False, for others it's True
210 | reuse = tf.get_variable_scope().reuse
211 | embedding = make_convolutions(self.char_embedding, reuse)
212 |
213 | self.token_embedding_layers = [embedding]
214 |
215 | if self.bidirectional:
216 | # re-use the CNN weights from forward pass
217 | embedding_reverse = make_convolutions(
218 | self.char_embedding_reverse, True)
219 |
220 | # for highway and projection layers:
221 | # reshape from (batch_size, n_tokens, dim) to
222 | n_highway = cnn_options.get('n_highway')
223 | use_highway = n_highway is not None and n_highway > 0
224 | use_proj = n_filters != projection_dim
225 |
226 | if use_highway or use_proj:
227 | embedding = tf.reshape(embedding, [-1, n_filters])
228 | if self.bidirectional:
229 | embedding_reverse = tf.reshape(embedding_reverse,
230 | [-1, n_filters])
231 |
232 | # set up weights for projection
233 | if use_proj:
234 | assert n_filters > projection_dim
235 | with tf.variable_scope('CNN_proj') as scope:
236 | W_proj_cnn = tf.get_variable(
237 | "W_proj", [n_filters, projection_dim],
238 | initializer=tf.random_normal_initializer(
239 | mean=0.0, stddev=np.sqrt(1.0 / n_filters)),
240 | dtype=DTYPE)
241 | b_proj_cnn = tf.get_variable(
242 | "b_proj", [projection_dim],
243 | initializer=tf.constant_initializer(0.0),
244 | dtype=DTYPE)
245 |
246 | # apply highways layers
247 | def high(x, ww_carry, bb_carry, ww_tr, bb_tr):
248 | carry_gate = tf.nn.sigmoid(tf.matmul(x, ww_carry) + bb_carry)
249 | transform_gate = tf.nn.relu(tf.matmul(x, ww_tr) + bb_tr)
250 | return carry_gate * transform_gate + (1.0 - carry_gate) * x
251 |
252 | if use_highway:
253 | highway_dim = n_filters
254 |
255 | for i in range(n_highway):
256 | with tf.variable_scope('CNN_high_%s' % i) as scope:
257 | W_carry = tf.get_variable(
258 | 'W_carry', [highway_dim, highway_dim],
259 | # glorit init
260 | initializer=tf.random_normal_initializer(
261 | mean=0.0, stddev=np.sqrt(1.0 / highway_dim)),
262 | dtype=DTYPE)
263 | b_carry = tf.get_variable(
264 | 'b_carry', [highway_dim],
265 | initializer=tf.constant_initializer(-2.0),
266 | dtype=DTYPE)
267 | W_transform = tf.get_variable(
268 | 'W_transform', [highway_dim, highway_dim],
269 | initializer=tf.random_normal_initializer(
270 | mean=0.0, stddev=np.sqrt(1.0 / highway_dim)),
271 | dtype=DTYPE)
272 | b_transform = tf.get_variable(
273 | 'b_transform', [highway_dim],
274 | initializer=tf.constant_initializer(0.0),
275 | dtype=DTYPE)
276 |
277 | embedding = high(embedding, W_carry, b_carry,
278 | W_transform, b_transform)
279 | if self.bidirectional:
280 | embedding_reverse = high(embedding_reverse,
281 | W_carry, b_carry,
282 | W_transform, b_transform)
283 | self.token_embedding_layers.append(
284 | tf.reshape(embedding,
285 | [batch_size, unroll_steps, highway_dim])
286 | )
287 |
288 | # finally project down to projection dim if needed
289 | if use_proj:
290 | embedding = tf.matmul(embedding, W_proj_cnn) + b_proj_cnn
291 | if self.bidirectional:
292 | embedding_reverse = tf.matmul(embedding_reverse, W_proj_cnn) \
293 | + b_proj_cnn
294 | self.token_embedding_layers.append(
295 | tf.reshape(embedding,
296 | [batch_size, unroll_steps, projection_dim])
297 | )
298 |
299 | # reshape back to (batch_size, tokens, dim)
300 | if use_highway or use_proj:
301 | shp = [batch_size, unroll_steps, projection_dim]
302 | embedding = tf.reshape(embedding, shp)
303 | if self.bidirectional:
304 | embedding_reverse = tf.reshape(embedding_reverse, shp)
305 |
306 | # at last assign attributes for remainder of the model
307 | self.embedding = embedding
308 | if self.bidirectional:
309 | self.embedding_reverse = embedding_reverse
310 |
311 | def _build(self):
312 | # size of input options
313 | n_tokens_vocab = self.options['n_tokens_vocab']
314 | batch_size = self.options['batch_size']
315 | unroll_steps = self.options['unroll_steps']
316 |
317 | # LSTM options
318 | lstm_dim = self.options['lstm']['dim']
319 | projection_dim = self.options['lstm']['projection_dim']
320 | n_lstm_layers = self.options['lstm'].get('n_layers', 1)
321 | dropout = self.options['dropout']
322 | keep_prob = 1.0 - dropout
323 |
324 | if self.char_inputs:
325 | self._build_word_char_embeddings()
326 | else:
327 | self._build_word_embeddings()
328 |
329 | # now the LSTMs
330 | # these will collect the initial states for the forward
331 | # (and reverse LSTMs if we are doing bidirectional)
332 | self.init_lstm_state = []
333 | self.final_lstm_state = []
334 |
335 | # get the LSTM inputs
336 | if self.bidirectional:
337 | lstm_inputs = [self.embedding, self.embedding_reverse]
338 | else:
339 | lstm_inputs = [self.embedding]
340 |
341 | # now compute the LSTM outputs
342 | cell_clip = self.options['lstm'].get('cell_clip')
343 | proj_clip = self.options['lstm'].get('proj_clip')
344 |
345 | use_skip_connections = self.options['lstm'].get(
346 | 'use_skip_connections')
347 | if use_skip_connections:
348 | print("USING SKIP CONNECTIONS")
349 |
350 | lstm_outputs = []
351 | for lstm_num, lstm_input in enumerate(lstm_inputs):
352 | lstm_cells = []
353 | for i in range(n_lstm_layers):
354 | if projection_dim < lstm_dim:
355 | # are projecting down output
356 | lstm_cell = tf.nn.rnn_cell.LSTMCell(
357 | lstm_dim, num_proj=projection_dim,
358 | cell_clip=cell_clip, proj_clip=proj_clip)
359 | else:
360 | lstm_cell = tf.nn.rnn_cell.LSTMCell(
361 | lstm_dim,
362 | cell_clip=cell_clip, proj_clip=proj_clip)
363 |
364 | if use_skip_connections:
365 | # ResidualWrapper adds inputs to outputs
366 | if i == 0:
367 | # don't add skip connection from token embedding to
368 | # 1st layer output
369 | pass
370 | else:
371 | # add a skip connection
372 | lstm_cell = tf.nn.rnn_cell.ResidualWrapper(lstm_cell)
373 |
374 | # add dropout
375 | if self.is_training:
376 | lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell,
377 | input_keep_prob=keep_prob)
378 |
379 | lstm_cells.append(lstm_cell)
380 |
381 | if n_lstm_layers > 1:
382 | lstm_cell = tf.nn.rnn_cell.MultiRNNCell(lstm_cells)
383 | else:
384 | lstm_cell = lstm_cells[0]
385 |
386 | with tf.control_dependencies([lstm_input]):
387 | self.init_lstm_state.append(
388 | lstm_cell.zero_state(batch_size, DTYPE))
389 | # NOTE: this variable scope is for backward compatibility
390 | # with existing models...
391 | if self.bidirectional:
392 | with tf.variable_scope('RNN_%s' % lstm_num):
393 | _lstm_output_unpacked, final_state = tf.nn.static_rnn(
394 | lstm_cell,
395 | tf.unstack(lstm_input, axis=1),
396 | initial_state=self.init_lstm_state[-1])
397 | else:
398 | _lstm_output_unpacked, final_state = tf.nn.static_rnn(
399 | lstm_cell,
400 | tf.unstack(lstm_input, axis=1),
401 | initial_state=self.init_lstm_state[-1])
402 | self.final_lstm_state.append(final_state)
403 |
404 | # (batch_size * unroll_steps, 512)
405 | lstm_output_flat = tf.reshape(
406 | tf.stack(_lstm_output_unpacked, axis=1), [-1, projection_dim])
407 | if self.is_training:
408 | # add dropout to output
409 | lstm_output_flat = tf.nn.dropout(lstm_output_flat,
410 | keep_prob)
411 | tf.add_to_collection('lstm_output_embeddings',
412 | _lstm_output_unpacked)
413 |
414 | lstm_outputs.append(lstm_output_flat)
415 |
416 | self._build_loss(lstm_outputs)
417 |
418 | def _build_loss(self, lstm_outputs):
419 | '''
420 | Create:
421 | self.total_loss: total loss op for training
422 | self.softmax_W, softmax_b: the softmax variables
423 | self.next_token_id / _reverse: placeholders for gold input
424 | '''
425 | batch_size = self.options['batch_size']
426 | unroll_steps = self.options['unroll_steps']
427 |
428 | n_tokens_vocab = self.options['n_tokens_vocab']
429 |
430 | # DEFINE next_token_id and *_reverse placeholders for the gold input
431 | def _get_next_token_placeholders(suffix):
432 | name = 'next_token_id' + suffix
433 | id_placeholder = tf.placeholder(DTYPE_INT,
434 | shape=(batch_size, unroll_steps),
435 | name=name)
436 | return id_placeholder
437 |
438 | # get the window and weight placeholders
439 | self.next_token_id = _get_next_token_placeholders('')
440 | if self.bidirectional:
441 | self.next_token_id_reverse = _get_next_token_placeholders(
442 | '_reverse')
443 |
444 | # DEFINE THE SOFTMAX VARIABLES
445 | # get the dimension of the softmax weights
446 | # softmax dimension is the size of the output projection_dim
447 | softmax_dim = self.options['lstm']['projection_dim']
448 |
449 | # the output softmax variables -- they are shared if bidirectional
450 | if self.share_embedding_softmax:
451 | # softmax_W is just the embedding layer
452 | self.softmax_W = self.embedding_weights
453 |
454 | with tf.variable_scope('softmax'), tf.device('/cpu:0'):
455 | # Glorit init (std=(1.0 / sqrt(fan_in))
456 | softmax_init = tf.random_normal_initializer(0.0,
457 | 1.0 / np.sqrt(softmax_dim))
458 | if not self.share_embedding_softmax:
459 | self.softmax_W = tf.get_variable(
460 | 'W', [n_tokens_vocab, softmax_dim],
461 | dtype=DTYPE,
462 | initializer=softmax_init
463 | )
464 | self.softmax_b = tf.get_variable(
465 | 'b', [n_tokens_vocab],
466 | dtype=DTYPE,
467 | initializer=tf.constant_initializer(0.0))
468 |
469 | # now calculate losses
470 | # loss for each direction of the LSTM
471 | self.individual_losses = []
472 |
473 | if self.bidirectional:
474 | next_ids = [self.next_token_id, self.next_token_id_reverse]
475 | else:
476 | next_ids = [self.next_token_id]
477 |
478 | for id_placeholder, lstm_output_flat in zip(next_ids, lstm_outputs):
479 | # flatten the LSTM output and next token id gold to shape:
480 | # (batch_size * unroll_steps, softmax_dim)
481 | # Flatten and reshape the token_id placeholders
482 | next_token_id_flat = tf.reshape(id_placeholder, [-1, 1])
483 |
484 | with tf.control_dependencies([lstm_output_flat]):
485 | if self.is_training and self.sample_softmax:
486 | losses = tf.nn.sampled_softmax_loss(
487 | self.softmax_W, self.softmax_b,
488 | next_token_id_flat, lstm_output_flat,
489 | self.options['n_negative_samples_batch'],
490 | self.options['n_tokens_vocab'],
491 | num_true=1)
492 |
493 | else:
494 | # get the full softmax loss
495 | output_scores = tf.matmul(
496 | lstm_output_flat,
497 | tf.transpose(self.softmax_W)
498 | ) + self.softmax_b
499 | # NOTE: tf.nn.sparse_softmax_cross_entropy_with_logits
500 | # expects unnormalized output since it performs the
501 | # softmax internally
502 | losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
503 | logits=output_scores,
504 | labels=tf.squeeze(next_token_id_flat, squeeze_dims=[1])
505 | )
506 |
507 | self.individual_losses.append(tf.reduce_mean(losses))
508 |
509 | # now make the total loss -- it's the mean of the individual losses
510 | if self.bidirectional:
511 | self.total_loss = 0.5 * (self.individual_losses[0]
512 | + self.individual_losses[1])
513 | else:
514 | self.total_loss = self.individual_losses[0]
515 |
516 |
517 | def average_gradients(tower_grads, batch_size, options):
518 | # calculate average gradient for each shared variable across all GPUs
519 | average_grads = []
520 | for grad_and_vars in zip(*tower_grads):
521 | # Note that each grad_and_vars looks like the following:
522 | # ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
523 | # We need to average the gradients across each GPU.
524 |
525 | g0, v0 = grad_and_vars[0]
526 |
527 | if g0 is None:
528 | # no gradient for this variable, skip it
529 | average_grads.append((g0, v0))
530 | continue
531 |
532 | if isinstance(g0, tf.IndexedSlices):
533 | # If the gradient is type IndexedSlices then this is a sparse
534 | # gradient with attributes indices and values.
535 | # To average, need to concat them individually then create
536 | # a new IndexedSlices object.
537 | indices = []
538 | values = []
539 | for g, v in grad_and_vars:
540 | indices.append(g.indices)
541 | values.append(g.values)
542 | all_indices = tf.concat(indices, 0)
543 | avg_values = tf.concat(values, 0) / len(grad_and_vars)
544 | # deduplicate across indices
545 | av, ai = _deduplicate_indexed_slices(avg_values, all_indices)
546 | grad = tf.IndexedSlices(av, ai, dense_shape=g0.dense_shape)
547 |
548 | else:
549 | # a normal tensor can just do a simple average
550 | grads = []
551 | for g, v in grad_and_vars:
552 | # Add 0 dimension to the gradients to represent the tower.
553 | expanded_g = tf.expand_dims(g, 0)
554 | # Append on a 'tower' dimension which we will average over
555 | grads.append(expanded_g)
556 |
557 | # Average over the 'tower' dimension.
558 | grad = tf.concat(grads, 0)
559 | grad = tf.reduce_mean(grad, 0)
560 |
561 | # the Variables are redundant because they are shared
562 | # across towers. So.. just return the first tower's pointer to
563 | # the Variable.
564 | v = grad_and_vars[0][1]
565 | grad_and_var = (grad, v)
566 |
567 | average_grads.append(grad_and_var)
568 |
569 | assert len(average_grads) == len(list(zip(*tower_grads)))
570 |
571 | return average_grads
572 |
573 |
574 | def summary_gradient_updates(grads, opt, lr):
575 | '''get summary ops for the magnitude of gradient updates'''
576 |
577 | # strategy:
578 | # make a dict of variable name -> [variable, grad, adagrad slot]
579 | vars_grads = {}
580 | for v in tf.trainable_variables():
581 | vars_grads[v.name] = [v, None, None]
582 | for g, v in grads:
583 | vars_grads[v.name][1] = g
584 | vars_grads[v.name][2] = opt.get_slot(v, 'accumulator')
585 |
586 | # now make summaries
587 | ret = []
588 | for vname, (v, g, a) in vars_grads.items():
589 |
590 | if g is None:
591 | continue
592 |
593 | if isinstance(g, tf.IndexedSlices):
594 | # a sparse gradient - only take norm of params that are updated
595 | values = tf.gather(v, g.indices)
596 | updates = lr * g.values
597 | if a is not None:
598 | updates /= tf.sqrt(tf.gather(a, g.indices))
599 | else:
600 | values = v
601 | updates = lr * g
602 | if a is not None:
603 | updates /= tf.sqrt(a)
604 |
605 | values_norm = tf.sqrt(tf.reduce_sum(v * v)) + 1.0e-7
606 | updates_norm = tf.sqrt(tf.reduce_sum(updates * updates))
607 | ret.append(
608 | tf.summary.scalar('UPDATE/' + vname.replace(":", "_"), updates_norm / values_norm))
609 |
610 | return ret
611 |
612 | def _deduplicate_indexed_slices(values, indices):
613 | """Sums `values` associated with any non-unique `indices`.
614 | Args:
615 | values: A `Tensor` with rank >= 1.
616 | indices: A one-dimensional integer `Tensor`, indexing into the first
617 | dimension of `values` (as in an IndexedSlices object).
618 | Returns:
619 | A tuple of (`summed_values`, `unique_indices`) where `unique_indices` is a
620 | de-duplicated version of `indices` and `summed_values` contains the sum of
621 | `values` slices associated with each unique index.
622 | """
623 | unique_indices, new_index_positions = tf.unique(indices)
624 | summed_values = tf.unsorted_segment_sum(
625 | values, new_index_positions,
626 | tf.shape(unique_indices)[0])
627 | return (summed_values, unique_indices)
628 |
629 |
630 | def _get_feed_dict_from_X(X, start, end, model, char_inputs, bidirectional):
631 | feed_dict = {}
632 | if not char_inputs:
633 | token_ids = X['token_ids'][start:end]
634 | feed_dict[model.token_ids] = token_ids
635 | else:
636 | # character inputs
637 | char_ids = X['tokens_characters'][start:end]
638 | feed_dict[model.tokens_characters] = char_ids
639 |
640 | if bidirectional:
641 | if not char_inputs:
642 | feed_dict[model.token_ids_reverse] = \
643 | X['token_ids_reverse'][start:end]
644 | else:
645 | feed_dict[model.tokens_characters_reverse] = \
646 | X['tokens_characters_reverse'][start:end]
647 |
648 | # now the targets with weights
649 | next_id_placeholders = [[model.next_token_id, '']]
650 | if bidirectional:
651 | next_id_placeholders.append([model.next_token_id_reverse, '_reverse'])
652 |
653 | for id_placeholder, suffix in next_id_placeholders:
654 | name = 'next_token_id' + suffix
655 | feed_dict[id_placeholder] = X[name][start:end]
656 |
657 | return feed_dict
658 |
659 |
660 | def train(options, data, n_gpus, tf_save_dir, tf_log_dir,
661 | restart_ckpt_file=None):
662 |
663 | # not restarting so save the options
664 | if restart_ckpt_file is None:
665 | with open(os.path.join(tf_save_dir, 'options.json'), 'w') as fout:
666 | fout.write(json.dumps(options))
667 |
668 | with tf.device('/cpu:0'):
669 | global_step = tf.get_variable(
670 | 'global_step', [],
671 | initializer=tf.constant_initializer(0), trainable=False)
672 |
673 | # set up the optimizer
674 | lr = options.get('learning_rate', 0.2)
675 | opt = tf.train.AdagradOptimizer(learning_rate=lr,
676 | initial_accumulator_value=1.0)
677 |
678 | # calculate the gradients on each GPU
679 | tower_grads = []
680 | models = []
681 | train_perplexity = tf.get_variable(
682 | 'train_perplexity', [],
683 | initializer=tf.constant_initializer(0.0), trainable=False)
684 | norm_summaries = []
685 | for k in range(n_gpus):
686 | with tf.device('/gpu:%d' % k):
687 | with tf.variable_scope('lm', reuse=k > 0):
688 | # calculate the loss for one model replica and get
689 | # lstm states
690 | model = LanguageModel(options, True)
691 | loss = model.total_loss
692 | models.append(model)
693 | # get gradients
694 | grads = opt.compute_gradients(
695 | loss * options['unroll_steps'],
696 | aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE,
697 | )
698 | tower_grads.append(grads)
699 | # keep track of loss across all GPUs
700 | train_perplexity += loss
701 |
702 | print_variable_summary()
703 |
704 | # calculate the mean of each gradient across all GPUs
705 | grads = average_gradients(tower_grads, options['batch_size'], options)
706 | grads, norm_summary_ops = clip_grads(grads, options, True, global_step)
707 | norm_summaries.extend(norm_summary_ops)
708 |
709 | # log the training perplexity
710 | train_perplexity = tf.exp(train_perplexity / n_gpus)
711 | perplexity_summmary = tf.summary.scalar(
712 | 'train_perplexity', train_perplexity)
713 |
714 | # some histogram summaries. all models use the same parameters
715 | # so only need to summarize one
716 | histogram_summaries = [
717 | tf.summary.histogram('token_embedding', models[0].embedding)
718 | ]
719 | # tensors of the output from the LSTM layer
720 | lstm_out = tf.get_collection('lstm_output_embeddings')
721 | histogram_summaries.append(
722 | tf.summary.histogram('lstm_embedding_0', lstm_out[0]))
723 | if options.get('bidirectional', False):
724 | # also have the backward embedding
725 | histogram_summaries.append(
726 | tf.summary.histogram('lstm_embedding_1', lstm_out[1]))
727 |
728 | # apply the gradients to create the training operation
729 | train_op = opt.apply_gradients(grads, global_step=global_step)
730 |
731 | # histograms of variables
732 | for v in tf.global_variables():
733 | histogram_summaries.append(tf.summary.histogram(v.name.replace(":", "_"), v))
734 |
735 | # get the gradient updates -- these aren't histograms, but we'll
736 | # only update them when histograms are computed
737 | histogram_summaries.extend(
738 | summary_gradient_updates(grads, opt, lr))
739 |
740 | saver = tf.train.Saver(tf.global_variables(), max_to_keep=2)
741 | summary_op = tf.summary.merge(
742 | [perplexity_summmary] + norm_summaries
743 | )
744 | hist_summary_op = tf.summary.merge(histogram_summaries)
745 |
746 | init = tf.initialize_all_variables()
747 |
748 | # do the training loop
749 | bidirectional = options.get('bidirectional', False)
750 | with tf.Session(config=tf.ConfigProto(
751 | allow_soft_placement=True)) as sess:
752 | sess.run(init)
753 |
754 | # load the checkpoint data if needed
755 | if restart_ckpt_file is not None:
756 |             reader = tf.train.NewCheckpointReader(restart_ckpt_file)
757 | cur_vars = reader.get_variable_to_shape_map()
758 |             exclude = ['the embedding layer name you want to remove']
759 | variables_to_restore = tf.contrib.slim.get_variables_to_restore(exclude=exclude)
760 | loader = tf.train.Saver(variables_to_restore)
761 |             # restore all remaining variables (and their weights) from the
762 |             # checkpoint we are restarting from
763 |             loader.restore(sess, restart_ckpt_file)
764 | with open(os.path.join(tf_save_dir, 'options.json'), 'w') as fout:
765 | fout.write(json.dumps(options))
766 |
767 | summary_writer = tf.summary.FileWriter(tf_log_dir, sess.graph)
768 |
769 | # For each batch:
770 | # Get a batch of data from the generator. The generator will
771 | # yield batches of size batch_size * n_gpus that are sliced
772 |         # and fed for each required placeholder.
773 | #
774 | # We also need to be careful with the LSTM states. We will
775 | # collect the final LSTM states after each batch, then feed
776 | # them back in as the initial state for the next batch
777 |
778 | batch_size = options['batch_size']
779 | unroll_steps = options['unroll_steps']
780 | n_train_tokens = options.get('n_train_tokens', 768648884)
781 | n_tokens_per_batch = batch_size * unroll_steps * n_gpus
782 | n_batches_per_epoch = int(n_train_tokens / n_tokens_per_batch)
783 | n_batches_total = options['n_epochs'] * n_batches_per_epoch
784 | print("Training for %s epochs and %s batches" % (
785 | options['n_epochs'], n_batches_total))
786 |
787 | # get the initial lstm states
788 | init_state_tensors = []
789 | final_state_tensors = []
790 | for model in models:
791 | init_state_tensors.extend(model.init_lstm_state)
792 | final_state_tensors.extend(model.final_lstm_state)
793 |
794 | char_inputs = 'char_cnn' in options
795 | if char_inputs:
796 | max_chars = options['char_cnn']['max_characters_per_token']
797 |
798 | if not char_inputs:
799 | feed_dict = {
800 | model.token_ids:
801 | np.zeros([batch_size, unroll_steps], dtype=np.int64)
802 | for model in models
803 | }
804 | else:
805 | feed_dict = {
806 | model.tokens_characters:
807 | np.zeros([batch_size, unroll_steps, max_chars],
808 | dtype=np.int32)
809 | for model in models
810 | }
811 |
812 | if bidirectional:
813 | if not char_inputs:
814 | feed_dict.update({
815 | model.token_ids_reverse:
816 | np.zeros([batch_size, unroll_steps], dtype=np.int64)
817 | for model in models
818 | })
819 | else:
820 | feed_dict.update({
821 | model.tokens_characters_reverse:
822 | np.zeros([batch_size, unroll_steps, max_chars],
823 | dtype=np.int32)
824 | for model in models
825 | })
826 |
827 | init_state_values = sess.run(init_state_tensors, feed_dict=feed_dict)
828 |
829 | t1 = time.time()
830 | data_gen = data.iter_batches(batch_size * n_gpus, unroll_steps)
831 | for batch_no, batch in enumerate(data_gen, start=1):
832 |
833 | # slice the input in the batch for the feed_dict
834 | X = batch
835 | feed_dict = {t: v for t, v in zip(
836 | init_state_tensors, init_state_values)}
837 | for k in range(n_gpus):
838 | model = models[k]
839 | start = k * batch_size
840 | end = (k + 1) * batch_size
841 |
842 | feed_dict.update(
843 | _get_feed_dict_from_X(X, start, end, model,
844 | char_inputs, bidirectional)
845 | )
846 |
847 | # This runs the train_op, summaries and the "final_state_tensors"
848 | # which just returns the tensors, passing in the initial
849 | # state tensors, token ids and next token ids
850 | if batch_no % 1250 != 0:
851 | ret = sess.run(
852 | [train_op, summary_op, train_perplexity] +
853 | final_state_tensors,
854 | feed_dict=feed_dict
855 | )
856 |
857 | # first three entries of ret are:
858 | # train_op, summary_op, train_perplexity
859 | # last entries are the final states -- set them to
860 | # init_state_values
861 | # for next batch
862 | init_state_values = ret[3:]
863 |
864 | else:
865 | # also run the histogram summaries
866 | ret = sess.run(
867 | [train_op, summary_op, train_perplexity, hist_summary_op] +
868 | final_state_tensors,
869 | feed_dict=feed_dict
870 | )
871 | init_state_values = ret[4:]
872 |
873 |
874 | if batch_no % 1250 == 0:
875 | summary_writer.add_summary(ret[3], batch_no)
876 | if batch_no % 100 == 0:
877 | # write the summaries to tensorboard and display perplexity
878 | summary_writer.add_summary(ret[1], batch_no)
879 | print("Batch %s, train_perplexity=%s" % (batch_no, ret[2]))
880 | print("Total time: %s" % (time.time() - t1))
881 |
882 | if (batch_no % 1250 == 0) or (batch_no == n_batches_total):
883 | # save the model
884 | checkpoint_path = os.path.join(tf_save_dir, 'model.ckpt')
885 | saver.save(sess, checkpoint_path, global_step=global_step)
886 |
887 | if batch_no == n_batches_total:
888 | # done training!
889 | break
890 |
891 |
892 | def clip_by_global_norm_summary(t_list, clip_norm, norm_name, variables):
893 | # wrapper around tf.clip_by_global_norm that also does summary ops of norms
894 |
895 | # compute norms
896 | # use global_norm with one element to handle IndexedSlices vs dense
897 | norms = [tf.global_norm([t]) for t in t_list]
898 |
899 | # summary ops before clipping
900 | summary_ops = []
901 | for ns, v in zip(norms, variables):
902 | name = 'norm_pre_clip/' + v.name.replace(":", "_")
903 | summary_ops.append(tf.summary.scalar(name, ns))
904 |
905 | # clip
906 | clipped_t_list, tf_norm = tf.clip_by_global_norm(t_list, clip_norm)
907 |
908 | # summary ops after clipping
909 | norms_post = [tf.global_norm([t]) for t in clipped_t_list]
910 | for ns, v in zip(norms_post, variables):
911 | name = 'norm_post_clip/' + v.name.replace(":", "_")
912 | summary_ops.append(tf.summary.scalar(name, ns))
913 |
914 | summary_ops.append(tf.summary.scalar(norm_name, tf_norm))
915 |
916 | return clipped_t_list, tf_norm, summary_ops
917 |
918 |
919 | def clip_grads(grads, options, do_summaries, global_step):
920 | # grads = [(grad1, var1), (grad2, var2), ...]
921 | def _clip_norms(grad_and_vars, val, name):
922 | # grad_and_vars is a list of (g, v) pairs
923 | grad_tensors = [g for g, v in grad_and_vars]
924 | vv = [v for g, v in grad_and_vars]
925 | scaled_val = val
926 | if do_summaries:
927 | clipped_tensors, g_norm, so = clip_by_global_norm_summary(
928 | grad_tensors, scaled_val, name, vv)
929 | else:
930 | so = []
931 | clipped_tensors, g_norm = tf.clip_by_global_norm(
932 | grad_tensors, scaled_val)
933 |
934 | ret = []
935 | for t, (g, v) in zip(clipped_tensors, grad_and_vars):
936 | ret.append((t, v))
937 |
938 | return ret, so
939 |
940 | all_clip_norm_val = options['all_clip_norm_val']
941 | ret, summary_ops = _clip_norms(grads, all_clip_norm_val, 'norm_grad')
942 |
943 | assert len(ret) == len(grads)
944 |
945 | return ret, summary_ops
946 |
947 |
948 | def test(options, ckpt_file, data, batch_size=256):
949 | '''
950 | Get the test set perplexity!
951 | '''
952 |
953 | bidirectional = options.get('bidirectional', False)
954 | char_inputs = 'char_cnn' in options
955 | if char_inputs:
956 | max_chars = options['char_cnn']['max_characters_per_token']
957 |
958 | unroll_steps = 1
959 |
960 | config = tf.ConfigProto(allow_soft_placement=True)
961 | with tf.Session(config=config) as sess:
962 | with tf.device('/gpu:0'), tf.variable_scope('lm'):
963 | test_options = dict(options)
964 | # NOTE: the number of tokens we skip in the last incomplete
965 | # batch is bounded above batch_size * unroll_steps
966 | test_options['batch_size'] = batch_size
967 | test_options['unroll_steps'] = 1
968 | model = LanguageModel(test_options, False)
969 | # we use the "Saver" class to load the variables
970 | loader = tf.train.Saver()
971 | loader.restore(sess, ckpt_file)
972 |
973 | # model.total_loss is the op to compute the loss
974 | # perplexity is exp(loss)
975 | init_state_tensors = model.init_lstm_state
976 | final_state_tensors = model.final_lstm_state
977 | if not char_inputs:
978 | feed_dict = {
979 | model.token_ids:
980 | np.zeros([batch_size, unroll_steps], dtype=np.int64)
981 | }
982 | if bidirectional:
983 | feed_dict.update({
984 | model.token_ids_reverse:
985 | np.zeros([batch_size, unroll_steps], dtype=np.int64)
986 | })
987 | else:
988 | feed_dict = {
989 | model.tokens_characters:
990 | np.zeros([batch_size, unroll_steps, max_chars],
991 | dtype=np.int32)
992 | }
993 | if bidirectional:
994 | feed_dict.update({
995 | model.tokens_characters_reverse:
996 | np.zeros([batch_size, unroll_steps, max_chars],
997 | dtype=np.int32)
998 | })
999 |
1000 | init_state_values = sess.run(
1001 | init_state_tensors,
1002 | feed_dict=feed_dict)
1003 |
1004 | t1 = time.time()
1005 | batch_losses = []
1006 | total_loss = 0.0
1007 | for batch_no, batch in enumerate(
1008 | data.iter_batches(batch_size, 1), start=1):
1009 | # slice the input in the batch for the feed_dict
1010 | X = batch
1011 |
1012 | feed_dict = {t: v for t, v in zip(
1013 | init_state_tensors, init_state_values)}
1014 |
1015 | feed_dict.update(
1016 | _get_feed_dict_from_X(X, 0, X['token_ids'].shape[0], model,
1017 | char_inputs, bidirectional)
1018 | )
1019 |
1020 | ret = sess.run(
1021 | [model.total_loss, final_state_tensors],
1022 | feed_dict=feed_dict
1023 | )
1024 |
1025 | loss, init_state_values = ret
1026 | batch_losses.append(loss)
1027 | batch_perplexity = np.exp(loss)
1028 | total_loss += loss
1029 | avg_perplexity = np.exp(total_loss / batch_no)
1030 |
1031 | print("batch=%s, batch_perplexity=%s, avg_perplexity=%s, time=%s" %
1032 | (batch_no, batch_perplexity, avg_perplexity, time.time() - t1))
1033 |
1034 | avg_loss = np.mean(batch_losses)
1035 |         print("FINISHED!  AVERAGE PERPLEXITY = %s" % np.exp(avg_loss))
1036 |
1037 | return np.exp(avg_loss)
1038 |
1039 |
1040 | def load_options_latest_checkpoint(tf_save_dir):
1041 | options_file = os.path.join(tf_save_dir, 'options.json')
1042 | ckpt_file = tf.train.latest_checkpoint(tf_save_dir)
1043 |
1044 | with open(options_file, 'r') as fin:
1045 | options = json.load(fin)
1046 |
1047 | return options, ckpt_file
1048 |
1049 |
1050 | def load_vocab(vocab_file, max_word_length=None):
1051 | if max_word_length:
1052 | return UnicodeCharsVocabulary(vocab_file, max_word_length,
1053 | validate_file=True)
1054 | else:
1055 | return Vocabulary(vocab_file, validate_file=True)
1056 |
1057 |
1058 | def dump_weights(tf_save_dir, outfile):
1059 | '''
1060 | Dump the trained weights from a model to a HDF5 file.
1061 | '''
1062 | import h5py
1063 |
1064 | def _get_outname(tf_name):
1065 | outname = re.sub(':0$', '', tf_name)
1066 | outname = outname.lstrip('lm/')
1067 | outname = re.sub('/rnn/', '/RNN/', outname)
1068 | outname = re.sub('/multi_rnn_cell/', '/MultiRNNCell/', outname)
1069 | outname = re.sub('/cell_', '/Cell', outname)
1070 | outname = re.sub('/lstm_cell/', '/LSTMCell/', outname)
1071 | if '/RNN/' in outname:
1072 | if 'projection' in outname:
1073 | outname = re.sub('projection/kernel', 'W_P_0', outname)
1074 | else:
1075 | outname = re.sub('/kernel', '/W_0', outname)
1076 | outname = re.sub('/bias', '/B', outname)
1077 | return outname
1078 |
1079 | options, ckpt_file = load_options_latest_checkpoint(tf_save_dir)
1080 |
1081 | config = tf.ConfigProto(allow_soft_placement=True)
1082 | with tf.Session(config=config) as sess:
1083 | with tf.variable_scope('lm'):
1084 | model = LanguageModel(options, False)
1085 | # we use the "Saver" class to load the variables
1086 | loader = tf.train.Saver()
1087 | loader.restore(sess, ckpt_file)
1088 |
1089 | with h5py.File(outfile, 'w') as fout:
1090 | for v in tf.trainable_variables():
1091 | if v.name.find('softmax') >= 0:
1092 | # don't dump these
1093 | continue
1094 | outname = _get_outname(v.name)
1095 | print("Saving variable {0} with name {1}".format(
1096 | v.name, outname))
1097 | shape = v.get_shape().as_list()
1098 | dset = fout.create_dataset(outname, shape, dtype='float32')
1099 | values = sess.run([v])[0]
1100 | dset[...] = values
1101 |
1115 |
--------------------------------------------------------------------------------