├── .gitignore
├── LICENSE
├── Question_Answering_with_ALBERT.ipynb
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2020 Ming

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/Question_Answering_with_ALBERT.ipynb:
--------------------------------------------------------------------------------
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Question Answering with ALBERT.ipynb",
      "provenance": [],
      "private_outputs": true,
      "collapsed_sections": [],
      "toc_visible": true,
      "machine_shape": "hm",
      "authorship_tag": "ABX9TyPtt7t+plApQ1pDExTO4vM/",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/spark-ming/albert-qa-demo/blob/master/Question_Answering_with_ALBERT.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1qfQAtRsMVl7",
        "colab_type": "text"
      },
      "source": [
        "# Reading Comprehension with ALBERT (and similar)\n",
        "\n",
        "Author: [@techno246](https://twitter.com/techno246)\n",
        "\n",
        "GitHub Repo: https://github.com/spark-ming/albert-qa-demo/\n",
        "\n",
        "Blog Post: https://www.spark64.com/post/machine-comprehension\n",
        "\n",
        "\n",
        "## Introduction\n",
        "\n",
        "Reading comprehension, otherwise known as question answering, is one of the tasks that NLP tries to solve. The goal of this task is to answer an arbitrary question given a context. For instance, given the following context:\n",
        "\n",
        "> New Zealand (Māori: Aotearoa) is a sovereign island country in the southwestern Pacific Ocean. It has a total land area of 268,000 square kilometres (103,500 sq mi), and a population of 4.9 million. New Zealand's capital city is Wellington, and its most populous city is Auckland.\n",
        "\n",
        "We ask the question\n",
        "\n",
        "> How many people live in New Zealand?\n",
        "\n",
        "We expect the QA system to respond with something like this:\n",
        "\n",
        "> 4.9 million\n",
        "\n",
        "Since 2017, transformer models have been shown to outperform existing approaches for this task. Many pretrained transformer models exist, including BERT, GPT-2, and XLNet. One of the newcomers to the group is ALBERT (A Lite BERT), which was published in September 2019. The research group claims that it outperforms BERT with far fewer parameters (and shorter training and inference times).\n",
        "\n",
        "This tutorial demonstrates how you can fine-tune ALBERT for the task of QnA and use it for inference. For this tutorial, we will use the transformers library built by [Hugging Face](https://huggingface.co/), which is an extremely nice implementation of the transformer models (including ALBERT) in both TensorFlow and PyTorch. You can just use a fine-tuned model from their [model repository](https://huggingface.co/models) (which I encourage in general, to save money and reduce emissions).\n",
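        "\n",
        "As a taste of that route, here is a minimal sketch (it assumes the high-level `pipeline` API available in recent versions of the library, and uses the same SQuAD-finetuned checkpoint that appears later in this notebook):\n",
        "\n",
        "```python\n",
        "from transformers import pipeline\n",
        "\n",
        "# Any SQuAD-finetuned checkpoint from the model repository should work here\n",
        "qa = pipeline('question-answering', model='ktrapeznikov/albert-xlarge-v2-squad-v2')\n",
        "\n",
        "qa(question='How many people live in New Zealand?',\n",
        "   context='New Zealand has a population of 4.9 million.')\n",
        "# => {'score': ..., 'start': ..., 'end': ..., 'answer': '4.9 million.'}\n",
        "```\n",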
        "\n",
        "However, for educational purposes, I will also show you how to finetune it yourself so you can adapt it for your own data.\n",
        "\n",
        "Note that the goal of this is not to build an optimised, production-ready system, but to demonstrate the concept with as little code as possible. Therefore a lot of code will be retrofitted for this purpose.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sBBHbGvQN5vX",
        "colab_type": "text"
      },
      "source": [
        "## 1.0 Setup\n",
        "\n",
        "Let's check out what kind of GPU our friends at Google gave us. This notebook should be configured to give you a P100 😃 (saved in metadata)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "frTeTcy4WdbY",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!nvidia-smi"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "D5RImM3oWbrZ",
        "colab_type": "text"
      },
      "source": [
        "First, we clone the Hugging Face transformers library from GitHub.\n",
        "\n",
        "Note that we check out a specific commit only because I've tested this notebook against it."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "QOAoUwBFMQCg",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!git clone https://github.com/huggingface/transformers \\\n",
        "&& cd transformers \\\n",
        "&& git checkout a3085020ed0d81d4903c50967687192e3101e770"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "TRZned-8WJrj",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!pip install ./transformers\n",
        "!pip install tensorboardX"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UHCuzhPptH0M",
        "colab_type": "text"
      },
      "source": [
        "## 2.0 Train Model\n",
        "\n",
        "This is where we can train our own model. Note that you can skip this step if you don't want to wait 1.5 hours!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OaQGsAiWXcnd",
        "colab_type": "text"
      },
      "source": [
        "### 2.1 Get Training and Evaluation Data\n",
        "\n",
        "The SQuAD dataset contains question/answer pairs for training the ALBERT model on the QA task.\n",
        "\n",
        "Now get the SQuAD V2.0 dataset. `train-v2.0.json` is for training and `dev-v2.0.json` is for evaluation, to see how well your model trained.\n",
        "\n",
        "Read more about this dataset here: https://rajpurkar.github.io/SQuAD-explorer/"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "dI6e-PfOXSnO",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "!mkdir dataset \\\n",
        "&& cd dataset \\\n",
        "&& wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json \\\n",
        "&& wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "dZ87q93GDeeL",
        "colab_type": "text"
      },
      "source": [
        "### 2.2 Run training\n",
        "\n",
        "We can now train the model with the training set.\n",
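        "\n",
        "If you'd like to sanity-check what the model will be trained on first, here's a minimal sketch (paths assume the `dataset` directory created above):\n",
        "\n",
        "```python\n",
        "import json\n",
        "\n",
        "# Peek at the first article/paragraph/question of SQuAD v2.0\n",
        "with open('/content/dataset/train-v2.0.json') as f:\n",
        "    squad = json.load(f)\n",
        "\n",
        "paragraph = squad['data'][0]['paragraphs'][0]\n",
        "print(paragraph['context'][:200])\n",
        "print(paragraph['qas'][0]['question'])\n",
        "print(paragraph['qas'][0]['answers'])\n",
        "```\n",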
\n", 190 | "\n", 191 | "### Notes about parameters:\n", 192 | "`per_gpu_train_batch_size` specifies the number of training examples per iteration per GPU. *In general*, higher means more accuracy and faster training. However, the biggest limitation is the size of the GPU. 12 is what I use for a GPU with 16GB memory. \n", 193 | "\n", 194 | "`save_steps` specifies number of steps before it outputs a checkpoint file. I've increased it to save disk space.\n", 195 | "\n", 196 | "`num_train_epochs` I recommend two epochs here. It's currently set to one for the purpose of time\n", 197 | "\n", 198 | "`version_2_with_negative` is required for SQuAD V2.0. If training with V1.1, take out this flag\n", 199 | "\n", 200 | "Warning: it takes about 1.5 hours to train an epoch! If you don't want to wait this long, feel free to skip this step and note the comment in the code to use a pretrained model!" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "metadata": { 206 | "id": "-Eg53t3QXZAb", 207 | "colab_type": "code", 208 | "colab": {} 209 | }, 210 | "source": [ 211 | "!export SQUAD_DIR=/content/dataset \\\n", 212 | "&& python transformers/examples/run_squad.py \\\n", 213 | " --model_type albert \\\n", 214 | " --model_name_or_path albert-base-v2 \\\n", 215 | " --do_train \\\n", 216 | " --do_eval \\\n", 217 | " --do_lower_case \\\n", 218 | " --train_file $SQUAD_DIR/train-v2.0.json \\\n", 219 | " --predict_file $SQUAD_DIR/dev-v2.0.json \\\n", 220 | " --per_gpu_train_batch_size 12 \\\n", 221 | " --learning_rate 3e-5 \\\n", 222 | " --num_train_epochs 1.0 \\\n", 223 | " --max_seq_length 384 \\\n", 224 | " --doc_stride 128 \\\n", 225 | " --output_dir /content/model_output \\\n", 226 | " --save_steps 1000 \\\n", 227 | " --threads 4 \\\n", 228 | " --version_2_with_negative " 229 | ], 230 | "execution_count": 0, 231 | "outputs": [] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": { 236 | "id": "-JCNRkQwUD56", 237 | "colab_type": "text" 238 | }, 239 | "source": [ 240 | "## 3.0 Setup prediction code\n", 241 | "\n", 242 | "Now we can use the Hugging Face library to make predictions using our newly trained model. Note that a lot of the code is pulled from `run_squad.py` in the Hugging Face repository, with all the training parts removed. 
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-JCNRkQwUD56",
        "colab_type": "text"
      },
      "source": [
        "## 3.0 Setup prediction code\n",
        "\n",
        "Now we can use the Hugging Face library to make predictions using our newly trained model. Note that a lot of the code is pulled from `run_squad.py` in the Hugging Face repository, with all the training parts removed. This modified code allows us to run predictions on questions and contexts passed in directly as strings, rather than in the .json format used by the training/test sets.\n",
        "\n",
        "NOTE: if you decided to train your own model, change the flag `use_own_model` to `True`.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qp0Pq9z9Y4S0",
        "colab_type": "code",
        "cellView": "code",
        "colab": {}
      },
      "source": [
        "import os\n",
        "import torch\n",
        "import time\n",
        "from torch.utils.data import DataLoader, RandomSampler, SequentialSampler\n",
        "\n",
        "from transformers import (\n",
        "    AlbertConfig,\n",
        "    AlbertForQuestionAnswering,\n",
        "    AlbertTokenizer,\n",
        "    squad_convert_examples_to_features\n",
        ")\n",
        "\n",
        "from transformers.data.processors.squad import SquadResult, SquadV2Processor, SquadExample\n",
        "\n",
        "from transformers.data.metrics.squad_metrics import compute_predictions_logits\n",
        "\n",
        "# READER NOTE: Set this flag to use own model, or use pretrained model in the Hugging Face repository\n",
        "use_own_model = False\n",
        "\n",
        "if use_own_model:\n",
        "    model_name_or_path = \"/content/model_output\"\n",
        "else:\n",
        "    model_name_or_path = \"ktrapeznikov/albert-xlarge-v2-squad-v2\"\n",
        "\n",
        "output_dir = \"\"\n",
        "\n",
        "# Config\n",
        "n_best_size = 1\n",
        "max_answer_length = 30\n",
        "do_lower_case = True\n",
        "null_score_diff_threshold = 0.0\n",
        "\n",
        "def to_list(tensor):\n",
        "    return tensor.detach().cpu().tolist()\n",
        "\n",
        "# Setup model\n",
        "config_class, model_class, tokenizer_class = (\n",
        "    AlbertConfig, AlbertForQuestionAnswering, AlbertTokenizer)\n",
        "config = config_class.from_pretrained(model_name_or_path)\n",
        "tokenizer = tokenizer_class.from_pretrained(\n",
        "    model_name_or_path, do_lower_case=True)\n",
        "model = model_class.from_pretrained(model_name_or_path, config=config)\n",
        "\n",
        "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
        "\n",
        "model.to(device)\n",
        "\n",
        "processor = SquadV2Processor()\n",
        "\n",
        "def run_prediction(question_texts, context_text):\n",
        "    \"\"\"Setup function to compute predictions\"\"\"\n",
        "    examples = []\n",
        "\n",
        "    for i, question_text in enumerate(question_texts):\n",
        "        example = SquadExample(\n",
        "            qas_id=str(i),\n",
        "            question_text=question_text,\n",
        "            context_text=context_text,\n",
        "            answer_text=None,\n",
        "            start_position_character=None,\n",
        "            title=\"Predict\",\n",
        "            is_impossible=False,\n",
        "            answers=None,\n",
        "        )\n",
        "\n",
        "        examples.append(example)\n",
        "\n",
        "    features, dataset = squad_convert_examples_to_features(\n",
        "        examples=examples,\n",
        "        tokenizer=tokenizer,\n",
        "        max_seq_length=384,\n",
        "        doc_stride=128,\n",
        "        max_query_length=64,\n",
        "        is_training=False,\n",
        "        return_dataset=\"pt\",\n",
        "        threads=1,\n",
        "    )\n",
        "\n",
        "    eval_sampler = SequentialSampler(dataset)\n",
        "    eval_dataloader = DataLoader(dataset, sampler=eval_sampler, batch_size=10)\n",
        "\n",
        "    all_results = []\n",
        "\n",
        "    for batch in eval_dataloader:\n",
        "        model.eval()\n",
        "        batch = tuple(t.to(device) for t in batch)\n",
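        "\n",
        "        # Each batch is a tuple of tensors: batch[0] input_ids, batch[1]\n",
        "        # attention_mask, batch[2] token_type_ids, and batch[3] the feature\n",
        "        # index used below to look up the matching entry in `features`.\n",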
| "\n", 343 | " with torch.no_grad():\n", 344 | " inputs = {\n", 345 | " \"input_ids\": batch[0],\n", 346 | " \"attention_mask\": batch[1],\n", 347 | " \"token_type_ids\": batch[2],\n", 348 | " }\n", 349 | "\n", 350 | " example_indices = batch[3]\n", 351 | "\n", 352 | " outputs = model(**inputs)\n", 353 | "\n", 354 | " for i, example_index in enumerate(example_indices):\n", 355 | " eval_feature = features[example_index.item()]\n", 356 | " unique_id = int(eval_feature.unique_id)\n", 357 | "\n", 358 | " output = [to_list(output[i]) for output in outputs]\n", 359 | "\n", 360 | " start_logits, end_logits = output\n", 361 | " result = SquadResult(unique_id, start_logits, end_logits)\n", 362 | " all_results.append(result)\n", 363 | "\n", 364 | " output_prediction_file = \"predictions.json\"\n", 365 | " output_nbest_file = \"nbest_predictions.json\"\n", 366 | " output_null_log_odds_file = \"null_predictions.json\"\n", 367 | "\n", 368 | " predictions = compute_predictions_logits(\n", 369 | " examples,\n", 370 | " features,\n", 371 | " all_results,\n", 372 | " n_best_size,\n", 373 | " max_answer_length,\n", 374 | " do_lower_case,\n", 375 | " output_prediction_file,\n", 376 | " output_nbest_file,\n", 377 | " output_null_log_odds_file,\n", 378 | " False, # verbose_logging\n", 379 | " True, # version_2_with_negative\n", 380 | " null_score_diff_threshold,\n", 381 | " tokenizer,\n", 382 | " )\n", 383 | "\n", 384 | " return predictions" 385 | ], 386 | "execution_count": 0, 387 | "outputs": [] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": { 392 | "id": "nIQOB8vhpcKs", 393 | "colab_type": "text" 394 | }, 395 | "source": [ 396 | "## 4.0 Run predictions\n", 397 | "\n", 398 | "Now for the fun part... testing out your model on different inputs. Pretty rudimentary example here. But the possibilities are endless with this function." 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "metadata": { 404 | "id": "F-sUrcA5nXTH", 405 | "colab_type": "code", 406 | "cellView": "code", 407 | "colab": {} 408 | }, 409 | "source": [ 410 | "context = \"New Zealand (Māori: Aotearoa) is a sovereign island country in the southwestern Pacific Ocean. It has a total land area of 268,000 square kilometres (103,500 sq mi), and a population of 4.9 million. New Zealand's capital city is Wellington, and its most populous city is Auckland.\"\n", 411 | "questions = [\"How many people live in New Zealand?\", \n", 412 | " \"What's the largest city?\"]\n", 413 | "\n", 414 | "# Run method\n", 415 | "predictions = run_prediction(questions, context)\n", 416 | "\n", 417 | "# Print results\n", 418 | "for key in predictions.keys():\n", 419 | " print(predictions[key])" 420 | ], 421 | "execution_count": 0, 422 | "outputs": [] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "metadata": { 427 | "id": "rkivu8FOqp_8", 428 | "colab_type": "text" 429 | }, 430 | "source": [ 431 | "## 5.0 Next Steps\n", 432 | "\n", 433 | "In this tutorial, you learnt how to fine-tune an ALBERT model for the task of question answering, using the SQuAD dataset. Then, you learnt how you can make predictions using the model. \n", 434 | "\n", 435 | "We retrofitted `compute_predictions_logits` to make the prediction for the purpose of simplicity and minimising dependencies in the tutorial. Take a peak inside that module to see how it works. 
        "\n",
        "Feel free to open an issue in the [GitHub repository](https://github.com/spark-ming/albert-qa-demo/) for this notebook, or tweet me @techno246 if you have any questions!\n",
        "\n"
      ]
    }
  ]
}
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Reading Comprehension with ALBERT (and similar)

Author: [@techno246](https://twitter.com/techno246)

Blog Post: https://www.spark64.com/post/machine-comprehension

## Introduction

Reading comprehension, otherwise known as question answering, is one of the tasks that NLP tries to solve. The goal of this task is to answer an arbitrary question given a context. For instance, given the following context:

> New Zealand (Māori: Aotearoa) is a sovereign island country in the southwestern Pacific Ocean. It has a total land area of 268,000 square kilometres (103,500 sq mi), and a population of 4.9 million. New Zealand's capital city is Wellington, and its most populous city is Auckland.

We ask the question

> How many people live in New Zealand?

We expect the QA system to respond with something like this:

> 4.9 million

Since 2017, transformer models have been shown to outperform existing approaches for this task. Many pretrained transformer models exist, including BERT, GPT-2, and XLNet. One of the newcomers to the group is ALBERT (A Lite BERT), which was published in September 2019. The research group claims that it outperforms BERT with far fewer parameters (and shorter training and inference times).

This tutorial demonstrates how you can fine-tune ALBERT for the task of QnA and use it for inference. For this tutorial, we will use the transformers library built by Hugging Face, which is an extremely nice implementation of the transformer models (including ALBERT) in both TensorFlow and PyTorch. You can just use a fine-tuned model from their [model repository](https://huggingface.co/models) (which I encourage in general, to save money and reduce emissions). However, for educational purposes, I will also show you how to finetune it yourself so you can adapt it for your own data.

Note that the goal of this is not to build an optimised, production-ready system, but to demonstrate the concept with as little code as possible. Therefore a lot of code will be retrofitted for this purpose.

Get started below:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/spark-ming/albert-qa-demo/blob/master/Question_Answering_with_ALBERT.ipynb)

--------------------------------------------------------------------------------