├── Few-shot └── SetFit_SST_2_Few_shot.ipynb ├── OCR-Document-Processing └── Doc_Visual_QA_and_Bill_extraction_demo.ipynb ├── README.md ├── classification ├── BERT_Fine_Tuning_Sentence_Classification_v2.ipynb ├── Bert_Classification_Pt.ipynb ├── Generic_Transformer_Classification.ipynb ├── large_scale_multilabelclassification.ipynb └── transformers_multilabel_text_classification_with_problem_type.ipynb ├── docs ├── BERT_Fine_Tuning_Sentence_Classification_v2.md ├── BERT_Fine_Tuning_Sentence_Classification_v2_files │ └── BERT_Fine_Tuning_Sentence_Classification_v2_92_0.png ├── Bert_Classification_Pt.md ├── Bert_Classification_Pt_files │ └── Bert_Classification_Pt_28_0.png ├── Bert_Pre_Training.md ├── Doc_Visual_QA_and_Bill_extraction_demo.md ├── Doc_Visual_QA_and_Bill_extraction_demo_files │ ├── Doc_Visual_QA_and_Bill_extraction_demo_15_0.png │ ├── Doc_Visual_QA_and_Bill_extraction_demo_26_0.png │ └── Doc_Visual_QA_and_Bill_extraction_demo_35_0.png ├── GPT_2_on_Onnx_CPU.md ├── Generic_Transformer_Classification.md ├── Question_Answering_with_a_Fine_Tuned_BERT.md ├── Question_Answering_with_a_Fine_Tuned_BERT_files │ ├── Question_Answering_with_a_Fine_Tuned_BERT_44_0.png │ ├── Question_Answering_with_a_Fine_Tuned_BERT_46_0.png │ └── Question_Answering_with_a_Fine_Tuned_BERT_49_0.png ├── Seq2Seq_Pytorch.md ├── SetFit_SST_2_Few_shot.md ├── SetFit_SST_2_Few_shot_files │ └── SetFit_SST_2_Few_shot_22_0.png ├── Simpletransformers_2.md ├── TAPAS_fine_tuning_in_tf.md ├── Using_Transformers_with_Fastai_Tutorial.md ├── Wikipedia_answer_retrieval_DPR.md ├── contextual_topic_modeling.md ├── index.md ├── knowledge_distillation_exploration.md ├── large_scale_multilabelclassification.md ├── simpletransformers_intro.md ├── token_classification_transformers_zenml.md └── token_classification_transformers_zenml_files │ └── token_classification_transformers_zenml_36_1.svg ├── knowledge-distillation └── knowledge_distillation_exploration.ipynb ├── machine-translation └── Seq2Seq_Pytorch.ipynb ├── mkdocs.yml ├── nlp-onnx └── GPT_2_on_Onnx_CPU.ipynb ├── nlp-package-exploration ├── Simpletransformers_2.ipynb ├── Using_Transformers_with_Fastai_Tutorial.ipynb └── simpletransformers_intro.ipynb ├── opendomain-qa └── Wikipedia_answer_retrieval_DPR.ipynb ├── pretraining └── Bert_Pre_Training.ipynb ├── question-answering └── Question_Answering_with_a_Fine_Tuned_BERT.ipynb ├── table-qa └── TAPAS_fine_tuning_in_tf.ipynb ├── token-classification └── token_classification_transformers_zenml.ipynb └── topic-modeling └── contextual_topic_modeling.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # NLP Notebooks 2 | 3 | ![NLP](https://www.upwork.com/catalog-images/c0717a4a34e39d0ff4391b01b6898cd1) 4 | 5 | ## Pretraining 6 | 7 | * BERT Mask Language Modeling | Pretraining[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/pretraining/Bert_Pre_Training.ipynb) 8 | 9 | ## Classification 10 | 11 | * BERT Classification in Pytorch[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/classification/Bert_Classification_Pt.ipynb) 12 | 13 | * BERT Sentence Classification 2 in Pytorch[![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/classification/BERT_Fine_Tuning_Sentence_Classification_v2.ipynb) 14 | 15 | * Generic Class for Classification using transformers[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/classification/Generic_Transformer_Classification.ipynb) 16 | 17 | * Large Scale Multi Label Classification[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/classification/large_scale_multilabelclassification.ipynb) 18 | 19 | ## Question Answering 20 | 21 | * Question Answering with a Fine Tuned BERT [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/question-answering/Question_Answering_with_a_Fine_Tuned_BERT.ipynb) 22 | 23 | 24 | ## Machine Translation 25 | 26 | * Machine Translation using transformers [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/machine-translation/Seq2Seq_Pytorch.ipynb) 27 | 28 | 29 | ## Topic Modeling 30 | 31 | * Contextual Topic Modeling [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/topic-modeling/contextual_topic_modeling.ipynb) 32 | 33 | 34 | ## Knowledge Distillation 35 | 36 | * BERT Knowledge Distillation in LSTM [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/knowledge-distillation/knowledge_distillation_exploration.ipynb) 37 | 38 | 39 | ## NLP Package Exploration 40 | 41 | * Simpletransformers Exploration [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/nlp-package-exploration/Simpletransformers_2.ipynb) 42 | 43 | * Fastai + Transformers Exploration [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ankur3107/colab_notebooks/blob/master/nlp-package-exploration/Using_Transformers_with_Fastai_Tutorial.ipynb) 44 | -------------------------------------------------------------------------------- /docs/BERT_Fine_Tuning_Sentence_Classification_v2_files/BERT_Fine_Tuning_Sentence_Classification_v2_92_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ankur3107/nlp_notebooks/db671902eebb52f7174df5f302cbc700fe0b0b97/docs/BERT_Fine_Tuning_Sentence_Classification_v2_files/BERT_Fine_Tuning_Sentence_Classification_v2_92_0.png -------------------------------------------------------------------------------- /docs/Bert_Classification_Pt_files/Bert_Classification_Pt_28_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ankur3107/nlp_notebooks/db671902eebb52f7174df5f302cbc700fe0b0b97/docs/Bert_Classification_Pt_files/Bert_Classification_Pt_28_0.png -------------------------------------------------------------------------------- /docs/Bert_Pre_Training.md: 
--------------------------------------------------------------------------------
1 | Open In Colab
2 | 
3 | 
4 | ```
5 | !pip install transformers
6 | ```
7 | 
8 | Collecting transformers
9 | Downloading https://files.pythonhosted.org/packages/ae/05/c8c55b600308dc04e95100dc8ad8a244dd800fe75dfafcf1d6348c6f6209/transformers-3.1.0-py3-none-any.whl (884kB)
10 | |████████████████████████████████| 890kB 3.4MB/s 
11 | Collecting sacremoses
12 | Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
13 | |████████████████████████████████| 890kB 16.3MB/s 
14 | Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.6/dist-packages (from transformers) (4.41.1)
15 | Requirement already satisfied: dataclasses; python_version < "3.7" in /usr/local/lib/python3.6/dist-packages (from transformers) (0.7)
16 | Collecting tokenizers==0.8.1.rc2
17 | Downloading https://files.pythonhosted.org/packages/80/83/8b9fccb9e48eeb575ee19179e2bdde0ee9a1904f97de5f02d19016b8804f/tokenizers-0.8.1rc2-cp36-cp36m-manylinux1_x86_64.whl (3.0MB)
18 | |████████████████████████████████| 3.0MB 25.3MB/s 
19 | Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from transformers) (2.23.0)
20 | Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from transformers) (1.18.5)
21 | Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.6/dist-packages (from transformers) (2019.12.20)
22 | Collecting sentencepiece!=0.1.92
23 | Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
24 | |████████████████████████████████| 1.1MB 44.6MB/s 
25 | Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from transformers) (20.4)
26 | Requirement already satisfied: filelock in /usr/local/lib/python3.6/dist-packages (from transformers) (3.0.12)
27 | Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (1.15.0)
28 | Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (7.1.2)
29 | Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (0.16.0)
30 | Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (1.24.3)
31 | Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2020.6.20)
32 | Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2.10)
33 | Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (3.0.4)
34 | Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from packaging->transformers) (2.4.7)
35 | Building wheels for collected packages: sacremoses
36 | Building wheel for sacremoses (setup.py) ... done
37 | Created wheel for sacremoses: filename=sacremoses-0.0.43-cp36-none-any.whl size=893257 sha256=b4d9f604b99e77f4dc2b8892460fc931726fa7d37c4fc51dfcb98c01d6d08797
38 | Stored in directory: /root/.cache/pip/wheels/29/3c/fd/7ce5c3f0666dab31a50123635e6fb5e19ceb42ce38d4e58f45
39 | Successfully built sacremoses
40 | Installing collected packages: sacremoses, tokenizers, sentencepiece, transformers
41 | Successfully installed sacremoses-0.0.43 sentencepiece-0.1.91 tokenizers-0.8.1rc2 transformers-3.1.0
42 | 
43 | 
44 | 
45 | ```
46 | MAX_LEN = 128
47 | BATCH_SIZE = 16 # per TPU core
48 | TOTAL_STEPS = 2000 # that's approx 4 epochs
49 | EVALUATE_EVERY = 200
50 | LR = 1e-5
51 | 
52 | PRETRAINED_MODEL = 'bert-base-uncased'
53 | 
54 | 
55 | import os
56 | import numpy as np
57 | import pandas as pd
58 | import tensorflow as tf
59 | print(tf.__version__)
60 | from tensorflow.keras.optimizers import Adam
61 | import transformers
62 | from transformers import TFAutoModelWithLMHead, AutoTokenizer
63 | import logging
64 | 
65 | AUTO = tf.data.experimental.AUTOTUNE
66 | ```
67 | 
68 | 2.3.0
69 | 
70 | 
71 | 
72 | ```
73 | def connect_to_TPU():
74 |     """Detect hardware, return appropriate distribution strategy"""
75 |     try:
76 |         # TPU detection. No parameters necessary if TPU_NAME environment variable is
77 |         # set: this is always the case on Kaggle.
78 |         tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
79 |         print('Running on TPU ', tpu.master())
80 |     except ValueError:
81 |         tpu = None
82 | 
83 |     if tpu:
84 |         tf.config.experimental_connect_to_cluster(tpu)
85 |         tf.tpu.experimental.initialize_tpu_system(tpu)
86 |         strategy = tf.distribute.experimental.TPUStrategy(tpu)
87 |     else:
88 |         # Default distribution strategy in Tensorflow. Works on CPU and single GPU.
89 |         strategy = tf.distribute.get_strategy()
90 | 
91 |     global_batch_size = BATCH_SIZE * strategy.num_replicas_in_sync
92 | 
93 |     return tpu, strategy, global_batch_size
94 | 
95 | 
96 | tpu, strategy, global_batch_size = connect_to_TPU()
97 | print("REPLICAS: ", strategy.num_replicas_in_sync)
98 | ```
99 | 
100 | INFO:absl:Entering into master device scope: /job:worker/replica:0/task:0/device:CPU:0
101 | 
102 | 
103 | Running on TPU  grpc://10.19.232.114:8470
104 | INFO:tensorflow:Initializing the TPU system: grpc://10.19.232.114:8470
105 | 
106 | 
110 | INFO:tensorflow:Clearing out eager caches
111 | 
112 | 
116 | INFO:tensorflow:Finished initializing TPU system.
117 | 
118 | 
120 | WARNING:absl:`tf.distribute.experimental.TPUStrategy` is deprecated, please use the non experimental symbol `tf.distribute.TPUStrategy` instead.
121 | 
122 | 
123 | INFO:tensorflow:Found TPU system:
124 | 
125 | 
129 | INFO:tensorflow:*** Num TPU Cores: 8
130 | 
131 | 
135 | INFO:tensorflow:*** Num TPU Workers: 1
136 | 
137 | 
141 | INFO:tensorflow:*** Num TPU Cores Per Worker: 8
142 | 
143 | 
147 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
148 | 
149 | 
153 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
154 | 
155 | 
159 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
160 | 
161 | 
165 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
166 | 
167 | 
171 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
172 | 
173 | 
177 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
178 | 
179 | 
183 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
184 | 
185 | 
189 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
190 | 
191 | 
195 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
196 | 
197 | 
201 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
202 | 
203 | 
207 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
208 | 
209 | 
213 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
214 | 
215 | 
219 | INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
220 | 
221 | 
225 | REPLICAS:  8
226 | 
227 | 
228 | 
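The strategy reports 8 replicas, so the per-core `BATCH_SIZE` of 16 becomes a global batch of 128 examples per optimizer step. A small illustrative check (added for clarity; the numbers come from the constants above and the `REPLICAS:  8` output):

```
# Illustrative only -- not part of the original notebook.
per_core_batch = 16           # BATCH_SIZE, as defined above
num_replicas = 8              # strategy.num_replicas_in_sync, printed above
global_batch = per_core_batch * num_replicas
print(global_batch)           # 128, i.e. BATCH_SIZE * strategy.num_replicas_in_sync
```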
229 | ```
230 | !wget https://raw.githubusercontent.com/SrinidhiRaghavan/AI-Sentiment-Analysis-on-IMDB-Dataset/master/imdb_tr.csv
231 | ```
232 | 
233 | --2020-09-02 10:33:57--  https://raw.githubusercontent.com/SrinidhiRaghavan/AI-Sentiment-Analysis-on-IMDB-Dataset/master/imdb_tr.csv
234 | Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
235 | Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
236 | HTTP request sent, awaiting response... 200 OK
237 | Length: 23677025 (23M) [text/plain]
238 | Saving to: ‘imdb_tr.csv’
239 | 
240 | imdb_tr.csv         100%[===================>]  22.58M  49.2MB/s    in 0.5s
241 | 
242 | 2020-09-02 10:33:58 (49.2 MB/s) - ‘imdb_tr.csv’ saved [23677025/23677025]
243 | 
244 | 
245 | 
246 | 
247 | ```
248 | data = pd.read_csv('imdb_tr.csv', encoding = "ISO-8859-1")
249 | ```
250 | 
251 | 
252 | ```
253 | data.head()
254 | ```
255 | 
256 | 
257 | |   | row_Number | text | polarity |
258 | |---|------------|------|----------|
259 | | 0 | 2148  | first think another Disney movie, might good, ... | 1 |
260 | | 1 | 23577 | Put aside Dr. House repeat missed, Desperate H... | 0 |
261 | | 2 | 1319  | big fan Stephen King's work, film made even gr... | 1 |
262 | | 3 | 13358 | watched horrid thing TV. Needless say one movi... | 0 |
263 | | 4 | 9495  | truly enjoyed film. acting terrific plot. Jeff... | 1 |
264 | 
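The preview shows the three columns of `imdb_tr.csv`: a `row_Number` id, the review `text`, and a binary `polarity` label (1 for positive, 0 for negative, judging from the sample rows). A quick class-balance check along these lines could be run before encoding (hypothetical addition, not part of the original notebook):

```
# Hypothetical sanity check -- not in the original notebook.
# `data` is the DataFrame loaded above.
print(data.shape)                    # number of reviews and columns
print(data.polarity.value_counts())  # distribution of the 0/1 labels
```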
316 | 
317 | 
318 | 
319 | 
320 | ```
321 | #data = data.sample(1000)
322 | ```
323 | 
324 | 
325 | ```
326 | %%time
327 | 
328 | def regular_encode(texts, tokenizer, maxlen=512):
329 |     enc_di = tokenizer.batch_encode_plus(
330 |         texts,
331 |         return_attention_mask=False,
332 |         return_token_type_ids=False,
333 |         pad_to_max_length=True,
334 |         max_length=maxlen,
335 |         truncation=True
336 |     )
337 | 
338 |     return np.array(enc_di['input_ids'])
339 | 
340 | 
341 | tokenizer = AutoTokenizer.from_pretrained(PRETRAINED_MODEL)
342 | X_data = regular_encode(data.text.values, tokenizer, maxlen=MAX_LEN)
343 | ```
344 | 
345 | 
359 | /usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils_base.py:1770: FutureWarning: The `pad_to_max_length` argument is deprecated and will be removed in a future version, use `padding=True` or `padding='longest'` to pad to the longest sequence in the batch, or use `padding='max_length'` to pad to a max length. In this case, you can give a specific length with `max_length` (e.g. `max_length=45`) or leave max_length to None to pad to the maximal input size of the model (e.g. 512 for Bert).
360 |   FutureWarning,
361 | 
362 | 
363 | CPU times: user 1min 4s, sys: 233 ms, total: 1min 4s
364 | Wall time: 1min 5s
365 | 
366 | 
367 | 
368 | ```
369 | def prepare_mlm_input_and_labels(X):
370 |     # select 15% of tokens, BERT-style
371 |     inp_mask = np.random.rand(*X.shape)<0.15
372 |     # do not mask special tokens (ids 0-2)
373 |     inp_mask[X<=2] = False
374 |     # labels default to -1, meaning "ignore this position in the loss"
375 |     labels = -1 * np.ones(X.shape, dtype=int)
376 |     # set labels for masked tokens
377 |     labels[inp_mask] = X[inp_mask]
378 | 
379 |     # prepare input
380 |     X_mlm = np.copy(X)
381 |     # replace 90% of the selected tokens with [MASK]
382 |     # (the remaining 10% are left unchanged)
383 |     inp_mask_2mask = inp_mask & (np.random.rand(*X.shape)<0.90)
384 |     X_mlm[inp_mask_2mask] = tokenizer.mask_token_id
385 | 
386 |     # overwrite 1/9 of the [MASK]ed tokens (10% of all selected) with a random id in [3, mask_token_id)
387 |     inp_mask_2random = inp_mask_2mask & (np.random.rand(*X.shape) < 1/9)
388 |     X_mlm[inp_mask_2random] = np.random.randint(3, tokenizer.mask_token_id, inp_mask_2random.sum())
389 | 
390 |     return X_mlm, labels
391 | 
392 | 
393 | # use all encoded reviews for mlm
394 | X_train_mlm = np.vstack(X_data)
395 | # masks and labels
396 | X_train_mlm, y_train_mlm = prepare_mlm_input_and_labels(X_train_mlm)
397 | ```
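Taken together, the two draws above implement BERT's standard 80/10/10 split over the 15% of selected tokens: 90% are first replaced by `[MASK]`, then 1/9 of those are overwritten with a random id, leaving 80% `[MASK]`, 10% random, and 10% unchanged overall. A quick arithmetic sketch (illustrative addition, not from the original notebook):

```
# Probabilities applied to the 15% of tokens selected for prediction:
p_to_mask = 0.90          # inp_mask_2mask above
p_mask_to_random = 1 / 9  # inp_mask_2random above, applied on top of the 90%

p_random    = p_to_mask * p_mask_to_random        # -> 0.10  (random token)
p_mask      = p_to_mask * (1 - p_mask_to_random)  # -> 0.80  ([MASK])
p_unchanged = 1 - p_to_mask                       # -> 0.10  (left as-is)
print(p_mask, p_random, p_unchanged)              # approximately 0.8 0.1 0.1
```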
398 | 
399 | 
400 | ```
401 | def create_dist_dataset(X, y=None, training=False):
402 |     dataset = tf.data.Dataset.from_tensor_slices(X)
403 | 
404 |     ### Add y if present ###
405 |     if y is not None:
406 |         dataset_y = tf.data.Dataset.from_tensor_slices(y)
407 |         dataset = tf.data.Dataset.zip((dataset, dataset_y))
408 | 
409 |     ### Repeat if training ###
410 |     if training:
411 |         dataset = dataset.shuffle(len(X)).repeat()
412 | 
413 |     dataset = dataset.batch(global_batch_size).prefetch(AUTO)
414 | 
415 |     ### make it distributed ###
416 |     dist_dataset = strategy.experimental_distribute_dataset(dataset)
417 | 
418 |     return dist_dataset
419 | 
420 | 
421 | train_dist_dataset = create_dist_dataset(X_train_mlm, y_train_mlm, True)
422 | 
423 | ```
424 | 
425 | 
426 | ```
427 | %%time
428 | 
429 | def create_mlm_model_and_optimizer():
430 |     with strategy.scope():
431 |         model = TFAutoModelWithLMHead.from_pretrained(PRETRAINED_MODEL)
432 |         optimizer = tf.keras.optimizers.Adam(learning_rate=LR)
433 |     return model, optimizer
434 | 
435 | 
436 | mlm_model, optimizer = create_mlm_model_and_optimizer()
437 | mlm_model.summary()
438 | ```
439 | 
440 | /usr/local/lib/python3.6/dist-packages/transformers/modeling_tf_auto.py:788: FutureWarning: The class `TFAutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `TFAutoModelForCausalLM` for causal language models, `TFAutoModelForMaskedLM` for masked language models and `TFAutoModelForSeq2SeqLM` for encoder-decoder models.
441 |   FutureWarning,
442 | 
443 | 
451 | Some weights of the model checkpoint at bert-base-uncased were not used when initializing TFBertForMaskedLM: ['nsp___cls']
452 | - This IS expected if you are initializing TFBertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
453 | - This IS NOT expected if you are initializing TFBertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
454 | All the weights of TFBertForMaskedLM were initialized from the model checkpoint at bert-base-uncased.
455 | If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForMaskedLM for predictions without further training.
456 | 
457 | 
458 | Model: "tf_bert_for_masked_lm"
459 | _________________________________________________________________
460 | Layer (type)                 Output Shape              Param #   
461 | =================================================================
462 | bert (TFBertMainLayer)       multiple                  109482240 
463 | _________________________________________________________________
464 | mlm___cls (TFBertMLMHead)    multiple                  24459834  
465 | =================================================================
466 | Total params: 110,104,890
467 | Trainable params: 110,104,890
468 | Non-trainable params: 0
469 | _________________________________________________________________
470 | CPU times: user 14.5 s, sys: 15 s, total: 29.5 s
471 | Wall time: 58.1 s
472 | 
473 | 
474 | 
475 | ```
476 | def define_mlm_loss_and_metrics():
477 |     with strategy.scope():
478 |         mlm_loss_object = masked_sparse_categorical_crossentropy
479 | 
480 |         def compute_mlm_loss(labels, predictions):
481 |             per_example_loss = mlm_loss_object(labels, predictions)
482 |             loss = tf.nn.compute_average_loss(
483 |                 per_example_loss, global_batch_size = global_batch_size)
484 |             return loss
485 | 
486 |         train_mlm_loss_metric = tf.keras.metrics.Mean()
487 | 
488 |     return compute_mlm_loss, train_mlm_loss_metric
489 | 
490 | 
491 | def masked_sparse_categorical_crossentropy(y_true, y_pred):
492 |     y_true_masked = tf.boolean_mask(y_true, tf.not_equal(y_true, -1))
493 |     y_pred_masked = tf.boolean_mask(y_pred, tf.not_equal(y_true, -1))
494 |     loss = tf.keras.losses.sparse_categorical_crossentropy(y_true_masked,
495 |                                                            y_pred_masked,
496 |                                                            from_logits=True)
497 |     return loss
498 | 
499 | 
500 | 
501 | def train_mlm(train_dist_dataset, total_steps=2000, evaluate_every=200):
502 |     step = 0
503 |     ### Training loop ###
504 |     for tensor in train_dist_dataset:
505 |         distributed_mlm_train_step(tensor)
506 |         step+=1
507 | 
508 |         if (step % evaluate_every == 0):
509 |             ### Print train metrics ###
510 |             train_metric = train_mlm_loss_metric.result().numpy()
511 |             print("Step %d, train loss: %.2f" % (step, train_metric))
512 | 
513 |             ### Reset metrics ###
514 |             train_mlm_loss_metric.reset_states()
515 | 
516 |         if step == total_steps:
517 |             break
518 | 
519 | 
520 | @tf.function
521 | def distributed_mlm_train_step(data):
522 |     strategy.experimental_run_v2(mlm_train_step, args=(data,))
523 | 
524 | 
525 | @tf.function
526 | def mlm_train_step(inputs):
527 |     features, labels = inputs
528 | 
529 |     with tf.GradientTape() as tape:
530 |         predictions = mlm_model(features, training=True)[0]
531 |         loss = compute_mlm_loss(labels, predictions)
532 | 
533 |     gradients = tape.gradient(loss, mlm_model.trainable_variables)
534 |     optimizer.apply_gradients(zip(gradients, mlm_model.trainable_variables))
535 | 
536 |     train_mlm_loss_metric.update_state(loss)
537 | 
538 | 
539 | compute_mlm_loss, train_mlm_loss_metric = define_mlm_loss_and_metrics()
540 | ```
541 | 
542 | 
543 | ```
544 | %%time
545 | train_mlm(train_dist_dataset, TOTAL_STEPS, EVALUATE_EVERY)
546 | ```
547 | 
548 | WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py:601: get_next_as_optional (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
549 | Instructions for updating:
550 | Use `tf.data.Iterator.get_next_as_optional()` instead.
551 | 
552 | 
558 | WARNING:tensorflow:From :47: StrategyBase.experimental_run_v2 (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
559 | Instructions for updating:
560 | renamed to `run`
561 | 
562 | 
568 | WARNING:tensorflow:Gradients do not exist for variables ['tf_bert_for_masked_lm/bert/pooler/dense/kernel:0', 'tf_bert_for_masked_lm/bert/pooler/dense/bias:0'] when minimizing the loss.
578 | 579 | 580 | Step 200, train loss: 8.89 581 | Step 400, train loss: 8.03 582 | Step 600, train loss: 7.68 583 | Step 800, train loss: 7.43 584 | Step 1000, train loss: 7.22 585 | Step 1200, train loss: 7.00 586 | Step 1400, train loss: 6.86 587 | Step 1600, train loss: 6.68 588 | Step 1800, train loss: 6.54 589 | Step 2000, train loss: 6.38 590 | CPU times: user 1min 23s, sys: 13.4 s, total: 1min 37s 591 | Wall time: 9min 3s 592 | 593 | 594 | 595 | ``` 596 | mlm_model.save_pretrained('imdb_bert_uncased') 597 | ``` 598 | 599 | # Load and Test 600 | 601 | 602 | ``` 603 | from transformers import * 604 | from pprint import pprint 605 | ``` 606 | 607 | 608 | ``` 609 | pretrained_model = TFAutoModelWithLMHead.from_pretrained(PRETRAINED_MODEL) 610 | nlp = pipeline("fill-mask",model=pretrained_model, tokenizer=tokenizer ,framework='tf') 611 | pprint(nlp(f"I watched {nlp.tokenizer.mask_token} and that was awesome")) 612 | ``` 613 | 614 | /usr/local/lib/python3.6/dist-packages/transformers/modeling_tf_auto.py:788: FutureWarning: The class `TFAutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `TFAutoModelForCausalLM` for causal language models, `TFAutoModelForMaskedLM` for masked language models and `TFAutoModelForSeq2SeqLM` for encoder-decoder models. 615 | FutureWarning, 616 | Some weights of the model checkpoint at bert-base-uncased were not used when initializing TFBertForMaskedLM: ['nsp___cls'] 617 | - This IS expected if you are initializing TFBertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model). 618 | - This IS NOT expected if you are initializing TFBertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). 619 | All the weights of TFBertForMaskedLM were initialized from the model checkpoint at bert-base-uncased. 620 | If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForMaskedLM for predictions without further training. 621 | 622 | 623 | [{'score': 0.31239137053489685, 624 | 'sequence': '[CLS] i watched him and that was awesome [SEP]', 625 | 'token': 2032, 626 | 'token_str': 'him'}, 627 | {'score': 0.1729636937379837, 628 | 'sequence': '[CLS] i watched her and that was awesome [SEP]', 629 | 'token': 2014, 630 | 'token_str': 'her'}, 631 | {'score': 0.13816313445568085, 632 | 'sequence': '[CLS] i watched it and that was awesome [SEP]', 633 | 'token': 2009, 634 | 'token_str': 'it'}, 635 | {'score': 0.08374697715044022, 636 | 'sequence': '[CLS] i watched, and that was awesome [SEP]', 637 | 'token': 1010, 638 | 'token_str': ','}, 639 | {'score': 0.06438492983579636, 640 | 'sequence': '[CLS] i watched them and that was awesome [SEP]', 641 | 'token': 2068, 642 | 'token_str': 'them'}] 643 | 644 | 645 | 646 | ``` 647 | movie_mlm_model = TFAutoModelWithLMHead.from_pretrained('imdb_bert_uncased') 648 | nlp = pipeline("fill-mask",model=movie_mlm_model, tokenizer=tokenizer ,framework='tf') 649 | pprint(nlp(f"I watched {nlp.tokenizer.mask_token} and that was awesome")) 650 | ``` 651 | 652 | /usr/local/lib/python3.6/dist-packages/transformers/modeling_tf_auto.py:788: FutureWarning: The class `TFAutoModelWithLMHead` is deprecated and will be removed in a future version. 
Please use `TFAutoModelForCausalLM` for causal language models, `TFAutoModelForMaskedLM` for masked language models and `TFAutoModelForSeq2SeqLM` for encoder-decoder models. 653 | FutureWarning, 654 | All model checkpoint weights were used when initializing TFBertForMaskedLM. 655 | 656 | All the weights of TFBertForMaskedLM were initialized from the model checkpoint at imdb_bert_uncased. 657 | If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForMaskedLM for predictions without further training. 658 | 659 | 660 | [{'score': 0.4467789828777313, 661 | 'sequence': '[CLS] i watched it and that was awesome [SEP]', 662 | 'token': 2009, 663 | 'token_str': 'it'}, 664 | {'score': 0.06318594515323639, 665 | 'sequence': '[CLS] i watched movie and that was awesome [SEP]', 666 | 'token': 3185, 667 | 'token_str': 'movie'}, 668 | {'score': 0.056345004588365555, 669 | 'sequence': '[CLS] i watched, and that was awesome [SEP]', 670 | 'token': 1010, 671 | 'token_str': ','}, 672 | {'score': 0.013144557364284992, 673 | 'sequence': '[CLS] i watched this and that was awesome [SEP]', 674 | 'token': 2023, 675 | 'token_str': 'this'}, 676 | {'score': 0.012886741198599339, 677 | 'sequence': '[CLS] i watched one and that was awesome [SEP]', 678 | 'token': 2028, 679 | 'token_str': 'one'}] 680 | 681 | 682 | 683 | ``` 684 | 685 | ``` 686 | -------------------------------------------------------------------------------- /docs/Doc_Visual_QA_and_Bill_extraction_demo.md: -------------------------------------------------------------------------------- 1 | Open In Colab 2 | 3 | 4 | ```python 5 | !wget --no-check-certificate https://datasets.cvc.uab.es/rrc/DocVQA/train.tar.gz 6 | ``` 7 | 8 | --2022-05-01 13:20:26-- https://datasets.cvc.uab.es/rrc/DocVQA/train.tar.gz 9 | Resolving datasets.cvc.uab.es (datasets.cvc.uab.es)... 158.109.8.18 10 | Connecting to datasets.cvc.uab.es (datasets.cvc.uab.es)|158.109.8.18|:443... connected. 11 | WARNING: cannot verify datasets.cvc.uab.es's certificate, issued by ‘CN=GEANT OV RSA CA 4,O=GEANT Vereniging,C=NL’: 12 | Unable to locally verify the issuer's authority. 13 | HTTP request sent, awaiting response... 
200 OK
14 | Length: 7122739200 (6.6G) [application/x-gzip]
15 | Saving to: ‘train.tar.gz.1’
16 | 
17 | train.tar.gz.1        0%[                    ]  39.45M   638KB/s    eta 2h 26m
18 | 
19 | 
20 | ```python
21 | !wget --no-check-certificate https://datasets.cvc.uab.es/rrc/DocVQA/val.tar.gz
22 | ```
23 | 
24 | 
25 | ```python
26 | !wget --no-check-certificate https://datasets.cvc.uab.es/rrc/DocVQA/test.tar.gz
27 | ```
28 | 
29 | # Install Packages
30 | 
31 | 
32 | ```python
33 | !pip install -q transformers
34 | ```
35 | 
36 | |████████████████████████████████| 4.0 MB 5.3 MB/s 
37 | |████████████████████████████████| 895 kB 45.4 MB/s 
38 | |████████████████████████████████| 6.6 MB 31.6 MB/s 
39 | |████████████████████████████████| 77 kB 4.5 MB/s 
40 | |████████████████████████████████| 596 kB 43.9 MB/s 
41 | 
42 | 
43 | 
44 | ```python
45 | !pip install pyyaml==5.1
46 | # workaround: install old version of pytorch since detectron2 hasn't released packages for pytorch 1.9 (issue: https://github.com/facebookresearch/detectron2/issues/3158)
47 | !pip install torch==1.8.0+cu101 torchvision==0.9.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
48 | 
49 | # install detectron2 that matches pytorch 1.8
50 | # See https://detectron2.readthedocs.io/tutorials/install.html for instructions
51 | !pip install -q detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.8/index.html
52 | # exit(0) # After installation, you need to "restart runtime" in Colab. This line can also restart runtime
53 | ```
54 | 
55 | Collecting pyyaml==5.1
56 | Downloading PyYAML-5.1.tar.gz (274 kB)
57 | |████████████████████████████████| 274 kB 5.3 MB/s 
58 | Building wheels for collected packages: pyyaml
59 | Building wheel for pyyaml (setup.py) ... done
60 | Created wheel for pyyaml: filename=PyYAML-5.1-cp37-cp37m-linux_x86_64.whl size=44092 sha256=923c6817c78b049bf1912c6f36e26c890af93770cb6627b50efc1b77ac4eeeae
61 | Stored in directory: /root/.cache/pip/wheels/77/f5/10/d00a2bd30928b972790053b5de0c703ca87324f3fead0f2fd9
62 | Successfully built pyyaml
63 | Installing collected packages: pyyaml
64 | Attempting uninstall: pyyaml
65 | Found existing installation: PyYAML 6.0
66 | Uninstalling PyYAML-6.0:
67 | Successfully uninstalled PyYAML-6.0
68 | Successfully installed pyyaml-5.1
69 | Looking in links: https://download.pytorch.org/whl/torch_stable.html
70 | Collecting torch==1.8.0+cu101
71 | Downloading https://download.pytorch.org/whl/cu101/torch-1.8.0%2Bcu101-cp37-cp37m-linux_x86_64.whl (763.5 MB)
72 | |████████████████████████████████| 763.5 MB 16 kB/s 
73 | Collecting torchvision==0.9.0+cu101
74 | Downloading https://download.pytorch.org/whl/cu101/torchvision-0.9.0%2Bcu101-cp37-cp37m-linux_x86_64.whl (17.3 MB)
75 | |████████████████████████████████| 17.3 MB 795 kB/s 
76 | Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from torch==1.8.0+cu101) (1.21.6)
77 | Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch==1.8.0+cu101) (4.2.0)
78 | Requirement already satisfied: pillow>=4.1.1 in /usr/local/lib/python3.7/dist-packages (from torchvision==0.9.0+cu101) (7.1.2)
79 | Installing collected packages: torch, torchvision
80 | Attempting uninstall: torch
81 | Found existing installation: torch 1.11.0+cu113
82 | Uninstalling torch-1.11.0+cu113:
83 | Successfully uninstalled torch-1.11.0+cu113
84 | Attempting uninstall: torchvision
85 | Found existing installation: torchvision 0.12.0+cu113
86 | Uninstalling torchvision-0.12.0+cu113:
87 | Successfully uninstalled torchvision-0.12.0+cu113
88 | ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
89 | torchtext 0.12.0 requires torch==1.11.0, but you have torch 1.8.0+cu101 which is incompatible.
90 | torchaudio 0.11.0+cu113 requires torch==1.11.0, but you have torch 1.8.0+cu101 which is incompatible.
91 | Successfully installed torch-1.8.0+cu101 torchvision-0.9.0+cu101
92 | |████████████████████████████████| 6.3 MB 897 kB/s 
93 | |████████████████████████████████| 74 kB 2.1 MB/s 
94 | |████████████████████████████████| 50 kB 5.2 MB/s 
95 | |████████████████████████████████| 147 kB 10.7 MB/s 
96 | |████████████████████████████████| 130 kB 33.3 MB/s 
97 | |████████████████████████████████| 749 kB 42.3 MB/s 
98 | |████████████████████████████████| 843 kB 34.7 MB/s 
99 | |████████████████████████████████| 112 kB 46.3 MB/s 
100 | Building wheel for fvcore (setup.py) ... done
101 | Building wheel for antlr4-python3-runtime (setup.py) ... done
102 | 
103 | 
104 | 
105 | ```python
106 | !pip install -q datasets
107 | ```
108 | 
109 | |████████████████████████████████| 325 kB 5.4 MB/s 
110 | |████████████████████████████████| 212 kB 43.4 MB/s 
111 | |████████████████████████████████| 136 kB 43.2 MB/s 
112 | |████████████████████████████████| 1.1 MB 35.9 MB/s 
113 | |████████████████████████████████| 127 kB 45.1 MB/s 
114 | |████████████████████████████████| 144 kB 46.5 MB/s 
115 | |████████████████████████████████| 94 kB 2.6 MB/s 
116 | |████████████████████████████████| 271 kB 47.1 MB/s 
117 | ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
118 | datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
119 | 
120 | 
121 | 
122 | ```python
123 | !sudo apt install tesseract-ocr
124 | !pip install -q pytesseract
125 | ```
126 | 
127 | Reading package lists... Done
128 | Building dependency tree
129 | Reading state information... Done
130 | The following packages were automatically installed and are no longer required:
131 | libnvidia-common-460 nsight-compute-2020.2.0
132 | Use 'sudo apt autoremove' to remove them.
133 | The following additional packages will be installed:
134 | tesseract-ocr-eng tesseract-ocr-osd
135 | The following NEW packages will be installed:
136 | tesseract-ocr tesseract-ocr-eng tesseract-ocr-osd
137 | 0 upgraded, 3 newly installed, 0 to remove and 42 not upgraded.
138 | Need to get 4,795 kB of archives.
139 | After this operation, 15.8 MB of additional disk space will be used.
140 | Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tesseract-ocr-eng all 4.00~git24-0e00fe6-1.2 [1,588 kB]
141 | Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tesseract-ocr-osd all 4.00~git24-0e00fe6-1.2 [2,989 kB]
142 | Get:3 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tesseract-ocr amd64 4.00~git2288-10f4998a-2 [218 kB]
143 | Fetched 4,795 kB in 1s (3,752 kB/s)
144 | debconf: unable to initialize frontend: Dialog
145 | debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 3.)
146 | debconf: falling back to frontend: Readline
147 | debconf: unable to initialize frontend: Readline
148 | debconf: (This frontend requires a controlling tty.)
149 | debconf: falling back to frontend: Teletype
150 | dpkg-preconfigure: unable to re-open stdin:
151 | Selecting previously unselected package tesseract-ocr-eng.
152 | (Reading database ... 155202 files and directories currently installed.)
153 | Preparing to unpack .../tesseract-ocr-eng_4.00~git24-0e00fe6-1.2_all.deb ...
154 | Unpacking tesseract-ocr-eng (4.00~git24-0e00fe6-1.2) ...
155 | Selecting previously unselected package tesseract-ocr-osd.
156 | Preparing to unpack .../tesseract-ocr-osd_4.00~git24-0e00fe6-1.2_all.deb ...
157 | Unpacking tesseract-ocr-osd (4.00~git24-0e00fe6-1.2) ...
158 | Selecting previously unselected package tesseract-ocr.
159 | Preparing to unpack .../tesseract-ocr_4.00~git2288-10f4998a-2_amd64.deb ...
160 | Unpacking tesseract-ocr (4.00~git2288-10f4998a-2) ...
161 | Setting up tesseract-ocr-osd (4.00~git24-0e00fe6-1.2) ...
162 | Setting up tesseract-ocr-eng (4.00~git24-0e00fe6-1.2) ...
163 | Setting up tesseract-ocr (4.00~git2288-10f4998a-2) ...
164 | Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
165 | |████████████████████████████████| 4.3 MB 5.4 MB/s 
166 | ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
167 | albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.
168 | 
169 | 
170 | 
171 | ```python
172 | !pip install Pillow==9.0.0
173 | ```
174 | 
175 | Collecting Pillow==9.0.0
176 | Downloading Pillow-9.0.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
177 | |████████████████████████████████| 4.3 MB 5.3 MB/s 
178 | Installing collected packages: Pillow
179 | Attempting uninstall: Pillow
180 | Found existing installation: Pillow 9.1.0
181 | Uninstalling Pillow-9.1.0:
182 | Successfully uninstalled Pillow-9.1.0
183 | ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
184 | albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.
185 | Successfully installed Pillow-9.0.0
186 | 
187 | 
188 | 
189 | 
190 | # DocVQA Demo
191 | 
192 | 
193 | ```python
194 | from transformers import LayoutLMv2Processor
195 | processor = LayoutLMv2Processor.from_pretrained("microsoft/layoutlmv2-base-uncased")
196 | ```
197 | 
198 | 
--------------------------------------------------------------------------------
/docs/GPT_2_on_Onnx_CPU.md:
--------------------------------------------------------------------------------
1 | Open In Colab
2 | 
3 | 
4 | ```python
5 | !pip install onnxruntime==1.8.1 onnx==1.9.0 onnxconverter_common==1.8.1 transformers==4.8.2 psutil pytz pandas py-cpuinfo py3nvml
6 | ```
7 | 
8 | Uninstalling transformers-4.14.1:
9 | Successfully uninstalled transformers-4.14.1
10 | Attempting uninstall: onnxruntime
11 | Found existing installation: onnxruntime 1.10.0
12 | Uninstalling onnxruntime-1.10.0:
13 | Successfully uninstalled onnxruntime-1.10.0
14 | Attempting uninstall: onnxconverter-common
15 | Found existing installation: onnxconverter-common 1.9.0
16 | Uninstalling onnxconverter-common-1.9.0:
17 | Successfully uninstalled onnxconverter-common-1.9.0
18 | Successfully installed huggingface-hub-0.0.12 onnx-1.9.0 onnxconverter-common-1.8.1 onnxruntime-1.8.1 transformers-4.8.2
19 | 
20 | 
21 | 
22 | 
23 | 
24 | ```python
25 | import os
26 | 
27 | # Create a cache directory to store pretrained model.
28 | cache_dir = os.path.join(".", "cache_models")
29 | if not os.path.exists(cache_dir):
30 |     os.makedirs(cache_dir)
31 | ```
32 | 
33 | 
34 | ```python
35 | !lscpu
36 | ```
37 | 
38 | Architecture:        x86_64
39 | CPU op-mode(s):      32-bit, 64-bit
40 | Byte Order:          Little Endian
41 | CPU(s):              2
42 | On-line CPU(s) list: 0,1
43 | Thread(s) per core:  2
44 | Core(s) per socket:  1
45 | Socket(s):           1
46 | NUMA node(s):        1
47 | Vendor ID:           GenuineIntel
48 | CPU family:          6
49 | Model:               79
50 | Model name:          Intel(R) Xeon(R) CPU @ 2.20GHz
51 | Stepping:            0
52 | CPU MHz:             2199.998
53 | BogoMIPS:            4399.99
54 | Hypervisor vendor:   KVM
55 | Virtualization type: full
56 | L1d cache:           32K
57 | L1i cache:           32K
58 | L2 cache:            256K
59 | L3 cache:            56320K
60 | NUMA node0 CPU(s):   0,1
61 | Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
62 | 
63 | 
64 | 
65 | ```python
66 | !pip install coloredlogs
67 | ```
68 | 
69 | Requirement already satisfied: coloredlogs in /usr/local/lib/python3.7/dist-packages (15.0.1)
70 | Requirement already satisfied: humanfriendly>=9.1 in /usr/local/lib/python3.7/dist-packages (from coloredlogs) (10.0)
71 | 
72 | 
73 | 
74 | ```python
75 | from onnxruntime.transformers.gpt2_beamsearch_helper import Gpt2BeamSearchHelper, GPT2LMHeadModel_BeamSearchStep
76 | from transformers import AutoConfig
77 | import torch
78 | ```
79 | 
80 | 
81 | ```python
82 | model_name_or_path = "gpt2"
83 | config = AutoConfig.from_pretrained(model_name_or_path, cache_dir=cache_dir)
84 | model = GPT2LMHeadModel_BeamSearchStep.from_pretrained(model_name_or_path, config=config, batch_size=1, beam_size=4, cache_dir=cache_dir)
85 | device = torch.device("cpu")
86 | model.eval().to(device)
87 | 
88 | print(model.config)
89 | 
90 | num_attention_heads = model.config.n_head
91 | hidden_size = model.config.n_embd
92 | num_layer = model.config.n_layer
93 | ```
94 | 
95 | GPT2Config {
96 | "_name_or_path": "gpt2",
97 | "activation_function": "gelu_new", 98 | "architectures": [ 99 | "GPT2LMHeadModel" 100 | ], 101 | "attn_pdrop": 0.1, 102 | "batch_size": 1, 103 | "beam_size": 4, 104 | "bos_token_id": 50256, 105 | "embd_pdrop": 0.1, 106 | "eos_token_id": 50256, 107 | "gradient_checkpointing": false, 108 | "initializer_range": 0.02, 109 | "layer_norm_epsilon": 1e-05, 110 | "model_type": "gpt2", 111 | "n_ctx": 1024, 112 | "n_embd": 768, 113 | "n_head": 12, 114 | "n_inner": null, 115 | "n_layer": 12, 116 | "n_positions": 1024, 117 | "resid_pdrop": 0.1, 118 | "scale_attn_weights": true, 119 | "summary_activation": null, 120 | "summary_first_dropout": 0.1, 121 | "summary_proj_to_labels": true, 122 | "summary_type": "cls_index", 123 | "summary_use_proj": true, 124 | "task_specific_params": { 125 | "text-generation": { 126 | "do_sample": true, 127 | "max_length": 50 128 | } 129 | }, 130 | "transformers_version": "4.8.2", 131 | "use_cache": true, 132 | "vocab_size": 50257 133 | } 134 | 135 | 136 | 137 | 138 | ```python 139 | onnx_model_path = "gpt2_one_step_search.onnx" 140 | Gpt2BeamSearchHelper.export_onnx(model, device, onnx_model_path) # add parameter use_external_data_format=True when model size > 2 GB 141 | ``` 142 | 143 | /usr/local/lib/python3.7/dist-packages/onnxruntime/transformers/gpt2_beamsearch_helper.py:91: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). 144 | selected_input_seq = selected_index_flat // self.config.beam_size 145 | /usr/local/lib/python3.7/dist-packages/torch/onnx/utils.py:100: UserWarning: `example_outputs' is deprecated and ignored. Will be removed in next PyTorch release. 146 | warnings.warn("`example_outputs' is deprecated and ignored. Will be removed in " 147 | /usr/local/lib/python3.7/dist-packages/torch/onnx/utils.py:103: UserWarning: `use_external_data_format' is deprecated and ignored. Will be removed in next PyTorch release. The code will work as it is False if models are not larger than 2GB, Otherwise set to False because of size limits imposed by Protocol Buffers. 148 | warnings.warn("`use_external_data_format' is deprecated and ignored. Will be removed in next " 149 | /usr/local/lib/python3.7/dist-packages/transformers/models/gpt2/modeling_gpt2.py:698: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs! 150 | assert batch_size > 0, "batch_size has to be defined and > 0" 151 | /usr/local/lib/python3.7/dist-packages/transformers/models/gpt2/modeling_gpt2.py:249: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results). 152 | past_key, past_value = layer_past 153 | /usr/local/lib/python3.7/dist-packages/transformers/models/gpt2/modeling_gpt2.py:181: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. 
This means that the trace might not generalize to other inputs!
154 |   attn_weights = attn_weights / (float(value.size(-1)) ** 0.5)
155 | 
156 | 
157 | 
158 | ```python
159 | import onnxruntime
160 | import numpy
161 | from transformers import AutoTokenizer
162 | 
163 | EXAMPLE_Text = ['best hotel in bay area.']
164 | 
165 | def get_tokenizer(model_name_or_path, cache_dir):
166 |     tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, cache_dir=cache_dir)
167 |     tokenizer.padding_side = "left"
168 |     tokenizer.pad_token = tokenizer.eos_token
169 |     # tokenizer.add_special_tokens({'pad_token': '[PAD]'})
170 |     return tokenizer
171 | 
172 | def get_example_inputs(prompt_text=EXAMPLE_Text):
173 |     tokenizer = get_tokenizer(model_name_or_path, cache_dir)
174 |     encodings_dict = tokenizer.batch_encode_plus(prompt_text, padding=True)
175 | 
176 |     input_ids = torch.tensor(encodings_dict['input_ids'], dtype=torch.int64)
177 |     attention_mask = torch.tensor(encodings_dict['attention_mask'], dtype=torch.float32)
178 |     position_ids = (attention_mask.long().cumsum(-1) - 1)
179 |     position_ids.masked_fill_(position_ids < 0, 0)
180 | 
181 |     # Empty past state for generating the first word
182 |     empty_past = []
183 |     batch_size = input_ids.size(0)
184 |     sequence_length = input_ids.size(1)
185 |     past_shape = [2, batch_size, num_attention_heads, 0, hidden_size // num_attention_heads]
186 |     for i in range(num_layer):
187 |         empty_past.append(torch.empty(past_shape).type(torch.float32).to(device))
188 | 
189 |     return input_ids, attention_mask, position_ids, empty_past
190 | 
191 | input_ids, attention_mask, position_ids, empty_past = get_example_inputs()
192 | beam_select_idx = torch.zeros([1, input_ids.shape[0]]).long()
193 | input_log_probs = torch.zeros([input_ids.shape[0], 1])
194 | input_unfinished_sents = torch.ones([input_ids.shape[0], 1], dtype=torch.bool)
195 | prev_step_scores = torch.zeros([input_ids.shape[0], 1])
196 | 
197 | onnx_model_path = "gpt2_one_step_search.onnx"
198 | session = onnxruntime.InferenceSession(onnx_model_path)
199 | ort_inputs = {
200 |     'input_ids': numpy.ascontiguousarray(input_ids.cpu().numpy()),
201 |     'attention_mask' : numpy.ascontiguousarray(attention_mask.cpu().numpy()),
202 |     'position_ids': numpy.ascontiguousarray(position_ids.cpu().numpy()),
203 |     'beam_select_idx': numpy.ascontiguousarray(beam_select_idx.cpu().numpy()),
204 |     'input_log_probs': numpy.ascontiguousarray(input_log_probs.cpu().numpy()),
205 |     'input_unfinished_sents': numpy.ascontiguousarray(input_unfinished_sents.cpu().numpy()),
206 |     'prev_step_results': numpy.ascontiguousarray(input_ids.cpu().numpy()),
207 |     'prev_step_scores': numpy.ascontiguousarray(prev_step_scores.cpu().numpy()),
208 | }
209 | for i, past_i in enumerate(empty_past):
210 |     ort_inputs[f'past_{i}'] = numpy.ascontiguousarray(past_i.cpu().numpy())
211 | ort_outputs = session.run(None, ort_inputs)
212 | ```
213 | 
214 | 
215 | ```python
216 | def inference_with_io_binding(session, config, input_ids, position_ids, attention_mask, past, beam_select_idx, input_log_probs, input_unfinished_sents, prev_step_results, prev_step_scores, step, context_len):
217 |     output_shapes = Gpt2BeamSearchHelper.get_output_shapes(batch_size=1,
218 |                                                            context_len=context_len,
219 |                                                            past_sequence_length=past[0].size(3),
220 |                                                            sequence_length=input_ids.size(1),
221 |                                                            beam_size=4,
222 |                                                            step=step,
223 |                                                            config=config,
224 |                                                            model_class="GPT2LMHeadModel_BeamSearchStep")
225 |     output_buffers = Gpt2BeamSearchHelper.get_output_buffers(output_shapes, device)
226 | 
227 |     io_binding = 
Gpt2BeamSearchHelper.prepare_io_binding(session, input_ids, position_ids, attention_mask, past, output_buffers, output_shapes, beam_select_idx, input_log_probs, input_unfinished_sents, prev_step_results, prev_step_scores) 228 | session.run_with_iobinding(io_binding) 229 | 230 | outputs = Gpt2BeamSearchHelper.get_outputs_from_io_binding_buffer(session, output_buffers, output_shapes, return_numpy=False) 231 | return outputs 232 | ``` 233 | 234 | 235 | ```python 236 | input_ids, attention_mask, position_ids, empty_past = get_example_inputs() 237 | beam_select_idx = torch.zeros([1, input_ids.shape[0]]).long() 238 | input_log_probs = torch.zeros([input_ids.shape[0], 1]) 239 | input_unfinished_sents = torch.ones([input_ids.shape[0], 1], dtype=torch.bool) 240 | prev_step_scores = torch.zeros([input_ids.shape[0], 1]) 241 | outputs = inference_with_io_binding(session, config, input_ids, position_ids, attention_mask, empty_past, beam_select_idx, input_log_probs, input_unfinished_sents, input_ids, prev_step_scores, 0, input_ids.shape[-1]) 242 | assert torch.eq(outputs[-2], torch.from_numpy(ort_outputs[-2])).all() 243 | print("IO Binding result is good") 244 | ``` 245 | 246 | IO Binding result is good 247 | 248 | 249 | 250 | ```python 251 | def update(output, step, batch_size, beam_size, context_length, prev_attention_mask, device): 252 | """ 253 | Update the inputs for next inference. 254 | """ 255 | last_state = (torch.from_numpy(output[0]).to(device) 256 | if isinstance(output[0], numpy.ndarray) else output[0].clone().detach().cpu()) 257 | 258 | input_ids = last_state.view(batch_size * beam_size, -1).to(device) 259 | 260 | input_unfinished_sents_id = -3 261 | prev_step_results = (torch.from_numpy(output[-2]).to(device) if isinstance(output[-2], numpy.ndarray) 262 | else output[-2].clone().detach().to(device)) 263 | position_ids = (torch.tensor([context_length + step - 1 264 | ]).unsqueeze(0).repeat(batch_size * beam_size, 1).to(device)) 265 | 266 | if prev_attention_mask.shape[0] != (batch_size * beam_size): 267 | prev_attention_mask = prev_attention_mask.repeat(batch_size * beam_size, 1) 268 | attention_mask = torch.cat( 269 | [ 270 | prev_attention_mask, 271 | torch.ones([batch_size * beam_size, 1]).type_as(prev_attention_mask), 272 | ], 273 | 1, 274 | ).to(device) 275 | 276 | beam_select_idx = (torch.from_numpy(output[input_unfinished_sents_id - 2]).to(device) if isinstance( 277 | output[input_unfinished_sents_id - 2], numpy.ndarray) else output[input_unfinished_sents_id - 2].clone().detach().to(device)) 278 | input_log_probs = (torch.from_numpy(output[input_unfinished_sents_id - 1]).to(device) if isinstance( 279 | output[input_unfinished_sents_id - 1], numpy.ndarray) else output[input_unfinished_sents_id - 1].clone().detach().to(device)) 280 | input_unfinished_sents = (torch.from_numpy(output[input_unfinished_sents_id]).to(device) if isinstance( 281 | output[input_unfinished_sents_id], numpy.ndarray) else 282 | output[input_unfinished_sents_id].clone().detach().to(device)) 283 | prev_step_scores = (torch.from_numpy(output[-1]).to(device) 284 | if isinstance(output[-1], numpy.ndarray) else output[-1].clone().detach().to(device)) 285 | 286 | past = [] 287 | if isinstance(output[1], tuple): # past in torch output is tuple 288 | past = list(output[1]) 289 | else: 290 | for i in range(model.config.n_layer): 291 | past_i = (torch.from_numpy(output[i + 1]) 292 | if isinstance(output[i + 1], numpy.ndarray) else output[i + 1].clone().detach()) 293 | past.append(past_i.to(device)) 294 | 295 | inputs = { 
296 | 'input_ids': input_ids, 297 | 'attention_mask' : attention_mask, 298 | 'position_ids': position_ids, 299 | 'beam_select_idx': beam_select_idx, 300 | 'input_log_probs': input_log_probs, 301 | 'input_unfinished_sents': input_unfinished_sents, 302 | 'prev_step_results': prev_step_results, 303 | 'prev_step_scores': prev_step_scores, 304 | } 305 | ort_inputs = { 306 | 'input_ids': numpy.ascontiguousarray(input_ids.cpu().numpy()), 307 | 'attention_mask' : numpy.ascontiguousarray(attention_mask.cpu().numpy()), 308 | 'position_ids': numpy.ascontiguousarray(position_ids.cpu().numpy()), 309 | 'beam_select_idx': numpy.ascontiguousarray(beam_select_idx.cpu().numpy()), 310 | 'input_log_probs': numpy.ascontiguousarray(input_log_probs.cpu().numpy()), 311 | 'input_unfinished_sents': numpy.ascontiguousarray(input_unfinished_sents.cpu().numpy()), 312 | 'prev_step_results': numpy.ascontiguousarray(prev_step_results.cpu().numpy()), 313 | 'prev_step_scores': numpy.ascontiguousarray(prev_step_scores.cpu().numpy()), 314 | } 315 | for i, past_i in enumerate(past): 316 | ort_inputs[f'past_{i}'] = numpy.ascontiguousarray(past_i.cpu().numpy()) 317 | 318 | return inputs, ort_inputs, past 319 | 320 | def test_generation(tokenizer, input_text, use_onnxruntime_io, ort_session = None, num_tokens_to_produce = 30): 321 | print("Text generation using", "OnnxRuntime with IO binding" if use_onnxruntime_io else "OnnxRuntime", "...") 322 | input_ids, attention_mask, position_ids, past = get_example_inputs(input_text) 323 | beam_select_idx = torch.zeros([1, input_ids.shape[0]]).long() 324 | input_log_probs = torch.zeros([input_ids.shape[0], 1]) 325 | input_unfinished_sents = torch.ones([input_ids.shape[0], 1], dtype=torch.bool) 326 | prev_step_scores = torch.zeros([input_ids.shape[0], 1]) 327 | inputs = { 328 | 'input_ids': input_ids, 329 | 'attention_mask' : attention_mask, 330 | 'position_ids': position_ids, 331 | 'beam_select_idx': beam_select_idx, 332 | 'input_log_probs': input_log_probs, 333 | 'input_unfinished_sents': input_unfinished_sents, 334 | 'prev_step_results': input_ids, 335 | 'prev_step_scores': prev_step_scores, 336 | } 337 | ort_inputs = { 338 | 'input_ids': numpy.ascontiguousarray(input_ids.cpu().numpy()), 339 | 'attention_mask' : numpy.ascontiguousarray(attention_mask.cpu().numpy()), 340 | 'position_ids': numpy.ascontiguousarray(position_ids.cpu().numpy()), 341 | 'beam_select_idx': numpy.ascontiguousarray(beam_select_idx.cpu().numpy()), 342 | 'input_log_probs': numpy.ascontiguousarray(input_log_probs.cpu().numpy()), 343 | 'input_unfinished_sents': numpy.ascontiguousarray(input_unfinished_sents.cpu().numpy()), 344 | 'prev_step_results': numpy.ascontiguousarray(input_ids.cpu().numpy()), 345 | 'prev_step_scores': numpy.ascontiguousarray(prev_step_scores.cpu().numpy()), 346 | } 347 | for i, past_i in enumerate(past): 348 | ort_inputs[f'past_{i}'] = numpy.ascontiguousarray(past_i.cpu().numpy()) 349 | batch_size = input_ids.size(0) 350 | beam_size = 4 351 | context_length = input_ids.size(-1) 352 | 353 | for step in range(num_tokens_to_produce): 354 | if use_onnxruntime_io: 355 | outputs = inference_with_io_binding(ort_session, config, inputs['input_ids'], inputs['position_ids'], inputs['attention_mask'], past, inputs['beam_select_idx'], inputs['input_log_probs'], inputs['input_unfinished_sents'], inputs['prev_step_results'], inputs['prev_step_scores'], step, context_length) 356 | else: 357 | outputs = ort_session.run(None, ort_inputs) 358 | inputs, ort_inputs, past = update(outputs, step, batch_size, 
beam_size, context_length, inputs['attention_mask'], device) 359 | 360 | if not inputs['input_unfinished_sents'].any(): 361 | break 362 | 363 | print("------------") 364 | print(tokenizer.decode(inputs['prev_step_results'][0], skip_special_tokens=True)) 365 | ``` 366 | 367 | 368 | ```python 369 | tokenizer = get_tokenizer(model_name_or_path, cache_dir) 370 | input_text = EXAMPLE_Text 371 | test_generation(tokenizer, input_text, use_onnxruntime_io=False, ort_session=session) 372 | ``` 373 | 374 | Text generation using OnnxRuntime ... 375 | ------------ 376 | best hotel in bay area. 377 | 378 | "It's a great place to stay," he said. 379 | 380 | 381 | 382 | ```python 383 | test_generation(tokenizer, input_text, use_onnxruntime_io=True, ort_session=session) 384 | ``` 385 | 386 | Text generation using OnnxRuntime with IO binding ... 387 | ------------ 388 | best hotel in bay area. 389 | 390 | "It's a great place to stay," he said. 391 | 392 | 393 | 394 | ```python 395 | 396 | ``` 397 | -------------------------------------------------------------------------------- /docs/Generic_Transformer_Classification.md: -------------------------------------------------------------------------------- 1 | Open In Colab 2 | 3 | 4 | ``` 5 | !pip install transformers 6 | ``` 7 | 8 | Collecting transformers 9 | [?25l Downloading https://files.pythonhosted.org/packages/27/3c/91ed8f5c4e7ef3227b4119200fc0ed4b4fd965b1f0172021c25701087825/transformers-3.0.2-py3-none-any.whl (769kB) 10 |  |████████████████████████████████| 778kB 2.8MB/s 11 | [?25hRequirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from transformers) (1.18.5) 12 | Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.6/dist-packages (from transformers) (4.41.1) 13 | Collecting sentencepiece!=0.1.92 14 | [?25l Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB) 15 |  |████████████████████████████████| 1.1MB 13.3MB/s 16 | [?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.6/dist-packages (from transformers) (2019.12.20) 17 | Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from transformers) (2.23.0) 18 | Requirement already satisfied: dataclasses; python_version < "3.7" in /usr/local/lib/python3.6/dist-packages (from transformers) (0.7) 19 | Collecting sacremoses 20 | [?25l Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB) 21 |  |████████████████████████████████| 890kB 19.7MB/s 22 | [?25hRequirement already satisfied: filelock in /usr/local/lib/python3.6/dist-packages (from transformers) (3.0.12) 23 | Requirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from transformers) (20.4) 24 | Collecting tokenizers==0.8.1.rc1 25 | [?25l Downloading https://files.pythonhosted.org/packages/40/d0/30d5f8d221a0ed981a186c8eb986ce1c94e3a6e87f994eae9f4aa5250217/tokenizers-0.8.1rc1-cp36-cp36m-manylinux1_x86_64.whl (3.0MB) 26 |  |████████████████████████████████| 3.0MB 20.6MB/s 27 | [?25hRequirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2.10) 28 | Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (3.0.4) 29 | Requirement already satisfied: 
urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (1.24.3) 30 | Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2020.6.20) 31 | Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (1.15.0) 32 | Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (7.1.2) 33 | Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (0.16.0) 34 | Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from packaging->transformers) (2.4.7) 35 | Building wheels for collected packages: sacremoses 36 | Building wheel for sacremoses (setup.py) ... [?25l[?25hdone 37 | Created wheel for sacremoses: filename=sacremoses-0.0.43-cp36-none-any.whl size=893257 sha256=6bda503b3bbdbf7626ff38a3e3235c3a20e948d97c40c78a681f5c87b90ed237 38 | Stored in directory: /root/.cache/pip/wheels/29/3c/fd/7ce5c3f0666dab31a50123635e6fb5e19ceb42ce38d4e58f45 39 | Successfully built sacremoses 40 | Installing collected packages: sentencepiece, sacremoses, tokenizers, transformers 41 | Successfully installed sacremoses-0.0.43 sentencepiece-0.1.91 tokenizers-0.8.1rc1 transformers-3.0.2 42 | 43 | 44 | 45 | ``` 46 | import os, pandas as pd 47 | from sklearn.model_selection import train_test_split 48 | import logging 49 | from transformers import * 50 | import torch 51 | from torch.utils.data import Dataset, DataLoader 52 | import torch.nn as nn 53 | from tqdm.autonotebook import tqdm 54 | ``` 55 | 56 | # 1. Set Configuration 57 | 58 | 59 | ``` 60 | class Config: 61 | train_file = './data.csv' 62 | eval_file = './eval.csv' 63 | max_seq_len = 128 64 | batch_size = 32 65 | epochs = 5 66 | model_name = 'bert-base-uncased' 67 | learning_rate = 2e-5 68 | n_classes = 3 69 | device = 'cpu' 70 | 71 | 72 | 73 | flags = Config 74 | ``` 75 | 76 | # 2. 
Build Dataset Pipeline 77 | 78 | 79 | ``` 80 | class TextLabelDataset(Dataset): 81 | 82 | def __init__(self, texts, labels, tokenizer, max_len): 83 | self.texts = texts 84 | self.labels = labels 85 | self.tokenizer = tokenizer 86 | self.max_len = max_len 87 | 88 | def __len__(self): 89 | return len(self.texts) 90 | 91 | def __getitem__(self, item): 92 | text = str(self.texts[item]) 93 | label = self.labels[item] 94 | 95 | encoding = self.tokenizer.encode_plus( 96 | text, 97 | add_special_tokens=True, 98 | max_length=self.max_len, 99 | return_token_type_ids=False, 100 | pad_to_max_length=True, 101 | return_attention_mask=True, 102 | return_tensors='pt', 103 | truncation=True 104 | ) 105 | 106 | return { 107 | 'texts': text, 108 | 'input_ids': encoding['input_ids'].flatten(), 109 | 'attention_mask': encoding['attention_mask'].flatten(), 110 | 'targets': torch.tensor(label, dtype=torch.long) 111 | } 112 | 113 | def create_data_loader(df, tokenizer, max_len, batch_size, is_prediction=False): 114 | 115 | if isinstance(df, str): 116 | df = pd.read_csv(df) 117 | else: 118 | pass 119 | 120 | if is_prediction: 121 | ds = TextLabelDataset( 122 | texts=df.text.to_numpy(), 123 | labels=np.array([-1]*len(df.text.values)), 124 | tokenizer=tokenizer, 125 | max_len=max_len 126 | ) 127 | else: 128 | ds = TextLabelDataset( 129 | texts=df.text.to_numpy(), 130 | labels=df.labels.to_numpy(), 131 | tokenizer=tokenizer, 132 | max_len=max_len 133 | ) 134 | 135 | return DataLoader( 136 | ds, 137 | batch_size=batch_size, 138 | num_workers=4 139 | ) 140 | ``` 141 | 142 | # 3. Build Model 143 | 144 | 145 | ``` 146 | class Classifier(nn.Module): 147 | 148 | def __init__(self, model_name, n_classes): 149 | super(Classifier, self).__init__() 150 | self.bert = AutoModel.from_pretrained(model_name) 151 | self.drop = nn.Dropout(p=0.3) 152 | self.out = nn.Linear(self.bert.config.hidden_size, n_classes) 153 | 154 | def forward(self, input_ids, attention_mask): 155 | _, pooled_output = self.bert( 156 | input_ids=input_ids, 157 | attention_mask=attention_mask 158 | ) 159 | output = self.drop(pooled_output) 160 | return self.out(output) 161 | ``` 162 | 163 | 164 | ``` 165 | class ClassificationModel: 166 | 167 | def __init__(self, flags): 168 | self.flags = flags 169 | self.tokenizer = BertTokenizer.from_pretrained(self.flags.model_name) 170 | self.model = Classifier(self.flags.model_name, self.flags.n_classes) 171 | self.model = self.model.to(self.flags.device) 172 | 173 | def train(self): 174 | 175 | train_data_loader = create_data_loader(self.flags.train_file, self.tokenizer, self.flags.max_seq_len, self.flags.batch_size) 176 | val_data_loader = create_data_loader(self.flags.eval_file, self.tokenizer, self.flags.max_seq_len, self.flags.batch_size) 177 | 178 | optimizer = AdamW(self.model.parameters(), lr=self.flags.learning_rate, correct_bias=False) 179 | total_steps = len(train_data_loader) * self.flags.epochs 180 | 181 | scheduler = get_linear_schedule_with_warmup( 182 | optimizer, 183 | num_warmup_steps=0, 184 | num_training_steps=total_steps 185 | ) 186 | 187 | loss_fn = nn.CrossEntropyLoss().to(self.flags.device) 188 | 189 | history = defaultdict(list) 190 | best_accuracy = 0 191 | 192 | if isinstance(self.flags.train_file, str): 193 | train_df = pd.read_csv(self.flags.train_file) 194 | 195 | if isinstance(self.flags.eval_file, str): 196 | eval_df = pd.read_csv(self.flags.eval_file) 197 | 198 | for epoch in range(self.flags.epochs): 199 | 200 | print(f'Epoch {epoch + 1}/{self.flags.epochs}') 201 | print('-' * 10) 202 | 
203 | train_acc, train_loss = self.train_epoch( 204 | self.model, 205 | train_data_loader, 206 | loss_fn, 207 | optimizer, 208 | self.flags.device, 209 | scheduler, 210 | len(train_df) 211 | ) 212 | 213 | print(f'Train loss {train_loss} accuracy {train_acc}') 214 | 215 | val_acc, val_loss = self.eval_model( 216 | self.model, 217 | val_data_loader, 218 | loss_fn, 219 | self.flags.device, 220 | len(eval_df) 221 | ) 222 | 223 | print(f'Val loss {val_loss} accuracy {val_acc}') 224 | print() 225 | 226 | history['train_acc'].append(train_acc) 227 | history['train_loss'].append(train_loss) 228 | history['val_acc'].append(val_acc) 229 | history['val_loss'].append(val_loss) 230 | 231 | if val_acc > best_accuracy: 232 | torch.save(self.model.state_dict(), 'best_model_state.bin') 233 | best_accuracy = val_acc 234 | 235 | 236 | def train_epoch(self, model, data_loader, loss_fn, optimizer, device, scheduler, n_examples): 237 | model = model.train() 238 | 239 | losses = [] 240 | correct_predictions = 0 241 | tk0 = tqdm(data_loader, total=len(data_loader), desc="Training") 242 | for bi, d in enumerate(tk0): 243 | input_ids = d["input_ids"].to(device) 244 | attention_mask = d["attention_mask"].to(device) 245 | targets = d["targets"].to(device) 246 | 247 | outputs = model( 248 | input_ids=input_ids, 249 | attention_mask=attention_mask 250 | ) 251 | 252 | _, preds = torch.max(outputs, dim=1) 253 | loss = loss_fn(outputs, targets) 254 | 255 | correct_predictions += torch.sum(preds == targets) 256 | losses.append(loss.item()) 257 | 258 | loss.backward() 259 | nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0) 260 | optimizer.step() 261 | scheduler.step() 262 | optimizer.zero_grad() 263 | 264 | return correct_predictions.double() / n_examples, np.mean(losses) 265 | 266 | def eval_model(self, model, data_loader, loss_fn, device, n_examples): 267 | model = model.eval() 268 | 269 | losses = [] 270 | correct_predictions = 0 271 | 272 | with torch.no_grad(): 273 | tk0 = tqdm(data_loader, total=len(data_loader), desc="Evaluating") 274 | for bi, d in enumerate(tk0): 275 | input_ids = d["input_ids"].to(device) 276 | attention_mask = d["attention_mask"].to(device) 277 | targets = d["targets"].to(device) 278 | 279 | outputs = model( 280 | input_ids=input_ids, 281 | attention_mask=attention_mask 282 | ) 283 | _, preds = torch.max(outputs, dim=1) 284 | 285 | loss = loss_fn(outputs, targets) 286 | 287 | correct_predictions += torch.sum(preds == targets) 288 | losses.append(loss.item()) 289 | 290 | return correct_predictions.double() / n_examples, np.mean(losses) 291 | 292 | 293 | 294 | ``` 295 | 296 | ### Download Data and Preparation 297 | 298 | 299 | ``` 300 | !wget https://raw.githubusercontent.com/SrinidhiRaghavan/AI-Sentiment-Analysis-on-IMDB-Dataset/master/imdb_tr.csv 301 | ``` 302 | 303 | --2020-08-26 15:57:14-- https://raw.githubusercontent.com/SrinidhiRaghavan/AI-Sentiment-Analysis-on-IMDB-Dataset/master/imdb_tr.csv 304 | Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ... 305 | Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected. 306 | HTTP request sent, awaiting response... 
200 OK 307 | Length: 23677025 (23M) [text/plain] 308 | Saving to: ‘imdb_tr.csv’ 309 | 310 | imdb_tr.csv 100%[===================>] 22.58M 39.9MB/s in 0.6s 311 | 312 | 2020-08-26 15:57:16 (39.9 MB/s) - ‘imdb_tr.csv’ saved [23677025/23677025] 313 | 314 | 315 | 316 | 317 | ``` 318 | data = pd.read_csv('imdb_tr.csv', encoding = "ISO-8859-1") 319 | ``` 320 | 321 | 322 | ``` 323 | data.columns = ['row_Number', 'text', 'labels'] 324 | ``` 325 | 326 | 327 | ``` 328 | train_data = data.sample(1000) 329 | test_data = data.sample(100) 330 | 331 | ``` 332 | 333 | 334 | ``` 335 | train_data.to_csv('data.csv', index=False) 336 | test_data.to_csv('eval.csv', index=False) 337 | ``` 338 | 339 | # Training 340 | 341 | 342 | ``` 343 | from collections import defaultdict 344 | import numpy as np 345 | ``` 346 | 347 | 348 | ``` 349 | class Config: 350 | train_file = './data.csv' 351 | eval_file = './eval.csv' 352 | max_seq_len = 128 353 | batch_size = 32 354 | epochs = 5 355 | model_name = 'bert-base-uncased' 356 | learning_rate = 2e-5 357 | n_classes = 2 358 | device = 'cuda' 359 | 360 | flags = Config 361 | ``` 362 | 363 | 364 | ``` 365 | classification = ClassificationModel(flags) 366 | ``` 367 | 368 | 369 | ``` 370 | classification.train() 371 | ``` 372 | 373 | Epoch 1/5 374 | ---------- 375 | 376 | 377 | 378 | HBox(children=(FloatProgress(value=0.0, description='Training', max=32.0, style=ProgressStyle(description_widt… 379 | 380 | 381 | 382 | Train loss 0.6540139000862837 accuracy 0.622 383 | 384 | 385 | 386 | HBox(children=(FloatProgress(value=0.0, description='Evaluating', max=4.0, style=ProgressStyle(description_wid… 387 | 388 | 389 | 390 | Val loss 0.4141501262784004 accuracy 0.78 391 | 392 | Epoch 2/5 393 | ---------- 394 | 395 | 396 | 397 | HBox(children=(FloatProgress(value=0.0, description='Training', max=32.0, style=ProgressStyle(description_widt… 398 | 399 | 400 | 401 | Train loss 0.3276493112789467 accuracy 0.864 402 | 403 | 404 | 405 | HBox(children=(FloatProgress(value=0.0, description='Evaluating', max=4.0, style=ProgressStyle(description_wid… 406 | 407 | 408 | 409 | Val loss 0.3254726273007691 accuracy 0.87 410 | 411 | Epoch 3/5 412 | ---------- 413 | 414 | 415 | 416 | HBox(children=(FloatProgress(value=0.0, description='Training', max=32.0, style=ProgressStyle(description_widt… 417 | 418 | 419 | 420 | Train loss 0.12970392164424993 accuracy 0.9530000000000001 421 | 422 | 423 | 424 | HBox(children=(FloatProgress(value=0.0, description='Evaluating', max=4.0, style=ProgressStyle(description_wid… 425 | 426 | 427 | 428 | Val loss 0.4319960339926183 accuracy 0.8300000000000001 429 | 430 | Epoch 4/5 431 | ---------- 432 | 433 | 434 | 435 | HBox(children=(FloatProgress(value=0.0, description='Training', max=32.0, style=ProgressStyle(description_widt… 436 | 437 | 438 | 439 | Train loss 0.0639086696319282 accuracy 0.982 440 | 441 | 442 | 443 | HBox(children=(FloatProgress(value=0.0, description='Evaluating', max=4.0, style=ProgressStyle(description_wid… 444 | 445 | 446 | 447 | Val loss 0.5208611574489623 accuracy 0.8300000000000001 448 | 449 | Epoch 5/5 450 | ---------- 451 | 452 | 453 | 454 | HBox(children=(FloatProgress(value=0.0, description='Training', max=32.0, style=ProgressStyle(description_widt… 455 | 456 | 457 | 458 | Train loss 0.01748604617023375 accuracy 0.996 459 | 460 | 461 | 462 | HBox(children=(FloatProgress(value=0.0, description='Evaluating', max=4.0, style=ProgressStyle(description_wid… 463 | 464 | 465 | 466 | Val loss 0.4388579736405518 accuracy 0.9 467 | 468 | 469 | 470 | 471 | 
``` 472 | 473 | ``` 474 | -------------------------------------------------------------------------------- /docs/Question_Answering_with_a_Fine_Tuned_BERT.md: -------------------------------------------------------------------------------- 1 | Open In Colab 2 | 3 | # Question Answering with a Fine-Tuned BERT 4 | *by Ankur Singh* 5 | 6 | # Part 1: How BERT is applied to Question Answering 7 | 8 | ## The SQuAD v1.1 Benchmark 9 | 10 | When someone mentions "Question Answering" as an application of BERT, what they are really referring to is applying BERT to the Stanford Question Answering Dataset (SQuAD). 11 | 12 | The task posed by the SQuAD benchmark is a little different than you might think. Given a question, and *a passage of text containing the answer*, BERT needs to highlight the "span" of text corresponding to the correct answer. 13 | 14 | The SQuAD homepage has a fantastic tool for exploring the questions and reference text for this dataset, and even shows the predictions made by top-performing models. 15 | 16 | For example, here are some [interesting examples](https://rajpurkar.github.io/SQuAD-explorer/explore/1.1/dev/Super_Bowl_50.html?model=r-net+%20(ensemble)%20(Microsoft%20Research%20Asia)&version=1.1) on the topic of Super Bowl 50. 17 | 18 | 19 | ## BERT Input Format 20 | 21 | To feed a QA task into BERT, we pack both the question and the reference text into the input. 22 | 23 | ![Input format for QA](http://www.mccormickml.com/assets/BERT/SQuAD/input_formatting.png) 24 | 25 | The two pieces of text are separated by the special `[SEP]` token. 26 | 27 | BERT also uses "Segment Embeddings" to differentiate the question from the reference text. These are simply two embeddings (for segments "A" and "B") that BERT learned, and which it adds to the token embeddings before feeding them into the input layer. 28 | 29 | ## Start & End Token Classifiers 30 | 31 | BERT needs to highlight a "span" of text containing the answer--this is represented as simply predicting which token marks the start of the answer, and which token marks the end. 32 | 33 | ![Start token classification](http://www.mccormickml.com/assets/BERT/SQuAD/start_token_classification.png) 34 | 35 | For every token in the text, we feed its final embedding into the start token classifier. The start token classifier only has a single set of weights (represented by the blue "start" rectangle in the above illustration) which it applies to every word. 36 | 37 | After taking the dot product between the output embeddings and the 'start' weights, we apply the softmax activation to produce a probability distribution over all of the words. Whichever word has the highest probability of being the start token is the one that we pick. 38 | 39 | We repeat this process for the end token--we have a separate weight vector for this. 40 | 41 | ![End token classification](http://www.mccormickml.com/assets/BERT/SQuAD/end_token_classification.png) 42 | 43 | # Part 2: Example Code 44 | 45 | In the example code below, we'll be downloading a model that's *already been fine-tuned* for question answering, and trying it out on our own text. 46 | 47 | If you do want to fine-tune on your own dataset, it is possible to fine-tune BERT for question answering yourself. See [run_squad.py](https://github.com/huggingface/transformers/blob/master/examples/run_squad.py) in the `transformers` library. However, you may find that the below "fine-tuned-on-squad" model already does a good job, even if your text is from a different domain.
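To make Part 1's start/end mechanism concrete before moving on to the real model, here is a minimal sketch (not part of the original notebook) of the dot-product-plus-softmax computation described above, using random stand-in tensors in place of actual BERT output:

```
import torch

# Stand-ins: `final_embeddings` would really come from BERT-large's last layer.
seq_len, hidden_size = 70, 1024
final_embeddings = torch.randn(seq_len, hidden_size)

# One learned weight vector per classifier, shared across every token position.
start_weights = torch.randn(hidden_size)
end_weights = torch.randn(hidden_size)

# Dot product per token, then softmax over the sequence, gives a distribution
# over "which token starts the answer" (and likewise for the end).
start_probs = torch.softmax(final_embeddings @ start_weights, dim=0)
end_probs = torch.softmax(final_embeddings @ end_weights, dim=0)

print(torch.argmax(start_probs).item(), torch.argmax(end_probs).item())
```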
48 | 49 | > Note: The example code in this Notebook is a commented and expanded version of the short example provided in the `transformers` documentation [here](https://huggingface.co/transformers/model_doc/bert.html?highlight=bertforquestionanswering#transformers.BertForQuestionAnswering). 50 | 51 | ## 1. Install huggingface transformers library 52 | 53 | This example uses the `transformers` [library](https://github.com/huggingface/transformers/) by huggingface. We'll start by installing the package. 54 | 55 | 56 | ``` 57 | !pip install transformers 58 | ``` 59 | 60 | Collecting transformers 61 | [?25l Downloading https://files.pythonhosted.org/packages/27/3c/91ed8f5c4e7ef3227b4119200fc0ed4b4fd965b1f0172021c25701087825/transformers-3.0.2-py3-none-any.whl (769kB) 62 |  |████████████████████████████████| 778kB 8.6MB/s 63 | [?25hRequirement already satisfied: packaging in /usr/local/lib/python3.6/dist-packages (from transformers) (20.4) 64 | Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.6/dist-packages (from transformers) (2019.12.20) 65 | Requirement already satisfied: dataclasses; python_version < "3.7" in /usr/local/lib/python3.6/dist-packages (from transformers) (0.7) 66 | Requirement already satisfied: requests in /usr/local/lib/python3.6/dist-packages (from transformers) (2.23.0) 67 | Collecting sacremoses 68 | [?25l Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB) 69 |  |████████████████████████████████| 890kB 14.4MB/s 70 | [?25hCollecting tokenizers==0.8.1.rc1 71 | [?25l Downloading https://files.pythonhosted.org/packages/40/d0/30d5f8d221a0ed981a186c8eb986ce1c94e3a6e87f994eae9f4aa5250217/tokenizers-0.8.1rc1-cp36-cp36m-manylinux1_x86_64.whl (3.0MB) 72 |  |████████████████████████████████| 3.0MB 43.4MB/s 73 | [?25hCollecting sentencepiece!=0.1.92 74 | [?25l Downloading https://files.pythonhosted.org/packages/d4/a4/d0a884c4300004a78cca907a6ff9a5e9fe4f090f5d95ab341c53d28cbc58/sentencepiece-0.1.91-cp36-cp36m-manylinux1_x86_64.whl (1.1MB) 75 |  |████████████████████████████████| 1.1MB 44.5MB/s 76 | [?25hRequirement already satisfied: filelock in /usr/local/lib/python3.6/dist-packages (from transformers) (3.0.12) 77 | Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.6/dist-packages (from transformers) (4.41.1) 78 | Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from transformers) (1.18.5) 79 | Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from packaging->transformers) (1.15.0) 80 | Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from packaging->transformers) (2.4.7) 81 | Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2020.6.20) 82 | Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (2.10) 83 | Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (3.0.4) 84 | Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests->transformers) (1.24.3) 85 | Requirement already satisfied: click in /usr/local/lib/python3.6/dist-packages (from sacremoses->transformers) (7.1.2) 86 | Requirement already satisfied: joblib in /usr/local/lib/python3.6/dist-packages (from 
sacremoses->transformers) (0.16.0) 87 | Building wheels for collected packages: sacremoses 88 | Building wheel for sacremoses (setup.py) ... [?25l[?25hdone 89 | Created wheel for sacremoses: filename=sacremoses-0.0.43-cp36-none-any.whl size=893257 sha256=09966dce42bbfaa2aa5abc0f5f82653b74458d122a6fefeb7c895da49b40ce50 90 | Stored in directory: /root/.cache/pip/wheels/29/3c/fd/7ce5c3f0666dab31a50123635e6fb5e19ceb42ce38d4e58f45 91 | Successfully built sacremoses 92 | Installing collected packages: sacremoses, tokenizers, sentencepiece, transformers 93 | Successfully installed sacremoses-0.0.43 sentencepiece-0.1.91 tokenizers-0.8.1rc1 transformers-3.0.2 94 | 95 | 96 | 97 | ``` 98 | import torch 99 | ``` 100 | 101 | ## 2. Load Fine-Tuned BERT-large 102 | 103 | For Question Answering we use the `BertForQuestionAnswering` class from the `transformers` library. 104 | 105 | This class supports fine-tuning, but for this example we will keep things simpler and load a BERT model that has already been fine-tuned for the SQuAD benchmark. 106 | 107 | The `transformers` library has a large collection of pre-trained models which you can reference by name and load easily. The full list is in their documentation [here](https://huggingface.co/transformers/pretrained_models.html). 108 | 109 | For Question Answering, they have a version of BERT-large that has already been fine-tuned for the SQuAD benchmark. 110 | 111 | BERT-large is really big... it has 24-layers and an embedding size of 1,024, for a total of 340M parameters! Altogether it is 1.34GB, so expect it to take a couple minutes to download to your Colab instance. 112 | 113 | (Note that this download is not using your own network bandwidth--it's between the Google instance and wherever the model is stored on the web). 114 | 115 | Note: I believe this model was trained on version 1 of SQuAD, since it's not outputting whether the question is "impossible" to answer from the text (which is part of the task in v2 of SQuAD). 116 | 117 | 118 | 119 | ``` 120 | from transformers import BertForQuestionAnswering 121 | 122 | model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad') 123 | 124 | ``` 125 | 126 | 127 | HBox(children=(FloatProgress(value=0.0, description='Downloading', max=443.0, style=ProgressStyle(description_… 128 | 129 | 130 | 131 | 132 | 133 | 134 | HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1340675298.0, style=ProgressStyle(descr… 135 | 136 | 137 | 138 | 139 | 140 | Load the tokenizer as well. 141 | 142 | Side note: Apparently the vocabulary of this model is identical to the one in bert-base-uncased. You can load the tokenizer from `bert-base-uncased` and that works just as well. 143 | 144 | 145 | ``` 146 | from transformers import BertTokenizer 147 | 148 | tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad') 149 | ``` 150 | 151 | 152 | HBox(children=(FloatProgress(value=0.0, description='Downloading', max=231508.0, style=ProgressStyle(descripti… 153 | 154 | 155 | 156 | 157 | 158 | ## 3. Ask a Question 159 | 160 | Now we're ready to feed in an example! 161 | 162 | A QA example consists of a question and a passage of text containing the answer to that question. 163 | 164 | Let's try an example using the text in this tutorial! 165 | 166 | 167 | ``` 168 | question = "How many parameters does BERT-large have?" 169 | answer_text = "BERT-large is really big...
it has 24-layers and an embedding size of 1,024, for a total of 340M parameters! Altogether it is 1.34GB, so expect it to take a couple minutes to download to your Colab instance." 170 | ``` 171 | 172 | We'll need to run the BERT tokenizer against both the `question` and the `answer_text`. To feed these into BERT, we actually concatenate them together and place the special [SEP] token in between. 173 | 174 | 175 | 176 | ``` 177 | # Apply the tokenizer to the input text, treating them as a text-pair. 178 | input_ids = tokenizer.encode(question, answer_text) 179 | 180 | print('The input has a total of {:} tokens.'.format(len(input_ids))) 181 | ``` 182 | 183 | The input has a total of 70 tokens. 184 | 185 | 186 | Just to see exactly what the tokenizer is doing, let's print out the tokens with their IDs. 187 | 188 | 189 | ``` 190 | # BERT only needs the token IDs, but for the purpose of inspecting the 191 | # tokenizer's behavior, let's also get the token strings and display them. 192 | tokens = tokenizer.convert_ids_to_tokens(input_ids) 193 | 194 | # For each token and its id... 195 | for token, id in zip(tokens, input_ids): 196 | 197 | # If this is the [SEP] token, add some space around it to make it stand out. 198 | if id == tokenizer.sep_token_id: 199 | print('') 200 | 201 | # Print the token string and its ID in two columns. 202 | print('{:<12} {:>6,}'.format(token, id)) 203 | 204 | if id == tokenizer.sep_token_id: 205 | print('') 206 | 207 | ``` 208 | 209 | [CLS] 101 210 | how 2,129 211 | many 2,116 212 | parameters 11,709 213 | does 2,515 214 | bert 14,324 215 | - 1,011 216 | large 2,312 217 | have 2,031 218 | ? 1,029 219 | 220 | [SEP] 102 221 | 222 | bert 14,324 223 | - 1,011 224 | large 2,312 225 | is 2,003 226 | really 2,428 227 | big 2,502 228 | . 1,012 229 | . 1,012 230 | . 1,012 231 | it 2,009 232 | has 2,038 233 | 24 2,484 234 | - 1,011 235 | layers 9,014 236 | and 1,998 237 | an 2,019 238 | em 7,861 239 | ##bed 8,270 240 | ##ding 4,667 241 | size 2,946 242 | of 1,997 243 | 1 1,015 244 | , 1,010 245 | 02 6,185 246 | ##4 2,549 247 | , 1,010 248 | for 2,005 249 | a 1,037 250 | total 2,561 251 | of 1,997 252 | 340 16,029 253 | ##m 2,213 254 | parameters 11,709 255 | ! 999 256 | altogether 10,462 257 | it 2,009 258 | is 2,003 259 | 1 1,015 260 | . 1,012 261 | 34 4,090 262 | ##gb 18,259 263 | , 1,010 264 | so 2,061 265 | expect 5,987 266 | it 2,009 267 | to 2,000 268 | take 2,202 269 | a 1,037 270 | couple 3,232 271 | minutes 2,781 272 | to 2,000 273 | download 8,816 274 | to 2,000 275 | your 2,115 276 | cola 15,270 277 | ##b 2,497 278 | instance 6,013 279 | . 1,012 280 | 281 | [SEP] 102 282 | 283 | 284 | 285 | We've concatenated the `question` and `answer_text` together, but BERT still needs a way to distinguish them. BERT has two special "Segment" embeddings, one for segment "A" and one for segment "B". Before the word embeddings go into the BERT layers, the segment A embedding needs to be added to the `question` tokens, and the segment B embedding needs to be added to each of the `answer_text` tokens. 286 | 287 | These additions are handled for us by the `transformer` library, and all we need to do is specify a '0' or '1' for each token. 288 | 289 | Note: In the `transformers` library, huggingface likes to call these `token_type_ids`, but I'm going with `segment_ids` since this seems clearer, and is consistent with the BERT paper. 290 | 291 | 292 | ``` 293 | # Search the input_ids for the first instance of the `[SEP]` token. 
294 | sep_index = input_ids.index(tokenizer.sep_token_id) 295 | 296 | # The number of segment A tokens includes the [SEP] token itself. 297 | num_seg_a = sep_index + 1 298 | 299 | # The remainder are segment B. 300 | num_seg_b = len(input_ids) - num_seg_a 301 | 302 | # Construct the list of 0s and 1s. 303 | segment_ids = [0]*num_seg_a + [1]*num_seg_b 304 | 305 | # There should be a segment_id for every input token. 306 | assert len(segment_ids) == len(input_ids) 307 | ``` 308 | 309 | >*Side Note: Where's the padding?* 310 | > 311 | > The original [example code](https://huggingface.co/transformers/model_doc/bert.html?highlight=bertforquestionanswering#transformers.BertForQuestionAnswering) does not perform any padding. I suspect that this is because we are only feeding in a *single example*. If we instead fed in a batch of examples, then we would need to pad or truncate all of the samples in the batch to a single length, and supply an attention mask to tell BERT to ignore the padding tokens. 312 | 313 | We're ready to feed our example into the model! 314 | 315 | 316 | 317 | 318 | ``` 319 | # Run our example through the model. 320 | start_scores, end_scores = model(torch.tensor([input_ids]), # The tokens representing our input text. 321 | token_type_ids=torch.tensor([segment_ids])) # The segment IDs to differentiate question from answer_text 322 | 323 | ``` 324 | 325 | Now we can highlight the answer just by looking at the most probable start and end words. 326 | 327 | 328 | ``` 329 | # Find the tokens with the highest `start` and `end` scores. 330 | answer_start = torch.argmax(start_scores) 331 | answer_end = torch.argmax(end_scores) 332 | 333 | # Combine the tokens in the answer and print it out. 334 | answer = ' '.join(tokens[answer_start:answer_end+1]) 335 | 336 | print('Answer: "' + answer + '"') 337 | ``` 338 | 339 | Answer: "340 ##m" 340 | 341 | 342 | It got it right! Awesome :) 343 | 344 | > *Side Note: It's a little naive to pick the highest scores for start and end--what if it predicts an end word that's before the start word?! The correct implementation is to pick the highest total score for which end >= start.* 345 | 346 | With a little more effort, we can reconstruct any words that got broken down into subwords. 347 | 348 | 349 | ``` 350 | # Start with the first token. 351 | answer = tokens[answer_start] 352 | 353 | # Select the remaining answer tokens and join them with whitespace. 354 | for i in range(answer_start + 1, answer_end + 1): 355 | 356 | # If it's a subword token, then recombine it with the previous token. 357 | if tokens[i][0:2] == '##': 358 | answer += tokens[i][2:] 359 | 360 | # Otherwise, add a space then the token. 361 | else: 362 | answer += ' ' + tokens[i] 363 | 364 | print('Answer: "' + answer + '"') 365 | ``` 366 | 367 | Answer: "340m" 368 | 369 | 370 | ## 4. Visualizing Scores 371 | 372 | I was curious to see what the scores were for all of the words. The following cells generate bar plots showing the start and end scores for every word in the input. 373 | 374 | 375 | ``` 376 | import matplotlib.pyplot as plt 377 | import seaborn as sns 378 | 379 | # Use plot styling from seaborn. 380 | sns.set(style='darkgrid') 381 | 382 | # Increase the plot size and font size. 383 | #sns.set(font_scale=1.5) 384 | plt.rcParams["figure.figsize"] = (16,8) 385 | ``` 386 | 387 | Retrieve all of the start and end scores, and use all of the tokens as x-axis labels. 388 | 389 | 390 | ``` 391 | # Pull the scores out of PyTorch Tensors and convert them to 1D numpy arrays.
392 | s_scores = start_scores.detach().numpy().flatten() 393 | e_scores = end_scores.detach().numpy().flatten() 394 | 395 | # We'll use the tokens as the x-axis labels. In order to do that, they all need 396 | # to be unique, so we'll add the token index to the end of each one. 397 | token_labels = [] 398 | for (i, token) in enumerate(tokens): 399 | token_labels.append('{:} - {:>2}'.format(token, i)) 400 | 401 | ``` 402 | 403 | Create a bar plot showing the score for every input word being the "start" word. 404 | 405 | 406 | ``` 407 | # Create a barplot showing the start word score for all of the tokens. 408 | ax = sns.barplot(x=token_labels, y=s_scores, ci=None) 409 | 410 | # Turn the xlabels vertical. 411 | ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha="center") 412 | 413 | # Turn on the vertical grid to help align words to scores. 414 | ax.grid(True) 415 | 416 | plt.title('Start Word Scores') 417 | 418 | plt.show() 419 | ``` 420 | 421 | 422 | 423 | ![png](Question_Answering_with_a_Fine_Tuned_BERT_files/Question_Answering_with_a_Fine_Tuned_BERT_44_0.png) 424 | 425 | 426 | 427 | Create a second bar plot showing the score for every input word being the "end" word. 428 | 429 | 430 | ``` 431 | # Create a barplot showing the end word score for all of the tokens. 432 | ax = sns.barplot(x=token_labels, y=e_scores, ci=None) 433 | 434 | # Turn the xlabels vertical. 435 | ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha="center") 436 | 437 | # Turn on the vertical grid to help align words to scores. 438 | ax.grid(True) 439 | 440 | plt.title('End Word Scores') 441 | 442 | plt.show() 443 | ``` 444 | 445 | 446 | 447 | ![png](Question_Answering_with_a_Fine_Tuned_BERT_files/Question_Answering_with_a_Fine_Tuned_BERT_46_0.png) 448 | 449 | 450 | 451 | **Alternate View** 452 | 453 | I also tried visualizing both the start and end scores on a single bar plot, but I think it may actually be more confusing than seeing them separately. 454 | 455 | 456 | ``` 457 | import pandas as pd 458 | 459 | # Store the tokens and scores in a DataFrame. 460 | # Each token will have two rows, one for its start score and one for its end 461 | # score. The "marker" column will differentiate them. A little wacky, I know. 462 | scores = [] 463 | for (i, token_label) in enumerate(token_labels): 464 | 465 | # Add the token's start score as one row. 466 | scores.append({'token_label': token_label, 467 | 'score': s_scores[i], 468 | 'marker': 'start'}) 469 | 470 | # Add the token's end score as another row. 471 | scores.append({'token_label': token_label, 472 | 'score': e_scores[i], 473 | 'marker': 'end'}) 474 | 475 | df = pd.DataFrame(scores) 476 | 477 | ``` 478 | 479 | 480 | ``` 481 | # Draw a grouped barplot to show start and end scores for each word. 482 | # The "hue" parameter is where we tell it which datapoints belong to which 483 | # of the two series. 484 | g = sns.catplot(x="token_label", y="score", hue="marker", data=df, 485 | kind="bar", height=6, aspect=4) 486 | 487 | # Turn the xlabels vertical. 488 | g.set_xticklabels(g.ax.get_xticklabels(), rotation=90, ha="center") 489 | 490 | # Turn on the vertical grid to help align words to scores. 491 | g.ax.grid(True) 492 | 493 | ``` 494 | 495 | 496 | 497 | ![png](Question_Answering_with_a_Fine_Tuned_BERT_files/Question_Answering_with_a_Fine_Tuned_BERT_49_0.png) 498 | 499 | 500 | 501 | ## 5. More Examples 502 | 503 | Turn the QA process into a function so we can easily try out other examples.
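Before defining that function, here is a small sketch (not from the original notebook) of the more careful span selection described in the side note earlier: instead of taking the two argmaxes independently, score every pair with `end >= start` and keep the best total score. The `max_answer_len` cap is an illustrative assumption, not something the original code uses:

```
def best_span(start_scores, end_scores, max_answer_len=30):
    # Flatten the model outputs to 1-D numpy arrays of per-token scores.
    s_scores = start_scores.detach().numpy().flatten()
    e_scores = end_scores.detach().numpy().flatten()

    best, best_score = (0, 0), float('-inf')
    for s in range(len(s_scores)):
        # Only consider end positions at or after the start position.
        for e in range(s, min(s + max_answer_len, len(e_scores))):
            if s_scores[s] + e_scores[e] > best_score:
                best_score, best = s_scores[s] + e_scores[e], (s, e)
    return best

answer_start, answer_end = best_span(start_scores, end_scores)
```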
504 | 505 | 506 | ``` 507 | def answer_question(question, answer_text): 508 | ''' 509 | Takes a `question` string and an `answer_text` string (which contains the 510 | answer), and identifies the words within the `answer_text` that are the 511 | answer. Prints them out. 512 | ''' 513 | # ======== Tokenize ======== 514 | # Apply the tokenizer to the input text, treating them as a text-pair. 515 | input_ids = tokenizer.encode(question, answer_text) 516 | 517 | # Report how long the input sequence is. 518 | print('Query has {:,} tokens.\n'.format(len(input_ids))) 519 | 520 | # ======== Set Segment IDs ======== 521 | # Search the input_ids for the first instance of the `[SEP]` token. 522 | sep_index = input_ids.index(tokenizer.sep_token_id) 523 | 524 | # The number of segment A tokens includes the [SEP] token itself. 525 | num_seg_a = sep_index + 1 526 | 527 | # The remainder are segment B. 528 | num_seg_b = len(input_ids) - num_seg_a 529 | 530 | # Construct the list of 0s and 1s. 531 | segment_ids = [0]*num_seg_a + [1]*num_seg_b 532 | 533 | # There should be a segment_id for every input token. 534 | assert len(segment_ids) == len(input_ids) 535 | 536 | # ======== Evaluate ======== 537 | # Run our example question through the model. 538 | start_scores, end_scores = model(torch.tensor([input_ids]), # The tokens representing our input text. 539 | token_type_ids=torch.tensor([segment_ids])) # The segment IDs to differentiate question from answer_text 540 | 541 | # ======== Reconstruct Answer ======== 542 | # Find the tokens with the highest `start` and `end` scores. 543 | answer_start = torch.argmax(start_scores) 544 | answer_end = torch.argmax(end_scores) 545 | 546 | # Get the string versions of the input tokens. 547 | tokens = tokenizer.convert_ids_to_tokens(input_ids) 548 | 549 | # Start with the first token. 550 | answer = tokens[answer_start] 551 | 552 | # Select the remaining answer tokens and join them with whitespace. 553 | for i in range(answer_start + 1, answer_end + 1): 554 | 555 | # If it's a subword token, then recombine it with the previous token. 556 | if tokens[i][0:2] == '##': 557 | answer += tokens[i][2:] 558 | 559 | # Otherwise, add a space then the token. 560 | else: 561 | answer += ' ' + tokens[i] 562 | 563 | print('Answer: "' + answer + '"') 564 | ``` 565 | 566 | As our reference text, I've taken the Abstract of the [BERT paper](https://arxiv.org/pdf/1810.04805.pdf). 567 | 568 | 569 | 570 | ``` 571 | import textwrap 572 | 573 | # Wrap text to 80 characters. 574 | wrapper = textwrap.TextWrapper(width=80) 575 | 576 | bert_abstract = "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be finetuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial taskspecific architecture modifications. BERT is conceptually simple and empirically powerful.
It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement)." 577 | 578 | print(wrapper.fill(bert_abstract)) 579 | ``` 580 | 581 | We introduce a new language representation model called BERT, which stands for 582 | Bidirectional Encoder Representations from Transformers. Unlike recent language 583 | representation models (Peters et al., 2018a; Radford et al., 2018), BERT is 584 | designed to pretrain deep bidirectional representations from unlabeled text by 585 | jointly conditioning on both left and right context in all layers. As a result, 586 | the pre-trained BERT model can be finetuned with just one additional output 587 | layer to create state-of-the-art models for a wide range of tasks, such as 588 | question answering and language inference, without substantial taskspecific 589 | architecture modifications. BERT is conceptually simple and empirically 590 | powerful. It obtains new state-of-the-art results on eleven natural language 591 | processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute 592 | improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 593 | question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD 594 | v2.0 Test F1 to 83.1 (5.1 point absolute improvement). 595 | 596 | 597 | ----------------------------- 598 | Ask BERT what its name stands for (the answer is in the first sentence of the abstract). 599 | 600 | 601 | ``` 602 | question = "What does the 'B' in BERT stand for?" 603 | 604 | answer_question(question, bert_abstract) 605 | ``` 606 | 607 | Query has 258 tokens. 608 | 609 | Answer: "bidirectional encoder representations from transformers" 610 | 611 | 612 | --------------------- 613 | Ask BERT about example applications of itself :) 614 | 615 | The answer to the question comes from this passage from the abstract: 616 | 617 | > "...BERT model can be finetuned with just one additional output 618 | layer to create state-of-the-art models for **a wide range of tasks, such as 619 | question answering and language inference,** without substantial taskspecific 620 | architecture modifications." 621 | 622 | 623 | ``` 624 | question = "What are some example applications of BERT?" 625 | 626 | answer_question(question, bert_abstract) 627 | ``` 628 | 629 | Query has 255 tokens. 
630 | 631 | Answer: "question answering and language inference" 632 | 633 | 634 | 635 | ``` 636 | 637 | ``` 638 | -------------------------------------------------------------------------------- /docs/Question_Answering_with_a_Fine_Tuned_BERT_files/Question_Answering_with_a_Fine_Tuned_BERT_44_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ankur3107/nlp_notebooks/db671902eebb52f7174df5f302cbc700fe0b0b97/docs/Question_Answering_with_a_Fine_Tuned_BERT_files/Question_Answering_with_a_Fine_Tuned_BERT_44_0.png -------------------------------------------------------------------------------- /docs/Question_Answering_with_a_Fine_Tuned_BERT_files/Question_Answering_with_a_Fine_Tuned_BERT_46_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ankur3107/nlp_notebooks/db671902eebb52f7174df5f302cbc700fe0b0b97/docs/Question_Answering_with_a_Fine_Tuned_BERT_files/Question_Answering_with_a_Fine_Tuned_BERT_46_0.png -------------------------------------------------------------------------------- /docs/Question_Answering_with_a_Fine_Tuned_BERT_files/Question_Answering_with_a_Fine_Tuned_BERT_49_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Ankur3107/nlp_notebooks/db671902eebb52f7174df5f302cbc700fe0b0b97/docs/Question_Answering_with_a_Fine_Tuned_BERT_files/Question_Answering_with_a_Fine_Tuned_BERT_49_0.png -------------------------------------------------------------------------------- /docs/Seq2Seq_Pytorch.md: -------------------------------------------------------------------------------- 1 | Open In Colab 2 | 3 | 4 | ``` 5 | !pip install spacy --upgrade 6 | ``` 7 | 8 | Collecting spacy 9 | [?25l Downloading https://files.pythonhosted.org/packages/10/b5/c7a92c7ce5d4b353b70b4b5b4385687206c8b230ddfe08746ab0fd310a3a/spacy-2.3.2-cp36-cp36m-manylinux1_x86_64.whl (9.9MB) 10 |  |████████████████████████████████| 10.0MB 3.9MB/s 11 | [?25hRequirement already satisfied, skipping upgrade: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.6/dist-packages (from spacy) (1.0.2) 12 | Requirement already satisfied, skipping upgrade: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from spacy) (0.7.1) 13 | Requirement already satisfied, skipping upgrade: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.6/dist-packages (from spacy) (1.0.0) 14 | Requirement already satisfied, skipping upgrade: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.6/dist-packages (from spacy) (2.0.3) 15 | Requirement already satisfied, skipping upgrade: setuptools in /usr/local/lib/python3.6/dist-packages (from spacy) (49.6.0) 16 | Collecting thinc==7.4.1 17 | [?25l Downloading https://files.pythonhosted.org/packages/10/ae/ef3ae5e93639c0ef8e3eb32e3c18341e511b3c515fcfc603f4b808087651/thinc-7.4.1-cp36-cp36m-manylinux1_x86_64.whl (2.1MB) 18 |  |████████████████████████████████| 2.1MB 18.3MB/s 19 | [?25hRequirement already satisfied, skipping upgrade: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.6/dist-packages (from spacy) (2.23.0) 20 | Requirement already satisfied, skipping upgrade: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.6/dist-packages (from spacy) (4.41.1) 21 | Requirement already satisfied, skipping upgrade: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.6/dist-packages (from spacy) (1.1.3) 22 | Requirement already satisfied, skipping upgrade: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.6/dist-packages 
(from spacy) (0.4.1) 23 | Requirement already satisfied, skipping upgrade: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.6/dist-packages (from spacy) (1.0.2) 24 | Requirement already satisfied, skipping upgrade: numpy>=1.15.0 in /usr/local/lib/python3.6/dist-packages (from spacy) (1.18.5) 25 | Requirement already satisfied, skipping upgrade: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from spacy) (3.0.2) 26 | Requirement already satisfied, skipping upgrade: importlib-metadata>=0.20; python_version < "3.8" in /usr/local/lib/python3.6/dist-packages (from catalogue<1.1.0,>=0.0.7->spacy) (1.7.0) 27 | Requirement already satisfied, skipping upgrade: chardet<4,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (3.0.4) 28 | Requirement already satisfied, skipping upgrade: idna<3,>=2.5 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (2.10) 29 | Requirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (1.24.3) 30 | Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /usr/local/lib/python3.6/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (2020.6.20) 31 | Requirement already satisfied, skipping upgrade: zipp>=0.5 in /usr/local/lib/python3.6/dist-packages (from importlib-metadata>=0.20; python_version < "3.8"->catalogue<1.1.0,>=0.0.7->spacy) (3.1.0) 32 | Installing collected packages: thinc, spacy 33 | Found existing installation: thinc 7.4.0 34 | Uninstalling thinc-7.4.0: 35 | Successfully uninstalled thinc-7.4.0 36 | Found existing installation: spacy 2.2.4 37 | Uninstalling spacy-2.2.4: 38 | Successfully uninstalled spacy-2.2.4 39 | Successfully installed spacy-2.3.2 thinc-7.4.1 40 | 41 | 42 | 43 | ``` 44 | !python -m spacy download en 45 | !python -m spacy download de 46 | !python -m spacy download hi 47 | ``` 48 | 49 | Collecting en_core_web_sm==2.3.1 50 | [?25l Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz (12.0MB) 51 |  |████████████████████████████████| 12.1MB 803kB/s 52 | [?25hRequirement already satisfied: spacy<2.4.0,>=2.3.0 in /usr/local/lib/python3.6/dist-packages (from en_core_web_sm==2.3.1) (2.3.2) 53 | Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (0.7.1) 54 | Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.6/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.2) 55 | Requirement already satisfied: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.6/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (0.4.1) 56 | Requirement already satisfied: thinc==7.4.1 in /usr/local/lib/python3.6/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (7.4.1) 57 | Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.6/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.2) 58 | Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.6/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.0) 59 | Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.6/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.18.5) 60 | Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.6/dist-packages (from 
spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (3.0.2)
72 | Building wheels for collected packages: en-core-web-sm
73 | Building wheel for en-core-web-sm (setup.py) ... done
74 | Created wheel for en-core-web-sm: filename=en_core_web_sm-2.3.1-cp36-none-any.whl size=12047109 sha256=c566e8eddcd63bc8259a784a72fae15d14da6e23d183b7a16bf6032b9aeaeed2
75 | Stored in directory: /tmp/pip-ephem-wheel-cache-76hzqxdi/wheels/2b/3f/41/f0b92863355c3ba34bb32b37d8a0c662959da0058202094f46
76 | Successfully built en-core-web-sm
77 | Installing collected packages: en-core-web-sm
78 | Found existing installation: en-core-web-sm 2.2.5
79 | Uninstalling en-core-web-sm-2.2.5:
80 | Successfully uninstalled en-core-web-sm-2.2.5
81 | Successfully installed en-core-web-sm-2.3.1
82 | ✔ Download and installation successful
83 | You can now load the model via spacy.load('en_core_web_sm')
84 | ✔ Linking successful
85 | /usr/local/lib/python3.6/dist-packages/en_core_web_sm -->
86 | /usr/local/lib/python3.6/dist-packages/spacy/data/en
87 | You can now load the model via spacy.load('en')
88 | Collecting de_core_news_sm==2.3.0
89 | Downloading https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-2.3.0/de_core_news_sm-2.3.0.tar.gz (14.9MB)
111 | Building wheels for collected packages: de-core-news-sm
112 | Building wheel for de-core-news-sm (setup.py) ... done
113 | Created wheel for de-core-news-sm: filename=de_core_news_sm-2.3.0-cp36-none-any.whl size=14907580 sha256=959d3b6df2936d8e86bbc601dbd80eddfb310d3af7aca8895529c319cb58b539
114 | Stored in directory: /tmp/pip-ephem-wheel-cache-59e5ledg/wheels/db/f3/1e/0df0f27eee12bd1aaa94bcfef11b01eca62f90b9b9a0ce08fd
115 | Successfully built de-core-news-sm
116 | Installing collected packages: de-core-news-sm
117 | Found existing installation: de-core-news-sm 2.2.5
118 | Uninstalling de-core-news-sm-2.2.5:
119 | Successfully uninstalled de-core-news-sm-2.2.5
120 | Successfully installed de-core-news-sm-2.3.0
121 | ✔ Download and installation successful
122 | You can now load the model via spacy.load('de_core_news_sm')
123 | ✔ Linking successful
124 | /usr/local/lib/python3.6/dist-packages/de_core_news_sm -->
125 | /usr/local/lib/python3.6/dist-packages/spacy/data/de
126 | You can now load the model via spacy.load('de')
127 |
128 | ✘ No compatible model found for 'hi' (spaCy v2.3.2).
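spaCy v2.3 ships no pretrained Hindi pipeline, which is why the download above fails. The imports below work around this by constructing the blank `Hindi` language class directly, which still provides rule-based tokenization. A minimal sketch of that fallback (the sample sentence and printed output are illustrative only):

```
# Fallback when no pretrained 'hi' model exists: the blank language class
# still carries spaCy's rule-based Hindi tokenizer.
from spacy.lang.hi import Hindi

nlp_hi = Hindi()  # equivalent to spacy.blank("hi")
print([tok.text for tok in nlp_hi.tokenizer("यह एक वाक्य है")])
# ['यह', 'एक', 'वाक्य', 'है'] -- tokenization only; no tagger/parser/NER attached
```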
129 |
130 |
131 |
132 |
133 | ```
134 | import torch
135 | import torch.nn as nn
136 | import torch.optim as optim
137 | import spacy
138 | from torch.utils.tensorboard import SummaryWriter
139 | from torchtext.datasets import Multi30k
140 | from torchtext.data import Field, BucketIterator
141 | from spacy.lang.hi import Hindi
142 | ```
143 |
144 |
145 | ```
146 | spacy_ger = spacy.load("de")
147 | spacy_eng = spacy.load("en")
148 |
149 | spacy_hi = Hindi()
150 | def tokenize_hi(text):
151 |     return [tok.text for tok in spacy_hi.tokenizer(text)]
152 |
153 | def tokenize_ger(text):
154 |     return [tok.text for tok in spacy_ger.tokenizer(text)]
155 |
156 |
157 | def tokenize_eng(text):
158 |     return [tok.text for tok in spacy_eng.tokenizer(text)]
159 |
160 |
161 | german = Field(tokenize=tokenize_ger, lower=True, init_token="<sos>", eos_token="<eos>")
162 |
163 | english = Field(
164 |     tokenize=tokenize_eng, lower=True, init_token="<sos>", eos_token="<eos>"
165 | )
166 |
167 | train_data, valid_data, test_data = Multi30k.splits(
168 |     exts=(".de", ".en"), fields=(german, english)
169 | )
170 |
171 | german.build_vocab(train_data, max_size=10000, min_freq=2)
172 | english.build_vocab(train_data, max_size=10000, min_freq=2)
173 | ```
174 |
175 |
176 | ```
177 | class Transformer(nn.Module):
178 |     def __init__(
179 |         self,
180 |         embedding_size,
181 |         src_vocab_size,
182 |         trg_vocab_size,
183 |         src_pad_idx,
184 |         num_heads,
185 |         num_encoder_layers,
186 |         num_decoder_layers,
187 |         forward_expansion,
188 |         dropout,
189 |         max_len,
190 |         device,
191 |     ):
192 |         super(Transformer, self).__init__()
193 |         self.src_word_embedding = nn.Embedding(src_vocab_size, embedding_size)
194 |         self.src_position_embedding = nn.Embedding(max_len, embedding_size)
195 |         self.trg_word_embedding = nn.Embedding(trg_vocab_size, embedding_size)
196 |         self.trg_position_embedding = nn.Embedding(max_len, embedding_size)
197 |
198 |         self.device = device
199 |         self.transformer = nn.Transformer(
200 |             embedding_size,
201 |             num_heads,
202 |             num_encoder_layers,
203 |             num_decoder_layers,
204 |             forward_expansion,  # passed through as nn.Transformer's dim_feedforward
205 |             dropout,
206 |         )
207 |         self.fc_out = nn.Linear(embedding_size, trg_vocab_size)
208 |         self.dropout = nn.Dropout(dropout)
209 |         self.src_pad_idx = src_pad_idx
210 |
211 |     def make_src_mask(self, src):
212 |         src_mask = src.transpose(0, 1) == self.src_pad_idx
213 |
214 |         # (N, src_len)
215 |         return src_mask.to(self.device)
216 |
217 |     def forward(self, src, trg):
218 |         src_seq_length, N = src.shape
219 |         trg_seq_length, N = trg.shape
220 |
221 |         src_positions = (
222 |             torch.arange(0, src_seq_length)
223 |             .unsqueeze(1)
224 |             .expand(src_seq_length, N)
225 |             .to(self.device)
226 |         )
227 |
228 |         trg_positions = (
229 |             torch.arange(0, trg_seq_length)
230 |             .unsqueeze(1)
231 |             .expand(trg_seq_length, N)
232 |             .to(self.device)
233 |         )
234 |
235 |         embed_src = self.dropout(
236 |             (self.src_word_embedding(src) + self.src_position_embedding(src_positions))
237 |         )
238 |         embed_trg = self.dropout(
239 |             (self.trg_word_embedding(trg) + self.trg_position_embedding(trg_positions))
240 |         )
241 |
242 |         src_padding_mask = self.make_src_mask(src)
243 |         trg_mask = self.transformer.generate_square_subsequent_mask(trg_seq_length).to(
244 |             self.device
245 |         )
246 |
247 |         out = self.transformer(
248 |             embed_src,
249 |             embed_trg,
250 |             src_key_padding_mask=src_padding_mask,
251 |             tgt_mask=trg_mask,
252 |         )
253 |         out = self.fc_out(out)
254 |         return out
255 |
256 | ```
257 |
258 |
259 | ```
260 | # We're ready to define everything we need for training our Seq2Seq model
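# Shape note for the cells below: nn.Transformer consumes tensors shaped
# (seq_len, batch_size, embedding_size), and torchtext's BucketIterator
# already yields batches as (seq_len, batch_size), so the only transpose
# needed is inside make_src_mask above, which flips src to
# (batch_size, src_len) to build src_key_padding_mask.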
261 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
262 |
263 | load_model = False
264 | save_model = True
265 |
266 | # Training hyperparameters
267 | num_epochs = 10000
268 | learning_rate = 3e-4
269 | batch_size = 32
270 |
271 | # Model hyperparameters
272 | src_vocab_size = len(german.vocab)
273 | trg_vocab_size = len(english.vocab)
274 | embedding_size = 512
275 | num_heads = 8
276 | num_encoder_layers = 3
277 | num_decoder_layers = 3
278 | dropout = 0.10
279 | max_len = 100
280 | forward_expansion = 4
281 | src_pad_idx = english.vocab.stoi["<pad>"]  # <pad> has the same index in both vocabs
282 |
283 | # Tensorboard to get nice loss plot
284 | writer = SummaryWriter("runs/loss_plot")
285 | step = 0
286 | ```
287 |
288 |
289 | ```
290 | train_iterator, valid_iterator, test_iterator = BucketIterator.splits(
291 |     (train_data, valid_data, test_data),
292 |     batch_size=batch_size,
293 |     sort_within_batch=True,
294 |     sort_key=lambda x: len(x.src),
295 |     device=device,
296 | )
297 |
298 | ```
299 |
300 |
301 | ```
302 | model = Transformer(
303 |     embedding_size,
304 |     src_vocab_size,
305 |     trg_vocab_size,
306 |     src_pad_idx,
307 |     num_heads,
308 |     num_encoder_layers,
309 |     num_decoder_layers,
310 |     forward_expansion,
311 |     dropout,
312 |     max_len,
313 |     device,
314 | ).to(device)
315 | ```
316 |
317 |
318 | ```
319 | def translate_sentence(model, sentence, german, english, device, max_length=50):
320 |     # Load german tokenizer
321 |     spacy_ger = spacy.load("de")
322 |
323 |     # Create tokens using spacy and everything in lower case (which is what our vocab is)
324 |     if type(sentence) == str:
325 |         tokens = [token.text.lower() for token in spacy_ger(sentence)]
326 |     else:
327 |         tokens = [token.lower() for token in sentence]
328 |
329 |     # Add <sos> and <eos> at the beginning and end respectively
330 |     tokens.insert(0, german.init_token)
331 |     tokens.append(german.eos_token)
332 |
333 |     # Go through each german token and convert to an index
334 |     text_to_indices = [german.vocab.stoi[token] for token in tokens]
335 |
336 |     # Convert to Tensor
337 |     sentence_tensor = torch.LongTensor(text_to_indices).unsqueeze(1).to(device)
338 |
339 |     outputs = [english.vocab.stoi["<sos>"]]
340 |     for i in range(max_length):
341 |         trg_tensor = torch.LongTensor(outputs).unsqueeze(1).to(device)
342 |
343 |         with torch.no_grad():
344 |             output = model(sentence_tensor, trg_tensor)
345 |
346 |         best_guess = output.argmax(2)[-1, :].item()
347 |         outputs.append(best_guess)
348 |
349 |         if best_guess == english.vocab.stoi["<eos>"]:
350 |             break
351 |
352 |     translated_sentence = [english.vocab.itos[idx] for idx in outputs]
353 |     # remove start token
354 |     return translated_sentence[1:]
355 |
356 | from torchtext.data.metrics import bleu_score  # needed by bleu() below
357 | def bleu(data, model, german, english, device):
358 |     targets = []
359 |     outputs = []
360 |
361 |     for example in data:
362 |         src = vars(example)["src"]
363 |         trg = vars(example)["trg"]
364 |
365 |         prediction = translate_sentence(model, src, german, english, device)
366 |         prediction = prediction[:-1]  # remove <eos> token
367 |
368 |         targets.append([trg])
369 |         outputs.append(prediction)
370 |
371 |     return bleu_score(outputs, targets)
372 |
373 |
374 | def save_checkpoint(state, filename="my_checkpoint.pth.tar"):
375 |     print("=> Saving checkpoint")
376 |     torch.save(state, filename)
377 |
378 |
379 | def load_checkpoint(checkpoint, model, optimizer):
380 |     print("=> Loading checkpoint")
381 |     model.load_state_dict(checkpoint["state_dict"])
382 |     optimizer.load_state_dict(checkpoint["optimizer"])
383 |
384 |
385 | optimizer = optim.Adam(model.parameters(), lr=learning_rate)
386 |
387 | scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
388 |     optimizer, factor=0.1, patience=10, verbose=True
389 | )
390 |
391 | pad_idx = english.vocab.stoi["<pad>"]
392 | criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)
393 |
394 | if load_model:
395 |     load_checkpoint(torch.load("my_checkpoint.pth.tar"), model, optimizer)
396 |
397 | sentence = "ein pferd geht unter einer brücke neben einem boot."
398 | ```
399 |
400 |
401 | ```
402 | for epoch in range(num_epochs):
403 |     print(f"[Epoch {epoch} / {num_epochs}]")
404 |
405 |     if save_model:
406 |         checkpoint = {
407 |             "state_dict": model.state_dict(),
408 |             "optimizer": optimizer.state_dict(),
409 |         }
410 |         save_checkpoint(checkpoint)
411 |
412 |     model.eval()
413 |     translated_sentence = translate_sentence(
414 |         model, sentence, german, english, device, max_length=50
415 |     )
416 |
417 |     print(f"Translated example sentence: \n {translated_sentence}")
418 |     model.train()
419 |     losses = []
420 |
421 |     for batch_idx, batch in enumerate(train_iterator):
422 |         # Get input and targets and get to cuda
423 |         inp_data = batch.src.to(device)
424 |         target = batch.trg.to(device)
425 |
426 |         # Forward prop
427 |         output = model(inp_data, target[:-1, :])
428 |
429 |         # Output is of shape (trg_len, batch_size, output_dim) but Cross Entropy Loss
430 |         # doesn't take input in that form. For example if we have MNIST we want to have
431 |         # output to be: (N, 10) and targets just (N). Here we can view it in a similar
432 |         # way that we have output_words * batch_size that we want to send in into
433 |         # our cost function, so we need to do some reshaping.
434 |         # Let's also remove the start token while we're at it
435 |         output = output.reshape(-1, output.shape[2])
436 |         target = target[1:].reshape(-1)
437 |
438 |         optimizer.zero_grad()
439 |
440 |         loss = criterion(output, target)
441 |         losses.append(loss.item())
442 |
443 |         # Back prop
444 |         loss.backward()
445 |         # Clip to avoid exploding gradient issues, makes sure grads are
446 |         # within a healthy range
447 |         torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)
448 |
449 |         # Gradient descent step
450 |         optimizer.step()
451 |
452 |         # plot to tensorboard
453 |         writer.add_scalar("Training loss", loss, global_step=step)
454 |         step += 1
455 |
456 |     mean_loss = sum(losses) / len(losses)
457 |     scheduler.step(mean_loss)
458 |
459 | # running on entire test data takes a while
460 | score = bleu(test_data[1:100], model, german, english, device)
461 | print(f"Bleu score {score * 100:.2f}")
462 | ```
463 |
464 |     [Epoch 0 / 10000]
465 |     => Saving checkpoint
466 |     Translated example sentence:
467 |     ['.', 'secures', 'secures', 'half', 'secures', 'secures', 'half', '.', 'secures', 'toddler', 'secures', 'secures', 'half', '.', '.', 'olympians', 'half', '.', 'olympians', 'helmet', '.', 'secures', 'toddler', 'secures', 'secures', 'toddler', 'secures', 'secures', '.', 'secures', 'secures', '.', 'secures', '.', 'secures', 'secures', 'secures', 'secures', 'half', 'half', '.', 'secures', 'toddler', 'secures', 'secures', 'secures', 'mosaic', 'secures', 'secures', 'toddler']
468 |     [Epoch 1 / 10000]
469 |     => Saving checkpoint
470 |     Translated example sentence:
471 |     ['a', 'horse', 'walking', 'under', 'a', 'boat', 'next', 'to', 'a', 'boat', '.', '<eos>']
472 |     [Epoch 2 / 10000]
473 |     => Saving checkpoint
474 |     Translated example sentence:
475 |     ['a', 'horse', 'is', 'walking', 'under', 'a', 'bridge', 'next', 'to', 'a', 'boat', '.', '<eos>']
476 |     [Epoch 3 / 10000]
477 |     => Saving checkpoint
478 |     Translated example sentence:
479 |     ['a', 'horse', 'is', 'walking', 'under', 'a', 'bridge', 'next', 'to', 'a', 'boat', '.', '<eos>']
480 |     [Epoch 4 / 10000]
481 |     => Saving checkpoint
482 |     Translated example sentence:
483 |     ['a', 'horse', 'walks', 'under', 'a', 'bridge', 'next', 'to', 'a', 'boat', '.', '<eos>']
484 |     [Epoch 5 / 10000]
485 |     => Saving checkpoint
486 |     Translated example sentence:
487 |     ['a', 'horse', 'walks', 'under', 'a', 'bridge', 'next', 'to', 'a', 'boat', '.', '<eos>']
488 |     [Epoch 6 / 10000]
489 |     => Saving checkpoint
490 |     Translated example sentence:
491 |     ['a', 'horse', 'walks', 'underneath', 'a', 'bridge', 'next', 'to', 'a', 'boat', '.', '<eos>']
492 |     [Epoch 7 / 10000]
493 |     => Saving checkpoint
494 |     Translated example sentence:
495 |     ['a', 'horse', 'is', 'walking', 'under', 'a', 'bridge', 'beside', 'a', 'boat', '.', '<eos>']
496 |
497 |
498 |
499 | ```
500 |
501 | ```
502 |
-------------------------------------------------------------------------------- /docs/SetFit_SST_2_Few_shot.md: --------------------------------------------------------------------------------
1 | Open In Colab
2 |
3 | Reference Links:
4 |
5 | 1. https://towardsdatascience.com/sentence-transformer-fine-tuning-setfit-outperforms-gpt-3-on-few-shot-text-classification-while-d9a3788f0b4e
6 |
7 | 2. https://arxiv.org/pdf/2109.14076.pdf
8 |
9 | 3. https://huggingface.co/spaces/ought/raft-leaderboard
10 |
11 | 4. https://github.com/timoschick/pet
12 |
13 |
14 | # Init
15 |
16 |
17 | ```python
18 | !pip install sentence_transformers -q
19 | ```
20 |
21 | |████████████████████████████████| 78 kB 3.4 MB/s
22 | |████████████████████████████████| 3.4 MB 11.4 MB/s
23 | |████████████████████████████████| 3.3 MB 36.4 MB/s
24 | |████████████████████████████████| 1.2 MB 29.8 MB/s
25 | |████████████████████████████████| 61 kB 370 kB/s
26 | |████████████████████████████████| 596 kB 25.4 MB/s
27 | |████████████████████████████████| 895 kB 32.0 MB/s
28 | Building wheel for sentence-transformers (setup.py) ... done
29 |
30 |
31 |
32 | ```python
33 | from sklearn.metrics import accuracy_score, f1_score
34 | from sklearn.linear_model import LogisticRegression
35 | from sentence_transformers import SentenceTransformer, InputExample, losses, models, datasets, evaluation
36 | from torch.utils.data import DataLoader
37 |
38 | from sklearn.manifold import TSNE
39 | from matplotlib import pyplot as plt
40 |
41 | # import warnings filter
42 | from warnings import simplefilter
43 | # ignore all future warnings
44 | simplefilter(action='ignore', category=FutureWarning)
45 |
46 | import pandas as pd
47 | import numpy as np
48 |
49 | import torch
50 | import random
51 |
52 |
53 | def set_seed(seed):
54 |     random.seed(seed)
55 |     np.random.seed(seed)
56 |     torch.manual_seed(seed)
57 | ```
58 |
59 |
60 | ```python
61 | def sentence_pairs_generation(sentences, labels, pairs):
62 |     # append (sentence, sentence) InputExample pairs to `pairs`, labelled 1.0
63 |     # for a same-class (positive) pair and 0.0 for a cross-class (negative) pair
64 |
65 |     numClassesList = np.unique(labels)
66 |     idx = [np.where(labels == i)[0] for i in numClassesList]
67 |
68 |     for idxA in range(len(sentences)):
69 |         currentSentence = sentences[idxA]
70 |         label = labels[idxA]
71 |         idxB = np.random.choice(idx[np.where(numClassesList==label)[0][0]])
72 |         posSentence = sentences[idxB]
73 |         # prepare a positive pair and update the sentences and labels
74 |         # lists, respectively
75 |         pairs.append(InputExample(texts=[currentSentence, posSentence], label=1.0))
76 |
77 |         negIdx = np.where(labels != label)[0]
78 |         negSentence = sentences[np.random.choice(negIdx)]
79 |         # prepare a negative pair and update our lists
80 |         pairs.append(InputExample(texts=[currentSentence, negSentence], label=0.0))
81 |
82 |     # return the accumulated list of sentence pairs
83 |     return (pairs)
84 | ```
85 |
86 |
87 | ```python
88 | #SST-2
89 | # Load SST-2 dataset into a pandas dataframe.
90 |
91 | train_df = pd.read_csv('https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/train.tsv', delimiter='\t', header=None)
92 |
93 | # Load the test dataset into a pandas dataframe.
94 | eval_df = pd.read_csv('https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/test.tsv', delimiter='\t', header=None)
95 |
96 | text_col=train_df.columns.values[0]
97 | category_col=train_df.columns.values[1]
98 |
99 | x_eval = eval_df[text_col].values.tolist()
100 | y_eval = eval_df[category_col].values.tolist()
101 | ```
102 |
103 |
104 | ```python
105 | train_df.head()
106 | ```
107 |
108 |
109 |
110 |
111 |
112 |
113 | |   | 0 | 1 |
114 | | --- | --- | --- |
115 | | 0 | a stirring , funny and finally transporting re... | 1 |
116 | | 1 | apparently reassembled from the cutting room f... | 0 |
117 | | 2 | they presume their audience wo n't sit still f... | 0 |
118 | | 3 | this is a visually stunning rumination on love... | 1 |
119 | | 4 | jonathan parker 's bartleby should have been t... | 1 |
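To make the pair-construction step concrete, here is a toy run of `sentence_pairs_generation` from the Init section above. The four sentences and labels below are made-up stand-ins rather than SST-2 rows; each anchor sentence contributes one positive pair (label 1.0, same class) and one negative pair (label 0.0, opposite class), so n sentences yield 2n `InputExample` pairs.

```python
# Toy illustration with made-up data: 4 sentences -> 8 pairs
# (one positive and one negative pair per anchor sentence).
toy_x = np.array(["great movie", "loved every minute", "awful plot", "a boring mess"])
toy_y = np.array([1, 1, 0, 0])

toy_pairs = sentence_pairs_generation(toy_x, toy_y, [])
print(len(toy_pairs))                          # 8
print(toy_pairs[0].texts, toy_pairs[0].label)  # ['great movie', <same-class sentence>] 1.0
print(toy_pairs[1].texts, toy_pairs[1].label)  # ['great movie', <opposite-class sentence>] 0.0
```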
240 |
241 |
242 |
243 |
244 | # SetFit
245 |
246 |
247 | ```python
248 | #@title SetFit
249 | st_model = 'paraphrase-mpnet-base-v2' #@param ['paraphrase-mpnet-base-v2', 'all-mpnet-base-v1', 'all-mpnet-base-v2', 'stsb-mpnet-base-v2', 'all-MiniLM-L12-v2', 'paraphrase-albert-small-v2', 'all-roberta-large-v1']
250 | num_training = 32 #@param ["8", "16", "32", "54", "128", "256", "512"] {type:"raw"}
251 | num_itr = 5 #@param ["1", "2", "3", "4", "5", "10"] {type:"raw"}
252 | plot2d_checkbox = True #@param {type: 'boolean'}
253 |
254 | ```
255 |
256 |
257 | ```python
258 | set_seed(0)
259 | # Equal samples per class training
260 | train_df_sample = pd.concat([train_df[train_df[1]==0].sample(num_training), train_df[train_df[1]==1].sample(num_training)])
261 | x_train = train_df_sample[text_col].values.tolist()
262 | y_train = train_df_sample[category_col].values.tolist()
263 |
264 | ```
265 |
266 |
267 | ```python
268 | train_df_sample.shape
269 | ```
270 |
271 |
272 |
273 |
274 |     (64, 2)
275 |
276 |
277 |
278 |
279 | ```python
280 | x_train[0:5], y_train[0:5]
281 | ```
282 |
283 |
284 |
285 |
286 |     (['makes a joke out of car chases for an hour and then gives us half an hour of car chases',
287 |       "so mind numbingly awful that you hope britney wo n't do it one more time , as far as movies are concerned",
288 |       "maid in manhattan proves that it 's easier to change the sheets than to change hackneyed concepts when it comes to dreaming up romantic comedies",
289 |       'if you go , pack your knitting needles',
290 |       'time of favor could have given audiences the time of day by concentrating on the elements of a revealing alienation among a culture of people who sadly are at hostile odds with one another through recklessness and retaliation'],
291 |      [0, 0, 0, 0, 0])
292 |
293 |
294 |
295 |
296 | ```python
297 | train_examples = []
298 | for x in range(num_itr):
299 |     train_examples = sentence_pairs_generation(np.array(x_train), np.array(y_train), train_examples)
300 | ```
301 |
302 |
303 | ```python
304 | len(train_examples)
305 | ```
306 |
307 |
308 |
309 |
310 |     640
311 |
312 |
313 |
314 |
315 | ```python
316 | i_example = train_examples[0]
317 | ```
318 |
319 |
320 | ```python
321 | i_example.texts, i_example.label
322 | ```
323 |
324 |
325 |
326 |
327 |     (['makes a joke out of car chases for an hour and then gives us half an hour of car chases',
328 |       'it is that rare combination of bad writing , bad direction and bad acting the trifecta of badness'],
329 |      1.0)
330 |
331 |
332 |
333 |
334 | ```python
335 | orig_model = SentenceTransformer(st_model)
336 | model = SentenceTransformer(st_model)
337 | ```
338 |
339 |
340 |
-------------------------------------------------------------------------------- /docs/Wikipedia_answer_retrieval_DPR.md: --------------------------------------------------------------------------------
1 | Open In Colab
2 |
3 |
4 | ```python
5 | !pip install wikipedia transformers sentence_transformers faiss-cpu -q
6 | ```
7 |
8 | |████████████████████████████████| 3.4 MB 36.1 MB/s
9 | |████████████████████████████████| 78 kB 5.8 MB/s
10 | |████████████████████████████████| 8.6 MB 60.3 MB/s
11 | |████████████████████████████████| 3.3 MB 54.2 MB/s
12 | |████████████████████████████████| 895 kB 66.2 MB/s
13 | |████████████████████████████████| 67 kB 4.6 MB/s
14 | |████████████████████████████████| 596 kB 64.9 MB/s
15 | |████████████████████████████████| 1.2 MB 54.0 MB/s
16 | Building wheel for wikipedia (setup.py) ... done
17 | Building wheel for sentence-transformers (setup.py) ... done
18 |
19 |
20 |
21 | ```python
22 | import wikipedia
23 | from wikipedia.exceptions import DisambiguationError
24 | from transformers import pipeline
25 | ```
26 |
27 |
28 | ```python
29 | def divide_chunks(l, n):
30 |     # yield successive n-sized chunks of l
31 |     for i in range(0, len(l), n):
32 |         yield l[i:i + n]
33 |
34 | def get_passages(text, k=100):
35 |     tokens = text.split(" ")
36 |     tokens_chunks = list(divide_chunks(tokens, k))
37 |     passages = [" ".join(c) for c in tokens_chunks]
38 |     return passages
39 |
40 | def get_passage_for_question(question, wiki_hits=3, passage_len=100, debug=False):
41 |     top_hits = wikipedia.search(question, wiki_hits)
42 |     if debug:
43 |         print("Top Wiki hits :", top_hits)
44 |     passages = []
45 |     for hit in top_hits:
46 |         try:
47 |             html_page = wikipedia.page(title = hit, auto_suggest = False)
48 |         except DisambiguationError:
49 |             continue
50 |         hit_passages = get_passages(html_page.content, k=passage_len)
51 |         passages.extend(hit_passages)
52 |
53 |     return passages
54 | ```
55 |
56 |
57 | ```python
58 | qa = pipeline("question-answering", model="ankur310794/roberta-base-squad2-nq")
59 | ```
60 |
61 |
62 |
230 |             answer_dict['answer_sentence'] = all_sents[i-1]
231 |             return answer_dict
232 |     return answer_dict
233 | ```
234 |
235 |
236 | ```python
237 | from sentence_transformers import CrossEncoder
238 | ranking_model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2', max_length=196)
239 | def get_reranked_passage(passages, question, top_rr):
240 |     passage_question_pair = [(question, p) for p in passages]
241 |     scores = ranking_model.predict(passage_question_pair)
242 |     sorted_index = np.argpartition(scores, -top_rr)[::-1]  # top_rr highest-scoring indices first (argpartition gives an unordered top-k)
243 |     sorted_scores = np.array([scores[i] for i in sorted_index])
244 |     return [passages[i] for i in sorted_index[0:top_rr]]
245 | ```
246 |
247 |
248 | ```python
249 | # end to end with dpr
250 | topk_r=30
251 | topk_rr=8
252 | import pandas as pd
253 | import numpy as np
254 | import faiss
255 | def get_answer_dpr(question):
256 |     passages = get_passage_for_question(question, debug=True)
257 |     print("Total passages: ", len(passages))
258 |     passage_embeddings = extracted_passage_embeddings(passages)
259 |     query_embeddings = extracted_query_embeddings([question])
260 |     faiss_index = faiss.IndexFlatL2(128)
261 |     faiss_index.add(passage_embeddings.pooler_output)
262 |     prob, index = faiss_index.search(query_embeddings.pooler_output, k=topk_r)
263 |     r_passages = [passages[i] for i in index[0]]
264 |     print("Top k retrieved passages :", len(r_passages))
265 |     rr_passages = get_reranked_passage(r_passages, question, topk_rr)
266 |     print("Top k reranked passages :", len(rr_passages))
267 |     m_passages = combine_results(rr_passages)
268 |     print("Merged passages :", len(m_passages))
269 |     results = qa(question=[question]*len(m_passages), context=m_passages, max_seq_len=512)
270 |     if isinstance(results, dict):
271 |         results = [results]
272 |     output_results = [get_answer_full_sent(m_passages[i], results[i]) for i in range(len(results))]
273 |     return pd.DataFrame(output_results)[['answer', 'answer_sentence', 'score']].sort_values("score", ascending=False)
274 | ```
275 |
276 |
277 | ```python
278 | results = get_answer_dpr("where was tara located in gone with the wind?")
279 | results
280 | ```
281 |
282 | Top Wiki hits : ['Tara (plantation)', 'Margaret Mitchell', 'RKO Forty Acres']
283 | Total passages: 95
284 | 1/1 [==============================] - 0s 470ms/step
285 | 1/1 [==============================] - 0s 21ms/step
286 | Top k retrieved
passages : 30 287 | Top k reranked passages : 8 288 | Merged passages : 2 289 | 290 | 291 | /usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray 292 | return array(a, dtype, copy=False, order=order) 293 | 294 | 295 | 296 | 297 | 298 | 299 |
300 | |   | answer | answer_sentence | score |
301 | | --- | --- | --- | --- |
302 | | 0 | Talmadge Farms | Now the Tara facade is still located at Talmad... | 0.740677 |
303 | | 1 | virtually the same | In the 2007 novel by Donald McCaig, Rhett Butl... | 0.159294 |
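The retrieval step inside `get_answer_dpr` above is an exact (brute-force) L2 search over 128-dimensional DPR pooler outputs. A self-contained sketch of the same FAISS pattern, using random vectors in place of real embeddings (all shapes and values below are illustrative):

```python
import faiss
import numpy as np

d = 128                                        # DPR embedding dimensionality
passage_vecs = np.random.rand(95, d).astype("float32")  # stand-in passage embeddings
query_vec = np.random.rand(1, d).astype("float32")      # stand-in query embedding

index = faiss.IndexFlatL2(d)                   # exact L2 index; no training step needed
index.add(passage_vecs)                        # index every passage embedding
distances, ids = index.search(query_vec, 30)   # 30 nearest passages for the query
print(ids.shape)                               # (1, 30); ids[0] indexes into the passage list
```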
415 | 416 | 417 | 418 | 419 | 420 | ```python 421 | results.sort_values("score", ascending=False) 422 | ``` 423 | 424 | 425 | 426 | 427 | 428 |
429 | |   | answer | answer_sentence | score |
430 | | --- | --- | --- | --- |
431 | | 0 | Talmadge Farms | Now the Tara facade is still located at Talmad... | 0.740677 |
432 | | 1 | virtually the same | In the 2007 novel by Donald McCaig, Rhett Butl... | 0.159294 |
544 |
545 |
546 |
547 |
548 |
549 | ```python
550 | !pip install gradio -q
551 | ```
552 |
553 | |████████████████████████████████| 865 kB 21.2 MB/s
554 | |████████████████████████████████| 2.0 MB 37.0 MB/s
555 | |████████████████████████████████| 210 kB 50.3 MB/s
556 | |████████████████████████████████| 61 kB 345 kB/s
557 | |████████████████████████████████| 856 kB 42.0 MB/s
558 | |████████████████████████████████| 3.6 MB 54.9 MB/s
559 | Building wheel for ffmpy (setup.py) ... done
560 | Building wheel for flask-cachebuster (setup.py) ... done
561 |
562 |
563 |
564 | ```python
565 | import gradio as gr
566 | inp = gr.inputs.Textbox(lines=2, default='what is coronavirus?', label="Question")
567 | out = gr.outputs.Dataframe(label="Answers")  # gr.outputs.Textbox(label="Answers")
568 | gr.Interface(fn=get_answer_dpr, inputs=inp, outputs=out).launch()
569 | ```
570 |
571 | Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`
572 | Running on public URL: https://21615.gradio.app
573 |
574 | This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)
575 |
576 |
577 |
578 |
579 |
586 |
587 |
588 |
589 |
590 |
591 |
592 | (,
593 | 'http://127.0.0.1:7860/',
594 | 'https://21615.gradio.app')
595 |
596 |
597 |
598 |
599 | ```python
600 |
601 | ```
602 |
-------------------------------------------------------------------------------- /docs/index.md: --------------------------------------------------------------------------------
1 | # NLP-Notebooks Collection
-------------------------------------------------------------------------------- /docs/knowledge_distillation_exploration.md: --------------------------------------------------------------------------------
1 | Open In Colab
2 |
3 |
4 | ```
5 | import tensorflow as tf
6 | from tensorflow import keras
7 | from tensorflow.keras import layers
8 | ```
9 |
10 |
11 | ```
12 | vocab_size = 20000  # Only consider the top 20k words
13 | maxlen = 200  # Only consider the first 200 words of each movie review
14 | (x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=vocab_size)
15 | print(len(x_train), "Training sequences")
16 | print(len(x_val), "Validation sequences")
17 | x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
18 | x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)
19 | ```
20 |
21 |     25000 Training sequences
22 |     25000 Validation sequences
23 |
24 |
25 |
26 | ```
27 | class Distiller(keras.Model):
28 |     def __init__(self, student, teacher):
29 |         super(Distiller, self).__init__()
30 |         self.teacher = teacher
31 |         self.student = student
32 |
33 |     def compile(
34 |         self,
35 |         optimizer,
36 |         metrics,
37 |         student_loss_fn,
38 |         distillation_loss_fn,
39 |         alpha=0.1,
40 |         temperature=3,
41 |     ):
42 |         """ Configure the distiller.
43 |
44 |         Args:
45 |             optimizer: Keras optimizer for the student weights
46 |             metrics: Keras metrics for evaluation
47 |             student_loss_fn: Loss function of difference between student
48 |                 predictions and ground-truth
49 |             distillation_loss_fn: Loss function of difference between soft
50 |                 student predictions and soft teacher predictions
51 |             alpha: weight to student_loss_fn and 1-alpha to distillation_loss_fn
52 |             temperature: Temperature for softening probability distributions.
53 |                 Larger temperature gives softer distributions.
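        The combined objective computed in train_step below is
        loss = alpha * student_loss + (1 - alpha) * distillation_loss,
        where the distillation term compares the temperature-softened
        softmax outputs of the teacher and the student.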
54 |         """
55 |         super(Distiller, self).compile(optimizer=optimizer, metrics=metrics)
56 |         self.student_loss_fn = student_loss_fn
57 |         self.distillation_loss_fn = distillation_loss_fn
58 |         self.alpha = alpha
59 |         self.temperature = temperature
60 |
61 |     def train_step(self, data):
62 |         # Unpack data
63 |         x, y = data
64 |
65 |         # Forward pass of teacher
66 |         teacher_predictions = self.teacher(x, training=False)
67 |
68 |         with tf.GradientTape() as tape:
69 |             # Forward pass of student
70 |             student_predictions = self.student(x, training=True)
71 |
72 |             # Compute losses
73 |             student_loss = self.student_loss_fn(y, student_predictions)
74 |             distillation_loss = self.distillation_loss_fn(
75 |                 tf.nn.softmax(teacher_predictions / self.temperature, axis=1),
76 |                 tf.nn.softmax(student_predictions / self.temperature, axis=1),
77 |             )
78 |             loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss
79 |
80 |         # Compute gradients
81 |         trainable_vars = self.student.trainable_variables
82 |         gradients = tape.gradient(loss, trainable_vars)
83 |
84 |         # Update weights
85 |         self.optimizer.apply_gradients(zip(gradients, trainable_vars))
86 |
87 |         # Update the metrics configured in `compile()`.
88 |         self.compiled_metrics.update_state(y, student_predictions)
89 |
90 |         # Return a dict of performance
91 |         results = {m.name: m.result() for m in self.metrics}
92 |         results.update(
93 |             {"student_loss": student_loss, "distillation_loss": distillation_loss}
94 |         )
95 |         return results
96 |
97 |     def test_step(self, data):
98 |         # Unpack the data
99 |         x, y = data
100 |
101 |         # Compute predictions
102 |         y_prediction = self.student(x, training=False)
103 |
104 |         # Calculate the loss
105 |         student_loss = self.student_loss_fn(y, y_prediction)
106 |
107 |         # Update the metrics.
108 |         self.compiled_metrics.update_state(y, y_prediction)
109 |
110 |         # Return a dict of performance
111 |         results = {m.name: m.result() for m in self.metrics}
112 |         results.update({"student_loss": student_loss})
113 |         return results
114 | ```
115 |
116 |
117 | ```
118 | # Create the teacher
119 | # Input for variable-length sequences of integers
120 | inputs = keras.Input(shape=(maxlen,), dtype="int32")
121 | # Embed each integer in a 128-dimensional vector
122 | x = layers.Embedding(vocab_size, 128)(inputs)
123 | # Add a bidirectional LSTM followed by a bidirectional GRU
124 | x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
125 | x = layers.Bidirectional(layers.GRU(128))(x)
126 | # Add a classifier
127 | outputs = layers.Dense(2)(x)
128 | teacher = keras.Model(inputs, outputs, name='teacher')
129 | teacher.summary()
130 |
131 |
132 | # Create the student
133 | # Input for variable-length sequences of integers
134 | inputs = keras.Input(shape=(maxlen,), dtype="int32")
135 | # Embed each integer in a 64-dimensional vector
136 | x = layers.Embedding(vocab_size, 64)(inputs)
137 | # Add a bidirectional LSTM followed by a bidirectional GRU
138 | x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
139 | x = layers.Bidirectional(layers.GRU(32))(x)
140 | # Add a classifier
141 | outputs = layers.Dense(2)(x)
142 | student = keras.Model(inputs, outputs, name='student')
143 | student.summary()
144 |
145 | # Clone student for later comparison
146 | student_scratch = keras.models.clone_model(student)
147 | ```
148 |
149 |     Model: "teacher"
150 |     _________________________________________________________________
151 |     Layer (type)                 Output Shape              Param #
152 |     =================================================================
153 |     input_1 (InputLayer)         [(None, 200)]             0
154 |
_________________________________________________________________ 155 | embedding (Embedding) (None, 200, 128) 2560000 156 | _________________________________________________________________ 157 | bidirectional (Bidirectional (None, 200, 128) 98816 158 | _________________________________________________________________ 159 | bidirectional_1 (Bidirection (None, 256) 198144 160 | _________________________________________________________________ 161 | dense (Dense) (None, 2) 514 162 | ================================================================= 163 | Total params: 2,857,474 164 | Trainable params: 2,857,474 165 | Non-trainable params: 0 166 | _________________________________________________________________ 167 | Model: "student" 168 | _________________________________________________________________ 169 | Layer (type) Output Shape Param # 170 | ================================================================= 171 | input_2 (InputLayer) [(None, 200)] 0 172 | _________________________________________________________________ 173 | embedding_1 (Embedding) (None, 200, 64) 1280000 174 | _________________________________________________________________ 175 | bidirectional_2 (Bidirection (None, 200, 64) 24832 176 | _________________________________________________________________ 177 | bidirectional_3 (Bidirection (None, 64) 18816 178 | _________________________________________________________________ 179 | dense_1 (Dense) (None, 2) 130 180 | ================================================================= 181 | Total params: 1,323,778 182 | Trainable params: 1,323,778 183 | Non-trainable params: 0 184 | _________________________________________________________________ 185 | 186 | 187 | 188 | ``` 189 | # Train teacher as usual 190 | teacher.compile( 191 | optimizer=keras.optimizers.Adam(), 192 | loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True), 193 | metrics=[keras.metrics.SparseCategoricalAccuracy()], 194 | ) 195 | 196 | # Train and evaluate teacher on data. 
197 | teacher.fit(x_train, y_train, epochs=3)
198 | teacher.evaluate(x_val, y_val)
199 | ```
200 |
201 |     Epoch 1/3
202 |     782/782 [==============================] - 43s 55ms/step - loss: 0.3714 - sparse_categorical_accuracy: 0.8351
203 |     Epoch 2/3
204 |     782/782 [==============================] - 43s 55ms/step - loss: 0.2004 - sparse_categorical_accuracy: 0.9232
205 |     Epoch 3/3
206 |     782/782 [==============================] - 42s 54ms/step - loss: 0.1297 - sparse_categorical_accuracy: 0.9519
207 |     782/782 [==============================] - 12s 15ms/step - loss: 0.4922 - sparse_categorical_accuracy: 0.8590
208 |
209 |
210 |
211 |
212 |
213 |     [0.49218621850013733, 0.8590400218963623]
214 |
215 |
216 |
217 |
218 | ```
219 | # Initialize and compile distiller
220 | distiller = Distiller(student=student, teacher=teacher)
221 | distiller.compile(
222 |     optimizer=keras.optimizers.Adam(),
223 |     metrics=[keras.metrics.SparseCategoricalAccuracy()],
224 |     student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
225 |     distillation_loss_fn=keras.losses.KLDivergence(),
226 |     alpha=0.1,
227 |     temperature=10,
228 | )
229 |
230 | # Distill teacher to student
231 | distiller.fit(x_train, y_train, epochs=3)
232 |
233 | # Evaluate student on the held-out validation data
234 | distiller.evaluate(x_val, y_val)
235 | ```
236 |
237 |     Epoch 1/3
238 |     782/782 [==============================] - 53s 68ms/step - sparse_categorical_accuracy: 0.9664 - student_loss: 0.1382 - distillation_loss: 0.0084
239 |     Epoch 2/3
240 |     782/782 [==============================] - 54s 69ms/step - sparse_categorical_accuracy: 0.9781 - student_loss: 0.1119 - distillation_loss: 0.0085
241 |     Epoch 3/3
242 |     782/782 [==============================] - 54s 69ms/step - sparse_categorical_accuracy: 0.9868 - student_loss: 0.0937 - distillation_loss: 0.0088
243 |     782/782 [==============================] - 12s 15ms/step - sparse_categorical_accuracy: 0.8476 - student_loss: 0.3908
244 |
245 |
246 |
247 |
248 |
249 |     0.8475599884986877
250 |
251 |
252 |
253 |
254 | ```
255 | # Train the student from scratch, as is usually done (no distillation)
256 | student_scratch.compile(
257 |     optimizer=keras.optimizers.Adam(),
258 |     loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
259 |     metrics=[keras.metrics.SparseCategoricalAccuracy()],
260 | )
261 |
262 | # Train and evaluate student trained from scratch.
263 | student_scratch.fit(x_train, y_train, epochs=3)
264 |
265 | ```
266 |
267 |     Epoch 1/3
268 |     782/782 [==============================] - 31s 39ms/step - loss: 0.3796 - sparse_categorical_accuracy: 0.8246
269 |     Epoch 2/3
270 |     782/782 [==============================] - 30s 39ms/step - loss: 0.1918 - sparse_categorical_accuracy: 0.9283
271 |     Epoch 3/3
272 |     782/782 [==============================] - 30s 39ms/step - loss: 0.1123 - sparse_categorical_accuracy: 0.9602
273 |
274 |
275 |
276 |     ---------------------------------------------------------------------------
277 |
278 |     NameError                                 Traceback (most recent call last)
279 |
280 |     in ()
281 |           8 # Train and evaluate student trained from scratch.
282 |           9 student_scratch.fit(x_train, y_train, epochs=3)
283 |     ---> 10 student_scratch.evaluate(x_test, y_test)
284 |
285 |
286 |     NameError: name 'x_test' is not defined
287 |
288 |
289 |
290 | ```
291 | student_scratch.evaluate(x_val, y_val)
292 | ```
293 |
294 |     782/782 [==============================] - 12s 15ms/step - loss: 0.3720 - sparse_categorical_accuracy: 0.8596
295 |
296 |
297 |
298 |
299 |
300 |     [0.37198370695114136, 0.8595600128173828]
301 |
302 |
303 |
304 |
305 | ```
306 |
307 | ```
308 |
-------------------------------------------------------------------------------- /docs/token_classification_transformers_zenml_files/token_classification_transformers_zenml_36_1.svg: --------------------------------------------------------------------------------
1 | Pipeline DAG (labels recovered from the SVG): data_importer -> output (datasets.dataset_dict.DatasetDict) -> tokenization -> output (datasets.dataset_dict.DatasetDict) -> trainer and evaluator; trainer -> output (transformers.models.distilbert.modeling_tf_distilbert.TFDistilBertForTokenClassification) -> evaluator; evaluator -> output (builtins.float).
-------------------------------------------------------------------------------- /knowledge-distillation/knowledge_distillation_exploration.ipynb: --------------------------------------------------------------------------------
1 | {
2 |   "nbformat": 4,
3 |   "nbformat_minor": 0,
4 |   "metadata": {
5 |     "colab": {
6 |       "name": "knowledge_distillation_exploration.ipynb",
7 |       "provenance": [],
8 |       "authorship_tag": "ABX9TyPqqa3lzi3AUvvZifiQJtP8",
9 |       "include_colab_link": true
10 |     },
11 |     "kernelspec": {
12 |       "name": "python3",
13 |       "display_name": "Python 3"
14 |     },
15 |     "accelerator": "GPU"
16 |   },
17 |   "cells": [
18 |     {
19 |       "cell_type": "markdown",
20 |       "metadata": {
21 |         "id": "view-in-github",
22 |         "colab_type": "text"
23 |       },
24 |       "source": [
25 |         "\"Open"
26 |       ]
27 |     },
28 |     {
29 |       "cell_type": "code",
30 |       "metadata": {
31 |         "id": "ykxHS6TBX6eS",
32 |         "colab_type": "code",
33 |         "colab": {}
34 |       },
35 |       "source": [
36 |         "import tensorflow as tf\n",
37 |         "from tensorflow import keras\n",
38 |         "from tensorflow.keras import layers"
39 |       ],
40 |       "execution_count": 1,
41 |       "outputs": []
42 |     },
43 |     {
44 |       "cell_type": "code",
45 |       "metadata": {
46 |         "id": "ryolGkbbZJ2b",
47 |         "colab_type": "code",
48 |         "colab": {
49 |           "base_uri": "https://localhost:8080/",
50 |           "height": 51
51 |         },
52 |         "outputId": "e3242dc0-00a2-47e1-9d64-1d8212a07153"
53 |       },
54 |       "source": [
55 |         "vocab_size = 20000  # Only consider the top 20k words\n",
56 |         "maxlen = 200  # Only consider the first 200 words of each movie review\n",
57 |         "(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=vocab_size)\n",
58 |         "print(len(x_train), \"Training sequences\")\n",
59 |         "print(len(x_val), \"Validation sequences\")\n",
"x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)\n", 61 | "x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)" 62 | ], 63 | "execution_count": 2, 64 | "outputs": [ 65 | { 66 | "output_type": "stream", 67 | "text": [ 68 | "25000 Training sequences\n", 69 | "25000 Validation sequences\n" 70 | ], 71 | "name": "stdout" 72 | } 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "metadata": { 78 | "id": "WGiLAkFRZP3S", 79 | "colab_type": "code", 80 | "colab": {} 81 | }, 82 | "source": [ 83 | "class Distiller(keras.Model):\n", 84 | " def __init__(self, student, teacher):\n", 85 | " super(Distiller, self).__init__()\n", 86 | " self.teacher = student\n", 87 | " self.student = teacher\n", 88 | "\n", 89 | " def compile(\n", 90 | " self,\n", 91 | " optimizer,\n", 92 | " metrics,\n", 93 | " student_loss_fn,\n", 94 | " distillation_loss_fn,\n", 95 | " alpha=0.1,\n", 96 | " temperature=3,\n", 97 | " ):\n", 98 | " \"\"\" Configure the distiller.\n", 99 | "\n", 100 | " Args:\n", 101 | " optimizer: Keras optimizer for the student weights\n", 102 | " metrics: Keras metrics for evaluation\n", 103 | " student_loss_fn: Loss function of difference between student\n", 104 | " predictions and ground-truth\n", 105 | " distillation_loss_fn: Loss function of difference between soft\n", 106 | " student predictions and soft teacher predictions\n", 107 | " alpha: weight to student_loss_fn and 1-alpha to distillation_loss_fn\n", 108 | " temperature: Temperature for softening probability distributions.\n", 109 | " Larger temperature gives softer distributions.\n", 110 | " \"\"\"\n", 111 | " super(Distiller, self).compile(optimizer=optimizer, metrics=metrics)\n", 112 | " self.student_loss_fn = student_loss_fn\n", 113 | " self.distillation_loss_fn = distillation_loss_fn\n", 114 | " self.alpha = alpha\n", 115 | " self.temperature = temperature\n", 116 | "\n", 117 | " def train_step(self, data):\n", 118 | " # Unpack data\n", 119 | " x, y = data\n", 120 | "\n", 121 | " # Forward pass of teacher\n", 122 | " teacher_predictions = self.teacher(x, training=False)\n", 123 | "\n", 124 | " with tf.GradientTape() as tape:\n", 125 | " # Forward pass of student\n", 126 | " student_predictions = self.student(x, training=True)\n", 127 | "\n", 128 | " # Compute losses\n", 129 | " student_loss = self.student_loss_fn(y, student_predictions)\n", 130 | " distillation_loss = self.distillation_loss_fn(\n", 131 | " tf.nn.softmax(teacher_predictions / self.temperature, axis=1),\n", 132 | " tf.nn.softmax(student_predictions / self.temperature, axis=1),\n", 133 | " )\n", 134 | " loss = self.alpha * student_loss + (1 - self.alpha) * distillation_loss\n", 135 | "\n", 136 | " # Compute gradients\n", 137 | " trainable_vars = self.student.trainable_variables\n", 138 | " gradients = tape.gradient(loss, trainable_vars)\n", 139 | "\n", 140 | " # Update weights\n", 141 | " self.optimizer.apply_gradients(zip(gradients, trainable_vars))\n", 142 | "\n", 143 | " # Update the metrics configured in `compile()`.\n", 144 | " self.compiled_metrics.update_state(y, student_predictions)\n", 145 | "\n", 146 | " # Return a dict of performance\n", 147 | " results = {m.name: m.result() for m in self.metrics}\n", 148 | " results.update(\n", 149 | " {\"student_loss\": student_loss, \"distillation_loss\": distillation_loss}\n", 150 | " )\n", 151 | " return results\n", 152 | "\n", 153 | " def test_step(self, data):\n", 154 | " # Unpack the data\n", 155 | " x, y = data\n", 156 | "\n", 157 | " # Compute predictions\n", 158 | " 
y_prediction = self.student(x, training=False)\n", 159 | "\n", 160 | " # Calculate the loss\n", 161 | " student_loss = self.student_loss_fn(y, y_prediction)\n", 162 | "\n", 163 | " # Update the metrics.\n", 164 | " self.compiled_metrics.update_state(y, y_prediction)\n", 165 | "\n", 166 | " # Return a dict of performance\n", 167 | " results = {m.name: m.result() for m in self.metrics}\n", 168 | " results.update({\"student_loss\": student_loss})\n", 169 | " return results" 170 | ], 171 | "execution_count": 3, 172 | "outputs": [] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "metadata": { 177 | "id": "Hbo44nFoZclu", 178 | "colab_type": "code", 179 | "colab": { 180 | "base_uri": "https://localhost:8080/", 181 | "height": 629 182 | }, 183 | "outputId": "2cad14a0-0bb4-4cf6-ba87-eba58ed59309" 184 | }, 185 | "source": [ 186 | "# Create the teacher\n", 187 | "# Input for variable-length sequences of integers\n", 188 | "inputs = keras.Input(shape=(maxlen,), dtype=\"int32\")\n", 189 | "# Embed each integer in a 128-dimensional vector\n", 190 | "x = layers.Embedding(vocab_size, 128)(inputs)\n", 191 | "# Add 2 bidirectional LSTMs and GRUs\n", 192 | "x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)\n", 193 | "x = layers.Bidirectional(layers.GRU(128))(x)\n", 194 | "# Add a classifier\n", 195 | "outputs = layers.Dense(2)(x)\n", 196 | "teacher = keras.Model(inputs, outputs, name='teacher')\n", 197 | "teacher.summary()\n", 198 | "\n", 199 | "\n", 200 | "# Create the student\n", 201 | "# Input for variable-length sequences of integers\n", 202 | "inputs = keras.Input(shape=(maxlen,), dtype=\"int32\")\n", 203 | "# Embed each integer in a 128-dimensional vector\n", 204 | "x = layers.Embedding(vocab_size, 64)(inputs)\n", 205 | "# Add 2 bidirectional LSTMs and GRUs\n", 206 | "x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)\n", 207 | "x = layers.Bidirectional(layers.GRU(32))(x)\n", 208 | "# Add a classifier\n", 209 | "outputs = layers.Dense(2)(x)\n", 210 | "student = keras.Model(inputs, outputs, name='student')\n", 211 | "student.summary()\n", 212 | "\n", 213 | "# Clone student for later comparison\n", 214 | "student_scratch = keras.models.clone_model(student)" 215 | ], 216 | "execution_count": 4, 217 | "outputs": [ 218 | { 219 | "output_type": "stream", 220 | "text": [ 221 | "Model: \"teacher\"\n", 222 | "_________________________________________________________________\n", 223 | "Layer (type) Output Shape Param # \n", 224 | "=================================================================\n", 225 | "input_1 (InputLayer) [(None, 200)] 0 \n", 226 | "_________________________________________________________________\n", 227 | "embedding (Embedding) (None, 200, 128) 2560000 \n", 228 | "_________________________________________________________________\n", 229 | "bidirectional (Bidirectional (None, 200, 128) 98816 \n", 230 | "_________________________________________________________________\n", 231 | "bidirectional_1 (Bidirection (None, 256) 198144 \n", 232 | "_________________________________________________________________\n", 233 | "dense (Dense) (None, 2) 514 \n", 234 | "=================================================================\n", 235 | "Total params: 2,857,474\n", 236 | "Trainable params: 2,857,474\n", 237 | "Non-trainable params: 0\n", 238 | "_________________________________________________________________\n", 239 | "Model: \"student\"\n", 240 | "_________________________________________________________________\n", 241 | "Layer (type) Output Shape Param # \n", 
242 | "=================================================================\n", 243 | "input_2 (InputLayer) [(None, 200)] 0 \n", 244 | "_________________________________________________________________\n", 245 | "embedding_1 (Embedding) (None, 200, 64) 1280000 \n", 246 | "_________________________________________________________________\n", 247 | "bidirectional_2 (Bidirection (None, 200, 64) 24832 \n", 248 | "_________________________________________________________________\n", 249 | "bidirectional_3 (Bidirection (None, 64) 18816 \n", 250 | "_________________________________________________________________\n", 251 | "dense_1 (Dense) (None, 2) 130 \n", 252 | "=================================================================\n", 253 | "Total params: 1,323,778\n", 254 | "Trainable params: 1,323,778\n", 255 | "Non-trainable params: 0\n", 256 | "_________________________________________________________________\n" 257 | ], 258 | "name": "stdout" 259 | } 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "metadata": { 265 | "id": "C0t_Mc07aBH1", 266 | "colab_type": "code", 267 | "colab": { 268 | "base_uri": "https://localhost:8080/", 269 | "height": 153 270 | }, 271 | "outputId": "84e82f1d-9dcd-44d4-e078-87440bccd1cb" 272 | }, 273 | "source": [ 274 | "# Train teacher as usual\n", 275 | "teacher.compile(\n", 276 | " optimizer=keras.optimizers.Adam(),\n", 277 | " loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", 278 | " metrics=[keras.metrics.SparseCategoricalAccuracy()],\n", 279 | ")\n", 280 | "\n", 281 | "# Train and evaluate teacher on data.\n", 282 | "teacher.fit(x_train, y_train, epochs=3)\n", 283 | "teacher.evaluate(x_val, y_val)" 284 | ], 285 | "execution_count": 5, 286 | "outputs": [ 287 | { 288 | "output_type": "stream", 289 | "text": [ 290 | "Epoch 1/3\n", 291 | "782/782 [==============================] - 43s 55ms/step - loss: 0.3714 - sparse_categorical_accuracy: 0.8351\n", 292 | "Epoch 2/3\n", 293 | "782/782 [==============================] - 43s 55ms/step - loss: 0.2004 - sparse_categorical_accuracy: 0.9232\n", 294 | "Epoch 3/3\n", 295 | "782/782 [==============================] - 42s 54ms/step - loss: 0.1297 - sparse_categorical_accuracy: 0.9519\n", 296 | "782/782 [==============================] - 12s 15ms/step - loss: 0.4922 - sparse_categorical_accuracy: 0.8590\n" 297 | ], 298 | "name": "stdout" 299 | }, 300 | { 301 | "output_type": "execute_result", 302 | "data": { 303 | "text/plain": [ 304 | "[0.49218621850013733, 0.8590400218963623]" 305 | ] 306 | }, 307 | "metadata": { 308 | "tags": [] 309 | }, 310 | "execution_count": 5 311 | } 312 | ] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "metadata": { 317 | "id": "6diJumxgaU1e", 318 | "colab_type": "code", 319 | "colab": { 320 | "base_uri": "https://localhost:8080/", 321 | "height": 153 322 | }, 323 | "outputId": "827b0b52-e59a-4c83-81e7-8252a0a6287d" 324 | }, 325 | "source": [ 326 | "# Initialize and compile distiller\n", 327 | "distiller = Distiller(student=student, teacher=teacher)\n", 328 | "distiller.compile(\n", 329 | " optimizer=keras.optimizers.Adam(),\n", 330 | " metrics=[keras.metrics.SparseCategoricalAccuracy()],\n", 331 | " student_loss_fn=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", 332 | " distillation_loss_fn=keras.losses.KLDivergence(),\n", 333 | " alpha=0.1,\n", 334 | " temperature=10,\n", 335 | ")\n", 336 | "\n", 337 | "# Distill teacher to student\n", 338 | "distiller.fit(x_train, y_train, epochs=3)\n", 339 | "\n", 340 | "# Evaluate student on test dataset\n", 341 | 
"distiller.evaluate(x_val, y_val)" 342 | ], 343 | "execution_count": 6, 344 | "outputs": [ 345 | { 346 | "output_type": "stream", 347 | "text": [ 348 | "Epoch 1/3\n", 349 | "782/782 [==============================] - 53s 68ms/step - sparse_categorical_accuracy: 0.9664 - student_loss: 0.1382 - distillation_loss: 0.0084\n", 350 | "Epoch 2/3\n", 351 | "782/782 [==============================] - 54s 69ms/step - sparse_categorical_accuracy: 0.9781 - student_loss: 0.1119 - distillation_loss: 0.0085\n", 352 | "Epoch 3/3\n", 353 | "782/782 [==============================] - 54s 69ms/step - sparse_categorical_accuracy: 0.9868 - student_loss: 0.0937 - distillation_loss: 0.0088\n", 354 | "782/782 [==============================] - 12s 15ms/step - sparse_categorical_accuracy: 0.8476 - student_loss: 0.3908\n" 355 | ], 356 | "name": "stdout" 357 | }, 358 | { 359 | "output_type": "execute_result", 360 | "data": { 361 | "text/plain": [ 362 | "0.8475599884986877" 363 | ] 364 | }, 365 | "metadata": { 366 | "tags": [] 367 | }, 368 | "execution_count": 6 369 | } 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "metadata": { 375 | "id": "v_sZMYFXafAc", 376 | "colab_type": "code", 377 | "colab": { 378 | "base_uri": "https://localhost:8080/", 379 | "height": 300 380 | }, 381 | "outputId": "98b839c7-b5f5-4a40-c822-fefd510eedb9" 382 | }, 383 | "source": [ 384 | "# Train student as doen usually\n", 385 | "student_scratch.compile(\n", 386 | " optimizer=keras.optimizers.Adam(),\n", 387 | " loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),\n", 388 | " metrics=[keras.metrics.SparseCategoricalAccuracy()],\n", 389 | ")\n", 390 | "\n", 391 | "# Train and evaluate student trained from scratch.\n", 392 | "student_scratch.fit(x_train, y_train, epochs=3)\n" 393 | ], 394 | "execution_count": 7, 395 | "outputs": [ 396 | { 397 | "output_type": "stream", 398 | "text": [ 399 | "Epoch 1/3\n", 400 | "782/782 [==============================] - 31s 39ms/step - loss: 0.3796 - sparse_categorical_accuracy: 0.8246\n", 401 | "Epoch 2/3\n", 402 | "782/782 [==============================] - 30s 39ms/step - loss: 0.1918 - sparse_categorical_accuracy: 0.9283\n", 403 | "Epoch 3/3\n", 404 | "782/782 [==============================] - 30s 39ms/step - loss: 0.1123 - sparse_categorical_accuracy: 0.9602\n" 405 | ], 406 | "name": "stdout" 407 | }, 408 | { 409 | "output_type": "error", 410 | "ename": "NameError", 411 | "evalue": "ignored", 412 | "traceback": [ 413 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 414 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 415 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0;31m# Train and evaluate student trained from scratch.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mstudent_scratch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mepochs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 10\u001b[0;31m \u001b[0mstudent_scratch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mevaluate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx_test\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_test\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 416 | 
"\u001b[0;31mNameError\u001b[0m: name 'x_test' is not defined" 417 | ] 418 | } 419 | ] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "metadata": { 424 | "id": "xmd5LaNpakwp", 425 | "colab_type": "code", 426 | "colab": { 427 | "base_uri": "https://localhost:8080/", 428 | "height": 51 429 | }, 430 | "outputId": "f19a9c16-5ff2-4a24-ebe0-0bec4e90e0f9" 431 | }, 432 | "source": [ 433 | "student_scratch.evaluate(x_val, y_val)" 434 | ], 435 | "execution_count": 9, 436 | "outputs": [ 437 | { 438 | "output_type": "stream", 439 | "text": [ 440 | "782/782 [==============================] - 12s 15ms/step - loss: 0.3720 - sparse_categorical_accuracy: 0.8596\n" 441 | ], 442 | "name": "stdout" 443 | }, 444 | { 445 | "output_type": "execute_result", 446 | "data": { 447 | "text/plain": [ 448 | "[0.37198370695114136, 0.8595600128173828]" 449 | ] 450 | }, 451 | "metadata": { 452 | "tags": [] 453 | }, 454 | "execution_count": 9 455 | } 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "metadata": { 461 | "id": "GUPSHu9Td6CI", 462 | "colab_type": "code", 463 | "colab": {} 464 | }, 465 | "source": [ 466 | "" 467 | ], 468 | "execution_count": null, 469 | "outputs": [] 470 | } 471 | ] 472 | } -------------------------------------------------------------------------------- /mkdocs.yml: -------------------------------------------------------------------------------- 1 | site_name: NLP-Notebooks 2 | nav: 3 | - BERT_Fine_Tuning_Sentence_Classification_v2.md 4 | - Bert_Classification_Pt.md 5 | - Bert_Pre_Training.md 6 | - Doc_Visual_QA_and_Bill_extraction_demo.md 7 | - GPT_2_on_Onnx_CPU.md 8 | - Generic_Transformer_Classification.md 9 | - Question_Answering_with_a_Fine_Tuned_BERT.md 10 | - Seq2Seq_Pytorch.md 11 | - SetFit_SST_2_Few_shot.md 12 | - Simpletransformers_2.md 13 | - TAPAS_fine_tuning_in_tf.md 14 | - Using_Transformers_with_Fastai_Tutorial.md 15 | - Wikipedia_answer_retrieval_DPR.md 16 | - contextual_topic_modeling.md 17 | - knowledge_distillation_exploration.md 18 | - large_scale_multilabelclassification.md 19 | - simpletransformers_intro.md 20 | - token_classification_transformers_zenml.md 21 | theme: readthedocs --------------------------------------------------------------------------------