├── LICENSE ├── README.md ├── classification └── hf_evaluate.ipynb ├── data ├── mini-dataset.csv ├── mini-dataset.json ├── mini-dataset_with_embedding.json ├── mini-llama-articles-with_embeddings.csv ├── mini-llama-articles.csv ├── rag_eval_dataset.json └── vectorstore.zip ├── paraphrasing └── hf_T5_paraphrasing.ipynb ├── summarization ├── hf_BART_inference_breakdown.ipynb ├── hf_BART_train_breakdown.ipynb └── hf_BERT-BERT_training.ipynb └── translation └── hf_bart_translation.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 
29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 
61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # tutorial_notebooks -------------------------------------------------------------------------------- /classification/hf_evaluate.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "authorship_tag": "ABX9TyPjI+p6qUuJ1t5az+EkoO8E", 8 | "include_colab_link": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | }, 17 | "widgets": { 18 | "application/vnd.jupyter.widget-state+json": { 19 | "182593c10e3548dbb190d6cea821eb17": { 20 | "model_module": "@jupyter-widgets/controls", 21 | "model_name": "HBoxModel", 22 | "model_module_version": "1.5.0", 23 | "state": { 24 | "_dom_classes": [], 25 | "_model_module": "@jupyter-widgets/controls", 26 | "_model_module_version": "1.5.0", 27 | "_model_name": "HBoxModel", 28 | 
"_view_count": null, 29 | "_view_module": "@jupyter-widgets/controls", 30 | "_view_module_version": "1.5.0", 31 | "_view_name": "HBoxView", 32 | "box_style": "", 33 | "children": [ 34 | "IPY_MODEL_9940ccc1f94d40fa84838c91489e9822", 35 | "IPY_MODEL_da2123ea1f604bb7bfa910320755a999", 36 | "IPY_MODEL_47cb9e14f37746528b34f48488b45d83" 37 | ], 38 | "layout": "IPY_MODEL_2677f8fb1a1a4baeb591541487de20f2" 39 | } 40 | }, 41 | "9940ccc1f94d40fa84838c91489e9822": { 42 | "model_module": "@jupyter-widgets/controls", 43 | "model_name": "HTMLModel", 44 | "model_module_version": "1.5.0", 45 | "state": { 46 | "_dom_classes": [], 47 | "_model_module": "@jupyter-widgets/controls", 48 | "_model_module_version": "1.5.0", 49 | "_model_name": "HTMLModel", 50 | "_view_count": null, 51 | "_view_module": "@jupyter-widgets/controls", 52 | "_view_module_version": "1.5.0", 53 | "_view_name": "HTMLView", 54 | "description": "", 55 | "description_tooltip": null, 56 | "layout": "IPY_MODEL_9e32b380fcd4429e8e2477ede70aaa1d", 57 | "placeholder": "​", 58 | "style": "IPY_MODEL_0dd59e6c6e464937be617e1ff0d4c448", 59 | "value": "100%" 60 | } 61 | }, 62 | "da2123ea1f604bb7bfa910320755a999": { 63 | "model_module": "@jupyter-widgets/controls", 64 | "model_name": "FloatProgressModel", 65 | "model_module_version": "1.5.0", 66 | "state": { 67 | "_dom_classes": [], 68 | "_model_module": "@jupyter-widgets/controls", 69 | "_model_module_version": "1.5.0", 70 | "_model_name": "FloatProgressModel", 71 | "_view_count": null, 72 | "_view_module": "@jupyter-widgets/controls", 73 | "_view_module_version": "1.5.0", 74 | "_view_name": "ProgressView", 75 | "bar_style": "success", 76 | "description": "", 77 | "description_tooltip": null, 78 | "layout": "IPY_MODEL_b2aded75b044469f9a090bc592b2015c", 79 | "max": 2, 80 | "min": 0, 81 | "orientation": "horizontal", 82 | "style": "IPY_MODEL_6c6dcb3cdcae44aea7f27a3dbece79fc", 83 | "value": 2 84 | } 85 | }, 86 | "47cb9e14f37746528b34f48488b45d83": { 87 | "model_module": 
"@jupyter-widgets/controls", 88 | "model_name": "HTMLModel", 89 | "model_module_version": "1.5.0", 90 | "state": { 91 | "_dom_classes": [], 92 | "_model_module": "@jupyter-widgets/controls", 93 | "_model_module_version": "1.5.0", 94 | "_model_name": "HTMLModel", 95 | "_view_count": null, 96 | "_view_module": "@jupyter-widgets/controls", 97 | "_view_module_version": "1.5.0", 98 | "_view_name": "HTMLView", 99 | "description": "", 100 | "description_tooltip": null, 101 | "layout": "IPY_MODEL_d9e1989bfbd946b68decc24c67a37b32", 102 | "placeholder": "​", 103 | "style": "IPY_MODEL_b26dbdcb1fef49f695b92799fc310cd8", 104 | "value": " 2/2 [00:00<00:00, 19.46it/s]" 105 | } 106 | }, 107 | "2677f8fb1a1a4baeb591541487de20f2": { 108 | "model_module": "@jupyter-widgets/base", 109 | "model_name": "LayoutModel", 110 | "model_module_version": "1.2.0", 111 | "state": { 112 | "_model_module": "@jupyter-widgets/base", 113 | "_model_module_version": "1.2.0", 114 | "_model_name": "LayoutModel", 115 | "_view_count": null, 116 | "_view_module": "@jupyter-widgets/base", 117 | "_view_module_version": "1.2.0", 118 | "_view_name": "LayoutView", 119 | "align_content": null, 120 | "align_items": null, 121 | "align_self": null, 122 | "border": null, 123 | "bottom": null, 124 | "display": null, 125 | "flex": null, 126 | "flex_flow": null, 127 | "grid_area": null, 128 | "grid_auto_columns": null, 129 | "grid_auto_flow": null, 130 | "grid_auto_rows": null, 131 | "grid_column": null, 132 | "grid_gap": null, 133 | "grid_row": null, 134 | "grid_template_areas": null, 135 | "grid_template_columns": null, 136 | "grid_template_rows": null, 137 | "height": null, 138 | "justify_content": null, 139 | "justify_items": null, 140 | "left": null, 141 | "margin": null, 142 | "max_height": null, 143 | "max_width": null, 144 | "min_height": null, 145 | "min_width": null, 146 | "object_fit": null, 147 | "object_position": null, 148 | "order": null, 149 | "overflow": null, 150 | "overflow_x": null, 151 | "overflow_y": 
null, 152 | "padding": null, 153 | "right": null, 154 | "top": null, 155 | "visibility": null, 156 | "width": null 157 | } 158 | }, 159 | "9e32b380fcd4429e8e2477ede70aaa1d": { 160 | "model_module": "@jupyter-widgets/base", 161 | "model_name": "LayoutModel", 162 | "model_module_version": "1.2.0", 163 | "state": { 164 | "_model_module": "@jupyter-widgets/base", 165 | "_model_module_version": "1.2.0", 166 | "_model_name": "LayoutModel", 167 | "_view_count": null, 168 | "_view_module": "@jupyter-widgets/base", 169 | "_view_module_version": "1.2.0", 170 | "_view_name": "LayoutView", 171 | "align_content": null, 172 | "align_items": null, 173 | "align_self": null, 174 | "border": null, 175 | "bottom": null, 176 | "display": null, 177 | "flex": null, 178 | "flex_flow": null, 179 | "grid_area": null, 180 | "grid_auto_columns": null, 181 | "grid_auto_flow": null, 182 | "grid_auto_rows": null, 183 | "grid_column": null, 184 | "grid_gap": null, 185 | "grid_row": null, 186 | "grid_template_areas": null, 187 | "grid_template_columns": null, 188 | "grid_template_rows": null, 189 | "height": null, 190 | "justify_content": null, 191 | "justify_items": null, 192 | "left": null, 193 | "margin": null, 194 | "max_height": null, 195 | "max_width": null, 196 | "min_height": null, 197 | "min_width": null, 198 | "object_fit": null, 199 | "object_position": null, 200 | "order": null, 201 | "overflow": null, 202 | "overflow_x": null, 203 | "overflow_y": null, 204 | "padding": null, 205 | "right": null, 206 | "top": null, 207 | "visibility": null, 208 | "width": null 209 | } 210 | }, 211 | "0dd59e6c6e464937be617e1ff0d4c448": { 212 | "model_module": "@jupyter-widgets/controls", 213 | "model_name": "DescriptionStyleModel", 214 | "model_module_version": "1.5.0", 215 | "state": { 216 | "_model_module": "@jupyter-widgets/controls", 217 | "_model_module_version": "1.5.0", 218 | "_model_name": "DescriptionStyleModel", 219 | "_view_count": null, 220 | "_view_module": "@jupyter-widgets/base", 221 | 
"_view_module_version": "1.2.0", 222 | "_view_name": "StyleView", 223 | "description_width": "" 224 | } 225 | }, 226 | "b2aded75b044469f9a090bc592b2015c": { 227 | "model_module": "@jupyter-widgets/base", 228 | "model_name": "LayoutModel", 229 | "model_module_version": "1.2.0", 230 | "state": { 231 | "_model_module": "@jupyter-widgets/base", 232 | "_model_module_version": "1.2.0", 233 | "_model_name": "LayoutModel", 234 | "_view_count": null, 235 | "_view_module": "@jupyter-widgets/base", 236 | "_view_module_version": "1.2.0", 237 | "_view_name": "LayoutView", 238 | "align_content": null, 239 | "align_items": null, 240 | "align_self": null, 241 | "border": null, 242 | "bottom": null, 243 | "display": null, 244 | "flex": null, 245 | "flex_flow": null, 246 | "grid_area": null, 247 | "grid_auto_columns": null, 248 | "grid_auto_flow": null, 249 | "grid_auto_rows": null, 250 | "grid_column": null, 251 | "grid_gap": null, 252 | "grid_row": null, 253 | "grid_template_areas": null, 254 | "grid_template_columns": null, 255 | "grid_template_rows": null, 256 | "height": null, 257 | "justify_content": null, 258 | "justify_items": null, 259 | "left": null, 260 | "margin": null, 261 | "max_height": null, 262 | "max_width": null, 263 | "min_height": null, 264 | "min_width": null, 265 | "object_fit": null, 266 | "object_position": null, 267 | "order": null, 268 | "overflow": null, 269 | "overflow_x": null, 270 | "overflow_y": null, 271 | "padding": null, 272 | "right": null, 273 | "top": null, 274 | "visibility": null, 275 | "width": null 276 | } 277 | }, 278 | "6c6dcb3cdcae44aea7f27a3dbece79fc": { 279 | "model_module": "@jupyter-widgets/controls", 280 | "model_name": "ProgressStyleModel", 281 | "model_module_version": "1.5.0", 282 | "state": { 283 | "_model_module": "@jupyter-widgets/controls", 284 | "_model_module_version": "1.5.0", 285 | "_model_name": "ProgressStyleModel", 286 | "_view_count": null, 287 | "_view_module": "@jupyter-widgets/base", 288 | "_view_module_version": 
"1.2.0", 289 | "_view_name": "StyleView", 290 | "bar_color": null, 291 | "description_width": "" 292 | } 293 | }, 294 | "d9e1989bfbd946b68decc24c67a37b32": { 295 | "model_module": "@jupyter-widgets/base", 296 | "model_name": "LayoutModel", 297 | "model_module_version": "1.2.0", 298 | "state": { 299 | "_model_module": "@jupyter-widgets/base", 300 | "_model_module_version": "1.2.0", 301 | "_model_name": "LayoutModel", 302 | "_view_count": null, 303 | "_view_module": "@jupyter-widgets/base", 304 | "_view_module_version": "1.2.0", 305 | "_view_name": "LayoutView", 306 | "align_content": null, 307 | "align_items": null, 308 | "align_self": null, 309 | "border": null, 310 | "bottom": null, 311 | "display": null, 312 | "flex": null, 313 | "flex_flow": null, 314 | "grid_area": null, 315 | "grid_auto_columns": null, 316 | "grid_auto_flow": null, 317 | "grid_auto_rows": null, 318 | "grid_column": null, 319 | "grid_gap": null, 320 | "grid_row": null, 321 | "grid_template_areas": null, 322 | "grid_template_columns": null, 323 | "grid_template_rows": null, 324 | "height": null, 325 | "justify_content": null, 326 | "justify_items": null, 327 | "left": null, 328 | "margin": null, 329 | "max_height": null, 330 | "max_width": null, 331 | "min_height": null, 332 | "min_width": null, 333 | "object_fit": null, 334 | "object_position": null, 335 | "order": null, 336 | "overflow": null, 337 | "overflow_x": null, 338 | "overflow_y": null, 339 | "padding": null, 340 | "right": null, 341 | "top": null, 342 | "visibility": null, 343 | "width": null 344 | } 345 | }, 346 | "b26dbdcb1fef49f695b92799fc310cd8": { 347 | "model_module": "@jupyter-widgets/controls", 348 | "model_name": "DescriptionStyleModel", 349 | "model_module_version": "1.5.0", 350 | "state": { 351 | "_model_module": "@jupyter-widgets/controls", 352 | "_model_module_version": "1.5.0", 353 | "_model_name": "DescriptionStyleModel", 354 | "_view_count": null, 355 | "_view_module": "@jupyter-widgets/base", 356 | 
"_view_module_version": "1.2.0", 357 | "_view_name": "StyleView", 358 | "description_width": "" 359 | } 360 | } 361 | } 362 | } 363 | }, 364 | "cells": [ 365 | { 366 | "cell_type": "markdown", 367 | "metadata": { 368 | "id": "view-in-github", 369 | "colab_type": "text" 370 | }, 371 | "source": [ 372 | "\"Open" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "source": [ 378 | "# A sample code of how to use Huggingface's Evaluate library\n", 379 | "The code is the supplementary material to the story published in NLPiation medium blog. Follow [the link](https://medium.com/@nlpiation/how-to-use-the-huggingface-evaluate-library-in-action-with-batching-2948929015bf) for a detailed explanation of the diverse beam search and following code." 380 | ], 381 | "metadata": { 382 | "id": "2JeLSabjDMnU" 383 | } 384 | }, 385 | { 386 | "cell_type": "markdown", 387 | "source": [ 388 | "# Download and Load Libraries" 389 | ], 390 | "metadata": { 391 | "id": "rhUVNMEvDoI0" 392 | } 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": 1, 397 | "metadata": { 398 | "id": "oN2D9LrebBVo" 399 | }, 400 | "outputs": [], 401 | "source": [ 402 | "!pip install -q torch==1.13.1 datasets==2.9.0 evaluate==0.4.0 transformers==4.26.0" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "source": [ 408 | "## Import Libraries" 409 | ], 410 | "metadata": { 411 | "id": "Qlj89Verbd5n" 412 | } 413 | }, 414 | { 415 | "cell_type": "code", 416 | "source": [ 417 | "import torch\n", 418 | "from transformers import AutoModelForSequenceClassification\n", 419 | "from transformers import AutoTokenizer\n", 420 | "import evaluate\n", 421 | "from datasets import load_dataset\n", 422 | "from datasets import Dataset\n", 423 | "\n", 424 | "from tqdm import tqdm\n", 425 | "import pandas as pd\n", 426 | "from sklearn.model_selection import train_test_split" 427 | ], 428 | "metadata": { 429 | "id": "Z7NDPpHdbotq" 430 | }, 431 | "execution_count": 2, 432 | "outputs": [] 433 | }, 434 | { 
435 | "cell_type": "markdown", 436 | "source": [ 437 | "# Load The Dataset" 438 | ], 439 | "metadata": { 440 | "id": "ZXciiNwaqxtl" 441 | } 442 | }, 443 | { 444 | "cell_type": "code", 445 | "source": [ 446 | "sentiment140 = load_dataset(\"sentiment140\", cache_dir=\"./ds_sentiment140\")" 447 | ], 448 | "metadata": { 449 | "id": "9bThj8ZYbJ0U", 450 | "colab": { 451 | "base_uri": "https://localhost:8080/", 452 | "height": 67, 453 | "referenced_widgets": [ 454 | "182593c10e3548dbb190d6cea821eb17", 455 | "9940ccc1f94d40fa84838c91489e9822", 456 | "da2123ea1f604bb7bfa910320755a999", 457 | "47cb9e14f37746528b34f48488b45d83", 458 | "2677f8fb1a1a4baeb591541487de20f2", 459 | "9e32b380fcd4429e8e2477ede70aaa1d", 460 | "0dd59e6c6e464937be617e1ff0d4c448", 461 | "b2aded75b044469f9a090bc592b2015c", 462 | "6c6dcb3cdcae44aea7f27a3dbece79fc", 463 | "d9e1989bfbd946b68decc24c67a37b32", 464 | "b26dbdcb1fef49f695b92799fc310cd8" 465 | ] 466 | }, 467 | "outputId": "b0a33fd9-f84c-4d77-b2d3-e053c78243ed" 468 | }, 469 | "execution_count": 3, 470 | "outputs": [ 471 | { 472 | "output_type": "stream", 473 | "name": "stderr", 474 | "text": [ 475 | "WARNING:datasets.builder:Found cached dataset sentiment140 (/content/ds_sentiment140/sentiment140/sentiment140/1.0.0/f81c014152931b776735658d8ae493b181927de002e706c4d5244ecb26376997)\n" 476 | ] 477 | }, 478 | { 479 | "output_type": "display_data", 480 | "data": { 481 | "text/plain": [ 482 | " 0%| | 0/2 [00:00\n", 610 | "
\n", 611 | "
\n", 612 | "\n", 625 | "\n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | " \n", 632 | " \n", 633 | " \n", 634 | " \n", 635 | " \n", 636 | " \n", 637 | " \n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | "
 text sentiment
1556092 @JessicaKnows I use it and do like it. 1
868905 Almost home aaand I need to pee rather badly.... 1
218471 dropping the marmite and cheese covered bread ... 0
620327 Having issues with Xfire broadcast. Cancelled ... 0
981867 @kunaldua ask for Hermes Heritage complex. Its... 1
\n", 661 | "
\n", 662 | " \n", 672 | " \n", 673 | " \n", 710 | "\n", 711 | " \n", 735 | "
\n", 736 | " \n", 737 | " " 738 | ] 739 | }, 740 | "metadata": {}, 741 | "execution_count": 9 742 | } 743 | ] 744 | }, 745 | { 746 | "cell_type": "code", 747 | "source": [ 748 | "test_df = pd.DataFrame(X_test,columns=['text'])\n", 749 | "test_df['sentiment'] = y_test\n", 750 | "test_df.head()" 751 | ], 752 | "metadata": { 753 | "colab": { 754 | "base_uri": "https://localhost:8080/", 755 | "height": 206 756 | }, 757 | "id": "5plcKEcAluOd", 758 | "outputId": "2a10df29-b42f-4f52-8663-bf2f705f700b" 759 | }, 760 | "execution_count": 10, 761 | "outputs": [ 762 | { 763 | "output_type": "execute_result", 764 | "data": { 765 | "text/plain": [ 766 | " text sentiment\n", 767 | "932067 @door_kicker hey tofu is super good for u...an... 1\n", 768 | "909762 Caps lost. ARGH! But HP game evening was much ... 1\n", 769 | "1275248 @Ellen_F OF has already been on in Oz. Not sur... 1\n", 770 | "1274799 @alexpham4 with teleporation, I wouldn't need ... 1\n", 771 | "1530405 Omg! i have 16 followers! thank u thank u thaa... 1" 772 | ], 773 | "text/html": [ 774 | "\n", 775 | "
\n", 903 | " " 904 | ] 905 | }, 906 | "metadata": {}, 907 | "execution_count": 10 908 | } 909 | ] 910 | }, 911 | { 912 | "cell_type": "code", 913 | "source": [ 914 | "valid_df = pd.DataFrame(X_val,columns=['text'])\n", 915 | "valid_df['sentiment'] = y_val\n", 916 | "valid_df.head()" 917 | ], 918 | "metadata": { 919 | "colab": { 920 | "base_uri": "https://localhost:8080/", 921 | "height": 206 922 | }, 923 | "id": "pbGnTUqelyvg", 924 | "outputId": "3c15e154-77f4-4c34-e701-a3f7a397f1d1" 925 | }, 926 | "execution_count": 11, 927 | "outputs": [ 928 | { 929 | "output_type": "execute_result", 930 | "data": { 931 | "text/plain": [ 932 | " text sentiment\n", 933 | "60473 It is still raining and more storms are moving... 0\n", 934 | "1174268 @JaydyGaGa ... Was well suprised ... I was li... 1\n", 935 | "1404666 @ddlovato I'm sure you will Demi. 1\n", 936 | "380353 @LightFoundDark yes Geographie and i dont know... 0\n", 937 | "470328 Everyone follow @truthtweet, shows which celeb... 0" 938 | ], 939 | "text/html": [ 940 | "\n", 941 | "
\n", 1069 | " " 1070 | ] 1071 | }, 1072 | "metadata": {}, 1073 | "execution_count": 11 1074 | } 1075 | ] 1076 | }, 1077 | { 1078 | "cell_type": "markdown", 1079 | "source": [ 1080 | "Lastly, convert the DataFrames to Huggingface Dataset objects." 1081 | ], 1082 | "metadata": { 1083 | "id": "EqNPvj1Yrmqm" 1084 | } 1085 | }, 1086 | { 1087 | "cell_type": "code", 1088 | "source": [ 1089 | "hf_train = Dataset.from_pandas(train_df)\n", 1090 | "hf_test = Dataset.from_pandas(test_df)\n", 1091 | "hf_valid = Dataset.from_pandas(valid_df)" 1092 | ], 1093 | "metadata": { 1094 | "id": "ckPOdPJpl0Sa" 1095 | }, 1096 | "execution_count": 12, 1097 | "outputs": [] 1098 | }, 1099 | { 1100 | "cell_type": "markdown", 1101 | "source": [ 1102 | "The following cell is optional. Uncomment it if you want to save the Dataset objects for later." 1103 | ], 1104 | "metadata": { 1105 | "id": "4Ih0nyJprv5Q" 1106 | } 1107 | }, 1108 | { 1109 | "cell_type": "code", 1110 | "source": [ 1111 | "# hf_train.save_to_disk('./hf-cache/processed_sentiment140/train')\n", 1112 | "# hf_test.save_to_disk('./hf-cache/processed_sentiment140/test')\n", 1113 | "# hf_valid.save_to_disk('./hf-cache/processed_sentiment140/valid')\n", 1114 | "\n", 1115 | "# from datasets import load_from_disk\n", 1116 | "\n", 1117 | "# train_set = load_from_disk('./hf-cache/processed_sentiment140/train')\n", 1118 | "# valid_set = load_from_disk('./hf-cache/processed_sentiment140/valid')\n", 1119 | "# test_set = load_from_disk('./hf-cache/processed_sentiment140/test')" 1120 | ], 1121 | "metadata": { 1122 | "id": "p7HK_oJjmDGG" 1123 | }, 1124 | "execution_count": 13, 1125 | "outputs": [] 1126 | }, 1127 | { 1128 | "cell_type": "markdown", 1129 | "source": [ 1130 | "⚠️ Bonus: Consider renaming the \"sentiment\" column to \"label\" if you want to train on this dataset with Huggingface's Trainer class." 
1131 | ], 1132 | "metadata": { 1133 | "id": "0xd1yQnZr1Yh" 1134 | } 1135 | }, 1136 | { 1137 | "cell_type": "code", 1138 | "source": [ 1139 | "# train_set = hf_train.rename_column(\"sentiment\", \"label\")\n", 1140 | "# valid_set = hf_valid.rename_column(\"sentiment\", \"label\")" 1141 | ], 1142 | "metadata": { 1143 | "id": "vizH53ZUnQUc" 1144 | }, 1145 | "execution_count": 14, 1146 | "outputs": [] 1147 | }, 1148 | { 1149 | "cell_type": "markdown", 1150 | "source": [ 1151 | "⚠️ Comment out the following cell if this is not a test run. It selects the first 32 datapoints from the test set for faster prediction." 1152 | ], 1153 | "metadata": { 1154 | "id": "PisCdjcCsnrf" 1155 | } 1156 | }, 1157 | { 1158 | "cell_type": "code", 1159 | "source": [ 1160 | "hf_test = hf_test.select( range(32) )" 1161 | ], 1162 | "metadata": { 1163 | "id": "2fcSkrMXsnZt" 1164 | }, 1165 | "execution_count": 15, 1166 | "outputs": [] 1167 | }, 1168 | { 1169 | "cell_type": "markdown", 1170 | "source": [ 1171 | "To batch and iterate over the Huggingface dataset, we wrap it in a PyTorch DataLoader." 1172 | ], 1173 | "metadata": { 1174 | "id": "jxl7UP-GsFt0" 1175 | } 1176 | }, 1177 | { 1178 | "cell_type": "code", 1179 | "source": [ 1180 | "ds_loader = torch.utils.data.DataLoader(\n", 1181 | " hf_test,\n", 1182 | " batch_size=16,\n", 1183 | " num_workers=4,\n", 1184 | " pin_memory=True,\n", 1185 | ")" 1186 | ], 1187 | "metadata": { 1188 | "colab": { 1189 | "base_uri": "https://localhost:8080/" 1190 | }, 1191 | "id": "iFNcyIZ1oWr1", 1192 | "outputId": "dcf065a6-f1de-4a2e-bd36-ff33f91c55a7" 1193 | }, 1194 | "execution_count": 16, 1195 | "outputs": [ 1196 | { 1197 | "output_type": "stream", 1198 | "name": "stderr", 1199 | "text": [ 1200 | "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 4 worker processes in total. 
Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n", 1201 | " warnings.warn(_create_warning_msg(\n" 1202 | ] 1203 | } 1204 | ] 1205 | }, 1206 | { 1207 | "cell_type": "markdown", 1208 | "source": [ 1209 | "# Load Tokenizer and Model" 1210 | ], 1211 | "metadata": { 1212 | "id": "YH7CIjHvsR--" 1213 | } 1214 | }, 1215 | { 1216 | "cell_type": "markdown", 1217 | "source": [ 1218 | "I chose a RoBERTa model from the Huggingface hub that is finetuned for this dataset." 1219 | ], 1220 | "metadata": { 1221 | "id": "C_B59vUasUHr" 1222 | } 1223 | }, 1224 | { 1225 | "cell_type": "code", 1226 | "source": [ 1227 | "tokenizer = AutoTokenizer.from_pretrained(\"pig4431/Sentiment140_roBERTa_5E\", cache_dir=\"./hf-cache/roberta\")" 1228 | ], 1229 | "metadata": { 1230 | "id": "s1ITIWcVnai9" 1231 | }, 1232 | "execution_count": 17, 1233 | "outputs": [] 1234 | }, 1235 | { 1236 | "cell_type": "code", 1237 | "source": [ 1238 | "model = AutoModelForSequenceClassification.from_pretrained(\"pig4431/Sentiment140_roBERTa_5E\", cache_dir=\"./hf-cache/roberta\")" 1239 | ], 1240 | "metadata": { 1241 | "id": "f520bEFUomRE" 1242 | }, 1243 | "execution_count": 18, 1244 | "outputs": [] 1245 | }, 1246 | { 1247 | "cell_type": "markdown", 1248 | "source": [ 1249 | "Put the model on GPU if available." 
1250 | ], 1251 | "metadata": { 1252 | "id": "R6FvDjEGsel_" 1253 | } 1254 | }, 1255 | { 1256 | "cell_type": "code", 1257 | "source": [ 1258 | "if torch.cuda.is_available():\n", 1259 | " model.to('cuda')" 1260 | ], 1261 | "metadata": { 1262 | "id": "_bi3oogAqGxc" 1263 | }, 1264 | "execution_count": 19, 1265 | "outputs": [] 1266 | }, 1267 | { 1268 | "cell_type": "markdown", 1269 | "source": [ 1270 | "# Load the Metrics" 1271 | ], 1272 | "metadata": { 1273 | "id": "YOHVPq6-si02" 1274 | } 1275 | }, 1276 | { 1277 | "cell_type": "code", 1278 | "source": [ 1279 | "metrics = evaluate.combine([\"accuracy\", \"f1\", \"precision\", \"recall\"])" 1280 | ], 1281 | "metadata": { 1282 | "id": "H0Pd_tGUqKrq" 1283 | }, 1284 | "execution_count": 20, 1285 | "outputs": [] 1286 | }, 1287 | { 1288 | "cell_type": "markdown", 1289 | "source": [ 1290 | "# Prediction Loop" 1291 | ], 1292 | "metadata": { 1293 | "id": "CQhhUnI3skuL" 1294 | } 1295 | }, 1296 | { 1297 | "cell_type": "code", 1298 | "source": [ 1299 | "for batch in tqdm( ds_loader ): \n", 1300 | " # Tokenize\n", 1301 | " inputs = tokenizer(batch['text'], return_tensors=\"pt\", padding=True)\n", 1302 | "\n", 1303 | " if torch.cuda.is_available():\n", 1304 | " inputs = inputs.to('cuda') # the model is already on the GPU; the inputs must be moved there too\n", 1305 | " \n", 1306 | " # Make Predictions\n", 1307 | " with torch.no_grad():\n", 1308 | " logits = model(**inputs).logits\n", 1309 | "\n", 1310 | " # Find the Predicted Label (moved back to CPU for the metrics)\n", 1311 | " predicted_class_id = logits.argmax(dim=-1).cpu()\n", 1312 | "\n", 1313 | " # Add the batch result to Evaluator object\n", 1314 | " metrics.add_batch(references=batch['sentiment'], predictions=predicted_class_id)" 1315 | ], 1316 | "metadata": { 1317 | "colab": { 1318 | "base_uri": "https://localhost:8080/" 1319 | }, 1320 | "id": "mCx7-9FhqR8n", 1321 | "outputId": "f5818669-ba94-493a-ac16-bdbf0ce56aa2" 1322 | }, 1323 | "execution_count": 21, 1324 | "outputs": [ 1325 | { 1326 | "output_type": "stream", 1327 | "name": "stderr", 1328 | "text": [ 1329 | "100%|██████████| 2/2 
[00:08<00:00, 4.36s/it]\n" 1330 | ] 1331 | } 1332 | ] 1333 | }, 1334 | { 1335 | "cell_type": "code", 1336 | "source": [ 1337 | "metrics.compute()" 1338 | ], 1339 | "metadata": { 1340 | "colab": { 1341 | "base_uri": "https://localhost:8080/" 1342 | }, 1343 | "id": "rtQGZyc1qhM1", 1344 | "outputId": "d0ebc8d3-56ea-45d5-de31-4ae9d64c98d2" 1345 | }, 1346 | "execution_count": 22, 1347 | "outputs": [ 1348 | { 1349 | "output_type": "execute_result", 1350 | "data": { 1351 | "text/plain": [ 1352 | "{'accuracy': 0.8125,\n", 1353 | " 'f1': 0.8333333333333334,\n", 1354 | " 'precision': 0.9375,\n", 1355 | " 'recall': 0.75}" 1356 | ] 1357 | }, 1358 | "metadata": {}, 1359 | "execution_count": 22 1360 | } 1361 | ] 1362 | } 1363 | ] 1364 | } -------------------------------------------------------------------------------- /data/mini-dataset.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/8c42a0b8456d7a534d567c4731344770fe6269c2/data/mini-dataset.csv -------------------------------------------------------------------------------- /data/mini-dataset.json: -------------------------------------------------------------------------------- 1 | { 2 | "chunks": [ 3 | { 4 | "text": "Meta has once again pushed the boundaries of AI with the release of Llama 2, the highly anticipated successor to its groundbreaking Llama 1 language model. Boasting a range of cutting-edge features, Llama 2 has already disrupted the AI landscape and poses a real challenge to ChatGPT’s dominance. In this article, we will dive into the exciting world of Llama 2 and explore what makes it a true game-changer. I. Llama 2: Revolutionizing Commercial Use Unlike its predecessor Llama 1, which was limited to research use, Llama 2 represents a major advancement as an open-source commercial model. Businesses can now integrate Llama 2 into products to create AI-powered applications. 
Availability on Azure and AWS facilitates fine-tuning and adoption. However, restrictions apply to prevent exploitation. Companies with over 700 million active daily users cannot use Llama 2. Additionally, its output cannot be used to improve other language models.", 5 | "embedding": "" 6 | }, 7 | { 8 | "text": "II. Llama 2 Model Flavors: Llama 2 is available in four different model sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters. While 7B, 13B, and 70B have already been released, the 34B model is still awaited. The pretrained variant, trained on a whopping 2 trillion tokens, boasts a context window of 4096 tokens, twice the size of its predecessor Llama 1. Meta also released a Llama 2 fine-tuned model for chat applications that was trained on over 1 million human annotations. Such extensive training comes at a cost, with the 70B model taking a staggering 1720320 GPU hours to train. The context window’s length determines the amount of content the model can process at once, making Llama 2 a powerful language model in terms of scale and efficiency.", 9 | "embedding": "" 10 | }, 11 | { 12 | "text": "III. Safety Considerations: A Top Priority for Meta: Meta’s commitment to safety and alignment shines through in Llama 2’s design. The model demonstrates exceptionally low AI safety violation percentages, surpassing even ChatGPT in safety benchmarks. Source: Meta Llama 2 paper Finding the right balance between helpfulness and safety when optimizing a model poses significant challenges. While a highly helpful model may be capable of answering any question, including sensitive ones like “How do I build a bomb?”, it also raises concerns about potential misuse. Thus, striking the perfect equilibrium between providing useful information and ensuring safety is paramount.", 13 | "embedding": "" 14 | }, 15 | { 16 | "text": "However, prioritizing safety to an extreme extent can lead to a model that struggles to effectively address a diverse range of questions. 
This limitation could hinder the model’s practical applicability and user experience. Thus, achieving an optimum balance that allows the model to be both helpful and safe is of utmost importance. To strike the right balance between helpfulness and safety, Meta employed two reward models — one for helpfulness and another for safety — to optimize the model’s responses. The 34B parameter model has reported higher safety violations than other variants, possibly contributing to the delay in its release.", 17 | "embedding": "" 18 | }, 19 | { 20 | "text": "IV. Helpfulness Comparison: Llama 2 Outperforms Competitors: Llama 2 emerges as a strong contender in the open-source language model arena, outperforming its competitors in most categories. The 70B parameter model outperforms all other open-source models, while the 7B and 34B models outshine Falcon in all categories and MPT in all categories except coding. Source: Meta Llama 2 paper. Despite being smaller, Llam a2’s performance rivals that of Chat GPT 3.5, a significantly larger closed-source model. While GPT 4 and PalM-2-L, with their larger size, outperform Llama 2, this is expected due to their capacity for handling complex language tasks. Llama 2’s impressive ability to compete with larger models highlights its efficiency and potential in the market. Source: Meta Llama 2 paper. However, Llama 2 does face challenges in coding and math problems, where models like Chat GPT 4 excel, given their significantly larger size. Chat GPT 4 performed significantly better than Llama 2 for coding (HumanEval benchmark)and math problem tasks (GSM8k benchmark). Open-source AI technologies, like Llama 2, continue to advance, offering strong competition to closed-source models.", 21 | "embedding": "" 22 | }, 23 | { 24 | "text": "V. Ghost Attention: Enhancing Conversational Continuity: One unique feature in Llama 2 is Ghost Attention, which ensures continuity in conversations. 
This means that even after multiple interactions, the model remembers its initial instructions, ensuring more coherent and consistent responses throughout the conversation. This feature significantly enhances the user experience and makes Llama 2 a more reliable language model for interactive applications. In the example below, on the left, it forgets to use an emoji after a few conversations. On the right, with Ghost Attention, even after having many conversations, it will remember the context and continue to use emojis in its response. Source: Meta Llama 2 paper.", 25 | "embedding": "" 26 | }, 27 | { 28 | "text": "VI. Temporal Capability: A Leap in Information Organization: Meta reported a groundbreaking temporal capability, where the model organizes information based on time relevance. Each question posed to the model is associated with a date, and it responds accordingly by considering the event date before which the question becomes irrelevant. For example, if you ask the question, “How long ago did Barack Obama become president?”, its only relevant after 2008. This temporal awareness allows Llama 2 to deliver more contextually accurate responses, enriching the user experience further. Source: Meta Llama 2 paper.", 29 | "embedding": "" 30 | }, 31 | { 32 | "text": "VII. Open Questions and Future Outlook: Meta’s open-sourcing of Llama 2 represents a seismic shift, now offering developers and researchers commercial access to a leading language model. With Llama 2 outperforming MosaicML’s current MPT models, all eyes are on how Databricks will respond. Can MosaicML’s next MPT iteration beat Llama 2? Is it worthwhile to compete with Llama 2 or join hands with the open-source community to make the open-source models better? Meanwhile, Microsoft’s move to host Llama 2 on Azure despite having significant investment in ChatGPT raises interesting questions. 
Will users prefer the capabilities and transparency of an open-source model like Llama 2 over closed, proprietary options? The stakes are high, as Meta’s bold democratization play stands to reshape preferences and partnerships in the AI space. One thing is certain — the era of open language model competition has begun.", 33 | "embedding": "" 34 | }, 35 | { 36 | "text": "VIII. Conclusion: With the launch of Llama 2, Meta has achieved a landmark breakthrough in open-source language models, unleashing new potential through its commercial accessibility. Llama 2’s formidable capabilities in natural language processing, along with robust safety protocols and temporal reasoning, set new benchmarks for the field. While select limitations around math and coding exist presently, Llama 2’s strengths far outweigh its weaknesses. As Meta continues honing Llama technology, this latest innovation promises to be truly transformative. By open-sourcing such an advanced model, Meta is propelling democratization and proliferation of AI across industries. From healthcare to education and beyond, Llama 2 stands to shape the landscape by putting groundbreaking language modeling into the hands of all developers and researchers. The possibilities unlocked by this open-source approach signal a shift towards a more collaborative, creative AI future.", 37 | "embedding": "" 38 | }, 39 | { 40 | "text": "About 2 weeks ago, the world of generative AI was shocked by the company Meta's release of the new Llama-2 AI model. Its predecessor, Llama-1, was a breaking point in the LLM industry, as with the release of its weights along with new finetuning techniques, there was a massive creation of open-source LLM models that led to the emergence of high-performance models such as Vicuna, Koala, … In this article, we will briefly discuss some of this model's relevant points but will focus on showing how we can quickly train the model for a specific task using libraries and tools standard in this world. 
We will not make an exhaustive analysis of the new model, there are already numerous articles published on the subject.", 41 | "embedding": "" 42 | }, 43 | { 44 | "text": "New Llama-2 model: In mid-July, Meta released its new family of pre-trained and finetuned models called Llama-2, with an open source and commercial character to facilitate its use and expansion. The base model was released with a chat version and sizes 7B, 13B, and 70B. Together with the models, the corresponding papers were published describing their characteristics and relevant points of the learning process, which provide very interesting information on the subject. An updated version of Llama 1, trained on a new mix of publicly available data. The pretraining corpus size was increased by 40%, the model’s context length was doubled, and grouped-query attention was adopted. Variants with 7B, 13B, and 70B parameters are released, along with 34B variants reported in the paper but not released.[1]", 45 | "embedding": "" 46 | }, 47 | { 48 | "text": "For pre-training, 40% more tokens were used, reaching 2T, the context length was doubled and the grouped-query attention (GQA) technique was applied to speed up inference on the heavier 70B model. On the standard transformer architecture, RMSNorm normalization, SwiGLU activation, and rotatory positional embedding are used, the context length reaches 4096 tokens, and an Adam optimizer is applied with a cosine learning rate schedule, a weight decay of 0.1 and gradient clipping. The Supervised Fine-Tuning (SFT) stage is characterized by a prioritization of quality examples over quantity, as numerous reports show that the use of high-quality data results in improved final model performance. Finally, a Reinforcement Learning with Human Feedback (RLHF) step is applied to align the model with user preferences. A multitude of examples are collected where annotators select their preferred model output over a binary comparison. 
This data is used to train a reward model, where the focus is on helpfulness and safety.", 49 | "embedding": "" 50 | }, 51 | { 52 | "text": "In short: - Trained on 2T Tokens. - Commercial use allowed. - Chat models for dialogue use cases. - 4096 default context window (can be increased). - 7B, 13B & 70B parameter version. - 70B model adopted grouped-query attention (GQA). - Chat models can use tools & plugins. - LLaMA 2-CHAT as good as OpenAI ChatGPT.", 53 | "embedding": "" 54 | }, 55 | { 56 | "text": "The dataset for tuning: For our tuning process, we will take a dataset containing about 18,000 examples where the model is asked to build a Python code that solves a given task. This is an extraction of the original dataset [2], where only the Python language examples are selected. Each row contains the description of the task to be solved, an example of data input to the task if applicable, and the generated code fragment that solves the task is provided [3]. # Load dataset from the hub\ndataset = load_dataset(dataset_name, split=dataset_split)\n# Show dataset size print(f'dataset size: {len(dataset)}')\n# Show an example\nprint(dataset[randrange(len(dataset))])", 57 | "embedding": "" 58 | }, 59 | { 60 | "text": "Creating the prompt To carry out an instruction fine-tuning, we must transform each one of our data examples as if it were an instruction, outlining its main sections as follows: def format_instruction(sample):\n return f'''### Instruction:\n Use the Task below and the Input given to write the Response, which is a programming code that can solve the following Task: ### Task:\n{sample['instruction']}\n### Input: \n {sample['input']}\n### Response:\n{sample['output']}\n'''Output:### Instruction: Use the Task below and the Input given to write the Response, which is a programming code that can solve the following Task: ### Task: Develop a Python program that prints 'Hello, World!' whenever it is run. 
### Input: ### Response: #Python program to print 'Hello World!' print('Hello, World!')", 61 | "embedding": "" 62 | }, 63 | { 64 | "text": "Fine-tuning the model\nTo carry out this stage, we have used the Google Colab environment, where we have developed a notebook that allows us to run the training in an interactive way and also a Python script to run the training in unattended mode. For the first test runs, a T4 instance with a high RAM capacity is enough, but when it comes to running the whole dataset and epochs, we have opted to use an A100 instance in order to speed up the training and ensure that its execution time is reasonable.\nIn order to be able to share the model, we will log in to the Huggingface hub using the appropriate token, so that at the end of the whole process, we will upload the model files so that they can be shared with the rest of the users.\nfrom huggingface_hub import login\nfrom dotenv import load_dotenv\nimport os\n# Load the enviroment variables\nload_dotenv()\n# Login to the Hugging Face Hub\nlogin(token=os.getenv('HF_HUB_TOKEN'))", 65 | "embedding": "" 66 | }, 67 | { 68 | "text": "Fine-tuning techniques: PEFT, Lora, and QLora\nIn recent months, some papers have appeared showing how PEFT techniques can be used to train large language models with a drastic reduction of RAM requirements and consequently allowing fine-tuning of these models on a single GPU of reasonable size.\nThe usual steps to train an LLM consist, first, an intensive pre-training on billions or trillions of tokens to obtain a foundation model, and then a fine-tuning is performed on this model to specialize it on a downstream task. In this fine-tuning phase is where the PEFT technique has its purpose.\nParameter Efficient Fine-Tuning (PEFT) allows us to considerably reduce RAM and storage requirements by only fine-tuning a small number of additional parameters, with virtually all model parameters remaining frozen. 
PEFT has been found to produce good generalization with relatively low-volume datasets. Furthermore, it enhances the reusability and portability of the model, as the small checkpoints obtained can be easily added to the base model, and the base model can be easily fine-tuned and reused in multiple scenarios by adding the PEFT parameters. Finally, since the base model is not adjusted, all the knowledge acquired in the pre-training phase is preserved, thus avoiding catastrophic forgetting.\n", 69 | "embedding": "" 70 | }, 71 | { 72 | "text": "Most widely used PEFT techniques aim to keep the pre-trained base model untouched and add new layers or parameters on top of it. These layers are called “Adapters” and the technique of their adjustment “adapter-tuning”, we add these layers to the pre-trained base model and only train the parameters of these new layers. However, a serious problem with this approach is that these layers lead to increased latency in the inference phase, which makes the process inefficient in many scenarios.\nIn the LoRa technique, a Low-Rank Adaptation of Large Language Models, the idea is not to include new layers but to add values to the parameters in a way that avoids this scary problem of latency in the inference phase. LoRa trains and stores the changes of the additional weights while freezing all the weights of the pre-trained model. Therefore, we train a new weights matrix with the changes in the pre-trained model matrix, and this new matrix is decomposed into 2 Low-rank matrices as explained here:", 73 | "embedding": "" 74 | }, 75 | { 76 | "text": "Let all the parameters of a LLM in the matrix W0 and the additional weight changes in the matrix ∆W, the final weights become W0 + ∆W. The authors of LoRA [1] proposed that the change in weight change matrix ∆W can be decomposed into two low-rank matrices A and B. LoRA does not train the parameters in ∆W directly, but the parameters in A and B. So the number of trainable parameters is much less. 
Hypothetically suppose the dimension of A is 100 * 1 and that of B is 1 * 100, the number of parameters in ∆W will be 100 * 100 = 10000. There are only 100 + 100 = 200 to train in A and B, instead of 10000 to train in ∆W\n[4]. Explanation by Dr. Dataman in Fine-tuning a GPT — LoRA\nThe size of these low-rank matrices is defined by the r parameter. The smaller this value is, the fewer parameters to train, therefore, less effort and faster, but on the other hand, a potential loss of information and performance.\nIf you want a more detailed explanation, you can refer to the original paper, or there are plenty of articles that explain it in detail, such as [4].\nFinally, QLoRa [6] consists of applying quantization to the LoRa method allowing 4-bit normal quantization, nf4, a type optimized for normally distributed weights; double quantization to reduce the memory footprint and the optimization of the NVIDIA unified memory. These are techniques to optimize memory usage to achieve “lighter” and less expensive training.", 77 | "embedding": "" 78 | }, 79 | { 80 | "text": "Implementing QLoRa in our experiment requires specifying the BitsAndBytes configuration, downloading the pretrained model in 4-bit quantization, and defining a LoraConfig. 
Finally, we need to retrieve the tokenizer.\n# Get the type\ncompute_dtype = getattr(torch, bnb_4bit_compute_dtype)\n# BitsAndBytesConfig int-4 config\nbnb_config = BitsAndBytesConfig(\n load_in_4bit=use_4bit,\n bnb_4bit_use_double_quant=use_double_nested_quant,\n bnb_4bit_quant_type=bnb_4bit_quant_type,\n bnb_4bit_compute_dtype=compute_dtype\n)\n# Load model and tokenizer\nmodel = AutoModelForCausalLM.from_pretrained(model_id, \n quantization_config=bnb_config, use_cache = False, device_map=device_map)\nmodel.config.pretraining_tp = 1\n# Load the tokenizer\ntokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)\ntokenizer.pad_token = tokenizer.eos_token\ntokenizer.padding_side = 'right'\nParameters defined,\n# Activate 4-bit precision base model loading\nuse_4bit = True\n# Compute dtype for 4-bit base models\nbnb_4bit_compute_dtype = 'float16'\n# Quantization type (fp4 or nf4)\nbnb_4bit_quant_type = 'nf4'\n# Activate nested quantization for 4-bit base models (double quantization)\nuse_double_nested_quant = False\n# LoRA attention dimension\nlora_r = 64\n# Alpha parameter for LoRA scaling\nlora_alpha = 16\n# Dropout probability for LoRA layers\nlora_dropout = 0.1\nAnd the next steps are well-known for all Hugging Face users, setting up the training arguments, and creating a Trainer. 
As we are executing an instruction fine-tuning we call to the SFTTrainer method that encapsulates the PEFT model definition and other steps.\n# Define the training arguments\nargs = TrainingArguments(\n output_dir=output_dir,\n num_train_epochs=num_train_epochs,\n per_device_train_batch_size=per_device_train_batch_size, # 6 if use_flash_attention else 4,\n gradient_accumulation_steps=gradient_accumulation_steps,\n gradient_checkpointing=gradient_checkpointing,\n optim=optim,\n logging_steps=logging_steps,\n save_strategy='epoch',\n learning_rate=learning_rate,\n weight_decay=weight_decay,\n fp16=fp16,\n bf16=bf16,\n max_grad_norm=max_grad_norm,\n warmup_ratio=warmup_ratio,\n group_by_length=group_by_length,\n lr_scheduler_type=lr_scheduler_type,\n disable_tqdm=disable_tqdm,\n report_to='tensorboard',\n seed=42\n)\n# Create the trainer\ntrainer = SFTTrainer(\n model=model,\n train_dataset=dataset,\n peft_config=peft_config,\n max_seq_length=max_seq_length,\n tokenizer=tokenizer,\n packing=packing,\n formatting_func=format_instruction,\n args=args,\n)\n# train the model\ntrainer.train() # there will not be a progress bar since tqdm is disabled\n", 81 | "embedding": "" 82 | }, 83 | { 84 | "text": "# save model in local\ntrainer.save_model()\nThe parameters can be found on my GitHub repository, most of them are commonly used in other fine-tuning scripts on LLMs and are the following ones:\n# Number of training epochs\nnum_train_epochs = 1\n# Enable fp16/bf16 training (set bf16 to True with an A100)\nfp16 = False\nbf16 = True\n# Batch size per GPU for training\nper_device_train_batch_size = 4\n# Number of update steps to accumulate the gradients for\ngradient_accumulation_steps = 1\n# Enable gradient checkpointing\ngradient_checkpointing = True\n# Maximum gradient normal (gradient clipping)\nmax_grad_norm = 0.3\n# Initial learning rate (AdamW optimizer)\nlearning_rate = 2e-4\n# Weight decay to apply to all layers except bias/LayerNorm weights\nweight_decay = 0.001\n# 
Optimizer to use\noptim = 'paged_adamw_32bit'\n# Learning rate schedule\nlr_scheduler_type = 'cosine' #'constant'\n# Ratio of steps for a linear warmup (from 0 to learning rate)\nwarmup_ratio = 0.03\n# Group sequences into batches with same length\n# Saves memory and speeds up training considerably\ngroup_by_length = False\n# Save checkpoint every X updates steps\nsave_steps = 0\n# Log every X updates steps\nlogging_steps = 25\n# Disable tqdm\ndisable_tqdm= True\nMerge the base model and the adapter weights\nAs we mention, we have trained “modification weights” on the base model, our final model requires merging the pretrained model and the adapters in a single model.\nfrom peft import AutoPeftModelForCausalLM\nmodel = AutoPeftModelForCausalLM.from_pretrained(\n args.output_dir,\n low_cpu_mem_usage=True,\n return_dict=True,\n torch_dtype=torch.float16,\n device_map=device_map, \n)\n# Merge LoRA and base model\nmerged_model = model.merge_and_unload()\n# Save the merged model\nmerged_model.save_pretrained('merged_model',safe_serialization=True)\ntokenizer.save_pretrained('merged_model')\n# push merged model to the hub\nmerged_model.push_to_hub(hf_model_repo)\ntokenizer.push_to_hub(hf_model_repo)\nYou can find and download the model in my Hugging Face account edumunozsala/llama-2–7b-int4-python-code-20k. Give it a try!\n", 85 | "embedding": "" 86 | }, 87 | { 88 | "text": "Inferencing or generating Python code\nAnd finally, we will show you how you can download the model from the Hugging Face Hub and call the model to generate an accurate result:\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n# Get the tokenizer\ntokenizer = AutoTokenizer.from_pretrained(hf_model_repo)\n# Load the model\nmodel = AutoModelForCausalLM.from_pretrained(hf_model_repo, load_in_4bit=True, \n torch_dtype=torch.float16,\n device_map=device_map)\n# Create an instruction\ninstruction='Optimize a code snippet written in Python. 
The code snippet should create a list of numbers from 0 to 10 that are divisible by 2.'\ninput=''\nprompt = f'''### Instruction:\nUse the Task below and the Input given to write the Response, which is a programming code that can solve the Task.\n### Task:\n{instruction}\n### Input:\n{input}\n### Response:\n'''\n# Tokenize the input\ninput_ids = tokenizer(prompt, return_tensors='pt', truncation=True).input_ids.cuda()\n# Run the model to infere an output\noutputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9,temperature=0.5)\n# Print the result\nprint(f'Prompt:\n{prompt}\n')\nprint(f'Generated instruction:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}')\nPrompt:\n### Instruction:\nUse the Task below and the Input given to write the Response, which is a programming code that can solve the Task.\n### Task:\nOptimize a code snippet written in Python. The code snippet should create a list of numbers from 0 to 10 that are divisible by 2.\n### Input:\narr = []\nfor i in range(10):\n if i % 2 == 0:\n arr.append(i)\n### Response:\nGenerated instruction:\narr = [i for i in range(10) if i % 2 == 0]\nGround truth:\narr = [i for i in range(11) if i % 2 == 0]\nThanks to Maxime Labonne for an excellent article [9] and Philipp Schmid who provides an inspiring code [8]. Their articles are a must-read for everyone interested in Llama 2 and model fine-tuning.\nAnd it is all I have to mention, I hope you find useful this article and claps are welcome!! You can Follow me and Subscribe to my articles, or even connect to me via Linkedin. 
The code is available in my Github Repository.", 89 | "embedding": "" 90 | } 91 | ] 92 | } -------------------------------------------------------------------------------- /data/vectorstore.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/8c42a0b8456d7a534d567c4731344770fe6269c2/data/vectorstore.zip -------------------------------------------------------------------------------- /paraphrasing/hf_T5_paraphrasing.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "collapsed_sections": [], 8 | "authorship_tag": "ABX9TyMKfpocsc59ZH0ZboM7IjYh", 9 | "include_colab_link": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "language_info": { 16 | "name": "python" 17 | }, 18 | "widgets": { 19 | "application/vnd.jupyter.widget-state+json": { 20 | "3ed30a8a0a5e4666b4cbeaefb75aa9e3": { 21 | "model_module": "@jupyter-widgets/controls", 22 | "model_name": "HBoxModel", 23 | "model_module_version": "1.5.0", 24 | "state": { 25 | "_dom_classes": [], 26 | "_model_module": "@jupyter-widgets/controls", 27 | "_model_module_version": "1.5.0", 28 | "_model_name": "HBoxModel", 29 | "_view_count": null, 30 | "_view_module": "@jupyter-widgets/controls", 31 | "_view_module_version": "1.5.0", 32 | "_view_name": "HBoxView", 33 | "box_style": "", 34 | "children": [ 35 | "IPY_MODEL_209495704ab34383af7fe65c8ea9dd09", 36 | "IPY_MODEL_4624d5203f1c44718a660a6edb6bc66e", 37 | "IPY_MODEL_94dc84edc49b4199b0a6e7c6b8d87e7b" 38 | ], 39 | "layout": "IPY_MODEL_2391632bc3a34b3fb3210092776c74aa" 40 | } 41 | }, 42 | "209495704ab34383af7fe65c8ea9dd09": { 43 | "model_module": "@jupyter-widgets/controls", 44 | "model_name": "HTMLModel", 45 | "model_module_version": "1.5.0", 46 | "state": { 47 | "_dom_classes": [], 48 | 
"_model_module": "@jupyter-widgets/controls", 49 | "_model_module_version": "1.5.0", 50 | "_model_name": "HTMLModel", 51 | "_view_count": null, 52 | "_view_module": "@jupyter-widgets/controls", 53 | "_view_module_version": "1.5.0", 54 | "_view_name": "HTMLView", 55 | "description": "", 56 | "description_tooltip": null, 57 | "layout": "IPY_MODEL_a06640373e86495b97346998fa59edd9", 58 | "placeholder": "​", 59 | "style": "IPY_MODEL_2cf8d55b5eb34e38b7b3079312da60b5", 60 | "value": "Downloading spiece.model: 100%" 61 | } 62 | }, 63 | "4624d5203f1c44718a660a6edb6bc66e": { 64 | "model_module": "@jupyter-widgets/controls", 65 | "model_name": "FloatProgressModel", 66 | "model_module_version": "1.5.0", 67 | "state": { 68 | "_dom_classes": [], 69 | "_model_module": "@jupyter-widgets/controls", 70 | "_model_module_version": "1.5.0", 71 | "_model_name": "FloatProgressModel", 72 | "_view_count": null, 73 | "_view_module": "@jupyter-widgets/controls", 74 | "_view_module_version": "1.5.0", 75 | "_view_name": "ProgressView", 76 | "bar_style": "success", 77 | "description": "", 78 | "description_tooltip": null, 79 | "layout": "IPY_MODEL_3782e4e8cad847f087c3014e39d35c41", 80 | "max": 791656, 81 | "min": 0, 82 | "orientation": "horizontal", 83 | "style": "IPY_MODEL_faa73c3ee50a412a91adf244b85b1de6", 84 | "value": 791656 85 | } 86 | }, 87 | "94dc84edc49b4199b0a6e7c6b8d87e7b": { 88 | "model_module": "@jupyter-widgets/controls", 89 | "model_name": "HTMLModel", 90 | "model_module_version": "1.5.0", 91 | "state": { 92 | "_dom_classes": [], 93 | "_model_module": "@jupyter-widgets/controls", 94 | "_model_module_version": "1.5.0", 95 | "_model_name": "HTMLModel", 96 | "_view_count": null, 97 | "_view_module": "@jupyter-widgets/controls", 98 | "_view_module_version": "1.5.0", 99 | "_view_name": "HTMLView", 100 | "description": "", 101 | "description_tooltip": null, 102 | "layout": "IPY_MODEL_733aa710e8cd406695b3736087107bdf", 103 | "placeholder": "​", 104 | "style": 
"IPY_MODEL_6e66c2ca3d794fe393ee8c13b037fe9c", 105 | "value": " 773k/773k [00:00<00:00, 2.62kB/s]" 106 | } 107 | }, 108 | "2391632bc3a34b3fb3210092776c74aa": { 109 | "model_module": "@jupyter-widgets/base", 110 | "model_name": "LayoutModel", 111 | "model_module_version": "1.2.0", 112 | "state": { 113 | "_model_module": "@jupyter-widgets/base", 114 | "_model_module_version": "1.2.0", 115 | "_model_name": "LayoutModel", 116 | "_view_count": null, 117 | "_view_module": "@jupyter-widgets/base", 118 | "_view_module_version": "1.2.0", 119 | "_view_name": "LayoutView", 120 | "align_content": null, 121 | "align_items": null, 122 | "align_self": null, 123 | "border": null, 124 | "bottom": null, 125 | "display": null, 126 | "flex": null, 127 | "flex_flow": null, 128 | "grid_area": null, 129 | "grid_auto_columns": null, 130 | "grid_auto_flow": null, 131 | "grid_auto_rows": null, 132 | "grid_column": null, 133 | "grid_gap": null, 134 | "grid_row": null, 135 | "grid_template_areas": null, 136 | "grid_template_columns": null, 137 | "grid_template_rows": null, 138 | "height": null, 139 | "justify_content": null, 140 | "justify_items": null, 141 | "left": null, 142 | "margin": null, 143 | "max_height": null, 144 | "max_width": null, 145 | "min_height": null, 146 | "min_width": null, 147 | "object_fit": null, 148 | "object_position": null, 149 | "order": null, 150 | "overflow": null, 151 | "overflow_x": null, 152 | "overflow_y": null, 153 | "padding": null, 154 | "right": null, 155 | "top": null, 156 | "visibility": null, 157 | "width": null 158 | } 159 | }, 160 | "a06640373e86495b97346998fa59edd9": { 161 | "model_module": "@jupyter-widgets/base", 162 | "model_name": "LayoutModel", 163 | "model_module_version": "1.2.0", 164 | "state": { 165 | "_model_module": "@jupyter-widgets/base", 166 | "_model_module_version": "1.2.0", 167 | "_model_name": "LayoutModel", 168 | "_view_count": null, 169 | "_view_module": "@jupyter-widgets/base", 170 | "_view_module_version": "1.2.0", 171 | 
"_view_name": "LayoutView", 172 | "align_content": null, 173 | "align_items": null, 174 | "align_self": null, 175 | "border": null, 176 | "bottom": null, 177 | "display": null, 178 | "flex": null, 179 | "flex_flow": null, 180 | "grid_area": null, 181 | "grid_auto_columns": null, 182 | "grid_auto_flow": null, 183 | "grid_auto_rows": null, 184 | "grid_column": null, 185 | "grid_gap": null, 186 | "grid_row": null, 187 | "grid_template_areas": null, 188 | "grid_template_columns": null, 189 | "grid_template_rows": null, 190 | "height": null, 191 | "justify_content": null, 192 | "justify_items": null, 193 | "left": null, 194 | "margin": null, 195 | "max_height": null, 196 | "max_width": null, 197 | "min_height": null, 198 | "min_width": null, 199 | "object_fit": null, 200 | "object_position": null, 201 | "order": null, 202 | "overflow": null, 203 | "overflow_x": null, 204 | "overflow_y": null, 205 | "padding": null, 206 | "right": null, 207 | "top": null, 208 | "visibility": null, 209 | "width": null 210 | } 211 | }, 212 | "2cf8d55b5eb34e38b7b3079312da60b5": { 213 | "model_module": "@jupyter-widgets/controls", 214 | "model_name": "DescriptionStyleModel", 215 | "model_module_version": "1.5.0", 216 | "state": { 217 | "_model_module": "@jupyter-widgets/controls", 218 | "_model_module_version": "1.5.0", 219 | "_model_name": "DescriptionStyleModel", 220 | "_view_count": null, 221 | "_view_module": "@jupyter-widgets/base", 222 | "_view_module_version": "1.2.0", 223 | "_view_name": "StyleView", 224 | "description_width": "" 225 | } 226 | }, 227 | "3782e4e8cad847f087c3014e39d35c41": { 228 | "model_module": "@jupyter-widgets/base", 229 | "model_name": "LayoutModel", 230 | "model_module_version": "1.2.0", 231 | "state": { 232 | "_model_module": "@jupyter-widgets/base", 233 | "_model_module_version": "1.2.0", 234 | "_model_name": "LayoutModel", 235 | "_view_count": null, 236 | "_view_module": "@jupyter-widgets/base", 237 | "_view_module_version": "1.2.0", 238 | "_view_name": 
"LayoutView", 239 | "align_content": null, 240 | "align_items": null, 241 | "align_self": null, 242 | "border": null, 243 | "bottom": null, 244 | "display": null, 245 | "flex": null, 246 | "flex_flow": null, 247 | "grid_area": null, 248 | "grid_auto_columns": null, 249 | "grid_auto_flow": null, 250 | "grid_auto_rows": null, 251 | "grid_column": null, 252 | "grid_gap": null, 253 | "grid_row": null, 254 | "grid_template_areas": null, 255 | "grid_template_columns": null, 256 | "grid_template_rows": null, 257 | "height": null, 258 | "justify_content": null, 259 | "justify_items": null, 260 | "left": null, 261 | "margin": null, 262 | "max_height": null, 263 | "max_width": null, 264 | "min_height": null, 265 | "min_width": null, 266 | "object_fit": null, 267 | "object_position": null, 268 | "order": null, 269 | "overflow": null, 270 | "overflow_x": null, 271 | "overflow_y": null, 272 | "padding": null, 273 | "right": null, 274 | "top": null, 275 | "visibility": null, 276 | "width": null 277 | } 278 | }, 279 | "faa73c3ee50a412a91adf244b85b1de6": { 280 | "model_module": "@jupyter-widgets/controls", 281 | "model_name": "ProgressStyleModel", 282 | "model_module_version": "1.5.0", 283 | "state": { 284 | "_model_module": "@jupyter-widgets/controls", 285 | "_model_module_version": "1.5.0", 286 | "_model_name": "ProgressStyleModel", 287 | "_view_count": null, 288 | "_view_module": "@jupyter-widgets/base", 289 | "_view_module_version": "1.2.0", 290 | "_view_name": "StyleView", 291 | "bar_color": null, 292 | "description_width": "" 293 | } 294 | }, 295 | "733aa710e8cd406695b3736087107bdf": { 296 | "model_module": "@jupyter-widgets/base", 297 | "model_name": "LayoutModel", 298 | "model_module_version": "1.2.0", 299 | "state": { 300 | "_model_module": "@jupyter-widgets/base", 301 | "_model_module_version": "1.2.0", 302 | "_model_name": "LayoutModel", 303 | "_view_count": null, 304 | "_view_module": "@jupyter-widgets/base", 305 | "_view_module_version": "1.2.0", 306 | "_view_name": 
"LayoutView", 307 | "align_content": null, 308 | "align_items": null, 309 | "align_self": null, 310 | "border": null, 311 | "bottom": null, 312 | "display": null, 313 | "flex": null, 314 | "flex_flow": null, 315 | "grid_area": null, 316 | "grid_auto_columns": null, 317 | "grid_auto_flow": null, 318 | "grid_auto_rows": null, 319 | "grid_column": null, 320 | "grid_gap": null, 321 | "grid_row": null, 322 | "grid_template_areas": null, 323 | "grid_template_columns": null, 324 | "grid_template_rows": null, 325 | "height": null, 326 | "justify_content": null, 327 | "justify_items": null, 328 | "left": null, 329 | "margin": null, 330 | "max_height": null, 331 | "max_width": null, 332 | "min_height": null, 333 | "min_width": null, 334 | "object_fit": null, 335 | "object_position": null, 336 | "order": null, 337 | "overflow": null, 338 | "overflow_x": null, 339 | "overflow_y": null, 340 | "padding": null, 341 | "right": null, 342 | "top": null, 343 | "visibility": null, 344 | "width": null 345 | } 346 | }, 347 | "6e66c2ca3d794fe393ee8c13b037fe9c": { 348 | "model_module": "@jupyter-widgets/controls", 349 | "model_name": "DescriptionStyleModel", 350 | "model_module_version": "1.5.0", 351 | "state": { 352 | "_model_module": "@jupyter-widgets/controls", 353 | "_model_module_version": "1.5.0", 354 | "_model_name": "DescriptionStyleModel", 355 | "_view_count": null, 356 | "_view_module": "@jupyter-widgets/base", 357 | "_view_module_version": "1.2.0", 358 | "_view_name": "StyleView", 359 | "description_width": "" 360 | } 361 | }, 362 | "406d0b1379c748e5bddf41a08d1bc16a": { 363 | "model_module": "@jupyter-widgets/controls", 364 | "model_name": "HBoxModel", 365 | "model_module_version": "1.5.0", 366 | "state": { 367 | "_dom_classes": [], 368 | "_model_module": "@jupyter-widgets/controls", 369 | "_model_module_version": "1.5.0", 370 | "_model_name": "HBoxModel", 371 | "_view_count": null, 372 | "_view_module": "@jupyter-widgets/controls", 373 | "_view_module_version": "1.5.0", 374 | 
"_view_name": "HBoxView", 375 | "box_style": "", 376 | "children": [ 377 | "IPY_MODEL_4dde398e566244c1a01ea9a21cf61991", 378 | "IPY_MODEL_79c00b4cf7ea4912824088b2e56468bd", 379 | "IPY_MODEL_ed031c9b1a6345809176bea833e995d0" 380 | ], 381 | "layout": "IPY_MODEL_3bb9525493964c70a227235c1585713b" 382 | } 383 | }, 384 | "4dde398e566244c1a01ea9a21cf61991": { 385 | "model_module": "@jupyter-widgets/controls", 386 | "model_name": "HTMLModel", 387 | "model_module_version": "1.5.0", 388 | "state": { 389 | "_dom_classes": [], 390 | "_model_module": "@jupyter-widgets/controls", 391 | "_model_module_version": "1.5.0", 392 | "_model_name": "HTMLModel", 393 | "_view_count": null, 394 | "_view_module": "@jupyter-widgets/controls", 395 | "_view_module_version": "1.5.0", 396 | "_view_name": "HTMLView", 397 | "description": "", 398 | "description_tooltip": null, 399 | "layout": "IPY_MODEL_f147fd7003f64d0e8badb771bfb21b18", 400 | "placeholder": "​", 401 | "style": "IPY_MODEL_b758692b7b44431b80d7f75386da912f", 402 | "value": "Downloading special_tokens_map.json: 100%" 403 | } 404 | }, 405 | "79c00b4cf7ea4912824088b2e56468bd": { 406 | "model_module": "@jupyter-widgets/controls", 407 | "model_name": "FloatProgressModel", 408 | "model_module_version": "1.5.0", 409 | "state": { 410 | "_dom_classes": [], 411 | "_model_module": "@jupyter-widgets/controls", 412 | "_model_module_version": "1.5.0", 413 | "_model_name": "FloatProgressModel", 414 | "_view_count": null, 415 | "_view_module": "@jupyter-widgets/controls", 416 | "_view_module_version": "1.5.0", 417 | "_view_name": "ProgressView", 418 | "bar_style": "success", 419 | "description": "", 420 | "description_tooltip": null, 421 | "layout": "IPY_MODEL_6daf1c8875ae4945ae382a159131f953", 422 | "max": 1786, 423 | "min": 0, 424 | "orientation": "horizontal", 425 | "style": "IPY_MODEL_f29012cb87a647008db4bf4464870157", 426 | "value": 1786 427 | } 428 | }, 429 | "ed031c9b1a6345809176bea833e995d0": { 430 | "model_module": 
"@jupyter-widgets/controls", 431 | "model_name": "HTMLModel", 432 | "model_module_version": "1.5.0", 433 | "state": { 434 | "_dom_classes": [], 435 | "_model_module": "@jupyter-widgets/controls", 436 | "_model_module_version": "1.5.0", 437 | "_model_name": "HTMLModel", 438 | "_view_count": null, 439 | "_view_module": "@jupyter-widgets/controls", 440 | "_view_module_version": "1.5.0", 441 | "_view_name": "HTMLView", 442 | "description": "", 443 | "description_tooltip": null, 444 | "layout": "IPY_MODEL_01b4832c87444f87a33f2c4517d5c051", 445 | "placeholder": "​", 446 | "style": "IPY_MODEL_503531fc8424441e9a8ac4cfc64f7507", 447 | "value": " 1.74k/1.74k [00:00<00:00, 7.37kB/s]" 448 | } 449 | }, 450 | "3bb9525493964c70a227235c1585713b": { 451 | "model_module": "@jupyter-widgets/base", 452 | "model_name": "LayoutModel", 453 | "model_module_version": "1.2.0", 454 | "state": { 455 | "_model_module": "@jupyter-widgets/base", 456 | "_model_module_version": "1.2.0", 457 | "_model_name": "LayoutModel", 458 | "_view_count": null, 459 | "_view_module": "@jupyter-widgets/base", 460 | "_view_module_version": "1.2.0", 461 | "_view_name": "LayoutView", 462 | "align_content": null, 463 | "align_items": null, 464 | "align_self": null, 465 | "border": null, 466 | "bottom": null, 467 | "display": null, 468 | "flex": null, 469 | "flex_flow": null, 470 | "grid_area": null, 471 | "grid_auto_columns": null, 472 | "grid_auto_flow": null, 473 | "grid_auto_rows": null, 474 | "grid_column": null, 475 | "grid_gap": null, 476 | "grid_row": null, 477 | "grid_template_areas": null, 478 | "grid_template_columns": null, 479 | "grid_template_rows": null, 480 | "height": null, 481 | "justify_content": null, 482 | "justify_items": null, 483 | "left": null, 484 | "margin": null, 485 | "max_height": null, 486 | "max_width": null, 487 | "min_height": null, 488 | "min_width": null, 489 | "object_fit": null, 490 | "object_position": null, 491 | "order": null, 492 | "overflow": null, 493 | "overflow_x": null, 
494 | "overflow_y": null, 495 | "padding": null, 496 | "right": null, 497 | "top": null, 498 | "visibility": null, 499 | "width": null 500 | } 501 | }, 502 | "f147fd7003f64d0e8badb771bfb21b18": { 503 | "model_module": "@jupyter-widgets/base", 504 | "model_name": "LayoutModel", 505 | "model_module_version": "1.2.0", 506 | "state": { 507 | "_model_module": "@jupyter-widgets/base", 508 | "_model_module_version": "1.2.0", 509 | "_model_name": "LayoutModel", 510 | "_view_count": null, 511 | "_view_module": "@jupyter-widgets/base", 512 | "_view_module_version": "1.2.0", 513 | "_view_name": "LayoutView", 514 | "align_content": null, 515 | "align_items": null, 516 | "align_self": null, 517 | "border": null, 518 | "bottom": null, 519 | "display": null, 520 | "flex": null, 521 | "flex_flow": null, 522 | "grid_area": null, 523 | "grid_auto_columns": null, 524 | "grid_auto_flow": null, 525 | "grid_auto_rows": null, 526 | "grid_column": null, 527 | "grid_gap": null, 528 | "grid_row": null, 529 | "grid_template_areas": null, 530 | "grid_template_columns": null, 531 | "grid_template_rows": null, 532 | "height": null, 533 | "justify_content": null, 534 | "justify_items": null, 535 | "left": null, 536 | "margin": null, 537 | "max_height": null, 538 | "max_width": null, 539 | "min_height": null, 540 | "min_width": null, 541 | "object_fit": null, 542 | "object_position": null, 543 | "order": null, 544 | "overflow": null, 545 | "overflow_x": null, 546 | "overflow_y": null, 547 | "padding": null, 548 | "right": null, 549 | "top": null, 550 | "visibility": null, 551 | "width": null 552 | } 553 | }, 554 | "b758692b7b44431b80d7f75386da912f": { 555 | "model_module": "@jupyter-widgets/controls", 556 | "model_name": "DescriptionStyleModel", 557 | "model_module_version": "1.5.0", 558 | "state": { 559 | "_model_module": "@jupyter-widgets/controls", 560 | "_model_module_version": "1.5.0", 561 | "_model_name": "DescriptionStyleModel", 562 | "_view_count": null, 563 | "_view_module": 
"@jupyter-widgets/base", 564 | "_view_module_version": "1.2.0", 565 | "_view_name": "StyleView", 566 | "description_width": "" 567 | } 568 | }, 569 | "6daf1c8875ae4945ae382a159131f953": { 570 | "model_module": "@jupyter-widgets/base", 571 | "model_name": "LayoutModel", 572 | "model_module_version": "1.2.0", 573 | "state": { 574 | "_model_module": "@jupyter-widgets/base", 575 | "_model_module_version": "1.2.0", 576 | "_model_name": "LayoutModel", 577 | "_view_count": null, 578 | "_view_module": "@jupyter-widgets/base", 579 | "_view_module_version": "1.2.0", 580 | "_view_name": "LayoutView", 581 | "align_content": null, 582 | "align_items": null, 583 | "align_self": null, 584 | "border": null, 585 | "bottom": null, 586 | "display": null, 587 | "flex": null, 588 | "flex_flow": null, 589 | "grid_area": null, 590 | "grid_auto_columns": null, 591 | "grid_auto_flow": null, 592 | "grid_auto_rows": null, 593 | "grid_column": null, 594 | "grid_gap": null, 595 | "grid_row": null, 596 | "grid_template_areas": null, 597 | "grid_template_columns": null, 598 | "grid_template_rows": null, 599 | "height": null, 600 | "justify_content": null, 601 | "justify_items": null, 602 | "left": null, 603 | "margin": null, 604 | "max_height": null, 605 | "max_width": null, 606 | "min_height": null, 607 | "min_width": null, 608 | "object_fit": null, 609 | "object_position": null, 610 | "order": null, 611 | "overflow": null, 612 | "overflow_x": null, 613 | "overflow_y": null, 614 | "padding": null, 615 | "right": null, 616 | "top": null, 617 | "visibility": null, 618 | "width": null 619 | } 620 | }, 621 | "f29012cb87a647008db4bf4464870157": { 622 | "model_module": "@jupyter-widgets/controls", 623 | "model_name": "ProgressStyleModel", 624 | "model_module_version": "1.5.0", 625 | "state": { 626 | "_model_module": "@jupyter-widgets/controls", 627 | "_model_module_version": "1.5.0", 628 | "_model_name": "ProgressStyleModel", 629 | "_view_count": null, 630 | "_view_module": "@jupyter-widgets/base", 
631 | "_view_module_version": "1.2.0", 632 | "_view_name": "StyleView", 633 | "bar_color": null, 634 | "description_width": "" 635 | } 636 | }, 637 | "01b4832c87444f87a33f2c4517d5c051": { 638 | "model_module": "@jupyter-widgets/base", 639 | "model_name": "LayoutModel", 640 | "model_module_version": "1.2.0", 641 | "state": { 642 | "_model_module": "@jupyter-widgets/base", 643 | "_model_module_version": "1.2.0", 644 | "_model_name": "LayoutModel", 645 | "_view_count": null, 646 | "_view_module": "@jupyter-widgets/base", 647 | "_view_module_version": "1.2.0", 648 | "_view_name": "LayoutView", 649 | "align_content": null, 650 | "align_items": null, 651 | "align_self": null, 652 | "border": null, 653 | "bottom": null, 654 | "display": null, 655 | "flex": null, 656 | "flex_flow": null, 657 | "grid_area": null, 658 | "grid_auto_columns": null, 659 | "grid_auto_flow": null, 660 | "grid_auto_rows": null, 661 | "grid_column": null, 662 | "grid_gap": null, 663 | "grid_row": null, 664 | "grid_template_areas": null, 665 | "grid_template_columns": null, 666 | "grid_template_rows": null, 667 | "height": null, 668 | "justify_content": null, 669 | "justify_items": null, 670 | "left": null, 671 | "margin": null, 672 | "max_height": null, 673 | "max_width": null, 674 | "min_height": null, 675 | "min_width": null, 676 | "object_fit": null, 677 | "object_position": null, 678 | "order": null, 679 | "overflow": null, 680 | "overflow_x": null, 681 | "overflow_y": null, 682 | "padding": null, 683 | "right": null, 684 | "top": null, 685 | "visibility": null, 686 | "width": null 687 | } 688 | }, 689 | "503531fc8424441e9a8ac4cfc64f7507": { 690 | "model_module": "@jupyter-widgets/controls", 691 | "model_name": "DescriptionStyleModel", 692 | "model_module_version": "1.5.0", 693 | "state": { 694 | "_model_module": "@jupyter-widgets/controls", 695 | "_model_module_version": "1.5.0", 696 | "_model_name": "DescriptionStyleModel", 697 | "_view_count": null, 698 | "_view_module": 
"@jupyter-widgets/base", 699 | "_view_module_version": "1.2.0", 700 | "_view_name": "StyleView", 701 | "description_width": "" 702 | } 703 | }, 704 | "4aba0951ea8344219f3133451a314ce2": { 705 | "model_module": "@jupyter-widgets/controls", 706 | "model_name": "HBoxModel", 707 | "model_module_version": "1.5.0", 708 | "state": { 709 | "_dom_classes": [], 710 | "_model_module": "@jupyter-widgets/controls", 711 | "_model_module_version": "1.5.0", 712 | "_model_name": "HBoxModel", 713 | "_view_count": null, 714 | "_view_module": "@jupyter-widgets/controls", 715 | "_view_module_version": "1.5.0", 716 | "_view_name": "HBoxView", 717 | "box_style": "", 718 | "children": [ 719 | "IPY_MODEL_b61ca199e36947d8a76bb94d43705cb3", 720 | "IPY_MODEL_1c9984805be54af987bbc637cff39366", 721 | "IPY_MODEL_10e1477041db4335861396d47fcd38ba" 722 | ], 723 | "layout": "IPY_MODEL_41e7824ba62448169ed589d94a9ff1a7" 724 | } 725 | }, 726 | "b61ca199e36947d8a76bb94d43705cb3": { 727 | "model_module": "@jupyter-widgets/controls", 728 | "model_name": "HTMLModel", 729 | "model_module_version": "1.5.0", 730 | "state": { 731 | "_dom_classes": [], 732 | "_model_module": "@jupyter-widgets/controls", 733 | "_model_module_version": "1.5.0", 734 | "_model_name": "HTMLModel", 735 | "_view_count": null, 736 | "_view_module": "@jupyter-widgets/controls", 737 | "_view_module_version": "1.5.0", 738 | "_view_name": "HTMLView", 739 | "description": "", 740 | "description_tooltip": null, 741 | "layout": "IPY_MODEL_381c420b802a4929be8fb948e6fe77b7", 742 | "placeholder": "​", 743 | "style": "IPY_MODEL_5b9986e6fa314c4da1ffb45b513255a9", 744 | "value": "Downloading tokenizer_config.json: 100%" 745 | } 746 | }, 747 | "1c9984805be54af987bbc637cff39366": { 748 | "model_module": "@jupyter-widgets/controls", 749 | "model_name": "FloatProgressModel", 750 | "model_module_version": "1.5.0", 751 | "state": { 752 | "_dom_classes": [], 753 | "_model_module": "@jupyter-widgets/controls", 754 | "_model_module_version": "1.5.0", 755 | 
"_model_name": "FloatProgressModel", 756 | "_view_count": null, 757 | "_view_module": "@jupyter-widgets/controls", 758 | "_view_module_version": "1.5.0", 759 | "_view_name": "ProgressView", 760 | "bar_style": "success", 761 | "description": "", 762 | "description_tooltip": null, 763 | "layout": "IPY_MODEL_20774c588bab4925bd80a491549a8dd6", 764 | "max": 1889, 765 | "min": 0, 766 | "orientation": "horizontal", 767 | "style": "IPY_MODEL_c92e682a63664d4e83b3596e6126508b", 768 | "value": 1889 769 | } 770 | }, 771 | "10e1477041db4335861396d47fcd38ba": { 772 | "model_module": "@jupyter-widgets/controls", 773 | "model_name": "HTMLModel", 774 | "model_module_version": "1.5.0", 775 | "state": { 776 | "_dom_classes": [], 777 | "_model_module": "@jupyter-widgets/controls", 778 | "_model_module_version": "1.5.0", 779 | "_model_name": "HTMLModel", 780 | "_view_count": null, 781 | "_view_module": "@jupyter-widgets/controls", 782 | "_view_module_version": "1.5.0", 783 | "_view_name": "HTMLView", 784 | "description": "", 785 | "description_tooltip": null, 786 | "layout": "IPY_MODEL_9d8c0363f8e7449b8f6a30a92d66e924", 787 | "placeholder": "​", 788 | "style": "IPY_MODEL_3b8a84d61f4e4805b017ec11c0599db7", 789 | "value": " 1.84k/1.84k [00:00<00:00, 6.55kB/s]" 790 | } 791 | }, 792 | "41e7824ba62448169ed589d94a9ff1a7": { 793 | "model_module": "@jupyter-widgets/base", 794 | "model_name": "LayoutModel", 795 | "model_module_version": "1.2.0", 796 | "state": { 797 | "_model_module": "@jupyter-widgets/base", 798 | "_model_module_version": "1.2.0", 799 | "_model_name": "LayoutModel", 800 | "_view_count": null, 801 | "_view_module": "@jupyter-widgets/base", 802 | "_view_module_version": "1.2.0", 803 | "_view_name": "LayoutView", 804 | "align_content": null, 805 | "align_items": null, 806 | "align_self": null, 807 | "border": null, 808 | "bottom": null, 809 | "display": null, 810 | "flex": null, 811 | "flex_flow": null, 812 | "grid_area": null, 813 | "grid_auto_columns": null, 814 | 
"grid_auto_flow": null, 815 | "grid_auto_rows": null, 816 | "grid_column": null, 817 | "grid_gap": null, 818 | "grid_row": null, 819 | "grid_template_areas": null, 820 | "grid_template_columns": null, 821 | "grid_template_rows": null, 822 | "height": null, 823 | "justify_content": null, 824 | "justify_items": null, 825 | "left": null, 826 | "margin": null, 827 | "max_height": null, 828 | "max_width": null, 829 | "min_height": null, 830 | "min_width": null, 831 | "object_fit": null, 832 | "object_position": null, 833 | "order": null, 834 | "overflow": null, 835 | "overflow_x": null, 836 | "overflow_y": null, 837 | "padding": null, 838 | "right": null, 839 | "top": null, 840 | "visibility": null, 841 | "width": null 842 | } 843 | }, 844 | "381c420b802a4929be8fb948e6fe77b7": { 845 | "model_module": "@jupyter-widgets/base", 846 | "model_name": "LayoutModel", 847 | "model_module_version": "1.2.0", 848 | "state": { 849 | "_model_module": "@jupyter-widgets/base", 850 | "_model_module_version": "1.2.0", 851 | "_model_name": "LayoutModel", 852 | "_view_count": null, 853 | "_view_module": "@jupyter-widgets/base", 854 | "_view_module_version": "1.2.0", 855 | "_view_name": "LayoutView", 856 | "align_content": null, 857 | "align_items": null, 858 | "align_self": null, 859 | "border": null, 860 | "bottom": null, 861 | "display": null, 862 | "flex": null, 863 | "flex_flow": null, 864 | "grid_area": null, 865 | "grid_auto_columns": null, 866 | "grid_auto_flow": null, 867 | "grid_auto_rows": null, 868 | "grid_column": null, 869 | "grid_gap": null, 870 | "grid_row": null, 871 | "grid_template_areas": null, 872 | "grid_template_columns": null, 873 | "grid_template_rows": null, 874 | "height": null, 875 | "justify_content": null, 876 | "justify_items": null, 877 | "left": null, 878 | "margin": null, 879 | "max_height": null, 880 | "max_width": null, 881 | "min_height": null, 882 | "min_width": null, 883 | "object_fit": null, 884 | "object_position": null, 885 | "order": null, 886 | 
"overflow": null, 887 | "overflow_x": null, 888 | "overflow_y": null, 889 | "padding": null, 890 | "right": null, 891 | "top": null, 892 | "visibility": null, 893 | "width": null 894 | } 895 | }, 896 | "5b9986e6fa314c4da1ffb45b513255a9": { 897 | "model_module": "@jupyter-widgets/controls", 898 | "model_name": "DescriptionStyleModel", 899 | "model_module_version": "1.5.0", 900 | "state": { 901 | "_model_module": "@jupyter-widgets/controls", 902 | "_model_module_version": "1.5.0", 903 | "_model_name": "DescriptionStyleModel", 904 | "_view_count": null, 905 | "_view_module": "@jupyter-widgets/base", 906 | "_view_module_version": "1.2.0", 907 | "_view_name": "StyleView", 908 | "description_width": "" 909 | } 910 | }, 911 | "20774c588bab4925bd80a491549a8dd6": { 912 | "model_module": "@jupyter-widgets/base", 913 | "model_name": "LayoutModel", 914 | "model_module_version": "1.2.0", 915 | "state": { 916 | "_model_module": "@jupyter-widgets/base", 917 | "_model_module_version": "1.2.0", 918 | "_model_name": "LayoutModel", 919 | "_view_count": null, 920 | "_view_module": "@jupyter-widgets/base", 921 | "_view_module_version": "1.2.0", 922 | "_view_name": "LayoutView", 923 | "align_content": null, 924 | "align_items": null, 925 | "align_self": null, 926 | "border": null, 927 | "bottom": null, 928 | "display": null, 929 | "flex": null, 930 | "flex_flow": null, 931 | "grid_area": null, 932 | "grid_auto_columns": null, 933 | "grid_auto_flow": null, 934 | "grid_auto_rows": null, 935 | "grid_column": null, 936 | "grid_gap": null, 937 | "grid_row": null, 938 | "grid_template_areas": null, 939 | "grid_template_columns": null, 940 | "grid_template_rows": null, 941 | "height": null, 942 | "justify_content": null, 943 | "justify_items": null, 944 | "left": null, 945 | "margin": null, 946 | "max_height": null, 947 | "max_width": null, 948 | "min_height": null, 949 | "min_width": null, 950 | "object_fit": null, 951 | "object_position": null, 952 | "order": null, 953 | "overflow": null, 
954 | "overflow_x": null, 955 | "overflow_y": null, 956 | "padding": null, 957 | "right": null, 958 | "top": null, 959 | "visibility": null, 960 | "width": null 961 | } 962 | }, 963 | "c92e682a63664d4e83b3596e6126508b": { 964 | "model_module": "@jupyter-widgets/controls", 965 | "model_name": "ProgressStyleModel", 966 | "model_module_version": "1.5.0", 967 | "state": { 968 | "_model_module": "@jupyter-widgets/controls", 969 | "_model_module_version": "1.5.0", 970 | "_model_name": "ProgressStyleModel", 971 | "_view_count": null, 972 | "_view_module": "@jupyter-widgets/base", 973 | "_view_module_version": "1.2.0", 974 | "_view_name": "StyleView", 975 | "bar_color": null, 976 | "description_width": "" 977 | } 978 | }, 979 | "9d8c0363f8e7449b8f6a30a92d66e924": { 980 | "model_module": "@jupyter-widgets/base", 981 | "model_name": "LayoutModel", 982 | "model_module_version": "1.2.0", 983 | "state": { 984 | "_model_module": "@jupyter-widgets/base", 985 | "_model_module_version": "1.2.0", 986 | "_model_name": "LayoutModel", 987 | "_view_count": null, 988 | "_view_module": "@jupyter-widgets/base", 989 | "_view_module_version": "1.2.0", 990 | "_view_name": "LayoutView", 991 | "align_content": null, 992 | "align_items": null, 993 | "align_self": null, 994 | "border": null, 995 | "bottom": null, 996 | "display": null, 997 | "flex": null, 998 | "flex_flow": null, 999 | "grid_area": null, 1000 | "grid_auto_columns": null, 1001 | "grid_auto_flow": null, 1002 | "grid_auto_rows": null, 1003 | "grid_column": null, 1004 | "grid_gap": null, 1005 | "grid_row": null, 1006 | "grid_template_areas": null, 1007 | "grid_template_columns": null, 1008 | "grid_template_rows": null, 1009 | "height": null, 1010 | "justify_content": null, 1011 | "justify_items": null, 1012 | "left": null, 1013 | "margin": null, 1014 | "max_height": null, 1015 | "max_width": null, 1016 | "min_height": null, 1017 | "min_width": null, 1018 | "object_fit": null, 1019 | "object_position": null, 1020 | "order": null, 1021 
| "overflow": null, 1022 | "overflow_x": null, 1023 | "overflow_y": null, 1024 | "padding": null, 1025 | "right": null, 1026 | "top": null, 1027 | "visibility": null, 1028 | "width": null 1029 | } 1030 | }, 1031 | "3b8a84d61f4e4805b017ec11c0599db7": { 1032 | "model_module": "@jupyter-widgets/controls", 1033 | "model_name": "DescriptionStyleModel", 1034 | "model_module_version": "1.5.0", 1035 | "state": { 1036 | "_model_module": "@jupyter-widgets/controls", 1037 | "_model_module_version": "1.5.0", 1038 | "_model_name": "DescriptionStyleModel", 1039 | "_view_count": null, 1040 | "_view_module": "@jupyter-widgets/base", 1041 | "_view_module_version": "1.2.0", 1042 | "_view_name": "StyleView", 1043 | "description_width": "" 1044 | } 1045 | } 1046 | } 1047 | } 1048 | }, 1049 | "cells": [ 1050 | { 1051 | "cell_type": "markdown", 1052 | "metadata": { 1053 | "id": "view-in-github", 1054 | "colab_type": "text" 1055 | }, 1056 | "source": [ 1057 | "\"Open" 1058 | ] 1059 | }, 1060 | { 1061 | "cell_type": "markdown", 1062 | "source": [ 1063 | "# Sample code showing how Diverse Beam Search can improve paraphrasing quality." 1064 | ], 1065 | "metadata": { 1066 | "id": "2GUHrLV5qf08" 1067 | } 1068 | }, 1069 | { 1070 | "cell_type": "markdown", 1071 | "source": [ 1072 | "This code is supplementary material to the story published on the NLPiation Medium blog. Follow [the link](https://pub.towardsai.net/how-to-do-effective-paraphrasing-using-huggingface-and-diverse-beam-search-t5-pegasus-229ca998d229) for a detailed explanation of diverse beam search and the code below." 
1073 | ], 1074 | "metadata": { 1075 | "id": "TnNSa-tb3PyL" 1076 | } 1077 | }, 1078 | { 1079 | "cell_type": "markdown", 1080 | "source": [ 1081 | "# Download, and Load the Libraries" 1082 | ], 1083 | "metadata": { 1084 | "id": "fpQu8KsJqgtM" 1085 | } 1086 | }, 1087 | { 1088 | "cell_type": "markdown", 1089 | "source": [ 1090 | "Start by installing the Transformers library (by Huggingface) and then import the modules." 1091 | ], 1092 | "metadata": { 1093 | "id": "XVMJ_qId4OT4" 1094 | } 1095 | }, 1096 | { 1097 | "cell_type": "code", 1098 | "execution_count": 1, 1099 | "metadata": { 1100 | "id": "C7MrVX5wqVTJ" 1101 | }, 1102 | "outputs": [], 1103 | "source": [ 1104 | "!pip install -q transformers\n", 1105 | "!pip install -q datasets\n", 1106 | "!pip install -q sentencepiece" 1107 | ] 1108 | }, 1109 | { 1110 | "cell_type": "markdown", 1111 | "source": [ 1112 | "# Load the Architecture and Weights" 1113 | ], 1114 | "metadata": { 1115 | "id": "tVA5L7W04R0y" 1116 | } 1117 | }, 1118 | { 1119 | "cell_type": "code", 1120 | "source": [ 1121 | "from transformers import T5Tokenizer, T5ForConditionalGeneration" 1122 | ], 1123 | "metadata": { 1124 | "id": "9UNCoaoGqrHk" 1125 | }, 1126 | "execution_count": 2, 1127 | "outputs": [] 1128 | }, 1129 | { 1130 | "cell_type": "code", 1131 | "source": [ 1132 | "model = T5ForConditionalGeneration.from_pretrained('prithivida/parrot_paraphraser_on_T5')\n", 1133 | "tokenizer = T5Tokenizer.from_pretrained('prithivida/parrot_paraphraser_on_T5')" 1134 | ], 1135 | "metadata": { 1136 | "colab": { 1137 | "base_uri": "https://localhost:8080/", 1138 | "height": 113, 1139 | "referenced_widgets": [ 1140 | "3ed30a8a0a5e4666b4cbeaefb75aa9e3", 1141 | "209495704ab34383af7fe65c8ea9dd09", 1142 | "4624d5203f1c44718a660a6edb6bc66e", 1143 | "94dc84edc49b4199b0a6e7c6b8d87e7b", 1144 | "2391632bc3a34b3fb3210092776c74aa", 1145 | "a06640373e86495b97346998fa59edd9", 1146 | "2cf8d55b5eb34e38b7b3079312da60b5", 1147 | "3782e4e8cad847f087c3014e39d35c41", 1148 | 
"faa73c3ee50a412a91adf244b85b1de6", 1149 | "733aa710e8cd406695b3736087107bdf", 1150 | "6e66c2ca3d794fe393ee8c13b037fe9c", 1151 | "406d0b1379c748e5bddf41a08d1bc16a", 1152 | "4dde398e566244c1a01ea9a21cf61991", 1153 | "79c00b4cf7ea4912824088b2e56468bd", 1154 | "ed031c9b1a6345809176bea833e995d0", 1155 | "3bb9525493964c70a227235c1585713b", 1156 | "f147fd7003f64d0e8badb771bfb21b18", 1157 | "b758692b7b44431b80d7f75386da912f", 1158 | "6daf1c8875ae4945ae382a159131f953", 1159 | "f29012cb87a647008db4bf4464870157", 1160 | "01b4832c87444f87a33f2c4517d5c051", 1161 | "503531fc8424441e9a8ac4cfc64f7507", 1162 | "4aba0951ea8344219f3133451a314ce2", 1163 | "b61ca199e36947d8a76bb94d43705cb3", 1164 | "1c9984805be54af987bbc637cff39366", 1165 | "10e1477041db4335861396d47fcd38ba", 1166 | "41e7824ba62448169ed589d94a9ff1a7", 1167 | "381c420b802a4929be8fb948e6fe77b7", 1168 | "5b9986e6fa314c4da1ffb45b513255a9", 1169 | "20774c588bab4925bd80a491549a8dd6", 1170 | "c92e682a63664d4e83b3596e6126508b", 1171 | "9d8c0363f8e7449b8f6a30a92d66e924", 1172 | "3b8a84d61f4e4805b017ec11c0599db7" 1173 | ] 1174 | }, 1175 | "id": "LkN6zDHsq2m-", 1176 | "outputId": "810e0f4d-f1fb-4cb2-c95d-12352673398d" 1177 | }, 1178 | "execution_count": 3, 1179 | "outputs": [ 1180 | { 1181 | "output_type": "display_data", 1182 | "data": { 1183 | "text/plain": [ 1184 | "Downloading spiece.model: 0%| | 0.00/773k [00:00\"Open" 536 | ] 537 | }, 538 | { 539 | "cell_type": "markdown", 540 | "metadata": { 541 | "id": "H1F58j028eTV" 542 | }, 543 | "source": [ 544 | "## **Warm-starting BERT2BERT for CNN/Dailymail**\n", 545 | "\n", 546 | "***Note***: This notebook only uses a few training, validation, and test data samples for demonstration purposes. 
To fine-tune an encoder-decoder model on the full training data, the user should change the training and data preprocessing parameters accordingly as highlighted by the comments.\n" 547 | ] 548 | }, 549 | { 550 | "cell_type": "markdown", 551 | "metadata": { 552 | "id": "3FO5ESocXvlK" 553 | }, 554 | "source": [ 555 | "### **Data Preprocessing**\n" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "metadata": { 561 | "id": "w67vkz3KP9eZ" 562 | }, 563 | "source": [ 564 | "%%capture\n", 565 | "!pip install datasets==1.0.2\n", 566 | "!pip install transformers==4.2.1\n", 567 | "\n", 568 | "import datasets\n", 569 | "import transformers" 570 | ], 571 | "execution_count": null, 572 | "outputs": [] 573 | }, 574 | { 575 | "cell_type": "code", 576 | "metadata": { 577 | "id": "sgTiC0rhMb7C" 578 | }, 579 | "source": [ 580 | "from transformers import BertTokenizerFast\n", 581 | "\n", 582 | "tokenizer = BertTokenizerFast.from_pretrained(\"bert-base-uncased\")\n", 583 | "tokenizer.bos_token = tokenizer.cls_token\n", 584 | "tokenizer.eos_token = tokenizer.sep_token\n", 585 | "\n", 586 | "train_data = datasets.load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"train\")\n", 587 | "val_data = datasets.load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"validation[:10%]\")" 588 | ], 589 | "execution_count": null, 590 | "outputs": [] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "metadata": { 595 | "id": "yoN2q0hZUbXN", 596 | "colab": { 597 | "base_uri": "https://localhost:8080/", 598 | "height": 117, 599 | "referenced_widgets": [ 600 | "92b31e571b3b4036a5d605419ebb69d5", 601 | "abf87c25c3044264b89cfaf6c0eaee8b", 602 | "3b07ecf8fed64b31812a7cf340f9e330", 603 | "6da276ed463b4e368a40ac82f044849e", 604 | "693f95c74c8d48aebe6dc834c4b67c0d", 605 | "085e086df51e490ab1dabc1c135101fe", 606 | "cab934d5f7a440428766e4d5c03e3e8a", 607 | "b41787b2568c4b9fb3b2588e189fb071", 608 | "0e5512bcb89643ffab38dc34e2fb9e2c", 609 | "741245611dfc46448f4563de7c13b798", 610 | 
"888cd957aee64f3b87e94d65b2780e73", 611 | "72014877a8cb410880d9f9f4c1cf8203", 612 | "c3629a6449bc469c8556b0fd897d34ef", 613 | "8d108f99c47e4398bb07f9ce9b61ed7e", 614 | "d13beb567f01474fb828b69b3b7ceb4a", 615 | "020ec98f3c8e42f28299053abd682a56" 616 | ] 617 | }, 618 | "outputId": "71b0dd46-befc-46fd-9e00-b7975709a9d3" 619 | }, 620 | "source": [ 621 | "batch_size=4 # change to 16 for full training\n", 622 | "encoder_max_length=512\n", 623 | "decoder_max_length=128\n", 624 | "\n", 625 | "def process_data_to_model_inputs(batch):\n", 626 | " # tokenize the inputs and labels\n", 627 | " inputs = tokenizer(batch[\"article\"], padding=\"max_length\", truncation=True, max_length=encoder_max_length)\n", 628 | " outputs = tokenizer(batch[\"highlights\"], padding=\"max_length\", truncation=True, max_length=decoder_max_length)\n", 629 | "\n", 630 | " batch[\"input_ids\"] = inputs.input_ids\n", 631 | " batch[\"attention_mask\"] = inputs.attention_mask\n", 632 | " batch[\"decoder_input_ids\"] = outputs.input_ids\n", 633 | " batch[\"decoder_attention_mask\"] = outputs.attention_mask\n", 634 | " batch[\"labels\"] = outputs.input_ids.copy()\n", 635 | "\n", 636 | " # because BERT automatically shifts the labels, the labels correspond exactly to `decoder_input_ids`. 
\n", 637 | " # We have to make sure that the PAD token is ignored\n", 638 | " batch[\"labels\"] = [[-100 if token == tokenizer.pad_token_id else token for token in labels] for labels in batch[\"labels\"]]\n", 639 | "\n", 640 | " return batch\n", 641 | "\n", 642 | "# only use 32 training examples for notebook - DELETE LINE FOR FULL TRAINING\n", 643 | "train_data = train_data.select(range(32))\n", 644 | "\n", 645 | "train_data = train_data.map(\n", 646 | " process_data_to_model_inputs, \n", 647 | " batched=True, \n", 648 | " batch_size=batch_size, \n", 649 | " remove_columns=[\"article\", \"highlights\", \"id\"]\n", 650 | ")\n", 651 | "train_data.set_format(\n", 652 | " type=\"torch\", columns=[\"input_ids\", \"attention_mask\", \"decoder_input_ids\", \"decoder_attention_mask\", \"labels\"],\n", 653 | ")\n", 654 | "\n", 655 | "\n", 656 | "# only use 16 validation examples for notebook - DELETE LINE FOR FULL TRAINING\n", 657 | "val_data = val_data.select(range(16))\n", 658 | "\n", 659 | "val_data = val_data.map(\n", 660 | " process_data_to_model_inputs, \n", 661 | " batched=True, \n", 662 | " batch_size=batch_size, \n", 663 | " remove_columns=[\"article\", \"highlights\", \"id\"]\n", 664 | ")\n", 665 | "val_data.set_format(\n", 666 | " type=\"torch\", columns=[\"input_ids\", \"attention_mask\", \"decoder_input_ids\", \"decoder_attention_mask\", \"labels\"],\n", 667 | ")" 668 | ], 669 | "execution_count": null, 670 | "outputs": [ 671 | { 672 | "output_type": "display_data", 673 | "data": { 674 | "application/vnd.jupyter.widget-view+json": { 675 | "model_id": "92b31e571b3b4036a5d605419ebb69d5", 676 | "version_minor": 0, 677 | "version_major": 2 678 | }, 679 | "text/plain": [ 680 | "HBox(children=(FloatProgress(value=0.0, max=8.0), HTML(value='')))" 681 | ] 682 | }, 683 | "metadata": { 684 | "tags": [] 685 | } 686 | }, 687 | { 688 | "output_type": "stream", 689 | "text": [ 690 | "\n" 691 | ], 692 | "name": "stdout" 693 | }, 694 | { 695 | "output_type": "display_data", 696
| "data": { 697 | "application/vnd.jupyter.widget-view+json": { 698 | "model_id": "0e5512bcb89643ffab38dc34e2fb9e2c", 699 | "version_minor": 0, 700 | "version_major": 2 701 | }, 702 | "text/plain": [ 703 | "HBox(children=(FloatProgress(value=0.0, max=4.0), HTML(value='')))" 704 | ] 705 | }, 706 | "metadata": { 707 | "tags": [] 708 | } 709 | }, 710 | { 711 | "output_type": "stream", 712 | "text": [ 713 | "\n" 714 | ], 715 | "name": "stdout" 716 | } 717 | ] 718 | }, 719 | { 720 | "cell_type": "markdown", 721 | "metadata": { 722 | "id": "aEjb026cNC38" 723 | }, 724 | "source": [ 725 | "### **Warm-starting the Encoder-Decoder Model**" 726 | ] 727 | }, 728 | { 729 | "cell_type": "code", 730 | "metadata": { 731 | "id": "tS0UndNoQh8t" 732 | }, 733 | "source": [ 734 | "from transformers import EncoderDecoderModel\n", 735 | "\n", 736 | "bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained(\"bert-base-uncased\", \"bert-base-uncased\")" 737 | ], 738 | "execution_count": null, 739 | "outputs": [] 740 | }, 741 | { 742 | "cell_type": "code", 743 | "metadata": { 744 | "id": "JD2jv3GkyjR-" 745 | }, 746 | "source": [ 747 | "# set special tokens\n", 748 | "bert2bert.config.decoder_start_token_id = tokenizer.bos_token_id\n", 749 | "bert2bert.config.eos_token_id = tokenizer.eos_token_id\n", 750 | "bert2bert.config.pad_token_id = tokenizer.pad_token_id\n", 751 | "\n", 752 | "# sensible parameters for beam search\n", 753 | "bert2bert.config.vocab_size = bert2bert.config.decoder.vocab_size\n", 754 | "bert2bert.config.max_length = 142\n", 755 | "bert2bert.config.min_length = 56\n", 756 | "bert2bert.config.no_repeat_ngram_size = 3\n", 757 | "bert2bert.config.early_stopping = True\n", 758 | "bert2bert.config.length_penalty = 2.0\n", 759 | "bert2bert.config.num_beams = 4" 760 | ], 761 | "execution_count": null, 762 | "outputs": [] 763 | }, 764 | { 765 | "cell_type": "markdown", 766 | "metadata": { 767 | "id": "u98CLZiTkgzv" 768 | }, 769 | "source": [ 770 | "### **Fine-Tuning 
Warm-Started Encoder-Decoder Models**" 771 | ] 772 | }, 773 | { 774 | "cell_type": "markdown", 775 | "metadata": { 776 | "id": "rZK_gnIzZgTO" 777 | }, 778 | "source": [ 779 | "For the `EncoderDecoderModel` framework, we will use the `Seq2SeqTrainingArguments` and the `Seq2SeqTrainer`. Let's import them." 780 | ] 781 | }, 782 | { 783 | "cell_type": "code", 784 | "metadata": { 785 | "id": "-zkkd66rtsnA" 786 | }, 787 | "source": [ 788 | "from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer" 789 | ], 790 | "execution_count": null, 791 | "outputs": [] 792 | }, 793 | { 794 | "cell_type": "markdown", 795 | "metadata": { 796 | "id": "dPUAgo7pxH24" 797 | }, 798 | "source": [ 799 | "Also, we need to define a function to correctly compute the ROUGE score during validation. ROUGE is a much better metric to track during training than only language modeling loss." 800 | ] 801 | }, 802 | { 803 | "cell_type": "code", 804 | "metadata": { 805 | "id": "68IHmFYLx09W" 806 | }, 807 | "source": [ 808 | "# load rouge for validation\n", 809 | "rouge = datasets.load_metric(\"rouge\")\n", 810 | "\n", 811 | "def compute_metrics(pred):\n", 812 | " labels_ids = pred.label_ids\n", 813 | " pred_ids = pred.predictions\n", 814 | "\n", 815 | " # all unnecessary tokens are removed\n", 816 | " pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)\n", 817 | " labels_ids[labels_ids == -100] = tokenizer.pad_token_id\n", 818 | " label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)\n", 819 | "\n", 820 | " rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=[\"rouge2\"])[\"rouge2\"].mid\n", 821 | "\n", 822 | " return {\n", 823 | " \"rouge2_precision\": round(rouge_output.precision, 4),\n", 824 | " \"rouge2_recall\": round(rouge_output.recall, 4),\n", 825 | " \"rouge2_fmeasure\": round(rouge_output.fmeasure, 4),\n", 826 | " }" 827 | ], 828 | "execution_count": null, 829 | "outputs": [] 830 | }, 831 | { 832 | "cell_type": 
"markdown", 833 | "metadata": { 834 | "id": "1ik4hZb2yV-b" 835 | }, 836 | "source": [ 837 | "Cool! Finally, we start training." 838 | ] 839 | }, 840 | { 841 | "cell_type": "code", 842 | "metadata": { 843 | "id": "LAaTxUpdzshF", 844 | "colab": { 845 | "base_uri": "https://localhost:8080/", 846 | "height": 273 847 | }, 848 | "outputId": "3605173b-5561-4ca9-b20f-2fc64709ea81" 849 | }, 850 | "source": [ 851 | "# set training arguments - these params are not really tuned, feel free to change\n", 852 | "training_args = Seq2SeqTrainingArguments(\n", 853 | " output_dir=\"./\",\n", 854 | " evaluation_strategy=\"steps\",\n", 855 | " per_device_train_batch_size=batch_size,\n", 856 | " per_device_eval_batch_size=batch_size,\n", 857 | " predict_with_generate=True,\n", 858 | " logging_steps=2, # set to 1000 for full training\n", 859 | " save_steps=16, # set to 500 for full training\n", 860 | " eval_steps=4, # set to 8000 for full training\n", 861 | " warmup_steps=1, # set to 2000 for full training\n", 862 | " max_steps=16, # delete for full training\n", 863 | " overwrite_output_dir=True,\n", 864 | " save_total_limit=3,\n", 865 | " fp16=True, \n", 866 | ")\n", 867 | "\n", 868 | "# instantiate trainer\n", 869 | "trainer = Seq2SeqTrainer(\n", 870 | " model=bert2bert,\n", 871 | " tokenizer=tokenizer,\n", 872 | " args=training_args,\n", 873 | " compute_metrics=compute_metrics,\n", 874 | " train_dataset=train_data,\n", 875 | " eval_dataset=val_data,\n", 876 | ")\n", 877 | "trainer.train()" 878 | ], 879 | "execution_count": null, 880 | "outputs": [ 881 | { 882 | "output_type": "stream", 883 | "text": [ 884 | "/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. 
See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n", 885 | " \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n" 886 | ], 887 | "name": "stderr" 888 | }, 889 | { 890 | "output_type": "display_data", 891 | "data": { 892 | "text/html": [ 893 | "\n", 894 | "
\n", 895 | " \n", 904 | " \n", 905 | " \n", 906 | " [16/16 05:31, Epoch 2/2]\n", 907 | "
\n", 908 | "
| Step | Training Loss | Validation Loss | Rouge2 Precision | Rouge2 Recall | Rouge2 Fmeasure | Runtime | Samples Per Second |
|---|---|---|---|---|---|---|---|
| 4 | 8.502200 | 7.904778 | 0.004400 | 0.006200 | 0.004800 | 59.320500 | 0.270000 |
| 8 | 7.591400 | 7.853709 | 0.000000 | 0.000000 | 0.000000 | 59.554800 | 0.269000 |
| 12 | 7.344000 | 7.736513 | 0.004700 | 0.004200 | 0.004400 | 59.465400 | 0.269000 |
| 16 | 7.456500 | 7.734756 | 0.003900 | 0.004400 | 0.004100 | 59.428500 | 0.269000 |

" 964 | ], 965 | "text/plain": [ 966 | "" 967 | ] 968 | }, 969 | "metadata": { 970 | "tags": [] 971 | } 972 | }, 973 | { 974 | "output_type": "execute_result", 975 | "data": { 976 | "text/plain": [ 977 | "TrainOutput(global_step=16, training_loss=7.902346074581146, metrics={'train_runtime': 335.2171, 'train_samples_per_second': 0.048, 'total_flos': 60792025743360, 'epoch': 2.0})" 978 | ] 979 | }, 980 | "metadata": { 981 | "tags": [] 982 | }, 983 | "execution_count": 9 984 | } 985 | ] 986 | }, 987 | { 988 | "cell_type": "markdown", 989 | "metadata": { 990 | "id": "ZwQIEhKOrJpl" 991 | }, 992 | "source": [ 993 | "### **Evaluation**\n", 994 | "\n", 995 | "Awesome, we finished training our dummy model. Let's now evaluated the model on the test data. We make use of the dataset's handy `.map()` function to generate a summary of each sample of the test data." 996 | ] 997 | }, 998 | { 999 | "cell_type": "code", 1000 | "metadata": { 1001 | "id": "oOoSrwWarJAC" 1002 | }, 1003 | "source": [ 1004 | "import datasets\n", 1005 | "from transformers import BertTokenizer, EncoderDecoderModel\n", 1006 | "\n", 1007 | "tokenizer = BertTokenizer.from_pretrained(\"bert-base-uncased\")\n", 1008 | "model = EncoderDecoderModel.from_pretrained(\"./checkpoint-16\")\n", 1009 | "model.to(\"cuda\")\n", 1010 | "\n", 1011 | "test_data = datasets.load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"test\")\n", 1012 | "\n", 1013 | "# only use 16 training examples for notebook - DELETE LINE FOR FULL TRAINING\n", 1014 | "test_data = test_data.select(range(16))\n", 1015 | "\n", 1016 | "batch_size = 16 # change to 64 for full evaluation\n", 1017 | "\n", 1018 | "# map data correctly\n", 1019 | "def generate_summary(batch):\n", 1020 | " # Tokenizer will automatically set [BOS] [EOS]\n", 1021 | " # cut off at BERT max length 512\n", 1022 | " inputs = tokenizer(batch[\"article\"], padding=\"max_length\", truncation=True, max_length=512, return_tensors=\"pt\")\n", 1023 | " input_ids = 
inputs.input_ids.to(\"cuda\")\n", 1024 | " attention_mask = inputs.attention_mask.to(\"cuda\")\n", 1025 | "\n", 1026 | " outputs = model.generate(input_ids, attention_mask=attention_mask)\n", 1027 | "\n", 1028 | " # all special tokens will be removed\n", 1029 | " output_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)\n", 1030 | "\n", 1031 | " batch[\"pred\"] = output_str\n", 1032 | "\n", 1033 | " return batch\n", 1034 | "\n", 1035 | "results = test_data.map(generate_summary, batched=True, batch_size=batch_size, remove_columns=[\"article\"])\n", 1036 | "\n", 1037 | "pred_str = results[\"pred\"]\n", 1038 | "label_str = results[\"highlights\"]\n", 1039 | "\n", 1040 | "rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=[\"rouge2\"])[\"rouge2\"].mid\n", 1041 | "\n", 1042 | "print(rouge_output)" 1043 | ], 1044 | "execution_count": null, 1045 | "outputs": [] 1046 | }, 1047 | { 1048 | "cell_type": "markdown", 1049 | "metadata": { 1050 | "id": "7zdm50ZotZqb" 1051 | }, 1052 | "source": [ 1053 | "The fully trained *BERT2BERT* model is uploaded to the 🤗 model hub under [patrickvonplaten/bert2bert_cnn_daily_mail](https://huggingface.co/patrickvonplaten/bert2bert_cnn_daily_mail). \n", 1054 | "\n", 1055 | "The model achieves a ROUGE-2 score of **18.22**, which is slightly better than the score reported in the paper.\n", 1056 | "\n", 1057 | "To see some summarization examples, try the model's online inference API [here](https://huggingface.co/patrickvonplaten/bert2bert_cnn_daily_mail)." 1058 | ] 1059 | } 1060 | ] 1061 | } --------------------------------------------------------------------------------
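The notebook above tracks ROUGE-2 precision, recall, and F-measure through `datasets.load_metric("rouge")`. As a rough illustration of what those three numbers mean, here is a minimal pure-Python sketch of bigram overlap scoring. It is a simplification under stated assumptions, not the library's implementation: tokenization is plain lowercase whitespace splitting, there is no stemming, and there is no bootstrap aggregation (the `.mid` value used in the notebook). The names `bigram_counts` and `rouge2` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def bigram_counts(text):
    """Count bigrams in a text (assumption: lowercase whitespace tokens)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2(prediction, reference):
    """Simplified ROUGE-2: clipped bigram overlap between two strings."""
    pred = bigram_counts(prediction)
    ref = bigram_counts(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((pred & ref).values())
    precision = overlap / max(sum(pred.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if overlap == 0:
        fmeasure = 0.0
    else:
        fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


score = rouge2("the cat sat on the mat", "the cat lay on the mat")
print(score)
```

Precision divides the shared bigrams by the number of bigrams in the prediction, recall divides them by the number in the reference, and the F-measure is their harmonic mean. Scores near zero, as in the short demonstration run above, mean the generated summaries share almost no bigrams with the reference highlights.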