├── LICENSE
├── README.md
├── classification
│   └── hf_evaluate.ipynb
├── data
│   ├── mini-dataset.csv
│   ├── mini-dataset.json
│   ├── mini-dataset_with_embedding.json
│   ├── mini-llama-articles-with_embeddings.csv
│   ├── mini-llama-articles.csv
│   ├── rag_eval_dataset.json
│   └── vectorstore.zip
├── paraphrasing
│   └── hf_T5_paraphrasing.ipynb
├── summarization
│   ├── hf_BART_inference_breakdown.ipynb
│   ├── hf_BART_train_breakdown.ipynb
│   └── hf_BERT-BERT_training.ipynb
└── translation
    └── hf_bart_translation.ipynb
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # tutorial_notebooks
--------------------------------------------------------------------------------
/classification/hf_evaluate.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": [],
7 | "authorship_tag": "ABX9TyPjI+p6qUuJ1t5az+EkoO8E",
8 | "include_colab_link": true
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "language_info": {
15 | "name": "python"
16 | },
17 | "widgets": {
18 | "application/vnd.jupyter.widget-state+json": {
19 | "182593c10e3548dbb190d6cea821eb17": {
20 | "model_module": "@jupyter-widgets/controls",
21 | "model_name": "HBoxModel",
22 | "model_module_version": "1.5.0",
23 | "state": {
24 | "_dom_classes": [],
25 | "_model_module": "@jupyter-widgets/controls",
26 | "_model_module_version": "1.5.0",
27 | "_model_name": "HBoxModel",
28 | "_view_count": null,
29 | "_view_module": "@jupyter-widgets/controls",
30 | "_view_module_version": "1.5.0",
31 | "_view_name": "HBoxView",
32 | "box_style": "",
33 | "children": [
34 | "IPY_MODEL_9940ccc1f94d40fa84838c91489e9822",
35 | "IPY_MODEL_da2123ea1f604bb7bfa910320755a999",
36 | "IPY_MODEL_47cb9e14f37746528b34f48488b45d83"
37 | ],
38 | "layout": "IPY_MODEL_2677f8fb1a1a4baeb591541487de20f2"
39 | }
40 | },
41 | "9940ccc1f94d40fa84838c91489e9822": {
42 | "model_module": "@jupyter-widgets/controls",
43 | "model_name": "HTMLModel",
44 | "model_module_version": "1.5.0",
45 | "state": {
46 | "_dom_classes": [],
47 | "_model_module": "@jupyter-widgets/controls",
48 | "_model_module_version": "1.5.0",
49 | "_model_name": "HTMLModel",
50 | "_view_count": null,
51 | "_view_module": "@jupyter-widgets/controls",
52 | "_view_module_version": "1.5.0",
53 | "_view_name": "HTMLView",
54 | "description": "",
55 | "description_tooltip": null,
56 | "layout": "IPY_MODEL_9e32b380fcd4429e8e2477ede70aaa1d",
57 | "placeholder": "",
58 | "style": "IPY_MODEL_0dd59e6c6e464937be617e1ff0d4c448",
59 | "value": "100%"
60 | }
61 | },
62 | "da2123ea1f604bb7bfa910320755a999": {
63 | "model_module": "@jupyter-widgets/controls",
64 | "model_name": "FloatProgressModel",
65 | "model_module_version": "1.5.0",
66 | "state": {
67 | "_dom_classes": [],
68 | "_model_module": "@jupyter-widgets/controls",
69 | "_model_module_version": "1.5.0",
70 | "_model_name": "FloatProgressModel",
71 | "_view_count": null,
72 | "_view_module": "@jupyter-widgets/controls",
73 | "_view_module_version": "1.5.0",
74 | "_view_name": "ProgressView",
75 | "bar_style": "success",
76 | "description": "",
77 | "description_tooltip": null,
78 | "layout": "IPY_MODEL_b2aded75b044469f9a090bc592b2015c",
79 | "max": 2,
80 | "min": 0,
81 | "orientation": "horizontal",
82 | "style": "IPY_MODEL_6c6dcb3cdcae44aea7f27a3dbece79fc",
83 | "value": 2
84 | }
85 | },
86 | "47cb9e14f37746528b34f48488b45d83": {
87 | "model_module": "@jupyter-widgets/controls",
88 | "model_name": "HTMLModel",
89 | "model_module_version": "1.5.0",
90 | "state": {
91 | "_dom_classes": [],
92 | "_model_module": "@jupyter-widgets/controls",
93 | "_model_module_version": "1.5.0",
94 | "_model_name": "HTMLModel",
95 | "_view_count": null,
96 | "_view_module": "@jupyter-widgets/controls",
97 | "_view_module_version": "1.5.0",
98 | "_view_name": "HTMLView",
99 | "description": "",
100 | "description_tooltip": null,
101 | "layout": "IPY_MODEL_d9e1989bfbd946b68decc24c67a37b32",
102 | "placeholder": "",
103 | "style": "IPY_MODEL_b26dbdcb1fef49f695b92799fc310cd8",
104 | "value": " 2/2 [00:00<00:00, 19.46it/s]"
105 | }
106 | },
107 | "2677f8fb1a1a4baeb591541487de20f2": {
108 | "model_module": "@jupyter-widgets/base",
109 | "model_name": "LayoutModel",
110 | "model_module_version": "1.2.0",
111 | "state": {
112 | "_model_module": "@jupyter-widgets/base",
113 | "_model_module_version": "1.2.0",
114 | "_model_name": "LayoutModel",
115 | "_view_count": null,
116 | "_view_module": "@jupyter-widgets/base",
117 | "_view_module_version": "1.2.0",
118 | "_view_name": "LayoutView",
119 | "align_content": null,
120 | "align_items": null,
121 | "align_self": null,
122 | "border": null,
123 | "bottom": null,
124 | "display": null,
125 | "flex": null,
126 | "flex_flow": null,
127 | "grid_area": null,
128 | "grid_auto_columns": null,
129 | "grid_auto_flow": null,
130 | "grid_auto_rows": null,
131 | "grid_column": null,
132 | "grid_gap": null,
133 | "grid_row": null,
134 | "grid_template_areas": null,
135 | "grid_template_columns": null,
136 | "grid_template_rows": null,
137 | "height": null,
138 | "justify_content": null,
139 | "justify_items": null,
140 | "left": null,
141 | "margin": null,
142 | "max_height": null,
143 | "max_width": null,
144 | "min_height": null,
145 | "min_width": null,
146 | "object_fit": null,
147 | "object_position": null,
148 | "order": null,
149 | "overflow": null,
150 | "overflow_x": null,
151 | "overflow_y": null,
152 | "padding": null,
153 | "right": null,
154 | "top": null,
155 | "visibility": null,
156 | "width": null
157 | }
158 | },
159 | "9e32b380fcd4429e8e2477ede70aaa1d": {
160 | "model_module": "@jupyter-widgets/base",
161 | "model_name": "LayoutModel",
162 | "model_module_version": "1.2.0",
163 | "state": {
164 | "_model_module": "@jupyter-widgets/base",
165 | "_model_module_version": "1.2.0",
166 | "_model_name": "LayoutModel",
167 | "_view_count": null,
168 | "_view_module": "@jupyter-widgets/base",
169 | "_view_module_version": "1.2.0",
170 | "_view_name": "LayoutView",
171 | "align_content": null,
172 | "align_items": null,
173 | "align_self": null,
174 | "border": null,
175 | "bottom": null,
176 | "display": null,
177 | "flex": null,
178 | "flex_flow": null,
179 | "grid_area": null,
180 | "grid_auto_columns": null,
181 | "grid_auto_flow": null,
182 | "grid_auto_rows": null,
183 | "grid_column": null,
184 | "grid_gap": null,
185 | "grid_row": null,
186 | "grid_template_areas": null,
187 | "grid_template_columns": null,
188 | "grid_template_rows": null,
189 | "height": null,
190 | "justify_content": null,
191 | "justify_items": null,
192 | "left": null,
193 | "margin": null,
194 | "max_height": null,
195 | "max_width": null,
196 | "min_height": null,
197 | "min_width": null,
198 | "object_fit": null,
199 | "object_position": null,
200 | "order": null,
201 | "overflow": null,
202 | "overflow_x": null,
203 | "overflow_y": null,
204 | "padding": null,
205 | "right": null,
206 | "top": null,
207 | "visibility": null,
208 | "width": null
209 | }
210 | },
211 | "0dd59e6c6e464937be617e1ff0d4c448": {
212 | "model_module": "@jupyter-widgets/controls",
213 | "model_name": "DescriptionStyleModel",
214 | "model_module_version": "1.5.0",
215 | "state": {
216 | "_model_module": "@jupyter-widgets/controls",
217 | "_model_module_version": "1.5.0",
218 | "_model_name": "DescriptionStyleModel",
219 | "_view_count": null,
220 | "_view_module": "@jupyter-widgets/base",
221 | "_view_module_version": "1.2.0",
222 | "_view_name": "StyleView",
223 | "description_width": ""
224 | }
225 | },
226 | "b2aded75b044469f9a090bc592b2015c": {
227 | "model_module": "@jupyter-widgets/base",
228 | "model_name": "LayoutModel",
229 | "model_module_version": "1.2.0",
230 | "state": {
231 | "_model_module": "@jupyter-widgets/base",
232 | "_model_module_version": "1.2.0",
233 | "_model_name": "LayoutModel",
234 | "_view_count": null,
235 | "_view_module": "@jupyter-widgets/base",
236 | "_view_module_version": "1.2.0",
237 | "_view_name": "LayoutView",
238 | "align_content": null,
239 | "align_items": null,
240 | "align_self": null,
241 | "border": null,
242 | "bottom": null,
243 | "display": null,
244 | "flex": null,
245 | "flex_flow": null,
246 | "grid_area": null,
247 | "grid_auto_columns": null,
248 | "grid_auto_flow": null,
249 | "grid_auto_rows": null,
250 | "grid_column": null,
251 | "grid_gap": null,
252 | "grid_row": null,
253 | "grid_template_areas": null,
254 | "grid_template_columns": null,
255 | "grid_template_rows": null,
256 | "height": null,
257 | "justify_content": null,
258 | "justify_items": null,
259 | "left": null,
260 | "margin": null,
261 | "max_height": null,
262 | "max_width": null,
263 | "min_height": null,
264 | "min_width": null,
265 | "object_fit": null,
266 | "object_position": null,
267 | "order": null,
268 | "overflow": null,
269 | "overflow_x": null,
270 | "overflow_y": null,
271 | "padding": null,
272 | "right": null,
273 | "top": null,
274 | "visibility": null,
275 | "width": null
276 | }
277 | },
278 | "6c6dcb3cdcae44aea7f27a3dbece79fc": {
279 | "model_module": "@jupyter-widgets/controls",
280 | "model_name": "ProgressStyleModel",
281 | "model_module_version": "1.5.0",
282 | "state": {
283 | "_model_module": "@jupyter-widgets/controls",
284 | "_model_module_version": "1.5.0",
285 | "_model_name": "ProgressStyleModel",
286 | "_view_count": null,
287 | "_view_module": "@jupyter-widgets/base",
288 | "_view_module_version": "1.2.0",
289 | "_view_name": "StyleView",
290 | "bar_color": null,
291 | "description_width": ""
292 | }
293 | },
294 | "d9e1989bfbd946b68decc24c67a37b32": {
295 | "model_module": "@jupyter-widgets/base",
296 | "model_name": "LayoutModel",
297 | "model_module_version": "1.2.0",
298 | "state": {
299 | "_model_module": "@jupyter-widgets/base",
300 | "_model_module_version": "1.2.0",
301 | "_model_name": "LayoutModel",
302 | "_view_count": null,
303 | "_view_module": "@jupyter-widgets/base",
304 | "_view_module_version": "1.2.0",
305 | "_view_name": "LayoutView",
306 | "align_content": null,
307 | "align_items": null,
308 | "align_self": null,
309 | "border": null,
310 | "bottom": null,
311 | "display": null,
312 | "flex": null,
313 | "flex_flow": null,
314 | "grid_area": null,
315 | "grid_auto_columns": null,
316 | "grid_auto_flow": null,
317 | "grid_auto_rows": null,
318 | "grid_column": null,
319 | "grid_gap": null,
320 | "grid_row": null,
321 | "grid_template_areas": null,
322 | "grid_template_columns": null,
323 | "grid_template_rows": null,
324 | "height": null,
325 | "justify_content": null,
326 | "justify_items": null,
327 | "left": null,
328 | "margin": null,
329 | "max_height": null,
330 | "max_width": null,
331 | "min_height": null,
332 | "min_width": null,
333 | "object_fit": null,
334 | "object_position": null,
335 | "order": null,
336 | "overflow": null,
337 | "overflow_x": null,
338 | "overflow_y": null,
339 | "padding": null,
340 | "right": null,
341 | "top": null,
342 | "visibility": null,
343 | "width": null
344 | }
345 | },
346 | "b26dbdcb1fef49f695b92799fc310cd8": {
347 | "model_module": "@jupyter-widgets/controls",
348 | "model_name": "DescriptionStyleModel",
349 | "model_module_version": "1.5.0",
350 | "state": {
351 | "_model_module": "@jupyter-widgets/controls",
352 | "_model_module_version": "1.5.0",
353 | "_model_name": "DescriptionStyleModel",
354 | "_view_count": null,
355 | "_view_module": "@jupyter-widgets/base",
356 | "_view_module_version": "1.2.0",
357 | "_view_name": "StyleView",
358 | "description_width": ""
359 | }
360 | }
361 | }
362 | }
363 | },
364 | "cells": [
365 | {
366 | "cell_type": "markdown",
367 | "metadata": {
368 | "id": "view-in-github",
369 | "colab_type": "text"
370 | },
371 | "source": [
372 | " "
373 | ]
374 | },
375 | {
376 | "cell_type": "markdown",
377 | "source": [
378 | "# A sample code of how to use Huggingface's Evaluate library\n",
379 | "The code is the supplementary material to the story published in NLPiation medium blog. Follow [the link](https://medium.com/@nlpiation/how-to-use-the-huggingface-evaluate-library-in-action-with-batching-2948929015bf) for a detailed explanation of the diverse beam search and following code."
380 | ],
381 | "metadata": {
382 | "id": "2JeLSabjDMnU"
383 | }
384 | },
385 | {
386 | "cell_type": "markdown",
387 | "source": [
388 | "# Download and Load Libraries"
389 | ],
390 | "metadata": {
391 | "id": "rhUVNMEvDoI0"
392 | }
393 | },
394 | {
395 | "cell_type": "code",
396 | "execution_count": 1,
397 | "metadata": {
398 | "id": "oN2D9LrebBVo"
399 | },
400 | "outputs": [],
401 | "source": [
402 | "!pip install -q torch==1.13.1 datasets==2.9.0 evaluate==0.4.0 transformers==4.26.0"
403 | ]
404 | },
405 | {
406 | "cell_type": "markdown",
407 | "source": [
408 | "## Import Libraries"
409 | ],
410 | "metadata": {
411 | "id": "Qlj89Verbd5n"
412 | }
413 | },
414 | {
415 | "cell_type": "code",
416 | "source": [
417 | "import torch\n",
418 | "from transformers import AutoModelForSequenceClassification\n",
419 | "from transformers import AutoTokenizer\n",
420 | "import evaluate\n",
421 | "from datasets import load_dataset\n",
422 | "from datasets import Dataset\n",
423 | "\n",
424 | "from tqdm import tqdm\n",
425 | "import pandas as pd\n",
426 | "from sklearn.model_selection import train_test_split"
427 | ],
428 | "metadata": {
429 | "id": "Z7NDPpHdbotq"
430 | },
431 | "execution_count": 2,
432 | "outputs": []
433 | },
434 | {
435 | "cell_type": "markdown",
436 | "source": [
437 | "# Load The Dataset"
438 | ],
439 | "metadata": {
440 | "id": "ZXciiNwaqxtl"
441 | }
442 | },
443 | {
444 | "cell_type": "code",
445 | "source": [
446 | "sentiment140 = load_dataset(\"sentiment140\", cache_dir=\"./ds_sentiment140\")"
447 | ],
448 | "metadata": {
449 | "id": "9bThj8ZYbJ0U",
450 | "colab": {
451 | "base_uri": "https://localhost:8080/",
452 | "height": 67,
453 | "referenced_widgets": [
454 | "182593c10e3548dbb190d6cea821eb17",
455 | "9940ccc1f94d40fa84838c91489e9822",
456 | "da2123ea1f604bb7bfa910320755a999",
457 | "47cb9e14f37746528b34f48488b45d83",
458 | "2677f8fb1a1a4baeb591541487de20f2",
459 | "9e32b380fcd4429e8e2477ede70aaa1d",
460 | "0dd59e6c6e464937be617e1ff0d4c448",
461 | "b2aded75b044469f9a090bc592b2015c",
462 | "6c6dcb3cdcae44aea7f27a3dbece79fc",
463 | "d9e1989bfbd946b68decc24c67a37b32",
464 | "b26dbdcb1fef49f695b92799fc310cd8"
465 | ]
466 | },
467 | "outputId": "b0a33fd9-f84c-4d77-b2d3-e053c78243ed"
468 | },
469 | "execution_count": 3,
470 | "outputs": [
471 | {
472 | "output_type": "stream",
473 | "name": "stderr",
474 | "text": [
475 | "WARNING:datasets.builder:Found cached dataset sentiment140 (/content/ds_sentiment140/sentiment140/sentiment140/1.0.0/f81c014152931b776735658d8ae493b181927de002e706c4d5244ecb26376997)\n"
476 | ]
477 | },
478 | {
479 | "output_type": "display_data",
480 | "data": {
481 | "text/plain": [
482 | " 0%| | 0/2 [00:00, ?it/s]"
483 | ],
484 | "application/vnd.jupyter.widget-view+json": {
485 | "version_major": 2,
486 | "version_minor": 0,
487 | "model_id": "182593c10e3548dbb190d6cea821eb17"
488 | }
489 | },
490 | "metadata": {}
491 | }
492 | ]
493 | },
494 | {
495 | "cell_type": "markdown",
496 | "source": [
497 | "We first need to convert the dataset to Dataframe and split it, sinec the dataset does not have a fixed test or validation set."
498 | ],
499 | "metadata": {
500 | "id": "lC71MbMqqz2S"
501 | }
502 | },
503 | {
504 | "cell_type": "code",
505 | "source": [
506 | "df = sentiment140[\"train\"].to_pandas()"
507 | ],
508 | "metadata": {
509 | "id": "GN36mg0WkS-f"
510 | },
511 | "execution_count": 4,
512 | "outputs": []
513 | },
514 | {
515 | "cell_type": "code",
516 | "source": [
517 | "df['sentiment'] = df['sentiment'].replace(4, 1)"
518 | ],
519 | "metadata": {
520 | "id": "wEADNTxou_QL"
521 | },
522 | "execution_count": 5,
523 | "outputs": []
524 | },
525 | {
526 | "cell_type": "code",
527 | "source": [
528 | "X_train, X_test, y_train, y_test = train_test_split(df['text'], df['sentiment'], test_size=0.2, random_state=1)"
529 | ],
530 | "metadata": {
531 | "id": "3jjnNxAjk2dT"
532 | },
533 | "execution_count": 6,
534 | "outputs": []
535 | },
536 | {
537 | "cell_type": "code",
538 | "source": [
539 | "X_test, X_val, y_test, y_val = train_test_split(X_test, y_test, test_size=0.5, random_state=1)"
540 | ],
541 | "metadata": {
542 | "id": "ONBdamqAk8J3"
543 | },
544 | "execution_count": 7,
545 | "outputs": []
546 | },
547 | {
548 | "cell_type": "code",
549 | "source": [
550 | "print(\"Train:\", len(y_train), \" / Test:\", len(y_test), \" / Val:\", len(y_val))"
551 | ],
552 | "metadata": {
553 | "colab": {
554 | "base_uri": "https://localhost:8080/"
555 | },
556 | "id": "D6U2xs7QlB6b",
557 | "outputId": "cbd03349-e9eb-4ec8-95a9-5a45bca771be"
558 | },
559 | "execution_count": 8,
560 | "outputs": [
561 | {
562 | "output_type": "stream",
563 | "name": "stdout",
564 | "text": [
565 | "Train: 1280000 / Test: 160000 / Val: 160000\n"
566 | ]
567 | }
568 | ]
569 | },
570 | {
571 | "cell_type": "markdown",
572 | "source": [
573 | "Now, convert the separated texts and labels to Dataframes back to a joint Dataframe."
574 | ],
575 | "metadata": {
576 | "id": "Olyb2NKvrSh0"
577 | }
578 | },
579 | {
580 | "cell_type": "code",
581 | "source": [
582 | "train_df = pd.DataFrame(X_train,columns=['text'])\n",
583 | "train_df['sentiment'] = y_train\n",
584 | "train_df.head()"
585 | ],
586 | "metadata": {
587 | "colab": {
588 | "base_uri": "https://localhost:8080/",
589 | "height": 206
590 | },
591 | "id": "l4ZkQ_6nlMFO",
592 | "outputId": "25ffa5c4-491c-4aeb-e04d-b4705d40b565"
593 | },
594 | "execution_count": 9,
595 | "outputs": [
596 | {
597 | "output_type": "execute_result",
598 | "data": {
599 | "text/plain": [
600 | " text sentiment\n",
601 | "1556092 @JessicaKnows I use it and do like it. 1\n",
602 | "868905 Almost home aaand I need to pee rather badly.... 1\n",
603 | "218471 dropping the marmite and cheese covered bread ... 0\n",
604 | "620327 Having issues with Xfire broadcast. Cancelled ... 0\n",
605 | "981867 @kunaldua ask for Hermes Heritage complex. Its... 1"
606 | ]
740 | "metadata": {},
741 | "execution_count": 9
742 | }
743 | ]
744 | },
745 | {
746 | "cell_type": "code",
747 | "source": [
748 | "test_df = pd.DataFrame(X_test,columns=['text'])\n",
749 | "test_df['sentiment'] = y_test\n",
750 | "test_df.head()"
751 | ],
752 | "metadata": {
753 | "colab": {
754 | "base_uri": "https://localhost:8080/",
755 | "height": 206
756 | },
757 | "id": "5plcKEcAluOd",
758 | "outputId": "2a10df29-b42f-4f52-8663-bf2f705f700b"
759 | },
760 | "execution_count": 10,
761 | "outputs": [
762 | {
763 | "output_type": "execute_result",
764 | "data": {
765 | "text/plain": [
766 | " text sentiment\n",
767 | "932067 @door_kicker hey tofu is super good for u...an... 1\n",
768 | "909762 Caps lost. ARGH! But HP game evening was much ... 1\n",
769 | "1275248 @Ellen_F OF has already been on in Oz. Not sur... 1\n",
770 | "1274799 @alexpham4 with teleporation, I wouldn't need ... 1\n",
771 | "1530405 Omg! i have 16 followers! thank u thank u thaa... 1"
772 | ]
906 | "metadata": {},
907 | "execution_count": 10
908 | }
909 | ]
910 | },
911 | {
912 | "cell_type": "code",
913 | "source": [
914 | "valid_df = pd.DataFrame(X_val,columns=['text'])\n",
915 | "valid_df['sentiment'] = y_val\n",
916 | "valid_df.head()"
917 | ],
918 | "metadata": {
919 | "colab": {
920 | "base_uri": "https://localhost:8080/",
921 | "height": 206
922 | },
923 | "id": "pbGnTUqelyvg",
924 | "outputId": "3c15e154-77f4-4c34-e701-a3f7a397f1d1"
925 | },
926 | "execution_count": 11,
927 | "outputs": [
928 | {
929 | "output_type": "execute_result",
930 | "data": {
931 | "text/plain": [
932 | " text sentiment\n",
933 | "60473 It is still raining and more storms are moving... 0\n",
934 | "1174268 @JaydyGaGa ... Was well suprised ... I was li... 1\n",
935 | "1404666 @ddlovato I'm sure you will Demi. 1\n",
936 | "380353 @LightFoundDark yes Geographie and i dont know... 0\n",
937 | "470328 Everyone follow @truthtweet, shows which celeb... 0"
938 | ]
1071 | },
1072 | "metadata": {},
1073 | "execution_count": 11
1074 | }
1075 | ]
1076 | },
1077 | {
1078 | "cell_type": "markdown",
1079 | "source": [
1080 | "Lastly, convert the Dataframes to the Huggingface dataset object."
1081 | ],
1082 | "metadata": {
1083 | "id": "EqNPvj1Yrmqm"
1084 | }
1085 | },
1086 | {
1087 | "cell_type": "code",
1088 | "source": [
1089 | "hf_train = Dataset.from_pandas(train_df)\n",
1090 | "hf_test = Dataset.from_pandas(test_df)\n",
1091 | "hf_valid = Dataset.from_pandas(valid_df)"
1092 | ],
1093 | "metadata": {
1094 | "id": "ckPOdPJpl0Sa"
1095 | },
1096 | "execution_count": 12,
1097 | "outputs": []
1098 | },
1099 | {
1100 | "cell_type": "markdown",
1101 | "source": [
1102 | "The following fields are optional if you want to save the Dataset objects for later."
1103 | ],
1104 | "metadata": {
1105 | "id": "4Ih0nyJprv5Q"
1106 | }
1107 | },
1108 | {
1109 | "cell_type": "code",
1110 | "source": [
1111 | "# hf_train.save_to_disk('./hf-cache/processed_sentiment140/train')\n",
1112 | "# hf_test.save_to_disk('./hf-cache/processed_sentiment140/test')\n",
1113 | "# hf_valid.save_to_disk('./hf-cache/processed_sentiment140/valid')\n",
1114 | "\n",
1115 | "# from datasets import load_from_disk\n",
1116 | "\n",
1117 | "# train_set = load_from_disk('./hf-cache/processed_sentiment140/train')\n",
1118 | "# valid_set = load_from_disk('./hf-cache/processed_sentiment140/test')\n",
1119 | "# test_set = load_from_disk('./hf-cache/processed_sentiment140/valid')"
1120 | ],
1121 | "metadata": {
1122 | "id": "p7HK_oJjmDGG"
1123 | },
1124 | "execution_count": 13,
1125 | "outputs": []
1126 | },
1127 | {
1128 | "cell_type": "markdown",
1129 | "source": [
1130 | "⚠️ Bonus: You should also consider renaming the \"sentiment\" column to \"label\" if you want to use this dataset for training using the Huggingface's Trainer function."
1131 | ],
1132 | "metadata": {
1133 | "id": "0xd1yQnZr1Yh"
1134 | }
1135 | },
1136 | {
1137 | "cell_type": "code",
1138 | "source": [
1139 | "# train_set = hf_train.rename_column(\"sentiment\", \"label\")\n",
1140 | "# valid_set = hf_valid.rename_column(\"sentiment\", \"label\")"
1141 | ],
1142 | "metadata": {
1143 | "id": "vizH53ZUnQUc"
1144 | },
1145 | "execution_count": 14,
1146 | "outputs": []
1147 | },
1148 | {
1149 | "cell_type": "markdown",
1150 | "source": [
1151 | "⚠️ Comment the following fields if it is not a test run. It will select 32 datapoints from the dataset for faster prediction."
1152 | ],
1153 | "metadata": {
1154 | "id": "PisCdjcCsnrf"
1155 | }
1156 | },
1157 | {
1158 | "cell_type": "code",
1159 | "source": [
1160 | "hf_test = hf_test.select( range(32) )"
1161 | ],
1162 | "metadata": {
1163 | "id": "2fcSkrMXsnZt"
1164 | },
1165 | "execution_count": 15,
1166 | "outputs": []
1167 | },
1168 | {
1169 | "cell_type": "markdown",
1170 | "source": [
1171 | "It is not possible to do batching and itterate over the Huggingface dataset. So, the PyTorch DataLoader will take care of that."
1172 | ],
1173 | "metadata": {
1174 | "id": "jxl7UP-GsFt0"
1175 | }
1176 | },
1177 | {
1178 | "cell_type": "code",
1179 | "source": [
1180 | "ds_loader = torch.utils.data.DataLoader(\n",
1181 | " hf_test,\n",
1182 | " batch_size=16,\n",
1183 | " num_workers=4,\n",
1184 | " pin_memory=True,\n",
1185 | ")"
1186 | ],
1187 | "metadata": {
1188 | "colab": {
1189 | "base_uri": "https://localhost:8080/"
1190 | },
1191 | "id": "iFNcyIZ1oWr1",
1192 | "outputId": "dcf065a6-f1de-4a2e-bd36-ff33f91c55a7"
1193 | },
1194 | "execution_count": 16,
1195 | "outputs": [
1196 | {
1197 | "output_type": "stream",
1198 | "name": "stderr",
1199 | "text": [
1200 | "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.\n",
1201 | " warnings.warn(_create_warning_msg(\n"
1202 | ]
1203 | }
1204 | ]
1205 | },
1206 | {
1207 | "cell_type": "markdown",
1208 | "source": [
1209 | "# Load Tokenizer and Model"
1210 | ],
1211 | "metadata": {
1212 | "id": "YH7CIjHvsR--"
1213 | }
1214 | },
1215 | {
1216 | "cell_type": "markdown",
1217 | "source": [
1218 | "I chose a RoBERTa model from the Huggingface hub that is finetuned for this dataset."
1219 | ],
1220 | "metadata": {
1221 | "id": "C_B59vUasUHr"
1222 | }
1223 | },
1224 | {
1225 | "cell_type": "code",
1226 | "source": [
1227 | "tokenizer = AutoTokenizer.from_pretrained(\"pig4431/Sentiment140_roBERTa_5E\", cache_dir=\"./hf-cache/roberta\")"
1228 | ],
1229 | "metadata": {
1230 | "id": "s1ITIWcVnai9"
1231 | },
1232 | "execution_count": 17,
1233 | "outputs": []
1234 | },
1235 | {
1236 | "cell_type": "code",
1237 | "source": [
1238 | "model = AutoModelForSequenceClassification.from_pretrained(\"pig4431/Sentiment140_roBERTa_5E\", cache_dir=\"./hf-cache/roberta\")"
1239 | ],
1240 | "metadata": {
1241 | "id": "f520bEFUomRE"
1242 | },
1243 | "execution_count": 18,
1244 | "outputs": []
1245 | },
1246 | {
1247 | "cell_type": "markdown",
1248 | "source": [
1249 | "Put the model on GPU if available."
1250 | ],
1251 | "metadata": {
1252 | "id": "R6FvDjEGsel_"
1253 | }
1254 | },
1255 | {
1256 | "cell_type": "code",
1257 | "source": [
1258 | "if torch.cuda.is_available():\n",
1259 | " model.to('cuda')"
1260 | ],
1261 | "metadata": {
1262 | "id": "_bi3oogAqGxc"
1263 | },
1264 | "execution_count": 19,
1265 | "outputs": []
1266 | },
1267 | {
1268 | "cell_type": "markdown",
1269 | "source": [
1270 | "# Load the Metrics"
1271 | ],
1272 | "metadata": {
1273 | "id": "YOHVPq6-si02"
1274 | }
1275 | },
1276 | {
1277 | "cell_type": "code",
1278 | "source": [
1279 | "metrics = evaluate.combine([\"accuracy\", \"f1\", \"precision\", \"recall\"])"
1280 | ],
1281 | "metadata": {
1282 | "id": "H0Pd_tGUqKrq"
1283 | },
1284 | "execution_count": 20,
1285 | "outputs": []
1286 | },
1287 | {
1288 | "cell_type": "markdown",
1289 | "source": [
1290 | "# Prediction Loop"
1291 | ],
1292 | "metadata": {
1293 | "id": "CQhhUnI3skuL"
1294 | }
1295 | },
1296 | {
1297 | "cell_type": "code",
1298 | "source": [
1299 | "for batch in tqdm( ds_loader ): \n",
1300 | " # Tokenize\n",
1301 | " inputs = tokenizer(batch['text'], return_tensors=\"pt\", padding=True)\n",
1302 | "\n",
1303 | " if torch.cuda.is_available():\n",
1304 | " model.to('cuda')\n",
1305 | " \n",
1306 | " # Make Predictions\n",
1307 | " with torch.no_grad():\n",
1308 | " logits = model(**inputs).logits\n",
1309 | "\n",
1310 | " # Find the Predicted Label\n",
1311 | " predicted_class_id = logits.argmax(dim=-1)\n",
1312 | "\n",
1313 | " # Add the batch result to Evaluator object\n",
1314 | " metrics.add_batch(references=batch['sentiment'], predictions=predicted_class_id)"
1315 | ],
1316 | "metadata": {
1317 | "colab": {
1318 | "base_uri": "https://localhost:8080/"
1319 | },
1320 | "id": "mCx7-9FhqR8n",
1321 | "outputId": "f5818669-ba94-493a-ac16-bdbf0ce56aa2"
1322 | },
1323 | "execution_count": 21,
1324 | "outputs": [
1325 | {
1326 | "output_type": "stream",
1327 | "name": "stderr",
1328 | "text": [
1329 | "100%|██████████| 2/2 [00:08<00:00, 4.36s/it]\n"
1330 | ]
1331 | }
1332 | ]
1333 | },
1334 | {
1335 | "cell_type": "code",
1336 | "source": [
1337 | "metrics.compute()"
1338 | ],
1339 | "metadata": {
1340 | "colab": {
1341 | "base_uri": "https://localhost:8080/"
1342 | },
1343 | "id": "rtQGZyc1qhM1",
1344 | "outputId": "d0ebc8d3-56ea-45d5-de31-4ae9d64c98d2"
1345 | },
1346 | "execution_count": 22,
1347 | "outputs": [
1348 | {
1349 | "output_type": "execute_result",
1350 | "data": {
1351 | "text/plain": [
1352 | "{'accuracy': 0.8125,\n",
1353 | " 'f1': 0.8333333333333334,\n",
1354 | " 'precision': 0.9375,\n",
1355 | " 'recall': 0.75}"
1356 | ]
1357 | },
1358 | "metadata": {},
1359 | "execution_count": 22
1360 | }
1361 | ]
1362 | }
1363 | ]
1364 | }
--------------------------------------------------------------------------------
/data/mini-dataset.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/8c42a0b8456d7a534d567c4731344770fe6269c2/data/mini-dataset.csv
--------------------------------------------------------------------------------
/data/mini-dataset.json:
--------------------------------------------------------------------------------
1 | {
2 | "chunks": [
3 | {
4 | "text": "Meta has once again pushed the boundaries of AI with the release of Llama 2, the highly anticipated successor to its groundbreaking Llama 1 language model. Boasting a range of cutting-edge features, Llama 2 has already disrupted the AI landscape and poses a real challenge to ChatGPT’s dominance. In this article, we will dive into the exciting world of Llama 2 and explore what makes it a true game-changer. I. Llama 2: Revolutionizing Commercial Use Unlike its predecessor Llama 1, which was limited to research use, Llama 2 represents a major advancement as an open-source commercial model. Businesses can now integrate Llama 2 into products to create AI-powered applications. Availability on Azure and AWS facilitates fine-tuning and adoption. However, restrictions apply to prevent exploitation. Companies with over 700 million active daily users cannot use Llama 2. Additionally, its output cannot be used to improve other language models.",
5 | "embedding": ""
6 | },
7 | {
8 | "text": "II. Llama 2 Model Flavors: Llama 2 is available in four different model sizes: 7 billion, 13 billion, 34 billion, and 70 billion parameters. While 7B, 13B, and 70B have already been released, the 34B model is still awaited. The pretrained variant, trained on a whopping 2 trillion tokens, boasts a context window of 4096 tokens, twice the size of its predecessor Llama 1. Meta also released a Llama 2 fine-tuned model for chat applications that was trained on over 1 million human annotations. Such extensive training comes at a cost, with the 70B model taking a staggering 1720320 GPU hours to train. The context window’s length determines the amount of content the model can process at once, making Llama 2 a powerful language model in terms of scale and efficiency.",
9 | "embedding": ""
10 | },
11 | {
12 | "text": "III. Safety Considerations: A Top Priority for Meta: Meta’s commitment to safety and alignment shines through in Llama 2’s design. The model demonstrates exceptionally low AI safety violation percentages, surpassing even ChatGPT in safety benchmarks. Source: Meta Llama 2 paper Finding the right balance between helpfulness and safety when optimizing a model poses significant challenges. While a highly helpful model may be capable of answering any question, including sensitive ones like “How do I build a bomb?”, it also raises concerns about potential misuse. Thus, striking the perfect equilibrium between providing useful information and ensuring safety is paramount.",
13 | "embedding": ""
14 | },
15 | {
16 | "text": "However, prioritizing safety to an extreme extent can lead to a model that struggles to effectively address a diverse range of questions. This limitation could hinder the model’s practical applicability and user experience. Thus, achieving an optimum balance that allows the model to be both helpful and safe is of utmost importance. To strike the right balance between helpfulness and safety, Meta employed two reward models — one for helpfulness and another for safety — to optimize the model’s responses. The 34B parameter model has reported higher safety violations than other variants, possibly contributing to the delay in its release.",
17 | "embedding": ""
18 | },
19 | {
20 | "text": "IV. Helpfulness Comparison: Llama 2 Outperforms Competitors: Llama 2 emerges as a strong contender in the open-source language model arena, outperforming its competitors in most categories. The 70B parameter model outperforms all other open-source models, while the 7B and 34B models outshine Falcon in all categories and MPT in all categories except coding. Source: Meta Llama 2 paper. Despite being smaller, Llam a2’s performance rivals that of Chat GPT 3.5, a significantly larger closed-source model. While GPT 4 and PalM-2-L, with their larger size, outperform Llama 2, this is expected due to their capacity for handling complex language tasks. Llama 2’s impressive ability to compete with larger models highlights its efficiency and potential in the market. Source: Meta Llama 2 paper. However, Llama 2 does face challenges in coding and math problems, where models like Chat GPT 4 excel, given their significantly larger size. Chat GPT 4 performed significantly better than Llama 2 for coding (HumanEval benchmark)and math problem tasks (GSM8k benchmark). Open-source AI technologies, like Llama 2, continue to advance, offering strong competition to closed-source models.",
21 | "embedding": ""
22 | },
23 | {
24 | "text": "V. Ghost Attention: Enhancing Conversational Continuity: One unique feature in Llama 2 is Ghost Attention, which ensures continuity in conversations. This means that even after multiple interactions, the model remembers its initial instructions, ensuring more coherent and consistent responses throughout the conversation. This feature significantly enhances the user experience and makes Llama 2 a more reliable language model for interactive applications. In the example below, on the left, it forgets to use an emoji after a few conversations. On the right, with Ghost Attention, even after having many conversations, it will remember the context and continue to use emojis in its response. Source: Meta Llama 2 paper.",
25 | "embedding": ""
26 | },
27 | {
28 | "text": "VI. Temporal Capability: A Leap in Information Organization: Meta reported a groundbreaking temporal capability, where the model organizes information based on time relevance. Each question posed to the model is associated with a date, and it responds accordingly by considering the event date before which the question becomes irrelevant. For example, if you ask the question, “How long ago did Barack Obama become president?”, its only relevant after 2008. This temporal awareness allows Llama 2 to deliver more contextually accurate responses, enriching the user experience further. Source: Meta Llama 2 paper.",
29 | "embedding": ""
30 | },
31 | {
32 | "text": "VII. Open Questions and Future Outlook: Meta’s open-sourcing of Llama 2 represents a seismic shift, now offering developers and researchers commercial access to a leading language model. With Llama 2 outperforming MosaicML’s current MPT models, all eyes are on how Databricks will respond. Can MosaicML’s next MPT iteration beat Llama 2? Is it worthwhile to compete with Llama 2 or join hands with the open-source community to make the open-source models better? Meanwhile, Microsoft’s move to host Llama 2 on Azure despite having significant investment in ChatGPT raises interesting questions. Will users prefer the capabilities and transparency of an open-source model like Llama 2 over closed, proprietary options? The stakes are high, as Meta’s bold democratization play stands to reshape preferences and partnerships in the AI space. One thing is certain — the era of open language model competition has begun.",
33 | "embedding": ""
34 | },
35 | {
36 | "text": "VIII. Conclusion: With the launch of Llama 2, Meta has achieved a landmark breakthrough in open-source language models, unleashing new potential through its commercial accessibility. Llama 2’s formidable capabilities in natural language processing, along with robust safety protocols and temporal reasoning, set new benchmarks for the field. While select limitations around math and coding exist presently, Llama 2’s strengths far outweigh its weaknesses. As Meta continues honing Llama technology, this latest innovation promises to be truly transformative. By open-sourcing such an advanced model, Meta is propelling democratization and proliferation of AI across industries. From healthcare to education and beyond, Llama 2 stands to shape the landscape by putting groundbreaking language modeling into the hands of all developers and researchers. The possibilities unlocked by this open-source approach signal a shift towards a more collaborative, creative AI future.",
37 | "embedding": ""
38 | },
39 | {
40 | "text": "About 2 weeks ago, the world of generative AI was shocked by the company Meta's release of the new Llama-2 AI model. Its predecessor, Llama-1, was a breaking point in the LLM industry, as with the release of its weights along with new finetuning techniques, there was a massive creation of open-source LLM models that led to the emergence of high-performance models such as Vicuna, Koala, … In this article, we will briefly discuss some of this model's relevant points but will focus on showing how we can quickly train the model for a specific task using libraries and tools standard in this world. We will not make an exhaustive analysis of the new model, there are already numerous articles published on the subject.",
41 | "embedding": ""
42 | },
43 | {
44 | "text": "New Llama-2 model: In mid-July, Meta released its new family of pre-trained and finetuned models called Llama-2, with an open source and commercial character to facilitate its use and expansion. The base model was released with a chat version and sizes 7B, 13B, and 70B. Together with the models, the corresponding papers were published describing their characteristics and relevant points of the learning process, which provide very interesting information on the subject. An updated version of Llama 1, trained on a new mix of publicly available data. The pretraining corpus size was increased by 40%, the model’s context length was doubled, and grouped-query attention was adopted. Variants with 7B, 13B, and 70B parameters are released, along with 34B variants reported in the paper but not released.[1]",
45 | "embedding": ""
46 | },
47 | {
48 | "text": "For pre-training, 40% more tokens were used, reaching 2T, the context length was doubled and the grouped-query attention (GQA) technique was applied to speed up inference on the heavier 70B model. On the standard transformer architecture, RMSNorm normalization, SwiGLU activation, and rotatory positional embedding are used, the context length reaches 4096 tokens, and an Adam optimizer is applied with a cosine learning rate schedule, a weight decay of 0.1 and gradient clipping. The Supervised Fine-Tuning (SFT) stage is characterized by a prioritization of quality examples over quantity, as numerous reports show that the use of high-quality data results in improved final model performance. Finally, a Reinforcement Learning with Human Feedback (RLHF) step is applied to align the model with user preferences. A multitude of examples are collected where annotators select their preferred model output over a binary comparison. This data is used to train a reward model, where the focus is on helpfulness and safety.",
49 | "embedding": ""
50 | },
51 | {
52 | "text": "In short: - Trained on 2T Tokens. - Commercial use allowed. - Chat models for dialogue use cases. - 4096 default context window (can be increased). - 7B, 13B & 70B parameter version. - 70B model adopted grouped-query attention (GQA). - Chat models can use tools & plugins. - LLaMA 2-CHAT as good as OpenAI ChatGPT.",
53 | "embedding": ""
54 | },
55 | {
56 | "text": "The dataset for tuning: For our tuning process, we will take a dataset containing about 18,000 examples where the model is asked to build a Python code that solves a given task. This is an extraction of the original dataset [2], where only the Python language examples are selected. Each row contains the description of the task to be solved, an example of data input to the task if applicable, and the generated code fragment that solves the task is provided [3]. # Load dataset from the hub\ndataset = load_dataset(dataset_name, split=dataset_split)\n# Show dataset size print(f'dataset size: {len(dataset)}')\n# Show an example\nprint(dataset[randrange(len(dataset))])",
57 | "embedding": ""
58 | },
59 | {
60 | "text": "Creating the prompt To carry out an instruction fine-tuning, we must transform each one of our data examples as if it were an instruction, outlining its main sections as follows: def format_instruction(sample):\n return f'''### Instruction:\n Use the Task below and the Input given to write the Response, which is a programming code that can solve the following Task: ### Task:\n{sample['instruction']}\n### Input: \n {sample['input']}\n### Response:\n{sample['output']}\n'''Output:### Instruction: Use the Task below and the Input given to write the Response, which is a programming code that can solve the following Task: ### Task: Develop a Python program that prints 'Hello, World!' whenever it is run. ### Input: ### Response: #Python program to print 'Hello World!' print('Hello, World!')",
61 | "embedding": ""
62 | },
63 | {
64 | "text": "Fine-tuning the model\nTo carry out this stage, we have used the Google Colab environment, where we have developed a notebook that allows us to run the training in an interactive way and also a Python script to run the training in unattended mode. For the first test runs, a T4 instance with a high RAM capacity is enough, but when it comes to running the whole dataset and epochs, we have opted to use an A100 instance in order to speed up the training and ensure that its execution time is reasonable.\nIn order to be able to share the model, we will log in to the Huggingface hub using the appropriate token, so that at the end of the whole process, we will upload the model files so that they can be shared with the rest of the users.\nfrom huggingface_hub import login\nfrom dotenv import load_dotenv\nimport os\n# Load the enviroment variables\nload_dotenv()\n# Login to the Hugging Face Hub\nlogin(token=os.getenv('HF_HUB_TOKEN'))",
65 | "embedding": ""
66 | },
67 | {
68 | "text": "Fine-tuning techniques: PEFT, Lora, and QLora\nIn recent months, some papers have appeared showing how PEFT techniques can be used to train large language models with a drastic reduction of RAM requirements and consequently allowing fine-tuning of these models on a single GPU of reasonable size.\nThe usual steps to train an LLM consist, first, an intensive pre-training on billions or trillions of tokens to obtain a foundation model, and then a fine-tuning is performed on this model to specialize it on a downstream task. In this fine-tuning phase is where the PEFT technique has its purpose.\nParameter Efficient Fine-Tuning (PEFT) allows us to considerably reduce RAM and storage requirements by only fine-tuning a small number of additional parameters, with virtually all model parameters remaining frozen. PEFT has been found to produce good generalization with relatively low-volume datasets. Furthermore, it enhances the reusability and portability of the model, as the small checkpoints obtained can be easily added to the base model, and the base model can be easily fine-tuned and reused in multiple scenarios by adding the PEFT parameters. Finally, since the base model is not adjusted, all the knowledge acquired in the pre-training phase is preserved, thus avoiding catastrophic forgetting.\n",
69 | "embedding": ""
70 | },
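As a minimal illustration of how few parameters a PEFT approach actually trains, the Python sketch below (not part of the original article; the gpt2 model and its c_attn target module are merely convenient, publicly available stand-ins) wraps a small causal LM with a LoRA adapter via the peft library and prints the trainable-parameter count, which for this setup comes out well under 1% of the total.

# Illustrative sketch: wrap a small causal LM with a LoRA adapter and
# inspect how few parameters are actually trainable.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the LoRA updates
    lora_dropout=0.1,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    bias="none",
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()  # reports trainable vs. total parameters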
71 | {
72 | "text": "The most widely used PEFT techniques aim to keep the pre-trained base model untouched and add new layers or parameters on top of it. These layers are called “Adapters”, and the technique of adjusting them “adapter-tuning”: we add these layers to the pre-trained base model and only train the parameters of these new layers. However, a serious problem with this approach is that the added layers increase latency in the inference phase, which makes the process inefficient in many scenarios.\nIn the LoRA technique (Low-Rank Adaptation of Large Language Models), the idea is not to include new layers but to add values to the existing parameters in a way that avoids this latency problem at inference time. LoRA trains and stores the changes to the additional weights while freezing all the weights of the pre-trained model. Therefore, we train a new weight matrix containing the changes to the pre-trained model matrix, and this new matrix is decomposed into two low-rank matrices, as explained here:",
73 | "embedding": ""
74 | },
75 | {
76 | "text": "Let all the parameters of an LLM be in the matrix W0 and the additional weight changes be in the matrix ∆W; the final weights then become W0 + ∆W. The authors of LoRA [1] proposed that the weight change matrix ∆W can be decomposed into two low-rank matrices A and B. LoRA does not train the parameters in ∆W directly, but the parameters in A and B, so the number of trainable parameters is much smaller. Hypothetically, suppose the dimension of A is 100 * 1 and that of B is 1 * 100; the number of parameters in ∆W would then be 100 * 100 = 10000, while there are only 100 + 100 = 200 parameters to train in A and B instead of the 10000 in ∆W\n[4]. Explanation by Dr. Dataman in Fine-tuning a GPT — LoRA\nThe size of these low-rank matrices is defined by the r parameter. The smaller this value, the fewer parameters to train and therefore the less effort and time required, but, on the other hand, the greater the potential loss of information and performance.\nIf you want a more detailed explanation, you can refer to the original paper, or to the many articles that explain it in detail, such as [4].\nFinally, QLoRA [6] consists of applying quantization to the LoRA method: 4-bit NormalFloat quantization (nf4), a type optimized for normally distributed weights; double quantization to reduce the memory footprint; and the use of NVIDIA unified memory for optimization. These are techniques to optimize memory usage and achieve “lighter”, less expensive training.",
77 | "embedding": ""
78 | },
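To make the parameter-count arithmetic above concrete, here is a minimal NumPy sketch of the same 100 x 100 example (the dimensions are the illustrative ones from the text, not values used in the actual fine-tuning): the full update matrix ∆W holds 10,000 entries, while its low-rank factors A and B hold only 200 between them.

import numpy as np

d, r = 100, 1               # model dimension and LoRA rank from the example
A = np.random.randn(d, r)   # trainable factor A: 100 x 1
B = np.random.randn(r, d)   # trainable factor B: 1 x 100
delta_W = A @ B             # reconstructed update matrix: 100 x 100

print(delta_W.size)         # 10000 entries if the update were trained directly
print(A.size + B.size)      # 200 trainable parameters with the factorization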
79 | {
80 | "text": "Implementing QLoRA in our experiment requires specifying the BitsAndBytes configuration, downloading the pretrained model in 4-bit quantization, and defining a LoraConfig. Finally, we need to retrieve the tokenizer.\n# Get the type\ncompute_dtype = getattr(torch, bnb_4bit_compute_dtype)\n# BitsAndBytesConfig int-4 config\nbnb_config = BitsAndBytesConfig(\n    load_in_4bit=use_4bit,\n    bnb_4bit_use_double_quant=use_double_nested_quant,\n    bnb_4bit_quant_type=bnb_4bit_quant_type,\n    bnb_4bit_compute_dtype=compute_dtype\n)\n# Load model and tokenizer\nmodel = AutoModelForCausalLM.from_pretrained(model_id,\n    quantization_config=bnb_config, use_cache=False, device_map=device_map)\nmodel.config.pretraining_tp = 1\n# Load the tokenizer\ntokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)\ntokenizer.pad_token = tokenizer.eos_token\ntokenizer.padding_side = 'right'\nThe parameters used are defined as follows:\n# Activate 4-bit precision base model loading\nuse_4bit = True\n# Compute dtype for 4-bit base models\nbnb_4bit_compute_dtype = 'float16'\n# Quantization type (fp4 or nf4)\nbnb_4bit_quant_type = 'nf4'\n# Activate nested quantization for 4-bit base models (double quantization)\nuse_double_nested_quant = False\n# LoRA attention dimension\nlora_r = 64\n# Alpha parameter for LoRA scaling\nlora_alpha = 16\n# Dropout probability for LoRA layers\nlora_dropout = 0.1\nThe next steps are well known to all Hugging Face users: setting up the training arguments and creating a trainer. As we are performing instruction fine-tuning, we use the SFTTrainer, which encapsulates the PEFT model definition and other steps.\n# Define the training arguments\nargs = TrainingArguments(\n    output_dir=output_dir,\n    num_train_epochs=num_train_epochs,\n    per_device_train_batch_size=per_device_train_batch_size, # 6 if use_flash_attention else 4,\n    gradient_accumulation_steps=gradient_accumulation_steps,\n    gradient_checkpointing=gradient_checkpointing,\n    optim=optim,\n    logging_steps=logging_steps,\n    save_strategy='epoch',\n    learning_rate=learning_rate,\n    weight_decay=weight_decay,\n    fp16=fp16,\n    bf16=bf16,\n    max_grad_norm=max_grad_norm,\n    warmup_ratio=warmup_ratio,\n    group_by_length=group_by_length,\n    lr_scheduler_type=lr_scheduler_type,\n    disable_tqdm=disable_tqdm,\n    report_to='tensorboard',\n    seed=42\n)\n# Create the trainer\ntrainer = SFTTrainer(\n    model=model,\n    train_dataset=dataset,\n    peft_config=peft_config,\n    max_seq_length=max_seq_length,\n    tokenizer=tokenizer,\n    packing=packing,\n    formatting_func=format_instruction,\n    args=args,\n)\n# Train the model\ntrainer.train() # there will not be a progress bar since tqdm is disabled\n",
81 | "embedding": ""
82 | },
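The peft_config object passed to SFTTrainer above is not shown in this excerpt. A plausible reconstruction, assuming the lora_r, lora_alpha, and lora_dropout values listed in the article (the bias and task_type settings are common choices, not confirmed by the text), would be:

from peft import LoraConfig

# Hypothetical reconstruction of the peft_config used with SFTTrainer above.
peft_config = LoraConfig(
    r=64,                # lora_r: LoRA attention dimension
    lora_alpha=16,       # scaling factor for the LoRA updates
    lora_dropout=0.1,    # dropout applied to the LoRA layers
    bias="none",         # leave bias terms untouched
    task_type="CAUSAL_LM",
)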
83 | {
84 | "text": "# Save the model locally\ntrainer.save_model()\nThe parameters can be found in my GitHub repository; most of them are commonly used in other fine-tuning scripts for LLMs and are the following:\n# Number of training epochs\nnum_train_epochs = 1\n# Enable fp16/bf16 training (set bf16 to True with an A100)\nfp16 = False\nbf16 = True\n# Batch size per GPU for training\nper_device_train_batch_size = 4\n# Number of update steps to accumulate the gradients for\ngradient_accumulation_steps = 1\n# Enable gradient checkpointing\ngradient_checkpointing = True\n# Maximum gradient norm (gradient clipping)\nmax_grad_norm = 0.3\n# Initial learning rate (AdamW optimizer)\nlearning_rate = 2e-4\n# Weight decay to apply to all layers except bias/LayerNorm weights\nweight_decay = 0.001\n# Optimizer to use\noptim = 'paged_adamw_32bit'\n# Learning rate schedule\nlr_scheduler_type = 'cosine' #'constant'\n# Ratio of steps for a linear warmup (from 0 to learning rate)\nwarmup_ratio = 0.03\n# Group sequences into batches with the same length\n# Saves memory and speeds up training considerably\ngroup_by_length = False\n# Save checkpoint every X update steps\nsave_steps = 0\n# Log every X update steps\nlogging_steps = 25\n# Disable tqdm\ndisable_tqdm = True\nMerge the base model and the adapter weights\nAs mentioned, we have trained “modification weights” on top of the base model, so our final model requires merging the pretrained model and the adapters into a single model.\nfrom peft import AutoPeftModelForCausalLM\nmodel = AutoPeftModelForCausalLM.from_pretrained(\n    args.output_dir,\n    low_cpu_mem_usage=True,\n    return_dict=True,\n    torch_dtype=torch.float16,\n    device_map=device_map,\n)\n# Merge LoRA and base model\nmerged_model = model.merge_and_unload()\n# Save the merged model\nmerged_model.save_pretrained('merged_model', safe_serialization=True)\ntokenizer.save_pretrained('merged_model')\n# Push the merged model to the hub\nmerged_model.push_to_hub(hf_model_repo)\ntokenizer.push_to_hub(hf_model_repo)\nYou can find and download the model from my Hugging Face account: edumunozsala/llama-2-7b-int4-python-code-20k. Give it a try!\n",
85 | "embedding": ""
86 | },
87 | {
88 | "text": "Running inference to generate Python code\nFinally, we will show how to download the model from the Hugging Face Hub and call it to generate a result:\nimport torch\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n# Get the tokenizer\ntokenizer = AutoTokenizer.from_pretrained(hf_model_repo)\n# Load the model\nmodel = AutoModelForCausalLM.from_pretrained(hf_model_repo, load_in_4bit=True,\n    torch_dtype=torch.float16,\n    device_map=device_map)\n# Create an instruction\ninstruction = 'Optimize a code snippet written in Python. The code snippet should create a list of numbers from 0 to 10 that are divisible by 2.'\ninput = ''\nprompt = f'''### Instruction:\nUse the Task below and the Input given to write the Response, which is a programming code that can solve the Task.\n### Task:\n{instruction}\n### Input:\n{input}\n### Response:\n'''\n# Tokenize the input\ninput_ids = tokenizer(prompt, return_tensors='pt', truncation=True).input_ids.cuda()\n# Run the model to infer an output\noutputs = model.generate(input_ids=input_ids, max_new_tokens=100, do_sample=True, top_p=0.9, temperature=0.5)\n# Print the result\nprint(f'Prompt:\\n{prompt}\\n')\nprint(f'Generated instruction:\\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}')\nPrompt:\n### Instruction:\nUse the Task below and the Input given to write the Response, which is a programming code that can solve the Task.\n### Task:\nOptimize a code snippet written in Python. The code snippet should create a list of numbers from 0 to 10 that are divisible by 2.\n### Input:\narr = []\nfor i in range(10):\n    if i % 2 == 0:\n        arr.append(i)\n### Response:\nGenerated instruction:\narr = [i for i in range(10) if i % 2 == 0]\nGround truth:\narr = [i for i in range(11) if i % 2 == 0]\nThanks to Maxime Labonne for an excellent article [9] and to Philipp Schmid, who provides inspiring code [8]. Their articles are a must-read for everyone interested in Llama 2 and model fine-tuning.\nAnd that is all I have to mention. I hope you find this article useful, and claps are welcome! You can follow me and subscribe to my articles, or even connect with me via LinkedIn. The code is available in my GitHub repository.",
89 | "embedding": ""
90 | }
91 | ]
92 | }
--------------------------------------------------------------------------------
/data/vectorstore.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/AlaFalaki/tutorial_notebooks/8c42a0b8456d7a534d567c4731344770fe6269c2/data/vectorstore.zip
--------------------------------------------------------------------------------
/paraphrasing/hf_T5_paraphrasing.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": [],
7 | "collapsed_sections": [],
8 | "authorship_tag": "ABX9TyMKfpocsc59ZH0ZboM7IjYh",
9 | "include_colab_link": true
10 | },
11 | "kernelspec": {
12 | "name": "python3",
13 | "display_name": "Python 3"
14 | },
15 | "language_info": {
16 | "name": "python"
17 | },
18 | "widgets": {
19 | "application/vnd.jupyter.widget-state+json": {
20 | "3ed30a8a0a5e4666b4cbeaefb75aa9e3": {
21 | "model_module": "@jupyter-widgets/controls",
22 | "model_name": "HBoxModel",
23 | "model_module_version": "1.5.0",
24 | "state": {
25 | "_dom_classes": [],
26 | "_model_module": "@jupyter-widgets/controls",
27 | "_model_module_version": "1.5.0",
28 | "_model_name": "HBoxModel",
29 | "_view_count": null,
30 | "_view_module": "@jupyter-widgets/controls",
31 | "_view_module_version": "1.5.0",
32 | "_view_name": "HBoxView",
33 | "box_style": "",
34 | "children": [
35 | "IPY_MODEL_209495704ab34383af7fe65c8ea9dd09",
36 | "IPY_MODEL_4624d5203f1c44718a660a6edb6bc66e",
37 | "IPY_MODEL_94dc84edc49b4199b0a6e7c6b8d87e7b"
38 | ],
39 | "layout": "IPY_MODEL_2391632bc3a34b3fb3210092776c74aa"
40 | }
41 | },
42 | "209495704ab34383af7fe65c8ea9dd09": {
43 | "model_module": "@jupyter-widgets/controls",
44 | "model_name": "HTMLModel",
45 | "model_module_version": "1.5.0",
46 | "state": {
47 | "_dom_classes": [],
48 | "_model_module": "@jupyter-widgets/controls",
49 | "_model_module_version": "1.5.0",
50 | "_model_name": "HTMLModel",
51 | "_view_count": null,
52 | "_view_module": "@jupyter-widgets/controls",
53 | "_view_module_version": "1.5.0",
54 | "_view_name": "HTMLView",
55 | "description": "",
56 | "description_tooltip": null,
57 | "layout": "IPY_MODEL_a06640373e86495b97346998fa59edd9",
58 | "placeholder": "",
59 | "style": "IPY_MODEL_2cf8d55b5eb34e38b7b3079312da60b5",
60 | "value": "Downloading spiece.model: 100%"
61 | }
62 | },
63 | "4624d5203f1c44718a660a6edb6bc66e": {
64 | "model_module": "@jupyter-widgets/controls",
65 | "model_name": "FloatProgressModel",
66 | "model_module_version": "1.5.0",
67 | "state": {
68 | "_dom_classes": [],
69 | "_model_module": "@jupyter-widgets/controls",
70 | "_model_module_version": "1.5.0",
71 | "_model_name": "FloatProgressModel",
72 | "_view_count": null,
73 | "_view_module": "@jupyter-widgets/controls",
74 | "_view_module_version": "1.5.0",
75 | "_view_name": "ProgressView",
76 | "bar_style": "success",
77 | "description": "",
78 | "description_tooltip": null,
79 | "layout": "IPY_MODEL_3782e4e8cad847f087c3014e39d35c41",
80 | "max": 791656,
81 | "min": 0,
82 | "orientation": "horizontal",
83 | "style": "IPY_MODEL_faa73c3ee50a412a91adf244b85b1de6",
84 | "value": 791656
85 | }
86 | },
87 | "94dc84edc49b4199b0a6e7c6b8d87e7b": {
88 | "model_module": "@jupyter-widgets/controls",
89 | "model_name": "HTMLModel",
90 | "model_module_version": "1.5.0",
91 | "state": {
92 | "_dom_classes": [],
93 | "_model_module": "@jupyter-widgets/controls",
94 | "_model_module_version": "1.5.0",
95 | "_model_name": "HTMLModel",
96 | "_view_count": null,
97 | "_view_module": "@jupyter-widgets/controls",
98 | "_view_module_version": "1.5.0",
99 | "_view_name": "HTMLView",
100 | "description": "",
101 | "description_tooltip": null,
102 | "layout": "IPY_MODEL_733aa710e8cd406695b3736087107bdf",
103 | "placeholder": "",
104 | "style": "IPY_MODEL_6e66c2ca3d794fe393ee8c13b037fe9c",
105 | "value": " 773k/773k [00:00<00:00, 2.62kB/s]"
106 | }
107 | },
108 | "2391632bc3a34b3fb3210092776c74aa": {
109 | "model_module": "@jupyter-widgets/base",
110 | "model_name": "LayoutModel",
111 | "model_module_version": "1.2.0",
112 | "state": {
113 | "_model_module": "@jupyter-widgets/base",
114 | "_model_module_version": "1.2.0",
115 | "_model_name": "LayoutModel",
116 | "_view_count": null,
117 | "_view_module": "@jupyter-widgets/base",
118 | "_view_module_version": "1.2.0",
119 | "_view_name": "LayoutView",
120 | "align_content": null,
121 | "align_items": null,
122 | "align_self": null,
123 | "border": null,
124 | "bottom": null,
125 | "display": null,
126 | "flex": null,
127 | "flex_flow": null,
128 | "grid_area": null,
129 | "grid_auto_columns": null,
130 | "grid_auto_flow": null,
131 | "grid_auto_rows": null,
132 | "grid_column": null,
133 | "grid_gap": null,
134 | "grid_row": null,
135 | "grid_template_areas": null,
136 | "grid_template_columns": null,
137 | "grid_template_rows": null,
138 | "height": null,
139 | "justify_content": null,
140 | "justify_items": null,
141 | "left": null,
142 | "margin": null,
143 | "max_height": null,
144 | "max_width": null,
145 | "min_height": null,
146 | "min_width": null,
147 | "object_fit": null,
148 | "object_position": null,
149 | "order": null,
150 | "overflow": null,
151 | "overflow_x": null,
152 | "overflow_y": null,
153 | "padding": null,
154 | "right": null,
155 | "top": null,
156 | "visibility": null,
157 | "width": null
158 | }
159 | },
160 | "a06640373e86495b97346998fa59edd9": {
161 | "model_module": "@jupyter-widgets/base",
162 | "model_name": "LayoutModel",
163 | "model_module_version": "1.2.0",
164 | "state": {
165 | "_model_module": "@jupyter-widgets/base",
166 | "_model_module_version": "1.2.0",
167 | "_model_name": "LayoutModel",
168 | "_view_count": null,
169 | "_view_module": "@jupyter-widgets/base",
170 | "_view_module_version": "1.2.0",
171 | "_view_name": "LayoutView",
172 | "align_content": null,
173 | "align_items": null,
174 | "align_self": null,
175 | "border": null,
176 | "bottom": null,
177 | "display": null,
178 | "flex": null,
179 | "flex_flow": null,
180 | "grid_area": null,
181 | "grid_auto_columns": null,
182 | "grid_auto_flow": null,
183 | "grid_auto_rows": null,
184 | "grid_column": null,
185 | "grid_gap": null,
186 | "grid_row": null,
187 | "grid_template_areas": null,
188 | "grid_template_columns": null,
189 | "grid_template_rows": null,
190 | "height": null,
191 | "justify_content": null,
192 | "justify_items": null,
193 | "left": null,
194 | "margin": null,
195 | "max_height": null,
196 | "max_width": null,
197 | "min_height": null,
198 | "min_width": null,
199 | "object_fit": null,
200 | "object_position": null,
201 | "order": null,
202 | "overflow": null,
203 | "overflow_x": null,
204 | "overflow_y": null,
205 | "padding": null,
206 | "right": null,
207 | "top": null,
208 | "visibility": null,
209 | "width": null
210 | }
211 | },
212 | "2cf8d55b5eb34e38b7b3079312da60b5": {
213 | "model_module": "@jupyter-widgets/controls",
214 | "model_name": "DescriptionStyleModel",
215 | "model_module_version": "1.5.0",
216 | "state": {
217 | "_model_module": "@jupyter-widgets/controls",
218 | "_model_module_version": "1.5.0",
219 | "_model_name": "DescriptionStyleModel",
220 | "_view_count": null,
221 | "_view_module": "@jupyter-widgets/base",
222 | "_view_module_version": "1.2.0",
223 | "_view_name": "StyleView",
224 | "description_width": ""
225 | }
226 | },
227 | "3782e4e8cad847f087c3014e39d35c41": {
228 | "model_module": "@jupyter-widgets/base",
229 | "model_name": "LayoutModel",
230 | "model_module_version": "1.2.0",
231 | "state": {
232 | "_model_module": "@jupyter-widgets/base",
233 | "_model_module_version": "1.2.0",
234 | "_model_name": "LayoutModel",
235 | "_view_count": null,
236 | "_view_module": "@jupyter-widgets/base",
237 | "_view_module_version": "1.2.0",
238 | "_view_name": "LayoutView",
239 | "align_content": null,
240 | "align_items": null,
241 | "align_self": null,
242 | "border": null,
243 | "bottom": null,
244 | "display": null,
245 | "flex": null,
246 | "flex_flow": null,
247 | "grid_area": null,
248 | "grid_auto_columns": null,
249 | "grid_auto_flow": null,
250 | "grid_auto_rows": null,
251 | "grid_column": null,
252 | "grid_gap": null,
253 | "grid_row": null,
254 | "grid_template_areas": null,
255 | "grid_template_columns": null,
256 | "grid_template_rows": null,
257 | "height": null,
258 | "justify_content": null,
259 | "justify_items": null,
260 | "left": null,
261 | "margin": null,
262 | "max_height": null,
263 | "max_width": null,
264 | "min_height": null,
265 | "min_width": null,
266 | "object_fit": null,
267 | "object_position": null,
268 | "order": null,
269 | "overflow": null,
270 | "overflow_x": null,
271 | "overflow_y": null,
272 | "padding": null,
273 | "right": null,
274 | "top": null,
275 | "visibility": null,
276 | "width": null
277 | }
278 | },
279 | "faa73c3ee50a412a91adf244b85b1de6": {
280 | "model_module": "@jupyter-widgets/controls",
281 | "model_name": "ProgressStyleModel",
282 | "model_module_version": "1.5.0",
283 | "state": {
284 | "_model_module": "@jupyter-widgets/controls",
285 | "_model_module_version": "1.5.0",
286 | "_model_name": "ProgressStyleModel",
287 | "_view_count": null,
288 | "_view_module": "@jupyter-widgets/base",
289 | "_view_module_version": "1.2.0",
290 | "_view_name": "StyleView",
291 | "bar_color": null,
292 | "description_width": ""
293 | }
294 | },
295 | "733aa710e8cd406695b3736087107bdf": {
296 | "model_module": "@jupyter-widgets/base",
297 | "model_name": "LayoutModel",
298 | "model_module_version": "1.2.0",
299 | "state": {
300 | "_model_module": "@jupyter-widgets/base",
301 | "_model_module_version": "1.2.0",
302 | "_model_name": "LayoutModel",
303 | "_view_count": null,
304 | "_view_module": "@jupyter-widgets/base",
305 | "_view_module_version": "1.2.0",
306 | "_view_name": "LayoutView",
307 | "align_content": null,
308 | "align_items": null,
309 | "align_self": null,
310 | "border": null,
311 | "bottom": null,
312 | "display": null,
313 | "flex": null,
314 | "flex_flow": null,
315 | "grid_area": null,
316 | "grid_auto_columns": null,
317 | "grid_auto_flow": null,
318 | "grid_auto_rows": null,
319 | "grid_column": null,
320 | "grid_gap": null,
321 | "grid_row": null,
322 | "grid_template_areas": null,
323 | "grid_template_columns": null,
324 | "grid_template_rows": null,
325 | "height": null,
326 | "justify_content": null,
327 | "justify_items": null,
328 | "left": null,
329 | "margin": null,
330 | "max_height": null,
331 | "max_width": null,
332 | "min_height": null,
333 | "min_width": null,
334 | "object_fit": null,
335 | "object_position": null,
336 | "order": null,
337 | "overflow": null,
338 | "overflow_x": null,
339 | "overflow_y": null,
340 | "padding": null,
341 | "right": null,
342 | "top": null,
343 | "visibility": null,
344 | "width": null
345 | }
346 | },
347 | "6e66c2ca3d794fe393ee8c13b037fe9c": {
348 | "model_module": "@jupyter-widgets/controls",
349 | "model_name": "DescriptionStyleModel",
350 | "model_module_version": "1.5.0",
351 | "state": {
352 | "_model_module": "@jupyter-widgets/controls",
353 | "_model_module_version": "1.5.0",
354 | "_model_name": "DescriptionStyleModel",
355 | "_view_count": null,
356 | "_view_module": "@jupyter-widgets/base",
357 | "_view_module_version": "1.2.0",
358 | "_view_name": "StyleView",
359 | "description_width": ""
360 | }
361 | },
362 | "406d0b1379c748e5bddf41a08d1bc16a": {
363 | "model_module": "@jupyter-widgets/controls",
364 | "model_name": "HBoxModel",
365 | "model_module_version": "1.5.0",
366 | "state": {
367 | "_dom_classes": [],
368 | "_model_module": "@jupyter-widgets/controls",
369 | "_model_module_version": "1.5.0",
370 | "_model_name": "HBoxModel",
371 | "_view_count": null,
372 | "_view_module": "@jupyter-widgets/controls",
373 | "_view_module_version": "1.5.0",
374 | "_view_name": "HBoxView",
375 | "box_style": "",
376 | "children": [
377 | "IPY_MODEL_4dde398e566244c1a01ea9a21cf61991",
378 | "IPY_MODEL_79c00b4cf7ea4912824088b2e56468bd",
379 | "IPY_MODEL_ed031c9b1a6345809176bea833e995d0"
380 | ],
381 | "layout": "IPY_MODEL_3bb9525493964c70a227235c1585713b"
382 | }
383 | },
384 | "4dde398e566244c1a01ea9a21cf61991": {
385 | "model_module": "@jupyter-widgets/controls",
386 | "model_name": "HTMLModel",
387 | "model_module_version": "1.5.0",
388 | "state": {
389 | "_dom_classes": [],
390 | "_model_module": "@jupyter-widgets/controls",
391 | "_model_module_version": "1.5.0",
392 | "_model_name": "HTMLModel",
393 | "_view_count": null,
394 | "_view_module": "@jupyter-widgets/controls",
395 | "_view_module_version": "1.5.0",
396 | "_view_name": "HTMLView",
397 | "description": "",
398 | "description_tooltip": null,
399 | "layout": "IPY_MODEL_f147fd7003f64d0e8badb771bfb21b18",
400 | "placeholder": "",
401 | "style": "IPY_MODEL_b758692b7b44431b80d7f75386da912f",
402 | "value": "Downloading special_tokens_map.json: 100%"
403 | }
404 | },
405 | "79c00b4cf7ea4912824088b2e56468bd": {
406 | "model_module": "@jupyter-widgets/controls",
407 | "model_name": "FloatProgressModel",
408 | "model_module_version": "1.5.0",
409 | "state": {
410 | "_dom_classes": [],
411 | "_model_module": "@jupyter-widgets/controls",
412 | "_model_module_version": "1.5.0",
413 | "_model_name": "FloatProgressModel",
414 | "_view_count": null,
415 | "_view_module": "@jupyter-widgets/controls",
416 | "_view_module_version": "1.5.0",
417 | "_view_name": "ProgressView",
418 | "bar_style": "success",
419 | "description": "",
420 | "description_tooltip": null,
421 | "layout": "IPY_MODEL_6daf1c8875ae4945ae382a159131f953",
422 | "max": 1786,
423 | "min": 0,
424 | "orientation": "horizontal",
425 | "style": "IPY_MODEL_f29012cb87a647008db4bf4464870157",
426 | "value": 1786
427 | }
428 | },
429 | "ed031c9b1a6345809176bea833e995d0": {
430 | "model_module": "@jupyter-widgets/controls",
431 | "model_name": "HTMLModel",
432 | "model_module_version": "1.5.0",
433 | "state": {
434 | "_dom_classes": [],
435 | "_model_module": "@jupyter-widgets/controls",
436 | "_model_module_version": "1.5.0",
437 | "_model_name": "HTMLModel",
438 | "_view_count": null,
439 | "_view_module": "@jupyter-widgets/controls",
440 | "_view_module_version": "1.5.0",
441 | "_view_name": "HTMLView",
442 | "description": "",
443 | "description_tooltip": null,
444 | "layout": "IPY_MODEL_01b4832c87444f87a33f2c4517d5c051",
445 | "placeholder": "",
446 | "style": "IPY_MODEL_503531fc8424441e9a8ac4cfc64f7507",
447 | "value": " 1.74k/1.74k [00:00<00:00, 7.37kB/s]"
448 | }
449 | },
450 | "3bb9525493964c70a227235c1585713b": {
451 | "model_module": "@jupyter-widgets/base",
452 | "model_name": "LayoutModel",
453 | "model_module_version": "1.2.0",
454 | "state": {
455 | "_model_module": "@jupyter-widgets/base",
456 | "_model_module_version": "1.2.0",
457 | "_model_name": "LayoutModel",
458 | "_view_count": null,
459 | "_view_module": "@jupyter-widgets/base",
460 | "_view_module_version": "1.2.0",
461 | "_view_name": "LayoutView",
462 | "align_content": null,
463 | "align_items": null,
464 | "align_self": null,
465 | "border": null,
466 | "bottom": null,
467 | "display": null,
468 | "flex": null,
469 | "flex_flow": null,
470 | "grid_area": null,
471 | "grid_auto_columns": null,
472 | "grid_auto_flow": null,
473 | "grid_auto_rows": null,
474 | "grid_column": null,
475 | "grid_gap": null,
476 | "grid_row": null,
477 | "grid_template_areas": null,
478 | "grid_template_columns": null,
479 | "grid_template_rows": null,
480 | "height": null,
481 | "justify_content": null,
482 | "justify_items": null,
483 | "left": null,
484 | "margin": null,
485 | "max_height": null,
486 | "max_width": null,
487 | "min_height": null,
488 | "min_width": null,
489 | "object_fit": null,
490 | "object_position": null,
491 | "order": null,
492 | "overflow": null,
493 | "overflow_x": null,
494 | "overflow_y": null,
495 | "padding": null,
496 | "right": null,
497 | "top": null,
498 | "visibility": null,
499 | "width": null
500 | }
501 | },
502 | "f147fd7003f64d0e8badb771bfb21b18": {
503 | "model_module": "@jupyter-widgets/base",
504 | "model_name": "LayoutModel",
505 | "model_module_version": "1.2.0",
506 | "state": {
507 | "_model_module": "@jupyter-widgets/base",
508 | "_model_module_version": "1.2.0",
509 | "_model_name": "LayoutModel",
510 | "_view_count": null,
511 | "_view_module": "@jupyter-widgets/base",
512 | "_view_module_version": "1.2.0",
513 | "_view_name": "LayoutView",
514 | "align_content": null,
515 | "align_items": null,
516 | "align_self": null,
517 | "border": null,
518 | "bottom": null,
519 | "display": null,
520 | "flex": null,
521 | "flex_flow": null,
522 | "grid_area": null,
523 | "grid_auto_columns": null,
524 | "grid_auto_flow": null,
525 | "grid_auto_rows": null,
526 | "grid_column": null,
527 | "grid_gap": null,
528 | "grid_row": null,
529 | "grid_template_areas": null,
530 | "grid_template_columns": null,
531 | "grid_template_rows": null,
532 | "height": null,
533 | "justify_content": null,
534 | "justify_items": null,
535 | "left": null,
536 | "margin": null,
537 | "max_height": null,
538 | "max_width": null,
539 | "min_height": null,
540 | "min_width": null,
541 | "object_fit": null,
542 | "object_position": null,
543 | "order": null,
544 | "overflow": null,
545 | "overflow_x": null,
546 | "overflow_y": null,
547 | "padding": null,
548 | "right": null,
549 | "top": null,
550 | "visibility": null,
551 | "width": null
552 | }
553 | },
554 | "b758692b7b44431b80d7f75386da912f": {
555 | "model_module": "@jupyter-widgets/controls",
556 | "model_name": "DescriptionStyleModel",
557 | "model_module_version": "1.5.0",
558 | "state": {
559 | "_model_module": "@jupyter-widgets/controls",
560 | "_model_module_version": "1.5.0",
561 | "_model_name": "DescriptionStyleModel",
562 | "_view_count": null,
563 | "_view_module": "@jupyter-widgets/base",
564 | "_view_module_version": "1.2.0",
565 | "_view_name": "StyleView",
566 | "description_width": ""
567 | }
568 | },
569 | "6daf1c8875ae4945ae382a159131f953": {
570 | "model_module": "@jupyter-widgets/base",
571 | "model_name": "LayoutModel",
572 | "model_module_version": "1.2.0",
573 | "state": {
574 | "_model_module": "@jupyter-widgets/base",
575 | "_model_module_version": "1.2.0",
576 | "_model_name": "LayoutModel",
577 | "_view_count": null,
578 | "_view_module": "@jupyter-widgets/base",
579 | "_view_module_version": "1.2.0",
580 | "_view_name": "LayoutView",
581 | "align_content": null,
582 | "align_items": null,
583 | "align_self": null,
584 | "border": null,
585 | "bottom": null,
586 | "display": null,
587 | "flex": null,
588 | "flex_flow": null,
589 | "grid_area": null,
590 | "grid_auto_columns": null,
591 | "grid_auto_flow": null,
592 | "grid_auto_rows": null,
593 | "grid_column": null,
594 | "grid_gap": null,
595 | "grid_row": null,
596 | "grid_template_areas": null,
597 | "grid_template_columns": null,
598 | "grid_template_rows": null,
599 | "height": null,
600 | "justify_content": null,
601 | "justify_items": null,
602 | "left": null,
603 | "margin": null,
604 | "max_height": null,
605 | "max_width": null,
606 | "min_height": null,
607 | "min_width": null,
608 | "object_fit": null,
609 | "object_position": null,
610 | "order": null,
611 | "overflow": null,
612 | "overflow_x": null,
613 | "overflow_y": null,
614 | "padding": null,
615 | "right": null,
616 | "top": null,
617 | "visibility": null,
618 | "width": null
619 | }
620 | },
621 | "f29012cb87a647008db4bf4464870157": {
622 | "model_module": "@jupyter-widgets/controls",
623 | "model_name": "ProgressStyleModel",
624 | "model_module_version": "1.5.0",
625 | "state": {
626 | "_model_module": "@jupyter-widgets/controls",
627 | "_model_module_version": "1.5.0",
628 | "_model_name": "ProgressStyleModel",
629 | "_view_count": null,
630 | "_view_module": "@jupyter-widgets/base",
631 | "_view_module_version": "1.2.0",
632 | "_view_name": "StyleView",
633 | "bar_color": null,
634 | "description_width": ""
635 | }
636 | },
637 | "01b4832c87444f87a33f2c4517d5c051": {
638 | "model_module": "@jupyter-widgets/base",
639 | "model_name": "LayoutModel",
640 | "model_module_version": "1.2.0",
641 | "state": {
642 | "_model_module": "@jupyter-widgets/base",
643 | "_model_module_version": "1.2.0",
644 | "_model_name": "LayoutModel",
645 | "_view_count": null,
646 | "_view_module": "@jupyter-widgets/base",
647 | "_view_module_version": "1.2.0",
648 | "_view_name": "LayoutView",
649 | "align_content": null,
650 | "align_items": null,
651 | "align_self": null,
652 | "border": null,
653 | "bottom": null,
654 | "display": null,
655 | "flex": null,
656 | "flex_flow": null,
657 | "grid_area": null,
658 | "grid_auto_columns": null,
659 | "grid_auto_flow": null,
660 | "grid_auto_rows": null,
661 | "grid_column": null,
662 | "grid_gap": null,
663 | "grid_row": null,
664 | "grid_template_areas": null,
665 | "grid_template_columns": null,
666 | "grid_template_rows": null,
667 | "height": null,
668 | "justify_content": null,
669 | "justify_items": null,
670 | "left": null,
671 | "margin": null,
672 | "max_height": null,
673 | "max_width": null,
674 | "min_height": null,
675 | "min_width": null,
676 | "object_fit": null,
677 | "object_position": null,
678 | "order": null,
679 | "overflow": null,
680 | "overflow_x": null,
681 | "overflow_y": null,
682 | "padding": null,
683 | "right": null,
684 | "top": null,
685 | "visibility": null,
686 | "width": null
687 | }
688 | },
689 | "503531fc8424441e9a8ac4cfc64f7507": {
690 | "model_module": "@jupyter-widgets/controls",
691 | "model_name": "DescriptionStyleModel",
692 | "model_module_version": "1.5.0",
693 | "state": {
694 | "_model_module": "@jupyter-widgets/controls",
695 | "_model_module_version": "1.5.0",
696 | "_model_name": "DescriptionStyleModel",
697 | "_view_count": null,
698 | "_view_module": "@jupyter-widgets/base",
699 | "_view_module_version": "1.2.0",
700 | "_view_name": "StyleView",
701 | "description_width": ""
702 | }
703 | },
704 | "4aba0951ea8344219f3133451a314ce2": {
705 | "model_module": "@jupyter-widgets/controls",
706 | "model_name": "HBoxModel",
707 | "model_module_version": "1.5.0",
708 | "state": {
709 | "_dom_classes": [],
710 | "_model_module": "@jupyter-widgets/controls",
711 | "_model_module_version": "1.5.0",
712 | "_model_name": "HBoxModel",
713 | "_view_count": null,
714 | "_view_module": "@jupyter-widgets/controls",
715 | "_view_module_version": "1.5.0",
716 | "_view_name": "HBoxView",
717 | "box_style": "",
718 | "children": [
719 | "IPY_MODEL_b61ca199e36947d8a76bb94d43705cb3",
720 | "IPY_MODEL_1c9984805be54af987bbc637cff39366",
721 | "IPY_MODEL_10e1477041db4335861396d47fcd38ba"
722 | ],
723 | "layout": "IPY_MODEL_41e7824ba62448169ed589d94a9ff1a7"
724 | }
725 | },
726 | "b61ca199e36947d8a76bb94d43705cb3": {
727 | "model_module": "@jupyter-widgets/controls",
728 | "model_name": "HTMLModel",
729 | "model_module_version": "1.5.0",
730 | "state": {
731 | "_dom_classes": [],
732 | "_model_module": "@jupyter-widgets/controls",
733 | "_model_module_version": "1.5.0",
734 | "_model_name": "HTMLModel",
735 | "_view_count": null,
736 | "_view_module": "@jupyter-widgets/controls",
737 | "_view_module_version": "1.5.0",
738 | "_view_name": "HTMLView",
739 | "description": "",
740 | "description_tooltip": null,
741 | "layout": "IPY_MODEL_381c420b802a4929be8fb948e6fe77b7",
742 | "placeholder": "",
743 | "style": "IPY_MODEL_5b9986e6fa314c4da1ffb45b513255a9",
744 | "value": "Downloading tokenizer_config.json: 100%"
745 | }
746 | },
747 | "1c9984805be54af987bbc637cff39366": {
748 | "model_module": "@jupyter-widgets/controls",
749 | "model_name": "FloatProgressModel",
750 | "model_module_version": "1.5.0",
751 | "state": {
752 | "_dom_classes": [],
753 | "_model_module": "@jupyter-widgets/controls",
754 | "_model_module_version": "1.5.0",
755 | "_model_name": "FloatProgressModel",
756 | "_view_count": null,
757 | "_view_module": "@jupyter-widgets/controls",
758 | "_view_module_version": "1.5.0",
759 | "_view_name": "ProgressView",
760 | "bar_style": "success",
761 | "description": "",
762 | "description_tooltip": null,
763 | "layout": "IPY_MODEL_20774c588bab4925bd80a491549a8dd6",
764 | "max": 1889,
765 | "min": 0,
766 | "orientation": "horizontal",
767 | "style": "IPY_MODEL_c92e682a63664d4e83b3596e6126508b",
768 | "value": 1889
769 | }
770 | },
771 | "10e1477041db4335861396d47fcd38ba": {
772 | "model_module": "@jupyter-widgets/controls",
773 | "model_name": "HTMLModel",
774 | "model_module_version": "1.5.0",
775 | "state": {
776 | "_dom_classes": [],
777 | "_model_module": "@jupyter-widgets/controls",
778 | "_model_module_version": "1.5.0",
779 | "_model_name": "HTMLModel",
780 | "_view_count": null,
781 | "_view_module": "@jupyter-widgets/controls",
782 | "_view_module_version": "1.5.0",
783 | "_view_name": "HTMLView",
784 | "description": "",
785 | "description_tooltip": null,
786 | "layout": "IPY_MODEL_9d8c0363f8e7449b8f6a30a92d66e924",
787 | "placeholder": "",
788 | "style": "IPY_MODEL_3b8a84d61f4e4805b017ec11c0599db7",
789 | "value": " 1.84k/1.84k [00:00<00:00, 6.55kB/s]"
790 | }
791 | },
792 | "41e7824ba62448169ed589d94a9ff1a7": {
793 | "model_module": "@jupyter-widgets/base",
794 | "model_name": "LayoutModel",
795 | "model_module_version": "1.2.0",
796 | "state": {
797 | "_model_module": "@jupyter-widgets/base",
798 | "_model_module_version": "1.2.0",
799 | "_model_name": "LayoutModel",
800 | "_view_count": null,
801 | "_view_module": "@jupyter-widgets/base",
802 | "_view_module_version": "1.2.0",
803 | "_view_name": "LayoutView",
804 | "align_content": null,
805 | "align_items": null,
806 | "align_self": null,
807 | "border": null,
808 | "bottom": null,
809 | "display": null,
810 | "flex": null,
811 | "flex_flow": null,
812 | "grid_area": null,
813 | "grid_auto_columns": null,
814 | "grid_auto_flow": null,
815 | "grid_auto_rows": null,
816 | "grid_column": null,
817 | "grid_gap": null,
818 | "grid_row": null,
819 | "grid_template_areas": null,
820 | "grid_template_columns": null,
821 | "grid_template_rows": null,
822 | "height": null,
823 | "justify_content": null,
824 | "justify_items": null,
825 | "left": null,
826 | "margin": null,
827 | "max_height": null,
828 | "max_width": null,
829 | "min_height": null,
830 | "min_width": null,
831 | "object_fit": null,
832 | "object_position": null,
833 | "order": null,
834 | "overflow": null,
835 | "overflow_x": null,
836 | "overflow_y": null,
837 | "padding": null,
838 | "right": null,
839 | "top": null,
840 | "visibility": null,
841 | "width": null
842 | }
843 | },
844 | "381c420b802a4929be8fb948e6fe77b7": {
845 | "model_module": "@jupyter-widgets/base",
846 | "model_name": "LayoutModel",
847 | "model_module_version": "1.2.0",
848 | "state": {
849 | "_model_module": "@jupyter-widgets/base",
850 | "_model_module_version": "1.2.0",
851 | "_model_name": "LayoutModel",
852 | "_view_count": null,
853 | "_view_module": "@jupyter-widgets/base",
854 | "_view_module_version": "1.2.0",
855 | "_view_name": "LayoutView",
856 | "align_content": null,
857 | "align_items": null,
858 | "align_self": null,
859 | "border": null,
860 | "bottom": null,
861 | "display": null,
862 | "flex": null,
863 | "flex_flow": null,
864 | "grid_area": null,
865 | "grid_auto_columns": null,
866 | "grid_auto_flow": null,
867 | "grid_auto_rows": null,
868 | "grid_column": null,
869 | "grid_gap": null,
870 | "grid_row": null,
871 | "grid_template_areas": null,
872 | "grid_template_columns": null,
873 | "grid_template_rows": null,
874 | "height": null,
875 | "justify_content": null,
876 | "justify_items": null,
877 | "left": null,
878 | "margin": null,
879 | "max_height": null,
880 | "max_width": null,
881 | "min_height": null,
882 | "min_width": null,
883 | "object_fit": null,
884 | "object_position": null,
885 | "order": null,
886 | "overflow": null,
887 | "overflow_x": null,
888 | "overflow_y": null,
889 | "padding": null,
890 | "right": null,
891 | "top": null,
892 | "visibility": null,
893 | "width": null
894 | }
895 | },
896 | "5b9986e6fa314c4da1ffb45b513255a9": {
897 | "model_module": "@jupyter-widgets/controls",
898 | "model_name": "DescriptionStyleModel",
899 | "model_module_version": "1.5.0",
900 | "state": {
901 | "_model_module": "@jupyter-widgets/controls",
902 | "_model_module_version": "1.5.0",
903 | "_model_name": "DescriptionStyleModel",
904 | "_view_count": null,
905 | "_view_module": "@jupyter-widgets/base",
906 | "_view_module_version": "1.2.0",
907 | "_view_name": "StyleView",
908 | "description_width": ""
909 | }
910 | },
911 | "20774c588bab4925bd80a491549a8dd6": {
912 | "model_module": "@jupyter-widgets/base",
913 | "model_name": "LayoutModel",
914 | "model_module_version": "1.2.0",
915 | "state": {
916 | "_model_module": "@jupyter-widgets/base",
917 | "_model_module_version": "1.2.0",
918 | "_model_name": "LayoutModel",
919 | "_view_count": null,
920 | "_view_module": "@jupyter-widgets/base",
921 | "_view_module_version": "1.2.0",
922 | "_view_name": "LayoutView",
923 | "align_content": null,
924 | "align_items": null,
925 | "align_self": null,
926 | "border": null,
927 | "bottom": null,
928 | "display": null,
929 | "flex": null,
930 | "flex_flow": null,
931 | "grid_area": null,
932 | "grid_auto_columns": null,
933 | "grid_auto_flow": null,
934 | "grid_auto_rows": null,
935 | "grid_column": null,
936 | "grid_gap": null,
937 | "grid_row": null,
938 | "grid_template_areas": null,
939 | "grid_template_columns": null,
940 | "grid_template_rows": null,
941 | "height": null,
942 | "justify_content": null,
943 | "justify_items": null,
944 | "left": null,
945 | "margin": null,
946 | "max_height": null,
947 | "max_width": null,
948 | "min_height": null,
949 | "min_width": null,
950 | "object_fit": null,
951 | "object_position": null,
952 | "order": null,
953 | "overflow": null,
954 | "overflow_x": null,
955 | "overflow_y": null,
956 | "padding": null,
957 | "right": null,
958 | "top": null,
959 | "visibility": null,
960 | "width": null
961 | }
962 | },
963 | "c92e682a63664d4e83b3596e6126508b": {
964 | "model_module": "@jupyter-widgets/controls",
965 | "model_name": "ProgressStyleModel",
966 | "model_module_version": "1.5.0",
967 | "state": {
968 | "_model_module": "@jupyter-widgets/controls",
969 | "_model_module_version": "1.5.0",
970 | "_model_name": "ProgressStyleModel",
971 | "_view_count": null,
972 | "_view_module": "@jupyter-widgets/base",
973 | "_view_module_version": "1.2.0",
974 | "_view_name": "StyleView",
975 | "bar_color": null,
976 | "description_width": ""
977 | }
978 | },
979 | "9d8c0363f8e7449b8f6a30a92d66e924": {
980 | "model_module": "@jupyter-widgets/base",
981 | "model_name": "LayoutModel",
982 | "model_module_version": "1.2.0",
983 | "state": {
984 | "_model_module": "@jupyter-widgets/base",
985 | "_model_module_version": "1.2.0",
986 | "_model_name": "LayoutModel",
987 | "_view_count": null,
988 | "_view_module": "@jupyter-widgets/base",
989 | "_view_module_version": "1.2.0",
990 | "_view_name": "LayoutView",
991 | "align_content": null,
992 | "align_items": null,
993 | "align_self": null,
994 | "border": null,
995 | "bottom": null,
996 | "display": null,
997 | "flex": null,
998 | "flex_flow": null,
999 | "grid_area": null,
1000 | "grid_auto_columns": null,
1001 | "grid_auto_flow": null,
1002 | "grid_auto_rows": null,
1003 | "grid_column": null,
1004 | "grid_gap": null,
1005 | "grid_row": null,
1006 | "grid_template_areas": null,
1007 | "grid_template_columns": null,
1008 | "grid_template_rows": null,
1009 | "height": null,
1010 | "justify_content": null,
1011 | "justify_items": null,
1012 | "left": null,
1013 | "margin": null,
1014 | "max_height": null,
1015 | "max_width": null,
1016 | "min_height": null,
1017 | "min_width": null,
1018 | "object_fit": null,
1019 | "object_position": null,
1020 | "order": null,
1021 | "overflow": null,
1022 | "overflow_x": null,
1023 | "overflow_y": null,
1024 | "padding": null,
1025 | "right": null,
1026 | "top": null,
1027 | "visibility": null,
1028 | "width": null
1029 | }
1030 | },
1031 | "3b8a84d61f4e4805b017ec11c0599db7": {
1032 | "model_module": "@jupyter-widgets/controls",
1033 | "model_name": "DescriptionStyleModel",
1034 | "model_module_version": "1.5.0",
1035 | "state": {
1036 | "_model_module": "@jupyter-widgets/controls",
1037 | "_model_module_version": "1.5.0",
1038 | "_model_name": "DescriptionStyleModel",
1039 | "_view_count": null,
1040 | "_view_module": "@jupyter-widgets/base",
1041 | "_view_module_version": "1.2.0",
1042 | "_view_name": "StyleView",
1043 | "description_width": ""
1044 | }
1045 | }
1046 | }
1047 | }
1048 | },
1049 | "cells": [
1050 | {
1051 | "cell_type": "markdown",
1052 | "metadata": {
1053 | "id": "view-in-github",
1054 | "colab_type": "text"
1055 | },
1056 | "source": [
1057 | " "
1058 | ]
1059 | },
1060 | {
1061 | "cell_type": "markdown",
1062 | "source": [
1063 | "# A code sample showing how Diverse Beam Search can improve paraphrasing quality."
1064 | ],
1065 | "metadata": {
1066 | "id": "2GUHrLV5qf08"
1067 | }
1068 | },
1069 | {
1070 | "cell_type": "markdown",
1071 | "source": [
1072 | "This code is supplementary material to the story published on the NLPiation Medium blog. Follow [the link](https://pub.towardsai.net/how-to-do-effective-paraphrasing-using-huggingface-and-diverse-beam-search-t5-pegasus-229ca998d229) for a detailed explanation of diverse beam search and the code that follows."
1073 | ],
1074 | "metadata": {
1075 | "id": "TnNSa-tb3PyL"
1076 | }
1077 | },
1078 | {
1079 | "cell_type": "markdown",
1080 | "source": [
1081 | "# Download and Load the Libraries"
1082 | ],
1083 | "metadata": {
1084 | "id": "fpQu8KsJqgtM"
1085 | }
1086 | },
1087 | {
1088 | "cell_type": "markdown",
1089 | "source": [
1090 | "Start by installing the Transformers library (by Hugging Face) and then import the required modules."
1091 | ],
1092 | "metadata": {
1093 | "id": "XVMJ_qId4OT4"
1094 | }
1095 | },
1096 | {
1097 | "cell_type": "code",
1098 | "execution_count": 1,
1099 | "metadata": {
1100 | "id": "C7MrVX5wqVTJ"
1101 | },
1102 | "outputs": [],
1103 | "source": [
1104 | "!pip install -q transformers\n",
1105 | "!pip install -q datasets\n",
1106 | "!pip install -q sentencepiece"
1107 | ]
1108 | },
1109 | {
1110 | "cell_type": "markdown",
1111 | "source": [
1112 | "# Load the Architecture and Weights"
1113 | ],
1114 | "metadata": {
1115 | "id": "tVA5L7W04R0y"
1116 | }
1117 | },
1118 | {
1119 | "cell_type": "code",
1120 | "source": [
1121 | "from transformers import T5Tokenizer, T5ForConditionalGeneration"
1122 | ],
1123 | "metadata": {
1124 | "id": "9UNCoaoGqrHk"
1125 | },
1126 | "execution_count": 2,
1127 | "outputs": []
1128 | },
1129 | {
1130 | "cell_type": "code",
1131 | "source": [
1132 | "model = T5ForConditionalGeneration.from_pretrained('prithivida/parrot_paraphraser_on_T5')\n",
1133 | "tokenizer = T5Tokenizer.from_pretrained('prithivida/parrot_paraphraser_on_T5')"
1134 | ],
1135 | "metadata": {
1136 | "colab": {
1137 | "base_uri": "https://localhost:8080/",
1138 | "height": 113,
1139 | "referenced_widgets": [
1140 | "3ed30a8a0a5e4666b4cbeaefb75aa9e3",
1141 | "209495704ab34383af7fe65c8ea9dd09",
1142 | "4624d5203f1c44718a660a6edb6bc66e",
1143 | "94dc84edc49b4199b0a6e7c6b8d87e7b",
1144 | "2391632bc3a34b3fb3210092776c74aa",
1145 | "a06640373e86495b97346998fa59edd9",
1146 | "2cf8d55b5eb34e38b7b3079312da60b5",
1147 | "3782e4e8cad847f087c3014e39d35c41",
1148 | "faa73c3ee50a412a91adf244b85b1de6",
1149 | "733aa710e8cd406695b3736087107bdf",
1150 | "6e66c2ca3d794fe393ee8c13b037fe9c",
1151 | "406d0b1379c748e5bddf41a08d1bc16a",
1152 | "4dde398e566244c1a01ea9a21cf61991",
1153 | "79c00b4cf7ea4912824088b2e56468bd",
1154 | "ed031c9b1a6345809176bea833e995d0",
1155 | "3bb9525493964c70a227235c1585713b",
1156 | "f147fd7003f64d0e8badb771bfb21b18",
1157 | "b758692b7b44431b80d7f75386da912f",
1158 | "6daf1c8875ae4945ae382a159131f953",
1159 | "f29012cb87a647008db4bf4464870157",
1160 | "01b4832c87444f87a33f2c4517d5c051",
1161 | "503531fc8424441e9a8ac4cfc64f7507",
1162 | "4aba0951ea8344219f3133451a314ce2",
1163 | "b61ca199e36947d8a76bb94d43705cb3",
1164 | "1c9984805be54af987bbc637cff39366",
1165 | "10e1477041db4335861396d47fcd38ba",
1166 | "41e7824ba62448169ed589d94a9ff1a7",
1167 | "381c420b802a4929be8fb948e6fe77b7",
1168 | "5b9986e6fa314c4da1ffb45b513255a9",
1169 | "20774c588bab4925bd80a491549a8dd6",
1170 | "c92e682a63664d4e83b3596e6126508b",
1171 | "9d8c0363f8e7449b8f6a30a92d66e924",
1172 | "3b8a84d61f4e4805b017ec11c0599db7"
1173 | ]
1174 | },
1175 | "id": "LkN6zDHsq2m-",
1176 | "outputId": "810e0f4d-f1fb-4cb2-c95d-12352673398d"
1177 | },
1178 | "execution_count": 3,
1179 | "outputs": [
1180 | {
1181 | "output_type": "display_data",
1182 | "data": {
1183 | "text/plain": [
1184 | "Downloading spiece.model: 0%| | 0.00/773k [00:00, ?B/s]"
1185 | ],
1186 | "application/vnd.jupyter.widget-view+json": {
1187 | "version_major": 2,
1188 | "version_minor": 0,
1189 | "model_id": "3ed30a8a0a5e4666b4cbeaefb75aa9e3"
1190 | }
1191 | },
1192 | "metadata": {}
1193 | },
1194 | {
1195 | "output_type": "display_data",
1196 | "data": {
1197 | "text/plain": [
1198 | "Downloading special_tokens_map.json: 0%| | 0.00/1.74k [00:00, ?B/s]"
1199 | ],
1200 | "application/vnd.jupyter.widget-view+json": {
1201 | "version_major": 2,
1202 | "version_minor": 0,
1203 | "model_id": "406d0b1379c748e5bddf41a08d1bc16a"
1204 | }
1205 | },
1206 | "metadata": {}
1207 | },
1208 | {
1209 | "output_type": "display_data",
1210 | "data": {
1211 | "text/plain": [
1212 | "Downloading tokenizer_config.json: 0%| | 0.00/1.84k [00:00, ?B/s]"
1213 | ],
1214 | "application/vnd.jupyter.widget-view+json": {
1215 | "version_major": 2,
1216 | "version_minor": 0,
1217 | "model_id": "4aba0951ea8344219f3133451a314ce2"
1218 | }
1219 | },
1220 | "metadata": {}
1221 | }
1222 | ]
1223 | },
1224 | {
1225 | "cell_type": "markdown",
1226 | "source": [
1227 | "# Tokenizing the Input Sequence"
1228 | ],
1229 | "metadata": {
1230 | "id": "MoEklMpK4Vss"
1231 | }
1232 | },
1233 | {
1234 | "cell_type": "code",
1235 | "source": [
1236 | "batch = tokenizer(\"Natural Language Processing can improve the quality life.\", return_tensors='pt')"
1237 | ],
1238 | "metadata": {
1239 | "id": "O9t92uTmq8kw"
1240 | },
1241 | "execution_count": 4,
1242 | "outputs": []
1243 | },
1244 | {
1245 | "cell_type": "markdown",
1246 | "source": [
1247 | "# Paraphrasing using Beam Search"
1248 | ],
1249 | "metadata": {
1250 | "id": "QURd4GgC4Ynx"
1251 | }
1252 | },
1253 | {
1254 | "cell_type": "code",
1255 | "source": [
1256 | "generated_ids = model.generate( batch['input_ids'],\n",
1257 | " num_beams=5,\n",
1258 | " temperature=1.5,\n",
1259 | " no_repeat_ngram_size=2,\n",
1260 | " early_stopping=True,\n",
1261 | " length_penalty=2.0)"
1262 | ],
1263 | "metadata": {
1264 | "id": "lSeTtk2wskZu"
1265 | },
1266 | "execution_count": 15,
1267 | "outputs": []
1268 | },
1269 | {
1270 | "cell_type": "code",
1271 | "source": [
1272 | "generated_sentence = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)"
1273 | ],
1274 | "metadata": {
1275 | "id": "kKBg43cPsoLU"
1276 | },
1277 | "execution_count": 16,
1278 | "outputs": []
1279 | },
1280 | {
1281 | "cell_type": "code",
1282 | "source": [
1283 | "print( generated_sentence )"
1284 | ],
1285 | "metadata": {
1286 | "colab": {
1287 | "base_uri": "https://localhost:8080/"
1288 | },
1289 | "id": "u3wsaH_Aso6B",
1290 | "outputId": "96156cc7-b744-42bf-ac59-010e6b29aa12"
1291 | },
1292 | "execution_count": 17,
1293 | "outputs": [
1294 | {
1295 | "output_type": "execute_result",
1296 | "data": {
1297 | "text/plain": [
1298 | "['Natural language processing can improve the quality of life.']"
1299 | ]
1300 | },
1301 | "metadata": {},
1302 | "execution_count": 17
1303 | }
1304 | ]
1305 | },
1306 | {
1307 | "cell_type": "markdown",
1308 | "source": [
1309 | "# Paraphrasing using Diverse Beam Search"
1310 | ],
1311 | "metadata": {
1312 | "id": "SF1KHNPG4cxf"
1313 | }
1314 | },
1315 | {
1316 | "cell_type": "code",
1317 | "source": [
1318 | "generated_ids = model.generate( batch['input_ids'],\n",
1319 | " num_beams=5,\n",
1320 | " num_return_sequences=5,\n",
1321 | " temperature=1.5,\n",
1322 | " num_beam_groups=5,\n",
1323 | " diversity_penalty=2.0,\n",
1324 | " no_repeat_ngram_size=2,\n",
1325 | " early_stopping=True,\n",
1326 | " length_penalty=2.0)"
1327 | ],
1328 | "metadata": {
1329 | "colab": {
1330 | "base_uri": "https://localhost:8080/"
1331 | },
1332 | "id": "9Ef7Hatsr1ZY",
1333 | "outputId": "c4cd2ef6-1aed-4b64-e323-f9359285a759"
1334 | },
1335 | "execution_count": 9,
1336 | "outputs": [
1337 | {
1338 | "output_type": "stream",
1339 | "name": "stderr",
1340 | "text": [
1341 | "/usr/local/lib/python3.7/dist-packages/transformers/generation_beam_search.py:198: UserWarning: Passing `max_length` to BeamSearchScorer is deprecated and has no effect. `max_length` should be passed directly to `beam_search(...)`, `beam_sample(...)`, or `group_beam_search(...)`.\n",
1342 | " \"Passing `max_length` to BeamSearchScorer is deprecated and has no effect. \"\n"
1343 | ]
1344 | }
1345 | ]
1346 | },
1347 | {
1348 | "cell_type": "code",
1349 | "source": [
1350 | "generated_sentence = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)"
1351 | ],
1352 | "metadata": {
1353 | "id": "M-mtEtfRr4xe"
1354 | },
1355 | "execution_count": 10,
1356 | "outputs": []
1357 | },
1358 | {
1359 | "cell_type": "code",
1360 | "source": [
1361 | "print( generated_sentence )"
1362 | ],
1363 | "metadata": {
1364 | "colab": {
1365 | "base_uri": "https://localhost:8080/"
1366 | },
1367 | "id": "c_hARzhgr9g7",
1368 | "outputId": "64a8b723-70af-4e36-d0e4-15a79785cd86"
1369 | },
1370 | "execution_count": 11,
1371 | "outputs": [
1372 | {
1373 | "output_type": "execute_result",
1374 | "data": {
1375 | "text/plain": [
1376 | "['Natural language processing can improve the quality of life.',\n",
1377 | " 'Natural Language Processing is a tool that improves quality of life.',\n",
1378 | " 'Natural Language Processing can improve quality of life.',\n",
1379 | " 'Nature can improve the quality of life.',\n",
1380 | " 'Natural language processing improves life.']"
1381 | ]
1382 | },
1383 | "metadata": {},
1384 | "execution_count": 11
1385 | }
1386 | ]
1387 | }
1388 | ]
1389 | }
--------------------------------------------------------------------------------
/summarization/hf_BERT-BERT_training.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "BERT2BERT for CNN/Dailymail",
7 | "provenance": [],
8 | "collapsed_sections": [],
9 | "include_colab_link": true
10 | },
11 | "kernelspec": {
12 | "name": "python3",
13 | "display_name": "Python 3"
14 | },
15 | "accelerator": "GPU",
16 | "widgets": {
17 | "application/vnd.jupyter.widget-state+json": {
18 | "92b31e571b3b4036a5d605419ebb69d5": {
19 | "model_module": "@jupyter-widgets/controls",
20 | "model_name": "HBoxModel",
21 | "state": {
22 | "_view_name": "HBoxView",
23 | "_dom_classes": [],
24 | "_model_name": "HBoxModel",
25 | "_view_module": "@jupyter-widgets/controls",
26 | "_model_module_version": "1.5.0",
27 | "_view_count": null,
28 | "_view_module_version": "1.5.0",
29 | "box_style": "",
30 | "layout": "IPY_MODEL_abf87c25c3044264b89cfaf6c0eaee8b",
31 | "_model_module": "@jupyter-widgets/controls",
32 | "children": [
33 | "IPY_MODEL_3b07ecf8fed64b31812a7cf340f9e330",
34 | "IPY_MODEL_6da276ed463b4e368a40ac82f044849e"
35 | ]
36 | },
37 | "model_module_version": "1.5.0"
38 | },
39 | "abf87c25c3044264b89cfaf6c0eaee8b": {
40 | "model_module": "@jupyter-widgets/base",
41 | "model_name": "LayoutModel",
42 | "state": {
43 | "_view_name": "LayoutView",
44 | "grid_template_rows": null,
45 | "right": null,
46 | "justify_content": null,
47 | "_view_module": "@jupyter-widgets/base",
48 | "overflow": null,
49 | "_model_module_version": "1.2.0",
50 | "_view_count": null,
51 | "flex_flow": null,
52 | "width": null,
53 | "min_width": null,
54 | "border": null,
55 | "align_items": null,
56 | "bottom": null,
57 | "_model_module": "@jupyter-widgets/base",
58 | "top": null,
59 | "grid_column": null,
60 | "overflow_y": null,
61 | "overflow_x": null,
62 | "grid_auto_flow": null,
63 | "grid_area": null,
64 | "grid_template_columns": null,
65 | "flex": null,
66 | "_model_name": "LayoutModel",
67 | "justify_items": null,
68 | "grid_row": null,
69 | "max_height": null,
70 | "align_content": null,
71 | "visibility": null,
72 | "align_self": null,
73 | "height": null,
74 | "min_height": null,
75 | "padding": null,
76 | "grid_auto_rows": null,
77 | "grid_gap": null,
78 | "max_width": null,
79 | "order": null,
80 | "_view_module_version": "1.2.0",
81 | "grid_template_areas": null,
82 | "object_position": null,
83 | "object_fit": null,
84 | "grid_auto_columns": null,
85 | "margin": null,
86 | "display": null,
87 | "left": null
88 | },
89 | "model_module_version": "1.2.0"
90 | },
91 | "3b07ecf8fed64b31812a7cf340f9e330": {
92 | "model_module": "@jupyter-widgets/controls",
93 | "model_name": "FloatProgressModel",
94 | "state": {
95 | "_view_name": "ProgressView",
96 | "style": "IPY_MODEL_693f95c74c8d48aebe6dc834c4b67c0d",
97 | "_dom_classes": [],
98 | "description": "100%",
99 | "_model_name": "FloatProgressModel",
100 | "bar_style": "success",
101 | "max": 8,
102 | "_view_module": "@jupyter-widgets/controls",
103 | "_model_module_version": "1.5.0",
104 | "value": 8,
105 | "_view_count": null,
106 | "_view_module_version": "1.5.0",
107 | "orientation": "horizontal",
108 | "min": 0,
109 | "description_tooltip": null,
110 | "_model_module": "@jupyter-widgets/controls",
111 | "layout": "IPY_MODEL_085e086df51e490ab1dabc1c135101fe"
112 | },
113 | "model_module_version": "1.5.0"
114 | },
115 | "6da276ed463b4e368a40ac82f044849e": {
116 | "model_module": "@jupyter-widgets/controls",
117 | "model_name": "HTMLModel",
118 | "state": {
119 | "_view_name": "HTMLView",
120 | "style": "IPY_MODEL_cab934d5f7a440428766e4d5c03e3e8a",
121 | "_dom_classes": [],
122 | "description": "",
123 | "_model_name": "HTMLModel",
124 | "placeholder": "",
125 | "_view_module": "@jupyter-widgets/controls",
126 | "_model_module_version": "1.5.0",
127 | "value": " 8/8 [00:00<00:00, 24.58ba/s]",
128 | "_view_count": null,
129 | "_view_module_version": "1.5.0",
130 | "description_tooltip": null,
131 | "_model_module": "@jupyter-widgets/controls",
132 | "layout": "IPY_MODEL_b41787b2568c4b9fb3b2588e189fb071"
133 | },
134 | "model_module_version": "1.5.0"
135 | },
136 | "693f95c74c8d48aebe6dc834c4b67c0d": {
137 | "model_module": "@jupyter-widgets/controls",
138 | "model_name": "ProgressStyleModel",
139 | "state": {
140 | "_view_name": "StyleView",
141 | "_model_name": "ProgressStyleModel",
142 | "description_width": "initial",
143 | "_view_module": "@jupyter-widgets/base",
144 | "_model_module_version": "1.5.0",
145 | "_view_count": null,
146 | "_view_module_version": "1.2.0",
147 | "bar_color": null,
148 | "_model_module": "@jupyter-widgets/controls"
149 | },
150 | "model_module_version": "1.5.0"
151 | },
152 | "085e086df51e490ab1dabc1c135101fe": {
153 | "model_module": "@jupyter-widgets/base",
154 | "model_name": "LayoutModel",
155 | "state": {
156 | "_view_name": "LayoutView",
157 | "grid_template_rows": null,
158 | "right": null,
159 | "justify_content": null,
160 | "_view_module": "@jupyter-widgets/base",
161 | "overflow": null,
162 | "_model_module_version": "1.2.0",
163 | "_view_count": null,
164 | "flex_flow": null,
165 | "width": null,
166 | "min_width": null,
167 | "border": null,
168 | "align_items": null,
169 | "bottom": null,
170 | "_model_module": "@jupyter-widgets/base",
171 | "top": null,
172 | "grid_column": null,
173 | "overflow_y": null,
174 | "overflow_x": null,
175 | "grid_auto_flow": null,
176 | "grid_area": null,
177 | "grid_template_columns": null,
178 | "flex": null,
179 | "_model_name": "LayoutModel",
180 | "justify_items": null,
181 | "grid_row": null,
182 | "max_height": null,
183 | "align_content": null,
184 | "visibility": null,
185 | "align_self": null,
186 | "height": null,
187 | "min_height": null,
188 | "padding": null,
189 | "grid_auto_rows": null,
190 | "grid_gap": null,
191 | "max_width": null,
192 | "order": null,
193 | "_view_module_version": "1.2.0",
194 | "grid_template_areas": null,
195 | "object_position": null,
196 | "object_fit": null,
197 | "grid_auto_columns": null,
198 | "margin": null,
199 | "display": null,
200 | "left": null
201 | },
202 | "model_module_version": "1.2.0"
203 | },
204 | "cab934d5f7a440428766e4d5c03e3e8a": {
205 | "model_module": "@jupyter-widgets/controls",
206 | "model_name": "DescriptionStyleModel",
207 | "state": {
208 | "_view_name": "StyleView",
209 | "_model_name": "DescriptionStyleModel",
210 | "description_width": "",
211 | "_view_module": "@jupyter-widgets/base",
212 | "_model_module_version": "1.5.0",
213 | "_view_count": null,
214 | "_view_module_version": "1.2.0",
215 | "_model_module": "@jupyter-widgets/controls"
216 | },
217 | "model_module_version": "1.5.0"
218 | },
219 | "b41787b2568c4b9fb3b2588e189fb071": {
220 | "model_module": "@jupyter-widgets/base",
221 | "model_name": "LayoutModel",
222 | "state": {
223 | "_view_name": "LayoutView",
224 | "grid_template_rows": null,
225 | "right": null,
226 | "justify_content": null,
227 | "_view_module": "@jupyter-widgets/base",
228 | "overflow": null,
229 | "_model_module_version": "1.2.0",
230 | "_view_count": null,
231 | "flex_flow": null,
232 | "width": null,
233 | "min_width": null,
234 | "border": null,
235 | "align_items": null,
236 | "bottom": null,
237 | "_model_module": "@jupyter-widgets/base",
238 | "top": null,
239 | "grid_column": null,
240 | "overflow_y": null,
241 | "overflow_x": null,
242 | "grid_auto_flow": null,
243 | "grid_area": null,
244 | "grid_template_columns": null,
245 | "flex": null,
246 | "_model_name": "LayoutModel",
247 | "justify_items": null,
248 | "grid_row": null,
249 | "max_height": null,
250 | "align_content": null,
251 | "visibility": null,
252 | "align_self": null,
253 | "height": null,
254 | "min_height": null,
255 | "padding": null,
256 | "grid_auto_rows": null,
257 | "grid_gap": null,
258 | "max_width": null,
259 | "order": null,
260 | "_view_module_version": "1.2.0",
261 | "grid_template_areas": null,
262 | "object_position": null,
263 | "object_fit": null,
264 | "grid_auto_columns": null,
265 | "margin": null,
266 | "display": null,
267 | "left": null
268 | },
269 | "model_module_version": "1.2.0"
270 | },
271 | "0e5512bcb89643ffab38dc34e2fb9e2c": {
272 | "model_module": "@jupyter-widgets/controls",
273 | "model_name": "HBoxModel",
274 | "state": {
275 | "_view_name": "HBoxView",
276 | "_dom_classes": [],
277 | "_model_name": "HBoxModel",
278 | "_view_module": "@jupyter-widgets/controls",
279 | "_model_module_version": "1.5.0",
280 | "_view_count": null,
281 | "_view_module_version": "1.5.0",
282 | "box_style": "",
283 | "layout": "IPY_MODEL_741245611dfc46448f4563de7c13b798",
284 | "_model_module": "@jupyter-widgets/controls",
285 | "children": [
286 | "IPY_MODEL_888cd957aee64f3b87e94d65b2780e73",
287 | "IPY_MODEL_72014877a8cb410880d9f9f4c1cf8203"
288 | ]
289 | },
290 | "model_module_version": "1.5.0"
291 | },
292 | "741245611dfc46448f4563de7c13b798": {
293 | "model_module": "@jupyter-widgets/base",
294 | "model_name": "LayoutModel",
295 | "state": {
296 | "_view_name": "LayoutView",
297 | "grid_template_rows": null,
298 | "right": null,
299 | "justify_content": null,
300 | "_view_module": "@jupyter-widgets/base",
301 | "overflow": null,
302 | "_model_module_version": "1.2.0",
303 | "_view_count": null,
304 | "flex_flow": null,
305 | "width": null,
306 | "min_width": null,
307 | "border": null,
308 | "align_items": null,
309 | "bottom": null,
310 | "_model_module": "@jupyter-widgets/base",
311 | "top": null,
312 | "grid_column": null,
313 | "overflow_y": null,
314 | "overflow_x": null,
315 | "grid_auto_flow": null,
316 | "grid_area": null,
317 | "grid_template_columns": null,
318 | "flex": null,
319 | "_model_name": "LayoutModel",
320 | "justify_items": null,
321 | "grid_row": null,
322 | "max_height": null,
323 | "align_content": null,
324 | "visibility": null,
325 | "align_self": null,
326 | "height": null,
327 | "min_height": null,
328 | "padding": null,
329 | "grid_auto_rows": null,
330 | "grid_gap": null,
331 | "max_width": null,
332 | "order": null,
333 | "_view_module_version": "1.2.0",
334 | "grid_template_areas": null,
335 | "object_position": null,
336 | "object_fit": null,
337 | "grid_auto_columns": null,
338 | "margin": null,
339 | "display": null,
340 | "left": null
341 | },
342 | "model_module_version": "1.2.0"
343 | },
344 | "888cd957aee64f3b87e94d65b2780e73": {
345 | "model_module": "@jupyter-widgets/controls",
346 | "model_name": "FloatProgressModel",
347 | "state": {
348 | "_view_name": "ProgressView",
349 | "style": "IPY_MODEL_c3629a6449bc469c8556b0fd897d34ef",
350 | "_dom_classes": [],
351 | "description": "100%",
352 | "_model_name": "FloatProgressModel",
353 | "bar_style": "success",
354 | "max": 4,
355 | "_view_module": "@jupyter-widgets/controls",
356 | "_model_module_version": "1.5.0",
357 | "value": 4,
358 | "_view_count": null,
359 | "_view_module_version": "1.5.0",
360 | "orientation": "horizontal",
361 | "min": 0,
362 | "description_tooltip": null,
363 | "_model_module": "@jupyter-widgets/controls",
364 | "layout": "IPY_MODEL_8d108f99c47e4398bb07f9ce9b61ed7e"
365 | },
366 | "model_module_version": "1.5.0"
367 | },
368 | "72014877a8cb410880d9f9f4c1cf8203": {
369 | "model_module": "@jupyter-widgets/controls",
370 | "model_name": "HTMLModel",
371 | "state": {
372 | "_view_name": "HTMLView",
373 | "style": "IPY_MODEL_d13beb567f01474fb828b69b3b7ceb4a",
374 | "_dom_classes": [],
375 | "description": "",
376 | "_model_name": "HTMLModel",
377 | "placeholder": "",
378 | "_view_module": "@jupyter-widgets/controls",
379 | "_model_module_version": "1.5.0",
380 | "value": " 4/4 [00:00<00:00, 28.18ba/s]",
381 | "_view_count": null,
382 | "_view_module_version": "1.5.0",
383 | "description_tooltip": null,
384 | "_model_module": "@jupyter-widgets/controls",
385 | "layout": "IPY_MODEL_020ec98f3c8e42f28299053abd682a56"
386 | },
387 | "model_module_version": "1.5.0"
388 | },
389 | "c3629a6449bc469c8556b0fd897d34ef": {
390 | "model_module": "@jupyter-widgets/controls",
391 | "model_name": "ProgressStyleModel",
392 | "state": {
393 | "_view_name": "StyleView",
394 | "_model_name": "ProgressStyleModel",
395 | "description_width": "initial",
396 | "_view_module": "@jupyter-widgets/base",
397 | "_model_module_version": "1.5.0",
398 | "_view_count": null,
399 | "_view_module_version": "1.2.0",
400 | "bar_color": null,
401 | "_model_module": "@jupyter-widgets/controls"
402 | },
403 | "model_module_version": "1.5.0"
404 | },
405 | "8d108f99c47e4398bb07f9ce9b61ed7e": {
406 | "model_module": "@jupyter-widgets/base",
407 | "model_name": "LayoutModel",
408 | "state": {
409 | "_view_name": "LayoutView",
410 | "grid_template_rows": null,
411 | "right": null,
412 | "justify_content": null,
413 | "_view_module": "@jupyter-widgets/base",
414 | "overflow": null,
415 | "_model_module_version": "1.2.0",
416 | "_view_count": null,
417 | "flex_flow": null,
418 | "width": null,
419 | "min_width": null,
420 | "border": null,
421 | "align_items": null,
422 | "bottom": null,
423 | "_model_module": "@jupyter-widgets/base",
424 | "top": null,
425 | "grid_column": null,
426 | "overflow_y": null,
427 | "overflow_x": null,
428 | "grid_auto_flow": null,
429 | "grid_area": null,
430 | "grid_template_columns": null,
431 | "flex": null,
432 | "_model_name": "LayoutModel",
433 | "justify_items": null,
434 | "grid_row": null,
435 | "max_height": null,
436 | "align_content": null,
437 | "visibility": null,
438 | "align_self": null,
439 | "height": null,
440 | "min_height": null,
441 | "padding": null,
442 | "grid_auto_rows": null,
443 | "grid_gap": null,
444 | "max_width": null,
445 | "order": null,
446 | "_view_module_version": "1.2.0",
447 | "grid_template_areas": null,
448 | "object_position": null,
449 | "object_fit": null,
450 | "grid_auto_columns": null,
451 | "margin": null,
452 | "display": null,
453 | "left": null
454 | },
455 | "model_module_version": "1.2.0"
456 | },
457 | "d13beb567f01474fb828b69b3b7ceb4a": {
458 | "model_module": "@jupyter-widgets/controls",
459 | "model_name": "DescriptionStyleModel",
460 | "state": {
461 | "_view_name": "StyleView",
462 | "_model_name": "DescriptionStyleModel",
463 | "description_width": "",
464 | "_view_module": "@jupyter-widgets/base",
465 | "_model_module_version": "1.5.0",
466 | "_view_count": null,
467 | "_view_module_version": "1.2.0",
468 | "_model_module": "@jupyter-widgets/controls"
469 | },
470 | "model_module_version": "1.5.0"
471 | },
472 | "020ec98f3c8e42f28299053abd682a56": {
473 | "model_module": "@jupyter-widgets/base",
474 | "model_name": "LayoutModel",
475 | "state": {
476 | "_view_name": "LayoutView",
477 | "grid_template_rows": null,
478 | "right": null,
479 | "justify_content": null,
480 | "_view_module": "@jupyter-widgets/base",
481 | "overflow": null,
482 | "_model_module_version": "1.2.0",
483 | "_view_count": null,
484 | "flex_flow": null,
485 | "width": null,
486 | "min_width": null,
487 | "border": null,
488 | "align_items": null,
489 | "bottom": null,
490 | "_model_module": "@jupyter-widgets/base",
491 | "top": null,
492 | "grid_column": null,
493 | "overflow_y": null,
494 | "overflow_x": null,
495 | "grid_auto_flow": null,
496 | "grid_area": null,
497 | "grid_template_columns": null,
498 | "flex": null,
499 | "_model_name": "LayoutModel",
500 | "justify_items": null,
501 | "grid_row": null,
502 | "max_height": null,
503 | "align_content": null,
504 | "visibility": null,
505 | "align_self": null,
506 | "height": null,
507 | "min_height": null,
508 | "padding": null,
509 | "grid_auto_rows": null,
510 | "grid_gap": null,
511 | "max_width": null,
512 | "order": null,
513 | "_view_module_version": "1.2.0",
514 | "grid_template_areas": null,
515 | "object_position": null,
516 | "object_fit": null,
517 | "grid_auto_columns": null,
518 | "margin": null,
519 | "display": null,
520 | "left": null
521 | },
522 | "model_module_version": "1.2.0"
523 | }
524 | }
525 | }
526 | },
527 | "cells": [
528 | {
529 | "cell_type": "markdown",
530 | "metadata": {
531 | "id": "view-in-github",
532 | "colab_type": "text"
533 | },
534 | "source": [
535 | " "
536 | ]
537 | },
538 | {
539 | "cell_type": "markdown",
540 | "metadata": {
541 | "id": "H1F58j028eTV"
542 | },
543 | "source": [
544 | "## **Warm-starting BERT2BERT for CNN/Dailymail**\n",
545 | "\n",
546 | "***Note***: This notebook only uses a few training, validation, and test data samples for demonstration purposes. To fine-tune an encoder-decoder model on the full training data, the user should change the training and data preprocessing parameters accordingly as highlighted by the comments.\n"
547 | ]
548 | },
549 | {
550 | "cell_type": "markdown",
551 | "metadata": {
552 | "id": "3FO5ESocXvlK"
553 | },
554 | "source": [
555 | "### **Data Preprocessing**\n"
556 | ]
557 | },
558 | {
559 | "cell_type": "code",
560 | "metadata": {
561 | "id": "w67vkz3KP9eZ"
562 | },
563 | "source": [
564 | "%%capture\n",
565 | "!pip install datasets==1.0.2\n",
566 | "!pip install transformers==4.2.1\n",
567 | "\n",
568 | "import datasets\n",
569 | "import transformers"
570 | ],
571 | "execution_count": null,
572 | "outputs": []
573 | },
574 | {
575 | "cell_type": "code",
576 | "metadata": {
577 | "id": "sgTiC0rhMb7C"
578 | },
579 | "source": [
580 | "from transformers import BertTokenizerFast\n",
581 | "\n",
582 | "tokenizer = BertTokenizerFast.from_pretrained(\"bert-base-uncased\")\n",
583 | "tokenizer.bos_token = tokenizer.cls_token\n",
584 | "tokenizer.eos_token = tokenizer.sep_token\n",
585 | "\n",
586 | "train_data = datasets.load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"train\")\n",
587 | "val_data = datasets.load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"validation[:10%]\")"
588 | ],
589 | "execution_count": null,
590 | "outputs": []
591 | },
592 | {
593 | "cell_type": "code",
594 | "metadata": {
595 | "id": "yoN2q0hZUbXN",
596 | "colab": {
597 | "base_uri": "https://localhost:8080/",
598 | "height": 117,
599 | "referenced_widgets": [
600 | "92b31e571b3b4036a5d605419ebb69d5",
601 | "abf87c25c3044264b89cfaf6c0eaee8b",
602 | "3b07ecf8fed64b31812a7cf340f9e330",
603 | "6da276ed463b4e368a40ac82f044849e",
604 | "693f95c74c8d48aebe6dc834c4b67c0d",
605 | "085e086df51e490ab1dabc1c135101fe",
606 | "cab934d5f7a440428766e4d5c03e3e8a",
607 | "b41787b2568c4b9fb3b2588e189fb071",
608 | "0e5512bcb89643ffab38dc34e2fb9e2c",
609 | "741245611dfc46448f4563de7c13b798",
610 | "888cd957aee64f3b87e94d65b2780e73",
611 | "72014877a8cb410880d9f9f4c1cf8203",
612 | "c3629a6449bc469c8556b0fd897d34ef",
613 | "8d108f99c47e4398bb07f9ce9b61ed7e",
614 | "d13beb567f01474fb828b69b3b7ceb4a",
615 | "020ec98f3c8e42f28299053abd682a56"
616 | ]
617 | },
618 | "outputId": "71b0dd46-befc-46fd-9e00-b7975709a9d3"
619 | },
620 | "source": [
621 | "batch_size=4 # change to 16 for full training\n",
622 | "encoder_max_length=512\n",
623 | "decoder_max_length=128\n",
624 | "\n",
625 | "def process_data_to_model_inputs(batch):\n",
626 | " # tokenize the inputs and labels\n",
627 | " inputs = tokenizer(batch[\"article\"], padding=\"max_length\", truncation=True, max_length=encoder_max_length)\n",
628 | " outputs = tokenizer(batch[\"highlights\"], padding=\"max_length\", truncation=True, max_length=decoder_max_length)\n",
629 | "\n",
630 | " batch[\"input_ids\"] = inputs.input_ids\n",
631 | " batch[\"attention_mask\"] = inputs.attention_mask\n",
632 | " batch[\"decoder_input_ids\"] = outputs.input_ids\n",
633 | " batch[\"decoder_attention_mask\"] = outputs.attention_mask\n",
634 | " batch[\"labels\"] = outputs.input_ids.copy()\n",
635 | "\n",
636 | " # because BERT automatically shifts the labels, the labels correspond exactly to `decoder_input_ids`. \n",
637 | " # We have to make sure that the PAD token is ignored\n",
638 | " batch[\"labels\"] = [[-100 if token == tokenizer.pad_token_id else token for token in labels] for labels in batch[\"labels\"]]\n",
639 | "\n",
640 | " return batch\n",
641 | "\n",
642 | "# only use 32 training examples for notebook - DELETE LINE FOR FULL TRAINING\n",
643 | "train_data = train_data.select(range(32))\n",
644 | "\n",
645 | "train_data = train_data.map(\n",
646 | " process_data_to_model_inputs, \n",
647 | " batched=True, \n",
648 | " batch_size=batch_size, \n",
649 | " remove_columns=[\"article\", \"highlights\", \"id\"]\n",
650 | ")\n",
651 | "train_data.set_format(\n",
652 | " type=\"torch\", columns=[\"input_ids\", \"attention_mask\", \"decoder_input_ids\", \"decoder_attention_mask\", \"labels\"],\n",
653 | ")\n",
654 | "\n",
655 | "\n",
656 | "# only use 16 training examples for notebook - DELETE LINE FOR FULL TRAINING\n",
657 | "val_data = val_data.select(range(16))\n",
658 | "\n",
659 | "val_data = val_data.map(\n",
660 | " process_data_to_model_inputs, \n",
661 | " batched=True, \n",
662 | " batch_size=batch_size, \n",
663 | " remove_columns=[\"article\", \"highlights\", \"id\"]\n",
664 | ")\n",
665 | "val_data.set_format(\n",
666 | " type=\"torch\", columns=[\"input_ids\", \"attention_mask\", \"decoder_input_ids\", \"decoder_attention_mask\", \"labels\"],\n",
667 | ")"
668 | ],
669 | "execution_count": null,
670 | "outputs": [
671 | {
672 | "output_type": "display_data",
673 | "data": {
674 | "application/vnd.jupyter.widget-view+json": {
675 | "model_id": "92b31e571b3b4036a5d605419ebb69d5",
676 | "version_minor": 0,
677 | "version_major": 2
678 | },
679 | "text/plain": [
680 | "HBox(children=(FloatProgress(value=0.0, max=8.0), HTML(value='')))"
681 | ]
682 | },
683 | "metadata": {
684 | "tags": []
685 | }
686 | },
687 | {
688 | "output_type": "stream",
689 | "text": [
690 | "\n"
691 | ],
692 | "name": "stdout"
693 | },
694 | {
695 | "output_type": "display_data",
696 | "data": {
697 | "application/vnd.jupyter.widget-view+json": {
698 | "model_id": "0e5512bcb89643ffab38dc34e2fb9e2c",
699 | "version_minor": 0,
700 | "version_major": 2
701 | },
702 | "text/plain": [
703 | "HBox(children=(FloatProgress(value=0.0, max=4.0), HTML(value='')))"
704 | ]
705 | },
706 | "metadata": {
707 | "tags": []
708 | }
709 | },
710 | {
711 | "output_type": "stream",
712 | "text": [
713 | "\n"
714 | ],
715 | "name": "stdout"
716 | }
717 | ]
718 | },
719 | {
720 | "cell_type": "markdown",
721 | "metadata": {
722 | "id": "aEjb026cNC38"
723 | },
724 | "source": [
725 | "### **Warm-starting the Encoder-Decoder Model**"
726 | ]
727 | },
728 | {
729 | "cell_type": "code",
730 | "metadata": {
731 | "id": "tS0UndNoQh8t"
732 | },
733 | "source": [
734 | "from transformers import EncoderDecoderModel\n",
735 | "\n",
736 | "bert2bert = EncoderDecoderModel.from_encoder_decoder_pretrained(\"bert-base-uncased\", \"bert-base-uncased\")"
737 | ],
738 | "execution_count": null,
739 | "outputs": []
740 | },
741 | {
742 | "cell_type": "code",
743 | "metadata": {
744 | "id": "JD2jv3GkyjR-"
745 | },
746 | "source": [
747 | "# set special tokens\n",
748 | "bert2bert.config.decoder_start_token_id = tokenizer.bos_token_id\n",
749 | "bert2bert.config.eos_token_id = tokenizer.eos_token_id\n",
750 | "bert2bert.config.pad_token_id = tokenizer.pad_token_id\n",
751 | "\n",
752 | "# sensible parameters for beam search\n",
753 | "bert2bert.config.vocab_size = bert2bert.config.decoder.vocab_size\n",
754 | "bert2bert.config.max_length = 142\n",
755 | "bert2bert.config.min_length = 56\n",
756 | "bert2bert.config.no_repeat_ngram_size = 3\n",
757 | "bert2bert.config.early_stopping = True\n",
758 | "bert2bert.config.length_penalty = 2.0\n",
759 | "bert2bert.config.num_beams = 4"
760 | ],
761 | "execution_count": null,
762 | "outputs": []
763 | },
764 | {
765 | "cell_type": "markdown",
766 | "metadata": {
767 | "id": "u98CLZiTkgzv"
768 | },
769 | "source": [
770 | "### **Fine-Tuning Warm-Started Encoder-Decoder Models**"
771 | ]
772 | },
773 | {
774 | "cell_type": "markdown",
775 | "metadata": {
776 | "id": "rZK_gnIzZgTO"
777 | },
778 | "source": [
779 | "For the `EncoderDecoderModel` framework, we will use the `Seq2SeqTrainingArguments` and the `Seq2SeqTrainer`. Let's import them."
780 | ]
781 | },
782 | {
783 | "cell_type": "code",
784 | "metadata": {
785 | "id": "-zkkd66rtsnA"
786 | },
787 | "source": [
788 | "from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer"
789 | ],
790 | "execution_count": null,
791 | "outputs": []
792 | },
793 | {
794 | "cell_type": "markdown",
795 | "metadata": {
796 | "id": "dPUAgo7pxH24"
797 | },
798 | "source": [
799 | "Also, we need to define a function to correctly compute the ROUGE score during validation. ROUGE is a much better metric to track during training than only language modeling loss."
800 | ]
801 | },
802 | {
803 | "cell_type": "code",
804 | "metadata": {
805 | "id": "68IHmFYLx09W"
806 | },
807 | "source": [
808 | "# load rouge for validation\n",
809 | "rouge = datasets.load_metric(\"rouge\")\n",
810 | "\n",
811 | "def compute_metrics(pred):\n",
812 | " labels_ids = pred.label_ids\n",
813 | " pred_ids = pred.predictions\n",
814 | "\n",
815 | " # all unnecessary tokens are removed\n",
816 | " pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)\n",
817 | " labels_ids[labels_ids == -100] = tokenizer.pad_token_id\n",
818 | " label_str = tokenizer.batch_decode(labels_ids, skip_special_tokens=True)\n",
819 | "\n",
820 | " rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=[\"rouge2\"])[\"rouge2\"].mid\n",
821 | "\n",
822 | " return {\n",
823 | " \"rouge2_precision\": round(rouge_output.precision, 4),\n",
824 | " \"rouge2_recall\": round(rouge_output.recall, 4),\n",
825 | " \"rouge2_fmeasure\": round(rouge_output.fmeasure, 4),\n",
826 | " }"
827 | ],
828 | "execution_count": null,
829 | "outputs": []
830 | },
831 | {
832 | "cell_type": "markdown",
833 | "metadata": {
834 | "id": "1ik4hZb2yV-b"
835 | },
836 | "source": [
837 | "Cool! Finally, we start training."
838 | ]
839 | },
840 | {
841 | "cell_type": "code",
842 | "metadata": {
843 | "id": "LAaTxUpdzshF",
844 | "colab": {
845 | "base_uri": "https://localhost:8080/",
846 | "height": 273
847 | },
848 | "outputId": "3605173b-5561-4ca9-b20f-2fc64709ea81"
849 | },
850 | "source": [
851 | "# set training arguments - these params are not really tuned, feel free to change\n",
852 | "training_args = Seq2SeqTrainingArguments(\n",
853 | " output_dir=\"./\",\n",
854 | " evaluation_strategy=\"steps\",\n",
855 | " per_device_train_batch_size=batch_size,\n",
856 | " per_device_eval_batch_size=batch_size,\n",
857 | " predict_with_generate=True,\n",
858 | " logging_steps=2, # set to 1000 for full training\n",
859 | " save_steps=16, # set to 500 for full training\n",
860 | " eval_steps=4, # set to 8000 for full training\n",
861 | " warmup_steps=1, # set to 2000 for full training\n",
862 | " max_steps=16, # delete for full training\n",
863 | " overwrite_output_dir=True,\n",
864 | " save_total_limit=3,\n",
865 | " fp16=True, \n",
866 | ")\n",
867 | "\n",
868 | "# instantiate trainer\n",
869 | "trainer = Seq2SeqTrainer(\n",
870 | " model=bert2bert,\n",
871 | " tokenizer=tokenizer,\n",
872 | " args=training_args,\n",
873 | " compute_metrics=compute_metrics,\n",
874 | " train_dataset=train_data,\n",
875 | " eval_dataset=val_data,\n",
876 | ")\n",
877 | "trainer.train()"
878 | ],
879 | "execution_count": null,
880 | "outputs": [
881 | {
882 | "output_type": "stream",
883 | "text": [
884 | "/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\n",
885 | " \"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate\", UserWarning)\n"
886 | ],
887 | "name": "stderr"
888 | },
889 | {
890 | "output_type": "display_data",
891 | "data": {
892 | "text/html": [
893 | "\n",
894 | " \n",
895 | " \n",
904 | " \n",
905 | "
\n",
906 | " [16/16 05:31, Epoch 2/2]\n",
907 | "
\n",
908 | " \n",
909 | " \n",
910 | " \n",
911 | " Step \n",
912 | " Training Loss \n",
913 | " Validation Loss \n",
914 | " Rouge2 Precision \n",
915 | " Rouge2 Recall \n",
916 | " Rouge2 Fmeasure \n",
917 | " Runtime \n",
918 | " Samples Per Second \n",
919 | " \n",
920 | " \n",
921 | " \n",
922 | " \n",
923 | " 4 \n",
924 | " 8.502200 \n",
925 | " 7.904778 \n",
926 | " 0.004400 \n",
927 | " 0.006200 \n",
928 | " 0.004800 \n",
929 | " 59.320500 \n",
930 | " 0.270000 \n",
931 | " \n",
932 | " \n",
933 | " 8 \n",
934 | " 7.591400 \n",
935 | " 7.853709 \n",
936 | " 0.000000 \n",
937 | " 0.000000 \n",
938 | " 0.000000 \n",
939 | " 59.554800 \n",
940 | " 0.269000 \n",
941 | " \n",
942 | " \n",
943 | " 12 \n",
944 | " 7.344000 \n",
945 | " 7.736513 \n",
946 | " 0.004700 \n",
947 | " 0.004200 \n",
948 | " 0.004400 \n",
949 | " 59.465400 \n",
950 | " 0.269000 \n",
951 | " \n",
952 | " \n",
953 | " 16 \n",
954 | " 7.456500 \n",
955 | " 7.734756 \n",
956 | " 0.003900 \n",
957 | " 0.004400 \n",
958 | " 0.004100 \n",
959 | " 59.428500 \n",
960 | " 0.269000 \n",
961 | " \n",
962 | " \n",
963 | "
"
964 | ],
965 | "text/plain": [
966 | ""
967 | ]
968 | },
969 | "metadata": {
970 | "tags": []
971 | }
972 | },
973 | {
974 | "output_type": "execute_result",
975 | "data": {
976 | "text/plain": [
977 | "TrainOutput(global_step=16, training_loss=7.902346074581146, metrics={'train_runtime': 335.2171, 'train_samples_per_second': 0.048, 'total_flos': 60792025743360, 'epoch': 2.0})"
978 | ]
979 | },
980 | "metadata": {
981 | "tags": []
982 | },
983 | "execution_count": 9
984 | }
985 | ]
986 | },
987 | {
988 | "cell_type": "markdown",
989 | "metadata": {
990 | "id": "ZwQIEhKOrJpl"
991 | },
992 | "source": [
993 | "### **Evaluation**\n",
994 | "\n",
995 | "Awesome, we finished training our dummy model. Let's now evaluated the model on the test data. We make use of the dataset's handy `.map()` function to generate a summary of each sample of the test data."
996 | ]
997 | },
998 | {
999 | "cell_type": "code",
1000 | "metadata": {
1001 | "id": "oOoSrwWarJAC"
1002 | },
1003 | "source": [
1004 | "import datasets\n",
1005 | "from transformers import BertTokenizer, EncoderDecoderModel\n",
1006 | "\n",
1007 | "tokenizer = BertTokenizer.from_pretrained(\"bert-base-uncased\")\n",
1008 | "model = EncoderDecoderModel.from_pretrained(\"./checkpoint-16\")\n",
1009 | "model.to(\"cuda\")\n",
1010 | "\n",
1011 | "test_data = datasets.load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"test\")\n",
1012 | "\n",
1013 | "# only use 16 training examples for notebook - DELETE LINE FOR FULL TRAINING\n",
1014 | "test_data = test_data.select(range(16))\n",
1015 | "\n",
1016 | "batch_size = 16 # change to 64 for full evaluation\n",
1017 | "\n",
1018 | "# map data correctly\n",
1019 | "def generate_summary(batch):\n",
1020 | " # Tokenizer will automatically set [BOS] [EOS]\n",
1021 | " # cut off at BERT max length 512\n",
1022 | " inputs = tokenizer(batch[\"article\"], padding=\"max_length\", truncation=True, max_length=512, return_tensors=\"pt\")\n",
1023 | " input_ids = inputs.input_ids.to(\"cuda\")\n",
1024 | " attention_mask = inputs.attention_mask.to(\"cuda\")\n",
1025 | "\n",
1026 | " outputs = model.generate(input_ids, attention_mask=attention_mask)\n",
1027 | "\n",
1028 | " # all special tokens including will be removed\n",
1029 | " output_str = tokenizer.batch_decode(outputs, skip_special_tokens=True)\n",
1030 | "\n",
1031 | " batch[\"pred\"] = output_str\n",
1032 | "\n",
1033 | " return batch\n",
1034 | "\n",
1035 | "results = test_data.map(generate_summary, batched=True, batch_size=batch_size, remove_columns=[\"article\"])\n",
1036 | "\n",
1037 | "pred_str = results[\"pred\"]\n",
1038 | "label_str = results[\"highlights\"]\n",
1039 | "\n",
1040 | "rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=[\"rouge2\"])[\"rouge2\"].mid\n",
1041 | "\n",
1042 | "print(rouge_output)"
1043 | ],
1044 | "execution_count": null,
1045 | "outputs": []
1046 | },
1047 | {
1048 | "cell_type": "markdown",
1049 | "metadata": {
1050 | "id": "7zdm50ZotZqb"
1051 | },
1052 | "source": [
1053 | "The fully trained *BERT2BERT* model is uploaded to the 🤗model hub under [patrickvonplaten/bert2bert_cnn_daily_mail](https://huggingface.co/patrickvonplaten/bert2bert_cnn_daily_mail). \n",
1054 | "\n",
1055 | "The model achieves a ROUGE-2 score of **18.22**, which is even a little better than reported in the paper.\n",
1056 | "\n",
1057 | "For some summarization examples, the reader is advised to use the online inference API of the model, [here](https://huggingface.co/patrickvonplaten/bert2bert_cnn_daily_mail)."
1058 | ]
1059 | }
1060 | ]
1061 | }
--------------------------------------------------------------------------------