├── CLIR_Tutorial_References.pdf ├── LICENSE ├── README.md ├── SIGIR 2023 CLIR Tutorial – Slides.pdf └── notebooks ├── clir_tutorial_blade.ipynb ├── clir_tutorial_bm25.ipynb ├── clir_tutorial_cross_encoder.ipynb └── clir_tutorial_plaidx.ipynb /CLIR_Tutorial_References.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hltcoe/clir-tutorial/fe4ae4741e30bd60c4a4f85982fb0dbaefba17d0/CLIR_Tutorial_References.pdf -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Creative Commons Legal Code 2 | 3 | CC0 1.0 Universal 4 | 5 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE 6 | LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN 7 | ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS 8 | INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES 9 | REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS 10 | PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM 11 | THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED 12 | HEREUNDER. 13 | 14 | Statement of Purpose 15 | 16 | The laws of most jurisdictions throughout the world automatically confer 17 | exclusive Copyright and Related Rights (defined below) upon the creator 18 | and subsequent owner(s) (each and all, an "owner") of an original work of 19 | authorship and/or a database (each, a "Work"). 
20 | 21 | Certain owners wish to permanently relinquish those rights to a Work for 22 | the purpose of contributing to a commons of creative, cultural and 23 | scientific works ("Commons") that the public can reliably and without fear 24 | of later claims of infringement build upon, modify, incorporate in other 25 | works, reuse and redistribute as freely as possible in any form whatsoever 26 | and for any purposes, including without limitation commercial purposes. 27 | These owners may contribute to the Commons to promote the ideal of a free 28 | culture and the further production of creative, cultural and scientific 29 | works, or to gain reputation or greater distribution for their Work in 30 | part through the use and efforts of others. 31 | 32 | For these and/or other purposes and motivations, and without any 33 | expectation of additional consideration or compensation, the person 34 | associating CC0 with a Work (the "Affirmer"), to the extent that he or she 35 | is an owner of Copyright and Related Rights in the Work, voluntarily 36 | elects to apply CC0 to the Work and publicly distribute the Work under its 37 | terms, with knowledge of his or her Copyright and Related Rights in the 38 | Work and the meaning and intended legal effect of CC0 on those rights. 39 | 40 | 1. Copyright and Related Rights. A Work made available under CC0 may be 41 | protected by copyright and related or neighboring rights ("Copyright and 42 | Related Rights"). Copyright and Related Rights include, but are not 43 | limited to, the following: 44 | 45 | i. the right to reproduce, adapt, distribute, perform, display, 46 | communicate, and translate a Work; 47 | ii. moral rights retained by the original author(s) and/or performer(s); 48 | iii. publicity and privacy rights pertaining to a person's image or 49 | likeness depicted in a Work; 50 | iv. rights protecting against unfair competition in regards to a Work, 51 | subject to the limitations in paragraph 4(a), below; 52 | v. 
rights protecting the extraction, dissemination, use and reuse of data 53 | in a Work; 54 | vi. database rights (such as those arising under Directive 96/9/EC of the 55 | European Parliament and of the Council of 11 March 1996 on the legal 56 | protection of databases, and under any national implementation 57 | thereof, including any amended or successor version of such 58 | directive); and 59 | vii. other similar, equivalent or corresponding rights throughout the 60 | world based on applicable law or treaty, and any national 61 | implementations thereof. 62 | 63 | 2. Waiver. To the greatest extent permitted by, but not in contravention 64 | of, applicable law, Affirmer hereby overtly, fully, permanently, 65 | irrevocably and unconditionally waives, abandons, and surrenders all of 66 | Affirmer's Copyright and Related Rights and associated claims and causes 67 | of action, whether now known or unknown (including existing as well as 68 | future claims and causes of action), in the Work (i) in all territories 69 | worldwide, (ii) for the maximum duration provided by applicable law or 70 | treaty (including future time extensions), (iii) in any current or future 71 | medium and for any number of copies, and (iv) for any purpose whatsoever, 72 | including without limitation commercial, advertising or promotional 73 | purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each 74 | member of the public at large and to the detriment of Affirmer's heirs and 75 | successors, fully intending that such Waiver shall not be subject to 76 | revocation, rescission, cancellation, termination, or any other legal or 77 | equitable action to disrupt the quiet enjoyment of the Work by the public 78 | as contemplated by Affirmer's express Statement of Purpose. 79 | 80 | 3. Public License Fallback. 
Should any part of the Waiver for any reason 81 | be judged legally invalid or ineffective under applicable law, then the 82 | Waiver shall be preserved to the maximum extent permitted taking into 83 | account Affirmer's express Statement of Purpose. In addition, to the 84 | extent the Waiver is so judged Affirmer hereby grants to each affected 85 | person a royalty-free, non transferable, non sublicensable, non exclusive, 86 | irrevocable and unconditional license to exercise Affirmer's Copyright and 87 | Related Rights in the Work (i) in all territories worldwide, (ii) for the 88 | maximum duration provided by applicable law or treaty (including future 89 | time extensions), (iii) in any current or future medium and for any number 90 | of copies, and (iv) for any purpose whatsoever, including without 91 | limitation commercial, advertising or promotional purposes (the 92 | "License"). The License shall be deemed effective as of the date CC0 was 93 | applied by Affirmer to the Work. Should any part of the License for any 94 | reason be judged legally invalid or ineffective under applicable law, such 95 | partial invalidity or ineffectiveness shall not invalidate the remainder 96 | of the License, and in such case Affirmer hereby affirms that he or she 97 | will not (i) exercise any of his or her remaining Copyright and Related 98 | Rights in the Work or (ii) assert any associated claims and causes of 99 | action with respect to the Work, in either case contrary to Affirmer's 100 | express Statement of Purpose. 101 | 102 | 4. Limitations and Disclaimers. 103 | 104 | a. No trademark or patent rights held by Affirmer are waived, abandoned, 105 | surrendered, licensed or otherwise affected by this document. 106 | b. 
Affirmer offers the Work as-is and makes no representations or 107 | warranties of any kind concerning the Work, express, implied, 108 | statutory or otherwise, including without limitation warranties of 109 | title, merchantability, fitness for a particular purpose, non 110 | infringement, or the absence of latent or other defects, accuracy, or 111 | the present or absence of errors, whether or not discoverable, all to 112 | the greatest extent permissible under applicable law. 113 | c. Affirmer disclaims responsibility for clearing rights of other persons 114 | that may apply to the Work or any use thereof, including without 115 | limitation any person's Copyright and Related Rights in the Work. 116 | Further, Affirmer disclaims responsibility for obtaining any necessary 117 | consents, permissions or other rights required for any use of the 118 | Work. 119 | d. Affirmer understands and acknowledges that Creative Commons is not a 120 | party to this document and has no duty or obligation with respect to 121 | this CC0 or use of the Work. 122 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Neural Methods for CLIR Tutorial at SIGIR 2023 2 | 3 | This repository contains four practice notebooks for the CLIR Tutorial at SIGIR 2023. 4 | You can either download the notebooks directly from the `./notebooks` directory or click the links below 5 | to open the notebooks in Google Colab. 
6 | 7 | ## Notebooks 8 | 9 | - BM25: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hltcoe/clir-tutorial/blob/main/notebooks/clir_tutorial_bm25.ipynb) 10 | - PLAID-X: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hltcoe/clir-tutorial/blob/main/notebooks/clir_tutorial_plaidx.ipynb) 11 | - BLADE: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hltcoe/clir-tutorial/blob/main/notebooks/clir_tutorial_blade.ipynb) 12 | - Cross-Encoder Reranking: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hltcoe/clir-tutorial/blob/main/notebooks/clir_tutorial_cross_encoder.ipynb) 13 | 14 | ## Online and Offline Support 15 | 16 | If you run into any issues, feel free to raise an issue in this repository, message us on the SIGIR Slack channel, or catch us at the conference! 17 | 18 | ## Citation 19 | If you would like to cite our tutorial, please use the following BibTeX. 20 | ```bibtex 21 | @inproceedings{sigir2023clir-tutorial, 22 | author = {Eugene Yang and Dawn Lawrie and James Mayfield and Suraj Nair and Douglas W. 
Oard}, 23 | title = {Neural Methods for Cross-Language Information Retrieval}, 24 | booktitle = {Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) (Tutorial)}, 25 | year = {2023}, 26 | } 27 | ``` -------------------------------------------------------------------------------- /SIGIR 2023 CLIR Tutorial – Slides.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hltcoe/clir-tutorial/fe4ae4741e30bd60c4a4f85982fb0dbaefba17d0/SIGIR 2023 CLIR Tutorial – Slides.pdf -------------------------------------------------------------------------------- /notebooks/clir_tutorial_blade.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "gpuType": "T4" 8 | }, 9 | "kernelspec": { 10 | "name": "python3", 11 | "display_name": "Python 3" 12 | }, 13 | "language_info": { 14 | "name": "python" 15 | }, 16 | "widgets": { 17 | "application/vnd.jupyter.widget-state+json": { 18 | "19129db08a4d45ac9a7451c54fa6294e": { 19 | "model_module": "@jupyter-widgets/controls", 20 | "model_name": "HBoxModel", 21 | "model_module_version": "1.5.0", 22 | "state": { 23 | "_dom_classes": [], 24 | "_model_module": "@jupyter-widgets/controls", 25 | "_model_module_version": "1.5.0", 26 | "_model_name": "HBoxModel", 27 | "_view_count": null, 28 | "_view_module": "@jupyter-widgets/controls", 29 | "_view_module_version": "1.5.0", 30 | "_view_name": "HBoxView", 31 | "box_style": "", 32 | "children": [ 33 | "IPY_MODEL_83f94e7318f7470e84e57bb9793cd516", 34 | "IPY_MODEL_21febb8af09845e3b4035ca4d7b4e6f5", 35 | "IPY_MODEL_2eb7be5cf5724ff2abdbb11d574d13df" 36 | ], 37 | "layout": "IPY_MODEL_d5c2bfc469dd4d75a15456e9f5139703" 38 | } 39 | }, 40 | "83f94e7318f7470e84e57bb9793cd516": { 41 | "model_module": "@jupyter-widgets/controls", 42 | 
"model_name": "HTMLModel", 43 | "model_module_version": "1.5.0", 44 | "state": { 45 | "_dom_classes": [], 46 | "_model_module": "@jupyter-widgets/controls", 47 | "_model_module_version": "1.5.0", 48 | "_model_name": "HTMLModel", 49 | "_view_count": null, 50 | "_view_module": "@jupyter-widgets/controls", 51 | "_view_module_version": "1.5.0", 52 | "_view_name": "HTMLView", 53 | "description": "", 54 | "description_tooltip": null, 55 | "layout": "IPY_MODEL_b04d6073e65845c4b0e70945a95c23a2", 56 | "placeholder": "​", 57 | "style": "IPY_MODEL_bef41c64e3a64f71b4fdb32b00bfb90b", 58 | "value": "Downloading builder script: 100%" 59 | } 60 | }, 61 | "21febb8af09845e3b4035ca4d7b4e6f5": { 62 | "model_module": "@jupyter-widgets/controls", 63 | "model_name": "FloatProgressModel", 64 | "model_module_version": "1.5.0", 65 | "state": { 66 | "_dom_classes": [], 67 | "_model_module": "@jupyter-widgets/controls", 68 | "_model_module_version": "1.5.0", 69 | "_model_name": "FloatProgressModel", 70 | "_view_count": null, 71 | "_view_module": "@jupyter-widgets/controls", 72 | "_view_module_version": "1.5.0", 73 | "_view_name": "ProgressView", 74 | "bar_style": "success", 75 | "description": "", 76 | "description_tooltip": null, 77 | "layout": "IPY_MODEL_dc51464eadf942aaab5dab9ec166c6cc", 78 | "max": 1311, 79 | "min": 0, 80 | "orientation": "horizontal", 81 | "style": "IPY_MODEL_f4b0311ca9ad412296d0614a3517169e", 82 | "value": 1311 83 | } 84 | }, 85 | "2eb7be5cf5724ff2abdbb11d574d13df": { 86 | "model_module": "@jupyter-widgets/controls", 87 | "model_name": "HTMLModel", 88 | "model_module_version": "1.5.0", 89 | "state": { 90 | "_dom_classes": [], 91 | "_model_module": "@jupyter-widgets/controls", 92 | "_model_module_version": "1.5.0", 93 | "_model_name": "HTMLModel", 94 | "_view_count": null, 95 | "_view_module": "@jupyter-widgets/controls", 96 | "_view_module_version": "1.5.0", 97 | "_view_name": "HTMLView", 98 | "description": "", 99 | "description_tooltip": null, 100 | "layout": 
"IPY_MODEL_d3da9c98fd4c4bcbb0df491a28e234fb", 101 | "placeholder": "​", 102 | "style": "IPY_MODEL_c674c869c50e4eb8a2b63d5b928a6b82", 103 | "value": " 1.31k/1.31k [00:00<00:00, 59.4kB/s]" 104 | } 105 | }, 106 | "d5c2bfc469dd4d75a15456e9f5139703": { 107 | "model_module": "@jupyter-widgets/base", 108 | "model_name": "LayoutModel", 109 | "model_module_version": "1.2.0", 110 | "state": { 111 | "_model_module": "@jupyter-widgets/base", 112 | "_model_module_version": "1.2.0", 113 | "_model_name": "LayoutModel", 114 | "_view_count": null, 115 | "_view_module": "@jupyter-widgets/base", 116 | "_view_module_version": "1.2.0", 117 | "_view_name": "LayoutView", 118 | "align_content": null, 119 | "align_items": null, 120 | "align_self": null, 121 | "border": null, 122 | "bottom": null, 123 | "display": null, 124 | "flex": null, 125 | "flex_flow": null, 126 | "grid_area": null, 127 | "grid_auto_columns": null, 128 | "grid_auto_flow": null, 129 | "grid_auto_rows": null, 130 | "grid_column": null, 131 | "grid_gap": null, 132 | "grid_row": null, 133 | "grid_template_areas": null, 134 | "grid_template_columns": null, 135 | "grid_template_rows": null, 136 | "height": null, 137 | "justify_content": null, 138 | "justify_items": null, 139 | "left": null, 140 | "margin": null, 141 | "max_height": null, 142 | "max_width": null, 143 | "min_height": null, 144 | "min_width": null, 145 | "object_fit": null, 146 | "object_position": null, 147 | "order": null, 148 | "overflow": null, 149 | "overflow_x": null, 150 | "overflow_y": null, 151 | "padding": null, 152 | "right": null, 153 | "top": null, 154 | "visibility": null, 155 | "width": null 156 | } 157 | }, 158 | "b04d6073e65845c4b0e70945a95c23a2": { 159 | "model_module": "@jupyter-widgets/base", 160 | "model_name": "LayoutModel", 161 | "model_module_version": "1.2.0", 162 | "state": { 163 | "_model_module": "@jupyter-widgets/base", 164 | "_model_module_version": "1.2.0", 165 | "_model_name": "LayoutModel", 166 | "_view_count": null, 167 | 
"_view_module": "@jupyter-widgets/base", 168 | "_view_module_version": "1.2.0", 169 | "_view_name": "LayoutView", 170 | "align_content": null, 171 | "align_items": null, 172 | "align_self": null, 173 | "border": null, 174 | "bottom": null, 175 | "display": null, 176 | "flex": null, 177 | "flex_flow": null, 178 | "grid_area": null, 179 | "grid_auto_columns": null, 180 | "grid_auto_flow": null, 181 | "grid_auto_rows": null, 182 | "grid_column": null, 183 | "grid_gap": null, 184 | "grid_row": null, 185 | "grid_template_areas": null, 186 | "grid_template_columns": null, 187 | "grid_template_rows": null, 188 | "height": null, 189 | "justify_content": null, 190 | "justify_items": null, 191 | "left": null, 192 | "margin": null, 193 | "max_height": null, 194 | "max_width": null, 195 | "min_height": null, 196 | "min_width": null, 197 | "object_fit": null, 198 | "object_position": null, 199 | "order": null, 200 | "overflow": null, 201 | "overflow_x": null, 202 | "overflow_y": null, 203 | "padding": null, 204 | "right": null, 205 | "top": null, 206 | "visibility": null, 207 | "width": null 208 | } 209 | }, 210 | "bef41c64e3a64f71b4fdb32b00bfb90b": { 211 | "model_module": "@jupyter-widgets/controls", 212 | "model_name": "DescriptionStyleModel", 213 | "model_module_version": "1.5.0", 214 | "state": { 215 | "_model_module": "@jupyter-widgets/controls", 216 | "_model_module_version": "1.5.0", 217 | "_model_name": "DescriptionStyleModel", 218 | "_view_count": null, 219 | "_view_module": "@jupyter-widgets/base", 220 | "_view_module_version": "1.2.0", 221 | "_view_name": "StyleView", 222 | "description_width": "" 223 | } 224 | }, 225 | "dc51464eadf942aaab5dab9ec166c6cc": { 226 | "model_module": "@jupyter-widgets/base", 227 | "model_name": "LayoutModel", 228 | "model_module_version": "1.2.0", 229 | "state": { 230 | "_model_module": "@jupyter-widgets/base", 231 | "_model_module_version": "1.2.0", 232 | "_model_name": "LayoutModel", 233 | "_view_count": null, 234 | "_view_module": 
"@jupyter-widgets/base", 235 | "_view_module_version": "1.2.0", 236 | "_view_name": "LayoutView", 237 | "align_content": null, 238 | "align_items": null, 239 | "align_self": null, 240 | "border": null, 241 | "bottom": null, 242 | "display": null, 243 | "flex": null, 244 | "flex_flow": null, 245 | "grid_area": null, 246 | "grid_auto_columns": null, 247 | "grid_auto_flow": null, 248 | "grid_auto_rows": null, 249 | "grid_column": null, 250 | "grid_gap": null, 251 | "grid_row": null, 252 | "grid_template_areas": null, 253 | "grid_template_columns": null, 254 | "grid_template_rows": null, 255 | "height": null, 256 | "justify_content": null, 257 | "justify_items": null, 258 | "left": null, 259 | "margin": null, 260 | "max_height": null, 261 | "max_width": null, 262 | "min_height": null, 263 | "min_width": null, 264 | "object_fit": null, 265 | "object_position": null, 266 | "order": null, 267 | "overflow": null, 268 | "overflow_x": null, 269 | "overflow_y": null, 270 | "padding": null, 271 | "right": null, 272 | "top": null, 273 | "visibility": null, 274 | "width": null 275 | } 276 | }, 277 | "f4b0311ca9ad412296d0614a3517169e": { 278 | "model_module": "@jupyter-widgets/controls", 279 | "model_name": "ProgressStyleModel", 280 | "model_module_version": "1.5.0", 281 | "state": { 282 | "_model_module": "@jupyter-widgets/controls", 283 | "_model_module_version": "1.5.0", 284 | "_model_name": "ProgressStyleModel", 285 | "_view_count": null, 286 | "_view_module": "@jupyter-widgets/base", 287 | "_view_module_version": "1.2.0", 288 | "_view_name": "StyleView", 289 | "bar_color": null, 290 | "description_width": "" 291 | } 292 | }, 293 | "d3da9c98fd4c4bcbb0df491a28e234fb": { 294 | "model_module": "@jupyter-widgets/base", 295 | "model_name": "LayoutModel", 296 | "model_module_version": "1.2.0", 297 | "state": { 298 | "_model_module": "@jupyter-widgets/base", 299 | "_model_module_version": "1.2.0", 300 | "_model_name": "LayoutModel", 301 | "_view_count": null, 302 | "_view_module": 
"@jupyter-widgets/base", 303 | "_view_module_version": "1.2.0", 304 | "_view_name": "LayoutView", 305 | "align_content": null, 306 | "align_items": null, 307 | "align_self": null, 308 | "border": null, 309 | "bottom": null, 310 | "display": null, 311 | "flex": null, 312 | "flex_flow": null, 313 | "grid_area": null, 314 | "grid_auto_columns": null, 315 | "grid_auto_flow": null, 316 | "grid_auto_rows": null, 317 | "grid_column": null, 318 | "grid_gap": null, 319 | "grid_row": null, 320 | "grid_template_areas": null, 321 | "grid_template_columns": null, 322 | "grid_template_rows": null, 323 | "height": null, 324 | "justify_content": null, 325 | "justify_items": null, 326 | "left": null, 327 | "margin": null, 328 | "max_height": null, 329 | "max_width": null, 330 | "min_height": null, 331 | "min_width": null, 332 | "object_fit": null, 333 | "object_position": null, 334 | "order": null, 335 | "overflow": null, 336 | "overflow_x": null, 337 | "overflow_y": null, 338 | "padding": null, 339 | "right": null, 340 | "top": null, 341 | "visibility": null, 342 | "width": null 343 | } 344 | }, 345 | "c674c869c50e4eb8a2b63d5b928a6b82": { 346 | "model_module": "@jupyter-widgets/controls", 347 | "model_name": "DescriptionStyleModel", 348 | "model_module_version": "1.5.0", 349 | "state": { 350 | "_model_module": "@jupyter-widgets/controls", 351 | "_model_module_version": "1.5.0", 352 | "_model_name": "DescriptionStyleModel", 353 | "_view_count": null, 354 | "_view_module": "@jupyter-widgets/base", 355 | "_view_module_version": "1.2.0", 356 | "_view_name": "StyleView", 357 | "description_width": "" 358 | } 359 | }, 360 | "0dc339c38da74695a179d6004f63d0e5": { 361 | "model_module": "@jupyter-widgets/controls", 362 | "model_name": "HBoxModel", 363 | "model_module_version": "1.5.0", 364 | "state": { 365 | "_dom_classes": [], 366 | "_model_module": "@jupyter-widgets/controls", 367 | "_model_module_version": "1.5.0", 368 | "_model_name": "HBoxModel", 369 | "_view_count": null, 370 | 
"_view_module": "@jupyter-widgets/controls", 371 | "_view_module_version": "1.5.0", 372 | "_view_name": "HBoxView", 373 | "box_style": "", 374 | "children": [ 375 | "IPY_MODEL_f728ebeea0a845dcbc2c6b607c6bc978", 376 | "IPY_MODEL_f8a9fd28144d456baf5912b92a04c251", 377 | "IPY_MODEL_f863f30570744e3483bf124b2b0667b0" 378 | ], 379 | "layout": "IPY_MODEL_5d3236474a4f41b2b0b2f8b8e6d7acf2" 380 | } 381 | }, 382 | "f728ebeea0a845dcbc2c6b607c6bc978": { 383 | "model_module": "@jupyter-widgets/controls", 384 | "model_name": "HTMLModel", 385 | "model_module_version": "1.5.0", 386 | "state": { 387 | "_dom_classes": [], 388 | "_model_module": "@jupyter-widgets/controls", 389 | "_model_module_version": "1.5.0", 390 | "_model_name": "HTMLModel", 391 | "_view_count": null, 392 | "_view_module": "@jupyter-widgets/controls", 393 | "_view_module_version": "1.5.0", 394 | "_view_name": "HTMLView", 395 | "description": "", 396 | "description_tooltip": null, 397 | "layout": "IPY_MODEL_474e96e623aa49c8a603801db8ab73e5", 398 | "placeholder": "​", 399 | "style": "IPY_MODEL_0d9c8dba0a10477dbb3449ae1124ee17", 400 | "value": "Downloading readme: 100%" 401 | } 402 | }, 403 | "f8a9fd28144d456baf5912b92a04c251": { 404 | "model_module": "@jupyter-widgets/controls", 405 | "model_name": "FloatProgressModel", 406 | "model_module_version": "1.5.0", 407 | "state": { 408 | "_dom_classes": [], 409 | "_model_module": "@jupyter-widgets/controls", 410 | "_model_module_version": "1.5.0", 411 | "_model_name": "FloatProgressModel", 412 | "_view_count": null, 413 | "_view_module": "@jupyter-widgets/controls", 414 | "_view_module_version": "1.5.0", 415 | "_view_name": "ProgressView", 416 | "bar_style": "success", 417 | "description": "", 418 | "description_tooltip": null, 419 | "layout": "IPY_MODEL_cbfc4a83c769442b9a25c27bcc5a38a8", 420 | "max": 1477, 421 | "min": 0, 422 | "orientation": "horizontal", 423 | "style": "IPY_MODEL_3869a56e41df4c418d04ed4f623ba50c", 424 | "value": 1477 425 | } 426 | }, 427 | 
"f863f30570744e3483bf124b2b0667b0": { 428 | "model_module": "@jupyter-widgets/controls", 429 | "model_name": "HTMLModel", 430 | "model_module_version": "1.5.0", 431 | "state": { 432 | "_dom_classes": [], 433 | "_model_module": "@jupyter-widgets/controls", 434 | "_model_module_version": "1.5.0", 435 | "_model_name": "HTMLModel", 436 | "_view_count": null, 437 | "_view_module": "@jupyter-widgets/controls", 438 | "_view_module_version": "1.5.0", 439 | "_view_name": "HTMLView", 440 | "description": "", 441 | "description_tooltip": null, 442 | "layout": "IPY_MODEL_3bdc109f2eef4108b0f01eb2d1b641e9", 443 | "placeholder": "​", 444 | "style": "IPY_MODEL_cb92e9d3f2d4438a87d9ca41b1e461a8", 445 | "value": " 1.48k/1.48k [00:00<00:00, 29.1kB/s]" 446 | } 447 | }, 448 | "5d3236474a4f41b2b0b2f8b8e6d7acf2": { 449 | "model_module": "@jupyter-widgets/base", 450 | "model_name": "LayoutModel", 451 | "model_module_version": "1.2.0", 452 | "state": { 453 | "_model_module": "@jupyter-widgets/base", 454 | "_model_module_version": "1.2.0", 455 | "_model_name": "LayoutModel", 456 | "_view_count": null, 457 | "_view_module": "@jupyter-widgets/base", 458 | "_view_module_version": "1.2.0", 459 | "_view_name": "LayoutView", 460 | "align_content": null, 461 | "align_items": null, 462 | "align_self": null, 463 | "border": null, 464 | "bottom": null, 465 | "display": null, 466 | "flex": null, 467 | "flex_flow": null, 468 | "grid_area": null, 469 | "grid_auto_columns": null, 470 | "grid_auto_flow": null, 471 | "grid_auto_rows": null, 472 | "grid_column": null, 473 | "grid_gap": null, 474 | "grid_row": null, 475 | "grid_template_areas": null, 476 | "grid_template_columns": null, 477 | "grid_template_rows": null, 478 | "height": null, 479 | "justify_content": null, 480 | "justify_items": null, 481 | "left": null, 482 | "margin": null, 483 | "max_height": null, 484 | "max_width": null, 485 | "min_height": null, 486 | "min_width": null, 487 | "object_fit": null, 488 | "object_position": null, 489 | 
"order": null, 490 | "overflow": null, 491 | "overflow_x": null, 492 | "overflow_y": null, 493 | "padding": null, 494 | "right": null, 495 | "top": null, 496 | "visibility": null, 497 | "width": null 498 | } 499 | }, 500 | "474e96e623aa49c8a603801db8ab73e5": { 501 | "model_module": "@jupyter-widgets/base", 502 | "model_name": "LayoutModel", 503 | "model_module_version": "1.2.0", 504 | "state": { 505 | "_model_module": "@jupyter-widgets/base", 506 | "_model_module_version": "1.2.0", 507 | "_model_name": "LayoutModel", 508 | "_view_count": null, 509 | "_view_module": "@jupyter-widgets/base", 510 | "_view_module_version": "1.2.0", 511 | "_view_name": "LayoutView", 512 | "align_content": null, 513 | "align_items": null, 514 | "align_self": null, 515 | "border": null, 516 | "bottom": null, 517 | "display": null, 518 | "flex": null, 519 | "flex_flow": null, 520 | "grid_area": null, 521 | "grid_auto_columns": null, 522 | "grid_auto_flow": null, 523 | "grid_auto_rows": null, 524 | "grid_column": null, 525 | "grid_gap": null, 526 | "grid_row": null, 527 | "grid_template_areas": null, 528 | "grid_template_columns": null, 529 | "grid_template_rows": null, 530 | "height": null, 531 | "justify_content": null, 532 | "justify_items": null, 533 | "left": null, 534 | "margin": null, 535 | "max_height": null, 536 | "max_width": null, 537 | "min_height": null, 538 | "min_width": null, 539 | "object_fit": null, 540 | "object_position": null, 541 | "order": null, 542 | "overflow": null, 543 | "overflow_x": null, 544 | "overflow_y": null, 545 | "padding": null, 546 | "right": null, 547 | "top": null, 548 | "visibility": null, 549 | "width": null 550 | } 551 | }, 552 | "0d9c8dba0a10477dbb3449ae1124ee17": { 553 | "model_module": "@jupyter-widgets/controls", 554 | "model_name": "DescriptionStyleModel", 555 | "model_module_version": "1.5.0", 556 | "state": { 557 | "_model_module": "@jupyter-widgets/controls", 558 | "_model_module_version": "1.5.0", 559 | "_model_name": 
"DescriptionStyleModel", 560 | "_view_count": null, 561 | "_view_module": "@jupyter-widgets/base", 562 | "_view_module_version": "1.2.0", 563 | "_view_name": "StyleView", 564 | "description_width": "" 565 | } 566 | }, 567 | "cbfc4a83c769442b9a25c27bcc5a38a8": { 568 | "model_module": "@jupyter-widgets/base", 569 | "model_name": "LayoutModel", 570 | "model_module_version": "1.2.0", 571 | "state": { 572 | "_model_module": "@jupyter-widgets/base", 573 | "_model_module_version": "1.2.0", 574 | "_model_name": "LayoutModel", 575 | "_view_count": null, 576 | "_view_module": "@jupyter-widgets/base", 577 | "_view_module_version": "1.2.0", 578 | "_view_name": "LayoutView", 579 | "align_content": null, 580 | "align_items": null, 581 | "align_self": null, 582 | "border": null, 583 | "bottom": null, 584 | "display": null, 585 | "flex": null, 586 | "flex_flow": null, 587 | "grid_area": null, 588 | "grid_auto_columns": null, 589 | "grid_auto_flow": null, 590 | "grid_auto_rows": null, 591 | "grid_column": null, 592 | "grid_gap": null, 593 | "grid_row": null, 594 | "grid_template_areas": null, 595 | "grid_template_columns": null, 596 | "grid_template_rows": null, 597 | "height": null, 598 | "justify_content": null, 599 | "justify_items": null, 600 | "left": null, 601 | "margin": null, 602 | "max_height": null, 603 | "max_width": null, 604 | "min_height": null, 605 | "min_width": null, 606 | "object_fit": null, 607 | "object_position": null, 608 | "order": null, 609 | "overflow": null, 610 | "overflow_x": null, 611 | "overflow_y": null, 612 | "padding": null, 613 | "right": null, 614 | "top": null, 615 | "visibility": null, 616 | "width": null 617 | } 618 | }, 619 | "3869a56e41df4c418d04ed4f623ba50c": { 620 | "model_module": "@jupyter-widgets/controls", 621 | "model_name": "ProgressStyleModel", 622 | "model_module_version": "1.5.0", 623 | "state": { 624 | "_model_module": "@jupyter-widgets/controls", 625 | "_model_module_version": "1.5.0", 626 | "_model_name": "ProgressStyleModel", 
627 | "_view_count": null, 628 | "_view_module": "@jupyter-widgets/base", 629 | "_view_module_version": "1.2.0", 630 | "_view_name": "StyleView", 631 | "bar_color": null, 632 | "description_width": "" 633 | } 634 | }, 635 | "3bdc109f2eef4108b0f01eb2d1b641e9": { 636 | "model_module": "@jupyter-widgets/base", 637 | "model_name": "LayoutModel", 638 | "model_module_version": "1.2.0", 639 | "state": { 640 | "_model_module": "@jupyter-widgets/base", 641 | "_model_module_version": "1.2.0", 642 | "_model_name": "LayoutModel", 643 | "_view_count": null, 644 | "_view_module": "@jupyter-widgets/base", 645 | "_view_module_version": "1.2.0", 646 | "_view_name": "LayoutView", 647 | "align_content": null, 648 | "align_items": null, 649 | "align_self": null, 650 | "border": null, 651 | "bottom": null, 652 | "display": null, 653 | "flex": null, 654 | "flex_flow": null, 655 | "grid_area": null, 656 | "grid_auto_columns": null, 657 | "grid_auto_flow": null, 658 | "grid_auto_rows": null, 659 | "grid_column": null, 660 | "grid_gap": null, 661 | "grid_row": null, 662 | "grid_template_areas": null, 663 | "grid_template_columns": null, 664 | "grid_template_rows": null, 665 | "height": null, 666 | "justify_content": null, 667 | "justify_items": null, 668 | "left": null, 669 | "margin": null, 670 | "max_height": null, 671 | "max_width": null, 672 | "min_height": null, 673 | "min_width": null, 674 | "object_fit": null, 675 | "object_position": null, 676 | "order": null, 677 | "overflow": null, 678 | "overflow_x": null, 679 | "overflow_y": null, 680 | "padding": null, 681 | "right": null, 682 | "top": null, 683 | "visibility": null, 684 | "width": null 685 | } 686 | }, 687 | "cb92e9d3f2d4438a87d9ca41b1e461a8": { 688 | "model_module": "@jupyter-widgets/controls", 689 | "model_name": "DescriptionStyleModel", 690 | "model_module_version": "1.5.0", 691 | "state": { 692 | "_model_module": "@jupyter-widgets/controls", 693 | "_model_module_version": "1.5.0", 694 | "_model_name": 
"DescriptionStyleModel", 695 | "_view_count": null, 696 | "_view_module": "@jupyter-widgets/base", 697 | "_view_module_version": "1.2.0", 698 | "_view_name": "StyleView", 699 | "description_width": "" 700 | } 701 | }, 702 | "b16ee4cb317c4148a51659b04a87aa9a": { 703 | "model_module": "@jupyter-widgets/controls", 704 | "model_name": "HBoxModel", 705 | "model_module_version": "1.5.0", 706 | "state": { 707 | "_dom_classes": [], 708 | "_model_module": "@jupyter-widgets/controls", 709 | "_model_module_version": "1.5.0", 710 | "_model_name": "HBoxModel", 711 | "_view_count": null, 712 | "_view_module": "@jupyter-widgets/controls", 713 | "_view_module_version": "1.5.0", 714 | "_view_name": "HBoxView", 715 | "box_style": "", 716 | "children": [ 717 | "IPY_MODEL_6e1648b5d8594515b03da651c375a26a", 718 | "IPY_MODEL_07a81aa9807840bbab84804d45740c05", 719 | "IPY_MODEL_e7f39db62bd242a087d42947b504248f" 720 | ], 721 | "layout": "IPY_MODEL_1f684d285e65464fa702c9881b5143bf" 722 | } 723 | }, 724 | "6e1648b5d8594515b03da651c375a26a": { 725 | "model_module": "@jupyter-widgets/controls", 726 | "model_name": "HTMLModel", 727 | "model_module_version": "1.5.0", 728 | "state": { 729 | "_dom_classes": [], 730 | "_model_module": "@jupyter-widgets/controls", 731 | "_model_module_version": "1.5.0", 732 | "_model_name": "HTMLModel", 733 | "_view_count": null, 734 | "_view_module": "@jupyter-widgets/controls", 735 | "_view_module_version": "1.5.0", 736 | "_view_name": "HTMLView", 737 | "description": "", 738 | "description_tooltip": null, 739 | "layout": "IPY_MODEL_38739a346bbf497894bcbe7490aba6b9", 740 | "placeholder": "​", 741 | "style": "IPY_MODEL_c8b80b5f957b4e9cb73afe4362f83f93", 742 | "value": "Loading first 40k docs from NeuCLIR Chinese Collection: 100%" 743 | } 744 | }, 745 | "07a81aa9807840bbab84804d45740c05": { 746 | "model_module": "@jupyter-widgets/controls", 747 | "model_name": "FloatProgressModel", 748 | "model_module_version": "1.5.0", 749 | "state": { 750 | "_dom_classes": [], 751 
| "_model_module": "@jupyter-widgets/controls", 752 | "_model_module_version": "1.5.0", 753 | "_model_name": "FloatProgressModel", 754 | "_view_count": null, 755 | "_view_module": "@jupyter-widgets/controls", 756 | "_view_module_version": "1.5.0", 757 | "_view_name": "ProgressView", 758 | "bar_style": "success", 759 | "description": "", 760 | "description_tooltip": null, 761 | "layout": "IPY_MODEL_fd8f17f50ac94bd8bcc6d6314d1a4379", 762 | "max": 40000, 763 | "min": 0, 764 | "orientation": "horizontal", 765 | "style": "IPY_MODEL_4345f1024bec4be4b245310f3f223ddc", 766 | "value": 40000 767 | } 768 | }, 769 | "e7f39db62bd242a087d42947b504248f": { 770 | "model_module": "@jupyter-widgets/controls", 771 | "model_name": "HTMLModel", 772 | "model_module_version": "1.5.0", 773 | "state": { 774 | "_dom_classes": [], 775 | "_model_module": "@jupyter-widgets/controls", 776 | "_model_module_version": "1.5.0", 777 | "_model_name": "HTMLModel", 778 | "_view_count": null, 779 | "_view_module": "@jupyter-widgets/controls", 780 | "_view_module_version": "1.5.0", 781 | "_view_name": "HTMLView", 782 | "description": "", 783 | "description_tooltip": null, 784 | "layout": "IPY_MODEL_aec3d14f998a4766887c6f5abd9baed3", 785 | "placeholder": "​", 786 | "style": "IPY_MODEL_9e21e3cb60ee4f32ab957eb86d8d400c", 787 | "value": " 40000/40000 [00:19<00:00, 3542.89it/s]" 788 | } 789 | }, 790 | "1f684d285e65464fa702c9881b5143bf": { 791 | "model_module": "@jupyter-widgets/base", 792 | "model_name": "LayoutModel", 793 | "model_module_version": "1.2.0", 794 | "state": { 795 | "_model_module": "@jupyter-widgets/base", 796 | "_model_module_version": "1.2.0", 797 | "_model_name": "LayoutModel", 798 | "_view_count": null, 799 | "_view_module": "@jupyter-widgets/base", 800 | "_view_module_version": "1.2.0", 801 | "_view_name": "LayoutView", 802 | "align_content": null, 803 | "align_items": null, 804 | "align_self": null, 805 | "border": null, 806 | "bottom": null, 807 | "display": null, 808 | "flex": null, 809 
| "flex_flow": null, 810 | "grid_area": null, 811 | "grid_auto_columns": null, 812 | "grid_auto_flow": null, 813 | "grid_auto_rows": null, 814 | "grid_column": null, 815 | "grid_gap": null, 816 | "grid_row": null, 817 | "grid_template_areas": null, 818 | "grid_template_columns": null, 819 | "grid_template_rows": null, 820 | "height": null, 821 | "justify_content": null, 822 | "justify_items": null, 823 | "left": null, 824 | "margin": null, 825 | "max_height": null, 826 | "max_width": null, 827 | "min_height": null, 828 | "min_width": null, 829 | "object_fit": null, 830 | "object_position": null, 831 | "order": null, 832 | "overflow": null, 833 | "overflow_x": null, 834 | "overflow_y": null, 835 | "padding": null, 836 | "right": null, 837 | "top": null, 838 | "visibility": null, 839 | "width": null 840 | } 841 | }, 842 | "38739a346bbf497894bcbe7490aba6b9": { 843 | "model_module": "@jupyter-widgets/base", 844 | "model_name": "LayoutModel", 845 | "model_module_version": "1.2.0", 846 | "state": { 847 | "_model_module": "@jupyter-widgets/base", 848 | "_model_module_version": "1.2.0", 849 | "_model_name": "LayoutModel", 850 | "_view_count": null, 851 | "_view_module": "@jupyter-widgets/base", 852 | "_view_module_version": "1.2.0", 853 | "_view_name": "LayoutView", 854 | "align_content": null, 855 | "align_items": null, 856 | "align_self": null, 857 | "border": null, 858 | "bottom": null, 859 | "display": null, 860 | "flex": null, 861 | "flex_flow": null, 862 | "grid_area": null, 863 | "grid_auto_columns": null, 864 | "grid_auto_flow": null, 865 | "grid_auto_rows": null, 866 | "grid_column": null, 867 | "grid_gap": null, 868 | "grid_row": null, 869 | "grid_template_areas": null, 870 | "grid_template_columns": null, 871 | "grid_template_rows": null, 872 | "height": null, 873 | "justify_content": null, 874 | "justify_items": null, 875 | "left": null, 876 | "margin": null, 877 | "max_height": null, 878 | "max_width": null, 879 | "min_height": null, 880 | "min_width": null, 
881 | "object_fit": null, 882 | "object_position": null, 883 | "order": null, 884 | "overflow": null, 885 | "overflow_x": null, 886 | "overflow_y": null, 887 | "padding": null, 888 | "right": null, 889 | "top": null, 890 | "visibility": null, 891 | "width": null 892 | } 893 | }, 894 | "c8b80b5f957b4e9cb73afe4362f83f93": { 895 | "model_module": "@jupyter-widgets/controls", 896 | "model_name": "DescriptionStyleModel", 897 | "model_module_version": "1.5.0", 898 | "state": { 899 | "_model_module": "@jupyter-widgets/controls", 900 | "_model_module_version": "1.5.0", 901 | "_model_name": "DescriptionStyleModel", 902 | "_view_count": null, 903 | "_view_module": "@jupyter-widgets/base", 904 | "_view_module_version": "1.2.0", 905 | "_view_name": "StyleView", 906 | "description_width": "" 907 | } 908 | }, 909 | "fd8f17f50ac94bd8bcc6d6314d1a4379": { 910 | "model_module": "@jupyter-widgets/base", 911 | "model_name": "LayoutModel", 912 | "model_module_version": "1.2.0", 913 | "state": { 914 | "_model_module": "@jupyter-widgets/base", 915 | "_model_module_version": "1.2.0", 916 | "_model_name": "LayoutModel", 917 | "_view_count": null, 918 | "_view_module": "@jupyter-widgets/base", 919 | "_view_module_version": "1.2.0", 920 | "_view_name": "LayoutView", 921 | "align_content": null, 922 | "align_items": null, 923 | "align_self": null, 924 | "border": null, 925 | "bottom": null, 926 | "display": null, 927 | "flex": null, 928 | "flex_flow": null, 929 | "grid_area": null, 930 | "grid_auto_columns": null, 931 | "grid_auto_flow": null, 932 | "grid_auto_rows": null, 933 | "grid_column": null, 934 | "grid_gap": null, 935 | "grid_row": null, 936 | "grid_template_areas": null, 937 | "grid_template_columns": null, 938 | "grid_template_rows": null, 939 | "height": null, 940 | "justify_content": null, 941 | "justify_items": null, 942 | "left": null, 943 | "margin": null, 944 | "max_height": null, 945 | "max_width": null, 946 | "min_height": null, 947 | "min_width": null, 948 | "object_fit": 
null, 949 | "object_position": null, 950 | "order": null, 951 | "overflow": null, 952 | "overflow_x": null, 953 | "overflow_y": null, 954 | "padding": null, 955 | "right": null, 956 | "top": null, 957 | "visibility": null, 958 | "width": null 959 | } 960 | }, 961 | "4345f1024bec4be4b245310f3f223ddc": { 962 | "model_module": "@jupyter-widgets/controls", 963 | "model_name": "ProgressStyleModel", 964 | "model_module_version": "1.5.0", 965 | "state": { 966 | "_model_module": "@jupyter-widgets/controls", 967 | "_model_module_version": "1.5.0", 968 | "_model_name": "ProgressStyleModel", 969 | "_view_count": null, 970 | "_view_module": "@jupyter-widgets/base", 971 | "_view_module_version": "1.2.0", 972 | "_view_name": "StyleView", 973 | "bar_color": null, 974 | "description_width": "" 975 | } 976 | }, 977 | "aec3d14f998a4766887c6f5abd9baed3": { 978 | "model_module": "@jupyter-widgets/base", 979 | "model_name": "LayoutModel", 980 | "model_module_version": "1.2.0", 981 | "state": { 982 | "_model_module": "@jupyter-widgets/base", 983 | "_model_module_version": "1.2.0", 984 | "_model_name": "LayoutModel", 985 | "_view_count": null, 986 | "_view_module": "@jupyter-widgets/base", 987 | "_view_module_version": "1.2.0", 988 | "_view_name": "LayoutView", 989 | "align_content": null, 990 | "align_items": null, 991 | "align_self": null, 992 | "border": null, 993 | "bottom": null, 994 | "display": null, 995 | "flex": null, 996 | "flex_flow": null, 997 | "grid_area": null, 998 | "grid_auto_columns": null, 999 | "grid_auto_flow": null, 1000 | "grid_auto_rows": null, 1001 | "grid_column": null, 1002 | "grid_gap": null, 1003 | "grid_row": null, 1004 | "grid_template_areas": null, 1005 | "grid_template_columns": null, 1006 | "grid_template_rows": null, 1007 | "height": null, 1008 | "justify_content": null, 1009 | "justify_items": null, 1010 | "left": null, 1011 | "margin": null, 1012 | "max_height": null, 1013 | "max_width": null, 1014 | "min_height": null, 1015 | "min_width": null, 1016 
| "object_fit": null, 1017 | "object_position": null, 1018 | "order": null, 1019 | "overflow": null, 1020 | "overflow_x": null, 1021 | "overflow_y": null, 1022 | "padding": null, 1023 | "right": null, 1024 | "top": null, 1025 | "visibility": null, 1026 | "width": null 1027 | } 1028 | }, 1029 | "9e21e3cb60ee4f32ab957eb86d8d400c": { 1030 | "model_module": "@jupyter-widgets/controls", 1031 | "model_name": "DescriptionStyleModel", 1032 | "model_module_version": "1.5.0", 1033 | "state": { 1034 | "_model_module": "@jupyter-widgets/controls", 1035 | "_model_module_version": "1.5.0", 1036 | "_model_name": "DescriptionStyleModel", 1037 | "_view_count": null, 1038 | "_view_module": "@jupyter-widgets/base", 1039 | "_view_module_version": "1.2.0", 1040 | "_view_name": "StyleView", 1041 | "description_width": "" 1042 | } 1043 | }, 1044 | "9488edc4e4b44e66b5ca897b28510918": { 1045 | "model_module": "@jupyter-widgets/controls", 1046 | "model_name": "HBoxModel", 1047 | "model_module_version": "1.5.0", 1048 | "state": { 1049 | "_dom_classes": [], 1050 | "_model_module": "@jupyter-widgets/controls", 1051 | "_model_module_version": "1.5.0", 1052 | "_model_name": "HBoxModel", 1053 | "_view_count": null, 1054 | "_view_module": "@jupyter-widgets/controls", 1055 | "_view_module_version": "1.5.0", 1056 | "_view_name": "HBoxView", 1057 | "box_style": "", 1058 | "children": [ 1059 | "IPY_MODEL_7d653439a17744be8fa9d073f7e872a0", 1060 | "IPY_MODEL_ae886d7b6a9c4971ba79ea7827e07fc3", 1061 | "IPY_MODEL_bea8662ba12a4b3f906b48096c8ce877" 1062 | ], 1063 | "layout": "IPY_MODEL_f4d3a5abe24143c2b6d75296caccae5c" 1064 | } 1065 | }, 1066 | "7d653439a17744be8fa9d073f7e872a0": { 1067 | "model_module": "@jupyter-widgets/controls", 1068 | "model_name": "HTMLModel", 1069 | "model_module_version": "1.5.0", 1070 | "state": { 1071 | "_dom_classes": [], 1072 | "_model_module": "@jupyter-widgets/controls", 1073 | "_model_module_version": "1.5.0", 1074 | "_model_name": "HTMLModel", 1075 | "_view_count": null, 1076 
| "_view_module": "@jupyter-widgets/controls", 1077 | "_view_module_version": "1.5.0", 1078 | "_view_name": "HTMLView", 1079 | "description": "", 1080 | "description_tooltip": null, 1081 | "layout": "IPY_MODEL_f5a986f51865470881b2a643d9a83597", 1082 | "placeholder": "​", 1083 | "style": "IPY_MODEL_02ae8d0d38724d57a231734768837b96", 1084 | "value": "100%" 1085 | } 1086 | }, 1087 | "ae886d7b6a9c4971ba79ea7827e07fc3": { 1088 | "model_module": "@jupyter-widgets/controls", 1089 | "model_name": "FloatProgressModel", 1090 | "model_module_version": "1.5.0", 1091 | "state": { 1092 | "_dom_classes": [], 1093 | "_model_module": "@jupyter-widgets/controls", 1094 | "_model_module_version": "1.5.0", 1095 | "_model_name": "FloatProgressModel", 1096 | "_view_count": null, 1097 | "_view_module": "@jupyter-widgets/controls", 1098 | "_view_module_version": "1.5.0", 1099 | "_view_name": "ProgressView", 1100 | "bar_style": "success", 1101 | "description": "", 1102 | "description_tooltip": null, 1103 | "layout": "IPY_MODEL_33d9873506f9449db284457fb2f7a95c", 1104 | "max": 40000, 1105 | "min": 0, 1106 | "orientation": "horizontal", 1107 | "style": "IPY_MODEL_fffecdb96c8f4c51840b2a09d5dc91cf", 1108 | "value": 40000 1109 | } 1110 | }, 1111 | "bea8662ba12a4b3f906b48096c8ce877": { 1112 | "model_module": "@jupyter-widgets/controls", 1113 | "model_name": "HTMLModel", 1114 | "model_module_version": "1.5.0", 1115 | "state": { 1116 | "_dom_classes": [], 1117 | "_model_module": "@jupyter-widgets/controls", 1118 | "_model_module_version": "1.5.0", 1119 | "_model_name": "HTMLModel", 1120 | "_view_count": null, 1121 | "_view_module": "@jupyter-widgets/controls", 1122 | "_view_module_version": "1.5.0", 1123 | "_view_name": "HTMLView", 1124 | "description": "", 1125 | "description_tooltip": null, 1126 | "layout": "IPY_MODEL_5bf6c19517364eeb8043250f4cc65f6e", 1127 | "placeholder": "​", 1128 | "style": "IPY_MODEL_226d61b2375b47a088528088edbf38a2", 1129 | "value": " 40000/40000 [00:00<00:00, 48729.24it/s]" 
1130 | } 1131 | }, 1132 | "f4d3a5abe24143c2b6d75296caccae5c": { 1133 | "model_module": "@jupyter-widgets/base", 1134 | "model_name": "LayoutModel", 1135 | "model_module_version": "1.2.0", 1136 | "state": { 1137 | "_model_module": "@jupyter-widgets/base", 1138 | "_model_module_version": "1.2.0", 1139 | "_model_name": "LayoutModel", 1140 | "_view_count": null, 1141 | "_view_module": "@jupyter-widgets/base", 1142 | "_view_module_version": "1.2.0", 1143 | "_view_name": "LayoutView", 1144 | "align_content": null, 1145 | "align_items": null, 1146 | "align_self": null, 1147 | "border": null, 1148 | "bottom": null, 1149 | "display": null, 1150 | "flex": null, 1151 | "flex_flow": null, 1152 | "grid_area": null, 1153 | "grid_auto_columns": null, 1154 | "grid_auto_flow": null, 1155 | "grid_auto_rows": null, 1156 | "grid_column": null, 1157 | "grid_gap": null, 1158 | "grid_row": null, 1159 | "grid_template_areas": null, 1160 | "grid_template_columns": null, 1161 | "grid_template_rows": null, 1162 | "height": null, 1163 | "justify_content": null, 1164 | "justify_items": null, 1165 | "left": null, 1166 | "margin": null, 1167 | "max_height": null, 1168 | "max_width": null, 1169 | "min_height": null, 1170 | "min_width": null, 1171 | "object_fit": null, 1172 | "object_position": null, 1173 | "order": null, 1174 | "overflow": null, 1175 | "overflow_x": null, 1176 | "overflow_y": null, 1177 | "padding": null, 1178 | "right": null, 1179 | "top": null, 1180 | "visibility": null, 1181 | "width": null 1182 | } 1183 | }, 1184 | "f5a986f51865470881b2a643d9a83597": { 1185 | "model_module": "@jupyter-widgets/base", 1186 | "model_name": "LayoutModel", 1187 | "model_module_version": "1.2.0", 1188 | "state": { 1189 | "_model_module": "@jupyter-widgets/base", 1190 | "_model_module_version": "1.2.0", 1191 | "_model_name": "LayoutModel", 1192 | "_view_count": null, 1193 | "_view_module": "@jupyter-widgets/base", 1194 | "_view_module_version": "1.2.0", 1195 | "_view_name": "LayoutView", 1196 | 
"align_content": null, 1197 | "align_items": null, 1198 | "align_self": null, 1199 | "border": null, 1200 | "bottom": null, 1201 | "display": null, 1202 | "flex": null, 1203 | "flex_flow": null, 1204 | "grid_area": null, 1205 | "grid_auto_columns": null, 1206 | "grid_auto_flow": null, 1207 | "grid_auto_rows": null, 1208 | "grid_column": null, 1209 | "grid_gap": null, 1210 | "grid_row": null, 1211 | "grid_template_areas": null, 1212 | "grid_template_columns": null, 1213 | "grid_template_rows": null, 1214 | "height": null, 1215 | "justify_content": null, 1216 | "justify_items": null, 1217 | "left": null, 1218 | "margin": null, 1219 | "max_height": null, 1220 | "max_width": null, 1221 | "min_height": null, 1222 | "min_width": null, 1223 | "object_fit": null, 1224 | "object_position": null, 1225 | "order": null, 1226 | "overflow": null, 1227 | "overflow_x": null, 1228 | "overflow_y": null, 1229 | "padding": null, 1230 | "right": null, 1231 | "top": null, 1232 | "visibility": null, 1233 | "width": null 1234 | } 1235 | }, 1236 | "02ae8d0d38724d57a231734768837b96": { 1237 | "model_module": "@jupyter-widgets/controls", 1238 | "model_name": "DescriptionStyleModel", 1239 | "model_module_version": "1.5.0", 1240 | "state": { 1241 | "_model_module": "@jupyter-widgets/controls", 1242 | "_model_module_version": "1.5.0", 1243 | "_model_name": "DescriptionStyleModel", 1244 | "_view_count": null, 1245 | "_view_module": "@jupyter-widgets/base", 1246 | "_view_module_version": "1.2.0", 1247 | "_view_name": "StyleView", 1248 | "description_width": "" 1249 | } 1250 | }, 1251 | "33d9873506f9449db284457fb2f7a95c": { 1252 | "model_module": "@jupyter-widgets/base", 1253 | "model_name": "LayoutModel", 1254 | "model_module_version": "1.2.0", 1255 | "state": { 1256 | "_model_module": "@jupyter-widgets/base", 1257 | "_model_module_version": "1.2.0", 1258 | "_model_name": "LayoutModel", 1259 | "_view_count": null, 1260 | "_view_module": "@jupyter-widgets/base", 1261 | "_view_module_version": 
"1.2.0", 1262 | "_view_name": "LayoutView", 1263 | "align_content": null, 1264 | "align_items": null, 1265 | "align_self": null, 1266 | "border": null, 1267 | "bottom": null, 1268 | "display": null, 1269 | "flex": null, 1270 | "flex_flow": null, 1271 | "grid_area": null, 1272 | "grid_auto_columns": null, 1273 | "grid_auto_flow": null, 1274 | "grid_auto_rows": null, 1275 | "grid_column": null, 1276 | "grid_gap": null, 1277 | "grid_row": null, 1278 | "grid_template_areas": null, 1279 | "grid_template_columns": null, 1280 | "grid_template_rows": null, 1281 | "height": null, 1282 | "justify_content": null, 1283 | "justify_items": null, 1284 | "left": null, 1285 | "margin": null, 1286 | "max_height": null, 1287 | "max_width": null, 1288 | "min_height": null, 1289 | "min_width": null, 1290 | "object_fit": null, 1291 | "object_position": null, 1292 | "order": null, 1293 | "overflow": null, 1294 | "overflow_x": null, 1295 | "overflow_y": null, 1296 | "padding": null, 1297 | "right": null, 1298 | "top": null, 1299 | "visibility": null, 1300 | "width": null 1301 | } 1302 | }, 1303 | "fffecdb96c8f4c51840b2a09d5dc91cf": { 1304 | "model_module": "@jupyter-widgets/controls", 1305 | "model_name": "ProgressStyleModel", 1306 | "model_module_version": "1.5.0", 1307 | "state": { 1308 | "_model_module": "@jupyter-widgets/controls", 1309 | "_model_module_version": "1.5.0", 1310 | "_model_name": "ProgressStyleModel", 1311 | "_view_count": null, 1312 | "_view_module": "@jupyter-widgets/base", 1313 | "_view_module_version": "1.2.0", 1314 | "_view_name": "StyleView", 1315 | "bar_color": null, 1316 | "description_width": "" 1317 | } 1318 | }, 1319 | "5bf6c19517364eeb8043250f4cc65f6e": { 1320 | "model_module": "@jupyter-widgets/base", 1321 | "model_name": "LayoutModel", 1322 | "model_module_version": "1.2.0", 1323 | "state": { 1324 | "_model_module": "@jupyter-widgets/base", 1325 | "_model_module_version": "1.2.0", 1326 | "_model_name": "LayoutModel", 1327 | "_view_count": null, 1328 | 
"_view_module": "@jupyter-widgets/base", 1329 | "_view_module_version": "1.2.0", 1330 | "_view_name": "LayoutView", 1331 | "align_content": null, 1332 | "align_items": null, 1333 | "align_self": null, 1334 | "border": null, 1335 | "bottom": null, 1336 | "display": null, 1337 | "flex": null, 1338 | "flex_flow": null, 1339 | "grid_area": null, 1340 | "grid_auto_columns": null, 1341 | "grid_auto_flow": null, 1342 | "grid_auto_rows": null, 1343 | "grid_column": null, 1344 | "grid_gap": null, 1345 | "grid_row": null, 1346 | "grid_template_areas": null, 1347 | "grid_template_columns": null, 1348 | "grid_template_rows": null, 1349 | "height": null, 1350 | "justify_content": null, 1351 | "justify_items": null, 1352 | "left": null, 1353 | "margin": null, 1354 | "max_height": null, 1355 | "max_width": null, 1356 | "min_height": null, 1357 | "min_width": null, 1358 | "object_fit": null, 1359 | "object_position": null, 1360 | "order": null, 1361 | "overflow": null, 1362 | "overflow_x": null, 1363 | "overflow_y": null, 1364 | "padding": null, 1365 | "right": null, 1366 | "top": null, 1367 | "visibility": null, 1368 | "width": null 1369 | } 1370 | }, 1371 | "226d61b2375b47a088528088edbf38a2": { 1372 | "model_module": "@jupyter-widgets/controls", 1373 | "model_name": "DescriptionStyleModel", 1374 | "model_module_version": "1.5.0", 1375 | "state": { 1376 | "_model_module": "@jupyter-widgets/controls", 1377 | "_model_module_version": "1.5.0", 1378 | "_model_name": "DescriptionStyleModel", 1379 | "_view_count": null, 1380 | "_view_module": "@jupyter-widgets/base", 1381 | "_view_module_version": "1.2.0", 1382 | "_view_name": "StyleView", 1383 | "description_width": "" 1384 | } 1385 | }, 1386 | "d0cd311b96414b28b4dfc8ff2ae0c6cf": { 1387 | "model_module": "@jupyter-widgets/controls", 1388 | "model_name": "HBoxModel", 1389 | "model_module_version": "1.5.0", 1390 | "state": { 1391 | "_dom_classes": [], 1392 | "_model_module": "@jupyter-widgets/controls", 1393 | "_model_module_version": 
"1.5.0", 1394 | "_model_name": "HBoxModel", 1395 | "_view_count": null, 1396 | "_view_module": "@jupyter-widgets/controls", 1397 | "_view_module_version": "1.5.0", 1398 | "_view_name": "HBoxView", 1399 | "box_style": "", 1400 | "children": [ 1401 | "IPY_MODEL_04de8e99846742de9e2703356fbc8bd1", 1402 | "IPY_MODEL_27d8ebc1a65a4f1a87eab934a376e708", 1403 | "IPY_MODEL_b7cc467aa2f74f3a9b684b0c47f2cf8b" 1404 | ], 1405 | "layout": "IPY_MODEL_b36c54541c554304bae1e2074748c198" 1406 | } 1407 | }, 1408 | "04de8e99846742de9e2703356fbc8bd1": { 1409 | "model_module": "@jupyter-widgets/controls", 1410 | "model_name": "HTMLModel", 1411 | "model_module_version": "1.5.0", 1412 | "state": { 1413 | "_dom_classes": [], 1414 | "_model_module": "@jupyter-widgets/controls", 1415 | "_model_module_version": "1.5.0", 1416 | "_model_name": "HTMLModel", 1417 | "_view_count": null, 1418 | "_view_module": "@jupyter-widgets/controls", 1419 | "_view_module_version": "1.5.0", 1420 | "_view_name": "HTMLView", 1421 | "description": "", 1422 | "description_tooltip": null, 1423 | "layout": "IPY_MODEL_c1bb714605a047c7b0f5b13ae0caf9c0", 1424 | "placeholder": "​", 1425 | "style": "IPY_MODEL_c124ef90df8c4ec5854617c3ba2242ed", 1426 | "value": "100%" 1427 | } 1428 | }, 1429 | "27d8ebc1a65a4f1a87eab934a376e708": { 1430 | "model_module": "@jupyter-widgets/controls", 1431 | "model_name": "FloatProgressModel", 1432 | "model_module_version": "1.5.0", 1433 | "state": { 1434 | "_dom_classes": [], 1435 | "_model_module": "@jupyter-widgets/controls", 1436 | "_model_module_version": "1.5.0", 1437 | "_model_name": "FloatProgressModel", 1438 | "_view_count": null, 1439 | "_view_module": "@jupyter-widgets/controls", 1440 | "_view_module_version": "1.5.0", 1441 | "_view_name": "ProgressView", 1442 | "bar_style": "success", 1443 | "description": "", 1444 | "description_tooltip": null, 1445 | "layout": "IPY_MODEL_e925844235344329a1a222fce30891cb", 1446 | "max": 1250, 1447 | "min": 0, 1448 | "orientation": "horizontal", 1449 
| "style": "IPY_MODEL_d5bc297f47b2498ea36e19794309e949", 1450 | "value": 1250 1451 | } 1452 | }, 1453 | "b7cc467aa2f74f3a9b684b0c47f2cf8b": { 1454 | "model_module": "@jupyter-widgets/controls", 1455 | "model_name": "HTMLModel", 1456 | "model_module_version": "1.5.0", 1457 | "state": { 1458 | "_dom_classes": [], 1459 | "_model_module": "@jupyter-widgets/controls", 1460 | "_model_module_version": "1.5.0", 1461 | "_model_name": "HTMLModel", 1462 | "_view_count": null, 1463 | "_view_module": "@jupyter-widgets/controls", 1464 | "_view_module_version": "1.5.0", 1465 | "_view_name": "HTMLView", 1466 | "description": "", 1467 | "description_tooltip": null, 1468 | "layout": "IPY_MODEL_b65befe0653444f9866de8202a8eff30", 1469 | "placeholder": "​", 1470 | "style": "IPY_MODEL_553a9802dc21410e85a406c23a14c422", 1471 | "value": " 1250/1250 [16:13<00:00, 1.26it/s]" 1472 | } 1473 | }, 1474 | "b36c54541c554304bae1e2074748c198": { 1475 | "model_module": "@jupyter-widgets/base", 1476 | "model_name": "LayoutModel", 1477 | "model_module_version": "1.2.0", 1478 | "state": { 1479 | "_model_module": "@jupyter-widgets/base", 1480 | "_model_module_version": "1.2.0", 1481 | "_model_name": "LayoutModel", 1482 | "_view_count": null, 1483 | "_view_module": "@jupyter-widgets/base", 1484 | "_view_module_version": "1.2.0", 1485 | "_view_name": "LayoutView", 1486 | "align_content": null, 1487 | "align_items": null, 1488 | "align_self": null, 1489 | "border": null, 1490 | "bottom": null, 1491 | "display": null, 1492 | "flex": null, 1493 | "flex_flow": null, 1494 | "grid_area": null, 1495 | "grid_auto_columns": null, 1496 | "grid_auto_flow": null, 1497 | "grid_auto_rows": null, 1498 | "grid_column": null, 1499 | "grid_gap": null, 1500 | "grid_row": null, 1501 | "grid_template_areas": null, 1502 | "grid_template_columns": null, 1503 | "grid_template_rows": null, 1504 | "height": null, 1505 | "justify_content": null, 1506 | "justify_items": null, 1507 | "left": null, 1508 | "margin": null, 1509 | 
"max_height": null, 1510 | "max_width": null, 1511 | "min_height": null, 1512 | "min_width": null, 1513 | "object_fit": null, 1514 | "object_position": null, 1515 | "order": null, 1516 | "overflow": null, 1517 | "overflow_x": null, 1518 | "overflow_y": null, 1519 | "padding": null, 1520 | "right": null, 1521 | "top": null, 1522 | "visibility": null, 1523 | "width": null 1524 | } 1525 | }, 1526 | "c1bb714605a047c7b0f5b13ae0caf9c0": { 1527 | "model_module": "@jupyter-widgets/base", 1528 | "model_name": "LayoutModel", 1529 | "model_module_version": "1.2.0", 1530 | "state": { 1531 | "_model_module": "@jupyter-widgets/base", 1532 | "_model_module_version": "1.2.0", 1533 | "_model_name": "LayoutModel", 1534 | "_view_count": null, 1535 | "_view_module": "@jupyter-widgets/base", 1536 | "_view_module_version": "1.2.0", 1537 | "_view_name": "LayoutView", 1538 | "align_content": null, 1539 | "align_items": null, 1540 | "align_self": null, 1541 | "border": null, 1542 | "bottom": null, 1543 | "display": null, 1544 | "flex": null, 1545 | "flex_flow": null, 1546 | "grid_area": null, 1547 | "grid_auto_columns": null, 1548 | "grid_auto_flow": null, 1549 | "grid_auto_rows": null, 1550 | "grid_column": null, 1551 | "grid_gap": null, 1552 | "grid_row": null, 1553 | "grid_template_areas": null, 1554 | "grid_template_columns": null, 1555 | "grid_template_rows": null, 1556 | "height": null, 1557 | "justify_content": null, 1558 | "justify_items": null, 1559 | "left": null, 1560 | "margin": null, 1561 | "max_height": null, 1562 | "max_width": null, 1563 | "min_height": null, 1564 | "min_width": null, 1565 | "object_fit": null, 1566 | "object_position": null, 1567 | "order": null, 1568 | "overflow": null, 1569 | "overflow_x": null, 1570 | "overflow_y": null, 1571 | "padding": null, 1572 | "right": null, 1573 | "top": null, 1574 | "visibility": null, 1575 | "width": null 1576 | } 1577 | }, 1578 | "c124ef90df8c4ec5854617c3ba2242ed": { 1579 | "model_module": "@jupyter-widgets/controls", 1580 | 
"model_name": "DescriptionStyleModel", 1581 | "model_module_version": "1.5.0", 1582 | "state": { 1583 | "_model_module": "@jupyter-widgets/controls", 1584 | "_model_module_version": "1.5.0", 1585 | "_model_name": "DescriptionStyleModel", 1586 | "_view_count": null, 1587 | "_view_module": "@jupyter-widgets/base", 1588 | "_view_module_version": "1.2.0", 1589 | "_view_name": "StyleView", 1590 | "description_width": "" 1591 | } 1592 | }, 1593 | "e925844235344329a1a222fce30891cb": { 1594 | "model_module": "@jupyter-widgets/base", 1595 | "model_name": "LayoutModel", 1596 | "model_module_version": "1.2.0", 1597 | "state": { 1598 | "_model_module": "@jupyter-widgets/base", 1599 | "_model_module_version": "1.2.0", 1600 | "_model_name": "LayoutModel", 1601 | "_view_count": null, 1602 | "_view_module": "@jupyter-widgets/base", 1603 | "_view_module_version": "1.2.0", 1604 | "_view_name": "LayoutView", 1605 | "align_content": null, 1606 | "align_items": null, 1607 | "align_self": null, 1608 | "border": null, 1609 | "bottom": null, 1610 | "display": null, 1611 | "flex": null, 1612 | "flex_flow": null, 1613 | "grid_area": null, 1614 | "grid_auto_columns": null, 1615 | "grid_auto_flow": null, 1616 | "grid_auto_rows": null, 1617 | "grid_column": null, 1618 | "grid_gap": null, 1619 | "grid_row": null, 1620 | "grid_template_areas": null, 1621 | "grid_template_columns": null, 1622 | "grid_template_rows": null, 1623 | "height": null, 1624 | "justify_content": null, 1625 | "justify_items": null, 1626 | "left": null, 1627 | "margin": null, 1628 | "max_height": null, 1629 | "max_width": null, 1630 | "min_height": null, 1631 | "min_width": null, 1632 | "object_fit": null, 1633 | "object_position": null, 1634 | "order": null, 1635 | "overflow": null, 1636 | "overflow_x": null, 1637 | "overflow_y": null, 1638 | "padding": null, 1639 | "right": null, 1640 | "top": null, 1641 | "visibility": null, 1642 | "width": null 1643 | } 1644 | }, 1645 | "d5bc297f47b2498ea36e19794309e949": { 1646 | 
"model_module": "@jupyter-widgets/controls", 1647 | "model_name": "ProgressStyleModel", 1648 | "model_module_version": "1.5.0", 1649 | "state": { 1650 | "_model_module": "@jupyter-widgets/controls", 1651 | "_model_module_version": "1.5.0", 1652 | "_model_name": "ProgressStyleModel", 1653 | "_view_count": null, 1654 | "_view_module": "@jupyter-widgets/base", 1655 | "_view_module_version": "1.2.0", 1656 | "_view_name": "StyleView", 1657 | "bar_color": null, 1658 | "description_width": "" 1659 | } 1660 | }, 1661 | "b65befe0653444f9866de8202a8eff30": { 1662 | "model_module": "@jupyter-widgets/base", 1663 | "model_name": "LayoutModel", 1664 | "model_module_version": "1.2.0", 1665 | "state": { 1666 | "_model_module": "@jupyter-widgets/base", 1667 | "_model_module_version": "1.2.0", 1668 | "_model_name": "LayoutModel", 1669 | "_view_count": null, 1670 | "_view_module": "@jupyter-widgets/base", 1671 | "_view_module_version": "1.2.0", 1672 | "_view_name": "LayoutView", 1673 | "align_content": null, 1674 | "align_items": null, 1675 | "align_self": null, 1676 | "border": null, 1677 | "bottom": null, 1678 | "display": null, 1679 | "flex": null, 1680 | "flex_flow": null, 1681 | "grid_area": null, 1682 | "grid_auto_columns": null, 1683 | "grid_auto_flow": null, 1684 | "grid_auto_rows": null, 1685 | "grid_column": null, 1686 | "grid_gap": null, 1687 | "grid_row": null, 1688 | "grid_template_areas": null, 1689 | "grid_template_columns": null, 1690 | "grid_template_rows": null, 1691 | "height": null, 1692 | "justify_content": null, 1693 | "justify_items": null, 1694 | "left": null, 1695 | "margin": null, 1696 | "max_height": null, 1697 | "max_width": null, 1698 | "min_height": null, 1699 | "min_width": null, 1700 | "object_fit": null, 1701 | "object_position": null, 1702 | "order": null, 1703 | "overflow": null, 1704 | "overflow_x": null, 1705 | "overflow_y": null, 1706 | "padding": null, 1707 | "right": null, 1708 | "top": null, 1709 | "visibility": null, 1710 | "width": null 1711 
| } 1712 | }, 1713 | "553a9802dc21410e85a406c23a14c422": { 1714 | "model_module": "@jupyter-widgets/controls", 1715 | "model_name": "DescriptionStyleModel", 1716 | "model_module_version": "1.5.0", 1717 | "state": { 1718 | "_model_module": "@jupyter-widgets/controls", 1719 | "_model_module_version": "1.5.0", 1720 | "_model_name": "DescriptionStyleModel", 1721 | "_view_count": null, 1722 | "_view_module": "@jupyter-widgets/base", 1723 | "_view_module_version": "1.2.0", 1724 | "_view_name": "StyleView", 1725 | "description_width": "" 1726 | } 1727 | } 1728 | } 1729 | }, 1730 | "accelerator": "GPU" 1731 | }, 1732 | "cells": [ 1733 | { 1734 | "cell_type": "markdown", 1735 | "source": [ 1736 | "# BLADE model for CLIR\n", 1737 | "\n", 1738 | "In this notebook, we walk through a CLIR example that uses BLADE, a translate-trained bi-encoder model, to produce a ranked list on the NeuCLIR Chinese collection.\n", 1739 | "\n" 1740 | ], 1741 | "metadata": { 1742 | "id": "Gf3QADZx9uWO" 1743 | } 1744 | }, 1745 | { 1746 | "cell_type": "markdown", 1747 | "source": [ 1748 | "## Setup\n", 1749 | "This section replicates the setup steps from the official Anserini [notebook](https://github.com/castorini/anserini-notebooks/blob/master/anserini_robust04_demo.ipynb)." 1750 | ], 1751 | "metadata": { 1752 | "id": "LMpCvJI7-B86" 1753 | } 1754 | }, 1755 | { 1756 | "cell_type": "markdown", 1757 | "source": [ 1758 | "First, install Maven (Java 11 comes pre-installed):\n" 1759 | ], 1760 | "metadata": { 1761 | "id": "_KRfyjqi-N-M" 1762 | } 1763 | }, 1764 | { 1765 | "cell_type": "code", 1766 | "source": [ 1767 | "%%capture\n", 1768 | "!apt-get install maven -qq" 1769 | ], 1770 | "metadata": { 1771 | "id": "cKQK-XuY-Ohm" 1772 | }, 1773 | "execution_count": null, 1774 | "outputs": [] 1775 | }, 1776 | { 1777 | "cell_type": "markdown", 1778 | "source": [ 1779 | "Clone and build Anserini:" 1780 | ], 1781 | "metadata": { 1782 | "id": "YDR5nlMiBhm_" 1783 | } 1784 | }, 1785 | { 1786 | "cell_type": "code", 1787 | 
"source": [ 1788 | "%%capture\n", 1789 | "!git clone --recurse-submodules https://github.com/castorini/anserini.git\n", 1790 | "%cd anserini\n", 1791 | "!cd tools/eval && tar xvfz trec_eval.9.0.4.tar.gz && cd trec_eval.9.0.4 && make && cd ../../..\n", 1792 | "!mvn clean package appassembler:assemble -DskipTests -Dmaven.javadoc.skip=true\n", 1793 | "!cd .." 1794 | ], 1795 | "metadata": { 1796 | "id": "fyBTjylzBiaj" 1797 | }, 1798 | "execution_count": null, 1799 | "outputs": [] 1800 | }, 1801 | { 1802 | "cell_type": "markdown", 1803 | "source": [ 1804 | "If all goes well, you should see `anserini-X.Y.Z-SNAPSHOT-fatjar.jar` in `target/`." 1805 | ], 1806 | "metadata": { 1807 | "id": "laeFw5oUBlHg" 1808 | } 1809 | }, 1810 | { 1811 | "cell_type": "markdown", 1812 | "source": [ 1813 | "Let's install the packages!\n", 1814 | "The following command installs `ir_measures`, Hugging Face `datasets`, `googletrans` (Google Translate, used for presentation), and Hugging Face Transformers." 1815 | ], 1816 | "metadata": { 1817 | "id": "QAnsVLdaMm7G" 1818 | } 1819 | }, 1820 | { 1821 | "cell_type": "code", 1822 | "source": [ 1823 | "!pip install -q -U --progress-bar on ir_measures transformers datasets googletrans==3.1.0a0" 1824 | ], 1825 | "metadata": { 1826 | "colab": { 1827 | "base_uri": "https://localhost:8080/" 1828 | }, 1829 | "id": "ZVlNOiqpB2ps", 1830 | "outputId": "36203a94-260d-473d-d24d-71f83390dbb3" 1831 | }, 1832 | "execution_count": null, 1833 | "outputs": [ 1834 | { 1835 | "output_type": "stream", 1836 | "name": "stdout", 1837 | "text": [ 1838 | "\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/48.8 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.8/48.8 kB\u001b[0m \u001b[31m2.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 1839 | "\u001b[?25h Preparing metadata (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", 1852 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m268.8/268.8 kB\u001b[0m \u001b[31m28.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 1853 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m104.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 1854 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m70.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 1855 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m110.5/110.5 kB\u001b[0m \u001b[31m13.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 1856 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m212.5/212.5 kB\u001b[0m \u001b[31m23.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 1857 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.3/134.3 kB\u001b[0m \u001b[31m12.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 1858 | "\u001b[?25h Building wheel for googletrans (setup.py) ... \u001b[?25l\u001b[?25hdone\n", 1859 | " Building wheel for ir_measures (setup.py) ... \u001b[?25l\u001b[?25hdone\n", 1860 | " Building wheel for cwl-eval (setup.py) ... \u001b[?25l\u001b[?25hdone\n" 1861 | ] 1862 | } 1863 | ] 1864 | }, 1865 | { 1866 | "cell_type": "markdown", 1867 | "source": [ 1868 | "After installation, let's download the dataset. The NeuCLIR 1 Collection is publicly available on Huggingface Datasets! Topics and qrels are available on the TREC website, from which we will download them directly.\n", 1869 | "\n", 1870 | "Working with the entire NeuCLIR Chinese collection would take too long to index. For this demonstration, we'll just use the first 40k documents." 
1871 | ], 1872 | "metadata": { 1873 | "id": "LiZVp9JgMtaL" 1874 | } 1875 | }, 1876 | { 1877 | "cell_type": "code", 1878 | "source": [ 1879 | "# Download topics and qrels from NIST\n", 1880 | "!wget -q --show-progress https://trec.nist.gov/data/neuclir/topics.0720.utf8.jsonl\n", 1881 | "!wget -q --show-progress https://trec.nist.gov/data/neuclir/2022-qrels.zho\n", 1882 | "\n", 1883 | "import json\n", 1884 | "import pandas as pd\n", 1885 | "from tqdm.auto import tqdm\n", 1886 | "\n", 1887 | "import ir_measures as irms\n", 1888 | "from datasets import load_dataset\n", 1889 | "\n", 1890 | "# Only loading the first 40k docs from HF Datasets\n", 1891 | "ds = load_dataset('neuclir/neuclir1', split='zho', streaming=True) # total 3179209\n", 1892 | "doc_subset = [ o for i, o in zip(tqdm(range(40_000), desc='Loading first 40k docs from NeuCLIR Chinese Collection'), ds) ]\n", 1893 | "subset_doc_ids = set([ d['id'] for d in doc_subset ])\n", 1894 | "\n", 1895 | "use_topic = '66' # use topic 66 as demo -- expecting to have 9 relevant docs\n", 1896 | "\n", 1897 | "qrels = pd.DataFrame([ l for l in irms.read_trec_qrels('2022-qrels.zho') if l.query_id == use_topic and l.doc_id in subset_doc_ids ])\n", 1898 | "topics = [ t for t in map(json.loads, open(\"topics.0720.utf8.jsonl\")) if t['topic_id'] == use_topic ]" 1899 | ], 1900 | "metadata": { 1901 | "colab": { 1902 | "base_uri": "https://localhost:8080/", 1903 | "height": 149, 1904 | "referenced_widgets": [ 1905 | "19129db08a4d45ac9a7451c54fa6294e", 1906 | "83f94e7318f7470e84e57bb9793cd516", 1907 | "21febb8af09845e3b4035ca4d7b4e6f5", 1908 | "2eb7be5cf5724ff2abdbb11d574d13df", 1909 | "d5c2bfc469dd4d75a15456e9f5139703", 1910 | "b04d6073e65845c4b0e70945a95c23a2", 1911 | "bef41c64e3a64f71b4fdb32b00bfb90b", 1912 | "dc51464eadf942aaab5dab9ec166c6cc", 1913 | "f4b0311ca9ad412296d0614a3517169e", 1914 | "d3da9c98fd4c4bcbb0df491a28e234fb", 1915 | "c674c869c50e4eb8a2b63d5b928a6b82", 1916 | "0dc339c38da74695a179d6004f63d0e5", 1917 | 
"f728ebeea0a845dcbc2c6b607c6bc978", 1918 | "f8a9fd28144d456baf5912b92a04c251", 1919 | "f863f30570744e3483bf124b2b0667b0", 1920 | "5d3236474a4f41b2b0b2f8b8e6d7acf2", 1921 | "474e96e623aa49c8a603801db8ab73e5", 1922 | "0d9c8dba0a10477dbb3449ae1124ee17", 1923 | "cbfc4a83c769442b9a25c27bcc5a38a8", 1924 | "3869a56e41df4c418d04ed4f623ba50c", 1925 | "3bdc109f2eef4108b0f01eb2d1b641e9", 1926 | "cb92e9d3f2d4438a87d9ca41b1e461a8", 1927 | "b16ee4cb317c4148a51659b04a87aa9a", 1928 | "6e1648b5d8594515b03da651c375a26a", 1929 | "07a81aa9807840bbab84804d45740c05", 1930 | "e7f39db62bd242a087d42947b504248f", 1931 | "1f684d285e65464fa702c9881b5143bf", 1932 | "38739a346bbf497894bcbe7490aba6b9", 1933 | "c8b80b5f957b4e9cb73afe4362f83f93", 1934 | "fd8f17f50ac94bd8bcc6d6314d1a4379", 1935 | "4345f1024bec4be4b245310f3f223ddc", 1936 | "aec3d14f998a4766887c6f5abd9baed3", 1937 | "9e21e3cb60ee4f32ab957eb86d8d400c" 1938 | ] 1939 | }, 1940 | "id": "PI64O_uLCK_o", 1941 | "outputId": "6e012b12-696c-4198-8eb7-56840bcb4134" 1942 | }, 1943 | "execution_count": null, 1944 | "outputs": [ 1945 | { 1946 | "output_type": "stream", 1947 | "name": "stdout", 1948 | "text": [ 1949 | "topics.0720.utf8.js 100%[===================>] 646.75K --.-KB/s in 0.1s \n", 1950 | "2022-qrels.zho 100%[===================>] 1.54M 9.01MB/s in 0.2s \n" 1951 | ] 1952 | }, 1953 | { 1954 | "output_type": "display_data", 1955 | "data": { 1956 | "text/plain": [ 1957 | "Downloading builder script: 0%| | 0.00/1.31k [00:00 0\n", 2136 | " }\n", 2137 | "\n", 2138 | " dict_blade = dict(sorted(dict_blade.items(), key = operator.itemgetter(1), reverse = True))\n", 2139 | "\n", 2140 | " if len(dict_blade.keys()) == 0:\n", 2141 | " print(\"empty input =>\", id_)\n", 2142 | " dict_blade['\"[unused993]\"'] = 1\n", 2143 | "\n", 2144 | " res[id_] = dict_blade\n", 2145 | " return res" 2146 | ], 2147 | "metadata": { 2148 | "id": "GiNpaRlXV0vx" 2149 | }, 2150 | "execution_count": null, 2151 | "outputs": [] 2152 | }, 2153 | { 2154 | "cell_type": 
"markdown", 2155 | "source": [ 2156 | "Loading the BLADE model. Make sure to change the runtime type to include the free GPU (T4)." 2157 | ], 2158 | "metadata": { 2159 | "id": "u-wnXrcfYfPD" 2160 | } 2161 | }, 2162 | { 2163 | "cell_type": "code", 2164 | "source": [ 2165 | "import json\n", 2166 | "import math\n", 2167 | "import heapq\n", 2168 | "import torch\n", 2169 | "import operator\n", 2170 | "\n", 2171 | "from transformers import AutoTokenizer\n", 2172 | "\n", 2173 | "model_name = \"srnair/blade-en-zh\"\n", 2174 | "\n", 2175 | "tokenizer = AutoTokenizer.from_pretrained(model_name)\n", 2176 | "model = Blade(model_name)\n", 2177 | "device = torch.device(\"cuda\")\n", 2178 | "model.to(device)\n", 2179 | "model.eval()\n" 2180 | ], 2181 | "metadata": { 2182 | "colab": { 2183 | "base_uri": "https://localhost:8080/" 2184 | }, 2185 | "id": "4Hg8HgyRY812", 2186 | "outputId": "b1f28fa8-7419-41ad-a79c-63427ab185e1" 2187 | }, 2188 | "execution_count": null, 2189 | "outputs": [ 2190 | { 2191 | "output_type": "execute_result", 2192 | "data": { 2193 | "text/plain": [ 2194 | "Blade(\n", 2195 | " (transformer): BertForMaskedLM(\n", 2196 | " (bert): BertModel(\n", 2197 | " (embeddings): BertEmbeddings(\n", 2198 | " (word_embeddings): Embedding(35225, 768, padding_idx=0)\n", 2199 | " (position_embeddings): Embedding(512, 768)\n", 2200 | " (token_type_embeddings): Embedding(2, 768)\n", 2201 | " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", 2202 | " (dropout): Dropout(p=0.1, inplace=False)\n", 2203 | " )\n", 2204 | " (encoder): BertEncoder(\n", 2205 | " (layer): ModuleList(\n", 2206 | " (0-11): 12 x BertLayer(\n", 2207 | " (attention): BertAttention(\n", 2208 | " (self): BertSelfAttention(\n", 2209 | " (query): Linear(in_features=768, out_features=768, bias=True)\n", 2210 | " (key): Linear(in_features=768, out_features=768, bias=True)\n", 2211 | " (value): Linear(in_features=768, out_features=768, bias=True)\n", 2212 | " (dropout): Dropout(p=0.1, 
inplace=False)\n", 2213 | " )\n", 2214 | " (output): BertSelfOutput(\n", 2215 | " (dense): Linear(in_features=768, out_features=768, bias=True)\n", 2216 | " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", 2217 | " (dropout): Dropout(p=0.1, inplace=False)\n", 2218 | " )\n", 2219 | " )\n", 2220 | " (intermediate): BertIntermediate(\n", 2221 | " (dense): Linear(in_features=768, out_features=3072, bias=True)\n", 2222 | " (intermediate_act_fn): GELUActivation()\n", 2223 | " )\n", 2224 | " (output): BertOutput(\n", 2225 | " (dense): Linear(in_features=3072, out_features=768, bias=True)\n", 2226 | " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", 2227 | " (dropout): Dropout(p=0.1, inplace=False)\n", 2228 | " )\n", 2229 | " )\n", 2230 | " )\n", 2231 | " )\n", 2232 | " )\n", 2233 | " (cls): BertOnlyMLMHead(\n", 2234 | " (predictions): BertLMPredictionHead(\n", 2235 | " (transform): BertPredictionHeadTransform(\n", 2236 | " (dense): Linear(in_features=768, out_features=768, bias=True)\n", 2237 | " (transform_act_fn): GELUActivation()\n", 2238 | " (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)\n", 2239 | " )\n", 2240 | " (decoder): Linear(in_features=768, out_features=35225, bias=True)\n", 2241 | " )\n", 2242 | " )\n", 2243 | " )\n", 2244 | ")" 2245 | ] 2246 | }, 2247 | "metadata": {}, 2248 | "execution_count": 14 2249 | } 2250 | ] 2251 | }, 2252 | { 2253 | "cell_type": "markdown", 2254 | "source": [ 2255 | "Processing document identifiers and document text." 
2256 | ], 2257 | "metadata": { 2258 | "id": "lvJao5UMYnzo" 2259 | } 2260 | }, 2261 | { 2262 | "cell_type": "code", 2263 | "source": [ 2264 | "ids, docs = [], []\n", 2265 | "for doc_id in tqdm(doc_id_to_idx, total = len(doc_id_to_idx)):\n", 2266 | " content = get_doc_text_by_doc_id(doc_id)\n", 2267 | " ids.append(doc_id)\n", 2268 | " docs.append(content.lower())" 2269 | ], 2270 | "metadata": { 2271 | "colab": { 2272 | "base_uri": "https://localhost:8080/", 2273 | "height": 49, 2274 | "referenced_widgets": [ 2275 | "9488edc4e4b44e66b5ca897b28510918", 2276 | "7d653439a17744be8fa9d073f7e872a0", 2277 | "ae886d7b6a9c4971ba79ea7827e07fc3", 2278 | "bea8662ba12a4b3f906b48096c8ce877", 2279 | "f4d3a5abe24143c2b6d75296caccae5c", 2280 | "f5a986f51865470881b2a643d9a83597", 2281 | "02ae8d0d38724d57a231734768837b96", 2282 | "33d9873506f9449db284457fb2f7a95c", 2283 | "fffecdb96c8f4c51840b2a09d5dc91cf", 2284 | "5bf6c19517364eeb8043250f4cc65f6e", 2285 | "226d61b2375b47a088528088edbf38a2" 2286 | ] 2287 | }, 2288 | "id": "t5XQ2rMTZFVh", 2289 | "outputId": "3897451e-3ff0-4edb-8e97-82f0fb0e45a4" 2290 | }, 2291 | "execution_count": null, 2292 | "outputs": [ 2293 | { 2294 | "output_type": "display_data", 2295 | "data": { 2296 | "text/plain": [ 2297 | " 0%| | 0/40000 [00:00] 646.75K --.-KB/s in 0.03s \n", 925 | "2022-qrels.zho.1 100%[===================>] 1.54M --.-KB/s in 0.04s \n", 926 | "zho-base-run-result 100%[===================>] 9.11M 24.6MB/s in 0.4s \n" 927 | ] 928 | }, 929 | { 930 | "output_type": "display_data", 931 | "data": { 932 | "text/plain": [ 933 | "Loading first 40k docs from NeuCLIR Chinese Collection: 0%| | 0/40000 [00:00