├── fonts
    ├── aletheiasans-webfont.eot
    └── aletheiasans-webfont.ttf
├── example_ocr.tsv
├── neat.html
├── README.md
├── LICENSE
├── example_nerd.tsv
└── neat.js

/fonts/aletheiasans-webfont.eot:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qurator-spk/neat/HEAD/fonts/aletheiasans-webfont.eot
--------------------------------------------------------------------------------
/fonts/aletheiasans-webfont.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/qurator-spk/neat/HEAD/fonts/aletheiasans-webfont.ttf
--------------------------------------------------------------------------------
/example_ocr.tsv:
--------------------------------------------------------------------------------
TEXT	url_id	left	right	top	bottom
# https://content.staatsbibliothek-berlin.de/dc/PPN757123368-00000008/left,top,width,height/full/0/default.jpg
6. O ſit ſich nun geſumbler gehan / iu	0	141	1087	134	230
Fuß biß in diẽ tauſent Mann / die hatten	0	236	1069	225	286
Toͤwens muhte / ſie kament hin gehn But⸗	0	200	1069	280	341
tisholtz / da fandens mengen Engler ſtoltz /	0	237	1071	335	397
den ſie legten ins Blutte. Es war zu mahl	0	238	1068	388	452
ein harter Streit / das kein theil woite wei⸗	0	172	1068	448	507
ch. n/ ſie ſtunden veſt zu beider ſeit / letſtlich	0	237	1069	501	562
fleng an zaweichen / der Englich kauff vnd	0	238	1073	558	615
nahm die flucht / alſo handt die Eydgnoſ⸗	0	240	1070	611	673
ſen/ ihnen ſelber gemachet lufft.	0	239	853	667	730
So zu mal haͤndt die from̃en Landleut /	0	297	1074	721	779
trobert ein ſchoͤne beuih / an geſchmeidt	0	240	1075	779	832
harniſch vnd Roſſen / zwey hundert Man	0	242	1075	831	890
hands erſchlagen / die warhrit thun ich	0	242	1077	888	947
luch ſagen / von den freirn Eydgnoſſen /	0	242	1075	940	998
deß haben ſie gedancket Gott / daß er jhnen	0	246	1076	992	1052
bey geſtanden / vnnd ſit erloͤſt auß Feindes	0	243	1078	1049	1108
noch / die kon aub ferren Landen / ſic zu ver⸗	0	247	1077	1103	1163
derben alle gar / doch wardis diß orths ver⸗	0	214	1077	1161	1217
irieb:n/diẽ Engellandiſche ſcha⸗	0	249	907	1214	1271
— — ⏑V 2	0	306	1098	1255	1326
Landtman von Entlibuch / hat ein Edlen	0	251	1082	1323	1388
auß zogen / ſein Helm vnd Kuͤriß legt eran	0	252	1085	1378	1438
den ſchilt vnd ſper zu handen gnon / hiemit	0	254	1087	1434	1493
ein hengſt beſtigen / daß hatt erſehen der	0	256	1085	1489	1551
Cdaman /von Elorbun ab du dmaurge	0	257	1086	1542	1648
tt	0	1045	1125	1609	1655

--------------------------------------------------------------------------------
/neat.html:
--------------------------------------------------------------------------------
[HTML markup not recoverable. Visible page text: title "neat"; heading "neat: neat annotation tool"; links "User Guide | Annotation Guidelines | Issues"; prompt "Please upload a TSV(i) file:"]
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# neat: named entity annotation tool
---
![Screenshot](https://user-images.githubusercontent.com/952378/76674885-6b7b9000-65b4-11ea-9a36-1f9179dc5d6b.png)

---

### Table of contents
[1. Introduction](https://github.com/qurator-spk/neat/blob/master/README.md#1-introduction)

[2. User Guide](https://github.com/qurator-spk/neat/blob/master/README.md#2-user-guide)

   [2.1 Installation](https://github.com/qurator-spk/neat/blob/master/README.md#21-installation)

   [2.2 Data format](https://github.com/qurator-spk/neat/blob/master/README.md#22-data-format)

   [2.3 Navigation](https://github.com/qurator-spk/neat/blob/master/README.md#23-navigation)

   [2.4 Saving progress](https://github.com/qurator-spk/neat/blob/master/README.md#24-saving-progress)

[3. Annotation Guidelines](https://github.com/qurator-spk/neat/blob/master/README.md#3-annotation-guidelines)

### 1. Introduction
*neat* is a simple, browser-based tool for editing and annotating text with named entities to produce labeled data for training, testing and evaluation. It can be used to add or correct named entity labels and to correct the token text or tokenization (e.g. after OCR or segmentation errors).

*neat* is developed at the [Berlin State Library](https://staatsbibliothek-berlin.de/) for data annotation in the [SoNAR-IDH](https://sonar.fh-potsdam.de/) project and the [QURATOR](https://qurator.ai/) project.

### 2. User Guide

#### 2.1 Installation
*neat* runs locally as a pure HTML+JavaScript webpage in your web browser. No additional software needs to be installed, but JavaScript has to be enabled in the browser.

Clone the repo using ``git clone https://github.com/qurator-spk/neat.git`` or download and extract the [ZIP](https://github.com/qurator-spk/neat/archive/master.zip). Make sure that ``neat.html`` and ``neat.js`` are in the same directory, then open ``neat.html`` in a browser. Any fairly recent browser should work, but only Chrome and Firefox are tested.

#### 2.2 Data format
The source data we use for annotation are OCR results in [PAGE-XML](https://github.com/PRImA-Research-Lab/PAGE-XML) format. We provide a [Python tool](https://github.com/qurator-spk/page2tsv) that transforms PAGE-XML OCR files into the [TSV format](https://github.com/qurator-spk/neat/blob/master/README.md#22-data-format) used by *neat*.

The internal data format used by *neat* is based on the format used in the [GermEval2014](https://sites.google.com/site/germeval2014ner/data) Named Entity Recognition Shared Task. Text is encoded as one token per line, with name spans in the [IOB2](https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)) format as tab-separated values:
* the first column contains either
  * `#` a comment to indicate the source the sentence is taken from, or
  * ``>=1`` the token position within the sentence, or
  * ``0`` to mark sentence boundaries
* the second column contains the token ``text``
* outer entity spans are encoded in the third column ``NE-TAG``
* embedded entity spans are encoded in the fourth column ``NE-EMB``

##### Example (simple)
```tsv
No.	TOKEN	NE-TAG	NE-EMB
# https://example.url
1	Donnerstag	O	O
2	,	O	O
3	1	O	O
4	.	O	O
5	Januar	O	O
6	.	O	O
0		O	O
1	Berliner	B-ORG	B-LOC
2	Tageblatt	I-ORG	O
3	.	O	O
0		O	O
1	Nr	O	O
2	.	O	O
3	1	O	O
4	.	O	O
0		O	O
1	Seite	O	O
2	3	O	O
```

For our purposes we extend this format by adding these (optional) values:
* a fifth column holding an ``ID`` from an authority file for the outer ``NE-TAG`` (*neat* supports automatic linking for [Wikidata](https://www.wikidata.org) identifiers)
* a sixth column ``url_id`` used as a variable for [IIIF](https://iiif.io/) Image API support (if the PAGE-XML source contains word bounding boxes, *neat* can embed image snippets into its interface to assist data annotation and correction)
* columns 7-10 store the ``left,right,top,bottom`` pixel coordinates for the image snippets

##### Example (full)
```tsv
No.	TOKEN	NE-TAG	NE-EMB	ID	url_id	left,right,top,bottom
# https://example.url/iiif/left,right,top,bottom/full/0/default.jpg
1	Donnerstag	O	O	-	0	174,352,358,390
2	,	O	O	-	0	174,352,358,390
3	1	O	O	-	0	367,392,361,381
4	.	O	O	-	0	370,397,352,379
5	Januar	O	O	-	0	406,518,358,386
6	.	O	O	-	0	406,518,358,386
0
1	Berliner	B-ORG	B-LOC	Q455014	0	816,984,358,388
2	Tageblatt	I-ORG	O	Q455014	0	1005,1208,360,387
3	.	O	O	-	0	1005,1208,360,387
0
1	Nr	O	O	-	0	1237,1288,360,382
2	.	O	O	-	0	1237,1288,360,382
3	1	O	O	-	0	1304,1326,361,381
4	.	O	O	-	0	1304,1326,361,381
0
1	Seite	O	O	-	0	1837,1926,361,392
2	3	O	O	-	0	1939,1967,364,385
```

#### 2.3 Navigation
*neat* can be used with either a [keyboard](https://github.com/qurator-spk/neat#keyboard) or a [mouse](https://github.com/qurator-spk/neat#mouse), but for ergonomic reasons we strongly recommend the key combinations below.
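
As an aside on the data format from section 2.2, the rules for the first four columns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Token`, `parse_neat_tsv` and `entity_spans` are hypothetical and are not part of *neat* or page2tsv, and the sketch assumes real tab characters between columns.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    position: int   # column 1: token position within the sentence (>= 1)
    text: str       # column 2: TOKEN
    ne_tag: str     # column 3: NE-TAG (outer entity, IOB2)
    ne_emb: str     # column 4: NE-EMB (embedded entity, IOB2)
    extra: list = field(default_factory=list)  # optional ID, url_id, coordinates


def parse_neat_tsv(text):
    """Group the rows of a neat-style TSV string into sentences (lists of Token)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or "#" source comment
        cols = line.split("\t")
        if not cols[0].strip().isdigit():
            continue  # header row such as "No.<TAB>TOKEN<TAB>..."
        position = int(cols[0])
        if position == 0:
            # a row starting with 0 marks a sentence boundary
            if current:
                sentences.append(current)
            current = []
            continue
        # pad rows that lack the token text or the tag columns
        cols = cols + ["", "O", "O"][len(cols) - 1:]
        current.append(Token(position, cols[1], cols[2], cols[3], cols[4:]))
    if current:
        sentences.append(current)
    return sentences


def entity_spans(sentence):
    """Collect (label, text) pairs for the outer entities encoded in NE-TAG."""
    spans, open_label = [], None
    for tok in sentence:
        prefix, _, label = tok.ne_tag.partition("-")
        if prefix == "B":
            spans.append((label, [tok.text]))
            open_label = label
        elif prefix == "I" and open_label == label:
            spans[-1][1].append(tok.text)
        else:
            open_label = None  # "O" or a stray I- tag closes the open span
    return [(label, " ".join(words)) for label, words in spans]


# First two sentences of the simple example from section 2.2
SAMPLE = (
    "No.\tTOKEN\tNE-TAG\tNE-EMB\n"
    "# https://example.url\n"
    "1\tDonnerstag\tO\tO\n"
    "2\t,\tO\tO\n"
    "3\t1\tO\tO\n"
    "4\t.\tO\tO\n"
    "5\tJanuar\tO\tO\n"
    "6\t.\tO\tO\n"
    "0\t\tO\tO\n"
    "1\tBerliner\tB-ORG\tB-LOC\n"
    "2\tTageblatt\tI-ORG\tO\n"
    "3\t.\tO\tO\n"
)

sentences = parse_neat_tsv(SAMPLE)
for sentence in sentences:
    print(entity_spans(sentence))
```

Any columns beyond the fourth (``ID``, ``url_id``, coordinates) are kept untouched in `extra`, so the same sketch reads both the simple and the extended layout.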

##### Keyboard
| Key Combination| Action                                     |
|:---------|:-------------------------------------------|
| Left     | Move one cell left                         |
| Right    | Move one cell right                        |
| Up       | Move one row up                            |
| Down     | Move one row down                          |
| PageDown | Move page down                             |
| PageUp   | Move page up                               |
| Ctrl+Up  | Move entire table one row up               |
| Ctrl+Down| Move entire table one row down             |
|----------|--------------------------------------------|
| s t      | Start new sentence in current row          |
| m e      | Merge current row with row above           |
| s p      | Create copy of current row                 |
| d l      | Delete current row                         |
|----------|--------------------------------------------|
| backspace| Set NE-TAG / NE-EMB to ``O``               |
| b p      | Set NE-TAG / NE-EMB to ``B-PER``           |
| b l      | Set NE-TAG / NE-EMB to ``B-LOC``           |
| b o      | Set NE-TAG / NE-EMB to ``B-ORG``           |
| b w      | Set NE-TAG / NE-EMB to ``B-WORK``          |
| b c      | Set NE-TAG / NE-EMB to ``B-CONF``          |
| b e      | Set NE-TAG / NE-EMB to ``B-EVT``           |
| b t      | Set NE-TAG / NE-EMB to ``B-TODO``          |
| i p      | Set NE-TAG / NE-EMB to ``I-PER``           |
| i l      | Set NE-TAG / NE-EMB to ``I-LOC``           |
| i o      | Set NE-TAG / NE-EMB to ``I-ORG``           |
| i w      | Set NE-TAG / NE-EMB to ``I-WORK``          |
| i c      | Set NE-TAG / NE-EMB to ``I-CONF``          |
| i e      | Set NE-TAG / NE-EMB to ``I-EVT``           |
| i t      | Set NE-TAG / NE-EMB to ``I-TODO``          |
|----------|--------------------------------------------|
| enter    | Edit TOKEN or ID                           |
| esc      | Close TOKEN or ID edit field without applying changes |
|----------|--------------------------------------------|
| l a      | Add one display row                        |
| l r      | Remove one display row (minimum is 5)      |
|----------|--------------------------------------------|

##### Mouse
* use mouse wheel to scroll up and down

* left-click `<<` and `>>` to move 15 rows up or down

* left-click `O` in the `NE-TAG` or `NE-EMB` column to open a drop-down menu and subsequently select any of the supported NE-Tags to tag a token or change an existing tag

* left-click the `NE-TAG` or `NE-EMB` column and select `O` to remove a tag

* left-click the `TOKEN` column to edit the token text

* left-click the `POSITION` and select `split` from the drop-down menu to create a copy of the current row below

* left-click the `POSITION` and select `merge` from the drop-down menu to merge the current row with the row above

* left-click the `POSITION` and select `start-sentence` from the drop-down menu to mark the start of a new sentence

#### 2.4 Saving progress
*neat* runs entirely locally in the browser, so it cannot automatically save your changes to disk. Use the `Save Changes` button to save manually from time to time.

### 3. Annotation Guidelines
[Annotation Guidelines](https://zenodo.org/record/5116015)

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity.
      For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format.
      We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

--------------------------------------------------------------------------------
/example_nerd.tsv:
--------------------------------------------------------------------------------
No.	TOKEN	NE-TAG	NE-EMB	ID	url_id	left	right	top	bottom
# https://content.staatsbibliothek-berlin.de/zefys/SNP27646518-18800101-0-3-0-0/left,top,width,height/full/0/default.jpg
1	Kampf	O	O	-	0	154	212	400	419
2	,	O	O	-	0	154	212	400	419
3	deſſen	O	O	-	0	221	264	400	419
4	Ende	O	O	-	0	274	313	401	417
5	vielleicht	O	O	-	0	324	388	399	418
6	noch	O	O	-	0	397	429	400	418
7	heute	O	O	-	0	439	478	400	418
8	nicht	O	O	-	0	487	523	399	417
9	abzuſehen	O	O	-	0	532	605	399	418
10	wäre	O	O	-	0	615	656	399	417
11	,	O	O	-	0	615	656	399	417
12	wenn	O	O	-	0	671	701	402	415
13	nicht	O	O	-	0	702	755	399	417
14	Herr	O	O	-	0	155	192	419	437
15	Gambetta	B-PER	O	Q295090	0	202	277	419	437
16	als	O	O	-	0	287	311	420	436
17	deus	O	O	-	0	320	357	419	434
18	ex	O	O	-	0	366	385	422	434
19	machina	O	O	-	0	395	451	419	434
20	erſchienen	O	O	-	0	452	543	417	436
21	wäre	O	O	-	0	553	594	417	437
22	,	O	O	-	0	553	594	417	437
23	reſp	O	O	-	0	608	642	418	437
24	.	O	O	-	0	608	642	418	437
25	durch	O	O	-	0	652	692	418	436
26	perſön⸗	O	O	-	0	698	756	418	437
27	liche	O	O	-	0	156	188	437	457
28	Intervention	O	O	-	0	197	298	438	457
29	bei	O	O	-	0	309	330	438	453
30	dem	O	O	-	0	339	370	437	453
31	Präſidenten	O	O	-	0	379	468	437	457
32	Grévy	B-PER	O	Q296083	0	475	524	436	456
33	einen	O	O	-	0	534	572	437	453
34	Ausgleich	O	O	-	0	577	650	437	455
35	herbeigeführt	O	O	-	0	658	755	436	455
36	hätte	O	O	-	0	155	207	457	475
37	.	O	O	-	0	155	207	457	475
0		O	O	-	0	216	239	457	474
1	Es	O	O	-	0	216	239	457	474
2	ſcheint	O	O	-	0	252	300	457	475
3	dem	O	O	-	0	309	339	457	472
4	Kammerpräſidenten	O	O	-	0	349	498	455	474
5	plötzlich	O	O	-	0	508	566	455	475
6	ein	O	O	-	0	576	598	455	472
7	Argwohn	O	O	-	0	604	676	455	475
8	oder	O	O	-	0	686	710	455	471
9	eine	O	O	-	0	711	756	455	471
10	Befürchtung	O	O	-	0	155	250	475	495
11	gekommen	O	O	-	0	259	338	475	495
12	zu	O	O	-	0	346	354	479	495
13	ſein	O	O	-	0	354	404	475	494
14	,	O	O	-	0	354	404	475	494
15	als	O	O	-	0	414	438	475	490
16	ob	O	O	-	0	449	467	474	490
17	hinter	O	O	-	0	476	522	474	493
18	dem	O	O	-	0	531	561	474	491
19	Bemühen	O	O	-	0	570	648	474	492
20	,	O	O	-	0	570	648	474	492
21	Waddington	B-PER	O	Q548696	0	660	756	474	493
22	unb	O	O	-	0	155	185	494	512
23	Léon	B-PER	O	Q3271322	0	200	249	494	512
24	Say	I-PER	O	Q3271322	0	254	288	494	512
25	zu	O	O	-	0	308	324	498	512
26	halten	O	O	-	0	343	398	494	512
27	,	O	O	-	0	343	398	494	512
28	dagegen	O	O	-	0	410	477	492	512
29	Lepère	B-PER	O	Q670573	0	492	544	493	512
30	zu	O	O	-	0	563	581	497	512
31	entfernen	O	O	-	0	600	678	492	511
32	,	O	O	-	0	600	678	492	511
33	die	O	O	-	0	693	718	492	509
34	Ab	O	O	-	0	724	756	492	509
35	ſicht	O	O	-	0	156	187	513	531
36	ſtecke	O	O	-	0	206	250	513	531
37	,	O	O	-	0	206	250	513	531
38	das	O	O	-	0	268	296	513	529
39	neue	O	O	-	0	316	349	516	529
40	Miniſterium	O	O	-	0	367	463	511	529
41	von	O	O	-	0	482	509	515	528
42	dem	O	O	-	0	529	559	512	528
43	bisher	O	O	-	0	566	632	511	530
44	dominirenden	O	O	-	0	653	756	511	528
45	Einfluß	O	O	-	0	156	216	531	550
46	des	O	O	-	0	240	266	532	548
47	Palais	B-LOC	O	Q936633	0	293	346	530	550
48	Bourbon	I-LOC	O	Q936633	0	368	437	530	546
49	frei	O	O	-	0	462	488	530	549
50	zu	O	O	-	0	511	528	535	550
51	machen	O	O	-	0	552	610	530	549
52	.	O	O	-	0	552	610	530	549
0		O	O	-	0	644	682	529	546
1	Sein	O	O	-	0	644	682	529	546
2	Beſuch	O	O	-	0	706	756	530	548
3	bei	O	O	-	0	159	189	550	567
4	Grévy	B-PER	O	Q296083	0	195	246	551	569
5	am	O	O	-	0	262	285	554	566
6	Sonntag	O	O	-	0	300	368	550	569
7	Morgen	O	O	-	0	380	442	549	569
8	um	O	O	-	0	457	482	553	565
9	10	O	O	-	0	496	514	550	565
10	Uhr	O	O	-	0	525	546	549	568
11	ſoll	O	O	-	0	546	593	549	569
12	keineswegs	O	O	-	0	607	691	548	567
13	erbeten	O	O	-	0	703	756	549	565
14	ſondern	O	O	-	0	163	216	570	586
15	—	O	O	-	0	225	243	577	580
16	zum	O	O	-	0	254	285	573	587
17	erſten	O	O	-	0	295	335	569	587
18	Mal	O	O	-	0	345	386	567	587
19	!	O	O	-	0	345	386	567	587
20	—	O	O	-	0	396	414	576	578
21	freiwillig	O	O	-	0	418	493	567	587
22	und	O	O	-	0	508	537	568	584
23	ziemlich	O	O	-	0	542	605	567	587
24	unerwartet	O	O	-	0	615	697	568	583
25	erfolgt	O	O	-	0	707	756	567	586
26	ſein	O	O	-	0	156	190	586	606
27	.	O	O	-	0	156	190	586	606
0		O	O	-	0	209	237	588	604
1	Was	O	O	-	0	209	237	588	604
2	zwiſchen	O	O	-	0	238	317	587	606
3	den	O	O	-	0	327	353	587	603
4	beiden	O	O	-	0	362	408	587	603
5	Präſidenten	O	O	-	0	418	508	586	606
6	verhandelt	O	O	-	0	523	602	587	606
7	worden	O	O	-	0	611	671	586	604
8	,	O	O	-	0	611	671	586	604
9	weiß	O	O	-	0	687	723	586	604
10	na⸗	O	O	-	0	732	756	590	602
11	türlich	O	O	-	0	157	205	606	624
12	Niemand	O	O	-	0	217	289	607	624
13	,	O	O	-	0	217	289	607	624
14	wenn	O	O	-	0	300	339	609	623
15	nicht	O	O	-	0	349	383	605	624
16	Herr	O	O	-	0	393	429	606	624
17	Gambetta	B-PER	O	Q295090	0	434	509	606	622
18	ſelbſt	O	O	-	0	519	557	604	624
19	es	O	O	-	0	566	582	607	621
20	hinterher	O	O	-	0	588	656	605	623
21	beim	O	O	-	0	666	700	605	621
22	Früh	O	O	-	0	710	756	604	624
23	—	O	O	-	0	710	756	604	624
24	ftück	O	O	-	0	157	189	625	643
25	ſeinem	O	O	-	0	199	248	624	643
26	Intimus	O	O	-	0	257	330	625	643
27	,	O	O	-	0	257	330	625	643
28	dem	O	O	-	0	339	370	624	640
29	Schauſpieler	O	O	-	0	380	476	624	643
30	Coquelin	B-PER	O	Q142310	0	491	559	624	642
31	dem	O	O	-	0	575	605	624	640
32	„	O	O	-	0	620	714	623	642
33	Jüngeren	O	O	-	0	620	714	623	642
34	“	O	O	-	0	620	714	623	642
35	von	O	O	-	0	728	756	626	639
36	der	O	O	-	0	157	181	643	660
37	Comédie	B-ORG	O	Q61460498	0	197	262	643	661
38	françaiſe	I-ORG	O	Q61460498	0	277	345	642	661
39	anvertraut	O	O	-	0	359	440	644	659
40	hat	O	O	-	0	455	484	644	661
41	.	O	O	-	0	455	484	644	661
0		O	O	-	0	503	560	642	659
1	Abends	O	O	-	0	503	560	642	659
2	im	O	O	-	0	576	595	642	658
3	Theater	O	O	-	0	604	665	642	661
4	ſpielte	O	O	-	0	665	724	642	661
5	der	O	O	-	0	733	756	642	658
6	Allgewaltige	O	O	-	0	157	252	662	682
7	freilich	O	O	-	0	262	312	662	681
8	wieder	O	O	-	0	326	375	661	678
9	den	O	O	-	0	389	415	662	678
10	Unbefangenen	O	O	-	0	425	530	662	681
11	und	O	O	-	0	544	572	661	677
12	Ununterrichteten	O	O	-	0	582	711	661	679
13	,	O	O	-	0	582	711	661	679
14	denn	O	O	-	0	720	755	661	677
15	er	O	O	-	0	158	172	686	697
16	leugnete	O	O	-	0	182	242	682	700
17	ſogar	O	O	-	0	256	296	681	699
18	ſeinen	O	O	-	0	312	356	680	699
19	Beſuch	O	O	-	0	366	416	681	699
20	vom	O	O	-	0	433	465	683	696
21	Vormittag	O	O	-	0	481	566	679	699
22	,	O	O	-	0	481	566	679	699
23	obwohl	O	O	-	0	583	638	681	698
24	Hunderte	O	O	-	0	646	716	679	699
25	das	O	O	-	0	728	755	679	695
26	wohlbekannte	O	O	-	0	157	258	700	718
27	kleine	O	O	-	0	271	312	698	715
28	Coupé	O	O	-	0	322	371	699	718
29	Gambettas	B-PER	O	Q295090	0	382	466	698	716
30	eine	O	O	-	0	482	510	699	715
31	ganze	O	O	-	0	525	566	702	718
32	Stunde	O	O	-	0	577	633	698	715
33	lang	O	O	-	0	648	681	698	715
34	von	O	O	-	0	695	712	701	714
35	der	O	O	-	0	714	756	698	714
36	Rue	B-LOC	O	-	0	157	189	718	735
37	du	I-LOC	O	-	0	204	222	719	735
38	Faubourg	I-LOC	O	-	0	232	308	718	736
39	St	I-LOC	O	-	0	324	351	718	735
40	.	I-LOC	O	-	0	324	351	718	735
41	Honoré	I-LOC	O	-	0	360	418	718	736
42	aus	O	O	-	0	434	462	720	733
43	im	O	O	-	0	476	496	718	73
44	Vorhof	O	O	-	0	505	562	717	735
45	des	O	O	-	0	577	602	718	733
46	Elyſée	B-LOC	O	Q188190	0	612	661	717	736
47	ſtationiren	O	O	-	0	666	755	703	736
48	geſehen	O	O	-	0	158	211	737	756
49	hatten	O	O	-	0	222	273	737	754
50	.	O	O	-	0	222	273	737	754
0		O	O	-	0	292	321	737	753
1	Der	O	O	-	0	292	321	737	753
2	Erfolg	O	O	-	0	331	382	736	756
3	dieſer	O	O	-	0	392	432	736	755
4	Viſite	O	O	-	0	437	480	736	754
5	war	O	O	-	0	490	520	740	753
6	denn	O	O	-	0	530	565	736	752
7	auch	O	O	-	0	574	606	736	754
8	ſchon	O	O	-	0	616	655	735	755
9	in	O	O	-	0	665	679	735	752
10	derſelben	O	O	-	0	689	756	736	754
11	Zeit	O	O	-	0	157	189	756	775
12	zu	O	O	-	0	198	214	760	775
13	ſpüren	O	O	-	0	224	277	755	774
14	,	O	O	-	0	224	277	755	774
15	da	O	O	-	0	287	305	755	772
16	derjenige	O	O	-	0	314	392	755	774
17	,	O	O	-	0	314	392	755	774
18	der	O	O	-	0	396	419	756	771
19	ſie	O	O	-	0	429	445	755	774
20	gemacht	O	O	-	0	455	519	756	774
21	,	O	O	-	0	455	519	756	774
22	ſie	O	O	-	0	533	550	754	773
23	ableugnen	O	O	-	0	565	641	756	774
24	wollte	O	O	-	0	651	702	754	770
25	.	O	O	-	0	651	702	754	770
0		O	O	-	0	720	756	754	774
1	Herr	O	O	-	0	720	756	754	774
2	Lepère	B-PER	O	Q670573	0	156	212	774	793
3	,	O	O	-	0	156	212	774	793
4	der	O	O	-	0	227	250	774	790
5	bereits	O	O	-	0	264	314	774	790
6	ſeine	O	O	-	0	331	374	773	792
7	Siebenſachen	O	O	-	0	382	480	773	792
8	zuſammengepackt	O	O	-	0	494	623	773	793
9	hatte	O	O	-	0	638	679	773	791
10	,	O	O	-	0	638	679	773	791
11	weil	O	O	-	0	696	727	773	789
12	er	O	O	-	0	743	756	777	789
13	glaubte	O	O	-	0	157	211	793	811
14	ausziehen	O	O	-	0	221	295	793	811
15	zu	O	O	-	0	305	322	797	811
16	müſſen	O	O	-	0	332	383	793	811
17	—	O	O	-	0	393	412	801	803
18	Freycinet	B-PER	O	Q317957	0	421	493	793	811
19	ſelbſt	O	O	-	0	496	544	792	811
20	hatte	O	O	-	0	554	590	792	810
21	ihm	O	O	-	0	600	629	793	809
22	das	O	O	-	0	639	666	792	808
23	zu	O	O	-	0	675	692	796	811
24	wieder	O	O	-	0	702	756	791	808
25	—	O	O	-	0	702	756	791	808
26	holten	O	O	-	0	156	202	810	830
27	Malen	O	O	-	0	212	262	811	828
28	in	O	O	-	0	272	287	811	828
29	dürren	O	O	-	0	297	347	812	827
30	Worten	O	O	-	0	357	415	812	827
31	geſagt	O	O	-	0	425	475	811	830
32	—	O	O	-	0	484	503	819	822
33	Herr	O	O	-	0	512	548	811	830
34	Lepre	B-PER	O	Q670573	0	556	607	811	829
35	erhielt	O	O	-	0	616	664	811	830
36	von	O	O	-	0	674	701	814	826
37	Gam	B-PER	O	Q295090	0	711	755	811	827
38	—	I-PER	O	-	0	711	755	811	827
39	betta	I-PER	O	-	0	156	192	829	846
40	die	O	O	-	0	202	224	830	846
41	Nachricht	O	O	-	0	234	308	830	848
42	,	O	O	-	0	234	308	830	848
43	daß	O	O	-	0	318	346	830	848
44	er	O	O	-	0	356	370	835	846
45	bleiben	O	O	-	0	380	432	830	846
46	dürfe	O	O	-	0	445	488	830	848
47	.	O	O	-	0	445	488	830	848
0		O	O	-	0	508	592	830	848
1	Gleichzeitig	O	O	-	0	508	592	830	848
2	wurde	O	O	-	0	602	649	829	845
3	Herrn	O	O	-	0	658	703	829	848
4	Waddington	B-PER	O	Q548696	0	714	756	829	845
5	das	O	O	-	0	230	257	849	865
6	Gegentheil	O	O	-	0	272	354	848	867
7	bedeutet	O	O	-	0	370	437	849	867
8	;	O	O	-	0	370	437	849	867
9	den	O	O	-	0	451	476	849	864
10	Botſchafterpoſten	O	O	-	0	486	617	848	867
11	in	O	O	-	0	633	648	848	864
12	London	B-LOC	O	Q84	0	658	716	848	864
13	,	O	O	-	0	658	716	848	864
14	der	O	O	-	0	720	756	848	866
15	ihm	O	O	-	0	156	185	866	885
16	als	O	O	-	0	196	219	868	884
17	Entſchädigung	O	O	-	0	230	339	867	886
18	angeboten	O	O	-	0	350	426	868	886
19	wurde	O	O	-	0	436	486	868	884
20	,	O	O	-	0	436	486	868	884
21	ſchlug	O	O	-	0	496	539	867	886
22	er	O	O	-	0	549	563	872	883
23	aus	O	O	-	0	573	605	869	883
24	.	O	O	-	0	573	605	869	883
0		O	O	-	0	625	648	866	882
1	Von	O	O	-	0	625	648	866	882
2	allen	O	O	-	0	649	699	868	882
3	dieſen	O	O	-	0	699	756	866	884
4	Vorgängen	O	O	-	0	159	244	885	905
5	erhielt	O	O	-	0	254	305	886	904
6	Léon	B-PER	O	Q3271322	0	310	350	885	902
7	Say	I-PER	O	Q3271322	0	360	394	886	905
8	erſt	O	O	-	0	407	432	886	902
9	in	O	O	-	0	445	460	886	902
10	ſpäter	O	O	-	0	475	519	886	903
11	Nachmittagsſtunde	O	O	-	0	528	671	885	905
12	Kenntniß	O	O	-	0	682	756	885	903
13	.	O	O	-	0	682	756	885	903
0		O	O	-	0	161	198	904	921
1	Sein	O	O	-	0	161	198	904	921
2	Entſchluß	O	O	-	0	208	281	904	923
3	war	O	O	-	0	297	328	908	920
4	ſofort	O	O	-	0	343	391	905	923
5	gefaßt	O	O	-	0	400	451	903	923
6	.
O O - 0 400 451 903 923 334 | 0 O O - 0 471 519 905 923 335 | 1 Gegen O O - 0 471 519 905 923 336 | 2 6 O O - 0 535 544 907 920 337 | 3 Uhr O O - 0 560 589 905 922 338 | 4 Abends O O - 0 599 656 904 920 339 | 5 fuhr O O - 0 666 690 904 922 340 | 6 er O O - 0 692 723 909 920 341 | 7 ins O O - 0 733 756 904 919 342 | 8 Elyſée B-LOC O Q188190 0 158 207 923 942 343 | 9 und O O - 0 220 248 924 939 344 | 10 legte O O - 0 264 299 924 940 345 | 11 ſein O O - 0 313 340 923 940 346 | 12 Portefeuille O O - 0 355 445 923 942 347 | 13 in O O - 0 461 475 923 939 348 | 14 Grevys B-PER O Q296083 0 490 546 923 942 349 | 15 Hände O O - 0 557 606 923 942 350 | 16 zurück O O - 0 621 671 923 942 351 | 17 . O O - 0 621 671 923 942 352 | -------------------------------------------------------------------------------- /neat.js: -------------------------------------------------------------------------------- 1 | 2 | function loadFile(evt, onComplete) { 3 | 4 | let file = evt.target.files[0]; 5 | 6 | let urls = null; 7 | 8 | let reader = new FileReader(); 9 | 10 | reader.onload = 11 | function(event) { 12 | 13 | let link_detector = /(https?:\/\/[^\s]+)/g; 14 | 15 | let lines = event.target.result.split(/\r\n|\n/); 16 | for(let i = 0; i < lines.length; i++){ 17 | 18 | let line = lines[i]; 19 | 20 | if (!line.startsWith('#')) continue; 21 | 22 | let tmp = line.match(link_detector); 23 | 24 | if (tmp == null) continue; 25 | 26 | if (urls == null) { 27 | urls = tmp; 28 | } 29 | else { 30 | urls.push(tmp[0]) 31 | } 32 | }; 33 | }; 34 | 35 | reader.readAsText(file); 36 | 37 | Papa.parse(file, { 38 | header: true, 39 | delimiter: '\t', 40 | quoteChar: String.fromCharCode(0), 41 | escapeChar: String.fromCharCode(0), 42 | comments: "#", 43 | skipEmptyLines: true, 44 | dynamicTyping: false, 45 | complete: function(results) { onComplete(results, file, urls); } 46 | }); 47 | } 48 | 49 | function setupInterface(data, file, urls) { 50 | 51 | if (data.data.length <= 0) { 52 | let empty_html = ` 53 | File 
is empty. 54 | Load another one. 55 | `; 56 | 57 | $('#tableregion').html(empty_html); 58 | return null; 59 | } 60 | 61 | // private variables of app 62 | 63 | let displayRows=15 64 | let startIndex=0; 65 | let endIndex=displayRows; 66 | 67 | let do_not_display = new Set(['url_id', 'left', 'right', 'top', 'bottom', 'ocrconf', 'conf', 'line_id']); 68 | let tagClasses = 'ner_per ner_loc ner_org ner_work ner_conf ner_evt ner_todo'; 69 | 70 | let has_changes = false; 71 | 72 | let save_timeout = null; 73 | 74 | let listener_defaults = { prevent_repeat : true }; 75 | 76 | let editingTd; 77 | 78 | let wnd_listener = new window.keypress.Listener(); 79 | let slider_pos = data.data.length - startIndex; 80 | let slider_min = displayRows; 81 | let slider_max = data.data.length; 82 | 83 | let min_left = 1000000000 84 | let max_right = 0 85 | let min_top = 1000000000 86 | let max_bottom = 0 87 | 88 | // private functions of app 89 | 90 | function notifyChange() { 91 | if (save_timeout != null) clearTimeout(save_timeout); 92 | has_changes = true; 93 | 94 | $("#save").attr('disabled', false); 95 | } 96 | 97 | function resetChanged() { 98 | if (save_timeout != null) clearTimeout(save_timeout); 99 | 100 | $("#save").attr('disabled', true); 101 | has_changes = false; 102 | } 103 | 104 | function checkForSave (csv) { 105 | 106 | if (save_timeout != null) clearTimeout(save_timeout); 107 | 108 | // This is a work-around that checks if the user actually saved the file or if the save dialog was cancelled. 109 | let counter = 0; 110 | let checker = 111 | function() { 112 | //console.log('checker ...', counter); 113 | 114 | if (counter > 20) return; 115 | 116 | let reader = new FileReader(); 117 | 118 | reader.onload = 119 | function(event) { 120 | 121 | let content = event.target.result; 122 | 123 | if (content == csv) { // ok content of the file is actually equal to desired content. 
124 | console.log('Save success ...'); 125 | resetChanged(); 126 | return; 127 | } 128 | 129 | counter++; 130 | save_timeout = setTimeout(checker, 3000); 131 | }; 132 | 133 | reader.readAsText(file); 134 | }; 135 | 136 | save_timeout = setTimeout(checker, 3000); 137 | }; 138 | 139 | function updatePreview(nRow) { 140 | 141 | if (urls == null) return; 142 | 143 | let img_url = urls[data.data[nRow]['url_id']]; 144 | 145 | if (img_url == "http://empty") 146 | return; 147 | 148 | let raw_left = parseInt(data.data[nRow]['left']); 149 | let raw_right = parseInt(data.data[nRow]['right']); 150 | let raw_top = parseInt(data.data[nRow]['top']); 151 | let raw_bottom = parseInt(data.data[nRow]['bottom']); 152 | 153 | if (isNaN(raw_left) || isNaN(raw_right) || isNaN(raw_top) || isNaN(raw_bottom)) 154 | return; 155 | 156 | let left = raw_left; 157 | let right = raw_right; 158 | let top = raw_top; 159 | let bottom = raw_bottom; 160 | 161 | let raw_width = right - left; 162 | let raw_height = bottom - top; 163 | 164 | 165 | top = Math.max(0, top - 25); 166 | bottom = Math.min(max_bottom, bottom + 25); 167 | 168 | left = Math.max(0, left - 50); 169 | right = Math.min(max_right, right + 50); 170 | 171 | let width = right - left; 172 | let height = bottom - top; 173 | 174 | img_url = img_url.replace('left', left.toString()); 175 | img_url = img_url.replace('right', right.toString()); 176 | img_url = img_url.replace('top', top.toString()); 177 | img_url = img_url.replace('bottom', bottom.toString()); 178 | img_url = img_url.replace('width', width.toString()); 179 | img_url = img_url.replace('height', height.toString()); 180 | 181 | let offscreen = document.createElement('canvas'); 182 | offscreen.width = width; 183 | offscreen.height = height; 184 | 185 | $("#preview").attr("src", offscreen.toDataURL()); 186 | 187 | let ctx = offscreen.getContext("2d"); 188 | let img = new Image(); 189 | img.crossOrigin = "anonymous"; 190 | 191 | (function(left,top) { 192 | img.onload = function() { 193 |
ctx.drawImage(img, 0, 0); 194 | ctx.beginPath(); 195 | ctx.lineWidth = "1"; 196 | ctx.strokeStyle = "red"; 197 | ctx.rect(raw_left - left, raw_top - top, raw_width, raw_height); 198 | ctx.stroke(); 199 | 200 | $("#preview").attr("src", offscreen.toDataURL()); 201 | }; 202 | })(left, top); 203 | 204 | img.src = img_url; 205 | 206 | top = Math.max(0, top - 200); 207 | bottom = Math.min(max_bottom, bottom + 200); 208 | 209 | left = Math.max(0, left - 400); 210 | right = Math.min(max_right, right + 400); 211 | 212 | width = right - left; 213 | height = bottom - top; 214 | 215 | let highlight = "?highlight=left,top,width,height&highlightColor=ff0000"; 216 | highlight = highlight.replace(/left/g, (raw_left -left).toString()); 217 | highlight = highlight.replace(/top/g, (raw_top - top).toString()); 218 | highlight = highlight.replace(/width/g, raw_width.toString()); 219 | highlight = highlight.replace(/height/g, raw_height.toString()); 220 | 221 | let enlarge_img_url = urls[data.data[nRow]['url_id']] + highlight; 222 | 223 | enlarge_img_url = enlarge_img_url.replace(/left/g, left.toString()); 224 | enlarge_img_url = enlarge_img_url.replace(/right/g, right.toString()); 225 | enlarge_img_url = enlarge_img_url.replace(/top/g, top.toString()); 226 | enlarge_img_url = enlarge_img_url.replace(/bottom/g,bottom.toString()); 227 | enlarge_img_url = enlarge_img_url.replace(/width/g, width.toString()); 228 | enlarge_img_url = enlarge_img_url.replace(/height/g, height.toString()); 229 | 230 | //?highlight=left,top,width,height&highlightColor=ff0000 231 | 232 | if ($('#enlarge-page-link').length == 0) { 233 | let enlarge_html = 234 | ` 235 | enlarge 236 | `; 237 | 238 | $('#preview-rgn').append($(enlarge_html)); 239 | } 240 | 241 | $("#preview-link").attr("href", enlarge_img_url); 242 | $("#enlarge-page-link").attr("href", enlarge_img_url); 243 | 244 | highlight = "?highlight=left,top,width,height&highlightColor=ff0000"; 245 | highlight = highlight.replace(/left/g, 
raw_left.toString()); 246 | highlight = highlight.replace(/top/g, raw_top.toString()); 247 | highlight = highlight.replace(/width/g, raw_width.toString()); 248 | highlight = highlight.replace(/height/g, raw_height.toString()); 249 | 250 | let full_img_url = urls[data.data[nRow]['url_id']] + highlight; 251 | 252 | width = max_right - min_left; 253 | height = max_bottom - min_top; 254 | 255 | full_img_url = full_img_url.replace("left,top,width,height", "full"); 256 | full_img_url = full_img_url.replace("left,right,top,bottom", "full"); 257 | full_img_url = full_img_url.replace("left,top,right,bottom", "full"); 258 | 259 | full_img_url = full_img_url.replace(/left/g, min_left.toString()); 260 | full_img_url = full_img_url.replace(/right/g, max_right.toString()); 261 | full_img_url = full_img_url.replace(/top/g, min_top.toString()); 262 | full_img_url = full_img_url.replace(/bottom/g, max_bottom.toString()); 263 | full_img_url = full_img_url.replace(/width/g, width.toString()); 264 | full_img_url = full_img_url.replace(/height/g, height.toString()); 265 | 266 | if ($('#full-page-link').length == 0) { 267 | $('#preview-rgn').append($('| full ')); 268 | } 269 | 270 | $("#full-page-link").attr("href", full_img_url); 271 | } 272 | 273 | function colorCodeNETag() { 274 | $(".editable").removeClass(tagClasses); 275 | 276 | $("#table td:contains('B-PER')").addClass('ner_per'); 277 | $("#table td:contains('I-PER')").addClass('ner_per'); 278 | $("#table td:contains('B-LOC')").addClass('ner_loc'); 279 | $("#table td:contains('I-LOC')").addClass('ner_loc'); 280 | $("#table td:contains('B-ORG')").addClass('ner_org'); 281 | $("#table td:contains('I-ORG')").addClass('ner_org'); 282 | $("#table td:contains('B-WORK')").addClass('ner_work'); 283 | $("#table td:contains('I-WORK')").addClass('ner_work'); 284 | $("#table td:contains('B-CONF')").addClass('ner_conf'); 285 | $("#table td:contains('I-CONF')").addClass('ner_conf'); 286 | $("#table td:contains('B-EVT')").addClass('ner_evt'); 287 |
$("#table td:contains('I-EVT')").addClass('ner_evt'); 288 | $("#table td:contains('B-TODO')").addClass('ner_todo'); 289 | $("#table td:contains('I-TODO')").addClass('ner_todo'); 290 | } 291 | 292 | function makeTdEditable(td, content) { 293 | 294 | $(td).removeClass('editable'); 295 | 296 | let tableInfo = $(td).data('tableInfo'); 297 | 298 | editingTd = { 299 | finish: 300 | function (isOk) { 301 | $(td).addClass('editable'); 302 | keyboard_listener.reset(); 303 | listener.reset(); 304 | 305 | if (isOk) { 306 | let newValue = $('#edit-area').val(); 307 | 308 | data.data[tableInfo.nRow][tableInfo.column] = newValue; 309 | 310 | sanitizeData(); 311 | notifyChange(); 312 | updateTable(); 313 | } 314 | 315 | tableInfo.fillAction($(td)); 316 | editingTd = null; 317 | 318 | $(".simple-keyboard").html(""); 319 | 320 | $(td).focus(); 321 | } 322 | }; 323 | 324 | let textArea = document.createElement('textarea'); 325 | textArea.style.width = td.clientWidth + 'px'; 326 | textArea.style.height = td.clientHeight + 'px'; 327 | textArea.className = "input" 328 | textArea.id = 'edit-area'; 329 | 330 | $(textArea).val(data.data[tableInfo.nRow][tableInfo.column]); 331 | $(td).html(''); 332 | $(td).append(textArea); 333 | textArea.focus(); 334 | 335 | let edit_html = 336 | `
337 | 338 | 339 | 341 |
`; 342 | 343 | td.insertAdjacentHTML("beforeEnd", edit_html); 344 | 345 | $('#edit-ok').on('click', 346 | function(evt) { 347 | editingTd.finish(true); 348 | }); 349 | 350 | $('#edit-cancel').on('click', 351 | function(evt) { 352 | editingTd.finish(false); 353 | }); 354 | 355 | let listener = new window.keypress.Listener($('#edit-area'), listener_defaults); 356 | 357 | listener.simple_combo('enter', function() { $('#edit-ok').click(); } ); 358 | listener.simple_combo('esc', function() { $('#edit-cancel').click(); } ); 359 | listener.simple_combo('ctrl', function() { toggleLayout(); } ); 360 | 361 | let keyboard_listener = new window.keypress.Listener($('#simple-keyboard'), listener_defaults); 362 | 363 | keyboard_listener.simple_combo('enter', function() { $('#edit-ok').click(); } ); 364 | keyboard_listener.simple_combo('esc', function() { $('#edit-cancel').click(); } ); 365 | keyboard_listener.simple_combo('ctrl', function() { toggleLayout(); } ); 366 | 367 | let Keyboard = window.SimpleKeyboard.default; 368 | 369 | function onChange(input) { 370 | document.querySelector("#edit-area").value = input; 371 | } 372 | 373 | function toggleLayout() { 374 | let currentLayout = keyboard.options.layoutName; 375 | let layoutToggle = currentLayout === "default" ? 
"layout1" : "default"; 376 | 377 | keyboard.setOptions({ layoutName: layoutToggle}); 378 | } 379 | 380 | let keyboard = 381 | new Keyboard( 382 | { onChange: input => onChange(input), 383 | layout: { 384 | 'default': [ 385 | "\u010C \u010D \u0108 \u0109 \u011C \u011D \u0124 \u0125 \u0134 \u0135 \u015C \u015D \u0174 \u0175 \u0176", 386 | "\u0158 \u0159 \u0160 \u0161 \u010E \u011A \u011B \u0147 \u0148 \u0164 \u0102 \u0103 \u0114 \u0115 \u011E", 387 | "\u011F \u012C \u012D \u014E \u014F \u016C \u00D2 \u00D5 \u00D6 \u00D8 \u00F2 \u00F5 \u00F6 \uE1DC \uE5DC", 388 | "\u0177 \u0104 \u0105 \u0106 \u0143 \u0144 \u015A \u015B \u0179 \u017A \u017B \uF1AC \u00AD \u00AC \u00BD", 389 | "\u00C0 \u00C3 \u00C4 \u00C6 \u00E0 \u00E3 \u00E4 \u00E6 \u0101 \u023A \u2C65 \uE42C \uEFA1 \uF500 \uF532", 390 | "\u0253 \uF524 \u00C7 \u00E7 \u0107 \uEEC4 \uEEC5 \uF501 \uF502 \uF517 \uF520 \uF522 \uF531 \uF50A \uF51B", 391 | "\u00C8 \u00C9 \u00CB \u00E8 \u00E9 \u00EB \u0113 \u0118 \u0119 \u0256 \u0247 \u1EBD \u204A \uE4E1 \uF158", 392 | "\uF219 \uF515 \uFB00 \uFB01 \uFB02 \uFB03 \uA7A0 \uA7A1 \uF504 \uF505 \uF506 \uF521 \uF525 \u00CD \u00ED", 393 | "\u00EF \u0129 \u012B \u0133 \uA76D \uF220 \uF533 \uEBE3 \uA742 \uA743 \uA7A2 \uA7A3 \u0141 \u0142 \uF4F9", 394 | "\uF50B \uE5B8 \uF519 \u00D1 \u00F1 \uA7A4 \uA7A5 {bksp}"], 395 | 'layout1': [ 396 | "\u00F8 \u014D \u0153 \uE644 \uA750 \uA751 \uA752 \uA753 \uE665 \uEED6 \uEED7 \uF51F \uF526 \uF529 \uA756 \uA757", 397 | "\uA759 \uE282 \uE681 \uE682 \uE68B \uE8BF \uF508 \uF509 \uF50C \uF50D \uF50E \uF50F \uF51A \uF523 \uF52F \uF535", 398 | "\u211F \uA75C \uA75D \uA7A6 \uA7A7 \uF510 \uF518 \uF536 \u00DF \u017F \u1E9C \u1E9E \uEADA \uEBA2 \uEBA3 \uEBA6", 399 | "\uEBA7 \uEBAC \uF4FC \uF4FF \uF511 \uF51E \uF528 \uF52C \uFB06 \uE6E2 \uEEDC \uF512 \uF537 \u00D9 \u00DC \u00F9", 400 | "\u00FC \u0169 \u016B \u016D \u016E \u016F \uA770 \uE72B \uF1A5 \uF1A6 \uF534 \uE73A \uE8BA \uF513 \uF527 \uF514", 401 | "\u1EF9 \uE781 \uF52A \uF52B \u017C \u017D \u017E \uF516 
\uF51D \u1F51 \u2042 \u2184 \u2234 \u261c \u261E \u2767", 402 | "\u2010 \u2011 \u2E17 \uF161 \uF51C \uF52D \uF538 \uFFFD \uA75B {bksp}"] 403 | } 404 | }); 405 | 406 | keyboard.setInput($(textArea).val()); 407 | 408 | $(textArea).on('input', function() { 409 | keyboard.setInput($(textArea).val()); 410 | }); 411 | } 412 | 413 | function sanitizeData() { 414 | let last_url_id = 0; 415 | let word_pos = 0; 416 | for(let i = 0; i < data.data.length; i++){ 417 | 418 | min_left = (parseInt(data.data[i]['left']) < min_left) ? parseInt(data.data[i]['left']) : min_left; 419 | max_right= (parseInt(data.data[i]['right']) > max_right) ? parseInt(data.data[i]['right']) : max_right; 420 | 421 | min_top = (parseInt(data.data[i]['top']) < min_top) ? parseInt(data.data[i]['top']) : min_top; 422 | max_bottom = (parseInt(data.data[i]['bottom']) > max_bottom) ? parseInt(data.data[i]['bottom']) : max_bottom; 423 | 424 | let token_col = "TOKEN"; 425 | if (data.meta.fields.includes('TEXT')) { 426 | token_col = "TEXT"; 427 | } 428 | 429 | if ((data.data[i][token_col] == null) || (data.data[i][token_col].toString().length == 0)){ 430 | word_pos = 0; 431 | } 432 | 433 | if (data.meta.fields.includes('No.')) { 434 | if (data.data[i]['No.'] < word_pos) { 435 | word_pos = 0; 436 | } 437 | data.data[i]['No.'] = word_pos; 438 | } 439 | 440 | if (data.data[i][token_col] == null) data.data[i][token_col] = ''; 441 | data.data[i][token_col] = data.data[i][token_col].toString().replace(/(\r\n|\n|\r)/gm, ""); 442 | 443 | if (data.meta.fields.includes('NE-TAG')) { 444 | if (data.data[i]['ID'] == null) data.data[i]['ID'] = ''; 445 | if (data.data[i]['NE-TAG'] == null) data.data[i]['NE-TAG'] = ''; 446 | if (data.data[i]['NE-EMB'] == null) data.data[i]['NE-EMB'] = ''; 447 | 448 | data.data[i]['ID'] = data.data[i]['ID'].toString().replace(/(\r\n|\n|\r)/gm, ""); 449 | data.data[i]['NE-TAG'] = data.data[i]['NE-TAG'].toString().replace(/(\r\n|\n|\r)/gm, ""); 450 | data.data[i]['NE-EMB'] = 
data.data[i]['NE-EMB'].toString().replace(/(\r\n|\n|\r)/gm, ""); 451 | } 452 | 453 | if (data.meta.fields.includes('url_id')) { 454 | if (typeof data.data[i]['url_id'] === 'string' || data.data[i]['url_id'] instanceof String) { 455 | 456 | let num = parseInt(data.data[i]['url_id']); 457 | 458 | if (!isNaN(num)) { 459 | last_url_id = num; 460 | } 461 | 462 | data.data[i]['url_id'] = last_url_id; 463 | } 464 | else { 465 | last_url_id = data.data[i]['url_id']; 466 | } 467 | } 468 | 469 | word_pos++; 470 | } 471 | } 472 | 473 | function tableEditAction(nRow, action) { 474 | 475 | if (editingTd != null) return; 476 | 477 | if (action == null) return; 478 | 479 | if (data.data[nRow]['TOKEN'] == null) data.data[nRow]['TOKEN'] = ''; 480 | 481 | if (action.includes('merge')) { 482 | 483 | if (nRow < 1) return; 484 | 485 | if (data.data[nRow - 1]['TOKEN'] == null) data.data[nRow - 1]['TOKEN'] = ''; 486 | 487 | data.data[nRow - 1]['TOKEN'] = 488 | data.data[nRow - 1]['TOKEN'].toString() + data.data[nRow]['TOKEN'].toString(); 489 | 490 | if (data.meta.fields.includes('No.')) { 491 | 492 | let word_pos=data.data[nRow]['No.']; 493 | 494 | data.data.splice(nRow, 1); 495 | 496 | for(let i = nRow; i < data.data.length; i++){ 497 | if (data.data[i]['No.'] <= word_pos) { 498 | break; 499 | } 500 | else { 501 | data.data[i]['No.'] = word_pos; 502 | word_pos++; 503 | } 504 | } 505 | } 506 | else { 507 | data.data.splice(nRow, 1); 508 | } 509 | } 510 | else if (action.includes('split')) { 511 | 512 | data.data.splice(nRow, 0, JSON.parse(JSON.stringify(data.data[nRow]))); 513 | 514 | if (data.meta.fields.includes('No.')) { 515 | 516 | data.data[nRow + 1]['No.'] = data.data[nRow]['No.'] + 1; 517 | let word_pos=data.data[nRow]['No.']; 518 | for(let i = nRow+1; i < data.data.length; i++){ 519 | if (data.data[i]['No.'] <= data.data[nRow]['No.']) { 520 | break; 521 | } 522 | else { 523 | data.data[i]['No.'] = word_pos+1; 524 | word_pos++; 525 | } 526 | } 527 | } 528 | } 529 | else if 
(action.includes('delete')) { 530 | 531 | if (data.meta.fields.includes('No.')) { 532 | 533 | let word_pos=data.data[nRow]['No.']; 534 | 535 | data.data.splice(nRow, 1); 536 | 537 | for(let i = nRow; i < data.data.length; i++){ 538 | if (data.data[i]['No.'] <= word_pos) { 539 | break; 540 | } 541 | else { 542 | data.data[i]['No.'] = word_pos; 543 | word_pos++; 544 | } 545 | } 546 | } 547 | else { 548 | data.data.splice(nRow, 1); 549 | } 550 | } 551 | else if (action.includes('sentence')) { 552 | 553 | if (data.meta.fields.includes('No.')) { 554 | if (data.data[nRow]['No.'] == 0) return; 555 | } 556 | 557 | let new_line = JSON.parse(JSON.stringify(data.data[nRow])); 558 | new_line['TOKEN'] = ''; 559 | new_line['NE-TAG'] = 'O'; 560 | new_line['NE-EMB'] = 'O'; 561 | new_line['ID'] = ''; 562 | 563 | data.data.splice(nRow, 0, new_line); 564 | 565 | if (data.meta.fields.includes('No.')) { 566 | 567 | data.data[nRow]['No.'] = 0; 568 | let word_pos=0; 569 | for(let i = nRow+1; i < data.data.length; i++){ 570 | if (data.data[i]['No.'] <= word_pos) { 571 | break; 572 | } 573 | else { 574 | data.data[i]['No.'] = word_pos+1; 575 | word_pos++; 576 | } 577 | } 578 | } 579 | } 580 | 581 | sanitizeData(); 582 | notifyChange(); 583 | updateTable(); 584 | } 585 | 586 | function makeLineSplitMerge(td) { 587 | 588 | let tableInfo = $(td).data('tableInfo'); 589 | 590 | editingTd = { 591 | data: data.data[tableInfo.nRow][tableInfo.column], 592 | finish: function(action, isOk) { 593 | 594 | $(td).html(editingTd.data); 595 | $(td).addClass('editable'); 596 | 597 | editingTd = null; 598 | 599 | tableEditAction(tableInfo.nRow, action) 600 | 601 | $(td).focus(); 602 | } 603 | }; 604 | 605 | let edit_html = ` 606 |
607 |
↕  split
608 |
⟳ merge
609 |
☇ sentence
610 |
ⓧ delete
611 |
`; 612 | 613 | $(td).removeClass('editable'); 614 | $(td).html(edit_html); 615 | 616 | $('#tokenizer').mouseleave( function(event) { editingTd.finish(null, false); }); 617 | 618 | $('.tokenizer-action').click(function(event) { editingTd.finish($(event.target).text(), true); }); 619 | } 620 | 621 | function makeTagEdit(td) { 622 | 623 | let tableInfo = $(td).data('tableInfo'); 624 | 625 | editingTd = { 626 | data: data.data[tableInfo.nRow][tableInfo.column], 627 | finish: function(isOk) { 628 | tableInfo.fillAction($(td)) 629 | 630 | $(td).addClass('editable'); 631 | 632 | editingTd = null; 633 | 634 | colorCodeNETag(); 635 | 636 | notifyChange(); 637 | } 638 | }; 639 | 640 | let edit_html = ` 641 |
642 |
O 643 |
644 |
B 645 |
646 |
B-PER
647 |
B-LOC
648 |
B-ORG
649 |
B-WORK
650 |
B-CONF
651 |
B-EVT
652 |
B-TODO
653 |
654 |
655 |
I 656 |
657 |
I-PER
658 |
I-LOC
659 |
I-ORG
660 |
I-WORK
661 |
I-CONF
662 |
I-EVT
663 |
I-TODO
664 |
665 |
666 |
667 | `; 668 | 669 | $(td).removeClass('editable'); 670 | $(td).html(edit_html); 671 | $('#tagger').mouseleave( function(event) { editingTd.finish(false); }); 672 | 673 | $('.type_select').click( 674 | function(event) { 675 | data.data[tableInfo.nRow][tableInfo.column] = $(event.target).text().trim(); 676 | 677 | editingTd.finish(true); 678 | }); 679 | } 680 | 681 | function createTable() { 682 | 683 | sanitizeData(); 684 | 685 | let editable_html =``; 686 | 687 | $.each(data.data, 688 | function(nRow, el) { 689 | 690 | if (nRow < startIndex) return; 691 | if (nRow >= endIndex) return; 692 | 693 | let row = $("").data('tableInfo', { 'nRow': nRow }); 694 | 695 | row.focusin( 696 | function() { 697 | updatePreview(row.data('tableInfo').nRow); 698 | 699 | $('#preview-rgn').css('transform', 'translate(0,' + (row.position().top + row.height()/2) + 'px)' 700 | + ' translate(0%,-50%)'); 701 | }); 702 | 703 | row.append($(''). 704 | text(nRow). 705 | data('tableInfo', { 'nRow': nRow }) 706 | ); 707 | 708 | let row_listener = new window.keypress.Listener(row, listener_defaults); 709 | 710 | row_listener.register_many( 711 | [ 712 | { 713 | keys: 's t', 714 | on_keydown: 715 | function() { 716 | if (editingTd != null) return true; 717 | 718 | tableEditAction(row.data('tableInfo').nRow, 'sentence'); 719 | }, 720 | is_sequence: true, 721 | is_solitary: true, 722 | is_exclusive: true 723 | }, 724 | 725 | { 726 | keys: 's p', 727 | on_keydown: 728 | function() { 729 | if (editingTd != null) return true; 730 | 731 | tableEditAction(row.data('tableInfo').nRow, 'split'); 732 | }, 733 | is_sequence: true, 734 | is_solitary: true, 735 | is_exclusive: true 736 | }, 737 | 738 | { 739 | keys: 'm e', 740 | on_keydown: 741 | function() { 742 | if (editingTd != null) return true; 743 | 744 | tableEditAction(row.data('tableInfo').nRow, 'merge'); 745 | }, 746 | is_sequence: true, 747 | is_solitary: true, 748 | is_exclusive: true 749 | }, 750 | 751 | { 752 | keys: 'd l', 753 | on_keydown: 754 
| function() { 755 | if (editingTd != null) return true; 756 | 757 | tableEditAction(row.data('tableInfo').nRow, 'delete'); 758 | }, 759 | is_sequence: true, 760 | is_solitary: true, 761 | is_exclusive: true 762 | } 763 | ] 764 | ); 765 | 766 | $.each(el, 767 | function(column, content) { 768 | 769 | let td = $(editable_html) 770 | 771 | let listener = new window.keypress.Listener(td, listener_defaults); 772 | 773 | if (do_not_display.has(column)) return 774 | 775 | let clickAction = function() { console.log('Do something different');} 776 | 777 | let fillAction = (function(column) { 778 | return function(td) { 779 | let tableInfo = $(td).data('tableInfo'); 780 | 781 | let content = data.data[tableInfo.nRow][tableInfo.column]; 782 | 783 | td.text(content); 784 | 785 | if ( ((column == 'TEXT') || (column == 'TOKEN')) 786 | && (data.meta.fields.includes('ocrconf'))) { 787 | 788 | td.css('background-color', data.data[tableInfo.nRow]['ocrconf']); 789 | } 790 | 791 | }; })(column); 792 | 793 | let head_html = ` 794 | 795 |
${column}
796 | `; 797 | 798 | if (!($("th#" + column.replace(/\./g, "\\.")).length)) { 799 | $("#tablehead").append(head_html); 800 | } 801 | 802 | if (column == 'No.') { 803 | clickAction = makeLineSplitMerge; 804 | } 805 | else if ((column == 'TEXT') || (column == 'TOKEN') || (column == 'ID')) { 806 | 807 | clickAction = makeTdEditable; 808 | 809 | listener.simple_combo('enter', function() { $(td).click(); }); 810 | 811 | if (column == 'ID') { 812 | fillAction = 813 | function(td) { 814 | 815 | let tableInfo = $(td).data('tableInfo'); 816 | 817 | let content = data.data[tableInfo.nRow]['ID']; 818 | 819 | if (String(content).match(/^Q[0-9]+.*/g) == null) { 820 | td.text(content); 821 | } 822 | else { 823 | td.html(""); 824 | 825 | let reg = /.*?(Q[0-9]+).*?/g; 826 | let element; 827 | let count = 0; 828 | while((element = reg.exec(content)) !== null) { 829 | 830 | if (count > 2) break; 831 | 832 | //console.log(element); 833 | let link = $('' + 835 | element[1] + "") 836 | link.click( 837 | function(event) { 838 | event.stopPropagation(); 839 | } 840 | ); 841 | 842 | td.append(link); 843 | td.append($("
")) 844 | count++; 845 | } 846 | } 847 | }; 848 | } 849 | } 850 | else if ((column == 'NE-TAG') || (column == 'NE-EMB')) { 851 | clickAction = makeTagEdit; 852 | 853 | function tagAction(tag) { 854 | 855 | tableInfo = $(td).data('tableInfo'); 856 | 857 | data.data[tableInfo.nRow][tableInfo.column] = tag; 858 | 859 | td.html(tag); 860 | colorCodeNETag(); 861 | notifyChange(); 862 | }; 863 | 864 | listener.sequence_combo('b p', function() { tagAction('B-PER'); }); 865 | listener.sequence_combo('b l', function() { tagAction('B-LOC'); }); 866 | listener.sequence_combo('b o', function() { tagAction('B-ORG'); }); 867 | listener.sequence_combo('b w', function() { tagAction('B-WORK'); }); 868 | listener.sequence_combo('b c', function() { tagAction('B-CONF'); }); 869 | listener.sequence_combo('b e', function() { tagAction('B-EVT'); }); 870 | listener.sequence_combo('b t', function() { tagAction('B-TODO'); }); 871 | 872 | listener.sequence_combo('i p', function() { tagAction('I-PER'); }); 873 | listener.sequence_combo('i l', function() { tagAction('I-LOC'); }); 874 | listener.sequence_combo('i o', function() { tagAction('I-ORG'); }); 875 | listener.sequence_combo('i w', function() { tagAction('I-WORK'); }); 876 | listener.sequence_combo('i c', function() { tagAction('I-CONF'); }); 877 | listener.sequence_combo('i e', function() { tagAction('I-EVT'); }); 878 | listener.sequence_combo('i t', function() { tagAction('I-TODO'); }); 879 | 880 | listener.sequence_combo('backspace', function() { tagAction('O'); }); 881 | } 882 | 883 | td.attr('tabindex', 0). 
884 | data('tableInfo', 885 | {'nRow': nRow, 886 | 'column': column , 887 | 'clickAction': clickAction, 888 | 'fillAction': fillAction 889 | }); 890 | 891 | fillAction(td); 892 | 893 | row.append(td); 894 | }); 895 | 896 | $("#table tbody").append(row); 897 | }); 898 | 899 | colorCodeNETag(); 900 | 901 | $(".hover").on('mouseover', 902 | function (evt) { 903 | 904 | if (editingTd != null) return; 905 | 906 | $(evt.target).focus(); 907 | } 908 | ); 909 | 910 | if ($("#docpos").val() != startIndex) { 911 | 912 | $("#docpos").val(data.data.length - startIndex); 913 | } 914 | } 915 | 916 | function updateTable() { 917 | 918 | editingTd = null; 919 | 920 | let rows = $('tbody').children('tr'); 921 | 922 | let pRow = 0; 923 | 924 | $.each(data.data, 925 | function(nRow, el) { 926 | 927 | if (nRow < startIndex) return; 928 | if (nRow >= endIndex) return; 929 | 930 | 931 | let row = $(rows[pRow]); 932 | let tableInfo = row.data('tableInfo'); 933 | 934 | tableInfo.nRow = nRow; 935 | 936 | row.data('tableInfo', tableInfo); 937 | 938 | let loc = $(row.children('td').first()); 939 | loc.data('tableInfo', tableInfo); 940 | loc.text(nRow); 941 | 942 | let columns = $(rows[pRow]).children('.editable'); 943 | let pColumn = 0; 944 | 945 | $.each(el, 946 | function(column_name, content) { 947 | 948 | if (do_not_display.has(column_name)) return; 949 | 950 | let td = $(columns[pColumn]); 951 | 952 | tableInfo = td.data('tableInfo'); 953 | 954 | tableInfo.nRow = nRow; 955 | 956 | td.data('tableInfo', tableInfo); 957 | 958 | tableInfo.fillAction(td); 959 | 960 | pColumn++; 961 | }); 962 | 963 | pRow++; 964 | }); 965 | 966 | colorCodeNETag(); 967 | 968 | if ($("#docpos").val() != startIndex) { 969 | 970 | $("#docpos").val(data.data.length - startIndex); 971 | } 972 | 973 | if ($(':focus').data('tableInfo')) 974 | updatePreview($(':focus').data('tableInfo').nRow); 975 | } 976 | 977 | function saveFile(evt) { 978 | 979 | let csv = 980 | Papa.unparse(data, 981 | { 982 | header: true, 983 | 
                    delimiter: '\t',
                    comments: "#",
                    quoteChar: "",
                    escapeChar: "",
                    //quoteChar: String.fromCharCode(0),
                    //escapeChar: String.fromCharCode(0),
                    skipEmptyLines: true,
                    dynamicTyping: true
                });

        let lines = csv.split(/\r\n|\n/);

        csv = [ lines[0] ];

        let url_id = -1;

        for (let i = 0; i < data.data.length; i++) {
            if (data.data[i]['url_id'] > url_id) {

                url_id = data.data[i]['url_id'];

                if (urls != null)
                    csv.push("# " + urls[url_id]);
            }
            csv.push(lines[i+1]);
        }

        csv = csv.join('\n');

        openSaveFileDialog (csv, file.name, null);

        checkForSave(csv);
    }

    function openSaveFileDialog (data, filename, mimetype) {

        if (!data) return;

        let blob = data.constructor !== Blob
            ? new Blob([data], {type: mimetype || 'application/octet-stream'})
            : data ;

        if (navigator.msSaveBlob) {
            navigator.msSaveBlob(blob, filename);
            return;
        }

        let lnk = document.createElement('a'),
            url = window.URL,
            objectURL;

        if (mimetype) {
            lnk.type = mimetype;
        }

        lnk.download = filename || 'untitled';
        lnk.href = objectURL = url.createObjectURL(blob);
        lnk.dispatchEvent(new MouseEvent('click'));
        setTimeout(url.revokeObjectURL.bind(url, objectURL));
    }

    function stepsBackward (nrows) {

        if (editingTd != null) return;

        if (startIndex >= nrows) {
            startIndex -= nrows;
            endIndex -= nrows;
        }
        else {
            startIndex = 0;
            endIndex = displayRows;
        }

        updateTable();
    }

    function stepsForward(nrows) {

        if (editingTd != null) return;

        if (endIndex + nrows < data.data.length) {
            endIndex += nrows;
            startIndex =
                endIndex - displayRows;
        }
        else {
            endIndex = data.data.length;
            startIndex = endIndex - displayRows;
        }

        updateTable();
    }

    function init() {

        $("#tableregion").empty();

        $("#btn-region").empty();

        $("#file-region").empty();

        $("#region-right").empty();

        let range_html =
            `
            `;

        $("#region-right").html(range_html);

        $("#docpos").change(
            function(evt) {

                if (editingTd != null) {
                    editingTd.finish(true);
                }

                if (startIndex == data.data.length - this.value) return;

                startIndex = data.data.length - this.value;
                endIndex = startIndex + displayRows;

                updateTable();
            });

        $('#docpos').slider();

        let table_html =
            `
                        LOCATION
            `;

        let save_html =
            ``;

        $("#tableregion").html(table_html);

        $("#btn-region").html(save_html);

        $("#save").attr('disabled', !has_changes);

        let parts = file.name.split(/(?=[\.|\-|_])/);

        // Soft hyphens let long file names wrap at '.', '-' and '_'.
        let heading = parts.join("&shy;");

        $("#file-region").html('' + heading + '');

        $('.saveButton').on('click', saveFile);

        $('#table').on('click',
            function(event) {

                let target = event.target.closest('.editable');

                if (target == null) return;

                if (editingTd) {

                    // Compare DOM elements: a DOM node never equals a jQuery object.
                    if (target === $(':focus')[0]) return;
                    if ($.contains($(':focus')[0], target)) return;
                    if ($.contains(target, $(':focus')[0])) return;
                    if ($.contains($('.simple-keyboard')[0], event.target)) return;

                    let refocus = $(':focus');

                    editingTd.finish(true);

                    refocus.focus();

                }

                if (!$.contains($('#table')[0], target)) return;

                $(target).data('tableInfo').clickAction(target);
            });


        createTable();

        let prev_button_html = `
            `;

        let next_button_html = `
            `;

        $("#location").prepend(prev_button_html);
        $("#tablehead").children().last().children().last().append(next_button_html);

        $('#back').on('click', function() { stepsBackward(displayRows); } );
        $('#next').on('click', function() { stepsForward(displayRows); } );

    }

    $('#tableregion')[0].addEventListener("wheel",
        function(event) {

            if (editingTd != null) return true;

            if (event.deltaY < 0) stepsBackward(1);
            else stepsForward(1);
        });

    wnd_listener.simple_combo('tab',
        function () {
            if (editingTd != null)
                return false; // If we are in editing mode, we do not want to propagate the TAB event.
            else return true; // In non-editing mode, we want to get the "normal" tab behaviour.
        });

    wnd_listener.simple_combo('pageup',
        function() {
            if (editingTd != null) return true;

            $('#back').click();
        });

    wnd_listener.simple_combo('pagedown',
        function() {
            if (editingTd != null) return true;

            $('#next').click();
        });

    wnd_listener.simple_combo('left',
        function() {
            if (editingTd != null) return true;

            let prev = $(':focus').prev('.editable');

            if (prev.length==0) {
                $(':focus').closest('tr').prev('tr').children('.editable').last().focus();
            }
            else {
                prev.focus();
            }
        });

    wnd_listener.simple_combo('right',
        function() {
            if (editingTd != null) return true;

            let next = $(':focus').next('.editable');

            if (next.length==0) {
                $(':focus').closest('tr').next('tr').children('.editable').first().focus();
            }
            else {
                next.focus();
            }
        });

    wnd_listener.register_combo(
        {
            keys: 'meta up',
            on_keydown:
                function() {
                    if (editingTd != null) return true;

                    stepsBackward(1);
                },
            is_solitary: true
        }
    );

    wnd_listener.register_combo(
        {
            keys: 'up',

            on_keydown:
                function() {
                    if (editingTd != null) return true;

                    let prev = $(':focus').closest('tr').prev('tr');

                    let pos = $(':focus').closest('tr').children('.editable').index($(':focus'));

                    if (prev.length==0) {
                        stepsBackward(1);
                    }
                    else {
                        prev.children('.editable')[pos].focus();
                    }
                },
            is_solitary: true
        });

    wnd_listener.register_combo(
        {
            keys: 'meta down',

            on_keydown: function() {
                if (editingTd != null) return true;

                stepsForward(1);
            },
            is_solitary: true
        }
    );

    wnd_listener.register_combo(
        {
            keys: 'down',
            on_keydown:
                function() {
                    if (editingTd != null) return true;

                    let next = $(':focus').closest('tr').next('tr');

                    let pos = $(':focus').closest('tr').children('.editable').index($(':focus'));

                    if (next.length==0) {
                        stepsForward(1);
                    }
                    else {
                        next.children('.editable')[pos].focus();
                    }
                },
            is_solitary: true,
        }
    );


    wnd_listener.sequence_combo('l a',
        function() {

            if (editingTd != null) return true;

            displayRows++;

            endIndex = startIndex + displayRows;

            if (endIndex >= data.data.length) {
                startIndex = data.data.length - displayRows;
                endIndex = data.data.length;
            }

            slider_min = displayRows;
            slider_max = data.data.length;

            init();
        });

    wnd_listener.sequence_combo('l r',
        function() {

            if (editingTd != null) return true;

            if (displayRows > 5) displayRows--;

            endIndex = startIndex + displayRows;
            slider_min = displayRows;
            slider_max = data.data.length;

            init();
        });

    // public interface
    let that =
        {
            hasChanges: function () { return has_changes; }
        };

    init();

    return that;
}


$(document).ready(
    function() {

        $('#tsv-file').change(
            function(evt) {

                loadFile ( evt,
                    function(results, file, urls) {

                        let neat = setupInterface(results, file, urls);

                        $(window).bind("beforeunload",
                            function() {

                                console.log(neat.hasChanges());

                                // Browsers ignore confirm() during beforeunload. Returning a
                                // string asks the browser to show its own leave-page prompt.
                                if (neat.hasChanges())
                                    return "You have unsaved changes. Do you want to save them before leaving?";
                            }
                        );
                    })
            }
        );
    }
);
--------------------------------------------------------------------------------