├── .DS_Store
├── LICENSE.txt
├── README.md
├── _LREC_2020__I_Feel_Offended__Don_t_Be_Abusive_camera-ready.pdf
├── abusive_explicit_implicit_tree-2.png
├── data
    ├── .DS_Store
    ├── abuseval_labels
    │   ├── abuseval_offenseval_test.tsv
    │   └── abuseval_offenseval_train.tsv
    └── offenseval_explicit_implicit
    │   ├── offenseval_exp_imp_test.tsv
    │   └── offenseval_exp_imp_train.tsv
├── dictionary-based_experiments
    ├── .DS_Store
    ├── .idea
    │   ├── dictionary-based_experiments.iml
    │   ├── misc.xml
    │   ├── modules.xml
    │   └── workspace.xml
    ├── classify_abusivevalpy.py
    ├── classify_offenseval.py
    └── evaluation.py
└── keywords
    ├── .DS_Store
    ├── keywords_offenseval_test.txt
    └── keywords_offenseval_train.txt

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tommasoc80/AbuseEval/73401ae84c703b7471197385c26f64848535a722/.DS_Store

--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | ======================================================================= 2 | 3 | Attribution-NonCommercial-ShareAlike 4.0 International 4 | 5 | ======================================================================= 6 | Creative Commons Corporation (“Creative Commons”) is not a law firm and does not provide legal services or legal advice. 7 | Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. 8 | Creative Commons makes its licenses and related information available on an “as-is” basis. 9 | Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. 10 | Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible.
11 | 12 | Using Creative Commons Public Licenses 13 | 14 | Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses. 15 | 16 | Considerations for licensors: Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC-licensed material, or material used under an exception or limitation to copyright. More considerations for licensors. 17 | Considerations for the public: By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the licensor’s permission is not necessary for any reason–for example, because of any applicable exception or limitation to copyright–then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. 
More considerations for the public. 18 | 19 | ======================================================================= 20 | 21 | Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License 22 | 23 | By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. 24 | 25 | Section 1 – Definitions. 26 | 27 | Adapted Material means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. 28 | Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. 29 | BY-NC-SA Compatible License means a license listed at creativecommons.org/compatiblelicenses, approved by Creative Commons as essentially the equivalent of this Public License. 
30 | Copyright and Similar Rights means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights. 31 | Effective Technological Measures means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. 32 | Exceptions and Limitations means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. 33 | License Elements means the license attributes listed in the name of a Creative Commons Public License. The License Elements of this Public License are Attribution, NonCommercial, and ShareAlike. 34 | Licensed Material means the artistic or literary work, database, or other material to which the Licensor applied this Public License. 35 | Licensed Rights means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. 36 | Licensor means the individual(s) or entity(ies) granting rights under this Public License. 37 | NonCommercial means not primarily intended for or directed towards commercial advantage or monetary compensation. For purposes of this Public License, the exchange of the Licensed Material for other material subject to Copyright and Similar Rights by digital file-sharing or similar means is NonCommercial provided there is no payment of monetary compensation in connection with the exchange. 
38 | Share means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. 39 | Sui Generis Database Rights means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. 40 | You means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning. 41 | 42 | Section 2 – Scope. 43 | 44 | License grant. 45 | Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: 46 | reproduce and Share the Licensed Material, in whole or in part, for NonCommercial purposes only; and 47 | produce, reproduce, and Share Adapted Material for NonCommercial purposes only. 48 | Exceptions and Limitations. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. 49 | Term. The term of this Public License is specified in Section 6(a). 50 | Media and formats; technical modifications allowed. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. 
The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material. 51 | Downstream recipients. 52 | Offer from the Licensor – Licensed Material. Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. 53 | Additional offer from the Licensor – Adapted Material. Every recipient of Adapted Material from You automatically receives an offer from the Licensor to exercise the Licensed Rights in the Adapted Material under the conditions of the Adapter’s License You apply. 54 | No downstream restrictions. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. 55 | No endorsement. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i). 56 | Other rights. 57 | 58 | Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. 
59 | Patent and trademark rights are not licensed under this Public License. 60 | To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties, including when the Licensed Material is used other than for NonCommercial purposes. 61 | 62 | Section 3 – License Conditions. 63 | 64 | Your exercise of the Licensed Rights is expressly made subject to the following conditions. 65 | 66 | Attribution. 67 | 68 | If You Share the Licensed Material (including in modified form), You must: 69 | 70 | retain the following if it is supplied by the Licensor with the Licensed Material: 71 | identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); 72 | a copyright notice; 73 | a notice that refers to this Public License; 74 | a notice that refers to the disclaimer of warranties; 75 | a URI or hyperlink to the Licensed Material to the extent reasonably practicable; 76 | indicate if You modified the Licensed Material and retain an indication of any previous modifications; and 77 | indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. 78 | You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. 79 | If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable. 
80 | ShareAlike. 81 | In addition to the conditions in Section 3(a), if You Share Adapted Material You produce, the following conditions also apply. 82 | 83 | The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-NC-SA Compatible License. 84 | You must include the text of, or the URI or hyperlink to, the Adapter's License You apply. You may satisfy this condition in any reasonable manner based on the medium, means, and context in which You Share Adapted Material. 85 | You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply. 86 | 87 | Section 4 – Sui Generis Database Rights. 88 | 89 | Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: 90 | 91 | for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database for NonCommercial purposes only; 92 | if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material, including for purposes of Section 3(b); and 93 | You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database. 94 | For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. 95 | 96 | Section 5 – Disclaimer of Warranties and Limitation of Liability. 
97 | 98 | Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You. 99 | To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You. 100 | The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. 101 | 102 | Section 6 – Term and Termination. 103 | 104 | This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. 105 | Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates: 106 | 107 | automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or 108 | upon express reinstatement by the Licensor. 
109 | For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. 110 | For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. 111 | Sections 1, 5, 6, 7, and 8 survive termination of this Public License. 112 | 113 | Section 7 – Other Terms and Conditions. 114 | 115 | The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. 116 | Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. 117 | Section 8 – Interpretation. 118 | 119 | For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. 120 | To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. 121 | No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. 122 | Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. 123 | Creative Commons is not a party to its public licenses. 
Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” The text of the Creative Commons public licenses is dedicated to the public domain under the CC0 Public Domain Dedication. Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at creativecommons.org/policies, Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses. 124 | 125 | Creative Commons may be contacted at creativecommons.org. 
126 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # AbuseEval
2 |
3 | [![DOI](https://zenodo.org/badge/224515599.svg)](https://zenodo.org/badge/latestdoi/224515599)
4 |
5 | Data set for the LREC 2020 paper ["I Feel Offended, Don't Be Abusive!"](_LREC_2020__I_Feel_Offended__Don_t_Be_Abusive_camera-ready.pdf)
6 |
7 | The repository is structured as follows:
8 |
9 | - data/ : enriched versions of the OffensEval/OLID dataset, with the distinction between explicit and implicit offensive messages (./data/offenseval_explicit_implicit) and the newly proposed annotations of abusive messages (./data/abuseval_labels)
10 | - dictionary-based_experiments/ : scripts to replicate the dictionary-based experiments reported in the paper (OffensEval sub-task A and AbuseEval binary classification)
11 | - keywords/ : the top 50 keywords per class (offensive and not offensive messages) extracted from the OffensEval sub-task A training and test data
12 |
13 | OLID/OffensEval Data: https://competitions.codalab.org/competitions/20011
14 |
15 |
16 | # Data Statement ([Bender and Friedman, 2018](https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00041))
17 |
18 | The explicit/implicit labels in OffensEval were annotated by two annotators, a man (38, Italian) and a woman (39, Serbian), both highly educated, with a background in computational linguistics, and familiar with Twitter.
19 |
20 | The inter-annotator agreement study for AbuseEval involved three annotators: one man (38, Italian) and two women (39, Serbian; 23, Russian), all highly educated, with a background in computational linguistics, and familiar with Twitter.
The full annotation of AbuseEval was carried out by a single annotator (23, Russian), highly educated and with a background in computational linguistics.
21 |
22 | All ages refer to the time of annotation (2019).
23 |
24 | # References
25 | ```
26 | @inproceedings{zampierietal2019,
27 | title={{Predicting the Type and Target of Offensive Posts in Social Media}},
28 | author={Zampieri, Marcos and Malmasi, Shervin and Nakov, Preslav and Rosenthal, Sara and Farra, Noura and Kumar, Ritesh},
29 | booktitle={Proceedings of NAACL},
30 | year={2019}
31 | }
32 |
33 | @inproceedings{casellietal2020,
34 | title={{I Feel Offended, Don’t Be Abusive! Implicit/Explicit Messages in Offensive and Abusive Language}},
35 | author={Caselli, Tommaso and Basile, Valerio and Mitrovi\'{c}, Jelena and Kartoziya, Inga and Granitzer, Michael},
36 | booktitle={Proceedings of LREC},
37 | year={2020}
38 | }
39 | ```
40 |
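The AbuseEval label files under data/abuseval_labels/ (for example abuseval_offenseval_test.tsv) have an `id abuse` header row followed by one tweet ID and one label per row, where the label is NOTABU, EXP (explicit abuse) or IMP (implicit abuse). The following is a minimal loading sketch, not code from this repository. It assumes the columns are tab-separated (as the .tsv extension suggests), and for the binary AbuseEval setting mentioned in the repository description it merges EXP and IMP into one class, which it calls `ABU` purely for illustration. The function names `read_abuse_labels`, `to_binary` and `label_distribution` are likewise hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable

# Fine-grained labels observed in data/abuseval_labels/*.tsv.
FINE_LABELS = {"NOTABU", "EXP", "IMP"}


def read_abuse_labels(lines: Iterable[str]) -> Dict[str, str]:
    """Parse `id<TAB>abuse` rows into a {tweet_id: label} dict.

    `lines` may be an open file or any iterable of strings. The first row is
    treated as the `id abuse` header and skipped. Tab separation is assumed.
    """
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row, if present
    labels: Dict[str, str] = {}
    for row in reader:
        if not row:  # ignore blank lines, e.g. a trailing newline
            continue
        if len(row) < 2:
            raise ValueError(f"expected two tab-separated columns, got {row!r}")
        tweet_id, label = row[0].strip(), row[1].strip()
        if label not in FINE_LABELS:
            raise ValueError(f"unknown label {label!r} for id {tweet_id}")
        labels[tweet_id] = label
    return labels


def to_binary(label: str) -> str:
    """Merge explicit (EXP) and implicit (IMP) abuse into one class.

    'ABU' is a hypothetical class name chosen for this sketch.
    """
    if label not in FINE_LABELS:
        raise ValueError(f"unknown label {label!r}")
    return "NOTABU" if label == "NOTABU" else "ABU"


def label_distribution(path: str) -> Dict[str, Counter]:
    """Count fine-grained and binary labels in one AbuseEval label file."""
    with open(path, newline="", encoding="utf-8") as f:
        labels = read_abuse_labels(f)
    return {
        "fine": Counter(labels.values()),
        "binary": Counter(to_binary(lab) for lab in labels.values()),
    }
```

For example, `label_distribution("data/abuseval_labels/abuseval_offenseval_test.tsv")` would return the label counts for the test split, with the path taken from the repository tree. Validating every label against the three known values makes a wrong delimiter assumption fail loudly instead of silently producing a single bogus class.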
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. 42 | -------------------------------------------------------------------------------- /_LREC_2020__I_Feel_Offended__Don_t_Be_Abusive_camera-ready.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tommasoc80/AbuseEval/73401ae84c703b7471197385c26f64848535a722/_LREC_2020__I_Feel_Offended__Don_t_Be_Abusive_camera-ready.pdf -------------------------------------------------------------------------------- /abusive_explicit_implicit_tree-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tommasoc80/AbuseEval/73401ae84c703b7471197385c26f64848535a722/abusive_explicit_implicit_tree-2.png -------------------------------------------------------------------------------- /data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tommasoc80/AbuseEval/73401ae84c703b7471197385c26f64848535a722/data/.DS_Store -------------------------------------------------------------------------------- /data/abuseval_labels/abuseval_offenseval_test.tsv: -------------------------------------------------------------------------------- 1 | id abuse 2 | 15923 NOTABU 3 | 27014 NOTABU 4 | 30530 NOTABU 5 | 13876 NOTABU 6 | 60133 EXP 7 | 83681 NOTABU 8 | 96874 NOTABU 9 | 65507 NOTABU 10 | 78910 NOTABU 11 | 46363 NOTABU 12 | 68123 NOTABU 13 | 22452 NOTABU 14 | 15565 NOTABU 15 | 64376 NOTABU 16 | 12588 NOTABU 17 | 34263 EXP 18 | 65773 NOTABU 19 | 95457 NOTABU 20 | 24930 NOTABU 21 | 15938 NOTABU 22 | 45712 NOTABU 23 | 70840 NOTABU 24 | 53563 NOTABU 25 | 59432 NOTABU 26 | 21454 NOTABU 27 | 83155 NOTABU 28 | 69576 NOTABU 29 | 49139 EXP 30 | 76669 NOTABU 31 | 58995 IMP 32 | 88490 IMP 33 | 77101 NOTABU 34 | 86917 NOTABU 35 | 78472 NOTABU 36 | 17798 NOTABU 37 | 71873 NOTABU 38 | 30247 NOTABU 39 
| 81150 NOTABU 40 | 46444 EXP 41 | 60587 EXP 42 | 70569 EXP 43 | 59200 NOTABU 44 | 30900 NOTABU 45 | 44546 EXP 46 | 53982 NOTABU 47 | 37884 NOTABU 48 | 15079 NOTABU 49 | 51628 EXP 50 | 30899 NOTABU 51 | 99680 NOTABU 52 | 38628 NOTABU 53 | 40110 NOTABU 54 | 50310 NOTABU 55 | 84101 NOTABU 56 | 84876 NOTABU 57 | 57425 NOTABU 58 | 83262 NOTABU 59 | 46777 NOTABU 60 | 54053 NOTABU 61 | 15998 NOTABU 62 | 62945 NOTABU 63 | 96457 EXP 64 | 85977 NOTABU 65 | 70841 EXP 66 | 46139 EXP 67 | 99009 NOTABU 68 | 66388 NOTABU 69 | 91217 NOTABU 70 | 91994 NOTABU 71 | 58751 NOTABU 72 | 80947 NOTABU 73 | 99253 NOTABU 74 | 40386 EXP 75 | 58742 NOTABU 76 | 39752 NOTABU 77 | 25262 NOTABU 78 | 17703 NOTABU 79 | 98916 IMP 80 | 18709 NOTABU 81 | 26026 NOTABU 82 | 32190 IMP 83 | 27550 EXP 84 | 68539 NOTABU 85 | 93643 NOTABU 86 | 99120 NOTABU 87 | 24040 EXP 88 | 85436 NOTABU 89 | 73516 EXP 90 | 88905 EXP 91 | 52955 NOTABU 92 | 33452 NOTABU 93 | 13276 NOTABU 94 | 26462 NOTABU 95 | 94339 NOTABU 96 | 23786 NOTABU 97 | 27371 NOTABU 98 | 88670 NOTABU 99 | 54391 NOTABU 100 | 42112 EXP 101 | 37740 EXP 102 | 72405 EXP 103 | 20780 NOTABU 104 | 38820 NOTABU 105 | 82913 NOTABU 106 | 97383 NOTABU 107 | 21397 NOTABU 108 | 51738 NOTABU 109 | 47445 IMP 110 | 27158 NOTABU 111 | 26382 NOTABU 112 | 17183 NOTABU 113 | 10595 NOTABU 114 | 49481 NOTABU 115 | 37949 NOTABU 116 | 96729 NOTABU 117 | 42325 NOTABU 118 | 21748 NOTABU 119 | 24316 NOTABU 120 | 19516 NOTABU 121 | 79383 NOTABU 122 | 64857 NOTABU 123 | 89936 NOTABU 124 | 90362 NOTABU 125 | 75620 NOTABU 126 | 24907 NOTABU 127 | 26259 NOTABU 128 | 76095 NOTABU 129 | 11557 NOTABU 130 | 51963 NOTABU 131 | 15180 EXP 132 | 58026 NOTABU 133 | 26432 NOTABU 134 | 76565 NOTABU 135 | 91588 NOTABU 136 | 58033 NOTABU 137 | 81352 EXP 138 | 33133 NOTABU 139 | 96661 NOTABU 140 | 91036 EXP 141 | 56581 NOTABU 142 | 22882 IMP 143 | 66377 NOTABU 144 | 40842 EXP 145 | 73612 EXP 146 | 45709 NOTABU 147 | 21338 NOTABU 148 | 15630 NOTABU 149 | 15815 IMP 150 | 53858 NOTABU 151 | 89606 
NOTABU 152 | 30942 NOTABU 153 | 56973 NOTABU 154 | 24049 EXP 155 | 79204 EXP 156 | 46229 EXP 157 | 93101 NOTABU 158 | 42196 NOTABU 159 | 38084 NOTABU 160 | 67973 NOTABU 161 | 72071 NOTABU 162 | 34669 IMP 163 | 66054 NOTABU 164 | 15607 NOTABU 165 | 13959 NOTABU 166 | 34575 IMP 167 | 27546 NOTABU 168 | 69203 NOTABU 169 | 86385 NOTABU 170 | 20357 EXP 171 | 83011 NOTABU 172 | 80059 NOTABU 173 | 52104 NOTABU 174 | 29531 NOTABU 175 | 52445 EXP 176 | 12746 NOTABU 177 | 10991 NOTABU 178 | 63725 NOTABU 179 | 57135 EXP 180 | 11284 NOTABU 181 | 57107 NOTABU 182 | 72369 IMP 183 | 84879 NOTABU 184 | 42192 EXP 185 | 20034 NOTABU 186 | 39516 NOTABU 187 | 43587 EXP 188 | 10684 NOTABU 189 | 76394 NOTABU 190 | 76692 NOTABU 191 | 41553 EXP 192 | 48839 NOTABU 193 | 38718 NOTABU 194 | 56048 NOTABU 195 | 18997 NOTABU 196 | 36720 NOTABU 197 | 51656 NOTABU 198 | 62986 EXP 199 | 24430 NOTABU 200 | 19563 NOTABU 201 | 19836 NOTABU 202 | 31033 NOTABU 203 | 70051 EXP 204 | 73642 IMP 205 | 10727 EXP 206 | 73442 NOTABU 207 | 59758 NOTABU 208 | 11781 NOTABU 209 | 11645 EXP 210 | 91116 NOTABU 211 | 86437 NOTABU 212 | 86268 NOTABU 213 | 21643 NOTABU 214 | 72274 EXP 215 | 95198 NOTABU 216 | 31550 NOTABU 217 | 21524 EXP 218 | 84579 NOTABU 219 | 26291 NOTABU 220 | 72424 NOTABU 221 | 28744 NOTABU 222 | 59961 NOTABU 223 | 59519 IMP 224 | 51386 IMP 225 | 76052 NOTABU 226 | 14837 NOTABU 227 | 57295 NOTABU 228 | 74610 NOTABU 229 | 55653 NOTABU 230 | 10918 IMP 231 | 57541 IMP 232 | 55318 NOTABU 233 | 80965 NOTABU 234 | 14051 NOTABU 235 | 25125 NOTABU 236 | 81974 NOTABU 237 | 76797 NOTABU 238 | 57494 NOTABU 239 | 49441 NOTABU 240 | 78266 NOTABU 241 | 31401 NOTABU 242 | 80020 NOTABU 243 | 61295 EXP 244 | 45518 NOTABU 245 | 53351 EXP 246 | 51610 NOTABU 247 | 26758 NOTABU 248 | 30718 NOTABU 249 | 25463 EXP 250 | 82558 NOTABU 251 | 85613 EXP 252 | 36875 NOTABU 253 | 86142 NOTABU 254 | 87711 NOTABU 255 | 51626 NOTABU 256 | 59423 NOTABU 257 | 80555 NOTABU 258 | 85830 NOTABU 259 | 29102 NOTABU 260 | 85886 IMP 261 | 
54621 EXP 262 | 80397 IMP 263 | 76937 NOTABU 264 | 68875 IMP 265 | 89591 NOTABU 266 | 67926 EXP 267 | 41588 IMP 268 | 18091 NOTABU 269 | 56205 NOTABU 270 | 24467 NOTABU 271 | 47245 NOTABU 272 | 32851 NOTABU 273 | 86982 NOTABU 274 | 95247 NOTABU 275 | 74211 NOTABU 276 | 51948 EXP 277 | 28196 EXP 278 | 71881 NOTABU 279 | 50376 NOTABU 280 | 64161 NOTABU 281 | 45304 EXP 282 | 22248 NOTABU 283 | 27228 IMP 284 | 57326 IMP 285 | 97729 NOTABU 286 | 24957 NOTABU 287 | 68774 NOTABU 288 | 43350 NOTABU 289 | 41709 NOTABU 290 | 71063 NOTABU 291 | 92516 NOTABU 292 | 85100 NOTABU 293 | 89661 NOTABU 294 | 47898 NOTABU 295 | 88523 NOTABU 296 | 19321 NOTABU 297 | 26042 NOTABU 298 | 16694 NOTABU 299 | 88221 EXP 300 | 30357 NOTABU 301 | 11021 NOTABU 302 | 26763 NOTABU 303 | 59708 NOTABU 304 | 78244 NOTABU 305 | 11813 NOTABU 306 | 16468 NOTABU 307 | 75637 NOTABU 308 | 71071 NOTABU 309 | 17714 NOTABU 310 | 97610 IMP 311 | 63048 EXP 312 | 30075 EXP 313 | 74237 NOTABU 314 | 60466 IMP 315 | 25685 EXP 316 | 16333 IMP 317 | 78417 NOTABU 318 | 78760 NOTABU 319 | 78261 NOTABU 320 | 40790 NOTABU 321 | 63775 NOTABU 322 | 44108 NOTABU 323 | 98949 NOTABU 324 | 74414 NOTABU 325 | 51525 NOTABU 326 | 84550 NOTABU 327 | 19499 NOTABU 328 | 23530 EXP 329 | 84034 NOTABU 330 | 79051 NOTABU 331 | 93740 NOTABU 332 | 54366 NOTABU 333 | 57284 IMP 334 | 82565 NOTABU 335 | 59700 NOTABU 336 | 13433 EXP 337 | 80197 IMP 338 | 90353 NOTABU 339 | 36496 NOTABU 340 | 93321 NOTABU 341 | 79309 IMP 342 | 80637 NOTABU 343 | 50793 NOTABU 344 | 84239 NOTABU 345 | 47834 NOTABU 346 | 65426 NOTABU 347 | 81354 NOTABU 348 | 14479 NOTABU 349 | 47696 EXP 350 | 75438 NOTABU 351 | 11699 NOTABU 352 | 62379 NOTABU 353 | 71294 NOTABU 354 | 55633 EXP 355 | 43864 NOTABU 356 | 40013 NOTABU 357 | 30399 NOTABU 358 | 47886 NOTABU 359 | 87934 NOTABU 360 | 46232 NOTABU 361 | 21869 NOTABU 362 | 35976 NOTABU 363 | 59337 NOTABU 364 | 57732 NOTABU 365 | 17563 NOTABU 366 | 55899 NOTABU 367 | 47396 NOTABU 368 | 46717 NOTABU 369 | 78260 NOTABU 370 | 
22695 NOTABU 371 | 27455 EXP 372 | 83466 IMP 373 | 48657 EXP 374 | 89635 NOTABU 375 | 52335 NOTABU 376 | 27360 NOTABU 377 | 53971 NOTABU 378 | 89603 NOTABU 379 | 86611 EXP 380 | 44222 NOTABU 381 | 41078 NOTABU 382 | 44625 NOTABU 383 | 21197 NOTABU 384 | 12193 IMP 385 | 73484 NOTABU 386 | 96283 NOTABU 387 | 78289 NOTABU 388 | 29485 NOTABU 389 | 67024 NOTABU 390 | 97430 NOTABU 391 | 63985 NOTABU 392 | 65505 NOTABU 393 | 29697 NOTABU 394 | 26695 NOTABU 395 | 86529 NOTABU 396 | 37332 NOTABU 397 | 80612 NOTABU 398 | 89438 NOTABU 399 | 68755 NOTABU 400 | 53675 NOTABU 401 | 31195 NOTABU 402 | 20522 IMP 403 | 52866 NOTABU 404 | 65428 NOTABU 405 | 94260 NOTABU 406 | 93256 NOTABU 407 | 78364 NOTABU 408 | 70324 NOTABU 409 | 96847 NOTABU 410 | 89329 EXP 411 | 40929 NOTABU 412 | 24798 NOTABU 413 | 99947 NOTABU 414 | 94203 NOTABU 415 | 11100 NOTABU 416 | 83240 EXP 417 | 14591 NOTABU 418 | 66094 NOTABU 419 | 59164 NOTABU 420 | 90036 NOTABU 421 | 24047 NOTABU 422 | 81890 IMP 423 | 26010 NOTABU 424 | 72037 EXP 425 | 56312 NOTABU 426 | 10860 NOTABU 427 | 38829 NOTABU 428 | 78301 NOTABU 429 | 56599 NOTABU 430 | 94607 EXP 431 | 72531 IMP 432 | 37214 NOTABU 433 | 84952 NOTABU 434 | 98575 EXP 435 | 58342 NOTABU 436 | 53914 NOTABU 437 | 76324 NOTABU 438 | 18730 NOTABU 439 | 39429 IMP 440 | 72893 NOTABU 441 | 18786 EXP 442 | 61004 NOTABU 443 | 46603 NOTABU 444 | 73236 NOTABU 445 | 35785 NOTABU 446 | 84045 NOTABU 447 | 67049 EXP 448 | 57804 NOTABU 449 | 82273 NOTABU 450 | 68340 NOTABU 451 | 78646 NOTABU 452 | 13582 NOTABU 453 | 36455 NOTABU 454 | 98994 NOTABU 455 | 92967 NOTABU 456 | 70762 NOTABU 457 | 98216 EXP 458 | 42113 NOTABU 459 | 61889 NOTABU 460 | 30365 NOTABU 461 | 68518 NOTABU 462 | 37649 EXP 463 | 21394 NOTABU 464 | 14582 NOTABU 465 | 88487 NOTABU 466 | 21714 NOTABU 467 | 50821 NOTABU 468 | 57793 NOTABU 469 | 55400 NOTABU 470 | 59751 NOTABU 471 | 66470 NOTABU 472 | 92215 IMP 473 | 11122 NOTABU 474 | 87139 NOTABU 475 | 17311 NOTABU 476 | 89392 NOTABU 477 | 22311 EXP 478 | 88188 
NOTABU 479 | 35503 NOTABU 480 | 75125 EXP 481 | 49210 NOTABU 482 | 21354 EXP 483 | 48965 NOTABU 484 | 88939 NOTABU 485 | 22594 NOTABU 486 | 45855 NOTABU 487 | 59159 NOTABU 488 | 77395 NOTABU 489 | 90492 NOTABU 490 | 19410 NOTABU 491 | 67211 NOTABU 492 | 35940 NOTABU 493 | 33394 IMP 494 | 96543 NOTABU 495 | 45271 NOTABU 496 | 56513 IMP 497 | 59691 EXP 498 | 34864 NOTABU 499 | 97956 NOTABU 500 | 49164 NOTABU 501 | 14273 NOTABU 502 | 38323 NOTABU 503 | 84848 NOTABU 504 | 41997 IMP 505 | 56637 NOTABU 506 | 65514 NOTABU 507 | 19815 NOTABU 508 | 36737 NOTABU 509 | 97410 IMP 510 | 18782 NOTABU 511 | 77746 NOTABU 512 | 42925 NOTABU 513 | 17704 EXP 514 | 61674 NOTABU 515 | 42419 NOTABU 516 | 83463 NOTABU 517 | 39860 NOTABU 518 | 93586 NOTABU 519 | 22189 NOTABU 520 | 39918 NOTABU 521 | 51726 NOTABU 522 | 86382 NOTABU 523 | 20568 NOTABU 524 | 14923 NOTABU 525 | 53787 NOTABU 526 | 53660 NOTABU 527 | 61551 NOTABU 528 | 21826 NOTABU 529 | 80230 NOTABU 530 | 24974 NOTABU 531 | 56257 NOTABU 532 | 61311 NOTABU 533 | 79934 IMP 534 | 96418 NOTABU 535 | 55048 IMP 536 | 77625 NOTABU 537 | 10252 NOTABU 538 | 63128 NOTABU 539 | 70740 NOTABU 540 | 34034 NOTABU 541 | 92341 NOTABU 542 | 84419 NOTABU 543 | 72634 NOTABU 544 | 18825 NOTABU 545 | 70424 NOTABU 546 | 35265 NOTABU 547 | 44869 NOTABU 548 | 22047 NOTABU 549 | 66950 NOTABU 550 | 57443 NOTABU 551 | 76833 IMP 552 | 22502 NOTABU 553 | 44206 NOTABU 554 | 82044 NOTABU 555 | 24929 NOTABU 556 | 55832 IMP 557 | 32056 NOTABU 558 | 76339 NOTABU 559 | 21266 NOTABU 560 | 83591 NOTABU 561 | 67841 EXP 562 | 93576 NOTABU 563 | 44218 NOTABU 564 | 51732 NOTABU 565 | 74605 NOTABU 566 | 58487 NOTABU 567 | 74006 NOTABU 568 | 94427 EXP 569 | 96594 IMP 570 | 85447 NOTABU 571 | 62689 IMP 572 | 72609 NOTABU 573 | 25047 NOTABU 574 | 35968 NOTABU 575 | 99924 NOTABU 576 | 34665 NOTABU 577 | 45053 NOTABU 578 | 56232 NOTABU 579 | 62051 NOTABU 580 | 69036 NOTABU 581 | 14516 NOTABU 582 | 29514 NOTABU 583 | 75770 NOTABU 584 | 89749 NOTABU 585 | 25138 NOTABU 586 | 
15866 IMP 587 | 91659 NOTABU 588 | 30657 NOTABU 589 | 57185 EXP 590 | 33172 NOTABU 591 | 71506 NOTABU 592 | 49079 NOTABU 593 | 91430 IMP 594 | 29203 EXP 595 | 66750 NOTABU 596 | 27913 EXP 597 | 94816 NOTABU 598 | 54004 NOTABU 599 | 41609 NOTABU 600 | 74157 NOTABU 601 | 65187 NOTABU 602 | 42656 NOTABU 603 | 53325 IMP 604 | 48503 NOTABU 605 | 36871 NOTABU 606 | 10417 NOTABU 607 | 10611 NOTABU 608 | 66771 NOTABU 609 | 23461 NOTABU 610 | 71455 NOTABU 611 | 46410 IMP 612 | 48142 NOTABU 613 | 58191 NOTABU 614 | 65968 NOTABU 615 | 26072 NOTABU 616 | 16856 IMP 617 | 89200 IMP 618 | 57869 NOTABU 619 | 45269 EXP 620 | 85583 NOTABU 621 | 88398 NOTABU 622 | 50667 NOTABU 623 | 90222 NOTABU 624 | 74235 NOTABU 625 | 52352 NOTABU 626 | 95106 NOTABU 627 | 16717 NOTABU 628 | 37049 NOTABU 629 | 58975 NOTABU 630 | 62938 NOTABU 631 | 15437 IMP 632 | 21198 NOTABU 633 | 66623 NOTABU 634 | 91472 IMP 635 | 27211 NOTABU 636 | 98531 IMP 637 | 46018 NOTABU 638 | 19152 NOTABU 639 | 43753 NOTABU 640 | 98685 IMP 641 | 31665 IMP 642 | 38732 EXP 643 | 47022 NOTABU 644 | 66074 NOTABU 645 | 41323 NOTABU 646 | 27613 NOTABU 647 | 61997 NOTABU 648 | 58690 NOTABU 649 | 15977 NOTABU 650 | 31957 NOTABU 651 | 46983 IMP 652 | 90529 NOTABU 653 | 91136 NOTABU 654 | 76332 NOTABU 655 | 42929 NOTABU 656 | 82896 NOTABU 657 | 30424 NOTABU 658 | 97795 NOTABU 659 | 22067 EXP 660 | 48738 NOTABU 661 | 62374 NOTABU 662 | 95156 NOTABU 663 | 81749 NOTABU 664 | 68363 NOTABU 665 | 79756 IMP 666 | 24097 NOTABU 667 | 99563 EXP 668 | 92047 NOTABU 669 | 23542 IMP 670 | 43453 NOTABU 671 | 69292 NOTABU 672 | 97930 NOTABU 673 | 28354 NOTABU 674 | 74959 NOTABU 675 | 96033 NOTABU 676 | 80440 NOTABU 677 | 98096 NOTABU 678 | 83416 EXP 679 | 59687 NOTABU 680 | 86570 NOTABU 681 | 67418 EXP 682 | 61929 NOTABU 683 | 58632 NOTABU 684 | 51621 NOTABU 685 | 85309 NOTABU 686 | 57330 NOTABU 687 | 30207 NOTABU 688 | 62237 NOTABU 689 | 92823 NOTABU 690 | 39788 NOTABU 691 | 48820 NOTABU 692 | 35612 NOTABU 693 | 96905 EXP 694 | 34122 NOTABU 695 | 
17100 NOTABU 696 | 72538 NOTABU 697 | 68371 NOTABU 698 | 24638 NOTABU 699 | 73621 NOTABU 700 | 79778 EXP 701 | 87617 NOTABU 702 | 16323 NOTABU 703 | 90328 NOTABU 704 | 71350 IMP 705 | 43782 IMP 706 | 46378 NOTABU 707 | 18357 NOTABU 708 | 24000 NOTABU 709 | 51286 NOTABU 710 | 10412 NOTABU 711 | 84900 NOTABU 712 | 45147 NOTABU 713 | 58287 IMP 714 | 44599 NOTABU 715 | 38726 NOTABU 716 | 36393 NOTABU 717 | 74747 NOTABU 718 | 43173 NOTABU 719 | 58109 NOTABU 720 | 42091 NOTABU 721 | 29475 NOTABU 722 | 19296 NOTABU 723 | 96203 NOTABU 724 | 27467 NOTABU 725 | 42404 NOTABU 726 | 14972 NOTABU 727 | 49426 NOTABU 728 | 82638 NOTABU 729 | 51052 NOTABU 730 | 32059 NOTABU 731 | 77649 NOTABU 732 | 32061 EXP 733 | 88660 NOTABU 734 | 63129 IMP 735 | 79653 NOTABU 736 | 99016 NOTABU 737 | 57283 NOTABU 738 | 74797 EXP 739 | 25177 NOTABU 740 | 68831 NOTABU 741 | 60621 NOTABU 742 | 87428 EXP 743 | 39400 IMP 744 | 90355 NOTABU 745 | 38003 NOTABU 746 | 62788 EXP 747 | 28484 NOTABU 748 | 91471 NOTABU 749 | 84343 NOTABU 750 | 88745 EXP 751 | 70443 EXP 752 | 90402 NOTABU 753 | 90327 EXP 754 | 85687 NOTABU 755 | 88177 NOTABU 756 | 31022 NOTABU 757 | 65545 IMP 758 | 56129 NOTABU 759 | 51851 NOTABU 760 | 31997 NOTABU 761 | 89493 NOTABU 762 | 34903 NOTABU 763 | 40766 NOTABU 764 | 58543 NOTABU 765 | 53194 NOTABU 766 | 79222 IMP 767 | 72401 IMP 768 | 37002 NOTABU 769 | 31354 EXP 770 | 99347 NOTABU 771 | 60713 NOTABU 772 | 89054 NOTABU 773 | 29008 EXP 774 | 96333 NOTABU 775 | 19770 NOTABU 776 | 41590 EXP 777 | 72523 IMP 778 | 25783 NOTABU 779 | 16683 NOTABU 780 | 14640 IMP 781 | 31641 NOTABU 782 | 23762 NOTABU 783 | 45984 NOTABU 784 | 74909 EXP 785 | 88822 NOTABU 786 | 89432 NOTABU 787 | 38070 NOTABU 788 | 96397 IMP 789 | 37342 NOTABU 790 | 77917 NOTABU 791 | 31499 NOTABU 792 | 52547 NOTABU 793 | 34030 EXP 794 | 21054 NOTABU 795 | 60546 NOTABU 796 | 98396 NOTABU 797 | 29882 NOTABU 798 | 90581 NOTABU 799 | 69073 NOTABU 800 | 49338 NOTABU 801 | 22812 NOTABU 802 | 29113 EXP 803 | 13097 NOTABU 804 | 
98816 NOTABU 805 | 77222 NOTABU 806 | 95258 NOTABU 807 | 36892 NOTABU 808 | 51589 NOTABU 809 | 48945 NOTABU 810 | 74448 NOTABU 811 | 76036 NOTABU 812 | 11286 NOTABU 813 | 36500 NOTABU 814 | 21677 NOTABU 815 | 73105 NOTABU 816 | 41821 NOTABU 817 | 72213 NOTABU 818 | 14258 NOTABU 819 | 13131 NOTABU 820 | 76379 EXP 821 | 25354 NOTABU 822 | 89417 NOTABU 823 | 68256 NOTABU 824 | 78984 NOTABU 825 | 52080 EXP 826 | 25127 NOTABU 827 | 48650 NOTABU 828 | 51762 IMP 829 | 71592 EXP 830 | 28558 NOTABU 831 | 32141 NOTABU 832 | 80406 NOTABU 833 | 73366 NOTABU 834 | 26454 NOTABU 835 | 78688 IMP 836 | 76135 IMP 837 | 30778 EXP 838 | 48418 NOTABU 839 | 10313 NOTABU 840 | 78950 NOTABU 841 | 13993 NOTABU 842 | 32492 NOTABU 843 | 50781 NOTABU 844 | 22569 EXP 845 | 87317 NOTABU 846 | 53862 NOTABU 847 | 89424 NOTABU 848 | 22470 NOTABU 849 | 48938 NOTABU 850 | 85360 NOTABU 851 | 90032 NOTABU 852 | 31182 NOTABU 853 | 83464 NOTABU 854 | 51218 NOTABU 855 | 41438 EXP 856 | 72867 NOTABU 857 | 73439 EXP 858 | 25657 NOTABU 859 | 67018 NOTABU 860 | 50665 NOTABU 861 | 24583 NOTABU 862 | -------------------------------------------------------------------------------- /data/offenseval_explicit_implicit/offenseval_exp_imp_test.tsv: -------------------------------------------------------------------------------- 1 | id implicit_explicit 2 | 15923 EXP 3 | 27014 O 4 | 30530 O 5 | 13876 O 6 | 60133 EXP 7 | 83681 EXP 8 | 96874 O 9 | 65507 IMP 10 | 78910 O 11 | 46363 O 12 | 68123 O 13 | 22452 O 14 | 15565 O 15 | 64376 O 16 | 12588 EXP 17 | 34263 EXP 18 | 65773 O 19 | 95457 O 20 | 24930 O 21 | 15938 O 22 | 45712 O 23 | 70840 O 24 | 53563 O 25 | 59432 O 26 | 21454 O 27 | 83155 O 28 | 69576 O 29 | 49139 EXP 30 | 76669 O 31 | 58995 IMP 32 | 88490 IMP 33 | 77101 O 34 | 86917 O 35 | 78472 O 36 | 17798 O 37 | 71873 O 38 | 30247 O 39 | 81150 O 40 | 46444 EXP 41 | 60587 EXP 42 | 70569 EXP 43 | 59200 O 44 | 30900 O 45 | 44546 IMP 46 | 53982 O 47 | 37884 O 48 | 15079 O 49 | 51628 EXP 50 | 30899 EXP 51 | 99680 O 52 | 
38628 O 53 | 40110 EXP 54 | 50310 O 55 | 84101 O 56 | 84876 O 57 | 57425 O 58 | 83262 O 59 | 46777 O 60 | 54053 O 61 | 15998 EXP 62 | 62945 O 63 | 96457 EXP 64 | 85977 O 65 | 70841 IMP 66 | 46139 EXP 67 | 99009 O 68 | 66388 O 69 | 91217 O 70 | 91994 O 71 | 58751 O 72 | 80947 EXP 73 | 99253 O 74 | 40386 EXP 75 | 58742 O 76 | 39752 O 77 | 25262 O 78 | 17703 O 79 | 98916 IMP 80 | 18709 O 81 | 26026 O 82 | 32190 IMP 83 | 27550 EXP 84 | 68539 O 85 | 93643 O 86 | 99120 O 87 | 24040 EXP 88 | 85436 O 89 | 73516 EXP 90 | 88905 EXP 91 | 52955 O 92 | 33452 O 93 | 13276 O 94 | 26462 O 95 | 94339 O 96 | 23786 O 97 | 27371 O 98 | 88670 O 99 | 54391 O 100 | 42112 IMP 101 | 37740 EXP 102 | 72405 EXP 103 | 20780 O 104 | 38820 O 105 | 82913 O 106 | 97383 O 107 | 21397 O 108 | 51738 O 109 | 47445 IMP 110 | 27158 EXP 111 | 26382 O 112 | 17183 EXP 113 | 10595 O 114 | 49481 O 115 | 37949 O 116 | 96729 O 117 | 42325 IMP 118 | 21748 O 119 | 24316 O 120 | 19516 O 121 | 79383 O 122 | 64857 O 123 | 89936 O 124 | 90362 O 125 | 75620 O 126 | 24907 O 127 | 26259 O 128 | 76095 O 129 | 11557 O 130 | 51963 O 131 | 15180 EXP 132 | 58026 IMP 133 | 26432 O 134 | 76565 O 135 | 91588 O 136 | 58033 O 137 | 81352 EXP 138 | 33133 IMP 139 | 96661 O 140 | 91036 EXP 141 | 56581 O 142 | 22882 IMP 143 | 66377 O 144 | 40842 EXP 145 | 73612 EXP 146 | 45709 O 147 | 21338 O 148 | 15630 O 149 | 15815 IMP 150 | 53858 O 151 | 89606 O 152 | 30942 O 153 | 56973 O 154 | 24049 EXP 155 | 79204 EXP 156 | 46229 EXP 157 | 93101 O 158 | 42196 EXP 159 | 38084 O 160 | 67973 O 161 | 72071 O 162 | 34669 IMP 163 | 66054 O 164 | 15607 O 165 | 13959 O 166 | 34575 IMP 167 | 27546 O 168 | 69203 O 169 | 86385 O 170 | 20357 EXP 171 | 83011 O 172 | 80059 O 173 | 52104 O 174 | 29531 O 175 | 52445 EXP 176 | 12746 O 177 | 10991 IMP 178 | 63725 O 179 | 57135 EXP 180 | 11284 O 181 | 57107 O 182 | 72369 IMP 183 | 84879 O 184 | 42192 EXP 185 | 20034 O 186 | 39516 O 187 | 43587 EXP 188 | 10684 EXP 189 | 76394 O 190 | 76692 O 191 | 41553 EXP 192 
| 48839 O 193 | 38718 O 194 | 56048 O 195 | 18997 O 196 | 36720 O 197 | 51656 O 198 | 62986 EXP 199 | 24430 O 200 | 19563 O 201 | 19836 O 202 | 31033 O 203 | 70051 EXP 204 | 73642 IMP 205 | 10727 EXP 206 | 73442 O 207 | 59758 O 208 | 11781 O 209 | 11645 EXP 210 | 91116 O 211 | 86437 O 212 | 86268 O 213 | 21643 O 214 | 72274 EXP 215 | 95198 O 216 | 31550 O 217 | 21524 EXP 218 | 84579 EXP 219 | 26291 O 220 | 72424 O 221 | 28744 O 222 | 59961 O 223 | 59519 IMP 224 | 51386 IMP 225 | 76052 O 226 | 14837 O 227 | 57295 O 228 | 74610 O 229 | 55653 O 230 | 10918 IMP 231 | 57541 IMP 232 | 55318 O 233 | 80965 O 234 | 14051 O 235 | 25125 EXP 236 | 81974 O 237 | 76797 O 238 | 57494 O 239 | 49441 O 240 | 78266 O 241 | 31401 O 242 | 80020 O 243 | 61295 EXP 244 | 45518 O 245 | 53351 EXP 246 | 51610 O 247 | 26758 O 248 | 30718 O 249 | 25463 EXP 250 | 82558 O 251 | 85613 EXP 252 | 36875 O 253 | 86142 O 254 | 87711 O 255 | 51626 O 256 | 59423 O 257 | 80555 O 258 | 85830 O 259 | 29102 O 260 | 85886 IMP 261 | 54621 EXP 262 | 80397 IMP 263 | 76937 O 264 | 68875 IMP 265 | 89591 O 266 | 67926 EXP 267 | 41588 IMP 268 | 18091 O 269 | 56205 O 270 | 24467 O 271 | 47245 O 272 | 32851 O 273 | 86982 O 274 | 95247 O 275 | 74211 O 276 | 51948 EXP 277 | 28196 EXP 278 | 71881 O 279 | 50376 EXP 280 | 64161 O 281 | 45304 EXP 282 | 22248 O 283 | 27228 IMP 284 | 57326 IMP 285 | 97729 O 286 | 24957 O 287 | 68774 O 288 | 43350 O 289 | 41709 O 290 | 71063 O 291 | 92516 O 292 | 85100 O 293 | 89661 O 294 | 47898 O 295 | 88523 O 296 | 19321 O 297 | 26042 O 298 | 16694 O 299 | 88221 EXP 300 | 30357 O 301 | 11021 O 302 | 26763 O 303 | 59708 O 304 | 78244 O 305 | 11813 O 306 | 16468 O 307 | 75637 O 308 | 71071 O 309 | 17714 EXP 310 | 97610 IMP 311 | 63048 EXP 312 | 30075 EXP 313 | 74237 O 314 | 60466 IMP 315 | 25685 EXP 316 | 16333 IMP 317 | 78417 EXP 318 | 78760 O 319 | 78261 O 320 | 40790 O 321 | 63775 O 322 | 44108 O 323 | 98949 O 324 | 74414 O 325 | 51525 EXP 326 | 84550 IMP 327 | 19499 O 328 | 23530 EXP 329 
| 84034 O 330 | 79051 O 331 | 93740 O 332 | 54366 O 333 | 57284 IMP 334 | 82565 O 335 | 59700 EXP 336 | 13433 EXP 337 | 80197 IMP 338 | 90353 O 339 | 36496 O 340 | 93321 O 341 | 79309 IMP 342 | 80637 O 343 | 50793 O 344 | 84239 O 345 | 47834 O 346 | 65426 O 347 | 81354 O 348 | 14479 O 349 | 47696 EXP 350 | 75438 O 351 | 11699 O 352 | 62379 O 353 | 71294 IMP 354 | 55633 EXP 355 | 43864 O 356 | 40013 O 357 | 30399 O 358 | 47886 O 359 | 87934 EXP 360 | 46232 O 361 | 21869 O 362 | 35976 O 363 | 59337 O 364 | 57732 EXP 365 | 17563 O 366 | 55899 O 367 | 47396 O 368 | 46717 O 369 | 78260 O 370 | 22695 O 371 | 27455 EXP 372 | 83466 IMP 373 | 48657 EXP 374 | 89635 EXP 375 | 52335 O 376 | 27360 O 377 | 53971 O 378 | 89603 EXP 379 | 86611 EXP 380 | 44222 O 381 | 41078 O 382 | 44625 O 383 | 21197 O 384 | 12193 IMP 385 | 73484 O 386 | 96283 O 387 | 78289 O 388 | 29485 O 389 | 67024 O 390 | 97430 O 391 | 63985 EXP 392 | 65505 O 393 | 29697 O 394 | 26695 O 395 | 86529 O 396 | 37332 O 397 | 80612 O 398 | 89438 O 399 | 68755 O 400 | 53675 O 401 | 31195 O 402 | 20522 IMP 403 | 52866 O 404 | 65428 O 405 | 94260 EXP 406 | 93256 O 407 | 78364 O 408 | 70324 EXP 409 | 96847 O 410 | 89329 EXP 411 | 40929 O 412 | 24798 O 413 | 99947 O 414 | 94203 O 415 | 11100 EXP 416 | 83240 EXP 417 | 14591 O 418 | 66094 O 419 | 59164 O 420 | 90036 O 421 | 24047 O 422 | 81890 IMP 423 | 26010 O 424 | 72037 EXP 425 | 56312 O 426 | 10860 O 427 | 38829 O 428 | 78301 EXP 429 | 56599 O 430 | 94607 EXP 431 | 72531 IMP 432 | 37214 O 433 | 84952 O 434 | 98575 EXP 435 | 58342 EXP 436 | 53914 O 437 | 76324 O 438 | 18730 O 439 | 39429 IMP 440 | 72893 O 441 | 18786 EXP 442 | 61004 O 443 | 46603 O 444 | 73236 O 445 | 35785 EXP 446 | 84045 O 447 | 67049 EXP 448 | 57804 O 449 | 82273 EXP 450 | 68340 O 451 | 78646 O 452 | 13582 O 453 | 36455 O 454 | 98994 O 455 | 92967 O 456 | 70762 O 457 | 98216 EXP 458 | 42113 O 459 | 61889 O 460 | 30365 O 461 | 68518 O 462 | 37649 EXP 463 | 21394 O 464 | 14582 EXP 465 | 88487 O 466 | 
21714 O 467 | 50821 O 468 | 57793 O 469 | 55400 O 470 | 59751 EXP 471 | 66470 O 472 | 92215 IMP 473 | 11122 O 474 | 87139 O 475 | 17311 O 476 | 89392 O 477 | 22311 EXP 478 | 88188 O 479 | 35503 O 480 | 75125 EXP 481 | 49210 O 482 | 21354 EXP 483 | 48965 O 484 | 88939 O 485 | 22594 IMP 486 | 45855 O 487 | 59159 O 488 | 77395 O 489 | 90492 O 490 | 19410 EXP 491 | 67211 O 492 | 35940 O 493 | 33394 IMP 494 | 96543 O 495 | 45271 O 496 | 56513 IMP 497 | 59691 EXP 498 | 34864 O 499 | 97956 O 500 | 49164 O 501 | 14273 O 502 | 38323 O 503 | 84848 O 504 | 41997 IMP 505 | 56637 EXP 506 | 65514 O 507 | 19815 EXP 508 | 36737 O 509 | 97410 IMP 510 | 18782 O 511 | 77746 EXP 512 | 42925 O 513 | 17704 EXP 514 | 61674 IMP 515 | 42419 O 516 | 83463 O 517 | 39860 EXP 518 | 93586 O 519 | 22189 O 520 | 39918 O 521 | 51726 O 522 | 86382 O 523 | 20568 O 524 | 14923 O 525 | 53787 O 526 | 53660 O 527 | 61551 O 528 | 21826 EXP 529 | 80230 O 530 | 24974 O 531 | 56257 O 532 | 61311 O 533 | 79934 IMP 534 | 96418 O 535 | 55048 IMP 536 | 77625 O 537 | 10252 O 538 | 63128 O 539 | 70740 O 540 | 34034 O 541 | 92341 O 542 | 84419 O 543 | 72634 O 544 | 18825 O 545 | 70424 O 546 | 35265 O 547 | 44869 O 548 | 22047 O 549 | 66950 O 550 | 57443 O 551 | 76833 IMP 552 | 22502 O 553 | 44206 O 554 | 82044 O 555 | 24929 O 556 | 55832 IMP 557 | 32056 EXP 558 | 76339 O 559 | 21266 O 560 | 83591 O 561 | 67841 EXP 562 | 93576 O 563 | 44218 O 564 | 51732 O 565 | 74605 O 566 | 58487 O 567 | 74006 O 568 | 94427 EXP 569 | 96594 IMP 570 | 85447 O 571 | 62689 IMP 572 | 72609 O 573 | 25047 O 574 | 35968 EXP 575 | 99924 O 576 | 34665 O 577 | 45053 O 578 | 56232 O 579 | 62051 O 580 | 69036 O 581 | 14516 EXP 582 | 29514 O 583 | 75770 O 584 | 89749 O 585 | 25138 O 586 | 15866 IMP 587 | 91659 O 588 | 30657 O 589 | 57185 EXP 590 | 33172 O 591 | 71506 O 592 | 49079 O 593 | 91430 IMP 594 | 29203 EXP 595 | 66750 O 596 | 27913 EXP 597 | 94816 O 598 | 54004 O 599 | 41609 O 600 | 74157 O 601 | 65187 O 602 | 42656 O 603 | 53325 IMP 
604 | 48503 O 605 | 36871 O 606 | 10417 O 607 | 10611 O 608 | 66771 O 609 | 23461 O 610 | 71455 O 611 | 46410 IMP 612 | 48142 O 613 | 58191 O 614 | 65968 O 615 | 26072 EXP 616 | 16856 IMP 617 | 89200 IMP 618 | 57869 O 619 | 45269 EXP 620 | 85583 O 621 | 88398 O 622 | 50667 O 623 | 90222 O 624 | 74235 O 625 | 52352 O 626 | 95106 O 627 | 16717 O 628 | 37049 O 629 | 58975 O 630 | 62938 O 631 | 15437 IMP 632 | 21198 O 633 | 66623 O 634 | 91472 IMP 635 | 27211 O 636 | 98531 IMP 637 | 46018 O 638 | 19152 O 639 | 43753 EXP 640 | 98685 IMP 641 | 31665 IMP 642 | 38732 EXP 643 | 47022 O 644 | 66074 O 645 | 41323 O 646 | 27613 O 647 | 61997 O 648 | 58690 O 649 | 15977 O 650 | 31957 O 651 | 46983 IMP 652 | 90529 O 653 | 91136 O 654 | 76332 O 655 | 42929 O 656 | 82896 O 657 | 30424 O 658 | 97795 O 659 | 22067 EXP 660 | 48738 O 661 | 62374 O 662 | 95156 O 663 | 81749 O 664 | 68363 O 665 | 79756 IMP 666 | 24097 O 667 | 99563 EXP 668 | 92047 O 669 | 23542 IMP 670 | 43453 O 671 | 69292 O 672 | 97930 O 673 | 28354 O 674 | 74959 O 675 | 96033 O 676 | 80440 O 677 | 98096 O 678 | 83416 EXP 679 | 59687 O 680 | 86570 O 681 | 67418 EXP 682 | 61929 O 683 | 58632 EXP 684 | 51621 O 685 | 85309 O 686 | 57330 O 687 | 30207 O 688 | 62237 O 689 | 92823 O 690 | 39788 O 691 | 48820 O 692 | 35612 EXP 693 | 96905 EXP 694 | 34122 O 695 | 17100 O 696 | 72538 O 697 | 68371 O 698 | 24638 O 699 | 73621 O 700 | 79778 EXP 701 | 87617 O 702 | 16323 O 703 | 90328 EXP 704 | 71350 IMP 705 | 43782 IMP 706 | 46378 O 707 | 18357 O 708 | 24000 O 709 | 51286 O 710 | 10412 O 711 | 84900 O 712 | 45147 O 713 | 58287 IMP 714 | 44599 O 715 | 38726 O 716 | 36393 O 717 | 74747 O 718 | 43173 O 719 | 58109 O 720 | 42091 O 721 | 29475 O 722 | 19296 O 723 | 96203 O 724 | 27467 O 725 | 42404 O 726 | 14972 O 727 | 49426 O 728 | 82638 O 729 | 51052 O 730 | 32059 O 731 | 77649 O 732 | 32061 EXP 733 | 88660 O 734 | 63129 IMP 735 | 79653 O 736 | 99016 O 737 | 57283 O 738 | 74797 EXP 739 | 25177 IMP 740 | 68831 O 741 | 60621 O 742 | 
87428 EXP 743 | 39400 IMP 744 | 90355 O 745 | 38003 O 746 | 62788 EXP 747 | 28484 O 748 | 91471 O 749 | 84343 O 750 | 88745 EXP 751 | 70443 EXP 752 | 90402 O 753 | 90327 EXP 754 | 85687 O 755 | 88177 O 756 | 31022 O 757 | 65545 IMP 758 | 56129 O 759 | 51851 O 760 | 31997 O 761 | 89493 O 762 | 34903 O 763 | 40766 O 764 | 58543 IMP 765 | 53194 O 766 | 79222 IMP 767 | 72401 IMP 768 | 37002 O 769 | 31354 EXP 770 | 99347 O 771 | 60713 O 772 | 89054 O 773 | 29008 EXP 774 | 96333 O 775 | 19770 O 776 | 41590 EXP 777 | 72523 IMP 778 | 25783 O 779 | 16683 O 780 | 14640 IMP 781 | 31641 O 782 | 23762 O 783 | 45984 O 784 | 74909 EXP 785 | 88822 O 786 | 89432 O 787 | 38070 O 788 | 96397 IMP 789 | 37342 O 790 | 77917 O 791 | 31499 O 792 | 52547 O 793 | 34030 EXP 794 | 21054 O 795 | 60546 O 796 | 98396 O 797 | 29882 O 798 | 90581 O 799 | 69073 EXP 800 | 49338 O 801 | 22812 O 802 | 29113 EXP 803 | 13097 O 804 | 98816 O 805 | 77222 O 806 | 95258 O 807 | 36892 O 808 | 51589 O 809 | 48945 O 810 | 74448 O 811 | 76036 O 812 | 11286 EXP 813 | 36500 O 814 | 21677 O 815 | 73105 O 816 | 41821 EXP 817 | 72213 O 818 | 14258 O 819 | 13131 O 820 | 76379 EXP 821 | 25354 O 822 | 89417 O 823 | 68256 O 824 | 78984 O 825 | 52080 EXP 826 | 25127 O 827 | 48650 O 828 | 51762 IMP 829 | 71592 EXP 830 | 28558 O 831 | 32141 O 832 | 80406 O 833 | 73366 O 834 | 26454 O 835 | 78688 IMP 836 | 76135 IMP 837 | 30778 EXP 838 | 48418 O 839 | 10313 O 840 | 78950 O 841 | 13993 O 842 | 32492 O 843 | 50781 O 844 | 22569 EXP 845 | 87317 O 846 | 53862 O 847 | 89424 O 848 | 22470 O 849 | 48938 EXP 850 | 85360 O 851 | 90032 O 852 | 31182 O 853 | 83464 O 854 | 51218 O 855 | 41438 EXP 856 | 72867 O 857 | 73439 EXP 858 | 25657 O 859 | 67018 EXP 860 | 50665 O 861 | 24583 O 862 | -------------------------------------------------------------------------------- /dictionary-based_experiments/.DS_Store: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/tommasoc80/AbuseEval/73401ae84c703b7471197385c26f64848535a722/dictionary-based_experiments/.DS_Store -------------------------------------------------------------------------------- /dictionary-based_experiments/.idea/dictionary-based_experiments.iml: -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- /dictionary-based_experiments/.idea/misc.xml: -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- /dictionary-based_experiments/.idea/modules.xml: -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- /dictionary-based_experiments/.idea/workspace.xml: --------------------------------------------------------------------------------
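The AbuseEval gold labels listed above (EXP, IMP, NOTABU) and the explicit/implicit OffensEval labels in offenseval_exp_imp_test.tsv (EXP, IMP, O) cover the same tweet ids. They do not always agree. For example, id 50376 is NOTABU in the AbuseEval labels but EXP in the OffensEval file.

What follows is a minimal sketch for cross-tabulating the two schemes. It rests on these assumptions:
- Each file is a tab-separated id<TAB>label table with a single header row.
- read_labels and cross_tabulate are hypothetical helpers, not part of the repository.
- The toy tables reuse a few ids and labels from the files above, with placeholder header names.

```python
import csv
import io
from collections import Counter


def read_labels(handle):
    """Read an id<TAB>label table with one header row into {id: label} (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row (assumed to be present)
    return {row[0]: row[1] for row in reader if row}  # ignore blank lines


def cross_tabulate(abuse_labels, offense_labels):
    """Count (AbuseEval label, OffensEval explicit/implicit label) pairs over the shared ids."""
    shared = abuse_labels.keys() & offense_labels.keys()
    return Counter((abuse_labels[i], offense_labels[i]) for i in shared)


if __name__ == "__main__":
    # Toy excerpts: ids and labels copied from the two gold files, header names are placeholders.
    abuse_tsv = "id\tabuse\n54621\tEXP\n80397\tIMP\n50376\tNOTABU\n76937\tNOTABU\n"
    offense_tsv = "id\timplicit_explicit\n54621\tEXP\n80397\tIMP\n50376\tEXP\n76937\tO\n"
    table = cross_tabulate(read_labels(io.StringIO(abuse_tsv)),
                           read_labels(io.StringIO(offense_tsv)))
    for (abuse, offense), n in sorted(table.items()):
        print(abuse, offense, n)
```

Run over the full files, the (NOTABU, EXP) and (NOTABU, IMP) cells would count the tweets that the OffensEval labels mark as explicitly or implicitly offensive but that AbuseEval does not consider abusive.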
--------------------------------------------------------------------------------
/dictionary-based_experiments/classify_abusivevalpy.py:
--------------------------------------------------------------------------------
import collections
import csv
import re

from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

# build the stemmer and the tokenizer once, instead of once per token / per message
STEMMER = PorterStemmer()
TWEET_TOKENIZER = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)


def output_print(prediction_dict, outfile_):
    """Write one tab-separated line per message: id, message, gold label, predicted labels."""
    # append mode: running the script twice adds every prediction to the file a second time
    with open(outfile_, "a") as output:
        for k, v in prediction_dict.items():
            output.write(str(k[0]) + "\t" + str(k[1]) + "\t" + str(k[2]) + "\t" + v + "\n")


def check_messages(message_dict, offensive_term):
    """
    :param message_dict: maps (message id, message, gold label) to the list of stemmed tokens of the message
    :param offensive_term: list of stemmed offensive terms
    :return: dict mapping each message key to ABU/EXP or NOTABU/O (tab-separated)
    """

    """
    1 or more offensive terms in the message == ABU (explicit)
    """

    offensive_term = set(offensive_term)  # constant-time membership tests
    data_tokens = collections.defaultdict(list)

    for msg_key, tokenized_message in message_dict.items():
        for elem in tokenized_message:
            if elem in offensive_term:
                data_tokens[msg_key].append(elem)

    """
    classification
    """

    classified_elems = {}
    for msid, token_list in data_tokens.items():
        if len(token_list) >= 1:
            #print(msid, token_list)
            classified_elems[msid] = "ABU\tEXP"

    for entry in message_dict.keys():
        if entry not in classified_elems:
            classified_elems[entry] = "NOTABU\tO"

    return classified_elems


def clean_tokens(tweet_message):
    tweets_clean = []

    # only remove the hash sign (#), so that hashtags become plain words
    tweet_message = re.sub(r'#', '', tweet_message)
    # tokenize (lowercased, user handles stripped, repeated characters reduced)
    tokens = TWEET_TOKENIZER.tokenize(tweet_message)
    # get the stem of each token
    for elem in tokens:
        stem_token = STEMMER.stem(elem)  # stemming word
        #print(elem, stem_token)
        tweets_clean.append(stem_token)

    return tweets_clean


def read_and_match(comments, terms2):

    tokens_ = {}

    with open(comments, newline='') as csvfile:
        read_data = csv.reader(csvfile, delimiter='\t')  # OLID
        # next(read_data)  # uncomment to skip a header row

        for row in read_data:
            message_id = row[0]
            message = row[1]
            label = row[-1]

            """
            tokenize data with the NLTK TweetTokenizer and clean it
            """
            tokens = clean_tokens(message)
            tokens_[(message_id, message, label)] = tokens

    wiegand_extended = []

    with open(terms2) as f:
        for line in f:
            line_splitted = line.strip().split("\t")
            token_off = line_splitted[0].split("_")[0]
            score_off = float(line_splitted[1])
            # keep only lexicon entries with a score of at least 0.75
            if score_off >= 0.75:
                stem_offensive = STEMMER.stem(token_off)
                wiegand_extended.append(stem_offensive)

    print()
    print("number of test messages: " + str(len(tokens_)))  # number of messages

    set_4 = check_messages(tokens_, wiegand_extended)
    #print(set_4)

    # outdir = "./predictions-v4/test/"

    outfile4 = "abuseval_test.txt"

    output_print(set_4, outfile4)


if __name__ == '__main__':

    """
    abusive_comments_test = .tsv file with the following columns: message id (OffensEval test file);
    message (OffensEval test file); AbuseEval label (available in ./data/abuseval_labels/abuseval_offenseval_test.tsv)

    list_offensive_words4 = expanded lexicon of offensive/abusive terms, available at https://github.com/uds-lsv/lexicon-of-abusive-words/tree/master/Lexicons
    """

    abusive_comments_test = "../abuseval_offenseval_test.tsv"
    list_offensive_words4 = "../../../implicit_explicit/offeseval_data/dictionary_approach/wiegand_expanded"  # Wiegand et al., 2018 - offensive terms expanded

    read_and_match(abusive_comments_test, list_offensive_words4)
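classify_abusivevalpy.py boils down to one rule. It keeps the lexicon entries scoring at least 0.75, then labels a message ABU/EXP as soon as one of its tokens matches a kept entry, and NOTABU/O otherwise.

What follows is a minimal, dependency-free sketch of that rule. Note these differences and assumptions:
- Unlike the script, it swaps NLTK's TweetTokenizer and PorterStemmer for a simple lowercase regex tokenizer with no stemming. Inflected forms therefore match only when they are listed verbatim.
- It assumes lexicon lines of the form term_suffix<TAB>score, as the parsing in read_and_match implies.
- The lexicon entries in the demo are hypothetical.

```python
import re

THRESHOLD = 0.75  # same score cut-off as read_and_match


def load_lexicon(lines, threshold=THRESHOLD):
    """Parse term_suffix<TAB>score lines and keep the bare term when score >= threshold."""
    terms = set()
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        term = fields[0].split("_")[0]
        if float(fields[1]) >= threshold:
            terms.add(term.lower())
    return terms


def tokenize(message):
    """Simplified stand-in for TweetTokenizer: drop @handles and '#', lowercase, keep word-like runs."""
    message = re.sub(r"@\w+", " ", message).replace("#", "")
    return re.findall(r"[a-z0-9']+", message.lower())


def classify(message, terms):
    """One or more lexicon hits -> explicit abuse (ABU/EXP); otherwise NOTABU/O."""
    hits = [tok for tok in tokenize(message) if tok in terms]
    return "ABU\tEXP" if hits else "NOTABU\tO"


if __name__ == "__main__":
    # hypothetical lexicon entries, not taken from Wiegand et al.'s lexicon
    lexicon = ["idiot_noun\t0.91", "moron_noun\t0.88", "silly_adj\t0.40"]
    terms = load_lexicon(lexicon)
    for tweet in ["@USER you are such a #moron", "@USER that was silly", "what an IDIOT"]:
        print(classify(tweet, terms).replace("\t", " "), "|", tweet)
```

The rule needs no training data, so its behaviour depends entirely on the lexicon and the threshold. By construction it can never output an implicit (IMP) label.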
--------------------------------------------------------------------------------
/dictionary-based_experiments/classify_offenseval.py:
--------------------------------------------------------------------------------
import collections
import csv
import re

from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer

# build the stemmer and the tokenizer once, instead of once per token / per message
STEMMER = PorterStemmer()
TWEET_TOKENIZER = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)


def output_print(prediction_dict, outfile_):
    """Write one tab-separated line per message: id, message, gold label, predicted labels."""
    # append mode: running the script twice adds every prediction to the file a second time
    with open(outfile_, "a") as output:
        for k, v in prediction_dict.items():
            output.write(str(k[0]) + "\t" + str(k[1]) + "\t" + str(k[2]) + "\t" + v + "\n")


def check_messages(message_dict, offensive_term):
    """
    :param message_dict: maps (message id, message, gold label) to the list of stemmed tokens of the message
    :param offensive_term: list of stemmed offensive terms
    :return: dict mapping each message key to ABU/EXP or NOTABU/O (tab-separated)
    """

    """
    1 or more offensive terms in the message == ABU (explicit)
    """

    offensive_term = set(offensive_term)  # constant-time membership tests
    data_tokens = collections.defaultdict(list)

    for msg_key, tokenized_message in message_dict.items():
        for elem in tokenized_message:
            if elem in offensive_term:
                data_tokens[msg_key].append(elem)

    """
    classification
    """

    classified_elems = {}
    for msid, token_list in data_tokens.items():
        if len(token_list) >= 1:
            #print(msid, token_list)
            classified_elems[msid] = "ABU\tEXP"

    for entry in message_dict.keys():
        if entry not in classified_elems:
            classified_elems[entry] = "NOTABU\tO"

    return classified_elems


def clean_tokens(tweet_message):
    tweets_clean = []

    # only remove the hash sign (#), so that hashtags become plain words
    tweet_message = re.sub(r'#', '', tweet_message)
    # tokenize (lowercased, user handles stripped, repeated characters reduced)
    tokens = TWEET_TOKENIZER.tokenize(tweet_message)
    # get the stem of each token
    for elem in tokens:
        stem_token = STEMMER.stem(elem)  # stemming word
        #print(elem, stem_token)
        tweets_clean.append(stem_token)

    return tweets_clean


def read_and_match(comments, terms2):

    tokens_ = {}

    with open(comments, newline='') as csvfile:
        read_data = csv.reader(csvfile, delimiter='\t')  # OLID
        next(read_data)  # skip the header row (OLID; HateEval Train)

        for row in read_data:
            message_id = row[0]
            message = row[1]
            label = row[-1]

            """
            tokenize data with the NLTK TweetTokenizer and clean it
            """
            tokens = clean_tokens(message)
            tokens_[(message_id, message, label)] = tokens

    wiegand_extended = []

    with open(terms2) as f:
        for line in f:
            line_splitted = line.strip().split("\t")
            token_off = line_splitted[0].split("_")[0]
            score_off = float(line_splitted[1])
            # keep only lexicon entries with a score of at least 0.75
            if score_off >= 0.75:
                stem_offensive = STEMMER.stem(token_off)
                wiegand_extended.append(stem_offensive)

    print()
    print("number of test messages: " + str(len(tokens_)))  # number of messages

    set_4 = check_messages(tokens_, wiegand_extended)
    #print(set_4)

    # outdir = "./predictions-v4/test/"

    # NOTE: classify_abusivevalpy.py appends to a file with the same name
    outfile4 = "abuseval_test.txt"

    output_print(set_4, outfile4)


if __name__ == '__main__':

    """
    offensive_comments_test = OffensEval test data (tab-separated: message id, message, label; first row is a header)
    list_offensive_words4 = expanded lexicon of offensive/abusive terms, available at https://github.com/uds-lsv/lexicon-of-abusive-words/tree/master/Lexicons
    """

    offensive_comments_test = "../abuseval_offenseval_test.tsv"
    list_offensive_words4 = "../../../implicit_explicit/offeseval_data/dictionary_approach/wiegand_expanded"  # Wiegand et al., 2018 - offensive terms expanded

    read_and_match(offensive_comments_test, list_offensive_words4)
--------------------------------------------------------------------------------
/dictionary-based_experiments/evaluation.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-

"""
evaluation script for the HaSpeeDe 2018 shared task

USAGE: evaluation.py [predictions_file]

predictions_file: tab-separated output of the classify scripts
(id, message, gold label, binary prediction, explicit/implicit prediction)
"""

import sys

from sklearn.metrics import classification_report as report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_recall_fscore_support as score

# sklearn's confusion_matrix replaces pandas_ml.ConfusionMatrix: pandas_ml is no longer
# maintained and may fail to import with current pandas / scikit-learn releases.


def preproc(infile):
    """
    Read gold and predicted labels. Each row ends with three tab-separated columns:
    gold label (EXP/IMP/NOTABU), binary prediction (ABU/NOTABU), explicit prediction (EXP/O)
    """
    y_test = []
    y_gold = []

    reader = infile.read().splitlines()
    for row in reader:
        if not row.strip():
            continue  # skip blank lines
        columns = row.split('\t')
        # collapse the gold label to the binary scheme: EXP and IMP both become ABU
        gold_label_abusive = columns[-3].replace('EXP', 'ABU').replace('IMP', 'ABU')
        gold_label_explicit = columns[-3]
        dict_pred_binary = columns[-2]
        dict_pred_ter = columns[-1].replace('O', 'NOTABU')

        #print(gold_label)
        # binary evaluation (ABU vs NOTABU); swap in the commented lines for EXP/IMP/NOTABU
        y_gold.append(gold_label_abusive)
        # y_gold.append(gold_label_explicit)
        # y_test.append(dict_pred_ter)
        y_test.append(dict_pred_binary)

    return y_gold, y_test

#def preproc1(infile):
#    y = []
#
#    reader = infile.read().splitlines()
#    for row in reader:
#        print(row)
#        label = row.split('\t')[-1]
#        y.append(label)
#
#    return y


def evaluate(y_gold, y_predicted):

    # gold labels that never occur among the predictions (their precision is undefined)
    print(set(y_gold) - set(y_predicted))

    #print(report(y_gold, y_predicted))

    # per-class scores, reported in alphabetical label order
    labels = sorted(set(y_gold) | set(y_predicted))
    precision, recall, fscore, _ = score(y_gold, y_predicted, labels=labels)
    print('\n {}'.format(" ".join(labels)))
    print('P: {}'.format(precision))
    print('R: {}'.format(recall))
    print('F: {}'.format(fscore))

    mprecision, mrecall, mfscore, _ = score(y_gold, y_predicted, average='macro')
    print('\n MACRO-AVG')
    print('P: {}'.format(mprecision))
    print('R: {}'.format(mrecall))
    print('F: {}'.format(mfscore))

    print('\n CONFUSION MATRIX (rows = gold, columns = predicted, order: {}):'.format(" ".join(labels)))
    print(confusion_matrix(y_gold, y_predicted, labels=labels))


if __name__ == "__main__":

    """
    The evaluation script assumes all labels (gold and predicted) are in the same file.
    Order of data: id, message, gold label, predicted labels
    """
    with open(sys.argv[1], 'r') as tf:
        y_gold, y_predicted = preproc(tf)

    evaluate(y_gold, y_predicted)
--------------------------------------------------------------------------------
/keywords/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tommasoc80/AbuseEval/73401ae84c703b7471197385c26f64848535a722/keywords/.DS_Store
--------------------------------------------------------------------------------
/keywords/keywords_offenseval_test.txt:
--------------------------------------------------------------------------------
1 | OFF class - Top 50 2 | 3 | hiding 0.53 4 | personality 0.53 5 | metal 0.536 6 | domesticterrorists 0.541 7 | funds 0.541 8 | arrestgeorgesoros 0.541 9 | assumption 0.549 10 | dead 0.553 11 | boycottnfl 0.553 12 | future 0.557 13 | alien 0.557 14 | ff 0.557 15 | nigeria 0.559 16 | 1500 0.562 17 | holder 0.565 18 | impeached 0.565 19 | needed 0.565 20 | including 0.57 21 | garbage 0.575 22 | greatestthingsaboutthe90s 0.577 23 | unhinged 0.577 24 | sierraburgessisaloser 0.591 25 | tricks 0.591 26 | dirty 0.591 27 | dings 0.591 28 | nauseous 0.594 29 | rosie 0.594 30 | twitterfuck 0.605 31 | dickhead 0.635 32 | pet 0.635 33 | saw 0.658 34 | mytwitteranniversary 0.68 35 | bitches 0.681 36 | carrey 0.685 37 | pervert 0.685 38 | gutierrez 0.685 39 | barrysoetorobullshit 0.685 40 | morebs 0.685 41 | racebaiter 0.685 42 | game 0.707 43 | sucks 0.707 44 | taste 0.721 45 | 5k 0.732 46 | racist 0.737 47 | potus 0.748 48 | extremely 0.756 49 | oh 0.772 50 | clown 0.801 51 | females 0.832 52 | davidhogg 0.941 53 | 54 | NOT class - Top 50 55 | 56 | thinking 0.608 57 | moviefightslive 0.608 58 | spaldo 0.609 59 | godson 0.609 60 | hardest 0.609 61 | regretted
0.609 62 | realsmart 0.612 63 | quietly 0.613 64 | obvious 0.613 65 | martina 0.614 66 | background 0.614 67 | pensions 0.615 68 | caps 0.63 69 | ah 0.631 70 | hunting 0.653 71 | 4chan 0.653 72 | praywithoutceasing 0.656 73 | anonymous 0.659 74 | sounds 0.661 75 | fair 0.661 76 | typical 0.669 77 | boycottthenfl 0.674 78 | bepannaah 0.691 79 | greatestthingsaboutthe90s 0.691 80 | zuckerberg 0.693 81 | lies 0.693 82 | nasty 0.703 83 | stimulation 0.707 84 | anal 0.707 85 | dust 0.707 86 | bite 0.707 87 | ideas 0.722 88 | marry 0.729 89 | managed 0.744 90 | 2x 0.744 91 | duckduckgo 0.749 92 | jamule 0.749 93 | liberallogic 0.755 94 | monbebeselcaday 0.757 95 | farticus 0.765 96 | pink 0.767 97 | ripmacmiller 0.767 98 | muslim 0.776 99 | sam 0.824 100 | irish 0.833 101 | titty 0.871 102 | literally 0.871 103 | revolting 0.892 104 | fucking 1.0 105 | nickidagoat 1.0 106 | -------------------------------------------------------------------------------- /keywords/keywords_offenseval_train.txt: -------------------------------------------------------------------------------- 1 | OFF class - Top 50 2 | 3 | idio 0.834 4 | boatload 0.836 5 | diversion 0.841 6 | hy 0.844 7 | blisters 0.852 8 | coont 0.856 9 | steroid 0.856 10 | hobag 0.858 11 | lmfaoo 0.874 12 | german 0.878 13 | kaze 0.882 14 | whiff 0.89 15 | sexist 0.892 16 | dominos 0.893 17 | patrick 0.893 18 | 1001 0.893 19 | nailed 0.893 20 | yen 0.894 21 | thw 0.897 22 | hplyk 0.897 23 | abnormal 0.902 24 | crack 0.902 25 | resounding 0.921 26 | ohh 0.923 27 | awh 0.923 28 | ode 0.923 29 | sweetie 0.923 30 | poppin 0.923 31 | qual 0.94 32 | claustrophobic 0.941 33 | kookoo 0.941 34 | melt 0.941 35 | unepic 1.0 36 | miriam 1.0 37 | uncultured 1.0 38 | omfg 1.0 39 | delirious 1.0 40 | bothering 1.0 41 | shithouse 1.0 42 | fuckass 1.0 43 | bafoonicus 1.0 44 | dillusional 1.0 45 | dickhead 1.0 46 | batting 1.0 47 | hungery 1.0 48 | bunk 1.0 49 | dickmatized 1.0 50 | ostrich 1.0 51 | pornhub 1.0 52 | fuckbucket 1.0 53 | 54 
| NOT class - Top 50 55 | 56 | mossad 1.0 57 | dreaming 1.0 58 | omggggggg 1.0 59 | grape 1.0 60 | hooray 1.0 61 | ingles 1.0 62 | goober 1.0 63 | booooring 1.0 64 | muscles 1.0 65 | definantly 1.0 66 | dumped 1.0 67 | gamestop 1.0 68 | romanian 1.0 69 | phinnaly 1.0 70 | yeeeeess 1.0 71 | diplomat 1.0 72 | fartacus 1.0 73 | besth 1.0 74 | fad 1.0 75 | painter 1.0 76 | quack 1.0 77 | hawd 1.0 78 | diabetic 1.0 79 | oboma 1.0 80 | toffee 1.0 81 | unwell 1.0 82 | beautifull 1.0 83 | childess 1.0 84 | boat 1.0 85 | chatterbox 1.0 86 | untreatable 1.0 87 | doomentio 1.0 88 | bullcrap 1.0 89 | meeeeeeee 1.0 90 | speechless 1.0 91 | welcum 1.0 92 | glorita 1.0 93 | niggaaahhhh 1.0 94 | fk 1.0 95 | razzinfrazzinmaggle 1.0 96 | darling 1.0 97 | lmfaoooooo 1.0 98 | rejoicing 1.0 99 | burger 1.0 100 | eggman 1.0 101 | dd 1.0 102 | kingggg 1.0 103 | fluffy 1.0 104 | follback 1.0 105 | austria 1.0 106 | --------------------------------------------------------------------------------
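The per-class and macro-averaged scores printed by dictionary-based_experiments/evaluation.py can be cross-checked without sklearn. The sketch below is an illustrative reimplementation and is not part of the repository. `collapse_gold` and `prf_per_class` are hypothetical names. It assumes gold labels EXP, IMP and NOTABU and binary predictions ABU/NOTABU, which is what `preproc` appears to expect. It collapses EXP/IMP into ABU, then computes precision, recall and F1 per class and their unweighted (macro) mean. Classes are listed in sorted order, and undefined 0/0 ratios are set to 0.0, matching sklearn's default.

```python
from collections import Counter


def collapse_gold(label):
    """Hypothetical helper: map explicit (EXP) and implicit (IMP) abuse to ABU,
    mirroring the binary setting of preproc(); other labels pass through."""
    return "ABU" if label in ("EXP", "IMP") else label


def prf_per_class(gold, pred):
    """Return ({label: (P, R, F1)}, (macro P, macro R, macro F1)).

    Labels are taken from gold and predictions and sorted, so with ABU/NOTABU
    the order is ABU then NOTABU."""
    labels = sorted(set(gold) | set(pred))
    tp = Counter(g for g, p in zip(gold, pred) if g == p)  # correct hits per label
    n_pred = Counter(pred)  # how often each label was predicted
    n_gold = Counter(gold)  # how often each label occurs in the gold standard
    per_class = {}
    for lab in labels:
        # 0/0 ratios become 0.0 (sklearn does the same, plus a warning).
        p = tp[lab] / n_pred[lab] if n_pred[lab] else 0.0
        r = tp[lab] / n_gold[lab] if n_gold[lab] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        per_class[lab] = (p, r, f)
    # Macro average: unweighted mean over classes of each metric.
    macro = tuple(sum(v[i] for v in per_class.values()) / len(labels) for i in range(3))
    return per_class, macro


if __name__ == "__main__":
    # Toy data (made up for illustration): five rows in the gold/prediction format.
    gold = [collapse_gold(x) for x in ["EXP", "IMP", "NOTABU", "NOTABU", "EXP"]]
    pred = ["ABU", "NOTABU", "NOTABU", "ABU", "ABU"]
    per_class, macro = prf_per_class(gold, pred)
    for lab, (p, r, f) in per_class.items():
        print(f"{lab}: P={p:.3f} R={r:.3f} F={f:.3f}")
    print("MACRO: P={:.3f} R={:.3f} F={:.3f}".format(*macro))
```

On these toy rows, ABU scores 0.667 on all three metrics, NOTABU scores 0.5, and the macro average is about 0.583. Because the macro average is unweighted, a rare class counts as much as a frequent one. This is why the script reports it alongside the per-class figures.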