├── LICENSE └── README.md /LICENSE: -------------------------------------------------------------------------------- 1 | Attribution 4.0 International 2 | 3 | ======================================================================= 4 | 5 | Creative Commons Corporation ("Creative Commons") is not a law firm and 6 | does not provide legal services or legal advice. Distribution of 7 | Creative Commons public licenses does not create a lawyer-client or 8 | other relationship. Creative Commons makes its licenses and related 9 | information available on an "as-is" basis. Creative Commons gives no 10 | warranties regarding its licenses, any material licensed under their 11 | terms and conditions, or any related information. Creative Commons 12 | disclaims all liability for damages resulting from their use to the 13 | fullest extent possible. 14 | 15 | Using Creative Commons Public Licenses 16 | 17 | Creative Commons public licenses provide a standard set of terms and 18 | conditions that creators and other rights holders may use to share 19 | original works of authorship and other material subject to copyright 20 | and certain other rights specified in the public license below. The 21 | following considerations are for informational purposes only, are not 22 | exhaustive, and do not form part of our licenses. 23 | 24 | Considerations for licensors: Our public licenses are 25 | intended for use by those authorized to give the public 26 | permission to use material in ways otherwise restricted by 27 | copyright and certain other rights. Our licenses are 28 | irrevocable. Licensors should read and understand the terms 29 | and conditions of the license they choose before applying it. 30 | Licensors should also secure all rights necessary before 31 | applying our licenses so that the public can reuse the 32 | material as expected. Licensors should clearly mark any 33 | material not subject to the license. 
This includes other CC- 34 | licensed material, or material used under an exception or 35 | limitation to copyright. More considerations for licensors: 36 | wiki.creativecommons.org/Considerations_for_licensors 37 | 38 | Considerations for the public: By using one of our public 39 | licenses, a licensor grants the public permission to use the 40 | licensed material under specified terms and conditions. If 41 | the licensor's permission is not necessary for any reason--for 42 | example, because of any applicable exception or limitation to 43 | copyright--then that use is not regulated by the license. Our 44 | licenses grant only permissions under copyright and certain 45 | other rights that a licensor has authority to grant. Use of 46 | the licensed material may still be restricted for other 47 | reasons, including because others have copyright or other 48 | rights in the material. A licensor may make special requests, 49 | such as asking that all changes be marked or described. 50 | Although not required by our licenses, you are encouraged to 51 | respect those requests where reasonable. More_considerations 52 | for the public: 53 | wiki.creativecommons.org/Considerations_for_licensees 54 | 55 | ======================================================================= 56 | 57 | Creative Commons Attribution 4.0 International Public License 58 | 59 | By exercising the Licensed Rights (defined below), You accept and agree 60 | to be bound by the terms and conditions of this Creative Commons 61 | Attribution 4.0 International Public License ("Public License"). To the 62 | extent this Public License may be interpreted as a contract, You are 63 | granted the Licensed Rights in consideration of Your acceptance of 64 | these terms and conditions, and the Licensor grants You such rights in 65 | consideration of benefits the Licensor receives from making the 66 | Licensed Material available under these terms and conditions. 67 | 68 | 69 | Section 1 -- Definitions. 70 | 71 | a. 
Adapted Material means material subject to Copyright and Similar 72 | Rights that is derived from or based upon the Licensed Material 73 | and in which the Licensed Material is translated, altered, 74 | arranged, transformed, or otherwise modified in a manner requiring 75 | permission under the Copyright and Similar Rights held by the 76 | Licensor. For purposes of this Public License, where the Licensed 77 | Material is a musical work, performance, or sound recording, 78 | Adapted Material is always produced where the Licensed Material is 79 | synched in timed relation with a moving image. 80 | 81 | b. Adapter's License means the license You apply to Your Copyright 82 | and Similar Rights in Your contributions to Adapted Material in 83 | accordance with the terms and conditions of this Public License. 84 | 85 | c. Copyright and Similar Rights means copyright and/or similar rights 86 | closely related to copyright including, without limitation, 87 | performance, broadcast, sound recording, and Sui Generis Database 88 | Rights, without regard to how the rights are labeled or 89 | categorized. For purposes of this Public License, the rights 90 | specified in Section 2(b)(1)-(2) are not Copyright and Similar 91 | Rights. 92 | 93 | d. Effective Technological Measures means those measures that, in the 94 | absence of proper authority, may not be circumvented under laws 95 | fulfilling obligations under Article 11 of the WIPO Copyright 96 | Treaty adopted on December 20, 1996, and/or similar international 97 | agreements. 98 | 99 | e. Exceptions and Limitations means fair use, fair dealing, and/or 100 | any other exception or limitation to Copyright and Similar Rights 101 | that applies to Your use of the Licensed Material. 102 | 103 | f. Licensed Material means the artistic or literary work, database, 104 | or other material to which the Licensor applied this Public 105 | License. 106 | 107 | g. 
Licensed Rights means the rights granted to You subject to the 108 | terms and conditions of this Public License, which are limited to 109 | all Copyright and Similar Rights that apply to Your use of the 110 | Licensed Material and that the Licensor has authority to license. 111 | 112 | h. Licensor means the individual(s) or entity(ies) granting rights 113 | under this Public License. 114 | 115 | i. Share means to provide material to the public by any means or 116 | process that requires permission under the Licensed Rights, such 117 | as reproduction, public display, public performance, distribution, 118 | dissemination, communication, or importation, and to make material 119 | available to the public including in ways that members of the 120 | public may access the material from a place and at a time 121 | individually chosen by them. 122 | 123 | j. Sui Generis Database Rights means rights other than copyright 124 | resulting from Directive 96/9/EC of the European Parliament and of 125 | the Council of 11 March 1996 on the legal protection of databases, 126 | as amended and/or succeeded, as well as other essentially 127 | equivalent rights anywhere in the world. 128 | 129 | k. You means the individual or entity exercising the Licensed Rights 130 | under this Public License. Your has a corresponding meaning. 131 | 132 | 133 | Section 2 -- Scope. 134 | 135 | a. License grant. 136 | 137 | 1. Subject to the terms and conditions of this Public License, 138 | the Licensor hereby grants You a worldwide, royalty-free, 139 | non-sublicensable, non-exclusive, irrevocable license to 140 | exercise the Licensed Rights in the Licensed Material to: 141 | 142 | a. reproduce and Share the Licensed Material, in whole or 143 | in part; and 144 | 145 | b. produce, reproduce, and Share Adapted Material. 146 | 147 | 2. Exceptions and Limitations. 
For the avoidance of doubt, where 148 | Exceptions and Limitations apply to Your use, this Public 149 | License does not apply, and You do not need to comply with 150 | its terms and conditions. 151 | 152 | 3. Term. The term of this Public License is specified in Section 153 | 6(a). 154 | 155 | 4. Media and formats; technical modifications allowed. The 156 | Licensor authorizes You to exercise the Licensed Rights in 157 | all media and formats whether now known or hereafter created, 158 | and to make technical modifications necessary to do so. The 159 | Licensor waives and/or agrees not to assert any right or 160 | authority to forbid You from making technical modifications 161 | necessary to exercise the Licensed Rights, including 162 | technical modifications necessary to circumvent Effective 163 | Technological Measures. For purposes of this Public License, 164 | simply making modifications authorized by this Section 2(a) 165 | (4) never produces Adapted Material. 166 | 167 | 5. Downstream recipients. 168 | 169 | a. Offer from the Licensor -- Licensed Material. Every 170 | recipient of the Licensed Material automatically 171 | receives an offer from the Licensor to exercise the 172 | Licensed Rights under the terms and conditions of this 173 | Public License. 174 | 175 | b. No downstream restrictions. You may not offer or impose 176 | any additional or different terms or conditions on, or 177 | apply any Effective Technological Measures to, the 178 | Licensed Material if doing so restricts exercise of the 179 | Licensed Rights by any recipient of the Licensed 180 | Material. 181 | 182 | 6. No endorsement. Nothing in this Public License constitutes or 183 | may be construed as permission to assert or imply that You 184 | are, or that Your use of the Licensed Material is, connected 185 | with, or sponsored, endorsed, or granted official status by, 186 | the Licensor or others designated to receive attribution as 187 | provided in Section 3(a)(1)(A)(i). 
188 | 189 | b. Other rights. 190 | 191 | 1. Moral rights, such as the right of integrity, are not 192 | licensed under this Public License, nor are publicity, 193 | privacy, and/or other similar personality rights; however, to 194 | the extent possible, the Licensor waives and/or agrees not to 195 | assert any such rights held by the Licensor to the limited 196 | extent necessary to allow You to exercise the Licensed 197 | Rights, but not otherwise. 198 | 199 | 2. Patent and trademark rights are not licensed under this 200 | Public License. 201 | 202 | 3. To the extent possible, the Licensor waives any right to 203 | collect royalties from You for the exercise of the Licensed 204 | Rights, whether directly or through a collecting society 205 | under any voluntary or waivable statutory or compulsory 206 | licensing scheme. In all other cases the Licensor expressly 207 | reserves any right to collect such royalties. 208 | 209 | 210 | Section 3 -- License Conditions. 211 | 212 | Your exercise of the Licensed Rights is expressly made subject to the 213 | following conditions. 214 | 215 | a. Attribution. 216 | 217 | 1. If You Share the Licensed Material (including in modified 218 | form), You must: 219 | 220 | a. retain the following if it is supplied by the Licensor 221 | with the Licensed Material: 222 | 223 | i. identification of the creator(s) of the Licensed 224 | Material and any others designated to receive 225 | attribution, in any reasonable manner requested by 226 | the Licensor (including by pseudonym if 227 | designated); 228 | 229 | ii. a copyright notice; 230 | 231 | iii. a notice that refers to this Public License; 232 | 233 | iv. a notice that refers to the disclaimer of 234 | warranties; 235 | 236 | v. a URI or hyperlink to the Licensed Material to the 237 | extent reasonably practicable; 238 | 239 | b. indicate if You modified the Licensed Material and 240 | retain an indication of any previous modifications; and 241 | 242 | c. 
indicate the Licensed Material is licensed under this 243 | Public License, and include the text of, or the URI or 244 | hyperlink to, this Public License. 245 | 246 | 2. You may satisfy the conditions in Section 3(a)(1) in any 247 | reasonable manner based on the medium, means, and context in 248 | which You Share the Licensed Material. For example, it may be 249 | reasonable to satisfy the conditions by providing a URI or 250 | hyperlink to a resource that includes the required 251 | information. 252 | 253 | 3. If requested by the Licensor, You must remove any of the 254 | information required by Section 3(a)(1)(A) to the extent 255 | reasonably practicable. 256 | 257 | 4. If You Share Adapted Material You produce, the Adapter's 258 | License You apply must not prevent recipients of the Adapted 259 | Material from complying with this Public License. 260 | 261 | 262 | Section 4 -- Sui Generis Database Rights. 263 | 264 | Where the Licensed Rights include Sui Generis Database Rights that 265 | apply to Your use of the Licensed Material: 266 | 267 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right 268 | to extract, reuse, reproduce, and Share all or a substantial 269 | portion of the contents of the database; 270 | 271 | b. if You include all or a substantial portion of the database 272 | contents in a database in which You have Sui Generis Database 273 | Rights, then the database in which You have Sui Generis Database 274 | Rights (but not its individual contents) is Adapted Material; and 275 | 276 | c. You must comply with the conditions in Section 3(a) if You Share 277 | all or a substantial portion of the contents of the database. 278 | 279 | For the avoidance of doubt, this Section 4 supplements and does not 280 | replace Your obligations under this Public License where the Licensed 281 | Rights include other Copyright and Similar Rights. 282 | 283 | 284 | Section 5 -- Disclaimer of Warranties and Limitation of Liability. 285 | 286 | a. 
UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE 287 | EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS 288 | AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF 289 | ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, 290 | IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, 291 | WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR 292 | PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, 293 | ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT 294 | KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT 295 | ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. 296 | 297 | b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE 298 | TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, 299 | NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, 300 | INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, 301 | COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR 302 | USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN 303 | ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR 304 | DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR 305 | IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. 306 | 307 | c. The disclaimer of warranties and limitation of liability provided 308 | above shall be interpreted in a manner that, to the extent 309 | possible, most closely approximates an absolute disclaimer and 310 | waiver of all liability. 311 | 312 | 313 | Section 6 -- Term and Termination. 314 | 315 | a. This Public License applies for the term of the Copyright and 316 | Similar Rights licensed here. However, if You fail to comply with 317 | this Public License, then Your rights under this Public License 318 | terminate automatically. 319 | 320 | b. Where Your right to use the Licensed Material has terminated under 321 | Section 6(a), it reinstates: 322 | 323 | 1. 
automatically as of the date the violation is cured, provided 324 | it is cured within 30 days of Your discovery of the 325 | violation; or 326 | 327 | 2. upon express reinstatement by the Licensor. 328 | 329 | For the avoidance of doubt, this Section 6(b) does not affect any 330 | right the Licensor may have to seek remedies for Your violations 331 | of this Public License. 332 | 333 | c. For the avoidance of doubt, the Licensor may also offer the 334 | Licensed Material under separate terms or conditions or stop 335 | distributing the Licensed Material at any time; however, doing so 336 | will not terminate this Public License. 337 | 338 | d. Sections 1, 5, 6, 7, and 8 survive termination of this Public 339 | License. 340 | 341 | 342 | Section 7 -- Other Terms and Conditions. 343 | 344 | a. The Licensor shall not be bound by any additional or different 345 | terms or conditions communicated by You unless expressly agreed. 346 | 347 | b. Any arrangements, understandings, or agreements regarding the 348 | Licensed Material not stated herein are separate from and 349 | independent of the terms and conditions of this Public License. 350 | 351 | 352 | Section 8 -- Interpretation. 353 | 354 | a. For the avoidance of doubt, this Public License does not, and 355 | shall not be interpreted to, reduce, limit, restrict, or impose 356 | conditions on any use of the Licensed Material that could lawfully 357 | be made without permission under this Public License. 358 | 359 | b. To the extent possible, if any provision of this Public License is 360 | deemed unenforceable, it shall be automatically reformed to the 361 | minimum extent necessary to make it enforceable. If the provision 362 | cannot be reformed, it shall be severed from this Public License 363 | without affecting the enforceability of the remaining terms and 364 | conditions. 365 | 366 | c. 
No term or condition of this Public License will be waived and no 367 | failure to comply consented to unless expressly agreed to by the 368 | Licensor. 369 | 370 | d. Nothing in this Public License constitutes or may be interpreted 371 | as a limitation upon, or waiver of, any privileges and immunities 372 | that apply to the Licensor or You, including from the legal 373 | processes of any jurisdiction or authority. 374 | 375 | 376 | ======================================================================= 377 | 378 | Creative Commons is not a party to its public 379 | licenses. Notwithstanding, Creative Commons may elect to apply one of 380 | its public licenses to material it publishes and in those instances 381 | will be considered the “Licensor.” The text of the Creative Commons 382 | public licenses is dedicated to the public domain under the CC0 Public 383 | Domain Dedication. Except for the limited purpose of indicating that 384 | material is shared under a Creative Commons public license or as 385 | otherwise permitted by the Creative Commons policies published at 386 | creativecommons.org/policies, Creative Commons does not authorize the 387 | use of the trademark "Creative Commons" or any other trademark or logo 388 | of Creative Commons without its prior written consent including, 389 | without limitation, in connection with any unauthorized modifications 390 | to any of its public licenses or any other arrangements, 391 | understandings, or agreements concerning use of licensed material. For 392 | the avoidance of doubt, this paragraph does not form part of the 393 | public licenses. 394 | 395 | Creative Commons may be contacted at creativecommons.org. 
396 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Comprehensive Guide to Machine Learning Software for Text Screening 2 | 3 | This project aims to provide a comparison of 4 | different software tools for machine learning-assisted text screening. The 5 | comparison is designed to help researchers and practitioners make informed 6 | decisions when selecting a suitable tool for their needs. We compare various 7 | aspects, such as software functionality, data handling capabilities, and 8 | machine learning properties. 9 | 10 | # Table of Contents 11 | - [Inclusion Criteria](#inclusion-criteria) 12 | - [Quick Overview](#overview) 13 | - [Installation](#installation) 14 | - [Data Handling](#data-handling) 15 | - [Machine Learning Properties](#machine-learning-properties) 16 | - [Excluded Software](#excluded-software) 17 | - [Software Description](#software) 18 | - [Contributing](#contributing) 19 | - [License](#license) 20 | - [Contact](#contact) 21 | 22 | 23 | # Inclusion Criteria 24 | 25 | The initial selection process for the 26 | software tools is documented on the [Open Science 27 | Framework](https://osf.io/g3nkz/). Each included tool meets the following 28 | inclusion criteria: 29 | 30 | - Implements a Researcher-in-the-Loop [(RITL)-based active learning cycle](https://www.nature.com/articles/s42256-020-00287-7) for systematically screening large volumes of textual data. 31 | - Achieves a Technology Readiness Level of at least [TRL7](https://en.wikipedia.org/wiki/Technology_readiness_level). 32 | - Offers user-friendly software that is accessible to a broad audience. 33 | - Provides a generic application that is not limited to specific content, fields, or types of interventions. 
34 | 35 | 36 | # Overview 37 | The table below offers a concise overview of various software tools designed 38 | for systematically screening large volumes of textual data using machine 39 | learning techniques. Each software is evaluated based on the following 40 | properties: 41 | 42 | - Is there a website? 43 | - Is the software [open-source](https://opensource.org/osd) (provide a :link: to the source code)? 44 | - Is the software peer-reviewed in a scientific article? 45 | - Is documentation or a manual available (provide a :link:)? 46 | - Is the full version of the software free of charge? 47 | 48 | 49 | | Software | Website | Open-Source | Published | Documentation | Free | 50 | |:-------------------------------:|:------------------------------------------------------------:|:-----------------------------------------------------------:|:------------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------:|:---------------------------:| 51 | | [Abstrackr](#abstrackr) | [:link:](http://abstrackr.cebm.brown.edu) | :x: | [![DOI](https://img.shields.io/badge/DOI-10.1145/2110363.2110464-green.svg)](https://doi.org/10.1145/2110363.2110464) | :x: | :white_check_mark: | 52 | | [ASReview](#asreview) | [:link:](https://asreview.nl/) | :white_check_mark:[:link:](https://github.com/asreview/) | [![DOI](https://img.shields.io/badge/DOI-10.1038/s42256--020--00287--7-green.svg)](https://doi.org/10.1038/s42256-020-00287-7) | :white_check_mark:[:link:](https://asreview.readthedocs.io/) | :white_check_mark: | 53 | | [Colandr](#colandr) | [:link:](https://hslib.jabsom.hawaii.edu/colandr) | :x: | [![DOI](https://img.shields.io/badge/DOI-10.1111/cobi.13117-green.svg)](https://doi.org/10.1111/cobi.13117) | :white_check_mark:[:link:](https://hslib.jabsom.hawaii.edu/colandr/getting_started) | :white_check_mark: | 54 | | 
[DistillerSR](#distillersr) | [:link:](https://www.evidencepartners.com/) | :x: | [![DOI](https://img.shields.io/badge/DOI-10.1016/j.vhri.2020.07.479-green.svg)](https://doi.org/10.1016/j.vhri.2020.07.479)| :white_check_mark:[:link:](https://www.evidencepartners.com/resources) | :x:| 55 | | [EPPI-Reviewer](#eppi-reviewer) | [:link:](https://eppi.ioe.ac.uk/cms/Default.aspx?tabid=2914) | :x: | :x: | :white_check_mark:[:link:](https://eppi.ioe.ac.uk/cms/Default.aspx?tabid=3822) | :x: | 56 | | [FASTREAD](#fastread) | :x: | :white_check_mark:[:link:](https://github.com/fastread/src) | [![DOI](https://img.shields.io/badge/DOI-10.1007/s10664--017--9587--0-green.svg)](https://doi.org/10.1007/s10664-017-9587-0) | :white_check_mark:[:link:](https://github.com/fastread/src/#readme) | :white_check_mark: | 57 | | [Rayyan](#rayyan) | [:link:](https://www.rayyan.ai/) | :x: | [![DOI](https://img.shields.io/badge/DOI-10.1186/s13643--016--0384--4-green.svg)](https://doi.org/10.1186/s13643-016-0384-4) | :white_check_mark:[:link:](https://help.rayyan.ai/hc/en-us) | :x: | 58 | | [SWIFT-Active Screener](#swift-activescreener) | [:link:](https://www.sciome.com/swift-activescreener/) | :x: | [![DOI](https://img.shields.io/badge/DOI-10.1016/j.envint.2020.105623-green.svg)](https://doi.org/10.1016/j.envint.2020.105623) | :white_check_mark:[:link:](https://www.sciome.com/swift-activescreener/knowledgebase/) | :x: | 59 | 60 | 61 | :white_check_mark: Yes/Implemented; 62 | :x: No/Not implemented; 63 | :grey_question: Unknown (requires an issue). 64 | 65 | 1 See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/29 66 | 67 | # Installation 68 | 69 | This table summarizes the various installation options available for each 70 | software tool, highlighting whether: 71 | 72 | - The software can be installed locally, ensuring that data and labeling decisions are only stored on the user's device (yes/no)? 
73 | - The software can be installed on a server (yes/no)? 74 | - The software is available as an online service (Software as a Service - SAAS; yes/no; provide a link to the registration page)? 75 | 76 | | Software | Local | Server | Online Service | 77 | |:----------------------------------------------:|:------------------:|:------------------:|:-------------------------------------------------------------------------------------------------------:| 78 | | [Abstrackr](#abstrackr) | :x: | :x: | :white_check_mark:[:link:](http://abstrackr.cebm.brown.edu) | 79 | | [ASReview](#asreview) | :white_check_mark: | :white_check_mark: | :x: | 80 | | [Colandr](#colandr) | :x: | :x: | :white_check_mark:[:link:](https://www.colandrapp.com/) | 81 | | [DistillerSR](#distillersr) | :x: | :x: | :white_check_mark:[:link:](https://www.distillersr.com/products/distillersr-systematic-review-software) | 82 | | [EPPI-Reviewer](#eppi-reviewer) | :x: | :x: | :white_check_mark:[:link:](https://eppi.ioe.ac.uk/) | 83 | | [FASTREAD](#fastread) | :white_check_mark: | :white_check_mark: | :x: | 84 | | [Rayyan](#rayyan) | :x: | :x: | :white_check_mark:[:link:](https://www.rayyan.ai/) | 85 | | [RobotAnalyst](#robotanalyst) | :x: | :x: | :white_check_mark:[:link:](http://www.nactem.ac.uk/robotanalyst/)1 | 86 | | [SWIFT-Active Screener](#swift-activescreener) | :x: | :x: | :x:[:link:](https://swift.sciome.com/activescreener/) | 87 | 88 | 89 | :white_check_mark: Yes; 90 | :x: No; 91 | :grey_question: Unknown (requires an issue). 92 | 93 | # Data Handling 94 | 95 | This table provides an overview of the data input/output capabilities of each 96 | software, including: 97 | 98 | - Supported import data formats. 99 | - Whether partially labeled data can be imported (yes/no; if yes, as **S**(ingle) or **M**(ultiple) files)? 100 | - Supported export data formats. 101 | - If the export file includes the labeling decisions. 
102 | - Whether the export file can be re-imported into the same software, retaining the labeling decisions (Re-Import-1: yes/no)? 103 | - Whether the export file can be re-imported into reference manager software, retaining the labeling decision (Re-Import-2: yes/no)? 104 | 105 | | Software | Input data format | Partly labeled | Output data format | Labeling decisions | Re-Import-1 | Re-Import-2 | 106 | |:----------------------------------------------:|:-----------------------------------------:|:---------------------------------------------:|:---------------------------:|:---------------------------:|:---------------------------:|:---------------------------:| 107 | | [Abstrackr](#abstrackr) | RIS, TAB, TXT1 | :x: | CSV, XML, RIS | :white_check_mark: | :x: | :white_check_mark: | 108 | | [ASReview](#asreview) | RIS, TSV, CSV, XLSX, TAB, `+`2 | :white_check_mark:(S)`+`2 | RIS, TSV, CSV, XLSX, TAB | :white_check_mark: | :white_check_mark: | :white_check_mark: | 109 | | [Colandr](#colandr) | RIS, BIB, TXT | :white_check_mark:(M) | CSV | :white_check_mark: | :x: | :x: | 110 | | [DistillerSR](#distillersr) | ENLX, RIS, CSV, ZIP | :white_check_mark:(M) | RIS, CSV, XLSX, Word | :grey_question:4 | :grey_question:4 | :grey_question:4 | 111 | | [EPPI-Reviewer](#eppi-reviewer) | RIS, TXT, `+`3 | :white_check_mark:(M) | RIS, XLSX | :grey_question:5 | :grey_question:5 | :grey_question:5 | 112 | | [FASTREAD](#fastread) | CSV | :white_check_mark:(S) | CSV | :white_check_mark: | :white_check_mark: | :x: | 113 | | [Rayyan](#rayyan) | RIS, ENW, BIB, CSV, XML, CIW, NBIB | :white_check_mark:(M) | RIS, BIB, ENW, CSV | :white_check_mark: | :x: | :white_check_mark: | 114 | | [SWIFT-Active Screener](#swift-activescreener) | TXT, RIS, XML, BibTex | :white_check_mark:(M) | CSV, RIS | :white_check_mark: | :grey_question:7 | :white_check_mark: | 115 | 116 | :white_check_mark: Yes/Implemented; 117 | :x: No/Not implemented; 118 | :zap: Only for some extensions (add a footnote for more 
explanation); 119 | :grey_question: Unknown (requires an issue). 120 | 121 | 1 List of PubMed IDs 122 | 123 | 2 ASReview provides several open-source tools to convert file formats (e.g., CSV->RIS or RIS->XLSX), combine datasets (labeled, partly labeled, or unlabeled), and deduplicate records based on title/abstract/DOI. 124 | 125 | 3 EPPI-Reviewer provides a closed-source [online file converter](https://eppi.ioe.ac.uk/cms/Default.aspx?tabid=2934) to convert several file formats to RIS. 126 | 127 | 4 See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/54 128 | 129 | 5 See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/21 130 | 131 | 7 See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/40 132 | 133 | 134 | 135 | 136 | # Machine Learning Properties 137 | 138 | The tables below provide an overview of the machine learning properties of each software. 139 | 140 | 141 | ## Active Learning 142 | 143 | ### Training Data 144 | 145 | - Can the user select training data (prior knowledge) to train the first iteration of the model (yes/no)? 146 | - What is the minimum training data size (provide a number for **R**elevant and **I**rrelevant records)? 
147 | 148 | 149 | 150 | | Software | Training data by user | Minimum training data | 151 | |:----------------------------------------------:|:------------------------------:|:------------------------------------------------:| 152 | | [Abstrackr](#abstrackr) | :x: | :grey_question:1 | 153 | | [ASReview](#asreview) | :white_check_mark: | ≥1R+≥1I | 154 | | [Colandr](#colandr) | :white_check_mark: | 10 | 155 | | [DistillerSR](#distillersr) | :white_check_mark: | 25 or 2%2 | 156 | | [EPPI-Reviewer](#eppi-reviewer) | :white_check_mark: | ≥5R | 157 | | [FASTREAD](#fastread) | :white_check_mark: | ≥1R | 158 | | [Rayyan](#rayyan) | :white_check_mark: | ≥50 with ≥5R | 159 | | [SWIFT-Active Screener](#swift-activescreener) | :white_check_mark:4 | ≥1R5 | 160 | 161 | :white_check_mark: Yes/Implemented; 162 | :x: No/Not implemented; 163 | :zap: With some effort (add a footnote for more explanation); 164 | :grey_question: Unknown (requires an issue). 165 | 166 | 1 See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/34 167 | 168 | 2 Training takes place after screening 25 records or after screening 2% of the dataset, whichever is greater. 169 | 170 | 4 Only relevant records can be provided as training data prior to screening. 171 | 172 | 5 If no relevant records are uploaded prior to screening, training is initiated after screening ≥30 records, including at least 1R and 1I. 173 | 174 | ### Model Selection 175 | 176 | The table below provides an overview of the model selection properties for each software. 177 | 178 | - Can the user select the active learning model (yes/no)? 179 | - Can a user upload their own model (yes/no)? 180 | - Can the feature extraction results be stored (yes/no)? 181 | - Does (re-)training proceed **A**utomatically or is it triggered **M**anually? 182 | - Can the user continue labeling during training (yes/no)? 183 | - Can the user select batch size (yes/no; provide the default)? 
184 | - Is it possible to switch to a different model during screening (yes/no)? 185 | 186 | | Software | Select model | User model | Store feature matrix | Training | Continue | Batch size | Switch | 187 | |:----------------------------------------------:|:------------------:|:------------------:|:------------------:|:--------:|:---------------------------:|:----------:|:-----------------:| 188 | | [Abstrackr](#abstrackr) | :x: | :x: | :x: | A | :white_check_mark: | :x: | :x: | 189 | | [ASReview](#asreview) | :white_check_mark: | :white_check_mark: | :white_check_mark: | A | :white_check_mark: | :x: (1) | :zap:<sup>1</sup> | 190 | | [Colandr](#colandr) | :x: | :x: | :x: | A | :white_check_mark: | :x: (10) | :x: | 191 | | [DistillerSR](#distillersr) | :x: | :x: | :x: | A, M | :white_check_mark: | :x: | :x: | 192 | | [EPPI-Reviewer](#eppi-reviewer) | :x: | :x: | :x: | M | :white_check_mark: | :x: | :x: | 193 | | [FASTREAD](#fastread) | :x: | :x: | :x: | M | :x: | :x: | :x: | 194 | | [Rayyan](#rayyan) | :x: | :x: | :x: | M | :white_check_mark: | :x: | :x: | 195 | | [SWIFT-Active Screener](#swift-activescreener) | :x: | :x: | :x: | A | :grey_question:<sup>3</sup> | :x: (30) | :x: | 196 | 197 | :white_check_mark: Yes/Implemented; 198 | :x: No/Not implemented; 199 | :zap: With some effort (add a footnote with more explanation); :grey_question: Unknown (requires an issue). 200 | 201 | <sup>1</sup> Switching to a different model in ASReview is possible by exporting the data of the first model and importing the data back into ASReview. 202 | The software will recognize all previous labeling decisions, and a new model can be trained. 203 | 204 | <sup>2</sup> See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/29 205 | 206 | <sup>3</sup> See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/40 207 | 208 | 209 | ### Overview of Available Models 210 | 211 | - Which feature extraction methods are available?
212 | **BOW** = bag of words; 213 | **Doc2Vec** = document to vector; 214 | **sBERT** = sentence bidirectional encoder representations from transformers; 215 | **TF–IDF** = term frequency–inverse document frequency; 216 | **Word2Vec** = word to vector; 217 | **ML** = multi-language; 218 | 219 | - Which classifiers are available? 220 | **CNN** = convolutional neural network; 221 | **DNN** = dense neural network; 222 | **LDA** = latent Dirichlet allocation; 223 | **LL** = log linear; 224 | **LR** = logistic regression; 225 | **LSTM** = long short-term memory; 226 | **NB** = naive Bayes; 227 | **RF** = random forest; 228 | **SGD** = stochastic gradient descent; 229 | **SVM** = support vector machine; 230 | 231 | 232 | - Which balancing strategies are available? 233 | **S / Simple** = No balancing strategy; 234 | **D / Double** = Double balance strategy; 235 | **T / Triple** = Triple balance strategy; 236 | **U / Under** = Undersampling balance strategy; 237 | **A / Aggressive** = Aggressive undersampling balance strategy (after the classifier is stable); 238 | **W / Weighting** = Weighting for data balancing (before and after the classifier is stable); 239 | **M / Mixing** = Mixing: weighting is applied before the classifier is stable and aggressive undersampling is applied after the classifier is stable; 240 | 241 | 242 | - Which query strategies are available? 243 | **R / Random** = Records are selected randomly; 244 | **C / Certain** = Certainty-based; 245 | **U / Uncertain** = Uncertainty-based; 246 | **M / Mixed** = A combination of query strategies, for example, 90% certainty-based and 10% random; 247 | **Cl / Clustering** = Clustering query strategy; 248 | 249 | 250 | 251 | | Software | Feature extraction | Classifiers | Balancing | Query strategy
| 252 | |:----------------------------------------------:|:------------------------------------:|:--------------------------------:|:----------------------------:|:---------------------------:| 253 | | [Abstrackr](#abstrackr) | TF-IDF :grey_question:<sup>1</sup> | SVM | :grey_question:<sup>1</sup> | R, C, U | 254 | | [ASReview](#asreview) | TF–IDF, Doc2Vec, sBERT, TF-IDF, ML | CNN, DNN, LR, LSTM, NB, RF, SVM | S, D, U, T | R, C, U, M, Cl | 255 | | [Colandr](#colandr) | Word2Vec :grey_question:<sup>2</sup> | SGD :grey_question:<sup>2</sup> | :grey_question:<sup>2</sup> | C | 256 | | [DistillerSR](#distillersr) | :grey_question:<sup>3</sup> | SVM | :grey_question:<sup>3</sup> | R, C | 257 | | [EPPI-Reviewer](#eppi-reviewer) | TF-IDF | SVM | :grey_question:<sup>4</sup> | R, C, Cl | 258 | | [FASTREAD](#fastread) | TF-IDF | SVM | S, A, W, M | C, U | 259 | | [Rayyan](#rayyan) | :grey_question:<sup>5</sup> | SVM | :grey_question:<sup>5</sup> | C, U | 260 | | [SWIFT-Active Screener](#swift-activescreener) | TF-IDF | LL | S :grey_question:<sup>7</sup> | C | 261 | 262 | 263 | :white_check_mark: Yes/Implemented; 264 | :x: No/Not implemented; 265 | :grey_question: Unknown (requires an issue). 266 | 267 | <sup>1</sup> See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/34 268 | 269 | <sup>2</sup> See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/16 270 | 271 | <sup>3</sup> See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/54 272 | 273 | <sup>4</sup> See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/21 274 | 275 | <sup>5</sup> See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/19 276 | 277 | <sup>7</sup> See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/40 278 | 279 | 280 | ## Supervised Learning 281 | 282 | | Software | Feature extraction | Classifiers | Balancing | Query strategy
| 283 | |:-------------------------------------------:|:-------------:|:------------------------------:|:---------------------------:|:-----------:| 284 | | [EPPI-Reviewer](#eppi-reviewer)<sup>1</sup> | TF-IDF | SVM :grey_question:<sup>2</sup> | :grey_question:<sup>2</sup> | R, C, Cl | 285 | 286 | <sup>1</sup> EPPI-Reviewer offers a choice of built-in pre-trained models, or the use of custom ones, to find a specific type of literature, e.g., RCTs. 287 | 288 | <sup>2</sup> See issue https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/21 289 | 290 | ## Unsupervised Learning 291 | 292 | | Software | Q1 | 293 | |:--------:|:-----------:| 294 | 295 | # Excluded Software 296 | 297 | This section lists software that did not fulfill the [inclusion criteria](#inclusion-criteria) but is still widely used in the scientific community. 298 | 299 | ## [RobotAnalyst](http://www.nactem.ac.uk/robotanalyst/) 300 | 301 | RobotAnalyst was developed as part of the Supporting Evidence-based Public Health Interventions using Text Mining project to support the literature screening phase of systematic reviews. The current version of RobotAnalyst is hosted on a University of Manchester server and is a prototype demo system for research purposes at the University of Manchester and its partners (see [response to issue #29](https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/issues/29#issuecomment-2517211394)). 302 | 303 | # Software 304 | 305 | This section briefly describes the software in alphabetical order. 306 | 307 | ## [Abstrackr](https://github.com/bwallace/abstrackr-web) 308 | 309 | Abstrackr is a collaborative (i.e., multiple reviewers can simultaneously 310 | screen citations for a review), web-based annotation tool for the citation 311 | screening task.
312 | 313 | ## [ASReview](https://www.asreview.nl) 314 | 315 | ASReview, developed at Utrecht University, helps scholars and practitioners 316 | get an overview of the most relevant records for their work as efficiently 317 | as possible while being transparent in the process. It supports multiple 318 | machine learning models and ships with exploration and simulation modes, 319 | which are especially useful for comparing and designing algorithms. 320 | Furthermore, it is intended to be easily extensible, allowing third parties 321 | to add modules that enhance the pipeline with new models, data, and other 322 | extensions. 323 | 324 | ## [Colandr](https://hslib.jabsom.hawaii.edu/colandr) 325 | 326 | Colandr is a free, web-based, open-access tool for conducting evidence 327 | synthesis projects. 328 | 329 | ## [DistillerSR](https://www.evidencepartners.com/products/distillersr-systematic-review-software) 330 | 331 | DistillerSR automates the management of literature collection, screening, and assessment using AI and intelligent workflows. From a systematic literature review to a rapid review to a living review, DistillerSR makes any project simpler to manage and configure to produce transparent, audit-ready, and compliant results. 332 | 333 | 334 | ## [EPPI-Reviewer](https://eppi.ioe.ac.uk/cms/Default.aspx?tabid=2914) 335 | 336 | EPPI-Reviewer is a web-based software program for managing and analysing data 337 | in literature reviews. It has been developed for all types of systematic 338 | review (meta-analysis, framework synthesis, thematic synthesis, etc.) but also 339 | has features that would be useful in any literature review. It manages 340 | references, stores PDF files, and facilitates qualitative and quantitative 341 | analyses such as meta-analysis and thematic synthesis. It also contains some 342 | new ‘text mining’ technology that promises to make systematic reviewing 343 | more efficient.
344 | 345 | ## [FASTREAD](https://github.com/fastread/src) 346 | 347 | FASTREAD (FAST2) is a tool to support primary study selection in systematic 348 | literature reviews. 349 | 350 | ## [Rayyan](https://www.rayyan.ai/) 351 | 352 | Rayyan is a free web and mobile app that helps expedite the initial screening 353 | of abstracts and titles using a process of semi-automation while incorporating 354 | a high level of usability. 355 | 356 | ## [SWIFT-Active Screener](https://www.sciome.com/swift-activescreener/) 357 | 358 | SWIFT-Active Screener (SWIFT is an acronym for “Sciome Workbench for Interactive 359 | computer-Facilitated Text-mining”) is a freely available interactive workbench 360 | that provides numerous tools to assist with problem formulation and 361 | literature prioritization. 362 | 363 | # Contributing 364 | 365 | If you know of other software that meets the inclusion criteria, please open a 366 | pull request and add it to the overview. If you find any missing, incorrect, 367 | or incomplete information, please open an issue to discuss it. 368 | 369 | By collaborating on this repository, we can create a valuable resource for 370 | researchers, practitioners, and other stakeholders interested in leveraging 371 | machine learning for text screening purposes. 372 | 373 | # License 374 | 375 | This project is licensed under CC-BY 4.0. 376 | 377 | # Contact 378 | 379 | For suggestions, questions, or comments, please file an issue in the issue 380 | tracker. 381 | 382 | This comparison is maintained by Rens van de Schoot. The goal is to provide a 383 | fair and unbiased comparison. If you have any concerns regarding the 384 | comparison, please open an issue in the issue tracker so that it can be 385 | discussed openly. 386 | --------------------------------------------------------------------------------