├── NLP-Annotation-Platforms.csv ├── NLP-Annotation-Platforms.tsv ├── NLP-Annotation-Platforms.xlsx └── README.md /NLP-Annotation-Platforms.csv: -------------------------------------------------------------------------------- 1 | Do you need manual annotations for your NLP projects? ,What NLP tasks do you need annotation for? ,Have you used open source annotation platforms before? ,Which of these tools have you used before?,"If you've used open source annotation tool(s) before, which annotation tool(s) do you like best? List 1-3.",Which feature(s) do you find most useful from your previous experience with open source annotation platforms? List 1-3.,Any other feedback on Open Source NLP/CV annotation tools?,Have you used commercial annotation platforms before?,Which of these tools/platforms have you heard of before ?,Which of the commercial tools listed above have you used before?,"If you've used any of the above platforms/tool(s) before, which tool(s) do you like best? List 1-3.",Which feature(s) do you find most useful from your previous experience with commercial platforms? List 1-3.,Tell us what's your dream annotation platform!!,Are you student? Do you work in academia / industry? Or are you an independent/freelancer?,Why do you need annotations for your task/data? ,Would the annotations you need require domain expertise or can it be handled by crowdsource workers with some minimum requirements?,Do you have a pool of trusted/expert annotators that can work on your annotation task(s)? ,Do you have any feedback on NLP annotation platforms? 
2 | Yes,"Classification tasks, Span annotation, Evaluating Generation (Seq2Seq) tasks, Creating Translation/Generation data, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,WebAnno (https://webanno.github.io/webanno/),,,,No,MTurk (https://www.mturk.com/),,,,"all in one 3 | ",Student,"Existing open datasets don't fit my needs, Exploring new facet of the data/task",Domain expertise / Trained annotators required,Yes, 4 | Yes,"Classification tasks, Regression/Scoring tasks, Parsing annotations / Treebanking tasks",Yes,"Brat (https://brat.nlplab.org/), WebAnno (https://webanno.github.io/webanno/), Annotatrix, TrEd",TrEd,"dependency tree visualization, customization (keyboard shortcuts, shown attributes)",,No,MTurk (https://www.mturk.com/),,,,"Full Unicode support, customizable keyboard shortcuts for fast annotation, can be installed on server and support remote collaboration, but also allows working completely offline (such as annotating while in a plane).",Academia,Existing open datasets don't fit my needs,Domain expertise / Trained annotators required,No, 5 | Yes,"Classification tasks, Span annotation",Yes,"Brat (https://brat.nlplab.org/), WebAnno (https://webanno.github.io/webanno/), SWAN (https://github.com/annefried/swan/wiki)","WebAnno, SWAN",linking annotations (e.g. for coreference annotations),,Yes,"MTurk (https://www.mturk.com/), Fiverr.com, Prolific Academic",,MTurk,,"Easy to use, HTML based, comes with a good range of pre-defined templates and functionalities. Takes over all the server hosting etc. 
and leaves the researcher with the task of designing experiments.",Industrial research,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,Yes, 6 | Yes,"Classification tasks, Span annotation, Entity linking tasks",Yes,"Brat (https://brat.nlplab.org/), Appraise (https://github.com/cfedermann/Appraise), MyMiner, TeamTat, TextAE","brat, Appraise, TeamTat",online available (no need to install),,No,"Prodigy (https://prodi.gy/), LightTag (https://www.lighttag.io/)",,,,"online available, easy teams management",Government,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in",Domain expertise / Trained annotators required,Yes, 7 | Yes,Classification tasks,No,"None, I haven't used any before.",,,,No,"None, I haven't heard any before.",,,,"Quick, easy",Student,Exploring new facet of the data/task,Domain expertise / Trained annotators required,Yes, 8 | Yes,"Classification tasks, Regression/Scoring tasks, Span annotation",Yes,"Brat (https://brat.nlplab.org/), custom software",Custom Software,,,Yes,"Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/)",?,,,"Custom ""widgets"", big high quality annotator community",Independent/Freelancer,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, There's no way to get around this without supervised datasets",Crowdsource is sufficient,Yes, 9 | Yes,"Classification tasks, Parsing annotations / Treebanking tasks",Yes,"Alpaca (https://inklab.usc.edu/AlpacaTag/), WebAnno (https://webanno.github.io/webanno/), CAT","WebAnno, Conllueditor",,,Yes,MTurk (https://www.mturk.com/),,,,"Easy to use, simple interface, high customization, built-in IAA calculation, simple & advanced search in files and annotation",Academia,"There's no 
data in the domain I'm interested in, Exploring new facet of the data/task",Domain expertise / Trained annotators required,Yes, 10 | Yes,"Span annotation, Entity linking tasks, word level annotation correction/tuning",Yes,TXM https://pages.textometrie.org/textometrie/en,TXM,be able to combine word level annotations with XML (for example TEI based) text encodings,"tools should make it clear if and how they provide character or word level annotations. With XML text encoding, character level annotation GUIs are not efficient. It is a general NLP tools question though: NLP tools usually take only raw .txt or word streams as input, which makes it difficult to inject NLP tools results back into XML encoded texts.",No,,,,,combining XML encoding and word level annotations,Academia,There's no way to get around this without supervised datasets,Domain expertise / Trained annotators required,Yes, 11 | Yes,"Classification tasks, Span annotation",Yes,"Brat (https://brat.nlplab.org/), CAT","1)CAT 12 | 2) Brat","1) flexibility in setting up the task 13 | 2) export annotated data in different formats",,No,"Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/)",Appen data annotation,"1) flexibility in setting up the task 14 | 2) possibility of setting gold threshold 15 | 3) re-use of annotation template","1) large pool of annotators 16 | 2) simple export format of the annotations",1) Easy to install; 2) flexibility in the creation of annotation tasks; 3) possibility of annotating discontinuous tags; 4) annotation both token-based and character-based; 4) possibility of annotating data across documents (e.g. cross-document anaphora annotation task); 5) easy to work export format: e.g. 
stand-off XML; Tab field (or accompanying conversion scripts across formats); ,Academia,"Existing open datasets don't fit my needs, Exploring new facet of the data/task",depends on the task - sometimes only experts; in other cases a mixture of the 2,No, 17 | Yes,"Classification tasks, Regression/Scoring tasks, Span annotation, Evaluating Generation (Seq2Seq) tasks, Entity linking tasks",Yes,"Brat (https://brat.nlplab.org/), Inception (https://github.com/inception-project/inception), WebAnno (https://webanno.github.io/webanno/), MMax2, Callipso, Knowtator",Knowtator,,,Yes,"Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/)",,,,too long to write,Academia,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,No, 18 | Yes,Classification tasks,Yes,Brat (https://brat.nlplab.org/),,,,Yes,"MTurk (https://www.mturk.com/), Upwork.com, Crowdflower/Figure Eight",Don't understand question,Figure Eight,,🤷,Academia,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,No, 19 | Yes,"Classification tasks, Span annotation, Entity linking tasks",Yes,Brat (https://brat.nlplab.org/),"only used brat, and I really did not like it","connecting with external resources, annotating relationships between named-entities",,Yes,"Prodigy (https://prodi.gy/), LightTag (https://www.lighttag.io/)",,Prodigy,Active Learning,"Active Learning, NER, Entity Linking w/ connection to wikidata for instances, relationships between named-entities, events annotation",Industry,There's no data in the domain I'm interested in,Domain expertise / 
Trained annotators required,Yes, 20 | Yes,"Classification tasks, Span annotation, Entity linking tasks",Yes,"WebAnno (https://webanno.github.io/webanno/), CAT (not open source); MMAX2 (not open source)",,"ease to create relations, easily exportable/usable annotation format",,No,"MTurk (https://www.mturk.com/), Samasource (https://www.samasource.com/), Fiverr.com",,,,trustworthy tokenisation and ability to deal with different character encodings so you can upload raw texts straight forward. ease to create the annotation projects (the less clicks the better). ease to edit annotations while annotating manually. ease to create relations among tags.,Academia,There's no data in the domain I'm interested in,Domain expertise / Trained annotators required,Yes, 21 | Yes,Parsing annotations / Treebanking tasks,Yes,"WebAnno (https://webanno.github.io/webanno/), Callisto, Praat","WebAnno, Callisto, Praat","sharing projects, drawing dependencies by hand, having a tagset to choose from",Controlling the annotations with a defined tagset is important and having a shared repository is too.,No,"None, I haven't heard any before.",,,,"It has a shared repository, possibly with version management, fixed tagsets and a possibility to give the annotators some test sentences to evaluate the quality of their annotation. If it does not follow the guidelines, their annotations should be labeled as uncertain. A clear display with the progression of the annotation for each set would be good. The annotators need to have a possibility to add comments on their annotation, with labels to make their point easier to retrieve. 
To help the colourblind work, several colour palettes should be made available (if any, but it is useful to distinguish text and dependencies).",Student,Exploring new facet of the data/task,Domain expertise / Trained annotators required,Yes, 22 | Yes,"Classification tasks, Regression/Scoring tasks, Span annotation, Evaluating Generation (Seq2Seq) tasks",Yes,Brat (https://brat.nlplab.org/),Brat,"easy to set up, and export output",,Yes,"Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/)",MTurk,MTurk,"easy and fast to get a massive amount of labels, allow for custom interface, easy to import/export data","easy to set up, and input/output data",Academia,There's no data in the domain I'm interested in,Domain expertise / Trained annotators required,No, 23 | Yes,"Classification tasks, Parsing annotations / Treebanking tasks",Yes,"Brat (https://brat.nlplab.org/), WebAnno (https://webanno.github.io/webanno/)",WebAnno,The feature of being able to monitor our student annotators easily and calculate the interannotator agreement on the fly. Also the curation feature where you see the different annotations and choose the right one is very useful.,,No,Fiverr.com,,,,WebAnno served our purpose well but had some problems with export/import and the design behind was not popular with our developers. Maybe INCEpTION is the answer but I have not had the possibility to try it out.,Academia,"There's no data in the domain I'm interested in, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,We had some trusted annotators for the last project. 
We would have to train a new team when we need it next time, 24 | Yes,"Span annotation, Entity linking tasks",Yes,"Brat (https://brat.nlplab.org/), WebAnno (https://webanno.github.io/webanno/)",The 4A Annotation Tool,Disambiguation context and visual clues for entity disambiguation,,No,,,,,-,Academia,"Existing open datasets don't fit my needs, Exploring new facet of the data/task",Crowdsource is sufficient,No, 25 | Yes,"Classification tasks, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,WebAnno (https://webanno.github.io/webanno/),WebAnno,web application; possibility of making coreference chains; easy graphical-based way of working,,No,,,,,possibility to accommodate many people to work; input and output of many formats; easy adaptation/upload of annotation schemes on different levels; good communication among annotation levels; being fast,Academia,There's no data in the domain I'm interested in,Domain expertise / Trained annotators required,Yes,We work with INCEpTION at the moment. Close to the dream platform but still no perfect ways to combine annotation and linking. 
26 | Yes,"Classification tasks, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,"Inception (https://github.com/inception-project/inception), WebAnno (https://webanno.github.io/webanno/)",WebAnno,Multilayer annotation,WebAnno was running slow with long texts.,Yes,LightTag (https://www.lighttag.io/),,,,"Multilayer annotation available, fast even with very long texts.",Academia,"Existing open datasets don't fit my needs, Exploring new facet of the data/task",Domain expertise / Trained annotators required,Yes, 27 | Yes,"Classification tasks, Span annotation, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,"Brat (https://brat.nlplab.org/), Eva (https://github.com/Ericsson/eva)",Brat,Flexibility in terms of entities to annotate,,No,,,,,Great UI and multi modal,Student,"There's no data in the domain I'm interested in, Exploring new facet of the data/task",Domain expertise / Trained annotators required,Yes, 28 | Yes,"Classification tasks, Bound box tasks, Span annotation, Audio annotation tasks, Entity linking tasks",Yes,"MMAX2, Praat, ELAN","Praat, ELAN, MMAX2",stand-off annotation on several layers to allow for overlapping spans,,No,"Prodigy (https://prodi.gy/), MTurk (https://www.mturk.com/)",MTurk,MTurk,,Can handle dialogue (time-based data on several layers),Academia,"Existing open datasets don't fit my needs, Exploring new facet of the data/task",Domain expertise / Trained annotators required,Yes, 29 | Yes,Parsing annotations / Treebanking tasks,Yes,UD Annotatrix https://jonorthwash.github.io/ud-annotatrix/,Annotatrix,Being able to modify the tokenisation and original input,,No,"None, I haven't heard any before.",,,,"offline, free/open-source, tightly coupled with the tools, data and formats that i'm annotating ",Academia,"Existing open datasets don't fit my needs, I work with annotation of marginalised and indigenous languages, for most of the languages I'm creating the first annotated data.",Domain expertise / Trained annotators 
required,Yes, 30 | Yes,"Classification tasks, Span annotation",Yes,"Brat (https://brat.nlplab.org/), own tools for the specific task","The only open source tool I used is Brat and it is a pain to parse afterwards, I would definitely avoid it","Easy deploy, multiple annotation options",,No,,,,,"Easy to use, intuitive, customisable (lets you define tags, set boundaries like not allowing the selection of trailing spaces in spans, whether to allow or not overlapping tags, etc), can output the annotated data in json format, errors are easy to correct, hopefully integrates the possibility of using active learning to preannotate data",Researcher in a foundation dedicated to a combination of academia + industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, There's no way to get around this without supervised datasets","depends on the task, but mostly with domain expertise, crowdsource usually won't do",mostly it is us the researchers who end up annotating as well as exploiting the data, 31 | Yes,Parsing annotations / Treebanking tasks,Yes,Sanchay Tool,None,None,It is good to make the created resources as open source.,No,"Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, Upwork.com",Upwork,4,3,Text Analysis Platform. 
From Phonetic to Discourse analysis information ,Academia,There's no way to get around this without supervised datasets,Domain expertise / Trained annotators required,Yes,Make the platforms as open source 32 | Yes,"Classification tasks, Span annotation, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,"Brat (https://brat.nlplab.org/), Folia (http://proycon.github.io/folia/), WebAnno (https://webanno.github.io/webanno/), Gate",WebAnno,"Open for new Annotation Schemas, Inter-Annotation Agreement module incorporated",,No,"LightTag (https://www.lighttag.io/), MTurk (https://www.mturk.com/)",None,,,"Open source, xml-compliant, allow for various annotation types, functionalities like: various levels of users (annotator manager, annotator, system manager, etc.), functionality for annotation guidelines that can be easily updated by manager and accessed by annotators, allow for multiple annotators on a project, inter-annotator agreement module (with various metrics to choose), ",Academia,"Existing open datasets don't fit my needs, Exploring new facet of the data/task",Domain expertise / Trained annotators required,Yes,Yes 33 | Yes,"Classification tasks, Regression/Scoring tasks, Span annotation",Yes,"Brat (https://brat.nlplab.org/), Amazon Mechanical Turk","For open source, I’ve only used one",Being able to mark overlapping spans,,Yes,"Prodigy (https://prodi.gy/), MTurk (https://www.mturk.com/), Upwork.com",AMT,I’ve only used one ,The workers themselves ,Ensures workers are paid fairly,Student,Existing open datasets don't fit my needs,Crowdsource is sufficient,Yes, 34 | Yes,"Classification tasks, Audio annotation tasks, Video captioning tasks, Creating Translation/Generation data, Emotion, Sentiment, Stance annotation...",No,"None, I haven't used any before., Google Sheets",,Be able to annotate plain simple text files as folder or individual files,,No,"Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly 
Crowdflower),, MTurk (https://www.mturk.com/)",MTurk,,Access to annotators from all over the world,"Portable, easy to handle, web based and desktop application and able to handle various text types and can export the annotation in simple easy to read format not requiring any scripting or programming knowledge to manipulate and analyze the data annotated",Academia,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task",Crowdsource is sufficient,Yes,Not suitable for most annotation tasks 35 | Yes,"Classification tasks, Regression/Scoring tasks, Span annotation",Yes,"Brat (https://brat.nlplab.org/), WebAnno (https://webanno.github.io/webanno/)",Brat,Dictionary loading,,Yes,,,,,"Would have ability to have multiple different types of annotations for both NLP entity - codification, and ML types ",Industry,"Existing open datasets don't fit my needs, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,Yes, 36 | Yes,"Span annotation, Evaluating Generation (Seq2Seq) tasks",No,Appraise (https://github.com/cfedermann/Appraise),,,,No,"None, I haven't heard any before.",,,,"marking affected words (for example, erroneous words in a translation output), adding comments about the reasons/causes of marking (for example wrong inflection, ambiguous word, etc.)",Academia,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task","something in between works well, such as researchers, CL students, etc","I have some, but not really a ""pool""", 37 | Yes,"Classification tasks, Span annotation, Entity linking tasks",Yes,"Doccano (https://github.com/doccano/doccano), Prodigy [not open source] (https://github.com/explosion/prodigy-recipes), Own custom coded one for specific task (audio quality classification)",Doccano,"Ability to deploy with docker images, set format of output data.","Biggest 
pain points were installing and running the live version - it may work locally on a laptop but running it on a server for others always brings problems, making sure the interface was usable by multiple people simultaneously without falling over, network and firewall settings to allow access etc.",Yes,"Prodigy (https://prodi.gy/), Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/), Upwork.com, Fiverr.com",Prodigy,Prodigy,"Support on forums, great documentation and example code from maintainers",Like prodigy but free? One that integrates easily with existing data and is easy to deploy and configure. Ability to support multiple interfaces too - like annotating on mobile. ,Startup,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task",Domain expertise / Trained annotators required,Yes, 38 | Yes,"Creating Translation/Generation data, Parsing annotations / Treebanking tasks",Yes,"Inception (https://github.com/inception-project/inception), WebAnno (https://webanno.github.io/webanno/)",,,,No,,,,,Inception with XML editor,Academia,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task",Domain expertise / Trained annotators required,Yes, 39 | Yes,"Classification tasks, Span annotation, Parsing annotations / Treebanking tasks",Yes,Arethusa (https://www.perseids.org/tools/arethusa/app/#/),Arethusa,"visualization/highlighting, drag & drop interaction, automatic suggestions",,No,Prodigy (https://prodi.gy/),,,,"intuitive, assistive, consultative/making suggestions, manageable/understandable...",Academia,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators 
required,No, 40 | Yes,"Span annotation, Creating Translation/Generation data",Yes,"Brat (https://brat.nlplab.org/), Appraise (https://github.com/cfedermann/Appraise)","Brat, TextAE",Being able to drag and drop annotations,,No,"MTurk (https://www.mturk.com/), Fiverr.com",,,,A flexible Open Source platform that allows both on-premises and online hosted applications,Academia,"Existing open datasets don't fit my needs, Exploring new facet of the data/task",Crowdsource is sufficient,Yes, 41 | Yes,"Classification tasks, Span annotation, Entity linking tasks, NER annotation",Yes,"Brat (https://brat.nlplab.org/), Doccano (https://github.com/doccano/doccano)",Doccano,Active learning,,Yes,"Prodigy (https://prodi.gy/), Label Studio",Prodigy,Prodigy,I haven't used the rest.,Label Studio,Industry,"Existing open datasets don't fit my needs, There's no way to get around this without supervised datasets, Security/privacy reason",Domain expertise / Trained annotators required,Yes, 42 | Yes,"Classification tasks, Regression/Scoring tasks, Span annotation, Entity linking tasks, Writing paraphrases or participating in dialogue",Yes,Slate (https://github.com/jkkummerfeld/slate),Slate,"(1) Simple human-readable non-DRM data formats, (2) Easy installation, (3) The ability to modify the tool for project-specific features.",,Yes,"Prodigy (https://prodi.gy/), LightTag (https://www.lighttag.io/), Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/), Scale (https://scale.com/), Upwork.com, Fiverr.com",MTurk,,Access to a large pool of workers,"(1) Easy to install, (2) Options for both a web interface and an offline interface, (3) Standard data formats and the option to export to flat-text files that can be manipulated with command line utilities, (4) Customisable, (5) Integrates with git or another version control tool for data storage and sharing.",Academia,"Existing open datasets don't fit my 
needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, Developing new annotation methods","Sometimes one and sometimes the other, depending on the project",Kind of - I train university students as needed for projects., 43 | Yes,"Classification tasks, Span annotation, Creating Translation/Generation data",Yes,"Brat (https://brat.nlplab.org/), GATE",None yet,"Must be simple to set up, have a clean interface, and have a helpful output format",,Yes,"Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/)",,,"Diverse crowd, built in quality control","simple, fast, lots of demo templates, easy data collection and qa",Academia,"There's no data in the domain I'm interested in, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,No, 44 | Yes,"Classification tasks, Audio annotation tasks",Yes,"None, I haven't used any before.","VoTT, UAM Corpus Tool, ","Access of coding scheme, calculations of inter coder reliability ",,Yes,"None, I haven't heard any before.",NVivo,,,"Easy to choose labels, availability of annotation guides and examples, calculations of intercoder reliability ",Academia,There's no way to get around this without supervised datasets,Domain expertise / Trained annotators required,Yes, 45 | Yes,"Classification tasks, Span annotation, Audio annotation tasks, Parsing annotations / Treebanking tasks",Yes,"WebAnno (https://webanno.github.io/webanno/), EXMARaLDA, Praat, ELAN, CorpusDraw, WebLicht",,,,No,MTurk (https://www.mturk.com/),,,,"A sustainable open source tool that is developed by an open community and extensible. It should rely on well-established open formats as well and at best supports multiple formats. And of course Unicode-based and available for Linux, Mac, and Windows. Its use must be intuitive to encourage the user community to grow fast. 
Furthermore allowing to flexibly plug-in automatic tools to pre-annotate my data'd be great, also allowing the model to directly improve from my corrections",Academia,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,Yes, 46 | Yes,"Classification tasks, Span annotation, Entity linking tasks",Yes,"Brat (https://brat.nlplab.org/), GATE, TextAE","GATE, TextAE, Brat",annotation visualization,,No,,,,,improved TextAE,Academia,"There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,Yes, 47 | Yes,"Classification tasks, Span annotation, Evaluating Generation (Seq2Seq) tasks, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,"Brat (https://brat.nlplab.org/), Folia (http://proycon.github.io/folia/)","Prodigy, snorkel ",Annotations ,No,Yes,"Prodigy (https://prodi.gy/), MTurk (https://www.mturk.com/), Upwork.com, Fiverr.com","M Turk, Prodigy",M Turk,Annotations ,"Easy interface, flexibility to work for different annotation tasks",Industry,"Existing open datasets don't fit my needs, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,Yes,No 48 | Yes,Parsing annotations / Treebanking tasks,Yes,"Brat (https://brat.nlplab.org/), WebAnno (https://webanno.github.io/webanno/), Synpathy, Annotate",None are good for everything,"Allowing for different types of annotation, easy choosing your own tagset/annotation format",,No,"None, I haven't heard any before.",,,,"Adaptable for annotation type/format, easy input/output conversion, fast even with larger datasets, search for both words and annotations",Academia,"Existing open datasets don't fit my 
needs, There's no data in the domain I'm interested in, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,No, 49 | Yes,"Classification tasks, Regression/Scoring tasks, Span annotation, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,WebAnno (https://webanno.github.io/webanno/),Inception ,"multiplication of example annotations, visualization , constraints ","In process of annotation, the annotator needs a support for rules writing to facilitate the annotation.",No,Prodigy (https://prodi.gy/),no one,I did not,no,"1. Dynamic structure of the annotation - easy extension of the annotation, the tagsets, etc. 50 | 2. Access to other resources via context-dependent rules - access to different lexicons, another annotated corpora, etc. 51 | 3. Rules for extension of the annotation and for validation checking 52 | 4. Different visualizations 53 | 5. Context-dependent help for the annotations",Academia,"There's no data in the domain I'm interested in, Available data is not enough in terms of annotated phenomena and size",Domain expertise / Trained annotators required,Yes,"It needs to support the creativity of the annotator or the annotation teams (with different roles of the members). Very often in the process of annotation it is necessary to update other resources. For example, if the task is annotation with word senses, then the addition of new senses will be necessary. It will be nice to have the possibility to switch to the other resources and to update it. To support the changes in the annotation schema and reclassification of the annotation that are already done. 
" 54 | Yes,"Classification tasks, Span annotation, Creating Translation/Generation data, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,Folia (http://proycon.github.io/folia/),Folia,the possibility to keep previous annotations and to annotate also discontinuous elements,,No,"Prodigy (https://prodi.gy/), MTurk (https://www.mturk.com/), Gengo AI Workbench (https://support.gengo.com/hc/en-us/articles/360019035694-The-AI-Workbench)",none,,,An annotation tool for bitext or parallel texts ,Academia,"Existing open datasets don't fit my needs, Exploring new facet of the data/task",Domain expertise / Trained annotators required,Yes, 55 | Yes,"Span annotation, Entity linking tasks",Yes,"Brat (https://brat.nlplab.org/), Knowtator",Haven't tried anything else,Good UI,Nothing has an active community to maintain,Yes,"Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/)",Figure8 and MTurk,MTurk,Easy management of jobs,A maintained platform with an active community to support. New feature requests easily added,Industry,"Existing open datasets don't fit my needs, I work on biomedical data",Domain expertise / Trained annotators required,Yes,Great initiative. 
56 | Yes,"Classification tasks, Span annotation, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,"Brat (https://brat.nlplab.org/), MAE (Multi-purpose Annotation Environment) by Amber Stubbs",MAE,custom labels,nice to have inter annotation agreement evaluation functionality,Yes,"Prodigy (https://prodi.gy/), MTurk (https://www.mturk.com/), Watson Knowledge Studio","MTurk, Watson Knowledge Studio",Watson Knowledge Studio,"end-to-end functionality (type system creation, ground truth creation, inter annotation agreement evaluation, model building, evaluation)",Watson Knowledge Studio is already my dream platform.,Industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task",Domain expertise / Trained annotators required,Yes, 57 | Yes,"Classification tasks, Parsing annotations / Treebanking tasks",No,"None, I haven't used any before.",,,,No,"Upwork.com, Fiverr.com",Fiverr.com,,,i don't know any,Independent/Freelancer,"There's no data in the domain I'm interested in, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,No, 58 | Yes,Classification tasks,Yes,"Alpaca (https://inklab.usc.edu/AlpacaTag/), Eva (https://github.com/Ericsson/eva), Annotation Studio (https://github.com/hyperstudio/Annotation-Studio)",Eva,Lol,Nope,Yes,Prodigy (https://prodi.gy/),Prodigy,Lol,Lol,Lol,Industry,Existing open datasets don't fit my needs,Domain expertise / Trained annotators required,No,Nope 59 | Yes,"Classification tasks, Span annotation, Creating Translation/Generation data, Entity linking tasks",Yes,"Brat (https://brat.nlplab.org/), WebAnno (https://webanno.github.io/webanno/)",Brat,"quick annotation, web access",,No,,,,,"brat, with collaborative annotation and long chunk management (sentence-size)",Industry,"Existing open datasets don't fit my needs, There's no way to get around this without supervised datasets",Crowdsource is 
sufficient,No, 60 | Yes,"Classification tasks, Span annotation, Evaluating Generation (Seq2Seq) tasks, Creating Translation/Generation data",Yes,Brat (https://brat.nlplab.org/),,flexibility; span annotation,,Yes,"Prodigy (https://prodi.gy/), Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/), Alegion (https://www.alegion.com/platform)",prodify.ai; mturk,,active learning; focus on one annotation at time; csv export ,plug & play existing models; possibility of annotations without lexical grounding,Industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in",Crowdsource is sufficient,No, 61 | Maybe,"Audio annotation tasks, Creating Translation/Generation data, Parsing annotations / Treebanking tasks",No,"None, I haven't used any before.",,,,No,"None, I haven't heard any before.",,,,Unknown,Independent/Freelancer,Exploring new facet of the data/task,Domain expertise / Trained annotators required,No, 62 | Yes,"Classification tasks, Regression/Scoring tasks, Parsing annotations / Treebanking tasks",Yes,"Brat (https://brat.nlplab.org/), WebAnno (https://webanno.github.io/webanno/), MMAX, RSTTool, Annotate, ELAN, Exmaralda, TreeAligner, Excel [sic!]",task-specific,"interoperability (for annotations that require the combined application of different tools), keyboard-only annotation (no mouse!), controlled annotation vocabularies (limited freetext entries)","semiautomatic preannotation is yet another criterion, but not for annotation per se; requires interoperability with NLP tools",No,"Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/)",,,,"INTEROPERABILTIY, transformability, compliance to web standards, a uniform interchange format for NLP tools and annotation tools, easy combination of conflicting annotations (even if the tools skrewed up 
tokenization or replaced special characters)",Academia,"Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,Yes,There's no universal tool and there will never be. So we need to find ways to combine the output of different tools. 63 | Yes,"Classification tasks, Span annotation, Entity linking tasks",Yes,Brat (https://brat.nlplab.org/),,,,Yes,"Prodigy (https://prodi.gy/), LightTag (https://www.lighttag.io/), MTurk (https://www.mturk.com/), Scale (https://scale.com/), Upwork.com, Fiverr.com, Own custom google-doc workflow",All above,Own workflow; the rest were all rubbish.,Ability to define complex NLP annotation tasks; ability to instruct and guide annotators; flexibility to automate evaluation.,Powerful for real-world NLP like our custom; slick UI & marketing like Prodigy; cheap and strong community like open source.,Industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets, Academic datasets and pre-trained models are unusable on real data",Domain expertise / Trained annotators required,Yes,"Haven't checked lately but 1-2 years ago, they were all sad. Some have good marketing, but all were very naive and difficult to apply outside of trivial tasks and oh-wow shallow tutorials." 
64 | Yes,"Classification tasks, Span annotation, Evaluating Generation (Seq2Seq) tasks, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,Brat (https://brat.nlplab.org/),Brat,Easy to use for multiple tasks,,Yes,Prodigy (https://prodi.gy/),Prodigy,Prodigy,Super useful,Uses active learning for all possible NLp takes for efficient annotations,Industry,"Existing open datasets don't fit my needs, Exploring new facet of the data/task",Domain expertise / Trained annotators required,No, 65 | Yes,"Classification tasks, Span annotation, Evaluating Generation (Seq2Seq) tasks, Audio annotation tasks, Creating Translation/Generation data, Parsing annotations / Treebanking tasks",Yes,"Brat (https://brat.nlplab.org/), Praat (https://www.fon.hum.uva.nl/praat/)","Brat, Praat",Click and drag annotation; Waveform annotation,"They only work on niche tasks. Most real world tasks need annotations that span several of these tools, but a single tool is insufficient.",No,"Prodigy (https://prodi.gy/), Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/), Upwork.com, Fiverr.com",,,,"A platform that allows audio transcription of full dialogs with entity annotation and (inverse) 66 | text normalization for entity values. Annotation of dialog turns, similar to dialogue act annotation. Entity co-reference resolution.",Industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Could be crowdsourced but the data is sensitive.,Yes,"Most platforms work well for an individual task, but there's an enormous amount of effort needed to add new features or repurpose an existing tool to a more complicated NLP task. Especially if the NLP task requires a speech input, most tools become unusable. Also, each platform has a different output format. 
If you want to use two different tools, you need to write middleware to transform the data formats into a compatible form -- usually a proprietary format." 67 | Yes,"Classification tasks, Span annotation",Yes,Brat (https://brat.nlplab.org/),Brat,Using automatic annotation tools,Automatic recommendation from existing annotations,No,"None, I haven't heard any before.",None,No,No,Annotate with existing tools and then modify manually. Platform could suggest modifications from previous manual modifications,Industry,Existing open datasets don't fit my needs,Domain expertise / Trained annotators required,No, 68 | Yes,"Classification tasks, Regression/Scoring tasks, Span annotation, Creating Translation/Generation data, Entity linking tasks",Yes,Appraise (https://github.com/cfedermann/Appraise),Appraise,"Fast UI 69 | Single Sign-On (SSO) support 70 | Python-based",,Yes,MTurk (https://www.mturk.com/),Mturk,,,Appraise v4 (2020),Industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task",Crowdsource is sufficient,Yes, 71 | Yes,"Creating Translation/Generation data, Entity linking tasks, Parsing annotations / Treebanking tasks",No,"None, I haven't used any before.",,,,Yes,MTurk (https://www.mturk.com/),,,,"The platform should learn and suggest annotation, the user should just check and correct it, and should have option to skip cases, where (s)he is not sure.",Industry,There's no data in the domain I'm interested in,Domain expertise / Trained annotators required,Yes, 72 | Yes,"Classification tasks, Span annotation, Evaluating Generation (Seq2Seq) tasks, Creating Translation/Generation data, Entity linking tasks",Yes,"Brat (https://brat.nlplab.org/), Prodigy",,,,Yes,"Prodigy (https://prodi.gy/), MTurk (https://www.mturk.com/)",,,,Built in active learning to get the best annotations quickest,Industry,There's no data in the domain I'm interested in,Crowdsource is sufficient,No, 73 | 
Yes,"Classification tasks, Span annotation, Creating Translation/Generation data, Entity linking tasks",Yes,"Brat (https://brat.nlplab.org/), Appraise (https://github.com/cfedermann/Appraise), Knowtator, PubTator","Brat, Appraise",1/ ability to use pre-annotations prepared outside of tool 2/ ease of designing annotation scheme 3/ ease of exporting annotation results to standard format,,No,"Prodigy (https://prodi.gy/), MTurk (https://www.mturk.com/), Microworkers.com",,,,Honestly BRAT is pretty good. It would be nice to combine the ability to also do document level annotations. ,Academia,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in",Domain expertise / Trained annotators required,sometimes,"One important criteria is ease of installing and using. I think WebAnno would meet many of my needs, but I haven't used it yet because it seems very complex to get into. " 74 | Yes,"Classification tasks, Evaluating Generation (Seq2Seq) tasks, Audio annotation tasks",Yes,Brat (https://brat.nlplab.org/),,,,No,"Prodigy (https://prodi.gy/), MTurk (https://www.mturk.com/)",,,,None,Academia,There's no way to get around this without supervised datasets,Domain expertise / Trained annotators required,Yes, 75 | Yes,"Classification tasks, Regression/Scoring tasks, Span annotation, Entity linking tasks, Parsing annotations / Treebanking tasks",Yes,"Doccano (https://github.com/doccano/doccano), WebAnno (https://webanno.github.io/webanno/), Catma","Webanno, catma",Multiple annotation schemes for the same data. 
Visualisation of annotations.,,No,"Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/)",,,,"Annotations can be done on different levels, use of multiple annotation schemes, several annotators being able to use the tool at the same time, automatic kappa computation",Academia,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Half and half. At least some training experience.,Yes, 76 | Yes,Span annotation,Yes,Brat (https://brat.nlplab.org/),,,,Yes,Prodigy (https://prodi.gy/),,,,"Intuitive, powerfull, scriptable",Industry,Existing open datasets don't fit my needs,Domain expertise / Trained annotators required,Yes, 77 | Maybe,Classification tasks,Yes,Brat (https://brat.nlplab.org/),,"Scalability, multi-annotator agreement",,Yes,"Prodigy (https://prodi.gy/), MTurk (https://www.mturk.com/)",MTurk,MTurk was ok as it gave us access to SMEs.,"Easy to use, can test them on a small project upfront","Easy to use, with annotation stats, annotations to be exported in a variety of formats, stable",Industry,"Existing open datasets don't fit my needs, Exploring new facet of the data/task",Domain expertise / Trained annotators required,No, 78 | Yes,"Classification tasks, Evaluating Generation (Seq2Seq) tasks",No,"None, I haven't used any before.",,,,Yes,MTurk (https://www.mturk.com/),Amazon mechanical turk,Amazon mechanical turk,Amazon mechanical turk,"A place where I can create simple scenarios that apply to all the data, for example if we need to annotate a classification task I want to show the classes that are relevant for every text with a tooltip automatically just by plugin in the data into the tool and selecting the target UI preference for the commodity of the annotator.",Industry + Student,Exploring new facet of the data/task,Crowdsource 
is sufficient,I don't really know, 79 | Yes,"Audio annotation tasks, Video captioning tasks",Yes,https://github.com/dynilib/dynitag,,"for audio, multi-region (and overlapping regions) support",,Yes,Gengo AI Workbench (https://support.gengo.com/hc/en-us/articles/360019035694-The-AI-Workbench),Modified version of Gengo workbench for speech annotation,,Flexibility due to syntax-based annotation (but not user/non-technical annotator friendly),"Multiple mark-up support (JSON, protobuf, XML, etc), but at the same time, can be fully GUI-based. In short, targets a wide range of annotators (from non-technical to technical).",Industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Depends. In the past (circa 2005) we even had spectrogram experts read spectrograms without audio.,"Can't say trusted/expert, but we have annotators", 80 | Yes,Creating Translation/Generation data,No,"None, I haven't used any before.",,,,Yes,"MTurk (https://www.mturk.com/), Upwork.com, Fiverr.com","Upwork, MTurk",MTurk,creating annotation jobs,have similar interface as mturk to create jobs. ,Industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task",Domain expertise / Trained annotators required,No, 81 | Maybe,multi choice question answering,Yes,"Brat (https://brat.nlplab.org/), Doccano (https://github.com/doccano/doccano)",,less efforts for annotator,,No,"Prodigy (https://prodi.gy/), Upwork.com",,,,"Annotation should feel effortless, descriptions might be helpful for annotators. We can create it like a game to make it more fun. 
",Industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,Yes, 82 | Yes,"Classification tasks, Regression/Scoring tasks, Entity linking tasks",No,"None, I haven't used any before.",,,,No,Prodigy (https://prodi.gy/),,,,easy to define what to annotated,Industry,There's no data in the domain I'm interested in,Crowdsource is sufficient,No, 83 | Yes,"Classification tasks, item/product relevancy against search queries",No,"None, I haven't used any before.",,,,No,"None, I haven't heard any before.",,,,Unfortunately I've very less experience on using an annotation platform.,Industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in",Domain expertise / Trained annotators required,No, 84 | Yes,"Bound box tasks, Video captioning tasks, Parsing annotations / Treebanking tasks, Semantic Role Labeling",Yes,WebAnno (https://webanno.github.io/webanno/),,,https://github.com/FrameNetBrasil/webtool,No,"None, I haven't heard any before.",,,,"Multimodal, multilingual, fast and light...",Academia,We build datasets in my lab,Domain expertise / Trained annotators required,Yes, 85 | Yes,"Classification tasks, Open Information Extraction",No,Brat (https://brat.nlplab.org/),None,"Related to the task , easy to include, known exportation format.",,No,"None, I haven't heard any before.",,,,Fitted to Open Information Extraction. 
,Academia,"Existing open datasets don't fit my needs, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,Yes, 86 | Yes,Classification tasks,Yes,WebAnno (https://webanno.github.io/webanno/),,,,No,MTurk (https://www.mturk.com/),,,,ok,Industry,Existing open datasets don't fit my needs,Crowdsource is sufficient,No, 87 | Yes,"Classification tasks, Span annotation, Creating Translation/Generation data",No,"None, I haven't used any before.",,,,No,"None, I haven't heard any before.",,,,Should have cross validation function to avoid mistakes.,Industry,There's no way to get around this without supervised datasets,Crowdsource is sufficient,No, 88 | Yes,Span annotation,Yes,YEDDA (https://github.com/jiesutd/YEDDA),YEDDA (https://github.com/jiesutd/YEDDA),"shortcut annotation, suggestion, annotation quality analysis",,No,"None, I haven't heard any before.",,,,Python without any pre-installed package.,Academia,Existing open datasets don't fit my needs,Domain expertise / Trained annotators required,No, 89 | Yes,"Classification tasks, Audio annotation tasks",No,"None, I haven't used any before.",,,,No,"None, I haven't heard any before.",,,,Supporting multiple tasks in multiple languages in at one place where anyone can contribute.,Industry,There's no data in the domain I'm interested in,Crowdsource is sufficient,No, 90 | Maybe,Classification tasks,No,"None, I haven't used any before.",,,,No,"None, I haven't heard any before.",,,,x,x,x,Crowdsource is sufficient,No, 91 | Yes,"Classification tasks, Span annotation, Evaluating Generation (Seq2Seq) tasks, Creating Translation/Generation data, Parsing annotations / Treebanking tasks",Yes,"Brat (https://brat.nlplab.org/), Doccano (https://github.com/doccano/doccano), Inception (https://github.com/inception-project/inception), MAT (MITRE Annotation Toolkit)",Doccano,Approachable UX,,Yes,"Prodigy (https://prodi.gy/), MTurk (https://www.mturk.com/), Fiverr.com",Prodigy,Prodigy,Tight 
integration with spaCy,Task agnostic (task as config?). Tight integration loop with active learning and bootstrapping.,Industry,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Domain expertise / Trained annotators required,Yes, 92 | Yes,"Span annotation, Evaluating Generation (Seq2Seq) tasks, Creating Translation/Generation data, Parsing annotations / Treebanking tasks",No,"None, I haven't used any before.",,,I want an easy way to associate different portions of tree-structured semantics to different portions of text. Ideally allowing up to many-to-many mappings (so discontiguous portions of trees can map to discontiguous portions of text).,Yes,"Prodigy (https://prodi.gy/), Appen Data Annotation (https://appen.com/solutions/platform-overview/Formerly Figure8, former-formerly Crowdflower),, MTurk (https://www.mturk.com/), Upwork.com, Fiverr.com, Prolific Academic","MTurk, Prolific Academic",,Recruiting and screening participants in human annotation tasks and evaluations.,"Ease of multi-level annotations, even if they don't all align nicely into a hierarchy. Being able to start with e.g. entity-linking or basic SRL and build up to a richer semantic representation including perhaps discourse or pragmatic annotations. Playing nice with audio or video data would be a bonus as well. 
And working nicely for dialogue would be cool, too",Academia,"Existing open datasets don't fit my needs, There's no data in the domain I'm interested in, Exploring new facet of the data/task, There's no way to get around this without supervised datasets",Crowdsource is sufficient,No, -------------------------------------------------------------------------------- /NLP-Annotation-Platforms.tsv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alvations/annotate-questionnaire/2b890f34d5a897129ec8cbe61a93aba62940e5d4/NLP-Annotation-Platforms.tsv -------------------------------------------------------------------------------- /NLP-Annotation-Platforms.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alvations/annotate-questionnaire/2b890f34d5a897129ec8cbe61a93aba62940e5d4/NLP-Annotation-Platforms.xlsx -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # A Survey of NLP Annotation Platforms 2 | 3 | This `README` summarizes the responses to a questionnaire on annotation platforms (from https://forms.gle/iZk8kehkjAWmB8xe9). The questionnaire is a short survey of users' experiences with, and wishes for, annotation of text and/or images for Natural Language Processing (NLP) tasks. 4 | 5 | This summary is based on the results collated on 30 June 2020. We may update the results if a significant number of new responses arrive after that date. 
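For readers who want to analyse the raw `.csv` themselves: the multi-select questions (e.g. "What NLP tasks do you need annotation for?") store all selected options as one comma-separated string inside a single quoted field. Below is a minimal sketch of tallying such a column; it uses a small hypothetical inline sample instead of the actual file, and the naive comma-split shown here would miscount answer options that themselves contain commas (e.g. "None, I haven't used any before."), so real analysis needs extra care.

```python
import csv
import io
from collections import Counter

# Hypothetical inline sample mirroring the survey's CSV format:
# multi-select answers live in one quoted, comma-separated field.
sample = '''"What NLP tasks do you need annotation for? "
"Classification tasks, Span annotation"
"Classification tasks, Entity linking tasks"
"Span annotation"
'''

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row

tally, n_rows = Counter(), 0
for row in reader:
    n_rows += 1
    # split the multi-select field back into individual task labels
    tally.update(label.strip() for label in row[0].split(","))

for label, count in tally.most_common():
    print(f"{label}: {count}/{n_rows} ({100 * count / n_rows:.1f}%)")
```

The same loop applied to the downloaded `NLP-Annotation-Platforms.csv` (with the relevant column index) should reproduce figures close to the percentages reported below, modulo the comma caveat above.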
6 | 7 | The raw data for the survey results can be found at: 8 | 9 | - [`.csv` format](https://github.com/alvations/annotate-questionnaire/blob/master/NLP-Annotation-Platforms.csv) 10 | - [`.tsv` format](https://github.com/alvations/annotate-questionnaire/blob/master/NLP-Annotation-Platforms.tsv) 11 | - [`.xlsx` format](https://github.com/alvations/annotate-questionnaire/blob/master/NLP-Annotation-Platforms.xlsx) 12 | 13 | # Overview 14 | 15 | - [**Population**](https://github.com/alvations/annotate-questionnaire/blob/master/README.md#population) 16 | - Population Breakdown 17 | - [**Annotations Requirements**](https://github.com/alvations/annotate-questionnaire/blob/master/README.md#annotations-requirements) 18 | - Why do you need annotations for your task/data? 19 | - What NLP tasks do you need annotation for? 20 | - Would the annotations you need require domain expertise or can it be handled by crowdsource workers with some minimum requirements? 21 | - Do you have a pool of trusted/expert annotators that can work on your annotation task(s)? 22 | - [**Annotation Platforms**](https://github.com/alvations/annotate-questionnaire/blob/master/README.md#annotation-platforms) 23 | - Have you used open source / commercial annotation platforms before? 24 | - Which of these *open source* annotation platforms/tools have you ***used*** before? 25 | - Which of these *commercial* annotation platforms/tools have you ***heard of*** before? 26 | - Which of these *commercial* annotation platforms/tools have you ***used*** before? 27 | - Most Useful Features 28 | - Suggestions to annotation tool creators 29 | - Any other feedback on annotation tools? 
30 | - [**Your Dream Annotation Platform**](https://github.com/alvations/annotate-questionnaire/blob/master/README.md#your-dream-annotation-platform) 31 | - Notable Mentions about "Your Dream Annotation Platform" 32 | - [**Acknowledgements**](https://github.com/alvations/annotate-questionnaire/blob/master/README.md#acknowledgement) 33 | 34 | # Population 35 | 36 | - There are 78 responses to the questionnaire. 37 | - **94.9% need _annotations_** for their work 38 | - **80.8% have used an _annotation platform_** before 39 | 40 | 41 | ## Population Breakdown 42 | 43 | ![](https://docs.google.com/spreadsheets/d/e/2PACX-1vTMa1QOjA4b45HvT3FKnRnfsgIpgjtjOb7RzJ8ZODJZ1vZr0ImtQ313CdAxcnSi3tXcHiXcXc-oUEoC/pubchart?oid=483246479&format=image) 44 | 45 | 46 | - 47.4% from academia 47 | - 34.6% from industry 48 | - 6.4% are students 49 | - 3.8% are freelance/independent researchers 50 | - 1.3% from government 51 | - the rest come from a mix of the above categories 52 | 53 | 54 | # Annotations Requirements 55 | 56 | 57 | ## Why do you need annotations for your task/data? 58 | 59 | 60 | ![](https://docs.google.com/spreadsheets/d/e/2PACX-1vTMa1QOjA4b45HvT3FKnRnfsgIpgjtjOb7RzJ8ZODJZ1vZr0ImtQ313CdAxcnSi3tXcHiXcXc-oUEoC/pubchart?oid=1144655843&format=image) 61 | 62 | - **65.3% (of 78 respondents) can't find existing open datasets** that fit their needs 63 | - **53.8% (of 78 respondents) stated that there is no data in the domain** they're interested in 64 | - **47.4% (of 78 respondents) want to explore a new facet of the data/task** that requires new annotations 65 | 66 | - **Other** reasons include: 67 | - Data for low-resource languages doesn't exist (e.g. 
marginalised and indigenous languages) 68 | - Security/privacy reasons 69 | - Developing new annotation methods 70 | - Available data size is insufficient for specific phenomena / task 71 | - Biomedical data is inadequate/insufficient 72 | - Academic datasets and pre-trained models are unusable on real data 73 | - "We build datasets in my lab" 74 | 75 | 76 | ## What NLP tasks do you need annotation for? 77 | 78 | ![](https://docs.google.com/spreadsheets/d/e/2PACX-1vTMa1QOjA4b45HvT3FKnRnfsgIpgjtjOb7RzJ8ZODJZ1vZr0ImtQ313CdAxcnSi3tXcHiXcXc-oUEoC/pubchart?oid=1777988562&format=image) 79 | 80 | - **75.6% (of 78 respondents) need Classification** annotations 81 | - **60.0% (of 78 respondents) need Span** annotations 82 | - **38.5% (of 78 respondents) need Entity Linking** annotations 83 | 84 | - **Other** annotations include: 85 | - Video captioning (3 out of 78 respondents) 86 | - Bounding boxes (2 out of 78 respondents) 87 | - Word level annotation (includes correction of word tokens) 88 | - Sentiment / Emotions annotation 89 | - Stance annotation 90 | - NER annotation 91 | - Writing paraphrases / Dialog systems 92 | - MCQ answering 93 | - Item/Product relevancy scoring 94 | - Semantic Role Labeling 95 | - Open Information Extraction 96 | 97 | ## Would the annotations you need require domain expertise or can it be handled by crowdsource workers with some minimum requirements? 98 | 99 | ![](https://docs.google.com/spreadsheets/d/e/2PACX-1vTMa1QOjA4b45HvT3FKnRnfsgIpgjtjOb7RzJ8ZODJZ1vZr0ImtQ313CdAxcnSi3tXcHiXcXc-oUEoC/pubchart?oid=2081029176&format=image) 100 | 101 | ## Do you have a pool of trusted/expert annotators that can work on your annotation task(s)? 102 | 103 | ![](https://docs.google.com/spreadsheets/d/e/2PACX-1vTMa1QOjA4b45HvT3FKnRnfsgIpgjtjOb7RzJ8ZODJZ1vZr0ImtQ313CdAxcnSi3tXcHiXcXc-oUEoC/pubchart?oid=1814184526&format=image) 104 | 105 | - **Other** comments include: 106 | - Had some trusted annotators for last project. 
Would have to train a new team the next time. 107 | - Often, it is the researchers who want to use the annotations for a specific task annotating the data. 108 | - Kind of - I train university students to annotate as needed for specific projects. 109 | - Sometimes. 110 | - I don't really know. 111 | - Can't say we have trusted or expert annotators but we have annotators. 112 | 113 | # Annotation Platforms 114 | 115 | ## Have you used open source / commercial annotation platforms before? 116 | 117 | ![](https://docs.google.com/spreadsheets/d/e/2PACX-1vTMa1QOjA4b45HvT3FKnRnfsgIpgjtjOb7RzJ8ZODJZ1vZr0ImtQ313CdAxcnSi3tXcHiXcXc-oUEoC/pubchart?oid=851618670&format=image) 118 | 119 | - 79.7% (63 of 78 respondents) **have** used open source annotation platforms before 120 | - 58.2% (46 of 78 respondents) **have not** used commercial annotation platforms before 121 | 122 | ## Which of these *open source* annotation platforms/tools have you ***used*** before? 123 | 124 | 125 | ![](https://docs.google.com/spreadsheets/d/e/2PACX-1vTMa1QOjA4b45HvT3FKnRnfsgIpgjtjOb7RzJ8ZODJZ1vZr0ImtQ313CdAxcnSi3tXcHiXcXc-oUEoC/pubchart?oid=1311576751&format=image) 126 | 127 | - 50.0% (out of 78 respondents) used [Brat](https://brat.nlplab.org/) before 128 | - 28.2% (out of 78 respondents) used [WebAnno](https://webanno.github.io/webanno/) before 129 | - 17.9% (out of 78 respondents) have not used any open source annotation tools before 130 | - 12 other tools (not listed in the questionnaire) are each used by 2 respondents 131 | - 40 other tools (not listed in the questionnaire) are each used by 1 respondent 132 | 133 | 134 | ## Which of these commercial annotation platforms/tools have you ***heard of*** before? 
135 | 136 | ![](https://docs.google.com/spreadsheets/d/e/2PACX-1vTMa1QOjA4b45HvT3FKnRnfsgIpgjtjOb7RzJ8ZODJZ1vZr0ImtQ313CdAxcnSi3tXcHiXcXc-oUEoC/pubchart?oid=922848789&format=image) 137 | 138 | - 51.3% (out of 78 respondents) have heard of [Amazon Mechanical Turk](https://www.mturk.com/) (MTurk) 139 | - 41.0% (out of 78 respondents) have heard of [Appen Data Annotation](https://appen.com/solutions/platform-overview/) (formerly known as Figure8, former-formerly known as Crowdflower) 140 | - 33.3% (out of 78 respondents) have heard of [Prodigy](https://prodi.gy/) 141 | - 19.2% (out of 78 respondents) have not heard of any of the commercial tools before 142 | 143 | (**Note**: Fiverr and Upwork are generally crowdsourcing sites that provide human annotators but may or may not provide an annotation platform) 144 | 145 | ## Which of these commercial annotation platforms/tools have you ***used*** before? 146 | 147 | ![](https://docs.google.com/spreadsheets/d/e/2PACX-1vTMa1QOjA4b45HvT3FKnRnfsgIpgjtjOb7RzJ8ZODJZ1vZr0ImtQ313CdAxcnSi3tXcHiXcXc-oUEoC/pubchart?oid=14680981&format=image) 148 | 149 | **Note:** The response rate for this question is very low, so it might not be representative of all annotation platform users. But this also highlights the stark difference in adoption rates between open source and commercial annotation tools. 150 | 151 | ## Most Useful Features 152 | 153 | The table below presents the features that respondents found useful in the open source or commercial annotation tools they have previously interacted with. 154 | 155 | | Feature | Open Source | Commercial | 156 | |:-|:-:|:-:| 157 | | Active Learning | ✓ | ✓ | 158 | | Annotation progress monitor | ✓ | | 159 | | Annotation shortcuts | ✓ | | 160 | | Annotation visualization | ✓ | | 161 | | Automatically suggesting annotations | ✓ | | 162 | | Audio annotation support | ✓ | | 163 | | Available Online (No installation)| ✓ | | 164 | | Connecting to external resources (e.g. 
storage / knowledge base / dictionary) | ✓ | | 165 | | Customizable annotation tasks/labels (with extra code/schema)| ✓ | ✓ | 166 | | Customizable annotation view (when annotating) | ✓ | | 167 | | Drag and drop interactions | ✓ | | 168 | | Easy setup/installation (e.g. Docker) | ✓ | | 169 | | Export to multiple formats| ✓ | ✓ | 170 | | Good UI/UX | ✓ | | 171 | | Interoperability (e.g. load/combine annotations from other tools, integration with other tools)| ✓ | ✓ | 172 | | Multi-annotator agreement mechanisms/metrics / Automatic evaluation of annotations | ✓ | ✓ | 173 | | Post-Annotation curation | ✓ | | 174 | | Project Management with collaboration features | ✓ | ✓ | 175 | | Python-based | ✓ | | 176 | | Simple Login/Sign-on | ✓ | | 177 | | Supports multiple tasks | ✓ | | 178 | | Communication tools (e.g. annotator interaction with project managers) | | ✓ | 179 | | Documentation (e.g. forums, example setup) | | ✓ | 180 | | Built-in quality control (e.g. screening tests, data cleanup/filter) | | ✓ | 181 | | Demo/test small projects with least setup effort | | ✓ | 182 | | Access to large/diverse/global pool of annotators | | ✓ | 183 | 184 | ### Suggestions to annotation tool creators 185 | 186 | The keywords **flexibility** and **ease**/**easy**/**simple** appear in many of the comments listing the top features for an annotation platform. 
We suggest the following for annotation tool creators to accommodate this feedback: 187 | 188 | - Customizable tasks / labels / schema setup and output formats for annotation project managers 189 | - User-friendly and intuitive UI/UX 190 | - Customizable shortcuts and/or mobile-friendly features to help annotators 191 | 192 | Other recurring themes in the top feature lists include: 193 | 194 | - Respondents highlighted the ability to annotate (i) overlapping spans, (ii) discontinuous spans, (iii) corrections to initial tokenizations, and (iv) entity relation annotations 195 | - In cases where free text annotation is allowed, some respondents highlighted the **need for some sort of constraints/limitations on free text** 196 | 197 | 198 | ## Any other feedback on annotation tools? 199 | 200 | Here is the aggregated free-form feedback from our respondents: 201 | 202 | 203 | Feedback for **Open Source** annotation tools: 204 | 205 | - Tokenizations can/should be explicit in the data input format 206 | - Pre-definition of the tagset is important 207 | - Automatic publishing of annotations as open source 208 | - Having a web version accessible to everyone is preferred 209 | - Tools for documentation updates as the annotation project progresses 210 | - Having an active community to maintain the open source annotation tool 211 | - Automatic population of annotations in the tools would be great 212 | - Annotating across text segments would be nice 213 | 214 | Feedback for annotation tools **in general**: 215 | 216 | - Make the platforms open source 217 | - Support changes in the annotation schema and reclassification of annotations that are already done. 218 | - There's no universal tool and there will never be. So we need to find ways to combine the output of different tools. 
219 | - 1-2 years ago, annotation platforms were all simplistic and difficult to use for non-trivial tasks outside of their tutorials 220 | - Most platforms work well for an individual task, but there's an enormous amount of effort needed to add new features or repurpose an existing tool to a more complicated NLP task. 221 | 222 | # Your Dream Annotation Platform 223 | 224 | Summarizing the respondents' dream platforms: 225 | 226 | - All in one 227 | - Quick, easy to use, simple, plug & play 228 | - Flexibility (ability to customize annotation projects, labels, tasks, schemas, in-/output formats) 229 | - Interoperability (ability to integrate with other annotation tools and their in-/output formats) 230 | - Active and large pool of annotators, annotation project creators and annotation tool management community 231 | 232 | #### Notable Mentions about "Your Dream Annotation Platform" 233 | 234 | - "*Takes over all the server hosting etc. and leaves the researcher with the task of designing experiments.*" 235 | - "*Fair treatment of both the annotator and the task creator.*" 236 | 237 | 238 | # Acknowledgement 239 | 240 | We thank all participants/respondents of the questionnaire for the precious insights/feedback given!!! 241 | 242 | A grateful mention also goes to `Mariana Neves` for referring us to the following survey: 243 | 244 | - [An extensive review of tools for manual annotation of documents](https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbz130/5670958) (published in Dec 2019) [List of tools reviewed](https://github.com/mariananeves/annotation-tools) 245 | 246 | # Cite 247 | 248 | If reference to this report is necessary, 249 | 250 | to cite (in-text): 251 | 252 | > Liling Tan. 2020. A Survey of NLP Annotation Platforms. 
Retrieved from https://github.com/alvations/annotate-questionnaire 253 | 254 | 255 | in bibtex: 256 | 257 | ``` 258 | @misc{survey-annotation-platform, 259 | author = {Liling Tan}, 260 | title = {A Survey of NLP Annotation Platforms}, 261 | howpublished = {https://github.com/alvations/annotate-questionnaire}, 262 | year = {2020} 263 | } 264 | ``` 265 | --------------------------------------------------------------------------------