└── README.md

/README.md:
--------------------------------------------------------------------------------
# IE Dataset Zoo
Information extraction dataset zoo.

# Relation Extraction

| Dataset | #Rel. | #Inst. | Feature | Source | Resource | Origin |
|---------|------:|-------:|---------|--------|----------|--------|
| FewRel | 100 | 44,800 | Supervised | Wikipedia+Wikidata | [url](http://47.92.96.190/dataset/fewrel.tar.gz) | [url](http://www.zhuhao.me/fewrel/) |
| TACRED | 42 | 68,120 | Supervised | Newswire+web | - | [url](https://nlp.stanford.edu/projects/tacred/) |
| SemEval | 19 | 8,000 | Supervised | Web | [url](http://47.92.96.190/dataset/semeval.tar.gz) | [url](https://www.kaggle.com/drtoshi/semeval2010-task-8-dataset#__sid=js0) |
| Wikidata | 352 | 495,883 | Distant supervision | Wikipedia+Wikidata | [url](http://47.92.96.190/dataset/wikidata.tar.gz) | [url](https://public.ukp.informatik.tu-darmstadt.de/UKP_Webpage/DATA/WikipediaWikidataDistantSupervisionAnnotations.v1.0.zip) |
| NYT10(tsinghua) | 53 | 522,043 | Distant supervision | NYT+Freebase | [url](http://47.92.96.190/dataset/nyt10.tar.gz) | [url](https://github.com/thunlp/OpenNRE/) |
| NYT10-large(tsinghua) | 53 | 570,088 | Distant supervision | NYT+Freebase | [url](http://47.92.96.190/dataset/nyt10-large.tar.gz) | [url](https://github.com/thunlp/HNRE) |
| NYT-Wikidata | 100 | 882,177 | Distant supervision | NYT+Wikidata | [url](http://47.92.96.190/dataset/NYT-Wikidata.tar.gz) | [url](https://github.com/thunlp/PathNRE) |
| NYT10-29 | 29 | 70,339 | Distant supervision | NYT+Freebase | [url](http://47.92.96.190/dataset/NYT10.rar) | [url](https://github.com/truthless11/HRL-RE/tree/master/data) |
| NYT11-12 | 12 | 62,648 | DS+supervised | NYT+Freebase | [url](http://47.92.96.190/dataset/NYT11.rar) | [url](https://github.com/truthless11/HRL-RE/tree/master/data) |
| NYT-manual | 24 | 235,982 | Distant supervision | NYT+Freebase | [url](http://47.92.96.190/dataset/nyt-manual.tar.gz) | [url](https://github.com/INK-USC/USC-DS-RelationExtraction) |
| NYT-Wiki(zju) | 73 | 1,989,377 | Distant supervision | NYT-Wikipedia-Wikidata | [url](http://47.92.96.190/dataset/nyt-wiki.zip) | [url](https://github.com/zxlzr/RAN/tree/master/data/NYT-Wiki) |
| Wiki-KBP | 19 | 23,784 | Distant supervision | Wikipedia+KBP+Freebase | [url](http://47.92.96.190/dataset/kbp.tar.gz) | [url](https://github.com/INK-USC/USC-DS-RelationExtraction) |
| PubMed-BioInfer | 94 | 1,580 | Distant supervision | PubMed+NESH | - | [url](https://github.com/INK-USC/USC-DS-RelationExtraction) |
| WebNLG | 14 | 75,325 | Supervised | Web | - | [url](https://drive.google.com/open?id=1zISxYa-8ROe2Zv8iRc82jY9QsQrfY1Vj) |
| SKE | 50 | 173,108 | Supervised | Web | [url](http://47.92.96.190/dataset/ske.tar.gz) | [url](https://ai.baidu.com/broad/download?dataset=sked) |
| KBP37 | 37 | 15,916 | Supervised | Web | [url](http://47.92.96.190/dataset/ske.tar.gz) | [url](https://github.com/zhangdongxu/kbp37) |
| T-REx | 642 | 6.3M | Distant supervision | Wikipedia+Wikidata | - | [url](https://hadyelsahar.github.io/t-rex/) |
| Google-RE | 5 | 59,576 | Supervised | Wikipedia | - | [url](https://github.com/google-research-datasets/relation-extraction-corpus) |
| ADE | 3 | 23,516 | Supervised | Medical Report | [url](http://47.92.96.190/dataset/ade.tar.gz) | [url](https://github.com/davidsbatista/Annotated-Semantic-Relationships-Datasets) |

Other Datasets

- [WikiReading](https://github.com/google-research-datasets/wiki-reading)

## FewRel

FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation

Matching the Blanks: Distributional Similarity for Relation Learning ACL2019

Multi-Level Matching and Aggregation Network for Few-Shot Relation Classification ACL2019

## TACRED

Position-aware Attention and Supervised Data Improve Slot Filling

Matching the Blanks: Distributional Similarity for Relation Learning ACL2019

## SemEval

Matching the Blanks: Distributional Similarity for Relation Learning ACL2019

## Wikidata

Context-Aware Representations for Knowledge Base Relation Extraction

Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction

## NYT10(tsinghua)

Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks

Self-Attention Enhanced Selective Gate with Entity-Aware Embedding for Distantly Supervised Relation Extraction

## NYT10-large(tsinghua)

Attention-Based Capsule Networks with Dynamic Routing for Relation Extraction EMNLP2018

Hierarchical Relation Extraction with Coarse-to-Fine Grained Attention EMNLP2018

Long-tail Relation Extraction via Knowledge Graph Embeddings and Graph Convolution Networks NAACL2019

## NYT-Wikidata

Incorporating Relation Paths in Neural Relation Extraction EMNLP2017

## NYT10-29

A Hierarchical Framework for Relation Extraction with Reinforcement Learning

Joint Extraction of Entities and Relations with a Hierarchical Multi-task Tagging Model

## NYT11-12

A Hierarchical Framework for Relation Extraction with Reinforcement Learning

Joint Extraction of Entities and Relations with a Hierarchical Multi-task Tagging Model

## NYT-manual

Indirect Supervision for Relation Extraction Using Question-Answer Pairs

CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases

Joint Extraction of Entities and Relations Based on a Novel Decomposition Strategy

Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism

Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme

## NYT-Wiki(zju)

Relation Adversarial Network for Low Resource Knowledge Graph Completion

## Wiki-KBP

CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases

Indirect Supervision for Relation Extraction Using Question-Answer Pairs

## PubMed-BioInfer

CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases

## WebNLG

Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism

## SKE

MrMep: Joint Extraction of Multiple Relations and Multiple Entity Pairs Based on Triplet Attention

## KBP37

Matching the Blanks: Distributional Similarity for Relation Learning ACL2019

Relation Classification via Recurrent Neural Network

## T-REx

T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples

K-ADAPTER: Infusing Knowledge into Pre-Trained Models with Adapters

# Event Extraction

| Dataset | #Inst. | Feature | Source | Resource | Origin |
|---------|-------:|---------|--------|----------|--------|
| ACE05 | 599 | Supervised | Web | - | [url](https://catalog.ldc.upenn.edu/LDC2006T06) |
| FewEvent(zju) | 71,385 | Supervised | ACE05+TAC-KBP17 | [url](http://47.92.96.190/dataset/FewEvent.tar.gz) | [url](https://github.com/231sm/Low_Resource_KBP) |
| CCKS2019_Event | 17,815 | Supervised | Financial Announcements | [url](http://47.92.96.190/dataset/ccks2019_event.tar.gz) | [url](https://www.biendata.com/competition/ccks_2019_4/data/) |
| Doc2EDAG | 32,040 | Supervised | Financial Announcements | [url](http://47.92.96.190/dataset/doc2edag.tar.gz) | [url](https://github.com/dolphin-zs/Doc2EDAG) |

## ACE05

Too many papers to list.

## FewEvent(zju)

Meta-Learning with Dynamic-Memory-Based Prototypical Network for Few-Shot Event Detection

## Doc2EDAG

Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction

--------------------------------------------------------------------------------