├── README.md
├── evaluation-systems.md
├── languages.md
└── tools.md

/README.md:
--------------------------------------------------------------------------------
# Resources for Knowledge Graph Construction

A list of tools, mapping languages, and evaluation systems for knowledge graph construction.

To contribute to any of the lists:

### GitHub users:

Option 1 (recommended):
- Fork the repository to your own personal account.
- Copy the template of the list you want to edit (always at the top of the file) and put it at the end of the same file.
- Fill in the template with the requested information.
- Make a pull request.

Option 2:
- Open an issue with the filled-in template of the list where you want to add the resource.


### Non-GitHub users:
- Send the filled-in template of the list where you want to add the resource to the internal mailing list of the group.

--------------------------------------------------------------------------------
/evaluation-systems.md:
--------------------------------------------------------------------------------
# Evaluation Systems for Knowledge Graph Construction


## Evaluation System X (Template):
- **Name**:
- **Description**:
- **Repository/Website**:
- **Main Features (e.g., parameters that it tests)**:
- **Supported data sources and formats**:
- **Data format**:
- **Sizes or Generator**:
- **Purpose (Virtual KG / Materialized KG / Both)**:
- **Supported mapping language(s)**:
- **Target and source models**:
- **Contact point**:
- **DOI**:
- **License**:


## Evaluation System 1:
- **Name**: Madrid-GTFS-Bench
- **Description**: Benchmark for virtual knowledge graph access in the transport domain.
- **Repository/Website**: https://github.com/oeg-upm/gtfs-bench
- **Main Features**: Data format, SPARQL operators, data size, joins
- **Supported data sources and formats**: Any possible configuration by the user
- **Data format**: SQL, CSV, JSON, XML, MongoDB (and combinations)
- **Sizes or Generator**: Generator (based on VIG)
- **Purpose (Virtual KG / Materialized KG / Both)**: Virtual KG
- **Supported mapping language(s)**: RML, R2RML, xR2RML, CSVW
- **Target and source models**: Ontology (http://vocab.gtfs.org/gtfs.ttl), source (https://developers.google.com/transit/gtfs)
- **Contact point**: David Chaves (dchaves@fi.upm.es)
- **DOI**: https://doi.org/10.5281/zenodo.3574493
- **License**: Apache-2.0


## Evaluation System 2:
- **Name**: KGC Parameters
- **Description**: Parameters that affect the construction of a Knowledge Graph
- **Repository/Website**: https://github.com/SDM-TIB/KGC-Param-Eval
- **Main Features**: Join-Selectivity, Join-Duplicates, Relation-Types (1-N, N-1, N-M), Data-Partitioning
- **Supported data sources and formats**:
- **Data format**: CSV
- **Sizes or Generator**: Ad-hoc generator in Python
- **Purpose (Virtual KG / Materialized KG / Both)**: Materialized Knowledge Graph
- **Supported mapping language(s)**: RML
- **Target and source models**: N/A
- **Contact point**: David Chaves (dchaves@fi.upm.es)
- **DOI**: N/A
- **License**: MIT

--------------------------------------------------------------------------------
/languages.md:
--------------------------------------------------------------------------------
# Mapping Languages for Knowledge Graph Construction

## Mapping Language X:
- **Name**:
- **specification**:
- **year**:
- **syntax (RDF or not RDF; if not RDF, what?)**:
- **description**:
- **data source limitations (If yes, what limitations? By default, we assume languages generating RDF can support any data format)**:
- **features (select from the list)**:
  - collections/lists → can collections/lists be generated with this language?
  - named graphs → are named graphs supported?
  - blank nodes → can blank nodes be generated?
  - data transformations/functions → can data transformations be handled?
  - joining data sources → can the language support joins? If so, on which basis?
    - By ID (e.g., how you typically join database tables)
    - By structure (e.g., a JSON object’s children should be mapped as objects of a relation)
    - other (add any other type of supported join that we haven’t thought of)
  - RDF* support
  - add your own
- **website**:
- **test-cases**:
- **contact point**:


## Mapping Language 1:
- **Name**: RML
- **specification**: https://rml.io/specs/rml/
- **year**: 2013
- **syntax**: RDF
- **description**: RML is a generalization of the W3C-recommended R2RML, aiming to support heterogeneous data sources
- **data source limitations**: no
- **features**: named graphs, blank nodes, data transformations with the FnO extension, joins
- **website**: RML.io
- **test-cases**: https://rml.io/test-cases/
- **contact point**: Anastasia Dimou

## Mapping Language 2:
- **Name**: SPARQL-Generate
- **specification**: see [_Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally. A SPARQL extension for generating RDF from heterogeneous formats. In Proc. Extended Semantic Web Conference, ESWC, May 2017, Portoroz, Slovenia_](http://www.maxime-lefrancois.info/docs/LefrancoisZimmermannBakerally-ESWC2017-Generate.pdf) for the specification of the first version.
- **year**: 2016
- **syntax**: See the [JavaCC grammar](https://github.com/sparql-generate/sparql-generate/blob/master/sparql-generate-jena/src/main/javacc/spargl.jj), the [extension of YASQE](https://github.com/sparql-generate/sparql-generate-editor/blob/gh-pages/lib/grammar/sparql11-grammar.pl), and the extension of the [Sublime Linked Data package](https://github.com/sparql-generate/sublime-editor/blob/master/src/syntax/sparql-generate.sublime-syntax-source)
- **description**: SPARQL-Generate is an expressive template-based language to generate RDF streams or text streams from RDF datasets and document streams in arbitrary formats
- **data source limitations**: no
- **features**:
  - collections/lists [(try out)](https://ci.mines-stetienne.fr/sparql-generate/playground.html#ex=example/generate/08-Lists)
  - querying named graphs: yes, as it is an extension of SPARQL
  - blank nodes: yes, as it is an extension of SPARQL
  - data transformations/functions (the standard SPARQL ones, plus the [SPARQL Functions from O. Corby and C. Faron Zucker](http://ns.inria.fr/sparql-extension/#function))
  - joining data sources:
    - By ID, potentially with any data transformation
    - By structure (e.g., a JSON object’s children are mapped as objects of a relation, or even subjects or predicates of relations)
    - [try out the example of a query that combines JSON weather station reports with lists of events described in XML documents and obtained from an external web service](https://ci.mines-stetienne.fr/sparql-generate/playground.html#ex=example/generate/06-DifferentSources)
  - generating RDF streams
  - generating output as HDT
  - IRIs or literals can be generated from bits of different data sources
  - binary data
- **website**: https://ci.mines-stetienne.fr/sparql-generate/
- **test-cases**: https://ci.mines-stetienne.fr/sparql-generate/playground.html
- **contact point**: Maxime Lefrançois, EMSE Saint-Étienne

## Mapping Language 3:
- **Name**: ShExML
- **specification**: http://shexml.herminiogarcia.com/spec/
- **year**: 2019
- **syntax**: Based on ShEx syntax but with its own grammar ([ANTLR4 grammar](https://github.com/herminiogg/ShExML/blob/master/src/main/java/es/weso/antlr/ShExMLParser.g4))
- **description**: ShExML is a language based on ShEx to map and merge heterogeneous data sources. It is designed with usability in mind, aiming to make script creation easier for users.
- **data source limitations**: no
- **features (select from the list)**:
  - collections/lists → no
  - named graphs → no (future extension)
  - blank nodes → yes
  - data transformations/functions → limited ones (see [String operators](http://shexml.herminiogarcia.com/spec/#string-operation-over-iterators) and [Matchers](http://shexml.herminiogarcia.com/spec/#string-operation-over-iterators)); planning to add an extensible function library
  - joining data sources:
    - By ID: [JOIN keyword](http://shexml.herminiogarcia.com/spec/#join-over-iterators)
    - By structure: [Shape linking](http://shexml.herminiogarcia.com/spec/#linking-shapes)
- **website**: http://shexml.herminiogarcia.com/
- **test-cases**: https://github.com/herminiogg/ShExML/tree/master/src/test/scala-2.12/es/weso/shexml
- **contact point**: Herminio García González (garciaherminio@uniovi.es)

--------------------------------------------------------------------------------
/tools.md:
--------------------------------------------------------------------------------
# Knowledge Graph Construction Tools
Descriptions of the tools for knowledge graph construction

## Tool X (TEMPLATE):
- **Name of the tool**:
- **Description**:
- **Repository (link to the tool’s repository)**:
- **Website (if different from the repository)**:
- **Open source? (If not open sourced, ideally provide an option to test it)**:
- **Year introduced**:
- **Contact person (who is the main contact person?)**:
- **Purpose (what can one do with the tool?)**: Select one of these options: processor (executes rules to generate a knowledge graph), editor (automatic or manual generation of mapping rules), other (e.g., pre-processing)
- **Mapping language**: (which mapping language(s) are supported by the tool)
- **Supported data (formats, sizes)**:
- **Programming language**:
- **Special features**:
- **DOI**:
- **License**:
- **Test cases**: (if any for the supported languages)
- **Related use cases**: (specify use cases shared with the community group (if any) where the tool is used)
- **Related projects**: (specify projects (if any) where the tool is used, ideally providing links to the project descriptions)


## Tool 1:
- **Name of the tool**: Morph-CSV
- **Description**: Morph-CSV is an open-source tool for querying tabular data sources using SPARQL. It exploits information from the query, RML+FnO mappings, and CSVW metadata to enhance the performance and completeness of traditional OBDA systems (SPARQL-to-SQL translators). At the moment it can be embedded on top of any R2RML-compliant system.
- **Repository**: https://github.com/oeg-upm/morph-csv
- **Website**: https://morph.oeg.fi.upm.es/tool/morph-csv
- **Open source**: Yes
- **Year introduced**: 2019
- **Contact person**: David Chaves (dchaves@fi.upm.es)
- **Purpose**: Other (enhance SPARQL-to-SQL engines when data is in CSV)
- **Mapping language**: YARRRML+FnO and CSVW
- **Supported data**: CSV (tested with BSBM 360K products)
- **Programming language**: Python + Bash
- **DOI**: https://doi.org/10.5281/zenodo.3731941
- **License**: Apache-2.0
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/oeg-publictransport.md
- **Related projects**: http://sprint-transport.eu/

## Tool 2:
- **Name of the tool**: RMLEditor
- **Description**: The RMLEditor offers a graphical user interface to create rules that generate knowledge graphs from heterogeneous data sources.
- **Repository (link to the tool’s repository)**: https://github.com/RMLio/rmleditor-ce
- **Website (if different from the repository)**: https://app.rml.io/rmleditor/
- **Open source? (If not open sourced, ideally provide an option to test it)**: No
- **Year introduced**: 2016
- **Contact person (who is the main contact person?)**: Pieter Heyvaert (pieter.heyvaert@ugent.be)
- **Purpose (what can one do with the tool?)**: editor
- **Mapping language**: [R2]RML
- **Supported data (formats, sizes)**: CSV, JSON, XML
- **Programming language**: HTML/CSS/JS
- **Special features**: Uses LOV to find relevant classes and properties. Uses MapVOWL to visualize rules.
- **DOI**: N/A
- **License**: Free community edition with limitations and paid edition without limitations.
- **Test cases**: None
- **Related use cases**: None
- **Related projects**: [DyVerSIFy](https://www.imec-int.com/en/what-we-offer/research-portfolio/dyversify), [MOS2S](https://www.mos2s.eu/), [COMBUST](https://www.imec-int.com/en/what-we-offer/research-portfolio/combust)

## Tool 3:
- **Name of the tool**: RMLMapper
- **Description**: The RMLMapper executes RML rules to generate high-quality Linked Data from multiple, originally (semi-)structured data sources
- **Repository (link to the tool’s repository)**: https://github.com/RMLio/rmlmapper-java
- **Website (if different from the repository)**: https://rml.io
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2014
- **Contact person (who is the main contact person?)**: Ben De Meester (ben.demeester@ugent.be)
- **Purpose (what can one do with the tool?)**: Processor
- **Mapping language**: RML, R2RML
- **Supported data (formats, sizes)**: local and remote files (CSV using ql:CSV or CSVW, JSON using JSONPath, XML using XPath), databases (MySQL, PostgreSQL, SQLServer, OracleDB). The mapper works in-memory, so the query result size should be less than the machine's memory
- **Programming language**: Java
- **Special features**: Extensible in terms of supported data formats; configurable and extensible data transformations using https://FnO.io; inter-datasource joins. Reference implementation of RML.
- **DOI**: N/A
- **License**: MIT
- **Test cases**: https://rml.io/test-cases/
- **Related use cases**: betweenourworlds-anime, idlab-covid19, idlab-dbpedia, idlab-facebook, idlab-twitter, idlab-velopark
- **Related projects**: [EcoDaLo], [ESSENCE], [DAIQUIRI], [DiSSeCt]

[EcoDaLo]: https://www.imec-int.com/en/what-we-offer/research-portfolio/ecodalo
[ESSENCE]: https://www.imec-int.com/en/what-we-offer/research-portfolio/essence
[DAIQUIRI]: https://www.imec-int.com/en/what-we-offer/research-portfolio/daiquiri
[DiSSeCt]: https://dissectsite.wordpress.com/

## Tool 4:
- **Name of the tool**: Morph-RDB
- **Description**: Morph-RDB (formerly called ODEMapster) is an RDB2RDF engine developed by the Ontology Engineering Group that follows the R2RML specification (http://www.w3.org/TR/r2rml/). This engine supports two operational modes: data upgrade (generating RDF instances from data in a relational database) and query translation (SPARQL to SQL). Morph-RDB employs various optimisation techniques in order to generate efficient SQL queries, such as self-join elimination and subquery elimination.
- **Repository**: https://github.com/oeg-upm/morph-rdb
- **Website**: https://morph.oeg.fi.upm.es/tool/morph-rdb
- **Open source**: Yes
- **Year introduced**: 2014
- **Contact person**: David Chaves (dchaves@fi.upm.es)
- **Purpose**: Processor
- **Mapping language**: R2RML
- **Supported data**: SQL (tested with MySQL and PostgreSQL)
- **Programming language**: Scala + Java
- **DOI**: N/A
- **License**: Apache-2.0
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/oeg-publictransport.md
- **Related projects**: http://sprint-transport.eu/, https://www.mobile-age.eu/, https://bimerr.eu/

## Tool 5:
- **Name of the tool**: Morph-GraphQL
- **Description**: Morph-GraphQL is an open-source system for generating GraphQL servers automatically from declarative mappings such as R2RML or RML. Currently, Morph-GraphQL is able to generate GraphQL servers in JavaScript over SQL databases. Current experimental features include the generation of GraphQL servers in other languages (e.g., Java) and for other data models (e.g., MongoDB)
- **Repository**: https://github.com/oeg-upm/morph-graphql
- **Website**: https://morph.oeg.fi.upm.es/tool/morph-graphql
- **Open source**: Yes
- **Year introduced**: 2019
- **Contact person**: David Chaves (dchaves@fi.upm.es)
- **Purpose**: Processor
- **Mapping language**: R2RML and RML
- **Supported data**: SQL (tested with H2) and NoSQL (experimental, tested with MongoDB)
- **Programming language**: JavaScript/Node.js
- **DOI**: N/A
- **License**: Apache-2.0
- **Related use cases**: N/A
- **Related projects**: N/A

## Tool 6:
- **Name of the tool**: Mapeathor
- **Description**: Mapeathor is a simple spreadsheet parser able to generate mapping rules in three mapping languages: R2RML, RML (with the extension for FnO functions), and YARRRML. It takes the mapping rules expressed in a spreadsheet (designed to facilitate the mapping rule writing process) and transforms them into the desired language.
- **Repository (link to the tool’s repository)**: https://github.com/oeg-upm/Mapeathor
- **Website (if different from the repository)**: https://morph.oeg.fi.upm.es/tool/mapeathor
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2019
- **Contact person (who is the main contact person?)**: Ana Iglesias (ana.iglesiasm@upm.es)
- **Purpose (what can one do with the tool?)**: Editor
- **Mapping language**: R2RML, RML, YARRRML
- **Supported data (formats, sizes)**: Excel
- **Programming language**: Python
- **License**: Apache-2.0
- **Test cases**: None
- **Related use cases**: None
- **Related projects**: [Ciudades Abiertas](http://www.ciudadesabiertas.es/)

## Tool 7:
- **Name of the tool**: SDM-RDFizer
- **Description**: SDM-RDFizer is an interpreter of mapping rules that allows the transformation of (un)structured data into RDF knowledge graphs.
- **Repository (link to the tool’s repository)**: https://github.com/SDM-TIB/SDM-RDFizer
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2017
- **Contact person (who is the main contact person?)**: Enrique Iglesias (s6enigle@uni-bonn.de)
- **Purpose (what can one do with the tool?)**: Transformation of (un)structured data into RDF knowledge graphs by an efficient execution of RML triple maps.
- **Mapping language**: RML (current version)
- **Supported data (formats, sizes)**: CSV, JSON, RDB, XML
- **Programming language**: Python
- **Special features**: The SDM-RDFizer implements optimized data structures and relational algebra operators that enable an efficient execution of RML triple maps even in the presence of big data. SDM-RDFizer is also extensible in terms of supported data formats, with configurable and extensible data processing functions using https://FnO.io
- **DOI**: https://doi.org/10.5281/zenodo.3872103
- **License**: Apache-2.0
- **Test cases**: https://rml.io/test-cases/
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/sdm-genomics.md
- **Related projects**: [iASiS](http://project-iasis.eu/), [BigMedilytics - lung cancer pilot](https://www.bigmedilytics.eu/), [CLARIFY](https://www.clarify2020.eu/), [P4-LUCAT](https://www.tib.eu/de/forschung-entwicklung/projektuebersicht/projektsteckbrief/p4-lucat), [ImProVIT](https://www.tib.eu/de/forschung-entwicklung/projektuebersicht/projektsteckbrief/improvit), [PLATOON](https://platoon-project.eu/)

## Tool 8:
- **Name of the tool**: RMLStreamer
- **Description**: The RMLStreamer executes RML rules to generate high-quality Linked Data from multiple, originally (semi-)structured data sources in a streaming way.
- **Repository (link to the tool’s repository)**: https://github.com/RMLio/RMLStreamer
- **Website (if different from the repository)**: https://rml.io
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2019
- **Contact person (who is the main contact person?)**: Gerald Haesendonck (gerald.haesendonck@ugent.be)
- **Purpose (what can one do with the tool?)**: Processor
- **Mapping language**: RML
- **Supported data (formats, sizes)**: formats: CSV, XML, JSON; media: files, TCP sockets, Kafka topics
- **Programming language**: Scala
- **Special features**: Extensible in terms of supported data formats and media, optimised for processing big data sets and continuous data streams, designed to run on a cluster.
- **DOI**: https://doi.org/10.5281/zenodo.3887065
- **License**: MIT
- **Test cases**: https://rml.io/test-cases/
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/idlab-twitter.md
- **Related projects**: [MOS2S], [DyVerSIFy], [ESSENCE], [DAIQUIRI]

[MOS2S]: https://www.mos2s.eu/
[DyVerSIFy]: https://www.imec-int.com/en/what-we-offer/research-portfolio/dyversify


## Tool 9:
- **Name of the tool**: SPARQL micro-services
- **Description**:
The SPARQL Micro-Service architecture is meant to unlock data silos hidden behind proprietary Web APIs by equipping them with a lightweight SPARQL endpoint. The whole idea is about bringing Web APIs into the Web of Data and making it possible to integrate Linked Data and Web APIs within a simple federated SPARQL query (a minimal query sketch is shown after this entry).

A SPARQL micro-service encapsulates a Web API and typically yields a **small, resource-centric graph** generated dynamically. It can be seen as a **configurable** SPARQL endpoint in that it expects parameters, e.g. a SPARQL micro-service to find photos from Snapchat may expect tags.

An interesting use of SPARQL micro-services is to **assign dereferenceable URIs to Web API resources** that do not have URIs in the first place. For instance, https://sparql-micro-services.org/ld/flickr/photo/31173091626 is the dereferenceable URI of a photo in Flickr. The content is generated dynamically based on the photo identifier.
- **Repository (link to the tool’s repository)**: https://github.com/frmichel/sparql-micro-service
- **Website (if different from the repository)**: example SPARQL micro-services: https://sparql-micro-services.org/
- **Open source? (If not open sourced, ideally provide an option to test it)**: yes
- **Year introduced**: 2018
- **Contact person (who is the main contact person?)**: Franck Michel (franck.michel@cnrs.fr)
- **Purpose (what can one do with the tool?)**: processor, other
- **Mapping language**: SPARQL CONSTRUCT
- **Supported data (formats, sizes)**: mainly JSON-based Web APIs; XML-based Web APIs can be adapted too
- **Programming language**: PHP
- **Special features**:
  - Docker deployment ready
  - Assigns dereferenceable URIs to Web API resources (bridges Web APIs and LOD)
  - Provides provenance information as part of the generated graph
  - Simple configuration with a config.ini file, or with a rich SPARQL Service Description and SHACL shapes graph
  - Dynamic generation of HTML documentation + test interface from the SPARQL micro-service Service Description (see [example](https://sparql-micro-services.org/service/flickr/getPhotosByTags_sd/))
  - Automatic markup of the HTML documentation as a schema.org Dataset to allow web-scale discoverability of SPARQL micro-services, e.g. with Google Dataset Search
- **DOI**: n/a
- **License**: Apache 2.0
- **Test cases**: n/a
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/inria-kg-vs-webapis.md
- **Related projects**: [Taxref-Web](https://taxref.mnhn.fr/taxref-web/) (private access only, comparison of 20+ Web APIs in the biodiversity domain). Multiple hands-on sessions experimented successfully with various Web APIs: Flickr, YouTube, Twitter, Spotify, Deezer, MusicBrainz...
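To illustrate the integration pattern described above, the sketch below shows how a client could invoke a SPARQL micro-service from a regular federated SPARQL query using the standard `SERVICE` clause. The endpoint URL, its `tags` query-string parameter, and the schema.org terms used in the graph pattern are illustrative assumptions, not part of the tool's documented API; see the repository for the actual service URLs and argument-passing conventions.

```sparql
# Minimal sketch (hypothetical endpoint): ask a SPARQL micro-service wrapping a
# photo-sharing Web API for photos tagged "rose". The SERVICE clause delegates
# the graph pattern to the micro-service, which calls the Web API on the fly and
# answers the pattern against the small, resource-centric RDF graph it generates.
PREFIX schema: <http://schema.org/>

SELECT ?photo ?title ?img WHERE {
  SERVICE <https://example.org/sparql-ms/flickr/getPhotosByTags?tags=rose> {
    ?photo schema:name  ?title ;
           schema:image ?img .
  }
}
```

Depending on how a service is configured, arguments may also be conveyed within the graph pattern itself rather than in the URL, and the results can be joined with any other Linked Data source in the same query.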

## Tool 10:
- **Name of the tool**: WordLift Plugin
- **Description**:

WordLift is a WordPress plugin that puts state-of-the-art semantic technologies in the hands of any blogger and publisher. Without requiring any technical skills, it helps produce richer content and organize it by suggesting facts and information that provide readers with meaningful context, and by adding semantic markup to the text to help machines fully understand the website.

Features:

* Text Analysis: WordLift analyzes content and identifies matching entities organized in 4 categories: Who, What, When and Where.
* Tag Content: Editors can accept the suggested entities to add contextual info for the user, efficiently selecting internal links to existing content.
* Create New Entities: Editors can create new entities, providing additional context and enriching the website's Knowledge Graph. WordLift will learn them, and next time they will be detected.
* Edit Entities: Editors can edit all entities to customize the Knowledge Graph around the websites' audiences and build new relationships.
* Images: WordLift suggests open-license images and media from its own library, saving the time usually spent searching for visuals.
* Geomaps: Locations in articles can quickly be mapped by adding the Geomap widget.
* Timelines: Events can be displayed chronologically by adding the Timeline widget.
* Chords: Visualize what relates to what in every article by adding the Chord widget.
* Navigator: Recommend relevant articles by adding the Navigator widget.
* Faceted Search: Suggest additional content related to the topics found in your article, letting readers dive into your archive with the Faceted Search widget.
* Meaningful Navigation: WordLift automatically identifies topics in articles, using Wikipedia’s classification system. This makes it possible to create new entry points for content based on topics, events, people and places.
* Publish Search Data: WordLift automatically adds schema.org markup to articles, allowing search engines to properly index and display content, and intelligent agents such as Siri and Alexa to access it.
* Publish Linked Data: WordLift publishes the content’s metadata.

- **Repository**: https://github.com/insideout10/wordlift-plugin
- **Website**: https://wordlift.io
- **Open source?**: yes
- **Year introduced**: 2017
- **Contact person**: David Riccitelli (david@wordlift.io)
- **Purpose**: editor, other
- **Mapping language**: n/a
- **Supported data**: WordPress, unstructured HTML
- **Programming language**: PHP, Java, Python
- **Special features**: [Knowledge Graph](https://wordlift.io/blog/en/entity/knowledge-graph/), [Linked Data](https://wordlift.io/blog/en/entity/linked-data/), NLP, SPARQL, GraphQL, Persistent URIs
- **DOI**: n/a
- **License**: GPL
- **Test cases**: n/a
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/wordlift-salzburgerland.md
- **Related projects**: [WordLift NG]

[WordLift NG]: https://wordlift.io/blog/en/wordlift-next-generation-receives-grant-from-eu/


## Tool 11:
- **Name of the tool**: RocketRML
- **Description**: An efficient RML mapper implementation in JavaScript for the RDF Mapping Language (RML).
- **Repository (link to the tool’s repository)**: https://github.com/semantifyit/RocketRML
- **Website (if different from the repository)**: https://semantifyit.github.io/RocketRML/
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2019
- **Contact person (who is the main contact person?)**: Umutcan Simsek (umutcan.simsek@sti2.at)
- **Purpose (what can one do with the tool?)**: Processor
- **Mapping language**: RML (in Turtle and YARRRML syntax)
- **Supported data (formats, sizes)**: CSV, JSON, XML. Tested with 500k triples (takes ~20s)
- **Programming language**: JavaScript (Node.js)
- **Special features**: It efficiently maps hierarchical sources by using caching mechanisms for iterators and JOIN results. Available as a tool with a CLI and as an NPM package. A Dockerfile is also provided. Please see the GitHub repository.
- **DOI**: n/a
- **License**: CC-BY-SA-4.0
- **Test cases**: n/a
- **Related use cases**: TBD
- **Related projects**: [semantify.it], [MindLab]

[semantify.it]: https://semantify.it
[MindLab]: https://mindlab.ai


## Tool 12:
- **Name of the tool**: SPARQL-Generate
- **Description**: SPARQL-Generate is an expressive template-based language to generate RDF streams or text streams from RDF datasets and document streams in arbitrary formats
- **Repository (link to the tool’s repository)**: https://github.com/sparql-generate/sparql-generate
- **Website (if different from the repository)**: https://ci.mines-stetienne.fr/sparql-generate/
- **Open source? (If not open sourced, ideally provide an option to test it)**: yes
- **Year introduced**: 2016
- **Contact person (who is the main contact person?)**: Maxime Lefrançois, MINES Saint-Étienne
- **Purpose (what can one do with the tool?)**: Processor
- **Mapping language**: SPARQL-Generate
- **Supported data (formats, sizes)**: RDF, SQL, XML, JSON, CSV, GeoJSON, HTML, CBOR, plain text with regular expressions, large CSV documents (unofficially: generation of 17.5 M triples as HDT in < 9'20''), MQTT or WebSocket streams, repeated HTTP GET operations.
- **Programming language**: Java
- **Special features**:
  - usable [on the web playground](https://ci.mines-stetienne.fr/sparql-generate/playground.html), [inside Sublime Text](https://ci.mines-stetienne.fr/sparql-generate/sublime.html), [as an executable JAR](https://ci.mines-stetienne.fr/sparql-generate/language-cli.html), [as an open source Java library](https://ci.mines-stetienne.fr/sparql-generate/get-started.html);
  - can also generate text streams from RDF datasets and document streams in arbitrary formats (implements something like [STTL](https://ns.inria.fr/sparql-template/)).
- **DOI**: https://doi.org/10.5281/zenodo.3965916
- **License**: Apache 2
- **Test cases**: see https://ci.mines-stetienne.fr/sparql-generate/playground.html
- **Related projects**: [ITEA2 12004 SEAS (Smart Energy Aware Systems)], [ANR 14-CE24-0029 OpenSensingCity], [ETSI STF 578], bilateral research contracts with ENGIE R&D CRIGEN, [ANR 19-CE23-0012 CoSWoT], [ANR HyperAgents].

[ITEA2 12004 SEAS (Smart Energy Aware Systems)]: https://itea3.org/project/seas.html
[ANR 14-CE24-0029 OpenSensingCity]: https://anr.fr/Project-ANR-14-CE24-0029
[ETSI STF 578]: https://portal.etsi.org/STF/STFs/STF-HomePages/STF578
[ANR 19-CE23-0012 CoSWoT]: https://anr.fr/Project-ANR-19-CE23-0012
[ANR HyperAgents]: http://hyperagents.gitlab.emse.fr/

## Tool 13:
- **Name of the tool**: ShExML
- **Description**: ShExML engine and web app
- **Repository**: https://github.com/herminiogg/ShExML
- **Website (if different from the repository)**: http://shexml.herminiogarcia.com
- **Open source?**: yes
- **Year introduced**: 2019
- **Contact person**: Herminio García González (garciaherminio@uniovi.es)
- **Purpose**: Processor (executes rules to generate a knowledge graph), editor (automatic or manual generation of mapping rules), translator (converts ShExML rules to RML rules)
- **Mapping language**: ShExML
- **Supported data**: XML, JSON, CSV.
- **Programming language**: Scala
- **Special features**: N/A
- **DOI**: N/A
- **License**: Not yet decided
- **Test cases**: https://github.com/herminiogg/ShExML/tree/master/src/test/scala-2.12/es/weso/shexml (sbt test and [CI](https://travis-ci.org/github/herminiogg/ShExML))
- **Related use cases**: [Asturian Notaries Manuscripts](https://github.com/kg-construct/use-cases/blob/master/uniovi-notaries.md)
- **Related projects**: N/A

## Tool 14:
- **Name of the tool**: CARML
- **Description**: An extensible RML mapping engine with built-in support for JSON, CSV, and XML
- **Repository (link to the tool’s repository)**: https://github.com/carml/carml
- **Website (if different from the repository)**:
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2017
- **Contact person (who is the main contact person?)**: Pano Maria (pano@skemu.com)
- **Purpose (what can one do with the tool?)**: Processor
- **Mapping language**: RML
- **Supported data (formats, sizes)**: CSV, JSON, XML
- **Programming language**: Java
- **Special features**: Easily extensible for other formats. InputStream extension for easy programmatic binding of sources. XML document extension to be able to use namespace prefix mappings in XPath expressions. Support for FnO functions.
- **DOI**: n/a
- **License**: MIT
- **Test cases**: https://rml.io/test-cases/
- **Related use cases**: [Kadaster Data Platform](https://github.com/kg-construct/use-cases/blob/master/kadaster-ld.md)
- **Related projects**: [Kadaster Data Platform (PDOK)](https://www.pdok.nl/), [Zazuko XRM](https://zazuko.com/products/expressive-rdf-mapper/), [DotWebStack Framework](https://github.com/dotwebstack/dotwebstack-framework/)

## Tool 15:
- **Name of the tool**: Helio
- **Description**: Helio is a framework that allows publishing RDF data from different heterogeneous sources as Linked Data
- **Repository (link to the tool’s repository)**: https://github.com/oeg-upm/Helio
- **Website (if different from the repository)**: https://oeg-upm.github.io/Helio/
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2018
- **Contact person (who is the main contact person?)**: Andrea Cimmino (cimmino@fi.upm.es)
- **Purpose (what can one do with the tool?)**: Processor (executes rules to generate a knowledge graph), publish a knowledge graph.
- **Mapping language**: RML, WoT-Mapping, and Helio mapping
- **Supported data (formats, sizes)**: CSV, XML, HTML, text, JSON, RDF
- **Programming language**: Java
- **Special features**: relies on a plugin system that does not require developers to download the core code, customizable HTML views, can integrate existing tools that generate RDF.
- **DOI**:
- **License**: Apache-2.0
- **Test cases**: (if any for the supported languages)
- **Related use cases**: -
- **Related projects**: [VICINITY H2020](https://www.vicinity2020.eu/vicinity/), [DELTA H2020](https://www.delta-h2020.eu/), [BIMERR H2020](https://bimerr.eu/about/)

## Tool 16:
- **Name of the tool**: FunMap
- **Description**: FunMap is an interpreter of RML+FnO that converts a data integration system defined using RML+FnO into an equivalent data integration system where RML mappings are function-free.
- **Repository (link to the tool’s repository)**: https://github.com/SDM-TIB/FunMap
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2020
- **Contact person (who is the main contact person?)**: Samaneh Jozashoori (samaneh.jozashoori@tib.eu)
- **Purpose (what can one do with the tool?)**: It can be applied when a pre-processing step is expressed within the mapping rules as functions, i.e., when data pre-processing is supposed to be performed at the time of data model transformation (into RDF) and knowledge graph creation.
- **Mapping language**: RML (current version)
- **Supported data (formats, sizes)**: CSV, RDB
- **Programming language**: Python
- **Special features**: FunMap empowers the knowledge graph creation process with optimization techniques to reduce execution time.
- **DOI**: https://doi.org/10.5281/zenodo.3993657
- **License**: Apache-2.0
- **Test cases**: -
- **Related use cases**: -
- **Related projects**: [CLARIFY](https://www.clarify2020.eu/), [P4-LUCAT](https://www.tib.eu/de/forschung-entwicklung/projektuebersicht/projektsteckbrief/p4-lucat), [Ciudades Abiertas](https://ciudades-abiertas.es/)

## Tool 17:
- **Name of the tool**: Squerall
- **Description**: An implementation of the so-called _Semantic Data Lake_: a query engine _uniformly_ accessing original, large, and heterogeneous data sources using Semantic Web principles and technologies
- **Repository (link to the tool’s repository)**: https://github.com/EIS-Bonn/Squerall
- **Website (if different from the repository)**: https://eis-bonn.github.io/Squerall/
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2017
- **Contact person (who is the main contact person?)**: Mohamed Nadjib Mami (mohamed.nadjib.mami@gmail.com)
- **Purpose (what can one do with the tool?)**: Processor (executes rules to generate a knowledge graph). Squerall is a virtual OBDA (Ontology-Based Data Access) engine, where the knowledge graph is only constructed _on-the-fly_ at query time. However, with a slight development effort, it would be possible to physically materialize the knowledge graph (in RDF) following a property-table-partitioning-like scheme.
- **Mapping language**: RML
- **Supported data (formats, sizes)**: CSV, Parquet, MongoDB, Cassandra, JDBC (MySQL, SQL Server, etc.), (beta) Elasticsearch. Squerall can be [extended](https://github.com/EIS-Bonn/Squerall#extensibility) to support other sources
- **Programming language**: Scala, Java
- **Special features**: Use SPARQL to query popular distributed data sources (e.g., files in Hadoop, NoSQL stores) _on-the-fly_, i.e., without requiring pre-processing or ingestion. Disparate data may be made joinable by declaratively altering some of its attributes thanks to the use of the FnO ontology. State-of-the-art Big Data query engines are used for the querying, namely Apache Spark and Presto. Squerall can programmatically be [extended](https://github.com/EIS-Bonn/Squerall#extensibility) to use other query engines (e.g., Drill or Dremio)
- **DOI**: https://zenodo.org/record/2636436#.X3tOY_kzZPY
- **License**: Apache-2.0

## Tool 18:
- **Name of the tool**: Chimera
- **Description**: Chimera is a tool to build conversion pipelines leveraging Semantic Web technologies. It is built on top of Apache Camel to easily configure message-to-message mediators or batch converters using lifting/lowering procedures to/from a reference ontology. In principle, the aim is to completely avoid coding by just configuring a pipeline using the various blocks provided.
- **Repository (link to the tool’s repository)**: [https://github.com/cefriel/chimera](https://github.com/cefriel/chimera)
- **Open source?**: Yes
- **Year introduced**: 2019
- **Contact person (who is the main contact person?)**: Mario Scrocca ([mario.scrocca@cefriel.com](mailto:mario.scrocca@cefriel.com))
- **Purpose (what can one do with the tool?)**: A basic Chimera pipeline involves a lifting Processor (a fork of the RMLMapper) and a lowering Processor ([rdf-lowerer](https://github.com/cefriel/rdf-lowerer), built on Apache Velocity). Additional blocks, e.g., for pre-processing/enrichment of the knowledge graph, can be integrated into the pipeline.
- **Mapping language**: RML for lifting, _extended_ VTL (Velocity Template Language) for lowering
- **Supported data (formats, sizes)**: CSV, JSON, XML
- **Programming language**: Java
- **Special features**: High configurability of pipelines to satisfy different data integration requirements using Semantic Web technologies. Easy to integrate with existing data sources and sinks thanks to Apache Camel components.
- **License**: Apache-2.0
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/oeg-publictransport.md
- **Related projects**: http://sprint-transport.eu/



## Tool 19:
- **Name of the tool**: Ontario
- **Description**: A federated query processing engine that is able to access heterogeneous data sources in a Semantic Data Lake. Ontario leverages the concept of RDF Molecule Templates to effectively and efficiently decompose, plan, and execute SPARQL queries over a federation of data sources. The given SPARQL queries are transformed into the query languages of the data sources in a Semantic Data Lake using mapping rules expressed in the RML language.
- **Repository (link to the tool’s repository)**: https://github.com/SDM-TIB/Ontario
- **Website (if different from the repository)**: https://labs.tib.eu/info/projekt/ontario/
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2017
- **Contact person (who is the main contact person?)**: Kemele M. Endris (kemele.endris@gmail.com)
- **Purpose (what can one do with the tool?)**: Processor. Ontario is able to answer SPARQL SELECT queries over heterogeneous data sources: CSV, JSON, XML, RDBMS, Neo4j, MongoDB, RDF. Non-RDF data is transformed on-the-fly at query time. Ontario also supports SPARQL CONSTRUCT queries to transform data from a Semantic Data Lake into RDF.
- **Mapping language**: RML
- **Supported data (formats, sizes)**: CSV, Parquet, MongoDB, JDBC (MySQL, Postgres), Neo4j, RDF
- **Programming language**: Python
- **DOI**: http://doi.org/10.1007/978-3-030-27615-7_29
- **License**: GNU/GPL v2

## Tool 20:
- **Name of the tool**: Gra.fo
- **Description**: A visual, collaborative, and real-time knowledge graph schema and mapping tool
- **Repository (link to the tool’s repository)**: N/A
- **Website (if different from the repository)**: https://gra.fo/
- **Open source? (If not open sourced, ideally provide an option to test it)**: No. https://gra.fo/
- **Year introduced**: 2019
- **Contact person (who is the main contact person?)**: Juan Sequeda (juan@data.world)
- **Purpose (what can one do with the tool?)**: Editor. Gra.fo, in conjunction with data.world, provides virtualization
- **Mapping language**: R2RML
- **Supported data (formats, sizes)**: Any relational database and CSV/XLS connected in data.world
- **Programming language**: Commercial tool
- **Special features**: Visual (drag and drop), Collaborative (share documents with different permissions, history, comments), Real-Time (multiple users collaborating at the same time)
- **DOI**: N/A
- **License**: https://gra.fo/terms-and-conditions/
--------------------------------------------------------------------------------