├── README.md
├── evaluation-systems.md
├── languages.md
└── tools.md

/README.md:
--------------------------------------------------------------------------------
# Resources for Knowledge Graph Construction

A list of tools, mapping languages, and evaluation systems for knowledge graph construction.

To contribute to any of the lists:

### GitHub users:

Option 1 (recommended):
- Fork the repository to your own personal account.
- Copy the template of the list you want to edit (always at the top of the file) and put it at the end of the same file.
- Fill in the template with the requested information.
- Make a pull request.

Option 2:
- Open an issue with the filled-in template of the list where you want to add the resource.


### Non-GitHub users:
- Send the filled-in template of the list where you want to add the resource to the internal mailing list of the group.

--------------------------------------------------------------------------------
/evaluation-systems.md:
--------------------------------------------------------------------------------
# Evaluation Systems for Knowledge Graph Construction


## Evaluation System X (Template):
- **Name**:
- **Description**:
- **Repository/Website**:
- **Main Features (e.g., parameters that it tests)**:
- **Supported data sources and formats**:
- **Data format**:
- **Sizes or Generator**:
- **Purpose (Virtual KG / Materialized KG / Both)**:
- **Supported mapping language(s)**:
- **Target and source models**:
- **Contact point**:
- **DOI**:
- **License**:


## Evaluation System 1:
- **Name**: Madrid-GTFS-Bench
- **Description**: Benchmark for virtual knowledge graph access in the transport domain.
- **Repository/Website**: https://github.com/oeg-upm/gtfs-bench
- **Main Features**: Data format, SPARQL operators, data size, joins
- **Supported data sources and formats**: Any possible configuration by the user
- **Data format**: SQL, CSV, JSON, XML, MongoDB (and combinations)
- **Sizes or Generator**: Generator (based on VIG)
- **Purpose (Virtual KG / Materialized KG / Both)**: Virtual KG
- **Supported mapping language(s)**: RML, R2RML, xR2RML, CSVW
- **Target and source models**: Ontology (http://vocab.gtfs.org/gtfs.ttl), source (https://developers.google.com/transit/gtfs)
- **Contact point**: David Chaves (dchaves@fi.upm.es)
- **DOI**: https://doi.org/10.5281/zenodo.3574493
- **License**: Apache-2.0


## Evaluation System 2:
- **Name**: KGC Parameters
- **Description**: Parameters that affect the construction of a Knowledge Graph
- **Repository/Website**: https://github.com/SDM-TIB/KGC-Param-Eval
- **Main Features**: Join-Selectivity, Join-Duplicates, Relation-Types (1-N, N-1, N-M), Data-Partitioning
- **Supported data sources and formats**:
- **Data format**: CSV
- **Sizes or Generator**: Ad-hoc generator in Python
- **Purpose (Virtual KG / Materialized KG / Both)**: Materialized Knowledge Graph
- **Supported mapping language(s)**: RML
- **Target and source models**: N/A
- **Contact point**: David Chaves (dchaves@fi.upm.es)
- **DOI**: N/A
- **License**: MIT

--------------------------------------------------------------------------------
/languages.md:
--------------------------------------------------------------------------------
# Mapping Languages for Knowledge Graph Construction

## Mapping Language X:
- **Name**:
- **specification**:
- **year**:
- **syntax (RDF or not RDF; if not RDF, what?)**:
- **description**:
- **data source limitations (If yes, what limitations? By default, we assume languages generating RDF can support any data format)**:
- **features (select from the list)**:
  - collections/lists → can collections/lists be generated with this language?
  - named graphs → are named graphs supported?
  - blank nodes → can blank nodes be generated?
  - data transformations/functions → can data transformations be handled?
  - joining data sources → can the language support joins? If so, on which basis?
    - By ID (e.g., how you typically join database tables)
    - By structure (e.g., a JSON object’s children should be mapped as objects of a relation)
    - other (add any other type of supported join that we haven’t thought of)
  - RDF* support
  - add your own
- **website**:
- **test-cases**:
- **contact point**:


## Mapping Language 1:
- **Name**: RML
- **specification**: https://rml.io/specs/rml/
- **year**: 2013
- **syntax**: RDF
- **description**: RML is a generalization of the W3C-recommended R2RML, aiming to support heterogeneous data sources
- **data source limitations**: no
- **features**: named graphs, blank nodes, data transformations with the FnO extension, joins
- **website**: RML.io
- **test-cases**: https://rml.io/test-cases/
- **contact point**: Anastasia Dimou

## Mapping Language 2:
- **Name**: SPARQL-Generate
- **specification**: see [_Maxime Lefrançois, Antoine Zimmermann, Noorani Bakerally. A SPARQL extension for generating RDF from heterogeneous formats. In Proc. Extended Semantic Web Conference, ESWC, May 2017, Portoroz, Slovenia_](http://www.maxime-lefrancois.info/docs/LefrancoisZimmermannBakerally-ESWC2017-Generate.pdf) for the specification of the first version.
- **year**: 2016
- **syntax**: See the [JavaCC grammar](https://github.com/sparql-generate/sparql-generate/blob/master/sparql-generate-jena/src/main/javacc/spargl.jj), the [extension of YASQE](https://github.com/sparql-generate/sparql-generate-editor/blob/gh-pages/lib/grammar/sparql11-grammar.pl), and the extension of the [Sublime Linked Data package](https://github.com/sparql-generate/sublime-editor/blob/master/src/syntax/sparql-generate.sublime-syntax-source)
- **description**: SPARQL-Generate is an expressive template-based language to generate RDF streams or text streams from RDF datasets and document streams in arbitrary formats
- **data source limitations**: no
- **features**:
  - collections/lists [(try out)](https://ci.mines-stetienne.fr/sparql-generate/playground.html#ex=example/generate/08-Lists)
  - querying named graphs: yes, as it is an extension of SPARQL
  - blank nodes: yes, as it is an extension of SPARQL
  - data transformations/functions (the standard SPARQL ones, plus the [SPARQL Functions from O. Corby and C. Faron Zucker](http://ns.inria.fr/sparql-extension/#function))
  - joining data sources:
    - By ID, potentially with any data transformation
    - By structure (e.g., a JSON object’s children are mapped as objects of a relation, or even subjects or predicates of relations)
    - [try out the example of a query that combines JSON weather station reports with lists of events described in XML documents and obtained from an external web service](https://ci.mines-stetienne.fr/sparql-generate/playground.html#ex=example/generate/06-DifferentSources)
  - generating RDF streams
  - generating output as HDT
  - IRIs or literals can be generated from bits of different data sources
  - binary data
- **website**: https://ci.mines-stetienne.fr/sparql-generate/
- **test-cases**: https://ci.mines-stetienne.fr/sparql-generate/playground.html
- **contact point**: Maxime Lefrançois, EMSE Saint-Étienne

## Mapping Language 3:
- **Name**: ShExML
- **specification**: http://shexml.herminiogarcia.com/spec/
- **year**: 2019
- **syntax**: Based on ShEx syntax but with its own grammar ([ANTLR4 grammar](https://github.com/herminiogg/ShExML/blob/master/src/main/java/es/weso/antlr/ShExMLParser.g4))
- **description**: ShExML is a language based on ShEx to map and merge heterogeneous data sources. It is designed with usability in mind, aiming to make script creation easier for users.
- **data source limitations**: no
- **features (select from the list)**:
  - collections/lists → no
  - named graphs → no (future extension)
  - blank nodes → yes
  - data transformations/functions → limited ones (see [String operators](http://shexml.herminiogarcia.com/spec/#string-operation-over-iterators) and [Matchers](http://shexml.herminiogarcia.com/spec/#string-operation-over-iterators)); planning to add an extensible function library
  - joining data sources:
    - By ID: [JOIN keyword](http://shexml.herminiogarcia.com/spec/#join-over-iterators)
    - By structure: [Shape linking](http://shexml.herminiogarcia.com/spec/#linking-shapes)
- **website**: http://shexml.herminiogarcia.com/
- **test-cases**: https://github.com/herminiogg/ShExML/tree/master/src/test/scala-2.12/es/weso/shexml
- **contact point**: Herminio García González (garciaherminio@uniovi.es)

--------------------------------------------------------------------------------
/tools.md:
--------------------------------------------------------------------------------
# Knowledge Graph Construction Tools
Descriptions of the tools for knowledge graph construction

## Tool X (TEMPLATE):
- **Name of the tool**:
- **Description**:
- **Repository (link to the tool’s repository)**:
- **Website (if different from the repository)**:
- **Open source? (If not open sourced, ideally provide an option to test it)**:
- **Year introduced**:
- **Contact person (who is the main contact person?)**:
- **Purpose (what can one do with the tool?)**: Select one of these options: processor (executes rules to generate a knowledge graph), editor (automatic or manual generation of mapping rules), other (e.g., pre-processing)
- **Mapping language**: (which mapping language(s) are supported by the tool)
- **Supported data (formats, sizes)**:
- **Programming language**:
- **Special features**:
- **DOI**:
- **License**:
- **Test cases**: (if any for the supported languages)
- **Related use cases**: (specify use cases shared with the community group (if any) where the tool is used)
- **Related projects**: (specify projects (if any) where the tool is used, ideally providing links to the project descriptions)


## Tool 1:
- **Name of the tool**: Morph-CSV
- **Description**: Morph-CSV is an open-source tool for querying tabular data sources using SPARQL. It exploits information from the query, RML+FnO mappings, and CSVW metadata to enhance the performance and completeness of traditional OBDA systems (SPARQL-to-SQL translators). At the moment it can be embedded on top of any R2RML-compliant system.
- **Repository**: https://github.com/oeg-upm/morph-csv
- **Website**: https://morph.oeg.fi.upm.es/tool/morph-csv
- **Open source**: Yes
- **Year introduced**: 2019
- **Contact person**: David Chaves (dchaves@fi.upm.es)
- **Purpose**: Other (enhance SPARQL-to-SQL engines when data is in CSV)
- **Mapping language**: YARRRML+FnO and CSVW
- **Supported data**: CSV (tested with BSBM 360K products)
- **Programming language**: Python + Bash
- **DOI**: https://doi.org/10.5281/zenodo.3731941
- **License**: Apache-2.0
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/oeg-publictransport.md
- **Related projects**: http://sprint-transport.eu/

## Tool 2:
- **Name of the tool**: RMLEditor
- **Description**: The RMLEditor offers a graphical user interface to create rules that generate knowledge graphs from heterogeneous data sources.
- **Repository (link to the tool’s repository)**: https://github.com/RMLio/rmleditor-ce
- **Website (if different from the repository)**: https://app.rml.io/rmleditor/
- **Open source? (If not open sourced, ideally provide an option to test it)**: No
- **Year introduced**: 2016
- **Contact person (who is the main contact person?)**: Pieter Heyvaert (pieter.heyvaert@ugent.be)
- **Purpose (what can one do with the tool?)**: editor
- **Mapping language**: [R2]RML
- **Supported data (formats, sizes)**: CSV, JSON, XML
- **Programming language**: HTML/CSS/JS
- **Special features**: Uses LOV to find relevant classes and properties. Uses MapVOWL to visualize rules.
- **DOI**: N/A
- **License**: Free community edition with limitations and paid edition without limitations.
- **Test cases**: None
- **Related use cases**: None
- **Related projects**: [DyVerSIFy](https://www.imec-int.com/en/what-we-offer/research-portfolio/dyversify), [MOS2S](https://www.mos2s.eu/), [COMBUST](https://www.imec-int.com/en/what-we-offer/research-portfolio/combust)

## Tool 3:
- **Name of the tool**: RMLMapper
- **Description**: The RMLMapper executes RML rules to generate high-quality Linked Data from multiple, originally (semi-)structured data sources
- **Repository (link to the tool’s repository)**: https://github.com/RMLio/rmlmapper-java
- **Website (if different from the repository)**: https://rml.io
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2014
- **Contact person (who is the main contact person?)**: Ben De Meester (ben.demeester@ugent.be)
- **Purpose (what can one do with the tool?)**: Processor
- **Mapping language**: RML, R2RML
- **Supported data (formats, sizes)**: local and remote files (CSV using ql:CSV or CSVW, JSON using JSONPath, XML using XPath), databases (MySQL, PostgreSQL, SQLServer, OracleDB). The mapper works in-memory, so the query result size should be less than the machine's memory
- **Programming language**: Java
- **Special features**: Extensible in terms of supported data formats; configurable and extensible data transformations using https://FnO.io; inter-datasource joins. Reference implementation of RML.
- **DOI**: N/A
- **License**: MIT
- **Test cases**: https://rml.io/test-cases/
- **Related use cases**: betweenourworlds-anime, idlab-covid19, idlab-dbpedia, idlab-facebook, idlab-twitter, idlab-velopark
- **Related projects**: [EcoDaLo], [ESSENCE], [DAIQUIRI], [DiSSeCt]

[EcoDaLo]: https://www.imec-int.com/en/what-we-offer/research-portfolio/ecodalo
[ESSENCE]: https://www.imec-int.com/en/what-we-offer/research-portfolio/essence
[DAIQUIRI]: https://www.imec-int.com/en/what-we-offer/research-portfolio/daiquiri
[DiSSeCt]: https://dissectsite.wordpress.com/

## Tool 4:
- **Name of the tool**: Morph-RDB
- **Description**: Morph-RDB (formerly called ODEMapster) is an RDB2RDF engine developed by the Ontology Engineering Group that follows the R2RML specification (http://www.w3.org/TR/r2rml/). This engine supports two operational modes: data upgrade (generating RDF instances from data in a relational database) and query translation (SPARQL to SQL). Morph-RDB employs various optimisation techniques in order to generate efficient SQL queries, such as self-join elimination and subquery elimination.
- **Repository**: https://github.com/oeg-upm/morph-rdb
- **Website**: https://morph.oeg.fi.upm.es/tool/morph-rdb
- **Open source**: Yes
- **Year introduced**: 2014
- **Contact person**: David Chaves (dchaves@fi.upm.es)
- **Purpose**: Processor
- **Mapping language**: R2RML
- **Supported data**: SQL (tested with MySQL and PostgreSQL)
- **Programming language**: Scala + Java
- **DOI**: N/A
- **License**: Apache-2.0
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/oeg-publictransport.md
- **Related projects**: http://sprint-transport.eu/, https://www.mobile-age.eu/, https://bimerr.eu/

## Tool 5:
- **Name of the tool**: Morph-GraphQL
- **Description**: Morph-GraphQL is an open-source system for generating GraphQL servers automatically from declarative mappings such as R2RML or RML. Currently, Morph-GraphQL is able to generate GraphQL servers in JavaScript over SQL databases. Current experimental features include the generation of GraphQL servers in other languages (e.g., Java) and for other data models (e.g., MongoDB)
- **Repository**: https://github.com/oeg-upm/morph-graphql
- **Website**: https://morph.oeg.fi.upm.es/tool/morph-graphql
- **Open source**: Yes
- **Year introduced**: 2019
- **Contact person**: David Chaves (dchaves@fi.upm.es)
- **Purpose**: Processor
- **Mapping language**: R2RML and RML
- **Supported data**: SQL (tested with H2) and NoSQL (experimental, tested with MongoDB)
- **Programming language**: JavaScript/Node.js
- **DOI**: N/A
- **License**: Apache-2.0
- **Related use cases**: N/A
- **Related projects**: N/A

## Tool 6:
- **Name of the tool**: Mapeathor
- **Description**: Mapeathor is a simple spreadsheet parser able to generate mapping rules in three mapping languages: R2RML, RML (with the extension for FnO functions), and YARRRML. It takes the mapping rules expressed in a spreadsheet (designed to facilitate the mapping rule writing process) and transforms them into the desired language.
- **Repository (link to the tool’s repository)**: https://github.com/oeg-upm/Mapeathor
- **Website (if different from the repository)**: https://morph.oeg.fi.upm.es/tool/mapeathor
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2019
- **Contact person (who is the main contact person?)**: Ana Iglesias (ana.iglesiasm@upm.es)
- **Purpose (what can one do with the tool?)**: Editor
- **Mapping language**: R2RML, RML, YARRRML
- **Supported data (formats, sizes)**: Excel
- **Programming language**: Python
- **License**: Apache-2.0
- **Test cases**: None
- **Related use cases**: None
- **Related projects**: [Ciudades Abiertas](http://www.ciudadesabiertas.es/)

## Tool 7:
- **Name of the tool**: SDM-RDFizer
- **Description**: SDM-RDFizer is an interpreter of mapping rules that allows the transformation of (un)structured data into RDF knowledge graphs.
- **Repository (link to the tool’s repository)**: https://github.com/SDM-TIB/SDM-RDFizer
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2017
- **Contact person (who is the main contact person?)**: Enrique Iglesias (s6enigle@uni-bonn.de)
- **Purpose (what can one do with the tool?)**: Transformation of (un)structured data into RDF knowledge graphs by an efficient execution of RML triple maps.
- **Mapping language**: RML (current version)
- **Supported data (formats, sizes)**: CSV, JSON, RDB, XML
- **Programming language**: Python
- **Special features**: The SDM-RDFizer implements optimized data structures and relational algebra operators that enable an efficient execution of RML triple maps even in the presence of big data. SDM-RDFizer is also extensible in terms of supported data formats, with configurable and extensible data processing functions using https://FnO.io
- **DOI**: https://doi.org/10.5281/zenodo.3872103
- **License**: Apache-2.0
- **Test cases**: https://rml.io/test-cases/
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/sdm-genomics.md
- **Related projects**: [iASiS](http://project-iasis.eu/), [BigMedilytics - lung cancer pilot](https://www.bigmedilytics.eu/), [CLARIFY](https://www.clarify2020.eu/), [P4-LUCAT](https://www.tib.eu/de/forschung-entwicklung/projektuebersicht/projektsteckbrief/p4-lucat), [ImProVIT](https://www.tib.eu/de/forschung-entwicklung/projektuebersicht/projektsteckbrief/improvit), [PLATOON](https://platoon-project.eu/)

## Tool 8:
- **Name of the tool**: RMLStreamer
- **Description**: The RMLStreamer executes RML rules to generate high-quality Linked Data from multiple, originally (semi-)structured data sources in a streaming way.
- **Repository (link to the tool’s repository)**: https://github.com/RMLio/RMLStreamer
- **Website (if different from the repository)**: https://rml.io
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2019
- **Contact person (who is the main contact person?)**: Gerald Haesendonck (gerald.haesendonck@ugent.be)
- **Purpose (what can one do with the tool?)**: Processor
- **Mapping language**: RML
- **Supported data (formats, sizes)**: formats: CSV, XML, JSON; media: files, TCP sockets, Kafka topics
- **Programming language**: Scala
- **Special features**: Extensible in terms of supported data formats and media, optimised for processing big data sets and continuous data streams, designed to run on a cluster.
- **DOI**: https://doi.org/10.5281/zenodo.3887065
- **License**: MIT
- **Test cases**: https://rml.io/test-cases/
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/idlab-twitter.md
- **Related projects**: [MOS2S], [DyVerSIFy], [ESSENCE], [DAIQUIRI]

[MOS2S]: https://www.mos2s.eu/
[DyVerSIFy]: https://www.imec-int.com/en/what-we-offer/research-portfolio/dyversify


## Tool 9:
- **Name of the tool**: SPARQL micro-services
- **Description**:
The SPARQL Micro-Service architecture is meant to unlock data silos hidden behind proprietary Web APIs by equipping them with a lightweight SPARQL endpoint. The whole idea is about bringing Web APIs into the Web of Data and making it possible to integrate Linked Data and Web APIs within a simple federated SPARQL query (a minimal query sketch is shown after this entry).

A SPARQL micro-service encapsulates a Web API and typically yields a **small, resource-centric graph** generated dynamically. It can be seen as a **configurable** SPARQL endpoint in that it expects parameters, e.g. a SPARQL micro-service to find photos from Snapchat may expect tags.

An interesting use of SPARQL micro-services is to **assign dereferenceable URIs to Web API resources** that do not have URIs in the first place. For instance, https://sparql-micro-services.org/ld/flickr/photo/31173091626 is the dereferenceable URI of a photo in Flickr. The content is generated dynamically based on the photo identifier.
- **Repository (link to the tool’s repository)**: https://github.com/frmichel/sparql-micro-service
- **Website (if different from the repository)**: example SPARQL micro-services: https://sparql-micro-services.org/
- **Open source? (If not open sourced, ideally provide an option to test it)**: yes
- **Year introduced**: 2018
- **Contact person (who is the main contact person?)**: Franck Michel (franck.michel@cnrs.fr)
- **Purpose (what can one do with the tool?)**: processor, other
- **Mapping language**: SPARQL CONSTRUCT
- **Supported data (formats, sizes)**: mainly JSON-based Web APIs; XML-based Web APIs can be adapted too
- **Programming language**: PHP
- **Special features**:
  - Docker deployment ready
  - Assigns dereferenceable URIs to Web API resources (bridges Web APIs and LOD)
  - Provides provenance information as part of the generated graph
  - Simple configuration with a config.ini file, or with a rich SPARQL Service Description and SHACL shapes graph
  - Dynamic generation of HTML documentation + test interface from the SPARQL micro-service Service Description (see [example](https://sparql-micro-services.org/service/flickr/getPhotosByTags_sd/))
  - Automatic markup of the HTML documentation as a schema.org Dataset to allow web-scale discoverability of SPARQL micro-services, e.g. with Google Dataset Search
- **DOI**: n/a
- **License**: Apache 2.0
- **Test cases**: n/a
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/inria-kg-vs-webapis.md
- **Related projects**: [Taxref-Web](https://taxref.mnhn.fr/taxref-web/) (private access only, comparison of 20+ Web APIs in the biodiversity domain). Multiple hands-on sessions experimented successfully with various Web APIs: Flickr, YouTube, Twitter, Spotify, Deezer, MusicBrainz...
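To illustrate the integration pattern described above, the sketch below shows how a client could invoke a SPARQL micro-service from a regular federated SPARQL query using the standard `SERVICE` clause. The endpoint URL, its `tags` query-string parameter, and the schema.org terms used in the graph pattern are illustrative assumptions, not part of the tool's documented API; see the repository for the actual service URLs and argument-passing conventions.

```sparql
# Minimal sketch (hypothetical endpoint): ask a SPARQL micro-service wrapping a
# photo-sharing Web API for photos tagged "rose". The SERVICE clause delegates
# the graph pattern to the micro-service, which calls the Web API on the fly and
# answers the pattern against the small, resource-centric RDF graph it generates.
PREFIX schema: <http://schema.org/>

SELECT ?photo ?title ?img WHERE {
  SERVICE <https://example.org/sparql-ms/flickr/getPhotosByTags?tags=rose> {
    ?photo schema:name  ?title ;
           schema:image ?img .
  }
}
```

Depending on how a service is configured, arguments may also be conveyed within the graph pattern itself rather than in the URL, and the results can be joined with any other Linked Data source in the same query.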

## Tool 10:
- **Name of the tool**: WordLift Plugin
- **Description**:

WordLift is a WordPress plugin that puts state-of-the-art semantic technologies in the hands of any blogger and publisher. Without requiring any technical skills, it helps produce richer content and organize it by suggesting facts and information that provide readers with meaningful context, and by adding semantic markup to the text to help machines fully understand the website.

Features:

* Text Analysis: WordLift analyzes content and identifies matching entities organized in 4 categories: Who, What, When and Where.
* Tag Content: Editors can accept the suggested entities to add contextual info for the user, efficiently selecting internal links to existing content.
* Create New Entities: Editors can create new entities, providing additional context and enriching the website's Knowledge Graph. WordLift will learn them, and next time they will be detected.
* Edit Entities: Editors can edit all entities to customize the Knowledge Graph around the websites' audiences and build new relationships.
* Images: WordLift suggests open-license images and media from its own library, saving the time usually spent searching for visuals.
* Geomaps: Locations in articles can quickly be mapped by adding the Geomap widget.
* Timelines: Events can be displayed chronologically by adding the Timeline widget.
* Chords: Visualize what relates to what in every article by adding the Chord widget.
* Navigator: Recommend relevant articles by adding the Navigator widget.
* Faceted Search: Suggest additional content related to the topics found in your article, letting readers dive into your archive with the Faceted Search widget.
* Meaningful Navigation: WordLift automatically identifies topics in articles, using Wikipedia’s classification system. This makes it possible to create new entry points for content based on topics, events, people and places.
* Publish Search Data: WordLift automatically adds schema.org markup to articles, allowing search engines to properly index and display content, and intelligent agents such as Siri and Alexa to access it.
* Publish Linked Data: WordLift publishes the content’s metadata.

- **Repository**: https://github.com/insideout10/wordlift-plugin
- **Website**: https://wordlift.io
- **Open source?**: yes
- **Year introduced**: 2017
- **Contact person**: David Riccitelli (david@wordlift.io)
- **Purpose**: editor, other
- **Mapping language**: n/a
- **Supported data**: WordPress, unstructured HTML
- **Programming language**: PHP, Java, Python
- **Special features**: [Knowledge Graph](https://wordlift.io/blog/en/entity/knowledge-graph/), [Linked Data](https://wordlift.io/blog/en/entity/linked-data/), NLP, SPARQL, GraphQL, Persistent URIs
- **DOI**: n/a
- **License**: GPL
- **Test cases**: n/a
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/wordlift-salzburgerland.md
- **Related projects**: [WordLift NG]

[WordLift NG]: https://wordlift.io/blog/en/wordlift-next-generation-receives-grant-from-eu/


## Tool 11:
- **Name of the tool**: RocketRML
- **Description**: An efficient RML mapper implementation in JavaScript for the RDF Mapping Language (RML).
- **Repository (link to the tool’s repository)**: https://github.com/semantifyit/RocketRML
- **Website (if different from the repository)**: https://semantifyit.github.io/RocketRML/
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2019
- **Contact person (who is the main contact person?)**: Umutcan Simsek (umutcan.simsek@sti2.at)
- **Purpose (what can one do with the tool?)**: Processor
- **Mapping language**: RML (in Turtle and YARRRML syntax)
- **Supported data (formats, sizes)**: CSV, JSON, XML. Tested with 500k triples (takes ~20s)
- **Programming language**: JavaScript (Node.js)
- **Special features**: It efficiently maps hierarchical sources by using caching mechanisms for iterators and JOIN results. Available as a tool with a CLI and as an NPM package. A Dockerfile is also provided. Please see the GitHub repository.
- **DOI**: n/a
- **License**: CC-BY-SA-4.0
- **Test cases**: n/a
- **Related use cases**: TBD
- **Related projects**: [semantify.it], [MindLab]

[semantify.it]: https://semantify.it
[MindLab]: https://mindlab.ai


## Tool 12:
- **Name of the tool**: SPARQL-Generate
- **Description**: SPARQL-Generate is an expressive template-based language to generate RDF streams or text streams from RDF datasets and document streams in arbitrary formats
- **Repository (link to the tool’s repository)**: https://github.com/sparql-generate/sparql-generate
- **Website (if different from the repository)**: https://ci.mines-stetienne.fr/sparql-generate/
- **Open source? (If not open sourced, ideally provide an option to test it)**: yes
- **Year introduced**: 2016
- **Contact person (who is the main contact person?)**: Maxime Lefrançois, MINES Saint-Étienne
- **Purpose (what can one do with the tool?)**: Processor
- **Mapping language**: SPARQL-Generate
- **Supported data (formats, sizes)**: RDF, SQL, XML, JSON, CSV, GeoJSON, HTML, CBOR, plain text with regular expressions, large CSV documents (unofficially: generation of 17.5 M triples as HDT in < 9'20''), MQTT or WebSocket streams, repeated HTTP GET operations.
- **Programming language**: Java
- **Special features**:
  - usable [on the web playground](https://ci.mines-stetienne.fr/sparql-generate/playground.html), [inside Sublime Text](https://ci.mines-stetienne.fr/sparql-generate/sublime.html), [as an executable JAR](https://ci.mines-stetienne.fr/sparql-generate/language-cli.html), [as an open source Java library](https://ci.mines-stetienne.fr/sparql-generate/get-started.html);
  - can also generate text streams from RDF datasets and document streams in arbitrary formats (implements something like [STTL](https://ns.inria.fr/sparql-template/)).
- **DOI**: https://doi.org/10.5281/zenodo.3965916
- **License**: Apache 2
- **Test cases**: see https://ci.mines-stetienne.fr/sparql-generate/playground.html
- **Related projects**: [ITEA2 12004 SEAS (Smart Energy Aware Systems)], [ANR 14-CE24-0029 OpenSensingCity], [ETSI STF 578], bilateral research contracts with ENGIE R&D CRIGEN, [ANR 19-CE23-0012 CoSWoT], [ANR HyperAgents].

[ITEA2 12004 SEAS (Smart Energy Aware Systems)]: https://itea3.org/project/seas.html
[ANR 14-CE24-0029 OpenSensingCity]: https://anr.fr/Project-ANR-14-CE24-0029
[ETSI STF 578]: https://portal.etsi.org/STF/STFs/STF-HomePages/STF578
[ANR 19-CE23-0012 CoSWoT]: https://anr.fr/Project-ANR-19-CE23-0012
[ANR HyperAgents]: http://hyperagents.gitlab.emse.fr/

## Tool 13:
- **Name of the tool**: ShExML
- **Description**: ShExML engine and web app
- **Repository**: https://github.com/herminiogg/ShExML
- **Website (if different from the repository)**: http://shexml.herminiogarcia.com
- **Open source?**: yes
- **Year introduced**: 2019
- **Contact person**: Herminio García González (garciaherminio@uniovi.es)
- **Purpose**: Processor (executes rules to generate a knowledge graph), editor (automatic or manual generation of mapping rules), translator (converts ShExML rules to RML rules)
- **Mapping language**: ShExML
- **Supported data**: XML, JSON, CSV.
- **Programming language**: Scala
- **Special features**: N/A
- **DOI**: N/A
- **License**: Not yet decided
- **Test cases**: https://github.com/herminiogg/ShExML/tree/master/src/test/scala-2.12/es/weso/shexml (sbt test and [CI](https://travis-ci.org/github/herminiogg/ShExML))
- **Related use cases**: [Asturian Notaries Manuscripts](https://github.com/kg-construct/use-cases/blob/master/uniovi-notaries.md)
- **Related projects**: N/A

## Tool 14:
- **Name of the tool**: CARML
- **Description**: An extensible RML mapping engine with built-in support for JSON, CSV, and XML
- **Repository (link to the tool’s repository)**: https://github.com/carml/carml
- **Website (if different from the repository)**:
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2017
- **Contact person (who is the main contact person?)**: Pano Maria (pano@skemu.com)
- **Purpose (what can one do with the tool?)**: Processor
- **Mapping language**: RML
- **Supported data (formats, sizes)**: CSV, JSON, XML
- **Programming language**: Java
- **Special features**: Easily extensible for other formats. InputStream extension for easy programmatic binding of sources. XML document extension to be able to use namespace prefix mappings in XPath expressions. Support for FnO functions.
- **DOI**: n/a
- **License**: MIT
- **Test cases**: https://rml.io/test-cases/
- **Related use cases**: [Kadaster Data Platform](https://github.com/kg-construct/use-cases/blob/master/kadaster-ld.md)
- **Related projects**: [Kadaster Data Platform (PDOK)](https://www.pdok.nl/), [Zazuko XRM](https://zazuko.com/products/expressive-rdf-mapper/), [DotWebStack Framework](https://github.com/dotwebstack/dotwebstack-framework/)

## Tool 15:
- **Name of the tool**: Helio
- **Description**: Helio is a framework that allows publishing RDF data from different heterogeneous sources as Linked Data
- **Repository (link to the tool’s repository)**: https://github.com/oeg-upm/Helio
- **Website (if different from the repository)**: https://oeg-upm.github.io/Helio/
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2018
- **Contact person (who is the main contact person?)**: Andrea Cimmino (cimmino@fi.upm.es)
- **Purpose (what can one do with the tool?)**: Processor (executes rules to generate a knowledge graph), publish a knowledge graph.
- **Mapping language**: RML, WoT-Mapping, and Helio mapping
- **Supported data (formats, sizes)**: CSV, XML, HTML, text, JSON, RDF
- **Programming language**: Java
- **Special features**: relies on a plugin system that does not require developers to download the core code, customizable HTML views, can integrate existing tools that generate RDF.
- **DOI**:
- **License**: Apache-2.0
- **Test cases**: (if any for the supported languages)
- **Related use cases**: -
- **Related projects**: [VICINITY H2020](https://www.vicinity2020.eu/vicinity/), [DELTA H2020](https://www.delta-h2020.eu/), [BIMERR H2020](https://bimerr.eu/about/)

## Tool 16:
- **Name of the tool**: FunMap
- **Description**: FunMap is an interpreter of RML+FnO that converts a data integration system defined using RML+FnO into an equivalent data integration system where RML mappings are function-free.
- **Repository (link to the tool’s repository)**: https://github.com/SDM-TIB/FunMap
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2020
- **Contact person (who is the main contact person?)**: Samaneh Jozashoori (samaneh.jozashoori@tib.eu)
- **Purpose (what can one do with the tool?)**: It can be applied when a pre-processing step is expressed within the mapping rules as functions, i.e., when data pre-processing is supposed to be performed at the time of data model transformation (into RDF) and knowledge graph creation.
- **Mapping language**: RML (current version)
- **Supported data (formats, sizes)**: CSV, RDB
- **Programming language**: Python
- **Special features**: FunMap empowers the knowledge graph creation process with optimization techniques to reduce execution time.
- **DOI**: https://doi.org/10.5281/zenodo.3993657
- **License**: Apache-2.0
- **Test cases**: -
- **Related use cases**: -
- **Related projects**: [CLARIFY](https://www.clarify2020.eu/), [P4-LUCAT](https://www.tib.eu/de/forschung-entwicklung/projektuebersicht/projektsteckbrief/p4-lucat), [Ciudades Abiertas](https://ciudades-abiertas.es/)

## Tool 17:
- **Name of the tool**: Squerall
- **Description**: An implementation of the so-called _Semantic Data Lake_: a query engine _uniformly_ accessing original, large, and heterogeneous data sources using Semantic Web principles and technologies
- **Repository (link to the tool’s repository)**: https://github.com/EIS-Bonn/Squerall
- **Website (if different from the repository)**: https://eis-bonn.github.io/Squerall/
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2017
- **Contact person (who is the main contact person?)**: Mohamed Nadjib Mami (mohamed.nadjib.mami@gmail.com)
- **Purpose (what can one do with the tool?)**: Processor (executes rules to generate a knowledge graph). Squerall is a virtual OBDA (Ontology-Based Data Access) engine, where the knowledge graph is only constructed _on-the-fly_ at query time. However, with a slight development effort, it would be possible to physically materialize the knowledge graph (in RDF) following a property-table-partitioning-like scheme.
- **Mapping language**: RML
- **Supported data (formats, sizes)**: CSV, Parquet, MongoDB, Cassandra, JDBC (MySQL, SQL Server, etc.), (beta) Elasticsearch. Squerall can be [extended](https://github.com/EIS-Bonn/Squerall#extensibility) to support other sources
- **Programming language**: Scala, Java
- **Special features**: Use SPARQL to query popular distributed data sources (e.g., files in Hadoop, NoSQL stores) _on-the-fly_, i.e., without requiring pre-processing or ingestion. Disparate data may be made joinable by declaratively altering some of its attributes thanks to the use of the FnO ontology. State-of-the-art Big Data query engines are used for the querying, namely Apache Spark and Presto. Squerall can programmatically be [extended](https://github.com/EIS-Bonn/Squerall#extensibility) to use other query engines (e.g., Drill or Dremio)
- **DOI**: https://zenodo.org/record/2636436#.X3tOY_kzZPY
- **License**: Apache-2.0

## Tool 18:
- **Name of the tool**: Chimera
- **Description**: Chimera is a tool to build conversion pipelines leveraging Semantic Web technologies. It is built on top of Apache Camel to easily configure message-to-message mediators or batch converters using lifting/lowering procedures to/from a reference ontology. In principle, the aim is to completely avoid coding by just configuring a pipeline using the various blocks provided.
- **Repository (link to the tool’s repository)**: [https://github.com/cefriel/chimera](https://github.com/cefriel/chimera)
- **Open source?**: Yes
- **Year introduced**: 2019
- **Contact person (who is the main contact person?)**: Mario Scrocca ([mario.scrocca@cefriel.com](mailto:mario.scrocca@cefriel.com))
- **Purpose (what can one do with the tool?)**: A basic Chimera pipeline involves a lifting Processor (a fork of the RMLMapper) and a lowering Processor ([rdf-lowerer](https://github.com/cefriel/rdf-lowerer), built on Apache Velocity). Additional blocks, e.g., for pre-processing/enrichment of the knowledge graph, can be integrated into the pipeline.
- **Mapping language**: RML for lifting, _extended_ VTL (Velocity Template Language) for lowering
- **Supported data (formats, sizes)**: CSV, JSON, XML
- **Programming language**: Java
- **Special features**: High configurability of pipelines to satisfy different data integration requirements using Semantic Web technologies. Easy to integrate with existing data sources and sinks thanks to Apache Camel components.
- **License**: Apache-2.0
- **Related use cases**: https://github.com/kg-construct/use-cases/blob/master/oeg-publictransport.md
- **Related projects**: http://sprint-transport.eu/



## Tool 19:
- **Name of the tool**: Ontario
- **Description**: A federated query processing engine that is able to access heterogeneous data sources in a Semantic Data Lake. Ontario leverages the concept of RDF Molecule Templates to effectively and efficiently decompose, plan, and execute SPARQL queries over a federation of data sources. The given SPARQL queries are transformed into the query languages of the data sources in a Semantic Data Lake using mapping rules expressed in the RML language.
- **Repository (link to the tool’s repository)**: https://github.com/SDM-TIB/Ontario
- **Website (if different from the repository)**: https://labs.tib.eu/info/projekt/ontario/
- **Open source? (If not open sourced, ideally provide an option to test it)**: Yes
- **Year introduced**: 2017
- **Contact person (who is the main contact person?)**: Kemele M. Endris (kemele.endris@gmail.com)
- **Purpose (what can one do with the tool?)**: Processor. Ontario is able to answer SPARQL SELECT queries over heterogeneous data sources: CSV, JSON, XML, RDBMS, Neo4j, MongoDB, RDF. Non-RDF data is transformed on-the-fly at query time. Ontario also supports SPARQL CONSTRUCT queries to transform data from a Semantic Data Lake into RDF.
- **Mapping language**: RML
- **Supported data (formats, sizes)**: CSV, Parquet, MongoDB, JDBC (MySQL, Postgres), Neo4j, RDF
- **Programming language**: Python
- **DOI**: http://doi.org/10.1007/978-3-030-27615-7_29
- **License**: GNU/GPL v2

## Tool 20:
- **Name of the tool**: Gra.fo
- **Description**: A visual, collaborative, and real-time knowledge graph schema and mapping tool
- **Repository (link to the tool’s repository)**: N/A
- **Website (if different from the repository)**: https://gra.fo/
- **Open source? (If not open sourced, ideally provide an option to test it)**: No. https://gra.fo/
- **Year introduced**: 2019
- **Contact person (who is the main contact person?)**: Juan Sequeda (juan@data.world)
- **Purpose (what can one do with the tool?)**: Editor. Gra.fo, in conjunction with data.world, provides virtualization
- **Mapping language**: R2RML
- **Supported data (formats, sizes)**: Any relational database and CSV/XLS connected in data.world
- **Programming language**: Commercial tool
- **Special features**: Visual (drag and drop), Collaborative (share documents with different permissions, history, comments), Real-Time (multiple users collaborating at the same time)
- **DOI**: N/A
- **License**: https://gra.fo/terms-and-conditions/
--------------------------------------------------------------------------------