└── README.md

/README.md:
--------------------------------------------------------------------------------
1 | # HexTuples
2 |
3 | _Status: draft_
4 |
5 | _Version: 0.3.0_
6 |
7 | HexTuples is a simple datamodel for dealing with linked data.
8 | This document describes both the model and concepts of HexTuples and its (currently only) serialization format: HexTuples-NDJSON.
9 | HexTuples-NDJSON is very **easy to parse**, can be used for **streaming parsing** and is designed to be **highly performant** in JS contexts.
10 |
11 | ## Concepts
12 |
13 | ### HexTuple
14 |
15 | A single _HexTuple_ is an atomic piece of data, similar to an [RDF Triple](https://www.w3.org/TR/rdf-concepts/#section-triples) (also known as Statements or Quads).
16 | A HexTuple contains a small piece of information.
17 | HexTuples consist of six fields: `subject`, `predicate`, `value`, `datatype`, `language` and `graph`.
18 |
19 | Let's encode the following sentence in HexTuples:
20 |
21 | _Tim Berners-Lee, the director of W3C, was born in London on the 8th of June, 1955._
22 |
23 | | Subject | Predicate | Value | DataType | Language | Graph |
24 | |---------|----------------|------------|-----|-----|----|
25 | | [Tim](https://www.w3.org/People/Berners-Lee/) | [birthPlace](http://schema.org/birthPlace) | [London](http://dbpedia.org/resource/London) | | | |
26 | | [Tim](https://www.w3.org/People/Berners-Lee/) | [birthDate](http://schema.org/birthDate) | 1955-06-08 | [xsd:date](http://www.w3.org/2001/XMLSchema#date) | | |
27 | | [Tim](https://www.w3.org/People/Berners-Lee/) | [jobTitle](http://schema.org/jobTitle) | Director of W3C | [rdf:langString](http://www.w3.org/1999/02/22-rdf-syntax-ns#langString) | en-US | |
28 |
29 | ### URI
30 |
31 | URI stands for [Uniform Resource Identifier, specified in RFC 3986](https://tools.ietf.org/html/rfc3986).
32 | The best known type of URI is the URL.
33 | Although it is currently best practice to use mostly HTTPS URLs as URIs, HexTuples works with any type of URI.
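As a rough illustration of the six fields listed in the HexTuple section, the three table rows can be written as fixed-length arrays. This is a minimal TypeScript sketch, not part of the specification: the `HexTuple` type name and the `tim` and `rows` constants are illustrative assumptions, the `globalId` datatype marker for URI values is the one defined under HexTuples-NDJSON, and empty strings stand for fields without a value.

```ts
// Hypothetical type name: one HexTuple as a fixed-length array of six strings.
type HexTuple = [
  subject: string,
  predicate: string,
  value: string,
  datatype: string,
  language: string,
  graph: string,
];

const tim = "https://www.w3.org/People/Berners-Lee/";

// The three rows of the table in the HexTuple section; "" marks an empty field.
const rows: HexTuple[] = [
  [tim, "http://schema.org/birthPlace", "http://dbpedia.org/resource/London", "globalId", "", ""],
  [tim, "http://schema.org/birthDate", "1955-06-08", "http://www.w3.org/2001/XMLSchema#date", "", ""],
  [tim, "http://schema.org/jobTitle", "Director of W3C", "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString", "en-US", ""],
];
```

Keeping every statement in the same six-slot shape means a consumer can read any field by position, whether the value is a URI or a literal.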
34 |
35 | ### Subject
36 |
37 | - The _subject_ is the identifier of the thing the statement is about.
38 | - This field is required.
39 | - It MUST be a URI.
40 |
41 | ### Predicate
42 |
43 | - The _predicate_ describes the abstract property of the statement.
44 | - This field is required.
45 | - It MUST be a URI.
46 |
47 | ### Value
48 |
49 | - The _value_ contains the object of the HexTuple.
50 | - This field is required.
51 | - It can be of any datatype, as specified in the `datatype` field of the HexTuple.
52 |
53 | ### Datatype
54 |
55 | - The _datatype_ specifies the type of the `value`.
56 | - This field is optional.
57 | - It MUST be a URI or an empty string.
58 | - When the value is a NamedNode, use: `globalId`
59 | - When the value is a BlankNode, use: `localId`
60 |
61 | ### Language
62 |
63 | - The _language_ specifies the language of the `value`.
64 | - This field is optional.
65 | - It MUST be an [RFC 3066 language tag](https://tools.ietf.org/html/rfc3066) or an empty string.
66 |
67 | ## Relation to RDF
68 |
69 | The HexTuples datamodel closely resembles the RDF Data Model, which is the de-facto standard for linked data.
70 | RDF statements are often called Triples, because they consist of a `subject`, `predicate` and `object`.
71 | The `object` field is either a single URI (in Named Nodes), or a combination of three fields (in Literals): `value`, `datatype`, `language`.
72 | This means that a single Triple can actually consist of _five_ fields: the `subject`, `predicate`, `value`, `datatype` and the `language`.
73 | A Quad statement also has a `graph`, which brings the total to six fields, hence the name: HexTuples.
74 | Instead of making a distinction between Literal statements and NamedNode statements (which have two different models), HexTuples uses a single model that describes both.
75 | **Having a single model for all statements (HexTuples) makes it easier to serialize, query and store data.**
76 |
77 | ## HexTuples-NDJSON
78 |
79 | _This document serves as a work in progress / draft specification_
80 |
81 | HexTuples-NDJSON is an [NDJSON](http://ndjson.org/) (Newline Delimited JSON) based HexTuples / RDF serialization format.
82 | It is designed to support streaming parsing and provide great performance in a JS context (i.e. the browser).
83 |
84 | - A valid HexTuples document MUST be serialized using [NDJSON](http://ndjson.org/)
85 | - HexTuples-NDJSON MIME type: `application/hex+x-ndjson; charset=utf-8`
86 | - Each line is a JSON array, which MUST consist of six strings.
87 | - Each array represents one RDF statement / quad / triple
88 | - The six strings in each array respectively represent `subject`, `predicate`, `value`, `datatype`, `lang` and `graph`.
89 | - The `datatype` and `lang` fields are only used when the `value` represents a Literal value (i.e. not a URI, but a string / date / something else). In RDF, the combination of `value`, `datatype` and `lang` is known as the `object`.
90 | - When expressing an Object that is a NamedNode, use this string as the datatype: "globalId" ([discussion](https://github.com/ontola/hextuples/issues/1))
91 | - When expressing an Object that is a BlankNode, use this string as the datatype: "localId"
92 | - If the `graph` is a blank node (i.e. anonymous), use an underscore as the URI scheme: `_:myNode`. ([discussion](https://github.com/ontola/hextuples/issues/2)). Parsers SHOULD interpret these as blank graphs, but MAY discard these if they have no support for them.
93 | - When a field has no value, use an empty string: `""`
94 |
95 | ### Example
96 |
97 | English:
98 |
99 | _Tim Berners-Lee was born in London, on the 8th of June in 1955._
100 |
101 | Turtle / N-Triples:
102 |
103 | ```n-triples
104 | <https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthDate> "1955-06-08"^^<http://www.w3.org/2001/XMLSchema#date>.
105 | <https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthPlace> <http://dbpedia.org/resource/London>.
106 | ```
107 |
108 | Expressed in HexTuples:
109 |
110 | ```ndjson
111 | ["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthDate", "1955-06-08", "http://www.w3.org/2001/XMLSchema#date", "", ""]
112 | ["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthPlace", "http://dbpedia.org/resource/London", "globalId", "", ""]
113 | ```
114 |
115 | ## Implementations
116 |
117 | ### Ontola TypeScript HexTuples Parser
118 |
119 | *
120 |
121 | This TypeScript code should give you some idea of how to write a parser for HexTuples.
122 |
123 | ```ts
124 | const object = (value: string, datatype: string, language: string): SomeTerm => {
125 |   if (language) {
126 |     return literal(value, language); // language-tagged string
127 |   } else if (datatype === 'globalId') {
128 |     return namedNode(value); // the value is a URI
129 |   } else if (datatype === 'localId') {
130 |     return blankNode(value); // the value is a blank node
131 |   }
132 |
133 |   return literal(value, namedNode(datatype)); // typed literal
134 | };
135 |
136 | const lineToQuad = (h: string[]) => quad(
137 |   h[0].startsWith('_:') ? blankNode(h[0]) : namedNode(h[0]),
138 |   namedNode(h[1]),
139 |   object(h[2], h[3], h[4]),
140 |   h[5] ? namedNode(h[5]) : defaultGraph(),
141 | );
142 | ```
143 |
144 | ### Python RDFLib
145 |
146 | *
147 | * RDFLib is a pure Python package for working with RDF.
148 | * It supports parsing and serializing RDF as HexTuples.
149 | * Internally (in Python objects), RDF parsed from HexTuples data is represented in a _Conjunctive Graph_, that is, a multi-graph object.
150 | * HexTuples files must end in the file extension `.hext` for RDFLib to auto-recognise the format, although files with any ending can be used if the format is given (`format=hext`).
151 |
152 | An RDF format conversion tool using RDFLib that can convert from/to HexTuples is online at .
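Independent of the two implementations listed above, the line-by-line structure of HexTuples-NDJSON can be sketched in a few lines of dependency-free TypeScript. This is a minimal sketch under stated assumptions, not code from either library: the names `HexTuple`, `isHexTuple` and `parseHexTuples` are hypothetical, and blank lines are skipped as a convenience that the specification does not mention.

```ts
// Hypothetical names; not taken from any of the libraries above.
type HexTuple = [string, string, string, string, string, string];

// Check that a parsed JSON value is an array of exactly six strings.
const isHexTuple = (x: unknown): x is HexTuple =>
  Array.isArray(x) && x.length === 6 && x.every((f) => typeof f === "string");

// Parse a HexTuples-NDJSON document: one JSON array per non-empty line.
const parseHexTuples = (doc: string): HexTuple[] =>
  doc
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // assumption: blank lines are ignored
    .map((line, i) => {
      const parsed: unknown = JSON.parse(line);
      if (!isHexTuple(parsed)) {
        throw new Error(`Entry ${i + 1} is not an array of six strings`);
      }
      return parsed;
    });

// The two statements from the Example section.
const doc = [
  '["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthDate", "1955-06-08", "http://www.w3.org/2001/XMLSchema#date", "", ""]',
  '["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthPlace", "http://dbpedia.org/resource/London", "globalId", "", ""]',
].join("\n");

const tuples = parseHexTuples(doc);
```

Because every line is an independent JSON value, the same per-line check could be applied to chunks as they arrive from a stream instead of to a complete string.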
153 |
154 | ## Motivation for HexTuples-NDJSON
155 |
156 | HexTuples was designed by [Thom van Kalkeren](https://github.com/fletcher91/) (CTO of Ontola) because he noticed that parsing / serialization was unnecessarily costly in our [full-RDF stack](https://ontola.io/blog/full-stack-linked-data/), even when using the relatively performant `n-quads` format.
157 |
158 | - Since HexTuples is serialized in NDJSON, it benefits from the [highly optimised JSON parsers in browsers](https://v8.dev/blog/cost-of-javascript-2019#json).
159 | - It uses NDJSON instead of regular JSON because it makes it easier to parse **concatenated responses** (multiple root objects in one document).
160 | - NDJSON enables **streaming parsing** as well, which gives it another performance boost.
161 | - Some JS RDF libraries ([link-lib](https://github.com/fletcher91/link-lib/), [link-redux](https://github.com/fletcher91/link-redux/)) have an internal RDF graph model which uses these HexTuples arrays as well, which means that there is minimal mapping cost when parsing HexTuple statements.
162 | This format is especially suitable for real front-end applications that use dynamic RDF data.
163 |
--------------------------------------------------------------------------------