# HexTuples

_Status: draft_

_Version: 0.3.0_

HexTuples is a simple datamodel for dealing with linked data.
This document describes both the model and concepts of HexTuples and its (currently only) serialization format: HexTuples-NDJSON.
It is very **easy to parse**, can be used for **streaming parsing** and is designed to be **highly performant** in JS contexts.

## Concepts

### HexTuple

A single _HexTuple_ is an atomic piece of data, similar to an [RDF Triple](https://www.w3.org/TR/rdf-concepts/#section-triples) (also known as a Statement; a Triple with a graph is called a Quad).
A HexTuple contains a small piece of information.
HexTuples consist of six fields: `subject`, `predicate`, `value`, `datatype`, `language` and `graph`.

Let's encode the following sentence in HexTuples:

_Tim Berners-Lee, the director of W3C, was born in London on the 8th of June, 1955._

| Subject | Predicate | Value | Datatype | Language | Graph |
|---------|-----------|-------|----------|----------|-------|
| [Tim](https://www.w3.org/People/Berners-Lee/) | [birthPlace](http://schema.org/birthPlace) | [London](http://dbpedia.org/resource/London) | `globalId` | | |
| [Tim](https://www.w3.org/People/Berners-Lee/) | [birthDate](http://schema.org/birthDate) | 1955-06-08 | [xsd:date](http://www.w3.org/2001/XMLSchema#date) | | |
| [Tim](https://www.w3.org/People/Berners-Lee/) | [jobTitle](http://schema.org/jobTitle) | Director of W3C | [rdf:langString](http://www.w3.org/1999/02/22-rdf-syntax-ns#langString) | en-US | |

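In TypeScript, a HexTuple can be modeled as a fixed-length tuple of six strings. This is a sketch for illustration, not part of the spec. The `HexTuple` type and the `timBernersLee` constant are hypothetical names. The sketch follows the HexTuples-NDJSON conventions described later in this document: `globalId` marks a URI value, and `""` marks an empty field or the default graph.

```typescript
// A HexTuple: six strings, in the order subject, predicate, value,
// datatype, language, graph. Labeled tuple elements need TypeScript 4.0+.
type HexTuple = [
  subject: string,
  predicate: string,
  value: string,
  datatype: string,
  language: string,
  graph: string,
];

const tim = 'https://www.w3.org/People/Berners-Lee/';

// The three rows of the table above; '' marks an empty field (and the default graph).
const timBernersLee: HexTuple[] = [
  [tim, 'http://schema.org/birthPlace', 'http://dbpedia.org/resource/London', 'globalId', '', ''],
  [tim, 'http://schema.org/birthDate', '1955-06-08', 'http://www.w3.org/2001/XMLSchema#date', '', ''],
  [tim, 'http://schema.org/jobTitle', 'Director of W3C', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#langString', 'en-US', ''],
];
```

Because every statement has the same shape, no per-statement type checks are needed before reading a field by position.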
### URI

URI stands for [Uniform Resource Identifier, specified in RFC 3986](https://tools.ietf.org/html/rfc3986).
The best-known type of URI is the URL.
Although it is currently best practice to use mostly HTTPS URLs as URIs, HexTuples works with any type of URI.

### Subject

- The _subject_ is the identifier of the thing the statement is about.
- This field is required.
- It MUST be a URI.

### Predicate

- The _predicate_ describes the abstract property of the statement.
- This field is required.
- It MUST be a URI.

### Value

- The _value_ contains the object of the HexTuple.
- This field is required.
- It can be of any datatype, as specified in the `datatype` field of the HexTuple.

### Datatype

- The _datatype_ describes the type of the `value` of the HexTuple.
- This field is optional.
- It MUST be a URI, one of the special strings below, or an empty string.
- When the value is a NamedNode, use: `globalId`
- When the value is a BlankNode, use: `localId`

### Language

- The _language_ contains the language tag of the `value`, for language-tagged strings.
- This field is optional.
- It MUST be an [RFC 3066 language tag](https://tools.ietf.org/html/rfc3066) or an empty string.

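The field rules above can be checked mechanically. Below is a minimal sketch that works on plain string arrays. `validateHexTuple`, `isUri` and `isLanguageTag` are hypothetical helpers. The regular expressions only approximate RFC 3986 and RFC 3066: they check the overall shape, not the registries of valid schemes or language subtags.

```typescript
type HexTuple = [string, string, string, string, string, string];

// Rough absolute-URI test: a scheme (RFC 3986, section 3.1) followed by ':'.
// Any scheme is accepted, since HexTuples works with any type of URI.
const isUri = (s: string): boolean => /^[A-Za-z][A-Za-z0-9+.-]*:/.test(s);

// RFC 3066 shape: a 1-8 letter primary subtag, then 1-8 letter/digit subtags.
const isLanguageTag = (s: string): boolean => /^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$/.test(s);

// Hypothetical helper: returns the rules a HexTuple breaks (empty array = valid).
function validateHexTuple(tuple: HexTuple): string[] {
  const [subject, predicate, , datatype, language] = tuple;
  const errors: string[] = [];

  if (!isUri(subject)) errors.push('subject MUST be a URI');
  if (!isUri(predicate)) errors.push('predicate MUST be a URI');
  // The value is required, but '' is still a valid (empty) literal, so it is not checked.
  const markers = ['', 'globalId', 'localId'];
  if (!markers.includes(datatype) && !isUri(datatype)) {
    errors.push('datatype MUST be a URI, globalId, localId or an empty string');
  }
  if (language !== '' && !isLanguageTag(language)) {
    errors.push('language MUST be an RFC 3066 tag or an empty string');
  }
  // The graph field has no rules in this section, so it is not checked here.
  return errors;
}
```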
## Relation to RDF

The HexTuples datamodel closely resembles the RDF Data Model, which is the de facto standard for linked data.
RDF statements are often called Triples, because they consist of a `subject`, `predicate` and `object`.
The `object` is either a single URI (for Named Nodes) or a combination of three fields (for Literals): `value`, `datatype` and `language`.
This means that a single Triple can actually consist of _five_ fields: the `subject`, `predicate`, `value`, `datatype` and `language`.
A Quad statement also has a `graph`, which brings the total to six fields, hence the name: HexTuples.
Instead of distinguishing between Literal statements and NamedNode statements (which have two different models), HexTuples uses a single model that describes both.
**Having a single model for all statements (HexTuples) makes it easier to serialize, query and store data.**

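Here is a minimal sketch of this flattening, for illustration only. The term types are hypothetical and only reuse the RDF/JS `termType` names. NamedNode and BlankNode objects get the `globalId` / `localId` markers from the HexTuples-NDJSON rules. Writing blank nodes as `_:label` in the subject and object positions is an assumption; those rules only state it for graphs.

```typescript
// Minimal, hypothetical term model; the termType names follow RDF/JS.
type NamedNode = { termType: 'NamedNode'; value: string };
type BlankNode = { termType: 'BlankNode'; value: string };
type Literal = { termType: 'Literal'; value: string; datatype: string; language: string };
type DefaultGraph = { termType: 'DefaultGraph'; value: '' };

interface Quad {
  subject: NamedNode | BlankNode;
  predicate: NamedNode;
  object: NamedNode | BlankNode | Literal;
  graph: NamedNode | BlankNode | DefaultGraph;
}

type HexTuple = [string, string, string, string, string, string];

// Assumption: blank nodes are written as "_:label" in every position.
// The default graph's value is '', which is exactly the HexTuples encoding.
const id = (t: NamedNode | BlankNode | DefaultGraph): string =>
  t.termType === 'BlankNode' ? `_:${t.value}` : t.value;

// Splits the RDF object into the value, datatype and language fields.
function objectFields(o: Quad['object']): [string, string, string] {
  if (o.termType === 'NamedNode') return [o.value, 'globalId', ''];
  if (o.termType === 'BlankNode') return [id(o), 'localId', ''];
  return [o.value, o.datatype, o.language];
}

// Flattens a quad into a single six-field HexTuple.
function toHexTuple(q: Quad): HexTuple {
  const [value, datatype, language] = objectFields(q.object);
  return [id(q.subject), q.predicate.value, value, datatype, language, id(q.graph)];
}
```

Literal and non-literal objects end up in the same six slots. A consumer therefore needs only one code path to store or index them.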
## HexTuples-NDJSON

_This document serves as a work in progress / draft specification_

HexTuples-NDJSON is an [NDJSON](http://ndjson.org/) (Newline Delimited JSON) based HexTuples / RDF serialization format.
It is designed to support streaming parsing and provide great performance in a JS context (e.g. the browser).

- A valid HexTuples document MUST be serialized using [NDJSON](http://ndjson.org/), with one JSON array per line.
- HexTuples-NDJSON MIME type: `application/hex+x-ndjson; charset=utf-8`
- Each array MUST consist of six strings.
- Each array represents one RDF statement / quad / triple.
- The six strings in each array respectively represent `subject`, `predicate`, `value`, `datatype`, `lang` and `graph`.
- The `datatype` and `lang` fields describe the `value` when it represents a Literal value (i.e. not a URI, but a string / date / something else); for NamedNode and BlankNode objects, `datatype` holds a marker string instead (see below). In RDF, the combination of `value`, `datatype` and `lang` is known as the `object`.
- When expressing an Object that is a NamedNode, use this string as the datatype: "globalId" ([discussion](https://github.com/ontola/hextuples/issues/1))
- When expressing an Object that is a BlankNode, use this string as the datatype: "localId"
- If the `graph` is a blank node (i.e. anonymous), use an underscore as the URI scheme: `_:myNode` ([discussion](https://github.com/ontola/hextuples/issues/2)). Parsers SHOULD interpret these as blank graphs, but MAY discard them if they have no support for them.
- When a field has no value, use an empty string: `""`

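These rules map directly onto `JSON.stringify` and `JSON.parse`. The sketch below is minimal and not a reference implementation. `serialize`, `parseLine` and `parse` are hypothetical names. The parser checks only the structural rule (six strings per array), not the field contents.

```typescript
type HexTuple = [string, string, string, string, string, string];

// Serializes statements as HexTuples-NDJSON: one JSON array per line.
// JSON.stringify escapes newlines inside values, so each statement stays on one line.
function serialize(tuples: HexTuple[]): string {
  return tuples.map((tuple) => `${JSON.stringify(tuple)}\n`).join('');
}

// Parses one line and enforces "each array MUST consist of six strings".
function parseLine(line: string): HexTuple {
  const parsed: unknown = JSON.parse(line);
  if (!Array.isArray(parsed) || parsed.length !== 6 || !parsed.every((f) => typeof f === 'string')) {
    throw new Error(`Not a HexTuple: ${line}`);
  }
  return parsed as HexTuple;
}

// Parses a complete document; blank lines (such as a trailing newline) are skipped.
function parse(text: string): HexTuple[] {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map(parseLine);
}
```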
### Example

English:

_Tim Berners-Lee was born in London, on the 8th of June in 1955._

Turtle / N-Triples:

```n-triples
<https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthDate> "1955-06-08"^^<http://www.w3.org/2001/XMLSchema#date> .
<https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthPlace> <http://dbpedia.org/resource/London> .
```

Expressed in HexTuples:

```ndjson
["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthDate", "1955-06-08", "http://www.w3.org/2001/XMLSchema#date", "", ""]
["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthPlace", "http://dbpedia.org/resource/London", "globalId", "", ""]
```

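Every line is a complete JSON array, so a parser can emit statements while the document is still arriving. That is what makes streaming parsing possible. The sketch below is hypothetical: `HexTuplesStreamParser` is not taken from any library. It assumes the input arrives as already-decoded text chunks of arbitrary size.

```typescript
type HexTuple = [string, string, string, string, string, string];

// Hypothetical incremental parser: push text chunks as they arrive and it calls
// onTuple for every complete line; a partial line waits in the buffer.
class HexTuplesStreamParser {
  private buffer = '';
  private readonly onTuple: (tuple: HexTuple) => void;

  constructor(onTuple: (tuple: HexTuple) => void) {
    this.onTuple = onTuple;
  }

  push(chunk: string): void {
    this.buffer += chunk;
    let newline = this.buffer.indexOf('\n');
    while (newline !== -1) {
      const line = this.buffer.slice(0, newline);
      this.buffer = this.buffer.slice(newline + 1);
      this.emit(line);
      newline = this.buffer.indexOf('\n');
    }
  }

  // Call once the input is finished, to flush a last line without a trailing newline.
  end(): void {
    const rest = this.buffer;
    this.buffer = '';
    this.emit(rest);
  }

  private emit(line: string): void {
    if (line.trim() === '') return; // tolerate blank lines
    const parsed: unknown = JSON.parse(line);
    if (!Array.isArray(parsed) || parsed.length !== 6 || !parsed.every((f) => typeof f === 'string')) {
      throw new Error(`Not a HexTuple: ${line}`);
    }
    this.onTuple(parsed as HexTuple);
  }
}

// Usage: the first chunk ends in the middle of the birthPlace line from the example.
const received: HexTuple[] = [];
const parser = new HexTuplesStreamParser((tuple) => received.push(tuple));
parser.push('["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthPl');
parser.push('ace", "http://dbpedia.org/resource/London", "globalId", "", ""]\n');
parser.end();
```

The buffer never holds more than one unfinished line. Memory use therefore stays small regardless of document size.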
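## Implementations

### Ontola TypeScript HexTuples Parser

*

This TypeScript code should give you an idea of how to write a parser for HexTuples.

```ts
// Assumes an RDF/JS-compatible DataFactory; N3.js is used here as one example.
import { DataFactory } from 'n3';
import type { BlankNode, Quad, Quad_Graph, Quad_Object } from '@rdfjs/types';

const { blankNode, defaultGraph, literal, namedNode, quad } = DataFactory;

// Blank node labels may carry a "_:" prefix; RDF/JS factories expect the bare label.
const toBlankNode = (id: string): BlankNode =>
  blankNode(id.startsWith('_:') ? id.slice(2) : id);

// Builds the RDF object from the value, datatype and language fields.
const object = (value: string, datatype: string, language: string): Quad_Object => {
  if (language) {
    return literal(value, language);
  } else if (datatype === 'globalId') {
    return namedNode(value);
  } else if (datatype === 'localId') {
    return toBlankNode(value);
  }

  // An empty datatype yields a plain (xsd:string) literal instead of namedNode('').
  return datatype ? literal(value, namedNode(datatype)) : literal(value);
};

// Blank graphs ("_:myNode") are kept; an empty string means the default graph.
const graph = (g: string): Quad_Graph => {
  if (g.startsWith('_:')) return toBlankNode(g);
  return g ? namedNode(g) : defaultGraph();
};

// Converts one parsed HexTuples-NDJSON line (six strings) into a quad.
const lineToQuad = (h: string[]): Quad => quad(
  h[0].startsWith('_:') ? toBlankNode(h[0]) : namedNode(h[0]),
  namedNode(h[1]),
  object(h[2], h[3], h[4]),
  graph(h[5]),
);
```
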
### Python RDFLib

*
* RDFLib is a pure Python package for working with RDF.
* It supports parsing and serializing RDF as HexTuples.
* Internally (in Python objects), RDF parsed from HexTuples data is represented in a _Conjunctive Graph_, i.e. a multi-graph object.
* HexTuples files should use the file extension `.hext` for RDFLib to auto-recognise the format; files with any other extension can be used if the format is given explicitly (`format="hext"`).

An RDF format conversion tool using RDFLib that can convert from/to HexTuples is online at .

## Motivation for HexTuples-NDJSON

HexTuples was designed by [Thom van Kalkeren](https://github.com/fletcher91/) (CTO of Ontola) because he noticed that parsing / serialization was unnecessarily costly in our [full-RDF stack](https://ontola.io/blog/full-stack-linked-data/), even when using the relatively performant `n-quads` format.

- Since HexTuples is serialized in NDJSON, it benefits from the [highly optimised JSON parsers in browsers](https://v8.dev/blog/cost-of-javascript-2019#json).
- It uses NDJSON instead of regular JSON because it makes it easier to parse **concatenated responses** (multiple root objects in one document).
- NDJSON enables **streaming parsing** as well, which gives it another performance boost.
- Some JS RDF libraries ([link-lib](https://github.com/fletcher91/link-lib/), [link-redux](https://github.com/fletcher91/link-redux/)) have an internal RDF graph model which uses these HexTuples arrays as well, which means that there is minimal mapping cost when parsing HexTuple statements.

This format is especially suitable for real front-end applications that use dynamic RDF data.
