└── README.md


/README.md:
--------------------------------------------------------------------------------
  1 | # HexTuples
  2 | 
  3 | _Status: draft_
  4 | 
  5 | _Version: 0.3.0_
  6 | 
  7 | HexTuples is a simple datamodel for dealing with linked data.
  8 | This document describes the model and concepts of HexTuples, as well as its (currently only) serialization format: HexTuples-NDJSON.
  9 | It is very **easy to parse**, can be used for **streaming parsing** and is designed to be **highly performant** in JS contexts. 
 10 | 
 11 | ## Concepts
 12 | 
 13 | ### HexTuple
 14 | 
 15 | A single _HexTuple_ is an atomic piece of data, similar to an [RDF Triple](https://www.w3.org/TR/rdf-concepts/#section-triples) (also known as Statements or Quads).
 16 | A HexTuple contains a small piece of information.
 17 | HexTuples consist of six fields: `subject`, `predicate`, `value`, `datatype`, `language` and `graph`.
 18 | 
 19 | Let's encode the following sentence in HexTuples:
 20 | 
 21 | _Tim Berners-Lee, the director of W3C, was born in London on the 8th of June, 1955._
 22 | 
 23 | | Subject | Predicate | Value | Datatype | Language | Graph |
 24 | |---------|-----------|-------|----------|----------|-------|
 25 | | [Tim](https://www.w3.org/People/Berners-Lee/) | [birthPlace](http://schema.org/birthPlace) | [London](http://dbpedia.org/resource/London) | | | |
 26 | | [Tim](https://www.w3.org/People/Berners-Lee/) | [birthDate](http://schema.org/birthDate) | 1955-06-08 | [xsd:date](http://www.w3.org/2001/XMLSchema#date) | | |
 27 | | [Tim](https://www.w3.org/People/Berners-Lee/) | [jobTitle](http://schema.org/jobTitle) | Director of W3C | [rdf:langString](http://www.w3.org/1999/02/22-rdf-syntax-ns#langString) | en-US | |
 28 | 
 29 | ### URI
 30 | 
 31 | URI stands for [Uniform Resource Identifier, specified in RFC 3986](https://tools.ietf.org/html/rfc3986).
 32 | The best known type of URI is the URL.
 33 | Although it is currently best practice to use mostly HTTPS URLs as URIs, HexTuples works with any type of URI.
 34 | 
 35 | ### Subject
 36 | 
 37 | - The _subject_ is the identifier of the thing the statement is about.
 38 | - This field is required.
 39 | - It MUST be a URI.
 40 | 
 41 | ### Predicate
 42 | 
 43 | - The _predicate_ describes the abstract property of the statement.
 44 | - This field is required.
 45 | - It MUST be a URI.
 46 | 
 47 | ### Value
 48 | 
 49 | - The _value_ contains the object of the HexTuple.
 50 | - This field is required.
 51 | - It can be any datatype, specified in the `datatype` of the HexTuple.
 52 | 
 53 | ### Datatype
 54 | 
 55 | - The _datatype_ indicates the type of the `value` of the HexTuple.
 56 | - This field is optional.
 57 | - It MUST be a URI or an empty string.
 58 | - When the `value` is a NamedNode, use: `globalId`
 59 | - When the `value` is a BlankNode, use: `localId`
 60 | 
 61 | ### Language
 62 | 
 63 | - The _language_ indicates the natural language of the `value` of the HexTuple.
 64 | - This field is optional.
 65 | - It MUST be an [RFC 3066 language tag](https://tools.ietf.org/html/rfc3066) or an empty string.
 66 | 
 67 | ## Relation to RDF
 68 | 
 69 | The HexTuples datamodel closely resembles the RDF Data Model, which is the de-facto standard for linked data.
 70 | RDF statements are often called Triples, because they consist of a `subject`, `predicate` and `object`.
 71 | The `object` field is either a single URI (for a NamedNode), or a combination of three fields (for a Literal): `value`, `datatype`, `language`.
 72 | This means that a single Triple can actually consist of _five_ fields: the `subject`, `predicate`, `value`, `datatype` and the `language`. 
 73 | A Quad statement also has a `graph`, which totals to six fields, hence the name: HexTuples.
 74 | Instead of making a distinction between Literal statements and NamedNode statements (which have two different models), HexTuples uses a single model that describes both.
 75 | **Having a single model for all statements (HexTuples) makes it easier to serialize, query and store data.**
 76 | 
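As a rough illustration of how the two RDF object shapes collapse into one model, the sketch below maps an object term onto the six fields. The term types and the `toHexTuple` helper are hypothetical names invented for this example, not part of any library; the `globalId` and `localId` markers are the ones used by HexTuples-NDJSON.

```typescript
// Hypothetical term shapes for illustration; real RDF libraries define their own.
type NamedNode = { termType: "NamedNode"; value: string };
type BlankNode = { termType: "BlankNode"; value: string };
type Literal = { termType: "Literal"; value: string; datatype: string; language: string };
type ObjectTerm = NamedNode | BlankNode | Literal;

// One statement is always six strings:
// subject, predicate, value, datatype, language, graph.
type HexTuple = [string, string, string, string, string, string];

// Hypothetical helper: flattens an RDF-style object term into the single HexTuple shape.
function toHexTuple(
  subject: string,
  predicate: string,
  object: ObjectTerm,
  graph: string = "",
): HexTuple {
  switch (object.termType) {
    case "NamedNode":
      return [subject, predicate, object.value, "globalId", "", graph];
    case "BlankNode":
      return [subject, predicate, object.value, "localId", "", graph];
    case "Literal":
      return [subject, predicate, object.value, object.datatype, object.language, graph];
  }
}

const tim = "https://www.w3.org/People/Berners-Lee/";

const birthPlace = toHexTuple(tim, "http://schema.org/birthPlace", {
  termType: "NamedNode",
  value: "http://dbpedia.org/resource/London",
});

const birthDate = toHexTuple(tim, "http://schema.org/birthDate", {
  termType: "Literal",
  value: "1955-06-08",
  datatype: "http://www.w3.org/2001/XMLSchema#date",
  language: "",
});
```

Both results have the same shape, so code that stores or queries statements does not need a separate branch for literal and non-literal objects.
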
 77 | ## HexTuples-NDJSON
 78 | 
 79 | _This document serves as a work in progress / draft specification_
 80 | 
 81 | HexTuples-NDJSON is an [NDJSON](http://ndjson.org/) (Newline Delimited JSON) based HexTuples / RDF serialization format.
 82 | It is designed to support streaming parsing and provide great performance in a JS context (i.e. the browser).
 83 | 
 84 | - A valid HexTuples document MUST be serialized using [NDJSON](http://ndjson.org/)
 85 | - HexTuples-NDJSON MIME type: `application/hex+x-ndjson; charset=utf-8`
 86 | - Each line MUST contain one JSON array, and each array MUST consist of six strings.
 87 | - Each array represents one RDF statement / quad / triple
 88 | - The six strings in each array respectively represent `subject`, `predicate`, `value`, `datatype`, `lang` and `graph`.
 89 | - The `datatype` and `lang` fields are only used when the `value` represents a Literal value (i.e. not a URI, but a string / date / something else). In RDF, the combination of `value`, `datatype` and `lang` is known as the `object`.
 90 | - When expressing an Object that is a NamedNode, use this string as the datatype: "globalId" ([discussion](https://github.com/ontola/hextuples/issues/1))
 91 | - When expressing an Object that is a BlankNode, use this string as the datatype: "localId"
 92 | - If the `graph` is a blank node (i.e. anonymous), use an underscore as the URI scheme: `_:myNode`. ([discussion](https://github.com/ontola/hextuples/issues/2)). Parsers SHOULD interpret these as blank graphs, but MAY discard these if they have no support for them.
 93 | - When a field has no value, use an empty string: `""`
 94 | 
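The structural rules above can be checked with very little code. The following is a minimal sketch, not a reference implementation: `parseHexTupleLine` and `parseHexTuples` are hypothetical names, and skipping blank lines (such as the one left by a trailing newline) is an assumption rather than something this draft prescribes.

```typescript
type HexTuple = [string, string, string, string, string, string];

// Parse one NDJSON line; throws if the line is not a JSON array of six strings.
function parseHexTupleLine(line: string): HexTuple {
  const parsed: unknown = JSON.parse(line);
  if (
    !Array.isArray(parsed) ||
    parsed.length !== 6 ||
    !parsed.every((field) => typeof field === "string")
  ) {
    throw new Error(`Not a HexTuple: ${line}`);
  }
  return parsed as HexTuple;
}

// Parse a whole document, one statement per line.
// Assumption: blank lines are skipped instead of being treated as errors.
function parseHexTuples(doc: string): HexTuple[] {
  return doc
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map(parseHexTupleLine);
}
```

The sketch only validates the shape of each line. It does not check that `subject` and `predicate` are URIs or that `lang` is a well-formed language tag.
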
 95 | ### Example
 96 | 
 97 | English:
 98 | 
 99 | _Tim Berners-Lee was born in London, on the 8th of June in 1955._
100 | 
101 | Turtle / N-Triples:
102 | 
103 | ```n-triples
104 | <https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthDate> "1955-06-08"^^<http://www.w3.org/2001/XMLSchema#date>.
105 | <https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthPlace> <http://dbpedia.org/resource/London>.
106 | ```
107 | 
108 | Expressed in HexTuples:
109 | 
110 | ```ndjson
111 | ["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthDate", "1955-06-08", "http://www.w3.org/2001/XMLSchema#date", "", ""]
112 | ["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthPlace", "http://dbpedia.org/resource/London", "globalId", "", ""]
113 | ```
114 | 
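Because every statement sits on its own line, a document like the one above can be consumed incrementally as text arrives. The sketch below shows one way to do that; `createLineParser` is a hypothetical helper, and it assumes the chunks have already been decoded to text.

```typescript
// Hypothetical helper: accepts arbitrary text chunks and calls `onTuple`
// once for every complete line that has been received so far.
function createLineParser(onTuple: (tuple: string[]) => void) {
  let buffer = "";
  return {
    push(chunk: string): void {
      buffer += chunk;
      const lines = buffer.split("\n");
      // The last piece may be an incomplete line, so keep it for the next chunk.
      buffer = lines.pop() ?? "";
      for (const line of lines) {
        if (line.trim() !== "") onTuple(JSON.parse(line));
      }
    },
    end(): void {
      // Flush a final line that was not terminated by a newline.
      if (buffer.trim() !== "") onTuple(JSON.parse(buffer));
      buffer = "";
    },
  };
}

const seen: string[][] = [];
const parser = createLineParser((tuple) => {
  seen.push(tuple);
});

// The first statement arrives split across two chunks.
parser.push('["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthDate", "1955-');
parser.push('06-08", "http://www.w3.org/2001/XMLSchema#date", "", ""]\n');
parser.push('["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthPlace", "http://dbpedia.org/resource/London", "globalId", "", ""]');
parser.end();
```

Statements are handed to the callback as soon as their line is complete, so a consumer never needs to hold the whole document in memory.
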
115 | ## Implementations
116 | 
117 | ### Ontola TypeScript HexTuples Parser
118 | 
119 | * <https://github.com/ontola/hextuples-parser>
120 | 
121 | This TypeScript code should give you some idea of how to write a parser for HexTuples.
122 | 
123 | ```ts
124 | const object = (value: string, datatype: string, language: string): SomeTerm => {
125 |   if (language) {
126 |     return literal(value, language);
127 |   } else if (datatype === 'globalId') {
128 |     return namedNode(value);
129 |   } else if (datatype === 'localId') {
130 |     return blankNode(value);
131 |   }
132 | 
133 |   return literal(value, namedNode(datatype));
134 | };
135 | 
136 | const lineToQuad = (h: string[]) => quad(
137 |   h[0].startsWith('_:') ? blankNode(h[0]) : namedNode(h[0]),
138 |   namedNode(h[1]),
139 |   object(h[2], h[3], h[4]),
140 |   h[5] ? namedNode(h[5]) : defaultGraph(),
141 | );
142 | ```
143 | 
144 | ### Python RDFLib
145 | 
146 | * <https://pypi.org/project/rdflib/>
147 | * RDFLib is a pure Python package for working with RDF. 
148 | * It supports parsing and serializing RDF as HexTuples
149 | * Internally (in Python objects), RDF parsed from HexTuples data is represented in a _Conjunctive Graph_, that is, a multi-graph object
150 | * HexTuples files must end in the file extension `.hext` for RDFLib to auto-recognise the format, although files with any ending can be used if the format is given (`format="hext"`)
151 | 
152 | An RDF format conversion tool using RDFLib that can convert from/to HexTuples is online at <https://tools.dev.kurrawong.ai/convert>.
153 | 
154 | ## Motivation for HexTuples-NDJSON
155 | 
156 | HexTuples was designed by [Thom van Kalkeren](https://github.com/fletcher91/) (CTO of Ontola) because he noticed that parsing / serialization was unnecessarily costly in our [full-RDF stack](https://ontola.io/blog/full-stack-linked-data/), even when using the relatively performant `n-quads` format.
157 | 
158 | - Since HexTuples is serialized in NDJSON, it benefits from the [highly optimised JSON parsers in browsers](https://v8.dev/blog/cost-of-javascript-2019#json).
159 | - It uses NDJSON instead of regular JSON because it makes it easier to parse **concatenated responses** (multiple root objects in one document).
160 | - NDJSON enables **streaming parsing** as well, which gives it another performance boost.
161 | - Some JS RDF libraries ([link-lib](https://github.com/fletcher91/link-lib/), [link-redux](https://github.com/fletcher91/link-redux/)) have an internal RDF graph model which uses these HexTuples arrays as well, which means that there is minimal mapping cost when parsing HexTuple statements.
162 | This format is especially suitable for real front-end applications that use dynamic RDF data.
163 | 


--------------------------------------------------------------------------------