├── CHANGELOG.md ├── LICENSE ├── README.md ├── assertions.md ├── collections.md └── reduction.md /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Changelog 2 | All notable changes to this project will be documented in this file. 3 | 4 | The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 5 | 6 | ## [Unreleased] 7 | - Initial description of assertions, reduction, and collections. 8 | 9 | 10 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Knowledge Futures, Inc. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 6 | 7 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ARC Protocol 2 | 3 | The ARC protocol describes assertions, reduction, and collections for the [Underlay](https://www.underlay.org). The protocol's basic premise is that a knowledge graph can be constructed from a series of transactional statements called *assertions*. These transactional assertions are transformed into a single *materialized state* through a process called *reduction*. Assertions can be grouped together in a *collection*, which provides details on the intended shape and reduction of the data. 4 | 5 | ## Assertions 6 | Assertions are the fundamental unit of data in the Underlay. An assertion is represented by an immutable RDF dataset. Assertions can be signed and can specify their provenance to allow for trust- and context-based filtering of data. 7 | 8 | [Spec](assertions.md) 9 | 10 | **Examples** 11 | 12 | 13 | ## Reduction 14 | Reduction is the process of taking a set of [assertions](assertions.md) and merging them to create a consistent state called a 'materialized state'. Reduction allows assertions to be used as immutable transactions that change a larger graph. 15 | 16 | A reduction process is described using the Rex (reduction expressions) language. Rex is a structural schema language (similar to [ShEx](https://shex.io)) that describes RDF graphs. 17 | 18 | [Spec](reduction.md) 19 | 20 | **Examples**: 21 | 22 | **Implementations**: [rex-js](), [rex-go]() 23 | 24 | ## Collections 25 | Collections are containers that enable curation of a usefully scoped set of graph data – they serve many of the same roles packages do in software development. Collections contain a set of assertions, a schema describing the shape of data within those assertions, metadata to help discovery and curation, associated files, and sub-collections. 
26 | 27 | Collections hold data through a set of immutable, transactional updates called assertions. Collections contain a schema that describes the shape of the materialized state intended by the collection author. The same set of assertions held in two collections with different schemas would produce different materialized states. 28 | 29 | [Spec](collections.md) 30 | 31 | **Examples** -------------------------------------------------------------------------------- /assertions.md: -------------------------------------------------------------------------------- 1 | # Assertions 2 | 3 | Assertions are the fundamental unit of data in the Underlay. An assertion is represented by an immutable RDF dataset. Assertions can be signed and can specify their provenance to allow for trust- and context-based filtering of data. 4 | 5 | There is no strict "spec" for an assertion other than that it is an RDF dataset. There are a number of best practices and common procedures that are expected of assertions (e.g. signing and provenance), but technically speaking, they are not required. 6 | 7 | Content within an assertion is held in the RDF dataset's named graphs. Named graphs must have blank graph names. 8 | 9 | A single assertion should carry a coherent unit of data, such as a row in a database or a snapshot of state. Not all domains have clear divisions; applications are encouraged to use assertions in whatever way feels most appropriate. 10 | 11 | Assertions label components of data according to their provenance. They group the parts of the data that came from the same source, are attributed to the same entity, etc. In general, schemas and provenance do not always align, so named graphs are used as an extra degree of freedom to capture the difference. 
12 | 13 | ## Provenance 14 | 15 | > Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. - _[PROV-Overview](https://www.w3.org/TR/prov-overview/)_ 16 | 17 | The default graph of an assertion describes the provenance of the named graphs using the [PROV-O](https://www.w3.org/TR/prov-o/) ontology, referring to each named graph by its blank graph label as needed. This promotes an interpretation of RDF dataset semantics where the graph name denotes the named graph, as described [here](https://www.w3.org/TR/rdf11-datasets/#the-graph-name-denotes-the-named-graph-or-the-graph) and as popularized by JSON-LD's representation of named graphs. 18 | 19 | Every named graph must be the subject of at least one triple in the default graph, whose predicate is one of `prov:wasDerivedFrom`, `prov:wasAttributedTo`, or `prov:wasGeneratedBy` (or a well-known subproperty). Note that the ranges of these predicates are PROV classes (`prov:Entity`, `prov:Agent`, and `prov:Activity`, respectively), so the objects of these "provenance entry point" triples cannot be RDF literals. 20 | 21 | There is no upper limit to the contents of the default graph - it may describe the PROV Entities associated with a named graph using other ontologies as well - as long as it is in the service of describing the named graphs, and not carrying asserted data itself. 22 | 23 | Assertions split data into named graphs by their known provenance. A named graph should be a chunk of data whose provenance is described atomically, even if it means splitting up parts of a data structure between named graphs. Querying, validation, and other operations are all done over a merged graph. Depending on the granularity of the known provenance, there may be only one named graph, or there may be a separate one for each asserted triple. Most messages have just one assertion. 
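
The entry-point rule above can be checked mechanically. What follows is a minimal Python sketch, not part of any Underlay tooling. It assumes a hypothetical in-memory representation: an assertion is a list of `Quad` tuples, terms are N-Quads-style strings (blank nodes start with `_:`, literals start with a double quote), and the default graph is marked by `graph=None`. The example IRIs are placeholders.

```python
from typing import List, NamedTuple, Optional, Set

PROV = "http://www.w3.org/ns/prov#"

# The three "provenance entry point" predicates named in the spec text.
# Well-known subproperties are not handled in this sketch.
ENTRY_POINTS = {
    PROV + "wasDerivedFrom",
    PROV + "wasAttributedTo",
    PROV + "wasGeneratedBy",
}


class Quad(NamedTuple):
    """Hypothetical quad representation; graph=None means the default graph."""
    subject: str
    predicate: str
    object: str
    graph: Optional[str]


def is_literal(term: str) -> bool:
    # Assumption: literals are written N-Quads style, starting with a quote.
    return term.startswith('"')


def graphs_missing_provenance(quads: List[Quad]) -> Set[str]:
    """Return the labels of named graphs that lack an entry-point triple."""
    named = {q.graph for q in quads if q.graph is not None}
    described = {
        q.subject
        for q in quads
        if q.graph is None
        and q.predicate in ENTRY_POINTS
        and not is_literal(q.object)
    }
    return named - described


example = [
    # Default graph: only _:g0 has a provenance entry point.
    Quad("_:g0", PROV + "wasAttributedTo", "https://example.org/alice", None),
    # Named graphs carrying the asserted data.
    Quad("_:b0", "http://schema.org/name", '"Jane"', "_:g0"),
    Quad("_:b1", "http://schema.org/name", '"Bob"', "_:g1"),
]

print(sorted(graphs_missing_provenance(example)))  # ['_:g1']
```

An empty result means every named graph is the subject of at least one entry-point triple whose object is not a literal.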
24 | 25 | 26 | ## Signatures 27 | 28 | The signature is the last part of an assertion to be assembled, and should be the first part of an assertion to be parsed. 29 | 30 | Assertions use the [`LinkedDataSignature2016`](https://web-payments.org/vocabs/security#LinkedDataSignature2016) signature scheme for signing RDF datasets, which represents signatures as part of the dataset itself. That is, the signature is represented directly as RDF in the default graph, and only signs the "rest" of the dataset. 31 | 32 | To sign a message, the unsigned dataset is first canonicalized using the [URDNA2015](https://json-ld.github.io/normalization/spec/) algorithm. Then the canonicalized string is signed with the [rsa-sha256 algorithm](http://www.w3.org/2000/09/xmldsig#rsa-sha256). 33 | 34 | The signature is encoded as base64 text in an `xsd:string` RDF literal, and added to the default graph as the object of a triple with predicate `sec:signatureValue` and a new blank node subject (called `_:sig` here). 35 | 36 | The blank signature node is also the subject of three additional triples added to the default graph: 37 | 38 | - `_:sig rdf:type sec:LinkedDataSignature2016` 39 | - `_:sig dcterms:created `, where `` is a literal with datatype [`xsd:dateTime`](https://www.w3.org/TR/xmlschema11-2/#dateTime) 40 | - `_:sig dcterms:creator `, where `` is a URI that can be dereferenced to retrieve the associated public key 41 | - For IPFS keys, use `dweb:/ipns/Qm...`, where `Qm...` is a [base58 PeerId](https://docs.ipfs.io/guides/concepts/ipns/). 42 | - Registries that control users' keys will have to implement their own standards around this. 43 | 44 | Despite the awkwardness of splicing signatures into the default graph, parsing and validating them is deterministic so long as the signature node label is unique and no other set of four triples in the default graph matches the same pattern. 
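
To illustrate why parsing is deterministic, here is a minimal Python sketch that separates the four signature triples from the rest of the default graph. It is an illustration under assumptions, not the reference implementation: triples are plain `(subject, predicate, object)` string tuples, the `sec:` prefix is assumed to expand to `https://w3id.org/security#`, and the example creator URI and signature value are placeholders. Canonicalization (URDNA2015) and RSA-SHA256 verification need external libraries and are left out.

```python
from collections import defaultdict
from typing import List, Tuple

Triple = Tuple[str, str, str]

SEC = "https://w3id.org/security#"  # assumed expansion of the sec: prefix
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIGNATURE_TYPE = SEC + "LinkedDataSignature2016"

# Predicates that the signature node must carry (four triples in total).
REQUIRED = {
    SEC + "signatureValue",
    RDF_TYPE,
    DCTERMS + "created",
    DCTERMS + "creator",
}


def split_signature(default_graph: List[Triple]):
    """Return (signature node, signature triples, remaining default graph).

    Raises ValueError unless exactly one blank node matches the pattern,
    which is the uniqueness condition the spec text relies on.
    """
    by_subject = defaultdict(list)
    for triple in default_graph:
        by_subject[triple[0]].append(triple)

    candidates = []
    for subject, triples in by_subject.items():
        if not subject.startswith("_:"):
            continue  # the signature node must be a blank node
        predicates = {p for _, p, _ in triples}
        has_type = (subject, RDF_TYPE, SIGNATURE_TYPE) in triples
        if has_type and REQUIRED <= predicates:
            candidates.append(subject)

    if len(candidates) != 1:
        raise ValueError(f"expected one signature node, found {len(candidates)}")

    sig = candidates[0]
    sig_triples = [t for t in default_graph if t[0] == sig and t[1] in REQUIRED]
    rest = [t for t in default_graph if t not in sig_triples]
    return sig, sig_triples, rest


XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
example_default_graph = [
    ("_:g0", "http://www.w3.org/ns/prov#wasAttributedTo", "https://example.org/alice"),
    ("_:sig", RDF_TYPE, SIGNATURE_TYPE),
    ("_:sig", DCTERMS + "created", f'"2020-01-01T00:00:00Z"^^<{XSD_DATETIME}>'),
    ("_:sig", DCTERMS + "creator", "dweb:/ipns/QmExample"),  # placeholder key URI
    ("_:sig", SEC + "signatureValue", '"c2lnbmF0dXJl"'),  # placeholder base64
]

sig_node, sig_triples, rest = split_signature(example_default_graph)
print(sig_node, len(sig_triples), len(rest))
```

A verifier would then canonicalize `rest` together with the named graphs and check the decoded `sec:signatureValue` against the public key retrieved from the `dcterms:creator` URI.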
45 | 46 | ### Identities vs keys 47 | 48 | In general, tying user identity to individual keys is bad cryptographic practice. There is a [Linked Data Signatures](https://w3c-dvcg.github.io/ld-signatures) spec under active development by the W3C Digital Verification Community Group, which approaches signing with this in mind, with the goal of supporting "N entities with M keys" per user. `LinkedDataSignature2016` is a much simpler scheme that is adopted here for temporary use while these specs and tools stabilize. 49 | 50 | -------------------------------------------------------------------------------- /collections.md: -------------------------------------------------------------------------------- 1 | # Collections 2 | 3 | Collections are containers that enable curation of a usefully scoped set of graph data – they serve many of the same roles packages do in software development. Collections contain a set of assertions, a schema describing the shape of data within those assertions, metadata to help discovery and curation, associated files, and sub-collections. 4 | 5 | Collections hold data through a set of immutable, transactional updates called assertions. Collections contain a schema that describes the shape of the materialized state intended by the collection author. The same set of assertions held in two collections with different schemas would produce different materialized states. 6 | 7 | Collections are described by a single file. We recommend naming this file `collection.json`. 
A collection file describes metadata about the collection and the specific content of the collection: 8 | 9 | ```json 10 | { 11 | "namespace": "https://r1.underlay.org", 12 | "id": "ba36a43e-53e9-46b8-a8f0-2bcff1909bdc", 13 | "name": "isTravis/actors", 14 | "version": "2.0.2", 15 | "schema": "dweb:/ipfs/bafybeih4wdwetrvaz2ospgebag4rtndhhqebwgmos6hbwxdhfhtw3d2vde", 16 | "assertions": [ 17 | "dweb:/ipfs/bafybeih4wdwetrvaz2ospgebag4rtndhhqebwgmos6hbwxdhfhtw3d2vde", 18 | "https://r1.underlay.org/cid/bafkreib2xgk7gwailskap5ohnz4iua3pno2lm4wemop2bm7opgcun2dtse" 19 | ], 20 | "collections": { 21 | "https://r1.underlay.org/a950c6c0-475a-4f7d-8b3c-7a38ce2c735c": { 22 | "name": "isTravis/cities", 23 | "hash": "dweb:/ipfs/bafybeiatr6vzozvaxtp5f32ghixj4bvauz6wgl4lbbh6np4yrrsvtep3y4", 24 | "version": "2.0.3" 25 | } 26 | }, 27 | "files": { 28 | "schema.rex": "dweb:/ipfs/bafybeih4wdwetrvaz2ospgebag4rtndhhqebwgmos6hbwxdhfhtw3d2vde", 29 | "images/whale.jpg": "dweb:/ipfs/bafybeiatr6vzozvaxtp5f32ghixj4bvauz6wgl4lbbh6np4yrrsvtep3y4" 30 | } 31 | } 32 | ``` 33 | 34 | 35 | The single **collection file** uses content hashes to reference assertions, sub-collections, and files. These items are not described within the file itself; they are only referenced. As such, the collection file fully describes a collection, but does not provide the content of the collection itself. This allows the single collection file to be passed around easily without the heft of all the (potentially large) content. A **collection directory** can be created based on the content references in the collection file. The collection directory is produced by locally storing the assertions, sub-collections, and files specified in the collection file. 36 | 37 | In practice, the word 'collection' may often be used to refer to either the collection file or the collection directory. Since both fully describe the collection, it's appropriate to do so. 
38 | 39 | A collection is uniquely identified by the content hash of its collection file representation. 40 | 41 | Files are listed with paths that allow the content of the collection directory to be hierarchically structured. 42 | 43 | A collection has no record of past or future versions other than its single `version` attribute. Other programs or tools can be used to keep track of version histories for collections that are understood to be conceptually versions of the same thing. The `id` attribute allows other programs and tools to consistently track related versions, while the `name` field provides space for a human-readable title for the collection. 44 | 45 | The schema describes the intended shape of the materialized state after reduction as well as reduction specifics. 46 | 47 | - `namespace`: [required] A URI of the service that provides namespace validation and verification. 48 | - `id`: [required] A stable id (preferably a uuid) that is unique within the namespace and allows multiple collection files (e.g. different versions) to be aggregated. 49 | - `name`: [required] A human-readable name for the collection. 50 | - `version`: [required] A [semver](https://semver.org) number. 51 | - `schema`: [required] A URI to the schema file. 52 | - `assertions`: A list of URIs of the assertions included in the collection. 53 | - `collections`: An object describing sub-collections within the collection. The keys are stable URIs of the sub-collections, while each value is another object containing the human-readable name, semver version, and content URI of the specific collection file. 54 | - `files`: An object describing files contained within the collection. The keys are filepath strings, while the values are URIs to the files themselves. 
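
The field list above translates directly into a structural check. The Python sketch below is one possible reading of it, not an official validator: the function name `validate_collection` is hypothetical, the semver pattern is deliberately simplified, and the sub-collection keys (`name`, `hash`, `version`) are taken from the example collection file rather than from a formal schema.

```python
import re
from typing import Any, Dict, List

REQUIRED_FIELDS = ("namespace", "id", "name", "version", "schema")

# Simplified semver pattern: MAJOR.MINOR.PATCH with optional pre-release/build.
SEMVER = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")


def validate_collection(doc: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors: List[str] = []

    for field in REQUIRED_FIELDS:
        value = doc.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"missing or empty required field: {field}")

    version = doc.get("version")
    if isinstance(version, str) and version and not SEMVER.match(version):
        errors.append(f"version is not a semver number: {version}")

    if not isinstance(doc.get("assertions", []), list):
        errors.append("assertions must be a list of URIs")

    if not isinstance(doc.get("files", {}), dict):
        errors.append("files must be an object of path -> URI")

    subs = doc.get("collections", {})
    if not isinstance(subs, dict):
        errors.append("collections must be an object keyed by URI")
    else:
        for uri, sub in subs.items():
            for key in ("name", "hash", "version"):
                if not isinstance(sub, dict) or not isinstance(sub.get(key), str):
                    errors.append(f"sub-collection {uri} is missing {key}")

    return errors


# Abbreviated version of the example collection file, with placeholder hashes.
example_collection = {
    "namespace": "https://r1.underlay.org",
    "id": "ba36a43e-53e9-46b8-a8f0-2bcff1909bdc",
    "name": "isTravis/actors",
    "version": "2.0.2",
    "schema": "dweb:/ipfs/example-schema-hash",
    "assertions": ["dweb:/ipfs/example-assertion-hash"],
    "collections": {
        "https://r1.underlay.org/a950c6c0-475a-4f7d-8b3c-7a38ce2c735c": {
            "name": "isTravis/cities",
            "hash": "dweb:/ipfs/example-collection-hash",
            "version": "2.0.3",
        }
    },
    "files": {"schema.rex": "dweb:/ipfs/example-schema-hash"},
}

print(validate_collection(example_collection))  # []
```

Returning a list of problems instead of raising on the first one lets a tool report everything wrong with a collection file in a single pass.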
55 | 56 | -------------------------------------------------------------------------------- /reduction.md: -------------------------------------------------------------------------------- 1 | # Reduction 2 | 3 | Reduction is the process of taking a set of [assertions](assertions.md) and merging them to create a consistent state called a 'materialized state'. Reduction allows assertions to be used as immutable transactions that change a larger graph. 4 | 5 | A reduction process is described using the Rex (reduction expressions) language. Rex is a structural schema language (similar to [ShEx](https://shex.io)) that describes RDF graphs. 6 | 7 | A Rex schema is a set of named shapes, each of which is a set of triple constraints, which we will also call properties. Each property has a unique (within the shape) predicate, a minimum and maximum cardinality, and a value expression that is either a pure function over IRIs and Literals, `(term: RDF.NamedNode | RDF.Literal) => boolean`, or a reference to another shape in the same schema. 8 | 9 | --------------------------------------------------------------------------------
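
The description of a Rex schema in reduction.md can be modeled as a small data structure. The Python sketch below is a hypothetical in-memory model, not Rex syntax and not the API of rex-js or rex-go. The names `Property`, `check_schema`, `min_count`, and `max_count` are assumptions, terms are simplified to strings, and `max_count=None` is used here to mean "unbounded".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

Term = str  # simplified stand-in for an IRI or literal
# A value expression is either a pure predicate over terms or a shape name.
ValueExpr = Union[Callable[[Term], bool], str]


@dataclass(frozen=True)
class Property:
    """One triple constraint within a shape."""
    predicate: str
    min_count: int
    max_count: Optional[int]  # None means no upper bound (assumption)
    value: ValueExpr


Schema = Dict[str, List[Property]]  # shape name -> its properties


def check_schema(schema: Schema) -> List[str]:
    """Check the structural rules stated in the text; return any problems."""
    errors: List[str] = []
    for name, props in schema.items():
        seen = set()
        for prop in props:
            # Predicates must be unique within a shape.
            if prop.predicate in seen:
                errors.append(f"{name}: duplicate predicate {prop.predicate}")
            seen.add(prop.predicate)
            # Cardinalities must form a valid range.
            if prop.min_count < 0 or (
                prop.max_count is not None and prop.max_count < prop.min_count
            ):
                errors.append(f"{name}: invalid cardinality for {prop.predicate}")
            # Shape references must resolve within the same schema.
            if not callable(prop.value) and prop.value not in schema:
                errors.append(f"{name}: unknown shape reference {prop.value}")
    return errors


def is_string_literal(term: Term) -> bool:
    # Assumption: literals are written N-Triples style, starting with a quote.
    return term.startswith('"')


example_schema: Schema = {
    "Person": [
        Property("http://schema.org/name", 1, 1, is_string_literal),
        Property("http://schema.org/homeLocation", 0, None, "City"),
    ],
    "City": [
        Property("http://schema.org/name", 1, 1, is_string_literal),
    ],
}

print(check_schema(example_schema))  # []
```

Keeping value expressions as either a callable or a shape name mirrors the two cases named in the text, so a reducer could dispatch on `callable(prop.value)`.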