DASL components are designed to be small and, to the extent possible, self-contained.
Because of that, despite having roots in IPFS, no DASL component depends on
traditional IPFS implementations. In many cases, systems produced to work with
DASL will interoperate with IPFS, but DASL's independence means that there can
occasionally be an impedance mismatch. This document is a non-normative list of
things to consider when using DASL with IPFS systems, particularly with ones based
on the DHT and Bitswap.
18 |
19 |
20 |
21 |
Amino / Bitswap Compatibility
22 |
23 | If you're interoperating with Amino (or other DHTs with similar properties) or
24 | need to use Bitswap:
25 |
26 |
27 |
Block size
28 |
29 | Nothing in DASL assumes a maximum block size, but Bitswap does. DASL does not have
30 | a built-in solution to break files into blocks (and we intend to keep it that way).
31 | In order to plan for interop with Bitswap, you will want to do your own splitting
32 | and probably to have a metadata wrapper to keep track of the split.
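As a rough illustration of "do your own splitting", here is a minimal sketch in Python. The block size limit, the `split_blocks` name, and the layout of the returned dictionary are all assumptions made up for this example; DASL does not define a splitting format, and you should check what limit your Bitswap peers actually enforce.

```python
import base64
import hashlib

# Hypothetical limit: replace with whatever your Bitswap peers accept.
MAX_BLOCK_SIZE = 1024 * 1024


def raw_cid(data: bytes) -> str:
    """Build a DASL CID string (raw codec 0x55, SHA-256) for some bytes."""
    header = bytes([0x01, 0x55, 0x12, 0x20])
    body = base64.b32encode(header + hashlib.sha256(data).digest())
    # The string form is lowercase base32 without padding, plus a "b" prefix.
    return "b" + body.decode("ascii").lower().rstrip("=")


def split_blocks(data: bytes, max_size: int = MAX_BLOCK_SIZE) -> dict:
    """Split data into blocks and describe the split in a plain dict.

    The dict layout is invented for this sketch; it is not a DASL format.
    """
    chunks = [data[i:i + max_size] for i in range(0, len(data), max_size)]
    return {
        "whole": raw_cid(data),  # CID of the unsplit resource
        "size": len(data),
        "parts": [{"cid": raw_cid(c), "size": len(c)} for c in chunks],
    }
```

The wrapper keeps the CID of the whole resource next to the per-block CIDs, so a client that reassembles the parts can still verify the result against the identifier people actually share.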
33 |
BDASL extends DASL CIDs with a new hash type that works better for large files but isn't
available by default in browsers, and is therefore not an appropriate option in most
situations.
14 |
15 |
16 |
17 |
Introduction
18 |
BDASL extends DASL CIDs by adding BLAKE3 support ([[blake3]]). BLAKE3 is a powerful hashing
framework that works well for progressive verification of large streams. Unfortunately,
it isn't available in browsers (and neither is streaming hashing in general), which makes it
inappropriate for inclusion as the primary hash function in DASL CIDs.
23 |
24 |
25 | It is recommended to avoid using BDASL CIDs in arbitrary open environments, and rather to
26 | focus on using such CIDs in specific cases in which participants are likely to know how
27 | to handle them.
28 |
29 |
30 |
31 |
Parsing BDASL CIDs
32 |
33 | All the parsing works the same as for DASL CIDs ([[cid]]) with one modification.
34 |
35 |
36 | In the steps to decode a CID, the hash type
37 | may also be equal to 0x1e (BLAKE3) ([[blake3]]).
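The modification can be sketched as a decoder that accepts either hash type. This is an illustrative Python sketch, not a reference implementation: the function name and the returned dictionary are assumptions, and trailing bytes after the digest are not checked because the decoding steps do not mention them.

```python
SHA_256 = 0x12
BLAKE3 = 0x1E
RAW = 0x55
DRISL = 0x71


def decode_bdasl_cid(cid_bytes: bytes) -> dict:
    """Decode binary CID bytes, allowing SHA-256 or BLAKE3 as the hash type."""
    if len(cid_bytes) < 4:
        raise ValueError("CID is too short")
    version, codec, hash_type, hash_size = cid_bytes[:4]
    if version != 1:
        raise ValueError("unsupported CID version")
    if codec not in (RAW, DRISL):
        raise ValueError("unsupported codec")
    # The only BDASL change: 0x1e is accepted alongside 0x12.
    if hash_type not in (SHA_256, BLAKE3):
        raise ValueError("unsupported hash type")
    if hash_size != 32:
        raise ValueError("unsupported hash size")
    digest = cid_bytes[4:36]
    if len(digest) < 32:
        raise ValueError("digest is truncated")
    return {"version": version, "codec": codec,
            "hash_type": hash_type, "digest": digest}
```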
38 |
11 | DASL CIDs are a simple structured identifier format for content addressing. They encapsulate a hash
12 | with enough metadata to be extensible (to add new hash types in the future) and to indicate whether
13 | they are pointing to raw bytes or to structured data.
14 |
15 |
16 |
17 |
Introduction
18 |
19 | DASL CIDs are a simple structured identifier format for content addressing. They encapsulate a hash
20 | with enough metadata to be extensible (to add new hash types in the future) and to indicate whether
21 | they are pointing to raw bytes or to structured data. If you're simply using DASL CIDs as identifiers, you
22 | can almost certainly just use the string as an opaque ID and worry no further.
23 |
24 |
25 | A DASL CID can be represented as a string or as an array of bytes. If you wish to understand the
26 | internals of a CID, it has the following structure:
27 |
28 |
29 |
30 | A b prefix (only in string form). This is an extensibility point for future
31 | CID encodings other than the current base32 to be supported. (Currently this is the only one.)
32 |
33 |
34 | A version number, which is currently always 1.
35 |
36 |
37 | A content codec, which is a flag indicating whether it is pointing to structured or raw
38 | data.
39 |
40 |
A hash type, which is always SHA-256 ([[sha256]]).
42 |
43 |
A hash size, which is always 32 (0x20).
45 |
46 |
47 | A digest, which is the hash of the content being identified.
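To make the structure concrete, here is a small Python sketch that assembles a CID from those parts. The `make_cid` name is an assumption for this example. The base32 padding is stripped because the sample CIDs in these specifications are 59 characters long, which matches unpadded output for 36 bytes.

```python
import base64
import hashlib

RAW = 0x55    # content codec: raw bytes
DRISL = 0x71  # content codec: structured data


def make_cid(content: bytes, codec: int = RAW) -> str:
    """Assemble a DASL CID string from its parts."""
    digest = hashlib.sha256(content).digest()
    cid_bytes = bytes([
        0x01,   # version, currently always 1
        codec,  # content codec
        0x12,   # hash type: SHA-256
        0x20,   # hash size: 32
    ]) + digest
    encoded = base64.b32encode(cid_bytes).decode("ascii")
    return "b" + encoded.lower().rstrip("=")


print(make_cid(b"hello world"))
```

Because the four header bytes are fixed for a given codec, every raw CID starts with `bafkrei` and every structured-data CID starts with `bafyrei`.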
48 |
49 |
50 |
51 |
52 |
Parsing CIDs
53 |
54 | Use the following steps to parse a string-encoded CID, i.e. translate it to a bytestring:
55 |
56 |
57 |
Accept a string CID.
58 |
Remove the first character from CID and store it in prefix.
59 |
If prefix is not equal to b, throw an error.
60 |
61 | Decode the rest of CID using
62 | the base32 algorithm from
63 | RFC4648 with a lowercase alphabet and store the result in CID bytes ([[rfc4648]]).
64 |
65 |
This results in CID bytes, which can be used to decode a CID.
66 |
67 |
68 | Use the following steps to decode a CID:
69 |
70 |
71 |
Accept an array of bytes CID bytes.
72 |
73 | Remove the first byte in CID bytes and store it in version.
74 |
75 |
If version is not equal to 1, throw an error.
76 |
77 | Remove the next byte in CID bytes and store it in codec.
78 |
79 |
80 | If codec is not equal to 0x55 (raw) or 0x71 (DRISL),
81 | throw an error ([[drisl]]).
82 |
83 |
84 | Remove the next byte in CID bytes and store it in hash type.
85 |
86 |
87 | If hash type is not equal to 0x12 (SHA-256), throw an error ([[sha256]]).
88 |
89 |
Read one byte from CID bytes and store it in hash size. If hash size
is any value other than 32 (0x20), throw an error.
92 |
93 |
94 | Read 32 bytes from CID bytes and store them in digest. If there were
95 | fewer than 32 bytes left in CID bytes, throw an error.
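The two sets of steps above can be sketched in Python as follows. Function names and the returned dictionary are assumptions for illustration. Python's `base64.b32decode` expects uppercase, padded input, so the sketch rejects non-lowercase input itself and restores the padding before decoding.

```python
import base64


def parse_cid_string(cid: str) -> bytes:
    """Translate a string CID into CID bytes."""
    prefix, rest = cid[:1], cid[1:]
    if prefix != "b":
        raise ValueError("unsupported CID prefix")
    if rest != rest.lower():
        raise ValueError("base32 must use the lowercase alphabet")
    padded = rest.upper() + "=" * (-len(rest) % 8)
    try:
        return base64.b32decode(padded)
    except ValueError as err:  # binascii.Error subclasses ValueError
        raise ValueError("invalid base32 data") from err


def decode_cid(cid_bytes: bytes) -> dict:
    """Decode CID bytes into version, codec, hash type, size, and digest."""
    if len(cid_bytes) < 4:
        raise ValueError("CID is too short")
    version, codec, hash_type, hash_size = cid_bytes[:4]
    if version != 1:
        raise ValueError("unsupported version")
    if codec not in (0x55, 0x71):  # raw or DRISL
        raise ValueError("unsupported codec")
    if hash_type != 0x12:  # SHA-256
        raise ValueError("unsupported hash type")
    if hash_size != 32:
        raise ValueError("unsupported hash size")
    digest = cid_bytes[4:36]
    if len(digest) < 32:
        raise ValueError("digest is truncated")
    return {"version": version, "codec": codec, "hash_type": hash_type,
            "hash_size": hash_size, "digest": digest}


example = "bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4"
print(decode_cid(parse_cid_string(example))["digest"].hex())
```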
96 |
106 | You don't need to understand IPFS in order to use DASL. This section is for informational
107 | purposes only.
108 |
109 |
110 | DASL CIDs are a strict subset of IPFS CIDs
111 | with the following properties:
112 |
113 |
114 |
Only modern CIDv1 CIDs are used, not legacy CIDv0.
115 |
116 | Only the lowercase base32 multibase encoding (the b prefix) is used for human-readable
117 | (and subdomain-usable) string encoding.
118 |
119 |
120 | Only the raw binary multicodec (0x55) and dag-cbor multicodec (0x71), with the
121 | latter used only for [[drisl]]-conformant DAGs of CBOR objects.
122 |
123 |
Only SHA-256 (0x12) for the hash function.
124 |
125 | The CID isn't the boss of anyone, but the expectation is that, regardless of size, resources
126 | should not be "chunked" into a DAG or Merkle tree (as historically done with UnixFS canonicalization
127 | in IPFS systems) but rather hashed in their entirety and content-addressed directly. That being
128 | said, a DASL CID can point to a piece of [[drisl]] metadata that describes this kind of
129 | chunking, if needed. (A separate specification may be added for that.)
130 |
131 |
132 | This set of options has the added advantage that all the aforementioned single-byte prefixes require no
133 | additional varint processing or byte-fiddling.
134 |
11 | DRISL is a serialization format that is deterministic (so that the same
12 | data will have the same CID) and that features native support for using
13 | CIDs as links.
14 | It is based on CBOR, using a narrow profile of CBOR's "Core" featureset called "[[cborc-42]]", specified formally in this IETF document.
15 |
16 |
17 |
18 |
Introduction
19 |
20 | Deterministic encodings that produce the same stream of bytes for any
21 | given data with the same semantics are particularly useful in a content-addressed
22 | context. DRISL provides that, compactly encoded as CBOR. Additionally, it supports CBOR's Tag 42 to compactly encode CIDs ([[cid]]) as bytestreams. This CID can be used for content-addressed linking
23 | between DRISL documents (such as MASL documents), or to resources (best wrapped in MASL documents where renderability as web documents or web apps is desired).
24 |
25 |
DRISL does not fork CBOR, CDE, or dCBOR ([[cbor]], [[cde]], [[dcbor]]), but it is a subset of features defined in CBOR "Core" ([[cborc]]), first defined in the earliest CBOR RFC and largely unaffected by refinements made since.
Any decoder for any version of CBOR since v1 will be able to read DRISL
content as conformant CBOR, and with careful enough configuration of a powerful CBOR library, and in some cases pre-processing of the data, DRISL can be encoded anywhere CBOR can.
29 |
30 |
31 |
32 |
Format
33 |
34 | DRISL is an application profile of CBOR ([[cbor]]) that mostly subsets the more established [[cborc]] profile, with the following additional
35 | constraints:
36 |
37 |
38 |
39 | Implementations MUST support Tag 42 in the CBOR
40 | Tag registry; this tag compactly encodes CIDs as bytestrings, as specified below.
41 |
42 |
43 | Implementations MUST reject all other tags, including any of the tags
44 | listed in the section
45 | 3.4 of RFC 8949.
46 |
47 |
48 | Implementations MUST reject map keys that are not strings.
49 |
50 |
Floating-point numbers MUST always be encoded as 64-bit IEEE 754 binary floating-point values, never as "half-precision" (16-bit, major type 7, additional information 25) or "single-precision" (32-bit, major type 7, additional information 26) values.
NOTE: Completely avoiding floating-point numbers is RECOMMENDED to minimize interoperability and tooling issues.
57 |
58 |
59 | Even where floating-point numbers are used, most of the IEEE 754 "special" floating points (infinity, negative infinity, minimal NaN, and NaN with payloads) MUST NOT be encoded.
60 | Negative zero is the only allowed special floating point.
61 |
62 |
63 |
64 |
65 | Indefinite-length arrays (and the "break" code making them usable, in major type 7) are not allowed.
66 |
67 |
68 | Similarly, indefinite, incomplete, or streaming CBOR cannot be hashed and thus cannot be referenced by CID; for this reason, DRISL can only encode finite, bounded documents and resources.
69 |
70 |
71 | Concatenation of DRISL objects is generally discouraged and incurs both performance and interoperability risks.
72 |
73 |
74 | Note that DRISL objects cannot be streamed as CBOR streams (defined in RFC 8742) except in MIME-type aware contexts, as per the CBOR streams specification.
75 |
76 |
77 | Applications are discouraged from handling concatenated DRISL objects or appending extra bytes of any kind to a DRISL object in memory or across interfaces, as doing so breaks the DRISL-wide assumption that each CID refers to one complete, discrete, and valid CBOR object, and that DRISL systems only ever will be expected to handle such objects.
78 |
79 |
80 |
81 |
82 | Encoders MUST NOT encode any simple values other than true, false, and null (20, 21, and 22 in section 3.3 of [[rfc8949]]).
83 |
84 |
85 |
86 | All other requirements are as CBOR/c ([[cborc]]).
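As one concrete instance of these constraints, the floating-point rules can be sketched in a few lines of Python. The `encode_float` name is an assumption for this example, and the sketch covers floats only, not a full DRISL encoder.

```python
import math
import struct


def encode_float(value: float) -> bytes:
    """Encode a float the way the constraints above require.

    Always uses the 64-bit form (initial byte 0xfb: major type 7,
    additional information 27) and refuses NaN and the infinities.
    """
    if math.isnan(value) or math.isinf(value):
        raise ValueError("NaN and infinities must not be encoded")
    return b"\xfb" + struct.pack(">d", value)


print(encode_float(1.5).hex())   # fb3ff8000000000000
print(encode_float(-0.0).hex())  # fb8000000000000000
```

Note that 1.5 would fit losslessly in a half-precision float, but the 64-bit form is still used, which is what keeps the output deterministic without a "shortest float" search.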
87 |
88 |
89 |
90 |
CIDs in CBOR
91 |
92 | CIDs in CBOR are stored in binary format, as CBOR bytestrings under custom tag 42. For historical reasons the null
93 | byte 0x00 is prepended to the binary CID before storing in CBOR.
94 |
95 |
96 | For more information, see the [[cbor-tag42]] appendix to the [[drisl]] specification.
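On the wire, this comes down to a fixed five-byte prefix followed by the CID bytes. Here is a minimal Python sketch, with a hypothetical function name, that assumes a 36-byte DASL CID as input:

```python
def encode_cid_link(cid_bytes: bytes) -> bytes:
    """Wrap binary CID bytes as a CBOR tag 42 bytestring."""
    if len(cid_bytes) != 36:
        raise ValueError("expected a 36-byte DASL CID")
    payload = b"\x00" + cid_bytes  # historical 0x00 prefix, 37 bytes total
    return bytes([
        0xD8, 42,            # tag (major type 6), tag number 42 in one byte
        0x58, len(payload),  # bytestring (major type 2), one-byte length
    ]) + payload


link = encode_cid_link(bytes([0x01, 0x55, 0x12, 0x20]) + bytes(32))
print(link[:5].hex())  # d82a582500
```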
97 |
98 |
99 |
100 |
Conversion with JSON
101 |
102 | JSON lacks a native way to encode tag 42 for CIDs ([[cbor-tag42]], [[cid]]). Historically, there have
103 | been different conventions to represent CIDs in JSON. For example, DAG-JSON, part of IPLD, used
104 | an object with a single / key pointing to the stringified CID.
105 |
106 |
The AT Protocol uses an object with a $link key pointing to the stringified DASL CID, in the form {"$link": "<CID string>"}.
108 |
117 | This specification recommends that implementations default to the AT Protocol $link
118 | convention, but may offer the option to support DAG-JSON or other conventions as well.
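A minimal Python sketch of the $link convention follows. The helper names are assumptions, and a real converter would also need to walk nested structures and validate the CID string.

```python
import json


def cid_to_json_link(cid: str) -> dict:
    """Represent a CID string using the $link convention."""
    return {"$link": cid}


def json_link_to_cid(value):
    """Return the CID string if value is a $link object, otherwise None."""
    if (isinstance(value, dict) and set(value) == {"$link"}
            and isinstance(value["$link"], str)):
        return value["$link"]
    return None


cid = "bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4"
text = json.dumps({"image": cid_to_json_link(cid)})
print(text)
print(json_link_to_cid(json.loads(text)["image"]))
```

Requiring `$link` to be the only key is a choice made in this sketch, so that ordinary objects which happen to contain a `$link` field among others are not mistaken for links.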
119 |
11 | RASL is a URL scheme used to identify content-addressed DASL resources
12 | along with a simple HTTP-based retrieval method.
13 |
14 |
15 |
20 |
21 |
Introduction
22 |
23 | Content-addressed resources are "self-certifying," which is to say that
24 | you don't need any external authority to certify that the content you
25 | have when you resolve the identifier is correct: because the identifier
26 | contains a hash, you can (and should) verify that you obtained the right
27 | content yourself ([[ipfs-principles]]). The identifier is enough to
28 | certify the content. This has several implications, but two are
29 | particularly relevant for this specification:
30 |
31 |
32 |
33 | When resolving a content-addressed identifier, you can obtain the content
34 | from anyone. It doesn't have to be the content's author. You can even
35 | obtain it from entirely untrusted sources — given that you can always
36 | certify it, you don't need to trust whoever gives it to you. As a result,
37 | the authority part of a URL — the part that can certify the content you
38 | get, which is the domain part in an https URL — is the
39 | CID itself ([[cid]]).
40 |
41 |
42 | Because it doesn't matter where you get content from, content-addressed
43 | URLs are inherently transport-independent. There are benefits to agreeing
44 | on transport (if only so that people can find one another's content)
45 | but as a client, if you know of several potential ways of obtaining a
46 | CID you are free to use whichever you prefer or to try several in
47 | whatever order.
48 |
49 |
50 |
51 | Taking these aspects into consideration, this specification defines a URL
52 | scheme in which the CID is the authority, along with optional hints of
53 | potential look-up locations, and defines a retrieval method but does not
54 | mandate that RASL retrieval rely on it.
55 |
56 |
57 |
The rasl URL Scheme
58 |
59 | RASL URLs look like this:
60 | rasl://bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4/?hint=berjon.com&hint=bsky.app.
61 | This breaks down into the following components:
62 |
63 |
64 |
65 | The rasl scheme. This is simply used as an entry point into RASL semantics.
66 |
67 |
68 | An authority, which is simply a DASL CID ([[cid]]), here bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4.
69 |
70 |
71 | A path (here just /) that is empty or / and that is expected
72 | to be ignored since RASL is only used for raw data retrieval.
73 |
74 |
75 | A query string part, that is parsed according to CGI rules and that contains
76 | zero or more hint entries, each of which is a host from which
77 | that CID can be obtained.
78 |
79 |
80 |
81 | Use the following steps to parse a RASL URL:
82 |
83 |
84 |
Accept a string url and parse it according to the URL Standard ([[url]]).
85 |
If that's a failure, return the failure.
86 |
Read the host part of the parsed URL and store that in cid.
87 |
If cid is not a valid CID ([[cid]]), return failure.
88 |
Select all the tuples in the query object associated with the URL ([[url]]) whose name is
exactly hint. Each value MUST match the syntax of a valid host for the
https scheme; values which do not match this syntax MUST be ignored
and removed from this list. Store the remaining values in hints. If there were none,
then hints is an empty array.
94 |
95 |
Return the URL's parts as well as cid and hints.
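The parsing steps above might look like this in Python. Several things here are approximations: `urllib.parse` is not an implementation of the URL Standard, the host check is a rough regular expression rather than the standard's host parser, and the explicit scheme check and all names are assumptions of this sketch.

```python
import base64
import re
from urllib.parse import parse_qsl, urlsplit

# Rough hostname check; an approximation of "valid host for https".
HOST_RE = re.compile(
    r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$",
    re.IGNORECASE,
)


def is_dasl_cid(text: str) -> bool:
    """Check that text is a well-formed DASL CID string."""
    if not text.startswith("b") or text != text.lower():
        return False
    body = text[1:]
    try:
        raw = base64.b32decode(body.upper() + "=" * (-len(body) % 8))
    except ValueError:
        return False
    return (len(raw) == 36 and raw[0] == 1
            and raw[1] in (0x55, 0x71) and raw[2:4] == b"\x12\x20")


def parse_rasl_url(url: str) -> dict:
    """Parse a RASL URL into its CID, hints, and path."""
    parts = urlsplit(url)
    if parts.scheme != "rasl":
        raise ValueError("not a rasl URL")
    cid = parts.hostname or ""
    if not is_dasl_cid(cid):
        raise ValueError("authority is not a valid DASL CID")
    hints = [value
             for name, value in parse_qsl(parts.query, keep_blank_values=True)
             if name == "hint" and HOST_RE.match(value)]
    return {"cid": cid, "hints": hints, "path": parts.path}


print(parse_rasl_url(
    "rasl://bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4/"
    "?hint=berjon.com&hint=bsky.app"
))
```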
96 |
97 |
98 |
99 |
Fetching RASL
100 |
101 | A user agent may retrieve a CID in whichever way it prefers. This section
102 | provides a simple standard for HTTP-based CID retrieval, to make it
103 | easy for authors to publish content to their own sites and have it
104 | retrieved, without having to worry about operating any infrastructure
105 | beyond the web server they already have.
106 |
107 |
108 | Use the following steps to fetch a RASL URL:
109 |
110 |
111 |
Accept a string url and parse it according to the steps to parse a RASL URL.
112 |
113 | Construct a request using cid from the url as well as hints that may
114 | be from the URL or from elsewhere (this is entirely up to you):
115 |
116 |
117 | For each hint, construct a request URL that is the concatenation of https://,
118 | the hint as host, /.well-known/rasl/, and the cid.
119 |
120 |
121 | Prepare the request such that it has a method of either GET or HEAD,
122 | that it is stateless (no cookies, no credentials of any kind), and that it uses no content
123 | negotiation.
124 |
125 |
126 |
127 |
128 | Fetch the requests. How these get prioritised is entirely up to the implementation. It
129 | is common to run them all in parallel and abort them with the first success response.
130 | Note that the .well-known path may redirect, so be ready to handle that.
131 | This makes it possible to create sites that are published the usual way and to have a RASL that
132 | is simply a redirect to the resource. So for instance, you may have an existing
133 | https://berjon.com/kitten.jpg the CID for which is
134 | bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4.
135 | This can be published as this RASL URL:
136 | rasl://bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4/?hint=berjon.com.
A client can retrieve it by constructing a request to this URL:
138 | https://berjon.com/.well-known/rasl/bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4.
139 | In turn, the latter may simply return a 307 redirect back to https://berjon.com/kitten.jpg.
140 | (Yes, this is HTTP with extra steps, but the extra steps get you self-certifying content.)
141 |
142 |
143 | If the response is a redirect but not a 307, the client SHOULD treat it as if it
144 | had been a 307 anyway.
145 |
146 |
147 | If none of the responses are successful, return failure.
148 |
149 |
150 | Set the response's media type to application/octet-stream. (The server should have
151 | done that already, but may not have done so, notably if it relied on a redirect.) The purpose
152 | of RASL is to retrieve data in ways that are independent of the server — any media type
153 | processing must therefore take place at another layer. Without this, we lose the self-certifying
154 | nature of the system. (Note that servers are encouraged to enforce that so as not to have their
155 | RASL endpoints used for general-purpose web serving, which can be a security vector depending on
156 | where the data being served came from.)
157 |
158 |
159 | Produce a CID for the retrieved data. If that CID does not match the requested cid,
160 | return failure.
161 |
162 |
163 | Return the data.
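The fetch steps can be sketched without committing to a particular HTTP client. In the Python below, `http_get` is a hypothetical callable that takes a URL and returns the response body as bytes, or `None` on failure; it is assumed to follow redirects and to send no cookies or credentials. The sketch tries hints one after another rather than in parallel, moves on to the next hint when a response fails verification, and reuses the codec of the requested CID when hashing, all of which are choices made for illustration.

```python
import base64
import hashlib
from typing import Callable, Iterable, List, Optional


def request_urls(cid: str, hints: Iterable[str]) -> List[str]:
    """Build one well-known request URL per hint."""
    return [f"https://{hint}/.well-known/rasl/{cid}" for hint in hints]


def codec_of(cid: str) -> int:
    """Read the codec byte out of a string CID."""
    body = cid[1:]
    raw = base64.b32decode(body.upper() + "=" * (-len(body) % 8))
    return raw[1]


def cid_for(data: bytes, codec: int) -> str:
    """Produce the DASL CID string for some retrieved bytes."""
    raw = bytes([0x01, codec, 0x12, 0x20]) + hashlib.sha256(data).digest()
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


def fetch_rasl(cid: str, hints: Iterable[str],
               http_get: Callable[[str], Optional[bytes]]) -> bytes:
    """Return data for cid from the first hint that serves matching bytes."""
    codec = codec_of(cid)
    for url in request_urls(cid, hints):
        data = http_get(url)
        if data is None:
            continue  # unsuccessful response, try the next hint
        if cid_for(data, codec) == cid:
            return data  # verified: the bytes hash to the requested CID
    raise LookupError("no hint returned data matching the CID")
```

The verification step is what makes it safe to accept data from any host in the hint list, including ones you have never heard of.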
164 |
165 |
166 |
167 |
168 |
169 |
RASL Pathing
170 |
171 | Implementations SHOULD ignore paths in RASL URLs. They may be used in a future
172 | iteration of this specification.
173 |
The CAR format offers a serialized representation of a set of content-addressed
resources in one single concatenated stream, alongside a header that describes
that content.
14 |
15 |
16 |
17 |
Introduction
18 |
The CAR format (Content Addressable aRchives) is used to store a series of
content-addressable objects as a sequence of bytes. It packages that stream of
objects with a header.
22 |
23 |
24 | Much of the content of this specification was initially developed as part
25 | of the IPLD project. This specification
26 | was developed based on demand from the community to have just the one simplified
27 | document. Note that a CARv2 specification was developed at some point to add
28 | support for an index trailer, but it met with limited adoption and so was not
29 | considered when bringing CAR into DASL.
30 |
31 |
32 |
33 |
Parsing CAR
34 |
35 | The CAR format is made of a Header followed by a Body. The Header is a length-prefixed
36 | chunk of DRISL ([[drisl]]) and the Body is a sequence of zero or more length-prefixed
37 | blocks that contain a tuple of a DASL CID ([[cid]]) which is always 36 bytes long and
38 | the data addressed by that CID.
39 | The length prefix in a CAR is encoded as an unsigned variable-length
40 | integer ([[varint]], a variant of LEB128).
41 | This integer specifies the number of remaining bytes, excluding
42 | the bytes used to encode the integer, but including the CID for Body blocks.
43 |
44 |
|------ Header -----| |------------------- Body -------------------|
45 | [ int | DRISL block ] [ int | CID | data ] [ int | CID | data ] …
46 |
47 |
48 | The steps to parse a CAR are:
49 |
50 |
51 |
52 | Accept a byte stream bytes that is consumed with every step
53 | that reads from it.
54 |
55 |
56 | Run the steps to parse a CAR header with bytes to obtain
57 | metadata.
58 |
59 |
60 | Set up array blocks and run these substeps:
61 |
62 |
If bytes is empty, terminate these substeps.
63 |
64 | Run the steps to parse a CAR block header with bytes
65 | to obtain cid and data size.
66 |
67 |
68 | Read data size bytes from bytes and store the
69 | result in data.
70 |
71 |
72 | Push an entry onto blocks containing cid,
73 | data size, and data.
74 |
75 |
Return to the beginning of these substeps.
76 |
77 |
78 |
79 | Return metadata and blocks.
80 |
81 |
82 |
83 | Note that the CAR header contains a near-arbitrary DRISL object that is to be
84 | treated as metadata ([[drisl]]). For historical reasons, there are two
85 | constraints on the header:
86 |
87 |
88 |
89 | The object MUST contain a version map entry, the value of which
90 | is always integer-type 1. Version numbers in data formats are
91 | an anti-pattern, and as a result this number is guaranteed never to change.
92 |
93 |
94 | The object MUST contain a roots entry, which MUST be of type
95 | array. It MAY be empty, but if it isn't then it must be an array of CIDs
96 | encoded using tag 42 ([[cid]]). A CAR can be used
97 | to contain one or more DAGs of [[drisl]] content and the purpose of the
98 | roots is to list one or more roots for those DAGs. The array
99 | may be empty if you do not care about encoding DAGs.
100 |
101 |
102 |
103 | Some implementations will only return version and roots,
104 | but it is RECOMMENDED that they make the entire metadata object
105 | available. A best practice for authors is to use the metadata
106 | to capture MASL content, which is able to provide metadata and a pathing
107 | mapping for the entire content of the CAR stream if needed ([[masl]]).
108 |
109 |
110 | The steps to parse a CAR header are:
111 |
112 |
113 |
Accept a byte stream bytes.
114 |
Read an unsigned varint length from bytes ([[varint]]).
115 |
If length is 0, throw an error.
116 |
117 | Read length bytes from bytes and decode them as
118 | DRISL ([[drisl]]) into metadata. If metadata is
119 | not a map, throw an error.
120 |
121 |
122 | If metadata does not have a version key entry
123 | with integer value 1, throw an error. Otherwise, store
124 | version in version.
125 |
126 |
127 | If metadata does not have a roots key entry
128 | that is an array, or if that array contains anything other than DASL
129 | CIDs, throw an error. Otherwise, store roots in
130 | roots.
131 |
132 |
133 | Return metadata. (For implementations that only report
134 | version and roots, return those.)
135 |
136 |
137 |
138 | After its header, CAR contains a series of blocks each of which is
139 | length-prefixed and has a small header capturing a CID followed by
140 | the block's body data.
141 |
142 |
143 | The steps to parse a CAR block header are:
144 |
145 |
146 |
Accept a byte stream bytes.
147 |
Read an unsigned varint length from bytes ([[varint]]).
148 |
If length is 0, throw an error.
149 |
150 | Read a CID ([[cid]]) from bytes and store it in cid.
151 | Note: the length of the CID is always 36 bytes.
152 |
153 |
Set data size to length minus 36 (the CID length).
154 |
155 | Return data size and cid.
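Putting the framing together, here is a Python sketch of reading a CAR stream. It leaves the header as undecoded DRISL bytes, since decoding and validating the metadata needs a DRISL decoder that is out of scope here, and it does not verify that each CID matches its data. The function names, the 10-byte varint cap, and the extra check that a block is at least as long as a CID are assumptions of this sketch.

```python
import io
from typing import BinaryIO, List, Optional, Tuple

CID_LENGTH = 36


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read an unsigned LEB128 varint; return None at a clean end of stream."""
    result, shift = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise ValueError("truncated varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7
        if shift > 63:
            raise ValueError("varint is too long")


def read_exact(stream: BinaryIO, size: int) -> bytes:
    """Read exactly size bytes or fail."""
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of stream")
    return data


def parse_car(stream: BinaryIO) -> Tuple[bytes, List[Tuple[bytes, bytes]]]:
    """Return the raw header bytes and a list of (cid bytes, data) blocks."""
    header_length = read_varint(stream)
    if not header_length:  # None (empty stream) or 0
        raise ValueError("missing or empty CAR header")
    header = read_exact(stream, header_length)  # DRISL, left undecoded
    blocks = []
    while True:
        length = read_varint(stream)
        if length is None:
            break  # no more blocks
        if length < CID_LENGTH:  # also covers the length == 0 error case
            raise ValueError("block is too short to hold a CID")
        cid = read_exact(stream, CID_LENGTH)
        data = read_exact(stream, length - CID_LENGTH)
        blocks.append((cid, data))
    return header, blocks
```

A caller would typically pass an open binary file or an `io.BytesIO`, then decode the header and check each block's CID in a separate step.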
156 |
157 |
158 |
159 |
160 |
Additional Considerations
161 |
162 |
Conformance
163 |
164 | A CAR stream must only feature DASL CIDs.
165 |
166 |
167 | A CAR stream must have CIDs that match the data body that follows
168 | them. A CAR implementation should verify that CIDs match block body data, though
169 | it may delegate verification to other components. (Keep in mind that not
170 | verifying at all negates the value of content addressing.)
171 |
172 |
173 | A CAR stream's stated roots must match CIDs contained in the Body.
174 | However, implementations frequently operate in a streaming fashion such that
175 | they have no way of knowing whether a CAR stream conforms to this
176 | requirement before having processed the entire stream. Checking
177 | correctness with respect to this requirement may therefore be more
178 | readily performed via a warning (at end of processing) or a dedicated
179 | validator.
180 |
181 |
182 |
183 |
Determinism
184 |
185 | Deterministic CAR creation is not covered by this specification. However, deterministic
186 | generation of a CAR from a given graph is possible and is relied upon by certain uses of
187 | the format, most notably, Filecoin.
188 | dCAR may be the topic of a future specification.
189 |
190 |
191 | Care regarding the ordering of the roots array in the Header and avoidance
192 | of duplicate blocks may also be required for strict determinism.
193 |
194 |
195 |
196 |
Security & Verifiability
197 |
The roots specified by the Header of a CAR are expected to appear somewhere in its Body section.
However, there is no requirement that the roots define entire DAGs, nor that all blocks
in a CAR must be part of DAGs described by the root CIDs in the Header. Therefore, the
roots must not be used alone to determine or differentiate the contents of a CAR.
202 |
203 |
The CAR format contains no internal means, beyond the blocks and their CIDs, to verify
or differentiate contents. Where such a requirement exists, this must be performed
externally, for example by creating a digest of the entire CAR and referring to it using a CID.
207 |
35 | DASL ("dazzle") is a small set of simple, standard primitives for working with content-addressed,
36 | linked data. It builds on content addressing, a proven approach used in Git and
37 | IPFS to
38 | create reliable content identifiers (known as CIDs) through cryptographic hashing. Content addressing
39 | enables robust data integrity checks and efficient networking: systems can verify they received exactly
40 | what they asked for and avoid downloading the same content twice. The linked data part lets you link to
41 | stuff by its hash. You can build very big graphs with these primitives, such as the graph behind Bluesky.
42 |
43 |
44 | We call DASL "data-addressed" because it supports a data serialization component that makes
45 | content-addressing sweet and easy when
46 | working with data. The design is inspired by subcomponents of the
47 | IPFS universe, but simplified to improve
48 | interoperability, decrease costs, and work well with the web. More specifically, our priorities are:
49 |
50 |
51 |
pave the cowpaths: we focus on supporting what people trying to solve real-world
problems actually use. This takes precedence over any consideration of engineering ideals or theoretical purity.
We're retconning the spec to what people actually use and implement, as it should be.
55 |
56 |
57 | extensibility vs optionality: extensibility is important for long-lived distributed
58 | systems, because the world will happen and you will need to change. But introducing optionality
59 | reduces interoperability and increases cost of both implementation and adoption. So rather than
60 | require support for many options now, we have extension points now but deliberately don't use their
61 | full range.
62 |
63 |
64 | don't make me think: you don't want to be thinking about content addressing. You
65 | want to grab this off the shelf and have something that works out of the box. Nothing weird, no
66 | impedance mismatch with the systems you know and love (or maybe know and hate, but whatever, it
67 | just works).
68 |
69 |
70 | lightweight loading: some people like JavaScript, others don't. We don't care, we
71 | just want things that work. What's certain is that you can't ignore it and be relevant. The ability
72 | to ship small code to the browser is critical.
73 |
74 |
75 | Unix philosophy: all of our specs are tiny and meant to compose together in simple
76 | ways that can be implemented independently from one another.
77 |
78 |
79 |
80 | This is intended to work for the community, to grow support for what we need. If you have thoughts, don't
81 | be shy and submit an issue! There are no stupid questions, so
82 | don't assume everyone else has context that you don't. If this page isn't enough to understand DASL,
83 | then we're the ones who screwed up.
84 |
85 |
86 |
87 |
How
88 |
89 | This section describes how to use DASL patterns. It's work in progress!
90 |
91 |
92 |
93 |
Implementations
94 |
95 | DASL is a strict subset of IPFS CIDs and IPLD, so existing IPFS and IPLD implementations will just
96 | read DASL CIDs and the MASL documents they point to without so much as a hiccup. Some implementations also specifically target a DASL subset.
97 |
98 |
Here are some implementations that partially or fully support DASL:
99 |
100 |
101 | atcute (JS/TS): a collection of lightweight
102 | packages to make working with Bluesky and the ATmosphere easy.
103 |
150 | DRISL is a profile of deterministic CBOR used to ensure that the same data will have the same CID;
151 | it features native support for using binary CIDs as compact links between documents.
152 |
157 | The CAR format offers a serialized representation of a set of content-addressed
158 | resources in one single concatenated stream, alongside a header that describes
159 | that content.
160 |
165 | MASL is a type of CBOR metadata document that is designed to work well with content-addressed
166 | and decentralised systems, to enable fully self-contained, self-certified content
167 | distribution.
168 |
180 | This extends DASL CIDs with a new hash type that works better for large files but isn't
181 | available by default in browsers, and therefore not an appropriate option in most
182 | situations.
183 |
188 | DASL components are designed to be small and to the extent possible self-contained.
189 | Because of that, despite having roots in IPFS, no DASL component depends on
190 | traditional IPFS implementations. In many cases, systems produced to work with
191 | DASL will interoperate with IPFS, but DASL's independence means that there can
192 | occasionally be impedance mismatch. This document is a non-normative list of
193 | things to consider when using DASL with IPFS systems, particularly with ones based
194 | on the DHT and Bitswap.
195 |
196 |
197 |
198 |
209 |
210 |
211 |
212 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/drisl.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
16 |
17 |
18 |
19 |
20 | DRISL — Deterministic Representation for Interoperable Structures & Links
21 |
22 |
24 | DRISL is a serialization format that is deterministic (so that the same
25 | data will have the same CID) and that features native support for using
26 | CIDs as links.
27 | It is based on CBOR, using a narrow profile of CBOR's "Core" featureset called "[cborc-42]", specified formally in this IETF document.
28 |
29 |
30 |
31 |
32 |
Introduction
33 |
34 | Deterministic encodings that produce the same stream of bytes for any
35 | given data with the same semantics are particularly useful in a content-addressed
36 | context. DRISL provides that, compactly encoded as CBOR. Additionally, it supports CBOR's Tag 42 to compactly encode CIDs ([cid]) as bytestrings. These CIDs can be used for content-addressed linking
37 | between DRISL documents (such as MASL documents), or to resources (best wrapped in MASL documents where renderability as web documents or web apps is desired).
38 |
39 |
40 | DRISL does not fork CBOR, CDE, or dCBOR ([cbor], [cde], [dcbor]), but it is a subset of features defined in CBOR "Core" ([cborc]), first defined in the earliest CBOR RFC and largely unaffected by refinements made since.
41 | Any decoder for any version of CBOR since v1 will be able to read DRISL
42 | content as conformant CBOR, and with enough careful configuration of a powerful CBOR library and in some cases pre-processing of the data, DRISL can be encoded anywhere CBOR can.
43 |
44 |
45 |
46 |
Format
47 |
48 | DRISL is an application profile of CBOR ([cbor]) that mostly subsets the more established [cborc] profile, with the following additional
49 | constraints:
50 |
51 |
52 |
53 | Implementations MUST support Tag 42 in the CBOR
54 | Tag registry; this tag compactly encodes CIDs as bytestrings, as specified below.
55 |
56 |
57 | Implementations MUST reject all other tags, including any of the tags
58 | listed in the section
59 | 3.4 of RFC 8949.
60 |
61 |
62 | Implementations MUST reject map keys that are not strings.
63 |
64 |
65 | Floating-point numbers MUST always be encoded as 64-bit IEEE 754 binary floating-point values, never as "half-precision" (16-bit, major 7-25) or "single-precision" (32-bit, major 7-26) CBOR values.
66 | NOTE: It is RECOMMENDED that users avoid encoding floating-point numbers as much as possible
67 | to minimize interoperability and tooling issues.
68 |
69 |
70 | Completely avoiding floating-point numbers is RECOMMENDED to minimize interoperability and tooling issues.
71 |
72 |
73 | Even where floating-point numbers are used, most of the IEEE 754 "special" floating points (infinity, negative infinity, minimal NaN, and NaN with payloads) MUST NOT be encoded.
74 | Negative zero is the only allowed special floating point.
75 |
76 |
77 |
78 |
79 | Indefinite-length arrays (and the "break" code making them usable, in major type 7) are not allowed.
80 |
81 |
82 | Similarly, indefinite, incomplete, or streaming CBOR cannot be hashed and thus cannot be referenced by CID; for this reason, DRISL can only encode finite, bounded documents and resources.
83 |
84 |
85 | Concatenation of DRISL objects is generally discouraged and incurs both performance and interoperability risks.
86 |
87 |
88 | Note that DRISL objects cannot be streamed as CBOR streams (defined in RFC 8742) except in MIME-type aware contexts, as per the CBOR streams specification.
89 |
90 |
91 | Applications are discouraged from handling concatenated DRISL objects or appending extra bytes of any kind to a DRISL object in memory or across interfaces, as doing so breaks the DRISL-wide assumption that each CID refers to one complete, discrete, and valid CBOR object, and that DRISL systems will only ever be expected to handle such objects.
92 |
93 |
94 |
95 |
96 | Encoders MUST NOT encode any simple values other than true, false, and null (20, 21, and 22 in section 3.3 of [rfc8949]).
97 |
98 |
99 |
100 | All other requirements are as CBOR/c ([cborc]).
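
The floating-point constraints above can be illustrated with a small sketch. It is not a DRISL encoder: it only shows one way an encoder might emit a single float under the rules listed here, and the function name `encode_drisl_float` is hypothetical.

```python
import math
import struct


def encode_drisl_float(value: float) -> bytes:
    """Encode one float as a 64-bit CBOR float (hypothetical helper)."""
    # Infinities and NaNs are not allowed; negative zero is permitted.
    if math.isnan(value) or math.isinf(value):
        raise ValueError("NaN and infinities must not be encoded")
    # 0xfb is major type 7 with additional information 27 (64-bit float),
    # followed by the IEEE 754 binary64 value in big-endian byte order.
    return b"\xfb" + struct.pack(">d", value)
```

A value such as `1.5` would fit in a half-precision float, but under this profile it is still written with the full nine bytes.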
101 |
102 |
103 |
104 |
CIDs in CBOR
105 |
106 | CIDs in CBOR are stored in binary format, as CBOR bytestrings under custom tag 42. For historical reasons the null
107 | byte 0x00 is prepended to the binary CID before storing in CBOR.
108 |
109 |
110 | For more information, see the [cbor-tag42] appendix to the [drisl] specification.
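
As a rough sketch of the byte layout described above, the following wraps a binary CID as a tag 42 item. It assumes a 36-byte DASL CID (the length stated in the CAR specification); the function name `encode_cid_link` is hypothetical.

```python
def encode_cid_link(cid: bytes) -> bytes:
    """Wrap a binary CID as a CBOR tag 42 item (hypothetical helper)."""
    if len(cid) != 36:
        raise ValueError("expected a 36-byte DASL CID")
    # The historical 0x00 byte is prepended before storing the CID.
    payload = b"\x00" + cid
    # 0xd8 0x2a: tag with a one-byte tag number, here 42.
    # 0x58 <n>: byte string with a one-byte length, here 37.
    return b"\xd8\x2a" + b"\x58" + bytes([len(payload)]) + payload
```

The result is always 41 bytes long for a 36-byte CID: two bytes of tag, two bytes of bytestring header, the `0x00` prefix, and the CID itself.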
111 |
112 |
113 |
114 |
Conversion with JSON
115 |
116 | JSON lacks a native way to encode tag 42 for CIDs ([cbor-tag42], [cid]). Historically, there have
117 | been different conventions to represent CIDs in JSON. For example, DAG-JSON, part of IPLD, used
118 | an object with a single / key pointing to the stringified CID.
119 |
120 |
121 | The AT Protocol uses an object with a $link key pointing to the stringified DASL CID:
122 |
130 | This specification recommends that implementations default to the AT Protocol $link
131 | convention, but may offer the option to support DAG-JSON or other conventions as well.
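
A minimal sketch of the `$link` convention follows. The helper names are hypothetical, and the CID string is the example value used elsewhere in these specifications.

```python
import json
from typing import Any, Optional


def cid_to_json_link(cid: str) -> dict:
    """Represent a stringified CID as a $link object (hypothetical helper)."""
    return {"$link": cid}


def json_link_to_cid(value: Any) -> Optional[str]:
    """Return the CID string if value is a $link object, else None."""
    if isinstance(value, dict) and set(value) == {"$link"}:
        link = value["$link"]
        if isinstance(link, str):
            return link
    return None


example = cid_to_json_link(
    "bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4"
)
serialized = json.dumps({"image": example})
```

An implementation that also supports DAG-JSON would apply the same pattern with a `/` key in place of `$link`.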
132 |
24 | RASL is a URL scheme used to identify content-addressed DASL resources
25 | along with a simple HTTP-based retrieval method.
26 |
27 |
28 |
29 |
34 |
35 |
Introduction
36 |
37 | Content-addressed resources are "self-certifying," which is to say that
38 | you don't need any external authority to certify that the content you
39 | have when you resolve the identifier is correct: because the identifier
40 | contains a hash, you can (and should) verify that you obtained the right
41 | content yourself ([ipfs-principles]). The identifier is enough to
42 | certify the content. This has several implications, but two are
43 | particularly relevant for this specification:
44 |
45 |
46 |
47 | When resolving a content-addressed identifier, you can obtain the content
48 | from anyone. It doesn't have to be the content's author. You can even
49 | obtain it from entirely untrusted sources — given that you can always
50 | certify it, you don't need to trust whoever gives it to you. As a result,
51 | the authority part of a URL — the part that can certify the content you
52 | get, which is the domain part in an https URL — is the
53 | CID itself ([cid]).
54 |
55 |
56 | Because it doesn't matter where you get content from, content-addressed
57 | URLs are inherently transport-independent. There are benefits to agreeing
58 | on transport (if only so that people can find one another's content)
59 | but as a client, if you know of several potential ways of obtaining a
60 | CID you are free to use whichever you prefer or to try several in
61 | whatever order.
62 |
63 |
64 |
65 | Taking these aspects into consideration, this specification defines a URL
66 | scheme in which the CID is the authority, along with optional hints of
67 | potential look-up locations, and defines a retrieval method but does not
68 | mandate that RASL retrieval rely on it.
69 |
70 |
71 |
The rasl URL Scheme
72 |
73 | RASL URLs look like this:
74 | rasl://bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4/?hint=berjon.com&hint=bsky.app.
75 | This breaks down into the following components:
76 |
77 |
78 |
79 | The rasl scheme. This is simply used as an entry point into RASL semantics.
80 |
81 |
82 | An authority, which is simply a DASL CID ([cid]), here bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4.
83 |
84 |
85 | A path (here just /) that is empty or / and that is expected
86 | to be ignored since RASL is only used for raw data retrieval.
87 |
88 |
89 | A query string part, that is parsed according to CGI rules and that contains
90 | zero or more hint entries, each of which is a host from which
91 | that CID can be obtained.
92 |
93 |
94 |
95 | Use the following steps to parse a RASL URL:
96 |
97 |
98 |
Accept a string url and parse it according to the URL Standard ([url]).
99 |
If that's a failure, return the failure.
100 |
Read the host part of the parsed URL and store that in cid.
101 |
If cid is not a valid CID ([cid]), return failure.
102 |
103 | Select all the tuples in the query object associated with the URL ([url]) whose name is
104 | exactly hint. Each value MUST match the syntax of a valid host for the
105 | https scheme and values which do not match this syntax MUST be ignored
106 | and removed from this list. Store the remaining values in hints. If there were none
107 | then hints is an empty array.
108 |
109 |
Return the URL's parts as well as cid and hints.
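
The parsing steps above can be sketched as follows. This is an approximation under several assumptions: Python's `urllib.parse` follows RFC 3986 rather than the URL Standard; the host check is a simplified pattern, not the full https host grammar; and the CID check (base32 with a `b` prefix, version 1, raw or DRISL codec, SHA-256) reflects the DASL CID specification rather than anything stated in this section. The names `RaslUrl`, `is_dasl_cid`, and `parse_rasl_url` are hypothetical.

```python
import base64
import re
from typing import List, NamedTuple
from urllib.parse import parse_qsl, urlsplit

# Simplified host syntax (an assumption): dot-separated letter/digit/hyphen labels.
_LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
_HOST_RE = re.compile(rf"^{_LABEL}(?:\.{_LABEL})*$", re.IGNORECASE)


class RaslUrl(NamedTuple):
    cid: str
    hints: List[str]
    path: str


def is_dasl_cid(text: str) -> bool:
    """Check the assumed DASL CID shape: 'b' + base32 of 36 bytes."""
    if len(text) != 59 or not text.startswith("b") or text != text.lower():
        return False
    try:
        raw = base64.b32decode(text[1:].upper() + "======")
    except ValueError:
        return False
    # version 1, codec raw (0x55) or DRISL (0x71), sha-256 (0x12), 32 bytes.
    return raw[0] == 0x01 and raw[1] in (0x55, 0x71) and raw[2:4] == b"\x12\x20"


def parse_rasl_url(url: str) -> RaslUrl:
    parts = urlsplit(url)
    # Checking the scheme is an addition of this sketch, not a listed step.
    if parts.scheme != "rasl":
        raise ValueError("not a rasl URL")
    cid = parts.hostname or ""
    if not is_dasl_cid(cid):
        raise ValueError("host is not a valid DASL CID")
    # Keep only tuples named exactly "hint" whose value looks like a host.
    hints = [
        value
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name == "hint" and _HOST_RE.match(value)
    ]
    return RaslUrl(cid, hints, parts.path)
```

Hints that fail the host check are dropped silently, and a URL with no usable hints yields an empty list.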
110 |
111 |
112 |
113 |
Fetching RASL
114 |
115 | A user agent may retrieve a CID in whichever way it prefers. This section
116 | provides a simple standard for HTTP-based CID retrieval, to make it
117 | easy for authors to publish content to their own sites and have it
118 | retrieved, without having to worry about operating any infrastructure
119 | beyond the web server they already have.
120 |
121 |
122 | Use the following steps to fetch a RASL URL:
123 |
124 |
125 |
Accept a string url and parse it according to the steps to parse a RASL URL.
126 |
127 | Construct a request using cid from the url as well as hints that may
128 | be from the URL or from elsewhere (this is entirely up to you):
129 |
130 |
131 | For each hint, construct a request URL that is the concatenation of https://,
132 | the hint as host, /.well-known/rasl/, and the cid.
133 |
134 |
135 | Prepare the request such that it has a method of either GET or HEAD,
136 | that it is stateless (no cookies, no credentials of any kind), and that it uses no content
137 | negotiation.
138 |
139 |
140 |
141 |
142 | Fetch the requests. How these get prioritised is entirely up to the implementation. It
143 | is common to run them all in parallel and abort them with the first success response.
144 | Note that the .well-known path may redirect, so be ready to handle that.
145 | This makes it possible to create sites that are published the usual way and to have a RASL that
146 | is simply a redirect to the resource. So for instance, you may have an existing
147 | https://berjon.com/kitten.jpg the CID for which is
148 | bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4.
149 | This can be published as this RASL URL:
150 | rasl://bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4/?hint=berjon.com.
151 | A client can retrieve it by constructing a request to this URL:
152 | https://berjon.com/.well-known/rasl/bafkreifn5yxi7nkftsn46b6x26grda57ict7md2xuvfbsgkiahe2e7vnq4.
153 | In turn, the latter may simply return a 307 redirect back to https://berjon.com/kitten.jpg.
154 | (Yes, this is HTTP with extra steps, but the extra steps get you self-certifying content.)
155 |
156 |
157 | If the response is a redirect but not a 307, the client SHOULD treat it as if it
158 | had been a 307 anyway.
159 |
160 |
161 | If none of the responses are successful, return failure.
162 |
163 |
164 | Set the response's media type to application/octet-stream. (The server should have
165 | done that already, but may not have done so, notably if it relied on a redirect.) The purpose
166 | of RASL is to retrieve data in ways that are independent of the server — any media type
167 | processing must therefore take place at another layer. Without this, we lose the self-certifying
168 | nature of the system. (Note that servers are encouraged to enforce that so as not to have their
169 | RASL endpoints used for general-purpose web serving, which can be a security vector depending on
170 | where the data being served came from.)
171 |
172 |
173 | Produce a CID for the retrieved data. If that CID does not match the requested cid,
174 | return failure.
175 |
176 |
177 | Return the data.
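
The fetching steps above can be sketched as follows. The sketch makes some simplifying choices that the specification leaves open: it tries hints one after another rather than in parallel, relies on `urllib.request` following redirects by default, and moves on to the next hint when the data does not match rather than returning failure immediately. It also assumes a SHA-256 DASL CID in base32 form, as described in the CID specification. All function names are hypothetical.

```python
import base64
import hashlib
import urllib.request
from typing import Iterable, List, Optional


def rasl_request_urls(cid: str, hints: Iterable[str]) -> List[str]:
    """Build one well-known request URL per hint."""
    return [f"https://{hint}/.well-known/rasl/{cid}" for hint in hints]


def cid_digest(cid: str) -> bytes:
    """Extract the hash digest from a base32 CID string (assumed layout)."""
    body = cid[1:].upper()  # drop the 'b' multibase prefix
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    return raw[4:]  # skip version, codec, hash type, and hash length


def matches_cid(cid: str, data: bytes) -> bool:
    """Check that the SHA-256 of data equals the digest carried by cid."""
    return hashlib.sha256(data).digest() == cid_digest(cid)


def fetch_rasl(cid: str, hints: Iterable[str], timeout: float = 10.0) -> Optional[bytes]:
    """Return verified data for cid, or None if no hint provides it."""
    for url in rasl_request_urls(cid, hints):
        try:
            # A plain GET: no cookies or credentials are attached here.
            with urllib.request.urlopen(url, timeout=timeout) as response:
                data = response.read()
        except OSError:
            continue  # network or HTTP error: try the next hint
        if matches_cid(cid, data):
            return data
    return None
```

Because the data is verified against the CID after retrieval, the caller does not need to trust any of the hinted hosts.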
178 |
179 |
180 |
181 |
182 |
183 |
RASL Pathing
184 |
185 | Implementations SHOULD ignore paths in RASL URLs. They may be used in a future
186 | iteration of this specification.
187 |
24 | The CAR format offers a serialized representation of a set of content-addressed
25 | resources in one single concatenated stream, alongside a header that describes
26 | that content.
27 |
28 |
29 |
30 |
31 |
Introduction
32 |
33 | The CAR format (Content Addressable aRchives) is used to store a series of
34 | content-addressable objects as a sequence of bytes. It packages that stream of
35 | objects with a header.
36 |
37 |
38 | Much of the content of this specification was initially developed as part
39 | of the IPLD project. This specification
40 | was developed based on demand from the community to have just the one simplified
41 | document. Note that a CARv2 specification was developed at some point to add
42 | support for an index trailer, but it met with limited adoption and so was not
43 | considered when bringing CAR into DASL.
44 |
45 |
46 |
47 |
Parsing CAR
48 |
49 | The CAR format is made of a Header followed by a Body. The Header is a length-prefixed
50 | chunk of DRISL ([drisl]) and the Body is a sequence of zero or more length-prefixed
51 | blocks that contain a tuple of a DASL CID ([cid]) which is always 36 bytes long and
52 | the data addressed by that CID.
53 | The length prefix in a CAR is encoded as an unsigned variable-length
54 | integer ([varint], a variant of LEB128).
55 | This integer specifies the number of remaining bytes, excluding
56 | the bytes used to encode the integer, but including the CID for Body blocks.
57 |
58 |
|------ Header -----| |------------------- Body -------------------|
59 | [ int | DRISL block ] [ int | CID | data ] [ int | CID | data ] …
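
To make the length prefix concrete, here is a small sketch of unsigned LEB128 varint encoding and decoding. The function names are hypothetical, and the sketch does not enforce any maximum length that the varint specification may impose.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if value < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint from buf at offset; return (value, next_offset)."""
    value = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7
```

For example, a Body block holding a 36-byte CID and 264 bytes of data has a length of 300, which is encoded as the two bytes `0xac 0x02`.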
60 |
61 |
62 | The steps to parse a CAR are:
63 |
64 |
65 |
66 | Accept a byte stream bytes that is consumed with every step
67 | that reads from it.
68 |
69 |
70 | Run the steps to parse a CAR header with bytes to obtain
71 | metadata.
72 |
73 |
74 | Set up array blocks and run these substeps:
75 |
76 |
If bytes is empty, terminate these substeps.
77 |
78 | Run the steps to parse a CAR block header with bytes
79 | to obtain cid and data size.
80 |
81 |
82 | Read data size bytes from bytes and store the
83 | result in data.
84 |
85 |
86 | Push an entry onto blocks containing cid,
87 | data size, and data.
88 |
89 |
Return to the beginning of these substeps.
90 |
91 |
92 |
93 | Return metadata and blocks.
94 |
95 |
96 |
97 | Note that the CAR header contains a near-arbitrary DRISL object that is to be
98 | treated as metadata ([drisl]). For historical reasons, there are two
99 | constraints on the header:
100 |
101 |
102 |
103 | The object MUST contain a version map entry, the value of which
104 | is always integer-type 1. Version numbers in data formats are
105 | an anti-pattern, and as a result this number is guaranteed never to change.
106 |
107 |
108 | The object MUST contain a roots entry, which MUST be of type
109 | array. It MAY be empty, but if it isn't then it must be an array of CIDs
110 | encoded using tag 42 ([cid]). A CAR can be used
111 | to contain one or more DAGs of [drisl] content and the purpose of the
112 | roots is to list one or more roots for those DAGs. The array
113 | may be empty if you do not care about encoding DAGs.
114 |
115 |
116 |
117 | Some implementations will only return version and roots,
118 | but it is RECOMMENDED that they make the entire metadata object
119 | available. A best practice for authors is to use the metadata
120 | to capture MASL content, which is able to provide metadata and a pathing
121 | mapping for the entire content of the CAR stream if needed ([masl]).
122 |
123 |
124 | The steps to parse a CAR header are:
125 |
126 |
127 |
Accept a byte stream bytes.
128 |
Read an unsigned varint length from bytes ([varint]).
129 |
If length is 0, throw an error.
130 |
131 | Read length bytes from bytes and decode them as
132 | DRISL ([drisl]) into metadata. If metadata is
133 | not a map, throw an error.
134 |
135 |
136 | If metadata does not have a version key entry
137 | with integer value 1, throw an error. Otherwise, store
138 | version in version.
139 |
140 |
141 | If metadata does not have a roots key entry
142 | that is an array, or if that array contains anything other than DASL
143 | CIDs, throw an error. Otherwise, store roots in
144 | roots.
145 |
146 |
147 | Return metadata. (For implementations that only report
148 | version and roots, return those.)
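
The validation part of these steps can be sketched as follows, starting from metadata that has already been decoded from DRISL. The `Cid` class is a hypothetical stand-in for whatever object a DRISL decoder produces for tag 42 links, and `validate_car_header` is likewise a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Cid:
    """Hypothetical stand-in for a tag 42 link from a DRISL decoder."""
    raw: bytes


def validate_car_header(metadata: Any) -> Dict[str, Any]:
    """Check decoded CAR header metadata and return it unchanged."""
    if not isinstance(metadata, dict):
        raise ValueError("CAR header must be a map")
    version = metadata.get("version")
    # type() is used so that the boolean True is not accepted as 1.
    if type(version) is not int or version != 1:
        raise ValueError("CAR header version must be the integer 1")
    roots = metadata.get("roots")
    if not isinstance(roots, list):
        raise ValueError("CAR header roots must be an array")
    for root in roots:
        if not isinstance(root, Cid) or len(root.raw) != 36:
            raise ValueError("CAR header roots must contain only DASL CIDs")
    return metadata
```

Returning the whole metadata object, rather than only `version` and `roots`, keeps any additional entries available to the caller.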
149 |
150 |
151 |
152 | After its header, CAR contains a series of blocks each of which is
153 | length-prefixed and has a small header capturing a CID followed by
154 | the block's body data.
155 |
156 |
157 | The steps to parse a CAR block header are:
158 |
159 |
160 |
Accept a byte stream bytes.
161 |
Read an unsigned varint length from bytes ([varint]).
162 |
If length is 0, throw an error.
163 |
164 | Read a CID ([cid]) from bytes and store it in cid.
165 | Note: the length of the CID is always 36 bytes.
166 |
167 |
Set data size to length minus 36 (the CID length).
168 |
169 | Return data size and cid.
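
The block header steps, together with the Body loop from the steps to parse a CAR, can be sketched as follows. The sketch works on the Body only, so it assumes the Header has already been consumed. The check that the length is at least 36 is an addition of this sketch, and all names are hypothetical.

```python
import io
from typing import BinaryIO, List, Tuple

CID_LENGTH = 36  # DASL CIDs in a CAR are always 36 bytes long


def read_varint(stream: BinaryIO) -> int:
    """Read an unsigned LEB128 varint from a binary stream."""
    value = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise ValueError("truncated varint")
        value |= (chunk[0] & 0x7F) << shift
        if chunk[0] < 0x80:
            return value
        shift += 7


def parse_block_header(stream: BinaryIO) -> Tuple[int, bytes]:
    """Return (data size, cid) for the next block in the stream."""
    length = read_varint(stream)
    if length == 0:
        raise ValueError("block length must not be 0")
    if length < CID_LENGTH:
        raise ValueError("block too short to hold a CID")
    cid = stream.read(CID_LENGTH)
    if len(cid) != CID_LENGTH:
        raise ValueError("truncated CID")
    return length - CID_LENGTH, cid


def read_blocks(body: bytes) -> List[Tuple[bytes, int, bytes]]:
    """Parse a CAR Body into a list of (cid, data size, data) entries."""
    stream = io.BytesIO(body)
    blocks = []
    while stream.tell() < len(body):
        data_size, cid = parse_block_header(stream)
        data = stream.read(data_size)
        if len(data) != data_size:
            raise ValueError("truncated block data")
        blocks.append((cid, data_size, data))
    return blocks
```

A streaming implementation would yield each block as it is read instead of collecting them in a list.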
170 |
171 |
172 |
173 |
174 |
Additional Considerations
175 |
176 |
Conformance
177 |
178 | A CAR stream must only feature DASL CIDs.
179 |
180 |
181 | A CAR stream must have CIDs that match the data body that follows
182 | them. A CAR implementation should verify that CIDs match block body data, though
183 | it may delegate verification to other components. (Keep in mind that not
184 | verifying at all negates the value of content addressing.)
185 |
186 |
187 | A CAR stream's stated roots must match CIDs contained in the Body.
188 | However, implementations frequently operate in a streaming fashion such that
189 | they have no way of knowing whether a CAR stream conforms to this
190 | requirement before having processed the entire stream. Checking
191 | correctness with respect to this requirement may therefore be more
192 | readily performed via a warning (at end of processing) or a dedicated
193 | validator.
194 |
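A streaming implementation can defer the roots check to the end of processing, as suggested above. The sketch below uses the hypothetical name `missing_roots` and treats CIDs as any hashable values, such as the raw CID bytes.

```python
from typing import Hashable, Iterable, Set


def missing_roots(roots: Iterable[Hashable], block_cids: Iterable[Hashable]) -> Set[Hashable]:
    """Consume the block CIDs once and return the roots that never appeared."""
    pending = set(roots)
    for cid in block_cids:
        pending.discard(cid)
    return pending


# Demonstration with placeholder CID values and a generator standing in for a stream.
demo_missing = missing_roots(
    [b"root-a", b"root-b"],
    (cid for cid in [b"root-a", b"other"]),
)
```

A non-empty result would typically be surfaced as a warning once the whole stream has been read.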
195 |
196 |
197 |
Determinism
198 |
199 | Deterministic CAR creation is not covered by this specification. However, deterministic
200 | generation of a CAR from a given graph is possible and is relied upon by certain uses of
201 | the format, most notably, Filecoin.
202 | dCAR may be the topic of a future specification.
203 |
204 |
205 | Care regarding the ordering of the roots array in the Header and avoidance
206 | of duplicate blocks may also be required for strict determinism.
207 |
208 |
209 |
210 |
Security & Verifiability
211 |
212 | The roots specified by the Header of a CAR are expected to appear somewhere in its Body section.
213 | However, there is no requirement that the roots define entire DAGs, nor that all blocks
214 | in a CAR must be part of DAGs described by the root CIDs in the Header. Therefore, the
215 | roots must not be used alone to determine or differentiate the contents of a CAR.
216 |
217 |
218 | The CAR format contains no internal means, beyond the blocks and their CIDs, to verify
219 | or differentiate contents. Where such a requirement exists, this must be performed
220 | externally, such as by creating a digest of the entire CAR (and referring to it using a CID).
221 |
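One way to refer to an entire CAR by a digest is sketched below. It assumes a raw-codec CID carrying a SHA-256 digest, rendered as lowercase base32 with a `b` prefix. The function name `car_digest_cid` is hypothetical, and reading the whole CAR into memory is a simplification.

```python
import base64
import hashlib


def car_digest_cid(car_bytes: bytes) -> str:
    """Build a string CID (assumed layout: 0x01, raw 0x55, SHA-256 0x12, length 0x20)."""
    digest = hashlib.sha256(car_bytes).digest()
    binary = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    encoded = base64.b32encode(binary).decode("ascii").lower().rstrip("=")
    return "b" + encoded


# Demonstration on placeholder bytes standing in for a complete CAR file.
demo_car_cid = car_digest_cid(b"placeholder CAR bytes")
```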
11 | MASL is a CBOR-based metadata system that is designed to work well with content-addressed
12 | and decentralised systems, to enable fully self-contained, self-certified content
13 | distribution.
14 |
15 |
16 |
17 |
Introduction
18 |
19 | Anywhere you have resources that will be deployed in real-world systems, the potential
20 | metadata needs of those systems are effectively unbounded. This is particularly true of
21 | decentralised systems that need to exhibit "web-like" behaviour: in order to have
22 | reproducible and safe execution when content is sourced from arbitrary origins, it is
23 | necessary to have an equivalent to HTTP headers that is as verifiable as the content
24 | itself and that does not suffer from the lossy behaviour that can occur when one treats the web as a file system (which it is not).
25 |
26 |
27 | Designing or constraining a syntax for all metadata needs would be hubris and madness.
28 | Instead, this document tries to minimally constrain applications while illustrating "where to stick"
29 | that metadata, as there are so few layers and hiding places in the DASL system.
30 |
31 |
32 |
33 |
How to use a MASL document
34 |
35 | The recommended structure for DASL metadata is to insert a [[drisl]] document "between" each
36 | CID and its resource(s), essentially using DRISL to encode headers for one or more CID-addressed resources.
37 | To do this, simply replace the CID of a resource with the CID of a DRISL object that contains a
38 | top-level (tag 42) property named src.
39 | To bundle a set of CIDs in a logical unit, drop the top-level src and replace it with a mapping of paths-from-domain-root to DRISL metadata objects including a src property containing the resource's CID.
40 |
41 |
42 | It is preferable to nest any metadata in a top-level object to namespace your own metadata
43 | vocabulary (inside an object named, for example, `my-cool-project-v1`) rather than using opaque UUIDs, version-bits inserted into values, or magic numbers.
44 | There are a few reserved words at the top-level, but even these can be avoided by nesting any conflicts as needed.
45 |
46 |
47 | There are many metadata standards that can be embedded in this way to facilitate the preservation of
48 | metadata at ingress as well as translatability. For example, the IPFS-based storage system Storacha
49 | has a robust CID-based metadata system called
50 | content credentials which includes UCAN-based
51 | permissioning, inclusion attestations, and CID "equivalences," e.g. mappings of multiple IPFS CIDs equivalent to each other and/or to a given DASL CID.
52 |
53 |
54 |
Using MASL with CAR
55 |
56 | CAR files ([[car]]) have a space reserved for metadata in their header.
57 | A MASL metadata document can occupy that metadata header-space,
58 | and the variant using a resources map is particularly well-suited to be used there.
59 | The resources field can be used to map paths to the CIDs of resources contained in subsequent CAR
60 | blocks after the metadata header.
61 |
62 |
63 | In order to be inserted directly as a metadata object within CAR files, MASL documents need to contain a top-level version property whose value MUST
64 | be set to integer 1 and a top-level roots array that must contain 0 or more tag 42 CIDs and nothing else.
65 | Neither of these fields has any meaning in MASL, but they must be provided in the context of a CAR header for historical compatibility
66 | reasons.
67 | Note that there is no requirement that all the CIDs in a roots array also appear in the resources mapping or vice versa.
68 |
69 |
70 |
71 |
72 |
Fields
73 |
74 | MASL is designed to host arbitrary metadata but for interoperability purposes a number of root
75 | fields have predetermined values. Authors are invited to add their own metadata by creating
76 | namespaced objects at the top level.
77 |
78 |
79 | NOTE: In examples below, whenever we represent a CID as JSON for, say, field
80 | src, we use "src": { "$link": "CID value…" } as a readability convention.
81 |
82 |
83 |
Single or Multiple Resources
84 |
85 | MASL documents are primarily used to wrap around other resources for which they provide
86 | metadata. This can happen in one of two modes:
87 |
88 |
89 |
90 | Single Mode (using src): the metadata is only for one
91 | resource, which is the one that can be retrieved from the CID pointed to by src.
92 | HTTP metadata, if specified, goes at the root. App manifest metadata on a single resource
93 | can be used if that resource is a fully standalone document (e.g. a PDF).
94 |
95 |
96 | Bundle Mode (using resources): the metadata is used to
97 | describe a whole set of resources. These resources SHOULD be related to one another in some
98 | way (e.g. components that go into building an app or document). The keys of the resources
99 | map are complete paths that MUST start with / and the values are metadata
100 | objects that MUST have an src field pointing to the resource's CID and SHOULD have a
101 | content-type field giving its MIME type, along with other HTTP headers.
102 | Any other properties present which do not map to HTTP headers MUST be ignored.
103 |
104 |
105 |
106 | Note that if both src and resources are specified, then
107 | src MUST be ignored.
108 |
109 |
110 | The Bundle Mode has some specific processing rules:
111 |
112 |
113 |
114 | Default: The entry with path / is the default path that is loaded if the bundle
115 | itself gets rendered, e.g. in a browser or other user-agent. Implementations MUST only recognise this as the default
116 | and MUST NOT automatically decide to pick a given entry (e.g. /index.html).
117 |
118 |
119 | Relative: When loading a bundle into a web context, the root of the bundle is given an opaque origin,
120 | and all internal links are resolved relative to that.
121 |
122 |
123 | No directory: There is no notion of directory. If a resource is indicated as sitting at /cats/reds/kitsune.jpg
124 | this does not entail that /cats/ or /cats/reds/ somehow exist. As
125 | in web contexts, it is the full path that is matched, not /-separated subsets. URLs
126 | do not map to file systems.
127 |
128 |
129 | Query strings: When resolving a URL inside of a bundle, implementations MUST only make use of the URL's pathname and
130 | MUST ignore the query string. (Note that this departs from typical URL processing but makes it easier
131 | to pass parameters between resources internally.)
132 |
133 |
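The Bundle Mode processing rules above can be sketched as a small lookup function. The names `resolve_in_bundle` and `default_entry` are hypothetical, and the `$link` values are placeholders rather than real CIDs. Only the pathname is used, and the full path must match a key of the resources map exactly.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlsplit


def resolve_in_bundle(masl: Dict[str, Any], url: str) -> Optional[Dict[str, Any]]:
    """Return the resource entry for a URL inside a bundle, or None if there is none."""
    resources = masl.get("resources")
    if not isinstance(resources, dict):
        raise ValueError("not a Bundle Mode MASL document")
    # The query string (and any fragment) is dropped: only the pathname is matched.
    path = urlsplit(url).path
    # Exact match on the full path: there is no notion of directory.
    return resources.get(path)


def default_entry(masl: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """The entry at "/" is the only default; no fallback such as /index.html is tried."""
    return resolve_in_bundle(masl, "/")


demo_bundle = {
    "name": "Cats",
    "resources": {
        "/": {"src": {"$link": "placeholder-cid-1"}, "content-type": "text/html"},
        "/cats/reds/kitsune.jpg": {
            "src": {"$link": "placeholder-cid-2"},
            "content-type": "image/jpeg",
        },
    },
}
```

With this bundle, `/cats/reds/kitsune.jpg?size=large` resolves to the JPEG entry, while `/cats/` and `/index.html` resolve to nothing.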
134 |
135 | There is no requirement in MASL that bundles have to be stored or dereferenced in any specific manner. The relevant CIDs
136 | may be loaded through whatever mechanism the implementation knows about, such as RASL ([[rasl]]), or may be
137 | provided in a CAR file ([[car]]).
138 | Note that one advantage of this approach, compared to bundling resources into Zip archives (or CAR files!),
139 | is that the resource map can contain an arbitrarily high number or volume of resources: an implementation may load an
140 | arbitrary subset of the resources, may parallelize loading in an arbitrary order informed by types and HTTP headers,
141 | or may load some resources on demand.
142 |
143 |
144 | Specify a scheme and fetch rules properly.
145 |
177 | MASL supports a subset of HTTP response headers that are meaningful in decentralised
178 | contexts. This doesn't preclude headers not listed here from being used, but implementations
179 | that support using HTTP headers SHOULD NOT reflect the value of arbitrary HTTP headers
180 | without considering the potential attack surface they create.
181 |
182 |
183 | When using HTTP headers as MASL metadata, there are two modes. If the MASL document
184 | contains a root resources field then it is a MASL document for multiple resources
185 | and the HTTP headers are only meaningful if they are set on values of the resources
186 | map (and MUST be ignored if set on the root object). Conversely, if this MASL document
187 | contains a src field (and no resources) then the HTTP headers MUST
188 | be set on the root and ignored otherwise. If neither src nor resources
189 | are specified, the meaning of HTTP fields is undefined.
190 |
191 |
192 | All HTTP headers, where specified, are lowercased. Implementations MUST ignore headers
193 | with a different casing.
194 |
195 |
196 | Supported headers:
197 |
198 |
199 |
content-disposition
200 |
content-encoding
201 |
content-language
202 |
203 | content-security-policy: keep in mind however that runtime contexts are likely
204 | to already have a strict CSP that will override or constrain this one.
205 |
206 |
content-type
207 |
link
208 |
permissions-policy
209 |
referrer-policy
210 |
service-worker-allowed
211 |
212 | sourcemap: this must point to another resource in the resources map.
213 | Implementations SHOULD verify that this is the case as source maps could otherwise be used to
214 | exfiltrate information.
215 |
216 |
217 | speculation-rules: this must point to another resource in the resources map.
218 |
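The two modes and the lowercase rule can be combined into one header lookup, sketched below. The function name `headers_for` is hypothetical, and restricting the result to the supported headers listed above is a conservative choice made for this sketch, since other headers are not precluded.

```python
from typing import Any, Dict, Optional

SUPPORTED_HEADERS = frozenset({
    "content-disposition", "content-encoding", "content-language",
    "content-security-policy", "content-type", "link", "permissions-policy",
    "referrer-policy", "service-worker-allowed", "sourcemap", "speculation-rules",
})


def headers_for(doc: Dict[str, Any], path: Optional[str] = None) -> Dict[str, Any]:
    """Collect HTTP headers from the place where they are meaningful."""
    if "resources" in doc:
        # Multiple resources: headers on the root are ignored, as is any root src.
        source = doc["resources"].get(path, {})
    elif "src" in doc:
        # Single resource: headers are read from the root object.
        source = doc
    else:
        # Neither src nor resources: the meaning of HTTP fields is undefined.
        return {}
    # Keys are compared exactly, so headers with a different casing are ignored.
    return {key: value for key, value in source.items() if key in SUPPORTED_HEADERS}


demo_single = {
    "src": {"$link": "placeholder-cid"},
    "content-type": "application/pdf",
    "Content-Language": "en",  # wrong casing, ignored
}
demo_bundle = {
    "content-type": "text/plain",  # on the root of a bundle, ignored
    "resources": {
        "/": {"src": {"$link": "placeholder-cid"}, "content-type": "text/html"},
    },
}
```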
266 | One useful pattern with MASL is to describe an entire app or document, with all of its
267 | resources available for content addressing, possibly within a common CAR ([[car]]). Such
268 | docs or apps should use Web App Manifest metadata ([[manifest]]) as it is widely understood.
269 |
270 |
271 | The following manifest fields are guaranteed to be usable:
272 | background_color,
273 | categories,
274 | description,
275 | icons,
276 | id,
277 | name,
278 | screenshots,
279 | short_name, and
280 | theme_color.
281 | Note: other manifest fields MAY be used, but their behaviour is not guaranteed in the kind
282 | of web and web-like contexts for which MASL is optimized.
283 |
284 |
285 | For both icons and screenshots, the src field MUST
286 | be a path that matches an entry in the resources map. The type
287 | field that manifests normally accept there MUST NOT be used and MUST be ignored
288 | if specified. Media type information for that resource is specified on the resource entry
289 | that src maps to.
290 |
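The constraint on icons and screenshots can be checked with a short helper. This is a sketch with hypothetical names (`unresolved_image_sources`, `image_media_type`) and placeholder `$link` values. It reads the media type from the resource entry and never from a `type` field on the image.

```python
from typing import Any, Dict, List, Optional


def unresolved_image_sources(doc: Dict[str, Any]) -> List[Any]:
    """List the src values of icons and screenshots that match no resources entry."""
    resources = doc.get("resources", {})
    unresolved = []
    for field in ("icons", "screenshots"):
        for image in doc.get(field, []):
            if image.get("src") not in resources:
                unresolved.append(image.get("src"))
    return unresolved


def image_media_type(doc: Dict[str, Any], image: Dict[str, Any]) -> Optional[str]:
    """Media type comes from the resource entry; any type on the image is ignored."""
    entry = doc.get("resources", {}).get(image.get("src"))
    return entry.get("content-type") if entry else None


demo_doc = {
    "name": "Cats",
    "icons": [
        {"src": "/icon.png", "sizes": "64x64", "type": "image/gif"},
        {"src": "/missing.png", "sizes": "128x128"},
    ],
    "resources": {
        "/icon.png": {"src": {"$link": "placeholder-cid"}, "content-type": "image/png"},
    },
}
```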
317 | As indicated in the CAR specification ([[car]]), the metadata object in the CAR header
318 | must contain a version field set to integer 1 and a
319 | roots field set to an array (that may be empty) of tag 42 CIDs. These
320 | fields have no meaning for MASL, but are expected to be set when MASL is used for CAR
321 | metadata for historical compatibility. Note that using versions in this way is an
322 | antipattern, and we expect the value never to change.
323 |
324 |
325 | Example:
326 |
327 |
328 | {
329 | "name": "Get in the CAR if you want to live",
330 | "version": 1,
331 | "roots": []
332 | }
333 |
334 |
335 |
336 |
AT Compatibility
337 |
338 | When used with the AT Protocol ([[at]]), it is common that objects will need to feature
339 | a $type field. If present, it MUST be a string and SHOULD be set to the
340 | value ing.dasl.masl.
341 |
342 |
343 |
344 |
Versioning
345 |
346 | When manipulating DAGs, it can be useful to keep track of history by referencing
347 | earlier versions of the same data or metadata. This can be done using the prev
348 | field, which if present MUST be a tag 42 CID pointing to a previous MASL document.
349 |
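Following the prev field back through earlier versions can be sketched as below. The `load` callable, which maps a CID to a decoded MASL document, is hypothetical, as is the depth limit. In the demonstration, plain strings stand in for tag 42 CIDs.

```python
from typing import Any, Callable, Dict, List


def history(
    doc: Dict[str, Any],
    load: Callable[[Any], Dict[str, Any]],  # hypothetical CID-to-document loader
    limit: int = 100,                        # arbitrary bound chosen for this sketch
) -> List[Dict[str, Any]]:
    """Return the document followed by its earlier versions, newest first."""
    versions = [doc]
    while "prev" in versions[-1] and len(versions) < limit:
        versions.append(load(versions[-1]["prev"]))
    return versions


# Demonstration with an in-memory store keyed by placeholder CID strings.
demo_store = {
    "cid-v1": {"name": "v1"},
    "cid-v2": {"name": "v2", "prev": "cid-v1"},
}
demo_versions = history({"name": "v3", "prev": "cid-v2"}, demo_store.__getitem__)
```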
364 | Making a precise lexicon ([[lexicon]]) for MASL is impossible because lexicons lack a way of
365 | constraining objects with arbitrary keys. However, the following may still prove useful when
366 | MASL is integrated with the AT Protocol ([[at]]).
367 |
24 | MASL is a CBOR-based metadata system that is designed to work well with content-addressed
25 | and decentralised systems, to enable fully self-contained, self-certified content
26 | distribution.
27 |
28 |
29 |
30 |
31 |
Introduction
32 |
33 | Anywhere you have resources that will be deployed in real-world systems, the potential
34 | metadata needs of those systems are effectively unbounded. This is particularly true of
35 | decentralised systems that need to exhibit "web-like" behaviour: in order to have
36 | reproducible and safe execution when content is sourced from arbitrary origins, it is
37 | necessary to have an equivalent to HTTP headers that is as verifiable as the content
38 | itself and that does not suffer from the lossy behaviour that can occur when one treats the web as a file system (which it is not).
39 |
40 |
41 | Designing or constraining a syntax for all metadata needs would be hubris and madness.
42 | Instead, this document tries to minimally constrain applications while illustrating "where to stick"
43 | that metadata, as there are so few layers and hiding places in the DASL system.
44 |
45 |
46 |
47 |
How to use a MASL document
48 |
49 | The recommended structure for DASL metadata is to insert a [drisl] document "between" each
50 | CID and its resource(s), essentially using DRISL to encode headers for one or more CID-addressed resources.
51 | To do this, simply replace the CID of a resource with the CID of a DRISL object that contains a
52 | top-level (tag 42) property named src.
53 | To bundle a set of CIDs in a logical unit, drop the top-level src and replace it with a mapping of paths-from-domain-root to DRISL metadata objects including a src property containing the resource's CID.
54 |
55 |
56 | It is preferable to nest any metadata in a top-level object to namespace your own metadata
57 | vocabulary (inside an object named, for example, `my-cool-project-v1`) rather than using opaque UUIDs, version-bits inserted into values, or magic numbers.
58 | There are a few reserved words at the top-level, but even these can be avoided by nesting any conflicts as needed.
59 |
60 |
61 | There are many metadata standards that can be embedded in this way to facilitate the preservation of
62 | metadata at ingress as well as translatability. For example, the IPFS-based storage system Storacha
63 | has a robust CID-based metadata system called
64 | content credentials which includes UCAN-based
65 | permissioning, inclusion attestations, and CID "equivalences," e.g. mappings of multiple IPFS CIDs equivalent to each other and/or to a given DASL CID.
66 |
67 |
68 |
Using MASL with CAR
69 |
70 | CAR files ([car]) have a space reserved for metadata in their header.
71 | A MASL metadata document can occupy that metadata header-space,
72 | and the variant using a resources map is particularly well-suited to be used there.
73 | The resources field can be used to map paths to the CIDs of resources contained in subsequent CAR
74 | blocks after the metadata header.
75 |
76 |
77 | In order to be inserted directly as a metadata object within CAR files, MASL documents need to contain a top-level version property whose value MUST
78 | be set to integer 1 and a top-level roots array that must contain 0 or more tag 42 CIDs and nothing else.
79 | Neither of these fields has any meaning in MASL, but they must be provided in the context of a CAR header for historical compatibility
80 | reasons.
81 | Note that there is no requirement that all the CIDs in a roots array also appear in the resources mapping or vice versa.
82 |
83 |
84 |
85 |
86 |
Fields
87 |
88 | MASL is designed to host arbitrary metadata but for interoperability purposes a number of root
89 | fields have predetermined values. Authors are invited to add their own metadata by creating
90 | namespaced objects at the top level.
91 |
92 |
93 | NOTE: In examples below, whenever we represent a CID as JSON for, say, field
94 | src, we use "src": { "$link": "CID value…" } as a readability convention.
95 |
96 |
97 |
Single or Multiple Resources
98 |
99 | MASL documents are primarily used to wrap around other resources for which they provide
100 | metadata. This can happen in one of two modes:
101 |
102 |
103 |
104 | Single Mode (using src): the metadata is only for one
105 | resource, which is the one that can be retrieved from the CID pointed to by src.
106 | HTTP metadata, if specified, goes at the root. App manifest metadata on a single resource
107 | can be used if that resource is a fully standalone document (e.g. a PDF).
108 |
109 |
110 | Bundle Mode (using resources): the metadata is used to
111 | describe a whole set of resources. These resources SHOULD be related to one another in some
112 | way (e.g. components that go into building an app or document). The keys of the resources
113 | map are complete paths that MUST start with / and the values are metadata
114 | objects that MUST have an src field pointing to the resource's CID and SHOULD have a
115 | content-type field giving its MIME type, along with other HTTP headers.
116 | Any other properties present which do not map to HTTP headers MUST be ignored.
117 |
118 |
119 |
120 | Note that if both src and resources are specified, then
121 | src MUST be ignored.
122 |
123 |
124 | The Bundle Mode has some specific processing rules:
125 |
126 |
127 |
128 | Default: The entry with path / is the default path that is loaded if the bundle
129 | itself gets rendered, e.g. in a browser or other user-agent. Implementations MUST only recognise this as the default
130 | and MUST NOT automatically decide to pick a given entry (e.g. /index.html).
131 |
132 |
133 | Relative: When loading a bundle into a web context, the root of the bundle is given an opaque origin,
134 | and all internal links are resolved relative to that.
135 |
136 |
137 | No directory: There is no notion of directory. If a resource is indicated as sitting at /cats/reds/kitsune.jpg
138 | this does not entail that /cats/ or /cats/reds/ somehow exist. As
139 | in web contexts, it is the full path that is matched, not /-separated subsets. URLs
140 | do not map to file systems.
141 |
142 |
143 | Query strings: When resolving a URL inside of a bundle, implementations MUST only make use of the URL's pathname and
144 | MUST ignore the query string. (Note that this departs from typical URL processing but makes it easier
145 | to pass parameters between resources internally.)
146 |
147 |
148 |
149 | There is no requirement in MASL that bundles have to be stored or dereferenced in any specific manner. The relevant CIDs
150 | may be loaded through whatever mechanism the implementation knows about, such as RASL ([rasl]), or may be
151 | provided in a CAR file ([car]).
152 | Note that one advantage of this approach, compared to bundling resources into Zip archives (or CAR files!), is that the resource map can contain an arbitrarily high number or volume of resources: an implementation may load an arbitrary subset of the resources, may parallelize loading in an arbitrary order informed by types and HTTP headers, or may load some resources on demand.
153 |
154 |
155 | Specify a scheme and fetch rules properly.
156 |
186 | MASL supports a subset of HTTP response headers that are meaningful in decentralised
187 | contexts. This doesn't preclude headers not listed here from being used, but implementations
188 | that support using HTTP headers SHOULD NOT reflect the value of arbitrary HTTP headers
189 | without considering the potential attack surface they create.
190 |
191 |
192 | When using HTTP headers as MASL metadata, there are two modes. If the MASL document
193 | contains a root resources field then it is a MASL document for multiple resources
194 | and the HTTP headers are only meaningful if they are set on values of the resources
195 | map (and MUST be ignored if set on the root object). Conversely, if this MASL document
196 | contains a src field (and no resources) then the HTTP headers MUST
197 | be set on the root and ignored otherwise. If neither src nor resources
198 | are specified, the meaning of HTTP fields is undefined.
199 |
200 |
201 | All HTTP headers, where specified, are lowercased.
202 |
203 |
204 | Supported headers:
205 |
206 |
207 |
content-disposition
208 |
content-encoding
209 |
content-language
210 |
211 | content-security-policy: keep in mind however that runtime contexts are likely
212 | to already have a strict CSP that will override or constrain this one.
213 |
214 |
content-type
215 |
link
216 |
permissions-policy
217 |
referrer-policy
218 |
service-worker-allowed
219 |
220 | sourcemap: this must point to another resource in the resources map.
221 | Implementations SHOULD verify that this is the case as source maps could otherwise be used to
222 | exfiltrate information.
223 |
224 |
225 | speculation-rules: this must point to another resource in the resources map.
226 |
272 | One useful pattern with MASL is to describe an entire app or document, with all of its
273 | resources available for content addressing, possibly within a common CAR ([car]). Such
274 | docs or apps should use Web App Manifest metadata ([manifest]) as it is widely understood.
275 |
276 |
277 | The following manifest fields are guaranteed to be usable:
278 | background_color,
279 | categories,
280 | description,
281 | icons,
282 | id,
283 | name,
284 | screenshots,
285 | short_name, and
286 | theme_color.
287 | Note: other manifest fields MAY be used, but their behaviour is not guaranteed in the kind
288 | of web and web-like contexts for which MASL is optimized.
289 |
290 |
291 | For both icons and screenshots, the src field MUST
292 | be a path that matches an entry in the resources map. The type
293 | field that manifests normally accept there MUST NOT be used and MUST be ignored
294 | if specified. Media type information for that resource is specified on the resource entry
295 | that src maps to.
296 |
322 | As indicated in the CAR specification ([car]), the metadata object in the CAR header
323 | must contain a version field set to integer 1 and a
324 | roots field set to an array (that may be empty) of tag 42 CIDs. These
325 | fields have no meaning for MASL, but are expected to be set when MASL is used for CAR
326 | metadata for historical compatibility. Note that using versions in this way is an
327 | antipattern, and we expect the value never to change.
328 |
329 |
330 | Example:
331 |
332 |
{
333 | "name": "Get in the CAR if you want to live",
334 | "version": 1,
335 | "roots": []
336 | }
337 |
338 |
339 |
340 |
AT Compatibility
341 |
342 | When used with the AT Protocol ([at]), it is common that objects will need to feature
343 | a $type field. If present, it MUST be a string and SHOULD be set to the
344 | value ing.dasl.masl.
345 |
346 |
347 |
348 |
Versioning
349 |
350 | When manipulating DAGs, it can be useful to keep track of history by referencing
351 | earlier versions of the same data or metadata. This can be done using the prev
352 | field, which if present MUST be a tag 42 CID pointing to a previous MASL document.
353 |
367 | Making a precise lexicon ([lexicon]) for MASL is impossible because lexicons lack a way of
368 | constraining objects with arbitrary keys. However, the following may still prove useful when
369 | MASL is integrated with the AT Protocol ([at]).
370 |