This page has moved...

This page has moved to {{redirectUrl}}. You will be redirected in 5 seconds.
--------------------------------------------------------------------------------
/.vuepress/redirects.js:
--------------------------------------------------------------------------------
// VuePress plugin: after the site build, overwrite every generated page with
// a redirect stub pointing at its new home on ipld.io (or GitHub).
const path = require('path')
const fs = require('fs').promises

const defaultRedirect = 'https://ipld.io/specs/'

// First matching rule wins, so specific rules must come before general ones.
const redirects = [
  ['/block-layer/codecs/index.html', 'https://ipld.io/specs/codecs/'],
  ['/block-layer/codecs/dag-cbor.html', 'https://ipld.io/specs/codecs/dag-cbor/'],
  ['/block-layer/codecs/dag-json.html', 'https://ipld.io/specs/codecs/dag-json/'],
  ['/block-layer/codecs/dag-jose.html', 'https://ipld.io/specs/codecs/dag-jose/'],
  ['/block-layer/codecs/dag-pb.html', 'https://ipld.io/specs/codecs/dag-pb/'],
  ['/block-layer/content-addressable-archives.html', 'https://ipld.io/specs/transport/car/'],
  ['/block-layer/graphsync/known_extensions.html', 'https://ipld.io/specs/transport/graphsync/known_extensions/'],
  ['/block-layer/graphsync/graphsync.html', 'https://ipld.io/specs/transport/graphsync/'],
  ['/concepts/type-theory-glossary.html', 'https://ipld.io/design/concepts/type-theory-glossary/'],
  ['/data-model-layer/pathing.html', 'https://ipld.io/docs/data-model/pathing/'],
  [/^\/data-structures\/ethereum/, '/data-structures/ethereum/'],
  [/^\/data-structures\/filecoin/, 'https://github.com/ipld/ipld/tree/master/_legacy/specs/data-structures/filecoin'],
  ['/data-structures/flexible-byte-layout.html', 'https://ipld.io/specs/advanced-data-layouts/fbl/'],
  ['/data-structures/hashmap.html', 'https://ipld.io/specs/advanced-data-layouts/hamt/'],
  [/^\/design\/history\/exploration-reports/, 'https://github.com/ipld/ipld/tree/master/notebook/exploration-reports'],
  ['/design/libraries/nodes-and-kinds.html', 'https://ipld.io/design/libraries/nodes-and-kinds/'],
  [/^\/design\//, 'https://ipld.io/design/'],
  [/^\/schemas\//, 'https://ipld.io/docs/schemas/'],
  [/^\/selectors\//, 'https://ipld.io/specs/selectors/']
]

module.exports = (options = {}, context) => ({
  generated: async (paths) => {
    const redirectTemplate = await fs.readFile(path.join(__dirname, 'redirect_template.html'), 'utf8')
    const queue = []

    for (const pathAbs of paths) {
      const pathRel = pathAbs.replace(path.join(__dirname, 'dist'), '')
      let redir = defaultRedirect
      for (const [from, to] of redirects) {
        if (typeof from === 'string' && pathRel.startsWith(from)) {
          redir = to
          break
        } else if (from instanceof RegExp && from.test(pathRel)) {
          redir = to
          break
        }
      }
      const redirPage = redirectTemplate.replace(/\{\{redirectUrl\}\}/g, redir)
      queue.push(fs.writeFile(pathAbs, redirPage, 'utf8'))
    }

    await Promise.all(queue)
  }
})
--------------------------------------------------------------------------------
/.vuepress/styles/index.styl:
--------------------------------------------------------------------------------
div[class~="language-ipldsch"]::before {
  content: "ipld schema";
}
--------------------------------------------------------------------------------
/EDITING.md:
--------------------------------------------------------------------------------

!!!

This document has **moved**.

You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
and published to the web at https://ipld.io/ .

All documentation, fixtures, specifications, and web content is now gathered into that repo.
Please update your links, and direct new contributions there.

!!!

----

Editing
=======

Editing specifications is hard! Here are some partial guidelines and rules-of-thumb.

- [Communicate!](#communicate)
- [Make sure your change fits](#make-sure-your-change-fits)
  - ... especially if it's a [semantic change](#semantic-changes)
- [Use Exploration Reports](#use-exploration-reports)
- Heed the [stylistic preferences](#stylistic-preferences)


Communicate!
------------

There are lots of ways to communicate with the IPLD team and community.

- Chat (all of these are federated; pick any form of client you like):
  - IRC: [#ipld on Freenode](irc://irc.freenode.net/ipld) ([webchat](https://webchat.freenode.net/?channels=ipld))
  - Matrix: https://matrix.to/#/#freenode_#ipld:matrix.org
- All our development is in the open on GitHub:
  - This repo is https://github.com/ipld/specs/
  - Other work exists in other repos in the same org: https://github.com/ipld/
  - GitHub issues and Pull Requests work just fine.
- We have a (mostly) weekly meeting on Zoom: https://github.com/ipld/team-mgmt#weekly-call

If you'd like to discuss something before proposing a change,
there are lots of venues that can host a conversation.

Escalating in levels of formality can be a good idea:
e.g., ask questions on IRC/Matrix/Discord or in an issue first,
then work on PRs later if an idea seems worth pursuing.


Make sure your change fits
--------------------------

### Textual changes

Small textual changes can be made by Pull Request easily. Go for it!

(Since the amount of effort involved in creating and reviewing this kind of
change is minimal for both sides, it's easier to just send it than to ask permission.)

### Semantic changes

_Editing specifications is **really** hard!_

We welcome everyone's effort in refining the specs to be the best they can be.
At the same time, be warned that it's a serious investment of energy!
As shepherds of what we hope is a stable process, we have to be careful and
deliberate when considering what changes to accept.

A lot of decisions in specifications involve looking out for long-range effects
and non-obvious implications. We try to document these considerations,
but in many documents we also try to strike a balance with brevity.
Striking a perfect balance here is arguably impossible;
it's pretty much guaranteed that in your reading, you'll encounter a document
that doesn't sufficiently explain all the "why isn't this different" branches.

Specs deserve an extra long think before proposing changes.
[Chesterton's fence](https://en.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fence)
applies in _spades_ to specifications.

Some common reasons things may be less tenable than they first appear:

- Does this feature interact with other features in complex ways?
- Does the presence of this feature alongside other features make the system overall
  more difficult to reason about?
- Does adding this feature create an undue burden for library implementers?
- Does this feature interact exceptionally poorly with any major programming languages?
- Does this feature interact exceptionally poorly with any common serialization formats?
  (Would it shift any common formats to be more 'incomplete', e.g. unable to
  fully represent data from the Data Model or from other formats?)
- Does this feature introduce potentially unbounded or difficult-to-estimate
  time or memory costs when implemented?
- Does this feature make canonicalization or hashing of data exceptionally
  more difficult, confusing, or error-prone?
- Does this change add too much verbosity to a detail that we don't want to emphasize?
- Does this change remove explicitness and reduce focus from something we _do want_ to draw attention to?

If you've considered all these things, and still think that you've got a good
change in mind, then thank you for the depth of your thought so far!
Go ahead and either create an issue on GitHub, or hoist a PR, or move forward
by writing up an [exploration report](#use-exploration-reports) to help
hone further discussion, or simply find us and start a conversation on
the `#ipld` channel on Freenode IRC!


Use Exploration Reports
-----------------------

One way to make progress in designing something or discussing a tricky issue
is to write an "exploration report".

The key point of an "exploration report" is to document some thinking
and collate notes about a complex topic, *without* jumping straight to a
proposed solution. This is useful because we can readily preserve the notes,
even if the idea in question doesn't pan out or can't be completed in one joust.

We've got a bunch of these -- and more description of what that concept is --
in the [`./design/history/exploration-reports`](./design/history/exploration-reports)
directory. You can also make exploration reports as gists, or other kinds of
documents, or even issues on GitHub; whatever creates the least burden is best.

Exploration reports are almost guaranteed to be a good idea if you're proposing
any kind of [semantic change](#semantic-changes).



Stylistic Preferences
---------------------

### Headings

Be careful with headings -- they should look good as links.
Being able to link deeply into a spec document is every bit as important
as being able to read it from top to bottom.

Be very careful not to repeat headings; this makes linking to that heading impossible.
(If you feel that repeated headings are still visually good formatting,
consider whether a `"**bold text**: subject details"` approach fits.
Because this doesn't try to produce links, it's not a problem to repeat the bold component.
But, by the other side of the same coin, _it can't be linked to_; be mindful of that concession.)

Don't put redundant information or parenthetical remarks in a heading;
it rarely comes out legible in the resulting links.
For example, "#block-layer" is preferable to "#block-layer-layer-0".
Clarifications and secondary terminology variations are better expressed in the body text
than jammed into the heading, where they add tongue-twisters to links.

### Linking

The preferred linking format depends on what you're linking to and how far away it is.
In general, the aim is to make editing easier in the long run.

- The preferred link syntax depends on how distant the target is.
  - Links within a document should use bare hashtags -- `[text](#heading)`.
  - Links to documents in the same folder should use relative paths -- `[text](./file.md#heading)`.
  - Links to documents elsewhere in the repo should use rooted paths -- `[text](/full/path.md#heading)`.
    (This makes repo-wide reorganizations and searching easier; a series of "../../.." can be inconvenient to work with, and is difficult to verify by eye.)
  - Links to content outside the repo should of course use the full URL.
- Links should include the file extension for '.md' files.
  (It will be stripped by our site publish tooling, but is necessary for links to work well within GitHub's markdown processing.)
- When linking to source code: use the full commit hash! (GitHub has a handy `'y'` shortcut for this; use it!)
  - If linking to specific line numbers, a full commit hash is an absolute requirement. It is otherwise far too easy to have "working" but semantically invalid links.
  - If linking to whole packages or high-level documents in other repos, it _may_ be viable to link to the master branch. But be judicious.

### Todos

Some documents have "TODO"s within them.

This isn't necessarily something to emulate when submitting new changes.
Documentation that's done is much better than documentation that's to-do!

Whether or not a "TODO" is acceptable in a PR depends:

- How critical is it?
  - If it's a "TODO" in a critical part of the system definition? No. (Consider re-writing what you're working on as an [exploration report](#use-exploration-reports) instead!)
  - If it's a "TODO" stating that better description is still wanted? Maybe. If it's useful in context to state that, and a legitimate open ask to the community at large when they read the document, we can consider merging it.
- How sure are we that it can be addressed?
  - If it's coming from someone with known long-term investment in the project? Maybe; we can have a reasonable expectation that the same person will either come back to finish it, or at least has a good idea about whether it will be possible for other contributors to fill in the missing piece.
  - If it's coming from someone making their first contributions? Unfortunately, we have to set a higher bar here. It's harder to know whether follow-up will come, and so it's riskier to bet that it will.

None of these factors has a clear yes-or-no answer (nor are they intended to),
so, ultimately, expect this to be resolved by discussion during review.
By default? Fewer "TODO"s are better: avoid introducing them if you can.

### Line breaks

Line breaks are non-semantic in markdown, so where to break lines is up to you.
We don't enforce any strict line-length limit.

As a general rule, edits should aim to match the style in their vicinity,
and where possible avoid cascading re-flows.
Do what makes "git diff" more likely to be readable, in other words.

Some documents use a "break line at end of sentence" heuristic.
You might give that a shot and see how it feels.
(This paragraph in this document is an example!)

--------------------------------------------------------------------------------
/block-layer/CID.md:
--------------------------------------------------------------------------------

!!!

This document has **moved**.

You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
and published to the web at https://ipld.io/ .

All documentation, fixtures, specifications, and web content is now gathered into that repo.
Please update your links, and direct new contributions there.

!!!

----

# Specification: CIDs

**Status: Descriptive - Final**

This document will use the words "Content IDs" or "CIDs" interchangeably.

The prior format -- Base58-encoded Multihash links to Protobuf data -- is referred to as CID Version 0.

## Summary

A CID is a hash-based content identifier. It includes the `codec` and the `multihash`.

```
+-------+------------------------------+
| Codec | Multihash                    |
+-------+------------------------------+
```

The long version:

```
+------------------------------+
|Codec                         |
+------------------------------+
|Multihash                     |
| +----------+---------------+ |
| |Hash Type | Hash Value    | |
| +----------+---------------+ |
|                              |
+------------------------------+
```

## CIDs Version 1

Putting together the IPLD Link update statements above, we can term the new handle for IPLD data CID Version 1, with a multibase prefix, a version, a packed multicodec, and a multihash.

```
<mbase><version><mcodec><mhash>
```

Where:
- `<mbase>` is a multibase prefix describing the base that encodes this CID. If binary, this is omitted.
- `<version>` is the version number of the CID.
- `<mcodec>` is a multicodec-packed identifier, from the CID multicodec table.
- `<mhash>` is a cryptographic multihash, including: `<mhash-type><mhash-length><mhash-value>`.

Note that all CIDs v1 and onward should always begin with `<mbase><version>`, which lets the format evolve nicely.

### Multicodec Packed Representation

It is useful to have a compact version of multicodec for use in small identifiers. This compact identifier is just a single varint, looked up in a table. Different applications can use different tables. We should probably have one common table for well-known formats.

We will establish a table for common authenticated data structure formats, for example: IPFS v0 Merkledag, CBOR IPLD, Git, Bitcoin, and more. The table is a simple varint lookup.

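As a sketch of the idea (the code points below are taken from the public multicodec table; the tiny table and helper function are ours, purely for illustration):

```javascript
// A multicodec-packed identifier is just a varint code point looked up
// in a shared table. A tiny, illustrative slice of that table:
const CODEC_NAMES = new Map([
  [0x55, 'raw'],      // raw binary
  [0x70, 'dag-pb'],   // the IPFS v0 Merkledag format (protobuf)
  [0x71, 'dag-cbor']  // CBOR IPLD
])

function codecName (code) {
  const name = CODEC_NAMES.get(code)
  if (name === undefined) throw new Error(`unknown codec code 0x${code.toString(16)}`)
  return name
}

codecName(0x71) // → 'dag-cbor'
```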
### Distinguishing v0 and v1 CIDs (old and new)

It is a HARD CONSTRAINT that all IPFS links continue to work. This means we need to continue to support v0 CIDs, and that IPFS APIs must accept both v0 and v1 CIDs. This section defines how to distinguish v0 from v1 CIDs.

Old v0 CIDs are strictly sha2-256 multihashes encoded in base58 -- this is because IPFS tooling only shipped with support for sha2-256. This means the binary versions are 34 bytes long (a sha2-256 multihash with a 256-bit digest), and the string versions are 46 characters long (base58 encoded). We can therefore recognize a v0 CID by checking that it is a sha2-256 multihash with a 256-bit digest, and base58 encoded (when a string). Basically:

- `<mbase>` is implicitly base58.
- `<version>` is implicitly 0.
- `<mcodec>` is implicitly protobuf (for backwards compat with v0).
- `<mhash>` is a cryptographic multihash, explicit.

We can re-write old v0 CIDs into v1 CIDs by making the elements explicit. This should be done henceforth to avoid creating more v0 CIDs. But note that many references exist in the wild, and thus we must continue supporting v0 links. In the distant future, we may remove this support after sha2 breaks.

Note that we can cleanly distinguish the values, which makes it easy to support both. The code for this check is here: https://gist.github.com/jbenet/bf402718a7955bf636fb47d214bcef8a
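The shape of that check can be sketched as follows (a heuristic only, mirroring the constraints above; the function name is ours, and it is no substitute for a real CID parser):

```javascript
// Heuristic sketch: a CIDv0 string is always 46 characters of base58btc
// beginning with "Qm" (the encoded prefix of a sha2-256, 256-bit multihash).
// Anything failing these checks is treated as CIDv1 (or not a CID at all).
const BASE58BTC = /^[1-9A-HJ-NP-Za-km-z]+$/ // alphabet excludes 0, O, I, l

function isCidV0String (str) {
  return str.length === 46 && str.startsWith('Qm') && BASE58BTC.test(str)
}

isCidV0String('QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG') // → true
isCidV0String('bafybeibml5uieyxa5tufngvg7fgwbkwvlsuntwbxgtskoqynbt7wlchmfm') // → false (CIDv1, base32)
```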

### IPLD supports non-CID hash links as implicit CIDv1s

Note that raw hash links _stored in various data structures_ (e.g. Protobuf, Git, Bitcoin, Ethereum, etc.) already exist. These links -- when loaded directly as one of these data structures -- can be seen as "linking within a network", whereas proper CIDv1 IPLD links can be seen as linking "across networks" (internet of data! internet of data structures!). Supporting these existing (or even new) raw hash links as a CIDv1 can be done by noting that when a data structure links with just a raw binary hash, the rest of the CIDv1 fields are implicit:

- `<mbase>` is implicitly binary or whatever the format encodes.
- `<version>` is implicitly 1.
- `<mcodec>` is implicitly the same as the data structure.
- `<mhash>` can be determined from the raw hash.

Basically, we construct the corresponding CIDv1 out of the raw hash link because all the other information is _in the context_ of the data structure. This is very useful because it allows:
- more compact encoding of a CIDv1 when linking from one data struct to another
- linking from CBOR IPLD to other CBOR IPLD objects exactly as has been spec'ed out so far, so any IPLD adopters continue working
- (most important) opening the door for native support of other data structures

### IPLD addresses raw data

Given the above addressing changes, it is now possible to address raw data directly, as an IPLD node. This node is of course taken to be just a byte buffer, devoid of links (i.e. a leaf node).

The utility of this is the ability to directly address any object via hashing, external to IPLD data structures.

### Support for multiple binary packed formats

Contrary to prior Merkle objects (e.g. IPFS protobuf legacy, git, bitcoin, dat and others), new IPLD objects are authenticated AND self-described data blobs; each IPLD object is serialized and prefixed by a multicodec identifying its format.

--------------------------------------------------------------------------------
/block-layer/block.md:
--------------------------------------------------------------------------------

!!!

This document has **moved**.

You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
and published to the web at https://ipld.io/ .

All documentation, fixtures, specifications, and web content is now gathered into that repo.
Please update your links, and direct new contributions there.

!!!

----

# Concept: Block

An IPLD Block is a CID and the binary data value for that CID.

The short version:
```
+-----+--------------------------------+
| CID | Data                           |
+-----+--------------------------------+
```

The long version:
```
+-----------------------------------+------------------+
| CID                               | Binary Data      |
| +------------------------------+  |                  |
| |Codec                         |  |                  |
| +------------------------------+  |                  |
| |Multihash                     |  |                  |
| | +----------+---------------+ |  |                  |
| | |Hash Type | Hash Value    | |  |                  |
| | +----------+---------------+ |  |                  |
| |                              |  |                  |
| +------------------------------+  |                  |
|                                   |                  |
+-----------------------------------+------------------+
```

--------------------------------------------------------------------------------
/block-layer/codecs/README.md:
--------------------------------------------------------------------------------

!!!

This document has **moved**.

You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
and published to the web at https://ipld.io/ .

All documentation, fixtures, specifications, and web content is now gathered into that repo.
Please update your links, and direct new contributions there.

!!!

----

# IPLD Codecs

A codec exposes serialization and deserialization for IPLD blocks.
If it also supports content-addressable links, then the codec exposes those links as CIDs.
A codec also supports atomic IPLD Path lookups on the block.

--------------------------------------------------------------------------------
/block-layer/codecs/codecs-and-completeness.meta.md:
--------------------------------------------------------------------------------

!!!

This document has **moved**.

You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
and published to the web at https://ipld.io/ .

All documentation, fixtures, specifications, and web content is now gathered into that repo.
Please update your links, and direct new contributions there.

!!!

----

Meta: Codecs and Completeness
=============================

This document is somewhat rough.
It's necessary that we have this discussion somewhere, but it could certainly be polished further.


### do not use wishful thinking here

Discussion about codecs and completeness must be grounded in **one-to-one correspondence** --
also known as [Bijection](https://en.wikipedia.org/wiki/Bijection).

Bijection is a double-edged sword:
inherently, a Codec with *more* features than the Data Model will *lose* data when transforming data to the Data Model;
and a Codec with *fewer* features than the Data Model will not be able to express all data that IPLD can describe.
One cannot wish this away; one can only document tradeoffs, domains, and limitations accurately.

### conflicts

This document conflicts with some other claims currently in the specs repo
(such as "DAG-JSON supports the full IPLD Data Model" in https://github.com/ipld/specs/blob/master/block-layer/codecs/dag-json.md ).
This document supersedes those. There is corrective work to be done.

### can we make tables

It would be neat if we continued to boil this down until we can get regularized tables with checkmarks.
We could then use those tables in the specs pages for each codec.

To do so is somewhat tricky.

### can we choose more words? fewer words? different words?

Sure. Language is an art.

This document tries to choose terms for describing patterns of limitations that *tend* to exist in implementations,
and tradeoffs that designers and specifiers of codecs *often* weigh.
It does not try to describe every possible choice with a unique term;
it also names many more patterns than the purest of mathematical data modeling would consider distinctive.
The goal is to provide a happy medium of vocabulary that is useful in practical discussion.

--------------------------------------------------------------------------------
/block-layer/codecs/dag-jose.md:
--------------------------------------------------------------------------------

!!!

This document has **moved**.

You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
and published to the web at https://ipld.io/ .

All documentation, fixtures, specifications, and web content is now gathered into that repo.
Please update your links, and direct new contributions there.

!!!

----

# Specification: DAG-JOSE

**Status: Descriptive - Draft**

JOSE is a standard for signing and encrypting JSON objects. The various specifications for JOSE can be found in the [IETF datatracker](https://datatracker.ietf.org/wg/jose/documents/).

## Format

There are two kinds of JOSE objects: JWS ([JSON Web Signature](https://datatracker.ietf.org/doc/rfc7515/?include_text=1)) and JWE ([JSON Web Encryption](https://datatracker.ietf.org/doc/rfc7516/?include_text=1)). These two objects are primitives in JOSE and can be used to create JWT and JWM objects, etc. The IETF RFCs specify a JSON encoding of JOSE objects; this specification maps that JSON encoding to CBOR. Upon encountering the `dag-jose` multiformat code, implementations can be sure that the block contains DAG-CBOR encoded data which matches the IPLD schema we specify below.
25 |
26 | ### Mapping from the JOSE general JSON serialization to dag-jose serialization
27 |
28 | Both JWS and JWE supports three different serialization formats: `Compact Serialization`, `Flattened JSON Serialization`, and `General JSON Serialization`. The first two are more concise, but they only allow for one recipient. Therefore DAG JOSE always uses the `General Serialization` which ensures maximum compatibility with minimum ambiguity. Libraries implementing serialization should accept all JOSE formats including the `Decoded Representation` (see below) and convert them if necessary.
29 |
30 | To map the general JSON serialization to CBOR we do the following:
31 |
32 | - Any field which is represented as `base64url()` we map directly to `Bytes` . For fields like `header` and `protected` which are specified as the `base64url(ascii())` that means that the value is the `ascii()` bytes.
33 | - For JWS we specify that the `payload` property MUST be a CID, and we set the `payload` of the encoded JOSE object to `Bytes` containing the bytes of the CID. For applications where an additional network request to retrieve the linked content is undesirable then an `identity` multihash should be used.
34 | - For JWE objects the `ciphertext` must decrypt to a cleartext which is the bytes of a CID. This is for the same reason as the `payload` being a CID, and the same approach of using an `identity` multihash can be used, and most likely will be the only way to retain the confidentiality of data.
35 |
36 | Below we present an IPLD schema representing the encoded JOSE objects. Note that there are two IPLD schemas, `EncodedJWE` and `EncodedJWS`. The actual wire format is a single struct which contains all the keys from both the `EncodedJWE` and the `EncodedJWS` structs, implementors should follow [section 9 of the JWE spec](https://tools.ietf.org/html/rfc7516#section-9) and distinguish between these two branches by checking if the `payload` attribute exists, and hence you have a JWS; or the `ciphertext` attribute, hence you have a JWE.

**Encoded JOSE**

```ipldsch
type EncodedSignature struct {
  header optional {String:Any}
  protected optional Bytes
  signature Bytes
}

type EncodedRecipient struct {
  encrypted_key optional Bytes
  header optional {String:Any}
}

type EncodedJWE struct {
  aad optional Bytes
  ciphertext Bytes
  iv optional Bytes
  protected optional Bytes
  recipients [EncodedRecipient]
  tag optional Bytes
  unprotected optional {String:Any}
}

type EncodedJWS struct {
  payload optional Bytes
  signatures [EncodedSignature]
}
```

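That branch test is simple enough to sketch directly (the function name is ours, and a decoded block is assumed to be a plain map):

```javascript
// Distinguish the two branches of the single wire struct, per the rule
// above: a `payload` attribute means JWS; a `ciphertext` attribute means JWE.
function joseKind (block) {
  if ('payload' in block) return 'JWS'
  if ('ciphertext' in block) return 'JWE'
  throw new Error('not a dag-jose object: neither payload nor ciphertext present')
}

joseKind({ payload: new Uint8Array(), signatures: [] }) // → 'JWS'
joseKind({ ciphertext: new Uint8Array(), recipients: [] }) // → 'JWE'
```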
## Padding for encryption

Applications may need to pad the cleartext when encrypting, to avoid leaking the size of the cleartext. This raises the question of how the application knows what part of the decrypted cleartext is padding. Here we use the fact that the cleartext MUST be a valid CID: implementations should parse the cleartext as a CID and discard any content beyond the multihash digest size, which we assume to be the padding.

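A minimal sketch of that rule, assuming the cleartext begins with a binary CIDv1 (the helper names are ours; a real implementation should use a proper CID library rather than parsing varints by hand):

```javascript
// Read an unsigned varint, returning [value, next offset].
function readVarint (bytes, offset) {
  let value = 0
  let shift = 0
  let i = offset
  for (;;) {
    const b = bytes[i]
    i += 1
    value += (b & 0x7f) * 2 ** shift // multiply instead of << to avoid 32-bit overflow
    if ((b & 0x80) === 0) return [value, i]
    shift += 7
  }
}

// Parse just enough of a binary CIDv1 to find where the multihash digest
// ends, then drop everything after it as padding.
function stripCidPadding (cleartext) {
  let o = 0
  ;[, o] = readVarint(cleartext, o) // CID version
  ;[, o] = readVarint(cleartext, o) // content codec
  ;[, o] = readVarint(cleartext, o) // multihash function code
  let digestLength
  ;[digestLength, o] = readVarint(cleartext, o)
  return cleartext.slice(0, o + digestLength)
}
```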
## Decoded JOSE

Typically implementations will want to decode this format into something more useful for applications. Exactly what that will look like depends on the language of the implementation; here we use the IPLD schema language to give a somewhat language-agnostic description of what the decoded representation might look like at runtime. Note that everything which is specified as `base64url(ascii(<data>))` in the JOSE specs -- and which we encode as `Bytes` in the wire format -- is here decoded to a `String`. We also add the `link &Any` attribute to the `DecodedJWS`, which allows applications to easily retrieve the authenticated content.

Also note that, as with the encoded representation, there are two different representations: `DecodedJWE` and `DecodedJWS`. Applications can distinguish between these two branches in the same way as with the encoded representation described above.

```ipldsch
type DecodedSignature struct {
  header optional {String:Any}
  protected optional String
  signature String
}

type DecodedJWS struct {
  payload String
  signatures [DecodedSignature]
  link &Any
}

type DecodedRecipient struct {
  encrypted_key optional String
  header optional {String:Any}
}

type DecodedJWE struct {
  aad optional String
  ciphertext String
  iv String
  protected String
  recipients [DecodedRecipient]
  tag String
  unprotected optional {String:Any}
}
```

## Implementations

- [Javascript](https://github.com/oed/js-dag-jose)
- [Go](https://github.com/alexjg/go-dag-jose)

--------------------------------------------------------------------------------
/block-layer/codecs/dag-json.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Specification: DAG-JSON
17 |
18 | **Status: Descriptive - Final**
19 |
20 | DAG-JSON supports the full [IPLD Data Model](../../data-model-layer/data-model.md).
21 |
22 | DAG-JSON uses the [JavaScript Object Notation (JSON)] data format, defined by [RFC 8259](https://tools.ietf.org/html/rfc8259).
23 |
24 | ## Format
25 |
26 | The native JSON IPLD format is called DAG-JSON to disambiguate it from regular JSON. Most simple JSON objects are valid DAG-JSON. The primary differences are:
27 |
28 | * Bytes and Links are supported with special use of single-key (`"/"`) map.
29 | * In limited cases, maps with the key `"/"`, other than those used to encode Bytes and Links, are disallowed.
30 | * Maps are sorted by key.
31 |
32 | ## Serialization
33 |
34 | Codec implementors **MUST** do the following in order to ensure hashes consistently match for the same block data.
35 |
36 | - Sort object keys by their (UTF-8) encoded representation, i.e. with byte comparisons
37 | - Strip whitespace
38 |
39 | This produces the most compact and consistent representation which will ensure that two codecs
40 | producing the same data end up with matching block hashes.
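
These two rules can be sketched for plain maps, lists, strings, and numbers; `canonicalize` below is a hypothetical helper (real codecs also handle the Bytes and Link forms and full float formatting):

```javascript
// Hypothetical sketch of canonical DAG-JSON-style serialization for plain
// kinds only; the Bytes and Link special forms are omitted for brevity.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(canonicalize).join(',') + ']'
  }
  if (value !== null && typeof value === 'object') {
    // Sort keys by their UTF-8 encoded bytes, then emit with no whitespace.
    const keys = Object.keys(value).sort((a, b) =>
      Buffer.compare(Buffer.from(a, 'utf8'), Buffer.from(b, 'utf8')))
    return '{' + keys
      .map((k) => JSON.stringify(k) + ':' + canonicalize(value[k]))
      .join(',') + '}'
  }
  return JSON.stringify(value) // strings, numbers, booleans, null
}

canonicalize({ b: 1, a: [2, 3] }) // '{"a":[2,3],"b":1}'
```

Two encoders applying these rules to the same data emit identical bytes and therefore produce matching block hashes.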
41 |
42 | ## Supported Kinds
43 |
44 | All [IPLD Data Model Kinds](../../data-model-layer/data-model.md#kinds) except Bytes and Link are supported natively by JSON.
45 |
46 | Bytes and Links use extensions specific to DAG-JSON. They are implemented as a map, where the single key is a slash (`"/"`) and the value contains the kind's data.
47 |
48 | ### Numbers
49 |
50 | JSON only has a single number type. Many dynamically typed programming languages (e.g. Python, Ruby, PHP) distinguish between integers and floats when parsing JSON. JavaScript does not, since all numbers are represented internally as IEEE 754 floats. A JSON number consisting of an optional leading sign (`-`) and only digits is parsed as an integer; if it contains a decimal point, it's parsed as a float. In DAG-JSON, the same method is used to represent integers and floats.
51 |
52 | Data Model floats that do not have a fractional component should be encoded **with** a decimal point, and will therefore be distinguishable from an integer during round-trip. (Note that since JavaScript still cannot distinguish a float from an integer where the number has no fractional component, this rule will not impact JavaScript encoding or decoding).
53 |
54 | Contrary to popular belief, JSON as a format supports Big Integers. It's only JavaScript itself that has trouble with them. This means JS implementations of DAG-JSON can't use the native JSON parser and serializer if integers bigger than `2^53 - 1` need to be supported.
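
The precision loss is easy to demonstrate in JavaScript itself:

```javascript
// '9007199254740993' (2^53 + 1) is valid JSON text, but the native parser
// coerces it to an IEEE 754 float and silently rounds it down to 2^53.
const text = '9007199254740993'
const parsed = JSON.parse(text)
console.log(parsed)                       // 9007199254740992
console.log(Number.isSafeInteger(parsed)) // false
// BigInt preserves the value exactly, but must be wired into a custom
// JSON parser/serializer to be usable for DAG-JSON integers this large.
console.log(BigInt(text))                 // 9007199254740993n
```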
55 |
56 | `Infinity`, `NaN` and `-Infinity` are not natively supported by JSON and are not supported by the IPLD Data Model.
57 |
58 | See further discussion on [Floats in the Data Model](../../data-model-layer/data-model.md#float-kind), including a recommendation to avoid floats where possible when producing and consuming content addressed data.
59 |
60 | ### Bytes
61 |
62 | The Bytes kind is represented as an object with `"bytes"` as key and a Base64 encoded string as value. The Base64 encoding is the one described in [RFC 4648, section 4](https://tools.ietf.org/html/rfc4648#section-4) without padding.
63 |
64 | _Note that a previous version of this specification and some implementations used a [Multibase](https://github.com/multiformats/multibase) prefix `m` for the bytes; this has been removed from the specification and the Base64 encoded bytes **should not** be prefixed._
65 |
66 |
67 | ```javascript
68 | {"/": { "bytes": String /* Base64 encoded binary */ }}
69 | ```
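
For example, a hypothetical encoding helper producing this form, assuming Node.js `Buffer` for the Base64 step:

```javascript
// Sketch: encode a byte array into the DAG-JSON Bytes form using RFC 4648
// section 4 Base64 with padding stripped and no multibase prefix.
function bytesToDagJson(bytes) {
  const b64 = Buffer.from(bytes).toString('base64').replace(/=+$/, '')
  return { '/': { bytes: b64 } }
}

bytesToDagJson(new Uint8Array([1, 2, 3])) // { '/': { bytes: 'AQID' } }
```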
70 |
71 | ### Links
72 |
73 | A Link kind is represented as a base-encoded CID. CIDv0 and CIDv1 are encoded differently.
74 |
75 | - CIDv1 is represented as a Multibase Base32 encoded string. The Base32 encoding is the one described in [RFC 4648, section 6](https://tools.ietf.org/html/rfc4648#section-6) without padding, hence the Multibase prefix is `b`.
76 | - CIDv0 is represented in its only possible Base58 encoding. The Base58 encoding is the one described in the [Base58 draft](https://tools.ietf.org/html/draft-msporny-base58).
77 |
78 | ```javascript
79 | {"/": String /* Base58 encoded CIDv0 or Multibase Base32 encoded CIDv1 */}
80 | ```
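
As a rough illustration (a hypothetical helper, not part of any codec API), the two string forms can be told apart by their leading characters, though a real decoder must still fully validate the CID:

```javascript
// Sketch: distinguish the two Link string forms by prefix. CIDv0 Base58
// strings always begin with "Qm"; Multibase Base32 CIDv1 strings carry the
// multibase prefix "b". This is a heuristic, not validation.
function linkStringKind(s) {
  if (s.startsWith('Qm')) return 'CIDv0'
  if (s.startsWith('b')) return 'CIDv1'
  return 'invalid'
}

linkStringKind('QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n') // 'CIDv0'
linkStringKind('bafybeihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku') // 'CIDv1'
```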
81 |
82 | ### The Reserved Namespace
83 |
84 | Maps with the first key of `"/"` are considered the **reserved namespace** in DAG-JSON as they are used to represent Bytes and Links. Special rules prevent certain data forms from being encoded in DAG-JSON. These rules allow for the clean representation of Bytes and Links as well as efficient operation of tokenizing decoders: a tokenizing decoder should not need to buffer and back-track more than 4 tokens upon detection of a map that is not properly encoding Links or Bytes.
85 |
86 | The two forms used in the reserved namespace are:
87 |
88 | * **CID**: a map with the single key `"/"`, whose value is a string, must contain a valid CIDv0 in Base58 string form **or** a CIDv1 in Base32 string form. Such a map whose string does not properly represent such a CID should be rejected as invalid DAG-JSON.
89 | * **Bytes**: A map with the single key `"/"`, whose value is a map with the single key `"bytes"`, whose value is a string, must contain a valid Base64 encoded byte array. Such a construction whose string does not properly represent a Base64 encoded byte array should be rejected as invalid DAG-JSON.
90 |
91 | #### Parse rejection modes in the reserved namespace
92 |
93 | Data with the following forms are **strictly not valid DAG-JSON** and should be rejected by encoders and decoders:
94 |
95 | ***Maps with more than one key, where the first key is `"/"` and its value is a string.***
96 |
97 | e.g. `{"/":"foo","bar":"baz"}`
98 |
99 | * Where a key exists that sorts before `"/"`, the map is valid, e.g. `{"0bar":"baz","/":"foo"}`.
100 | * Where the value of the `"/"` entry is not a string, the map is valid, e.g. `{"/":true,"bar":"baz"}`.
101 |
102 | ***Maps where the first key is `"/"` and its value is a map with more than one key where the first key of the inner map is `"bytes"` whose value is a string.***
103 |
104 | e.g. `{"/":{"bytes":"foo","bar":"baz"}}`
105 |
106 | * Where a key exists in the inner map that sorts before `"bytes"`, the map is valid, e.g. `{"/":{"abar":"baz","bytes":"foo"}}`.
107 | * Where the value of the inner map's `"bytes"` entry is not a string, the map is valid, e.g. `{"/":{"bytes":true},"bar":"baz"}`.
108 |
109 | ***Maps with more than one key, where the first key is `"/"` and its value is a map where the first key of the inner map is `"bytes"` whose value is a string.***
110 |
111 | e.g. `{"/":{"bytes":"foo"},"bar":"baz"}`
112 |
113 | * Where a key exists that sorts before `"/"`, the map is valid, e.g. `{"0bar":"baz","/":{"bytes":"foo"}}`.
114 | * Where the value of the `"bytes"` entry in the inner map is not a string, the map is valid, e.g. `{"/":{"bytes":true},"bar":"baz"}`.
115 |
116 | There is no mechanism for escaping otherwise valid JSON data that takes these forms. For this reason, it is recommended that the `"/"` key be avoided in Data Model maps where DAG-JSON may be used, in order to avoid such conflicts.
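
Taken together, the rejection rules amount to the following check over a fully parsed map (`checkReserved` is a hypothetical name; a tokenizing decoder applies the same logic with limited lookahead):

```javascript
// Sketch: reject the three invalid reserved-namespace forms described above.
// Assumes `map` is a plain parsed JSON object with keys in document order.
function checkReserved(map) {
  const keys = Object.keys(map)
  if (keys[0] !== '/') return // first key is not "/": no restrictions apply
  const v = map['/']
  if (typeof v === 'string') {
    // Link form: "/" must be the only key in the map
    if (keys.length > 1) throw new Error('invalid DAG-JSON: extra keys beside Link form')
    return
  }
  if (v !== null && typeof v === 'object' && !Array.isArray(v)) {
    const inner = Object.keys(v)
    if (inner[0] === 'bytes' && typeof v.bytes === 'string') {
      // Bytes form: the inner and outer maps must each have exactly one key
      if (inner.length > 1) throw new Error('invalid DAG-JSON: extra keys in Bytes form')
      if (keys.length > 1) throw new Error('invalid DAG-JSON: extra keys beside Bytes form')
    }
  }
}
```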
117 |
118 | ## Implementations
119 |
120 | ### JavaScript
121 |
122 | **[@ipld/dag-json](https://github.com/ipld/js-dag-json)**, for use with [multiformats](https://github.com/multiformats/js-multiformats), adheres to this specification.
123 |
124 | The legacy **[ipld-dag-json](https://github.com/ipld/js-ipld-dag-json)** implementation adheres to this specification, with the following caveats:
125 | * The reserved namespace rules above are not strictly applied. Maps with the forms of Bytes and Links but with additional entries in inner or outer maps will be successfully decoded as Bytes or Links, with the extraneous entries ignored.
126 | * Bytes are encoded with their Multibase Base64 prefix `m` as per a previous version of this specification.
127 |
128 | ### Go
129 |
130 | **[go-ipld-prime](https://github.com/ipld/go-ipld-prime)** adheres to this specification with the following caveats:
131 | * Map keys are not sorted; they retain their assembled order.
132 | * Encoded forms are pretty-printed, i.e. do not have whitespace stripped.
133 |
--------------------------------------------------------------------------------
/block-layer/codecs/dag-pb.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # DAG-PB Spec
17 |
18 | **Status: Descriptive - Final**
19 |
20 | DAG-PB is an IPLD codec that uses [Protocol Buffers](https://developers.google.com/protocol-buffers/) to describe a binary format that can encode a byte array and an associated list of links. It is the primary means of encoding structured file data for [IPFS](https://ipfs.io/), serving as the encoded data carrier for [UnixFS](https://docs.ipfs.io/concepts/file-systems/#unix-file-system-unixfs).
21 |
22 | DAG-PB does not support the full [IPLD Data Model](../../data-model-layer/data-model.md).
23 |
24 | ## Implementations
25 |
26 | * JavaScript
27 | - [@ipld/dag-pb](https://github.com/ipld/js-dag-pb) - compatible with [multiformats](https://github.com/multiformats/js-multiformats)
28 | - [ipld-dag-pb](https://github.com/ipld/js-ipld-dag-pb) - legacy implementation
29 | * Go
30 | - [go-codec-dagpb](https://github.com/ipld/go-codec-dagpb) - for use with [go-ipld-prime](https://github.com/ipld/go-ipld-prime)
31 | - [go-merkledag/pb](https://github.com/ipfs/go-merkledag/tree/master/pb) - legacy implementation
32 | - [go-ipld-prime-proto](https://github.com/ipld/go-ipld-prime-proto) - read-only interface for go-merkledag/pb through [go-ipld-prime](https://github.com/ipld/go-ipld-prime)
33 |
34 | ## Serial Format
35 |
36 | The DAG-PB IPLD serial format is described with a single protobuf:
37 |
38 | ```protobuf
39 | message PBLink {
40 | // binary CID (with no multibase prefix) of the target object
41 | optional bytes Hash = 1;
42 |
43 | // UTF-8 string name
44 | optional string Name = 2;
45 |
46 | // cumulative size of target object
47 | optional uint64 Tsize = 3;
48 | }
49 |
50 | message PBNode {
51 | // refs to other objects
52 | repeated PBLink Links = 2;
53 |
54 | // opaque user data
55 | optional bytes Data = 1;
56 | }
57 | ```
58 |
59 | ### Protobuf Strictness
60 |
61 | DAG-PB aims to have a **canonical form** for any given set of data. Therefore, in addition to the standard Protobuf parsing rules, DAG-PB decoders should enforce additional constraints to ensure canonical forms (where possible):
62 |
63 | 1. Fields in the `PBLink` message must appear in the order as defined by the Protobuf schema above, following the field numbers. Blocks with out-of-order `PBLink` fields should be rejected. (Note that it is common for Protobuf decoders to accept out-of-order field entries, which means the DAG-PB spec is somewhat stricter than may be seen as typical for other Protobuf-based formats.)
64 | 2. Fields in the `PBNode` message must be encoded in the order as defined by the Protobuf schema above. Note that this order does not follow the field numbers. The decoder should accept either order, as IPFS data exists in both forms.
65 | 3. Duplicate entries in the binary form are invalid; blocks with duplicate field values should be rejected. (Note that it is common for Protobuf decoders to accept repeated field values in the binary data, and interpret them as _updates_ to fields that have already been set; DAG-PB is stricter than this.)
66 | 4. Fields and wire types other than those that appear in the Protobuf schema above are invalid and blocks containing these should be rejected. (Note that it is common for Protobuf decoders to skip data in each message type that does not match the fields in the schema.)
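
To make the ordering rules concrete, here is a hypothetical hand-rolled encoder for the canonical form, valid only when every field is shorter than 128 bytes so each length fits in a single varint byte:

```javascript
// Sketch: canonical PBNode wire encoding. Links (field 2, wire type 2,
// tag byte 0x12) are written before Data (field 1, wire type 2, tag byte
// 0x0a), matching the schema order rather than field-number order.
function encodePBNode(linkBytesList, data) {
  const parts = []
  for (const lb of linkBytesList) {
    parts.push(Buffer.from([0x12, lb.length]), Buffer.from(lb))
  }
  if (data !== undefined) {
    parts.push(Buffer.from([0x0a, data.length]), Buffer.from(data))
  }
  return Buffer.concat(parts)
}

encodePBNode([], new Uint8Array([1, 2, 3])) // <Buffer 0a 03 01 02 03>
```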
67 |
68 | ## Logical Format
69 |
70 | When we handle DAG-PB content at the Data Model level, we treat these objects as maps.
71 |
72 | This layout can be expressed with [IPLD Schemas](../../schemas/README.md) as:
73 |
74 | ```ipldsch
75 | type PBNode struct {
76 | Links [PBLink]
77 | Data optional Bytes
78 | }
79 |
80 | type PBLink struct {
81 | Hash Link
82 | Name optional String
83 | Tsize optional Int
84 | }
85 | ```
86 |
87 | ### Constraints
88 |
89 | * The first node in a block of DAG-PB data will match the `PBNode` type.
90 | * `Data` may be omitted or a byte array with a length of zero or more.
91 | * `Links`:
92 | * must be present, even if empty; the binary form makes no distinction between an empty array and an omitted value, in the Data Model we always instantiate an array.
93 | * elements must be sorted in ascending order by their `Name` values, which are compared by bytes rather than as strings.
94 | * `Name`s must be unique or be omitted.
95 | * `Hash`:
96 |   * even though `Hash` is `optional` in the Protobuf encoding, it should not be treated as optional when creating new blocks or decoding existing ones; an omitted `Hash` should be interpreted as a bad block
97 |   * the bytes in the encoding format are interpreted as the bytes of a CID; if the bytes cannot be converted to a CID, the block should be treated as a bad block
98 |   * the data is encoded in the binary form as a byte array; it is therefore possible for a decoder to read a correct binary form but fail to convert a `Hash` to a CID and therefore treat it as a bad block
99 | * When creating data, you can create maps using the standard Data Model concepts, as long as they have exactly these fields. If additional fields are present, the DAG-PB codec will error, because there is no way to encode them.
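
The `Name` ordering rule, for instance, compares UTF-8 bytes rather than locale-aware strings; a minimal sketch (hypothetical helper, treating an omitted `Name` as the empty string):

```javascript
// Sketch: order a PBNode's Links for encoding by comparing the UTF-8 bytes
// of their Names. Omitted Names sort as the empty string, i.e. first.
function sortLinks(links) {
  return [...links].sort((a, b) =>
    Buffer.compare(Buffer.from(a.Name ?? '', 'utf8'), Buffer.from(b.Name ?? '', 'utf8')))
}

sortLinks([{ Name: 'b' }, { Name: 'a' }, {}]).map((l) => l.Name ?? '') // ['', 'a', 'b']
```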
100 |
101 | Both the most recent [JavaScript](https://github.com/ipld/js-dag-pb) and [Go](https://github.com/ipld/go-codec-dagpb) implementations strictly expose this logical format via the Data Model and do not support alternative means of resolving paths via named links as the legacy implementations do (see below).
102 |
103 | ## Alternative (Legacy) Pathing
104 |
105 | While the [logical format](#logical-format) implicitly describes a set of mechanisms for pathing over and through DAG-PB data in strict Data Model form, legacy implementations afford a means of resolving paths by privileging the `Name` in links.
106 |
107 | This alternative pathing is covered here as part of this descriptive spec, but was developed independently of the Data Model and is thus not well standardized.
108 | The alternative pathing mechanisms differ between implementations and have been removed from the newer implementations entirely.
109 |
110 | The legacy [Go](https://github.com/ipfs/go-merkledag/tree/master/pb) and [JavaScript](https://github.com/ipld/js-ipld-dag-pb) implementations both support pathing with link names: `/<name1>/<name2>/…`.
111 |
112 | In the legacy Go implementation, this is the only way, which implies that it is impossible to path through nodes that don't name their links. Also, neither the Data section nor the Links section/metadata is accessible through paths.
113 |
114 | In the legacy JavaScript implementation, there is an additional way to path through the data. It's based purely on the structure of the object, i.e. `/Links/<index>/Hash/…`. This way you have direct access to the `Data`, `Links`, and `size` fields, e.g. `/Links/<index>/Hash/Data`.
115 |
116 | These two ways of pathing can be combined, so you can access e.g. the `Data` field of a named link via `/<name>/Links/0/Hash/Data` or `/Links/<index>/Hash/<name>/Data`. When using the DAG API in js-ipfs, the pathing over the structure has precedence, so you won't be able to use named pathing on a named link called `Links`; you would need to use the index of the link instead.
117 |
118 | Both the most recent [JavaScript](https://github.com/ipld/js-dag-pb) and [Go](https://github.com/ipld/go-codec-dagpb) implementations do not expose novel pathing mechanisms but adhere strictly to the IPLD Data Model as described in the above [Logical Format](#logical-format) schema.
119 |
120 | ## Zero-length blocks
121 |
122 | The zero-length DAG-PB block is valid and will be decoded as having null `Data` and an empty `Links` array.
123 |
124 | With a SHA2-256 multihash, the CID of this block is:
125 |
126 | * CIDv1: `bafybeihdwdcefgh4dqkjv67uzcmw7ojee6xedzdetojuzjevtenxquvyku`
127 | * CIDv0: `QmdfTbBqBPQ7VNxZEYEj14VmRuZBkqFbiwReogJgS1zR1n`
128 |
--------------------------------------------------------------------------------
/block-layer/content-addressable-archives.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Specification: Content Addressable aRchives (CAR / .car)
17 |
18 | **This document has moved:** https://ipld.io/specs/transport/car/
19 |
--------------------------------------------------------------------------------
/block-layer/content-addressable-archives.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ipld/specs/a7b9376ebd43aeabba7d78487db3d9df456b7714/block-layer/content-addressable-archives.png
--------------------------------------------------------------------------------
/block-layer/graphsync/graphsync.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Graphsync
17 |
18 | **Status: Prescriptive - Draft**
19 |
20 | A protocol to synchronize graphs across peers.
21 |
22 | See also [ipld](../IPLD.md), [IPLD Selectors](../../selectors/selectors.md)
23 |
24 | ## [Meta: Status of this doc]
25 |
26 | - This was written around 2018-10-16 ([video presentation](https://drive.google.com/file/d/1NbbVxZQFKXwW6mdodxgTaftsI8eID-c1/view))
27 | - This document is unfortunately far from complete. :( I want to finish this by EOQ.
28 | - But this document provides enough information for an implementation to be made by someone who has already implemented bitswap (or understands it well).
29 | - It relies heavily on an understanding of bitswap as it is now. It likely won't be useful to people without a good understanding of how Bitswap works at the moment.
30 | - This requires IPLD Selectors to exist and be implemented.
31 |
32 | ## Concepts and Glossary
33 |
34 | - `peer` - a program or process participating in the graphsync protocol. It can connect to other peers.
35 | - `graph` - an authenticated directed acyclic graph (DAG) of content; an IPLD DAG, consisting of nodes with hash (content-addressed, authenticated) links to other nodes. ($$ G $$)
36 | - `dag` - a directed acyclic graph. For our purposes, our DAGs are all IPLD (connected by hash links, authenticated, content addressed, etc.)
37 | - `selector` - an expression that identifies a specific subset of a graph. ($$ S(G) \subset G $$)
38 | - `selector language` - the language defining a family of selectors
39 | - `request` - a request for content from one `peer` to another. This is similar to HTTP, RPC, or API requests.
40 | - `response` - the content sent from `responder` to `requester` fulfilling a `request`.
41 | - `requester` - the peer which initiates a `request` (wants content).
42 | - `responder` - the peer receiving a `request`, and providing content in a `response` (provides content).
43 | - `request process` - a request and its fulfillment is a sub-process, a procedure call across peers with the following phases (at a high level):
44 | - (1) The `requester` initiates by sending a request message (`req`) to the `responder`, specifying desired content and other request parameters.
45 | - (2) Upon receiving a request message, the `responder` adds the request to a set of active requests, and starts processing it.
46 | - (3) The `responder` fulfills the request by sending content to the `requester` (the `response`) .
47 | - (4) The `responder` and `requester` can terminate the request process at any time.
48 | - Notes:
49 |   - We are explicitly avoiding the `client-server` terminology to make it clear that `requester` and `responder` are "roles" that any peer might play, and to avoid falling into the two-sided client-server model of the web.
50 | - `requests` may be short or long-lived -- requests may be as short as microseconds or last indefinitely.
51 | - `priority` - a numeric label associated with a `request` implying the relative ordering of importance for requests. This is a `requester's` way of expressing to a `responder` the order in which the `requester` wishes the `requests` to be fulfilled. The `responder` SHOULD respect `priority`, though may return `responses` in any order.
52 |
53 | ## Interfaces
54 |
55 | This is a listing of the data structures and process interfaces involved in the graphsync protocol. For simplicity, we use Go type notation, though of course graphsync is language agnostic.
56 |
57 | ```go
58 | type Graphsync interface {
59 | Request(req Request) (Response, error)
60 | }
61 |
62 | type Request struct {
63 | Selector Selector
64 | Priority Priority // optional
65 | Expires time.Duration // optional
66 | }
67 |
68 | type GraphSyncNet interface {
69 | SendMessage(m Message)
70 | RecvMessage(m Message)
71 | }
72 | ```
73 |
74 |
75 | ## Network Messages
76 |
77 | ```protobuf
78 | message GraphsyncMessage {
79 |
80 | message Request {
81 | int32 id = 1; // unique id set on the requester side
82 | bytes root = 2; // a CID for the root node in the query
83 | bytes selector = 3; // ipld selector to retrieve
84 | map extensions = 4; // side channel information
85 | int32 priority = 5; // the priority (normalized). default to 1
86 | bool cancel = 6; // whether this cancels a request
87 | bool update = 7; // whether this is an update to an in progress request
88 | }
89 |
90 | message Response {
91 | int32 id = 1; // the request id
92 | int32 status = 2; // a status code.
93 | map extensions = 3; // side channel information
94 | }
95 |
96 | message Block {
97 |     bytes prefix = 1; // CID prefix (CID version, multicodec, and multihash prefix (type + length))
98 | bytes data = 2;
99 | }
100 |
101 | // the actual data included in this message
102 | bool completeRequestList = 1; // This request list includes *all* requests, replacing outstanding requests.
103 | repeated Request requests = 2; // The list of requests.
104 | repeated Response responses = 3; // The list of responses.
105 | repeated Block data = 4; // Blocks related to the responses
106 | }
107 | ```
108 |
109 |
110 | ### Extensions
111 |
112 | The Graphsync protocol is extensible. A graphsync request and a graphsync response contain an `extensions` field, which is a map type. Each key of the extensions field specifies the name of the extension, while the value is data (serialized as bytes) relevant to that extension.
113 |
114 | Extensions help make Graphsync operate more efficiently, or provide a mechanism for exchanging side channel information for other protocols. An implementation can choose to support one or more extensions, but it does not have to.
115 |
116 | A list of well known extensions is found [here](./known_extensions.md)
117 |
118 | ### Updating requests
119 |
120 | A client may send an updated version of a request.
121 |
122 | An update contains ONLY extension data, which the responder can use to modify an in-progress request. For example, if a responder supports the Do Not Send CIDs extension, it could choose to also accept an update to this list and ignore CIDs encountered later. It is not possible to modify the original root and selector of a request through this mechanism; if that is needed, you should cancel the request and send a new one.
123 |
124 | The update mechanism in conjunction with the paused response code can also be used to support incremental payment protocols.
125 |
126 | ### Response Status Codes
127 |
128 | ```
129 | # info - partial
130 | 10 Request Acknowledged. Working on it.
131 | 11 Additional Peers. PeerIDs in extra.
132 | 12 Not enough vespene gas ($)
133 | 13 Other Protocol - info in extra.
134 | 14 Partial Response w/ metadata, may include blocks
135 | 15 Request Paused, pending update, see extensions for info
136 |
137 | # success - terminal
138 | 20 Request Completed, full content.
139 | 21 Request Completed, partial content.
140 |
141 | # error - terminal
142 | 30 Request Rejected. NOT working on it.
143 | 31 Request failed, busy, try again later (getting dosed. backoff in extra).
144 | 32 Request failed, for unknown reason. Extra may have more info.
145 | 33 Request failed, for legal reasons.
146 | 34 Request failed, content not found.
147 | ```
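
The code ranges follow a decade convention, which a peer might classify as follows (hypothetical helper):

```javascript
// Sketch: classify a graphsync status code by its decade, per the table above.
function statusClass(code) {
  if (code >= 10 && code < 20) return 'info (partial)'
  if (code >= 20 && code < 30) return 'success (terminal)'
  if (code >= 30 && code < 40) return 'error (terminal)'
  return 'unknown'
}

statusClass(14) // 'info (partial)'
statusClass(34) // 'error (terminal)'
```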
148 |
149 | ## Example Use Cases
150 |
151 | ### Syncing a Blockchain
152 |
153 | Requests we would like to make for this:
154 |
155 | - Give me `/Parent`, `/Parent/Parent` and so on, up to a depth of `N`.
156 | - Give me nodes that exist in `<hash1>` but not `<hash2>`
157 | - In addition to this, the ability to say "Give me some range of (the above query)" is very important. For example: "Give me the second 1/3 of the nodes that are children of `<hash1>` but not `<hash2>`"
158 |
159 | ### Downloading Package Dependencies
160 |
161 | - Give me everything within `/foo/v1.0.0`
162 |
163 | ### Loading content from deep within a giant dataset
164 |
165 | - Give me the nodes for the path `/a/b/c/d/e/f/g`
166 |
167 | ### Loading a large video optimizing for playback and seek
168 |
169 | - First, give me the first few data blocks `/data/*`
170 | - Second, give me all of the tree except for leaves `/**/!`
171 | - Third, give me everything else. `/**/*`
172 |
173 | ### Looking up an entry in a sharded directory
174 |
175 | Given a directory entry I think *might* exist in a sharded directory, I should be able to specify the speculative HAMT path for that item, and get back as much of that path as exists. For example:
176 |
177 | "Give me `/AB/F5/3E/B7/11/C3/B9`"
178 |
179 | And if the item I want is actually just at `/AB/F5/3E`, I should get that back.
180 |
181 | ## Other notes
182 |
183 | **Cost to the responder.** The graphsync protocol will require a non-zero additional overhead of CPU and memory. This cost must be very clearly articulated and accounted for, otherwise we will end up opening ugly DoS vectors.
184 |
--------------------------------------------------------------------------------
/block-layer/graphsync/known_extensions.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Graphsync: Known Extensions
17 |
18 | ### Do Not Send CIDs
19 |
20 | Extension Name: `graphsync/do-not-send-cids`
21 |
22 | What it does:
23 |
24 | Often a node may know ahead of time that it has some of the blocks needed to match a selector query in its local store already. Some reasons this might occur include:
25 |
26 | - a previous request was interrupted
27 | - a previous request for a subset of the requested selector was already completed
28 |
29 | How it works:
30 |
31 | When a requestor sends a request, it should send, as the value for this extension, a CBOR-encoded IPLD node containing the list of CIDs it already has.
32 |
33 | The IPLD schema is as follows:
34 |
35 | ```ipldsch
36 | type DoNotSendCids [Cid]
37 | ```
38 |
39 | The responder node will execute the selector query as it would normally. However, if it supports the extension, when the selector query passes over a block whose CID is in the do-not-send list, the responder will not send that block back, knowing ahead of time that the requestor already has it. The responder does not send a value back in the response for this extension.
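
The responder-side behavior can be sketched as follows (hypothetical names; CIDs stand in as strings):

```javascript
// Sketch: while traversing, skip sending any block whose CID appears in the
// requestor's do-not-send list. The traversal itself is unchanged; only the
// response payload shrinks.
function blocksToSend(traversedBlocks, doNotSendCids) {
  const skip = new Set(doNotSendCids)
  return traversedBlocks.filter((block) => !skip.has(block.cid))
}

blocksToSend([{ cid: 'cidA' }, { cid: 'cidB' }], ['cidA']) // [{ cid: 'cidB' }]
```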
40 |
41 | ### Response Metadata
42 |
43 | Extension Name: `graphsync/response-metadata`
44 |
45 | What it does:
46 |
47 | Response metadata provides information about the response to help the requestor more efficiently verify that the blocks sent back from the responder are valid for the requested IPLD selector. It contains information about the CIDs the responder traversed, in order, during the course of performing the selector query, and whether or not the corresponding block was present in its local block store. Telling the requestor immediately that the query passed over a block the responder did not have allows the requestor to advance its local query, and return a separate error for that particular block.
48 |
49 | How it works:
50 |
51 | When a requestor node sends a request, it should include the "graphsync/response-metadata" key with a CBOR-encoded IPLD boolean value of `true` to request metadata.
52 |
53 | When the responder sends responses, it should include the key with a CBOR-encoded IPLD node of the format:
54 |
55 | ```json
56 | [
57 | {
58 | "link": "cidabcdef",
59 | "blockPresent": true
60 | },
61 | {
62 | "link": "abcdedf443",
63 | "blockPresent": false
64 | },
65 | ...
66 | ]
67 | ```
68 |
69 | or in IPLD Schema:
70 |
71 | ```ipldsch
72 | type LinkMetadata struct {
73 | link Cid
74 | blockPresent Bool
75 | }
76 |
77 | type ResponseMetadata [LinkMetadata]
78 | ```
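
A requestor consuming this metadata might proceed as in the following sketch (hypothetical names; CIDs stand in as strings):

```javascript
// Sketch: walk the metadata in the responder's traversal order, surfacing a
// per-block error for any CID the responder reported absent or never delivered.
function verifyWithMetadata(metadata, receivedBlocks) {
  const received = new Set(receivedBlocks.map((b) => b.cid))
  const errors = []
  for (const { link, blockPresent } of metadata) {
    if (!blockPresent) errors.push({ cid: link, error: 'responder missing block' })
    else if (!received.has(link)) errors.push({ cid: link, error: 'block not delivered' })
  }
  return errors
}
```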
79 |
--------------------------------------------------------------------------------
/block-layer/multihash.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Concept: Multihash
17 |
18 | Multihash is a hash format that is not specific to a single hashing algorithm.
19 |
20 | A multihash describes the algorithm used for the hash as well as the hash value.
21 |
22 | ```
23 | +-----------+----------------------------+
24 | | Hash Type | Hash Value |
25 | +-----------+----------------------------+
26 | ```
27 |
28 | A SHA-256 example:
29 |
30 | ```
31 | +---------+------------------------------------------------------------------+
32 | | SHA-256 | 2413fb3709b05939f04cf2e92f7d0897fc2596f9ad0b8a9ea855c7bfebaae892 |
33 | +---------+------------------------------------------------------------------+
34 | ```
35 |
36 | Note: these examples are simplifications of the concepts. For a complete description visit the [project and its specs](https://github.com/multiformats/multihash).
37 |
--------------------------------------------------------------------------------
/block-layer/serialization-and-formats.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Concept: Serialization and Formats
17 |
18 | A logical separation exists in any given IPLD codec between the **format** and the **serializer/deserializer**.
19 |
20 | ```
21 | ┌────────────────────┐ ┌────────────────────┐
22 | │ │ │ │
23 | │ Serializer │ │ Deserializer │
24 | │ │ │ │
25 | └─────────┬──────────┘ └──────────^─────────┘
26 | │ │
27 | │ Sent to another peer │
28 | │ │
29 | ┌─────────v──────────┐ ┌──────────┴─────────┐
30 | │ │ │ │
31 | │ Format ├─────────────> Format │
32 | │ │ │ │
33 | └────────────────────┘ └────────────────────┘
34 | ```
35 |
36 | A **format** may represent object types and tree structures any way it wishes.
37 | This includes existing representations (JSON, BSON, CBOR, Protobuf, msgpack, etc) or even new custom serializations.
38 |
39 | Therefore, a **format** is the standardized representation of IPLD Links and Paths.
40 | It describes how to translate between structured data and binary.
41 |
42 | It is worth noting that **serializers** and **deserializers** differ by programming language while the **format** does not and MUST remain consistent across all codec implementations.
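To make this separation concrete, here is a toy Python sketch using JSON as a stand-in format (not an actual IPLD codec). Two independently written serializers may differ in implementation, but they must emit identical format bytes for the same logical data:

```python
import json

# Two hypothetical "serializer" implementations in different styles.
# Both must emit the same *format* bytes for the same logical data.
def serialize_a(obj) -> bytes:
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def serialize_b(obj) -> bytes:
    # A different code path (e.g. another library) targeting the same format.
    items = sorted(obj.items())
    return json.dumps(dict(items), sort_keys=True, separators=(",", ":")).encode("utf-8")

data = {"b": 2, "a": 1}
assert serialize_a(data) == serialize_b(data)  # the format is implementation-independent
```

A peer that receives these bytes can deserialize them with any conforming implementation, in any language, because consistency is a property of the format, not of the serializer.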
43 |
--------------------------------------------------------------------------------
/concepts/content-addressability.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Concept: Content Addressability
17 |
18 | "Content addressability" refers to the ability to refer to content by a trustless identifier.
19 |
20 | Rather than referring to content by a string identifier or URL, content addressable systems refer to content
21 | by a cryptographic hash. This allows complete decentralization of the content as the identifier
22 | does not specify the retrieval method and provides a secure way to verify the content.
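The core loop of a content-addressed system can be sketched in a few lines of Python (a toy in-memory store with sha2-256 as a stand-in for a full CID):

```python
import hashlib

store = {}  # a toy content-addressed store: address -> content

def put(content: bytes) -> str:
    """Store content under the hash of the content itself."""
    address = hashlib.sha256(content).hexdigest()
    store[address] = content
    return address

def get(address: str) -> bytes:
    """Retrieve content and verify it against its own address."""
    content = store[address]
    # The address itself lets us verify the content, trustlessly.
    assert hashlib.sha256(content).hexdigest() == address
    return content

addr = put(b"hello world")
```

Because verification needs only the address and the bytes, the content could equally have come from any untrusted peer or transport.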
23 |
--------------------------------------------------------------------------------
/data-model-layer/data-model.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Specification: IPLD Data Model
17 |
18 | **Status: Descriptive - Draft**
19 |
20 | The IPLD Data Model is a core part of the IPLD specification,
21 | which describes what data is representable in IPLD --
22 | for example, booleans, integers, textual strings, maps and lists, etc.
23 |
24 | While the Data Model describes these representations in the abstract,
25 | [Codecs](../block-layer/codecs) specify exactly how these data are transcribed
26 | into serialized bytes. (Another component of the IPLD specifications,
27 | [Schemas](../schemas), provide additional optional tooling on top of the Data
28 | Model which can further refine, describe, and constrain the range of acceptable
29 | data values.)
30 |
31 | Motivation
32 | ----------
33 |
34 | There is not **one** block format but **many** block formats widely used today in content
35 | addressed data structures. We assume that we'll see more of these block formats in the
36 | future and not less. It is quite clear then that a reasonable and more future-proof approach
37 | to using these data structures is to be block format agnostic.
38 |
39 | The data model defines a common representation of basic types that **are easily representable
40 | by common programming languages.** This provides the foundation for block format agnostic tools
41 | to be built using familiar native types in a programmer's preferred language. As such, there
42 | is an element of "lowest common denominator" to the IPLD Data Model in that it cannot support
43 | some advanced features (like non-string keys for Maps) because support for such a feature
44 | is not common enough among programming languages.
45 |
46 | This does not mean that a block format could not support more advanced features than exist in the
47 | data model; it just means that the common set of tools IPLD is building with its block-format-agnostic
48 | approach cannot be easily leveraged to use those features.
49 |
50 | Kinds
51 | -----
52 |
53 | The following is the list of essential _kinds_ (or more formally, _representation kinds_)
54 | of data representable in the IPLD Data Model:
55 |
56 | * Null
57 | * Boolean
58 | * Integer
59 | * Float
60 | * String
61 | * Bytes
62 | * List
63 | * Map
64 | * Link
65 |
66 | (Note that we use the term "_kinds_" here to disambiguate this from "_types_",
67 | which is a term we'll use at the [Schemas](../schemas) level.)
68 |
69 | The _recursive kinds_ are:
70 |
71 | * List
72 | * Map
73 |
74 | The _scalar kinds_ (the complement of recursive) are:
75 |
76 | * Null
77 | * Boolean
78 | * Integer
79 | * Float
80 | * String
81 | * Bytes
82 | * Link
83 |
84 | (Note that [Schemas](../schemas) introduce a few more kinds -- when clarification is necessary,
85 | these Data Model kinds can be called the "_representation kinds_",
86 | while the additional kinds introduced in the Schema layer are "_perceived kinds_".)
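As an informal illustration (not a normative mapping), most of these kinds land naturally on native Python types; Link is the one kind with no native equivalent, so we use a hypothetical placeholder class here:

```python
# A toy mapping of Data Model kinds onto Python natives.
# "Link" has no native equivalent, so a tiny placeholder class stands in.
class Link:
    def __init__(self, cid: str):
        self.cid = cid  # hypothetical CID string

node = {                       # Map (recursive kind)
    "name": "example",         # String
    "count": 3,                # Integer
    "ratio": 0.5,              # Float
    "raw": b"\x00\x01",        # Bytes
    "ok": True,                # Boolean
    "nothing": None,           # Null
    "items": [1, 2, 3],        # List (recursive kind)
    "child": Link("bafy..."),  # Link (a hypothetical, truncated CID)
}

scalar_kinds = {type(None), bool, int, float, str, bytes, Link}
recursive_kinds = {list, dict}
assert type(node) in recursive_kinds
assert all(type(v) in scalar_kinds | recursive_kinds for v in node.values())
```

The "lowest common denominator" design discussed in the Motivation section is visible here: every kind maps onto something a mainstream language already has.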
87 |
88 | ### Kinds Reference
89 |
90 | Each of the following sections provides details about each of the kinds
91 | introduced in the lists above.
92 |
93 | #### Null kind
94 |
95 | Null is a scalar kind. Its cardinality is one -- the only value is 'null'.
96 |
97 | #### Boolean kind
98 |
99 | Boolean is a scalar kind. Its cardinality is two -- either the value 'true' or the value 'false'.
100 |
101 | #### Integer kind
102 |
103 | Integer is a scalar kind and refers to whole numbers without a fractional
104 | component.
105 |
106 | It is important to consider codec and language limitations that may be imposed
107 | on the serialization of integers from the IPLD Data Model. For example:
108 |
109 | * Some codecs, such as DAG-CBOR, will assume that integers must be within the
110 | 64-bit signed range and reject anything larger.
111 | * IPLD libraries, such as go-ipld-prime, limit their in-memory representation
112 | of the integer kind to the signed 64-bit range.
113 | * JavaScript has difficulties safely handling and representing integers outside
114 | of the 53-bit unsigned range and differentiating between integers and floats
115 | where there is no fractional component.
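A sketch of the range check a DAG-CBOR-style codec might apply before serializing an integer (the 64-bit signed limit and the JavaScript safe-integer bound are the assumptions here):

```python
# Assumed limits: 64-bit signed range (per codecs like DAG-CBOR) and
# JavaScript's Number.MAX_SAFE_INTEGER (2^53 - 1).
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
JS_SAFE_MAX = 2**53 - 1

def check_int(value: int) -> int:
    """Reject integers a 64-bit codec could not represent."""
    if not (INT64_MIN <= value <= INT64_MAX):
        raise ValueError("integer outside 64-bit signed range")
    if abs(value) > JS_SAFE_MAX:
        # Not an error, but JavaScript-based consumers may lose precision here.
        pass
    return value

assert check_int(2**53) == 2**53
try:
    check_int(2**63)
except ValueError:
    pass
else:
    raise AssertionError("expected range error")
```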
116 |
117 | #### Float kind
118 |
119 | Float is a scalar kind and refers to a number with a fractional component,
120 | represented with a decimal point. This may also include numbers where the
121 | fractional component is zero (although the distinction between these numbers
122 | and integers is blurred in some cases—specifically the JavaScript memory model).
123 |
124 | There is not a strict 1:1 mapping between the Float kind and any particular
125 | specified memory layout. Even though IEEE 754 is the most common memory and
126 | binary format for expressing floating point values, the IPLD Data Model does
127 | not treat this format as the definition of "Float". However, for practical
128 | purposes, the IPLD Float kind will be implemented using IEEE 754 primitives and
129 | byte layout in most languages and codecs due to its ubiquity. Parts of the IEEE
130 | 754 specification that go beyond the representation of simple floating-point
131 | numbers are not included in the IPLD Data Model. This includes the special
132 | values `NaN`, `Infinity` and `-Infinity`, which are commonly available where
133 | IEEE 754 is supported (`NaN` in particular introduces considerable variability
134 | in byte representations). These should not be supported by IPLD tooling.
135 |
136 | While Float is a formal kind in the IPLD Data Model, **it is recommended that
137 | Float values be avoided when developing systems on IPLD** (and
138 | content-addressable systems in general) due to:
139 | * The broad scope for introducing variability in byte representations.
140 | * The ambiguity introduced in some languages that may cause round-trip
141 | discrepancies; specifically JavaScript which does not clearly disambiguate
142 | between "float" and "integer" in its memory model.
143 | * The imprecise nature of representing the range of possible fractional numbers
144 |   (infinite) in a fixed number of bits means that floating point operations
145 |   typically involve a margin of tolerance (i.e. strict equality is rarely a
146 |   correct way to compare floating point numbers generated by different systems).
147 | Content-addressing works best where the content being addressed has a
148 | stable meaning for the address it produces. Alternative methods for
149 | representing this meaning, or for encoding fractional numbers with greater
150 | precision and less variability, should be considered where possible.
151 |
152 | Some alternatives for representing floating point numbers include:
153 |
154 | * Integers that are divided by a fixed number (e.g. represent cents rather
155 | than dollars and cents and divide by 100 where necessary).
156 | * Pairs of integers representing the parts of a floating point, e.g
157 | significand & exponent.
158 | * A byte array backed by a programmatic construct with necessary accuracy,
159 | e.g. Go's `big.Float`.
160 | * A string form of the value with a fixed number of decimal places.
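The first three alternatives can be sketched for a single example value (1.05, e.g. a price; the variable names are illustrative only):

```python
from fractions import Fraction

# Alternative representations of the fractional value 1.05 that avoid
# storing a float directly:

# 1. Scaled integer: store cents, divide by 100 only for display.
cents = 105

# 2. Significand & exponent pair: 105 * 10^-2.
significand, exponent = 105, -2
assert Fraction(significand) * Fraction(10) ** exponent == Fraction(105, 100)

# 3. Fixed-precision string form.
as_string = "1.05"
assert int(as_string.replace(".", "")) == cents
```

Each of these serializes to stable, deterministic bytes (an integer or a string), sidestepping the byte-level variability of floating point encodings.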
161 |
162 | #### String kind
163 |
164 | Strings do not have any guarantees or requirements with regard to encoding.
165 |
166 | All the hazards you typically find among programming languages and libraries working with binary
167 | serialization of strings apply to strings in the IPLD data model. IPLD's data model is conceptual,
168 | it takes the world as it is, and the world of strings has widely known compatibility issues.
169 |
170 | * Some languages/libraries guarantee a string encoding (typically UTF8), some do not.
171 | * Some languages/libraries can handle arbitrary byte data in strings, some cannot.
172 |
173 | While some codec specifications will define a required encoding it should be noted that in practice
174 | many codec implementations leave this kind of validation and sanitization up to the consumer (application
175 | code) and it is typical to find arbitrary byte data in strings even in codecs that explicitly forbid it.
176 |
177 | Applications **SHOULD** only encode UTF8 data into string values and use byte values when they need
178 | arbitrary bytes, but IPLD libraries may not provide these guarantees and rely on the application, or often the
179 | programming language itself, to do so instead.
180 |
181 | Applications that only serialize valid UTF8 in string values will have fewer compatibility
182 | issues than applications that do not.
183 |
184 | Codec implementations that can deserialize and round-trip
185 | arbitrary byte data in strings will see fewer bug reports from people working with data produced by
186 | applications that serialize arbitrary byte data into strings.
187 |
188 | Strings should use [Unicode Normalization Form](http://www.unicode.org/reports/tr15/) NFC.
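An application-level sketch of the SHOULD rules above, using Python's standard library to normalize to NFC and guarantee valid UTF-8 before a string is placed in IPLD data:

```python
import unicodedata

def prepare_string(s: str) -> bytes:
    """Normalize to NFC and encode as UTF-8; raises on invalid input
    (e.g. unpaired surrogates)."""
    normalized = unicodedata.normalize("NFC", s)
    return normalized.encode("utf-8")

# "é" written as 'e' + combining acute accent normalizes to the single
# precomposed NFC codepoint U+00E9.
decomposed = "e\u0301"
assert prepare_string(decomposed) == "\u00e9".encode("utf-8")
```

Normalizing before serialization means two applications writing the "same" text produce the same bytes, and therefore the same content address.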
189 |
190 | #### Bytes kind
191 |
192 | Bytes is a scalar kind. Its cardinality is infinite -- byte sequences do not have a length limit.
193 |
194 | Bytes are distinct from strings in that they are not considered to have any character encoding nor
195 | generally expected to be printable as human-readable text.
196 | In order to print byte sequences as text, additional effort such as Base64 encoding may be required.
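For example, an arbitrary byte sequence can be rendered printable with Base64 from the Python standard library, and recovered losslessly:

```python
import base64

raw = bytes([0x00, 0xff, 0x10])   # arbitrary bytes; not printable text
printable = base64.b64encode(raw).decode("ascii")
recovered = base64.b64decode(printable)
assert recovered == raw
```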
197 |
198 | #### List kind
199 |
200 | List is a recursive kind.
201 |
202 | Values contained in lists can be accessed by their ordinal offset in the list.
203 |
204 | #### Map kind
205 |
206 | Map is a recursive kind.
207 |
208 | Values in maps are accessed by their "key". Maps can also be iterated over,
209 | yielding key+value pairs.
210 |
211 | #### Link kind
212 |
213 | A link represents a link to another IPLD Block. The link reference
214 | is a [`CID`](CID.md).
215 |
216 | Link is a scalar kind -- however, when "loaded", it may become another kind, either scalar or recursive!
217 |
218 | ### Kinds Implementation References
219 |
220 | - Kinds in Go: https://github.com/ipld/go-ipld-prime/blob/master/kind.go
221 |
--------------------------------------------------------------------------------
/data-model-layer/numeric-size.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | Numeric sizes
17 | =============
18 |
19 |
20 | Integers
21 | --------
22 |
23 | In principle, we consider the range of integers to be infinite.
24 | In practice, many libraries may choose to implement things in such a way that numbers may have limited sizes.
25 |
26 | We require that IPLD libraries support integers up to at least 2^53 in order to be considered a full-featured core-compliant IPLD library.
27 |
28 | We love IPLD libraries that support arbitrarily large numbers. But 2^53 is the critical minimum.
29 |
30 |
31 | ### Why?
32 |
33 | Because.
34 |
35 | But let's ask more detailed questions, and answer those:
36 |
37 | ### Why have size limits at all?
38 |
39 | Most programming languages and compilers already have size limits when working with numbers.
40 |
41 | Being in denial about this when we describe IPLD is unconstructive:
42 | It's important for us to be able to provide concrete recommendations about what we expect from IPLD libraries --
43 | and if that guidance is "always use a bigint type, regardless of whether your language and ecosystem provide a usable and widely adopted one",
44 | then that guidance will be frequently ignored, regardless of how principled and well-intentioned it may be.
45 |
46 | ### Why 2^53?
47 |
48 | This 2^53 number is chosen because it's reasonably high (e.g., you can use it for timestamps),
49 | and also because it's reasonably practical (it happens to be the number above which javascript's handling of numbers gets Interesting).
50 |
51 | **Above this number, it's likely that you'll want to consider application-level and language-level numeric compatibility issues which are bigger in scope than IPLD _anyway_.**
52 |
53 | 32-bit signed, 32-bit unsigned, 64-bit signed, and 64-bit unsigned integers are also all common numeric sizes to consider,
54 | because those are often the well-supported numeric types in programming languages.
55 | When writing a new IPLD library, we suggest you pick "64-bit signed" if these are your options.
56 |
57 | (32-bit numbers are definitely small enough to get you into trouble;
58 | the [2038 problem](https://en.wikipedia.org/wiki/Year_2038_problem) is coming up *very* soon now.
59 | By contrast, a 53-bit integer used to represent a second-granularity timestamp should get you to about the year 285,618,384.
60 | So... it should suffice.)
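The back-of-envelope arithmetic behind that year checks out (using 365-day years, ignoring leap years):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000

# A 53-bit integer as a second-granularity Unix timestamp runs out around:
last_year = 2**53 // SECONDS_PER_YEAR + 1970
assert last_year == 285_618_384
```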
61 |
62 | ### What if I want to write an IPLD library that supports arbitrarily large ints?
63 |
64 | **Go for it.**
65 |
66 | If it is possible to support arbitrary "BigInt" in your library, that's fantastic. Do it.
67 |
68 | We just don't _mandate_ this as part of the minimum core feature checklist for IPLD libraries,
69 | because we understand that it's impractical in some programming languages,
70 | either because the "BigInt" types have different performance characteristics,
71 | or aren't widely agreed upon in the community,
72 | or are otherwise simply syntactically or ergonomically clunky to handle, etc.
73 | But if you see it as easy to support: _go for it_.
74 |
75 | ### What if I ship an IPLD library that only supports 2^47?
76 |
77 | (... or some other completely arbitrary number.)
78 |
79 | Fine.
80 |
81 | Please be very clear about that in your documentation.
82 |
83 | We won't list your library in our own docs as being a full-featured core-compliant IPLD library.
84 |
85 | But go nuts; nobody's going to stop you.
86 | Your library may run into hard times processing data that's produced by other IPLD libraries, but that's a choice you're free to make.
87 |
88 | ### What if my IPLD library encounters serialized numbers that are bigger than it supports?
89 |
90 | Then it must error. Clearly, one would hope.
91 |
92 | IPLD libraries must not quietly round down to their max (or up to their min) supported values -- they must error.
93 |
94 | Fortunately, this rule gets us pretty far, pretty easily -- because we don't *do math* in IPLD,
95 | these errors can really only arise during deserialization,
96 | and should fit naturally into the error reporting flow that deserialization already naturally needs.
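A sketch of the required deserialization-time behavior: out-of-range integers must raise, never silently clamp. The 2^53 bounds and the string-based decode step are illustrative assumptions, standing in for a real codec:

```python
# Assumed supported range for this hypothetical library: +/- 2^53.
SUPPORTED_MAX = 2**53
SUPPORTED_MIN = -(2**53)

def decode_int(raw: str) -> int:
    """Decode an integer, erroring (not clamping) if it is out of range."""
    value = int(raw)  # stand-in for a codec's integer decode step
    if not (SUPPORTED_MIN <= value <= SUPPORTED_MAX):
        raise OverflowError("integer exceeds this library's supported range")
    return value

assert decode_int("9007199254740992") == 2**53
try:
    decode_int(str(2**53 + 1))
except OverflowError:
    pass
else:
    raise AssertionError("expected OverflowError")
```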
97 |
98 |
99 |
100 |
101 |
102 | Floating point
103 | --------------
104 |
105 | // a lot of text can go here.
106 |
107 | // it would be nice if at least some of it can talk about the inherent issues of precision ambiguity in floating point,
108 | // and how fixed point is actually an important consideration in many sufficiently scientific applications.
109 | // discussion of fractions and how floating point approximations of them are necessarily wrong would also be appropriate.
110 | // in general it would be great if this document can remind people that floats are rarely "the answer", and are certainly not the only answer.
111 |
--------------------------------------------------------------------------------
/data-model-layer/pathing.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | Pathing in IPLD
17 | ===============
18 |
19 | "Pathing" refers to the use of "paths" to describe navigation between nodes in IPLD data.
20 |
21 | You can think of "pathing" in IPLD as being comparable to how you use "paths" in a filesystem:
22 | paths are composed of a series of segments, and each segment is an instruction on how to navigate deeper into the filesystem.
23 | With filesystems, each step is over a "directory" and leads you to either a "file" or another "directory";
24 | for IPLD, each step is over a "node" and leads you to another "node"!
25 |
26 | Paths are used in IPLD libraries to tell the library to traverse a graph of nodes.
27 | Paths are also used in IPLD libraries to tell you about the progress you've made in a traversal,
28 | or in error messages to describe where in a data graph the error was encountered.
29 |
30 | Paths are also often user-facing -- it's very convenient for applications built on IPLD to use IPLD paths as part of their user interface.
31 | (See the [how IPFS web gateways work tutorial](/tutorials/how-ipfs-web-gateways-work.md) for an example of this.)
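The segment-by-segment navigation described above can be sketched as a toy traversal over plain Python data (maps indexed by key, lists by ordinal offset; no real IPLD library involved):

```python
def traverse(node, path: str):
    """Step through nested nodes, one path segment at a time."""
    for segment in path.split("/"):
        if segment == "":
            continue  # tolerate leading/trailing slashes
        if isinstance(node, list):
            node = node[int(segment)]  # lists: ordinal offset
        else:
            node = node[segment]       # maps: key lookup
    return node

data = {"foo": {"bar": [10, 20, 30]}}
assert traverse(data, "foo/bar/2") == 30
```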
32 |
33 |
34 | Paths vs Path Segments
35 | ----------------------
36 |
37 |
38 |
39 |
40 |
41 | Unified pathing
42 | ---------------
43 |
44 | We have a concept of pathing which unifies the way we handle data in the (raw) Data Model, data processed with IPLD Schemas, and data which is accessed via ADLs.
45 |
46 |
47 |
48 | Paths in various IPLD implementations
49 | -------------------------------------
50 |
51 | - in Golang:
52 | - in [go-ipld-prime](https://github.com/ipld/go-ipld-prime):
53 | - [ipld.Path (godoc)](https://godoc.org/github.com/ipld/go-ipld-prime#Path)
54 | - [ipld.PathSegment (godoc)](https://godoc.org/github.com/ipld/go-ipld-prime#PathSegment)
55 |
--------------------------------------------------------------------------------
/data-model-layer/paths.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Specification: IPLD Paths
17 |
18 | **Status: Descriptive - Draft**
19 |
20 | ## Summary
21 |
22 | An IPLD "Path" is a string identifier used for deep references into IPLD graphs.
23 | Paths follow similar escape and segmentation rules as URI paths.
24 |
27 | IPLD Paths are constructed following the same constraints as [URI Paths](https://tools.ietf.org/html/rfc3986#section-3.3).
28 |
29 | As in URIs, the string `?` is reserved for future use as a query separator.
30 |
31 | ## Path Resolution
32 |
33 | Path resolution is broken into two parts: full path resolution and block level resolution.
34 |
35 | Block level path resolution is defined by individual codecs.
36 |
37 | Full path resolution applies block level resolution through each block.
38 | When a block level resolver returns an `IPLD Link`, the full path resolver
39 | should retrieve that block, load its codec, and continue with additional
40 | block level resolution until the full path is resolved. Finally, path resolution
41 | should return a [**representation**](./IPLD-Path.md#representation)
42 | of the value for the given path.
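The retrieve-and-continue loop can be sketched with a toy in-memory block store; the `Link` class and the `cidA`/`cidB` identifiers are hypothetical, and a real resolver would also load the target block's codec before continuing:

```python
class Link:
    """Placeholder for an IPLD Link (a CID reference to another block)."""
    def __init__(self, cid: str):
        self.cid = cid

# Toy block store; real resolution would fetch blocks and decode them.
blocks = {
    "cidA": {"child": Link("cidB")},
    "cidB": {"value": 42},
}

def resolve(root_cid: str, path: str):
    node = blocks[root_cid]          # retrieve the root block
    for segment in path.split("/"):
        node = node[segment]         # block level resolution
        if isinstance(node, Link):   # crossed a block boundary:
            node = blocks[node.cid]  # retrieve that block and continue
    return node

assert resolve("cidA", "child/value") == 42
```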
43 |
--------------------------------------------------------------------------------
/data-structures/ethereum/README.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Ethereum as an IPLD Data Structure
17 |
18 | Within these documents, schemas are grouped by their serialized blocks.
19 | Other than those types listed in "Basic Types", the top-level schema type in each grouping of schema
20 | types in a code block represents a data structure that is serialized into a single IPLD block with its own Link (CID).
21 |
22 | There are some state data structures that are repeats of the same form: a modified merkle patricia trie node.
23 | They are intentionally not de-duplicated here, in order to demonstrate the different purposes and contents of those data structures.
24 |
25 | For more information about the IPLD Schema language, see the [specification](https://specs.ipld.io/schemas/).
26 |
27 | ## Data Structure Descriptions
28 |
29 | * [Ethereum Data Structures **Basic Types**](basic_types.md)
30 | * [Ethereum **Chain** Data Structures](chain.md)
31 | * [Ethereum **Convenience Types**](convenience_types.md)
32 | * [Ethereum **State** Data Structures](state.md)
33 |
--------------------------------------------------------------------------------
/data-structures/ethereum/basic_types.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Ethereum Data Structure Basic Types
17 | These types are used throughout the Ethereum data structures but are themselves not IPLD blocks.
18 |
19 | ```ipldsch
20 | # Go big.Int
21 | # Prefer presenting to users either as a number or a string view of the decimal number
22 | # for readability.
23 | type BigInt bytes
24 |
25 | # Unsigned integer
26 | # Used to explicitly specify that an integer cannot be negative
27 | type Uint int
28 |
29 | # Block nonce is an 8 byte binary representation of a block's nonce
30 | type BlockNonce bytes
31 |
32 | # Hash represents the 32 byte KECCAK_256 hash of arbitrary data.
33 | type Hash bytes
34 |
35 | # Address represents the 20 byte address of an Ethereum account.
36 | type Address bytes
37 |
38 | # Bloom represents a 256 byte bloom filter.
39 | type Bloom bytes
40 |
41 | # Balance represents an account's balance in units of wei (1*10^-18 ETH)
42 | type Balance BigInt
43 |
44 | # OpCode is a 1 byte EVM opcode
45 | type OpCode bytes
46 | ```
47 |
--------------------------------------------------------------------------------
/data-structures/ethereum/chain.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Ethereum Chain Data Structures
17 |
18 | This section contains the IPLD schemas for the blockchain data structures of Ethereum.
19 | This includes: headers, uncle sets, transactions, and receipts. The state trie, storage trie,
20 | receipt trie, and transaction trie IPLDs are described in the [state](state.md) section. It
21 | is important to note that traversal from header to a specific transaction or receipt requires traversal
22 | across their respective tries beginning at the root referenced in the header. By contrast, uncles are referenced
23 | directly from the header by the hash of the RLP encoded list of uncles.
24 |
25 | ## Header IPLD
26 |
27 | This is the IPLD schema for a canonical Ethereum block header.
28 | * The IPLD block is the RLP encoded header
29 | * Links to headers use a KECCAK_256 multihash of the RLP encoded header and the EthHeader codec (0x90).
30 | * Parent headers are referenced back to by their child header.
31 | * The genesis header is unique in that it does not reference a parent header in `ParentCID`, instead it contains a reference to a `GenesisInfo` ADL.
32 |
33 | ```ipldsch
34 | # Header contains the consensus fields of an Ethereum block header
35 | type Header struct {
36 | # CID link to the parent header
37 | # This CID is composed of the KECCAK_256 multihash of the linked RLP encoded header and the EthHeader codec (0x90)
38 | ParentCID &Header
39 |
40 | # CID link to the list of uncles at this block
41 | # This CID is composed of the KECCAK_256 multihash of the RLP encoded list of Uncles and the EthHeaderList codec (0x91)
42 | # Note that an uncle is simply a header that does not have an associated body
43 | UnclesCID &Uncles
44 | Coinbase Address
45 |
46 | # CID link to the root node of the state trie
47 | # This CID is composed of the KECCAK_256 multihash of the RLP encoded state trie root node and the EthStateTrie codec (0x96)
48 | # This steps us down into the state trie, from which we can link to the rest of the state trie nodes and all the linked storage tries
49 | StateRootCID &StateTrieNode
50 |
51 | # CID link to the root node of the transaction trie
52 | # This CID is composed of the KECCAK_256 multihash of the RLP encoded tx trie root node and the EthTxTrie codec (0x92)
53 | # This steps us down into the transaction trie, from which we can link to the rest of the tx trie nodes and all of the linked transactions
54 | TxRootCID &TxTrieNode
55 |
56 | # CID link to the root of the receipt trie
57 | # This CID is composed of the KECCAK_256 multihash of the RLP encoded rct trie root node and the EthTxReceiptTrie codec (0x94)
58 | # This steps us down into the receipt trie, from which we can link to the rest of the rct trie nodes and all of the linked receipts
59 | RctRootCID &RctTrieNode
60 |
61 | Bloom Bloom
62 | Difficulty BigInt
63 | Number BigInt
64 | GasLimit Uint
65 |   GasUsed       Uint
66 | Time Uint
67 | Extra Bytes
68 | MixDigest Hash
69 | Nonce BlockNonce
70 | }
71 | ```
72 |
73 | ## Uncles IPLD
74 | This is the IPLD schema for a list of uncles, ordered by ascending block number.
75 | * The IPLD block is the RLP encoded list of uncles
76 | * CID links to `Uncles` use a KECCAK_256 multihash of the RLP encoded list and the EthHeaderList codec (0x91).
77 | * The `Uncles` is referenced in an Ethereum `Header` by the `UnclesCID`.
78 |
79 | ```ipldsch
80 | # Uncles contains an ordered list of Ethereum uncles (headers that have no associated body)
81 | # This IPLD object is referenced by a CID composed of the KECCAK_256 multihash of the RLP encoded list and the EthHeaderList codec (0x91)
82 | type Uncles [Header]
83 | ```
84 |
85 | ## Transaction IPLD
86 | This is the IPLD schema for a canonical Ethereum transaction. It contains only the fields required for consensus.
87 | Note that this will need to be updated once EIP-1559 and EIP-2718 are approved.
88 | * The IPLD block is the RLP encoded transaction
89 | * CID links to `Transaction` use a KECCAK_256 multihash of the RLP encoded transaction and the EthTx codec (0x93).
90 | * `Transaction` IPLDs are not referenced directly from an `Ethereum` header but are instead linked to from within the transaction trie whose root is referenced in the `Header` by the `TxRootCID`.
91 | ```ipldsch
92 | # Transaction contains the consensus fields of an Ethereum transaction
93 | type Transaction struct {
94 | AccountNonce Uint
95 | Price BigInt
96 | GasLimit Uint
97 | Recipient nullable Address # null recipient means the tx is a contract creation
98 | Amount BigInt
99 | Payload Bytes
100 |
101 | # Signature values
102 | V BigInt
103 | R BigInt
104 | S BigInt
105 | }
106 | ```
107 |
108 | ## Receipt IPLD
109 | This is the IPLD schema for a canonical Ethereum receipt. It contains only the fields required for consensus.
110 | * The IPLD block is the RLP encoded receipt
111 | * CID links to `Receipt` use a KECCAK_256 multihash of the RLP encoded receipt and the EthTxReceipt codec (0x95).
112 | * `Receipt` IPLDs are not referenced directly from an `Ethereum` header but are instead linked to from within the receipt trie whose root is referenced in the `Header` by the `RctRootCID`.
113 | ```ipldsch
114 | # Receipt contains the consensus fields of an Ethereum receipt
115 | type Receipt struct {
116 | PostStateOrStatus Bytes
117 | CumulativeGasUsed Uint
118 | Bloom Bloom
119 | Logs [Log]
120 | }
121 |
122 | # Log contains the consensus fields of an Ethereum receipt log
123 | type Log struct {
124 | Address Address
125 | Topics [Hash]
126 | Data Bytes
127 | }
128 | ```
129 |
--------------------------------------------------------------------------------
/data-structures/ethereum/state.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Ethereum State Data Structures
17 |
18 | Ethereum state is stored in four different modified merkle patricia tries (MMPTs): the transaction, receipt, state, and storage tries.
19 | At each block there is one transaction, receipt, and state trie, each referenced by its root hash in the block `Header`. For every contract
20 | deployed on Ethereum there is a storage trie that holds that contract's persistent variables; each storage trie is referenced by its root hash
21 | in the state account object stored in the state trie leaf node corresponding to that contract's address.
22 |
23 | ## Trie Node IPLD
24 | This is the general MMPT node IPLD schema shared by all of the tries described below (everything below
25 | except the AccountSnapshot). The tries are broken out and explicitly typed in order to demonstrate their
26 | different purposes and contents.
27 |
28 | * If leaf or extension node: There will be two elements; [0] is the compact encoded partial path, [1] is val.
29 | * For extension nodes, val is the hash of the child node it references.
30 | * For leaf nodes, val is the RLP encoded value stored at that leaf.
31 | * For the tx trie this value is a transaction.
32 | * For the receipt trie this value is a receipt.
33 | * For the state trie this value is a state account.
34 | * For the storage trie this value is the value of a contract variable.
35 | * Partial path is the path remaining between the current path and the full path.
36 | * If the first nibble in the prefix byte of the key is 0, the node is an extension node with an even length key; the second nibble in the prefix byte is padding (0) and the key fits in the remaining bytes.
37 | * If the first nibble in the prefix byte of the key is 1, the node is an extension node with an odd length key; the second nibble in the prefix byte is actually the first nibble of the key, and the rest of the key fits in the remaining bytes.
38 | * If the first nibble in the prefix byte of the key is 2, the node is a leaf node with an even length key; the second nibble in the prefix byte is padding (0) and the key fits in the remaining bytes.
39 | * If the first nibble in the prefix byte of the key is 3, the node is a leaf node with an odd length key; the second nibble in the prefix byte is actually the first nibble of the key, and the rest of the key fits in the remaining bytes.
40 | * E.g. for a leaf node at path `ce57` that holds the state account for the address hash `ce573ced93917e658d10e2d9009470dad72b63c898d173721194a12f2ae5e190`,
41 | the remaining partial path `3ced93917e658d10e2d9009470dad72b63c898d173721194a12f2ae5e190` has an even number of nibbles, so the compact encoded partial path is `203ced93917e658d10e2d9009470dad72b63c898d173721194a12f2ae5e190`.
42 | * If branch node: There will be 17 elements; [0] - [15] store the hashes of the child nodes at the next hex character (0-f) step down a path, [16] is val.
43 | * If there are no further nodes down one of the branch's paths, an empty byte array is stored in the corresponding element.
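The four prefix cases above can be sketched as a small hex-prefix encoder. This is an illustrative, non-normative Python sketch: `nibbles` is the list of remaining path nibbles (0-15) and `is_leaf` distinguishes leaf from extension nodes.

```python
def compact_encode(nibbles: list, is_leaf: bool) -> bytes:
    # hex-prefix encoding: the high nibble of the prefix byte carries the
    # leaf flag (2) plus the odd-length flag (1); the low nibble is either
    # padding (even-length key) or the first key nibble (odd-length key)
    flag = 2 if is_leaf else 0
    if len(nibbles) % 2 == 1:
        prefixed = [flag + 1] + list(nibbles)
    else:
        prefixed = [flag, 0] + list(nibbles)
    return bytes((prefixed[i] << 4) | prefixed[i + 1]
                 for i in range(0, len(prefixed), 2))
```

For example, an extension node with the odd-length partial path `123` compact-encodes to `1123`, and a leaf node with the even-length partial path `ce` compact-encodes to `20ce`.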
44 |
45 | NOTE: if the RLP encoding of a leaf node is shorter than the hash of that node (< 32 bytes), then
46 | the leaf node is included directly in the parent branch or extension node rather than the parent linking to it by hash.
47 | In this case the child is a one element list of bytes where that one element is the RLP encoded value itself. In practice this is only possible for
48 | the storage trie, as RLP encoded transactions, receipts, and state accounts are always greater than 32 bytes in length.
49 |
50 | ```ipldsch
51 | # TrieNode ADL
52 | # Node IPLD values are RLP encoded; node IPLD multihashes are always the KECCAK_256 hash of the RLP encoded node bytes and the codec is dependent on the type of the trie
53 | type TrieNode struct {
54 | Elements [Bytes]
55 | }
56 | ```
57 |
58 | *TODO: maybe make `TrieNode` a union type of the 3 different node types.*
59 |
60 | ## Transaction Trie IPLD
61 | This is the IPLD schema type for transaction trie nodes.
62 | * The IPLD block is the RLP encoded trie node.
63 | * Leaf node keys are the RLP encoding of the transaction's index.
64 | * Leaf node values are the RLP encoded transaction.
65 | * CID links to transaction trie nodes use a KECCAK_256 multihash of the RLP encoded node and the EthTxTrie codec (0x92).
66 | * The root node of the transaction trie is referenced in an Ethereum `Header` by the `TxRootCID`.
67 | ```ipldsch
68 | # TxTrieNode is an IPLD block for a node in the transaction trie
69 | type TxTrieNode TrieNode
70 | ```
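Leaf keys in this trie are the RLP encoding of the (small, unsigned) transaction index; as an illustrative sketch, that encoding for non-negative integers is:

```python
def rlp_encode_index(i: int) -> bytes:
    # minimal RLP for the small unsigned integers used as tx/receipt trie keys:
    # zero encodes as the empty byte string; a single byte < 0x80 encodes as
    # itself; otherwise the big-endian bytes get a 0x80 + length prefix
    b = b"" if i == 0 else i.to_bytes((i.bit_length() + 7) // 8, "big")
    if len(b) == 1 and b[0] < 0x80:
        return b
    return bytes([0x80 + len(b)]) + b
```

So transaction index 0 keys at `0x80`, index 1 at `0x01`, and index 128 at `0x8180`.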
71 |
72 | ## Receipt Trie IPLD
73 | This is the IPLD schema type for receipt trie nodes.
74 | * The IPLD block is the RLP encoded trie node.
75 | * Leaf node keys are the RLP encoding of the receipt's index.
76 | * Leaf node values are the RLP encoded receipt.
77 | * CID links to receipt trie nodes use a KECCAK_256 multihash of the RLP encoded node and the EthTxReceiptTrie codec (0x94).
78 | * The root node of the receipt trie is referenced in an Ethereum `Header` by the `RctRootCID`.
79 | ```ipldsch
80 | # RctTrieNode is an IPLD block for a node in the receipt trie
81 | type RctTrieNode TrieNode
82 | ```
83 |
84 | ## State Trie IPLD
85 | This is the IPLD schema type for state trie nodes.
86 | * The IPLD block is the RLP encoded trie node.
87 | * Leaf node keys are the KECCAK_256 hash of the account address.
88 | * Leaf node values are the RLP encoded state accounts.
89 | * CID links to state trie nodes use a KECCAK_256 multihash of the RLP encoded node and the EthStateTrie codec (0x96).
90 | * The root node of the state trie is referenced in an Ethereum `Header` by the `StateRootCID`.
91 | ```ipldsch
92 | # StateTrieNode is an IPLD block for a node in the state trie
93 | type StateTrieNode TrieNode
94 | ```
95 |
96 | ## State Account IPLD
97 | This is the IPLD schema for a state account in the Ethereum state trie.
98 | * The IPLD block is the RLP encoded state account; this is the object stored as the value in a StateTrieNode leaf.
99 | * CID links to state accounts use a KECCAK_256 multihash of the RLP encoded state account and the EthAccountSnapshot codec (0x97).
100 | ```ipldsch
101 | # Account is the Ethereum consensus representation of accounts.
102 | # These objects are stored in the state trie leafs.
103 | type StateAccount struct {
104 | Nonce Uint
105 | Balance Balance
106 |
107 | # CID link to the storage trie root node
108 | # This CID links down to all of the storage nodes that exist for this account at this block
109 | # This CID uses the EthStorageTrie codec (0x98)
110 | # If this is a contract account the multihash is a KECCAK_256 hash of the RLP encoded root storage node
111 | # If this is an externally controlled account, the multihash is a KECCAK_256 hash of an RLP encoded empty byte array
112 | StorageTrieCID &StorageTrieNode
113 |
114 | # CID link to the bytecode for this account
115 | # This CID uses the Raw codec (0x55)
116 | # If this is a contract account the multihash is a KECCAK_256 hash of the contract byte code for this contract
117 | # If this is an externally controlled account the multihash is the KECCAK_256 hash of an empty byte array
118 | CodeCID &ByteCode
119 | }
120 |
121 | type ByteCode bytes
122 | ```
123 |
124 | ## Storage Trie IPLD
125 | This is the IPLD schema type for storage trie nodes.
126 |
127 | * Leaf node keys are the KECCAK_256 hash of the storage slot key.
128 | * The value of a storage slot key ultimately depends on the EVM bytecode compiler used.
129 | For Solidity, the most widely used compiler, the keys are generated as follows:
130 | 1) The slot position is the position at which a variable is declared. E.g. the first variable declared in a smart contract
131 | has a slot position of `0`.
132 | 2) For primitive data type variables (such as `string`, `int`, and `bool`) the storage slot key is the 32-byte left-padded slot position.
133 | E.g. an `int` declared as the first variable in a contract has the slot key `0000000000000000000000000000000000000000000000000000000000000000`
134 | and the leaf node key KECCAK_256(`0000000000000000000000000000000000000000000000000000000000000000`).
135 | 3) Composite data type variables are more complicated. For mappings, each entry is stored at a different storage leaf: the storage slot key for a specific entry is
136 | the KECCAK_256 hash of the entry's key in the map concatenated with the mapping's 32-byte slot position. E.g. for a mapping `mapping(address => uint)` declared
137 | as the first variable in a contract, the storage slot key for the entry with key `0x89205A3A3b2A69De6Dbf7f01ED13B2108B2c43e7` is
138 | KECCAK_256(`0x89205A3A3b2A69De6Dbf7f01ED13B2108B2c43e7` + `0000000000000000000000000000000000000000000000000000000000000000`), and the leaf node key is
139 | the hash of this storage slot key, i.e. KECCAK_256(KECCAK_256(`0x89205A3A3b2A69De6Dbf7f01ED13B2108B2c43e7` + `0000000000000000000000000000000000000000000000000000000000000000`)).
140 |
141 | * Leaf node values are the RLP encoded storage values.
142 | * CID links to storage trie nodes use a KECCAK_256 multihash of the RLP encoded node and the EthStorageTrie codec (0x98).
143 |
144 | ```ipldsch
145 | # StorageTrieNode is an IPLD block for a node in the storage trie
146 | # The root node of the storage trie is referenced in an StateAccount by the StorageTrieCID
147 | type StorageTrieNode TrieNode
148 | ```
149 |
--------------------------------------------------------------------------------
/data-structures/filecoin/README.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Filecoin as an IPLD Data Structure
17 |
18 | Within these documents, schemas are grouped by their serialized blocks. Other than those types listed in "Basic Types" and "Crypto Types", each grouping of schema types in a code block represents a data structure that is serialized into a single IPLD block with its own Link (CID).
19 |
20 | Advanced Data Layouts (ADLs) are shown in their expanded form here, as the data appears on-block. Their logical forms for programmatic purposes are `Map` for the HAMT and `List` for the AMT.
21 |
22 | There are some data structures that are repeats of the same forms, primarily the AMT and HAMTs that share the same data types. They are not de-duplicated here for clarity to demonstrate the different purposes of those data structures.
23 |
24 | For more information about the IPLD Schema language, see the [specification](https://specs.ipld.io/schemas/).
25 |
26 | ## Data Structure Descriptions
27 |
28 | * [Filecoin Data Structures **Basic Types**](basic_types.md)
29 | * [Filecoin **Main Chain** Data Structures](chain.md)
30 | * [Filecoin **Messages** Data Structures](messages.md)
31 | * [Filecoin **Actor State** Data Structures](state.md)
32 |
--------------------------------------------------------------------------------
/data-structures/filecoin/basic_types.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Filecoin Data Structures Basic Types
17 |
18 | These types are used throughout the Filecoin data structures.
19 |
20 | ```ipldsch
21 | # Address is a `https://github.com/filecoin-project/go-address` in its binary byte format.
22 | # It may be used as a hamt key, and will be often visualized or presented to users
23 | # in its more readable string form.
24 | type Address bytes
25 |
26 | # Go big.Int
27 | # Prefer presenting to users either as a number or a string view of the decimal number
28 | # for readability.
29 | type BigInt bytes
30 |
31 | # An indicator of which RPC method on an actor a message should trigger execution of.
32 | type MethodNum int
33 |
34 | # The 'f0...' subset of Addresses used for the actual indexes of actors in a state root.
35 | type ActorID int
36 |
37 | # the height of a block in the chain. Should fit in an Int64
38 | type ChainEpoch int
39 |
40 | # the ChainEpoch as bytes, where the integer is converted to its string form and
41 | # the string's bytes are used
42 | type ChainEpochBytes bytes
43 |
44 | type TokenAmount BigInt
45 |
46 | type PaddedPieceSize int
47 |
48 | type PeerID bytes
49 |
50 | type SectorSize int
51 |
52 | type SectorNumber int
53 |
54 | # the SectorNumber as bytes, where the integer is encoded as a uvarint and the
55 | # resulting bytes are used
56 | type SectorNumberBytes bytes
57 |
58 | type PartitionNumber int
59 |
60 | type BitField bytes
61 |
62 | type StoragePower BigInt
63 |
64 | type DataCap StoragePower
65 |
66 | type DealID int
67 |
68 | # the DealID as bytes, where the integer is encoded as a uvarint and the resulting
69 | # bytes are used
70 | type DealIDBytes bytes
71 |
72 | type DealWeight BigInt
73 |
74 | type Multiaddr bytes
75 |
76 | type RegisteredSealProof int
77 |
78 | type TransactionID int # TxnID
79 |
80 | # the TransactionID as bytes, where the integer is encoded as a varint (not uvarint)
81 | # and the resulting bytes are used
82 | type TransactionIDBytes bytes
83 |
84 | # A quantity of space x time (in byte-epochs) representing power committed to the network for some duration.
85 | type Spacetime BigInt
86 |
87 | type ExitCode int
88 |
89 | # Message parameters are encoded as DAG-CBOR and the resulting bytes are
90 | # embedded as `Params` fields in some structs.
91 | # See the Filecoin Messages Data Structures document for encoded DAG-CBOR message
92 | # params
93 | type CborEncodedParams Bytes
94 |
95 | # Message receipt returns are encoded as DAG-CBOR and the resulting bytes are
96 | # embedded as the `Return` field in `MessageReceipt`.
97 | # See the Filecoin Messages Data Structures document for encoded DAG-CBOR message
98 | # returns
99 | type CborEncodedReturn Bytes
100 | ```
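The comments above reference two integer-to-bytes encodings (the uvarint used for `SectorNumberBytes` and `DealIDBytes`, and the signed varint used for `TransactionIDBytes`). As a non-authoritative sketch of how these encodings behave:

```python
def uvarint(n: int) -> bytes:
    # unsigned LEB128: 7 bits per byte, high bit set on all but the last byte
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def varint(n: int) -> bytes:
    # signed varint in the style of Go's binary.PutVarint: zigzag-map the
    # signed value onto the unsigned range, then uvarint-encode it
    return uvarint((n << 1) ^ (n >> 63))
```

E.g. `uvarint(300)` yields `ac 02`, and `varint(-1)` yields `01` (zigzag maps -1 to 1).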
101 |
102 | ## Crypto Types
103 |
104 | ```ipldsch
105 | type Signature union {
106 | SignatureSecp256k1 1
107 | SignatureBLS 2
108 | } representation byteprefix
109 | ```
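A `byteprefix` union discriminates on the first byte of the encoded bytes, per the numbers in the schema above. A minimal illustrative decoder (the function name and return shape are assumptions for this sketch):

```python
def decode_signature(raw: bytes):
    # byteprefix union: the first byte selects the member (1 = Secp256k1,
    # 2 = BLS, matching the Signature schema); the rest is the payload
    kind = {1: "Secp256k1", 2: "BLS"}.get(raw[0])
    if kind is None:
        raise ValueError("unknown signature prefix: %d" % raw[0])
    return kind, raw[1:]
```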
110 |
--------------------------------------------------------------------------------
/data-structures/flexible-byte-layout.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Specification: Flexible Byte Layout
17 |
18 | **Status: Prescriptive - Draft**
19 |
20 | `Flexible Byte Layout` is an advanced layout for representing binary data.
21 |
22 | It is flexible enough to support very small and very large (multi-block) binary data.
23 |
24 | ```ipldsch
25 | type FlexibleByteLayout union {
26 | | Bytes bytes
27 | | NestedByteList list
28 | | &FlexibleByteLayout link
29 | } representation kinded
30 |
31 | type NestedByteList [ NestedByte ]
32 |
33 | type NestedByte union {
34 | | Bytes bytes
35 | | NestedFBL list
36 | } representation kinded
37 |
38 | type NestedFBL struct {
39 | length Int
40 | part FlexibleByteLayout
41 | } representation tuple
42 | ```
43 |
44 | `FlexibleByteLayout` is the entrypoint/root of the data structure and uses a potentially recursive
45 | union type. This allows you to build very large nested dags via `NestedByteList` that can themselves
46 | contain additional `NestedByteList`s or actual `Bytes` (via Links to `FlexibleByteLayout` in `NestedByte`).
47 |
48 | An implementation must define a custom function for reading ranges of binary
49 | data but once implemented, you can read data regardless of the layout algorithm used.
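A minimal sketch of such a read function follows. The `Link` class and `load` resolver are assumptions for illustration; list entries follow the `[length, part]` tuple representation of `NestedFBL` above, or are raw bytes.

```python
class Link(str):
    # stand-in for a CID link; a real implementation would hold a CID
    pass

def read_fbl(node, start, end, load):
    """Read bytes [start, end) from a FlexibleByteLayout node.

    `node` is bytes, a Link (resolved via `load`), or a list whose entries
    are raw bytes or [length, part] tuples (the NestedFBL tuple form).
    """
    if isinstance(node, Link):
        node = load(node)
    if isinstance(node, (bytes, bytearray)):
        return bytes(node[start:end])
    out = bytearray()
    offset = 0  # absolute position of the current entry within this node
    for entry in node:
        is_raw = isinstance(entry, (bytes, bytearray))
        size = len(entry) if is_raw else entry[0]
        lo, hi = max(start - offset, 0), min(end - offset, size)
        if lo < hi:  # the requested range overlaps this entry
            out += read_fbl(entry if is_raw else entry[1], lo, hi, load)
        offset += size
    return bytes(out)
```

Note how the reader never needs to know how the writer chose to chunk the data; it only walks lengths and recurses.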
50 |
51 | Since readers only need to concern themselves with implementing the read method, they **do not**
52 | need to understand the algorithms used to generate the layouts. This gives a lot of flexibility
53 | in the future to define new layout algorithms as necessary without needing to worry about
54 | updating prior implementations.
55 |
56 | The `length` property must be encoded with the proper byte length; if it is not, readers
57 | will be unable to read the data correctly. However, the property is **not secure**: a malicious encoder
58 | could write whatever value they please. As such, it should not be relied upon when calculating usage
59 | against a quota or any similar calculation where there may be an incentive for an encoder to alter the
60 | length.
61 |
--------------------------------------------------------------------------------
/data-structures/fs.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Specification: unixfs-v2
17 |
18 | **Status: Prescriptive - Draft**
19 |
20 | The following schema is used to represent files and directories in pure IPLD Data Model. It
21 | differs substantially from UnixFSv1, which is built exclusively on `dag-pb` and is currently
22 | integrated into IPFS.
23 |
24 | This schema makes use of two existing data structures, HAMT and FBL.
25 |
26 | ```ipldsch
27 | type Symlink struct {
28 | target String
29 | }
30 |
31 | type DirEnt struct {
32 | attribs optional Attribs
33 | content AnyFile
34 | }
35 |
36 | type AnyFile union {
37 | | "f" FBL
38 | | "d" &HAMT
39 | | "l" Symlink
40 | } representation keyed
41 |
42 | type Attribs struct {
43 | # we'll discuss this in the next section;
44 | # for now, it's enough to reserve the position where it's used.
45 | }
46 | ```
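Because `AnyFile` is a keyed union, the single map key of a `DirEnt`'s `content` selects the member. A minimal illustrative matcher (the function name and labels are assumptions for this sketch):

```python
def classify_any_file(node: dict) -> str:
    # keyed union: exactly one discriminant key ("f", "d", or "l") is present
    kinds = {"f": "file", "d": "directory", "l": "symlink"}
    [key] = node  # raises if the map does not have exactly one key
    return kinds[key]
```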
47 |
--------------------------------------------------------------------------------
/data-structures/hashmap.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Specification: HashMap
17 |
18 | **This document has moved:** https://ipld.io/specs/advanced-data-layouts/hamt/spec/
19 |
--------------------------------------------------------------------------------
/design/README.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # IPLD Exploratory Specifications
17 |
18 | [This is the home](https://github.com/ipld/specs/tree/master/design) for all exploratory stage specifications
19 | and other design notes.
20 |
21 | Nothing documented here should be considered final and the contribution guidelines allow any contributor to
22 | add new documents without consensus.
23 |
24 | Having implementations does not mean a specification is ready to move on to
25 | draft stage. Implementations of exploratory specifications often exist and serve to inform spec changes
26 | and eventual graduation to draft stage when appropriate.
27 |
28 | | | |
29 | |-----|------|
30 | | [Dir: history](./history) | [./history](./history) |
31 | | [Dir: history/exploration-reports](./history/exploration-reports) | [./history/exploration-reports](./history/exploration-reports) |
32 |
--------------------------------------------------------------------------------
/design/composites-A.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Specification: IPLD Composites
17 |
18 | **Status: Prescriptive - Exploratory**
19 |
20 | Organizing IPLD data into usable, efficient, complex data structures spanning many blocks aimed for use by end-user applications.
21 |
22 | This document will re-use some terms found in the [IPLD data model](data-model-layer/data-model.md).
23 |
24 | IPLD Composites offer codec agnostic programming interfaces for all common operations users can currently accomplish on Data-Model [“Kinds”](data-model-layer/data-model.md#kinds).
25 |
26 | Contents:
27 |
28 | * [Motivation](#motivation)
29 | * [Requirements](#requirements)
30 | * [Composite Definition](#composite-definition)
31 | * [Operations](#operations)
32 |
33 | ## Motivation
34 |
35 | Even before the IPLD Data-Model was formally specified, developers were creating multi-block data-structures with similar semantics to single-block primitives. The most illustrative example of this is the `dag-pb` HAMT implementation used by IPFS for large directories.
36 |
37 | These early implementations of multi-block data structures exposed several problems.
38 |
39 | * They lack self-description. A consumer of a graph containing these structures needs logic on top of IPLD and must vary the way it performs operations on them.
40 | * Implementations of these data structures cannot perform operations on each other. In other words, multi-block data structures have a hard time building on top of each other.
41 |
42 | Since there wasn’t a standardized way to describe these data structures we couldn’t build libraries for paths and selectors that seamlessly supported them.
43 |
44 | As we started designing this system several other requirements surfaced.
45 |
46 | * Transparent encryption envelopes on the read **and** write.
47 | * Advanced `Link` types that can support some form of mutability and link to paths within other data structures.
48 | * Flexible multi-block binary types.
49 |
50 | ## Requirements
51 |
52 | IPLD Composites cannot be implemented without:
53 |
54 | * The IPLD Data-Model. While composites are codec agnostic they do require the full data model be implemented by the codec.
55 |
56 | ## Composite Definition
57 |
58 | Composite definitions describe how to find an implementation of the data structure. When encoded into the data, these also serve as the self-description mechanism.
59 |
60 | A Composite Definition may be applied in a number of ways, either "out-of-band" by applications or "in-band" using something like the "Fat Pointers" discussed briefly below.
61 |
62 | ***Open Issue: Fat Pointers***
63 |
64 | Early experiments simply reserved the `_type` property for composites to describe themselves. Reserving this property by default across any data in any block is highly problematic and makes it impossible to express certain data in IPLD.
65 |
66 | What we need in order to move forward with some version of "fat pointers" is still under discussion. One
67 | option would be some extension or modification to `CID` that signals "the data being linked to is a composite definition", at
68 | which point we could safely add semantics to `_type` or other properties without reserving any property universally;
69 | there may be other options we have yet to explore.
70 |
71 | ### Version 1
72 |
73 | The `_type` property is a string identifier. This identifier is used to look up the implementation; if it cannot be found by the host environment, any operation is expected to throw an exception. When a Composite Definition is applied, implementations MUST NOT fall back to *Layer 1* operations on the contents of the node if they do not have an implementation.
74 |
75 | Example:
76 |
77 | ```json
78 | { "_type": "IPLD/Experimental/HAMT/0" }
79 | ```
80 |
81 | ### Version 2
82 |
83 | The `_type` property is a `Map`.
84 |
85 | The map must contain the following properties.
86 |
87 | * `name` must be a string identifier.
88 | * `engine` must be one of the following:
89 |     * `"IPLD/Engine/WASM/0"`
90 |
91 | Each additional property describes the implementation of every operation.
92 |
93 | *TODO: define structure of pointers to WASM functions*
94 |
--------------------------------------------------------------------------------
/design/history/README.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | IPLD Design History
17 | ===================
18 |
19 | This folder is for reference material that was part of the design history of IPLD.
20 |
21 | Some of this material may represent works-in-progress; others have become historical.
22 | (We do not attempt to differentiate these in advance, because it's only in retrospect
23 | that it becomes clear whether an idea is going somewhere, or not!)
24 | It's called "history", but it may also be the source of innovation!
25 |
26 | We put content here for archival reference; but also, we have organized
27 | content in this directory in a way that encourages
28 | writing things down -- even (and especially) new ideas.
29 |
30 | One trait that's unified across files within this directory is that we treat them
31 | as (roughly) *append-only*. This is because we want to use them as touchpoints
32 | in discussion, and *not as the discussion format itself*.
33 |
34 | Therefore, if you encounter a document in the history directory that you think
35 | needs substantial iteration and new content, the way to move forward is:
36 |
37 | - gather your ideas and start writing
38 | - prepare to create a new file
39 | - commit and share your notes, linking to existing content where you can.
40 |
41 | If you don't have push access to this repository, or want to start writing and
42 | sharing even earlier iterations of thoughts, GitHub Gists are also a great way
43 | to handle this style of editing.
44 | This directory can be thought of roughly like a collection of gists; we've just
45 | gathered the ones especially relevant to the evolution of the design of IPLD.
46 |
47 |
48 | Specific forms of History
49 | -------------------------
50 |
51 | - [Exploration Reports](./exploration-reports)
52 |
53 |
54 | FAQ
55 | ---
56 |
57 | See the fine [FAQ](./FAQ.md).
58 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2018.07-graphsync-a/MTE1NDM5MC80NTg0MTc5OS1lMTFjZGQ4MC1iY2U4LTExZTgtOWZkOS01NzI4NDRhYWJmNTkucG5n.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ipld/specs/a7b9376ebd43aeabba7d78487db3d9df456b7714/design/history/exploration-reports/2018.07-graphsync-a/MTE1NDM5MC80NTg0MTc5OS1lMTFjZGQ4MC1iY2U4LTExZTgtOWZkOS01NzI4NDRhYWJmNTkucG5n.png
--------------------------------------------------------------------------------
/design/history/exploration-reports/2018.07-merkle-segments.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | This document was archived from https://github.com/ipld/specs/issues/67. There was agreement that UTF-8 NFC should be the canonical encoding. Below is the original text of the issue written by @warpfork.
17 |
18 | ---
19 |
20 | ## Mission
21 |
22 | It's important to specify precisely what is a valid merklepath segment in IPLD.
23 |
24 | The spec currently contains a "TODO: list path resolving restrictions" and this could be improved :)
25 |
26 | ---
27 |
28 | ## Why
29 |
30 | First, a quick clarification: "merklepath segments" are a distinct concept from "IPLD Selectors". Merklepaths are a specific and limited implementation of IPLD Selectors; they can only specify a traversal to a single object; and importantly, we want them to be serializable in a way that's easy for humans to operate. To quote the current spec for motivations:
31 |
32 | > IPLD paths MUST layer cleanly over UNIX and The Web (use `/`, have deterministic transforms for ASCII systems).
33 |
34 | (Perhaps "ASCII" is a little over-constrained there. The spec *also* says
35 | "IPLD paths MUST be universal and avoid oppressing non-english societies (e.g. use UTF-8, not ASCII)" -- we might want to refine those two lines after we tackle the rest of this issue.)
36 |
37 | Second of all, just a *list* of other issues that are fairly closely related to a need for clarity on this subject:
38 |
39 | - https://github.com/ipfs/go-ipfs/issues/1710 -- "**IPFS permits undesirable paths**" -- lots of good discussion on encodings, and examples of valid but problematic characters
40 | - https://github.com/ipld/specs/issues/59 -- "**Document restrictions on keys of maps**"
41 | - https://github.com/ipld/specs/issues/58 -- "**Non-string keys**" -- this one has a particularly interesting detail quoted: "The original intention was to actually be quite restrictive: map keys must be unicode strings with no slashes. However, we've loosened that so that they can contain slashes, it's just that those keys can't be pathed through.". (n.b. *this* issue here is named "merklepath segments", not "IPLD keys", specifically to note this same distinction.)
42 | - https://github.com/ipld/specs/issues/55 -- "**Spec out DagPB path resolution**"
43 | - https://github.com/ipld/specs/issues/37 -- "**Spec refining: make sure that an attribute cannot be named `.`**"
44 | - https://github.com/ipld/specs/issues/20 -- "**Spec refining: Merkle-paths to an array's index**"
45 | - https://github.com/ipld/specs/issues/15 -- "**Spec refining: Terminology IPLD vs Merkle**" -- basically, am I titling this issue correctly by saying "merklepath"? Perhaps not ;)
46 | - https://github.com/ipld/specs/issues/1 -- "**Spec refining: Relative paths in IPLD**" -- may require reserving more character sequences as special
47 | - https://github.com/ipfs/unixfs-v2/issues/3 -- "**Handing of non-utf-8 posix filenames**"
48 | - https://github.com/ipfs/go-ipfs/issues/4292 -- "**WebUI should (somehow) indicate a problematic directory entry**"
49 | - perhaps more!
50 |
51 | As this list makes evident... we really need to get this nailed down.
52 |
53 | ---
54 |
55 | ## Mission, refined
56 |
57 | Okay, motivations and intro done. What do we need to *do*?
58 |
59 | (1) Update the spec to be consistently clear about IPLD *keys* versus *valid merklepath segments*. This distinction seems to exist already, but it's tricky, so we should hammer it.
60 |
61 | (2) Define a normalized character encoding. (I think it's now well established that this is necessary -- merklepath segments are *absolutely* for direct human use, so we're certainly speaking of chars rather than bytes; and also unicode is complex, so ignoring normalization is not viable.)
62 |
63 | (3) Define any blacklisting of further byte sequences which are valid normalized characters but which we nonetheless don't want to see in merklepath segments.
64 |
65 | (4) Ensure we're clear about what happens when a key is valid as an IPLD key but not as a merklepath segment (i.e. it's unpathable).
66 |
67 | (And one more quick note: a lot of this has been in discussion already as part of sussing out the unixfsv2 spec. In unixfsv2, we've come to the conclusion that some of our path handling rules are quantum-entangled with the IPLD spec for merklepaths. Unixfsv2 may blacklist *more* problematic byte sequences than IPLD merklepath segments do, so we don't have to worry about *everything* up here; but we do want to spec this *first*, so we can make sure the Unixfsv2 behavior normalizers are a nice subset of the IPLD merklepath rules.)
68 |
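
To make point (4) concrete, here's a hypothetical sketch -- the function name and the exact rule set are illustrative only, nothing here is spec'd -- of a guard that separates valid keys from pathable segments:

```javascript
// Hypothetical sketch: a key can be a valid IPLD map key yet still be
// unusable as a merklepath segment ("unpathable"). None of these rules
// are final; they just mirror the candidate restrictions discussed above.
function isPathableSegment (key) {
  if (key === '' || key === '.' || key === '..') return false // reserved names
  if (key.includes('/')) return false // '/' is the path separator
  return key === key.normalize('NFC') // require the normalized form
}

console.log(isPathableSegment('hello')) // true
console.log(isPathableSegment('a/b'))   // false: valid key, not pathable
```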
69 | ---
70 |
71 | ## Progress
72 |
73 | **Regarding (1)**: "just a small matter of writing" once we nail the rest...
74 |
75 | **Regarding (2)**: **We have an answer [and the answer is "NFC"](https://www.unicode.org/reports/tr15/#Norm_Forms)**. (At least, I think we have an answer with reasonable consensus.) We had a long thread about this [in the context of unixfsv2](https://github.com/ipfs/unixfs-v2/issues/3), but it's entirely applicable here in general. Everyone seems to agree that UTF8 is a sensible place to be and [NFC is a sensible, already-well-specified normalization](https://github.com/ipfs/unixfs-v2/issues/3#issuecomment-404760564) to use. And importantly, in practice, NFC is the form seen in practically all documents everywhere, so choosing NFC means we accept the majority of strings unchanged. Whew. *dusts off hands*
76 |
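
For illustration, here's NFC in action (JavaScript's `String.prototype.normalize` implements the Unicode normalization forms):

```javascript
// "é" has two valid Unicode spellings: the precomposed code point U+00E9,
// and "e" followed by the combining acute accent U+0301. NFC maps both to
// the composed form, so normalized segments compare equal code-point-wise.
const composed = '\u00e9'     // é, one code point
const decomposed = 'e\u0301'  // e + combining accent

console.log(composed === decomposed)                                   // false
console.log(composed.normalize('NFC') === decomposed.normalize('NFC')) // true
```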
77 | **Regarding (3)**: Lots of example choices in https://github.com/ipfs/go-ipfs/issues/1710 . We need to reify that into a list in the spec.
78 |
79 | **Regarding (4)**: Open field?
80 |
81 | (I'll update this "progress" section as discussion... progresses.)
82 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2018.09-type-convention.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | This document was archived from https://github.com/ipld/specs/issues/71.
17 |
18 | #71: IPLD Type Convention
19 | -------------------------
20 | Opened 2018-09-20T22:57:15Z by Stebalien
21 |
22 | Motivation 1: I'd like to be able to look at an IPLD object and know, approximately, its intended interpretation (without guessing or using context).
23 |
24 | Motivation 2: I'd like to be able to define or extend a type system for *my* IPLD application without having it completely fail to interop with other IPLD type systems.
25 |
26 | Motivation 3: I'd like to buy some time to figure out the perfect type system.
27 |
28 | ---
29 |
30 | We've been discussing IPLD type *systems* but these discussions usually boil down to implementing the perfect system. I'd like to propose an alternative: the IPLD type convention.
31 |
32 | Proposal: `@type: $something` denotes a type. What this type *means* depends on the type's type (if specified).
33 |
34 | Why @? It's less likely to conflict but I'm not fixated on it.
35 |
36 | Why "the IPLD type convention"? This isn't a specification. Basically, I'm giving in to JSON duck-typing and calling it "good enough".
37 |
38 | Why is it good enough? This is a decentralized system so we'll have to check everything *anyways*. Trying to prescribe structure for users tends to lead to more trouble than it's worth (IMO). If we need more structure, we can always give the type a type to make sure we're operating within the correct type system.
39 |
40 | How will this work with existing formats:
41 |
42 | 1. CBOR/JSON: Do nothing. For existing objects without a `@type`, these objects simply don't have types (within this system). Type systems that need to give everything *some* type can just give these some default.
43 | 2. Git (tree, commit, etc), Eth, etc: I'd *like* to retroactively add in a type field. Thoughts? I kind of doubt this will break anything.
44 |
45 | ---
46 |
47 | We've *also* talked about adding a new format with the structure ``. That is, introduce a new format where we put all the type and schema information in a separate object, prepending the CID of this separate object to the *actual* object (the value).
48 |
49 | After considering this for a bit, I've realized we should treat these as separate concerns: we're conflating *types* with *schemas*. There's no reason we can't introduce this new, compressed format at some later date even if we go with the above "type convention" proposal.
50 |
51 | ---
52 |
53 | Disclaimer: this was *not* my idea, I've just finally convinced myself that it's probably "good enough".
54 |
55 | Thoughts @jonnycrunch (you're the one who told me to look into the JSON-LD spec), @diasdavid, @davidad, @whyrusleeping?
56 |
57 | ---
58 |
59 | While I'd like to avoid prescribing *too* much, I'd like to define a set of conventions that users *should* follow. For example:
60 |
61 | * `@type: CID` - CID points to the actual type.
62 | * `@type: {}`: inline type. This will often be used for type "constructors". For example: `{@type: {@type: "generic", constructor: CidOfConstructor, parameters: [...]}}`.
63 | * `@type: "string"`: A human readable string/path. IMO, this should *usually* be used to specify the type *system*.
64 | * `@type: 1234`: A multicodec. A reasonable type-of function would look this multicodec up in the multicodec table to map it to a CID.
65 | * `@type: [thing1, thing2, thing3]`: multiple types.
66 |
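
A sketch of the dispatch this convention implies -- the function and kind names below are made up for illustration, nothing here is specified:

```javascript
// Hypothetical: classify a node's `@type` by the kind of value it holds,
// mirroring the convention list above.
function typeKind (node) {
  if (!('@type' in node)) return 'untyped'
  const t = node['@type']
  if (Array.isArray(t)) return 'multiple'   // [thing1, thing2, thing3]
  if (typeof t === 'number') return 'multicodec'
  if (typeof t === 'string') return 'named' // human-readable string/path
  if (t !== null && typeof t === 'object' && '/' in t) return 'link' // CID
  return 'inline'                           // inline type object
}

console.log(typeKind({ hello: 'world' }))            // 'untyped'
console.log(typeKind({ '@type': 'some/type/path' })) // 'named'
```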
67 | ---
68 | #### (2018-11-12T19:05:30Z) jonnycrunch:
69 | @Stebalien Seems that you are most concerned with simple General Purpose data type definitions to start.
70 |
71 | ```
72 | "simpleTypes": {
73 | "enum": [
74 | "array",
75 | "boolean",
76 | "integer",
77 | "null",
78 | "number",
79 | "object",
80 | "string"
81 | ]
82 | },
83 |
84 | ```
85 |
86 | You could then build upon this and add support for more complex data types to give more meaning and context. This would help with validation of the structure.
87 |
88 | A simple example extension would be `date` and `datetime`, which are extensions of `{ "@type" : "string"}` where the context would define the syntax of the string.
89 |
90 | Presently, the handling of `datetime` in `@context` is `xsd:datetime`, which references [http://www.w3.org/2001/XMLSchema#](http://www.w3.org/2001/XMLSchema#), where the explanation given as `documentation source` is in [html](https://www.w3.org/TR/xmlschema-2/#dateTime).
91 |
92 | My favorite annotation in this xml is:
93 |
94 | >First the built-in primitive datatypes. These definitions are for information only, the real built-in definitions are *magic*.
95 |
96 | Example for `string` from [XMLSchema](https://www.w3.org/2001/XMLSchema.xsd):
97 |
98 | *(The XML snippet from XMLSchema.xsd was lost when this issue was archived; see the link above for the `string` definition.)*
116 | And more specifically for `datetime`:
117 |
118 | *(The XML snippet for `dateTime` was lost when this issue was archived.)*
141 | More complex data types would be generators and would help with the IPLD selectors issue.
142 |
143 |
144 | As far as JSON-LD, I'm rolling back my support of it. There is SO much reliance on location-based mappings. I have really started to look into how to strip out all reliance on location and make it a pure content-addressed schema. But I'm getting a "symbol grounding problem". Also, there is so much self-referencing that my script fails given the cycles in the graph. I have moved away from the w3c model and am looking at wikidata. Unfortunately, there isn't a good mapping for simple data types like `datetime` above. I think your "good enough" approach expresses the fundamental "intentionality" that points the user in the direction of the proper meaning. Wikidata's approach is to give many and allow users to "triangulate" the meaning, especially across languages.
145 |
146 | I, myself, like the inline link ``:
147 |
148 | ```
149 | {
150 | ...
151 | @type : { "/" : "" }
152 | }
153 | ```
154 |
155 | The use of `@type` denotes that the object is an instance of a class of entities.
156 |
157 | The problem is what do you link to? You'd be building a whole [ontology](http://ontodm.com/ontodt/OntoDT.owl) for data types.
158 |
159 | BTW, there is an ISO standard for [General Purpose Datatypes](https://en.wikipedia.org/wiki/ISO/IEC_11404).
160 |
161 | You could keep it simple and start with `strings`, with those strings defined in the `@context`:
162 |
163 | ```
164 | {
165 | @context : { "/" : "" },
166 | ...
167 | @type : "string"
168 | }
169 | ```
170 |
171 | More complicated examples can build out from here.
172 |
173 | In theory, this syntax for links should be compatible with json-ld, but in practice it is not. See my [issue #110 in JSON-ld](https://github.com/w3c-ccg/did-spec/pull/110#issuecomment-431356177).
174 |
175 | ---
176 | #### (2019-01-12T19:04:54Z) pavetok:
177 | > I'd like to be able to look at an IPLD object and know, approximately, it's intended interpretation (without guessing or using context).
178 |
179 | Why is it necessary to embed type information into the data itself?
180 | Modern CTT, for example, [says](https://www.youtube.com/watch?v=LE0SSLizYUI) that typing judgments are completely separate things. As a consequence, one value can inhabit multiple types.
181 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2019.02-replication-protocol-with-batching.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | A replication protocol with batching on top of libp2p streams
17 | =============================================================
18 |
19 | There were lots of discussions about the [Graphsync wire protocol](https://github.com/ipld/specs/blob/15f1dae2de3708953d6bb0f117fdc4648854ca16/graphsync/graphsync.md#network-messages). Sometimes it wasn't clear on which level we'd like to operate: should it be a protocol for a lower, TCP-like level of networking, or rather a higher-level one built on top of libp2p streams?
20 |
21 | This led to the main point of the discussion which was about how to implement certain batching mechanisms and whether those should be part of the wire protocol itself or not.
22 |
23 | I was arguing for keeping the protocol simple and doing those optimizations on a different layer. In the end we decided to bake it directly into the wire protocol.
24 |
25 | Nonetheless, I'd like to publish the thoughts I had on how it could be implemented on top of libp2p, as I spent so much time on it and don't want to see those ideas just fade away into the void. Perhaps it's useful for someone else or even for future protocols we develop.
26 |
27 |
28 | Current Graphsync wire protocol
29 | -------------------------------
30 |
31 | The basic idea of the current Graphsync protocol is that you send some request and get multiple blocks back. Multiple responses of multiple requests might be put into a single message. I see two reasons for doing this:
32 |
33 | - Block de-duplication: If several responses contain the same blocks, those can be de-duplicated when sent over the wire.
34 | - Latency/bandwidth trade-off: If the connection between peers has a high latency, but there's still bandwidth available, you could bundle the responses to make better use of the resources.
35 |
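
As a rough illustration of the two points above (all names are hypothetical; this resembles nothing in the actual wire format), bundling responses while de-duplicating shared blocks might look like:

```javascript
// Hypothetical sketch: merge several responses into one message, sending
// each block only once even if multiple responses reference it.
function bundle (responses) {
  const blocks = new Map() // cid -> bytes, de-duplicated across responses
  for (const r of responses) {
    for (const b of r.blocks) blocks.set(b.cid, b.bytes)
  }
  return {
    responses: responses.map(r => ({ id: r.id, cids: r.blocks.map(b => b.cid) })),
    blocks: [...blocks.entries()]
  }
}

const msg = bundle([
  { id: 1, blocks: [{ cid: 'bafyA', bytes: '...' }, { cid: 'bafyB', bytes: '...' }] },
  { id: 2, blocks: [{ cid: 'bafyA', bytes: '...' }] } // shares bafyA
])
console.log(msg.blocks.length) // 2: 'bafyA' travels only once
```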
36 |
37 | Building on top of libp2p
38 | -------------------------
39 |
40 | I think it would be a huge win for the libp2p ecosystem if we find a way to build protocol independent batching capabilities on top of libp2p. New protocols would become easier to implement, but there would still be ways to optimize for different scenarios where batching makes sense.
41 |
42 | ### Batching responses of a single request
43 |
44 | libp2p already supports multiplexing, which simplifies the code on the receiving side a lot. How nice would it be if the protocol-specific code only needed to deal with single messages, even if they are sent in batches? The sender side would call a library to create batches; the receiving side then splits them into individual messages.
45 |
46 |
47 | ### De-duplicating blocks
48 |
49 | De-duplicating blocks is a superset of the "[batching responses of a single request](#batching-responses-of-a-single-request)" problem. I could imagine that you would write a protocol specific function for assembling and disassembling batches. The advantage over doing it directly on the protocol level would be that you won't need to deal with the disassembling/de-muxing in detail, but on a more abstract level.
50 |
51 | This mechanism could also be used for bundling several responses in one message.
52 |
53 |
54 | Author
55 | ------
56 |
57 | @vmx (Volker Mische)
58 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2019.06-new-version-dag-json.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | New DAG-JSON
17 | ============
18 |
19 | This exploration report was originally [Pull Request 129](https://github.com/ipld/specs/pull/129). It got converted [via script](https://github.com/vmx/export_issues) into an exploration report in order to preserve all the useful information it contains.
20 |
21 | ---
22 |
23 | #129: Proposal: new version of dag-json (open)
24 | ----------------------------------------------
25 | Opened 2019-06-11T03:53:16Z by mikeal
26 |
27 | This spec change would resolve the outstanding issue of `dag-json` not being able to represent the `"/"` key.
28 |
29 | Since `"/"` is already reserved, we use that key for the version definition, which makes this new version backwards compatible with all of the existing `dag-json` data.
30 |
31 | This change would also set up `dag-json` to store data outside of the actual node data, which could be used similar to how tags are used in cbor or for something like https://github.com/ipld/js-generics/issues/3
32 |
33 |
34 | Files
35 | -----
36 | `block-layer/codecs/DAG-JSON.md`
37 |
38 | # Specification: DAG-JSON
39 |
40 | **Status: Descriptive - Final**
41 |
42 | DAG-JSON supports the full [IPLD Data Model](../data-model-layer/data-model.md).
43 |
44 | ## Format
45 |
46 | ### Simple Types
47 |
48 | All simple types except binary are supported natively by JSON.
49 |
50 | Contrary to popular belief, JSON as a format supports Big Integers. It's only
51 | JavaScript itself that has trouble with them. This means JS implementations
52 | of `dag-json` can't use the native JSON parser and serializer.
53 |
54 | ### Version 0
55 |
56 | This is an old version of `dag-json` that reserved the `"/"` key in order to
57 | represent binary and link data types.
58 |
59 | #### Binary Type
60 |
61 | ```javascript
62 | {"/": { "base64": String }}
63 | ```
64 |
65 | #### Link Type
66 |
67 | ```javascript
68 | {"/": String /* base encoded CID */}
69 | ```
70 |
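
A decoder sketch for the Version 0 rules above (a hypothetical helper, not any implementation's actual API): a map is special only when `"/"` is its sole key.

```javascript
// Sketch: classify a dag-json v0 map. Only a single-key {"/": ...} map
// carries special meaning; everything else decodes as a plain map.
function classify (obj) {
  const keys = Object.keys(obj)
  if (keys.length === 1 && keys[0] === '/') {
    const v = obj['/']
    if (typeof v === 'string') return 'link'                       // base-encoded CID
    if (v !== null && typeof v.base64 === 'string') return 'bytes' // binary
  }
  return 'map'
}

console.log(classify({ '/': 'zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA' })) // 'link'
console.log(classify({ '/': { base64: 'dmFsdWU=' } }))                              // 'bytes'
console.log(classify({ hello: 'world' }))                                           // 'map'
```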
71 | ### Version 1
72 |
73 | #### Format
74 |
75 | The internal data format is valid JSON but is **NOT** identical to the decoded
76 | node data codecs produce.
77 | >
78 | > ---
79 | >
80 | > #### (2019-06-11T10:33:10Z) daviddias:
81 | > I do not understand why this is necessary. Is there a place with the rationale to make JSON not JSON?
82 | >
83 | > ---
84 | >
85 | > #### (2019-06-11T20:16:25Z) mikeal:
86 | > Right now, we’re using a reserved key, `"/"`, for encoding links and binary. Several people think that key reservation is highly problematic because it means there are certain data sets you just can’t encode.
87 | >
88 | > This change would fix that by changing the internal block format. The format would **still** be valid JSON, it just wouldn’t have as close of a 1-to-1 match with the decoded form of the data.
89 | >
90 | > However, this change may not be worth it. This PR is to show that we have a solution to the reserved key issue and that is what it would take to fix it.
91 |
92 | Example internal format:
93 |
94 | ```javascript
95 | { "/": { "_": 1 },
96 | ```
97 | >
98 | > ---
99 | >
100 | > #### (2019-06-11T11:13:53Z) vmx:
101 | > Wouldn't it be simpler to just add a field called `version` and move `meta` and `data` to the top-level?
102 | >
103 | > ---
104 | >
105 | > #### (2019-06-11T20:17:34Z) mikeal:
106 | > “meta” and “data” *are* top level. using “_” was just to save space but we can do whatever we want.
107 | >
108 | > ---
109 | >
110 | > #### (2019-06-12T08:23:03Z) vmx:
111 | > Oh, sorry I missed that they are top-level.
112 | ```javascript
113 | "data": { "hello": "world", { "obj": { "array": [0, 0] } } },
114 | ```
115 | >
116 | > ---
117 | >
118 | > #### (2019-06-11T11:10:07Z) vmx:
119 | > So those `[0, 0]` are placeholders? So if you want to encode `[CID, "some string"]` it would be `[0, "some string"]`?
120 | >
121 | > ---
122 | >
123 | > #### (2019-06-11T20:16:42Z) mikeal:
124 | > They are just placeholders to preserve the ordering of the array.
125 | >
126 | > ---
127 | >
128 | > #### (2019-06-12T08:21:54Z) vmx:
129 | > I'd use `null` for that, but that's a minor detail. Perhaps the example could be expanded to `[null, "some string", null]` to make it clearer that there are placeholders which are mixed with actual values.
130 | >
131 | > ---
132 | >
133 | > #### (2019-06-12T16:42:19Z) mikeal:
134 | > originally it was null but i went with 0 because null encoding is extra bytes :/
135 | ```javascript
136 | "meta": {
137 | base64: [
138 | [[ "key" ], "dmFsdWU="],
139 | ```
140 | >
141 | > ---
142 | >
143 | > #### (2019-06-11T11:16:29Z) vmx:
144 | > I can see where those tuples come from. Why not use an "overlay object" with proper nesting as you did in your other dag-json experiment?
145 | >
146 | > Is it because you could get the binary data/links without traversing a nested structure?
147 | >
148 | > ---
149 | >
150 | > #### (2019-06-11T20:19:57Z) mikeal:
151 | > If you recall, we considered that in a `dag-json` alternative proposal and the problem, which I think you ended up alerting me to, was how to represent sparse values in an array.
152 | >
153 | > ---
154 | >
155 | > #### (2019-06-12T08:23:54Z) vmx:
156 | > Right, that would be a lot of placeholders then.
157 | ```javascript
158 | [[ "obj", "key"], "dmFsdWU="],
159 | [[ "obj", "array", 0], "dmFsdWU="]
160 | ],
161 | links: [
162 | [["obj", "array", 1], "zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA"]
163 | ]
164 | }
165 | }
166 | ```
167 |
168 | Decodes to:
169 |
170 | ```javascript
171 | { hello: 'world',
172 | key: Buffer.from('value'),
173 | ```
174 | >
175 | > ---
176 | >
177 | > #### (2019-06-11T11:24:50Z) vmx:
178 | > I would really prefer if we'd move to an Browser first approach and not using Node.js primitives:
179 | >
180 | > ```JS
181 | > key: new TextEncoder().encode('value'),
182 | > ```
183 | >
184 | > ---
185 | >
186 | > #### (2019-06-11T20:21:48Z) mikeal:
187 | > it’s just not entirely clear to everyone that this does binary encoding. perhaps we should just move to pseudo-code, something like `Bytes('value')`
188 | >
189 | > ---
190 | >
191 | > #### (2019-06-12T08:24:51Z) vmx:
192 | > That would work for me. I just don't want to give the impression it's a Node.js Buffer.
193 | ```javascript
194 | obj: {
195 | array: [ Buffer.from('value'), new CID('zb2rhe5P4gXftAwvA4eXQ5HJwsER2owDyS9sKaQRRVQPn93bA')],
196 | key: Buffer.from('value')
197 | }
198 | }
199 | ```
200 |
201 | ---
202 |
203 | Comments
204 | --------
205 |
206 | #### (2019-06-12T16:10:10Z) Stebalien:
207 | Is the motivation _just_ to support `/` keys? IMO, dag-json exists so users could read and write IPLD using JSON. I'm concerned this format loses that.
208 |
209 | If we _just_ want to support `/` in keys, we could use some form of escaping syntax. For full backwards compatibility, we could do something like `{"/": {"literal": foobar}}`. Alternatively, we can _probably_ allow escaping forward slash without breaking too much: `//` -> `/`.
210 |
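
The `//` escaping idea above could be sketched like this (illustrative only; note that round-tripping assumes keys never contain `//` already, which is exactly the kind of edge case that makes escaping schemes tricky):

```javascript
// Sketch of the proposed escape: a literal '/' in a key becomes '//' on
// encode and is folded back on decode.
const escapeKey = (k) => k.replace(/\//g, '//')
const unescapeKey = (k) => k.replace(/\/\//g, '/')

console.log(escapeKey('a/b'))              // 'a//b'
console.log(unescapeKey(escapeKey('a/b'))) // 'a/b'
```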
211 |
212 | ---
213 |
214 | #### (2019-06-12T16:49:15Z) mikeal:
215 | > IMO, dag-json exists so users could read and write IPLD using JSON. I'm concerned this format loses that.
216 |
217 | I’m a little confused by this use case, because the `dag-json` codec already isn’t `json`. If you want to take a block of existing `json` data you should just use a `json` codec (which we should implement since it would be trivial).
218 |
219 | If you want to take existing JSON data and add links to it, you’re going to have to use an encoding interface anyway and all of these details are invisible.
220 |
221 | Is there a case I’m missing where you want `dag-json` instead of `json` **and** you’re interacting directly with the block format?
222 |
223 | > Is the motivation just to support / keys?
224 |
225 | Yes, that’s the only thing we get out of this change. Using `"/"` as a key will *still* be highly problematic for anyone using paths and selectors, but this would fix the issue at the codec level.
226 |
227 | I’m actually not too worried about `"/"` being reserved in `dag-json` but other people seem to be. I opened this PR to show what it would take to fix the issue, given some people are concerned.
228 |
229 |
230 | ---
231 |
232 | #### (2019-06-12T18:57:53Z) Stebalien:
233 | > I’m a little confused by this use case, because the dag-json codec already isn’t json. If you want to take a block of existing json data you should just use a json codec (which we should implement since it would be trivial).
234 |
235 | dag-json wasn't designed to be 100% compatible, but it was meant to be mostly compatible. I agree we should have a plain JSON format for users who just want JSON, but the point of _this_ format was to provide a familiar, JSON-like format that mostly "just works" (i.e., you can link up existing JSON objects).
236 |
237 |
238 | ---
239 |
240 | #### (2019-06-17T11:55:59Z) vmx:
241 | Another thing that might be worth keeping in mind. Having a format you can easily use as exchange format for tests: https://github.com/multiformats/rust-cid/pull/17#issuecomment-502395133
242 |
243 |
244 | ---
245 |
246 | #### (2019-08-20T11:08:19Z) vmx:
247 | @mikeal We decided to put this into design history, but after having a closer look I think the discussion can easily be summed up. Instead of having it archived automatically, I suggest that we create a proper exploration report like https://github.com/ipld/specs/blob/035683c97d0280de5e2d490822d63ad618a8acab/design/history/exploration-reports/2019.06-unixfsv2-spike-01.md. @mikeal if you think it's worth having it there, could you please create a new PR with that report?
248 |
249 |
250 | ---
251 |
252 | #### (2019-10-02T10:27:07Z) Assigned to mikeal
253 |
254 | ---
255 |
256 | #### (2020-10-08T21:19:39Z) mikeal:
257 | Closing as this was mostly informational. @vmx will use his script to create an exploration report for this.
258 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2019.09-adl-schema-root-type-defn.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | Root type definitions for ADL schema declarations
17 | =================================================
18 |
19 | This text formed an original part of https://github.com/ipld/specs/pull/182 for the file schemas/advanced-layouts.md. The initial proposal included a `root` type specifier for `advanced` declarations that would allow the schema parser to make assumptions (and assertions) about the data found at the node where the ADL was encountered.
20 |
21 | This (partly) necessitated the connection of an ADL implementation schema to an ADL usage schema, such that a user would have to reach into the implementation and refer to a type defined there. For this reason (primarily), `root` was removed from the proposal and an ADL is to be declared simply with its name, `advanced Foo`, and no additional information, for now.
22 |
23 | ---------------------------
24 |
25 | ## Root node type definitions
26 |
27 | Advanced layouts are designed to abstract data that exists at the data model layer. As such, they may also dictate what they expect from the data that exists at the node their _root_ resides at.
28 |
29 | In the case of our `ROT13` `string` representation, we are likely to want to store this on the block as a `string` (i.e. this is a crude encryption mechanism, transforming `string` to `string`—realistic encryption mechanisms are likely to involve `bytes` and perhaps complex data structures to store encryption metadata).
30 |
31 | ```ipldsch
32 | advanced ROT13 {
33 | root String
34 | }
35 |
36 | type MyString string representation ROT13
37 |
38 | type Name struct {
39 | firstname MyString
40 | surname MyString
41 | }
42 | ```
43 |
44 | A validator using our schema is now able to assert that it should find a `map` (default `struct` representation) with two fields, `firstname` and `surname`, and, thanks to the `root` definition of `ROT13`, it may also assert that these two fields are of kind `string`.
45 |
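
As a toy illustration of the transform this ADL names (ROT13 itself is standard; the framing here is ours, not part of any schema tooling):

```javascript
// ROT13 rotates each ASCII letter 13 places; applying it twice returns
// the original string, which is why it works as a toy string "codec".
function rot13 (s) {
  return s.replace(/[a-zA-Z]/g, (c) => {
    const base = c <= 'Z' ? 65 : 97 // 'A' or 'a'
    return String.fromCharCode((c.charCodeAt(0) - base + 13) % 26 + base)
  })
}

console.log(rot13('Ada'))        // 'Nqn': what the block would store
console.log(rot13(rot13('Ada'))) // 'Ada': ROT13 is its own inverse
```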
46 | We may also introduce complex types as the root definition. For example, a `byte` representation that is a chain of blocks, each containing a section of `bytes` and a link to the next block:
47 |
48 | ```ipldsch
49 | advanced ChainedBytes {
50 | root Chunk
51 | }
52 |
53 | type Chunk struct {
54 | contents Bytes
55 | next nullable Link
56 | }
57 | ```
58 |
59 | Or, as in the IPLD [HashMap](../data-structures/hashmap.md) spec:
60 |
61 | ```ipldsch
62 | advanced HashMap {
63 | root HashMapRoot
64 | }
65 |
66 | # Root node layout
67 | type HashMapRoot struct {
68 | hashAlg String
69 | bucketSize Int
70 | map Bytes
71 | data [ Element ]
72 | }
73 |
74 | # ... `Element` (and further) definition omitted
75 | ```
76 |
77 | And we could use this to define a map of `string`s to `link`s:
78 |
79 | ```ipldsch
80 | type MyMap { String : Link } representation HashMap
81 | ```
82 |
83 | We could even combine usage of our `ROT13` and `HashMap` definitions in novel ways:
84 |
85 | ```ipldsch
86 | type BigConfidentialCatalog [ Secretz ]
87 |
88 | type Secretz struct {
89 | title MyString
90 | data MyMap
91 | }
92 |
93 | type MyMap { String : Name } representation HashMap
94 | ```
95 |
96 | If we were to take an IPLD node, and assert that it is of type `BigConfidentialCatalog`, we should expect that:
97 |
98 | 1. The node is a `list` kind
99 | 2. Each element of the `list` contains a `map`, which is described by `Secretz`
100 | 3. Each map contains the two properties defined by `Secretz`: `title` and `data`
101 | 4. The `title` property of the `map` is of `string` kind, thanks to the `MyString` definition, but it must be transformed through the `ROT13` layout to make sense of it.
102 | 5. The `data` property of the `map` is of `map` kind, which itself should conform to the `HashMapRoot` type specification, but must be interacted with through the logic associated with `HashMap` in order to make sense of it (which may also involve loading further blocks to traverse the sharded data).
103 |
104 | If `ROT13` and `HashMap` were to omit their `root` descriptor, we could only make assertions 1 and 2 above.
105 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2019.09-rust-serde.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | Using Serde for Rust IPLD
17 | =========================
18 |
19 | **Author**: Volker Mische ([@vmx])
20 |
21 | It sounds like a great idea to use [Serde] as the basis for Rust IPLD. It should be simple to put the [IPLD Data Model] on top of the [Serde Data Model]. So I was looking into implementing DagCbor on top of [serde_cbor].
22 |
23 | Missing CBOR tags support in serde_cbor
24 | ---------------------------------------
25 |
26 | The blocking issue is that there's no CBOR tags support in serde_cbor. [@dignifiedquire] did an implementation for tags support. I couldn't get it working (due to not being familiar with Serde yet) for deserializing things into custom types. The plan was to serialize CBOR tag 42 into a "CID struct". I looked for other formats that are hitting the same limitation in Serde.
27 |
28 | [msgpack] has the notion of [Extensions] and there's also an implementation for Serde ([msgpack-rust]) which even gained [extension support recently]. So I used that as a basis to create yet another implementation of CBOR tags support for serde_cbor ([#151]). If you want to learn more about this approach, check out the discussions at [#529: Implementing custom type serializing].
29 |
30 | I'm happy with the result and finally have a way better understanding of Serde. I now also understand why merging a PR following that approach would be problematic. You could serialize and deserialize CBOR tags with it, but there won't be any interoperability with e.g. rust-msgpack. You couldn't simply have a single implementation of a CID type that would work with CBOR as well as with msgpack. Though, I've created a [prototype] that can serialize/deserialize the IPLD Data Model into CBOR using the patched serde_cbor.
31 |
32 | So what would be needed in order to have it properly work with Serde?
33 |
34 |
35 | Tags support in Serde
36 | ---------------------
37 |
38 | There has been plenty of discussion about ways to get tags support, as needed by formats like CBOR or msgpack, into Serde. Here I try to give a bit of that history and a summary.
39 |
40 | The starting point is [#163: CBOR support], where the author of serde_cbor asks if it would be possible to add a `visit_tag()` method to Serde to support the tags use case. Following this idea, [#301: WIP: Parsing and emitting tagged values] was opened, which starts adding support for serializing and deserializing tags. The discussion is about the types that tags need to support (CBOR tags are integers, YAML tags are strings), and also about the problem of different tags for the same data type in different formats.
41 |
42 | There is no conclusion, but that issue is superseded by [#408: Add support for tagged values.], which contains a full implementation. It seems to solve the problem, but a discussion starts about whether there are better ways to implement tags. A proposed solution is to use [specialization].
43 |
44 | That idea is further pursued in [#455: Tagged values through specialization]. After some discussion it reaches the point where the specialization approach doesn't work. Bug [#38516: Specialization does not find the default impl] was opened almost 3 years ago, but hasn't received any attention.
45 |
46 |
47 | Conclusion
48 | ----------
49 |
50 | I still like the idea of being able to use Serde for the Rust IPLD work. The chances are high that there is already a parser for the file format you want to use; you would just need to make sure IPLD Data Model types can be (de)serialized.
51 |
52 | But in order to make this happen, we need tag support within Serde. I will try to revive the idea outlined in [#408: Add support for tagged values.]. Without tag support there's no real benefit to using Serde, and we would be better off having specialized parsers for each format we want to support. The result would then probably resemble Serde, but be less flexible and specific to the IPLD Data Model.
53 |
54 |
55 | [@vmx]: https://github.com/vmx/
56 | [Serde]: https://serde.rs/
57 | [IPLD Data Model]: https://github.com/ipld/specs/blob/67028313e0793d562d671a7fb4a030f471f90098/data-model-layer/data-model.md
58 | [Serde Data Model]: https://serde.rs/data-model.html
59 | [serde_cbor]: https://github.com/pyfisch/cbor
60 | [@dignifiedquire]: https://github.com/dignifiedquire/
61 | [msgpack]: https://msgpack.org/
62 | [msgpack-rust]: https://github.com/3Hren/msgpack-rust
63 | [Extensions]: https://github.com/msgpack/msgpack/blob/1e4fd94b90d38167b8b5a0ecf57f59b538669574/spec.md#extension-types
64 | [extension support recently]: https://github.com/3Hren/msgpack-rust/pull/216
65 | [#151]: https://github.com/pyfisch/cbor/pull/151
66 | [#529: Implementing custom type serializing]: https://github.com/serde-rs/serde/issues/529
67 | [prototype]: https://github.com/vmx/rust-ipld-dag-cbor
68 | [#163: CBOR support]: https://github.com/serde-rs/serde/issues/163
69 | [#301: WIP: Parsing and emitting tagged values]: https://github.com/serde-rs/serde/pull/301
70 | [#408: Add support for tagged values.]: https://github.com/serde-rs/serde/pull/408
71 | [specialization]: https://github.com/rust-lang/rust/issues/31844
72 | [#455: Tagged values through specialization]: https://github.com/serde-rs/serde/pull/455
73 | [#38516: Specialization does not find the default impl]: https://github.com/rust-lang/rust/issues/38516
74 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2019.12-manifesting-adls.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | Notes on Manifesting Advanced Data Layouts
17 | ==========================================
18 |
19 | This document is to explore some questions about Advanced Data Layouts (ADLs)
20 | and how they're:
21 |
22 | - created,
23 | - documented,
24 | - shared,
25 | - and used.
26 |
27 | ... and perhaps most importantly, what user stories touch each of those facets,
28 | how often we expect the same people to be carrying out more than one of those stories,
29 | and when not, what kind of coordination they would require between the parties;
30 | and finally, what all that implies for our library and API and spec designs.
31 |
32 |
33 | Recap of the story so far
34 | -------------------------
35 |
36 | - ADLs are a cutout in the IPLD ecosystem specs which describe some Strongly Recommended library features.
37 | - ADLs are a way to present information as matching the Data Model -- so that
38 | it can be traversed and manipulated generically, like any other Data Model content --
39 | while not making any specification about how the data is actually stored.
40 | - Specifically, ADLs have the notable property of potentially using *multiple Blocks* in their content.
41 | - Examples of important ADLs we imagine:
42 | - Presenting a Data Model `map`, while internally using multiple Blocks in a HAMT format.
43 | - Presenting a Data Model `bytes`, while internally using some tree structure, and chunks defined by some rolling checksum.
44 | - This is not an exclusive list: we expect to make it possible for users to make their own ADLs.
45 | - It is important to note that ADLs use some *code* in order to do their internal work.
46 | - Often this code will be written in the native language of whatever library ecosystem you're using;
47 | it may also be interpreted code in some virtual machine; this is not a detail that's important here.
48 | - ADLs should be *specification driven*, so that it's not an undue burden to implement one natively
49 | in a language that currently doesn't have such an implementation.
50 | - Since ADLs use *code*, it follows that security conscious library users will want to whitelist ADLs
51 | which they'll allow the use of. (This may be for resource accounting and DoS prevention, if nothing else.)
52 |
53 | ADLs are a tricky feature because they're intentionally somewhat open-ended
54 | (so that they're extensible and can be applied to use cases in the future we didn't expect in advance),
55 | yet still need to follow enough higher level rules that systems designed on IPLD remain understandable and reliable.
56 |
57 | So: what are some more higher level rules we can establish?
58 |
59 |
60 | Rules We're Fairly Sure We Want
61 | -------------------------------
62 |
63 | ### consuming in the realm of IPLD Schema tooling
64 |
65 | - When I parse a schema, I want to validate that all types it references are defined at this time.
66 | It is an error to not be able to tell, or to have dangling references.
67 | - When I parse a schema, I should be able to tell what _kind_ all types are at this time.
68 | It is necessary to do this so we can perform additional sanity checks, such as that kinded unions are coherent.
69 | - When I parse a schema, I should be able to see if any advanced layouts will be required in order to fully understand this data.
70 | (It is not necessary for the ADL implementations to be provided in order for me to parse this schema; I just need to see where they will slot in.)
71 |
72 | ### consuming in the realm of coding against IPLD libraries
73 |
74 | - As a user writing code using an IPLD library, I should be able to use ADLs... with the Data Model interfaces.
75 | No interfaces for IPLD Schema features should be necessary to reference in order to activate an ADL.
76 | This may be _verbose_, but it should be _possible_.
77 | - As a user who *does* use IPLD Schemas, I should be additionally empowered:
78 | I should be able to take the Schema's hints about where ADLs will be required, and supply implementations up front.
79 | (This can be expected to be less verbose than the above, because all schematicADLname->ADLimpl mappings can be declared once, _up front_,
80 | rather than ADLimpl mappings being handled by programmatic logic that has to be applied mid-tree.)
81 |
82 | ### consuming in the realm of generic behaviors
83 |
84 | (n.b. making up a word for this.
85 | Means: things like "take this data CID, this schema CID, and this selector CID, and evaluate it";
86 | this is something we expect tooling to be able to evaluate from those declarations -- _without writing code_,
87 | which makes it a very distinct story from what's covered in the previous heading.)
88 |
89 | - In the story "take this data CID, this schema CID, and this selector CID"...
90 | - If **no** ADL is involved, we simply expect this to succeed.
91 | - If an ADL **is** involved, it should either succeed, or fail *clearly* (and as soon as possible).
92 | - If an ADL **is** involved, we should be able to provide an additional argument of "{schematicADLname}->{ADLimpl}" in order to succeed.
93 |
94 | ### authoring
95 |
96 | (_We're much less sure about authoring. This is the big exploration topic right now._)
97 |
98 | ### local naming
99 |
100 | I've used the term "schematicADLname" above to refer to the _local_ name of an ADL in a schema _using_ it.
101 |
102 | E.g., "Fwee" and "Fwop" in the following schema are each a schematicADLname:
103 |
104 | ```ipldsch
105 | advanced Fwee {
106 | kind map
107 | }
108 | advanced Fwop {
109 | kind bytes
110 | }
111 |
112 | type FancyBytes bytes
113 | representation advanced Fwop
114 |
115 | type FancyMap {String:String}
116 | representation advanced Fwee
117 | ```
118 |
119 | It may be important to disambiguate schematicADLname from the name or reference handle
120 | used for the ADL in any other context.
121 |
122 | For example, note that the story for consuming ADLs as a library user includes two paths,
123 | and one of them *does not allude to "schematicADLname"* whatsoever.
124 |
125 | It's also important to note that the name an author of an ADL uses versus
126 | the name used locally in a consuming schema are not assumed to be in lock-step.
127 | (If they were, it would raise all sorts of questions about coordination,
128 | updating, etc, to which we have not yet posed concrete answers.)
129 |
130 | In fact, it's unclear if an author of an ADL even needs to name it at all in order to use it.
131 |
132 |
133 | Current Discussion
134 | ------------------
135 |
136 | ### Do we need a syntax for stating an ADL is to be "exported"?
137 |
138 | (Whatever "exported" means -- this has itself not yet been fully described.)
139 |
140 | Unclear.
141 |
142 | We may certainly find it *nice*, for documentation purposes.
143 |
144 | ### Is such a syntax part of the Schema DSL?
145 |
146 | If the answer to the above question is "yes":
147 | Should it be in a similar syntax and in the same files as schema DSL statements?
148 |
149 | Unclear.
150 |
151 | Further exploratory questions:
152 |
153 | - Does it make sense to be able to export more than one ADL from the same file?
154 | - How often will two different ADLs share internal types?
155 | - Does it matter? Would "vendoring" the defn's twice hurt anyone or make anything impossible?
156 |
157 | ### What information might be useful in an "export" declaration for an ADL?
158 |
159 | "RootType" specifically recurs often as an idea that seems potentially useful.
160 |
161 | Question: if the code for the ADL impl defacto needs to refer to this type,
162 | is it strictly necessary to state it (redundantly) in the export declaration?
163 |
164 | Do we expect consumers of ADLs to be able to inspect the ADL's Schema,
165 | even if they _do not_ evaluate its _code_?
166 | Does that unlock any interesting features? (Maybe!)
167 |
168 | ### Are ADLs required to have a schema describing their internal data?
169 |
170 | ("required" as in RFC 6919 "MUST")
171 |
172 | _Probably not._
173 | At least, there's been no explicit choice -- so far -- to mandate it.
174 |
175 | We might expect most of them to, because it's just a high-leverage, useful choice.
176 | We imagine ourselves using schemas to develop the ADLs we're authoring!
177 | But "a good idea" and a "must" are different things.
178 |
179 | Sub-question: Do we expect ADLs to have _exactly one_ (not two or more) schema?
180 |
181 | _**Probably not**_ -- use of multiple schemas in a "try stack" might be a useful feature
182 | for the ADL code author to do version detection and graceful migrations... just like anywhere else.
183 |
184 |
185 |
186 | Resolutions and Bets
187 | --------------------
188 |
189 | ### making library APIs for ADLs
190 |
191 | Yes. Let's do it. Purely a forward and learning experience,
192 | and we're almost certain to need it regardless.
193 |
194 | ### proposing properties for declarative manifests required for exporting ADLs
195 |
196 | Maybe?
197 |
198 | We can make drafts and proposals around this that are free-standing,
199 | so it's probably very viable to experiment with this freely.
200 |
201 | ### proposing IPLD Schema DSL syntax for ADL export manifests
202 |
203 | Maybe?
204 |
205 | This is relatively difficult to do as an experiment of limited scope.
206 |
207 | ### review this again
208 |
209 | In a few months, after experimenting with library APIs, we'll probably have
210 | additional experiences which will be useful input for reviewing this design.
211 |
212 | Ideally, we'd like to gain those experiences in more than one library and language.
213 |
214 | Let's make a point to re-check these ideas as that info becomes available.
215 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2020.09-learning-ipld.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Journey learning the IPLD stack
17 |
18 | Author: Daniel Martí (@mvdan)
19 |
20 | It was suggested that I should capture my perspective as I get up to speed on
21 | the IPLD stack, so that we can possibly identify shortcomings with the current
22 | material, or topics which can cause problems.
23 |
24 | ### Previous knowledge
25 |
26 | I should note that I was already familiar with hashing, Git, type systems and
27 | data structures, encodings like JSON and Protobuf, and compatibility between
28 | different programming languages. So the "block" and "data model" layers of IPLD
29 | were relatively easy to understand.
30 |
31 | ### First impression
32 |
33 | The docs are somewhat scattered and unfinished, which does make it a little
34 | extra confusing to get started. In chronological order, I read:
35 |
36 | 1) https://hackmd.io/LHTTmGSWSvem4Wz2h_a39g?both, Eric's "terse primer". Seems
37 | to try to cover everything, though it does seem like quite a lot of
38 | information to take in all at once.
39 |
40 | 2) https://ipld.github.io/docs/, sourced at https://github.com/ipld/docs. Seems
41 | aimed at getting started with tutorials in JS. I should note that I first
42 | read this before Mikeal's "new intro" added on September 2nd 2020.
43 |
44 | 3) https://github.com/ipld/specs, which seems to contain all the formal specs,
45 | but also includes a pretty decent README.
46 |
47 | I think all three should probably be unified into two halves:
48 |
49 | * A high-level introduction to IPLD, max 3-4 pages. Probably extending Mikeal's
50 | new intro with some material from Eric's primer?
51 |
52 | * The set of spec documents, with a README to classify and introduce each of the
53 | groups or layers. I think the specs repo already does a decent job at this.
54 |
55 | ### Concepts that confused me
56 |
57 | I already raised some of these as Slack threads or HackMD comments, but for the
58 | sake of keeping record, I'm listing the most basic or important ones here.
59 |
60 | * It is said that a link is unlike a URL, since it is merely a hash of data
61 | that doesn't statically say where to fetch the data from. So... how would one
62 | ever actually fetch data via a link?
63 |
64 | * Out of the three layers (blocks, data model, schemas), Schemas have been by
65 | far the hardest to wrap my head around. I think an introduction should contain
66 | a very brief example, including how it actually looks like when mapped to the
67 | data model and encoded into a block.
68 |
69 | The following are more such points, but focused around schemas, once I got to
70 | that part of the spec:
71 |
72 | * My first read about ADLs left me very confused, in particular how they're
73 | different than Schemas. I found the "Mapped to the Data Model" introduction to
74 | ADLs much easier to understand, as it shows reasonable examples.
75 |
76 | * Why are multi-block data structures a separate definition in the spec, and not
77 | just part of Schemas?
78 |
79 | * Are blocks generally filled with data nearly completely, or is it normal to
80 | have them relatively empty?
81 |
82 | * Wouldn't removing the first byte from a very long multi-block List mean that
83 | every block would need to be modified to shift all bytes forward by one? I
84 | assume and hope not, but the spec doesn't really give pointers.
85 |
86 | * Since data in blocks is encoded from the data model, how would I know if a
87 | particular data model value fits in a single block? What about a schema value?
88 |
89 | * Would IPLD be much different if the data model was an internal detail, and not
90 | exposed to the user? I imagine that, most of the time, one would interact with
91 | schemas and not the data model.
92 |
93 | * When docs say "the IPLD type system", is that in terms of Schemas, or the Data
94 | Model, or both? Answer: The schema intro later says that "types" are for
95 | schemas, and "kinds" for the data model. That should probably be sooner in the
96 | schema spec.
97 |
98 | * For new IPLD team members in the future, it's probably best if their first
99 | week is focused on the basics alone - blocks, hashing, linking, encodings, and
100 | the data model. That's enough for some realistic demos using IPLD, and can be
101 | learned in half a day, allowing the person to start contributing without
102 | multiple full days of reading. I should clarify that Eric did give me a data
103 | model issue to work on during my first week, but I never picked up on the "you
104 | don't need to read about schemas for now" nudge.
105 |
106 | * The specs README hierarchy presents these three concepts in order: multi-block
107 | collections, schemas, and ADLs. The docs do explain why schemas and ADLs are
108 | different, but not why multi-block collections are also a separate thing. Eric
109 | mentioned that multi-block collections are pretty much ADLs; so why are they
110 | introduced before schemas?
111 |
112 | * The schema authoring guide talks about "component specifiers" and "component
113 | specifications", but never seems to define them.
114 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2020.10-data-model-maps-keys.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | IPLD Data Model and Map keys
17 | ============================
18 |
19 | **Author**: Volker Mische ([@vmx])
20 |
21 | This is an exploration report about how we define the keys of the [IPLD Data Model] `Map` kind. This is not specifically about strings in the IPLD Data Model, although it touches that topic.
22 |
23 |
24 | Starting point
25 | --------------
26 |
27 | The [current version of the IPLD Data Model spec] doesn't say anything about the shape of Map keys in the definition of Maps at all, though in the "Motivation" section it states:
28 |
29 | > IPLD Data Model in that it cannot support some advanced features (like non-string keys for Maps) because support for such a feature is not common enough among programming languages.
30 |
31 | So Map keys are strings. It doesn't go into detail about the definition of "string". It is likely the same kind of string as the IPLD Data Model `String` kind, which "should be UTF-8 text", but it could also be other definitions of "string".
32 |
33 |
34 | ### Definition of strings
35 |
36 | And the problem is exactly the definition of "string". To me, preferring to code in Python or Rust, "string" means UTF-8, or more generally speaking, a sequence of valid Unicode characters. In the past I hadn't actually thought about exactly how we define strings in the IPLD Data Model; I just implemented/treated them as always valid Unicode.
37 |
38 | I also just learnt recently that JavaScript uses [UTF-16] internally (just as Java does), whose strings can contain arbitrary 16-bit code units, including unpaired surrogates, hence a string may not be valid Unicode.
39 |
40 |
41 | Discussions
42 | -----------
43 |
44 | A more precise specification of Map keys is needed. This problem was convoluted with the discussion about how strings should be defined, which happened in a [Gist from @warpfork], led to a [response from me], and to yet another [doc about the IPLD Data Model in general]. After long discussions on IRC with @warpfork about all this, we found a way to separate this problem from the discussion about the `String` kind representation.
45 |
46 |
47 | ### Clear distinction of the Data Model
48 |
49 | In order to have a rather self-contained document, I want to repeat a few things from my [doc about the IPLD Data Model in general]. We should make a clear distinction between IPLD Codecs, the IPLD Data model and the programming language.
50 |
51 | - IPLD Codec: A Codec specifies how IPLD Data model kinds are actually serialized. As full round-trippability is desired, there should be only one way to convert a Data Model kind into a Codec and back.
52 | - IPLD Data Model: The IPLD Data Model only specifies the properties of a kind. It doesn't specify any memory layouts or how things are serialized (that's part of the Codec). Kinds might also be represented differently in different programming languages.
53 | - Programming languages: The programming language represents each IPLD Data Model kind in its own type system. Those types aren't (and even can't be) unified across languages.
54 |
55 |
56 | ### Using a specific kind as key
57 |
58 | As mentioned before, I wanted to separate this problem from the discussion about the representation of the `String` kind. Hence using it as the map key is not an option. This makes the `Bytes` kind the only option.
59 |
60 | If you are using a language which supports arbitrary bytes in strings (e.g. JavaScript, Go), then you might want to use that string type for `Map` keys (it might even be the only sensible way, e.g. in JavaScript). Though you might still want to use another type if `Bytes` are used as values. Having the same kind map to different programming language types under certain circumstances is (while technically possible) difficult to understand and reason about.
61 |
62 | The same applies to codecs. For text-based Codecs like DAG-JSON, you would encode the `Bytes` kind as Base64. But `Map` keys are expected to be valid UTF-8 most of the time, so you would want a readable (not Base64 encoded) representation which does some escaping in case there are non-UTF-8 characters. So again you'd have two different representations for arbitrary-bytes-based things.
63 |
64 | And it's even true for binary Codecs. Most Codecs have some type to store arbitrary bytes. Though they might not allow for arbitrary bytes in map keys. You could solve this again with a custom encoding to turn them into valid UTF-8. But you wouldn't apply that for the `Bytes` kind.
65 |
66 |
67 | Solution
68 | --------
69 |
70 | To solve this problem, we see the `Map` kind more holistically and make the keys part of it. We specify that the keys of the `Map` kind need to be arbitrary bytes. This aligns well with the Data Model only defining properties and not actual memory layouts or other representations. It is now up to the Codecs and the programming language to decide how to represent it.
71 |
72 | So for example you could decide that the DAG-CBOR codec uses the CBOR String type for the keys for backwards compatibility reasons, while your programming language, e.g. Python, could use its `bytes` type as map keys.
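The Python side of that example can be sketched directly: `bytes` is hashable, so arbitrary-byte keys (including ones that are not valid UTF-8) work as dict keys with no extra machinery.

```python
# A Data Model map whose keys are arbitrary bytes. The second key is
# deliberately not valid UTF-8, yet Python's dict handles it fine.
m = {
    b"plain ascii": 1,
    b"\xff\xfe not valid UTF-8": 2,
}

assert m[b"plain ascii"] == 1
assert m[b"\xff\xfe not valid UTF-8"] == 2
```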
73 |
74 |
75 | [@vmx]: https://github.com/vmx
76 | [IPLD Data Model]: https://specs.ipld.io/data-model-layer/data-model.html
77 | [current version of the IPLD Data Model spec]: https://github.com/ipld/specs/blob/fd3697982f031405ffa00fff71801d3759d06f1f/data-model-layer/data-model.md
78 | [UTF-16]: https://en.wikipedia.org/wiki/UTF-16
79 | [Gist from @warpfork]: https://gist.github.com/warpfork/3aea1c0f60d0d27ab03d1bd24cc05f35
80 | [response from me]: https://gist.github.com/vmx/9eb56f525370d405bf5155a0aa5be3b9
81 | [doc about the IPLD Data Model in general]: https://github.com/ipld/specs/pull/324
82 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2020.10-data-model-numbers-and-codecs.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | IPLD Data Model numbers and IPLD Codecs
17 | =======================================
18 |
19 | **Author**: Volker Mische ([@vmx])
20 |
21 | This is an exploration report about how to serialize [IPLD Data Model] numbers into bytes and back.
22 |
23 |
24 | Canonical representations
25 | -------------------------
26 |
27 | In IPLD it's desirable to have one single representation for every instance of the Data Model. This would lead to a bijective mapping between the Data Model and the encoded bytes. If you have a Data Model instance and you use a specific codec, it should always result in the same bytes, independent of the programming language used. That's important for content-addressing.
28 |
29 |
30 | ### Conversion into the Data Model
31 |
32 | This looks like a hard problem, e.g. if you look at this IPFS issue with [DAG-CBOR] where there is a [different encoding between JavaScript and Go when numbers are integers]. Though in this issue, the problem is **not** the Data Model to Codec conversion. The problem is the **input data to Data Model conversion**. It took me a long time to figure that out.
33 |
34 | When I think of the IPLD Data Model I often think in terms of JSON and having numbers represented as text. I then think about how to represent those things as bytes. If you think about it that way, you have problems like: are `4251.00` and `4251` the same number? If yes, should they be encoded the same way?
35 |
36 | From a Data Model perspective that's a non-issue. You have two kinds of numbers: [Integer] and [Float]. You can clearly distinguish between `Integer(4251)` and `Float(4251)`.
37 |
38 |
39 | ### Serializing the Data Model
40 |
41 |
42 | #### Codec with multiple integer/float sizes
43 |
44 | Even within the Data Model there might be ambiguities, depending on the Codec you use. If your Codec supports more than one integer type (which is often the case), you need to decide which one to choose. The obvious way is what CBOR in its [canonical representation] is doing:
45 |
46 | > Integers must be as small as possible.
47 |
48 | This means that a codec that supports 8-bit to 64-bit integers will encode `8472` always as `0x2118` and never as `0x00002118`.
49 |
50 | For floating point numbers (I always mean [binary IEEE-754 floating point numbers], others don't matter in modern day computing) the same issue arises. If your codec supports more than one floating point number type, e.g. 32-bit single precision and 64-bit double precision, a number like [`0.199951171875`] could be represented as `0x3e4cc000` (32-bit) or `0x3fc9980000000000` (64-bit). You can, just as with integers, require the smallest lossless representation of a float.
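The analogous float rule ("smallest lossless representation") amounts to a round-trip check: encode as 32-bit, decode again, and only keep the 32-bit form if nothing was lost. A minimal sketch using Python's `struct` (the function name is mine):

```python
import struct

def smallest_float_encoding(x: float) -> bytes:
    """Encode a float as 32-bit IEEE-754 if that round-trips losslessly,
    otherwise as 64-bit; both big-endian."""
    packed32 = struct.pack(">f", x)
    if struct.unpack(">f", packed32)[0] == x:
        return packed32
    return struct.pack(">d", x)

# 0.199951171875 is exactly representable in 32 bits: 0x3e4cc000.
assert smallest_float_encoding(0.199951171875) == bytes.fromhex("3e4cc000")
# 0.2 is not, so it stays 64-bit (8 bytes).
assert len(smallest_float_encoding(0.2)) == 8
```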
51 |
52 | Such a conversion to the smallest possible float makes sense for the sake of having a canonical representation, but it doesn't make sense for space savings. Only about 0.00000002% of all the 64-bit floats can be represented as 32-bit floats. So it's not that likely that your data contains exactly those floats.
53 |
54 |
55 | #### Codecs with one type for integers and one for floats
56 |
57 | If you restrict your Codec to a single integer and a single float type, the conversion from the Codec into the IPLD Data Model becomes much simpler, as you obviously don't have to deal with different types.
58 |
59 | For integers, you could just pick a sensibly large integer, e.g. 64-bit, and always serialize only into that. In case you want to save bytes, you can use variable-sized integers like [LEB128].
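For reference, unsigned LEB128 is simple enough to sketch in a few lines: emit 7 bits per byte, least-significant group first, with the high bit set on every byte except the last.

```python
def encode_uleb128(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, least-significant group first,
    continuation bit (0x80) set on all bytes except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

# 8472 = 66 * 128 + 24, so it encodes as 0x98 (24 | 0x80) then 0x42 (66).
assert encode_uleb128(8472) == b"\x98\x42"
```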
60 |
61 | For floats the only sensible way is using IEEE-754 double precision float only. They are well supported by almost all programming languages.
62 |
63 |
64 | Problems with JavaScript
65 | ------------------------
66 |
67 | Creating a canonical representation for numbers is possible when you distinguish between integers and floats. The IPLD Data Model has that property, hence it's not a problem designing a Codec that has a canonical representation of numbers. Though, there is a problem with JavaScript. Currently, most (all?) IPLD Data Model implementations work with native JavaScript types without any wrappers around them. The problem is that historically JavaScript only had a [`Number`] type, which is always a 64-bit IEEE-754 float. You cannot distinguish between `4251.00` and `4251`. Both are always floats.
68 |
69 | This means you can't really round-trip Data Model encodings in JavaScript. It loses the information about whether something was an integer as soon as it becomes a native `Number`. Possible solutions are using wrapper classes (we do the same in other programming language implementations like Go or Rust) or leveraging the recently introduced [`BigInt`] type.
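The wrapper-class idea can be sketched in a few lines (hypothetical class names, shown in Python for brevity): the kind tag travels with the value instead of being inferred from the host type, so `Integer(4251)` and `Float(4251.0)` stay distinguishable even though their numeric values compare equal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Integer:
    """Data Model integer kind, kept distinct from floats."""
    value: int

@dataclass(frozen=True)
class Float:
    """Data Model float kind."""
    value: float

# The raw numeric values are equal...
assert Integer(4251).value == Float(4251.0).value
# ...but the wrappers preserve which kind the value was.
assert Integer(4251) != Float(4251.0)
```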
70 |
71 |
72 | Thanks
73 | ------
74 |
75 | Thanks [@mikeal] for making me put more thought into this and [@bobbl] for making me realize that most of the difficulties were in fact JavaScript related.
76 |
77 |
78 | [@vmx]: https://github.com/vmx
79 | [IPLD Data Model]: https://specs.ipld.io/data-model-layer/data-model.html
80 | [different encoding between JavaScript and Go when numbers are integers]: https://github.com/ipld/interface-ipld-format/issues/9#issuecomment-431029329
81 | [converting from JSON to CBOR]: https://tools.ietf.org/html/rfc7049#section-4.2
82 | [DAG-CBOR]: https://specs.ipld.io/block-layer/codecs/dag-cbor.html
83 | [`0.199951171875`]: https://float.exposed/0x3e4cc000
84 | [Integer]: https://specs.ipld.io/data-model-layer/data-model.html#integer-kind
85 | [Float]: https://specs.ipld.io/data-model-layer/data-model.html#float-kind
86 | [canonical representation]: https://tools.ietf.org/html/rfc7049#section-3.9
87 | [LEB128]: https://en.wikipedia.org/wiki/LEB128
88 | [binary IEEE-754 floating point numbers]: https://en.wikipedia.org/wiki/IEEE_754
89 | [`Number`]: https://developer.mozilla.org/en-US/docs/Glossary/Number
90 | [`BigInt`]: https://developer.mozilla.org/en-US/docs/Glossary/BigInt
91 | [@mikeal]: https://github.com/mikeal
92 | [@bobbl]: https://github.com/bobbl
93 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/2020.10-schema-listpair-extension.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | Schema listpair extension
17 | =========================
18 |
19 | **Author**: Volker Mische ([@vmx])
20 |
21 | First of all, this idea wasn't mine; it came up during a conversation with [@ianopolous]. I also don't think it's a new idea, but I'm not aware of any written record of it.
22 |
23 | This idea is related to the discussion about how the [IPLD Data Model] `String` kind should be defined, as an 8-bit byte sequence or as a sequence of Unicode characters. I get into the details of why it can help with that discussion at the end of the document, though I encourage readers to read this exploration report without that issue in mind.
24 |
25 |
26 | The idea
27 | --------
28 |
29 | Currently the [IPLD Schemas] `Map` kind shares the same limitation regarding the key kind as Data Model `Map`s do: keys can only be of kind `String`. There are applications where it would be convenient to be able to use other kinds as well, for example the `Integer` kind.
30 |
31 | IPLD Schemas already support different representations for the `Map` kind. One of them is [`listpairs`]. This representation could be extended to support other kinds as well. A `Map` with `Integer` keys could look like this:
32 |
33 | ```ipldsch
34 | type IntegerKeyedMap { Int: String } representation listpairs
35 | ```
36 |
37 | Some data matching the `IntegerKeyedMap` Map (shown as JSON) is:
38 |
39 | ```json
40 | [[16711680, "red"], [65280, "green"], [255, "blue"]]
41 | ```
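
Since JavaScript's native `Map` already allows keys of any type, application code handling this listpairs-encoded data could be sketched like this (hypothetical application code, not an IPLD library API):

```javascript
// The listpairs representation is a list of [key, value] pairs, which
// is exactly the shape the Map constructor accepts -- so integer keys
// work without any string conversion.
const listpairs = [[16711680, "red"], [65280, "green"], [255, "blue"]];
const colors = new Map(listpairs);

console.log(colors.get(16711680)); // "red"
console.log(colors.size);          // 3
```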
42 |
43 |
44 | Analysis
45 | --------
46 |
47 | ### Other representations
48 |
49 | Giving the `listpairs` representation the power of having keys of arbitrary kind makes it a special case. The other representations won't work with it.
50 |
51 | So for example the default `map` representation with a definition like this
52 |
53 | ```ipldsch
54 | type IntegerKeyedMap { Int: String }
55 | ```
56 |
57 | would need to produce an error, as IPLD Data Model `Map`s only support keys of the `String` kind.
58 |
59 | This also applies to the `stringpairs` representation, though that representation already has the constraint that both the key and the value need to be representable as the Data Model `String` kind.
60 |
61 | The different representations would support these key value pairs:
62 |
63 | | Representation | Key | Value |
64 | | -------------- | -------- | -------- |
65 | | `stringpairs` | `String` | `String` |
66 | | `map` | `String` | any kind |
67 | | `listpairs` | any kind | any kind |
68 |
69 |
70 | ### Selectors
71 |
72 | Selectors operate on the Data Model and not on the Schema level. This means that making the proposed `listpairs` extension to the Schema `Map` kind won't change anything on the Selectors.
73 |
74 | Though this is not fully true. When thinking about Selectors, you also want to apply them to things that are expressed in the Schema but not necessarily reflected in the Data Model representation. An example is `Struct`s, where you want to access something by field name, even if its representation is `tuple`.
75 |
76 | The same method can be applied to the extended `listpairs` representation for `Map`s. The Selector could be translated into a pure Data Model representation, or your implementation might have an abstraction to support iterating over such a different map representation.
77 |
78 |
79 | ### Pathing
80 |
81 | Basic pathing is done in the Data Model, so no changes would be needed. In order to path over kinds other than strings, some more advanced pathing that is Schema aware would be needed. It needs to be determined whether this should be part of the basic pathing, the Selectors or something separate. Independent of this proposal, something similar would be needed in case pathing over `Struct`s should be supported (which I think should be).
82 |
83 |
84 | ### Codecs
85 |
86 | Depending on the Codec, things can be similarly efficient as with Data Model `Map`s. Take CBOR as an example. Let's not think about the Data Model for a moment, but only about the Codec. In CBOR (not in *DAG-CBOR*) it's possible to have maps with integer keys:
87 |
88 | ```
89 | A3 # map(3)
90 | 1A 00FF0000 # unsigned(16711680)
91 | 63 # text(3)
92 | 726564 # "red"
93 | 19 FF00 # unsigned(65280)
94 | 65 # text(5)
95 | 677265656E # "green"
96 | 18 FF # unsigned(255)
97 | 64 # text(4)
98 | 626C7565 # "blue"
99 | ```
100 |
101 | The same data with representation `listpairs`, which could be decoded into a valid Data Model (hence also *DAG-CBOR*), looks like this:
102 |
103 |
104 | ```
106 | 83 # array(3)
107 | 82 # array(2)
108 | 1A 00FF0000 # unsigned(16711680)
109 | 63 # text(3)
110 | 726564 # "red"
111 | 82 # array(2)
112 | 19 FF00 # unsigned(65280)
113 | 65 # text(5)
114 | 677265656E # "green"
115 | 82 # array(2)
116 | 18 FF # unsigned(255)
117 | 64 # text(4)
118 | 626C7565 # "blue"
119 | ```
120 |
121 | You can see that the "pairs" add one level of indirection. So the changes are quite small.
122 |
123 | Though in case you want to get even closer to the native CBOR encoding of integer-keyed maps, a new representation could be introduced which stores the key-value pairs in a flat list. So instead of…
124 |
125 | ```json
126 | [[16711680, "red"], [65280, "green"], [255, "blue"]]
127 | ```
128 |
129 | …the data would be represented as…
130 |
131 | ```json
132 | [16711680, "red", 65280, "green", 255, "blue"]
133 | ```
134 |
135 | …which would then encode in CBOR as:
136 |
137 |
138 | ```
139 | 86 # array(6)
140 | 1A 00FF0000 # unsigned(16711680)
141 | 63 # text(3)
142 | 726564 # "red"
143 | 19 FF00 # unsigned(65280)
144 | 65 # text(5)
145 | 677265656E # "green"
146 | 18 FF # unsigned(255)
147 | 64 # text(4)
148 | 626C7565 # "blue"
149 | ```
150 |
151 | The only difference from the map encoding is the first byte being `86` instead of `A3`; there are no other additional bytes.
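
Converting between the pairs form and the flat-list form is trivial; a sketch (no existing IPLD library API, and assuming scalar keys and values, since `Array.prototype.flat` would also merge list-valued entries):

```javascript
// Pairs form -> flat form: flatten one level of nesting.
function pairsToFlat(pairs) {
  return pairs.flat();
}

// Flat form -> pairs form: group consecutive elements two by two.
function flatToPairs(flat) {
  const pairs = [];
  for (let i = 0; i < flat.length; i += 2) {
    pairs.push([flat[i], flat[i + 1]]);
  }
  return pairs;
}

console.log(pairsToFlat([[16711680, "red"], [65280, "green"], [255, "blue"]]));
// [ 16711680, 'red', 65280, 'green', 255, 'blue' ]
```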
152 |
153 |
154 | ### Advanced uses
155 |
156 | Extending the `listpairs` definition would also enable other advanced uses. Let's take DAG-CBOR as an example. The Codec specifies a certain ordering of the map keys, which is needed so that the same data is always encoded the same way. IPLD implementations might preserve the ordering of the Codec and not define their own.
157 |
158 | If that's the case, then applications could decide to impose their own custom ordering, which would also be preserved in the Codec. You could even use it for string keys. You would manually construct the list containing the pairs on the Data Model layer.
159 |
160 |
161 | Relation to the String discussion
162 | ---------------------------------
163 |
164 | All this relates to the discussion whether Data Model Strings should be a sequence of 8-bit bytes or a sequence of Unicode characters. And related to that is the discussion whether `Map` keys [should be string, bytes or something else]. There it becomes clear that it would make systems simpler if `Map` keys could just be the same thing as string values and path segments.
165 |
166 | There is a problem, though. It is desirable that strings as values are Unicode-only to maximize interoperability, trading off flexibility. For map keys, however, having the ability to use arbitrary bytes is nice in case you e.g. want to have filenames (which may contain non-Unicode bytes) as keys. Having both, "don't use arbitrary bytes in strings" and "it's OK to have bytes in strings", is a contradiction.
167 |
168 | With this proposal both are possible. Strings could be defined as a sequence of Unicode characters, with arbitrary bytes disallowed/discouraged. As a side-effect this would also improve interoperability with Codecs that only support strings as keys (e.g. Protocol Buffers) and with programming languages alike.
169 |
170 | The use case of filenames as map keys, e.g. for IPFS, can still be served, though: you would use binary keys together with the `listpairs` representation to store the filenames.
171 |
172 | To me all this aligns well with:
173 |
174 | > "Simple things should be simple, complex things should be possible."
175 | > -- [Alan Kay]
176 |
177 |
178 | [@vmx]: https://github.com/vmx
179 | [@ianopolous]: https://github.com/ianopolous
180 | [IPLD Data Model]: https://specs.ipld.io/data-model-layer/data-model.html
181 | [IPLD Schemas]: https://specs.ipld.io/schemas/
182 | [`listpairs`]: https://specs.ipld.io/schemas/representations.html#map-listpairs-representation
183 | [should be string, bytes or something else]: https://hackmd.io/79okuu4eQoedhpmgVbZboA?view
184 | [Alan Kay]: https://www.quora.com/What-is-the-story-behind-Alan-Kay-s-adage-Simple-things-should-be-simple-complex-things-should-be-possible/answer/Alan-Kay-11
185 |
--------------------------------------------------------------------------------
/design/history/exploration-reports/README.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | Exploration Reports
17 | ===================
18 |
19 | Exploration Reports are a (loose) style of document we use in order to
20 | gather thoughts, share early drafts of concepts, and begin to solicit feedback.
21 |
22 | - There is minimal restriction on formatting;
23 | - The goal is to produce referenceable material;
24 | - Content may be single-author and consensus is not a prerequisite;
25 | - Dates are used to provide rough contextualization of how freshly handled a matter is.
26 |
27 | ### format
28 |
29 | Exploration Reports don't have a *strict* format, just general intentions.
30 | Part of the purpose of exploration reports is to *make writing possible*
31 | and lower the barriers to entry to get something written and sharable;
32 | overly strict formats and formalizations would be antithetical to this purpose.
33 |
34 | ### producing reference material
35 |
36 | Remember, a goal of these documents is to produce documents viable to use
37 | as touchpoints in discussion, and not necessarily serve as the discussion format itself.
38 | For that reason, we treat them as (roughly) *append-only*; and, typically,
39 | exploration reports will have a single author (more on that in the next section).
40 |
41 | ### non-consensus
42 |
43 | Exploration Reports are intended to encourage writing things down and forming
44 | bodies of thought which we can reference in the future -- and do so *without*
45 | putting any burdens of consensus-seeking up-front (which we observe can become
46 | a barrier that makes sharing and recording early design processes harder).
47 |
48 | Therefore, it's not just acceptable but even typical for Exploration Reports
49 | to have a *single author*.
50 | Correspondingly, Exploration Reports don't need to reach for full solutions,
51 | which means as long as the report covers reasonably interesting ground,
52 | it should often be possible to merge them with minimal turnaround time.
53 |
54 | You can find similar patterns used in other communities described here:
55 |
56 | - https://github.com/golang/go/wiki/ExperienceReports
57 |
--------------------------------------------------------------------------------
/design/libraries/README.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | Library Design Guidance
17 | =======================
18 |
19 | This directory contains some documentation of recommendations for
20 | library authors who want to make IPLD libraries in a new language
21 | (or, perhaps for readers who want to understand an existing library better).
22 |
23 | Some of the information expressed here comes down to opinions more so than specification;
24 | what is good ergonomics may vary wildly per language, so take these as
25 | recommendations rather than strictures.
26 |
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
1 | {
2 | "scripts": {
3 | "prepare:base": "rm -rf html && mkdir html && cp -a .vuepress/ html/.vuepress/ && mv html/.vuepress/README.md html/",
4 | "prepare:content": "cp -a schemas/ html/schemas/",
5 | "prepare:ipldsch": "cd html && find . -name \\*.ipldsch -exec sh -c \"echo '---\\neditLink: false\\n---\\n\\n\\`\\`\\`ipldsch' > {}.md && cat {} >> {}.md && echo '\\`\\`\\`' >> {}.md\" \\;",
6 | "prepare:json": "find html/ -name \\*.json -exec sh -c \"echo '---\\neditLink: false\\n---\\n\\n\\`\\`\\`json' > {}.md && cat {} >> {}.md && echo '\\`\\`\\`' >> {}.md\" \\;",
7 | "build": "npm run build:prepare && vuepress build",
8 | "build:prepare": "set -e; for t in prepare:base prepare:content prepare:ipldsch prepare:json; do npm run $t; done",
9 | "dev": "npm run build:prepare && vuepress dev",
10 | "serve": "npm run build:prepare && vuepress dev"
11 | },
12 | "dependencies": {
13 | "vuepress": "^1.5.4"
14 | }
15 | }
16 |
--------------------------------------------------------------------------------
/schemas/README.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | IPLD Schemas
17 | ============
18 |
19 | **This document has moved:** https://ipld.io/docs/schemas/
20 |
--------------------------------------------------------------------------------
/schemas/advanced-layouts.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Advanced Layouts for IPLD Schemas
17 |
18 | **This document has moved:** https://ipld.io/docs/schemas/
19 |
--------------------------------------------------------------------------------
/schemas/authoring-guide.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Authoring IPLD Schemas
17 |
18 | **This document has moved:** https://ipld.io/docs/schemas/
19 |
--------------------------------------------------------------------------------
/schemas/examples.ipldsch:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # This document has moved: https://ipld.io/specs/schemas/examples.ipldsch
17 |
--------------------------------------------------------------------------------
/schemas/examples.ipldsch.json:
--------------------------------------------------------------------------------
1 | {
2 | "error": "This document has moved: https://ipld.io/specs/schemas/examples.ipldsch.json"
3 | }
4 |
--------------------------------------------------------------------------------
/schemas/feature-summary.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # IPLD Schemas Feature Summary
17 |
18 | **This document has moved:** https://ipld.io/docs/schemas/
19 |
--------------------------------------------------------------------------------
/schemas/goals.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # IPLD Schemas Goals
17 |
18 | **This document has moved:** https://ipld.io/docs/schemas/
19 |
--------------------------------------------------------------------------------
/schemas/introduction.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # An Introduction to IPLD Schemas
17 |
18 | **This document has moved:** https://ipld.io/docs/schemas/
19 |
--------------------------------------------------------------------------------
/schemas/links.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Links and IPLD Schemas
17 |
18 | **This document has moved:** https://ipld.io/docs/schemas/
19 |
--------------------------------------------------------------------------------
/schemas/migration.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | IPLD Schemas and migration
17 | ==========================
18 |
19 | **This document has moved:** https://ipld.io/docs/schemas/
20 |
--------------------------------------------------------------------------------
/schemas/representations.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # Representations of IPLD Schema Kinds
17 |
18 | **This document has moved:** https://ipld.io/docs/schemas/
19 |
--------------------------------------------------------------------------------
/schemas/schema-kinds.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | # IPLD Schema Kinds
17 |
18 | **This document has moved:** https://ipld.io/docs/schemas/
19 |
--------------------------------------------------------------------------------
/schemas/schema-schema.ipldsch:
--------------------------------------------------------------------------------
1 | # This document has moved: https://ipld.io/specs/schemas/schema-schema.ipldsch
--------------------------------------------------------------------------------
/schemas/schema-schema.ipldsch.json:
--------------------------------------------------------------------------------
1 | {
2 | "error": "This document has moved: https://ipld.io/specs/schemas/schema-schema.ipldsch.json"
3 | }
4 |
--------------------------------------------------------------------------------
/selectors/example-selectors.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | Example Selectors
17 | =================
18 |
19 | In this document, we will show some examples of selectors.
20 |
21 | #### A quick note about formatting
22 |
23 | We will typically show some sample data as JSON encoded documents,
24 | and show the Selectors as YAML documents.
25 | **Remember,** any IPLD document can be encoded in any multicodec;
26 | these examples are JSON for convenience and by convention, but all
27 | the examples would work the same even if they were in other formats.
28 | Similarly, **Selectors themselves are just IPLD documents**, so while
29 | we've written YAML forms here, they could just as well be e.g. JSON or CBOR.
30 | When we write YAML, we'll also use quoted strings for user-supplied strings,
31 | and unquoted strings as map keys when it's for a field specified by the schema.
32 |
33 | For human convenience, we will also pretend there are a few modifications
34 | to the Selector schema:
35 |
36 | - we will disregard all 'rename' shortenings specified by the schema;
37 | - and we will use type names for union discriminators,
38 | instead of the shorter keys specified in the schema for those unions.
39 |
40 | Therefore, note that the *real* serialized selectors will be significantly terser!
41 |
42 |
43 | Examples
44 | --------
45 |
46 | ### Deeply nested path
47 |
48 | In some DAG you want to get one specific value you know the path of. Let's say you want to get the birth year of a specific character of a specific show.
49 |
50 | Example data (as JSON):
51 |
52 | ```json
53 | {
54 |   "show": "star-trek-voyager",
55 | "characters": {
56 | "kathryn-janeway": {
57 | "birthday": {
58 | "day": 20,
59 | "month": 5,
60 | "year": 2328
61 | },
62 | "rank": "vice admiral"
63 | }
64 | }
65 | }
66 | ```
67 |
68 | A Selector to extract the "year" data could look like this:
69 |
70 | ```yaml
71 | Selector:
72 | ExploreFields:
73 | fields:
74 | "characters":
75 | ExploreFields:
76 | fields:
77 | "kathryn-janeway":
78 | ExploreFields:
79 | fields:
80 | "birthday":
81 | ExploreFields:
82 | fields:
83 | "year":
84 | Matcher:
85 | {}
86 | ```
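
What this nested `ExploreFields`/`Matcher` chain does can be sketched as a plain path walk over the data above (a hand-rolled illustration, not a real Selector implementation, and ignoring that real Selectors also traverse across blocks):

```javascript
// Follow an explicit list of field names down into the document and
// return the value matched at the end, or undefined if the path breaks.
function matchPath(node, path) {
  for (const field of path) {
    if (node == null || typeof node !== 'object') return undefined;
    node = node[field];
  }
  return node;
}

const doc = {
  "characters": {
    "kathryn-janeway": {
      "birthday": { "day": 20, "month": 5, "year": 2328 },
      "rank": "vice admiral"
    }
  }
};

console.log(matchPath(doc, ["characters", "kathryn-janeway", "birthday", "year"])); // 2328
```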
87 |
88 |
89 | ### Getting a certain number of parent blocks in a blockchain
90 |
91 | You want to get a certain number of parents from a certain block.
92 |
93 | The shape of a block could look like this (in JSON):
94 |
95 | ```json
96 | {
97 | "parent": "",
98 | "time": 1549641260,
99 |   "nonce": 3423545
100 | }
101 | ```
102 |
103 | If you know you want five parents you could continue to use explicit Field Selectors:
104 |
105 | ```yaml
106 | Selector:
107 | ExploreFields:
108 | fields:
109 | "parent":
110 | ExploreFields:
111 | fields:
112 | "parent":
113 | ExploreFields:
114 | fields:
115 | "parent":
116 | ExploreFields:
117 | fields:
118 | "parent":
119 | ExploreFields:
120 | fields:
121 | "parent":
122 | Matcher:
123 | {}
124 | ```
125 |
126 | This selector matches the fifth-deepest "parent".
127 |
128 | But this gets a bit verbose. We can explore the same tree in a similar
129 | pattern with another mechanism -- recursive exploration:
130 |
131 | ```yaml
132 | Selector:
133 | ExploreRecursive:
134 | maxDepth: 5
135 | sequence:
136 | ExploreFields:
137 | fields:
138 | "parent":
139 | ExploreRecursiveEdge
140 | ```
141 |
142 | This will traverse the same set of nodes as the previous example -- however,
143 | it has a *slightly* different effect.
144 |
145 | Using a recursive selector in this way matches *each* of the "parent" nodes,
146 | up to the depth limit -- meaning it matches five nodes, instead of the
147 | previous example, which matches only the last one.
148 |
149 | Implementations that return all visited nodes (and not only the matched ones)
150 | will return the same set of nodes for both examples.
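
The recursive variant's behavior can be sketched like this (illustration only, not a Selector engine; the `resolve` callback and the in-memory store stand in for loading blocks by CID):

```javascript
// Follow "parent" links up to maxDepth, collecting every visited
// parent -- i.e. matching each node, as the ExploreRecursive example does.
function exploreParents(block, maxDepth, resolve) {
  const matched = [];
  let depth = 0;
  while (block && depth < maxDepth) {
    const next = resolve(block.parent); // stands in for a CID lookup
    if (!next) break;
    matched.push(next);
    block = next;
    depth += 1;
  }
  return matched;
}

// Hypothetical in-memory chain of eight blocks, keyed by fake CID strings.
const store = {};
let prev = "";
for (let i = 0; i < 8; i++) {
  store[`b${i}`] = { parent: prev, time: 1549641260 + i, nonce: i };
  prev = `b${i}`;
}

console.log(exploreParents(store.b7, 5, (cid) => store[cid]).length); // 5
```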
151 |
152 |
153 | ### Getting changes up to a certain one
154 |
155 | This use case is inspired by CRDTs, where you have a chain of changes. You observe a new change and want to get all the previous changes up to the one that you have already observed. It is a recursive query with a CID as stopping condition.
156 |
157 | The shape of a change could look like this (in JSON):
158 |
159 | ```json
160 | {
161 | "prev": "prevcid",
162 | "timestamp": 1549641260,
163 | "value": "abc"
164 | }
165 | ```
166 |
167 | It will be a Recursive Selector following along until it reaches a link of a
168 | certain value (`somecid` in this case):
169 |
170 | ```yaml
171 | Selector:
172 | ExploreRecursive:
173 | maxDepth: 100
174 | sequence:
175 | ExploreFields:
176 | fields:
177 | "prev":
178 | ExploreRecursiveEdge
179 | stopAt:
180 |     TBD: # Conditions are not specified yet
181 | ```
182 |
183 |
184 | ### Retrieving data recursively
185 |
186 | [UnixFSv1] is a good case of a recursive data structure. Any number of links can be used to create deeply nested structures. The following example is inspired by UnixFSv1, but uses a simplified structure, so that we don't get lost in UnixFSv1 specific implementation details.
187 |
188 | The basic structure described by an [IPLD Schema]:
189 |
190 | ```ipldsch
191 | type FileSystem struct {
192 | # Only leaf nodes have data
193 | data optional Bytes
194 | links [Link]
195 | }
196 |
197 | type Link struct {
198 | name String
199 | cid &FileSystem
200 | }
201 | ```
202 |
203 | The recursion comes into play with the `cid` field that points to another block that has exactly the same structure. As you can transparently path through IPLD structures that are composed of several blocks, we just omit those in the following example and pretend that we have a single large object describing our file system.
204 |
205 |
206 | ```json
207 | {
208 | "links": [{
209 | "name": "subdir1",
210 | "cid": {
211 | "links": [{
212 | "name": "somedata.txt",
213 | "cid": {
214 | "data": "74686572652773206461746120696e2068657265",
215 | "links": []
216 | }
217 | }]
218 | }
219 | },{
220 | "name": "subdir2",
221 | "cid": …
222 | }]
223 | }
224 |
225 | ```
226 |
227 | The following selector visits all `links` and matches all `data` fields:
228 |
229 |
230 | ```yaml
231 | Selector:
232 | ExploreRecursive:
233 | maxDepth: 1000
234 | sequence:
235 | ExploreFields:
236 | fields:
237 | "data":
238 | Matcher:
239 | {}
240 | "links":
241 | ExploreAll:
242 | next:
243 | ExploreFields:
244 | fields:
245 | "cid":
246 | ExploreRecursiveEdge
247 | ```
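
The effect of this selector can be sketched as a recursive walk (illustration only, not a Selector engine; the nested objects stand in for already-resolved `cid` links, matching the single-large-object simplification above):

```javascript
// Recursively explore "links", following each "cid", and collect every
// "data" field encountered, bounded by maxDepth.
function collectData(node, maxDepth) {
  if (!node || maxDepth === 0) return [];
  const out = [];
  if (node.data !== undefined) out.push(node.data); // Matcher on "data"
  for (const link of node.links || []) {            // ExploreAll over "links"
    out.push(...collectData(link.cid, maxDepth - 1)); // ExploreRecursiveEdge
  }
  return out;
}

const fsTree = {
  links: [{
    name: "subdir1",
    cid: {
      links: [{
        name: "somedata.txt",
        cid: { data: "74686572652773206461746120696e2068657265", links: [] }
      }]
    }
  }]
};

console.log(collectData(fsTree, 1000)); // [ '74686572652773206461746120696e2068657265' ]
```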
248 |
249 | [UnixFSv1]: https://github.com/ipfs/specs/tree/master/unixfs
250 | [IPLD Schema]: ../schemas
251 |
--------------------------------------------------------------------------------
/selectors/selectors.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ipld/specs/a7b9376ebd43aeabba7d78487db3d9df456b7714/selectors/selectors.jpg
--------------------------------------------------------------------------------
/theory-of-change.md:
--------------------------------------------------------------------------------
1 |
2 | !!!
3 |
4 | This document has **moved**.
5 |
6 | You'll now find information like this in the [ipld/ipld](https://github.com/ipld/ipld/) meta-repo,
7 | and published to the web at https://ipld.io/ .
8 |
9 | All documentation, fixtures, specifications, and web content is now gathered into that repo.
10 | Please update your links, and direct new contributions there.
11 |
12 | !!!
13 |
14 | ----
15 |
16 | The Vision of IPLD, and our Theory of Change
17 | ============================================
18 |
19 | IPLD is a project with the long-term aim of increasing the amount of technology
20 | that is developed according to decentralized, "local-first" principles.
21 |
22 | This is a very holistic goal: we want to increase the amount of such technology
23 | by making it easier to write;
24 | by making it easier to use and consistently more reliable than its centralized counterparts;
25 | by making it clear that such technology can provide features and possibilities that centralized software simply can't.
26 | We want to make all of these things true in a harmonious self-reinforcing cycle of ecosystemic growth,
27 | where continuing development of decentralized local-first technologies makes
28 | more development of *even more* decentralized local-first technologies more appealing.
29 |
30 |
31 | What is a Theory of Change?
32 | ---------------------------
33 |
34 | A Theory of Change starts from the understanding that some of our goals are very long term.
35 | We can't work on all of them directly: in many cases we have to plan for more reachable intermediate goals,
36 | identify ways to know we've reached those, and plan actions that lead to those intermediate goals;
37 | in other cases, it means we aren't pursuing goals directly at all, but just trying to support
38 | the right _conditions_ for our desired outcomes to be able to emerge.
39 |
40 |
41 | The IPLD Theory of Change
42 | -------------------------
43 |
44 | IPLD is the “data” layer of our vision for a Decentralized Web.
45 | However, since IPLD doesn’t really concern itself with how blocks are stored or how they are distributed,
46 | the work we’re doing can be leveraged by many projects and use cases beyond just “Decentralization.”
47 |
48 | As an example, git is a decentralized technology built on decentralized primitives,
49 | yet most people interact with aspects of git through centralized services (GitHub, GitLab, etc).
50 | These services solve acute coordination problems while git itself delivers an unmatched user experience because it’s built on primitives that don’t assume a central authority.
51 |
52 | IPLD brings a superset of the capabilities of git’s internals to the developer community.
53 | We don’t yet know how those will be leveraged, but it’s a fair bet that many of them,
54 | especially in the near future, will use centralization or federation to solve some of the acute problems their applications face.
55 |
56 | There are two stories of technical progress: a gradual one and a more radical one.
57 | The radical one says that certain core innovations spark fundamental changes in the computing ecosystem.
58 | The gradual one says that the adoption of technological change is incremental, and that something that is too big a change cannot be adopted quickly.
59 |
60 | IPLD comes from the perspective of radical change (Let’s Re-Decentralize the Web!)
61 | but it’s a simple enough primitive that you can also see the case for it being adopted as part of a more gradual shift in computing,
62 | where centralized, federated and edge services adopt it to solve problems it’s uniquely well suited to.
63 | This increases general familiarity with the project and its primitives and lays the groundwork for continued gradual change towards fully distributed systems.
64 |
65 | The great thing about IPLD is that we don’t have to pick one of these theories;
66 | IPLD is well suited to both, and as long as we don’t assume our community will come from a specific perspective, we can just release good code,
67 | plant a lot of seeds in different developer communities, and see which ones sprout and grow.
68 |
69 |
70 | An Example
71 | ----------
72 |
73 | One concrete example we use to state our objectives in the IPLD project is this:
74 |
75 | > Imagine you want to write the next "git" (or imagine git hadn't been made yet, and you're about to do it).
76 | > It's some decentralized project: you know you need content-addressed data primitives, but you haven't written them yet.
77 | >
78 | > The goal of IPLD is to make that take one order of magnitude _less time_ than it otherwise would have.
79 |
80 | (Thereafter, we also hope IPLD makes that project enduringly easier to debug,
81 | easier to build compatibility stories with other projects,
82 | easier to actually ship the content around,
83 | easier to build long term migration strategies for as the project evolves,
84 | and so on. But in general: all these things should be *easier* to do with IPLD
85 | in contrast to if you tried to do all these things alone.)
86 |
--------------------------------------------------------------------------------
/what_is_ipld.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ipld/specs/a7b9376ebd43aeabba7d78487db3d9df456b7714/what_is_ipld.png
--------------------------------------------------------------------------------