├── .circleci
│   └── config.yml
├── .gitignore
├── 0000-template.md
├── LICENSE
├── README.md
└── text
    ├── 0001-rfc_process.md
    ├── 0002-qri_dataset_definition.md
    ├── 0003-content_addressed_file_system.md
    ├── 0004-structured_data_io.md
    ├── 0006-dataset_naming.md
    ├── 0011-html_viz.md
    ├── 0012-CLI-commands.md
    ├── 0013-api.md
    ├── 0014-export.md
    ├── 0015-add-geo-processing-abilities-to-skylark.md
    ├── 0016-revise_transform_processing.md
    ├── 0017-define_dataset_creation.md
    ├── 0018-publish-update.md
    ├── 0019-manifests.md
    ├── 0020-distingush_manual_vs_scripted_transforms.md
    ├── 0021-export_behavior.md
    ├── 0022-remotes.md
    ├── 0023-starlark_load_dataset.md
    ├── 0024-scheduled-updates.md
    ├── 0025-filesystem-integration.md
    ├── 0026-starlark_expose.md
    ├── 0027-assets
    │   ├── dataset_01.png
    │   ├── dataset_02_patch_application.png
    │   ├── dataset_03_xform_diff_shape.png
    │   ├── dataset_04_xform_same_shape.png
    │   ├── dataset_05_conflict.png
    │   └── dataset_06_working_dir.png
    ├── 0027-transform_application.md
    ├── 0028-externalize_private_keys.md
    ├── 0029-config_revision.md
    ├── 0030-replace_publish_clone_with_push_pull.md
    ├── 0031-expanded_remove.md
    ├── 0032-access_command.md
    └── 0033-storage_command.md
/.circleci/config.yml:
--------------------------------------------------------------------------------
1 | version: '2'
2 | jobs:
3 | build:
4 | working_directory: "~/rfcs"
5 | docker:
6 | - image: circleci/golang:1.11
7 | steps:
8 | - checkout
9 | - run:
10 | name: install misspell
11 | command: curl -L -o ./install-misspell.sh https://git.io/misspell && sh ./install-misspell.sh
12 | - run:
13 | name: Check Spelling
14 | command: ./bin/misspell -error ~/rfcs/text/
15 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 | book/
3 | src/
4 | .DS_Store
5 |
--------------------------------------------------------------------------------
/0000-template.md:
--------------------------------------------------------------------------------
1 | - Feature Name:
2 | - Start Date:
3 | - RFC PR:
4 | - Issue:
5 |
6 | # Summary
7 | [summary]: #summary
8 |
9 |
10 |
11 | # Motivation
12 | [motivation]: #motivation
13 |
14 |
15 |
16 | # Guide-level explanation
17 | [guide-level-explanation]: #guide-level-explanation
18 |
19 |
28 |
29 | # Reference-level explanation
30 | [reference-level-explanation]: #reference-level-explanation
31 |
32 |
39 |
40 | # Drawbacks
41 | [drawbacks]: #drawbacks
42 |
43 | Why should we *not* do this?
44 |
45 | # Rationale and alternatives
46 | [rationale-and-alternatives]: #rationale-and-alternatives
47 |
48 |
51 |
52 | # Prior art
53 | [prior-art]: #prior-art
54 |
55 |
68 |
69 | # Unresolved questions
70 | [unresolved-questions]: #unresolved-questions
71 |
72 |
75 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2018 QRI, Inc.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in
13 | all copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21 | THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Qri RFCs
2 |
3 | [Qri RFCs]: #qri-rfcs
4 |
5 | Many changes can be implemented and reviewed via the normal GitHub pull request
6 | workflow. Some changes, though, are "substantial", and we ask that these be put
7 | through a bit of a design process and produce a consensus among the qri
8 | community and core team.
9 |
10 | The "RFC" (request for comments) process is intended to provide a consistent
11 | and controlled path for charting the roadmap of qri. We've seen a number of
12 | projects in the distributed space suffer from under-considered design choices
13 | and unclear roadmapping. We're hoping strong adherence to a lightweight RFC
14 | process can help mitigate these problems. You should be able to get a sense of
15 | where qri is going by reading through the accepted proposals.
16 |
17 | We openly acknowledge this may seem premature for such an early-stage project.
18 | We're intending to put this RFC process in place now to develop a design-driven
19 | culture so that others have a clear path to contribute to the future of
20 | the project.
21 |
22 | This process is _deeply_ inspired by the [rust language RFC process](https://github.com/rust-lang/rfcs),
23 | which builds on the [Python Enhancement Proposals process](https://www.python.org/dev/peps/).
24 | A big thank-you to these projects for leading the way.
25 |
26 |
27 | ## Table of Contents
28 | [Table of Contents]: #table-of-contents
29 |
30 | - [Opening](#qri-rfcs)
31 | - [Table of Contents]
32 | - [When you need to follow this process]
33 | - [Before creating an RFC]
34 | - [What the process is]
35 | - [The RFC life-cycle]
36 | - [Reviewing RFCs]
37 | - [Implementing an RFC]
38 | - [RFC Postponement]
39 | - [Help this is all too informal!]
40 | - [License]
41 |
42 |
43 | ## When you need to follow this process
44 | [When you need to follow this process]: #when-you-need-to-follow-this-process
45 |
46 | Most qri repositories follow [angular commit conventions](https://github.com/angular/angular.js/blob/master/DEVELOPERS.md#type)
47 | which designate 8 _types_ of change:
48 |
49 | - **feat:** A new feature
50 | - **fix:** A bug fix
51 | - **docs:** Documentation only changes
52 | - **style:** Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
53 | - **refactor:** A code change that neither fixes a bug nor adds a feature
54 | - **perf:** A code change that improves performance
55 | - **test:** Adding missing or correcting existing tests
56 | - **chore:** Changes to the build process or auxiliary tools and libraries such as documentation generation
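
As a rough illustration (the regex below is our own sketch, not part of any qri tooling), a commit subject following the convention can be checked against the eight types like so:

```shell
# Sketch: validate a commit subject line against the eight angular-style types.
# The regex is illustrative only -- it is not part of any qri tooling.
check_subject() {
  echo "$1" | grep -Eq '^(feat|fix|docs|style|refactor|perf|test|chore)(\([a-zA-Z0-9_-]+\))?: .+'
}

check_subject 'feat(dataset): add CBOR support' && echo "valid"
check_subject 'added some stuff' || echo "invalid"
```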
57 |
58 | We only need RFCs for `feat` and breaking `refactor` changes to any of our novel
59 | projects, which are the [qri](https://github.com/qri-io/qri) and [dataset](https://github.com/qri-io/dataset)
60 | repositories. `feat` and `refactor` changes to the [frontend](https://github.com/qri-io/frontend)
61 | will also require heavy review, but don't require an RFC.
62 |
63 | Most of the RFCs you'll see will come from the core team, but
64 | that doesn't mean you can't chime in. Constructive one-off comments
65 | are more than welcome!
66 |
67 |
68 | ## Before creating an RFC
69 | [Before creating an RFC]: #before-creating-an-rfc
70 |
71 | A hastily-proposed RFC can hurt its chances of acceptance. Low quality
72 | proposals, proposals for previously-rejected features, or those that don't fit
73 | into the near-term roadmap, may be quickly rejected, which can be demotivating
74 | for the unprepared contributor. Laying some groundwork ahead of the RFC can
75 | make the process smoother.
76 |
77 | Although there is no single way to prepare for submitting an RFC, it is
78 | generally a good idea to pursue feedback from other project developers
79 | beforehand, to ascertain that the RFC may be desirable; having a consistent
80 | impact on the project requires concerted effort toward consensus-building.
81 |
82 | The best place to start is by filing an issue on this repository. The core team
83 | monitors this repo and will provide feedback on your issue, including whether
84 | it would make for a good RFC.
85 |
86 |
87 | ## What the process is
88 | [What the process is]: #what-the-process-is
89 |
90 | In short, to get a major feature added to Qri, one must first get the RFC
91 | merged into the RFC repository as a markdown file. At that point the RFC is
92 | "active" and may be implemented with the goal of eventual inclusion into Qri.
93 |
94 | - Fork the [RFC repository]
95 | - Copy `0000-template.md` to `text/0000-my-feature.md` (where "my-feature" is
96 | descriptive; don't assign an RFC number yet).
97 | - Fill in the RFC. Put care into the details: RFCs that do not present
98 | convincing motivation, demonstrate understanding of the impact of the
99 | design, or are disingenuous about the drawbacks or alternatives tend to be
100 | poorly-received.
101 | - Submit a pull request. As a pull request the RFC will receive design
102 | feedback from the larger community, and the author should be prepared to
103 | revise it in response.
104 | - Each pull request will be reviewed by the core team and assigned to a team
105 | member, who will take responsibility for guiding the RFC.
106 | - Build consensus and integrate feedback. RFCs that have broad support are
107 | much more likely to make progress than those that don't receive any
108 | comments. Feel free to reach out to the RFC assignee in particular to get
109 | help identifying stakeholders and obstacles.
110 | - The core team will discuss the RFC pull request, as much as possible in the
111 | comment thread of the pull request itself. Offline discussion will be
112 | summarized on the pull request comment thread.
113 | - RFCs rarely go through this process unchanged, especially as alternatives
114 | and drawbacks are shown. You can make edits, big and small, to the RFC to
115 | clarify or change the design, but make changes as new commits to the pull
116 | request, and leave a comment on the pull request explaining your changes.
117 | Specifically, do not squash or rebase commits after they are visible on the
118 | pull request.
119 | - If the proposal is submitted by a core team member, it can be merged
120 | once at least 2 other core team members approve the RFC. Otherwise, a member
121 | of the core team will propose a "motion for final comment period" (FCP),
122 | along with a *disposition* for the RFC (merge, close, or postpone).
123 | - This step is taken when enough of the tradeoffs have been discussed that
124 | the core team is in a position to make a decision. That does not require
125 | consensus amongst all participants in the RFC thread (which is usually
126 | impossible). However, the argument supporting the disposition on the RFC
127 | needs to have already been clearly articulated, and there should not be a
128 | strong consensus *against* that position outside of the core team. Team
129 | members use their best judgment in taking this step, and the FCP itself
130 | ensures there is ample time and notification for stakeholders to push back
131 | if it is made prematurely.
132 | - For RFCs with lengthy discussion, the motion to FCP is usually preceded by
133 | a *summary comment* trying to lay out the current state of the discussion
134 | and major tradeoffs/points of disagreement.
135 | - Before actually entering FCP, *all* members of the core team must sign off;
136 | this is often the point at which many core team members first review the RFC
137 | in full depth.
138 | - The FCP lasts ten calendar days, so that it is open for at least 5 business
139 | days. It is also advertised widely,
140 | e.g. in [This Week in Qri](https://this-week-in-rust.org/). This way all
141 | stakeholders have a chance to lodge any final objections before a decision
142 | is reached.
143 | - In most cases, the FCP period is quiet, and the RFC is either merged or
144 | closed. However, sometimes substantial new arguments or ideas are raised,
145 | the FCP is canceled, and the RFC goes back into development mode.
146 |
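The first two steps above amount to a couple of shell commands. A minimal sketch (the feature name is a placeholder; we fabricate a stand-in template only so the snippet runs outside a real clone):

```shell
# In a real clone of your fork of qri-io/rfcs the template already exists;
# the guard below creates a stand-in so this sketch is self-contained.
mkdir -p text
[ -f 0000-template.md ] || printf -- '- Feature Name:\n- Start Date:\n' > 0000-template.md

# Copy the template; keep the 0000 prefix until an RFC number is assigned.
cp 0000-template.md text/0000-my-feature.md
ls text/
```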
147 | ## The RFC life-cycle
148 | [The RFC life-cycle]: #the-rfc-life-cycle
149 |
150 | Once an RFC becomes "active" then authors may implement it and submit the
151 | feature as a pull request to the Qri repo. Being "active" is not a rubber
152 | stamp, and in particular still does not mean the feature will ultimately be
153 | merged; it does mean that in principle all the major stakeholders have agreed
154 | to the feature and are amenable to merging it.
155 |
156 | Furthermore, the fact that a given RFC has been accepted and is "active"
157 | implies nothing about what priority is assigned to its implementation, nor does
158 | it imply anything about whether a Qri developer has been assigned the task of
159 | implementing the feature. While it is not *necessary* that the author of the
160 | RFC also write the implementation, it is by far the most effective way to see
161 | an RFC through to completion: authors should not expect that other project
162 | developers will take on responsibility for implementing their accepted feature.
163 |
164 | Modifications to "active" RFCs can be done in follow-up pull requests. We
165 | strive to write each RFC in a manner that it will reflect the final design of
166 | the feature; but the nature of the process means that we cannot expect every
167 | merged RFC to actually reflect what the end result will be at the time of the
168 | next major release.
169 |
170 | In general, once accepted, RFCs should not be substantially changed. Only very
171 | minor changes should be submitted as amendments. More substantial changes
172 | should be new RFCs, with a note added to the original RFC. Exactly what counts
173 | as a "very minor change" is up to the sub-team to decide; check
174 | [Sub-team specific guidelines] for more details.
175 |
176 |
177 | ## Reviewing RFCs
178 | [Reviewing RFCs]: #reviewing-rfcs
179 |
180 | While the RFC pull request is up, the core team may schedule meetings with the
181 | author and/or relevant stakeholders to discuss the issues in greater detail,
182 | and in some cases the topic may be discussed at a core team meeting. In either
183 | case a summary from the meeting will be posted back to the RFC pull request.
184 |
185 | The core team makes final decisions about RFCs after the benefits and drawbacks
186 | are well understood. These decisions can be made at any time, but the core team
187 | will regularly issue decisions. When a decision is made, the RFC pull request
188 | will either be merged or closed. In either case, if the reasoning is not clear
189 | from the discussion in thread, the sub-team will add a comment describing the
190 | rationale for the decision.
191 |
192 |
193 | ## Implementing an RFC
194 | [Implementing an RFC]: #implementing-an-rfc
195 |
196 | Some accepted RFCs represent vital features that need to be implemented right
197 | away. Other accepted RFCs can represent features that can wait until some
198 | arbitrary developer feels like doing the work. Every accepted RFC has an
199 | associated issue tracking its implementation in the Qri repository; thus that
200 | associated issue can be assigned a priority via the triage process that the
201 | team uses for all issues in the Qri repository.
202 |
203 | The author of an RFC is not obligated to implement it. Of course, the RFC
204 | author (like any other developer) is welcome to post an implementation for
205 | review after the RFC has been accepted.
206 |
207 | If you are interested in working on the implementation for an "active" RFC, but
208 | cannot determine if someone else is already working on it, feel free to ask
209 | (e.g. by leaving a comment on the associated issue).
210 |
211 |
212 | ## RFC Postponement
213 | [RFC Postponement]: #rfc-postponement
214 |
215 | Some RFC pull requests are tagged with the "postponed" label when they are
216 | closed (as part of the rejection process). An RFC closed with "postponed" is
217 | marked as such because we want neither to think about evaluating the proposal
218 | nor about implementing the described feature until some time in the future, and
219 | we believe that we can afford to wait until then to do so. Historically,
220 | "postponed" was used to postpone features until after 1.0. Postponed pull
221 | requests may be re-opened when the time is right. We don't have any formal
222 | process for that; you should ask members of the relevant sub-team.
223 |
224 | Usually an RFC pull request marked as "postponed" has already passed an
225 | informal first round of evaluation, namely the round of "do we think we would
226 | ever possibly consider making this change, as outlined in the RFC pull request,
227 | or some semi-obvious variation of it." (When the answer to the latter question
228 | is "no", then the appropriate response is to close the RFC, not postpone it.)
229 |
230 |
231 | ### Help this is all too informal!
232 | [Help this is all too informal!]: #help-this-is-all-too-informal
233 |
234 | The process is intended to be as lightweight as reasonable for the present
235 | circumstances. As usual, we are trying to let the process be driven by
236 | consensus and community norms, not impose more structure than necessary.
237 |
238 | [RFC repository]: https://github.com/qri-io/rfcs
239 |
240 | ## License
241 | [License]: #license
242 |
243 | This repository is licensed under the MIT license ([LICENSE](LICENSE) or http://opensource.org/licenses/MIT)
244 |
245 | ### Contributions
246 |
247 | Unless you explicitly state otherwise, any contribution intentionally submitted
248 | for inclusion in the work by you shall be MIT licensed,
249 | without any additional terms or conditions.
250 |
--------------------------------------------------------------------------------
/text/0001-rfc_process.md:
--------------------------------------------------------------------------------
1 | - Feature Name: rfc_process
2 | - Start Date: 2018-08-13
3 | - RFC PR: #1
4 | - Issue: N/A
5 |
6 | # Summary
7 | [summary]: #summary
8 |
9 | The "RFC" (request for comments) process is intended to provide a consistent
10 | and controlled path for charting the roadmap of qri. We've seen a number of
11 | projects in the distributed space suffer from under-considered design choices
12 | and unclear roadmapping. We're hoping strong adherence to a lightweight RFC
13 | process can help mitigate these problems. Anyone should be able to get a sense of
14 | _where qri is going_ by reading through the accepted proposals.
15 |
16 | # Motivation
17 | [motivation]: #motivation
18 |
19 | I openly acknowledge this may seem premature for such an early-stage project.
20 | I'm intending to put this RFC process in place now to develop a design-driven
21 | culture so that others have a clear path to contribute to the future of the
22 | project, and replace myself as the arbiter of "what qri is" with a codified
23 | principle of [rough consensus and working code](https://www.ietf.org/about/participate/tao/).
24 |
25 | One of the main concerns I have is that ambiguity exists around what is _expected
26 | behaviour_ when interacting with qri. To me, this is a _major_ problem.
27 | Not knowing precisely what working-as-expected looks like limits what core
28 | team members can confidently reason about, which means the core can't
29 | confidently tell _others_ how qri should work, which means the whole project is
30 | both ambiguous and without a clear path for resolving that ambiguity.
31 |
32 | To me this ambiguity is a sign that we (the core team) haven't taken enough
33 | time to reach design-level consensus about how qri _should_ behave, which is
34 | why I think it's not too early to start this process.
35 |
36 | There are a few key motivations for implementing this now:
37 | - develop a culture of rough consensus
38 | - inject more design-time thinking into the core team's process
39 | - maintain an accurate, reliable roadmap
40 | - create an on-ramp for others to contribute to the design of qri
41 |
42 | # Guide-level explanation
43 | [guide-level-explanation]: #guide-level-explanation
44 |
45 | For the core team, adopting this RFC process will feel like three things at once:
46 | - a source of truth for _how qri should behave_
47 | - a new channel to directly impact the design of qri
48 | - lots of additional work to get new stuff into qri
49 |
50 | The first batch of RFCs should ideally come from me (@b5), outlining in detail
51 | how the various aspects of qri _should_ work (which, as we know, is different
52 | from how things do or don't work). The process of debating these initial RFCs should
53 | serve as a chance to work together to establish language, clarity & detail
54 | around expected behaviours. Accepting these initial RFCs provides a new,
55 | consensus-driven foundation for the project that should give core team members
56 | a confident, consistent source of truth to work from.
57 | By requiring sign-off from the core team, this should be a chance for the
58 | core team to achieve consensus on how things should work.
59 |
60 | Upon accepting this proposal, we'll also move all of the issues filed in
61 | [qri-io/notes](https://github.com/qri-io/notes) & deprecate the repo. If we
62 | accept this RFC process, we should commit to it fully. Notes should now
63 | start as issues on this repo with the intention that they could develop into
64 | formal RFCs.
65 |
66 | At the same time, this should also be a chance for others to start documenting
67 | ideas we've been kicking around for enhancing qri. Things we've discussed in the
68 | past like skylark modules, simulation testing, new file format support,
69 | etc. should be collected as issues on this repo so we can start working them
70 | into RFCs.
71 |
72 | Finally, this process should not get in the way. If done properly, day-to-day
73 | development should _accelerate_ once we've accepted enough RFCs to get
74 | proposals ahead of current development.
75 | It should be easier to implement a feature with confidence because implementing
76 | new things should just be coding up an already-approved design document.
77 | I want to save RFCs for design-level features, not day-to-day fixes or
78 | implementation details. It should still be perfectly acceptable to make changes
79 | to interfaces between packages, move packages around, and riff on ideas outside
80 | of the formal RFC process. RFCs should be saved for decisions that affect how we
81 | expect the user-facing edges of qri to behave.
82 |
83 | # Reference-level explanation
84 | [reference-level-explanation]: #reference-level-explanation
85 |
86 | _Most of this is copied from the Rust RFC process, except I've added a section
87 | intended to keep us moving quickly: if a core team member creates an RFC, and
88 | two other core team members approve it, it's automatically merged. We should
89 | remove this provision once qri reaches a 1.0 release._
90 |
91 | In short, to get a major feature added to Qri, one must first get the RFC
92 | merged into the RFC repository as a markdown file. At that point the RFC is
93 | "active" and may be implemented with the goal of eventual inclusion into Qri.
94 |
95 | - Fork the [RFC repository]
96 | - Copy `0000-template.md` to `text/0000-my-feature.md` (where "my-feature" is
97 | descriptive; don't assign an RFC number yet).
98 | - Fill in the RFC. Put care into the details: RFCs that do not present
99 | convincing motivation, demonstrate understanding of the impact of the
100 | design, or are disingenuous about the drawbacks or alternatives tend to be
101 | poorly-received.
102 | - Submit a pull request. As a pull request the RFC will receive design
103 | feedback from the larger community, and the author should be prepared to
104 | revise it in response.
105 | - Each pull request will be reviewed by the core team and assigned to a team
106 | member, who will take responsibility for guiding the RFC.
107 | - Build consensus and integrate feedback. RFCs that have broad support are
108 | much more likely to make progress than those that don't receive any
109 | comments. Feel free to reach out to the RFC assignee in particular to get
110 | help identifying stakeholders and obstacles.
111 | - The core team will discuss the RFC pull request, as much as possible in the
112 | comment thread of the pull request itself. Offline discussion will be
113 | summarized on the pull request comment thread.
114 | - RFCs rarely go through this process unchanged, especially as alternatives
115 | and drawbacks are shown. You can make edits, big and small, to the RFC to
116 | clarify or change the design, but make changes as new commits to the pull
117 | request, and leave a comment on the pull request explaining your changes.
118 | Specifically, do not squash or rebase commits after they are visible on the
119 | pull request.
120 | - If the proposal is submitted by a core team member, it can be merged
121 | once at least 2 other core team members approve the RFC. Otherwise, a member
122 | of the core team will propose a "motion for final comment period" (FCP),
123 | along with a *disposition* for the RFC (merge, close, or postpone).
124 | - This step is taken when enough of the tradeoffs have been discussed that
125 | the core team is in a position to make a decision. That does not require
126 | consensus amongst all participants in the RFC thread (which is usually
127 | impossible). However, the argument supporting the disposition on the RFC
128 | needs to have already been clearly articulated, and there should not be a
129 | strong consensus *against* that position outside of the core team. Team
130 | members use their best judgment in taking this step, and the FCP itself
131 | ensures there is ample time and notification for stakeholders to push back
132 | if it is made prematurely.
133 | - For RFCs with lengthy discussion, the motion to FCP is usually preceded by
134 | a *summary comment* trying to lay out the current state of the discussion
135 | and major tradeoffs/points of disagreement.
136 | - Before actually entering FCP, *all* members of the core team must sign off;
137 | this is often the point at which many core team members first review the RFC
138 | in full depth.
139 | - The FCP lasts ten calendar days, so that it is open for at least 5 business
140 | days. This way all stakeholders have a chance to lodge any final objections
141 | before a decision is reached.
142 | - In most cases, the FCP period is quiet, and the RFC is either merged or
143 | closed. However, sometimes substantial new arguments or ideas are raised,
144 | the FCP is cancelled, and the RFC goes back into development mode.
145 |
146 | ### The RFC life-cycle
147 | [The RFC life-cycle]: #the-rfc-life-cycle
148 |
149 | Once an RFC becomes "active" then authors may implement it and submit the
150 | feature as a pull request to the Qri repo. Being "active" is not a rubber
151 | stamp, and in particular still does not mean the feature will ultimately be
152 | merged; it does mean that in principle all the major stakeholders have agreed
153 | to the feature and are amenable to merging it.
154 |
155 | Furthermore, the fact that a given RFC has been accepted and is "active"
156 | implies nothing about what priority is assigned to its implementation, nor does
157 | it imply anything about whether a Qri developer has been assigned the task of
158 | implementing the feature. While it is not *necessary* that the author of the
159 | RFC also write the implementation, it is by far the most effective way to see
160 | an RFC through to completion: authors should not expect that other project
161 | developers will take on responsibility for implementing their accepted feature.
162 |
163 | Modifications to "active" RFCs can be done in follow-up pull requests. We
164 | strive to write each RFC in a manner that it will reflect the final design of
165 | the feature; but the nature of the process means that we cannot expect every
166 | merged RFC to actually reflect what the end result will be at the time of the
167 | next major release.
168 |
169 | In general, once accepted, RFCs should not be substantially changed. Only very
170 | minor changes should be submitted as amendments. More substantial changes
171 | should be new RFCs, with a note added to the original RFC. Exactly what counts
172 | as a "very minor change" is up to the sub-team to decide; check
173 | [Sub-team specific guidelines] for more details.
174 |
175 | ### Reviewing RFCs
176 | [Reviewing RFCs]: #reviewing-rfcs
177 |
178 | While the RFC pull request is up, the core team may schedule meetings with the
179 | author and/or relevant stakeholders to discuss the issues in greater detail,
180 | and in some cases the topic may be discussed at a core team meeting. In either
181 | case a summary from the meeting will be posted back to the RFC pull request.
182 |
183 | The core team makes final decisions about RFCs after the benefits and drawbacks
184 | are well understood. These decisions can be made at any time, but the core team
185 | will regularly issue decisions. When a decision is made, the RFC pull request
186 | will either be merged or closed. In either case, if the reasoning is not clear
187 | from the discussion in thread, the sub-team will add a comment describing the
188 | rationale for the decision.
189 |
190 |
191 | ### Implementing an RFC
192 | [Implementing an RFC]: #implementing-an-rfc
193 |
194 | Some accepted RFCs represent vital features that need to be implemented right
195 | away. Other accepted RFCs can represent features that can wait until some
196 | arbitrary developer feels like doing the work. Every accepted RFC has an
197 | associated issue tracking its implementation in the Qri repository; thus that
198 | associated issue can be assigned a priority via the triage process that the
199 | team uses for all issues in the Qri repository.
200 |
201 | The author of an RFC is not obligated to implement it. Of course, the RFC
202 | author (like any other developer) is welcome to post an implementation for
203 | review after the RFC has been accepted.
204 |
205 | If you are interested in working on the implementation for an "active" RFC, but
206 | cannot determine if someone else is already working on it, feel free to ask
207 | (e.g. by leaving a comment on the associated issue).
208 |
209 |
210 | # Drawbacks
211 | [drawbacks]: #drawbacks
212 |
213 | This is going to slow us down and consume precious time.
214 |
215 | # Rationale and alternatives
216 | [rationale-and-alternatives]: #rationale-and-alternatives
217 |
218 | On a philosophical level, I'm tired of shitty software that doesn't work. I want
219 | to raise our standards, aspiring toward principles of
220 | ["Zero Defects Programming"](https://en.wikipedia.org/wiki/Zero_Defects):
221 |
222 | > Instill in workers the will to prevent problems during design and manufacture
223 | rather than go back and fix them later
224 |
225 | The only way to prevent problems during the design phase is to **require a
226 | design phase for things that matter**. In the context of Open Source Software,
227 | the RFC process is the best I've seen for having a strong design that yields
228 | software others can depend on.
229 |
230 | # Prior art
231 | [prior-art]: #prior-art
232 |
233 | - [Rust RFC Process](https://github.com/rust-lang/rfcs)
234 | - [IETF Standards Process](https://www.ietf.org/standards/process/)
235 | - [Python Enhancement Proposals](https://www.python.org/dev/peps/)
236 |
237 | # Unresolved questions
238 | [unresolved-questions]: #unresolved-questions
239 |
240 | - What structures need to be in place to align RFCs with the mission of the
241 | project?
242 | - Do we need to somehow distill approved RFCs into a single roadmap document?
243 |
--------------------------------------------------------------------------------
/text/0003-content_addressed_file_system.md:
--------------------------------------------------------------------------------
1 | - Feature Name: content_addressed_file_system
2 | - Start Date: 2017-08-03
3 | - RFC PR: [#3](https://github.com/qri-io/rfcs/pull/3)
4 | - Repo: https://github.com/qri-io/cafs
5 |
6 | _Note: This RFC was created as part of an initial sprint to adopt the RFC
7 | process itself. As such, sections of this document are less complete than
8 | we'd hope, or less complete than we'd expect from a new RFC.
9 | -:heart: the qri core team_
10 |
11 | # Summary
12 | [summary]: #summary
13 |
14 | Content-Addressed File System (CAFS) is a generalized interface for working with
15 | filestores that names content based on the content itself, usually
16 | through some sort of hashing function.
17 | Examples of content-addressed file systems include git, bittorrent, IPFS,
18 | the DAT project, etc.
19 |
20 | # Motivation
21 | [motivation]: #motivation
22 |
23 | The long-term goal of CAFS is to define an interface for common filestore
24 | operations between different content-addressed filestores that serves the
25 | subset of features qri needs to function.
26 |
27 | This package doesn't aim to implement everything a given filestore can do,
28 | but instead focuses on basic file & directory i/o. CAFS is in its very early days,
29 | starting with a proof of concept based on IPFS and an in-memory implementation.
30 | Over time we'll work to add additional stores, which will undoubtedly affect
31 | the overall interface definition.
32 |
33 | A tacit goal of this interface is to manage the seam between graph-based
34 | storage systems, and a file interface.
35 |
36 | # Guide-level explanation
37 | [guide-level-explanation]: #guide-level-explanation
38 |
39 | There are two key interfaces to CAFS. The rest are built upon these two:
40 |
41 | ### File
42 | File is an interface based largely on `os.File` from the golang `os`
43 | package, with the exception that a File can be _either a file or a directory_.
44 | This file interface will have many dependents in the qri ecosystem.
45 |
46 | ### Filestore
47 | Filestore is the interface for storing files & directories. The "content
48 | addressed" part means that `Put` operations are in charge of returning the name
49 | of the file.
50 |
51 |
52 | # Reference-level explanation
53 | [reference-level-explanation]: #reference-level-explanation
54 |
55 | There are two primary interfaces that constitute a CAFS, *File* and *Filestore*:
56 |
57 |
58 | File is an interface that provides functionality for handling files/directories
59 | as values that can be supplied to commands. For directories, child files are
60 | accessed serially by calling `NextFile()`.
61 | ```golang
62 | type File interface {
63 | // Files implement ReadCloser, but can only be read from or closed if
64 | // they are not directories
65 | io.ReadCloser
66 |
67 | // FileName returns a filename associated with this file
68 | FileName() string
69 |
70 | // FullPath returns the full path used when adding this file
71 | FullPath() string
72 |
73 | // IsDirectory returns true if the File is a directory (and therefore
74 | // supports calling `NextFile`) and false if the File is a normal file
75 | // (and therefore supports calling `Read` and `Close`)
76 | IsDirectory() bool
77 |
78 | // NextFile returns the next child file available (if the File is a
79 | // directory). It will return (nil, io.EOF) if no more files are
80 | // available. If the file is a regular file (not a directory), NextFile
81 | // will return a non-nil error.
82 | NextFile() (File, error)
83 | }
84 | ```
85 |
86 | Filestore is an interface for working with a content-addressed file system.
87 | This interface is under active development; expect it to change substantially.
88 | It currently form-fits around IPFS (ipfs.io), with longer-term plans to
89 | generalize toward compatibility with git (git-scm.com), and possibly other
90 | stores after that.
91 | ```golang
92 | type Filestore interface {
93 | // Put places a file or a directory in the store.
94 | // The most notable difference from a standard file store is that the store itself determines
95 | // the resulting key (google "content addressing" for more info ;)
96 | // keys returned by put must be prefixed with the PathPrefix,
97 | // eg. /ipfs/QmZ3KfGaSrb3cnTriJbddCzG7hwQi2j6km7Xe7hVpnsW5S
98 | // "pin" is a flag for recursively pinning this object
99 | Put(file File, pin bool) (key string, err error)
100 |
101 | // Get retrieves the object `value` named by `key`.
102 | // Get will return ErrNotFound if the key is not mapped to a value.
103 | Get(key string) (file File, err error)
104 |
105 | // Has returns whether the `key` is mapped to a `value`.
106 | // In some contexts, it may be much cheaper only to check for existence of
107 | // a value, rather than retrieving the value itself. (e.g. HTTP HEAD).
108 | // The default implementation is found in `GetBackedHas`.
109 | Has(key string) (exists bool, err error)
110 |
111 | // Delete removes the value for given `key`.
112 | Delete(key string) error
113 |
114 | // NewAdder allocates an Adder instance for adding files to the filestore
115 | // Adder gives a higher degree of control over the file adding process at the
116 | // cost of being harder to work with.
117 | // "pin" is a flag for recursively pinning this object
118 | // "wrap" sets whether the top level should be wrapped in a directory
119 | // expect this to change to something like:
120 | // NewAdder(opt map[string]interface{}) (Adder, error)
121 | NewAdder(pin, wrap bool) (Adder, error)
122 |
123 | // PathPrefix is a top-level identifier to distinguish between filestores,
124 | // for example: the "ipfs" in /ipfs/QmZ3KfGaSrb3cnTriJbddCzG7hwQi2j6km7Xe7hVpnsW5S
125 | // a Filestore implementation should always return the same prefix
126 | PathPrefix() string
127 | }
128 | ```
129 |
130 | ### Pinning
131 | `Filestore.Put()` accepts a `pin` flag. When `pin = true` the filestore will retain
132 | a permanent reference to the file. When `pin = false`, the filestore may remove
133 | the file at any point. Adding files without pinning is a great way to get the hash
134 | of a file without enforcing the overhead of content retention.
135 |
136 | # Drawbacks
137 | [drawbacks]: #drawbacks
138 |
139 | Getting this interface right will be difficult & full of odd edge-cases that'll
140 | need to be handled carefully.
141 |
142 | # Rationale and alternatives
143 | [rationale-and-alternatives]: #rationale-and-alternatives
144 |
145 | We could skip the notion of _files_ entirely at this level, and instead choose
146 | to focus on _graph_ structures. I think we may end up maturing in this direction
147 | over time, but if so let's do that with proper design consideration.
148 |
149 | # Prior art
150 | [prior-art]: #prior-art
151 |
152 | There isn't much here in the way of prior art.
153 |
154 | # Unresolved questions
155 | [unresolved-questions]: #unresolved-questions
156 |
157 | - How do we properly handle the distinction between network & local operations?
158 |
--------------------------------------------------------------------------------
/text/0004-structured_data_io.md:
--------------------------------------------------------------------------------
1 | - Feature Name: structured_io
2 | - Start Date: 2017-10-18
3 | - RFC PR: [#4](https://github.com/qri-io/rfcs/pull/4)
4 | - Repo: [dataset](https://github.com/qri-io/dataset)
5 |
6 | _Note: This RFC was created as part of an initial sprint to adopt the RFC
7 | process itself. As such, sections of this document are less complete than
8 | we'd hope, or less complete than we'd expect from a new RFC.
9 | -:heart: the qri core team_
10 |
11 | # Summary
12 | [summary]: #summary
13 |
14 | Structured I/O defines interfaces for reading & writing streams of parsed
15 | _entries_, which are elements of arbitrary-yet-structured data such as JSON,
16 | CBOR, or CSV documents. Structured reader & writer interfaces combine a byte
17 | stream, data format and schema to create entry readers & writers that produce &
18 | consume entries of parsed primitive types instead of bytes. Structured I/O
19 | streams can be composed & connected to form the basis of rich data communication
20 | capable of spanning across formats.
21 |
22 | # Motivation
23 | [motivation]: #motivation
24 |
25 | One of the prime goals of qri is to be able to make any dataset comparable to
26 | another dataset. Datasets are also intended to be an arbitrary-yet-structured
27 | document format, able to support all sorts of data with varying degrees of
28 | quality. These requirements mean datasets must be able to define their own
29 | schemas, and may include data that violates that schema.
30 |
31 | Our challenge is to declare a clear set of abstractions that leverage _key
32 | assumptions_ enforced by the dataset model, and leverage those assumptions
33 | to combine arbitrary data at runtime.
34 |
35 | Structured I/O is the primary means of parsing data, abstracting away the
36 | underlying data format while also delivering a set of expectations about
37 | parsed data based on those key assumptions. Those expectations are parsing
38 | to a predetermined set of types (`int`, `bool`, `string`, etc.), and delivering
39 | a _schema_ that includes a definition of valid data structuring.
40 |
41 | As concrete examples, all of the following require tools for data parsing:
42 | - Creating a dataset from a JSON file
43 | - Converting dataset data from one data format to another
44 | - Printing the first 10 rows of a dataset
45 | - Counting the number of entries in a dataset
46 |
47 | All of these tasks are basic things that we'd like to be able to do with one
48 | or more datasets.
49 |
50 | Structured I/O is intended to be a robust set of primitives that
51 | underpin these tasks. Structured _streams_ (readers & writers) wrap a
52 | raw stream of bytes with a parser that transforms raw bytes into _entries_ made
53 | of a standard set of language-native types (`int`, `bool`, `string`, etc.).
54 | Working with entries instead of bytes allows the programmer to avoid thinking
55 | about the underlying format & focus on the semantics of data instead of
56 | idiosyncrasies between encoding formats.
57 |
58 | Orienting our primitives around _streams_ helps manage concerns created by both
59 | network latency and data volume. By orienting qri around stream programming
60 | we set ourselves up for success when programming in a distributed context.
61 |
62 | Structured I/O builds on foundations set forth in the _structure_ portion of
63 | the dataset definition. For any valid dataset it must be possible to create
64 | a Structured Reader of the dataset body, and a Writer that can be used to
65 | compose an update.
66 |
67 | Structured I/O is intended to underpin lots of other functionality. Doing new
68 | things with data should be a process of composing and enhancing Structured I/O
69 | streams. Want only a subsection of a dataset's body? Use a `LimitOffsetReader`.
70 | Want to convert JSON to CBOR? Pipe a JSON-formatted `EntryReader` to a
71 | CBOR-formatted writer.
72 |
73 | # Guide-level explanation
74 | [guide-level-explanation]: #guide-level-explanation
75 |
76 | Creating a structured I/O stream requires a minimum of three things:
77 | - a stream of raw data bytes
78 | - the _data format_ of that stream (eg: JSON)
79 | - a data schema
80 |
81 | The Format & Schema are specified in the passed-in structure; the byte stream
82 | is an `io.Reader` or `io.Writer` primitive. Here's a quick example of creating
83 | a reader from scratch & reading its values:
84 | ```golang
85 | import (
86 | "strings"
87 |
88 | "github.com/qri-io/dataset"
89 | "github.com/qri-io/dataset/dsio"
90 | "github.com/qri-io/jsonschema"
91 | )
92 |
93 | // the data we want to stream, an array with two entries
94 | const JSONData = `["foo",{"name":"bar"}]`
95 |
96 | st := &dataset.Structure{
97 | Format: dataset.JSONDataFormat,
98 | Schema: jsonschema.Must(`{"type":"array"}`),
99 | }
100 |
101 | // create a Structured I/O reader:
102 | str, err := dsio.NewEntryReader(st, strings.NewReader(JSONData))
103 | if err != nil {
104 | panic(err)
105 | }
106 |
107 | ent, err := str.ReadEntry()
108 | if err != nil {
109 | panic(err)
110 | }
111 | fmt.Println(ent.Value) // "foo"
112 |
113 | ent, err = str.ReadEntry()
114 | if err != nil {
115 | panic(err)
116 | }
117 | fmt.Println(ent.Value) // {"name":"bar"}
118 |
119 | _, err = str.ReadEntry()
120 | fmt.Println(err.Error()) // EOF
121 | ```
122 |
123 | ### Stream & Top Level Data
124 | A _stream_ refers to both a _reader_ and a _writer_ collectively.
125 |
126 | _Top Level_ refers to the specific type of the outermost element in a structured
127 | piece of data. This data's top level is an _array_:
128 | ```json
129 | [
130 | {"a": 1},
131 | {"b": 2},
132 | {"c": 3}
133 | ]
134 | ```
135 |
136 | This data's top level is a _string_:
137 | ```json
138 | "foo"
139 | ```
140 |
141 | Qri requires a top-level type of either array or object.
142 |
143 | ### Entries
144 | Traditional "unstructured" streams often use byte arrays as the basic unit that
145 | is both read and written. Structured I/O works with _entries_ instead.
146 | An _entry_ is the fundamental unit of reading & writing for a Structured stream.
147 | Entries are themselves a small abstraction carrying a `Value` (parsed data of
148 | arbitrary structure) along with an `Index` and a `Key`. Only one of `Index`
149 | and `Key` will be populated for a given entry, depending on whether a top level
150 | array or object is being read.
151 |
152 | ### Value Types
153 | Qri is built around a basic set of value types, a crucial assumption that
154 | Structured I/O builds upon. These types are inherited from JSON, with the
155 | addition of byte arrays.
156 |
157 | All entries will conform to one of the following types:
158 | ```golang
159 | // Scalar Types
160 | nil
161 | bool
162 | int
163 | float64
164 | string
165 | []byte
166 |
167 | // Complex Types
168 | []interface{} // array
169 | map[string]interface{} // object
170 | ```
171 |
172 | When examining an `Entry.Value`, its type is `interface{}`; performing a
173 | [type switch](https://tour.golang.org/methods/16) that handles all of the above
174 | types will cover all possible cases for a valid entry. Using such a type switch
175 | recursively on complex types provides a robust, exhaustive method for inspecting
176 | any given entry.
177 |
178 | It's important to note that these guarantees are only enforced for basic
179 | Structured I/O streams. Abstractions on top of Structured I/O may introduce
180 | additional types during processing. A classic example is timestamp parsing.
181 | Implementers of streams that break this type assumption are encouraged to define
182 | a more specific interface than structured I/O to indicate to consumers this
183 | assumption has been broken.
184 |
185 | ### Corrupt Vs. Invalid Data
186 | Structured I/O must distinguish between data that is _corrupt_ and data that is
187 | _invalid_. Corrupt data is data that doesn't conform to the specified format.
188 | As an example, this is corrupt json data (extra comma):
189 | ```json
190 | ["foooo",]
191 | ```
192 | Structured I/O will error if it encounters any kind of corrupt data.
193 |
194 | _Invalid_ data is data that doesn't conform to the specified _schema_ of
195 | structured I/O. Structured I/O streams _are_ expected to gracefully handle
196 | invalid data by falling back to _identity schemas_, discussed below.
197 |
198 | ### Identity Schemas & Fallbacks
199 | Because schemas _must_ be defined on complex types, and the only complex types
200 | we support are objects and arrays, there are two "identity" schemas that
201 | represent the minimum possible schema definitions that specify the top level of
202 | a data stream be either an array or an object:
203 |
204 | **Array Identity Schema**
205 | ```json
206 | { "type" : "array" }
207 | ```
208 |
209 | **Object Identity Schema**
210 | ```json
211 | { "type" : "object" }
212 | ```
213 |
214 | Data whose top level does not conform to one of these two schemas is
215 | considered corrupt.
216 |
217 | These "Identity Schemas" form a _fallback_ the stream will revert to if the data
218 | it's presented with is invalid. For example, if a schema specifies a top level
219 | of "object", and the stream encounters an array, it will silently revert to the
220 | array identity schema & keep reading.
221 |
222 | The rationale for such a choice is emphasizing _parsing_ over strict adherence
223 | to schema definitions. One of the primary use cases of a dataset version control
224 | system is to begin with data that is invalid according to a given schema, and
225 | correct toward it.
226 |
227 | For this reason, consumers of structured I/O streams are encouraged to
228 | prioritize parsing based on type switches as mentioned above, unless the codepath
229 | they are operating in presumes _strict mode_.
230 |
231 | ### Strict Mode
232 | Fallbacks are intended to keep data reading at all costs. However, many use cases
233 | will want explicit failure when a stream is misconfigured. For this purpose
234 | streams provide a "strict mode" that errors when invalid data is encountered,
235 | instead of using silent identity-schema fallbacks.
236 |
237 | When a stream operating in strict mode encounters an entry that doesn't match, it
238 | will return `ErrInvalidEntry` for that entry. In this case the stream will
239 | remain safe for continued use, so that invalid entries do not prevent access
240 | to subsequent valid reads/writes.
241 |
242 | # Reference-level explanation
243 | [reference-level-explanation]: #reference-level-explanation
244 |
245 | Entry is a "row" of a dataset:
246 | ```golang
247 | type Entry struct {
248 | // Index represents this entry's numeric position in a dataset
249 | // this index may not necessarily refer to the overall position within
250 | // the dataset as things like offsets affect where the index begins
251 | Index int
252 | // Key is a string key for this entry
253 | // only present when the top level structure is a map
254 | Key string
255 | // Value is information contained within the row
256 | Value interface{}
257 | }
258 | ```
259 |
260 | EntryWriter is a generalized interface for writing structured data:
261 | ```golang
262 | type EntryWriter interface {
263 | // Structure gives the structure being written
264 | Structure() *dataset.Structure
265 | // WriteEntry writes one "row" of structured data to the Writer
266 | WriteEntry(Entry) error
267 | // Close finalizes the writer, indicating all entries
268 | // have been written
269 | Close() error
270 | }
271 | ```
272 |
273 | EntryReader is a generalized interface for reading Ordered Structured Data:
274 | ```golang
275 | type EntryReader interface {
276 | // Structure gives the structure being read
277 | Structure() *dataset.Structure
278 | // ReadEntry reads one row of structured data from the reader
279 | ReadEntry() (Entry, error)
280 | }
281 | ```
282 |
283 | EntryReadWriter combines EntryWriter and EntryReader behaviors:
284 | ```golang
285 | type EntryReadWriter interface {
286 | // Structure gives the structure being read and written
287 | Structure() *dataset.Structure
288 | // ReadEntry reads one row of structured data from the reader
289 | ReadEntry() (Entry, error)
290 | // WriteEntry writes one row of structured data to the ReadWriter
291 | WriteEntry(Entry) error
292 | // Close finalizes the ReadWriter, indicating all entries
293 | // have been written
294 | Close() error
295 | // Bytes gives the raw contents of the ReadWriter
296 | Bytes() []byte
297 | }
298 | ```
299 |
300 | # Drawbacks
301 | [drawbacks]: #drawbacks
302 |
303 | The drawback is that this is a new interface that needs to be built,
304 | rationalized & maintained, which is to say this is work, and we should avoid
305 | doing work when we could instead be doing something else.
306 |
307 | # Rationale and alternatives
308 | [rationale-and-alternatives]: #rationale-and-alternatives
309 |
310 | Given that we've already written this, the time for considering alternatives
311 | should be in a future RFC.
312 |
313 | # Prior art
314 | [prior-art]: #prior-art
315 |
316 | [golang's io package](https://godoc.org/io) is _the_ source of inspiration here.
317 |
318 |
319 | ### OpenAPI
320 | Structured I/O can be seen as a strict extension of
321 | [OpenAPI](https://swagger.io/docs/specification/about/). In fact, we use the
322 | jsonschema spec that OpenAPI itself builds on!
323 |
324 | From OpenAPI's docs:
325 | > The ability of APIs to describe their own structure is the root of all
326 | awesomeness in OpenAPI. Once written, an OpenAPI specification and Swagger tools
327 | can drive your API development further in various ways...
328 |
329 | Qri datasets are analogous to self-contained OpenAPI specifications & data in
330 | one combined document. Structured I/O is kinda like the thing that turns such a
331 | document back into an "API".
332 |
333 |
334 | # Unresolved questions
335 | [unresolved-questions]: #unresolved-questions
336 |
337 | How do we handle data that _doesn't_ conform to the structure, such as invalid
338 | data? Should we implement a "strict mode" that requires data to be valid?
339 |
340 | A Structured Reader connects a single schema to a data stream; it's a common
341 | use case that entries need only the subsection of the schema that the entry
--------------------------------------------------------------------------------
/text/0006-dataset_naming.md:
--------------------------------------------------------------------------------
1 | - Feature Name: dataset_naming
2 | - Start Date: 2017-08-14
3 | - RFC PR: [#6](https://github.com/qri-io/rfcs/pull/6)
4 | - Repo: https://github.com/qri-io/qri
5 |
6 | _Note: This RFC was created as part of an initial sprint to adopt the RFC
7 | process itself to help clarify our existing work. As such, sections of this
8 | document are less complete than we'd expect from a new RFC.
9 | -:heart: the qri core team_
10 |
11 | # Summary
12 | [summary]: #summary
13 |
14 | Define the qri naming system, conventions for name resolution, and related jargon.
15 |
16 | # Motivation
17 | [motivation]: #motivation
18 |
19 | As a decentralized system, qri must confront the [Zooko's Triangle](https://en.wikipedia.org/wiki/Zooko%27s_triangle) problem, which is establishing a way of referring to datasets that is:
20 |
21 | * human readable
22 | * decentralized
23 | * unique
24 |
25 | Because Qri is assumed to be built atop a content-addressed file system as its storage layer, the properties of being decentralized & unique are already present. The Qri naming system maps a human readable name to the newest version of a dataset.
26 |
27 | # Guide-level explanation
28 | [guide-level-explanation]: #guide-level-explanation
29 |
30 | It’s possible to refer to a dataset in a number of ways. It’s easiest to look
31 | at the full definition of a dataset reference, and then show what the “defaults” are to make sense of things. The full definition of a dataset reference is as follows:
32 |
33 | dataset_reference = handle/dataset_name@profile_id/network/version_id
34 |
35 | an example of that looks like this:
36 |
37 | b5/comics@QmYCvbfNbCwFR45HiNP45rwJgvatpiW38D961L5qAhUM5Y/ipfs/QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y
38 |
39 | In a sentence: b5 is the handle of a peer, who has a dataset named comics, and its hash on the ipfs network at a point in time was `QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y`
40 |
41 | The individual components of this reference are:
42 |
43 | * handle - The human-friendly name that the creator is using to refer to themselves, somewhat analogous to a username in other systems. We need handles so lots of people can name datasets the same thing.
44 | * dataset_name - The human-friendly name that makes it easy to remember and refer to the dataset.
45 | * profile_id - A unique identifier to let machines uniquely refer to datasets created by this peer, regardless of whether their handle is renamed.
46 | * network - Protocol name that stores distributed data, defaulting to "ipfs".
47 | * version_id - A unique identifier hash to refer to a specific version of a dataset, from an exact point in time.
48 |
49 | ### default to latest on ipfs
50 |
51 | Now, having to type `b5/comics@QmYCvbfNbCwFR45HiNP45rwJgvatpiW38D961L5qAhUM5Y/ipfs/QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y`
52 | every time you wanted a dataset would be irritating. So we have two defaults.
53 | The default network is `ipfs`, and the default hash is the latest known version of a dataset. We say latest known because sometimes things can fall out of sync. If you're only working with your own local datasets, this won’t be an issue.
54 |
55 | Anyway, that means we can cut down on the typing if we just want the latest
56 | version of b5’s comics dataset, we can just type:
57 |
58 | b5/comics
59 |
60 | In a sentence: “the latest version of b5’s dataset named comics.”
61 |
62 | ### the me keyword
63 |
64 | What if your handle is, say, `golden_pear_ginger_pointer`? First, why did you pick such a long handle?
65 | Whatever your answer, it would be irritating to have to type this every time, so we give you a special way to refer to yourself: `me`. So if you have a dataset named comics, you can just type:
66 |
67 | me/comics
68 |
69 | In a sentence: “the latest version of my dataset named comics.” Under the hood, we’ll re-write this request to use your actual handle instead.
70 |
71 | ### drop names with hashes
72 |
73 | Finally, it’s also possible to use just the hash. This is a perfectly valid dataset reference:
74 |
75 | @/ipfs/QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y
76 |
77 | In this case we’re ignoring naming altogether and simply referring to a dataset by its network and version hash. Because IPFS hashes are global, we can do this across the entire network. If you’re coming from git, this is a fun new trick.
78 |
79 | To recap:
80 |
81 | All of these are ways to refer to a dataset:
82 |
83 | * handle/dataset_name (user-friendly reference)
84 |
85 | b5/comics
86 |
87 | * network/hash (path)
88 |
89 | /ipfs/QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y
90 |
91 | * handle/dataset_name@profile_id/network/version_id (canonical reference)
92 |
93 | b5/comics@QmYCvbfNbCwFR45HiNP45rwJgvatpiW38D961L5qAhUM5Y/ipfs/QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y
94 |
95 |
96 | # Reference-level explanation
97 | [reference-level-explanation]: #reference-level-explanation
98 |
99 | The struct that stores a dataset reference is `DatasetRef`. Its fields correspond directly to the definition specified above.
100 |
101 | ```golang
102 | // DatasetRef encapsulates a reference to a dataset.
103 | type DatasetRef struct {
104 | // Handle of the dataset creator
105 | Handle string `json:"handle,omitempty"`
106 | // Human-friendly name for this dataset
107 | Name string `json:"name,omitempty"`
108 | // ProfileID of dataset creator
109 | ProfileID profile.ID `json:"profileID,omitempty"`
110 | // Content-addressed path for this dataset, network and version_id
111 | Path string `json:"path,omitempty"`
112 | // Dataset is a pointer to the dataset being referenced
113 | Dataset *dataset.DatasetPod `json:"dataset,omitempty"`
114 | }
115 | ```
116 |
117 | The Dataset pointer optionally points to a dataset itself, once it has been loaded into memory.
118 |
119 | The most important function for working with `DatasetRef`s is `CanonicalizeDatasetRef`, which converts from user-friendly references and path references to canonical references.
120 |
121 | ```golang
122 | func CanonicalizeDatasetRef(r Repo, ref *DatasetRef) error {
123 | ...
124 | }
125 | ```
126 |
127 | This function handles replacements like converting `me` to the peer's `handle`, and fills in ProfileID and Path. It returns the error `repo.ErrNotFound` if the dataset does not exist in the user's repo, which means the dataset is not local, but may exist remotely. Callers of this function should respond appropriately, by contacting peers if a p2p connection exists.
128 |
129 |
130 | # Drawbacks
131 | [drawbacks]: #drawbacks
132 |
133 | Creating a naming scheme carries with it many issues around backwards compatibility, but this is somewhat mitigated by having the network name in the reference. If, for some reason, a new distributed network needs to be supported in the future, Qri can adapt without breaking old references.
134 |
135 | Renames are difficult to get right, since it means that code cannot assume that all hashes correlate to the same user-friendly string.
136 |
137 | There exist subtle differences between types of dataset references, due to the structure being stateful. Code must carefully handle DatasetRefs not knowing from their type alone whether they are user-friendly, or canonical, or only a path.
138 |
139 | # Rationale and alternatives
140 | [rationale-and-alternatives]: #rationale-and-alternatives
141 |
142 | Some naming scheme is absolutely necessary to refer to distributed datasets in a user-friendly way.
143 |
144 | # Prior art
145 | [prior-art]: #prior-art
146 |
147 | Similar concepts exist in Git, which uses SHA-1 hashes the way Qri uses IPFS hashes. Git also uses branch and remote names similar to how Qri uses dataset names.
148 |
149 | Bittorrent solves similar problems by encapsulating hash information in binary files that users load from their native interface.
150 |
151 | # Unresolved questions
152 | [unresolved-questions]: #unresolved-questions
153 |
154 | What other shortcut names are there aside from `me` that we may also want to support in the future? How might the DatasetRef structure need to change in the future if new network values are ever supported?
155 |
--------------------------------------------------------------------------------
/text/0011-html_viz.md:
--------------------------------------------------------------------------------
1 | - Feature Name: html_viz
2 | - Start Date: 2018-08-14
3 | - RFC PR: [#13](https://github.com/qri-io/rfcs/pull/13)
4 | - Issue:
5 |
6 | _Note: We never properly finished the HTML rendering RFC, which we started work on in August 2018.
7 | Instead of creating a new one we just "finished what we started" in March 2019. As such this RFC
8 | contains references to RFCS developed in the period between August 2018 & March 2019.
9 | -:heart: the qri core team_
10 |
11 | # Summary
12 | [summary]: #summary
13 |
14 | HTML visualizations are instructions embedded in a dataset for rendering it as a standard HTML document.
15 |
16 | # Motivation
17 | [motivation]: #motivation
18 |
19 | Qri has a syntax-agnostic `viz` component that encapsulates the details required to visualize a dataset. This RFC proposes the first & default visualization syntax should be rendering to a single HTML document using a template processing engine.
20 |
21 | # Guide-level explanation
22 | [guide-level-explanation]: #guide-level-explanation
23 |
24 |
25 | ### `qri render` & default template
26 |
27 | A command will be added to both Qri's HTTP API ('API' for short) & command-line interface (CLI) called `render`, which adds the capacity to execute HTML templates against datasets. When called with a specified dataset `qri render` will load the dataset, assume the viz syntax is HTML, and use a default template to write an HTML representation of a dataset to `stdout` on the CLI, or the HTTP response body on the API:
28 | `qri render me/example_dataset`
29 |
30 | The default template and output path can be overridden with the `--template` and `--output` flags respectively (these flags are available on the CLI only):
31 | `qri render --template template.html --output rendered.html me/example_dataset`
32 |
33 | The default template must be carefully composed to balance the size of the resulting HTML file in bytes against readability & utility of the resulting visualization. It should also include a well-constructed citation footer that details the components of a dataset in a concise, unobtrusive manner that invites users to audit the veracity of the dataset in question. These defaults should encourage easy reading and invite verification on the part of the reader.
34 |
35 | ### Visualizations in datasets
36 |
37 | Saving a dataset will by default execute the default template to a file called `index.html` & place it in the IPFS merkle-DAG. When an IPFS HTTP gateway receives a request for a DAG that is a directory containing `index.html`, it returns the HTML file by default. This means when a user visits a dataset on the d.web, completely outside the Qri system of dataset production, they are greeted with a well-formatted dataset document by default.
38 |
39 | While care will be taken to keep `index.html` files small, users may understandably want to disable them entirely. To achieve this we'll add a new flag to `qri save` and `qri update`: `--no-render`, which will prevent the execution of any viz template. This will save a few KB from version to version at the cost of usability.
40 |
41 | Users can _override_ the default template by supplying their own custom viz templates, either by specifying a `viz.scriptPath`:
42 |
43 | `dataset.yaml`:
44 | ```yaml
45 | name: example_dataset
46 | # additional fields elided ...
47 | viz:
48 | format: html
49 | scriptPath: template.html
50 | ```
51 |
52 | and running save:
53 | ```
54 | $ qri save --file dataset.yaml
55 | ```
56 |
57 | Or by running `qri save` with an `.html` file argument:
58 | `qri save --file template.html me/example_dataset`
59 |
60 | Since the above example provided no additional configuration details for the `viz` component in `dataset.yaml`, the two calls will have the same effect.
61 |
62 | # Reference-level explanation
63 | [reference-level-explanation]: #reference-level-explanation
64 |
65 | ### Template API
66 |
67 | Introducing HTML template execution requires defining an API for template values. This API will need to be documented & maintained just like any other API in the Qri ecosystem.
68 |
69 | The template implementation will use the Go language [`html/template` package](https://golang.org/pkg/html/template).
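
As a rough sketch of what template execution could look like (illustrative only: `renderViz` and the map-based dataset document are assumptions for this example, not Qri's actual `Render` code):

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// renderViz is an illustrative sketch, not Qri's actual implementation:
// it parses an HTML template and executes it against a dataset document
// exposed under the key "ds". Using a map (rather than a Go struct) gives
// templates the lowerCamelCase accessors described in the next section.
func renderViz(tmplText string, ds map[string]interface{}) (string, error) {
	tmpl, err := template.New("viz").Parse(tmplText)
	if err != nil {
		return "", err
	}
	buf := &bytes.Buffer{}
	if err := tmpl.Execute(buf, map[string]interface{}{"ds": ds}); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	ds := map[string]interface{}{
		"meta": map[string]interface{}{"title": "Example Dataset"},
	}
	// Note: stock Go template syntax reaches data through a leading dot,
	// so the document is addressed as .ds.meta.title here.
	out, _ := renderViz(`<h1>{{ .ds.meta.title }}</h1>`, ds)
	fmt.Println(out)
}
```

Note that vanilla `html/template` syntax prefixes data access with a dot (`.ds.meta.title`); supporting the bare `ds.meta.title` form shown below would require preprocessing or a custom setup on top of the stock package.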
70 |
71 | #### Dataset is exposed as `ds`
72 |
73 | HTML templates should expose the dataset document as `ds`. Exposing the dataset document as `ds` matches our conventions for referring to a dataset in Starlark, and allows access to all defined parts of a dataset. `ds` should use _lower camel case_ fields for component accessors, e.g.:
74 |
75 | ```html
76 | {{ ds.meta.title }}
77 | {{ ds.transform.scriptPath }}
78 | ```
79 |
80 | Undefined components should be defined as empty struct pointers if null. For example, for a dataset _without_ a transform, the following template should print nothing:
81 | ```html
82 | {{ ds.transform }}
83 | ```
84 | And this should error:
85 | ```html
86 | {{ ds.transform.script }}
87 | ```
88 |
89 | Having default empty pointers prevents unnecessary `if` clauses, allowing templates to skip straight to existence tests on component fields:
90 | ```html
91 | {{ if ds.transform.script }}
92 | {{ end }}
93 | ```
94 |
95 | ### Template functions
96 |
97 | Top level functions should be loaded into the template `funcs` space to make rendering templates easier. The Go `html/template` package comes with [predefined functions](https://golang.org/pkg/text/template/#hdr-Functions). Because our implementation builds on the `html/template` package, this RFC introduces all of these functions into our template engine. This is acceptable: it might be a pain if someone needs to write a non-Go `Render` implementation, but at least the universe of available template functions is known.
98 |
99 | An example that uses a default function prints the length of the body:
100 |
101 | ```html
102 | The Body has {{ len getBody }} elements
103 | ```
104 |
105 | In addition to the stock predefined functions, the following should be loaded for all templates to make templating a little easier:
106 |
107 | | name | description |
108 | | ----------- | ------------------------ |
109 | | timeParse | parse a timestamp string, returning a golang *time.Time struct |
110 | | timeFormat | convert the textual representation of the datetime into the specified format |
111 | | default | allows setting a default value that can be returned if a first value is not set. |
112 | | title | give the title of a dataset |
113 | | getBody | load the full dataset body |
114 | | filesize | convert byte count to kb/mb/etc string |
115 |
116 |
117 | #### future dataset document API
118 |
119 | We have reserved future work for a "dataset API" that will expand the default capabilities of a dataset document to include convenience functions for doing things like loading named columns or sampling rows from the body. We've intentionally left this API undefined thus far to understand how it will work in different contexts. One such context is this template API.
120 |
121 | The one exception to this is exposing body data through a global function: `getBody`. This is because there's a _very_ high chance we'll want to export `ds.body` as an object with methods in the future. If we simply load the entire body & drop it into `ds.body`, adding methods to `ds.body` will require breaking the document API.
122 |
123 | ### The default template
124 |
125 | Our standard template should be a collection of pre-defined blocks which are also available to user-provided templates. An example default template would look something like this:
126 |
127 | ```html
128 | <!DOCTYPE html>
129 | <html>
130 | <head>
131 |   {{ block "stylesheet" . }}{{ end }}
132 | </head>
133 | <body>
134 |   {{ block "header" ds . }}{{ end }}
135 |   {{ block "summary" ds . }}{{ end }}
136 |   {{ block "stats" ds . }}{{ end }}
137 |   {{ block "citations" ds . }}{{ end }}
138 | </body>
139 | </html>
140 | ```
141 |
142 | Users can then swap in these predefined blocks to build pseudo-custom templates, easing the transition to fully-custom visualizations through progressive customization. The pre-defined blocks are as follows:
143 |
144 | | name | purpose |
145 | | ---------- | ------- |
146 | | stylesheet | default css styles |
147 | | header | dataset title, date created & author in a `<header>` tag |
148 | | summary | overview of dataset details |
149 | | stats | template of stats component that prints nothing if the stats component is undefined |
150 | | citations | `