├── .circleci └── config.yml ├── .gitignore ├── 0000-template.md ├── LICENSE ├── README.md └── text ├── 0001-rfc_process.md ├── 0002-qri_dataset_definition.md ├── 0003-content_addressed_file_system.md ├── 0004-structured_data_io.md ├── 0006-dataset_naming.md ├── 0011-html_viz.md ├── 0012-CLI-commands.md ├── 0013-api.md ├── 0014-export.md ├── 0015-add-geo-processing-abilities-to-skylark.md ├── 0016-revise_transform_processing.md ├── 0017-define_dataset_creation.md ├── 0018-publish-update.md ├── 0019-manifests.md ├── 0020-distingush_manual_vs_scripted_transforms.md ├── 0021-export_behavior.md ├── 0022-remotes.md ├── 0023-starlark_load_dataset.md ├── 0024-scheduled-updates.md ├── 0025-filesystem-integration.md ├── 0026-starlark_expose.md ├── 0027-assets ├── dataset_01.png ├── dataset_02_patch_application.png ├── dataset_03_xform_diff_shape.png ├── dataset_04_xform_same_shape.png ├── dataset_05_conflict.png └── dataset_06_working_dir.png ├── 0027-transform_application.md ├── 0028-externalize_private_keys.md ├── 0029-config_revision.md ├── 0030-replace_publish_clone_with_push_pull.md ├── 0031-expanded_remove.md ├── 0032-access_command.md └── 0033-storage_command.md /.circleci/config.yml: -------------------------------------------------------------------------------- 1 | version: '2' 2 | jobs: 3 | build: 4 | working_directory: "~/rfcs" 5 | docker: 6 | - image: circleci/golang:1.11 7 | steps: 8 | - checkout 9 | - run: 10 | name: install misspell 11 | command: curl -L -o ./install-misspell.sh https://git.io/misspell && sh ./install-misspell.sh 12 | - run: 13 | name: Check Spelling 14 | command: ./bin/misspell -error ~/rfcs/text/ 15 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *~ 2 | book/ 3 | src/ 4 | .DS_Store 5 | -------------------------------------------------------------------------------- /0000-template.md: 
-------------------------------------------------------------------------------- 1 | - Feature Name: 2 | - Start Date: 3 | - RFC PR: 4 | - Issue: 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | 10 | 11 | # Motivation 12 | [motivation]: #motivation 13 | 14 | 15 | 16 | # Guide-level explanation 17 | [guide-level-explanation]: #guide-level-explanation 18 | 19 | 28 | 29 | # Reference-level explanation 30 | [reference-level-explanation]: #reference-level-explanation 31 | 32 | 39 | 40 | # Drawbacks 41 | [drawbacks]: #drawbacks 42 | 43 | Why should we *not* do this? 44 | 45 | # Rationale and alternatives 46 | [rationale-and-alternatives]: #rationale-and-alternatives 47 | 48 | 51 | 52 | # Prior art 53 | [prior-art]: #prior-art 54 | 55 | 68 | 69 | # Unresolved questions 70 | [unresolved-questions]: #unresolved-questions 71 | 72 | 75 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2018 QRI, Inc. 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in 13 | all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 21 | THE SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Qri RFCs 2 | 3 | [Qri RFCs]: #qri-rfcs 4 | 5 | Many changes can be implemented and reviewed via the normal GitHub pull request 6 | workflow. Some changes, though, are "substantial", and we ask that these be put 7 | through a bit of a design process and produce a consensus among the qri 8 | community and core team. 9 | 10 | The "RFC" (request for comments) process is intended to provide a consistent 11 | and controlled path for charting the roadmap of qri. We've seen a number of 12 | projects in the distributed space suffer from under-considered design choices 13 | and unclear roadmapping. We're hoping strong adherence to a lightweight RFC 14 | process can help mitigate these problems. You should be able to get a sense of 15 | where qri is going by reading through the accepted proposals. 16 | 17 | We openly acknowledge this may seem premature for such an early-stage project. 18 | We're intending to put this RFC process in place now to develop a design-driven 19 | culture so that others have a clear path to contribute to the future of 20 | the project. 21 | 22 | This process is _deeply_ inspired by the [rust language RFC process](https://github.com/rust-lang/rfcs), 23 | which builds on the [Python Enhancement Proposals process](https://www.python.org/dev/peps/); 24 | a big thank-you to these projects for leading the way.
25 | 26 | 27 | ## Table of Contents 28 | [Table of Contents]: #table-of-contents 29 | 30 | - [Opening](#qri-rfcs) 31 | - [Table of Contents] 32 | - [When you need to follow this process] 33 | - [Before creating an RFC] 34 | - [What the process is] 35 | - [The RFC life-cycle] 36 | - [Reviewing RFCs] 37 | - [Implementing an RFC] 38 | - [RFC Postponement] 39 | - [Help this is all too informal!] 40 | - [License] 41 | 42 | 43 | ## When you need to follow this process 44 | [When you need to follow this process]: #when-you-need-to-follow-this-process 45 | 46 | Most qri repositories follow [angular commit conventions](https://github.com/angular/angular.js/blob/master/DEVELOPERS.md#type) 47 | which designate 8 _types_ of change: 48 | 49 | - **feat:** A new feature 50 | - **fix:** A bug fix 51 | - **docs:** Documentation only changes 52 | - **style:** Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc) 53 | - **refactor:** A code change that neither fixes a bug nor adds a feature 54 | - **perf:** A code change that improves performance 55 | - **test:** Adding missing or correcting existing tests 56 | - **chore:** Changes to the build process or auxiliary tools and libraries such as documentation generation 57 | 58 | We only need RFCs for `feat` and breaking `refactor` changes to any of our novel 59 | projects, which are the [qri](https://github.com/qri-io/qri) and [dataset](https://github.com/qri-io/dataset) 60 | repositories. `feat` and `refactor` changes to the [frontend](https://github.com/qri-io/frontend) 61 | will also require heavy review, but don't require an RFC. 62 | 63 | Most of the RFCs you'll see will come from the core team, but 64 | that doesn't mean you can't chime in. Constructive one-off comments 65 | are more than welcome! 66 | 67 | 68 | ## Before creating an RFC 69 | [Before creating an RFC]: #before-creating-an-rfc 70 | 71 | A hastily-proposed RFC can hurt its chances of acceptance.
Low-quality 72 | proposals, proposals for previously-rejected features, or those that don't fit 73 | into the near-term roadmap, may be quickly rejected, which can be demotivating 74 | for the unprepared contributor. Laying some groundwork ahead of the RFC can 75 | make the process smoother. 76 | 77 | Although there is no single way to prepare for submitting an RFC, it is 78 | generally a good idea to pursue feedback from other project developers 79 | beforehand, to ascertain that the RFC may be desirable; having a consistent 80 | impact on the project requires concerted effort toward consensus-building. 81 | 82 | The best place to start is by filing an issue on this repository. The core team 83 | monitors this repo and will provide feedback on your issue, including whether 84 | it would make for a good RFC. 85 | 86 | 87 | ## What the process is 88 | [What the process is]: #what-the-process-is 89 | 90 | In short, to get a major feature added to Qri, one must first get the RFC 91 | merged into the RFC repository as a markdown file. At that point the RFC is 92 | "active" and may be implemented with the goal of eventual inclusion into Qri. 93 | 94 | - Fork the [RFC repository] 95 | - Copy `0000-template.md` to `text/0000-my-feature.md` (where "my-feature" is 96 | descriptive; don't assign an RFC number yet). 97 | - Fill in the RFC. Put care into the details: RFCs that do not present 98 | convincing motivation, demonstrate understanding of the impact of the 99 | design, or are disingenuous about the drawbacks or alternatives tend to be 100 | poorly-received. 101 | - Submit a pull request. As a pull request the RFC will receive design 102 | feedback from the larger community, and the author should be prepared to 103 | revise it in response. 104 | - Each pull request will be reviewed by the core team and assigned to a team 105 | member, who will take responsibility for guiding the RFC. 106 | - Build consensus and integrate feedback.
RFCs that have broad support are 107 | much more likely to make progress than those that don't receive any 108 | comments. Feel free to reach out to the RFC assignee in particular to get 109 | help identifying stakeholders and obstacles. 110 | - The core team will discuss the RFC pull request, as much as possible in the 111 | comment thread of the pull request itself. Offline discussion will be 112 | summarized on the pull request comment thread. 113 | - RFCs rarely go through this process unchanged, especially as alternatives 114 | and drawbacks are shown. You can make edits, big and small, to the RFC to 115 | clarify or change the design, but make changes as new commits to the pull 116 | request, and leave a comment on the pull request explaining your changes. 117 | Specifically, do not squash or rebase commits after they are visible on the 118 | pull request. 119 | - If the proposal is submitted by a core team member, it can be merged 120 | by at least 2 other core team members approving the RFC, otherwise a member 121 | of the core team will propose a "motion for final comment period" (FCP), 122 | along with a *disposition* for the RFC (merge, close, or postpone). 123 | - This step is taken when enough of the tradeoffs have been discussed that 124 | the core team is in a position to make a decision. That does not require 125 | consensus amongst all participants in the RFC thread (which is usually 126 | impossible). However, the argument supporting the disposition on the RFC 127 | needs to have already been clearly articulated, and there should not be a 128 | strong consensus *against* that position outside of the core team. Team 129 | members use their best judgment in taking this step, and the FCP itself 130 | ensures there is ample time and notification for stakeholders to push back 131 | if it is made prematurely. 
132 | - For RFCs with lengthy discussion, the motion to FCP is usually preceded by 133 | a *summary comment* trying to lay out the current state of the discussion 134 | and major tradeoffs/points of disagreement. 135 | - Before actually entering FCP, *all* members of the core team must sign off; 136 | this is often the point at which many core team members first review the RFC 137 | in full depth. 138 | - The FCP lasts ten calendar days, so that it is open for at least 5 business 139 | days. It is also advertised widely, 140 | e.g. in [This Week in Qri](https://this-week-in-rust.org/). This way all 141 | stakeholders have a chance to lodge any final objections before a decision 142 | is reached. 143 | - In most cases, the FCP period is quiet, and the RFC is either merged or 144 | closed. However, sometimes substantial new arguments or ideas are raised, 145 | the FCP is canceled, and the RFC goes back into development mode. 146 | 147 | ## The RFC life-cycle 148 | [The RFC life-cycle]: #the-rfc-life-cycle 149 | 150 | Once an RFC becomes "active" then authors may implement it and submit the 151 | feature as a pull request to the Qri repo. Being "active" is not a rubber 152 | stamp, and in particular still does not mean the feature will ultimately be 153 | merged; it does mean that in principle all the major stakeholders have agreed 154 | to the feature and are amenable to merging it. 155 | 156 | Furthermore, the fact that a given RFC has been accepted and is "active" 157 | implies nothing about what priority is assigned to its implementation, nor does 158 | it imply anything about whether a Qri developer has been assigned the task of 159 | implementing the feature. While it is not *necessary* that the author of the 160 | RFC also write the implementation, it is by far the most effective way to see 161 | an RFC through to completion: authors should not expect that other project 162 | developers will take on responsibility for implementing their accepted feature. 
163 | 164 | Modifications to "active" RFCs can be done in follow-up pull requests. We 165 | strive to write each RFC in a manner that reflects the final design of 166 | the feature; but the nature of the process means that we cannot expect every 167 | merged RFC to actually reflect what the end result will be at the time of the 168 | next major release. 169 | 170 | In general, once accepted, RFCs should not be substantially changed. Only very 171 | minor changes should be submitted as amendments. More substantial changes 172 | should be new RFCs, with a note added to the original RFC. Exactly what counts 173 | as a "very minor change" is up to the core team to decide. 174 | 175 | 176 | 177 | ## Reviewing RFCs 178 | [Reviewing RFCs]: #reviewing-rfcs 179 | 180 | While the RFC pull request is up, the core team may schedule meetings with the 181 | author and/or relevant stakeholders to discuss the issues in greater detail, 182 | and in some cases the topic may be discussed at a core team meeting. In either 183 | case a summary from the meeting will be posted back to the RFC pull request. 184 | 185 | The core team makes final decisions about RFCs after the benefits and drawbacks 186 | are well understood. These decisions can be made at any time, but the core team 187 | will regularly issue decisions. When a decision is made, the RFC pull request 188 | will either be merged or closed. In either case, if the reasoning is not clear 189 | from the discussion in thread, the core team will add a comment describing the 190 | rationale for the decision. 191 | 192 | 193 | ## Implementing an RFC 194 | [Implementing an RFC]: #implementing-an-rfc 195 | 196 | Some accepted RFCs represent vital features that need to be implemented right 197 | away. Other accepted RFCs can represent features that can wait until some 198 | arbitrary developer feels like doing the work.
Every accepted RFC has an 199 | associated issue tracking its implementation in the Qri repository; thus that 200 | associated issue can be assigned a priority via the triage process that the 201 | team uses for all issues in the Qri repository. 202 | 203 | The author of an RFC is not obligated to implement it. Of course, the RFC 204 | author (like any other developer) is welcome to post an implementation for 205 | review after the RFC has been accepted. 206 | 207 | If you are interested in working on the implementation for an "active" RFC, but 208 | cannot determine if someone else is already working on it, feel free to ask 209 | (e.g. by leaving a comment on the associated issue). 210 | 211 | 212 | ## RFC Postponement 213 | [RFC Postponement]: #rfc-postponement 214 | 215 | Some RFC pull requests are tagged with the "postponed" label when they are 216 | closed (as part of the rejection process). An RFC closed with "postponed" is 217 | marked as such because we want neither to think about evaluating the proposal 218 | nor about implementing the described feature until some time in the future, and 219 | we believe that we can afford to wait until then to do so. Historically, 220 | "postponed" was used to postpone features until after 1.0. Postponed pull 221 | requests may be re-opened when the time is right. We don't have any formal 222 | process for that; ask members of the core team. 223 | 224 | Usually an RFC pull request marked as "postponed" has already passed an 225 | informal first round of evaluation, namely the round of "do we think we would 226 | ever possibly consider making this change, as outlined in the RFC pull request, 227 | or some semi-obvious variation of it." (When the answer to the latter question 228 | is "no", then the appropriate response is to close the RFC, not postpone it.) 229 | 230 | 231 | ### Help this is all too informal!
232 | [Help this is all too informal!]: #help-this-is-all-too-informal 233 | 234 | The process is intended to be as lightweight as reasonable for the present 235 | circumstances. As usual, we are trying to let the process be driven by 236 | consensus and community norms, not impose more structure than necessary. 237 | 238 | [RFC repository]: http://github.com/qri-io/rfcs 239 | 240 | ## License 241 | [License]: #license 242 | 243 | This repository is licensed under the MIT license ([LICENSE](LICENSE) or http://opensource.org/licenses/MIT) 244 | 245 | ### Contributions 246 | 247 | Unless you explicitly state otherwise, any contribution intentionally submitted 248 | for inclusion in the work by you shall be MIT licensed, 249 | without any additional terms or conditions. 250 | -------------------------------------------------------------------------------- /text/0001-rfc_process.md: -------------------------------------------------------------------------------- 1 | - Feature Name: rfc_process 2 | - Start Date: 2018-08-13 3 | - RFC PR: #1 4 | - Issue: N/A 5 | 6 | # Summary 7 | [summary]: #summary 8 | 9 | The "RFC" (request for comments) process is intended to provide a consistent 10 | and controlled path for charting the roadmap of qri. We've seen a number of 11 | projects in the distributed space suffer from under-considered design choices 12 | and unclear roadmapping. We're hoping strong adherence to a lightweight RFC 13 | process can help mitigate these problems. Anyone should be able to get a sense of 14 | _where qri is going_ by reading through the accepted proposals. 15 | 16 | # Motivation 17 | [motivation]: #motivation 18 | 19 | I openly acknowledge this may seem premature for such an early-stage project.
20 | I'm intending to put this RFC process in place now to develop a design-driven 21 | culture so that others have a clear path to contribute to the future of the 22 | project, and replace myself as the arbiter of "what qri is" with a codified 23 | principle of [rough consensus and working code](https://www.ietf.org/about/participate/tao/). 24 | 25 | One of the main concerns I have is that ambiguity exists around what is _expected 26 | behaviour_ when interacting with qri. To me, this is a _major_ problem. 27 | Not knowing precisely what working-as-expected looks like cuts down on what core 28 | team members can confidently reason about, which means the core can't 29 | confidently tell _others_ how qri should work, which means the whole project is 30 | both ambiguous and without a clear path for resolving that ambiguity. 31 | 32 | To me this ambiguity is a sign that we (the core team) haven't taken enough 33 | time to reach design-level consensus about how qri _should_ behave, and it's why I 34 | think it's not too early to start this process. 35 | 36 | There are a few key motivations for implementing this now: 37 | - develop a culture of rough consensus 38 | - inject more design-time thinking into the core team's process 39 | - maintain an accurate, reliable roadmap 40 | - create an on-ramp for others to contribute to the design of qri 41 | 42 | # Guide-level explanation 43 | [guide-level-explanation]: #guide-level-explanation 44 | 45 | For the core team, adopting this RFC process will feel like three things at once: 46 | - a source of truth for _how qri should behave_ 47 | - a new channel to directly impact the design of qri 48 | - lots of additional work to get new stuff into qri 49 | 50 | The first batch of RFCs should ideally come from me (@b5), outlining in detail 51 | how the various aspects of qri _should_ work (which, as we know, is different 52 | from how things do/not work).
The process of debating these initial RFCs should 53 | serve as a chance to work together to establish language, clarity & detail 54 | around expected behaviours. Accepting these initial RFCs provides a new, 55 | consensus-driven foundation for the project that should give core team members 56 | a confident, consistent source of truth to work from. 57 | By requiring sign-off from the core team, this should be a chance for the 58 | core team to achieve consensus on how things should work. 59 | 60 | Upon accepting this proposal, we'll also move all of the issues filed in 61 | [qri-io/notes](https://github.com/qri-io/notes) & deprecate the repo. If we 62 | accept this RFC process, we should commit to it fully. Notes should now 63 | start as issues on this repo with the intention that they could develop into 64 | formal RFCs. 65 | 66 | At the same time, this should also be a chance for others to start documenting 67 | ideas we've been kicking around for enhancing qri. Things we've discussed in the 68 | past like skylark modules, simulation testing, new file format support, 69 | etc. should be collected as issues on this repo so we can start working them 70 | into RFCs. 71 | 72 | Finally, this process should not get in the way. If done properly, day-to-day 73 | development should _accelerate_ once we've accepted enough RFCs to get 74 | proposals ahead of current development. 75 | It should be easier to implement a feature with confidence because implementing 76 | new things should just be coding up an already-approved design document. 77 | I want to save RFCs for design-level features, not day-to-day fixes or 78 | implementation details. It should still be perfectly acceptable to make changes 79 | on interfaces between packages, move packages around, and riff on ideas outside 80 | of the formal RFC process. RFCs should be saved for decisions that affect how we 81 | expect the user-facing edges of qri to behave.
82 | 83 | # Reference-level explanation 84 | [reference-level-explanation]: #reference-level-explanation 85 | 86 | _Most of this is copied from the Rust RFC process, except I've added a section 87 | intended to keep us moving quickly: if a core team member creates an RFC, and 88 | two other core team members approve it, it's automatically merged. We should 89 | remove this provision once qri reaches a 1.0 release._ 90 | 91 | In short, to get a major feature added to Qri, one must first get the RFC 92 | merged into the RFC repository as a markdown file. At that point the RFC is 93 | "active" and may be implemented with the goal of eventual inclusion into Qri. 94 | 95 | - Fork the [RFC repository] 96 | - Copy `0000-template.md` to `text/0000-my-feature.md` (where "my-feature" is 97 | descriptive; don't assign an RFC number yet). 98 | - Fill in the RFC. Put care into the details: RFCs that do not present 99 | convincing motivation, demonstrate understanding of the impact of the 100 | design, or are disingenuous about the drawbacks or alternatives tend to be 101 | poorly-received. 102 | - Submit a pull request. As a pull request the RFC will receive design 103 | feedback from the larger community, and the author should be prepared to 104 | revise it in response. 105 | - Each pull request will be reviewed by the core team and assigned to a team 106 | member, who will take responsibility for guiding the RFC. 107 | - Build consensus and integrate feedback. RFCs that have broad support are 108 | much more likely to make progress than those that don't receive any 109 | comments. Feel free to reach out to the RFC assignee in particular to get 110 | help identifying stakeholders and obstacles. 111 | - The core team will discuss the RFC pull request, as much as possible in the 112 | comment thread of the pull request itself. Offline discussion will be 113 | summarized on the pull request comment thread.
114 | - RFCs rarely go through this process unchanged, especially as alternatives 115 | and drawbacks are shown. You can make edits, big and small, to the RFC to 116 | clarify or change the design, but make changes as new commits to the pull 117 | request, and leave a comment on the pull request explaining your changes. 118 | Specifically, do not squash or rebase commits after they are visible on the 119 | pull request. 120 | - If the proposal is submitted by a core team member, it can be merged 121 | by at least 2 other core team members approving the RFC, otherwise a member 122 | of the core team will propose a "motion for final comment period" (FCP), 123 | along with a *disposition* for the RFC (merge, close, or postpone). 124 | - This step is taken when enough of the tradeoffs have been discussed that 125 | the core team is in a position to make a decision. That does not require 126 | consensus amongst all participants in the RFC thread (which is usually 127 | impossible). However, the argument supporting the disposition on the RFC 128 | needs to have already been clearly articulated, and there should not be a 129 | strong consensus *against* that position outside of the core team. Team 130 | members use their best judgment in taking this step, and the FCP itself 131 | ensures there is ample time and notification for stakeholders to push back 132 | if it is made prematurely. 133 | - For RFCs with lengthy discussion, the motion to FCP is usually preceded by 134 | a *summary comment* trying to lay out the current state of the discussion 135 | and major tradeoffs/points of disagreement. 136 | - Before actually entering FCP, *all* members of the core team must sign off; 137 | this is often the point at which many core team members first review the RFC 138 | in full depth. 139 | - The FCP lasts ten calendar days, so that it is open for at least 5 business 140 | days. 
This way all stakeholders have a chance to lodge any final objections 141 | before a decision is reached. 142 | - In most cases, the FCP period is quiet, and the RFC is either merged or 143 | closed. However, sometimes substantial new arguments or ideas are raised, 144 | the FCP is cancelled, and the RFC goes back into development mode. 145 | 146 | ### The RFC life-cycle 147 | [The RFC life-cycle]: #the-rfc-life-cycle 148 | 149 | Once an RFC becomes "active" then authors may implement it and submit the 150 | feature as a pull request to the Qri repo. Being "active" is not a rubber 151 | stamp, and in particular still does not mean the feature will ultimately be 152 | merged; it does mean that in principle all the major stakeholders have agreed 153 | to the feature and are amenable to merging it. 154 | 155 | Furthermore, the fact that a given RFC has been accepted and is "active" 156 | implies nothing about what priority is assigned to its implementation, nor does 157 | it imply anything about whether a Qri developer has been assigned the task of 158 | implementing the feature. While it is not *necessary* that the author of the 159 | RFC also write the implementation, it is by far the most effective way to see 160 | an RFC through to completion: authors should not expect that other project 161 | developers will take on responsibility for implementing their accepted feature. 162 | 163 | Modifications to "active" RFCs can be done in follow-up pull requests. We 164 | strive to write each RFC in a manner that reflects the final design of 165 | the feature; but the nature of the process means that we cannot expect every 166 | merged RFC to actually reflect what the end result will be at the time of the 167 | next major release. 168 | 169 | In general, once accepted, RFCs should not be substantially changed. Only very 170 | minor changes should be submitted as amendments. More substantial changes 171 | should be new RFCs, with a note added to the original RFC.
Exactly what counts 172 | as a "very minor change" is up to the core team to decide. 173 | 174 | 175 | ### Reviewing RFCs 176 | [Reviewing RFCs]: #reviewing-rfcs 177 | 178 | While the RFC pull request is up, the core team may schedule meetings with the 179 | author and/or relevant stakeholders to discuss the issues in greater detail, 180 | and in some cases the topic may be discussed at a core team meeting. In either 181 | case a summary from the meeting will be posted back to the RFC pull request. 182 | 183 | The core team makes final decisions about RFCs after the benefits and drawbacks 184 | are well understood. These decisions can be made at any time, but the core team 185 | will regularly issue decisions. When a decision is made, the RFC pull request 186 | will either be merged or closed. In either case, if the reasoning is not clear 187 | from the discussion in thread, the core team will add a comment describing the 188 | rationale for the decision. 189 | 190 | 191 | ### Implementing an RFC 192 | [Implementing an RFC]: #implementing-an-rfc 193 | 194 | Some accepted RFCs represent vital features that need to be implemented right 195 | away. Other accepted RFCs can represent features that can wait until some 196 | arbitrary developer feels like doing the work. Every accepted RFC has an 197 | associated issue tracking its implementation in the Qri repository; thus that 198 | associated issue can be assigned a priority via the triage process that the 199 | team uses for all issues in the Qri repository. 200 | 201 | The author of an RFC is not obligated to implement it. Of course, the RFC 202 | author (like any other developer) is welcome to post an implementation for 203 | review after the RFC has been accepted. 204 | 205 | If you are interested in working on the implementation for an "active" RFC, but 206 | cannot determine if someone else is already working on it, feel free to ask 207 | (e.g.
by leaving a comment on the associated issue). 208 | 209 | 210 | # Drawbacks 211 | [drawbacks]: #drawbacks 212 | 213 | This is going to slow us down and consume precious time. 214 | 215 | # Rationale and alternatives 216 | [rationale-and-alternatives]: #rationale-and-alternatives 217 | 218 | On a philosophical level, I'm tired of shitty software that doesn't work. I want 219 | to raise our standards, aspiring toward principles of 220 | ["Zero Defects Programming"](https://en.wikipedia.org/wiki/Zero_Defects): 221 | 222 | > Instill in workers the will to prevent problems during design and manufacture 223 | rather than go back and fix them later 224 | 225 | The only way to prevent problems during the design phase is to **require a 226 | design phase for things that matter**. In the context of Open Source Software, 227 | the RFC process is the best I've seen for having a strong design that yields 228 | software others can depend on. 229 | 230 | # Prior art 231 | [prior-art]: #prior-art 232 | 233 | - [Rust RFC Process](https://github.com/rust-lang/rfcs) 234 | - [IETF Standards Process](https://www.ietf.org/standards/process/) 235 | - [Python Enhancement Proposals](https://www.python.org/dev/peps/) 236 | 237 | # Unresolved questions 238 | [unresolved-questions]: #unresolved-questions 239 | 240 | - What structures need to be in place to align RFCs with the mission of the 241 | project? 242 | - Do we need to somehow distill approved RFCs into a single roadmap document? 
243 | -------------------------------------------------------------------------------- /text/0003-content_addressed_file_system.md: -------------------------------------------------------------------------------- 1 | - Feature Name: content_addressed_file_system 2 | - Start Date: 2017-08-03 3 | - RFC PR: [#3](https://github.com/qri-io/rfcs/pull/3) 4 | - Repo: https://github.com/qri-io/cafs 5 | 6 | _Note: This RFC was created as part of an initial sprint to adopt the RFC 7 | process itself, as such sections of this document are less complete than 8 | we'd hope, or less complete than we'd expect from a new RFC. 9 | -:heart: the qri core team_ 10 | 11 | # Summary 12 | [summary]: #summary 13 | 14 | Content-Addressed File System (CAFS) is a generalized interface for working with 15 | filestores that names content based on the content itself, usually 16 | through some sort of hashing function. 17 | Examples of content-addressed file systems include git, bittorrent, IPFS, 18 | the DAT project, etc. 19 | 20 | # Motivation 21 | [motivation]: #motivation 22 | 23 | The long-term goal of CAFS is to define an interface for common filestore 24 | operations between different content-addressed filestores that serves the 25 | subset of features qri needs to function. 26 | 27 | This package doesn't aim to implement everything a given filestore can do, 28 | but instead focus on basic file & directory i/o. CAFS is in its very early days, 29 | starting with a proof of concept based on IPFS and an in-memory implementation. 30 | Over time we'll work to add additional stores, which will undoubtedly affect 31 | the overall interface definition. 32 | 33 | A tacit goal of this interface is to manage the seam between graph-based 34 | storage systems, and a file interface. 35 | 36 | # Guide-level explanation 37 | [guide-level-explanation]: #guide-level-explanation 38 | 39 | There are two key interfaces to CAFS. 
The rest are built upon these two: 40 | 41 | ### File 42 | File is an interface based largely on the `os.File` interface from golang `os` 43 | package, with the exception that files can be _either a file or a directory_. 44 | This file interface will have many dependants in the qri ecosystem. 45 | 46 | ### Filestore 47 | Filestore is the interface for storing files & directories. The "content 48 | addressed" part means that `Put` operations are in charge of returning the name 49 | of the file. 50 | 51 | 52 | # Reference-level explanation 53 | [reference-level-explanation]: #reference-level-explanation 54 | 55 | There are two primary interfaces that constitute a CAFS, *File* and *Filestore*: 56 | 57 | 58 | File is an interface that provides functionality for handling files/directories 59 | as values that can be supplied to commands. For directories, child files are 60 | accessed serially by calling `NextFile()`. 61 | ```golang 62 | type File interface { 63 | // Files implement ReadCloser, but can only be read from or closed if 64 | // they are not directories 65 | io.ReadCloser 66 | 67 | // FileName returns a filename associated with this file 68 | FileName() string 69 | 70 | // FullPath returns the full path used when adding this file 71 | FullPath() string 72 | 73 | // IsDirectory returns true if the File is a directory (and therefore 74 | // supports calling `NextFile`) and false if the File is a normal file 75 | // (and therefore supports calling `Read` and `Close`) 76 | IsDirectory() bool 77 | 78 | // NextFile returns the next child file available (if the File is a 79 | // directory). It will return (nil, io.EOF) if no more files are 80 | // available. If the file is a regular file (not a directory), NextFile 81 | // will return a non-nil error. 82 | NextFile() (File, error) 83 | } 84 | ``` 85 | 86 | Filestore is an interface for working with a content-addressed file system. 87 | This interface is under active development, expect it to change lots. 
88 | It's currently form-fitting around IPFS (ipfs.io), with far-off plans to 89 | generalize toward compatibility with git (git-scm.com), then maybe other stuff, 90 | who knows. 91 | ```golang 92 | type Filestore interface { 93 | // Put places a file or a directory in the store. 94 | // The most notable difference from a standard file store is the store itself determines 95 | // the resulting key (google "content addressing" for more info ;) 96 | // keys returned by put must be prefixed with the PathPrefix, 97 | // eg. /ipfs/QmZ3KfGaSrb3cnTriJbddCzG7hwQi2j6km7Xe7hVpnsW5S 98 | // "pin" is a flag for recursively pinning this object 99 | Put(file File, pin bool) (key string, err error) 100 | 101 | // Get retrieves the object `value` named by `key`. 102 | // Get will return ErrNotFound if the key is not mapped to a value. 103 | Get(key string) (file File, err error) 104 | 105 | // Has returns whether the `key` is mapped to a `value`. 106 | // In some contexts, it may be much cheaper only to check for existence of 107 | // a value, rather than retrieving the value itself. (e.g. HTTP HEAD). 108 | // The default implementation is found in `GetBackedHas`. 109 | Has(key string) (exists bool, err error) 110 | 111 | // Delete removes the value for given `key`. 112 | Delete(key string) error 113 | 114 | // NewAdder allocates an Adder instance for adding files to the filestore 115 | // Adder gives a higher degree of control over the file adding process at the 116 | // cost of being harder to work with. 
117 | // "pin" is a flag for recursively pinning this object 118 | // "wrap" sets whether the top level should be wrapped in a directory 119 | // expect this to change to something like: 120 | // NewAdder(opt map[string]interface{}) (Adder, error) 121 | NewAdder(pin, wrap bool) (Adder, error) 122 | 123 | // PathPrefix is a top-level identifier to distinguish between filestores, 124 | // for example: the "ipfs" in /ipfs/QmZ3KfGaSrb3cnTriJbddCzG7hwQi2j6km7Xe7hVpnsW5S 125 | // a Filestore implementation should always return the same prefix 126 | PathPrefix() string 127 | } 128 | ``` 129 | 130 | ### Pinning 131 | `Filestore.Put()` accepts a `pin` flag. When `pin = true` the filestore will retain 132 | a permanent reference to the file. When `pin = false`, the filestore may remove 133 | the file at any point. Adding files without pinning is a great way to get the hash 134 | of a file without incurring the overhead of content retention. 135 | 136 | # Drawbacks 137 | [drawbacks]: #drawbacks 138 | 139 | Getting this interface right will be difficult & full of odd edge-cases that'll 140 | need to be handled carefully. 141 | 142 | # Rationale and alternatives 143 | [rationale-and-alternatives]: #rationale-and-alternatives 144 | 145 | We could skip the notion of _files_ entirely at this level, and instead choose 146 | to focus on _graph_ structures. I think we may end up maturing in this direction 147 | over time, but if so let's do that with proper design consideration. 148 | 149 | # Prior art 150 | [prior-art]: #prior-art 151 | 152 | There isn't much here in the way of prior art. 153 | 154 | # Unresolved questions 155 | [unresolved-questions]: #unresolved-questions 156 | 157 | - How do we properly handle the distinction between network & local operations?
158 | -------------------------------------------------------------------------------- /text/0004-structured_data_io.md: -------------------------------------------------------------------------------- 1 | - Feature Name: structured_io 2 | - Start Date: 2017-10-18 3 | - RFC PR: [#4](https://github.com/qri-io/rfcs/pull/4) 4 | - Repo: [dataset](https://github.com/qri-io/dataset) 5 | 6 | _Note: This RFC was created as part of an initial sprint to adopt the RFC 7 | process itself, as such sections of this document are less complete than 8 | we'd hope, or less complete than we'd expect from a new RFC. 9 | -:heart: the qri core team_ 10 | 11 | # Summary 12 | [summary]: #summary 13 | 14 | Structured I/O defines interfaces for reading & writing streams of parsed 15 | _entries_, which are elements of arbitrary-yet-structured data such as JSON, 16 | CBOR, or CSV documents. Structured reader & writer interfaces combine a byte 17 | stream, data format and schema to create entry readers & writers that produce & 18 | consume entries of parsed primitive types instead of bytes. Structured I/O 19 | streams can be composed & connected to form the basis of rich data communication 20 | capable of spanning across formats. 21 | 22 | # Motivation 23 | [motivation]: #motivation 24 | 25 | One of the prime goals of qri is to be able to make any dataset comparable to 26 | another dataset. Datasets are also intended to be an arbitrary-yet-structured 27 | document format, able to support all sorts of data with varying degrees of 28 | quality. These requirements mean datasets must be able to define their own 29 | schemas, and may include data that violates that schema. 30 | 31 | Our challenge is to declare a clear set of abstractions built on _key 32 | assumptions_ enforced by the dataset model, and leverage those assumptions 33 | to combine arbitrary data at runtime.
34 | 35 | Structured I/O is the primary means of parsing data, abstracting away the 36 | underlying data format while also delivering a set of expectations about 37 | parsed data based on those key assumptions. Those expectations are parsing 38 | to a predetermined set of types (`int`, `bool`, `string`, etc.), and delivering 39 | a _schema_ that includes a definition of valid data structuring. 40 | 41 | As concrete examples, all of the following require tools for data parsing: 42 | - Creating a dataset from a JSON file 43 | - Converting dataset data from one data format to another 44 | - Printing the first 10 rows of a dataset 45 | - Counting the number of entries in a dataset 46 | 47 | All of these tasks are basic things that we'd like to be able to do with one 48 | or more datasets. 49 | 50 | Structured I/O is intended to be a robust set of primitives that 51 | underpin these tasks. Structured _streams_ (readers & writers) wrap a 52 | raw stream of bytes with a parser that transforms raw bytes into _entries_ made 53 | of a standard set of language-native types (`int`, `bool`, `string`, etc.). 54 | Working with entries instead of bytes allows the programmer to avoid thinking 55 | about the underlying format & focus on the semantics of data instead of 56 | idiosyncrasies between encoding formats. 57 | 58 | Orienting our primitives around _streams_ helps manage concerns created by both 59 | network latency and data volume. By orienting qri around stream programming 60 | we set ourselves up for success when programming in a distributed context. 61 | 62 | Structured I/O builds on foundations set forth in the _structure_ portion of 63 | the dataset definition. For any valid dataset it must be possible to create 64 | a Structured Reader of the dataset body, and a Writer that can be used to 65 | compose an update. 66 | 67 | Structured I/O is intended to underpin lots of other functionality.
Doing new 68 | things with data should be a process of composing and enhancing Structured I/O 69 | streams. Want only a subsection of a dataset's body? Use a `LimitOffsetReader`. 70 | Want to convert JSON to CBOR? Pipe a JSON-formatted `EntryReader` to a 71 | CBOR-formatted writer. 72 | 73 | # Guide-level explanation 74 | [guide-level-explanation]: #guide-level-explanation 75 | 76 | Creating a structured I/O stream requires a minimum of three things: 77 | - a stream of raw data bytes 78 | - the _data format_ of that stream (eg: JSON) 79 | - a data schema 80 | 81 | The Format & Schema are specified in the passed-in structure; the byte stream 82 | is an io.Reader or io.Writer primitive. Here's a quick example of creating a 83 | reader from scratch & reading its values: 84 | ```golang 85 | import ( 86 | "fmt" 87 | "strings" 88 | 89 | "github.com/qri-io/dataset" 90 | "github.com/qri-io/dataset/dsio" 91 | "github.com/qri-io/jsonschema" 92 | ) 93 | 94 | // the data we want to stream, an array with two entries 95 | const JSONData = `["foo",{"name":"bar"}]` 96 | 97 | st := &dataset.Structure{ 98 | Format: dataset.JSONDataFormat, 99 | Schema: jsonschema.Must(`{"type":"array"}`), 100 | } 101 | 102 | // create a Structured I/O reader: 103 | str, err := dsio.NewEntryReader(st, strings.NewReader(JSONData)) 104 | if err != nil { 105 | panic(err) 106 | } 107 | 108 | ent, err := str.ReadEntry() 109 | if err != nil { 110 | panic(err) 111 | } 112 | fmt.Println(ent.Value) // "foo" 113 | 114 | ent, err = str.ReadEntry() 115 | if err != nil { 116 | panic(err) 117 | } 118 | fmt.Println(ent.Value) // {"name":"bar"} 119 | 120 | _, err = str.ReadEntry() 121 | fmt.Println(err.Error()) // EOF 122 | ``` 123 | 124 | ### Stream & Top Level Data 125 | A _stream_ refers to both a _reader_ and a _writer_ collectively. 126 | 127 | _Top Level_ refers to the specific type of the outermost element in a structured
piece of data. This data's top level is an _array_: 128 | ```json 129 | [ 130 | {"a": 1}, 131 | {"b": 2}, 132 | {"c": 3} 133 | ] 134 | ``` 135 | 136 | This data's top level is a _string_: 137 | ```json 138 | "foo" 139 | ``` 140 | 141 | Qri requires a top-level type of either array or object. 142 | 143 | ### Entries 144 | Traditional "unstructured" streams often use byte arrays as the basic unit that 145 | is both read and written. Structured I/O works with _entries_ instead. 146 | An _entry_ is the fundamental unit of reading & writing for a Structured stream. 147 | Entries are themselves a small abstraction that carries the `Value`, which is 148 | parsed data of an arbitrary structure, along with an `Index` and a `Key`. Only one of `Index` 149 | and `Key` will be populated for a given entry, depending on whether a top level 150 | array or object is being read. 151 | 152 | ### Value Types 153 | Qri is built around a basic set of value types, a crucial assumption that 154 | Structured I/O builds on. These types 155 | are inherited from JSON, with the addition of byte arrays. 156 | 157 | All entries will conform to one of the following types: 158 | ```golang 159 | // Scalar Types 160 | nil 161 | bool 162 | int 163 | float64 164 | string 165 | []byte 166 | 167 | // Complex Types 168 | []interface{} // array 169 | map[string]interface{} // object 170 | ``` 171 | 172 | When examining an `Entry.Value`, its type is `interface{}`; performing a 173 | [type switch](https://tour.golang.org/methods/16) that handles all of the above 174 | types will cover all possible cases for a valid entry. Using such a type switch 175 | recursively on complex types provides a robust, exhaustive method for inspecting 176 | any given entry. 177 | 178 | It's important to note that these guarantees are only enforced for basic 179 | Structured I/O streams. Abstractions on top of Structured I/O may introduce 180 | additional types during processing.
A classic example is timestamp parsing. 181 | Implementers of streams that break this type assumption are encouraged to define 182 | a more specific interface than structured I/O to indicate to consumers this 183 | assumption has been broken. 184 | 185 | ### Corrupt Vs. Invalid Data 186 | Structured I/O must distinguish between data that is _corrupt_ and data that is 187 | _invalid_. Corrupt data is data that doesn't conform to the specified format. 188 | As an example, this is corrupt JSON data (extra comma): 189 | ```json 190 | ["foooo",] 191 | ``` 192 | Structured I/O will error if it encounters any kind of corrupt data. 193 | 194 | _Invalid_ data is data that doesn't conform to the specified _schema_ of 195 | structured I/O. Structured I/O streams _are_ expected to gracefully handle 196 | invalid data by falling back to _identity schemas_, discussed below. 197 | 198 | ### Identity Schemas & Fallbacks 199 | Because schemas _must_ be defined on complex types, and the only complex types 200 | we support are objects and arrays, there are two "identity" schemas that 201 | represent the minimum possible schema definitions, requiring the top level of 202 | a data stream to be either an array or an object: 203 | 204 | **Array Identity Schema** 205 | ```json 206 | { "type" : "array" } 207 | ``` 208 | 209 | **Object Identity Schema** 210 | ```json 211 | { "type" : "object" } 212 | ``` 213 | 214 | Data whose top level does not conform to one of these two schemas is 215 | considered corrupt. 216 | 217 | These "Identity Schemas" form a _fallback_ the stream will revert to if the data 218 | it's presented with is invalid. For example, if a schema specifies a top level 219 | of "object", and the stream encounters an array, it will silently revert to the 220 | array identity schema & keep reading. 221 | 222 | The rationale for such a choice is emphasizing _parsing_ over strict adherence 223 | to schema definitions.
One of the primary use cases of a dataset version control 224 | system is to begin with data that is invalid according to a given schema, and 225 | correct toward it. 226 | 227 | For this reason, consumers of structured I/O streams are encouraged to 228 | prioritize parsing based on type switches as mentioned above, unless the codepath 229 | they are operating in presumes _strict mode_. 230 | 231 | ### Strict Mode 232 | Fallbacks are intended to keep data reading at all costs. However, many use cases 233 | will want explicit failure when a stream is misconfigured. For this purpose 234 | streams provide a "strict mode" that errors when invalid data is encountered, 235 | instead of using silent identity-schema fallbacks. 236 | 237 | When a stream operating in strict mode encounters an entry that doesn't match, it 238 | will return `ErrInvalidEntry` for that entry. In this case the stream will 239 | remain safe for continued use, so that invalid entries do not prevent access 240 | to subsequent valid reads/writes.
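The recursive type switch recommended above can be sketched as a small inspector. This is an illustrative sketch only; `CountScalars` is a hypothetical helper, not part of the dsio package, but the case list matches the value types guaranteed by basic Structured I/O streams:

```golang
package main

import "fmt"

// CountScalars recursively walks a parsed entry value using the standard
// type switch over qri's basic value types, returning how many scalar
// values it contains. Handling every listed type makes the switch
// exhaustive for any valid entry.
func CountScalars(v interface{}) int {
	switch x := v.(type) {
	case nil, bool, int, float64, string, []byte:
		return 1
	case []interface{}: // array
		n := 0
		for _, el := range x {
			n += CountScalars(el)
		}
		return n
	case map[string]interface{}: // object
		n := 0
		for _, el := range x {
			n += CountScalars(el)
		}
		return n
	default:
		// a stream that breaks the basic type guarantee lands here
		panic(fmt.Sprintf("unexpected entry value type %T", x))
	}
}

func main() {
	entryValue := map[string]interface{}{
		"name": "bar",
		"tags": []interface{}{"a", "b", true},
		"size": float64(42),
	}
	fmt.Println(CountScalars(entryValue)) // 5
}
```

The `default` branch is where a consumer would detect that an upstream abstraction has introduced extra types (e.g. parsed timestamps) beyond the basic guarantee.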
241 | 242 | # Reference-level explanation 243 | [reference-level-explanation]: #reference-level-explanation 244 | 245 | Entry is a "row" of a dataset: 246 | ```golang 247 | type Entry struct { 248 | // Index represents this entry's numeric position in a dataset 249 | // this index may not necessarily refer to the overall position within 250 | // the dataset as things like offsets affect where the index begins 251 | Index int 252 | // Key is a string key for this entry 253 | // only present when the top level structure is a map 254 | Key string 255 | // Value is information contained within the row 256 | Value interface{} 257 | } 258 | ``` 259 | 260 | EntryWriter is a generalized interface for writing structured data: 261 | ```golang 262 | type EntryWriter interface { 263 | // Structure gives the structure being written 264 | Structure() *dataset.Structure 265 | // WriteEntry writes one "row" of structured data to the Writer 266 | WriteEntry(Entry) error 267 | // Close finalizes the writer, indicating all entries 268 | // have been written 269 | Close() error 270 | } 271 | ``` 272 | 273 | EntryReader is a generalized interface for reading Ordered Structured Data: 274 | ```golang 275 | type EntryReader interface { 276 | // Structure gives the structure being read 277 | Structure() *dataset.Structure 278 | // ReadEntry reads one row of structured data from the reader 279 | ReadEntry() (Entry, error) 280 | } 281 | ``` 282 | 283 | EntryReadWriter combines EntryWriter and EntryReader behaviors: 284 | ```golang 285 | type EntryReadWriter interface { 286 | // Structure gives the structure being read and written 287 | Structure() *dataset.Structure 288 | // ReadEntry reads one row of structured data from the reader 289 | ReadEntry() (Entry, error) 290 | // WriteEntry writes one row of structured data to the ReadWriter 291 | WriteEntry(Entry) error 292 | // Close finalizes the ReadWriter, indicating all entries 293 | // have been written 294 | Close() error 295 | // Bytes gives
the raw contents of the ReadWriter 296 | Bytes() []byte 297 | } 298 | ``` 299 | 300 | # Drawbacks 301 | [drawbacks]: #drawbacks 302 | 303 | The drawback is that this is a new interface that needs to be built, 304 | rationalized & maintained, which is to say this is work, and we should avoid 305 | doing work when we could instead be doing something else. 306 | 307 | # Rationale and alternatives 308 | [rationale-and-alternatives]: #rationale-and-alternatives 309 | 310 | Given that we've already written this, the time for considering alternatives 311 | should be in a future RFC. 312 | 313 | # Prior art 314 | [prior-art]: #prior-art 315 | 316 | [golang's io package](https://godoc.org/io) is _the_ source of inspiration here. 317 | 318 | 319 | ### OpenAPI 320 | Structured I/O can be 321 | seen as a strict extension of [OpenAPI](https://swagger.io/docs/specification/about/). In fact, we use the jsonschema spec that 322 | grew out of OpenAPI! 323 | 324 | From OpenAPI's docs: 325 | > The ability of APIs to describe their own structure is the root of all 326 | awesomeness in OpenAPI. Once written, an OpenAPI specification and Swagger tools 327 | can drive your API development further in various ways... 328 | 329 | Qri datasets are analogous to self-contained OpenAPI specifications & data in 330 | one combined document. Structured I/O is kinda like the thing that turns such a 331 | document back into an "API". 332 | 333 | 334 | # Unresolved questions 335 | [unresolved-questions]: #unresolved-questions 336 | 337 | How do we handle data that _doesn't_ conform to the structure, such as invalid 338 | data? Should we implement a "strict mode" that requires data to be valid?
339 | 340 | A Structured Reader connects a single schema to a data stream; it's a common 341 | use case that entries need only the subsection of the schema that the entry -------------------------------------------------------------------------------- /text/0006-dataset_naming.md: -------------------------------------------------------------------------------- 1 | - Feature Name: dataset_naming 2 | - Start Date: 2017-08-14 3 | - RFC PR: [#6](https://github.com/qri-io/rfcs/pull/3) 4 | - Repo: https://github.com/qri-io/qri 5 | 6 | _Note: This RFC was created as part of an initial sprint to adopt the RFC 7 | process itself to help clarify our existing work. As such, sections of this 8 | document are less complete than we'd expect from a new RFC. 9 | -:heart: the qri core team_ 10 | 11 | # Summary 12 | [summary]: #summary 13 | 14 | Define the qri naming system, conventions for name resolution, and related jargon. 15 | 16 | # Motivation 17 | [motivation]: #motivation 18 | 19 | As a decentralized system, qri must confront the [Zooko's Triangle](https://en.wikipedia.org/wiki/Zooko%27s_triangle) problem, which is establishing a way of referring to datasets that is: 20 | 21 | * human readable 22 | * decentralized 23 | * unique 24 | 25 | Because Qri is assumed to be built atop a content-addressed file system as its storage layer, the properties of being decentralized & unique are already present. The Qri naming system maps a human readable name to the newest version of a dataset. 26 | 27 | # Guide-level explanation 28 | [guide-level-explanation]: #guide-level-explanation 29 | 30 | It’s possible to refer to a dataset in a number of ways. It’s easiest to look 31 | at the full definition of a dataset reference, and then show what the “defaults” are to make sense of things.
The full definition of a dataset reference is as follows: 32 | 33 | dataset_reference = handle/dataset_name@profile_id/network/version_id 34 | 35 | an example of that looks like this: 36 | 37 | b5/comics@QmYCvbfNbCwFR45HiNP45rwJgvatpiW38D961L5qAhUM5Y/ipfs/QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y 38 | 39 | In a sentence: b5 is the handle of a peer, who has a dataset named comics, and its hash on the ipfs network at a point in time was `QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y` 40 | 41 | The individual components of this reference are: 42 | 43 | * handle - The human-friendly name that the creator is using to refer to themselves, somewhat analogous to a username in other systems. We need handles so lots of people can name datasets the same thing. 44 | * dataset_name - The human-friendly name that makes it easy to remember and refer to the dataset. 45 | * profile_id - A unique identifier to let machines uniquely refer to datasets created by this peer, regardless of whether their handle is renamed. 46 | * network - Protocol name that stores distributed data, defaulting to "ipfs". 47 | * version_id - A unique identifier hash to refer to a specific version of a dataset, from an exact point in time. 48 | 49 | ### default to latest on ipfs 50 | 51 | Now, having to type `b5/comics@QmYCvbfNbCwFR45HiNP45rwJgvatpiW38D961L5qAhUM5Y/ipfs/QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y` 52 | every time you wanted a dataset would be irritating. So we have two defaults. 53 | The default network is `ipfs`, and the default hash is the latest known version of a dataset. We say latest known because sometimes things can fall out of sync. If you're only working with your own local datasets, this won’t be an issue.
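Those defaulting rules can be sketched in a few lines. This is a hypothetical illustration of the expansion, not qri's actual resolution code; the angle-bracket placeholders stand for values that would be resolved from the local repo:

```golang
package main

import (
	"fmt"
	"strings"
)

// expandRef fills in the default network ("ipfs") and the latest known
// version for a short handle/dataset_name reference. Illustrative only:
// real resolution looks the profile_id and version hash up in the repo.
func expandRef(ref string) string {
	parts := strings.Split(ref, "/")
	if len(parts) == 2 { // short handle/dataset_name form
		return fmt.Sprintf("%s/%s@<profile_id>/ipfs/<latest_version_id>", parts[0], parts[1])
	}
	return ref // already a path or fully-qualified reference
}

func main() {
	fmt.Println(expandRef("b5/comics"))
	// b5/comics@<profile_id>/ipfs/<latest_version_id>
}
```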
54 | 55 | Anyway, that means we can cut down on the typing. If we just want the latest 56 | version of b5’s comics dataset, we can just type: 57 | 58 | b5/comics 59 | 60 | In a sentence: “the latest version of b5’s dataset named comics.” 61 | 62 | ### the me keyword 63 | 64 | What if your handle is, say, `golden_pear_ginger_pointer`? First, why did you pick such a long handle? 65 | Whatever your answer, it would be irritating to have to type this every time, so we give you a special way to refer to yourself: `me`. So if you have a dataset named comics, you can just type: 66 | 67 | me/comics 68 | 69 | In a sentence: “the latest version of my dataset named comics.” Under the hood, we’ll re-write this request to use your actual handle instead. 70 | 71 | ### drop names with hashes 72 | 73 | Finally, it’s also possible to use just the hash. This is a perfectly valid dataset reference: 74 | 75 | @/ipfs/QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y 76 | 77 | In this case we’re ignoring naming altogether and simply referring to a dataset by its network and version hash. Because IPFS hashes are global, we can do this across the entire network. If you’re coming from git, this is a fun new trick. 78 | 79 | To recap: 80 | 81 | All of these are ways to refer to a dataset: 82 | 83 | * handle/dataset_name (user-friendly reference) 84 | 85 | b5/comics 86 | 87 | * network/hash (path) 88 | 89 | /ipfs/QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y 90 | 91 | * handle/dataset_name@profile_id/network/version_id (canonical reference) 92 | 93 | b5/comics@QmYCvbfNbCwFR45HiNP45rwJgvatpiW38D961L5qAhUM5Y/ipfs/QmejvEPop4D7YUadeGqYWmZxHhLc4JBUCzJJHWMzdcMe2y 94 | 95 | 96 | # Reference-level explanation 97 | [reference-level-explanation]: #reference-level-explanation 98 | 99 | The struct that stores a dataset reference is `DatasetRef`. Its fields correspond directly to the definition specified above. 100 | 101 | ``` 102 | // DatasetRef encapsulates a reference to a dataset.
103 | type DatasetRef struct { 104 | // Handle of the dataset creator 105 | Handle string `json:"handle,omitempty"` 106 | // Human-friendly name for this dataset 107 | Name string `json:"name,omitempty"` 108 | // ProfileID of dataset creator 109 | ProfileID profile.ID `json:"profileID,omitempty"` 110 | // Content-addressed path for this dataset, network and version_id 111 | Path string `json:"path,omitempty"` 112 | // Dataset is a pointer to the dataset being referenced 113 | Dataset *dataset.DatasetPod `json:"dataset,omitempty"` 114 | } 115 | ``` 116 | 117 | The Dataset pointer optionally points to a dataset itself, once it has been loaded into memory. 118 | 119 | The most important function for working with `DatasetRef`s is `CanonicalizeDatasetRef`, which converts from user-friendly references and path references to canonical references. 120 | 121 | ``` 122 | func CanonicalizeDatasetRef(r Repo, ref *DatasetRef) error { 123 | ... 124 | } 125 | ``` 126 | 127 | This function handles replacements like converting `me` to the peer's `handle`, and fills in PeerID and Path. It returns the error `repo.ErrNotFound` if the dataset does not exist in the user's repo, which means the dataset is not local, but may exist remotely. Callers of this function should respond appropriately, by contacting peers if a p2p connection exists. 128 | 129 | 130 | # Drawbacks 131 | [drawbacks]: #drawbacks 132 | 133 | Creating a naming scheme carries with it many issues around backwards compatibility, but this is somewhat mitigated by having the network name in the reference. If, for some reason, a new distributed network needs to be supported in the future, Qri can adapt without breaking old references. 134 | 135 | Renames are difficult to get right, since it means that code cannot assume that all hashes correlate to the same user-friendly string. 136 | 137 | There exist subtle differences between types of dataset references, due to the structure being stateful.
Code must handle DatasetRefs carefully, since their type alone does not reveal whether they are user-friendly, canonical, or only a path. 138 | 139 | # Rationale and alternatives 140 | [rationale-and-alternatives]: #rationale-and-alternatives 141 | 142 | Some naming scheme is absolutely necessary to refer to distributed datasets in a user-friendly way. 143 | 144 | # Prior art 145 | [prior-art]: #prior-art 146 | 147 | Similar concepts exist in Git, which uses sha1 hashes the way Qri uses IPFS hashes. Git also uses branch and remote names similar to how Qri uses dataset names. 148 | 149 | Bittorrent solves similar problems by encapsulating hash information in binary files that users load from their native interface. 150 | 151 | # Unresolved questions 152 | [unresolved-questions]: #unresolved-questions 153 | 154 | What other shortcut names are there aside from `me` that we may also want to support in the future? How might the DatasetRef structure need to change in the future if new network values are ever supported? 155 | -------------------------------------------------------------------------------- /text/0011-html_viz.md: -------------------------------------------------------------------------------- 1 | - Feature Name: html_viz 2 | - Start Date: 2018-08-14 3 | - RFC PR: [#13](https://github.com/qri-io/rfcs/pull/13) 4 | - Issue: 5 | 6 | _Note: We never properly finished the HTML rendering RFC, which we started work on in August 2018. 7 | Instead of creating a new one we just "finished what we started" in March 2019. As such this RFC 8 | contains references to RFCs developed in the period between August 2018 & March 2019. 9 | -:heart: the qri core team_ 10 | 11 | # Summary 12 | [summary]: #summary 13 | 14 | HTML visualizations are instructions embedded in a dataset for rendering it as a standard HTML document.
15 | 16 | # Motivation 17 | [motivation]: #motivation 18 | 19 | Qri has a syntax-agnostic `viz` component that encapsulates the details required to visualize a dataset. This RFC proposes the first & default visualization syntax should be rendering to a single HTML document using a template processing engine. 20 | 21 | # Guide-level explanation 22 | [guide-level-explanation]: #guide-level-explanation 23 | 24 | 25 | ### `qri render` & default template 26 | 27 | A command will be added to both Qri's HTTP API ('API' for short) & command-line interface (CLI) called `render`, which adds the capacity to execute HTML templates against datasets. When called with a specified dataset `qri render` will load the dataset, assume the viz syntax is HTML, and use a default template to write an HTML representation of a dataset to `stdout` on the CLI, or the HTTP response body on the API: 28 | `qri render me/example_dataset` 29 | 30 | The default template and output path can be overridden with the `--template` and `--output` flags respectively. The output is on the CLI only: 31 | `qri render --template template.html --output rendered.html me/example_dataset` 32 | 33 | The default template must be carefully composed to balance the size of the resulting HTML file in bytes against readability & utility of the resulting visualization. It should also include a well-constructed citation footer that details the components of a dataset in a concise, unobtrusive manner that invites users to audit the veracity of the dataset in question. These defaults should encourage easy reading and invite verification on the part of the reader. 34 | 35 | ### Vizualizations in datasets 36 | 37 | Saving a dataset will by default execute the default template to a file called `index.html` & place it in the IPFS merkle-DAG. When an IPFS HTTP gateway receives a request for a DAG that is a directory containing `index.html`, it returns the HTML file by default. 
This means that when a user visits a dataset on the d.web (completely outside the Qri system of dataset production), they are greeted with a well-formatted dataset document by default. 38 | 39 | While care will be taken to keep `index.html` files small, users may understandably want to disable them entirely. To achieve this we'll add a new flag to `qri save` and `qri update`: `--no-render`. Passing `--no-render` will prevent the execution of any viz template. This will save a few KB from version to version at the cost of usability. 40 | 41 | Users can _override_ the default template by supplying their own custom viz templates, either by specifying a `viz.scriptPath`: 42 | 43 | `dataset.yaml`: 44 | ```yaml 45 | name: example_dataset 46 | # additional fields elided ... 47 | viz: 48 | format: html 49 | scriptPath: template.html 50 | ``` 51 | 52 | and running save: 53 | ``` 54 | $ qri save --file dataset.yaml 55 | ``` 56 | 57 | Or by running `qri save` with an `.html` file argument: 58 | `qri save --file template.html me/example_dataset` 59 | 60 | Since the above example provided no additional configuration details for the `viz` component in `dataset.yaml`, the two calls will have the same effect. 61 | 62 | # Reference-level explanation 63 | [reference-level-explanation]: #reference-level-explanation 64 | 65 | ### Template API 66 | 67 | Introducing HTML template execution requires defining an API for template values. This API will need to be documented & maintained just like any other API in the Qri ecosystem. 68 | 69 | The template implementation will use the [Go html/template package](https://golang.org/pkg/html/template). 70 | 71 | #### Dataset is exposed as `ds` 72 | 73 | HTML templates should expose the dataset document as `ds`. Exposing the dataset document as `ds` matches our conventions for referring to a dataset in Starlark, and allows access to all defined parts of a dataset. `ds` should use _lower camel case_ fields for component accessors.
For example: 74 | 75 | ```html 76 | {{ ds.meta.title }} 77 | {{ ds.transform.scriptPath }} 78 | ``` 79 | 80 | Undefined components should be exposed as empty struct pointers rather than nil. For example, for a dataset _without_ a transform, the following template should print nothing: 81 | ```html 82 | {{ ds.transform }} 83 | ``` 84 | And this should error: 85 | ```html 86 | {{ ds.transform.script }} 87 | ``` 88 | 89 | Having default empty pointers prevents unnecessary `if` clauses, allowing templates to skip straight to existence checks on component fields: 90 | ```html 91 | {{ if ds.transform.script }} 92 | {{ end }} 93 | ``` 94 | 95 | ### Template functions 96 | 97 | Top-level functions should be loaded into the template `funcs` space to make rendering templates easier. The Go html/template package comes with [predefined functions](https://golang.org/pkg/text/template/#hdr-Functions). Because our implementation builds on the html/template package, this RFC introduces all of these functions into our template engine. This seems acceptable: it might be a pain if someone needs to write a non-Go `Render` implementation, but at least the universe of available template functions is known. 98 | 99 | An example that uses the predefined `len` function together with `getBody` to print the length of the body: 100 | 101 | ```html 102 |
<p>The Body has {{ len getBody }} elements</p>
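<!-- Further illustrative lines exercising the Qri-specific helpers proposed
     below. These calls are sketches only; the RFC leaves the helpers' exact
     signatures unspecified. -->
<h1>{{ title ds }}</h1>
<p>Created {{ timeFormat ds.commit.timestamp "Jan 2 2006" }}</p>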
103 | ``` 104 | 105 | In addition to the stock predefined functions, the following should be loaded for all templates to make templating a little easier: 106 | 107 | | name | description | 108 | | ----------- | ------------------------ | 109 | | timeParse | parse a timestamp string, returning a Go *time.Time struct | 110 | | timeFormat | convert the textual representation of a datetime into the specified format | 111 | | default | allows setting a default value to return when a given value is not set | 112 | | title | give the title of a dataset | 113 | | getBody | load the full dataset body | 114 | | filesize | convert a byte count to a kb/mb/etc string | 115 | 116 | 117 | #### Future dataset document API 118 | 119 | We have reserved future work for a "dataset API" that will expand the default capabilities of a dataset document to include convenience functions for doing things like loading named columns or sampling rows from the body. We've intentionally left this API undefined thus far to understand how it will work in different contexts. One such context is this template API. 120 | 121 | The one exception to this is exposing body data through a global function: `getBody`. This is because there's a _very_ high chance we'll want to export `ds.body` as an object with methods in the future. If we simply load the entire body & drop it into `ds.body`, adding methods to `ds.body` will require breaking the document API. 122 | 123 | ### The default template 124 | 125 | Our standard template should be a collection of pre-defined blocks which are also available to user-provided templates. An example default template would look something like this: 126 | 127 | ```html 128 | <!DOCTYPE html> 129 | <html> 130 | <head> 131 | {{ block "stylesheet" . }}{{ end }} 132 | </head> 133 | <body> 134 | {{ block "header" ds . }}{{ end }} 135 | {{ block "summary" ds . }}{{ end }} 136 | {{ block "stats" ds . }}{{ end }} 137 | {{ block "citations" ds .
}}{{ end }} 138 | </body> 139 | </html> 140 | ``` 141 | 142 | Users can then swap in these predefined blocks to build pseudo-custom templates, easing the transition to fully-custom visualizations through progressive customization. The pre-defined blocks are as follows: 143 | 144 | | name | purpose | 145 | | ---------- | ------- | 146 | | stylesheet | default CSS styles | 147 | | header | dataset title, date created & author in a `<header>` tag | 148 | | summary | overview of dataset details | 149 | | stats | template for the stats component that prints nothing if the stats component is undefined | 150 | | citations | `