├── traceability ├── README.md ├── tooling_arch.png ├── tooling_nav.png ├── tooling_domain.png ├── tooling_rm_domain.png ├── tooling_nav.graphml ├── tooling_rm_domain.graphml ├── traceability.md ├── tooling.md └── tooling_arch.graphml ├── README.md ├── guide ├── notes-planning.md ├── naming.md ├── templ-high.md ├── notes-zarko.txt ├── informal-english.md └── guide.md ├── blockchain ├── fullnode.md └── blockchain.md └── lightclient ├── failuredetector.md └── verification.md /traceability/README.md: -------------------------------------------------------------------------------- 1 | This directory contains proposals related to traceability of the requirements 2 | -------------------------------------------------------------------------------- /traceability/tooling_arch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/informalsystems/vdd/HEAD/traceability/tooling_arch.png -------------------------------------------------------------------------------- /traceability/tooling_nav.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/informalsystems/vdd/HEAD/traceability/tooling_nav.png -------------------------------------------------------------------------------- /traceability/tooling_domain.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/informalsystems/vdd/HEAD/traceability/tooling_domain.png -------------------------------------------------------------------------------- /traceability/tooling_rm_domain.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/informalsystems/vdd/HEAD/traceability/tooling_rm_domain.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # VDD 2 
| Verification-Driven Development 3 | 4 | This repository contains guidelines and specifications that document our view on verification-driven development. The specifications in this repository are of a preliminary nature and do not represent reviewed and authoritative documentation. See https://github.com/informalsystems/tendermint-rs/tree/master/docs/spec/ for those. 5 | -------------------------------------------------------------------------------- /guide/notes-planning.md: -------------------------------------------------------------------------------- 1 | These are the points that came out of the planning sessions: 2 | 3 | - What are responsibilities of specs? 4 | - What are features of impl the spec should address? At what level of detail? 5 | - High level protocol spec 6 | - Low level architecture spec 7 | - English spec vs TLA+ spec 8 | - Verification log/reporting/artifacts 9 | - How does all this inform / synchronize with development? 10 | - Template for high and low level specs 11 | -------------------------------------------------------------------------------- /guide/naming.md: -------------------------------------------------------------------------------- 1 | # VDD Naming Conventions / Semantic Versioning 2 | 3 | In order to facilitate traceability over several iterations of English 4 | and TLA+ specifications, and to make clear to external readers what is 5 | the state of a given document, we use the following convention: 6 | 7 | ## English Spec 8 | 9 | As part of the version, the file name and title of each English specification should include a reference to one 10 | of the following stages: 11 | 12 | ``` 13 | Draft --> Proposal --> Reviewed 14 | | 15 | |--> Published 16 | ``` 17 | 18 | 19 | - There should be an additional stage `Withdrawn`, for the uncommon 20 | outcome that at some point we find the specification useless 21 | - `Draft` should live in auxiliary branches 22 | - `Proposal` should live in a PR 23 | - `Reviewed`, `Published`, `Withdrawn`
should live in the main 24 | (f.k.a. master) branch 25 | - From any stage, if updates need to be made, we first transition back 26 | to Draft. 27 | 28 | 29 | Whenever the stage changes for an English specification, the version 30 | number should increase. Here is an example trace: 31 | 32 | Draft.1 -> Proposal.2 -> Reviewed.3 -> Draft.4 -> Proposal.5 -> 33 | Published.6 34 | 35 | 36 | ## TLA+ Spec 37 | 38 | ``` 39 | Draft --> Proposal --> Reviewed --> 40 | Validation Draft --> Validation Proposal --> Validated --> 41 | Verification Draft --> Verification Proposal --> Verified 42 | ``` 43 | 44 | 45 | - There should be an additional stage `Withdrawn`, for the uncommon 46 | outcome that at some point we find the specification useless 47 | - `Draft`, `Proposal`, `Reviewed` work similarly to the English spec 48 | - When simple invariants and type checks are added, the spec should 49 | transition to `Validation Draft`, and when it passes model checking it 50 | should be called `Validated`. 51 | - If we go towards full verification (to be defined; parameterized?) 52 | we go towards `Verification Draft`, and when the model checker 53 | verifies it, it should move to `Verified`. 54 | 55 | - All drafts live in branches 56 | - All proposals live in PRs 57 | - `Reviewed`, `Validated`, `Verified` live in the main branch 58 | 59 | ## Filenames and version control 60 | 61 | The specification files should be named in a consistent way, e.g., `MySpec_003_Reviewed.md` and `MySpec_006_Validated.tla`. 62 | We consider it a good practice to keep the files for the varying versions in the repository, that is, `MySpec_001_Draft.tla`, 63 | `MySpec_002_Proposal.md`, `MySpec_003_Reviewed.md`, etc. should all be kept in the same branch. By doing so, we avoid broken links, 64 | can easily compare different versions, and can see immediately whether a software component refers to an outdated spec.
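
The naming scheme above could be checked mechanically. The following is a minimal sketch, not part of the convention itself: the names `SpecFile`, `ParseSpecFile`, and `IsOutdated` are hypothetical, and the pattern `<Name>_<NNN>_<Stage>.<md|tla>` with a three-digit version is inferred only from the two example file names. The multi-word TLA+ stages (such as `Validation Draft`) are left out because the text does not say how they would be spelled in a file name.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// SpecFile is a hypothetical record describing one versioned specification file.
type SpecFile struct {
	Name    string // e.g. "MySpec"
	Version int    // e.g. 3 for "003"
	Stage   string // e.g. "Reviewed"
	Ext     string // "md" for English specs, "tla" for TLA+ specs
}

// specFileRe assumes the shape <Name>_<3-digit version>_<Stage>.<md|tla>,
// as suggested by MySpec_003_Reviewed.md and MySpec_006_Validated.tla.
var specFileRe = regexp.MustCompile(`^([A-Za-z][A-Za-z0-9]*)_(\d{3})_([A-Za-z]+)\.(md|tla)$`)

// knownStages lists only the single-word stages named in the convention.
var knownStages = map[string]bool{
	"Draft": true, "Proposal": true, "Reviewed": true, "Published": true,
	"Withdrawn": true, "Validated": true, "Verified": true,
}

// ParseSpecFile splits a file name into its name, version, stage, and extension.
func ParseSpecFile(filename string) (SpecFile, error) {
	m := specFileRe.FindStringSubmatch(filename)
	if m == nil {
		return SpecFile{}, fmt.Errorf("%q does not match <Name>_<NNN>_<Stage>.<md|tla>", filename)
	}
	if !knownStages[m[3]] {
		return SpecFile{}, fmt.Errorf("unknown stage %q in %q", m[3], filename)
	}
	version, err := strconv.Atoi(m[2])
	if err != nil {
		return SpecFile{}, err
	}
	return SpecFile{Name: m[1], Version: version, Stage: m[3], Ext: m[4]}, nil
}

// IsOutdated reports whether ref names an older version of the same spec as latest.
func IsOutdated(ref, latest SpecFile) bool {
	return ref.Name == latest.Name && ref.Version < latest.Version
}
```

With these helpers, a link checker could parse every spec reference found in the code base and flag the ones for which `IsOutdated` holds against the newest file of the same name. Comparing by version number alone relies on the rule that the version increases whenever the stage changes.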
65 | 66 | -------------------------------------------------------------------------------- /guide/templ-high.md: -------------------------------------------------------------------------------- 1 | *** This is the beginning of an unfinished draft. Don't continue reading! *** 2 | 3 | # << Name of Component >> 4 | 5 | > Rough outline of what the component is doing and why. 2-3 paragraphs 6 | 7 | # Part I - Outside view 8 | 9 | ## Context of this document 10 | 11 | > mention other components and or specifications that are relevant for this 12 | spec. Possible interactions, possible use cases, etc. 13 | 14 | > should give the reader the understanding in what environment this component 15 | will be used. 16 | 17 | ## Informal Problem statement 18 | 19 | > for the general audience, that is, engineers who want to get an overview over what the component is doing 20 | from a bird's eye view. 21 | 22 | 23 | ## Sequential Problem statement 24 | 25 | > should be English and precise. will be accompanied with a TLA spec. 26 | 27 | # Part II - Protocol view 28 | 29 | ## Environment/Assumptions/Incentives 30 | 31 | > Introduce distributed aspects 32 | 33 | > Timing and correctness assumptions. Possibly with justification that the 34 | assumptions make sense, e.g., it is in the interest of a full node to behave 35 | correctly 36 | 37 | > should have clear formalization in temporal logic. 38 | 39 | ## Distributed Problem Statement 40 | 41 | ### Design choices 42 | 43 | > input/output variables used to define the temporal properties. Most likely they come from an ADR 44 | 45 | ### Temporal Properties 46 | 47 | > safety specifications / invariants in English 48 | 49 | > liveness specifications in English. Possibly with timing/fairness requirements: 50 | e.g., if the component is connected to a correct full node and communication is 51 | reliable and timely, then something good happens eventually. 52 | 53 | should have clear formalization in temporal logic. 
54 | 55 | > How is the problem statement linked to the "Sequential Problem statement". 56 | Simulation, implementation, etc. relations 57 | 58 | ## Definitions 59 | 60 | > In this section we become more concrete, with basic (abstracted) data types 61 | 62 | > some math that allows us to write the specifications and the pseudocode solution below. 63 | Some variables, etc. 64 | 65 | ## Solution 66 | 67 | > Basic data structures. Simplified, so that we can focus on the distributed 68 | algorithm here. If existing: link to Tendermint data structures, and mention 69 | if details were omitted. 70 | 71 | ### Outline 72 | 73 | > Describe solution (in English), decomposition into functions, where communication to other components happens. 74 | 75 | ### Details 76 | 77 | > Function signatures followed by pseudocode (optional) and a list of features (required): 78 | > - Implementation remarks (optional) 79 | > - e.g. (local/remote) function called in the body of this function 80 | > - Expected precondition 81 | > - Expected postcondition 82 | > - Error condition 83 | 84 | 85 | ## Correctness arguments 86 | 87 | > Proof sketches of why we believe the solution satisfies the problem statement. 88 | Possibly giving inductive invariants that can be used to prove the specifications 89 | of the problem statement 90 | 91 | > In case the specification describes an existing protocol with known issues, 92 | e.g., liveness bugs, etc.
"Correctness Arguments" should be replaced by 93 | a section called "Analysis" 94 | 95 | # References 96 | 97 | > links to other specifications/ADRs this document refers to 98 | -------------------------------------------------------------------------------- /guide/notes-zarko.txt: -------------------------------------------------------------------------------- 1 | 2 | Previous approaches: 3 | 4 | TDD: Test-driven development (TDD) (Beck 2003; Astels 2003) is an evolutionary approach to 5 | development that combines test-first development, where you write a test before you write 6 | just enough production code to fulfill that test, and refactoring. 7 | References: 8 | 9 | - https://www.amazon.com/exec/obidos/ASIN/0321146530/ambysoftinc 10 | - https://www.amazon.com/exec/obidos/ASIN/0131016490/ambysoftinc 11 | - https://www.amazon.com/exec/obidos/ASIN/0135974445/ambysoftinc 12 | - https://www.amazon.com/exec/obidos/ASIN/1617290084/ambysoftinc 13 | 14 | 15 | BDD: 16 | 17 | Academia: 18 | 19 | - https://www.cs.vu.nl/~wanf/BOOKS/moddissys.pdf 20 | 21 | Related tools: 22 | 23 | - http://fitnesse.org/ 24 | - https://relishapp.com/rspec 25 | - https://github.com/UBC-NSS/pgo/wiki/Modular-PlusCal 26 | 27 | 28 | TODO: 29 | 30 | - would be great if we could make a short survey of how specifications of known 31 | fault-tolerant distributed systems are done. For example, we could look at 32 | Zookeeper or Raft as an example of a consensus based coordination framework, 33 | distributed file systems (GFS), apache-spark or Map-Reduce as an example of 34 | a distributed scheduler and execution framework, Kafka as a distributed broker and 35 | MongoDB, Elastic or something like that as a distributed key-value store. We probably 36 | also want to look at some gossip or p2p system (Kademlia, BitTorrent), and 37 | blockchain systems (Bitcoin). We probably want to start by looking at 38 | TLA+ examples first and see if we can generalise some rules out of those 39 | examples.
This will probably be delivered in the form of a blog post or maybe 40 | even a research paper. 41 | 42 | Examples: 43 | - https://github.com/apache/zookeeper/blob/master/zookeeper-docs/src/main/resources/markdown/zookeeperInternals.md 44 | - https://github.com/elastic/elasticsearch-formal-models 45 | - http://jepsen.io/analyses // we might want to look at these reports to see what specification Kyle used. 46 | - http://tla.msr-inria.inria.fr/kuppe/2019conf/06%20-%20William%20Schultz%20-%20Strangeloop%20TLA+%20Conference%202019%20Talk.pdf 47 | - http://tla.msr-inria.inria.fr/kuppe/2019conf/02%20-%20Ivan%20Beschastnikh%20Finn%20Hackett%20-%20TLA+%20conf%202019%20PGo%20presentation.pdf 48 | - https://github.com/visualzhou/mongo-repl-tla/tree/5fd666da29e7cc088ea70c8d076c12818aba372e 49 | - https://news.ycombinator.com/item?id=9601770 50 | - https://github.com/tlaplus 51 | - https://github.com/tlaplus/Examples 52 | 53 | Goals: 54 | 55 | - Improve software engineering practices by relying on formal methods. We want to restrict ourselves 56 | to fault-tolerant distributed systems. 57 | - In addition to software engineering best practices (TDD, BDD) we also want to consider 58 | what are best practices in writing specifications for distributed systems (both in industry 59 | and academia). 60 | - We might want to figure out how we are replacing existing concepts (and techniques) 61 | from TDD in VDD. For example, TDD (more precisely BDD) will start by writing 62 | an acceptance test (specification or requirement). What would be the equivalent 63 | in VDD? A problem with VDD might be the fact that it is harder to make incremental 64 | steps. Can we model one aspect of the system in isolation? 65 | - From the Modular PlusCal presentation: Goal: isolate system definition from abstractions of its 66 | execution environment. 67 | 68 | - We might want to think about design/engineering flows and what are the ideal outputs 69 | formal tools should generate. For example: 70 | 71 | 1.
write high level specifications that involve participants, messages exchanged and message handlers. 72 | 2. write invariants/properties that should hold. We probably also want to capture system assumptions. At this level we are 73 | probably looking at system, i.e., multi node model of the system. We want to illustrate this on examples we know ( 74 | consensus, fast sync, lite client, fork accountability). 75 | Q: how the tool will help at this stage? It should verify design and generate counter examples in case 76 | something is wrong. What if this step is finished (no issue is found after iterations)? 77 | 78 | 3. engineers think in terms of APIs, data structures and functions. Sometimes for complex modules 79 | they also think in terms of concurrent tasks and communication patterns between concurrent tasks. 80 | Very important aspect is error handling. At this level we think about single node perspective 81 | and interactions with the rest of the systems is modelled as environment. How we step into 82 | internal concurrency? Do we need first to look at single node perspective where 83 | we model single node behaviour and interaction with the environment first and then 84 | internal decomposition? Furthermore, how we connect various specs (by refinement?)? 85 | - What would be ideal output from the tool at this stage? we need to model APIs. 86 | After API is modelled we write behaviours (actions) and then check if invariants 87 | hold? Maybe this is the glue. Ideal output: counter-examples. This would be similar 88 | like failing tests in TDD. 89 | 90 | 4. After we modelled single node perspective, i.e., APIs (external) and internal 91 | behaviour (actions) we want to align this perspective with the code. The code should 92 | implement similar logic and then we check if code is aligned with the spec by 93 | relying on counter-examples generated by the spec. 
Note that while the model is exhaustively 94 | model checked, the code will be tested only against some counter-examples. How can we 95 | do full space (symbolic execution) on the code, the same way we are doing it 96 | on the model? Does this make sense at all? The output from the stage 3 should be 97 | understandable by the code at stage 4. This affects code architecture, which should 98 | be more event driven. What are the constraints here? 99 | 100 | 5. For performance reasons we want to decompose logic into concurrent tasks. We want 101 | to ensure that by doing this we are not violating invariants (spec). Do we at this point need 102 | to start from step 1? We have now concurrent system of entities that exchange messages 103 | and some high level guarantees should hold? Then we might want to look at the perspective 104 | of each concurrent task and to model its behaviour and see if invariants hold. Furthermore, 105 | at the level of each task there might be additional invariants we want to add so 106 | corresponding test scenarios will be executed. 107 | 108 | TODO: It seems that it would be very hard to make this approach completely execution 109 | agnostic, i.e., we need to make some assumptions about component interactions. For example 110 | we might assume that every component is communicating with its environment using 111 | events, i.e., we assume event driven architecture. Does this mean we assume 112 | shared nothing architecture? 113 | 114 | Software evolution! How do we make this process incremental, i.e., how to make a change 115 | to a spec at any level and how this would be reflected? 116 | 117 | Semantic versioning 118 | -------------------------------------------------------------------------------- /blockchain/fullnode.md: -------------------------------------------------------------------------------- 1 | *** This is the beginning of an unfinished draft. Comments welcome!
*** 2 | 3 | # Tendermint Full Node API 4 | 5 | > Rough outline of what the component is doing and why. 2-3 paragraphs 6 | 7 | In several protocols (fast sync, light client verification, light 8 | client failure detector) the "other" communication partner is a full 9 | node. For this purpose, a full node exposes functions that can be 10 | called remotely, by RPC. This document collects these functions and 11 | provides their expected behavior with respect to the blockchain [[blockchain]]. 12 | 13 | 14 | # Part I - Outside view 15 | 16 | A full node provides an API to query specific information from the 17 | blockchain. 18 | 19 | ## Context of this document 20 | 21 | > mention other components and/or specifications that are relevant for this 22 | spec. Possible interactions, possible use cases, etc. 23 | 24 | > should give the reader the understanding in what environment this component 25 | will be used. 26 | 27 | A full node follows Tendermint consensus [[blockchain]] and therefore 28 | has a local view of the current state of the blockchain. Processes that 29 | do not have a complete view of the blockchain (e.g., light clients, 30 | recovering nodes, etc.) need to query full nodes. This specification 31 | describes how that can be done. 32 | 33 | 34 | 35 | ## Informal Problem statement 36 | 37 | > for the general audience, that is, engineers who want to get an overview of what the component is doing 38 | from a bird's eye view. 39 | 40 | 41 | ## Sequential Problem statement 42 | 43 | > should be English and precise. will be accompanied with a TLA spec.
44 | 45 | 46 | The Tendermint Full Node exposes the following functions over Tendermint RPC: 47 | 48 | ### For the Light Client 49 | 50 | ```go 51 | func Commit(height int64) (SignedHeader, error) 52 | ``` 53 | - Implementation remark 54 | - RPC to full node *n* 55 | - Expected precondition 56 | - header of `height` exists on blockchain 57 | - Expected postcondition 58 | - if *n* is correct: Returns the signed header of height `height` 59 | from the blockchain if communication is timely (no timeout) 60 | - if *n* is faulty: Returns a signed header with arbitrary content 61 | - Error condition 62 | * if *n* is correct: precondition violated or timeout 63 | * if *n* is faulty: arbitrary error 64 | 65 | ---- 66 | 67 | 68 | ```go 69 | func Validators(height int64) (ValidatorSet, error) 70 | ``` 71 | - Implementation remark 72 | - RPC to full node *n* 73 | - Expected precondition 74 | - header of `height` exists on blockchain 75 | - Expected postcondition 76 | - if *n* is correct: Returns the validator set of height `height` 77 | from the blockchain if communication is timely (no timeout) 78 | - if *n* is faulty: Returns arbitrary validator set 79 | - Error condition 80 | - if *n* is correct: precondition violated or timeout 81 | - if *n* is faulty: arbitrary error 82 | 83 | ---- 84 | 85 | 86 | ### For Fastsync 87 | 88 | ```go 89 | func Status(addr Address) (int64, error) 90 | ``` 91 | - Implementation remark 92 | - RPC to full node *addr* 93 | - Expected precondition 94 | - none 95 | - Expected postcondition 96 | - if *addr* is correct: Returns the current height `height` of the peer 97 | if communication is timely (no timeout) 98 | - if *addr* is faulty: Returns an arbitrary height 99 | - Error condition 100 | * if *addr* is correct: timeout 101 | **TODO:** we assume communication is reliable and timely. Should we 102 | keep this?
103 | * if *addr* is faulty: arbitrary error 104 | ---- 105 | 106 | 107 | ```go 108 | func Block(addr Address, height int64) (Block, error) 109 | ``` 110 | - Implementation remark 111 | - RPC to full node *addr* 112 | - Expected precondition 113 | - `height` is less than or equal to the height of the peer 114 | - Expected postcondition 115 | - if *addr* is correct: Returns the block of height `height` 116 | from the blockchain if communication is timely (no timeout) 117 | - if *addr* is faulty: Returns arbitrary block 118 | - Error condition 119 | - if *addr* is correct: precondition violated or timeout 120 | - if *addr* is faulty: arbitrary error 121 | ---- 122 | 123 | 124 | #### **[FN-LuckyCase]**: 125 | 126 | The callee is correct and no timeout occurs at the caller before the 127 | remote function returns. 128 | 129 | #### **[FN-ManifestFaulty]** 130 | The callee is faulty and the returned data violates the expected 131 | postcondition for a correct callee (e.g., a faulty header is received 132 | in `Commit`). 133 | 134 | 135 | 136 | # Part II - Protocol view 137 | 138 | ## Environment/Assumptions/Incentives 139 | 140 | > Introduce distributed aspects 141 | 142 | > Timing and correctness assumptions. Possibly with justification that the 143 | assumptions make sense, e.g., it is in the interest of a full node to behave 144 | correctly 145 | 146 | > should have clear formalization in temporal logic. 147 | 148 | ## Distributed Problem Statement 149 | 150 | ### Design choices 151 | 152 | > input/output variables used to define the temporal properties. Most likely they come from an ADR 153 | 154 | ### Temporal Properties 155 | 156 | > safety specifications / invariants in English 157 | 158 | > liveness specifications in English. Possibly with timing/fairness requirements: 159 | e.g., if the component is connected to a correct full node and communication is 160 | reliable and timely, then something good happens eventually.
161 | 162 | should have clear formalization in temporal logic. 163 | 164 | > How is the problem statement linked to the "Sequential Problem statement". 165 | Simulation, implementation, etc. relations 166 | 167 | ## Definitions 168 | 169 | > In this section we become more concrete, with basic (abstracted) data types 170 | 171 | > some math that allows us to write the specifications and the pseudocode solution below. 172 | Some variables, etc. 173 | 174 | ### Data structures 175 | The following are data structures that are needed for this specification. 176 | 177 | ```go 178 | type SignedHeader struct { 179 | Header Header 180 | Commit Commit 181 | } 182 | ``` 183 | 184 | 185 | ## Solution 186 | 187 | > Basic data structures. Simplified, so that we can focus on the distributed 188 | algorithm here. If existing: link to Tendermint data structures, and mention 189 | if details were omitted. 190 | 191 | ### Outline 192 | 193 | > Describe solution (in English), decomposition into functions, where communication to other components happens. 194 | 195 | ### Details 196 | 197 | > Function signatures followed by pseudocode (optional) and a list of features (required): 198 | > - Implementation remarks (optional) 199 | > - e.g. (local/remote) function called in the body of this function 200 | > - Expected precondition 201 | > - Expected postcondition 202 | > - Error condition 203 | 204 | 205 | ## Correctness arguments 206 | 207 | > Proof sketches of why we believe the solution satisfies the problem statement. 208 | Possibly giving inductive invariants that can be used to prove the specifications 209 | of the problem statement 210 | 211 | # References 212 | 213 | > links to other specifications/ADRs this document refers to 214 | 215 | 216 | [[block]] Specification of the block data structure. 217 | 218 | [[blockchain]] The specification of the Tendermint blockchain. Tags referring to 219 | this specification are labeled [TMBC-*].
220 | 221 | 222 | [block]: https://github.com/tendermint/spec/blob/master/spec/blockchain/blockchain.md 223 | 224 | [blockchain]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md 225 | 226 | -------------------------------------------------------------------------------- /guide/informal-english.md: -------------------------------------------------------------------------------- 1 | *** This is the beginning of an unfinished draft. Don't continue reading! *** 2 | 3 | # << Name of Component >> 4 | 5 | > Rough outline of what the component is doing and why. 2-3 paragraphs 6 | 7 | # Outline 8 | 9 | > Table of content with rough outline for the parts 10 | 11 | - [Part I](#part-i---tendermint-blockchain): Introduction of 12 | relevant terms of the Tendermint 13 | blockchain. 14 | 15 | - [Part II](#part-ii---sequential-definition-problem): 16 | - [Informal Problem 17 | statement](#Informal-Problem-statement): For the general 18 | audience, that is, engineers who want to get an overview over what 19 | the component is doing from a bird's eye view. 20 | - [Sequential Problem statement](#Sequential-Problem-statement): 21 | Provides a mathematical definition of the problem statement in 22 | its sequential form, that is, ignoring the distributed aspect of 23 | the implementation of the blockchain. 24 | 25 | - [Part III](#part-iii---as-distributed-system): Distributed 26 | aspects, system assumptions and temporal 27 | logic specifications. 28 | 29 | - [Incentives](#incentives): how faulty full nodes may benefit from 30 | misbehaving and how correct full nodes benefit from cooperating. 31 | 32 | - [Computational Model](#Computational-Model): 33 | timing and correctness assumptions. 34 | 35 | - [Distributed Problem Statement](#Distributed-Problem-Statement): 36 | temporal properties that formalize safety and liveness 37 | properties in the distributed setting. 38 | 39 | - [Part IV](#part-iv---Protocol): 40 | Specification of the protocols. 
41 | 42 | - [Definitions](#Definitions): Describes inputs, outputs, 43 | variables used by the protocol, auxiliary functions 44 | 45 | - [Protocol](#core-verification): gives an outline of the solution, 46 | and details of the functions used (with preconditions, 47 | postconditions, error conditions). 48 | 49 | - [Liveness Scenarios](#liveness-scenarios): when the light 50 | client makes progress depends heavily on the changes in the 51 | validator sets of the blockchain. We discuss some typical scenarios. 52 | 53 | - [Part V](#part-v---supporting-the-ibc-relayer): Additional 54 | discussions and analysis 55 | 56 | 57 | In this document we quite extensively use tags in order to be able to 58 | reference assumptions, invariants, etc. in future communication. In 59 | these tags we frequently use the following short forms: 60 | 61 | - TMBC: Tendermint blockchain 62 | - SEQ: for sequential specifications 63 | - LCV: Lightclient Verification 64 | - LIVE: liveness 65 | - SAFE: safety 66 | - FUNC: function 67 | - INV: invariant 68 | - A: assumption 69 | 70 | 71 | 72 | # Part I - Tendermint Blockchain 73 | 74 | > necessary parts of the blockchain spec. Might be replaced by a link 75 | > to the spec once we have a published version of it. 76 | 77 | ## Context of this document 78 | 79 | > mention other components and or specifications that are relevant for this 80 | spec. Possible interactions, possible use cases, etc. 81 | 82 | > should give the reader the understanding in what environment this component 83 | will be used. 84 | 85 | 86 | 87 | # Part II - Sequential Definition of the Problem 88 | 89 | 90 | ## Informal Problem statement 91 | 92 | > for the general audience, that is, engineers who want to get an overview over what the component is doing 93 | from a bird's eye view. 94 | 95 | 96 | ## Sequential Problem statement 97 | 98 | > should be English and precise. will be accompanied with a TLA spec. 
99 | 100 | 101 | # Part III - Distributed System 102 | 103 | > Introduce distributed aspects 104 | 105 | > Timing and correctness assumptions. Possibly with justification that the 106 | assumptions make sense, e.g., it is in the interest of a full node to behave 107 | correctly 108 | 109 | > should have clear formalization in temporal logic. 110 | 111 | ## Incentives 112 | 113 | 114 | ## Computational Model 115 | 116 | ## Distributed Problem Statement 117 | 118 | ### Two Kinds of Termination 119 | 120 | ### Design choices 121 | 122 | > input/output variables used to define the temporal properties. Most likely they come from an ADR 123 | 124 | 125 | ### Temporal Properties 126 | 127 | > safety specifications / invariants in English 128 | 129 | > liveness specifications in English. Possibly with timing/fairness requirements: 130 | e.g., if the component is connected to a correct full node and communication is 131 | reliable and timely, then something good happens eventually. 132 | 133 | should have clear formalization in temporal logic. 134 | 135 | 136 | ### Solving the sequential specification 137 | 138 | > How is the problem statement linked to the "Sequential Problem statement". 139 | Simulation, implementation, etc. relations 140 | 141 | 142 | # Part IV - Protocol 143 | 144 | > Overview 145 | 146 | 147 | ## Definitions 148 | 149 | ### Data Types 150 | 151 | ### Inputs 152 | 153 | 154 | ### Configuration Parameters 155 | 156 | ### Variables 157 | 158 | ### Assumptions 159 | 160 | ### Invariants 161 | 162 | ### Used Remote Functions / Exchanged Messages 163 | 164 | ## <> 165 | 166 | ### Outline 167 | 168 | > Describe solution (in English), decomposition into functions, where communication to other components happens. 169 | 170 | 171 | ### Details of the Functions 172 | 173 | > Function signatures followed by pseudocode (optional) and a list of features (required): 174 | > - Implementation remarks (optional) 175 | > - e.g. 
(local/remote) function called in the body of this function 176 | > - Expected precondition 177 | > - Expected postcondition 178 | > - Error condition 179 | 180 | 181 | ### Solving the distributed specification 182 | 183 | > Proof sketches of why we believe the solution satisfies the problem statement. 184 | Possibly giving inductive invariants that can be used to prove the specifications 185 | of the problem statement 186 | 187 | > In case the specification describes an existing protocol with known issues, 188 | e.g., liveness bugs, etc. "Correctness Arguments" should be replaced by 189 | a section called "Analysis" 190 | 191 | 192 | 193 | ## Liveness Scenarios 194 | 195 | 196 | 197 | # Part V - Additional Discussions 198 | 199 | 200 | 201 | 202 | 203 | # References 204 | 205 | [[block]] Specification of the block data structure. 206 | 207 | [[RPC]] RPC client for Tendermint 208 | 209 | [[fork-detector]] The specification of the light client fork detector. 210 | 211 | [[fullnode]] Specification of the full node API 212 | 213 | [[ibc-rs]] Rust implementation of IBC modules and relayer. 214 | 215 | [[lightclient]] The light client ADR [77d2651 on Dec 27, 2019].
216 | 217 | [RPC]: https://docs.tendermint.com/master/rpc/ 218 | 219 | [block]: https://github.com/tendermint/spec/blob/master/spec/blockchain/blockchain.md 220 | 221 | [TMBC-HEADER-link]: #tmbc-header.1 222 | [TMBC-SEQ-link]: #tmbc-seq.1 223 | [TMBC-CorrFull-link]: #tmbc-corr-full.1 224 | [TMBC-Auth-Byz-link]: #tmbc-auth-byz.1 225 | [TMBC-TIME_PARAMS-link]: #tmbc-time-params.1 226 | [TMBC-FM-2THIRDS-link]: #tmbc-fm-2thirds.1 227 | [TMBC-VAL-CONTAINS-CORR-link]: #tmbc-val-contains-corr.1 228 | [TMBC-VAL-COMMIT-link]: #tmbc-val-commit.1 229 | [TMBC-SOUND-DISTR-POSS-COMMIT-link]: #tmbc-sound-distr-poss-commit.1 230 | 231 | [lightclient]: https://github.com/interchainio/tendermint-rs/blob/e2cb9aca0b95430fca2eac154edddc9588038982/docs/architecture/adr-002-lite-client.md 232 | [fork-detector]: https://github.com/informalsystems/tendermint-rs/blob/master/docs/spec/lightclient/detection.md 233 | [fullnode]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md 234 | 235 | [ibc-rs]:https://github.com/informalsystems/ibc-rs 236 | 237 | [FN-LuckyCase-link]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md#fn-luckycase 238 | 239 | [blockchain-validator-set]: https://github.com/tendermint/spec/blob/master/spec/blockchain/blockchain.md#data-structures 240 | [fullnode-data-structures]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md#data-structures 241 | 242 | [FN-ManifestFaulty-link]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md#fn-manifestfaulty 243 | 244 | [arXiv]: https://arxiv.org/abs/1807.04938 245 | -------------------------------------------------------------------------------- /traceability/tooling_nav.graphml: -------------------------------------------------------------------------------- [GraphML diagram; markup not recoverable. Surviving node labels: English Specification, Implementation, TLA+ Model, Tests, Defects, (Navigation by IDE)]
-------------------------------------------------------------------------------- /guide/guide.md: -------------------------------------------------------------------------------- 1 | # Verification-Driven Development: An Informal Guide 2 | 3 | **Abstract.** Software bugs are increasingly costly, e.g., in blockchain 4 | technology. At the same time, modern software stacks are quite complex. 5 | For instance, Byzantine fault tolerant blockchain systems (for example Tendermint) 6 | are based on complex fault-tolerant distributed protocols, implemented 7 | as highly concurrent systems, and overall contain thousands of lines of code. 8 | Therefore, there is a lot of room for potential bugs: protocol bugs, 9 | concurrency bugs, implementation bugs, security bugs, etc. 10 | 11 | As bugs in (some) modern distributed systems may have high cost, 12 | people have started considering computer-aided verification tools and 13 | methodologies as a way to increase correctness and verifiability of 14 | those critical systems.
However current approaches turn out to be 15 | impractical, especially at larger scale: 16 | completely automated verification methods such as model checking hit 17 | theoretical limits and undecidability results, while mechanical verification 18 | methods based on interactive provers require a huge amount of manual work. 19 | 20 | Part of the problem is the fact that the verification is often regarded 21 | as an "after the fact" task, that is, used after the software is written. 22 | In this document we present an approach where verification goes hand in 23 | hand with software design and development. Our goal is to improve software 24 | engineering practices (and ultimately the end software product) 25 | by relying on formal methods. Our approach primarily focuses on 26 | distributed and concurrent systems, although it might be of interest also 27 | outside those domains. 28 | 29 | ## Development Work Flow 30 | 31 | In this section we discuss the steps, deliverables and responsibilities 32 | in the process of designing and developing distributed/concurrent systems. 33 | This process is a collaboration of researchers (protocol and/or verification 34 | experts) and software engineers. Different steps require different expertise 35 | to lead that step. We make these requirements explicit. 36 | 37 | ### 1. Problem Statement / Outside view 38 | 39 | This part of the specification should present the problem at the very 40 | high level (if possible using sequential specification). We can think 41 | about this part of the specification as an operational level of the 42 | problem where basic mechanisms are explained, but without 43 | going into details about the particular system model in which problem 44 | can be solved and without entering protocol design details. 45 | 46 | This part of the specification serves as a gentle introduction to the 47 | problem. It is useful to both protocol designers and engineers as a high 48 | level problem description. 
49 | 50 | #### Expected Expertise 51 | 52 | - **lead:** distributed algorithm designer 53 | - **input/feedback:** distributed systems engineers 54 | 55 | #### Artifacts 56 | 57 | - High level English specification of the problem. 58 | - [Optional] In some cases it might be useful to write TLA+ 59 | specification of temporal properties (in case we want to do TLA+ 60 | refinements), but this is by default not required. 61 | 62 | #### Verification, Validation, or Proof Obligation 63 | 64 | - We don't require any proofs or formal validation to be done at this 65 | level. In case temporal properties are expressed in TLA+, it can be 66 | perhaps used in composition with other specifications. 67 | 68 | ### 2. Protocol Specification / Protocol view 69 | 70 | This part of the specification should present a concrete system model 71 | in which the problem is considered, and an algorithm that solves the problem 72 | in the given model. It contains two parts: 1) system model specification 73 | and 2) algorithm (protocol) specification. 74 | 75 | #### 2.1 System Model Specification 76 | 77 | As we focus on (fault-tolerant) distributed and concurrent systems, we 78 | have to specify protocols that run on unreliable/adversarial 79 | computers and networks, that refine the problem statement from above. 80 | 81 | System model section should therefore contain assumptions made 82 | regarding the following aspects: 83 | 84 | - definitions of processes (process represents unit of execution) involved. 85 | A process can be a node in a distributed system or a thread (routine) 86 | in case of concurrent applications. Furthermore, we need to specify what 87 | kind of process faults we consider: no faults, crash-stop faults, 88 | crash-recovery faults, arbitrary (Byzantine) faults, etc. 
In addition to 89 | kind of faults we need also to specify a maximum bound (number) of faulty 90 | processes we assume: less than majority, less than one third, all 91 | connected processes could be faulty, etc. 92 | - definitions of communications between processes: message passing, 93 | shared memory, remote procedure calls, channels (blocking, 94 | unblocking, bounded, etc), etc. 95 | - synchrony assumptions on the process and network speed: synchronous 96 | (there is a known upper bound on the process/network speed), 97 | asynchronous (no assumption is made on upper bound on the process/network 98 | speed), partially synchronous (system in between synchronous and 99 | asynchronous, i.e., system is eventually synchronous, or transitions between 100 | periods of asynchrony and synchrony). 101 | - safety and liveness properties of the problem in the given model 102 | (these are the properties that an algorithm must fulfill to be able to 103 | solve the problem in the given system model). 104 | 105 | ##### Expected Expertise 106 | 107 | - **lead:** distributed algorithm designer 108 | - **input/feedback:** distributed systems engineers, 109 | verification engineers 110 | 111 | ##### Artifacts 112 | 113 | - High-level English specification covering aspects mentioned above 114 | (processes, communication channels, synchrony assumptions, safety and 115 | liveness properties). 116 | 117 | ##### Verification, Validation, or Proof Obligation 118 | 119 | - Informal check that the problem refinement in the given model 120 | corresponds to the top level problem definition. 121 | 122 | #### 2.2 Algorithm (Protocol) Specification 123 | 124 | This part presents an algorithm (protocol) that solves the problem 125 | in the system model specified in the section 2.1. Algorithm specification 126 | at this level should be at a higher level of abstraction 127 | than a real implementation, and should be seen mainly as part of 128 | the protocol design phase. 
High level algorithm specification should cover: 129 | 130 | - messages exchanged (minimal set of messages needed to express 131 | core algorithm logic) 132 | - core data structures every process maintains (minimal data structures 133 | needed to express core algorithm logic). Importantly, when there is a trade-off 134 | between the clarity and efficiency of a data structure for local computation, 135 | clarity should be chosen. The efficiency concerns should be addressed at 136 | lower levels. 137 | - state machine(s) that defines algorithm transitions and 138 | - protocol invariants. 139 | 140 | Protocol specifications should be at a higher level of abstraction 141 | than real implementations, and they should not introduce concepts that 142 | could differ between implementations. For example, high level protocol 143 | specifications should (if possible) avoid dealing explicitly with timeouts or 144 | efficiency concerns (batching of messages, flow control logic, 145 | DDoS protection mechanisms, etc), concurrency aspects, detailed error 146 | handling, etc. Some of those concerns might be addressed at the lower 147 | specification levels, and some would just appear at the implementation level. 148 | 149 | Depending on the protocol type, we might need to model a complete 150 | distributed system with a set of processes and communication channels 151 | between them (for example consensus), as safety and liveness properties 152 | are global (ie they are about all correct processes). In other cases, where 153 | protocol is more single node oriented (for example fast sync, state sync, 154 | light client, etc), it might be sufficient modelling a single node 155 | (that is service consumer) and all other processes represent the environment. 156 | This should simplify modelling and make model checking more efficient. 157 | It also makes it simpler to consider multiple environments to be able to 158 | model weaker or stronger adversaries in a more modular way. 
159 | 160 | 161 | ##### Expected Expertise 162 | 163 | - **lead:** distributed algorithm designer and/or verification engineer 164 | - **input/feedback:** verification engineers and distributed systems 165 | engineers 166 | 167 | ##### Artifacts 168 | 169 | - English description of the protocol 170 | - TLA+ specification of the protocol 171 | - TLA+ specification of properties (expressed in English in the section 172 | 2.1) as invariants and temporal properties 173 | 174 | NOTE: The TLA+ protocol specification should be considered the definitive 175 | source of truth. However, the English description of the protocol should 176 | provide the reader with the intuition behind the core protocol mechanisms, 177 | and therefore reduce the time needed to understand the TLA+ specification. 178 | The recommended practice is to map TLA+ constructs to the English explanation 179 | (either by including description snippets as comments in TLA+ specs, or 180 | by referencing parts of the English description from the comments in 181 | the TLA+ specification). 182 | 183 | - Abstract test scenarios: generation of (relevant) abstract 184 | execution scenarios (using model checkers) that can be used to 185 | drive unit and integration tests. 186 | - [Optional] In addition to properties that correspond to protocol 187 | safety and liveness properties, it is also good practice to write a set 188 | of simpler failing properties that are useful for having more 189 | certainty in the correctness of the specification (that concepts are 190 | correctly encoded) and also as a means of generating interesting 191 | test cases (by the model checker). These properties should be prefixed 192 | with ```Witness```, for example ```WitnessPeerSetIsNeverEmpty```. Effectively, 193 | we expect a model checker to produce a counterexample to our property. 194 | As we do not expect the property to hold true, the model checker produces a 195 | witness to the negation of the property. 
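The idea behind `Witness` properties can be illustrated outside TLA+ with a small explicit-state search. The sketch below is only an illustration under stated assumptions: the two-peer model, the helper `find_counterexample`, and all other names are hypothetical and are not part of any Tendermint specification or real model checker. It checks a property that we expect to fail and returns a shortest trace leading to a violation, which is the kind of trace one could turn into a test scenario.

```python
from collections import deque

PEERS = ("p1", "p2")  # hypothetical peer identifiers


def next_states(state):
    """Yield (action, successor) pairs of a toy peer-set model."""
    for p in PEERS:
        if p in state:
            yield (f"remove {p}", state - {p})
        else:
            yield (f"add {p}", state | {p})


def witness_peer_set_is_never_empty(state):
    """A deliberately failing 'Witness' property: we expect a violation."""
    return len(state) > 0


def find_counterexample(init, prop):
    """Breadth-first search for a shortest trace ending in a state violating prop."""
    queue = deque([(init, [("init", init)])])
    seen = {init}
    while queue:
        state, trace = queue.popleft()
        if not prop(state):
            return trace
        for action, succ in next_states(state):
            if succ not in seen:
                seen.add(succ)
                queue.append((succ, trace + [(action, succ)]))
    return None  # the property holds in every reachable state


trace = find_counterexample(frozenset({"p1"}), witness_peer_set_is_never_empty)
for action, state in trace:
    print(action, sorted(state))
```

Run as written, the search reports a two-step trace in which the only peer is removed. For TLA+ specifications, a model checker such as TLC or Apalache plays the role that `find_counterexample` plays here.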
196 | 197 | ##### Verification, Validation, or Proof Obligation 198 | 199 | - Check that the protocol satisfies the invariants and temporal logic 200 | properties 201 | - [Optional] TLA+ reductions / refinement mappings in case higher level 202 | specifications are expressed in TLA+. 203 | 204 | 205 | ### 3. Single Node View 206 | 207 | At this level we think about the specification of the code. Even in 208 | distributed systems, code inherently runs on one computer, so we take the 209 | single node perspective here while the rest of the 210 | system is modelled as the environment. 211 | 212 | The goal of the specification at this level (we call it also low level 213 | specification) is to be as close as possible to the implementation. 214 | Being close to the implementation allows discovery of software 215 | architecture problems and potentially implementation issues. 216 | Depending on the implementation complexity the specification at this 217 | level should contain: 218 | 219 | - state machine of the single process that maps high level protocol 220 | transitions to the node API (input/output events). This state machine 221 | should normally contain more complex data structures (compared to 222 | the one from the section 2.2), new concepts and mechanisms 223 | (implementation specific) that could be either efficiency motivated or 224 | consequence of programming language environment, and detailed error 225 | handling logic. 226 | - multiple state machines in case a node implementation actually compose 227 | of multiple concurrent processes (tasks). In this case, in addition to 228 | the elements already mentioned (the single task case), we also need to 229 | model concurrency architecture of the solution, defining additional 230 | invariants and temporal properties for the concurrency architecture 231 | (for example that the solution does not have deadlocks). 
Note that in 232 | this case, we might need to write multiple TLA+ specifications 233 | that will be analysed and model checked in isolation, together with the 234 | complete solution that would correspond to the overall node implementation. 235 | 236 | 237 | #### Expected Expertise 238 | 239 | - **lead:** distributed systems engineer 240 | - **input/feedback:** distributed protocol designer, verification engineer 241 | 242 | Note that at this stage of development we expect engineers to lead 243 | by proposing implementation informed by the high level specification 244 | (section 2), and formal specification and verification should be seen 245 | as supportive tools to discover implementation level design issues, 246 | discover bugs, and generate interesting abstract test scenarios; in general 247 | to increase confidence in the code correctness. 248 | 249 | Although in general, engineers are not constrained in the way system 250 | will be implemented, we expect verification engineers to inform 251 | some implementation decisions from the verifiability perspective. 
252 | Some known good engineering practices that align well with formal tools 253 | are: 254 | 255 | - modular design (reduce complexity by splitting implementation into 256 | smaller modules with clear and simple responsibilities that can be 257 | designed, implemented, tested and verified in isolation) 258 | - reactive programming (core business logic should be expressed as a 259 | function (for example state machine) which based on input state and events 260 | generate new state and output events, i.e., avoid side effects like 261 | state mutation) 262 | - simpler concurrency architectures (concurrent tasks are owners of 263 | particular concepts and communication between tasks happen explicitly 264 | by exchanging events, i.e., avoid communication between different tasks 265 | over shared data and locks) 266 | - language-based verification via sound, parametric type systems (see for example 267 | https://www.seas.upenn.edu/~sweirich/papers/foser10.pdf for more details). 268 | 269 | #### Artifacts 270 | 271 | - Code 272 | - TLA+ specification of the state machines that correspond to code artefact 273 | - TLA+ specification of properties introduced at the 274 | implementation level (for example concurrency related) expressed as 275 | invariants and temporal properties 276 | - Abstract test scenarios: generation of (relevant) abstract 277 | execution scenarios (using model checkers) that can be used to 278 | drive unit and integration tests. 279 | - [Optional] Simpler failing properties that are useful for having more 280 | certainty in the correctness of the specification (that concepts are 281 | correctly encoded) and also as a mean of generating interesting 282 | test cases (by the model checker). These properties should be prefixed 283 | with Witness. 
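As a rough illustration of how these artifacts fit together, the sketch below writes core logic as a pure transition function, in the spirit of the reactive-programming practice listed above, and replays a short abstract scenario against it. The `State` fields, the event names, and the `replay` helper are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class State:
    """Immutable node state (hypothetical fields)."""
    height: int = 0
    pending: Tuple[int, ...] = ()


def step(state, event):
    """Pure transition: (state, event) -> (new_state, output_events)."""
    kind, value = event
    if kind == "block_requested":
        new_state = replace(state, pending=state.pending + (value,))
        return new_state, [("send_request", value)]
    if kind == "block_received" and value in state.pending:
        remaining = tuple(h for h in state.pending if h != value)
        new_state = replace(state, height=max(state.height, value), pending=remaining)
        return new_state, [("block_applied", value)]
    # Unknown or unexpected events leave the state unchanged.
    return state, [("ignored", value)]


def replay(state, events):
    """Feed an abstract scenario to step(), collecting all output events."""
    outputs = []
    for event in events:
        state, out = step(state, event)
        outputs.extend(out)
    return state, outputs


scenario = [("block_requested", 1), ("block_received", 1), ("block_received", 7)]
final, outputs = replay(State(), scenario)
print(final, outputs)
```

Because `step` has no side effects, the same scenario always yields the same final state and outputs, so a trace produced by a model checker can be compared against the code's behavior step by step.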
284 | 285 | #### Verification, Validation, or Proof Obligation 286 | 287 | - Check that the specification satisfies the invariants and temporal logic 288 | properties 289 | - TLA+ reductions / refinement mappings with respect to the high level 290 | specification 291 | - Check that code passes tests that correspond to (automatically) generated 292 | abstract execution scenarios. 293 | -------------------------------------------------------------------------------- /traceability/tooling_rm_domain.graphml: -------------------------------------------------------------------------------- 1 | [GraphML diagram; labels: Specification, Requirement, Acceptance Criterion, Implementation Artifact, Test Artifact, Defect, Has, Is fulfilled when, Can be automatically proven by, Automatically tests, Is logically related to, Can have, Automatically tests if resolved] -------------------------------------------------------------------------------- /traceability/traceability.md: -------------------------------------------------------------------------------- 1 | # Traceability 2 | 3 | 
**Informal Systems** 4 | 5 | **Version 2** 6 | 7 | **Abstract**. This document contains a proposal on implementing 8 | traceability in the process of verification-driven development. Software 9 | developers are familiar with [requirements traceability], which can be 10 | understood as "the ability to describe and follow the life of a 11 | requirement in both a forwards and backwards direction" 12 | \[[Wikipedia][requirements traceability]\]. The [VDD Manifesto] defines 13 | several layers of specifications that have to be linked with the 14 | implementation, unit tests, architectural proposals, etc. We propose a 15 | concrete and lightweight approach that will allow us to implement 16 | traceability without using heavy and expensive tooling. This approach 17 | can be automated with plain text tools, though it can be also 18 | implemented manually. 19 | 20 | ## 1. Process artifacts 21 | 22 | For every distributed protocol, [VDD Manifesto] defines the following 23 | artifacts (this list is incomplete): 24 | 25 | - High level English specification of the problem solved by the 26 | protocol. For instance, a specification of a sequential algorithm. 27 | - System model specification in English. 28 | - Protocol specification in English. 29 | - TLA+ specification of the protocol. 30 | - TLA+ specification of the expected behavior (invariants and temporal 31 | properties). 32 | - Implementation in a language of choice (Golang or Rust). 33 | - TLA+ specification of the state machines that correspond to the 34 | implementation code. 35 | - TLA+ specification of the invariants and temporal properties at the 36 | implementation level. 37 | - Abstract test scenarios. 38 | 39 | It is clear that maintaining cross references between the logical units 40 | (e.g., paragraphs of text, TLA+ operators, Rust functions) in all of 41 | these artifacts is tedious and error prone. 
Maintaining GitHub links is 42 | a fragile solution, as the links expire, get moved across different 43 | branches, etc. 44 | 45 | ## 2. Logical units 46 | 47 | Before we discuss the implementation of the traceability properties, we 48 | classify logical units that should be tagged. These are: 49 | 50 | - **Requirements in English**. We should tag every paragraph (or 51 | several adjacent paragraphs) that defines a single requirement. We 52 | do not have to tag the text that provides the reader with the 53 | motivation, explanation, examples, etc. 54 | 55 | - **TLA+ operators**. We should tag the top-level operators, that is, 56 | the operators that do not have parents in the call graph. 57 | 58 | - **Implementation methods**. We should tag the principal functions 59 | and data structures. The main point is not to tag every single piece 60 | of code, but to tag the code that implements the requirements at 61 | higher levels. At the moment, it is not reasonable to pursue 62 | complete code coverage with tags. 63 | 64 | - **Unit tests**. We should tag every unit test, as they are the main 65 | proof that the requirements have been implemented. 66 | 67 | ## 3. Traceability properties 68 | 69 | We need a solution that satisfies the following properties: 70 | 71 | |TRC-TAG.1| 72 | : Tagging a logical unit should be easy. 73 | 74 | |TRC-REF.1| 75 | : Referencing a logical unit should be easy. 76 | 77 | |TRC-IMPL.1| 78 | : Marking that one logical unit implements another one should be easy. 79 | 80 | |TRC-REV.1| 81 | : Tagging revised logical units should be easy. 82 | 83 | |TRC-GRAPH.1| 84 | : There should be a way to automatically construct the traceability 85 | graph, that is, the graph connecting the logical units via tag 86 | references. 87 | 88 | |TRC-MISS.1| 89 | : There should be a way to automatically identify childless and 90 | orphaned logical units. 91 | 92 | |TRC-GITHUB-REF.1| 93 | : The GitHub references should be updated automatically. 
94 | 95 | |TRC-UNIQ.1| 96 | : Every identifier should be declared only once. 97 | 98 | ## 4. Tag syntax 99 | 100 | ### 4.1. Naming scheme 101 | 102 | |TRC-TAG.1::SYNTAX.1| 103 | : We propose a simple naming scheme for tags. We start with the tags 104 | for top-level requirements and then proceed with the tags of the 105 | logical units that implement lower-level requirements. 106 | 107 | **Zero-level requirement tags.** These are the tags of the requirements 108 | that do not implement any requirement but are the zero-level 109 | requirements themselves. These tags have the form `<NAME>.<REVISION>`. 110 | The component `<NAME>` is a sequence of capital English letters and 111 | (Arabic) digits, possibly separated with a hyphen (-). The component 112 | `<REVISION>` is a sequence of digits, starting with a non-zero digit. 113 | 114 | Example: the above-defined tag **TRC-TAG.1** is a zero-level tag, whose 115 | `<NAME>` component is `TRC-TAG`, whereas the `<REVISION>` component is 116 | `1`. 117 | 118 | **Non-zero-level requirement tags.** These are the tags of the logical 119 | units that implement the higher-level requirements. For instance, these 120 | may be the tags of English paragraphs that define requirements of lower 121 | levels, of TLA+ operators, and of Rust methods. A tag of level `k` is of 122 | the form `<PARENT>::<NAME>.<REVISION>`. The component `<PARENT>` is a 123 | tag of level `k-1`, while `<NAME>` and `<REVISION>` are defined exactly 124 | as in the case of zero-level tags. 125 | 126 | Example: the above-defined tag **TRC-TAG.1::SYNTAX.1** is a tag of 127 | level 1. 128 | 129 | Remark: We choose `::` as a tag separator, as it is familiar to Rust and 130 | C++ programmers. There is no danger of confusing a Rust package name 131 | with a tag, as the tags always come with revision numbers. 132 | 133 | |TRC-TAG.1::SYNTAX.1::SYNONYMY.1| \|TRC-IMPL.1::SYNONYMY.1\| 134 | : Non-zero-level requirement tags (described above) record the 135 | "ancestry" of the higher level units they implement. 
When a logical 136 | unit implements multiple units from a higher level, it has multiple 137 | parents, multiple lines of ancestry, and, consequently, multiple 138 | valid tags. In such cases, we end up with two or more tags which 139 | refer to the same logical unit. These tags are *synonymous* (in 140 | terms of reference, but not sense) so we refer to this as "tag 141 | synonymy". 142 | 143 | A logical unit is tagged with synonymous tags by separating the synonyms 144 | with spaces. Given a unit designated by the leaf tag `T` that implements 145 | `n` ancestor units with paths `path[0]...path[n]`, the unit is tagged 146 | with `|path[0]::T| |path[1]::T| ... |path[n]::T|`. 147 | 148 | Example: The previous logical unit implements both [TRC-TAG.1::SYNTAX.1] 149 | and [TRC-IMPL.1]. As a result, it is tagged with both 150 | [TRC-TAG.1::SYNTAX.1::SYNONYMY.1] and 151 | [TRC-IMPL.1::SYNONYMY.1][TRC-TAG.1::SYNTAX.1::SYNONYMY.1], and so these 152 | two tags are synonymous. However, all its tags end with the single leaf 153 | tag `SYNONYMY.1` 154 | 155 | \|{TRC-TAG.1::SYNTAX.1,TRC-IMPL.1}::SYNONYMY.1::BRACE-EXPANSION.1\| 156 | : [brace expansion] is used for a concise designation of synonyms: 157 | ancestor tag paths are separated by commas and surrounded in curly 158 | braces. Given a unit with the leaf tag `T` that implements `n` 159 | ancestor units with paths `path[0]...path[n]`, the unit is tagged 160 | with `|{path[0],path[1],...,path[n]}::T|`. This is equivalent to the 161 | concatenation of all tag synonyms, as specified in 162 | [TRC-TAG.1::SYNTAX.1::SYNONYMY.1]. 163 | 164 | Example: The previous unit implements both 165 | [TRC-TAG.1::SYNTAX.1::SYNONYMY.1] and 166 | [TRC-IMPL.1::SYNONYMY.1][TRC-TAG.1::SYNTAX.1::SYNONYMY.1], and has the 167 | leaf tag `BRACE-EXPANSION.1`. 
It is tagged using the brace expansion 168 | syntax 169 | `|{TRC-TAG.1::SYNTAX.1,TRC-IMPL.1}::SYNONYMY.1::BRACE-EXPANSION.1|`, 170 | which is equivalent to the sequence of tags 171 | `|TRC-TAG.1::SYNTAX.1::SYNONYMY.1::BRACE-EXPANSION.1| |TRC-IMPL.1::SYNONYMY.1::BRACE-EXPANSION.1|`. 172 | 173 | ### 4.2. Tagging logical units 174 | 175 | |TRC-TAG.1::DEF.2| 176 | : A logical unit should be tagged with a tag according to where the 177 | tag is being created. If the tag is created in a specification, 178 | assuming that the specification is written in Markdown format, the 179 | logical unit must be tagged using [PHP Markdown Extra's Definition 180 | List format] and must be surrounded by pipe symbols (`|`). 181 | 182 | Example: 183 | 184 | ``` {.sourceCode .markdown} 185 | |TRC-TAG.1| 186 | : This is the logical unit to which [TRC-TAG.1] refers. Text content related 187 | to the logical unit can span multiple lines. 188 | 189 | Text content related to this logical unit can also span multiple 190 | paragraphs, as long as those paragraphs are indented to align with the text 191 | of the first line of the logical unit following the colon (:). 192 | ``` 193 | 194 | Note that several lower-level requirements may implement the same 195 | higher-level requirement. For instance, the requirements labelled with 196 | [TRC-TAG.1::SYNTAX.1] and [TRC-TAG.1::DEF.2] implement the requirement 197 | [TRC-TAG.1]. 198 | 199 | ### 4.3. Referring to logical units 200 | 201 | |TRC-REF.1::SYNTAX.2| 202 | : To refer to a tag TAG, one surrounds the tag name with square 203 | brackets, that is, `[TAG]`. The tag syntax is sufficiently unique to 204 | automatically identify a tag in text or in code. 205 | 206 | Example: [TRC-REF.1::SYNTAX.2] is a reference to the tag for this 207 | requirement. 208 | 209 | ### 4.4. Implementing logical units 210 | 211 | |TRC-IMPL.1::PREFIX.1| 212 | : The fact that a logical unit implements another logical unit is 213 | reflected by the tag naming scheme. 
The name of a tag of level `k` 214 | has the form `<PARENT>::<NAME>.<REVISION>`. The name indicates that 215 | the logical unit labelled with the tag implements the logical unit 216 | that is labelled with the tag `<PARENT>`. 217 | 218 | Example: the requirement described in the previous paragraph has the 219 | tag [TRC-IMPL.1::PREFIX.1] that implements the requirement [TRC-IMPL.1]. 220 | 221 | ### 4.5. Revising logical units 222 | 223 | |TRC-REV.1::INC.1| 224 | : Whenever the behavior of a logical unit has been changed -- a 225 | decision made by the maintainer of the logical unit -- the tag 226 | revision should be incremented. The revisions of the parent tags 227 | should stay intact. 228 | 229 | ## 5. Automation 230 | 231 | Tagging logical units requires manual effort from researchers, 232 | verification engineers, and system engineers. As the tagging scheme is 233 | simple and non-intrusive, we believe that the tagging process itself 234 | does not require any automation. However, we do need automation to 235 | keep the logical units and their tags consistent. 236 | 237 | |TRC-GRAPH.1::BUILD.1| 238 | : From the conceptual point of view, it should be easy to construct 239 | the traceability graph. One has to check out all repositories that 240 | contain English specifications, TLA+ specifications, and the 241 | implementation code. Once this is done, the source files should be 242 | grepped for the regular expression that defines a tag, see 243 | [TRC-TAG.1::DEF.2] and [TRC-TAG.1::SYNTAX.1]. Having extracted the 244 | tags, we can immediately build the graph, as the non-zero-level tags 245 | contain the names of the parent tags. (Due to that, the graph edges 246 | do not have to be built at all.) 247 | 248 | Example. The figure below shows the traceability graph of the 249 | requirements that are introduced in this document. 250 | 251 | .----> TRC-TAG.1 <--. 
TRC-REF.1 252 | | | ^ 253 | | | | 254 | TRC-TAG.1::SYNTAX.1 TRC-TAG.1::DEF.2 TRC-REF.1::SYNTAX.2 255 | 256 | .-> TRC-IMPL.1 .-> TRC-REV.1 257 | | | 258 | | | 259 | TRC-IMPL.1::PREFIX.1 TRC-REV.1::INC.1 260 | 261 | |TRC-GRAPH.1::CI.1| 262 | : Obviously, searching for the tags from scratch will take plenty of 263 | time. It is more reasonable to update the tags database when new 264 | commits arrive in a git repository. Such updates could be 265 | implemented with continuous integration tools. 266 | 267 | |TRC-MISS.1::ANALYSIS.1| 268 | : Once we have collected the database of tags (see 269 | [TRC-GRAPH.1::BUILD.1]), we can identify the tags that have the 270 | following properties: 271 | 272 | 1. The tags of the form `PARENT::NAME.REVISION` that do not have the 273 | corresponding `PARENT` tag registered in the database (orphan tags). 274 | 275 | 2. The tags of the form `PARENT` that do not have a single child tag 276 | `PARENT::NAME.REVISION` (childless tags). 277 | 278 | The continuous integration tool should report on the orphan tags and 279 | childless tags. If we do not like to see the leaf tags in the report, we 280 | should introduce a notation for the implementation tags that are not 281 | supposed to have children. 282 | 283 | |TRC-MISS.1::OUTDATED.1| 284 | : The tag revisions may help us in finding outdated implementations. 285 | For instance, let us assume that a function `bar()` is labelled with 286 | the tag `FOO.1::BAR.1`, and `FOO` has been advanced to revision 2, 287 | that is, there is a requirement `FOO.2`, but no requirement `FOO.1`. 288 | In this case, the tool could report that the function `bar()` 289 | implements the outdated requirement `FOO.1`. 290 | 291 | |TRC-GITHUB-REF.1::IMPL.1| 292 | : When collecting the tags in the process of [TRC-GRAPH.1::BUILD.1], 293 | we can record the source location of every tag. Having the source 294 | locations, it is easy to replace a reference to every tag with a 295 | hyperlink (yeah!) 
to the source location. 296 | 297 | |TRC-GITHUB-REF.1::DICT.1| 298 | : If we do not like to update the source code with links to other 299 | repositories, we can build a dictionary of tags and their source 300 | locations. A simple solution would be to create a page that contains 301 | tag names and links to the source locations of their definitions. 302 | 303 | |TRC-UNIQ.1::DUPS.1| 304 | : Again, when collecting the tags in [TRC-GRAPH.1::BUILD.1], we can 305 | check that every tag is defined only once. The continuous 306 | integration tool should report the violations of [TRC-UNIQ.1]. 307 | 308 | |TRC-REV.1::INC.1::TOOL.1| 309 | : In order to manage change, the tool should check whether the content 310 | corresponding to a tag (English, TLA+, code) has changed since the 311 | last commit. The tool should suggest to update to a new version 312 | number. The user then decides whether a new version number is 313 | necessary. If the version number is updated, the tool should show 314 | all references to the old tag. For each of these, the user has to 315 | decide to update the references to the new version, or keep them to 316 | the old one. This functionality should be available to the user to 317 | check the current working branch, but the functionality should also 318 | be part of CI. Also if a new version is introduced, perhaps we can 319 | collect the old versions in a way that is easily accessible. The 320 | information about all tags is already in the repo, but we should 321 | make it easily accessible. 322 | 323 | |TRC-UNIQ.1::BRANCHES.1| 324 | : The approach of [TRC-UNIQ.1::DUPS.1] will report false positives, if 325 | a git repository contains multiple branches. Indeed, multiple 326 | branches may contain the definitions of the same tag that is defined 327 | in a common git commit. In this case, the tag analyser should 328 | analyse the commit history. 
By doing so, the analyser can test 329 | whether the potential duplicates belong to the same commit, and thus 330 | are uniquely defined. 331 | 332 | [requirements traceability]: https://en.wikipedia.org/wiki/Requirements_traceability 333 | [VDD Manifesto]: https://github.com/informalsystems/VDD/blob/master/manifesto/manifesto.md 334 | [TRC-TAG.1::SYNTAX.1]: https://github.com/informalsystems/vdd/blob/master/traceability/traceability.md#TRC-TAG.1::SYNTAX.1 335 | [TRC-IMPL.1]: https://github.com/informalsystems/vdd/blob/master/traceability/traceability.md#TRC-IMPL.1 336 | [TRC-TAG.1::SYNTAX.1::SYNONYMY.1]: https://github.com/informalsystems/vdd/blob/master/traceability/traceability.md#TRC-TAG.1::SYNTAX.1::SYNONYMY.1 337 | [brace expansion]: https://www.gnu.org/software/bash/manual/html_node/Brace-Expansion.html 338 | [PHP Markdown Extra's Definition List format]: https://michelf.ca/projects/php-markdown/extra/#def-list 339 | [TRC-TAG.1::DEF.2]: https://github.com/informalsystems/vdd/blob/master/traceability/traceability.md#TRC-TAG.1::DEF.2 340 | [TRC-TAG.1]: https://github.com/informalsystems/vdd/blob/master/traceability/traceability.md#TRC-TAG.1 341 | [TRC-REF.1::SYNTAX.2]: https://github.com/informalsystems/vdd/blob/master/traceability/traceability.md#TRC-REF.1::SYNTAX.2 342 | [TRC-IMPL.1::PREFIX.1]: https://github.com/informalsystems/vdd/blob/master/traceability/traceability.md#TRC-IMPL.1::PREFIX.1 343 | [TRC-GRAPH.1::BUILD.1]: https://github.com/informalsystems/vdd/blob/master/traceability/traceability.md#TRC-GRAPH.1::BUILD.1 344 | [TRC-UNIQ.1]: https://github.com/informalsystems/vdd/blob/master/traceability/traceability.md#TRC-UNIQ.1 345 | [TRC-UNIQ.1::DUPS.1]: https://github.com/informalsystems/vdd/blob/master/traceability/traceability.md#TRC-UNIQ.1::DUPS.1 346 | -------------------------------------------------------------------------------- /traceability/tooling.md: -------------------------------------------------------------------------------- 1 | # Traceability
Tooling 2 | 3 | This proposal is intended to support and help inform the [Traceability 4 | Proposal](./traceability.md) for Informal Systems. 5 | 6 | ## Background 7 | 8 | ### Requirements Management 9 | 10 | [Requirements management][1] is a broad and deep area of interest in software 11 | engineering. This breadth and depth is most clearly evident in safety-critical 12 | software engineering efforts, such as in aerospace and military application 13 | development, where mistakes can cost lives. 14 | 15 | Notable examples of systems engineering standardization efforts that include 16 | requirements management: 17 | 18 | * [ISO/IEC 12207][2], which replaced [MIL-STD-498][3] for military software 19 | engineering. 20 | * [DO-178C][4] for aerospace systems engineering, which embraces the use of 21 | formal methods for verification. 22 | 23 | ### Existing Technologies 24 | 25 | Many technologies already exist to assist in requirements management. Some 26 | prominent examples include: 27 | 28 | * [IBM Engineering Requirements Management DOORS][6] 29 | * Video: [IBM Rational DOORS Objects][7] 30 | * Video: [IBM Rational DOORS Attributes][8] 31 | * Video: [IBM Rational DOORS Linking and Traceability][9] 32 | * [Product Documentation][10] 33 | * [Eclipse Capra][17] (open source) 34 | * [Traceability demonstration][20] 35 | * [Video demonstration][18] 36 | * [Modern Requirements][11] 37 | * [Pearls][15] 38 | * [Visure Requirements][16] 39 | * [Jama Connect][13] 40 | * [Doorstop][22] (open source) 41 | 42 | Additional potentially useful/interesting links: 43 | 44 | * It seems as though [GitLab is in the process of implementing some kind of 45 | requirements tracking functionality][23] 46 | * [Requirements Interchange Format][19] 47 | * See the video series [Using Qualified Tools in a DO-178C Development 48 | Process][5]. 
49 | 50 | ### Domain Analysis 51 | 52 | When looking at the existing software packages available for requirements 53 | management, it seems, at a high level, that the following primary concepts are 54 | employed: 55 | 56 | * **Specifications**, which: 57 | * contain requirements, 58 | * have attributes/metadata, and 59 | * have relationships to other specifications, where these relationships are 60 | directional and can have other arbitrary attributes associated with them. 61 | * **Requirements**, which have: 62 | * globally unique identifiers, 63 | * relationships to other requirements, where these relationships are 64 | directional and can have other arbitrary attributes associated with them, 65 | * other arbitrary attributes, such as priority and status (e.g. "Accepted", 66 | "Rejected"). 67 | * **Implementation artifacts**, such as source code, which implement 68 | requirements. 69 | * **Acceptance criteria**, which: 70 | * have globally unique identifiers, 71 | * relate to specific requirements, and 72 | * specify what is necessary to consider a requirement as being met. 73 | * **Defects**, which 74 | * relate to specific requirements, and 75 | * are either present or absent. 76 | * **Test artifacts**, which, when executed successfully, automatically verify 77 | that either 78 | * specific acceptance criteria are met, or 79 | * a specific defect has been corrected. 80 | 81 | This is illustrated in the following diagram. 82 | 83 | ![High-Level Requirements Management Domain](./tooling_rm_domain.png) 84 | 85 | ### Versioning 86 | 87 | It seems as though most requirements management tooling implements their own 88 | complex versioning system on top of their domain model. If, however, one had to 89 | devise a scheme that allows people to specify the domain objects in plain text 90 | such that it could be stored in a VCS like Git, some aspects of this versioning 91 | would be available for free. 
92 | 93 | ### Visualization 94 | 95 | Most requirements management tools provide various ways of visualizing 96 | requirements: 97 | 98 | * [Traceability matrices][21] 99 | * Traceability graphs 100 | * Hierarchically structured documents (e.g. Microsoft Word documents with 101 | different levels of requirements specified at different heading levels and 102 | paragraph text) 103 | 104 | ### Workflow 105 | 106 | Many of the existing tools seem as though they require that one use their user 107 | interfaces to capture requirements and implementation status, as well as to 108 | update requirement versions. Most seem to be able to export or import 109 | information to/from other similar or related systems. 110 | 111 | Collaborative capabilities, where they exist, seem embedded inside the user 112 | interfaces of the requirements management software. It is not clear whether this 113 | is compatible with the collaborative features available through GitHub or 114 | GitLab. 115 | 116 | ## Desired User Experience 117 | 118 | ### Domain 119 | 120 | ![VDD Requirements Management Domain](./tooling_domain.png) 121 | 122 | **Note the following:** 123 | 124 | * We do not formally define **acceptance criteria** at present. For practical 125 | purposes, we assume that our requirements are acceptance criteria as well at 126 | present. This may change in future. 127 | * While we regard the TLA+ models as a form of specification, in the terminology 128 | of requirements management the models would technically be considered a form 129 | of implementation artifact (just like a simulation would be). This distinction 130 | does not practically affect our desired traceability. 
131 | * With regard to **defects**: 132 | * The term *defect* implies that there is either an existing model or existing 133 | implementation and test code that relate to one or more requirements, but it 134 | has been shown that the model and/or implementation and/or test code is 135 | insufficient to meet the requirement. 136 | * We distinguish between *defects* and *bugs*, where a defect relates to one 137 | or more requirements in the specifications, but a bug does not (e.g. a 138 | misspelled field name in a JSON-RPC response). 139 | 140 | ### Stakeholders 141 | The requirements traceability tooling needs to support the following users: 142 | 143 | * **Clients**, who 144 | * help define and refine the requirements for their project, 145 | * want to know how the project is progressing and 146 | * want to know whether their requirements have been met. 147 | * **Researchers**, who 148 | * help refine project requirements, 149 | * write problem, system and protocol specifications (in English), 150 | * implement and validate models of specifications (e.g. in TLA+), and 151 | * implement and validate models (e.g. in TLA+) that correspond to 152 | implementation code. 153 | * **Engineers**, who 154 | * help refine project requirements, 155 | * implement software according to the requirements, models and acceptance 156 | criteria, and 157 | * write tests according to the requirements and acceptance criteria. 158 | 159 | ### Use Cases 160 | 161 | Use cases for each stakeholder for the tooling are outlined below. Derived from 162 | the [traceability properties that need to be 163 | satisfied](./traceability.md#3-traceability-properties). 
164 | 165 | * **All users** need to be able to 166 | * collaboratively define and refine requirements for their projects; 167 | * see which requirements are 168 | * modelled, 169 | * implemented, 170 | * tested, and 171 | * defective (and see which open GitHub/GitLab issues capture the nature of 172 | the defects); 173 | * see a high-level summary of how far the project is and how quickly it is 174 | progressing, 175 | * **Researchers** additionally need to be able to 176 | * define which requirements must be modelled in TLA+; 177 | * reference specific requirements from their TLA+ models' code; 178 | * see when references in their TLA+ models are out of date (i.e. requirements' 179 | version numbers have been updated, but the references in the TLA+ models 180 | have not); 181 | * easily navigate amongst the specification, models, implementation code, 182 | tests and defects as per the scheme in the figure below. 183 | * **Engineers** additionally need to be able to 184 | * reference specific requirements from their implementation and test code; 185 | * reference specific parts of TLA+ models from their implementation and test 186 | code; 187 | * see when references in the code no longer refer to existing requirements 188 | (i.e. if the requirement has been removed); 189 | * see when references in the code are out of date (i.e. requirements' version 190 | numbers have been updated); 191 | * easily navigate amongst the specification, models, implementation code, 192 | tests and defects as per the scheme in the figure below. 193 | 194 | ![Navigation through artifacts](./tooling_nav.png) 195 | 196 | ### Workflow 197 | At present, researchers and engineers collaborate on specifications, models and 198 | implementation through GitHub and GitLab. All artifacts are written in 199 | structured plain text files. 200 | 201 | It would be ideal if we could develop user experiences for all stakeholders that 202 | keeps to the current workflow as far as possible. 
Important workflows to be 203 | considered include the following. 204 | 205 | * **Defect tracking** 206 | 1. An issue is created on GitHub/GitLab. 207 | 2. The issue is tagged as a `defect` to distinguish it from bugs, feature 208 | requests, etc. 209 | 3. This issue must reference one or more requirements that are impacted by the 210 | defect. 211 | 4. If the defect is in the implementation: 212 | 1. a regression test must be implemented to automatically test for when the 213 | defect has been corrected, 214 | 2. the necessary code changes must be made to make the regression test pass. 215 | 5. If the defect is in the model: 216 | 1. the model must be updated, 217 | 2. if necessary, regression tests must be written and implementation must be 218 | updated accordingly. 219 | 6. Once the defect has been corrected, the issue is closed. 220 | 221 | ### Requirement Tagging Ergonomics 222 | It appears as though most requirements management systems use tagging systems 223 | where little to no information about the relevant requirement is encoded in the 224 | tag itself. Sometimes there may be some indication of where in the hierarchy of 225 | requirements or context the specific tag exists. For example: `REQ-001`, 226 | `TST-001`. 227 | 228 | From our [traceability requirements](./traceability.md), it is clear that we 229 | want tagging that has some indication of: 230 | 231 | 1. the name of the specification, 232 | 2. what the specific requirement is about, 233 | 3. where in the hierarchy of requirements a specific requirement is, and 234 | 4. the version of the requirement. 235 | 236 | ## Implementation Recommendations 237 | 238 | ### Existing Software Options 239 | 240 | It does not seem as though any existing requirements management software is an 241 | appropriate fit for the current team workflow. 
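The four pieces of information listed under "Requirement Tagging Ergonomics" (specification name, subject, position in the hierarchy and version) can all be read back out of a tag such as `HW.1::OUTPUT.1::HELLO.1`. The following is a minimal Go sketch of such a parser, not a definitive implementation. It assumes that segments are separated by `::` and that each segment has the shape `NAME.REVISION` with a positive integer revision. The authoritative grammar is the one in `traceability.md` (`TRC-TAG.1::SYNTAX.1`), and the names `Segment` and `ParseTag` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Segment is one NAME.REVISION component of a tag (hypothetical type).
type Segment struct {
	Name     string
	Revision int
}

// ParseTag splits a tag such as "HW.1::OUTPUT.1::HELLO.1" into its segments.
// Assumption: segments are separated by "::" and each one ends in ".<integer>".
func ParseTag(tag string) ([]Segment, error) {
	var segs []Segment
	for _, part := range strings.Split(tag, "::") {
		dot := strings.LastIndex(part, ".")
		if dot <= 0 || dot == len(part)-1 {
			return nil, fmt.Errorf("malformed segment %q in tag %q", part, tag)
		}
		name := part[:dot]
		// Reject stray separators, e.g. ":::" or a dot left inside the name.
		if strings.ContainsAny(name, ":. \t") {
			return nil, fmt.Errorf("malformed name %q in tag %q", name, tag)
		}
		rev, err := strconv.Atoi(part[dot+1:])
		if err != nil || rev < 1 {
			return nil, fmt.Errorf("bad revision in segment %q of tag %q", part, tag)
		}
		segs = append(segs, Segment{Name: name, Revision: rev})
	}
	return segs, nil
}

func main() {
	segs, err := ParseTag("HW.1::OUTPUT.1::HELLO.1")
	if err != nil {
		panic(err)
	}
	// The first segment names the specification, the number of parents gives
	// the depth in the hierarchy, and the last segment carries the
	// requirement's own name and revision.
	last := segs[len(segs)-1]
	fmt.Println(segs[0].Name, len(segs)-1, last.Name, last.Revision)
}
```

A parser of this kind would also flag slips such as a stray dot before `::` or a triple colon, which are easy to introduce when tags are typed by hand.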
242 | 243 | ### Text Formats for Specifications 244 | 245 | If we make use of GitHub as our primary platform for development and 246 | collaboration on specifications, we could either: 247 | 248 | 1. Write out specifications in [GitHub-Flavored Markdown][24]. 249 | 2. Write out specifications in a different format that meets our criteria 250 | (perhaps an extension of Markdown) and then separate the means by which we 251 | write the specifications (i.e. by way of collaborative editing on GitHub) 252 | from how we read them (e.g. by way of a static site, or pre-produced PDF 253 | files). 254 | 255 | The most versatile approach that would lend itself to both usability and 256 | extension would probably be option (2) above. 257 | 258 | ### Artifact Types 259 | 260 | It is clear from our [domain](#domain) that we have distinct artifact **types** 261 | as inputs to our tooling: 262 | 263 | * Natural language specifications 264 | * Models 265 | * Code 266 | * Defects 267 | * Configuration 268 | 269 | It is not clear yet whether dividing components (e.g. the natural language 270 | specifications and code) into sub-types would provide benefits, but we must be 271 | open to this possibility. 272 | 273 | ### Recommended Architecture 274 | 275 | The recommended architecture for our implementation is very similar to a static 276 | web site generator, which operates on a set of plain text input files to produce 277 | a rich, hyperlinked static web site. 278 | 279 | ![Recommended architecture](./tooling_arch.png) 280 | 281 | There are two logical applications (which could, practically, be the same 282 | application, or they could be two separate applications) in the recommended 283 | architecture above: 284 | 285 | 1. A **traceability compiler**, which takes as input all of the text of the 286 | specifications, models, code, GitHub/GitLab issues (specifically those 287 | labelled as "defects") and some configuration. 
The primary output of this 288 | compiler is an *intermediate representation*. It is recommended here to use 289 | [JSON-LD](https://json-ld.org/). 290 | 2. A **static web site generator**, which takes as input the *intermediate 291 | representation* as well as some configuration. The output of this would be a 292 | static web site comprising: 293 | 1. A rich, navigable catalogue of all of the related specifications, models, 294 | code and tests. 295 | 2. Rich summary reports that detail things like implementation coverage (i.e. 296 | how much of the specification has been implemented), amongst other 297 | important metrics. 298 | 299 | ### Execution Environment Configuration 300 | 301 | The inputs for the recommended tooling are most likely going to be spread across 302 | multiple Git repositories. Artifacts (i.e. the intermediate representation and 303 | static web site) will probably need to be produced in two environments: 304 | 305 | 1. Locally, as one is making modifications to one or more of the various 306 | repositories. 307 | 2. In CI, as changes are pushed and merged to different branches. 308 | 309 | Naturally, the locations of repositories will differ between the two 310 | environments. In the case of local execution, the locations of the repositories 311 | on the local machine will also potentially vary from machine to machine 312 | depending on personal user preferences. 313 | 314 | For each project, we therefore need two types of configuration: 315 | 316 | 1. Per-repository configuration. In this case, the repository in which the 317 | specification(s) live could act as the primary entrypoint for the tooling to 318 | discover related repositories and to be able to interpret the different input 319 | artifacts according to their [type](#artifact-types). Depending on the nature 320 | of the project, one could have all inputs co-located in the same repository, 321 | but this need not be the case. 322 | 2. 
Local configuration, mapping specific directories to Git remotes. As an optimization here 323 | we may employ a convention where, if repositories are organized in a 324 | particular way, the tooling will automatically discover all of the necessary 325 | repositories to be able to produce the output artifacts. 326 | 327 | ### Sample Referencing Approach 328 | 329 | The following outlines a trivial example of how one could capture the traces 330 | between specifications and implementation, using [TRC-IMPL.1::PREFIX.1][]. 331 | 332 | #### Sample Specification 333 | 334 | ```markdown 335 | # Hello World 336 | 337 | |HW.1| 338 | : A program to greet the world. 339 | 340 | ## User Input 341 | 342 | |HW.1::INPUT.1| 343 | : The software must be able to handle specified forms of input. 344 | 345 | |HW.1::INPUT.1::ASKNAME.1| 346 | : The software must ask the user for their name, and receive the input given in 347 | response. 348 | 349 | ## Output 350 | 351 | |HW.1::OUTPUT.1| 352 | : The software must produce certain outputs, based on the [HW.1::INPUT.1]. 353 | 354 | |HW.1::OUTPUT.1::HELLO.1| 355 | : Once the software has asked for the user's name, it must print the text 356 | "Hello {user name}!". 357 | 358 | The `{user name}` part of the output string must be replaced with the name of 359 | the user obtained in [HW.1::INPUT.1::ASKNAME.1]. 360 | 361 | |HW.1::OUTPUT.1::FOUNDHIM.1| 362 | : If the user's name is "Waldo", in addition to saying hello to the user as per 363 | [HW.1::OUTPUT.1::HELLO.1], print the text: "We found you!". 364 | ``` 365 | 366 | #### Sample Code 367 | 368 | ```rust 369 | use std::io; 370 | 371 | /// Checks whether the specified name is "Waldo". 372 | /// 373 | /// This tag applies to the whole function.
374 | /// 375 | /// |HW.1::OUTPUT.1::FOUNDHIM.1::CHECK.1| 376 | fn check_for_waldo(name: &str) { 377 | if name == "Waldo" { 378 | println!("We found you!"); 379 | } 380 | } 381 | 382 | /// |HW.1::MAIN.1| 383 | fn main() { 384 | let mut name = String::new(); 385 | 386 | // The following tag applies only to the line directly underneath it. 387 | // 388 | // |HW.1::INPUT.1::ASKNAME.1::IMP.1| 389 | println!("Hi there! What's your name?"); 390 | 391 | // |HW.1::INPUT.1::READ-LINE.1| 392 | match io::stdin().read_line(&mut name) { 393 | Ok(_) => { 394 | let name_trimmed = name.trim(); 395 | // |HW.1::OUTPUT.1::HELLO.1::IMP.1| 396 | println!("Hello {}!", name_trimmed); 397 | check_for_waldo(name_trimmed); 398 | }, 399 | Err(e) => panic!("failed to read name: {}", e), 400 | } 401 | } 402 | ``` 403 | 404 | [1]: https://en.wikipedia.org/wiki/Requirements_management 405 | [2]: https://en.wikipedia.org/wiki/ISO/IEC_12207 406 | [3]: https://en.wikipedia.org/wiki/MIL-STD-498 407 | [4]: https://en.wikipedia.org/wiki/DO-178C 408 | [5]: https://www.mathworks.com/videos/series/using-qualified-tools-in-a-do-178c-development-process.html 409 | [6]: https://www.ibm.com/us-en/marketplace/requirements-management 410 | [7]: https://www.youtube.com/watch?v=GAY9Xq1dcsU&list=PLFB5C518530CFEC93&index=1 411 | [8]: https://www.youtube.com/watch?v=zfp5DEisdcs&list=PLFB5C518530CFEC93&index=2 412 | [9]: https://www.youtube.com/watch?v=2tN_cVQP214&list=PLFB5C518530CFEC93&index=4 413 | [10]: https://www.ibm.com/support/knowledgecenter/SSYQBZ_9.7.1/com.ibm.doors.requirements.doc/helpindex_doors.html 414 | [11]: https://www.modernrequirements.com/ 415 | [12]: https://reqtest.com/ 416 | [13]: https://www.jamasoftware.com/platform/jama-connect/ 417 | [14]: https://www.xebrio.com/requirements-management-software 418 | [15]: https://pearls-inc.com/products/pearls-lite-version/ 419 | [16]: https://visuresolutions.com/requirements-management-tool 420 | [17]: https://projects.eclipse.org/projects/modeling.capra 421 | [18]:
https://www.youtube.com/watch?v=h6qntRT33gM 422 | [19]: https://www.omg.org/spec/ReqIF/ 423 | [20]: https://www.youtube.com/watch?v=XRtLs5OT_yM 424 | [21]: https://en.wikipedia.org/wiki/Traceability_matrix 425 | [22]: https://doorstop.readthedocs.io/en/latest/ 426 | [23]: https://gitlab.com/groups/gitlab-org/-/epics/2703 427 | [24]: https://github.github.com/gfm/ 428 | 429 | [TRC-IMPL.1::PREFIX.1]: ./traceability.md#TRC-IMPL.1::PREFIX.1 430 | -------------------------------------------------------------------------------- /lightclient/failuredetector.md: -------------------------------------------------------------------------------- 1 | ***This is the beginning of an unfinished draft. Comments are welcome!*** 2 | 3 | # Fork detector 4 | 5 | A fork detector (or detector for short) is a mechanism that expects as 6 | input a header with some height *h*, connects to different Tendermint 7 | full nodes, requests the header of height *h* from them, and then 8 | cross-checks the returned headers against the input header. 9 | 10 | There are two foreseeable use cases: 11 | 12 | 1) to strengthen the light client: If a light client accepts a header 13 | *hd* (after performing skipping or sequential verification), it can 14 | use the detector to probe the system for conflicting headers and 15 | increase the trust in *hd*. Instead of communicating with a single 16 | full node, communicating with several full nodes shall increase the 17 | likelihood of becoming aware of a fork (see [[accountability]] for 18 | discussion about forks) in case there is one. 19 | 20 | 2) to support fork accountability: In the case when more than 1/3 of the voting power is held by faulty validators, faulty nodes may generate two conflicting headers for the same height. The goal of the detector is to learn about the conflicting headers by probing different full nodes. Once a detector has two conflicting headers, these headers are evidence of misbehavior.
A natural extension is to use the detector within a monitor process (on a full node) that calls the detector on a sample of (or all) headers (in parallel). (If the sample is chosen at random, this adds a level of probabilistic reasoning.) If conflicting headers are found, they are evidence that can be used for punishing processes. 21 | 22 | In this document we will focus on strengthening the light client, and leave other uses of the detection mechanism (e.g., when run on a full node) to the future. 23 | 24 | This document refers to the light client ADR [[lightclient]]. If you are not 25 | familiar with the light client, the ADR gives a good overview. 26 | 27 | ## Context of this document 28 | 29 | The light client verification specification [[verification]] is 30 | designed for the Tendermint failure model (1/3 assumption) 31 | [TMBC-FM-2THIRDS]. It is safe under this assumption, and live 32 | if it can communicate reliably (that is, no message loss, no duplication, and 33 | eventual delivery) and in a timely manner with a correct full node. If 34 | this assumption is violated, the light client can be fooled into trusting a 35 | header that was not generated by Tendermint consensus. 36 | 37 | This specification, the fork detector, is a "second line of defense", in case the 1/3 assumption is violated. Its goal is to collect evidence. However, it is impractical to probe all full nodes. At this time we consider a simple scheme of maintaining an address book of known full nodes from which a small subset (e.g., 4) are chosen initially to communicate with. More involved bookkeeping with probabilistic guarantees can be considered at later stages of the project. 38 | 39 | The [light client](lightclient) maintains a simple address book 40 | containing addresses of full nodes that it can pick as primary and 41 | secondaries.
42 | To obtain a new header, the light client first does [verification](verification) 43 | with the primary, and then cross-checks the header 44 | with the secondaries using this specification. 45 | 46 | ### Informal Problem statement 47 | 48 | > We attach tags to the informal problem statements as there is no sequential 49 | > specification. 50 | 51 | The following requirements are operational in that they describe how things 52 | should be done, rather than what should be done. They capture the intuition 53 | after the [light client ADR](lightclient) had been written. However, they do not constitute 54 | temporal logic verification conditions. For those, see [LCD-VC-*] below. 55 | 56 | #### **[LCD-IP-STATE]** 57 | 58 | We assume that `State` mentioned in the [light client ADR](lightclient) is 59 | a collection of pairs *(fn,h)* that capture that header *h* was 60 | downloaded from full node *fn*. 61 | 62 | #### **[LCD-IP-Q]** 63 | 64 | Whenever the light client verifier adds a new pair 65 | *(p,h)* containing the primary *p* and a header *h* to *State*, the 66 | detector should query the secondaries by calling `Commit` for height 67 | *h.Height* remotely. 68 | 69 | 70 | #### **[LCD-IP-RespOK]** 71 | 72 | If a header *h'*, returned by the secondary *s*, is 73 | equal to *h*, the detector adds *(s,h)* to *State*. 74 | 75 | 76 | *Remark:* This information might later be useful in case we find a 77 | problem when we get another header for this height from a different 78 | secondary. TODO: shall we keep that? Or is it sufficient to only store 79 | headers in `State`? 80 | 81 | 82 | #### **[LCD-IP-RespBad]** 83 | 84 | Otherwise, that is, if *h'* returned by *s* is 85 | different from *h*, the detector has to analyze the situation. If the detector 86 | can prove a fork on the main chain by performing the bisection protocol with *s*, it stops the 87 | light client and submits evidence.
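The interplay of [LCD-IP-Q], [LCD-IP-RespOK] and [LCD-IP-RespBad] can be sketched in Go as follows. This is only an illustration under simplifying assumptions: `Header`, `Pair`, `CommitFn` and `CrossCheck` are hypothetical names, headers are compared by plain struct equality, and the bisection-based fork proof required by [LCD-IP-RespBad] is left out (the sketch merely collects the secondaries whose answer differs).

```go
package main

import "fmt"

// Header is a simplified stand-in for a Tendermint header (assumption).
type Header struct {
	Height int64
	Hash   string
}

// Pair records that a header was downloaded from a full node [LCD-IP-STATE].
type Pair struct {
	Node   string
	Header Header
}

// CommitFn stands in for the remote Commit call; the signature is hypothetical.
type CommitFn func(addr string, height int64) (Header, error)

// CrossCheck queries every secondary for the header at h.Height [LCD-IP-Q].
// Matching responses are appended to state [LCD-IP-RespOK]. Secondaries that
// return a different header are collected for further analysis [LCD-IP-RespBad].
func CrossCheck(h Header, secondaries []string, commit CommitFn, state []Pair) ([]Pair, []string) {
	var conflicting []string
	for _, s := range secondaries {
		got, err := commit(s, h.Height)
		if err != nil {
			// No usable response; this need not be the fault of the full node.
			continue
		}
		if got == h {
			state = append(state, Pair{Node: s, Header: h})
		} else {
			conflicting = append(conflicting, s)
		}
	}
	return state, conflicting
}

func main() {
	h := Header{Height: 10, Hash: "abc"}
	// Fake RPC: secondary "s2" answers with a different header.
	commit := func(addr string, height int64) (Header, error) {
		if addr == "s2" {
			return Header{Height: height, Hash: "xyz"}, nil
		}
		return Header{Height: height, Hash: "abc"}, nil
	}
	state, conflicting := CrossCheck(h, []string{"s1", "s2"}, commit, nil)
	fmt.Println(len(state), conflicting)
}
```

In a real detector, each entry in `conflicting` would trigger the analysis step (bisection with that secondary) before any evidence is submitted.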
88 | 89 | 90 | #### **[LCD-IP-PEERSET]** 91 | 92 | Whenever the detector observes misbehavior of a full node from 93 | the set of Secondaries it should be replaced by a fresh full node. (A full node 94 | that has not been primary or secondary before). 95 | 96 | 97 | 98 | ## Assumptions/Incentives/Environment 99 | 100 | It is not in the interest of faulty full nodes to talk to the 101 | detector as long as the detector is connected to at least one 102 | correct full node. This would only increase the likelihood of 103 | misbehavior being detected. Also we cannot punish them easily 104 | (cheaply). The absence of a response need not be the fault of the full 105 | node. 106 | 107 | Correct full nodes have the incentive to respond, because the 108 | detector may help them to understand whether their header is a good 109 | one. We can thus base liveness arguments of the detector on 110 | the assumptions that correct full nodes reliably talk to the 111 | detector. 112 | 113 | 114 | **Assumptions** 115 | 116 | #### **[LCD-A-CorrFull]** 117 | 118 | At all times there is at least one correct full 119 | node among the primary and the secondary. 120 | 121 | *Remark:* Perhaps [LCD-A-CorrFull] is not needed in the end because 122 | the verification conditions [LCD-VC-*] have preconditions on specific 123 | cases where primary and/or secondaries are faulty. 124 | 125 | #### **[LCD-A-RelComm]** 126 | 127 | Communication between the detector and a correct full node is reliable and bounded in time. 128 | 129 | 130 | ## Problem statement 131 | 132 | The detector gets as input a header at height *h* and 133 | queries the secondaries for their headers. 
Eventually, the 134 | detector should 135 | decide 136 | - whether to report evidence for height *h* 137 | - whether to stop operation at height *h* 138 | 139 | The detector should satisfy the following temporal formulas 140 | 141 | #### **[LCD-VC-INV]** 142 | 143 | If there is no fork at height *h*, and the primary 144 | and the secondaries are correct, then the detector should 145 | never output evidence for height *h* and should not stop at height *h*. 146 | 147 | 148 | #### **[LCD-VC-INV-DONT-STOP]** 149 | 150 | If there is no fork at height *h*, and 151 | the primary is correct, then the detector should never stop 152 | at height *h*. 153 | 154 | 155 | #### **[LCD-VC-LIVE-DONT-STOP]** 156 | 157 | If there is no fork at height *h*, and 158 | the primary is correct, then the detector should eventually 159 | decide to not stop and to not report evidence 160 | at height *h*. 161 | 162 | 163 | #### **[LCD-VC-LIFE-FORK]** 164 | 165 | If there is a fork (two correct full nodes decided on different blocks for the same height), and 166 | - the light client needs to obtain a header of a height *h* that is 167 | affected, 168 | and 169 | - there are two correct full nodes *i* and *j* that are 170 | - on different branches, and 171 | - primary or secondary, 172 | 173 | then the detector eventually outputs evidence for height *h*. 174 | 175 | 176 | 177 | #### **[LCD-REQ-REP]** 178 | 179 | If the detector observes two conflicting headers for height *h*, it should try to verify both. If both are verified it should report evidence. 180 | 181 | If the primary reports header *h* and a secondary reports header *h'*, 182 | and if *h'* can be verified based on common root of trust, then 183 | evidence should be generated; TODO: shall we add "otherwise drop 184 | h' and continue normal operation"? 185 | 186 | *Remark:* By verifying we mean calling `VerifyHeaderAtHeight` from the 187 | [[verification]] specification. 
188 | 189 | ## Definitions 190 | 191 | - A fixed set of full nodes is provided in the configuration upon 192 | initialization. Initially this set is partitioned into 193 | - one full node that is the *primary* (singleton set), 194 | - a set *Secondaries* (of fixed size, e.g., 3), 195 | - a set *FullNodes*. 196 | - A set *FaultyNodes* of nodes that the light client suspects of being faulty; it is initially empty. 197 | - *State* is a set of pairs *(fn,h)* where header *h* has been received from (and possibly verified with) full node *fn*. 198 | - The verifier communicates with the primary [[verification]]. Whenever the verifier successfully verifies a header *h* from the primary *p*, it stores 199 | *(p,h)* 200 | in *State*. 201 | 202 | 203 | #### **[LCD-INV-NODES]:** 204 | The detector shall maintain the following invariants: 205 | - *FullNodes \intersect Secondaries = {}* 206 | - *FullNodes \intersect FaultyNodes = {}* 207 | - *Secondaries \intersect FaultyNodes = {}* 208 | 209 | and the following transition invariant 210 | - *FullNodes' \union Secondaries' \union FaultyNodes' = FullNodes \union Secondaries \union FaultyNodes* 211 | 212 | ## Solution 213 | 214 | 215 | 216 | ### Inter Process Communication 217 | 218 | 219 | 220 | For the purpose of this light client specification, we assume that the 221 | Tendermint Full Node exposes the following functions over 222 | Tendermint RPC: 223 | 224 | 225 | ```go 226 | func Commit(addr Address, height int64) (SignedHeader, error) 227 | ``` 228 | 229 | - Implementation remark 230 | - RPC to full node *n* at address *addr* 231 | - Expected precondition 232 | - header of `height` exists on blockchain 233 | - Expected postcondition 234 | - if *n* is correct: Returns a sound signed header of height `height` 235 | from the blockchain if communication is timely (no timeout) 236 | - if *n* is faulty: Returns a signed header with arbitrary content 237 | (possibly the header is sound despite *n* being faulty) 238 | - Error condition 239
| * if *n* is correct: precondition violated or timeout 240 | * if *n* is faulty: arbitrary error 241 | 242 | #### **[FN-LuckyCase]**: 243 | The full node on which the procedure is called remotely is correct and no timeout occurs at the caller on `Commit`. 244 | 245 | #### **[FN-ManifestFaulty]** 246 | The full node on which the procedure is called remotely is faulty and a faulty header is returned. 247 | 248 | ---- 249 | 250 | ### Auxiliary Functions (Local) 251 | 252 | 253 | ```go 254 | Add_to_state(addr Address, sh SignedHeader) 255 | ``` 256 | - Expected postcondition 257 | - The pair *(addr,sh)* is added to *State* 258 | 259 | 260 | 261 | ```go 262 | still_punishable(sh SignedHeader) (Boolean) 263 | ``` 264 | - Implementation Remark: it might make sense to check whether the unbonding period is 265 | still running although the trusting period is over 266 | TODO: fix the period that should be checked. Something between 267 | trusting period and unbonding period? 268 | - Expected postcondition 269 | - returns true if misbehavior related to *sh* can still be 270 | punished. 
    Can be approximated by *sh.bfttime + unbondingperiod > now*

```go
func Replace_Secondary(addr Address)
```
- Expected precondition
  - *FullNodes* is nonempty
- Expected postcondition
  - addr is moved from *Secondaries* to *FaultyNodes*
  - an address *a* is moved from *FullNodes* to *Secondaries*
- Error condition
  - if precondition is violated

```go
func Report_and_Stop(sh SignedHeader)
```
- Implementation Remark:
  - This function communicates the existence of a fork to the outside
  - It creates the evidence from its local information:
    - all headers of height *sh.height*
    - possibly all the other pairs *(f,h)* from *State* from full
      nodes *f* that were used to find the fork (the primary,
      all involved secondaries)
  - It submits this evidence
  - It flags the light client to stop
- Expected Postcondition
  - It "terminates everything". TODO: should this be described in a nicer
    control flow? How should this be escalated to the whole light client?

#### From the verifier

```go
func VerifyHeaderAtHeight(untrustedHeight int64,
                          trustedState TrustedState,
                          addr Address) (TrustedState, error)
```
- Implementation remark
  - The signature deviates from the current verification spec, which was
    written with bisection against the primary in mind. However,
    we also need bisection with secondaries, so we added the
    address `addr` of the full node with which the light client should do
    bisection. We also changed the return values slightly, as we are
    concurrently working on a more verification-oriented (verification
    as in "model checking") light client verification spec
    (verification as in [[verification]]).
  - *startTime* and *endTime* are the local system time right after
    invocation of `VerifyHeaderAtHeight` and right before the function returns, respectively.
- Expected precondition
  - The field `Time` of the signed header of `trustedState` is within *trustingPeriod* from *startTime*
- Expected postcondition:
  - Returns `(trustedState, OK)` under [**[FN-LuckyCase]**](FN-LuckyCase-link),
    if the signed header of `trustedState`:
    - is the header at height `untrustedHeight` of the blockchain, and
    - was generated within *trustingPeriod* from *endTime*
    - corresponds to `return (trustedState, nil)` in current verification spec.
  - Returns `(trustedState, EXPIRED)` under [**[FN-LuckyCase]**](FN-LuckyCase-link), if
    the signed header of `trustedState`:
    - is the header at height `untrustedHeight` of the blockchain, and
    - was generated before *endTime - trustingPeriod*
    - corresponds to `return (trustedState, ErrHeaderNotWithinTrustedPeriod)` in current verification spec.
- Error conditions
  - precondition violated
  - [**[FN-LuckyCase]**](FN-LuckyCase-link) does not hold
  - [**[FN-ManifestFaulty]**](FN-ManifestFaulty-link) holds

## Solution

Shared data of the light client
- a pool of full nodes *FullNodes* that have not been contacted before
- peer set called *Secondaries*
- primary
- State

The problem is solved by calling the function `ForkDetector` with
a header *hd* that has
just been verified by the verifier as a parameter. *trustedState*
should be "a possibly old"
trusted state to increase the likelihood of detecting a fork.

```go
func ForkDetector(hd Header, trustedState TrustedState) {
    for _, s := range Secondaries {
        sh := Commit(s, hd.height)
        if validateSignedHeaderAndVals(sh, ...) fails {
            // validateSignedHeaderAndVals is defined in [verification]
            // sh is malformed (fails basic validation): *s* is
            // faulty. We replace it in the peer set by a different full node
            Replace_Secondary(s)
            // if this fails, we do not have more full node addresses
            // to talk to. Should we ask for more full nodes?
        } else {
            if hd == sh {
                // header matches. we do nothing
            } else {
                // [LCD-REQ-REP]
                // header does not match. there is a situation.
                // we try to verify sh by querying s
                result := VerifyHeaderAtHeight(sh.height, trustedState, s)
                if result == (sh, OK) {
                    // we verified header sh which is conflicting to hd
                    // there is a fork on the main blockchain. -> call panic
                    // with all the evidence
                    Report_and_Stop(sh)
                } else if result == (sh, EXPIRED) {
                    // we verified header sh which is conflicting to hd
                    // there is a fork on the main
                    // blockchain but trusting period expired. -> if still
                    // within unbonding period do panic
                    if still_punishable(sh) {
                        Report_and_Stop(sh)
                    } else {
                        // try to reproduce the fork with a
                        // later trusted state? If we are lucky,
                        // VerifyHeaderAtHeight returns with OK
                        // TODO: fix what to do here
                    }
                } else {
                    // s might be faulty or unreachable
                    Replace_Secondary(s)
                    // after this Secondaries might be updated: TODO:
                    // decide whether this should imply one more
                    // loop iteration
                }
            }
        }
    }
}
```
- Comments
  - Correctness relies on *hd* having been verified by the verifier.
- Expected precondition
  - trustedState within trustingperiod
  - Secondaries initialized and non-empty
- Expected postcondition
  - satisfies [LCD-VC-INV], [LCD-VC-INV-DONT-STOP],
    [LCD-VC-LIFE-FORK] for height *hd.height*.
  - TODO: perhaps add return values: returns false under the preconditions of [LCD-VC-INV], [LCD-VC-INV-DONT-STOP]
  - TODO: perhaps add return values: returns true otherwise
  - removes faulty secondary if it reports wrong header
- Error condition
  - fails if precondition is violated

## Correctness arguments

> Proof sketches of why we believe the solution satisfies the problem statement.
Possibly giving inductive invariants that can be used to prove the specifications
of the problem statement

#### Argument for [LCD-VC-INV]

- In this case, `Commit` will always return the header from the blockchain
- hd == sh will always be true. `ForkDetector` does nothing

#### Argument for [LCD-VC-INV-DONT-STOP]

- In this case, *hd* is the one from the blockchain
- As there is no fork, no faulty secondary can create a sequence of
  headers that convinces the detector.

TODO: the last point requires pointers to blockchain invariants, and
that if there is no fork, no sequence of proof can be generated

#### Argument for [LCD-VC-LIVE-DONT-STOP]

TODO

#### Argument for [LCD-VC-LIFE-FORK]

Can be proven under the assumption that TrustedState is chosen before
the fork happened.

# References

> links to other specifications/ADRs this document refers to

[[verification]] The specification of the light client verification.

[[lightclient]] The light client ADR [77d2651 on Dec 27, 2019].

[TMBC-FM-2THIRDS-linkVDD]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#**[TMBC-FM-2THIRDS-link]**:

[TMBC-FM-2THIRDS-link]: https://github.com/tendermint/spec/blob/master/spec/consensus/light-client/verification.md

[block]: https://github.com/tendermint/spec/blob/master/spec/blockchain/blockchain.md

[blockchain]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md

[lightclient]: https://github.com/interchainio/tendermint-rs/blob/e2cb9aca0b95430fca2eac154edddc9588038982/docs/architecture/adr-002-lite-client.md

[verificationVDD]: https://github.com/informalsystems/VDD/blob/master/lightclient/failuredetector.md

[verification]: https://github.com/tendermint/spec/blob/master/spec/consensus/light-client/verification.md

[accountability]: https://github.com/tendermint/spec/blob/master/spec/consensus/light-client/accountability.md

--------------------------------------------------------------------------------
/traceability/tooling_arch.graphml:
--------------------------------------------------------------------------------
[GraphML source of the tooling architecture diagram (cf. tooling_arch.png); markup not recoverable. Node labels: Traceability Compiler; English Specification; TLA+ Model; Implementation Code and Tests; Hyperlinked Static Web Site; Hyperlinked Spec / Model / Code; Summary Reports; Configuration; Intermediate Representation; Static Site Generator; Configuration; GitHub / GitLab Issues (Defects). Edge label: Via API call.]

--------------------------------------------------------------------------------
/lightclient/verification.md:
--------------------------------------------------------------------------------
**VDD example for high-level English spec**

TODO: check that reliable communication is defined properly
TODO: authentication

# Core Verification

> Rough outline of what the component is doing and why. 2-3 paragraphs

The light client implements a read operation of a
[header][TMBC-HEADER-link] from the [blockchain][TMBC-SEQ-link], by
communicating with full nodes. As some full nodes may be faulty, this
functionality must be implemented in a fault-tolerant way.

For the purpose of this specification, we assume that the blockchain
is a list of headers, rather than a list of blocks, by
[**[TMBC-HEADER]**][TMBC-HEADER-link].

In the Tendermint blockchain, the validator set may change with every
new block. The staking and unbonding mechanism induces a [security
model][TMBC-FM-2THIRDS-link]: starting at time *Time* of the
[header][TMBC-HEADER-link],
more than two-thirds of the next validators of a new block are correct
for the duration of *TrustedPeriod*. The fault-tolerant read
operation is designed for this security model.

The challenge addressed here is that the light client might have a
block of height *h1* and needs to read the block of height *h2*
greater than *h1*. Checking all headers of heights from *h1* to *h2*
might be too costly (e.g., in terms of energy for mobile devices).
This specification tries to reduce the number of intermediate blocks
that need to be checked, by exploiting the guarantees provided by the
[security model][TMBC-FM-2THIRDS-link].

# Part I - External View

## Context of this document

> mention other components and/or specifications that are relevant for this
spec. Possible interactions, possible use cases, etc.

> should give the reader the understanding in what environment this component
will be used.

In this document we specify the light client verification component, called *Verifier*.
The *Verifier* communicates with a full node.
As full nodes may be faulty, the light client has to check whether
the header it receives coincides with the one generated by Tendermint consensus.
The central features used in this specification are:

- Tendermint blockchain ensures several [soundness properties][blockchain]
  [TMBC-SOUND-?]. If a block does not satisfy these soundness
  properties, it did not originate from the blockchain.
  Verification
  encodes these tests,

- the Tendermint [security model][TMBC-FM-2THIRDS-link] guarantees that there is a set of full
  nodes that represent more than two-thirds of the voting power in the *NextValidators* set, such that the full nodes in this set are correct from
  the time a block is generated until the trusting period has passed.

To do verification checks based on these properties, the two
properties [[TMBC-VAL-CONTAINS-CORR]][TMBC-VAL-CONTAINS-CORR-link] and
[[TMBC-VAL-COMMIT]][TMBC-VAL-COMMIT-link] formalize the checks done
by this specification:
Given a trusted block *tb* and an untrusted block *ub* with a commit *cub*,
one has to check that *cub* is in *PossibleCommit(ub)*, and that *cub*
contains a correct node using *tb*.

## Informal Problem statement

Given a height *h* as an input, the *Verifier* stores a header of
height *h* locally. This header is generated by the Tendermint
[blockchain][blockchain]. In particular, a header that violates one of
the [soundness properties][blockchain] [TMBC-SOUND-?] should never be
stored.

## Sequential Problem statement

#### **[LCV-Seq-Live]**:
The *Verifier* gets as input a height *h*, and eventually stores the
header of height *h* of the blockchain, that is, *chain[h]* [**[TMBC-SEQ]**][TMBC-SEQ-link].

#### **[LCV-Seq-Inv]**:
The *Verifier* never stores a header which is not in the blockchain.

# Part II - Protocol view

## Environment/Assumptions/Incentives

> Introduce distributed aspects

> Timing and correctness assumptions. Possibly with justification that the
assumptions make sense, e.g., it is in the interest of a full node to behave
correctly

The light client *Verifier* communicates with a full node of a Tendermint blockchain.
Full nodes satisfy the following properties:
[**[TMBC-CorrFull]**][TMBC-CorrFull-link],
and [**[TMBC-Auth-Byz]**][TMBC-Auth-Byz-link].

### Incentives

Faulty full nodes may benefit from lying to the light client, by making the
light client accept a block that deviates (e.g., contains additional
transactions) from the one generated by Tendermint consensus.
Users using the light client might be harmed by accepting a forged header.

The [fork detector][failuredetector] of the light client may help the correct full nodes to understand whether their header is a good one.
Hence, in combination with the light client detector, the correct full nodes have the incentive to respond.
We can thus base liveness arguments on the assumption that correct full nodes reliably talk to the light client.

### Assumptions

#### **[LCV-A-FULL]**:
The verifier communicates with a full node. No assumption is made about the full node (it may be correct or faulty).

#### **[LCV-A-Comm]**:
Communication between the light client and a correct full node is
reliable and bounded in time. Reliable communication means that
messages are not lost, not duplicated, and eventually delivered. There
is a (known) end-to-end delay *Delta*, such that if a message is sent
at time *t* then it is received and processed by time *t + Delta*.

#### **[LCV-A-TFM]**:
The Tendermint blockchain satisfies the Tendermint failure model [**[TMBC-FM-2THIRDS]**][TMBC-FM-2THIRDS-link].

## Distributed Problem Statement

> safety specifications / invariants in English

> liveness specifications in English.
Possibly with timing/fairness requirements:
e.g., if the component is connected to a correct full node and communication is
reliable and timely, then something good happens eventually.

> should have clear formalization in temporal logic.

### Design choices

#### **[LCV-D-State]**:
The light client has a local data structure called *State* that
contains headers.
**TODO:** make consistent with detector

#### **[LCV-D-Primary]**:
The light client has a local variable *primary* that contains the Address (ID) of a full node.

#### **[LCV-D-State-Init]**:
State is initialized with *inithead* that was correctly generated by the Tendermint consensus.

### Temporal Properties

#### **[LCV-VC-Inv]**:
It is always the case that every header in *State* was generated by an instance of Tendermint consensus.

#### **[LCV-VC-Live]**:
From time to time, a new instance of the verifier is called with a height *h*. Each instance must eventually terminate. The instance adds a header *hd* with height *h* to *State* if

- the full node (peer) with which the verifier communicates is correct
- *State* contains a header whose age is less than the trusting period.

*Remark*: These definitions imply that if the peer is faulty, a header may or may not be added to *State*. In any case, [**[LCV-VC-Inv]**](#lcv-vc-inv) must hold.

*Remark*: The invariant [**[LCV-VC-Inv]**](#lcv-vc-inv) and the liveness requirement [**[LCV-VC-Live]**](#lcv-vc-live)
allow that headers are added to *State* whose height was not passed
to the verifier (e.g., intermediate headers used in bisection; see below).

*Remark*: In liveness [**[LCV-VC-Live]**](#lcv-vc-live) we use "eventually", while in practice
the header *hd* should be added to *State* before the *trustingPeriod* expires, starting from *hd.Time*.

### Solving the sequential specification

This specification provides a partial solution to the sequential specification.
The *Verifier* solves the invariant of the sequential part

[**[LCV-VC-Inv]**](#lcv-vc-inv) => [**[LCV-Seq-Inv]**](#lcv-seq-inv)

In the case the peer is correct, and there is a recent header in *State*, the verifier satisfies the liveness requirements.

/\ "correct peer"
/\ \E TrustedState in State. TrustedState.SignedHeader.Header.Time >
now - *trustingPeriod*
/\ [**[LCV-A-Comm]**](#lcv-a-comm) /\
[**[TMBC-CorrFull]**][TMBC-CorrFull-link] /\
[**[LCV-VC-Live]**](#lcv-vc-live)
=> [**[LCV-Seq-Live]**](#lcv-seq-live)

## Definitions

> In this section we become more concrete, with basic data types,

> some math that allows to write specifications and pseudo code solution below.
Some variables, etc.

### Data structures

**TODO:** High level explanations of data structures?
**TODO:** *State* is missing. Should be made consistent with detector. We should decide what it contains: e.g., (i) set of headers,
(ii) set of TrustedState, (iii) set of pairs: TrustedState, address of
full node from which the light client downloaded the header

In the following, only the details of the data structures needed for
this specification are given.

#### **[LCV-TRUSTED-STATE]**
A `TrustedState` is a data structure that is used to store data about
correct headers (defined below) in the *State* of the light client.
It has the following fields:
- `SignedHeader`, a [signed header][fullnode-data-structures]
- `ValidatorSet`, a [validator set][blockchain-validator-set]

#### **[LCV-CORRECT-SIGNED-HEADER]**
A signed header *sh* of height *h* is correct, if it coincides with the header at height *h* on the blockchain, and:
- if the `Commit` of *sh* equals the `LastCommit` of height *h+1*
  (canonical commit) [**[TMBC-SOUND-DISTR-LAST-COMM]**][TMBC-SOUND-DISTR-LAST-COMMIT-link], or
- if the `Commit` of *sh* contains signatures of validators of height
  *h* that represent more than two-thirds of the voting power at
  height *h* [**[TMBC-SOUND-DISTR-PossCommit]**][TMBC-SOUND-DISTR-PossCommit-link].

## Solution

> Basic data structures. Simplified, so that we can focus on the distributed
algorithm here. If existing: link to Tendermint data structures, and mentioned
if details were omitted.

> Pseudo code of the solution

The light client verifier has the following configuration parameters:
- *trustThreshold*: a float. Can be used if correctness should be based on more voting power than 1/3.
- *trustingPeriod*: a time duration [**[TMBC-TIME_PARAMS]**][TMBC-TIME_PARAMS-link].
- *clockDrift*: a time duration. Correction parameter dealing with only approximately synchronized clocks.

We start by presenting the function `VerifyHeaderAtHeight`.
This function implements the problem statement and is used in
[**[LCV-VC-Live]**](#lcv-vc-live). Within the light client
architecture, verification is embedded as described in [LCV-TState]
and [LCV-INTF] just below.
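
The three configuration parameters listed at the start of this section could be grouped as in the following sketch. Everything named here is an assumption made for illustration: the type `VerifierConfig`, its methods, and the use of `float64` and integer voting power are not prescribed by this specification. The only rule taken from the specification is that the effective threshold is *max(1/3, trustThreshold)*, as used by `VerifySingle`.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// VerifierConfig is a hypothetical grouping of the verifier's
// configuration parameters.
type VerifierConfig struct {
	TrustThreshold float64       // fraction of trusted voting power required
	TrustingPeriod time.Duration // how long a header can be trusted
	ClockDrift     time.Duration // tolerance for loosely synchronized clocks
}

// EffectiveThreshold returns max(1/3, TrustThreshold), so a configured
// value below one third never weakens the check.
func (c VerifierConfig) EffectiveThreshold() float64 {
	return math.Max(1.0/3.0, c.TrustThreshold)
}

// HasEnoughPower reports whether signedPower is strictly more than the
// effective threshold fraction of totalPower.
func (c VerifierConfig) HasEnoughPower(signedPower, totalPower int64) bool {
	if totalPower <= 0 {
		return false
	}
	return float64(signedPower) > c.EffectiveThreshold()*float64(totalPower)
}

func main() {
	cfg := VerifierConfig{
		TrustThreshold: 0.5,
		TrustingPeriod: 14 * 24 * time.Hour,
		ClockDrift:     10 * time.Second,
	}
	fmt.Println("effective threshold:", cfg.EffectiveThreshold())
	fmt.Println("40 of 90 sufficient:", cfg.HasEnoughPower(40, 90))
	fmt.Println("50 of 90 sufficient:", cfg.HasEnoughPower(50, 90))
}
```

A production implementation would more likely represent the threshold as an exact fraction (numerator and denominator) to avoid floating-point rounding at the boundary.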

#### **[LCV-TState]**:

`VerifyHeaderAtHeight` is called with `trustedState`, whose header is
the header that has maximal height in *State*, and the address *addr*
of the primary (if called from the detector, the address of a secondary).

---

#### **[LCV-INTF]**:
*State* is supposed to be maintained outside of this specification. When
`VerifyHeaderAtHeight` is called, the signed header of the `trustedState` passed as input is in *State*. When the function returns a `TrustedState`, its signed header is added to *State*.

---

The function `VerifyHeaderAtHeight` checks timestamps, and in case these preliminary checks go through, it calls bisection, by calling the function `VerifyBisection`.
The function `VerifyBisection` implements
the recursive logic for checking whether it is possible to build a trust
relationship between `trustedState` and untrusted header at `untrustedHeight`.

#### **[LCV-MAIN-VerifyHeaderAtHeight]**:
```go
func VerifyHeaderAtHeight(untrustedHeight int64,
                          trustedState TrustedState,
                          addr Address) (TrustedState, error)
```
- Implementation remark
  - *startTime* and *endTime* are the local system time right after
    invocation of `VerifyHeaderAtHeight` and right before the function
    returns, respectively.
- Expected precondition
  - The field `Time` of the signed header of `trustedState` is within *trustingPeriod* from *startTime*
- Expected postcondition:
  - Returns `(trustedState, OK)` under [**[FN-LuckyCase]**][FN-LuckyCase-link],
    if the signed header of `trustedState`:
    - is the header at height `untrustedHeight` of the blockchain, and
    - was generated within *trustingPeriod* from *endTime*
  - Returns `(trustedState, EXPIRED)` under [**[FN-LuckyCase]**][FN-LuckyCase-link], if
    the signed header of `trustedState`:
    - is the header at height `untrustedHeight` of the blockchain, and
    - was generated before *endTime - trustingPeriod*
- Error conditions
  - precondition violated
  - [**[FN-LuckyCase]**][FN-LuckyCase-link] does not hold
  - [**[FN-ManifestFaulty]**][FN-ManifestFaulty-link] holds

---

`VerifyBisection` is used to get a trusted state, whose signed header has height `untrustedHeight` in the blockchain.
To do so, `VerifyBisection` first downloads the necessary information
from the peer, by calling `QueryFullNode`.
This information includes a signed header `sh`, and two validator sets `vs, nextVs`.
The result of `QueryFullNode`, together with the `trustedState`, is passed as input
to the function `VerifySingle`.
If there are no errors, `VerifySingle` returns a new trusted state.
In `VerifyBisection`, either the new trusted state obtained as result of `VerifySingle` is returned,
or a new signed header is computed recursively.

#### **[LCV-VerifyBisection]**:

We give the pseudocode of `VerifyBisection` below, as well as the specifications
of the functions called by it.

```go
func VerifyBisection(untrustedHeight int64,
                     trustedState TrustedState,
                     addr Address,
                     now Time) (TrustedState, error) {

    sh, vs, nextVs, err := QueryFullNode(addr, untrustedHeight)
    if err != nil {
        // error in the RPC: propagate it
        return trustedState, err
    }
    newTrustedState, err := VerifySingle(sh, vs, nextVs, trustedState)
    if err == OK {
        return newTrustedState, OK
    } else if err == CANNOT_VERIFY {
        // compute pivot: a height between the height of trustedState
        // and untrustedHeight
        newTrustedState, err = VerifyBisection(pivot, trustedState, addr, now)
        if err != OK {
            return trustedState, err
        }
        return VerifyBisection(untrustedHeight, newTrustedState, addr, now)
    }
    // any other error reported by VerifySingle is propagated
    return trustedState, err
}
```
- Expected precondition
  - the field `Time` of the signed header of `trustedState` is within *trustingPeriod* from `now`
- Expected postcondition
  - Returns a trusted state whose header is the header at height `untrustedHeight` from the blockchain, if [**[FN-LuckyCase]**][FN-LuckyCase-link] holds, and if the field `Time` of the header of the returned trusted state is greater than `now + clockDrift`
- Error conditions
  - violated precondition
  - [**[FN-LuckyCase]**][FN-LuckyCase-link] does not hold
  - the header lies in the future
    **TODO:** What is the precise
    condition about the future?

---

#### **[LCV-QueryFullNode]**:

`QueryFullNode` is called by `VerifyBisection`, and it is used to gather information from a
full node at address `addr`.
```go
func QueryFullNode(addr Address,
                   untrustedHeight int64) (SignedHeader, ValidatorSet, ValidatorSet, error)
```
- Implementation remark
  - Used to communicate with a full node *n* at address *addr* via RPCs `Commit` and `Validators`
  - The only function that makes external calls!
  - in order to ensure [**[FN-LuckyCase]**][FN-LuckyCase-link] the
    timeout for the RPCs must be greater than or equal to *2 Delta*, cf.
    [**[LCV-A-Comm]**](#lcv-a-comm).
- Expected precondition
  - true
- Expected postcondition:
  - If *n* is correct and there is no error in the RPC to *n*: Returns the following data:
    - `SignedHeader` of height `untrustedHeight`,
    - `ValidatorSet` of height `untrustedHeight`,
    - `ValidatorSet` of height `untrustedHeight + 1`
  - The field time of the returned signed header is smaller than `now + clockDrift`
- Error conditions
  - precondition violated
  - The field time of the returned signed header is greater than or
    equal to `now + clockDrift`
  - If *n* is faulty or there is an error in the RPC to *n*

*Remark*: Observe that the error conditions include "error in RPC to *n*" but
*not* [**[FN-LuckyCase]**][FN-LuckyCase-link].
A faulty peer might return arbitrary values, without
forcing the function to report an error.

---

If `QueryFullNode` returns without error, `VerifyBisection` calls `VerifySingle`.

#### **[LCV-VerifySingle]**:

```go
func VerifySingle(untrustedSh SignedHeader,
                  untrustedVs ValidatorSet,
                  untrustedNextVs ValidatorSet,
                  trustedState TrustedState) (TrustedState, error)
```

- Implementation remarks:
  - This function does not make external RPC calls to the full node; the whole logic is
    based on the local (given) state.
- Expected precondition:
  - the field `Time` of the untrusted signed header `untrustedSh` is greater than `now + clockDrift`
  - the signed header of the trusted state was generated within the *trustingPeriod*
  - the height and `Time` of the signed header of the trusted state are smaller than the height and
    `Time` of the untrusted signed header `untrustedSh`, respectively
  - the *SignedHeader* satisfies the soundness requirements
    [**[TMBC-SOUND-?]**][blockchain], in particular
    - the untrusted signed header `untrustedSh` and the untrusted validator sets `untrustedVs`,
      `untrustedNextVs` are consistent
    - if the untrusted signed header `untrustedSh` is the immediate successor of
      the signed header of the trusted state `trustedState`, then
      the next validator set of the signed header of the `trustedState` is equal to the untrusted
      validator set `untrustedVs`, and moreover, more than two-thirds of the validators
      signed
- Expected postcondition:
  - Returns `(trustedState, OK)` if:
    - the untrusted signed header `untrustedSh` is the immediate successor of the signed header
      of the trusted state `trustedState` [TMBC-SOUND-?], or
    - the untrusted signed header `untrustedSh` is a successor of
      the signed header of the trusted state `trustedState` and the
      validators that have more than *max(1/3,trustThreshold)* of
      voting power in the trusted state `trustedState` signed the
      untrusted signed header `untrustedSh`, i.e., the
      header passes the tests [TMBC-VAL-CONTAINS-CORR] and [TMBC-VAL-COMMIT]
  - Returns `(trustedState, CANNOT_VERIFY)` if
    [**[TMBC-VAL-CONTAINS-CORR]**][TMBC-VAL-CONTAINS-CORR-link]
    fails and the header does not violate the soundness
    checks [**[TMBC-SOUND-?]**][blockchain].
- Error condition:
  - precondition violated
  - the untrusted signed header `untrustedSh` is not a successor of the signed header of the trusted state `trustedState`

---

If `VerifySingle` is successful, it returns `(TrustedState,OK)` to
`VerifyBisection` which in turn also returns this `TrustedState`. If
`(trustedState, CANNOT_VERIFY)` is returned, `VerifyBisection`
computes a pivot height between the height of the signed header of the
trusted state `trustedState` and the height `untrustedHeight`, and
calls itself recursively. If an error is reported by `VerifySingle`,
the error should be propagated.

---

## Correctness arguments

> Proof sketches of why we believe the solution satisfies the specifications.
Possibly giving inductive invariants that can be used to prove the specifications

> Link to Part I

**TO BE ARGUED**

### Why the protocol implements the distributed spec

> distributed algorithm correctness proof comes here.

# References

[[block]] Specification of the block data structure.

[[blockchain]] The specification of the Tendermint blockchain. Tags referring to this specification are labeled [TMBC-*].

[[failuredetector]] The specification of the light client fork detector.

[[fullnode]] Specification of the full node API

[[lightclient]] The light client ADR [77d2651 on Dec 27, 2019].
505 | 506 | 507 | [block]: https://github.com/tendermint/spec/blob/master/spec/blockchain/blockchain.md 508 | [blockchain]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md 509 | [TMBC-HEADER-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-header 510 | [TMBC-SEQ-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-seq 511 | [TMBC-CorrFull-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-corrfull 512 | [TMBC-Auth-Byz-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-auth-byz 513 | [TMBC-Sign-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-sign 514 | [TMBC-FaultyFull-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-faultyfull 515 | [TMBC-TIME_PARAMS-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-time_params 516 | [TMBC-FM-2THIRDS-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-fm-2thirds 517 | [TMBC-VAL-CONTAINS-CORR-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-val-contains-corr 518 | [TMBC-VAL-COMMIT-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-val-commit 519 | [TMBC-SOUND-DISTR-LAST-COMMIT-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-sound-distr-last-commit 520 | [TMBC-SOUND-DISTR-PossCommit-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-sound-distr-posscommit 521 | 522 | 523 | [TMBC-INV-SIGN-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-inv-sign 524 | [TMBC-INV-VALID-link]: https://github.com/informalsystems/VDD/tree/master/blockchain/blockchain.md#tmbc-inv-valid 525 | 526 | [LCV-VC-LIVE-link]: 
https://github.com/informalsystems/VDD/tree/master/lightclient/verification.md#lcv-vc-live 527 | 528 | [lightclient]: https://github.com/interchainio/tendermint-rs/blob/e2cb9aca0b95430fca2eac154edddc9588038982/docs/architecture/adr-002-lite-client.md 529 | [failuredetector]: https://github.com/informalsystems/VDD/blob/master/liteclient/failuredetector.md 530 | [fullnode]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md 531 | 532 | [FN-LuckyCase-link]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md#fn-luckycase 533 | 534 | [blockchain-validator-set]: https://github.com/tendermint/spec/blob/master/spec/blockchain/blockchain.md#data-structures 535 | [fullnode-data-structures]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md#data-structures 536 | 537 | [FN-ManifestFaulty-link]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md#fn-manifestfaulty 538 | -------------------------------------------------------------------------------- /blockchain/blockchain.md: -------------------------------------------------------------------------------- 1 | *** This is the beginning of an unfinished draft. Don't continue reading! *** 2 | 3 | # Tendermint Blockchain 4 | 5 | > Rough outline of what the component is doing and why. 2-3 paragraphs 6 | 7 | A blockchain is a growing list of sets of transactions, denoted by 8 | *chain*. If several processes in a distributed system have access to 9 | the blockchain, they can (1) provide transactions as input and (2) use 10 | *chain* to execute these transactions in order. 11 | 12 | The *chain* itself should be implemented in a reliable way, which 13 | introduces the need for fault-tolerance and distribution. The 14 | Tendermint protocols implement the blockchain over an unreliable 15 | (Byzantine) distributed system. 
More precisely, a Tendermint system 16 | consists of many so-called [full nodes][fullnode] that each maintain a 17 | local copy of (a prefix of) the current *chain*. 18 | 19 | In this specification, we are only concerned with the *chain*. 20 | Copies of it are maintained at the (correct) full nodes, and are the result of the 21 | execution of the Tendermint consensus protocol, described in the 22 | [ArXiv paper][arXiv]. The *chain* is called *decision* in the 23 | paper, and the paper does not consider the internals of what is stored in 24 | *decision*. The Tendermint consensus implements a loop with iterator 25 | *h* (height). In each iteration, upon deciding on the transactions to 26 | be put into the *chain*, a new *chain* entry is created. 27 | 28 | 29 | # Part I - Outside view 30 | 31 | ## Context of this document 32 | 33 | > mention other components and or specifications that are relevant for this 34 | spec. Possible interactions, possible use cases, etc. 35 | 36 | > should give the reader the understanding in what environment this component 37 | will be used. 38 | 39 | This specification is central in the collection of Tendermint 40 | protocols. The behavior of protocols like [fastsync][fastsync], or 41 | the [light client][lightclient] will be defined with respect to this 42 | specification. E.g., the light client implements a read operation of 43 | the *chain* entry of some height *h*. It is thus crucial to 44 | understand what data is stored in this *chain* entry, and what 45 | are the precise semantics of a read operation in a faulty environment. 46 | 47 | 48 | ## Informal Problem statement 49 | 50 | > for the general audience, that is, engineers who want to get an overview over what the component is doing 51 | from a bird's eye view. 52 | 53 | A blockchain provides a growing sequence of sets of transactions. 54 | 55 | ## Sequential Problem statement 56 | 57 | > should be English and precise. will be accompanied with a TLA spec. 
58 | 59 | #### **[TMBC-HEADER]**: 60 | A set of blockchain transactions is stored in a data structure called 61 | *block*, which contains a field called *header*. (The data structure 62 | *block* is defined [here][block]). As the header contains hashes of 63 | the relevant fields of the block, for the purpose of this 64 | specification, we will assume that the blockchain is a list of 65 | headers, rather than a list of blocks. 66 | 67 | #### **[TMBC-HASH-UNIQUENESS]**: 68 | We assume that every hash in the header identifies the data it hashes. 69 | Therefore, in this specification, we do not distinguish between hashes and the 70 | data they represent. 71 | 72 | 73 | #### **[TMBC-HEADER-Fields]**: 74 | A header contains the following fields: 75 | 76 | - `Height`: non-negative integer 77 | - `Time`: time (integer) 78 | - `LastBlockID`: Hashvalue 79 | - `LastCommit`: DomainCommit 80 | - `Validators`: DomainVal 81 | - `NextValidators`: DomainVal 82 | - `Data`: DomainTX 83 | - `AppState`: DomainApp 84 | - `LastResults`: DomainRes 85 | 86 | 87 | #### **[TMBC-SEQ]**: 88 | 89 | The Tendermint blockchain is a list *chain* of headers. 90 | 91 | ### Appending a block 92 | 93 | #### **[TMBC-SEQ-GROW]**: 94 | 95 | During operation, new headers may be appended to the list one by one. 96 | 97 | 98 | 99 | In the following, *ETIME* and *LTIME* are lower and upper bounds, 100 | respectively, on the time interval between the times at which two 101 | successive blocks are added. We are not fixing these times 102 | here. Rather, they should serve as defining constraints for temporal 103 | properties of other protocols. For instance, we might instantiate 104 | these times by setting *ETIME* to infinity, when we say that 105 | *Fastsync* terminates in the case the blockchain does not grow. 106 | 107 | #### **[TMBC-SEQ-APPEND-E]**: 108 | If a header is appended at time *t* then no additional header will be 109 | appended before time *t + ETIME*. 
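The role of the two bounds *ETIME* and *LTIME* introduced above can be illustrated on a finite trace of append times. The following is a minimal sketch, not part of the specification: the function name `append_times_respect_bounds` and the representation of a trace as a list of numbers are assumptions made for illustration, and "before" is read as a strict inequality.

```python
from typing import Sequence


def append_times_respect_bounds(
    times: Sequence[float], etime: float, ltime: float
) -> bool:
    """Check a finite trace of append times against ETIME and LTIME.

    `times` lists the (increasing) times at which headers were appended.
    Only consecutive pairs are checked: on a finite trace nothing can be
    said about the header that follows the last one.
    """
    if etime > ltime:
        # The bounds themselves are inconsistent.
        return False
    gaps = (later - earlier for earlier, later in zip(times, times[1:]))
    # Lower bound: the next header is not appended before t + ETIME.
    # Upper bound: the next header is appended before t + LTIME.
    return all(etime <= gap < ltime for gap in gaps)
```

For instance, `append_times_respect_bounds([0, 5, 11], etime=5, ltime=10)` holds, while a trace with a gap of 3 or of 12 violates the lower or the upper bound, respectively.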
110 | 111 | 112 | #### **[TMBC-SEQ-APPEND-L]**: 113 | If a header is appended at time *t* then the next header will be 114 | appended before time *t + LTIME*. 115 | 116 | #### **[TMBC-SEQ-APPEND-ELEL]**: 117 | *ETIME <= LTIME* 118 | 119 | 120 | 121 | 122 | 123 | 124 | ### Basic Soundness Conditions 125 | 126 | 127 | 128 | #### **[TMBC-SOUND-INC-HEIGHT]**: 129 | For all *i < len(chain)*: *chain[i].Height + 1 = chain[i+1].Height* 130 | 131 | *Remark:* We do not write *chain[i].Height = i*, to allow that a chain 132 | can be started at some arbitrary height, e.g., when there is social 133 | consensus to restart a chain from a given height/block. 134 | 135 | 136 | 137 | #### **[TMBC-SOUND-INC-TIME]**: 138 | For all *i < len(chain)*: *chain[i].Time < chain[i+1].Time* 139 | 140 | 141 | #### **[TMBC-SOUND-NextV]**: 142 | For all *i < len(chain)*: *chain[i+1].Validators = chain[i].NextValidators* 143 | 144 | 145 | 146 | 147 | ### Functions, Domains, and more soundness conditions 148 | 149 | #### **[TMBC-SOUND-PossCommit]**: 150 | There is a function *PossibleCommit* that maps a block (header) to a set 151 | of values in DomainCommit, cf. [TMBC-HEADER-Fields]. 152 | 153 | 157 | 158 | 159 | #### **[TMBC-SOUNDNESS-FUNCTIONS]**: 160 | The system provides the following functions: 161 | 162 | - `hash`: We assume that every hash identifies the data it hashes 163 | 164 | - `execute`: used for state machine replication. maps *Data* 165 | (transactions) and an *application state* to a new state. It is a function 166 | (deterministic transitions). 167 | **TODO:** it is provided by the 168 | application. Do we need to talk about applications in this spec? 169 | 170 | - `proof(b,commit)`: a predicate: true iff 171 | * *b* is part of the *chain*, that is, there is an *i* such that 172 | *chain[i] = b* 173 | * *commit* is in PossibleCommit(b), cf. [TMBC-SOUND-PossCommit]. 174 | 175 | *Remark.* Observe that *proof* refers to the *chain*. 
It thus depends 176 | on the execution, which results in a different quantifier order. For 177 | instance, we say "there exists a function *hash* such that for all 178 | runs", while we say "for each run there exists a function 179 | *proof*". The consequence is that *hash* is a predetermined function 180 | (implemented), while *proof* will have to be computed during the run 181 | as a function of the *chain*. The challenge in a distributed system is 182 | to locally compute *proof* without necessarily having complete 183 | knowledge of *chain*. In the context of the light client, we even want 184 | to infer knowledge about *chain* from the outcomes of the local 185 | computation of *proof*. We will use digital signatures for that. We 186 | will introduce them below when we introduce the distributed aspects. 187 | 188 | 189 | #### **[TMBC-SOUNDNESS-PREDICATES]**: 190 | Given two blocks *b* and *b'*: 191 | 192 | - `match-hash(b,b')` iff *hash(b) = b'.LastBlockID* 193 | - `match-proof(b,b')` iff *proof(b, b'.LastCommit)* 194 | 195 | #### **[TMBC-SOUND-CHAIN]**: 196 | For all *i < len(chain)*: *match-hash(chain[i], chain[i+1])* 197 | 198 | #### **[TMBC-SOUND-LAST-COMM]**: 199 | For all *i < len(chain)*: *match-proof(chain[i], chain[i+1])* 200 | 201 | 202 | #### **[TMBC-SOUND-APP]**: 203 | For all *i < len(chain)*: *chain[i+1].AppState = execute(chain[i].Data,chain[i].AppState)* 204 | 205 | 206 | #### **[TMBC-SEQ-INV]** 207 | 208 | At all times, the chain is sound [TMBC-SOUND-*]. 209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | 217 | 218 | # Part II - Protocol view 219 | 220 | ## Environment/Assumptions/Incentives 221 | 222 | > Introduce distributed aspects 223 | 224 | > Timing and correctness assumptions. Possibly with justification that the 225 | assumptions make sense, e.g., it is in the interest of a full node to behave 226 | correctly 227 | 228 | > should have clear formalization in temporal logic. 
229 | 230 | ## Timing, nodes, and correctness assumptions 231 | 232 | In a Tendermint system, the nodes that choose to interact with each 233 | other collectively compute the *chain* in a distributed manner 234 | by executing the [consensus protocol][arXiv]. 235 | The Tendermint protocols should ensure that 236 | the participating nodes cannot benefit by deviating from the expected 237 | behavior. As a result, the nodes should be motivated (incentivized) 238 | to follow the protocol. This is achieved by requiring the nodes to 239 | bond atoms and by the system punishing the nodes that do not follow 240 | the protocol. That is, the nodes bond atoms in order to participate, 241 | and they are punished if they misbehave. If the nodes want to claim 242 | their atoms, they have to wait for a certain period of time, called 243 | the *unbonding period*. The unbonding period is a configuration 244 | parameter of the system. 245 | 246 | #### **[TMBC-TIME-PARAMS]**: 247 | A Tendermint blockchain has the following configuration parameters: 248 | - *unbondingPeriod*: a time duration. 249 | - *trustingPeriod*: a time duration smaller than *unbondingPeriod*. 250 | 251 | 252 | 253 | 254 | #### **[TMBC-NODES]**: 255 | Tendermint full nodes (or just *full nodes*), execute a set of 256 | protocols, e.g., consensus, gossip, fast sync, etc. When a full node 257 | actively participates in the distributed consensus, it is called a 258 | *validator node*. 259 | 260 | #### **[TMBC-CORRECT]**: 261 | We define a predicate *correctUntil(n, t)*, where *n* is a node and *t* is a 262 | time point. 263 | The predicate *correctUntil(n, t)* is true if and only if the node *n* 264 | follows all the protocols (at least) until time *t*. 265 | 266 | ## Authenticated Byzantine Model 267 | 268 | Due to the incentives, we claim that it is safe to assume below that 269 | most of the full nodes will follow the protocol. 
However, we also have 270 | to capture that full nodes may deviate from the prescribed behavior, 271 | either due to misconfiguration or implementation bugs, or because of 272 | adversarial behavior. At the same time, we will heavily use digital 273 | signatures, which constitute a stable technology that allows one to 274 | determine the sender of a message, even if the message is wrapped into 275 | another message and forwarded. In the distributed algorithm literature 276 | (e.g., [[DLS88]][DLS]), this model is called **authenticated 277 | Byzantine**. Similar to Nancy Lynch when writing her book on 278 | distributed algorithms, "we do not know of a nice formal definition" 279 | for Byzantine failures with authentication. So, we give an 280 | axiomatization that is sufficient for verification. 281 | 282 | #### **[TMBC-Sign]**: 283 | - For each node with address *addr*, there is a function *sign_addr* 284 | that maps *Data* to a domain *SignedData(addr)*. 285 | - *SignedDataDomain* is the disjoint union of *SignedData(addr)* over 286 | all *addr*. We call an element of *SignedDataDomain* signed data. 287 | - There is a function *check* that maps signed data *sd* 288 | to the pair *(data, addr)* such that *sd = sign_addr(data)*. 289 | 290 | #### **[TMBC-Sign-NoForge]**: 291 | For all runs *r*, for all nodes *p* and *q*, for all 292 | *sdq* from SignedDataDomain, if *aq* is the address of *q* and *p* 293 | (different from *q*) 294 | sends a message that contains *sdq* from *SignedData(aq)* in run *r*, 295 | then *q* has sent a message containing *sdq* earlier in run *r*. 296 | 297 | *Remark:* [TMBC-Sign-NoForge] can be written as an invariant over the 298 | message history. 299 | 300 | 301 | #### **[TMBC-FaultyFull]**: 302 | No assumption is made about the internal 303 | behavior of faulty full nodes. 
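To illustrate the remark that [TMBC-Sign-NoForge] can be phrased as an invariant over the message history, here is a minimal sketch of the abstract signing model. It does not model any cryptography: `SignedData`, `Message`, `sign`, `check` and `no_forge` are hypothetical names, a node is identified with its address, and each message is assumed to carry exactly one piece of signed data.

```python
from typing import List, NamedTuple, Tuple


class SignedData(NamedTuple):
    """Abstract element of SignedData(addr): the data tagged with the signer."""
    data: bytes
    addr: str


class Message(NamedTuple):
    """A sent message; `sender` is the address of the sending node."""
    sender: str
    signed: SignedData


def sign(addr: str, data: bytes) -> SignedData:
    """Abstract counterpart of sign_addr(data)."""
    return SignedData(data, addr)


def check(sd: SignedData) -> Tuple[bytes, str]:
    """Map signed data back to the pair (data, addr)."""
    return (sd.data, sd.addr)


def no_forge(history: List[Message]) -> bool:
    """Invariant over a message history given in send order.

    Whenever a node other than the signer sends some signed data,
    the signer must have sent that same signed data earlier.
    """
    sent_by_signer = set()
    for msg in history:
        signer = msg.signed.addr
        if msg.sender == signer:
            sent_by_signer.add(msg.signed)
        elif msg.signed not in sent_by_signer:
            return False
    return True
```

A history in which *q* first sends its own signed data and *p* later forwards it satisfies the invariant; the reverse order violates it.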
304 | 305 | #### **[TMBC-Auth-Byz]**: 306 | The authenticated Byzantine model assumes [TMBC-Sign-NoForge] and 307 | [TMBC-FaultyFull], that is, faulty nodes are limited in that they 308 | cannot forge messages [TMBC-Sign-NoForge]. 309 | 310 | 311 | 312 | 313 | ## Validators 314 | 315 | In [TMBC-HEADER-Fields], most of the fields are defined for abstract 316 | domains. Here we will specialize DomainVal and DomainCommit to the 317 | distributed setting, and describe how validators and commit are 318 | implemented in Tendermint consensus. 319 | 320 | *Remark:* We observed that in the existing documentation the term 321 | *validator* refers to both a data structure and a full node that 322 | participates in the distributed computation. Therefore, we introduce 323 | the notions *validator pair* and *validator node*, respectively, to 324 | distinguish these notions in the cases where they are not clear from 325 | the context. 326 | 327 | 328 | #### **[TMBC-VALIDATOR-Pair]**: 329 | 330 | Given a full node, a 331 | *validator pair* is a pair *(address, voting_power)*, where 332 | - *address* is the address (public key) of a full node, 333 | - *voting_power* is an integer (representing the full node's 334 | voting power in a certain consensus instance). 335 | 336 | *Remark:* In the Golang implementation the data type for *validator 337 | pair* is called `Validator` 338 | 339 | 340 | #### **[TMBC-VALIDATOR-Set]**: 341 | 342 | A *validator set* is a set of validator pairs. For a validator set 343 | *vs*, we write *TotalVotingPower(vs)* for the sum of the voting powers 344 | of its validator pairs. 345 | 346 | #### **[TMBC-VP-SOUND-VALID-UNIQUE]**: 347 | For each block in the chain, the set *Validators* contains at most 348 | one validator pair for each full node. 349 | 350 | #### **[TMBC-VP-SOUND-NEXT-VALID-UNIQUE]**: 351 | For each block in the chain, the set *NextValidators* contains at 352 | most one validator pair for each full node. 
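The definitions of validator pairs and validator sets above translate directly into a small data model. The sketch below is illustrative only: the names `ValidatorPair`, `total_voting_power` and `addresses_unique` are assumptions, and addresses are represented as plain strings.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ValidatorPair:
    """A validator pair (address, voting_power) as in [TMBC-VALIDATOR-Pair]."""
    address: str
    voting_power: int


def total_voting_power(vs: Iterable[ValidatorPair]) -> int:
    """Sum of the voting powers of a validator set, cf. [TMBC-VALIDATOR-Set]."""
    return sum(pair.voting_power for pair in vs)


def addresses_unique(vs: Iterable[ValidatorPair]) -> bool:
    """True iff the set holds at most one validator pair per full node."""
    addresses = [pair.address for pair in vs]
    return len(addresses) == len(set(addresses))
```

The uniqueness check is needed because two pairs with the same address but different voting powers are distinct elements of a set of pairs.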
353 | 354 | 355 | 356 | ## Distributed Definition of Commit 357 | 358 | 359 | 360 | #### **[TMBC-VOTE]**: 361 | A *vote* contains a `prevote` or `precommit` message sent and signed by 362 | a validator node during the execution of [consensus][arXiv]. Each 363 | message contains the following fields: 364 | - `Type`: prevote or precommit 365 | - `Height`: positive integer 366 | - `Round`: positive integer 367 | - `BlockID`: Hashvalue of a block (not necessarily a block of the chain) 368 | 369 | 370 | #### **[TMBC-COMMIT]**: 371 | A commit is a set of votes. 372 | 373 | **TODO:** clarify whether `prevote` and `precommit` are equivalent in 374 | the Commit. 375 | 376 | #### **[TMBC-SOUND-DISTR-PossCommit]**: 377 | For a block *b*, each element *pc* of *PossibleCommit(b)* satisfies: 378 | - each vote *v* in *pc* satisfies 379 | * *v* is a vote (cf. [TMBC-VOTE]) 380 | by a validator from *b.Validators* 381 | * v.BlockID = hash(b) 382 | * v.Height = b.Height 383 | **TODO:** complete the checks here 384 | - the sum of the voting powers in *pc* is greater than 2/3 385 | *TotalVotingPower(b.Validators)* 386 | 387 | 388 | #### **[TMBC-SOUND-DISTR-LAST-COMM]**: 389 | Combining the specialization of *PossibleCommit* from 390 | [TMBC-SOUND-DISTR-PossCommit] with the abstract definitions 391 | [TMBC-SOUNDNESS-FUNCTIONS] and [TMBC-SOUND-LAST-COMM] we obtain the 392 | definition of soundness for LastCommit in the distributed setting. 393 | 394 | 395 | 396 | 397 | 398 | 399 | ## Commit Invariants 400 | 401 | Commit messages are used to establish proof that a certain block is on 402 | the blockchain. 403 | 404 | We now make explicit some invariants a correct validator node must ensure. 405 | We use from the [consensus algorithm][arXiv] 406 | the predicate `valid` over blocks, and `precommit` 407 | messages. 
408 | Correct validator nodes use *valid* to ensure the soundness requirements of 409 | the blockchain [TMBC-SOUND-?], and send *precommit* messages 410 | only for blocks for which *valid* evaluates to true. 411 | 412 | #### **[TMBC-INV-CORR-PROC-VALID]**: 413 | 414 | There is a predicate `valid` over a block (and the prefix of the 415 | chain, which we omit in the notation). 416 | In particular, if *valid(b)* evaluates to true at a correct validator node, 417 | then *b.Validators = pred.NextValidators*, where *pred* is the block of 418 | height *b.Height - 1* of the blockchain. 419 | 420 | 421 | The following invariant is crucial to guarantee the soundness of the chain: 422 | 423 | 424 | #### **[TMBC-INV-CORR-PROC-COMMIT]**: 425 | 426 | A correct validator node sends and signs precommit for a block *b*, only if `valid(b)`. 427 | 428 | *Remark:* Follows from code line 36 in the [consensus algorithm][arXiv]. 429 | 430 | 431 | 432 | 433 | 434 | From [TMBC-INV-CORR-PROC-VALID] and [TMBC-INV-CORR-PROC-COMMIT] 435 | it follows that **more than two thirds of the voting 436 | power in *b.Validators* is correct for any block *b* signed by a correct 437 | validator node**. As a result, a commit that is well-formed (that is, is in 438 | *PossibleCommit(b)*) and signed by a correct validator node is a proof that 439 | *b* is in the blockchain. 440 | 441 | *Remark:* "Signed by a correct validator node" means that the 442 | validator node *n* 443 | sends *precommit* at time *t* and *correctUntil(n, t)* holds. 444 | 445 | 446 | 447 | #### **[TMBC-VAL-COMMIT]**: 448 | 449 | If for a block *b*, a commit *c* 450 | - contains at least one validator pair *(v,p)* such that *v* is a correct 451 | validator node, and 452 | - is contained in *PossibleCommit(b)* 453 | 454 | then the block *b* is on the blockchain. 455 | 456 | 457 | 458 | 459 | ## Tendermint failure model (a.k.a. 
Tendermint Security Model) 460 | 461 | 462 | 463 | 464 | 465 | #### **[TMBC-FM-2THIRDS]**: 466 | If a block *h* is in the chain, 467 | then there exists a subset *CorrV* 468 | of *h.NextValidators*, such that: 469 | - *TotalVotingPower(CorrV) > 2/3 470 | TotalVotingPower(h.NextValidators)*; cf. [TMBC-VALIDATOR-Set] 471 | - For every validator pair *(n,p)* in *CorrV*, it holds *correctUntil(n, 472 | h.Time + trustingPeriod)*; cf. [TMBC-CORRECT] 473 | 474 | 475 | 476 | 481 | 482 | *Remark:* The definition of correct 483 | [**[TMBC-CORRECT]**][TMBC-CORRECT-link] refers to real time, while it 484 | is used here with *Time* and *trustingPeriod*, which are "hardware 485 | times". We do not make a distinction here. 486 | 487 | 488 | From [TMBC-FM-2THIRDS] we directly derive the following observation: 489 | 490 | #### **[TMBC-VAL-CONTAINS-CORR]**: 491 | 492 | Given a (trusted) block *tb* of the blockchain, a set of full nodes 493 | *N* contains a correct node at a real time *t*, if 494 | - *t - trustingPeriod < tb.Time < t* 495 | - the voting power in *tb.NextValidators* of nodes in *N* is more 496 | than 1/3 of *TotalVotingPower(tb.NextValidators)* 497 | 498 | 499 | 500 | 501 | 502 | *Remark:* The light client verification checks [TMBC-VAL-CONTAINS-CORR] and 503 | [TMBC-VAL-COMMIT] as follows: 504 | Given a trusted block *tb* and an untrusted block *ub* with a commit *cub*, 505 | one has to check that *cub* is in *PossibleCommit(ub)*, and that *cub* 506 | contains a correct node using *tb*. 507 | 508 | 509 | 510 | Until now, we have established soundness of the blockchain, and some 511 | invariants expected from correct validators when observed from the 512 | outside. Below we describe the internals of the [consensus 513 | algorithm][arXiv]. For details we refer to the [paper][arXiv]. 514 | *Remark:* For 515 | now the goal of this specification is to have a formal understanding of the outside view of 516 | the blockchain in order to be able to specify other protocols. 
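The two conditions of [TMBC-VAL-CONTAINS-CORR] can be evaluated locally from a trusted block. The following is a minimal sketch under stated assumptions: `contains_correct_node` is a hypothetical name, the validator set *tb.NextValidators* is represented as a dictionary from address to voting power, and times are plain numbers. Integer arithmetic is used for the 1/3 threshold to avoid rounding issues.

```python
from typing import Dict, Set


def contains_correct_node(
    tb_time: float,
    tb_next_validators: Dict[str, int],
    nodes: Set[str],
    now: float,
    trusting_period: float,
) -> bool:
    """Sufficient condition for `nodes` to contain a correct node at `now`.

    `tb_time` and `tb_next_validators` are taken from the trusted block tb;
    `tb_next_validators` maps an address to its voting power.
    """
    # Condition 1: tb is still within the trusting period.
    within_period = now - trusting_period < tb_time < now
    # Condition 2: the nodes hold more than 1/3 of the total voting power.
    power = sum(
        vp for addr, vp in tb_next_validators.items() if addr in nodes
    )
    total = sum(tb_next_validators.values())
    return within_period and 3 * power > total
```

With three validators of equal power, two of them satisfy the threshold while a single one does not, since exactly 1/3 is not "more than 1/3".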
517 | 518 | ## Distributed Problem Statement 519 | 520 | ### Design choices 521 | 522 | 523 | #### **[TMBC-FM-CONS]**: 524 | (Consensus failure model) 525 | There is a set *C* of validator pairs, such that *C* is a subset of *NextValidators* at height *h*, where: 526 | - The validator pairs in *C* hold more than two-thirds of the total voting power in *NextValidators* at height *h* 527 | - For every validator pair *(n,p)* in *C*, the node *n* follows the consensus protocol until consensus for height *h+1* has terminated. 528 | 529 | 530 | We recall that [TMBC-CORRECT] denotes by *correctUntil(n, t)* that full 531 | node *n* is correct up to time *t*, that is, it follows all the protocols 532 | up to time *t*. For now we assume that both failure assumptions 533 | [TMBC-FM-2THIRDS] and [TMBC-FM-CONS] hold. 534 | 535 | 536 | 537 | #### **[TMBC-LOCAL-CHAIN]**: 538 | 539 | Each correct full node *p* maintains its local copy of a prefix of the Tendermint 540 | blockchain, denoted by *chain_p*. 541 | 542 | 543 | 544 | ### Temporal Properties 545 | 546 | > safety specifications / invariants in English 547 | 548 | > liveness specifications in English. Possibly with timing/fairness requirements: 549 | e.g., if the component is connected to a correct full node and communication is 550 | reliable and timely, then something good happens eventually. 551 | 552 | 553 | 554 | 555 | 556 | ### Safety 557 | 558 | #### **[TMBC-VC-AGR]**: 559 | At all times *t*, for any two full nodes *p* and *q*, with *correctUntil(p, 560 | t)* and *correctUntil(q, t)* it holds that *chain_p(t)* is a prefix 561 | of *chain_q(t)* or *chain_q(t)* is a prefix of *chain_p(t)*. 562 | 563 | #### **[TMBC-VC-VAL]**: 564 | For a full node *p*, we substitute *chain* with *chain_p* in the 565 | soundness properties [TMBC-SOUND-?]. For all times *t* and every full 566 | node *p*, with *correctUntil(p, t)*, the soundness requirements hold for 567 | *chain_p(t)*. 
568 | 569 | 570 | *Remark:* Validity [TMBC-VC-VAL] should make reference to the mempool, 571 | e.g., only messages from the mempool are added to Data; we will also need 572 | a spec for the mempool. For now I leave it like that as the light 573 | client and fastsync do not care about that. 574 | 575 | *Remark:* Additional application-specific soundness requirements might 576 | also need to hold. 577 | 578 | 579 | 580 | ### Liveness 581 | 582 | The following is an abstract liveness property that states that 583 | correct full nodes infinitely often append new blocks to the 584 | chain. This can be defined only for full nodes that are correct 585 | forever (as one needs infinite traces). 586 | 587 | 588 | #### **[TMBC-VC-LIVE]**: 589 | For all full nodes *p*, with *correctUntil(p, infinity)*, for all times 590 | *t*, there exists a time *t'*, such that *|chain_p(t)| < 591 | |chain_p(t')|*. 592 | 593 | 594 | The following property is formally not a liveness property (as it can 595 | be violated on a finite prefix) but is a progress property of practical 596 | relevance: 597 | 598 | #### **[TMBC-VC-PROG]**: 599 | For all full nodes *p*, and all times *t'*: If *correctUntil(p, t')*, 600 | then for all times *t < t' - LTIME* it holds that *|chain_p(t)| < 601 | |chain_p(t + LTIME)|*. 602 | 603 | *Remark:* In the temporal properties above we use the *correctUntil* 604 | predicate, in a way that suggests that all full nodes participate in 605 | Tendermint since the genesis block. However, there are Tendermint 606 | protocols (state sync, fast sync) that allow nodes to join the system 607 | later. We will have to define later what it means for these nodes to satisfy 608 | [TMBC-VC-AGR], [TMBC-VC-VAL], and [TMBC-VC-PROG]. For instance, they 609 | may not need to have the complete prefix of *chain* but start at some height. 610 | 611 | 612 | > How is the problem statement linked to the "Sequential Problem statement". 613 | Simulation, implementation, etc. 
relations 614 | 615 | ### Solving the sequential specification 616 | 617 | **TODO:** How does the distributed specification map to the sequential 618 | one? For instance, at each time the longest prefix of *chain_p* for 619 | some *p* defines *chain* in the sequential specification. 620 | 621 | #### **[TMBC-CorrFull]**: 622 | Every correct Tendermint full node locally stores a prefix of the 623 | current list of headers from [**[TMBC-SEQ]**][TMBC-SEQ-link]. 624 | 625 | 626 | 627 | 628 | **For the remainder, we refer to the [arXiv paper][arXiv] for now.** 629 | 630 | 631 | ## Definitions 632 | 633 | > In this section we become more concrete, with basic (abstracted) data types 634 | 635 | > some math that allows us to write specifications and pseudo code solution below. 636 | Some variables, etc. 637 | 638 | ### Data structures 639 | 640 | 641 | 642 | ## Solution 643 | 644 | > Basic data structures. Simplified, so that we can focus on the distributed 645 | algorithm here. If existing: link to Tendermint data structures, and mention 646 | if details were omitted. 647 | 648 | 649 | 650 | ### Outline 651 | 652 | > Describe solution (in English), decomposition into functions, where communication to other components happens. 653 | 654 | ### Details 655 | 656 | > Pseudo code of the solution 657 | 658 | 659 | ## Correctness arguments 660 | 661 | > Proof sketches of why we believe the solution satisfies the problem statement. 
662 | Possibly giving inductive invariants that can be used to prove the specifications 663 | of the problem statement 664 | 665 | 666 | 667 | # References 668 | 669 | [[block]] Definition of the block data structure 670 | 671 | [[blockchain]] Tendermint Blockchain specification 672 | 673 | [[fastsync]] Specification of the fastsync protocol 674 | 675 | [[fullnode]] Specification of the full node API 676 | 677 | [[header]] Definition of the header data structure 678 | 679 | [[lightclient]] Light Client ADR 680 | 681 | [[verifier]] Light Client Verification Specification 682 | 683 | [[arXiv]] The Tendermint paper on arXiv 684 | 685 | 686 | [block]: https://github.com/tendermint/spec/blob/master/spec/blockchain/blockchain.md#block 687 | [blockchain]: https://github.com/tendermint/spec/blob/master/spec/blockchain/blockchain.md#blockchain 688 | [fastsync]: https://github.com/informalsystems/VDD/blob/master/fastsync/fastsync.md 689 | [lightclient]: https://github.com/interchainio/tendermint-rs/blob/e2cb9aca0b95430fca2eac154edddc9588038982/docs/architecture/adr-002-lite-client.md#adr-002-light-client 690 | [verifier]: https://github.com/informalsystems/VDD/blob/master/lightclient/verification.md#core-verification 691 | [header]: https://github.com/tendermint/spec/blob/master/spec/blockchain/blockchain.md#header 692 | [fullnode]: https://github.com/tendermint/spec/blob/master/spec/blockchain/fullnode.md 693 | 694 | [TMBC-SEQ-link]: https://github.com/informalsystems/VDD/blob/master/lightclient/blockchain.md#tmbc-seq 695 | [TMBC-VALIDATOR-link]: https://github.com/informalsystems/VDD/blob/master/lightclient/blockchain.md#tmbc-validator 696 | [TMBC-CORRECT-link]: https://github.com/informalsystems/VDD/blob/master/lightclient/blockchain.md#tmbc-correct 697 | [TMBC-TIME-link]: https://github.com/informalsystems/VDD/blob/master/lightclient/blockchain.md#tmbc-time 698 | 699 | [arXiv]: https://arxiv.org/abs/1807.04938 700 | 701 | [DLS]: 
https://groups.csail.mit.edu/tds/papers/Lynch/jacm88.pdf 702 | --------------------------------------------------------------------------------