├── C.cfg
├── CE.cfg
├── F.cfg
├── TemporalLogic.cfg
├── docs
│   └── TLA+CheatSheet.pdf
├── figures
│   ├── v01-ring01.gif
│   ├── v01-ring03.gif
│   └── v01-ring04.gif
├── APEWD998.cfg
├── .devcontainer
│   ├── extensions
│   │   ├── better-comments-2.0.5.vsix
│   │   └── EFanZh.graphviz-preview-1.5.0.vsix
│   ├── devcontainer.json
│   └── install.sh
├── SmokeEWD998.cfg
├── MCEWD998.cfg
├── MCAsyncTerminationDetection.cfg
├── APAsyncTerminationDetection.cfg
├── .gitpod.yml
├── Utils.tla
├── .gitignore
├── MCAsyncTerminationDetection_actions.dot
├── C.tla
├── LICENSE
├── TemporalLogic.tla
├── .vscode
│   └── settings.json
├── IncDec.tla
├── SmokeEWD998.tla
├── AsyncTerminationDetection_apalache.tla
├── EWD998_proof.tla
├── SyncTerminationDetection.tla
├── .github
│   └── workflows
│       └── main.yml
├── MCAsyncTerminationDetection.tla
├── APEWD998.tla
├── O.tla
├── F.tla
├── README.md
├── AsyncTerminationDetection_proof.tla
├── MCEWD998.tla
├── MCEWD998_actions.dot
├── EWD998.tla
└── AsyncTerminationDetection.tla

/C.cfg:
--------------------------------------------------------------------------------
1 | SPECIFICATION SpecC
2 | PROPERTY InvT
--------------------------------------------------------------------------------
/CE.cfg:
--------------------------------------------------------------------------------
1 | SPECIFICATION SpecE
2 | PROPERTY InvT
--------------------------------------------------------------------------------
/F.cfg:
--------------------------------------------------------------------------------
1 | \* TLC always expects a config file, even if it is empty.
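C.cfg and CE.cfg check the same property InvT (defined in C.tla, listed further down) against SpecC and SpecE respectively. Because TLA+'s CHOOSE is deterministic, SpecC can never change x: its initial and next-state CHOOSE expressions pick from the same set with equivalent predicates. SpecE, by contrast, may move x to any element of the set. The Python sketch below is a hypothetical illustration and not part of the repository. It enumerates the reachable steps of both specs to show why InvT should hold for C.cfg and fail for CE.cfg. The `choose` helper is an assumption: it stands in for CHOOSE by picking the least qualifying element, whereas TLA+ only guarantees that the choice is fixed, not which element it is.

```python
# Hypothetical sketch: explicit-state enumeration of SpecC and SpecE from C.tla.
S = frozenset({"c", "a", "c", "f"})            # duplicates collapse: {"a", "c", "f"}
T = frozenset({"a", "c", "f", "f", "c", "a"})  # the very same set as S


def choose(domain, pred):
    """Deterministic stand-in for TLA+ CHOOSE (assumption: least element)."""
    return min(v for v in domain if pred(v))


def reachable_steps(init, succ):
    """Collect every (x, x') step reachable from the initial states."""
    seen, frontier, steps = set(init), list(init), set()
    while frontier:
        x = frontier.pop()
        for y in succ(x):
            steps.add((x, y))
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return steps


def inv_t(steps):
    """[][x = x']_x: no reachable step may change x."""
    return all(x == y for x, y in steps)


# SpecC: InitC == x = CHOOSE n \in S : TRUE, NextC == x' = CHOOSE n \in T : n \in S
spec_c = reachable_steps({choose(S, lambda n: True)},
                         lambda x: {choose(T, lambda n: n in S)})
# SpecE: InitE == x \in S, NextE == x' \in T
spec_e = reachable_steps(set(S), lambda x: set(T))

print(inv_t(spec_c), inv_t(spec_e))  # True False
```

The choice of `min` only matters for which value x is stuck at; any fixed rule gives the same verdicts.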
--------------------------------------------------------------------------------
/TemporalLogic.cfg:
--------------------------------------------------------------------------------
1 | SPECIFICATION
2 | Spec
3 |
4 | PROPERTIES
5 | Prop
6 |
7 | ALIAS
8 | Alias
--------------------------------------------------------------------------------
/docs/TLA+CheatSheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/docs/TLA+CheatSheet.pdf
--------------------------------------------------------------------------------
/figures/v01-ring01.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/figures/v01-ring01.gif
--------------------------------------------------------------------------------
/figures/v01-ring03.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/figures/v01-ring03.gif
--------------------------------------------------------------------------------
/figures/v01-ring04.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/figures/v01-ring04.gif
--------------------------------------------------------------------------------
/APEWD998.cfg:
--------------------------------------------------------------------------------
1 | CONSTANT N = 3
2 | SPECIFICATION Spec
3 | INVARIANT TypeOK
4 | INVARIANT Inv
5 | INVARIANT MaxDiameter
6 |
--------------------------------------------------------------------------------
/.devcontainer/extensions/better-comments-2.0.5.vsix:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/.devcontainer/extensions/better-comments-2.0.5.vsix
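TemporalLogic.cfg checks Prop against the Spec of TemporalLogic.tla (listed further down). That spec encodes each behavior as a tuple. Every element except the last is a successive value of p, and the last element is the 1-based index the behavior loops back to after the final state. For example, `<< T, F, 1 >>` denotes T, F, T, F, and so on. The Python sketch below is a hypothetical helper, not part of the repository. It evaluates the formulas that are commented out in Prop over such lasso-shaped behaviors. Every listed state is visited at least once, and the states from the loop-back index onward repeat forever. So []p and <>p look at all states, while <>[]p and []<>p look only at the cycle.

```python
def split_lasso(behavior):
    """Split a tuple such as (F, T, 2) into its states and 0-based loop start."""
    *states, back = behavior
    if not states or not 1 <= back <= len(states):
        raise ValueError("loop-back index must point into the states")
    return list(states), back - 1


def evaluate(behavior):
    """Evaluate p, []p, <>p, <>[]p and []<>p on a lasso-shaped behavior."""
    states, loop = split_lasso(behavior)
    cycle = states[loop:]  # the states that repeat forever
    return {
        "p": states[0],       # a state predicate only constrains the first state
        "[]p": all(states),   # every listed state is visited at least once
        "<>p": any(states),
        "<>[]p": all(cycle),  # eventually only cycle states remain
        "[]<>p": any(cycle),  # some cycle state recurs infinitely often
    }


F, T = False, True
print(evaluate((F, T, 2)))
# {'p': False, '[]p': False, '<>p': True, '<>[]p': True, '[]<>p': True}
```

Uncommenting a tuple in seq and a conjunct in Prop, then running TLC, should agree with the corresponding entry of `evaluate`.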
--------------------------------------------------------------------------------
/.devcontainer/extensions/EFanZh.graphviz-preview-1.5.0.vsix:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/.devcontainer/extensions/EFanZh.graphviz-preview-1.5.0.vsix
--------------------------------------------------------------------------------
/SmokeEWD998.cfg:
--------------------------------------------------------------------------------
1 | CONSTANT
2 |     N = 3
3 |     Init <- SmokeInit
4 | SPECIFICATION Spec
5 | INVARIANT TypeOK
6 | INVARIANT Inv
7 | CONSTRAINT StopAfter
8 | CHECK_DEADLOCK FALSE
--------------------------------------------------------------------------------
/MCEWD998.cfg:
--------------------------------------------------------------------------------
1 | CONSTANT N = 3
2 | SPECIFICATION Spec
3 | INVARIANT TypeOK
4 | INVARIANT Inv
5 | \* CONSTRAINT StateConstraint
6 | PROPERTY ATDSpec
7 | INVARIANT MaxDiameter
8 | ALIAS Alias
--------------------------------------------------------------------------------
/MCAsyncTerminationDetection.cfg:
--------------------------------------------------------------------------------
1 | CONSTANT N = 3
2 | CONSTANT Init <- MCInit
3 | SPECIFICATION Spec
4 | CONSTRAINT StateConstraint
5 | \* ACTION_CONSTRAINT ActionConstraint
6 | INVARIANT TypeOK
7 | PROPERTY Stable
8 | PROPERTY ActuallyNext
9 | PROPERTY Terminates
10 | \* PROPERTY AngleNextSubVars
11 | PROPERTY Live
12 |
--------------------------------------------------------------------------------
/APAsyncTerminationDetection.cfg:
--------------------------------------------------------------------------------
1 | \*
2 | \* Check TypeOK for an unbounded co-domain of pending :
3 | \* $ apalache-mc check --config=APAsyncTerminationDetection.cfg --length=1 AsyncTerminationDetection.tla
4 | \*
5 | \* Read https://apalache.informal.systems/docs/adr/002adr-types.html to learn
6 | \* about Apalache's type annotations.
7 | CONSTANT N = 3
8 | SPECIFICATION Spec
9 | INVARIANT TypeOK
10 |
--------------------------------------------------------------------------------
/.gitpod.yml:
--------------------------------------------------------------------------------
1 | ## The -vnc image below causes problems because it
2 | ## lacks packages such as graphviz that also cannot
3 | ## be installed via apt.
4 | #image:
5 | #  gitpod/workspace-full-vnc
6 |
7 | tasks:
8 |   - init: bash -i .devcontainer/install.sh
9 |
10 | vscode:
11 |   extensions:
12 |     - tintinweb.graphviz-interactive-preview
13 |     - cssho.vscode-svgviewer
14 |     - tomoki1207.pdf
15 |     - efanzh.graphviz-preview
16 |     - mhutchie.git-graph
17 |
--------------------------------------------------------------------------------
/Utils.tla:
--------------------------------------------------------------------------------
1 | This is a snapshot of a few operators from the TLA+
2 | community modules at https://github.com/tlaplus/CommunityModules
3 |
4 | ------- MODULE Utils -------
5 |
6 | MapThenFoldSet(op(_,_), base, f(_), choose(_), S) ==
7 |   LET iter[s \in SUBSET S] ==
8 |         IF s = {} THEN base
9 |         ELSE LET x == choose(s)
10 |              IN  op(f(x), iter[s \ {x}])
11 |   IN  iter[S]
12 |
13 | FoldFunctionOnSet(op(_,_), base, fun, indices) ==
14 |   MapThenFoldSet(op, base, LAMBDA i : fun[i], LAMBDA s: CHOOSE x \in s : TRUE, indices)
15 |
16 | ============================
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | ## Blacklist all files
2 | *
3 |
4 | ## Whitelist TLA+ files
5 | !*.tla
6 |
7 | ## Whitelist TLC model config and results (.out files are usually small)
8 | !*.cfg
9 | !*.out
10 |
11 | ## Whitelist Toolbox model metadata
12 | !*.launch
13 |
14 | ## Whitelist Toolbox spec metadata
15 | !.project
16 | !*.prefs
17 |
18 | ## Whitelist all folders
19 | !*/
20 |
21 | ## Blacklist TLAPS cache folder
22 | ## See https://github.com/tlaplus/tlapm/issues/16
23 | *.tlaps/
24 | __tlacache__
25 | .tlacache
26 |
27 | ## Blacklist apalache working dir
28 | x/
29 |
30 | ## Ignore tools installed into the workspace
31 | tools/
--------------------------------------------------------------------------------
/MCAsyncTerminationDetection_actions.dot:
--------------------------------------------------------------------------------
1 | digraph ActionGraph {
2 |     nodesep=0.35;
3 |     subgraph cluster_legend {
4 |         label = "Coverage";
5 |         node [shape=point] {
6 |             d0 [style = invis];
7 |             d1 [style = invis];
8 |             p0 [style = invis];
9 |             p1 [style = invis];
10 |         }
11 |         d0 -> d1 [label=unseen, color="green", style=dotted]
12 |         p0 -> p1 [label=seen]
13 |     }
14 |     0 [label="DetectTermination"]
15 |     1 [label="SendMsg"]
16 |     2 [label="Terminate"]
17 |     3 [label="Wakeup"]
18 |     0 -> 0[penwidth=0.83];
19 |     0 -> 1[penwidth=0.64];
20 |     0 -> 2[penwidth=0.65];
21 |     0 -> 3[penwidth=0.67];
22 |     1 -> 0[color="green",style=dotted];
23 |     1 -> 1[penwidth=0.74];
24 |     1 -> 2[penwidth=0.74];
25 |     1 -> 3[penwidth=0.75];
26 |     2 -> 0[penwidth=0.7];
27 |     2 -> 1[penwidth=0.72];
28 |     2 -> 2[penwidth=0.72];
29 |     2 -> 3[penwidth=0.76];
30 |     3 -> 0[color="green",style=dotted];
31 |     3 -> 1[penwidth=0.76];
32 |     3 -> 2[penwidth=0.76];
33 |     3 -> 3[penwidth=0.75];
34 | }
--------------------------------------------------------------------------------
/C.tla:
--------------------------------------------------------------------------------
1 |
2 |
3 | See Specifying Systems section 6.6 on page 73.
4 |
5 | --------------------------------- MODULE C ----------------------------------
6 | EXTENDS Integers
7 |
8 | S ==
9 |     {"c","a","c","f"}
10 |
11 | VARIABLE
12 |     \* @type: Str;
13 |     x
14 |
15 | -----------------------------------------------------------------------------
16 |
17 | InitC ==
18 |     x = CHOOSE n \in S: TRUE
19 |
20 | NextC ==
21 |     x' = CHOOSE n \in {"a","c","f","f","c","a"}: n \in S
22 |
23 | SpecC == InitC /\ [][NextC]_x
24 |
25 | -----------------------------------------------------------------------------
26 |
27 | InitE ==
28 |     x \in S
29 |
30 | NextE ==
31 |     x' \in {"a","c","f","f","c","a"}
32 |
33 | SpecE == InitE /\ [][NextE]_x
34 |
35 | -----------------------------------------------------------------------------
36 |
37 | \* TLC
38 | InvT ==
39 |     [][x = x']_x
40 |
41 | \* Apalache
42 | InvA ==
43 |     x = x'
44 |
45 | =============================================================================
--------------------------------------------------------------------------------
/.devcontainer/devcontainer.json:
--------------------------------------------------------------------------------
1 | {
2 |     "name": "TLA+ EWD998",
3 |
4 |     // Install optional extensions. If this fails, it just degrades the convenience factor.
5 |     "extensions": [
6 |         "tlaplus.vscode-ide",
7 |         "EFanZh.graphviz-preview",
8 |         "cssho.vscode-svgviewer",
9 |         "tomoki1207.pdf",
10 |         "mhutchie.git-graph",
11 |         "ms-vsliveshare.vsliveshare"
12 |     ],
13 |
14 |     // - Do not automatically update extensions (e.g., the better-comments extension is patched for TLA+)
15 |     // - Use Java GC that works best with TLC.
16 |     // - https://github.com/alygin/vscode-tlaplus/wiki/Automatic-Module-Parsing
17 |     "settings": {
18 |         "extensions.autoUpdate": false,
19 |         "extensions.autoCheckUpdates": false,
20 |         "editor.minimap.enabled": false,
21 |         "tlaplus.tlc.statisticsSharing": "share",
22 |         "tlaplus.java.options": "-XX:+UseParallelGC",
23 |         "tlaplus.java.home": "/home/codespace/java/current/",
24 |         "[tlaplus]": {"editor.codeActionsOnSave": {"source": true} }
25 |     },
26 |
27 |     "onCreateCommand": "bash -i .devcontainer/install.sh",
28 | }
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Markus Alexander Kuppe
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/TemporalLogic.tla:
--------------------------------------------------------------------------------
1 | --------------------------- MODULE TemporalLogic --------------------------------
2 | EXTENDS Naturals, Sequences
3 |
4 | F == FALSE
5 | T == TRUE
6 |
7 | VARIABLE p
8 |
9 | seq ==
10 |     {
11 |         \* << F, T, 2 >>
12 |         \* ,<< T, F, 2 >>
13 |         \* ,<< T, 1 >>
14 |
15 |         \* ,<< T, F, 1 >>
16 |         \* ,<< F, T, T, T, F, 5 >>
17 |         \* ,<< T, F, T, T, F, T, 6 >>
18 |         \* ,<< F, T, T, T, F, 3 >>
19 |     }
20 |
21 | Prop ==
22 |     /\ p = T
23 |     \* /\ []p
24 |     \* /\ <>p
25 |     \* /\ <>[]p
26 |     \* /\ []<>p
27 |
28 | -------------------------------------------------------------------------------
29 | \* Ignore the following!
30 |
31 | VARIABLE c, b
32 | vars == <<p, c, b>>
33 |
34 | Init ==
35 |     /\ b \in seq
36 |     /\ c = 1
37 |     /\ p = b[c]
38 |
39 | Next ==
40 |     /\ UNCHANGED b
41 |     /\ c' = IF c + 1 <= Len(b) - 1 THEN c + 1 ELSE b[Len(b)]
42 |     /\ p' = b[c']
43 |
44 | Spec ==
45 |     Init /\ [][Next]_vars /\ WF_vars(Next)
46 |
47 | Alias ==
48 |     \* Hide c and b variables.
49 |     [ p |-> p ]
50 |
51 | ==============================================================================
--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
1 | {
2 |     "tlaplus.tlc.modelChecker.options": "-deadlock -noTE",
3 |     "tlaplus.tlc.statisticsSharing": "share",
4 |     "tlaplus.java.options": "-XX:+UseParallelGC",
5 |     "[tlaplus]": {"editor.codeActionsOnSave": {
6 |         "source": "explicit"
7 |     } },
8 |     "extensions.autoCheckUpdates": false,
9 |     "extensions.autoUpdate": false,
10 |     "breadcrumbs.enabled": false,
11 |     "editor.minimap.enabled": false,
12 |     "editor.useTabStops": false,
13 |     "redhat.telemetry.enabled": false,
14 |     "settingsSync.ignoredExtensions": [
15 |         "aaron-bond.better-comments"
16 |     ],
17 |     "files.exclude": {
18 |         ".gitignore": true,
19 |         ".gitpod.yml": true,
20 |         ".devcontainer": true,
21 |         ".github": true,
22 |         ".vscode": true,
23 |         "LICENSE": true,
24 |         "figures": true,
25 |         ".tlacache": true,
26 |         "*.tlaps": true,
27 |         "states": true,
28 |         "x": true,
29 |         "log0.smt": true,
30 |         "profile-rules.txt": true,
31 |         "detailed.log": true,
32 |         "*.toolbox": true,
33 |         "*.aux": true,
34 |         "*.dvi": true,
35 |         "*.log": true,
36 |         "*.tex": true,
37 |         "_apalache-out": true
38 |     }
39 | }
40 |
--------------------------------------------------------------------------------
/IncDec.tla:
--------------------------------------------------------------------------------
1 | Apalache is the new kid on the block. Where TLC
2 | implements finite-state model-checking, Apalache
3 | implements bounded model-checking. Apalache
4 | builds on a powerful SMT solver that can answer
5 | queries such as \E n \in 1..Nat : n \in Nat
6 | without enumerating the values of n (TLC won't
7 | even try to enumerate Nat).
8 |
9 | Let's see the different powers of Apalache and
10 | TLC...
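The Python sketch below makes the bounded-model-checking idea concrete under stated assumptions. It enumerates IncDec's states explicitly and breadth-first, whereas Apalache instead hands the unrolled transition relation to an SMT solver symbolically. `bounded_check` is a hypothetical helper and belongs to neither tool. It unrolls Next (Inc \/ Dec, defined in the module below) for at most `depth` steps from Init and returns the first trace that violates Inv. With Inv's bound of 10, the shortest violating trace takes ten Next steps, so any search bounded below that depth finds nothing.

```python
def successors(v):
    # Next == Inc \/ Dec from IncDec.tla
    result = set()
    if v >= 0:
        result.add(v + 1)  # Inc
    if v <= 0:
        result.add(v - 1)  # Dec
    return result


def inv(v, bound=10):
    # Inv == v < 10 /\ v > -10 (bound generalizes the literal 10)
    return -bound < v < bound


def bounded_check(depth, bound=10):
    """Return a shortest trace of at most `depth` Next steps violating Inv, or None."""
    frontier = [[0]]  # Init == v = 0
    for _ in range(depth + 1):
        for trace in frontier:
            if not inv(trace[-1], bound):
                return trace
        frontier = [t + [w] for t in frontier for w in successors(t[-1])]
    return None


print(bounded_check(9))                # None
print(len(bounded_check(10)) - 1)      # 10
```

Passing `bound=1000` mirrors step 3 of the quick demo. The frontier never holds more than two traces, because Inc and Dec are only both enabled at v = 0, so even a depth of 1000 stays cheap here.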
11 |
12 | ### Run tools
13 |
14 | $ apalache-mc check --inv=Inv --length=10 \
15 |     IncDec.tla
16 |
17 | $ tlc -config IncDec.tla IncDec.tla
18 |
19 | ### Quick demo
20 |
21 | 1) Check spec as is
22 | 2) Increment --length to 11
23 | 3) Increase the bounds in Inv and --length to 1000
24 | 4) Change Init to v \in Nat
25 |    4a) Go Apalache!
26 |        (But no longer useful counter-examples
27 |        when checking inductive invariants)
28 |    4b) TLC gives up
29 |        (Workaround: Randomization.tla with
30 |        RandomSubset(42,0..10000000))
31 |
32 | ---- MODULE IncDec ----
33 | EXTENDS Integers, Randomization
34 |
35 | VARIABLE
36 |     \* @type: Int;
37 |     v
38 |
39 | Init ==
40 |     /\ v = 0
41 |
42 | Inc ==
43 |     /\ v >= 0
44 |     /\ v' = v + 1
45 |
46 | Dec ==
47 |     /\ v <= 0
48 |     /\ v' = v - 1
49 |
50 | Next ==
51 |     \/ Inc
52 |     \/ Dec
53 |
54 | Inv ==
55 |     /\ v < 10
56 |     /\ v > -10
57 |
58 | ====
59 | ---- CONFIG IncDec ----
60 | INIT Init
61 | NEXT Next
62 | INVARIANT Inv
63 | ====
--------------------------------------------------------------------------------
/SmokeEWD998.tla:
--------------------------------------------------------------------------------
1 | ------------------------------- MODULE SmokeEWD998 -------------------------------
2 | EXTENDS MCEWD998, TLC, Randomization
3 |
4 | k ==
5 |     10
6 |
7 | \* SmokeInit is configured to re-define the initial predicate. We use SmokeInit
8 | \* to randomly select a subset of the defined initial states in cases when the
9 | \* set of all initial states is too expensive to generate during smoke testing.
10 | SmokeInit ==
11 |     /\ pending \in RandomSubset(k, [Node -> 0..(N-1)])
12 |     /\ counter \in RandomSubset(k, [Node -> -(N-1)..(N-1)])
13 |     /\ active \in RandomSubset(k, [Node -> BOOLEAN])
14 |     /\ color \in RandomSubset(k, [Node -> Color])
15 |     /\ token \in RandomSubset(k, [pos: Node, q: Node, color: Color])
16 |     /\ Inv \* Reject states with invalid ratio between counter, pending, ...
17 |
18 | \* StopAfter has to be configured as a state constraint. It stops TLC after ~1
19 | \* second or after generating 100 traces, whichever comes first, unless TLC
20 | \* encountered an error. In this case, StopAfter has no relevance.
21 | StopAfter ==
22 |     TLCGet("config").mode = "simulate" =>
23 |     (* The smoke test has a time budget of 1 second. *)
24 |     \/ TLCSet("exit", TLCGet("duration") > 1)
25 |     (* Generating 100 traces should provide reasonable coverage. *)
26 |     \/ TLCSet("exit", TLCGet("diameter") > 100)
27 |
28 | ===============================================================================
--------------------------------------------------------------------------------
/AsyncTerminationDetection_apalache.tla:
--------------------------------------------------------------------------------
1 | We want to prove the temporal property Stable, which is defined as:
2 |
3 |     Stable == [](terminationDetected => []terminated)
4 |
5 | For the moment, Apalache supports only invariant checking.
6 | Nevertheless, we can check the property Stable with Apalache.
7 | If we look carefully at the temporal formula Stable, we can see that
8 | it is sufficient to check the following:
9 |
10 |     1. Init => StableInv
11 |     2. StableInv /\ Next => StableInv'
12 |     3. StableInv /\ Next => StableActionInv
13 |
14 | We can check that by issuing the following three queries:
15 |
16 | $ apalache-mc check --config=APAsyncTerminationDetection.cfg --length=1 \
17 |     --inv=StableInv --init=Init AsyncTerminationDetection_apalache.tla
18 | $ apalache-mc check --config=APAsyncTerminationDetection.cfg --length=2 \
19 |     --init=StableInv --inv=StableInv AsyncTerminationDetection_apalache.tla
20 | $ apalache-mc check --config=APAsyncTerminationDetection.cfg --length=2 \
21 |     --init=StableInv --inv=StableActionInv AsyncTerminationDetection_apalache.tla
22 |
23 | We issue query 1 for a computation of length 1 (predicate Init is counted as a
24 | step), whereas we issue queries 2-3 for computations of length 2 (StableInv,
25 | then Next).
26 |
27 | ---------------------- MODULE AsyncTerminationDetection_apalache ---------------------
28 | EXTENDS AsyncTerminationDetection
29 |
30 | \* This is a state invariant.
31 | StableInv ==
32 |     /\ TypeOK
33 |     /\ (terminationDetected => terminated)
34 |
35 | \* This is an action invariant.
36 | StableActionInv ==
37 |     terminated => terminated'
38 | ======================================================================================
39 |
--------------------------------------------------------------------------------
/EWD998_proof.tla:
--------------------------------------------------------------------------------
1 | ---------------------- MODULE EWD998_proof ---------------------
2 | EXTENDS EWD998, TLAPS
3 |
4 | USE NIsPosNat DEF
5 |     Color, Node,
6 |     Init, Spec,
7 |     Next, vars,
8 |     System, InitiateProbe, PassToken,
9 |     Environment, SendMsg, RecvMsg, Deactivate,
10 |     TypeOK
11 |
12 | LEMMA TypeCorrect == Spec => []TypeOK
13 | <1>1. Init => TypeOK OBVIOUS
14 | <1>2. TypeOK /\ [Next]_vars => TypeOK'
15 | <1>3. QED BY <1>1, <1>2, PTL
16 |
17 | THEOREM TerminationDetection == Spec => []IInv
18 | <1> USE TypeCorrect DEF IInv, Inv, Sum
19 | <1>1. Init => IInv
20 | <1>2. IInv /\ [Next]_vars => IInv'
21 | <1>3. QED BY <1>1, <1>2, PTL
22 |
23 | \* TODO Have fun and prove TerminationDetection above! When done, file a PR
24 | \* TODO for the TLA+ examples at https://examples.tlapl.us :-)
25 |
26 | =============================================================================
27 |
28 |
29 |
30 | \* The <1>1 proof obligation is not OBVIOUS, but the failed proof obligation
31 | \* nicely shows the equivalence of the special syntax for recursive functions
32 | \* F[e \in S] == ... and CHOOSE.
33 | \* Below is an excerpt of what TLAPS returns for <1>1:
34 |
35 | ASSUME NEW CONSTANT N,
36 |        NEW VARIABLE active,
37 |        NEW VARIABLE pending,
38 |        NEW VARIABLE color,
39 |        NEW VARIABLE counter,
40 |        NEW VARIABLE token,
41 |        N \in Nat \ {0}
42 | PROVE  (/\ ...
43 |         => ...
44 |            /\ /\ P0::(B
45 |                       = (CHOOSE sum :
46 |                            sum
47 |                            = [i \in 0..N - 1 |->
48 |                                 IF i = 0
49 |                                   THEN counter[i]
50 |                                   ELSE sum[i - 1] + counter[i]])[N - 1])
51 |               /\ \/ P1:: ...
52 |
--------------------------------------------------------------------------------
/SyncTerminationDetection.tla:
--------------------------------------------------------------------------------
1 | ---------------------- MODULE SyncTerminationDetection ----------------------
2 | (***************************************************************************)
3 | (* An abstract specification of the termination detection problem in a *)
4 | (* ring with synchronous communication. *)
5 | (***************************************************************************)
6 | EXTENDS Naturals
7 | CONSTANT N
8 | ASSUME NAssumption == N \in Nat \ {0}
9 |
10 | Node == 0 .. N-1
11 |
12 | VARIABLES
13 |     active,               \* activation status of nodes
14 |     terminationDetected   \* has termination been detected?
15 |
16 | TypeOK ==
17 |     /\ active \in [Node -> BOOLEAN]
18 |     /\ terminationDetected \in BOOLEAN
19 |
20 | terminated == \A n \in Node : ~ active[n]
21 |
22 | (***************************************************************************)
23 | (* Initial condition: the nodes can be active or inactive, termination *)
24 | (* may (but need not) be detected immediately if all nodes are inactive. *)
25 | (***************************************************************************)
26 | Init ==
27 |     /\ active \in [Node -> BOOLEAN]
28 |     /\ terminationDetected \in {FALSE, terminated}
29 |
30 | Terminate(i) ==  \* node i terminates
31 |     /\ active[i]
32 |     /\ active' = [active EXCEPT ![i] = FALSE]
33 |        (* possibly (but not necessarily) detect termination if all nodes are inactive *)
34 |     /\ terminationDetected' \in {terminationDetected, terminated'}
35 |
36 | Wakeup(i,j) ==  \* node i activates node j
37 |     /\ active[i]
38 |     /\ active' = [active EXCEPT ![j] = TRUE]
39 |     /\ UNCHANGED terminationDetected
40 |
41 | DetectTermination ==
42 |     /\ terminated
43 |     /\ terminationDetected' = TRUE
44 |     /\ UNCHANGED active
45 |
46 | Next ==
47 |     \/ \E i \in Node : Terminate(i)
48 |     \/ \E i,j \in Node : Wakeup(i,j)
49 |     \/ DetectTermination
50 |
51 | vars == <<active, terminationDetected>>
52 | Spec == Init /\ [][Next]_vars /\ WF_vars(DetectTermination)
53 |
54 | Stable == [](terminationDetected => []terminated)
55 |
56 | Live == terminated ~> terminationDetected
57 |
58 | =============================================================================
59 |
--------------------------------------------------------------------------------
/.github/workflows/main.yml:
--------------------------------------------------------------------------------
1 | name: CI
2 |
3 | on: [push]
4 |
5 | jobs:
6 |   build:
7 |
8 |     runs-on: ubuntu-latest
9 |
10 |     steps:
11 |     - uses: actions/checkout@v1
12 |     # Do not download and install TLAPS over and over again.
13 |     - uses: actions/cache@v1
14 |       id: cache
15 |       with:
16 |         path: tlaps/
17 |         key: tlaps1.4.5
18 |     - name: Get TLAPS
19 |       if: steps.cache.outputs.cache-hit != 'true' # see actions/cache above
20 |       run: wget https://github.com/tlaplus/tlapm/releases/download/v1.4.5/tlaps-1.4.5-x86_64-linux-gnu-inst.bin
21 |     - name: Install TLAPS
22 |       if: steps.cache.outputs.cache-hit != 'true' # see actions/cache above
23 |       run: |
24 |         chmod +x tlaps-1.4.5-x86_64-linux-gnu-inst.bin
25 |         ./tlaps-1.4.5-x86_64-linux-gnu-inst.bin -d tlaps
26 |     - name: Run TLAPS
27 |       run: tlaps/bin/tlapm --cleanfp -I tlaps/ O.tla AsyncTerminationDetection_proof.tla
28 |     - name: Get (nightly) TLC
29 |       run: wget https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar
30 |     - name: Run TLC
31 |       run: >-
32 |         java -Dtlc2.TLC.stopAfter=1800 -Dtlc2.TLC.ide=Github
33 |         -Dutil.ExecutionStatisticsCollector.id=aabbcc60f238424fa70d124d0c77bbf1
34 |         -cp tla2tools.jar tlc2.TLC -workers auto -lncheck final -checkpoint 60
35 |         -coverage 60 -tool -deadlock MCAsyncTerminationDetection
36 |     - name: Get (nightly) Apalache
37 |       run: wget https://github.com/informalsystems/apalache/releases/latest/download/apalache.tgz
38 |     - name: Install Apalache
39 |       run: |
40 |         tar xvfz apalache.tgz
41 |     - name: Run Apalache
42 |       run: |
43 |         apalache/bin/apalache-mc check --config=APAsyncTerminationDetection.cfg --length=1 --inv=StableInv --init=Init AsyncTerminationDetection_apalache.tla
44 |         apalache/bin/apalache-mc check --config=APAsyncTerminationDetection.cfg --length=2 --init=StableInv --inv=StableInv AsyncTerminationDetection_apalache.tla
45 |         apalache/bin/apalache-mc check --config=APAsyncTerminationDetection.cfg --length=2 --init=StableInv --inv=StableActionInv AsyncTerminationDetection_apalache.tla
46 |         apalache/bin/apalache-mc check --features=no-rows --config=APEWD998.cfg --length=2 --init=IInvA --next=Next --inv=InvA APEWD998.tla
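The Run Apalache step above issues the three queries that AsyncTerminationDetection_apalache.tla derives for Stable:

1. StableInv holds initially.
2. StableInv is preserved by Next.
3. Every Next step taken from a StableInv state satisfies StableActionInv.

The Python sketch below is a brute-force, explicit-state illustration of the same three obligations. It is not part of the repository. It targets SyncTerminationDetection.tla, chosen only because that spec's whole Next relation appears in this listing (AsyncTerminationDetection.tla is not shown here). It assumes N = 3, borrowed from APAsyncTerminationDetection.cfg, and reuses the StableInv and StableActionInv predicates as hypothetical stand-ins. As with Apalache's `--init=StableInv`, obligations 2 and 3 start from every type-correct state satisfying StableInv, not only from reachable ones. That is what makes this an inductiveness check.

```python
from itertools import product

N = 3  # assumption: same N as in APAsyncTerminationDetection.cfg
NODES = range(N)


def terminated(active):
    return not any(active)


def type_ok_states():
    """All states (active, terminationDetected) satisfying TypeOK."""
    for active in product((False, True), repeat=N):
        for detected in (False, True):
            yield active, detected


def init(state):
    active, detected = state
    return detected in {False, terminated(active)}


def set_node(active, i, value):
    return active[:i] + (value,) + active[i + 1:]


def next_states(state):
    """Successors under Terminate(i), Wakeup(i, j) and DetectTermination."""
    active, detected = state
    for i in NODES:
        if not active[i]:
            continue  # both Terminate(i) and Wakeup(i, j) require active[i]
        after = set_node(active, i, False)
        for d in {detected, terminated(after)}:
            yield after, d
        for j in NODES:
            yield set_node(active, j, True), detected
    if terminated(active):
        yield active, True


def stable_inv(state):
    active, detected = state
    return (not detected) or terminated(active)


def stable_action_inv(state, succ):
    return (not terminated(state[0])) or terminated(succ[0])


STATES = list(type_ok_states())
ob1 = all(stable_inv(s) for s in STATES if init(s))
ob2 = all(stable_inv(t) for s in STATES if stable_inv(s) for t in next_states(s))
ob3 = all(stable_action_inv(s, t) for s in STATES if stable_inv(s) for t in next_states(s))
print(ob1, ob2, ob3)  # True True True
```

Together, the three results imply Stable for this spec. terminationDetected implies terminated, and once terminated holds, no step can falsify it.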
-------------------------------------------------------------------------------- /MCAsyncTerminationDetection.tla: -------------------------------------------------------------------------------- 1 | ---------------------- MODULE MCAsyncTerminationDetection --------------------- 2 | EXTENDS AsyncTerminationDetection 3 | 4 | MCInit == 5 | /\ pending \in [Node -> {1,2,3}] 6 | /\ active \in [ Node -> BOOLEAN ] 7 | /\ terminationDetected \in {terminated} 8 | 9 | StateConstraint == 10 | \* * A (state-) constraint is a boolean-valued state function, i.e. a function 11 | \* * that is true or false of a state. 12 | \* * A state s, for which the constraint evaluates to FALSE, is not in the model. 13 | \* * TLC checks if s satisfies the properties (later!), but the successor states 14 | \* * of s are not generated. 15 | \* * Constraints are configured in TLC's configuration file 16 | \* * (MCAsyncTerminationDetection.cfg). 17 | \* * In this model, we restrict the state space to a finite fragment such that 18 | \* * at most three messages are pending. 19 | \A n \in Node : pending[n] <= 3 20 | 21 | ActionConstraint == 22 | \* * A state function cannot only be built from constant- and state-level operators. 23 | \* * Among others, the prime operator has action-level. Thus, it cannot appear in 24 | \* * a state function such as this state constraint. Fortunately, TLC also supports 25 | \* * action constraints. 26 | \* * There exists no node for which pending increases. 27 | ~ \E n \in Node: pending'[n] > pending[n] 28 | 29 | \* * We could have stated the constraint in AsyncTerminationDetection.tla instead of 30 | \* * in a new module. However, constraints are only relevant when model-checking 31 | \* * and not part of the system design. 32 | 33 | \* Gradually increase the value of CONSTANT N in MCAsyncTerminationDetection.cfg 34 | \* and observe how quickly the size of the state space explodes (distinct states). 35 | \* Do we need a supercomputer for model-checking to be useful? 
Usually, most bugs 36 | \* are found even with tiny models. This is called the "small scope hyphothesis". 37 | \* If higher assurances are needed, one can write a proof for infinite domains with 38 | \* the TLA proof system. 39 | ============================================================================= 40 | 41 | | N | Diameter | Distinct States | 42 | |---| ---| --- | 43 | | 4 | 17 | 4k | 44 | | 5 | 21 | 32k | 45 | | 6 | 25 | 262k | 46 | -------------------------------------------------------------------------------- /APEWD998.tla: -------------------------------------------------------------------------------- 1 | 2 | \* Cannot check IInvA as the invariant below because Apalache complains about TypeOK : 3 | \* Input error (see the manual): Found a set map over an infinite set of Int. Not supported. 4 | 5 | \* Remove/comment PROPERTY ATDSpec in MCEWD998.cfg to stop Apalache (0.23.1) from complaining about: 6 | \* AsyncTerminationDetection.tla:340:30-340:55: unsupported expression: WF_< Bool; 23 | active, 24 | \* @type: Int -> Int; 25 | pending, 26 | \* @type: Int -> Str; 27 | color, 28 | \* @type: Int -> Int; 29 | counter, 30 | \* @type: [pos: Int, q: Int, color: Str]; 31 | token 32 | 33 | INSTANCE EWD998 34 | 35 | \* C parameter of SumA because Apalache does not handle non-constant ranges 36 | \* (see https://git.io/JGFhg) 37 | \* @type: (Int -> Int, Set(Int)) => Int; 38 | SumA(fun, C) == 39 | LET Plus(a, b) == a + b 40 | IN FoldFunctionOnSet(Plus, 0, fun, C) 41 | 42 | BA == 43 | \* This spec counts the in-flight messages in the variable pending . 44 | SumA(pending, Node) 45 | 46 | \* The set of nodes that have passed the token in this round. 47 | \* Previously written more concisely as (token.pos+1)..N-1 48 | \* (see https://git.io/JGFhg) 49 | Decided == 50 | { n \in Node: n > token.pos } 51 | 52 | \* The set of nodes that have not passed the token in this round yet. 
53 | \* Previously written more concisely as 0..token.pos
54 | \* (see https://git.io/JGFhg)
55 | Undecided ==
56 |     { n \in Node: n <= token.pos }
57 |
58 | InvA ==
59 |     /\ P0:: BA = SumA(counter, Node)
60 |     /\ \/ P1:: /\ \A i \in Decided : ~ active[i]
61 |                /\ IF token.pos = N-1
62 |                   THEN token.q = 0
63 |                   ELSE token.q = SumA(counter, Decided)
64 |        \/ P2:: SumA(counter, Undecided) + token.q > 0
65 |        \/ P3:: \E i \in Undecided : color[i] = "black"
66 |        \/ P4:: token.color = "black"
67 |
68 | IInvA ==
69 |     \* Conjoin TypeOK to define the types of the variables. This is somewhat
70 |     \* redundant given Apalache's type annotations.
71 |     /\ TypeOK
72 |     /\ InvA
73 |
74 | =============================================================================
75 |
--------------------------------------------------------------------------------
/.devcontainer/install.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash -i
2 |
3 | ## Fix issues with gitpod's stock .bashrc
4 | cp /etc/skel/.bashrc $HOME
5 |
6 | ## Shorthands for git
7 | git config --global alias.slog 'log --pretty=oneline --abbrev-commit'
8 | git config --global alias.co checkout
9 | git config --global alias.lco '!f() { git checkout ":/$1:" ; }; f'
10 |
11 | ## Waste less screen real estate on the prompt.
12 | echo 'export PS1="$ "' >> $HOME/.bashrc
13 |
14 | ## Make it easy to go back and forth in the (linear) git history.
15 | echo 'function sn() { git log --reverse --pretty=%H main | grep -A 1 $(git rev-parse HEAD) | tail -n1 | xargs git show --color; }' >> $HOME/.bashrc
16 | echo 'function n() { git log --reverse --pretty=%H main | grep -A 1 $(git rev-parse HEAD) | tail -n1 | xargs git checkout; }' >> $HOME/.bashrc
17 | echo 'function p() { git checkout HEAD^; }' >> $HOME/.bashrc
18 |
19 | ## Place to install TLC, TLAPS, Apalache, ...
20 | mkdir -p tools
21 |
22 | ## PATH below has two locations because of inconsistencies between Gitpod and Codespaces.
23 | ## Gitpod: /workspace/... 24 | ## Codespaces: /workspaces/... 25 | 26 | ## Install TLA+ Tools (download from github instead of nightly.tlapl.us (inria) to only rely on github) 27 | wget -qN https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar -P tools/ 28 | echo "alias tlcrepl='java -cp /workspace/ewd998/tools/tla2tools.jar:/workspaces/ewd998/tools/tla2tools.jar tlc2.REPL'" >> $HOME/.bashrc 29 | echo "alias tlc='java -cp /workspace/ewd998/tools/tla2tools.jar:/workspaces/ewd998/tools/tla2tools.jar tlc2.TLC'" >> $HOME/.bashrc 30 | 31 | ## Install CommunityModules 32 | wget -qN https://github.com/tlaplus/CommunityModules/releases/latest/download/CommunityModules-deps.jar -P tools/ 33 | 34 | ## Install TLAPS (proof system) 35 | wget -N https://github.com/tlaplus/tlapm/releases/download/v1.4.5/tlaps-1.4.5-x86_64-linux-gnu-inst.bin -P /tmp 36 | chmod +x /tmp/tlaps-1.4.5-x86_64-linux-gnu-inst.bin 37 | /tmp/tlaps-1.4.5-x86_64-linux-gnu-inst.bin -d tools/tlaps 38 | echo 'export PATH=$PATH:/workspace/ewd998/tools/tlaps/bin:/workspaces/ewd998/tools/tlaps/bin' >> $HOME/.bashrc 39 | 40 | ## Install Apalache 41 | wget -qN https://github.com/informalsystems/apalache/releases/latest/download/apalache.tgz -P /tmp 42 | mkdir -p tools/ 43 | tar xvfz /tmp/apalache.tgz --directory tools/ 44 | echo 'export PATH=$PATH:/workspace/ewd998/tools/apalache/bin:/workspaces/ewd998/tools/apalache/bin' >> $HOME/.bashrc 45 | tools/apalache/bin/apalache-mc config --enable-stats=true 46 | 47 | ## Update missing or outdated apt database on cloud instances. Without it, 48 | ## installing packages below will likely fail. 
49 | sudo apt-get update 50 | 51 | ## (Moved to the end to let it run in the background while we get started) 52 | ## - graphviz to visualize TLC's state graphs 53 | ## - htop to show system load 54 | ## - texlive-latex-recommended to generate pretty-printed specs 55 | ## - z3 for Apalache (comes with z3 turnkey) (TLAPS brings its own install) 56 | ## - r-base iff tutorial covers statistics (TODO) 57 | sudo apt-get install -y graphviz htop 58 | ## No need because Apalache comes with z3 turnkey 59 | #sudo apt-get install -y z3 libz3-java 60 | sudo apt-get install -y --no-install-recommends texlive-latex-recommended 61 | #sudo apt-get install -y r-base 62 | 63 | ## Install TLA+ Toolbox 64 | wget https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/TLAToolbox-1.8.0.deb -P /tmp 65 | sudo dpkg -i /tmp/TLAToolbox-1.8.0.deb 66 | 67 | ## Switch to the first commit of the tutorial. Unshallow on Codespaces first. 68 | if $(git rev-parse --is-shallow-repository); then git fetch --unshallow; fi 69 | git co ':/v01:' 70 | 71 | ## $(pwd)/ because VSCode apparently doesn't like relative paths. 72 | #code --force --install-extension $(pwd)/.devcontainer/extensions/better-comments-2.0.5.vsix 73 | #code --force --install-extension $(pwd)/.devcontainer/extensions/EFanZh.graphviz-preview-1.5.0.vsix 74 | 75 | ## Open the README.md file in the editor. 76 | #code README.md 77 | -------------------------------------------------------------------------------- /O.tla: -------------------------------------------------------------------------------- 1 | Run `tlapm O.tla` on the terminal to verify the 2 | theorems below with TLAPS.
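The propositional and quantifier theorems below can also be sanity-checked without TLAPS by enumerating every truth assignment. The following Python sketch is only an illustration (Python is not part of this repository, and the helper names `valid_pq`, `valid_o`, `PROPOSITIONAL` and `QUANTIFIERS` are made up for this example). It encodes the truth table of `=>` given by the four rain/wet-street theorems, checks contraposition, or-and-if and the negated conditional, and shows why T1 and T4 are commented out: both fail for a predicate O that holds at 1 but not at 2.

```python
from itertools import product

# Truth table of TLA+'s "=>", taken from the four rain (P) / wet street (Q) theorems.
IMPLIES = {
    (True, True): True,     # TRUE => TRUE <=> TRUE
    (True, False): False,   # TRUE => FALSE <=> FALSE
    (False, True): True,    # FALSE => TRUE <=> TRUE
    (False, False): True,   # FALSE => FALSE <=> TRUE
}

BOOLS = (False, True)


def valid_pq(formula):
    """True iff formula(p, q) holds for every assignment of the Booleans P and Q."""
    return all(formula(p, q) for p, q in product(BOOLS, repeat=2))


def valid_o(formula):
    """True iff formula(o) holds for every predicate O over the domain {1, 2}."""
    return all(formula({1: a, 2: b}) for a, b in product(BOOLS, repeat=2))


PROPOSITIONAL = {
    "contraposition": lambda p, q: IMPLIES[(p, q)] == IMPLIES[(not q, not p)],
    "or-and-if": lambda p, q: IMPLIES[(p, q)] == ((not p) or q),
    "negated conditional": lambda p, q: (not IMPLIES[(p, q)]) == (p and not q),
}

# T1..T4 from the top of module O; T1 and T4 are commented out there.
QUANTIFIERS = {
    "T1": lambda o: (o[1] and o[2]) == any(o[i] for i in (1, 2)),
    "T2": lambda o: (o[1] and o[2]) == all(o[i] for i in (1, 2)),
    "T3": lambda o: (o[1] or o[2]) == any(o[i] for i in (1, 2)),
    "T4": lambda o: (o[1] or o[2]) == all(o[i] for i in (1, 2)),
}

if __name__ == "__main__":
    for name, f in PROPOSITIONAL.items():
        print(name, valid_pq(f))
    for name, f in QUANTIFIERS.items():
        print(name, valid_o(f))
```

Running it prints True for the three propositional laws and for T2 and T3, and False for T1 and T4. Brute force only works here because P, Q and O range over finitely many values; TLAPS proves the same equivalences symbolically.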
3 | 4 | ---- MODULE O ---- 5 | 6 | CONSTANT O(_) 7 | 8 | \* THEOREM T1 == O(1) /\ O(2) <=> \E i \in {1,2}: O(i) OBVIOUS 9 | THEOREM T2 == O(1) /\ O(2) <=> \A i \in {1,2}: O(i) OBVIOUS 10 | THEOREM T3 == O(1) \/ O(2) <=> \E i \in {1,2}: O(i) OBVIOUS 11 | \* THEOREM T4 == O(1) \/ O(2) <=> \A i \in {1,2}: O(i) OBVIOUS 12 | 13 | 14 | ------------------ 15 | \* Implication 16 | 17 | CONSTANT 18 | P, \* It's raining 19 | Q \* The street is wet (street is not in a tunnel!) 20 | 21 | \* If it rains (P), the street is wet (Q) 22 | THEOREM TRUE => TRUE <=> TRUE OBVIOUS 23 | \* It cannot be that it rains, but the street is dry 24 | THEOREM TRUE => FALSE <=> FALSE OBVIOUS 25 | \* The street might be wet, even without rain (somebody spilled some water) 26 | THEOREM FALSE => TRUE <=> TRUE OBVIOUS 27 | \* No rain and a dry street 28 | THEOREM FALSE => FALSE <=> TRUE OBVIOUS 29 | 30 | \* Contraposition (Street not wet implies no rain). 31 | \* https://en.wikipedia.org/wiki/Contraposition 32 | THEOREM P => Q <=> ~Q => ~P OBVIOUS 33 | \* Or-and-if. 34 | THEOREM P => Q <=> (~P) \/ Q OBVIOUS 35 | \* Negated conditionals. 36 | THEOREM ~(P => Q) <=> P /\ (~Q) OBVIOUS 37 | 38 | ------------------ 39 | \* Action operators 40 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 41 | PROVE [A]_v <=> A \/ v' = v OBVIOUS 42 | 43 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 44 | PROVE <<A>>_v <=> A /\ v' # v OBVIOUS 45 | 46 | \* ExpandENABLED requires a TLAPS version newer than 1.4 47 | \* ENABLED A \/ v=v' is a tautology.
48 | INSTANCE TLAPS 49 | 50 | THEOREM ASSUME NEW VARIABLE v 51 | PROVE (ENABLED [FALSE]_v) (*BY ExpandENABLED*) 52 | 53 | THEOREM ASSUME NEW VARIABLE v 54 | PROVE (ENABLED [TRUE]_v) (*BY ExpandENABLED*) 55 | 56 | THEOREM ASSUME NEW VARIABLE v 57 | PROVE (ENABLED [FALSE]_TRUE) (*BY ExpandENABLED*) 58 | 59 | THEOREM ASSUME NEW VARIABLE v 60 | PROVE (ENABLED [TRUE]_TRUE) (*BY ExpandENABLED*) 61 | 62 | ------------------ 63 | \* Dual Box and Diamond operators 64 | THEOREM ASSUME NEW F 65 | PROVE <>F <=> ~[]~F OBVIOUS 66 | 67 | THEOREM ASSUME NEW F 68 | PROVE ~<>F <=> []~F OBVIOUS 69 | 70 | \* see Specifying Systems page 92 71 | THEOREM ASSUME NEW F 72 | PROVE ~[]F <=> <>~F OBVIOUS 73 | 74 | \* see Specifying Systems page 93 75 | THEOREM ASSUME NEW F, NEW G 76 | PROVE 77 | /\ [](F /\ G) <=> ([]F) /\ ([]G) 78 | /\ <>(F \/ G) <=> (<>F) \/ (<>G) 79 | OBVIOUS 80 | 81 | THEOREM ASSUME NEW F, NEW G 82 | PROVE 83 | /\ ([]F) \/ ([]G) => [](F \/ G) 84 | /\ <>(F /\ G) => (<>F) /\ (<>G) 85 | OBVIOUS 86 | 87 | \* see Specifying Systems page 94 88 | THEOREM ASSUME NEW ACTION A, NEW ACTION B, NEW VARIABLE v 89 | PROVE 90 | /\ [A /\ B]_v <=> [A]_v /\ [B]_v 91 | /\ <<A \/ B>>_v <=> <<A>>_v \/ <<B>>_v 92 | \* 8.5 93 | /\ ([]<><>_v) \/ ([]<><>_v) <=> ([]<><>_v) \/ ([]<><>_v) 94 | OBVIOUS 95 | 96 | \* see Specifying Systems page 95 97 | THEOREM ASSUME NEW ACTION A, NEW ACTION B, NEW VARIABLE v 98 | PROVE 99 | /\ []<><<A \/ B>>_v <=> ([]<><<A>>_v) \/ ([]<><<B>>_v) 100 | BY PTL 101 | 102 | ------------------ 103 | \* (Weak) Fairness (see Specifying Systems page 97ff for more equivalent formulae) 104 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 105 | PROVE ( <>[](ENABLED <<A>>_v) => []<><<A>>_v ) <=> ( []([]ENABLED <<A>>_v => <><<A>>_v) ) BY PTL 106 | 107 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 108 | PROVE ( <>[](ENABLED <<A>>_v) => []<><<A>>_v ) <=> ( WF_v(A) ) BY PTL 109 | 110 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 111 | PROVE ( []<>(~ENABLED <<A>>_v) \/ []<><<A>>_v ) <=> ( WF_v(A) ) BY PTL 112 | 113 | THEOREM ASSUME NEW
ACTION A, NEW VARIABLE v 114 | PROVE ( []<>(ENABLED <<A>>_v) => []<><<A>>_v ) <=>( SF_v(A) ) BY PTL 115 | 116 | ------------------ 117 | \* Leads-to 118 | THEOREM ASSUME NEW F, NEW G 119 | PROVE [](F => <>G) <=> (F ~> G) OMITTED 120 | 121 | ------------------ 122 | \* CHOOSE 123 | 124 | THEOREM ASSUME NEW P(_), NEW S 125 | PROVE ( \E c: P(c) ) <=> ( P(CHOOSE c: P(c)) ) OBVIOUS 126 | 127 | ==== 128 | -------------------------------------------------------------------------------- /F.tla: -------------------------------------------------------------------------------- 1 | ---- MODULE F ---- 2 | EXTENDS Naturals, FiniteSets, Sequences 3 | 4 | (* 1. Set of all permutations of {"T","L","A"} including repetitions. *) 5 | PermsWithReps(S) == 6 | [ 1..Cardinality(S) -> S ] 7 | 8 | ASSUME 9 | PermsWithReps({"T","L","A"}) = 10 | {<<"T", "T", "T">>, <<"T", "T", "L">>, <<"T", "T", "A">>, 11 | <<"T", "L", "T">>, <<"T", "L", "L">>, <<"T", "L", "A">>, 12 | <<"T", "A", "T">>, <<"T", "A", "L">>, <<"T", "A", "A">>, 13 | <<"L", "T", "T">>, <<"L", "T", "L">>, <<"L", "T", "A">>, 14 | <<"L", "L", "T">>, <<"L", "L", "L">>, <<"L", "L", "A">>, 15 | <<"L", "A", "T">>, <<"L", "A", "L">>, <<"L", "A", "A">>, 16 | <<"A", "T", "T">>, <<"A", "T", "L">>, <<"A", "T", "A">>, 17 | <<"A", "L", "T">>, <<"A", "L", "L">>, <<"A", "L", "A">>, 18 | <<"A", "A", "T">>, <<"A", "A", "L">>, <<"A", "A", "A">>} 19 | 20 | (* 2. All combinations of a two-digit lock. *) 21 | TwoDigitLock == 22 | [1..2 -> 0..9] 23 | 24 | ASSUME 25 | /\ (0..9) \X (0..9) = TwoDigitLock 26 | /\ {<<n, m>> : n,m \in 10..19} \notin SUBSET TwoDigitLock 27 | 28 | (* 3. All combinations of a three-digit lock. *) 29 | ThreeDigitLock == 30 | [1..3 -> 0..9] 31 | 32 | ASSUME 33 | /\ (0..9) \X (0..9) \X (0..9) = ThreeDigitLock 34 | /\ {<<n, m, o>> : n,m,o \in 10..19} \notin SUBSET ThreeDigitLock 35 | 36 | (* 4. All pairs (including repetitions) of the natural numbers.
*) 37 | PairsOfNaturals == 38 | [1..2 -> Nat] 39 | 40 | ASSUME 41 | {<<n, m>> : n,m \in 0..100} \subseteq PairsOfNaturals 42 | 43 | (* 5. All triples... *) 44 | TriplesOfNaturals == 45 | [1..3 -> Nat] 46 | 47 | ASSUME 48 | {<<n, m, o>> : n,m,o \in 0..25} \subseteq TriplesOfNaturals 49 | 50 | (* 6. Set of all pairs and triples... *) 51 | PairsAndTriplesOfNaturals == 52 | [1..2 -> Nat] \cup [1..3 -> Nat] 53 | 54 | ASSUME 55 | /\ {<<n, m>> : n,m \in 0..100} \subseteq PairsAndTriplesOfNaturals 56 | /\ {<<n, m, o>> : n,m,o \in 0..25} \subseteq PairsAndTriplesOfNaturals 57 | 58 | (* 7. What is the Cardinality of 3. ? *) 59 | Cardinality3 == 60 | Cardinality(ThreeDigitLock) 61 | 62 | ASSUME Cardinality3 = 1000 63 | 64 | (* 8. What is the Cardinality of 6. (PairsAndTriplesOfNaturals) ? *) 65 | 66 | -------------------------------------------------------------- 67 | 68 | (* 9. The range (image) of a function. *) 69 | Range(f) == { f[x]: x \in DOMAIN f } 70 | 71 | ASSUME Range([a |-> 1, b |-> 2, c |-> 3]) = 1..3 72 | 73 | (* 10. The permutations of a set _without_ repetition. *) 74 | Perms(S) == 75 | { f \in [S -> S] : 76 | Range(f) = S } 77 | 78 | ASSUME Perms({1,2,3}) = 79 | {<<1, 2, 3>>, <<1, 3, 2>>, 80 | <<2, 1, 3>>, <<2, 3, 1>>, 81 | <<3, 1, 2>>, <<3, 2, 1>>} 82 | 83 | Perms2(S) == 84 | \* If for all w in S there exists a v in S for which f[v]=w, 85 | \* f cannot have repetitions because S is finite. The predicate 86 | \* requires every element of S to be in the range of f. 87 | { f \in [S -> S] : 88 | \A w \in S : 89 | \E v \in S : f[v]=w } 90 | 91 | ASSUME Perms2({1,2,3}) = 92 | {<<1, 2, 3>>, <<1, 3, 2>>, 93 | <<2, 1, 3>>, <<2, 3, 1>>, 94 | <<3, 1, 2>>, <<3, 2, 1>>} 95 | 96 | Perms3(S) == 97 | { f \in [S -> S] : 98 | \A i,j \in DOMAIN f : 99 | i # j => f[i] # f[j] } 100 | 101 | ASSUME Perms3({1,2,3}) = 102 | {<<1, 2, 3>>, <<1, 3, 2>>, 103 | <<2, 1, 3>>, <<2, 3, 1>>, 104 | <<3, 1, 2>>, <<3, 2, 1>>} 105 | 106 | (* 11. Reverse a sequence (a function with domain 1..N).
*) 107 | Reverse(seq) == 108 | [ i \in 1..Len(seq) |-> seq[Len(seq)+1 - i] ] 109 | 110 | ASSUME Reverse(<<1, 2, 3>>) = <<3, 2, 1>> 111 | ASSUME Reverse(<<>>) = <<>> 112 | 113 | (* 12. An (infix) operator to quickly define a function mapping an x to a y. *) 114 | x :> y == 115 | [ e \in {x} |-> y ] 116 | 117 | ASSUME "x" :> 42 = [ x |-> 42 ] 118 | 119 | (* 13. Merge two functions f and g *) 120 | f ++ g == 121 | [x \in (DOMAIN f) \cup (DOMAIN g) |-> IF x \in DOMAIN f THEN f[x] ELSE g[x]] 122 | 123 | ASSUME <<1,2,3>> ++ [i \in 1..6 |-> i] = <<1, 2, 3, 4, 5, 6>> 124 | 125 | (* 14. Advanced!!! Inverse of a function f (swap the domain and range). *) 126 | Inverse(f) == 127 | CHOOSE g \in [ Range(f) -> DOMAIN f] : \A s \in DOMAIN f: g[f[s]]=s 128 | 129 | ASSUME Inverse(("a" :> 0) ++ ("b" :> 1) ++ ("c" :> 2)) = 130 | ((0 :> "a") ++ (1 :> "b") ++ (2 :> "c")) 131 | 132 | -------------------------------------------------------------- 133 | 134 | \* Mutual recursion becomes possible 135 | \* with recursive *operators*. 136 | \* Evaluate in the tlcrepl with: 137 | \* LET F == INSTANCE F IN F!IsEven(42) 138 | ---------------------- 139 | RECURSIVE IsEven(_) 140 | 141 | RECURSIVE IsOdd(_) 142 | 143 | IsEven(n) == 144 | IF n = 0 145 | THEN TRUE 146 | ELSE IsOdd(n-1) 147 | 148 | IsOdd(n) == 149 | IF n = 0 150 | THEN FALSE 151 | ELSE IsEven(n-1) 152 | 153 | ================== 154 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ewd998 2 | Experience TLA+ in action by specifying distributed termination detection on a ring, [due to Shmuel Safra](https://www.cs.utexas.edu/users/EWD/ewd09xx/EWD998.PDF). Each [git commit](https://github.com/lemmy/ewd998/commits/) introduces a new TLA+ concept. Go back to the very first commit to follow along! 
3 | 4 | ### v00: IDE 5 | 6 | Click either of the buttons to launch a zero-install IDE to give the TLA+ specification language a try: 7 | 8 | [![Open TLA+ EWD998 in Codespaces](https://img.shields.io/badge/TLA+-in--Codespaces-grey?labelColor=ee4e14&style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyBmaWxsPSIjNjY2NjY2IiByb2xlPSJpbWciIHZpZXdCb3g9IjAgMCAyNCAyNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48dGl0bGU+TWljcm9zb2Z0IGljb248L3RpdGxlPjxwYXRoIGQ9Ik0xMS40IDI0SDBWMTIuNmgxMS40VjI0ek0yNCAyNEgxMi42VjEyLjZIMjRWMjR6TTExLjQgMTEuNEgwVjBoMTEuNHYxMS40em0xMi42IDBIMTIuNlYwSDI0djExLjR6Ii8+PC9zdmc+)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=408523143&machine=standardLinux32gb&devcontainer_path=.devcontainer%2Fdevcontainer.json&location=WestUs2) 9 | [![Open TLA+ EWD998 in Gitpod Ready-to-Code](https://img.shields.io/badge/TLA+-in--Gitpod-grey?labelColor=ee4e14&style=for-the-badge&logo=gitpod)](https://gitpod.io/#https://github.com/ewd998/ewd998) 10 | 11 | (=> [Screencast on how to create the TLA+ Codespace](https://www.youtube.com/watch?v=mFWWDcJahg0&list=PLWLcqZLzY8u_oWnCTGC77OgZlWaab06Gt)) 12 | 13 | ### v01: Problem statement - Termination detection in a ring 14 | 15 | #### v01a: Termination of [pleasingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) 16 | 17 | For this tutorial, we assume that the distributed system nodes are organized as a ring, with one of them being the (unique) leader[^1]. If we further assume that nodes execute independent computations, (global) termination detection becomes trivial: the leader initiates a token transfer around the ring, and each node passes the token to its next neighbor iff the node has finished its computation. When the initiator receives the token back, it knows that all (other) nodes have terminated. 18 | 19 | ![Token Passing](figures/v01-ring01.gif) 20 | 21 | This problem is too simple, and we don't need TLA+ to model it.
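The v01a protocol is simple enough to simulate directly. The Python sketch below is a hypothetical illustration, not part of the tutorial's code: each node gets a made-up amount of independent work, the leader (node 0) may start the round at any time, and every other node forwards the token to node `(i + 1) % n` only once its own work is done. The ring direction and the function name `simulate_v01a` are assumptions. Because no messages are exchanged, a node that has finished stays finished, so every non-leader node is inactive when the token comes back.

```python
import random


def simulate_v01a(work, seed=0):
    """Simulate the v01a protocol on a ring of len(work) nodes; node 0 is the leader.

    work[i] is a hypothetical number of computation steps node i still has to do.
    Returns, for every node, whether it was still active at the moment the token
    arrived back at the leader.
    """
    rng = random.Random(seed)
    n = len(work)
    remaining = list(work)
    token = 0          # index of the node that currently holds the token
    started = False    # has the leader sent the token on its round yet?
    while True:
        # Independent computations: any node with work left may take a step.
        moves = [("step", i) for i in range(n) if remaining[i] > 0]
        # The leader may start the round at any time; every other holder
        # forwards the token only once it has finished its own computation.
        if (token == 0 and not started) or (token != 0 and remaining[token] == 0):
            moves.append(("pass", token))
        kind, i = rng.choice(moves)
        if kind == "step":
            remaining[i] -= 1
        else:
            started = True
            token = (token + 1) % n   # assumption: "next neighbor" means i + 1 (mod n)
            if token == 0:            # the round is complete
                return [r > 0 for r in remaining]


if __name__ == "__main__":
    active = simulate_v01a([2, 3, 0, 5], seed=42)
    # Without messages, an idle node never becomes active again, so every
    # node other than the leader must be done when the token returns.
    assert not any(active[1:])
    print(active)
```

Only the leader may still be busy when the round completes, which matches the "(other)" in the description above. This guarantee is exactly what breaks once nodes can re-activate each other.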
22 | 23 | [^1]: Perhaps by some [leader election algorithm](https://en.wikipedia.org/wiki/Paxos_(computer_science)). 24 | 25 | #### v01b: Termination of collaborative computation 26 | 27 | A more interesting problem is a "collaborative" computation, in which nodes can re-activate each other. For example, the result of a computation at node 23 is (atomically!) sent to and further processed at node 42. With the previous protocol, node 42 might have already passed on the token, causing the initiator to eventually detect (global) termination while 42 is still busy; a bug that is difficult to reproduce with testing, to say the least! 28 | A solution is offered in [EWD840](https://github.com/tlaplus/Examples/blob/master/specifications/ewd840/EWD840.tla): 29 | * Initiator sends a "stateful" token around the ring 30 | * Each node remembers if it activated another node 31 | * Activation taints the token (when the activator gets the token) 32 | * Initiator keeps running rounds until it receives an untainted token 33 | 34 | ![Token Passing](figures/v01-ring03.gif) 35 | 36 | #### v01c: Termination detection with asynchronous message delivery 37 | 38 | What happens if we loosen the restriction that message delivery is atomic (it seldom is)? Clearly, we are back at square one: 39 | 1) Node 23 sends a message to 42 40 | 2) 23 taints the token 41 | 3) Initiator starts a new round 42 | 4) Node 42 receives the fresh token before receiving the activation message from 23 43 | 5) Boom! 44 | 45 | The fix proposed in [Shmuel Safra's EWD998](https://www.cs.utexas.edu/users/EWD/ewd09xx/EWD998.PDF) is to count in-flight messages. But will this work? 46 | 47 | ![EWD998](figures/v01-ring04.gif) 48 | 49 | Throughout the chapters of this tutorial, we will use the TLA+ specification language to model EWD998, and check interesting properties.
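The counting idea can be illustrated with a few lines of Python. This is a hypothetical sketch of the bookkeeping only, not of EWD998 itself: the class `RingSnapshot` and its methods are made up, and it inspects all nodes in one atomic snapshot, which a real system cannot do (EWD998 instead accumulates counters in the token's `q` and blackens receiving nodes so that an inconclusive round is noticed). Sending increments the sender's counter and receiving decrements the receiver's counter, so the counters always sum to the number of in-flight messages. Replaying the scenario above (node 23 sends to node 42, and the message is still in flight when every node looks idle) shows that "all nodes are inactive" is not enough, while "all nodes are inactive and the counters sum to zero" is.

```python
class RingSnapshot:
    """Hypothetical, simplified bookkeeping for the v01c scenario.

    This is not the EWD998 token protocol: it inspects all nodes in one atomic
    snapshot, which a real implementation cannot do.
    """

    def __init__(self, nodes, initially_active=()):
        self.active = {n: n in initially_active for n in nodes}
        self.counter = {n: 0 for n in nodes}   # +1 per message sent, -1 per message received
        self.in_flight = []                    # (sender, receiver) pairs not yet delivered

    def send(self, sender, receiver):
        assert self.active[sender], "only active nodes send messages"
        self.counter[sender] += 1
        self.in_flight.append((sender, receiver))

    def deliver(self):
        sender, receiver = self.in_flight.pop(0)  # oldest message first (FIFO is an assumption)
        self.counter[receiver] -= 1
        self.active[receiver] = True              # receiving a message (re-)activates the node

    def deactivate(self, node):
        self.active[node] = False

    def all_passive(self):
        """The naive criterion of v01a/v01b: every node is inactive."""
        return not any(self.active.values())

    def terminated(self):
        """Counting criterion: all nodes inactive and the counters sum to zero."""
        return self.all_passive() and sum(self.counter.values()) == 0


if __name__ == "__main__":
    ring = RingSnapshot([0, 23, 42], initially_active={23})
    ring.send(23, 42)      # node 23 sends a message to 42
    ring.deactivate(23)    # 23 finishes; the message is still in flight
    print(ring.all_passive(), ring.terminated())   # True False
    ring.deliver()         # the message finally arrives and wakes up 42
    ring.deactivate(42)
    print(ring.all_passive(), ring.terminated())   # True True
```

The invariant "sum of counters = number of in-flight messages" is the same relationship that APEWD998 states as `P0:: BA = SumA(counter, Node)`.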
50 | 51 | ### v02: High-level spec AsyncTerminationDetection 52 | 53 | TLA+ is all about abstraction, and, as we will later see, has first-class support for connecting different levels of abstraction. Let's use this and write a basic spec that either falsifies our design above, or gives us sufficient confidence to invest in writing a more detailed spec. 54 | 55 | (Credit: [Stephan Merz](https://members.loria.fr/Stephan.Merz/) wrote AsyncTerminationDetection) 56 | 57 | #### v02a: Spec skeleton 58 | 59 | Instead of modeling message channels, let alone modeling the transport layer, we will write a spec that models: 60 | 61 | 1) A ring of N nodes 62 | 2) The activation status of each node 63 | 3) The number of messages *pending*[^2] at a node 64 | 4) A send action 65 | 5) A receive action 66 | 6) A terminate action 67 | 7) The initial configuration of the system 68 | 69 | Please switch to [AsyncTerminationDetection.tla](AsyncTerminationDetection.tla) and read its comments. From here 70 | on, the tutorial continues there... 71 | 72 | [^2]: It's difficult to (efficiently) count pending messages in an implementation. In a TLA+ spec, we don't care about that notion of efficiency. Also, all variables are global. 73 | -------------------------------------------------------------------------------- /AsyncTerminationDetection_proof.tla: -------------------------------------------------------------------------------- 1 | ---------------------- MODULE AsyncTerminationDetection_proof --------------------- 2 | EXTENDS AsyncTerminationDetection, TLAPS 3 | 4 | (* To speed up the provers, do not USE all known facts/assumptions and definitions globally. *) 5 | \*USE NIsPosNat DEF vars, terminated, Node, 6 | \* Init, Next, Spec, 7 | \* DetectTermination, Terminate, 8 | \* Wakeup, SendMsg, 9 | \* TypeOK, Stable 10 | 11 | \* * An invariant I is inductive iff Init => I and I /\ [Next]_vars => I'. Note, 12 | \* * though, that TypeOK by itself won't imply Stable! TypeOK alone
TypeOK alone 13 | \* * does not help us prove Stable. 14 | 15 | LEMMA TypeCorrect == Spec => []TypeOK 16 | <1>1. Init => TypeOK BY NIsPosNat DEF Init, TypeOK, Node, terminated 17 | <1>2. TypeOK /\ [Next]_vars => TypeOK' 18 | <2> SUFFICES ASSUME TypeOK, 19 | [Next]_vars 20 | PROVE TypeOK' 21 | OBVIOUS 22 | <2>1. CASE DetectTermination 23 | BY <2>1 DEF TypeOK, Next, vars, DetectTermination 24 | <2>2. ASSUME NEW i \in Node, 25 | NEW j \in Node, 26 | Terminate(i) 27 | PROVE TypeOK' 28 | BY <2>2 DEF TypeOK, Next, vars, Terminate, terminated 29 | <2>3. ASSUME NEW i \in Node, 30 | NEW j \in Node, 31 | Wakeup(i) 32 | PROVE TypeOK' 33 | BY <2>3 DEF TypeOK, Next, vars, Wakeup 34 | <2>4. ASSUME NEW i \in Node, 35 | NEW j \in Node, 36 | SendMsg(i, j) 37 | PROVE TypeOK' 38 | BY <2>4 DEF TypeOK, Next, vars, SendMsg 39 | <2>5. CASE UNCHANGED vars 40 | BY <2>5 DEF TypeOK, Next, vars 41 | <2>6. QED 42 | BY <2>1, <2>2, <2>3, <2>4, <2>5 DEF Next 43 | <1>. QED BY <1>1, <1>2, PTL DEF Spec 44 | 45 | (***************************************************************************) 46 | (* Proofs of safety and stability. *) 47 | (***************************************************************************) 48 | Safe == terminationDetected => terminated 49 | 50 | THEOREM Safety == Spec => []Safe 51 | <1>. USE DEF terminated, TypeOK, Safe 52 | <1>1. Init => Safe 53 | BY Zenon DEF Init 54 | <1>2. TypeOK /\ Safe /\ [Next]_vars => Safe' 55 | <2> SUFFICES ASSUME TypeOK, Safe, [Next]_vars 56 | PROVE Safe' 57 | OBVIOUS 58 | <2>1. CASE DetectTermination 59 | BY <2>1 DEF DetectTermination 60 | <2>2. ASSUME NEW i \in Node, Terminate(i) 61 | PROVE Safe' 62 | BY <2>2, Zenon DEF Terminate 63 | <2>3. ASSUME NEW i \in Node, Wakeup(i) 64 | PROVE Safe' 65 | BY <2>3 DEF Wakeup 66 | <2>4. ASSUME NEW i \in Node, NEW j \in Node, SendMsg(i, j) 67 | PROVE Safe' 68 | BY <2>4 DEF SendMsg 69 | <2>5. CASE UNCHANGED vars 70 | BY <2>5 DEF vars 71 | <2>. QED 72 | BY <2>1, <2>2, <2>3, <2>4, <2>5 DEF Next 73 | <1>. 
QED 74 | BY <1>1, <1>2, TypeCorrect, PTL DEF Spec 75 | 76 | THEOREM Stability == Spec => Stable 77 | \* We show that terminationDetected is never reset to FALSE 78 | <1>1. TypeOK /\ Safe /\ terminationDetected /\ [Next]_vars => terminationDetected' 79 | BY Zenon 80 | DEF TypeOK, Safe, terminated, Next, DetectTermination, Terminate, Wakeup, SendMsg, vars 81 | <1>. QED BY <1>1, TypeCorrect, Safety, PTL DEF Spec, Stable, Safe 82 | 83 | ----------------------------------------------------------------------------- 84 | 85 | \* syncActive == [n \in Node |-> active[n] \/ pending[n] # 0] 86 | 87 | \* STD == INSTANCE SyncTerminationDetection WITH active <- syncActive 88 | 89 | \* (***************************************************************************) 90 | \* (* We prove (the safety part of) refinement. *) 91 | \* (***************************************************************************) 92 | 93 | \* THEOREM Refinement == Spec => STD!Spec 94 | \* <1>. USE DEF Node, STD!Node, syncActive, terminated, STD!terminated 95 | \* <1>1. Init => STD!Init 96 | \* BY NIsPosNat, Zenon DEF Init, STD!Init 97 | \* <1>2. TypeOK /\ Safe /\ [Next]_vars => [STD!Next]_(STD!vars) 98 | \* <2> SUFFICES ASSUME TypeOK, Safe, [Next]_vars 99 | \* PROVE [STD!Next]_(STD!vars) 100 | \* OBVIOUS 101 | \* <2>. USE NIsPosNat DEF TypeOK, STD!Next, STD!vars 102 | \* <2>1. CASE DetectTermination 103 | \* BY <2>1, Zenon DEF DetectTermination, STD!DetectTermination 104 | \* <2>2. ASSUME NEW i \in Node, Terminate(i) 105 | \* PROVE [STD!Next]_(STD!vars) 106 | \* BY <2>2, Zenon DEF Terminate, STD!Terminate, Safe 107 | \* <2>3. ASSUME NEW i \in Node, Wakeup(i) 108 | \* PROVE [STD!Next]_(STD!vars) 109 | \* BY <2>3 DEF Wakeup 110 | \* <2>4. ASSUME NEW i \in Node, NEW j \in Node, SendMsg(i, j) 111 | \* PROVE [STD!Next]_(STD!vars) 112 | \* <3>1. syncActive[i] /\ UNCHANGED terminationDetected 113 | \* BY <2>4 DEF SendMsg 114 | \* <3>2. 
syncActive' = [syncActive EXCEPT ![j] = TRUE] 115 | \* BY <2>4, Isa DEF SendMsg 116 | \* <3>. QED BY <3>1, <3>2, Zenon DEF STD!Wakeup 117 | \* <2>5. CASE UNCHANGED vars 118 | \* BY <2>5 DEF vars 119 | \* <2>6. QED 120 | \* BY <2>1, <2>2, <2>3, <2>4, <2>5 DEF Next 121 | \* <1>3. Spec => WF_(STD!vars)(STD!DetectTermination) 122 | \* OMITTED 123 | \* <1>. QED BY <1>1, <1>2, <1>3, TypeCorrect, Safety, PTL DEF Spec, STD!Spec 124 | 125 | ============================================================================= 126 | -------------------------------------------------------------------------------- /MCEWD998.tla: -------------------------------------------------------------------------------- 1 | ------------------------------- MODULE MCEWD998 ------------------------------- 2 | EXTENDS EWD998, TLC 3 | 4 | (***************************************************************************) 5 | (* Bound the otherwise infinite state space that TLC has to check. *) 6 | (***************************************************************************) 7 | StateConstraint == 8 | /\ \A i \in Node : counter[i] < 3 /\ pending[i] < 3 9 | /\ token.q < 3 10 | 11 | ----------------------------------------------------------------------------- 12 | 13 | \* Note that the non-property TLCGet("level") < 42 combined with TLC's 14 | \* simulator quickly triggers some "counter-example" for MCEWD998. 15 | MaxDiameter == TLCGet("level") < 42 16 | 17 | \* $ tlc -noTE -simulate -deadlock MCEWD998 | grep -A1 "sim = TRUE" 18 | Alias == 19 | [ 20 | active |-> active 21 | ,color |-> color 22 | ,counter |-> counter 23 | ,pending |-> pending 24 | ,token |-> token 25 | 26 | \* Eyeball test whether some nodes simultaneously deactivate. Note that 27 | \* the nodes deactivate in the *successor* (primed) state. 28 | ,sim |-> \E i,j \in Node: 29 | /\ i # j 30 | /\ active[i] # active[i]' 31 | /\ active[j] # active[j]' 32 | \* Yes, one can prime TLCGet("...") in recent versions of TLC! With it,
With it, 33 | \* we account for the sim being true when the nodes deactivate in the 34 | \* successor state. Obviously, .name will be "Deactivate". 35 | ,action |-> TLCGet("action")'.name 36 | ] 37 | 38 | ----------------------------------------------------------------------------- 39 | 40 | \* With TLC, checking IInv /\ [Next]_vars => IInv' translate to a config s.t. 41 | \* 42 | \* CONSTANT N = 3 43 | \* INIT IInv 44 | \* NEXT Next 45 | \* INVARIANT IInv 46 | \* 47 | \* However, the number of states defined by TypeOK is infinite because of 48 | \* sub-formulas involving undound sets (Nat & Int). Therefore, we rewrite 49 | \* TypeOk and substitute MyNat for Nat and MyInt for Int , 50 | \* respectively. 51 | \* Alternatively, we could have re-defined Nat with MyNat and Int with 52 | \* MyInt . 53 | \* TODO Do you see why re-defining Nat and Int would have caused problems? 54 | 55 | MyNat == 0..3 56 | MyInt == -2..2 57 | 58 | IInit == 59 | /\ active \in [Node -> BOOLEAN] 60 | /\ pending \in [Node -> MyNat] 61 | /\ color \in [Node -> Color] 62 | /\ counter \in [Node -> MyInt] 63 | /\ token \in [ pos: Node, q: MyInt, color: Color ] 64 | /\ Inv 65 | 66 | ============================================================================= 67 | 68 | $ tlc -deadlock -config MCEWD998.tla MCEWD998 69 | 70 | ------------------------------ CONFIG MCEWD998 ------------------------------ 71 | 72 | CONSTANT N = 3 73 | 74 | INIT IInit 75 | NEXT Next 76 | 77 | INVARIANT IInv 78 | 79 | CONSTRAINT StateConstraint 80 | 81 | ============================================================================= 82 | 83 | TLC2 Version 2.16 of Day Month 20?? (rev: 5682c4a) 84 | Running breadth-first search Model-Checking with fp 75 and seed 6362907857480250600 with 1 worker on 4 cores with 5291MB heap and 64MB offheap memory [pid: 245607] (Linux 5.4.0-74-generic amd64, Ubuntu 11.0.11 x86_64, MSBDiskFPSet, DiskStateQueue). 
85 | Parsing file /home/markus/src/TLA/ewd998/MCEWD998.tla 86 | Parsing file /home/markus/src/TLA/ewd998/EWD998.tla 87 | Parsing file /tmp/Integers.tla (jar:file:/opt/toolbox/tla2tools.jar!/tla2sany/StandardModules/Integers.tla) 88 | Parsing file /tmp/Naturals.tla (jar:file:/opt/toolbox/tla2tools.jar!/tla2sany/StandardModules/Naturals.tla) 89 | Parsing file /home/markus/src/TLA/ewd998/AsyncTerminationDetection.tla 90 | Semantic processing of module Naturals 91 | Semantic processing of module Integers 92 | Semantic processing of module AsyncTerminationDetection 93 | Semantic processing of module EWD998 94 | Semantic processing of module MCEWD998 95 | Starting... (2021-06-05 18:13:27) 96 | Computing initial states... 97 | Computed 2 initial states... 98 | Computed 4 initial states... 99 | Computed 8 initial states... 100 | Computed 16 initial states... 101 | Computed 32 initial states... 102 | Computed 64 initial states... 103 | Computed 128 initial states... 104 | Computed 256 initial states... 105 | Computed 512 initial states... 106 | Computed 1024 initial states... 107 | Computed 2048 initial states... 108 | Computed 4096 initial states... 109 | Computed 8192 initial states... 110 | Computed 16384 initial states... 111 | Computed 32768 initial states... 112 | Computed 65536 initial states... 113 | Computed 131072 initial states... 114 | Computed 262144 initial states... 115 | Computed 524288 initial states... 116 | Finished computing initial states: 696928 states generated, with 507184 of them distinct at 2021-06-05 18:14:47. 117 | Progress(2) at 2021-06-05 18:14:50: 850,004 states generated (850,004 s/min), 509,765 distinct states found (509,765 ds/min), 454,600 states left on queue. 118 | Model checking completed. No error has been found. 
119 | Estimates of the probability that TLC did not check all reachable states 120 | because two distinct states had the same fingerprint: 121 | calculated (optimistic): val = 1.4E-7 122 | based on the actual fingerprints: val = 1.4E-10 123 | 4895579 states generated, 599598 distinct states found, 0 states left on queue. 124 | The depth of the complete state graph search is 36. 125 | The average outdegree of the complete state graph is 0 (minimum is 0, the maximum 7 and the 95th percentile is 1). 126 | Finished in 01min 54s at (2021-06-05 18:15:20) 127 | -------------------------------------------------------------------------------- /MCEWD998_actions.dot: -------------------------------------------------------------------------------- 1 | digraph ActionGraph { 2 | nodesep=0.35; 3 | subgraph cluster_legend { 4 | label = "Coverage"; 5 | node [shape=point] { 6 | d0 [style = invis]; 7 | d1 [style = invis]; 8 | p0 [style = invis]; 9 | p0 [style = invis]; 10 | } 11 | d0 -> d1 [label=unseen, color="green", style=dotted] 12 | p0 -> p1 [label=seen] 13 | } 14 | subgraph cluster_2914 { 15 | color="white" 16 | label="[]" 17 | 0 [label="InitiateProbe"] 18 | } 19 | subgraph cluster_577585152 { 20 | color="white" 21 | label="[i->1, i->1]" 22 | 1 [label="PassToken"] 23 | } 24 | subgraph cluster_1169842819 { 25 | color="white" 26 | label="[i->2, n->2]" 27 | 9 [label="SendMsg"] 28 | 10 [label="RecvMsg"] 29 | 11 [label="Deactivate"] 30 | } 31 | subgraph cluster_572967547 { 32 | color="white" 33 | label="[i->1, n->1]" 34 | 6 [label="SendMsg"] 35 | 7 [label="RecvMsg"] 36 | 8 [label="Deactivate"] 37 | } 38 | subgraph cluster_1165225214 { 39 | color="white" 40 | label="[i->2, i->2]" 41 | 2 [label="PassToken"] 42 | } 43 | subgraph cluster_1979189383 { 44 | color="white" 45 | label="[i->0, n->0]" 46 | 3 [label="SendMsg"] 47 | 4 [label="RecvMsg"] 48 | 5 [label="Deactivate"] 49 | } 50 | 0 -> 0[penwidth=0.56]; 51 | 0 -> 1[color="green",style=dotted]; 52 | 0 -> 2[penwidth=0.62]; 53 | 0 -> 
3[penwidth=0.53]; 54 | 0 -> 4[penwidth=0.5]; 55 | 0 -> 5[penwidth=0.54]; 56 | 0 -> 6[penwidth=0.52]; 57 | 0 -> 7[penwidth=0.5]; 58 | 0 -> 8[penwidth=0.53]; 59 | 0 -> 9[penwidth=0.52]; 60 | 0 -> 10[color="green",style=dotted]; 61 | 0 -> 11[penwidth=0.51]; 62 | 1 -> 0[penwidth=0.6]; 63 | 1 -> 1[color="green",style=dotted]; 64 | 1 -> 2[color="green",style=dotted]; 65 | 1 -> 3[penwidth=0.48]; 66 | 1 -> 4[penwidth=0.5]; 67 | 1 -> 5[penwidth=0.48]; 68 | 1 -> 6[color="green",style=dotted]; 69 | 1 -> 7[penwidth=0.49]; 70 | 1 -> 8[color="green",style=dotted]; 71 | 1 -> 9[color="green",style=dotted]; 72 | 1 -> 10[color="green",style=dotted]; 73 | 1 -> 11[color="green",style=dotted]; 74 | 2 -> 0[color="green",style=dotted]; 75 | 2 -> 1[penwidth=0.63]; 76 | 2 -> 2[color="green",style=dotted]; 77 | 2 -> 3[penwidth=0.48]; 78 | 2 -> 4[penwidth=0.51]; 79 | 2 -> 5[penwidth=0.51]; 80 | 2 -> 6[penwidth=0.48]; 81 | 2 -> 7[penwidth=0.5]; 82 | 2 -> 8[penwidth=0.5]; 83 | 2 -> 9[color="green",style=dotted]; 84 | 2 -> 10[color="green",style=dotted]; 85 | 2 -> 11[color="green",style=dotted]; 86 | 3 -> 0[penwidth=0.49]; 87 | 3 -> 1[penwidth=0.44]; 88 | 3 -> 2[penwidth=0.47]; 89 | 3 -> 3[penwidth=0.53]; 90 | 3 -> 4[penwidth=0.46]; 91 | 3 -> 5[penwidth=0.53]; 92 | 3 -> 6[penwidth=0.46]; 93 | 3 -> 7[penwidth=0.55]; 94 | 3 -> 8[penwidth=0.48]; 95 | 3 -> 9[penwidth=0.39]; 96 | 3 -> 10[color="green",style=dotted]; 97 | 3 -> 11[penwidth=0.41]; 98 | 4 -> 0[penwidth=0.49]; 99 | 4 -> 1[penwidth=0.49]; 100 | 4 -> 2[penwidth=0.49]; 101 | 4 -> 3[penwidth=0.55]; 102 | 4 -> 4[penwidth=0.47]; 103 | 4 -> 5[penwidth=0.56]; 104 | 4 -> 6[penwidth=0.48]; 105 | 4 -> 7[penwidth=0.47]; 106 | 4 -> 8[penwidth=0.5]; 107 | 4 -> 9[penwidth=0.4]; 108 | 4 -> 10[color="green",style=dotted]; 109 | 4 -> 11[penwidth=0.42]; 110 | 5 -> 0[penwidth=0.54]; 111 | 5 -> 1[penwidth=0.54]; 112 | 5 -> 2[penwidth=0.53]; 113 | 5 -> 3[color="green",style=dotted]; 114 | 5 -> 4[penwidth=0.51]; 115 | 5 -> 5[color="green",style=dotted]; 116 | 
5 -> 6[penwidth=0.5]; 117 | 5 -> 7[penwidth=0.52]; 118 | 5 -> 8[penwidth=0.51]; 119 | 5 -> 9[penwidth=0.41]; 120 | 5 -> 10[color="green",style=dotted]; 121 | 5 -> 11[penwidth=0.43]; 122 | 6 -> 0[penwidth=0.48]; 123 | 6 -> 1[color="green",style=dotted]; 124 | 6 -> 2[penwidth=0.45]; 125 | 6 -> 3[penwidth=0.46]; 126 | 6 -> 4[penwidth=0.53]; 127 | 6 -> 5[penwidth=0.46]; 128 | 6 -> 6[penwidth=0.5]; 129 | 6 -> 7[penwidth=0.44]; 130 | 6 -> 8[penwidth=0.55]; 131 | 6 -> 9[penwidth=0.36]; 132 | 6 -> 10[color="green",style=dotted]; 133 | 6 -> 11[penwidth=0.36]; 134 | 7 -> 0[penwidth=0.49]; 135 | 7 -> 1[color="green",style=dotted]; 136 | 7 -> 2[penwidth=0.49]; 137 | 7 -> 3[penwidth=0.49]; 138 | 7 -> 4[penwidth=0.47]; 139 | 7 -> 5[penwidth=0.49]; 140 | 7 -> 6[penwidth=0.55]; 141 | 7 -> 7[penwidth=0.48]; 142 | 7 -> 8[penwidth=0.56]; 143 | 7 -> 9[penwidth=0.29]; 144 | 7 -> 10[color="green",style=dotted]; 145 | 7 -> 11[penwidth=0.33]; 146 | 8 -> 0[penwidth=0.51]; 147 | 8 -> 1[penwidth=0.56]; 148 | 8 -> 2[penwidth=0.52]; 149 | 8 -> 3[penwidth=0.49]; 150 | 8 -> 4[penwidth=0.53]; 151 | 8 -> 5[penwidth=0.51]; 152 | 8 -> 6[color="green",style=dotted]; 153 | 8 -> 7[penwidth=0.52]; 154 | 8 -> 8[color="green",style=dotted]; 155 | 8 -> 9[penwidth=0.41]; 156 | 8 -> 10[color="green",style=dotted]; 157 | 8 -> 11[penwidth=0.42]; 158 | 9 -> 0[penwidth=0.43]; 159 | 9 -> 1[color="green",style=dotted]; 160 | 9 -> 2[color="green",style=dotted]; 161 | 9 -> 3[penwidth=0.4]; 162 | 9 -> 4[penwidth=0.47]; 163 | 9 -> 5[penwidth=0.38]; 164 | 9 -> 6[penwidth=0.35]; 165 | 9 -> 7[penwidth=0.3]; 166 | 9 -> 8[penwidth=0.38]; 167 | 9 -> 9[penwidth=0.47]; 168 | 9 -> 10[color="green",style=dotted]; 169 | 9 -> 11[penwidth=0.48]; 170 | 10 -> 0[color="green",style=dotted]; 171 | 10 -> 1[color="green",style=dotted]; 172 | 10 -> 2[color="green",style=dotted]; 173 | 10 -> 3[color="green",style=dotted]; 174 | 10 -> 4[color="green",style=dotted]; 175 | 10 -> 5[color="green",style=dotted]; 176 | 10 -> 
6[color="green",style=dotted]; 177 | 10 -> 7[color="green",style=dotted]; 178 | 10 -> 8[color="green",style=dotted]; 179 | 10 -> 9[color="green",style=dotted]; 180 | 10 -> 10[color="green",style=dotted]; 181 | 10 -> 11[color="green",style=dotted]; 182 | 11 -> 0[penwidth=0.48]; 183 | 11 -> 1[color="green",style=dotted]; 184 | 11 -> 2[penwidth=0.5]; 185 | 11 -> 3[penwidth=0.42]; 186 | 11 -> 4[penwidth=0.46]; 187 | 11 -> 5[penwidth=0.42]; 188 | 11 -> 6[penwidth=0.39]; 189 | 11 -> 7[penwidth=0.38]; 190 | 11 -> 8[penwidth=0.41]; 191 | 11 -> 9[color="green",style=dotted]; 192 | 11 -> 10[color="green",style=dotted]; 193 | 11 -> 11[color="green",style=dotted]; 194 | } -------------------------------------------------------------------------------- /EWD998.tla: -------------------------------------------------------------------------------- 1 | It is time to pause and recap what we've done so far, both in terms of learning 2 | TLA+ and modeling termination detection in a ring, a.k.a. EWD998. 3 | 4 | Regarding the termination detection algorithm, checking the spec 5 | AsyncTerminationDetection (with TLC and Apalache) confirms that the high-level 6 | design of counting in-flight messages is a valid approach to detecting (global) 7 | termination. It might seem silly to write such a simple spec to confirm what is 8 | easy to see is true. On the other hand, writing a tiny spec is a small investment, 9 | and "Writing is nature's way of letting you know how sloppy your thinking is" 10 | (Guindon). Later, we will see another reason why specifying 11 | AsyncTerminationDetection paid off. 12 | 13 | What comes next is to (re-)model AsyncTerminationDetection at a level of detail 14 | that matches the EWD998 paper. 
Here is a reformulated & reordered excerpt of the 15 | eight rules that (informally) describe the algorithm: 16 | 17 | 0) Sending a message by node i increments a counter at i; the receipt of a 18 | message decrements i's counter 19 | 3) Receiving a *message* (not the token) blackens the (receiver) node 20 | 2) An active node j -owning the token- keeps the token. When j becomes inactive, 21 | it passes the token to its neighbor with q = q + counter[j] 22 | 4) A black node taints the token 23 | 7) Passing the token whitens the sender node 24 | 1) The initiator sends the token with a counter q initialized to 0 and color 25 | white 26 | 5) The initiator starts a new round iff the current round is inconclusive 27 | 6) The initiator whitens itself and the token when initiating a new round 28 | 29 | 30 | Regarding learning TLA+, we've already covered lots of ground. Most importantly, 31 | we encountered TLA with its temporal operators, behaviors, safety & liveness 32 | properties, fairness, ... Learning TLA+ is pretty much downhill from here on. 33 | 34 | The remaining concepts this tutorial covers are: 35 | - IF-THEN-ELSE 36 | - Records 37 | - Recursive functions & operators 38 | - Refinement 39 | - Tuples/Sequences 40 | - CHOOSE operator (Hilbert's epsilon) 41 | 42 | ------------------------------- MODULE EWD998 ------------------------------- 43 | EXTENDS Integers \* No longer Naturals \* TODO Do you already see why? 44 | 45 | CONSTANT 46 | \* @type: Int; 47 | N 48 | 49 | ASSUME NIsPosNat == N \in Nat \ {0} 50 | 51 | Node == 0 .. N-1 52 | 53 | Color == {"white", "black"} 54 | 55 | VARIABLES 56 | \* @type: Int -> Bool; 57 | active, 58 | \* @type: Int -> Int; 59 | pending, 60 | color, 61 | counter, 62 | token 63 | 64 | vars == << active, pending, color, counter, token >> 65 | 66 | TypeOK == 67 | /\ active \in [Node -> BOOLEAN] 68 | /\ pending \in [Node -> Nat] 69 | /\ color \in [Node -> Color] 70 | /\ counter \in [Node -> Int] 71 | \* * TLA+ has records, which are functions whose domains are strings.
Since 72 | \* * records are functions, the syntax to create a record is that of a 73 | \* * function, except that the record key does not get quoted. 74 | \* * Finally, as with function sets we've seen earlier, it is easy 75 | \* * to define the set of records. However, the syntax is not -> , 76 | \* * but the : (colon), [ a : {1,2,3} ] . 77 | /\ token \in [ pos: Node, q: Int, color: Color ] 78 | 79 | ----------------------------------------------------------------------------- 80 | 81 | Init == 82 | /\ active \in [Node -> BOOLEAN] 83 | /\ pending = [i \in Node |-> 0] 84 | (* Rule 0 *) 85 | /\ color \in [Node -> Color] 86 | /\ counter = [i \in Node |-> 0] 87 | /\ pending = [i \in Node |-> 0] 88 | /\ token = [pos |-> 0, q |-> 0, color |-> "black"] 89 | 90 | ----------------------------------------------------------------------------- 91 | 92 | InitiateProbe == 93 | (* Rules 1 + 5 + 6 *) 94 | /\ token.pos = 0 95 | /\ \* previous round inconclusive: 96 | \/ token.color = "black" 97 | \/ color[0] = "black" 98 | \/ counter[0] + token.q > 0 99 | /\ token' = [ pos |-> N-1, q |-> 0, color |-> "white"] 100 | /\ color' = [ color EXCEPT ![0] = "white" ] 101 | /\ UNCHANGED << active, counter, pending >> 102 | 103 | PassToken(i) == 104 | (* Rules 2 + 4 + 7 *) 105 | /\ ~ active[i] 106 | /\ token.pos = i 107 | \* Rule 2 + 4 108 | \* Wow, TLA+ has IF-THEN-ELSE expressions.
109 | /\ token' = [ token EXCEPT !.pos = @ - 1, 110 | !.q = @ + counter[i], 111 | !.color = IF color[i] = "black" THEN "black" ELSE @ ] 112 | \* Rule 7 113 | /\ color' = [ color EXCEPT ![i] = "white" ] 114 | /\ UNCHANGED << active, counter, pending >> 115 | 116 | System == 117 | \/ InitiateProbe 118 | \/ \E i \in Node \ {0}: PassToken(i) 119 | 120 | ----------------------------------------------------------------------------- 121 | 122 | SendMsg(i) == 123 | (* Rule 0 *) 124 | /\ active[i] 125 | /\ counter' = [counter EXCEPT ![i] = @ + 1] 126 | \* TLA has a CHOOSE operator that picks a value satisfying some property: 127 | \* CHOOSE x \in S: P(x) 128 | \* The choice is deterministic, meaning that CHOOSE always picks the same value. 129 | \* If no value in S satisfies the property P , the value of the CHOOSE 130 | \* expression is unspecified. It is *not* an error in TLA, although TLC will 131 | \* complain. Likewise, TLC won't choose if S is unbounded/infinite. 132 | \* CHOOSE is almost always wrong when it appears in the behavior spec 133 | \* (except for constant-level operators such as Min(S) or when choosing 134 | \* what are called model values). 135 | \* In TLA+, non-deterministic choice is expressed with existential 136 | \* quantification, as done in Environment and System . 137 | \* However, using CHOOSE is a common mistake, which is why this topic is 138 | \* covered in this tutorial. CHOOSE usually has the "advantage" of causing 139 | \* less state-space explosion, but not in a good way. 140 | /\ \E recv \in (Node \ {i}): 141 | pending' = [pending EXCEPT ![recv] = @ + 1] 142 | /\ UNCHANGED << active, color, token >> 143 | 144 | \* Wakeup(i) in AsyncTerminationDetection.
145 | RecvMsg(i) == 146 | /\ pending[i] > 0 147 | /\ active' = [active EXCEPT ![i] = TRUE] 148 | /\ pending' = [pending EXCEPT ![i] = @ - 1] 149 | (* Rule 0 + 3 *) 150 | /\ counter' = [counter EXCEPT ![i] = @ - 1] 151 | /\ color' = [ color EXCEPT ![i] = "black" ] 152 | /\ UNCHANGED << token >> 153 | 154 | \* Terminate(i) in AsyncTerminationDetection. 155 | Deactivate == 156 | \* Modeling variant: Let multiple nodes (logical processes) deactivate at 157 | \* the same time/in the same step. This breaks the refinement ATD => STD. 158 | \* (Pick a function from the set of functions s.t. the inactive nodes in 159 | \* the current step remain inactive and the active nodes in the current 160 | \* step non-deterministically deactivate.) 161 | /\ active' \in { f \in [ Node -> BOOLEAN] : \A n \in Node: ~active[n] => ~f[n] } 162 | \* To avoid generating behaviors that quickly stutter when simulating the spec. 163 | /\ active' # active 164 | /\ UNCHANGED << pending, counter, color, token >> 165 | 166 | Environment == 167 | \E n \in Node: 168 | \/ SendMsg(n) 169 | \/ RecvMsg(n) 170 | \/ Deactivate 171 | 172 | ----------------------------------------------------------------------------- 173 | 174 | Next == 175 | System \/ Environment 176 | 177 | Spec == Init /\ [][Next]_vars /\ WF_vars(System) 178 | \* With the refinement below, TLC produces the following (liveness) violation: 179 | \* Error: Temporal properties were violated.
180 | \* 181 | \* Error: The following behavior constitutes a counter-example: 182 | \* 183 | \* State 1: 184 | \* /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0) 185 | \* /\ counter = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0) 186 | \* /\ token = [q |-> 0, color |-> "black", pos |-> 0] 187 | \* /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE) 188 | \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white") 189 | \* 190 | \* State 2: 191 | \* /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0) 192 | \* /\ counter = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0) 193 | \* /\ token = [q |-> 0, color |-> "white", pos |-> 2] 194 | \* /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE) 195 | \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white") 196 | \* 197 | \* State 3: 198 | \* /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0) 199 | \* /\ counter = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0) 200 | \* /\ token = [q |-> 0, color |-> "white", pos |-> 1] 201 | \* /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE) 202 | \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white") 203 | \* 204 | \* State 4: Stuttering 205 | \* This counter-example makes us realize that we haven't defined a suitable 206 | \* fairness property for EWD998 . 207 | \* With WF_vars(Next) , TLC finds a counter-example where the Initiator 208 | \* forever initiates new token rounds, but one node never receives a message 209 | \* that was sent to it.
210 | \* 211 | \* State 1: 212 | \* /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0) 213 | \* /\ counter = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0) 214 | \* /\ token = [q |-> 0, color |-> "black", pos |-> 0] 215 | \* /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE) 216 | \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white") 217 | \* 218 | \* State 2: 219 | \* /\ pending = (0 :> 0 @@ 1 :> 1 @@ 2 :> 0) 220 | \* /\ counter = (0 :> 1 @@ 1 :> 0 @@ 2 :> 0) 221 | \* /\ token = [q |-> 0, color |-> "black", pos |-> 0] 222 | \* /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE) 223 | \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white") 224 | \* 225 | \* State 3: 226 | \* /\ pending = (0 :> 0 @@ 1 :> 1 @@ 2 :> 0) 227 | \* /\ counter = (0 :> 1 @@ 1 :> 0 @@ 2 :> 0) 228 | \* /\ token = [q |-> 0, color |-> "white", pos |-> 2] 229 | \* /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE) 230 | \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white") 231 | \* 232 | \* State 4: 233 | \* /\ pending = (0 :> 0 @@ 1 :> 1 @@ 2 :> 0) 234 | \* /\ counter = (0 :> 1 @@ 1 :> 0 @@ 2 :> 0) 235 | \* /\ token = [q |-> 0, color |-> "white", pos |-> 1] 236 | \* /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE) 237 | \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white") 238 | \* 239 | \* State 5: 240 | \* /\ pending = (0 :> 0 @@ 1 :> 1 @@ 2 :> 0) 241 | \* /\ counter = (0 :> 1 @@ 1 :> 0 @@ 2 :> 0) 242 | \* /\ token = [q |-> 0, color |-> "white", pos |-> 0] 243 | \* /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE) 244 | \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white") 245 | \* 246 | \* Back to state 3: 247 | \* 248 | \* This hints at the fact that EWD998 does not handle unreliable message 249 | \* delivery. However, what is really happening is that the RecvMsg never 250 | \* occurs. How can that be, since we defined (weak) fairness on the Next 251 | \* action and its sub-action RecvMsg is permanently enabled? 
252 | \* Fairness does not distribute over the sub-actions of an action such as Next . 253 | \* If this is what we want, we would have to conjoin multiple fairness 254 | \* conditions to Spec , one for each sub-action. This isn't really what we 255 | \* want, though. Fundamentally, the algorithm described in EWD998 detects 256 | \* termination if and only if all nodes (eventually) terminate. If the nodes 257 | \* never terminate (which subsumes sending messages back and forth), there is 258 | \* no termination to detect. This suggests that we are only interested in 259 | \* checking whether or not termination is detected for those behaviors where 260 | \* all nodes eventually terminate. 261 | 262 | terminationDetected == 263 | /\ token.pos = 0 264 | /\ token.color = "white" 265 | /\ token.q + counter[0] = 0 266 | /\ color[0] = "white" 267 | /\ ~ active[0] 268 | 269 | \* We haven't checked anything except the TypeOK invariant above, which does 270 | \* not say anything about termination detection. What we could do is to 271 | \* re-state and check the same theorems Stable and Live that we checked for 272 | \* AsyncTerminationDetection -- copy&paste is acceptable with specs after all! 273 | \* On the other hand, this is not exactly what we want to check; we don't want 274 | \* to check that (an amended) Stable and Live hold for EWD998. What we 275 | \* really care about is that the module EWD998 *implements* the high-level 276 | \* specification AsyncTerminationDetection (ATD). 277 | \* With TLA, implementation is (logical) implication. That some spec 278 | \* I implements a higher-level specification A is formally expressed as 279 | \* I => A . This is equivalent to saying that the behaviors defined by I 280 | \* are a subset of the behaviors defined by A . However, what if I declares 281 | \* additional variables that don't exist in A ? For spec EWD998 , the 282 | \* variables color, counter, token do not appear in ATD .
This is where 283 | \* the subscripts we added to the various temporal formulas in ATD start to 284 | \* make sense. Recall that [][A]_v is equivalent to [](A \/ v'=v) . This 285 | \* formula is true of behaviors in which variables - not appearing in [A]_v - 286 | \* change in any way they want, as long as the variables in v remain unchanged, 287 | \* or an A step happens. In fact, [A]_v does not say anything about variables 288 | \* not appearing in it; the formula does not "care" about those variables. For 289 | \* EWD998 and ATD , the module ATD allows the module EWD998 to specify 290 | \* anything "in line" with ATD . 291 | \* Remember that A above is just an action-level formula. The 292 | \* identifier A of its definition is just a syntactic element to make specs 293 | \* more readable. In other words, when we say "A step" above, we talk about 294 | \* the formula (the right-hand side of A == foo). Thus, the A step of ATD 295 | \* can be a B step of EWD998 provided that B is a step permitted by A . 296 | \* This theorem is syntactically incorrect, because we haven't added the module 297 | \* AsyncTerminationDetection to the list of EXTENDS at the top of EWD998. 298 | \* If we were to add ATD to the EXTENDS , we would end up with various name 299 | \* clashes. Think of EXTENDS as inlining the extended modules. 300 | \* What we need, though, is to "import" ATD under a new namespace. In TLA, the 301 | \* term is instantiation, syntactically expressed with INSTANCE M where M 302 | \* is a module. To instantiate module M into a namespace, we rely on the 303 | \* (fundamental) concept of definitions again: M == INSTANCE M . 304 | \* The module ATD declares the variable terminationDetected, which is absent 305 | \* in EWD998 . In other words, EWD998 does not define a value of the 306 | \* variable terminationDetected in its behaviors.
We can define the value of 307 | \* terminationDetected in EWD998 by stating which expression, equivalent to 308 | \* ATD!terminationDetected , should be substituted for terminationDetected . 309 | \* Syntactically, we append a WITH symbol <- substitution to the INSTANCE 310 | \* statement. Because EWD998 defines an operator named terminationDetected , the substitution terminationDetected <- terminationDetected is implicit, so the WITH clause can be omitted. 311 | ATD == INSTANCE AsyncTerminationDetection 312 | 313 | THEOREM Implements == Spec => ATD!Spec 314 | 315 | \* The bang is not a valid token in a config file. 316 | ATDSpec == ATD!Spec 317 | 318 | \* With the refinement done, it is sanity-check time again. As we have learned 319 | \* with the state constraint earlier, a good check is to quickly generate a 320 | \* small number of behaviors. If some actions are not covered, we have to look 321 | \* closer. 322 | \* Another useful sanity-check is to verify the spec for a single node, i.e., 323 | \* N = 1 . We want EWD998 to detect termination of a single node, no? 324 | \* Generating the graph with "full" statistics reveals the context in which the 325 | \* action formulae are evaluated. In other words, the graph includes the 326 | \* parameters that were "passed" to the actions. 327 | \* For the graph generated from EWD998, the RecvMsg action for the context 328 | \* [i->2] , which corresponds to node #2, is not covered. This means that the 329 | \* sub-action RecvMsg was never enabled when the simulator generated the 330 | \* behaviors, which can be the case iff SendMsg never incremented 331 | \* pending[2] . This might just be exceptional luck, but maybe there is 332 | \* something more subtle going on. This is an excellent opportunity to meet 333 | \* the TLA+ debugger (which has recently been added :-). 334 | 335 | ----------------------------------------------------------------------------- 336 | 337 | HasToken == 338 | token.pos 339 | 340 | \* Usually, one would find additional invariants and liveness properties at this 341 | \* stage and check the spec for different spec parameters.
The second part can 342 | \* easily be parallelized and scaled out (hello cloud computing!). 343 | \* If higher assurances are needed, now would be the start of proving EWD998 344 | \* correct, which requires finding an inductive invariant. Finding an 345 | \* inductive invariant is hard because one has to know *why* the algorithm 346 | \* works (model-checking only confirms that algorithms work according to the 347 | \* checked properties). 348 | \* Fortunately, the EWD998 paper gives an inductive invariant in the form of a 349 | \* larger formula P0 /\ (P1 \/ P2 \/ P3 \/ P4) , with S denoting "the sum of", 350 | \* A "for all", E "there exists", t the token's position, q the token's value, c.i the counter of node i, B the sum of in-flight messages, and P0 to P4 : 351 | \* 352 | \* P0: B = (Si: 0 <= i < N: c.i) 353 | \* P1: (Ai: t < i < N: machine nr.i is passive) /\ 354 | \* (Si: t < i < N: c.i) = q 355 | \* P2: (Si: 0 <= i <= t: c.i) + q > 0 356 | \* P3: (Ei: 0 <= i <= t: machine nr.i is black) 357 | \* P4: The token is black 358 | 359 | \* TLA doesn't have for loops with which we could sum the elements of the 360 | \* variables counter and pending ; TLA+ is not an imperative programming 361 | \* language. Instead, TLA+ has recursive functions. We could write a 362 | \* function to sum the variable counter as: 363 | \* 364 | \* SumC == CHOOSE f : f = [ i \in 0..N-1 |-> IF i = 0 365 | \* THEN counter[i] 366 | \* ELSE f[i-1] + counter[i] ] 367 | \* 368 | \* The sum of counter would then be SumC[N-1] . 369 | \* TLC does not evaluate an unbounded CHOOSE. However, TLA+ has a syntactic 370 | \* variant that TLC evaluates: 371 | \* 372 | \* SumC[ i \in 0..N-1 ] == IF i=0 THEN counter[i] ELSE SumC[i-1] + counter[i] 373 | \* 374 | \* To write a recursive function that sums the elements of any function over a 375 | \* subset of its domain (and is thus independent of counter and also 376 | \* works for pending ), we need to see another TLA+ concept. A let/in 377 | \* expression allows us to use locally defined operators.
A let/in is just a 378 | \* syntactic concept, and the expression is equivalent to an expression 379 | \* with all locally defined operators in-lined. 380 | \* @type: (Int -> Int, Int, Int) => Int; 381 | \* Sum(fun, from, to) == 382 | \* LET sum[ i \in from..to ] == 383 | \* IF i = from THEN fun[i] 384 | \* ELSE sum[i-1] + fun[i] 385 | \* IN sum[to] 386 | 387 | \* Alternatively, one can write recursive operators. What distinguishes a 388 | \* recursive operator from an ordinary operator is a RECURSIVE operator 389 | \* declaration. 390 | \* Compared to recursive functions, TLC usually evaluates recursive operators 391 | \* faster. However, that is not the case for Apalache. PlusPy, a tool to 392 | \* execute TLA+ specifications, doesn't support recursive operators at all. 393 | \* Commented because of https://git.io/JGAf7 and mandatory bounds for unrolling 394 | \* https://apalache.informal.systems/docs/apalache/principles.html#recursion 395 | \* RECURSIVE SumO(_,_,_) 396 | \* SumO(fun, from, to) == 397 | \* IF from = to 398 | \* THEN fun[to] 399 | \* ELSE fun[from] + SumO(fun, from+1, to) 400 | 401 | \* Lastly, we can re-use the fold operators, well known among functional 402 | \* programmers, from the TLA+ CommunityModules at 403 | \* https://github.com/tlaplus/CommunityModules . This gives us a chance to show LAMBDA 404 | \* in TLA+. 405 | \* Commented because of https://git.io/JGAf7 and lack of annotations in Utils.tla 406 | Sum(fun, from, to) == 407 | LET F == INSTANCE Functions 408 | IN F!FoldFunctionOnSet(LAMBDA a,b: a+b, 0, fun, from..to) 409 | 410 | B == 411 | \* This spec counts the in-flight messages in the variable pending .
412 | Sum(pending, 0, N-1) 413 | 414 | Inv == 415 | /\ P0:: B = Sum(counter, 0, N-1) 416 | /\ \/ P1:: /\ \A i \in (token.pos+1)..N-1: ~ active[i] 417 | /\ IF token.pos = N-1 418 | THEN token.q = 0 419 | ELSE token.q = Sum(counter, (token.pos+1), N-1) 420 | \/ P2:: Sum(counter, 0, token.pos) + token.q > 0 421 | \/ P3:: \E i \in 0..token.pos : color[i] = "black" 422 | \/ P4:: token.color = "black" 423 | 424 | \* We expect that Inv is an inductive invariant that we can eventually prove 425 | \* correct with TLAPS. However, "it is easier to prove something if it's true", 426 | \* and, thus, we validate IInv for small values of N with model-checking. 427 | \* For that, we conjoin TypeOK with Inv to IInv , and (logically) check 428 | \* the formula with TLC: 429 | \* 430 | \* IInv /\ [Next]_vars => IInv' 431 | \* 432 | IInv == 433 | /\ TypeOK 434 | /\ Inv 435 | 436 | ============================================================================= -------------------------------------------------------------------------------- /AsyncTerminationDetection.tla: -------------------------------------------------------------------------------- 1 | ---------------------- MODULE AsyncTerminationDetection --------------------- 2 | \* * TLA+ is an expressive language and we usually define operators on-the-fly. 3 | \* * That said, the TLA+ reference guide "Specifying Systems" (download from: 4 | \* * https://lamport.azurewebsites.net/tla/book.html) defines a handful of 5 | \* * standard modules. Additionally, a community-driven repository has been 6 | \* * collecting more modules (http://modules.tlapl.us). In our spec, we are 7 | \* * going to need operators for natural numbers. 8 | EXTENDS Naturals 9 | 10 | \* * A constant is a parameter of a specification. In other words, it is a 11 | \* * "variable" that cannot change throughout a behavior, i.e., a sequence 12 | \* * of states. Below, we declare N to be a constant of this spec.
13 | \* * We don't know what value N has or even what its type is; TLA+ is untyped and 14 | \* * everything is a set. In fact, even 23 and "frob" are sets and 23="frob" is 15 | \* * syntactically correct. However, we don't know what elements are in the sets 16 | \* * 23 and "frob" (nor do we care). The value of 23="frob" is undefined, and TLA+ 17 | \* * users call this a "silly expression". 18 | CONSTANT 19 | \* @type: Int; 20 | N 21 | 22 | \* * We should declare what we assume about the parameters of a spec--the constants. 23 | \* * In this spec, we assume constant N to be a (positive) natural number, by 24 | \* * stating that N is in the set Nat (defined in Naturals.tla) without 0 (zero). 25 | \* * Note that the TLC model-checker, which we will meet later, checks assumptions 26 | \* * upon startup. 27 | ASSUME NIsPosNat == N \in Nat \ {0} 28 | 29 | \* * A definition Id == exp defines Id to be synonymous with an expression exp. 30 | \* * A definition just gives a name to an expression. The name isn't special. 31 | \* * It is best to write comments that explain what is being defined. To get 32 | \* * a feeling for how extensive comments tend to be, see the Paxos spec at 33 | \* * https://git.io/JZJaD . 34 | \* * Here, we define Node to be synonymous with the set of natural numbers 35 | \* * 0 to N-1. Semantically, Node is going to represent the ring of nodes. 36 | \* * Note that the definition Node is a zero-arity (parameter-less) operator. 37 | Node == 0 .. N-1 38 | 39 | 40 | \* * Unlike the constants above, variables may change value in a behavior: 41 | \* * The value of active may be 23 in one state and "frob" in another. 42 | \* * For EWD998, active will maintain the activation status of our nodes, 43 | \* * while pending counts the in-flight messages from other nodes that a 44 | \* * node has yet to receive.
45 | VARIABLES 46 | \* @type: Int -> Bool; 47 | active, \* activation status of nodes 48 | \* @type: Int -> Int; 49 | pending, \* number of messages pending at a node 50 | \* * Up to now, this specification didn't teach us anything useful regarding 51 | \* * termination detection in a ring (we were mostly concerned with TLA+ itself). 52 | \* * Let's change this to find out if this proto-algorithm detects termination. 53 | \* * In an implementation, we could write to a log file whenever the system 54 | \* * terminates. However, for larger systems it can be challenging to collect 55 | \* * e.g., a consistent snapshot. In a spec, we can just use an (ordinary) variable 56 | \* * that -contrary to the other variables- doesn't define the state the system is 57 | \* * in, but records what the system has done so far. The jargon for this variable 58 | \* * is "history variable". 59 | \* * For termination detection, the complete history of the computation, performed 60 | \* * by the system, is not relevant--we only care if the system detected 61 | \* * termination. 62 | \* @type: Bool; 63 | terminationDetected 64 | 65 | \* * A definition that lets us refer to the spec's variables (more on it later). 66 | vars == << active, pending, terminationDetected >> 67 | 68 | terminated == \A n \in Node : ~ active[n] /\ pending[n] = 0 69 | 70 | ----------------------------------------------------------------------------- 71 | 72 | \* * Initially, all nodes are active and no messages are pending. 73 | Init == 74 | \* * ...all nodes are active. 75 | \* * The TLA+ language construct below is a function. A function has a domain 76 | \* * and a co-domain/range. Lamport: ["In the absence of types, I don't know 77 | \* * what a partial function would be or why it would be useful."] 78 | \* * (http://discuss.tlapl.us/msg01536.html). 
79 | \* * Here, we "map" each element in Node to the value TRUE (it is just a 80 | \* * coincidence that the elements of Node are 0, 1, ..., N-1, which could 81 | \* * suggest that functions are just zero-indexed arrays found in programming 82 | \* * languages. As a matter of fact, the domain of a function can be any set, 83 | \* * even infinite ones: [n \in Nat |-> n]). 84 | \* * /\ is logical And (&& in programming). Conjunction lists usually make formulas 85 | \* * easier to read. However, indentation is significant! 86 | \* * So far, the initial predicate defined a single state. That seems natural as 87 | \* * most programs usually start with all variables initialized to some fixed 88 | \* * value. In a spec, we don't have to be this strict. Instead, why not let 89 | \* * the system start from any (type-correct) state? 90 | \* * Besides syntax to define a specific function, TLA+ also has syntax to define 91 | \* * a set of functions mapping from some set S (the domain) to some other set T: 92 | \* * [ S -> T ] or, more concretely: [ {0,1,2,3} -> {TRUE, FALSE} ] 93 | /\ active \in [ Node -> BOOLEAN ] 94 | /\ pending \in [ Node -> Nat ] 95 | /\ terminationDetected \in {FALSE, terminated} 96 | 97 | \* * Recall that TLA+ is untyped and that we are "free" to write silly expressions. So 98 | \* * why no types? The reason is that, while real-world specs can be big enough for 99 | \* * silly expressions to sneak in (still way smaller than programs), types would 100 | \* * unnecessarily slow us down when specifying (prototyping). Also, there is a way to 101 | \* * catch silly expressions quickly. 102 | \* * It's finally time to state and check a first correctness property, namely that our 103 | \* * spec is "properly typed". We do this by writing an operator that evaluates to 104 | \* * false, should values of variables not be as expected.
We can think of this as 105 | \* * stating the types of variables in a special place, and not where they are declared 106 | \* * or where values are assigned. When TLC verifies the spec, it will evaluate the 107 | \* * operator on every state it generates. If the operator evaluates to false, an error 108 | \* * is reported. In other words, the operator is an invariant of the system. 109 | \* * Invariants are (a class of) safety properties, and safety properties are (informally) 110 | \* * defined as "nothing bad ever happens" (a formal definition can be found in 111 | \* * https://link.springer.com/article/10.1007/BF01782772, but we won't need it). 112 | TypeOK == 113 | /\ active \in [ Node -> BOOLEAN ] 114 | /\ pending \in [ Node -> Nat ] 115 | /\ terminationDetected \in BOOLEAN 116 | 117 | ----------------------------------------------------------------------------- 118 | 119 | \* * Each of the definitions below represents an atomic transition, i.e., defines 120 | \* * the next state of the current behavior (a state is an assignment of 121 | \* * values to variables). We call those definitions "actions". A next state is 122 | \* * possible if the action is true for some combination of current and next 123 | \* * values. Two or more actions do *not* happen simultaneously; if we want to, 124 | \* * e.g., model things happening at two nodes at once, we are free to choose an 125 | \* * appropriate level of granularity for those actions. 126 | 127 | \* * Node i terminates. 128 | Terminate(i) == 129 | \* Any subset of *active* nodes can become inactive in the next step. 130 | /\ active' \in { f \in [ Node -> BOOLEAN] : \A n \in Node: ~active[n] => ~f[n] } 131 | \* * Also, the variable active is no longer unchanged. 132 | /\ pending' = pending 133 | \* * Possibly (but not necessarily) detect termination, but only if all nodes are inactive 134 | \* * and no messages are in flight.
135 | /\ terminationDetected' \in {terminationDetected, terminated'} 136 | 137 | \* * Node i sends a message to node j. 138 | SendMsg(i, j) == 139 | /\ active[i] 140 | /\ pending' = [pending EXCEPT ![j] = @ + 1] 141 | /\ UNCHANGED << active, terminationDetected >> 142 | 143 | \* * Node i receives a message. 144 | Wakeup(i) == 145 | /\ pending[i] > 0 146 | /\ active' = [active EXCEPT ![i] = TRUE] 147 | /\ pending' = [pending EXCEPT ![i] = @ - 1] 148 | /\ UNCHANGED << terminationDetected >> 149 | 150 | DetectTermination == 151 | /\ terminated 152 | /\ ~terminationDetected 153 | /\ terminationDetected' = TRUE 154 | /\ UNCHANGED << active, pending >> 155 | 156 | ----------------------------------------------------------------------------- 157 | 158 | \* * Here we define the complete next-state action. Recall that it’s a predicate 159 | \* * on two states (the current and the next) which is true if the next state 160 | \* * is acceptable. 161 | \* * The next-state relation should somehow plug concrete values into the 162 | \* * (sub-) actions Terminate, SendMsg, and Wakeup. 163 | Next == 164 | \/ DetectTermination 165 | \/ \E i,j \in Node: 166 | \/ Terminate(i) 167 | \/ Wakeup(i) 168 | \* ? Is it correct to let node i send a message to node j with i = j? 169 | \/ SendMsg(i, j) 170 | 171 | Stable == 172 | \* * With the addition of the auxiliary variable terminationDetected and 173 | \* * the action DetectTermination , we can check that our (ultra) high-level 174 | \* * design achieves termination detection. 175 | \* * Holds iff terminationDetected = FALSE, instead of \in {FALSE, terminated} , in Init/MCInit.
176 | \* * If the definition of MCInit in MCAsyncTerminationDetection.tla is 177 | \* * changed to terminationDetected \in {FALSE, terminated} , Stable 178 | \* * is violated by the initial state: 179 | \* * Error: Property Stable is violated by the initial state: 180 | \* * /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0) 181 | \* * /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE) 182 | \* * /\ terminationDetected = FALSE 183 | \* * Why? Because Stable just asserts something about initial states. 184 | \* * With terminationDetected \in {FALSE, terminated} , the state above 185 | \* * becomes an initial state (see Specifying Systems p. 241 for more details). 186 | \* * How do we say that we want Stable to hold for all states of a behavior, 187 | \* * not just for initial states? In other words, how do we state properties 188 | \* * that are evaluated on behaviors, not just on single states? 189 | \* * We have arrived at the province of temporal logic. There are many temporal 190 | \* * logics, and TLA is but one of them (the missing "+" is not a typo!). 191 | \* * As with programming, different (temporal) logics make different tradeoffs. 192 | \* * Compared to, e.g., Linear temporal logic (LTL), TLA has the two (fundamental) 193 | \* * temporal operators, Always (denoted as [] and pronounced "box") and Eventually 194 | \* * (<> pronounced "diamond"). In contrast, LTL has Next and Until, which means 195 | \* * that one cannot say the same things with both logics. TLA's operators 196 | \* * guarantee that temporal formulae are stuttering invariant, which we will touch 197 | \* * on later when we talk about refinement. 198 | \* * For now, we just need the Always operator to state Stable. []Stable asserts 199 | \* * that Stable holds in all states of a behavior. In other words, the formula 200 | \* * Stable is always true. Note that Box can also be pushed into the definition of 201 | \* * Stable.
202 |     \* * The following behavior violates the (strengthened) Stable:
203 |     \* * State 1:
204 |     \* * /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
205 |     \* * /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> TRUE)
206 |     \* * /\ terminationDetected = FALSE
207 |     \* * State 2:
208 |     \* * /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
209 |     \* * /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
210 |     \* * /\ terminationDetected = FALSE
211 |     \* * State 3:
212 |     \* * /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
213 |     \* * /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
214 |     \* * /\ terminationDetected = TRUE
215 |     \* * State 4: Stuttering
216 |     \* * Have we already found a flaw in our design and are we forced back to the
217 |     \* * whiteboard? No, you (intentionally) got hold of the wrong end of the stick.
218 |     \* * It is not that terminated implies terminationDetected , but the other
219 |     \* * way around.
220 |     \* * Phew, we have a high-level design (and you learned a lot about TLA+). Let's
221 |     \* * move to the next level. Except, one should always be suspicious of success...
222 |     [](terminationDetected => []terminated)
223 |
224 | -----------------------------------------------------------------------------
225 |
226 | \* * It is usually a good idea to check a couple of non-properties, i.e., properties that
227 | \* * we expect to be violated. We will use the behavior that violates the non-property
228 | \* * as a sanity check.
229 | \* * So far, our spec has TypeOK , which asserts the "types" of the variables, and Stable ,
230 | \* * which asserts that terminationDetected can only be true if terminated is (and remains) true.
231 | \* * In TLA, we can also assert that (sub-)actions occur in a behavior; after all, it's
232 | \* * the Temporal Logic of *Actions*. :-) A formula [][A]_v , with A an action, holds
233 | \* * for a behavior iff every step (pair of consecutive states) is an [A]_v step. For the moment,
234 | \* * we will ignore the subscript _v and simply write _vars instead of it: [A]_vars.
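\* * Sketch (added for illustration; this assumes the model is checked with a
\* * TLC configuration file in the style of C.cfg and CE.cfg, which list
\* * temporal properties under PROPERTY): an action property of the form
\* * [][A]_vars , such as ActuallyNext below, is checked by naming it in the .cfg file:
\* *     PROPERTY ActuallyNext
\* * If the property is violated, TLC reports a behavior prefix whose last step
\* * is not an [A]_vars step.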
235 | \* *
236 | ActuallyNext ==
237 |     [][DetectTermination \/ \E i,j \in Node: (Terminate(i) \/ Wakeup(i) \/ SendMsg(i,j))]_vars
238 |     \* * In hindsight, it was to be expected that the trace has just two states,
239 |     \* * i.e., a single step. The property OnlyTerminating is violated by
240 |     \* * behaviors that take our actions:
241 |     \* * Error: Action property OnlyTerminating is violated.
242 |     \* * Error: The behavior up to this point is:
243 |     \* * State 1:
244 |     \* * /\ pending = (0 :> 1 @@ 1 :> 1 @@ 2 :> 1)
245 |     \* * /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
246 |     \* * /\ terminationDetected = FALSE
247 |     \* *
248 |     \* * State 2:
249 |     \* * /\ pending = (0 :> 0 @@ 1 :> 1 @@ 2 :> 1)
250 |     \* * /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE)
251 |     \* * /\ terminationDetected = FALSE
252 |     \* * Let's now focus on the subscript _v part that we glossed over previously.
253 |     \* * The subscript v in [A]_v is a state function, a formula without action- or
254 |     \* * temporal-level operators, that, informally, defines what happens with the
255 |     \* * variables.
256 |     \* * We replaced _v with _vars , where vars is defined on line 57 as
257 |     \* * << active, pending, terminationDetected >> . Note that << >> is just syntactic
258 |     \* * sugar to conveniently state 1-indexed arrays. However, in TLA they are called
259 |     \* * sequences, and many useful sequence-related operators are defined in the
260 |     \* * Sequences.tla standard module. More importantly, a sequence has an order!
261 |     \* * Time to pull out the TLA+ cheat sheet and check page 4:
262 |     \* * https://www.hpl.hp.com/techreports/Compaq-DEC/SRC-TN-1997-006A.pdf
263 |     \* * The formula [A]_v is equivalent to A \/ (v' = v) . Semantically, [][A]_v thus
264 |     \* * asserts that every step of the behavior is an A step or leaves the variables in v unchanged.
265 |     \* * If you look closely, you will realize that the disjunction of actions nested in
266 |     \* * OnlyTerminating is equivalent to the Next operator above!
267 |     \* * Up to now, we've been using a TLC feature that lets us pass INIT and NEXT in TLC's
268 |     \* * configuration file. In TLA, the system specification, which defines the set
269 |     \* * of valid system behaviors, is actually given as a temporal formula.
270 |
271 | F ==
272 |     \* * With this liveness property F , all (other) properties hold. :-) However,
273 |     \* * it looks funny that we check Live1 and Live2 when both are also part of Spec .
274 |     \* * At the level of termination detection with EWD998, terminated might never be
275 |     \* * true because nodes may never terminate.
276 |     \* * Additionally, there is a second problem with F that is independent even of
277 |     \* * EWD998: a scheduler would have to look into the future to see whether the
278 |     \* * scheduling choice it is making at some point leads to an unrecoverable state
279 |     \* * later, from where the stipulated "good thing" can no longer happen. This is
280 |     \* * informally called "painting oneself into a corner"; formally, it is the
281 |     \* * topic of machine-closed specifications.
282 |     \* * We want F not to add safety properties on top of Spec . We won't
283 |     \* * discuss the whys here, but if we restrict ourselves to only stipulate that
284 |     \* * enabled sub-actions of the next-state relation Next eventually happen, we can
285 |     \* * be sure that we don't paint the scheduler into a corner. To rule out the
286 |     \* * behavior shown by TLC as a violation of Live1 , we have to require that a
287 |     \* * Next step eventually happens (if it is "possible"). We need to put a number of
288 |     \* * previously seen concepts together now:
289 |     \* * - => (implication)
290 |     \* * - ENABLED
291 |     \* * - <<A>>_v
292 |     \* * - Combining [] and <> to []<> and <>[]
293 |     \* * "If A is, from some point on, enabled forever, then infinitely many A steps occur."
294 |     \* * <>[](ENABLED <<Next>>_vars) => []<><<Next>>_vars
295 |     \* * This can be written more compactly as WF_vars(Next) , but TLC still shows
296 |     \* * a lasso-shaped counter-example:
297 |     \* *
298 |     \* * Error: Temporal properties were violated.
299 |     \* *
300 |     \* * Error: The following behavior constitutes a counter-example:
301 |     \* *
302 |     \* * State 1:
303 |     \* * /\ pending = (0 :> 1 @@ 1 :> 1 @@ 2 :> 1)
304 |     \* * /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
305 |     \* * /\ terminationDetected = FALSE
306 |     \* *
307 |     \* * State 2:
308 |     \* * /\ pending = (0 :> 1 @@ 1 :> 1 @@ 2 :> 0)
309 |     \* * /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> TRUE)
310 |     \* * /\ terminationDetected = FALSE
311 |     \* *
312 |     \* * State 3:
313 |     \* * /\ pending = (0 :> 1 @@ 1 :> 1 @@ 2 :> 1)
314 |     \* * /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> TRUE)
315 |     \* * /\ terminationDetected = FALSE
316 |     \* *
317 |     \* * Back to state 1:
318 |     WF_vars(DetectTermination)
319 |
320 | \* * We’ll now define a formula that encompasses our specification of how the system
321 | \* * behaves. It combines the initial-state predicate, the next-state action, and
322 | \* * something called a fairness property that we will learn about later.
323 | \* * It is convention to name the behavior spec Spec .
324 | Spec ==
325 |     \* * F has been inlined because of https://github.com/informalsystems/apalache/issues/468#issuecomment-853259723
326 |     \* Wow, liveness (fairness) is subtle. However, this is not because TLA is poorly
327 |     \* equipped to handle liveness. "[Instead,] the problem lies in the nature
328 |     \* of liveness, not in its definition" (Lamport).
329 |     \* "Narrowing" fairness from Next to DetectTermination makes sure that
330 |     \* a DetectTermination step eventually happens instead of repeated token rounds.
331 |     \* TODO Convince yourself that AsyncTerminationDetection is still correct
332 |     \* TODO and EWD998 passes, i.e., rerun TLC.
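    \* * Illustration (added; not part of the original module): unfolding the
    \* * fairness conjunct below with the definition of weak fairness gives
    \* *     WF_vars(DetectTermination)
    \* *       <=> (<>[](ENABLED <<DetectTermination>>_vars)
    \* *              => []<><<DetectTermination>>_vars)
    \* * DetectTermination requires terminated /\ ~terminationDetected and always
    \* * changes terminationDetected from FALSE to TRUE, so
    \* *     ENABLED <<DetectTermination>>_vars <=> terminated /\ ~terminationDetected
    \* * Hence WF_vars(DetectTermination) rules out behaviors that, from some point
    \* * on, stay forever in states where all nodes have terminated but termination
    \* * has not been detected.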
333 |     Init /\ [][Next]_vars /\ WF_vars(DetectTermination) (* F *)
334 |
335 | Terminates ==
336 |     \* * The behavior spec Spec asserts that every step/transition is a Next step, or
337 |     \* * that the variables do not change. But is it actually true that the system can always
338 |     \* * and forever take a Next step? Semantically, we are specifying termination
339 |     \* * detection. Does the algorithm for termination detection itself terminate, or can
340 |     \* * it execute forever?
341 |     \* * TLA defines an ENABLED operator with which we can state predicates such as
342 |     \* * ENABLED A . This predicate is true in a state s iff action A is enabled in s,
343 |     \* * i.e., iff there exists a state t such that the transition s -> t is an A step.
344 |     []ENABLED [Next]_vars
345 |
346 |
347 | \* * In Terminates , we asserted that it is always "possible" to take a Next step, or for
348 | \* * all variables to remain unchanged: Next \/ vars' = vars . Because a stuttering step
349 | \* * is always possible, ENABLED [Next]_vars is a tautology in TLA, and we effectively
350 | \* * checked that Spec => TRUE . A related mistake is when the antecedent is FALSE :
351 | \* * FALSE => TRUE (try conjoining 1 = 2 to Spec ). Remember: [](Be suspicious of success).
352 | \* *
353 | \* * Sometimes, we wish to assert that all or some steps are A steps (for an action A)
354 | \* * and that some variables change. In other words, we wish to assert A /\ vars' # vars
355 | \* * (which is equivalent to ~(~A \/ vars' = vars) ). TLA has dedicated syntax for this,
356 | \* * which is <<A>>_v , where v is usually vars but can be any state function.
357 | AngleNextSubVars ==
358 |     []ENABLED <<Next>>_vars
359 |
360 | -----------------------------------------------------------------------------
361 |
362 | Live ==
363 |     \* * Up to now, we have been stating safety properties, i.e., "nothing bad ever happens".
364 |     \* * Looking at the counter-examples we've encountered so far, we find that a counter-example to a safety
365 |     \* * property is a finite prefix of an (infinite) behavior where the final state or action
366 |     \* * (transition) violates the property. We primarily care about safety when we check
367 |     \* * systems. For example, when we (used to) board a plane, we very much care that the
368 |     \* * plane never crashes! However, if the pilots decide not to take off, the plane is
369 |     \* * guaranteed not to crash. So we sit on the plane forever, waiting for it to depart.
370 |     \* * Clearly, as travelers, we eventually wish to arrive at our destination, e.g., to
371 |     \* * attend a meeting next Tuesday. Can we formulate this as a safety property? Easy,
372 |     \* * if we assume a (global) clock that determines when it is Tuesday. When specifying
373 |     \* * algorithms or systems, we know how to model clocks. However, an algorithm that
374 |     \* * requires something to happen in a fixed amount of (some notion of) time is brittle.
375 |     \* * For example, an algorithm that counts hardware instructions will likely only work
376 |     \* * on a particular hardware architecture. For EWD998, we could assert that termination
377 |     \* * is detected within N rounds after termination occurred, but do we know the value of
378 |     \* * N? And even with an N, we would need another property to assert that each round
379 |     \* * terminates...
380 |     \* * A way out is to formulate the property such that we assert that "something good
381 |     \* * eventually happens": the plane eventually arrives at its destination, the algorithm
382 |     \* * eventually produces a result, termination is eventually detected.
383 |     \* *
384 |     \* * A property requiring that something good eventually happens is a liveness property.
385 |     \* * Unfortunately, in practice, it is not very useful to know that the algorithm eventually
386 |     \* * produces a result if it takes 5 billion years to do so.
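    \* * Illustration (added for contrast; the two formulas below are merely
    \* * examples of each shape, written with this module's variables):
    \* *     [](terminationDetected => terminated)  \* safety: no premature detection
    \* *     <>terminationDetected                   \* liveness: detection happens
    \* * A finite prefix of a behavior can refute the first formula: it ends in a
    \* * state where the implication is false. No finite prefix can refute the
    \* * second formula, because terminationDetected might still become true in
    \* * a later state.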
387 |     \* *
388 |     \* * Contrary to a safety property, a violation of a liveness property is an infinite
389 |     \* * behavior in which the "good thing" never happens. When printing it, tools such as TLC show
390 |     \* * a lasso where the property doesn't hold in the lasso loop.
391 |     \* *
392 |     \* * In TLA, we syntactically express a property that asserts that something good
393 |     \* * eventually happens with the diamond operator <> (which is just the dual of the box
394 |     \* * operator: <>P <=> ~[]~P ).
395 |     \* *
396 |     \* * Error: Temporal properties were violated.
397 |     \* * Error: The following behavior constitutes a counter-example:
398 |     \* * State 1:
399 |     \* * /\ pending = (0 :> 1 @@ 1 :> 1 @@ 2 :> 1)
400 |     \* * /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
401 |     \* * /\ terminationDetected = FALSE
402 |     \* * State 2: Stuttering
403 |     \* * Studying the counter-example below F will eventually make us realize that Live1
404 |     \* * and Live2 are non-properties of the system. Instead, the liveness property we
405 |     \* * really care about is that when all nodes terminate, the termination detection
406 |     \* * algorithm eventually detects termination. It might take a number of rounds for the
407 |     \* * algorithm to detect the termination.
408 |     \* * In TLA, we can write [](terminated => <>terminationDetected) more compactly with
409 |     \* * the leads-to operator:
410 |     terminated ~> terminationDetected
411 |
412 | \* * Lastly, we state for readers which properties are theorems of the system. This is yet
413 | \* * another place where implication shows up: the theorems state that the
414 | \* * behaviors defined by Spec are a subset of the behaviors defined by Stable and of
415 | \* * those defined by Live .
416 | THEOREM Spec => Stable
417 |
418 | THEOREM Spec => Live
419 |
420 | \* * For both properties Live1 and Live2 , TLC reports counter-examples that end in
421 | \* * stuttering. This is strange!
422 | \* * Clearly, the counter-example for Live1 could be extended by, e.g., a Wakeup action that "consumes" one of the pending messages.
423 | \* * Similarly, the counter-example for Live2 could be extended by a
424 | \* * DetectTermination action.
425 | \* * We have to look at Spec again to see what is happening. The (temporal) formula
426 | \* * Spec defines a set of behaviors, and this set includes the counter-examples
427 | \* * reported for Live1 and Live2 . Why? Because Spec does not state a good
428 | \* * thing that (eventually) has to happen. In its current form, Spec only defines
429 | \* * what must never happen ( Spec itself is a safety property!). However, since we
430 | \* * ask TLC to check whether something good eventually happens, it finds those behaviors
431 | \* * permitted by Spec in which nothing good ever happens.
432 | \* * We have to amend Spec such that, in addition to the safety part, it also defines
433 | \* * the liveness property we want the system to satisfy. Mathematically, this means we have
434 | \* * to conjoin Spec with some suitable liveness property F: Spec /\ F
435 | \* * Naively, we might choose for F the (liveness) property
436 | \* * <>terminated /\ <>terminationDetected.
437 | =============================================================================
438 | \* Modification History
439 | \* Created Sun Jan 10 15:19:20 CET 2021 by Stephan Merz @muenchnerkindl
--------------------------------------------------------------------------------