├── C.cfg
├── CE.cfg
├── F.cfg
├── TemporalLogic.cfg
├── docs
    └── TLA+CheatSheet.pdf
├── figures
    ├── v01-ring01.gif
    ├── v01-ring03.gif
    └── v01-ring04.gif
├── APEWD998.cfg
├── .devcontainer
    ├── extensions
    │   ├── better-comments-2.0.5.vsix
    │   └── EFanZh.graphviz-preview-1.5.0.vsix
    ├── devcontainer.json
    └── install.sh
├── SmokeEWD998.cfg
├── MCEWD998.cfg
├── MCAsyncTerminationDetection.cfg
├── APAsyncTerminationDetection.cfg
├── .gitpod.yml
├── Utils.tla
├── .gitignore
├── MCAsyncTerminationDetection_actions.dot
├── C.tla
├── LICENSE
├── TemporalLogic.tla
├── .vscode
    └── settings.json
├── IncDec.tla
├── SmokeEWD998.tla
├── AsyncTerminationDetection_apalache.tla
├── EWD998_proof.tla
├── SyncTerminationDetection.tla
├── .github
    └── workflows
    │   └── main.yml
├── MCAsyncTerminationDetection.tla
├── APEWD998.tla
├── O.tla
├── F.tla
├── README.md
├── AsyncTerminationDetection_proof.tla
├── MCEWD998.tla
├── MCEWD998_actions.dot
├── EWD998.tla
└── AsyncTerminationDetection.tla


/C.cfg:
--------------------------------------------------------------------------------
1 | SPECIFICATION SpecC
2 | PROPERTY InvT


--------------------------------------------------------------------------------
/CE.cfg:
--------------------------------------------------------------------------------
1 | SPECIFICATION SpecE
2 | PROPERTY InvT


--------------------------------------------------------------------------------
/F.cfg:
--------------------------------------------------------------------------------
1 | \* TLC always expects a config file, even if it is empty.


--------------------------------------------------------------------------------
/TemporalLogic.cfg:
--------------------------------------------------------------------------------
1 | SPECIFICATION
2 |     Spec
3 | 
4 | PROPERTIES
5 |     Prop
6 | 
7 | ALIAS
8 |     Alias


--------------------------------------------------------------------------------
/docs/TLA+CheatSheet.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/docs/TLA+CheatSheet.pdf


--------------------------------------------------------------------------------
/figures/v01-ring01.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/figures/v01-ring01.gif


--------------------------------------------------------------------------------
/figures/v01-ring03.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/figures/v01-ring03.gif


--------------------------------------------------------------------------------
/figures/v01-ring04.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/figures/v01-ring04.gif


--------------------------------------------------------------------------------
/APEWD998.cfg:
--------------------------------------------------------------------------------
1 | CONSTANT N = 3
2 | SPECIFICATION Spec
3 | INVARIANT TypeOK
4 | INVARIANT Inv
5 | INVARIANT MaxDiameter
6 | 


--------------------------------------------------------------------------------
/.devcontainer/extensions/better-comments-2.0.5.vsix:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/.devcontainer/extensions/better-comments-2.0.5.vsix


--------------------------------------------------------------------------------
/.devcontainer/extensions/EFanZh.graphviz-preview-1.5.0.vsix:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tlaplus-workshops/ewd998/HEAD/.devcontainer/extensions/EFanZh.graphviz-preview-1.5.0.vsix


--------------------------------------------------------------------------------
/SmokeEWD998.cfg:
--------------------------------------------------------------------------------
1 | CONSTANT
2 |     N = 3
3 |     Init <- SmokeInit
4 | SPECIFICATION Spec
5 | INVARIANT TypeOK
6 | INVARIANT Inv
7 | CONSTRAINT StopAfter
8 | CHECK_DEADLOCK FALSE 


--------------------------------------------------------------------------------
/MCEWD998.cfg:
--------------------------------------------------------------------------------
1 | CONSTANT N = 3
2 | SPECIFICATION Spec
3 | INVARIANT TypeOK
4 | INVARIANT Inv
5 | \* CONSTRAINT StateConstraint
6 | PROPERTY ATDSpec
7 | INVARIANT MaxDiameter
8 | ALIAS Alias


--------------------------------------------------------------------------------
/MCAsyncTerminationDetection.cfg:
--------------------------------------------------------------------------------
 1 | CONSTANT N = 3
 2 | CONSTANT Init <- MCInit
 3 | SPECIFICATION Spec
 4 | CONSTRAINT StateConstraint
 5 | \* ACTION_CONSTRAINT ActionConstraint
 6 | INVARIANT TypeOK
 7 | PROPERTY Stable
 8 | PROPERTY ActuallyNext
 9 | PROPERTY Terminates
10 | \* PROPERTY AngleNextSubVars
11 | PROPERTY Live
12 | 


--------------------------------------------------------------------------------
/APAsyncTerminationDetection.cfg:
--------------------------------------------------------------------------------
 1 | \* 
 2 | \* Check  TypeOK  for an unbounded co-domain of  pending  :
 3 | \* $ apalache-mc check --config=APAsyncTerminationDetection.cfg --length=1 AsyncTerminationDetection.tla
 4 | \* 
 5 | \* Read https://apalache.informal.systems/docs/adr/002adr-types.html to learn
 6 | \* about Apalache's type annotations.
 7 | CONSTANT N = 3
 8 | SPECIFICATION Spec
 9 | INVARIANT TypeOK
10 | 


--------------------------------------------------------------------------------
/.gitpod.yml:
--------------------------------------------------------------------------------
 1 | ## The -vnc image below causes problems because it
 2 | ## lacks packages such as graphviz that also cannot
 3 | ## be installed via apt.
 4 | #image:
 5 | #  gitpod/workspace-full-vnc
 6 | 
 7 | tasks:
 8 |   - init: bash -i .devcontainer/install.sh
 9 |     
10 | vscode:
11 |   extensions:
12 |     - tintinweb.graphviz-interactive-preview
13 |     - cssho.vscode-svgviewer
14 |     - tomoki1207.pdf
15 |     - efanzh.graphviz-preview
16 |     - mhutchie.git-graph
17 | 


--------------------------------------------------------------------------------
/Utils.tla:
--------------------------------------------------------------------------------
 1 | This is a snapshot of a few operators from the TLA+
 2 | community modules at https://github.com/tlaplus/CommunityModules
 3 | 
 4 | ------- MODULE Utils -------
 5 | 
 6 | MapThenFoldSet(op(_,_), base, f(_), choose(_), S) ==
 7 |   LET iter[s \in SUBSET S] ==
 8 |         IF s = {} THEN base
 9 |         ELSE LET x == choose(s)
10 |              IN  op(f(x), iter[s \ {x}])
11 |   IN  iter[S]
12 | 
13 | FoldFunctionOnSet(op(_,_), base, fun, indices) ==
14 |   MapThenFoldSet(op, base, LAMBDA i : fun[i], LAMBDA s: CHOOSE x \in s : TRUE, indices)
15 | 
16 | ============================
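
For readers more at home in an imperative language, the two operators above can be approximated in Python. This is only a rough sketch and not part of the community modules: the names `map_then_fold_set` and `fold_function_on_set` are hypothetical, and `min` stands in for the arbitrary but fixed element that `CHOOSE x \in s : TRUE` picks.

```python
from operator import add


def map_then_fold_set(op, base, f, choose, s):
    """Combine f(x) for every x in s with op, starting from base."""
    s = frozenset(s)
    if not s:
        return base
    x = choose(s)  # pick one element, then recurse on the rest
    return op(f(x), map_then_fold_set(op, base, f, choose, s - {x}))


def fold_function_on_set(op, base, fun, indices):
    """Fold the values fun[i] for all i in indices."""
    return map_then_fold_set(op, base, lambda i: fun[i], min, indices)


# Hypothetical data: a function from node ids to counters, as a dict.
counter = {0: 1, 1: -1, 2: 3}
total = fold_function_on_set(add, 0, counter, {0, 1, 2})
```

Because addition is commutative and associative, the result does not depend on which element `choose` returns; for other operators the choice may matter.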


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | ## Blacklist all files
 2 | *
 3 | 
 4 | ## Whitelist TLA+ files
 5 | !*.tla
 6 | 
 7 | ## Whitelist TLC model config and results (.out files are usually small)
 8 | !*.cfg
 9 | !*.out
10 | 
11 | ## Whitelist Toolbox model metadata
12 | !*.launch
13 | 
14 | ## Whitelist Toolbox spec metadata
15 | !.project
16 | !*.prefs
17 | 
18 | ## Whitelist all folders
19 | !*/
20 | 
21 | ## Blacklist TLAPS cache folder
22 | ## See https://github.com/tlaplus/tlapm/issues/16
23 | *.tlaps/
24 | __tlacache__
25 | .tlacache
26 | 
27 | ## Blacklist apalache working dir
28 | x/
29 | 
30 | ## Ignore tools installed into the workspace
31 | tools/
32 | 


--------------------------------------------------------------------------------
/MCAsyncTerminationDetection_actions.dot:
--------------------------------------------------------------------------------
 1 | digraph ActionGraph {
 2 | nodesep=0.35;
 3 | subgraph cluster_legend {
 4 | label = "Coverage";
 5 | node [shape=point] {
 6 | d0 [style = invis];
 7 | d1 [style = invis];
 8 | p0 [style = invis];
 9 | p1 [style = invis];
10 | }
11 | d0 -> d1 [label=unseen, color="green", style=dotted]
12 | p0 -> p1 [label=seen]
13 | }
14 | 0 [label="DetectTermination"]
15 | 1 [label="SendMsg"]
16 | 2 [label="Terminate"]
17 | 3 [label="Wakeup"]
18 | 0 -> 0[penwidth=0.83];
19 | 0 -> 1[penwidth=0.64];
20 | 0 -> 2[penwidth=0.65];
21 | 0 -> 3[penwidth=0.67];
22 | 1 -> 0[color="green",style=dotted];
23 | 1 -> 1[penwidth=0.74];
24 | 1 -> 2[penwidth=0.74];
25 | 1 -> 3[penwidth=0.75];
26 | 2 -> 0[penwidth=0.7];
27 | 2 -> 1[penwidth=0.72];
28 | 2 -> 2[penwidth=0.72];
29 | 2 -> 3[penwidth=0.76];
30 | 3 -> 0[color="green",style=dotted];
31 | 3 -> 1[penwidth=0.76];
32 | 3 -> 2[penwidth=0.76];
33 | 3 -> 3[penwidth=0.75];
34 | }


--------------------------------------------------------------------------------
/C.tla:
--------------------------------------------------------------------------------
 1 | 
 2 | 
 3 |                See Specifying Systems section 6.6 on page 73.
 4 | 
 5 | --------------------------------- MODULE C ----------------------------------
 6 | EXTENDS Integers
 7 | 
 8 | S ==
 9 |     {"c","a","c","f"}
10 | 
11 | VARIABLE 
12 |     \* @type: Str;
13 |     x
14 | 
15 | -----------------------------------------------------------------------------
16 | 
17 | InitC ==
18 |     x = CHOOSE n \in S: TRUE
19 | 
20 | NextC ==
21 |     x' = CHOOSE n \in {"a","c","f","f","c","a"}: n \in S
22 | 
23 | SpecC == InitC /\ [][NextC]_x
24 | 
25 | -----------------------------------------------------------------------------
26 | 
27 | InitE ==
28 |     x \in S
29 | 
30 | NextE ==
31 |     x' \in {"a","c","f","f","c","a"}
32 | 
33 | SpecE == InitE /\ [][NextE]_x
34 | 
35 | -----------------------------------------------------------------------------
36 | 
37 | \* TLC
38 | InvT ==
39 |     [][x = x']_x
40 | 
41 | \* Apalache
42 | InvA ==
43 |     x = x'
44 | 
45 | =============================================================================


--------------------------------------------------------------------------------
/.devcontainer/devcontainer.json:
--------------------------------------------------------------------------------
 1 | {
 2 |   "name": "TLA+ EWD998",
 3 | 
 4 |   // Install optional extensions. If this fails, it just degrades the convenience factor.
 5 |   "extensions": [
 6 |     "tlaplus.vscode-ide",
 7 |     "EFanZh.graphviz-preview",
 8 |     "cssho.vscode-svgviewer",
 9 |     "tomoki1207.pdf",
10 |     "mhutchie.git-graph",
11 |     "ms-vsliveshare.vsliveshare"
12 |   ],
13 | 
14 |   // - Do not automatically update extensions (e.g. the better-comments extension is patched for TLA+)
15 |   // - Use Java GC that works best with TLC.
16 |   // - https://github.com/alygin/vscode-tlaplus/wiki/Automatic-Module-Parsing
17 |   "settings": {
18 |     "extensions.autoUpdate": false,
19 |     "extensions.autoCheckUpdates": false,
20 |     "editor.minimap.enabled": false,
21 |     "tlaplus.tlc.statisticsSharing": "share",
22 |     "tlaplus.java.options": "-XX:+UseParallelGC",
23 |     "tlaplus.java.home": "/home/codespace/java/current/",
24 |     "[tlaplus]": {"editor.codeActionsOnSave": {"source": true} }
25 |     },
26 | 
27 |   "onCreateCommand": "bash -i .devcontainer/install.sh",
28 | }
29 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2021 Markus Alexander Kuppe
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/TemporalLogic.tla:
--------------------------------------------------------------------------------
 1 | --------------------------- MODULE TemporalLogic --------------------------------
 2 | EXTENDS Naturals, Sequences
 3 | 
 4 | F == FALSE
 5 | T == TRUE
 6 | 
 7 | VARIABLE p
 8 | 
 9 | seq ==
10 |     {
11 |         \* << F, T, 2 >>
12 |         \* ,<< T, F, 2 >>
13 |         \* ,<< T, 1 >>
14 | 
15 |         \* ,<< T, F, 1 >>
16 |         \* ,<< F, T, T, T, F, 5 >>
17 |         \* ,<< T, F, T, T, F, T, 6 >>
18 |         \* ,<< F, T, T, T, F, 3 >>
19 |     }
20 | 
21 | Prop ==
22 |     /\ p = T
23 |     \* /\   []p
24 |     \* /\   <>p
25 |     \* /\ <>[]p
26 |     \* /\ []<>p
27 | 
28 | -------------------------------------------------------------------------------
29 | \* Ignore the following!
30 | 
31 | VARIABLE c, b
32 | vars == <<c, b, p>>
33 | 
34 | Init ==
35 |     /\ b \in seq
36 |     /\ c = 1
37 |     /\ p = b[c] 
38 | 
39 | Next ==
40 |     /\ UNCHANGED b
41 |     /\ c' = IF c + 1 <= Len(b) - 1 THEN c + 1 ELSE b[Len(b)]
42 |     /\ p' = b[c']
43 | 
44 | Spec ==
45 |     Init /\ [][Next]_vars /\ WF_vars(Next)
46 | 
47 | Alias ==
48 |     \* Hide c and b variables.
49 |     [ p |-> p ]
50 | 
51 | ==============================================================================
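
Reading  Init  and  Next  above, each element of  seq  appears to encode a lasso-shaped behavior: all entries but the last are the successive values of  p , and the last entry is the 1-based index that the behavior jumps back to once the values are exhausted. Under that reading (an assumption, not something the module states), the four commented-out formulas in  Prop  can be evaluated directly. The Python helper names below are hypothetical.

```python
F, T = False, True


def lasso(b):
    """Split b into (values, loop_start), with loop_start 0-based."""
    *values, back = b
    return values, back - 1


def always(b):             # []p: p holds at every position
    values, _ = lasso(b)
    return all(values)


def eventually(b):         # <>p: p holds at some position
    values, _ = lasso(b)
    return any(values)


def eventually_always(b):  # <>[]p: p holds everywhere inside the loop
    values, start = lasso(b)
    return all(values[start:])


def always_eventually(b):  # []<>p: p holds somewhere inside the loop
    values, start = lasso(b)
    return any(values[start:])
```

For example,  << F, T, 2 >>  starts with  p = FALSE  and then stays at  p = TRUE  forever, so  []p  is false while  <>[]p  is true.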


--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
 1 | {
 2 |     "tlaplus.tlc.modelChecker.options": "-deadlock -noTE",
 3 |     "tlaplus.tlc.statisticsSharing": "share",
 4 |     "tlaplus.java.options": "-XX:+UseParallelGC",
 5 |     "[tlaplus]": {"editor.codeActionsOnSave": {
 6 |             "source": "explicit"
 7 |         } },
 8 |     "extensions.autoCheckUpdates": false,
 9 |     "extensions.autoUpdate": false,
10 |     "breadcrumbs.enabled": false,
11 |     "editor.minimap.enabled": false,
12 |     "editor.useTabStops": false,
13 |     "redhat.telemetry.enabled": false,
14 |     "settingsSync.ignoredExtensions": [
15 |      "aaron-bond.better-comments"
16 |     ],
17 |     "files.exclude": {
18 |         ".gitignore": true,
19 |         ".gitpod.yml": true,
20 |         ".devcontainer": true,
21 |         ".github": true,
22 |         ".vscode": true,
23 |         "LICENSE": true,
24 |         "figures": true,
25 |         ".tlacache": true,
26 |         "*.tlaps": true,
27 |         "states": true,
28 |         "x": true,
29 |         "log0.smt": true,
30 |         "profile-rules.txt": true,
31 |         "detailed.log": true,
32 |         "*.toolbox": true,
33 |         "*.aux": true,
34 |         "*.dvi": true,
35 |         "*.log": true,
36 |         "*.tex": true,
37 |         "_apalache-out": true
38 |     }
39 | }
40 | 


--------------------------------------------------------------------------------
/IncDec.tla:
--------------------------------------------------------------------------------
 1 | Apalache is the new kid on the block.  Where TLC
 2 | implements finite-state model-checking, Apalache
 3 | implements bounded model-checking.  Apalache
 4 | builds on a powerful SMT solver that can answer
 5 | queries such as  \E n \in 1..Nat : n \in Nat
 6 | without enumerating the values of  n  (TLC won't
 7 | even try to enumerate  Nat).
 8 | 
 9 | Let's see the different powers of Apalache and
10 | TLC...
11 | 
12 | ### Run tools
13 | 
14 | $ apalache-mc check --inv=Inv --length=10 \
15 |   IncDec.tla
16 | 
17 | $ tlc -config IncDec.tla IncDec.tla
18 | 
19 | ### Quick demo
20 | 
21 | 1) Check spec as is
22 | 2) Increment --length to 11
23 | 3) Increase the bounds in  Inv  and  --length  to 1000
24 | 4) Change  Init  to  v \in Nat
25 | 4a) Go Apalache!
26 |     (But no longer useful counter-examples
27 |      when checking inductive invariants)
28 | 4b) TLC gives up
29 |     (Workaround: Randomization.tla with 
30 |     RandomSubset(42,0..10000000))
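
The idea behind steps 1 and 2 of the demo can be sketched in a few lines of Python. This is an explicit-state illustration of bounded exploration for the spec below, not how Apalache works internally (it asks an SMT solver instead of enumerating states). The function names are hypothetical, and how a particular tool maps its  --length  flag onto the number of transitions is deliberately not modeled.

```python
def inv(v):
    """Inv from the spec: -10 < v < 10."""
    return -10 < v < 10


def successors(v):
    """Next from the spec: Inc is enabled for v >= 0, Dec for v <= 0."""
    out = []
    if v >= 0:
        out.append(v + 1)
    if v <= 0:
        out.append(v - 1)
    return out


def bounded_check(max_steps, init=(0,)):
    """Return a shortest trace violating inv within max_steps transitions, else None."""
    frontier = [[v] for v in init]
    for trace in frontier:
        if not inv(trace[-1]):
            return trace
    for _ in range(max_steps):
        next_frontier = []
        for trace in frontier:
            for w in successors(trace[-1]):
                extended = trace + [w]
                if not inv(w):
                    return extended
                next_frontier.append(extended)
        frontier = next_frontier
    return None
```

With nine transitions the counter can reach at most 9 or -9, so no violation is found; one more transition is enough to leave the range allowed by  Inv .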
31 | 
32 | ---- MODULE IncDec ----
33 | EXTENDS Integers, Randomization
34 | 
35 | VARIABLE
36 |     \* @type: Int;
37 |     v
38 | 
39 | Init ==
40 |     /\ v = 0
41 | 
42 | Inc ==
43 |     /\ v >= 0
44 |     /\ v' = v + 1
45 | 
46 | Dec ==
47 |     /\ v <= 0
48 |     /\ v' = v - 1
49 | 
50 | Next ==
51 |     \/ Inc
52 |     \/ Dec
53 | 
54 | Inv ==
55 |     /\ v <  10
56 |     /\ v > -10
57 | 
58 | ====
59 | ---- CONFIG IncDec ----
60 | INIT Init
61 | NEXT Next
62 | INVARIANT Inv
63 | ====


--------------------------------------------------------------------------------
/SmokeEWD998.tla:
--------------------------------------------------------------------------------
 1 | ------------------------------- MODULE SmokeEWD998 -------------------------------
 2 | EXTENDS MCEWD998, TLC, Randomization
 3 | 
 4 | k ==
 5 |     10
 6 | 
 7 | \* SmokeInit is configured to re-define the initial predicate. We use  SmokeInit
 8 | \* to randomly select a subset of the defined initial states in cases when the
 9 | \* set of all initial states is too expensive to generate during smoke testing.
10 | SmokeInit ==
11 |     /\ pending \in RandomSubset(k, [Node -> 0..(N-1)])
12 |     /\ counter \in RandomSubset(k, [Node -> -(N-1)..(N-1)])
13 |     /\ active \in RandomSubset(k, [Node -> BOOLEAN])
14 |     /\ color \in RandomSubset(k, [Node -> Color])
15 |     /\ token \in RandomSubset(k, [pos: Node, q: Node, color: Color])
16 |     /\ Inv \* Reject states with invalid ratio between counter, pending, ...
17 | 
18 | \* StopAfter  has to be configured as a state constraint. It stops TLC after ~1
19 | \* second or after generating 100 traces, whichever comes first, unless TLC
20 | \* encountered an error.  In that case,  StopAfter  has no relevance.
21 | StopAfter ==
22 |     TLCGet("config").mode = "simulate" =>
23 |     (* The smoke test has a time budget of 1 second. *)
24 |     \/ TLCSet("exit", TLCGet("duration") > 1)
25 |     (* Generating 100 traces should provide reasonable coverage. *)
26 |     \/ TLCSet("exit", TLCGet("diameter") > 100)
27 | 
28 | ===============================================================================


--------------------------------------------------------------------------------
/AsyncTerminationDetection_apalache.tla:
--------------------------------------------------------------------------------
 1 | We want to prove the temporal property  Stable  , which is defined as:
 2 | 
 3 |   Stable == [](terminationDetected => []terminated)
 4 | 
 5 | For the moment, Apalache supports only invariant checking.
 6 | Nevertheless, we can check the property  Stable  with Apalache.
 7 | If we look carefully at the temporal formula Stable, we can see that
 8 | it is sufficient to check the following:
 9 | 
10 | 1. Init => StableInv
11 | 2. StableInv /\ Next => StableInv'
12 | 3. StableInv /\ Next => StableActionInv
13 | 
14 | We can check that by issuing the following three queries:
15 | 
16 | $ apalache-mc check --config=APAsyncTerminationDetection.cfg --length=1 \
17 |    --inv=StableInv --init=Init AsyncTerminationDetection_apalache.tla
18 | $ apalache-mc check --config=APAsyncTerminationDetection.cfg --length=2 \
19 |    --init=StableInv --inv=StableInv AsyncTerminationDetection_apalache.tla
20 | $ apalache-mc check --config=APAsyncTerminationDetection.cfg --length=2 \
21 |    --init=StableInv --inv=StableActionInv AsyncTerminationDetection_apalache.tla
22 | 
23 | We issue query 1 for a computation of length 1 (predicate Init is counted as a
24 | step), whereas we issue queries 2-3 for computations of length 2 (StableInv, 
25 | then Next).
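
The three conditions can also be checked by brute force on a small instance. The Python sketch below does so for the synchronous spec in SyncTerminationDetection.tla, used here as a stand-in because it is small enough to transcribe; it is an assumption that the same shape of argument carries over to the asynchronous spec. N = 3 and all Python names are hypothetical choices. Note that queries 2 and 3 range over every state satisfying  StableInv , not only the reachable ones.

```python
from itertools import product

N = 3  # assumed small instance
NODES = range(N)


def states():
    """All states satisfying TypeOK: (active, terminationDetected)."""
    for active in product([False, True], repeat=N):
        for detected in (False, True):
            yield (active, detected)


def terminated(state):
    return not any(state[0])


def init(state):
    return state[1] in {False, terminated(state)}


def successors(state):
    active, detected = state
    for i in NODES:  # Terminate(i)
        if active[i]:
            new_active = active[:i] + (False,) + active[i + 1:]
            for d in {detected, terminated((new_active, detected))}:
                yield (new_active, d)
    for i in NODES:  # Wakeup(i, j)
        if active[i]:
            for j in NODES:
                yield (active[:j] + (True,) + active[j + 1:], detected)
    if terminated(state):  # DetectTermination
        yield (active, True)


def stable_inv(state):
    return (not state[1]) or terminated(state)


def stable_action_inv(s, t):
    return (not terminated(s)) or terminated(t)


def check():
    q1 = all(stable_inv(s) for s in states() if init(s))
    q2 = all(stable_inv(t)
             for s in states() if stable_inv(s) for t in successors(s))
    q3 = all(stable_action_inv(s, t)
             for s in states() if stable_inv(s) for t in successors(s))
    return q1, q2, q3
```

Stuttering steps are omitted because they leave both invariants trivially intact.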
26 | 
27 | ---------------------- MODULE AsyncTerminationDetection_apalache ---------------------
28 | EXTENDS AsyncTerminationDetection
29 | 
30 | \* This is a state invariant.
31 | StableInv ==
32 |     /\ TypeOK
33 |     /\ (terminationDetected => terminated)
34 | 
35 | \* This is an action invariant.
36 | StableActionInv ==    
37 |     terminated => terminated'
38 | ======================================================================================
39 | 


--------------------------------------------------------------------------------
/EWD998_proof.tla:
--------------------------------------------------------------------------------
 1 | ---------------------- MODULE EWD998_proof ---------------------
 2 | EXTENDS EWD998, TLAPS
 3 | 
 4 | USE NIsPosNat DEF 
 5 |         Color, Node, 
 6 |         Init, Spec, 
 7 |         Next, vars,
 8 |         System, InitiateProbe, PassToken, 
 9 |         Environment, SendMsg, RecvMsg, Deactivate, 
10 |         TypeOK
11 | 
12 | LEMMA TypeCorrect == Spec => []TypeOK
13 | <1>1. Init => TypeOK  OBVIOUS 
14 | <1>2. TypeOK /\ [Next]_vars => TypeOK'
15 | <1>3. QED BY <1>1, <1>2, PTL
16 | 
17 | THEOREM TerminationDetection == Spec => []IInv
18 | <1> USE TypeCorrect DEF IInv, Inv, Sum
19 | <1>1. Init => IInv
20 | <1>2. IInv /\ [Next]_vars => IInv'   
21 | <1>3. QED BY <1>1, <1>2, PTL
22 | 
23 | \* TODO Have fun and prove TerminationDetection above!  When done, file a PR
24 |  \* TODO for the TLA+ examples at https://examples.tlapl.us  :-)
25 | 
26 | =============================================================================
27 | 
28 | 
29 | 
30 | \* The <1>1 proof obligation is not OBVIOUS, but the failed proof obligation
31 |  \* nicely shows the equivalence of the special syntax for recursive functions  
32 |  \*   F[e \in S] == ...  and  CHOOSE.
33 |  \* Below is an excerpt of what TLAPS returns for <1>1:
34 | 
35 | ASSUME NEW CONSTANT N,
36 |        NEW VARIABLE active,
37 |        NEW VARIABLE pending,
38 |        NEW VARIABLE color,
39 |        NEW VARIABLE counter,
40 |        NEW VARIABLE token,
41 |        N \in Nat \ {0} 
42 | PROVE  (/\ ...
43 |        =>  ...
44 |            /\ /\ P0::(B
45 |                       = (CHOOSE sum :
46 |                            sum
47 |                            = [i \in 0..N - 1 |->
48 |                                 IF i = 0
49 |                                   THEN counter[i]
50 |                                   ELSE sum[i - 1] + counter[i]])[N - 1])
51 |               /\ \/ P1:: ...
52 |               


--------------------------------------------------------------------------------
/SyncTerminationDetection.tla:
--------------------------------------------------------------------------------
 1 | ---------------------- MODULE SyncTerminationDetection ----------------------
 2 | (***************************************************************************)
 3 | (* An abstract specification of the termination detection problem in a     *)
 4 | (* ring with synchronous communication.                                    *)
 5 | (***************************************************************************)
 6 | EXTENDS Naturals
 7 | CONSTANT N
 8 | ASSUME NAssumption == N \in Nat \ {0}
 9 | 
10 | Node == 0 .. N-1
11 | 
12 | VARIABLES 
13 |   active,               \* activation status of nodes
14 |   terminationDetected   \* has termination been detected?
15 | 
16 | TypeOK ==
17 |   /\ active \in [Node -> BOOLEAN]
18 |   /\ terminationDetected \in BOOLEAN
19 | 
20 | terminated == \A n \in Node : ~ active[n]
21 | 
22 | (***************************************************************************)
23 | (* Initial condition: the nodes can be active or inactive, termination     *)
24 | (* may (but need not) be detected immediately if all nodes are inactive.   *)
25 | (***************************************************************************)
26 | Init ==
27 |   /\ active \in [Node -> BOOLEAN]
28 |   /\ terminationDetected \in {FALSE, terminated}
29 | 
30 | Terminate(i) ==  \* node i terminates
31 |   /\ active[i]
32 |   /\ active' = [active EXCEPT ![i] = FALSE]
33 |      (* possibly (but not necessarily) detect termination if all nodes are inactive *)
34 |   /\ terminationDetected' \in {terminationDetected, terminated'}
35 | 
36 | Wakeup(i,j) ==  \* node i activates node j
37 |   /\ active[i]
38 |   /\ active' = [active EXCEPT ![j] = TRUE]
39 |   /\ UNCHANGED terminationDetected
40 | 
41 | DetectTermination ==
42 |   /\ terminated
43 |   /\ terminationDetected' = TRUE
44 |   /\ UNCHANGED active
45 | 
46 | Next ==
47 |   \/ \E i \in Node : Terminate(i)
48 |   \/ \E i,j \in Node : Wakeup(i,j)
49 |   \/ DetectTermination
50 | 
51 | vars == <<active, terminationDetected>>
52 | Spec == Init /\ [][Next]_vars /\ WF_vars(DetectTermination)
53 | 
54 | Stable == [](terminationDetected => []terminated)
55 | 
56 | Live == terminated ~> terminationDetected
57 | 
58 | =============================================================================
59 | 


--------------------------------------------------------------------------------
/.github/workflows/main.yml:
--------------------------------------------------------------------------------
 1 | name: CI
 2 | 
 3 | on: [push]
 4 | 
 5 | jobs:
 6 |   build:
 7 | 
 8 |     runs-on: ubuntu-latest
 9 | 
10 |     steps:
11 |     - uses: actions/checkout@v1
12 |     # Do not download and install TLAPS over and over again.
13 |     - uses: actions/cache@v1
14 |       id: cache
15 |       with:
16 |         path: tlaps/
17 |         key: tlaps1.4.5
18 |     - name: Get TLAPS
19 |       if: steps.cache.outputs.cache-hit != 'true' # see actions/cache above
20 |       run: wget https://github.com/tlaplus/tlapm/releases/download/v1.4.5/tlaps-1.4.5-x86_64-linux-gnu-inst.bin
21 |     - name: Install TLAPS
22 |       if: steps.cache.outputs.cache-hit != 'true' # see actions/cache above
23 |       run: |
24 |         chmod +x tlaps-1.4.5-x86_64-linux-gnu-inst.bin
25 |         ./tlaps-1.4.5-x86_64-linux-gnu-inst.bin -d tlaps
26 |     - name: Run TLAPS
27 |       run: tlaps/bin/tlapm --cleanfp -I tlaps/ O.tla AsyncTerminationDetection_proof.tla
28 |     - name: Get (nightly) TLC
29 |       run: wget https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar
30 |     - name: Run TLC
31 |       run: >-
32 |         java -Dtlc2.TLC.stopAfter=1800 -Dtlc2.TLC.ide=Github
33 |         -Dutil.ExecutionStatisticsCollector.id=aabbcc60f238424fa70d124d0c77bbf1
34 |         -cp tla2tools.jar tlc2.TLC -workers auto -lncheck final -checkpoint 60
35 |         -coverage 60 -tool -deadlock MCAsyncTerminationDetection
36 |     - name: Get (nightly) Apalache
37 |       run: wget https://github.com/informalsystems/apalache/releases/latest/download/apalache.tgz
38 |     - name: Install Apalache
39 |       run: |
40 |         tar xvfz apalache.tgz
41 |     - name: Run Apalache
42 |       run: |
43 |         apalache/bin/apalache-mc check --config=APAsyncTerminationDetection.cfg --length=1 --inv=StableInv --init=Init AsyncTerminationDetection_apalache.tla
44 |         apalache/bin/apalache-mc check --config=APAsyncTerminationDetection.cfg --length=2 --init=StableInv --inv=StableInv AsyncTerminationDetection_apalache.tla
45 |         apalache/bin/apalache-mc check --config=APAsyncTerminationDetection.cfg --length=2 --init=StableInv --inv=StableActionInv AsyncTerminationDetection_apalache.tla
46 |         apalache/bin/apalache-mc check --features=no-rows --config=APEWD998.cfg --length=2 --init=IInvA --next=Next --inv=InvA APEWD998.tla
47 | 


--------------------------------------------------------------------------------
/MCAsyncTerminationDetection.tla:
--------------------------------------------------------------------------------
 1 | ---------------------- MODULE MCAsyncTerminationDetection ---------------------
 2 | EXTENDS AsyncTerminationDetection
 3 | 
 4 | MCInit ==
 5 |     /\ pending \in [Node -> {1,2,3}]
 6 |     /\ active \in [ Node -> BOOLEAN ]
 7 |     /\ terminationDetected \in {terminated}
 8 | 
 9 | StateConstraint ==
10 |     \* * A (state-) constraint is a boolean-valued state function, i.e. a function
11 |      \* * that is true or false of a state.
12 |      \* * A state s, for which the constraint evaluates to FALSE, is not in the model.
13 |      \* * TLC checks if s satisfies the properties (later!), but the successor states
14 |      \* * of s are not generated.
15 |      \* * Constraints are configured in TLC's configuration file
16 |      \* * (MCAsyncTerminationDetection.cfg).
17 |      \* * In this model, we restrict the state space to a finite fragment such that
18 |      \* * at most three messages are pending.
19 |     \A n \in Node : pending[n] <= 3
20 | 
21 | ActionConstraint ==
22 |     \* * A state function can only be built from constant- and state-level operators.
23 |      \* * Among others, the prime operator has action-level.  Thus, it cannot appear in
24 |      \* * a state function such as this state constraint.  Fortunately, TLC also supports
25 |      \* * action constraints.
26 |      \* * There exists no node for which pending increases.
27 |     ~ \E n \in Node: pending'[n] > pending[n]
28 | 
29 | \* * We could have stated the constraint in AsyncTerminationDetection.tla instead of
30 |  \* * in a new module.  However, constraints are only relevant when model-checking
31 |  \* * and not part of the system design.
32 | 
33 | \* Gradually increase the value of CONSTANT N in MCAsyncTerminationDetection.cfg
34 |  \* and observe how quickly the size of the state space explodes (distinct states).
35 |  \* Do we need a supercomputer for model-checking to be useful?  Usually, most bugs
36 |  \* are found even with tiny models.  This is called the "small scope hypothesis".
37 |  \* If higher assurances are needed, one can write a proof for infinite domains with
38 |  \* the TLA proof system. 
39 | =============================================================================
40 | 
41 | | N | Diameter | Distinct States |
42 | |---| ---|  --- |
43 | | 4 | 17 |   4k | 
44 | | 5 | 21 |  32k |
45 | | 6 | 25 | 262k |
46 | 


--------------------------------------------------------------------------------
/APEWD998.tla:
--------------------------------------------------------------------------------
 1 | 
 2 | \* Cannot check  IInvA  as the invariant below because Apalache complains about  TypeOK  :
 3 |  \* Input error (see the manual): Found a set map over an infinite set of Int. Not supported.
 4 | 
 5 | \* Remove/comment  PROPERTY ATDSpec  in  MCEWD998.cfg  to stop Apalache (0.23.1) from complaining about:
 6 |  \*  AsyncTerminationDetection.tla:340:30-340:55: unsupported expression: WF_<<active, pending,...
 7 | 
 8 | \* Replace recursive function  Sum  in EWD998 with the Fold variant because Apalache stopped supporting
 9 |  \* recursive functions at some point.
10 | 
11 | $ apalache-mc check --features=no-rows --inv=InvA --config=MCEWD998.cfg \
12 |   --init=IInvA --next=Next --length=1 APEWD998.tla
13 | 
14 | ------------------------------ MODULE APEWD998 ------------------------------
15 | EXTENDS Functions
16 | 
17 | CONSTANT
18 |     \* @type: Int;
19 |     N
20 | 
21 | VARIABLES 
22 |     \* @type: Int -> Bool;
23 |     active,
24 |     \* @type: Int -> Int;
25 |     pending,
26 |     \* @type: Int -> Str;
27 |     color,
28 |     \* @type: Int -> Int;
29 |     counter,
30 |     \* @type: [pos: Int, q: Int, color: Str];
31 |     token
32 | 
33 | INSTANCE EWD998
34 | 
35 | \* C  parameter of  SumA  because Apalache does not handle non-constant ranges
36 |  \* (see https://git.io/JGFhg)
37 | \* @type: (Int -> Int, Set(Int)) => Int;
38 | SumA(fun, C) ==
39 |     LET Plus(a, b) == a + b
40 |     IN FoldFunctionOnSet(Plus, 0, fun, C)
41 | 
42 | BA ==
43 |     \* This spec counts the in-flight messages in the variable  pending  .
44 |     SumA(pending, Node)
45 | 
46 | \* The set of nodes that have passed the token in this round.
47 |  \* Previously written more concisely as  (token.pos+1)..N-1
48 |  \* (see https://git.io/JGFhg)
49 | Decided ==
50 |     { n \in Node: n > token.pos }
51 | 
52 | \* The set of nodes that have not passed the token in this round yet.
53 |  \* Previously written more concisely as  0..token.pos
54 |  \* (see https://git.io/JGFhg)
55 | Undecided ==
56 |     { n \in Node: n <= token.pos }
57 | 
58 | InvA == 
59 |     /\ P0:: BA = SumA(counter, Node)
60 |     /\  \/ P1:: /\ \A i \in Decided : ~ active[i]
61 |             /\ IF token.pos = N-1 
62 |                THEN token.q = 0 
63 |                ELSE token.q = SumA(counter, Decided)
64 |         \/ P2:: SumA(counter, Undecided) + token.q > 0
65 |         \/ P3:: \E i \in Undecided : color[i] = "black"
66 |         \/ P4:: token.color = "black"
67 | 
68 | IInvA ==
69 |     \* Conjoin  TypeOK  to define the types of the variables.  This is somewhat
70 |      \* redundant given Apalache's type annotations.
71 |     /\ TypeOK
72 |     /\ InvA
73 | 
74 | =============================================================================
75 | 


--------------------------------------------------------------------------------
/.devcontainer/install.sh:
--------------------------------------------------------------------------------
 1 | #!/bin/bash -i
 2 | 
 3 | ## Fix issues with gitpod's stock .bashrc
 4 | cp /etc/skel/.bashrc $HOME
 5 | 
 6 | ## Shorthands for git
 7 | git config --global alias.slog 'log --pretty=oneline --abbrev-commit'
 8 | git config --global alias.co checkout
 9 | git config --global alias.lco '!f() { git checkout ":/$1:" ; }; f'
10 | 
11 | ## Waste less screen real estate on the prompt.
12 | echo 'export PS1="$ "' >> $HOME/.bashrc
13 | 
14 | ## Make it easy to go back and forth in the (linear) git history.
15 | echo 'function sn() { git log --reverse --pretty=%H main | grep -A 1 $(git rev-parse HEAD) | tail -n1 | xargs git show --color; }' >> $HOME/.bashrc
16 | echo 'function n() { git log --reverse --pretty=%H main | grep -A 1 $(git rev-parse HEAD) | tail -n1 | xargs git checkout; }' >> $HOME/.bashrc
17 | echo 'function p() { git checkout HEAD^; }' >> $HOME/.bashrc
18 | 
19 | ## Place to install TLC, TLAPS, Apalache, ...
20 | mkdir -p tools
21 | 
22 | ## PATH below has two locations because of inconsistencies between Gitpod and Codespaces.
23 | ## Gitpod:     /workspace/...
24 | ## Codespaces: /workspaces/...
25 | 
26 | ## Install TLA+ Tools (download from github instead of nightly.tlapl.us (inria) to only rely on github)
27 | wget -qN https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar -P tools/
28 | echo "alias tlcrepl='java -cp /workspace/ewd998/tools/tla2tools.jar:/workspaces/ewd998/tools/tla2tools.jar tlc2.REPL'" >> $HOME/.bashrc
29 | echo "alias tlc='java -cp /workspace/ewd998/tools/tla2tools.jar:/workspaces/ewd998/tools/tla2tools.jar tlc2.TLC'" >> $HOME/.bashrc
30 | 
31 | ## Install CommunityModules
32 | wget -qN https://github.com/tlaplus/CommunityModules/releases/latest/download/CommunityModules-deps.jar -P tools/
33 | 
34 | ## Install TLAPS (proof system)
35 | wget -N https://github.com/tlaplus/tlapm/releases/download/v1.4.5/tlaps-1.4.5-x86_64-linux-gnu-inst.bin -P /tmp
36 | chmod +x /tmp/tlaps-1.4.5-x86_64-linux-gnu-inst.bin
37 | /tmp/tlaps-1.4.5-x86_64-linux-gnu-inst.bin -d tools/tlaps
38 | echo 'export PATH=$PATH:/workspace/ewd998/tools/tlaps/bin:/workspaces/ewd998/tools/tlaps/bin' >> $HOME/.bashrc
39 | 
40 | ## Install Apalache
41 | wget -qN https://github.com/informalsystems/apalache/releases/latest/download/apalache.tgz -P /tmp
42 | mkdir -p tools/
43 | tar xvfz /tmp/apalache.tgz --directory tools/
44 | echo 'export PATH=$PATH:/workspace/ewd998/tools/apalache/bin:/workspaces/ewd998/tools/apalache/bin' >> $HOME/.bashrc
45 | tools/apalache/bin/apalache-mc config --enable-stats=true
46 | 
47 | ## Update missing or outdated apt database on cloud instances.  Without it,
48 | ## installing packages below will likely fail.
49 | sudo apt-get update
50 | 
51 | ## (Moved to the end to let it run in the background while we get started)
52 | ## - graphviz to visualize TLC's state graphs
53 | ## - htop to show system load
54 | ## - texlive-latex-recommended to generate pretty-printed specs
55 | ## - z3 for Apalache (comes with z3 turnkey) (TLAPS brings its own install)
56 | ## - r-base iff tutorial covers statistics (TODO)
57 | sudo apt-get install -y graphviz htop
58 | ## No need because Apalache comes with z3 turnkey
59 | #sudo apt-get install -y z3 libz3-java 
60 | sudo apt-get install -y --no-install-recommends texlive-latex-recommended
61 | #sudo apt-get install -y r-base
62 | 
63 | ## Install TLA+ Toolbox
64 | wget https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/TLAToolbox-1.8.0.deb -P /tmp
65 | sudo dpkg -i /tmp/TLAToolbox-1.8.0.deb
66 | 
67 | ## switch to first commit of the tutorial. Unshallow on Codespaces first.
68 | if $(git rev-parse --is-shallow-repository); then git fetch --unshallow; fi
69 | git co ':/v01:'
70 | 
71 | ## $(pwd)/ because VSCode apparently doesn't like relative paths.
72 | #code --force --install-extension $(pwd)/.devcontainer/extensions/better-comments-2.0.5.vsix
73 | #code --force --install-extension $(pwd)/.devcontainer/extensions/EFanZh.graphviz-preview-1.5.0.vsix
74 | 
75 | ## Open the readme.md file in the editor.
76 | #code README.md
77 | 


--------------------------------------------------------------------------------
/O.tla:
--------------------------------------------------------------------------------
  1 | Run `tlapm O.tla` on the terminal to verify the 
  2 | theorems below with TLAPS.
  3 | 
  4 | ---- MODULE O ----
  5 | 
  6 | CONSTANT O(_)
  7 | 
  8 | \* THEOREM T1 == O(1) /\ O(2) <=> \E i \in {1,2}: O(i)  OBVIOUS
  9 | THEOREM T2 == O(1) /\ O(2) <=> \A i \in {1,2}: O(i)  OBVIOUS
 10 | THEOREM T3 == O(1) \/ O(2) <=> \E i \in {1,2}: O(i)  OBVIOUS
 11 | \* THEOREM T4 == O(1) \/ O(2) <=> \A i \in {1,2}: O(i)  OBVIOUS
 12 | 
 13 | 
 14 | ------------------
 15 | \* Implication
 16 | 
 17 | CONSTANT
 18 |     P, \* It's raining
 19 |     Q  \* The street is wet (street is not in a tunnel!)
 20 | 
 21 | \* If it rains (P), the street is wet (Q)
 22 | THEOREM TRUE => TRUE <=> TRUE  OBVIOUS
 23 | \* It cannot be that it rains, but the street is dry
 24 | THEOREM TRUE => FALSE <=> FALSE  OBVIOUS
 25 | \* The street might be wet, even without rain (somebody spilled some water)
 26 | THEOREM FALSE => TRUE <=> TRUE  OBVIOUS
 27 | \* No rain and a dry street
 28 | THEOREM FALSE => FALSE <=> TRUE  OBVIOUS
 29 | 
 30 | \* Contraposition (Street not wet implies no rain).
 31 | \* https://en.wikipedia.org/wiki/Contraposition
 32 | THEOREM P => Q <=> ~Q => ~P  OBVIOUS
 33 | \* Or-and-if.
 34 | THEOREM P => Q <=> (~P) \/ Q  OBVIOUS
 35 | \* Negated conditionals.
 36 | THEOREM ~(P => Q) <=> P /\ (~Q)  OBVIOUS 
 37 | 
 38 | ------------------
 39 | \* Action operators
 40 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 
 41 | PROVE [A]_v <=> A \/ v' = v  OBVIOUS 
 42 | 
 43 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 
 44 | PROVE <<A>>_v <=> A /\ v' # v  OBVIOUS 
 45 | 
 46 | \* ExpandENABLED requires TLAPS version greater than 1.4
 47 | \* ENABLED A \/ v=v'  is a tautology.
 48 | INSTANCE TLAPS
 49 | 
 50 | THEOREM ASSUME NEW VARIABLE v
 51 | PROVE (ENABLED [FALSE]_v) (*BY ExpandENABLED*)
 52 | 
 53 | THEOREM ASSUME NEW VARIABLE v
 54 | PROVE (ENABLED [TRUE]_v) (*BY ExpandENABLED*)
 55 | 
 56 | THEOREM ASSUME NEW VARIABLE v
 57 | PROVE (ENABLED [FALSE]_TRUE) (*BY ExpandENABLED*)
 58 | 
 59 | THEOREM ASSUME NEW VARIABLE v
 60 | PROVE (ENABLED [TRUE]_TRUE) (*BY ExpandENABLED*)
 61 | 
 62 | ------------------
 63 | \* Dual Box and Diamond operators
 64 | THEOREM ASSUME NEW F 
 65 | PROVE <>F <=> ~[]~F  OBVIOUS 
 66 | 
 67 | THEOREM ASSUME NEW F 
 68 | PROVE ~<>F <=> []~F  OBVIOUS 
 69 | 
 70 | \* see Specifying Systems page 92
 71 | THEOREM ASSUME NEW F 
 72 | PROVE ~[]F <=> <>~F  OBVIOUS 
 73 | 
 74 | \* see Specifying Systems page 93
 75 | THEOREM ASSUME NEW F, NEW G 
 76 | PROVE 
 77 |     /\ [](F /\ G) <=> ([]F) /\ ([]G)
 78 |     /\ <>(F \/ G) <=> (<>F) \/ (<>G)
 79 | OBVIOUS 
 80 | 
 81 | THEOREM ASSUME NEW F, NEW G 
 82 | PROVE 
 83 |     /\ ([]F) \/ ([]G) => [](F \/ G)
 84 |     /\ <>(F /\ G) => (<>F) /\ (<>G)
 85 | OBVIOUS 
 86 | 
 87 | \* see Specifying Systems page 94
 88 | THEOREM ASSUME NEW ACTION A, NEW ACTION B, NEW VARIABLE v 
 89 | PROVE 
 90 |     /\ [A /\ B]_v <=> [A]_v /\ [B]_v
 91 |     /\ <<A \/ B>>_v <=> <<A>>_v \/ <<B>>_v
 92 |     \* 8.5
 93 |     /\ ([]<><<A>>_v) \/ ([]<><<B>>_v) <=> ([]<><<A>>_v) \/ ([]<><<B>>_v)
 94 | OBVIOUS 
 95 | 
 96 | \* see Specifying Systems page 95
 97 | THEOREM ASSUME NEW ACTION A, NEW ACTION B, NEW VARIABLE v 
 98 | PROVE 
 99 |     /\ []<><<A \/ B>>_v <=> ([]<><<A>>_v) \/ ([]<><<B>>_v)
100 | BY PTL
101 | 
102 | ------------------
103 | \* (Weak) Fairness (see Specifying Systems page 97ff for more equivalent formulae)
104 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 
105 | PROVE ( <>[](ENABLED <<A>>_v) => []<><<A>>_v ) <=> ( []([]ENABLED <<A>>_v => <><<A>>_v) )  BY PTL 
106 | 
107 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 
108 | PROVE ( <>[](ENABLED <<A>>_v) => []<><<A>>_v ) <=> ( WF_v(A) )  BY PTL 
109 | 
110 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 
111 | PROVE ( []<>(~ENABLED <<A>>_v) \/ []<><<A>>_v ) <=> ( WF_v(A) )  BY PTL 
112 | 
113 | THEOREM ASSUME NEW ACTION A, NEW VARIABLE v 
114 | PROVE ( []<>(ENABLED <<A>>_v) => []<><<A>>_v ) <=>( SF_v(A) )  BY PTL 
115 | 
116 | ------------------
117 | \* Leads-to
118 | THEOREM ASSUME NEW F, NEW G
119 | PROVE [](F => <>G) <=> (F ~> G)  OMITTED 
120 | 
121 | ------------------
122 | \* CHOOSE
123 | 
124 | THEOREM ASSUME NEW P(_), NEW S
125 | PROVE ( \E c: P(c) ) <=> ( P(CHOOSE c: P(c)) )  OBVIOUS 
126 | 
127 | ====
128 | 


--------------------------------------------------------------------------------
/F.tla:
--------------------------------------------------------------------------------
  1 | ---- MODULE F ----
  2 | EXTENDS Naturals, FiniteSets, Sequences
  3 | 
  4 | (* 1. Set of all permutations of {"T","L","A"} including repetitions. *)
  5 | PermsWithReps(S) ==
  6 |     [ 1..Cardinality(S) -> S ]
  7 |     
  8 | ASSUME 
  9 |     PermsWithReps({"T","L","A"}) =
 10 |         {<<"T", "T", "T">>, <<"T", "T", "L">>, <<"T", "T", "A">>, 
 11 |             <<"T", "L", "T">>, <<"T", "L", "L">>, <<"T", "L", "A">>, 
 12 |             <<"T", "A", "T">>, <<"T", "A", "L">>, <<"T", "A", "A">>, 
 13 |             <<"L", "T", "T">>, <<"L", "T", "L">>, <<"L", "T", "A">>, 
 14 |             <<"L", "L", "T">>, <<"L", "L", "L">>, <<"L", "L", "A">>, 
 15 |             <<"L", "A", "T">>, <<"L", "A", "L">>, <<"L", "A", "A">>, 
 16 |             <<"A", "T", "T">>, <<"A", "T", "L">>, <<"A", "T", "A">>, 
 17 |             <<"A", "L", "T">>, <<"A", "L", "L">>, <<"A", "L", "A">>, 
 18 |             <<"A", "A", "T">>, <<"A", "A", "L">>, <<"A", "A", "A">>}
 19 | 
 20 | (* 2. All combinations of a two-digit lock. *)
 21 | TwoDigitLock ==
 22 |     [1..2 -> 0..9]
 23 | 
 24 | ASSUME
 25 |     /\ (0..9) \X (0..9) = TwoDigitLock
 26 |     /\ {<<n,m>> : n,m \in 10..19} \notin SUBSET TwoDigitLock
 27 | 
 28 | (* 3. All combinations of a three-digit lock. *)
 29 | ThreeDigitLock ==
 30 |     [1..3 -> 0..9]
 31 | 
 32 | ASSUME
 33 |     /\ (0..9) \X (0..9) \X (0..9) = ThreeDigitLock
 34 |     /\ {<<n,m,o>> : n,m,o \in 10..19} \notin SUBSET ThreeDigitLock
 35 | 
 36 | (* 4. All pairs (including repetitions) of the natural numbers. *)
 37 | PairsOfNaturals ==
 38 |     [1..2 -> Nat]
 39 | 
 40 | ASSUME
 41 |     {<<n,m>> : n,m \in 0..100} \subseteq PairsOfNaturals
 42 | 
 43 | (* 5. All triples... *)
 44 | TriplesOfNaturals ==
 45 |     [1..3 -> Nat]
 46 | 
 47 | ASSUME
 48 |     {<<n,m,o>> : n,m,o \in 0..25} \subseteq TriplesOfNaturals
 49 | 
 50 | (* 6. Set of all pairs and triples... *)
 51 | PairsAndTriplesOfNaturals ==
 52 |     [1..2 -> Nat] \cup [1..3 -> Nat]
 53 | 
 54 | ASSUME
 55 |     /\ {<<n,m>> : n,m \in 0..100} \subseteq PairsAndTriplesOfNaturals
 56 |     /\ {<<n,m,o>> : n,m,o \in 0..25} \subseteq PairsAndTriplesOfNaturals
 57 | 
 58 | (* 7. What is the Cardinality of 3. ? *)
 59 | Cardinality3 ==
 60 |     Cardinality(ThreeDigitLock)
 61 | 
 62 | ASSUME Cardinality3 = 1000
 63 | 
 64 | (* 8. What is the Cardinality of 6. (PairsAndTriplesOfNaturals) ? *)
 65 | 
 66 | --------------------------------------------------------------
 67 | 
 68 | (* 9. The range (image) of a function. *)
 69 | Range(f) == { f[x]: x \in DOMAIN f }
 70 | 
 71 | ASSUME Range([a |-> 1, b |-> 2, c |-> 3]) = 1..3
 72 | 
 73 | (* 10. The permutations of a set _without_ repetition. *)
 74 | Perms(S) ==
 75 |     { f \in [S -> S] :
 76 |         Range(f) = S }
 77 | 
 78 | ASSUME Perms({1,2,3}) =
 79 |              {<<1, 2, 3>>, <<1, 3, 2>>,
 80 |               <<2, 1, 3>>, <<2, 3, 1>>,
 81 |               <<3, 1, 2>>, <<3, 2, 1>>}
 82 | 
 83 | Perms2(S) ==
 84 |     \* If for all w in S there exists a v in S for which f[v]=w,
 85 |     \* there can be no repetitions as a consequence. The predicate
 86 |     \* demands for all elements of S to be in the range of f.
 87 |     { f \in [S -> S] :
 88 |         \A w \in S :
 89 |             \E v \in S : f[v]=w }
 90 | 
 91 | ASSUME Perms2({1,2,3}) =
 92 |              {<<1, 2, 3>>, <<1, 3, 2>>,
 93 |               <<2, 1, 3>>, <<2, 3, 1>>,
 94 |               <<3, 1, 2>>, <<3, 2, 1>>}
 95 | 
 96 | Perms3(S) ==
 97 |     { f \in [S -> S] :
 98 |         \A i,j \in DOMAIN f :
 99 |             i # j => f[i] # f[j] }
100 | 
101 | ASSUME Perms3({1,2,3}) =
102 |              {<<1, 2, 3>>, <<1, 3, 2>>,
103 |               <<2, 1, 3>>, <<2, 3, 1>>,
104 |               <<3, 1, 2>>, <<3, 2, 1>>}
105 | 
106 | (* 11. Reverse a sequence (a function with domain 1..N). *)
107 | Reverse(seq) ==
108 |     [ i \in 1..Len(seq) |-> seq[Len(seq)+1 - i] ]
109 | 
110 | ASSUME Reverse(<<1, 2, 3>>) = <<3, 2, 1>>
111 | ASSUME Reverse(<<>>) = <<>>
112 | 
113 | (* 12. An (infix) operator to quickly define a function mapping an x to a y.  *)
114 | x :> y ==
115 |     [ e \in {x} |-> y ]
116 | 
117 | ASSUME "x" :> 42 = [ x |-> 42 ]
118 | 
119 | (* 13. Merge two functions f and g *)
120 | f ++ g ==
121 |   [x \in (DOMAIN f) \cup (DOMAIN g) |-> IF x \in DOMAIN f THEN f[x] ELSE g[x]]
122 | 
123 | ASSUME <<1,2,3>> ++ [i \in 1..6 |-> i] = <<1, 2, 3, 4, 5, 6>>
124 | 
125 | (* 14. Advanced!!! Inverse of a function f (swap the domain and range). *)
126 | Inverse(f) ==
127 |    CHOOSE g \in [ Range(f) -> DOMAIN f] : \A s \in DOMAIN f: g[f[s]]=s
128 | 
129 | ASSUME Inverse(("a" :> 0) ++ ("b" :> 1) ++ ("c" :> 2)) =
130 |               ((0 :> "a") ++ (1 :> "b") ++ (2 :> "c"))
131 | 
132 | --------------------------------------------------------------
133 | 
134 | \* Mutual recursion becomes possible
135 | \* with recursive *operators*.
136 | \* Evaluate in the tlcrepl with:
137 | \*  LET F == INSTANCE F IN F!IsEven(42)
138 | ----------------------
139 | RECURSIVE IsEven(_)
140 | 
141 | RECURSIVE IsOdd(_)
142 | 
143 | IsEven(n) ==
144 |     IF n = 0
145 |     THEN TRUE
146 |     ELSE IsOdd(n-1)
147 | 
148 | IsOdd(n) ==
149 |     IF n = 0
150 |     THEN FALSE
151 |     ELSE IsEven(n-1)
152 | 
153 | ==================
154 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # ewd998
 2 | Experience TLA+ in action by specifying distributed termination detection on a ring, [due to Shmuel Safra](https://www.cs.utexas.edu/users/EWD/ewd09xx/EWD998.PDF).  Each [git commit](https://github.com/lemmy/ewd998/commits/) introduces a new TLA+ concept.  Go back to the very first commit to follow along!
 3 | 
 4 | ### v00: IDE
 5 | 
 6 | Click either one of the buttons to launch a zero-install IDE to give the TLA+ specification language a try:
 7 | 
 8 | [![Open TLA+ EWD998 in Codespaces](https://img.shields.io/badge/TLA+-in--Codespaces-grey?labelColor=ee4e14&style=for-the-badge&logo=data:image/svg+xml;base64,PHN2ZyBmaWxsPSIjNjY2NjY2IiByb2xlPSJpbWciIHZpZXdCb3g9IjAgMCAyNCAyNCIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj48dGl0bGU+TWljcm9zb2Z0IGljb248L3RpdGxlPjxwYXRoIGQ9Ik0xMS40IDI0SDBWMTIuNmgxMS40VjI0ek0yNCAyNEgxMi42VjEyLjZIMjRWMjR6TTExLjQgMTEuNEgwVjBoMTEuNHYxMS40em0xMi42IDBIMTIuNlYwSDI0djExLjR6Ii8+PC9zdmc+)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=408523143&machine=standardLinux32gb&devcontainer_path=.devcontainer%2Fdevcontainer.json&location=WestUs2)
 9 | [![Open TLA+ EWD998 in Gitpod Ready-to-Code](https://img.shields.io/badge/TLA+-in--Gitpod-grey?labelColor=ee4e14&style=for-the-badge&logo=gitpod)](https://gitpod.io/#https://github.com/ewd998/ewd998)
10 | 
11 | (=> [Screencast how to create the TLA+ Codespace](https://www.youtube.com/watch?v=mFWWDcJahg0&list=PLWLcqZLzY8u_oWnCTGC77OgZlWaab06Gt))
12 | 
13 | ### v01: Problem statement - Termination detection in a ring
14 | 
15 | #### v01a: Termination of [pleasingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel) computation
16 | 
17 | For this tutorial, we assume that the nodes of the distributed system are organized as a ring, with one node acting as the (unique) leader[^1].  If we further assume that nodes execute independent computations, (global) termination detection becomes trivial--the leader initiates a token transfer around the ring, and each node passes the token to its next neighbor iff the node has finished its computation.  When the initiator receives the token back, it knows that all (other) nodes have terminated.
18 | 
19 | ![Token Passing](figures/v01-ring01.gif)
20 | 
21 | This problem is too simple, and we don't need TLA+ to model it.
22 | 
23 | [^1]: Perhaps by some [leader election algorithm](https://en.wikipedia.org/wiki/Paxos_(computer_science)).
24 | 
25 | #### v01b: Termination of collaborative computation
26 | 
27 | A more interesting problem is to look at a "collaborative" computation, which implies that nodes can re-activate each other.  For example, the result of a computation at node 23 is (atomically!) sent to and further processed at node 42. With the previous protocol, node 42 might have already passed on the token, causing the initiator to eventually (and wrongly) detect (global) termination even though node 42 is active again; a bug that is at least difficult to reproduce with testing!
28 | A solution is offered in [EWD840](https://github.com/tlaplus/Examples/blob/master/specifications/ewd840/EWD840.tla):
29 | * Initiator sends a "stateful" token around the ring
30 | * Each node remembers if it activated another node
31 | * Activation taints the token (when the activator gets the token)
32 | * Initiator keeps running rounds until it receives an untainted token
33 | 
34 | ![Token Passing](figures/v01-ring03.gif)
35 | 
36 | #### v01c: Termination detection with asynchronous message delivery
37 | 
38 | What happens if we loosen the restriction that message delivery is atomic (it seldom is)?  Clearly, we are back at square one:
39 | 1) Node 23 sends a message to 42
40 | 2) 23 taints the token
41 | 3) Initiator starts a new round
42 | 4) Node 42 receives the fresh token before receiving the activation message from 23
43 | 5) Boom!
44 | 
45 | The fix proposed in [Shmuel Safra's EWD998](https://www.cs.utexas.edu/users/EWD/ewd09xx/EWD998.PDF) is to count in-flight messages. But will this work?
46 | 
47 | ![EWD998](figures/v01-ring04.gif)
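
The counting idea can be sketched as a tiny, self-contained TLA+ module. This is only an illustration under assumptions, not the spec developed in this tutorial: the module name `InFlightSketch` and the actions `Send` and `Recv` are hypothetical, activation and the token are left out entirely, and `counter` and `pending` merely mirror variable names used elsewhere in this repository. Each node increments its counter when it sends a message and decrements it when it receives one, so the sum of all counters should equal the number of messages in flight.

```tla
---- MODULE InFlightSketch ----
\* Hypothetical sketch; not part of the tutorial's specs.
EXTENDS Integers

CONSTANT N
ASSUME N \in Nat \ {0}

Node == 0 .. N-1

VARIABLES
    counter,  \* per node: messages sent minus messages received
    pending   \* per node: messages in flight towards that node

vars == <<counter, pending>>

\* Sum of f[x] for all x in the finite set S.
RECURSIVE Sum(_, _)
Sum(f, S) ==
    IF S = {} THEN 0
    ELSE LET x == CHOOSE y \in S : TRUE
         IN  f[x] + Sum(f, S \ {x})

Init ==
    /\ counter = [n \in Node |-> 0]
    /\ pending = [n \in Node |-> 0]

\* Node i sends a message to node j.
Send(i, j) ==
    /\ counter' = [counter EXCEPT ![i] = @ + 1]
    /\ pending' = [pending EXCEPT ![j] = @ + 1]

\* Node j receives one of its pending messages.
Recv(j) ==
    /\ pending[j] > 0
    /\ pending' = [pending EXCEPT ![j] = @ - 1]
    /\ counter' = [counter EXCEPT ![j] = @ - 1]

Next == \E i, j \in Node : Send(i, j) \/ Recv(j)

Spec == Init /\ [][Next]_vars

\* Candidate invariant: the counters add up to the in-flight messages.
CountersMatchInFlight == Sum(counter, Node) = Sum(pending, Node)
====
```

To check `CountersMatchInFlight` as an invariant with TLC, one would assign `N` in a config file and add a state constraint that bounds `pending`, because nothing in this sketch limits how many messages are sent.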
48 | 
49 | Throughout the chapters of this tutorial, we will use the TLA+ specification language to model EWD998, and check interesting properties.
50 | 
51 | ### v02: High-level spec AsyncTerminationDetection
52 | 
53 | TLA+ is all about abstraction, and, as we will later see, has first-class support to connect different levels of abstraction. Let's use this and write a basic spec that either falsifies our design above, or gives us sufficient confidence to invest in writing a more detailed spec.
54 | 
55 | (Credit: [Stephan Merz](https://members.loria.fr/Stephan.Merz/) wrote AsyncTerminationDetection)
56 | 
57 | #### v02a: Spec skeleton
58 | 
59 | Instead of modeling message channels, let alone modeling the transport layer, we will write a spec that models:
60 | 
61 | 1) A ring of N nodes 
62 | 2) The activation status of each node
63 | 3) The number of messages *pending*[^2] at a node
64 | 4) A send action
65 | 5) A receive action
66 | 6) A terminate action
67 | 7) The initial configuration of the system
68 | 
69 | Please switch to [AsyncTerminationDetection.tla](AsyncTerminationDetection.tla) and read its comments.  From here
70 | on, the tutorial continues there...
71 | 
72 | [^2]: It's difficult to (efficiently) count pending messages in an implementation. In a TLA+ spec, we don't care about that notion of efficiency.  Also, all variables are global.
73 | 


--------------------------------------------------------------------------------
/AsyncTerminationDetection_proof.tla:
--------------------------------------------------------------------------------
  1 | ---------------------- MODULE AsyncTerminationDetection_proof ---------------------
  2 | EXTENDS AsyncTerminationDetection, TLAPS
  3 | 
  4 | (* Do not whitelist all the known facts/assumptions and definitions, to speed up the provers *)
  5 | \*USE NIsPosNat DEF vars, terminated, Node,
  6 | \*                  Init, Next, Spec,
  7 | \*                  DetectTermination, Terminate,
  8 | \*                  Wakeup, SendMsg,
  9 | \*                  TypeOK, Stable
 10 | 
 11 | \* * An invariant I is inductive iff Init => I and I /\ [Next]_vars => I'.  Note,
 12 | \* * however, that TypeOK by itself does not imply  Stable  !  TypeOK alone
 13 | \* * does not help us prove Stable.
 14 | 
 15 | LEMMA TypeCorrect == Spec => []TypeOK
 16 | <1>1. Init => TypeOK BY NIsPosNat DEF Init, TypeOK, Node, terminated
 17 | <1>2. TypeOK /\ [Next]_vars => TypeOK'
 18 |  <2> SUFFICES ASSUME TypeOK,
 19 |                      [Next]_vars
 20 |               PROVE  TypeOK'
 21 |    OBVIOUS
 22 |  <2>1. CASE DetectTermination
 23 |    BY <2>1 DEF TypeOK, Next, vars, DetectTermination
 24 |  <2>2. ASSUME NEW i \in Node,
 25 |               NEW j \in Node,
 26 |               Terminate(i)
 27 |        PROVE  TypeOK'
 28 |    BY <2>2 DEF TypeOK, Next, vars, Terminate, terminated
 29 |  <2>3. ASSUME NEW i \in Node,
 30 |               NEW j \in Node,
 31 |               Wakeup(i)
 32 |        PROVE  TypeOK'
 33 |    BY <2>3 DEF TypeOK, Next, vars, Wakeup
 34 |  <2>4. ASSUME NEW i \in Node,
 35 |               NEW j \in Node,
 36 |               SendMsg(i, j)
 37 |        PROVE  TypeOK'
 38 |    BY <2>4 DEF TypeOK, Next, vars, SendMsg
 39 |  <2>5. CASE UNCHANGED vars
 40 |    BY <2>5 DEF TypeOK, Next, vars
 41 |  <2>6. QED
 42 |    BY <2>1, <2>2, <2>3, <2>4, <2>5 DEF Next
 43 | <1>. QED BY <1>1, <1>2, PTL DEF Spec
 44 | 
 45 | (***************************************************************************)
 46 | (* Proofs of safety and stability.                                         *)
 47 | (***************************************************************************)
 48 | Safe == terminationDetected => terminated
 49 | 
 50 | THEOREM Safety == Spec => []Safe
 51 | <1>. USE DEF terminated, TypeOK, Safe
 52 | <1>1. Init => Safe
 53 |   BY Zenon DEF Init
 54 | <1>2. TypeOK /\ Safe /\ [Next]_vars => Safe'
 55 |   <2> SUFFICES ASSUME TypeOK, Safe, [Next]_vars
 56 |                PROVE  Safe'
 57 |     OBVIOUS
 58 |   <2>1. CASE DetectTermination
 59 |     BY <2>1 DEF DetectTermination
 60 |   <2>2. ASSUME NEW i \in Node, Terminate(i)
 61 |         PROVE  Safe'
 62 |     BY <2>2, Zenon DEF Terminate
 63 |   <2>3. ASSUME NEW i \in Node, Wakeup(i)
 64 |         PROVE  Safe'
 65 |     BY <2>3 DEF Wakeup
 66 |   <2>4. ASSUME NEW i \in Node, NEW j \in Node, SendMsg(i, j)
 67 |         PROVE  Safe'
 68 |     BY <2>4 DEF SendMsg
 69 |   <2>5. CASE UNCHANGED vars
 70 |     BY <2>5 DEF vars
 71 |   <2>. QED
 72 |     BY <2>1, <2>2, <2>3, <2>4, <2>5 DEF Next
 73 | <1>. QED
 74 |   BY <1>1, <1>2, TypeCorrect, PTL DEF Spec
 75 | 
 76 | THEOREM Stability == Spec => Stable
 77 | \* We show that terminationDetected is never reset to FALSE
 78 | <1>1. TypeOK /\ Safe /\ terminationDetected /\ [Next]_vars => terminationDetected'
 79 |     BY Zenon
 80 |        DEF TypeOK, Safe, terminated, Next, DetectTermination, Terminate, Wakeup, SendMsg, vars
 81 | <1>. QED  BY <1>1, TypeCorrect, Safety, PTL DEF Spec, Stable, Safe
 82 | 
 83 | -----------------------------------------------------------------------------
 84 | 
 85 | \* syncActive == [n \in Node |-> active[n] \/ pending[n] # 0]
 86 | 
 87 | \* STD == INSTANCE SyncTerminationDetection WITH active <- syncActive
 88 | 
 89 | \* (***************************************************************************)
 90 | \* (* We prove (the safety part of) refinement.                               *)
 91 | \* (***************************************************************************)
 92 | 
 93 | \* THEOREM Refinement == Spec => STD!Spec
 94 | \* <1>. USE DEF Node, STD!Node, syncActive, terminated, STD!terminated
 95 | \* <1>1. Init => STD!Init
 96 | \*   BY NIsPosNat, Zenon DEF Init, STD!Init
 97 | \* <1>2. TypeOK /\ Safe /\ [Next]_vars => [STD!Next]_(STD!vars)
 98 | \*   <2> SUFFICES ASSUME TypeOK, Safe, [Next]_vars
 99 | \*                PROVE  [STD!Next]_(STD!vars)
100 | \*     OBVIOUS
101 | \*   <2>. USE NIsPosNat DEF TypeOK, STD!Next, STD!vars
102 | \*   <2>1. CASE DetectTermination
103 | \*     BY <2>1, Zenon DEF DetectTermination, STD!DetectTermination
104 | \*   <2>2. ASSUME NEW i \in Node, Terminate(i)
105 | \*         PROVE  [STD!Next]_(STD!vars)
106 | \*     BY <2>2, Zenon DEF Terminate, STD!Terminate, Safe
107 | \*   <2>3. ASSUME NEW i \in Node, Wakeup(i)
108 | \*         PROVE  [STD!Next]_(STD!vars)
109 | \*     BY <2>3 DEF Wakeup
110 | \*   <2>4. ASSUME NEW i \in Node, NEW j \in Node, SendMsg(i, j)
111 | \*         PROVE  [STD!Next]_(STD!vars)
112 | \*     <3>1. syncActive[i] /\ UNCHANGED terminationDetected
113 | \*       BY <2>4 DEF SendMsg
114 | \*     <3>2. syncActive' = [syncActive EXCEPT ![j] = TRUE]
115 | \*       BY <2>4, Isa DEF SendMsg
116 | \*     <3>. QED  BY <3>1, <3>2, Zenon DEF STD!Wakeup
117 | \*   <2>5. CASE UNCHANGED vars
118 | \*     BY <2>5 DEF vars
119 | \*   <2>6. QED
120 | \*     BY <2>1, <2>2, <2>3, <2>4, <2>5 DEF Next
121 | \* <1>3. Spec => WF_(STD!vars)(STD!DetectTermination)
122 | \*   OMITTED
123 | \* <1>. QED  BY <1>1, <1>2, <1>3, TypeCorrect, Safety, PTL DEF Spec, STD!Spec
124 | 
125 | =============================================================================
126 | 


--------------------------------------------------------------------------------
/MCEWD998.tla:
--------------------------------------------------------------------------------
  1 | ------------------------------- MODULE MCEWD998 -------------------------------
  2 | EXTENDS EWD998, TLC
  3 | 
  4 | (***************************************************************************)
  5 | (* Bound the otherwise infinite state space that TLC has to check.         *)
  6 | (***************************************************************************)
  7 | StateConstraint ==
  8 |   /\ \A i \in Node : counter[i] < 3 /\ pending[i] < 3
  9 |   /\ token.q < 3
 10 | 
 11 | -----------------------------------------------------------------------------
 12 | 
 13 | \* Note that the non-property  TLCGet("level") < 42  combined with TLC's
 14 |  \* simulator quickly triggers some "counter-example" for MCEWD998.
 15 | MaxDiameter == TLCGet("level") < 42
 16 | 
 17 | \* $ tlc -noTE -simulate -deadlock MCEWD998 | grep -A1 "sim = TRUE"
 18 | Alias ==
 19 |     [
 20 |         active |-> active
 21 |         ,color |-> color
 22 |         ,counter |-> counter
 23 |         ,pending |-> pending
 24 |         ,token |-> token
 25 |         
 26 |         \* Eye-ball test if some nodes simultaneously deactivate. Note that
 27 |          \* the nodes deactivate in the *successor* (primed) state.
 28 |         ,sim |-> \E i,j \in Node:
 29 |                         /\ i # j
 30 |                         /\ active[i] # active[i]'
 31 |                         /\ active[j] # active[j]'
 32 |         \* Yes, one can prime  TLCGet("...")  in recent versions of TLC! With it,
 33 |          \* we account for the  sim  being true when the nodes deactivate in the
 34 |          \* successor state.  Obviously, .name will be "Deactivate".
 35 |         ,action |-> TLCGet("action")'.name
 36 |     ]
 37 | 
 38 | -----------------------------------------------------------------------------
 39 | 
 40 | \* With TLC, checking  IInv /\ [Next]_vars => IInv'  translates to a config s.t.
 41 |  \*
 42 |  \*  CONSTANT N = 3
 43 |  \*  INIT IInv
 44 |  \*  NEXT Next
 45 |  \*  INVARIANT IInv
 46 |  \*
 47 |  \* However, the number of states defined by  TypeOK  is infinite because of 
 48 |  \* sub-formulas involving unbounded sets (Nat & Int).  Therefore, we rewrite
 49 |  \*  TypeOK  and substitute  MyNat  for Nat  and  MyInt  for  Int  ,
 50 |  \* respectively.
 51 |  \* Alternatively, we could have re-defined  Nat  with  MyNat  and  Int  with
 52 |  \*  MyInt  . 
 53 | \* TODO Do you see why re-defining  Nat  and  Int  would have caused problems?
 54 | 
 55 | MyNat == 0..3
 56 | MyInt == -2..2
 57 | 
 58 | IInit ==
 59 |     /\ active \in [Node -> BOOLEAN]
 60 |     /\ pending \in [Node -> MyNat]
 61 |     /\ color \in [Node -> Color]
 62 |     /\ counter \in [Node -> MyInt]
 63 |     /\ token \in [ pos: Node, q: MyInt, color: Color ]
 64 |     /\ Inv
 65 | 
 66 | =============================================================================
 67 | 
 68 | $ tlc -deadlock -config MCEWD998.tla MCEWD998
 69 | 
 70 | ------------------------------ CONFIG MCEWD998 ------------------------------
 71 | 
 72 | CONSTANT N = 3
 73 | 
 74 | INIT IInit
 75 | NEXT Next
 76 | 
 77 | INVARIANT IInv
 78 | 
 79 | CONSTRAINT StateConstraint
 80 | 
 81 | =============================================================================
 82 | 
 83 | TLC2 Version 2.16 of Day Month 20?? (rev: 5682c4a)
 84 | Running breadth-first search Model-Checking with fp 75 and seed 6362907857480250600 with 1 worker on 4 cores with 5291MB heap and 64MB offheap memory [pid: 245607] (Linux 5.4.0-74-generic amd64, Ubuntu 11.0.11 x86_64, MSBDiskFPSet, DiskStateQueue).
 85 | Parsing file /home/markus/src/TLA/ewd998/MCEWD998.tla
 86 | Parsing file /home/markus/src/TLA/ewd998/EWD998.tla
 87 | Parsing file /tmp/Integers.tla (jar:file:/opt/toolbox/tla2tools.jar!/tla2sany/StandardModules/Integers.tla)
 88 | Parsing file /tmp/Naturals.tla (jar:file:/opt/toolbox/tla2tools.jar!/tla2sany/StandardModules/Naturals.tla)
 89 | Parsing file /home/markus/src/TLA/ewd998/AsyncTerminationDetection.tla
 90 | Semantic processing of module Naturals
 91 | Semantic processing of module Integers
 92 | Semantic processing of module AsyncTerminationDetection
 93 | Semantic processing of module EWD998
 94 | Semantic processing of module MCEWD998
 95 | Starting... (2021-06-05 18:13:27)
 96 | Computing initial states...
 97 | Computed 2 initial states...
 98 | Computed 4 initial states...
 99 | Computed 8 initial states...
100 | Computed 16 initial states...
101 | Computed 32 initial states...
102 | Computed 64 initial states...
103 | Computed 128 initial states...
104 | Computed 256 initial states...
105 | Computed 512 initial states...
106 | Computed 1024 initial states...
107 | Computed 2048 initial states...
108 | Computed 4096 initial states...
109 | Computed 8192 initial states...
110 | Computed 16384 initial states...
111 | Computed 32768 initial states...
112 | Computed 65536 initial states...
113 | Computed 131072 initial states...
114 | Computed 262144 initial states...
115 | Computed 524288 initial states...
116 | Finished computing initial states: 696928 states generated, with 507184 of them distinct at 2021-06-05 18:14:47.
117 | Progress(2) at 2021-06-05 18:14:50: 850,004 states generated (850,004 s/min), 509,765 distinct states found (509,765 ds/min), 454,600 states left on queue.
118 | Model checking completed. No error has been found.
119 |   Estimates of the probability that TLC did not check all reachable states
120 |   because two distinct states had the same fingerprint:
121 |   calculated (optimistic):  val = 1.4E-7
122 |   based on the actual fingerprints:  val = 1.4E-10
123 | 4895579 states generated, 599598 distinct states found, 0 states left on queue.
124 | The depth of the complete state graph search is 36.
125 | The average outdegree of the complete state graph is 0 (minimum is 0, the maximum 7 and the 95th percentile is 1).
126 | Finished in 01min 54s at (2021-06-05 18:15:20)
127 | 


--------------------------------------------------------------------------------
/MCEWD998_actions.dot:
--------------------------------------------------------------------------------
  1 | digraph ActionGraph {
  2 | nodesep=0.35;
  3 | subgraph cluster_legend {
  4 | label = "Coverage";
  5 | node [shape=point] {
  6 | d0 [style = invis];
  7 | d1 [style = invis];
  8 | p0 [style = invis];
  9 | p1 [style = invis];
 10 | }
 11 | d0 -> d1 [label=unseen, color="green", style=dotted]
 12 | p0 -> p1 [label=seen]
 13 | }
 14 | subgraph cluster_2914 {
 15 | color="white"
 16 | label="[]"
 17 | 0 [label="InitiateProbe"]
 18 | }
 19 | subgraph cluster_577585152 {
 20 | color="white"
 21 | label="[i->1, i->1]"
 22 | 1 [label="PassToken"]
 23 | }
 24 | subgraph cluster_1169842819 {
 25 | color="white"
 26 | label="[i->2, n->2]"
 27 | 9 [label="SendMsg"]
 28 | 10 [label="RecvMsg"]
 29 | 11 [label="Deactivate"]
 30 | }
 31 | subgraph cluster_572967547 {
 32 | color="white"
 33 | label="[i->1, n->1]"
 34 | 6 [label="SendMsg"]
 35 | 7 [label="RecvMsg"]
 36 | 8 [label="Deactivate"]
 37 | }
 38 | subgraph cluster_1165225214 {
 39 | color="white"
 40 | label="[i->2, i->2]"
 41 | 2 [label="PassToken"]
 42 | }
 43 | subgraph cluster_1979189383 {
 44 | color="white"
 45 | label="[i->0, n->0]"
 46 | 3 [label="SendMsg"]
 47 | 4 [label="RecvMsg"]
 48 | 5 [label="Deactivate"]
 49 | }
 50 | 0 -> 0[penwidth=0.56];
 51 | 0 -> 1[color="green",style=dotted];
 52 | 0 -> 2[penwidth=0.62];
 53 | 0 -> 3[penwidth=0.53];
 54 | 0 -> 4[penwidth=0.5];
 55 | 0 -> 5[penwidth=0.54];
 56 | 0 -> 6[penwidth=0.52];
 57 | 0 -> 7[penwidth=0.5];
 58 | 0 -> 8[penwidth=0.53];
 59 | 0 -> 9[penwidth=0.52];
 60 | 0 -> 10[color="green",style=dotted];
 61 | 0 -> 11[penwidth=0.51];
 62 | 1 -> 0[penwidth=0.6];
 63 | 1 -> 1[color="green",style=dotted];
 64 | 1 -> 2[color="green",style=dotted];
 65 | 1 -> 3[penwidth=0.48];
 66 | 1 -> 4[penwidth=0.5];
 67 | 1 -> 5[penwidth=0.48];
 68 | 1 -> 6[color="green",style=dotted];
 69 | 1 -> 7[penwidth=0.49];
 70 | 1 -> 8[color="green",style=dotted];
 71 | 1 -> 9[color="green",style=dotted];
 72 | 1 -> 10[color="green",style=dotted];
 73 | 1 -> 11[color="green",style=dotted];
 74 | 2 -> 0[color="green",style=dotted];
 75 | 2 -> 1[penwidth=0.63];
 76 | 2 -> 2[color="green",style=dotted];
 77 | 2 -> 3[penwidth=0.48];
 78 | 2 -> 4[penwidth=0.51];
 79 | 2 -> 5[penwidth=0.51];
 80 | 2 -> 6[penwidth=0.48];
 81 | 2 -> 7[penwidth=0.5];
 82 | 2 -> 8[penwidth=0.5];
 83 | 2 -> 9[color="green",style=dotted];
 84 | 2 -> 10[color="green",style=dotted];
 85 | 2 -> 11[color="green",style=dotted];
 86 | 3 -> 0[penwidth=0.49];
 87 | 3 -> 1[penwidth=0.44];
 88 | 3 -> 2[penwidth=0.47];
 89 | 3 -> 3[penwidth=0.53];
 90 | 3 -> 4[penwidth=0.46];
 91 | 3 -> 5[penwidth=0.53];
 92 | 3 -> 6[penwidth=0.46];
 93 | 3 -> 7[penwidth=0.55];
 94 | 3 -> 8[penwidth=0.48];
 95 | 3 -> 9[penwidth=0.39];
 96 | 3 -> 10[color="green",style=dotted];
 97 | 3 -> 11[penwidth=0.41];
 98 | 4 -> 0[penwidth=0.49];
 99 | 4 -> 1[penwidth=0.49];
100 | 4 -> 2[penwidth=0.49];
101 | 4 -> 3[penwidth=0.55];
102 | 4 -> 4[penwidth=0.47];
103 | 4 -> 5[penwidth=0.56];
104 | 4 -> 6[penwidth=0.48];
105 | 4 -> 7[penwidth=0.47];
106 | 4 -> 8[penwidth=0.5];
107 | 4 -> 9[penwidth=0.4];
108 | 4 -> 10[color="green",style=dotted];
109 | 4 -> 11[penwidth=0.42];
110 | 5 -> 0[penwidth=0.54];
111 | 5 -> 1[penwidth=0.54];
112 | 5 -> 2[penwidth=0.53];
113 | 5 -> 3[color="green",style=dotted];
114 | 5 -> 4[penwidth=0.51];
115 | 5 -> 5[color="green",style=dotted];
116 | 5 -> 6[penwidth=0.5];
117 | 5 -> 7[penwidth=0.52];
118 | 5 -> 8[penwidth=0.51];
119 | 5 -> 9[penwidth=0.41];
120 | 5 -> 10[color="green",style=dotted];
121 | 5 -> 11[penwidth=0.43];
122 | 6 -> 0[penwidth=0.48];
123 | 6 -> 1[color="green",style=dotted];
124 | 6 -> 2[penwidth=0.45];
125 | 6 -> 3[penwidth=0.46];
126 | 6 -> 4[penwidth=0.53];
127 | 6 -> 5[penwidth=0.46];
128 | 6 -> 6[penwidth=0.5];
129 | 6 -> 7[penwidth=0.44];
130 | 6 -> 8[penwidth=0.55];
131 | 6 -> 9[penwidth=0.36];
132 | 6 -> 10[color="green",style=dotted];
133 | 6 -> 11[penwidth=0.36];
134 | 7 -> 0[penwidth=0.49];
135 | 7 -> 1[color="green",style=dotted];
136 | 7 -> 2[penwidth=0.49];
137 | 7 -> 3[penwidth=0.49];
138 | 7 -> 4[penwidth=0.47];
139 | 7 -> 5[penwidth=0.49];
140 | 7 -> 6[penwidth=0.55];
141 | 7 -> 7[penwidth=0.48];
142 | 7 -> 8[penwidth=0.56];
143 | 7 -> 9[penwidth=0.29];
144 | 7 -> 10[color="green",style=dotted];
145 | 7 -> 11[penwidth=0.33];
146 | 8 -> 0[penwidth=0.51];
147 | 8 -> 1[penwidth=0.56];
148 | 8 -> 2[penwidth=0.52];
149 | 8 -> 3[penwidth=0.49];
150 | 8 -> 4[penwidth=0.53];
151 | 8 -> 5[penwidth=0.51];
152 | 8 -> 6[color="green",style=dotted];
153 | 8 -> 7[penwidth=0.52];
154 | 8 -> 8[color="green",style=dotted];
155 | 8 -> 9[penwidth=0.41];
156 | 8 -> 10[color="green",style=dotted];
157 | 8 -> 11[penwidth=0.42];
158 | 9 -> 0[penwidth=0.43];
159 | 9 -> 1[color="green",style=dotted];
160 | 9 -> 2[color="green",style=dotted];
161 | 9 -> 3[penwidth=0.4];
162 | 9 -> 4[penwidth=0.47];
163 | 9 -> 5[penwidth=0.38];
164 | 9 -> 6[penwidth=0.35];
165 | 9 -> 7[penwidth=0.3];
166 | 9 -> 8[penwidth=0.38];
167 | 9 -> 9[penwidth=0.47];
168 | 9 -> 10[color="green",style=dotted];
169 | 9 -> 11[penwidth=0.48];
170 | 10 -> 0[color="green",style=dotted];
171 | 10 -> 1[color="green",style=dotted];
172 | 10 -> 2[color="green",style=dotted];
173 | 10 -> 3[color="green",style=dotted];
174 | 10 -> 4[color="green",style=dotted];
175 | 10 -> 5[color="green",style=dotted];
176 | 10 -> 6[color="green",style=dotted];
177 | 10 -> 7[color="green",style=dotted];
178 | 10 -> 8[color="green",style=dotted];
179 | 10 -> 9[color="green",style=dotted];
180 | 10 -> 10[color="green",style=dotted];
181 | 10 -> 11[color="green",style=dotted];
182 | 11 -> 0[penwidth=0.48];
183 | 11 -> 1[color="green",style=dotted];
184 | 11 -> 2[penwidth=0.5];
185 | 11 -> 3[penwidth=0.42];
186 | 11 -> 4[penwidth=0.46];
187 | 11 -> 5[penwidth=0.42];
188 | 11 -> 6[penwidth=0.39];
189 | 11 -> 7[penwidth=0.38];
190 | 11 -> 8[penwidth=0.41];
191 | 11 -> 9[color="green",style=dotted];
192 | 11 -> 10[color="green",style=dotted];
193 | 11 -> 11[color="green",style=dotted];
194 | }


--------------------------------------------------------------------------------
/EWD998.tla:
--------------------------------------------------------------------------------
  1 | It is time to pause and recap what we've done so far, both in terms of learning
  2 | TLA+ and modeling termination detection in a ring, a.k.a. EWD998.
  3 | 
  4 | Regarding the termination detection algorithm, checking the spec
  5 |  AsyncTerminationDetection   (with TLC and Apalache) confirms that the high-level
  6 | design of counting in-flight messages is a valid approach to detecting (global)
  7 | termination.  It might seem silly to write such a simple spec to confirm what is
  8 | easy to see is true.  On the other hand, writing a tiny spec is a small investment,
  9 | and "Writing is nature's way of letting you know how sloppy your thinking is"
 10 | (Guindon).  Later, we will see another reason why specifying
 11 |  AsyncTerminationDetection  paid off.
 12 | 
 13 | What comes next is to (re-)model  AsyncTerminationDetection  at a level of detail
 14 | that matches the EWD998 paper.  Here is a reformulated & reordered excerpt of the
 15 | eight rules that (informally) describe the algorithm:
 16 | 
 17 | 0) Sending a message by node  i  increments a counter at  i  , the receipt of a
 18 |    message decrements i's counter
 19 | 3) Receiving a *message* (not token) blackens the (receiver) node
 20 | 2) An active node j -owning the token- keeps the token.  When j becomes inactive,
 21 |    it passes the token to its neighbor with  q = q + counter[j] 
 22 | 4) A black node taints the token
 23 | 7) Passing the token whitens the sender node
 24 | 1) The initiator sends the token with a counter  q  initialized to  0  and color
 25 |    white
 26 | 5) The initiator starts a new round iff the current round is inconclusive
 27 | 6) The initiator whitens itself and the token when initiating a new round
 28 | 
 29 | 
 30 | Regarding learning TLA+, we've already covered lots of ground. Most importantly,
 31 | we encountered TLA with its temporal operators, behaviors, safety & liveness
 32 | properties, fairness, ...  Learning TLA+ is pretty much downhill from here on.
 33 | 
 34 | The remaining concepts this tutorial covers are:
 35 | - IF-THEN-ELSE
 36 | - Records
 37 | - Recursive functions & operators
 38 | - Refinement
 39 | - Tuples/Sequences
 40 | - CHOOSE operator (Hilbert's epsilon)
 41 | 
 42 | ------------------------------- MODULE EWD998 -------------------------------
 43 | EXTENDS Integers \* No longer Naturals \* TODO Do you already see why?
 44 | 
 45 | CONSTANT 
 46 |     \* @type: Int;
 47 |     N
 48 | 
 49 | ASSUME NIsPosNat == N \in Nat \ {0}
 50 | 
 51 | Node == 0 .. N-1
 52 | 
 53 | Color == {"white", "black"}
 54 | 
 55 | VARIABLES 
 56 |     \* @type: Int -> Bool;
 57 |     active,
 58 |     \* @type: Int -> Int;
 59 |     pending,
 60 |     color,
 61 |     counter,
 62 |     token
 63 | 
 64 | vars == <<active, pending, color, counter, token>>
 65 | 
 66 | TypeOK ==
 67 |     /\ active \in [Node -> BOOLEAN]
 68 |     /\ pending \in [Node -> Nat]
 69 |     /\ color \in [Node -> Color]
 70 |     /\ counter \in [Node -> Int]
 71 |     \* * TLA+ has records, which are functions whose domain is a set of strings. Since
 72 |      \* * records are functions, the syntax to create a record is that of a
 73 |      \* * function, except that the record key does not get quoted.
 74 |     \* * Finally, as with function sets we've seen earlier, it is easy
 75 |      \* * to define the set of records.  However, the syntax is not  ->  ,
 76 |      \* * but the  :  (colon),  [ a : {1,2,3} ]  .
 77 |     /\ token \in [ pos: Node, q: Int, color: Color ]
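    |     \* * A minimal sketch of the record syntax described above; the names  r
    |      \* * and  RecSet  are hypothetical and not part of this spec:
    |      \* *   r == [pos |-> 0, q |-> 0, color |-> "white"]   (a single record)
    |      \* *   r.pos = r["pos"]                               (field access is function application)
    |      \* *   [r EXCEPT !.q = @ + 1]                         (r  with  q  incremented by one)
    |      \* *   RecSet == [pos: {0,1}, color: Color]           (the set of all four such records)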
 78 | 
 79 | -----------------------------------------------------------------------------
 80 | 
 81 | Init ==
 82 |     /\ active \in [Node -> BOOLEAN]
 83 |     /\ pending = [i \in Node |-> 0]
 84 |     (* Rule 0 *)
 85 |     /\ color \in [Node -> Color]
 86 |     /\ counter = [i \in Node |-> 0]
 87 |     /\ pending = [i \in Node |-> 0]
 88 |     /\ token = [pos |-> 0, q |-> 0, color |-> "black"]
 89 | 
 90 | -----------------------------------------------------------------------------
 91 | 
 92 | InitiateProbe ==
 93 |     (* Rules 1 + 5 + 6 *)
 94 |     /\ token.pos = 0
 95 |     /\ \* previous round inconclusive:
 96 |         \/ token.color = "black"
 97 |         \/ color[0] = "black"
 98 |         \/ counter[0] + token.q > 0
 99 |     /\ token' = [ pos |-> N-1, q |-> 0, color |-> "white"]
100 |     /\ color' = [ color EXCEPT ![0] = "white" ]
101 |     /\ UNCHANGED <<active, counter, pending>>                            
102 | 
103 | PassToken(i) ==
104 |     (* Rules 2 + 4 + 7 *)
105 |     /\ ~ active[i]
106 |     /\ token.pos = i
107 |     \* Rule 2 + 4
108 |     \* Wow, TLA+ has IF-THEN-ELSE expressions.
109 |     /\ token' = [ token EXCEPT !.pos = @ - 1,
110 |                                !.q   = @ + counter[i],
111 |                                !.color = IF color[i] = "black" THEN "black" ELSE @ ]
112 |     \* Rule 7
113 |     /\ color' = [ color EXCEPT ![i] = "white" ]
114 |     /\ UNCHANGED <<active, pending, counter>>
115 | 
116 | System ==
117 |     \/ InitiateProbe
118 |     \/ \E i \in Node \ {0}: PassToken(i)
119 | 
120 | -----------------------------------------------------------------------------
121 | 
122 | SendMsg(i) ==
123 |     (* Rule 0 *)
124 |     /\ active[i]
125 |     /\ counter' = [counter EXCEPT ![i] = @ + 1]
126 |     \* TLA has a CHOOSE operator that picks a value satisfying some property:  
127 |      \*   CHOOSE x \in S: P(x)   
128 |      \* The choice is deterministic, meaning that CHOOSE always picks the same value.
129 |      \* If no value in  S  satisfies the property  P  , the value of the CHOOSE
130 |      \* expression is undefined.  It is *not* an error in TLA, although TLC will
131 |      \* complain. Likewise, TLC won't choose if  S  is unbounded/infinite.
132 |     \* CHOOSE  is almost always wrong when it appears in the behavior spec
133 |      \* (except for constant-level operators such as  Min(S)  or when choosing
134 |      \* what is called model-values).
135 |      \* In TLA+, non-deterministic choice is expressed with existential
136 |      \* quantification, like it was done in  Environment  and  System  .
137 |      \* However, using  CHOOSE  is a common mistake, which is why this topic is
138 |      \* covered in this tutorial.  CHOOSE  usually has the "advantage" of causing
139 |      \* less state-space explosion, but not in a good way.
140 |     /\ \E recv \in (Node \ {i}):
141 |             pending' = [pending EXCEPT ![recv] = @ + 1]
142 |     /\ UNCHANGED <<active, color, token>>
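    | \* To illustrate the CHOOSE pitfall described in  SendMsg  , here is a
    |  \* hypothetical (and deliberately wrong) variant of its receiver conjunct:
    |  \*
    |  \*   /\ pending' = [pending EXCEPT ![CHOOSE r \in Node \ {i}: TRUE] = @ + 1]
    |  \*
    |  \* Since  CHOOSE  always picks the same  r  for a given set, node  i  would
    |  \* send all of its messages to one fixed receiver.  The behaviors in which
    |  \*  i  sends to any other node would be missing from the spec, which is the
    |  \* "not in a good way" reduction of the state space.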
143 | 
144 | \* Wakeup(i) in AsyncTerminationDetection.
145 | RecvMsg(i) ==
146 |     /\ pending[i] > 0
147 |     /\ active' = [active EXCEPT ![i] = TRUE]
148 |     /\ pending' = [pending EXCEPT ![i] = @ - 1]
149 |     (* Rule 0 + 3 *)
150 |     /\ counter' = [counter EXCEPT ![i] = @ - 1]
151 |     /\ color' = [ color EXCEPT ![i] = "black" ]
152 |     /\ UNCHANGED <<token>>
153 | 
154 | \* Terminate(i) in AsyncTerminationDetection.
155 | Deactivate ==
156 |     \* Modeling variant: Let multiple (logical processes) nodes deactivate at
157 |      \* the same time/in the same step. This breaks the refinement ATD => STD.
158 |      \* (Pick a function from the set of functions s.t. the inactive nodes in
159 |      \* the current step remain inactive and the active nodes in the current
160 |      \* step non-deterministically deactivate.)
161 |     /\ active' \in { f \in [ Node -> BOOLEAN] : \A n \in Node: ~active[n] => ~f[n] }
162 |     \* To avoid generating behaviors that quickly stutter when simulating the spec.
163 |     /\ active' # active
164 |     /\ UNCHANGED <<pending, color, counter, token>>
165 | 
166 | Environment == 
167 |     \E n \in Node:
168 |         \/ SendMsg(n)
169 |         \/ RecvMsg(n)
170 |         \/ Deactivate
171 | 
172 | -----------------------------------------------------------------------------
173 | 
174 | Next ==
175 |   System \/ Environment
176 | 
177 | Spec == Init /\ [][Next]_vars /\ WF_vars(System)
178 | \* With the refinement below, TLC produces the following (liveness) violation:
179 |  \* Error: Temporal properties were violated.
180 |  \*
181 |  \* Error: The following behavior constitutes a counter-example:
182 |  \*
183 |  \* State 1: <Initial predicate>
184 |  \* /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
185 |  \* /\ counter = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
186 |  \* /\ token = [q |-> 0, color |-> "black", pos |-> 0]
187 |  \* /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
188 |  \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white")
189 |  \*
190 |  \* State 2: <InitiateProbe line 93, col 5 to line 100, col 45 of module EWD998>
191 |  \* /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
192 |  \* /\ counter = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
193 |  \* /\ token = [q |-> 0, color |-> "white", pos |-> 2]
194 |  \* /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
195 |  \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white")
196 |  \*
197 |  \* State 3: <PassToken line 104, col 5 to line 113, col 45 of module EWD998>
198 |  \* /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
199 |  \* /\ counter = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
200 |  \* /\ token = [q |-> 0, color |-> "white", pos |-> 1]
201 |  \* /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
202 |  \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white")
203 |  \*
204 |  \* State 4: Stuttering
205 | \* This counter-example makes us realize that we haven't defined a suitable
206 |  \* fairness property for  EWD998 .
207 | \* With  WF_vars(Next)  , TLC finds a counter-example where the  Initiator  
208 |  \* forever initiates new token rounds, but one node never receives a message
209 |  \* that was sent to it.
210 |  \*
211 |  \* State 1: <Initial predicate>
212 |  \* /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
213 |  \* /\ counter = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
214 |  \* /\ token = [q |-> 0, color |-> "black", pos |-> 0]
215 |  \* /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE)
216 |  \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white") 
217 |  \*
218 |  \* State 2: <SendMsg line 123, col 5 to line 132, col 41 of module EWD998>
219 |  \* /\ pending = (0 :> 0 @@ 1 :> 1 @@ 2 :> 0)
220 |  \* /\ counter = (0 :> 1 @@ 1 :> 0 @@ 2 :> 0)
221 |  \* /\ token = [q |-> 0, color |-> "black", pos |-> 0]
222 |  \* /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE)
223 |  \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white")
224 |  \*
225 |  \* State 3: <InitiateProbe line 93, col 5 to line 100, col 45 of module EWD998>
226 |  \* /\ pending = (0 :> 0 @@ 1 :> 1 @@ 2 :> 0)
227 |  \* /\ counter = (0 :> 1 @@ 1 :> 0 @@ 2 :> 0)
228 |  \* /\ token = [q |-> 0, color |-> "white", pos |-> 2]
229 |  \* /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE)
230 |  \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white")
231 |  \*
232 |  \* State 4: <PassToken line 104, col 5 to line 113, col 45 of module EWD998>
233 |  \* /\ pending = (0 :> 0 @@ 1 :> 1 @@ 2 :> 0)
234 |  \* /\ counter = (0 :> 1 @@ 1 :> 0 @@ 2 :> 0)
235 |  \* /\ token = [q |-> 0, color |-> "white", pos |-> 1]
236 |  \* /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE)
237 |  \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white")
238 |  \*
239 |  \* State 5: <PassToken line 104, col 5 to line 113, col 45 of module EWD998>
240 |  \* /\ pending = (0 :> 0 @@ 1 :> 1 @@ 2 :> 0)
241 |  \* /\ counter = (0 :> 1 @@ 1 :> 0 @@ 2 :> 0)
242 |  \* /\ token = [q |-> 0, color |-> "white", pos |-> 0]
243 |  \* /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE)
244 |  \* /\ color = (0 :> "white" @@ 1 :> "white" @@ 2 :> "white")
245 |  \*
246 |  \* Back to state 3: <InitiateProbe line 93, col 5 to line 100, col 45 of module EWD998>
247 |  \*
248 |  \* This hints at the fact that  EWD998  does not handle unreliable message
249 |  \* delivery.  However, what is really happening is that the  RecvMsg  never
250 |  \* occurs.  How can that be, since we defined (weak) fairness on the  Next  
251 |  \* action and its sub-action  RecvMsg  is permanently enabled?
252 |  \* Fairness does not distribute over the sub-actions of an action such as  Next  .
253 |  \* If this is what we want, we would have to conjoin multiple fairness 
254 |  \* conditions to  Spec  ; one for each sub-action.  This isn't really what we
255 |  \* want, though.  Fundamentally, the algorithm described in EWD998 detects
256 |  \* termination if and only if all nodes (eventually) terminate.  If the nodes
257 |  \* never terminate (which subsumes sending messages back and forth), there is
258 |  \* no termination to detect.  This suggests that we are only interested in
259 |  \* checking whether or not termination is detected for those behaviors where
260 |  \* all nodes eventually terminate.
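    |  \*
    |  \* As a sketch of what "one fairness condition per sub-action" could look
    |  \* like (the name  FairSpec  is hypothetical and not defined in this module):
    |  \*
    |  \*   FairSpec == /\ Init /\ [][Next]_vars
    |  \*               /\ WF_vars(System)
    |  \*               /\ \A n \in Node: WF_vars(RecvMsg(n))
    |  \*
    |  \* Under  FairSpec  , a message that stays pending at node  n  is eventually
    |  \* received, which would rule out the counter-example shown above.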
261 | 
262 | terminationDetected ==
263 |     /\ token.pos = 0
264 |     /\ token.color = "white"
265 |     /\ token.q + counter[0] = 0
266 |     /\ color[0] = "white"
267 |     /\ ~ active[0]
268 | 
269 | \* We haven't checked anything except the  TypeOK  invariant above, which does
270 |  \* not say anything about termination detection.  What we could do, is to
271 |  \* re-state and check the same theorems  Stable  and  Live  that we checked for
272 |  \*  AsyncTerminationDetection  -- copy&paste is acceptable with specs after all!
273 |  \* On the other hand, this is not exactly what we want to check; we don't want
274 |  \* to check that (an amended)  Stable  and  Live  hold for  EWD998.  What we
275 |  \* really care about is that the module  EWD998  *implements* the high-level
276 |  \* specification  AsyncTerminationDetection  (ATD).
277 |  \* With TLA, implementation is (logical) implication.  To state that some spec
278 |  \*  I  implements  a higher-level specification  A  is formally expressed as
279 |  \*  I => A  .  This is  equivalent to saying that the behaviors defined by  I
280 |  \* are a subset of the behaviors defined by  A  .  However, what if  I  declares
281 |  \* additional variables that don't exist in  A  ?  For spec  EWD998  , the
282 |  \* variables  color, token, pending  do not appear in  ATD  .  This is where
283 |  \* the sub-scripts we added to the various temporal formulas in  ATD  start to
284 |  \* make sense.  Recall that  [][A]_v  is equivalent to  [](A \/ v'=v)  .  This
285 |  \* formula is true of behaviors in which variables - not appearing in  [A]_v  -
286 |  \* change in any way they want, as long as the variables in  v  remain unchanged,
287 |  \* or an  A  step happens.  In fact,  [A]_v  does not say anything about variables
288 |  \* not appearing in it; the formula does not "care" about those variables.  For
289 |  \*  EWD998  and  ATD  ,  the module  ATD  allows the module  EWD998  to specify
290 |  \* anything "in line" with  ATD  .
291 |  \* Remember that an  A  step  above is just an action-level formula.  The
292 |  \* identifier  A  of its definition is just a syntactic element to make specs
293 |  \* more readable.  In other words, when we say  A  step above, we talk about
294 |  \* the formula (the right-hand side of  A == foo).  Thus, the  A  step of  ATD
295 |  \* can be a  B  step of  EWD998  provided that  B  is a step permitted by  A  .
296 | \* This theorem is syntactically incorrect, because we haven't added the module
297 |  \*  AsyncTerminationDetection  to the list of  EXTENDS  at the top of  EWD998.
298 |  \* If we were to add  ATD  to the  EXTENDS  , we would end up with various name
299 |  \* clashes.  Think of  EXTENDS  as inlining the extended modules.
300 |  \* What we need, though, is to "import"  ATD  under a new namespace.  In TLA, the
301 |  \* term is instantiation, syntactically expressed with  INSTANCE M  where  M  
302 |  \* is a module.  To instantiate module  M  into a namespace, we rely on the
303 |  \* (fundamental) concept of definitions again:  M == INSTANCE M  .
304 | \* The module  ATD  declares the variable  terminationDetected  that is absent
305 |  \* in  EWD998  .  In other words,  EWD998  does not define a value of the
306 |  \* variable  terminationDetected  in its behaviors.  We can define the value of
307 |  \*  terminationDetected  in  EWD998  by stating with what expression  
308 |  \*  terminationDetected  should be substituted that is equivalent to  
309 |  \*  ATD!terminationDetected  .  Syntactically, we append a
310 |  \*  WITH symbol <- substitution  to the INSTANCE statement.
311 | ATD == INSTANCE AsyncTerminationDetection
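    | \* The statement above relies on implicit substitution: every constant or
    |  \* variable of  ATD  without an explicit  WITH  clause is substituted by the
    |  \* symbol of the same name in  EWD998  .  Spelled out, and assuming that the
    |  \* parameters of  ATD  are  N, active, pending, terminationDetected  , it
    |  \* would read:
    |  \*
    |  \*   ATD == INSTANCE AsyncTerminationDetection
    |  \*              WITH N <- N, active <- active, pending <- pending,
    |  \*                   terminationDetected <- terminationDetected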
312 | 
313 | THEOREM Implements == Spec => ATD!Spec
314 | 
315 | \* The bang is not a valid token in a config file.
316 | ATDSpec == ATD!Spec
317 | 
318 | \* With the refinement done, it is sanity-check time again. As we have learned
319 |  \* with the state constraint earlier, a good check is to quickly generate a
320 |  \* small number of behaviors.  If some actions are not covered, we have to look
321 |  \* closer.
322 | \* Another useful sanity-check is to verify the spec for a single node, i.e., 
323 |  \*  N = 1  .  We want the algorithm to detect termination of a single node, no?
324 | \* Generating the graph with "full" statistics reveals the context in which the
325 |  \* action formulae are evaluated.  In other words, the graph includes the
326 |  \* parameters that were "passed" to the actions.
327 |  \* For the graph generated from EWD998, the  RecvMsg  action for the context
328 |  \*  [i->2]  , which corresponds to node #2 is not covered.  This means that the
329 |  \* sub-action  RecvMsg  was never enabled when the simulator generated the
330 |  \* behaviors, which can be the case iff  SendMsg   never incremented
331 |  \*  pending[2]  . This might just be exceptional luck, but maybe there is
332 |  \* something more subtle going on.  This is an excellent opportunity to meet
333 |  \* the TLA+ debugger (that has recently been added :-).
334 | 
335 | -----------------------------------------------------------------------------
336 | 
337 | HasToken ==
338 |     token.pos
339 | 
340 | \* Usually, one would find additional invariants and liveness properties at this
341 |  \* stage and check the spec for different spec parameters.  The second part can
342 |  \* easily be parallelized and scaled out (hello cloud computing!).
343 | \* If higher assurances are needed, now would be the start of proving  EWD998
344 |  \* correct, which requires finding an inductive invariant.  Finding an
345 |  \* inductive invariant is hard because one has to know *why* the algorithm
346 |  \* works (model-checking only confirms that algorithms work according to the
347 |  \* checked properties).
348 | \* Fortunately, the EWD998 paper gives an inductive invariant in the form of a
349 |  \* larger formula  P0 /\ (P1 \/ P2 \/ P3 \/ P4)  , with  S  representing
350 |  \* "the sum of",  B  to equal the sum of in-flight messages,  and  P0  to  P4 : 
351 |  \*
352 |  \* P0: B = (Si: 0 <= i < N: c.i)
353 |  \* P1: (Ai: t < i < N: machine nr.i is passive) /\
354 |  \*     (Si: t < i < N: c.i) = q
355 |  \* P2: (Si: 0 <= i <= t: c.i) + q > 0
356 |  \* P3: Ei: 0 <= i <= t : machine nr.i is black
357 |  \* P4: The token is black
358 | 
359 | \* TLA doesn't have for loops with which we could sum the elements of the 
360 |  \* variables  counter  and  pending  ; TLA+ is not an imperative programming
361 |  \* language.  Instead, TLA+ has recursive functions.  We could write a
362 |  \* function to sum the variable  counter  as:
363 |  \* 
364 |  \*  SumC == CHOOSE f : f = [ i \in 0..N-1 |-> IF i = 0 
365 |  \*                                            THEN counter[i]
366 |  \*                                            ELSE f[i-1] + counter[i]  ]
367 |  \* 
368 |  \* The sum of  counter  would then be  SumC[N-1]  .
369 |  \* TLC does not evaluate unbounded choose.  However, TLA+ has a syntactic
370 |  \* variant that TLC evaluates:
371 |  \* 
372 |  \*  SumC[ i \in 0..N-1 ] == IF i=0 THEN counter[i] ELSE SumC[i-1] + counter[i]
373 |  \*
374 |  \* To write a recursive function to sum the elements of a function given a
375 |  \* (subset) of its domain that is independent of  counter  , and, thus, also
376 |  \* works for  pending  , we need to see another TLA+ concept.  A let/in
377 |  \* expression allows us to use locally defined operators. A let/in is just a
378 |  \* syntactic concept, and the expression is equivalent to an expression
379 |  \* with all locally defined operators in-lined.
380 | \* @type: (Int -> Int, Int, Int) => Int;
381 | \* Sum(fun, from, to) ==
382 | \*     LET sum[ i \in from..to ] ==
383 | \*             IF i = from THEN fun[i]
384 | \*             ELSE sum[i-1] + fun[i]
385 | \*     IN sum[to]
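    | \* A worked example of the recursive function  sum  above, assuming a
    |  \* hypothetical function  f == [i \in 0..2 |-> i + 1]  , i.e.,
    |  \*  f[0] = 1, f[1] = 2, f[2] = 3  :
    |  \*
    |  \*   Sum(f, 0, 2) = sum[2]
    |  \*                = sum[1] + f[2]
    |  \*                = (sum[0] + f[1]) + f[2]
    |  \*                = (f[0] + f[1]) + f[2]
    |  \*                = (1 + 2) + 3 = 6
    |  \*
    |  \* Note that this definition requires  from <= to  : for an empty range
    |  \*  from..to  , the expression  sum[to]  applies  sum  outside of its domain.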
386 | 
387 | \* Alternatively, one can write recursive operators. What distinguishes a
388 |  \* recursive operator from an ordinary operator, is a  RECURSIVE  operator
389 |  \* declaration.
390 |  \* Compared to recursive functions, TLC usually evaluates recursive operators
391 |  \* faster.  However, that is not the case for Apalache.  PlusPy, a tool to
392 |  \* execute TLA+ specifications, doesn't support recursive operators at all.
393 | \* Commented because of https://git.io/JGAf7 and mandatory bounds for unrolling
394 | \* https://apalache.informal.systems/docs/apalache/principles.html#recursion
395 | \* RECURSIVE SumO(_,_,_)
396 | \* SumO(fun, from, to) ==
397 | \*     IF from = to 
398 | \*     THEN fun[to]
399 | \*     ELSE fun[from] + SumO(fun, from+1, to)
400 | 
401 | \* Lastly, we can re-use fold operators from the TLA+ CommunityModules at
402 |  \* https://github.com/tlaplus/CommunityModules that are especially well-known
403 |  \* among functional programmers.  This gives us a chance to show  LAMBDA  
404 |  \* in TLA+.
405 | \* Commented because of https://git.io/JGAf7 and lack of annotations in Utils.tla
406 | Sum(fun, from, to) ==
407 |     LET F == INSTANCE Functions
408 |     IN F!FoldFunctionOnSet(LAMBDA a,b: a+b, 0, fun, from..to)
409 | 
410 | B ==
411 |     \* This spec counts the in-flight messages in the variable  pending  .
412 |     Sum(pending, 0, N-1)
413 | 
414 | Inv == 
415 |     /\ P0:: B = Sum(counter, 0, N-1)
416 |     /\  \/ P1:: /\ \A i \in (token.pos+1)..N-1: ~ active[i]
417 |             /\ IF token.pos = N-1 
418 |                THEN token.q = 0 
419 |                ELSE token.q = Sum(counter, (token.pos+1), N-1)
420 |         \/ P2:: Sum(counter, 0, token.pos) + token.q > 0
421 |         \/ P3:: \E i \in 0..token.pos : color[i] = "black"
422 |         \/ P4:: token.color = "black"
423 | 
424 | \* We expect that  Inv  is an inductive invariant that we can eventually prove
425 |  \* correct with TLAPS.  However, "it is easier to prove something if it's true",
426 |  \* and, thus, we validate  IInv  for small values of  N  with model-checking.
427 |  \* For that, we conjoin  TypeOK  with  Inv  to  IInv  , and (logically) check
428 |  \* the formula with TLC:
429 |  \*
430 |  \*  IInv /\ [Next]_vars => IInv'
431 |  \*
432 | IInv ==
433 |     /\ TypeOK
434 |     /\ Inv
435 | 
436 | =============================================================================
437 | 


--------------------------------------------------------------------------------
/AsyncTerminationDetection.tla:
--------------------------------------------------------------------------------
  1 | ---------------------- MODULE AsyncTerminationDetection ---------------------
  2 | \* * TLA+ is an expressive language and we usually define operators on-the-fly.
  3 |  \* * That said, the TLA+ reference guide "Specifying Systems" (download from:
  4 |  \* * https://lamport.azurewebsites.net/tla/book.html) defines a handful of
  5 |  \* * standard modules.  Additionally, a community-driven repository has been
  6 |  \* * collecting more modules (http://modules.tlapl.us). In our spec, we are
  7 |  \* * going to need operators for natural numbers.
  8 | EXTENDS Naturals
  9 | 
 10 | \* * A constant is a parameter of a specification. In other words, it is a
 11 |  \* * "variable" that cannot change throughout a behavior, i.e., a sequence
 12 |  \* * of states. Below, we declare N to be a constant of this spec.
 13 |  \* * We don't know what value N has or even what its type is; TLA+ is untyped and
 14 |  \* * everything is a set. In fact, even 23 and "frob" are sets and 23="frob" is 
 15 |  \* * syntactically correct.  However, we don't know what elements are in the sets 
 16 |  \* * 23 and "frob" (nor do we care). The value of 23="frob" is undefined, and TLA+
 17 |  \* * users call this a "silly expression".
 18 | CONSTANT 
 19 |     \* @type: Int;
 20 |     N
 21 | 
 22 | \* * We should declare what we assume about the parameters of a spec--the constants.
 23 |  \* * In this spec, we assume constant N to be a (positive) natural number, by
 24 |  \* * stating that N is in the set of Nat (defined in Naturals.tla) without 0 (zero).
 25 |  \* * Note that the TLC model-checker, which we will meet later, checks assumptions
 26 |  \* * upon startup.
 27 | ASSUME NIsPosNat == N \in Nat \ {0}
 28 | 
 29 | \* * A definition Id == exp defines Id to be synonymous with an expression exp.
 30 |  \* * A definition just gives a name to an expression. The name isn't special.
 31 |  \* * It is best to write comments that explain what is being defined. To get
 32 |  \* * a feeling for how extensive comments tend to be, see the Paxos spec at
 33 |  \* * https://git.io/JZJaD .
 34 |  \* * Here, we define Node to be synonymous with the set of natural numbers
 35 |  \* * 0 to N-1.  Semantically, Node is going to represent the ring of nodes.
 36 |  \* * Note that the definition Node is a zero-arity (parameter-less) operator.
 37 | Node == 0 .. N-1
 38 | 
 39 | 
 40 | \* * Contrary to constants above, variables may change value in a behavior:
 41 |  \* * The value of active may be 23 in one state and "frob" in another.
 42 |  \* * For EWD998, active will maintain the activation status of our nodes,
 43 |  \* * while pending counts the in-flight messages from other nodes that a
 44 |  \* * node has yet to receive.
 45 | VARIABLES 
 46 |   \* @type: Int -> Bool;
 47 |   active,               \* activation status of nodes
 48 |   \* @type: Int -> Int;
 49 |   pending,              \* number of messages pending at a node
 50 |   \* * Up to now, this specification didn't teach us anything useful regarding
 51 |    \* * termination detection in a ring (we were mostly concerned with TLA+ itself).
 52 |    \* * Let's change this to find out if this proto-algorithm detects termination.
 53 |    \* * In an implementation, we could write to a log file whenever the system
 54 |    \* * terminates.  However, for larger systems it can be challenging to collect
 55 |    \* * e.g., a consistent snapshot.  In a spec, we can just use an (ordinary) variable
 56 |    \* * that -contrary to the other variables- doesn't define the state the system is
 57 |    \* * in, but records what the system has done so far.  The jargon for this variable
 58 |    \* * is "history variable".
 59 |    \* * For termination detection, the complete history of the computation, performed
 60 |    \* * by the system, is not relevant--we only care if the system detected
 61 |    \* * termination.
 62 |   \* @type: Bool;
 63 |   terminationDetected
 64 | 
 65 | \* * A definition that lets us refer to the spec's variables (more on it later).
 66 | vars == << active, pending, terminationDetected >>
 67 | 
 68 | terminated == \A n \in Node : ~ active[n] /\ pending[n] = 0
 69 | 
 70 | -----------------------------------------------------------------------------
 71 | 
 72 | \* * Initially, all nodes are active and no messages are pending.
 73 | Init ==
 74 |     \* * ...all nodes are active.
 75 |      \* * The TLA+ language construct below is a function. A function has a domain
 76 |      \* * and a co-domain/range. Lamport: ["In the absence of types, I don't know
 77 |      \* * what a partial function would be or why it would be useful."]
 78 |      \* * (http://discuss.tlapl.us/msg01536.html).
 79 |      \* * Here, we "map" each element in Node to the value TRUE (it is just
 80 |      \* * coincidence that the elements of Node are 0, 1, ..., N-1, which could
 81 |      \* * suggest that functions are just zero-indexed arrays found in programming
 82 |      \* * languages. As a matter of fact, the domain of a function can be any set,
 83 |      \* * even infinite ones: [n \in Nat |-> n]).
 84 |     \* * /\ is logical And (&& in programming). Conjunct lists usually make it easier
 85 |      \* * to read. However, indentation is significant!
 86 |     \* * So far, the initial predicate defined a single state.  That seems natural as
 87 |      \* * most programs usually start with all variables initialized to some fixed
 88 |      \* * value.  In a spec, we don't have to be this strict.  Instead, why not let
 89 |      \* * the system start from any (type-correct) state?
 90 |      \* * Besides syntax to define a specific function, TLA+ also has syntax to define
 91 |      \* * a set of functions mapping from some set S (the domain) to some other set T:
 92 |      \* *   [ S -> T ] or, more concretely:  [ {0,1,2,3} -> {TRUE, FALSE} ]
 93 |     /\ active \in [ Node -> BOOLEAN ]
 94 |     /\ pending \in [ Node -> Nat ]
 95 |     /\ terminationDetected \in {FALSE, terminated}
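    | 
    | \* * A minimal sketch (not part of the original spec): the set of functions
    |  \* *  [ {0,1} -> BOOLEAN ]  has exactly four elements, one for each way of
    |  \* * mapping 0 and 1 to a Boolean.  The name  ExampleFunctionSet  is made up
    |  \* * for illustration and is not used anywhere else.  Evaluating
    |  \* *  ExampleFunctionSet = [ {0,1} -> BOOLEAN ]  should yield TRUE.
    | ExampleFunctionSet ==
    |     { [n \in {0,1} |-> FALSE],
    |       [n \in {0,1} |-> n = 0],
    |       [n \in {0,1} |-> n = 1],
    |       [n \in {0,1} |-> TRUE] }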
 96 | 
 97 | \* * Recall that TLA+ is untyped and that we are "free" to write silly expressions.  So
 98 |  \* * why no types?  The reason is that, while real-world specs can be big enough for 
 99 |  \* * silly expressions to sneak in (still way smaller than programs), types would 
100 |  \* * unnecessarily slow us down when specifying (prototyping). Also, there is a way to
101 |  \* * catch silly expressions quickly.
102 | \* * It's finally time to state and check a first correctness property, namely that our
103 |  \* * spec is "properly typed".  We do this by writing an operator that evaluates to
104 |  \* * false, should values of variables not be as expected.  We can think of this as
105 |  \* * stating the types of variables in a special place, and not where they are declared
106 |  \* * or where values are assigned.  When TLC verifies the spec, it will evaluate the
107 |  \* * operator on every state it generates.  If the operator evaluates to false, an error
108 |  \* * is reported.  In other words, the operator is an invariant of the system.
109 |  \* * Invariants are (a class of) safety properties, and safety props are "informally"
110 |  \* * defined as "nothing bad ever happens" (a formal definition can be found in
111 |  \* * https://link.springer.com/article/10.1007/BF01782772, but we won't need it).
112 | TypeOK ==
113 |     /\ active \in [ Node -> BOOLEAN ]
114 |     /\ pending \in [ Node -> Nat ]
115 |     /\ terminationDetected \in BOOLEAN 
116 | 
117 | -----------------------------------------------------------------------------
118 | 
119 | \* * Each of the definitions below represents an atomic transition, i.e., defines
120 |  \* * the next state of the current behavior (a state is an assignment of
121 |  \* * values to variables). We call those definitions "actions". A next state is
122 |  \* * possible if the action is true for some combination of current and next
123 |  \* * values. Two or more actions do *not* happen simultaneously; if we want to
124 |  \* * e.g. model things to happen at two nodes at once, we are free to choose an
125 |  \* * appropriate level of granularity for those actions.
126 | 
127 | \* * Node i terminates.
128 | Terminate(i) ==
129 |     \* Any subset of *active* nodes can become inactive in the next step.
130 |     /\ active' \in { f \in [ Node -> BOOLEAN] : \A n \in Node: ~active[n] => ~f[n] }
131 |     \* * Also, the variable active is no longer unchanged.
132 |     /\ pending' = pending
133 |     \* * Possibly (but not necessarily) detect termination, iff all nodes are inactive
134 |      \* * and no messages are in-flight.
135 |     /\ terminationDetected' \in {terminationDetected, terminated'}
136 | 
137 | \* * Node i sends a message to node j.
138 | SendMsg(i, j) ==
139 |     /\ active[i]
140 |     /\ pending' = [pending EXCEPT ![j] = @ + 1]
141 |     /\ UNCHANGED << active, terminationDetected >>
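    | 
    | \* * A sketch (not part of the original spec) that spells out the  EXCEPT
    |  \* * expression of  SendMsg  as an explicit function definition:  the new
    |  \* * function agrees with  pending  everywhere except at  j  , where the count
    |  \* * is incremented.  The name  SendMsgExpanded  is hypothetical, and the
    |  \* * action is deliberately not a disjunct of  Next.
    | SendMsgExpanded(i, j) ==
    |     /\ active[i]
    |     /\ pending' = [n \in DOMAIN pending |-> IF n = j THEN pending[n] + 1 ELSE pending[n]]
    |     /\ UNCHANGED << active, terminationDetected >>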
142 | 
143 | \* * Node i receives a message.
144 | Wakeup(i) ==
145 |     /\ pending[i] > 0
146 |     /\ active' = [active EXCEPT ![i] = TRUE]
147 |     /\ pending' = [pending EXCEPT ![i] = @ - 1]
148 |     /\ UNCHANGED << terminationDetected >>
149 | 
150 | DetectTermination ==
151 |     /\ terminated
152 |     /\ ~terminationDetected
153 |     /\ terminationDetected' = TRUE
154 |     /\ UNCHANGED << active, pending >>
155 | 
156 | -----------------------------------------------------------------------------
157 | 
158 | \* * Here we define the complete next-state action. Recall that it’s a predicate
159 |  \* * on two states — the current and the next — which is true if the next state
160 |  \* * is acceptable.
161 |  \* * The next-state relation should somehow plug concrete values into the 
162 |  \* * (sub-) actions Terminate, SendMsg, and Wakeup.
163 | Next ==
164 |     \/ DetectTermination
165 |     \/ \E i,j \in Node:   
166 |         \/ Terminate(i)
167 |         \/ Wakeup(i)
168 |         \* ? Is it correct to let node i send a message to node j with i = j?
169 |         \/ SendMsg(i, j)
170 | 
171 | Stable ==
172 |     \* * With the addition of the auxiliary variable  terminationDetected  and
173 |      \* * the action  DetectTermination  , we can check that our (ultra) high-level
174 |      \* * design achieves termination detection.
175 |     \* * Holds iff  tD = FALSE  instead of  tD \in {FALSE, terminated}  in  Init/MCInit.
176 |      \* * If the definition of  MCInit  in MCAsyncTerminationDetection.tla is
177 |      \* * changed to  terminationDetected \in {FALSE, terminated}  ,  Stable  
178 |      \* * is violated by the initial state:
179 |      \* *    Error: Property Stable is violated by the initial state:
180 |      \* *    /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
181 |      \* *    /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
182 |      \* *    /\ terminationDetected = FALSE
183 |      \* * Why? Because  Stable  just asserts something about initial states.
184 |      \* * With  terminationDetected \in {FALSE, terminated}  , the state above
185 |      \* * becomes an initial state (see Specifying Systems p. 241 for more details).
186 |     \* * How do we say that we want  Stable  to hold for all states of a behavior,
187 |      \* * not just for initial states?  In other words, how do we state properties
188 |      \* * that are evaluated on behaviors; not just single states?
189 |      \* * We have arrived at the provenance of temporal logic.  There are many temporal
190 |      \* * logics, and TLA is but one of them (the missing "+" is not a typo!).
191 |      \* * Like with programming, different (temporal) logics make different tradeoffs.
192 |      \* * Compared to, e.g., Linear temporal logic (LTL), TLA has the two (fundamental)
193 |      \* * temporal operators, Always (denoted as [] and pronounced "box") and Eventually
194 |      \* * (<> pronounced "diamond"). In contrast, LTL has Next and Until, which means
195 |      \* * that one cannot say the same things with both logics.  TLA's operators
196 |      \* * guarantee that temporal formulae are stuttering invariant, which we will touch
197 |      \* * on later when we talk about refinement.
198 |      \* * For now, we just need the Always operator, to state  Stable.   []Stable asserts
199 |      \* * that  Stable  holds in all states of a behavior.  In other words, the formula
200 |      \* * Stable is always true.  Note that Box can also be pushed into the definition of
201 |      \* * Stable.
202 |     \* * The following behavior violates the (strengthened)  Stable:
203 |      \* *    State 1: <Initial predicate>
204 |      \* *    /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
205 |      \* *    /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> TRUE)
206 |      \* *    /\ terminationDetected = FALSE
207 |      \* *    State 2: <Terminate line 122, col 5 to line 131, col 66 of module AsyncTerminationDetection>
208 |      \* *    /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
209 |      \* *    /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
210 |      \* *    /\ terminationDetected = FALSE
211 |      \* *    State 3: <DetectTermination line 147, col 5 to line 149, col 38 of module AsyncTerminationDetection>
212 |      \* *    /\ pending = (0 :> 0 @@ 1 :> 0 @@ 2 :> 0)
213 |      \* *    /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
214 |      \* *    /\ terminationDetected = TRUE
215 |      \* *    State 4: Stuttering
216 |      \* * Have we already found a flaw in our design and are forced back to the
217 |      \* * whiteboard?  No, you (intentionally) got hold of the wrong end of the stick.
218 |      \* * It is not that  terminated  implies  terminationDetected  , but the other
219 |      \* * way around.
220 |     \* * Phew, we have a high-level design (and you learned a lot about TLA+). Let's
221 |      \* * move to the next level.  Except, one should always be suspicious of success...
222 |     [](terminationDetected => []terminated)
223 | 
224 | -----------------------------------------------------------------------------
225 | 
226 | \* * It is usually a good idea to check a couple of non-properties, i.e., properties that
227 |  \* * we expect to be violated.  We will use the behavior that violates the non-property
228 |  \* * as a sanity check.
229 |  \* * So far, our spec has  TypeOK  that asserts the "types" of the variables and  Stable
230 |  \* * that asserts that   terminationDetected  can only be true if  terminated  is true.
231 |  \* * In TLA, we can also assert that (sub-)actions occur in a behavior; after all, it's
232 |  \* * the Temporal Logic of *Actions*.  :-)  A formula,  [A]_v  with  A  an action holds
233 |  \* * for a behavior if every step (pair of states) is an  [A]_v  step.  For the moment,
234 |  \* * we will ignore the subscript  _v  and simply write  _vars instead of it:  [A]_vars.
235 |  \* *
236 | ActuallyNext ==
237 |     [][DetectTermination \/ \E i,j \in Node: (Terminate(i) \/ Wakeup(i) \/ SendMsg(i,j))]_vars
238 |     \* * In hindsight, it was to be expected that the trace just has two states
239 |      \* * i.e., a single step.  The property  OnlyTerminating  is violated by
240 |      \* * behaviors that take our actions:
241 |      \* *    Error: Action property OnlyTerminating is violated.
242 |      \* *    Error: The behavior up to this point is:
243 |      \* *    State 1: <Initial predicate>
244 |      \* *    /\ pending = (0 :> 1 @@ 1 :> 1 @@ 2 :> 1)
245 |      \* *    /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
246 |      \* *    /\ terminationDetected = FALSE
247 |      \* *    
248 |      \* *    State 2: <Wakeup line 141, col 5 to line 144, col 42 of module AsyncTerminationDetection>
249 |      \* *    /\ pending = (0 :> 0 @@ 1 :> 1 @@ 2 :> 1)
250 |      \* *    /\ active = (0 :> TRUE @@ 1 :> FALSE @@ 2 :> FALSE)
251 |      \* *    /\ terminationDetected = FALSE
252 |     \* * Let's now focus on the subscript  _v  part that we glossed over previously.
253 |      \* * The subscript  _v  in  [A]_v  is a state-function, a formula without action- or
254 |      \* * temporal-level operators, that -informally- defines what happens with the
255 |      \* * variables. 
256 |      \* * We replaced  _v  with  _vars  where  vars  equals the definition on line 66
257 |      \* *  << active, pending, terminationDetected >>  .  Note that  << >>  is just syntactic
258 |      \* * sugar to conveniently state  1-indexed arrays.  However, they are called 
259 |      \* * sequences in TLA+, and many useful sequence-related operators are defined in the
260 |      \* * Sequences.tla standard module.  More importantly, a sequence has an order!
261 |      \* * Time to pull out the TLA+ cheat sheet and check page 4:
262 |      \* *  https://www.hpl.hp.com/techreports/Compaq-DEC/SRC-TN-1997-006A.pdf
263 |     \* * The formula  [A]_v  is equivalent to  A \/ (v' = v)  .  Semantically, every
264 |      \* * step of the behavior is an  A  step, or the variables in  v  remain unchanged.
265 |     \* * If you look closely, you will realize that the disjunct of actions nested in
266 |      \* *  OnlyTerminating  is equivalent to the  Next  operator above!  Up to now,
267 |      \* * we've been using a TLC feature that lets us pass  INIT  and  NEXT  in TLC's
268 |      \* * configuration file.  In TLA, the system specification that defines the set
269 |      \* * of valid system behaviors is actually given as a temporal formula.
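    | 
    | \* * A sketch (not part of the original spec): the action that  [Next]_vars
    |  \* * abbreviates, written out.  UNCHANGED vars  is the same as  vars' = vars  .
    |  \* * The name  NextOrStutter  is made up for illustration and is not used
    |  \* * anywhere else.
    | NextOrStutter == Next \/ UNCHANGED vars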
270 | 
271 | F ==
272 |     \* * With this liveness property  F  , all (other) properties hold. :-)  However,
273 |      \* * it looks funny that we check  Live1  and  Live2  when both are also part of  Spec.
274 |      \* * At the level of termination detection with EWD998,  terminated  might never be
275 |      \* * true because nodes may never terminate.
276 |      \* * Additionally, there is a second problem with  F  that is even independent of
277 |      \* * EWD998: A scheduler would have to look into the future to see if the
278 |      \* * scheduling choice it is making at some point, leads to an unrecoverable state
279 |      \* * later from where the stipulated "good thing" can no longer happen.  This is
280 |      \* * elsewhere informally called "paint itself in the corner", or -formally- is the
281 |      \* * topic of machine-closed specifications.
282 |     \* * We want  F  to not add additional safety properties on top of  Spec  .  We won't
283 |      \* * discuss the whys here, but if we restrict ourselves to only stipulate that
284 |      \* * enabled sub-actions of the next-state relation  Next  eventually happen, we can
285 |      \* * be sure that we don't paint the scheduler in the corner.  To rule out the
286 |      \* * behavior shown by TLC as a violation of  Live1  , we have to require that a
287 |      \* * Next  step eventually happens (if it is "possible"). We need to put a number of
288 |      \* * previously seen concepts together now:
289 |      \* * - =>  (implication)
290 |      \* * - ENABLED
291 |      \* * - <<A>>_v
292 |      \* * - Combining  []  and  <>  to  []<>  and  <>[]
293 |      \* * "If  A  is enabled forever,  infinitely many  A  steps will eventually occur."
294 |      \* *   <>[](ENABLED <<Next>>_vars) => []<><<Next>>_vars
295 |      \* * This can be written more compactly as  WF_vars(Next)  , but TLC still shows
296 |      \* * a lasso-shaped counter-example:
297 |      \* *   
298 |      \* *   Error: Temporal properties were violated.
299 |      \* *   
300 |      \* *   Error: The following behavior constitutes a counter-example:
301 |      \* *   
302 |      \* *   State 1: <Initial predicate>
303 |      \* *   /\ pending = (0 :> 1 @@ 1 :> 1 @@ 2 :> 1)
304 |      \* *   /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
305 |      \* *   /\ terminationDetected = FALSE
306 |      \* *   
307 |      \* *   State 2: <Wakeup line 141, col 5 to line 144, col 42 of module AsyncTerminationDetection>
308 |      \* *   /\ pending = (0 :> 1 @@ 1 :> 1 @@ 2 :> 0)
309 |      \* *   /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> TRUE)
310 |      \* *   /\ terminationDetected = FALSE
311 |      \* *   
312 |      \* *   State 3: <SendMsg line 135, col 5 to line 137, col 50 of module AsyncTerminationDetection>
313 |      \* *   /\ pending = (0 :> 1 @@ 1 :> 1 @@ 2 :> 1)
314 |      \* *   /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> TRUE)
315 |      \* *   /\ terminationDetected = FALSE
316 |      \* *   
317 |      \* *   Back to state 1: <Terminate line 122, col 5 to line 131, col 66 of module AsyncTerminationDetection>
318 |     WF_vars(DetectTermination)
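    | 
    | \* * A sketch (not part of the original spec):  WF_vars(DetectTermination)
    |  \* * written out with the operators listed above, following the definition of
    |  \* * weak fairness.  The name  FExpanded  is hypothetical; it is meant to be
    |  \* * read next to  F  , not to replace it.
    | FExpanded ==
    |     <>[](ENABLED <<DetectTermination>>_vars) => []<><<DetectTermination>>_vars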
319 | 
320 | \* * We’ll now define a formula that encompasses our specification of how the system
321 |  \* * behaves. It combines the Initial state predicate, the next-state action, and
322 |  \* * something called a fairness property that we will learn about later.
323 |  \* * It is convention to name the behavior spec  Spec  .
324 | Spec ==
325 |     \* *  F  has been inlined because of    https://github.com/informalsystems/apalache/issues/468#issuecomment-853259723
326 |     \* Wow, liveness (fairness) is subtle.  However, this is not because TLA is poorly
327 |      \* equipped to handle liveness.  "[Instead,] the problem lies in the nature
328 |      \* of liveness, not in its definition" (Lamport).
329 |      \* "Narrowing" fairness from  Next  to   DetectTermination  makes sure that
330 |      \* a  DetectTermination  eventually happens instead of repeated token rounds.
331 |     \* TODO Convince yourself that  AsyncTerminationDetection  is still correct
332 |      \* TODO and  EWD998  passes, i.e., rerun TLC.
333 |     Init /\ [][Next]_vars /\ WF_vars(DetectTermination) (*  F  *)
334 | 
335 | Terminates ==
336 |     \* * The behavior spec  Spec  asserts that every step/transition is a  Next  step, or
337 |      \* * the variables do not change.  But is it actually true that the system can always
338 |      \* * and forever take a  Next  step?  Semantically, we are specifying termination
339 |      \* * detection.  Does the algorithm for termination detection itself terminate or can
340 |      \* * it execute forever?
341 |     \* * TLA defines an  ENABLED  operator with which we can state predicates such as
342 |      \* *  ENABLED A  .  This predicate is true iff action A is enabled, i.e., there exists
343 |      \* * a state  t  such that the transition  s -> t  is an A step.
344 |     []ENABLED [Next]_vars
345 |     
346 | 
347 | \* * In  Terminates  , we asserted that it is always "possible" to take a  Next  step, or that
348 |  \* * it is possible for all variables to remain unchanged:  Next \/ vars' = vars  .  This is
349 |  \* * a tautology in  TLA  and we effectively checked  that  Spec => TRUE  .  A related mistake
350 |  \* * is when the antecedent is  FALSE  :  FALSE => TRUE  (Try conjoining 1 = 2 to  Spec )
351 |  \* * Remember:  [](Be suspicious of success).
352 |  \* *
353 |  \* * Sometimes, we wish to assert that all or some steps are an  A  step (for an action A),
354 |  \* * and some variables change. In other words, we wish to assert  A /\ vars' # vars  (which
355 |  \* * is equivalent to   ~(~A \/ vars' = vars)  ).  TLA has dedicated syntax for this, which
356 |  \* * is  <<A>>_v   where  v  is usually  vars  but can be any state function.
357 | AngleNextSubVars ==
358 |     []ENABLED <<Next>>_vars
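    | 
    | \* * A sketch (not part of the original spec): the action that  <<Next>>_vars
    |  \* * abbreviates, i.e., a  Next  step that changes at least one variable.  The
    |  \* * name  NextAndChange  is made up for illustration.
    | NextAndChange == Next /\ vars' # vars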
359 | 
360 | -----------------------------------------------------------------------------
361 | 
362 | Live ==
363 |     \* * Up to now, we have been stating safety properties, i.e., "nothing bad ever happens".
364 |      \* * Looking at the counter-examples we've encountered so far, we find that a violation of a safety
365 |      \* * property is a finite prefix of an (infinite) behavior where the final state or action
366 |      \* * (transition) violates the property.  We primarily care about safety when we check
367 |      \* * systems.  For example, when we (used to) board a plane, we very much care that the
368 |      \* * plane never crashes!  However, if the pilots decide not to take off, the plane is
369 |      \* * guaranteed not to crash.  So we sit on the plane forever, waiting for it to depart.
370 |      \* * Clearly, as travelers, we eventually wish to arrive at our destination, e.g., to
371 |      \* * attend a meeting next Tuesday.  Can we formulate this as a safety property?  Easy,
372 |      \* * if we assume a (global) clock that determines when it is Tuesday.  Specifying
373 |      \* * algorithms or systems, we know how to replicate clocks. However, an algorithm that
374 |      \* * requires something to happen in a fixed amount of (some notion of) time is brittle.
375 |      \* * For example, an algorithm that counts hardware instructions will likely only work
376 |      \* * on a particular hardware architecture. For EWD998, we could assert that termination
377 |      \* * is detected within N rounds after termination occurred, but do we know the value of
378 |      \* * N?  And even with an N, we would need another property to assert that each round
379 |      \* * terminates...
380 |      \* * A way out is to formulate the property such that we assert that "something good 
381 |      \* * eventually happens"--the plane eventually arrives at its destination; the algorithm
382 |      \* * eventually produces a result, termination is eventually detected.
383 |      \* *
384 |      \* * Requiring something good to eventually happen is a liveness property. Unfortunately,
385 |      \* * in practice, it is not very useful to know that the algorithm eventually produces a
386 |      \* * result if it takes 5 billion years to do so.
387 |      \* *
388 |      \* * A violation of a liveness property is -contrary to a safety property- an infinite
389 |      \* * behavior where the "good thing" never happens.  When printed, tools such as TLC show
390 |      \* * a lasso where the property doesn't hold in the lasso loop.
391 |      \* *
392 |      \* * In TLA, we syntactically express a property that asserts that something good
393 |      \* * eventually happens, with the diamond operator  <>  (which is just the dual of the box
394 |      \* * operator:  <>P <=> ~[]~P  ).
395 |     \* * 
396 |      \* *   Error: Temporal properties were violated.
397 |      \* *   Error: The following behavior constitutes a counter-example:
398 |      \* *   State 1: <Initial predicate>
399 |      \* *   /\ pending = (0 :> 1 @@ 1 :> 1 @@ 2 :> 1)
400 |      \* *   /\ active = (0 :> FALSE @@ 1 :> FALSE @@ 2 :> FALSE)
401 |      \* *   /\ terminationDetected = FALSE
402 |      \* *   State 2: Stuttering
403 |     \* * Studying the counter-example below  F  will eventually make us realize that  Live1
404 |      \* * and  Live2  are non-properties of the system.  Instead, the liveness property we
405 |      \* * really care about is that when all nodes terminate, the termination detection
406 |      \* * algorithm eventually detects termination.  It might take a number of rounds for the
407 |      \* * algorithm to detect the termination.
408 |     \* * In TLA, we can write  [](terminated => <>terminationDetected)  more compactly with
409 |      \* * the leads-to operator:
410 |     terminated ~> terminationDetected
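    | 
    | \* * A sketch (not part of the original spec):  Live  without the leads-to
    |  \* * operator.  By the definition of  ~>  , this formula is equivalent to
    |  \* *  terminated ~> terminationDetected  .  The name  LiveExpanded  is
    |  \* * hypothetical.
    | LiveExpanded == [](terminated => <>terminationDetected)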
411 | 
412 | \* * Lastly, we state for readers which properties are theorems of the system.  This is yet
413 |  \* * another place where implication shows up.  This is nothing other than stating that the
414 |  \* * behaviors defined by  Spec  are a subset of the behaviors defined  by  Stable, and
415 |  \* *  Live  .
416 | THEOREM Spec => Stable
417 | 
418 | THEOREM Spec => Live
419 | 
420 |     \* * For both properties  Live1  and  Live2  ,  TLC reports counter-examples that end in
421 |      \* * stuttering.  This is strange!  Clearly, the counter-example for  Live1  could be
422 |      \* * extended by, e.g., a  Wakeup  action that "consumes" one of the pending messages.
423 |      \* * Similarly, the counter-example for  Live2  could be extended by a
424 |      \* *  DetectTermination action.
425 |      \* * We have to look at  Spec  again to see what is happening.  The (temporal) formula
426 |      \* *  Spec  defines a set of behaviors, and this set includes the counter-examples
427 |      \* * reported for  Live1  and  Live2  .  Why?  Because  Spec  does not state a good
428 |      \* * thing that (eventually) has to happen.  In its current form,  Spec  only defines
429 |      \* * what must never happen (  Spec  itself is a safety property!).  However, since we
430 |      \* * ask TLC to check if something good eventually happens, it finds those behaviors
431 |      \* * permitted by  Spec, where nothing good ever happens.
432 |      \* * We have to amend  Spec  such that it, in addition to the safety part, also defines
433 |      \* * the liveness property we want the system to satisfy.  Mathematically, this means we have
434 |      \* * to conjoin  Spec  with some suitable liveness property  F:  Spec /\ F
435 |     \* * Naively, we might choose for  F  the (liveness) property
436 |      \* *  <>terminated  /\ <>terminationDetected.
437 | =============================================================================
438 | \* Modification History
439 | \* Created Sun Jan 10 15:19:20 CET 2021 by Stephan Merz @muenchnerkindl


--------------------------------------------------------------------------------