├── LICENSE
├── README.md
├── ownership_protocol
│   ├── ZeusOwnership.pdf
│   ├── ZeusOwnership.tla
│   ├── ZeusOwnershipFaults.pdf
│   ├── ZeusOwnershipFaults.tla
│   ├── ZeusOwnershipMeta.pdf
│   └── ZeusOwnershipMeta.tla
├── reliable_commit_protocol
│   ├── ZeusReliableCommit.pdf
│   └── ZeusReliableCommit.tla
└── zeus.png
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Zeus Ownership and Transactional Protocols
2 |
3 |
4 |
5 | *Zeus* is a datastore that offers fast locality-aware distributed transactions with strong consistency and availability. A brief description follows; more details can be found in the [EuroSys'21](https://2021.eurosys.org/) paper.
6 |
7 | This is the publicly available artifact repository supporting *Zeus*. It contains the specifications of the two protocols that enable Zeus's locality-aware reliable transactions: the *ownership protocol* and the *reliable commit protocol*. The specifications are written in TLA+ and can be used to verify Zeus's correctness via model checking.
8 |
9 | ## Inspired by
10 | Zeus's protocols build on ideas from [Hermes](https://hermes-protocol.com/) and draw inspiration from cache coherence and hardware transactional memory, exapting their ideas to a replicated distributed setting for availability. Inspired concepts include the invalidation-based design of both proposed protocols and Zeus's approach of moving objects and ensuring exclusive write access (*ownership*) at the coordinator of a write transaction.
11 |
12 | ## Citation
13 | ```
14 | @inproceedings{Katsarakis:21,
15 | author = {Katsarakis, Antonios and Ma, Yijun and Tan, Zhaowei and Bainbridge, Andrew and Balkwill, Matthew and Dragojevic, Aleksandar and Grot, Boris and Radunovic, Bozidar and Zhang, Yongguang},
16 | title = {Zeus: Locality-Aware Distributed Transactions},
17 | year = {2021},
18 | publisher = {Association for Computing Machinery},
19 | address = {New York, NY, USA},
20 | booktitle = {Proceedings of the Sixteenth European Conference on Computer Systems},
21 | location = {Online Event, United Kingdom},
22 | series = {EuroSys '21}
23 | }
24 | ```
25 | ----
26 | # Locality-aware reliable transactions
27 | Transactions in Zeus involve three main phases:
28 | - __Prepare & Execute__: Execute the transaction locally;
29 | If *locality is not captured* (i.e., the transaction accesses an object that is not local to the executor -- or lacks exclusive write access for a write transaction),
30 | the object (and/or permissions) are acquired via the __ownership protocol__
31 | - *Exclusive owner* guarantee: at any time, at most one node (the *owner*) has exclusive write access to an object
32 | - *Fast/slow-path* design: acquires ownership (and data) in at most 1.5 RTT in the absence of faults, regardless of the requesting node
33 | - *Fault-tolerant*: each ownership protocol step is idempotent to recover from faults
34 | - __Local Commit__: *Any* traditional single node (unreliable -- i.e., non-replicated) commit
35 | - __Reliable Commit__: Replicate updates to sharers for data availability:
36 | - *Fast Commit*: 1 RTT commits that are also pipelined to hide latency
37 | - *Read-only optimized transactions*: strictly serializable and local from any replica
38 | - *Fault-tolerant*: each reliable commit step is idempotent to recover from faults
39 |
40 | ## Properties and Invariants
41 | __Faults__: The specification and model checking assume that crash-stop node faults and message reorderings may occur.
42 | Message losses in Zeus are handled via retransmissions. The exact failure model can be found in the paper.
43 | __Strong Consistency__: Zeus transactions guarantee the strongest consistency (i.e., they are strictly serializable).
44 | __Invariants__: A list of model-checked invariants provided by the protocols follows:
45 | * Amongst concurrent ownership requests to the same object, at most one succeeds.
46 | * At any time, there is at most one valid owner of an object.
47 | * A valid owner of an object has the most up-to-date data and version among live replicas.
48 | * All valid sharer vectors (stored by directory nodes and the owner) of an object agree on the object's sharers and ownership timestamp (o_ts).
49 | * The owner and readers are always correctly reflected by all valid sharer vectors.
50 | * A replica found in the valid state stores the latest committed value of an object.
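
As a concrete illustration, the "at most one valid owner" invariant above corresponds to the `AT_MOST_ONE_OWNER` property checked by the ownership specification. A hedged TLA+ sketch (the names `APP_LIVE_NODES` and `is_valid_owner` follow *ZeusOwnership.tla*; the body here is illustrative, not the exact definition from *ZeusOwnershipMeta.tla*):

```tla
\* Sketch: no two live application nodes may simultaneously
\* be valid owners of the modeled object.
AT_MOST_ONE_OWNER ==
    \A n1, n2 \in APP_LIVE_NODES :
        (is_valid_owner(n1) /\ is_valid_owner(n2)) => n1 = n2
```

TLC checks such a formula as an invariant, i.e., it verifies that the formula holds in every reachable state.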
51 |
52 | ----
53 |
54 | ## Model checking
55 | To model check the protocols, download and install the TLA+ Toolbox and run the *TLC* model checker on either the reliable commit or the ownership *TLA+* specification. We next list the steps to model check Zeus's *reliable commit protocol* (model checking the ownership protocol is similar).
56 | * __Prerequisites__: Any OS with Java 1.8 or later, to accommodate the *TLA+* Toolbox.
57 | * __Download and install__ the [TLA+ Toolbox](https://lamport.azurewebsites.net/tla/toolbox.html).
58 | * __Launch__ the TLA+ Toolbox.
59 | * __Create a spec__: *File>Open Spec>Add New Spec...*; Browse and use *zeus/reliable_commit_protocol/ZeusReliableCommit.tla* as root module to finish.
60 | * __Create a new Model__: Navigate to *TLC Model Checker>New model...*; and create a model with the name "reliable-commit-model".
61 | __Setup Behavior__: In the *Model Overview* tab of the model, under the *"What is the behavior spec?"* section, select *"Temporal formula"* and write *"Spec"*.
62 | __Setup Constants__: Then specify values for the declared constants (under the *"What is the model?"* section). Use small values to check correctness without exploding the state space; an example configuration is three nodes and a maximum version of two or three. To do so, click on each constant and select the "ordinary assignment" option. Then fill the box for version and epoch constants (e.g., *R_MAX_VERSION*) with a small number (e.g., *"2"* or *"3"*), and any node-related constants (e.g., *R_NODES*) with a set of nodes (e.g., *"{1,2,3}"* for three nodes).
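
Under the hood, the Toolbox persists these choices in a TLC configuration file. A minimal hand-written sketch of such a config for the reliable commit model, assuming only the constant names mentioned above (the file name and the omission of an explicit `INVARIANT` list are illustrative):

```
\* ZeusReliableCommit.cfg (illustrative)
SPECIFICATION Spec          \* the temporal formula to check
CONSTANTS
    R_NODES = {1, 2, 3}     \* three nodes
    R_MAX_VERSION = 2       \* small version bound to limit the state space
```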
63 |
64 | ### File Structure
65 | * __The reliable commit specification__ is a single TLA+ module in *zeus/reliable_commit_protocol* folder.
66 | * __The ownership specification__ is decoupled into three modules under the *zeus/ownership_protocol* folder for simplicity. *ZeusOwnership.tla* and *ZeusOwnershipMeta.tla* specify (and can be used to model check) the ownership protocol in the absence of faults. The specification with failures is built on top of those in the module *ZeusOwnershipFaults.tla*.
67 |
68 | #### Caveats
69 | * The reliable commit specification does not include the pipelining optimization yet, and the ownership specification focuses on the slow-path for now -- which is mandatory to model check faults.
70 | * Apart from acquiring ownership, the ownership protocol can be utilized to handle other dynamic sharding actions (e.g., remove or add a reader replica) which were omitted from the paper. We may describe those in a separate online document if there is interest.
71 |
72 | ----
73 | ### License
74 | This work is freely distributed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0 "Apache 2.0").
75 |
76 | ### Contact
77 | Antonios Katsarakis: `antonis.io` | [`antoniskatsarakis@yahoo.com`](mailto:antoniskatsarakis@yahoo.com?subject=[GitHub]%20Zeus%20Specification "Email")
78 |
79 |
--------------------------------------------------------------------------------
/ownership_protocol/ZeusOwnership.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ease-lab/Zeus/1dc21c5365ec44553e0aeba1a667204bdb5a3a12/ownership_protocol/ZeusOwnership.pdf
--------------------------------------------------------------------------------
/ownership_protocol/ZeusOwnership.tla:
--------------------------------------------------------------------------------
1 | --------------------------- MODULE ZeusOwnership ---------------------------
2 | EXTENDS ZeusOwnershipMeta
3 | \* This module specifies the full slow path of the Zeus ownership protocol,
4 | \* as it appears in the corresponding EuroSys'21 paper, without faults.
5 | \* It model checks its properties in the face of concurrent conflicting
6 | \* requests of changing ownerships and emulated reliable commits.
7 |
8 | \* Faults are added on top with the ZeusOwnershipFaults.tla spec
9 |
10 | -------------------------------------------------------------------------------------
11 | \* WARNING: We need to make sure that requester REQs are executed at most once; this requires
12 | \* an APP node to be sticky to its LB driver for a key (unless there is a failure) and to send REQ msgs via a
13 | \* FIFO REQ channel so that the driver does not re-issue REQs that have already been completed in the past!
14 |
15 | \* We emulate executing only once via committedRTS (committedREQs is used to check INVARIANTS)
16 | commit_REQ(o_ts, r_ts) ==
17 | /\ committedRTS' = committedRTS \union {r_ts}
18 | /\ committedREQs' = committedREQs \union {o_ts}
19 |
20 | upd_t_meta(n, version, state, t_acks) ==
21 | /\ tState' = [tState EXCEPT![n] = state]
22 | /\ tVersion' = [tVersion EXCEPT![n] = version]
23 | /\ tRcvACKs' = [tRcvACKs EXCEPT![n] = t_acks]
24 |
25 | upd_r_meta(n, ver, tb, id, type) ==
26 | /\ rID' = [rID EXCEPT![n] = id]
27 | /\ rEID' = [rEID EXCEPT![n] = mEID] \* always update to latest mEID
28 | /\ rType' = [rType EXCEPT![n] = type]
29 | /\ rTS' = [rTS EXCEPT![n].ver = ver, ![n].tb = tb]
30 |
31 | \* to update the epoch id of the last message issued
32 | upd_rEID(n) == upd_r_meta(n, rTS[n].ver, rTS[n].tb, rID[n], rType[n])
33 |
34 | upd_o_meta(n, ver, tb, state, driver, vec, ACKs) ==
35 | /\ oVector' = [oVector EXCEPT![n] = vec]
36 | /\ oRcvACKs' = [oRcvACKs EXCEPT![n] = ACKs]
37 | /\ oState' = [oState EXCEPT![n] = state]
38 | /\ oDriver' = [oDriver EXCEPT![n] = driver]
39 | /\ oTS' = [oTS EXCEPT![n].ver = ver, ![n].tb = tb]
40 |
41 | upd_o_meta_driver(n, ver, tb) == upd_o_meta(n, ver, tb, "drive", n, oVector[n], {})
42 | upd_o_meta_add_ack(n, sender) ==
43 | upd_o_meta(n, oTS[n].ver, oTS[n].tb, oState[n], oDriver[n], oVector[n], oRcvACKs[n] \union {sender})
44 |
45 | upd_o_meta_apply_val(n, m) ==
46 | /\ IF rTS[n].tb \notin mAliveNodes
47 | THEN upd_o_meta(n, oTS[n].ver, oTS[n].tb, "valid", 0, post_oVec(n, 0, oVector[n]), {})
48 | ELSE upd_o_meta(n, oTS[n].ver, oTS[n].tb, "valid", 0, post_oVec(n, rTS[n].tb, oVector[n]), {})
49 |
50 | upd_o_meta_apply_val_n_reset_o_state(n) ==
51 | upd_o_meta(n, 0, 0, "valid", 0, [readers |-> {}, owner |-> 0], {})
52 |
53 | -------------------------------------------------------------------------------------
54 | \* REQUESTER Helper operators
55 |
56 | choose_req(n) ==
57 | LET choice == CHOOSE x \in {0,1} : TRUE IN
58 | IF is_reader(n)
59 | THEN /\ IF choice = 0
60 | THEN "change-owner"
61 | ELSE "remove-reader"
62 | ELSE /\ IF choice = 0
63 | THEN "add-owner"
64 | ELSE "add-reader"
65 |
66 | max_commited_ver(S, n) == IF \A i \in S: i.tb # n THEN [ver |-> 0, tb |-> 0]
67 | ELSE CHOOSE i \in S: /\ i.tb = n
68 | /\ \A j \in S: \/ j.tb # n
69 | \/ j.ver <= i.ver
70 |
71 | next_rTS_ver(n) == max_commited_ver(committedRTS, n).ver + 1
72 |
73 | upd_rs_meta_n_send_req(n, r_type) ==
74 | /\ upd_r_meta(n, next_rTS_ver(n), n, 0, r_type)
75 | /\ upd_o_meta(n, 0, 0, "request", 0, [readers |-> {}, owner |-> 0], {})
76 | /\ o_send_req([ver |-> next_rTS_ver(n), tb |-> n], 0, r_type)
77 |
78 | -------------------------------------------------------------------------------------
79 | \* REQUESTER ACTIONS
80 |
81 | ORequesterREQ(n) == \* Requester issues a REQ
82 | /\ is_valid_requester(n)
83 | /\ is_reader(n)
84 | /\ next_rTS_ver(n) <= O_MAX_VERSION \* bound execution --> Bound this in reachable states
85 | /\ upd_rs_meta_n_send_req(n, "change-owner") \* to limit the state space only choose change ownership
86 | \* /\ upd_rs_meta_n_send_req(n, choose_req(n))
87 | /\ unchanged_mtc
88 |
89 | \* Requester receives NACK and replays REQ w/ higher rID
90 | ORequesterNACK(n) ==
91 | /\ is_in_progress_requester(n)
92 | /\ rID[n] < O_MAX_VERSION \* TODO: may Bound rID to number of APP_NODES instead
93 | /\ \E m \in oMsgs: o_rcv_nack(m, n)
94 | /\ upd_r_meta(n, rTS[n].ver, n, rID[n] + 1, rType[n])
95 | /\ o_send_req([ver |-> rTS[n].ver, tb |-> n], rID[n] + 1, rType[n])
96 | /\ unchanged_mtco
97 |
98 | ORequesterRESP(n) == \* Requester receives a RESP and sends a VAL to arbiters
99 | \E m \in oMsgs:
100 | /\ o_rcv_resp(m, n)
101 | /\ is_in_progress_requester(n)
102 | /\ commit_REQ(m.oTS, rTS[n])
103 | /\ upd_t_meta(n, m.tVersion, "valid", tRcvACKs[n]) \* todo this is optional
104 | /\ upd_o_meta(n, m.oTS.ver, m.oTS.tb, "valid", 0, post_oVec(n, n, m.oVector), {})
105 | /\ o_send_val(m.oTS)
106 | /\ unchanged_mtr
107 |
108 | ORequesterActions ==
109 | \E n \in APP_LIVE_NODES:
110 | \/ ORequesterREQ (n)
111 | \/ ORequesterNACK(n)
112 | \/ ORequesterRESP(n)
113 |
114 | -------------------------------------------------------------------------------------
115 | \* DRIVER ACTIONS
116 | ODriverINV(n, m) ==
117 | /\ o_rcv_req(m)
118 | /\ oState[n] = "valid"
119 | /\ oTS[n].ver < O_MAX_VERSION \* bound execution --> Bound this in reachable states
120 | /\ upd_t_meta(n, 0, tState[n], tRcvACKs[n])
121 | /\ upd_r_meta(n, m.rTS.ver, m.rTS.tb, m.rID, m.rType)
122 | /\ upd_o_meta_driver(n, oTS[n].ver + 1, n)
123 | /\ o_send_inv(n, n, [ver |-> oTS[n].ver + 1, tb |-> n], oVector[n], m.rTS, m.rID, m.rType)
124 | /\ unchanged_mc
125 |
126 | ODriverNACK(n, m) ==
127 | /\ o_rcv_req(m)
128 | /\ rTS[n] # m.rTS
129 | /\ oState[n] # "valid"
130 | /\ msg_not_exists(o_rcv_nack, m.rTS.tb) \* NACK does not exist (bound state space)
131 | /\ o_send_nack(m.rTS, m.rID)
132 | /\ unchanged_mtrco
133 |
134 | ODriverACK(n, m) ==
135 | /\ o_rcv_ack(m, n)
136 | /\ upd_o_meta_add_ack(n, m.sender)
137 | /\ IF m.tVersion # 0
138 | THEN upd_t_meta(n, m.tVersion, tState[n], tRcvACKs[n])
139 | ELSE unchanged_t
140 | /\ unchanged_Mmrc
141 |
142 | ODriverRESP(n) ==
143 | /\ oState[n] = "drive"
144 | /\ has_rcved_all_ACKs(n)
145 | /\ requester_is_alive(n)
146 | /\ msg_not_exists(o_rcv_resp, rTS[n].tb) \* RESP does not exist (bound state space)
147 | /\ o_send_resp(rTS[n], oTS[n], post_oVec(n, rTS[n].tb, oVector[n]), tVersion[n])
148 | /\ unchanged_mtrco
149 |
150 | ODriverActions ==
151 | \E n \in LB_LIVE_NODES:
152 | \/ ODriverRESP(n)
153 | \/ \E m \in oMsgs:
154 | \/ ODriverINV (n, m)
155 | \/ ODriverNACK(n, m)
156 | \/ ODriverACK (n, m)
157 |
158 | -------------------------------------------------------------------------------------
159 | \* LB ARBITER ACTIONS
160 | inv_to_be_applied(n, m) ==
161 | \/ o_rcv_inv_greater_ts(m, n)
162 | \/ (o_rcv_inv_equal_ts(m, n) /\ oState[n] = "invalid" /\ m.epochID > rEID[n])
163 |
164 | check_n_apply_inv(n, m) ==
165 | /\ inv_to_be_applied(n, m)
166 | /\ upd_r_meta(n, m.rTS.ver, m.rTS.tb, m.rID, m.rType)
167 | /\ upd_o_meta(n, m.oTS.ver, m.oTS.tb, "invalid", m.driver, m.oVector, {})
168 |
169 | \* We do not model lost messages, thus an arbiter need not respond w/ an INV when the ts is smaller
170 | OLBArbiterINV(n, m) ==
171 | /\ check_n_apply_inv(n, m)
172 | /\ \/ oState[n] # "drive"
173 | \/ o_send_nack(rTS[n], rID[n])
174 | /\ o_send_ack(n, m.oTS, 0)
175 | /\ unchanged_mtc
176 |
177 | OLBArbiterVAL(n, m) ==
178 | /\ o_rcv_val(m, n)
179 | /\ upd_o_meta_apply_val(n, m)
180 | /\ unchanged_Mmtrc
181 |
182 | OLBArbiterActions ==
183 | \E n \in LB_LIVE_NODES: \E m \in oMsgs:
184 | \/ OLBArbiterINV(n, m)
185 | \/ OLBArbiterVAL(n, m)
186 |
187 | -------------------------------------------------------------------------------------
188 | \* (O)wner or (R)eader ARBITER ACTIONS
189 |
190 | \* reader doesn't apply an INV but always responds with an ACK
191 | \* (and data if non-sharing rType and in tValid state)
192 | ORArbiterINV(n, m) ==
193 | /\ is_reader(n)
194 | /\ tState[n] = "valid"
195 | /\ o_rcv_inv(m, n)
196 | /\ o_send_ack(n, m.oTS, tVersion[n])
197 | /\ unchanged_mtrco
198 |
199 | OOArbiterINV(n, m) ==
200 | /\ is_owner(n)
201 | /\ m.type = "S_INV"
202 | /\ m.oVector.owner = n \* otherwise the owner lost a VAL --> OFMOArbiterINVLostVAL
203 | /\ tState[n] = "valid"
204 | /\ check_n_apply_inv(n, m)
205 | /\ o_send_ack(n, m.oTS, tVersion[n])
206 | /\ unchanged_mtc
207 |
208 | OOArbiterVAL(n, m) ==
209 | /\ o_rcv_val(m, n)
210 | /\ IF oVector[n].owner = n
211 | THEN /\ upd_o_meta_apply_val(n, m)
212 | ELSE /\ upd_o_meta_apply_val_n_reset_o_state(n)
213 | /\ unchanged_Mmtrc
214 |
215 | OAPPArbiterActions ==
216 | \E n \in APP_LIVE_NODES: \E m \in oMsgs:
217 | \/ ORArbiterINV(n, m)
218 | \/ OOArbiterINV(n, m)
219 | \/ OOArbiterVAL(n, m)
220 |
221 | -------------------------------------------------------------------------------------
222 | \* Owner actions emulating tx updates
223 |
224 | TOwnerINV(n) ==
225 | /\ upd_t_meta(n, tVersion[n] + 1, "write", {})
226 | /\ t_send(n, "T_INV", tVersion[n] + 1)
227 | /\ unchanged_mrco
228 |
229 | TOwnerACK(n) ==
230 | \E m \in oMsgs:
231 | /\ t_rcv_ack(m, n)
232 | /\ upd_t_meta(n, tVersion[n], tState[n], tRcvACKs[n] \union {m.sender})
233 | /\ unchanged_Mmrco
234 |
235 | TOwnerVAL(n) ==
236 | /\ oVector[n].readers \subseteq tRcvACKs[n] \* has received all acks from readers
237 | /\ upd_t_meta(n, tVersion[n], "valid", {})
238 | /\ t_send(n, "T_VAL", tVersion[n])
239 | /\ unchanged_mrco
240 |
241 | \* Reader actions emulating tx updates
242 | TReaderINV(n) ==
243 | \E m \in oMsgs:
244 | /\ t_rcv_inv(m, n)
245 | /\ m.tVersion > tVersion[n]
246 | /\ upd_t_meta(n, m.tVersion, "invalid", {})
247 | /\ t_send(n, "T_ACK", m.tVersion)
248 | /\ unchanged_mrco
249 |
250 | TReaderVAL(n) ==
251 | \E m \in oMsgs:
252 | /\ t_rcv_val(m, n)
253 | /\ m.tVersion = tVersion[n]
254 | /\ upd_t_meta(n, tVersion[n], "valid", {})
255 | /\ unchanged_Mmrco
256 |
257 | TOwnerReaderActions ==
258 | \E n \in APP_LIVE_NODES:
259 | \/ /\ is_valid_owner(n)
260 | /\ \/ TOwnerINV(n)
261 | \/ TOwnerACK(n)
262 | \/ TOwnerVAL(n)
263 | \/ /\ is_reader(n)
264 | /\ \/ TReaderINV(n)
265 | \/ TReaderVAL(n)
266 |
267 | -------------------------------------------------------------------------------------
268 | \* Modeling Sharding protocol (Requester and Arbiter actions)
269 | ONext ==
270 | \/ OInit_min_owner_rest_readers
271 | \/ ORequesterActions
272 | \/ ODriverActions
273 | \/ OLBArbiterActions
274 | \/ OAPPArbiterActions
275 | \* \/ TOwnerReaderActions
276 |
277 | (***************************************************************************)
278 | (* The complete definition of the algorithm *)
279 | (***************************************************************************)
280 |
281 | Spec == OInit /\ [][ONext]_vars
282 |
283 | Invariants == /\ ([]OTypeOK)
284 | /\ ([]CONSISTENT_DATA) /\ ([]ONLY_ONE_CONC_REQ_COMMITS)
285 | /\ ([]AT_MOST_ONE_OWNER) /\ ([]OWNER_LATEST_DATA)
286 | /\ ([]CONSISTENT_SHARERS) /\ ([]CONSISTENT_OVECTORS)
287 |
288 |
289 | THEOREM Spec => Invariants
290 | -------------------------------------------------------------------------------------
291 | \*
292 | \*LSpec == Spec /\ WF_vars(ONext)
293 | \*
294 | \*LIVENESS == \E i \in LB_NODES: []<>(oState[i] = "valid" /\ oTS[i].ver > 3)
295 | \*-----------------------------------------------------------------------------
296 | \*THEOREM LSpec => LIVENESS
297 | =============================================================================
298 |
--------------------------------------------------------------------------------
/ownership_protocol/ZeusOwnershipFaults.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ease-lab/Zeus/1dc21c5365ec44553e0aeba1a667204bdb5a3a12/ownership_protocol/ZeusOwnershipFaults.pdf
--------------------------------------------------------------------------------
/ownership_protocol/ZeusOwnershipFaults.tla:
--------------------------------------------------------------------------------
1 | --------------------------- MODULE ZeusOwnershipFaults ---------------------------
2 |
3 | EXTENDS ZeusOwnership
4 |
5 | sharing_ok == \A nn \in APP_LIVE_NODES: \/ ~has_data(nn)
6 | \/ tState[nn] = "valid"
7 |
8 | arb_replay(n) ==
9 | /\ upd_rEID(n)
10 | /\ upd_o_meta_driver(n, oTS[n].ver, oTS[n].tb)
11 | /\ o_send_inv(n, n, oTS[n], oVector[n], rTS[n], rID[n], rType[n])
12 |
13 |
14 | -------------------------------------------------------------------------------------
15 | \* Requester replays msg after a failure
16 | OFRequester ==
17 | \E n \in APP_LIVE_NODES:
18 | /\ oState[n] = "request"
19 | /\ rEID[n] < mEID
20 | /\ upd_rEID(n)
21 | /\ o_send_req(rTS[n], rID[n], rType[n])
22 | /\ unchanged_mtco
23 |
24 | OFDriverRequester == \* waits for sharing-ok + computes next oVec + transitions to valid +
25 |                      \* sends VALs with the proper changes in the oVec
26 | \E n \in LB_LIVE_NODES:
27 | /\ oState[n] = "drive"
28 | /\ has_rcved_all_ACKs(n)
29 | /\ ~requester_is_alive(n)
30 | /\ o_send_val(oTS[n])
31 | /\ upd_o_meta(n, oTS[n].ver, oTS[n].tb, "valid", 0, post_oVec(n, 0, oVector[n]), {})
32 | /\ unchanged_mtrc
33 |
34 | OFArbReplay == \* driver resets acks and replays msg on arbiter failures
35 |                \* if the failed arbiter was an owner we need to wait for sharing-ok
36 |                \* for convenience arb-replays happen on any failure (e.g., requester)
37 | \E n \in mAliveNodes:
38 | /\ (oState[n] = "drive" \/ oState[n] = "invalid")
39 | /\ (n \in LB_LIVE_NODES \/ oVector[n].owner = n)
40 | /\ rEID[n] < mEID
41 | /\ \/ oVector[n].owner \in mAliveNodes
42 | \/ sharing_ok
43 | /\ arb_replay(n)
44 | /\ unchanged_mtc
45 |
46 | OLBArbiterACK == \* ACK an INV message which has the same as local s_ts but wasn't applied
47 | \E n \in LB_LIVE_NODES: \E m \in oMsgs:
48 | /\ ~inv_to_be_applied(n, m)
49 | /\ o_rcv_inv_equal_ts(m, n)
50 | /\ o_send_ack(n, m.oTS, 0)
51 | /\ unchanged_mtrco
52 |
53 | -------------------------------------------------------------------------------------
54 | \* INV response to an owner who did an arb-replay due to a lost val
55 | OFMOArbiterLostVALOldReplay ==
56 | \E l \in LB_LIVE_NODES: \E a \in APP_LIVE_NODES:
57 | /\ oState[a] = "drive"
58 | /\ oVector[a].owner = a
59 | /\ is_greaterTS(oTS[l], oTS[a])
60 | /\ o_send_inv(l, l, oTS[l], oVector[l], rTS[l], rID[l], rType[l])
61 | /\ unchanged_mtrco
62 |
63 | \* message failures
64 | OFMOArbiterINVLostVAL == \* An INV is received (w/ higher ts) by a non-valid owner
65 |                          \* who lost the VAL for the message that demoted it
66 | \E n \in APP_LIVE_NODES: \E m \in oMsgs:
67 | /\ oVector[n].owner = n
68 | /\ o_rcv_inv_greater_ts(m, n)
69 | /\ m.oVector.owner # n
70 | /\ upd_o_meta_apply_val_n_reset_o_state(n)
71 | /\ o_send_ack(n, m.oTS, tVersion[n])
72 | /\ unchanged_mtrc
73 |
74 | OFMRequesterVALReplay == \* Requester receives a RESP (already applied)
75 | \* and re-sends a VAL to arbiters
76 | \E n \in APP_LIVE_NODES: \E m \in oMsgs:
77 | /\ o_rcv_resp(m, n)
78 | /\ m.rTS.tb = n
79 | /\ m.oTS = oTS[n]
80 | /\ o_send_val(m.oTS)
81 | /\ unchanged_mtrco
82 |
83 | -------------------------------------------------------------------------------------
84 | block_owner_failures_if_not_in_tx_valid_state(n) ==
85 | \/ has_valid_data(n)
86 | \/ ~is_valid_owner(n)
87 |
88 | \* Emulate a node failure if there are more than 2 alive nodes in LIVE_NODE_SET
89 | nodeFailure(n, LIVE_NODE_SET) ==
90 | /\ n \in LIVE_NODE_SET
91 | \* /\ block_owner_failures_if_not_in_tx_valid_state(n)
92 | /\ Cardinality(LIVE_NODE_SET) > 2
93 | \* Update Membership and epoch id
94 | /\ mEID' = mEID + 1
95 | /\ mAliveNodes' = mAliveNodes \ {n}
96 | \* Remove failed node from oVectors
97 | /\ oVector' = [l \in O_NODES |-> [readers |-> oVector[l].readers \ {n},
98 | owner |-> IF oVector[l].owner = n
99 | THEN 0
100 | ELSE oVector[l].owner ]]
101 | /\ unchanged_Mtrc
102 |   /\ UNCHANGED <<oTS, oState, oDriver, oRcvACKs>>
103 |
104 | -------------------------------------------------------------------------------------
105 | FNext ==
106 | \/ OFRequester
107 | \/ OFDriverRequester
108 | \/ OFArbReplay
109 | \/ OLBArbiterACK
110 | \/ OFMOArbiterINVLostVAL
111 | \/ OFMRequesterVALReplay
112 | \/ OFMOArbiterLostVALOldReplay
113 |
114 | OFNext ==
115 | \/ ONext
116 | \/ FNext
117 | \/ \E n \in mAliveNodes:
118 | \/ nodeFailure(n, LB_LIVE_NODES) \* emulate LB node failures
119 | \/ nodeFailure(n, APP_LIVE_NODES) \* emulate application node failures
120 |
121 | (***************************************************************************)
122 | (* The complete definition of the algorithm *)
123 | (***************************************************************************)
124 |
125 | SFSpec == OInit /\ [][OFNext]_vars
126 |
127 | THEOREM SFSpec => Invariants
128 | =============================================================================
129 |
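As a cross-check of the `nodeFailure` action above, its oVector cleanup can be sketched in plain Python: removing a failed node drops it from every readers set and resets any owner field that pointed at it to 0 (the spec's noop id). This is a minimal illustrative sketch with assumed names; the TLA+ module remains the authority.

```python
def remove_failed_node(o_vectors, failed):
    """o_vectors: dict node_id -> {'readers': set, 'owner': int}.
    Returns new vectors with `failed` removed, mirroring nodeFailure's
    oVector' update (owner reset to 0, the spec's default noop id)."""
    return {
        nid: {
            "readers": vec["readers"] - {failed},
            "owner": 0 if vec["owner"] == failed else vec["owner"],
        }
        for nid, vec in o_vectors.items()
    }

# Example: node 4 (the owner) fails.
vectors = {
    1: {"readers": {2, 3}, "owner": 4},
    2: {"readers": {3, 4}, "owner": 4},
}
after = remove_failed_node(vectors, 4)
```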
--------------------------------------------------------------------------------
/ownership_protocol/ZeusOwnershipMeta.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ease-lab/Zeus/1dc21c5365ec44553e0aeba1a667204bdb5a3a12/ownership_protocol/ZeusOwnershipMeta.pdf
--------------------------------------------------------------------------------
/ownership_protocol/ZeusOwnershipMeta.tla:
--------------------------------------------------------------------------------
1 | --------------------------- MODULE ZeusOwnershipMeta ---------------------------
2 |
3 | EXTENDS Integers, FiniteSets
4 |
5 | CONSTANTS \* LB_NODES and APP_NODES must not intersect and neither should contain 0.
6 | LB_NODES,
7 | APP_NODES,
8 | O_MAX_VERSION,
9 | O_MAX_FAILURES,
10 | O_MAX_DATA_VERSION
11 |
12 |
13 | VARIABLES \* variable prefixes --> o: ownership, r: request, t: transactional, m: membership
14 | \* VECTORS indexed by node_id
15 | oTS,
16 | oState,
17 | oDriver,
18 | oVector, \* No readers/owner: .readers = {} / .owner = 0
19 | oRcvACKs,
20 | \*
21 | rTS,
22 | rID,
23 | rType,
24 | rEID, \* since we do not have message loss timeouts we use this to
25 | \* track epoch of last issued INVs for replays
26 | tState,
27 |     tVersion, \* tVersion suffices to represent tData | = 0 --> no data | > 0 data (reader / owner)
28 | tRcvACKs,
29 | \* GLOBAL variables
30 | oMsgs,
31 | mAliveNodes, \* membership
32 | mEID, \* membership epoch id
33 | committedREQs, \* only to check invariant that exactly one of concurrent REQs is committed
34 |     committedRTS \* only to emulate FIFO REQ channels (i.e., do not re-execute same client requests)
35 |
36 | vars == << oTS, oState, oDriver, oVector, oRcvACKs, rTS, rID, rType, rEID,
37 | tState, tVersion, tRcvACKs, oMsgs, mAliveNodes, mEID, committedREQs, committedRTS>>
38 |
39 | \* Helper operators
40 | O_NODES == LB_NODES \union APP_NODES
41 | O_NODES_0 == O_NODES \union {0}
42 | LB_NODES_0 == LB_NODES \union {0}
43 | APP_NODES_0 == APP_NODES \union {0}
44 | LB_LIVE_NODES == LB_NODES \intersect mAliveNodes
45 | APP_LIVE_NODES == APP_NODES \intersect mAliveNodes
46 | LB_LIVE_ARBITERS(driver) == LB_LIVE_NODES \ {driver} \* all live LB arbiters except the driver
47 |
48 | ASSUME LB_NODES \intersect APP_NODES = {}
49 | ASSUME \A k \in O_NODES: k # 0 \* we use 0 as the default noop
50 |
51 | -------------------------------------------------------------------------------------
52 | \* Useful Unchanged shortcuts
53 | unchanged_M == UNCHANGED <<oMsgs>>
54 | unchanged_m == UNCHANGED <<mAliveNodes, mEID>>
55 | unchanged_t == UNCHANGED <<tState, tVersion, tRcvACKs>>
56 | unchanged_r == UNCHANGED <<rTS, rID, rType, rEID>>
57 | unchanged_c == UNCHANGED <<committedREQs, committedRTS>>
58 | unchanged_o == UNCHANGED <<oTS, oState, oDriver, oVector, oRcvACKs>>
59 | unchanged_mc == unchanged_m /\ unchanged_c
60 | unchanged_mtc == unchanged_mc /\ unchanged_t
61 | unchanged_mtr == unchanged_m /\ unchanged_t /\ unchanged_r
62 | unchanged_Mrc == unchanged_r /\ unchanged_c /\ unchanged_M
63 | unchanged_mrco == unchanged_mc /\ unchanged_r /\ unchanged_o
64 | unchanged_mtco == unchanged_mtc /\ unchanged_o
65 | unchanged_mtrc == unchanged_mtc /\ unchanged_r
66 | unchanged_Mtrc == unchanged_Mrc /\ unchanged_t
67 | unchanged_Mmrc == unchanged_Mrc /\ unchanged_m
68 | unchanged_mtrco == unchanged_mtrc /\ unchanged_o
69 | unchanged_Mmrco == unchanged_mrco /\ unchanged_M
70 | unchanged_Mmtrc == unchanged_mtrc /\ unchanged_M
71 |
72 |
73 | -------------------------------------------------------------------------------------
74 | \* Type definitions
75 | Type_oTS == [ver: 0..O_MAX_VERSION, tb: LB_NODES_0]
76 | Type_rTS == [ver: 0..O_MAX_VERSION, tb: APP_NODES_0]
77 | Type_tState == {"valid", "invalid", "write"} \* readers can be in valid and invalid and owner in valid and write
78 | Type_oState == {"valid", "invalid", "drive", "request"} \* all nodes start from valid
79 | Type_rType == {"add-owner", "change-owner", "add-reader", "rm-reader", "NOOP"}
80 | Type_oVector == [readers: SUBSET APP_NODES, owner: APP_NODES_0]
81 |
82 | Type_oMessage == \* Msgs exchanged by the sharding protocol
83 | [type: {"REQ"}, rTS : Type_rTS,
84 | rID : Nat,
85 | rType : Type_rType,
86 | epochID : 0..O_MAX_FAILURES]
87 | \union
88 | [type: {"NACK"}, rTS : Type_rTS,
89 | rID : Nat]
90 | \union
91 | [type: {"S_INV"}, sender : O_NODES,
92 | driver : O_NODES,
93 | rTS : Type_rTS,
94 | rID : Nat,
95 | oTS : Type_oTS,
96 | oVector : Type_oVector,
97 | rType : Type_rType,
98 | epochID : 0..O_MAX_FAILURES]
99 | \union
100 | [type: {"S_ACK"}, sender : O_NODES,
101 | oTS : Type_oTS,
102 | tVersion : 0..O_MAX_DATA_VERSION, \* emulates data send as well
103 | epochID : 0..O_MAX_FAILURES]
104 | \union
105 | [type: {"RESP"}, oVector : Type_oVector,
106 | oTS : Type_oTS,
107 | rTS : Type_rTS,
108 |               \* preOwner: pre-request owner is not needed for model checking (since we model bcast messages)
109 | tVersion : 0..O_MAX_DATA_VERSION,
110 | epochID : 0..O_MAX_FAILURES]
111 | \union
112 | [type: {"S_VAL"}, oTS : Type_oTS,
113 | epochID : 0..O_MAX_FAILURES]
114 |
115 |
116 | Type_tMessage == \* msgs exchanged by the transactional reliable commit protocol
117 | [type: {"T_INV", "T_ACK", "T_VAL"}, tVersion : Nat,
118 | sender : O_NODES,
119 | epochID : 0..O_MAX_FAILURES]
120 |
121 |
122 | -------------------------------------------------------------------------------------
123 | \* Type check and initialization
124 |
125 | OTypeOK == \* The type correctness invariant
126 | /\ oTS \in [O_NODES -> Type_oTS]
127 | /\ oState \in [O_NODES -> Type_oState]
128 | /\ oDriver \in [O_NODES -> O_NODES_0]
129 | /\ oVector \in [O_NODES -> Type_oVector]
130 | /\ \A n \in O_NODES: oRcvACKs[n] \subseteq (O_NODES \ {n})
131 | /\ rTS \in [O_NODES -> Type_rTS]
132 | /\ rID \in [O_NODES -> 0..O_MAX_VERSION]
133 | /\ rType \in [O_NODES -> Type_rType]
134 | /\ rEID \in [O_NODES -> 0..(Cardinality(O_NODES) - 1)]
135 | /\ tVersion \in [O_NODES -> 0..O_MAX_DATA_VERSION]
136 | /\ tState \in [O_NODES -> Type_tState]
137 | /\ \A n \in O_NODES: tRcvACKs[n] \subseteq (O_NODES \ {n})
138 | /\ committedREQs \subseteq Type_oTS
139 | /\ committedRTS \subseteq Type_rTS
140 | /\ oMsgs \subseteq (Type_oMessage \union Type_tMessage)
141 | /\ mEID \in 0..(Cardinality(O_NODES) - 1)
142 | /\ mAliveNodes \subseteq O_NODES
143 |
144 | OInit == \* The initial predicate
145 | /\ oTS = [n \in O_NODES |-> [ver |-> 0, tb |-> 0]]
146 | /\ oState = [n \in O_NODES |-> "valid"]
147 | /\ oDriver = [n \in O_NODES |-> 0]
148 | /\ oVector = [n \in O_NODES |-> [readers |-> {}, owner |-> 0]]
149 | /\ oRcvACKs = [n \in O_NODES |-> {}]
150 | /\ rTS = [n \in O_NODES |-> [ver |-> 0, tb |-> 0]]
151 | /\ rID = [n \in O_NODES |-> 0]
152 | /\ rEID = [n \in O_NODES |-> 0]
153 | /\ rType = [n \in O_NODES |-> "NOOP"]
154 | /\ tVersion = [n \in O_NODES |-> 0]
155 | /\ tState = [n \in O_NODES |-> "valid"]
156 | /\ tRcvACKs = [n \in O_NODES |-> {}]
157 | /\ committedRTS = {}
158 | /\ committedREQs = {}
159 | /\ oMsgs = {}
160 | /\ mEID = 0
161 | /\ mAliveNodes = O_NODES
162 |
163 | Min(S) == CHOOSE x \in S: \A y \in S \ {x}: y > x
164 | set_wo_min(S) == S \ {Min(S)}
165 |
166 | \* First command, executed once after OInit, to initialize owner/readers and oVector state
167 | OInit_min_owner_rest_readers ==
168 | /\ \A x \in O_NODES: tVersion[x] = 0
169 | /\ tVersion' = [n \in O_NODES |-> IF n \in LB_NODES THEN 0 ELSE 1]
170 | /\ oVector' = [n \in O_NODES |-> IF n \in set_wo_min(APP_NODES)
171 | THEN oVector[n]
172 | ELSE [readers |-> set_wo_min(APP_NODES),
173 | owner |-> Min(APP_NODES)]]
174 | /\ unchanged_Mmrc
175 |     /\ UNCHANGED <<oTS, oState, oDriver, oRcvACKs, tState, tRcvACKs>>
176 |
177 | -------------------------------------------------------------------------------------
178 | \* Helper functions
179 | has_data(n) == tVersion[n] > 0
180 | has_valid_data(n) == /\ has_data(n)
181 | /\ tState[n] = "valid"
182 |
183 | is_owner(n) == /\ has_data(n)
184 | /\ oVector[n].owner = n
185 |
186 | is_valid_owner(n) == /\ is_owner(n)
187 | /\ oState[n] = "valid"
188 |
189 | is_reader(n) == /\ has_data(n)
190 | /\ ~is_owner(n)
191 | /\ n \notin LB_NODES
192 |
193 | is_live_arbiter(n) == \/ n \in LB_LIVE_NODES
194 | \/ is_owner(n)
195 |
196 | is_valid_live_arbiter(n) == /\ is_live_arbiter(n)
197 | /\ oState[n] = "valid"
198 |
199 | is_requester(n) ==
200 | /\ n \in APP_LIVE_NODES
201 | /\ ~is_owner(n)
202 |
203 | is_valid_requester(n) ==
204 | /\ is_requester(n)
205 | /\ oState[n] = "valid"
206 |
207 | is_in_progress_requester(n) ==
208 | /\ is_requester(n)
209 | /\ oState[n] = "request"
210 |
211 | requester_is_alive(n) == rTS[n].tb \in mAliveNodes
212 |
213 | -------------------------------------------------------------------------------------
214 | \* Timestamp Comparison Helper functions
215 | is_equalTS(ts1, ts2) ==
216 | /\ ts1.ver = ts2.ver
217 | /\ ts1.tb = ts2.tb
218 |
219 | is_greaterTS(ts1, ts2) ==
220 | \/ ts1.ver > ts2.ver
221 | \/ /\ ts1.ver = ts2.ver
222 | /\ ts1.tb > ts2.tb
223 |
224 | is_greatereqTS(ts1, ts2) ==
225 | \/ is_equalTS(ts1, ts2)
226 | \/ is_greaterTS(ts1, ts2)
227 |
228 | is_smallerTS(ts1, ts2) == ~is_greatereqTS(ts1, ts2)
229 |
230 | -------------------------------------------------------------------------------------
231 | \* Request type Helper functions
232 | is_non_sharing_req(n) == (rType[n] = "add-owner" \/ rType[n] = "add-reader")
233 |
234 | \* Post o_vector based on request type and r (requester or 0 if requester is not alive)
235 | post_oVec(n, r, pre_oVec) ==
236 | IF (rType[n] = "add-owner" \/ rType[n] = "change-owner")
237 | THEN [owner |-> r,
238 | readers |-> (pre_oVec.readers \union {pre_oVec.owner}) \ {r, 0}]
239 | ELSE [owner |-> pre_oVec.owner,
240 |           readers |-> IF rType[n] = "rm-reader"
241 | THEN pre_oVec.readers \ {r, 0}
242 | ELSE \* rType[n] = "add-reader"
243 | (pre_oVec.readers \union {r}) \ {0}]
244 |
245 |
246 | -------------------------------------------------------------------------------------
247 | \* Message Helper functions
248 |
249 | \* Used only to emulate FIFO REQ channels (and not re-execute already completed REQs)
250 | not_completed_rTS(r_ts) == \A c_rTS \in committedRTS: c_rTS # r_ts
251 |
252 | \* Messages in oMsgs are only appended to this variable (not removed once delivered)
253 | \* intentionally, to check the protocol's tolerance to duplicates and reorderings
254 | send_omsg(m) == oMsgs' = oMsgs \union {m}
255 |
256 | o_send_req(r_ts, r_id, r_type) ==
257 | send_omsg([type |-> "REQ",
258 | rTS |-> r_ts,
259 | rID |-> r_id,
260 | rType |-> r_type,
261 | epochID |-> mEID ])
262 |
263 | o_send_nack(r_ts, r_id) ==
264 | send_omsg([type |-> "NACK",
265 | rTS |-> r_ts,
266 | rID |-> r_id])
267 |
268 | o_send_inv(sender, driver, o_ts, o_vec, r_ts, r_id, r_type) ==
269 | send_omsg([type |-> "S_INV",
270 | sender |-> sender,
271 | driver |-> driver,
272 | oTS |-> o_ts,
273 | oVector |-> o_vec,
274 | rTS |-> r_ts,
275 | rID |-> r_id,
276 | rType |-> r_type,
277 | epochID |-> mEID ])
278 |
279 | o_send_ack(sender, o_ts, t_version) ==
280 | send_omsg([type |-> "S_ACK",
281 | sender |-> sender,
282 | oTS |-> o_ts,
283 | tVersion |-> t_version,
284 | epochID |-> mEID ])
285 |
286 | o_send_resp(r_ts, o_ts, o_vec, t_version) ==
287 | send_omsg([type |-> "RESP",
288 | oVector |-> o_vec,
289 | oTS |-> o_ts,
290 | rTS |-> r_ts,
291 | tVersion |-> t_version,
292 | epochID |-> mEID ])
293 |
294 | o_send_val(o_ts) ==
295 | send_omsg([type |-> "S_VAL",
296 | oTS |-> o_ts,
297 | epochID |-> mEID ])
298 |
299 | \* Operators to check received messages (m stands for message)
300 | o_rcv_req(m) ==
301 | /\ m.type = "REQ"
302 | /\ m.epochID = mEID
303 | /\ not_completed_rTS(m.rTS)
304 |
305 | o_rcv_nack(m, receiver) ==
306 | /\ m.type = "NACK"
307 | /\ m.rTS = rTS[receiver]
308 | /\ m.rID = rID[receiver]
309 |
310 | o_rcv_resp(m, receiver) ==
311 | /\ m.type = "RESP"
312 | /\ m.epochID = mEID
313 | /\ m.rTS = rTS[receiver]
314 |
315 | o_rcv_inv(m, receiver) ==
316 | /\ m.type = "S_INV"
317 | /\ m.epochID = mEID
318 | /\ m.sender # receiver
319 |
320 | o_rcv_inv_equal_ts(m, receiver) ==
321 | /\ o_rcv_inv(m, receiver)
322 | /\ is_equalTS(m.oTS, oTS[receiver])
323 |
324 | o_rcv_inv_smaller_ts(m, receiver) ==
325 | /\ o_rcv_inv(m, receiver)
326 | /\ is_smallerTS(m.oTS, oTS[receiver])
327 |
328 | o_rcv_inv_greater_ts(m, receiver) ==
329 | /\ o_rcv_inv(m, receiver)
330 | /\ is_greaterTS(m.oTS, oTS[receiver])
331 |
332 | o_rcv_inv_greatereq_ts(m, receiver) ==
333 | /\ o_rcv_inv(m, receiver)
334 | /\ ~is_smallerTS(m.oTS, oTS[receiver])
335 |
336 | o_rcv_ack(m, receiver) ==
337 | /\ m.type = "S_ACK"
338 | /\ m.epochID = mEID
339 | /\ m.sender # receiver
340 | /\ oState[receiver] = "drive"
341 | /\ m.sender \notin oRcvACKs[receiver]
342 | /\ is_equalTS(m.oTS, oTS[receiver])
343 |
344 | o_rcv_val(m, receiver) ==
345 | /\ m.type = "S_VAL"
346 | /\ m.epochID = mEID
347 | /\ oState[receiver] # "valid"
348 | /\ is_equalTS(m.oTS, oTS[receiver])
349 |
350 |
351 | \* Used to not re-issue messages that already exist (and to bound the state space)
352 | msg_not_exists(o_rcv_msg(_, _), receiver) ==
353 | ~\E mm \in oMsgs: o_rcv_msg(mm, receiver)
354 |
355 |
356 |
357 | rcved_acks_from_set(n, set) == set \subseteq oRcvACKs[n]
358 |
359 | \* Check if all acknowledgments from arbiters have been received
360 | has_rcved_all_ACKs(n) ==
361 | /\ rEID[n] = mEID
362 | /\ IF oVector[n].owner # 0
363 | THEN rcved_acks_from_set(n, {oVector[n].owner} \union LB_LIVE_ARBITERS(n))
364 | ELSE \/ /\ ~requester_is_alive(n)
365 | /\ rcved_acks_from_set(n, LB_LIVE_ARBITERS(n))
366 | \/ /\ oVector[n].readers # {}
367 | /\ \E x \in oVector[n].readers: rcved_acks_from_set(n, {x} \union LB_LIVE_ARBITERS(n))
368 | -------------------------------------------------------------------------------------
369 | \* message helper functions related to transactions
370 | t_send(n, msg_type, t_ver) ==
371 | send_omsg([type |-> msg_type,
372 | tVersion |-> t_ver,
373 | sender |-> n,
374 | epochID |-> mEID ])
375 |
376 | t_rcv_inv(m, receiver) ==
377 | /\ m.type = "T_INV"
378 | /\ m.epochID = mEID
379 | /\ m.sender # receiver
380 |
381 | t_rcv_ack(m, receiver) ==
382 | /\ m.type = "T_ACK"
383 | /\ m.epochID = mEID
384 | /\ m.sender # receiver
385 | /\ tState[receiver] = "write"
386 | /\ m.sender \notin tRcvACKs[receiver]
387 | /\ m.tVersion = tVersion[receiver]
388 |
389 | t_rcv_val(m, receiver) ==
390 | /\ m.type = "T_VAL"
391 | /\ m.epochID = mEID
392 | /\ tState[receiver] # "valid"
393 | /\ m.tVersion = tVersion[receiver]
394 |
395 | -------------------------------------------------------------------------------------
396 | \* Protocol Invariants:
397 |
398 | \* Valid data are consistent
399 | CONSISTENT_DATA ==
400 | \A k,n \in APP_LIVE_NODES: \/ ~has_valid_data(k)
401 | \/ ~has_valid_data(n)
402 | \/ tVersion[n] = tVersion[k]
403 |
404 | \* Amongst concurrent sharing requests only one succeeds
405 | \* The invariant is that we cannot have two REQs committed with the same version
406 | \* (i.e., that read and modified the same sharing vector)
407 | ONLY_ONE_CONC_REQ_COMMITS ==
408 | \A x,y \in committedREQs: \/ x.ver # y.ver
409 | \/ x.tb = y.tb
410 |
411 | \* There is always at most one valid owner
412 | AT_MOST_ONE_OWNER ==
413 | \A n,m \in mAliveNodes: \/ ~is_valid_owner(n)
414 | \/ ~is_valid_owner(m)
415 | \/ m = n
416 |
417 | \* Valid owner has the most up-to-date data and version among live replicas
418 | OWNER_LATEST_DATA ==
419 | \A o,k \in mAliveNodes: \/ ~is_valid_owner(o)
420 | \/ ~has_data(o)
421 | \/ tVersion[o] >= tVersion[k]
422 |
423 | \* All valid sharers (LB + owner) agree on their sharing vectors (and TS)
424 | CONSISTENT_SHARERS ==
425 | \A k,n \in mAliveNodes: \/ ~is_valid_live_arbiter(n)
426 | \/ ~is_valid_live_arbiter(k)
427 | \/ /\ oTS[n] = oTS[k]
428 | /\ oVector[n] = oVector[k]
429 |
430 |
431 | CONSISTENT_OVECTORS_Fwd ==
432 | \A n \in mAliveNodes: \/ ~is_valid_live_arbiter(n)
433 | \/ /\ \A r \in oVector[n].readers:
434 | /\ has_data(r)
435 | /\ ~is_valid_owner(r)
436 | /\ \/ oVector[n].owner = 0
437 | \/ is_owner(oVector[n].owner)
438 |
439 | CONSISTENT_OVECTORS_Reverse_owner ==
440 | \A o,n \in mAliveNodes: \/ ~is_valid_owner(o)
441 | \/ ~is_valid_live_arbiter(n)
442 | \/ oVector[n].owner = o
443 |
444 | CONSISTENT_OVECTORS_Reverse_readers ==
445 | \A r,n \in mAliveNodes: \/ ~is_reader(r)
446 | \/ ~is_valid_live_arbiter(n)
447 | \/ r \in oVector[n].readers
448 |
449 | \* The owner and readers are always correctly reflected by any valid sharing vectors
450 | CONSISTENT_OVECTORS ==
451 | /\ CONSISTENT_OVECTORS_Fwd
452 | /\ CONSISTENT_OVECTORS_Reverse_owner
453 | /\ CONSISTENT_OVECTORS_Reverse_readers
454 |
455 | =============================================================================
456 |
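The timestamp helpers above order timestamps lexicographically on (version, tie-breaker node id), which makes the order total because node ids are unique. A minimal Python sketch of the same comparison, with illustrative names (the TLA+ operators `is_equalTS`/`is_greaterTS`/`is_smallerTS` are the authority):

```python
from typing import NamedTuple

class TS(NamedTuple):
    ver: int  # logical version
    tb: int   # tie-breaker: id of the node that created the timestamp

def is_greater_ts(a: TS, b: TS) -> bool:
    # Lexicographic: higher version wins; equal versions break ties on node id.
    return a.ver > b.ver or (a.ver == b.ver and a.tb > b.tb)

def is_equal_ts(a: TS, b: TS) -> bool:
    return a.ver == b.ver and a.tb == b.tb

def is_smaller_ts(a: TS, b: TS) -> bool:
    # Defined as the negation of greater-or-equal, exactly as in the module.
    return not (is_equal_ts(a, b) or is_greater_ts(a, b))
```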
--------------------------------------------------------------------------------
/reliable_commit_ptrotocol/ZeusReliableCommit.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ease-lab/Zeus/1dc21c5365ec44553e0aeba1a667204bdb5a3a12/reliable_commit_ptrotocol/ZeusReliableCommit.pdf
--------------------------------------------------------------------------------
/reliable_commit_ptrotocol/ZeusReliableCommit.tla:
--------------------------------------------------------------------------------
1 | ------------------------------- MODULE ZeusReliableCommit -------------------------------
2 | \* Specification of Zeus's reliable commit protocol presented in the Zeus paper
3 | \* that appears in EuroSys'21.
4 | \* This module includes everything but the pipelining optimization presented in the paper.
5 |
6 | \* Model check passed [@ 21st of Jan 2021] with the following parameters:
7 | \* R_NODES = {0, 1, 2}
8 | \* R_MAX_EPOCH = 4
9 | \* R_MAX_VERSION = 4
10 |
11 | EXTENDS Integers
12 |
13 | CONSTANTS R_NODES,
14 | R_MAX_EPOCH,
15 | R_MAX_VERSION
16 |
17 | VARIABLES rMsgs,
18 | rKeyState,
19 | rKeySharers,
20 | rKeyVersion,
21 | rKeyRcvedACKs,
22 | rKeyLastWriter,
23 | rNodeEpochID,
24 | rAliveNodes,
25 | rEpochID
26 |
27 | vars == << rMsgs, rKeyState, rKeySharers, rKeyVersion, rKeyRcvedACKs,
28 | rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID >>
29 | -----------------------------------------------------------------------------
30 | \* The consistency invariant: all alive nodes in valid state have the same value / TS
31 | RConsistentInvariant ==
32 | \A k,s \in rAliveNodes: \/ rKeyState[k] /= "valid"
33 | \/ rKeyState[s] /= "valid"
34 | \/ rKeyVersion[k] = rKeyVersion[s]
35 |
36 | RMaxVersionDistanceInvariant == \* this does not hold w/ the pipelining optimization
37 | \A k,s \in rAliveNodes:
38 | \/ rKeyVersion[k] <= rKeyVersion[s] + 1
39 | \/ rKeyVersion[s] <= rKeyVersion[k] + 1
40 |
41 | RSingleOnwerInvariant ==
42 | \A k,s \in rAliveNodes:
43 | \/ rKeySharers[k] /= "owner"
44 | \/ rKeySharers[s] /= "owner"
45 | \/ k = s
46 |
47 | ROnwerOnlyWriterInvariant ==
48 | \A k \in rAliveNodes:
49 | \/ rKeyState[k] /= "write"
50 | \/ rKeySharers[k] = "owner"
51 |
52 | ROnwerHighestVersionInvariant == \* owner has the highest version among alive nodes
53 | \A k,s \in rAliveNodes:
54 | \/ /\ rKeySharers[s] /= "owner"
55 | /\ rKeySharers[k] /= "owner"
56 | \/
57 | /\ rKeySharers[k] = "owner"
58 | /\ rKeyVersion[k] >= rKeyVersion[s]
59 | \/
60 | /\ rKeySharers[s] = "owner"
61 | /\ rKeyVersion[s] >= rKeyVersion[k]
62 |
63 | -----------------------------------------------------------------------------
64 |
65 | RMessage == \* Messages exchanged by the Protocol
66 | [type: {"INV", "ACK"}, sender : R_NODES,
67 | epochID : 0..R_MAX_EPOCH,
68 | version : 0..R_MAX_VERSION]
69 | \union
70 | [type: {"VAL"}, epochID : 0..R_MAX_EPOCH,
71 | version : 0..R_MAX_VERSION]
72 |
73 |
74 | RTypeOK == \* The type correctness invariant
75 | /\ rMsgs \subseteq RMessage
76 | /\ rAliveNodes \subseteq R_NODES
77 | /\ \A n \in R_NODES: rKeyRcvedACKs[n] \subseteq (R_NODES \ {n})
78 | /\ rNodeEpochID \in [R_NODES -> 0..R_MAX_EPOCH]
79 | /\ rKeyLastWriter \in [R_NODES -> R_NODES]
80 | /\ rKeyVersion \in [R_NODES -> 0..R_MAX_VERSION]
81 | /\ rKeySharers \in [R_NODES -> {"owner", "reader", "non-sharer"}]
82 | /\ rKeyState \in [R_NODES -> {"valid", "invalid", "write", "replay"}]
83 |
84 |
85 | RInit == \* The initial predicate
86 | /\ rMsgs = {}
87 | /\ rEpochID = 0
88 | /\ rAliveNodes = R_NODES
89 | /\ rKeyVersion = [n \in R_NODES |-> 0]
90 | /\ rNodeEpochID = [n \in R_NODES |-> 0]
91 | /\ rKeyRcvedACKs = [n \in R_NODES |-> {}]
92 | /\ rKeySharers = [n \in R_NODES |-> "reader"]
93 | /\ rKeyState = [n \in R_NODES |-> "valid"]
94 | /\ rKeyLastWriter = [n \in R_NODES |-> CHOOSE k \in R_NODES:
95 | \A m \in R_NODES: k <= m]
96 |
97 | -----------------------------------------------------------------------------
98 |
99 | RNoChanges_in_membership == UNCHANGED <<rAliveNodes, rEpochID>>
100 |
101 | RNoChanges_but_membership ==
102 |     UNCHANGED <<rMsgs, rKeyState, rKeySharers, rKeyVersion,
103 |                 rKeyRcvedACKs, rKeyLastWriter,
104 |                 rNodeEpochID>>
105 |
106 | RNoChanges ==
107 | /\ RNoChanges_in_membership
108 | /\ RNoChanges_but_membership
109 |
110 | -----------------------------------------------------------------------------
111 | \* A buffer maintaining all network messages. Messages are only appended to
112 | \* this variable (not removed once delivered) intentionally to check the
113 | \* protocol's tolerance to duplicates and reorderings
114 | RSend(m) == rMsgs' = rMsgs \union {m}
115 |
116 | \* Check if all acknowledgments for a write have been received
117 | RAllACKsRcved(n) == (rAliveNodes \ {n}) \subseteq rKeyRcvedACKs[n]
118 |
119 | RIsAlive(n) == n \in rAliveNodes
120 |
121 | RNodeFailure(n) == \* Emulate a node failure
122 |     \* Make sure that there are at least 3 alive nodes before killing a node
123 | /\ \E k,m \in rAliveNodes: /\ k /= n
124 | /\ m /= n
125 | /\ m /= k
126 | /\ rEpochID' = rEpochID + 1
127 | /\ rAliveNodes' = rAliveNodes \ {n}
128 | /\ RNoChanges_but_membership
129 |
130 | -----------------------------------------------------------------------------
131 | RNewOwner(n) ==
132 | /\ \A k \in rAliveNodes:
133 | /\ rKeySharers[k] /= "owner"
134 |        /\ \/ /\ rKeyState[k] = "valid" \* all alive replicas are in valid state
135 |              /\ rKeySharers[k] = "reader" \* and there is no alive owner
136 |           \/ /\ rKeySharers[k] = "non-sharer" \* and there is no alive owner
137 | /\ rKeySharers' = [rKeySharers EXCEPT ![n] = "owner"]
138 |     /\ UNCHANGED <<rMsgs, rKeyState, rKeyVersion, rKeyRcvedACKs,
139 |                    rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID>>
140 |
141 | ROverthrowOwner(n) ==
142 | \E k \in rAliveNodes:
143 | /\ rKeyState[k] = "valid"
144 | /\ rKeySharers[k] = "owner"
145 | /\ rKeySharers' = [rKeySharers EXCEPT ![n] = "owner",
146 | ![k] = "reader"]
147 |     /\ UNCHANGED <<rMsgs, rKeyState, rKeyVersion, rKeyRcvedACKs,
148 |                    rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID>>
149 |
150 | RGetOwnership(n) ==
151 | /\ rKeySharers[n] /= "owner"
152 | /\ \A x \in rAliveNodes: rNodeEpochID[x] = rEpochID \*TODO may move this to RNewOwner
153 | /\ \/ ROverthrowOwner(n)
154 | \/ RNewOwner(n)
155 | -----------------------------------------------------------------------------
156 |
157 | RRead(n) == \* Execute a read
158 | /\ rNodeEpochID[n] = rEpochID
159 | /\ rKeyState[n] = "valid"
160 | /\ RNoChanges
161 |
162 | RRcvInv(n) == \* Process a received invalidation
163 | \E m \in rMsgs:
164 | /\ m.type = "INV"
165 | /\ m.epochID = rEpochID
166 | /\ m.sender /= n
167 | /\ m.sender \in rAliveNodes
168 | \* always acknowledge a received invalidation (irrelevant to the timestamp)
169 | /\ RSend([type |-> "ACK",
170 | epochID |-> rEpochID,
171 | sender |-> n,
172 | version |-> m.version])
173 | /\ \/ m.version > rKeyVersion[n]
174 | /\ rKeyState[n] \in {"valid", "invalid", "replay"}
175 | /\ rKeyState' = [rKeyState EXCEPT ![n] = "invalid"]
176 | /\ rKeyVersion' = [rKeyVersion EXCEPT ![n] = m.version]
177 | /\ rKeyLastWriter' = [rKeyLastWriter EXCEPT ![n] = m.sender]
178 | \/ m.version <= rKeyVersion[n]
179 |           /\ UNCHANGED <<rKeyState, rKeyVersion, rKeyLastWriter>>
180 |     /\ UNCHANGED <<rKeySharers, rKeyRcvedACKs, rNodeEpochID, rAliveNodes, rEpochID>>
181 |
182 | RRcvVal(n) == \* Process a received validation
183 | \E m \in rMsgs:
184 | /\ rKeyState[n] /= "valid"
185 | /\ m.type = "VAL"
186 | /\ m.epochID = rEpochID
187 | /\ m.version = rKeyVersion[n]
188 | /\ rKeyState' = [rKeyState EXCEPT ![n] = "valid"]
189 |     /\ UNCHANGED <<rMsgs, rKeySharers, rKeyVersion, rKeyRcvedACKs,
190 |                    rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID>>
191 |
192 | RReaderActions(n) == \* Actions of a write follower
193 | \/ RRead(n)
194 | \/ RRcvInv(n)
195 | \/ RRcvVal(n)
196 |
197 | -------------------------------------------------------------------------------------
198 |
199 | RWrite(n) ==
200 | /\ rNodeEpochID[n] = rEpochID
201 | /\ rKeySharers[n] \in {"owner"}
202 | /\ rKeyState[n] \in {"valid"} \* May add invalid state here as well
203 | /\ rKeyVersion[n] < R_MAX_VERSION
204 | /\ rKeyLastWriter' = [rKeyLastWriter EXCEPT ![n] = n]
205 | /\ rKeyRcvedACKs' = [rKeyRcvedACKs EXCEPT ![n] = {}]
206 | /\ rKeyState' = [rKeyState EXCEPT ![n] = "write"]
207 | /\ rKeyVersion' = [rKeyVersion EXCEPT ![n] = rKeyVersion[n] + 1]
208 | /\ RSend([type |-> "INV",
209 | epochID |-> rEpochID,
210 | sender |-> n,
211 | version |-> rKeyVersion[n] + 1])
212 |     /\ UNCHANGED <<rKeySharers, rNodeEpochID, rAliveNodes, rEpochID>>
213 |
214 | RRcvAck(n) == \* Process a received acknowledgment
215 | \E m \in rMsgs:
216 | /\ m.type = "ACK"
217 | /\ m.epochID = rEpochID
218 | /\ m.sender /= n
219 | /\ m.version = rKeyVersion[n]
220 | /\ m.sender \notin rKeyRcvedACKs[n]
221 | /\ rKeyState[n] \in {"write", "replay"}
222 | /\ rKeyRcvedACKs' = [rKeyRcvedACKs EXCEPT ![n] =
223 | rKeyRcvedACKs[n] \union {m.sender}]
224 |     /\ UNCHANGED <<rMsgs, rKeyState, rKeySharers, rKeyVersion,
225 |                    rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID>>
226 |
227 | RSendVals(n) == \* Send validations once acknowledgments have been received from all alive nodes
228 | /\ rKeyState[n] \in {"write", "replay"}
229 | /\ RAllACKsRcved(n)
230 | /\ rKeyState' = [rKeyState EXCEPT![n] = "valid"]
231 | /\ RSend([type |-> "VAL",
232 | epochID |-> rEpochID,
233 | version |-> rKeyVersion[n]])
234 |     /\ UNCHANGED <<rKeySharers, rKeyVersion, rKeyRcvedACKs,
235 |                    rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID>>
236 |
237 | ROwnerActions(n) == \* Actions of a read/write coordinator
238 | \/ RRead(n)
239 | \/ RWrite(n)
240 | \/ RRcvAck(n)
241 | \/ RSendVals(n)
242 |
243 | -------------------------------------------------------------------------------------
244 |
245 | RWriteReplay(n) == \* Execute a write-replay
246 | /\ rKeyLastWriter' = [rKeyLastWriter EXCEPT ![n] = n]
247 | /\ rKeyRcvedACKs' = [rKeyRcvedACKs EXCEPT ![n] = {}]
248 | /\ rKeyState' = [rKeyState EXCEPT ![n] = "replay"]
249 | /\ RSend([type |-> "INV",
250 | sender |-> n,
251 | epochID |-> rEpochID,
252 | version |-> rKeyVersion[n]])
253 |     /\ UNCHANGED <<rKeySharers, rKeyVersion, rNodeEpochID, rAliveNodes, rEpochID>>
254 |
255 | RLocalWriteReplay(n) ==
256 | /\ \/ rKeySharers[n] = "owner"
257 | \/ rKeyState[n] = "replay"
258 | /\ RWriteReplay(n)
259 |
260 | RFailedNodeWriteReplay(n) ==
261 | /\ ~RIsAlive(rKeyLastWriter[n])
262 | /\ rKeyState[n] = "invalid"
263 | /\ RWriteReplay(n)
264 |
265 | RUpdateLocalEpochID(n) ==
266 | /\ rKeyState[n] = "valid"
267 | /\ rNodeEpochID' = [rNodeEpochID EXCEPT![n] = rEpochID]
268 |     /\ UNCHANGED <<rMsgs, rKeyState, rKeySharers, rKeyVersion,
269 |                    rKeyRcvedACKs, rKeyLastWriter, rAliveNodes, rEpochID>>
270 |
271 | RReplayActions(n) ==
272 | /\ rNodeEpochID[n] < rEpochID
273 | /\ \/ RLocalWriteReplay(n)
274 | \/ RFailedNodeWriteReplay(n)
275 | \/ RUpdateLocalEpochID(n)
276 |
277 | -------------------------------------------------------------------------------------
278 | RNext == \* Modeling protocol (Owner and Reader actions while emulating failures)
279 | \E n \in rAliveNodes:
280 | \/ RReaderActions(n)
281 | \/ ROwnerActions(n)
282 | \/ RReplayActions(n)
283 | \/ RGetOwnership(n)
284 | \/ RNodeFailure(n) \* emulate node failures
285 |
286 |
287 | (***************************************************************************)
288 | (* The complete definition of the algorithm *)
289 | (***************************************************************************)
290 |
291 | Spec == RInit /\ [][RNext]_vars
292 |
293 | Invariants == /\ ([]RTypeOK)
294 | /\ ([]RConsistentInvariant)
295 | /\ ([]RSingleOnwerInvariant)
296 | /\ ([]ROnwerOnlyWriterInvariant)
297 | /\ ([]RMaxVersionDistanceInvariant)
298 | /\ ([]ROnwerHighestVersionInvariant)
299 |
300 | THEOREM Spec => Invariants
301 | =============================================================================
302 |
--------------------------------------------------------------------------------
/zeus.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ease-lab/Zeus/1dc21c5365ec44553e0aeba1a667204bdb5a3a12/zeus.png
--------------------------------------------------------------------------------