├── LICENSE
├── README.md
├── ownership_protocol
│   ├── ZeusOwnership.pdf
│   ├── ZeusOwnership.tla
│   ├── ZeusOwnershipFaults.pdf
│   ├── ZeusOwnershipFaults.tla
│   ├── ZeusOwnershipMeta.pdf
│   └── ZeusOwnershipMeta.tla
├── reliable_commit_protocol
│   ├── ZeusReliableCommit.pdf
│   └── ZeusReliableCommit.tla
└── zeus.png
/LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 
34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Zeus Ownership and Transactional Protocols 2 | 3 | 4 | 5 | *Zeus* is a datastore that offers fast locality-aware distributed transactions with strong consistency and availability. A brief description follows, and more details can be found in the [Eurosys'21](https://2021.eurosys.org/) paper. 6 | 7 | This is the publicly available artifact repository supporting *Zeus*. It contains the specifications of the two protocols that enable Zeus's locality-aware reliable transactions: the *ownership protocol* and the *reliable commit protocol*. The specifications are written in TLA+ and can be used to verify Zeus's correctness via model checking. 8 | 9 | ## Inspired by 10 | The Zeus protocols build on ideas of [Hermes](https://hermes-protocol.com/) and draw inspiration from cache coherence and hardware transactional memory, adapting their ideas to a replicated distributed setting for availability. 
These inherited concepts include the invalidation-based design of both proposed protocols and Zeus's approach of moving objects and granting exclusive write access (*ownership*) to the coordinator of a write transaction. 11 | 12 | ## Citation 13 | ``` 14 | @inproceedings{Katsarakis:21, 15 | author = {Katsarakis, Antonios and Ma, Yijun and Tan, Zhaowei and Bainbridge, Andrew and Balkwill, Matthew and Dragojevic, Aleksandar and Grot, Boris and Radunovic, Bozidar and Zhang, Yongguang}, 16 | title = {Zeus: Locality-Aware Distributed Transactions}, 17 | year = {2021}, 18 | publisher = {Association for Computing Machinery}, 19 | address = {New York, NY, USA}, 20 | booktitle = {Proceedings of the Sixteenth European Conference on Computer Systems}, 21 | location = {Online Event, United Kingdom}, 22 | series = {EuroSys '21} 23 | } 24 | ``` 25 | ---- 26 | # Locality-aware reliable transactions 27 | Transactions in Zeus involve three main phases: 28 | - __Prepare & Execute__: Execute the transaction locally;
29 | If *locality is not captured* (i.e., if the transaction accesses an object that is not local to the executor -- or lacks exclusive write access for a write transaction) 30 |
→ the object (and/or permissions) are acquired via the __ownership protocol__ 31 | - *Exclusive owner* guarantee: at any time, at most one node holds exclusive write access (i.e., is the *owner*) to an object 32 | - *Fast/slow-path* design: acquires ownership (and data) in at most 1.5 RTT, regardless of the requesting node, in the absence of faults 33 | - *Fault-tolerant*: each ownership protocol step is idempotent to recover from faults 34 | - __Local Commit__: *Any* traditional single-node (unreliable -- i.e., non-replicated) commit 35 | - __Reliable Commit__: Replicate updates to sharers for data availability: 36 | - *Fast Commit*: completes in 1 RTT and is also pipelined to hide latency 37 | - *Read-only optimized transactions*: strictly serializable and local from any replica 38 | - *Fault-tolerant*: each reliable commit step is idempotent to recover from faults 39 | 40 | ## Properties and Invariants 41 | __Faults__: The specification and model checking assume that crash-stop node faults and message reorderings may occur. 42 | Message losses in Zeus are handled via retransmissions. The exact failure model can be found in the paper.
43 | __Strong Consistency__: Zeus transactions guarantee the strongest form of consistency (i.e., they are strictly serializable).
44 | __Invariants__: A list of model-checked invariants provided by the protocols follows: 45 | * Amongst concurrent ownership requests to the same object, at most one succeeds. 46 | * At any time, there is at most one valid owner of an object. 47 | * A valid owner of an object has the most up-to-date data and version among live replicas. 48 | * All valid sharer vectors (stored by directory nodes and the owner) of an object agree on the object's sharers and ownership timestamp (o_ts). 49 | * The owner and readers are always correctly reflected by all valid sharer vectors. 50 | * A replica found in the valid state stores the latest committed value of an object. 51 | 52 | ---- 53 | 54 | ## Model checking 55 | To model check the protocols, download and install the TLA+ Toolbox so that you can run the *TLC* model checker on either the reliable commit or the ownership *TLA+* specification. We next list the steps to model check Zeus's *reliable commit protocol* (model checking the ownership protocol is similar). 56 | * __Prerequisites__: Any OS with Java 1.8 or later, to accommodate the *TLA+* Toolbox. 57 | * __Download and install__ the [TLA+ Toolbox](https://lamport.azurewebsites.net/tla/toolbox.html). 58 | * __Launch__ the TLA+ Toolbox. 59 | * __Create a spec__: *File>Open Spec>Add New Spec...*; browse to *zeus/reliable_commit_protocol/ZeusReliableCommit.tla* and use it as the root module to finish. 60 | * __Create a new Model__: Navigate to *TLC Model Checker>New model...* and create a model with the name "reliable-commit-model". 61 | * __Setup Behavior__: In the *Model Overview* tab of the model, under the *"What is the behavior spec?"* section, select *"Temporal formula"* and write *"Spec"*. 62 | * __Setup Constants__: Specify the values of the declared constants (under the *"What is the model?"* section). You may use low values for constants to check correctness without exploding the state space. 
An example configuration would be three nodes and maximum versions of two or three. To accomplish that, click on each constant and select the "ordinary assignment" option. Then fill the box for version and epoch constants (e.g., *R_MAX_VERSION*) with a small number (e.g., *"2"* or *"3"*), and for any node-related constants (e.g., *R_NODES*) with a set of nodes (e.g., *"{1,2,3}"* for three nodes). 63 | 64 | ### File Structure 65 | * __The reliable commit specification__ is a single TLA+ module in the *zeus/reliable_commit_protocol* folder. 66 | * __The ownership specification__ is decoupled into three modules under the *zeus/ownership_protocol* folder for simplicity. *ZeusOwnership.tla* and *ZeusOwnershipMeta.tla* specify (and can be used to model check) the ownership protocol in the absence of faults. The specification with failures is built on top of those in the module *ZeusOwnershipFaults.tla*. 67 | 68 | #### Caveats 69 | * The reliable commit specification does not include the pipelining optimization yet, and the ownership specification focuses on the slow path for now -- which is mandatory to model check faults. 70 | * Apart from acquiring ownership, the ownership protocol can be utilized to handle other dynamic sharding actions (e.g., removing or adding a reader replica), which were omitted from the paper. We may describe those in a separate online document if there is interest. 71 | 72 | ---- 73 | ### License 74 | This work is freely distributed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0 "Apache 2.0"). 
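As a reference for the constant setup described in the *Model checking* section above, the same assignments can also be written in a TLC model configuration file, which is handy when running TLC headlessly (outside the Toolbox). The sketch below is illustrative only: it assumes the `Spec` formula and the *R_NODES*/*R_MAX_VERSION* constants of *ZeusReliableCommit.tla*; check the module's `CONSTANTS` declarations for the full list (any additional constants, e.g., epoch bounds, need assignments too).

```
\* Illustrative TLC configuration sketch (not part of this repo).
\* Constant names are assumed from ZeusReliableCommit.tla.
SPECIFICATION Spec
CONSTANTS
    R_NODES = {1, 2, 3}    \* a small node set keeps the state space tractable
    R_MAX_VERSION = 3      \* low version bound, as suggested above
```

In the Toolbox, the "ordinary assignment" dialog described above produces the same effect.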
75 | 76 | ### Contact 77 | Antonios Katsarakis: `antonis.io` | [`antoniskatsarakis@yahoo.com`](mailto:antoniskatsarakis@yahoo.com?subject=[GitHub]%20Zeus%20Specification "Email") 78 | 79 | -------------------------------------------------------------------------------- /ownership_protocol/ZeusOwnership.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ease-lab/Zeus/1dc21c5365ec44553e0aeba1a667204bdb5a3a12/ownership_protocol/ZeusOwnership.pdf -------------------------------------------------------------------------------- /ownership_protocol/ZeusOwnership.tla: -------------------------------------------------------------------------------- 1 | --------------------------- MODULE ZeusOwnership --------------------------- 2 | EXTENDS ZeusOwnershipMeta 3 | \* This module specifies the full slow-path of the Zeus ownership protocol 4 | \* as it appears in the corresponding Eurosys'21 paper, without faults. 5 | \* It model checks its properties in the face of concurrent conflicting 6 | \* requests to change ownership and emulated reliable commits. 7 | 8 | \* Faults are added on top with the ZeusOwnershipFaults.tla spec 9 | 10 | ------------------------------------------------------------------------------------- 11 | \* WARNING: We need to make sure that requester REQs are executed at most once; this requires: 12 | \* an APP node to be sticky to its LB driver for a Key (unless failure) and send REQ msgs via a 13 | \* FIFO REQ channel so that the driver does not re-issue REQs that have already been completed in the past! 
14 | 15 | \* We emulate executing only once via committedRTS (committedREQs is used to check INVARIANTS) 16 | commit_REQ(o_ts, r_ts) == 17 | /\ committedRTS' = committedRTS \union {r_ts} 18 | /\ committedREQs' = committedREQs \union {o_ts} 19 | 20 | upd_t_meta(n, version, state, t_acks) == 21 | /\ tState' = [tState EXCEPT![n] = state] 22 | /\ tVersion' = [tVersion EXCEPT![n] = version] 23 | /\ tRcvACKs' = [tRcvACKs EXCEPT![n] = t_acks] 24 | 25 | upd_r_meta(n, ver, tb, id, type) == 26 | /\ rID' = [rID EXCEPT![n] = id] 27 | /\ rEID' = [rEID EXCEPT![n] = mEID] \* always update to latest mEID 28 | /\ rType' = [rType EXCEPT![n] = type] 29 | /\ rTS' = [rTS EXCEPT![n].ver = ver, ![n].tb = tb] 30 | 31 | \* to update the epoch id of last message issue 32 | upd_rEID(n) == upd_r_meta(n, rTS[n].ver, rTS[n].tb, rID[n], rType[n]) 33 | 34 | upd_o_meta(n, ver, tb, state, driver, vec, ACKs) == 35 | /\ oVector' = [oVector EXCEPT![n] = vec] 36 | /\ oRcvACKs' = [oRcvACKs EXCEPT![n] = ACKs] 37 | /\ oState' = [oState EXCEPT![n] = state] 38 | /\ oDriver' = [oDriver EXCEPT![n] = driver] 39 | /\ oTS' = [oTS EXCEPT![n].ver = ver, ![n].tb = tb] 40 | 41 | upd_o_meta_driver(n, ver, tb) == upd_o_meta(n, ver, tb, "drive", n, oVector[n], {}) 42 | upd_o_meta_add_ack(n, sender) == 43 | upd_o_meta(n, oTS[n].ver, oTS[n].tb, oState[n], oDriver[n], oVector[n], oRcvACKs[n] \union {sender}) 44 | 45 | upd_o_meta_apply_val(n, m) == 46 | /\ IF rTS[n].tb \notin mAliveNodes 47 | THEN upd_o_meta(n, oTS[n].ver, oTS[n].tb, "valid", 0, post_oVec(n, 0, oVector[n]), {}) 48 | ELSE upd_o_meta(n, oTS[n].ver, oTS[n].tb, "valid", 0, post_oVec(n, rTS[n].tb, oVector[n]), {}) 49 | 50 | upd_o_meta_apply_val_n_reset_o_state(n) == 51 | upd_o_meta(n, 0, 0, "valid", 0, [readers |-> {}, owner |-> 0], {}) 52 | 53 | ------------------------------------------------------------------------------------- 54 | \* REQUESTER Helper operators 55 | 56 | choose_req(n) == 57 | LET choice == CHOOSE x \in {0,1} : TRUE IN 58 | IF is_reader(n) 
59 | THEN /\ IF choice = 0 60 | THEN "change-owner" 61 | ELSE "remove-reader" 62 | ELSE /\ IF choice = 0 63 | THEN "add-owner" 64 | ELSE "add-reader" 65 | 66 | max_commited_ver(S, n) == IF \A i \in S: i.tb # n THEN [ver |-> 0, tb |-> 0] 67 | ELSE CHOOSE i \in S: /\ i.tb = n 68 | /\ \A j \in S: \/ j.tb # n 69 | \/ j.ver <= i.ver 70 | 71 | next_rTS_ver(n) == max_commited_ver(committedRTS, n).ver + 1 72 | 73 | upd_rs_meta_n_send_req(n, r_type) == 74 | /\ upd_r_meta(n, next_rTS_ver(n), n, 0, r_type) 75 | /\ upd_o_meta(n, 0, 0, "request", 0, [readers |-> {}, owner |-> 0], {}) 76 | /\ o_send_req([ver |-> next_rTS_ver(n), tb |-> n], 0, r_type) 77 | 78 | ------------------------------------------------------------------------------------- 79 | \* REQUESTER ACTIONS 80 | 81 | ORequesterREQ(n) == \* Requester issues a REQ 82 | /\ is_valid_requester(n) 83 | /\ is_reader(n) 84 | /\ next_rTS_ver(n) <= O_MAX_VERSION \* bound execution --> Bound this in reachable states 85 | /\ upd_rs_meta_n_send_req(n, "change-owner") \* to limit the state space only choose change ownership 86 | \* /\ upd_rs_meta_n_send_req(n, choose_req(n)) 87 | /\ unchanged_mtc 88 | 89 | \* Requester receives NACK and replays REQ w/ higher rID 90 | ORequesterNACK(n) == 91 | /\ is_in_progress_requester(n) 92 | /\ rID[n] < O_MAX_VERSION \* TODO: may Bound rID to number of APP_NODES instead 93 | /\ \E m \in oMsgs: o_rcv_nack(m, n) 94 | /\ upd_r_meta(n, rTS[n].ver, n, rID[n] + 1, rType[n]) 95 | /\ o_send_req([ver |-> rTS[n].ver, tb |-> n], rID[n] + 1, rType[n]) 96 | /\ unchanged_mtco 97 | 98 | ORequesterRESP(n) == \* Requester receives a RESP and sends a VAL to arbiters 99 | \E m \in oMsgs: 100 | /\ o_rcv_resp(m, n) 101 | /\ is_in_progress_requester(n) 102 | /\ commit_REQ(m.oTS, rTS[n]) 103 | /\ upd_t_meta(n, m.tVersion, "valid", tRcvACKs[n]) \* todo this is optional 104 | /\ upd_o_meta(n, m.oTS.ver, m.oTS.tb, "valid", 0, post_oVec(n, n, m.oVector), {}) 105 | /\ o_send_val(m.oTS) 106 | /\ unchanged_mtr 107 | 108 | 
ORequesterActions == 109 | \E n \in APP_LIVE_NODES: 110 | \/ ORequesterREQ (n) 111 | \/ ORequesterNACK(n) 112 | \/ ORequesterRESP(n) 113 | 114 | ------------------------------------------------------------------------------------- 115 | \* DRIVER ACTIONS 116 | ODriverINV(n, m) == 117 | /\ o_rcv_req(m) 118 | /\ oState[n] = "valid" 119 | /\ oTS[n].ver < O_MAX_VERSION \* bound execution --> Bound this in reachable states 120 | /\ upd_t_meta(n, 0, tState[n], tRcvACKs[n]) 121 | /\ upd_r_meta(n, m.rTS.ver, m.rTS.tb, m.rID, m.rType) 122 | /\ upd_o_meta_driver(n, oTS[n].ver + 1, n) 123 | /\ o_send_inv(n, n, [ver |-> oTS[n].ver + 1, tb |-> n], oVector[n], m.rTS, m.rID, m.rType) 124 | /\ unchanged_mc 125 | 126 | ODriverNACK(n, m) == 127 | /\ o_rcv_req(m) 128 | /\ rTS[n] # m.rTS 129 | /\ oState[n] # "valid" 130 | /\ msg_not_exists(o_rcv_nack, m.rTS.tb) \* NACK does not exist (bound state space) 131 | /\ o_send_nack(m.rTS, m.rID) 132 | /\ unchanged_mtrco 133 | 134 | ODriverACK(n, m) == 135 | /\ o_rcv_ack(m, n) 136 | /\ upd_o_meta_add_ack(n, m.sender) 137 | /\ IF m.tVersion # 0 138 | THEN upd_t_meta(n, m.tVersion, tState[n], tRcvACKs[n]) 139 | ELSE unchanged_t 140 | /\ unchanged_Mmrc 141 | 142 | ODriverRESP(n) == 143 | /\ oState[n] = "drive" 144 | /\ has_rcved_all_ACKs(n) 145 | /\ requester_is_alive(n) 146 | /\ msg_not_exists(o_rcv_resp, rTS[n].tb) \* RESP does not exist (bound state space) 147 | /\ o_send_resp(rTS[n], oTS[n], post_oVec(n, rTS[n].tb, oVector[n]), tVersion[n]) 148 | /\ unchanged_mtrco 149 | 150 | ODriverActions == 151 | \E n \in LB_LIVE_NODES: 152 | \/ ODriverRESP(n) 153 | \/ \E m \in oMsgs: 154 | \/ ODriverINV (n, m) 155 | \/ ODriverNACK(n, m) 156 | \/ ODriverACK (n, m) 157 | 158 | ------------------------------------------------------------------------------------- 159 | \* LB ARBITER ACTIONS 160 | inv_to_be_applied(n, m) == 161 | \/ o_rcv_inv_greater_ts(m, n) 162 | \/ (o_rcv_inv_equal_ts(m, n) /\ oState[n] = "invalid" /\ m.epochID > rEID[n]) 163 | 164 | 
check_n_apply_inv(n, m) == 165 | /\ inv_to_be_applied(n, m) 166 | /\ upd_r_meta(n, m.rTS.ver, m.rTS.tb, m.rID, m.rType) 167 | /\ upd_o_meta(n, m.oTS.ver, m.oTS.tb, "invalid", m.driver, m.oVector, {}) 168 | 169 | \* We do not model lost messages thus arbiter need not respond w/ INV when ts is smaller 170 | OLBArbiterINV(n, m) == 171 | /\ check_n_apply_inv(n, m) 172 | /\ \/ oState[n] # "drive" 173 | \/ o_send_nack(rTS[n], rID[n]) 174 | /\ o_send_ack(n, m.oTS, 0) 175 | /\ unchanged_mtc 176 | 177 | OLBArbiterVAL(n, m) == 178 | /\ o_rcv_val(m, n) 179 | /\ upd_o_meta_apply_val(n, m) 180 | /\ unchanged_Mmtrc 181 | 182 | OLBArbiterActions == 183 | \E n \in LB_LIVE_NODES: \E m \in oMsgs: 184 | \/ OLBArbiterINV(n, m) 185 | \/ OLBArbiterVAL(n, m) 186 | 187 | ------------------------------------------------------------------------------------- 188 | \* (O)wner or (R)eader ARBITER ACTIONS 189 | 190 | \* reader doesn't apply an INV but always responds with an ACK 191 | \* (and data if non-sharing rType and in tValid state) 192 | ORArbiterINV(n, m) == 193 | /\ is_reader(n) 194 | /\ tState[n] = "valid" 195 | /\ o_rcv_inv(m, n) 196 | /\ o_send_ack(n, m.oTS, tVersion[n]) 197 | /\ unchanged_mtrco 198 | 199 | OOArbiterINV(n, m) == 200 | /\ is_owner(n) 201 | /\ m.type = "S_INV" 202 | /\ m.oVector.owner = n \* otherwise owner lost a VAL --> SFMOArbiterINVLostVAL 203 | /\ tState[n] = "valid" 204 | /\ check_n_apply_inv(n, m) 205 | /\ o_send_ack(n, m.oTS, tVersion[n]) 206 | /\ unchanged_mtc 207 | 208 | OOArbiterVAL(n, m) == 209 | /\ o_rcv_val(m, n) 210 | /\ IF oVector[n].owner = n 211 | THEN /\ upd_o_meta_apply_val(n, m) 212 | ELSE /\ upd_o_meta_apply_val_n_reset_o_state(n) 213 | /\ unchanged_Mmtrc 214 | 215 | OAPPArbiterActions == 216 | \E n \in APP_LIVE_NODES: \E m \in oMsgs: 217 | \/ ORArbiterINV(n, m) 218 | \/ OOArbiterINV(n, m) 219 | \/ OOArbiterVAL(n, m) 220 | 221 | ------------------------------------------------------------------------------------- 222 | \* Owner actions emulating 
tx updates 223 | 224 | TOwnerINV(n) == 225 | /\ upd_t_meta(n, tVersion[n] + 1, "write", {}) 226 | /\ t_send(n, "T_INV", tVersion[n] + 1) 227 | /\ unchanged_mrco 228 | 229 | TOwnerACK(n) == 230 | \E m \in oMsgs: 231 | /\ t_rcv_ack(m, n) 232 | /\ upd_t_meta(n, tVersion[n], tState[n], tRcvACKs[n] \union {m.sender}) 233 | /\ unchanged_Mmrco 234 | 235 | TOwnerVAL(n) == 236 | /\ oVector[n].readers \subseteq tRcvACKs[n] \* has received all acks from readers 237 | /\ upd_t_meta(n, tVersion[n], "valid", {}) 238 | /\ t_send(n, "T_VAL", tVersion[n]) 239 | /\ unchanged_mrco 240 | 241 | \* Reader actions emulating tx updates 242 | TReaderINV(n) == 243 | \E m \in oMsgs: 244 | /\ t_rcv_inv(m, n) 245 | /\ m.tVersion > tVersion[n] 246 | /\ upd_t_meta(n, m.tVersion, "invalid", {}) 247 | /\ t_send(n, "T_ACK", m.tVersion) 248 | /\ unchanged_mrco 249 | 250 | TReaderVAL(n) == 251 | \E m \in oMsgs: 252 | /\ t_rcv_val(m, n) 253 | /\ m.tVersion = tVersion[n] 254 | /\ upd_t_meta(n, tVersion[n], "valid", {}) 255 | /\ unchanged_Mmrco 256 | 257 | TOwnerReaderActions == 258 | \E n \in APP_LIVE_NODES: 259 | \/ /\ is_valid_owner(n) 260 | /\ \/ TOwnerINV(n) 261 | \/ TOwnerACK(n) 262 | \/ TOwnerVAL(n) 263 | \/ /\ is_reader(n) 264 | /\ \/ TReaderINV(n) 265 | \/ TReaderVAL(n) 266 | 267 | ------------------------------------------------------------------------------------- 268 | \* Modeling Sharding protocol (Requester and Arbiter actions) 269 | ONext == 270 | \/ OInit_min_owner_rest_readers 271 | \/ ORequesterActions 272 | \/ ODriverActions 273 | \/ OLBArbiterActions 274 | \/ OAPPArbiterActions 275 | \* \/ TOwnerReaderActions 276 | 277 | (***************************************************************************) 278 | (* The complete definition of the algorithm *) 279 | (***************************************************************************) 280 | 281 | Spec == OInit /\ [][ONext]_vars 282 | 283 | Invariants == /\ ([]OTypeOK) 284 | /\ ([]CONSISTENT_DATA) /\ ([]ONLY_ONE_CONC_REQ_COMMITS) 285 | 
/\ ([]AT_MOST_ONE_OWNER) /\ ([]OWNER_LATEST_DATA) 286 | /\ ([]CONSISTENT_SHARERS) /\ ([]CONSISTENT_OVECTORS) 287 | 288 | 289 | THEOREM Spec => Invariants 290 | ------------------------------------------------------------------------------------- 291 | \* 292 | \*LSpec == Spec /\ WF_vars(ONext) 293 | \* 294 | \*LIVENESS == \E i \in LB_NODES: []<>(oState[i] = "valid" /\ oTS[i].ver > 3) 295 | \*----------------------------------------------------------------------------- 296 | \*THEOREM LSpec => LIVENESS 297 | ============================================================================= 298 | -------------------------------------------------------------------------------- /ownership_protocol/ZeusOwnershipFaults.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ease-lab/Zeus/1dc21c5365ec44553e0aeba1a667204bdb5a3a12/ownership_protocol/ZeusOwnershipFaults.pdf -------------------------------------------------------------------------------- /ownership_protocol/ZeusOwnershipFaults.tla: -------------------------------------------------------------------------------- 1 | --------------------------- MODULE ZeusOwnershipFaults --------------------------- 2 | 3 | EXTENDS ZeusOwnership 4 | 5 | sharing_ok == \A nn \in APP_LIVE_NODES: \/ ~has_data(nn) 6 | \/ tState[nn] = "valid" 7 | 8 | arb_replay(n) == 9 | /\ upd_rEID(n) 10 | /\ upd_o_meta_driver(n, oTS[n].ver, oTS[n].tb) 11 | /\ o_send_inv(n, n, oTS[n], oVector[n], rTS[n], rID[n], rType[n]) 12 | 13 | 14 | ------------------------------------------------------------------------------------- 15 | \* Requester replays msg after a failure 16 | OFRequester == 17 | \E n \in APP_LIVE_NODES: 18 | /\ oState[n] = "request" 19 | /\ rEID[n] < mEID 20 | /\ upd_rEID(n) 21 | /\ o_send_req(rTS[n], rID[n], rType[n]) 22 | /\ unchanged_mtco 23 | 24 | OFDriverRequester == \* waits for sharing-ok + computes next oVec transitions to valid + 25 | \* send vals with the proper changes 
in the oVec 26 | \E n \in LB_LIVE_NODES: 27 | /\ oState[n] = "drive" 28 | /\ has_rcved_all_ACKs(n) 29 | /\ ~requester_is_alive(n) 30 | /\ o_send_val(oTS[n]) 31 | /\ upd_o_meta(n, oTS[n].ver, oTS[n].tb, "valid", 0, post_oVec(n, 0, oVector[n]), {}) 32 | /\ unchanged_mtrc 33 | 34 | OFArbReplay == \* driver resets acks and replays msg on arbiter failures 35 | \* if the failed arbiter was an owner we need to wait for sharing-ok 36 | \* for convenience arb-replays happen on any failure (e.g., requester) 37 | \E n \in mAliveNodes: 38 | /\ (oState[n] = "drive" \/ oState[n] = "invalid") 39 | /\ (n \in LB_LIVE_NODES \/ oVector[n].owner = n) 40 | /\ rEID[n] < mEID 41 | /\ \/ oVector[n].owner \in mAliveNodes 42 | \/ sharing_ok 43 | /\ arb_replay(n) 44 | /\ unchanged_mtc 45 | 46 | OLBArbiterACK == \* ACK an INV message which has the same ts as the local s_ts but wasn't applied 47 | \E n \in LB_LIVE_NODES: \E m \in oMsgs: 48 | /\ ~inv_to_be_applied(n, m) 49 | /\ o_rcv_inv_equal_ts(m, n) 50 | /\ o_send_ack(n, m.oTS, 0) 51 | /\ unchanged_mtrco 52 | 53 | ------------------------------------------------------------------------------------- 54 | \* INV response to an owner who did an arb-replay due to a lost val 55 | OFMOArbiterLostVALOldReplay == 56 | \E l \in LB_LIVE_NODES: \E a \in APP_LIVE_NODES: 57 | /\ oState[a] = "drive" 58 | /\ oVector[a].owner = a 59 | /\ is_greaterTS(oTS[l], oTS[a]) 60 | /\ o_send_inv(l, l, oTS[l], oVector[l], rTS[l], rID[l], rType[l]) 61 | /\ unchanged_mtrco 62 | 63 | \* message failures 64 | OFMOArbiterINVLostVAL == \* An INV is received (w/ higher ts) by a non-valid owner 65 | \* who lost the VAL for the message that demoted it 66 | \E n \in APP_LIVE_NODES: \E m \in oMsgs: 67 | /\ oVector[n].owner = n 68 | /\ o_rcv_inv_greater_ts(m, n) 69 | /\ m.oVector.owner # n 70 | /\ upd_o_meta_apply_val_n_reset_o_state(n) 71 | /\ o_send_ack(n, m.oTS, tVersion[n]) 72 | /\ unchanged_mtrc 73 | 74 | OFMRequesterVALReplay == \* Requester receives a RESP (already applied) 75 |
\* and re-sends a VAL to arbiters 76 | \E n \in APP_LIVE_NODES: \E m \in oMsgs: 77 | /\ o_rcv_resp(m, n) 78 | /\ m.rTS.tb = n 79 | /\ m.oTS = oTS[n] 80 | /\ o_send_val(m.oTS) 81 | /\ unchanged_mtrco 82 | 83 | ------------------------------------------------------------------------------------- 84 | block_owner_failures_if_not_in_tx_valid_state(n) == 85 | \/ has_valid_data(n) 86 | \/ ~is_valid_owner(n) 87 | 88 | \* Emulate a node failure if there are more than 2 alive nodes in LIVE_NODE_SET 89 | nodeFailure(n, LIVE_NODE_SET) == 90 | /\ n \in LIVE_NODE_SET 91 | \* /\ block_owner_failures_if_not_in_tx_valid_state(n) 92 | /\ Cardinality(LIVE_NODE_SET) > 2 93 | \* Update Membership and epoch id 94 | /\ mEID' = mEID + 1 95 | /\ mAliveNodes' = mAliveNodes \ {n} 96 | \* Remove failed node from oVectors 97 | /\ oVector' = [l \in O_NODES |-> [readers |-> oVector[l].readers \ {n}, 98 | owner |-> IF oVector[l].owner = n 99 | THEN 0 100 | ELSE oVector[l].owner ]] 101 | /\ unchanged_Mtrc 102 | /\ UNCHANGED <<oTS, oState, oDriver, oRcvACKs>> 103 | 104 | ------------------------------------------------------------------------------------- 105 | FNext == 106 | \/ OFRequester 107 | \/ OFDriverRequester 108 | \/ OFArbReplay 109 | \/ OLBArbiterACK 110 | \/ OFMOArbiterINVLostVAL 111 | \/ OFMRequesterVALReplay 112 | \/ OFMOArbiterLostVALOldReplay 113 | 114 | OFNext == 115 | \/ ONext 116 | \/ FNext 117 | \/ \E n \in mAliveNodes: 118 | \/ nodeFailure(n, LB_LIVE_NODES) \* emulate LB node failures 119 | \/ nodeFailure(n, APP_LIVE_NODES) \* emulate application node failures 120 | 121 | (***************************************************************************) 122 | (* The complete definition of the algorithm *) 123 | (***************************************************************************) 124 | 125 | SFSpec == OInit /\ [][OFNext]_vars 126 | 127 | THEOREM SFSpec => Invariants 128 | ============================================================================= 129 |
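As a companion to the `nodeFailure` action in ZeusOwnershipFaults above, the following plain-Python sketch (all names hypothetical, not part of the Zeus codebase) mirrors its two effects: the membership epoch is bumped and the failed node is scrubbed from every sharing vector, but only while more than two nodes remain alive.

```python
# Illustrative sketch only: emulates the nodeFailure action of
# ZeusOwnershipFaults. The Membership class and node_failure helper are
# hypothetical names, not part of the spec or the Zeus implementation.

class Membership:
    def __init__(self, nodes):
        self.alive = set(nodes)   # mAliveNodes
        self.epoch = 0            # mEID

def node_failure(m, o_vectors, failed):
    """Remove `failed` only if more than 2 nodes are alive, mirroring the
    Cardinality(LIVE_NODE_SET) > 2 guard in the spec."""
    if failed not in m.alive or len(m.alive) <= 2:
        return False
    m.epoch += 1                  # mEID' = mEID + 1
    m.alive.discard(failed)      # mAliveNodes' = mAliveNodes \ {n}
    for vec in o_vectors.values():
        # Remove the failed node from every oVector
        vec["readers"].discard(failed)
        if vec["owner"] == failed:
            vec["owner"] = 0      # 0 is the spec's "no owner" default
    return True

m = Membership({1, 2, 3, 4})
vecs = {n: {"owner": 1, "readers": {2, 3}} for n in m.alive}
assert node_failure(m, vecs, 1)
assert m.epoch == 1 and 1 not in m.alive
assert all(v["owner"] == 0 for v in vecs.values())
```

Note how, as in the spec, the failure itself leaves all per-key ownership timestamps untouched; stale nodes only catch up later via the epoch-driven replay actions.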
-------------------------------------------------------------------------------- /ownership_protocol/ZeusOwnershipMeta.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ease-lab/Zeus/1dc21c5365ec44553e0aeba1a667204bdb5a3a12/ownership_protocol/ZeusOwnershipMeta.pdf -------------------------------------------------------------------------------- /ownership_protocol/ZeusOwnershipMeta.tla: -------------------------------------------------------------------------------- 1 | --------------------------- MODULE ZeusOwnershipMeta --------------------------- 2 | 3 | EXTENDS Integers, FiniteSets 4 | 5 | CONSTANTS \* LB_NODES and APP_NODES must not intersect and neither should contain 0. 6 | LB_NODES, 7 | APP_NODES, 8 | O_MAX_VERSION, 9 | O_MAX_FAILURES, 10 | O_MAX_DATA_VERSION 11 | 12 | 13 | VARIABLES \* variable prefixes --> o: ownership, r: request, t: transactional, m: membership 14 | \* VECTORS indexed by node_id 15 | oTS, 16 | oState, 17 | oDriver, 18 | oVector, \* No readers/owner: .readers = {} / .owner = 0 19 | oRcvACKs, 20 | \* 21 | rTS, 22 | rID, 23 | rType, 24 | rEID, \* since we do not have message-loss timeouts we use this to 25 | \* track the epoch of the last issued INVs for replays 26 | tState, 27 | tVersion, \* tVersion suffices to represent tData | = 0 --> no data | > 0 data (reader / owner) 28 | tRcvACKs, 29 | \* GLOBAL variables 30 | oMsgs, 31 | mAliveNodes, \* membership 32 | mEID, \* membership epoch id 33 | committedREQs, \* only to check the invariant that exactly one of the concurrent REQs is committed 34 | committedRTS \* only to emulate FIFO REQ channels (i.e., do not re-execute the same client requests) 35 | 36 | vars == << oTS, oState, oDriver, oVector, oRcvACKs, rTS, rID, rType, rEID, 37 | tState, tVersion, tRcvACKs, oMsgs, mAliveNodes, mEID, committedREQs, committedRTS>> 38 | 39 | \* Helper operators 40 | O_NODES == LB_NODES \union APP_NODES 41 | O_NODES_0 == O_NODES \union {0} 42 | LB_NODES_0 ==
LB_NODES \union {0} 43 | APP_NODES_0 == APP_NODES \union {0} 44 | LB_LIVE_NODES == LB_NODES \intersect mAliveNodes 45 | APP_LIVE_NODES == APP_NODES \intersect mAliveNodes 46 | LB_LIVE_ARBITERS(driver) == LB_LIVE_NODES \ {driver} \* all arbiters except driver and owner 47 | 48 | ASSUME LB_NODES \intersect APP_NODES = {} 49 | ASSUME \A k \in O_NODES: k # 0 \* we use 0 as the default noop 50 | 51 | ------------------------------------------------------------------------------------- 52 | \* Useful Unchanged shortcuts 53 | unchanged_M == UNCHANGED <<oMsgs>> 54 | unchanged_m == UNCHANGED <<mAliveNodes, mEID>> 55 | unchanged_t == UNCHANGED <<tState, tVersion, tRcvACKs>> 56 | unchanged_r == UNCHANGED <<rTS, rID, rType, rEID>> 57 | unchanged_c == UNCHANGED <<committedREQs, committedRTS>> 58 | unchanged_o == UNCHANGED <<oTS, oState, oDriver, oVector, oRcvACKs>> 59 | unchanged_mc == unchanged_m /\ unchanged_c 60 | unchanged_mtc == unchanged_mc /\ unchanged_t 61 | unchanged_mtr == unchanged_m /\ unchanged_t /\ unchanged_r 62 | unchanged_Mrc == unchanged_r /\ unchanged_c /\ unchanged_M 63 | unchanged_mrco == unchanged_mc /\ unchanged_r /\ unchanged_o 64 | unchanged_mtco == unchanged_mtc /\ unchanged_o 65 | unchanged_mtrc == unchanged_mtc /\ unchanged_r 66 | unchanged_Mtrc == unchanged_Mrc /\ unchanged_t 67 | unchanged_Mmrc == unchanged_Mrc /\ unchanged_m 68 | unchanged_mtrco == unchanged_mtrc /\ unchanged_o 69 | unchanged_Mmrco == unchanged_mrco /\ unchanged_M 70 | unchanged_Mmtrc == unchanged_mtrc /\ unchanged_M 71 | 72 | 73 | ------------------------------------------------------------------------------------- 74 | \* Type definitions 75 | Type_oTS == [ver: 0..O_MAX_VERSION, tb: LB_NODES_0] 76 | Type_rTS == [ver: 0..O_MAX_VERSION, tb: APP_NODES_0] 77 | Type_tState == {"valid", "invalid", "write"} \* readers can be in valid and invalid and owner in valid and write 78 | Type_oState == {"valid", "invalid", "drive", "request"} \* all nodes start from valid 79 | Type_rType == {"add-owner", "change-owner", "add-reader", "rm-reader", "NOOP"} 80 | Type_oVector == [readers: SUBSET APP_NODES, owner: APP_NODES_0] 81 | 82 |
Type_oMessage == \* Msgs exchanged by the sharding protocol 83 | [type: {"REQ"}, rTS : Type_rTS, 84 | rID : Nat, 85 | rType : Type_rType, 86 | epochID : 0..O_MAX_FAILURES] 87 | \union 88 | [type: {"NACK"}, rTS : Type_rTS, 89 | rID : Nat] 90 | \union 91 | [type: {"S_INV"}, sender : O_NODES, 92 | driver : O_NODES, 93 | rTS : Type_rTS, 94 | rID : Nat, 95 | oTS : Type_oTS, 96 | oVector : Type_oVector, 97 | rType : Type_rType, 98 | epochID : 0..O_MAX_FAILURES] 99 | \union 100 | [type: {"S_ACK"}, sender : O_NODES, 101 | oTS : Type_oTS, 102 | tVersion : 0..O_MAX_DATA_VERSION, \* emulates data send as well 103 | epochID : 0..O_MAX_FAILURES] 104 | \union 105 | [type: {"RESP"}, oVector : Type_oVector, 106 | oTS : Type_oTS, 107 | rTS : Type_rTS, 108 | \* preOwner , \* pre-request owner is not needed for model check (since we model bcast messages) 109 | tVersion : 0..O_MAX_DATA_VERSION, 110 | epochID : 0..O_MAX_FAILURES] 111 | \union 112 | [type: {"S_VAL"}, oTS : Type_oTS, 113 | epochID : 0..O_MAX_FAILURES] 114 | 115 | 116 | Type_tMessage == \* msgs exchanged by the transactional reliable commit protocol 117 | [type: {"T_INV", "T_ACK", "T_VAL"}, tVersion : Nat, 118 | sender : O_NODES, 119 | epochID : 0..O_MAX_FAILURES] 120 | 121 | 122 | ------------------------------------------------------------------------------------- 123 | \* Type check and initialization 124 | 125 | OTypeOK == \* The type correctness invariant 126 | /\ oTS \in [O_NODES -> Type_oTS] 127 | /\ oState \in [O_NODES -> Type_oState] 128 | /\ oDriver \in [O_NODES -> O_NODES_0] 129 | /\ oVector \in [O_NODES -> Type_oVector] 130 | /\ \A n \in O_NODES: oRcvACKs[n] \subseteq (O_NODES \ {n}) 131 | /\ rTS \in [O_NODES -> Type_rTS] 132 | /\ rID \in [O_NODES -> 0..O_MAX_VERSION] 133 | /\ rType \in [O_NODES -> Type_rType] 134 | /\ rEID \in [O_NODES -> 0..(Cardinality(O_NODES) - 1)] 135 | /\ tVersion \in [O_NODES -> 0..O_MAX_DATA_VERSION] 136 | /\ tState \in [O_NODES -> Type_tState] 137 | /\ \A n \in O_NODES: tRcvACKs[n] 
\subseteq (O_NODES \ {n}) 138 | /\ committedREQs \subseteq Type_oTS 139 | /\ committedRTS \subseteq Type_rTS 140 | /\ oMsgs \subseteq (Type_oMessage \union Type_tMessage) 141 | /\ mEID \in 0..(Cardinality(O_NODES) - 1) 142 | /\ mAliveNodes \subseteq O_NODES 143 | 144 | OInit == \* The initial predicate 145 | /\ oTS = [n \in O_NODES |-> [ver |-> 0, tb |-> 0]] 146 | /\ oState = [n \in O_NODES |-> "valid"] 147 | /\ oDriver = [n \in O_NODES |-> 0] 148 | /\ oVector = [n \in O_NODES |-> [readers |-> {}, owner |-> 0]] 149 | /\ oRcvACKs = [n \in O_NODES |-> {}] 150 | /\ rTS = [n \in O_NODES |-> [ver |-> 0, tb |-> 0]] 151 | /\ rID = [n \in O_NODES |-> 0] 152 | /\ rEID = [n \in O_NODES |-> 0] 153 | /\ rType = [n \in O_NODES |-> "NOOP"] 154 | /\ tVersion = [n \in O_NODES |-> 0] 155 | /\ tState = [n \in O_NODES |-> "valid"] 156 | /\ tRcvACKs = [n \in O_NODES |-> {}] 157 | /\ committedRTS = {} 158 | /\ committedREQs = {} 159 | /\ oMsgs = {} 160 | /\ mEID = 0 161 | /\ mAliveNodes = O_NODES 162 | 163 | Min(S) == CHOOSE x \in S: \A y \in S \ {x}: y > x 164 | set_wo_min(S) == S \ {Min(S)} 165 | 166 | \* First command, executed once after OInit to initialize the owner/readers and oVector state 167 | OInit_min_owner_rest_readers == 168 | /\ \A x \in O_NODES: tVersion[x] = 0 169 | /\ tVersion' = [n \in O_NODES |-> IF n \in LB_NODES THEN 0 ELSE 1] 170 | /\ oVector' = [n \in O_NODES |-> IF n \in set_wo_min(APP_NODES) 171 | THEN oVector[n] 172 | ELSE [readers |-> set_wo_min(APP_NODES), 173 | owner |-> Min(APP_NODES)]] 174 | /\ unchanged_Mmrc 175 | /\ UNCHANGED <<oTS, oState, oDriver, oRcvACKs, tState, tRcvACKs>> 176 | 177 | ------------------------------------------------------------------------------------- 178 | \* Helper functions 179 | has_data(n) == tVersion[n] > 0 180 | has_valid_data(n) == /\ has_data(n) 181 | /\ tState[n] = "valid" 182 | 183 | is_owner(n) == /\ has_data(n) 184 | /\ oVector[n].owner = n 185 | 186 | is_valid_owner(n) == /\ is_owner(n) 187 | /\ oState[n] = "valid" 188 | 189 | is_reader(n) == /\ has_data(n) 190 | /\
~is_owner(n) 191 | /\ n \notin LB_NODES 192 | 193 | is_live_arbiter(n) == \/ n \in LB_LIVE_NODES 194 | \/ is_owner(n) 195 | 196 | is_valid_live_arbiter(n) == /\ is_live_arbiter(n) 197 | /\ oState[n] = "valid" 198 | 199 | is_requester(n) == 200 | /\ n \in APP_LIVE_NODES 201 | /\ ~is_owner(n) 202 | 203 | is_valid_requester(n) == 204 | /\ is_requester(n) 205 | /\ oState[n] = "valid" 206 | 207 | is_in_progress_requester(n) == 208 | /\ is_requester(n) 209 | /\ oState[n] = "request" 210 | 211 | requester_is_alive(n) == rTS[n].tb \in mAliveNodes 212 | 213 | ------------------------------------------------------------------------------------- 214 | \* Timestamp Comparison Helper functions 215 | is_equalTS(ts1, ts2) == 216 | /\ ts1.ver = ts2.ver 217 | /\ ts1.tb = ts2.tb 218 | 219 | is_greaterTS(ts1, ts2) == 220 | \/ ts1.ver > ts2.ver 221 | \/ /\ ts1.ver = ts2.ver 222 | /\ ts1.tb > ts2.tb 223 | 224 | is_greatereqTS(ts1, ts2) == 225 | \/ is_equalTS(ts1, ts2) 226 | \/ is_greaterTS(ts1, ts2) 227 | 228 | is_smallerTS(ts1, ts2) == ~is_greatereqTS(ts1, ts2) 229 | 230 | ------------------------------------------------------------------------------------- 231 | \* Request type Helper functions 232 | is_non_sharing_req(n) == (rType[n] = "add-owner" \/ rType[n] = "add-reader") 233 | 234 | \* Post o_vector based on request type and r (requester or 0 if requester is not alive) 235 | post_oVec(n, r, pre_oVec) == 236 | IF (rType[n] = "add-owner" \/ rType[n] = "change-owner") 237 | THEN [owner |-> r, 238 | readers |-> (pre_oVec.readers \union {pre_oVec.owner}) \ {r, 0}] 239 | ELSE [owner |-> pre_oVec.owner, 240 | readers |-> IF rType[n] = "rm-reader" 241 | THEN pre_oVec.readers \ {r, 0} 242 | ELSE \* rType[n] = "add-reader" 243 | (pre_oVec.readers \union {r}) \ {0}] 244 | 245 | 246 | ------------------------------------------------------------------------------------- 247 | \* Message Helper functions 248 | 249 | \* Used only to emulate FIFO REQ channels (and not re-execute already
completed REQs) 250 | not_completed_rTS(r_ts) == \A c_rTS \in committedRTS: c_rTS # r_ts 251 | 252 | \* Messages in oMsgs are only appended to this variable (not removed once delivered) 253 | \* intentionally to check the protocol's tolerance to duplicates and reorderings 254 | send_omsg(m) == oMsgs' = oMsgs \union {m} 255 | 256 | o_send_req(r_ts, r_id, r_type) == 257 | send_omsg([type |-> "REQ", 258 | rTS |-> r_ts, 259 | rID |-> r_id, 260 | rType |-> r_type, 261 | epochID |-> mEID ]) 262 | 263 | o_send_nack(r_ts, r_id) == 264 | send_omsg([type |-> "NACK", 265 | rTS |-> r_ts, 266 | rID |-> r_id]) 267 | 268 | o_send_inv(sender, driver, o_ts, o_vec, r_ts, r_id, r_type) == 269 | send_omsg([type |-> "S_INV", 270 | sender |-> sender, 271 | driver |-> driver, 272 | oTS |-> o_ts, 273 | oVector |-> o_vec, 274 | rTS |-> r_ts, 275 | rID |-> r_id, 276 | rType |-> r_type, 277 | epochID |-> mEID ]) 278 | 279 | o_send_ack(sender, o_ts, t_version) == 280 | send_omsg([type |-> "S_ACK", 281 | sender |-> sender, 282 | oTS |-> o_ts, 283 | tVersion |-> t_version, 284 | epochID |-> mEID ]) 285 | 286 | o_send_resp(r_ts, o_ts, o_vec, t_version) == 287 | send_omsg([type |-> "RESP", 288 | oVector |-> o_vec, 289 | oTS |-> o_ts, 290 | rTS |-> r_ts, 291 | tVersion |-> t_version, 292 | epochID |-> mEID ]) 293 | 294 | o_send_val(o_ts) == 295 | send_omsg([type |-> "S_VAL", 296 | oTS |-> o_ts, 297 | epochID |-> mEID ]) 298 | 299 | \* Operators to check received messages (m stands for message) 300 | o_rcv_req(m) == 301 | /\ m.type = "REQ" 302 | /\ m.epochID = mEID 303 | /\ not_completed_rTS(m.rTS) 304 | 305 | o_rcv_nack(m, receiver) == 306 | /\ m.type = "NACK" 307 | /\ m.rTS = rTS[receiver] 308 | /\ m.rID = rID[receiver] 309 | 310 | o_rcv_resp(m, receiver) == 311 | /\ m.type = "RESP" 312 | /\ m.epochID = mEID 313 | /\ m.rTS = rTS[receiver] 314 | 315 | o_rcv_inv(m, receiver) == 316 | /\ m.type = "S_INV" 317 | /\ m.epochID = mEID 318 | /\ m.sender # receiver 319 | 320 | o_rcv_inv_equal_ts(m, receiver) ==
321 | /\ o_rcv_inv(m, receiver) 322 | /\ is_equalTS(m.oTS, oTS[receiver]) 323 | 324 | o_rcv_inv_smaller_ts(m, receiver) == 325 | /\ o_rcv_inv(m, receiver) 326 | /\ is_smallerTS(m.oTS, oTS[receiver]) 327 | 328 | o_rcv_inv_greater_ts(m, receiver) == 329 | /\ o_rcv_inv(m, receiver) 330 | /\ is_greaterTS(m.oTS, oTS[receiver]) 331 | 332 | o_rcv_inv_greatereq_ts(m, receiver) == 333 | /\ o_rcv_inv(m, receiver) 334 | /\ ~is_smallerTS(m.oTS, oTS[receiver]) 335 | 336 | o_rcv_ack(m, receiver) == 337 | /\ m.type = "S_ACK" 338 | /\ m.epochID = mEID 339 | /\ m.sender # receiver 340 | /\ oState[receiver] = "drive" 341 | /\ m.sender \notin oRcvACKs[receiver] 342 | /\ is_equalTS(m.oTS, oTS[receiver]) 343 | 344 | o_rcv_val(m, receiver) == 345 | /\ m.type = "S_VAL" 346 | /\ m.epochID = mEID 347 | /\ oState[receiver] # "valid" 348 | /\ is_equalTS(m.oTS, oTS[receiver]) 349 | 350 | 351 | \* Used to avoid re-issuing messages that already exist (and to bound the state space) 352 | msg_not_exists(o_rcv_msg(_, _), receiver) == 353 | ~\E mm \in oMsgs: o_rcv_msg(mm, receiver) 354 | 355 | 356 | 357 | rcved_acks_from_set(n, set) == set \subseteq oRcvACKs[n] 358 | 359 | \* Check if all acknowledgments from arbiters have been received 360 | has_rcved_all_ACKs(n) == 361 | /\ rEID[n] = mEID 362 | /\ IF oVector[n].owner # 0 363 | THEN rcved_acks_from_set(n, {oVector[n].owner} \union LB_LIVE_ARBITERS(n)) 364 | ELSE \/ /\ ~requester_is_alive(n) 365 | /\ rcved_acks_from_set(n, LB_LIVE_ARBITERS(n)) 366 | \/ /\ oVector[n].readers # {} 367 | /\ \E x \in oVector[n].readers: rcved_acks_from_set(n, {x} \union LB_LIVE_ARBITERS(n)) 368 | ------------------------------------------------------------------------------------- 369 | \* message helper functions related to transactions 370 | t_send(n, msg_type, t_ver) == 371 | send_omsg([type |-> msg_type, 372 | tVersion |-> t_ver, 373 | sender |-> n, 374 | epochID |-> mEID ]) 375 | 376 | t_rcv_inv(m, receiver) == 377 | /\ m.type = "T_INV" 378 | /\ m.epochID = mEID 379 |
/\ m.sender # receiver 380 | 381 | t_rcv_ack(m, receiver) == 382 | /\ m.type = "T_ACK" 383 | /\ m.epochID = mEID 384 | /\ m.sender # receiver 385 | /\ tState[receiver] = "write" 386 | /\ m.sender \notin tRcvACKs[receiver] 387 | /\ m.tVersion = tVersion[receiver] 388 | 389 | t_rcv_val(m, receiver) == 390 | /\ m.type = "T_VAL" 391 | /\ m.epochID = mEID 392 | /\ tState[receiver] # "valid" 393 | /\ m.tVersion = tVersion[receiver] 394 | 395 | ------------------------------------------------------------------------------------- 396 | \* Protocol Invariants: 397 | 398 | \* Valid data are consistent 399 | CONSISTENT_DATA == 400 | \A k,n \in APP_LIVE_NODES: \/ ~has_valid_data(k) 401 | \/ ~has_valid_data(n) 402 | \/ tVersion[n] = tVersion[k] 403 | 404 | \* Amongst concurrent sharing requests only one succeeds 405 | \* The invariant that we cannot have two REQs committed with the same version 406 | \* (i.e., that read and modified the same sharing vector) 407 | ONLY_ONE_CONC_REQ_COMMITS == 408 | \A x,y \in committedREQs: \/ x.ver # y.ver 409 | \/ x.tb = y.tb 410 | 411 | \* There is always at most one valid owner 412 | AT_MOST_ONE_OWNER == 413 | \A n,m \in mAliveNodes: \/ ~is_valid_owner(n) 414 | \/ ~is_valid_owner(m) 415 | \/ m = n 416 | 417 | \* Valid owner has the most up-to-date data and version among live replicas 418 | OWNER_LATEST_DATA == 419 | \A o,k \in mAliveNodes: \/ ~is_valid_owner(o) 420 | \/ ~has_data(o) 421 | \/ tVersion[o] >= tVersion[k] 422 | 423 | \* All valid sharers (LB + owner) agree on their sharing vectors (and TS) 424 | CONSISTENT_SHARERS == 425 | \A k,n \in mAliveNodes: \/ ~is_valid_live_arbiter(n) 426 | \/ ~is_valid_live_arbiter(k) 427 | \/ /\ oTS[n] = oTS[k] 428 | /\ oVector[n] = oVector[k] 429 | 430 | 431 | CONSISTENT_OVECTORS_Fwd == 432 | \A n \in mAliveNodes: \/ ~is_valid_live_arbiter(n) 433 | \/ /\ \A r \in oVector[n].readers: 434 | /\ has_data(r) 435 | /\ ~is_valid_owner(r) 436 | /\ \/ oVector[n].owner = 0 437 | \/ is_owner(oVector[n].owner) 438
| 439 | CONSISTENT_OVECTORS_Reverse_owner == 440 | \A o,n \in mAliveNodes: \/ ~is_valid_owner(o) 441 | \/ ~is_valid_live_arbiter(n) 442 | \/ oVector[n].owner = o 443 | 444 | CONSISTENT_OVECTORS_Reverse_readers == 445 | \A r,n \in mAliveNodes: \/ ~is_reader(r) 446 | \/ ~is_valid_live_arbiter(n) 447 | \/ r \in oVector[n].readers 448 | 449 | \* The owner and readers are always correctly reflected by any valid sharing vectors 450 | CONSISTENT_OVECTORS == 451 | /\ CONSISTENT_OVECTORS_Fwd 452 | /\ CONSISTENT_OVECTORS_Reverse_owner 453 | /\ CONSISTENT_OVECTORS_Reverse_readers 454 | 455 | ============================================================================= 456 | -------------------------------------------------------------------------------- /reliable_commit_ptrotocol/ZeusReliableCommit.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ease-lab/Zeus/1dc21c5365ec44553e0aeba1a667204bdb5a3a12/reliable_commit_ptrotocol/ZeusReliableCommit.pdf -------------------------------------------------------------------------------- /reliable_commit_ptrotocol/ZeusReliableCommit.tla: -------------------------------------------------------------------------------- 1 | ------------------------------- MODULE ZeusReliableCommit ------------------------------- 2 | \* Specification of Zeus's reliable commit protocol presented in the Zeus paper 3 | \* that appears in Eurosys'21. 4 | \* This module includes everything but the pipelining optimization presented in the paper. 
5 | 6 | \* Model check passed [@ 21st of Jan 2021] with the following parameters: 7 | \* R_NODES = {0, 1, 2} 8 | \* R_MAX_EPOCH = 4 9 | \* R_MAX_VERSION = 4 10 | 11 | EXTENDS Integers 12 | 13 | CONSTANTS R_NODES, 14 | R_MAX_EPOCH, 15 | R_MAX_VERSION 16 | 17 | VARIABLES rMsgs, 18 | rKeyState, 19 | rKeySharers, 20 | rKeyVersion, 21 | rKeyRcvedACKs, 22 | rKeyLastWriter, 23 | rNodeEpochID, 24 | rAliveNodes, 25 | rEpochID 26 | 27 | vars == << rMsgs, rKeyState, rKeySharers, rKeyVersion, rKeyRcvedACKs, 28 | rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID >> 29 | ----------------------------------------------------------------------------- 30 | \* The consistent invariant: all alive nodes in valid state should have the same value / TS 31 | RConsistentInvariant == 32 | \A k,s \in rAliveNodes: \/ rKeyState[k] /= "valid" 33 | \/ rKeyState[s] /= "valid" 34 | \/ rKeyVersion[k] = rKeyVersion[s] 35 | 36 | RMaxVersionDistanceInvariant == \* this does not hold w/ the pipelining optimization 37 | \A k,s \in rAliveNodes: 38 | \/ rKeyVersion[k] <= rKeyVersion[s] + 1 39 | \/ rKeyVersion[s] <= rKeyVersion[k] + 1 40 | 41 | RSingleOnwerInvariant == 42 | \A k,s \in rAliveNodes: 43 | \/ rKeySharers[k] /= "owner" 44 | \/ rKeySharers[s] /= "owner" 45 | \/ k = s 46 | 47 | ROnwerOnlyWriterInvariant == 48 | \A k \in rAliveNodes: 49 | \/ rKeyState[k] /= "write" 50 | \/ rKeySharers[k] = "owner" 51 | 52 | ROnwerHighestVersionInvariant == \* owner has the highest version among alive nodes 53 | \A k,s \in rAliveNodes: 54 | \/ /\ rKeySharers[s] /= "owner" 55 | /\ rKeySharers[k] /= "owner" 56 | \/ 57 | /\ rKeySharers[k] = "owner" 58 | /\ rKeyVersion[k] >= rKeyVersion[s] 59 | \/ 60 | /\ rKeySharers[s] = "owner" 61 | /\ rKeyVersion[s] >= rKeyVersion[k] 62 | 63 | ----------------------------------------------------------------------------- 64 | 65 | RMessage == \* Messages exchanged by the Protocol 66 | [type: {"INV", "ACK"}, sender : R_NODES, 67 | epochID : 0..R_MAX_EPOCH, 68 | version : 
0..R_MAX_VERSION] 69 | \union 70 | [type: {"VAL"}, epochID : 0..R_MAX_EPOCH, 71 | version : 0..R_MAX_VERSION] 72 | 73 | 74 | RTypeOK == \* The type correctness invariant 75 | /\ rMsgs \subseteq RMessage 76 | /\ rAliveNodes \subseteq R_NODES 77 | /\ \A n \in R_NODES: rKeyRcvedACKs[n] \subseteq (R_NODES \ {n}) 78 | /\ rNodeEpochID \in [R_NODES -> 0..R_MAX_EPOCH] 79 | /\ rKeyLastWriter \in [R_NODES -> R_NODES] 80 | /\ rKeyVersion \in [R_NODES -> 0..R_MAX_VERSION] 81 | /\ rKeySharers \in [R_NODES -> {"owner", "reader", "non-sharer"}] 82 | /\ rKeyState \in [R_NODES -> {"valid", "invalid", "write", "replay"}] 83 | 84 | 85 | RInit == \* The initial predicate 86 | /\ rMsgs = {} 87 | /\ rEpochID = 0 88 | /\ rAliveNodes = R_NODES 89 | /\ rKeyVersion = [n \in R_NODES |-> 0] 90 | /\ rNodeEpochID = [n \in R_NODES |-> 0] 91 | /\ rKeyRcvedACKs = [n \in R_NODES |-> {}] 92 | /\ rKeySharers = [n \in R_NODES |-> "reader"] 93 | /\ rKeyState = [n \in R_NODES |-> "valid"] 94 | /\ rKeyLastWriter = [n \in R_NODES |-> CHOOSE k \in R_NODES: 95 | \A m \in R_NODES: k <= m] 96 | 97 | ----------------------------------------------------------------------------- 98 | 99 | RNoChanges_in_membership == UNCHANGED <<rAliveNodes, rEpochID>> 100 | 101 | RNoChanges_but_membership == 102 | UNCHANGED <<rMsgs, rKeyState, rKeySharers, rKeyVersion, 103 | rKeyRcvedACKs, rKeyLastWriter, 104 | rNodeEpochID>> 105 | 106 | RNoChanges == 107 | /\ RNoChanges_in_membership 108 | /\ RNoChanges_but_membership 109 | 110 | ----------------------------------------------------------------------------- 111 | \* A buffer maintaining all network messages.
Messages are only appended to 112 | \* this variable (not removed once delivered) intentionally to check the 113 | \* protocol's tolerance to duplicates and reorderings 114 | RSend(m) == rMsgs' = rMsgs \union {m} 115 | 116 | \* Check if all acknowledgments for a write have been received 117 | RAllACKsRcved(n) == (rAliveNodes \ {n}) \subseteq rKeyRcvedACKs[n] 118 | 119 | RIsAlive(n) == n \in rAliveNodes 120 | 121 | RNodeFailure(n) == \* Emulate a node failure 122 | \* Make sure that there are at least 3 alive nodes before killing a node 123 | /\ \E k,m \in rAliveNodes: /\ k /= n 124 | /\ m /= n 125 | /\ m /= k 126 | /\ rEpochID' = rEpochID + 1 127 | /\ rAliveNodes' = rAliveNodes \ {n} 128 | /\ RNoChanges_but_membership 129 | 130 | ----------------------------------------------------------------------------- 131 | RNewOwner(n) == 132 | /\ \A k \in rAliveNodes: 133 | /\ rKeySharers[k] /= "owner" 134 | /\ \/ /\ rKeyState[k] = "valid" \* all alive replicas are in valid state 135 | /\ rKeySharers[k] = "reader" \* and there is no alive owner 136 | \/ /\ rKeySharers[k] = "non-sharer" \* and there is no alive owner 137 | /\ rKeySharers' = [rKeySharers EXCEPT ![n] = "owner"] 138 | /\ UNCHANGED <<rMsgs, rKeyState, rKeyVersion, rKeyRcvedACKs, 139 | rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID>> 140 | 141 | ROverthrowOwner(n) == 142 | \E k \in rAliveNodes: 143 | /\ rKeyState[k] = "valid" 144 | /\ rKeySharers[k] = "owner" 145 | /\ rKeySharers' = [rKeySharers EXCEPT ![n] = "owner", 146 | ![k] = "reader"] 147 | /\ UNCHANGED <<rMsgs, rKeyState, rKeyVersion, rKeyRcvedACKs, 148 | rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID>> 149 | 150 | RGetOwnership(n) == 151 | /\ rKeySharers[n] /= "owner" 152 | /\ \A x \in rAliveNodes: rNodeEpochID[x] = rEpochID \*TODO may move this to RNewOwner 153 | /\ \/ ROverthrowOwner(n) 154 | \/ RNewOwner(n) 155 | ----------------------------------------------------------------------------- 156 | 157 | RRead(n) == \* Execute a read 158 | /\ rNodeEpochID[n] = rEpochID 159 | /\ rKeyState[n] = "valid" 160 | /\ RNoChanges 161 | 162 | RRcvInv(n) == \* Process a received invalidation 163 | \E m \in rMsgs: 164 | /\ m.type = "INV" 165 | /\ m.epochID =
rEpochID 166 | /\ m.sender /= n 167 | /\ m.sender \in rAliveNodes 168 | \* always acknowledge a received invalidation (irrelevant to the timestamp) 169 | /\ RSend([type |-> "ACK", 170 | epochID |-> rEpochID, 171 | sender |-> n, 172 | version |-> m.version]) 173 | /\ \/ m.version > rKeyVersion[n] 174 | /\ rKeyState[n] \in {"valid", "invalid", "replay"} 175 | /\ rKeyState' = [rKeyState EXCEPT ![n] = "invalid"] 176 | /\ rKeyVersion' = [rKeyVersion EXCEPT ![n] = m.version] 177 | /\ rKeyLastWriter' = [rKeyLastWriter EXCEPT ![n] = m.sender] 178 | \/ m.version <= rKeyVersion[n] 179 | /\ UNCHANGED <<rKeyState, rKeyVersion, rKeyLastWriter>> 180 | /\ UNCHANGED <<rKeySharers, rKeyRcvedACKs, rNodeEpochID, rAliveNodes, rEpochID>> 181 | 182 | RRcvVal(n) == \* Process a received validation 183 | \E m \in rMsgs: 184 | /\ rKeyState[n] /= "valid" 185 | /\ m.type = "VAL" 186 | /\ m.epochID = rEpochID 187 | /\ m.version = rKeyVersion[n] 188 | /\ rKeyState' = [rKeyState EXCEPT ![n] = "valid"] 189 | /\ UNCHANGED <<rMsgs, rKeySharers, rKeyVersion, rKeyRcvedACKs, 190 | rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID>> 191 | 192 | RReaderActions(n) == \* Actions of a write follower 193 | \/ RRead(n) 194 | \/ RRcvInv(n) 195 | \/ RRcvVal(n) 196 | 197 | ------------------------------------------------------------------------------------- 198 | 199 | RWrite(n) == 200 | /\ rNodeEpochID[n] = rEpochID 201 | /\ rKeySharers[n] \in {"owner"} 202 | /\ rKeyState[n] \in {"valid"} \* May add invalid state here as well 203 | /\ rKeyVersion[n] < R_MAX_VERSION 204 | /\ rKeyLastWriter' = [rKeyLastWriter EXCEPT ![n] = n] 205 | /\ rKeyRcvedACKs' = [rKeyRcvedACKs EXCEPT ![n] = {}] 206 | /\ rKeyState' = [rKeyState EXCEPT ![n] = "write"] 207 | /\ rKeyVersion' = [rKeyVersion EXCEPT ![n] = rKeyVersion[n] + 1] 208 | /\ RSend([type |-> "INV", 209 | epochID |-> rEpochID, 210 | sender |-> n, 211 | version |-> rKeyVersion[n] + 1]) 212 | /\ UNCHANGED <<rKeySharers, rNodeEpochID, rAliveNodes, rEpochID>> 213 | 214 | RRcvAck(n) == \* Process a received acknowledgment 215 | \E m \in rMsgs: 216 | /\ m.type = "ACK" 217 | /\ m.epochID = rEpochID 218 | /\ m.sender /= n 219 | /\ m.version = rKeyVersion[n] 220 | /\ m.sender \notin rKeyRcvedACKs[n] 221 | /\ rKeyState[n] \in {"write", "replay"} 222 | /\ rKeyRcvedACKs' = [rKeyRcvedACKs EXCEPT ![n] = 223 | rKeyRcvedACKs[n] \union {m.sender}] 224 | /\ UNCHANGED <<rMsgs, rKeyState, rKeySharers, rKeyVersion, 225 | rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID>> 226 | 227 | RSendVals(n) == \* Send validations once received acknowledgments from all alive nodes 228 | /\ rKeyState[n] \in {"write", "replay"} 229 | /\ RAllACKsRcved(n) 230 | /\ rKeyState' = [rKeyState EXCEPT![n] = "valid"] 231 | /\ RSend([type |-> "VAL", 232 | epochID |-> rEpochID, 233 | version |-> rKeyVersion[n]]) 234 | /\ UNCHANGED <<rKeySharers, rKeyVersion, rKeyRcvedACKs, 235 | rKeyLastWriter, rNodeEpochID, rAliveNodes, rEpochID>> 236 | 237 | ROwnerActions(n) == \* Actions of a read/write coordinator 238 | \/ RRead(n) 239 | \/ RWrite(n) 240 | \/ RRcvAck(n) 241 | \/ RSendVals(n) 242 | 243 | ------------------------------------------------------------------------------------- 244 | 245 | RWriteReplay(n) == \* Execute a write-replay 246 | /\ rKeyLastWriter' = [rKeyLastWriter EXCEPT ![n] = n] 247 | /\ rKeyRcvedACKs' = [rKeyRcvedACKs EXCEPT ![n] = {}] 248 | /\ rKeyState' = [rKeyState EXCEPT ![n] = "replay"] 249 | /\ RSend([type |-> "INV", 250 | sender |-> n, 251 | epochID |-> rEpochID, 252 | version |-> rKeyVersion[n]]) 253 | /\ UNCHANGED <<rKeySharers, rKeyVersion, rNodeEpochID, rAliveNodes, rEpochID>> 254 | 255 | RLocalWriteReplay(n) == 256 | /\ \/ rKeySharers[n] = "owner" 257 | \/ rKeyState[n] = "replay" 258 | /\ RWriteReplay(n) 259 | 260 | RFailedNodeWriteReplay(n) == 261 | /\ ~RIsAlive(rKeyLastWriter[n]) 262 | /\ rKeyState[n] = "invalid" 263 | /\ RWriteReplay(n) 264 | 265 | RUpdateLocalEpochID(n) == 266 | /\ rKeyState[n] = "valid" 267 | /\ rNodeEpochID' = [rNodeEpochID EXCEPT![n] = rEpochID] 268 | /\ UNCHANGED <<rMsgs, rKeyState, rKeySharers, rKeyVersion, rKeyRcvedACKs, 269 | rKeyLastWriter, rAliveNodes, rEpochID>> 270 | 271 | RReplayActions(n) == 272 | /\ rNodeEpochID[n] < rEpochID 273 | /\ \/ RLocalWriteReplay(n) 274 | \/ RFailedNodeWriteReplay(n) 275 | \/ RUpdateLocalEpochID(n) 276 | 277 | ------------------------------------------------------------------------------------- 278 | RNext == \* Modeling protocol (Owner and Reader actions while emulating failures) 279 | \E n \in rAliveNodes: 280 | \/ RReaderActions(n) 281 | \/ ROwnerActions(n) 282 | \/ RReplayActions(n) 283 | \/
RGetOwnership(n) 284 | \/ RNodeFailure(n) \* emulate node failures 285 | 286 | 287 | (***************************************************************************) 288 | (* The complete definition of the algorithm *) 289 | (***************************************************************************) 290 | 291 | Spec == RInit /\ [][RNext]_vars 292 | 293 | Invariants == /\ ([]RTypeOK) 294 | /\ ([]RConsistentInvariant) 295 | /\ ([]RSingleOnwerInvariant) 296 | /\ ([]ROnwerOnlyWriterInvariant) 297 | /\ ([]RMaxVersionDistanceInvariant) 298 | /\ ([]ROnwerHighestVersionInvariant) 299 | 300 | THEOREM Spec => Invariants 301 | ============================================================================= 302 | -------------------------------------------------------------------------------- /zeus.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ease-lab/Zeus/1dc21c5365ec44553e0aeba1a667204bdb5a3a12/zeus.png --------------------------------------------------------------------------------
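For readers less familiar with TLA+, here is a minimal single-key sketch of the invalidation round that ZeusReliableCommit models (RWrite sends INVs, readers acknowledge in RRcvInv, and the owner validates in RSendVals once all ACKs arrive). It is an illustration only, written in plain Python with hypothetical names; epochs, node failures, message loss, and write replays are deliberately omitted.

```python
# Illustrative single-key sketch of the INV -> ACK -> VAL round in
# ZeusReliableCommit. All names are hypothetical; epochs, failures, and
# write replays are omitted for brevity.

class Replica:
    def __init__(self, name):
        self.name = name
        self.version = 0          # rKeyVersion
        self.state = "valid"      # rKeyState

    def recv_inv(self, version):
        # Mirrors RRcvInv: invalidate only on a strictly higher version,
        # but always acknowledge the invalidation.
        if version > self.version:
            self.version = version
            self.state = "invalid"
        return ("ACK", self.name, version)

    def recv_val(self, version):
        # Mirrors RRcvVal: validate only a matching version.
        if self.state != "valid" and version == self.version:
            self.state = "valid"

def owner_write(owner, readers):
    """One synchronous write round: RWrite, RRcvAck, and RSendVals
    collapsed together (no asynchrony, no loss)."""
    owner.version += 1
    owner.state = "write"
    acks = {r.recv_inv(owner.version)[1] for r in readers}
    assert acks == {r.name for r in readers}   # RAllACKsRcved
    owner.state = "valid"                      # RSendVals validates locally
    for r in readers:
        r.recv_val(owner.version)

owner = Replica("A")
readers = [Replica("B"), Replica("C")]
owner_write(owner, readers)
assert all(n.version == 1 and n.state == "valid" for n in readers + [owner])
```

The spec's extra machinery (epoch IDs on every message, the replay actions, the alive-set in RAllACKsRcved) exists precisely to make this simple round safe when the synchrony assumptions of the sketch are dropped.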