├── README.md
├── RaftMongo.tla
└── RaftMongoWithRaftReconfig.tla


/README.md:
--------------------------------------------------------------------------------
 1 | This is an attempt to model a simplified part of the replication system of MongoDB in TLA+.
 2 |
 3 | ## Spec
 4 | MongoDB secondaries pull oplogs from any nodes that have more up-to-date oplogs, which differs from the push model in Raft. This spec models the gossip protocol with two synchronized actions: AppendOplog and RollbackOplog.
 5 |
 6 | The spec also simplifies the election protocol. Every election succeeds in one shot, including sending and replying to vote requests and learning the new term.
 7 |
 8 | ## Model Checking
 9 | I have successfully run the model checker on the spec with a small model including:
10 | - 3 nodes (symmetrical model value)
11 | - Term up to 3
12 | - Up to 10 oplog entries per node
13 |
14 | State constraint:
15 | ```
16 | /\ \forall i \in Server: globalCurrentTerm <= 3
17 | /\ \forall i \in Server: Len(log[i]) <= 10
18 | ```
19 | Invariants to check:
20 | - NeverRollbackCommitted
21 |
22 | The model checker generates 7778 distinct states and passes.
23 |
24 | ## Play with the Spec
25 | To play with the spec, you may comment out Line 113 of RaftMongo.tla (the term check in the IsCommitted definition, which RollbackCommitted uses). That line specifies that an oplog entry replicated to a majority of nodes can be considered "committed" only if it was written **in the current term**. Without it, secondaries syncing from the old primary will report the commit level to the old primary even though they have voted for the new primary. This differs from the Raft protocol. In Raft, voting for a new primary implies not accepting any logs from old leaders.
26 |
27 | Commenting out Line 113 will cause the model checker to fail, giving a simple concrete failure case.
28 |
29 | 1. Node 1 becomes the primary in term 2 and writes [index: 1, term: 2] to its oplog.
30 | 2. Node 3 wins an election in term 3 without the oplog on Node 1 and writes [index: 1, term: 3].
31 | 3. Node 2 replicates [index: 1, term: 2] from Node 1, making this oplog entry replicated on a majority of nodes, but it will be rolled back after syncing from Node 3.
32 |
33 | ## Conclusion
34 | We have found the exact same issue in the MongoDB code. [SERVER-22136](https://jira.mongodb.org/browse/SERVER-22136) tracks the fix to notify the old primary of the new term. We've never encountered this issue in testing or in the field and only found it by reasoning about the edge cases. This shows that writing and model checking TLA+ specs is an excellent alternative way to find and verify edge cases.
35 |
--------------------------------------------------------------------------------
/RaftMongo.tla:
--------------------------------------------------------------------------------
  1 | --------------------------------- MODULE RaftMongo ---------------------------------
  2 | \* This is the formal specification for the Raft consensus algorithm in MongoDB
  3 |
  4 | EXTENDS Naturals, FiniteSets, Sequences, TLC
  5 |
  6 | \* The set of server IDs
  7 | CONSTANTS Server
  8 |
  9 | \* Server states.
 10 | \* Candidate is not used, but this is fine.
 11 | CONSTANTS Follower, Candidate, Leader
 12 |
 13 | \* A reserved value.
 14 | CONSTANTS Nil
 15 |
 16 | ----
 17 | \* Global variables
 18 |
 19 | \* The current term number, shared by all servers.
 20 | VARIABLE globalCurrentTerm
 21 |
 22 | ----
 23 | \* The following variables are all per server (functions with domain Server).
 24 |
 25 | \* The server's state (Follower, Candidate, or Leader).
 26 | VARIABLE state
 27 |
 28 | \* The commit point learned by each server.
 29 | VARIABLE commitPoint
 30 |
 31 | electionVars == <>
 32 | serverVars == <>
 33 |
 34 | \* A Sequence of log entries. The index into this sequence is the index of the
 35 | \* log entry. Unfortunately, the Sequence module defines Head(s) as the entry
 36 | \* with index 1, so be careful not to use that!
 37 | VARIABLE log
 38 | logVars == <>
 39 |
 40 | \* End of per server variables.
 41 | ----
 42 |
 43 | \* All variables; used for stuttering (asserting state hasn't changed).
 44 | vars == <>
 45 |
 46 | ----
 47 | \* Helpers
 48 |
 49 | \* The set of all quorums. This just calculates simple majorities, but the only
 50 | \* important property is that every quorum overlaps with every other.
 51 | Quorum == {i \in SUBSET(Server) : Cardinality(i) * 2 > Cardinality(Server)}
 52 |
 53 | \* The term of the last entry in a log, or 0 if the log is empty.
 54 | GetTerm(xlog, index) == IF index = 0 THEN 0 ELSE xlog[index].term
 55 | LogTerm(i, index) == GetTerm(log[i], index)
 56 | LastTerm(xlog) == GetTerm(xlog, Len(xlog))
 57 |
 58 | \* Return the minimum value from a set, or undefined if the set is empty.
 59 | Min(s) == CHOOSE x \in s : \A y \in s : x <= y
 60 | \* Return the maximum value from a set, or undefined if the set is empty.
 61 | Max(s) == CHOOSE x \in s : \A y \in s : x >= y
 62 |
 63 | ----
 64 | \* Define initial values for all variables
 65 |
 66 | InitServerVars == /\ globalCurrentTerm = 0
 67 |                   /\ state = [i \in Server |-> Follower]
 68 |                   /\ commitPoint = [i \in Server |-> [term |-> 0, index |-> 0]]
 69 | InitLogVars == /\ log = [i \in Server |-> << >>]
 70 | Init == /\ InitServerVars
 71 |         /\ InitLogVars
 72 |
 73 | ----
 74 | \* Message handlers
 75 | \* i = recipient, j = sender, m = message
 76 |
 77 | AppendOplog(i, j) ==
 78 |     \* /\ state[i] = Follower \* Disable primary catchup and draining
 79 |     /\ Len(log[i]) < Len(log[j])
 80 |     /\ LastTerm(log[i]) = LogTerm(j, Len(log[i]))
 81 |     /\ log' = [log EXCEPT ![i] = Append(log[i], log[j][Len(log[i]) + 1])]
 82 |     /\ UNCHANGED <>
 83 |
 84 | CanRollbackOplog(i, j) ==
 85 |     /\ Len(log[i]) > 0
 86 |     /\ \* The log with later term is more up-to-date
 87 |        LastTerm(log[i]) < LastTerm(log[j])
 88 |     /\
 89 |        \/ Len(log[i]) > Len(log[j])
 90 |        \* There seems to be no short-circuit for OR clauses, so I have to specify the negative case
 91 |        \/ /\ Len(log[i]) <= Len(log[j])
 92 |           /\ LastTerm(log[i]) /= LogTerm(j, Len(log[i]))
 93 |
 94 | RollbackOplog(i, j) ==
 95 |     /\ CanRollbackOplog(i, j)
 96 |     \* Rollback 1 oplog entry
 97 |     /\ LET new == [index2 \in 1..(Len(log[i]) - 1) |-> log[i][index2]]
 98 |        IN log' = [log EXCEPT ![i] = new]
 99 |     /\ UNCHANGED <>
100 |
101 | \* The set of nodes that have log[me][logIndex] in their oplog
102 | Agree(me, logIndex) ==
103 |     { node \in Server :
104 |         /\ Len(log[node]) >= logIndex
105 |         /\ LogTerm(me, logIndex) = LogTerm(node, logIndex) }
106 |
107 | IsCommitted(me, logIndex) ==
108 |     /\ Agree(me, logIndex) \in Quorum
109 |     \* If we comment out the following line, a replicated log entry from the old primary will violate safety.
110 |     \* [ P (2), S (), S ()]
111 |     \* [ S (2), S (), P (3)]
112 |     \* [ S (2), S (2), P (3)] !!! the log from term 2 shouldn't be considered as committed.
113 |     /\ LogTerm(me, logIndex) = globalCurrentTerm
114 |
115 | \* RollbackCommitted and NeverRollbackCommitted are not actions.
116 | \* They are used for verification.
117 | RollbackCommitted(i) ==
118 |     \E j \in Server:
119 |         /\ CanRollbackOplog(i, j)
120 |         /\ IsCommitted(i, Len(log[i]))
121 |
122 | NeverRollbackCommitted ==
123 |     \A i \in Server: ~RollbackCommitted(i)
124 |
125 | \* ACTION
126 | \* i = the new primary node.
127 | BecomePrimaryByMagic(i) ==
128 |     LET notBehind(me, j) ==
129 |             \/ LastTerm(log[me]) > LastTerm(log[j])
130 |             \/ /\ LastTerm(log[me]) = LastTerm(log[j])
131 |                /\ Len(log[me]) >= Len(log[j])
132 |         ayeVoters(me) ==
133 |             { index \in Server : notBehind(me, index) }
134 |     IN /\ ayeVoters(i) \in Quorum
135 |        /\ state' = [index \in Server |-> IF index = i THEN Leader ELSE Follower]
136 |        /\ globalCurrentTerm' = globalCurrentTerm + 1
137 |        /\ UNCHANGED <>
138 |
139 | \* ACTION
140 | \* Leader i receives a client request to add v to the log.
141 | ClientWrite(i) ==
142 |     /\ state[i] = Leader
143 |     /\ LET entry == [term |-> globalCurrentTerm]
144 |            newLog == Append(log[i], entry)
145 |        IN log' = [log EXCEPT ![i] = newLog]
146 |     /\ UNCHANGED <>
147 |
148 | \* ACTION
149 | AdvanceCommitPoint ==
150 |     \E leader \in Server :
151 |         /\ state[leader] = Leader
152 |         /\ IsCommitted(leader, Len(log[leader]))
153 |         /\ commitPoint' = [commitPoint EXCEPT ![leader] = [term |-> LastTerm(log[leader]), index |-> Len(log[leader])]]
154 |         /\ UNCHANGED <>
155 |
156 | \* Return whether Node i can learn the commit point from Node j.
157 | CommitPointLessThan(i, j) ==
158 |     \/ commitPoint[i].term < commitPoint[j].term
159 |     \/ /\ commitPoint[i].term = commitPoint[j].term
160 |        /\ commitPoint[i].index < commitPoint[j].index
161 |
162 | \* ACTION
163 | \* Node i learns the commit point from j via heartbeat.
164 | LearnCommitPoint(i, j) ==
165 |     /\ CommitPointLessThan(i, j)
166 |     /\ commitPoint' = [commitPoint EXCEPT ![i] = commitPoint[j]]
167 |     /\ UNCHANGED <>
168 |
169 | \* ACTION
170 | \* Node i learns the commit point from j via heartbeat with term check
171 | LearnCommitPointWithTermCheck(i, j) ==
172 |     /\ LastTerm(log[i]) = commitPoint[j].term
173 |     /\ LearnCommitPoint(i, j)
174 |
175 | \* ACTION
176 | LearnCommitPointFromSyncSource(i, j) ==
177 |     /\ ENABLED AppendOplog(i, j)
178 |     /\ LearnCommitPoint(i, j)
179 |
180 | \* ACTION
181 | LearnCommitPointFromSyncSourceNeverBeyondLastApplied(i, j) ==
182 |     \* From sync source
183 |     /\ ENABLED AppendOplog(i, j)
184 |     /\ CommitPointLessThan(i, j)
185 |     \* Never beyond last applied
186 |     /\ LET myCommitPoint ==
187 |             \* If they have the same term, commit point can be ahead.
188 |             IF commitPoint[j].term <= LastTerm(log[i])
189 |                 THEN commitPoint[j]
190 |                 ELSE [term |-> LastTerm(log[i]), index |-> Len(log[i])]
191 |        IN commitPoint' = [commitPoint EXCEPT ![i] = myCommitPoint]
192 |     /\ UNCHANGED <>
193 |
194 | \* ACTION
195 | AppendEntryAndLearnCommitPointFromSyncSource(i, j) ==
196 |     \* Append entry
197 |     /\ Len(log[i]) < Len(log[j])
198 |     /\ LastTerm(log[i]) = LogTerm(j, Len(log[i]))
199 |     /\ log' = [log EXCEPT ![i] = Append(log[i], log[j][Len(log[i]) + 1])]
200 |     \* Learn commit point
201 |     /\ CommitPointLessThan(i, j)
202 |     /\ commitPoint' = [commitPoint EXCEPT ![i] = commitPoint[j]]
203 |     /\ UNCHANGED <>
204 |
205 | ----
206 | AppendOplogAction ==
207 |     \E i,j \in Server : AppendOplog(i, j)
208 |
209 | RollbackOplogAction ==
210 |     \E i,j \in Server : RollbackOplog(i, j)
211 |
212 | BecomePrimaryByMagicAction ==
213 |     \E i \in Server : BecomePrimaryByMagic(i)
214 |
215 | ClientWriteAction ==
216 |     \E i \in Server : ClientWrite(i)
217 |
218 | LearnCommitPointAction ==
219 |     \E i, j \in Server : LearnCommitPoint(i, j)
220 |
221 | LearnCommitPointWithTermCheckAction ==
222 |     \E i, j \in Server : LearnCommitPointWithTermCheck(i, j)
223 |
224 | LearnCommitPointFromSyncSourceAction ==
225 |     \E i, j \in Server : LearnCommitPointFromSyncSource(i, j)
226 |
227 | LearnCommitPointFromSyncSourceNeverBeyondLastAppliedAction ==
228 |     \E i, j \in Server : LearnCommitPointFromSyncSourceNeverBeyondLastApplied(i, j)
229 |
230 | AppendEntryAndLearnCommitPointFromSyncSourceAction ==
231 |     \E i, j \in Server : AppendEntryAndLearnCommitPointFromSyncSource(i, j)
232 |
233 | ----
234 | \* Properties to check
235 |
236 | RollbackBeforeCommitPoint(i) ==
237 |     /\ \E j \in Server:
238 |         /\ CanRollbackOplog(i, j)
239 |     /\ \/ LastTerm(log[i]) < commitPoint[i].term
240 |        \/ /\ LastTerm(log[i]) = commitPoint[i].term
241 |           /\ Len(log[i]) <= commitPoint[i].index
242 |     \* todo: clean up
243 |
244 | NeverRollbackBeforeCommitPoint == \A i \in Server: ~RollbackBeforeCommitPoint(i)
245 |
246 | \* Liveness check
247 |
248 | \* This isn't accurate for any infinite behavior specified by Spec, but it's fine
249 | \* for any finite behavior with the liveness we can check with the model checker.
250 | \* This checks that, at any time, if two nodes' commit points are not the same,
251 | \* they will be the same eventually.
252 | \* This is checked after all possible rollbacks are done.
253 | CommitPointEventuallyPropagates ==
254 |     /\ \A i, j \in Server:
255 |         [](commitPoint[i] # commitPoint[j] ~>
256 |                <>(~ENABLED RollbackOplogAction => commitPoint[i] = commitPoint[j]))
257 |
258 | ----
259 | \* Defines how the variables may transition.
260 | Next ==
261 |     \* --- Replication protocol
262 |     \/ AppendOplogAction
263 |     \/ RollbackOplogAction
264 |     \/ BecomePrimaryByMagicAction
265 |     \/ ClientWriteAction
266 |     \*
267 |     \* --- Commit point learning protocol
268 |     \/ AdvanceCommitPoint
269 |     \* \/ LearnCommitPointAction
270 |     \/ LearnCommitPointFromSyncSourceAction
271 |     \* \/ AppendEntryAndLearnCommitPointFromSyncSourceAction
272 |     \* \/ LearnCommitPointWithTermCheckAction
273 |     \* \/ LearnCommitPointFromSyncSourceNeverBeyondLastAppliedAction
274 |
275 | Liveness ==
276 |     /\ SF_vars(AppendOplogAction)
277 |     /\ SF_vars(RollbackOplogAction)
278 |     \* A new primary should eventually write one entry.
279 |     /\ WF_vars(\E i \in Server : LastTerm(log[i]) # globalCurrentTerm /\ ClientWrite(i))
280 |     \* /\ WF_vars(ClientWriteAction)
281 |     \*
282 |     \* --- Commit point learning protocol
283 |     /\ WF_vars(AdvanceCommitPoint)
284 |     \* /\ WF_vars(LearnCommitPointAction)
285 |     /\ SF_vars(LearnCommitPointFromSyncSourceAction)
286 |     \* /\ SF_vars(AppendEntryAndLearnCommitPointFromSyncSourceAction)
287 |     \* /\ SF_vars(LearnCommitPointWithTermCheckAction)
288 |     \* /\ SF_vars(LearnCommitPointFromSyncSourceNeverBeyondLastAppliedAction)
289 |
290 | \* The specification must start with the initial state and transition according
291 | \* to Next.
292 | Spec == Init /\ [][Next]_vars /\ Liveness
293 |
294 | ===============================================================================
295 |
--------------------------------------------------------------------------------
/RaftMongoWithRaftReconfig.tla:
--------------------------------------------------------------------------------
  1 | --------------------------------- MODULE RaftMongoWithRaftReconfig --------------------------------
  2 | \* This is the formal specification for the Raft consensus algorithm in MongoDB.
  3 | \* It allows reconfig using the protocol for single server membership changes described in Raft.
  4 |
  5 | EXTENDS Naturals, FiniteSets, Sequences, TLC
  6 |
  7 | \* The set of server IDs
  8 | CONSTANTS Server
  9 |
 10 | \* Server states.
 11 | \* Candidate is not used, but this is fine.
 12 | CONSTANTS Follower, Candidate, Leader
 13 |
 14 | \* A reserved value.
 15 | CONSTANTS Nil
 16 |
 17 | ----
 18 | \* Global variables
 19 |
 20 | \* Servers in a given config version.
 21 | \* e.g. << {S1, S2}, {S1, S2, S3} >>
 22 | VARIABLE configs
 23 |
 24 | \* The set of log entries that have been acknowledged as committed, i.e.
 25 | \* "immediately committed" entries. It does not include "prefix committed"
 26 | \* entries, which are allowed to roll back on minority nodes.
 27 | VARIABLE committedEntries
 28 |
 29 | ----
 30 | \* The following variables are all per server (functions with domain Server).
 31 |
 32 | \* The server's term number.
 33 | VARIABLE currentTerm
 34 |
 35 | \* The server's state (Follower, Candidate, or Leader).
 36 | VARIABLE state
 37 |
 38 | serverVars == <>
 39 |
 40 | \* A Sequence of log entries. The index into this sequence is the index of the
 41 | \* log entry. Unfortunately, the Sequence module defines Head(s) as the entry
 42 | \* with index 1, so be careful not to use that!
 43 | VARIABLE log
 44 | logVars == <>
 45 |
 46 | \* End of per server variables.
 47 | ----
 48 |
 49 | \* All variables; used for stuttering (asserting state hasn't changed).
 50 | vars == <>
 51 |
 52 | ----
 53 | \* Helpers
 54 |
 55 | \* The term of the last entry in a log, or 0 if the log is empty.
 56 | GetTerm(xlog, index) == IF index = 0 THEN 0 ELSE xlog[index].term
 57 | LogTerm(i, index) == GetTerm(log[i], index)
 58 | LastTerm(xlog) == GetTerm(xlog, Len(xlog))
 59 |
 60 | \* Return the minimum value from a set, or undefined if the set is empty.
 61 | Min(s) == CHOOSE x \in s : \A y \in s : x <= y
 62 | \* Return the maximum value from a set, or undefined if the set is empty.
 63 | Max(s) == CHOOSE x \in s : \A y \in s : x >= y
 64 |
 65 | \* The config version in the node's last entry.
 66 | GetConfigVersion(i) == log[i][Len(log[i])].configVersion
 67 |
 68 | \* Gets the node's first entry with a given config version.
 69 | GetConfigEntry(i, configVersion) == LET configEntries == {index \in 1..Len(log[i]) :
 70 |                                                             log[i][index].configVersion = configVersion}
 71 |                                     IN Min(configEntries)
 72 |
 73 | \* The servers that are in the same config as i.
 74 | ServerViewOn(i) == configs[GetConfigVersion(i)]
 75 |
 76 | \* The set of all quorums. This just calculates simple majorities, but the only
 77 | \* important property is that every quorum overlaps with every other.
 78 | Quorum(me) == {sub \in SUBSET(ServerViewOn(me)) : Cardinality(sub) * 2 > Cardinality(ServerViewOn(me))}
 79 |
 80 | ----
 81 | \* Define initial values for all variables
 82 | InitServerVars == /\ currentTerm = [i \in Server |-> 0]
 83 |                   /\ state = [i \in Server |-> Follower]
 84 | InitLogVars == /\ log = [i \in Server |-> << [term |-> 0, configVersion |-> 1] >>]
 85 |                /\ committedEntries = {[term |-> 0, index |-> 1]}
 86 | InitConfigs == configs = << Server >>
 87 | Init == /\ InitServerVars
 88 |         /\ InitLogVars
 89 |         /\ InitConfigs
 90 |
 91 | ----
 92 | \* Message handlers
 93 | \* i = recipient, j = sender, m = message
 94 |
 95 | AppendOplog(i, j) ==
 96 |     /\ state[i] = Follower \* Disable primary catchup and draining
 97 |     /\ j \in ServerViewOn(i) \* j is in the config of i.
 98 |     /\ Len(log[i]) < Len(log[j])
 99 |     /\ LastTerm(log[i]) = LogTerm(j, Len(log[i]))
100 |     /\ log' = [log EXCEPT ![i] = Append(log[i], log[j][Len(log[i]) + 1])]
101 |     /\ UNCHANGED <>
102 |
103 | CanRollbackOplog(i, j) ==
104 |     /\ j \in ServerViewOn(i) \* j is in the config of i.
105 |     /\ Len(log[i]) > 0
106 |     /\ \* The log with later term is more up-to-date
107 |        LastTerm(log[i]) < LastTerm(log[j])
108 |     /\
109 |        \/ Len(log[i]) > Len(log[j])
110 |        \* There seems to be no short-circuit for OR clauses, so I have to specify the negative case
111 |        \/ /\ Len(log[i]) <= Len(log[j])
112 |           /\ LastTerm(log[i]) /= LogTerm(j, Len(log[i]))
113 |
114 | RollbackOplog(i, j) ==
115 |     /\ CanRollbackOplog(i, j)
116 |     \* Rollback 1 oplog entry
117 |     /\ LET new == [index2 \in 1..(Len(log[i]) - 1) |-> log[i][index2]]
118 |        IN log' = [log EXCEPT ![i] = new]
119 |     /\ UNCHANGED <>
120 |
121 | \* The set of nodes in my config that have log[me][logIndex] in their oplog
122 | Agree(me, logIndex) ==
123 |     { node \in ServerViewOn(me) :
124 |         /\ Len(log[node]) >= logIndex
125 |         /\ LogTerm(me, logIndex) = LogTerm(node, logIndex) }
126 |
127 | NotBehind(me, j) == \/ LastTerm(log[me]) > LastTerm(log[j])
128 |                     \/ /\ LastTerm(log[me]) = LastTerm(log[j])
129 |                        /\ Len(log[me]) >= Len(log[j])
130 |
131 | \* ACTION
132 | \* i = the new primary node.
133 | BecomePrimaryByMagic(i, ayeVoters) ==
134 |     /\ \A j \in ayeVoters : /\ i \in ServerViewOn(j)
135 |                             /\ NotBehind(i, j)
136 |                             /\ currentTerm[j] <= currentTerm[i]
137 |     /\ ayeVoters \in Quorum(i)
138 |     /\ state' = [index \in Server |-> IF index \notin ayeVoters
139 |                                       THEN state[index]
140 |                                       ELSE IF index = i THEN Leader ELSE Follower]
141 |     /\ currentTerm' = [index \in Server |-> IF index \in (ayeVoters \union {i})
142 |                                             THEN currentTerm[i] + 1
143 |                                             ELSE currentTerm[index]]
144 |     /\ UNCHANGED <>
145 |
146 | \* ACTION
147 | \* Leader i receives a client request to add v to the log.
148 | ClientWrite(i) ==
149 |     /\ state[i] = Leader
150 |     /\ LET entry == [term |-> currentTerm[i], configVersion |-> GetConfigVersion(i)]
151 |            newLog == Append(log[i], entry)
152 |        IN log' = [log EXCEPT ![i] = newLog]
153 |     /\ UNCHANGED <>
154 |
155 | \* ACTION
156 | \* Commit the latest log entry on a primary.
157 | AdvanceCommitPoint ==
158 |     \E leader \in Server : \E acknowledgers \in SUBSET Server :
159 |         /\ state[leader] = Leader
160 |         /\ acknowledgers \subseteq Agree(leader, Len(log[leader]))
161 |         /\ acknowledgers \in Quorum(leader)
162 |         \* If we comment out the following line, a replicated log entry from the old primary will violate safety.
163 |         \* [ P (2), S (), S ()]
164 |         \* [ S (2), S (), P (3)]
165 |         \* [ S (2), S (2), P (3)] !!! the log from term 2 shouldn't be considered as committed.
166 |         /\ LogTerm(leader, Len(log[leader])) = currentTerm[leader]
167 |         \* If an acknowledger has a higher term, the leader would step down.
168 |         /\ \A j \in acknowledgers : currentTerm[j] <= currentTerm[leader]
169 |         /\ committedEntries' = committedEntries \union {[term |-> LastTerm(log[leader]), index |-> Len(log[leader])]}
170 |         /\ UNCHANGED <>
171 |
172 | UpdateTermThroughHeartbeat(i, j) ==
173 |     /\ j \in ServerViewOn(i) \* j is in the config of i.
174 |     /\ currentTerm[j] > currentTerm[i]
175 |     /\ currentTerm' = [currentTerm EXCEPT ![i] = currentTerm[j]]
176 |     /\ state' = [state EXCEPT ![i] = IF ~(state[i] = Leader) THEN state[i] ELSE Follower]
177 |     /\ UNCHANGED <>
178 |
179 | Reconfig(i, newConfig) ==
180 |     /\ state[i] = Leader
181 |     /\ i \in newConfig
182 |     \* Only support single node addition/removal.
183 |     /\ Cardinality(ServerViewOn(i) \ newConfig) + Cardinality(newConfig \ ServerViewOn(i)) <= 1
184 |     \* The config entry must be committed.
185 |     /\ LET configEntry == GetConfigEntry(i, GetConfigVersion(i))
186 |        IN [term |-> log[i][configEntry].term, index |-> configEntry] \in committedEntries
187 |     \* The primary must have committed an entry in its current term.
188 |     /\ \E entry \in committedEntries : entry.term = currentTerm[i]
189 |     /\ configs' = Append(configs, newConfig)
190 |     /\ LET entry == [term |-> currentTerm[i], configVersion |-> Len(configs) + 1]
191 |            newLog == Append(log[i], entry)
192 |        IN log' = [log EXCEPT ![i] = newLog]
193 |     /\ UNCHANGED <>
194 |
195 | ----
196 | AppendOplogAction ==
197 |     \E i,j \in Server : AppendOplog(i, j)
198 |
199 | RollbackOplogAction ==
200 |     \E i,j \in Server : RollbackOplog(i, j)
201 |
202 | BecomePrimaryByMagicAction ==
203 |     \E i \in Server : \E ayeVoters \in SUBSET(Server) : BecomePrimaryByMagic(i, ayeVoters)
204 |
205 | ClientWriteAction ==
206 |     \E i \in Server : ClientWrite(i)
207 |
208 | UpdateTermThroughHeartbeatAction ==
209 |     \E i,j \in Server : UpdateTermThroughHeartbeat(i, j)
210 |
211 | ReconfigAction ==
212 |     \E i \in Server : \E newConfig \in SUBSET(Server) : Reconfig(i, newConfig)
213 |
214 | ----
215 | \* Defines how the variables may transition.
216 | Next ==
217 |     \* --- Replication protocol
218 |     \/ AppendOplogAction
219 |     \/ RollbackOplogAction
220 |     \/ BecomePrimaryByMagicAction
221 |     \/ ClientWriteAction
222 |     \/ AdvanceCommitPoint
223 |     \/ ReconfigAction
224 |     \/ UpdateTermThroughHeartbeatAction
225 |
226 | Liveness ==
227 |     /\ SF_vars(AppendOplogAction)
228 |     /\ SF_vars(RollbackOplogAction)
229 |     \* A new primary should eventually write one entry.
230 |     /\ WF_vars(\E i \in Server : LastTerm(log[i]) # currentTerm[i] /\ ClientWrite(i))
231 |     \* /\ WF_vars(ClientWriteAction)
232 |
233 | \* The specification must start with the initial state and transition according
234 | \* to Next.
235 | Spec == Init /\ [][Next]_vars /\ Liveness
236 |
237 | \* RollbackCommitted and NeverRollbackCommitted are not actions.
238 | \* They are used for verification.
239 | RollbackCommitted(i) ==
240 |     /\ [term |-> LastTerm(log[i]), index |-> Len(log[i])] \in committedEntries
241 |     /\ \E j \in Server: CanRollbackOplog(i, j)
242 |
243 | NeverRollbackCommitted ==
244 |     \A i \in Server: ~RollbackCommitted(i)
245 |
246 | TwoPrimariesInSameTerm ==
247 |     \E i, j \in Server :
248 |         /\ i # j
249 |         /\ currentTerm[i] = currentTerm[j]
250 |         /\ state[i] = Leader
251 |         /\ state[j] = Leader
252 |
253 | NoTwoPrimariesInSameTerm == ~TwoPrimariesInSameTerm
254 |
255 | ===============================================================================
256 |
--------------------------------------------------------------------------------
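The small model described in the README's Model Checking section for RaftMongo.tla could be set up with a wrapper module along the following lines. This is a minimal sketch and not part of the repository: the module name `MCRaftMongo` and the operators `StateConstraint` and `ServerSymmetry` are hypothetical names introduced here for illustration. It assumes the TLC configuration binds `Server` to a set of three model values, names `StateConstraint` as the state constraint, `ServerSymmetry` as the symmetry set, and `NeverRollbackCommitted` as the invariant.

```tla
---------------------------- MODULE MCRaftMongo ----------------------------
\* Hypothetical model-checking wrapper; not part of the repository.
EXTENDS RaftMongo

\* State constraint from the README: bound the term and the log length.
\* globalCurrentTerm does not depend on a server, so it is not quantified.
StateConstraint ==
    /\ globalCurrentTerm <= 3
    /\ \A i \in Server : Len(log[i]) <= 10

\* Symmetry set for the Server model values. Permutations is defined in the
\* TLC module, which RaftMongo already extends.
ServerSymmetry == Permutations(Server)

=============================================================================
```

Symmetry reduction is appropriate here only because an invariant is being checked; TLC's documentation warns against combining symmetry sets with liveness properties such as CommitPointEventuallyPropagates.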