├── .gitignore ├── Lecture 02.md ├── Lecture 03.md ├── Lecture 04.md ├── Lecture 05.md ├── Lecture 06.md ├── Lecture 07.md ├── Lecture 08.md ├── Lecture 09.md ├── Lecture 10.md ├── Lecture 11.md ├── Lecture 12.md ├── Lecture 13.md ├── Lecture 14.md ├── Lecture 15.md ├── Lecture 16.md ├── Lecture 17.md ├── Lecture 18.md ├── Lecture 19.md ├── Lecture 20.md ├── Lecture 21.md ├── Lecture 22.md ├── Lecture 23.md ├── README.md ├── img ├── Ben Franklin.jpg ├── Diagrams.pptx ├── Eventual Consistency.jpg ├── L10 Fault Hierarchy 1.png ├── L10 Fault Hierarchy 2.png ├── L10 Fault Hierarchy 3.png ├── L10 Fault Hierarchy 4.png ├── L10 Fault Hierarchy 5.png ├── L10 Possible Faults.png ├── L10 Two Generals.png ├── L10 Vacuous FIFO Delivery.png ├── L11 Broadcast 1.png ├── L11 Reliable Broadcast 1.png ├── L11 Reliable Broadcast 2.png ├── L11 Reliable Broadcast 3.png ├── L11 Reliable Broadcast 4.png ├── L11 Reliable Delivery 1.png ├── L11 Reliable Delivery 2.png ├── L11 Reliable Delivery 3.png ├── L12 Chain Replication 1.png ├── L12 Chain Replication 2.png ├── L12 Chain Replication.png ├── L12 Consistency Hierarchy.png ├── L12 Determinism Violation.png ├── L12 Primary Backup Replication 1.png ├── L12 Primary Backup Replication 2.png ├── L12 Replica Disagreement 1.png ├── L12 Replica Disagreement 2.png ├── L12 Replica Disagreement 3.png ├── L12 Single Replica 1.png ├── L12 Single Replica 2.png ├── L12 TO Anomaly.png ├── L13 Chain Replication Paper Fig 4.png ├── L14 Chain Replication 1.png ├── L14 Chain Replication 2.png ├── L14 Consensus 1.png ├── L14 Consensus 2.png ├── L14 Paxos 1.png ├── L14 Paxos 2.png ├── L14 Paxos 3.png ├── L14 Paxos 4.png ├── L14 Paxos 5.png ├── L14 Paxos 6.png ├── L15 Chandy-Lamport Snapshot Bug.png ├── L15 Multiple Proposers 1.png ├── L15 Multiple Proposers 2.png ├── L15 Multiple Proposers 3.png ├── L15 Multiple Proposers 4.png ├── L15 Multiple Proposers 5.png ├── L15 Multiple Proposers 6.png ├── L15 Multiple Proposers 7.png ├── L15 Paxos Milestone 1.png ├── L15 Paxos Milestone 2.png ├── L15 Paxos Milestone 3.png ├── L16 MultiPaxos.png ├── L16 Paxos Minimum Msg Exchange.png ├── L16 Paxos Nontermination 1.png ├── L16 Paxos Nontermination 2.png ├── L16 Paxos Nontermination 3.png ├── L16 Paxos Phases.png ├── L16 Primary Backup 1.png ├── L16 Primary Backup 2.png ├── L17 Amazon Cart 1.png ├── L17 Amazon Cart 2.png ├── L17 Amazon Cart 3.png ├── L17 Eventual Consistency.png ├── L17 Network Partition 1.png ├── L17 Network Partition 2.png ├── L17 Strong Convergence 1.png ├── L17 TO Anomaly.png ├── L17 Tradeoff 1.png ├── L17 Tradeoff 2.png ├── L17 Tradeoff 3.png ├── L18 Merkle Conflict 1.png ├── L18 Merkle Conflict 2.png ├── L18 Merkle Conflict 3.png ├── L18 Merkle Conflict 4.png ├── L18 Merkle Conflict 5.png ├── L18 Merkle Tree 1.png ├── L18 Merkle Tree 2.png ├── L19 Dataset Replication.png ├── L19 Dynamo Read Conflict 1.png ├── L19 Dynamo Read Conflict 2.png ├── L19 Key Replication.png ├── L19 MD5 Output Space.png ├── L19 MD5 To Node.png ├── L19 No Sharding.png ├── L19 Node Addition 1.png ├── L19 Node Addition 2.png ├── L19 Node Crash 1.png ├── L19 Ring 1.png ├── L19 Ring 2.png ├── L19 Ring 3.png ├── L19 Ring 4.png ├── L19 Sharding 1.png ├── L19 Sharding 2.png ├── L19 Sharding 3.png ├── L19 Sharding 4.png ├── L19 Sharding 5.png ├── L19 Sharding 6.png ├── L2 Message 1.png ├── L20 Distributed MapReduce 1.png ├── L20 Distributed MapReduce 2.png ├── L20 Distributed MapReduce 3.png ├── L20 Distributed MapReduce 4.png ├── L20 Inverted Index.png ├── L21 Master 1.png ├── L21 Master 2.png ├── L22 
Boolean Ordering.png ├── L22 Comparable Subsets.png ├── L22 Conflicting Updates.png ├── L22 Delete Cart Item 1.png ├── L22 Delete Cart Item 2.png ├── L22 Delete Cart Item 3.png ├── L22 Noncomparable Subsets.png ├── L22 Replica Consensus 1.png ├── L22 Replica Consensus 2.png ├── L22 Set of Subsets.png ├── L23 Equivalent Terms.png ├── L23 FT Hierarchy.png ├── L23 Rados Fig 2.png ├── L3 Causal Anomaly.png ├── L3 Message Passing.png ├── L3 Multiple Processes.png ├── L3 Process events.png ├── L3 Reasoning About State.png ├── L4 LC Msg Send 1.png ├── L4 LC Msg Send 2.png ├── L4 LC Msg Send 3.png ├── L4 LC Msg Send 4.png ├── L4 LC Msg Send 5.png ├── L4 LC Msg Send 6.png ├── L4 Lattice.png ├── L4 Natural Numbers.png ├── L5 Causal History 1.png ├── L5 Causal History 2.png ├── L5 Causal History 3.png ├── L5 Causal History 4.png ├── L5 FIFO Anomaly.png ├── L5 Protocol 1.png ├── L5 Protocol 2.png ├── L5 Protocol 3.png ├── L5 Protocol 4.png ├── L5 Protocol 5.png ├── L5 VC Clocks 1.png ├── L5 VC Clocks 2.png ├── L5 VC Clocks 3.png ├── L5 VC Clocks 4.png ├── L5 VC Clocks 5.png ├── L5 VC Clocks 6.png ├── L5 VC Clocks 7.png ├── L5 VC Clocks 8.png ├── L6 Causal Violation.png ├── L6 Delivery Hierarchy 1.png ├── L6 Delivery Hierarchy 2.png ├── L6 Ensure Casual Delivery 1.png ├── L6 Ensure Casual Delivery 2.png ├── L6 Naive Seq Nos.png ├── L6 Total Order Anomaly.png ├── L6 Vacuous FIFO Delivery.png ├── L7 Bad Snapshot.png ├── L7 Causal Broadcast 1.png ├── L7 Causal Broadcast 2.png ├── L7 Causal Broadcast 3.png ├── L7 Causal Broadcast 4.png ├── L7 Causal Broadcast 5.png ├── L7 Causal Broadcast 6.png ├── L7 Causal Broadcast 7.png ├── L7 Causal Broadcast 8.png ├── L7 Channels 1.png ├── L7 Channels 2.png ├── L7 Channels 3.png ├── L7 Channels 4.png ├── L7 Good Snapshot.png ├── L7 Process State.png ├── L7 TO Anomaly.png ├── L7 Wallclock Snapshot Anomaly.png ├── L8 CL Snapshot 1.png ├── L8 CL Snapshot 2.png ├── L8 CL Snapshot 3.png ├── L8 Marker FIFO Anomaly.png ├── L8 Snapshot Ex 1.png ├── L8 Snapshot Ex 2.png ├── L8 Snapshot Ex 3.png ├── L8 Snapshot Ex 4.png ├── L8 Snapshot Ex 5.png ├── L8 Snapshot Ex 6.png ├── L8 Snapshot Ex 7.png ├── L9 Bad Snapshot.png ├── L9 Connected Graph.png ├── L9 Consistent Cut 1.png ├── L9 Consistent Cut 2.png ├── L9 Cut.png ├── L9 Inconsistent Cut.png ├── L9 Simultaneous Snapshot 1.png ├── L9 Simultaneous Snapshot 2.png ├── L9 Total Graph.png ├── aardvark.jpeg ├── bang.png ├── cross.png ├── cross_small.png ├── emoji_book.png ├── emoji_jeans.png ├── emoji_neutral.png ├── emoji_sad.png ├── emoji_smiley.png ├── emoji_torch.png ├── stickman.png ├── tick.png ├── tick_small.png └── very_silly.png └── papers ├── Alsberg and Day.pdf ├── Dynamo.pdf ├── FLP.pdf ├── Frank Schmuck PhD Paper.pdf ├── JSON CRDT.pdf ├── Ladin and Liskov.pdf ├── MapReduce.pdf ├── Paxos Made Simple.pdf ├── Paxos vs RAFT.pdf ├── Paxos vs VSR vs ZAB.pdf ├── TCOEDS.pdf ├── VS Replication.pdf ├── VSR.pdf ├── VirtTime_GlobState.pdf ├── Vogels.pdf ├── ZAB.pdf ├── atomic_broadcast.pdf ├── birman91multicast.pdf ├── byzantine.pdf ├── chain_replication.pdf ├── chandy.pdf ├── fidge88timestamps.pdf ├── holygrail.pdf ├── net_comms_constraints_tradeoffs.pdf ├── paxoscommit-tods2006.pdf ├── pbft.pdf ├── rados.pdf └── raft.pdf /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | ~$*.pptx 3 | -------------------------------------------------------------------------------- /Lecture 02.md: -------------------------------------------------------------------------------- 1 | # 
Distributed Systems Lecture 2 2 | 3 | ## Lecture Given by [Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/) on April 1st, 2020 via [YouTube](https://www.youtube.com/watch?v=G0wpsacaYpE) 4 | 5 | | Previous | Next 6 | |---|--- 7 | | | [Lecture 3](./Lecture%2003.md) 8 | 9 | ## What is a Distributed System? 10 | 11 | Leslie Lamport gives the rather comical definition that: 12 | 13 | > *"A distributed system is one in which I can't get my work done because a computer I've never heard of has crashed"* 14 | 15 | Although he was joking, this definition captures a very important aspect of a distributed system in that it is one defined by some type of failure. 16 | 17 | Martin Kleppmann's definition of a distributed system is somewhat more serious (he's the author of a book called [Designing Data-Intensive Applications](https://www.amazon.co.uk/Designing-Data-Intensive-Applications-Reliable-Maintainable-ebook/dp/B06XPJML5D/ref=sr_1_1)). 18 | 19 | > A distributed system runs on several nodes (computers) and is characterised by partial failure 20 | 21 | However, other definitions of a distributed system include ideas such as: 22 | 23 | - Systems where work is distributed fairly between nodes, or 24 | - Where multiple computers "behave as one" 25 | 26 | These last definitions, however, are all rather too optimistic because they do not account for any real-life difficulties - one of which is mentioned in Martin Kleppman's definition. 27 | 28 | ## What is Partial Failure? 29 | 30 | Partial failure is where some component of a computation fails, but that failure is not necessarily fatal. 31 | For instance, one machine in a cluster of 1000 could fail without there being any significant impact on the overall "system". 32 | In other words, the presence of this type of partial failure is non-fatal to the operation of your system as a whole. 33 | 34 | ## "Cloud Computing" vs. "High Performance Computing (HPC)" 35 | 36 | The problem with HPC is that it treats partial failure as total failure. 37 | Consequently, HPC must rely on techniques such as check-pointing. 38 | This is where the progress of the calculation is saved at regular intervals so that in the event of a failure, the computation can be continued from the last check point without needing to start over from the beginning. 39 | 40 | However, in computing in general and specifically in Cloud Computing, it is expected that various parts of the system will fail. 41 | Consequently, this type of behaviour is a fundamental part of the software design. 42 | 43 | Had Ben Franklin been a programmer, he would probably have said: 44 | 45 | ![Ben Franklin](./img/Ben%20Franklin.jpg) 46 | 47 | Failure can occur in many ways. 
For instance: 48 | 49 | - Network partition 50 | - Hardware failure 51 | - Software failure 52 | 53 | ### Determining the Cause of Failure 54 | 55 | In a minimal cluster of two computers `M1` and `M2`, `M1` wants to ask `M2` for the value of a particular variable: 56 | 57 | - `M1` sends a message to `M2` saying *"What's the value of `x`?"* 58 | - `M2` should then respond with another message saying *"`x=5`"* 59 | 60 | ![Message 1](./img/L2%20Message%201.png) 61 | 62 | But in this very simple scenario, the number of possibilities for failure is still very high: 63 | 64 | - `M1`'s message might get lost due to network failure 65 | - `M1`'s message is delivered very slowly due to unexpectedly high network latency 66 | - `M2` could be down 67 | - `M2` might crash immediately after reporting to `M1` that it is alive, but before receiving `M1`'s message 68 | - `M2` might crash as a result of trying to respond to `M1`'s message 69 | - `M2` might refuse to respond to the question due to some type of authentication or authorisation failure 70 | - `M2` responds correctly, but the message never gets back to `M1` 71 | - Some random external event such as a cosmic ray flips a bit in the message thus corrupting it (Byzantine error) 72 | 73 | And on and on... 74 | 75 | Although the underlying causes are very different, as far as `M1` is concerned, they all look the same. 76 | All `M1` can tell is that it never got an answer to its question. 77 | 78 | In fact, it is impossible to determine the cause of such a failure without first having global knowledge of the entire system. 79 | 80 | ### Timeouts 81 | 82 | It is often assumed that `M1` must wait for some predefined timeout period, after which it should assume failure if it does not receive an answer. 83 | But the problem is that network latency is indeterminate; therefore, waiting for an arbitrary timeout period might help to trap *some* errors, but in general, it is not a good strategy for determining failure. 84 | 85 | For example, instead of asking *"What is the value of `x`"*, `M1` could ask `M2` to *"Add 1 to `x`"*. 86 | 87 | ***Q:***  How will `M1` know this request was successfully processed? 88 | Should it simply wait for a predetermined timeout period and if it hears nothing back, assume everything's fine? 89 | ***A:***  No, `M1` can only discover the success or failure of its request when it receives some sort of acknowledgement back from `M2`. 90 | 91 | If `M1` does not receive a response within its timeout period, what should it conclude? 92 | 93 | We can see that without any additional information, it is not correct for `M1` to assume either success or failure. 94 | All `M1` can say with any certainty is that from its point of view, the state of variable `x` in `M2` is indeterminate. 95 | 96 | ### Realistic Timeout Values? 97 | 98 | If the maximum network delay for message transmission is `D` seconds and the maximum time a machine spends processing a request is `R`, then the upper bound of the message handling timeout should be `2D + R` (two network journeys (send and receive) plus the remote machine's processing time). 99 | This would rule out the uncertainty for timeouts in a slow network, but still leave us completely unable to reason about any other types of failure. 
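To make the arithmetic concrete, here is a minimal Rust sketch of this bound. The values chosen for `D` and `R` are entirely hypothetical — in a real asynchronous network no such upper bounds exist, which is precisely the problem:

```rust
use std::time::Duration;

// Hypothetical upper bounds - in a real asynchronous network, neither exists!
const D: Duration = Duration::from_millis(500); // max one-way network delay
const R: Duration = Duration::from_millis(200); // max remote processing time

// Upper bound on a request/response exchange: send + process + reply
fn max_wait() -> Duration {
    2 * D + R
}

fn main() {
    println!("Assume failure only after waiting {:?}", max_wait());
}
```
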
100 | 101 | In distributed systems, we must deal not only with the problems of "Partial failure", but also the problem of "Unbounded Latency" (the definition given by [Peter Alvaro](https://dl.acm.org/profile/81453654530) - one of Lindsey Kuper's colleagues) 102 | 103 | ***Q:***  Given they're so hard to debug, why would you want to use a distributed system? 104 | ***A:***  Well, largely because we have no choice. 105 | All manner of external factors can force us to distribute either a computation or storage (or both) across multiple machines… 106 | 107 | - Too much data to fit on a single machine 108 | - Need a faster response time, so we have to throw more processing power at the problem 109 | - Scalability (need to handle more data, more users, or simply need more CPUs) 110 | - Fault Tolerance (redundancy) 111 | 112 | ## Time and How We Measure it 113 | 114 | Computers use clocks to identify specific points in time (both past and present). For example: 115 | 116 | - This class starts at 09:00 117 | - This item in the cache expires tomorrow at 4:26 PM 118 | - This log event happened yesterday at 11:31:34.352 119 | 120 | Computers also use clocks for reasoning about time intervals: 121 | 122 | - This class is 65 minutes long 123 | - This access code will expire in 30 seconds time 124 | - This user spent 4 minutes 38 seconds on our website 125 | 126 | ### Time-of-Day Clocks 127 | 128 | Computers have two types of clock, time-of-day clocks and monotonic clocks. 129 | 130 | - Time-of-day clocks are typically synchronised across different machines using [NTP](http://www.ntp.org/) 131 | - Time-of-day clocks are ***bad*** for measuring intervals between events taking place on different machines: 132 | - The clocks on the different machines may not be synchronised correctly and may disagree on the exact size of a time interval 133 | - There are several cases where the time-of-day clock can jump: 134 | * Daylight saving time just started 135 | * A leap second happens 136 | * NTP resets the machine's clock to an earlier value 137 | - Time-of-day clocks are fine for timestamping events that only require a low degree of accuracy, but they are quite inadequate in situations where a high degree of accuracy is needed 138 | 139 | There is a general principle in timekeeping: the more accurately you need to measure time, the harder that task becomes. 140 | If you now try to get two or more computers to agree on what the time is, then the difficulty is only compounded. 141 | 142 | ### Monotonic clocks 143 | 144 | A value is said to be *"monotonic"* if it only ever changes in one direction. 145 | So, a monotonic clock is one that will ***never*** jump backwards; its counter is guaranteed only to get bigger. 146 | 147 | The value of a monotonic clock is: 148 | 149 | - simply a counter that only ever gets bigger (E.G. microseconds since system boot) 150 | - useless as a timestamp, because it has no meaning outside the machine on which it exists. 151 | Therefore, it makes no sense to compare monotonic clock values between different machines. 152 | - ideally suited for measuring time intervals on a single machine (E.G. how long did it take to compress that image) 153 | 154 | Time of day and monotonic clocks are both physical clocks. 155 | This means they are concerned with quantifying time intervals between now and some fixed reference point in the past. 156 | Common reference points are Jan 1st, 1970 (for most date calculations) and when the machine was started. 
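As an illustration of the difference between the two physical clocks, here is a small Rust sketch using the standard library's `SystemTime` (time-of-day) and `Instant` (monotonic); the workload being timed is an arbitrary stand-in:

```rust
use std::time::{Instant, SystemTime, UNIX_EPOCH};

fn main() {
    // Time-of-day clock: meaningful across machines, but it can jump,
    // which is why this conversion is allowed to fail
    match SystemTime::now().duration_since(UNIX_EPOCH) {
        Ok(wall) => println!("Seconds since Jan 1st, 1970: {}", wall.as_secs()),
        Err(_) => println!("Time-of-day clock reports a time before the epoch!"),
    }

    // Monotonic clock: useless as a timestamp, but ideal for local intervals
    let start = Instant::now();
    let _work: u64 = (0..1_000_000).sum(); // arbitrary work to measure
    println!("Work took {:?}", start.elapsed());
}
```
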
157 | 158 | ***Q:***  So how do we mark points in time that are valid across multiple machines?
159 | ***A:***  If we attempt to use any sort of physical clock, then it's very tricky. 160 | 161 | ### What's the Time, Mr Lamport? 162 | 163 | If we think we can solve this problem by repeatedly asking a physical clock *"What's the time?"*, then in fact, we are asking the wrong question because we have misunderstood the problem. 164 | 165 | Machines in a distributed system don't need to know what the time is in any absolute sense, all they need to be able to do is answer the question ***"Which event happened first?"***. 166 | This is why distributed systems use a very different notion of what a clock is; they use something called a ***Logical Clock*** (a.k.a. a Lamport Clock). 167 | 168 | Counter-intuitively, a logical clock measures neither the time of day nor the elapsed interval between two events; instead, it measures nothing more than the ordering of events. 169 | 170 | At first, this concept is not easy to grasp because it goes against our firmly established notion of what a clock is (or ought to be). 171 | But in order to answer the all-important question ***"Which event happened first?"*** we don't need to reference the time of day, we just need some sort of counter that clicks up every time an event occurs. 172 | This type of *"clock"* is very important for several reasons: 173 | 174 | * **Communication is unreliable**
175 | This means that the length of time taken for message transmission is unpredictable (a phenomenon known as *"unbounded latency"*) 176 | * **Unpredictable changes of state**
177 | E.G. the last time you asked machine `M2` what the value of `x` was, it said `x=5`; but in the meantime, some other machine has changed `x` to `6` and hasn't told you… 178 | 179 | In these situations, it is vital to know the order in which events occurred. 180 | 181 | ### Event Ordering 182 | 183 | Let's says we have two database events `A` and `B` and we know that `A` happened before `B`. 184 | So we denote this relationship by the syntax `A -> B`. This is pronounced *"`A` happens before `B`"* 185 | 186 | But what does this tell us about causality? 187 | 188 | All we can really say is that event `A` ***might*** have been the cause of event `B`, but we cannot be certain about this. 189 | All we can say with absolute certainty is that since `B` happened ***after*** `A`, it is ***impossible*** for `B` to have been the cause of `A`. 190 | 191 | Questions about causality are very important in distributed systems because they help us solve problems such as debugging. 192 | 193 | If `M1` thinks `x=1`, but `M2` thinks `x=5`, then by knowing when the value of `x` changed in each machine, we can rule out those events that did ***not*** contribute to the problem. 194 | This greatly helps in narrowing down what ***is*** the cause of the problem. 195 | 196 | This determinism is also useful when designing systems in which users see a sequence of events. 197 | You know that if event `A` happened before event `B`, then you can ensure that your system will ***never*** display event `B` to the user before they have first seen event `A`. 198 | 199 | --- 200 | 201 | | Previous | Next 202 | |---|--- 203 | | | [Lecture 3](./Lecture%2003.md) 204 | -------------------------------------------------------------------------------- /Lecture 03.md: -------------------------------------------------------------------------------- 1 | # Distributed Systems Lecture 3 2 | 3 | ## Lecture Given by [Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/) on April 3rd, 2020 via [YouTube](https://www.youtube.com/watch?v=83Ha1rX2LSw) 4 | 5 | | Previous | Next 6 | |---|--- 7 | | [Lecture 2](./Lecture%2002.md) | [Lecture 4](./Lecture%2004.md) 8 | 9 | ## Causality and the "Happens Before" Relation 10 | 11 | The notation `A -> B` means *"A happens before B"* and helps make the ordering of events in time explicit. 12 | 13 | The "happens before" relation allows us to conclude two things: 14 | 15 | 1. It is possible that event `A` ***might*** have been the cause of event `B` 16 | 1. It is ***completely impossible*** for event `B` to have been the cause of event `A` 17 | 18 | In other words, the arrow in the happens before relation indicates the direction of possible causality. 19 | 20 | ## Lamport Diagrams (a.k.a. Spacetime Diagrams) 21 | 22 | With time moving downwards, draw a vertical line to represent the events that happen within a process.[1](#f1) 23 | Events are then represented as dots on that line. 24 | 25 | ![Process events](./img/L3%20Process%20events.png) 26 | 27 | This diagram tells us that three events have taken place within this process and that they happened in the order `X` followed by `Y` followed by `Z`. 28 | From this, we can then infer the following: 29 | 30 | * Event `X` happened before events `Y` and `Z`, and therefore ***might*** be their cause, but we cannot be certain about this 31 | * Event `Y` happened before event `Z` and therefore ***might*** be its cause, but again, we cannot be certain about this 32 | * Events `Y` and `Z` happened after event `X`. 
We can therefore say with 100% certainty that `X` was not caused by either `Y` or `Z` 33 | * Event `Z` happened after event `Y`. We can therefore say with 100% certainty that `Y` was not caused by `Z` 34 | 35 | In the case of multiple machines, we would represent these as a set of adjacent vertical lines, each with their own timeline of events. 36 | 37 | ![Multiple processes](./img/L3%20Multiple%20Processes.png) 38 | 39 | ### Communication Between Machines 40 | 41 | The only way these machines can communicate with each other is by sending messages. 42 | The send and receive events are represented as dots on each machine's timeline. 43 | 44 | ![Message passing between processes](./img/L3%20Message%20Passing.png) 45 | 46 | Generally speaking, given two events, `A` and `B`, we can say that `A` happens before `B` (`A -> B`) if any of the following are true: 47 | 48 | - Events `A` and `B` are events in the same process and `B` happens after `A` 49 | - If `A` is a message send event and `B` is the corresponding receive event, then `B` must have happened after `A`. 50 | Sorry kids, time travel is not possible, so it makes no sense to talk of a message being received ***before*** it was sent 51 | - If `A -> C` and `C -> B`, then we can be certain that `A -> B` (This is known as transitive closure) 52 | 53 | This is the definition of the ***"happens before"*** relation. 54 | 55 | ## Causal Anomalies 56 | 57 | Here's an example that highlights the importance of needing to know the exact order in which events occurred. 58 | Without this knowledge, you might well be left wondering why you've received a particular message. 59 | 60 | ![A causal anomaly](./img/L3%20Causal%20Anomaly.png) 61 | 62 | 1. Alice thinks Bob has a problem with personal hygiene and sends a message to both Bob and Carol saying *"Bob Smells"* 63 | 1. Bob takes offence and immediately responds to both Alice and Carol by saying *"Up yours!"* 64 | 1. However, Alice's original message to Carol is delayed for some reason, and results in Bob's rude response arriving ***before*** Alice's original message 65 | 66 | Without a clear understanding of the concept of the "happens before" relation, Carol will not understand why Bob has apparently started sending her insulting messages. 67 | 68 | ## Network Models 69 | 70 | ### How Long Does it Take to Send a Message?<br>
(Would that be Synchronous or Asynchronous?) 71 | 72 | If we could say with certainty that sending a network message required no more than `N` units of time, then we would have a much better idea of how communication should be managed. 73 | If such an upper limit could be placed on communication performance, then using timeouts to reason about communication failure would be a reasonable approach. 74 | 75 | Networks that make such timing guarantees are known as "synchronous networks" (E.G. a network in which we know that message transmission will take no more than `N` units of time). 76 | In general however, synchronous networks require the existence of a stable circuit between sender and receiver — which is exactly what does not exist in either public switched telephone networks ([PSTNs](https://en.wikipedia.org/wiki/Public_switched_telephone_network)) or the internet. 77 | 78 | The internet is an asynchronous network (I.E. a network having no central point of control and in which there is no upper bound on message transmission time) 79 | 80 | > As an aside, there is also a type of network known as a *"partially synchronous network"* in which the upper bound on transmission time is large, but finite. 81 | > (For details, see the book [Distributed Algorithms](https://www.amazon.co.uk/Distributed-Algorithms-Kaufmann-Management-Systems-ebook/dp/B006QUTUR2/ref=sr_1_1) by Nancy Lynch) 82 | 83 | ### But Can We Reason About Transmission Times? 84 | 85 | Well, it depends on the type of network you're using... 86 | 87 | The least forgiving network model is the asynchronous one. By *"least forgiving"*, we mean a network that: 88 | 89 | * Makes the least number of assumptions about message transmission, and 90 | * Allows us the least scope for reasoning about its behaviour 91 | 92 | In spite of it being so unforgiving, the most robust designs are built on asynchronous networks. 93 | The flip side of this however is that these are also the hardest networks to reason about. 94 | One of the key consequences here is that we have no ability to describe all possible behaviours our system might exhibit, and without this ability, we can only protect our system against ***some*** of the possible error conditions, not all. 95 | 96 | If, on the other hand, you want to prove that a certain type of event is impossible, then you should choose the most forgiving network model (the synchronous one); for if you can prove that a certain event is impossible in the most forgiving network (for instance, where message delivery is known never to exceed `N` units of time), then you can also be certain that the same event will be impossible in the least forgiving network where `N` is unbounded. 97 | 98 | Here, a *"forgiving network model"* is one that makes the most assumptions and allows us the greatest scope for reasoning about its behaviour 99 | 100 | ## State 101 | 102 | So far, our Lamport diagrams only show events and the transmission of messages between processes. 103 | How then should state be described? 104 | 105 | ### What is the 'State' of a Computer? 106 | 107 | The term "state" is typically understood to mean the contents of a computer's storage (registers, memory and permanent storage) at some particular point in time. 108 | 109 | So, if at some particular point in time, `x=5`, then it follows that there must have been some previous point in time when `x` did not equal `5`. 110 | The transition from `x` not equalling `5` to `x` equalling `5` is an event that can be represented on a Lamport diagram. <br>
111 | Therefore, events can also be thought of as representing changes of state. 112 | 113 | So, it might seem reasonable to propose that we should be able to reconstruct a machine's state at any point in time if we know two things: 114 | 115 | 1. The *full* history of all events that led up to `x` becoming equal to `5` 116 | 1. The precise order in which those events occurred 117 | 118 | In reality however, even if we have this knowledge, it might still not be possible to reconstruct the state. 119 | 120 | ### Reasoning About State 121 | 122 | There are three different ways in which events can be ordered using the `->` *"happens before"* relation: 123 | 124 | 1. Events `A` and `B` occur in the same process, with `B` happening after `A` 125 | 1. If `A` is a message send event and `B` is the corresponding receive event, then `B` ***must*** happen after `A` because it makes no sense to talk of a receive event happening ***before*** its corresponding send event 126 | 1. If `A -> C` and `C -> B`, then we can be certain that `A -> B` (Transitive closure) 127 | 128 | So, if `A -> B`, then events `A` and `B` form a pair of events ordered by the **"happens before"** relation. 129 | 130 | ![Reasoning about State](./img/L3%20Reasoning%20About%20State.png) 131 | 132 | What can we say about the ordering of these events? 133 | 134 | * From rule 1: `B -> C` 135 | * From rule 2: `A -> B` and `D -> C` 136 | * From rule 3: `A -> C` 137 | 138 | ***Q:***   What can we say about the order of events `A` and `B` in relation to event `D`? 139 | ***A:***   Absolutely nothing! 140 | 141 | So, the state of the machine is represented by the smallest set of ordered pairs, that is: 142 | 143 | ``` 144 | { (A,B) 145 | ,(B,C) 146 | ,(A,C) 147 | ,(D,C) 148 | } 149 | ``` 150 | 151 | This list includes only those event pairs that obey the ***happens before*** relation. 152 | 153 | What then can we say about the events shown in the following diagram? 154 | 155 | ![Message passing between processes](./img/L3%20Message%20Passing.png) 156 | 157 | ***Q:***   Does `P` happen before `S`? 158 | ***A:***   Yes, `P` is a send event and `S` is the corresponding receive event, therefore `P -> S` 159 | 160 | ***Q:***   What about `X` and `Z`? 161 | ***A:***   Yes, events `X` and `Z` occur within the same process and `Z` happens after `X`, therefore `X -> Z` 162 | 163 | ***Q:***   What about `P` and `Z`? 164 | ***A:***   Yes, `P -> S`, and `S` and `Z` are in the same process with `Z` happening after `S`, therefore `P -> Z` 165 | 166 | ***Q:***   What about `Q` and `R`? 167 | ***A:***   We do know that `Q` was caused by `T` and that `T -> Z` and `Z -> U` and `U -> R`; however, we are not allowed to determine causality by travelling backwards in time, so we must conclude that `Q` and `R` are ***not*** related by the "happens before" relation. 168 | In spite of the visual position of the dots in the diagram, we are unable to say which event happened first. 169 | 170 | Another way of saying this is that the ***"happens before"*** relation cannot be used to form an ordered pair from `Q` and `R`. 171 | All we can say about `Q` and `R` is that these events are **"concurrent"** or **"independent"**. This is written as `Q || R`. 172 | 173 | In the above diagram, events `X` and `P`, and `Z`, `U`, `V` and `Q` are also concurrent. 174 | 175 | 176 | ## Partial Orders 177 | 178 | A partial order is where, for a given set `S`, a binary relation holds that exhibits the following properties. 
179 | This binary relation is usually, but not always, written as `≤` (less than or equals) and allows you to compare two members of `S` using the following properties: 180 | 181 | | Property | English Description | Mathematical Description | 182 | |---|---|---| 183 | | Reflexivity | For all `a` in `S`,<br>
`a` is always `≤` to itself | `∀ a ∈ S: a ≤ a` 184 | | Anti-symmetry | For all `a` and `b` in `S`,
if `a ≤ b` and `b ≤ a`, then `a = b` | `∀ a, b ∈ S: a ≤ b, b ≤ a => a = b` 185 | | Transitivity | For all `a`, `b` and `c` in `S`,
if `a ≤ b` and `b ≤ c`, then `a ≤ c` | `∀ a, b, c ∈ S: a ≤ b, b ≤ c => a ≤ c` 186 | 187 | However, when the members of the set are events whose ordering is determined by some measure of time, it makes no sense to say that event `A` ***"happens before"*** itself; so when speaking of a set of events, the reflexivity property is nonsensical and therefore not applicable. 188 | 189 | Also, we will never encounter the situation in a distributed system where event `A` happens before event `B` ***and*** event `B` happens before event `A` (making `A` and `B` the same event). 190 | Thus, the anti-symmetry property can never be adhered to in real life. 191 | Strictly speaking however, whilst this rule is never followed, it is also never violated; therefore, this rule is said to be ***"vacuously true"***. 192 | That is, when dealing with a set of real-life events, we will never find an example that exhibits this property; however, we will also never find an example that violates this property... 193 | 194 | So, the "happens before" relation is a weird kind of partial order because only two of the three rules governing partial orders apply, and even then, one of those two is true only in a vacuous sense. 195 | 196 | Therefore, the ***happens before*** relation is said to be *"an irreflexive partial order"*. 197 | 198 | In distributed systems, we will be dealing with many different kinds of partial order. 199 | So the first fundamental principle we find governing the behaviour of distributed systems is this weird partial order called "happens before". 200 | 201 | Whenever we talk about a relation being a partial order, we must first look at the set of things we're dealing with. If we're dealing with the "happens before" relation, then we're dealing with a set of events. 202 | 203 | --- 204 | 205 | | Previous | Next 206 | |---|--- 207 | | [Lecture 2](./Lecture%2002.md) | [Lecture 4](./Lecture%2004.md) 208 | 209 | --- 210 | 211 | ***Endnotes*** 212 | 213 | 1   It is not a requirement for the direction of time to be downwards in a Lamport Diagram. 214 | This is simply a stylistic choice; however, time is most often drawn moving either downwards, or from left to right. 215 | 216 | [↩](#a1) 217 | 218 | -------------------------------------------------------------------------------- /Lecture 04.md: -------------------------------------------------------------------------------- 1 | # Distributed Systems Lecture 4 2 | 3 | ## Lecture Given by [Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/) on April 6th, 2020 via [YouTube](https://www.youtube.com/watch?v=zQk7U6InXZs) 4 | 5 | | Previous | Next 6 | |---|--- 7 | | [Lecture 3](./Lecture%2003.md) | [Lecture 5](./Lecture%2005.md) 8 | 9 | ## Recap 10 | 11 | ### What is a Distributed System? 12 | 13 | > ***[Martin Kleppman](https://martin.kleppmann.com/)'s definition*** 14 | > A collection of computing nodes connected by a network and characterised by partial failure 15 | 16 | > ***[Peter Alvaro](https://dl.acm.org/profile/81453654530)'s definition*** 17 | > A collection of computing nodes connected by a network and characterised by partial failure and unbounded latency 18 | 19 | 20 | ### What is the Definition of the *"Happens Before"* Relation 21 | 22 | * Events `A` and `B` take place in the same process and `B` happens after `A` 23 | * If `A` is a message send event and `B` is the corresponding receive event, then `B` ***cannot*** have happened before `A`. 
24 | * If `A -> C` and `C -> B`, then we can be certain that `A -> B` (Transitive closure) 25 | 26 | 27 | ### The *"Happens Before"* Relation is an Irreflexive Partial Order 28 | 29 | A partial order for a set `S` is where the members of `S` can be ordered using a binary relation such as *"less than or equal to"* `≤`. 30 | The partial order then has the following properties: 31 | 32 | | Property | English Description | Mathematical Description | 33 | |---|---|---| 34 | | Reflexivity | For all `a` in `S`,<br>
`a` is always `≤` to itself | `∀ a ∈ S: a ≤ a` 35 | | Anti-symmetry | For all `a` and `b` in `S`,
if `a ≤ b` and `b ≤ a`, then `a = b` | `∀ a, b ∈ S: a ≤ b, b ≤ a => a = b` 36 | | Transitivity | For all `a`, `b` and `c` in `S`,
if `a ≤ b` and `b ≤ c`, then `a ≤ c` | `∀ a, b, c ∈ S: a ≤ b, b ≤ c => a ≤ c` 37 | 38 | The set of all events in a system can be ordered by the *"happens before"* partial order; however, in the case of events, not all of the three properties described above are applicable. 39 | It is impossible to apply the reflexivity property simply because it makes no sense to say that an event *happened before itself*. 40 | 41 | ### Accurate Terminology 42 | 43 | Don't get hung up on thinking that you must always state that the "happens before" relation is an irreflexive partial order. 44 | It’s fine just to call it a "partial order". 45 | 46 | ## What's an Example of a True Partial Order? 47 | 48 | Set inclusion (or set containment). 49 | This is defined as the set of all subsets of any given set. 50 | 51 | So, if we have a set `{a, b, c}`, then the containment set `S` contains the following 8 elements; each of which is one of the possible subsets of `{a, b, c}` (Don't forget to include both the empty set and the entire set!): 52 | 53 | ``` 54 | { {} 55 | , {a} 56 | , {b} 57 | , {c} 58 | , {a, b} 59 | , {a, c} 60 | , {b, c} 61 | , {a, b, c} 62 | } 63 | ``` 64 | 65 | All of these members are subsets of the original set and are related by the relation `⊆` (meaning "is a subset of") and can be represented as a lattice with the empty set at the bottom 66 | 67 | ![Partial order lattice](./img/L4%20Lattice.png) 68 | 69 | The `⊆` relation ("is a subset of") is a true partial order because every element of the set adheres to the rules of: 70 | 71 | * Reflexivity 72 | * Antisymmetry 73 | * Transitivity 74 | 75 | ## Total Orders 76 | 77 | As an aside... 78 | 79 | ***Q:***   In the above inclusion set `S`, how is the element `{a}` related to the element `{b,c}`? 80 | ***A:***   It's not! 81 | In a set defined by a ***partial order***, it is not possible to relate every member of that set to every other member. 82 | That's why its called a ***partial*** order! 83 | 84 | Some orders however are able to relate every element in a set to every other element in that set. 85 | These are known as ***total orders***. 86 | 87 | A good example of a total order is the counting (or natural) numbers. 88 | If our set is the natural numbers and the relation is `≤`, then our lattice diagram of the containment set is simply a straight line: 89 | 90 | ![Total order](./img/L4%20Natural%20Numbers.png) 91 | 92 | This is because every natural number is comparable to every other natural number using the `≤` relation. 93 | 94 | ## How Can a Computer Determine the *"Happens Before"* Relation? 95 | 96 | This question relates to clocks - specifically ***Logical Clocks*** 97 | 98 | > A logical clock is a very unusual type of clock because it can neither tell us the time of day, nor how large a time interval has elapsed between two events. 99 | > All a logical clock can tell us is the order in which events occurred. 100 | 101 | ### Lamport Clocks 102 | 103 | The simplest type of logical clock is a ***Lamport Clock***. 104 | In its most basic form, this is simply a counter assigned to an event. 105 | The way we can reason about Lamport Clock values is by knowing that: 106 | 107 | ``` 108 | if A -> B then LC(A) < LC(B) 109 | ``` 110 | 111 | In other words, if it is true that event `A` happened before event `B`, then we can be certain that the Lamport Clock value of event `A` will be smaller than the Lamport Clock value of event `B`. 
112 | 113 | It is not important to know the absolute Lamport Clock value of an event, because outside the context of the *"happens before"* relation, this value is meaningless. 114 | And even in the context of the *"happens before"* relation, we must have at least two events in order to compare their Lamport Clock values. 115 | 116 | In other words, Lamport Clocks mean nothing in themselves, but are consistent with causality. 117 | 118 | ### Assigning Lamport Clocks to Events 119 | 120 | When assigning a value to a Lamport Clock, we need first to make a defining decision, and then follow a simple set of rules: 121 | 122 | 1. First, decide what constitutes *"an event"*. 123 | The outcome of this decision then defines what our particular Lamport Clock will count. 124 | In this particular case, we decide that both sending ***and*** receiving a message ***is*** counted as an event. 125 | (Some systems choose not to count *"message receives"* as events). 126 | 1. Every process has an integer counter, initially set to 0 127 | 1. On every event, the process increments its counter by 1 128 | 1. When a process sends a message, the current Lamport Clock value is included as metadata sent with the message payload 129 | 1. When receiving a message, the receiving process sets its Lamport Clock using the formula 130 | `max(local counter, msg counter) + 1` 131 | 132 | If we decide that a *"message receive"* is not counted as an event, then the above formula does not need to include the `+ 1`. 133 | 134 | ### Worked Example 135 | 136 | We can see how this formula is applied in the following sequence of message send/receive events: 137 | 138 | We have three processes `A`, `B` and `C` and the Lamport Clock values at the top of the process line indicate the state of the clock ***before*** the message send/receive event, and the Lamport Clock values at the bottom of the process line indicate the state of the clock ***after*** the send/receive event has been processed. 139 | 140 | ![Lamport Clock Message Send 1](./img/L4%20LC%20Msg%20Send%201.png) 141 | 142 | ![Lamport Clock Message Send 1](./img/L4%20LC%20Msg%20Send%202.png) 143 | 144 | ![Lamport Clock Message Send 1](./img/L4%20LC%20Msg%20Send%203.png) 145 | 146 | As you can see, the value of each process' Lamport Clock increases monotonically (that is, the value only ever stays the same, or gets bigger — it can never get smaller!) 147 | 148 | ## Reasoning About Lamport Clock Values 149 | 150 | We know that if `A` happens before `B` (`A -> B`) then we also know that `A`'s Lamport Clock value will be smaller than `B`'s Lamport Clock value (`LC(A) < LC(B)`). 151 | 152 | But can we apply this logic the other way around? If we know that `LC(A) < LC(B)` then does this prove that `A -> B`? 153 | 154 | Actually, no it doesn't. Consider this situation: 155 | 156 | ![Lamport Clock Message Send 1](./img/L4%20LC%20Msg%20Send%204.png) 157 | 158 | ![Lamport Clock Message Send 1](./img/L4%20LC%20Msg%20Send%205.png) 159 | 160 | ![Lamport Clock Message Send 1](./img/L4%20LC%20Msg%20Send%206.png) 161 | 162 | Let's compare the Lamport Clock values of processes `A` and `B` 163 | 164 | ``` 165 | LC(A) = 1 166 | LC(B) = 4 167 | ``` 168 | 169 | Since `LC(A) < LC(B)` does this prove that `E1 -> E2`? 170 | 171 | Absolutely not! 172 | 173 | Why? 174 | Because we have no way to connect event `E1` with event `E2`. 175 | These two events do not sit in the same process, neither is there a sequence of message send/receive events that allows us to draw a continuous line between them. 
176 | So, in this case, we are unable to use the ***happens before*** relation to define any causal relation between events `E1` and `E2`. 177 | All we can say is that `E1` is independent of `E2`, or `E1 || E2`. 178 | 179 | So, in plain language: 180 | 181 | > If we are unable to trace an unbroken connection between two events, then we are unable to establish a causal relation between those events 182 | 183 | Or to use fancier, academic language: 184 | 185 | > Causality requires graph reachability moving forwards through spacetime 186 | 187 | So, in summary, when we reason about Lamport Clock values, if we know that `A -> B`, then we can be sure that `LC(A) < LC(B)`. In other words: 188 | 189 | > Lamport Clocks are consistent with causality 190 | 191 | However, simply knowing that `LC(A) < LC(B)` does not prove `A -> B`; this is because: 192 | 193 | > Lamport Clocks do not characterise (or establish) causality 194 | 195 | The above example is derived from the ["Holy Grail"](./papers/holygrail.pdf) paper by Reinhard Schwarz and Friedemann Mattern. 196 | 197 | ### So, What Are Lamport Clocks Good For? 198 | 199 | ***Q:***   Can we use a Lamport Clock to tell us the time of day? 200 | ***A:***   Nope 201 | 202 | ***Q:***   Can we use a Lamport Clock to measure the time interval between two events? 203 | ***A:***   Nope 204 | 205 | ***Q:***   So what are they good for? 206 | ***A:***   A Lamport Clock is designed to help establish event ordering in terms of the *happens before* relation 207 | 208 | Even though Lamport Clocks cannot characterise causality, they are still very useful because they define the logical implication: `if A -> B then LC(A) < LC(B)` 209 | 210 | All logical implications are constructed from a premise (`A -> B`) stated in the form of a question, and a conclusion (`LC(A) < LC(B)`) that can be reached if the answer to the question is true. 211 | 212 | For any logical implication, we can also take its contra-positive. 213 | Thus, if `P => Q` then the contra-positive states that `¬Q => ¬P` 214 | 215 | So, in the case of the "happens before" relation 216 | 217 | `if A -> B then LC(A) < LC(B)` 218 | 219 | The contra-positive states 220 | 221 | `if ¬(LC(A) < LC(B)) then ¬(A -> B)` 222 | 223 | The contra-positive states that if the Lamport Clock of `A` is not less than the Lamport Clock of `B`, then `A` cannot have happened before `B`. 224 | This turns out to be really valuable in debugging because if we know that event `A` ***did not*** happen before event `B`, then we can say with complete certainty that whatever consequences were created by event `A`, they could never have contributed to the earlier problem experienced during event `B`. 225 | 226 | 227 | ## Summary 228 | 229 | Lamport Clocks are good for certain aspects of determining causality, but they do leave us with indeterminate situations. 230 | This is because a Lamport Clock is consistent with causality but does not characterise it. 231 | 232 | In order to remove this indeterminacy, we need a different type of clock, called a Vector Clock. 
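Before moving on to Vector Clocks, here is a minimal Rust sketch that pulls together the Lamport Clock rules from this lecture (the `Process` type and the scenario in `main` are purely illustrative):

```rust
// Each process keeps a single integer counter
#[derive(Debug, Default)]
struct Process {
    clock: u64,
}

impl Process {
    // Rule: on every event, increment the counter by 1
    fn local_event(&mut self) -> u64 {
        self.clock += 1;
        self.clock
    }

    // Rule: a send is an event; the clock value travels with the payload
    fn send(&mut self) -> u64 {
        self.local_event()
    }

    // Rule: on receive, set the clock to max(local, message) + 1
    // (systems that don't count receives as events would drop the "+ 1")
    fn receive(&mut self, msg_clock: u64) -> u64 {
        self.clock = self.clock.max(msg_clock) + 1;
        self.clock
    }
}

fn main() {
    let (mut a, mut b) = (Process::default(), Process::default());
    let m = a.send();        // LC(A) = 1; clock value sent with the message
    let lc_b = b.receive(m); // LC(B) = max(0, 1) + 1 = 2
    println!("LC(B) after delivery = {lc_b}");
}
```
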
233 | 234 | --- 235 | 236 | | Previous | Next 237 | |---|--- 238 | | [Lecture 3](./Lecture%2003.md) | [Lecture 5](./Lecture%2005.md) 239 | -------------------------------------------------------------------------------- /Lecture 05.md: -------------------------------------------------------------------------------- 1 | # Distributed Systems Lecture 5 2 | 3 | ## Lecture Given by [Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/) on April 8th, 2020 via [YouTube](https://www.youtube.com/watch?v=zuxA6f-XIAc) 4 | 5 | | Previous | Next 6 | |---|--- 7 | | [Lecture 4](./Lecture%2004.md) | [Lecture 6](./Lecture%2006.md) 8 | 9 | 10 | 11 | ## Recap of Lamport Clocks 12 | 13 | Lamport Clocks are consistent with causality, but do not characterise (or establish) it. 14 | 15 | If `A -> B` then this implies that `LC(A) < LC(B)` 16 | 17 | However, this implication is not reversable: 18 | 19 | If `LC(A) < LC(B)` then it does ***not*** imply `A -> B` 20 | 21 | 22 | ## Vector Clocks 23 | 24 | Invented independently by Friedemann Mattern and Colin Fidge. Both men wrote papers about this subject in 1988. 25 | 26 | > Addendum 27 | > 28 | > Subsequent investigation by Lindsey Kuper has uncovered that without using the specific name *"vector clock"*, the concept was already being used by Rivka Ladin and Barbara Liskov in 1986. 29 | > See [lecture 23](https://github.com/ChrisWhealy/DistributedSystemNotes/blob/master/Lecture%2023.md#ladin--liskovs-paper-highly-available-distributed-services-and-fault-tolerant-distributed-garbage-collection) for details. 30 | 31 | For Vector Clocks however, this relation ***is*** reversable: 32 | 33 | `A -> B <==> VC(A) < VC(B)` 34 | 35 | A Lamport Clock is just a single integer, but a Vector Clock is a sequence of integers. 36 | (The word *"vector"* is being used here in the programming sense, not in the sense of a magnitude and direction) 37 | 38 | ### Implementing a Vector Clock 39 | 40 | There are two pre-conditions that must be fulfilled before you can implement a Vector Clock: 41 | 42 | 1. You must know upfront how many processes make up your system 43 | 1. All the processes must agree the order in which the clock values will occur within the vector 44 | 45 | In this case, we know that we have three processes `Alice`, `Bob` and `Carol`, and that the order of values in the vector will be sorted alphabetically by process name. 46 | 47 | So, each process will create its own copy of the vector clock with an initial value of `[0,0,0]` for `Alice`, `Bob` and `Carol` respectively. 48 | 49 | > ***IMPLEMENTATION DETAIL*** 50 | > When implementing a vector clock, the above two constraints can be relaxed. 51 | > 52 | > Rather than implementing a vector clock as a simple list of integers, it is useful first to create data type for a Lamport Clock which could be as simple as just the process name and a clock value. 53 | > 54 | > A minimal Rust implementation might look something like this: 55 | > 56 | > ```rust 57 | > pub struct LamportClock { 58 | > pub process_name: String, 59 | > pub clock_value: u64 60 | > } 61 | > ``` 62 | > 63 | > A minimal Vector Clock is then simply an array of Lamport Clocks. 64 | > 65 | > ```rust 66 | > pub struct VectorClock { 67 | > pub entries: Vec 68 | > } 69 | > ``` 70 | > 71 | > Now with suitable coding to manage vector clocks, process names can occur in any order, or even be missing. 72 | 73 | The Vector Clock is then managed by applying the following rules: 74 | 75 | 1. 
Every process maintains a vector of integers initialised to `0` - one for each process with which we wish to communicate 76 | 77 | 1. On every event, a process increments its own position in the vector clock: this also includes internal events that do not cause messages to be sent or received 78 | 79 | 1. When sending a message, each process includes the current state of its clock as metadata with the message payload 80 | 81 | 1. When receiving a message, each process updates its own position in the vector clock using the rule `max(VC(self),VC(msg))` 82 | 83 | If, at some point in time, the state of the vector clock in process `Alice` becomes `[17,0,0]`, then this means that as far as `Alice` is concerned, it has recorded 17 events, whereas it thinks that processes `Bob` and `Carol` have not recorded any events yet. 84 | 85 | ### Calculating the `max` of Two Vector Clocks 86 | 87 | But how do we take the `max` of two vectors? 88 | 89 | The notion of `max` we're going to use here is a per-element comparison (a.k.a. a pointwise maximum) 90 | 91 | For example, if my VC is `[1,12,4]` and I receive the VC `[7,0,2]`, the pointwise maximum would be `[7,12,4]` 92 | 93 | ### Applying the `<` Operator on Two Vectors 94 | 95 | What does `<` mean in the context of two vectors? 96 | 97 | It means that when a pointwise comparison is made of the values in `VC(A)` and `VC(B)`, two things must be true: 98 | 99 | * At least one value in `VC(A)` is less than the corresponding value in `VC(B)`, and 100 | * No value in `VC(A)` is greater than the corresponding value in `VC(B)` 101 | 102 | > ***IMPORTANT*** 103 | > For two vector clocks to be comparable, they must refer to exactly the same set of clock values, and depending on your implementation, those values may also have to be listed in the same order. 104 | 105 | This comparison can be performed using the `≤` operator as long as we first reject the case where `VC(A) == VC(B)`. 106 | To put that more algorithmically, the following must be true (and here we assume that a vector clock is a simple list of integers): 107 | 108 | <pre>
109 |    VC(A) !== VC(B)
110 | && VC(A).length == VC(B).length
111 | && for each process at position i, VC(A)i ≤ VC(B)i
112 | 
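Here is one possible Rust rendering of this check, keeping the same assumption that a vector clock is a simple list of integers (the function name is invented for illustration):

```rust
// VC(A) < VC(B): the clocks are comparable, differ somewhere,
// and no element of A exceeds the corresponding element of B
fn vc_less_than(a: &[u64], b: &[u64]) -> bool {
    a.len() == b.len() && a != b && a.iter().zip(b).all(|(x, y)| x <= y)
}

fn main() {
    assert!(vc_less_than(&[1, 0, 2], &[1, 1, 2]));  // strictly less: A -> B
    assert!(!vc_less_than(&[2, 2, 0], &[1, 2, 3])); // neither is less...
    assert!(!vc_less_than(&[1, 2, 3], &[2, 2, 0])); // ...so A || B
}
```
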
113 | 114 | ### But Is This Comparison Enough to Characterise Causality? 115 | 116 | Consider two Vector Clocks `VC(A) = [2,2,0]` and `VC(B) = [1,2,3]` 117 | 118 | Is `VC(A) < VC(B)`? 119 | 120 | So, taking a pointwise comparison of each element gives: 121 | 122 | | Index | VC(A)i ≤ VC(B)i | Outcome 123 | |---|---|--- 124 | | `0` | `2 ≤ 1` | `false` 125 | | `1` | `2 ≤ 2` | `true` 126 | | `2` | `0 ≤ 3` | `true` 127 | 128 | The overall result is then calculated by `AND`ing all the outcomes together: 129 | 130 | `false && true && true = false` 131 | 132 | So, we can conclude that `VC(A)` is ***not*** less than `VC(B)`. 133 | 134 | Ok, let’s do this comparison the other way around. 135 | 136 | Is `VC(B) < VC(A)`? 137 | 138 | Again, we perform a pointwise comparison of each element gives: 139 | 140 | | Index | VC(B)i ≤ VC(A)i | Outcome 141 | |---|---|--- 142 | | `0` | `1 ≤ 2` | `true` 143 | | `1` | `2 ≤ 2` | `true` 144 | | `2` | `3 ≤ 0` | `false` 145 | 146 | The overall result is still `false` because `true && true && false = false` 147 | 148 | Since `VC(B)` is ***not*** less than `VC(A)` and `VC(A)` is ***not*** less than `VC(B)`, we are left in an indeterminate state. 149 | All we can say about the events represented by these two vector clocks is that they are concurrent, independent or causally unrelated (these three terms are synonyms). 150 | 151 | I.E. `A || B` 152 | 153 | ## Worked Example 154 | 155 | Let's see how the vector clocks in three processes (`Alice`, `Bob` and `Carol`) are processed as messages are sent and received: 156 | 157 | ![Vector Clocks example 1](./img/L5%20VC%20Clocks%201.png) 158 | ![Vector Clocks example 2](./img/L5%20VC%20Clocks%202.png) 159 | ![Vector Clocks example 3](./img/L5%20VC%20Clocks%203.png) 160 | ![Vector Clocks example 4](./img/L5%20VC%20Clocks%204.png) 161 | ![Vector Clocks example 5](./img/L5%20VC%20Clocks%205.png) 162 | ![Vector Clocks example 6](./img/L5%20VC%20Clocks%206.png) 163 | ![Vector Clocks example 7](./img/L5%20VC%20Clocks%207.png) 164 | 165 | After these messages have been sent, we can see the history of how the vector clocks in each process changed over time. 166 | 167 | ![Vector Clocks example 8](./img/L5%20VC%20Clocks%208.png) 168 | 169 | ## Determining the Causal History of an Event 170 | 171 | If we choose a particular event `A`, let's determine the events in `A`'s causal history. 172 | 173 | ![Causal History 1](./img/L5%20Causal%20History%201.png) 174 | 175 | `A` was an event that took place within process `Bob` and by looking back at `Bob`'s other events, we can see one sequence of events: 176 | 177 | ![Causal History 2](./img/L5%20Causal%20History%202.png) 178 | 179 | Also, by following the messages that led up to event `A`, we can see another sequence of events: 180 | 181 | ![Causal History 3](./img/L5%20Causal%20History%203.png) 182 | 183 | What is common here is that: 184 | 185 | 1. Event `A` has ***graph reachability moving forwards in spacetime*** from all the events in its causal history. 186 | That is, without lifting the pen from the paper or going backwards in time, we can connect any event in `A`'s past with `A`. 187 | 1. Working backwards from `A`, we can see that all the vector clock values in its causal history satisfy the ***happens before*** relation: that is, all preceding vector clock values are less than `A`'s vector clock value. 
188 | 189 | In addition to this, by looking at events that come after `A` (in other words, `A` is in the causal history of some future event), we can see that all such vector clock values are larger than `A`'s. 190 | 191 | ### Are the Vector Clocks of All Events Comparable? 192 | 193 | Consider events `A` and `B`. 194 | Can their vector clock values be related using the ***happens before*** relation? 195 | 196 | ![Causal History 4](./img/L5%20Causal%20History%204.png) 197 | 198 | In order for this relation to be satisfied, the vector clock of one event must be less than the vector clock of the other event (in a pointwise sense). 199 | But in this case, this is clearly not true: 200 | 201 | ``` 202 | VC(A) = [2,4,1] 203 | VC(B) = [0,3,2] 204 | 205 | [2,4,1] < [0,3,2] = false 206 | [0,3,2] < [2,4,1] = false 207 | ``` 208 | 209 | Neither vector clock is larger or smaller than the other; therefore, all we can say about these two events is that they are independent, concurrent or causally unrelated, or `A || B`. 210 | 211 | This does however mean that we can easily tell a computer how to determine if two events are causally related. 212 | All we have to do is compare the vector clock values of these two events. 213 | If we can determine that one is less than the other, then we know for certain that the event with the smaller vector clock value occurred in the causal history of the event with the larger vector clock value. 214 | 215 | If, on the other hand, the ***less than*** relation cannot be satisfied, then we can be certain that the two events are causally unrelated. 216 | 217 | ## Protocols 218 | 219 | The non-rigorous definition of a protocol is that it is an agreed upon set of rules that computers use for communicating with each other. 220 | 221 | Let's take a simple example: 222 | 223 | One process sends the message `Hi, how are you?` to another process. 224 | According to our simple protocol, when a process receives this specific message, it is required to respond with `Good, thanks` 225 | 226 | However, our simplistic protocol says nothing about which process is to send the first message, or what should happen after the `Good, thanks` message has been received. 227 | 228 | The following two diagrams are both valid runs of our protocol: 229 | 230 | ![Protocol 1](./img/L5%20Protocol%201.png) 231 | 232 | ![Protocol 2](./img/L5%20Protocol%202.png) 233 | 234 | What about this? 235 | 236 | ![Protocol 3](./img/L5%20Protocol%203.png) 237 | 238 | Nope, this is not allowed because our protocol states that the message `Good, thanks` should only be sent in response to the receipt of message `Hi, how are you?`. 239 | Therefore, sending such an unsolicited message constitutes a protocol violation. 240 | 241 | ### So How Long Did the Message Exchange Take? 242 | 243 | What about this - is this a protocol violation? 244 | 245 | ![Protocol 4](./img/L5%20Protocol%204.png) 246 | 247 | Hmmm, it's hard to tell. 248 | Maybe the protocol is running correctly and we're simply looking at a particular point in time that does not give us the full story. 249 | 250 | The point here is that a Lamport Diagram can only represent logical time - that is, it describes the order in which a sequence of events occurred, but it cannot give us any idea about how much real-world time elapsed between those events. 
251 | 252 | But considering that a Logical Clock is only concerned with the ordering of events, it is not surprising that the following two event sequences appear to be identical: 253 | 254 | ![Protocol 5](./img/L5%20Protocol%205.png) 255 | 256 | ### Complete Protocol Representation? 257 | 258 | ***Q:***   Is it possible to use a Lamport Diagram to give us a complete representation of all the possible message exchanges in a given protocol? 259 | ***A:***   No! 260 | 261 | It turns out that there are infinitely many different Lamport Diagrams that all represent valid runs of a protocol. 262 | 263 | The point here is that a Lamport Diagram is good for representing a ***specific*** run of a protocol, but it cannot represent ***all possible*** runs of that protocol, because there will be infinitely many. 264 | 265 | Lamport Diagrams are also good for representing protocol violations - for instance, in the diagram above, `Alice` sent the message `Good, thanks` to `Bob` without `Bob` first sending the message `Hi, how are you?`. 266 | 267 | ### Correctness Properties and Their Violation 268 | 269 | When discussing properties of a system that we want to be true, one way to talk about the correctness of such properties is to draw a diagram that represents their violation. 270 | 271 | For instance, consider the order in which messages are sent and received. 272 | 273 | ### Important Terminology 274 | 275 | In distributed systems, messages are not only sent and received, but they are also ***delivered***. 276 | 277 | Hmmmm, it sounds like there is some highly specific meaning attached to the word ***deliver***... Yes, there is. 278 | 279 | ***Sending*** and ***receiving*** can be understood quite intuitively, but in the context of distributed systems, the concept of ***message delivery*** has a highly specific meaning: 280 | 281 | * ***Sending a Message*** 282 | The explicit act of causing a message to be transmitted (typically over a network). 283 | The sending process has complete control over when or even if a message is sent; therefore, sending a message is entirely ***active***. 284 | * ***Receiving a Message*** 285 | The act of capturing a message upon arrival. 286 | The receiving process has no control over when or even if a message arrives — they just show up randomly... or not. Therefore, from the perspective of the receiving process, receiving a message is entirely ***passive***. 287 | * ***Delivering a Message*** 288 | The term ***Delivery*** exists to distinguish the passive act of receiving a message from the active choice to start processing the contents of that message. You can't decide when you're going to receive a message, but you can decide when and if to process its contents; therefore, although receiving a message is passive, the choice to deliver a message is entirely ***active***. 289 | For example, by placing a received message into a queue, we can delay acting upon the contents of that message until some later point in time. 290 | 291 | ### FIFO (or Ordered) Delivery 292 | 293 | If a process sends message `M2` after message `M1`, the receiving process must deliver `M1` first, followed by `M2`. 294 | 295 | We can represent a protocol violation such as a ***FIFO anomaly*** using the following diagram. 296 | 297 | ![FIFO Anomaly](./img/L5%20FIFO%20Anomaly.png) 298 | 299 | This is an example of where a diagram provides a very useful way to represent the violation of some correctness property of a protocol.
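Before moving on, here is one way to make the receive/deliver distinction concrete. This is a minimal, purely illustrative sketch (the per-sender sequence numbers it relies on are discussed properly in the next lecture): the receiver passively accepts messages as they arrive, but actively holds them back in a queue so that each sender's messages are delivered in FIFO order.

```
class FifoReceiver:
    def __init__(self):
        self.next_seq = {}    # sender -> sequence number of the next message to deliver
        self.held_back = {}   # sender -> {sequence number: message} received too early

    def on_receive(self, sender, seq, msg):
        # Receiving is passive: this is called whenever a message happens to arrive
        self.held_back.setdefault(sender, {})[seq] = msg
        expected = self.next_seq.setdefault(sender, 1)
        # Delivering is active: only process a message once it is its turn
        while expected in self.held_back[sender]:
            self.deliver(sender, self.held_back[sender].pop(expected))
            expected += 1
        self.next_seq[sender] = expected

    def deliver(self, sender, msg):
        print(f"delivering {msg!r} from {sender}")
```

If `m2` (sequence number 2) arrives before `m1` (sequence number 1), `on_receive` simply parks `m2` in the hold-back queue and delivers nothing; once `m1` arrives, both messages are delivered in the correct order, which is exactly the behaviour needed to avoid the FIFO anomaly shown above.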
300 | 301 | 302 | --- 303 | 304 | | Previous | Next 305 | |---|--- 306 | | [Lecture 4](./Lecture%2004.md) | [Lecture 6](./Lecture%2006.md) 307 | 308 | 309 | -------------------------------------------------------------------------------- /Lecture 06.md: 1 | # Distributed Systems Lecture 6 2 | 3 | ## Lecture Given by [Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/) on April 10th, 2020 via [YouTube](https://www.youtube.com/watch?v=UoIiwJ2G2fc) 4 | 5 | | Previous | Next 6 | |---|--- 7 | | [Lecture 5](./Lecture%2005.md) | [Lecture 7](./Lecture%2007.md) 8 | 9 | ## Recap 10 | 11 | ### Sending, Receiving and Delivering 12 | 13 | * ***Sending*** a message is active: you choose when and if to send a message 14 | * ***Receiving*** a message is passive: you cannot choose either when or if a message will arrive. 15 | All you can do is react by capturing it 16 | * ***Delivering*** a message is active: it is the conscious choice to act upon the contents of a received message. 17 | 18 | But why would you want to queue a message before processing it? 19 | Typically, because messages need to be processed in the correct order, which could well be different from the order in which they were received. 20 | 21 | ### FIFO Delivery 22 | 23 | If a process sends message `M2` after `M1`, then any process delivering ***both*** of these messages must deliver `M1` first then `M2`. 24 | Failure to do this constitutes a protocol violation, described in the previous lecture as a "FIFO anomaly". 25 | 26 | ![FIFO violation or anomaly](./img/L5%20FIFO%20Anomaly.png) 27 | 28 | In this case, irrespective of the order in which process `Bob` received messages `m1` and `m2`, it should always ***deliver*** message `m1` first, followed by message `m2`. 29 | 30 | In the case that `Bob` does not receive one (or both) of these messages, then no FIFO violation could have occurred, because here we are concerned with message ***delivery***, not message ***receipt***. 31 | 32 | 33 | ### FIFO Delivery Implementation 34 | 35 | In real life, it is unusual to have to implement FIFO delivery yourself because most distributed systems communicate using TCP, which already implements FIFO packet delivery. 36 | 37 | ## Causal Delivery 38 | 39 | There are different ways of phrasing this, but one way is to say: 40 | 41 | > If `m1`'s send happens before `m2`'s send, then `m1`'s delivery must happen before `m2`'s delivery 42 | 43 | Here's an example of such a violation: 44 | 45 | ![Causal violation](./img/L6%20Causal%20Violation.png) 46 | 47 | In process P1, event `A` happens in the causal history of event `B`; therefore, any messages sent from P1 to P2 should be processed in the same causal order as the events that generated them. 48 | 49 | But now, let's go back to the "Bob smells" example used in [lecture 3](./Lecture%203.md): 50 | 51 | ![Causal Anomaly](./img/L3%20Causal%20Anomaly.png) 52 | 53 | ***Q:***   Is this a FIFO violation? 54 | ***A:***   No (but only in a vacuous sense...) 55 | 56 | The reason is that a FIFO violation only occurs when two messages ***from the same originating process*** are delivered out of order by the receiving process. 57 | 58 | In the above diagram, there is no single process that sends ***two*** distinct messages to the same receiving process.
Here: 59 | 60 | * `Alice` sends a single message to `Bob` 61 | * `Alice` sends a single message to `Carol` 62 | * `Bob` sends a single message to `Alice` 63 | * `Bob` sends a single message to `Carol` 64 | * `Carol` is confused... 65 | 66 | However, this scenario is still a causal anomaly. In general, at least three communicating processes are required to create a causal anomaly. 67 | 68 | ### Can a Message be Sent to Multiple Destinations? 69 | 70 | Yes. 71 | Messages can be sent either to a group of participants in a network (a multicast message), or to all participants in a network (a broadcast message). 72 | The idea of broadcast messages is something that will be dealt with later. 73 | 74 | ## Totally-Ordered Delivery 75 | 76 | This is another correctness property. 77 | 78 | > If a process delivers message `M1` followed by `M2`, then ***all*** processes delivering both `M1` and `M2` must deliver `M1` first followed by `M2`. 79 | 80 | Let's say we have two client processes `C1` and `C2` that each broadcast a message to two processes `R1` and `R2`. 81 | In this scenario, processes `R1` and `R2` each maintain their own replica of some key/value store. 82 | 83 | If processes `R1` and `R2` do not deliver the messages in the correct order, then we will encounter a violation that results in the replicas disagreeing with each other as to what the value of `x` should be. 84 | In other words, this violation creates an inconsistency between data replicas. 85 | 86 | ![Total-Order Anomaly](./img/L6%20Total%20Order%20Anomaly.png) 87 | 88 | This is known as a ***Total-Order Anomaly*** and is created when process `R1` delivers message `m1` followed by `m2`, but process `R2` delivers message `m2` followed by `m1`. 89 | 90 | ### Delivery Guarantees 91 | 92 | Since we know that causal delivery also ensures FIFO delivery, we can start to arrange these delivery strategies in a hierarchy, with the weakest at the bottom. 93 | Here, we will use the term `YOLO` to indicate the delivery guarantee that makes no guarantees! 94 | 95 | ![Delivery Hierarchy 1](./img/L6%20Delivery%20Hierarchy%201.png) 96 | 97 | Where would Totally Ordered Delivery fit in to this scheme? 98 | 99 | In fact, it would get its own branch because a FIFO anomaly is not necessarily an anomaly as far as Totally-Ordered Delivery is concerned. 100 | 101 | We recall that a FIFO anomaly is the following: 102 | 103 | ![FIFO violation or anomaly](./img/L5%20FIFO%20Anomaly.png) 104 | 105 | But since the definition of Totally-Ordered Delivery says that ***all*** processes delivering both `m1` and `m2` must do so in a consistent order, the above FIFO anomaly is not an anomaly for Totally-Ordered Delivery because there is only one receiving process. 106 | So the order in which that process delivers the messages is immaterial. 107 | Thus, this scenario only vacuously conforms to a Totally-Ordered Delivery. 108 | 109 | Conversely, Totally-Ordered Delivery violations are not necessarily FIFO violations. 110 | 111 | ![Delivery Hierarchy 2](./img/L6%20Delivery%20Hierarchy%202.png) 112 | 113 | ### What Does This Hierarchy Imply? 114 | 115 | This hierarchy helps us understand what we can and cannot expect out of a particular delivery guarantee. 116 | For instance, if we implement a system guaranteeing causal delivery, then in doing so, we would also be guaranteeing FIFO delivery, because FIFO delivery sits directly below Causal delivery in the hierarchy. 
117 | However, if we implemented a FIFO delivery system, we could make no guarantees about causal delivery. 118 | 119 | Similarly, if the system implements Totally-Ordered Delivery, then this guarantee, in and of itself, cannot ensure either FIFO or Causal delivery. 120 | 121 | Turning this argument around, we can also gain an understanding of what type of anomalies can occur. 122 | For instance, if we have a FIFO anomaly, then this is also going to be a causal anomaly, but not necessarily a Totally-Ordered anomaly. 123 | 124 | ## Implementing Delivery Guarantees 125 | 126 | ### Implementing FIFO Delivery: Sequence Numbers 127 | 128 | The rule here is that any process `P2` delivering messages from some other process `P1` must do so in the order in which `P1` sent those messages; which, due to variations in network latency, might well be different from the order in which those messages arrive at `P2`. 129 | 130 | How then would we go about eliminating FIFO delivery anomalies? 131 | 132 | One possibility is to use sequence numbers. 133 | This is where all messages from a given sender are tagged with a sequence number and a sender id. 134 | Each time a message is sent, the sender increments its sequence number. 135 | On the receiver's side, all the messages from a given sender are added to a queue ordered by the sequence number. 136 | When all the messages have arrived, the receiver can then deliver them in the correct order. 137 | 138 | In this case, sequence numbers do not need to be unique across all the processes, because each message is also qualified with a sender id; therefore, it is the combination of the sender id and the sequence number that allows the receiver to discriminate who sent which message and in what order. 139 | 140 | ***Problems with Sequence Numbers*** 141 | 142 | What happens if a message is lost? 143 | Consider the following sequence of events: 144 | 145 | * `Alice` sends three messages to `Bob`: `m1`, `m2` and `m3` 146 | * `Bob` receives and correctly delivers messages `m1` and `m2` 147 | * For some reason, `Bob` never received message `m3` 148 | * Unaware that message `m3` never arrived, `Alice` sends messages `m4` and `m5`, which `Bob` receives 149 | * However, because `Bob` is still waiting for the message with sequence number `3` to arrive, he will delay the delivery of any subsequent messages (by adding them to a queue) 150 | * In this situation, `Bob` may well end up waiting forever for the lost message to be delivered 151 | 152 | ![Naïve Sequence numbering](./img/L6%20Naive%20Seq%20Nos.png) 153 | 154 | Consequently, in a network where message delivery is unreliable, a naïve sequence number strategy like this will break as soon as message delivery fails for some reason. 155 | 156 | Strategies to mitigate these problems could include: 157 | 158 | * Buffering out of sequence messages for a pre-determined period of time, hoping that the late message arrives either before the message buffer fills or the pre-determined timeout expires 159 | * Processing out of sequence messages on the assumption that the intervening message is lost. 160 | If this assumption turns out to be false and the message delivery was simply delayed, then the late message would have to be dropped 161 | 162 | Neither of the above strategies is very good in that they tend to create more problems than they solve... 163 | 164 | ### Vacuous FIFO Delivery 165 | 166 | Now consider this situation. 167 | Are the conditions of FIFO delivery satisfied?
168 | 169 | ![Vacuous FIFO Delivery](./img/L6%20Vacuous%20FIFO%20Delivery.png) 170 | 171 | Yes, but only in a vacuous sense. 172 | Due to the fact that `Bob` drops all the messages he receives, zero messages are delivered; therefore, the conditions of FIFO delivery are vacuously satisfied. 173 | 174 | ### Implementing FIFO Delivery: Acknowledgments 175 | 176 | In this approach, upon receipt of a message, every receiver must send a *"message received"* acknowledgment (such as `ack`) back to the sender. 177 | 178 | So, when `Alice` sends a message to `Bob`, neither `Alice` nor `Bob` need concern themselves with sequence numbers. 179 | However, this approach has several distinct drawbacks: 180 | 181 | * Communication now becomes sequential. `Alice` cannot send `m2` to `Bob` until she has received an `ack` from `Bob` that he has received `m1` 182 | * It increases the volume of network traffic 183 | * We are still dependent upon a network that can guarantee reliable message delivery 184 | 185 | One way of making this approach to communication more efficient is to gather messages into batches, thus decreasing the granularity of communication. 186 | 187 | ## Using Vector Clocks to Prevent Causal Anomalies 188 | 189 | Let’s look again at the ***Causal Anomaly*** situation: 190 | 191 | ![Causal Anomaly](./img/L3%20Causal%20Anomaly.png) 192 | 193 | The problem here is that `Carol` delivers the message she receives from `Bob` out of causal order, thus resulting in confusion... 194 | 195 | Here, we can use vector clocks to solve causal anomalies. 196 | 197 | In the [previous lecture](./Lecture%205.md), we looked at using vector clocks to count both message-send and -receive events; but in order to ensure Causal Delivery, it turns out that we only need to count message-send events. 198 | 199 | ![Ensure Casual Delivery 1](./img/L6%20Ensure%20Casual%20Delivery%201.png) 200 | 201 | Before sending the message, `Alice` updates her vector clock to `[1,0,0]`. 202 | 203 | ### What Should Bob Do? 204 | 205 | `Bob` receives the message from `Alice`. 206 | 207 | ***Q:***   Should he deliver it? 208 | ***A:***   Yes, he has no reason not to. 209 | 210 | `Bob` delivers the message and discovers that it is not to his liking. 211 | But, since `Bob` has the emotional maturity of an eight-year-old, he fails to realise that soap and water will work far better than trading insults; so, he resorts to telling the world what he thinks of `Alice`. 212 | 213 | In delivering this message, `Bob` examines the vector clock of the incoming message and discovers that it's less than his, but only by the counter in `Alice`'s position. 214 | This is to be expected, since the message came from `Alice`. 215 | He therefore uses the received vector clock to update his own vector clock, and then increments his position in the vector clock. 216 | 217 | Since a broadcast message is treated as a single send event to multiple recipients, the same vector clock value of `[1,1,0]` is sent as part of the messages to both `Alice` and `Carol`. 218 | 219 | ![Ensure Casual Delivery 2](./img/L6%20Ensure%20Casual%20Delivery%202.png) 220 | 221 | ### What Should Alice Do? 222 | 223 | The message now arrives at `Alice`. 224 | 225 | ***Q:***   Should she deliver it? 226 | ***A:***   Yes, she has no reason not to. 227 | 228 | `Alice`'s vector clock is `[1,0,0]` and the incoming vector clock on the message differs only by `1` in `Bob`'s position.
229 | So, we can conclude that only one event has taken place since our last message send event, and that event happened in process `Bob` from whom we received this message. 230 | 231 | ### But What Should Carol Do? 232 | 233 | `Bob`'s message also arrives at `Carol` with vector clock `[1,1,0]`, but earlier than `Alice`'s original message. 234 | 235 | ***Q:***   Should she deliver it? 236 | ***A:***   No — look at the vector clock values! 237 | 238 | The reason is that compared to `Carol`'s vector clock (which is still set to `[0,0,0]`), the vector clock on the incoming message is too big. 239 | 240 | It’s fine for `Bob`'s position to be set to `1` because this is one bigger than `Carol`'s vector clock position for `Bob` and the message came from `Bob`. 241 | 242 | But there's a `1` in `Alice`'s vector clock position. 243 | 244 | ***Q:***   Hmmmm, that's odd. Where did that come from? 245 | ***A:***   The value comes from the fact that this message is the response to some event that has taken place in `Alice`, but that ***Carol doesn't yet know about***. 246 | 247 | In other words, as far as `Carol` is concerned, this is a ***message from the future*** that has arrived too early and must therefore be buffered. 248 | 249 | Finally, `Alice`'s original `"Bob smells"` message arrives at `Carol`. 250 | `Carol` now examines this message's vector clock and discovers that it has the expected value of `[1,0,0]`; therefore, it is fine to deliver this message first. 251 | 252 | Once this out-of-sequence message has been delivered, the message waiting in the buffer can be delivered because `Carol` has now caught up with the event that took place in `Alice`. 253 | 254 | `Carol` is no longer confused... 255 | 256 | --- 257 | 258 | | Previous | Next 259 | |---|--- 260 | | [Lecture 5](./Lecture%2005.md) | [Lecture 7](./Lecture%2007.md) 261 | 262 | -------------------------------------------------------------------------------- /Lecture 08.md: 1 | # Distributed Systems Lecture 8 2 | 3 | ## Lecture Given by [Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/) on April 15th, 2020 via [YouTube](https://www.youtube.com/watch?v=x1BCZ351dJk) 4 | 5 | | Previous | Next 6 | |---|--- 7 | | [Lecture 7](./Lecture%2007.md) | [Lecture 9](./Lecture%2009.md) 8 | 9 | 10 | ## Rules of the Chandy-Lamport Algorithm 11 | 12 | This is an example of a decentralised[1](#f1) algorithm that allows you to take a global snapshot of a running distributed system, and its design gives us two key advantages: 13 | 14 | 1. Any participating process can initiate a snapshot. 15 | The process initiating the snapshot is not required to occupy some elevated role (such as "supervisor") because this task is not considered "special" or "privileged". 16 | 1. The process initiating the snapshot does not need to warn the other processes that this action is about to take place. 17 | 18 | The act of initiating a snapshot creates a cascade of marker messages throughout the entire system. 19 | This message cascade then causes all the other processes to take a snapshot of themselves.
20 | 21 | ### The Initiator Process 22 | 23 | * Records its own state 24 | * Sends a marker message out on all its outgoing channels 25 | * Starts recording messages arriving on ***all*** incoming channels 26 | 27 | If process `P1` decides to initiate a snapshot, then the following sequence of events takes place: 28 | 29 | * `P1` records its own state as `S1` 30 | * Immediately after recording its own state, `P1` sends out marker messages on all its outgoing channels (only one in this case: channel C12) 31 | * `P1` starts recording any messages that might arrive on its incoming channels (again, only one in this case: channel C21) 32 | 33 | ![Chandy-Lamport Snapshot 1](./img/L8%20CL%20Snapshot%201.png) 34 | 35 | Notice that at the time `P1`'s snapshot happens, message `m` is currently ***in the channel*** from `P2` to `P1` (channel C21). 36 | 37 | ### Processes Receiving a Marker Message 38 | 39 | > ***IMPORTANT*** 40 | > 41 | > ![A Marker FIFO Anomaly Cannot Happen](./img/L8%20Marker%20FIFO%20Anomaly.png) 42 | > 43 | > Due to the fact that all channels behave as FIFO queues, we do not need to be concerned about the possibility of FIFO anomalies. 44 | > This system is designed such that marker messages cannot arrive ***before*** earlier message-send events in the originating process. 45 | > 46 | > None of what follows would work if we had not first eliminated the possibility of FIFO anomalies! 47 | 48 | When a process receives a marker message, it can react in one of two different ways. 49 | How it reacts depends on whether or not that process has already seen a marker message during this run of the global snapshot. 50 | 51 | #### Scenario 1: Nope, I Haven't Seen a Marker Message Before... 52 | 53 | If this is the first time this process has seen a marker message, the receiver: 54 | 55 | * Records its own state 56 | * Flags the channel on which the marker message was received as ***empty*** 57 | * Sends out a marker message on each of its outgoing channels 58 | * Starts recording incoming messages on all channels ***except*** the one on which it received the original marker message (now flagged as empty) 59 | 60 | ***Q:***   During a snapshot, once a channel is marked as empty, what happens if you then receive a message on that channel? 61 | ***A:***   Whilst the snapshot is running, messages received on channels marked as empty are ignored! 62 | 63 | In the diagram below, since this is the first marker message `P2` has seen, it does the following: 64 | 65 | * It records its own state as `S2` 66 | * Flags channel C12 as empty 67 | * Sends out a marker message on all its outgoing channels (in this case, only channel C21) 68 | * Normally, it would now start recording any messages that arrive on its other, incoming channels; however, in this case, since its only incoming channel (C12) has already been marked as empty, there is nothing to record 69 | 70 | ![Chandy-Lamport Snapshot 2](./img/L8%20CL%20Snapshot%202.png) 71 | 72 | #### Scenario 2: Yup, I've Already Seen a Marker Message... 73 | 74 | If a process sends out a marker message, then we consider that process already to have "seen" a marker message (its own).
75 | So when a process that has already sent out its own marker message receives someone else's marker message, it: 76 | 77 | * Stops recording incoming messages on that channel 78 | * Sets that channel's final state to be the sequence of all messages received whilst recording was active 79 | 80 | Message `m` from `P2` (sent at event `C`) arrives on channel C21 as event `D` in process `P1`. 81 | This message arrived ***before*** the marker message because channels always behave as FIFO queues. 82 | 83 | Upon receiving this marker message, `P1` then: 84 | 85 | * Stops recording on the marker message's channel (C21 in this case) 86 | * The final state of channel C21 is set to the sequence of messages that arrived whilst recording was active 87 | 88 | ![Chandy-Lamport Snapshot 3](./img/L8%20CL%20Snapshot%203.png) 89 | 90 | So, we now have a consistent snapshot of our entire system, which in this simple case, consists of four things: 91 | 92 | 1. The state of our two processes: 93 | * `P1`'s state recorded as `S1` 94 | * `P2`'s state recorded as `S2` 95 | 1. The state of all channels between those processes: 96 | * Channel C12 recorded by `P2` (Empty) 97 | * Channel C21 recorded by `P1` (Message `m`) 98 | 99 | ## The Chandy-Lamport Algorithm in a More Detailed Scenario 100 | 101 | When a snapshot takes place, every process ends up sending out a marker message to every other process. 102 | So, for a system containing `N` participating processes, `N * (N - 1)` marker messages will be sent. 103 | This might seem inefficient as the number of messages rises quadratically with the number of participating processes, but unfortunately, there is no better approach. 104 | 105 | As stated in the previous lecture, the success of the Chandy-Lamport algorithm relies entirely on the truth of the following assumptions: 106 | 107 | 1. Eventual message delivery is guaranteed, thus making delivery failure impossible 108 | 1. All channels act as FIFO queues, thus eliminating the possibility of messages being delivered out of order (FIFO anomalies) 109 | 1. Processes don't crash! (See [lecture 10](./Lecture%2010.md)) 110 | 111 | ### A Worked Example 112 | 113 | In this example, we have three communicating processes `P1`, `P2` and `P3` in our system, and we want to take a snapshot. 114 | 115 | Process `P1` acts as the initiator; so it follows the above steps: 116 | 117 | * It records its own state as S1 118 | * It sends out two marker messages; one to `P2` and one to `P3` - but notice that the arrival of the marker message at `P2` is delayed. 119 | This turns out not to be a problem. 120 | * `P1` starts recording on both its incoming channels C21 and C31 121 | 122 | ![Chandy-Lamport Example Step 1](./img/L8%20Snapshot%20Ex%201.png) 123 | 124 | Next, `P3` receives the marker message from `P1`. 
125 | Since this is the first marker message it has received: 126 | 127 | * It records its own state as `S3` 128 | * Marks the channel on which it received the marker message (C13) as empty 129 | * Sends out marker messages on all its outgoing channels 130 | * Starts recording on its other incoming channel (C23) 131 | 132 | ![Chandy-Lamport Example Step 2](./img/L8%20Snapshot%20Ex%202.png) 133 | 134 | Looking at `P3`'s marker message that now arrives at `P1`: since `P1` initiated the snapshot process, this is not the first marker it has seen, so `P1`: 135 | 136 | * Stops recording incoming messages on that channel (C31) 137 | * Sets that channel's final state to be the sequence of all messages received whilst recording was active - which is none - so the channel state of C31 is `{}`. 138 | 139 | ![Chandy-Lamport Example Step 3](./img/L8%20Snapshot%20Ex%203.png) 140 | 141 | Now look at the other marker message from `P3` to `P2`. 142 | This is the first marker `P2` has seen, so it: 143 | 144 | * Records its own state as `S2` 145 | * Marks the channel on which it received the marker message (C32) as empty 146 | * Sends out marker messages on all its outgoing channels 147 | * Starts recording on its other incoming channel (C12) 148 | 149 | ![Chandy-Lamport Example Step 4](./img/L8%20Snapshot%20Ex%204.png) 150 | 151 | Eventually, the initial marker message from `P1` arrives at `P2`. 152 | This is the second marker `P2` has seen, so it: 153 | 154 | * Stops recording incoming messages on that channel (C12) 155 | * Sets that channel's final state to be the sequence of all messages received whilst recording was active - which is none - so the channel state of C12 is `{}`. 156 | 157 | ![Chandy-Lamport Example Step 5](./img/L8%20Snapshot%20Ex%205.png) 158 | 159 | `P2`'s marker message now arrives at `P1`. 160 | This is not the first marker `P1` has seen, so it: 161 | 162 | * Stops recording incoming messages on that channel (C21) 163 | * Sets that channel's final state to be the sequence of all messages received whilst recording was active - which in this case is the message `m3` sent at event `H` in `P2` to event `D` in `P1` - so the channel state of C21 is `{m3}`. 164 | 165 | ![Chandy-Lamport Example Step 6](./img/L8%20Snapshot%20Ex%206.png) 166 | 167 | Lastly, the marker message from `P2` arrives at `P3`. 168 | Similarly, this is not the first marker `P3` has seen, so it: 169 | 170 | * Stops recording incoming messages on that channel (C23) 171 | * Sets that channel's final state to be the sequence of all messages received whilst recording was active - which is none - so the channel state of C23 is `{}`. 172 | 173 | ![Chandy-Lamport Example Step 7](./img/L8%20Snapshot%20Ex%207.png) 174 | 175 | We now have a consistent snapshot of the entire system composed of three process states: 176 | 177 | * P1 = S1 178 | * P2 = S2 179 | * P3 = S3 180 | 181 | And six channel states: 182 | 183 | * C12 = {} 184 | * C21 = {m3} 185 | * C13 = {} 186 | * C31 = {} 187 | * C23 = {} 188 | * C32 = {} 189 | 190 | ### What About the Internal Events Not Recorded in the Process Snapshots? 191 | 192 | In the above diagram, events `C`, `D` and `E` do not form part of `P1`'s snapshot recorded in state S1 because these events had not yet occurred at the time `P1` decided to take its snapshot. 193 | 194 | Similarly, events `J` and `K` do not form part of `P3`'s snapshot recorded in state S3 because these events had not yet occurred at the time the marker message arrived from `P1`.
195 | 196 | These events will all be recorded the next time a snapshot is taken. 197 | 198 | ## How Does the Entire System Know When the Snapshot Is Complete? 199 | 200 | An individual process knows its local snapshot is complete when it has recorded: 201 | 202 | * Its own internal state, and 203 | * The state of ***all*** its incoming channels 204 | 205 | If it can be shown that the snapshot process terminates for an individual process, and all individual processes use the same snapshot algorithm, then it follows that the snapshot will terminate for all participating processes in the system. 206 | 207 | Now we can appreciate the importance of the assumptions listed at the start. 208 | The success of this entire algorithm rests on the fact that: 209 | 210 | * Eventual message delivery is guaranteed, and 211 | * Messages never arrive out of order (all channels are FIFO queues), and 212 | * Processes do not crash (yeah, right! Again, see [lecture 10](./Lecture%2010.md)) 213 | 214 | In Chandy & Lamport's [original paper](https://lamport.azurewebsites.net/pubs/chandy.pdf) they provide a proof that the snapshot process does in fact terminate. 215 | 216 | However, determining when the snapshot for the entire system is complete lies outside the rules of the Chandy-Lamport algorithm itself. 217 | Management of an entire system snapshot needs to be handled by some external coordinating process that: 218 | 219 | 1. Receives all the snapshot data from the individual processes, then 220 | 1. Collates that data to form an overall system snapshot. 221 | 222 | --- 223 | 224 | | Previous | Next 225 | |---|--- 226 | | [Lecture 7](./Lecture%2007.md) | [Lecture 9](./Lecture%2009.md) 227 | 228 | --- 229 | 230 | ***Endnotes*** 231 | 232 | 1   In this context, a "decentralised algorithm" is one that does not need to be invoked from a special coordinating process; any process in the system can act as the initiator. A beneficial side-effect of this is that if two processes simultaneously decide to initiate a snapshot, then nothing bad happens. 233 | 234 | [↩](#a1) 235 | 236 | -------------------------------------------------------------------------------- /Lecture 09.md: -------------------------------------------------------------------------------- 1 | # Distributed Systems Lecture 9 2 | 3 | ## Lecture Given by [Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/) on April 17th, 2020 via [YouTube](https://www.youtube.com/watch?v=utsDozs1ZMc) 4 | 5 | | Previous | Next 6 | |---|--- 7 | | [Lecture 8](./Lecture%2008.md) | [Lecture 10](./Lecture%2010.md) 8 | 9 | 10 | ## Big Picture View of the Chandy-Lamport Algorithm 11 | 12 | ### Chandy-Lamport Assumptions 13 | 14 | The Chandy-Lamport algorithm was the very first to define how you can take a reliable snapshot of a running distributed system. 15 | This algorithm does make some fairly large assumptions, but the fact that it works even if marker messages are delayed is a significant achievement. 16 | 17 | #### FIFO Delivery 18 | 19 | We have assumed that the communication channels used between processes in a distributed system operate as FIFO queues. 20 | This means that a channel is a mechanism for delivering an ordered sequence of messages. 21 | But as we saw in the previous lecture, this is far more than an assumption - it is a requirement. 22 | If channels did not behave as FIFO queues, then the Chandy-Lamport Snapshot Algorithm would break. 
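To make these rules easier to check, here is a minimal sketch of the marker-handling logic described in the previous lecture. It is purely illustrative: the `record_own_state`, `send_marker` and `deliver` helpers are assumed, and error handling is omitted.

```
class Process:
    def __init__(self, pid, incoming, outgoing):
        self.pid = pid
        self.incoming = incoming      # ids of channels we receive on
        self.outgoing = outgoing      # ids of channels we send on
        self.own_state = None         # our recorded state (None = no snapshot yet)
        self.recording = {}           # channel id -> messages recorded so far
        self.channel_state = {}       # channel id -> final recorded sequence

    def initiate_snapshot(self):
        self.own_state = self.record_own_state()
        for ch in self.outgoing:
            self.send_marker(ch)
        for ch in self.incoming:              # record on ALL incoming channels
            self.recording[ch] = []

    def on_marker(self, ch):
        if self.own_state is None:            # first marker we have seen
            self.own_state = self.record_own_state()
            self.channel_state[ch] = []       # the marker's channel is flagged empty
            for out in self.outgoing:
                self.send_marker(out)
            for other in self.incoming:       # record on every other incoming channel
                if other != ch:
                    self.recording[other] = []
        else:                                 # already seen (or sent) a marker
            self.channel_state[ch] = self.recording.pop(ch, [])

    def on_message(self, ch, msg):
        if ch in self.recording:              # message was in flight during the snapshot
            self.recording[ch].append(msg)
        self.deliver(msg)                     # normal processing continues regardless
```

Note that `on_marker` is only correct because every channel is a FIFO queue: any message recorded for a channel must have been sent before the marker on that channel, and nothing sent after the marker can sneak into the recording.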
23 | 24 | So, this requirement then leads to the question: 25 | 26 | ***Q:***   If the delivery mechanism in a distributed system ***cannot*** guarantee ordered delivery (I.E. FIFO anomalies are possible), then are other algorithms available for taking a global snapshot? 27 | ***A:***   Yes, there are, but such algorithms have other drawbacks such as needing to pause application processing while the snapshot is taking place. 28 | 29 | One particularly nice thing about the Chandy-Lamport algorithm is that you can take a snapshot while the application is running (I.E. it’s not a *stop-the-world* style algorithm). 30 | The fact that a process sends out marker messages during its snapshot does not interfere with the application messages already travelling through the system. 31 | 32 | #### Reliable Delivery 33 | 34 | The Chandy-Lamport algorithm requires that messages are never: 35 | 36 | * Lost 37 | * Corrupted 38 | * Duplicated 39 | 40 | #### Processes Don't Crash 41 | 42 | The Chandy-Lamport algorithm is not robust against processes crashing whilst the snapshot is being taken. 43 | If this were to happen, at best the snapshot would be incomplete; but to obtain a full snapshot, you would have to start the snapshot process over again. 44 | 45 | ### The Chandy-Lamport Algorithm is Guaranteed to Terminate 46 | 47 | Without giving a formal proof, in order to take a snapshot of a distributed system we must make each process responsible for recording: 48 | 49 | * Its own internal state 50 | * The state of all messages on its incoming channels 51 | 52 | Then, when all the processes in the system have completed their snapshots, the sum of the individual snapshots becomes the consistent snapshot of the entire system. 53 | 54 | If it can be demonstrated that this approach for taking a snapshot will terminate for an individual process, and all processes follow the same approach, then it follows that the entire snapshot process will terminate for the entire distributed system. 55 | 56 | In section 3.3 of Chandy & Lamport's [original paper](./papers/chandy.pdf) they say: 57 | 58 | > If the graph is strongly connected and at least one process spontaneously records its state, then all processes will record their states in finite time (assuming reliable delivery) 59 | 60 | The term ***strongly connected*** means that every process is reachable from every other process via channels. 61 | In our example however, we have assumed that every process is directly connected to every other process by a channel, thus we not only have a strongly connected graph, we have a ***complete graph***. 62 | 63 | If we draw the processes in our example system as a graph, every process is connected to every other process, making a total graph. 64 | 65 | ![Total Graph](./img/L9%20Total%20Graph.png) 66 | 67 | However, Chandy & Lamport require only that the graph is strongly connected; for example, like this: 68 | 69 | ![Connected Graph](./img/L9%20Connected%20Graph.png) 70 | 71 | `P3` can still send messages to `P2` but it must send them via `P1`. 72 | 73 | ## Simultaneous Snapshot Initiators 74 | 75 | It is interesting that Chandy & Lamport state that ***at least*** one process must start the global snapshot by recording its own state. 76 | They do not require that ***only*** one process initiates the global snapshot. 77 | 78 | So, let's look at what happens when two processes simultaneously decide to take a snapshot. 79 | 80 | In the following diagram, processes `P1` and `P2` simultaneously decide to snapshot themselves. 
81 | 82 | ![Simultaneous Snapshot 1](./img/L9%20Simultaneous%20Snapshot%201.png) 83 | 84 | Since each is acting as the initiator, they both follow these rules: 85 | 86 | * `P1` and `P2` both record their own state, creating states `S1` and `S2` respectively 87 | * `P1` and `P2` both start recording messages on their incoming channels - C21 and C12 respectively 88 | * `P1` and `P2` both send out marker messages on their outgoing channels 89 | 90 | Now the marker messages arrive at each process. 91 | 92 | ![Simultaneous Snapshot 2](./img/L9%20Simultaneous%20Snapshot%202.png) 93 | 94 | As soon as the marker messages arrive: 95 | 96 | * `P1` and `P2` stop recording messages on their incoming channels 97 | * `P1` and `P2` save the state of each recorded channel. 98 | In this case, no messages arrived on C21, but message `m` arrived on C12 99 | 100 | Again, we now have a coherent snapshot of the whole system in spite of the fact that two processes simultaneously decided to act as initiators. 101 | 102 | This is known as a ***consistent cut*** - something we'll talk about a little later. 103 | 104 | ### Is It Possible to Get a Bad Snapshot? 105 | 106 | Could we end up with a bad snapshot such as the one shown below? 107 | 108 | ![Bad Snapshot](./img/L9%20Bad%20Snapshot.png) 109 | 110 | No, this is in fact impossible because as soon as a process records its own state, it must immediately send out marker messages. 111 | This is the rule that makes it impossible for message `m` to arrive at `P2` before the snapshot processing has completed. 112 | 113 | The rules of the Chandy-Lamport Snapshot algorithm state that as soon as the internal state of `P1` is recorded, marker messages ***must*** be sent on all outgoing channels. 114 | So, the marker message will always be sent ***before*** event `B` (the event in `P1` that sends message `m` to `P2`) happens. 115 | Also, because channels are FIFO queues, it is impossible for message `m` to arrive at `P2` before the marker message arrives. 116 | 117 | To understand how important this is, let's consider what would happen if simultaneous initiators were ***not*** permitted and the process wishing to initiate a snapshot had to seek agreement from the other participants before starting: 118 | 119 | * `P1` decides it wants to take a snapshot 120 | * `P1` sends a message to all the participants saying *"Hi guys, I'd like to take a snapshot. Is that OK with you?"* 121 | * But whilst `P1` is waiting for everyone to respond, another process sends out a message saying *"Hi guys, I'd like to take a snapshot. Is that OK with you?"* 122 | 123 | This all gets very chaotic and could well lead to some sort of deadlock. 124 | 125 | So, if multiple initiators are not permitted, then there has to be some way for processes to decide who is going to act as the sole initiator. 126 | This then leads into the very challenging problem domain known as ***Agreement Problems*** (Warning: here be dragons!) 127 | 128 | Since the Chandy-Lamport algorithm permits multiple initiators, it is very much easier to implement because we do not have to care about solving the hard problem of agreeing on either who will act as the initiator, or when a snapshot should be taken — ***any*** process can take a snapshot ***any*** time it likes! 129 | 130 | Further to this, any process that receives a marker message does not need to care about either who sent that marker, or which process originally acted as the initiator.
131 | Hence, markers can be very lightweight messages that do not need to carry any data such as the identity of the initiator. 132 | 133 | The Chandy-Lamport algorithm is an example of a decentralised algorithm. 134 | There are, however, algorithms that are centralised, and these ***do*** require a single process to act as the initiator (we'll talk more about this type of algorithm later in the course). 135 | 136 | ## Why Do We Want Snapshots in the First Place? 137 | 138 | What are snapshots good for? Here are some ideas: 139 | 140 | * ***Checkpointing*** 141 | If we are performing a process that is either expensive or based on non-deterministic input (such as user requests arriving over a network), then in the event of failure, a checkpoint allows us to reconstruct the system's state and provides us with a reasonable restart point. 142 | For expensive calculations, all our calculation effort up until the last checkpoint is preserved, and for transactional systems, the system state preserved at the checkpoint reduces data loss down to only those transactions that occurred between the checkpoint being taken and the system failing. 143 | * ***Deadlock detection*** 144 | Once a deadlock occurs at time `T` in a system, unless it is resolved, that deadlock will continue to exist for all points in time greater than `T`. 145 | A snapshot can be used to perform deadlock detection and thus serves as a useful debugging tool. 146 | * ***Stable Property Detection*** 147 | A deadlock is an example of a ***Stable Property***. 148 | A property of the system is said to be ***stable*** if, once it becomes true, it remains true. 149 | Another example of a stable property is when the system has finished doing useful work - I.E. Task termination (however, human intervention might be required in order to detect that this state has been reached) 150 | 151 | Be careful not to conflate a ***deadlock*** with a process crashing. 152 | Typically (but not always), a deadlock occurs when two running processes enter a mutual wait state. 153 | Thus, neither process has crashed, but at the same time, neither process is capable of doing any useful work because each is waiting for a response from the other. 154 | 155 | ## Chandy-Lamport Algorithm and Causality 156 | 157 | The set of events in a Chandy-Lamport Snapshot will always make sense with respect to the causality of those events. 158 | 159 | Now we need to introduce a new piece of terminology: a ***cut***. 160 | 161 | A cut is a time frontier that divides a Lamport diagram into past and future events. 162 | 163 | ![Cut](./img/L9%20Cut.png) 164 | 165 | An event is said to be ***in the cut*** if it belongs to the past. 166 | In Lamport diagrams, where time goes downwards, this means events occurring above the line. 167 | 168 | So, if event `E` is in the cut and `D->E` then for the cut to be ***consistent***, event `D` must also be in the cut. 169 | 170 | This is a restatement of the principle of [consistent global snapshots](https://github.com/ChrisWhealy/DistributedSystemNotes/blob/master/Lecture%207.md#consistent-global-snapshot) that we saw in [lecture 7](./Lecture%207.md). 171 | 172 | In both of the following diagrams `B->D`. 173 | Therefore, for a cut to be consistent, it must preserve the fact that event `B` happens in the causal history of event `D`.
174 | 175 | So, this cut is valid and is therefore called a consistent cut: 176 | 177 | ![Consistent Cut 1](./img/L9%20Consistent%20Cut%201.png) 178 | 179 | Even though the cut has separated events `B` and `D`, their causality has not been reversed: event `B` remains on the "past" side of the cut, and event `D` remains on the "future" side. 180 | 181 | However, in the diagram below, the cut is inconsistent because the causality of events `B` and `D` has been reversed: 182 | 183 | ![Inconsistent Cut](./img/L9%20Inconsistent%20Cut.png) 184 | 185 | The happens-before relation between events `B` and `D` is now broken because the cut has moved the future event `D` into the past, and the past event `B` into the future. 186 | 187 | ## The Chandy-Lamport Algorithm Determines a Consistent Cut 188 | 189 | If you take a cut of a distributed system and discover that the set of events in that cut is identical to a Chandy-Lamport snapshot, then you have a consistent cut. 190 | This is what is meant when we say that a Chandy-Lamport Snapshot determines a consistent cut. 191 | 192 | Going back to the more detailed example used in the previous lecture, we can see that the snapshot formed from three process states and six channel states forms a consistent cut. 193 | 194 | ![Consistent Cut 2](./img/L9%20Consistent%20Cut%202.png) 195 | 196 | Another way of saying this is that a Chandy-Lamport snapshot is causally correct. 197 | 198 | ## Recap: Delivery Guarantees and Protocols 199 | 200 | ### FIFO Guarantee 201 | 202 | If a process sends message `m2` after `m1`, then any process delivering both of these messages must deliver `m1` first. 203 | 204 | A violation of this guarantee is a FIFO anomaly: 205 | 206 | ![FIFO Anomaly](./img/L5%20FIFO%20Anomaly.png) 207 | 208 | ### A Protocol to Enforce FIFO Delivery 209 | 210 | In order to enforce FIFO delivery, we could implement strategies such as: 211 | 212 | * Giving each message a sequence number. 213 | The receiving process is then required to deliver these messages in sequence number order 214 | * Requiring the receiving process to acknowledge receipt of a message 215 | 216 | ### Causal Delivery 217 | 218 | The [causal delivery](https://github.com/ChrisWhealy/DistributedSystemNotes/blob/master/Lecture%206.md#causal-delivery) guarantee says that messages must be delivered in an order consistent with the causal (happens-before) order of their send events. 219 | 220 | > If `m1`'s send happens before `m2`'s send, then `m1`'s delivery must happen before `m2`'s delivery. 221 | 222 | A causal anomaly looks like this: 223 | 224 | ![Causal Anomaly](./img/L3%20Causal%20Anomaly.png) 225 | 226 | ### A Protocol to Enforce Causal Delivery 227 | 228 | In order to enforce causal delivery (at least in the case of broadcast messages), we could implement a causal broadcast strategy based on vector clocks. 229 | Each process is then responsible for maintaining its own vector clock and, by means of a queueing mechanism, ensures that no ***messages from the future*** are delivered early. 230 | 231 | ### Totally Ordered Delivery 232 | 233 | If a process delivers `m1` then `m2`, all participating processes delivering both messages must deliver `m1` first. 234 | 235 | A violation of totally-ordered delivery looks like this: 236 | 237 | ![Total Order Anomaly](./img/L6%20Total%20Order%20Anomaly.png) 238 | 239 | ### A Protocol to Enforce Totally-Ordered Delivery 240 | 241 | We did not actually talk about a protocol for enforcing totally-ordered delivery - we just spoke about it being hard! 242 | 243 | But here's an idea.
244 | If a fifth process were added to act as a coordinator, then totally-ordered delivery could be ensured by telling every client process that in order to change the state of the data in the replicated databases, it must first check in with the coordinator. 245 | The coordinator then acts as a middleman for ensuring that message delivery happens in the correct order. 246 | 247 | However, this approach has several downsides: 248 | 249 | * Under high-load situations, the coordinator process could become a performance bottleneck 250 | * If the coordinator crashes, then all updates stop until such time as the coordinator can be restarted (but who's going to monitor the coordinator process — a coordinator-coordinator process?) 251 | 252 | ## Safety and Liveness Properties 253 | 254 | Let's now briefly introduce the next topic, that of ***safety*** and ***liveness*** properties. 255 | 256 | | Safety Property | Liveness Property | 257 | |---|---| 258 | | Something bad will ***not*** happen | Something good ***eventually*** happens | 259 | | In a finite execution, we can demonstrate that something bad will happen if this property is not satisfied.<br><br>FIFO anomalies, Causal Anomalies and Totally-Ordered Anomalies are all examples of safety properties because we can demonstrate that their failure causes something bad to happen | For example, all client messages are eventually answered.<br><br>The problem though is that liveness properties tend to have very open-ended definitions. This means we might have to wait forever before something good happens... Consequently, when considering ***finite*** execution, it is very difficult (impossible?) to provide counter-examples.<br><br>This is why liveness properties are much harder to reason about. | 260 | 261 | --- 262 | 263 | | Previous | Next 264 | |---|--- 265 | | [Lecture 8](./Lecture%2008.md) | [Lecture 10](./Lecture%2010.md) 266 | 267 | -------------------------------------------------------------------------------- /Lecture 11.md: 1 | # Distributed Systems Lecture 11 2 | 3 | ## Lecture Given by [Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/) on April 22nd, 2020 via [YouTube](https://www.youtube.com/watch?v=Rly9GBg14Zs) 4 | 5 | | Previous | Next 6 | |---|--- 7 | | [Lecture 10](./Lecture%2010.md) | [Lecture 12](./Lecture%2012.md) 8 | 9 | 10 | 11 | ## Implementing Reliable Delivery 12 | 13 | ### What Exactly is Reliable Delivery? 14 | 15 | The definition we gave in the previous lecture for reliable delivery was the following: 16 | 17 | > Let `P1` be a process that sends a message `m` to process `P2`. 18 | > If neither `P1` nor `P2` crashes,* then `P2` eventually delivers message `m` 19 | 20 | We also qualified this definition with the further criterion (indicated by the star) that says: 21 | 22 | **\*** ...provided that, under our chosen fault model, message loss does not also need to be tolerated. So, it appears there could be some circumstances under which we need to care about message loss, and some circumstances under which we don't. 23 | 24 | The difference here depends on which failure model we want to implement. 25 | If we are implementing the Crash Model, then the only kind of failures we will tolerate are processes crashing. 26 | So, under these circumstances, we will not handle message loss. 27 | However, if we decide to implement the Omission Model, message loss must also be tolerated and therefore handled. 28 | 29 | So, our definition of ***Reliable Delivery*** varies depending on which fault model we choose to implement. 30 | 31 | From a practical perspective, the Omission Model is a far more useful approach for handling real-life situations since message loss is a real, everyday occurrence. 32 | 33 | Remember also from the previous lecture that due to their hierarchical relationship, if we implement the Omission Model, we have also implemented the Crash Model. 34 | 35 | ![Fault Hierarchy 1](./img/L10%20Fault%20Hierarchy%201.png) 36 | 37 | ### But Can't We Have a Consistent Definition of "Reliable Delivery"? 38 | 39 | Wouldn't it be better to have a consistent definition of "Reliable Delivery" that doesn't accumulate extra caveats depending upon the fault model? 40 | 41 | Certainly - that would be something like this: 42 | 43 | > If a correct process `P1` sends a message `m` to a correct process `P2` and not all messages are lost, then `P2` eventually delivers message `m`. 44 | 45 | Ok, but haven't you just hidden the variability of the earlier definition behind the abstract term ***correct process***? 46 | Yes, that's true — but you did ask for a consistent definition! 47 | 48 | The term ***correct process*** means different things in different fault models. 49 | If we need to implement the Byzantine Fault Model, then a ***correct*** process is a non-malicious or non-arbitrary process; however, if we are implementing the Crash Model, then a ***correct*** process is simply one that doesn't crash. 50 | 51 | So, as soon as we see the word ***"correct"*** in a definition like the one above, we should immediately determine which fault model is being used, because this will then tell us what ***correct*** means in that particular context. 52 | 53 | ### How Do We Go About Implementing Reliable Delivery?
54 | 55 | Going back to The Two Generals Problem discussed in the last lecture, one approach to help `Alice` and `Bob` launch a coordinated attack would be for `Alice` to keep sending the message `"Attack at dawn"` until she receives an `ack` from `Bob`. 56 | 57 | This could be implemented using the following algorithm: 58 | 59 | * `Alice` sends the message `"Attack at dawn"` to `Bob`, then places it into a send buffer that has a predetermined timeout period 60 | * If an `ack` is received from `Bob` during the timeout period, communication was successful and `Alice` can delete the message from her send buffer 61 | * If the timeout expires without receiving an `ack` from `Bob`, `Alice` resends the message 62 | 63 | ![Reliable delivery 1](./img/L11%20Reliable%20Delivery%201.png) 64 | 65 | However, this style of implementation is not without its issues because it can never solve the Two Generals Problem: 66 | 67 | * When `Alice` receives an `ack` from `Bob`, she can be sure that `Bob` got her message; however, `Bob` is still unsure that his `ack` was received 68 | * Reliable delivery does not need to assume FIFO delivery, so copies of `Alice`'s original message could be received after `Bob` has already issued an `ack`. 69 | 70 | Does it matter that `Bob` receives multiple copies of the same message? 71 | 72 | In this situation, due to the nature of this particular message, no it doesn't. 73 | This is partly because this particular strategy for increasing communication certainty requires `Alice` to send multiple messages until such time as she receives an `ack` from `Bob`; consequently, `Bob` will, most likely, receive multiple copies of the same message. 74 | 75 | However, what if `Bob` was a Key/Value store and `Alice` wanted to modify the value of some variable? 76 | Well, this depends entirely on what `Alice`'s message contains. 77 | 78 | ![Reliable delivery 2](./img/L11%20Reliable%20Delivery%202.png) 79 | 80 | If the message simply sets `x` to some absolute value, then this would not result in any data corruption. 81 | 82 | ![Reliable delivery 3](./img/L11%20Reliable%20Delivery%203.png) 83 | 84 | However, if the message instructs the KeyStore to increment the value of `x`, then this will create significant problems if such a message were delivered more than once. 85 | 86 | ### Idempotency 87 | 88 | The word ***idempotent*** comes from the Latin words *idem* meaning "the same" and *potens* meaning "power". 89 | 90 | In this context, the instruction contained in the message is said to be ***idempotent*** if (in mathematical terms): 91 | 92 | `f(x) = f(f(x)) = f(f(f(x))) etc...` 93 | 94 | In other words, an idempotent function `f` only has an effect the first time it is applied to some value `x`. 95 | Thereafter, subsequent applications of function `f` to the value returned by `f(x)` have no further effect. 96 | 97 | So, assigning a value to a variable is idempotent, but incrementing a variable is not. 98 | 99 | Generally speaking, if we can work with idempotent operations, then our implementation of reliable delivery will be easier because we can be certain that nothing bad will happen to our data if, for some reason, the operation is applied more than once. 100 | 101 | ### How Many Times Should a Message be Delivered? 102 | 103 | From the above discussion (and assuming that all communication happens within the asynchronous network model), we can say that reliable delivery therefore means that a message is delivered ***at least once***.
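As a minimal sketch (using hypothetical `send`, `wait_for_ack`, `send_ack` and `deliver` helpers), the retry-until-`ack` strategy described above, together with receiver-side de-duplication, might look like this:

```
def send_reliably(net, dest, msg_id, msg, timeout=1.0):
    # At-least-once sending: retransmit until an ack for this msg_id arrives.
    # The receiver may therefore see the same message more than once.
    while True:
        net.send(dest, (msg_id, msg))
        if net.wait_for_ack(msg_id, timeout):    # assumed helper
            return

delivered_ids = set()

def on_receive(net, sender, msg_id, msg):
    # Always ack, even for duplicates: our previous ack may have been lost
    net.send_ack(sender, msg_id)
    if msg_id not in delivered_ids:              # de-duplicate before delivering
        delivered_ids.add(msg_id)
        deliver(msg)                             # assumed helper
```

De-duplicating on a message id is what makes redelivery harmless even for a non-idempotent operation such as *increment `x`*, and it is exactly the sort of machinery that lies behind most claims of "exactly once" delivery.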
104 | 105 | Let's say we have some function `del` that returns the number of times message `m` has been delivered; there are then three possible options: 106 | 107 | | Delivery Strategy | Delivery Count 108 | |---|--- 109 | | At least once | `1 ≤ del(m)` 110 | | At most once | `0 ≤ del(m) ≤ 1` 111 | | Exactly once | `del(m) == 1` 112 | 113 | 114 | Looking at the above table, it can be seen that since ***at most once*** delivery allows for a message to be delivered zero times, this strategy can be implemented (at least vacuously) by not sending the message at all! Doh! 115 | 116 | But how about the ***exactly once*** strategy? 117 | 118 | There does not appear to be any formal proof to demonstrate that this strategy is impossible to implement; however, it is certainly very hard to ensure. 119 | Any time a system claims to implement ***exactly once*** delivery, the reality is that in the strictest sense, other assumptions have been made about the system that then give the appearance of exactly once delivery. 120 | Such systems tend either to work with idempotent messages (which you can deliver as many times as you like anyway), or there is some sort of message de-duplication functionality at work. 121 | 122 | ## How Many Recipients Receive This Message? 123 | 124 | Most of the message sending scenarios we've looked at so far are cases where one participant sends a message to exactly one other participant. 125 | 126 | In [lecture 7](./Lecture%207.md) we looked at an implementation of causal broadcast. 127 | This is the situation in which ***all*** participants in the system receive the message (excluding of course the process that sent the message in the first place). 128 | By means of vector clocks, we were able to ensure that a ***message from the future*** was not delivered too early. 129 | 130 | ![Causal Broadcast](./img/L7%20Causal%20Broadcast%208.png) 131 | 132 | Then there is the case that one participant sends a message to ***many***, but not all of the other participants in the system. 133 | An example of this is the Total-Order anomaly we also saw in lecture 7. 134 | 135 | ![Total Order Anomaly](./img/L7%20TO%20Anomaly.png) 136 | 137 | In this case `C1` sends messages to `R1` and `R2`, but not `C2`; likewise, `C2` sends messages to `R1` and `R2`, but not `C1`. 138 | 139 | So, assuming reliable delivery, we have three different message sending strategies: 140 | 141 | | Message Sending Strategy | Number of Recipients 142 | |---|--- 143 | | Unicast | One 144 | | Multicast | Many 145 | | Broadcast | All 146 | 147 | In this course we will not speak too much about implementing unicast messages; instead, we will simply assume that a unicast command exists as a primitive within each process. 148 | 149 | In this manner, we could send broadcast or multicast messages simply by invoking the unicast primitive multiple times. 150 | 151 | ![Broadcast Implemented Using Unicast](./img/L11%20Broadcast%201.png) 152 | 153 | Up until now, we have been drawing our Lamport diagrams with multiple messages coming from a single event in the sending process - and this is how it should be. 154 | Conceptually, we need to treat broadcast messages as having exactly one point of origin. 155 | 156 | However, under the hood, the mechanism for sending the actual messages could be multiple invocations of the unicast send primitive. 157 | But this will only get us so far.
158 | The problem is this: even if we batch all the message send commands together into some transactional bundle, something could still go wrong when we're only halfway through sending that bundle. What remedial action could we then take concerning the messages that have already been sent? 159 | 160 | Should we attempt to cancel them... 161 | 162 | Revoke them... 163 | 164 | What should we do here? 165 | 166 | So, the reality is that we need a precise way to define reliable broadcast. 167 | 168 | ## Implementing Reliable Broadcast 169 | 170 | Remembering the discussion of the term ***correct*** given in the section above, reliable broadcast can then be generically defined as: 171 | 172 | > If a correct process delivers the broadcast message `m`, then all correct processes deliver `m`. 173 | 174 | ***IMPORTANT ASSUMPTION*** 175 | The discussion that follows assumes we are working within the Crash Model, where we can pretend that message loss never happens. 176 | Under these limited conditions, we only need to handle the case of processes crashing. 177 | 178 | Let's say `Alice` sends a message to `Bob` and `Carol`. 179 | 180 | Both `Bob` and `Carol` receive the message, and `Bob` delivers it correctly; however, `Carol` crashes before she can deliver that message. 181 | 182 | ![Reliable Broadcast 1](./img/L11%20Reliable%20Broadcast%201.png) 183 | 184 | Has this violated the rules of reliable broadcast? 185 | 186 | Actually, no it hasn't. 187 | This is because since `Carol` crashed, she does not qualify as a ***correct*** process; therefore, it's OK that she didn't deliver the message. 188 | 189 | Now consider this next scenario: as we've mentioned earlier, under the hood, a broadcast message can be implemented as a sequence of unicast messages. 190 | So, with this in mind, let's say that `Alice` wants to send a broadcast message to `Bob` and `Carol`; but as we now know, this is implemented as two unicast messages that (conceptually at least) form a single send event. 191 | 192 | So `Alice` sends the first unicast message to `Bob`, who delivers it correctly. 193 | But before `Alice` can send the second unicast message to `Carol`, she crashes. 194 | This leaves both `Bob` and `Carol` running normally; however, `Bob` received the "broadcast" message but `Carol` did not. 195 | 196 | ![Reliable Broadcast 2](./img/L11%20Reliable%20Broadcast%202.png) 197 | 198 | We could argue here that since `Alice` did not successfully send the message to ***all*** the participants, it is therefore not a true "broadcast" message. 199 | But this sounds like a pedantic attempt to squeeze through a loophole in the definition of the word "broadcast", and is therefore quite unsatisfactory. 200 | 201 | In reality, `Alice` fully intended to send a message to ***all*** participants in the system; therefore, irrespective of the success or failure of this action, the intention was to send a ***broadcast*** message. 202 | Therefore, this situation is in violation of the specification for a ***reliable broadcast***. 203 | 204 | Notice that the definition of a reliable broadcast message speaks only about the correctness of the processes delivering that message; it says nothing about the correctness of the sending process. 205 | 206 | Therefore, under this definition, if the correct process `Bob` received and delivered the message, then the correct process `Carol` should also receive and deliver this message. 207 | However, since `Carol` never received this message (for whatever reason), this is a violation of the rules of reliable broadcast. 208 | 209 | So how can we implement reliable broadcast if we know that, halfway through the sequence of unicast sends, the sending process could crash? 210 | 211 | Here's one outline of an algorithm for reliable broadcast (a code sketch follows the list): 212 | 213 | * All processes keep a set of delivered messages in their local state 214 | * When a process `P` wants to broadcast a message `m`: 215 | * It must unicast that message to all other processes (except itself) 216 | * `P` adds `m` to its set of delivered messages 217 | * When `P1` receives a message `m` from `P2`: 218 | * If `m` is already in `P1`'s set of delivered messages, `P1` does nothing 219 | * Otherwise, `P1` unicasts `m` to everyone except itself and `P2`, and adds `m` to its set of delivered messages
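Here is that outline as a minimal sketch (invented names; a real implementation would also need unique message IDs so that different messages with equal content aren't confused):

```python
# Illustrative sketch of reliable broadcast: each process forwards a message
# the first time it delivers it, so a sender crashing mid-broadcast cannot
# leave a correct process without the message.

delivered = {"Alice": set(), "Bob": set(), "Carol": set()}

def unicast(recipient, sender, msg):
    receive(recipient, sender, msg)     # assume synchronous, loss-free links

def r_broadcast(p, msg):
    delivered[p].add(msg)               # the sender delivers to itself
    for q in delivered:
        if q != p:
            unicast(q, p, msg)

def receive(p, sender, msg):
    if msg in delivered[p]:
        return                          # already delivered: do nothing
    delivered[p].add(msg)
    for q in delivered:                 # forward to everyone except
        if q not in (p, sender):        # ourselves and the process we
            unicast(q, p, msg)          # received it from

r_broadcast("Alice", "Attack at dawn")
print(delivered)                        # all three processes have the message
```

This synchronous simulation cannot show a crash itself, of course; it only shows the re-forwarding rule that makes a mid-broadcast crash survivable.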
220 | 221 | Let's see this algorithm at work: 222 | 223 | `Alice` sends a message to `Bob`, but then `Alice` immediately crashes. 224 | 225 | However, since `Bob` has not seen that message before, he adds it to his set of delivered messages and unicasts it to `Alice` and `Carol` — but then `Bob` immediately crashes! 226 | 227 | ![Reliable Broadcast 3](./img/L11%20Reliable%20Broadcast%203.png) 228 | 229 | Since `Carol` is the only ***correct*** process left running, she delivers the message. 230 | However, since `Alice` and `Bob` have both crashed, they are excluded from our definition of a ***correct*** process, and so the rules of reliable broadcast remain satisfied. 231 | 232 | Even with the optimization of not sending a known message back to the sender, this protocol still results in processes receiving duplicate messages. 233 | 234 | ![Reliable Broadcast 4](./img/L11%20Reliable%20Broadcast%204.png) 235 | 236 | `Bob` and `Carol` each receive this message twice. 237 | 238 | So, a further optimization can be that if a process has already delivered the received message, it simply does nothing. 239 | 240 | ## Fault Tolerance Often Involves Making Copies of Things 241 | 242 | This is a fundamental concept that will be used a lot as we proceed through this course. 243 | 244 | We can mitigate message loss by making copies of messages; but what else might we lose? 245 | 246 | In addition to message sends and receives, there are also internal events within a process that record its changes of state (i.e. the changes that occur to a process' internal data). 247 | How can we mitigate against data loss? 248 | Again, by taking copies. 249 | 250 | ### Why Have Multiple Copies of Data? 251 | 252 | There are several reasons: 253 | 254 | ***Protection Against Data Loss*** 255 | The state of a process at time `t` is determined by the complete set of events that have occurred up until that time. 256 | Therefore, by knowing a process' event history, we can reconstruct the state of that process. 257 | This then allows us to keep replicas of processes in different locations. 258 | If one data centre goes down, then we still have all the data preserved in one or more other data centres. 259 | 260 | ***Response Time*** 261 | Having your data in multiple data centres not only solves the issue of data loss, but it can also help with reducing response times since the data centre that is physically closest to you is the one most likely to give you the fastest response.
262 | 263 | ***Load Balancing*** 264 | If you are experiencing a high volume of requests for the data, then having multiple copies of the data is a good way to distribute the processing load across multiple machines in multiple locations. 265 | 266 | ### Reliable Delivery 267 | 268 | In the case of reliable delivery, we can tolerate message loss by sending multiple copies of the same message until at least one gets through. 269 | 270 | ### Reliable Broadcast 271 | 272 | In the case of reliable broadcast, we can tolerate processes crashing by making copies of messages and then forwarding them. 273 | 274 | ## Question Received Via Chat 275 | 276 | ***Q:*** At the moment, we're only working within the Crash Model. 277 | How would this work if we needed to work in the Omission Model? 278 | 279 | ***A:*** Remembering that fault models are arranged hierarchically, the algorithm used for reliable broadcast will act as the foundation for the algorithm used in the Omission Model. 280 | Therefore, what we will need to do is extend the reliable broadcast algorithm with an algorithm for reliable delivery. 281 | 282 | --- 283 | 284 | | Previous | Next 285 | |---|--- 286 | | [Lecture 10](./Lecture%2010.md) | [Lecture 12](./Lecture%2012.md) 287 | 288 | -------------------------------------------------------------------------------- /Lecture 13.md: -------------------------------------------------------------------------------- 1 | # Distributed Systems Lecture 13 2 | 3 | ## Lecture Given by [Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/) on April 27th, 2020 via [YouTube](https://www.youtube.com/watch?v=5oCUmo9PKaw) 4 | 5 | | Previous | Next 6 | |---|--- 7 | | [Lecture 12](./Lecture%2012.md) | [Lecture 14](./Lecture%2014.md) 8 | 9 | 10 | ## Primary Backup (P/B) Replication 11 | 12 | In P/B Replication, the clients only ever talk to the primary node `P`. 13 | Any time `P` receives a write request, that request is broadcast to all the backup nodes, which independently send their `ack`s back to the primary. 14 | 15 | When the primary has received `ack`s from all its backups, it then delivers the write to itself and sends an `ack` back to the client. 16 | This point in time is known as the ***commit point***. 17 | 18 | The write latency time experienced by the client is the sum of the times taken to complete each of the following four steps (imagine we have some function `rt(From, To)` that can measure the response time between two nodes): 19 | 20 | `rt(C, P) + rt(P, B_slowest) + rt(B_slowest, P) + rt(P, C)` 21 | 22 | Irrespective of the number of backups in this system, all write requests are completed in these four steps. 23 | 24 | ![Primary Backup Replication - Writes](./img/L12%20Primary%20Backup%20Replication%201.png) 25 | 26 | Read requests are handled directly by the primary. 27 | 28 | ![Primary Backup Replication - Reads](./img/L12%20Primary%20Backup%20Replication%202.png) 29 | 30 | The read latency time is the sum of the time taken to complete the two steps: 31 | 32 | `rt(C, P) + rt(P, C)` 33 | 34 | ### Primary Backup Replication: Drawbacks 35 | 36 | There can only be one primary node; thus, it must handle all the workload. 37 | 38 | In addition to the primary becoming a bottleneck, this arrangement does not allow for any horizontal scalability. 39 |
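As an illustrative sketch of this write path (invented class names, no failure handling), the primary only acknowledges the client once every backup has acknowledged it:

```python
# Illustrative sketch of the primary-backup write path (no failure handling).

class Replica:
    def __init__(self):
        self.store = {}

    def apply(self, key, val):
        self.store[key] = val
        return "ack"

class Primary(Replica):
    def __init__(self, backups):
        super().__init__()
        self.backups = backups

    def write(self, key, val):
        # Broadcast the write and wait for an ack from *every* backup...
        acks = [b.apply(key, val) for b in self.backups]
        assert all(a == "ack" for a in acks)
        self.apply(key, val)        # ...then deliver locally: the commit point
        return "ack"                # finally, ack the client

    def read(self, key):
        return self.store[key]      # reads are served by the primary alone

primary = Primary([Replica(), Replica()])
primary.write("x", 5)
print(primary.read("x"))            # 5
```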
40 | ## Chain Replication 41 | 42 | Chain Replication was developed to alleviate some of the drawbacks of Primary Backup Replication. 43 | 44 | ![Chain Replication - Write](./img/L12%20Chain%20Replication%201.png) 45 | 46 | Here, the write latency time grows linearly with the number of backups and is calculated as the sum of the time taken to complete the `3 + n` steps (where `n` is the number of intermediate backups between the head and the tail): 47 | 48 | `rt(C, H) + rt(H, B_1) + … + rt(B_n, T) + rt(T, C)` 49 | 50 | So, if you have a large number of backups, the client could experience a higher latency time for each write request. 51 | 52 | The point of Chain Replication, however, is to improve read throughput by redirecting all reads to the tail. 53 | 54 | ![Chain Replication - Read](./img/L12%20Chain%20Replication%202.png) 55 | 56 | Now the read latency time is simply the sum of the time taken to complete the two steps: 57 | 58 | `rt(C, T) + rt(T, C)` 59 | 60 | Here are some figures from the [Chain Replication paper](./papers/chain_replication.pdf) by van Renesse and Schneider; look at Figure 4 at the top of page 8: 61 | 62 | ![Chain Replication Paper - Figure 4](./img/L13%20Chain%20Replication%20Paper%20Fig%204.png) 63 | 64 | These graphs compare the request throughput of three different replication strategies (where `t` represents the number of replicas in the system): 65 | 66 | * ***Weak Replication*** 67 | Client requests can be served by any replica in the system. 68 | Indicated by the solid line with `+` signs 69 | * ***Chain Replication*** 70 | Client write requests are always served by the head; read requests are always served by the tail. 71 | Indicated by the dashed line with `x` signs 72 | * ***Primary Backup Replication*** 73 | All client requests are served by the primary. 74 | Indicated by the dotted line with `*` signs 75 | 76 | As you can see, Weak Replication offers the highest throughput because any client can talk to any replica. 77 | So, this is a good illustration of how throughput can be improved simply by throwing more resources at the problem. 78 | However, it must also be understood that Weak Replication cannot offer the same strong consistency guarantees as either Primary Backup or Chain Replication. 79 | 80 | Weak Replication therefore is only valuable in situations where access to the data is *"read mostly"*, and you're not overly concerned if different replicas occasionally give different answers to the same read request. 81 | 82 | Comparing the Chain and P/B Replication curves, notice that if none of the requests are updates, then their performance is identical. 83 | The same is true when the update percentage starts to exceed about 40%. 84 | 85 | However, look at the Chain Replication curve. 86 | 87 | Instead of descending in a gradually flattening curve, there is a hump at around the 10-15% mark. 88 | This is where the benefits of Chain Replication can be seen. 89 | 90 | But why should this improvement be seen at this particular ratio of writes to reads? 91 | 92 | The answer here lies in understanding how the workload is distributed between the head and tail processes in Chain Replication. 93 | According to the research done by van Renesse and Schneider, their experiments showed that the best throughput is achieved when 10-15% of the requests are writes — presumably because the workload has now been distributed evenly between the head and tail processes. 94 | 95 | It turns out that in practice, this ratio of writes to reads is quite representative of many distributed systems that are *"out there in the wild"*.
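As a quick worked example of these latency formulas (with a made-up, uniform per-hop response time), counting hops reproduces the `4` and `3 + n` step counts:

```python
# Illustrative comparison of write/read latency, counting network hops.
# Assumes a uniform per-hop response time; all numbers are made up.

rt = 10  # ms per hop, an invented figure

def pb_write(n_backups):
    # client -> primary -> slowest backup -> primary -> client: 4 hops.
    # n_backups is deliberately unused: the backups are contacted in parallel.
    return 4 * rt

def chain_write(n_intermediate):
    # client -> head -> B_1 -> ... -> B_n -> tail -> client: 3 + n hops
    return (3 + n_intermediate) * rt

def read():
    # client -> (primary or tail) -> client: 2 hops in both schemes
    return 2 * rt

print(pb_write(5))     # 40 ms, regardless of replica count
print(chain_write(5))  # 80 ms: write latency grows with chain length
print(read())          # 20 ms
```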
96 | 97 | ## Dealing with Failure 98 | 99 | If the primary process in a P/B Replication system fails, who is responsible for informing the clients that one of the backups has now taken on the role of primary? 100 | 101 | Well, clients always need to know who to contact in the first place, and this role could be performed by some sort of coordinator acting as a communication proxy. 102 | 103 | In the P/B Replication system, the coordinator must at least ensure that: 104 | 105 | * Each client knows who is acting as the primary 106 | * Each replica knows who the primary is 107 | 108 | In Chain Replication, coordination is slightly more involved in that: 109 | 110 | * Each client must know who is acting as the head ***and*** who is acting as the tail 111 | * Each replica needs to know who the next replica is in the chain 112 | * Everyone must agree on this order! 113 | 114 | ### Coordinator Process 115 | 116 | So, in both situations, it is necessary to have some sort of internal coordinating process whose job it is to know who all the replicas are, and what role they are playing at any given time. 117 | 118 | > ***Assumptions*** 119 | > 120 | > * Not all the processes in our system will crash. 121 | > For a system containing `n` processes, we are relying on the fact that no more than `n-1` processes will ever crash (Ha ha!!) 122 | > * The coordinator process is able to detect when a process crashes. 123 | > 124 | > However, we have not discussed how such assumptions could possibly be true because the term *"crash"* can mean a wide variety of things: perhaps software execution has terminated, or execution continues but the process simply stops responding to messages, or responds very slowly... 125 | > 126 | > Failure detection is a deep topic in itself that we cannot venture into at the moment; suffice it to say that in an asynchronous distributed system, perfect failure detection is impossible. 127 | 128 | The coordinator must perform at least the following roles: 129 | 130 | * It must know about all the replicas in the system 131 | * It must inform each replica of the role it is currently playing 132 | * It must monitor these replicas for failure 133 | * Should a replica fail, the coordinator must reconfigure the remaining replicas such that the overall system keeps running 134 | * It must inform the clients which replica(s) will service their requests: 135 | * In the case of P/B Replication, the coordinator must inform the clients which replica acts as the primary 136 | * In the case of Chain Replication, the coordinator must inform the clients which replica acts as the head and which acts as the tail 137 | 138 | #### Coordinator Role in P/B Replication 139 | 140 | In the event of failure in a P/B Replication system, the coordinator must keep the system running by: 141 | 142 | * Nominating one of the backups to act as the new primary 143 | * Informing all clients to direct their requests to the new primary 144 | * Possibly starting a new replica to ensure that the current system workload can be handled and configuring that replica to act as a backup 145 | * etc... 146 | 147 | #### Coordinator Role in Chain Replication 148 | 149 | The coordinator must perform a similar set of tasks if failure occurs in a Chain Replication system. 150 | If we assume that the head process fails, then the coordinator must keep the system running by: 151 | 152 | * Nominating the head's successor to act as the new head 153 | * Informing all clients to direct their write requests to the new head 154 | * Possibly starting a new replica to ensure that the current system workload can be handled and reconfiguring the chain to include this new backup 155 | * etc... 156 | 157 | Similarly, if the tail process fails, the coordinator makes the tail process' predecessor the new tail and informs the clients of the change.
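A minimal sketch of this reconfiguration logic (invented names; a real coordinator must also deal with in-flight requests and bringing a fresh replica up to date):

```python
# Illustrative sketch: a coordinator reconfiguring a chain after a failure.

chain = ["head", "replica_b", "replica_c", "tail"]   # invented replica names

def notify_clients(head, tail):
    print(f"clients: send writes to {head}, reads to {tail}")

def handle_failure(chain, failed):
    chain = [r for r in chain if r != failed]        # drop the failed replica
    # The survivors take up their new roles implicitly by position:
    # a failed head is replaced by its successor, a failed tail by its
    # predecessor, and a failed middle replica is simply linked around.
    notify_clients(head=chain[0], tail=chain[-1])
    return chain

chain = handle_failure(chain, "head")   # clients: writes to replica_b...
chain = handle_failure(chain, "tail")   # ...and now reads go to replica_c
```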
158 | 159 | ### What if the Coordinator Fails? 160 | 161 | If we go to all the trouble of implementing a system that replicates data across some number of backups, but uses a single coordinator process to manage those replicas, then in one sense, we've actually taken a big backwards step because we've introduced a new *"single point of failure"* into our supposedly fault tolerant system. 162 | 163 | So, what steps can we take to be more tolerant of coordinator failure… 164 | 165 | * Simply spin up some replicas of the coordinator? 166 | And should we do this in just one data centre, or across multiple data centres? 167 | * But then how do you keep the coordinators coordinated? 168 | * Do you have a coordinator coordinator process? 169 | If so, who coordinates the coordinator coordinator process? 170 | 171 | This quickly leads either to an infinite regress of coordinators, or another [Monty Python sketch](./img/very_silly.png)... (Spam! spam! spam! spam!) 172 | 173 | This question then leads us very nicely into the next topic of ***Consensus*** — but we won't start that now. 174 | 175 | It is amusing to notice that in van Renesse and Schneider's paper, one of the first things they state is *"We assume the coordinator doesn't fail!"*, which they then admit is an unrealistic assumption. 176 | They then go on to describe how in their tests, they had a set of coordinator processes that were able to behave as a single process by running a consensus protocol between them. 177 | 178 | It is sobering to realise that if we wish to implement both strong consistency between replicas ***and*** fault tolerance, then ultimately, we are forced to rely upon some form of consensus protocol (which was the very thing we wanted to avoid in the first place). 179 | 180 | But consensus is both ***hard*** and ***expensive*** to implement. 181 | This difficulty might then become a factor in deciding ***not*** to implement strong consistency. 182 | Now it looks very appealing to say *"If we can get away with a weaker form of consistency such as Causal Consistency, then shouldn't we look at this option?"* 183 | 184 | That said, there are times when consensus really is vitally important. 185 | 186 | --- 187 | 188 | | Previous | Next 189 | |---|--- 190 | | [Lecture 12](./Lecture%2012.md) | [Lecture 14](./Lecture%2014.md) 191 | 192 | 193 | -------------------------------------------------------------------------------- /Lecture 15.md: -------------------------------------------------------------------------------- 1 | # Distributed Systems Lecture 15 2 | 3 | ## Lecture Given by [Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/) on May 4th, 2020 via [YouTube](https://www.youtube.com/watch?v=UCmAzWvrFmo) 4 | 5 | | Previous | Next 6 | |---|--- 7 | | [Lecture 14](./Lecture%2014.md) | [Lecture 16](./Lecture%2016.md) 8 | 9 | ## Course Admin... 10 | 11 | ...snip...
12 | 13 | Read Amazon's [Dynamo](https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf) paper and Google's [MapReduce](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf) paper. 14 | 15 | ...snip... 16 | 17 | ### Problematic Exam Question: Chandy-Lamport Snapshot Bug 18 | 19 | The following diagram shows a buggy implementation of the Chandy-Lamport snapshot algorithm. 20 | 21 | Process `P2` initiates the snapshot, but then something goes wrong. 22 | Where's the bug? 23 | 24 | ![Chandy-Lamport Snapshot Bug](./img/L15%20Chandy-Lamport%20Snapshot%20Bug.png) 25 | 26 | The Chandy-Lamport algorithm assumes FIFO delivery of all messages — irrespective of whether they are application or marker messages; so, if we trace through the steps shown in the diagram, we can discover the bug: 27 | 28 | * `P2` initiates the snapshot so it records its own state (the green ellipse around event `E`), then immediately sends out a marker message to `P1` 29 | * `P1` receives the marker message and immediately records its own state (the green ellipse around events `A`, `B`, `C`, and `D`) and then sends out its marker message 30 | * After `P2` sends out its marker message, its snapshot is complete, and it continues processing events in the normal way — resulting in event `F` sending out an application message to `P1`. 31 | 32 | The bug is the FIFO anomaly shown in this diagram: the application message from `P2`'s event `F` ***overtakes*** the snapshot marker message. 33 | 34 | As a result, `P1` event `D` is recorded in `P1`'s snapshot, but the event that caused it (`P2` event `F`) is missing from `P2`'s snapshot. 35 | Thus, our snapshot is not a ***consistent cut***. 36 | 37 | Remember that for a cut to be consistent, it must contain ***all*** events that led up to a certain point in time. 38 | So, the inclusion of event `D` in `P1`'s snapshot is the problem because event `D` is the result of delivering a ***message from the future***. 39 | 40 | This is an example of a situation in which a FIFO anomaly (out of order message delivery) leads to a causal anomaly (an inconsistent cut). 41 | 42 | ## Paxos: The Easy Parts 43 | 44 | At the end of the last lecture, our discussion of the Paxos Algorithm got us up to here: 45 | 46 | ![Paxos Consensus Reached](./img/L14%20Paxos%206.png) 47 | 48 | This was a very simple run of Paxos involving: 49 | 50 | * One proposer, 51 | * Three acceptors, and 52 | * Two learners 53 | 54 | In this example, the proposer `P` sent out `prepare` messages to a majority of the acceptors, which in this case, was two out of three; however, it would have been equally valid for `P` to have sent `prepare` messages to all the acceptors. 55 | In fact, doing so would be quite smart because it mitigates against message loss: on balance, even if one message is lost, you have still communicated with a majority of acceptors. 56 | 57 | The same idea applies when the proposer listens for `promise` messages coming back from the acceptors. 58 | It only needs to hear from a majority of the acceptors before it can be happy. 59 | Exactly who those acceptors are is not important, and if it does hear back from all the acceptors then that's great, but it’s not a requirement. 60 | It just needs to hear from a majority. 61 | 62 | So, when we speak of a ***majority***, we are speaking of at least the ***minimum*** majority. 63 | For instance, if there are five acceptors, then the minimum majority is three; if we hear back from four, or even all five, then this is not a problem. 64 | The point is that we must hear back from at least the minimum number of acceptors required to form a majority.
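In code, the minimum majority is just integer division; a trivial helper, added here only for illustration:

```python
def minimum_majority(n_acceptors: int) -> int:
    """Smallest number of acceptors that constitutes a majority."""
    return n_acceptors // 2 + 1

print(minimum_majority(3))  # 2
print(minimum_majority(5))  # 3
```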
65 | 66 | There are other subtleties involved in this algorithm that we will now go through, including what happens when there is more than one proposer. 67 | 68 | ## Milestones in the Paxos Algorithm 69 | 70 | One thing that was mentioned in the previous lecture was that three specific milestones are reached during a run of the Paxos algorithm. These are: 71 | 72 | 1. When the proposer receives `promise(n)` messages from a majority of acceptors. 73 | 74 | ![Paxos Milestone 1](./img/L15%20Paxos%20Milestone%201.png) 75 | 76 | A majority of acceptors have all promised to respond to the agreed proposal number `n`; and by implication, they have also promised to ignore any request with a proposal number lower than `n`. 77 | 78 | 1. When a majority of acceptors all issue `accepted(n,val)` messages for proposal number `n` and some value `val`. 79 | 80 | ![Paxos Milestone 2](./img/L15%20Paxos%20Milestone%202.png) 81 | 82 | Now, even though the other processes participating in the Paxos algorithm do not yet realise it, consensus has in fact been reached. 83 | 84 | 1. When the proposer(s) and learners receive `accepted(n,val)` messages from a majority of the acceptors. 85 | 86 | ![Paxos Milestone 3](./img/L15%20Paxos%20Milestone%203.png) 87 | 88 | It is only now that the proposer(s) and the learners ***realise*** that consensus has already been reached. 89 | 90 | ## Paxos: The Full Algorithm (Mostly) 91 | 92 | A run of the Paxos algorithm involves the following sequence of message exchanges - primarily between the proposer and acceptors: 93 | 94 | 1. ***The Proposer*** 95 | Sends out `prepare(n)` messages to at least the minimum number of acceptors needed to form a majority. 96 | The proposal number `n` must be: 97 | 98 | * Unique 99 | * Higher than any previous proposal number used by ***this*** proposer 100 | 101 | It’s important to understand that the proposal number rules are applied to proposers ***individually***. 102 | Consequently, if there are multiple proposers in the system, there does not need to be any agreement between proposers about what the next proposal number should be. 103 | 104 | 1. ***The Acceptor*** 105 | When the acceptor receives a `prepare(n)` message, it asks itself *"Have I already agreed to ignore proposals with this proposal number?"*. 106 | If the answer is yes, then the message is simply ignored; but if not, it replies to the proposer with a `promise(n)` message. 107 | 108 | By returning a `promise(n)` message, the acceptor has now committed to ignore all messages with a proposal number smaller than `n`. 109 | 110 | 1. ***The Proposer*** 111 | When the proposer has received `promise` messages from a majority of acceptors for a particular proposal number `n`, it sends an `accept(n,val)` message to a majority of acceptors containing both the agreed proposal number `n`, and the value `val` that it wishes to propose. 112 | 113 | 1. ***The Acceptor*** 114 | When an acceptor receives an `accept(n,val)` message, it asks the same question as before: *"Have I already agreed to ignore messages with this proposal number?"*. 115 | If yes, it ignores the message; but if no, it replies with an `accepted(n,val)` message both back to the proposer ***and*** broadcasts this acceptance to all the learners.
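Here is a minimal sketch of the acceptor's half of these rules (invented names; networking, timeouts, and durable state are all omitted). It also remembers the last value it accepted so that later promises can report it; the importance of this detail becomes clear in the multiple-proposer discussion below:

```python
# Illustrative Paxos acceptor: tracks the highest proposal number promised
# and the last (n, val) pair it accepted. No persistence or networking.

class Acceptor:
    def __init__(self):
        self.promised_n = -1    # highest proposal number promised so far
        self.accepted = None    # last accepted (n, val), if any

    def on_prepare(self, n):
        if n <= self.promised_n:
            return None                         # ignore: promised a higher n
        self.promised_n = n
        return ("promise", n, self.accepted)    # report any prior accepted value

    def on_accept(self, n, val):
        if n < self.promised_n:
            return None                         # ignore: promised a higher n
        self.promised_n = n
        self.accepted = (n, val)
        return ("accepted", n, val)             # sent to proposer and learners

a = Acceptor()
print(a.on_prepare(5))      # ('promise', 5, None)
print(a.on_accept(5, 1))    # ('accepted', 5, 1)
print(a.on_prepare(4))      # None: 4 is ignored after promising 5
print(a.on_prepare(6))      # ('promise', 6, (5, 1)): reports the earlier value
```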
116 | 117 | Up till now, we have assumed that there is only one proposer — but next, we must examine what happens if there are multiple proposers. 118 | 119 | ### What Happens If There Is More Than One Proposer? 120 | 121 | In this scenario, we will make two changes. 122 | We will run the Paxos algorithm with two proposers, and for visual clarity, since learners do not actually take part in the steps needed to reach consensus, we will omit them from the diagram. 123 | 124 | Let's say we have ***two*** proposers, P1 and P2, and as before, three acceptors. 125 | (We also have two learners, but we'll ignore them for the time being.) 126 | 127 | Remember we previously stated that in situations where there are multiple proposers, these proposers must have already agreed on how they will ensure the uniqueness of their own proposal numbers. 128 | So, in this case, we will assume that: 129 | 130 | * Proposer P1 uses odd proposal numbers, and 131 | * Proposer P2 uses even proposal numbers 132 | 133 | So, proposer P1 sends out a `prepare(5)` message to a majority of the acceptors. 134 | This is the first proposal number these acceptors have seen during this run of the protocol, so they are all happy to accept it and respond with `promise(5)` messages. 135 | 136 | Proposer P1 is seeking consensus for value `1`, so it now sends out `accept(5,1)` messages, and the majority of acceptors respond with `accepted(5,1)`. 137 | 138 | ![Multiple Proposers 1](./img/L15%20Multiple%20Proposers%201.png) 139 | 140 | OK, that's fine; we seem to have agreed on value `1`. 141 | 142 | Meanwhile, back in Gotham City, proposer P2 has no idea what's been going on, and decides to send out a `prepare(4)` message to all the acceptors... 143 | 144 | ![Multiple Proposers 2](./img/L15%20Multiple%20Proposers%202.png) 145 | 146 | The `prepare(4)` message arrives at acceptors A1 and A2 ***after*** they have already agreed on proposal number `5`. 147 | Since they are now ignoring proposal numbers less than `5`, they simply ignore this message. 148 | 149 | Acceptor A3, however, has not seen proposal number `4` before, so it happily agrees to it and sends back a `promise(4)` message to proposer P2. 150 | 151 | Proposer P2 is now left hanging. 152 | 153 | It sent out `prepare` messages to all the acceptors but has only heard back from a minority of them. 154 | The rest have simply not answered, and given the way asynchronous communication works, P2 cannot know ***why*** it has not heard back from the other acceptors. 155 | They could have crashed, or they might be running slowly, or, as it turns out, the other acceptors have already agreed to P1's proposal and are now having his babies... 156 | 157 | So, all P2 can do is wait for its timeout period, and if it doesn't hear back within that time, it concludes that proposal number `4` was a bad idea and tries again. 158 | This time, P2 shows up in a faster car (proposal number `6`). 159 | 160 | ![Multiple Proposers 3](./img/L15%20Multiple%20Proposers%203.png) 161 | 162 | But wait a minute, consensus (milestone 2) has ***already*** been reached, so the acceptors now have a problem because: 163 | 164 | * Acceptors cannot go back on their majority decision 165 | * Acceptors cannot ignore `prepare` messages with a ***higher*** proposal number 166 | 167 | So, here's where we must address one of the subtleties that we previously glossed over. 168 | 169 | Previously, we stated only that if an acceptor receives a `prepare` message with a ***lower*** proposal number, it should simply ignore it.
170 | Well, OK, that's fine. 171 | 172 | But what about the case where we receive a proposal number that is ***higher*** than the last one? 173 | Here is where we need to further qualify ***how*** that `prepare` message should be handled. 174 | 175 | In this case, each acceptor must consider the following situation: 176 | 177 | > *"I've already promised to respond to proposal number `n`, 178 | > but now I'm being asked to promise to respond to proposal number `n+1`"* 179 | 180 | How the acceptor reacts now depends on what has happened in between receiving the `prepare(n)` message and the `prepare(n+1)` message. 181 | 182 | Either way, the acceptor cannot ignore the higher proposal number, so it needs to send out some sort of `promise` message. However, this time the acceptor must consider whether it has already accepted a value based on some earlier, lower proposal number. 183 | 184 | * If no, then we accept the new proposal number with a `promise(n+1)` message as normal 185 | * If yes, then we accept the new proposal number with a `promise(n+1, ...)` message, but in addition, we are obligated to tell the new proposer that we've already agreed to go on a date with a proposer using a lower proposal number. 186 | 187 | In the latter case, you can see that the `promise` message needs to carry some extra information. 188 | 189 | In the above example, acceptor A1 has already agreed with proposer P1 that, using proposal number `5`, the value should be `1`; but now proposer P2 comes along and presents proposal number `6` to all the acceptors. 190 | 191 | ![Multiple Proposers 4](./img/L15%20Multiple%20Proposers%204.png) 192 | 193 | So, in this specific situation, acceptor A3 responds simply with `promise(6)` because although it previously agreed to proposal number `4`, nothing came of that, and it has not previously accepted any earlier value. 194 | 195 | Acceptors A1 and A2, however, must respond with the message `promise(6,(5,1))`. 196 | 197 | This extra information in the `promise` message effectively means: *"OK, I'll move with you to proposal number `6`, but understand this: using proposal number `5`, I've already accepted value `1`"*. 198 | 199 | ### So, What Should A Proposer Do with Such a Message? 200 | 201 | Previously, we said that when a proposer receives sufficient `promise(n)` messages, it will then send out `accept(n,val)` messages. 202 | But here's where our description of the protocol needs to be refined. 203 | What should the proposer do if, instead of receiving a `promise(n)` message, it receives a `promise(n,(n_old,val_old))` message? 204 | 205 | In our example, proposer P2 has received three `promise` messages: 206 | 207 | * A straight-forward `promise(6)` from A3, and 208 | * Two `promise(6,(5,1))` messages from A1 and A2 209 | 210 | Proposer P2 must now take into account that using proposal number `5`, consensus has already been reached on value `1`. 211 | 212 | In this case, both `promise` messages contain the value `1` that was agreed upon using proposal number `5`; however, it is perfectly possible that P2 could receive multiple `promise` messages containing values agreed on under proposal numbers older than `5`. 213 | 214 | So, the rule is this: proposer P2 must look at all the older, already agreed upon values, and choose the value corresponding to the highest-numbered of those old proposals. 215 | 216 | This is pretty ironic (and amusing) really because proposer P2 now has no choice over what value to propose. 217 | It is constrained to propose the one value upon which consensus has most recently been reached! 218 | So, the fact that it wants to send out its own proposal is somewhat redundant, because the only value it can propose is one upon which consensus has already been reached...
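This value-selection rule is small enough to sketch directly (invented names; each promise is modelled as `(n, prior)` where `prior` is either `None` or the `(n_old, val_old)` pair the acceptor reported):

```python
# Illustrative sketch: how a proposer picks the value for its accept phase.

def choose_value(promises, my_value):
    priors = [prior for (_, prior) in promises if prior is not None]
    if not priors:
        return my_value                 # free to propose its own value
    n_old, val_old = max(priors)        # highest-numbered old proposal wins
    return val_old                      # obligated to re-propose that value

# P2's situation from the example: one plain promise, two carrying (5, 1).
promises = [(6, None), (6, (5, 1)), (6, (5, 1))]
print(choose_value(promises, my_value=0))   # 1: P2 must propose value 1
```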
219 | 220 | So, now we must revise rule 3 given above. Previously we stated: 221 | 222 | > When the proposer has received `promise` messages from a majority of acceptors for a particular proposal number `n`, it sends an `accept(n,val)` message to a majority of acceptors containing both the agreed proposal number `n`, and the value `val` that it wishes to propose. 223 | 224 | But now we understand that the proposer does not have complete liberty to send out the value ***it wishes*** to propose; instead, it must first consider: 225 | 226 | * If I have received any `promise` messages containing old agreed values, then I am obligated to propose the value agreed under the highest-numbered of those old proposals 227 | * If I have received only simple `promise(n)` messages, then I am free to propose any value I like 228 | 229 | So now, P2 can only send out the message `accept(6,1)`. 230 | 231 | ![Multiple Proposers 5](./img/L15%20Multiple%20Proposers%205.png) 232 | 233 | Notice that P2 has not had to reuse the earlier proposal number `5`, but it was constrained to propose the value `1`, because this value has already been agreed upon. 234 | 235 | So, what do the acceptors do now? 236 | They simply invoke rule 4 above and respond with `accepted(6,1)`. 237 | 238 | ![Multiple Proposers 6](./img/L15%20Multiple%20Proposers%206.png) 239 | 240 | Let's isolate the messages that were exchanged between proposer P2 and acceptor A3. 241 | 242 | ![Multiple Proposers 7](./img/L15%20Multiple%20Proposers%207.png) 243 | 244 | A3 only sees the following exchange of messages: 245 | 246 | * P2 first tried proposal number `4`, but nothing came of that 247 | * P2 tried again with proposal number `6` 248 | * A3 went with the highest proposal number (`6`) and subsequently agreed to accept value `1` 249 | 250 | As far as A3 is concerned, it thinks that value `1` was P2's idea. 251 | It has no clue that P2 was proposing a value already agreed upon by others. 252 | 253 | --- 254 | 255 | | Previous | Next 256 | |---|--- 257 | | [Lecture 14](./Lecture%2014.md) | [Lecture 16](./Lecture%2016.md) 258 | 259 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Distributed Systems 2 | 3 | Lecture notes for course [CSE138, Spring 2020](http://composition.al/CSE138-2020-03/index.html) given by [Prof Lindsey Kuper](https://users.soe.ucsc.edu/~lkuper/), Assistant Professor of Computing at UCSC 4 | 5 | Due to the Covid-19 lockdown being enforced at the time, these lectures had to be delivered online and are available on [YouTube](https://www.youtube.com/user/lindseykuper/videos) and [Twitch](https://www.twitch.tv/lindseykuper/videos) 6 | 7 | This series of lectures also includes a discussion panel with recent grad students and two guest lecturers. 8 | Notes have not been created for these events; however, you can watch the videos here: 9 | 10 | * ["Grad Student Discussion Panel"](https://www.youtube.com/watch?v=ArKapXZkJvM) Lindsey Kuper talks with Emma Gomish, Pete Wilcox and Lawrence Lawson.
11 | May 15th, 2020 12 | * ["Blockchain Consensus"](https://www.youtube.com/watch?v=m6qZY7_ingY) by Chris Colohan, May 27th, 2020 13 | * ["Building Peer-to-Peer Applications"](https://www.twitch.tv/videos/640120840) by Karissa McKelvey, June 3rd, 2020 14 | 15 | | Date | Description | Subjects Recapped | 16 | |---|---|---| 17 | | Lecture 1 | There are no notes for this lecture as it was concerned with course administration and logistics | 18 | | [Lecture 2](./Lecture%2002.md)
April 1st, 2020 | Distributed Systems: What and why?
Time and clocks | 19 | | [Lecture 3](./Lecture%2003.md)
April 3rd, 2020| Lamport diagrams
Causality and the "happens before" relation
Network models
State and events
Partial orders 20 | | [Lecture 4](./Lecture%2004.md)
April 6th, 2020 | Total orders and Lamport clocks | Partial orders
Happens before relation 21 | | [Lecture 5](./Lecture%2005.md)
April 8th, 2020 | Vector clocks
Protocol runs and anomalies
Message Delivery vs. Message Receipt
FIFO delivery | Lamport Clocks 22 | | [Lecture 6](./Lecture%2006.md)
April 10th, 2020 | Causal delivery
Totally-ordered delivery
Implementing FIFO delivery
Preview of implementing causal broadcast | Delivery vs. Receipt
FIFO delivery 23 | | [Lecture 7](./Lecture%2007.md)
April 13th, 2020 | Implementing causal broadcast
Uses of causality in distributed systems
Consistent snapshots
Preview of the Chandy-Lamport snapshot algorithm | Causal anomalies and vector clocks 24 | | [Lecture 8](./Lecture%2008.md)
April 15th, 2020 | Chandy-Lamport Snapshot Algorithm | 25 | | [Lecture 9](./Lecture%2009.md)
April 17th, 2020 | Chandy-Lamport wrap-up: limitations, assumptions and properties
Uses of snapshots
Centralized vs. decentralized algorithms
Safety and liveness | Delivery guarantees and protocols 26 | | [Lecture 10](./Lecture%2010.md)
April 20th, 2020 | Reliable delivery
Fault classification and fault models
The Two Generals problem | Safety and liveness 27 | | [Lecture 11](./Lecture%2011.md)
April 22nd, 2020 | Implementing reliable delivery
Idempotence
At-least-once/at-most-once/exactly-once delivery
Unicast/Broadcast/Multicast
Reliable broadcast
Implementing reliable broadcast
Preview of replication 28 | | [Lecture 12](./Lecture%2012.md)
April 24th, 2020 | Replication
Total order vs. determinism
Consistency models: FIFO, causal and strong
Primary-backup replication
Chain replication
Latency and throughput 29 | | [Lecture 13](./Lecture%2013.md)
April 27th, 2020 | **Pause for breath!**
Wrapping up replication techniques 30 | | [Lecture 14](./Lecture%2014.md)
May 1st, 2020 | Handling node failure in replication protocols
Introduction to consensus
Problems equivalent to consensus
The FLP result
Introduction to Paxos | Strongly consistent replication protocols 31 | | [Lecture 15](./Lecture%2015.md)
May 4th, 2020 | Paxos: the interesting parts 32 | | [Lecture 16](./Lecture%2016.md)
May 6th, 2020 | Paxos wrap-up: Non-termination, Multi-Paxos, Fault tolerance
Other consensus protocols: Viewstamped Replication, Zab, Raft
Passive vs. Active (state machine) replication 33 | | [Lecture 17](./Lecture%2017.md)
May 8th, 2020 | Eventual consistency
Strong convergence and strong eventual consistency
Introduction to application-specific conflict resolution
Network partitions
Availability
The consistency/availability trade-off 34 | | [Lecture 18](./Lecture%2018.md)
May 11th, 2020 | Dynamo: A review of old ideas
Introduction to: 35 | | [Lecture 19](./Lecture%2019.md)
May 13th, 2020 | More about Quorum Consistency
Introduction to sharding
Consistent hashing 36 | | [Lecture 20](./Lecture%2020.md)
May 18th, 2020 | Online systems vs. Offline systems
Raw data vs. Derived data
Introduction to Google's MapReduce framework
MapReduce example: transform a forward index into an inverted index 37 | | [Lecture 21](./Lecture%2021.md)
May 20th, 2020 | MapReduce | MapReduce phases 38 | | [Lecture 22](./Lecture%2022.md)
May 29th, 2020 | The math behind replica conflict resolution | Strong convergence
Partial orders 39 | | [Lecture 23](./Lecture%2023.md)
June 1st, 2020 | Filling in the gaps: Overviews of 2-phase commit and Practical Byzantine Fault Tolerance (PBFT)
Quick overview of the history of: 40 |
https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L18 Merkle Conflict 1.png -------------------------------------------------------------------------------- /img/L18 Merkle Conflict 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L18 Merkle Conflict 2.png -------------------------------------------------------------------------------- /img/L18 Merkle Conflict 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L18 Merkle Conflict 3.png -------------------------------------------------------------------------------- /img/L18 Merkle Conflict 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L18 Merkle Conflict 4.png -------------------------------------------------------------------------------- /img/L18 Merkle Conflict 5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L18 Merkle Conflict 5.png -------------------------------------------------------------------------------- /img/L18 Merkle Tree 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L18 Merkle Tree 1.png -------------------------------------------------------------------------------- /img/L18 Merkle Tree 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L18 Merkle Tree 2.png -------------------------------------------------------------------------------- /img/L19 Dataset Replication.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Dataset Replication.png -------------------------------------------------------------------------------- /img/L19 Dynamo Read Conflict 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Dynamo Read Conflict 1.png -------------------------------------------------------------------------------- /img/L19 Dynamo Read Conflict 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Dynamo Read Conflict 2.png -------------------------------------------------------------------------------- /img/L19 Key Replication.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Key Replication.png 
-------------------------------------------------------------------------------- /img/L19 MD5 Output Space.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 MD5 Output Space.png -------------------------------------------------------------------------------- /img/L19 MD5 To Node.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 MD5 To Node.png -------------------------------------------------------------------------------- /img/L19 No Sharding.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 No Sharding.png -------------------------------------------------------------------------------- /img/L19 Node Addition 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Node Addition 1.png -------------------------------------------------------------------------------- /img/L19 Node Addition 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Node Addition 2.png -------------------------------------------------------------------------------- /img/L19 Node Crash 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Node Crash 1.png -------------------------------------------------------------------------------- /img/L19 Ring 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Ring 1.png -------------------------------------------------------------------------------- /img/L19 Ring 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Ring 2.png -------------------------------------------------------------------------------- /img/L19 Ring 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Ring 3.png -------------------------------------------------------------------------------- /img/L19 Ring 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Ring 4.png -------------------------------------------------------------------------------- /img/L19 Sharding 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Sharding 1.png 
-------------------------------------------------------------------------------- /img/L19 Sharding 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Sharding 2.png -------------------------------------------------------------------------------- /img/L19 Sharding 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Sharding 3.png -------------------------------------------------------------------------------- /img/L19 Sharding 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Sharding 4.png -------------------------------------------------------------------------------- /img/L19 Sharding 5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Sharding 5.png -------------------------------------------------------------------------------- /img/L19 Sharding 6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L19 Sharding 6.png -------------------------------------------------------------------------------- /img/L2 Message 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L2 Message 1.png -------------------------------------------------------------------------------- /img/L20 Distributed MapReduce 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L20 Distributed MapReduce 1.png -------------------------------------------------------------------------------- /img/L20 Distributed MapReduce 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L20 Distributed MapReduce 2.png -------------------------------------------------------------------------------- /img/L20 Distributed MapReduce 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L20 Distributed MapReduce 3.png -------------------------------------------------------------------------------- /img/L20 Distributed MapReduce 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L20 Distributed MapReduce 4.png -------------------------------------------------------------------------------- /img/L20 Inverted Index.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L20 Inverted Index.png -------------------------------------------------------------------------------- /img/L21 Master 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L21 Master 1.png -------------------------------------------------------------------------------- /img/L21 Master 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L21 Master 2.png -------------------------------------------------------------------------------- /img/L22 Boolean Ordering.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L22 Boolean Ordering.png -------------------------------------------------------------------------------- /img/L22 Comparable Subsets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L22 Comparable Subsets.png -------------------------------------------------------------------------------- /img/L22 Conflicting Updates.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L22 Conflicting Updates.png -------------------------------------------------------------------------------- /img/L22 Delete Cart Item 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L22 Delete Cart Item 1.png -------------------------------------------------------------------------------- /img/L22 Delete Cart Item 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L22 Delete Cart Item 2.png -------------------------------------------------------------------------------- /img/L22 Delete Cart Item 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L22 Delete Cart Item 3.png -------------------------------------------------------------------------------- /img/L22 Noncomparable Subsets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L22 Noncomparable Subsets.png -------------------------------------------------------------------------------- /img/L22 Replica Consensus 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L22 Replica Consensus 1.png -------------------------------------------------------------------------------- /img/L22 
Replica Consensus 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L22 Replica Consensus 2.png -------------------------------------------------------------------------------- /img/L22 Set of Subsets.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L22 Set of Subsets.png -------------------------------------------------------------------------------- /img/L23 Equivalent Terms.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L23 Equivalent Terms.png -------------------------------------------------------------------------------- /img/L23 FT Hierarchy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L23 FT Hierarchy.png -------------------------------------------------------------------------------- /img/L23 Rados Fig 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L23 Rados Fig 2.png -------------------------------------------------------------------------------- /img/L3 Causal Anomaly.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L3 Causal Anomaly.png -------------------------------------------------------------------------------- /img/L3 Message Passing.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L3 Message Passing.png -------------------------------------------------------------------------------- /img/L3 Multiple Processes.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L3 Multiple Processes.png -------------------------------------------------------------------------------- /img/L3 Process events.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L3 Process events.png -------------------------------------------------------------------------------- /img/L3 Reasoning About State.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L3 Reasoning About State.png -------------------------------------------------------------------------------- /img/L4 LC Msg Send 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L4 LC Msg Send 1.png 
-------------------------------------------------------------------------------- /img/L4 LC Msg Send 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L4 LC Msg Send 2.png -------------------------------------------------------------------------------- /img/L4 LC Msg Send 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L4 LC Msg Send 3.png -------------------------------------------------------------------------------- /img/L4 LC Msg Send 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L4 LC Msg Send 4.png -------------------------------------------------------------------------------- /img/L4 LC Msg Send 5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L4 LC Msg Send 5.png -------------------------------------------------------------------------------- /img/L4 LC Msg Send 6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L4 LC Msg Send 6.png -------------------------------------------------------------------------------- /img/L4 Lattice.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L4 Lattice.png -------------------------------------------------------------------------------- /img/L4 Natural Numbers.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L4 Natural Numbers.png -------------------------------------------------------------------------------- /img/L5 Causal History 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 Causal History 1.png -------------------------------------------------------------------------------- /img/L5 Causal History 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 Causal History 2.png -------------------------------------------------------------------------------- /img/L5 Causal History 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 Causal History 3.png -------------------------------------------------------------------------------- /img/L5 Causal History 4.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 Causal History 4.png -------------------------------------------------------------------------------- /img/L5 FIFO Anomaly.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 FIFO Anomaly.png -------------------------------------------------------------------------------- /img/L5 Protocol 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 Protocol 1.png -------------------------------------------------------------------------------- /img/L5 Protocol 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 Protocol 2.png -------------------------------------------------------------------------------- /img/L5 Protocol 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 Protocol 3.png -------------------------------------------------------------------------------- /img/L5 Protocol 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 Protocol 4.png -------------------------------------------------------------------------------- /img/L5 Protocol 5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 Protocol 5.png -------------------------------------------------------------------------------- /img/L5 VC Clocks 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 VC Clocks 1.png -------------------------------------------------------------------------------- /img/L5 VC Clocks 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 VC Clocks 2.png -------------------------------------------------------------------------------- /img/L5 VC Clocks 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 VC Clocks 3.png -------------------------------------------------------------------------------- /img/L5 VC Clocks 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 VC Clocks 4.png -------------------------------------------------------------------------------- /img/L5 VC Clocks 5.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 VC Clocks 5.png -------------------------------------------------------------------------------- /img/L5 VC Clocks 6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 VC Clocks 6.png -------------------------------------------------------------------------------- /img/L5 VC Clocks 7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 VC Clocks 7.png -------------------------------------------------------------------------------- /img/L5 VC Clocks 8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L5 VC Clocks 8.png -------------------------------------------------------------------------------- /img/L6 Causal Violation.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L6 Causal Violation.png -------------------------------------------------------------------------------- /img/L6 Delivery Hierarchy 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L6 Delivery Hierarchy 1.png -------------------------------------------------------------------------------- /img/L6 Delivery Hierarchy 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L6 Delivery Hierarchy 2.png -------------------------------------------------------------------------------- /img/L6 Ensure Casual Delivery 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L6 Ensure Casual Delivery 1.png -------------------------------------------------------------------------------- /img/L6 Ensure Casual Delivery 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L6 Ensure Casual Delivery 2.png -------------------------------------------------------------------------------- /img/L6 Naive Seq Nos.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L6 Naive Seq Nos.png -------------------------------------------------------------------------------- /img/L6 Total Order Anomaly.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L6 Total Order Anomaly.png -------------------------------------------------------------------------------- /img/L6 Vacuous FIFO 
Delivery.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L6 Vacuous FIFO Delivery.png -------------------------------------------------------------------------------- /img/L7 Bad Snapshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Bad Snapshot.png -------------------------------------------------------------------------------- /img/L7 Causal Broadcast 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Causal Broadcast 1.png -------------------------------------------------------------------------------- /img/L7 Causal Broadcast 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Causal Broadcast 2.png -------------------------------------------------------------------------------- /img/L7 Causal Broadcast 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Causal Broadcast 3.png -------------------------------------------------------------------------------- /img/L7 Causal Broadcast 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Causal Broadcast 4.png -------------------------------------------------------------------------------- /img/L7 Causal Broadcast 5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Causal Broadcast 5.png -------------------------------------------------------------------------------- /img/L7 Causal Broadcast 6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Causal Broadcast 6.png -------------------------------------------------------------------------------- /img/L7 Causal Broadcast 7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Causal Broadcast 7.png -------------------------------------------------------------------------------- /img/L7 Causal Broadcast 8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Causal Broadcast 8.png -------------------------------------------------------------------------------- /img/L7 Channels 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Channels 1.png 
-------------------------------------------------------------------------------- /img/L7 Channels 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Channels 2.png -------------------------------------------------------------------------------- /img/L7 Channels 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Channels 3.png -------------------------------------------------------------------------------- /img/L7 Channels 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Channels 4.png -------------------------------------------------------------------------------- /img/L7 Good Snapshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Good Snapshot.png -------------------------------------------------------------------------------- /img/L7 Process State.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Process State.png -------------------------------------------------------------------------------- /img/L7 TO Anomaly.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 TO Anomaly.png -------------------------------------------------------------------------------- /img/L7 Wallclock Snapshot Anomaly.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L7 Wallclock Snapshot Anomaly.png -------------------------------------------------------------------------------- /img/L8 CL Snapshot 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 CL Snapshot 1.png -------------------------------------------------------------------------------- /img/L8 CL Snapshot 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 CL Snapshot 2.png -------------------------------------------------------------------------------- /img/L8 CL Snapshot 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 CL Snapshot 3.png -------------------------------------------------------------------------------- /img/L8 Marker FIFO Anomaly.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 
Marker FIFO Anomaly.png -------------------------------------------------------------------------------- /img/L8 Snapshot Ex 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 Snapshot Ex 1.png -------------------------------------------------------------------------------- /img/L8 Snapshot Ex 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 Snapshot Ex 2.png -------------------------------------------------------------------------------- /img/L8 Snapshot Ex 3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 Snapshot Ex 3.png -------------------------------------------------------------------------------- /img/L8 Snapshot Ex 4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 Snapshot Ex 4.png -------------------------------------------------------------------------------- /img/L8 Snapshot Ex 5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 Snapshot Ex 5.png -------------------------------------------------------------------------------- /img/L8 Snapshot Ex 6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 Snapshot Ex 6.png -------------------------------------------------------------------------------- /img/L8 Snapshot Ex 7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L8 Snapshot Ex 7.png -------------------------------------------------------------------------------- /img/L9 Bad Snapshot.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L9 Bad Snapshot.png -------------------------------------------------------------------------------- /img/L9 Connected Graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L9 Connected Graph.png -------------------------------------------------------------------------------- /img/L9 Consistent Cut 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L9 Consistent Cut 1.png -------------------------------------------------------------------------------- /img/L9 Consistent Cut 2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L9 Consistent Cut 2.png -------------------------------------------------------------------------------- /img/L9 Cut.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L9 Cut.png -------------------------------------------------------------------------------- /img/L9 Inconsistent Cut.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L9 Inconsistent Cut.png -------------------------------------------------------------------------------- /img/L9 Simultaneous Snapshot 1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L9 Simultaneous Snapshot 1.png -------------------------------------------------------------------------------- /img/L9 Simultaneous Snapshot 2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L9 Simultaneous Snapshot 2.png -------------------------------------------------------------------------------- /img/L9 Total Graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/L9 Total Graph.png -------------------------------------------------------------------------------- /img/aardvark.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/aardvark.jpeg -------------------------------------------------------------------------------- /img/bang.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/bang.png -------------------------------------------------------------------------------- /img/cross.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/cross.png -------------------------------------------------------------------------------- /img/cross_small.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/cross_small.png -------------------------------------------------------------------------------- /img/emoji_book.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/emoji_book.png -------------------------------------------------------------------------------- /img/emoji_jeans.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/emoji_jeans.png -------------------------------------------------------------------------------- /img/emoji_neutral.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/emoji_neutral.png -------------------------------------------------------------------------------- /img/emoji_sad.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/emoji_sad.png -------------------------------------------------------------------------------- /img/emoji_smiley.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/emoji_smiley.png -------------------------------------------------------------------------------- /img/emoji_torch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/emoji_torch.png -------------------------------------------------------------------------------- /img/stickman.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/stickman.png -------------------------------------------------------------------------------- /img/tick.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/tick.png -------------------------------------------------------------------------------- /img/tick_small.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/tick_small.png -------------------------------------------------------------------------------- /img/very_silly.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/img/very_silly.png -------------------------------------------------------------------------------- /papers/Alsberg and Day.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/Alsberg and Day.pdf -------------------------------------------------------------------------------- /papers/Dynamo.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/Dynamo.pdf -------------------------------------------------------------------------------- /papers/FLP.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/FLP.pdf -------------------------------------------------------------------------------- /papers/Frank Schmuck PhD Paper.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/Frank Schmuck PhD Paper.pdf -------------------------------------------------------------------------------- /papers/JSON CRDT.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/JSON CRDT.pdf -------------------------------------------------------------------------------- /papers/Ladin and Liskov.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/Ladin and Liskov.pdf -------------------------------------------------------------------------------- /papers/MapReduce.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/MapReduce.pdf -------------------------------------------------------------------------------- /papers/Paxos Made Simple.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/Paxos Made Simple.pdf -------------------------------------------------------------------------------- /papers/Paxos vs RAFT.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/Paxos vs RAFT.pdf -------------------------------------------------------------------------------- /papers/Paxos vs VSR vs ZAB.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/Paxos vs VSR vs ZAB.pdf -------------------------------------------------------------------------------- /papers/TCOEDS.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/TCOEDS.pdf -------------------------------------------------------------------------------- /papers/VS Replication.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/VS Replication.pdf -------------------------------------------------------------------------------- /papers/VSR.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/VSR.pdf -------------------------------------------------------------------------------- /papers/VirtTime_GlobState.pdf: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/VirtTime_GlobState.pdf -------------------------------------------------------------------------------- /papers/Vogels.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/Vogels.pdf -------------------------------------------------------------------------------- /papers/ZAB.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/ZAB.pdf -------------------------------------------------------------------------------- /papers/atomic_broadcast.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/atomic_broadcast.pdf -------------------------------------------------------------------------------- /papers/birman91multicast.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/birman91multicast.pdf -------------------------------------------------------------------------------- /papers/byzantine.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/byzantine.pdf -------------------------------------------------------------------------------- /papers/chain_replication.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/chain_replication.pdf -------------------------------------------------------------------------------- /papers/chandy.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/chandy.pdf -------------------------------------------------------------------------------- /papers/fidge88timestamps.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/fidge88timestamps.pdf -------------------------------------------------------------------------------- /papers/holygrail.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/holygrail.pdf -------------------------------------------------------------------------------- /papers/net_comms_constraints_tradeoffs.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/net_comms_constraints_tradeoffs.pdf -------------------------------------------------------------------------------- 
/papers/paxoscommit-tods2006.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/paxoscommit-tods2006.pdf -------------------------------------------------------------------------------- /papers/rados.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/rados.pdf -------------------------------------------------------------------------------- /papers/raft.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ChrisWhealy/DistributedSystemNotes/7fc12e742d38fd19ca6916a05d8e7911878eb833/papers/raft.pdf --------------------------------------------------------------------------------