├── .gitignore ├── build.sh ├── flow.dot ├── sagas.pdf └── sagas.tex /.gitignore: -------------------------------------------------------------------------------- 1 | *.swp 2 | *.aux 3 | *.log 4 | flow.pdf 5 | -------------------------------------------------------------------------------- /build.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | #dot -Tpng:cairo flow.dot > flow.png 3 | dot -Tpdf flow.dot > flow.pdf 4 | pdflatex sagas.tex 5 | -------------------------------------------------------------------------------- /flow.dot: -------------------------------------------------------------------------------- 1 | digraph { 2 | start[shape="box", label="Start"]; 3 | 4 | saga_start_log[shape="parallelogram", label="Log saga start"]; 5 | saga_start[shape="box", label="Saga start"]; 6 | 7 | t_init_i[shape="box", label="Let i = 0"]; 8 | t_inc_i[shape="box", label="i++"]; 9 | 10 | t_start_log[shape="parallelogram", label=i start>]; 11 | t_req[shape="box", label=i>]; 12 | t_if_res[shape="diamond", label=i>]; 13 | t_done_log[shape="parallelogram", label=i done>]; 14 | t_if_more[shape="diamond", label="i = n?"]; 15 | 16 | saga_abort_log[shape="parallelogram", label="Log saga abort"]; 17 | saga_abort[shape="box", label="Saga abort"]; 18 | 19 | c_init_i[shape="box", label="Let i = last logged value of i"]; 20 | c_dec_i[shape="box", label="i--"]; 21 | 22 | c_start_log[shape="parallelogram", label=i start>]; 23 | c_req[shape="box", label=i>]; 24 | c_if_res[shape="diamond", label=i>]; 25 | c_done_log[shape="parallelogram", label=i done>]; 26 | c_if_more[shape="diamond", label="i = 0?"]; 27 | 28 | saga_done_log[shape="parallelogram", label="Log saga done"]; 29 | saga_done[shape="box", label="Saga done"]; 30 | 31 | // Start 32 | start -> saga_abort_log[label="incomplete saga"]; 33 | start -> saga_abort[label="aborted saga"]; 34 | start -> saga_start_log[label="clean"]; 35 | 36 | // Saga start 37 | saga_start_log -> saga_start; 
38 | saga_start -> t_init_i; 39 | t_init_i -> t_start_log; 40 | 41 | // Transaction attempt 42 | t_start_log -> t_req; 43 | t_req -> t_if_res; 44 | 45 | // Successful txn 46 | t_if_res -> t_done_log[label="ok"]; 47 | t_done_log -> t_if_more; 48 | 49 | // Saga completion 50 | t_if_more -> saga_done_log[label="done"]; 51 | 52 | // More txns? 53 | t_if_more -> t_inc_i[label="more"]; 54 | t_inc_i -> t_start_log; 55 | 56 | // Failed txn 57 | t_if_res -> saga_abort_log[label="error"]; 58 | 59 | // Abort 60 | saga_abort_log -> saga_abort; 61 | saga_abort -> c_init_i; 62 | c_init_i -> c_start_log; 63 | 64 | // Compensate attempt 65 | c_start_log -> c_req; 66 | c_req -> c_if_res; 67 | 68 | // Compensate failure 69 | c_if_res -> c_req[label="error"]; 70 | 71 | // Compensate success 72 | c_if_res -> c_done_log[label="ok"]; 73 | c_done_log -> c_if_more; 74 | 75 | // Rollback complete 76 | c_if_more -> saga_done_log[label="done"]; 77 | 78 | // More compensations 79 | c_if_more -> c_dec_i[label="more"]; 80 | c_dec_i -> c_start_log; 81 | 82 | saga_done_log -> saga_done; 83 | saga_done -> start; 84 | 85 | {rank=same; t_inc_i t_start_log} 86 | {rank=same; c_dec_i c_start_log} 87 | {rank=same; t_if_more c_if_more} 88 | {rank=same; saga_start_log saga_abort_log} 89 | {rank=same; t_start_log c_start_log} 90 | {rank=same; saga_start saga_abort} 91 | } 92 | -------------------------------------------------------------------------------- /sagas.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aphyr/dist-sagas/20c26d899a7c72450749c5eda91c79570afb8e1f/sagas.pdf -------------------------------------------------------------------------------- /sagas.tex: -------------------------------------------------------------------------------- 1 | \documentclass{article} 2 | \usepackage[utf8]{inputenc} 3 | \usepackage[english]{babel} 4 | \usepackage{graphicx} 5 | \usepackage{amsthm} 6 | 7 | 
\DeclareGraphicsExtensions{.pdf,.ps,.png,.jpg} 8 | 9 | \newcounter{results} 10 | 11 | \newtheorem{theorem}{Theorem}[section] 12 | \newtheorem{lemma}[theorem]{Lemma} 13 | \newtheorem{corollary}{Corollary}[theorem] 14 | 15 | \author{ 16 | McCaffrey, Caitie\\ 17 | \texttt{Sporty Tights, Inc} 18 | \and 19 | Kingsbury, Kyle\\ 20 | \texttt{The SF Eagle} 21 | \and 22 | Narula, Neha\\ 23 | \texttt{That's DOCTOR Narula to you!} 24 | } 25 | 26 | \title{Distributed Sagas} 27 | 28 | \begin{document} 29 | 30 | \maketitle 31 | 32 | \section{Introduction} 33 | 34 | The saga paper outlines a technique for long-lived transactions which provide 35 | atomicity and durability without isolation (what about consistency? Preserved 36 | outside saga scope, not within, right?). In this work, we generalize sagas to 37 | a distributed system, where processes communicate via an asynchronous network, 38 | and discover new constraints on saga sub-transactions. 39 | 40 | We are especially interested in the problem of writing sagas which interact with 41 | \textit{third-party services}, where we control the Saga Execution Coordinator 42 | (SEC) and its storage, but not the downstream Transaction Execution 43 | Coordinators (TECs) themselves. Communication between the SEC and TEC(s) takes 44 | place over an asynchronous network (e.g. TCP) which is allowed to drop, delay, 45 | or reorder messages, but not to duplicate them. 46 | 47 | We assume a high-availability SEC service running on multiple nodes for 48 | fault-tolerance, where multiple SECs may run concurrently. They coordinate 49 | their actions through a linearizable data store, which ensures saga 50 | transactions proceed sequentially. 51 | 52 | \section{The Saga Execution Coordinator} 53 | 54 | \includegraphics[width=\linewidth]{flow} 55 | 56 | 57 | 58 | 59 | \section{Both Rollback and Roll-forward} 60 | 61 | \begin{lemma}[] 62 | \label{t_contiguous} 63 | If $T_i$ is received by a TEC, then $T_0, T_1, ... 
T_{i-1}$ have already been 64 | acknowledged by a TEC, where $0 < i \le n$. 65 | \end{lemma} 66 | 67 | \begin{proof} 68 | 69 | In order for $T_i$ to be received by a TEC, it must have been requested by an 70 | SEC. In a roll-forward SEC, this could be a retry of a failed attempt to 71 | execute $T_i$, but regardless of whether the SEC is roll-back or roll-forward, 72 | entering that part of the algorithm requires the SEC to journal its intent to 73 | start $T_i$. 74 | 75 | There are only two paths to that journaling operation. The first case, $i = 0$, 76 | falls outside our constraint $0 < i \le n$. Therefore the SEC \textit{must} 77 | have taken the other path: incrementing $i$ before beginning a new transaction. 78 | 79 | That path depends on $i - 1 = n$ being false, which holds since we are 80 | considering $i \le n$. That in turn depends on journaling $T_{i-1}$'s 81 | completion, which depends on a successful response from a TEC for $T_{i-1}$. 82 | Therefore some TEC acknowledged $T_{i-1}$. That in turn requires that TEC to have 83 | received $T_{i-1}$. 84 | 85 | So, the receipt of $T_i$ implies both the receipt and acknowledgement of 86 | $T_{i-1}$. By induction, receiving $T_i$ implies \textit{all} transactions 87 | $T_0, T_1, ... T_{i-1}$ have been acknowledged. 88 | 89 | \end{proof} 90 | 91 | 92 | \begin{corollary} 93 | \label{t_zero_first} 94 | The first transaction to be received and acknowledged is $T_0$. 95 | \end{corollary} 96 | 97 | \begin{proof} 98 | 99 | Assume the first transaction to be processed is not $T_0$, but rather, some 100 | $T_i \mid 0 < i \le n$. By \ref{t_contiguous}, $T_{i-1}$ must have been 101 | received and acknowledged by a TEC already. $T_i$ is therefore \textit{not} the 102 | first transaction: a contradiction. 103 | 104 | \end{proof} 105 | 106 | 107 | \begin{lemma} 108 | \label{c_prior_ts} 109 | If $C_i$ is received by a TEC, then $T_{i - 1}$ must have been acknowledged by 110 | some TEC, where $0 < i \le n$. 
111 | \end{lemma} 112 | 113 | \begin{proof} 114 | 115 | Receipt of $C_i$ by a TEC implies the request of $C_i$ by some SEC. An SEC can 116 | only request $C_i$ if it logs its intent to start $C_i$, which can occur by two 117 | paths: either the completion of $C_{i+1}$, or by the initialization of $i$ to 118 | its last logged value. Both branches imply the SEC read $i$, or some higher 119 | value, from storage. 120 | 121 | $i$ is only incremented by an SEC which has successfully completed $T_{i-1}$. 122 | Since $i$ is nonzero, it was incremented, and $T_{i-1}$ was acknowledged by 123 | some TEC. 124 | 125 | \end{proof} 126 | 127 | 128 | \begin{lemma} 129 | \label{c_maybe_t} 130 | If $C_i$ is requested, $T_i$ may or may not have been requested. 131 | \end{lemma} 132 | 133 | \begin{proof} 134 | 135 | We know from \ref{c_prior_ts} and \ref{t_contiguous} that $C_i$ implies the acknowledgement of all 136 | $T_j$ where $0 \le j < i$. But what of that final transaction, $T_i$? Can we 137 | guarantee its completion? 138 | 139 | The answer is no. All that is necessary for $C_i$ to occur is for an SEC to 140 | write $T_i$'s start. If the SEC crashes just after journaling, it will never 141 | request $T_i$. If it does not crash, $T_i$ will be requested. 142 | 143 | \end{proof} 144 | 145 | 146 | \begin{lemma} 147 | \label{max_c_later_ts} 148 | If $C_i$ is the highest compensating transaction requested, no $T_j$ will ever 149 | have been requested, for all $i < j$. 150 | \end{lemma} 151 | 152 | \begin{proof} 153 | 154 | Assume some $T_j$ subsequent to $T_i$ \textit{is} requested. Then some SEC must 155 | have written $j$ to storage prior to that request. In order to reach $C_i$, an SEC must have received acknowledgement for $C_j$ first, which implies $C_i$ is not the highest compensating transaction requested: a contradiction. 
156 | 157 | \end{proof} 158 | 159 | \begin{lemma} 160 | \label{max_t_later_cs} 161 | If $T_i$ is the highest transaction requested, no $C_j$ will ever have been 162 | requested, for all $i + 1 < j$. 163 | \end{lemma} 164 | 165 | \begin{proof} 166 | 167 | Assume some $C_j$ \textit{is} eventually requested. Then some SEC must have written $j$ to storage, which implies $T_{j-1}$ was acknowledged. Since $T_{j-1}$ was requested, and $i < j - 1$, $T_i$ cannot have been the highest transaction requested: a contradiction. 168 | 169 | \end{proof} 170 | 171 | 172 | \begin{lemma} 173 | \label{success_all_ts} 174 | If a saga completes successfully, every transaction $T_i$ will have been 175 | acknowledged at least once, for $0 \le i \le n$. 176 | \end{lemma} 177 | 178 | \begin{proof} 179 | 180 | A saga can complete successfully iff the highest transaction $T_n$ has been 181 | acknowledged. By \ref{t_contiguous}, every $T_i$ must \textit{also} have 182 | completed, where $0 \le i < n$. 183 | 184 | \end{proof} 185 | 186 | 187 | \begin{lemma} 188 | \label{abort_corresponding_cs} 189 | If a saga completes the abort process, and $T_i$ was received by a TEC, $C_i$ 190 | was also acknowledged by a TEC. 191 | \end{lemma} 192 | 193 | \begin{proof} 194 | 195 | Let $C_{m}$ be the highest compensating transaction acknowledged. Assume $C_i$ 196 | was not acknowledged: $m < i$. By \ref{max_c_later_ts}, no transaction $T_i$ with 197 | $m < i$ can ever occur, so $i \le m$---which contradicts $m < i$. $C_i$ must 198 | have been acknowledged. 199 | 200 | \end{proof} 201 | 202 | 203 | \begin{theorem} 204 | \label{all_ts_or_corresponding_cs} 205 | Once a saga is complete, either every transaction $T_i$ will have been acknowledged at least once; or, for every transaction $T_i$ received by a TEC, $C_i$ is also acknowledged by a TEC. 206 | \end{theorem} 207 | 208 | \begin{proof} 209 | 210 | Sagas may only complete by successful termination or by being aborted. 
If 211 | successful, \ref{success_all_ts} ensures every $T_i$ occurs. If the saga 212 | aborts, \ref{abort_corresponding_cs} ensures the receipt of $T_i$ implies the 213 | acknowledgement of $C_i$. 214 | \end{proof} 215 | \section{Rollback} 216 | 217 | \begin{lemma} 218 | \label{t_at_most_once} 219 | Transactions are requested and received at most once. 220 | \end{lemma} 221 | 222 | \begin{proof} 223 | 224 | In order for an SEC to request a transaction $T_i$, it has to record its intent 225 | to execute $T_i$ in shared SEC storage. Since that storage is linearizable, any 226 | other SEC recording an intent to execute $T_i$ would be visible to the 227 | requesting SEC. 228 | 229 | \begin{description} 230 | \item[Case 1] Another SEC has already recorded its intent to request $T_i$. 231 | The given SEC chooses to crash instead of requesting $T_i$. 232 | \item[Case 2] No other SEC has recorded its intent to request $T_i$. The 233 | given SEC requests $T_i$ once. 234 | \end{description} 235 | 236 | In both cases, $T_i$ is requested at most once, across all SECs, depending on 237 | whether or not the successfully-recording SEC crashes before making its 238 | request. 239 | 240 | Because the network does not duplicate requests, the number of times $T_i$ can 241 | arrive at a TEC is less than or equal to the number of requests any SEC makes 242 | for $T_i$. Since that number is at most one, $T_i$ is received at most once. 243 | 244 | \end{proof} 245 | 246 | 247 | \begin{lemma} 248 | \label{t_sequential} 249 | Transactions are seen by TECs in sequential order: $T_0, T_1, \ldots, T_j$, 250 | where $0 \le j \le n$. 251 | \end{lemma} 252 | 253 | \begin{proof} 254 | 255 | Consider a sequential history $S = (T_0, ..., T_i)$ followed by $T_j$. Is $(T_0, 256 | ... T_i, T_j)$ sequential? We must show $i + 1 = j$. 257 | 258 | \begin{description} 259 | \item[Case 1] Assume $j \le i$. Then $T_j$ is a duplicate of some transaction 260 | already in $S$, which violates \ref{t_at_most_once}: a contradiction. 
261 | \item[Case 2] Assume $i + 1 < j$. By \ref{t_contiguous}, $T_{i + 1}$ must 262 | appear before $T_j$---but $S$ cannot contain $T_{i+1}$, since it only ranges 263 | from $T_0$ to $T_i$: a contradiction. 264 | \item[Case 3] Assume $i < j \le i + 1$. Then $i + 1 = j$. 265 | \end{description} 266 | 267 | Since cases 1 and 2 are impossible, \textit{any} history consisting of a 268 | transaction following a sequential history of at least one element must be 269 | sequential as well. 270 | 271 | Now, consider histories of one element or fewer: 272 | 273 | \begin{description} 274 | \item[Case 1] No transactions occur. The history is trivially sequential. 275 | \item[Case 2] Exactly one transaction occurs. By \ref{t_zero_first}, that 276 | transaction must be $T_0$. This history is trivially sequential. 277 | \end{description} 278 | 279 | So any history of one element or fewer is sequential, and any transaction 280 | \textit{appended} to that history will also form a sequential history, and so 281 | on. By induction, all transactions in a rollback saga system occur 282 | sequentially. 283 | 284 | \end{proof} 285 | 286 | \end{document} 287 | --------------------------------------------------------------------------------
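The control flow in flow.dot, which the proofs in sagas.tex reason about, can be sketched in a few lines of Python. This is an illustration only and is not part of the repository: `run_saga`, `transact`, `compensate`, and the in-memory `journal` list are hypothetical names, the journal stands in for the linearizable store, and the sketch assumes a single SEC that never crashes, so the "incomplete saga" and "aborted saga" recovery edges out of Start are not modelled.

```python
from typing import Callable, List, Tuple

# (event, index); index is -1 for saga-level events. Hypothetical journal format.
Journal = List[Tuple[str, int]]


def run_saga(
    n: int,
    transact: Callable[[int], bool],
    compensate: Callable[[int], bool],
    journal: Journal,
) -> str:
    """Run T_0..T_n; on the first failed T_i, compensate C_i down to C_0."""
    journal.append(("saga start", -1))
    i = 0
    while True:
        journal.append(("T start", i))   # journal intent before requesting T_i
        if not transact(i):              # "error" edge: leave the forward path
            break
        journal.append(("T done", i))
        if i == n:                       # "i = n?" -> done
            journal.append(("saga done", -1))
            return "committed"
        i += 1                           # "more" edge

    journal.append(("saga abort", -1))
    # "Let i = last logged value of i": the most recent "T start" entry.
    i = next(idx for event, idx in reversed(journal) if event == "T start")
    while True:
        journal.append(("C start", i))
        while not compensate(i):         # compensate failure: retry C_i
            pass
        journal.append(("C done", i))
        if i == 0:                       # "i = 0?" -> done
            journal.append(("saga done", -1))
            return "aborted"
        i -= 1
```

Note that when `transact(i)` fails, the sketch still requests `compensate(i)`, because only the start of T_i was journaled. This matches the lemma labelled c_maybe_t: a request for C_i does not tell us whether T_i was ever requested or took effect.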