├── img
│   └── update_delete.png
└── README.md
/img/update_delete.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hytromo/postgres_transaction_cheatsheet/HEAD/img/update_delete.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Postgres Transaction Cheatsheet

## How SELECT, UPDATE and DELETE work

### SELECT

A SELECT takes a snapshot of the entire database and searches within that snapshot. This means that two SELECTs in the same transaction get their own snapshots and can see different data (see #P1 below for an example).

### UPDATE / DELETE

![UPDATE / DELETE](img/update_delete.png)

In short, an UPDATE or DELETE first finds the matching rows like a SELECT does, and then locks each matched row before modifying it; if a row was modified by a concurrent transaction in the meantime, the WHERE clause is re-evaluated against the newest version of that row (this is what causes problem #P4 below).

It is worth noting that UPDATEs and DELETEs take a row-level write lock on the rows they modify (continue reading to find out what this means, or see solution #S1.b).

## Problems to deal with

### #P1 - Non-repeatable reads

A transaction reads the same row twice but finds different data, because another transaction modified and committed the row in between.

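A minimal demonstration at the default isolation level, using the `ints` table from the examples further down (`T0` and `T1` are two separate sessions):

```
T0> BEGIN;
T0> SELECT n FROM ints WHERE n = 1;   -- returns one row with n = 1

T1> UPDATE ints SET n = 5 WHERE n = 1;   -- runs in its own transaction and commits immediately

T0> SELECT n FROM ints WHERE n = 1;   -- returns no rows now: the data changed within the same transaction
T0> COMMIT;
```
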
### #P2 - Lost update

Consider the following example:

1. Fetch a row from the db and read a value (e.g. value == 1)
2. Increment the field (server side) (value++ -> value == 2)
3. Save it back to the db (SET value = 2)

If this sequence is run concurrently by 2 apps, the final value will be 2, not 3, as the transcript below shows.

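A sketch of the problematic interleaving, assuming a hypothetical `counters(id, value)` table (`T0` and `T1` are the two apps' sessions):

```
T0> SELECT value FROM counters WHERE id = 1;   -- reads 1
T1> SELECT value FROM counters WHERE id = 1;   -- also reads 1

-- both apps compute 1 + 1 = 2 on their side and write it back
T0> UPDATE counters SET value = 2 WHERE id = 1;
T1> UPDATE counters SET value = 2 WHERE id = 1;

-- final value: 2 -> one of the increments is lost
```
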
### #P3 - Phantom read

The same transaction may see a different number of rows in the same table, because of INSERTs committed by other transactions.

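For example, again with the `ints` table at the default isolation level:

```
T0> BEGIN;
T0> SELECT count(*) FROM ints;   -- 2 rows

T1> INSERT INTO ints SELECT 3;   -- commits

T0> SELECT count(*) FROM ints;   -- 3 rows: a "phantom" row appeared
T0> COMMIT;
```
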
### #P4 - Skipped modification

Due to how UPDATEs and DELETEs work, a row can satisfy the WHERE clause and still not be modified.
Consider the following example:

```
DB tables->
[name:ints]
n
---
1 -> row#1
2 -> row#2

T1: BEGIN
T1: UPDATE ints SET n = n+1;

T2: BEGIN
T2: DELETE FROM ints WHERE n = 2; -> blocks, since the DELETE is trying to modify rows being updated

T1: COMMIT

T2: COMMIT

DB tables->
[name:ints]
n
---
2
3
```

No row is deleted, because:
1. DELETE finds row#2 having the value '2'
2. It waits for the row to become unlocked
3. Once unlocked, it re-evaluates the row, but the row now has the value '3' and no longer matches the WHERE clause
4. DELETE skips the row

The catch is that row#1 is not re-evaluated after the first transaction commits, even though its value is now '2'. Consult the UPDATE / DELETE image if you can't wrap your head around this, and you'll realize that `T2` blocks only on row#2.

### #P5 - Serialization anomaly

This happens when there is no serial order (= executing one transaction after the other) that could have produced the results each transaction actually saw.

Example:
```
T0> BEGIN;
T0> SELECT count(*) FROM ints;
 count
-------
     0
(1 row)

T1> BEGIN;
T1> SELECT count(*) FROM ints;
 count
-------
     0
(1 row)

T1> INSERT INTO ints SELECT 1;
T1> COMMIT;

T0> INSERT INTO ints SELECT 1;
T0> COMMIT;
```

Both `T0` and `T1` see `count = 0`, but had they run sequentially they could not both have seen `count = 0`, because each of them inserts a row: if `T0` ran first, `T1` would see `count = 1`, and vice versa.

Postgres does not complain (no error is thrown in the default isolation level), but this can still create problems in your app.

## Solutions

### #S1 - Row level locks
Solves #P1 (Non-repeatable reads) and #P2 (Lost update)

#### #S1.a - Read lock
Solves #P1 (Non-repeatable reads)

Add `FOR SHARE` after the `SELECT` statement. This acquires a read lock on the matched rows. These rows cannot be updated or deleted by another transaction until the lock is released. **This ensures that these rows will have the same values for the duration of the transaction**. Of course, the current transaction (the one holding the read lock) can still update/delete these rows, and its changes will be visible within it.

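A sketch with the `ints` table (`T0` holds the read lock, `T1` is another session):

```
T0> BEGIN;
T0> SELECT * FROM ints WHERE n = 1 FOR SHARE;   -- read lock on the matched rows

T1> SELECT * FROM ints WHERE n = 1;             -- plain reads are not blocked
T1> UPDATE ints SET n = 5 WHERE n = 1;          -- blocks: needs a write lock on the same row

T0> COMMIT;                                     -- read lock released, T1's UPDATE proceeds
```
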
#### #S1.b - Write lock
Solves #P2 (Lost update)

Add `FOR UPDATE` after the `SELECT` statement. This acquires a write lock on the matched rows. Other transactions block when they try to acquire **either a read or a write lock on the same rows**.

Now the #P2 example is no longer an issue. Consider two apps, `A1` and `A2`, trying to read and increment a value in the same row:

- A1.1 Fetch the row from the db and read the value (e.g. value == 1) with `FOR UPDATE`
- A2.1 Fetch the row from the db and read the value with `FOR UPDATE` -> **blocks, because `FOR UPDATE` tries to acquire a write lock while another write lock is present**
- A1.2 Increment the field (server side) (value++ -> value == 2)
- A1.3 Save it back to the db (SET value = 2) and COMMIT
- A2.1 -> **unblocks, because the write lock is released**, reads value == 2
- A2.2 Increment the field (server side) (value++ -> value == 3)
- A2.3 Save it back to the db (SET value = 3) and COMMIT

Now the row ends up with the expected value of 3, as sketched below.

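What each app would run, a sketch against the hypothetical `counters(id, value)` table from #P2:

```
BEGIN;
SELECT value FROM counters WHERE id = 1 FOR UPDATE;  -- write lock; a concurrent FOR UPDATE blocks here
-- the app increments the value it just read...
UPDATE counters SET value = 2 WHERE id = 1;          -- ...and writes it back
COMMIT;                                              -- lock released; the blocked app continues and reads the fresh value
```
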
*Note*: If you don't want to be blocked, you can add `SKIP LOCKED`, which acts like a virtual `WHERE locked = false`: rows that are currently locked are simply skipped instead of waited on.

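This is handy for job-queue style workloads; a sketch, assuming a hypothetical `jobs(id, done, ...)` table:

```
BEGIN;
SELECT * FROM jobs
WHERE done = false
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;   -- grab one job; rows already locked by other workers are skipped, not waited on
-- ... process the job, mark it as done ...
COMMIT;
```
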
### #S2 - Different transaction isolation levels
Can solve all the problems. Generally, while the isolation levels can also solve #P1 and #P2, they do so in a less elegant way (by aborting transactions), so **#P1 and #P2 are best solved with row-level locks (#S1) and not with different isolation levels**.

There are 3 different isolation levels (Postgres also accepts 'Read uncommitted', but it behaves exactly like 'Read committed'):

- Read committed (this is the default)
- Repeatable read
- Serializable

To see the current isolation level:
```
SHOW default_transaction_isolation;
```

To set a different level:
```
SET default_transaction_isolation='serializable';
```

*Note*: `default_transaction_isolation` is a per-connected-client (per-session) setting.

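You can also set the level for a single transaction instead of changing the session default:

```
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- ... queries ...
COMMIT;

-- or, as the first statement inside an already-opened transaction:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... queries ...
COMMIT;
```
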
#### #S2.a - Isolation level 'Repeatable read'
Solves #P1, #P2, #P3 and #P4 (all apart from serialization anomaly)

When the transaction starts (strictly speaking, when it runs its first query), it takes a snapshot of the database. All following queries within the transaction work against this snapshot.

*What does this mean for the queries?*

- SELECT: It "sees" only what is in the snapshot and nothing more. It does not see changes from any other transaction that commits after the snapshot was taken.
- UPDATE / DELETE: If they stumble upon a locked row, they wait for the locking transaction to commit or roll back. **If that transaction commits a change to the row, the snapshot is considered "out of date" and the current transaction aborts**. If it rolls back, the query continues as if nothing had happened.

This means that **when someone uses 'Repeatable read', their code has to be prepared to retry transactions if any of them fail**.

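For example, with the `ints` table (the exact error message can vary slightly between Postgres versions):

```
T0> BEGIN ISOLATION LEVEL REPEATABLE READ;
T0> SELECT n FROM ints;            -- snapshot taken here

T1> UPDATE ints SET n = n + 1;     -- commits

T0> UPDATE ints SET n = n + 10;    -- stumbles upon the concurrently updated rows
ERROR:  could not serialize access due to concurrent update
T0> ROLLBACK;                      -- the application should retry the whole transaction
```
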
#P3 (Phantom read) is solved because the rows the transaction sees come from the snapshot taken at the beginning of the transaction, so INSERTs from other transactions cannot make rows appear or disappear.

#P4 (Skipped modification) is solved by **aborting** any transaction that is waiting for a write lock on a row whose change is finally committed (an "out of date" row).

#### #S2.b - Isolation level 'Serializable'
Can solve all the problems.

This isolation level behaves exactly like 'Repeatable read', but it can also detect when two transactions cannot be serialized correctly, and it will abort one of them to prevent a serialization anomaly. This comes with an overhead, because Postgres now has an extra job: detecting serialization anomalies.

This means that **when someone uses 'Serializable', their code has to be prepared to retry transactions if any of them fail**.

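For example, re-running the #P5 transcript with both sessions at the 'Serializable' level makes Postgres abort one of them (depending on the version, the error may be raised at the second INSERT or at the final COMMIT):

```
T0> BEGIN ISOLATION LEVEL SERIALIZABLE;
T0> SELECT count(*) FROM ints;     -- 0

T1> BEGIN ISOLATION LEVEL SERIALIZABLE;
T1> SELECT count(*) FROM ints;     -- 0
T1> INSERT INTO ints SELECT 1;
T1> COMMIT;

T0> INSERT INTO ints SELECT 1;
T0> COMMIT;
ERROR:  could not serialize access due to read/write dependencies among transactions
```
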
## Summary

- Solve #P1 - Non-repeatable reads -> use #S1.a, row level read lock (`FOR SHARE`)
- Solve #P2 - Lost update -> use #S1.b, row level write lock (`FOR UPDATE`)
- Solve #P3 - Phantom read and #P4 - Skipped modification -> use #S2.a, 'Repeatable read' isolation level (**RETRY TRANSACTION ON FAIL**)
- Solve #P5 - Serialization anomaly -> use #S2.b, 'Serializable' isolation level (**RETRY TRANSACTION ON FAIL**)

Sources and full credit to:

- http://malisper.me/postgres-transactions-arent-fully-isolated/
- http://malisper.me/postgres-row-level-locks/
- http://malisper.me/postgres-transaction-isolation-levels/

--------------------------------------------------------------------------------