├── img └── update_delete.png └── README.md /img/update_delete.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hytromo/postgres_transaction_cheatsheet/HEAD/img/update_delete.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Postgres Transaction Cheatsheet 2 | 3 | ## How SELECT, UPDATE and DELETE work 4 | 5 | ### SELECT 6 | 7 | Takes a snapshot of the entire db and searches in the snapshot. This means that 2 selects in same transaction have their own snapshot and can see different data. 8 | 9 | ### UPDATE / DELETE 10 | 11 |

12 | 13 |

14 | 15 | It is worth noting that UPDATEs and DELETEs get a row-level write lock (continue reading to find out what this means or see solution #S1.b) 16 | 17 | ## Problems to deal with 18 | 19 | ### #P1 - Non-repeatable reads 20 | 21 | When a transaction reads the same row twice but finds different data, because of some other transaction. 22 | 23 | ### #P2 - Lost update 24 | 25 | Consider the following example: 26 | 27 | 1. Fetch row from db and read value (e.g. value == 1) 28 | 2. Increment a field (server side) (value++ -> value == 2) 29 | 3. Save back to db (SET value = 2) 30 | 31 | If the above example is run concurrently from 2 apps, then the final value will be 2, not 3. 32 | 33 | ### #P3 - Phantom read 34 | 35 | The same transaction may see different number of rows in the same table because of INSERTs of other transactions. 36 | 37 | ### #P4 - Skipped modification 38 | 39 | Due to how UPDATEs and DELETEs work, a row can be true for the WHERE clause and still not be updated. 40 | Consider the following example: 41 | 42 | ``` 43 | DB tables-> 44 | [name:ints] 45 | n 46 | --- 47 | 1 -> row#1 48 | 2 -> row#2 49 | 50 | T1: BEGIN 51 | T1: UPDATE ints SET n = n+1; 52 | 53 | T2: BEGIN 54 | T2: DELETE from ints WHERE n = 2; -> blocks since delete is trying to modify rows being updated 55 | 56 | T1: COMMIT 57 | 58 | T2: COMMIT 59 | 60 | DB tables-> 61 | [name:ints] 62 | n 63 | --- 64 | 2 65 | 3 66 | ``` 67 | 68 | No row is being deleted. Because: 69 | 1. DELETE finds #row2 having the value '2' 70 | 2. It waits for it to become unlocked 71 | 3. Once unlocked, it re-evaluates the row, but now it has the value '3', and it's no longer true. 72 | 4. DELETE skips the row 73 | 74 | The catch is that row#1 is not re-evaluated after the first transaction commits. Consult the UPDATE / DELETE image if you can't wrap your head around this, and you'll realize that `T2` blocks only on #row2. 75 | 76 | ### #P5 - Serialization anomaly 77 | 78 | This happens when it is not possible to serialize (= execute the one after the other) the 2 transactions and getting the results that each transaction had. 79 | 80 | Example: 81 | ``` 82 | T0> BEGIN; 83 | T0> SELECT count(*) FROM ints; 84 | count 85 | ------- 86 | 0 87 | (1 row) 88 | 89 | T1> BEGIN; 90 | T1> SELECT count(*) FROM ints; 91 | count 92 | ------- 93 | 0 94 | (1 ROW) 95 | 96 | T1> INSERT INTO ints SELECT 1; 97 | T1> COMMIT; 98 | 99 | T0> INSERT INTO ints SELECT 1; 100 | T0> COMMIT; 101 | ``` 102 | 103 | Both `T0` and `T1` see `count = 0`, but there is no way when run sequentially that they both saw `count = 0`, because they both insert rows. So, if `T0` runs first, `T1` should see `count = 1` and vice versa. 104 | 105 | Postgres does not complain (does not throw errors (in the default isolation level)), but it can still create problems in your app. 106 | 107 | ## Solutions 108 | 109 | ### #S1 - Row level locks 110 | Solves #P1 (Non-repeatable reads) and #P2 (Lost update) 111 | 112 | #### #S1.a - Read lock 113 | Solves #P1 (Non-repeatable reads) 114 | 115 | Add `FOR SHARE` after the `SELECT` statement. This will acquire a read lock in the matched rows. These rows will not be able to be updated or deleted from another transaction until the lock is released. **This ensures that these rows will have the same value for the duration of the transaction**. Of course, the transaction itself (the current one, with the read lock) can update/delete these rows and the changes will be visible within the transaction. 116 | 117 | #### #S1.b - Write lock 118 | Solves #P2 (Lost update) 119 | 120 | Add `FOR UPDATE` after the `SELECT` statement. This will acquire a write lock in the matched rows. This blocks other transactions when they try to acquire **either a read or a write lock on the same row**. 121 | 122 | Now the #P2 example is not longer an issue. Consider `P1` and `P2` being two apps trying to access and increment a value in the same row: 123 | 124 | - P1.1 Fetch row from db and read value (e.g. value == 1) with `FOR UPDATE` 125 | - P2.1 Fetch row from db and read value with `FOR UPDATE` -> **blocks, because `FOR UPDATE` tries to acquire a write lock, while a write lock is present** 126 | - P1.2. Increment the field (server side) (value++ -> value == 2) 127 | - P1.3. Save back to db (SET value = 2) and COMMIT 128 | - P2.1. -> **unblocks, because the write lock is released**, reads value == 2 129 | - P2.2. Increment the field (server side) (value++ -> value == 3) 130 | - P2.3. Save back to db (SET value = 3) and COMMIT 131 | 132 | Now the value will have the expected value of 3 133 | 134 | *Note*: You can use `SKIP LOCKED` in order to add a virtual `WHERE locked = false` and skip the locked rows from being selected and you being blocked, if you don't want that. 135 | 136 | ### #S2 - Different transaction isolation levels 137 | Can solve all the problems. Generally, while the isolation levels can solve #P1 and #P2, they do so in a less elegant way (aborting transactions), so **#P1 and #P2 should best be solved with row-level locks (#S1) and not with different isolation levels**. 138 | 139 | There are 3 different isolation levels: 140 | 141 | - Read committed (This is the default) 142 | - Repeatable read 143 | - Serializable 144 | 145 | To see the current isolation level: 146 | ``` 147 | SHOW default_transaction_isolation; 148 | ``` 149 | 150 | To set a different level: 151 | ``` 152 | SET default_transaction_isolation='serializable'; 153 | ``` 154 | 155 | *Note*: The isolation level is different per connected client basis. 156 | 157 | #### #S2.a - Isolation level 'Repeatable read' 158 | Solves #P1, #P2, #P3 and #P4 (all apart from serialization anomaly) 159 | 160 | When the transaction starts, it takes a snapshot of the database. All the following queries within the transaction will use this snapshot to do their changes. 161 | 162 | *What does this mean for the queries?* 163 | 164 | - SELECT: It "sees" only what is in the snapshot and nothing more. The select does not see any changes of every other transaction that commits after this transaction has begun. 165 | - UPDATE / DELETE: If they stumble upon a locked row, they wait for it to be commited / rolled back. **If the row gets commited by the other transaction, the snapshot is considered to be "out of date" and the current transaction aborts**. If the row gets rolled back by the other transaction, then the query continues as if nothing has happened. 166 | 167 | This means that **when someone uses 'Repeatable read', their code has to be prepared to retry transactions if any of them fail**. 168 | 169 | #P3 (Phantom read) is solved because the rows that the transaction sees are from the snapshot in the beginning of the transaction, so no more/less rows can be read due to other transactions. 170 | 171 | #P4 (Skipped modification) is solved by **aborting** all the transactions that wait for write lock in a row that is finally commited ("out of date" row). 172 | 173 | #### #S2.b - Isolation level 'Serializable' 174 | Can solve all the problems. 175 | 176 | This isolation level behaves exactly like 'Repeatable read', but it can also detect when two transactions cannot serialize correctly, and it will abort one of them to prevent a serialization anomaly. This comes with an overhead cost because Postgres now has an extra job - to detect the serialization anomalies. 177 | 178 | This means that **when someone uses 'Serializable', their code has to be prepared to retry transactions if any of them fail**. 179 | 180 | ## Summary 181 | 182 | - Solve #P1 - Non-repeatable reads -> use #S1.a, row level read lock (`FOR SHARE`) 183 | - Solve #P2 - Lost update -> use #S1.b, row level write lock (`FOR UPDATE`) 184 | - Solve #P3 - Phantom read and #P4 - Skipped modification -> use #S2.a, 'Repeatable read' isolation level (**RETRY TRANSACTION ON FAIL**) 185 | - Solve #P5 - Serialization anomaly -> use #S2.b, 'Serializable' isolation level (**RETRY TRANSACTION ON FAIL**) 186 | 187 | Sources and full credit to: 188 | 189 | - http://malisper.me/postgres-transactions-arent-fully-isolated/ 190 | - http://malisper.me/postgres-row-level-locks/ 191 | - http://malisper.me/postgres-transaction-isolation-levels/ 192 | --------------------------------------------------------------------------------