├── img
│   └── update_delete.png
└── README.md
/img/update_delete.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hytromo/postgres_transaction_cheatsheet/HEAD/img/update_delete.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Postgres Transaction Cheatsheet

## How SELECT, UPDATE and DELETE work

### SELECT

A SELECT takes a snapshot of the entire database and searches within that snapshot. This means that two SELECTs in the same transaction get their own snapshots and can see different data (see #P1 below for an example).

### UPDATE / DELETE

![UPDATE / DELETE](img/update_delete.png)

In short, an UPDATE or DELETE first finds the matching rows like a SELECT does, and then locks each matched row before modifying it; if a row was modified by a concurrent transaction in the meantime, the WHERE clause is re-evaluated against the newest version of that row (this is what causes problem #P4 below).

It is worth noting that UPDATEs and DELETEs take a row-level write lock on the rows they modify (continue reading to find out what this means, or see solution #S1.b).

## Problems to deal with

### #P1 - Non-repeatable reads

A transaction reads the same row twice but finds different data, because another transaction modified and committed the row in between.

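A minimal demonstration at the default isolation level, using the `ints` table from the examples further down (`T0` and `T1` are two separate sessions):

```
T0> BEGIN;
T0> SELECT n FROM ints WHERE n = 1;   -- returns one row with n = 1

T1> UPDATE ints SET n = 5 WHERE n = 1;   -- runs in its own transaction and commits immediately

T0> SELECT n FROM ints WHERE n = 1;   -- returns no rows now: the data changed within the same transaction
T0> COMMIT;
```
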
### #P2 - Lost update

Consider the following example:

1. Fetch a row from the db and read a value (e.g. value == 1)
2. Increment the field (server side) (value++ -> value == 2)
3. Save it back to the db (SET value = 2)

If this sequence is run concurrently by 2 apps, the final value will be 2, not 3, as the transcript below shows.

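A sketch of the problematic interleaving, assuming a hypothetical `counters(id, value)` table (`T0` and `T1` are the two apps' sessions):

```
T0> SELECT value FROM counters WHERE id = 1;   -- reads 1
T1> SELECT value FROM counters WHERE id = 1;   -- also reads 1

-- both apps compute 1 + 1 = 2 on their side and write it back
T0> UPDATE counters SET value = 2 WHERE id = 1;
T1> UPDATE counters SET value = 2 WHERE id = 1;

-- final value: 2 -> one of the increments is lost
```
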
### #P3 - Phantom read

The same transaction may see a different number of rows in the same table, because of INSERTs committed by other transactions.

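For example, again with the `ints` table at the default isolation level:

```
T0> BEGIN;
T0> SELECT count(*) FROM ints;   -- 2 rows

T1> INSERT INTO ints SELECT 3;   -- commits

T0> SELECT count(*) FROM ints;   -- 3 rows: a "phantom" row appeared
T0> COMMIT;
```
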
### #P4 - Skipped modification

Due to how UPDATEs and DELETEs work, a row can satisfy the WHERE clause and still not be modified.
Consider the following example:

```
DB tables->
[name:ints]
n
---
1 -> row#1
2 -> row#2

T1: BEGIN
T1: UPDATE ints SET n = n+1;

T2: BEGIN
T2: DELETE FROM ints WHERE n = 2; -> blocks, since the DELETE is trying to modify rows being updated

T1: COMMIT

T2: COMMIT

DB tables->
[name:ints]
n
---
2
3
```

No row is deleted, because:
1. DELETE finds row#2 having the value '2'
2. It waits for the row to become unlocked
3. Once unlocked, it re-evaluates the row, but the row now has the value '3' and no longer matches the WHERE clause
4. DELETE skips the row

The catch is that row#1 is not re-evaluated after the first transaction commits, even though its value is now '2'. Consult the UPDATE / DELETE image if you can't wrap your head around this, and you'll realize that `T2` blocks only on row#2.

### #P5 - Serialization anomaly

This happens when there is no serial order (= executing one transaction after the other) that could have produced the results each transaction actually saw.

Example:
```
T0> BEGIN;
T0> SELECT count(*) FROM ints;
 count
-------
     0
(1 row)

T1> BEGIN;
T1> SELECT count(*) FROM ints;
 count
-------
     0
(1 row)

T1> INSERT INTO ints SELECT 1;
T1> COMMIT;

T0> INSERT INTO ints SELECT 1;
T0> COMMIT;
```

Both `T0` and `T1` see `count = 0`, but had they run sequentially they could not both have seen `count = 0`, because each of them inserts a row: if `T0` ran first, `T1` would see `count = 1`, and vice versa.

Postgres does not complain (no error is thrown in the default isolation level), but this can still create problems in your app.

## Solutions

### #S1 - Row level locks
Solves #P1 (Non-repeatable reads) and #P2 (Lost update)

#### #S1.a - Read lock
Solves #P1 (Non-repeatable reads)

Add `FOR SHARE` after the `SELECT` statement. This acquires a read lock on the matched rows. These rows cannot be updated or deleted by another transaction until the lock is released. **This ensures that these rows will have the same values for the duration of the transaction**. Of course, the current transaction (the one holding the read lock) can still update/delete these rows, and its changes will be visible within it.

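A sketch with the `ints` table (`T0` holds the read lock, `T1` is another session):

```
T0> BEGIN;
T0> SELECT * FROM ints WHERE n = 1 FOR SHARE;   -- read lock on the matched rows

T1> SELECT * FROM ints WHERE n = 1;             -- plain reads are not blocked
T1> UPDATE ints SET n = 5 WHERE n = 1;          -- blocks: needs a write lock on the same row

T0> COMMIT;                                     -- read lock released, T1's UPDATE proceeds
```
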
#### #S1.b - Write lock
Solves #P2 (Lost update)

Add `FOR UPDATE` after the `SELECT` statement. This acquires a write lock on the matched rows. Other transactions block when they try to acquire **either a read or a write lock on the same rows**.

Now the #P2 example is no longer an issue. Consider two apps, `A1` and `A2`, trying to read and increment a value in the same row:

- A1.1 Fetch the row from the db and read the value (e.g. value == 1) with `FOR UPDATE`
- A2.1 Fetch the row from the db and read the value with `FOR UPDATE` -> **blocks, because `FOR UPDATE` tries to acquire a write lock while another write lock is present**
- A1.2 Increment the field (server side) (value++ -> value == 2)
- A1.3 Save it back to the db (SET value = 2) and COMMIT
- A2.1 -> **unblocks, because the write lock is released**, reads value == 2
- A2.2 Increment the field (server side) (value++ -> value == 3)
- A2.3 Save it back to the db (SET value = 3) and COMMIT

Now the row ends up with the expected value of 3, as sketched below.

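What each app would run, a sketch against the hypothetical `counters(id, value)` table from #P2:

```
BEGIN;
SELECT value FROM counters WHERE id = 1 FOR UPDATE;  -- write lock; a concurrent FOR UPDATE blocks here
-- the app increments the value it just read...
UPDATE counters SET value = 2 WHERE id = 1;          -- ...and writes it back
COMMIT;                                              -- lock released; the blocked app continues and reads the fresh value
```
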
*Note*: If you don't want to be blocked, you can add `SKIP LOCKED`, which acts like a virtual `WHERE locked = false`: rows that are currently locked are simply skipped instead of waited on.

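This is handy for job-queue style workloads; a sketch, assuming a hypothetical `jobs(id, done, ...)` table:

```
BEGIN;
SELECT * FROM jobs
WHERE done = false
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;   -- grab one job; rows already locked by other workers are skipped, not waited on
-- ... process the job, mark it as done ...
COMMIT;
```
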
### #S2 - Different transaction isolation levels
Can solve all the problems. Generally, while the isolation levels can also solve #P1 and #P2, they do so in a less elegant way (by aborting transactions), so **#P1 and #P2 are best solved with row-level locks (#S1) and not with different isolation levels**.

There are 3 different isolation levels (Postgres also accepts 'Read uncommitted', but it behaves exactly like 'Read committed'):

- Read committed (this is the default)
- Repeatable read
- Serializable

To see the current isolation level:
```
SHOW default_transaction_isolation;
```

To set a different level:
```
SET default_transaction_isolation='serializable';
```

*Note*: `default_transaction_isolation` is a per-connected-client (per-session) setting.

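You can also set the level for a single transaction instead of changing the session default:

```
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- ... queries ...
COMMIT;

-- or, as the first statement inside an already-opened transaction:
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... queries ...
COMMIT;
```
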
#### #S2.a - Isolation level 'Repeatable read'
Solves #P1, #P2, #P3 and #P4 (all apart from serialization anomaly)

When the transaction starts (strictly speaking, when it runs its first query), it takes a snapshot of the database. All following queries within the transaction work against this snapshot.

*What does this mean for the queries?*

- SELECT: It "sees" only what is in the snapshot and nothing more. It does not see changes from any other transaction that commits after the snapshot was taken.
- UPDATE / DELETE: If they stumble upon a locked row, they wait for the locking transaction to commit or roll back. **If that transaction commits a change to the row, the snapshot is considered "out of date" and the current transaction aborts**. If it rolls back, the query continues as if nothing had happened.

This means that **when someone uses 'Repeatable read', their code has to be prepared to retry transactions if any of them fail**.

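For example, with the `ints` table (the exact error message can vary slightly between Postgres versions):

```
T0> BEGIN ISOLATION LEVEL REPEATABLE READ;
T0> SELECT n FROM ints;            -- snapshot taken here

T1> UPDATE ints SET n = n + 1;     -- commits

T0> UPDATE ints SET n = n + 10;    -- stumbles upon the concurrently updated rows
ERROR:  could not serialize access due to concurrent update
T0> ROLLBACK;                      -- the application should retry the whole transaction
```
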
#P3 (Phantom read) is solved because the rows the transaction sees come from the snapshot taken at the beginning of the transaction, so INSERTs from other transactions cannot make rows appear or disappear.

#P4 (Skipped modification) is solved by **aborting** any transaction that is waiting for a write lock on a row whose change is finally committed (an "out of date" row).

#### #S2.b - Isolation level 'Serializable'
Can solve all the problems.

This isolation level behaves exactly like 'Repeatable read', but it can also detect when two transactions cannot be serialized correctly, and it will abort one of them to prevent a serialization anomaly. This comes with an overhead, because Postgres now has an extra job: detecting serialization anomalies.

This means that **when someone uses 'Serializable', their code has to be prepared to retry transactions if any of them fail**.

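For example, re-running the #P5 transcript with both sessions at the 'Serializable' level makes Postgres abort one of them (depending on the version, the error may be raised at the second INSERT or at the final COMMIT):

```
T0> BEGIN ISOLATION LEVEL SERIALIZABLE;
T0> SELECT count(*) FROM ints;     -- 0

T1> BEGIN ISOLATION LEVEL SERIALIZABLE;
T1> SELECT count(*) FROM ints;     -- 0
T1> INSERT INTO ints SELECT 1;
T1> COMMIT;

T0> INSERT INTO ints SELECT 1;
T0> COMMIT;
ERROR:  could not serialize access due to read/write dependencies among transactions
```
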
## Summary

- Solve #P1 - Non-repeatable reads -> use #S1.a, row level read lock (`FOR SHARE`)
- Solve #P2 - Lost update -> use #S1.b, row level write lock (`FOR UPDATE`)
- Solve #P3 - Phantom read and #P4 - Skipped modification -> use #S2.a, 'Repeatable read' isolation level (**RETRY TRANSACTION ON FAIL**)
- Solve #P5 - Serialization anomaly -> use #S2.b, 'Serializable' isolation level (**RETRY TRANSACTION ON FAIL**)

Sources and full credit to:

- http://malisper.me/postgres-transactions-arent-fully-isolated/
- http://malisper.me/postgres-row-level-locks/
- http://malisper.me/postgres-transaction-isolation-levels/

--------------------------------------------------------------------------------