├── CHANGELOG.md
├── LICENSE
├── Backfilling.md
├── Anatomy.md
└── README.md

/CHANGELOG.md:
--------------------------------------------------------------------------------

# 2025-09-16

- Added a note about materializing a new column's default value for existing rows

# 2025-09-14

- Add Anatomy.md guide
- Add Backfilling.md guide

# 2024-08-19

- Correction to "Adding an Index" section about what locks are obtained.
  Thanks @seanpascoe for the report!

# 2023-10-10

- Update the instructions for setting a column's non-null constraint.
  Avoid using `modify/3` since that will include adjusting the type
  unnecessarily, which will cause Postgres to rewrite the table.
  Thanks @dhedlund for the report!

# 2023-05-10

- Add note about setting `migration_lock: false` on CheckRepoStatus when
  the Repo is using advisory locks for migrations. Thanks @cgrothaus for the
  report!

# 2022-10-06

- Ecto 3.9.0 supports advisory locks for migrations, which is a great way to
  manage migrations when you need to create indexes concurrently. Add a note about
  using that migration locking strategy.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------

MIT License

Copyright (c) 2021 David Bernheisel

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

---

Copyright (c) 2013 Bob Remeika and David Waller, 2015-2021 Andrew Kane

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/Backfilling.md:
--------------------------------------------------------------------------------

# Backfilling Data

When I say "backfilling data", I mean any attempt to change data in bulk. This can happen in code through migrations, application code, UIs that allow multiple selections and updates, or in a console connected to a running application. Since bulk changes affect a lot of data, it's always a good idea to have the code reviewed before it runs. You also want to check that it runs efficiently and does not overwhelm the database. Ideally, the code is written so it is safe to re-run. For these reasons, please don't change data in bulk through a console!

We're going to focus on bulk changes executed through Ecto migrations, but the same principles apply to any case where bulk changes are being made. A typical scenario where you might need a data migration is when you need to fill in data for records that already exist (hence, backfilling data). This usually comes up when table structures are changed in the database.

Some examples of backfilling:

- Populating data into a new column
- Changing a column to make it required, which may require setting a value on existing rows
- Splitting one database table into several
- Fixing bad data

For simplicity, we are using `Ecto.Migrator` to run our data migrations, but it's important not to let these migrations break developers' environments over time (more on this below). If using migrations to change data is a normal process that happens regularly, then you may consider exploring a migration system outside of `Ecto.Migrator` that is observable, hooks into error reporting and metrics, and allows for dry runs. This guide is intended as a starting point, and since Ecto ships with a great migration runner, we'll leverage it to also run the data migrations.

There are both bad and good ways to write these data migrations. Let's explore some:

### Bad ❌

In the following example, a migration references the schema `MyApp.MySchema`.

```elixir
defmodule MyApp.Repo.DataMigrations.BackfillPosts do
  use Ecto.Migration
  import Ecto.Query

  def change do
    alter table("posts") do
      add :new_data, :text
    end

    flush()

    MyApp.MySchema
    |> where(new_data: nil)
    |> MyApp.Repo.update_all(set: [new_data: "some data"])
  end
end
```

The problem is that code and schemas change over time, while a migration uses a snapshot of your schemas at the time it was written. In the future, many of its assumptions may no longer be true. For example, the `new_data` column may no longer be present in the schema, causing the query to fail if this migration is run months later.

Additionally, in your development environment, you might have 10 records to migrate; in staging, you might have 100; in production, you might have 1 billion. Scaling your approach matters.

Ultimately, there are several bad practices here:
1. The Ecto schema in the query may change after the migration was written.
1. If you try to backfill the data all at once, it may exhaust the database memory and/or CPU if it's changing a large data set.
1. Backfilling data inside a transaction for the migration locks row updates for the duration of the migration, even if you are updating in batches.
1. Disabling the transaction for the migration and only batching updates may still spike the database CPU to 100%, causing other concurrent reads or writes to time out.

### Good ✅

There are four keys to backfilling safely:

1. running outside a transaction
1. batching
1. throttling
1. resiliency

As we've learned in this guide, it's straightforward to disable the migration transactions. Add these options to the migration:

```elixir
@disable_ddl_transaction true
@disable_migration_lock true
```

Batching our data migrations still has several challenges.

We'll start with how to paginate efficiently: `LIMIT`/`OFFSET` by itself is expensive for large tables (it starts fast, but slows to a crawl in the later pages of the table), so we must find another way to paginate. Since we cannot use a database transaction, this also implies we cannot leverage cursors, since they require a transaction. This leaves us with [keyset pagination](https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/).

For querying and updating the data, there are two ways to "snapshot" your schema at the time of the migration. We'll use both options below in the examples:

1. Execute raw SQL that represents the table at that moment. Do not use Ecto schemas. Prefer this approach when you can: your application's Ecto schemas will change over time, but your migration should not, so an application schema is not a true snapshot of the data at that time.
1. Write a small Ecto schema module inside the migration that only includes what you need. Then use that in your data migration. This is helpful if you prefer the Ecto API, and it decouples the migration from your application's Ecto schemas as they evolve separately.

For throttling, we can simply add a `Process.sleep(@throttle)` for each page.

For resiliency, we need to ensure that we handle errors without losing our progress. You don't want to migrate the same data twice! Most data migrations I have run found some records in a state I wasn't expecting, which caused the data migration to fail. When that happens, I have to write a little more code to handle that scenario and re-run the migration. Every time the data migration is re-run, it should pick up where it left off without revisiting already-migrated records.

Finally, to manage these data migrations separately, we need to:

1. Store data migrations separately from your schema migrations.
1. Run the data migrations manually.

To achieve this, take inspiration from [Ecto's documentation on creating a Release module](https://hexdocs.pm/ecto_sql/Ecto.Migrator.html#module-example-running-migrations-in-a-release), and extend your release module to accept options for `Ecto.Migrator` that specify the versions to migrate and the data migrations' file path, for example:

```elixir
defmodule MyApp.Release do
  # ...
  @doc """
  Migrate data in the database.
  Defaults to migrating to the latest, `[all: true]`.
  Also accepts `[step: 1]` or `[to: 20200118045751]`.
  """
  def migrate_data(opts \\ [all: true]) do
    for repo <- repos() do
      path = Ecto.Migrator.migrations_path(repo, "data_migrations")
      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, path, :up, opts))
    end
  end
end
```

## Batching Deterministic Data

If the data can be queried with a condition that is removed by the update, then you can repeatedly query and update the data until the query result is empty. For example, if a column is currently null and will be updated to not be null, then you can query for the null records and pick up where you left off.

Here's how we can manage the backfill:

1. Disable migration transactions.
1. Use keyset pagination: order the data, find rows greater than the last mutated row, and limit by batch size.
1. For each page, mutate the records.
1. Check for failed updates and handle them appropriately.
1. Use the last mutated record's ID as the starting point for the next page. This helps with resiliency and prevents looping on the same record over and over again.
1. Arbitrarily sleep to throttle and prevent exhausting the database.
1. Rinse and repeat until there are no more records.

For example:

```bash
mix ecto.gen.migration --migrations-path=priv/repo/data_migrations backfill_posts
```

And modify the migration:

```elixir
defmodule MyApp.Repo.DataMigrations.BackfillPosts do
  use Ecto.Migration
  import Ecto.Query

  @disable_ddl_transaction true
  @disable_migration_lock true
  @batch_size 1000
  @throttle_ms 100

  def up do
    throttle_change_in_batches(&page_query/1, &do_change/1)
  end

  def down, do: :ok

  def do_change(batch_of_ids) do
    {_updated, results} = repo().update_all(
      from(r in "posts", select: r.id, where: r.id in ^batch_of_ids),
      [set: [approved: true]],
      log: :info
    )
    not_updated = MapSet.difference(MapSet.new(batch_of_ids), MapSet.new(results)) |> MapSet.to_list()
    Enum.each(not_updated, &handle_non_update/1)
    Enum.sort(results)
  end

  def page_query(last_id) do
    # Notice how we do not use Ecto schemas here.
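    # Querying the table by name reads the data as it exists right now,
    # independent of how the application's schemas evolve later.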
    from(
      r in "posts",
      select: r.id,
      where: is_nil(r.approved) and r.id > ^last_id,
      order_by: [asc: r.id],
      limit: @batch_size
    )
  end

  # If you have BigInt or Int IDs, fall back to last_pos = 0.
  # If you have UUID IDs, fall back to last_pos = "00000000-0000-0000-0000-000000000000".
  defp throttle_change_in_batches(query_fun, change_fun, last_pos \\ 0)
  defp throttle_change_in_batches(_query_fun, _change_fun, nil), do: :ok
  defp throttle_change_in_batches(query_fun, change_fun, last_pos) do
    case repo().all(query_fun.(last_pos), [log: :info, timeout: :infinity]) do
      [] ->
        :ok

      ids ->
        results = change_fun.(List.flatten(ids))
        next_page = List.first(results)
        Process.sleep(@throttle_ms)
        throttle_change_in_batches(query_fun, change_fun, next_page)
    end
  end

  defp handle_non_update(id) do
    raise "#{inspect(id)} was not updated"
  end
end
```

## Batching Arbitrary Data

If the data being updated does not itself indicate that it's already been updated, then we need to take a snapshot of the current data and store it temporarily. For example, if all rows should increment a column's value by 10, how would you know if a record was already updated? You could load a list of IDs into the application during the migration, but what if the process crashes? Instead, we're going to keep the data we need in the database.

To do this, it works well if we can pick a specific point in time where all records _after_ that point do not need adjustment. This happens when you realize a bug was creating bad data: once the fix is deployed, all new entries are good and should not be touched while we clean up the bad data. For this example, we'll use `inserted_at` as our marker. Let's say that the bug was fixed in a midnight deploy on 2021-08-22.

Here's how we'll manage the backfill:

1. Create a "temporary" table. In this example, we're creating a real table that we'll drop at the end of the data migration. In Postgres, there are [actual temporary tables](https://www.postgresql.org/docs/12/sql-createtable.html) that are discarded after the session is over; we're not using those because we need resiliency in case the data migration encounters an error. The error would end the session, and therefore the temporary table tracking progress would be lost 🙁. Real tables don't have this problem. Likewise, we don't want to store IDs in application memory during the migration for the same reason.
1. Populate that temporary table with the IDs of records that need updating. This query only requires a read of the current records, so no consequential locks occur while populating, but be aware this could be a lengthy query. Populating this table can occur at creation or afterwards; in this example we'll populate it at table creation.
1. Ensure there's an index on the temporary table so it's fast to delete IDs from it. I use an index instead of a primary key because it's easier to re-run the migration in case there's an error. There isn't a straightforward way to `CREATE IF NOT EXISTS` on a primary key, but you can do that easily with an index.
1. Use keyset pagination to pull batches of IDs from the temporary table. Do this inside a database transaction and lock records for updates.
   Each batch should read and update within milliseconds, so this should have little impact on concurrent reads and writes.
1. For each batch of records, determine the data changes that need to happen. This can be decided per record.
1. [Upsert](https://wiki.postgresql.org/wiki/UPSERT) those changes to the real table. Each insert will include the ID of a record that already exists and the attributes to change for that record. Since these insertions will conflict with existing records, we'll instruct Postgres to replace certain fields on conflict.
1. Delete those IDs from the temporary table since they're updated on the real table. Close the database transaction for that batch.
1. Throttle so we don't overwhelm the database, and give other concurrent processes a chance to work.
1. Rinse and repeat until the temporary table is empty.
1. Finally, drop the temporary table when empty.

Let's see how this can work:

```bash
mix ecto.gen.migration --migrations-path=priv/repo/data_migrations backfill_weather
```

Modify the migration:

```elixir
# Both of these modules are in the same migration file.
# In this example, we'll define a new Ecto schema that is a snapshot
# of the current underlying table and no more.
defmodule MyApp.Repo.DataMigrations.BackfillWeather.MigratingSchema do
  use Ecto.Schema

  # Copy of the schema at the time of migration
  schema "weather" do
    field :temp_lo, :integer
    field :temp_hi, :integer
    field :prcp, :float
    field :city, :string

    timestamps(type: :naive_datetime_usec)
  end
end

defmodule MyApp.Repo.DataMigrations.BackfillWeather do
  use Ecto.Migration
  import Ecto.Query
  alias MyApp.Repo.DataMigrations.BackfillWeather.MigratingSchema

  @disable_ddl_transaction true
  @disable_migration_lock true
  @temp_table_name "records_to_update"
  @batch_size 1000
  @throttle_ms 100

  def up do
    repo().query!("""
    CREATE TABLE IF NOT EXISTS "#{@temp_table_name}" AS
    SELECT id FROM weather WHERE inserted_at < '2021-08-22T00:00:00'
    """, [], log: :info, timeout: :infinity)
    flush()

    create_if_not_exists index(@temp_table_name, [:id])
    flush()

    throttle_change_in_batches(&page_query/1, &do_change/1)

    # You may want to check if it's empty before dropping it.
    # Since we're raising an exception on non-updates,
    # we don't have to do that in this example.
    drop table(@temp_table_name)
  end

  def down, do: :ok

  def do_change(batch_of_ids) do
    # Wrap in a transaction to momentarily lock records during read/update
    repo().transaction(fn ->
      mutations =
        from(
          r in MigratingSchema,
          where: r.id in ^batch_of_ids,
          lock: "FOR UPDATE"
        )
        |> repo().all()
        |> Enum.reduce([], &mutation/2)

      # Don't be fooled by the name `insert_all`; this is actually an upsert
      # that will update existing records when conflicting; they should all
      # conflict since the ID is included in the update.
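      # `returning: [:id]` asks Postgres for the IDs that were actually
      # written, which we use below to verify every mutation landed.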
      {_updated, results} = repo().insert_all(
        MigratingSchema,
        mutations,
        returning: [:id],
        # Alternatively, {:replace_all_except, [:id, :inserted_at]}
        on_conflict: {:replace, [:temp_lo, :updated_at]},
        conflict_target: [:id],
        placeholders: %{now: NaiveDateTime.utc_now()},
        log: :info
      )
      results = results |> Enum.map(& &1.id) |> Enum.sort()

      not_updated =
        mutations
        |> Enum.map(& &1[:id])
        |> MapSet.new()
        |> MapSet.difference(MapSet.new(results))
        |> MapSet.to_list()

      Enum.each(not_updated, &handle_non_update/1)

      # Every ID in this batch is handled at this point: either upserted and
      # verified above, or it needed no change. Delete them all so the
      # temporary table empties out as we go.
      repo().delete_all(from(r in @temp_table_name, where: r.id in ^batch_of_ids))

      results
    end)
  end

  def mutation(record, mutations_acc) do
    # This logic can be whatever you need; we'll just do something simple
    # here to illustrate.
    if record.temp_hi > 1 do
      # No update needed
      mutations_acc
    else
      # Upserts don't update autogenerated fields like timestamps, so be sure
      # to update them yourself. The inserted_at value should never be used
      # since all these records are already inserted, and we won't replace
      # this field on conflicts; we just need it to satisfy table constraints.
      [%{
        id: record.id,
        temp_lo: record.temp_hi - 10,
        inserted_at: {:placeholder, :now},
        updated_at: {:placeholder, :now}
      } | mutations_acc]
    end
  end

  def page_query(last_id) do
    from(
      r in @temp_table_name,
      select: r.id,
      where: r.id > ^last_id,
      order_by: [asc: r.id],
      limit: @batch_size
    )
  end

  defp handle_non_update(id) do
    raise "#{inspect(id)} was not updated"
  end

  # If you have BigInt IDs, fall back to last_pos = 0.
  # If you have UUID IDs, fall back to last_pos = "00000000-0000-0000-0000-000000000000".
  # If you have Int IDs, you should consider updating them to BigInt or UUID :)
  defp throttle_change_in_batches(query_fun, change_fun, last_pos \\ 0)
  defp throttle_change_in_batches(_query_fun, _change_fun, nil), do: :ok
  defp throttle_change_in_batches(query_fun, change_fun, last_pos) do
    case repo().all(query_fun.(last_pos), [log: :info, timeout: :infinity]) do
      [] ->
        :ok

      ids ->
        case change_fun.(List.flatten(ids)) do
          {:ok, _results} ->
            # Advance the cursor past the whole batch. Some rows may have
            # needed no mutation, so the upsert results alone could stall
            # the loop; the page query's last ID always makes progress.
            next_page = ids |> List.flatten() |> List.last()
            Process.sleep(@throttle_ms)
            throttle_change_in_batches(query_fun, change_fun, next_page)

          {:error, reason} ->
            raise "Batch starting after #{inspect(last_pos)} failed: #{inspect(reason)}"
        end
    end
  end
end
```

---

This guide was originally published on [Fly.io Phoenix Files](https://fly.io/phoenix-files/backfilling-data/).

--------------------------------------------------------------------------------
/Anatomy.md:
--------------------------------------------------------------------------------

[Ecto migrations](https://hexdocs.pm/ecto_sql/Ecto.Migration.html) are used to do the following:

* Change the structure of the database, such as adding fields, tables, or indexes to improve lookups (DDL).
* Populate fields with modified or new data (DML).

To create and run safe Ecto migrations on our database, it is helpful to understand what is actually happening with the database. To do that, we'll dig deeper into how Ecto migrations work by looking both at the code being executed and the Postgres logs.

To generate a migration, we'll run a mix task:

```bash
mix ecto.gen.migration create_test_table
```

> [!TIP]
> If you're using Phoenix, you might consider `mix phx.gen.schema`, which will generate a migration and also allows you to pass in fields and types. See `mix help phx.gen.schema` for more information.

This command will generate a file in `priv/repo/migrations` given the repo name of `Repo`. If you named it `OtherRepo`, the file would be in `priv/other_repo/migrations`.

Let's look at that file:

```elixir
defmodule MyApp.Repo.Migrations.CreateTestTable do
  use Ecto.Migration

  def change do

  end
end
```

Let's make some changes; how about we create a table that tracks a city's climate?

```elixir
defmodule MyApp.Repo.Migrations.CreateTestTable do
  use Ecto.Migration

  def change do
    create table("test") do
      add :city, :string, size: 40
      add :temp_lo, :integer
      add :temp_hi, :integer
      add :prcp, :float

      timestamps()
    end
  end
end
```

Now that we have a migration, let's run it! Run `mix ecto.migrate`:

```shell
$ mix ecto.migrate
21:26:18.992 [info] == Running 20210702012346 MyApp.Repo.Migrations.CreateTestTable.change/0 forward
21:26:18.994 [info] create table test
21:26:19.004 [info] == Migrated 20210702012346 in 0.0s
```

## Inspect SQL

Understanding exactly which SQL commands are running is helpful for ensuring safe migrations, so how do we see the SQL that is executed? By default, Ecto does not log the raw SQL. First, I'll roll back, and then re-migrate with an additional flag `--log-migrations-sql` so we can see what actually runs.

```shell
$ mix ecto.rollback
21:29:32.287 [info] == Running 20210702012346 MyApp.Repo.Migrations.CreateTestTable.change/0 backward
21:29:32.289 [info] drop table test
21:29:32.292 [info] == Migrated 20210702012346 in 0.0s
```

```shell
$ mix ecto.migrate --log-migrations-sql
21:29:36.461 [info] == Running 20210702012346 MyApp.Repo.Migrations.CreateTestTable.change/0 forward
21:29:36.462 [info] create table test
21:29:36.466 [debug] QUERY OK db=3.2ms
CREATE TABLE "test" ("id" bigserial, "city" varchar(40), "temp_lo" integer, "temp_hi" integer, "prcp" float, "inserted_at" timestamp(0) NOT NULL, "updated_at" timestamp(0) NOT NULL, PRIMARY KEY ("id")) []
21:29:36.467 [info] == Migrated 20210702012346 in 0.0s
```

Ecto logged the SQL for our changes, but we're still not seeing all the SQL that Ecto runs for the migration; we're missing the `Ecto.Migrator` SQL that manages the migration. To get these missing logs, we'll also use another flag, `--log-migrator-sql`.
Here I am tailing the PostgreSQL server logs:

```shell
$ mix ecto.rollback
$ mix ecto.migrate --log-migrator-sql --log-migrations-sql
```

```
LOG: statement: BEGIN
LOG: execute : LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE
LOG: execute ecto_3: SELECT s0."version"::bigint FROM "schema_migrations" AS s0
LOG: statement: BEGIN
LOG: execute : CREATE TABLE "test" ("id" bigserial, "city" varchar(40), "temp_lo" integer, "temp_hi" integer, "prcp" float, "inserted_at" timestamp(0) NOT NULL, "updated_at" timestamp(0) NOT NULL, PRIMARY KEY ("id"))
LOG: execute ecto_insert_schema_migrations: INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2)
DETAIL: parameters: $1 = '20210718204657', $2 = '2021-07-18 20:53:49'
LOG: statement: COMMIT
LOG: statement: COMMIT
```

Let's trace the code. When running migrations, Ecto executes these functions:

1. [`Ecto.Migrator.run/4`](https://github.com/elixir-ecto/ecto_sql/blob/557335f9a2a1e6950c1d761063e84aa5d03cb312/lib/ecto/migrator.ex#L384)
2. [`Ecto.Migrator.lock_for_migrations/4`](https://github.com/elixir-ecto/ecto_sql/blob/557335f9a2a1e6950c1d761063e84aa5d03cb312/lib/ecto/migrator.ex#L464)
3. [The adapter's `lock_for_migrations` implementation](https://github.com/elixir-ecto/ecto_sql/blob/557335f9a2a1e6950c1d761063e84aa5d03cb312/lib/ecto/adapters/postgres.ex#L207)
4. [Wrapped in another transaction](https://github.com/elixir-ecto/ecto_sql/blob/557335f9a2a1e6950c1d761063e84aa5d03cb312/lib/ecto/adapters/postgres.ex#L217)

Inside the transaction, the Ecto Postgres adapter obtains a `SHARE UPDATE EXCLUSIVE` lock on the "schema_migrations" table.

**Why this lock is important**: systems at scale may have multiple instances of the application connected to the same database, and during a deployment all of the instances rolling out may try to migrate the database at the same time. Ecto leverages this `SHARE UPDATE EXCLUSIVE` lock to ensure that only one instance runs a migration at a time, and that each migration runs only once.

This is what the migration actually looks like:

```sql
BEGIN;
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
BEGIN;
CREATE TABLE "test" ("id" bigserial, "city" varchar(40), "temp_lo" integer, "temp_hi" integer, "prcp" float, "inserted_at" timestamp(0) NOT NULL, "updated_at" timestamp(0) NOT NULL, PRIMARY KEY ("id"));
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ('20210718204657','2021-07-18 20:53:49');
COMMIT;
COMMIT;
```

When a migration fails, the transaction is rolled back and no changes are kept in the database. In most situations, these are great defaults.

Veteran database administrators may notice the database transactions (`BEGIN`/`COMMIT`) and wonder how to turn those off in situations where transactions could cause problems, such as when adding indexes concurrently. Ecto provides some options that can help with transactions and locks. Let's explore some of those options next.

## Migration Options

A typical migration has this structure (reminder: this guide is using Postgres; other adapters will vary):

```sql
BEGIN;
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
BEGIN;
-- after_begin callback
-- my_changes
-- before_commit callback
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
COMMIT;
```

_`my_changes` refers to the changes you specify in each of your migrations._

### `@disable_migration_lock`

By default, Ecto acquires a lock on the "schema_migrations" table during the migration transaction:

```sql
BEGIN;
-- ↓ THIS LOCK ↓
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
BEGIN;
-- after_begin callback
-- my_changes
-- before_commit callback
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
COMMIT;
```

You want this lock for most migrations, because running multiple migrations simultaneously without it could have unpredictable results. In database transactions, any locks obtained inside the transaction are released when the transaction is committed, allowing other transactions that touch the same records to proceed.

However, there are some scenarios where you don't want this lock. We'll explore these scenarios later on (for example, backfilling data and creating indexes).

You can skip this lock in Ecto by setting the module attribute `@disable_migration_lock true` in your migration. When the migration lock is disabled, the migration looks like this:

```sql
BEGIN;
-- after_begin callback
-- my_changes
-- before_commit callback
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
```

### `@disable_ddl_transaction`

By default, Ecto wraps your changes in a transaction:

```sql
BEGIN;
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
-- ↓ THIS TRANSACTION ↓
BEGIN;
-- after_begin callback
-- my_changes
-- before_commit callback
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
-- ↑ THIS TRANSACTION ↑
COMMIT;
```

This ensures that when failures occur during a migration, your database is not left in an incomplete or mangled state.

There are scenarios where you don't want a migration to run inside a transaction, such as data migrations, or commands like `CREATE INDEX CONCURRENTLY` that continue working in the background after you issue them and cannot run inside a transaction.

You can disable this transaction by setting the module attribute `@disable_ddl_transaction true` in your migration. The migration then looks like this:

```sql
BEGIN;
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
-- my_changes
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
```

> [!TIP]
> For Postgres, when disabling transactions, you'll also want to disable the migration lock, since that uses yet another transaction.
> When running these migrations in a multi-node environment, you'll need a process to ensure they are run only once, since there is no protection against multiple nodes running the same migration at the exact same time.

With both the migration lock and the DDL transaction disabled, your migration will be pretty simple:

```sql
-- my_changes
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
```

### Transaction Callbacks

In the examples above, you'll notice there are `after_begin` and `before_commit` hooks if the migration is occurring within a transaction:

```sql
BEGIN;
-- after_begin hook ← THIS HOOK
-- my_changes
-- before_commit hook ← AND THIS HOOK
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
```

You can use these hooks by defining `after_begin/0` and `before_commit/0` in your migration. A good use case is setting migration lock timeouts as safeguards (see the Safeguards section below).

```elixir
defmodule MyApp.Repo.Migrations.CreateTestTable do
  use Ecto.Migration

  def change do
    # ... my potentially long-locking migration
  end

  def after_begin do
    execute "SET LOCAL lock_timeout TO '5s'", "SET LOCAL lock_timeout TO '10s'"
  end
end
```

> [!CAUTION]
> Be aware that these callbacks are not called when `@disable_ddl_transaction true` is configured, because they rely on the transaction being present.

## Inspecting Locks In a Query

Before we dive into safer migration practices, we should cover how to check whether a migration could potentially block your application. In Postgres, there is a `pg_locks` view that we can query to reveal the locks in the system. Let's run our migration's changes in a transaction and end it with a query on `pg_locks`, so we can see which locks were obtained during the changes.

```sql
BEGIN;
-- Put your actions in here. For example, validating a constraint
ALTER TABLE addresses VALIDATE CONSTRAINT "my_table_locking_constraint";

-- End your transaction with a SELECT on pg_locks so you can see the locks
-- that occurred during the transaction
SELECT locktype, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, pid, granted FROM pg_locks;
COMMIT;
```

The result of this SQL command should return the locks obtained during the database transaction.
Let's see an example: we'll add a unique index without concurrency so we can see the locks it obtains:

```sql
BEGIN;
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
-- we are going to squash the embedded transaction here for simplicity
CREATE UNIQUE INDEX IF NOT EXISTS "weather_city_index" ON "weather" ("city");
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ('20210718210952',NOW());
SELECT locktype, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, pid, granted FROM pg_locks;
COMMIT;

-- locktype      | relation           | mode                     | tid    | vtid  | pid | granted
-- --------------+--------------------+--------------------------+--------+-------+-----+---------
-- relation      | pg_locks           | AccessShareLock          |        | 2/321 | 253 | t
-- relation      | schema_migrations  | RowExclusiveLock         |        | 2/321 | 253 | t
-- virtualxid    |                    | ExclusiveLock            |        | 2/321 | 253 | t
-- relation      | weather_city_index | AccessExclusiveLock      |        | 2/321 | 253 | t
-- relation      | schema_migrations  | ShareUpdateExclusiveLock |        | 2/321 | 253 | t
-- transactionid |                    | ExclusiveLock            | 283863 | 2/321 | 253 | t
-- relation      | weather            | ShareLock                |        | 2/321 | 253 | t
-- (7 rows)
```

Let's go through each of these:

1. `relation | pg_locks | AccessShareLock` - This is us querying the `pg_locks` view in the transaction so we can see which locks are taken. It's the weakest lock, conflicting only with `AccessExclusive`, which should never happen on the internal `pg_locks` view itself.
1. `relation | schema_migrations | RowExclusiveLock` - This is because we're inserting a row into the `"schema_migrations"` table. Reads are still allowed, but mutations on this table are blocked until the transaction is done.
1. `virtualxid | _ | ExclusiveLock` - Querying `pg_locks` created a virtual transaction on the `SELECT` query. We can ignore this.
1. `relation | weather_city_index | AccessExclusiveLock` - We're creating the index, so this new index will be completely locked to any reads and writes until this transaction is complete.
1. `relation | schema_migrations | ShareUpdateExclusiveLock` - This lock is acquired by Ecto to ensure that only one mutable operation is happening on the table. This is what allows multiple nodes to attempt migrations at the same time safely. Other processes can still read the `"schema_migrations"` table, but they cannot write to it.
1. `transactionid | _ | ExclusiveLock` - This lock is on the transaction that is happening; in this case, it holds an `ExclusiveLock` on itself, meaning that if another concurrent transaction conflicts with this one, the other transaction will acquire a lock on this transaction so it knows when it's done. I call this "lockception". 🙂🤯
1. `relation | weather | ShareLock` - Finally, the reason why we're here. Remember, we're creating a unique index on the `"weather"` table without concurrency. This lock is our red flag 🚩. Notice it acquires a `ShareLock` on the table, which means it blocks writes! That's not good if we deploy this and have processes or web requests that regularly write to this table. `UPDATE`, `DELETE`, and `INSERT` acquire a `RowExclusiveLock`, which conflicts with the `ShareLock`.

To avoid this lock, we change the command to `CREATE INDEX CONCURRENTLY ...`. Using `CONCURRENTLY` prevents us from using database transactions, which is unfortunate because now we cannot easily see the locks the command obtains. We know it is safer, however, because `CREATE INDEX CONCURRENTLY` acquires a `ShareUpdateExclusiveLock`, which does not conflict with `RowExclusiveLock` (see Reference Material).

This scenario is revisited later in [Migration Recipes](README.md).

## Safeguards in the database

It's a good idea to add safeguards so no developer on the team accidentally locks up the database for too long. Even if you know all about databases and locks, you might have a forgetful day, try to add an index non-concurrently, and bring down production. Safeguards are good.

We can add one or more safeguards:

1. Automatically cancel a statement if it waits too long for a lock. There are two ways to do this:
   1. Apply it to migrations. This can be done with a `lock_timeout` inside a transaction.
   2. Apply it to any statement. This can be done by setting a `lock_timeout` on a Postgres role.
2. Automatically cancel statements that take too long. This is broader than #1 because it includes any statement, not just lock acquisition.

Let's dive into these safeguards.

### Add a `lock_timeout`

One safeguard we can add to migrations is a lock timeout. A lock timeout ensures a migration does not wait on a lock for more than *n* seconds. This way, when an unsafe migration sneaks in, it blocks the table's updates and writes (and possibly reads) for at most *n* seconds before being canceled, instead of blocking them until the migration finishes.

From the Postgres docs:

> Abort any statement that waits longer than the specified amount of time while attempting to acquire a lock on a table, index, row, or other database object. The time limit applies separately to each lock acquisition attempt. The limit applies both to explicit locking requests (such as LOCK TABLE, or SELECT FOR UPDATE without NOWAIT) and to implicitly-acquired locks. If this value is specified without units, it is taken as milliseconds. A value of zero (the default) disables the timeout.
>
> Unlike `statement_timeout`, this timeout can only occur while waiting for locks. Note that if `statement_timeout` is nonzero, it is rather pointless to set lock_timeout to the same or larger value, since the statement timeout would always trigger first. If `log_min_error_statement` is set to `ERROR` or lower, the statement that timed out will be logged.

There are two ways to apply this timeout:

1. localized to the transaction
2. default for the user/role

Let's go through those options:

#### Transaction lock_timeout

In SQL:

```sql
SET LOCAL lock_timeout TO '5s';
```

Let's move that into an [Ecto migration transaction callback](https://hexdocs.pm/ecto_sql/Ecto.Migration.html#module-transaction-callbacks). Since this `lock_timeout` will be inside a database transaction for Postgres, we use `SET LOCAL lock_timeout` so that the setting only alters this database transaction and not the session.

You can set a lock timeout in every migration:

```elixir
def after_begin do
  # execute/2 can be run in both migration directions, up/down.
  # The first argument will be run when migrating up.
  # The second argument will be run when migrating down. You might give yourself
  # a couple of extra seconds when rolling back.
  execute("SET LOCAL lock_timeout TO '5s'", "SET LOCAL lock_timeout TO '10s'")
end
```

But this can get tedious, since you'll likely want it in every migration. Let's write a little macro to help with this boilerplate code.

In every migration, you'll notice that we `use Ecto.Migration`, which inserts some code into the migration. Let's use this same idea to inject boilerplate of our own, with an option to set a lock timeout. We define the [`after_begin/0` callback](https://hexdocs.pm/ecto_sql/Ecto.Migration.html#c:after_begin/0) to set the lock timeout.

```elixir
defmodule MyApp.Migration do
  defmacro __using__(opts) do
    lock_timeout = Keyword.get(opts, :lock_timeout, [up: "5s", down: "10s"])

    quote do
      use Ecto.Migration

      if unquote(lock_timeout) do
        def after_begin do
          execute(
            "SET LOCAL lock_timeout TO '#{Keyword.fetch!(unquote(lock_timeout), :up)}'",
            "SET LOCAL lock_timeout TO '#{Keyword.fetch!(unquote(lock_timeout), :down)}'"
          )
        end
      end
    end
  end
end
```

And adjust our migration:

```diff
defmodule MyApp.Repo.Migrations.CreateTestTable do
-  use Ecto.Migration
+  use MyApp.Migration

  def change do
    # my changes
  end
end
```

Now migrations will only be allowed to wait up to 5 seconds to acquire locks when migrating up, and 10 seconds when rolling back. Remember, these callbacks are not called when `@disable_ddl_transaction true` is set.

You can override the lock timeout if needed by passing in options:

```elixir
# disable the lock_timeout
use MyApp.Migration, lock_timeout: false

# or change the timeouts
use MyApp.Migration, lock_timeout: [up: "10s", down: "20s"]
```

Let's now make Ecto use our custom migration module by default when [generating new migrations](https://hexdocs.pm/ecto_sql/Mix.Tasks.Ecto.Gen.Migration.html#module-configuration):

```elixir
# config/config.exs
config :ecto_sql, migration_module: MyApp.Migration
```

#### Role-level lock_timeout

Alternatively, you can set a lock timeout for the user in all commands:

```sql
ALTER ROLE myuser SET lock_timeout = '10s';
```

If you have a different user that runs migrations, this could be a good option for that migration-specific Postgres user. The trade-off is that Elixir developers won't see this timeout as they write migrations, since database role settings live in the database, which developers don't usually inspect.

#### Statement Timeout

Another way to ensure safety is to configure your Postgres database with statement timeouts. These timeouts apply to all statements, including migrations and the locks they obtain.

From the Postgres docs:

> Abort any statement that takes more than the specified amount of time. If `log_min_error_statement` is set to `ERROR` or lower, the statement that timed out will also be logged. If this value is specified without units, it is taken as milliseconds. A value of zero (the default) disables the timeout.
>
> The timeout is measured from the time a command arrives at the server until it is completed by the server. If multiple SQL statements appear in a single simple-Query message, the timeout is applied to each statement separately. (PostgreSQL versions before 13 usually treated the timeout as applying to the whole query string.) In extended query protocol, the timeout starts running when any query-related message (Parse, Bind, Execute, Describe) arrives, and it is canceled by completion of an Execute or Sync message.

You can specify this configuration for the Postgres user. For example:

```sql
ALTER ROLE myuser SET statement_timeout = '10m';
```

Now any statement automatically times out if it runs for more than 10 minutes, as opposed to running indefinitely. This can help if you accidentally run a query that runs the database CPU hot and slows everything else down: the unoptimized query is now limited to 10 minutes, after which it fails and is canceled.

Setting this `statement_timeout` requires discipline from the team; if there are runaway queries that fail at (for example) 10 minutes, an exception will likely occur somewhere. You will want to equip your application with sufficient logging, tracing, and reporting so you can replicate the query and the parameters it took to hit the timeout, and ultimately optimize the query. Without this discipline, you risk creating a culture that ignores exceptions.

---

This guide was originally published on [Fly.io Phoenix Files](https://fly.io/phoenix-files/anatomy-of-an-ecto-migration/).

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# Safe Ecto Migrations

A non-exhaustive guide on common migration recipes and how to avoid trouble.

- [Adding an index](#adding-an-index)
- [Adding a reference or foreign key](#adding-a-reference-or-foreign-key)
- [Adding a column with a default value](#adding-a-column-with-a-default-value)
- [Changing a column's default value](#changing-a-columns-default-value)
- [Changing the type of a column](#changing-the-type-of-a-column)
- [Removing a column](#removing-a-column)
- [Renaming a column](#renaming-a-column)
- [Renaming a table](#renaming-a-table)
- [Adding a check constraint](#adding-a-check-constraint)
- [Setting NOT NULL on an existing column](#setting-not-null-on-an-existing-column)
- [Adding a JSON column](#adding-a-json-column)
- [Squashing migrations](#squashing-migrations)

Read more about safe migration techniques:

- [Migration locks in "Anatomy of a Migration"](./Anatomy.md)
- [How to backfill data and change data in bulk (aka: DML)](./Backfilling.md)

---

## Adding an index

Creating an index will [block writes](https://www.postgresql.org/docs/8.2/sql-createindex.html) to the table in Postgres.

MySQL is concurrent by default since [5.6](https://downloads.mysql.com/docs/mysql-5.6-relnotes-en.pdf), unless using `SPATIAL` or `FULLTEXT` indexes, in which case it [blocks reads and writes](https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html#online-ddl-index-syntax-notes).

**BAD ❌**

```elixir
def change do
  create index("posts", [:slug])

  # This obtains a ShareLock on "posts" which will block writes to the table
end
```

**GOOD ✅**

With Postgres, create the index concurrently instead, which does not block writes.
There are two options:

**Option 1**

[Configure the Repo to use advisory locks](https://hexdocs.pm/ecto_sql/Ecto.Adapters.Postgres.html#module-migration-options) for locking migrations while they run. Advisory locks are application-controlled database-level locks, and EctoSQL since v3.9.0 provides an option to use this type of lock. This is the safest option, as it avoids the trade-off in Option 2.

Disable the DDL transaction in the migration to avoid a database transaction, which is not compatible with `CONCURRENTLY` database operations.

```elixir
# in config/config.exs
config :my_app, MyApp.Repo, migration_lock: :pg_advisory_lock

# in the migration
@disable_ddl_transaction true

def change do
  create index("posts", [:slug], concurrently: true)
end
```

If you're using Phoenix and PhoenixEcto, you will likely appreciate disabling
the migration lock in the CheckRepoStatus plug during dev to avoid hitting and
waiting on the advisory lock with concurrent web processes. You can do this by
adding `migration_lock: false` to the CheckRepoStatus plug in your
`MyAppWeb.Endpoint`.

**Option 2**

Disable the DDL transaction and the migration lock for the migration. By default, EctoSQL with Postgres will run migrations with a DDL transaction and a migration lock which also (by default) uses another transaction. You must disable both of these database transactions to use `CONCURRENTLY`. However, disabling the migration lock will allow competing nodes to try to run the same migration at the same time (e.g., in a multi-node Kubernetes environment that runs migrations before startup). Therefore, some nodes may fail startup for a variety of reasons.

```elixir
@disable_ddl_transaction true
@disable_migration_lock true

def change do
  create index("posts", [:slug], concurrently: true)
end
```

For either option, the migration may still take a while to run, but reads and updates to rows will continue to work. For example, for 100,000,000 rows it took 165 seconds to run the migration, but SELECTs and UPDATEs could occur while it was running.

**Do not include other changes in the same migration**; only create the index concurrently, and put other changes in later migrations.

---

## Adding a reference or foreign key

Adding a foreign key blocks writes on both tables.

**BAD ❌**

```elixir
def change do
  alter table("posts") do
    add :group_id, references("groups")
    # Obtains a ShareRowExclusiveLock which blocks writes on both tables
  end
end
```

**GOOD ✅**

In the first migration:

```elixir
def change do
  alter table("posts") do
    add :group_id, references("groups", validate: false)
    # Obtains a ShareRowExclusiveLock which blocks writes on both tables.
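    # With validate: false there is no full-table validation scan,
    # so the lock is held only briefly.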
  end
end
```

In the second migration:

```elixir
def change do
  execute "ALTER TABLE posts VALIDATE CONSTRAINT posts_group_id_fkey", ""
  # Obtains a ShareUpdateExclusiveLock which doesn't block reads or writes
end
```

These migrations can be in the same deployment, but make sure they are separate migrations.

**Note on empty tables**: when the table receiving the reference column is empty, you may be able to
create the column and validate at the same time, since the time difference would be milliseconds,
which may not be noticeable, no matter whether you have 1 million or 100 million records in the referenced table.

**Note on populated tables**: the biggest difference depends on your scale. For 1 million records in
both tables, you may lock writes to both tables for milliseconds when creating the column
(you should benchmark for yourself), which could be acceptable for you. However, once your table has
100+ million records, the difference becomes seconds, which is more likely to be felt and cause timeouts.
The differentiating metric is the time that both tables are locked from writes. Therefore, err on the side
of safety and separate constraint validation from referenced column creation when there is any data in the table.

---

## Adding a column with a default value

Adding a column with a default value to an existing table may cause the table to be rewritten. During this time, reads and writes are blocked in Postgres, and writes are blocked in MySQL and MariaDB. If the column's default is an expression (a volatile value), it remains unsafe even on newer versions.

**BAD ❌**

Note: this becomes safe for non-volatile (static) defaults in:

- [Postgres 11+](https://www.postgresql.org/docs/release/11.0/). The default applies to INSERT since 7.x, and to UPDATE since 9.3.
- MySQL 8.0.12+
- MariaDB 10.3.2+

```elixir
def change do
  alter table("comments") do
    add :approved, :boolean, default: false
    # This took 10 minutes for 100 million rows with no fkeys.

    # Obtained an AccessExclusiveLock on the table, which blocks reads and
    # writes.
  end
end
```

```elixir
def change do
  alter table("comments") do
    add :some_timestamp, :utc_datetime, default: fragment("now()")
    # A volatile value
  end
end
```

**GOOD ✅**

Add the column first, then alter it to include the default.

First migration:

```elixir
def change do
  alter table("comments") do
    add :approved, :boolean
    # This took 0.27 milliseconds for 100 million rows with no fkeys.
  end
end
```

Second migration:

```elixir
def change do
  execute "ALTER TABLE comments ALTER COLUMN approved SET DEFAULT false",
          "ALTER TABLE comments ALTER COLUMN approved DROP DEFAULT"
  # This took 0.28 milliseconds for 100 million rows with no fkeys.
end
```

Note: we cannot use [`modify/3`](https://hexdocs.pm/ecto_sql/Ecto.Migration.html#modify/3), since it would also update the column type
unnecessarily, causing Postgres to rewrite the table. For more information,
[see this example](https://github.com/fly-apps/safe-ecto-migrations/issues/10).

Schema change to read the new column:

```diff
schema "comments" do
+  field :approved, :boolean, default: false
end
```

> [!NOTE]
> The safe method will not materialize the default value on the column for existing rows, because the default was not set when the column was added (this is what avoids the table lock a rewrite would require). Existing rows keep `NULL` in that column in the database; Ecto applies the default on the application side when reading the record, and the next `UPDATE` that writes the field can materialize it. If you want to materialize the value for all rows, you will need to consider [backfilling](./Backfilling.md).

---

## Changing a column's default value

Changing an existing column's default may risk rewriting the table.

**BAD ❌**

```elixir
def change do
  alter table("comments") do
    # Previously, the default was `true`
    modify :approved, :boolean, default: false
    # This took 10 minutes for 100 million rows with no fkeys.

    # Obtained an AccessExclusiveLock on the table, which blocks reads and
    # writes.
  end
end
```

The issue is that we cannot use [`modify/3`](https://hexdocs.pm/ecto_sql/Ecto.Migration.html#modify/3), since it would also update the column type
unnecessarily, causing Postgres to rewrite the table. For more information,
[see this example](https://github.com/fly-apps/safe-ecto-migrations/issues/10).

**GOOD ✅**

Execute raw SQL instead to alter the default:

```elixir
def change do
  execute "ALTER TABLE comments ALTER COLUMN approved SET DEFAULT false",
          "ALTER TABLE comments ALTER COLUMN approved SET DEFAULT true"
  # This took 0.28 milliseconds for 100 million rows with no fkeys.
end
```

> [!NOTE]
> This will not update the values of rows previously set by the old default. That value was materialized at the time of insert/update, so there is no way to distinguish whether it was set by the column `DEFAULT` or explicitly by the original operation.
>
> If you want to update the default on already-written rows, you must distinguish them somehow and modify them with a [backfill](./Backfilling.md).

---

## Changing the type of a column

Changing the type of a column may cause the table to be rewritten. During this time, reads and writes are blocked in Postgres, and writes are blocked in MySQL and MariaDB.

**BAD ❌**

Safe in Postgres:

- increasing the length limit on varchar, or removing the limit
- changing varchar to text
- changing text to varchar with no length limit
- Postgres 9.2+ - increasing the precision (NOTE: not the scale) of a decimal or numeric column. E.g., increasing (8,2) to (10,2) is safe. Increasing (8,2) to (8,4) is not safe.
- Postgres 9.2+ - changing decimal or numeric to be unconstrained
- Postgres 12+ - changing timestamp to timestamptz when the session TZ is UTC

Safe in MySQL/MariaDB:

- increasing the length of a varchar from < 255 up to 255
- increasing the length of a varchar from > 255 up to max

```elixir
def change do
  alter table("posts") do
    modify :my_column, :boolean, from: :text
  end
end
```

**GOOD ✅**

Take a phased approach:

1. Create a new column.
---

## Removing a column

If Ecto is still configured to read a column in any running instances of the application, then queries will fail when loading data into your structs. This can happen in multi-node deployments or if you start the application before running migrations.

**BAD ❌**

```elixir
# Without a code change to the Ecto schema

def change do
  alter table("posts") do
    remove :no_longer_needed_column
  end
end
```

**GOOD ✅**

Safety can be assured if the application code is first updated to remove references to the column, so it's no longer loaded or queried. Then the column can safely be removed from the table.

1. Deploy the code change that removes references to the field.
1. Deploy the migration that removes the column.

First deployment:

```diff
# First deploy, in the Ecto schema

 defmodule MyApp.Post do
   schema "posts" do
-    field :no_longer_needed_column, :string
   end
 end
```

Second deployment:

```elixir
def change do
  alter table("posts") do
    remove :no_longer_needed_column
  end
end
```

---

## Renaming a column

Ask yourself: "Do I _really_ need to rename a column?" Probably not, but if you must, read on and be aware that it requires time and effort.

If Ecto is still configured to read a column in any running instances of the application, then queries will fail when loading data into your structs. This can happen in multi-node deployments or if you start the application before running migrations.

There is a shortcut: don't rename the database column; instead, rename the schema's field and configure it to point to the database column.

**BAD ❌**

```elixir
# In your schema
schema "posts" do
  field :summary, :string
end


# In your migration
def change do
  rename table("posts"), :title, to: :summary
end
```

In the window between your migration running and your application receiving the new code, queries against the renamed column will fail.

**GOOD ✅**

**Strategy 1**

Rename the field in the schema only, configure it to point to the database column, and keep the database column the same. Ensure all calling code relying on the old field name is also updated to reference the new field name.

```elixir
defmodule MyApp.MySchema do
  use Ecto.Schema

  schema "weather" do
    field :temp_lo, :integer
    field :temp_hi, :integer
    field :precipitation, :float, source: :prcp
    field :city, :string

    timestamps(type: :naive_datetime_usec)
  end
end
```

```diff
# Update references in other parts of the codebase:
  my_schema = Repo.get(MySchema, "my_id")
- my_schema.prcp
+ my_schema.precipitation
```

**Strategy 2**

Take a phased approach:

1. Create a new column
1. In application code, write to both columns
1. Backfill data from the old column to the new column
1. In application code, move reads from the old column to the new column
1. In application code, remove the old column from Ecto schemas
1. Drop the old column
---

## Renaming a table

Ask yourself: "Do I _really_ need to rename a table?" Probably not, but if you must, read on and be aware that it requires time and effort.

If Ecto is still configured to read a table in any running instances of the application, then queries will fail when loading data into your structs. This can happen in multi-node deployments or if you start the application before running migrations.

There is a shortcut: rename the schema only, and do not change the underlying database table name.

**BAD ❌**

```elixir
def change do
  rename table("posts"), to: table("articles")
end
```

**GOOD ✅**

**Strategy 1**

Rename the schema only, update all calling code, and don't rename the table:

```diff
- defmodule MyApp.Weather do
+ defmodule MyApp.Forecast do
    use Ecto.Schema

    schema "weather" do
      field :temp_lo, :integer
      field :temp_hi, :integer
      field :precipitation, :float, source: :prcp
      field :city, :string

      timestamps(type: :naive_datetime_usec)
    end
  end

  # and in calling code:
- weather = MyApp.Repo.get(MyApp.Weather, "my_id")
+ forecast = MyApp.Repo.get(MyApp.Forecast, "my_id")
```

**Strategy 2**

Take a phased approach:

1. Create the new table. This should include creating new constraints (checks and foreign keys) that mimic the behavior of the old table.
1. In application code, write to both tables, continuing to read from the old table.
1. Backfill data from the old table to the new table.
1. In application code, move reads from the old table to the new table.
1. In application code, remove the old table from Ecto schemas.
1. Drop the old table.

---

## Adding a check constraint

Adding a check constraint blocks reads and writes in Postgres, and blocks writes in MySQL/MariaDB, while every row is checked.

**BAD ❌**

```elixir
def change do
  create constraint("products", :price_must_be_positive, check: "price > 0")
  # Creating the constraint with validate: true (the default when unspecified)
  # performs a full table scan and acquires a lock that prevents updates
end
```

**GOOD ✅**

There are two operations occurring:

1. Creating a new constraint for new or updated records
1. Validating the new constraint for existing records

When these happen in the same command, Postgres locks the table while it scans and validates every existing row. To avoid this full table scan under lock, we can separate the operations.

In one migration:

```elixir
def change do
  create constraint("products", :price_must_be_positive, check: "price > 0", validate: false)
  # Setting validate: false will prevent a full table scan, and therefore
  # commits immediately.
end
```

In the next migration:

```elixir
def change do
  execute "ALTER TABLE products VALIDATE CONSTRAINT price_must_be_positive", ""
  # Acquires a SHARE UPDATE EXCLUSIVE lock, which allows updates to continue
end
```

These can be in the same deployment, but ensure they are two separate migrations.
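Once the constraint exists, you can surface violations as changeset errors instead of raised exceptions by declaring it with [`Ecto.Changeset.check_constraint/3`](https://hexdocs.pm/ecto/Ecto.Changeset.html#check_constraint/3). A small sketch, assuming a hypothetical `MyApp.Product` schema over the `products` table from the example above:

```elixir
defmodule MyApp.Product do
  use Ecto.Schema
  import Ecto.Changeset

  schema "products" do
    field :price, :integer
  end

  def changeset(product, attrs) do
    product
    |> cast(attrs, [:price])
    # Converts a violation of the database constraint into a changeset
    # error on :price instead of raising.
    |> check_constraint(:price,
      name: :price_must_be_positive,
      message: "must be greater than 0"
    )
  end
end
```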
---

## Setting NOT NULL on an existing column

Setting NOT NULL on an existing column blocks reads and writes while every row is checked. Just like the "Adding a check constraint" scenario, there are two operations occurring:

1. Creating a new constraint for new or updated records
1. Validating the new constraint for existing records

To avoid the full table scan, we can separate these two operations.

**BAD ❌**

```elixir
def change do
  alter table("products") do
    modify :active, :boolean, null: false
  end
end
```

**GOOD ✅**

Add a check constraint without validating it, backfill data to satisfy the constraint, and then validate it. This is functionally equivalent to a `NOT NULL` constraint on the column.

In the first migration:

```elixir
# Deployment 1
def change do
  create constraint("products", :active_not_null, check: "active IS NOT NULL", validate: false)
end
```

This will enforce the constraint for all new and updated rows, but it will not check existing rows until they are updated.

You'll likely need a data migration at this point to ensure that the constraint is satisfied.
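That data migration can be as simple as the following sketch, run between the two deployments (this assumes `false` is an acceptable fill value for existing rows; for large tables, batch the update as described in [Backfilling](./Backfilling.md)):

```elixir
# Hypothetical data migration between deployments 1 and 2
def up do
  # Satisfy the unvalidated constraint for all existing rows.
  execute "UPDATE products SET active = false WHERE active IS NULL"
end

def down, do: :ok
```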
Then, in the next deployment's migration, we'll enforce the constraint on all rows:

```elixir
# Deployment 2
def change do
  execute "ALTER TABLE products VALIDATE CONSTRAINT active_not_null", ""
end
```

If you're using Postgres 12+, you can add the NOT NULL constraint to the column after validating the check constraint. From the Postgres 12 docs:

> SET NOT NULL may only be applied to a column provided
> none of the records in the table contain a NULL value
> for the column. Ordinarily this is checked during the
> ALTER TABLE by scanning the entire table; however, if
> a valid CHECK constraint is found which proves no NULL
> can exist, then the table scan is skipped.

**However**, we cannot use [`modify/3`](https://hexdocs.pm/ecto_sql/Ecto.Migration.html#modify/3)
because it would also update the column type unnecessarily, causing
Postgres to rewrite the table. For more information, [see this example](https://github.com/fly-apps/safe-ecto-migrations/issues/10).

```elixir
# **Postgres 12+ only**

def change do
  execute "ALTER TABLE products VALIDATE CONSTRAINT active_not_null",
          ""

  execute "ALTER TABLE products ALTER COLUMN active SET NOT NULL",
          "ALTER TABLE products ALTER COLUMN active DROP NOT NULL"

  drop constraint("products", :active_not_null)
end
```

If the validation fails, backfill data first to cover the gaps in your desired data integrity, then revisit validating the constraint.

---

## Adding a JSON column

In Postgres, there is no equality operator for the `json` column type, which can cause errors for existing `SELECT DISTINCT` queries in your application.

**BAD ❌**

```elixir
def change do
  alter table("posts") do
    add :extra_data, :json
  end
end
```

**GOOD ✅**

Use `jsonb` instead. Some say it's like "json" but "better."

```elixir
def change do
  alter table("posts") do
    add :extra_data, :jsonb
  end
end
```
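On the Ecto side, `jsonb` columns are typically declared with the `:map` type, which the Postgres adapter stores as `jsonb`. A minimal sketch using the table and column from the example above:

```elixir
defmodule MyApp.Post do
  use Ecto.Schema

  schema "posts" do
    # Stored as jsonb by the Postgres adapter
    field :extra_data, :map
  end
end
```

Because `jsonb` values support equality, queries such as `SELECT DISTINCT` over `extra_data` keep working, which is exactly what the `json` type breaks.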
---

## Squashing Migrations

If you have a long list of migrations, it can take a while to run
each of those files every time the project is reset or spun up by a new
developer. Thankfully, Ecto comes with mix tasks to `dump` and `load` a database
structure, which represents the state of the database up to a certain point
in time (schema only, not including data).

- [mix ecto.dump][mix_ecto_dump]
- [mix ecto.load][mix_ecto_load]

Schema dumping and loading rely on the external binaries `pg_dump` and
`mysqldump`, used by the Postgres and MyXQL Ecto adapters respectively (not
supported by the MSSQL adapter).

For example:

```
20210101000000 - First Migration
20210201000000 - Second Migration
20210701000000 - Third Migration <-- we are here now. run `mix ecto.dump`
```

We can "squash" the migrations up to the current day, which effectively
fast-forwards migrations to that structure. The Ecto migrator will detect that
the database is already migrated up to the third migration, so it begins there
and migrates forward.

Let's add a new migration:

```
20210101000000 - First Migration
20210201000000 - Second Migration
20210701000000 - Third Migration <-- `structure.sql` represents up to here
20210801000000 - New Migration <-- This is where migrations will begin
```

The new migration will still run, but the first through third migrations will
not need to run, since the structure already represents the changes applied by
those migrations. At this point, you can safely delete the first, second, and
third migration files, or keep them for historical auditing.

Let's make this work:

1. Run `mix ecto.dump`, which will dump the current structure into
   `priv/repo/structure.sql` by default. [Check the mix task for more
   options][mix_ecto_dump].
2. During project setup with an empty database, run `mix ecto.load` to load
   `structure.sql`.
3. Run `mix ecto.migrate` to run any additional migrations created after the
   structure was dumped.

To simplify these actions into one command, we can leverage mix aliases:

```elixir
# mix.exs

defp aliases do
  [
    "ecto.reset": ["ecto.drop", "ecto.setup"],
    "ecto.setup": ["ecto.create", "ecto.load", "ecto.migrate"],
    # ...
  ]
end
```

Now you can run `mix ecto.setup` and it will create the database, load the
structure, and run any remaining migrations. Or run `mix ecto.reset` to drop
the database and run setup again. Of course, you can continue running
`mix ecto.migrate` as you create new migrations.

[mix_ecto_dump]: https://hexdocs.pm/ecto_sql/Mix.Tasks.Ecto.Dump.html
[mix_ecto_load]: https://hexdocs.pm/ecto_sql/Mix.Tasks.Ecto.Load.html

---

# Credits

Created and written by David Bernheisel, with recipes heavily inspired by Andrew Kane and his library [strong_migrations](https://github.com/ankane/strong_migrations).

[PostgreSQL at Scale by James Coleman](https://medium.com/braintree-product-technology/postgresql-at-scale-database-schema-changes-without-downtime-20d3749ed680)

[Strong Migrations by Andrew Kane](https://github.com/ankane/strong_migrations)

[Adding a NOT NULL CONSTRAINT on PG Faster with Minimal Locking by Christophe Escobar](https://medium.com/doctolib/adding-a-not-null-constraint-on-pg-faster-with-minimal-locking-38b2c00c4d1c)

[Postgres Runtime Configuration](https://www.postgresql.org/docs/current/runtime-config-client.html)

[Automatic and Manual Ecto Migrations by Wojtek Mach](https://dashbit.co/blog/automatic-and-manual-ecto-migrations)

Special thanks for sponsorship:

* Fly.io

Special thanks to these reviewers:

* Steve Bussey
* Stephane Robino
* Dennis Beatty
* Wojtek Mach
* Mark Ericksen
* And [all the contributors](https://github.com/fly-apps/safe-ecto-migrations/graphs/contributors)

# Reference Material

[Postgres Lock Conflicts](https://www.postgresql.org/docs/12/explicit-locking.html)

| Requested Lock ↓ / Current Lock → | ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE |
|------------------------|---|---|---|---|---|---|---|---|
| ACCESS SHARE           |   |   |   |   |   |   |   | X |
| ROW SHARE              |   |   |   |   |   |   | X | X |
| ROW EXCLUSIVE          |   |   |   |   | X | X | X | X |
| SHARE UPDATE EXCLUSIVE |   |   |   | X | X | X | X | X |
| SHARE                  |   |   | X | X |   | X | X | X |
| SHARE ROW EXCLUSIVE    |   |   | X | X | X | X | X | X |
| EXCLUSIVE              |   | X | X | X | X | X | X | X |
| ACCESS EXCLUSIVE       | X | X | X | X | X | X | X | X |

- `SELECT` acquires an `ACCESS SHARE` lock
- `SELECT FOR UPDATE` acquires a `ROW SHARE` lock
- `UPDATE`, `DELETE`, and `INSERT` acquire a `ROW EXCLUSIVE` lock
- `CREATE INDEX CONCURRENTLY` and `VALIDATE CONSTRAINT` acquire a `SHARE UPDATE EXCLUSIVE` lock
- `CREATE INDEX` acquires a `SHARE` lock

Knowing this, let's rethink the table above in terms of operations:

| Blocks Operation ↓ / Current Operation → | `SELECT` | `SELECT FOR UPDATE` | `UPDATE` `DELETE` `INSERT` | `CREATE INDEX CONCURRENTLY` `VALIDATE CONSTRAINT` | `CREATE INDEX` | SHARE ROW EXCLUSIVE | EXCLUSIVE | `ALTER TABLE` `DROP TABLE` `TRUNCATE` `REINDEX` `CLUSTER` `VACUUM FULL` |
|------------------------|---|---|---|---|---|---|---|---|
| `SELECT`               |   |   |   |   |   |   |   | X |
| `SELECT FOR UPDATE`    |   |   |   |   |   |   | X | X |
| `UPDATE` `DELETE` `INSERT` |   |   |   |   | X | X | X | X |
| `CREATE INDEX CONCURRENTLY` `VALIDATE CONSTRAINT` |   |   |   | X | X | X | X | X |
| `CREATE INDEX`         |   |   | X | X |   | X | X | X |
| SHARE ROW EXCLUSIVE    |   |   | X | X | X | X | X | X |
| EXCLUSIVE              |   | X | X | X | X | X | X | X |
| `ALTER TABLE` `DROP TABLE` `TRUNCATE` `REINDEX` `CLUSTER` `VACUUM FULL` | X | X | X | X | X | X | X | X |
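To see these conflicts live while rehearsing a migration against a production-sized copy of your database, you can inspect `pg_locks` from a second session (Postgres only). A small sketch; the repo module and table name come from the earlier examples:

```elixir
# Run in a second IEx session while the migration is executing.
{:ok, result} =
  MyApp.Repo.query("""
  SELECT l.mode, l.granted, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE l.relation = 'comments'::regclass
  """)

# Each row shows a lock mode, whether it was granted or is waiting,
# and the query holding or requesting it.
IO.inspect(result.rows, label: "locks on comments")
```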