├── CHANGELOG.md
├── LICENSE
├── Backfilling.md
├── Anatomy.md
└── README.md

/CHANGELOG.md:
--------------------------------------------------------------------------------

# 2025-09-16

- Added a note about materializing a new column's default value for existing rows

# 2025-09-14

- Add Anatomy.md guide
- Add Backfilling.md guide

# 2024-08-19

- Correction to "Adding an Index" section about what locks are obtained.
  Thanks @seanpascoe for the report!

# 2023-10-10

- Update the instructions for setting a column's non-null constraint.
  Avoid using `modify/3` since that will include adjusting the type
  unnecessarily, which will cause Postgres to rewrite the table.
  Thanks @dhedlund for the report!

# 2023-05-10

- Add note about setting `migration_lock: false` on CheckRepoStatus when
  the Repo is using advisory locks for migrations. Thanks @cgrothaus for the
  report!

# 2022-10-06

- Ecto 3.9.0 supports advisory locks for migrations, which is a great way to
  manage migrations when you need to create indexes concurrently. Add a note about
  using that migration locking strategy.

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------

MIT License

Copyright (c) 2021 David Bernheisel

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

---

Copyright (c) 2013 Bob Remeika and David Waller, 2015-2021 Andrew Kane

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/Backfilling.md:
--------------------------------------------------------------------------------

# Backfilling Data

When I say "backfilling data", I mean any attempt to change data in bulk. This can happen in code through migrations, application code, UIs that allow multiple selections and updates, or in a console connected to a running application. Since bulk changes affect a lot of data, it's always a good idea to have the code reviewed before it runs. You also want to check that it runs efficiently and does not overwhelm the database. Ideally, the code is written so it is safe to re-run. For these reasons, please don't change data in bulk through a console!

We're going to focus on bulk changes executed through Ecto migrations, but the same principles apply to any case where bulk changes are being made. A typical scenario where you might need a data migration is when you need to fill in data for records that already exist (hence, backfilling data). This usually comes up when table structures are changed in the database.

Some examples of backfilling:

- Populating data into a new column
- Changing a column to make it required, which may require setting a value on existing rows
- Splitting one database table into several
- Fixing bad data

For simplicity, we are using `Ecto.Migrator` to run our data migrations, but it's important not to let these migrations break developers' environments over time (more on this below). If using migrations to change data is a normal process that happens regularly, then you may consider exploring a migration system outside of `Ecto.Migrator` that is observable, hooks into error reporting and metrics, and allows for dry runs. This guide is intended as a starting point, and since Ecto ships with a great migration runner, we'll leverage it to also run the data migrations.

There are both bad and good ways to write these data migrations. Let's explore some:

### Bad ❌

In the following example, a migration references the schema `MyApp.MySchema`.

```elixir
defmodule MyApp.Repo.DataMigrations.BackfillPosts do
  use Ecto.Migration
  import Ecto.Query

  def change do
    alter table("posts") do
      add :new_data, :text
    end

    flush()

    MyApp.MySchema
    |> where(new_data: nil)
    |> MyApp.Repo.update_all(set: [new_data: "some data"])
  end
end
```

The problem is that code and schemas change over time, while a migration uses a snapshot of your schemas at the time it was written. In the future, many of its assumptions may no longer be true. For example, the `new_data` column may no longer be present in the schema, causing the query to fail if this migration is run months later.

Additionally, in your development environment, you might have 10 records to migrate; in staging, you might have 100; in production, you might have 1 billion. Scaling your approach matters.

Ultimately, there are several bad practices here:
1. The Ecto schema in the query may change after the migration was written.
1. If you try to backfill the data all at once, it may exhaust the database memory and/or CPU if it's changing a large data set.
1. Backfilling data inside a transaction for the migration locks row updates for the duration of the migration, even if you are updating in batches.
1. Disabling the transaction for the migration and only batching updates may still spike the database CPU to 100%, causing other concurrent reads or writes to time out.

### Good ✅

There are four keys to backfilling safely:

1. running outside a transaction
1. batching
1. throttling
1. resiliency

As we've learned in this guide, it's straightforward to disable the migration transactions. Add these options to the migration:

```elixir
@disable_ddl_transaction true
@disable_migration_lock true
```

Batching our data migrations still has several challenges.

We'll start with how to paginate efficiently: `LIMIT`/`OFFSET` by itself is expensive for large tables (it starts fast, but slows to a crawl in the later pages of the table), so we must find another way to paginate. Since we cannot use a database transaction, this also implies we cannot leverage cursors, since they require a transaction. This leaves us with [keyset pagination](https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/).

For querying and updating the data, there are two ways to "snapshot" your schema at the time of the migration. We'll use both options below in the examples:

1. Execute raw SQL that represents the table at that moment. Do not use Ecto schemas. Prefer this approach when you can: your application's Ecto schemas will change over time, but your migration should not, so an application schema is not a true snapshot of the data at that time.
1. Write a small Ecto schema module inside the migration that only includes what you need. Then use that in your data migration. This is helpful if you prefer the Ecto API, and it decouples the migration from your application's Ecto schemas as they evolve separately.

For throttling, we can simply add a `Process.sleep(@throttle)` for each page.

For resiliency, we need to ensure that we handle errors without losing our progress. You don't want to migrate the same data twice! Most data migrations I have run found some records in a state I wasn't expecting, which caused the data migration to fail. When that happens, I have to write a little more code to handle that scenario and re-run the migration. Every time the data migration is re-run, it should pick up where it left off without revisiting already-migrated records.

Finally, to manage these data migrations separately, we need to:

1. Store data migrations separately from your schema migrations.
1. Run the data migrations manually.

To achieve this, take inspiration from [Ecto's documentation on creating a Release module](https://hexdocs.pm/ecto_sql/Ecto.Migrator.html#module-example-running-migrations-in-a-release), and extend your release module to accept options for `Ecto.Migrator` that specify the versions to migrate and the data migrations' file path, for example:

```elixir
defmodule MyApp.Release do
  # ...
  @doc """
  Migrate data in the database.
  Defaults to migrating to the latest, `[all: true]`.
  Also accepts `[step: 1]` or `[to: 20200118045751]`.
  """
  def migrate_data(opts \\ [all: true]) do
    for repo <- repos() do
      path = Ecto.Migrator.migrations_path(repo, "data_migrations")
      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, path, :up, opts))
    end
  end
end
```

## Batching Deterministic Data

If the data can be queried with a condition that is removed by the update, then you can repeatedly query and update the data until the query result is empty. For example, if a column is currently null and will be updated to not be null, then you can query for the null records and pick up where you left off.

Here's how we can manage the backfill:

1. Disable migration transactions.
1. Use keyset pagination: order the data, find rows greater than the last mutated row, and limit by batch size.
1. For each page, mutate the records.
1. Check for failed updates and handle them appropriately.
1. Use the last mutated record's ID as the starting point for the next page. This helps with resiliency and prevents looping on the same record over and over again.
1. Arbitrarily sleep to throttle and prevent exhausting the database.
1. Rinse and repeat until there are no more records.

For example:

```bash
mix ecto.gen.migration --migrations-path=priv/repo/data_migrations backfill_posts
```

And modify the migration:

```elixir
defmodule MyApp.Repo.DataMigrations.BackfillPosts do
  use Ecto.Migration
  import Ecto.Query

  @disable_ddl_transaction true
  @disable_migration_lock true
  @batch_size 1000
  @throttle_ms 100

  def up do
    throttle_change_in_batches(&page_query/1, &do_change/1)
  end

  def down, do: :ok

  def do_change(batch_of_ids) do
    {_updated, results} = repo().update_all(
      from(r in "posts", select: r.id, where: r.id in ^batch_of_ids),
      [set: [approved: true]],
      log: :info
    )
    not_updated = MapSet.difference(MapSet.new(batch_of_ids), MapSet.new(results)) |> MapSet.to_list()
    Enum.each(not_updated, &handle_non_update/1)
    Enum.sort(results)
  end

  def page_query(last_id) do
    # Notice how we do not use Ecto schemas here.
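    # Querying the table by name reads the data as it exists right now,
    # independent of how the application's schemas evolve later.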
    from(
      r in "posts",
      select: r.id,
      where: is_nil(r.approved) and r.id > ^last_id,
      order_by: [asc: r.id],
      limit: @batch_size
    )
  end

  # If you have BigInt or Int IDs, fall back to last_pos = 0.
  # If you have UUID IDs, fall back to last_pos = "00000000-0000-0000-0000-000000000000".
  defp throttle_change_in_batches(query_fun, change_fun, last_pos \\ 0)
  defp throttle_change_in_batches(_query_fun, _change_fun, nil), do: :ok
  defp throttle_change_in_batches(query_fun, change_fun, last_pos) do
    case repo().all(query_fun.(last_pos), [log: :info, timeout: :infinity]) do
      [] ->
        :ok

      ids ->
        results = change_fun.(List.flatten(ids))
        next_page = List.first(results)
        Process.sleep(@throttle_ms)
        throttle_change_in_batches(query_fun, change_fun, next_page)
    end
  end

  defp handle_non_update(id) do
    raise "#{inspect(id)} was not updated"
  end
end
```

## Batching Arbitrary Data

If the data being updated does not itself indicate that it's already been updated, then we need to take a snapshot of the current data and store it temporarily. For example, if all rows should increment a column's value by 10, how would you know if a record was already updated? You could load a list of IDs into the application during the migration, but what if the process crashes? Instead, we're going to keep the data we need in the database.

To do this, it works well if we can pick a specific point in time where all records _after_ that point do not need adjustment. This happens when you realize a bug was creating bad data: once the fix is deployed, all new entries are good and should not be touched while we clean up the bad data. For this example, we'll use `inserted_at` as our marker. Let's say that the bug was fixed in a midnight deploy on 2021-08-22.

Here's how we'll manage the backfill:

1. Create a "temporary" table. In this example, we're creating a real table that we'll drop at the end of the data migration. In Postgres, there are [actual temporary tables](https://www.postgresql.org/docs/12/sql-createtable.html) that are discarded after the session is over; we're not using those because we need resiliency in case the data migration encounters an error. The error would end the session, and therefore the temporary table tracking progress would be lost 🙁. Real tables don't have this problem. Likewise, we don't want to store IDs in application memory during the migration for the same reason.
1. Populate that temporary table with the IDs of records that need updating. This query only requires a read of the current records, so no consequential locks occur while populating, but be aware this could be a lengthy query. Populating this table can occur at creation or afterwards; in this example we'll populate it at table creation.
1. Ensure there's an index on the temporary table so it's fast to delete IDs from it. I use an index instead of a primary key because it's easier to re-run the migration in case there's an error. There isn't a straightforward way to `CREATE IF NOT EXISTS` on a primary key, but you can do that easily with an index.
1. Use keyset pagination to pull batches of IDs from the temporary table. Do this inside a database transaction and lock records for updates.
   Each batch should read and update within milliseconds, so this should have little impact on concurrent reads and writes.
1. For each batch of records, determine the data changes that need to happen. This can be decided per record.
1. [Upsert](https://wiki.postgresql.org/wiki/UPSERT) those changes to the real table. Each insert will include the ID of a record that already exists and the attributes to change for that record. Since these insertions will conflict with existing records, we'll instruct Postgres to replace certain fields on conflict.
1. Delete those IDs from the temporary table since they're updated on the real table. Close the database transaction for that batch.
1. Throttle so we don't overwhelm the database, and give other concurrent processes a chance to work.
1. Rinse and repeat until the temporary table is empty.
1. Finally, drop the temporary table when empty.

Let's see how this can work:

```bash
mix ecto.gen.migration --migrations-path=priv/repo/data_migrations backfill_weather
```

Modify the migration:

```elixir
# Both of these modules are in the same migration file.
# In this example, we'll define a new Ecto schema that is a snapshot
# of the current underlying table and no more.
defmodule MyApp.Repo.DataMigrations.BackfillWeather.MigratingSchema do
  use Ecto.Schema

  # Copy of the schema at the time of migration
  schema "weather" do
    field :temp_lo, :integer
    field :temp_hi, :integer
    field :prcp, :float
    field :city, :string

    timestamps(type: :naive_datetime_usec)
  end
end

defmodule MyApp.Repo.DataMigrations.BackfillWeather do
  use Ecto.Migration
  import Ecto.Query
  alias MyApp.Repo.DataMigrations.BackfillWeather.MigratingSchema

  @disable_ddl_transaction true
  @disable_migration_lock true
  @temp_table_name "records_to_update"
  @batch_size 1000
  @throttle_ms 100

  def up do
    repo().query!("""
    CREATE TABLE IF NOT EXISTS "#{@temp_table_name}" AS
    SELECT id FROM weather WHERE inserted_at < '2021-08-22T00:00:00'
    """, [], log: :info, timeout: :infinity)
    flush()

    create_if_not_exists index(@temp_table_name, [:id])
    flush()

    throttle_change_in_batches(&page_query/1, &do_change/1)

    # You may want to check if it's empty before dropping it.
    # Since we're raising an exception on non-updates,
    # we don't have to do that in this example.
    drop table(@temp_table_name)
  end

  def down, do: :ok

  def do_change(batch_of_ids) do
    # Wrap in a transaction to momentarily lock records during read/update
    repo().transaction(fn ->
      mutations =
        from(
          r in MigratingSchema,
          where: r.id in ^batch_of_ids,
          lock: "FOR UPDATE"
        )
        |> repo().all()
        |> Enum.reduce([], &mutation/2)

      # Don't be fooled by the name `insert_all`; this is actually an upsert
      # that will update existing records when conflicting; they should all
      # conflict since the ID is included in the update.
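      # `returning: [:id]` asks Postgres for the IDs that were actually
      # written, which we use below to verify every mutation landed.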
      {_updated, results} = repo().insert_all(
        MigratingSchema,
        mutations,
        returning: [:id],
        # Alternatively, {:replace_all_except, [:id, :inserted_at]}
        on_conflict: {:replace, [:temp_lo, :updated_at]},
        conflict_target: [:id],
        placeholders: %{now: NaiveDateTime.utc_now()},
        log: :info
      )
      results = results |> Enum.map(& &1.id) |> Enum.sort()

      not_updated =
        mutations
        |> Enum.map(& &1[:id])
        |> MapSet.new()
        |> MapSet.difference(MapSet.new(results))
        |> MapSet.to_list()

      Enum.each(not_updated, &handle_non_update/1)

      # Every ID in this batch is handled at this point: either upserted and
      # verified above, or it needed no change. Delete them all so the
      # temporary table empties out as we go.
      repo().delete_all(from(r in @temp_table_name, where: r.id in ^batch_of_ids))

      results
    end)
  end

  def mutation(record, mutations_acc) do
    # This logic can be whatever you need; we'll just do something simple
    # here to illustrate.
    if record.temp_hi > 1 do
      # No update needed
      mutations_acc
    else
      # Upserts don't update autogenerated fields like timestamps, so be sure
      # to update them yourself. The inserted_at value should never be used
      # since all these records are already inserted, and we won't replace
      # this field on conflicts; we just need it to satisfy table constraints.
      [%{
        id: record.id,
        temp_lo: record.temp_hi - 10,
        inserted_at: {:placeholder, :now},
        updated_at: {:placeholder, :now}
      } | mutations_acc]
    end
  end

  def page_query(last_id) do
    from(
      r in @temp_table_name,
      select: r.id,
      where: r.id > ^last_id,
      order_by: [asc: r.id],
      limit: @batch_size
    )
  end

  defp handle_non_update(id) do
    raise "#{inspect(id)} was not updated"
  end

  # If you have BigInt IDs, fall back to last_pos = 0.
  # If you have UUID IDs, fall back to last_pos = "00000000-0000-0000-0000-000000000000".
  # If you have Int IDs, you should consider updating them to BigInt or UUID :)
  defp throttle_change_in_batches(query_fun, change_fun, last_pos \\ 0)
  defp throttle_change_in_batches(_query_fun, _change_fun, nil), do: :ok
  defp throttle_change_in_batches(query_fun, change_fun, last_pos) do
    case repo().all(query_fun.(last_pos), [log: :info, timeout: :infinity]) do
      [] ->
        :ok

      ids ->
        case change_fun.(List.flatten(ids)) do
          {:ok, _results} ->
            # Advance the cursor past the whole batch. Some rows may have
            # needed no mutation, so the upsert results alone could stall
            # the loop; the page query's last ID always makes progress.
            next_page = ids |> List.flatten() |> List.last()
            Process.sleep(@throttle_ms)
            throttle_change_in_batches(query_fun, change_fun, next_page)

          {:error, reason} ->
            raise "Batch starting after #{inspect(last_pos)} failed: #{inspect(reason)}"
        end
    end
  end
end
```

---

This guide was originally published on [Fly.io Phoenix Files](https://fly.io/phoenix-files/backfilling-data/).

--------------------------------------------------------------------------------
/Anatomy.md:
--------------------------------------------------------------------------------

[Ecto migrations](https://hexdocs.pm/ecto_sql/Ecto.Migration.html) are used to do the following:

* Change the structure of the database, such as adding fields, tables, or indexes to improve lookups (DDL).
* Populate fields with modified or new data (DML).

To create and run safe Ecto migrations on our database, it is helpful to understand what is actually happening with the database. To do that, we'll dig deeper into how Ecto migrations work by looking both at the code being executed and the Postgres logs.

To generate a migration, we'll run a mix task:

```bash
mix ecto.gen.migration create_test_table
```

> [!TIP]
> If you're using Phoenix, you might consider `mix phx.gen.schema`, which will generate a migration and also allows you to pass in fields and types. See `mix help phx.gen.schema` for more information.

This command will generate a file in `priv/repo/migrations` given the repo name of `Repo`. If you named it `OtherRepo`, the file would be in `priv/other_repo/migrations`.

Let's look at that file:

```elixir
defmodule MyApp.Repo.Migrations.CreateTestTable do
  use Ecto.Migration

  def change do

  end
end
```

Let's make some changes; how about we create a table that tracks a city's climate?

```elixir
defmodule MyApp.Repo.Migrations.CreateTestTable do
  use Ecto.Migration

  def change do
    create table("test") do
      add :city, :string, size: 40
      add :temp_lo, :integer
      add :temp_hi, :integer
      add :prcp, :float

      timestamps()
    end
  end
end
```

Now that we have a migration, let's run it! Run `mix ecto.migrate`:

```shell
$ mix ecto.migrate
21:26:18.992 [info] == Running 20210702012346 MyApp.Repo.Migrations.CreateTestTable.change/0 forward
21:26:18.994 [info] create table test
21:26:19.004 [info] == Migrated 20210702012346 in 0.0s
```

## Inspect SQL

Understanding exactly which SQL commands are running is helpful for ensuring safe migrations, so how do we see the SQL that is executed? By default, Ecto does not log the raw SQL. First, I'll roll back, and then re-migrate with an additional flag `--log-migrations-sql` so we can see what actually runs.

```shell
$ mix ecto.rollback
21:29:32.287 [info] == Running 20210702012346 MyApp.Repo.Migrations.CreateTestTable.change/0 backward
21:29:32.289 [info] drop table test
21:29:32.292 [info] == Migrated 20210702012346 in 0.0s
```

```shell
$ mix ecto.migrate --log-migrations-sql
21:29:36.461 [info] == Running 20210702012346 MyApp.Repo.Migrations.CreateTestTable.change/0 forward
21:29:36.462 [info] create table test
21:29:36.466 [debug] QUERY OK db=3.2ms
CREATE TABLE "test" ("id" bigserial, "city" varchar(40), "temp_lo" integer, "temp_hi" integer, "prcp" float, "inserted_at" timestamp(0) NOT NULL, "updated_at" timestamp(0) NOT NULL, PRIMARY KEY ("id")) []
21:29:36.467 [info] == Migrated 20210702012346 in 0.0s
```

Ecto logged the SQL for our changes, but we're still not seeing all the SQL that Ecto runs for the migration; we're missing the `Ecto.Migrator` SQL that manages the migration. To get these missing logs, we'll also use another flag, `--log-migrator-sql`.
Here I am tailing the PostgreSQL server logs:

```shell
$ mix ecto.rollback
$ mix ecto.migrate --log-migrator-sql --log-migrations-sql
```

```
LOG: statement: BEGIN
LOG: execute : LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE
LOG: execute ecto_3: SELECT s0."version"::bigint FROM "schema_migrations" AS s0
LOG: statement: BEGIN
LOG: execute : CREATE TABLE "test" ("id" bigserial, "city" varchar(40), "temp_lo" integer, "temp_hi" integer, "prcp" float, "inserted_at" timestamp(0) NOT NULL, "updated_at" timestamp(0) NOT NULL, PRIMARY KEY ("id"))
LOG: execute ecto_insert_schema_migrations: INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2)
DETAIL: parameters: $1 = '20210718204657', $2 = '2021-07-18 20:53:49'
LOG: statement: COMMIT
LOG: statement: COMMIT
```

Let's trace the code. When running migrations, Ecto executes these functions:

1. [`Ecto.Migrator.run/4`](https://github.com/elixir-ecto/ecto_sql/blob/557335f9a2a1e6950c1d761063e84aa5d03cb312/lib/ecto/migrator.ex#L384)
2. [`Ecto.Migrator.lock_for_migrations/4`](https://github.com/elixir-ecto/ecto_sql/blob/557335f9a2a1e6950c1d761063e84aa5d03cb312/lib/ecto/migrator.ex#L464)
3. [The adapter's `lock_for_migrations` implementation](https://github.com/elixir-ecto/ecto_sql/blob/557335f9a2a1e6950c1d761063e84aa5d03cb312/lib/ecto/adapters/postgres.ex#L207)
4. [Wrapped in another transaction](https://github.com/elixir-ecto/ecto_sql/blob/557335f9a2a1e6950c1d761063e84aa5d03cb312/lib/ecto/adapters/postgres.ex#L217)

Inside the transaction, the Ecto Postgres adapter obtains a `SHARE UPDATE EXCLUSIVE` lock on the "schema_migrations" table.

**Why this lock is important**: systems at scale may have multiple instances of the application connected to the same database, and during a deployment all of the instances rolling out may try to migrate the database at the same time. Ecto leverages this `SHARE UPDATE EXCLUSIVE` lock to ensure that only one instance runs a migration at a time, and that each migration runs only once.

This is what the migration actually looks like:

```sql
BEGIN;
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
BEGIN;
CREATE TABLE "test" ("id" bigserial, "city" varchar(40), "temp_lo" integer, "temp_hi" integer, "prcp" float, "inserted_at" timestamp(0) NOT NULL, "updated_at" timestamp(0) NOT NULL, PRIMARY KEY ("id"));
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ('20210718204657','2021-07-18 20:53:49');
COMMIT;
COMMIT;
```

When a migration fails, the transaction is rolled back and no changes are kept in the database. In most situations, these are great defaults.

Veteran database administrators may notice the database transactions (`BEGIN`/`COMMIT`) and wonder how to turn those off in situations where transactions could cause problems, such as when adding indexes concurrently. Ecto provides some options that can help with transactions and locks. Let's explore some of those options next.

## Migration Options

A typical migration has this structure (reminder: this guide is using Postgres; other adapters will vary):

```sql
BEGIN;
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
BEGIN;
-- after_begin callback
-- my_changes
-- before_commit callback
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
COMMIT;
```

_`my_changes` refers to the changes you specify in each of your migrations._

### `@disable_migration_lock`

By default, Ecto acquires a lock on the "schema_migrations" table during the migration transaction:

```sql
BEGIN;
-- ↓ THIS LOCK ↓
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
BEGIN;
-- after_begin callback
-- my_changes
-- before_commit callback
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
COMMIT;
```

You want this lock for most migrations, because running multiple migrations simultaneously without it could have unpredictable results. In database transactions, any locks obtained inside the transaction are released when the transaction is committed, allowing other transactions that touch the same records to proceed.

However, there are some scenarios where you don't want this lock. We'll explore these scenarios later on (for example, backfilling data and creating indexes).

You can skip this lock in Ecto by setting the module attribute `@disable_migration_lock true` in your migration. When the migration lock is disabled, the migration looks like this:

```sql
BEGIN;
-- after_begin callback
-- my_changes
-- before_commit callback
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
```

### `@disable_ddl_transaction`

By default, Ecto wraps your changes in a transaction:

```sql
BEGIN;
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
-- ↓ THIS TRANSACTION ↓
BEGIN;
-- after_begin callback
-- my_changes
-- before_commit callback
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
-- ↑ THIS TRANSACTION ↑
COMMIT;
```

This ensures that when failures occur during a migration, your database is not left in an incomplete or mangled state.

There are scenarios where you don't want a migration to run inside a transaction, such as data migrations, or commands like `CREATE INDEX CONCURRENTLY` that continue working in the background after you issue them and cannot run inside a transaction.

You can disable this transaction by setting the module attribute `@disable_ddl_transaction true` in your migration. The migration then looks like this:

```sql
BEGIN;
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
-- my_changes
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
```

> [!TIP]
> For Postgres, when disabling transactions, you'll also want to disable the migration lock, since that uses yet another transaction.
> When running these migrations in a multi-node environment, you'll need a process to ensure they are run only once, since there is no protection against multiple nodes running the same migration at the exact same time.

With both the migration lock and the DDL transaction disabled, your migration will be pretty simple:

```sql
-- my_changes
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
```

### Transaction Callbacks

In the examples above, you'll notice there are `after_begin` and `before_commit` hooks if the migration is occurring within a transaction:

```sql
BEGIN;
-- after_begin hook ← THIS HOOK
-- my_changes
-- before_commit hook ← AND THIS HOOK
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ($1,$2);
COMMIT;
```

You can use these hooks by defining `after_begin/0` and `before_commit/0` in your migration. A good use case is setting migration lock timeouts as safeguards (see the Safeguards section below).

```elixir
defmodule MyApp.Repo.Migrations.CreateTestTable do
  use Ecto.Migration

  def change do
    # ... my potentially long-locking migration
  end

  def after_begin do
    execute "SET LOCAL lock_timeout TO '5s'", "SET LOCAL lock_timeout TO '10s'"
  end
end
```

> [!CAUTION]
> Be aware that these callbacks are not called when `@disable_ddl_transaction true` is configured, because they rely on the transaction being present.

## Inspecting Locks In a Query

Before we dive into safer migration practices, we should cover how to check whether a migration could potentially block your application. In Postgres, there is a `pg_locks` view that we can query to reveal the locks in the system. Let's run our migration's changes in a transaction and end it with a query on `pg_locks`, so we can see which locks were obtained during the changes.

```sql
BEGIN;
-- Put your actions in here. For example, validating a constraint
ALTER TABLE addresses VALIDATE CONSTRAINT "my_table_locking_constraint";

-- End your transaction with a SELECT on pg_locks so you can see the locks
-- that occurred during the transaction
SELECT locktype, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, pid, granted FROM pg_locks;
COMMIT;
```

The result of this SQL command should return the locks obtained during the database transaction.
Let's see an example: we'll add a unique index without concurrency so we can see the locks it obtains:

```sql
BEGIN;
LOCK TABLE "schema_migrations" IN SHARE UPDATE EXCLUSIVE MODE;
-- we are going to squash the embedded transaction here for simplicity
CREATE UNIQUE INDEX IF NOT EXISTS "weather_city_index" ON "weather" ("city");
INSERT INTO "schema_migrations" ("version","inserted_at") VALUES ('20210718210952',NOW());
SELECT locktype, relation::regclass, mode, transactionid AS tid, virtualtransaction AS vtid, pid, granted FROM pg_locks;
COMMIT;

-- locktype      | relation           | mode                     | tid    | vtid  | pid | granted
-- --------------+--------------------+--------------------------+--------+-------+-----+---------
-- relation      | pg_locks           | AccessShareLock          |        | 2/321 | 253 | t
-- relation      | schema_migrations  | RowExclusiveLock         |        | 2/321 | 253 | t
-- virtualxid    |                    | ExclusiveLock            |        | 2/321 | 253 | t
-- relation      | weather_city_index | AccessExclusiveLock      |        | 2/321 | 253 | t
-- relation      | schema_migrations  | ShareUpdateExclusiveLock |        | 2/321 | 253 | t
-- transactionid |                    | ExclusiveLock            | 283863 | 2/321 | 253 | t
-- relation      | weather            | ShareLock                |        | 2/321 | 253 | t
-- (7 rows)
```

Let's go through each of these:

1. `relation | pg_locks | AccessShareLock` - This is us querying the `pg_locks` view in the transaction so we can see which locks are taken. It's the weakest lock, conflicting only with `AccessExclusive`, which should never happen on the internal `pg_locks` view itself.
1. `relation | schema_migrations | RowExclusiveLock` - This is because we're inserting a row into the `"schema_migrations"` table. Reads are still allowed, but mutations on this table are blocked until the transaction is done.
1. `virtualxid | _ | ExclusiveLock` - Querying `pg_locks` created a virtual transaction on the `SELECT` query. We can ignore this.
1. `relation | weather_city_index | AccessExclusiveLock` - We're creating the index, so this new index will be completely locked to any reads and writes until this transaction is complete.
1. `relation | schema_migrations | ShareUpdateExclusiveLock` - This lock is acquired by Ecto to ensure that only one mutable operation is happening on the table. This is what allows multiple nodes to attempt migrations at the same time safely. Other processes can still read the `"schema_migrations"` table, but they cannot write to it.
1. `transactionid | _ | ExclusiveLock` - This lock is on the transaction that is happening; in this case, it holds an `ExclusiveLock` on itself, meaning that if another concurrent transaction conflicts with this one, the other transaction will acquire a lock on this transaction so it knows when it's done. I call this "lockception". 🙂🤯
1. `relation | weather | ShareLock` - Finally, the reason why we're here. Remember, we're creating a unique index on the `"weather"` table without concurrency. This lock is our red flag 🚩. Notice it acquires a `ShareLock` on the table, which means it blocks writes! That's not good if we deploy this and have processes or web requests that regularly write to this table. `UPDATE`, `DELETE`, and `INSERT` acquire a `RowExclusiveLock`, which conflicts with the `ShareLock`.

To avoid this lock, we change the command to `CREATE INDEX CONCURRENTLY ...`. Using `CONCURRENTLY` prevents us from using database transactions, which is unfortunate because now we cannot easily see the locks the command obtains. We know it is safer, however, because `CREATE INDEX CONCURRENTLY` acquires a `ShareUpdateExclusiveLock`, which does not conflict with `RowExclusiveLock` (see Reference Material).

This scenario is revisited later in [Migration Recipes](README.md).

## Safeguards in the database

It's a good idea to add safeguards so no developer on the team accidentally locks up the database for too long. Even if you know all about databases and locks, you might have a forgetful day, try to add an index non-concurrently, and bring down production. Safeguards are good.

We can add one or more safeguards:

1. Automatically cancel a statement if it waits too long for a lock. There are two ways to do this:
   1. Apply it to migrations. This can be done with a `lock_timeout` inside a transaction.
   2. Apply it to any statement. This can be done by setting a `lock_timeout` on a Postgres role.
2. Automatically cancel statements that take too long. This is broader than #1 because it includes any statement, not just lock acquisition.

Let's dive into these safeguards.

### Add a `lock_timeout`

One safeguard we can add to migrations is a lock timeout. A lock timeout ensures a migration does not wait on a lock for more than *n* seconds. This way, when an unsafe migration sneaks in, it blocks the table's updates and writes (and possibly reads) for at most *n* seconds before being canceled, instead of blocking them until the migration finishes.

From the Postgres docs:

> Abort any statement that waits longer than the specified amount of time while attempting to acquire a lock on a table, index, row, or other database object. The time limit applies separately to each lock acquisition attempt. The limit applies both to explicit locking requests (such as LOCK TABLE, or SELECT FOR UPDATE without NOWAIT) and to implicitly-acquired locks. If this value is specified without units, it is taken as milliseconds. A value of zero (the default) disables the timeout.
>
> Unlike `statement_timeout`, this timeout can only occur while waiting for locks. Note that if `statement_timeout` is nonzero, it is rather pointless to set lock_timeout to the same or larger value, since the statement timeout would always trigger first. If `log_min_error_statement` is set to `ERROR` or lower, the statement that timed out will be logged.

There are two ways to apply this timeout:

1. localized to the transaction
2. default for the user/role

Let's go through those options:

#### Transaction lock_timeout

In SQL:

```sql
SET LOCAL lock_timeout TO '5s';
```

Let's move that into an [Ecto migration transaction callback](https://hexdocs.pm/ecto_sql/Ecto.Migration.html#module-transaction-callbacks). Since this `lock_timeout` will be inside a database transaction for Postgres, we use `SET LOCAL lock_timeout` so that the setting only alters this database transaction and not the session.

You can set a lock timeout in every migration:

```elixir
def after_begin do
  # execute/2 can be run in both migration directions, up/down.
  # The first argument will be run when migrating up.
  # The second argument will be run when migrating down. You might give yourself
  # a couple of extra seconds when rolling back.
  execute("SET LOCAL lock_timeout TO '5s'", "SET LOCAL lock_timeout TO '10s'")
end
```

But this can get tedious, since you'll likely want it in every migration. Let's write a little macro to help with this boilerplate code.

In every migration, you'll notice that we `use Ecto.Migration`, which inserts some code into the migration. Let's use this same idea to inject boilerplate of our own, with an option to set a lock timeout. We define the [`after_begin/0` callback](https://hexdocs.pm/ecto_sql/Ecto.Migration.html#c:after_begin/0) to set the lock timeout.

```elixir
defmodule MyApp.Migration do
  defmacro __using__(opts) do
    lock_timeout = Keyword.get(opts, :lock_timeout, [up: "5s", down: "10s"])

    quote do
      use Ecto.Migration

      if unquote(lock_timeout) do
        def after_begin do
          execute(
            "SET LOCAL lock_timeout TO '#{Keyword.fetch!(unquote(lock_timeout), :up)}'",
            "SET LOCAL lock_timeout TO '#{Keyword.fetch!(unquote(lock_timeout), :down)}'"
          )
        end
      end
    end
  end
end
```

And adjust our migration:

```diff
defmodule MyApp.Repo.Migrations.CreateTestTable do
-  use Ecto.Migration
+  use MyApp.Migration

  def change do
    # my changes
  end
end
```

Now migrations will only be allowed to wait up to 5 seconds to acquire locks when migrating up, and 10 seconds when rolling back. Remember, these callbacks are not called when `@disable_ddl_transaction true` is set.

You can override the lock timeout if needed by passing in options:

```elixir
# disable the lock_timeout
use MyApp.Migration, lock_timeout: false

# or change the timeouts
use MyApp.Migration, lock_timeout: [up: "10s", down: "20s"]
```

Let's now make Ecto use our custom migration module by default when [generating new migrations](https://hexdocs.pm/ecto_sql/Mix.Tasks.Ecto.Gen.Migration.html#module-configuration):

```elixir
# config/config.exs
config :ecto_sql, migration_module: MyApp.Migration
```

#### Role-level lock_timeout

Alternatively, you can set a lock timeout for the user in all commands:

```sql
ALTER ROLE myuser SET lock_timeout = '10s';
```

If you have a different user that runs migrations, this could be a good option for that migration-specific Postgres user. The trade-off is that Elixir developers won't see this timeout as they write migrations, since database role settings live in the database, which developers don't usually inspect.

#### Statement Timeout

Another way to ensure safety is to configure your Postgres database with statement timeouts. These timeouts apply to all statements, including migrations and the locks they obtain.

From the Postgres docs:

> Abort any statement that takes more than the specified amount of time. If `log_min_error_statement` is set to `ERROR` or lower, the statement that timed out will also be logged. If this value is specified without units, it is taken as milliseconds. A value of zero (the default) disables the timeout.
>
> The timeout is measured from the time a command arrives at the server until it is completed by the server. If multiple SQL statements appear in a single simple-Query message, the timeout is applied to each statement separately. (PostgreSQL versions before 13 usually treated the timeout as applying to the whole query string.) In extended query protocol, the timeout starts running when any query-related message (Parse, Bind, Execute, Describe) arrives, and it is canceled by completion of an Execute or Sync message.

You can specify this configuration for the Postgres user. For example:

```sql
ALTER ROLE myuser SET statement_timeout = '10m';
```

Now any statement automatically times out if it runs for more than 10 minutes, as opposed to running indefinitely. This can help if you accidentally run a query that runs the database CPU hot and slows everything else down: the unoptimized query is now limited to 10 minutes, after which it fails and is canceled.

Setting this `statement_timeout` requires discipline from the team; if there are runaway queries that fail at (for example) 10 minutes, an exception will likely occur somewhere. You will want to equip your application with sufficient logging, tracing, and reporting so you can replicate the query and the parameters it took to hit the timeout, and ultimately optimize the query. Without this discipline, you risk creating a culture that ignores exceptions.

---

This guide was originally published on [Fly.io Phoenix Files](https://fly.io/phoenix-files/anatomy-of-an-ecto-migration/).

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# Safe Ecto Migrations

A non-exhaustive guide on common migration recipes and how to avoid trouble.

- [Adding an index](#adding-an-index)
- [Adding a reference or foreign key](#adding-a-reference-or-foreign-key)
- [Adding a column with a default value](#adding-a-column-with-a-default-value)
- [Changing a column's default value](#changing-a-columns-default-value)
- [Changing the type of a column](#changing-the-type-of-a-column)
- [Removing a column](#removing-a-column)
- [Renaming a column](#renaming-a-column)
- [Renaming a table](#renaming-a-table)
- [Adding a check constraint](#adding-a-check-constraint)
- [Setting NOT NULL on an existing column](#setting-not-null-on-an-existing-column)
- [Adding a JSON column](#adding-a-json-column)
- [Squashing migrations](#squashing-migrations)

Read more about safe migration techniques:

- [Migration locks in "Anatomy of a Migration"](./Anatomy.md)
- [How to backfill data and change data in bulk (aka: DML)](./Backfilling.md)

---

## Adding an index

Creating an index will [block writes](https://www.postgresql.org/docs/8.2/sql-createindex.html) to the table in Postgres.

MySQL is concurrent by default since [5.6](https://downloads.mysql.com/docs/mysql-5.6-relnotes-en.pdf), unless using `SPATIAL` or `FULLTEXT` indexes, in which case it [blocks reads and writes](https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html#online-ddl-index-syntax-notes).

**BAD ❌**

```elixir
def change do
  create index("posts", [:slug])

  # This obtains a ShareLock on "posts" which will block writes to the table
end
```

**GOOD ✅**

With Postgres, create the index concurrently instead, which does not block writes.
There are two options:

**Option 1**

[Configure the Repo to use advisory locks](https://hexdocs.pm/ecto_sql/Ecto.Adapters.Postgres.html#module-migration-options) for locking migrations while they run. Advisory locks are application-controlled database-level locks, and EctoSQL since v3.9.0 provides an option to use this type of lock. This is the safest option, as it avoids the trade-off in Option 2.

Disable the DDL transaction in the migration to avoid a database transaction, which is not compatible with `CONCURRENTLY` database operations.

```elixir
# in config/config.exs
config :my_app, MyApp.Repo, migration_lock: :pg_advisory_lock

# in the migration
@disable_ddl_transaction true

def change do
  create index("posts", [:slug], concurrently: true)
end
```

If you're using Phoenix and PhoenixEcto, you will likely appreciate disabling
the migration lock in the CheckRepoStatus plug during dev to avoid hitting and
waiting on the advisory lock with concurrent web processes. You can do this by
adding `migration_lock: false` to the CheckRepoStatus plug in your
`MyAppWeb.Endpoint`.

**Option 2**

Disable the DDL transaction and the migration lock for the migration. By default, EctoSQL with Postgres will run migrations with a DDL transaction and a migration lock which also (by default) uses another transaction. You must disable both of these database transactions to use `CONCURRENTLY`. However, disabling the migration lock will allow competing nodes to try to run the same migration at the same time (e.g., in a multi-node Kubernetes environment that runs migrations before startup). Therefore, some nodes may fail startup for a variety of reasons.

```elixir
@disable_ddl_transaction true
@disable_migration_lock true

def change do
  create index("posts", [:slug], concurrently: true)
end
```

For either option, the migration may still take a while to run, but reads and updates to rows will continue to work. For example, for 100,000,000 rows it took 165 seconds to run the migration, but SELECTs and UPDATEs could occur while it was running.

**Do not include other changes in the same migration**; only create the index concurrently, and put other changes in later migrations.

---

## Adding a reference or foreign key

Adding a foreign key blocks writes on both tables.

**BAD ❌**

```elixir
def change do
  alter table("posts") do
    add :group_id, references("groups")
    # Obtains a ShareRowExclusiveLock which blocks writes on both tables
  end
end
```

**GOOD ✅**

In the first migration:

```elixir
def change do
  alter table("posts") do
    add :group_id, references("groups", validate: false)
    # Obtains a ShareRowExclusiveLock which blocks writes on both tables.
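    # With validate: false there is no full-table validation scan,
    # so the lock is held only briefly.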
  end
end
```

In the second migration:

```elixir
def change do
  execute "ALTER TABLE posts VALIDATE CONSTRAINT posts_group_id_fkey", ""
  # Obtains a ShareUpdateExclusiveLock which doesn't block reads or writes
end
```

These migrations can be in the same deployment, but make sure they are separate migrations.

**Note on empty tables**: when the table receiving the reference column is empty, you may be able to
create the column and validate at the same time, since the time difference would be milliseconds,
which may not be noticeable, no matter whether you have 1 million or 100 million records in the referenced table.

**Note on populated tables**: the biggest difference depends on your scale. For 1 million records in
both tables, you may lock writes to both tables for milliseconds when creating the column
(you should benchmark for yourself), which could be acceptable for you. However, once your table has
100+ million records, the difference becomes seconds, which is more likely to be felt and cause timeouts.
The differentiating metric is the time that both tables are locked from writes. Therefore, err on the side
of safety and separate constraint validation from referenced column creation when there is any data in the table.

---

## Adding a column with a default value

Adding a column with a default value to an existing table may cause the table to be rewritten. During this time, reads and writes are blocked in Postgres, and writes are blocked in MySQL and MariaDB. If the column's default is an expression (a volatile value), it remains unsafe even on newer versions.

**BAD ❌**

Note: this becomes safe for non-volatile (static) defaults in:

- [Postgres 11+](https://www.postgresql.org/docs/release/11.0/). The default applies to INSERT since 7.x, and to UPDATE since 9.3.
- MySQL 8.0.12+
- MariaDB 10.3.2+

```elixir
def change do
  alter table("comments") do
    add :approved, :boolean, default: false
    # This took 10 minutes for 100 million rows with no fkeys.

    # Obtained an AccessExclusiveLock on the table, which blocks reads and
    # writes.
  end
end
```

```elixir
def change do
  alter table("comments") do
    add :some_timestamp, :utc_datetime, default: fragment("now()")
    # A volatile value
  end
end
```

**GOOD ✅**

Add the column first, then alter it to include the default.

First migration:

```elixir
def change do
  alter table("comments") do
    add :approved, :boolean
    # This took 0.27 milliseconds for 100 million rows with no fkeys.
  end
end
```

Second migration:

```elixir
def change do
  execute "ALTER TABLE comments ALTER COLUMN approved SET DEFAULT false",
          "ALTER TABLE comments ALTER COLUMN approved DROP DEFAULT"
  # This took 0.28 milliseconds for 100 million rows with no fkeys.
end
```

Note: we cannot use [`modify/3`](https://hexdocs.pm/ecto_sql/Ecto.Migration.html#modify/3), since it would also update the column type
unnecessarily, causing Postgres to rewrite the table. For more information,
[see this example](https://github.com/fly-apps/safe-ecto-migrations/issues/10).

Schema change to read the new column:

```diff
schema "comments" do
+  field :approved, :boolean, default: false
end
```

> [!NOTE]
> The safe method will not materialize the default value on the column for existing rows, because the default was not set when the column was added (this is what avoids the table lock a rewrite would require). Existing rows keep `NULL` in that column in the database; Ecto applies the default on the application side when reading the record, and the next `UPDATE` that writes the field can materialize it. If you want to materialize the value for all rows, you will need to consider [backfilling](./Backfilling.md).

---

## Changing a column's default value

Changing an existing column's default may risk rewriting the table.

**BAD ❌**

```elixir
def change do
  alter table("comments") do
    # Previously, the default was `true`
    modify :approved, :boolean, default: false
    # This took 10 minutes for 100 million rows with no fkeys.

    # Obtained an AccessExclusiveLock on the table, which blocks reads and
    # writes.
  end
end
```

The issue is that we cannot use [`modify/3`](https://hexdocs.pm/ecto_sql/Ecto.Migration.html#modify/3), since it would also update the column type
unnecessarily, causing Postgres to rewrite the table. For more information,
[see this example](https://github.com/fly-apps/safe-ecto-migrations/issues/10).

**GOOD ✅**

Execute raw SQL instead to alter the default:

```elixir
def change do
  execute "ALTER TABLE comments ALTER COLUMN approved SET DEFAULT false",
          "ALTER TABLE comments ALTER COLUMN approved SET DEFAULT true"
  # This took 0.28 milliseconds for 100 million rows with no fkeys.
end
```

> [!NOTE]
> This will not update the values of rows previously set by the old default. That value was materialized at the time of insert/update, so there is no way to distinguish whether it was set by the column `DEFAULT` or explicitly by the original operation.
>
> If you want to update the default on already-written rows, you must distinguish them somehow and modify them with a [backfill](./Backfilling.md).

---

## Changing the type of a column

Changing the type of a column may cause the table to be rewritten. During this time, reads and writes are blocked in Postgres, and writes are blocked in MySQL and MariaDB.

**BAD ❌**

Safe in Postgres:

- increasing the length limit on varchar, or removing the limit
- changing varchar to text
- changing text to varchar with no length limit
- Postgres 9.2+ - increasing the precision (NOTE: not the scale) of a decimal or numeric column. E.g., increasing (8,2) to (10,2) is safe. Increasing (8,2) to (8,4) is not safe.
- Postgres 9.2+ - changing decimal or numeric to be unconstrained
- Postgres 12+ - changing timestamp to timestamptz when the session TZ is UTC

Safe in MySQL/MariaDB:

- increasing the length of a varchar from < 255 up to 255
- increasing the length of a varchar from > 255 up to max

```elixir
def change do
  alter table("posts") do
    modify :my_column, :boolean, from: :text
  end
end
```

**GOOD ✅**

Take a phased approach:

1. Create a new column.
---

## Removing a column

If Ecto is still configured to read a column in any running instances of the application, then queries will fail when loading data into your structs. This can happen in multi-node deployments or if you start the application before running migrations.

**BAD ❌**

```elixir
# Without a code change to the Ecto schema

def change do
  alter table("posts") do
    remove :no_longer_needed_column
  end
end
```

**GOOD ✅**

Safety can be assured if the application code is first updated to remove references to the column, so it's no longer loaded or queried. Then the column can safely be removed from the table.

1. Deploy the code change that removes references to the field.
1. Deploy the migration that removes the column.

First deployment:

```diff
# First deploy, in the Ecto schema

 defmodule MyApp.Post do
   schema "posts" do
-    field :no_longer_needed_column, :string
   end
 end
```

Second deployment:

```elixir
def change do
  alter table("posts") do
    remove :no_longer_needed_column
  end
end
```

---

## Renaming a column

Ask yourself: "Do I _really_ need to rename a column?" Probably not, but if you must, read on and be aware that it requires time and effort.

If Ecto is still configured to read a column in any running instances of the application, then queries will fail when loading data into your structs. This can happen in multi-node deployments or if you start the application before running migrations.

There is a shortcut: don't rename the database column; instead, rename the schema's field and configure it to point to the database column.

**BAD ❌**

```elixir
# In your schema
schema "posts" do
  field :summary, :string
end


# In your migration
def change do
  rename table("posts"), :title, to: :summary
end
```

In the window between your migration running and your application receiving the new code, queries against the renamed column will fail.

**GOOD ✅**

**Strategy 1**

Rename the field in the schema only, configure it to point to the database column, and keep the database column the same. Ensure all calling code relying on the old field name is also updated to reference the new field name.

```elixir
defmodule MyApp.MySchema do
  use Ecto.Schema

  schema "weather" do
    field :temp_lo, :integer
    field :temp_hi, :integer
    field :precipitation, :float, source: :prcp
    field :city, :string

    timestamps(type: :naive_datetime_usec)
  end
end
```

```diff
# Update references in other parts of the codebase:
  my_schema = Repo.get(MySchema, "my_id")
- my_schema.prcp
+ my_schema.precipitation
```

**Strategy 2**

Take a phased approach:

1. Create a new column
1. In application code, write to both columns
1. Backfill data from the old column to the new column
1. In application code, move reads from the old column to the new column
1. In application code, remove the old column from Ecto schemas
1. Drop the old column
---

## Renaming a table

Ask yourself: "Do I _really_ need to rename a table?" Probably not, but if you must, read on and be aware that it requires time and effort.

If Ecto is still configured to read a table in any running instances of the application, then queries will fail when loading data into your structs. This can happen in multi-node deployments or if you start the application before running migrations.

There is a shortcut: rename the schema only, and do not change the underlying database table name.

**BAD ❌**

```elixir
def change do
  rename table("posts"), to: table("articles")
end
```

**GOOD ✅**

**Strategy 1**

Rename the schema only, update all calling code, and don't rename the table:

```diff
- defmodule MyApp.Weather do
+ defmodule MyApp.Forecast do
    use Ecto.Schema

    schema "weather" do
      field :temp_lo, :integer
      field :temp_hi, :integer
      field :precipitation, :float, source: :prcp
      field :city, :string

      timestamps(type: :naive_datetime_usec)
    end
  end

  # and in calling code:
- weather = MyApp.Repo.get(MyApp.Weather, "my_id")
+ forecast = MyApp.Repo.get(MyApp.Forecast, "my_id")
```

**Strategy 2**

Take a phased approach:

1. Create the new table. This should include creating new constraints (checks and foreign keys) that mimic the behavior of the old table.
1. In application code, write to both tables, continuing to read from the old table.
1. Backfill data from the old table to the new table.
1. In application code, move reads from the old table to the new table.
1. In application code, remove the old table from Ecto schemas.
1. Drop the old table.

---

## Adding a check constraint

Adding a check constraint blocks reads and writes in Postgres, and blocks writes in MySQL/MariaDB, while every row is checked.

**BAD ❌**

```elixir
def change do
  create constraint("products", :price_must_be_positive, check: "price > 0")
  # Creating the constraint with validate: true (the default when unspecified)
  # performs a full table scan and acquires a lock that prevents updates
end
```

**GOOD ✅**

There are two operations occurring:

1. Creating a new constraint for new or updated records
1. Validating the new constraint for existing records

When these happen in the same command, Postgres locks the table while it scans and validates every existing row. To avoid this full table scan under lock, we can separate the operations.

In one migration:

```elixir
def change do
  create constraint("products", :price_must_be_positive, check: "price > 0", validate: false)
  # Setting validate: false will prevent a full table scan, and therefore
  # commits immediately.
end
```

In the next migration:

```elixir
def change do
  execute "ALTER TABLE products VALIDATE CONSTRAINT price_must_be_positive", ""
  # Acquires a SHARE UPDATE EXCLUSIVE lock, which allows updates to continue
end
```

These can be in the same deployment, but ensure they are two separate migrations.
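Once the constraint exists, you can surface violations as changeset errors instead of raised exceptions by declaring it with [`Ecto.Changeset.check_constraint/3`](https://hexdocs.pm/ecto/Ecto.Changeset.html#check_constraint/3). A small sketch, assuming a hypothetical `MyApp.Product` schema over the `products` table from the example above:

```elixir
defmodule MyApp.Product do
  use Ecto.Schema
  import Ecto.Changeset

  schema "products" do
    field :price, :integer
  end

  def changeset(product, attrs) do
    product
    |> cast(attrs, [:price])
    # Converts a violation of the database constraint into a changeset
    # error on :price instead of raising.
    |> check_constraint(:price,
      name: :price_must_be_positive,
      message: "must be greater than 0"
    )
  end
end
```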
---

## Setting NOT NULL on an existing column

Setting NOT NULL on an existing column blocks reads and writes while every row is checked. Just like the "Adding a check constraint" scenario, there are two operations occurring:

1. Creating a new constraint for new or updated records
1. Validating the new constraint for existing records

To avoid the full table scan, we can separate these two operations.

**BAD ❌**

```elixir
def change do
  alter table("products") do
    modify :active, :boolean, null: false
  end
end
```

**GOOD ✅**

Add a check constraint without validating it, backfill data to satisfy the constraint, and then validate it. This is functionally equivalent to a `NOT NULL` constraint on the column.

In the first migration:

```elixir
# Deployment 1
def change do
  create constraint("products", :active_not_null, check: "active IS NOT NULL", validate: false)
end
```

This will enforce the constraint for all new and updated rows, but it will not check existing rows until they are updated.

You'll likely need a data migration at this point to ensure that the constraint is satisfied.
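That data migration can be as simple as the following sketch, run between the two deployments (this assumes `false` is an acceptable fill value for existing rows; for large tables, batch the update as described in [Backfilling](./Backfilling.md)):

```elixir
# Hypothetical data migration between deployments 1 and 2
def up do
  # Satisfy the unvalidated constraint for all existing rows.
  execute "UPDATE products SET active = false WHERE active IS NULL"
end

def down, do: :ok
```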
Then, in the next deployment's migration, we'll enforce the constraint on all rows:

```elixir
# Deployment 2
def change do
  execute "ALTER TABLE products VALIDATE CONSTRAINT active_not_null", ""
end
```

If you're using Postgres 12+, you can add the NOT NULL constraint to the column after validating the check constraint. From the Postgres 12 docs:

> SET NOT NULL may only be applied to a column provided
> none of the records in the table contain a NULL value
> for the column. Ordinarily this is checked during the
> ALTER TABLE by scanning the entire table; however, if
> a valid CHECK constraint is found which proves no NULL
> can exist, then the table scan is skipped.

**However**, we cannot use [`modify/3`](https://hexdocs.pm/ecto_sql/Ecto.Migration.html#modify/3)
because it would also update the column type unnecessarily, causing
Postgres to rewrite the table. For more information, [see this example](https://github.com/fly-apps/safe-ecto-migrations/issues/10).

```elixir
# **Postgres 12+ only**

def change do
  execute "ALTER TABLE products VALIDATE CONSTRAINT active_not_null",
          ""

  execute "ALTER TABLE products ALTER COLUMN active SET NOT NULL",
          "ALTER TABLE products ALTER COLUMN active DROP NOT NULL"

  drop constraint("products", :active_not_null)
end
```

If the validation fails, backfill data first to cover the gaps in your desired data integrity, then revisit validating the constraint.

---

## Adding a JSON column

In Postgres, there is no equality operator for the `json` column type, which can cause errors for existing `SELECT DISTINCT` queries in your application.

**BAD ❌**

```elixir
def change do
  alter table("posts") do
    add :extra_data, :json
  end
end
```

**GOOD ✅**

Use `jsonb` instead. Some say it's like "json" but "better."

```elixir
def change do
  alter table("posts") do
    add :extra_data, :jsonb
  end
end
```
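On the Ecto side, `jsonb` columns are typically declared with the `:map` type, which the Postgres adapter stores as `jsonb`. A minimal sketch using the table and column from the example above:

```elixir
defmodule MyApp.Post do
  use Ecto.Schema

  schema "posts" do
    # Stored as jsonb by the Postgres adapter
    field :extra_data, :map
  end
end
```

Because `jsonb` values support equality, queries such as `SELECT DISTINCT` over `extra_data` keep working, which is exactly what the `json` type breaks.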
---

## Squashing Migrations

If you have a long list of migrations, it can take a while to run
each of those files every time the project is reset or spun up by a new
developer. Thankfully, Ecto comes with mix tasks to `dump` and `load` a database
structure, which represents the state of the database up to a certain point
in time (schema only, not including data).

- [mix ecto.dump][mix_ecto_dump]
- [mix ecto.load][mix_ecto_load]

Schema dumping and loading rely on the external binaries `pg_dump` and
`mysqldump`, used by the Postgres and MyXQL Ecto adapters respectively (not
supported by the MSSQL adapter).

For example:

```
20210101000000 - First Migration
20210201000000 - Second Migration
20210701000000 - Third Migration <-- we are here now. run `mix ecto.dump`
```

We can "squash" the migrations up to the current day, which effectively
fast-forwards migrations to that structure. The Ecto migrator will detect that
the database is already migrated up to the third migration, so it begins there
and migrates forward.

Let's add a new migration:

```
20210101000000 - First Migration
20210201000000 - Second Migration
20210701000000 - Third Migration <-- `structure.sql` represents up to here
20210801000000 - New Migration <-- This is where migrations will begin
```

The new migration will still run, but the first through third migrations will
not need to run, since the structure already represents the changes applied by
those migrations. At this point, you can safely delete the first, second, and
third migration files, or keep them for historical auditing.

Let's make this work:

1. Run `mix ecto.dump`, which will dump the current structure into
   `priv/repo/structure.sql` by default. [Check the mix task for more
   options][mix_ecto_dump].
2. During project setup with an empty database, run `mix ecto.load` to load
   `structure.sql`.
3. Run `mix ecto.migrate` to run any additional migrations created after the
   structure was dumped.

To simplify these actions into one command, we can leverage mix aliases:

```elixir
# mix.exs

defp aliases do
  [
    "ecto.reset": ["ecto.drop", "ecto.setup"],
    "ecto.setup": ["ecto.create", "ecto.load", "ecto.migrate"],
    # ...
  ]
end
```

Now you can run `mix ecto.setup` and it will create the database, load the
structure, and run any remaining migrations. Or run `mix ecto.reset` to drop
the database and run setup again. Of course, you can continue running
`mix ecto.migrate` as you create new migrations.

[mix_ecto_dump]: https://hexdocs.pm/ecto_sql/Mix.Tasks.Ecto.Dump.html
[mix_ecto_load]: https://hexdocs.pm/ecto_sql/Mix.Tasks.Ecto.Load.html

---

# Credits

Created and written by David Bernheisel, with recipes heavily inspired by Andrew Kane and his library [strong_migrations](https://github.com/ankane/strong_migrations).

[PostgreSQL at Scale by James Coleman](https://medium.com/braintree-product-technology/postgresql-at-scale-database-schema-changes-without-downtime-20d3749ed680)

[Strong Migrations by Andrew Kane](https://github.com/ankane/strong_migrations)

[Adding a NOT NULL CONSTRAINT on PG Faster with Minimal Locking by Christophe Escobar](https://medium.com/doctolib/adding-a-not-null-constraint-on-pg-faster-with-minimal-locking-38b2c00c4d1c)

[Postgres Runtime Configuration](https://www.postgresql.org/docs/current/runtime-config-client.html)

[Automatic and Manual Ecto Migrations by Wojtek Mach](https://dashbit.co/blog/automatic-and-manual-ecto-migrations)

Special thanks for sponsorship:

* Fly.io

Special thanks to these reviewers:

* Steve Bussey
* Stephane Robino
* Dennis Beatty
* Wojtek Mach
* Mark Ericksen
* And [all the contributors](https://github.com/fly-apps/safe-ecto-migrations/graphs/contributors)

# Reference Material

[Postgres Lock Conflicts](https://www.postgresql.org/docs/12/explicit-locking.html)

| Requested Lock ↓ / Current Lock → | ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE |
|------------------------|---|---|---|---|---|---|---|---|
| ACCESS SHARE           |   |   |   |   |   |   |   | X |
| ROW SHARE              |   |   |   |   |   |   | X | X |
| ROW EXCLUSIVE          |   |   |   |   | X | X | X | X |
| SHARE UPDATE EXCLUSIVE |   |   |   | X | X | X | X | X |
| SHARE                  |   |   | X | X |   | X | X | X |
| SHARE ROW EXCLUSIVE    |   |   | X | X | X | X | X | X |
| EXCLUSIVE              |   | X | X | X | X | X | X | X |
| ACCESS EXCLUSIVE       | X | X | X | X | X | X | X | X |

- `SELECT` acquires an `ACCESS SHARE` lock
- `SELECT FOR UPDATE` acquires a `ROW SHARE` lock
- `UPDATE`, `DELETE`, and `INSERT` acquire a `ROW EXCLUSIVE` lock
- `CREATE INDEX CONCURRENTLY` and `VALIDATE CONSTRAINT` acquire a `SHARE UPDATE EXCLUSIVE` lock
- `CREATE INDEX` acquires a `SHARE` lock

Knowing this, let's rethink the table above in terms of operations:

| Blocks Operation ↓ / Current Operation → | `SELECT` | `SELECT FOR UPDATE` | `UPDATE` `DELETE` `INSERT` | `CREATE INDEX CONCURRENTLY` `VALIDATE CONSTRAINT` | `CREATE INDEX` | SHARE ROW EXCLUSIVE | EXCLUSIVE | `ALTER TABLE` `DROP TABLE` `TRUNCATE` `REINDEX` `CLUSTER` `VACUUM FULL` |
|------------------------|---|---|---|---|---|---|---|---|
| `SELECT`               |   |   |   |   |   |   |   | X |
| `SELECT FOR UPDATE`    |   |   |   |   |   |   | X | X |
| `UPDATE` `DELETE` `INSERT` |   |   |   |   | X | X | X | X |
| `CREATE INDEX CONCURRENTLY` `VALIDATE CONSTRAINT` |   |   |   | X | X | X | X | X |
| `CREATE INDEX`         |   |   | X | X |   | X | X | X |
| SHARE ROW EXCLUSIVE    |   |   | X | X | X | X | X | X |
| EXCLUSIVE              |   | X | X | X | X | X | X | X |
| `ALTER TABLE` `DROP TABLE` `TRUNCATE` `REINDEX` `CLUSTER` `VACUUM FULL` | X | X | X | X | X | X | X | X |
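To see these conflicts live while rehearsing a migration against a production-sized copy of your database, you can inspect `pg_locks` from a second session (Postgres only). A small sketch; the repo module and table name come from the earlier examples:

```elixir
# Run in a second IEx session while the migration is executing.
{:ok, result} =
  MyApp.Repo.query("""
  SELECT l.mode, l.granted, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
  WHERE l.relation = 'comments'::regclass
  """)

# Each row shows a lock mode, whether it was granted or is waiting,
# and the query holding or requesting it.
IO.inspect(result.rows, label: "locks on comments")
```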