Optimize link repository ID migration
The previous migration was slow at times because the update would cause
PostgreSQL to do a merge-join and then filter out rows matching
repository_id IS NULL
. As more rows migrated gained a repository_id
,
this would increase the query time significantly for each batch.
For example, as seen in the auto-explain output in #3973 (comment 804697280):
-> Merge Join (cost=0.96..71846.80 rows=303634 width=119) (actual time=44.613..44.651 rows=14 loops=1)
Merge Cond: ((storage_repositories_1.virtual_storage = repositories.virtual_storage) AND (storage_repositories_1.relative_path = repositories.relative_path))
-> Index Scan using storage_repositories_pkey on storage_repositories storage_repositories_1 (cost=0.55..52379.32 rows=370291 width=111) (actual time=32.448..32.471 rows=14 loops=1)
Filter: (repository_id IS NULL)
Rows Removed by Filter: 30305
The batching was added to deal with limiting the payload size of a trigger update.
We can make this migration go faster by disabling the triggers in the transactions, rollback to 2bbec66c, and re-enable the trigger.
Relates to #3973 (closed)
Before
$ time _build/bin/praefect -config /tmp/praefect.config.toml sql-migrate
praefect sql-migrate: OK (applied 11 migrations)
real 93m15.507s
user 0m3.503s
sys 0m0.283s
After
$ time _build/bin/praefect -config /tmp/praefect.config.toml sql-migrate
praefect sql-migrate: OK (applied 11 migrations)
real 0m25.522s
user 0m0.357s
sys 0m0.058s
Edited by Stan Hu