Optimize link repository ID migration (!4234) · Merge requests · GitLab.org / gitaly

Stan Hu requested to merge sh-optimize-repository-id-migration into master Jan 07, 2022

The previous migration was slow at times because the update would cause PostgreSQL to do a merge-join and then filter out rows matching repository_id IS NULL. As more rows migrated gained a repository_id, this would increase the query time significantly for each batch.

For example, as seen in the auto-explain output in #3973 (comment 804697280):

                            ->  Merge Join  (cost=0.96..71846.80 rows=303634 width=119) (actual time=44.613..44.651 rows=14 loops=1)
                                  Merge Cond: ((storage_repositories_1.virtual_storage = repositories.virtual_storage) AND (storage_repositories_1.relative_path = repositories.relative_path))
                                  ->  Index Scan using storage_repositories_pkey on storage_repositories storage_repositories_1  (cost=0.55..52379.32 rows=370291 width=111) (actual time=32.448..32.471 rows=14 loops=1)
                                        Filter: (repository_id IS NULL)
                                        Rows Removed by Filter: 30305

The batching was added to deal with limiting the payload size of a trigger update.

We can make this migration go faster by disabling the triggers in the transactions, rollback to 2bbec66c, and re-enable the trigger.

Relates to #3973 (closed)

Before

$ time _build/bin/praefect -config /tmp/praefect.config.toml sql-migrate
praefect sql-migrate: OK (applied 11 migrations)

real    93m15.507s
user    0m3.503s
sys     0m0.283s

After

$ time _build/bin/praefect -config /tmp/praefect.config.toml sql-migrate
praefect sql-migrate: OK (applied 11 migrations)

real    0m25.522s
user    0m0.357s
sys     0m0.058s

Edited Jan 10, 2022 by Stan Hu

Optimize link repository ID migration

Before

After

Merge request reports