Background migrations cause statement timeouts if there is a huge gap in a batch

Description

Background migrations is a great tool to migrate data from an old structure to a new structure, and it works asynchronously, so rake db:migrate is much faster than regular post-migrations.

However, there is a problem with big tables. If there is a huge gap between minimal ID and maximum ID, background migrations will likely fail by statement timeouts.

In fact, recently we had three such cases;

  • Ci::Stage migration => 4 million gap in a batch
  • Legacy artifacts migration => 2 million gap in a batch
  • Legacy job traces migration => We don't have a extact number, but probably 10-20 million gap in a batch

We were dealing with those problems by adding database indexes, and it worked at some points, but selecting from a huge dataset naturally takes time, and thus still there is a chance of timeouts.

Data on gitlab.com is growing rapidly. Probably ci_builds reach 100 million rows by the end of this year.

We need a long-term solution to let us easily and safely migrate data.

Proposal

There are some proposals from the discussions on slack.

  • Splitting by a range => https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/19744)
  • Having background_migrations table to track the background migration status, and repeatedly migrate data with cron worker
  • Chaining sidekiq jobs by passing arguments from job to job (This requires Fixing unreliable sidekiq queue)

Related

  • https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/19744
  • https://gitlab.com/gitlab-com/infrastructure/issues/4377
  • https://gitlab.com/gitlab-org/gitlab-ce/issues/47454
  • https://gitlab.com/gitlab-org/gitlab-ce/issues/46652#note_82544834
  • https://gitlab.com/gitlab-org/gitlab-ce/issues/48096
Edited Jun 26, 2025 by 🤖 GitLab Bot 🤖
Assignee Loading
Time tracking Loading