20190228092516_clean_up_noteable_id_for_notes_on_commits migration failed due to statement timeout
The deploy of RC2 failed in production due to a database statement timeout caused by the migration introduced in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/25639:
PG::QueryCanceled: ERROR: canceling statement due to statement timeout
: UPDATE "notes" SET "noteable_id" = NULL WHERE "notes"."id" >= 125221126 AND "notes"."id" < 125678509 AND "notes"."noteable_type" = 'Commit' AND "notes"."noteable_id" IS NOT NULL
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers.rb:375:in `block in update_column_in_batches'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers.rb:352:in `loop'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/migration_helpers.rb:352:in `update_column_in_batches'
/opt/gitlab/embedded/service/gitlab-rails/db/migrate/20190228092516_clean_up_noteable_id_for_notes_on_commits.rb:22:in `up'
If you do the subtract the number of IDs, you see:
125678509 - 125221126 = 457383
You can see in the logs that many of these UPDATE queries took close to 10-15 seconds in general.
Even though there's an index, it's possible that because the data is so sparsely populated in the table, the database has to do a lot of work to seek all the rows to update.
Note that there are about 126 million rows in the notes table, and although we limit the batch size to 1000 in https://gitlab.com/gitlab-org/gitlab-ce/blob/a592a78072bb44fed1a25c25f2cabdc4cf4bc0bd/lib/gitlab/database/migration_helpers.rb#L340-346, it's possible the range gap in the ID is large.
This seems like a similar problem @shinya saw when migrating CI artifacts in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/18615#note_86431380. In this case, we only have about 50,000 rows to update:
gitlabhq_production=# SELECT COUNT(*) AS count FROM notes WHERE noteable_type = 'Commit' AND noteable_id IS NOT NULL;
count
-------
50281
(1 row)
@yorickpeterse I thought there was some attempt somewhere to limit the ID range in the migration helper, but it was never merged. Is this still something we should consider?
I believe @rspeicher and @skarbek are going to revert this migration for now. @engwan, can you take a look at this?