Skip to content

Migrate empty releases to ghost user

What does this MR do and why?

This MR performs backfill migration on the releases table for the records which does not have an author_id.

  • We verified that there are no new empty release author_id records after merging !104853 (merged)
  • For the older records it was decided to perform a backfill migration.

Related to #386089 (closed)

We want to start with the backfill data migration first as it will enable to us to test the model validation with FF validate_release_with_author. Plan of steps is listed in implementation proposal.

Note:

  • For the records which are backfilled, there is no logical revert. If required we can get the modified record ids from DB snapshot after 2022-11-30 23:59:59 as we have already verified that there are no new releases records with empty author_id.

Estimation on gitlab.com

  • Row Count: 958400 (19.40% of entire rows)
  • Batch size: 1000 (sub-batch size: 100)
  • Job count: 959 jobs
  • Job interval: 2 min
  • Total Duration: 1918 (~ 1 day 8 hours)

Database

See !106813 (comment 1210767361) for DB testing.

Up

> bundle exec rake db:migrate:up:main VERSION=20221215151822

main: == 20221215151822 ScheduleBackfillReleasesAuthorId: migrating =================
main: == 20221215151822 ScheduleBackfillReleasesAuthorId: migrated (0.3046s) ========

Down

> bundle exec rake db:migrate:down:main VERSION=20221215151822

main: == 20221215151822 ScheduleBackfillReleasesAuthorId: reverting =================
main: == 20221215151822 ScheduleBackfillReleasesAuthorId: reverted (0.2285s) ========

Table stats

  • SELECT COUNT(*) FROM "releases"; => Release.count

  • SELECT COUNT(*) FROM "releases" WHERE "releases"."author_id" IS NULL; => Release.where(:author_id => nil).count

See: #367522 (comment 1213090878)

Queries

Queries are from the local log.

SELECT releases.id FROM releases WHERE releases.id BETWEEN 1 AND 100 AND releases.author_id IS NULL ORDER BY releases.id ASC LIMIT 1

SELECT releases.id FROM releases WHERE releases.id BETWEEN 1 AND 100 AND releases.author_id IS NULL AND releases.id >= 81 ORDER BY releases.id ASC LIMIT 1 OFFSET 10

UPDATE releases SET author_id = 32 WHERE releases.id BETWEEN 1 AND 100 AND releases.author_id IS NULL AND releases.id >= 81

Explain from a sampled batch

All executed in cold cache

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Bala Kumar

Merge request reports