Skip to content

Migrate users.bio to user_details.bio

Adam Hegyi requested to merge 206913-migrate-users-bio into master

Feature Flag

Feature flag (default enabled): migrate_bio_to_user_details.

In case of an issue: Turn off the feature flag off globally, this stops the attribute syncs and the background migrations will be no-op.

What does this MR do?

This MR is the first part of the migration of users.bio to user_details.bio.

Milestone N (current):

  • Add bio column to user_details.
  • Kick off background migration jobs to copy users.bio to user_details.bio for existing users.
  • Update the application code to assign the value of users.bio to user_details.bio as well.

Milestone N + 1 (next):

  • Ignore the users.bio column.
  • Always delegate bio to user_details.bio.
  • Preload user_details on /users API to avoid N+1.

Milestone N + 2:

  • Remove users.bio column.

Migration output

Migration output.
== 20200323071918 AddBioToUserDetails: migrating ==============================
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0003s
-- transaction()
-- add_column(:user_details, :bio, :string, {:default=>nil, :limit=>255})
   -> 0.0011s
-- change_column_default(:user_details, :bio, "")
   -> 0.0021s
   -> 0.0047s
-- columns(:user_details)
   -> 0.0011s
-- transaction_open?()
   -> 0.0000s
-- exec_query("SELECT COUNT(*) AS count FROM \"user_details\"")
   -> 0.0007s
-- exec_query("SELECT \"user_details\".\"user_id\" FROM \"user_details\" ORDER BY \"user_details\".\"user_id\" ASC LIMIT 1")
   -> 0.0011s
-- exec_query("SELECT \"user_details\".\"user_id\" FROM \"user_details\" WHERE \"user_details\".\"user_id\" >= 1 ORDER BY \"user_details\".\"user_id\" ASC LIMIT 1 OFFSET 1")
   -> 0.0009s
-- execute("UPDATE \"user_details\" SET \"bio\" = '' WHERE \"user_details\".\"user_id\" >= 1 AND \"user_details\".\"user_id\" < 78")
   -> 0.0016s
-- exec_query("SELECT \"user_details\".\"user_id\" FROM \"user_details\" WHERE \"user_details\".\"user_id\" >= 78 ORDER BY \"user_details\".\"user_id\" ASC LIMIT 1 OFFSET 1")
   -> 0.0003s
-- execute("UPDATE \"user_details\" SET \"bio\" = '' WHERE \"user_details\".\"user_id\" >= 78")
   -> 0.0011s
-- change_column_null(:user_details, :bio, false)
   -> 0.0009s
-- execute("RESET ALL")
   -> 0.0002s
== 20200323071918 AddBioToUserDetails: migrated (0.0149s) =====================

== 20200323074147 AddTempIndexOnUsersBio: migrating =========================== -- transaction_open?() -> 0.0000s -- index_exists?(:users, :id, {:where=>"(COALESCE(bio, '') IS DISTINCT FROM '')", :name=>"tmp_idx_on_user_id_where_bio_is_filled", :algorithm=>:concurrently}) -> 0.0061s -- execute("SET statement_timeout TO 0") -> 0.0001s -- add_index(:users, :id, {:where=>"(COALESCE(bio, '') IS DISTINCT FROM '')", :name=>"tmp_idx_on_user_id_where_bio_is_filled", :algorithm=>:concurrently}) -> 0.0134s -- execute("RESET ALL") -> 0.0002s == 20200323074147 AddTempIndexOnUsersBio: migrated (0.0200s) ==================

== 20200323080714 TriggerBackgroundMigrationForUsersBio: migrating ============ == 20200323080714 TriggerBackgroundMigrationForUsersBio: migrated (0.0606s) ===

BG migration queries

Each bath query for scheduling, query 1 (< 1ms)

SELECT "users"."id" FROM "users" WHERE ((COALESCE(bio, '') IS DISTINCT FROM '')) ORDER BY "users"."id" ASC LIMIT 1

Limit (cost=0.42..0.43 rows=1 width=4) (actual time=0.253..0.253 rows=1 loops=1) Buffers: shared hit=1 read=3 I/O Timings: read=0.151 -> Index Only Scan using tmp_idx_on_user_id_where_bio_is_filled on public.users (cost=0.42..58684.85 rows=5391511 width=4) (actual time=0.251..0.251 rows=1 loops=1) Heap Fetches: 0 Buffers: shared hit=1 read=3 I/O Timings: read=0.151

Each bath query for scheduling, query 2 (< 5ms)

SELECT "users"."id" FROM "users" WHERE ((COALESCE(bio, '') IS DISTINCT FROM '')) AND "users"."id" >= 78 ORDER BY "users"."id" ASC LIMIT 1 OFFSET 1000

Limit (cost=11.37..11.39 rows=1 width=4) (actual time=2.801..2.801 rows=1 loops=1) Buffers: shared hit=846 read=9 dirtied=5 I/O Timings: read=2.139 -> Index Only Scan using tmp_idx_on_user_id_where_bio_is_filled on public.users (cost=0.42..59061.71 rows=5391443 width=4) (actual time=0.038..2.738 rows=1001 loops=1) Index Cond: (users.id >= 78) Heap Fetches: 7 Buffers: shared hit=846 read=9 dirtied=5 I/O Timings: read=2.139

Upsert query (< 1s)

INSERT INTO user_details (user_id, bio) SELECT id AS user_id, substring(COALESCE(bio, '') from 1 for 255) AS bio FROM "users" WHERE ((COALESCE(bio, '') IS DISTINCT FROM '')) AND "users"."id" BETWEEN 35149 AND 79126 ON CONFLICT (user_id) DO UPDATE SET "bio" = EXCLUDED."bio";

ModifyTable on public.user_details (cost=0.42..78690.51 rows=42534 width=458) (actual time=752.533..752.533 rows=0 loops=1) Buffers: shared hit=10090 read=1006 dirtied=17 I/O Timings: read=709.175 -> Index Scan using tmp_idx_on_user_id_where_bio_is_filled on public.users (cost=0.42..78690.51 rows=42534 width=458) (actual time=0.046..720.882 rows=1002 loops=1) Index Cond: ((users.id >= 35149) AND (users.id <= 79126)) Buffers: shared hit=7 read=996 dirtied=7 I/O Timings: read=709.175

Note: This is with batch size 1000. I modified it to 500 just to be safe. Migration will take about 10 hours.

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #206913 (closed)

Edited by Adam Hegyi

Merge request reports