Destroy invalid group membership records

Merged charlie ablett requested to merge 349575-cablett-remove-invalid-membership-records into master

What does this MR do and why?

We have some invalid membership records that may have been the result of an old console/SQL delete. The namespace IDs don't point to existing records. This MR removes them.

If needed we can do the same for Project members in another MR.

See !79204 (comment 836039745) for context.

DB info

All records with group type and invalid source_id (443 as of 5 July)...

explain SELECT COUNT(*) FROM members LEFT JOIN namespaces ON members.source_id = namespaces.id WHERE members.type = 'GroupMember' AND namespaces.id IS NULL;

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10911/commands/39198

As of 4 February there are 443 invalid group membership records - see !78715 (comment 831433787).

SELECT COUNT(*) FROM members
LEFT JOIN namespaces ON members.source_id = namespaces.id
WHERE members.type = 'GroupMember'
AND namespaces.id IS NULL;

 count
-------
   443
(1 row)
   Background Migration Details:

   443 items to delete
   batch size = 10000
   443 / x = ? batches

   Estimated times per batch:
   - xms for select statement with 1000 items (see linked explain plan)
   - yms for delete statement with 1000 items (see linked explain plan)
   Total: ~x+y sec per batch

   2 mins delay per batch (safe for the given total time per batch)

   y batches * 2 min per batch = z mins to run all the scheduled jobs

Queries

main: == 20220809002011 ScheduleDestroyInvalidMembershipRecords: migrating ==========
main: == 20220809002011 ScheduleDestroyInvalidMembershipRecords: migrated (0.0610s) =

.main: == 20220809002011 ScheduleDestroyInvalidMembershipRecords: migrating ==========
main: == 20220809002011 ScheduleDestroyInvalidMembershipRecords: migrated (0.0381s) =

Explains (from my local)

# batching strategy
SELECT "members"."id" FROM "members" WHERE ("id" >= 13) AND "members"."member_namespace_id" IS NULL AND "members"."source_type" = 'Group' ORDER BY "members"."id" ASC LIMIT 1

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11398/commands/40749.

destroy_invalid_group_members_spec.rb (console output from rspec test)

 ↳ lib/gitlab/database.rb:364:in `commit'
   Load (0.5ms)  SELECT "members"."id" FROM "members" LEFT OUTER JOIN namespaces ON members.source_id = namespaces.id WHERE "members"."id" BETWEEN 1 AND 1000 AND "namespaces"."id" IS NULL ORDER BY "members"."id" ASC LIMIT 1 
  ↳ app/models/concerns/each_batch.rb:62:in `each_batch'
   Load (0.4ms)  SELECT "members"."id" FROM "members" LEFT OUTER JOIN namespaces ON members.source_id = namespaces.id WHERE "members"."id" BETWEEN 1 AND 1000 AND "namespaces"."id" IS NULL AND "members"."id" >= 1 ORDER BY "members"."id" ASC LIMIT 1 OFFSET 100 
  ↳ app/models/concerns/each_batch.rb:81:in `block in each_batch'
   Load (0.4ms)  SELECT "members".* FROM "members" LEFT OUTER JOIN namespaces ON members.source_id = namespaces.id WHERE "members"."id" BETWEEN 1 AND 1000 AND "namespaces"."id" IS NULL AND "members"."id" >= 1
  ↳ lib/gitlab/background_migration/destroy_invalid_group_members.rb:14:in `map'
SELECT "members".* FROM "members" LEFT OUTER JOIN namespaces ON members.source_id = namespaces.id WHERE "members"."id" BETWEEN 10000 AND 20000 AND "namespaces"."id" IS NULL AND "members"."id" >= 10000

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11398/commands/40750

DELETE FROM "members" WHERE "members"."id" IN (SELECT "members"."id" FROM "members" LEFT OUTER JOIN namespaces ON members.source_id = namespaces.id WHERE "members"."id" BETWEEN 10000 AND 20000 AND "namespaces"."id" IS NULL AND "members"."id" >= 10000)

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11398/commands/40751

Data removal set verification

  • Logging of affected IDs
  • Production clone dry-run

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #365028 (closed) and #349575 (closed)

Edited by charlie ablett