Skip to content

Destroy invalid project members

What does this MR do and why?

Very similar to !80253 (merged)

We have some invalid project membership records that may have been the result of an old console/SQL delete. The namespace IDs don't point to existing records. This MR removes them.

Deletion of Records!

Following https://docs.gitlab.com/ee/development/database_review.html#preparation-when-adding-data-migrations:

If the migration itself is not reversible, details of how data changes could be reverted in the event of an incident. For example, in the case of a migration that deletes records (an operation that most of the times is not automatically reversible), how could the deleted records be recovered.

It might be prudent to run the query beforehand, save the exact records being deleted as an SQL script, and then proceed with the deletion. If the records need to be restored, these can be recovered.

If the migration deletes data, apply the label data-deletion.

Done.

Concise descriptions of possible user experience impact of an error; for example, “Issues would unexpectedly go missing from Epics”.

If the wrong records are destroyed, some users might find themselves no longer in a group.

Relevant data from the query plans that indicate the query works as expected; such as the approximate number of records that are modified or deleted

See below.

db stuff

ups and downs

🆙

main: == 20220821212706 ScheduleDestroyInvalidProjectMembers: migrating =============
main: == 20220821212706 ScheduleDestroyInvalidProjectMembers: migrated (0.0447s) ====

main: == 20220901035725 AddTempProjectMemberIndex: migrating ========================
main: -- transaction_open?()
main:    -> 0.0000s
main: -- index_exists?(:members, :id, {:name=>"index_project_members_on_id_temp", :where=>"source_type = 'Project'", :algorithm=>:concurrently})
main:    -> 0.0079s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- add_index(:members, :id, {:name=>"index_project_members_on_id_temp", :where=>"source_type = 'Project'", :algorithm=>:concurrently})
main:    -> 0.0049s
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
main: == 20220901035725 AddTempProjectMemberIndex: migrated (0.0194s) ===============

ci: == 20220821212706 ScheduleDestroyInvalidProjectMembers: migrating =============
ci: -- The migration is skipped since it modifies the schemas: [:gitlab_main].
ci: -- This database can only apply migrations in one of the following schemas: [:gitlab_ci, :gitlab_shared, :gitlab_internal].
ci: == 20220821212706 ScheduleDestroyInvalidProjectMembers: migrated (0.0001s) ====

ci: == 20220901035725 AddTempProjectMemberIndex: migrating ========================
ci: -- transaction_open?()
ci:    -> 0.0000s
ci: -- index_exists?(:members, :id, {:name=>"index_project_members_on_id_temp", :where=>"source_type = 'Project'", :algorithm=>:concurrently})
ci:    -> 0.0092s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0003s
ci: -- add_index(:members, :id, {:name=>"index_project_members_on_id_temp", :where=>"source_type = 'Project'", :algorithm=>:concurrently})
ci:    -> 0.0036s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0004s
ci: == 20220901035725 AddTempProjectMemberIndex: migrated (0.0158s) ===============

$ brake db:rollback:main STEP=2
main: == 20220901035725 ScheduleDestroyInvalidProjectMembers: reverting =============
main: == 20220901035725 ScheduleDestroyInvalidProjectMembers: reverted (0.0165s) ====

main: == 20220901035722 AddTempProjectMemberIndex: reverting ========================
main: -- transaction_open?()
main:    -> 0.0000s
main: -- index_exists?(:members, :id, {:name=>"index_project_members_on_id_temp", :algorithm=>:concurrently})
main:    -> 0.0120s
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0003s
main: -- remove_index(:members, {:name=>"index_project_members_on_id_temp", :algorithm=>:concurrently, :column=>:id})
main:    -> 0.0114s
main: -- execute("RESET statement_timeout")
main:    -> 0.0005s
main: == 20220901035722 AddTempProjectMemberIndex: reverted (0.0303s) ===============

$ brake db:rollback:ci STEP=2
ci: == 20220901035725 ScheduleDestroyInvalidProjectMembers: reverting =============
ci: -- The migration is skipped since it modifies the schemas: [:gitlab_main].
ci: -- This database can only apply migrations in one of the following schemas: [:gitlab_ci, :gitlab_shared, :gitlab_internal].
ci: == 20220901035725 ScheduleDestroyInvalidProjectMembers: reverted (0.0001s) ====

ci: == 20220901035722 AddTempProjectMemberIndex: reverting ========================
ci: -- transaction_open?()
ci:    -> 0.0000s
ci: -- index_exists?(:members, :id, {:name=>"index_project_members_on_id_temp", :algorithm=>:concurrently})
ci:    -> 0.0105s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0003s
ci: -- remove_index(:members, {:name=>"index_project_members_on_id_temp", :algorithm=>:concurrently, :column=>:id})
ci:    -> 0.0114s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0003s
ci: == 20220901035722 AddTempProjectMemberIndex: reverted (0.0292s) ===============

explains and that

explain SELECT members.id
FROM members
LEFT JOIN projects ON members.source_id = projects.id
WHERE members.type = 'ProjectMember' AND projects.id IS NULL;

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11636/commands/41334 (retrieved 22 August 2022 NZST)

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/11855/commands/42095 (retrieved with data snapshot as of 2022-09-03T11:25:36Z, same number of rows as before).

Looks like 67 rows, unless I'm mistaken.

queries

With batch size 10000

explain SELECT "members"."id"
FROM "members"
WHERE "members"."id" BETWEEN 100000 AND 110000 AND "members"."source_type" = 'Project'
ORDER BY "members"."id" ASC LIMIT 1 

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11691/commands/41644

explain SELECT "members"."id"
FROM "members"
WHERE "members"."id" BETWEEN 100000 AND 110000 AND "members"."source_type" = 'Project' AND "members"."id" >= 1
ORDER BY "members"."id" ASC LIMIT 1 OFFSET 100 

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11691/commands/41645

explain SELECT "members"."id"
FROM "members"
LEFT OUTER JOIN projects ON members.source_id = projects.id
WHERE "members"."id" BETWEEN 100000 AND 110000 AND "members"."source_type" = 'Project' AND "members"."id" >= 100000 AND "projects"."id" IS NULL

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11691/commands/41646

explain DELETE FROM "members"
WHERE "members"."id" IN (
  SELECT "members"."id" FROM "members" LEFT OUTER JOIN projects ON members.source_id = projects.id
  WHERE "members"."id" BETWEEN 100000 AND 110000 AND "members"."source_type" = 'Project' AND "members"."id" >= 100000 AND "projects"."id" IS NULL) 

https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/11691/commands/41647

How to set up and validate locally

  1. Check if you have any invalid records on your local already
# gdk psql
gitlabhq_development=# SELECT members.id FROM members LEFT JOIN projects ON members.source_id = projects.id WHERE members.type = 'ProjectMember' AND projects.id
 IS NULL;

id 
-------
    123
(1 row)

If none exist, you'll have to create one 👇

  1. (if needed) In console, create an invalid project member record
u = User.last
p = Project.last
pm = Member.new(user: u, type: 'ProjectMember', source_type: 'Project', source_id: p.id, notification_level: 3, access_level: 30)
=> <#ProjectMember....>
pm.save
=> true
pm.id
=> 123 # or whatever, note this down

In postgres...

# UPDATE members SET source_id = 999 WHERE id=123;
UPDATE 1
  1. Run the migration and they should be destroyed and the IDs logged.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #349575 (closed)

Edited by Krasimir Angelov

Merge request reports