Backfill project bot user details bot namespace
What does this MR do and why?
Related to #498333 (closed)
Backfills project_bot user's bot_namespace association. Project bot users are tied to project and group access tokens. Currently, the only way a project bot ownership is determined is by the sole group or project membership that user has. This works because project bots cannot be added to groups and projects and are limited to that initial sole membership. However, this presents challenges if we want to query for all the project bots that belong to groups and projects in a given hierarchy. That's why we introduced bot_namespace.
This MR queries for project bots' group and project memberships. There will only be one membership. Then from that membership we store either the group or the project's project namespace (not to be confused with the project's group) in the bot_namespace.
Data integrity/query checks
See internal comment at !171326 (comment 2195032956) for metrics.
I ran the following query to validate my assumption that no project_bot user should have both a project membership and group membership. Based on the execution plan at https://postgres.ai/console/gitlab/gitlab-production-main/sessions/33292/commands/102714, I believe that 0 rows were returned. This should be sufficient validation that the data is as we expect and there is no risk to this query getting the wrong namespace_id to insert into bot_namespace_id.
explain SELECT *
FROM
users AS u
LEFT JOIN members AS gm ON gm.type = 'GroupMember' AND gm.user_id = u.id
LEFT JOIN members AS pm ON pm.type = 'ProjectMember' AND pm.user_id = u.id
LEFT JOIN projects AS p ON pm.source_id = p.id
LEFT JOIN user_details AS ud ON ud.user_id = u.id
WHERE
ud.bot_namespace_id IS NULL
AND u.user_type = 6
AND gm.user_id IS NOT NULL
AND pm.user_id IS NOT NULL;
We can also validate the number of project bots that we will find in this query using the following portion of the query from the migration. The only modification is the addition of u.user_type = 6 which would otherwise be pulled in as part of the sub batch query. https://postgres.ai/console/gitlab/gitlab-production-main/sessions/33440/commands/103072
explain SELECT
DISTINCT ON (u.id)
u.id AS user_id,
m.member_namespace_id AS namespace_id
FROM
users AS u
LEFT JOIN members AS m ON m.user_id = u.id
LEFT JOIN user_details AS ud ON ud.user_id = u.id
WHERE
ud.bot_namespace_id IS NULL;
Then we can compare this number to a simple query like the following. This number is a few thousand higher than the above query. But I believe this is because some users are orphaned/awaiting deletion and have no membership. https://postgres.ai/console/gitlab/gitlab-production-main/sessions/33292/commands/102719
explain select * from users where user_type = 6;
References
Please include cross links to any resources that are relevant to this MR This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
| Before | After |
|---|---|
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.