Skip to content

Backfill project bot user details bot namespace

What does this MR do and why?

Related to #498333 (closed)

Backfills project_bot user's bot_namespace association. Project bot users are tied to project and group access tokens. Currently, the only way a project bot ownership is determined is by the sole group or project membership that user has. This works because project bots cannot be added to groups and projects and are limited to that initial sole membership. However, this presents challenges if we want to query for all the project bots that belong to groups and projects in a given hierarchy. That's why we introduced bot_namespace.

This MR queries for project bots' group and project memberships. There will only be one membership. Then from that membership we store either the group or the project's project namespace (not to be confused with the project's group) in the bot_namespace.

Data integrity/query checks

See internal comment at !171326 (comment 2195032956) for metrics.

I ran the following query to validate my assumption that no project_bot user should have both a project membership and group membership. Based on the execution plan at https://postgres.ai/console/gitlab/gitlab-production-main/sessions/33292/commands/102714, I believe that 0 rows were returned. This should be sufficient validation that the data is as we expect and there is no risk to this query getting the wrong namespace_id to insert into bot_namespace_id.

explain SELECT *
FROM
    users AS u
    LEFT JOIN members AS gm ON gm.type = 'GroupMember' AND gm.user_id = u.id
    LEFT JOIN members AS pm ON pm.type = 'ProjectMember' AND pm.user_id = u.id
    LEFT JOIN projects AS p ON pm.source_id = p.id
    LEFT JOIN user_details AS ud ON ud.user_id = u.id
WHERE
    ud.bot_namespace_id IS NULL
    AND u.user_type = 6
    AND gm.user_id IS NOT NULL
    AND pm.user_id IS NOT NULL;

We can also validate the number of project bots that we will find in this query using the following portion of the query from the migration. The only modification is the addition of u.user_type = 6 which would otherwise be pulled in as part of the sub batch query. https://postgres.ai/console/gitlab/gitlab-production-main/sessions/33440/commands/103072

                  explain SELECT
                      DISTINCT ON (u.id)
                      u.id AS user_id,
                      m.member_namespace_id AS namespace_id
                  FROM
                      users AS u
                      LEFT JOIN members AS m ON m.user_id = u.id
                      LEFT JOIN user_details AS ud ON ud.user_id = u.id
                  WHERE 
                      ud.bot_namespace_id IS NULL;

Then we can compare this number to a simple query like the following. This number is a few thousand higher than the above query. But I believe this is because some users are orphaned/awaiting deletion and have no membership. https://postgres.ai/console/gitlab/gitlab-production-main/sessions/33292/commands/102719

explain select * from users where user_type = 6;

References

Please include cross links to any resources that are relevant to this MR This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Edited by Drew Blessing

Merge request reports

Loading