Skip to content

Use linear query to refresh the project authorizations

Aboobacker MK requested to merge tachyons/linear_project_authorizations into master

What does this MR do and why?

This is an attempt to replace recursive CTE to used in project authorisation query with linear query using traversal_ids

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/17819/commands/59527

New query

WITH "linear_namespaces_cte" AS MATERIALIZED (
    (
        SELECT
            namespaces.id,
            MAX(LEAST (members.access_level, group_group_links.group_access)) AS access_level
        FROM
            "namespaces"
            INNER JOIN group_group_links ON group_group_links.shared_group_id = namespaces.id
                OR namespaces.traversal_ids @> ARRAY[group_group_links.shared_group_id::int]
            INNER JOIN members ON group_group_links.shared_with_group_id = members.source_id
        WHERE
            "namespaces"."type" = 'Group'
            AND "members"."user_id" = 1
            AND "members"."requested_at" IS NULL
            AND (access_level >= 10) -- Minimum access level is excluded as they are not applicable for projects
            AND "members"."type" = 'GroupMember'
            AND "members"."source_type" = 'Namespace'
            AND "members"."state" = 0
        GROUP BY
            "namespaces"."id"
)
    UNION (
        SELECT
            namespaces.id,
            MAX( members.access_level) AS access_level
        FROM
            "members"
        CROSS JOIN LATERAL (
            SELECT
                namespaces.traversal_ids[array_length(namespaces.traversal_ids, 1)] AS id -- Optimisation to scan through traversal ids alone
            FROM
                "namespaces"
            WHERE
                "namespaces"."type" = 'Group'
                AND (namespaces.traversal_ids @> ARRAY[members.source_id])
) AS namespaces
        WHERE
            "members"."user_id" = 1
            AND "members"."requested_at" IS NULL
            AND (access_level >= 10)
            AND "members"."type" = 'GroupMember'
            AND "members"."source_type" = 'Namespace'
            AND "members"."state" = 0
        GROUP BY
            "namespaces"."id"
))
SELECT
    "project_authorizations"."project_id",
    MAX(access_level) AS access_level
FROM ((
        SELECT
            projects.id AS project_id,
            "members"."access_level"
        FROM
            "projects"
            INNER JOIN "members" ON "projects"."id" = "members"."source_id"
        WHERE
            "members"."type" = 'ProjectMember'
            AND "members"."source_type" = 'Project'
            AND "members"."user_id" = 1
            AND "members"."requested_at" IS NULL
            AND "members"."state" = 0)
    UNION (
        SELECT
            projects.id AS project_id,
            50 AS access_level
        FROM
            "projects"
            INNER JOIN "namespaces" ON "projects"."namespace_id" = "namespaces"."id"
        WHERE
            "namespaces"."owner_id" = 1
            AND "namespaces"."type" = 'User')
    UNION (
        SELECT
            "projects"."id" AS project_id,
            "namespaces"."access_level"
        FROM
            "linear_namespaces_cte" "namespaces"
            INNER JOIN "projects" ON "projects"."namespace_id" = "namespaces"."id") -- Hotspot as projects table is big
UNION (
    SELECT
        "project_group_links"."project_id",
        LEAST ("namespaces"."access_level", "project_group_links"."group_access") AS access_level
    FROM
        "linear_namespaces_cte" "namespaces"
        INNER JOIN project_group_links ON project_group_links.group_id = namespaces.id
        INNER JOIN projects ON projects.id = project_group_links.project_id --Hotspot as projects table is big and this is not a index only scan
        INNER JOIN namespaces p_ns ON p_ns.id = projects.namespace_id -- Hotspot as it self joins a big table and share_with_group_lock is not part of the index
    WHERE (p_ns.share_with_group_lock IS FALSE))) project_authorizations
GROUP BY
    "project_authorizations"."project_id"

Improvements for future iterations

  • It seems that adding index for group_id for project_group_links improves buffer usage a bit, evaluate this index
  • Check the feasibility for converting index scans to index only scans
  • Check the feasibility of using array operations instead of array inclusion operator (@>) to speed up the lookup
  • Extract the namespaces CTE to independent query and batch the calculation of project access
  • Add another feature flag to compare both results on run time and log them to check the correctness

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

  1. Pick a user with several projects associated with
  2. Run Gitlab::ProjectAuthorizations.new(user).calculate on Rails console and save results
  3. Enable the linear_project_authorization flag
    Feature.enable(:linear_project_authorization)
  4. Run Gitlab::ProjectAuthorizations.new(user).calculate on Rails console and and compare with previous results

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #410502 (closed)

Edited by Aboobacker MK

Merge request reports