Skip to content

Break repository excess calculation into smaller, simpler queries [RUN ALL RSPEC] [RUN AS-IF-FOSS]

Mark Chao requested to merge 323985-split-query into master

What does this MR do?

The repository execess calculation's query involves a subquery of getting all nested groups. This resulted in some slow query during one incident (but it is not the cause).

We want to separate the split nested groups query as a separate query, which should be more resilient when things go wrong.

The more detailed analysis can be found here: https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12769#note_525534422

From research, one of the larger group contains 5000 sub-groups. A feature flag is added so we can test if this is okay, or if we need to further split the query.

Related issue #323985 (closed)

Database query

Query example: group.total_repository_size_excess

before change
SELECT 
  SUM(
    (
      (
        "project_statistics"."repository_size" + "project_statistics"."lfs_objects_size"
      ) - "projects"."repository_size_limit"
    )
  ) 
FROM 
  "projects" 
  INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id" 
WHERE 
  "projects"."namespace_id" IN (
    WITH RECURSIVE "base_and_descendants" AS (
      (
        SELECT 
          "namespaces".* 
        FROM 
          "namespaces" 
        WHERE 
          "namespaces"."type" = 'Group' 
          AND "namespaces"."id" = 22
      ) 
      UNION 
        (
          SELECT 
            "namespaces".* 
          FROM 
            "namespaces", 
            "base_and_descendants" 
          WHERE 
            "namespaces"."type" = 'Group' 
            AND "namespaces"."parent_id" = "base_and_descendants"."id"
        )
    ) 
    SELECT 
      id 
    FROM 
      "base_and_descendants" AS "namespaces"
  ) 
  AND (
    "project_statistics"."repository_size" + "project_statistics"."lfs_objects_size"
  ) > "projects"."repository_size_limit" 
  AND "projects"."repository_size_limit" != 0
after change
WITH RECURSIVE "base_and_descendants" AS (
  (
    SELECT 
      "namespaces".* 
    FROM 
      "namespaces" 
    WHERE 
      "namespaces"."type" = 'Group' 
      AND "namespaces"."id" = 22
  ) 
  UNION 
    (
      SELECT 
        "namespaces".* 
      FROM 
        "namespaces", 
        "base_and_descendants" 
      WHERE 
        "namespaces"."type" = 'Group' 
        AND "namespaces"."parent_id" = "base_and_descendants"."id"
    )
) 
SELECT 
  id 
FROM 
  "base_and_descendants" AS "namespaces"


SELECT 
  SUM(
    (
      (
        "project_statistics"."repository_size" + "project_statistics"."lfs_objects_size"
      ) - "projects"."repository_size_limit"
    )
  ) 
FROM 
  "projects" 
  INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id" 
WHERE 
  "projects"."namespace_id" = 22 
  AND (
    "project_statistics"."repository_size" + "project_statistics"."lfs_objects_size"
  ) > "projects"."repository_size_limit" 
  AND "projects"."repository_size_limit" != 0

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Mark Chao

Merge request reports