Skip to content

Calculate total excess storage for a namespace

Tyler Amos requested to merge 255346-total-excess-storage-in-namespace into master

What does this MR do?

In order to enforce the new repository size rules allowing customers to purchase additional storage (as defined in the &4237), we need to be able to calculate the total excess repository size of all the projects in the root namespace. This MR introduces a method, total_repository_size_excess, accomplishes through one query on the namespace.

As an example, here's the method used in a rails console locally, along with the raw query:

[19] pry(main)> root_namespace.total_repository_size_excess
  ApplicationSetting Load (2.0ms)  SELECT "application_settings".* FROM "application_settings" ORDER BY "application_settings"."id" DESC LIMIT $1  [["LIMIT", 1]]
   (6.4ms)  SELECT SUM(("project_statistics"."repository_size" - "projects"."repository_size_limit")) FROM "projects" INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project' INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id" WHERE (rs.path LIKE 'test-group-01/%') AND (projects.repository_size_limit != 0 AND project_statistics.repository_size > projects.repository_size_limit)
   (1.2ms)  SELECT SUM(("project_statistics"."repository_size" - 10485760000)) FROM "projects" INNER JOIN routes rs ON rs.source_id = projects.id AND rs.source_type = 'Project' INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id" WHERE (rs.path LIKE 'test-group-01/%') AND (projects.repository_size_limit IS NULL AND project_statistics.repository_size > 10485760000)

Database Review Notes

First query for projects with a set repository limit:

Formatted Query:

SELECT
    SUM(project_statistics.repository_size - projects.repository_size_limit) AS total_excess
FROM
    "projects"
    INNER JOIN routes rs ON rs.source_id = projects.id
        AND rs.source_type = 'Project'
    INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/%')
    AND projects.repository_size_limit != 0 AND project_statistics.repository_size > projects.repository_size_limit

Explain Output: https://explain.depesz.com/s/7ns

Aggregate  (cost=5335.62..5335.63 rows=1 width=32)
  ->  Nested Loop  (cost=1.43..5335.61 rows=1 width=16)
        Join Filter: (project_statistics.repository_size > projects.repository_size_limit)
        ->  Nested Loop  (cost=1.00..5332.16 rows=1 width=16)
              ->  Index Scan using index_routes_on_path_text_pattern_ops on routes rs  (cost=0.56..3.59 rows=1554 width=4)
                    Index Cond: (((path)::text ~>=~ 'gbobject/'::text) AND ((path)::text ~<~ 'gbobject0'::text))
                    Filter: (((path)::text ~~ 'gbobject/% '::text) AND ((source_type)::text = 'Project'::text))
              ->  Index Scan using projects_pkey on projects  (cost=0.43..3.43 rows=1 width=12)
                    Index Cond: (id = rs.source_id)
                    Filter: (repository_size_limit <> 0)
        ->  Index Scan using index_project_statistics_on_project_id on project_statistics  (cost=0.43..3.44 rows=1 width=12)
              Index Cond: (project_id = rs.source_id)

Summary:

First attempt:

Time: 10.580 s
  - planning: 1.480 ms
  - execution: 10.579 s
    - I/O read: 10.438 s
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 30270 (~236.50 MiB) from the buffer pool
  - reads: 9127 (~71.30 MiB) from the OS file cache, including disk I/O
  - dirtied: 186 (~1.50 MiB)
  - writes: 0

Second attempt:

Time: 37.127 ms
  - planning: 1.646 ms
  - execution: 35.481 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 39397 (~307.80 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Second query for projects that fall back to the namespace repository limit (example with 10 GB):

Formatted Query:

EXPLAIN SELECT
    SUM(project_statistics.repository_size - 10485760000) AS total_excess
FROM
    "projects"
    INNER JOIN routes rs ON rs.source_id = projects.id
        AND rs.source_type = ‘Project’
    INNER JOIN "project_statistics" ON "project_statistics"."project_id" = "projects"."id"
WHERE (rs.path LIKE 'gbobject/% ')
    AND projects.repository_size_limit IS NULL AND project_statistics.repository_size > 10485760000

Explain Output: https://explain.depesz.com/s/4rl3

Aggregate  (cost=150.07..150.08 rows=1 width=32)
  ->  Nested Loop  (cost=24.01..150.06 rows=1 width=8)
        Join Filter: (rs.source_id = projects.id)
        ->  Hash Join  (cost=23.57..146.65 rows=1 width=16)
              Hash Cond: (project_statistics.project_id = rs.source_id)
              ->  Index Only Scan using index_project_statistics_on_repository_size_and_project_id on project_statistics  (cost=0.56..115.02 rows=3286 width=12)
                    Index Cond: (repository_size > '10485760000'::bigint)
              ->  Hash  (cost=3.59..3.59 rows=1554 width=4)
                    ->  Index Scan using index_routes_on_path_text_pattern_ops on routes rs  (cost=0.56..3.59 rows=1554 width=4)
                          Index Cond: (((path)::text ~>=~ 'gbobject/'::text) AND ((path)::text ~<~ 'gbobject0'::text))
                          Filter: (((path)::text ~~ 'gbobject/% '::text) AND ((source_type)::text = 'Project'::text))
        ->  Index Scan using projects_pkey on projects  (cost=0.43..3.40 rows=1 width=4)
              Index Cond: (id = project_statistics.project_id)
              Filter: (repository_size_limit IS NULL)

Summary:

First attempt:

Time: 497.272 ms
  - planning: 1.719 ms
  - execution: 495.553 ms
    - I/O read: 443.846 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 30275 (~236.50 MiB) from the buffer pool
  - reads: 9130 (~71.30 MiB) from the OS file cache, including disk I/O
  - dirtied: 186 (~1.50 MiB)
  - writes: 0

Second attempt:

Time: 42.985 ms
  - planning: 1.625 ms
  - execution: 41.360 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 39402 (~307.80 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Relates to #255346

Edited by Alper Akgun

Merge request reports