Scaling limitations on project usage calculation
Context
Related to &9106. Similar to #779 (closed).
Problem
After deploying the temporary workaround for #779 (closed), I noticed that we have a similar problem with the project-level usage calculations. We're seeing some timeouts when attempting to calculation this.
Current query:
SELECT
coalesce(sum(q.size), 0)
FROM ( WITH RECURSIVE repository_ids AS MATERIALIZED (
SELECT
id
FROM
repositories
WHERE
top_level_namespace_id = $1
AND (
path = $2
OR path LIKE $3
)
),
cte AS (
SELECT
m.id AS manifest_id
FROM
manifests AS m
WHERE
m.top_level_namespace_id = $1
AND m.repository_id IN (
SELECT
id
FROM
repository_ids)
AND EXISTS (
SELECT
FROM
tags AS t
WHERE
t.top_level_namespace_id = m.top_level_namespace_id
AND t.repository_id = m.repository_id
AND t.manifest_id = m.id)
UNION
SELECT
mr.child_id AS manifest_id
FROM
manifest_references AS mr
JOIN cte ON mr.parent_id = cte.manifest_id
WHERE
mr.top_level_namespace_id = $1
AND mr.repository_id IN (
SELECT
id
FROM
repository_ids))
SELECT DISTINCT ON (l.digest)
l.size
FROM
layers AS l
JOIN cte ON l.top_level_namespace_id = $1
AND l.repository_id IN (
SELECT
id
FROM
repository_ids)
AND l.manifest_id = cte.manifest_id) AS q;
Sample query plan: https://console.postgres.ai/gitlab/gitlab-production-registry/sessions/12866/commands/45176
The sample above took over one minute to execute.