[Proposal] Add `namespace_id` to `vulnerability_reads` table
Why are we doing this work
The "Group-level vulnerability report" page is mostly timing out as we are trying to generate the response by searching a huge amount of data. We've already fixed the timeout issues for the "project-level vulnerability report" page by using the unnested_in_filters
but applying the same solution for the group-level queries might not be a good solution.
Therefore, we can add the namespace_id
into the vulnerability_reads
table and use that attribute for filtering the vulnerabilities instead of filtering by all the projects belonging to a specific group.
The query filtering by projects
WITH RECURSIVE "base_and_descendants" AS (
(
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group' AND "namespaces"."id" = 9970
)
UNION
(
SELECT
"namespaces"."id"
FROM
"namespaces",
"base_and_descendants"
WHERE
"namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id"
)
),
"group_projects" AS (
SELECT
"projects"."id" AS "project_id"
FROM
"projects"
INNER JOIN "base_and_descendants" ON "base_and_descendants"."id" = "projects"."namespace_id"
INNER JOIN "project_settings" ON "project_settings"."project_id" = "projects"."id" AND "project_settings"."has_vulnerabilities" IS TRUE
WHERE
"projects"."archived" = FALSE AND "projects"."pending_delete" = FALSE
)
SELECT
"vulnerability_reads".*
FROM
"group_projects",
unnest('{0, 4}'::smallint[]) AS "report_types" ("report_type"),
unnest('{1, 4}'::smallint[]) AS "states" ("state"),
unnest(ARRAY["group_projects"."project_id"]::integer[]) as projects(id),
LATERAL (
SELECT
"vulnerability_reads".*
FROM
"vulnerability_reads"
WHERE
"vulnerability_reads"."project_id" = "projects"."id"
AND "vulnerability_reads"."severity" = 5
AND (vulnerability_reads."report_type" = "report_types"."report_type")
AND (vulnerability_reads."state" = "states"."state")
ORDER BY
"vulnerability_reads"."severity" DESC,
"vulnerability_reads"."vulnerability_id" DESC
LIMIT 20) AS vulnerability_reads
ORDER BY
"vulnerability_reads"."severity" DESC,
"vulnerability_reads"."vulnerability_id" DESC
LIMIT 20
- Cold cache execution plan(~8.5s): https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10979/commands/39444
- Warm cache execution plan(~36ms): https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10979/commands/39445
The query filtering by namespaces
WITH RECURSIVE "base_and_descendants" AS MATERIALIZED (
(
SELECT
"namespaces"."id"
FROM
"namespaces"
WHERE
"namespaces"."type" = 'Group' AND "namespaces"."id" = 9970
)
UNION
(
SELECT
"namespaces"."id"
FROM
"namespaces",
"base_and_descendants"
WHERE
"namespaces"."type" = 'Group' AND "namespaces"."parent_id" = "base_and_descendants"."id"
)
)
SELECT
"vulnerability_reads".*
FROM
"base_and_descendants",
unnest('{0, 4}'::smallint[]) AS "report_types" ("report_type"),
unnest('{1, 4}'::smallint[]) AS "states" ("state"),
unnest(ARRAY["base_and_descendants"."id"]::integer[]) as namespaces(id),
LATERAL (
SELECT
"vulnerability_reads".*
FROM
"vulnerability_reads"
WHERE
"vulnerability_reads"."namespace_id" = "namespaces"."id"
AND "vulnerability_reads"."severity" = 5
AND (vulnerability_reads."report_type" = "report_types"."report_type")
AND (vulnerability_reads."state" = "states"."state")
ORDER BY
"vulnerability_reads"."severity" DESC,
"vulnerability_reads"."vulnerability_id" DESC
LIMIT 20) AS vulnerability_reads
ORDER BY
"vulnerability_reads"."severity" DESC,
"vulnerability_reads"."vulnerability_id" DESC
LIMIT 20
- Without index execution plan (~42ms): https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10979/commands/39475
- With index warm cache execution plan(~17ms): https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10979/commands/39478
As seen in the above execution plans, by adding the namespace_id
column to the vulnerability_reads
table we can achieve a tangible performance improvement.
Relevant links
- "Unnested in filters" optimization introduced by: Filter vulnerability reads by the cartesian pro... (!90207 - merged)
Implementation plan
-
database Add
namespace_id
(bigint) column tovulnerability_reads
table along with the FK constraints -
database Create index on
vulnerability_reads
tablebtree(namespace_id, state, severity, vulnerability_id DESC)
-
database Change existing triggers to set
namespace_id
ofvulnerability_reads
records -
database Create migration to backfill the
namespace_id
column of existingvulnerability_reads
records -
backend Adjust
VulnerabilityReadsFinder
to useunnested_in_filters
withbase_and_descendants
CTE when the vulnerable is a group.
Verification steps
TBD
Edited by Mehmet Emin INAC