Follow-up from "Add API endpoint for vulnerability history at group level"

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Close this issue

The following discussion from !8603 (merged) should be addressed:

@abrandl started a discussion: (+3 comments)

@gonzoyumo I'm not exactly sure about this: Is there some sort of redundancy in the query?

EXPLAIN ANALYZE
SELECT
    CAST(vulnerability_occurrence_pipelines.created_at AS DATE) AS DAY,
    "vulnerability_occurrences"."severity",
    count(DISTINCT vulnerability_occurrences.id)
FROM
    "vulnerability_occurrences"
    INNER JOIN "vulnerability_occurrence_pipelines" ON "vulnerability_occurrence_pipelines"."occurrence_id" = "vulnerability_occurrences"."id"
WHERE
    "vulnerability_occurrence_pipelines"."pipeline_id" IN (
        SELECT
            "ci_pipelines"."id"
        FROM
            "ci_pipelines"
        WHERE
            "ci_pipelines"."project_id" IN (
                SELECT
                    "projects"."id"
                FROM
                    "projects"
                    INNER JOIN routes rs ON rs.source_id = projects.id
                        AND rs.source_type = 'Project'
                WHERE (rs.path LIKE 'olivier/%'))
            AND (EXISTS (
                    SELECT
                        1
                    FROM
                        "vulnerability_occurrence_pipelines"
                    WHERE (ci_pipelines.id = vulnerability_occurrence_pipelines.pipeline_id)))
            AND "ci_pipelines"."status" = 'success')
        AND (vulnerability_occurrence_pipelines.created_at >= '2018-08-27')
    GROUP BY
        "day",
        "vulnerability_occurrences"."severity"
    ORDER BY
        DAY;

I was wondering if this part is redundant to the JOIN condition:

            AND (EXISTS (
                    SELECT
                        1
                    FROM
                        "vulnerability_occurrence_pipelines"
                    WHERE (ci_pipelines.id = vulnerability_occurrence_pipelines.pipeline_id)))

I don't know the model too well though, feel free to ignore. :) If this can be removed, it speeds up the query quite a lot and the plan gets a tad nicer.

I shortened the query to make this more readable to the relevant part below (the EXISTS(...) is redundant IMHO).

SELECT
*
FROM
    "vulnerability_occurrence_pipelines"
WHERE
    "vulnerability_occurrence_pipelines"."pipeline_id" IN (
        SELECT
            "ci_pipelines"."id"
        FROM
            "ci_pipelines"
        WHERE
            "ci_pipelines"."project_id" IN (...)
            AND (EXISTS (
                    SELECT
                        1
                    FROM
                        "vulnerability_occurrence_pipelines"
                    WHERE (ci_pipelines.id = vulnerability_occurrence_pipelines.pipeline_id)))
        )

Edited Jun 26, 2025 by 🤖 GitLab Bot 🤖