Remove `[analyzer]_pipeline` metrics join on ci_builds"
What does this MR do and why?
[analyzer]_pipeline
service ping metrics are sent for every CI job that contains a scan, not just for those that are part of successful non-retried jobs. This is an attempt to reduce the load on the database.
Related to #338010 (closed)
We took similar approach in this MR.
Impact to the database
Query
Before:
SELECT
"security_scans".*
FROM
"security_scans"
INNER JOIN "ci_builds" ON "ci_builds"."id" = "security_scans"."build_id"
AND "ci_builds"."type" = 'Ci::Build'
WHERE
"ci_builds"."status" = 'success'
AND (
"ci_builds"."retried" = FALSE
OR "ci_builds"."retried" IS NULL
)
AND "security_scans"."scan_type" = 1
AND "security_scans"."created_at" BETWEEN '2021-09-27 19:55:50.275002'
AND '2021-09-29 19:55:45.806409'
The explain plan generated from #database_lab
can be found at the following location https://explain.depesz.com/s/yE1T
After
A query like the following will be made for each scan type (of which there are nine).
SELECT
"security_scans".*
FROM
"security_scans"
WHERE
"security_scans"."scan_type" = 1
AND "security_scans"."created_at" BETWEEN '2021-09-27 19:55:50.275002'
AND '2021-09-29 19:55:45.806409'
This is an expected improvement on the previous query as it doesn't have to join with the ci_builds
table and filter by the latest successful build.
The new query searches using the index on scan_type
. This query is run against scan_type
1
(SAST), which accounts for the vast majority of Security Scans.
The explain plan generated from #database_lab
can be found at the following location https://explain.depesz.com/s/pzXV.