Set `vulnerability_count` of project_security_statistics records

What does this MR do and why?

This change introduces a batched background migration to calculate and persist the total number of vulnerabilities each project contains.

We are removing an old migration which was calculating the same numbers but writing the values to the project_statistics table. We have decided to introduce a new table called project_security_statistics instead of using the project_statistics table because the project_statistics table stays on the main DB while project_security_statistics lives on the sec DB which guarantees transactionality.

Related to #474280+.

Database review

The estimation by the db testing job is off because it's using the total number of tuples in the vulnerability_reads table for the calculation, however, we are running a loose-index scan on that table to iterate over distinct project IDs.

There are currently 183_774 unique project IDs in the vulnerability reads table so the correct estimation would be (183_774 / 1000) * 2 / 60 ~= 6 hours.

Queries used by the migration

Upsert query to create/update `project_security_statistics` records(`ensure_statistics`)
INSERT INTO "project_security_statistics" ("project_id")
    VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12), (13), (14), (15), (16), (17), (18), (19), (20), (21), (22), (23), (24), (25), (26), (27), (28), (29), (30), (31), (32), (33), (34), (35), (36), (37), (38), (39), (40), (41), (42), (43), (44), (45), (46), (47), (48), (49), (50), (51), (52), (53), (54), (55), (56), (57), (58), (59), (60), (61), (62), (63), (64), (65), (66), (67), (68), (69), (70), (71), (72), (73), (74), (75), (76), (77), (78), (79), (80), (81), (82), (83), (84), (85), (86), (87), (88), (89), (90), (91), (92), (93), (94), (95), (96), (97), (98), (99), (100)
ON CONFLICT
    DO NOTHING
RETURNING
    "project_id"

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/32285/commands/99730

Finding the latest `vulnerability_reads` record for a project(`reset_statistics_and_get_latest_vulnerability_id`)
SELECT
    vulnerability_reads.*
FROM
    vulnerability_reads
WHERE
    vulnerability_reads.project_id = 278964
ORDER BY
    vulnerability_reads.vulnerability_id DESC
LIMIT 1

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96952

Finding the lower-bound of the batch counting(`calculate_number_of_vulnerabilities`)
SELECT
    "vulnerability_reads"."vulnerability_id"
FROM
    "vulnerability_reads"
WHERE (project_id = 278964
    AND vulnerability_id <= 9231721312)
ORDER BY
    "vulnerability_reads"."vulnerability_id" ASC
LIMIT 1

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96954

Finding the upper-bound of the batch counting(`calculate_number_of_vulnerabilities`)
SELECT
    "vulnerability_reads"."vulnerability_id"
FROM
    "vulnerability_reads"
WHERE (project_id = 278964
    AND vulnerability_id <= 9231721312)
AND "vulnerability_reads"."vulnerability_id" >= 11449
ORDER BY
    "vulnerability_reads"."vulnerability_id" ASC
LIMIT 1 OFFSET 1000

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96955

Counting the `vulnerability_reads` records(`calculate_number_of_vulnerabilities`)
SELECT
    COUNT(*)
FROM
    "vulnerability_reads"
WHERE (project_id = 278964
    AND vulnerability_id <= 9231721312)
AND "vulnerability_reads"."vulnerability_id" >= 11449
AND "vulnerability_reads"."vulnerability_id" < 12449

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96956

Updating the `project_security_statistics`(`persist_statistics`)
UPDATE
    project_security_statistics
SET
    vulnerability_count = vulnerability_count + 1000
WHERE
    project_id = 278964

https://console.postgres.ai/gitlab/gitlab-production-main/sessions/32285/commands/99729

Edited by Mehmet Emin INAC

Merge request reports

Loading