Set `vulnerability_count` of project_security_statistics records
What does this MR do and why?
This change introduces a batched background migration to calculate and persist the total number of vulnerabilities each project contains.
We are removing an old migration which was calculating the same numbers but writing the values to the project_statistics table. We have decided to introduce a new table called project_security_statistics instead of using the project_statistics table because the project_statistics table stays on the main DB while project_security_statistics lives on the sec DB which guarantees transactionality.
Related to #474280+.
Database review
The estimation by the db testing job is off because it's using the total number of tuples in the vulnerability_reads table for the calculation, however, we are running a loose-index scan on that table to iterate over distinct project IDs.
There are currently 183_774 unique project IDs in the vulnerability reads table so the correct estimation would be (183_774 / 1000) * 2 / 60 ~= 6 hours.
Queries used by the migration
Upsert query to create/update `project_security_statistics` records(`ensure_statistics`)
INSERT INTO "project_security_statistics" ("project_id")
VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12), (13), (14), (15), (16), (17), (18), (19), (20), (21), (22), (23), (24), (25), (26), (27), (28), (29), (30), (31), (32), (33), (34), (35), (36), (37), (38), (39), (40), (41), (42), (43), (44), (45), (46), (47), (48), (49), (50), (51), (52), (53), (54), (55), (56), (57), (58), (59), (60), (61), (62), (63), (64), (65), (66), (67), (68), (69), (70), (71), (72), (73), (74), (75), (76), (77), (78), (79), (80), (81), (82), (83), (84), (85), (86), (87), (88), (89), (90), (91), (92), (93), (94), (95), (96), (97), (98), (99), (100)
ON CONFLICT
DO NOTHING
RETURNING
"project_id"
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/32285/commands/99730
Finding the latest `vulnerability_reads` record for a project(`reset_statistics_and_get_latest_vulnerability_id`)
SELECT
vulnerability_reads.*
FROM
vulnerability_reads
WHERE
vulnerability_reads.project_id = 278964
ORDER BY
vulnerability_reads.vulnerability_id DESC
LIMIT 1
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96952
Finding the lower-bound of the batch counting(`calculate_number_of_vulnerabilities`)
SELECT
"vulnerability_reads"."vulnerability_id"
FROM
"vulnerability_reads"
WHERE (project_id = 278964
AND vulnerability_id <= 9231721312)
ORDER BY
"vulnerability_reads"."vulnerability_id" ASC
LIMIT 1
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96954
Finding the upper-bound of the batch counting(`calculate_number_of_vulnerabilities`)
SELECT
"vulnerability_reads"."vulnerability_id"
FROM
"vulnerability_reads"
WHERE (project_id = 278964
AND vulnerability_id <= 9231721312)
AND "vulnerability_reads"."vulnerability_id" >= 11449
ORDER BY
"vulnerability_reads"."vulnerability_id" ASC
LIMIT 1 OFFSET 1000
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96955
Counting the `vulnerability_reads` records(`calculate_number_of_vulnerabilities`)
SELECT
COUNT(*)
FROM
"vulnerability_reads"
WHERE (project_id = 278964
AND vulnerability_id <= 9231721312)
AND "vulnerability_reads"."vulnerability_id" >= 11449
AND "vulnerability_reads"."vulnerability_id" < 12449
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/31211/commands/96956
Updating the `project_security_statistics`(`persist_statistics`)
UPDATE
project_security_statistics
SET
vulnerability_count = vulnerability_count + 1000
WHERE
project_id = 278964
https://console.postgres.ai/gitlab/gitlab-production-main/sessions/32285/commands/99729