Change Vulnerabilities Count Data Retention to 1 year
What does this MR do?
This MR resolves #241809 (closed). It does the following to re-populate the database with data from the last year:
- Modify retention period from 90 days to 1 year (https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/ee/gitlab/background_migration/populate_vulnerability_historical_statistics.rb#L19)
- Create new post migration to execute the background migration (just like https://gitlab.com/gitlab-org/gitlab/blob/master/db/post_migrate/20200821224343_schedule_populate_vulnerability_historical_statistics.rb#L3)
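The scheduling shape of such a post-migration can be sketched as follows. This is an illustrative, self-contained sketch only: the method and constant names are assumed for the example, and the real post-migration uses GitLab's background-migration helpers rather than plain Ruby.

```ruby
# Illustrative sketch of batched scheduling; names are assumed, not GitLab's API.
DELAY_INTERVAL = 5 * 60 # seconds between batches
BATCH_SIZE     = 50

# Pair each batch of project ids with the delay at which its job is enqueued.
def batched_schedule(project_ids, batch_size: BATCH_SIZE, delay: DELAY_INTERVAL)
  project_ids.each_slice(batch_size).with_index(1).map do |batch, index|
    [index * delay, batch]
  end
end
```

With ~6082 project ids this yields 122 jobs, the last one delayed by 122 * 5 minutes, which is where the ~10h estimate below comes from.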
Projects
SELECT DISTINCT "vulnerabilities"."project_id"
FROM "vulnerabilities"
ORDER BY "vulnerabilities"."project_id" ASC;
Unique (cost=0.43..62711.49 rows=4941 width=8) (actual time=0.159..989.705 rows=6021 loops=1)
Buffers: shared hit=38073
-> Index Only Scan using index_vulnerabilities_on_project_id on public.vulnerabilities (cost=0.43..56352.26 rows=2543689 width=8) (actual time=0.157..668.205 rows=2579347 loops=1)
Heap Fetches: 24184
Buffers: shared hit=38073
Queries
Currently we have ~6082 projects with at least one vulnerability. With a batch size of 50, we will have around 122 batches executed with a 5-minute delay between them, so the migration should take ~10h.
Each batch needs around ~30s just to perform the query that reads the statistics, and ~25s to write the statistics back to the database.
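The batch-count and duration estimate can be cross-checked with simple arithmetic; the figures below are the ones quoted in this MR.

```ruby
# Cross-check of the batch estimate; figures are taken from this MR.
projects   = 6082
batch_size = 50
delay_min  = 5

batches     = (projects.to_f / batch_size).ceil # number of scheduled batches
total_hours = batches * delay_min / 60.0        # wall-clock time dominated by the delay
```

This gives 122 batches and roughly 10.2 hours, matching the estimate above.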
Query to fetch projects with vulnerabilities (6083 / 50 = 122 times => 13ms * 122 = 1586ms)
SELECT DISTINCT "vulnerabilities"."project_id"
FROM "vulnerabilities"
ORDER BY "vulnerabilities"."project_id" ASC
LIMIT 50;
https://explain.depesz.com/s/DMOe (~13ms)
Query to fetch statistics (37 * 50ms for a single project => 1850ms | 50 * 1850ms => ~92.5s for a single batch)
SELECT DATE(calendar.entry) AS day,
severity,
COUNT(*)
FROM generate_series(
DATE '2020-07-20',
DATE '2020-07-29',
INTERVAL '1 day'
) as calendar(entry)
INNER JOIN vulnerabilities ON vulnerabilities.created_at <= calendar.entry
WHERE "vulnerabilities"."project_id" = 278964
AND (
(
vulnerabilities.dismissed_at IS NULL
OR vulnerabilities.dismissed_at > calendar.entry
)
AND (
vulnerabilities.resolved_at IS NULL
OR vulnerabilities.resolved_at > calendar.entry
)
)
GROUP BY "day",
"severity";
https://explain.dalibo.com/plan/FLW (~55ms)
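For readability, the per-day counting that this query performs can be expressed in Ruby roughly as follows. This is a sketch over hypothetical in-memory records, not how the migration computes anything (the real work happens in SQL via `generate_series`): a vulnerability counts toward a day if it was created on or before that day and was neither dismissed nor resolved by then, mirroring the `WHERE` clause.

```ruby
require 'date'

# Hypothetical in-memory vulnerability records for one project (sketch only).
Vulnerability = Struct.new(:severity, :created_at, :dismissed_at, :resolved_at)

# For each day in the range, count still-active vulnerabilities by severity,
# mirroring the created_at / dismissed_at / resolved_at conditions in the query.
def counts_per_day(vulnerabilities, from:, to:)
  (from..to).to_h do |day|
    active = vulnerabilities.select do |v|
      v.created_at <= day &&
        (v.dismissed_at.nil? || v.dismissed_at > day) &&
        (v.resolved_at.nil? || v.resolved_at > day)
    end
    [day, active.group_by(&:severity).transform_values(&:count)]
  end
end
```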
Insert statistics (4 * 50 * 350ms for a single batch => ~70s)
INSERT INTO "vulnerability_historical_statistics" (
"letter_grade",
"created_at",
"updated_at",
"project_id",
"total",
"critical",
"high",
"medium",
"low",
"unknown",
"info",
"date"
)
VALUES (
1,
'2020-07-30 07:23:26.813637',
'2020-07-30 07:23:26.813637',
10,
1,
0,
0,
0,
1,
0,
0,
'2020-05-01'
),
...,
(
1,
'2020-07-30 07:23:26.813637',
'2020-07-30 07:23:26.813637',
10,
91,
0,
0,
0,
91,
0,
0,
'2020-07-30'
) ON CONFLICT DO NOTHING
RETURNING "id"
https://explain.depesz.com/s/Eael (~350ms)
To summarize, in a single batch we are:
- fetching statistics for the projects in the batch (50 * 1850ms => ~92.5s)
- inserting statistics into the vulnerability_historical_statistics table (50 * 1400ms => ~70s)
A single batch should take around 3 minutes, so we set an interval of 5 minutes between batches.
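The per-batch budget versus the chosen interval, expressed as arithmetic over the measured timings above:

```ruby
# Per-batch time budget versus the scheduling interval (timings from this MR).
fetch_ms    = 50 * 1_850   # reading statistics for 50 projects
insert_ms   = 50 * 1_400   # writing statistics for 50 projects
batch_ms    = fetch_ms + insert_ms
interval_ms = 5 * 60 * 1_000
```

A batch needs ~162.5s of the 300s interval, leaving headroom for variance.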
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
Database Execution
explain SELECT "vulnerability_historical_statistics"."id"
FROM "vulnerability_historical_statistics"
WHERE (vulnerability_historical_statistics.date < now() - interval '365 days')
ORDER BY "vulnerability_historical_statistics"."id";
Sort (cost=2.54..2.55 rows=1 width=8) (actual time=2.846..2.846 rows=0 loops=1)
Sort Key: vulnerability_historical_statistics.id
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=3 read=2
I/O Timings: read=2.487
-> Index Only Scan using index_vulnerability_historical_statistics_on_date_and_id on public.vulnerability_historical_statistics (cost=0.29..2.53 rows=1 width=8) (actual time=2.715..2.715 rows=0 loops=1)
Index Cond: (vulnerability_historical_statistics.date < (now() - '365 days'::interval))
Heap Fetches: 0
Buffers: shared read=2
Time: 3.056 ms
- planning: 0.159 ms
- execution: 2.897 ms
- I/O read: 2.487 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 3 (~24.00 KiB) from the buffer pool
- reads: 2 (~16.00 KiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Edited by Alan (Maciej) Paruszewski