Skip to content

Change Vulnerabilities Count Data Retention to 1 year

What does this MR do?

This MR resolves #241809 (closed). It does the following to re-populate database with data from last year:

Projects

SELECT DISTINCT "vulnerabilities"."project_id"
FROM "vulnerabilities"
ORDER BY "vulnerabilities"."project_id" ASC;
 Unique  (cost=0.43..62711.49 rows=4941 width=8) (actual time=0.159..989.705 rows=6021 loops=1)
   Buffers: shared hit=38073
   ->  Index Only Scan using index_vulnerabilities_on_project_id on public.vulnerabilities  (cost=0.43..56352.26 rows=2543689 width=8) (actual time=0.157..668.205 rows=2579347 loops=1)
         Heap Fetches: 24184
         Buffers: shared hit=38073

Queries

Currently we have ~6082 projects that have at least one vulnerability. With Batch size = 50, we will have around 122 batches executed with a 5 minutes delay between them, and the migration should take ~10h.

Each batch needs around ~30s to just perform the query to read the statistics, and ~25s to write statistics back to database.

Query to fetch projects with vulnerabilities (6083 / 50 = 122 times => 13ms * 122 = 1586ms)

SELECT DISTINCT "vulnerabilities"."project_id"
FROM "vulnerabilities"
ORDER BY "vulnerabilities"."project_id" ASC
LIMIT 50;

https://explain.depesz.com/s/DMOe (~13ms)

Query to fetch statistics (37 * 50ms for single project => 1850ms | 50 * 1850ms => ~92,5s for single batch)

SELECT DATE(calendar.entry) AS day,
  severity,
  COUNT(*)
FROM generate_series(
    DATE '2020-07-20',
    DATE '2020-07-29',
    INTERVAL '1 day'
  ) as calendar(entry)
  INNER JOIN vulnerabilities ON vulnerabilities.created_at <= calendar.entry
WHERE "vulnerabilities"."project_id" = 278964
  AND (
    (
      vulnerabilities.dismissed_at IS NULL
      OR vulnerabilities.dismissed_at > calendar.entry
    )
    AND (
      vulnerabilities.resolved_at IS NULL
      OR vulnerabilities.resolved_at > calendar.entry
    )
  )
GROUP BY "day",
  "severity";

https://explain.dalibo.com/plan/FLW (~55ms)

Insert statistics (4 * 50 * 350ms for single batch => ~70s)

INSERT INTO "vulnerability_historical_statistics" (
    "letter_grade",
    "created_at",
    "updated_at",
    "project_id",
    "total",
    "critical",
    "high",
    "medium",
    "low",
    "unknown",
    "info",
    "date"
  )
VALUES (
    1,
    '2020-07-30 07:23:26.813637',
    '2020-07-30 07:23:26.813637',
    10,
    1,
    0,
    0,
    0,
    1,
    0,
    0,
    '2020-05-01'
  ),
  ...,
  (
    1,
    '2020-07-30 07:23:26.813637',
    '2020-07-30 07:23:26.813637',
    10,
    91,
    0,
    0,
    0,
    91,
    0,
    0,
    '2020-07-30'
  ) ON CONFLICT DO NOTHING
RETURNING "id"

https://explain.depesz.com/s/Eael (~350ms)

To summarize in single batch we are doing:

  • fetching statistics for projects in batch (50 * 1850ms => ~92,5s)
  • inserting statistics to vulnerability_historical_statistics table (50 * 1400ms => ~70s)

So a single batch should take around 3 minutes, so we set an interval of 5 minutes between batches.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Database Execution

explain SELECT "vulnerability_historical_statistics"."id"
  FROM "vulnerability_historical_statistics"
  WHERE (vulnerability_historical_statistics.date < now() - interval '365 days')
  ORDER BY "vulnerability_historical_statistics"."id";
 Sort  (cost=2.54..2.55 rows=1 width=8) (actual time=2.846..2.846 rows=0 loops=1)
   Sort Key: vulnerability_historical_statistics.id
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=3 read=2
   I/O Timings: read=2.487
   ->  Index Only Scan using index_vulnerability_historical_statistics_on_date_and_id on public.vulnerability_historical_statistics  (cost=0.29..2.53 rows=1 width=8) (actual time=2.715..2.715 rows=0 loops=1)
         Index Cond: (vulnerability_historical_statistics.date < (now() - '365 days'::interval))
         Heap Fetches: 0
         Buffers: shared read=2
Time: 3.056 ms
  - planning: 0.159 ms
  - execution: 2.897 ms
    - I/O read: 2.487 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 3 (~24.00 KiB) from the buffer pool
  - reads: 2 (~16.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
Edited by Alan (Maciej) Paruszewski

Merge request reports