Remove cross-join from snapshot calculator

What does this MR do and why?

Remove cross-join queries from ee/lib/analytics/devops_adoption/snapshot_calculator.rb

Before this change, SnapshotCalculator#vulnerability_management_used_count triggered an SQL query that involved the gitlab_sec schema and the main schema. The MR changes that query so that it no longer references tables of the main schema.

References

Please include cross links to any resources that are relevant to this MR This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Queries

SQL query executed for a each batch of snapshot_project_ids:

SELECT COUNT(DISTINCT "vulnerabilities"."project_id") FROM "vulnerabilities"
WHERE "vulnerabilities"."state" != 1
AND "vulnerabilities"."created_at" BETWEEN '2020-12-01 00:00:00' AND '2020-12-31 23:59:59.999999'
AND "vulnerabilities"."project_id" IN (60457116, 9450192, 278964)

NOTE: I've taken an SQL query generated when running the spec for the SnapshotCalculator, and changed to project IDs that match real projects on gitlab.com.

Query plan:

 Aggregate  (cost=16.16..16.17 rows=1 width=8) (actual time=69.456..69.458 rows=1 loops=1)
   Buffers: shared hit=2951 read=65 dirtied=3
   WAL: records=3 fpi=3 bytes=23945
   I/O Timings: read=67.217 write=0.000
   ->  Index Only Scan using idx_vulnerabilities_partial_devops_adoption_and_default_branch on public.vulnerabilities  (cost=0.56..15.73 rows=172 width=8) (actual time=22.625..68.787 rows=5388 loops=1)
         Index Cond: ((vulnerabilities.project_id = ANY ('{60457116,9450192,278964}'::bigint[])) AND (vulnerabilities.created_at >= '2020-12-01 00:00:00+00'::timestamp with time zone) AND (vulnerabilities.created_at <= '2020-12-31 23:59:59.999999+00'::timestamp with time zone))
         Heap Fetches: 3
         Buffers: shared hit=2941 read=65 dirtied=3
         WAL: records=3 fpi=3 bytes=23945
         I/O Timings: read=67.217 write=0.000
Settings: seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '472585MB', jit = 'off', random_page_cost = '1.5'
Time: 71.189 ms
  - planning: 1.665 ms
  - execution: 69.524 ms
    - I/O read: 67.217 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 2951 (~23.10 MiB) from the buffer pool
  - reads: 65 (~520.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 3 (~24.00 KiB)
  - writes: 0

See https://console.postgres.ai/gitlab/gitlab-production-main/sessions/33254/commands/102709

Related to #485660 (closed)

Edited by Fabien Catteau

Merge request reports

Loading