Filter vulnerability reads by the cartesian product of the given `IN` predicates
What does this MR do and why?
This MR introduces a performance optimization for IN
queries to speed up the queries generated by the VulnerabilityReadsFinder
class. Our initial plan was to use the already existing InOperatorOptimization
but I couldn't find a way to place it nicely in our finder class.
The main problem was that the finder class is operating on vulnerability_reads
table to speed up the filtering, sorting, and the pagination but since the GraphQL type is Vulnerability
, the ActiveRecord relation object returned by this finder class automagically resolves into a collection vulnerability
objects. For this reason, it was easier to implement this less efficient solution.
We will try to find a way to use InOperatorOptimization
after hopefully fixing the performance issues with this temporary solution.
Related to #342531.
Database Review
An example query without the optimization
SELECT
"vulnerability_reads".*
FROM
"vulnerability_reads"
WHERE
"vulnerability_reads"."project_id" = 278964
AND "vulnerability_reads"."report_type" IN (0, 4)
AND "vulnerability_reads"."severity" = 5
AND "vulnerability_reads"."state" IN (1, 4)
ORDER BY
"vulnerability_reads"."severity" DESC,
"vulnerability_reads"."vulnerability_id" DESC
LIMIT 20;
Limit (cost=0.56..3493.34 rows=20 width=141) (actual time=16823.255..57617.596 rows=20 loops=1)
Buffers: shared hit=105327 read=49501 dirtied=11957
I/O Timings: read=55805.355 write=0.000
-> Index Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads (cost=0.56..3142626.79 rows=17995 width=141) (actual time=16823.252..57617.571 rows=20 loops=1)
Filter: ((vulnerability_reads.report_type = ANY ('{0,4}'::integer[])) AND (vulnerability_reads.state = ANY ('{1,4}'::integer[])) AND (vulnerability_reads.project_id = 278964) AND (vulnerability_reads.severity = 5))
Rows Removed by Filter: 377838
Buffers: shared hit=105327 read=49501 dirtied=11957
I/O Timings: read=55805.355 write=0.000
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10587/commands/38090
The equivalent query with the optimization
SELECT
"vulnerability_reads".*
FROM
unnest('{0, 4}'::smallint[]) AS "report_types" ("report_type"),
unnest('{1, 4}'::smallint[]) AS "states" ("state"),
LATERAL (
SELECT
"vulnerability_reads".*
FROM
"vulnerability_reads"
WHERE
"vulnerability_reads"."project_id" = 278964
AND "vulnerability_reads"."severity" = 5
AND (vulnerability_reads."report_type" = "report_types"."report_type")
AND (vulnerability_reads."state" = "states"."state")
ORDER BY
"vulnerability_reads"."severity" DESC,
"vulnerability_reads"."vulnerability_id" DESC
LIMIT 20) AS vulnerability_reads
ORDER BY
"vulnerability_reads"."severity" DESC,
"vulnerability_reads"."vulnerability_id" DESC
LIMIT 20
Limit (cost=117.53..117.58 rows=20 width=141) (actual time=0.450..0.457 rows=20 loops=1)
Buffers: shared hit=47
I/O Timings: read=0.000 write=0.000
-> Sort (cost=117.53..117.73 rows=80 width=141) (actual time=0.448..0.452 rows=20 loops=1)
Sort Key: vulnerability_reads.severity DESC, vulnerability_reads.vulnerability_id DESC
Sort Method: quicksort Memory: 27kB
Buffers: shared hit=47
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=0.57..115.40 rows=80 width=141) (actual time=0.147..0.326 rows=20 loops=1)
Buffers: shared hit=41
I/O Timings: read=0.000 write=0.000
-> Function Scan on unnest states (cost=0.00..0.02 rows=2 width=2) (actual time=0.016..0.017 rows=2 loops=1)
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=0.57..57.29 rows=40 width=141) (actual time=0.076..0.149 rows=10 loops=2)
Buffers: shared hit=41
I/O Timings: read=0.000 write=0.000
-> Function Scan on unnest report_types (cost=0.00..0.02 rows=2 width=2) (actual time=0.003..0.003 rows=2 loops=2)
I/O Timings: read=0.000 write=0.000
-> Limit (cost=0.56..28.23 rows=20 width=141) (actual time=0.043..0.069 rows=5 loops=4)
Buffers: shared hit=41
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_vulnerability_reads_common_finder_query on public.vulnerability_reads (cost=0.56..1093.43 rows=790 width=141) (actual time=0.042..0.067 rows=5 loops=4)
Index Cond: ((vulnerability_reads.project_id = 278964) AND (vulnerability_reads.state = states.state) AND (vulnerability_reads.report_type = report_types.report_type) AND (vulnerability_reads.severity = 5))
Buffers: shared hit=41
I/O Timings: read=0.000 write=0.000
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10587/commands/38091
An optimized paginated query example
SELECT
"vulnerability_reads".*
FROM
unnest('{6,2,5,3,1,0,4,99}'::smallint[]) AS "report_types" ("report_type"),
unnest('{1,4}'::smallint[]) AS "states" ("state"),
LATERAL (
SELECT
"vulnerability_reads".*
FROM
"vulnerability_reads"
WHERE
"vulnerability_reads"."project_id" = 278964
AND (("vulnerability_reads"."severity",
"vulnerability_reads"."vulnerability_id") < (7,
46556621))
AND (vulnerability_reads."report_type" = "report_types"."report_type")
AND (vulnerability_reads."state" = "states"."state")
ORDER BY
"vulnerability_reads"."severity" DESC,
"vulnerability_reads"."vulnerability_id" DESC
LIMIT 20) AS vulnerability_reads
ORDER BY
"vulnerability_reads"."severity" DESC,
"vulnerability_reads"."vulnerability_id" DESC
LIMIT 20
Limit (cost=39907.82..39907.87 rows=20 width=141) (actual time=9.147..9.155 rows=20 loops=1)
Buffers: shared hit=2659
I/O Timings: read=0.000 write=0.000
-> Sort (cost=39907.82..39908.62 rows=320 width=141) (actual time=9.146..9.150 rows=20 loops=1)
Sort Key: vulnerability_reads.severity DESC, vulnerability_reads.vulnerability_id DESC
Sort Method: top-N heapsort Memory: 30kB
Buffers: shared hit=2659
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=2493.05..39899.30 rows=320 width=141) (actual time=3.005..9.040 rows=84 loops=1)
Buffers: shared hit=2653
I/O Timings: read=0.000 write=0.000
-> Function Scan on unnest states (cost=0.00..0.02 rows=2 width=2) (actual time=0.013..0.014 rows=2 loops=1)
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=2493.05..19948.04 rows=160 width=141) (actual time=1.509..4.500 rows=42 loops=2)
Buffers: shared hit=2653
I/O Timings: read=0.000 write=0.000
-> Function Scan on unnest report_types (cost=0.00..0.08 rows=8 width=2) (actual time=0.004..0.006 rows=8 loops=2)
I/O Timings: read=0.000 write=0.000
-> Limit (cost=2493.04..2493.09 rows=20 width=141) (actual time=0.558..0.559 rows=5 loops=16)
Buffers: shared hit=2653
I/O Timings: read=0.000 write=0.000
-> Sort (cost=2493.04..2497.47 rows=1772 width=141) (actual time=0.557..0.557 rows=5 loops=16)
Sort Key: vulnerability_reads.severity DESC, vulnerability_reads.vulnerability_id DESC
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=2653
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_vulnerability_reads_common_finder_query on public.vulnerability_reads (cost=0.56..2445.89 rows=1772 width=141) (actual time=0.010..0.414 rows=652 loops=16)
Index Cond: ((vulnerability_reads.project_id = 278964) AND (vulnerability_reads.state = states.state) AND (vulnerability_reads.report_type = report_types.report_type) AND (ROW(vulnerability_reads.severity, vulnerability_reads.vulnerability_id) < ROW(7, 46556621)))
Buffers: shared hit=2653
I/O Timings: read=0.000 write=0.000
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10656/commands/38438
An optimized `has_next_page?` query
SELECT
1 AS one
FROM
unnest('{6,2,5,3,1,0,4,99}'::smallint[]) AS "report_types" ("report_type"),
unnest('{1,4}'::smallint[]) AS "states" ("state"),
LATERAL (
SELECT
"vulnerability_reads".*
FROM
"vulnerability_reads"
WHERE
"vulnerability_reads"."project_id" = 278964
AND (("vulnerability_reads"."severity",
"vulnerability_reads"."vulnerability_id") < (7,
44669986))
AND (vulnerability_reads."report_type" = "report_types"."report_type")
AND (vulnerability_reads."state" = "states"."state")
ORDER BY
"vulnerability_reads"."severity" DESC,
"vulnerability_reads"."vulnerability_id" DESC
LIMIT 1 OFFSET 20) AS vulnerability_reads
LIMIT 1
Limit (cost=214.50..415.63 rows=1 width=4) (actual time=5.967..5.972 rows=1 loops=1)
Buffers: shared hit=137
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=214.50..3432.60 rows=16 width=4) (actual time=5.965..5.969 rows=1 loops=1)
Buffers: shared hit=137
I/O Timings: read=0.000 write=0.000
-> Function Scan on unnest states (cost=0.00..0.02 rows=2 width=2) (actual time=0.012..0.012 rows=1 loops=1)
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=214.50..1716.21 rows=8 width=0) (actual time=5.949..5.952 rows=1 loops=1)
Buffers: shared hit=137
I/O Timings: read=0.000 write=0.000
-> Function Scan on unnest report_types (cost=0.00..0.08 rows=8 width=2) (actual time=0.007..0.008 rows=4 loops=1)
I/O Timings: read=0.000 write=0.000
-> Limit (cost=214.49..214.50 rows=1 width=120) (actual time=1.484..1.484 rows=0 loops=4)
Buffers: shared hit=137
I/O Timings: read=0.000 write=0.000
-> Sort (cost=214.44..218.87 rows=1772 width=120) (actual time=1.482..1.483 rows=5 loops=4)
Sort Key: vulnerability_reads.severity DESC, vulnerability_reads.vulnerability_id DESC
Sort Method: top-N heapsort Memory: 26kB
Buffers: shared hit=137
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using index_vulnerability_reads_common_finder_query on public.vulnerability_reads (cost=0.56..166.67 rows=1772 width=120) (actual time=0.032..0.825 rows=1158 loops=4)
Index Cond: ((vulnerability_reads.project_id = 278964) AND (vulnerability_reads.state = states.state) AND (vulnerability_reads.report_type = report_types.report_type) AND (ROW(vulnerability_reads.severity, vulnerability_reads.vulnerability_id) < ROW(7, 44669986)))
Heap Fetches: 95
Buffers: shared hit=131
I/O Timings: read=0.000 write=0.000
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10656/commands/38441
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.