Skip to content

Filter vulnerability reads by the cartesian product of the given `IN` predicates

What does this MR do and why?

This MR introduces a performance optimization for IN queries to speed up the queries generated by the VulnerabilityReadsFinder class. Our initial plan was to use the already existing InOperatorOptimization but I couldn't find a way to place it nicely in our finder class.

The main problem was that the finder class is operating on vulnerability_reads table to speed up the filtering, sorting, and the pagination but since the GraphQL type is Vulnerability, the ActiveRecord relation object returned by this finder class automagically resolves into a collection vulnerability objects. For this reason, it was easier to implement this less efficient solution.

We will try to find a way to use InOperatorOptimization after hopefully fixing the performance issues with this temporary solution.

Related to #342531.

Database Review

An example query without the optimization
SELECT
    "vulnerability_reads".*
FROM
    "vulnerability_reads"
WHERE
    "vulnerability_reads"."project_id" = 278964
    AND "vulnerability_reads"."report_type" IN (0, 4)
    AND "vulnerability_reads"."severity" = 5
    AND "vulnerability_reads"."state" IN (1, 4)
ORDER BY
    "vulnerability_reads"."severity" DESC,
    "vulnerability_reads"."vulnerability_id" DESC
LIMIT 20;
Limit  (cost=0.56..3493.34 rows=20 width=141) (actual time=16823.255..57617.596 rows=20 loops=1)
   Buffers: shared hit=105327 read=49501 dirtied=11957
   I/O Timings: read=55805.355 write=0.000
   ->  Index Scan using index_vulnerability_reads_on_vulnerability_id on public.vulnerability_reads  (cost=0.56..3142626.79 rows=17995 width=141) (actual time=16823.252..57617.571 rows=20 loops=1)
         Filter: ((vulnerability_reads.report_type = ANY ('{0,4}'::integer[])) AND (vulnerability_reads.state = ANY ('{1,4}'::integer[])) AND (vulnerability_reads.project_id = 278964) AND (vulnerability_reads.severity = 5))
         Rows Removed by Filter: 377838
         Buffers: shared hit=105327 read=49501 dirtied=11957
         I/O Timings: read=55805.355 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10587/commands/38090

The equivalent query with the optimization
SELECT
    "vulnerability_reads".*
FROM
    unnest('{0, 4}'::smallint[]) AS "report_types" ("report_type"),
    unnest('{1, 4}'::smallint[]) AS "states" ("state"),
    LATERAL (
        SELECT
            "vulnerability_reads".*
        FROM
            "vulnerability_reads"
        WHERE
            "vulnerability_reads"."project_id" = 278964
            AND "vulnerability_reads"."severity" = 5
            AND (vulnerability_reads."report_type" = "report_types"."report_type")
            AND (vulnerability_reads."state" = "states"."state")
        ORDER BY
            "vulnerability_reads"."severity" DESC,
            "vulnerability_reads"."vulnerability_id" DESC
        LIMIT 20) AS vulnerability_reads
ORDER BY
    "vulnerability_reads"."severity" DESC,
    "vulnerability_reads"."vulnerability_id" DESC
LIMIT 20
Limit  (cost=117.53..117.58 rows=20 width=141) (actual time=0.450..0.457 rows=20 loops=1)
   Buffers: shared hit=47
   I/O Timings: read=0.000 write=0.000
   ->  Sort  (cost=117.53..117.73 rows=80 width=141) (actual time=0.448..0.452 rows=20 loops=1)
         Sort Key: vulnerability_reads.severity DESC, vulnerability_reads.vulnerability_id DESC
         Sort Method: quicksort  Memory: 27kB
         Buffers: shared hit=47
         I/O Timings: read=0.000 write=0.000
         ->  Nested Loop  (cost=0.57..115.40 rows=80 width=141) (actual time=0.147..0.326 rows=20 loops=1)
               Buffers: shared hit=41
               I/O Timings: read=0.000 write=0.000
               ->  Function Scan on unnest states  (cost=0.00..0.02 rows=2 width=2) (actual time=0.016..0.017 rows=2 loops=1)
                     I/O Timings: read=0.000 write=0.000
               ->  Nested Loop  (cost=0.57..57.29 rows=40 width=141) (actual time=0.076..0.149 rows=10 loops=2)
                     Buffers: shared hit=41
                     I/O Timings: read=0.000 write=0.000
                     ->  Function Scan on unnest report_types  (cost=0.00..0.02 rows=2 width=2) (actual time=0.003..0.003 rows=2 loops=2)
                           I/O Timings: read=0.000 write=0.000
                     ->  Limit  (cost=0.56..28.23 rows=20 width=141) (actual time=0.043..0.069 rows=5 loops=4)
                           Buffers: shared hit=41
                           I/O Timings: read=0.000 write=0.000
                           ->  Index Scan using index_vulnerability_reads_common_finder_query on public.vulnerability_reads  (cost=0.56..1093.43 rows=790 width=141) (actual time=0.042..0.067 rows=5 loops=4)
                                 Index Cond: ((vulnerability_reads.project_id = 278964) AND (vulnerability_reads.state = states.state) AND (vulnerability_reads.report_type = report_types.report_type) AND (vulnerability_reads.severity = 5))
                                 Buffers: shared hit=41
                                 I/O Timings: read=0.000 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10587/commands/38091

An optimized paginated query example
SELECT
    "vulnerability_reads".*
FROM
    unnest('{6,2,5,3,1,0,4,99}'::smallint[]) AS "report_types" ("report_type"),
    unnest('{1,4}'::smallint[]) AS "states" ("state"),
    LATERAL (
        SELECT
            "vulnerability_reads".*
        FROM
            "vulnerability_reads"
        WHERE
            "vulnerability_reads"."project_id" = 278964
            AND (("vulnerability_reads"."severity",
                    "vulnerability_reads"."vulnerability_id") < (7,
                    46556621))
            AND (vulnerability_reads."report_type" = "report_types"."report_type")
            AND (vulnerability_reads."state" = "states"."state")
        ORDER BY
            "vulnerability_reads"."severity" DESC,
            "vulnerability_reads"."vulnerability_id" DESC
        LIMIT 20) AS vulnerability_reads
ORDER BY
    "vulnerability_reads"."severity" DESC,
    "vulnerability_reads"."vulnerability_id" DESC
LIMIT 20
Limit  (cost=39907.82..39907.87 rows=20 width=141) (actual time=9.147..9.155 rows=20 loops=1)
   Buffers: shared hit=2659
   I/O Timings: read=0.000 write=0.000
   ->  Sort  (cost=39907.82..39908.62 rows=320 width=141) (actual time=9.146..9.150 rows=20 loops=1)
         Sort Key: vulnerability_reads.severity DESC, vulnerability_reads.vulnerability_id DESC
         Sort Method: top-N heapsort  Memory: 30kB
         Buffers: shared hit=2659
         I/O Timings: read=0.000 write=0.000
         ->  Nested Loop  (cost=2493.05..39899.30 rows=320 width=141) (actual time=3.005..9.040 rows=84 loops=1)
               Buffers: shared hit=2653
               I/O Timings: read=0.000 write=0.000
               ->  Function Scan on unnest states  (cost=0.00..0.02 rows=2 width=2) (actual time=0.013..0.014 rows=2 loops=1)
                     I/O Timings: read=0.000 write=0.000
               ->  Nested Loop  (cost=2493.05..19948.04 rows=160 width=141) (actual time=1.509..4.500 rows=42 loops=2)
                     Buffers: shared hit=2653
                     I/O Timings: read=0.000 write=0.000
                     ->  Function Scan on unnest report_types  (cost=0.00..0.08 rows=8 width=2) (actual time=0.004..0.006 rows=8 loops=2)
                           I/O Timings: read=0.000 write=0.000
                     ->  Limit  (cost=2493.04..2493.09 rows=20 width=141) (actual time=0.558..0.559 rows=5 loops=16)
                           Buffers: shared hit=2653
                           I/O Timings: read=0.000 write=0.000
                           ->  Sort  (cost=2493.04..2497.47 rows=1772 width=141) (actual time=0.557..0.557 rows=5 loops=16)
                                 Sort Key: vulnerability_reads.severity DESC, vulnerability_reads.vulnerability_id DESC
                                 Sort Method: quicksort  Memory: 25kB
                                 Buffers: shared hit=2653
                                 I/O Timings: read=0.000 write=0.000
                                 ->  Index Scan using index_vulnerability_reads_common_finder_query on public.vulnerability_reads  (cost=0.56..2445.89 rows=1772 width=141) (actual time=0.010..0.414 rows=652 loops=16)
                                       Index Cond: ((vulnerability_reads.project_id = 278964) AND (vulnerability_reads.state = states.state) AND (vulnerability_reads.report_type = report_types.report_type) AND (ROW(vulnerability_reads.severity, vulnerability_reads.vulnerability_id) < ROW(7, 46556621)))
                                       Buffers: shared hit=2653
                                       I/O Timings: read=0.000 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10656/commands/38438

An optimized `has_next_page?` query
SELECT
    1 AS one
FROM
    unnest('{6,2,5,3,1,0,4,99}'::smallint[]) AS "report_types" ("report_type"),
    unnest('{1,4}'::smallint[]) AS "states" ("state"),
    LATERAL (
        SELECT
            "vulnerability_reads".*
        FROM
            "vulnerability_reads"
        WHERE
            "vulnerability_reads"."project_id" = 278964
            AND (("vulnerability_reads"."severity",
                    "vulnerability_reads"."vulnerability_id") < (7,
                    44669986))
            AND (vulnerability_reads."report_type" = "report_types"."report_type")
            AND (vulnerability_reads."state" = "states"."state")
        ORDER BY
            "vulnerability_reads"."severity" DESC,
            "vulnerability_reads"."vulnerability_id" DESC
        LIMIT 1 OFFSET 20) AS vulnerability_reads
LIMIT 1
Limit  (cost=214.50..415.63 rows=1 width=4) (actual time=5.967..5.972 rows=1 loops=1)
   Buffers: shared hit=137
   I/O Timings: read=0.000 write=0.000
   ->  Nested Loop  (cost=214.50..3432.60 rows=16 width=4) (actual time=5.965..5.969 rows=1 loops=1)
         Buffers: shared hit=137
         I/O Timings: read=0.000 write=0.000
         ->  Function Scan on unnest states  (cost=0.00..0.02 rows=2 width=2) (actual time=0.012..0.012 rows=1 loops=1)
               I/O Timings: read=0.000 write=0.000
         ->  Nested Loop  (cost=214.50..1716.21 rows=8 width=0) (actual time=5.949..5.952 rows=1 loops=1)
               Buffers: shared hit=137
               I/O Timings: read=0.000 write=0.000
               ->  Function Scan on unnest report_types  (cost=0.00..0.08 rows=8 width=2) (actual time=0.007..0.008 rows=4 loops=1)
                     I/O Timings: read=0.000 write=0.000
               ->  Limit  (cost=214.49..214.50 rows=1 width=120) (actual time=1.484..1.484 rows=0 loops=4)
                     Buffers: shared hit=137
                     I/O Timings: read=0.000 write=0.000
                     ->  Sort  (cost=214.44..218.87 rows=1772 width=120) (actual time=1.482..1.483 rows=5 loops=4)
                           Sort Key: vulnerability_reads.severity DESC, vulnerability_reads.vulnerability_id DESC
                           Sort Method: top-N heapsort  Memory: 26kB
                           Buffers: shared hit=137
                           I/O Timings: read=0.000 write=0.000
                           ->  Index Only Scan using index_vulnerability_reads_common_finder_query on public.vulnerability_reads  (cost=0.56..166.67 rows=1772 width=120) (actual time=0.032..0.825 rows=1158 loops=4)
                                 Index Cond: ((vulnerability_reads.project_id = 278964) AND (vulnerability_reads.state = states.state) AND (vulnerability_reads.report_type = report_types.report_type) AND (ROW(vulnerability_reads.severity, vulnerability_reads.vulnerability_id) < ROW(7, 44669986)))
                                 Heap Fetches: 95
                                 Buffers: shared hit=131
                                 I/O Timings: read=0.000 write=0.000

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/10656/commands/38441

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Mehmet Emin INAC

Merge request reports