Skip to content

Sort vulnerability list by primary identifier

What does this MR do?

This MR allows users to sort the vulnerability list by the Primary Identifier.

Related to #258842 (closed)

Database Review

This MR introduces one migration to create 3 indices to speed up the sorting by primary_identifier relation of vulnerability_occurrences by just doing Index Only Scan to avoid reading from disk.

rake db:migrate:up

== 20201207173651 AddPrimaryIdentifierSortingIndices: migrating ===============
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:vulnerability_occurrences, [:primary_identifier_id, :vulnerability_id], {:name=>"index_vulnerability_occurrences_on_identifier_and_vulnerability", :algorithm=>:concurrently})
   -> 0.0037s
-- add_index(:vulnerability_occurrences, [:primary_identifier_id, :vulnerability_id], {:name=>"index_vulnerability_occurrences_on_identifier_and_vulnerability", :algorithm=>:concurrently})
   -> 0.0044s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:vulnerability_identifiers, [:project_id, :external_id, :id], {:order=>{:external_id=>:asc, :id=>:desc}, :name=>"index_vulnerability_identifiers_sorted_asc_by_external_id", :algorithm=>:concurrently})
   -> 0.0011s
-- add_index(:vulnerability_identifiers, [:project_id, :external_id, :id], {:order=>{:external_id=>:asc, :id=>:desc}, :name=>"index_vulnerability_identifiers_sorted_asc_by_external_id", :algorithm=>:concurrently})
   -> 0.0020s
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:vulnerability_identifiers, [:project_id, :external_id, :id], {:order=>{:external_id=>:desc, :id=>:desc}, :name=>"index_vulnerability_identifiers_sorted_desc_by_external_id", :algorithm=>:concurrently})
   -> 0.0012s
-- add_index(:vulnerability_identifiers, [:project_id, :external_id, :id], {:order=>{:external_id=>:desc, :id=>:desc}, :name=>"index_vulnerability_identifiers_sorted_desc_by_external_id", :algorithm=>:concurrently})
   -> 0.0028s
== 20201207173651 AddPrimaryIdentifierSortingIndices: migrated (0.0169s) ======

rake db:rollback

== 20201207173651 AddPrimaryIdentifierSortingIndices: reverting ===============
-- transaction_open?()
   -> 0.0000s
-- indexes(:vulnerability_occurrences)
   -> 0.0041s
-- remove_index(:vulnerability_occurrences, {:algorithm=>:concurrently, :name=>"index_vulnerability_occurrences_on_identifier_and_vulnerability"})
   -> 0.0031s
-- transaction_open?()
   -> 0.0000s
-- indexes(:vulnerability_identifiers)
   -> 0.0014s
-- remove_index(:vulnerability_identifiers, {:algorithm=>:concurrently, :name=>"index_vulnerability_identifiers_sorted_asc_by_external_id"})
   -> 0.0010s
-- transaction_open?()
   -> 0.0000s
-- indexes(:vulnerability_identifiers)
   -> 0.0012s
-- remove_index(:vulnerability_identifiers, {:algorithm=>:concurrently, :name=>"index_vulnerability_identifiers_sorted_desc_by_external_id"})
   -> 0.0010s
== 20201207173651 AddPrimaryIdentifierSortingIndices: reverted (0.0136s) ======

An example query which will utilize these new indices;

SELECT 
  "vulnerabilities".* 
FROM "vulnerabilities" 
INNER JOIN "vulnerability_occurrences" ON "vulnerability_occurrences"."vulnerability_id" = "vulnerabilities"."id" 
INNER JOIN "vulnerability_identifiers" ON "vulnerability_identifiers"."id" = "vulnerability_occurrences"."primary_identifier_id" 
WHERE 
  "vulnerabilities"."project_id" = 278964 AND 
  "vulnerability_identifiers"."project_id" = "vulnerabilities"."project_id" AND -- This is necessary to avoid scanning the whole index
  "vulnerabilities"."state" IN (1, 4) 
ORDER BY 
  "vulnerability_identifiers"."external_id" DESC, 
  "vulnerability_identifiers"."id" DESC 
LIMIT 20

And the query plan for this;

 Limit  (cost=1.29..459.50 rows=4 width=304) (actual time=0.089..0.281 rows=20 loops=1)
   Buffers: shared hit=108
   ->  Nested Loop  (cost=1.29..459.50 rows=4 width=304) (actual time=0.088..0.275 rows=20 loops=1)
         Buffers: shared hit=108
         ->  Nested Loop  (cost=0.86..299.13 rows=328 width=36) (actual time=0.062..0.085 rows=20 loops=1)
               Buffers: shared hit=28
               ->  Index Only Scan using test_index_ordered_5 on public.vulnerability_identifiers  (cost=0.43..5.62 rows=66 width=28) (actual time=0.022..0.023 rows=2 loops=1)
                     Index Cond: (vulnerability_identifiers.project_id = 278964)
                     Heap Fetches: 0
                     Buffers: shared hit=4
               ->  Index Only Scan using test_index_ordered_4 on public.vulnerability_occurrences  (cost=0.43..3.85 rows=60 width=16) (actual time=0.017..0.026 rows=10 loops=2)
                     Index Cond: (vulnerability_occurrences.primary_identifier_id = vulnerability_identifiers.id)
                     Heap Fetches: 0
                     Buffers: shared hit=24
         ->  Index Scan using vulnerabilities_pkey on public.vulnerabilities  (cost=0.43..0.49 rows=1 width=280) (actual time=0.008..0.008 rows=1 loops=20)
               Index Cond: (vulnerabilities.id = vulnerability_occurrences.vulnerability_id)
               Filter: ((vulnerabilities.state = ANY ('{1,4}'::integer[])) AND (vulnerabilities.project_id = 278964))
               Rows Removed by Filter: 0
               Buffers: shared hit=80

https://explain.depesz.com/s/r8wp

Without the indices in place, it takes around ~400ms to run the above query with the following query plan;

Limit  (cost=3946.70..3946.71 rows=4 width=304) (actual time=358.328..358.335 rows=20 loops=1)
   Buffers: shared hit=317269
   ->  Sort  (cost=3946.70..3946.71 rows=4 width=304) (actual time=358.326..358.330 rows=20 loops=1)
         Sort Key: vulnerability_identifiers.external_id DESC, vulnerability_identifiers.id DESC
         Sort Method: top-N heapsort  Memory: 34kB
         Buffers: shared hit=317269
         ->  Nested Loop  (cost=1.29..3946.66 rows=4 width=304) (actual time=0.077..314.453 rows=58546 loops=1)
               Buffers: shared hit=317269
               ->  Nested Loop  (cost=0.86..3786.79 rows=327 width=36) (actual time=0.056..89.295 rows=64637 loops=1)
                     Buffers: shared hit=63196
                     ->  Index Scan using index_vulnerability_identifiers_on_project_id_and_fingerprint on public.vulnerability_identifiers  (cost=0.43..72.32 rows=65 width=28) (actual time=0.011..0.272 rows=256 loops=1)
                           Index Cond: (vulnerability_identifiers.project_id = 278964)
                           Buffers: shared hit=259
                     ->  Index Scan using index_vulnerability_occurrences_on_primary_identifier_id on public.vulnerability_occurrences  (cost=0.43..56.55 rows=60 width=16) (actual time=0.005..0.287 rows=252 loops=256)
                           Index Cond: (vulnerability_occurrences.primary_identifier_id = vulnerability_identifiers.id)
                           Buffers: shared hit=62937
               ->  Index Scan using vulnerabilities_pkey on public.vulnerabilities  (cost=0.43..0.49 rows=1 width=280) (actual time=0.003..0.003 rows=1 loops=64637)
                     Index Cond: (vulnerabilities.id = vulnerability_occurrences.vulnerability_id)
                     Filter: ((vulnerabilities.state = ANY ('{1,4}'::integer[])) AND (vulnerabilities.project_id = 278964))
                     Rows Removed by Filter: 0
                     Buffers: shared hit=254073

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Mehmet Emin INAC

Merge request reports