Add security inventory based filtering

What does this MR do and why?

Adds project filtering capabilities based on the security_inventory_filters table in sec_db, when loading projects from main_db. This MR manually calculates pagination-related fields using data from the sec_db table, and passes ids to be used for fetching projects from main_db.

Changelog: added
EE: true

How to set up and validate locally

  1. Add some vulnerabilities to your project using this repo.
  2. Use the code in this MR to backfill the inventory filters data table
  3. Use this GraphQL query:
     query GetNamespaceSecurityProjects(
       $namespaceId: NamespaceID!
       $search: String
       $vulnerabilityCountFilters: [VulnerabilityCountFilterInput!]
       $securityAnalyzerFilters: [AnalyzerFilterInput!]
       $first: Int
       $after: String
       $last: Int
       $before: String
     ) {
       namespaceSecurityProjects(
         namespaceId: $namespaceId
         search: $search
         vulnerabilityCountFilters: $vulnerabilityCountFilters
         securityAnalyzerFilters: $securityAnalyzerFilters
         first: $first
         after: $after
         last: $last
         before: $before
       ) {
         edges {
           node {
             id
             name
             path
             fullPath
           }
         }
         pageInfo {
           hasNextPage
           hasPreviousPage
           startCursor
           endCursor
         }
       }
     }
    With some parameters. For example:
     {
       "namespaceId": "gid://gitlab/Group/XXXXX",
       "search": "XXXXXXX",
       "vulnerabilityCountFilters": [
         {
           "operator": "GREATER_THAN_OR_EQUAL_TO",
           "count": 5,
           "severity": "CRITICAL"
         }
       ],
       "securityAnalyzerFilters": [
         {
           "analyzerType": "SAST",
           "status": "SUCCESS"
         }
       ],
       "first": 10
     }

Query plans

by_severity_count

Raw SQL
SELECT
    "security_inventory_filters".*
FROM
    "security_inventory_filters"
WHERE (security_inventory_filters.traversal_ids >= '{9970}'
    AND '{9971}' > security_inventory_filters.traversal_ids)
AND "security_inventory_filters"."critical" <= 5
AND "security_inventory_filters"."high" >= 7
ORDER BY
    "security_inventory_filters"."project_id" ASC,
    "security_inventory_filters"."id" DESC
LIMIT 21
Plan

See full plan here.

  Limit  (cost=2259.10..9812.87 rows=14 width=142) (actual time=122.498..126.403 rows=10 loops=1)
   Buffers: shared hit=161064 read=4405 dirtied=3012
   WAL: records=3443 fpi=3012 bytes=23952931
   I/O Timings: read=89.550 write=0.000
   ->  Gather Merge  (cost=2259.10..9812.87 rows=14 width=142) (actual time=122.495..126.399 rows=10 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=161064 read=4405 dirtied=3012
         WAL: records=3443 fpi=3012 bytes=23952931
         I/O Timings: read=89.550 write=0.000
         ->  Incremental Sort  (cost=1259.07..8811.24 rows=7 width=142) (actual time=117.656..117.658 rows=3 loops=3)
               Sort Key: security_inventory_filters.project_id, security_inventory_filters.id DESC
               Buffers: shared hit=161064 read=4405 dirtied=3012
               WAL: records=3443 fpi=3012 bytes=23952931
               I/O Timings: read=89.550 write=0.000
               ->  Parallel Index Scan using index_security_inventory_filters_on_project_id on public.security_inventory_filters  (cost=0.42..8810.92 rows=7 width=142) (actual time=56.433..117.537 rows=3 loops=3)
                     Filter: ((security_inventory_filters.traversal_ids >= '{9970}'::bigint[]) AND ('{9971}'::bigint[] > security_inventory_filters.traversal_ids) AND (security_inventory_filters.critical <= 5) AND (security_inventory_filters.high >= 7))
                     Rows Removed by Filter: 54914
                     Buffers: shared hit=160849 read=4405 dirtied=3012
                     WAL: records=3443 fpi=3012 bytes=23952931
                     I/O Timings: read=89.550 write=0.000
Settings: seq_page_cost = '4', random_page_cost = '1.5', work_mem = '100MB', effective_cache_size = '338688MB', jit = 'off'

by_analyzer_status

Raw SQL
SELECT
    "security_inventory_filters".*
FROM
    "security_inventory_filters"
WHERE (security_inventory_filters.traversal_ids >= '{9970}'
    AND '{9971}' > security_inventory_filters.traversal_ids)
AND "security_inventory_filters"."secret_detection" = 1
AND "security_inventory_filters"."dependency_scanning" = 2
ORDER BY
    "security_inventory_filters"."project_id" ASC,
    "security_inventory_filters"."id" DESC
LIMIT 21
Plan

See full plan here.

 Limit  (cost=2259.10..9812.87 rows=14 width=142) (actual time=55.575..58.879 rows=1 loops=1)
   Buffers: shared hit=164910
   I/O Timings: read=0.000 write=0.000
   ->  Gather Merge  (cost=2259.10..9812.87 rows=14 width=142) (actual time=55.573..58.876 rows=1 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=164910
         I/O Timings: read=0.000 write=0.000
         ->  Incremental Sort  (cost=1259.07..8811.24 rows=7 width=142) (actual time=52.231..52.232 rows=0 loops=3)
               Sort Key: security_inventory_filters.project_id, security_inventory_filters.id DESC
               Buffers: shared hit=164910
               I/O Timings: read=0.000 write=0.000
               ->  Parallel Index Scan using index_security_inventory_filters_on_project_id on public.security_inventory_filters  (cost=0.42..8810.92 rows=7 width=142) (actual time=33.786..52.109 rows=0 loops=3)
                     Filter: ((security_inventory_filters.traversal_ids >= '{9970}'::bigint[]) AND ('{9971}'::bigint[] > security_inventory_filters.traversal_ids) AND (security_inventory_filters.secret_detection = 1) AND (security_inventory_filters.dependency_scanning = 2))
                     Rows Removed by Filter: 54917
                     Buffers: shared hit=164695
                     I/O Timings: read=0.000 write=0.000
Settings: work_mem = '100MB', effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4', random_page_cost = '1.5'

Both

Raw SQL
SELECT
    "security_inventory_filters".*
FROM
    "security_inventory_filters"
WHERE (security_inventory_filters.traversal_ids >= '{9970}'
    AND '{9971}' > security_inventory_filters.traversal_ids)
AND "security_inventory_filters"."critical" >= 3
AND "security_inventory_filters"."secret_detection" = 1
ORDER BY
    "security_inventory_filters"."project_id" ASC,
    "security_inventory_filters"."id" DESC
LIMIT 21
Plan

See full plan here.

  Limit  (cost=3937.29..9811.77 rows=6 width=142) (actual time=50.111..53.018 rows=14 loops=1)
   Buffers: shared hit=164910
   I/O Timings: read=0.000 write=0.000
   ->  Gather Merge  (cost=3937.29..9811.77 rows=6 width=142) (actual time=50.110..53.013 rows=14 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=164910
         I/O Timings: read=0.000 write=0.000
         ->  Incremental Sort  (cost=2937.26..8811.06 rows=3 width=142) (actual time=46.854..46.858 rows=5 loops=3)
               Sort Key: security_inventory_filters.project_id, security_inventory_filters.id DESC
               Buffers: shared hit=164910
               I/O Timings: read=0.000 write=0.000
               ->  Parallel Index Scan using index_security_inventory_filters_on_project_id on public.security_inventory_filters  (cost=0.42..8810.92 rows=3 width=142) (actual time=3.311..46.721 rows=5 loops=3)
                     Filter: ((security_inventory_filters.traversal_ids >= '{9970}'::bigint[]) AND ('{9971}'::bigint[] > security_inventory_filters.traversal_ids) AND (security_inventory_filters.critical >= 3) AND (security_inventory_filters.secret_detection = 1))
                     Rows Removed by Filter: 54913
                     Buffers: shared hit=164695
                     I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', work_mem = '100MB', effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4'

After adding index on traversal_ids:

by_severity_count

Raw SQL
SELECT
    "security_inventory_filters".*
FROM
    "security_inventory_filters"
WHERE (security_inventory_filters.traversal_ids >= '{9970}'
    AND '{9971}' > security_inventory_filters.traversal_ids)
AND "security_inventory_filters"."critical" <= 5
AND "security_inventory_filters"."high" >= 7
ORDER BY
    "security_inventory_filters"."project_id" ASC,
    "security_inventory_filters"."id" DESC
LIMIT 21
Plan

See full plan here.

  Limit  (cost=105.78..2311.52 rows=21 width=140) (actual time=20.601..20.605 rows=21 loops=1)
   Buffers: shared hit=14028
   I/O Timings: read=0.000 write=0.000
   ->  Incremental Sort  (cost=105.78..31196.29 rows=296 width=140) (actual time=20.599..20.602 rows=21 loops=1)
         Sort Key: security_inventory_filters.project_id, security_inventory_filters.id DESC
         Buffers: shared hit=14028
         I/O Timings: read=0.000 write=0.000
         ->  Index Scan using index_security_inventory_filters_on_project_id on public.security_inventory_filters  (cost=0.42..31182.97 rows=296 width=140) (actual time=0.065..20.577 rows=22 loops=1)
               Filter: ((security_inventory_filters.traversal_ids >= '{9970}'::bigint[]) AND ('{9971}'::bigint[] > security_inventory_filters.traversal_ids) AND (security_inventory_filters.critical <= 5) AND (security_inventory_filters.high >= 7))
               Rows Removed by Filter: 33384
               Buffers: shared hit=14022
               I/O Timings: read=0.000 write=0.000
Settings: seq_page_cost = '4', random_page_cost = '1.5', work_mem = '100MB', effective_cache_size = '338688MB', jit = 'off'

by_analyzer_status

Raw SQL
SELECT
    "security_inventory_filters".*
FROM
    "security_inventory_filters"
WHERE (security_inventory_filters.traversal_ids >= '{9970}'
    AND '{9971}' > security_inventory_filters.traversal_ids)
AND "security_inventory_filters"."secret_detection" = 1
AND "security_inventory_filters"."dependency_scanning" = 2
ORDER BY
    "security_inventory_filters"."project_id" ASC,
    "security_inventory_filters"."id" DESC
LIMIT 21
Plan

See full plan here.

 Limit  (cost=2411.89..2411.95 rows=21 width=140) (actual time=6.191..6.195 rows=21 loops=1)
   Buffers: shared hit=1493 read=14 dirtied=99
   WAL: records=139 fpi=139 bytes=1105403
   I/O Timings: read=0.368 write=0.000
   ->  Sort  (cost=2411.89..2412.09 rows=78 width=140) (actual time=6.190..6.192 rows=21 loops=1)
         Sort Key: security_inventory_filters.project_id, security_inventory_filters.id DESC
         Sort Method: top-N heapsort  Memory: 32kB
         Buffers: shared hit=1493 read=14 dirtied=99
         WAL: records=139 fpi=139 bytes=1105403
         I/O Timings: read=0.368 write=0.000
         ->  Index Scan using index_security_inventory_filters_on_traversal_ids on public.security_inventory_filters  (cost=0.42..2409.79 rows=78 width=140) (actual time=1.789..6.160 rows=58 loops=1)
               Index Cond: ((security_inventory_filters.traversal_ids >= '{9970}'::bigint[]) AND (security_inventory_filters.traversal_ids < '{9971}'::bigint[]))
               Filter: ((security_inventory_filters.secret_detection = 1) AND (security_inventory_filters.dependency_scanning = 2))
               Rows Removed by Filter: 1630
               Buffers: shared hit=1490 read=14 dirtied=99
               WAL: records=139 fpi=139 bytes=1105403
               I/O Timings: read=0.368 write=0.000
Settings: work_mem = '100MB', effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4', random_page_cost = '1.5'

Both

Raw SQL
SELECT
    "security_inventory_filters".*
FROM
    "security_inventory_filters"
WHERE (security_inventory_filters.traversal_ids >= '{9970}'
    AND '{9971}' > security_inventory_filters.traversal_ids)
AND "security_inventory_filters"."critical" >= 3
AND "security_inventory_filters"."secret_detection" = 1
ORDER BY
    "security_inventory_filters"."project_id" ASC,
    "security_inventory_filters"."id" DESC
LIMIT 21
Plan

See full plan here.

 Limit  (cost=2416.37..2416.42 rows=21 width=140) (actual time=4.508..4.512 rows=21 loops=1)
   Buffers: shared hit=1507
   I/O Timings: read=0.000 write=0.000
   ->  Sort  (cost=2416.37..2416.98 rows=244 width=140) (actual time=4.507..4.509 rows=21 loops=1)
         Sort Key: security_inventory_filters.project_id, security_inventory_filters.id DESC
         Sort Method: top-N heapsort  Memory: 34kB
         Buffers: shared hit=1507
         I/O Timings: read=0.000 write=0.000
         ->  Index Scan using index_security_inventory_filters_on_traversal_ids on public.security_inventory_filters  (cost=0.42..2409.79 rows=244 width=140) (actual time=0.066..4.463 rows=273 loops=1)
               Index Cond: ((security_inventory_filters.traversal_ids >= '{9970}'::bigint[]) AND (security_inventory_filters.traversal_ids < '{9971}'::bigint[]))
               Filter: ((security_inventory_filters.critical >= 3) AND (security_inventory_filters.secret_detection = 1))
               Rows Removed by Filter: 1415
               Buffers: shared hit=1504
               I/O Timings: read=0.000 write=0.000
Settings: effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4', random_page_cost = '1.5', work_mem = '100MB'

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to [Backend] Add project-level filtering capabilities (#542537 - closed) • Gal Katz • 18.7, Add GraphQL filtering based on the security_inv... (#554138 - closed) • Gal Katz • 18.4

Edited by Gal Katz

Merge request reports

Loading