Backfill missing empty projects records in inventory filters table

What does this MR do?

Backfills missing project records in the security_inventory_filters table to enable filtering by projects that have no security scans or vulnerabilities.

Approach

This migration iterates over the projects table in batches. For each batch:

  1. Fetch project data with namespace information: Joins projects with their project namespaces to retrieve project details and traversal hierarchy from main_db
  2. Extract root namespace IDs: Identifies the root namespace for each project by taking the first element of traversal_ids
  3. Filter by analyzer status: Queries analyzer_namespace_statuses on sec_db to identify which root namespaces have a matching record in the table
  4. Insert filtered records: Creates security_inventory_filters records only for projects belonging to security-enabled root namespaces

The table's unique constraint on project_id prevents duplicates—existing entries are preserved via ON CONFLICT DO NOTHING.

Why filter by analyzer status? By only backfilling projects within security-enabled hierarchies (those with analyzer_namespace_statuses records), we keep the table size manageable and focused on relevant data. Going forward, inventory filter records are automatically created for new projects, making this a one-time backfill operation.

Changelog: added
EE: true

Query Plans

fetch_project_data_with_namespaces

Raw SQL ```sql explain SELECT projects.id, projects.name, projects.archived, namespaces.traversal_ids FROM projects JOIN namespaces ON namespaces.id = projects.project_namespace_id WHERE projects.id BETWEEN 66101996 AND 66106996 AND projects.id >= 66103996 ```
Query plan

See this for more details

 Gather  (cost=1028.99..7067.15 rows=1597 width=49) (actual time=1.015..8.960 rows=1436 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Buffers: shared hit=8624
   I/O Timings: read=0.000 write=0.000
   ->  Nested Loop  (cost=28.99..5907.45 rows=939 width=49) (actual time=0.386..3.706 rows=718 loops=2)
         Buffers: shared hit=8624
         I/O Timings: read=0.000 write=0.000
         ->  Parallel Bitmap Heap Scan on public.projects  (cost=28.43..2545.54 rows=939 width=29) (actual time=0.348..1.526 rows=718 loops=2)
               Buffers: shared hit=1437
               I/O Timings: read=0.000 write=0.000
               ->  Bitmap Index Scan using projects_pkey  (cost=0.00..28.03 rows=1597 width=0) (actual time=0.490..0.490 rows=1440 loops=1)
                     Index Cond: ((projects.id >= 66101996) AND (projects.id <= 66106996) AND (projects.id >= 66103996))
                     Buffers: shared hit=8
                     I/O Timings: read=0.000 write=0.000
         ->  Index Scan using namespaces_pkey on public.namespaces  (cost=0.57..3.58 rows=1 width=32) (actual time=0.003..0.003 rows=1 loops=1436)
               Index Cond: (namespaces.id = projects.project_namespace_id)
               Buffers: shared hit=7187
               I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '472585MB', jit = 'off'

Analyzer namespace status check

Raw SQL ```sql SELECT "analyzer_namespace_statuses"."namespace_id" FROM "analyzer_namespace_statuses" WHERE "analyzer_namespace_statuses"."namespace_id" IN (9970, 9971, 9972, 9973, 9974, 115756369) AND (array_length(traversal_ids, 1) = 1) ```
Query plan

See this for more details

 Index Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses  (cost=0.43..43.07 rows=1 width=8) (actual time=0.022..0.081 rows=28 loops=1)
   Index Cond: (analyzer_namespace_statuses.namespace_id = ANY ('{9970,9971,9972,9973,9974,115756369}'::bigint[]))
   Filter: (array_length(analyzer_namespace_statuses.traversal_ids, 1) = 1)
   Rows Removed by Filter: 0
   Buffers: shared hit=49
   I/O Timings: read=0.000 write=0.000
Settings: effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4', work_mem = '100MB', random_page_cost = '1.5'

insert_all

Raw SQL ```sql INSERT INTO "security_inventory_filters" ("project_id","project_name","traversal_ids","archived") VALUES (194, 'project-1', '{405}', FALSE), (195, 'project-2', '{101,202}', TRUE), (196, 'project-3', '{303}', FALSE), (197, 'project-4', '{404,505,606}', FALSE), (198, 'project-5', '{707,808}', TRUE), (199, 'project-6', '{909}', FALSE), (200, 'project-7', '{111,222,333}', TRUE), (201, 'project-8', '{444}', FALSE), (202, 'project-9', '{555,666}', FALSE), (203, 'project-10', '{777,888,999}', TRUE), (204, 'project-11', '{123}', FALSE), (205, 'project-12', '{234,345}', FALSE), (206, 'project-13', '{456,567,678}', TRUE), (207, 'project-14', '{789}', FALSE), (208, 'project-15', '{890,901}', TRUE), (209, 'project-16', '{102,203,304}', FALSE), (210, 'project-17', '{405}', TRUE), (211, 'project-18', '{506,607}', FALSE), (212, 'project-19', '{708,809,910}', TRUE), (213, 'project-20', '{111}', FALSE) ON CONFLICT ("project_id") DO NOTHING; ```
Query plan

See this for more details

 ModifyTable on public.security_inventory_filters  (cost=0.00..0.30 rows=0 width=0) (actual time=19.064..19.064 rows=0 loops=1)
   Buffers: shared hit=421 read=75 dirtied=21
   WAL: records=102 fpi=20 bytes=155581
   I/O Timings: read=17.952 write=0.000
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.30 rows=20 width=137) (actual time=1.694..1.759 rows=20 loops=1)
         Buffers: shared hit=29 read=5 dirtied=1
         WAL: records=1 fpi=0 bytes=99
         I/O Timings: read=1.595 write=0.000
Settings: work_mem = '100MB', random_page_cost = '1.5', effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4'

Related to #573375 (closed)

Edited by Gal Katz

Merge request reports

Loading