Backfill missing empty projects records in inventory filters table
What does this MR do?
Backfills missing project records in the security_inventory_filters table to enable filtering by projects that have no security scans or vulnerabilities.
Approach
This migration iterates over the projects table in batches. For each batch:
-
Fetch project data with namespace information: Joins projects with their project namespaces to retrieve project details and traversal hierarchy from
main_db -
Extract root namespace IDs: Identifies the root namespace for each project by taking the first element of
traversal_ids -
Filter by analyzer status: Queries
analyzer_namespace_statusesonsec_dbto identify which root namespaces have a matching record in the table -
Insert filtered records: Creates
security_inventory_filtersrecords only for projects belonging to security-enabled root namespaces
The table's unique constraint on project_id prevents duplicates—existing entries are preserved via ON CONFLICT DO NOTHING.
Why filter by analyzer status? By only backfilling projects within security-enabled hierarchies (those with analyzer_namespace_statuses records), we keep the table size manageable and focused on relevant data. Going forward, inventory filter records are automatically created for new projects, making this a one-time backfill operation.
Changelog: added
EE: true
Query Plans
fetch_project_data_with_namespaces
Raw SQL
```sql explain SELECT projects.id, projects.name, projects.archived, namespaces.traversal_ids FROM projects JOIN namespaces ON namespaces.id = projects.project_namespace_id WHERE projects.id BETWEEN 66101996 AND 66106996 AND projects.id >= 66103996 ```Query plan
See this for more details
Gather (cost=1028.99..7067.15 rows=1597 width=49) (actual time=1.015..8.960 rows=1436 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=8624
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=28.99..5907.45 rows=939 width=49) (actual time=0.386..3.706 rows=718 loops=2)
Buffers: shared hit=8624
I/O Timings: read=0.000 write=0.000
-> Parallel Bitmap Heap Scan on public.projects (cost=28.43..2545.54 rows=939 width=29) (actual time=0.348..1.526 rows=718 loops=2)
Buffers: shared hit=1437
I/O Timings: read=0.000 write=0.000
-> Bitmap Index Scan using projects_pkey (cost=0.00..28.03 rows=1597 width=0) (actual time=0.490..0.490 rows=1440 loops=1)
Index Cond: ((projects.id >= 66101996) AND (projects.id <= 66106996) AND (projects.id >= 66103996))
Buffers: shared hit=8
I/O Timings: read=0.000 write=0.000
-> Index Scan using namespaces_pkey on public.namespaces (cost=0.57..3.58 rows=1 width=32) (actual time=0.003..0.003 rows=1 loops=1436)
Index Cond: (namespaces.id = projects.project_namespace_id)
Buffers: shared hit=7187
I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '472585MB', jit = 'off'
Analyzer namespace status check
Raw SQL
```sql SELECT "analyzer_namespace_statuses"."namespace_id" FROM "analyzer_namespace_statuses" WHERE "analyzer_namespace_statuses"."namespace_id" IN (9970, 9971, 9972, 9973, 9974, 115756369) AND (array_length(traversal_ids, 1) = 1) ```Query plan
See this for more details
Index Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses (cost=0.43..43.07 rows=1 width=8) (actual time=0.022..0.081 rows=28 loops=1)
Index Cond: (analyzer_namespace_statuses.namespace_id = ANY ('{9970,9971,9972,9973,9974,115756369}'::bigint[]))
Filter: (array_length(analyzer_namespace_statuses.traversal_ids, 1) = 1)
Rows Removed by Filter: 0
Buffers: shared hit=49
I/O Timings: read=0.000 write=0.000
Settings: effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4', work_mem = '100MB', random_page_cost = '1.5'
insert_all
Raw SQL
```sql INSERT INTO "security_inventory_filters" ("project_id","project_name","traversal_ids","archived") VALUES (194, 'project-1', '{405}', FALSE), (195, 'project-2', '{101,202}', TRUE), (196, 'project-3', '{303}', FALSE), (197, 'project-4', '{404,505,606}', FALSE), (198, 'project-5', '{707,808}', TRUE), (199, 'project-6', '{909}', FALSE), (200, 'project-7', '{111,222,333}', TRUE), (201, 'project-8', '{444}', FALSE), (202, 'project-9', '{555,666}', FALSE), (203, 'project-10', '{777,888,999}', TRUE), (204, 'project-11', '{123}', FALSE), (205, 'project-12', '{234,345}', FALSE), (206, 'project-13', '{456,567,678}', TRUE), (207, 'project-14', '{789}', FALSE), (208, 'project-15', '{890,901}', TRUE), (209, 'project-16', '{102,203,304}', FALSE), (210, 'project-17', '{405}', TRUE), (211, 'project-18', '{506,607}', FALSE), (212, 'project-19', '{708,809,910}', TRUE), (213, 'project-20', '{111}', FALSE) ON CONFLICT ("project_id") DO NOTHING; ```Query plan
See this for more details
ModifyTable on public.security_inventory_filters (cost=0.00..0.30 rows=0 width=0) (actual time=19.064..19.064 rows=0 loops=1)
Buffers: shared hit=421 read=75 dirtied=21
WAL: records=102 fpi=20 bytes=155581
I/O Timings: read=17.952 write=0.000
-> Values Scan on "*VALUES*" (cost=0.00..0.30 rows=20 width=137) (actual time=1.694..1.759 rows=20 loops=1)
Buffers: shared hit=29 read=5 dirtied=1
WAL: records=1 fpi=0 bytes=99
I/O Timings: read=1.595 write=0.000
Settings: work_mem = '100MB', random_page_cost = '1.5', effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4'
Related to #573375 (closed)