[Backend] Implementation suggestion: Aggregated namespace counters
## What We Need:
We need to provide data for [this design](https://gitlab.com/gitlab-org/gitlab/-/work_items/593900).
Today, `analyzer_namespace_statuses` tracks **per-analyzer** counters (success/failure per namespace per `analyzer_type`). We need **cross-analyzer** counters that treat each project as a whole and answer questions such as:
- "How many projects under this group have **any** security scan configured?"
- "How many projects under this group have **any** scan in a failed or warning state?"
---
## Option 1: New Aggregate Table
Create a dedicated table `namespace_aggregated_scan_statuses` with **one pre-computed row per namespace** holding cross-analyzer counters. Updates follow the same incremental delta pattern already used for `analyzer_namespace_statuses`: compute a diff when a project's status changes, propagate the delta up the namespace tree via `traversal_ids`, and run a weekly `AdjustmentService` to repair any drift.
### Table Structure
```sql
CREATE TABLE namespace_aggregated_scan_statuses (
  id bigint NOT NULL,
  namespace_id bigint NOT NULL,
  projects_with_scans integer DEFAULT 0,
  projects_with_failed_or_warning integer DEFAULT 0,
  projects_with_stale integer DEFAULT 0,
  traversal_ids bigint[] DEFAULT '{}',
  created_at timestamptz NOT NULL,
  updated_at timestamptz NOT NULL
);

CREATE UNIQUE INDEX ON namespace_aggregated_scan_statuses (namespace_id);
CREATE INDEX ON namespace_aggregated_scan_statuses USING btree (traversal_ids);
```
### How It Works
1. **DiffService** already loads all project analyzer statuses. We extend it to compute an `aggregate_diff` (e.g., `{ projects_with_scans: +1 }`) by checking whether project-level booleans flipped (e.g., went from "no scan" to "has scan"). No additional DB queries needed.
1. **DiffsService** sums these `aggregate_diff` values per namespace, same pattern as per-analyzer aggregation.
1. A new service propagates deltas up the namespace tree using the same `traversal_ids` expansion + `UPSERT` pattern as the existing `AncestorsUpdateService` (see the SQL sketch after this list).
1. A weekly **AdjustmentService** recalculates the counters from scratch, based on `analyzer_project_statuses`, to fix any drift. This also serves as the initial backfill on first run.
1. Reading a count is a single row lookup by `namespace_id`.
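
For concreteness, here is a minimal SQL sketch of steps 3 and 5. It assumes a project whose namespace has `traversal_ids = '{9970, 12345}'` and a hypothetical aggregate diff of `{ projects_with_scans: +1, projects_with_failed_or_warning: -1 }`; the literal ids, delta values, and zero-clamping are illustrative only, not the final implementation:

```sql
-- Step 3: expand the namespace's traversal_ids into one row per ancestor
-- and apply the delta to each, creating rows that do not exist yet.
-- The +1 / -1 deltas below are hypothetical; a real service would parameterize them.
INSERT INTO namespace_aggregated_scan_statuses
  (namespace_id, traversal_ids, projects_with_scans,
   projects_with_failed_or_warning, created_at, updated_at)
SELECT
  ancestors.namespace_id,
  ('{9970,12345}'::bigint[])[1:ancestors.depth::int], -- each ancestor's own traversal_ids prefix
  GREATEST(0, 1),  -- clamp so a freshly created row never starts negative
  GREATEST(0, -1),
  NOW(),
  NOW()
FROM unnest('{9970,12345}'::bigint[]) WITH ORDINALITY AS ancestors(namespace_id, depth)
ON CONFLICT (namespace_id) DO UPDATE SET
  projects_with_scans =
    GREATEST(0, namespace_aggregated_scan_statuses.projects_with_scans + 1),
  projects_with_failed_or_warning =
    GREATEST(0, namespace_aggregated_scan_statuses.projects_with_failed_or_warning - 1),
  updated_at = NOW();

-- Step 5: reading the counters for a namespace is a single-row lookup.
SELECT projects_with_scans, projects_with_failed_or_warning, projects_with_stale
FROM namespace_aggregated_scan_statuses
WHERE namespace_id = 9970;
```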
### Scale Profile
- **Reads**: O(1) per namespace
- **Writes**: Every status change triggers delta propagation to all ancestor namespaces (namespace depth is capped at 20)
- **Table size**: One row per namespace (small)
---
## Option 2: Generated Columns + Covering Index on `security_inventory_filters`
Add two **stored generated columns** (`has_scanners`, `has_failed_or_warning`) to the existing `security_inventory_filters` table, plus a covering index. Counts are computed at read time via `COUNT` with the covering index.
`security_inventory_filters` already has one row per project with all 14 analyzer status columns and `traversal_ids`. The generated columns pre-compute the cross-analyzer booleans so queries don't have to evaluate all 14 columns at read time. As a bonus, the same columns can also be used to filter projects.
### Schema Changes
```sql
ALTER TABLE security_inventory_filters
  ADD COLUMN has_scanners boolean
    GENERATED ALWAYS AS (
      GREATEST(sast, sast_advanced, sast_iac, dast,
               dependency_scanning, coverage_fuzzing, api_fuzzing,
               cluster_image_scanning, secret_detection,
               secret_detection_secret_push_protection,
               secret_detection_pipeline_based, container_scanning,
               container_scanning_for_registry,
               container_scanning_pipeline_based) > 0 -- enum where 0 = not_configured
    ) STORED,
  ADD COLUMN has_failed_or_warning boolean
    GENERATED ALWAYS AS (
      2 IN (sast, sast_advanced, sast_iac, dast, -- 2 = failed. Warning status is in development
            dependency_scanning, coverage_fuzzing, api_fuzzing,
            cluster_image_scanning, secret_detection,
            secret_detection_secret_push_protection,
            secret_detection_pipeline_based, container_scanning,
            container_scanning_for_registry,
            container_scanning_pipeline_based)
    ) STORED;

-- Covering index to answer COUNT queries from the index alone
CREATE INDEX idx_sec_inv_filters_computed_covering
  ON security_inventory_filters (traversal_ids)
  INCLUDE (has_scanners, has_failed_or_warning)
  WHERE NOT archived;
```
### How It Works
1. When `security_inventory_filters` rows are updated, PostgreSQL automatically recomputes the generated columns.
2. Queries filter by `traversal_ids` and count using the boolean columns (see the example query after this list).
3. The covering index allows for an **index-only scan**, minimizing heap access.
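
For illustration, assuming the generated columns above, the read-time count query for an example namespace id of `9970` would look roughly like this (untested, since the columns don't exist yet):

```sql
SELECT
  COUNT(*) AS total_projects,
  COUNT(*) FILTER (WHERE has_scanners) AS projects_with_scans,
  COUNT(*) FILTER (WHERE has_failed_or_warning) AS projects_with_failed_or_warning
FROM security_inventory_filters
WHERE NOT archived
  AND traversal_ids >= '{9970}'::bigint[]
  AND traversal_ids < '{9971}'::bigint[];
```

Since `traversal_ids`, both booleans, and the `NOT archived` predicate are all covered by `idx_sec_inv_filters_computed_covering`, this should be able to run as an index-only scan.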
### Scale Profile
- **Reads**: O(n) where n = projects under the namespace. Index-only scan.
- **Writes**: Small overhead; the two booleans are recomputed as part of existing row updates.
- **Table size**: Adds ~2 bytes per row to the existing table. The covering index adds ~100MB for 10M rows.
### Example Query Plans
I tested using existing columns as proxies: `sast` for `has_scanners` and
`secret_detection_secret_push_protection` for `has_failed_or_warning`.
- Index creation:
```
EXEC CREATE INDEX CONCURRENTLY idx_sec_inv_test_covering ON security_inventory_filters (traversal_ids) INCLUDE (sast, secret_detection_secret_push_protection) WHERE NOT archived;
```
- Query used for testing:
```sql
SELECT
COUNT(*) AS total_projects,
COUNT(*) FILTER (WHERE sast = 0) AS no_scanners,
COUNT(*) FILTER (WHERE secret_detection_secret_push_protection = 2) AS any_failed
FROM security_inventory_filters
WHERE NOT archived
AND traversal_ids >= '{namespace_id}'
AND traversal_ids < '{namespace_id+1}';
```
- [Query plan with `namespace_id=9970`](https://console.postgres.ai/gitlab/gitlab-production-sec/sessions/50788/commands/150482):
```
Aggregate (cost=2584.90..2584.91 rows=1 width=24) (actual time=61.509..61.510 rows=1 loops=1)
Buffers: shared hit=2576 read=569
-> Index Only Scan using idx_sec_inv_test_covering on public.security_inventory_filters (cost=0.56..2446.34 rows=11085 width=4) (actual time=0.053..60.962 rows=6799 loops=1)
Index Cond: ((security_inventory_filters.traversal_ids >= '{9970}'::bigint[]) AND (security_inventory_filters.traversal_ids < '{9971}'::bigint[]))
Heap Fetches: 1370
Buffers: shared hit=2576 read=569
Settings: work_mem = '100MB', seq_page_cost = '4', effective_cache_size = '338688MB', jit = 'off', random_page_cost = '1.5'
```
- [Query plan with `namespace_id=4249178`](https://console.postgres.ai/gitlab/gitlab-production-sec/sessions/50788/commands/150483):
```
Aggregate (cost=17035.29..17035.30 rows=1 width=24) (actual time=318.171..318.172 rows=1 loops=1)
Buffers: shared hit=26465 read=3922
-> Index Only Scan using idx_sec_inv_test_covering on public.security_inventory_filters (cost=0.56..16033.69 rows=80128 width=4) (actual time=0.070..315.179 rows=41596 loops=1)
Index Cond: ((security_inventory_filters.traversal_ids >= '{4249178}'::bigint[]) AND (security_inventory_filters.traversal_ids < '{4249179}'::bigint[]))
Heap Fetches: 17489
Buffers: shared hit=26465 read=3922
Settings: random_page_cost = '1.5', work_mem = '100MB', seq_page_cost = '4', effective_cache_size = '338688MB', jit = 'off'
```