Fix Tool Coverage data update when all projects are archived

What does this MR do and why?

This MR fixes a bug where Tool Coverage statistics were not updated when the last project in a namespace/subgroup was archived. The issue occurred specifically once all projects in a namespace became archived: the analyzer namespace statistics kept their stale data instead of being reset to zero.

Problem Description

When projects are archived, the Security::AnalyzerNamespaceStatuses::AdjustmentService is responsible for recalculating namespace-level analyzer statistics by aggregating data from unarchived projects. However, there was a critical gap in the SQL query logic:

Before the fix:

  • The main query used a LEFT JOIN with analyzer_project_statuses filtered by archived = FALSE
  • When all projects were archived, this join returned no rows
  • The HAVING count(analyzer_project_statuses.analyzer_type) > 0 clause filtered out empty results
  • This meant existing analyzer_namespace_statuses records were never updated when all projects became archived
  • The namespace statistics remained at their previous values instead of being reset to zero (see the minimal sketch after this list)
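
A minimal, self-contained sketch of the failure mode (hypothetical rows supplied via a CTE that stands in for analyzer_project_statuses; the traversal_ids scoping from the real query is omitted for brevity): once every project row is archived, the LEFT JOIN contributes only NULLs, the HAVING clause drops the group, and nothing reaches the upsert.

WITH analyzer_project_statuses (analyzer_type, status, archived) AS (
  -- two hypothetical project analyzer rows, both archived
  VALUES (1, 1, TRUE), (1, 2, TRUE)
)
SELECT
  analyzer_project_statuses.analyzer_type,
  COALESCE(SUM((status = 1)::int), 0) AS success,
  COALESCE(SUM((status = 2)::int), 0) AS failure
FROM (VALUES (24)) AS namespace_data(namespace_id)
LEFT JOIN analyzer_project_statuses
  ON analyzer_project_statuses.archived = FALSE
GROUP BY namespace_data.namespace_id, analyzer_project_statuses.analyzer_type
HAVING count(analyzer_project_statuses.analyzer_type) > 0;
-- => zero rows: the upsert never fires, so the existing
--    analyzer_namespace_statuses record keeps its stale success/failure counts.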

Solution

The fix adds a UNION ALL clause to handle the edge case where all projects are archived:

UNION ALL

-- Handle case where all projects are archived: return 0 counts for existing analyzer types
SELECT DISTINCT
  analyzer_namespace_statuses.analyzer_type,
  0 AS success,
  0 AS failure,
  namespace_data.traversal_ids,
  namespace_data.namespace_id,
  now() AS created_at,
  now() AS updated_at
FROM namespace_data
CROSS JOIN analyzer_namespace_statuses
WHERE analyzer_namespace_statuses.namespace_id = namespace_data.namespace_id
AND NOT EXISTS (
  SELECT 1
  FROM analyzer_project_statuses
  WHERE analyzer_project_statuses.archived = FALSE
  AND analyzer_project_statuses.analyzer_type = analyzer_namespace_statuses.analyzer_type
  AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
  AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
)
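
Both the existing branch and the new one scope projects to the namespace subtree with the same traversal_ids range check: a project belongs to the subtree when its traversal_ids array sorts between the namespace's own traversal_ids (lower bound) and the next sibling prefix (upper bound). A quick illustration, using the sample values from the query plans below and a hypothetical project array:

-- Hypothetical project under subgroup 24 (group 23): its traversal_ids would
-- look like {23, 24, <project namespace id>} and sorts inside the
-- [{23,24}, {23,25}) range, so it counts toward namespace 24's statistics.
SELECT ARRAY[23, 24, 99]::bigint[] >= ARRAY[23, 24]::bigint[]  -- lower bound: the namespace's own traversal_ids
   AND ARRAY[23, 24, 99]::bigint[] <  ARRAY[23, 25]::bigint[]  -- upper bound: the next sibling prefix
   AS in_subtree;
-- => true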

Changelog: fixed
EE: true

References

#572794

How to set up and validate locally

Before the fix:

  1. Create a subgroup
  2. Import first project
  3. Import second project
  4. Run pipeline for first project → "Tool Coverage" stats update correctly
  5. Run pipeline for second project → "Tool Coverage" stats update correctly
  6. Archive the first project → "Tool Coverage" stats update correctly
  7. Archive the second (last) project → "Tool Coverage" stats remain unchanged

After the fix:

  1. Create a subgroup
  2. Import first project
  3. Import second project
  4. Run pipeline for first project → "Tool Coverage" stats update correctly
  5. Run pipeline for second project → "Tool Coverage" stats update correctly
  6. Archive the first project → "Tool Coverage" stats update correctly
  7. Archive the second (last) project → "Tool Coverage" stats are updated and all counters are reset to 0 (zero); an optional database-level check follows below
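
Optionally, the reset can be confirmed directly in the database. A sketch of the spot check, assuming the column names used by the queries in this MR and a hypothetical namespace ID of 24 (substitute the subgroup's actual ID):

-- After step 7, success and failure should read 0 for every analyzer_type
-- instead of keeping the pre-archive counts.
SELECT analyzer_type, success, failure, updated_at
FROM analyzer_namespace_statuses
WHERE namespace_id = 24;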

Query plans

Before

Raw SQL
WITH new_values AS (
  WITH namespace_data (namespace_id, traversal_ids, next_traversal_id) AS (
    VALUES (23, ARRAY[23]::bigint[], ARRAY[24]::bigint[]), (24, ARRAY[23, 24]::bigint[], ARRAY[23, 25]::bigint[]), (25, ARRAY[23, 24, 25]::bigint[], ARRAY[23, 24, 26]::bigint[])
)
SELECT
  analyzer_project_statuses.analyzer_type,
  COALESCE(SUM((status = 1)::int), 0) AS success,
  COALESCE(SUM((status = 2)::int), 0) AS failure,
  namespace_data.traversal_ids as traversal_ids,
  namespace_data.namespace_id as namespace_id,
  now() AS created_at,
  now() AS updated_at
FROM namespace_data
LEFT JOIN analyzer_project_statuses
  ON analyzer_project_statuses.archived = FALSE
  AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
  AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
GROUP BY namespace_data.traversal_ids, namespace_id, analyzer_project_statuses.analyzer_type
HAVING count(analyzer_project_statuses.analyzer_type) > 0
), old_values AS (
  SELECT
  namespace_id,
  traversal_ids,
  analyzer_type,
  success,
  failure
FROM analyzer_namespace_statuses
WHERE namespace_id IN (SELECT namespace_id FROM new_values)

), upserted AS (
  INSERT INTO analyzer_namespace_statuses
  (analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at)
  (SELECT analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at
FROM new_values
)
ON CONFLICT (namespace_id, analyzer_type)
DO UPDATE SET
  success = EXCLUDED.success,
  failure = EXCLUDED.failure,
  updated_at = EXCLUDED.updated_at,
  traversal_ids = EXCLUDED.traversal_ids
RETURNING namespace_id

), diff_values AS (
  SELECT
  new_values.namespace_id AS namespace_id,
  new_values.traversal_ids,
  new_values.analyzer_type,
  new_values.success - COALESCE(old_values.success, 0) AS success,
  new_values.failure - COALESCE(old_values.failure, 0) AS failure
FROM new_values
LEFT JOIN old_values
  ON new_values.namespace_id = old_values.namespace_id AND
     new_values.analyzer_type = old_values.analyzer_type
WHERE EXISTS (
  SELECT 1
  FROM upserted
  WHERE upserted.namespace_id = new_values.namespace_id
)

)
SELECT *
FROM diff_values
WHERE
  success != 0 OR
  failure != 0;
Plan

See full plan here.

Hash Semi Join  (cost=661090.08..661145.14 rows=14 width=54) (actual time=0.053..0.055 rows=0 loops=1)
   Hash Cond: (new_values.namespace_id = upserted.namespace_id)
   Buffers: shared hit=12
   I/O Timings: read=0.000 write=0.000
   CTE new_values
     ->  HashAggregate  (cost=661088.25..661088.84 rows=14 width=70) (actual time=0.039..0.039 rows=0 loops=1)
           Group Key: "*VALUES*".column2, "*VALUES*".column1, analyzer_project_statuses.analyzer_type
           Filter: (count(analyzer_project_statuses.analyzer_type) > 0)
           Rows Removed by Filter: 3
           Buffers: shared hit=12
           I/O Timings: read=0.000 write=0.000
           ->  Nested Loop Left Join  (cost=0.56..538088.77 rows=4919979 width=40) (actual time=0.023..0.032 rows=3 loops=1)
                 Buffers: shared hit=12
                 I/O Timings: read=0.000 write=0.000
                 ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=68) (actual time=0.003..0.004 rows=3 loops=1)
                       I/O Timings: read=0.000 write=0.000
                 ->  Index Scan using index_analyzer_project_statuses_traversal_ids on public.analyzer_project_statuses  (cost=0.56..162962.98 rows=1639993 width=39) (actual time=0.008..0.008 rows=0 loops=3)
                       Index Cond: ((analyzer_project_statuses.traversal_ids >= "*VALUES*".column2) AND (analyzer_project_statuses.traversal_ids < "*VALUES*".column3))
                       Filter: (NOT analyzer_project_statuses.archived)
                       Rows Removed by Filter: 0
                       Buffers: shared hit=12
                       I/O Timings: read=0.000 write=0.000
   CTE upserted
     ->  ModifyTable on public.analyzer_namespace_statuses analyzer_namespace_statuses_1  (cost=0.00..0.35 rows=14 width=82) (actual time=0.040..0.040 rows=0 loops=1)
           Buffers: shared hit=12
           I/O Timings: read=0.000 write=0.000
           ->  CTE Scan on new_values new_values_2  (cost=0.00..0.35 rows=14 width=82) (actual time=0.039..0.039 rows=0 loops=1)
                 Buffers: shared hit=12
                 I/O Timings: read=0.000 write=0.000
   ->  Nested Loop Left Join  (cost=0.43..55.23 rows=14 width=70) (actual time=0.000..0.000 rows=0 loops=0)
         Filter: (((new_values.success - COALESCE(analyzer_namespace_statuses.success, '0'::bigint)) <> 0) OR ((new_values.failure - COALESCE(analyzer_namespace_statuses.failure, '0'::bigint)) <> 0))
         Rows Removed by Filter: 0
         I/O Timings: read=0.000 write=0.000
         ->  CTE Scan on new_values  (cost=0.00..0.28 rows=14 width=54) (actual time=0.000..0.000 rows=0 loops=0)
               I/O Timings: read=0.000 write=0.000
         ->  Nested Loop Semi Join  (cost=0.43..3.90 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=0)
               I/O Timings: read=0.000 write=0.000
               ->  Index Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses  (cost=0.43..3.45 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=0)
                     Index Cond: ((analyzer_namespace_statuses.namespace_id = new_values.namespace_id) AND (analyzer_namespace_statuses.analyzer_type = new_values.analyzer_type))
                     I/O Timings: read=0.000 write=0.000
               ->  CTE Scan on new_values new_values_1  (cost=0.00..0.28 rows=14 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                     I/O Timings: read=0.000 write=0.000
   ->  Hash  (cost=0.28..0.28 rows=14 width=8) (actual time=0.040..0.041 rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         Buffers: shared hit=12
         I/O Timings: read=0.000 write=0.000
         ->  CTE Scan on upserted  (cost=0.00..0.28 rows=14 width=8) (actual time=0.040..0.040 rows=0 loops=1)
               Buffers: shared hit=12
               I/O Timings: read=0.000 write=0.000
Settings: seq_page_cost = '4', work_mem = '100MB', random_page_cost = '1.5', effective_cache_size = '338688MB', jit = 'off'

After

Raw SQL
WITH new_values AS (
  WITH namespace_data (namespace_id, traversal_ids, next_traversal_id) AS (
    VALUES (23, ARRAY[23]::bigint[], ARRAY[24]::bigint[]), (24, ARRAY[23, 24]::bigint[], ARRAY[23, 25]::bigint[]), (25, ARRAY[23, 24, 25]::bigint[], ARRAY[23, 24, 26]::bigint[])
)
SELECT
  analyzer_project_statuses.analyzer_type,
  COALESCE(SUM((status = 1)::int), 0) AS success,
  COALESCE(SUM((status = 2)::int), 0) AS failure,
  namespace_data.traversal_ids as traversal_ids,
  namespace_data.namespace_id as namespace_id,
  now() AS created_at,
  now() AS updated_at
FROM namespace_data
LEFT JOIN analyzer_project_statuses
  ON analyzer_project_statuses.archived = FALSE
  AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
  AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
GROUP BY namespace_data.traversal_ids, namespace_id, analyzer_project_statuses.analyzer_type
HAVING count(analyzer_project_statuses.analyzer_type) > 0

UNION ALL

-- Handle case where all projects are archived: return 0 counts for existing analyzer types
SELECT DISTINCT
  analyzer_namespace_statuses.analyzer_type,
  0 AS success,
  0 AS failure,
  namespace_data.traversal_ids,
  namespace_data.namespace_id,
  now() AS created_at,
  now() AS updated_at
FROM namespace_data
CROSS JOIN analyzer_namespace_statuses
WHERE analyzer_namespace_statuses.namespace_id = namespace_data.namespace_id
AND NOT EXISTS (
  SELECT 1
  FROM analyzer_project_statuses
  WHERE analyzer_project_statuses.archived = FALSE
  AND analyzer_project_statuses.analyzer_type = analyzer_namespace_statuses.analyzer_type
  AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
  AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
)

), old_values AS (
  SELECT
  namespace_id,
  traversal_ids,
  analyzer_type,
  success,
  failure
FROM analyzer_namespace_statuses
WHERE namespace_id IN (SELECT namespace_id FROM new_values)

), upserted AS (
  INSERT INTO analyzer_namespace_statuses
  (analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at)
  (SELECT analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at
FROM new_values
)
ON CONFLICT (namespace_id, analyzer_type)
DO UPDATE SET
  success = EXCLUDED.success,
  failure = EXCLUDED.failure,
  updated_at = EXCLUDED.updated_at,
  traversal_ids = EXCLUDED.traversal_ids
RETURNING namespace_id

), diff_values AS (
  SELECT
  new_values.namespace_id AS namespace_id,
  new_values.traversal_ids,
  new_values.analyzer_type,
  new_values.success - COALESCE(old_values.success, 0) AS success,
  new_values.failure - COALESCE(old_values.failure, 0) AS failure
FROM new_values
LEFT JOIN old_values
  ON new_values.namespace_id = old_values.namespace_id AND
     new_values.analyzer_type = old_values.analyzer_type
WHERE EXISTS (
  SELECT 1
  FROM upserted
  WHERE upserted.namespace_id = new_values.namespace_id
)

)
SELECT *
FROM diff_values
WHERE
  success != 0 OR
  failure != 0;
Plan

See full plan here.

 Hash Semi Join  (cost=2336074.36..2336187.13 rows=26 width=54) (actual time=1.946..1.951 rows=0 loops=1)
   Hash Cond: (new_values.namespace_id = upserted.namespace_id)
   Buffers: shared hit=28 read=7
   I/O Timings: read=1.835 write=0.000
   CTE new_values
     ->  Append  (cost=661088.31..2336072.43 rows=26 width=70) (actual time=1.945..1.947 rows=0 loops=1)
           Buffers: shared hit=28 read=7
           I/O Timings: read=1.835 write=0.000
           CTE namespace_data
             ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=68) (actual time=0.003..0.005 rows=3 loops=1)
                   I/O Timings: read=0.000 write=0.000
           ->  HashAggregate  (cost=661088.27..661088.87 rows=14 width=70) (actual time=1.785..1.786 rows=0 loops=1)
                 Group Key: namespace_data.traversal_ids, namespace_data.namespace_id, analyzer_project_statuses.analyzer_type
                 Filter: (count(analyzer_project_statuses.analyzer_type) > 0)
                 Rows Removed by Filter: 3
                 Buffers: shared hit=8 read=4
                 I/O Timings: read=1.733 write=0.000
                 ->  Nested Loop Left Join  (cost=0.56..538088.80 rows=4919979 width=40) (actual time=1.763..1.776 rows=3 loops=1)
                       Buffers: shared hit=8 read=4
                       I/O Timings: read=1.733 write=0.000
                       ->  CTE Scan on namespace_data  (cost=0.00..0.06 rows=3 width=68) (actual time=0.004..0.008 rows=3 loops=1)
                             I/O Timings: read=0.000 write=0.000
                       ->  Index Scan using index_analyzer_project_statuses_traversal_ids on public.analyzer_project_statuses  (cost=0.56..162962.98 rows=1639993 width=39) (actual time=0.588..0.588 rows=0 loops=3)
                             Index Cond: ((analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids) AND (analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id))
                             Filter: (NOT analyzer_project_statuses.archived)
                             Rows Removed by Filter: 0
                             Buffers: shared hit=8 read=4
                             I/O Timings: read=1.733 write=0.000
           ->  Subquery Scan on *SELECT* 2  (cost=1674983.10..1674983.40 rows=12 width=70) (actual time=0.158..0.160 rows=0 loops=1)
                 Buffers: shared hit=20 read=3
                 I/O Timings: read=0.102 write=0.000
                 ->  Unique  (cost=1674983.10..1674983.22 rows=12 width=62) (actual time=0.158..0.159 rows=0 loops=1)
                       Buffers: shared hit=20 read=3
                       I/O Timings: read=0.102 write=0.000
                       ->  Sort  (cost=1674983.10..1674983.13 rows=12 width=62) (actual time=0.157..0.158 rows=0 loops=1)
                             Sort Key: analyzer_namespace_statuses_1.analyzer_type, namespace_data_1.traversal_ids, namespace_data_1.namespace_id
                             Sort Method: quicksort  Memory: 25kB
                             Buffers: shared hit=20 read=3
                             I/O Timings: read=0.102 write=0.000
                             ->  Nested Loop Anti Join  (cost=0.86..1674982.89 rows=12 width=62) (actual time=0.124..0.125 rows=0 loops=1)
                                   Buffers: shared hit=9 read=3
                                   I/O Timings: read=0.102 write=0.000
                                   ->  Nested Loop  (cost=0.43..8.90 rows=13 width=70) (actual time=0.124..0.124 rows=0 loops=1)
                                         Buffers: shared hit=9 read=3
                                         I/O Timings: read=0.102 write=0.000
                                         ->  CTE Scan on namespace_data namespace_data_1  (cost=0.00..0.06 rows=3 width=68) (actual time=0.000..0.001 rows=3 loops=1)
                                               I/O Timings: read=0.000 write=0.000
                                         ->  Index Only Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses analyzer_namespace_statuses_1  (cost=0.43..2.91 rows=4 width=10) (actual time=0.040..0.040 rows=0 loops=3)
                                               Index Cond: (analyzer_namespace_statuses_1.namespace_id = namespace_data_1.namespace_id)
                                               Heap Fetches: 0
                                               Buffers: shared hit=9 read=3
                                               I/O Timings: read=0.102 write=0.000
                                   ->  Index Scan using index_analyzer_project_statuses_status on public.analyzer_project_statuses analyzer_project_statuses_1  (cost=0.43..123766.90 rows=1054281 width=37) (actual time=0.000..0.000 rows=0 loops=0)
                                         Index Cond: (analyzer_project_statuses_1.analyzer_type = analyzer_namespace_statuses_1.analyzer_type)
                                         Filter: (NOT analyzer_project_statuses_1.archived)
                                         Rows Removed by Filter: 0
                                         I/O Timings: read=0.000 write=0.000
   CTE upserted
     ->  ModifyTable on public.analyzer_namespace_statuses analyzer_namespace_statuses_2  (cost=0.00..0.65 rows=26 width=82) (actual time=0.001..0.001 rows=0 loops=1)
           I/O Timings: read=0.000 write=0.000
           ->  CTE Scan on new_values new_values_2  (cost=0.00..0.65 rows=26 width=82) (actual time=0.000..0.000 rows=0 loops=1)
                 I/O Timings: read=0.000 write=0.000
   ->  Nested Loop Left Join  (cost=0.43..112.71 rows=26 width=70) (actual time=1.946..1.946 rows=0 loops=1)
         Filter: (((new_values.success - COALESCE(analyzer_namespace_statuses.success, '0'::bigint)) <> 0) OR ((new_values.failure - COALESCE(analyzer_namespace_statuses.failure, '0'::bigint)) <> 0))
         Rows Removed by Filter: 0
         Buffers: shared hit=28 read=7
         I/O Timings: read=1.835 write=0.000
         ->  CTE Scan on new_values  (cost=0.00..0.52 rows=26 width=54) (actual time=1.945..1.946 rows=0 loops=1)
               Buffers: shared hit=28 read=7
               I/O Timings: read=1.835 write=0.000
         ->  Nested Loop Semi Join  (cost=0.43..4.29 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=0)
               I/O Timings: read=0.000 write=0.000
               ->  Index Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses  (cost=0.43..3.45 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=0)
                     Index Cond: ((analyzer_namespace_statuses.namespace_id = new_values.namespace_id) AND (analyzer_namespace_statuses.analyzer_type = new_values.analyzer_type))
                     I/O Timings: read=0.000 write=0.000
               ->  CTE Scan on new_values new_values_1  (cost=0.00..0.52 rows=26 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                     I/O Timings: read=0.000 write=0.000
   ->  Hash  (cost=0.52..0.52 rows=26 width=8) (actual time=0.000..0.000 rows=0 loops=0)
         I/O Timings: read=0.000 write=0.000
         ->  CTE Scan on upserted  (cost=0.00..0.52 rows=26 width=8) (actual time=0.000..0.000 rows=0 loops=0)
               I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4', work_mem = '100MB'

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
