Fix Tool Coverage data update when all projects are archived

What does this MR do and why?

This MR fixes a bug where Tool Coverage statistics were not properly updated when archiving the last project in a namespace/subgroup. The issue occurred specifically when all projects in a namespace became archived, leaving the analyzer namespace statistics with stale data instead of being reset to zero.

Problem Description

When projects are archived, the Security::AnalyzerNamespaceStatuses::AdjustmentService is responsible for recalculating namespace-level analyzer statistics by aggregating data from unarchived projects. However, there was a critical gap in the SQL query logic:

Before the fix:

  • The main query used a LEFT JOIN with analyzer_project_statuses filtered by archived = FALSE
  • When all projects were archived, this join returned no rows
  • The HAVING count(analyzer_project_statuses.analyzer_type) > 0 clause filtered out empty results
  • This meant existing analyzer_namespace_statuses records were never updated when all projects became archived
  • The namespace statistics remained at their previous values instead of being reset to zero (a simplified query illustrating this failure mode follows this list)
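
A minimal sketch of the failure mode, using the same tables as the full queries further below (the namespace values are illustrative):

-- When every project under the namespace is archived, the LEFT JOIN matches nothing,
-- count(analyzer_project_statuses.analyzer_type) is 0 for the group, and the HAVING
-- clause drops the row, so no zeroed record ever reaches the upsert.
SELECT
  analyzer_project_statuses.analyzer_type,
  COALESCE(SUM((status = 1)::int), 0) AS success,
  COALESCE(SUM((status = 2)::int), 0) AS failure
FROM (VALUES (23, ARRAY[23]::bigint[], ARRAY[24]::bigint[]))
  AS namespace_data (namespace_id, traversal_ids, next_traversal_id)
LEFT JOIN analyzer_project_statuses
  ON analyzer_project_statuses.archived = FALSE
  AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
  AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
GROUP BY analyzer_project_statuses.analyzer_type
HAVING count(analyzer_project_statuses.analyzer_type) > 0;  -- returns 0 rows
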
Previous solution (reverted due to timeouts)

The previous fix added a UNION ALL clause to handle the edge case where all projects are archived. It was reverted because the NOT EXISTS subquery had to scan analyzer_project_statuses for each analyzer type to prove non-existence, which caused query timeouts (see the plan comparison below):

UNION ALL

-- Handle case where all projects are archived: return 0 counts for existing analyzer types
SELECT DISTINCT
  analyzer_namespace_statuses.analyzer_type,
  0 AS success,
  0 AS failure,
  namespace_data.traversal_ids,
  namespace_data.namespace_id,
  now() AS created_at,
  now() AS updated_at
FROM namespace_data
CROSS JOIN analyzer_namespace_statuses
WHERE analyzer_namespace_statuses.namespace_id = namespace_data.namespace_id
AND NOT EXISTS (
  SELECT 1
  FROM analyzer_project_statuses
  WHERE analyzer_project_statuses.archived = FALSE
  AND analyzer_project_statuses.analyzer_type = analyzer_namespace_statuses.analyzer_type
  AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
  AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
)

Changelog: fixed
EE: true

References

#572794

Follow up

!208741 (merged)

How to set up and validate locally

Before the fix:

  1. Create a subgroup
  2. Import first project
  3. Import second project
  4. Run pipeline for first project → "Tool Coverage" stats update correctly
  5. Run pipeline for second project → "Tool Coverage" stats update correctly
  6. Archive the first project → "Tool Coverage" stats update correctly
  7. Archive the second (last) project → "Tool Coverage" stats remain unchanged

After the fix:

  1. Create a subgroup
  2. Import first project
  3. Import second project
  4. Run pipeline for first project → "Tool Coverage" stats update correctly
  5. Run pipeline for second project → "Tool Coverage" stats update correctly
  6. Archive the first project → "Tool Coverage" stats update correctly
  7. Archive the second (last) project → "Tool Coverage" stats are updated and the counters are reset to 0 (zero); see the spot-check query below
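
To spot-check the stored counters directly, one option is to query the table from a database console (a minimal sketch; the namespace ID 23 is a placeholder for your subgroup's ID):

-- List the stored analyzer counters for the subgroup.
-- Replace 23 with the ID of the subgroup created in step 1.
SELECT analyzer_type, success, failure, updated_at
FROM analyzer_namespace_statuses
WHERE namespace_id = 23
ORDER BY analyzer_type;

After the fix, every row for the fully archived subgroup should report success = 0 and failure = 0.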

Query plans

Before

Raw SQL
WITH new_values AS (
  WITH namespace_data (namespace_id, traversal_ids, next_traversal_id) AS (
    VALUES (23, ARRAY[23]::bigint[], ARRAY[24]::bigint[]), (24, ARRAY[23, 24]::bigint[], ARRAY[23, 25]::bigint[]), (25, ARRAY[23, 24, 25]::bigint[], ARRAY[23, 24, 26]::bigint[])
)
SELECT
  analyzer_project_statuses.analyzer_type,
  COALESCE(SUM((status = 1)::int), 0) AS success,
  COALESCE(SUM((status = 2)::int), 0) AS failure,
  namespace_data.traversal_ids as traversal_ids,
  namespace_data.namespace_id as namespace_id,
  now() AS created_at,
  now() AS updated_at
FROM namespace_data
LEFT JOIN analyzer_project_statuses
  ON analyzer_project_statuses.archived = FALSE
  AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
  AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
GROUP BY namespace_data.traversal_ids, namespace_id, analyzer_project_statuses.analyzer_type
HAVING count(analyzer_project_statuses.analyzer_type) > 0
), old_values AS (
  SELECT
  namespace_id,
  traversal_ids,
  analyzer_type,
  success,
  failure
FROM analyzer_namespace_statuses
WHERE namespace_id IN (SELECT namespace_id FROM new_values)

), upserted AS (
  INSERT INTO analyzer_namespace_statuses
  (analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at)
  (SELECT analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at
FROM new_values
)
ON CONFLICT (namespace_id, analyzer_type)
DO UPDATE SET
  success = EXCLUDED.success,
  failure = EXCLUDED.failure,
  updated_at = EXCLUDED.updated_at,
  traversal_ids = EXCLUDED.traversal_ids
RETURNING namespace_id

), diff_values AS (
  SELECT
  new_values.namespace_id AS namespace_id,
  new_values.traversal_ids,
  new_values.analyzer_type,
  new_values.success - COALESCE(old_values.success, 0) AS success,
  new_values.failure - COALESCE(old_values.failure, 0) AS failure
FROM new_values
LEFT JOIN old_values
  ON new_values.namespace_id = old_values.namespace_id AND
     new_values.analyzer_type = old_values.analyzer_type
WHERE EXISTS (
  SELECT 1
  FROM upserted
  WHERE upserted.namespace_id = new_values.namespace_id
)

)
SELECT *
FROM diff_values
WHERE
  success != 0 OR
  failure != 0;
Plan

See full plan here.

Hash Semi Join  (cost=661090.08..661145.14 rows=14 width=54) (actual time=0.053..0.055 rows=0 loops=1)
   Hash Cond: (new_values.namespace_id = upserted.namespace_id)
   Buffers: shared hit=12
   I/O Timings: read=0.000 write=0.000
   CTE new_values
     ->  HashAggregate  (cost=661088.25..661088.84 rows=14 width=70) (actual time=0.039..0.039 rows=0 loops=1)
           Group Key: "*VALUES*".column2, "*VALUES*".column1, analyzer_project_statuses.analyzer_type
           Filter: (count(analyzer_project_statuses.analyzer_type) > 0)
           Rows Removed by Filter: 3
           Buffers: shared hit=12
           I/O Timings: read=0.000 write=0.000
           ->  Nested Loop Left Join  (cost=0.56..538088.77 rows=4919979 width=40) (actual time=0.023..0.032 rows=3 loops=1)
                 Buffers: shared hit=12
                 I/O Timings: read=0.000 write=0.000
                 ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=68) (actual time=0.003..0.004 rows=3 loops=1)
                       I/O Timings: read=0.000 write=0.000
                 ->  Index Scan using index_analyzer_project_statuses_traversal_ids on public.analyzer_project_statuses  (cost=0.56..162962.98 rows=1639993 width=39) (actual time=0.008..0.008 rows=0 loops=3)
                       Index Cond: ((analyzer_project_statuses.traversal_ids >= "*VALUES*".column2) AND (analyzer_project_statuses.traversal_ids < "*VALUES*".column3))
                       Filter: (NOT analyzer_project_statuses.archived)
                       Rows Removed by Filter: 0
                       Buffers: shared hit=12
                       I/O Timings: read=0.000 write=0.000
   CTE upserted
     ->  ModifyTable on public.analyzer_namespace_statuses analyzer_namespace_statuses_1  (cost=0.00..0.35 rows=14 width=82) (actual time=0.040..0.040 rows=0 loops=1)
           Buffers: shared hit=12
           I/O Timings: read=0.000 write=0.000
           ->  CTE Scan on new_values new_values_2  (cost=0.00..0.35 rows=14 width=82) (actual time=0.039..0.039 rows=0 loops=1)
                 Buffers: shared hit=12
                 I/O Timings: read=0.000 write=0.000
   ->  Nested Loop Left Join  (cost=0.43..55.23 rows=14 width=70) (actual time=0.000..0.000 rows=0 loops=0)
         Filter: (((new_values.success - COALESCE(analyzer_namespace_statuses.success, '0'::bigint)) <> 0) OR ((new_values.failure - COALESCE(analyzer_namespace_statuses.failure, '0'::bigint)) <> 0))
         Rows Removed by Filter: 0
         I/O Timings: read=0.000 write=0.000
         ->  CTE Scan on new_values  (cost=0.00..0.28 rows=14 width=54) (actual time=0.000..0.000 rows=0 loops=0)
               I/O Timings: read=0.000 write=0.000
         ->  Nested Loop Semi Join  (cost=0.43..3.90 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=0)
               I/O Timings: read=0.000 write=0.000
               ->  Index Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses  (cost=0.43..3.45 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=0)
                     Index Cond: ((analyzer_namespace_statuses.namespace_id = new_values.namespace_id) AND (analyzer_namespace_statuses.analyzer_type = new_values.analyzer_type))
                     I/O Timings: read=0.000 write=0.000
               ->  CTE Scan on new_values new_values_1  (cost=0.00..0.28 rows=14 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                     I/O Timings: read=0.000 write=0.000
   ->  Hash  (cost=0.28..0.28 rows=14 width=8) (actual time=0.040..0.041 rows=0 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 8kB
         Buffers: shared hit=12
         I/O Timings: read=0.000 write=0.000
         ->  CTE Scan on upserted  (cost=0.00..0.28 rows=14 width=8) (actual time=0.040..0.040 rows=0 loops=1)
               Buffers: shared hit=12
               I/O Timings: read=0.000 write=0.000
Settings: seq_page_cost = '4', work_mem = '100MB', random_page_cost = '1.5', effective_cache_size = '338688MB', jit = 'off'

Previous solution (reverted due to timeouts)

Raw SQL
WITH new_values AS (
  WITH namespace_data (namespace_id, traversal_ids, next_traversal_id) AS (
    VALUES (23, ARRAY[23]::bigint[], ARRAY[24]::bigint[]), (24, ARRAY[23, 24]::bigint[], ARRAY[23, 25]::bigint[]), (25, ARRAY[23, 24, 25]::bigint[], ARRAY[23, 24, 26]::bigint[])
)
SELECT
  analyzer_project_statuses.analyzer_type,
  COALESCE(SUM((status = 1)::int), 0) AS success,
  COALESCE(SUM((status = 2)::int), 0) AS failure,
  namespace_data.traversal_ids as traversal_ids,
  namespace_data.namespace_id as namespace_id,
  now() AS created_at,
  now() AS updated_at
FROM namespace_data
LEFT JOIN analyzer_project_statuses
  ON analyzer_project_statuses.archived = FALSE
  AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
  AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
GROUP BY namespace_data.traversal_ids, namespace_id, analyzer_project_statuses.analyzer_type
HAVING count(analyzer_project_statuses.analyzer_type) > 0

UNION ALL

-- Handle case where all projects are archived: return 0 counts for existing analyzer types
SELECT DISTINCT
  analyzer_namespace_statuses.analyzer_type,
  0 AS success,
  0 AS failure,
  namespace_data.traversal_ids,
  namespace_data.namespace_id,
  now() AS created_at,
  now() AS updated_at
FROM namespace_data
CROSS JOIN analyzer_namespace_statuses
WHERE analyzer_namespace_statuses.namespace_id = namespace_data.namespace_id
AND NOT EXISTS (
  SELECT 1
  FROM analyzer_project_statuses
  WHERE analyzer_project_statuses.archived = FALSE
  AND analyzer_project_statuses.analyzer_type = analyzer_namespace_statuses.analyzer_type
  AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
  AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
)

), old_values AS (
  SELECT
  namespace_id,
  traversal_ids,
  analyzer_type,
  success,
  failure
FROM analyzer_namespace_statuses
WHERE namespace_id IN (SELECT namespace_id FROM new_values)

), upserted AS (
  INSERT INTO analyzer_namespace_statuses
  (analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at)
  (SELECT analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at
FROM new_values
)
ON CONFLICT (namespace_id, analyzer_type)
DO UPDATE SET
  success = EXCLUDED.success,
  failure = EXCLUDED.failure,
  updated_at = EXCLUDED.updated_at,
  traversal_ids = EXCLUDED.traversal_ids
RETURNING namespace_id

), diff_values AS (
  SELECT
  new_values.namespace_id AS namespace_id,
  new_values.traversal_ids,
  new_values.analyzer_type,
  new_values.success - COALESCE(old_values.success, 0) AS success,
  new_values.failure - COALESCE(old_values.failure, 0) AS failure
FROM new_values
LEFT JOIN old_values
  ON new_values.namespace_id = old_values.namespace_id AND
     new_values.analyzer_type = old_values.analyzer_type
WHERE EXISTS (
  SELECT 1
  FROM upserted
  WHERE upserted.namespace_id = new_values.namespace_id
)

)
SELECT *
FROM diff_values
WHERE
  success != 0 OR
  failure != 0;
Plan

See full plan here.

 Hash Semi Join  (cost=2336074.36..2336187.13 rows=26 width=54) (actual time=1.946..1.951 rows=0 loops=1)
   Hash Cond: (new_values.namespace_id = upserted.namespace_id)
   Buffers: shared hit=28 read=7
   I/O Timings: read=1.835 write=0.000
   CTE new_values
     ->  Append  (cost=661088.31..2336072.43 rows=26 width=70) (actual time=1.945..1.947 rows=0 loops=1)
           Buffers: shared hit=28 read=7
           I/O Timings: read=1.835 write=0.000
           CTE namespace_data
             ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=68) (actual time=0.003..0.005 rows=3 loops=1)
                   I/O Timings: read=0.000 write=0.000
           ->  HashAggregate  (cost=661088.27..661088.87 rows=14 width=70) (actual time=1.785..1.786 rows=0 loops=1)
                 Group Key: namespace_data.traversal_ids, namespace_data.namespace_id, analyzer_project_statuses.analyzer_type
                 Filter: (count(analyzer_project_statuses.analyzer_type) > 0)
                 Rows Removed by Filter: 3
                 Buffers: shared hit=8 read=4
                 I/O Timings: read=1.733 write=0.000
                 ->  Nested Loop Left Join  (cost=0.56..538088.80 rows=4919979 width=40) (actual time=1.763..1.776 rows=3 loops=1)
                       Buffers: shared hit=8 read=4
                       I/O Timings: read=1.733 write=0.000
                       ->  CTE Scan on namespace_data  (cost=0.00..0.06 rows=3 width=68) (actual time=0.004..0.008 rows=3 loops=1)
                             I/O Timings: read=0.000 write=0.000
                       ->  Index Scan using index_analyzer_project_statuses_traversal_ids on public.analyzer_project_statuses  (cost=0.56..162962.98 rows=1639993 width=39) (actual time=0.588..0.588 rows=0 loops=3)
                             Index Cond: ((analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids) AND (analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id))
                             Filter: (NOT analyzer_project_statuses.archived)
                             Rows Removed by Filter: 0
                             Buffers: shared hit=8 read=4
                             I/O Timings: read=1.733 write=0.000
           ->  Subquery Scan on *SELECT* 2  (cost=1674983.10..1674983.40 rows=12 width=70) (actual time=0.158..0.160 rows=0 loops=1)
                 Buffers: shared hit=20 read=3
                 I/O Timings: read=0.102 write=0.000
                 ->  Unique  (cost=1674983.10..1674983.22 rows=12 width=62) (actual time=0.158..0.159 rows=0 loops=1)
                       Buffers: shared hit=20 read=3
                       I/O Timings: read=0.102 write=0.000
                       ->  Sort  (cost=1674983.10..1674983.13 rows=12 width=62) (actual time=0.157..0.158 rows=0 loops=1)
                             Sort Key: analyzer_namespace_statuses_1.analyzer_type, namespace_data_1.traversal_ids, namespace_data_1.namespace_id
                             Sort Method: quicksort  Memory: 25kB
                             Buffers: shared hit=20 read=3
                             I/O Timings: read=0.102 write=0.000
                             ->  Nested Loop Anti Join  (cost=0.86..1674982.89 rows=12 width=62) (actual time=0.124..0.125 rows=0 loops=1)
                                   Buffers: shared hit=9 read=3
                                   I/O Timings: read=0.102 write=0.000
                                   ->  Nested Loop  (cost=0.43..8.90 rows=13 width=70) (actual time=0.124..0.124 rows=0 loops=1)
                                         Buffers: shared hit=9 read=3
                                         I/O Timings: read=0.102 write=0.000
                                         ->  CTE Scan on namespace_data namespace_data_1  (cost=0.00..0.06 rows=3 width=68) (actual time=0.000..0.001 rows=3 loops=1)
                                               I/O Timings: read=0.000 write=0.000
                                         ->  Index Only Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses analyzer_namespace_statuses_1  (cost=0.43..2.91 rows=4 width=10) (actual time=0.040..0.040 rows=0 loops=3)
                                               Index Cond: (analyzer_namespace_statuses_1.namespace_id = namespace_data_1.namespace_id)
                                               Heap Fetches: 0
                                               Buffers: shared hit=9 read=3
                                               I/O Timings: read=0.102 write=0.000
                                   ->  Index Scan using index_analyzer_project_statuses_status on public.analyzer_project_statuses analyzer_project_statuses_1  (cost=0.43..123766.90 rows=1054281 width=37) (actual time=0.000..0.000 rows=0 loops=0)
                                         Index Cond: (analyzer_project_statuses_1.analyzer_type = analyzer_namespace_statuses_1.analyzer_type)
                                         Filter: (NOT analyzer_project_statuses_1.archived)
                                         Rows Removed by Filter: 0
                                         I/O Timings: read=0.000 write=0.000
   CTE upserted
     ->  ModifyTable on public.analyzer_namespace_statuses analyzer_namespace_statuses_2  (cost=0.00..0.65 rows=26 width=82) (actual time=0.001..0.001 rows=0 loops=1)
           I/O Timings: read=0.000 write=0.000
           ->  CTE Scan on new_values new_values_2  (cost=0.00..0.65 rows=26 width=82) (actual time=0.000..0.000 rows=0 loops=1)
                 I/O Timings: read=0.000 write=0.000
   ->  Nested Loop Left Join  (cost=0.43..112.71 rows=26 width=70) (actual time=1.946..1.946 rows=0 loops=1)
         Filter: (((new_values.success - COALESCE(analyzer_namespace_statuses.success, '0'::bigint)) <> 0) OR ((new_values.failure - COALESCE(analyzer_namespace_statuses.failure, '0'::bigint)) <> 0))
         Rows Removed by Filter: 0
         Buffers: shared hit=28 read=7
         I/O Timings: read=1.835 write=0.000
         ->  CTE Scan on new_values  (cost=0.00..0.52 rows=26 width=54) (actual time=1.945..1.946 rows=0 loops=1)
               Buffers: shared hit=28 read=7
               I/O Timings: read=1.835 write=0.000
         ->  Nested Loop Semi Join  (cost=0.43..4.29 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=0)
               I/O Timings: read=0.000 write=0.000
               ->  Index Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses  (cost=0.43..3.45 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=0)
                     Index Cond: ((analyzer_namespace_statuses.namespace_id = new_values.namespace_id) AND (analyzer_namespace_statuses.analyzer_type = new_values.analyzer_type))
                     I/O Timings: read=0.000 write=0.000
               ->  CTE Scan on new_values new_values_1  (cost=0.00..0.52 rows=26 width=4) (actual time=0.000..0.000 rows=0 loops=0)
                     I/O Timings: read=0.000 write=0.000
   ->  Hash  (cost=0.52..0.52 rows=26 width=8) (actual time=0.000..0.000 rows=0 loops=0)
         I/O Timings: read=0.000 write=0.000
         ->  CTE Scan on upserted  (cost=0.00..0.52 rows=26 width=8) (actual time=0.000..0.000 rows=0 loops=0)
               I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4', work_mem = '100MB'

Current solution

The current fix keeps the UNION ALL approach, but precomputes the unarchived project statistics in a project_stats CTE and runs the NOT EXISTS anti join against that small CTE instead of the analyzer_project_statuses table, avoiding the expensive per-analyzer-type scans of the previous solution.

Raw SQL
WITH new_values AS (
  WITH namespace_data (namespace_id, traversal_ids, next_traversal_id) AS (
    VALUES (23, ARRAY[23]::bigint[], ARRAY[24]::bigint[]), (24, ARRAY[23, 24]::bigint[], ARRAY[23, 25]::bigint[]), (25, ARRAY[23, 24, 25]::bigint[], ARRAY[23, 24, 26]::bigint[])
),
project_stats AS (
  SELECT
    analyzer_project_statuses.analyzer_type,
    COALESCE(SUM((status = 1)::int), 0) AS success,
    COALESCE(SUM((status = 2)::int), 0) AS failure,
    namespace_data.traversal_ids,
    namespace_data.namespace_id,
    now() AS created_at,
    now() AS updated_at
  FROM namespace_data
  LEFT JOIN analyzer_project_statuses
    ON analyzer_project_statuses.archived = FALSE
    AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
    AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
  GROUP BY namespace_data.traversal_ids, namespace_data.namespace_id, analyzer_project_statuses.analyzer_type
  HAVING count(analyzer_project_statuses.analyzer_type) > 0
),
existing_analyzer_types AS (
  SELECT DISTINCT
    analyzer_namespace_statuses.analyzer_type,
    namespace_data.traversal_ids,
    namespace_data.namespace_id,
    now() AS created_at,
    now() AS updated_at
  FROM namespace_data
  INNER JOIN analyzer_namespace_statuses
    ON analyzer_namespace_statuses.namespace_id = namespace_data.namespace_id
  WHERE NOT EXISTS (
    SELECT 1 FROM project_stats
    WHERE project_stats.namespace_id = namespace_data.namespace_id
    AND project_stats.analyzer_type = analyzer_namespace_statuses.analyzer_type
  )
)
SELECT analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at
FROM project_stats

UNION ALL

SELECT analyzer_type, 0 AS success, 0 AS failure, traversal_ids, namespace_id, created_at, updated_at
FROM existing_analyzer_types

), old_values AS (
  SELECT
  namespace_id,
  traversal_ids,
  analyzer_type,
  success,
  failure
FROM analyzer_namespace_statuses
WHERE namespace_id IN (SELECT namespace_id FROM new_values)

), upserted AS (
  INSERT INTO analyzer_namespace_statuses
  (analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at)
  (SELECT analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at
FROM new_values
)
ON CONFLICT (namespace_id, analyzer_type)
DO UPDATE SET
  success = EXCLUDED.success,
  failure = EXCLUDED.failure,
  updated_at = EXCLUDED.updated_at,
  traversal_ids = EXCLUDED.traversal_ids
RETURNING namespace_id

), diff_values AS (
  SELECT
  new_values.namespace_id AS namespace_id,
  new_values.traversal_ids,
  new_values.analyzer_type,
  new_values.success - COALESCE(old_values.success, 0) AS success,
  new_values.failure - COALESCE(old_values.failure, 0) AS failure
FROM new_values
LEFT JOIN old_values
  ON new_values.namespace_id = old_values.namespace_id AND
     new_values.analyzer_type = old_values.analyzer_type
WHERE EXISTS (
  SELECT 1
  FROM upserted
  WHERE upserted.namespace_id = new_values.namespace_id
)

)
SELECT *
FROM diff_values
WHERE
  success != 0 OR
  failure != 0;
Plan

See full plan here.

  Hash Semi Join  (cost=705127.96..705187.47 rows=15 width=54) (actual time=1.726..1.755 rows=9 loops=1)
   Hash Cond: (new_values.namespace_id = upserted.namespace_id)
   Buffers: shared hit=332 read=28 dirtied=21
   WAL: records=52 fpi=10 bytes=85105
   I/O Timings: read=0.905 write=0.000
   CTE new_values
     ->  Append  (cost=705118.00..705126.67 rows=15 width=70) (actual time=0.098..0.230 rows=9 loops=1)
           Buffers: shared hit=78
           I/O Timings: read=0.000 write=0.000
           CTE namespace_data
             ->  Values Scan on "*VALUES*"  (cost=0.00..0.04 rows=3 width=68) (actual time=0.001..0.003 rows=3 loops=1)
                   I/O Timings: read=0.000 write=0.000
           CTE project_stats
             ->  HashAggregate  (cost=705117.37..705117.96 rows=14 width=70) (actual time=0.097..0.105 rows=9 loops=1)
                   Group Key: namespace_data.traversal_ids, namespace_data.namespace_id, analyzer_project_statuses.analyzer_type
                   Filter: (count(analyzer_project_statuses.analyzer_type) > 0)
                   Rows Removed by Filter: 0
                   Buffers: shared hit=39
                   I/O Timings: read=0.000 write=0.000
                   ->  Nested Loop Left Join  (cost=0.56..575590.82 rows=5181062 width=40) (actual time=0.026..0.071 rows=24 loops=1)
                         Buffers: shared hit=36
                         I/O Timings: read=0.000 write=0.000
                         ->  CTE Scan on namespace_data  (cost=0.00..0.06 rows=3 width=68) (actual time=0.002..0.005 rows=3 loops=1)
                               I/O Timings: read=0.000 write=0.000
                         ->  Index Scan using index_analyzer_project_statuses_traversal_ids on public.analyzer_project_statuses  (cost=0.56..174593.38 rows=1727021 width=39) (actual time=0.011..0.020 rows=8 loops=3)
                               Index Cond: ((analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids) AND (analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id))
                               Filter: (NOT analyzer_project_statuses.archived)
                               Rows Removed by Filter: 1
                               Buffers: shared hit=36
                               I/O Timings: read=0.000 write=0.000
           ->  CTE Scan on project_stats  (cost=0.00..0.28 rows=14 width=70) (actual time=0.097..0.109 rows=9 loops=1)
                 Buffers: shared hit=39
                 I/O Timings: read=0.000 write=0.000
           ->  Subquery Scan on *SELECT* 2  (cost=8.28..8.31 rows=1 width=70) (actual time=0.113..0.115 rows=0 loops=1)
                 Buffers: shared hit=39
                 I/O Timings: read=0.000 write=0.000
                 ->  Subquery Scan on existing_analyzer_types  (cost=8.28..8.30 rows=1 width=62) (actual time=0.112..0.114 rows=0 loops=1)
                       Buffers: shared hit=39
                       I/O Timings: read=0.000 write=0.000
                       ->  Unique  (cost=8.28..8.30 rows=1 width=54) (actual time=0.111..0.113 rows=0 loops=1)
                             Buffers: shared hit=39
                             I/O Timings: read=0.000 write=0.000
                             ->  Sort  (cost=8.28..8.29 rows=1 width=54) (actual time=0.111..0.112 rows=0 loops=1)
                                   Sort Key: analyzer_namespace_statuses_1.analyzer_type, namespace_data_1.traversal_ids, namespace_data_1.namespace_id
                                   Sort Method: quicksort  Memory: 25kB
                                   Buffers: shared hit=39
                                   I/O Timings: read=0.000 write=0.000
                                   ->  Hash Anti Join  (cost=0.92..8.27 rows=1 width=54) (actual time=0.058..0.059 rows=0 loops=1)
                                         Hash Cond: ((namespace_data_1.namespace_id = project_stats_1.namespace_id) AND (analyzer_namespace_statuses_1.analyzer_type = project_stats_1.analyzer_type))
                                         Buffers: shared hit=28
                                         I/O Timings: read=0.000 write=0.000
                                         ->  Nested Loop  (cost=0.43..7.68 rows=13 width=38) (actual time=0.018..0.039 rows=6 loops=1)
                                               Buffers: shared hit=28
                                               I/O Timings: read=0.000 write=0.000
                                               ->  CTE Scan on namespace_data namespace_data_1  (cost=0.00..0.06 rows=3 width=36) (actual time=0.001..0.001 rows=3 loops=1)
                                                     I/O Timings: read=0.000 write=0.000
                                               ->  Index Only Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses analyzer_namespace_statuses_1  (cost=0.43..2.50 rows=4 width=10) (actual time=0.006..0.011 rows=2 loops=3)
                                                     Index Cond: (analyzer_namespace_statuses_1.namespace_id = namespace_data_1.namespace_id)
                                                     Heap Fetches: 15
                                                     Buffers: shared hit=28
                                                     I/O Timings: read=0.000 write=0.000
                                         ->  Hash  (cost=0.28..0.28 rows=14 width=6) (actual time=0.006..0.006 rows=9 loops=1)
                                               Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                               I/O Timings: read=0.000 write=0.000
                                               ->  CTE Scan on project_stats project_stats_1  (cost=0.00..0.28 rows=14 width=6) (actual time=0.001..0.002 rows=9 loops=1)
                                                     I/O Timings: read=0.000 write=0.000
   CTE upserted
     ->  ModifyTable on public.analyzer_namespace_statuses analyzer_namespace_statuses_2  (cost=0.00..0.38 rows=15 width=82) (actual time=0.637..1.684 rows=9 loops=1)
           Buffers: shared hit=290 read=28 dirtied=21
           WAL: records=52 fpi=10 bytes=85105
           I/O Timings: read=0.905 write=0.000
           ->  CTE Scan on new_values new_values_2  (cost=0.00..0.38 rows=15 width=82) (actual time=0.150..0.292 rows=9 loops=1)
                 Buffers: shared hit=101 dirtied=1
                 I/O Timings: read=0.000 write=0.000
   ->  Nested Loop Left Join  (cost=0.43..59.66 rows=15 width=70) (actual time=0.012..0.033 rows=9 loops=1)
         Filter: (((new_values.success - COALESCE(analyzer_namespace_statuses.success, '0'::bigint)) <> 0) OR ((new_values.failure - COALESCE(analyzer_namespace_statuses.failure, '0'::bigint)) <> 0))
         Rows Removed by Filter: 0
         Buffers: shared hit=42
         I/O Timings: read=0.000 write=0.000
         ->  CTE Scan on new_values  (cost=0.00..0.30 rows=15 width=54) (actual time=0.000..0.001 rows=9 loops=1)
               I/O Timings: read=0.000 write=0.000
         ->  Nested Loop Semi Join  (cost=0.43..3.94 rows=1 width=26) (actual time=0.003..0.003 rows=1 loops=9)
               Buffers: shared hit=42
               I/O Timings: read=0.000 write=0.000
               ->  Index Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses  (cost=0.43..3.45 rows=1 width=26) (actual time=0.002..0.002 rows=1 loops=9)
                     Index Cond: ((analyzer_namespace_statuses.namespace_id = new_values.namespace_id) AND (analyzer_namespace_statuses.analyzer_type = new_values.analyzer_type))
                     Buffers: shared hit=42
                     I/O Timings: read=0.000 write=0.000
               ->  CTE Scan on new_values new_values_1  (cost=0.00..0.30 rows=15 width=4) (actual time=0.000..0.000 rows=2 loops=6)
                     I/O Timings: read=0.000 write=0.000
   ->  Hash  (cost=0.30..0.30 rows=15 width=8) (actual time=1.697..1.697 rows=9 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=290 read=28 dirtied=21
         WAL: records=52 fpi=10 bytes=85105
         I/O Timings: read=0.905 write=0.000
         ->  CTE Scan on upserted  (cost=0.00..0.30 rows=15 width=8) (actual time=0.640..1.690 rows=9 loops=1)
               Buffers: shared hit=290 read=28 dirtied=21
               WAL: records=52 fpi=10 bytes=85105
               I/O Timings: read=0.905 write=0.000
Settings: random_page_cost = '1.5', work_mem = '100MB', effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4'

Plan Comparison

Previous solution plan

Append (UNION ALL)
├─ Branch 1: HashAggregate → project stats with counts > 0
└─ Branch 2: Subquery Scan
   └─ Unique
      └─ Sort
         └─ Nested Loop Anti Join (cost: 1,761,707) ⚠️
            ├─ Get all analyzer types from namespace
            └─ For EACH type, scan analyzer_project_statuses table
               └─ Check millions of rows to prove NON-existence

Current solution plan

Append (UNION ALL)
├─ Branch 1: CTE Scan on project_stats (already computed)
└─ Branch 2: Subquery Scan
   └─ Unique
      └─ Sort
         └─ Hash Anti Join (cost: 8.27) ✅
            ├─ Get all analyzer types from namespace
            └─ Hash lookup in project_stats CTE
               └─ Check only 9 rows to prove NON-existence
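
The key fragment that changed between the two solutions, excerpted from the queries above:

-- Previous (reverted): prove non-existence against the full analyzer_project_statuses
-- table, once per analyzer type
AND NOT EXISTS (
  SELECT 1
  FROM analyzer_project_statuses
  WHERE analyzer_project_statuses.archived = FALSE
  AND analyzer_project_statuses.analyzer_type = analyzer_namespace_statuses.analyzer_type
  AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
  AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
)

-- Current: prove non-existence against the small, already-computed project_stats CTE
WHERE NOT EXISTS (
  SELECT 1 FROM project_stats
  WHERE project_stats.namespace_id = namespace_data.namespace_id
  AND project_stats.analyzer_type = analyzer_namespace_statuses.analyzer_type
)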

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
