Draft: Support group archive in analyzer namespace status adjustment service
What does this MR do and why?
Adds support for group archive in analyzer namespace status adjustment service.
Changelog: fixed
EE: true
The group archive feature adds a group-level archive flag that affects all descendant groups and projects. This breaks the existing logic in AnalyzerNamespaceStatuses::AdjustmentService
, which aggregates project-level records based on the traversal_ids
of the namespace in context.
To resolve this, I tried excluding project-level records that have an archived namespace_id
anywhere in their traversal_ids
array.
Example:
Consider this namespace structure where group [1,2] is archived and we should ignore the dotted projects [1,2,4]
, [1,2,5]
when calculating the statistics for [1]
:
graph TD
A["[1]"]
A --> B["[1,1]"]
A --> C["[1,2] - Archived"]
A --> D["[1,3]"]
B --> E["[1,1,4]"]
B --> F["[1,1,5]"]
C --> G["[1,2,4]"]
C --> H["[1,2,5]"]
D --> I["[1,3,4]"]
D --> J["[1,3,5]"]
classDef dotted stroke-dasharray: 5 5
class G,H dotted
The fix adds an array overlap check to the WHERE clause:
AND NOT (analyzer_project_statuses.traversal_ids && ARRAY[<ARCHIVED_NAMESPACE_IDS>]::bigint[])
Where ARCHIVED_NAMESPACE_IDS
is all the traversal_ids
of the namespaces in this batch.
Query and plan:
Query
WITH new_values AS (
WITH namespace_data (
namespace_id,
traversal_ids,
next_traversal_id
) AS (
VALUES (98867842, ARRAY[9970, 96981785, 97058478, 98867842]::bigint[], ARRAY[9970, 96981785, 97058478, 98867843]::bigint[]))
SELECT
analyzer_project_statuses.analyzer_type,
COALESCE(SUM((status = 1)::int), 0) AS success,
COALESCE(SUM((status = 2)::int), 0) AS failure,
namespace_data.traversal_ids AS traversal_ids,
namespace_data.namespace_id AS namespace_id,
now() AS created_at,
now() AS updated_at
FROM
namespace_data
LEFT JOIN analyzer_project_statuses ON analyzer_project_statuses.archived = FALSE
AND analyzer_project_statuses.traversal_ids >= namespace_data.traversal_ids
AND analyzer_project_statuses.traversal_ids < namespace_data.next_traversal_id
AND NOT (analyzer_project_statuses.traversal_ids && ARRAY[231, 222, 96981785]::bigint[])
GROUP BY
namespace_data.traversal_ids,
namespace_id,
analyzer_project_statuses.analyzer_type
HAVING
count(analyzer_project_statuses.analyzer_type) > 0
),
old_values AS (
SELECT
namespace_id,
traversal_ids,
analyzer_type,
success,
failure
FROM
analyzer_namespace_statuses
WHERE
namespace_id IN (
SELECT
namespace_id
FROM
new_values)
),
upserted AS (
INSERT INTO analyzer_namespace_statuses (analyzer_type, success, failure, traversal_ids, namespace_id, created_at, updated_at) (
SELECT
analyzer_type,
success,
failure,
traversal_ids,
namespace_id,
created_at,
updated_at
FROM
new_values)
ON CONFLICT (namespace_id,
analyzer_type)
DO UPDATE SET
success = EXCLUDED.success, failure = EXCLUDED.failure, updated_at = EXCLUDED.updated_at, traversal_ids = EXCLUDED.traversal_ids
RETURNING
namespace_id
),
diff_values AS (
SELECT
new_values.namespace_id AS namespace_id,
new_values.traversal_ids,
new_values.analyzer_type,
new_values.success - COALESCE(old_values.success, 0) AS success,
new_values.failure - COALESCE(old_values.failure, 0) AS failure
FROM
new_values
LEFT JOIN old_values ON new_values.namespace_id = old_values.namespace_id
AND new_values.analyzer_type = old_values.analyzer_type
WHERE
EXISTS (
SELECT
1
FROM
upserted
WHERE
upserted.namespace_id = new_values.namespace_id))
SELECT
*
FROM
diff_values
WHERE
success != 0
OR failure != 0
plan
Nested Loop Semi Join (cost=4.11..7.24 rows=1 width=54) (actual time=69.444..69.449 rows=0 loops=1)
Buffers: shared hit=8 read=125 dirtied=7
WAL: records=7 fpi=7 bytes=56355
I/O Timings: read=68.373 write=0.000
CTE new_values
-> Aggregate (cost=3.61..3.65 rows=1 width=70) (actual time=69.442..69.444 rows=0 loops=1)
Group Key: analyzer_project_statuses.analyzer_type
Filter: (count(analyzer_project_statuses.analyzer_type) > 0)
Rows Removed by Filter: 1
Buffers: shared hit=8 read=125 dirtied=7
WAL: records=7 fpi=7 bytes=56355
I/O Timings: read=68.373 write=0.000
-> Sort (cost=3.61..3.62 rows=1 width=4) (actual time=69.433..69.435 rows=1 loops=1)
Sort Key: analyzer_project_statuses.analyzer_type
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=8 read=125 dirtied=7
WAL: records=7 fpi=7 bytes=56355
I/O Timings: read=68.373 write=0.000
-> Nested Loop Left Join (cost=0.56..3.60 rows=1 width=4) (actual time=69.405..69.408 rows=1 loops=1)
Buffers: shared hit=5 read=125 dirtied=7
WAL: records=7 fpi=7 bytes=56355
I/O Timings: read=68.373 write=0.000
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_analyzer_project_statuses_traversal_ids on public.analyzer_project_statuses (cost=0.56..3.58 rows=1 width=4) (actual time=69.401..69.402 rows=0 loops=1)
Index Cond: ((analyzer_project_statuses.traversal_ids >= '{9970,96981785,97058478,98867842}'::bigint[]) AND (analyzer_project_statuses.traversal_ids < '{9970,96981785,97058478,98867843}'::bigint[]))
Filter: ((NOT analyzer_project_statuses.archived) AND (NOT (analyzer_project_statuses.traversal_ids && '{231,222,96981785}'::bigint[])))
Rows Removed by Filter: 162
Buffers: shared hit=5 read=125 dirtied=7
WAL: records=7 fpi=7 bytes=56355
I/O Timings: read=68.373 write=0.000
CTE upserted
-> ModifyTable on public.analyzer_namespace_statuses analyzer_namespace_statuses_1 (cost=0.00..0.03 rows=1 width=82) (actual time=0.003..0.003 rows=0 loops=1)
I/O Timings: read=0.000 write=0.000
-> CTE Scan on new_values new_values_2 (cost=0.00..0.03 rows=1 width=82) (actual time=0.001..0.001 rows=0 loops=1)
I/O Timings: read=0.000 write=0.000
-> Nested Loop Left Join (cost=0.43..3.52 rows=1 width=70) (actual time=69.444..69.445 rows=0 loops=1)
Filter: (((new_values.success - COALESCE(analyzer_namespace_statuses.success, '0'::bigint)) <> 0) OR ((new_values.failure - COALESCE(analyzer_namespace_statuses.failure, '0'::bigint)) <> 0))
Rows Removed by Filter: 0
Buffers: shared hit=8 read=125 dirtied=7
WAL: records=7 fpi=7 bytes=56355
I/O Timings: read=68.373 write=0.000
-> CTE Scan on new_values (cost=0.00..0.02 rows=1 width=54) (actual time=69.443..69.444 rows=0 loops=1)
Buffers: shared hit=8 read=125 dirtied=7
WAL: records=7 fpi=7 bytes=56355
I/O Timings: read=68.373 write=0.000
-> Nested Loop Semi Join (cost=0.43..3.48 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=0)
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_analyzer_namespace_statuses_status on public.analyzer_namespace_statuses (cost=0.43..3.45 rows=1 width=26) (actual time=0.000..0.000 rows=0 loops=0)
Index Cond: ((analyzer_namespace_statuses.namespace_id = new_values.namespace_id) AND (analyzer_namespace_statuses.analyzer_type = new_values.analyzer_type))
I/O Timings: read=0.000 write=0.000
-> CTE Scan on new_values new_values_1 (cost=0.00..0.02 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=0)
I/O Timings: read=0.000 write=0.000
-> CTE Scan on upserted (cost=0.00..0.02 rows=1 width=8) (actual time=0.000..0.000 rows=0 loops=0)
I/O Timings: read=0.000 write=0.000
Settings: random_page_cost = '1.5', work_mem = '100MB', effective_cache_size = '338688MB', jit = 'off', seq_page_cost = '4'
Note the addition of this filter condition:
AND (NOT (analyzer_project_statuses.traversal_ids && '{<FLATTENED_TRAVERSAL_IDS>}'::bigint[])))
Given a BATCH_SIZE
of 100
namespaces and a maximum depth of 20
namespaces, is the performance impact of this array overlap check acceptable?
Related to Make Security::AnalyzersStatus::ProcessArchived... (#572476) • Gal Katz • Backlog