Metrics collection fails with error 'failed collecting read-only repository count metric' due very slow database query
Issue
After upgrading from 14.1.2 to 14.2.2 a customer found that some Prometheus metrics were no longer being reported. In the Praefect log the following error was being reported for each scrape request:
{
"component": "RepositoryStoreCollector",
"error": "query: pq: canceling statement due to user request",
"level": "error",
"msg": "failed collecting read-only repository count metric",
"pid": 59911,
"time": "2021-09-25T18:48:46.117Z"
}
The issue appears to be caused by an extremely slow query to check for read-only repositories. In a test instance with 70,003 rows in the repositories
table this query takes 31 minutes to execute.
This problem is similar to #3286 (closed) and will be fixed by !3663 (closed) which removes the failing query.
The ANY('{"<VIRTUAL_STORAGE>"}'
clause appears to be source of slowness, even with only one virtual storage. Modifying the query slightly to use a single virtual storage rather than an array cut execution time to 1.75 seconds:
SELECT virtual_storage, COUNT(*)
FROM repositories
WHERE NOT EXISTS (
SELECT FROM valid_primaries
WHERE valid_primaries.virtual_storage = repositories.virtual_storage
AND valid_primaries.relative_path = repositories.relative_path
) AND repositories.virtual_storage = 'cluster'
GROUP BY virtual_storage
EXPLAIN ANALYZE results
Failing query with `repositories.virtual_storage = ANY('{"cluster"}')`, 31 minutes to execute
praefect_production=> EXPLAIN ANALYZE SELECT virtual_storage, COUNT(*) FROM repositories WHERE NOT EXISTS ( SELECT FROM valid_primaries WHERE valid_primaries.virtual_storage = repositories.virtual_storage AND valid_primaries.relative_path = repositories.relative_path ) AND repositories.virtual_storage = ANY('{"cluster"}') GROUP BY virtual_storage; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=27585.00..27585.01 rows=1 width=16) (actual time=1859639.730..1859639.747 rows=0 loops=1) Group Key: repositories.virtual_storage Batches: 1 Memory Usage: 24kB -> Nested Loop Anti Join (cost=9.04..27234.97 rows=70007 width=8) (actual time=1859639.728..1859639.744 rows=0 loops=1) Join Filter: ((candidates.virtual_storage = repositories.virtual_storage) AND (candidates.relative_path = repositories.relative_path)) Rows Removed by Join Filter: 4900350006 -> Seq Scan on repositories (cost=0.00..2909.59 rows=70008 width=89) (actual time=0.013..451.147 rows=70003 loops=1) Filter: (virtual_storage = ANY ('{cluster}'::text[])) -> Materialize (cost=9.04..23100.24 rows=1 width=89) (actual time=0.004..12.759 rows=70003 loops=70003) -> Subquery Scan on candidates (cost=9.04..23100.24 rows=1 width=89) (actual time=0.166..2013.699 rows=140005 loops=1) Filter: candidates.eligible -> WindowAgg (cost=9.04..23100.22 rows=2 width=122) (actual time=0.166..1983.623 rows=140005 loops=1) -> Nested Loop Anti Join (cost=9.04..23100.18 rows=2 width=121) (actual time=0.102..1827.837 rows=140006 loops=1) -> Nested Loop Left Join (cost=8.92..23099.86 rows=2 width=130) (actual time=0.098..1680.288 rows=140006 loops=1) Join Filter: ((storage_repositories.virtual_storage = repository_assignments.virtual_storage) AND (storage_repositories.relative_path = repository_assignments.relative_path) AND (storage_repositories.storage = repository_assignments.storage)) -> Nested Loop (cost=8.92..23057.83 rows=2 width=98) (actual time=0.093..1624.873 rows=140006 loops=1) Join Filter: ((ns.node_name)::text = storage_repositories.storage) Rows Removed by Join Filter: 70003 -> Merge Join (cost=8.50..22779.51 rows=349 width=114) (actual time=0.075..253.825 rows=140006 loops=1) Merge Cond: (storage_repositories_1.virtual_storage = (ns.shard_name)::text) -> GroupAggregate (cost=0.42..21885.73 rows=69786 width=97) (actual time=0.021..178.999 rows=70003 loops=1) Group Key: storage_repositories_1.virtual_storage, storage_repositories_1.relative_path -> Index Scan using storage_repositories_pkey on storage_repositories storage_repositories_1 (cost=0.42..20137.82 rows=140006 width=97) (actual time=0.012..115.796 rows=140006 loops=1) -> Materialize (cost=8.08..16.22 rows=1 width=17) (actual time=0.050..15.399 rows=70004 loops=1) -> GroupAggregate (cost=8.08..16.21 rows=1 width=17) (actual time=0.049..0.070 rows=2 loops=1) Group Key: ns.shard_name, ns.node_name Filter: ((count(ns.praefect_name))::numeric >= (SubPlan 1)) -> Sort (cost=8.08..8.09 rows=1 width=41) (actual time=0.025..0.026 rows=2 loops=1) Sort Key: ns.shard_name, ns.node_name Sort Method: quicksort Memory: 25kB -> Seq Scan on node_status ns (cost=0.00..8.07 rows=1 width=41) (actual time=0.015..0.016 rows=2 loops=1) Filter: (last_seen_active_at >= (now() - '00:00:10'::interval)) Rows Removed by Filter: 2 SubPlan 1 -> Aggregate (cost=8.08..8.10 rows=1 width=32) (actual time=0.015..0.016 rows=1 loops=2) -> Seq Scan on node_status (cost=0.00..8.08 rows=1 width=24) (actual time=0.004..0.005 rows=2 loops=2) Filter: (((shard_name)::text = (ns.shard_name)::text) AND (last_contact_attempt_at >= (now() - '00:01:00'::interval))) Rows Removed by Filter: 2 -> Index Scan using storage_repositories_pkey on storage_repositories (cost=0.42..0.79 rows=1 width=106) (actual time=0.009..0.009 rows=2 loops=140006) Index Cond: ((virtual_storage = storage_repositories_1.virtual_storage) AND (relative_path = storage_repositories_1.relative_path)) Filter: ((max(storage_repositories_1.generation)) = generation) -> Materialize (cost=0.00..19.15 rows=610 width=96) (actual time=0.000..0.000 rows=0 loops=140006) -> Seq Scan on repository_assignments (cost=0.00..16.10 rows=610 width=96) (actual time=0.002..0.003 rows=0 loops=1) -> Index Scan using delete_replica_unique_index on replication_queue (cost=0.12..0.15 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=140006) Index Cond: (((job ->> 'virtual_storage'::text) = storage_repositories.virtual_storage) AND ((job ->> 'relative_path'::text) = storage_repositories.relative_path)) Filter: ((job ->> 'target_node_storage'::text) = storage_repositories.storage) Planning Time: 2.038 ms Execution Time: 1859642.870 ms (48 rows)
Modified query using `repositories.virtual_storage = 'cluster'`, 1.75 seconds to execute
praefect_production=> EXPLAIN ANALYZE SELECT virtual_storage, COUNT(*) FROM repositories WHERE NOT EXISTS ( SELECT FROM valid_primaries WHERE valid_primaries.virtual_storage = repositories.virtual_storage AND valid_primaries.relative_path = repositories.relative_path ) AND repositories.virtual_storage = 'cluster' GROUP BY virtual_storage; QUERY PLAN
GroupAggregate (cost=34596.60..39009.59 rows=1 width=16) (actual time=1747.042..1747.072 rows=0 loops=1) Group Key: repositories.virtual_storage -> Hash Anti Join (cost=34596.60..38660.41 rows=69834 width=8) (actual time=1747.025..1747.055 rows=0 loops=1) Hash Cond: ((repositories.virtual_storage = candidates.virtual_storage) AND (repositories.relative_path = candidates.relative_path)) -> Seq Scan on repositories (cost=0.00..2997.10 rows=70008 width=89) (actual time=0.008..14.227 rows=70003 loops=1) Filter: (virtual_storage = 'cluster'::text) -> Hash (cost=34593.97..34593.97 rows=175 width=89) (actual time=1608.694..1608.721 rows=140006 loops=1) Buckets: 32768 (originally 1024) Batches: 8 (originally 1) Memory Usage: 3841kB -> Subquery Scan on candidates (cost=34578.16..34593.97 rows=175 width=89) (actual time=1155.654..1549.349 rows=140006 loops=1) Filter: candidates.eligible -> WindowAgg (cost=34578.16..34590.47 rows=350 width=122) (actual time=1155.653..1521.809 rows=140006 loops=1) -> Merge Anti Join (cost=34578.16..34584.35 rows=350 width=121) (actual time=1155.633..1403.835 rows=140006 loops=1) Merge Cond: ((storage_repositories.relative_path = ((replication_queue.job ->> 'relative_path'::text))) AND (storage_repositories.storage = ((replication_queue.job ->> 'target_node_storage'::text)))) Join Filter: ((replication_queue.job ->> 'virtual_storage'::text) = storage_repositories.virtual_storage) -> Merge Left Join (cost=34570.00..34572.66 rows=350 width=130) (actual time=1155.608..1377.071 rows=140006 loops=1) Merge Cond: ((storage_repositories.relative_path = repository_assignments.relative_path) AND (storage_repositories.storage = repository_assignments.storage)) Join Filter: (storage_repositories.virtual_storage = repository_assignments.virtual_storage) -> Sort (cost=34558.70..34559.57 rows=350 width=98) (actual time=1155.565..1346.276 rows=140006 loops=1) Sort Key: storage_repositories.relative_path, storage_repositories.storage Sort Method: external merge Disk: 14808kB -> Hash Join (cost=7738.83..34543.91 rows=350 width=98) (actual time=178.780..552.039 rows=140006 loops=1) Hash Cond: ((storage_repositories_1.relative_path = storage_repositories.relative_path) AND ((max(storage_repositories_1.generation)) = storage_repositories.generation)) -> GroupAggregate (cost=0.42..22235.74 rows=69786 width=97) (actual time=0.031..195.422 rows=70003 loops=1) Group Key: storage_repositories_1.virtual_storage, storage_repositories_1.relative_path -> Index Scan using storage_repositories_pkey on storage_repositories storage_repositories_1 (cost=0.42..20487.84 rows=140006 width=97) (actual time=0.022..141.953 rows=140006 loops=1) Index Cond: (virtual_storage = 'cluster'::text) -> Hash (cost=5525.37..5525.37 rows=70003 width=106) (actual time=178.374..178.391 rows=140006 loops=1) Buckets: 32768 (originally 32768) Batches: 8 (originally 4) Memory Usage: 3841kB -> Hash Join (cost=4.24..5525.37 rows=70003 width=106) (actual time=0.094..109.500 rows=140006 loops=1) Hash Cond: (storage_repositories.storage = (ns.node_name)::text) -> Seq Scan on storage_repositories (cost=0.00..4296.07 rows=140006 width=106) (actual time=0.005..53.619 rows=140006 loops=1) Filter: (virtual_storage = 'cluster'::text) -> Hash (cost=4.23..4.23 rows=1 width=17) (actual time=0.069..0.082 rows=2 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> GroupAggregate (cost=2.09..4.22 rows=1 width=17) (actual time=0.053..0.073 rows=2 loops=1) Group Key: ns.shard_name, ns.node_name Filter: ((count(ns.praefect_name))::numeric >= (SubPlan 1)) -> Sort (cost=2.09..2.09 rows=1 width=41) (actual time=0.021..0.022 rows=2 loops=1) Sort Key: ns.node_name Sort Method: quicksort Memory: 25kB -> Seq Scan on node_status ns (cost=0.00..2.08 rows=1 width=41) (actual time=0.011..0.012 rows=2 loops=1) Filter: (((shard_name)::text = 'cluster'::text) AND (last_seen_active_at >= (now() - '00:00:10'::interval))) Rows Removed by Filter: 2 SubPlan 1 -> Aggregate (cost=2.08..2.10 rows=1 width=32) (actual time=0.015..0.019 rows=1 loops=2) -> Seq Scan on node_status (cost=0.00..2.08 rows=1 width=24) (actual time=0.003..0.004 rows=2 loops=2) Filter: (((shard_name)::text = (ns.shard_name)::text) AND (last_contact_attempt_at >= (now() - '00:01:00'::interval))) Rows Removed by Filter: 2 -> Sort (cost=11.31..11.31 rows=3 width=96) (actual time=0.035..0.036 rows=0 loops=1) Sort Key: repository_assignments.relative_path, repository_assignments.storage Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on repository_assignments (cost=4.17..11.28 rows=3 width=96) (actual time=0.020..0.021 rows=0 loops=1) Recheck Cond: (virtual_storage = 'cluster'::text) -> Bitmap Index Scan on repository_assignments_pkey (cost=0.00..4.17 rows=3 width=0) (actual time=0.016..0.016 rows=0 loops=1) Index Cond: (virtual_storage = 'cluster'::text) -> Sort (cost=8.15..8.16 rows=1 width=32) (actual time=0.023..0.023 rows=0 loops=1) Index Cond: ((job ->> 'virtual_storage'::text) = 'cluster'::text) Planning Time: 1.787 ms Execution Time: 1750.485 ms (62 rows)
/cc @8bitlife