Skip to content

Metrics collection fails with error 'failed collecting read-only repository count metric' due very slow database query

Issue

After upgrading from 14.1.2 to 14.2.2 a customer found that some Prometheus metrics were no longer being reported. In the Praefect log the following error was being reported for each scrape request:

{
  "component": "RepositoryStoreCollector",
  "error": "query: pq: canceling statement due to user request",
  "level": "error",
  "msg": "failed collecting read-only repository count metric",
  "pid": 59911,
  "time": "2021-09-25T18:48:46.117Z"
}

The issue appears to be caused by an extremely slow query to check for read-only repositories. In a test instance with 70,003 rows in the repositories table this query takes 31 minutes to execute.

This problem is similar to #3286 (closed) and will be fixed by !3663 (closed) which removes the failing query.

The ANY('{"<VIRTUAL_STORAGE>"}' clause appears to be source of slowness, even with only one virtual storage. Modifying the query slightly to use a single virtual storage rather than an array cut execution time to 1.75 seconds:

SELECT virtual_storage, COUNT(*)
FROM repositories
WHERE NOT EXISTS (
	SELECT FROM valid_primaries
	WHERE valid_primaries.virtual_storage = repositories.virtual_storage
	AND   valid_primaries.relative_path   = repositories.relative_path
) AND repositories.virtual_storage = 'cluster'
GROUP BY virtual_storage

EXPLAIN ANALYZE results

Failing query with `repositories.virtual_storage = ANY('{"cluster"}')`, 31 minutes to execute
praefect_production=> EXPLAIN ANALYZE SELECT virtual_storage, COUNT(*) FROM repositories WHERE NOT EXISTS (                                    SELECT FROM valid_primaries                                                                                                                   WHERE valid_primaries.virtual_storage = repositories.virtual_storage                                                                          AND valid_primaries.relative_path = repositories.relative_path                                                                               ) AND repositories.virtual_storage = ANY('{"cluster"}')                                                                                       GROUP BY virtual_storage;
                                                                                                                                        QUERY 
PLAN                                                                                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=27585.00..27585.01 rows=1 width=16) (actual time=1859639.730..1859639.747 rows=0 loops=1)
   Group Key: repositories.virtual_storage
   Batches: 1  Memory Usage: 24kB
   ->  Nested Loop Anti Join  (cost=9.04..27234.97 rows=70007 width=8) (actual time=1859639.728..1859639.744 rows=0 loops=1)
         Join Filter: ((candidates.virtual_storage = repositories.virtual_storage) AND (candidates.relative_path = repositories.relative_path))
         Rows Removed by Join Filter: 4900350006
         ->  Seq Scan on repositories  (cost=0.00..2909.59 rows=70008 width=89) (actual time=0.013..451.147 rows=70003 loops=1)
               Filter: (virtual_storage = ANY ('{cluster}'::text[]))
         ->  Materialize  (cost=9.04..23100.24 rows=1 width=89) (actual time=0.004..12.759 rows=70003 loops=70003)
               ->  Subquery Scan on candidates  (cost=9.04..23100.24 rows=1 width=89) (actual time=0.166..2013.699 rows=140005 loops=1)
                     Filter: candidates.eligible
                     ->  WindowAgg  (cost=9.04..23100.22 rows=2 width=122) (actual time=0.166..1983.623 rows=140005 loops=1)
                           ->  Nested Loop Anti Join  (cost=9.04..23100.18 rows=2 width=121) (actual time=0.102..1827.837 rows=140006 loops=1)
                                 ->  Nested Loop Left Join  (cost=8.92..23099.86 rows=2 width=130) (actual time=0.098..1680.288 rows=140006 loops=1)
                                       Join Filter: ((storage_repositories.virtual_storage = repository_assignments.virtual_storage) AND (storage_repositories.relative_path = repository_assignments.relative_path) AND (storage_repositories.storage = repository_assignments.storage))
                                       ->  Nested Loop  (cost=8.92..23057.83 rows=2 width=98) (actual time=0.093..1624.873 rows=140006 loops=1)
                                             Join Filter: ((ns.node_name)::text = storage_repositories.storage)
                                             Rows Removed by Join Filter: 70003
                                             ->  Merge Join  (cost=8.50..22779.51 rows=349 width=114) (actual time=0.075..253.825 rows=140006 loops=1)
                                                   Merge Cond: (storage_repositories_1.virtual_storage = (ns.shard_name)::text)
                                                   ->  GroupAggregate  (cost=0.42..21885.73 rows=69786 width=97) (actual time=0.021..178.999 rows=70003 loops=1)
                                                         Group Key: storage_repositories_1.virtual_storage, storage_repositories_1.relative_path
                                                         ->  Index Scan using storage_repositories_pkey on storage_repositories storage_repositories_1  (cost=0.42..20137.82 rows=140006 width=97) (actual time=0.012..115.796 rows=140006 loops=1)
                                                   ->  Materialize  (cost=8.08..16.22 rows=1 width=17) (actual time=0.050..15.399 rows=70004 loops=1)
                                                         ->  GroupAggregate  (cost=8.08..16.21 rows=1 width=17) (actual time=0.049..0.070 rows=2 loops=1)
                                                               Group Key: ns.shard_name, ns.node_name
                                                               Filter: ((count(ns.praefect_name))::numeric >= (SubPlan 1))
                                                               ->  Sort  (cost=8.08..8.09 rows=1 width=41) (actual time=0.025..0.026 rows=2 loops=1)
                                                                     Sort Key: ns.shard_name, ns.node_name
                                                                     Sort Method: quicksort  Memory: 25kB
                                                                     ->  Seq Scan on node_status ns  (cost=0.00..8.07 rows=1 width=41) (actual time=0.015..0.016 rows=2 loops=1)
                                                                           Filter: (last_seen_active_at >= (now() - '00:00:10'::interval))
                                                                           Rows Removed by Filter: 2
                                                               SubPlan 1
                                                                 ->  Aggregate  (cost=8.08..8.10 rows=1 width=32) (actual time=0.015..0.016 rows=1 loops=2)
                                                                       ->  Seq Scan on node_status  (cost=0.00..8.08 rows=1 width=24) (actual time=0.004..0.005 rows=2 loops=2)
                                                                             Filter: (((shard_name)::text = (ns.shard_name)::text) AND (last_contact_attempt_at >= (now() - '00:01:00'::interval)))
                                                                             Rows Removed by Filter: 2
                                             ->  Index Scan using storage_repositories_pkey on storage_repositories  (cost=0.42..0.79 rows=1 width=106) (actual time=0.009..0.009 rows=2 loops=140006)
                                                   Index Cond: ((virtual_storage = storage_repositories_1.virtual_storage) AND (relative_path = storage_repositories_1.relative_path))
                                                   Filter: ((max(storage_repositories_1.generation)) = generation)
                                       ->  Materialize  (cost=0.00..19.15 rows=610 width=96) (actual time=0.000..0.000 rows=0 loops=140006)
                                             ->  Seq Scan on repository_assignments  (cost=0.00..16.10 rows=610 width=96) (actual time=0.002..0.003 rows=0 loops=1)
                                 ->  Index Scan using delete_replica_unique_index on replication_queue  (cost=0.12..0.15 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=140006)
                                       Index Cond: (((job ->> 'virtual_storage'::text) = storage_repositories.virtual_storage) AND ((job ->> 'relative_path'::text) = storage_repositories.relative_path))
                                       Filter: ((job ->> 'target_node_storage'::text) = storage_repositories.storage)
 Planning Time: 2.038 ms
 Execution Time: 1859642.870 ms
 (48 rows)
Modified query using `repositories.virtual_storage = 'cluster'`, 1.75 seconds to execute

praefect_production=> EXPLAIN ANALYZE SELECT virtual_storage, COUNT(*) FROM repositories WHERE NOT EXISTS ( SELECT FROM valid_primaries WHERE valid_primaries.virtual_storage = repositories.virtual_storage AND valid_primaries.relative_path = repositories.relative_path ) AND repositories.virtual_storage = 'cluster' GROUP BY virtual_storage; QUERY PLAN



GroupAggregate (cost=34596.60..39009.59 rows=1 width=16) (actual time=1747.042..1747.072 rows=0 loops=1) Group Key: repositories.virtual_storage -> Hash Anti Join (cost=34596.60..38660.41 rows=69834 width=8) (actual time=1747.025..1747.055 rows=0 loops=1) Hash Cond: ((repositories.virtual_storage = candidates.virtual_storage) AND (repositories.relative_path = candidates.relative_path)) -> Seq Scan on repositories (cost=0.00..2997.10 rows=70008 width=89) (actual time=0.008..14.227 rows=70003 loops=1) Filter: (virtual_storage = 'cluster'::text) -> Hash (cost=34593.97..34593.97 rows=175 width=89) (actual time=1608.694..1608.721 rows=140006 loops=1) Buckets: 32768 (originally 1024) Batches: 8 (originally 1) Memory Usage: 3841kB -> Subquery Scan on candidates (cost=34578.16..34593.97 rows=175 width=89) (actual time=1155.654..1549.349 rows=140006 loops=1) Filter: candidates.eligible -> WindowAgg (cost=34578.16..34590.47 rows=350 width=122) (actual time=1155.653..1521.809 rows=140006 loops=1) -> Merge Anti Join (cost=34578.16..34584.35 rows=350 width=121) (actual time=1155.633..1403.835 rows=140006 loops=1) Merge Cond: ((storage_repositories.relative_path = ((replication_queue.job ->> 'relative_path'::text))) AND (storage_repositories.storage = ((replication_queue.job ->> 'target_node_storage'::text)))) Join Filter: ((replication_queue.job ->> 'virtual_storage'::text) = storage_repositories.virtual_storage) -> Merge Left Join (cost=34570.00..34572.66 rows=350 width=130) (actual time=1155.608..1377.071 rows=140006 loops=1) Merge Cond: ((storage_repositories.relative_path = repository_assignments.relative_path) AND (storage_repositories.storage = repository_assignments.storage)) Join Filter: (storage_repositories.virtual_storage = repository_assignments.virtual_storage) -> Sort (cost=34558.70..34559.57 rows=350 width=98) (actual time=1155.565..1346.276 rows=140006 loops=1) Sort Key: storage_repositories.relative_path, storage_repositories.storage Sort Method: external merge Disk: 14808kB -> Hash Join (cost=7738.83..34543.91 rows=350 width=98) (actual time=178.780..552.039 rows=140006 loops=1) Hash Cond: ((storage_repositories_1.relative_path = storage_repositories.relative_path) AND ((max(storage_repositories_1.generation)) = storage_repositories.generation)) -> GroupAggregate (cost=0.42..22235.74 rows=69786 width=97) (actual time=0.031..195.422 rows=70003 loops=1) Group Key: storage_repositories_1.virtual_storage, storage_repositories_1.relative_path -> Index Scan using storage_repositories_pkey on storage_repositories storage_repositories_1 (cost=0.42..20487.84 rows=140006 width=97) (actual time=0.022..141.953 rows=140006 loops=1) Index Cond: (virtual_storage = 'cluster'::text) -> Hash (cost=5525.37..5525.37 rows=70003 width=106) (actual time=178.374..178.391 rows=140006 loops=1) Buckets: 32768 (originally 32768) Batches: 8 (originally 4) Memory Usage: 3841kB -> Hash Join (cost=4.24..5525.37 rows=70003 width=106) (actual time=0.094..109.500 rows=140006 loops=1) Hash Cond: (storage_repositories.storage = (ns.node_name)::text) -> Seq Scan on storage_repositories (cost=0.00..4296.07 rows=140006 width=106) (actual time=0.005..53.619 rows=140006 loops=1) Filter: (virtual_storage = 'cluster'::text) -> Hash (cost=4.23..4.23 rows=1 width=17) (actual time=0.069..0.082 rows=2 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> GroupAggregate (cost=2.09..4.22 rows=1 width=17) (actual time=0.053..0.073 rows=2 loops=1) Group Key: ns.shard_name, ns.node_name Filter: ((count(ns.praefect_name))::numeric >= (SubPlan 1)) -> Sort (cost=2.09..2.09 rows=1 width=41) (actual time=0.021..0.022 rows=2 loops=1) Sort Key: ns.node_name Sort Method: quicksort Memory: 25kB -> Seq Scan on node_status ns (cost=0.00..2.08 rows=1 width=41) (actual time=0.011..0.012 rows=2 loops=1) Filter: (((shard_name)::text = 'cluster'::text) AND (last_seen_active_at >= (now() - '00:00:10'::interval))) Rows Removed by Filter: 2 SubPlan 1 -> Aggregate (cost=2.08..2.10 rows=1 width=32) (actual time=0.015..0.019 rows=1 loops=2) -> Seq Scan on node_status (cost=0.00..2.08 rows=1 width=24) (actual time=0.003..0.004 rows=2 loops=2) Filter: (((shard_name)::text = (ns.shard_name)::text) AND (last_contact_attempt_at >= (now() - '00:01:00'::interval))) Rows Removed by Filter: 2 -> Sort (cost=11.31..11.31 rows=3 width=96) (actual time=0.035..0.036 rows=0 loops=1) Sort Key: repository_assignments.relative_path, repository_assignments.storage Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on repository_assignments (cost=4.17..11.28 rows=3 width=96) (actual time=0.020..0.021 rows=0 loops=1) Recheck Cond: (virtual_storage = 'cluster'::text) -> Bitmap Index Scan on repository_assignments_pkey (cost=0.00..4.17 rows=3 width=0) (actual time=0.016..0.016 rows=0 loops=1) Index Cond: (virtual_storage = 'cluster'::text) -> Sort (cost=8.15..8.16 rows=1 width=32) (actual time=0.023..0.023 rows=0 loops=1) Index Cond: ((job ->> 'virtual_storage'::text) = 'cluster'::text) Planning Time: 1.787 ms Execution Time: 1750.485 ms (62 rows)

/cc @8bitlife

Edited by Will Chandler (ex-GitLab)
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information