Expensive SQL queries when distributed reads are enabled
In gitlab#227215 (comment 375428748), we observed that the Praefect Cloud SQL database (8-core, 30 GB RAM) pegged out at CPU at 100%, possibly leading to lots of stalled RPCs and job failures, particularly in the www-gitlab-com
repo:
I think the query from https://gitlab.com/gitlab-org/gitaly/blob/26bae69b6bc2ddbcfa94811eb939441acabffe76/internal/praefect/datastore/queue.go#L396-405 may need to be optimized:
SELECT
storage
FROM
(
SELECT
DISTINCT ON (job ->> 'target_node_storage') job ->> 'target_node_storage' AS storage,
state
FROM
replication_queue
WHERE
job ->> 'virtual_storage' = 'praefect-file01'
AND job ->> 'relative_path' = '@hashed/fa/53/fa539965395b8382145f8370b34eab249cf610d2d6f2943c95b9b9d08a63d4a3.git'
ORDER BY
job ->> 'target_node_storage',
updated_at DESC NULLS FIRST
) t
WHERE
state = 'completed';
EXPLAIN output: https://explain.depesz.com/s/tufT