Stop calling Praefect's Readiness RPC from Rails' readiness endpoint
GitLab Rails has a readiness check endpoint. Clients such as a load balancer call it periodically to determine whether a given instance of the service is ready to serve traffic.
The checks exposed by Praefect's Readiness RPC are not a good fit for this use case. They were intended as diagnostic tools, for example to identify issues before launching a Praefect. The checks in the RPC open multiple database pools and run heavy queries. They can also lead to cascading failures. The checks fail if a Gitaly node is unreachable or a given repository is unavailable, so every Praefect node can end up considered unready to serve.
In gitlab!95243 (merged), the aim was to extend the gitlab:gitaly:check Rake task to run these diagnostic checks. However, the code that was extended is also used by the readiness endpoint. As a result, the Rails API now relies on these diagnostic checks to decide whether Praefect is ready to serve traffic.
We should fix this in Rails by invoking these checks only from the Rake task and not from the readiness endpoint.
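The proposed split can be sketched as follows. This is a minimal, language-neutral illustration in Python, not GitLab's actual Rails code. `HypotheticalGitalyClient`, `ping`, `run_diagnostics`, `readiness` and `rake_gitaly_check` are all hypothetical names. The sketch assumes the readiness path needs only a cheap connectivity check, while the Rake task path also runs the heavy diagnostic checks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

CheckResult = Tuple[str, bool]  # (check name, passed)


@dataclass
class HypotheticalGitalyClient:
    """Hypothetical stand-in for the Rails-side Gitaly/Praefect client."""
    reachable: bool = True
    # Hypothetical diagnostic checks, e.g. those behind Praefect's Readiness RPC.
    diagnostics: List[Callable[[], CheckResult]] = field(default_factory=list)

    def ping(self) -> bool:
        # Cheap connectivity check: no database pools, no heavy queries.
        return self.reachable

    def run_diagnostics(self) -> List[CheckResult]:
        # Expensive checks meant for operators, not for load balancers.
        return [check() for check in self.diagnostics]


def readiness(client: HypotheticalGitalyClient) -> bool:
    # Readiness endpoint: only asks whether Praefect can be reached at all.
    return client.ping()


def rake_gitaly_check(client: HypotheticalGitalyClient) -> List[CheckResult]:
    # Rake task: connectivity first, then the full diagnostic suite.
    results: List[CheckResult] = [("connectivity", client.ping())]
    if results[0][1]:
        results.extend(client.run_diagnostics())
    return results
```

With this split, a failing diagnostic (say, an unavailable repository) shows up in the Rake task output but does not mark the instance as unready, which avoids the cascading failure described above.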
Alternatively, we could remove the integration in the Rake task. An inherent issue with the integration is that Rails is not aware of the multiple Praefect nodes but calls Praefect through the load balancer. The diagnostic checks thus run on a random Praefect picked by the load balancer. Calling the endpoint again may return different results if there is a problem on one Praefect but not all. It could also return a success and thus confusingly hide an issue present on one Praefect.
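The load balancer problem can be illustrated with a small simulation. The sketch assumes the load balancer forwards each call to a randomly chosen Praefect node. The cluster layout and node names (`praefect-1` and so on) are hypothetical. With one unhealthy node out of three, repeated calls through the load balancer return both passing and failing results for the same cluster.

```python
import random
from typing import Dict, List


def check_through_load_balancer(node_health: Dict[str, bool], rng: random.Random) -> bool:
    # Rails does not know the individual Praefect nodes; the load balancer
    # forwards each call to one of them (modelled here as a random choice).
    node = rng.choice(sorted(node_health))
    return node_health[node]


# Hypothetical cluster: one of three Praefect nodes has a problem.
cluster = {"praefect-1": True, "praefect-2": False, "praefect-3": True}
rng = random.Random(42)
results: List[bool] = [check_through_load_balancer(cluster, rng) for _ in range(100)]
# Across 100 calls, expect both True (problem hidden) and False (problem seen).
print(results.count(True), results.count(False))
```

Any single `True` result here is exactly the confusing success described above: the call happened to land on a healthy node while `praefect-2` still has an issue.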