Add metrics on searching for caught up replicas [RUN ALL RSPEC] [RUN AS-IF-FOSS]
What does this MR do?
Add new metrics that record how often we search for caught-up replicas and what each search returns. (In the current implementation, all replicas must be caught up before we unstick.) This lets us see how often we perform the operation and how its results are distributed.
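As a rough illustration of the idea, the sketch below counts each search and labels it with its boolean result. It is not the code from this MR: `LabeledCounter`, `Replica`, the position arguments, and the `result` label name are hypothetical stand-ins, and the in-memory counter only imitates what a real metrics client would do.

```ruby
# Hypothetical in-memory counter keyed by label set. It stands in for a real
# metrics client, whose API is not shown in this MR description.
class LabeledCounter
  def initialize
    @values = Hash.new(0)
  end

  def increment(labels)
    @values[labels] += 1
  end

  def get(labels)
    @values[labels]
  end
end

# Hypothetical replica stand-in: it counts as caught up once its replayed
# position has reached the position recorded at write time.
Replica = Struct.new(:replayed_position) do
  def caught_up?(write_position)
    replayed_position >= write_position
  end
end

# Returns true only when every replica is caught up, and records the outcome
# under a `result` label (label name assumed for illustration).
def all_caught_up?(replicas, write_position, counter)
  result = replicas.all? { |replica| replica.caught_up?(write_position) }
  counter.increment({ result: result })
  result
end

counter = LabeledCounter.new
replicas = [Replica.new(10), Replica.new(7)]

all_caught_up?(replicas, 7, counter) # both replicas have reached position 7
all_caught_up?(replicas, 9, counter) # the second replica lags behind position 9
```

Keeping one counter with a boolean label, rather than two separate counters, makes it easy to read both the total number of searches and the share that fell back to the primary.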
Does this MR meet the acceptance criteria?
Conformity
- I have included changelog trailers, or none are needed. (Does this MR need a changelog?)
- I have added/updated documentation, or it's not needed. (Is documentation required?)
- I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
- I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
- I have self-reviewed this MR per code review guidelines.
- This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
- I have followed the style guides.
- This change is backwards compatible across updates, or this does not apply.
Availability and Testing
Test plan:
- Configure replication on GDK: https://gitlab.com/gitlab-org/gitlab-development-kit/-/blob/main/doc/howto/database_load_balancing.md
- Check that the metrics appear on `/-/metrics` (some write is required first, e.g. create an issue).
- You could also simulate replication delay (link); the metric should then appear with the `false` label, which means `all_caught_up?` was false and we picked the primary.
- Additionally, test the structured logging output.
- Also check with load balancing both enabled and disabled, and verify that the middleware loading order change didn't cause any issues.
- I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
- I have tested this MR in all supported browsers, or it's not needed.
- I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
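For the `/-/metrics` step of the test plan, a small script can pick the relevant samples out of the endpoint's text output. The sketch below is only an illustration: the sample text is made up, and the metric name `example_caught_up_replica_searches_total` and the `result` label are placeholders, not the names this MR introduces. It assumes the endpoint returns the Prometheus text exposition format.

```ruby
# Made-up sample in Prometheus text exposition format. The metric and label
# names are placeholders, not the names this MR introduces.
SAMPLE = <<~METRICS
  # TYPE example_caught_up_replica_searches_total counter
  example_caught_up_replica_searches_total{result="true"} 5
  example_caught_up_replica_searches_total{result="false"} 2
METRICS

# Collects "label string => value" pairs for one metric name. Comment lines
# and other metrics do not match the anchored pattern and are skipped.
def samples_for(text, metric_name)
  pattern = /\A#{Regexp.escape(metric_name)}\{(?<labels>[^}]*)\}\s+(?<value>\S+)\z/
  text.each_line(chomp: true).each_with_object({}) do |line, found|
    match = pattern.match(line)
    next unless match

    found[match[:labels]] = Float(match[:value])
  end
end

found = samples_for(SAMPLE, "example_caught_up_replica_searches_total")
```

With real output saved from the endpoint in place of `SAMPLE`, a non-zero value under the `false` label would indicate that at least one search found a lagging replica.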
Security
N/A
Related to #326125 (closed)
Edited by Aleksei Lipniagov