Measure repository downtime duration during repository migration
With https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11809 ramping up, we expect to start performing automated repository migrations en masse imminently.
One of the questions that's been raised in this regard is the duration of the downtime during a repository move.
My response is normally "very short", but I don't know what range this value lies in.
Request
Is it possible to record this duration, in logs (including the project name) and in metrics (as a histogram)?
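To make the request concrete, here is a minimal, dependency-free sketch of what recording the duration could look like. It is not how GitLab or Gitaly implement repository moves. The bucket boundaries, the `record_downtime` helper, the logger name, and the log field names are all hypothetical, and it assumes "downtime" means the window during which the repository is unavailable for writes. In practice the histogram would more likely come from a Prometheus client library than from a hand-rolled class.

```python
import bisect
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("repository_move")  # hypothetical logger name

# Hypothetical bucket upper bounds in seconds; real values would be tuned
# once we know what range the downtime actually falls in.
BUCKETS = [0.1, 0.5, 1, 5, 10, 30, 60, 300]


class DowntimeHistogram:
    """counts[i] holds observations in (BUCKETS[i-1], BUCKETS[i]];
    the final extra slot holds everything above the largest bound."""

    def __init__(self, buckets):
        self.buckets = list(buckets)
        self.counts = [0] * (len(self.buckets) + 1)
        self.total = 0.0

    def observe(self, seconds):
        # bisect_left returns the first bucket whose upper bound is >= seconds,
        # or len(buckets) (the overflow slot) if there is none.
        self.counts[bisect.bisect_left(self.buckets, seconds)] += 1
        self.total += seconds


histogram = DowntimeHistogram(BUCKETS)


@contextmanager
def record_downtime(project_path, clock=time.monotonic):
    """Time the wrapped block, then record it in the histogram and the log."""
    start = clock()
    try:
        yield
    finally:
        duration = clock() - start
        histogram.observe(duration)
        # One structured log line per move, including the project name.
        logger.info(json.dumps({
            "event": "repository_move_downtime",
            "project": project_path,
            "duration_s": round(duration, 3),
        }))
```

A caller would wrap only the unavailable portion of the move, for example `with record_downtime("group/project"): ...`, so that the copy phase before the cutover is not counted as downtime.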
Why?
Since we plan to be doing a lot of these migrations, we should model an SLI for repository shard migration downtime, and set an SLO. This will give us meaningful numbers with which we can communicate with customers, TAMs, etc.
This SLI and SLO would be evaluated through the metrics-catalog: https://gitlab.com/gitlab-com/runbooks/blob/master/metrics-catalog/services/gitaly.jsonnet
Additionally, if we find that this value is too high, recording it will help us gauge our progress towards a satisfactory value.
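As a rough illustration of how the SLI proposed in this section could be derived from histogram data, the sketch below computes the fraction of migrations whose downtime stayed at or below a threshold and compares it with a target. The function names, the 5-second threshold, and the 99% target are hypothetical placeholders, not proposed values, and this is not the metrics-catalog format.

```python
def downtime_sli(bucket_counts, bucket_bounds, threshold_s):
    """Fraction of migrations with downtime <= threshold_s.

    bucket_counts[i] counts observations in (bounds[i-1], bounds[i]];
    the final extra entry counts observations above the largest bound.
    threshold_s must be one of bucket_bounds, because a histogram cannot
    resolve thresholds that fall inside a bucket.
    """
    total = sum(bucket_counts)
    if total == 0:
        return None  # no migrations recorded, so the SLI is undefined
    idx = bucket_bounds.index(threshold_s)
    good = sum(bucket_counts[: idx + 1])
    return good / total


def meets_slo(sli, target):
    """True when the SLI is defined and at or above the target ratio."""
    return sli is not None and sli >= target


# Made-up numbers purely for illustration: 100 migrations, 98 of them
# with downtime of at most 5 seconds.
example_bounds = [1, 5, 30]
example_counts = [90, 8, 1, 1]
example_sli = downtime_sli(example_counts, example_bounds, threshold_s=5)
example_ok = meets_slo(example_sli, target=0.99)
```

With these made-up counts the SLI is 0.98, which would miss a 99% target. Tracking the same ratio over time is one way to gauge progress towards a satisfactory value.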
cc @albertoramos @nnelson @davis_townsend @awthomas @derekferguson @zj-gitlab