Geo: Sync timeout problem
A customer has limited bandwidth between two Geo nodes. It looks like a 2.4 GB repo will take about 4 hours to clone, which is longer than the default timeout. http://gitlab.zendesk.com/agent/tickets/110963
It was difficult to:
- Discover that the clone was hitting the timeout
- Figure out how high the timeout should be
We increased the timeout a long time ago for Geo purposes here: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/15292 and omnibus-gitlab@f6a56821
The default timeout is 10800 seconds (3 hours). Confusingly, the template makes it look like the default is 800 seconds (13.3 mins).
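To make the numbers above concrete, here is a rough sketch of how one might estimate whether a clone fits inside the timeout and what a larger value could look like. The helper names and the 1.5x headroom factor are assumptions chosen for illustration, not anything GitLab ships; the repo size, clone time, and timeout values are the ones quoted in this issue.

```python
import math

# Figures quoted in this issue.
REPO_BYTES = 2_400_000_000          # 2.4 GB repository
OBSERVED_CLONE_SECONDS = 4 * 3600   # about 4 hours to clone
DEFAULT_TIMEOUT_SECONDS = 10800     # actual default (3 hours)
TEMPLATE_TIMEOUT_SECONDS = 800      # value shown in gitlab.rb.template


def clone_seconds(repo_bytes: int, bytes_per_second: float) -> float:
    """Estimate clone duration at a sustained throughput (hypothetical helper)."""
    if bytes_per_second <= 0:
        raise ValueError("throughput must be positive")
    return repo_bytes / bytes_per_second


def suggested_timeout(expected_seconds: float, headroom: float = 1.5) -> int:
    """Pad an expected duration by a headroom factor (1.5 is an arbitrary assumption)."""
    return math.ceil(expected_seconds * headroom)


# Effective throughput implied by the customer's numbers, in bytes per second.
throughput = REPO_BYTES / OBSERVED_CLONE_SECONDS

# True when the clone would be cut off by the default timeout.
exceeds_default = OBSERVED_CLONE_SECONDS > DEFAULT_TIMEOUT_SECONDS

if __name__ == "__main__":
    print(f"implied throughput: {throughput / 1000:.1f} kB/s")
    print(f"exceeds default timeout: {exceeds_default}")
    print(f"suggested timeout: {suggested_timeout(OBSERVED_CLONE_SECONDS)} s")
```

With these inputs the 4-hour clone (14400 seconds) overshoots the 10800-second default by an hour, so any value picked this way depends entirely on the throughput you measure and the headroom you are willing to allow.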
We can't just keep increasing the timeout, can we? If a default timeout has value, then at some point we need to choose a limit.
Let's say this default is fine...
When a big repo is syncing over a slow connection, there isn't enough visibility into what's happening to know if it's even syncing at all.
Then when it times out, there again isn't enough visibility to know what to fix.
Proposal
- When a sync fails (especially by timeout), surface something helpful. See #9052 (comment 2970209595)
- On Geo > Projects, show when a project is currently being synced (so they don't have to go look at background jobs). See #9052 (comment 2970209595)
- Align the gitlab.rb.template with the default (change 800 to 10800)
Implementation Guide
- Align the gitlab.rb.template with the default (change 800 to 10800)