2018-11-06: Up to 15-minute delays on clones from GitLab repositories, including www-gitlab-com, gitlab-ee, and gitlab-ce
First reported in general alerting at ~10h30 UTC: https://gitlab.slack.com/archives/CD6HFD1L0/p1541500304044300
Slack conversation: https://gitlab.slack.com/archives/CD6HFD1L0/p1541511890051000
Working notes Google doc for further investigation: https://docs.google.com/document/d/1sGMaBglVmi5NzUgr-MLHSHD7VL66G7ATpcR7D85sF54/edit
- Previous occurrence of this issue: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/2880
Corrective actions
- add alerting on lock acquisition times
- alert on active sessions: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5478
- add per-process monitoring for important services:
  - haproxy
  - td-agent
  - mtail
  - rsyslog
- https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5487
- investigate our syslog configuration to make sure it cannot introduce blocking: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5486
- upgrade to HAProxy 1.8 to use multithreading and take advantage of multi-core VMs: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5293
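The alerting and per-process monitoring items above could be prototyped as a small check like the one below. This is a minimal sketch under stated assumptions, not the implementation tracked in the linked issues: the thresholds, the alert names, the choice of a p99 over lock-wait samples, and the process names matched in `/proc` are all hypothetical. It only checks liveness; real per-process monitoring would also track CPU and memory, and in production these conditions would more likely be written as rules in the monitoring system.

```python
import math
import os
from typing import List, Sequence, Set

# Hypothetical process names, matched against /proc/<pid>/comm (Linux, truncated
# to 15 characters). rsyslog usually runs as "rsyslogd"; td-agent's comm may differ.
WATCHED_PROCESSES = ("haproxy", "td-agent", "mtail", "rsyslogd")

LOCK_P99_LIMIT_SECONDS = 5.0  # hypothetical threshold
ACTIVE_SESSION_LIMIT = 1000  # hypothetical threshold


def nearest_rank_percentile(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def running_process_names(proc_root: str = "/proc") -> Set[str]:
    """Collect the comm name of every process listed under a Linux /proc tree."""
    names: Set[str] = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip non-PID entries such as /proc/self
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as handle:
                names.add(handle.read().strip())
        except OSError:
            # The process may have exited after listdir(), or access is denied.
            continue
    return names


def firing_alerts(
    lock_wait_seconds: Sequence[float], active_sessions: int, running: Set[str]
) -> List[str]:
    """Return the names of the hypothetical alerts whose conditions are met."""
    firing: List[str] = []
    if lock_wait_seconds and nearest_rank_percentile(lock_wait_seconds, 99) > LOCK_P99_LIMIT_SECONDS:
        firing.append("LockAcquisitionSlow")
    if active_sessions > ACTIVE_SESSION_LIMIT:
        firing.append("ActiveSessionsHigh")
    for name in WATCHED_PROCESSES:
        if name not in running:
            firing.append(f"ProcessDown:{name}")
    return firing


if __name__ == "__main__":
    # 100 lock-wait samples with a slow tail, and a host where mtail is not running.
    # On a Linux host, pass running_process_names() instead of the fixed set.
    waits = [0.1] * 98 + [12.0, 15.0]
    print(firing_alerts(waits, 1200, {"haproxy", "td-agent", "rsyslogd"}))
    # prints ['LockAcquisitionSlow', 'ActiveSessionsHigh', 'ProcessDown:mtail']
```

Using a tail percentile rather than the mean is a deliberate choice here: a handful of very slow lock acquisitions, which is what delays clones, barely moves the average but shows up clearly at p99.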