2018-11-06: Delays of up to 15 minutes on clones from GitLab repositories, including www-gitlab-com, gitlab-ee, gitlab-ce

First reported in general alerting at ~10h30 UTC: https://gitlab.slack.com/archives/CD6HFD1L0/p1541500304044300

Slack conversation: https://gitlab.slack.com/archives/CD6HFD1L0/p1541511890051000

chart

Working notes Google doc for further investigation: https://docs.google.com/document/d/1sGMaBglVmi5NzUgr-MLHSHD7VL66G7ATpcR7D85sF54/edit

  • Previous occurrence of this problem: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/2880

Corrective actions

  • add alerting on lock acquisition times
  • alert on active sessions: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5478
  • add monitoring on a "per process" basis for important services: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5487
    • haproxy
    • td-agent
    • mtail
    • rsyslog
  • investigate our syslog configuration to ensure we aren't introducing a blocking situation: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5486
  • upgrade to haproxy 1.8 so that multithreading can take advantage of multi-core VMs: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/5293
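
A minimal sketch of what the "per process" monitoring item could look like, assuming a simple presence check is a useful first step. It is not the check GitLab deployed: the function names, the substring matching against command lines, and the use of `ps` are all assumptions made for illustration. It reports only whether each listed service has a running process; per-process CPU or memory figures would need additional collection.

```python
import subprocess

# Services named in the corrective actions. Matching them as substrings of
# the process command line is an assumption of this sketch.
WATCHED = ("haproxy", "td-agent", "mtail", "rsyslog")


def list_command_lines():
    """Return the command line of every process, as reported by POSIX ps."""
    result = subprocess.run(
        ["ps", "-A", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def missing_services(command_lines, watched=WATCHED):
    """Return the watched services that appear in no command line."""
    return [
        name for name in watched
        if not any(name in cmd for cmd in command_lines)
    ]
```

Calling `missing_services(list_command_lines())` on a host returns the listed services that have no running process. A wrapper could alert whenever that list is non-empty.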
Edited Aug 03, 2020 by 🤖 GitLab Bot 🤖