500s on gitlab.com after/during EE 10.0.1 deployment

We received some internal reports of 502s on the site during the 10.0.1 deploy. Additionally the blackbox probe for pages failed for a short interval.

https://gitlab.slack.com/archives/C101F3796/p1506329398000118

Immediately after the deployment we saw increased load on multiple api servers and rate limiting 500s on haproxy to the api backend.

Dashboards:

  • https://performance.gitlab.net/dashboard/db/fleet-overview?refresh=5m&orgId=1
  • https://performance.gitlab.net/dashboard/db/triage-overview?refresh=1m&orgId=1&from=now-3h&to=now

Screen_Shot_2017-09-25_at_12.11.43_PM Screen_Shot_2017-09-25_at_12.11.33_PM Screen_Shot_2017-09-25_at_12.11.28_PM Screen_Shot_2017-09-25_at_11.24.44_AM

Edited Sep 25, 2017 by John Jarvis
Assignee Loading
Time tracking Loading