Incident Working doc: 2018-08-17

Incident reported.

Working Doc: https://docs.google.com/document/d/1p0pLaASameWw_l_DTeWhzkUVXWE8N8MUtJK4fmkCxHg/edit

At 16:55 UTC we started receiving multiple alerts and reports of GitLab.com being down.

@ahmadsherif noticed a surge of pgbouncer errors in Sentry: https://sentry.gitlap.com/gitlab/gitlabcom/issues/493055/?query=is:unresolved. Upon checking the pgbouncer config, we saw that it didn't have any host configuration. We suspect that a pgb-notify command error caused this misconfiguration, but we still don't know for sure.

At 17:03 UTC we manually updated databases.ini to point to postgres-03, after checking that it was still the only master, and then sent HUP to pgbouncer to reload the config. This brought the site back to normal functioning.
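
For reference, a check for the kind of misconfiguration described above (a databases.ini entry with no host) could be sketched as follows. This is a minimal illustration, not the tooling used during the incident: the function name `entries_missing_host` and the sample database entry are hypothetical, and the parsing assumes simple space-separated `key=value` connection strings with no quoted values.

```python
import configparser


def entries_missing_host(ini_text):
    """Return names of [databases] entries whose connection string has no host value."""
    # Split only on "=" so "name = host=... port=..." keeps the full
    # connection string as the value; disable interpolation so "%" is literal.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep database names case-sensitive
    parser.read_string(ini_text)

    if not parser.has_section("databases"):
        return []

    missing = []
    for name, conninfo in parser.items("databases"):
        # Assumption: parameters are space-separated key=value pairs, unquoted.
        params = dict(pair.split("=", 1) for pair in conninfo.split() if "=" in pair)
        if not params.get("host"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical sample content, not the real production config.
    sample = "[databases]\ngitlabhq_production = port=5432 auth_user=pgbouncer\n"
    print(entries_missing_host(sample))
```

A check along these lines could run before pgbouncer is reloaded, so that a rewrite of databases.ini that drops the host is caught before it takes effect. Note that an entry without a host is not always an error for pgbouncer in general (it can mean a local socket connection), so whether to treat it as a failure is a judgment call for the specific setup.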

Edited Aug 17, 2018 by Alejandro Rodríguez