2022-11-27: customers.gitlab.com is down

Current Status

customers.gitlab.com appears to be down again, and #s_fulfillment_status alerted on the CustomersDot-zuora API returning status code 500. (Screenshot: Screenshot_2022-11-28_at_9.03.37)

From the #s_fulfillment_status Slack channel: (Screenshot: Screenshot_2022-11-28_at_9.03.52)

📝 Summary for CMOC notice / Exec summary:

  1. Customer Impact: Customers were unable to reach customers.gitlab.com to self-manage their payment and license information.
  2. Service Impact: ~"Service::Customers" / ServiceCustomersDot
  3. Impact Duration: 23:43 UTC (Nov 27, 2022) - 00:40 UTC (Nov 28, 2022) (Approx 57 minutes)
  4. Root cause: The Zuora API, a dependency of the customers portal, went down and came back up after 50+ minutes. After several failed retries, the customers portal went into maintenance mode and did not recover from it, even after the Zuora API was back up. The issue was mitigated by redeploying the customers portal, which also restarted the portal processes.
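
The failure mode in the root cause (maintenance mode persisting after the dependency recovered) is commonly avoided by letting a single probe request through once a cooldown has elapsed. The sketch below is illustrative only and is not the CustomersDot implementation: the class name `DependencyGate`, its parameters, and the thresholds are all hypothetical.

```python
import time
from typing import Callable, Optional


class DependencyGate:
    """Hypothetical gate: trips into maintenance mode after repeated
    failures, then probes the dependency again after a cooldown."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures  # assumed threshold, not from the incident
        self.cooldown_s = cooldown_s      # assumed cooldown, not from the incident
        self.clock = clock
        self.failures = 0
        self.tripped_at: Optional[float] = None  # None means normal operation

    @property
    def in_maintenance(self) -> bool:
        return self.tripped_at is not None

    def call(self, request: Callable[[], object]) -> object:
        if self.tripped_at is not None:
            if self.clock() - self.tripped_at < self.cooldown_s:
                # Still cooling down: fail fast without touching the dependency.
                raise RuntimeError("maintenance mode: dependency unavailable")
            # Cooldown elapsed: fall through and let this request act as a probe.
        try:
            result = request()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                # Trip, or re-trip after a failed probe, and restart the cooldown.
                self.tripped_at = self.clock()
            raise
        # Success: leave maintenance mode without needing a redeploy.
        self.failures = 0
        self.tripped_at = None
        return result
```

With a gate like this, a successful probe after the cooldown clears maintenance mode on its own, so recovery would not depend on restarting the portal processes.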

📚 References and helpful links

Recent Events (available internally only):

  • Deployments ❙ Feature Flag Changes ❙ Gitlab.com Latest Updates ❙ Runbook for Release Rollback
  • Infrastructure Configurations
  • GCP Events (e.g. host failure)

Use the following links to create related issues to this incident if additional work needs to be completed after it is resolved:

  • Corrective action ❙ Infradev
  • Incident Review ❙ Infra investigation followup
  • Confidential Support contact ❙ QA investigation

Note: In some cases we need to redact information from public view. We only do this in a limited number of documented cases. This might include the summary, timeline, or any other information, as laid out in our handbook page. Any such confidential data will be in a linked issue that is only visible internally. By default, all information we can share will be public, in accordance with our transparency value.

Edited Nov 28, 2022 by Furhan Shabir