2021-01-13: 500 error when visiting the General project settings page
Summary
A code bug caused a 500 error on the General settings page of every project (for example, https://gitlab.com/gitlab-org/gitlab/edit). The error occurred only on the General settings page; the other settings pages (e.g. https://gitlab.com/gitlab-org/gitlab/-/settings/ci_cd) were not affected.
The change that introduced the bug, gitlab-org/gitlab!46222 (merged), was reverted.
The issue was initially raised in gitlab-org/gitlab#297666 (closed).
More information will be added as we investigate the issue.
Timeline
All times UTC.
2021-01-13
- 11:20pm @mayra-cabrera opens gitlab-org/gitlab#297666 (closed) to record the error. The change is part of a production deployment that is still in progress.
- 11:27pm Production deployment completes and the error is visible to all users
- 11:35pm @mayra-cabrera creates the revert gitlab-org/gitlab!51662 (merged) and picks it into the auto_deployment
2021-01-14
- 05:59am The revert is deployed to Staging
- 06:51am The revert is deployed to Canary
- 10:13am The revert is deployed to Production
Corrective Actions
- Create development seeds for all template types: gitlab-org/gitlab#299110
- Improve Service Desk documentation about templates: gitlab-org/gitlab#299165 (closed)
- New e2e tests are being added to cover the gaps:
Incident Review
Summary
- Service(s) affected:
- Team attribution:
- Time to detection:
- Minutes downtime or degradation:
Metrics
Customer Impact
- Who was impacted by this incident? (e.g. external customers, internal customers)
  - ...
- What was the customer experience during the incident? (e.g. preventing them from doing X, incorrect display of Y, ...)
  - ...
- How many customers were affected?
  - ...
- If a precise customer impact number is unknown, what is the estimated impact (number and ratio of failed requests, amount of traffic drop, ...)?
  - ...
What were the root causes?
Incident Response Analysis
- How was the incident detected?
  - ...
- How could detection time be improved?
  - ...
- How was the root cause diagnosed?
  - ...
- How could time to diagnosis be improved?
  - ...
- How did we reach the point where we knew how to mitigate the impact?
  - ...
- How could time to mitigation be improved?
  - ...
- What went well?
  - ...
Post Incident Analysis
- Did we have other events in the past with the same root cause?
  - ...
- Do we have existing backlog items that would have prevented or greatly reduced the impact of this incident?
  - ...
- Was this incident triggered by a change (deployment of code or change to infrastructure)? If yes, link the issue.
  - ...
Lessons Learned
Guidelines
Resources
- If the Situation Zoom room was utilised, the recording will be automatically uploaded to the Incident room Google Drive folder (private).
Incident Review Stakeholders
Edited by Amy Phillips