2021-02-03 - Deployments blocked: "undefined method `map' for "---\n- gstg.plantuml.gitlab-static.net\n- staging.gitlab.com\n":String" error
Summary
A post-deployment migration failed on staging, blocking deployments.
Timeline
All times UTC.
2021-02-03
- 16:07 - A deployment to staging starts: https://ops.gitlab.net/gitlab-com/gl-infra/deployer/-/pipelines/451724
- 16:10 - gstg-migrations fails with the error "undefined method `map' for "---\n- gstg.plantuml.gitlab-static.net\n- staging.gitlab.com\n":String". See #3494 (comment 500723039) for the full log.
- 16:27 - @mayra-cabrera declares an incident in Slack.
- 17:13 - A fix is prepared in gitlab-org/gitlab!53272 (merged). It needs to be deployed to staging to mitigate this incident.
- 20:44 - A deployment to staging with the fix starts: https://ops.gitlab.net/gitlab-com/gl-infra/deployer/-/pipelines/452104
- 21:34 - Deployment to staging finishes successfully.
- 21:35 - Deployment to canary starts.
- 22:28 - Deployment to canary finishes successfully.
- 23:32 - Deployment to prod starts.
- 00:58 (2021-02-04) - Deployment to prod finishes.
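The error quoted at 16:10 is the kind Ruby raises when an Array method is called on a String that still holds YAML-serialized data. The sketch below only illustrates that class of failure, using the value from the error message. It is not the migration code or the fix from gitlab-org/gitlab!53272, and the helper name `to_domain_list` is hypothetical.

```ruby
require "yaml"

# Value copied from the error message: a YAML-serialized list that is
# still a plain String rather than an Array.
raw = "---\n- gstg.plantuml.gitlab-static.net\n- staging.gitlab.com\n"

# Calling an Array method on the String raises NoMethodError.
# The exact message wording varies between Ruby versions.
begin
  raw.map(&:strip)
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end

# Hypothetical helper (an assumption, not the actual fix): accept either
# an Array or a YAML-serialized String and always return an Array.
def to_domain_list(value)
  return value if value.is_a?(Array)

  Array(YAML.safe_load(value))
end

p to_domain_list(raw)
# => ["gstg.plantuml.gitlab-static.net", "staging.gitlab.com"]
```

Normalizing the value before iterating is one defensive option. Whether the actual fix took this approach is not stated in this issue.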
Corrective Actions
Incident Review
Summary
- Service(s) affected:
- Team attribution:
- Time to detection:
- Minutes downtime or degradation:
Metrics
Customer Impact
- Who was impacted by this incident? (i.e. external customers, internal customers)
  - ...
- What was the customer experience during the incident? (i.e. preventing them from doing X, incorrect display of Y, ...)
  - ...
- How many customers were affected?
  - ...
- If a precise customer impact number is unknown, what is the estimated impact (number and ratio of failed requests, amount of traffic drop, ...)?
  - ...
What were the root causes?
Incident Response Analysis
- How was the incident detected?
  - ...
- How could detection time be improved?
  - ...
- How was the root cause diagnosed?
  - ...
- How could time to diagnosis be improved?
  - ...
- How did we reach the point where we knew how to mitigate the impact?
  - ...
- How could time to mitigation be improved?
  - ...
- What went well?
  - ...
Post Incident Analysis
- Did we have other events in the past with the same root cause?
  - ...
- Do we have existing backlog items that would've prevented or greatly reduced the impact of this incident?
  - ...
- Was this incident triggered by a change (deployment of code or change to infrastructure)? If yes, link the issue.
  - ...
Lessons Learned
Guidelines
Resources
- If the Situation Zoom room was utilised, the recording will be automatically uploaded to the Incident room Google Drive folder (private).
Incident Review Stakeholders
Edited by Amy Phillips