2024-05-25 Handbook Outage Retro
While cleaning up the "stuck" deployments (with this script https://gitlab.com/gitlab-org/developer-relations/contributor-success/toolbox/-/blob/c85059f5fb60c032d76da8ab43357ec0e54b244a/bin/pages_deployment_cleanup.rb) I accidentally deleted the production deployments!
Thankfully @dnsmichi was in exactly the right place at the right time and connected the dots to my WIP (hallelujah for the transparency/visibility which comes with GitLab).
I was able to re-run the production deployments, and we probably ended up with about 10 minutes of downtime.
I have updated the script to NOT delete deployments where path_prefix == '' but thought it best to create this issue to reflect.
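As a rough illustration of that guard, here is a minimal sketch. It is not the actual script: the `Deployment` struct, the `production?` and `deletable` helpers, and the sample data are all hypothetical, and treating a `nil` `path_prefix` the same as `''` is an extra assumption on my part.

```ruby
# Hypothetical record shape; the real script works with deployments
# fetched from GitLab, not with this struct.
Deployment = Struct.new(:id, :path_prefix, keyword_init: true)

# Assumption: an empty (or missing) path_prefix marks the production deployment.
def production?(deployment)
  deployment.path_prefix.to_s.empty?
end

# Keep only deployments that are safe to delete, i.e. never production.
def deletable(deployments)
  deployments.reject { |d| production?(d) }
end

deployments = [
  Deployment.new(id: 1, path_prefix: ''),       # production, must be kept
  Deployment.new(id: 2, path_prefix: 'mr-123'), # prefixed deployment, may be cleaned up
  Deployment.new(id: 3, path_prefix: nil)       # treated as production to be safe
]

puts deletable(deployments).map(&:id).inspect
```

Filtering before any delete call means the production deployment never reaches the destructive step, even if the rest of the selection logic is wrong.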
Timeline
- 2024-05-25 8:00 pm UTC handbook.gitlab.com redirects to a 404 page. @dnsmichi pings @leetickett-gitlab in Slack, after seeing a recent pipeline trigger in https://gitlab.com/gitlab-com/content-sites/handbook/-/jobs/6940225769
- 2024-05-25 8:02 pm UTC @leetickett-gitlab investigates and fixes the problem
- 2024-05-25 8:02 pm UTC Do we have monitoring?
- 2024-05-25 8:09 pm UTC Public handbook works again
Edited by Michael Friedrich