2024-05-25 Handbook Outage Retro

While cleaning up the "stuck" deployments with this script (https://gitlab.com/gitlab-org/developer-relations/contributor-success/toolbox/-/blob/c85059f5fb60c032d76da8ab43357ec0e54b244a/bin/pages_deployment_cleanup.rb), I accidentally deleted the production deployments!

Thankfully, @dnsmichi was in exactly the right place at the right time and connected the dots to my WIP (hallelujah for the transparency and visibility that come with GitLab).

I was able to re-run the production deployments, and we probably ended up with about 10 minutes of downtime.

I have updated the script to NOT delete deployments where path_prefix == '', but thought it best to create this issue to reflect on what happened.
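
For illustration, a guard like the one described above could look roughly like the sketch below. This is not the code from the actual script: the helper names (`production_deployment?`, `deletable_deployments`) and the shape of the deployment data (hashes with a 'path_prefix' key) are assumptions. As an extra precaution that goes beyond the change described above, the sketch also treats a missing (nil) path_prefix as production.

```ruby
# Hypothetical sketch: helper names and data shape are assumptions,
# not taken from the real pages_deployment_cleanup.rb script.

# Returns true when a deployment has no path prefix (empty string or nil),
# which this sketch assumes means it serves the production site.
def production_deployment?(deployment)
  deployment['path_prefix'].to_s.empty?
end

# Returns only the deployments that are candidates for deletion,
# i.e. those with a non-empty path_prefix.
def deletable_deployments(deployments)
  deployments.reject { |deployment| production_deployment?(deployment) }
end

# Made-up example data for illustration only.
deployments = [
  { 'id' => 1, 'path_prefix' => '' },       # production: must be kept
  { 'id' => 2, 'path_prefix' => 'mr-123' }, # prefixed deployment: may be cleaned up
  { 'id' => 3, 'path_prefix' => nil }       # no prefix at all: kept to be safe
]

deletable_ids = deletable_deployments(deployments).map { |d| d['id'] }
puts deletable_ids.inspect
```

Filtering out the protected deployments before any delete call is made, rather than checking inside the delete loop, keeps the safety rule in one place and makes it easy to print the candidate list as a dry run first.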

Timeline

  • 2024-05-25 8:00 pm UTC handbook.gitlab.com redirects to a 404 page. @dnsmichi pings @leetickett-gitlab in Slack after seeing a recent pipeline trigger in https://gitlab.com/gitlab-com/content-sites/handbook/-/jobs/6940225769
  • 2024-05-25 8:02 pm UTC @leetickett-gitlab investigates and fixes the problem
  • 2024-05-25 8:02 pm UTC Question raised: do we have monitoring?
  • 2024-05-25 8:09 pm UTC Public handbook works again
Edited by Michael Friedrich