
Operational runbook for quickly rotating instances

During the incident gitlab-com/gl-infra/production#18348, we had to rotate the Cloud Run instances (effectively pods) of AIGW because of some in-memory cached state.

This was done manually: we added an environment variable to each of the regional Cloud Run services, which creates a new revision, and then shifted traffic to that revision.
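As a starting point for tooling, here is a minimal sketch of that manual procedure. It only builds and prints the `gcloud` commands and does not run them. The service name `ai-gateway`, the region list, and the `ROTATED_AT` variable name are placeholders for illustration, not the values used during the incident. The sketch also assumes that traffic is shifted explicitly with `update-traffic --to-latest` after creating the revision with `--no-traffic`.

```python
import shlex
import time


def rotation_commands(service, regions, stamp, project=None):
    """Build gcloud commands that force a new revision in each region.

    Changing an environment variable creates a new Cloud Run revision,
    so its instances start with empty in-memory state.
    """
    commands = []
    for region in regions:
        scope = ["--region", region]
        if project:
            scope += ["--project", project]
        # Step 1: create a new revision by setting a throwaway env var.
        # ROTATED_AT is a hypothetical name; any changed value works.
        commands.append(
            ["gcloud", "run", "services", "update", service, *scope,
             "--no-traffic", "--update-env-vars", f"ROTATED_AT={stamp}"]
        )
        # Step 2: shift all traffic to the newly created revision.
        commands.append(
            ["gcloud", "run", "services", "update-traffic", service, *scope,
             "--to-latest"]
        )
    return commands


if __name__ == "__main__":
    # Placeholder service and regions; replace with the real ones.
    for cmd in rotation_commands(
        "ai-gateway", ["us-east1", "europe-west1"], int(time.time())
    ):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to review under incident pressure. An operator can paste the output, or a later version could run each command with `subprocess.run(cmd, check=True)`.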

It may be worth documenting this "break-glass" procedure, or even providing some operational tooling for this type of task.

Edited by Igor