What can we do about cleaning up failed deploys in CI?
Currently, when a change is made to the charts, review environments are deployed (to GKE and EKS) in the Review stage. Tests are run against these environments in the Specs stage. If the deploy job fails, the environment fails, so the stop job for the environment cannot be run and the resources must be cleaned up manually by Distribution engineers.
We need to investigate this and figure out what the right approach is.