make it easier to troubleshoot flaky CAPO CI pipelines
Today, when a CAPO CI pipeline is flaky and we want to look at what is happening live on a run, we need to:
- start such a run
- cancel the cleanup job
- update the Heat stack so that the periodic cleanup spares it
Here is what I would propose:
- introduce a `delay-capo-ci-cleanup-on-failure` GitLab label
- on an MR marked with this label:
  - if the pipeline succeeds, run the cleanup job as usual
  - if the pipeline fails, don't clean it up, and instead add `please-delay-cleanup` to the Heat stack
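
The proposed rule could be sketched roughly as follows. This is only an illustration of the decision logic, not an actual implementation: `CleanupDecision` and `decide_cleanup` are hypothetical names, and the sketch assumes that an MR without the label keeps today's behaviour (cleanup always runs). In a merge request pipeline, the labels could presumably be read from GitLab's `CI_MERGE_REQUEST_LABELS` variable, which holds a comma-separated list.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Names taken from the proposal above.
DELAY_LABEL = "delay-capo-ci-cleanup-on-failure"
STACK_MARKER = "please-delay-cleanup"


@dataclass(frozen=True)
class CleanupDecision:
    """Hypothetical result type: what the cleanup stage should do."""
    run_cleanup: bool
    stack_marker: Optional[str]  # marker to add on the Heat stack, if any


def decide_cleanup(mr_labels: Iterable[str], pipeline_succeeded: bool) -> CleanupDecision:
    """Apply the proposed rule to one pipeline run."""
    labels = {label.strip() for label in mr_labels}
    if DELAY_LABEL in labels and not pipeline_succeeded:
        # Labelled MR and failed pipeline: keep the stack for troubleshooting.
        return CleanupDecision(run_cleanup=False, stack_marker=STACK_MARKER)
    # Assumption: every other case cleans up as it does today.
    return CleanupDecision(run_cleanup=True, stack_marker=None)


if __name__ == "__main__":
    # Example input shaped like a comma-separated label list.
    labels = "bug,delay-capo-ci-cleanup-on-failure".split(",")
    print(decide_cleanup(labels, pipeline_succeeded=False))
```

With this shape, the cleanup job would only need the MR labels and the pipeline status to decide whether to run or to mark the stack instead.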