Make it easier to troubleshoot flaky CAPO CI pipelines

Today, when a CAPO CI pipeline is flaky and we want to inspect what is happening live on a failing run, we need to:

  • start such a run
  • cancel the cleanup job
  • update the Heat stack so that the periodic cleanup spares it

Here is what I would propose:

  • introduce a delay-capo-ci-cleanup-on-failure GitLab label
  • on an MR marked with this label:
    • if the pipeline succeeds, run the cleanup job as usual
    • if the pipeline fails, skip the cleanup and instead add the please-delay-cleanup tag to the Heat stack
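The decision logic proposed above could be sketched in Python roughly as follows. This is a minimal sketch, not the actual CI implementation: the function and enum names are hypothetical, it assumes the cleanup job receives the MR labels and the pipeline status as plain strings, and it assumes that please-delay-cleanup is a tag on the Heat stack that the periodic cleanup checks before deleting a stack. It does not call GitLab or Heat.

```python
from collections.abc import Iterable
from enum import Enum

# Label and tag names come from the proposal; everything else is hypothetical.
DELAY_LABEL = "delay-capo-ci-cleanup-on-failure"
DELAY_TAG = "please-delay-cleanup"


class CleanupAction(Enum):
    CLEANUP = "cleanup"            # run the usual cleanup job
    KEEP_AND_TAG = "keep_and_tag"  # skip cleanup, tag the Heat stack instead


def decide_cleanup(mr_labels: Iterable[str], pipeline_status: str) -> CleanupAction:
    """Decide what the end-of-pipeline cleanup job should do.

    Only a failed pipeline on an MR carrying DELAY_LABEL keeps its stack;
    every other case (no label, or a successful pipeline) cleans up as today.
    """
    if DELAY_LABEL in set(mr_labels) and pipeline_status == "failed":
        return CleanupAction.KEEP_AND_TAG
    return CleanupAction.CLEANUP


def periodic_cleanup_spares(stack_tags: Iterable[str]) -> bool:
    """Assumed periodic-cleanup rule: leave alone any stack tagged DELAY_TAG."""
    return DELAY_TAG in set(stack_tags)


if __name__ == "__main__":
    print(decide_cleanup([DELAY_LABEL], "failed"))   # CleanupAction.KEEP_AND_TAG
    print(decide_cleanup([DELAY_LABEL], "success"))  # CleanupAction.CLEANUP
    print(decide_cleanup([], "failed"))              # CleanupAction.CLEANUP
```

Keeping the rule opt-in through the label means pipelines on unlabelled MRs behave exactly as they do today, so stray stacks only accumulate when someone has explicitly asked to debug a failure.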