Add script to clean up old canary/beta VMs
What does this MR do?
Automatically deletes VMs running in the canary and beta environments once they reach 3h of age. Our build timeout is currently 2h, so this should catch leftover VMs.
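As a rough illustration of the age check described above, here is a minimal sketch. The VM record shape (`id`, `created_at`) and the helper name `select_expired_vms` are assumptions made for this example, not the actual script or the Orka API; only the 3h cutoff comes from this MR.

```python
from datetime import datetime, timedelta, timezone

# 3h cutoff from this MR; the build timeout is 2h, so older VMs are leftovers.
MAX_AGE = timedelta(hours=3)


def select_expired_vms(vms, now, max_age=MAX_AGE):
    """Return the IDs of VMs whose age is at least max_age.

    `vms` is a hypothetical list of dicts with "id" and "created_at" keys.
    """
    return [vm["id"] for vm in vms if now - vm["created_at"] >= max_age]


# Hypothetical data: one VM well past the cutoff, one still inside it.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
vms = [
    {"id": "vm-old", "created_at": now - timedelta(hours=5)},
    {"id": "vm-new", "created_at": now - timedelta(minutes=30)},
]
print("VMs to delete:", select_expired_vms(vms, now))
```

Passing `now` in explicitly keeps the selection deterministic, which makes the cutoff easy to check without touching real VMs.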
Why was this MR needed?
When a user cancels a job that uses the Orka autoscaler, its VM is left behind. If we don't clean these up, we risk exceeding our capacity. Only a handful of VMs were left running from when I was on leave, so the number is small, but we don't notice leftovers until we check manually, so let's be safe.
What's the best way to test this MR?
I ran the script manually on my machine with the CI token, and it correctly deleted one of my VMs:
Checking canary VMs...
VMs to delete: ['6d8c7b79386ca']
Successfully deleted VM(s)
Done