Make spec/features/backups_spec.rb more robust against GKE autoscaler
Issue
While investigating #5001 (closed), which reported that the toolbox pod could not be found, we determined that the pod had in fact been created and was later rescheduled because the GKE cluster autoscaler scaled down the nodes.
The spec knew the exact name of the pod to look for, because it resolved that name from the pods matching --field-selector=status.phase=Running. The pod therefore existed and was running at lookup time, which means the failure was caused by the autoscaler evicting the pod afterwards, not by the pod never being present.
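The lookup described above can be sketched roughly as follows. This is a minimal Ruby sketch, not the spec's actual helper: the method name `running_pod_name_command` and the `app=toolbox` label selector are assumptions made for illustration. It only builds the kubectl argument list and does not contact a cluster.

```ruby
# Hypothetical label selector; the selector the spec really uses is not shown in this issue.
TOOLBOX_SELECTOR = 'app=toolbox'

# Builds the kubectl argv that prints the name of the first Running pod
# matching the given label selector.
def running_pod_name_command(selector = TOOLBOX_SELECTOR)
  [
    'kubectl', 'get', 'pods',
    '--selector', selector,
    '--field-selector=status.phase=Running',
    '--output', 'jsonpath={.items[0].metadata.name}'
  ]
end

puts running_pod_name_command.join(' ')
```

The name returned by such a command is only a snapshot. If the autoscaler evicts the pod between this lookup and a later command that uses the name, the name is stale, which is consistent with the NotFound error in the logs below.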
Relevant logs
Failures:
1) Restoring a backup Backups Should be able to backup an identical tar
Failure/Error: expect(status.success?).to be(true), "Error backing up instance: #{stdout}"
Error backing up instance: Error from server (NotFound): pods "gke125-production-bz1tbp-toolbox-75df9cd6d5-dg28r" not found
# ./spec/features/backups_spec.rb:102:in `block (3 levels) in <top (required)>'
Proposal
From our discussion in the upstream issue:
We'd like to block the autoscaler from evicting the pods that our tests depend on by patching our deployments with the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "false" before the tests run.
NOTE: This is currently hard-coded into the Toolbox Deployment's .spec.template.metadata.annotations
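One possible shape for the proposed patch step is sketched below in Ruby. The helper names, the deployment name `example-toolbox`, and the `default` namespace are hypothetical; the sketch only builds a `kubectl patch` argument list with a JSON merge patch and does not run it. It assumes the annotation belongs on the pod template, since the cluster autoscaler reads it from pods.

```ruby
require 'json'

SAFE_TO_EVICT_ANNOTATION = 'cluster-autoscaler.kubernetes.io/safe-to-evict'

# Merge patch that sets the annotation on the Deployment's pod template.
# Annotation values must be strings, hence 'false' rather than false.
def safe_to_evict_patch(value = 'false')
  {
    'spec' => {
      'template' => {
        'metadata' => {
          'annotations' => { SAFE_TO_EVICT_ANNOTATION => value }
        }
      }
    }
  }
end

# Builds the kubectl argv for patching one Deployment (names are illustrative).
def patch_command(deployment, namespace)
  [
    'kubectl', 'patch', 'deployment', deployment,
    '--namespace', namespace,
    '--type', 'merge',
    '--patch', JSON.generate(safe_to_evict_patch)
  ]
end

puts patch_command('example-toolbox', 'default').join(' ')
```

Changing the pod template triggers a rollout, so the pods are replaced. A step like this would need to run, and the rollout would need to finish, before the spec resolves any pod names.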