Determine the cause of staging failures related to the urgent-cpu-bound shard

Two people raised concerns after we moved the urgent-cpu-bound queue from VMs to Kubernetes. After restoring the VM in staging, preprod continues to show the same errors. At the time, I was unable to determine the root cause. Let's rope in some people on this issue to see what we need to investigate. This should be considered a blocker for moving this shard to Kubernetes in production.

Threads that started the discussion: