Follow-up from "Resolve "Race condition in fetching Kubernetes token causes missing `$KUBECONFIG`""
The following discussion from !29922 (merged) should be addressed:
- @DylanGriffith started a discussion: Conversation from Slack for historical context:
tiger [20 minutes ago] I don't think there's necessarily anything wrong with the `sleep` strategy; it could just make some workers run much longer than they usually would. It's significantly trickier, but I think the traditional way to do this would be to abort the job and re-queue it with something like `Worker.perform_in(@token_retry_delay)`. But that is tricky if we're dealing with more than one place where this can happen.
dylan [12 minutes ago] tiger, we can't retry the job because, as we learnt, it will recreate the secret again and clear out the token.
tiger [5 minutes ago] Right, of course. So we'd need to do some serious reworking to make that happen. I feel like in the best case we'd make `CreateOrUpdateServiceAccountService` completely idempotent, since not being able to retry is caused by `create_or_update_service_account` changing something on every attempt?
tiger [4 minutes ago] Though these are big changes when you have a fix already. Maybe we have a technical debt piece to tidy it up as best we can later on, once we've fixed it for everyone and got a DB constraint in.
dylan [1 minute ago] Yeah, I'm inclined to push a change now, since the sooner we get this out, the sooner we are helping more people. I'd love to refactor `CreateOrUpdateServiceAccountService` and fix that weird issue where it's not idempotent, but my fear is that if we can't reproduce this locally, we may make a mistake and not fix it properly.
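The idempotency point in this conversation can be made concrete with a small Ruby sketch. It is not GitLab's `CreateOrUpdateServiceAccountService`: `FakeCluster`, `recreate_secret!` and `ensure_secret` are hypothetical stand-ins. It assumes, as the conversation describes, that recreating the secret clears any token already issued, and that the token is filled in some time after the secret is created. It contrasts a step that changes state on every attempt with one that only creates what is missing, which is the property that would let a job be retried or re-queued without losing the token.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for the Kubernetes API; not a real client.
class FakeCluster
  def initialize
    @secrets = {}
  end

  def get_secret(name)
    @secrets[name]
  end

  # A newly created secret starts without a token (assumption: the token
  # is populated asynchronously after creation).
  def create_secret(name)
    @secrets[name] = { token: nil }
  end

  def delete_secret(name)
    @secrets.delete(name)
  end

  # Simulates the token being filled in some time after creation.
  def populate_token(name)
    @secrets[name][:token] = SecureRandom.hex(8) if @secrets[name]
  end
end

# Non-idempotent (hypothetical): every call recreates the secret,
# discarding any token that was already populated.
def recreate_secret!(cluster, name)
  cluster.delete_secret(name)
  cluster.create_secret(name)
end

# Idempotent (hypothetical): only create the secret when it is missing,
# so a retried job leaves an existing token untouched.
def ensure_secret(cluster, name)
  cluster.get_secret(name) || cluster.create_secret(name)
end
```

With `ensure_secret`, a retried job reads back the same token; with `recreate_secret!`, the retry leaves the secret with no token until it is populated again, which is the failure mode dylan describes.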