Task runner (toolbox) pod may not be available when needed by config tasks
In https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/sandbox/switchboard_la/-/jobs/2553686526, we got:
TASK [post_configure : Wait for GitLab to be available] ************************
ok: [localhost]
TASK [kubeconfig - Configure local kubeconfig to point to correct cluster] *****
TASK [gitlab_charts : Configure kubeconfig credentials (AWS)] ******************
changed: [localhost]
TASK [post_configure : Check for Task Runner pod] ******************************
ok: [localhost]
TASK [post_configure : Save Task Runner pod name] ******************************
ok: [localhost]
TASK [post_configure : Disable Write to "authorized_keys" file setting via GitLab Task Runner pod] ***
[DEPRECATION WARNING]: The 'return_code' return key is deprecated. Please use
'rc' instead. This feature will be removed from kubernetes.core in version
4.0.0. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
fatal: [localhost]: FAILED! => changed=true
rc: 137
return_code: 137
stderr: ''
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
This is running GET 2.2.2, and I had hoped that using the Task Runner pod (rather than the API and access tokens) would be a bit more reliable. This Ansible run caused Helm to cycle the Rails pods (sidekiq, webservice, and toolbox), and I suspect that the Task Runner pod wasn't fully functional by the time Ansible ran the "Disable Write" task. Unfortunately this is hard to reproduce because it's fairly transient (I possibly could if I tried, but time is against me right now).
Maybe we need another task after "Save Task Runner pod name" that waits for the pod to be ready, rather than one that has merely existed long enough to be listed with a name?
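
As a rough sketch of what such a wait task might look like, the snippet below uses the `wait` options of `kubernetes.core.k8s_info` to block until the pod reports the `Ready` condition. The namespace, label selector, registered variable name, and timeout values are all assumptions for illustration and are not taken from GET; they would need to match whatever the existing "Check for Task Runner pod" task uses.

```yaml
# Hypothetical task, intended to run after "Save Task Runner pod name".
# Namespace, label selector, and timings are assumptions, not GET's actual values.
- name: Wait for Task Runner pod to be Ready
  kubernetes.core.k8s_info:
    kind: Pod
    namespace: default            # assumption: namespace of the GitLab chart release
    label_selectors:
      - app = toolbox             # assumption: label used to find the toolbox pod
    wait: true
    wait_condition:
      type: Ready
      status: "True"
    wait_timeout: 300             # seconds to wait before failing the task
    wait_sleep: 5                 # seconds between checks
  register: task_runner_pod_ready # hypothetical variable name
```

One caveat with this approach: while Helm is cycling the pods, an old pod that is terminating may still match the selector, so the pod name may need to be saved only after this wait, or the result filtered to exclude pods with a `deletionTimestamp` set.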