Ansible job failing frequently in GitLab-Provisioner project
Update: As a quick fix, we merged gitlab-provisioner!20 to make sure we always run reconfigure. However, we still need to dig deeper into why components didn't start properly on the first attempt. The following logs may show more patterns:
- https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/122143835
- https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/122147410
- https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/121877989
This looks like a race condition: we try to access the database and its tables before they are up and ready.
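If this is a race, one common mitigation is to poll for readiness before running the steps that touch the database, instead of relying on a retry of the whole job. The sketch below only illustrates that idea; it is not what gitlab-provisioner currently does. The function names, the host, and the port are hypothetical. Note that an open port only shows that the server accepts connections, not that the tables or migrations are in place. Ansible's built-in `wait_for` module covers the same port check natively.

```python
import socket
import time


def port_is_open(host: str, port: int, connect_timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds right now."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable host, etc.
        return False


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll host:port until it accepts connections or `timeout` seconds elapse.

    Returns True as soon as the port is reachable, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        if port_is_open(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 5432, timeout=30.0)` would wait up to about 30 seconds for a PostgreSQL listener on its default port (hypothetical target) before the job goes on to touch the database.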
I noticed another pattern: the first Ansible attempt often fails. The retry sometimes succeeds and sometimes does not.
- First attempt failure - https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/127672494, retry failure - https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/127674836
- First attempt failure - https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/127088265, retry success - https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/127089304
- First attempt failure - https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/126467510, retry failure - https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/126473835
- First attempt success - https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/126095558
- First attempt success - https://gitlab.com/gitlab-org/distribution/gitlab-provisioner/-/jobs/125935152
cc @ibaum Anything obvious from the logs?
Edited by Balasankar 'Balu' C