Every GitLab build hangs for 5 minutes after finishing Azure uploads

Summary

We run a current (13.5.1-ee) Kubernetes-deployed GitLab instance that has recently developed a behaviour which in practice blocks CI for everyone: after every job finishes (successfully or with a failure) and the runner reports this (as confirmed by the local log), the coordinator waits 5 minutes before showing the status in the UI. This delays the start of any dependent jobs and stages.

Steps to reproduce

  1. Access our GitLab instance.
  2. Install the example project on our instance.
  3. Observe the build times.
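
To put a number on step 3 without watching the UI, the job status can be polled from outside. The following is a minimal sketch, assuming the GitLab Jobs API endpoint `GET /api/v4/projects/:id/jobs/:job_id` and a personal access token. The function names, the polling interval, and the timeout are illustrative choices and not part of our setup.

```python
import json
import time
import urllib.request

# Job statuses after which GitLab no longer changes the job.
TERMINAL_STATUSES = {"success", "failed", "canceled", "skipped"}


def fetch_job_status(base_url, project_id, job_id, token):
    """Read one job's status via GET /api/v4/projects/:id/jobs/:job_id."""
    url = f"{base_url}/api/v4/projects/{project_id}/jobs/{job_id}"
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["status"]


def seconds_until_terminal(get_status, poll_interval=5.0, timeout=600.0,
                           clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status.

    Returns the elapsed seconds, or None if the timeout passes first.
    clock and sleep are injectable so the loop can be checked offline.
    """
    start = clock()
    while True:
        if get_status() in TERMINAL_STATUSES:
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(poll_interval)


# Hypothetical usage (URL, IDs, and token are placeholders):
# lag = seconds_until_terminal(
#     lambda: fetch_job_status("https://gitlab.example.com", 42, 1234, "TOKEN"))
```

If the poll is started at the moment the runner log reports the job as finished, we would expect the returned value to be close to 300 seconds on an affected instance and close to zero on a healthy one.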

Example Project

Attached please find an example project and a video that shows the behaviour.

What is the current bug behavior?

The job starts and finishes as expected, then the system waits to close the job, with three dots shown at the bottom of the screen.

What is the expected correct behavior?

The job starts and finishes as expected, then the system closes the job without delay.

Relevant logs and/or screenshots

  • We have turned on the debug log flags, but the logs show nothing out of the ordinary.
  • We have reviewed the runner logs, but they show nothing out of the ordinary either.

Output of checks

Private instance hosted on AKS.

Results of GitLab environment info

Private instance hosted on AKS, created via Helm (see attached values). Peripheral components use hosted services where possible.

Results of GitLab application Check

Application check (rake) as per the Kubernetes instructions.

git@gitlab-task-runner-5c55b46cff-bxh7x:/$ /usr/local/bin/gitlab-rake gitlab:check
Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ... GitLab Shell version >= 13.11.0 ? ... OK (13.11.0)
Running /home/git/gitlab-shell/bin/check
gitlab-shell self-check failed
  Try fixing it:
  Make sure GitLab is running;
  Check the gitlab-shell configuration file:
  sudo -u git -H editor /home/git/gitlab-shell/config.yml
  Please fix the error above and rerun the checks.

Checking GitLab Shell ... Finished

Checking Gitaly ...

Gitaly: ... default ... OK

Checking Gitaly ... Finished

Checking Sidekiq ...

Sidekiq: ... Running? ... no
  Try fixing it:
  sudo -u git -H RAILS_ENV=production bin/background_jobs start
  For more information see:
  doc/install/installation.md in section "Install Init Script"
  see log/sidekiq.log for possible errors
  Please fix the error above and rerun the checks.

Checking Sidekiq ... Finished

Checking Incoming Email ...

Incoming Email: ... Reply by email is disabled in config/gitlab.yml

Checking Incoming Email ... Finished

Checking LDAP ...

LDAP: ... LDAP is disabled in config/gitlab.yml

Checking LDAP ... Finished

Checking GitLab App ...

Git configured correctly? ... no
Trying to fix error automatically. ...Failed
  Try fixing it:
  sudo -u git -H "/usr/bin/git" config --global core.autocrlf "input"
  For more information see:
  doc/install/installation.md in section "GitLab"
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Init script exists? ... no
  Try fixing it:
  Install the init script
  For more information see:
  doc/install/installation.md in section "Install Init Script"
  Please fix the error above and rerun the checks.
Init script up-to-date? ... can't check because of previous errors
Projects have namespace: ... 
GitLab Instance / Monitoring ... yes
Research and Development / Infrastructure / Gitlab Installation ... yes
Codebots / Marketing Site ... yes
Research and Development / Fourth Generation Bots / Bot Talk ... yes
Research and Development / Fourth Generation Bots / Bot Learn ... yes
Research and Development / Fourth Generation Bots / Bot Vision ... yes
Codebots / Marketing Blog ... yes
Brodie O'Carroll / Marketing Blog ... yes
Codebots / Site Builder ... yes
Jörn Guy Süß / Gitlab Cleaner ... yes
Research and Development / Collaboration / University of Queensland  / Micro Credential ... yes
Research and Development / Collaboration / University of Queensland  / Agility ... yes
Research and Development / Collaboration / CRCP ... yes
Research and Development / gitlab-5minute-lag ... yes
Redis version >= 4.0.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.6)
Git version >= 2.24.0 ? ... no
Your git bin path is "/usr/bin/git"
  Try fixing it:
  Update your git to a version >= 2.24.0 from Unknown
  Please fix the error above and rerun the checks.
Git user has default SSH configuration? ... yes
Active users: ... 7
Is authorized keys file accessible? ... skipped (authorized keys not enabled)
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Elasticsearch version 6.x - 7.x? ... skipped (elasticsearch is disabled)

Checking GitLab App ... Finished


Checking GitLab subtasks ... Finished

Possible fixes

This behaviour does not depend on the:

  • type of executor (it occurs with both the docker and the kubernetes executor)
  • size of artifacts (for test cases there are none)
  • size of logs (for test cases they are 5 lines long)
  • image being used (for test cases it is busybox)
  • script (for tests it is one line)
  • network quality (for tests I have activated feature flags that would address this)

We have no resolution and no logs that show any issue. It looks as if the coordinator attempts a call to another system and that call times out.
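
One way to test the timeout suspicion is to time plain TCP connections from inside the GitLab pods to each external system the instance depends on. This is only a sketch under assumptions: the helper name is hypothetical, and the list of hosts to check would have to come from the attached Helm values. It uses nothing beyond the Python standard library.

```python
import socket
import time


def time_tcp_connect(host, port, timeout=10.0):
    """Return the seconds needed to open a TCP connection.

    Returns None if the connection is refused, the name does not
    resolve, or the attempt does not complete within the timeout.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


# Hypothetical usage (the host name is a placeholder):
# print(time_tcp_connect("storage.example.com", 443))
```

A result of None, or a time close to the timeout, for one of the dependencies would point at the system the coordinator is waiting on. A fast connect does not rule that system out, since the stall could also happen at the application level.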

Edited by Stan Hu