CI clone timeouts (and RPC curl failures)
Summary
Looking for ideas and clues to determine what is causing these intermittent timeouts. I think this might be a configuration issue, but I do not have the knowledge to rule that out conclusively. It should be noted that the timeouts only started recently, after I updated the GitLab container; previously I experienced no timeouts.
CI runners (using the docker executor) have suddenly started encountering timeouts while cloning the repository. Immediately re-running the job usually results in it not hitting the timeout and succeeding, which makes it hard to nail down exactly what is going wrong.
Following are the Git-related CI variables and an extract from a timed-out job.
GIT_DEPTH: 3
GIT_FETCH_EXTRA_FLAGS: --prune --no-tags --prune-tags --force
GIT_CLONE_PATH: $CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME
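For context, these are presumably declared in the project's .gitlab-ci.yml rather than in the runner's environment list; a minimal sketch (the real file also contains the job definitions):

# .gitlab-ci.yml (excerpt) -- global variables controlling how the runner fetches sources
variables:
  GIT_DEPTH: "3"                                                       # shallow fetch, last 3 commits only
  GIT_FETCH_EXTRA_FLAGS: "--prune --no-tags --prune-tags --force"      # extra flags appended to git fetch
  GIT_CLONE_PATH: "$CI_BUILDS_DIR/$CI_CONCURRENT_ID/$CI_PROJECT_NAME"  # needs custom_build_dir enabled on the runner

The extract from the timed-out job: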
Running with gitlab-runner 13.5.0 (ece86343)
on sydneyDocker XXXXXXXXX
Preparing the "docker" executor
Using Docker executor with image edbizarro/gitlab-ci-pipeline-php:7.4-alpine ...
Starting service mariadb:10.3.17 ...
Pulling docker image mariadb:10.3.17 ...
Using docker image sha256:32ab29277e522bcd5b9f68ddd0cf3b8e0afa4b0a8cbc3a3d02ea8a1c30a346fe for mariadb:10.3.17 with digest mariadb@sha256:2269c4fd6671574fbc1cae3e30840c219bed1797c430e70a8c16956d146215ac ...
Waiting for services to be up and running...
Pulling docker image edbizarro/gitlab-ci-pipeline-php:7.4-alpine ...
Using docker image sha256:5f82106536f1c77aedfea2f3e153260888c2c89270d757ad9240b9c36fc8076a for edbizarro/gitlab-ci-pipeline-php:7.4-alpine with digest edbizarro/gitlab-ci-pipeline-php@sha256:84c1069035ef90a4a423e846138b00fa66eae3790ab14a03203e39118509dc7a ...
Preparing environment
Running on runner-vmahxjny-project-21-concurrent-0 via server...
Getting source from Git repository
Fetching changes with git depth set to 3...
Reinitialized existing Git repository in /builds/0/slair/.git/
ERROR: Job failed: execution took longer than 2h0m0s seconds
The following job also hit the timeout, but with more verbose curl logging; even so, I could not see a hint of the underlying problem. (How this level of tracing can be enabled is sketched after the log.)
Getting source from Git repository
15:02:39.192574 git.c:439 trace: built-in: git config --global http.https://gitlab.xxxxxx.sslCAInfo /builds/0/slair.tmp/CI_SERVER_TLS_CA_FILE
Fetching changes with git depth set to 3...
15:02:39.195329 git.c:439 trace: built-in: git config -f /builds/0/slair.tmp/git-template/config fetch.recurseSubmodules false
15:02:39.196838 git.c:439 trace: built-in: git init /builds/0/slair --template /builds/0/slair.tmp/git-template
Reinitialized existing Git repository in /builds/0/slair/.git/
15:02:39.203236 git.c:439 trace: built-in: git remote set-url origin https://gitlab-ci-token:[MASKED]@gitlab.xxxxxx/slair/slair.git
15:02:39.204733 git.c:439 trace: built-in: git fetch origin +refs/pipelines/4213:refs/pipelines/4213 +refs/heads/TestCi:refs/remotes/origin/TestCi --depth 3 --prune --no-tags --prune-tags --force
15:02:39.206032 run-command.c:663 trace: run_command: GIT_DIR=.git git-remote-https origin https://gitlab-ci-token:[MASKED]@gitlab.xxxxxx/slair/slair.git
* Trying yy.yy.yy.yy:443...
* Connected to gitlab.xxxxxx (yy.yy.yy.yy) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /builds/0/slair.tmp/CI_SERVER_TLS_CA_FILE
CApath: none
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=gitlab.xxxxxx
* start date: Oct 15 19:14:14 2020 GMT
* expire date: Jan 13 19:14:14 2021 GMT
* subjectAltName: host "gitlab.xxxxxx" matched cert's "gitlab.xxxxxx"
* issuer: C=US; O=Let's Encrypt; CN=Let's Encrypt Authority X3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55ccaf5e7280)
> GET /slair/slair.git/info/refs?service=git-upload-pack HTTP/2
Host: gitlab.xxxxxx
user-agent: git/2.26.2
accept: */*
accept-encoding: deflate, gzip
pragma: no-cache
git-protocol: version=2
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 401
< server: nginx/1.19.3
< date: Tue, 17 Nov 2020 15:02:39 GMT
< content-type: text/plain; charset=utf-8
< content-length: 26
< cache-control: no-cache
< www-authenticate: Basic realm="GitLab"
< x-content-type-options: nosniff
< x-download-options: noopen
< x-frame-options: SAMEORIGIN
< x-permitted-cross-domain-policies: none
< x-request-id: AgwpDrERwP
< x-runtime: 0.013341
< x-xss-protection: 1; mode=block
< strict-transport-security: max-age=31536000
<
* Ignoring the response-body
* Connection #0 to host gitlab.xxxxxx left intact
* Issue another request to this URL: 'https://gitlab-ci-token:[MASKED]@gitlab.xxxxxx/slair/slair.git/info/refs?service=git-upload-pack'
* Found bundle for host gitlab.xxxxxx: 0x55ccaf5ee400 [can multiplex]
* Re-using existing connection! (#0) with host gitlab.xxxxxx
* Connected to gitlab.xxxxxx (yy.yy.yy.yy) port 443 (#0)
* Server auth using Basic with user 'gitlab-ci-token'
* Using Stream ID: 3 (easy handle 0x55ccaf5e7280)
> GET /slair/slair.git/info/refs?service=git-upload-pack HTTP/2
Host: gitlab.xxxxxx
authorization: Basic XXXXXXXXXXXXXXXXXXX
user-agent: git/2.26.2
accept: */*
accept-encoding: deflate, gzip
pragma: no-cache
git-protocol: version=2
< HTTP/2 200
< server: nginx/1.19.3
< date: Tue, 17 Nov 2020 15:02:39 GMT
< content-type: application/x-git-upload-pack-advertisement
< content-length: 140
< cache-control: no-cache
< content-encoding: gzip
< strict-transport-security: max-age=31536000
< referrer-policy: strict-origin-when-cross-origin
< strict-transport-security: max-age=31536000
<
* Connection #0 to host gitlab.xxxxxx left intact
* Found bundle for host gitlab.xxxxxx: 0x55ccaf5ee400 [can multiplex]
* Re-using existing connection! (#0) with host gitlab.xxxxxx
* Connected to gitlab.xxxxxx (yy.yy.yy.yy) port 443 (#0)
* Server auth using Basic with user 'gitlab-ci-token'
* Using Stream ID: 5 (easy handle 0x55ccaf5e7280)
> POST /slair/slair.git/git-upload-pack HTTP/2
Host: gitlab.xxxxxx
authorization: Basic XXXXXXXXXXXXXXXXXXX
user-agent: git/2.26.2
accept-encoding: deflate, gzip
content-type: application/x-git-upload-pack-request
accept: application/x-git-upload-pack-result
git-protocol: version=2
content-length: 591
* We are completely uploaded and fine
< HTTP/2 200
< server: nginx/1.19.3
< date: Tue, 17 Nov 2020 15:02:39 GMT
< content-type: application/x-git-upload-pack-result
< content-length: 132
< cache-control: no-cache
< strict-transport-security: max-age=31536000
< referrer-policy: strict-origin-when-cross-origin
< strict-transport-security: max-age=31536000
<
* Connection #0 to host gitlab.xxxxxx left intact
* Found bundle for host gitlab.xxxxxx: 0x55ccaf5ee400 [can multiplex]
* Re-using existing connection! (#0) with host gitlab.xxxxxx
* Connected to gitlab.xxxxxx (yy.yy.yy.yy) port 443 (#0)
* Server auth using Basic with user 'gitlab-ci-token'
* Using Stream ID: 7 (easy handle 0x55ccaf5e7280)
> POST /slair/slair.git/git-upload-pack HTTP/2
Host: gitlab.xxxxxx
authorization: Basic XXXXXXXXXXXXXXXXXXX
user-agent: git/2.26.2
accept-encoding: deflate, gzip
content-type: application/x-git-upload-pack-request
accept: application/x-git-upload-pack-result
git-protocol: version=2
content-encoding: gzip
content-length: 965
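For reference, the level of detail in the log above can be reproduced by exporting git's own tracing variables on the job; a sketch only (the job name and script are placeholders):

# .gitlab-ci.yml (excerpt) -- illustrative; GIT_TRACE and GIT_CURL_VERBOSE are standard git debug variables
debug_clone:
  variables:
    GIT_TRACE: "1"          # produces the "trace: built-in: git ..." lines
    GIT_CURL_VERBOSE: "1"   # produces the curl connection/TLS/header lines
  script:
    - 'true'                # placeholder; only the "Getting source" stage is of interest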
Steps to reproduce
N/A; looking for ideas about where to look next.
Example Project
N/A; once I know where to look for more information I can create the example.
What is the current bug behavior?
Inexplicable, intermittent timeouts that only occur while the job is getting the source from the repository. Restarting the job usually results in the timeout no longer occurring.
What is the expected correct behavior?
Either no timeout, or a decent log message explaining why this job fails when an immediate retry succeeds.
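The closest thing to a stopgap I can think of is to cap the job timeout well below the two-hour limit and auto-retry on timeout, since an immediate retry almost always succeeds; a sketch only (job name and script are placeholders, and this hides rather than fixes the problem):

# .gitlab-ci.yml (excerpt) -- stopgap, not a fix
phpunit:
  timeout: 20m                       # fail a hung fetch quickly instead of after 2 h
  retry:
    max: 2
    when:
      - job_execution_timeout        # "execution took longer than ..." failures
      - stuck_or_timeout_failure
  script:
    - vendor/bin/phpunit             # placeholder for the real test command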
Results of GitLab environment info
Expand for output related to GitLab environment info
root@gitlab:/# gitlab-rake gitlab:env:info

System information
System:
Current User:   git
Using RVM:      no
Ruby Version:   2.6.6p146
Gem Version:    2.7.10
Bundler Version:1.17.3
Rake Version:   12.3.3
Redis Version:  5.0.9
Git Version:    2.28.0
Sidekiq Version:5.2.9
Go Version:     unknown

GitLab information
Version:        13.5.4
Revision:       7f5f940041c
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     PostgreSQL
DB Version:     11.9
URL:            https://gitlab.slair.com.au
HTTP Clone URL: https://gitlab.slair.com.au/some-group/some-project.git
SSH Clone URL:  ssh://git@gitlab.slair.com.au:2020/some-group/some-project.git
Using LDAP:     no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version:        13.11.0
Repository storage paths:
- default:      /var/opt/gitlab/git-data/repositories
GitLab Shell path:      /opt/gitlab/embedded/service/gitlab-shell
Git:            /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Expand for output related to the GitLab application check
root@gitlab:/# gitlab-rake gitlab:check SANITIZE=true
Checking GitLab subtasks ...

Checking GitLab Shell ...

GitLab Shell: ...
GitLab Shell version >= 13.11.0 ? ... OK (13.11.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Internal API available: OK
Redis available via internal API: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ...
Checking Reply by email ...
IMAP server credentials are correct? ...
Checking gitlab@slair.com.au yes
Init.d configured correctly? ... skipped
MailRoom running? ... skipped
Checking Reply by email ... Finished
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
2/1 ... yes
2/12 ... yes
21/15 ... yes
2/20 ... yes
21/21 ... yes
13/22 ... yes
2/23 ... yes
Redis version >= 4.0.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.6)
Git version >= 2.24.0 ? ... yes (2.28.0)
Git user has default SSH configuration? ... yes
Active users: ... 9
Is authorized keys file accessible? ... yes
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished
GitLab Runner config
Expand for output related to the affected runner
[[runners]]
name = "sydneyDocker"
url = "https://gitlab.xxxxxx/"
token = "[MASKED]"
executor = "docker"
cache_dir = "/cache"
environment = ["DOCKER_TLS_CERTDIR="]
pre_clone_script = "git config --global http.postBuffer 157286400"
[runners.custom_build_dir]
enabled = true
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "docker:stable"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0