13.10 Job fetching performance / docker+machine usage
Summary
Starting with runner 13.10 we experience performance issues in our pipelines. The runner does no longer pick enough jobs and has too long delays between job checks.
We did not have any of these issues with 13.9, there were no configuration changes.
Steps to reproduce
- configure
docker+machine
autoscaling - start a bunch of jobs
Actual behavior
Runner only picks around 2-3 jobs and does not use all machines it started.
Expected behavior
Pick any pending job and use all the idle machines.
Relevant logs and/or screenshots
During this log period there were around 30 idle jobs that could have been picked by this runner. The runner only had 2 jobs running, whereas there were 9 docker-machines up and running that could have been used.
The runner correctly spins up the required instances and starts fetching jobs. What we noticed is that job checking occurs about every 50 seconds, opposed to what we would expect:
job log
Mar 24 10:27:11 gitlab gitlab-runner[21612]: Checking for jobs... received #033[0;m job#033[0;m=313338 repo_url#033[0;m=https://gitlab.example.com/redacted-project.git runner#033[0;m=ovfJCpH1
Mar 24 10:27:12 gitlab gitlab-runner[21612]: Using existing docker-machine #033[0;m created#033[0;m=2021-03-24 10:01:18.780997449 +0100 CET m=+1.713588173 docker#033[0;m=tcp://116.203.59.132:2376 job#033[0;m=313338 name#033[0;m=runner-ovfjcph1-runner-1616571477-cef13cd7 now#033[0;m=2021-03-24 10:27:12.741892218 +0100 CET m=+1555.674482981 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=7
Mar 24 10:27:14 gitlab gitlab-runner[21612]: Starting docker-machine build... #033[0;m created#033[0;m=2021-03-24 10:01:18.780997449 +0100 CET m=+1.713588173 docker#033[0;m=tcp://116.203.59.132:2376 job#033[0;m=313338 name#033[0;m=runner-ovfjcph1-runner-1616571477-cef13cd7 now#033[0;m=2021-03-24 10:27:14.497331006 +0100 CET m=+1557.429921751 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=7
Mar 24 10:28:04 gitlab gitlab-runner[21612]: Checking for jobs... received #033[0;m job#033[0;m=313339 repo_url#033[0;m=https://gitlab.example.com/redacted-project.git runner#033[0;m=ovfJCpH1
Mar 24 10:28:05 gitlab gitlab-runner[21612]: Finished docker-machine build: <nil> #033[0;m created#033[0;m=2021-03-24 10:01:18.781005082 +0100 CET m=+1.713595815 docker#033[0;m=tcp://157.90.166.215:2376 job#033[0;m=313333 name#033[0;m=runner-ovfjcph1-runner-1616572126-92c36086 now#033[0;m=2021-03-24 10:28:05.157583303 +0100 CET m=+1608.090174067 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=11
Mar 24 10:28:05 gitlab gitlab-runner[21612]: Cleaned up docker-machine #033[0;m created#033[0;m=2021-03-24 10:01:18.781005082 +0100 CET m=+1.713595815 docker#033[0;m=tcp://157.90.166.215:2376 job#033[0;m=313333 name#033[0;m=runner-ovfjcph1-runner-1616572126-92c36086 now#033[0;m=2021-03-24 10:28:05.180154732 +0100 CET m=+1608.112745479 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=11
Mar 24 10:28:05 gitlab gitlab-runner[21612]: Job succeeded #033[0;m duration#033[0;m=1m43.849102947s job#033[0;m=313333 project#033[0;m=461 runner#033[0;m=ovfJCpH1
Mar 24 10:28:05 gitlab gitlab-runner[21612]: Using existing docker-machine #033[0;m created#033[0;m=2021-03-24 10:01:18.780988916 +0100 CET m=+1.713579639 docker#033[0;m=tcp://157.90.126.211:2376 job#033[0;m=313339 name#033[0;m=runner-ovfjcph1-runner-1616569366-5a0fd8f0 now#033[0;m=2021-03-24 10:28:05.271903506 +0100 CET m=+1608.204494244 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=7
Mar 24 10:28:06 gitlab gitlab-runner[21612]: Starting docker-machine build... #033[0;m created#033[0;m=2021-03-24 10:01:18.780988916 +0100 CET m=+1.713579639 docker#033[0;m=tcp://157.90.126.211:2376 job#033[0;m=313339 name#033[0;m=runner-ovfjcph1-runner-1616569366-5a0fd8f0 now#033[0;m=2021-03-24 10:28:06.985579886 +0100 CET m=+1609.918170640 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=7
Mar 24 10:28:52 gitlab gitlab-runner[21612]: Finished docker-machine build: <nil> #033[0;m created#033[0;m=2021-03-24 10:01:18.781013068 +0100 CET m=+1.713603790 docker#033[0;m=tcp://157.90.174.232:2376 job#033[0;m=313331 name#033[0;m=runner-ovfjcph1-runner-1616574459-7d0883dd now#033[0;m=2021-03-24 10:28:52.351457155 +0100 CET m=+1655.284047926 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=12
Mar 24 10:28:52 gitlab gitlab-runner[21612]: Cleaned up docker-machine #033[0;m created#033[0;m=2021-03-24 10:01:18.781013068 +0100 CET m=+1.713603790 docker#033[0;m=tcp://157.90.174.232:2376 job#033[0;m=313331 name#033[0;m=runner-ovfjcph1-runner-1616574459-7d0883dd now#033[0;m=2021-03-24 10:28:52.386604577 +0100 CET m=+1655.319195329 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=12
Mar 24 10:28:52 gitlab gitlab-runner[21612]: Job succeeded #033[0;m duration#033[0;m=3m25.597972082s job#033[0;m=313331 project#033[0;m=461 runner#033[0;m=ovfJCpH1
Mar 24 10:28:52 gitlab gitlab-runner[21612]: Finished docker-machine build: <nil> #033[0;m created#033[0;m=2021-03-24 10:01:18.780988916 +0100 CET m=+1.713579639 docker#033[0;m=tcp://157.90.126.211:2376 job#033[0;m=313339 name#033[0;m=runner-ovfjcph1-runner-1616569366-5a0fd8f0 now#033[0;m=2021-03-24 10:28:52.781324371 +0100 CET m=+1655.713915122 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=7
Mar 24 10:28:54 gitlab gitlab-runner[21612]: Cleaned up docker-machine #033[0;m created#033[0;m=2021-03-24 10:01:18.780988916 +0100 CET m=+1.713579639 docker#033[0;m=tcp://157.90.126.211:2376 job#033[0;m=313339 name#033[0;m=runner-ovfjcph1-runner-1616569366-5a0fd8f0 now#033[0;m=2021-03-24 10:28:54.191253434 +0100 CET m=+1657.123844158 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=7
Mar 24 10:28:54 gitlab gitlab-runner[21612]: Job succeeded #033[0;m duration#033[0;m=49.930860599s job#033[0;m=313339 project#033[0;m=461 runner#033[0;m=ovfJCpH1
Mar 24 10:28:54 gitlab gitlab-runner[21612]: Checking for jobs... received #033[0;m job#033[0;m=313332 repo_url#033[0;m=https://gitlab.example.com/redacted-project.git runner#033[0;m=ovfJCpH1
Mar 24 10:28:55 gitlab gitlab-runner[21612]: Using existing docker-machine #033[0;m created#033[0;m=2021-03-24 10:01:18.781013068 +0100 CET m=+1.713603790 docker#033[0;m=tcp://157.90.174.232:2376 job#033[0;m=313332 name#033[0;m=runner-ovfjcph1-runner-1616574459-7d0883dd now#033[0;m=2021-03-24 10:28:55.343254134 +0100 CET m=+1658.275844855 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=13
Mar 24 10:28:57 gitlab gitlab-runner[21612]: Starting docker-machine build... #033[0;m created#033[0;m=2021-03-24 10:01:18.781013068 +0100 CET m=+1.713603790 docker#033[0;m=tcp://157.90.174.232:2376 job#033[0;m=313332 name#033[0;m=runner-ovfjcph1-runner-1616574459-7d0883dd now#033[0;m=2021-03-24 10:28:57.087692868 +0100 CET m=+1660.020283630 project#033[0;m=461 runner#033[0;m=ovfJCpH1 usedcount#033[0;m=13
gitlab-runner --debug run:
:
Checking for jobs... nothing runner=redacted_fPwc
Checking for jobs... nothing runner=redacted_3PG7
Checking for jobs... nothing runner=redacted_27Gn
Checking for jobs... nothing runner=redacted_8819
Checking for jobs... nothing runner=redacted_6cb5
Feeding runners to channel builds=2
Docker Machine Details creating=0 idle=9 maxMachineCreate=0 maxMachines=14 minIdleCount=5 removing=0 runner=redacted_CpH1 time=2021-03-25 11:01:15.730655082 +0100 CET m=+101.597428603 total=11 used=2
Checking for jobs... nothing runner=redacted_1a65
Checking for jobs... nothing runner=redacted_eb93
Checking for jobs... nothing runner=redacted_4f32
Checking for jobs... nothing runner=redacted_ae5e
Checking for jobs... nothing runner=redacted_dec0
Checking for jobs... nothing runner=redacted_fd17
Checking for jobs... received job=314792 repo_url=https://gitlab.example.com/redacted-project.git runner=redacted_CpH1
Processing chain chain-leaf=[0xc000cfa580 0xc00058cc00 0xc00071f180] context=certificate-chain-build
Certificate doesn't provide parent URL: exiting the loop Issuer=DST Root CA X3 IssuerCertURL=[] Serial=91299735575339953335919266965803778155 Subject=DST Root CA X3 context=certificate-chain-build
Failed to requeue the runner builds=3 runner=redacted_CpH1
Running with gitlab-runner 13.10.0 (54944146) job=314792 project=310 runner=redacted_CpH1
on docker@hetzner.cloud redacted_CpH1 job=314792 project=310 runner=redacted_CpH1
Preparing the "docker+machine" executor job=314792 project=310 runner=redacted_CpH1
Checking for jobs... nothing runner=redacted_aaef
Checking for jobs... nothing runner=redacted_eb6a
Checking for jobs... nothing runner=redacted_hENo
Checking for jobs... nothing runner=redacted_zx8u
Environment description
We have a single gitlab-runner instance that has 20 distinct runner configurations.
One of these configurations is our main runner using docker+machine
autoscaling:
config.toml contents
concurrent = 20
check_interval = 1
[session_server]
session_timeout = 1800
[[runners]]
name = "my-runner"
limit = 14
output_limit = 40960
url = "https://gitlab.example.com/"
token = "<redacted>"
executor = "docker+machine"
[runners.docker]
tls_verify = false
image = "debian:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache", "/var/run/docker.sock:/var/run/docker.sock"]
shm_size = 0
[runners.cache]
Type = "s3"
Shared = true
[runners.cache.s3]
ServerAddress = "<redacted>"
AccessKey = "<redacted>"
SecretKey = "<redacted>"
BucketName = "gitlab-cache"
[runners.cache.gcs]
[runners.machine]
IdleCount = 0
IdleTime = 1800
MaxBuilds = 100
MachineDriver = "hetzner"
MachineName = "runner-%s"
MachineOptions = [
# redacted
]
[[runners.machine.autoscaling]]
Periods = ["* * 7-19 * * mon-fri *"]
IdleCount = 5
IdleTime = 3600
Timezone = "Europe/Berlin"
Used GitLab Runner version
Version: 13.10.0
Git revision: 54944146
Git branch: 13-10-stable
GO version: go1.13.8
Built: 2021-03-21T09:13:32+0000
OS/Arch: linux/amd64