Job stays pending while the self-managed runner is idle
### Summary
We run a self-managed GitLab instance on Kubernetes (K8s) with autoscaled runners (VMs).
The only runner in our environment is the runner-manager, which automatically creates new VMs to run the jobs on and removes them again after they have been idle for a few minutes (`IdleTime = 300`).
During business hours it is configured to always keep one idle VM available, so that jobs can start right away instead of waiting for the runner-manager to create one or more new VMs.
Recently we noticed that this setup has problems with one particular job: it stays `Pending`.
* In `Admin > CI/CD > Runners` I can see the runner-manager is `Online` and `Idle`.
* In `Admin > CI/CD > Jobs` the job has status `Pending` and `none` in the Runner column, so this particular job apparently never got a runner assigned.
The runner's own log repeats this line every second (we use `check_interval = 1`):
```
Checking for jobs...nothing runner=xxxxxxx
```
While writing this issue, I noticed that the job suddenly got picked up after being queued for 59 minutes and 18 seconds, just before it would have been stopped by the `1h` timeout.
Usually we cancel and retry the job after it has been stuck for a few minutes; the retried job is always picked up immediately and the rest of the pipeline proceeds normally.
Our runner-manager configuration:
```toml
listen_address = ":9252"
concurrent = 25
check_interval = 1
log_level = "info"
shutdown_timeout = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "runner-manager"
url = "https://gitlab.example.com"
id = 114
token = "<redacted>"
token_obtained_at = 2023-01-30T03:04:44Z
token_expires_at = 0001-01-01T00:00:00Z
executor = "docker+machine"
environment = ["DOCKER_TLS_CERTDIR="]
[runners.custom_build_dir]
[runners.cache]
Type = "gcs"
Shared = true
MaxUploadedArchiveSize = 0
[runners.cache.s3]
[runners.cache.gcs]
BucketName = "<redacted>"
[runners.cache.azure]
[runners.feature_flags]
FF_USE_FASTZIP = true
[runners.docker]
tls_verify = false
image = "alpine"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
pull_policy = ["always"]
shm_size = 0
[runners.machine]
MaxGrowthRate = 25
IdleCount = 0
IdleScaleFactor = 0.0
IdleCountMin = 0
IdleTime = 300
MaxBuilds = 20
MachineDriver = "google"
MachineName = "auto-scale-%s"
MachineOptions = [<redacted>]
[[runners.machine.autoscaling]]
Periods = ["* * 6-18 * * mon-fri *"]
Timezone = "Europe/Amsterdam"
IdleCount = 1
IdleScaleFactor = 0.0
IdleCountMin = 1
IdleTime = 300
```
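The business-hours behaviour comes from the `[[runners.machine.autoscaling]]` block. Its `Periods` entry uses GitLab Runner's cron-like syntax with the fields `[second] [minute] [hour] [day of month] [month] [day of week] [year]`, so `* * 6-18 * * mon-fri *` should match 06:00:00 to 18:59:59 on weekdays in the configured `Timezone`. The minimal Python sketch below illustrates which idle settings we expect to apply at a given moment. It hard-codes this single period, uses hypothetical helper names, assumes the input is already local Europe/Amsterdam time, and is not the runner's actual implementation.

```python
from datetime import datetime

# Hypothetical values copied from the config above.
# Top-level [runners.machine] settings (used outside the period):
DEFAULT_SETTINGS = {"IdleCount": 0, "IdleCountMin": 0, "IdleTime": 300}
# [[runners.machine.autoscaling]] settings (used inside the period):
BUSINESS_HOURS_SETTINGS = {"IdleCount": 1, "IdleCountMin": 1, "IdleTime": 300}


def in_business_period(local_time: datetime) -> bool:
    """Mirror of Periods = ["* * 6-18 * * mon-fri *"].

    Assumes `local_time` is already expressed in Europe/Amsterdam time.
    Second, minute, day-of-month, month and year are all `*`, so only the
    hour (6-18, inclusive) and weekday (mon-fri) fields restrict the match.
    """
    is_weekday = local_time.weekday() < 5   # Monday=0 ... Friday=4
    in_hours = 6 <= local_time.hour <= 18   # covers 06:00:00 through 18:59:59
    return is_weekday and in_hours


def effective_idle_settings(local_time: datetime) -> dict:
    """Return the autoscaling block's settings if the period matches, else the defaults."""
    if in_business_period(local_time):
        return BUSINESS_HOURS_SETTINGS
    return DEFAULT_SETTINGS
```

For example, a Monday at 18:59 still falls inside the period (`IdleCount` 1), while 19:00 on the same day or any time on a Saturday falls back to the top-level `IdleCount = 0`.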
### Steps to reproduce
### Example Project
It mostly seems to happen with one job, our `gitleaks` check, and it happens a lot:
```yaml
test/gitleaks:
  stage: test
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  allow_failure: true
  interruptible: true
  image:
    name: zricethezav/gitleaks
    entrypoint: [""] # Clear the entrypoint
  artifacts:
    expire_in: 1 hour # We only use it briefly
    reports:
      secret_detection: gitleaks-report.json
    when: on_failure # Gitleaks only produces a report on failure
  variables:
    GIT_STRATEGY: clone # Fetch causes issues
    GIT_CEILING_DIRECTORIES: $CI_PROJECT_DIR # For Git < 2.35.2
  before_script:
    - |
      # Adding build directory to Git safe list
      git config --global --add safe.directory $CI_PROJECT_DIR # For git >= 2.35.2
    - |
      # Obtaining the commits between this branch and the upstream branch
      git fetch origin $CI_DEFAULT_BRANCH $CI_COMMIT_REF_NAME
      git log --left-right --cherry-pick --pretty=format:"%H" refs/remotes/origin/$CI_DEFAULT_BRANCH...refs/remotes/origin/$CI_COMMIT_REF_NAME > commit_list.txt
      export GL_FIRST_COMMIT=$(head -n1 commit_list.txt) GL_LAST_COMMIT=$(tail -n1 commit_list.txt)
      rm commit_list.txt
  script:
    - gitleaks detect -v --log-opts="${GL_FIRST_COMMIT}..${GL_LAST_COMMIT}" --report-format=json --report-path=gitleaks-report.json
    # TODO: Export gitleaks report as junit report
  retry: 2 # Retry when failed, sometimes this job fails the first time.
```
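The second `before_script` entry derives the scan range from the `git log` output: `head -n1` takes the first line and `tail -n1` the last. `git log` prints newest commits first by default, and in git's `A..B` notation the range covers commits reachable from `B` but not from `A`, so the order of the two exported values matters. A minimal Python sketch of that selection follows; the function name is hypothetical, and it only mirrors the shell logic without calling git.

```python
def gitleaks_log_opts(git_log_lines: list[str]) -> str:
    """Mirror of the before_script's head/tail selection (hypothetical helper).

    Assumes `git_log_lines` holds the SHAs in the order `git log` printed
    them. Like the shell version, an empty list yields empty values rather
    than an error.
    """
    first = git_log_lines[0] if git_log_lines else ""  # head -n1 commit_list.txt
    last = git_log_lines[-1] if git_log_lines else ""  # tail -n1 commit_list.txt
    # Same string the job passes as --log-opts="${GL_FIRST_COMMIT}..${GL_LAST_COMMIT}"
    return f"{first}..{last}"
```

For the list `["c3", "b2", "a1"]` the sketch yields `c3..a1`; with a single commit it yields `a1..a1`, a range git treats as empty.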
### What is the current *bug* behavior?
The job is not picked up at all, or only very late, while the runner shows as `Idle`.
### What is the expected *correct* behavior?
The job is picked up as soon as the runner is idle.
### Relevant logs and/or screenshots
#### Results of GitLab environment info
<details>
<summary>Expand for output related to GitLab environment info</summary>
<pre>
System information
System:
Current User: git
Using RVM: no
Ruby Version: 2.7.7p221
Gem Version: 3.2.33
Bundler Version:2.3.15
Rake Version: 13.0.6
Redis Version: 6.0.16
Sidekiq Version:6.5.7
Go Version: unknown
GitLab information
Version: 15.8.0
Revision: c052f86b6b4
Directory: /srv/gitlab
DB Adapter: PostgreSQL
DB Version: 14.4
URL: https://gitlab.example.com
HTTP Clone URL: https://gitlab.example.com/some-group/some-project.git
SSH Clone URL: git@gitlab.example.com:some-group/some-project.git
Using LDAP: no
Using Omniauth: yes
Omniauth Providers: azure_activedirectory_v2, gitlab
GitLab Shell
Version: 14.15.0
Repository storages:
- default: tcp://gitlab-gitaly-0.gitlab-gitaly.gitlab.svc:8075
GitLab Shell path: /home/git/gitlab-shell
</pre>
</details>
#### Results of GitLab application Check
<details>
<summary>Expand for output related to the GitLab application check</summary>
<pre>
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 14.15.0 ? ... OK (14.15.0)
Running /home/git/gitlab-shell/bin/check
gitlab-shell self-check failed
Try fixing it:
Make sure GitLab is running;
Check the gitlab-shell configuration file:
sudo -u git -H editor /home/git/gitlab-shell/config.yml
Please fix the error above and rerun the checks.
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... no
Try fixing it:
sudo -u git -H RAILS_ENV=production bin/background_jobs start
For more information see:
doc/install/installation.md in section "Install Init Script"
see log/sidekiq.log for possible errors
Please fix the error above and rerun the checks.
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Cable config exists? ... yes
Resque config exists? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet)
Systemd unit files or init script exist? ... no
Try fixing it:
Install the Service
For more information see:
doc/install/installation.md in section "Install the Service"
Please fix the error above and rerun the checks.
Systemd unit files or init script up-to-date? ... can't check because of previous errors
Projects have namespace: ...
2/1 ... yes
5/2 ... yes
5/3 ... yes
5/4 ... yes
170/108 ... yes
170/109 ... yes
170/110 ... yes
169/111 ... yes
175/112 ... yes
175/113 ... yes
178/114 ... yes
177/115 ... yes
177/116 ... yes
176/117 ... yes
176/118 ... yes
181/119 ... yes
180/120 ... yes
179/121 ... yes
180/122 ... yes
179/123 ... yes
180/124 ... yes
168/125 ... yes
168/126 ... yes
168/127 ... yes
168/128 ... yes
168/129 ... yes
168/130 ... yes
168/131 ... yes
168/132 ... yes
168/133 ... yes
168/134 ... yes
168/135 ... yes
168/136 ... yes
168/137 ... yes
168/138 ... yes
168/139 ... yes
168/140 ... yes
168/141 ... yes
260/142 ... yes
260/143 ... yes
260/144 ... yes
229/145 ... yes
260/146 ... yes
267/147 ... yes
267/148 ... yes
307/149 ... yes
307/150 ... yes
307/151 ... yes
224/152 ... yes
235/153 ... yes
299/154 ... yes
224/155 ... yes
294/156 ... yes
326/157 ... yes
326/158 ... yes
327/159 ... yes
276/160 ... yes
236/161 ... yes
313/162 ... yes
313/163 ... yes
278/164 ... yes
274/165 ... yes
274/166 ... yes
274/167 ... yes
236/168 ... yes
224/169 ... yes
248/170 ... yes
306/171 ... yes
331/172 ... yes
331/173 ... yes
331/174 ... yes
265/175 ... yes
266/176 ... yes
266/177 ... yes
262/178 ... yes
262/179 ... yes
262/180 ... yes
261/181 ... yes
261/182 ... yes
261/183 ... yes
261/184 ... yes
261/185 ... yes
224/186 ... yes
241/187 ... yes
246/188 ... yes
246/189 ... yes
246/190 ... yes
246/191 ... yes
281/192 ... yes
316/193 ... yes
316/194 ... yes
281/195 ... yes
316/196 ... yes
316/197 ... yes
239/198 ... yes
247/199 ... yes
247/200 ... yes
247/201 ... yes
247/202 ... yes
247/203 ... yes
247/204 ... yes
240/205 ... yes
240/206 ... yes
240/207 ... yes
248/208 ... yes
248/209 ... yes
310/210 ... yes
272/211 ... yes
272/212 ... yes
272/213 ... yes
310/214 ... yes
310/215 ... yes
310/216 ... yes
272/217 ... yes
272/218 ... yes
308/221 ... yes
268/222 ... yes
303/223 ... yes
303/224 ... yes
230/225 ... yes
230/226 ... yes
230/227 ... yes
262/228 ... yes
262/229 ... yes
262/230 ... yes
262/231 ... yes
262/232 ... yes
262/233 ... yes
262/234 ... yes
262/235 ... yes
262/236 ... yes
262/237 ... yes
262/238 ... yes
262/239 ... yes
262/240 ... yes
262/241 ... yes
262/242 ... yes
262/243 ... yes
262/244 ... yes
262/245 ... yes
262/246 ... yes
262/247 ... yes
262/248 ... yes
262/249 ... yes
262/250 ... yes
262/251 ... yes
262/252 ... yes
262/253 ... yes
230/254 ... yes
230/255 ... yes
230/256 ... yes
230/257 ... yes
230/258 ... yes
230/259 ... yes
230/260 ... yes
230/261 ... yes
230/262 ... yes
263/263 ... yes
230/264 ... yes
230/265 ... yes
230/266 ... yes
261/267 ... yes
261/268 ... yes
261/269 ... yes
261/270 ... yes
261/271 ... yes
261/272 ... yes
261/273 ... yes
261/274 ... yes
261/275 ... yes
261/276 ... yes
261/277 ... yes
261/278 ... yes
236/279 ... yes
264/280 ... yes
264/281 ... yes
264/282 ... yes
230/283 ... yes
264/284 ... yes
264/285 ... yes
264/286 ... yes
264/287 ... yes
264/288 ... yes
264/289 ... yes
264/290 ... yes
264/291 ... yes
264/292 ... yes
264/293 ... yes
264/294 ... yes
264/295 ... yes
264/296 ... yes
264/297 ... yes
264/298 ... yes
264/299 ... yes
264/300 ... yes
264/301 ... yes
264/302 ... yes
226/303 ... yes
226/304 ... yes
226/305 ... yes
226/306 ... yes
225/307 ... yes
255/308 ... yes
225/309 ... yes
225/310 ... yes
255/311 ... yes
255/312 ... yes
225/313 ... yes
254/314 ... yes
254/315 ... yes
254/316 ... yes
254/317 ... yes
225/318 ... yes
225/319 ... yes
280/320 ... yes
280/321 ... yes
280/322 ... yes
315/323 ... yes
315/324 ... yes
315/325 ... yes
315/326 ... yes
315/327 ... yes
258/328 ... yes
258/329 ... yes
227/330 ... yes
224/331 ... yes
276/332 ... yes
236/333 ... yes
278/334 ... yes
273/335 ... yes
271/336 ... yes
271/337 ... yes
275/338 ... yes
236/339 ... yes
252/340 ... yes
252/341 ... yes
252/342 ... yes
252/343 ... yes
259/344 ... yes
259/345 ... yes
228/346 ... yes
228/347 ... yes
228/348 ... yes
257/349 ... yes
257/350 ... yes
305/351 ... yes
299/352 ... yes
299/353 ... yes
299/354 ... yes
299/355 ... yes
329/356 ... yes
329/357 ... yes
329/358 ... yes
300/359 ... yes
300/360 ... yes
330/361 ... yes
330/362 ... yes
330/363 ... yes
330/364 ... yes
279/365 ... yes
314/366 ... yes
224/367 ... yes
227/368 ... yes
227/369 ... yes
227/370 ... yes
256/371 ... yes
256/372 ... yes
256/373 ... yes
276/374 ... yes
236/375 ... yes
250/376 ... yes
281/377 ... yes
277/378 ... yes
312/379 ... yes
332/380 ... yes
333/381 ... yes
273/382 ... yes
271/383 ... yes
270/384 ... yes
234/385 ... yes
234/386 ... yes
234/387 ... yes
270/388 ... yes
270/389 ... yes
269/390 ... yes
309/391 ... yes
286/392 ... yes
242/393 ... yes
242/394 ... yes
242/395 ... yes
242/396 ... yes
242/397 ... yes
242/398 ... yes
318/399 ... yes
317/400 ... yes
319/401 ... yes
285/402 ... yes
292/403 ... yes
242/404 ... yes
287/405 ... yes
293/406 ... yes
283/407 ... yes
283/408 ... yes
283/409 ... yes
284/410 ... yes
284/411 ... yes
284/412 ... yes
282/413 ... yes
282/414 ... yes
242/415 ... yes
242/416 ... yes
242/417 ... yes
291/418 ... yes
242/419 ... yes
325/420 ... yes
291/421 ... yes
291/422 ... yes
291/423 ... yes
324/424 ... yes
324/425 ... yes
324/426 ... yes
291/427 ... yes
291/428 ... yes
322/429 ... yes
321/430 ... yes
323/431 ... yes
320/432 ... yes
242/433 ... yes
288/434 ... yes
242/435 ... yes
242/436 ... yes
304/437 ... yes
304/438 ... yes
304/439 ... yes
311/440 ... yes
312/441 ... yes
311/442 ... yes
332/443 ... yes
332/444 ... yes
333/445 ... yes
333/446 ... yes
274/447 ... yes
250/449 ... yes
168/450 ... yes
250/451 ... yes
234/452 ... yes
168/453 ... yes
168/454 ... yes
276/455 ... yes
262/456 ... yes
168/457 ... yes
728/459 ... yes
728/460 ... yes
728/461 ... yes
278/462 ... yes
248/463 ... yes
312/464 ... yes
310/465 ... yes
310/466 ... yes
750/468 ... yes
242/469 ... yes
242/470 ... yes
758/471 ... yes
758/472 ... yes
264/473 ... yes
262/474 ... yes
765/475 ... yes
765/476 ... yes
262/477 ... yes
Redis version >= 6.0.0? ... yes
Ruby version >= 2.7.2 ? ... yes (2.7.7)
Git user has default SSH configuration? ... yes
Active users: ... 63
Is authorized keys file accessible? ... skipped (authorized keys not enabled)
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished
</pre>
</details>
### Possible fixes