Self-managed runner keeps job pending while being idle
### Summary

We have a self-managed GitLab instance (on K8s) with auto-scaled runners (VMs). The only runner in our environment is the runner-manager, which automatically creates new VMs to run the jobs on and removes them again after a few minutes of being idle. We have it configured to always keep 1 idle runner available during business hours, so that we can run jobs quickly without having to wait for the runner-manager to create one or more new VMs.

Recently we noticed that this setup has problems with one particular job. This job stays `Pending`.

* In `Admin > CI/CD > Runners` I can see the runner-manager is `Online` and `Idle`.
* In `Admin > CI/CD > Jobs` I can see the job in status `Pending` with `none` in the Runner column.

So it seems this particular job does not get a runner assigned.

In the logs of the runner itself I see this every second:

```
Checking for jobs...nothing runner=xxxxxxx
```

While creating this issue I noticed that the job suddenly got picked up, after being queued for 59 minutes and 18 seconds (just in time before it would have been stopped by the `1h` timeout). Usually we cancel and retry it after it has been stuck for a few minutes, after which it is always picked up immediately and proceeds with the rest of the pipeline.
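For anyone trying to spot the same symptom outside the admin UI, here is a minimal sketch of how such jobs could be flagged from a parsed Jobs API response. It is an illustration, not something we run in production: the helper name `stuck_pending_jobs`, the 300-second threshold and the sample payload are hypothetical, and it assumes that `queued_duration` is populated while a job is still pending.

```python
from typing import Any


def stuck_pending_jobs(
    jobs: list[dict[str, Any]], min_queued_seconds: float
) -> list[dict[str, Any]]:
    """Return jobs that are pending, have no runner, and have waited too long."""
    return [
        job
        for job in jobs
        if job.get("status") == "pending"
        and job.get("runner") is None
        # Assumption: queued_duration is in seconds and set for pending jobs.
        and (job.get("queued_duration") or 0) >= min_queued_seconds
    ]


# Hypothetical payload shaped like a Jobs API response; IDs are invented.
# 3558 seconds corresponds to the 59 minutes and 18 seconds mentioned above.
sample = [
    {"id": 1, "name": "test/gitleaks", "status": "pending",
     "runner": None, "queued_duration": 3558.0},
    {"id": 2, "name": "build", "status": "running",
     "runner": {"id": 114}, "queued_duration": 2.5},
    {"id": 3, "name": "lint", "status": "pending",
     "runner": None, "queued_duration": 4.0},
]

for job in stuck_pending_jobs(sample, min_queued_seconds=300):
    print(job["id"], job["name"], job["queued_duration"])
```

The threshold keeps jobs that were only just queued out of the result, so only long waits like the one described here are reported.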
Our runner-manager configuration:

```toml
listen_address = ":9252"
concurrent = 25
check_interval = 1
log_level = "info"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "runner-manager"
  url = "https://gitlab.example.com"
  id = 114
  token = "<redacted>"
  token_obtained_at = 2023-01-30T03:04:44Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "docker+machine"
  environment = ["DOCKER_TLS_CERTDIR="]
  [runners.custom_build_dir]
  [runners.cache]
    Type = "gcs"
    Shared = true
    MaxUploadedArchiveSize = 0
    [runners.cache.s3]
    [runners.cache.gcs]
      BucketName = "<redacted>"
    [runners.cache.azure]
  [runners.feature_flags]
    FF_USE_FASTZIP = true
  [runners.docker]
    tls_verify = false
    image = "alpine"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    pull_policy = ["always"]
    shm_size = 0
  [runners.machine]
    MaxGrowthRate = 25
    IdleCount = 0
    IdleScaleFactor = 0.0
    IdleCountMin = 0
    IdleTime = 300
    MaxBuilds = 20
    MachineDriver = "google"
    MachineName = "auto-scale-%s"
    MachineOptions = [<redacted>]
    [[runners.machine.autoscaling]]
      Periods = ["* * 6-18 * * mon-fri *"]
      Timezone = "Europe/Amsterdam"
      IdleCount = 1
      IdleScaleFactor = 0.0
      IdleCountMin = 1
      IdleTime = 300
```

### Steps to reproduce

### Example Project

It seems to be happening (a lot) with the same job, our `gitleaks` check.
```yaml
test/gitleaks:
  stage: test
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
  allow_failure: true
  interruptible: true
  image:
    name: zricethezav/gitleaks
    entrypoint: [""] # Clear the entrypoint
  artifacts:
    expire_in: 1 hour # We only use it briefly
    reports:
      secret_detection: gitleaks-report.json
    when: on_failure # Gitleaks only produces a report on failure
  variables:
    GIT_STRATEGY: clone # Fetch causes issues
    GIT_CEILING_DIRECTORIES: $CI_PROJECT_DIR # For Git < 2.35.2
  before_script:
    - | # Adding build directory to Git safe list
      git config --global --add safe.directory $CI_PROJECT_DIR # For git >= 2.35.2
    - | # Obtaining the commits between this branch and the upstream branch
      git fetch origin $CI_DEFAULT_BRANCH $CI_COMMIT_REF_NAME
      git log --left-right --cherry-pick --pretty=format:"%H" refs/remotes/origin/$CI_DEFAULT_BRANCH...refs/remotes/origin/$CI_COMMIT_REF_NAME > commit_list.txt
      export GL_FIRST_COMMIT=$(head -n1 commit_list.txt) GL_LAST_COMMIT=$(tail -n1 commit_list.txt)
      rm commit_list.txt
  script:
    - gitleaks detect -v --log-opts="${GL_FIRST_COMMIT}..${GL_LAST_COMMIT}" --report-format=json --report-path=gitleaks-report.json
  #TODO: Export gitleaks report as junit report
  retry: 2 # Retry when failed, sometimes this job fails the first time.
```

### What is the current *bug* behavior?

The job is not picked up, or is picked up very late, while the runner shows as `idle`.

### What is the expected *correct* behavior?

The job is picked up when the runner is idle.

### Relevant logs and/or screenshots

#### Results of GitLab environment info

<details>
<summary>Expand for output related to GitLab environment info</summary>
<pre>
System information
System:
Current User: git
Using RVM: no
Ruby Version: 2.7.7p221
Gem Version: 3.2.33
Bundler Version:2.3.15
Rake Version: 13.0.6
Redis Version: 6.0.16
Sidekiq Version:6.5.7
Go Version: unknown
GitLab information
Version: 15.8.0
Revision: c052f86b6b4
Directory: /srv/gitlab
DB Adapter: PostgreSQL
DB Version: 14.4
URL: https://gitlab.example.com
HTTP Clone URL: https://gitlab.example.com/some-group/some-project.git
SSH Clone URL: git@gitlab.example.com:some-group/some-project.git
Using LDAP: no
Using Omniauth: yes
Omniauth Providers: azure_activedirectory_v2, gitlab
GitLab Shell
Version: 14.15.0
Repository storages:
- default: tcp://gitlab-gitaly-0.gitlab-gitaly.gitlab.svc:8075
GitLab Shell path: /home/git/gitlab-shell
</pre>
</details>

#### Results of GitLab application Check

<details>
<summary>Expand for output related to the GitLab application check</summary>
<pre>
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 14.15.0 ? ... OK (14.15.0)
Running /home/git/gitlab-shell/bin/check
gitlab-shell self-check failed
  Try fixing it:
  Make sure GitLab is running;
  Check the gitlab-shell configuration file:
  sudo -u git -H editor /home/git/gitlab-shell/config.yml
  Please fix the error above and rerun the checks.
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... no
  Try fixing it:
  sudo -u git -H RAILS_ENV=production bin/background_jobs start
  For more information see:
  doc/install/installation.md in section "Install Init Script"
  see log/sidekiq.log for possible errors
  Please fix the error above and rerun the checks.
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ...
Reply by email is disabled in config/gitlab.yml Checking Incoming Email ... Finished Checking LDAP ... LDAP: ... LDAP is disabled in config/gitlab.yml Checking LDAP ... Finished Checking GitLab App ... Database config exists? ... yes All migrations up? ... yes Database contains orphaned GroupMembers? ... no GitLab config exists? ... yes GitLab config up to date? ... yes Cable config exists? ... yes Resque config exists? ... yes Log directory writable? ... yes Tmp directory writable? ... yes Uploads directory exists? ... yes Uploads directory has correct permissions? ... yes Uploads directory tmp has correct permissions? ... skipped (no tmp uploads folder yet) Systemd unit files or init script exist? ... no Try fixing it: Install the Service For more information see: doc/install/installation.md in section "Install the Service" Please fix the error above and rerun the checks. Systemd unit files or init script up-to-date? ... can't check because of previous errors Projects have namespace: ... 2/1 ... yes 5/2 ... yes 5/3 ... yes 5/4 ... yes 170/108 ... yes 170/109 ... yes 170/110 ... yes 169/111 ... yes 175/112 ... yes 175/113 ... yes 178/114 ... yes 177/115 ... yes 177/116 ... yes 176/117 ... yes 176/118 ... yes 181/119 ... yes 180/120 ... yes 179/121 ... yes 180/122 ... yes 179/123 ... yes 180/124 ... yes 168/125 ... yes 168/126 ... yes 168/127 ... yes 168/128 ... yes 168/129 ... yes 168/130 ... yes 168/131 ... yes 168/132 ... yes 168/133 ... yes 168/134 ... yes 168/135 ... yes 168/136 ... yes 168/137 ... yes 168/138 ... yes 168/139 ... yes 168/140 ... yes 168/141 ... yes 260/142 ... yes 260/143 ... yes 260/144 ... yes 229/145 ... yes 260/146 ... yes 267/147 ... yes 267/148 ... yes 307/149 ... yes 307/150 ... yes 307/151 ... yes 224/152 ... yes 235/153 ... yes 299/154 ... yes 224/155 ... yes 294/156 ... yes 326/157 ... yes 326/158 ... yes 327/159 ... yes 276/160 ... yes 236/161 ... yes 313/162 ... yes 313/163 ... yes 278/164 ... yes 274/165 ... yes 274/166 ... 
yes 274/167 ... yes 236/168 ... yes 224/169 ... yes 248/170 ... yes 306/171 ... yes 331/172 ... yes 331/173 ... yes 331/174 ... yes 265/175 ... yes 266/176 ... yes 266/177 ... yes 262/178 ... yes 262/179 ... yes 262/180 ... yes 261/181 ... yes 261/182 ... yes 261/183 ... yes 261/184 ... yes 261/185 ... yes 224/186 ... yes 241/187 ... yes 246/188 ... yes 246/189 ... yes 246/190 ... yes 246/191 ... yes 281/192 ... yes 316/193 ... yes 316/194 ... yes 281/195 ... yes 316/196 ... yes 316/197 ... yes 239/198 ... yes 247/199 ... yes 247/200 ... yes 247/201 ... yes 247/202 ... yes 247/203 ... yes 247/204 ... yes 240/205 ... yes 240/206 ... yes 240/207 ... yes 248/208 ... yes 248/209 ... yes 310/210 ... yes 272/211 ... yes 272/212 ... yes 272/213 ... yes 310/214 ... yes 310/215 ... yes 310/216 ... yes 272/217 ... yes 272/218 ... yes 308/221 ... yes 268/222 ... yes 303/223 ... yes 303/224 ... yes 230/225 ... yes 230/226 ... yes 230/227 ... yes 262/228 ... yes 262/229 ... yes 262/230 ... yes 262/231 ... yes 262/232 ... yes 262/233 ... yes 262/234 ... yes 262/235 ... yes 262/236 ... yes 262/237 ... yes 262/238 ... yes 262/239 ... yes 262/240 ... yes 262/241 ... yes 262/242 ... yes 262/243 ... yes 262/244 ... yes 262/245 ... yes 262/246 ... yes 262/247 ... yes 262/248 ... yes 262/249 ... yes 262/250 ... yes 262/251 ... yes 262/252 ... yes 262/253 ... yes 230/254 ... yes 230/255 ... yes 230/256 ... yes 230/257 ... yes 230/258 ... yes 230/259 ... yes 230/260 ... yes 230/261 ... yes 230/262 ... yes 263/263 ... yes 230/264 ... yes 230/265 ... yes 230/266 ... yes 261/267 ... yes 261/268 ... yes 261/269 ... yes 261/270 ... yes 261/271 ... yes 261/272 ... yes 261/273 ... yes 261/274 ... yes 261/275 ... yes 261/276 ... yes 261/277 ... yes 261/278 ... yes 236/279 ... yes 264/280 ... yes 264/281 ... yes 264/282 ... yes 230/283 ... yes 264/284 ... yes 264/285 ... yes 264/286 ... yes 264/287 ... yes 264/288 ... yes 264/289 ... yes 264/290 ... yes 264/291 ... yes 264/292 ... yes 264/293 ... 
yes 264/294 ... yes 264/295 ... yes 264/296 ... yes 264/297 ... yes 264/298 ... yes 264/299 ... yes 264/300 ... yes 264/301 ... yes 264/302 ... yes 226/303 ... yes 226/304 ... yes 226/305 ... yes 226/306 ... yes 225/307 ... yes 255/308 ... yes 225/309 ... yes 225/310 ... yes 255/311 ... yes 255/312 ... yes 225/313 ... yes 254/314 ... yes 254/315 ... yes 254/316 ... yes 254/317 ... yes 225/318 ... yes 225/319 ... yes 280/320 ... yes 280/321 ... yes 280/322 ... yes 315/323 ... yes 315/324 ... yes 315/325 ... yes 315/326 ... yes 315/327 ... yes 258/328 ... yes 258/329 ... yes 227/330 ... yes 224/331 ... yes 276/332 ... yes 236/333 ... yes 278/334 ... yes 273/335 ... yes 271/336 ... yes 271/337 ... yes 275/338 ... yes 236/339 ... yes 252/340 ... yes 252/341 ... yes 252/342 ... yes 252/343 ... yes 259/344 ... yes 259/345 ... yes 228/346 ... yes 228/347 ... yes 228/348 ... yes 257/349 ... yes 257/350 ... yes 305/351 ... yes 299/352 ... yes 299/353 ... yes 299/354 ... yes 299/355 ... yes 329/356 ... yes 329/357 ... yes 329/358 ... yes 300/359 ... yes 300/360 ... yes 330/361 ... yes 330/362 ... yes 330/363 ... yes 330/364 ... yes 279/365 ... yes 314/366 ... yes 224/367 ... yes 227/368 ... yes 227/369 ... yes 227/370 ... yes 256/371 ... yes 256/372 ... yes 256/373 ... yes 276/374 ... yes 236/375 ... yes 250/376 ... yes 281/377 ... yes 277/378 ... yes 312/379 ... yes 332/380 ... yes 333/381 ... yes 273/382 ... yes 271/383 ... yes 270/384 ... yes 234/385 ... yes 234/386 ... yes 234/387 ... yes 270/388 ... yes 270/389 ... yes 269/390 ... yes 309/391 ... yes 286/392 ... yes 242/393 ... yes 242/394 ... yes 242/395 ... yes 242/396 ... yes 242/397 ... yes 242/398 ... yes 318/399 ... yes 317/400 ... yes 319/401 ... yes 285/402 ... yes 292/403 ... yes 242/404 ... yes 287/405 ... yes 293/406 ... yes 283/407 ... yes 283/408 ... yes 283/409 ... yes 284/410 ... yes 284/411 ... yes 284/412 ... yes 282/413 ... yes 282/414 ... yes 242/415 ... yes 242/416 ... yes 242/417 ... yes 291/418 ... 
yes 242/419 ... yes 325/420 ... yes 291/421 ... yes 291/422 ... yes 291/423 ... yes 324/424 ... yes 324/425 ... yes 324/426 ... yes 291/427 ... yes 291/428 ... yes 322/429 ... yes 321/430 ... yes 323/431 ... yes 320/432 ... yes 242/433 ... yes 288/434 ... yes 242/435 ... yes 242/436 ... yes 304/437 ... yes 304/438 ... yes 304/439 ... yes 311/440 ... yes 312/441 ... yes 311/442 ... yes 332/443 ... yes 332/444 ... yes 333/445 ... yes 333/446 ... yes 274/447 ... yes 250/449 ... yes 168/450 ... yes 250/451 ... yes 234/452 ... yes 168/453 ... yes 168/454 ... yes 276/455 ... yes 262/456 ... yes 168/457 ... yes 728/459 ... yes 728/460 ... yes 728/461 ... yes 278/462 ... yes 248/463 ... yes 312/464 ... yes 310/465 ... yes 310/466 ... yes 750/468 ... yes 242/469 ... yes 242/470 ... yes 758/471 ... yes 758/472 ... yes 264/473 ... yes 262/474 ... yes 765/475 ... yes 765/476 ... yes 262/477 ... yes
Redis version >= 6.0.0? ... yes
Ruby version >= 2.7.2 ? ... yes (2.7.7)
Git user has default SSH configuration? ... yes
Active users: ... 63
Is authorized keys file accessible? ... skipped (authorized keys not enabled)
GitLab configured to store new projects in hashed storage? ... yes
All projects are in hashed storage? ... yes
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished
</pre>
</details>

### Possible fixes