Changing autoscaler policy causes runners to stop accepting jobs

Summary

A GitLab Premium customer reported (Internal ZD Ticket) that their GitLab Runners stopped accepting new jobs. On closer observation, the runner was reporting as offline to GitLab but still appeared to finish the jobs it was already running. Support identified that this began shortly after an autoscaler policy change. We've found that changing the autoscaler policy causes runners to fail unexpectedly, which prevents new jobs from being picked up.

GitLab Runner appears to be stuck in a loop: it neither picks up new jobs nor adjusts to the policy change. The following error was found in the collected runner trace logs:

executor: reserving taskscaler capacity: no capacity: no immediately available capacity

The customer pointed out that this error originates from this section of the code. This suggests that capacity isn't being handled correctly.

Steps to reproduce

The customer reported that they had two runner policies that switch between weekends and weekdays. During the switch from the weekend policy to the weekday policy, they started experiencing the behavior described above.

The weekend policy has idle_count=0 and the weekday policy has idle_count=6.

Actual behavior

When the autoscaler policy changes, GitLab Runner stops accepting new jobs.

Expected behavior

GitLab Runner continues to operate normally and accept new jobs regardless of policy changes.

Relevant logs and/or screenshots

12705 19:36:32.921398 write(2<UNIX-STREAM:[64307->64310]>, "\33[37;1mFailed to process runner                          \33[0;m  \33[37;1mbuilds\33[0;m=0 \33[37;1merror\33[0;m=failed to update executor: reserving taskscaler capacity: no capacity: no immediately available capacity \33[37;1mexecutor\33[0;m=docker-autoscaler \33[37;1mmax_builds\33[0;m=120 \33[37;1mrunner\33[0;m=jHtzMyjDk\n", 303 <unfinished ...>
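The log line above appears to be strace output of the runner writing a colorized log entry to file descriptor 2 (stderr), so the ANSI color codes show up as `\33[...m` escapes. The Go sketch below is a convenience helper, not part of GitLab Runner, that strips those codes and collapses the padding so the key=value fields are easier to read; `stripColors` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// sgr matches ANSI "Select Graphic Rendition" sequences such as
// ESC[37;1m (bold white) and ESC[0;m (reset), which strace renders
// as \33[37;1m and \33[0;m.
var sgr = regexp.MustCompile(`\x1b\[[0-9;]*m`)

// stripColors removes color codes and collapses runs of whitespace
// (including the trailing newline) into single spaces. Note that this
// would also collapse repeated spaces inside a field value.
func stripColors(s string) string {
	return strings.Join(strings.Fields(sgr.ReplaceAllString(s, "")), " ")
}

func main() {
	// Excerpt of the logged entry, using the real ESC byte (\x1b) in
	// place of strace's octal rendering \33.
	entry := "\x1b[37;1mFailed to process runner          \x1b[0;m  " +
		"\x1b[37;1mbuilds\x1b[0;m=0 " +
		"\x1b[37;1mexecutor\x1b[0;m=docker-autoscaler " +
		"\x1b[37;1mmax_builds\x1b[0;m=120\n"
	fmt.Println(stripColors(entry))
}
```

Running it prints `Failed to process runner builds=0 executor=docker-autoscaler max_builds=120`. Applied to the full entry, it shows that the runner reports builds=0 (against max_builds=120) at the moment the capacity reservation fails.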

Environment description

GitLab: Self-managed Premium 16.7.4 (Omnibus)

Executor: Docker Autoscaler on AWS

Used GitLab Runner version

Runner: 17.5.2

Version: 17.5.2

Git revision: c6eae8d7

Git branch: 17-5-stable

GO version: go1.22.7

Possible Fixes

The customer highlighted these two sections of the code:

https://gitlab.com/gitlab-org/gitlab-runner/-/blob/main/executors/internal/autoscaler/provider.go?ref_type=heads#L292

https://gitlab.com/gitlab-org/fleeting/taskscaler/-/blob/main/taskscaler.go#L448

Implementation

  • {placeholder for implementation plan}
Edited by Darren Eastman