GitLab runners frequently failing with: "There has been a runner system failure, please try again"

Errors on the gitlab-main runner machine:

no space left on device -> we increased the disk space, but disk usage has grown significantly over the past couple of days (almost 2 GB a day).

A log file seems to be the major contributor:

image.png
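For anyone who wants to reproduce the disk usage check without the screenshot, here is a minimal sketch that lists the largest files under a directory. The function name `largest_files` and the choice of `/var/log` in the usage note are assumptions for illustration, not something taken from our setup.

```python
import os


def largest_files(root, top=10):
    """Return up to `top` (size_in_bytes, path) tuples under `root`, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Skip symlinks so a linked file is not counted twice.
                if os.path.islink(path):
                    continue
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # The file vanished or is unreadable; ignore it.
                continue
    sizes.sort(reverse=True)
    return sizes[:top]
```

Running `largest_files("/var/log")` as root on the runner machine should show whether a single log file accounts for most of the growth.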

journalctl logs the following errors continuously:

Feb 07 09:15:46 ip-172-30-0-7.eu-west-1.compute.internal gitlab-runner[21002]: time="2025-02-07T09:15:46Z" level=warning msg="Requesting machine removal" lifetime=43.396151ms name=runner-gxpyubax-aws-buts-gitlab-prod-c5-1738667769-ff02fc3e now="2025-02-07 09:15:46.417759143 +0000 UTC m=+59565.252263582" reason="machine is unavailable" used=43.397394ms usedCount=1
Feb 07 09:15:46 ip-172-30-0-7.eu-west-1.compute.internal gitlab-runner[21002]: time="2025-02-07T09:15:46Z" level=error msg="Error getting migrated host: unexpected end of JSON input" name=runner-gxpyubax-aws-buts-gitlab-prod-c5-1738667786-a30e3905 operation=exists
Feb 07 09:15:46 ip-172-30-0-7.eu-west-1.compute.internal gitlab-runner[21002]: time="2025-02-07T09:15:46Z" level=warning msg="Skipping machine removal, because it doesn't exist" lifetime=25.451941ms name=runner-gxpyubax-aws-buts-gitlab-prod-c5-1738667786-a30e3905 reason="machine is unavailable" used="417.079µs" usedCount=1
Feb 07 09:15:46 ip-172-30-0-7.eu-west-1.compute.internal gitlab-runner[21002]: time="2025-02-07T09:15:46Z" level=info msg="Machine removed" lifetime=105.783852ms name=runner-gxpyubax-aws-buts-gitlab-prod-c5-1738667786-a30e3905 now="2025-02-07 09:15:46.42483131 +0000 UTC m=+59565.259335734" reason="machine is unavailable" retries=0 used=80.749222ms usedCount=1
Feb 07 09:15:46 ip-172-30-0-7.eu-west-1.compute.internal gitlab-runner[21002]: time="2025-02-07T09:15:46Z" level=warning msg="Requesting machine removal" lifetime=46.025605ms name=runner-gxpyubax-aws-buts-gitlab-prod-c5-1738663223-9e8d1c9b now="2025-02-07 09:15:46.43599489 +0000 UTC m=+59565.270499312" reason="machine is unavailable" used=46.026541ms usedCount=1
Feb 07 09:15:46 ip-172-30-0-7.eu-west-1.compute.internal gitlab-runner[21002]: time="2025-02-07T09:15:46Z" level=error msg="Error getting migrated host: unexpected end of JSON input" name=runner-gxpyubax-aws-buts-gitlab-prod-c5-1738667770-1023f83b operation=exists
Feb 07 09:15:46 ip-172-30-0-7.eu-west-1.compute.internal gitlab-runner[21002]: time="2025-02-07T09:15:46Z" level=warning msg="Skipping machine removal, because it doesn't exist" lifetime=27.674228ms name=runner-gxpyubax-aws-buts-gitlab-prod-c5-1738667770-1023f83b reason="machine is unavailable" used=2.321719ms usedCount=1
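To get a rough idea of which of these messages floods the log, a small script can tally entries by level and message. This is only a sketch: it assumes every runner line carries `key=value` pairs in the format shown above, and the helper names `parse_fields` and `count_messages` are made up for this example.

```python
import re
from collections import Counter

# A key=value pair whose value is either a double-quoted string or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Extract the key=value fields of one gitlab-runner log line into a dict."""
    fields = {}
    for key, value in PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        fields.setdefault(key, value)  # keep the first occurrence of a key
    return fields


def count_messages(lines):
    """Count log entries grouped by (level, msg); lines without msg are skipped."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if "msg" in fields:
            counts[(fields.get("level", "?"), fields["msg"])] += 1
    return counts
```

Feeding it the lines of a saved journal export (for example the output of `journalctl -u gitlab-runner`, assuming the unit is named `gitlab-runner`) and printing `count_messages(lines).most_common(5)` should show whether the "Error getting migrated host" entries dominate.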

Checking the machines on the main runner, we see a lot of machines in an error state, and those aren't available in the AWS console:

image.png
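One thing that might be worth checking is whether the machines in error state have an empty or truncated `config.json` on the runner host, since that could explain "unexpected end of JSON input". This is a guess, not a confirmed cause. The sketch below assumes docker-machine's default storage layout (one directory per machine containing a `config.json`, typically under `~/.docker/machine/machines`); the function name `find_broken_machines` is hypothetical.

```python
import json
from pathlib import Path


def find_broken_machines(machines_dir):
    """Return names of machine directories whose config.json is missing or not valid JSON."""
    broken = []
    for entry in sorted(Path(machines_dir).iterdir()):
        if not entry.is_dir():
            continue
        config = entry / "config.json"
        try:
            json.loads(config.read_text())
        except (OSError, ValueError):
            # OSError: file missing or unreadable; ValueError: empty or malformed JSON.
            broken.append(entry.name)
    return broken
```

For example, `find_broken_machines(Path.home() / ".docker/machine/machines")`, run as the user that owns the runner's machine store, would list the candidates to compare against the AWS console.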

GitLab version: self-managed 17.8.3

Runner version: 17.8.3

Edited by Dharani Vattamwar