docker-autoscaler and docker+machine fail to fetch incrementally
Summary
Since gitlab-runner 17.10, we seem to hit consistent failures with the fetch Git strategy. 17.9.0 works. I have seen this issue on every version from 17.9.3 through 18.1; I haven't tested 17.9.1 or 17.9.2.
fatal: missing blob object 'XXXX'
error: remote did not send all necessary objects
The issue persists with the retries feature flag enabled, and it occurs across two different projects.
The error signature is similar to #28945 (closed), but this issue emerged in a different release timeframe.
Steps to reproduce
The closest I have come to concrete reproduction steps is:
- Have a runner node fetch the merged-result SHA of a merged results MR (this succeeds).
- When the same node then fetches the merged-result SHA of a second merged results MR, it hits the fatal error above.
- This may happen specifically when the two MRs have different CI_MERGE_REQUEST_TARGET_BRANCH_SHA values, but I can't confirm this despite my best efforts.
Actual behavior
GitLab Runner intermittently fails to fetch.
Expected behavior
GitLab Runner fetches all necessary objects successfully.
Relevant logs and/or screenshots
Job log:
fatal: missing blob object 'XXXX'
error: remote did not send all necessary objects
Environment description
Custom installation for my organization (details available via DM). We're a large GitLab Ultimate customer.
Executors: docker+machine (latest) and docker-autoscaler, runner versions 17.10 through 18.1.1
Docker: 25.0.8-1.amzn2023.0.4
The Kubernetes executor on 17.10 does not exhibit this issue.
config.toml contents:
[[runners]]
name = "autoscale-1c"
limit = 300
url = "https://gitlab.XX.com"
id = 42825
token = "XXXX"
token_obtained_at =
token_expires_at =
executor = "docker-autoscaler"
[runners.cache]
Type = "s3"
Shared = true
MaxUploadedArchiveSize = 0
[runners.cache.s3]
ServerAddress = "XX"
AccessKey = "XX"
SecretKey = "XX"
BucketName = "XX"
BucketLocation = "XX"
[runners.feature_flags]
FF_USE_FLEETING_ACQUIRE_HEARTBEATS = true
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = false
disable_entrypoint_overwrite = false
cap_add = ["SYS_ADMIN"]
oom_kill_disable = false
disable_cache = false
volumes = ["/builds:/some-other-dir"]
pull_policy = ["if-not-present"]
shm_size = 0
network_mtu = 0
[runners.docker.ulimit]
nofile = "2500"
[runners.autoscaler]
capacity_per_instance = 1
max_use_count = 50
max_instances = 300
plugin = "aws:latest"
update_interval = "10s"
update_interval_when_expecting = "0s"
[runners.autoscaler.plugin_config]
config_file = "/home/XX/.aws/config"
credentials_file = "/home/XX/.aws/credentials"
name = "XX"
profile = "default"
[runners.autoscaler.connector_config]
protocol_port = 22
username = "ec2-user"
keepalive = "0s"
timeout = "0s"
use_external_addr = true
[[runners.autoscaler.policy]]
idle_count = 0
idle_time = "30s"
scale_factor = 0.0
scale_factor_limit = 0
[runners.autoscaler.state_storage]
enabled = true
Used GitLab Runner version
Version: 18.1.1
Git revision: 2b813ade
Git branch: 18-1-stable
GO version: go1.24.4 X:cacheprog
Built: 2025-06-26T16:25:31Z
OS/Arch: linux/amd64
Possible fixes
Downgrading to 17.9.0 has worked around the issue.