binfmt.service not starting on green runners
ZD: https://gitlab.zendesk.com/agent/tickets/521744
It appears with gitlab-cookbooks/cookbook-wrapper-gitlab-runner!57 (merged) to address GPU flakiness in gitlab-com/gl-infra/production#17835 (closed), binfmt.service
is not starting properly for some reason:
runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd ~ # systemctl status binfmt.service
● binfmt.service
Loaded: loaded (/etc/systemd/system/binfmt.service; enabled; vendor preset: disabled)
Active: failed (Result: signal) since Mon 2024-04-22 14:59:38 UTC; 49min ago
Main PID: 1041 (code=killed, signal=TERM)
CPU: 61ms
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: Starting binfmt.service...
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: binfmt.service: Main process exited, code=killed, status=15/TERM
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: binfmt.service: Failed with result 'signal'.
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: Stopped binfmt.service.
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: binfmt.service: Consumed 61ms CPU time
I wonder if Docker Machine is restarting docker
while binfmt.service
is trying to run because running systemctl start binfmt.service
seems to work fine now:
runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd ~ # systemctl status binfmt.service
● binfmt.service
Loaded: loaded (/etc/systemd/system/binfmt.service; enabled; vendor preset: disabled)
Active: active (exited) since Mon 2024-04-22 15:52:33 UTC; 1s ago
Process: 13367 ExecStart=/usr/bin/docker run --rm --privileged linuxkit/binfmt:v0.8 (code=exited, status=0/SUCCESS)
Main PID: 13367 (code=exited, status=0/SUCCESS)
CPU: 52ms
Apr 22 15:52:33 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: Starting binfmt.service...
Apr 22 15:52:33 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: Started binfmt.service.
The Docker logs do suggest binfmt.service
was terminated:
# sudo journalctl -u docker.service
-- Logs begin at Mon 2024-04-22 14:59:19 UTC, end at Mon 2024-04-22 15:50:46 UTC. --
Apr 22 14:59:28 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: Starting Docker Application Container Engine...
Apr 22 14:59:32 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:32.597517155Z" level=info msg="Starting up"
Apr 22 14:59:32 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:32.797712005Z" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 22 14:59:32 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:32.797764516Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 22 14:59:32 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:32.797796716Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}"
module=grpc
Apr 22 14:59:32 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:32.797830946Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 22 14:59:32 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:32.841917802Z" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 22 14:59:32 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:32.841968463Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 22 14:59:32 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:32.841997653Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/containerd/containerd.sock 0 <nil>}] <nil>}"
module=grpc
Apr 22 14:59:32 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:32.842026003Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 22 14:59:34 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:34.334925398Z" level=warning msg="Your kernel does not support cgroup blkio weight"
Apr 22 14:59:34 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:34.334958508Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Apr 22 14:59:34 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:34.335167492Z" level=info msg="Loading containers: start."
Apr 22 14:59:35 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:35.449673118Z" level=error msg="Could not add route to IPv6 network fc00::1/7 via device docker0"
Apr 22 14:59:35 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:35.584003813Z" level=info msg="Removing stale sandbox 31c6a4c9e7157e45febc314a0d987e71c45828de9f98e3e400c47378cfcc0111 (8c1dbc2220947a92
747c86c106130a5791a3ab7fa6750c60d3ee597c5636e87e)"
Apr 22 14:59:35 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:35.602657423Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 1470d00a7bed730bd5
b85cd61bbdf0f656f444830454a1cfa4be113e40b6edc0 41697d6f66562e3b4e0906d3c3333a240dc4dcc2c420faf58b3b6c7f9ff0ad63], retrying...."
Apr 22 14:59:35 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:35.688189716Z" level=info msg="Removing stale sandbox 5c034e67a0716bfa30d15ca35dc1ff03c1314509e1f2ebc06d03f462f01d3284 (17e19eb0ac0b6429
2ca77913c9c80cf0433960f1c25cd3e35b6326b04002c6ca)"
Apr 22 14:59:35 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:35.690123310Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 81ee74cdd69cd1012b
235ebe718f840cdfeeef20021ad6e6867ae691138ade23 ce6a477ef16cdc9497a69bca457bfe905fa6477bd28fd18e77781ee7019bb777], retrying...."
Apr 22 14:59:35 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:35.787350869Z" level=info msg="Removing stale sandbox cfd8bdb510f06825b742d45eae0a658ab8310b756683291306c3a500926326df (fa185f65bc99292d
64371bc0fb7686f856397afbf3dcbdfb9b73099ec9f4b5e2)"
Apr 22 14:59:35 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:35.802076159Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint 81ee74cdd69cd1012b
235ebe718f840cdfeeef20021ad6e6867ae691138ade23 bed6e713b8d9687d2be824208095a224800482b19872a6b1c574f78953d8c1e2], retrying...."
Apr 22 14:59:35 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:35.862988466Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to
set a preferred IP address"
Apr 22 14:59:37 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:37.584542927Z" level=info msg="Loading containers: done."
Apr 22 14:59:37 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:37.903746631Z" level=info msg="Docker daemon" commit=99e3ed8 graphdriver(s)=overlay2 version=19.03.15
Apr 22 14:59:37 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:37.943255890Z" level=info msg="Daemon has completed initialization"
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:38.086549333Z" level=info msg="API listen on /var/run/docker.sock"
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: Started Docker Application Container Engine.
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:38.463186783Z" level=info msg="Processing signal 'terminated'"
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[415]: time="2024-04-22T14:59:38.464030088Z" level=info msg="Daemon shutdown complete"
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: Stopping Docker Application Container Engine...
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd echo[1085]: docker daemon exited
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: Stopped Docker Application Container Engine.
Apr 22 14:59:38 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: docker.service: Consumed 1.184s CPU time
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: Starting Docker Application Container Engine...
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.681203130Z" level=info msg="Starting up"
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.684317875Z" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.684343535Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.684364526Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 <nil>}] <nil>}" mo
dule=grpc
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.684379526Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.685598558Z" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.685620758Z" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.685648558Z" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock 0 <nil>}] <nil>}" mo
dule=grpc
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.685663929Z" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.911049834Z" level=warning msg="Your kernel does not support cgroup blkio weight"
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.911347639Z" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Apr 22 14:59:39 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:39.911559673Z" level=info msg="Loading containers: start."
Apr 22 14:59:40 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:40.013801161Z" level=error msg="Could not add route to IPv6 network fc00::1/7 via device docker0"
Apr 22 14:59:40 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:40.036463601Z" level=info msg="There are old running containers, the network config will not take affect"
Apr 22 14:59:40 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:40.088494501Z" level=info msg="Loading containers: done."
Apr 22 14:59:40 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:40.186381432Z" level=info msg="Docker daemon" commit=99e3ed8 graphdriver(s)=overlay2 version=19.03.15
Apr 22 14:59:40 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:40.186456824Z" level=info msg="Daemon has completed initialization"
Apr 22 14:59:40 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:40.220255351Z" level=info msg="API listen on [::]:2376"
Apr 22 14:59:40 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:40.220415004Z" level=info msg="API listen on /var/run/docker.sock"
Apr 22 14:59:40 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd systemd[1]: Started Docker Application Container Engine.
Apr 22 14:59:48 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:48.368189105Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 14:59:49 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T14:59:49.151208180Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:00:48 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:00:48.293015033Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:00:48 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:00:48.299026258Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:00:48 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:00:48.562294957Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:00:48 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:00:48.788795770Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:00:48 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:00:48.809763260Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:01:19 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:01:19.682428256Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:01:35 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:01:35.923271337Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:01:44 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:01:44.172871951Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:01:45 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:01:45.389443807Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:01:48 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:01:48.936264822Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:01:52 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:01:52.622338098Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Apr 22 15:02:11 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:02:11.343542980Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:02:16 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:02:16.343384829Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:02:26 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:02:26.347015080Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:02:46 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:02:46.359097784Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:07:54 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:07:54.052840911Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:07:59 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:07:59.054056263Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:08:09 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:08:09.060703779Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:08:29 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:08:29.073451935Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:45:29 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:45:29.572906427Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:45:34 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:45:34.576374156Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:45:44 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:45:44.584704724Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"
Apr 22 15:46:04 runner-rgetuxo-s-l-m-amd64-1713797942-2f4fb2bd dockerd[1217]: time="2024-04-22T15:46:04.597761889Z" level=error msg="[resolver] error writing resolver resp, dns: bad rdata"