Jobs timing out for `buildx` cluster via KAS
As described in !1425 (comment 1450183697) and !1368 (comment 1448195706), whenever we use a `buildx` Kubernetes cluster connection via KAS, jobs time out.
Potentially relevant log entry from the `buildx` pods:
time="2023-06-29T17:37:00Z" level=error msg="/moby.buildkit.v1.Control/Solve returned error: rpc error: code = Canceled desc = context canceled"
This seems to indicate a connection timeout. It may be caused by KAS terminating the connection with either the Kubernetes cluster or the client.
`gitlab-agent` log entries that correlate in time:
gitlab-agent {"level":"error","time":"2023-06-29T20:46:54.992Z","msg":"Error handling a connection","mod_name":"reverse_tunnel","error":"rpc error: code = Unavailable desc = closing transport due to: connection error: desc = \"error reading from server: EOF\", received prior goaway: code: NO_ERROR","agent_id":62272}
It is uncertain whether the `gitlab-agent` messages are actually related to the observed failures.
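One way to test whether the two sets of messages are related is to compare their timestamps mechanically. The following is a minimal sketch, not part of any GitLab or BuildKit tooling: the helper names (`buildkit_time`, `agent_time`, `correlated`) and the 60-second window are assumptions chosen for illustration, and the sample lines are shortened copies of the entries quoted above.

```python
import json
import re
from datetime import datetime, timedelta


def parse_ts(value):
    # Both log formats use UTC timestamps with a trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def buildkit_time(line):
    # logfmt-style line, e.g. time="2023-06-29T17:37:00Z" level=error ...
    match = re.search(r'time="([^"]+)"', line)
    return parse_ts(match.group(1)) if match else None


def agent_time(line):
    # JSON line, optionally prefixed with the container name.
    start = line.find("{")
    if start < 0:
        return None
    return parse_ts(json.loads(line[start:])["time"])


def correlated(a, b, window=timedelta(seconds=60)):
    # Hypothetical threshold: treat events within `window` as related.
    return abs(a - b) <= window


if __name__ == "__main__":
    buildx_line = 'time="2023-06-29T17:37:00Z" level=error msg="..."'
    agent_line = (
        'gitlab-agent {"level":"error","time":"2023-06-29T20:46:54.992Z",'
        '"msg":"Error handling a connection"}'
    )
    t_buildx = buildkit_time(buildx_line)
    t_agent = agent_time(agent_line)
    print("gap:", t_agent - t_buildx)
    print("correlated:", correlated(t_buildx, t_agent))
```

Run against full pod logs from a single failing job, a check like this would show whether each `Solve` cancellation has a `reverse_tunnel` error close to it. The two sample entries quoted above are roughly three hours apart, so those particular lines likely come from different occurrences.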
Edited by Dmytro Makovey