Network issues on hosted runners on Linux Arm when using Docker-in-Docker

Summary

When running builds on hosted Linux Arm runners in combination with Docker-in-Docker (DinD), users may encounter network errors.

More details and examples here:

The same build works fine on amd64. It also shows no such issues on a self-hosted arm64 runner (hosted in GCP).

This isn't specific to the tools being installed; it was occurring previously, when we were building asdf-based tools:

2024-07-09

2024-05-10

Relevant logs and/or screenshots

 > [ 6/12] RUN curl https://sh.rustup.rs -sSf | sh -s -- -y     && . "$HOME/.cargo/env"     && cargo install mise     && eval "$(mise activate bash)"     && echo "# Added to support mise for GDK" >> ~/.bashrc     && echo ". "$HOME/.cargo/env"" >> ~/.bashrc     && echo "eval "$(mise activate bash)"" >> ~/.bashrc:
0.291 info: downloading installer
184.2 curl: (35) Recv failure: Connection reset by peer
184.2 rustup: command failed: downloader https://static.rust-lang.org/rustup/dist/aarch64-unknown-linux-gnu/rustup-init /tmp/tmp.bnztxHNkiU/rustup-init aarch64-unknown-linux-gnu

Steps to reproduce

Run a build of https://gitlab.com/gitlab-org/developer-relations/contributor-success/gdk-iab-container on an arm64 runner

What is the current bug behavior?

Connectivity issues when building Docker containers on the arm64 runners.

What is the expected correct behavior?

No connectivity issues encountered.

Identified Root Cause

Network connectivity issues occur because Google Cloud's network uses a smaller maximum packet size (MTU=1460) than the default for Docker container networks (MTU=1500). When running containers inside containers (Docker-in-Docker), this mismatch can cause network failures, especially in ARM64 environments, where the networking stack handles such mismatches differently. Unfortunately, as of now, this can only be handled client-side.

For more details on why this issue occurs, see #473739 (comment 2097585327).
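
To see the mismatch described above in a pipeline, a diagnostic job along these lines may help. This is only a sketch: the job name, the runner tag, and the interface name eth0 are assumptions, and it presumes the job can already reach the DinD daemon in the same way as the existing job (any variables that job needs are omitted here).

```yaml
# Hypothetical diagnostic job; the job name and runner tag are assumptions.
check mtu:
  image: "docker:${DOCKER_VERSION}"
  services:
    - name: "docker:${DOCKER_VERSION}-dind"
  tags:
    - saas-linux-medium-arm64   # assumed tag for a hosted arm64 runner
  script:
    # MTU of the job container's own interface (eth0 is an assumption)
    - cat /sys/class/net/eth0/mtu
    # Options of the default bridge network created by the DinD daemon,
    # which include its MTU setting
    - "docker network inspect bridge --format '{{json .Options}}'"
```

If the first value is lower than the MTU reported for the bridge network, nested containers can send packets larger than the outer network will carry.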

Workaround

For jobs that use Docker-in-Docker (either to build Docker images or to use them within the job script), the DinD service can be reconfigured by adding the --mtu=1400 flag to DinD's command line.

Considering the job is defined as:


docker-in-docker job:
  image: "docker:${DOCKER_VERSION}"
  services:
    - "docker:${DOCKER_VERSION}-dind"
  (...)

Changing it to the following prevents this issue:


docker-in-docker job:
  image: "docker:${DOCKER_VERSION}"
  services:
    - name: "docker:${DOCKER_VERSION}-dind"
      command: ["--mtu=1400"]
  (...)

See the CI/CD YAML syntax reference documentation for details on the syntax supported by the services section.
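
When several jobs use DinD, the reconfigured service can be defined once and reused. The following is a minimal sketch using a hidden job and extends. The names .dind-mtu, build arm64 image, and example-image, as well as the runner tag, are hypothetical, and DOCKER_VERSION is assumed to be defined as in the job above.

```yaml
# Hidden job (hypothetical name) holding the DinD service with the lower MTU.
.dind-mtu:
  image: "docker:${DOCKER_VERSION}"
  services:
    - name: "docker:${DOCKER_VERSION}-dind"
      command: ["--mtu=1400"]

# Hypothetical build job that inherits the image and service definition.
build arm64 image:
  extends: .dind-mtu
  tags:
    - saas-linux-medium-arm64   # assumed tag for a hosted arm64 runner
  script:
    # Optional check: the bridge options should now report an MTU of 1400
    - "docker network inspect bridge --format '{{json .Options}}'"
    - docker build -t example-image .
```

Keeping the flag in one place means every DinD job picks up the workaround, and it can be removed in a single edit once it is no longer needed.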

For customers using Buildah, check the suggested fix here.

This bug happens on GitLab.com

Edited by Gabriel Engel