CI pipelines time out unexpectedly with the trace freezing
Summary
When running a multi-stage pipeline, we stop receiving trace updates at an arbitrary point in the pipeline, and the job eventually times out.
Inspecting the frozen container shows that the job did in fact succeed, with all of its artifacts present. The job nevertheless stays frozen, with no further log output, until it times out and fails.
Very similar to this issue: #3299 (closed)
Per the suggestion in the above issue, we also tried running our command directly with:
kubectl exec -our command-
When run this way, we have not encountered the same freezing issue.
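For anyone who wants to repeat the direct kubectl exec check in a loop, here is a minimal sketch of how it could be scripted. The function names, the default timeout, and the example command are hypothetical, not part of our setup. The container name "build" is an assumption about how the Kubernetes executor names the job container.

```python
import subprocess


def build_exec_command(pod, command, namespace="default", container=None):
    """Build the argv for running `command` inside `pod` via kubectl exec."""
    argv = ["kubectl", "exec", "--namespace", namespace, pod]
    if container is not None:
        argv += ["--container", container]
    # "--" separates kubectl's own flags from the command run in the pod.
    return argv + ["--"] + list(command)


def run_in_pod(pod, command, timeout_s=3600, **kwargs):
    """Run the command and report whether it finished before the timeout.

    timeout_s is a hypothetical value; set it to match the job timeout.
    """
    argv = build_exec_command(pod, command, **kwargs)
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The command never returned: this is the "frozen" case.
        return {"finished": False, "returncode": None}
    return {"finished": True, "returncode": result.returncode}
```

A call might look like `run_in_pod("runner-190af49d-project-5977043-concurrent-2bpzmc", ["bash", "-c", "./build.sh"], container="build")`, where `./build.sh` is a placeholder for the real job command. Running it repeatedly and counting the `finished: False` results would give a rough comparison against the freeze rate seen through the runner.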
Steps to reproduce
The issue seems to reproduce at random. A job can freeze at any given point in our pipeline (roughly 10% to 25% of the time).
Actual behavior
Roughly 10% to 25% of the time, a job fails due to a timeout after its log on the executor simply stops updating.
Expected behavior
All jobs should either pass or fail, but never freeze.
Relevant logs and/or screenshots
We see nothing out of the ordinary in our executor logs.
Our build log looks like this (this is the frozen log; normally it continues on to enter other folders):
Running with gitlab-runner 11.4.0 (8af42251)
on gke-code-compiler-test-gitlab-runner-6fdc565bdd-llgvw 190af49d
Using Kubernetes namespace: default
Using Kubernetes executor with image master:latest ...
Waiting for pod default/runner-190af49d-project-5977043-concurrent-2bpzmc to be running, status is Pending
Waiting for pod default/runner-190af49d-project-5977043-concurrent-2bpzmc to be running, status is Pending
Running on runner-190af49d-project-5977043-concurrent-2bpzmc via gke-code-compiler-test-gitlab-runner-6fdc565bdd-llgvw...
Skipping Git repository setup
Skipping Git checkout
Skipping Git submodules setup
$ # sourcing helper functions # collapsed multi-line command
$ touch /output.txt
$ which ssh-agent || ( apt-get update -y && apt-get install openssh-client -y )
Get:1 http://packages.cloud.google.com/apt cloud-sdk-xenial InRelease [6372 B]
Get:2 http://packages.ros.org/ros/ubuntu xenial InRelease [4040 B]
Get:3 http://packages.cloud.google.com/apt cloud-sdk-xenial/main amd64 Packages [57.1 kB]
Get:4 https://download.docker.com/linux/ubuntu xenial InRelease [66.2 kB]
Get:5 http://packages.ros.org/ros/ubuntu xenial/main amd64 Packages [706 kB]
Get:6 http://security.ubuntu.com/ubuntu xenial-security InRelease [107 kB]
Get:7 http://archive.ubuntu.com/ubuntu xenial InRelease [247 kB]
Get:8 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages [5299 B]
Get:9 https://packagecloud.io/github/git-lfs/ubuntu xenial InRelease [23.2 kB]
Get:10 https://packagecloud.io/github/git-lfs/ubuntu xenial/main amd64 Packages [7044 B]
Get:11 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages [751 kB]
Get:12 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Get:13 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
Get:14 http://archive.ubuntu.com/ubuntu xenial/main amd64 Packages [1558 kB]
Get:15 http://security.ubuntu.com/ubuntu xenial-security/restricted amd64 Packages [12.7 kB]
Get:16 http://security.ubuntu.com/ubuntu xenial-security/universe amd64 Packages [511 kB]
Get:17 http://security.ubuntu.com/ubuntu xenial-security/multiverse amd64 Packages [4026 B]
Get:18 http://archive.ubuntu.com/ubuntu xenial/restricted amd64 Packages [14.1 kB]
Get:19 http://archive.ubuntu.com/ubuntu xenial/universe amd64 Packages [9827 kB]
Get:20 http://archive.ubuntu.com/ubuntu xenial/multiverse amd64 Packages [176 kB]
Get:21 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [1153 kB]
Get:22 http://archive.ubuntu.com/ubuntu xenial-updates/restricted amd64 Packages [13.1 kB]
Get:23 http://archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [914 kB]
Get:24 http://archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [19.0 kB]
Get:25 http://archive.ubuntu.com/ubuntu xenial-backports/main amd64 Packages [7959 B]
Get:26 http://archive.ubuntu.com/ubuntu xenial-backports/universe amd64 Packages [8532 B]
Fetched 16.4 MB in 3s (4999 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
The following additional packages will be installed:
libxmuu1 xauth
Suggested packages:
ssh-askpass libpam-ssh keychain monkeysphere
The following NEW packages will be installed:
libxmuu1 openssh-client xauth
0 upgraded, 3 newly installed, 0 to remove and 61 not upgraded.
Need to get 617 kB of archives.
After this operation, 3902 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu xenial/main amd64 libxmuu1 amd64 2:1.1.2-2 [9674 B]
Get:2 http://archive.ubuntu.com/ubuntu xenial-updates/main amd64 openssh-client amd64 1:7.2p2-4ubuntu2.6 [584 kB]
Get:3 http://archive.ubuntu.com/ubuntu xenial/main amd64 xauth amd64 1:1.0.9-1ubuntu2 [22.7 kB]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Fetched 617 kB in 1s (513 kB/s)
Selecting previously unselected package libxmuu1:amd64.
(Reading database ...
(Reading database ... 5%
(Reading database ... 10%
(Reading database ... 15%
(Reading database ... 20%
(Reading database ... 25%
(Reading database ... 30%
(Reading database ... 35%
(Reading database ... 40%
(Reading database ... 45%
(Reading database ... 50%
(Reading database ... 55%
(Reading database ... 60%
(Reading database ... 65%
(Reading database ... 70%
(Reading database ... 75%
(Reading database ... 80%
(Reading database ... 85%
(Reading database ... 90%
(Reading database ... 95%
(Reading database ... 100%
(Reading database ... 73860 files and directories currently installed.)
Preparing to unpack .../libxmuu1_2%3a1.1.2-2_amd64.deb ...
Unpacking libxmuu1:amd64 (2:1.1.2-2) ...
Selecting previously unselected package openssh-client.
Preparing to unpack .../openssh-client_1%3a7.2p2-4ubuntu2.6_amd64.deb ...
Unpacking openssh-client (1:7.2p2-4ubuntu2.6) ...
Selecting previously unselected package xauth.
Preparing to unpack .../xauth_1%3a1.0.9-1ubuntu2_amd64.deb ...
Unpacking xauth (1:1.0.9-1ubuntu2) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
Setting up libxmuu1:amd64 (2:1.1.2-2) ...
Setting up openssh-client (1:7.2p2-4ubuntu2.6) ...
Setting up xauth (1:1.0.9-1ubuntu2) ...
Processing triggers for libc-bin (2.23-0ubuntu10) ...
$ eval "$(ssh-agent -s)"
Agent pid 472
$ mkdir -p "$SSH_DIR"
$ chmod 700 "$SSH_DIR"
$ ssh-keyscan -t rsa gitlab.com >> "$SSH_DIR"/known_hosts
# gitlab.com:22 SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.6
$ ssh-keyscan -t rsa github.com >> "$SSH_DIR"/known_hosts
# github.com:22 SSH-2.0-babeld-f43b814b
$ echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add - > /dev/null
Identity added: (stdin) ((stdin))
$ REPO_DIR=/mnt/disks/ssd0/code
$ echo "$GCLOUD_JSON_KEY" > /bespin.json
$ mkdir -p /mnt/disks/ssd0 && cd /mnt/disks/ssd0
$ n=0 # collapsed multi-line command
Warning: Permanently added the RSA host key for IP address '35.231.145.151' to the list of known hosts.
Entering 'android'
...
In addition, on the Kubernetes instance where the executor/runners are hosted, we see this in the kubelet logs:
exec.go:71 error executing command in container: EOF
And in the Docker logs of the runner node within our cluster:
level=error msg="attach: stdout: write unix /var/run/docker.sock->@: write: broken pipe"
level=error msg="Error running exec in container: exec attach failed with error: write unix /var/run/docker.sock->@: write: broken pipe"
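To correlate frozen jobs with these errors, the kubelet and Docker logs can be scanned for the two messages above. The following is only a sketch: the signature names and the function are hypothetical, and the patterns are copied from the lines quoted in this report, so other versions may word them differently.

```python
import re

# Patterns taken verbatim from the log lines quoted in this report.
SIGNATURES = {
    "kubelet_exec_eof": re.compile(
        r"error executing command in container: EOF"
    ),
    "docker_broken_pipe": re.compile(
        r"write unix /var/run/docker\.sock->@: write: broken pipe"
    ),
}


def count_signatures(lines):
    """Count how many log lines match each known error signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: pipe a log file in on stdin.
    print(count_signatures(sys.stdin))
```

Comparing the timestamps of matching lines with the time at which a job's trace stopped updating would show whether the broken pipe coincides with the freeze.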
Environment description
We are using a custom Runner installation; that is, our own build container is spawned by a GitLab Kubernetes executor.
We are using gitlab/gitlab-runner:alpine-v10.3.0. We have also tested this with gitlab/gitlab-runner:alpine-v11.4.0 and got the same results.
We are hosting our executor/runners on GCP with Kubernetes version 1.10.6-gke.9.
Used GitLab Runner version
Version: 11.4.0
Git revision: 8af42251
Git branch: 11-4-stable
GO version: go1.8.7
Built: 2018-10-22T10:06:36+0000
OS/Arch: linux/amd64