Preparing environment hungs - double time slowness in jobs
UPDATE! This seems to be runner isue as the timing problem doeasn't exist when reverted back to runner gitlab-runner:alpine-v14.1.0??
Summary
After gitlab upgrade helm chart to gitlab-5.4.5 (14.4.5), we observe double timing in job runs and especially in Preparing environment stage, in our kubernetes cluster Preparing environment stage sometimes takes as long as 4min and not always so hard to trace what is wrong.
Steps to reproduce
Upgrade to 5.4.5 with runner gitlab-runner:alpine-v14.4.2
Configuration used
image: docker-proxy.ourdomain.io/gitlab/gitlab-runner:alpine-v14.4.2
#image: gitlab/gitlab-runner:alpine-v13.4.1
gitlabUrl: https://gitlab.ourdomain.io/
runnerRegistrationToken: "redacted"
unregisterRunners: true
concurrent: 50
logLevel: debug
envVars:
- name: REGISTER_RUN_UNTAGGED
value: "true"
rbac:
create: true
imagePullSecrets: ["regsecret"]
## Configuration for the Pods that that the runner launches for each new job
##
runners:
config: |
[[runners]]
environment = ["REGISTER_RUN_UNTAGGED=true", "RUNNER_OUTPUT_LIMIT=40960", "KUBERNETES_POLL_TIMEOUT=480"]
[runners.kubernetes]
image = "docker-proxy.ourdomain.io/ubuntu:18.04"
privileged = true
image_pull_secrets = ["regsecret"]
poll_timeout = 600
cpu_limit = "2"
memory_limit = "7.5Gi"
helper_image = "docker-proxy.ourdomain.io/gitlab/gitlab-runner-helper:x86_64-v${CI_RUNNER_VERSION}"
[runners.kubernetes.node_selector]
"kops.k8s.io/instancegroup" = "spot"
[runners.kubernetes.pod_labels]
"podLabels" = "spot"
"team" = "shared"
"environment" = "devops"
"environment2" = "internal-tools"
"component_type" = "runner"
"service" = "gitlab-ci"
[runners.cache]
Type = "s3"
Path = "gitlab_runner"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
BucketName = "ourdomain-gitlab-cache"
BucketLocation = "eu-west-1"
Insecure = false
locked: false
tags: "k8s"
#
nodeSelector: {
kops.k8s.io/instancegroup: ondemand
}
Current behavior
Preparing environment hungs with error in runner log, jobs takes twice as long as before
gitlab-runner-test-gitlab-runner WARNING: Submitting job to coordinator... job failed code=403 job=1759493 job-status=canceled runner=xHvg9A1q update-interval=0s
Expected behavior
Preparing env should take much quicker
Versions
- Chart: 5.4.5 (14.4.5)
- Platform:
- Cloud: (AWS)
- Self-hosted: (Kops)
- Kubernetes: (
kubectl version)- Client: 1.20.12
- Server: 1.20.12
- Helm: (
helm version)- Client: 3.6.3
- Server: n/a
Relevant logs
runner debug log when hungs
Checking for jobs... received job=1759955 repo_url=https://gitlab.ourdomain.io/ahoy/zoovu-platform-backend.git runner=BdTZKDHs
Running with gitlab-runner 14.4.2 (50fc80a6) job=1759955 project=12 runner=BdTZKDHs
on gitlab-runner-test-gitlab-runner-57d464cb4b-ppdjm BdTZKDHs job=1759955 project=12 runner=BdTZKDHs
Preparing the "kubernetes" executor job=1759955 project=12 runner=BdTZKDHs
Regex allowing overrides for Namespace is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
Regex allowing overrides for ServiceAccount is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
Regex allowing overrides for BearerToken is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
Regex allowing overrides for PodAnnotations is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for CPURequest is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for MemoryRequest is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for EphemeralStorageRequest is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for CPULimit is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for MemoryLimit is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for EphemeralStorageLimit is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for ServiceCPURequest is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for ServiceMemoryRequest is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for ServiceEphemeralStorageRequest is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for ServiceCPULimit is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for ServiceMemoryLimit is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for ServiceEphemeralStorageLimit is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for HelperCPURequest is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for HelperMemoryRequest is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for HelperEphemeralStorageRequest is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for HelperCPULimit is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for HelperMemoryLimit is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
setting allowing overrides for HelperEphemeralStorageLimit is empty, disabling override. job=1759955 project=12 runner=BdTZKDHs
Using Kubernetes namespace: gitlab-ci job=1759955 project=12 runner=BdTZKDHs
Using Kubernetes executor with image docker.zoovu.io/st3-runner-image ... job=1759955 project=12 runner=BdTZKDHs
Using attach strategy to execute scripts... job=1759955 project=12 runner=BdTZKDHs
Using helper image: registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-50fc80a6 job=1759955 project=12 runner=BdTZKDHs
job=1759955 project=12 runner=BdTZKDHs
Waiting for signals... job=1759955 project=12 runner=BdTZKDHs
No referees configured job=1759955 project=12 runner=BdTZKDHs
Executing build stage build_stage=prepare_script job=1759955 project=12 runner=BdTZKDHs
Preparing environment job=1759955 project=12 runner=BdTZKDHs
Starting Kubernetes command with attach... job=1759955 project=12 runner=BdTZKDHs
Setting up secrets job=1759955 project=12 runner=BdTZKDHs
Setting up scripts config map job=1759955 project=12 runner=BdTZKDHs
Appending trace to coordinator... ok code=202 job=1759955 job-log=0-544 job-status=running runner=BdTZKDHs sent-log=0-543 status=202 Accepted update-interval=1m0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Submitting job to coordinator... ok code=200 job=1759955 job-status= runner=BdTZKDHs update-interval=0s
Setting up build pod job=1759955 project=12 runner=BdTZKDHs
Creating build pod job=1759955 project=12 runner=BdTZKDHs
Creating pod proxy services job=1759955 project=12 runner=BdTZKDHs
Starting in container "helper" the command ["gitlab-runner-build" "<<<" "/scripts-12-1759955/prepare_script" "2>&1 | tee -a /logs-12-1759955/output.log"] with script: #!/usr/bin/env bash
job=1759955 project=12 runner=BdTZKDHs
Remote process exited with the status: CommandExitCode: 0, Script: /scripts-12-1759955/prepare_script job=1759955 project=12 runner=BdTZKDHs
Container "helper" exited with error: <nil> job=1759955 project=12 runner=BdTZKDHs
Executing build stage build_stage=get_sources job=1759955 project=12 runner=BdTZKDHs
Getting source from Git repository job=1759955 project=12 runner=BdTZKDHs
Starting Kubernetes command with attach... job=1759955 project=12 runner=BdTZKDHs
Not sure if related to the issue:
occasional (twice per minuet or so )
gitaly *** /var/log/gitaly/gitaly.log *** │
│ gitaly {"level":"warning","msg":" transport: http2Server.HandleStreams failed to read frame: read unix /home/git/internal_11.sock-\u003e@: read: connection reset by peer","pid":11," │
│ gitaly {"level":"warning","msg":" transport: http2Server.HandleStreams failed to read frame: read unix /home/git/internal_11.sock-\u003e@: read: connection reset by peer","pid":11," │
│ gitaly {"level":"warning","msg":" transport: http2Server.HandleStreams failed to read frame: read unix /home/git/internal_11.sock-\u003e@: read: connection reset by peer","pid":11," │
│ gitaly {"level":"warning","msg":" transport: http2Server.HandleStreams failed to read frame: read tcp 100.102.113.106:8075-\u003e100.118.174.28:37760: read: connection reset by peer │
│ gitaly
Also observed intermittent secret and configmaps difficulties for runner with build to mount during startup
from runner events
4m32s Warning FailedMount pod/runner-u6p2qqau-project-97-concurrent-10jlbnz MountVolume.SetUp failed for volume "scripts" : failed to sync configmap cache: timed out waiting for the condition
2m29s Warning FailedMount pod/runner-u6p2qqau-project-97-concurrent-1xjktd MountVolume.SetUp failed for volume "default-token-2nw7s" : failed to sync secret cache: timed out waiting for the condition
interesting kube-api logs also intermittent
I0128 08:57:43.260860 1 node_authorizer.go:206] "NODE DENY" err="node 'ip-10-0-1-109.eu-west-1.compute.internal' cannot get unknown configmap gitlab-ci/runner-bt4oge9--project-12-concurrent-19-scripts4mtr8"
I0128 09:06:56.270020 1 node_authorizer.go:206] "NODE DENY" err="node 'ip-10-0-1-247.eu-west-1.compute.internal' cannot get unknown configmap gitlab-ci/runner-bt4oge9--project-74-concurrent-3-scripts6jbjn"
I0128 09:06:56.271295 1 node_authorizer.go:206] "NODE DENY" err="node 'ip-10-0-1-247.eu-west-1.compute.internal' cannot get secret gitlab-ci/regsecret, no relationship to this object was found in the node authorizer graph"
I0128 09:06:56.272043 1 node_authorizer.go:206] "NODE DENY" err="node 'ip-10-0-1-247.eu-west-1.compute.internal' cannot get secret gitlab-ci/default-token-2nw7s, no relationship to this object was found in the node authorizer graph"
More to follow if we find more info.This makes our life really hard Please help or advice.