0 of 1 updated replicas available - CrashLoopBackOff [AutoDevOps + GKE @ stage:production]
Summary
The pipeline job for the production stage fails with:
Waiting for deployment "production" rollout to finish: 0 of 1 updated replicas are available...
error: deployment "production" exceeded its progress deadline
ERROR: Job failed: command terminated with exit code 1
GKE reports deployment error: 0 of 1 updated replicas available - CrashLoopBackOff
Steps to reproduce
- Connect to an existing GKE cluster, which was created as follows:
$ gcloud config set project [PROJECT_ID]
$ gcloud config set compute/zone europe-west4-a
$ export PROJECT_ID="$(gcloud config get-value project -q)"
$ gcloud container clusters create hello-cluster --num-nodes=3
$ kubectl create serviceaccount gitlab
$ kubectl create clusterrolebinding gitlab-cluster-admin --clusterrole=cluster-admin --serviceaccount=default:gitlab
$ kubectl get secrets
$ kubectl -o json get secret gitlab-token-xxxxx | jq -r '.data."ca.crt"' | base64 -d - | tee ca.crt
$ kubectl -o json get secret gitlab-token-xxxxx | jq -r '.data."token"' | base64 -d - | tee token.crt
- Configure the API endpoint, CA certificate, token and wildcard CI domain in GitLab (see the sketch below)
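The API server endpoint can be read from the cluster info, and the extracted CA and token can be sanity-checked against it (a minimal sketch; <API_ENDPOINT> is a placeholder for the URL printed by the first command):
$ kubectl cluster-info | grep 'Kubernetes master'
$ kubectl --server=<API_ENDPOINT> --certificate-authority=ca.crt --token="$(cat token.crt)" get nodes   # should list the 3 nodes if the credentials are valid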
- Install Helm Tiller, Ingress, cert-manager and GitLab Runner; update your DNS with the Ingress IP
- Check that the wildcard DNS record *.ci.example.com resolves to your Ingress IP:
$ dig @1.1.1.1 xxx.ci.example.com
It should point to your Ingress IP
- Enable Auto DevOps
- Use the build, test and production stages
- In the test stage, change:
/bin/herokuish buildpack test
to:
/bin/herokuish version
(see the override sketch after this item); otherwise the test job fails with:
$ /bin/herokuish buildpack test
-----> Unable to select a buildpack
ERROR: Job failed: exit code 1
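One way to apply that change (a minimal sketch, assuming Auto DevOps is customised by including the bundled template in a project-level .gitlab-ci.yml; adjust if your setup differs):
$ cat > .gitlab-ci.yml <<'EOF'
include:
  - template: Auto-DevOps.gitlab-ci.yml

test:
  script:
    - /bin/herokuish version   # replaces the default buildpack test, which fails here
EOF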
- Commit some code and merge the dev branch into master
- Wait for the production stage
- The deploy function fails with:
Waiting for deployment "production" rollout to finish: 0 of 1 updated replicas are available...
error: deployment "production" exceeded its progress deadline
ERROR: Job failed: command terminated with exit code 1
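The same rollout wait can be reproduced outside CI (namespace name taken from the pod listing under "Relevant logs" below; diagnostic only):
$ kubectl rollout status deployment/production --namespace=flask-gke-gitlab-10808718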
Example Project
What is the current bug behaviour?
- The pipeline job for the production stage fails
- The pod restarts repeatedly (CrashLoopBackOff)
What is the expected correct behaviour?
The pipeline job for the production stage succeeds.
Relevant logs and/or screenshots
$ kubectl get pods --namespace=flask-gke-gitlab-10808718
NAME READY STATUS RESTARTS AGE
production-d54b865cf-w84dt 0/1 CrashLoopBackOff 19 53m
production-postgres-5b5cf56747-vmlnf 1/1 Running 0 53m
$ kubectl logs production-d54b865cf-w84dt --namespace=flask-gke-gitlab-10808718
2019/02/16 21:22:00 Server listening on port 8080
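The container log shows the app listening on port 8080; if the probe port configured by the auto-deploy-app chart differs (the chart defaults to 5000, worth verifying), failing probes would explain both the restarts (liveness) and the 0 available replicas (readiness). Both can be checked with (diagnostic sketch):
$ kubectl describe pod production-d54b865cf-w84dt --namespace=flask-gke-gitlab-10808718   # Events section shows probe failures, if any
$ kubectl get deployment production --namespace=flask-gke-gitlab-10808718 -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'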
Deployment status:
CrashLoopBackOff Container 'auto-deploy-app' keeps crashing.
Deployment status details:
...
status:
  conditions:
  - lastTransitionTime: 2019-02-16T20:01:47Z
    lastUpdateTime: 2019-02-16T20:01:47Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: 2019-02-16T20:11:48Z
    lastUpdateTime: 2019-02-16T20:11:48Z
    message: ReplicaSet "production-d54b865cf" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
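Namespace events around the rollout may show what drove the restarts and the ProgressDeadlineExceeded condition (diagnostic sketch):
$ kubectl get events --namespace=flask-gke-gitlab-10808718 --sort-by='.lastTimestamp'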
StackDriver Logs:
2019-02-16T21:27:05,401298743+00:00 limits.cpu needs updating. Is: '1', want: '1000m'.
{
insertId: "vczrqqfuclgl6"
labels: {
compute.googleapis.com/resource_name: "fluentd-gcp-v3.2.0-8cvth"
container.googleapis.com/namespace_name: "kube-system"
container.googleapis.com/pod_name: "fluentd-gcp-scaler-8b674f786-t8wft"
container.googleapis.com/stream: "stdout"
}
logName: "projects/hello-gke-231316/logs/fluentd-gcp-scaler"
receiveTimestamp: "2019-02-16T21:27:11.344007112Z"
resource: {
labels: {
cluster_name: "hello-cluster"
container_name: "fluentd-gcp-scaler"
instance_id: "3000136174297012618"
namespace_id: "kube-system"
pod_id: "fluentd-gcp-scaler-8b674f786-t8wft"
project_id: "hello-gke-231316"
zone: "europe-west4-a"
}
type: "container"
}
severity: "INFO"
textPayload: "2019-02-16T21:27:05,401298743+00:00 limits.cpu needs updating. Is: '1', want: '1000m'.
"
timestamp: "2019-02-16T21:27:05.401525720Z"
}
--------------------------------------------
2019-02-16T21:27:05,505025655+00:00 Running: kubectl set resources -n kube-system ds fluentd-gcp-v3.2.0 -c fluentd-gcp --requests=cpu=100m,memory=200Mi --limits=cpu=1000m,memory=500Mi
{
insertId: "vczrqqfuclgl7"
labels: {
compute.googleapis.com/resource_name: "fluentd-gcp-v3.2.0-8cvth"
container.googleapis.com/namespace_name: "kube-system"
container.googleapis.com/pod_name: "fluentd-gcp-scaler-8b674f786-t8wft"
container.googleapis.com/stream: "stdout"
}
logName: "projects/hello-gke-231316/logs/fluentd-gcp-scaler"
receiveTimestamp: "2019-02-16T21:27:11.344007112Z"
resource: {
labels: {
cluster_name: "hello-cluster"
container_name: "fluentd-gcp-scaler"
instance_id: "3000136174297012618"
namespace_id: "kube-system"
pod_id: "fluentd-gcp-scaler-8b674f786-t8wft"
project_id: "hello-gke-231316"
zone: "europe-west4-a"
}
type: "container"
}
severity: "INFO"
textPayload: "2019-02-16T21:27:05,505025655+00:00 Running: kubectl set resources -n kube-system ds fluentd-gcp-v3.2.0 -c fluentd-gcp --requests=cpu=100m,memory=200Mi --limits=cpu=1000m,memory=500Mi
"
timestamp: "2019-02-16T21:27:05.505245675Z"
}
--------------------------------------------
error: info: {extensions v1beta1 daemonsets} "fluentd-gcp-v3.2.0" was not changed
{
insertId: "1fmcmfmfrtlegn"
labels: {
compute.googleapis.com/resource_name: "fluentd-gcp-v3.2.0-8cvth"
container.googleapis.com/namespace_name: "kube-system"
container.googleapis.com/pod_name: "fluentd-gcp-scaler-8b674f786-t8wft"
container.googleapis.com/stream: "stderr"
}
logName: "projects/hello-gke-231316/logs/fluentd-gcp-scaler"
receiveTimestamp: "2019-02-16T20:05:11.657662029Z"
resource: {
labels: {
cluster_name: "hello-cluster"
container_name: "fluentd-gcp-scaler"
instance_id: "3000136174297012618"
namespace_id: "kube-system"
pod_id: "fluentd-gcp-scaler-8b674f786-t8wft"
project_id: "hello-gke-231316"
zone: "europe-west4-a"
}
type: "container"
}
severity: "ERROR"
textPayload: "error: info: {extensions v1beta1 daemonsets} "fluentd-gcp-v3.2.0" was not changed
"
timestamp: "2019-02-16T20:05:06.912095351Z"
}
HTTPS Ingress works, but there is no backend behind it:
503 Service Temporarily Unavailable
nginx/1.13.8
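The 503 is consistent with the Service having no ready endpoints while the pod crash-loops; this can be checked with (assuming the Service is named production, like the deployment):
$ kubectl get endpoints production --namespace=flask-gke-gitlab-10808718   # an empty ENDPOINTS column would confirm there is no ready backend
$ kubectl get ingress --namespace=flask-gke-gitlab-10808718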
Possible fixes
???
Meanwhile, will try to reproduce it with:
1. A newly created GKE cluster and the same scenario (result so far: THE SAME BEHAVIOUR)
2. Different regions, if 1. is unsuccessful