0 of 1 updated replicas available - CrashLoopBackOff [AutoDevOps + GKE @ stage:production]
Summary
The pipeline job for the production stage fails with:
Waiting for deployment "production" rollout to finish: 0 of 1 updated replicas are available...
error: deployment "production" exceeded its progress deadline
ERROR: Job failed: command terminated with exit code 1
GKE reports deployment error: 0 of 1 updated replicas available - CrashLoopBackOff
Steps to reproduce
- Connect to an existing GKE cluster, which was created as follows:
$ gcloud config set project [PROJECT_ID]
$ gcloud config set compute/zone europe-west4-a
$ export PROJECT_ID="$(gcloud config get-value project -q)"
$ gcloud container clusters create hello-cluster --num-nodes=3
$ kubectl create serviceaccount gitlab
$ kubectl create clusterrolebinding gitlab-cluster-admin --clusterrole=cluster-admin --serviceaccount=default:gitlab
$ kubectl get secrets
$ kubectl -o json get secret gitlab-token-xxxxx | jq -r '.data."ca.crt"' | base64 -d - | tee ca.crt
$ kubectl -o json get secret gitlab-token-xxxxx | jq -r '.data."token"' | base64 -d - | tee token.crt
- Configure the API endpoint, CA certificate, token and wildcard CI domain in GitLab (see the sketch below)
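The API server endpoint can be read from the cluster info, and the extracted CA and token can be sanity-checked against it (a minimal sketch; <API_ENDPOINT> is a placeholder for the URL printed by the first command):
$ kubectl cluster-info | grep 'Kubernetes master'
$ kubectl --server=<API_ENDPOINT> --certificate-authority=ca.crt --token="$(cat token.crt)" get nodes   # should list the 3 nodes if the credentials are valid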
- Install Helm Tiller, Ingress, cert-manager and GitLab Runner; update your DNS with the Ingress IP
- Check that the wildcard DNS record *.ci.example.com resolves to your Ingress IP:
$ dig @1.1.1.1 xxx.ci.example.com
It should point to your Ingress IP
- Enable Auto DevOps
- Use the build, test and production stages
- In the test stage, change:
/bin/herokuish buildpack test
to:
/bin/herokuish version
(see the override sketch after this item); otherwise the test job fails with:
$ /bin/herokuish buildpack test
-----> Unable to select a buildpack
ERROR: Job failed: exit code 1
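One way to apply that change (a minimal sketch, assuming Auto DevOps is customised by including the bundled template in a project-level .gitlab-ci.yml; adjust if your setup differs):
$ cat > .gitlab-ci.yml <<'EOF'
include:
  - template: Auto-DevOps.gitlab-ci.yml

test:
  script:
    - /bin/herokuish version   # replaces the default buildpack test, which fails here
EOF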
- Commit some code and merge the dev branch into master
- Wait for the production stage
- The deploy function fails with:
Waiting for deployment "production" rollout to finish: 0 of 1 updated replicas are available...
error: deployment "production" exceeded its progress deadline
ERROR: Job failed: command terminated with exit code 1
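The same rollout wait can be reproduced outside CI (namespace name taken from the pod listing under "Relevant logs" below; diagnostic only):
$ kubectl rollout status deployment/production --namespace=flask-gke-gitlab-10808718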
Example Project
What is the current bug behaviour?
- The pipeline job for the production stage fails
- The pod restarts repeatedly (CrashLoopBackOff)
What is the expected correct behaviour?
The pipeline job for the production stage succeeds.
Relevant logs and/or screenshots
$ kubectl get pods --namespace=flask-gke-gitlab-10808718
NAME READY STATUS RESTARTS AGE
production-d54b865cf-w84dt 0/1 CrashLoopBackOff 19 53m
production-postgres-5b5cf56747-vmlnf 1/1 Running 0 53m
$ kubectl logs production-d54b865cf-w84dt --namespace=flask-gke-gitlab-10808718
2019/02/16 21:22:00 Server listening on port 8080
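The container log shows the app listening on port 8080; if the probe port configured by the auto-deploy-app chart differs (the chart defaults to 5000, worth verifying), failing probes would explain both the restarts (liveness) and the 0 available replicas (readiness). Both can be checked with (diagnostic sketch):
$ kubectl describe pod production-d54b865cf-w84dt --namespace=flask-gke-gitlab-10808718   # Events section shows probe failures, if any
$ kubectl get deployment production --namespace=flask-gke-gitlab-10808718 -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'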
Deployment status:
CrashLoopBackOff Container 'auto-deploy-app' keeps crashing.
Deployment status details:
...
status:
  conditions:
  - lastTransitionTime: 2019-02-16T20:01:47Z
    lastUpdateTime: 2019-02-16T20:01:47Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: 2019-02-16T20:11:48Z
    lastUpdateTime: 2019-02-16T20:11:48Z
    message: ReplicaSet "production-d54b865cf" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  replicas: 1
  unavailableReplicas: 1
  updatedReplicas: 1
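Namespace events around the rollout may show what drove the restarts and the ProgressDeadlineExceeded condition (diagnostic sketch):
$ kubectl get events --namespace=flask-gke-gitlab-10808718 --sort-by='.lastTimestamp'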
StackDriver Logs:
2019-02-16T21:27:05,401298743+00:00 limits.cpu needs updating. Is: '1', want: '1000m'.
{
insertId: "vczrqqfuclgl6"
labels: {
compute.googleapis.com/resource_name: "fluentd-gcp-v3.2.0-8cvth"
container.googleapis.com/namespace_name: "kube-system"
container.googleapis.com/pod_name: "fluentd-gcp-scaler-8b674f786-t8wft"
container.googleapis.com/stream: "stdout"
}
logName: "projects/hello-gke-231316/logs/fluentd-gcp-scaler"
receiveTimestamp: "2019-02-16T21:27:11.344007112Z"
resource: {
labels: {
cluster_name: "hello-cluster"
container_name: "fluentd-gcp-scaler"
instance_id: "3000136174297012618"
namespace_id: "kube-system"
pod_id: "fluentd-gcp-scaler-8b674f786-t8wft"
project_id: "hello-gke-231316"
zone: "europe-west4-a"
}
type: "container"
}
severity: "INFO"
textPayload: "2019-02-16T21:27:05,401298743+00:00 limits.cpu needs updating. Is: '1', want: '1000m'.
"
timestamp: "2019-02-16T21:27:05.401525720Z"
}
--------------------------------------------
2019-02-16T21:27:05,505025655+00:00 Running: kubectl set resources -n kube-system ds fluentd-gcp-v3.2.0 -c fluentd-gcp --requests=cpu=100m,memory=200Mi --limits=cpu=1000m,memory=500Mi
{
insertId: "vczrqqfuclgl7"
labels: {
compute.googleapis.com/resource_name: "fluentd-gcp-v3.2.0-8cvth"
container.googleapis.com/namespace_name: "kube-system"
container.googleapis.com/pod_name: "fluentd-gcp-scaler-8b674f786-t8wft"
container.googleapis.com/stream: "stdout"
}
logName: "projects/hello-gke-231316/logs/fluentd-gcp-scaler"
receiveTimestamp: "2019-02-16T21:27:11.344007112Z"
resource: {
labels: {
cluster_name: "hello-cluster"
container_name: "fluentd-gcp-scaler"
instance_id: "3000136174297012618"
namespace_id: "kube-system"
pod_id: "fluentd-gcp-scaler-8b674f786-t8wft"
project_id: "hello-gke-231316"
zone: "europe-west4-a"
}
type: "container"
}
severity: "INFO"
textPayload: "2019-02-16T21:27:05,505025655+00:00 Running: kubectl set resources -n kube-system ds fluentd-gcp-v3.2.0 -c fluentd-gcp --requests=cpu=100m,memory=200Mi --limits=cpu=1000m,memory=500Mi
"
timestamp: "2019-02-16T21:27:05.505245675Z"
}
--------------------------------------------
error: info: {extensions v1beta1 daemonsets} "fluentd-gcp-v3.2.0" was not changed
{
insertId: "1fmcmfmfrtlegn"
labels: {
compute.googleapis.com/resource_name: "fluentd-gcp-v3.2.0-8cvth"
container.googleapis.com/namespace_name: "kube-system"
container.googleapis.com/pod_name: "fluentd-gcp-scaler-8b674f786-t8wft"
container.googleapis.com/stream: "stderr"
}
logName: "projects/hello-gke-231316/logs/fluentd-gcp-scaler"
receiveTimestamp: "2019-02-16T20:05:11.657662029Z"
resource: {
labels: {
cluster_name: "hello-cluster"
container_name: "fluentd-gcp-scaler"
instance_id: "3000136174297012618"
namespace_id: "kube-system"
pod_id: "fluentd-gcp-scaler-8b674f786-t8wft"
project_id: "hello-gke-231316"
zone: "europe-west4-a"
}
type: "container"
}
severity: "ERROR"
textPayload: "error: info: {extensions v1beta1 daemonsets} "fluentd-gcp-v3.2.0" was not changed
"
timestamp: "2019-02-16T20:05:06.912095351Z"
}
HTTPS Ingress works, but there is no backend behind it:
503 Service Temporarily Unavailable
nginx/1.13.8
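The 503 is consistent with the Service having no ready endpoints while the pod crash-loops; this can be checked with (assuming the Service is named production, like the deployment):
$ kubectl get endpoints production --namespace=flask-gke-gitlab-10808718   # an empty ENDPOINTS column would confirm there is no ready backend
$ kubectl get ingress --namespace=flask-gke-gitlab-10808718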
Possible fixes
???
Meanwhile, will try to reproduce it with:
1. A newly created GKE cluster and the same scenario (result so far: THE SAME BEHAVIOUR)
2. Different regions, if 1. is unsuccessful