CrashLoopBackOff: io.fabric8.kubernetes.client.KubernetesClientException: too old resource version

Summary

The StackGres operator keeps restarting (CrashLoopBackOff). The operator logs seem to say that it has a problem updating some CRD resources, am I right? The full logs are below. Can anyone tell me what is going on and how to fix it?

Environment

  • StackGres version: 0.8 (commit 25f423e7)
  • Kubernetes version (use kubectl version): Server Version: v1.15.9-gke.24
  • Cloud provider or hardware configuration: GKE

Steps to reproduce

  • Install the StackGres operator with default values:
helm install --namespace stackgres --name stackgres-operator ./stackgres-k8s/install/helm/stackgres-cluster
  • Install the cluster from the following manifests:
apiVersion: v1
kind: Namespace
metadata:
  name: databases
---
apiVersion: stackgres.io/v1alpha1
kind: StackGresPostgresConfig
metadata:
  name: pg-conf-l
  namespace: databases
spec:
  pgVersion: '11'
  postgresql.conf:
    max_connections: '800'
    shared_buffers: '4GB'
    work_mem: 4MB
    maintenance_work_mem: 1GB
    wal_compression: 'on'
    wal_sender_timeout: '60s'
    password_encryption: 'scram-sha-256'
    random_page_cost: '1.5'
    shared_preload_libraries: 'pg_stat_statements'
    checkpoint_completion_target: '0.9'
    checkpoint_timeout: '5min'
---
apiVersion: stackgres.io/v1alpha1
kind: StackGresConnectionPoolingConfig
metadata:
  name: bouncer-conf
  namespace: databases
spec:
  pgbouncer.ini:
    default_pool_size: '800'
    max_client_conn: '800'
    pool_mode: 'transaction'
---
apiVersion: stackgres.io/v1alpha1
kind: StackGresProfile
metadata:
  name: size-l
  namespace: databases
spec:
  cpu: "4"
  memory: 8Gi
---
apiVersion: stackgres.io/v1alpha1
kind: StackGresCluster
metadata:
  name: dev-cluster
  namespace: databases
spec:
  instances: 2
  pgVersion: '11.6'
  volumeSize: '150Gi'
  pgConfig: 'pg-conf-l'
  connectionPoolingConfig: 'bouncer-conf'
  resourceProfile: 'size-l'
  nonProduction:
    disableClusterPodAntiAffinity: true

Relevant logs and/or screenshots

2020-04-01 14:40:42,179 ERROR [io.st.op.ValidationResource] (vert.x-worker-thread-18) cannot proceed with request 3d872845-5e7d-46fb-89de-10062b9f4fa2 cause: Cannot update default CRdefaultpgconfig
2020-04-01 14:40:42,315 INFO  [io.st.op.ValidationResource] (vert.x-worker-thread-19) Validating admission review 63cd7026-76a4-4446-989f-c9069def57a2 of kind GroupVersionKind(group=stackgres.io, kind=StackGresPostgresConfig, version=v1alpha1, additionalProperties={})
2020-04-01 14:40:42,315 ERROR [io.st.op.ValidationResource] (vert.x-worker-thread-19) cannot proceed with request 63cd7026-76a4-4446-989f-c9069def57a2 cause: Cannot update default CRdefaultpgconfig
2020-04-01 14:40:42,336 INFO  [io.st.op.ValidationResource] (vert.x-worker-thread-15) Validating admission review 36693ad3-5b7e-408c-9f9e-195f35b15399 of kind GroupVersionKind(group=stackgres.io, kind=StackGresPostgresConfig, version=v1alpha1, additionalProperties={})
2020-04-01 14:40:42,926 ERROR [io.st.op.ValidationResource] (vert.x-worker-thread-7) cannot proceed with request fb82303c-8c46-42ce-8bf1-03fb1c8c0913 cause: Cannot update default CRdefaultpgconfig
2020-04-01 14:40:45,380 TRACE [io.st.op.re.AbstractReconciliationCycle] (Cluster-ReconciliationCycle) Starting Reconciliation Cycle 457
2020-04-01 14:40:45,383 TRACE [io.st.op.re.AbstractReconciliationCycle] (Cluster-ReconciliationCycle) Reconciliation Cycle 457 getting existing cluster list
2020-04-01 14:40:45,386 ERROR [io.st.op.re.AbstractReconciliationCycle] (Cluster-ReconciliationCycle) Cluster reconciliation cycle failed: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get]  for kind: [CustomResourceDefinition]  with name: [sgclusters.stackgres.io]  in namespace: [stackgres]  failed.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:237)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:170)
	at io.stackgres.operator.resource.ResourceUtil.getCustomResource(ResourceUtil.java:204)
	at io.stackgres.operator.controller.ClusterReconciliationCycle.getExistingConfigs(ClusterReconciliationCycle.java:169)
	at io.stackgres.operatorframework.reconciliation.AbstractReconciliationCycle.reconciliationCycle(AbstractReconciliationCycle.java:103)
	at io.stackgres.operatorframework.reconciliation.AbstractReconciliationCycle.reconciliationCycleLoop(AbstractReconciliationCycle.java:87)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Failed to connect to /10.0.0.1:443
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:248)
	at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:166)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:111)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
	at okhttp3.RealCall.execute(RealCall.java:92)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:411)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:372)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:337)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:318)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:833)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:226)
	... 8 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:607)
	at okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
	at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:246)
	... 40 more
2020-04-01 14:44:40,719 ERROR [io.st.op.co.ClusterResourceWatcherFactory] (OkHttp https://10.0.0.1/...) onClose was called, : io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 29897439 (29900565)
	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:263)
	at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
	at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

2020-04-01 14:44:40,781 INFO  [io.st.op.ap.StackGresOperatorApp] (Thread-4) The application is stopping...
2020-04-01 14:44:40,784 INFO  [io.st.op.co.ClusterResourceWatcherFactory] (Thread-4) onClose was called
2020-04-01 14:44:41,784 WARN  [io.fa.ku.cl.ds.in.WatchConnectionManager] (Thread-4) Executor didn't terminate in time after shutdown in close(), killing it in: io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@10010c2d
2020-04-01 14:44:41,785 INFO  [io.st.op.co.ClusterResourceWatcherFactory] (Thread-4) onClose was called
2020-04-01 14:44:42,164 INFO  [io.st.op.co.ClusterResourceWatcherFactory] (Thread-4) onClose was called
2020-04-01 14:44:42,164 INFO  [io.st.op.co.ClusterResourceWatcherFactory] (Thread-4) onClose was called
2020-04-01 14:44:43,165 WARN  [io.fa.ku.cl.ds.in.WatchConnectionManager] (Thread-4) Executor didn't terminate in time after shutdown in close(), killing it in: io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@442ac289
2020-04-01 14:44:43,165 INFO  [io.st.op.co.ClusterResourceWatcherFactory] (Thread-4) onClose was called
2020-04-01 14:44:43,166 INFO  [io.st.op.re.AbstractReconciliationCycle] (Cluster-ReconciliationCycle) Cluster reconciliation cycle loop stopped
2020-04-01 14:44:43,211 DEBUG [io.qu.ar.impl] (Thread-4) ArC DI container shut down
2020-04-01 14:44:43,211 INFO  [io.quarkus] (Thread-4) stackgres-operator stopped in 2.431s
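A note on the main error: `too old resource version` is the Kubernetes API server answering a watch with HTTP 410 Gone, because the resourceVersion the client asked to resume from (29897439) is older than the history the server still retains (29900565 is roughly the oldest version it still had). The Kubernetes API concepts documentation says clients should handle this by re-listing and starting a new watch from the resourceVersion the list returns. In these logs, `onClose was called` with that exception is immediately followed by `The application is stopping...`, which suggests (this is an inference from the logs, not confirmed) that the operator treats a closed watch as fatal, and that would explain the restarts. The Java sketch below is a self-contained simulation of the re-list-and-rewatch pattern under that assumption; `FakeApiServer`, `runWatchLoop`, and modelling responses as plain `int` status codes are hypothetical illustrations, not the fabric8 or StackGres API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative simulation only: WatchRecoverySketch, FakeApiServer and runWatchLoop are
// hypothetical names, not part of fabric8, StackGres or the Kubernetes API.
public class WatchRecoverySketch {

    /** Hypothetical stand-in for an API server that only retains recent watch history. */
    static final class FakeApiServer {
        private final long currentVersion;
        private final long oldestRetainedVersion;

        FakeApiServer(long currentVersion, long oldestRetainedVersion) {
            this.currentVersion = currentVersion;
            this.oldestRetainedVersion = oldestRetainedVersion;
        }

        /** A "list" returns the collection's current resourceVersion. */
        long list() {
            return currentVersion;
        }

        /** A "watch" from a version older than the retained history fails with 410 Gone. */
        int watch(long fromVersion) {
            return fromVersion < oldestRetainedVersion ? 410 : 200;
        }
    }

    /** Re-lists and retries on 410 instead of exiting; returns the actions taken, in order. */
    static List<String> runWatchLoop(FakeApiServer server, long lastSeenVersion, int maxAttempts) {
        List<String> actions = new ArrayList<>();
        long version = lastSeenVersion;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (server.watch(version) == 200) {
                actions.add("watching from " + version);
                return actions;
            }
            // 410 Gone: the stored resourceVersion is too old, so fetch a fresh one via list.
            actions.add("410 Gone at " + version + ", re-listing");
            version = server.list();
        }
        actions.add("giving up after " + maxAttempts + " attempts");
        return actions;
    }

    public static void main(String[] args) {
        // Versions taken from "too old resource version: 29897439 (29900565)".
        // Assumption for the demo: the current version equals the oldest retained one.
        FakeApiServer server = new FakeApiServer(29900565L, 29900565L);
        runWatchLoop(server, 29897439L, 3).forEach(System.out::println);
    }
}
```

Running `main` prints a 410 at the stale version 29897439 followed by a watch resumed from 29900565. In a real controller the fresh list would also have to be reconciled against any local cache, because the events between the two versions are never delivered.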