SGCluster fails to reconcile when backups are enabled
Summary
StackGres 1.4.0-RC1: an SGCluster fails to reconcile when backups are enabled.
Current Behaviour
With backups enabled, the stackgres-operator pod logs the following endlessly:
2022-11-18 11:00:55,532 INFO [io.st.op.conciliation] (SGCluster-ReconciliationLoop) Checking reconciliation status of SGCluster test/test-pg
2022-11-18 11:00:55,635 INFO [io.st.op.conciliation] (SGCluster-ReconciliationLoop) SGCluster test/test-pg it's not up to date. Reconciling
2022-11-18 11:00:55,635 INFO [io.st.op.conciliation] (SGCluster-ReconciliationLoop) Creating resource test.test-pg-backup of kind: CronJob
2022-11-18 11:00:55,708 ERROR [io.st.op.conciliation] (SGCluster-ReconciliationLoop) Reconciliation of SGCluster test/test-pg failed: io.fabric8.kubernetes.client.KubernetesClientException: the server could not find the requested resource
at io.stackgres.common.kubernetesclient.StackGresDefaultKubernetesClient.executeRequest(StackGresDefaultKubernetesClient.java:277)
at io.stackgres.common.kubernetesclient.StackGresDefaultKubernetesClient.apply(StackGresDefaultKubernetesClient.java:254)
at io.stackgres.common.kubernetesclient.StackGresDefaultKubernetesClient.serverSideApply(StackGresDefaultKubernetesClient.java:142)
at java.lang.reflect.Method.invoke(Method.java:568)
at io.stackgres.common.kubernetesclient.KubernetesClientProducer$KubernetesClientInvocationHandler.invoke(KubernetesClientProducer.java:68)
at jdk.proxy4.$Proxy338.serverSideApply(Unknown Source)
at io.stackgres.operator.conciliation.AbstractReconciliationHandler.create(AbstractReconciliationHandler.java:32)
at io.stackgres.operator.conciliation.cluster.ClusterDefaultReconciliationHandler_ClientProxy.create(Unknown Source)
at io.stackgres.operator.conciliation.cluster.ClusterHandlerDelegator.create(ClusterHandlerDelegator.java:38)
at io.stackgres.operator.conciliation.cluster.ClusterHandlerDelegator.create(ClusterHandlerDelegator.java:20)
at io.stackgres.operator.conciliation.cluster.ClusterHandlerDelegator_ClientProxy.create(Unknown Source)
at io.stackgres.operator.conciliation.AbstractReconciliator.lambda$reconciliationCycle$1(AbstractReconciliator.java:120)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:357)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at io.stackgres.operator.conciliation.AbstractReconciliator.lambda$reconciliationCycle$4(AbstractReconciliator.java:115)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:179)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at io.stackgres.operator.conciliation.AbstractReconciliator.reconciliationCycle(AbstractReconciliator.java:100)
at io.stackgres.operator.conciliation.AbstractReconciliator.reconciliationLoop(AbstractReconciliator.java:90)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.lang.Thread.run(Thread.java:833)
at com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:704)
at com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:202)
(This repeats every 10 seconds.)
With no backups configured, the SGCluster is reported as up to date, and the operator logs the following on every loop:
2022-11-18 11:11:45,552 INFO [io.st.op.conciliation] (SGCluster-ReconciliationLoop) Checking reconciliation status of SGCluster test/test-pg
2022-11-18 11:11:45,570 INFO [io.st.op.mu.ScriptMutationResource] (executor-thread-12) Mutating admission review uid 260d19aa-423a-4891-8cb0-77d406961785 of kind SGScript for resource test.test-pg-default
2022-11-18 11:11:45,572 INFO [io.st.op.va.ScriptValidationResource] (executor-thread-12) Validating admission review uid 05be10af-b438-4f93-b56c-3ab443079daf of kind SGScript for resource test.test-pg-default
2022-11-18 11:11:45,577 INFO [io.st.op.conciliation] (SGScript-ReconciliationLoop) Checking reconciliation status of SGScript test/test-pg-default
2022-11-18 11:11:45,577 INFO [io.st.op.conciliation] (SGScript-ReconciliationLoop) SGScript test/test-pg-default it's up to date
2022-11-18 11:11:45,683 INFO [io.st.op.conciliation] (SGCluster-ReconciliationLoop) SGCluster test/test-pg it's up to date
2022-11-18 11:11:45,702 INFO [io.st.op.mu.ClusterMutationResource] (executor-thread-12) Mutating admission review uid 5ec6630c-a9e0-4afd-b44b-c60fd4206302 of kind SGCluster for resource test.test-pg
2022-11-18 11:11:45,704 INFO [io.st.op.va.ClusterValidationResource] (executor-thread-12) Validating admission review uid e04406c1-e195-4bf0-bd28-e344ea131200 of kind SGCluster for resource test.test-pg
2022-11-18 11:11:46,053 INFO [io.st.op.mu.ClusterMutationResource] (executor-thread-12) Mutating admission review uid a6f4837a-3870-4d12-84cf-d3e42efbad03 of kind SGCluster for resource test.test-pg
2022-11-18 11:11:46,069 INFO [io.st.op.va.ClusterValidationResource] (executor-thread-12) Validating admission review uid b4d561dc-94af-488b-87bf-fea63adb04ce of kind SGCluster for resource test.test-pg
(This repeats every 10 seconds.)
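The backup-enabled failure happens while the operator creates the CronJob test.test-pg-backup, and the server answers that the requested resource cannot be found. One possible explanation, which is an assumption and not confirmed by the logs, is that the operator requests a CronJob API version that the Kubernetes 1.25 server no longer serves: CronJob in batch/v1beta1 was removed in Kubernetes 1.25, while batch/v1 has served CronJob since 1.21. A minimal shell sketch to check which of those group versions the cluster serves follows; the function name cronjob_api_report is hypothetical and not part of StackGres or kubectl.

```shell
#!/bin/sh
# Hypothetical helper (not part of StackGres or kubectl).
# Reads the output of `kubectl api-versions` (one group/version per line) on stdin
# and reports whether the batch group versions that have carried CronJob are served.
# Note: batch/v1 also serves Job, so "served" for batch/v1 only confirms that the
# group version exists; CronJob has been available in batch/v1 since Kubernetes 1.21.
cronjob_api_report() {
  versions=$(cat)
  for v in batch/v1 batch/v1beta1; do
    # -x: match the whole line, so "batch/v1" does not match "batch/v1beta1"
    if printf '%s\n' "$versions" | grep -qx "$v"; then
      echo "$v: served"
    else
      echo "$v: not served"
    fi
  done
}

# Usage against a live cluster:
#   kubectl api-versions | cronjob_api_report
# A direct probe; on a server that does not serve the group version this is
# expected to return a NotFound error:
#   kubectl get --raw /apis/batch/v1beta1
```

If batch/v1beta1 is reported as not served, comparing that with the CronJob API version the operator requests would be a reasonable next step.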
Steps to reproduce
On a freshly installed Ubuntu 22.04:
snap install microk8s --classic --channel=1.25/stable
microk8s enable dns
microk8s enable hostpath-storage
Then install the StackGres operator with Helm:
helm install --create-namespace --namespace stackgres stackgres-operator https://stackgres.io/downloads/stackgres-k8s/stackgres/1.4.0-RC1/helm/stackgres-operator.tgz
Then create a test namespace, switch to it, and apply the manifest:
kubectl create ns test
kubens test
kubectl apply -f db.yaml
where db.yaml contains the following manifest:
apiVersion: stackgres.io/v1
kind: SGPostgresConfig
metadata:
  name: test
spec:
  postgresVersion: "14"
  postgresql.conf:
    max_standby_streaming_delay: "2h"
---
apiVersion: stackgres.io/v1
kind: SGPoolingConfig
metadata:
  name: test
spec:
  pgBouncer:
    pgbouncer.ini:
      pgbouncer:
        pool_mode: transaction
        max_client_conn: "1000"
        default_pool_size: "80"
---
apiVersion: stackgres.io/v1
kind: SGInstanceProfile
metadata:
  name: test
spec:
  cpu: "2"
  memory: "4Gi"
---
apiVersion: v1
kind: Secret
metadata:
  name: backup-aws
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: redacted
  AWS_SECRET_ACCESS_KEY: redacted
---
apiVersion: stackgres.io/v1beta1
kind: SGObjectStorage
metadata:
  name: test
spec:
  type: s3Compatible
  s3Compatible:
    awsCredentials:
      secretKeySelectors:
        accessKeyId:
          name: backup-aws
          key: AWS_ACCESS_KEY_ID
        secretAccessKey:
          name: backup-aws
          key: AWS_SECRET_ACCESS_KEY
    endpoint: https://storage.yandexcloud.net
    region: ru-central1
    bucket: redacted
    storageClass: STANDARD_IA
---
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: test-pg
spec:
  postgres:
    version: "14.5"
  instances: 1
  sgInstanceProfile: test
  pods:
    persistentVolume:
      size: 10Gi
  configurations:
    sgPostgresConfig: test
    sgPoolingConfig: test
    backups:
    - sgObjectStorage: test
      cronSchedule: "5 20 * * *"
      compression: brotli
      retention: 14
Then follow the operator logs:
kubectl logs -n stackgres stackgres-operator-7898ccff4-klxwd --tail 100 -f
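Following the log shows the same failure every 10 seconds, interleaved with long stack traces. To pull out only the resource the operator was creating when each reconciliation failed, the log can be filtered. Below is a minimal shell sketch that assumes the log line format shown under Current Behaviour ("Creating resource <namespace>.<name> of kind: <Kind>" followed by "ERROR ... Reconciliation of ... failed"); the function name failing_resources is hypothetical and not part of StackGres or kubectl.

```shell
#!/bin/sh
# Hypothetical helper (not part of StackGres or kubectl).
# Reads operator log lines on stdin and prints, once each, the resources the operator
# was creating when a reconciliation failed. Assumes the log format shown above.
failing_resources() {
  awk '
    # Work out which reconciliation loop (thread) logged this line, if any,
    # so that interleaved loops (e.g. SGCluster and SGScript) do not mix.
    {
      thread = ""
      if (match($0, /\([A-Za-z]+-ReconciliationLoop\)/))
        thread = substr($0, RSTART, RLENGTH)
    }
    # Stack trace lines and other threads carry no loop name: skip them.
    thread == "" { next }
    # A new cycle starts: forget any earlier creation attempt for this loop.
    /Checking reconciliation status/ { last[thread] = ""; next }
    # Remember the most recent resource this loop tried to create.
    /Creating resource .* of kind: / {
      sub(/.*Creating resource /, "")
      last[thread] = $0
      next
    }
    # On a failed reconciliation, report the pending creation, if any.
    /ERROR .*Reconciliation of .* failed/ {
      if (last[thread] != "") print last[thread]
      last[thread] = ""
    }
  ' | sort -u
}
```

Usage, without -f because sort only prints once its input ends (substitute your own operator pod name):
kubectl logs -n stackgres stackgres-operator-7898ccff4-klxwd --tail 100 | failing_resources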
Expected Behaviour
The SGCluster reconciles successfully, including creation of the backup CronJob, and the operator stops retrying.
Environment
- StackGres Version: 1.4.0-RC1
- Kubernetes version: 1.25.3
- Cloud provider or hardware configuration: bare metal, 6 CPU cores, 32 GB RAM, 480 GB SSD
Edited by Ilya Semenov