Cluster creation not fully correct due to errors in Rancher UI

Hi,

I've created a cluster with the following custom resource.

---
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  namespace: elias-test
  name: cluster
spec:
  postgres:
    version: '14'
  instances: 2
  sgInstanceProfile: 'size-m'
  pods:
    persistentVolume:
      size: '6Gi'
  configurations:
    sgPostgresConfig: pg-config # use the pg-config specified above
    backups:
    - sgObjectStorage: 'minio-backup'
      cronSchedule: '00 05 * * *'
      retention: 4
      compression: brotli # lz4
  distributedLogs:
    sgDistributedLogs: distributedlogs # send postgres logs to distributedlogs pg instance.
  prometheusAutobind: true
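
The SGCluster above references an SGObjectStorage named minio-backup. For completeness, this is roughly what that resource looks like in my namespace (bucket, endpoint, and secret names are placeholders here; the real values point at our MinIO installation):

---
apiVersion: stackgres.io/v1beta1
kind: SGObjectStorage
metadata:
  namespace: elias-test
  name: minio-backup
spec:
  type: s3Compatible
  s3Compatible:
    bucket: stackgres-backups          # placeholder bucket name
    endpoint: http://minio:9000        # placeholder MinIO endpoint
    enablePathStyleAddressing: true
    awsCredentials:
      secretKeySelectors:
        accessKeyId:
          name: minio-credentials      # placeholder Secret name
          key: accesskey
        secretAccessKey:
          name: minio-credentials
          key: secretkey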

But the result doesn't seem to be entirely OK.

$ kubectl exec -ti "$(kubectl get pod --selector app=StackGresCluster,stackgres.io/cluster=true,role=master -o name)" -c patroni -- patronictl list
+ Cluster: cluster (7189620847726707067) -+---------+----+-----------+
| Member    | Host              | Role    | State   | TL | Lag in MB |
+-----------+-------------------+---------+---------+----+-----------+
| cluster-0 | 10.42.26.239:7433 | Leader  | running |  1 |           |
| cluster-1 | 10.42.20.3:7433   | Replica | running |  1 |         0 |
+-----------+-------------------+---------+---------+----+-----------+
$ kubectl get sgcluster -o wide
NAME      VERSION   INSTANCES   PROFILE   DISK   PROMETHEUS-AUTOBIND   POOL-CONFIG                            POSTGRES-CONFIG
cluster   14.6      2           size-m    6Gi    true                  generated-from-default-1673963975840   pg-config
$ kubectl get sgcluster cluster -o yaml | yq e '.status'
arch: x86_64
conditions:
  - lastTransitionTime: "2023-01-17T13:59:37.950187Z"
    reason: FalseFailed
    status: "False"
    type: Failed
  - lastTransitionTime: "2023-01-17T13:59:37.974765Z"
    reason: FalsePendingRestart
    status: "False"
    type: PendingRestart
  - lastTransitionTime: "2023-01-17T13:59:37.974790Z"
    reason: FalsePendingUpgrade
    status: "False"
    type: PendingUpgrade
managedSql:
  scripts:
    - completedAt: "2023-01-17T14:01:16.129892Z"
      id: 0
      scripts:
        - id: 0
          version: 0
      startedAt: "2023-01-17T14:01:15.924683Z"
      updatedAt: "2023-01-17T14:01:15.924700Z"
os: linux
podStatuses:
  - installedPostgresExtensions: []
    name: cluster-0
    pendingRestart: false
    primary: true
    replicationGroup: 0
  - installedPostgresExtensions: []
    name: cluster-1
    pendingRestart: false
    primary: false
    replicationGroup: 0

Is the status correct for this cluster?

In the Rancher UI it looks like this:

Screenshot_2023-01-17_at_21.12.05

In the StackGres Admin UI, the cluster details page is empty: Screenshot_2023-01-17_at_21.54.44

But the Postgres cluster itself seems to be working fine.

Is this because Rancher doesn't understand the SGCluster CRD correctly, or is something actually wrong?

I should add that the backups are failing as well, with no details as to why.
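
From what I could find in the docs, the SGBackup resources should carry the failure details, so I assume inspecting them like this is the right approach (backup-name is a placeholder for whatever name shows up in the list):

$ kubectl get sgbackup -n elias-test
$ kubectl get sgbackup backup-name -n elias-test -o yaml | yq e '.status.process'

Is that where the error message is supposed to appear? In my case the status doesn't tell me anything useful.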

What should I do? I'm a first-time user of StackGres, so everything is new to me.

(Running StackGres 1.4.0)