bootstrap from backup

Summary

I would like to restore a backup on a different Kubernetes cluster. This is essential for us for disaster recovery and, for example, for restoring production into a staging environment.

apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: stackgres
spec:
  replicateFrom:
    storage:
      path: sgbackups.stackgres.io/edm/timescaledb/2024-05-31-21-10-12/16
      sgObjectStorage: stackgres-backups
    users:
      superuser:
        username:
          name: pg-origin-secret
          key: superuser-username
        password:
          name: pg-origin-secret
          key: superuser-password
      replication:
        username:
          name: pg-origin-secret
          key: replication-username
        password:
          name: pg-origin-secret
          key: replication-password
      authenticator:
        username:
          name: pg-origin-secret
          key: authenticator-username
        password:
          name: pg-origin-secret
          key: authenticator-password
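
For reference, the manifest above expects a Secret named pg-origin-secret to exist in the target namespace, holding the six keys it references. A minimal sketch of such a Secret follows. The key names are taken from the manifest; the usernames are assumed to be the StackGres defaults and the passwords are placeholders, so all values must be replaced with the credentials of the source cluster.

```yaml
# Hypothetical Secret for the target namespace.
# Key names come from the SGCluster manifest above; every value is a placeholder.
apiVersion: v1
kind: Secret
metadata:
  name: pg-origin-secret
type: Opaque
stringData:
  # Usernames are assumed defaults; use whatever the source cluster actually has.
  superuser-username: postgres
  superuser-password: CHANGE_ME
  replication-username: replicator
  replication-password: CHANGE_ME
  authenticator-username: authenticator
  authenticator-password: CHANGE_ME
```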

I have skipped the following part because it does not make sense in my case: an SGBackup has to be linked to an SGCluster, which I do not yet have on my new Kubernetes cluster.

  initialData:
    restore:
      fromBackup:
        name: backup-name

Current Behaviour

After applying the YAML, the StatefulSet is created, but the logs show errors:

2024-06-01 14:22:57,123 WARNING: postgresql parameter listen_addresses=localhost failed validation, defaulting to None
2024-06-01 14:22:57,123 WARNING: postgresql parameter port=5432 failed validation, defaulting to None
2024-06-01T16:22:57.124042993+02:00 2024-06-01 14:22:57,123 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-06-01T16:22:57.222223812+02:00 2024-06-01 14:22:57,126 INFO: Lock owner: ; I am test-cluster-0
2024-06-01T16:22:57.222322689+02:00 2024-06-01 14:22:57,221 INFO: trying to bootstrap a new standby leader
2024-06-01T16:22:57.229174023+02:00 pg_basebackup: error: connection to server at "test-cluster" (10.43.207.128), port 5433 failed: Connection refused
2024-06-01T16:22:57.229193723+02:00 	Is the server running on that host and accepting TCP/IP connections?
2024-06-01T16:22:57.230104020+02:00 2024-06-01 14:22:57,229 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2024-06-01T16:22:57.230138903+02:00 2024-06-01 14:22:57,229 WARNING: Trying again in 5 seconds
pg_basebackup: error: connection to server at "test-cluster" (10.43.207.128), port 5433 failed: Connection refused
2024-06-01T16:23:02.239798682+02:00 	Is the server running on that host and accepting TCP/IP connections?
2024-06-01T16:23:02.240958575+02:00 2024-06-01 14:23:02,240 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2024-06-01T16:23:02.240980826+02:00 2024-06-01 14:23:02,240 ERROR: failed to bootstrap clone from remote member postgresql://test-cluster:5433
2024-06-01T16:23:02.240991859+02:00 2024-06-01 14:23:02,240 INFO: Removing data directory: /var/lib/postgresql/data
2024-06-01 14:23:07,126 INFO: removing initialize key after failed attempt to bootstrap the cluster
Traceback (most recent call last):
2024-06-01T16:23:07.740109091+02:00   File "/usr/bin/patroni", line 8, in <module>
2024-06-01T16:23:07.740154200+02:00     sys.exit(main())
  File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 344, in main
    return patroni_main(args.configfile)
2024-06-01T16:23:07.740239465+02:00   File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 232, in patroni_main
2024-06-01T16:23:07.740287630+02:00     abstract_main(Patroni, configfile)
2024-06-01T16:23:07.740294316+02:00   File "/usr/lib/python3.9/site-packages/patroni/daemon.py", line 174, in abstract_main
    controller.run()
2024-06-01T16:23:07.740347170+02:00   File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 192, in run
    super(Patroni, self).run()
2024-06-01T16:23:07.740388708+02:00   File "/usr/lib/python3.9/site-packages/patroni/daemon.py", line 143, in run
    self._run_cycle()
2024-06-01T16:23:07.740434767+02:00   File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 201, in _run_cycle
2024-06-01T16:23:07.740470918+02:00     logger.info(self.ha.run_cycle())
2024-06-01T16:23:07.740474103+02:00   File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1980, in run_cycle
    info = self._run_cycle()
  File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1797, in _run_cycle
    return self.post_bootstrap()
  File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1681, in post_bootstrap
2024-06-01T16:23:07.741379972+02:00     self.cancel_initialization()
  File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1674, in cancel_initialization
    raise PatroniFatalException('Failed to bootstrap cluster')
2024-06-01T16:23:07.741666723+02:00 patroni.exceptions.PatroniFatalException: Failed to bootstrap cluster

Steps to reproduce

  1. Create an SGObjectStorage and an SGCluster.
  2. Create a backup.
  3. Create the same SGObjectStorage and its credentials Secret in another cluster or namespace.
  4. Modify the YAML pasted above and apply it to that namespace or cluster.
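
Step 3 can be sketched roughly as follows, assuming an S3-compatible bucket. The storage type used here is not stated in this report, so the bucket, endpoint, and the Secret name backup-credentials with its keys are all hypothetical. Only the name stackgres-backups comes from the manifest above.

```yaml
# Hypothetical SGObjectStorage for the target cluster or namespace.
# It must point at the same bucket the source cluster wrote its backups to.
apiVersion: stackgres.io/v1beta1
kind: SGObjectStorage
metadata:
  name: stackgres-backups            # referenced by spec.replicateFrom.storage.sgObjectStorage
spec:
  type: s3Compatible                 # assumption: the actual storage type is not given
  s3Compatible:
    bucket: my-backup-bucket         # placeholder
    endpoint: https://s3.example.com # placeholder
    awsCredentials:
      secretKeySelectors:
        accessKeyId:
          name: backup-credentials   # hypothetical Secret name
          key: accessKeyId
        secretAccessKey:
          name: backup-credentials
          key: secretAccessKey
```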

Expected Behaviour

The cluster should be initialized with the backup as its initial data.

Environment

  • StackGres version: 1.10
  • Kubernetes version: 1.28
  • Cloud provider or hardware configuration: VMs on ESXi hosts