Bootstrap from backup
Summary
I would like to restore a backup on a different Kubernetes cluster. This is essential for us for disaster recovery, and also for cases such as restoring production into a staging environment.
apiVersion: stackgres.io/v1
kind: SGCluster
metadata:
  name: stackgres
spec:
  replicateFrom:
    storage:
      path: sgbackups.stackgres.io/edm/timescaledb/2024-05-31-21-10-12/16
      sgObjectStorage: stackgres-backups
    users:
      superuser:
        username:
          name: pg-origin-secret
          key: superuser-username
        password:
          name: pg-origin-secret
          key: superuser-password
      replication:
        username:
          name: pg-origin-secret
          key: replication-username
        password:
          name: pg-origin-secret
          key: replication-password
      authenticator:
        username:
          name: pg-origin-secret
          key: authenticator-username
        password:
          name: pg-origin-secret
          key: authenticator-password
I have skipped the following part because it does not make sense in my case: an SGBackup has to be linked to an SGCluster, which I do not have yet on my new Kubernetes cluster.
initialData:
  restore:
    fromBackup:
      name: backup-name
Current Behaviour
After applying the YAML, the StatefulSet was created, but the following errors appear in the logs:
2024-06-01 14:22:57,123 WARNING: postgresql parameter listen_addresses=localhost failed validation, defaulting to None
2024-06-01 14:22:57,123 WARNING: postgresql parameter port=5432 failed validation, defaulting to None
2024-06-01 14:22:57,123 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-06-01 14:22:57,126 INFO: Lock owner: ; I am test-cluster-0
2024-06-01 14:22:57,221 INFO: trying to bootstrap a new standby leader
pg_basebackup: error: connection to server at "test-cluster" (10.43.207.128), port 5433 failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?
2024-06-01 14:22:57,229 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2024-06-01 14:22:57,229 WARNING: Trying again in 5 seconds
pg_basebackup: error: connection to server at "test-cluster" (10.43.207.128), port 5433 failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?
2024-06-01 14:23:02,240 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2024-06-01 14:23:02,240 ERROR: failed to bootstrap clone from remote member postgresql://test-cluster:5433
2024-06-01 14:23:02,240 INFO: Removing data directory: /var/lib/postgresql/data
2024-06-01 14:23:07,126 INFO: removing initialize key after failed attempt to bootstrap the cluster
Traceback (most recent call last):
  File "/usr/bin/patroni", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 344, in main
    return patroni_main(args.configfile)
  File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 232, in patroni_main
    abstract_main(Patroni, configfile)
  File "/usr/lib/python3.9/site-packages/patroni/daemon.py", line 174, in abstract_main
    controller.run()
  File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 192, in run
    super(Patroni, self).run()
  File "/usr/lib/python3.9/site-packages/patroni/daemon.py", line 143, in run
    self._run_cycle()
  File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 201, in _run_cycle
    logger.info(self.ha.run_cycle())
  File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1980, in run_cycle
    info = self._run_cycle()
  File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1797, in _run_cycle
    return self.post_bootstrap()
  File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1681, in post_bootstrap
    self.cancel_initialization()
  File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1674, in cancel_initialization
    raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: Failed to bootstrap cluster
Steps to reproduce
- create an SGObjectStorage and an SGCluster
- create a backup
- on another cluster or namespace, create the SGObjectStorage and the credentials secret
- adapt the YAML pasted above and apply it in that namespace / cluster
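For reference, the credentials secret from step 3 could look roughly like the sketch below. The secret name (pg-origin-secret) and the six keys are taken from the SGCluster YAML above; the usernames shown are assumed StackGres defaults and the passwords are placeholders. All values must match the users of the source cluster the backup was taken from.

```yaml
# Sketch of the secret referenced by spec.replicateFrom.users.
# Key names come from the SGCluster above; usernames are ASSUMED
# StackGres defaults and passwords are placeholders to be replaced
# with the source cluster's real credentials.
apiVersion: v1
kind: Secret
metadata:
  name: pg-origin-secret
type: Opaque
stringData:
  superuser-username: "postgres"
  superuser-password: "<source-superuser-password>"
  replication-username: "replicator"
  replication-password: "<source-replication-password>"
  authenticator-username: "authenticator"
  authenticator-password: "<source-authenticator-password>"
```

stringData is used so the values can be written in plain text; Kubernetes stores them base64-encoded under data.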
Expected Behaviour
The cluster should be initialized using the backup as its initial data.
Environment
- StackGres version: 1.10
- Kubernetes version: 1.28
- Cloud provider or hardware configuration: VMs on ESXi hosts