Cannot restore cluster from PVC "server forked off from that timeline at 0/xxxx"
Summary
I'm trying to restore a single node cluster and can't seem to find any way to restore it, even though i tried
First I tried the stackgres backup but I got the same error.
Then I tried the restore through PVC procedure. Since I use longhorn for disk provisioning I restored the disk at multiple time in the past two week (failure occurred a couple of days ago)
Environment
- StackGres version: 1.10.0
- Kubernetes version: v1.28.6+k3s2
- Cloud provider or hardware configuration: Self hosted k3s on arm64 hardware, disk provided through longhorn.
Current Behaviour
The patroni pod just keep restarting.
Steps to reproduce
Sadly, except by providing the data, I have
Expected Behavior
It should just work.
Possible Solution
I am wondering if there is an option missing in the restore process which could solve this problem. I'm not very knowledgeable on patroni and pg clustering (which is why I'm using stackgres). I don't think the data is that corrupted (tried with 5 different backups and 5 disk backup). So this makes me think it's a setting somewhere...
Relevant logs and/or screenshots
Sample logs from the S3 backup restore...
May 04, 2024 16:16:14.425 2024-05-04 16:16:14,425 WARNING: postgresql parameter listen_addresses=localhost failed validation, defaulting to None
May 04, 2024 16:16:14.426 2024-05-04 16:16:14,426 WARNING: postgresql parameter port=5432 failed validation, defaulting to None
May 04, 2024 16:16:14.427 2024-05-04 16:16:14,427 INFO: No PostgreSQL configuration items changed, nothing to reload.
May 04, 2024 16:16:14.632 2024-05-04 16:16:14,431 INFO: Lock owner: ; I am gitea-postgres-0
May 04, 2024 16:16:14.633 2024-05-04 16:16:14,626 INFO: trying to bootstrap a new cluster
May 04, 2024 16:16:14.633 2024-05-04 16:16:14,626 INFO: Running custom bootstrap script: exec-with-env "restore" -- /etc/patroni/recovery-from-backup.sh
May 04, 2024 16:16:14.942 + '[' -n '' ']'
May 04, 2024 16:16:14.943 + '[' false = true ']'
May 04, 2024 16:16:14.943 + wal-g backup-fetch /var/lib/postgresql/data base_000000050000000000000090
May 04, 2024 16:16:14.998 INFO: 2024/05/04 16:16:14.997887 Selecting the backup with name base_000000050000000000000090...
May 04, 2024 16:16:14.998 INFO: 2024/05/04 16:16:14.998429 Backup to fetch will be searched in storages: [default]
May 04, 2024 16:16:23.209 INFO: 2024/05/04 16:16:23.209447 Finished extraction of part_001.tar.lz4
May 04, 2024 16:16:24.435 2024-05-04 16:16:24,432 INFO: Lock owner: ; I am gitea-postgres-0
May 04, 2024 16:16:24.436 2024-05-04 16:16:24,432 INFO: not healthy enough for leader race
May 04, 2024 16:16:24.582 2024-05-04 16:16:24,581 INFO: bootstrap in progress
May 04, 2024 16:16:30.717 INFO: 2024/05/04 16:16:30.717230 Finished extraction of part_002.tar.lz4
May 04, 2024 16:16:30.737 INFO: 2024/05/04 16:16:30.737547 Finished extraction of part_004.tar.lz4
May 04, 2024 16:16:30.758 INFO: 2024/05/04 16:16:30.758395 Finished extraction of pg_control.tar.lz4
May 04, 2024 16:16:30.758 INFO: 2024/05/04 16:16:30.758429
May 04, 2024 16:16:30.758 Backup extraction complete.
May 04, 2024 16:16:31.372 2024-05-04 16:16:31 UTC [1425]: db=,user=,app=,client= LOG: starting PostgreSQL 12.17 (OnGres 12.17-build-6.31) on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20), 64-bit
May 04, 2024 16:16:31.372 2024-05-04 16:16:31 UTC [1425]: db=,user=,app=,client= LOG: listening on IPv4 address "127.0.0.1", port 5432
May 04, 2024 16:16:31.379 2024-05-04 16:16:31 UTC [1425]: db=,user=,app=,client= LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
May 04, 2024 16:16:31.382 2024-05-04 16:16:31,381 INFO: postmaster pid=1425
May 04, 2024 16:16:31.425 2024-05-04 16:16:31 UTC [1428]: db=,user=,app=,client= LOG: database system was interrupted; last known up at 2024-05-01 05:00:08 UTC
May 04, 2024 16:16:31.425 2024-05-04 16:16:31 UTC [1429]: db=[unknown],user=[unknown],app=[unknown],client=[local] LOG: connection received: host=[local]
May 04, 2024 16:16:31.425 2024-05-04 16:16:31 UTC [1428]: db=,user=,app=,client= LOG: creating missing WAL directory "pg_wal/archive_status"
May 04, 2024 16:16:31.425 2024-05-04 16:16:31 UTC [1429]: db=postgres,user=postgres,app=[unknown],client=[local] FATAL: the database system is starting up
May 04, 2024 16:16:31.426 /var/run/postgresql:5432 - rejecting connections
May 04, 2024 16:16:31.438 2024-05-04 16:16:31 UTC [1431]: db=[unknown],user=[unknown],app=[unknown],client=[local] LOG: connection received: host=[local]
May 04, 2024 16:16:31.438 2024-05-04 16:16:31 UTC [1431]: db=postgres,user=postgres,app=[unknown],client=[local] FATAL: the database system is starting up
May 04, 2024 16:16:31.439 /var/run/postgresql:5432 - rejecting connections
May 04, 2024 16:16:31.891 INFO: 2024/05/04 16:16:31.891295 Files will be read from storages: [default]
May 04, 2024 16:16:31.972 2024-05-04 16:16:31 UTC [1428]: db=,user=,app=,client= LOG: restored log file "00000006.history" from archive
May 04, 2024 16:16:32.321 INFO: 2024/05/04 16:16:32.320251 Files will be read from storages: [default]
May 04, 2024 16:16:32.384 ERROR: 2024/05/04 16:16:32.384693 Archive '00000007.history' does not exist.
May 04, 2024 16:16:32.396 2024-05-04 16:16:32 UTC [1428]: db=,user=,app=,client= LOG: starting archive recovery
May 04, 2024 16:16:32.459 2024-05-04 16:16:32 UTC [1641]: db=[unknown],user=[unknown],app=[unknown],client=[local] LOG: connection received: host=[local]
May 04, 2024 16:16:32.459 2024-05-04 16:16:32 UTC [1641]: db=postgres,user=postgres,app=[unknown],client=[local] FATAL: the database system is starting up
May 04, 2024 16:16:32.460 /var/run/postgresql:5432 - rejecting connections
May 04, 2024 16:16:32.795 INFO: 2024/05/04 16:16:32.795585 Files will be read from storages: [default]
May 04, 2024 16:16:32.872 2024-05-04 16:16:32 UTC [1428]: db=,user=,app=,client= LOG: restored log file "00000006.history" from archive
May 04, 2024 16:16:33.246 INFO: 2024/05/04 16:16:33.246612 Files will be read from storages: [default]
May 04, 2024 16:16:33.339 ERROR: 2024/05/04 16:16:33.339021 Archive '000000060000000000000090' does not exist.
May 04, 2024 16:16:33.488 2024-05-04 16:16:33 UTC [1841]: db=[unknown],user=[unknown],app=[unknown],client=[local] LOG: connection received: host=[local]
May 04, 2024 16:16:33.488 2024-05-04 16:16:33 UTC [1841]: db=postgres,user=postgres,app=[unknown],client=[local] FATAL: the database system is starting up
May 04, 2024 16:16:33.488 /var/run/postgresql:5432 - rejecting connections
May 04, 2024 16:16:33.770 INFO: 2024/05/04 16:16:33.770488 Files will be read from storages: [default]
May 04, 2024 16:16:33.943 2024-05-04 16:16:33 UTC [1428]: db=,user=,app=,client= LOG: restored log file "000000050000000000000090" from archive
May 04, 2024 16:16:34.149 2024-05-04 16:16:34 UTC [1428]: db=,user=,app=,client= FATAL: requested timeline 6 is not a child of this server's history
May 04, 2024 16:16:34.149 2024-05-04 16:16:34 UTC [1428]: db=,user=,app=,client= DETAIL: Latest checkpoint is at 0/90000060 on timeline 5, but in the history of the requested timeline, the server forked off from that timeline at 0/20000A0.
May 04, 2024 16:16:34.150 2024-05-04 16:16:34 UTC [1425]: db=,user=,app=,client= LOG: startup process (PID 1428) exited with exit code 1
May 04, 2024 16:16:34.150 2024-05-04 16:16:34 UTC [1425]: db=,user=,app=,client= LOG: aborting startup due to startup process failure
May 04, 2024 16:16:34.155 2024-05-04 16:16:34 UTC [1425]: db=,user=,app=,client= LOG: database system is shut down
May 04, 2024 16:16:34.439 2024-05-04 16:16:34,434 INFO: Lock owner: ; I am gitea-postgres-0
May 04, 2024 16:16:34.439 2024-05-04 16:16:34,435 INFO: not healthy enough for leader race
May 04, 2024 16:16:34.504 /var/run/postgresql:5432 - no response
May 04, 2024 16:16:34.657 2024-05-04 16:16:34,656 INFO: bootstrap in progress
May 04, 2024 16:16:34.660 2024-05-04 16:16:34,659 INFO: removing initialize key after failed attempt to bootstrap the cluster
May 04, 2024 16:16:34.673 2024-05-04 16:16:34,673 INFO: renaming data directory to /var/lib/postgresql/data.failed
May 04, 2024 16:16:35.146 Traceback (most recent call last):
May 04, 2024 16:16:35.146 File "/usr/bin/patroni", line 8, in <module>
May 04, 2024 16:16:35.146 sys.exit(main())
May 04, 2024 16:16:35.146 File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 344, in main
May 04, 2024 16:16:35.147 return patroni_main(args.configfile)
May 04, 2024 16:16:35.147 File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 232, in patroni_main
May 04, 2024 16:16:35.147 abstract_main(Patroni, configfile)
May 04, 2024 16:16:35.147 File "/usr/lib/python3.9/site-packages/patroni/daemon.py", line 174, in abstract_main
May 04, 2024 16:16:35.147 controller.run()
May 04, 2024 16:16:35.147 File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 192, in run
May 04, 2024 16:16:35.148 super(Patroni, self).run()
May 04, 2024 16:16:35.148 File "/usr/lib/python3.9/site-packages/patroni/daemon.py", line 143, in run
May 04, 2024 16:16:35.148 self._run_cycle()
May 04, 2024 16:16:35.148 File "/usr/lib/python3.9/site-packages/patroni/__main__.py", line 201, in _run_cycle
May 04, 2024 16:16:35.148 logger.info(self.ha.run_cycle())
May 04, 2024 16:16:35.148 File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1980, in run_cycle
May 04, 2024 16:16:35.150 info = self._run_cycle()
May 04, 2024 16:16:35.150 File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1797, in _run_cycle
May 04, 2024 16:16:35.151 return self.post_bootstrap()
May 04, 2024 16:16:35.151 File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1681, in post_bootstrap
May 04, 2024 16:16:35.153 self.cancel_initialization()
May 04, 2024 16:16:35.153 File "/usr/lib/python3.9/site-packages/patroni/ha.py", line 1674, in cancel_initialization
May 04, 2024 16:16:35.154 raise PatroniFatalException('Failed to bootstrap cluster')
May 04, 2024 16:16:35.154 patroni.exceptions.PatroniFatalException: Failed to bootstrap cluster
Sample logs from a PVC restore
May 04, 2024 16:08:59.169 /var/run/postgresql:5432 - rejecting connections
May 04, 2024 16:08:59.512 INFO: 2024/05/04 16:08:59.512364 Files will be read from storages: [default]
May 04, 2024 16:08:59.583 2024-05-04 16:08:59 UTC [2132]: db=,user=,app=,client= LOG: restored log file "00000006.history" from archive
May 04, 2024 16:08:59.938 INFO: 2024/05/04 16:08:59.937765 Files will be read from storages: [default]
May 04, 2024 16:09:00.004 ERROR: 2024/05/04 16:09:00.004412 Archive '00000007.history' does not exist.
May 04, 2024 16:09:00.018 2024-05-04 16:09:00 UTC [2132]: db=,user=,app=,client= LOG: entering standby mode
May 04, 2024 16:09:00.186 2024-05-04 16:09:00 UTC [2359]: db=[unknown],user=[unknown],app=[unknown],client=[local] LOG: connection received: host=[local]
May 04, 2024 16:09:00.186 2024-05-04 16:09:00 UTC [2359]: db=postgres,user=postgres,app=[unknown],client=[local] FATAL: the database system is starting up
May 04, 2024 16:09:00.186 /var/run/postgresql:5432 - rejecting connections
May 04, 2024 16:09:00.337 INFO: 2024/05/04 16:09:00.337368 Files will be read from storages: [default]
May 04, 2024 16:09:00.434 2024-05-04 16:09:00 UTC [2132]: db=,user=,app=,client= LOG: restored log file "00000006.history" from archive
May 04, 2024 16:09:00.447 2024-05-04 16:09:00 UTC [2132]: db=,user=,app=,client= FATAL: requested timeline 6 is not a child of this server's history
May 04, 2024 16:09:00.447 2024-05-04 16:09:00 UTC [2132]: db=,user=,app=,client= DETAIL: Latest checkpoint is at 0/78000028 on timeline 5, but in the history of the requested timeline, the server forked off from that timeline at 0/20000A0.