Data source: WAL-G (workaround) - override the value of "BackupName" using the backup-list command to get the name of the last backup
Goal
Relying on modification time may lead to unexpected results (issues with backup-list, and backup-fetch .. LATEST)
See details: https://github.com/wal-g/wal-g/issues/694
What we have:
- daily full backups stored in GCS
- older backups are automatically moved to "colder" storage, which updates their modification time, so they show up at the tail of the list
Example:
sudo /usr/bin/envdir /etc/wal-g.d/env /opt/wal-g/bin/wal-g backup-list
name modified wal_segment_backup_start
...
base_000000060008168A000000A3 2022-05-11T10:47:12Z 000000060008168A000000A3
base_000000060007EBF40000001D 2022-05-11T12:54:02Z 000000060007EBF40000001D
base_000000050007A326000000AF 2022-05-11T14:06:02Z 000000050007A326000000AF
This behavior has been fixed only for backup-list with the --detail option, in WAL-G version 1.1.
Example:
sudo /usr/bin/envdir /etc/wal-g.d/env /opt/wal-g/bin/wal-g backup-list --detail | grep "2022-05-11"
name modified wal_segment_backup_start start_time finish_time hostname data_dir pg_version start_lsn finish_lsn is_permanent
base_000000050007A326000000AF 2022-05-11T14:06:02Z 000000050007A326000000AF 2022-04-01T00:00:02Z 2022-04-01T10:03:59Z patroni-v12-06-db-gprd /var/lib/postgresql/data 120007 2149711377183576 2150945156759456 false
base_000000060007EBF40000001D 2022-05-11T12:54:02Z 000000060007EBF40000001D 2022-04-27T00:00:03Z 2022-04-27T10:56:39Z patroni-v12-09-db-gprd /var/lib/postgresql/data 120007 2229758528203736 2231447410651736 false
base_000000060008168A000000A3 2022-05-11T10:47:12Z 000000060008168A000000A3 2022-05-11T00:00:03Z 2022-05-11T10:47:11Z patroni-v12-10-db-gprd /var/lib/postgresql/data 120007 2276584509839464 2277945372971768 false
But not for backup-fetch.
As a result, backup-fetch .. LATEST may select the wrong backup (not the most recent one). The sync instance then has to replay all the archived WALs created since that older backup, which takes a long time and can cause a huge lag.
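To make the mismatch concrete, here is a small sketch that reuses the three rows from the first backup-list example above. It compares the backup that ends up last under modification-time ordering with the one that ends up last when rows are ordered by backup name. The variable names are illustrative, and ordering by name is an assumption that holds only because the names encode the timeline and WAL segment as fixed-width hexadecimal.

```shell
# Sample rows copied from the backup-list output above (ordered by modification time).
listing='base_000000060008168A000000A3 2022-05-11T10:47:12Z 000000060008168A000000A3
base_000000060007EBF40000001D 2022-05-11T12:54:02Z 000000060007EBF40000001D
base_000000050007A326000000AF 2022-05-11T14:06:02Z 000000050007A326000000AF'

# Last row under modification-time ordering: an old backup that was moved to colder storage.
by_modtime=$(printf '%s\n' "$listing" | tail -n 1 | awk '{print $1}')

# Last row when ordered by name (byte-wise comparison of the first column only).
by_name=$(printf '%s\n' "$listing" | LC_ALL=C sort -k1,1 | tail -n 1 | awk '{print $1}')

echo "by modtime: $by_modtime"   # base_000000050007A326000000AF
echo "by name:    $by_name"      # base_000000060008168A000000A3
```

The name-ordered result matches the backup whose start_time is 2022-05-11 in the --detail output, which is the one that should be restored.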
TODO / How to implement
Nik: we cannot use "backup-fetch .. LATEST" right now.
Instead, we are going to explicitly order the list of backups by LSN and choose the latest one.
This is because old backups are moved to another storage, which changes their modification time
(see https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10684)
and therefore leads to a messy list of backups and wrong behavior of LATEST.
Once this issue is fixed in WAL-G, we can return to using LATEST
(see https://github.com/wal-g/wal-g/issues/694).
If the option retrieval.spec.physicalRestore.options.walg.backupName is specified with the value LATEST, use this command to get the name of the last backup, and use that name in the wal-g backup-fetch command instead of "LATEST":
wal-g backup-list | grep base | sort -nk1 | tail -1 | awk '{print $1}'
Or, for WAL-G 1.1 and higher:
wal-g backup-list --detail | tail -1 | awk '{print $1}'
However, the "--detail" option is a little slower when there are many backups, so the first command may be preferable.
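The override could be wrapped roughly as follows. This is a minimal sketch, not the DLE implementation: the function names pick_latest_backup and resolve_backup_name are hypothetical, the data directory in the usage comment is an assumption, and the sort uses a plain byte-wise comparison of the name column instead of the numeric sort shown above, on the assumption that backup names are fixed-width hexadecimal.

```shell
#!/bin/sh

# Hypothetical helper: read `wal-g backup-list` output on stdin and print the
# name that sorts last. Header and other non-backup lines are dropped by grep.
pick_latest_backup() {
  grep '^base_' | LC_ALL=C sort -k1,1 | tail -n 1 | awk '{print $1}'
}

# Hypothetical helper: replace LATEST with an explicit backup name and pass
# any other requested name through unchanged.
resolve_backup_name() {
  requested_name="$1"
  if [ "$requested_name" = "LATEST" ]; then
    wal-g backup-list | pick_latest_backup
  else
    printf '%s\n' "$requested_name"
  fi
}

# Usage (the target directory is an assumption):
#   backup_name=$(resolve_backup_name "LATEST")
#   wal-g backup-fetch /var/lib/postgresql/data "$backup_name"
```

Keeping the selection logic in a function that reads from stdin makes it possible to check it against a saved backup-list output without access to the storage.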
Acceptance criteria
- DLE overrides the value of "BackupName" using the backup-list command to get the name of the last backup if BackupName = LATEST