Multiple full backups are created daily by WAL-E (should be once per day)
On patroni-01 (the current primary), some days have more than one full backup, according to WAL-G's backup-list:
$ /usr/bin/envdir /etc/wal-g.d/env /opt/wal-g/bin/wal-g backup-list
name last_modified wal_segment_backup_start
base_0000000200025200000000A0_08419680 2020-06-04T11:19:41Z 0000000200025200000000A0
base_000000020002538E00000063_11136696 2020-06-05T11:34:04Z 000000020002538E00000063
base_0000000200025533000000EA_00861624 2020-06-06T08:41:26Z 0000000200025533000000EA
base_000000020002436C00000055_00682592 2020-06-06T19:10:01Z 000000020002436C00000055
base_00000002000255A900000014_10456328 2020-06-07T08:23:08Z 00000002000255A900000014
base_000000020002440500000024_00719944 2020-06-07T17:43:07Z 000000020002440500000024
base_000000020002562100000032_16762680 2020-06-08T09:04:03Z 000000020002562100000032
base_0000000200024482000000B6_03285384 2020-06-08T19:06:51Z 0000000200024482000000B6
base_00000002000257B200000028_02692848 2020-06-09T09:53:10Z 00000002000257B200000028
base_00000002000245C500000032_08802024 2020-06-09T19:22:33Z 00000002000245C500000032
base_000000020002598900000034_02363136 2020-06-10T09:35:12Z 000000020002598900000034
base_0000000200025B170000009B_08114072 2020-06-11T09:31:47Z 0000000200025B170000009B
base_000000020002491500000068_09930656 2020-06-11T20:54:54Z 000000020002491500000068
base_000000020002477F0000000E_05698536 2020-06-12T06:16:02Z 000000020002477F0000000E
base_0000000200025CA0000000D3_09963032 2020-06-12T09:24:05Z 0000000200025CA0000000D3
base_0000000200025DE4000000C2_06657248 2020-06-13T08:29:19Z 0000000200025DE4000000C2
base_0000000200024AA700000073_05209016 2020-06-13T12:18:22Z 0000000200024AA700000073
base_0000000200024C1F00000071_02732016 2020-06-13T17:24:00Z 0000000200024C1F00000071
base_0000000300025E52000000C6_10573912 2020-06-14T07:52:56Z 0000000300025E52000000C6
base_0000000200024CA30000009F_05090616 2020-06-14T17:52:19Z 0000000200024CA30000009F
base_0000000300025ECF0000007E_04432176 2020-06-15T08:26:57Z 0000000300025ECF0000007E
base_0000000200024D270000003E_05167904 2020-06-16T00:08:56Z 0000000200024D270000003E
base_00000003000260710000005B_00505664 2020-06-16T08:35:30Z 00000003000260710000005B
base_0000000200024EAA00000030_00465024 2020-06-17T03:40:40Z 0000000200024EAA00000030
base_00000003000261F1000000BA_15704136 2020-06-17T08:48:10Z 00000003000261F1000000BA
base_000000020002504200000090_00078520 2020-06-17T22:58:02Z 000000020002504200000090
base_00000003000263800000000D_02150264 2020-06-18T08:34:21Z 00000003000263800000000D
WAL-E's backup-list reports the same set of backups, but ordered by backup name (that is, by timeline and starting WAL segment, effectively by LSN) rather than by timestamp:
$ sudo -u gitlab-psql /usr/bin/envdir /etc/wal-e.d/env /opt/wal-e/bin/wal-e backup-list
wal_e.main INFO MSG: starting WAL-E
DETAIL: The subcommand is "backup-list".
STRUCTURED: time=2020-06-18T21:32:01.334857-00 pid=4752
name last_modified expanded_size_bytes wal_segment_backup_start wal_segment_offset_backup_start wal_segment_backup_stop wal_segment_offset_backup_stop
base_000000020002436C00000055_00682592 2020-06-06 19:10:01.227000+00:00 000000020002436C00000055 00682592
base_000000020002440500000024_00719944 2020-06-07 17:43:07.294000+00:00 000000020002440500000024 00719944
base_0000000200024482000000B6_03285384 2020-06-08 19:06:51.695000+00:00 0000000200024482000000B6 03285384
base_00000002000245C500000032_08802024 2020-06-09 19:22:33.334000+00:00 00000002000245C500000032 08802024
base_000000020002477F0000000E_05698536 2020-06-12 06:16:02.840000+00:00 000000020002477F0000000E 05698536
base_000000020002491500000068_09930656 2020-06-11 20:54:54.362000+00:00 000000020002491500000068 09930656
base_0000000200024AA700000073_05209016 2020-06-13 12:18:22.983000+00:00 0000000200024AA700000073 05209016
base_0000000200024C1F00000071_02732016 2020-06-13 17:24:00.059000+00:00 0000000200024C1F00000071 02732016
base_0000000200024CA30000009F_05090616 2020-06-14 17:52:19.916000+00:00 0000000200024CA30000009F 05090616
base_0000000200024D270000003E_05167904 2020-06-16 00:08:56.664000+00:00 0000000200024D270000003E 05167904
base_0000000200024EAA00000030_00465024 2020-06-17 03:40:40.312000+00:00 0000000200024EAA00000030 00465024
base_000000020002504200000090_00078520 2020-06-17 22:58:02.446000+00:00 000000020002504200000090 00078520
base_0000000200025200000000A0_08419680 2020-06-04 11:19:41.962000+00:00 0000000200025200000000A0 08419680
base_000000020002538E00000063_11136696 2020-06-05 11:34:04.812000+00:00 000000020002538E00000063 11136696
base_0000000200025533000000EA_00861624 2020-06-06 08:41:26.058000+00:00 0000000200025533000000EA 00861624
base_00000002000255A900000014_10456328 2020-06-07 08:23:08.202000+00:00 00000002000255A900000014 10456328
base_000000020002562100000032_16762680 2020-06-08 09:04:03.162000+00:00 000000020002562100000032 16762680
base_00000002000257B200000028_02692848 2020-06-09 09:53:10.117000+00:00 00000002000257B200000028 02692848
base_000000020002598900000034_02363136 2020-06-10 09:35:12.025000+00:00 000000020002598900000034 02363136
base_0000000200025B170000009B_08114072 2020-06-11 09:31:47.373000+00:00 0000000200025B170000009B 08114072
base_0000000200025CA0000000D3_09963032 2020-06-12 09:24:05.682000+00:00 0000000200025CA0000000D3 09963032
base_0000000200025DE4000000C2_06657248 2020-06-13 08:29:19.358000+00:00 0000000200025DE4000000C2 06657248
base_0000000300025E52000000C6_10573912 2020-06-14 07:52:56.572000+00:00 0000000300025E52000000C6 10573912
base_0000000300025ECF0000007E_04432176 2020-06-15 08:26:57.401000+00:00 0000000300025ECF0000007E 04432176
base_00000003000260710000005B_00505664 2020-06-16 08:35:30.856000+00:00 00000003000260710000005B 00505664
base_00000003000261F1000000BA_15704136 2020-06-17 08:48:10.605000+00:00 00000003000261F1000000BA 15704136
base_00000003000263800000000D_02150264 2020-06-18 08:34:21.947000+00:00 00000003000263800000000D 02150264
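For reference, each backup name encodes where the backup started: base_<24 hex digit WAL segment name>_<offset>. The first 8 hex digits of the segment name are the timeline ID, which is what matters here. Below is a minimal Python sketch for decoding a name. The helper names parse_backup_name and BackupName are hypothetical and not part of WAL-E or WAL-G; the sketch also assumes the offset suffix is a zero-padded decimal number, as it appears in the listings above.

```python
from typing import NamedTuple


class BackupName(NamedTuple):
    """Decoded parts of a base backup name (hypothetical helper type)."""
    timeline: int  # first 8 hex digits of the WAL segment name
    log: int       # middle 8 hex digits
    segment: int   # last 8 hex digits
    offset: int    # assumed to be a zero-padded decimal offset


def parse_backup_name(name: str) -> BackupName:
    """Split 'base_<24 hex digits>_<offset>' into its components."""
    parts = name.split("_")
    if len(parts) != 3 or parts[0] != "base" or len(parts[1]) != 24:
        raise ValueError(f"unexpected backup name: {name!r}")
    wal_segment, offset = parts[1], parts[2]
    return BackupName(
        timeline=int(wal_segment[0:8], 16),
        log=int(wal_segment[8:16], 16),
        segment=int(wal_segment[16:24], 16),
        offset=int(offset),
    )


if __name__ == "__main__":
    for name in (
        "base_000000020002504200000090_00078520",
        "base_00000003000263800000000D_02150264",
    ):
        print(name, "-> timeline", parse_backup_name(name).timeline)
```

Because the names are fixed-width hex, sorting them as plain strings groups all timeline 2 backups before all timeline 3 backups, which matches the ordering in the WAL-E output above.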
Notice that for 2020-06-17 there are three full backups:
base_0000000200024EAA00000030_00465024 2020-06-17T03:40:40Z 0000000200024EAA00000030
base_00000003000261F1000000BA_15704136 2020-06-17T08:48:10Z 00000003000261F1000000BA
base_000000020002504200000090_00078520 2020-06-17T22:58:02Z 000000020002504200000090
Two of them are from timeline 2, and one is from timeline 3.
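To check every day rather than eyeballing the list, something like the following Python sketch could be run against saved WAL-G backup-list output. It assumes the three-column format shown above (name, last_modified, wal_segment_backup_start) and groups by the UTC date in last_modified; backups_per_day and SAMPLE are hypothetical names used only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A few lines copied from the WAL-G listing above, used as sample input.
SAMPLE = """\
name last_modified wal_segment_backup_start
base_0000000200024EAA00000030_00465024 2020-06-17T03:40:40Z 0000000200024EAA00000030
base_00000003000261F1000000BA_15704136 2020-06-17T08:48:10Z 00000003000261F1000000BA
base_000000020002504200000090_00078520 2020-06-17T22:58:02Z 000000020002504200000090
base_00000003000263800000000D_02150264 2020-06-18T08:34:21Z 00000003000263800000000D
"""


def backups_per_day(listing: str) -> Dict[str, List[Tuple[str, int]]]:
    """Map each UTC date to the (backup name, timeline ID) pairs taken that day."""
    per_day: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a backup entry.
        if len(fields) < 3 or not fields[0].startswith("base_"):
            continue
        name, last_modified = fields[0], fields[1]
        day = last_modified.split("T")[0]
        # The timeline ID is the first 8 hex digits of the WAL segment name.
        timeline = int(name.split("_")[1][:8], 16)
        per_day[day].append((name, timeline))
    return dict(per_day)


if __name__ == "__main__":
    for day, items in sorted(backups_per_day(SAMPLE).items()):
        if len(items) > 1:
            timelines = sorted({timeline for _, timeline in items})
            print(day, len(items), "backups, timelines", timelines)
```

Any day reported with more than one timeline would be a strong hint that more than one node pushed a base backup that day.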
This looks like a problem with multiple sources: several nodes appear to be writing to the same archive, which leaves the archive in a messy state. That can cause serious problems if we ever need to perform a fresh restore from backups.
Pinging @gitlab-com/gl-infra/sre-datastores: I consider this a critical issue that needs attention ASAP.