WAL archiving should only happen on a single host
As we recently discovered, WAL-G is currently archiving on the primary as well as all replicas.
That is happening due to the use of archive_mode = always.
From the docs:
When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command. In addition to off, to disable, there are two modes: on, and always. During normal operation, there is no difference between the two modes, but when set to always the WAL archiver is enabled also during archive recovery or standby mode. In always mode, all files restored from the archive or streamed with streaming replication will be archived (again).
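To make the quoted semantics concrete, here is a minimal sketch that models which nodes would run the archiver under each archive_mode value. It is an illustration only: the `Node` type, the helper names, and the ten-host fleet are hypothetical, and the code does not talk to PostgreSQL or WAL-G.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one database host."""
    name: str
    in_recovery: bool  # True for a replica/standby, False for the primary


def archiver_runs(archive_mode: str, in_recovery: bool) -> bool:
    """Model of the documented archive_mode behaviour for a single node."""
    if archive_mode == "off":
        return False
    if archive_mode == "on":
        # Archiver is only active during normal operation (the primary).
        return not in_recovery
    if archive_mode == "always":
        # Archiver is also active during archive recovery or standby mode.
        return True
    raise ValueError(f"unknown archive_mode: {archive_mode!r}")


def archiving_nodes(archive_mode: str, fleet: list[Node]) -> list[str]:
    """Names of the nodes that would archive WAL under the given mode."""
    return [n.name for n in fleet if archiver_runs(archive_mode, n.in_recovery)]


# Hypothetical fleet: one primary plus nine replicas.
fleet = [Node("primary", False)] + [Node(f"replica-{i}", True) for i in range(1, 10)]

for mode in ("off", "on", "always"):
    nodes = archiving_nodes(mode, fleet)
    print(f"archive_mode = {mode}: {len(nodes)} node(s) archiving")
```

Under these assumptions, `always` makes every host in the fleet an archiver, while `on` leaves only the primary.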
Background
This was enabled in https://gitlab.com/gitlab-com/gl-infra/chef-repo/-/merge_requests/1307 as part of https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15126.
There are also some longer-term plans to move WAL archiving to replicas: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15471.
This issue is not intended to discuss whether WAL archiving should happen on the primary vs a replica. That question can be discussed in https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15471. Rather, it is about ensuring we only run archiving in a single place.
Problems
Archiving from all nodes in the fleet creates several problems for us:
- Racy behaviour: As described in https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15505#note_892969332, having WAL-G processes compete to upload the same WAL files results in races and was the trigger for https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15362.
- Wasted resources: We were previously spending resources only on the primary. Now we are spending those resources on all replicas as well. We are doing the same work multiple times. We only need to upload the WAL files once, yet we are currently uploading each one 10 times.
Proposal
I concur with the suggestion made by @hphilipps in https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15505#note_893075422:
I think simply setting archive_mode = on would be the best solution for now.
This will remove the concurrent archiving from secondaries and will leave it enabled only on the primary.
If we do decide to move archiving to a replica a la https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15471, it should be done in a way that runs it only on a single replica.
Downsides
- In case of a failover/switchover, some WAL segments may be missed and never pushed to GCS, which can impact our recovery point objective (RPO).
cc @Finotto @alexander-sosna @rhenchen.gitlab @bshah11 @nhoppe1 @marcel @NikolayS @kwanyangu @afappiano