Geo to use Gitaly transaction logs and WAL partition archives for replication
Context
Geo currently uses basic Git operations such as clone and fetch to synchronize Git data between the primary and secondary sites. This has served us well so far.
The Gitaly team are building two key capabilities:
- Transaction logs
- WAL partition archives
Proposal
Geo should use transaction logs and WAL partition archives to synchronize Git data between sites.
I envision this working much like PostgreSQL replication:
- Replication starts by taking a backup of the WAL partition archive on the primary and copying it to the secondary to set the baseline. This is similar to pg_basebackup in the PostgreSQL world.
- The primary site generates transaction logs for the changes made there, and the secondary site consumes them to bring itself up to date.
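To make the two steps concrete, here is a minimal in-memory sketch of baseline-then-replay. It does not use any Gitaly or Geo API: the `Primary`, `Secondary`, and `LogEntry` classes, the `lsn` field, and the idea that a log entry is a single reference update are all assumptions made for illustration. Real transaction logs would carry more than reference updates.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogEntry:
    lsn: int      # hypothetical log sequence number, strictly increasing
    ref: str      # Git reference being updated
    new_oid: str  # object ID the reference points to after the change


@dataclass
class Primary:
    refs: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, ref, new_oid):
        # Every change is recorded in the log as well as applied locally.
        entry = LogEntry(lsn=len(self.log) + 1, ref=ref, new_oid=new_oid)
        self.refs[ref] = new_oid
        self.log.append(entry)
        return entry

    def base_backup(self):
        # Snapshot of the state plus the log position it is consistent with.
        return dict(self.refs), len(self.log)

    def entries_after(self, lsn):
        return [e for e in self.log if e.lsn > lsn]


@dataclass
class Secondary:
    refs: dict = field(default_factory=dict)
    applied_lsn: int = 0

    def restore(self, snapshot, lsn):
        # Step 1: set the baseline from the backup.
        self.refs = dict(snapshot)
        self.applied_lsn = lsn

    def apply(self, entries):
        # Step 2: replay log entries in order on top of the baseline.
        for e in entries:
            if e.lsn <= self.applied_lsn:
                continue  # already applied, so replaying is harmless
            if e.lsn != self.applied_lsn + 1:
                raise ValueError(
                    f"gap in log: expected {self.applied_lsn + 1}, got {e.lsn}"
                )
            self.refs[e.ref] = e.new_oid
            self.applied_lsn = e.lsn


primary = Primary()
primary.write("refs/heads/main", "a1")
snapshot, lsn = primary.base_backup()

# Changes made after the backup exist only in the log.
primary.write("refs/heads/main", "b2")
primary.write("refs/heads/feature", "c3")

secondary = Secondary()
secondary.restore(snapshot, lsn)
secondary.apply(primary.entries_after(secondary.applied_lsn))
print(secondary.refs == primary.refs, secondary.applied_lsn)
```

The point of the sketch is that the secondary only needs to remember one position (`applied_lsn`) to know what to request next, and that a gap in the log is detectable rather than silently skipped.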
Discuss
- What are the advantages and disadvantages of this approach?
- How large an effort would this be?
- Are the advantages sufficient to justify the effort?
- What additional capabilities are required from Gitaly to support this workflow? For example, is per-secondary queuing needed, as there is for PostgreSQL?
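As a starting point for the per-secondary queuing question, the sketch below shows one possible shape for it: the primary tracks how far each secondary has consumed the log and only prunes entries that every secondary has acknowledged. This is loosely modelled on how PostgreSQL retains WAL per standby. The `LogRetention` class and its method names are hypothetical and are not an existing Gitaly capability.

```python
class LogRetention:
    """Tracks, per secondary, the highest log position it has acknowledged."""

    def __init__(self):
        self.acked = {}  # secondary name -> highest acknowledged position

    def register(self, secondary, lsn=0):
        self.acked[secondary] = lsn

    def acknowledge(self, secondary, lsn):
        # Positions only move forward; a stale acknowledgement is ignored.
        self.acked[secondary] = max(self.acked[secondary], lsn)

    def safe_to_prune_up_to(self):
        # Entries at or below this position have been consumed by every
        # registered secondary. With none registered, conservatively return 0.
        return min(self.acked.values(), default=0)


retention = LogRetention()
retention.register("secondary-a")
retention.register("secondary-b")
retention.acknowledge("secondary-a", 42)
retention.acknowledge("secondary-b", 17)
print(retention.safe_to_prune_up_to())  # 17
```

The trade-off this exposes is that a slow or offline secondary holds back pruning for everyone, so some retention limit would probably be needed as well.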
Edited by Sampath Ranasinghe