Build a scalable, self-service geo replication and verification framework

Hand some data to an engineer and they will not replicate it. Teach an engineer how to replicate data and customer happiness shall be with you. - The Wise Geologist

Introduction

There are several discussions within the Geo team regarding the limitations of our current replication system. As of October 2019 only ~50% of data types (we need a better name) are replicated, and of those only ~41% are fully verified. This is known, and we have tried to change the situation by replicating the remaining data types and verifying them. Those efforts taught us that replicating data types is hard, and so is verifying them.

There are several technical reasons for this, including the current architecture, the differences between data types, and the use of FDW in combination with selective sync; however, these are not solely responsible for the difficulties. GitLab is growing rapidly and teams across the organisation are adding features at a rapid pace. Many of these features add new data types and are not initially designed to be easy to replicate; for example, GitLab Pages, server-side hooks, and Design Repositories. This is very likely due to a lack of knowledge about Geo and because Geo is not considered during initial designs. Engineers across the company are not empowered to easily support geo replication, and consequently the Geo team becomes a bottleneck.

In order to address both the technical challenges and the operational limitations, I propose to build a new geo replication and verification framework with the explicit goal of enabling teams across GitLab to add new data types in a way that supports geo replication out of the box. It should be incredibly easy for engineers to do the right thing. The Geo team should develop the framework and offer support to the organisation but would no longer be responsible for most of the implementation.

Problem to solve

Geo is usually not considered by other teams when implementing features
Customers expect new features (and their data) to be replicated either for performance or Disaster Recovery purposes
Adding new data types to Geo is hard and can only be performed by the Geo team
Verification of data types is difficult and does not perform well. Again, only the Geo team can do this
The company is growing rapidly and adds new features; the current operational mode is not scalable
Software developers across GitLab are not empowered to make their features geo-compatible

Intended users

Sasha (Software Developer)

Further details

There are currently some technical considerations on how to iterate on Geo:

Proposal

Create a Geo replication framework that is so easy to use that every software developer in GitLab can utilise it to make a new feature "Geo compatible".
The framework should abstract away much of the low-level functionality, e.g. verification, so engineers don't need to worry about it.
Create educational material, workshops etc. to teach folks how to use it.
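
To make the point about hiding verification more concrete, here is a minimal sketch of checksum-based verification behind a single call. Everything in it is an assumption for illustration: `GeoVerificationSketch`, `checksum` and `verified?` are hypothetical names, SHA-256 is an arbitrary choice, and none of this is the existing Geo implementation.

```ruby
require "digest"

# Hypothetical helper: these names do not exist in GitLab. They only
# illustrate how verification could be hidden behind one call.
module GeoVerificationSketch
  # Checksum of some data as the primary would record it (SHA-256 is assumed).
  def self.checksum(data)
    Digest::SHA256.hexdigest(data)
  end

  # A secondary's copy counts as verified when its checksum matches the primary's.
  def self.verified?(primary_checksum, secondary_data)
    primary_checksum == checksum(secondary_data)
  end
end

primary_checksum = GeoVerificationSketch.checksum("design file contents")
puts GeoVerificationSketch.verified?(primary_checksum, "design file contents") # prints true
puts GeoVerificationSketch.verified?(primary_checksum, "corrupted contents")   # prints false
```

An engineer adding a new data type would ideally never call something like this directly; the framework would run it for every declared data type.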

@toon wrote up some pseudocode of what this could look like:

class MyCoolNewFeatureModel < ApplicationRecord
  include Geo::Replicable

  geo_replicate_repository :cool_repository # name/prefix of the column(s) where Geo can find the repo

  # rest of the code unrelated to Geo
  # ...
end
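
As a rough illustration of how such a declaration could work under the hood, the sketch below implements the `geo_replicate_repository` macro from the pseudocode in plain Ruby. It is an assumption-laden stand-in rather than a proposed implementation: it drops `ApplicationRecord` so it runs without Rails, and the `registry` and `geo_replicables` methods are hypothetical names introduced only for this example.

```ruby
# Plain-Ruby stand-in for the proposed concern; no Rails required.
module Geo
  module Replicable
    # Hypothetical registry of every class that declared something replicable,
    # so the framework could enumerate data types without manual registration.
    def self.registry
      @registry ||= []
    end

    # Adds the class-level macros to any class that includes this module.
    def self.included(base)
      base.extend(ClassMethods)
    end

    module ClassMethods
      # Declarations made on this class (hypothetical accessor).
      def geo_replicables
        @geo_replicables ||= []
      end

      # Class macro from the pseudocode: records that this model owns a
      # repository Geo should replicate, found under the given name/prefix.
      def geo_replicate_repository(name)
        geo_replicables << { type: :repository, name: name }
        Geo::Replicable.registry << self unless Geo::Replicable.registry.include?(self)
      end
    end
  end
end

class MyCoolNewFeatureModel
  include Geo::Replicable

  geo_replicate_repository :cool_repository
end

# The framework can now discover what to replicate without Geo-specific code
# in the feature itself.
p MyCoolNewFeatureModel.geo_replicables
p Geo::Replicable.registry
```

The design choice worth noting is that the feature team writes one declarative line, while discovery, syncing and verification would stay inside the framework.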

Documentation

We would need to create documentation for this. The better the documentation, the more likely this is to take off.

Testing

Geo replication must be extensively tested on all levels because of its relevance for Disaster Recovery.

What does success look like, and how can we measure that?

Percentage of new features that are geo replication and DR ready out of the box (target: 80% of new features implemented by GitLab support Geo replication and DR out of the box)
Time it takes for a software developer to become productive using the Geo framework
Number of steps needed to make a new data type geo-compatible

What is the type of buyer?

Premium
Ultimate

Links / references

Edited Oct 23, 2019 by Fabian Zimmer