Skip to content

Deprecate `local` and `sql` electors

Praefect's new per_repository election strategy elects a primary for each repository. With the new election strategy in place, we should deprecate the old strategies in %13.12 so we can begin removing them from %14.0 onwards.

The local and sql elect a primary for a virtual storage. This difference required us to add branching logic in quite a few places in the code base to handle the primaries correctly. This increases the complexity and makes maintenance more difficult. The NodeManager component used in both strategies had grown quite monolithic and began to make things quite difficult to extend and test. As such, the health checking, primary election and connection management functionalities were split in the process. Some features, such as variable replication factor, only support with repository specific primaries as otherwise all repositories would end up on the same host.

The local elector itself is not supposed to be used in production, as it elects a primary in-memory. There's no coordination between different Praefect nodes. It is used by a wide array of tests though. The repository state tracking is only implemented on top of Postgres, leading to tests using local elector not really exercising the production code.

The sql elector also works on the assumption that we may not have database records of all repositories. This makes it difficult to enforce correctness in some cases as we can't error out when a repository does not have database records. Variable replication factor also can't work without the database records. If the repository can't be expected to be present on every node, Praefect really needs to know where it is. There's also some existing open issues that are blocked on being able to expect the records to exist: #3183 (closed).

To summarize:

  1. We should remove the local elector as it doesn't support full feature set of Praefect and it shouldn't be used in production. We should update the tests using it to use the per_repository elector to make sure the tests actually test the production setup.
  2. sql should be removed and everyone migrated to per_repository elector. Having the primary per virtual storage makes some of the features like variable replication factor non-sensical. Missing database records for repositories makes it difficult to ensure correctness.

We should deprecate them in %13.12, so we can begin removal from %14.0 onwards. This requires us to add a deprecation warning in Omnibus and document the pre-requisites for migrating from sql to per_repository elector. The only pre-requisite is that the automatic migration has been run, which was included in %13.6.

As there's a lot of existing use in tests, actually removing the electors may take some time. It would be sufficient to always use per_repository strategy from %14.0 onwards unless one of the environment variables is set that GitLab Rails uses for testing Gitaly. That way the existing tests keep working until we have a chance to remove the implementations fully but all of the installations would use repository specific primaries.

/cc @mjwood

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information