Define the supported configuration models when using a decomposed database for a monolithic installation

TL;DR

There are different configuration types that can be used when decomposed features are in use. Depending on the model chosen, we can use a single database or many databases.

This extends the discussions in the issues covering the usage of schemas and structures:

It is especially important to get a better understanding of schema-based isolation and the proposed usage of schemas:

#333415 (closed)
gitlab_shared, gitlab_main, gitlab_ci

Configuration types

We can distinguish different configuration types:

Using a single monolithic database (not migrated), with a single connection
Using a single monolithic database (not migrated), but with separate configurations
Using two separate logical databases

1. A single monolithic database (not migrated), with a single connection

This describes using a single database that only leverages schemas to perform logical partitioning, that is, to define boundaries between decomposed features. There are two ways we can manage the structure of such a database.

This model is the least impactful, as it retains the same network and memory requirements: we are neither opening nor maintaining new connections. Having many connections can be a problem in some installations, especially when the application connects directly to PostgreSQL without a proxy like PgBouncer.

main:
  database: gitlab_monolithic
  schema_search_path: gitlab_ci,gitlab_main,gitlab_shared

In this model we assume that there's a single structure, with many schemas, and many migrations.
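The connection argument above can be made concrete with a rough estimate. This is a minimal sketch, assuming that every application process keeps a full connection pool for each entry in the database configuration; the function name, the process count, and the pool size are illustrative and not taken from GitLab.

```python
def total_connections(processes, pool_size, configs):
    """Upper-bound estimate of PostgreSQL connections opened by the application.

    Assumption: each process opens `pool_size` connections per configured
    entry (main:, ci:, ...), with no proxy such as PgBouncer in between.
    """
    return processes * pool_size * len(configs)


# Hypothetical installation: 10 processes, 5 connections per pool.
single_connection = total_connections(10, 5, ["main"])
separate_configs = total_connections(10, 5, ["main", "ci"])
print(single_connection, separate_configs)
```

Under these assumptions, only the number of configured entries changes the total, which is why a single main: entry keeps the requirements unchanged.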

2. A single monolithic database (not migrated), but with separate configurations

This describes using a single logical database that only leverages schemas to perform logical partitioning, that is, to define boundaries between decomposed features. The difference is that two separate connections are defined to the same logical database, each configured with a different visibility.

main:
  database: gitlab_monolithic
  schema_search_path: gitlab_main,gitlab_shared
ci:
  database: gitlab_monolithic
  schema_search_path: gitlab_ci,gitlab_shared

This model is pretty impactful:

schema_search_path can break some queries if they are executed on main: in the wrong context, for example a query that touches a table in gitlab_ci
this can be avoided during a transition period by setting main: schema_search_path: gitlab_ci,gitlab_main,gitlab_shared
we effectively need to open twice as many connections, which might hurt performance if PgBouncer is not used
we clearly indicate what is being modified where
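The query breakage described in the list above follows from how PostgreSQL resolves unqualified table names: it walks the schemas in the search path in order and fails if none of them contains the table. The sketch below models that lookup in plain Python. The table-to-schema assignment is purely hypothetical and only serves the illustration.

```python
# Hypothetical assignment of tables to schemas, for illustration only.
TABLE_SCHEMAS = {
    "projects": "gitlab_main",
    "ci_builds": "gitlab_ci",
    "application_settings": "gitlab_shared",
}


def resolve(table, schema_search_path):
    """Return the schema-qualified name for an unqualified table name.

    Mimics the PostgreSQL lookup: schemas are tried in search-path order,
    and the lookup fails when no listed schema contains the table.
    """
    for schema in (s.strip() for s in schema_search_path.split(",")):
        if TABLE_SCHEMAS.get(table) == schema:
            return f"{schema}.{table}"
    raise LookupError(f'relation "{table}" does not exist')


MAIN_PATH = "gitlab_main,gitlab_shared"
TRANSITION_PATH = "gitlab_ci,gitlab_main,gitlab_shared"

print(resolve("projects", MAIN_PATH))         # visible on main:
print(resolve("ci_builds", TRANSITION_PATH))  # visible with the transition path
try:
    resolve("ci_builds", MAIN_PATH)           # CI table queried on main:
except LookupError as error:
    print(error)
```

With the transition-period search path, the same lookup succeeds, which is why widening main: is a workable stopgap.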

3. Using two databases

This describes using multiple databases, leveraging a fully decomposed architecture to increase capacity.

main:
  database: gitlab_main
  schema_search_path: gitlab_main,gitlab_shared
ci:
  database: gitlab_ci
  schema_search_path: gitlab_ci,gitlab_shared

This is the end goal for GitLab.com. We ensure that only the requested migrations are executed in a given context, so that each database's structure is well described and minimal.
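One way to picture running only the relevant migrations per context is to filter pending migrations by the schemas visible to each configuration. This is a sketch under stated assumptions, not GitLab's implementation: the Migration class, the schema tag on each migration, and the choice to run gitlab_shared migrations on both databases are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Migration:
    version: int
    schema: str  # hypothetical tag naming the schema the migration modifies


# Schemas visible to each configuration, mirroring the schema_search_path values.
CONFIG_SCHEMAS = {
    "main": ("gitlab_main", "gitlab_shared"),
    "ci": ("gitlab_ci", "gitlab_shared"),
}


def migrations_for(config_name, migrations):
    """Select the migrations whose target schema is visible to the given config."""
    visible = CONFIG_SCHEMAS[config_name]
    return [m for m in migrations if m.schema in visible]


pending = [
    Migration(1, "gitlab_main"),
    Migration(2, "gitlab_ci"),
    Migration(3, "gitlab_shared"),
]

print([m.version for m in migrations_for("main", pending)])
print([m.version for m in migrations_for("ci", pending)])
```

In this sketch, a migration that targets gitlab_ci never runs against the main: database, and the shared one runs on both.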

Shared tables

This is discussed in #333415 (closed)

Edited Jul 09, 2021 by Thong Kuah