Define supported configuration model when using decomposed database for a monolithic installation
TL;DR
Different configuration types can be used when decomposed features are in use. Depending on the model, we can use a single database or many databases.
This extends the discussion in the issues about the usage of schemas and structures:
- Use database SCHEMAs to better isolated CI decomposed features
- Decide on a single structure.sql vs ci_structure.sql
It is especially important to get a better understanding of the schema-based isolation and the proposed usage of schemas:
- #333415 (closed)
- gitlab_shared, gitlab_main, gitlab_ci
Configuration types
We can distinguish different configuration types:
- Using a single monolithic database (not migrated), with a single connection
- Using a single monolithic database (not migrated), but with separate configurations
- Using two separate logical databases
1. A single monolithic database (not migrated), with a single connection
This describes a situation of using a single database that only leverages schemas to perform logical partitioning / define boundaries between decomposed features. There are two ways to manage the structure of such a database.
This model is the least impactful, as it retains the same network and memory requirements, since we are neither opening nor maintaining new connections. Having many connections can be a problem in some installations, especially when the application connects directly to PostgreSQL without a proxy like PgBouncer.
main:
  database: gitlab_monolithic
  schema_search_path: gitlab_ci,gitlab_main,gitlab_shared
In this model we assume that there's a single structure, with many schemas, and many migrations.
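To make the effect of schema_search_path concrete, here is a minimal Python sketch of how PostgreSQL resolves an unqualified table name: it picks the first schema in the search path that contains a matching table. The table names and their placement in schemas below are hypothetical and only serve as an illustration; this issue does not define them.

```python
# Hypothetical placement of tables in schemas (an assumption for illustration).
SCHEMAS = {
    "gitlab_main": {"projects"},
    "gitlab_ci": {"ci_builds"},
    "gitlab_shared": {"schema_migrations"},
}

# The search path from the single-connection configuration above.
MONOLITHIC_PATH = "gitlab_ci,gitlab_main,gitlab_shared"


def resolve(table, search_path):
    """Return the schema-qualified name of `table`, or None if it is not visible.

    Mimics the PostgreSQL rule: the first schema in the search path
    that contains the table wins.
    """
    for schema in (s.strip() for s in search_path.split(",")):
        if table in SCHEMAS.get(schema, set()):
            return f"{schema}.{table}"
    return None


print(resolve("ci_builds", MONOLITHIC_PATH))
print(resolve("projects", MONOLITHIC_PATH))
```

Because all three schemas are on the path, every table stays reachable through the single connection without schema-qualifying its name.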
2. A single monolithic database (not migrated), but with separate configurations
This describes a situation when using a single logical database that only leverages schemas to perform logical partitioning / defining boundaries between decomposed features. The difference is that two separate connections are defined to the same logical database, but with a different visibility configured.
main:
  database: gitlab_monolithic
  schema_search_path: gitlab_main,gitlab_shared
ci:
  database: gitlab_monolithic
  schema_search_path: gitlab_ci,gitlab_shared
This model is pretty impactful:
- schema_search_path can break some queries if they are executed on main: in the wrong context. This can be avoided during a transition period by using main: schema_search_path: gitlab_ci,gitlab_main,gitlab_shared
- we effectively need to open twice as many connections, which might have negative consequences for performance if PgBouncer is not used
- we clearly indicate what is being modified where
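The connection cost mentioned above can be estimated with simple arithmetic. The sketch below assumes that every application process keeps one connection pool per configured connection and that all pools have the same size; the process count and pool size are made-up numbers, and the result is an upper bound rather than a measurement.

```python
def max_connections(processes, pool_size, connection_names):
    """Upper bound on open PostgreSQL connections.

    Assumes each process holds one pool of `pool_size`
    for every configured connection.
    """
    return processes * pool_size * len(connection_names)


# Hypothetical installation: 10 application processes, pool size of 5.
single = max_connections(10, 5, ["main"])          # one connection configured
separate = max_connections(10, 5, ["main", "ci"])  # main: and ci: configured

print(single, separate)
```

Under these assumptions the number of connections doubles as soon as a second configuration is added, even though both point at the same logical database.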
3. Using two databases
This describes a situation when using many databases, leveraging a fully decomposed architecture to increase capacity.
main:
  database: gitlab_main
  schema_search_path: gitlab_main,gitlab_shared
ci:
  database: gitlab_ci
  schema_search_path: gitlab_ci,gitlab_shared
This is the end goal for GitLab.com. We ensure that only the requested migrations are executed in a given context, so that each database's structure is well described and minimal.
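One way to picture "only the requested migrations are executed in a given context" is the following Python sketch. It assumes a hypothetical rule in which each migration is tagged with the schema it modifies and a connection runs only those migrations whose schema appears in its schema_search_path. The migration names and the tagging mechanism are invented for illustration and are not an existing GitLab or Rails API.

```python
# The two-database configuration from above, expressed as a dict.
CONFIG = {
    "main": {"database": "gitlab_main",
             "schema_search_path": "gitlab_main,gitlab_shared"},
    "ci": {"database": "gitlab_ci",
           "schema_search_path": "gitlab_ci,gitlab_shared"},
}

# Hypothetical migrations, each tagged with the schema it modifies.
MIGRATIONS = [
    ("add_index_to_projects", "gitlab_main"),
    ("create_ci_table", "gitlab_ci"),
    ("change_shared_table", "gitlab_shared"),
]


def migrations_for(connection):
    """Return the names of the migrations that apply to the given connection."""
    schemas = CONFIG[connection]["schema_search_path"].split(",")
    return [name for name, schema in MIGRATIONS if schema in schemas]


print(migrations_for("main"))
print(migrations_for("ci"))
```

In this sketch, migrations touching gitlab_shared run on both databases, while CI-only and main-only migrations stay within their own database.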
Shared tables
This is discussed in #333415 (closed)