Use database SCHEMAs to better isolated CI decomposed features
Once we start decomposing features, it becomes a problem to clearly define boundaries between features to prevent code mistakes doing operations forbidden when many databases are used. This creates chaos what can be freed/removed from main database, or what foreign keys are allowed to be used.
Proposal
Usage of SCHEMA is convenient way to create logical split of tables within a single logical database.
For that particular purpose we should use SCHEMA
to make it explicit that tables in this SCHEMA
are a separate logical structure that is migrated into another database.
Benefits of using SCHEMA
:
- clearly indicate decomposed features with a
SCHEMA
- detect foreign keys cross-joining
SCHEMA
- limit visibility of schema depending on connection used
- safety: allow to run production in a permissive mode (seeing all schemas) to retain things just work
- safety: allow to run development/test in an enforcing mode (limiting schema visibility) to discover cross-joining features and forbid them as part of test run
- be able to clearly indicate data that is shared (tables required to be present in all databases) vs data owned by the
main
database exclusively that can be truncated/dropped in another context
Iteration plan
This plans is defining a number of incremental steps based on assumption that we are using untested
feature of Rails (schema_search_path
) in a highly optimised environment (PgBouncer
and Load Balancer
)
and there's a high risk of causing a production incident.
- Development: Create migration
create_schema(:gitlab_ci)
(There's a Rails helper inActiveRecord::Migration
) - Development: Update
config/database.yml
to definemain: schema_search_path: public,gitlab_ci
formain:
. This needs to be populated in GDK/GCK/rspec/Omnibus/CNG. - Production: Validate that a proper
search_path
is used on production by doingActiveRecord::Base.connection.schema_search_path
. - Development: Ship migration to do
ALTER TABLE ci_instance_variables SET SCHEMA gitlab_ci
. - Development: Model a way to indicate in a code a way to what model schema belongs.
- Development: Validate that
ci_instance_variables
is part ofstructure.sql
. - Development: Validate that
ci_instance_variables
is part ofbackup/restore
(this feature defines exportable schemas) and works with projectimport/export
. - Development: Build a failsafe FF in a codebase to make it possible to disable fetching of
ci_instance_variables
(or make it possible to access it via explicitly defined name that includes schemagitlab_ci.ci_instance_variables
) as we expect that this might break and in some cases schema visibility might be invalid. - Production: Rollout feature flag and validate that instance variables can be accessed.
- Development: Ship a rspec code that validates that we do not define FKs that cross-join schemas.
- Development: Ship a code that discovers and disallows schema cross-joins (likely not possible yet).
- Once confirmed, this is done.
Example way to indicate schemas
# config/database.yml
production:
main:
schema_search_path: public,gitlab_ci
# migration
def up
create_schema(:gitlab_ci)
execute("ALTER TABLE ci_instance_variables SET SCHEMA gitlab_ci")
end
# Describe schema
# As we can later use that to validate cross-schema joins
class ApplicationRecord
class_attribute :schema_name, default: :public
end
class Ci::BaseModel < ApplicationRecord
self.schema_name = :gitlab_ci
end
# Dynamically alter schema visibility
module DatabaseSchemaHelper
def with_search_paths(connection, search_path)
# we need transaction to ensure that changes are local
# this is required by PgBouncer which might change connections on each subsequent call
connection.transaction do
saved = connection.schema_search_path
connection.schema_search_path = search_path
yield
end
ensure
connection.schema_search_path = saved
end
end
Risks
There are risks associated with usage of schemas:
- Our customers might be not allowing to create/manage schemas, which will result in failed migrations
- Dynamically changing schema outside of transaction might change a global context, resulting in an invalid visibility for subsequent requests
- We might find that schema visibility is not always set if used with PgBouncer, or our Load Balancer implementation
- We might find that additional schemas are not properly exported as part of backups, or as part of
structure.sql
- Validate if PgBouncer managed connections can end-up in a state where invalid schema is used: #333415 (comment 616830582).
- Customers using a different name for main schema, that is not
public
Ideally
We would create and set visibility accordingly.
-
gitlab_shared
- all tables that needs to live in every logical database, like:schema_migrations
and*background_migrations
-
gitlab_main
- all tables related tomain
database -
gitlab_ci
- all tables related toci
database
Shared tables
Migrations tables
There are number of tables that are required to be placed in both places (aka. gitlab_shared
), and are required for each newly configured database:
-
schema_migrations
required by Rails to know what migrations were executed -
batched_background_migration_jobs
,background_migration_jobs
required by GitLab Rails to implement a method to have a predictable long running migrations
It is without a question that schema_migrations
needs to be local to the database.
It is to question if background_migrations*
does need to be. However, the good practice
rather says that it should, since BG migrations
is affecting always a specific tables
for which it is to run. Keeping these tables local to what is to be modified is likely
a desirable architecture. It would create a chaos to keep BG migrations
only in central
database
Partitioning schemas
We use a separate schema for hiding partitioned tables gitlab_partitions_dynamic
and gitlab_partitions_static
.
This serves like a bucket for all partitioned tables.
Likely this model should be change into being explicit about decomposed partition, like:
-
gitlab_main_partitions_dynamic
andgitlab_main_partitions_static
-
gitlab_ci_partitions_dynamic
andgitlab_ci_partitions_static