Skip to content

Use database SCHEMAs to better isolated CI decomposed features

Once we start decomposing features, it becomes a problem to clearly define boundaries between features to prevent code mistakes doing operations forbidden when many databases are used. This creates chaos what can be freed/removed from main database, or what foreign keys are allowed to be used.

Proposal

Usage of SCHEMA is convenient way to create logical split of tables within a single logical database. For that particular purpose we should use SCHEMA to make it explicit that tables in this SCHEMA are a separate logical structure that is migrated into another database.

Benefits of using SCHEMA:

  • clearly indicate decomposed features with a SCHEMA
  • detect foreign keys cross-joining SCHEMA
  • limit visibility of schema depending on connection used
  • safety: allow to run production in a permissive mode (seeing all schemas) to retain things just work
  • safety: allow to run development/test in an enforcing mode (limiting schema visibility) to discover cross-joining features and forbid them as part of test run
  • be able to clearly indicate data that is shared (tables required to be present in all databases) vs data owned by the main database exclusively that can be truncated/dropped in another context

Iteration plan

This plans is defining a number of incremental steps based on assumption that we are using untested feature of Rails (schema_search_path) in a highly optimised environment (PgBouncer and Load Balancer) and there's a high risk of causing a production incident.

  1. Development: Create migration create_schema(:gitlab_ci) (There's a Rails helper in ActiveRecord::Migration)
  2. Development: Update config/database.yml to define main: schema_search_path: public,gitlab_ci for main:. This needs to be populated in GDK/GCK/rspec/Omnibus/CNG.
  3. Production: Validate that a proper search_path is used on production by doing ActiveRecord::Base.connection.schema_search_path.
  4. Development: Ship migration to do ALTER TABLE ci_instance_variables SET SCHEMA gitlab_ci.
  5. Development: Model a way to indicate in a code a way to what model schema belongs.
  6. Development: Validate that ci_instance_variables is part of structure.sql.
  7. Development: Validate that ci_instance_variables is part of backup/restore (this feature defines exportable schemas) and works with project import/export.
  8. Development: Build a failsafe FF in a codebase to make it possible to disable fetching of ci_instance_variables (or make it possible to access it via explicitly defined name that includes schema gitlab_ci.ci_instance_variables) as we expect that this might break and in some cases schema visibility might be invalid.
  9. Production: Rollout feature flag and validate that instance variables can be accessed.
  10. Development: Ship a rspec code that validates that we do not define FKs that cross-join schemas.
  11. Development: Ship a code that discovers and disallows schema cross-joins (likely not possible yet).
  12. Once confirmed, this is done.

Example way to indicate schemas

# config/database.yml
production:
  main:
    schema_search_path: public,gitlab_ci

# migration
def up
  create_schema(:gitlab_ci)
  execute("ALTER TABLE ci_instance_variables SET SCHEMA gitlab_ci")
end

# Describe schema
# As we can later use that to validate cross-schema joins
class ApplicationRecord
  class_attribute :schema_name, default: :public
end

class Ci::BaseModel < ApplicationRecord
  self.schema_name = :gitlab_ci
end

# Dynamically alter schema visibility
module DatabaseSchemaHelper
  def with_search_paths(connection, search_path)
    # we need transaction to ensure that changes are local
    # this is required by PgBouncer which might change connections on each subsequent call
    connection.transaction do
      saved = connection.schema_search_path
      connection.schema_search_path = search_path
      yield
    end
  ensure
    connection.schema_search_path = saved
  end
end

Risks

There are risks associated with usage of schemas:

  • Our customers might be not allowing to create/manage schemas, which will result in failed migrations
  • Dynamically changing schema outside of transaction might change a global context, resulting in an invalid visibility for subsequent requests
  • We might find that schema visibility is not always set if used with PgBouncer, or our Load Balancer implementation
  • We might find that additional schemas are not properly exported as part of backups, or as part of structure.sql
  • Validate if PgBouncer managed connections can end-up in a state where invalid schema is used: #333415 (comment 616830582).
  • Customers using a different name for main schema, that is not public

Ideally

We would create and set visibility accordingly.

  • gitlab_shared - all tables that needs to live in every logical database, like: schema_migrations and *background_migrations
  • gitlab_main - all tables related to main database
  • gitlab_ci - all tables related to ci database

Shared tables

Migrations tables

There are number of tables that are required to be placed in both places (aka. gitlab_shared), and are required for each newly configured database:

  • schema_migrations required by Rails to know what migrations were executed
  • batched_background_migration_jobs, background_migration_jobs required by GitLab Rails to implement a method to have a predictable long running migrations

It is without a question that schema_migrations needs to be local to the database.

It is to question if background_migrations* does need to be. However, the good practice rather says that it should, since BG migrations is affecting always a specific tables for which it is to run. Keeping these tables local to what is to be modified is likely a desirable architecture. It would create a chaos to keep BG migrations only in central database

Partitioning schemas

We use a separate schema for hiding partitioned tables gitlab_partitions_dynamic and gitlab_partitions_static. This serves like a bucket for all partitioned tables.

Likely this model should be change into being explicit about decomposed partition, like:

  • gitlab_main_partitions_dynamic and gitlab_main_partitions_static
  • gitlab_ci_partitions_dynamic and gitlab_ci_partitions_static
Edited by Kamil Trzciński