Skip to content

Zero downtime database switch PoC

Adam Hegyi requested to merge zero_downtime_table_switch into master

What does this MR do?

This MR demonstrates that zero-downtime database switch is possible. Note: It's hacky!

Configuration

  • CiBase is an abstract AR class, defines the two databases (shards)
    • primary is the current database
    • ci is the new database, data from primary should be moved here.
    • Note: ci shard is not used in the application code at all.
  • CiTestTable is a model for the ci_test_table

Database configuration (config/database.yml):

development:
  primary:
    STANDARD_DEV_GDK_DB_CONFIG_COMES_HERE
  ci:
    adapter: postgresql
    encoding: unicode
    database: gitlabhq_ci_development
    user: postgres
    port: 5432
    pool: 20
    prepared_statements: false

The new database needs to be created by hand:

create database gitlabhq_ci_development;

Test script

If the new database is present and the configuration is in place, we can invoke the PoC script:

RAILS_PROFILE=true rails runner table_migration_test.rb

Note: RAILS_PROFILE is important, it doesn't work with code reloading.

What the script does:

  1. Create the DB table in both databases (primary, ci)
  2. Insert 10 rows, this will go to primary (default)
  3. In an infinite loop, run SELECT queries
  4. Get an EXCLUSIVE lock. This still allows reads but writes are blocked
  5. Copy the table data from primary to the ci DB
  6. Fix the sequence for the pkey
  7. Install a trigger to the primary DB to prevent table modification
  8. Release the lock
  9. Insert 10 records again. These will fail because of the trigger. Catch the error and reconnect to the ci DB.
  10. The retry will insert data to the ci DB
  11. Verify the data

Problem: the infinite loop that reads rows from primary DB will never switch to the new DB. Triggers are only preventing writes. We'll probably need to revoke permission or rename the table so these statements will also fail and we can retry with the new connection.

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

#333338 (closed)

Edited by Dylan Griffith

Merge request reports