Skip to content

Update migration helpers to use check constraints instead of change_column_null

What does this MR do?

This MR introduces new migration helpers for managing NOT NULL constraints and updates existing migration helpers to use check constraints instead of change_column_null.

Problem Description

The main issue to be addressed by this MR is described in #38358 (closed):

We want to avoid access exclusive locks in SET NOT NULL migrations.

We had a migration that used add_column_with_default and attempted to run:

ALTER TABLE 'ci_build_needs' ALTER COLUMN 'artifacts' SET NOT NULL

That locked the table for about 8-10 minutes before it completed, probably because it still had to do a full table scan of 9 million rows.

The proposal is to change how a NOT NULL constraint on a column is added.

Currently, we simply run change_column_null which results to the following stament:

ALTER TABLE 'ci_build_needs' ALTER COLUMN 'artifacts' SET NOT NULL

This has two problems:

  1. It requires an exclusive lock on the table, possibly interfering with traffic
  2. The validation can take an extended period of time (while holding the lock)

Instead we want to switch that to:

  1. Add a NOT VALID NOT NULL check constraint for the column
  2. Validate the constraint afterwards

This separates the change of the table (which requires the exclusive lock) from the validation of the constraint (which only needs a SHARE UPDATE EXCLUSIVE type of lock). The benefit is that we don't hold the exclusive lock for an extended period of time (but we still need it, to add the constraint in the first place).

Implementation details

To achieve the aforementioned solution, this MR:

  1. Adds migration helpers for managing NOT NULL constraints (add, validate, remove, check if exists)

    We only need the add_not_null_constraint to solve the problem described in the previous section, but we also add all the helpers that will allow us to manage NOT NULL constraints in various use cases.

    The most important additional option provided with those helpers is the ability to add a NOT NULL constraint in a migration with validate: false, run a data migration to clean up the column and then validate the constraint, similarly to our approach for foreign keys and text limit constraints.

  2. Update the disable_statement_timeout migration helper to allow calling it multiple times, even from inside a block of another disable_statement_timeout, without resetting the timeout before the outer block finishes.

    This is a simple but important update introduced in this MR: it allows using migration helpers that disable the statement timeout inside other migration helpers that also disable the statement timeout.

    I am going to showcase this in the following section.

  3. Update migration helpers to use add_not_null_constraint to enforce NOT NULL for existing columns instead of change_column_null

  4. Update the Database Guides with a reference to use add_not_null_constraint instead of change_column_null

    Additional updates to the Database Guides will be introduced with #216361 after this MR is merged.

The updates affect both add_column_with_default and rename_column_concurrently (as it uses create_column_for that also adds not null constraints).

Update on disable_statement_timeout

With our current implementation of disable_statement_timeout, if we were to call a helper that disables the statement timeout (e.g. add_check_constraint) inside another helper that disables the statement timeout (e.g. add_column_with_default), then when the first finishes, it resets the statement_timeout and all commands that come after it (even though we are still in a disable_statement_timeout block) run without a disabled statement timeout.

Assume the following test migration:

class DelayedValidationOfNotNull < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  DOWNTIME = false

  disable_ddl_transaction!

  def up
    log_statement_timeout('OUTSIDE of disable_statement_timeout')

    disable_statement_timeout do
      log_statement_timeout('In disable_statement_timeout')

      disable_statement_timeout do
        log_statement_timeout('In disable_statement_timeout > disable_statement_timeout')
      end

      log_statement_timeout('In disable_statement_timeout')
    end

    log_statement_timeout('OUTSIDE of disable_statement_timeout')

    raise 'stop'
  end

  def down
    # no-op
  end

  private

  def log_statement_timeout(msg)
    puts "#{msg} | statement_timeout = #{connection.select_value('SHOW statement_timeout')}"
  end
end

Our current implementation would result to the following:

OUTSIDE of disable_statement_timeout | statement_timeout = 15s
-- execute("SET statement_timeout TO 0")
In disable_statement_timeout | statement_timeout = 0
-- execute("SET statement_timeout TO 0")
In disable_statement_timeout > disable_statement_timeout | statement_timeout = 0
-- execute("RESET ALL")
In disable_statement_timeout | statement_timeout = 15s
-- execute("RESET ALL")
OUTSIDE of disable_statement_timeout | statement_timeout = 15s

So RESET ALL is running twice and everything after the first one (end of the inner disable_statement_timeout block) and the second one (end of the top level disable_statement_timeout block) runs with the default statement timeout.

After this update:

OUTSIDE of disable_statement_timeout | statement_timeout = 15s
-- execute("SET statement_timeout TO 0")
In disable_statement_timeout | statement_timeout = 0
In disable_statement_timeout > disable_statement_timeout | statement_timeout = 0
In disable_statement_timeout | statement_timeout = 0
-- execute("RESET ALL")
OUTSIDE of disable_statement_timeout | statement_timeout = 15s

That means that we can reuse and call the various migration helpers (especially the ones that add check constraints) in our other migration helpers without worrying that we'll break their behavior.

Test run showcasing that this update properly addresses the problems described

Assume the following test migration:

class DelayedValidationOfNotNull < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  DOWNTIME = false

  disable_ddl_transaction!

  def up
    add_column_with_default(
      :issues,
      :new_not_null_column,
      :boolean,
      default: false,
      allow_null: false
    )
  end

  def down
    remove_column :issues, :new_not_null_column
  end
end

Checking that this update works:

$ bundle exec rake db:migrate
== 20200501164742 DelayedValidationOfNotNull: migrating =======================
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- transaction()
-- add_column(:issues, :new_not_null_column, :boolean, {:default=>nil})
   -> 0.0021s
-- change_column_default(:issues, :new_not_null_column, false)
   -> 0.0048s
   -> 0.0081s
-- columns(:issues)
   -> 0.0023s
-- transaction_open?()
   -> 0.0000s
-- exec_query("SELECT COUNT(*) AS count FROM \"issues\"")
   -> 0.0017s
-- exec_query("SELECT \"issues\".\"id\" FROM \"issues\" ORDER BY \"issues\".\"id\" ASC LIMIT 1")
   -> 0.0005s
-- exec_query("SELECT \"issues\".\"id\" FROM \"issues\" WHERE \"issues\".\"id\" >= 1 ORDER BY \"issues\".\"id\" ASC LIMIT 1 OFFSET 23")
   -> 0.0004s
-- execute("UPDATE \"issues\" SET \"new_not_null_column\" = FALSE WHERE \"issues\".\"id\" >= 1 AND \"issues\".\"id\" < 24")
   -> 0.0030s

[ ... ... ...]

-- execute("UPDATE \"issues\" SET \"new_not_null_column\" = FALSE WHERE \"issues\".\"id\" >= 438")
   -> 0.0006s
-- transaction_open?()
   -> 0.0000s
-- execute("ALTER TABLE issues\nADD CONSTRAINT check_5463530bb7\nCHECK ( new_not_null_column IS NOT NULL )\nNOT VALID;\n")
   -> 0.0024s
-- execute("ALTER TABLE issues VALIDATE CONSTRAINT check_5463530bb7;")
   -> 0.0008s
-- execute("RESET ALL")
   -> 0.0002s
== 20200501164742 DelayedValidationOfNotNull: migrated (0.2029s) ==============


$ git diff db/structure.sql

@@ -3384,7 +3384,9 @@ CREATE TABLE public.issues (
     external_key character varying(255),
-    sprint_id bigint
+    sprint_id bigint,
+    new_not_null_column boolean DEFAULT false,
+    CONSTRAINT check_5463530bb7 CHECK ((new_not_null_column IS NOT NULL))
 );


$ gdk psql
gitlabhq_development=# \d+ issues;
...
Check constraints:
    "check_5463530bb7" CHECK (new_not_null_column IS NOT NULL)
...
gitlabhq_development=# \q

$ bundle exec rake db:rollback
== 20200501164742 DelayedValidationOfNotNull: reverting =======================
-- remove_column(:issues, :new_not_null_column)
   -> 0.0021s
== 20200501164742 DelayedValidationOfNotNull: reverted (0.0022s) ==============

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Merge request reports