Skip to content

Fix schema helpers for restricted migrations

What does this MR do and why?

Related to #356276 (closed)

Update a couple migration helpers for concurrent schema changes that mix DML and DDL operations. This will no longer be allowed under Gitlab::Database::Migration[2.0], due to the decomposition effort.

In this case, the helpers rely on create_column_from which creates and populates a new column from an existing column. Since update_column_in_batches is used to populate the new column in the migration, it violates the rule of running DML during a DDL-only migration.

To fix this, require the helpers to run in DDL mode, which is necessary to keep the schema in sync between multiple databases. At the same time, we can suppress the error caused by update_column_in_batches running DML to populate the column.

How to set up and validate locally

  1. Create a new migration db/post_migrate/20220412193755_rename_project_updated_at_column.rb which uses one of these helpers:

    # frozen_string_literal: true
    
    class RenameProjectUpdatedAtColumn < Gitlab::Database::Migration[2.0]
      disable_ddl_transaction!
    
      def up
        rename_column_concurrently :projects, :updated_at, :last_changed_at
      end
    
      def down
        undo_rename_column_concurrently :projects, :updated_at, :last_changed_at
      end
    end
  2. Run the migration on master, which should give this error:

    Gitlab::Database::QueryAnalyzers::RestrictAllowedSchemas::DMLNotAllowedError: Select/DML queries (SELECT/UPDATE/DELETE) are disallowed in the DDL (structure) mode. Modifying of '["projects"]' ([:gitlab_main]) with 'SELECT COUNT(*) AS count FROM "projects" /*application:web,db_config_name:main,line:/lib/gitlab/database/migration_helpers.rb:519:in `update_column_in_batches'*/'
  3. Cleanup the database by removing the column (it should only be on main since the migration failed):

    alter table projects drop column last_changed_at;
  4. Re-run the migration on this branch, and you should the migration output for both main and ci. Also, assuming your main database has data in projects and ci doesn't, you'll see the batching for update_column_in_batches:

    Output from main
    -- exec_query("SELECT COUNT(*) AS count FROM \"projects\"")
       -> 0.0055s
    -- exec_query("SELECT \"projects\".\"id\" FROM \"projects\" ORDER BY \"projects\".\"id\" ASC LIMIT 1")
       -> 0.0012s
    -- exec_query("SELECT \"projects\".\"id\" FROM \"projects\" WHERE \"projects\".\"id\" >= 1 ORDER BY \"projects\".\"id\" ASC LIMIT 1 OFFSET 1")
       -> 0.0011s
    -- execute("UPDATE \"projects\" SET \"last_changed_at\" = \"projects\".\"updated_at\" WHERE \"projects\".\"id\" >= 1 AND \"projects\".\"id\" < 2")
       -> 0.0023s
    -- exec_query("SELECT \"projects\".\"id\" FROM \"projects\" WHERE \"projects\".\"id\" >= 2 ORDER BY \"projects\".\"id\" ASC LIMIT 1 OFFSET 1")
       -> 0.0009s
    -- execute("UPDATE \"projects\" SET \"last_changed_at\" = \"projects\".\"updated_at\" WHERE \"projects\".\"id\" >= 2 AND \"projects\".\"id\" < 3")
       -> 0.0011s
    -- exec_query("SELECT \"projects\".\"id\" FROM \"projects\" WHERE \"projects\".\"id\" >= 3 ORDER BY \"projects\".\"id\" ASC LIMIT 1 OFFSET 1")
       -> 0.0008s
    -- execute("UPDATE \"projects\" SET \"last_changed_at\" = \"projects\".\"updated_at\" WHERE \"projects\".\"id\" >= 3 AND \"projects\".\"id\" < 4")
       -> 0.0011s
    -- exec_query("SELECT \"projects\".\"id\" FROM \"projects\" WHERE \"projects\".\"id\" >= 4 ORDER BY \"projects\".\"id\" ASC LIMIT 1 OFFSET 1")
       -> 0.0009s
    -- execute("UPDATE \"projects\" SET \"last_changed_at\" = \"projects\".\"updated_at\" WHERE \"projects\".\"id\" >= 4 AND \"projects\".\"id\" < 5")
       -> 0.0011s
    -- exec_query("SELECT \"projects\".\"id\" FROM \"projects\" WHERE \"projects\".\"id\" >= 5 ORDER BY \"projects\".\"id\" ASC LIMIT 1 OFFSET 1")
       -> 0.0010s
    -- execute("UPDATE \"projects\" SET \"last_changed_at\" = \"projects\".\"updated_at\" WHERE \"projects\".\"id\" >= 5 AND \"projects\".\"id\" < 6")
       -> 0.0011s
    -- exec_query("SELECT \"projects\".\"id\" FROM \"projects\" WHERE \"projects\".\"id\" >= 6 ORDER BY \"projects\".\"id\" ASC LIMIT 1 OFFSET 1")
       -> 0.0009s
    -- execute("UPDATE \"projects\" SET \"last_changed_at\" = \"projects\".\"updated_at\" WHERE \"projects\".\"id\" >= 6 AND \"projects\".\"id\" < 7")
       -> 0.0012s
    -- exec_query("SELECT \"projects\".\"id\" FROM \"projects\" WHERE \"projects\".\"id\" >= 7 ORDER BY \"projects\".\"id\" ASC LIMIT 1 OFFSET 1")
       -> 0.0012s
    -- execute("UPDATE \"projects\" SET \"last_changed_at\" = \"projects\".\"updated_at\" WHERE \"projects\".\"id\" >= 7 AND \"projects\".\"id\" < 8")
       -> 0.0011s
    -- exec_query("SELECT \"projects\".\"id\" FROM \"projects\" WHERE \"projects\".\"id\" >= 8 ORDER BY \"projects\".\"id\" ASC LIMIT 1 OFFSET 1")
       -> 0.0009s
    -- execute("UPDATE \"projects\" SET \"last_changed_at\" = \"projects\".\"updated_at\" WHERE \"projects\".\"id\" >= 8")
    Output from ci
    -- exec_query("SELECT COUNT(*) AS count FROM \"projects\"")
    -> 0.0055s

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Patrick Bair

Merge request reports