Skip to content

Add migration helpers for converting int columns to bigint

Related issue: #292841 (closed)

In order to address the Primary Key overflow risk for tables with an integer PK, we need migration helpers that will allow us to convert a primary key and all the foreign keys that reference it from int to bigint.

The first step is to add a migration helper and a background migration for initializing the process:

  1. Given an integer column (e.g. events.id), create a new column (e.g. events.new_id), with bigint NOT NULL DEFAULT 0
  2. Setup a trigger to keep the two columns in sync
  3. Backfill the new column with the values of the existing column by scheduling background jobs
  4. Support for tracking the scheduled jobs through the use of Gitlab::Database::BackgroundMigrationJob

We need to do so for Primary Keys, but also for all Foreign Keys that reference them, which can be defined with or without a NOT NULL constraint and some are defined as both a Foreign Key and a Primary Key.

A nice example is the the push_event_payloads.event_id that we'll have to update for converting events.id to bigint (igitlab-org/gitlab#288004): It is a primary key with a name other than id (which a lot of our helpers require) and it is a foreign key at the same time, referencing events.id.

To address the aforementioned requirements, we add the following in this MR:

  1. A new migration helper initialize_conversion_of_integer_to_bigint that will add the column and the sync trigger and schedule the backfill jobs.

    We can not use our current migration helpers (like change_column_type_using_background_migration) as they are built to schedule the background migrations and then cleanup in one go. We need to do a couple additional steps in the next release, like adding a unique index for PKeys, add any additional indexes that include the existing index, remove the default we set, swap the table sequence end set the new column as the primary in an atomic way and more (#288005 (closed))

    On top of that, we need to add some tricks that are not supported by our current helpers, like defining the new column as a NOT NULL DEFAULT 0 and we also want to track the Background Migrations through the recently introduced BackgroundMigrationJob

    As we want to support both Primary Keys and Foreign Keys, the helper is generic (hence the name) and can support the conversion of any integer column.

  2. A new Background Migration CopyColumnUsingBackgroundMigrationJob that uses BackgroundMigrationJob to mark when it succeeds.

    It can be used with the queue_background_migration_jobs_by_range_at_intervals helper that also supports job tracking and which can not be used with CopyColumn as it has the start_id, end_id arguments in the wrong order.

    In addition, it supports sub-batching and tables with primary keys that are not called id (which is required, for example, for the push_event_payloads.event_id conersion.

This covers half the work required to support end-to-end the process of converting a primary key from int to bigint: We need to also add support for finalizing the conversion, which is covered in #292874 (closed) and will follow the successful release of this MR.

Proof Of concept

Following, we setup a migration similar to the one that we are going to ship for #288004 (closed), with some additional columns updated to showcase and test against all possible cases.

# frozen_string_literal: true

class TestInitConversionOfIntToBigint < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  DOWNTIME = false

  disable_ddl_transaction!

  def up
    # Primary Key
    initialize_conversion_of_integer_to_bigint :events, :id, batch_size: 300, sub_batch_size: 100

    # FK that can be NULL
    initialize_conversion_of_integer_to_bigint :events, :project_id, batch_size: 100, sub_batch_size: 50

    # NOT NULL FK
    initialize_conversion_of_integer_to_bigint :events, :author_id

    # Primary key other than :id
    # and at the same time NOT NULL foreign key that references events.id
    # (will be included in the events migration)
    initialize_conversion_of_integer_to_bigint :push_event_payloads, :event_id, primary_key: :event_id
  end

  def down
    remove_rename_triggers_for_postgresql(:events, rename_trigger_name(:events, :id, :id_convert_to_bigint))
    remove_rename_triggers_for_postgresql(:events, rename_trigger_name(:events, :project_id, :project_id_convert_to_bigint))
    remove_rename_triggers_for_postgresql(:events, rename_trigger_name(:events, :author_id, :author_id_convert_to_bigint))
    remove_rename_triggers_for_postgresql(:push_event_payloads, rename_trigger_name(:push_event_payloads, :event_id, :event_id_convert_to_bigint))

    remove_column :events, :id_convert_to_bigint
    remove_column :events, :project_id_convert_to_bigint
    remove_column :events, :author_id_convert_to_bigint
    remove_column :push_event_payloads, :event_id_convert_to_bigint
  end
end

Click for the Migration output
$ bundle exec rake db:migrate

== 20201211140738 TestInitConversionOfIntToBigint: migrating ==================
-- table_exists?(:events)
   -> 0.0008s
-- column_exists?(:events, :id)
   -> 0.0016s
-- column_exists?(:events, :id)
   -> 0.0007s
-- columns(:events)
   -> 0.0009s
-- add_column(:events, "id_convert_to_bigint", :bigint, {:default=>0, :null=>false})
   -> 0.0015s
-- quote_table_name(:events)
   -> 0.0000s
-- quote_column_name(:id)
   -> 0.0000s
-- quote_column_name("id_convert_to_bigint")
   -> 0.0000s
-- execute("CREATE OR REPLACE FUNCTION trigger_69523443cc10()\nRETURNS trigger AS\n$BODY$\nBEGIN\n  NEW.\"id_convert_to_bigint\" := NEW.\"id\";\n  RETURN NEW;\nEND;\n$BODY$\nLANGUAGE 'plpgsql'\nVOLATILE\n")
   -> 0.0016s
-- execute("DROP TRIGGER IF EXISTS trigger_69523443cc10\nON \"events\"\n")
   -> 0.0004s
-- execute("CREATE TRIGGER trigger_69523443cc10\nBEFORE INSERT OR UPDATE\nON \"events\"\nFOR EACH ROW\nEXECUTE FUNCTION trigger_69523443cc10()\n")
   -> 0.0007s
-- table_exists?(:events)
   -> 0.0005s
-- column_exists?(:events, :id)
   -> 0.0008s
-- column_exists?(:events, :project_id)
   -> 0.0008s
-- columns(:events)
   -> 0.0008s
-- add_column(:events, "project_id_convert_to_bigint", :bigint, {:default=>nil})
   -> 0.0011s
-- quote_table_name(:events)
   -> 0.0000s
-- quote_column_name(:project_id)
   -> 0.0000s
-- quote_column_name("project_id_convert_to_bigint")
   -> 0.0000s
-- execute("CREATE OR REPLACE FUNCTION trigger_168230620228()\nRETURNS trigger AS\n$BODY$\nBEGIN\n  NEW.\"project_id_convert_to_bigint\" := NEW.\"project_id\";\n  RETURN NEW;\nEND;\n$BODY$\nLANGUAGE 'plpgsql'\nVOLATILE\n")
   -> 0.0005s
-- execute("DROP TRIGGER IF EXISTS trigger_168230620228\nON \"events\"\n")
   -> 0.0003s
-- execute("CREATE TRIGGER trigger_168230620228\nBEFORE INSERT OR UPDATE\nON \"events\"\nFOR EACH ROW\nEXECUTE FUNCTION trigger_168230620228()\n")
   -> 0.0005s
-- table_exists?(:events)
   -> 0.0006s
-- column_exists?(:events, :id)
   -> 0.0011s
-- column_exists?(:events, :author_id)
   -> 0.0009s
-- columns(:events)
   -> 0.0008s
-- add_column(:events, "author_id_convert_to_bigint", :bigint, {:default=>0, :null=>false})
   -> 0.0014s
-- quote_table_name(:events)
   -> 0.0000s
-- quote_column_name(:author_id)
   -> 0.0000s
-- quote_column_name("author_id_convert_to_bigint")
   -> 0.0000s
-- execute("CREATE OR REPLACE FUNCTION trigger_04418d3ccd9a()\nRETURNS trigger AS\n$BODY$\nBEGIN\n  NEW.\"author_id_convert_to_bigint\" := NEW.\"author_id\";\n  RETURN NEW;\nEND;\n$BODY$\nLANGUAGE 'plpgsql'\nVOLATILE\n")
   -> 0.0005s
-- execute("DROP TRIGGER IF EXISTS trigger_04418d3ccd9a\nON \"events\"\n")
   -> 0.0003s
-- execute("CREATE TRIGGER trigger_04418d3ccd9a\nBEFORE INSERT OR UPDATE\nON \"events\"\nFOR EACH ROW\nEXECUTE FUNCTION trigger_04418d3ccd9a()\n")
   -> 0.0005s
-- table_exists?(:push_event_payloads)
   -> 0.0005s
-- column_exists?(:push_event_payloads, :event_id)
   -> 0.0008s
-- column_exists?(:push_event_payloads, :event_id)
   -> 0.0008s
-- columns(:push_event_payloads)
   -> 0.0008s
-- add_column(:push_event_payloads, "event_id_convert_to_bigint", :bigint, {:default=>0, :null=>false})
   -> 0.0008s
-- quote_table_name(:push_event_payloads)
   -> 0.0000s
-- quote_column_name(:event_id)
   -> 0.0000s
-- quote_column_name("event_id_convert_to_bigint")
   -> 0.0000s
-- execute("CREATE OR REPLACE FUNCTION trigger_07c94931164e()\nRETURNS trigger AS\n$BODY$\nBEGIN\n  NEW.\"event_id_convert_to_bigint\" := NEW.\"event_id\";\n  RETURN NEW;\nEND;\n$BODY$\nLANGUAGE 'plpgsql'\nVOLATILE\n")
   -> 0.0005s
-- execute("DROP TRIGGER IF EXISTS trigger_07c94931164e\nON \"push_event_payloads\"\n")
   -> 0.0003s
-- execute("CREATE TRIGGER trigger_07c94931164e\nBEFORE INSERT OR UPDATE\nON \"push_event_payloads\"\nFOR EACH ROW\nEXECUTE FUNCTION trigger_07c94931164e()\n")
   -> 0.0005s
== 20201211140738 TestInitConversionOfIntToBigint: migrated (1.9054s) =========

We can see that the new columns have been added to both tables and the triggers are correctly set to sync them with the existing ones:

# \d events
                                                Table "public.events"
            Column            |           Type           | Collation | Nullable |              Default
------------------------------+--------------------------+-----------+----------+------------------------------------
 id                           | integer                  |           | not null | nextval('events_id_seq'::regclass)
 project_id                   | integer                  |           |          |
 author_id                    | integer                  |           | not null | author_id_convert_to_bigint  | bigint                   |           | not null | 0
...
 id_convert_to_bigint         | bigint                   |           | not null | 0
 project_id_convert_to_bigint | bigint                   |           |          |
 author_id_convert_to_bigint  | bigint                   |           | not null | 0
...

Triggers:
    trigger_04418d3ccd9a BEFORE INSERT OR UPDATE ON events FOR EACH ROW EXECUTE PROCEDURE trigger_04418d3ccd9a()
    trigger_168230620228 BEFORE INSERT OR UPDATE ON events FOR EACH ROW EXECUTE PROCEDURE trigger_168230620228()
    trigger_69523443cc10 BEFORE INSERT OR UPDATE ON events FOR EACH ROW EXECUTE PROCEDURE trigger_69523443cc10()


# SELECT *
   FROM events
   WHERE (id <> id_convert_to_bigint) OR (project_id <> project_id_convert_to_bigint)
     OR (author_id <>author_id_convert_to_bigint);

(0 rows)

# INSERT INTO EVENTS (project_id, author_id, created_at, updated_at, action)
   VALUES (1, 1, NOW(), NOW(), 1);
# INSERT INTO EVENTS (author_id, created_at, updated_at, action)
   VALUES (2, NOW(), NOW(), 2);

# SELECT id, project_id, author_id, id_convert_to_bigint, 
                  project_id_convert_to_bigint, author_id_convert_to_bigint
   FROM events
   ORDER BY id desc LIMIT 2;

 id  | project_id | author_id | id_convert_to_bigint | project_id_convert_to_bigint | author_id_convert_to_bigint
-----+------------+-----------+----------------------+------------------------------+-----------------------------
 457 |            |         2 |                  457 |                              |                           2
 456 |          1 |         1 |                  456 |                            1 |                           1
(2 rows)


# \d push_event_payloads

           Column           |         Type          | Collation | Nullable | Default
----------------------------+-----------------------+-----------+----------+---------
 event_id                   | integer               |           | not null |
...
 event_id_convert_to_bigint | bigint                |           | not null | 0
...
Triggers:
    trigger_07c94931164e BEFORE INSERT OR UPDATE ON push_event_payloads FOR EACH ROW EXECUTE PROCEDURE trigger_07c94931164e()

# SELECT event_id, event_id_convert_to_bigint FROM push_event_payloads;

 event_id | event_id_convert_to_bigint
----------+----------------------------
        1 |                          1
        2 |                          2
        3 |                          3

And finally, all the scheduled jobs have been tracked and their status has been updated after they were executed:

# SELECT id, status, class_name, arguments FROM background_migration_jobs;
 
id | status |              class_name               |                                         arguments
----+--------+---------------------------------------+-------------------------------------------------------------------------------------------
 33 |      0 | CopyColumnUsingBackgroundMigrationJob | [1, 300, "events", "id", "id", "id_convert_to_bigint", 100]
 34 |      0 | CopyColumnUsingBackgroundMigrationJob | [301, 457, "events", "id", "id", "id_convert_to_bigint", 100]
 35 |      0 | CopyColumnUsingBackgroundMigrationJob | [1, 100, "events", "id", "project_id", "project_id_convert_to_bigint", 50]
 36 |      0 | CopyColumnUsingBackgroundMigrationJob | [101, 200, "events", "id", "project_id", "project_id_convert_to_bigint", 50]
 37 |      0 | CopyColumnUsingBackgroundMigrationJob | [201, 300, "events", "id", "project_id", "project_id_convert_to_bigint", 50]
 38 |      0 | CopyColumnUsingBackgroundMigrationJob | [301, 400, "events", "id", "project_id", "project_id_convert_to_bigint", 50]
 39 |      0 | CopyColumnUsingBackgroundMigrationJob | [401, 457, "events", "id", "project_id", "project_id_convert_to_bigint", 50]
 40 |      0 | CopyColumnUsingBackgroundMigrationJob | [1, 457, "events", "id", "author_id", "author_id_convert_to_bigint", 1000]
 41 |      0 | CopyColumnUsingBackgroundMigrationJob | [1, 3, "push_event_payloads", "event_id", "event_id", "event_id_convert_to_bigint", 1000]

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by Yannis Roussos

Merge request reports