Skip to content

Move registry import API calls outside of transaction

Steve Abrams requested to merge 352300-import-api-transactions into master

Context

We are preparing for Phase 2 of the Container Registry migration which involves importing all existing container repositories to the new platform (Phase 1 involved routing all new container repositories to the new platform). See &7316 (closed) for full details of how the import will work.

Rails is responsible for kicking off new imports by making API requests to the container registry to either:

  • Start a pre-import
  • Start an import

We use the state_machine gem to manage different migration_states and transitions for the ContainerRepository model.

When we have a transition from one state to importing or pre_importing, we update migration_state and a timestamp or two the record in the rails database, and then make the API request to the registry. The problem is, this all occurs in a single transaction. Making API requests inside of a transaction is not a good idea because it can potentially cause long running lock on the database.

To fix this, the state_machine gem has a handy decorator to disable transactions.

But wait a minute, you might say, we don't want to totally disable transactions, we still want them to work while we are making changes in the database, and just want them to stop when making these API requests. The good news is, this decorator use_transactions: false only applies to operations in after_transition blocks in the state machine.

What does this MR do and why?

Disables transactions in the ContainerRepository#migration_state state machine so API requests are made outside of the database transactions.

Screenshots or screen recordings

In these examples, to prove the API request was happening outside of the transaction, I updated the after_transition for pre_importing and importing to log some text:

    # ~line 152 of app/models/container_repository.rb
    after_transition any => :pre_importing do |container_repository|
      Rails.logger.debug('#' * 90)
      container_repository.try_import do
        container_repository.migration_pre_import
      end
    end

Note the API request will fail which will trigger container_repository.abort_import, causing a second transition and update the container_repository a second time.

Without the change

Without the change, we see TRANSACTION (0.2ms) BEGIN, then the #### output comes before TRANSACTION (0.8ms) COMMIT

[1] pry(main)> FactoryBot.create(:container_repository, project: Project.last)
[2] pry(main)> ContainerRepository.last.update(migration_state: 'default', migration_pre_import_started_at: nil, migration_aborted_at: nil, migration_retries_count: 0, migration_aborted_in_state: nil)
[3] pry(main)> ContainerRepository.last.start_pre_import
  ContainerRepository Load (0.6ms)  SELECT "container_repositories".* FROM "container_repositories" ORDER BY "container_repositories"."id" DESC LIMIT 1 /*application:console,db_config_name:main,line:(pry):8:in `__pry__'*/
  TRANSACTION (0.2ms)  BEGIN /*application:console,db_config_name:main,line:/lib/gitlab/database.rb:265:in `block in transaction'*/
  ContainerRepository Exists? (0.3ms)  SELECT 1 AS one FROM "container_repositories" WHERE "container_repositories"."name" = 'test_image_3' AND "container_repositories"."id" != 37 AND "container_repositories"."project_id" = 30 LIMIT 1 /*application:console,db_config_name:main,line:/lib/gitlab/database.rb:265:in `block in transaction'*/
  ContainerRepository Update (0.3ms)  UPDATE "container_repositories" SET "updated_at" = '2022-02-11 18:06:45.792133', "migration_pre_import_started_at" = '2022-02-11 18:06:45.789295', "migration_state" = 'pre_importing' WHERE "container_repositories"."id" = 37 /*application:console,db_config_name:main,line:/lib/gitlab/database.rb:265:in `block in transaction'*/
##########################################################################################
...
  ContainerRepository Exists? (0.4ms)  SELECT 1 AS one FROM "container_repositories" WHERE "container_repositories"."name" = 'test_image_3' AND "container_repositories"."id" != 37 AND "container_repositories"."project_id" = 30 LIMIT 1 /*application:console,db_config_name:main,line:/lib/gitlab/database.rb:265:in `block in transaction'*/
  ContainerRepository Update (0.6ms)  UPDATE "container_repositories" SET "migration_state" = 'import_aborted', "migration_retries_count" = 1, "updated_at" = '2022-02-11 18:06:46.274216', "migration_aborted_at" = '2022-02-11 18:06:46.271701', "migration_aborted_in_state" = 'pre_importing' WHERE "container_repositories"."id" = 37 /*application:console,db_config_name:main,line:/lib/gitlab/database.rb:265:in `block in transaction'*/
  TRANSACTION (0.8ms)  COMMIT /*application:console,db_config_name:main,line:/lib/gitlab/database.rb:298:in `commit'*/
=> true

With the change

With the change, we see a tranaction begins, but then commits before the #### output, and then a separate transaction is opened for further updates after the API request is made.

[4] pry(main)> reload!
[5] pry(main)> ContainerRepository.last.update(migration_state: 'default', migration_pre_import_started_at: nil, migration_aborted_at: nil, migration_retries_count: 0, migration_aborted_in_state: nil)
[6] pry(main)> ContainerRepository.last.start_pre_import
  ContainerRepository Load (0.5ms)  SELECT "container_repositories".* FROM "container_repositories" ORDER BY "container_repositories"."id" DESC LIMIT 1 /*application:console,db_config_name:main,line:(pry):5:in `__pry__'*/
  TRANSACTION (0.2ms)  BEGIN /*application:console,db_config_name:main,line:/app/models/container_repository.rb:257:in `start_pre_import'*/
  ContainerRepository Exists? (0.4ms)  SELECT 1 AS one FROM "container_repositories" WHERE "container_repositories"."name" = 'test_image_3' AND "container_repositories"."id" != 37 AND "container_repositories"."project_id" = 30 LIMIT 1 /*application:console,db_config_name:main,line:/app/models/container_repository.rb:257:in `start_pre_import'*/
  ContainerRepository Update (0.4ms)  UPDATE "container_repositories" SET "updated_at" = '2022-02-11 18:03:49.951791', "migration_pre_import_started_at" = '2022-02-11 18:03:49.948227', "migration_state" = 'pre_importing' WHERE "container_repositories"."id" = 37 /*application:console,db_config_name:main,line:/app/models/container_repository.rb:257:in `start_pre_import'*/
  TRANSACTION (0.3ms)  COMMIT /*application:console,db_config_name:main,line:/lib/gitlab/database.rb:298:in `commit'*/
##########################################################################################
...
  TRANSACTION (0.2ms)  BEGIN /*application:console,db_config_name:main,line:/app/models/container_repository.rb:295:in `try_import'*/
  ContainerRepository Exists? (0.3ms)  SELECT 1 AS one FROM "container_repositories" WHERE "container_repositories"."name" = 'test_image_3' AND "container_repositories"."id" != 37 AND "container_repositories"."project_id" = 30 LIMIT 1 /*application:console,db_config_name:main,line:/app/models/container_repository.rb:295:in `try_import'*/
  ContainerRepository Update (0.4ms)  UPDATE "container_repositories" SET "migration_state" = 'import_aborted', "migration_retries_count" = 1, "updated_at" = '2022-02-11 18:03:50.761083', "migration_aborted_at" = '2022-02-11 18:03:50.757984', "migration_aborted_in_state" = 'pre_importing' WHERE "container_repositories"."id" = 37 /*application:console,db_config_name:main,line:/app/models/container_repository.rb:295:in `try_import'*/
  TRANSACTION (0.4ms)  COMMIT /*application:console,db_config_name:main,line:/lib/gitlab/database.rb:298:in `commit'*/
=> true

How to set up and validate locally

  1. Ensure Docker is running locally.

  2. Enable the phase 2 feature flag:

    Feature.enable(:container_registry_migration_phase2_enabled)
  3. You can use the exact commands from the above section, starting with creating a container repository using FactoryBot. Note that each time you run ContainerRepository.last.start_pre_import, you will want to reset the attributes using the update to ensure the repository is back in default state. I also recommend adding a Rails.logger.debug as described above to be able to see where the API call is actually made in reference to the transactions.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #352300 (closed)

Edited by Steve Abrams

Merge request reports