
WIP: POC for bulk-insert API

Matthias Käppler requested to merge 36992-bulk-insert-api-poc into master

What does this MR do?

This is not fully functional yet.

As per #36992 (closed) we're looking for a cross-cutting way to introduce a bulk-insert API, so that INSERTs are batched rather than executed one by one. We already have raw, SQL-level bulk-insert functionality via Gitlab::Database::bulk_insert, but it operates at the row level and does not trigger ActiveRecord callbacks.

What we aim for here is to provide full bulk-insert functionality in ActiveRecord, including callbacks.

The current API is very simple:

MyEntity.save_all!(instances)

This will trigger 1 INSERT instead of instances.size inserts, and all callbacks will fire transactionally.

Example program:

# Use ImportFailure as the model under test; it is reopened below to add
# noisy save/commit callbacks.
Model = ImportFailure

Model.delete_all

class Model < ApplicationRecord
  before_save -> { p "model::before_save" }
  after_save -> {
    p "model::after_save"
    # create something transactionally
    OauthAccessToken.create!
  }
  after_commit -> {
    p "model::after_commit"
  }
end

proj = Project.last
m1 = Model.new(project: proj)
m2 = Model.new(project: proj)

# Both instances are persisted with a single INSERT; all callbacks still fire.
Model.save_all!(m1, m2)

SQL log:

(0.2ms)  BEGIN
  ↳ lib/gitlab/database/bulk_ops/bulk_insert_support.rb:35
  Project Load (1.4ms)  SELECT  "projects".* FROM "projects" ORDER BY "projects"."id" DESC LIMIT $1  [["LIMIT", 1]]
  ↳ i@ar-batching.rb:29
  ImportFailure Load (0.4ms)  SELECT  "import_failures".* FROM "import_failures" ORDER BY "import_failures"."id" DESC LIMIT $1  [["LIMIT", 2]]
  ↳ i@ar-batching.rb:39
   (0.3ms)  SELECT VERSION()
  ↳ lib/gitlab/database.rb:239
   (0.4ms)          INSERT INTO import_failures ("project_id", "created_at")
        VALUES (66, '2019-12-12 16:43:35.245218'), (66, '2019-12-12 16:43:35.245840')
 RETURNING id
  ↳ lib/gitlab/database.rb:191
  Doorkeeper::AccessToken Exists (0.6ms)  SELECT  1 AS one FROM "oauth_access_tokens" WHERE "oauth_access_tokens"."token" = $1 LIMIT $2  [["token", "23fed6b958cb9b3dd196c68a9e2d40c39b5271e5080a06c804752a3e0ba9c92d"], ["LIMIT", 1]]
  ↳ i@ar-batching.rb:21
  OauthAccessToken Create (0.3ms)  INSERT INTO "oauth_access_tokens" ("token", "created_at") VALUES ($1, $2) RETURNING "id"  [["token", "23fed6b958cb9b3dd196c68a9e2d40c39b5271e5080a06c804752a3e0ba9c92d"], ["created_at", "2019-12-12 16:43:35.260848"]]
  ↳ i@ar-batching.rb:21
  Doorkeeper::AccessToken Exists (0.3ms)  SELECT  1 AS one FROM "oauth_access_tokens" WHERE "oauth_access_tokens"."token" = $1 LIMIT $2  [["token", "7bd111d90906bb536456e114db3d8ba256bc3214ec90cd89d367ad494d9fcdf3"], ["LIMIT", 1]]
  ↳ i@ar-batching.rb:21
  OauthAccessToken Create (0.2ms)  INSERT INTO "oauth_access_tokens" ("token", "created_at") VALUES ($1, $2) RETURNING "id"  [["token", "7bd111d90906bb536456e114db3d8ba256bc3214ec90cd89d367ad494d9fcdf3"], ["created_at", "2019-12-12 16:43:35.264559"]]
  ↳ i@ar-batching.rb:21
   (1.1ms)  COMMIT
  ↳ lib/gitlab/database/bulk_ops/bulk_insert_support.rb:35

It accomplishes this by hooking into two ActiveRecord methods:

  • _create_record (instance method)
  • _insert_record (class method)

and rewriting the model's callback chain so that it fires as follows (a rough sketch follows the numbered list):

  1. save_all!(items)
  2. save callback chain
  3. remove after_* hooks for all items
  4. call before_* hooks for all items
  5. insert_all items
  6. call after_* hooks for all items
  7. restore callback chain
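
To make the shape of this flow concrete, here is a heavily simplified sketch of what a save_all! implementation could look like. This is not the code in this MR: it skips the callback-chain rewriting entirely (steps 2-4 and 6-7) and only shows the outer transaction, the single INSERT via the insert_all API mentioned further down, and injecting the returned IDs back into the instances. The module and method names are made up.

# Hypothetical, heavily simplified sketch (not the actual implementation in this MR).
module BulkSaveSketch
  def save_all!(*items)
    transaction do                            # step 1: outer transaction boundary
      now = Time.current
      rows = items.map do |item|
        # insert_all does not fill in timestamps the way create does,
        # so set created_at explicitly for this sketch
        item.attributes.except("id").merge("created_at" => now)
      end

      # step 5: one INSERT for all rows; RETURNING hands us the new primary keys
      # (row order matches input order on Postgres)
      ids = insert_all(rows, returning: %w[id]).rows.flatten

      items.zip(ids).each { |item, id| item.id = id }  # inject IDs back
      items
    end
  end
end

# Usage (hypothetical):
#   ImportFailure.extend(BulkSaveSketch)
#   ImportFailure.save_all!(m1, m2)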

However, there are a number of challenges to overcome, not all of which are solved yet:

  • Ensure transactionality of operations. AR provides ACID semantics for single saves, including anything that happens in callbacks. This is already solved in this POC by providing an outer transaction boundary in which everything executes. Anything that was previously transactional at the individual level now runs inside a SAVEPOINT. I already did some testing here to make sure that e.g. an exception raised in a callback rolls back the entire batch (see the first sketch after this list).
  • Ensure thread safety. This should be solved. I started out using only thread-local state, but then realized that because we rewrite callback chains at the class level, this is not sufficient. I ended up bluntly using a transaction-wide lock. This means that all bulk inserts for the same model class are now fully serialized, i.e. while one thread is waiting for Postgres we cannot start another bulk insert on the same model type. From my testing I found that even without the Mutex the transactions were fully serialized between calls, but I might have just witnessed the GVL in action.
  • after_save ordering. With bulk inserts, the single INSERT only happens after the after_save hooks have already fired on the model instances, so any attempt to access the persisted data from such a hook fails. I provided a hacky solution to this that involves first detaching and later restoring the callback chains.
  • Inject IDs back into instances. Because model IDs are not available until after all rows are inserted, we need to inject them after the fact. This is now solved.
  • Support after/before_create hooks. Done.
  • Support around_* hooks? I realized that these are poorly documented in Rails, and they seem to be defined differently from other hooks (since they must yield to the remaining callback chain). I also found 0 uses of around_(save|create) in our code base, so this might be low priority for now.
  • Support entity relations. I believe this "just works". I only tested it with a belongs_to relation in the batched model. AR handles these via special before filters that save the association along with the owning model instance, so unless we mess with the before filter chain (which we don't currently), anything happening there should be invoked as part of the outer transaction, which is what we want.
  • Limit batch size. Currently, there is no limit to how many new items you can pass to save_all!. We might queue up so many inserts that we either exceed the statement size allowed by Postgres or produce a statement that is very slow to execute. We should instead accept a batch size and perform INSERTs in groups of batch_size (see the batching sketch after this list).
  • Use a library to perform bulk inserts at the SQL level. I ended up backporting the new insert_all API from ActiveRecord 6 (a call-site example follows this list). This worked a lot better than our internal helper, and it will make for a smooth transition once we're on Rails 6.
  • Consider supporting UPDATEs. The POC only covers INSERTs so far. Internally, however, the paths ActiveRecord takes for inserts and updates are quite similar. We should look for opportunities to generalize this to any form of save.
  • Write tests. Added a fairly extensive test suite and will continue to improve it.
  • Ensure isolation of bulk inserts from normal operation. Since we interfere with AR hooks and per-item functionality when mixing in the BulkInsertSupport module, we need to verify that per-item operations like an ordinary save remain intact when called outside of save_all!. For instance, I'm not sure what would happen in this case: 1) thread A calls save_all! and blocks on I/O, yielding to thread B; 2) thread B calls save. At that point, the class's after_* callbacks have been rewritten to run delayed, but we don't want that for individual saves! The individual save will also not run in the same transaction, because a) bulk-insert state is thread-local and b) well, it's already being committed. Maybe for now we should fail fast and raise as soon as we detect that the Mutex is held (would this require the lock to be re-entrant?); see the guard sketch after this list.
  • Verify use in import/export. Making our imports faster was the primary goal. Apply bulk inserts to an import and make sure it works. Hopefully it'll be faster too.
  • Code cleanup & speed-up. Once everything else is working, we should look into making the implementation as clean and fast as we can.
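
To make the transactionality point concrete, the snippet below is plain Rails nested-transaction behaviour shown for reference, not code from this MR: once the outer save_all! transaction is open, a transaction opened by an individual record's callbacks becomes a SAVEPOINT, and an unrescued error rolls back the whole batch.

begin
  ApplicationRecord.transaction do                    # BEGIN, the outer boundary
    ImportFailure.transaction(requires_new: true) do  # SAVEPOINT, not a second BEGIN
      ImportFailure.create!(project: Project.last)    # work done inside a callback
    end
    raise "boom"                                      # e.g. a callback blowing up later
  end
rescue RuntimeError
  # everything since the outer BEGIN, including the record created inside the
  # SAVEPOINT above, has been rolled back
end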
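
For reference, this is what the backported insert_all call site looks like (standard Rails 6 API; the column values are made up), followed by a rough shape for the proposed batch-size limit. The batch_size parameter is a suggestion and does not exist in this MR yet.

ImportFailure.insert_all(
  [
    { project_id: 66, created_at: Time.current },
    { project_id: 66, created_at: Time.current }
  ],
  returning: %w[id]
)

# Sketch of the proposed batch-size limit (hypothetical parameter):
def save_all!(*items, batch_size: 1_000)
  items.each_slice(batch_size) do |slice|
    # one INSERT per slice instead of a single, arbitrarily large statement
  end
end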
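
One possible shape for the fail-fast idea in the isolation bullet is sketched below. The names are hypothetical (bulk_insert_lock is an assumed per-class Mutex, not something this MR defines); Mutex#locked? and Mutex#owned? are standard Ruby.

# Hypothetical guard: refuse an individual save while another thread holds the
# bulk-insert lock for this class.
def save(*args, &block)
  lock = self.class.bulk_insert_lock
  if lock.locked? && !lock.owned?   # held by a different thread
    raise "#{self.class} is in the middle of a bulk insert on another thread"
  end

  super
end

If the bulk-insert path itself needs to re-acquire the lock from the same thread, a Monitor (which is re-entrant, unlike Mutex) could be used instead.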

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

TODO
