
Introduce simple ActiveRecord-based bulk insert functionality

Problem to solve

We have prior discussion about bulk inserts: #36992 (comment 271731371).

This applies to the whole application, but specifically to the import process:

  1. We insert a number of simple AR objects,
  2. We need to run the insert via the AR object, due to validations,
  3. We insert them one by one, which makes the process slow.

Examples where bulk insert would help:

  1. MergeRequestDiffCommit and MergeRequestDiffFile: we can insert a few hundred for a single relation,
  2. Notes on issues and merge requests: as above, we can insert a few hundred for a single issue or merge request.

We already do bulk inserts in some cases, but each is a very specific implementation:

  1. GitHub Importer: lib/gitlab/import/merge_request_helpers.rb: insert_or_replace_git_data.

Investigation

We tried in #36992 (closed) to create an AR-based low-level implementation that would allow us to bulk_insert data. However, this proved unrealistic, as it would require heavy patching of ActiveRecord to follow the execution cycle: validations + callbacks.

Proposal

Taken from: #36992 (comment 271151982)

We need something simpler and more targeted, fixing specific relations.

Following my comment after !22783 (comment 271150808), I'm thinking that we could do something like this to have an automated way to perform bulk inserts, but done on a small scale and targeting specific relations ONLY:

What I'm really saying is that:

  1. If we disallow callbacks/validations on some models,
  2. We could gather them,
  3. We could bulk insert them in batches, once every N objects.
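
To make the "gather, then insert every N objects" idea concrete, here is a minimal sketch in plain Ruby. `BulkBuffer`, its `batch_size` option and the flush block are hypothetical names used only for illustration; in the real thing the block would presumably call the model's bulk insert instead of collecting batches in an array.

```ruby
# Hypothetical helper: gathers rows and flushes them in fixed-size batches.
class BulkBuffer
  # batch_size: how many rows to gather before flushing (assumed default).
  # The block receives each batch; in the proposal it would do the insert.
  def initialize(batch_size: 100, &flusher)
    raise ArgumentError, 'a flush block is required' unless flusher

    @batch_size = batch_size
    @flusher = flusher
    @pending = []
  end

  # Queue one row and flush as soon as a full batch is available.
  def <<(row)
    @pending << row
    flush if @pending.size >= @batch_size
    self
  end

  # Flush whatever is still queued (call once at the end of the import).
  def flush
    return if @pending.empty?

    batch = @pending
    @pending = []
    @flusher.call(batch)
  end

  def pending_count
    @pending.size
  end
end

# Usage: record the batches instead of hitting a database.
batches = []
buffer = BulkBuffer.new(batch_size: 3) { |batch| batches << batch }
7.times { |i| buffer << { id: i } }
buffer.flush # pushes out the final, partial batch
```

The explicit final `flush` matters: without it the last partial batch would never be written.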

We could simply target specific objects:

module WithBulkInsertableModels
  extend ActiveSupport::Concern

  included do
    # Flush queued items once the owning record has been saved.
    after_save :bulk_insert
  end

  def supports_bulk_insert?(reflection_name)
    reflection = self.class.reflect_on_association(reflection_name)
    # `klass` is the model class the association points to.
    reflection.klass < BulkInsertable
  end

  def append_to_bulk_insert(reflection_name, items)
    reflection = self.class.reflect_on_association(reflection_name)
    raise 'Does not support bulk insert' unless reflection.klass < BulkInsertable

    @model_bulk_inserts ||= {}
    @model_bulk_inserts[reflection] ||= []
    @model_bulk_inserts[reflection] += items
  end

  def bulk_insert
    # Nothing was queued for this record.
    return unless @model_bulk_inserts

    @model_bulk_inserts.each do |reflection, items|
      reflection.klass.bulk_insert(items)
    end

    @model_bulk_inserts = nil
  end
end

module BulkInsertable
  extend ActiveSupport::Concern

  # disallow before_save/after_save
  # disallow before_validation

  class_methods do
    def bulk_insert(items)
      ...
    end
  end
end

class MergeRequestDiff
  include WithBulkInsertableModels

  has_many :merge_request_diff_commits
end

class MergeRequestDiffCommit
  include BulkInsertable
end

class RelationTreeRestorer
  def transform_sub_relations!(subject, data_hash, sub_relation_key, sub_relation_definition)
    ...

    if subject.respond_to?(:supports_bulk_insert?) && subject.supports_bulk_insert?(sub_relation_key)
      subject.append_to_bulk_insert(sub_relation_key, sub_data_hash)
      data_hash.delete(sub_relation_key)
    elsif sub_data_hash
      data_hash[sub_relation_key] = sub_data_hash
    else
      data_hash.delete(sub_relation_key)
    end
  end
end
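
The body of `bulk_insert(items)` is left open above. As a rough sketch of one option, a single multi-row INSERT could be built from the gathered attribute hashes. `MultiRowInsert.build` is a hypothetical helper, not an ActiveRecord API, and the table and column names are only illustrative. It does not quote identifiers, so a real version would need the adapter's quoting. On Rails 6 and later, the built-in `insert_all` (which also skips validations and callbacks) may be a simpler fit.

```ruby
# Hypothetical helper, not part of ActiveRecord: builds one multi-row INSERT
# with `?` placeholders plus the flat list of bind values.
module MultiRowInsert
  module_function

  def build(table, rows)
    raise ArgumentError, 'rows must not be empty' if rows.empty?

    columns = rows.first.keys
    # Every row must carry the same columns in the same order,
    # otherwise values would end up under the wrong column.
    unless rows.all? { |row| row.keys == columns }
      raise ArgumentError, 'all rows must have the same columns'
    end

    row_placeholders = "(#{(['?'] * columns.size).join(', ')})"
    sql = "INSERT INTO #{table} (#{columns.join(', ')}) " \
          "VALUES #{([row_placeholders] * rows.size).join(', ')}"

    [sql, rows.flat_map(&:values)]
  end
end

# Usage with illustrative data: two rows become one statement.
sql, binds = MultiRowInsert.build(
  'merge_request_diff_commits',
  [
    { merge_request_diff_id: 1, relative_order: 0, sha: 'abc' },
    { merge_request_diff_id: 1, relative_order: 1, sha: 'def' }
  ]
)
```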

The result is quite simple and maintainable:

  1. we ensure that some of the models cannot have complex validations/callbacks,
  2. which ensures that we can raw-insert them, making bulk insert safe for those models,
  3. we can re-use this elsewhere if needed; for now we use it only for import/export,
  4. this can become our consistent, more structured way to perform bulk inserts across the application.
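
One possible way to enforce point 1 is to make the callback macros themselves raise on bulk-insertable models. The sketch below is plain Ruby and only an assumption about how this could look: `RejectsCallbacks`, `CallbackNotAllowed` and `FakeRecord` are hypothetical names, and `FakeRecord` merely stands in for an ActiveRecord model so the example runs without Rails. It would not catch callbacks declared before the module is included, or ones registered through other means.

```ruby
# Hypothetical guard: a model that includes this module can no longer
# declare the listed callbacks.
module RejectsCallbacks
  FORBIDDEN = %i[before_validation before_save after_save].freeze

  CallbackNotAllowed = Class.new(StandardError)

  module ClassMethods
    FORBIDDEN.each do |hook|
      # Shadow the class-level macro so that calling it fails loudly.
      define_method(hook) do |*_args, &_block|
        raise CallbackNotAllowed,
              "#{name} is bulk-insertable and cannot declare #{hook}"
      end
    end
  end

  def self.included(base)
    base.extend(ClassMethods)
  end
end

# Stand-in for an ActiveRecord model (assumption, for illustration only).
class FakeRecord
  def self.before_save(*_args)
    :registered
  end
end

class PlainModel < FakeRecord
end

class BulkModel < FakeRecord
  include RejectsCallbacks
end

PlainModel.before_save(:touch_parent) # still allowed on ordinary models
# BulkModel.before_save(:touch_parent) would raise CallbackNotAllowed
```

Failing at class-definition time keeps the guarantee cheap: a model cannot quietly gain a callback that bulk insert would then skip.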

Intended users

Links / references