Introduce simple ActiveRecord-based bulk-insert functionality
Problem to solve
Bulk inserts were discussed previously in #36992 (comment 271731371).
This applies to the whole application, but specifically to the import process:
- We insert a number of simple AR objects,
- We need to run each insert through the AR object because of validations,
- We insert them one by one, which makes the process slow.
Examples where bulk insert would help:
- MergeRequestDiffCommit and MergeRequestDiffFile: we can insert a few hundred for a single relation,
- Notes on issues and merge requests: as above, we can insert a few hundred for a single issue or merge request.
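To make the cost concrete: inserting N rows one by one means N statements, while a bulk insert sends a single multi-row statement. Below is a minimal plain-Ruby sketch of building such a statement. `build_bulk_insert` is a hypothetical helper (not an existing GitLab or Rails API), and the table and column names are only illustrative.

```ruby
# Hypothetical helper: builds ONE multi-row INSERT for an array of
# attribute hashes. Values are returned separately as bind parameters
# instead of being interpolated into the SQL string.
def build_bulk_insert(table, rows)
  raise ArgumentError, 'rows must not be empty' if rows.empty?

  columns = rows.first.keys
  # One "(?, ?, ...)" tuple per row.
  tuple = "(#{Array.new(columns.size, '?').join(', ')})"
  tuples = Array.new(rows.size, tuple).join(', ')
  sql = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples}"
  # fetch_values raises KeyError if a row is missing one of the columns.
  binds = rows.flat_map { |row| row.fetch_values(*columns) }

  [sql, binds]
end

# Illustrative data only.
rows = [
  { merge_request_diff_id: 1, relative_order: 0, sha: 'aaa' },
  { merge_request_diff_id: 1, relative_order: 1, sha: 'bbb' }
]
sql, binds = build_bulk_insert('merge_request_diff_commits', rows)
```

Here two rows produce one statement with two value tuples and six bind values, rather than two separate statements.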
We already do bulk inserts in some cases, but only through a very specific implementation:
- GitHub Importer: lib/gitlab/import/merge_request_helpers.rb:insert_or_replace_git_data.
Investigation
In #36992 (closed) we tried to create an AR-based low-level implementation
that would allow us to bulk_insert data. However, this proved unrealistic, as it would require heavy
patching of ActiveRecord to follow the execution cycle: validations + callbacks.
Proposal
Taken from: #36992 (comment 271151982)
We need something simpler and more targeted that fixes specific relations.
Following my comment after !22783 (comment 271150808), I'm thinking that we could have an automated way to perform bulk inserts, but done on a small scale and targeting specific relations ONLY.
What I'm really saying is that:
- If we disallow callbacks/validations on some models,
- We could gather them,
- We could bulk insert them in batches, flushing every fixed number of objects.
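The "gather, then insert in batches" idea can be sketched in plain Ruby as follows. `BulkInsertBuffer` and its `batch_size:` parameter are hypothetical names used for illustration only; in a real implementation the block would perform the actual bulk insert.

```ruby
# Hypothetical buffer: collects items and hands them to a block in batches.
class BulkInsertBuffer
  def initialize(batch_size:, &flusher)
    @batch_size = batch_size
    @flusher = flusher
    @items = []
  end

  # Queue one item; flush automatically once a full batch is gathered.
  def add(item)
    @items << item
    flush if @items.size >= @batch_size
  end

  # Must also be called once at the end to write the final partial batch.
  def flush
    return if @items.empty?

    @flusher.call(@items)
    @items = []
  end
end

batches = []
buffer = BulkInsertBuffer.new(batch_size: 2) { |items| batches << items }
%w[a b c].each { |item| buffer.add(item) }
buffer.flush
# batches is now [["a", "b"], ["c"]]
```

The explicit final `flush` matters: without it, a trailing partial batch would never be written.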
We could simply target specific objects:
module WithBulkInsertableModels
  extend ActiveSupport::Concern

  included do
    # Flush the queued items once the owning record has been saved.
    after_save :bulk_insert
  end

  def supports_bulk_insert?(reflection_name)
    reflection = self.class.reflect_on_association(reflection_name)
    # `klass` is the model class the association points to.
    reflection.klass < BulkInsertable
  end

  def append_to_bulk_insert(reflection_name, items)
    raise 'Does not support bulk insert' unless supports_bulk_insert?(reflection_name)

    reflection = self.class.reflect_on_association(reflection_name)
    @model_bulk_inserts ||= {}
    @model_bulk_inserts[reflection] ||= []
    @model_bulk_inserts[reflection] += items
  end

  def bulk_insert
    return unless @model_bulk_inserts

    # Keys are reflections, so the target class is available directly.
    @model_bulk_inserts.each do |reflection, items|
      reflection.klass.bulk_insert(items)
    end
    @model_bulk_inserts = nil
  end
end
module BulkInsertable
  extend ActiveSupport::Concern

  # disallow before_save/after_save
  # disallow before_validation
  class_methods do
    def bulk_insert(items)
      ...
    end
  end
end
class MergeRequestDiff < ApplicationRecord
  include WithBulkInsertableModels
  has_many :merge_request_diff_commits
end
class MergeRequestDiffCommit < ApplicationRecord
  include BulkInsertable
end
class RelationTreeRestorer
  def transform_sub_relations!(subject, data_hash, sub_relation_key, sub_relation_definition)
    ...
    if subject.respond_to?(:supports_bulk_insert?) && subject.supports_bulk_insert?(sub_relation_key)
      subject.append_to_bulk_insert(sub_relation_key, sub_data_hash)
      data_hash.delete(sub_relation_key)
    elsif sub_data_hash
      data_hash[sub_relation_key] = sub_data_hash
    else
      data_hash.delete(sub_relation_key)
    end
  end
end
It gets quite simple and maintainable as a result:
- as we ensure that some of the models cannot have complex validations/callbacks,
- we ensure that we can raw-insert them, which makes them safe to insert with that model,
- we can re-use that elsewhere if needed, though for now we would use it only for import/export,
- this can be our way to provide a consistent, more structured way to perform bulk inserts across the application.
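One possible way to enforce the "no callbacks" guarantee is to make the callback macros raise on models that opt in. The sketch below is plain Ruby with no Rails dependency, so it only demonstrates the mechanism. `NoCallbacks`, `Guard` and `DemoCommit` are hypothetical names, and the guarded macro list is an assumption taken from the comments in the proposal.

```ruby
# Hypothetical guard module: a class that includes it gets class-level
# versions of the callback macros that raise instead of registering.
module NoCallbacks
  # Assumption: only the macros named in the proposal are blocked.
  BLOCKED = %i[before_save after_save before_validation].freeze

  def self.included(base)
    # Extending puts Guard ahead of inherited class methods in lookup.
    base.extend(Guard)
  end

  module Guard
    BLOCKED.each do |macro|
      define_method(macro) do |*|
        raise ArgumentError, "#{macro} is not allowed on bulk-insertable model #{name}"
      end
    end
  end
end

class DemoCommit
  include NoCallbacks
end

message =
  begin
    DemoCommit.before_save :normalize
    nil
  rescue ArgumentError => e
    e.message
  end
# message is "before_save is not allowed on bulk-insertable model DemoCommit"
```

A guard like this only catches callbacks declared after the module is included. Callbacks already registered on the class, or inherited from a parent, would need a separate check.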