
Introduce simple ActiveRecord-based bulk-insert functionality

Matthias Käppler requested to merge 196844-safe-bulk-inserts into master

What does this MR do?

Adds support for bulk-inserting AR models safely.

References: #196844 (closed)

New bulk insertion API

Bulk insertions are crucial for storing large amounts of data efficiently. However, we also identified the need for them to happen safely, i.e. bulk insertions should only be available when we can guarantee that they do not cause integrity problems or violate business rules (which are often encoded in ActiveRecord validations).

This MR builds on !24168 (merged) in the following ways:

BulkInsertSafe.[bulk_insert|bulk_insert!]

These two new methods operate on sequences of ActiveRecord objects. They behave like save and save! in that they run validations and then either return a boolean indicating success (bulk_insert) or raise an exception on failure (bulk_insert!). This ensures that we never write data that would have failed validation had it been persisted via save or a similar built-in.
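As a rough illustration of the save/save! contract, here is a plain-Ruby sketch with no ActiveRecord involved. `Item`, `RecordInvalid`, `BulkInsertSketch` and the array used as a table are hypothetical stand-ins, not the MR's actual code; the sketch assumes every item is validated before anything is written, so a failed call leaves the table untouched.

```ruby
# Hypothetical error type mirroring what save! raises on invalid records.
class RecordInvalid < StandardError; end

# Hypothetical model stand-in with a single validation rule.
Item = Struct.new(:title) do
  def valid?
    !title.to_s.empty?
  end
end

module BulkInsertSketch
  # Bang variant: like save!, raise if any item is invalid.
  # All items are checked before the first row is written.
  def self.bulk_insert!(items, table)
    invalid = items.reject(&:valid?)
    raise RecordInvalid, "#{invalid.size} invalid item(s)" unless invalid.empty?

    items.each { |item| table << item.to_h }
    true
  end

  # Non-bang variant: like save, report failure as false instead of raising.
  def self.bulk_insert(items, table)
    bulk_insert!(items, table)
  rescue RecordInvalid
    false
  end
end
```

Calling `BulkInsertSketch.bulk_insert([Item.new("a"), Item.new("")], table)` returns false and writes nothing, while the bang variant raises for the same input.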

Internally, these calls rely on InsertAll, which is new in ActiveRecord 6 and inserts hashes in bulk but does not run validations. Building on InsertAll, and running validations, are the two primary differences from the existing Database.bulk_insert helper.
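The layering can be sketched in plain Ruby as well. `FakeTable#insert_all` is a hypothetical stand-in for a hash-based bulk insert such as InsertAll or Database.bulk_insert: it accepts any rows it is given. `validated_insert` (also hypothetical) shows the extra step the new API adds on top: checking model instances before turning them into plain hashes.

```ruby
# Hypothetical stand-in for a hash-based bulk insert (InsertAll,
# Database.bulk_insert): it writes whatever rows it receives, unvalidated.
FakeTable = Struct.new(:rows) do
  def insert_all(hashes)
    rows.concat(hashes)
  end
end

# Hypothetical model stand-in.
LabelRow = Struct.new(:name) do
  def valid?
    !name.to_s.strip.empty?
  end
end

# The extra layer: validate model instances first, then hand plain
# attribute hashes to the hash-based insert.
def validated_insert(table, models)
  return false unless models.all?(&:valid?)

  table.insert_all(models.map(&:to_h))
  true
end
```

Calling `insert_all` directly with a blank name would store the row; only the validating wrapper rejects it.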

Note that since !24168 (merged), this functionality is only available if, as the name suggests, your target model type is considered "safe for bulk insertion". These rules are currently fairly simple (they prevent certain callbacks from being registered) but can easily be extended in the future.
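One way such a rule could be enforced is to intercept callback registration. The sketch below assumes the restriction works by overriding the class-level `set_callback`, through which ActiveSupport callbacks such as `after_commit` are ultimately registered. `FakeRecord`, `BulkInsertSafeSketch`, `LabelLinkSketch` and the allow-list contents are illustrative assumptions; the actual rules from !24168 may differ.

```ruby
require "set"

# Hypothetical base class standing in for ApplicationRecord; it only
# remembers which callbacks were registered on each class.
class FakeRecord
  def self.registered_callbacks
    @registered_callbacks ||= []
  end

  def self.set_callback(name, *_args)
    registered_callbacks << name
  end
end

module BulkInsertSafeSketch
  class MethodNotAllowedError < StandardError; end

  # Assumed allow-list for illustration; the real rules may differ.
  ALLOWED_CALLBACKS = Set[:initialize, :validate, :validation, :find, :destroy].freeze

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Intercepts callback registration before it reaches the base class.
    def set_callback(name, *args)
      unless ALLOWED_CALLBACKS.include?(name)
        raise MethodNotAllowedError, "#{name} callbacks are not allowed on bulk-insert-safe models"
      end

      super
    end
  end
end

# Hypothetical model opting in; registering a validate callback is allowed.
class LabelLinkSketch < FakeRecord
  include BulkInsertSafeSketch

  set_callback(:validate)
end
```

Registering a commit callback on `LabelLinkSketch` raises, whereas classes that do not include the module are unaffected.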

The bulk_insert method takes the following arguments:

  • items (required): The ActiveRecord instances to insert
  • :batch_size (optional, default 500): Maximum number of rows inserted in a single batch
  • :validate (optional, default true): Set to false to skip validations (for instance, when you have already run them outside of this call)
  • &handle_attributes (optional): A block invoked for every attribute hash about to be inserted, which lets callers inject or transform row data before insertion
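The three options could interact roughly as follows. This is a hypothetical plain-Ruby sketch: `Row`, `sketch_bulk_insert` and the `inserter` lambda (standing in for one bulk INSERT per batch) are not part of the MR, and the sketch assumes the block mutates each attribute hash in place.

```ruby
# Hypothetical model stand-in: a struct that knows how to validate itself.
Row = Struct.new(:title) do
  def valid?
    !title.to_s.empty?
  end
end

DEFAULT_BATCH_SIZE = 500

# Hypothetical sketch of the bulk_insert options; `inserter` receives one
# array of row hashes per batch, standing in for a single bulk INSERT.
def sketch_bulk_insert(items, inserter, batch_size: DEFAULT_BATCH_SIZE, validate: true, &handle_attributes)
  # Validate everything up front unless the caller opted out.
  return false if validate && !items.all?(&:valid?)

  items.each_slice(batch_size) do |batch|
    rows = batch.map do |item|
      attributes = item.to_h
      # Assumed contract: the block mutates the hash in place.
      handle_attributes&.call(attributes)
      attributes
    end
    inserter.call(rows)
  end
  true
end
```

With `batch_size: 2`, five items are written in three calls to `inserter` of 2, 2 and 1 rows; passing `validate: false` lets invalid items through unchecked.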

Code example:

class LabelLink < ApplicationRecord
  include BulkInsertSafe
end

label_links = ... # build some label links
LabelLink.bulk_insert(label_links, batch_size: 100)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Migration path

Since this new API builds on the existing bulk-insert functionality in several ways, we should decide:

  • whether it can fully replace Database.bulk_insert
  • or whether it should live alongside it (given that it operates on AR instances rather than row hashes)
  • or whether we should first migrate to insert_all everywhere
