Technical evaluation: each_batch does not work with preloaded associations
Proposal
EachBatch#each_batch does not work when called with an ActiveRecord
relation that preloads (includes) other associations:
Sbom::Occurrence.includes(:component).each_batch(of: 100) do |batch|
  batch.each do |occurrence|
    puts "Occurrence: #{occurrence.inspect}"
  end
end
Sbom::Occurrence Load (0.7ms) SELECT "sbom_occurrences"."id" FROM "sbom_occurrences" ORDER BY "sbom_occurrences"."id" ASC LIMIT 1
ActiveModel::MissingAttributeError: missing attribute: component_id
This happens because each_batch calls take, which attempts to preload the associations:
preload.each do |associations|
  ActiveRecord::Associations::Preloader.new(records: records, associations: associations, scope: scope).call
end
For example:
Sbom::Occurrence.includes(:component).select(:id).take
Sbom::Occurrence Load (1.1ms) SELECT "sbom_occurrences"."id" FROM "sbom_occurrences" LIMIT 1
ActiveModel::MissingAttributeError: missing attribute: component_id
The above fails because the relation preloads (includes) the component association, while the select(:id) statement forces only the id column to be returned from the sbom_occurrences table. Preloading the sbom_components table is then impossible, because the preloader needs the component_id of each occurrence to find the matching rows.
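The failure mode can be modelled without a database. The following is a minimal pure-Ruby sketch and not ActiveRecord itself: PartialRecord, MissingAttributeError, preload_components, and preload_result are hypothetical names that only mimic the behaviour described above, where a row loaded with a restricted column list cannot supply the foreign key that preloading needs.

```ruby
# Simplified stand-in for ActiveModel::MissingAttributeError (hypothetical).
class MissingAttributeError < StandardError; end

# A record that only holds the columns that were actually selected.
class PartialRecord
  attr_accessor :component

  def initialize(row, selected_columns)
    @attributes = row.slice(*selected_columns)
  end

  # Reading a column that was not selected raises an error.
  def [](name)
    @attributes.fetch(name) { raise MissingAttributeError, "missing attribute: #{name}" }
  end
end

# Mimics preloading a belongs_to association: the foreign key of every
# record is read in order to find the matching component.
def preload_components(records, components_by_id)
  records.each { |record| record.component = components_by_id[record[:component_id]] }
end

# Loads one hypothetical row with the given column list, then tries to preload.
# Returns the preloaded component, or the error message if preloading failed.
def preload_result(selected_columns)
  row = { id: 1, component_id: 7 }
  record = PartialRecord.new(row, selected_columns)
  preload_components([record], { 7 => 'semver' })
  record.component
rescue MissingAttributeError => e
  e.message
end

puts preload_result(%i[id component_id]) # foreign key selected
puts preload_result(%i[id])              # like select(:id): foreign key missing
```

In this model, selecting component_id alongside id is what lets the preload step succeed; with only id selected, the lookup of the foreign key is what raises.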
Background
Please see this discussion for more details.
Workarounds
Use either of the following as workarounds:
- Use in_batches(of:) instead of each_batch. However, be aware that there are downsides to using in_batches(of:), as described here: in_batches(of:) is implemented in a way that is not very efficient, both query and memory usage wise.
- Remove the preloading before calling each_batch(of:) and apply it after:
  Sbom::PossiblyAffectedOccurrencesFinder.new(package_name: 'semver', purl_type: 'npm').execute.except(:includes).each_batch(of: 100) do |batch|
    batch.with_component_source_version_project_and_pipeline.each do |sbom_occurrence|
      ...
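The second workaround separates batching from loading association data. A rough pure-Ruby sketch of that shape follows; it uses in-memory hashes instead of ActiveRecord, and SKETCH_OCCURRENCES, SKETCH_COMPONENTS, each_id_batch, and batches_with_components are hypothetical names introduced only for illustration. Batches are formed from ids alone, and components are attached inside the block, once per batch.

```ruby
# Hypothetical in-memory stand-ins for the two tables.
SKETCH_OCCURRENCES = (1..250).map { |id| { id: id, component_id: (id % 5) + 1 } }.freeze
SKETCH_COMPONENTS = (1..5).map { |id| [id, "component-#{id}"] }.to_h.freeze

# Yields batches ordered by id; the last id of a batch is the lower bound of
# the next one, loosely like each_batch walks the primary key.
def each_id_batch(rows, of:)
  last_id = nil
  loop do
    remaining = rows.select { |row| last_id.nil? || row[:id] > last_id }
    batch = remaining.sort_by { |row| row[:id] }.first(of)
    break if batch.empty?

    yield batch
    last_id = batch.last[:id]
  end
end

# Batches without any association data, then attaches components per batch.
def batches_with_components(rows, components_by_id, of:)
  result = []
  each_id_batch(rows, of: of) do |batch|
    wanted_ids = batch.map { |row| row[:component_id] }.uniq
    components = components_by_id.slice(*wanted_ids) # one lookup per batch
    result << batch.map { |row| row.merge(component: components[row[:component_id]]) }
  end
  result
end

batches = batches_with_components(SKETCH_OCCURRENCES, SKETCH_COMPONENTS, of: 100)
puts batches.map(&:size).inspect
```

Because the batching step never touches association data, it cannot fail on a missing foreign key, and the per-batch lookup keeps the association loading to one step per batch.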
Possible fixes
Previous possible fix, which doesn't work:
Add :includes to the list of methods excluded by EachBatch#each_batch:
Sbom::Occurrence.includes(:component).select(:id).except(:includes).take
Sbom::Occurrence Load (0.6ms) SELECT "sbom_occurrences"."id" FROM "sbom_occurrences" LIMIT 1
=> #<Sbom::Occurrence:0x0000000165bdfec8 id: 1>
This can be achieved by modifying EachBatch#each_batch as follows:
diff --git a/app/models/concerns/each_batch.rb b/app/models/concerns/each_batch.rb
index 945d286a2fd4..8140e7a2c85e 100644
--- a/app/models/concerns/each_batch.rb
+++ b/app/models/concerns/each_batch.rb
@@ -54,7 +54,7 @@ def each_batch(of: 1000, column: primary_key, order: :asc, order_hint: nil)
           'the column: argument must be set to a column name to use for ordering rows'
       end
 
-      start = except(:select)
+      start = except(:select, :includes)
         .select(column)
         .reorder(column => order)
 
@@ -69,7 +69,7 @@ def each_batch(of: 1000, column: primary_key, order: :asc, order_hint: nil)
       1.step do |index|
         start_cond = arel_table[column].gteq(start_id)
         start_cond = arel_table[column].lteq(start_id) if order == :desc
-        stop = except(:select)
+        stop = except(:select, :includes)
           .select(column)
           .where(start_cond)
           .reorder(column => order)
Note: upon further investigation, adding except(:includes) allows the each_batch call to execute, but it removes all of the preloads, causing N+1 queries. Using in_batches(of:) doesn't suffer from this issue, however: in_batches(of:) is implemented in a way that is not very efficient, both query and memory usage wise.
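To make the cost of losing the preloads concrete, here is a small pure-Ruby sketch that counts lookups. It is not ActiveRecord: CountingStore, names_without_preload, and names_with_preload are hypothetical names, and each call to find_all stands in for one SQL query against the components table.

```ruby
# Counts lookups against a hypothetical in-memory components table.
class CountingStore
  attr_reader :queries

  def initialize(components_by_id)
    @components_by_id = components_by_id
    @queries = 0
  end

  # One "query" that returns the components for all of the given ids.
  def find_all(ids)
    @queries += 1
    @components_by_id.slice(*ids)
  end
end

# Without preloading: every row triggers its own lookup, so a batch of N rows
# costs N extra queries on top of the query that loaded the batch.
def names_without_preload(rows, store)
  rows.map { |row| store.find_all([row[:component_id]]).values.first }
end

# With preloading: a single lookup covers the whole batch.
def names_with_preload(rows, store)
  components = store.find_all(rows.map { |row| row[:component_id] }.uniq)
  rows.map { |row| components[row[:component_id]] }
end

components = { 1 => 'semver', 2 => 'lodash' }
rows = (1..100).map { |id| { id: id, component_id: (id % 2) + 1 } }

lazy_store = CountingStore.new(components)
names_without_preload(rows, lazy_store)

eager_store = CountingStore.new(components)
names_with_preload(rows, eager_store)

puts "without preload: #{lazy_store.queries} lookups, with preload: #{eager_store.queries} lookup"
```

Both methods return the same names; only the number of lookups differs, which is the trade-off the note above describes.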