`EachBatch#each_batch` falls into an endless loop
Summary
The `EachBatch` module falls into an endless loop under certain conditions.
Steps to reproduce
- Select a relation that has a non-unique attribute (e.g. `Ci::Build`).
- Select a batch size that is smaller than the number of records that share the same attribute value.
- Try to iterate over the non-unique attribute with the batch size from the previous step.
[6] pry(main)> Ci::Build.group(:user_id).count
(0.7ms) SELECT COUNT(*) AS count_all, "ci_builds"."user_id" AS ci_builds_user_id FROM "ci_builds" WHERE "ci_builds"."type" = $1 GROUP BY "ci_builds"."user_id" [["type", "Ci::Build"]]
=> {1=>81, 2=>35, 3=>9, 4=>53, 5=>52, 6=>8, 7=>18, 8=>51, 9=>71, 10=>39, 11=>49, 12=>29, 13=>58, 14=>31, 15=>27, 16=>24, 18=>5, 19=>76, 20=>11, 21=>25}
[7] pry(main)> Ci::Build.each_batch(of: 40, column: :user_id) do |relation|
[7] pry(main)* puts "Actuall batch size was: #{relation.count}"
[7] pry(main)* end
Ci::Build Load (0.5ms) SELECT "ci_builds"."user_id" FROM "ci_builds" WHERE "ci_builds"."type" = $1 ORDER BY "ci_builds"."user_id" ASC LIMIT $2 [["type", "Ci::Build"], ["LIMIT", 1]]
Ci::Build Load (0.4ms) SELECT "ci_builds"."user_id" FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 ORDER BY "ci_builds"."user_id" ASC LIMIT $2 OFFSET $3 [["type", "Ci::Build"], ["LIMIT", 1], ["OFFSET", 40]]
(0.4ms) SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 AND "ci_builds"."user_id" < 1 [["type", "Ci::Build"]]
Actuall batch size was: 0
Ci::Build Load (0.3ms) SELECT "ci_builds"."user_id" FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 ORDER BY "ci_builds"."user_id" ASC LIMIT $2 OFFSET $3 [["type", "Ci::Build"], ["LIMIT", 1], ["OFFSET", 40]]
(0.5ms) SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 AND "ci_builds"."user_id" < 1 [["type", "Ci::Build"]]
Actuall batch size was: 0
Ci::Build Load (0.3ms) SELECT "ci_builds"."user_id" FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 ORDER BY "ci_builds"."user_id" ASC LIMIT $2 OFFSET $3 [["type", "Ci::Build"], ["LIMIT", 1], ["OFFSET", 40]]
(0.4ms) SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 AND "ci_builds"."user_id" < 1 [["type", "Ci::Build"]]
Actuall batch size was: 0
Ci::Build Load (0.3ms) SELECT "ci_builds"."user_id" FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 ORDER BY "ci_builds"."user_id" ASC LIMIT $2 OFFSET $3 [["type", "Ci::Build"], ["LIMIT", 1], ["OFFSET", 40]]
(0.3ms) SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 AND "ci_builds"."user_id" < 1 [["type", "Ci::Build"]]
Actuall batch size was: 0
Ci::Build Load (0.3ms) SELECT "ci_builds"."user_id" FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 ORDER BY "ci_builds"."user_id" ASC LIMIT $2 OFFSET $3 [["type", "Ci::Build"], ["LIMIT", 1], ["OFFSET", 40]]
(0.4ms) SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 AND "ci_builds"."user_id" < 1 [["type", "Ci::Build"]]
Actuall batch size was: 0
Ci::Build Load (0.3ms) SELECT "ci_builds"."user_id" FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 ORDER BY "ci_builds"."user_id" ASC LIMIT $2 OFFSET $3 [["type", "Ci::Build"], ["LIMIT", 1], ["OFFSET", 40]]
(0.3ms) SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 AND "ci_builds"."user_id" < 1 [["type", "Ci::Build"]]
Actuall batch size was: 0
Ci::Build Load (0.3ms) SELECT "ci_builds"."user_id" FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 ORDER BY "ci_builds"."user_id" ASC LIMIT $2 OFFSET $3 [["type", "Ci::Build"], ["LIMIT", 1], ["OFFSET", 40]]
(0.3ms) SELECT COUNT(*) FROM "ci_builds" WHERE "ci_builds"."type" = $1 AND "ci_builds"."user_id" >= 1 AND "ci_builds"."user_id" < 1 [["type", "Ci::Build"]]
Actuall batch size was: 0
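The repeated queries above can be modelled in plain Ruby without a database. This is only a sketch of the range calculation as it appears in the log, not the actual `EachBatch` code: `batch_ranges` and its `max_iterations` safety cap are hypothetical names introduced for illustration, and the sample data only approximates the counts shown above.

```ruby
# Plain-Ruby model of the range calculation seen in the log (no database needed).
# `values` stands in for the iterated column; `max_iterations` is a safety cap
# added for this sketch only, since the modelled loop would otherwise never end.
def batch_ranges(values, of:, max_iterations: 5)
  sorted = values.sort
  start = sorted.first
  return [] if start.nil?

  ranges = []
  max_iterations.times do
    # Models: WHERE column >= start ORDER BY column ASC LIMIT 1 OFFSET of
    stop = sorted.select { |v| v >= start }[of]
    # Models the yielded relation: column >= start AND column < stop
    ranges << [start, stop]
    break if stop.nil?

    start = stop
  end
  ranges
end

# 81 rows share user_id 1, which is more than the batch size of 40.
p batch_ranges([1] * 81 + [2] * 35, of: 40)
# => [[1, 1], [1, 1], [1, 1], [1, 1], [1, 1]]

# With unique values the ranges advance and the loop terminates.
p batch_ranges((1..100).to_a, of: 40)
# => [[1, 41], [41, 81], [81, nil]]
```

In the first call every range is `[1, 1]`, that is `>= 1 AND < 1`, which matches the `Actuall batch size was: 0` lines in the log; only the artificial cap stops the loop.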
Example Project
The problem was discovered in a local development environment. Because of the preconditions that have to be met for the issue to occur, it may not yet have manifested in the production environment.
What is the current bug behavior?
The `each_batch` method falls into an endless loop and never finishes its delegated task.
What is the expected correct behavior?
The `each_batch` method successfully iterates over all records of the relation and performs the delegated task.
Possible fixes
The issue is caused by using the `OFFSET` keyword to find the end of the batch range (https://gitlab.com/gitlab-org/gitlab/-/blob/e7650bc4442/app/models/concerns/each_batch.rb#L76).
When the column selected for iteration does not have unique values and the batch size is smaller than the number of records that hold the same value, the calculated batch range is empty: `AND "ci_builds"."user_id" >= 1 AND "ci_builds"."user_id" < 1`. Because the next batch starts at that same value, the same empty range is computed on every iteration.
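One possible direction, sketched here in plain Ruby, is to apply the offset to distinct column values so that the end of a range is always greater than its start. This is an assumption for illustration only and not a confirmed fix: `distinct_batch_ranges` is a hypothetical helper that models the range calculation on an in-memory array rather than on a relation.

```ruby
# Hypothetical sketch: compute batch boundaries over DISTINCT column values.
# `values` stands in for the iterated column; nothing here touches a database.
def distinct_batch_ranges(values, of:)
  raise ArgumentError, "of must be positive" unless of.positive?

  distinct = values.uniq.sort
  start = distinct.first
  return [] if start.nil?

  ranges = []
  loop do
    # Models: SELECT DISTINCT column ... WHERE column >= start
    #         ORDER BY column ASC LIMIT 1 OFFSET of
    stop = distinct.select { |v| v >= start }[of]
    # Range is column >= start AND column < stop (open-ended when stop is nil)
    ranges << [start, stop]
    break if stop.nil?

    # stop is strictly greater than start because the values are distinct
    start = stop
  end
  ranges
end

p distinct_batch_ranges([1] * 81 + [2] * 35 + [3] * 9, of: 2)
# => [[1, 3], [3, nil]]
```

The trade-off in this sketch is that `of` then counts distinct values rather than rows, so a single batch can contain more than `of` records (here the first range covers 116 rows).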