Skip to content

Gitlab::GraphQL::Loaders::BatchModelLoader is inefficient

While looking at possibly moving BatchModelLoader to a more general namespace (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/32155), I discovered what I think is bug in the way we are handling the loader. Namely, it has the potential to issue more queries than is optimal.

Minimal test case

To see this, all we need to do is:

  • lazy load A1 (an A, with id = 1)
  • lazy load B1
  • force B1
  • lazy load A2
  • force A1
  • force A2

In the optimal case this would issue two queries, one for B1, and one for [A1,A2], since we do not need to know anything about A1 until after we have requested A2. As we can see from the (annotated) REPL session below, with our current code, we issue three queries, one for each object, which is the pessimal case:

[3] pry(main)> p1 = Gitlab::BatchModelLoader.new(Project, 1).find; nil
[5] pry(main)> u1 = Gitlab::BatchModelLoader.new(User, 1).find; nil
[6] pry(main)> p1 = Gitlab::BatchModelLoader.new(User, 1).find; nil
[7] pry(main)> u1.name # forces u1
# Issues two queries, one for u1 and one for p1
  Project Load (0.8ms)  SELECT "projects".* FROM "projects" WHERE "projects"."id" = $1  [["id", 1]]
  User Load (0.5ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 1]]
=> "Administrator"
[8] pry(main)> p2 = Gitlab::BatchModelLoader.new(Project, 2).find; nil
[9] pry(main)> p2.title
# Issues one query, for p2
  Project Load (1.9ms)  SELECT "projects".* FROM "projects" WHERE "projects"."id" = $1  [["id", 2]]
=> "Gitlab Shell"

Solution

This is solvable, but it means using a unique block for each set of items we want to batch:

[16] pry(main)> def load_project(pid)
[16] pry(main)*   BatchLoader.for(pid).batch { |ids, loader| Project.where(id: ids).each { |p| loader.call(p.id, p) } }
[16] pry(main)* end
[17] pry(main)> def load_user(uid)
[17] pry(main)*   BatchLoader.for(uid).batch { |ids, loader| User.where(id: ids).each { |u| loader.call(u.id, u) } }
[17] pry(main)* end
[18] pry(main)> p1 = load_project(1); nil
[19] pry(main)> u1 = load_user(1); nil
[20] pry(main)> u1.name
# Issues one query, for u1
  User Load (1.5ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 1]]
=> "Administrator"
[21] pry(main)> p2 = load_project(2); nil
[22] pry(main)> p1.name
# Issues one query, for p1 and p2
  Project Load (2.3ms)  SELECT "projects".* FROM "projects" WHERE "projects"."id" IN ($1, $2)  [["id", 1], ["id", 2]]
=> "Gitlab Test"
[23] pry(main)> p2.name
=> "Gitlab Shell"

Recommendation

Rewrite our batchloader as a batch-loader factory, using the request cache to retrieve instances, something like:

class BatchModelLoader
  def self.for(model)
    RequestCache.get(model.name) { new(model) }
  end

  def initialize(model)
    @model = model
  end
  
  def find(id)
    BatchLoader.for(id).batch { |ids, loader| @model.where(id: ids).each { |x| loader.call(x.id, x) } }
  end
end

/cc @.luke @digitalmoksha

Edited by Luke Duncalfe