Gitlab::GraphQL::Loaders::BatchModelLoader is inefficient
While looking at possibly moving BatchModelLoader
to a more general namespace (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/32155), I discovered what I think is bug in the way we are handling the loader. Namely, it has the potential to issue more queries than is optimal.
Minimal test case
To see this, all we need to do is:
- lazy load A1 (an
A
, withid = 1
) - lazy load B1
- force B1
- lazy load A2
- force A1
- force A2
In the optimal case this would issue two queries, one for B1, and one for [A1,A2], since we do not need to know anything about A1 until after we have requested A2. As we can see from the (annotated) REPL session below, with our current code, we issue three queries, one for each object, which is the pessimal case:
[3] pry(main)> p1 = Gitlab::BatchModelLoader.new(Project, 1).find; nil
[5] pry(main)> u1 = Gitlab::BatchModelLoader.new(User, 1).find; nil
[6] pry(main)> p1 = Gitlab::BatchModelLoader.new(User, 1).find; nil
[7] pry(main)> u1.name # forces u1
# Issues two queries, one for u1 and one for p1
Project Load (0.8ms) SELECT "projects".* FROM "projects" WHERE "projects"."id" = $1 [["id", 1]]
User Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 1]]
=> "Administrator"
[8] pry(main)> p2 = Gitlab::BatchModelLoader.new(Project, 2).find; nil
[9] pry(main)> p2.title
# Issues one query, for p2
Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE "projects"."id" = $1 [["id", 2]]
=> "Gitlab Shell"
Solution
This is solvable, but it means using a unique block for each set of items we want to batch:
[16] pry(main)> def load_project(pid)
[16] pry(main)* BatchLoader.for(pid).batch { |ids, loader| Project.where(id: ids).each { |p| loader.call(p.id, p) } }
[16] pry(main)* end
[17] pry(main)> def load_user(uid)
[17] pry(main)* BatchLoader.for(uid).batch { |ids, loader| User.where(id: ids).each { |u| loader.call(u.id, u) } }
[17] pry(main)* end
[18] pry(main)> p1 = load_project(1); nil
[19] pry(main)> u1 = load_user(1); nil
[20] pry(main)> u1.name
# Issues one query, for u1
User Load (1.5ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 1]]
=> "Administrator"
[21] pry(main)> p2 = load_project(2); nil
[22] pry(main)> p1.name
# Issues one query, for p1 and p2
Project Load (2.3ms) SELECT "projects".* FROM "projects" WHERE "projects"."id" IN ($1, $2) [["id", 1], ["id", 2]]
=> "Gitlab Test"
[23] pry(main)> p2.name
=> "Gitlab Shell"
Recommendation
Rewrite our batchloader as a batch-loader factory, using the request cache to retrieve instances, something like:
class BatchModelLoader
def self.for(model)
RequestCache.get(model.name) { new(model) }
end
def initialize(model)
@model = model
end
def find(id)
BatchLoader.for(id).batch { |ids, loader| @model.where(id: ids).each { |x| loader.call(x.id, x) } }
end
end
/cc @.luke @digitalmoksha