Gitlab::GraphQL::Loaders::BatchModelLoader is inefficient (#31213) · Issues · GitLab.org / GitLab

Gitlab::GraphQL::Loaders::BatchModelLoader is inefficient

While looking at possibly moving `BatchModelLoader` to a more general namespace (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/32155), I discovered what I think is bug in the way we are handling the loader. Namely, it has the potential to issue more queries than is optimal. ## Minimal test case To see this, all we need to do is: * lazy load A1 (an `A`, with `id = 1`) * lazy load B1 * force B1 * lazy load A2 * force A1 * force A2 In the optimal case this would issue two queries, one for B1, and one for [A1,A2], since we do not need to know anything about A1 until after we have requested A2. As we can see from the (annotated) REPL session below, with our current code, we issue three queries, one for each object, which is the pessimal case: ``` [3] pry(main)> p1 = Gitlab::BatchModelLoader.new(Project, 1).find; nil [5] pry(main)> u1 = Gitlab::BatchModelLoader.new(User, 1).find; nil [6] pry(main)> p1 = Gitlab::BatchModelLoader.new(User, 1).find; nil [7] pry(main)> u1.name # forces u1 # Issues two queries, one for u1 and one for p1 Project Load (0.8ms) SELECT "projects".* FROM "projects" WHERE "projects"."id" = $1 [["id", 1]] User Load (0.5ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 1]] => "Administrator" [8] pry(main)> p2 = Gitlab::BatchModelLoader.new(Project, 2).find; nil [9] pry(main)> p2.title # Issues one query, for p2 Project Load (1.9ms) SELECT "projects".* FROM "projects" WHERE "projects"."id" = $1 [["id", 2]] => "Gitlab Shell" ``` ## Solution This is solvable, but it means using a unique block for each set of items we want to batch: ``` [16] pry(main)> def load_project(pid) [16] pry(main)* BatchLoader.for(pid).batch { |ids, loader| Project.where(id: ids).each { |p| loader.call(p.id, p) } } [16] pry(main)* end [17] pry(main)> def load_user(uid) [17] pry(main)* BatchLoader.for(uid).batch { |ids, loader| User.where(id: ids).each { |u| loader.call(u.id, u) } } [17] pry(main)* end [18] pry(main)> p1 = load_project(1); nil [19] pry(main)> u1 = load_user(1); nil [20] pry(main)> u1.name # Issues one query, for u1 User Load (1.5ms) SELECT "users".* FROM "users" WHERE "users"."id" = $1 [["id", 1]] => "Administrator" [21] pry(main)> p2 = load_project(2); nil [22] pry(main)> p1.name # Issues one query, for p1 and p2 Project Load (2.3ms) SELECT "projects".* FROM "projects" WHERE "projects"."id" IN ($1, $2) [["id", 1], ["id", 2]] => "Gitlab Test" [23] pry(main)> p2.name => "Gitlab Shell" ``` ## Recommendation Rewrite our batchloader as a batch-loader factory, using the request cache to retrieve instances, something like: ```ruby class BatchModelLoader def self.for(model) RequestCache.get(model.name) { new(model) } end def initialize(model) @model = model end def find(id) BatchLoader.for(id).batch { |ids, loader| @model.where(id: ids).each { |x| loader.call(x.id, x) } } end end ``` /cc @.luke @digitalmoksha

issue