Using Goldiloader results in Ci::RegisterJobService#execute spending a lot of time eager loading associations
The method Ci::RegisterJobService#execute has the following snippet of code:
builds.find do |build|
next unless runner.can_pick?(build)
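It seems that with Goldiloader, touching an association on one build inside this loop makes us eager load that association for every build in the collection, so we end up loading a lot of data. For example: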
SELECT "taggings".*
FROM "taggings"
WHERE "taggings"."taggable_type" = 'CommitStatus'
AND "taggings"."taggable_id" IN (63461598, 63461599, 63461600, 63461608, 63461610, 63461611, 63461620, 63461622, 63461624, 63461632, 63461655, 63461658, 63461678, 63461683, 63461715, 63461725, 63461728, 63461752, 63461753, 63461754, 63461784, 63461787, 63461794, 63461799, 63461831, 63461848, 63461853, 63461854, 63461859, 63461860, 63461865, 63461866, 63461867, 63461898, 63461908, 63461925, 63461944, 63461958, 63461986, 63461997, 63461999, 63462020, 63462035, 63462043, 63462044, 63462057, 63462068, 63462071, 63462189, 63462190, 63462191, 63462208, 63462216, 63462235, 63462238,
...
This then results in the API endpoint POST /api/v4/jobs/request taking a long time to complete. During a Zoom call we found that we can go back to the old behaviour by disabling automatic eager loading for this specific case:
builds.auto_include(false).find do |build|
next unless runner.can_pick?(build)
This, however, means we're back to lazy loading, which most likely means we'll run into N+1 query problems.
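To make the trade-off concrete, here is a toy model in plain Ruby (no ActiveRecord, no Goldiloader). Everything in it is hypothetical: `FakeTagStore`, `pick_lazy`, `pick_eager`, and the "build tags must be a subset of the runner's tags" rule are stand-ins for illustration, not the real `can_pick?` logic. It only counts how many "queries" each strategy issues and how many ids it asks for.

```ruby
# Toy stand-in for the taggings table; counts simulated queries.
class FakeTagStore
  attr_reader :queries, :ids_requested

  def initialize(tags_by_build_id)
    @tags_by_build_id = tags_by_build_id
    @queries = 0
    @ids_requested = 0
  end

  # Simulates one "SELECT ... WHERE taggable_id IN (ids)" round trip.
  def load_tags(ids)
    @queries += 1
    @ids_requested += ids.size
    ids.map { |id| [id, @tags_by_build_id.fetch(id)] }.to_h
  end
end

# Assumed matching rule (illustrative only): every build tag is a runner tag.
def pickable?(build_tags, runner_tags)
  (build_tags - runner_tags).empty?
end

# Lazy loading: one small query per build inspected, stops at the first match.
def pick_lazy(ids, store, runner_tags)
  ids.find { |id| pickable?(store.load_tags([id])[id], runner_tags) }
end

# Eager loading: one query covering every candidate, even those never inspected.
def pick_eager(ids, store, runner_tags)
  all_tags = store.load_tags(ids)
  ids.find { |id| pickable?(all_tags[id], runner_tags) }
end

tags_by_id = (1..1000).map { |id| [id, id == 5 ? ["docker"] : ["gpu"]] }.to_h
ids = tags_by_id.keys
runner_tags = ["docker"]

lazy_store = FakeTagStore.new(tags_by_id)
eager_store = FakeTagStore.new(tags_by_id)
lazy_pick = pick_lazy(ids, lazy_store, runner_tags)
eager_pick = pick_eager(ids, eager_store, runner_tags)

puts "lazy:  picked=#{lazy_pick} queries=#{lazy_store.queries} ids=#{lazy_store.ids_requested}"
puts "eager: picked=#{eager_pick} queries=#{eager_store.queries} ids=#{eager_store.ids_requested}"
```

In this toy run both strategies pick the same build, but lazy loading issues one query per inspected build while eager loading issues a single query over all 1000 ids. Which one wins in production depends on how early a match is usually found, which this sketch does not try to model.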
We need a way to avoid the N+1 query problem without the eager loading taking forever.
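One possible middle ground, offered only as a sketch and not as a decided fix, is to scan the candidates in fixed-size slices and load associations one slice at a time. That bounds both the number of queries and the size of each IN list. The helper below is plain Ruby with a hypothetical name (`find_first_in_slices`) and a hypothetical `loader` callable standing in for whatever would actually fetch the tags; how this would map onto the real builds relation is an open question.

```ruby
# Hypothetical helper: walk ids in slices, load data per slice, and
# return the first id for which the block returns true (or nil).
def find_first_in_slices(ids, batch_size:, loader:)
  ids.each_slice(batch_size) do |slice|
    # One bounded "IN (...)" style lookup per slice.
    data_by_id = loader.call(slice)
    match = slice.find { |id| yield(id, data_by_id.fetch(id)) }
    return match if match
  end
  nil
end

# Usage with a fake loader that records the size of each simulated query.
query_sizes = []
loader = lambda do |slice|
  query_sizes << slice.size
  slice.map { |id| [id, id == 120 ? ["docker"] : ["gpu"]] }.to_h
end

picked = find_first_in_slices((1..1000).to_a, batch_size: 50, loader: loader) do |_id, tags|
  # Assumed matching rule (illustrative only): all build tags are runner tags.
  (tags - ["docker"]).empty?
end

puts "picked=#{picked} queries=#{query_sizes.size} largest_in_list=#{query_sizes.max}"
```

Here the match sits in the third slice, so the scan stops after three lookups of 50 ids each instead of one lookup of 1000 ids or 120 single-row lookups. The batch size is a tuning knob, and 50 is an arbitrary value chosen for the example.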
I'm assigning this ~AP1 since this API endpoint is in the Frequently Used table in https://performance.gitlab.net/dashboard/db/sql-timings-overview?orgId=1&var-process_type=grape&var-action=Grape%23POST%20%2Fapi%2Fjobs%2Frequest&var-database=Production&var-rarely_used=5000&var-commonly_used=40000&var-ignore_actions=%2F%5E(Gitlab::RequestForgery%7CRootController%7CMetricsController)%2F&var-minimum_p99=100&var-minimum_sql=10 (6 million requests per 24 hours).