Several n+1 queries when loading the Projects dashboard
Summary
There are several n+1 queries that are currently executed when someone visits the projects dashboard. Four queries were already known to be n+1, and we've made a change to our spec factories that has surfaced three more.
The spec that tests and documents these lives in spec/features/dashboard/projects_spec.rb
and is called 'avoids an N+1 query in dashboard index'
:
it 'avoids an N+1 query in dashboard index' do
create(:ci_pipeline, :with_job, status: :success, project: project, ref: project.default_branch, sha: project.commit.sha)
...
# There are seven known N+1 database queries:
# 1. Project#open_issues_count
# 2. Project#open_merge_requests_count
# 3. Project#forks_count
# 4. ProjectsHelper#load_pipeline_status
# 5. RendersMemberAccess#preload_max_member_access_for_collection
# 6. User#max_member_access_for_project_ids
# 7. CommitWithPipeline#last_pipeline
Some of these appear to be addressed with some caching, since they appear in this spec but not in the production performance bar query list. There's discussion of a few of them in gitlab-foss!32169 (merged)
Improvements
Refactoring these queries should improve the performance of our projects dashboard. I don't know the background of all seven queries, but I've added details below for the ones I do.
Risks
Being a high-traffic page, the consequences of refactoring here will be large, either good or bad. Some additional specs, including one that actually loads the page with 20 projects and 20 pipelines to take a full view of the page performance could be useful in making sure nothing goes terribly awry.
Involved components
5) RendersMemberAccess#preload_max_member_access_for_collection
User Load (0.6ms) SELECT "users".* FROM "users" WHERE "users"."id" IN ($1, $2) /*application:test,controller:projects,action:index,correlation_id:eef4d297-683e-4578-ac2e-78eca86e6b92*/ [["id", 1], ["id", 3]]
↳ app/controllers/concerns/renders_member_access.rb:20:in `preload_max_member_access_for_collection'
QueryRecorder SQL: SELECT "users".* FROM "users" WHERE "users"."id" IN ($1, $2) /*application:test,controller:projects,action:index,correlation_id:eef4d297-683e-4578-ac2e-78eca86e6b92*/
--> spec/support/helpers/query_recorder.rb:57:in `callback'
--> app/controllers/concerns/renders_member_access.rb:20:in `preload_max_member_access_for_collection'
--> app/controllers/concerns/renders_member_access.rb:11:in `prepare_projects_for_rendering'
--> app/controllers/dashboard/projects_controller.rb:78:in `load_projects'
--> app/controllers/dashboard/projects_controller.rb:54:in `projects'
This query appear to be called for every project on the dashboard, even though the query includes the IDs for every project.
-
Investigation:
This is not technically an N+1 issue, because it's produced by:
# app/controllers/dashboard/projects_controller.rb def preload_associations(projects) projects.includes(:route, :creator, :group, :topics, namespace: [:route, :owner]).preload(:project_feature) end
When it's preloading the associations
:creator
and:owner
of:namespace
, for the first round, since there is only one user which is both thecreator
and theowner
of the singleproject
, the user query can be cached. When testing with a second project added, the new project has a different owner and creator, and for some reason the user preload receives the two user IDs in different orders when called for each of the association, and therefore produces two queries. The different order is a red herring, however, since the:creator
and:owner
associations can be fully disjoint sets of users. This is not an N+1 because the number of queries produced is at most 2 (one for thecreator
association and one for theowner
association). -
Possible solution:
Preload the users for
:creator
ofprojects
and:owner
of:namespace
after the projects are queried, e.g.:# app/controllers/dashboard/projects_controller.rb def load_projects(finder_params) # ... preload_users_for(prepare_projects_for_rendering(projects)) end def preload_associations(projects) projects.includes(:route, :group, :topics, namespace: [:route]).preload(:project_feature) end def preload_users_for(projects) user_ids = projects.map(&:creator_id) + projects.map(&:namespace).map(&:owner_id) users = User.where(id: user_ids).group_by(:id) projects.each do |p| p.association(:creator).target = users[p.creator_id] p.namespace.association(:owner).target = users[p.namespace.owner_id] end end
-
Alternative solution: The pseudo-N+1 disappears if we make the test projects' have distinct
owner
s andcreator
s. In this case, the control sample will also have the "extra" query because the two associations are disjoint sets of users.
6) ProjectTeam#max_member_access_for_project_ids
The query recorder complains about
SELECT MAX("project_authorizations"."access_level") AS maximum_access_level, "project_authorizations"."user_id" AS project_authorizations_user_id FROM "project_authorizations"...
-- (expected: 0, got: 1)
WHERE "project_authorizations"."project_id" = 32 AND "project_authorizations"."user_id" = 40 GROUP BY "project_authorizations"."user_id"
-
Investigation:
This is caused by:
(0.3ms) SELECT MAX("project_authorizations"."access_level") AS maximum_access_level, "project_authorizations"."user_id" AS project_authorizations_user_id FROM "project_authorizations" WHERE "project_authorizations"."project_id" = 32 AND "project_authorizations"."user_id" = 40 GROUP BY "project_authorizations"."user_id" /*application:test,correlation_id:e87f8fa5-3a32-4552-944f-f9e9676efc02,endpoint_id:Dashboard::ProjectsController#index,db_config_name:main,line:/app/models/project_team.rb:188:in `block in max_member_access_for_user_ids'*/ ↳ app/models/project_team.rb:188:in `block in max_member_access_for_user_ids' Query Trace: app/models/project_team.rb:188:in `block in max_member_access_for_user_ids' lib/gitlab/safe_request_loader.rb:43:in `update_resource_data' lib/gitlab/safe_request_loader.rb:22:in `execute' lib/gitlab/safe_request_loader.rb:6:in `execute' app/models/project_team.rb:182:in `max_member_access_for_user_ids' app/models/project_team.rb:201:in `max_member_access' app/policies/project_policy.rb:925:in `lookup_access_level!' ee/app/policies/ee/project_policy.rb:571:in `lookup_access_level!' app/policies/project_policy.rb:917:in `team_access_level' app/policies/project_policy.rb:19:in `block in <class:ProjectPolicy>' app/models/ability.rb:86:in `allowed?' app/controllers/application_controller.rb:246:in `can?' app/controllers/application_controller.rb:62:in `can?' app/helpers/projects_helper.rb:439:in `able_to_see_forks_count?' app/views/shared/projects/_list.html.haml:38 app/views/shared/projects/_list.html.haml:36:in `each_with_index' app/views/shared/projects/_list.html.haml:36 app/views/dashboard/projects/_projects.html.haml:1 app/views/dashboard/projects/index.html.haml:16 app/controllers/application_controller.rb:143:in `render'
The list is trying to check the access level for each project for current user in the
_list.html.haml
partial. -
Potential solution:
Since the access levels are preloaded at
User#max_member_access_for_project_ids
from app/controllers/concerns/renders_member_access.rb:18, I assume this preload is trying to batch load the access levels for projects, however the resource keys forUser#max_member_access_for_project_ids
andProjectTeam#max_member_access_for_project_ids
are different, therefore, the cache can't be reused. It's better to modify app/controllers/concerns/renders_projects_list.rb, e.g.:# app/controllers/concerns/renders_projects_list.rb def prepare_projects_for_rendering(projects) cache_project_accesses(projects, preload_max_member_access_for_collection(Project, projects)) # ... end def cache_project_accesses(projects, accesses) projects.group_by(&:id).try do |project_hash| accesses.each do |project_id, access_level| project = project_hash[project_id] Gitlab::SafeRequestLoader.execute(resource_key: project.max_member_access_for_resource_key(User), resource_ids: [current_user.id], default_value: Gitlab::Access::NO_ACCESS) do |user_ids| { current_user.id => access_level } end end end end
-
Another possible solution: We can change the call in
lookup_access_level!
to use- project.team.max_member_access(@user.id) + @user.max_member_access_for_project_ids([project.id])[project.id]
This is a higher-risk change, but it might be a better overall use of the cache in this context, since we usually have a single user operating on multiple projects, as opposed to the other way around (which the code currently optimizes for).
7) CommitWithPipeline#last_pipeline
Ci::Pipeline Load (0.6ms) SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = $1 AND ("ci_pipelines"."config_source" IN ($2, $3, $4, $5) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."sha" = $6 ORDER BY "ci_pipelines"."id" DESC LIMIT $7 /*application:test,controller:projects,action:index,correlation_id:eef4d297-683e-4578-ac2e-78eca86e6b92*/ [["project_id", 2], ["config_source", 1], ["config_source", 2], ["config_source", 4], ["config_source", 5], ["sha", "b83d6e391c22777fca1ed3012fce84f633d7fed0"], ["LIMIT", 1]]
↳ app/models/commit_with_pipeline.rb:17:in `block in last_pipeline'
QueryRecorder SQL: SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = $1 AND ("ci_pipelines"."config_source" IN ($2, $3, $4, $5) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."sha" = $6 ORDER BY "ci_pipelines"."id" DESC LIMIT $7 /*application:test,controller:projects,action:index,correlation_id:eef4d297-683e-4578-ac2e-78eca86e6b92*/
--> spec/support/helpers/query_recorder.rb:57:in `callback'
--> app/models/commit_with_pipeline.rb:17:in `block in last_pipeline'
--> lib/gitlab/utils/strong_memoize.rb:30:in `strong_memoize'
--> app/models/commit_with_pipeline.rb:16:in `last_pipeline'
--> app/models/commit.rb:128:in `last_pipeline'
--> app/models/project.rb:354:in `last_pipeline'
--> app/views/shared/projects/_project.html.haml:69
--> app/views/shared/projects/_project.html.haml:22
--> app/views/shared/projects/_list.html.haml:41
--> app/views/shared/projects/_list.html.haml:39:in `each_with_index'
--> app/views/shared/projects/_list.html.haml:39
--> app/views/dashboard/projects/_projects.html.haml:1
--> app/views/dashboard/projects/index.html.haml:15
--> app/controllers/application_controller.rb:125:in `render'
--> app/controllers/dashboard/projects_controller.rb:60:in `block in render_projects'
This query is also called once for every project, loading the most recent pipeline status before rendering in the partial.
8) Ci::Pipeline#detailed_status
SELECT COUNT(*) AS count_all, "ci_builds"."commit_id" AS ci_builds_commit_id FROM "ci_builds"...
-- (expected: 1, got: 0)
WHERE "ci_builds"."commit_id" = 28 AND ("ci_builds"."retried" = FALSE OR "ci_builds"."retried" IS NULL) AND "ci_builds"."allow_failure" = TRUE AND "ci_builds"."status" IN ('failed', 'canceled') GROUP BY "ci_builds"."commit_id"
-- (expected: 0, got: 1)
WHERE "ci_builds"."commit_id" = 29 AND ("ci_builds"."retried" = FALSE OR "ci_builds"."retried" IS NULL) AND "ci_builds"."allow_failure" = TRUE AND "ci_builds"."status" IN ('failed', 'canceled') GROUP BY "ci_builds"."commit_id"
-- (expected: 0, got: 1)
WHERE "ci_builds"."commit_id" IN (29, 28) AND ("ci_builds"."retried" = FALSE OR "ci_builds"."retried" IS NULL) AND "ci_builds"."allow_failure" = TRUE AND "ci_builds"."status" IN ('failed', 'canceled') GROUP BY "ci_builds"."commit_id"
(0.3ms) SELECT COUNT(*) AS count_all, "ci_builds"."commit_id" AS ci_builds_commit_id FROM "ci_builds" WHERE "ci_builds"."commit_id" IN (29, 28) AND ("ci_builds"."retried" = FALSE OR "ci_builds"."retried" IS NULL) AND "ci_builds"."allow_failure" = TRUE AND "ci_builds"."status" IN ('failed', 'canceled') GROUP BY "ci_builds"."commit_id" /*application:test,correlation_id:f65d43ae-6d95-414a-93b6-b90aa6e6cc45,endpoint_id:Dashboard::ProjectsController#index,db_config_name:ci,line:/app/models/ci/pipeline.rb:746:in `block in number_of_warnings'*/
↳ app/models/ci/pipeline.rb:746:in `block in number_of_warnings'
Query Trace:
app/models/ci/pipeline.rb:746:in `block in number_of_warnings'
app/models/ci/pipeline.rb:737:in `has_warnings?'
lib/gitlab/ci/status/success_warning.rb:28:in `matches?'
lib/gitlab/ci/status/factory.rb:38:in `block (2 levels) in extended_statuses'
lib/gitlab/ci/status/factory.rb:38:in `each'
lib/gitlab/ci/status/factory.rb:38:in `find'
lib/gitlab/ci/status/factory.rb:38:in `block in extended_statuses'
lib/gitlab/ci/status/factory.rb:37:in `each'
lib/gitlab/ci/status/factory.rb:37:in `flat_map'
lib/gitlab/ci/status/factory.rb:37:in `extended_statuses'
lib/gitlab/ci/status/factory.rb:14:in `fabricate!'
app/models/ci/pipeline.rb:1018:in `detailed_status'
app/views/shared/projects/_project.html.haml:75
app/views/shared/projects/_list.html.haml:38
app/views/shared/projects/_list.html.haml:36:in `each_with_index'
app/views/shared/projects/_list.html.haml:36
app/views/dashboard/projects/_projects.html.haml:1
app/views/dashboard/projects/index.html.haml:16
app/controllers/application_controller.rb:143:in `render'
The detailed_status
is loaded for each project's last_pipeline
above in the _project.html.haml
Proposal
Rather than solve each of these queries at one time, it might make sense to implement a higher level solution for loading 20 pipelines with 20 detailed statuses for 20 projects all in one place, and passing the required data into render
.
A DashboardPresenter might help, if we could accept the user up-front and do some permissions checking that would allow us to not execute the member-access queries every time.
There's also interesting search-related work discussed below (!32892 (merged)) that introduces a CommitMetadata object that could be useful for efficiently loading Project CI status.