Query used for counting the number of notes on a merge request is slow
Discovered in https://gitlab.com/gitlab-org/gitlab-ce/issues/37814
When displaying a merge request we run a query to get the number of notes. This query will look like the following:
SELECT COUNT(*)
FROM notes
WHERE (
(
project_id = 13083
AND noteable_type = 'MergeRequest'
AND noteable_id = 3985770
)
OR (
(
project_id = 13083
OR project_id = 13083
)
AND noteable_type = 'Commit'
AND commit_id IN ('aac1de46c9be659b74da12f704412f38292974db', '0395c47193b3bbf6b4f060f28c9f632580313a35')
)
)
AND notes.system = 'f';
In production this produces the following plan:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1595.39..1595.40 rows=1 width=8) (actual time=2.147..2.147 rows=1 loops=1)
-> Bitmap Heap Scan on notes (cost=1555.00..1595.39 rows=1 width=0) (actual time=1.509..2.112 rows=135 loops=1)
Recheck Cond: (((noteable_id = 3985770) AND ((noteable_type)::text = 'MergeRequest'::text)) OR (((commit_id)::text = ANY ('{aac1de46c9be659b74da12f704412f38292974db,0395c47193b3bbf6b4f060f28c9f632580313a35}'::text[])) AND (project_id = 13083) AND ((noteable_type)::text = 'Commit'::text)))
Filter: ((NOT system) AND (project_id = 13083))
Rows Removed by Filter: 175
Heap Blocks: exact=275
-> BitmapOr (cost=1555.00..1555.00 rows=20 width=0) (actual time=1.470..1.470 rows=0 loops=1)
-> Bitmap Index Scan on index_notes_on_noteable_id_and_noteable_type (cost=0.00..2.74 rows=18 width=0) (actual time=0.080..0.080 rows=308 loops=1)
Index Cond: ((noteable_id = 3985770) AND ((noteable_type)::text = 'MergeRequest'::text))
-> BitmapAnd (cost=1552.01..1552.01 rows=2 width=0) (actual time=1.387..1.387 rows=0 loops=1)
-> Bitmap Index Scan on index_notes_on_commit_id (cost=0.00..29.95 rows=1176 width=0) (actual time=0.035..0.035 rows=2 loops=1)
Index Cond: ((commit_id)::text = ANY ('{aac1de46c9be659b74da12f704412f38292974db,0395c47193b3bbf6b4f060f28c9f632580313a35}'::text[]))
-> Bitmap Index Scan on index_notes_on_project_id_and_noteable_type (cost=0.00..1521.81 rows=59125 width=0) (actual time=1.314..1.314 rows=6460 loops=1)
Index Cond: ((project_id = 13083) AND ((noteable_type)::text = 'Commit'::text))
Planning time: 0.196 ms
Execution time: 2.197 ms
While the timing here isn't too bad in most cases the query as a whole will take between 10 and 30 milliseconds, depending on how (un)lucky you are.
A ton of information can be found in the following comments:
- https://gitlab.com/gitlab-org/gitlab-ce/issues/37814#note_40874848
- https://gitlab.com/gitlab-org/gitlab-ce/issues/37814#note_40876003
- https://gitlab.com/gitlab-org/gitlab-ce/issues/37814#note_40878106
- https://gitlab.com/gitlab-org/gitlab-ce/issues/37814#note_41006294
- https://gitlab.com/gitlab-org/gitlab-ce/issues/37814#note_41007551
- https://gitlab.com/gitlab-org/gitlab-ce/issues/37814#note_41011134
- https://gitlab.com/gitlab-org/gitlab-ce/issues/37814#note_41014169
- https://gitlab.com/gitlab-org/gitlab-ce/issues/37814#note_41016830
To summarise, the only way we can fix this particular problem (forever) is by creating a table storing the unique commit IDs for every merge request. This would then allow us to join the notes
table on that, removing the need for scanning over all notes of the MR's target project (which we do currently and is one of the reasons this query doesn't perform well).
An intermediate solution may be to adjust the query to run two separate queries and sum the results as discussed in https://gitlab.com/gitlab-org/gitlab-ce/issues/37814#note_40876003, but it's not clear how effective this will be in production.