Skip to content

Query used for counting the number of notes on a merge request is slow

Discovered in https://gitlab.com/gitlab-org/gitlab-ce/issues/37814

When displaying a merge request we run a query to get the number of notes. This query will look like the following:

SELECT COUNT(*)
FROM notes
WHERE (
    (
        project_id = 13083
        AND noteable_type = 'MergeRequest'
        AND noteable_id = 3985770
    )
    OR (
        (
            project_id = 13083
            OR project_id = 13083
        )
        AND noteable_type = 'Commit'
        AND commit_id IN ('aac1de46c9be659b74da12f704412f38292974db', '0395c47193b3bbf6b4f060f28c9f632580313a35')
    )
)
AND notes.system = 'f';

In production this produces the following plan:

                                                                                                                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1595.39..1595.40 rows=1 width=8) (actual time=2.147..2.147 rows=1 loops=1)
   ->  Bitmap Heap Scan on notes  (cost=1555.00..1595.39 rows=1 width=0) (actual time=1.509..2.112 rows=135 loops=1)
         Recheck Cond: (((noteable_id = 3985770) AND ((noteable_type)::text = 'MergeRequest'::text)) OR (((commit_id)::text = ANY ('{aac1de46c9be659b74da12f704412f38292974db,0395c47193b3bbf6b4f060f28c9f632580313a35}'::text[])) AND (project_id = 13083) AND ((noteable_type)::text = 'Commit'::text)))
         Filter: ((NOT system) AND (project_id = 13083))
         Rows Removed by Filter: 175
         Heap Blocks: exact=275
         ->  BitmapOr  (cost=1555.00..1555.00 rows=20 width=0) (actual time=1.470..1.470 rows=0 loops=1)
               ->  Bitmap Index Scan on index_notes_on_noteable_id_and_noteable_type  (cost=0.00..2.74 rows=18 width=0) (actual time=0.080..0.080 rows=308 loops=1)
                     Index Cond: ((noteable_id = 3985770) AND ((noteable_type)::text = 'MergeRequest'::text))
               ->  BitmapAnd  (cost=1552.01..1552.01 rows=2 width=0) (actual time=1.387..1.387 rows=0 loops=1)
                     ->  Bitmap Index Scan on index_notes_on_commit_id  (cost=0.00..29.95 rows=1176 width=0) (actual time=0.035..0.035 rows=2 loops=1)
                           Index Cond: ((commit_id)::text = ANY ('{aac1de46c9be659b74da12f704412f38292974db,0395c47193b3bbf6b4f060f28c9f632580313a35}'::text[]))
                     ->  Bitmap Index Scan on index_notes_on_project_id_and_noteable_type  (cost=0.00..1521.81 rows=59125 width=0) (actual time=1.314..1.314 rows=6460 loops=1)
                           Index Cond: ((project_id = 13083) AND ((noteable_type)::text = 'Commit'::text))
 Planning time: 0.196 ms
 Execution time: 2.197 ms

While the timing here isn't too bad in most cases the query as a whole will take between 10 and 30 milliseconds, depending on how (un)lucky you are.

A ton of information can be found in the following comments:

To summarise, the only way we can fix this particular problem (forever) is by creating a table storing the unique commit IDs for every merge request. This would then allow us to join the notes table on that, removing the need for scanning over all notes of the MR's target project (which we do currently and is one of the reasons this query doesn't perform well).

An intermediate solution may be to adjust the query to run two separate queries and sum the results as discussed in https://gitlab.com/gitlab-org/gitlab-ce/issues/37814#note_40876003, but it's not clear how effective this will be in production.