Optimize count query by using a limited count only (!12231) · Merge requests · GitLab.org / GitLab

Andreas Brandl requested to merge ab-60524-optimize-query into master May 03, 2019

What does this MR do?

This improves a high-frequency database query by using a limited count approach. We stop the counting as soon as we hit the number we're interested in.

Shoutouts to @ayufan for pointing this out in https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/11217#note_163970022.

Conformity

Performance and testing

Query before (with plan attached comment): https://gitlab.com/gitlab-org/gitlab-ee/snippets/1854328
Query after (with plan attached in comment): https://gitlab.com/gitlab-org/gitlab-ee/snippets/1854330

Note that the difference in performance is not visible with these plans. That is because there are no records matching the conditions anyways.

However, let's look at a more relaxed condition:

gitlabhq_production=# explain analyze select count(*) from (select 1 AS one from projects join project_mirror_data import_state on import_state.project_id=projects.id WHERE mirror='t' and next_execution_timestamp > '2019-05-06 13:17') t;
                                                                                                QUERY PLAN                                                                                                 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7880.41..7880.42 rows=1 width=8) (actual time=36.061..36.061 rows=1 loops=1)
   ->  Nested Loop  (cost=0.99..7880.36 rows=20 width=0) (actual time=0.037..35.733 rows=3874 loops=1)
         ->  Index Scan using index_mirror_data_on_next_execution_and_retry_count on project_mirror_data import_state  (cost=0.55..1900.59 rows=1753 width=4) (actual time=0.022..8.069 rows=3874 loops=1)
               Index Cond: (next_execution_timestamp > '2019-05-06 13:17:00+00'::timestamp with time zone)
         ->  Index Scan using projects_pkey on projects  (cost=0.43..3.40 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=3874)
               Index Cond: (id = import_state.project_id)
               Filter: mirror
 Planning time: 0.439 ms
 Execution time: 36.090 ms
(9 rows)

gitlabhq_production=# explain analyze select count(*) from (select 1 AS one from projects join project_mirror_data import_state on import_state.project_id=projects.id WHERE mirror='t' and next_execution_timestamp > '2019-05-06 13:17' LIMIT 600) t;
                                                                                                   QUERY PLAN                                                                                                   
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7880.61..7880.62 rows=1 width=8) (actual time=6.241..6.241 rows=1 loops=1)
   ->  Limit  (cost=0.99..7880.36 rows=20 width=4) (actual time=0.037..6.186 rows=600 loops=1)
         ->  Nested Loop  (cost=0.99..7880.36 rows=20 width=4) (actual time=0.036..6.141 rows=600 loops=1)
               ->  Index Scan using index_mirror_data_on_next_execution_and_retry_count on project_mirror_data import_state  (cost=0.55..1900.59 rows=1753 width=4) (actual time=0.020..1.376 rows=600 loops=1)
                     Index Cond: (next_execution_timestamp > '2019-05-06 13:17:00+00'::timestamp with time zone)
               ->  Index Scan using projects_pkey on projects  (cost=0.43..3.40 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=600)
                     Index Cond: (id = import_state.project_id)
                     Filter: mirror
 Planning time: 0.408 ms
 Execution time: 6.269 ms
(10 rows)

So here, we have about 3,800 records matching the condition. The limited count performs much better.

In summary, the change is only going to have an impact if and when we have more pending mirrors than available slots.

Edited May 06, 2019 by Andreas Brandl

Optimize count query by using a limited count only

What does this MR do?

Conformity

Performance and testing

Merge request reports