Skip to content

Resolve "ActiveRecord::StatementInvalid: PG::QueryCanceled: ERROR: canceling statement due to statement timeout"

Queries like SELECT DISTINCT (sha) FROM merge_request_diff_commits ... are timing out.

My local environment

gitlabhq_development=# SELECT COUNT(*) FROM merge_request_diff_commits;
  count
----------
 66456804
(1 row)

Time: 33843.857 ms
gitlabhq_development=# SELECT COUNT(*) FROM merge_request_diff_commits WHERE merge_request_diff_id IN (SELECT id FROM merge_request_diffs WHERE merge_request_id = 8120);
 count
--------
 202884
(1 row)

Time: 62.987 ms
gitlabhq_development=# SELECT COUNT(DISTINCT(sha)) FROM merge_request_diff_commits WHERE merge_request_diff_id IN (SELECT id FROM merge_request_diffs WHERE merge_request_id = 8120);
 count
-------
 33814
(1 row)

Time: 178.501 ms

Failure one

I was going to add an index, but Postgres doesn't actually use indexes for this: https://wiki.postgresql.org/wiki/Loose_indexscan (You probably already know this, but I didn't!)

Either way, I get this query plan:

                                                                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=34125.37..34130.37 rows=1000 width=21) (actual time=202.449..203.821 rows=1000 loops=1)
   ->  Unique  (cost=34125.37..34148.42 rows=4611 width=21) (actual time=202.448..203.711 rows=1000 loops=1)
         ->  Sort  (cost=34125.37..34136.89 rows=4611 width=21) (actual time=202.447..203.109 rows=5995 loops=1)
               Sort Key: merge_request_diff_commits.sha
               Sort Method: external merge  Disk: 6144kB
               ->  Nested Loop  (cost=0.86..33844.77 rows=4611 width=21) (actual time=0.019..76.352 rows=202884 loops=1)
                     ->  Index Scan using index_merge_request_diffs_on_merge_request_id on merge_request_diffs  (cost=0.29..7.26 rows=6 width=4) (actual time=0.007..0.017 rows=6 loops=1)
                           Index Cond: (merge_request_id = 8120)
                     ->  Index Scan using index_merge_request_diff_commits_on_mr_diff_id_and_order on merge_request_diff_commits  (cost=0.56..5523.27 rows=11631 width=25) (actual time=0.009..7.834 rows=33814 loops=6)
                           Index Cond: (merge_request_diff_id = merge_request_diffs.id)
 Planning time: 0.310 ms
 Execution time: 214.129 ms
(12 rows)

Time: 214.861 ms

Failure two

I was also going to do the uniq in Ruby (because we do that anyway), but this is actually slower:

mr = MergeRequest.find(8120)
Benchmark.realtime { MergeRequestDiffCommit.where(merge_request_diff: mr.merge_request_diffs).pluck('sha').uniq }
#   (126.0ms)  SELECT "merge_request_diff_commits"."sha" FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = $1)  [["merge_request_id", 8120]]
#=> 0.8035269999818411
Benchmark.realtime { MergeRequestDiffCommit.where(merge_request_diff: mr.merge_request_diffs).pluck('DISTINCT sha').uniq }
#   (296.4ms)  SELECT DISTINCT sha FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = $1)  [["merge_request_id", 8120]]
#=> 0.41926100000273436

Failure three

This is the option I picked, but https://gitlab.com/gitlab-org/gitlab-ce/issues/39533 is the real solution here, to avoid this query entirely.

It's really stupid: I just do the uniq in Ruby, but also add a limit to avoid fetching too many commits. I explicitly don't order because this MR is silly.

Benchmark.realtime { MergeRequestDiffCommit.where(merge_request_diff: mr.merge_request_diffs).limit(10000).pluck('sha').uniq }
   (6.5ms)  SELECT  "merge_request_diff_commits"."sha" FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = $1) LIMIT 10000  [["merge_request_id", 8120]]
=> 0.09853100002510473
gitlabhq_development=# EXPLAIN ANALYZE SELECT sha FROM merge_request_diff_commits WHERE merge_request_diff_id IN (SELECT id FROM merge_request_diffs WHERE merge_request_id = 8120) LIMIT 10000;
                                                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.86..33844.77 rows=4611 width=21) (actual time=0.013..4.941 rows=10000 loops=1)
   ->  Nested Loop  (cost=0.86..33844.77 rows=4611 width=21) (actual time=0.013..3.899 rows=10000 loops=1)
         ->  Index Scan using index_merge_request_diffs_on_merge_request_id on merge_request_diffs  (cost=0.29..7.26 rows=6 width=4) (actual time=0.005..0.005 rows=1 loops=1)
               Index Cond: (merge_request_id = 8120)
         ->  Index Scan using index_merge_request_diff_commits_on_mr_diff_id_and_order on merge_request_diff_commits  (cost=0.56..5523.27 rows=11631 width=25) (actual time=0.007..2.352 rows=10000 loops=1)
               Index Cond: (merge_request_diff_id = merge_request_diffs.id)
 Planning time: 0.188 ms
 Execution time: 5.510 ms
(8 rows)

Time: 6.102 ms

Database Checklist

When adding or modifying queries to improve performance:

  • Included the raw SQL queries of the relevant queries
  • Included the output of EXPLAIN ANALYZE and execution timings of the relevant queries

General Checklist

Closes #39054 (closed).

Edited by Yorick Peterse

Merge request reports