Skip to content

500 is raised on pipelines/charts due to query timeout

Summary

/gitlab-org/gitlab-ce/pipelines/charts pages causes 500 to be raised due to

ActiveRecord::QueryCanceled: PG::QueryCanceled: ERROR: canceling statement due to statement timeout :

SELECT COUNT("ci_pipelines"."created_at") AS count_created_at, DATE(ci_pipelines.created_at) AS date_ci_pipelines_created_at 
FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 13083 AND ('2019-09-03 23:59:59.999999' > ci_pipelines.created_at AND 
ci_pipelines.created_at > '2019-08-27 00:00:00') GROUP BY DATE(ci_pipelines.created_at)

The query indeed seems ineffective:

Query Plan: https://explain.depesz.com/s/Vtb2

Steps to reproduce

Visit https://gitlab.com/gitlab-org/gitlab-ce/pipelines/charts multiple times, the error is raised from time to time

Proposal

There are two database query triggered by Gitlab::Ci::Charts::YearChart.new(project):

Summary:
Time: 3.373 min
  - planning: 0.297 ms
  - execution: 3.373 min
    - I/O read: 3.338 min
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 14050 (~109.80 MiB) from the buffer pool
  - reads: 160734 (~1.20 GiB) from the OS file cache, including disk I/O
  - dirtied: 289 (~2.30 MiB)
  - writes: 0

Both of them use the index added in !38226 (merged), but the execution for the second one is quite bad.

A possible two step solution:

  • Include the status column in the index_ci_pipelines_on_project_id_and_created_at index, like add_concurrent_index :ci_pipelines, [:project_id, :created_at, :status]. This should speed up the second query.
  • Split the date range and query the data monthly. This might have a bit higher execution time, it's unlikely that we hit the statement_timeout.

Example Project

https://gitlab.com/gitlab-org/gitlab-ce/pipelines/charts

What is the current bug behavior?

Raises 500 from time to time

What is the expected correct behavior?

The page is responsive all the time

Edited by Marius Bobin