Pipeline schedules page is really slow on some projects

The pipeline schedules page for this project (which has just five schedules) is very slow and sometimes times out:

That's when it runs this query:

SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."pipeline_schedule_id" IN (45276, 43210, 43209, 43208, 13682) ORDER BY "ci_pipelines"."id" DESC

Here's an extreme example of a cold query plan for that:

gitlabhq_production=> EXPLAIN (analyze, buffers)  SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."pipeline_schedule_id" IN (45276, 43210, 43209, 43208, 13682) ORDER BY "ci_pipelines"."id" DESC;
                                                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=33609.12..33686.73 rows=31042 width=317) (actual time=40528.985..40534.862 rows=36357 loops=1)
   Sort Key: id DESC
   Sort Method: quicksort  Memory: 11194kB
   Buffers: shared hit=297 read=36151 written=8287
   I/O Timings: read=38759.033 write=1176.065
   ->  Index Scan using index_ci_pipelines_on_pipeline_schedule_id on ci_pipelines  (cost=0.57..31293.09 rows=31042 width=317) (actual time=3.569..40407.189 rows=36357 loops=1)
         Index Cond: (pipeline_schedule_id = ANY ('{45276,43210,43209,43208,13682}'::integer[]))
         Buffers: shared hit=294 read=36151 written=8287
         I/O Timings: read=38759.033 write=1176.065
 Planning Time: 99.826 ms
 Execution Time: 40537.493 ms
(11 rows)

This query comes from a view: https://gitlab.com/gitlab-org/gitlab/-/blob/bc6d5aa6c1ce257dc4f5793c495f32889b3e6636/app/views/projects/pipeline_schedules/index.html.haml#L15

https://gitlab.com/gitlab-org/quality/triage-ops/-/pipeline_schedules has many more schedules, and is slow, but not as slow as the first project mentioned.

Proposal

In the schedules controller we are trying to preload the latest pipeline for each schedule:

  def index
    @scope = params[:scope]
    @all_schedules = Ci::PipelineSchedulesFinder.new(@project).execute
    @schedules = Ci::PipelineSchedulesFinder.new(@project).execute(scope: params[:scope])
      .includes(:last_pipeline)
  end

And last_pipeline is defined as has_one :last_pipeline, -> { order(id: :desc) }, class_name: 'Ci::Pipeline'

ActiveRecord loads all the pipelines for the given schedules in memory and figures out what the last pipeline is in Ruby. I think we need a custom preloader that sets the last_pipeline records using a more efficient SQL query:

SELECT *
  FROM ci_pipelines
  INNER JOIN (
    SELECT distinct FIRST_VALUE(id) OVER (PARTITION BY pipeline_schedule_id order by id desc) AS id
      from ci_pipelines
      where pipeline_schedule_id in (?)
  ) last_pipelines on last_pipelines.id = ci_pipelines.id

Edited Sep 23, 2020 by Marius Bobin