Skip to content

Optimized query to support "Don't list jobs with no artifacts"

This is the issue for backend support of #382074 (closed).

The current query times out for larger, complex data sets per #382074 (comment 1209912062):

Just coming back to this with the postgres.ai result. https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/13998/commands/48935

  • ️ Query processes too much data to return a relatively small number of rows. – Reduce data cardinality as early as possible during the execution, using one or several of the following techniques: new indexes, partitioning, query rewriting, denormalization. See the visualization of the plan to understand which plan nodes are the main bottlenecks. <http://momjian.us/main/writings/pgsql/hw_performance/|Show details>
  • ️ Add LIMIT – The number of rows in the result set is too big. Limit number of rows. <https://postgres.ai/#tip-add-limit|Show details>
  • ️ Specialized index needed – The index(es) currently used does not serve quite well for the needs of this query (notice Rows Removed by Filter: ..., meaning that the index fetched many non-target rows). Consider adding more specialized index(es). <https://postgres.ai/#tip-index-inefficient-high-filtered|Show details>
  • ️ VACUUM ANALYZE needed – Visibility map(s) for the table(s) involved in the query are outdated. For better performance: 1) run VACUUM ANALYZE on them as a one-time solution, 2) perform autovacuum tuning as a long-term permanent solution (tuning "vacuum" part of autovacuum). <https://postgres.ai/#tip-vacuum-analyze-needed|Show details>

We need to optimize the query.

Current query made by Ci::JobFinder when called from ProjectJobsResolver with with_artifacts: true

SELECT "ci_builds".* 
    FROM "ci_builds" 
    WHERE "ci_builds"."type" = 'Ci::Build' 
      AND "ci_builds"."project_id" = 278964 
      AND ("ci_builds"."status" NOT IN ('created')) 
      AND (
          EXISTS (
              SELECT 1 FROM "ci_job_artifacts" WHERE ("ci_builds".id = "ci_job_artifacts".job_id) AND "ci_job_artifacts"."file_type" IN (1, 2, 4, 5, 6, 7, 8, 9, 101, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)
              )
          ) ORDER BY "ci_builds"."id" DESC;

Query plan with exec

 Gather Merge  (cost=1001.31..205227435.50 rows=4611629 width=1247) (actual time=141.034..29501250.832 rows=5117617 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=414643740 read=69519177 dirtied=1162358 written=89560
   I/O Timings: read=53284434.610 write=5532.121
   ->  Nested Loop Semi Join  (cost=1.28..204694139.07 rows=1921512 width=1247) (actual time=71.638..18293555.500 rows=1705872 loops=3)
         Buffers: shared hit=414643740 read=69519177 dirtied=1162358 written=89560
         I/O Timings: read=53284434.610 write=5532.121
         ->  Parallel Index Scan using index_ci_builds_on_project_id_and_id on public.ci_builds  (cost=0.70..120789419.81 rows=36334530 width=1247) (actual time=26.366..16972977.633 rows=31472973 loops=3)
               Index Cond: (ci_builds.project_id = 278964)
               Filter: (((ci_builds.status)::text <> 'created'::text) AND ((ci_builds.type)::text = 'Ci::Build'::text))
               Rows Removed by Filter: 126872
               Buffers: shared hit=12876157 read=65273152 dirtied=229967 written=65646
               I/O Timings: read=50191630.433 write=3933.365
         ->  Index Only Scan using index_ci_job_artifacts_on_job_id_and_file_type on public.ci_job_artifacts  (cost=0.58..2.31 rows=1 width=8) (actual time=0.040..0.040 rows=0 loops=94418919)
               Index Cond: (ci_job_artifacts.job_id = ci_builds.id)
               Heap Fetches: 3782693
               Filter: (ci_job_artifacts.file_type = ANY ('{1,2,4,5,6,7,8,9,101,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28}'::integer[]))
               Rows Removed by Filter: 1
               Buffers: shared hit=401767583 read=4246025 dirtied=930258 written=23914
               I/O Timings: read=3092804.177 write=1598.756
Edited by Albert