Skip to content

Optimize query for loading artifacts in pipeline

Stan Hu requested to merge sh-optimize-artifact-loading-mr into master

Previously to query the latest artifacts for a given pipeline we would include the project ID in the query just in case. However, the inclusion of this project ID may cause the PostgreSQL query planner to use the wrong index. For example, it may use index_ci_job_artifacts_on_project_id, which is slow because there may be thousands of artifacts for a given project.

To optimize this, we can just omit the project ID since we really care about job artifacts for a given build.

Relates to gitlab-com/gl-infra/production#4674 (closed)

Before

Notice the 30,741 rows filtered:

gitlabhq_production=# explain analyze SELECT "ci_job_artifacts".* FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."id" IN (SELECT max(ci_job_artifacts.id) as id FROM "ci_job_artifacts" INNER JOIN "ci_builds" ON "ci_job_artifacts"."job_id" = "ci_builds"."id" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."commit_id" = 306905697 AND "ci_job_artifacts"."file_type" IN (4, 5, 6, 7, 8, 9, 10, 101, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26) AND "ci_job_artifacts"."project_id" = 16672586 GROUP BY "ci_job_artifacts"."file_type");
                                                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=7885.17..7888.20 rows=1 width=146) (actual time=295.004..295.008 rows=0 loops=1)
   ->  HashAggregate  (cost=7884.59..7884.60 rows=1 width=4) (actual time=295.002..295.006 rows=0 loops=1)
         Group Key: max(ci_job_artifacts_1.id)
         ->  GroupAggregate  (cost=7884.56..7884.58 rows=1 width=8) (actual time=295.001..295.004 rows=0 loops=1)
               Group Key: ci_job_artifacts_1.file_type
               ->  Sort  (cost=7884.56..7884.57 rows=1 width=8) (actual time=295.000..295.003 rows=0 loops=1)
                     Sort Key: ci_job_artifacts_1.file_type
                     Sort Method: quicksort  Memory: 25kB
                     ->  Hash Join  (cost=470.71..7884.55 rows=1 width=8) (actual time=294.995..294.997 rows=0 loops=1)
                           Hash Cond: (ci_job_artifacts_1.job_id = ci_builds.id)
                           ->  Index Scan using index_ci_job_artifacts_on_project_id on ci_job_artifacts ci_job_artifacts_1  (cost=0.57..7413.99 rows=163 width=12) (actual time=294.993..294.994 rows=0 loops=1)
                                 Index Cond: (project_id = 16672586)
                                 Filter: (file_type = ANY ('{4,5,6,7,8,9,10,101,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26}'::integer[]))
                                 Rows Removed by Filter: 30741
                           ->  Hash  (cost=465.68..465.68 rows=356 width=4) (never executed)
                                 ->  Index Scan using index_ci_builds_on_commit_id_and_type_and_ref on ci_builds  (cost=0.70..465.68 rows=356 width=4) (never executed)
                                       Index Cond: ((commit_id = 306905697) AND ((type)::text = 'Ci::Build'::text))
   ->  Index Scan using ci_job_artifacts_pkey on ci_job_artifacts  (cost=0.57..3.59 rows=1 width=146) (never executed)
         Index Cond: (id = (max(ci_job_artifacts_1.id)))
 Planning Time: 1.185 ms
 Execution Time: 295.091 ms
(21 rows)

After

Compared to 1 row!

gitlabhq_production=# explain analyze SELECT "ci_job_artifacts".* FROM "ci_job_artifacts" WHERE "ci_job_artifacts"."id" IN (SELECT max(ci_job_artifacts.id) as id FROM "ci_job_artifacts" INNER JOIN "ci_builds" ON "ci_job_artifacts"."job_id" = "ci_builds"."id" WHERE "ci_builds"."type" = 'Ci::Build' AND "ci_builds"."commit_id" = 306905697 AND "ci_job_artifacts"."file_type" IN (4, 5, 6, 7, 8, 9, 10, 101, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26) GROUP BY "ci_job_artifacts"."file_type");
                                                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=7974.83..8010.28 rows=10 width=146) (actual time=0.593..0.595 rows=0 loops=1)
   ->  HashAggregate  (cost=7974.26..7974.36 rows=10 width=4) (actual time=0.592..0.594 rows=0 loops=1)
         Group Key: max(ci_job_artifacts_1.id)
         ->  GroupAggregate  (cost=7973.96..7974.13 rows=10 width=8) (actual time=0.591..0.592 rows=0 loops=1)
               Group Key: ci_job_artifacts_1.file_type
               ->  Sort  (cost=7973.96..7973.98 rows=10 width=8) (actual time=0.589..0.590 rows=0 loops=1)
                     Sort Key: ci_job_artifacts_1.file_type
                     Sort Method: quicksort  Memory: 25kB
                     ->  Nested Loop  (cost=1.28..7973.79 rows=10 width=8) (actual time=0.585..0.585 rows=0 loops=1)
                           ->  Index Scan using index_ci_builds_on_commit_id_and_type_and_ref on ci_builds  (cost=0.70..465.68 rows=356 width=4) (actual time=0.471..0.532 rows=5 loops=1)
                                 Index Cond: ((commit_id = 306905697) AND ((type)::text = 'Ci::Build'::text))
                           ->  Index Scan using index_ci_job_artifacts_on_job_id_and_file_type on ci_job_artifacts ci_job_artifacts_1  (cost=0.57..21.08 rows=1 width=12) (actual time=0.009..0.009 rows=0 loops=5)
                                 Index Cond: (job_id = ci_builds.id)
                                 Filter: (file_type = ANY ('{4,5,6,7,8,9,10,101,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26}'::integer[]))
                                 Rows Removed by Filter: 1
   ->  Index Scan using ci_job_artifacts_pkey on ci_job_artifacts  (cost=0.57..3.59 rows=1 width=146) (never executed)
         Index Cond: (id = (max(ci_job_artifacts_1.id)))
 Planning Time: 0.965 ms
 Execution Time: 0.675 ms
(19 rows)
Edited by Stan Hu

Merge request reports