Skip to content

Rewrite the optimized ci_builds sql using union to do it more cleanly

This rewrites the union sql to in ci_builds that previously looked like

select * from ci_builds where id in (select id from ci_builds where ... union all select id from ci_builds where ...)

to instead look like:

select * from (select * from ci_builds where ... union all select * from ci_builds where ...) as ci_builds

This is more important than it appears because any further scopes will tack on where clauses which the first example will not be possible for Postgres to push inside the union. In the new style the where clauses will easily be pushed inside the union and Postgres will be able to use any indexes available.

I've run the rspec tests and captured all SQL queries generated using these scopes. There are only three distinct queries which appear to be from Ci::Pipeline::latest_builds_with_artifacts, Ci::Pipeline::latest_builds_with_artifacts.count, and Ci::Builds.with_expired_artifacts.

The Ci::Pipeline::latest_builds_with_artifacts method creates queries which now look like:

SELECT "ci_builds".* FROM (
  SELECT "ci_builds".* 
    FROM "ci_builds" 
   WHERE "ci_builds"."type" IN ('Ci::Build') AND (artifacts_file <> '')
   UNION ALL
  SELECT "ci_builds".* 
    FROM "ci_builds" 
   WHERE "ci_builds"."type" IN ('Ci::Build') AND ((artifacts_file IS NULL OR artifacts_file = '') AND EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id)))
  ) AS ci_builds 
 WHERE "ci_builds"."type" IN ('Ci::Build') 
   AND "ci_builds"."commit_id" = $1 
   AND ("ci_builds"."retried" = 'f' OR "ci_builds"."retried" IS NULL)

And the new plan looks like:

                                                                   QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.56..393.18 rows=125 width=1519)
   ->  Index Scan using index_ci_builds_on_commit_id_and_status_and_type on ci_builds  (cost=0.56..127.71 rows=122 width=1519)
         Index Cond: ((commit_id = 17644813) AND ((type)::text = 'Ci::Build'::text))
         Filter: (((NOT retried) OR (retried IS NULL)) AND (artifacts_file <> ''::text))
   ->  Nested Loop Semi Join  (cost=0.99..264.23 rows=3 width=1519)
         ->  Index Scan using index_ci_builds_on_commit_id_and_status_and_type on ci_builds ci_builds_1  (cost=0.56..127.71 rows=113 width=1519)
               Index Cond: ((commit_id = 17644813) AND ((type)::text = 'Ci::Build'::text))
               Filter: (((NOT retried) OR (retried IS NULL)) AND ((artifacts_file IS NULL) OR (artifacts_file = ''::text)))
         ->  Index Only Scan using index_ci_job_artifacts_on_job_id_and_file_type on ci_job_artifacts  (cost=0.43..1.98 rows=2 width=4)
               Index Cond: (job_id = ci_builds_1.id)
(10 rows)

The actual sql in the postgres log seems to have a artifacts_not_expired scope which I don't see where in the code it's coming from:

SELECT "ci_builds".* 
  FROM (
    SELECT "ci_builds".* 
      FROM "ci_builds" 
     WHERE "ci_builds"."type" IN ('Ci::Build') 
       AND (artifacts_file <> '' AND (artifacts_expire_at IS NULL OR artifacts_expire_at > '2018-02-18 18:53:10.570832'))
UNION ALL
    SELECT "ci_builds".* 
      FROM "ci_builds" 
     WHERE "ci_builds"."type" IN ('Ci::Build') 
       AND ((artifacts_file IS NULL OR artifacts_file = '') 
       AND EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id AND (expire_at IS NULL OR expire_at > '2018-02-18 18:53:10.571501'))))
  ) AS ci_builds 
 WHERE "ci_builds"."type" IN ('Ci::Build') 
   AND "ci_builds"."commit_id" = $1 
   AND "ci_builds"."type" IN ('Ci::Build') 
   AND ("ci_builds"."retried" = 'f' OR "ci_builds"."retried" IS NULL
)

And the same query with a .count method presumably:

SELECT COUNT(*) 
  FROM (
    SELECT "ci_builds".* 
      FROM "ci_builds" 
     WHERE "ci_builds"."type" IN ('Ci::Build') 
       AND (artifacts_file <> '' 
       AND (artifacts_expire_at IS NULL OR artifacts_expire_at > '2018-02-18 18:53:10.570832'))
 UNION ALL
    SELECT "ci_builds".* 
      FROM "ci_builds" 
     WHERE "ci_builds"."type" IN ('Ci::Build') 
       AND ((artifacts_file IS NULL OR artifacts_file = '') 
       AND EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id AND (expire_at IS NULL OR expire_at > '2018-02-18 18:53:10.571501'))))
  ) AS ci_builds 
 WHERE "ci_builds"."type" IN ('Ci::Build') 
   AND "ci_builds"."commit_id" = $1 
   AND "ci_builds"."type" IN ('Ci::Build') 
   AND ("ci_builds"."retried" = 'f' OR "ci_builds"."retried" IS NULL)

which now generates this plan:

 Aggregate  (cost=576.61..576.62 rows=1 width=8)
   ->  Append  (cost=0.56..576.31 rows=121 width=0)
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.56..139.13 rows=117 width=0)
               ->  Index Scan using index_ci_builds_on_commit_id_and_status_and_type on ci_builds  (cost=0.56..137.96 rows=117 width=4064)
                     Index Cond: ((commit_id = 1) AND ((type)::text = 'Ci::Build'::text))
                     Filter: (((NOT retried) OR (retried IS NULL)) AND (artifacts_file <> ''::text) AND ((artifacts_expire_at IS NULL) OR (artifacts_expire_at > '1999-01-01 00:00:00'::timestamp without time zone)))
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.99..437.17 rows=4 width=0)
               ->  Nested Loop Semi Join  (cost=0.99..437.13 rows=4 width=4064)
                     ->  Index Scan using index_ci_builds_on_commit_id_and_status_and_type on ci_builds ci_builds_1  (cost=0.56..137.63 rows=123 width=4)
                           Index Cond: ((commit_id = 1) AND ((type)::text = 'Ci::Build'::text))
                           Filter: (((NOT retried) OR (retried IS NULL)) AND ((artifacts_file IS NULL) OR (artifacts_file = ''::text)))
                     ->  Index Scan using index_ci_job_artifacts_on_job_id_and_file_type on ci_job_artifacts  (cost=0.43..4.38 rows=2 width=4)
                           Index Cond: (job_id = ci_builds_1.id)
                           Filter: ((expire_at IS NULL) OR (expire_at > '1999-01-01 00:00:00+00'::timestamp with time zone))

Lastly the only other query seen in the rspec tests comes directly from Ci::Build.with_expired_artifacts which is run by the sidekiq job trying to expire artifacts. This query was previously hitting statement_timeout and was the original motivation for this MR:

SELECT "ci_builds"."id" 
  FROM (
    SELECT "ci_builds".* 
      FROM "ci_builds" 
     WHERE "ci_builds"."type" IN ('Ci::Build') 
       AND (artifacts_file <> '' 
       AND artifacts_expire_at < '2018-02-18 16:50:16.269315')
 UNION ALL
    SELECT "ci_builds".* 
      FROM "ci_builds" 
     WHERE "ci_builds"."type" IN ('Ci::Build') 
       AND ((artifacts_file IS NULL OR artifacts_file = '') 
       AND EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id AND expire_at < '2018-02-18 16:50:16.270639')))
  ) AS ci_builds 
 WHERE "ci_builds"."type" IN ('Ci::Build')

And that generates this plan:

                                                                                                     QUERY PLAN                                                                                                     
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.43..4223727.14 rows=6317078 width=4) (actual time=0.025..7259.167 rows=956736 loops=1)
   ->  Subquery Scan on "*SELECT* 1"  (cost=0.43..1840370.96 rows=5431296 width=4) (actual time=0.025..1456.550 rows=464874 loops=1)
         ->  Index Scan using index_ci_builds_on_artifacts_expire_at on ci_builds  (cost=0.43..1786058.00 rows=5431296 width=4064) (actual time=0.024..1238.297 rows=464874 loops=1)
               Index Cond: (artifacts_expire_at < '2018-02-18 16:50:16.269315'::timestamp without time zone)
               Filter: ((type)::text = 'Ci::Build'::text)
   ->  Subquery Scan on "*SELECT* 2"  (cost=134074.06..2383356.18 rows=885782 width=4) (actual time=1058.890..5354.183 rows=491862 loops=1)
         ->  Nested Loop  (cost=134074.06..2374498.36 rows=885782 width=4064) (actual time=1058.889..5095.644 rows=491862 loops=1)
               ->  Unique  (cost=134073.50..139009.18 rows=815089 width=4) (actual time=1058.818..1912.501 rows=491862 loops=1)
                     ->  Sort  (cost=134073.50..136541.34 rows=987137 width=4) (actual time=1058.815..1501.192 rows=983720 loops=1)
                           Sort Key: ci_job_artifacts.job_id
                           Sort Method: external merge  Disk: 13440kB
                           ->  Index Only Scan using index_ci_job_artifacts_on_expire_at_and_job_id on ci_job_artifacts  (cost=0.43..27111.49 rows=987137 width=4) (actual time=0.051..358.960 rows=983720 loops=1)
                                 Index Cond: (expire_at < '2018-02-12 16:50:16.270639+00'::timestamp with time zone)
                                 Heap Fetches: 4323
               ->  Index Scan using ci_builds_pkey on ci_builds ci_builds_1  (cost=0.56..2.73 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=491862)
                     Index Cond: (id = ci_job_artifacts.job_id)
                     Filter: (((artifacts_file IS NULL) OR (artifacts_file = ''::text)) AND ((type)::text = 'Ci::Build'::text))
 Planning time: 0.844 ms
 Execution time: 7485.729 ms
(19 rows)

Out of paranoia I've also tested that this coding pattern propagates all scopes properly regardless of whether they're applied before or after this scope:

[39] pry(main)> print Ci::Build.ref_protected.with_artifacts.last_month.to_sql
SELECT "ci_builds".* FROM (
  SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND (artifacts_file <> '')
  UNION ALL
  SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND ((artifacts_file IS NULL OR artifacts_file = '') AND EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id)))
) AS ci_builds 
WHERE "ci_builds"."type" IN ('Ci::Build') 
  AND "ci_builds"."protected" = 't' 
  AND (created_at > '2018-01-17')

What does this MR do?

Are there points in the code the reviewer needs to double check?

Why was this MR needed?

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Edited by Gregory Stark

Merge request reports