Skip to content

Change SQL for expired artifacts to use new ci_job_artifacts.expire_at

Gregory Stark requested to merge expired-ci-artifacts into master

Add indexes and change the queries to deal with expired artifacts.

The migration to add the indexes looks like:

== 20180119160751 OptimizeCiJobArtifacts: migrating ===========================
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- add_index(:ci_job_artifacts, [:expire_at, :job_id], {:algorithm=>:concurrently})
   -> 0.0992s
-- transaction_open?()
   -> 0.0000s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- add_index(:ci_builds, [:artifacts_expire_at], {:where=>"artifacts_file <> ''", :algorithm=>:concurrently})
   -> 0.0353s
== 20180119160751 OptimizeCiJobArtifacts: migrated (0.1360s) ==================

And the rollback:

== 20180119160751 OptimizeCiJobArtifacts: reverting ===========================
-- transaction_open?()
   -> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
   -> 0.0010s
-- execute("SET statement_timeout TO 0")
   -> 0.0004s
-- remove_index(:ci_job_artifacts, {:algorithm=>:concurrently, :column=>[:expire_at, :job_id]})
   -> 0.0464s
-- transaction_open?()
   -> 0.0000s
-- select_one("SELECT current_setting('server_version_num') AS v")
   -> 0.0008s
-- execute("SET statement_timeout TO 0")
   -> 0.0005s
-- remove_index(:ci_builds, {:where=>"artifacts_file <> ''", :algorithm=>:concurrently, :column=>[:artifacts_expire_at]})
   -> 0.0046s
== 20180119160751 OptimizeCiJobArtifacts: reverted (0.0547s) ==================

And the old queries look like this example (this is for expired artifacts but the others are more or less similar):

SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND ((artifacts_file <> '') OR EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id))) AND (artifacts_expire_at < '2018-01-19 12:42:31.175351')

Now it looks like:

-- [11] pry(main)> print Ci::Build.with_artifacts.to_sql
SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND (ci_builds.id IN (SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND (artifacts_file <> '') UNION ALL SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND ((artifacts_file IS NULL OR artifacts_file = '') AND EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id)))))
-- [13] pry(main)> print Ci::Build.with_artifacts_not_expired.to_sql
SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND (ci_builds.id IN (SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND (artifacts_file <> '' AND (artifacts_expire_at IS NULL OR artifacts_expire_at > '2018-02-09 18:29:08.599269')) UNION ALL SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND ((artifacts_file IS NULL OR artifacts_file = '') AND EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id AND (expire_at IS NULL OR expire_at > '2018-02-09 18:29:08.600634'))))))
-- [14] pry(main)> print Ci::Build.with_expired_artifacts.to_sql
SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND (ci_builds.id IN (SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND (artifacts_file <> '' AND artifacts_expire_at < '2018-02-09 18:29:13.525384') UNION ALL SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND ((artifacts_file IS NULL OR artifacts_file = '') AND EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id AND expire_at < '2018-02-09 18:29:13.526227')))))

With a plan like (on staging):

gitlabhq_production=# explain SELECT "ci_builds".* FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND (ci_builds.id IN (SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND (artifacts_file <> '' AND artifacts_expire_at < '2018-02-09 18:49:51.885305') UNION ALL SELECT "ci_builds"."id" FROM "ci_builds" WHERE "ci_builds"."type" IN ('Ci::Build') AND ((artifacts_file IS NULL OR artifacts_file = '') AND EXISTS (SELECT 1 FROM "ci_job_artifacts" WHERE (ci_builds.id = ci_job_artifacts.job_id AND expire_at < '2018-02-09 18:49:51.885977')))));
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=3020159.07..3021079.51 rows=16265026 width=1441)
   ->  HashAggregate  (cost=3020158.51..3020160.51 rows=200 width=4)
         Group Key: ci_builds_1.id
         ->  Append  (cost=0.43..3010781.58 rows=3750772 width=4)
               ->  Index Scan using index_ci_builds_on_artifacts_expire_at on ci_builds ci_builds_1  (cost=0.43..2973246.56 rows=3750767 width=4)
                     Index Cond: (artifacts_expire_at < '2018-02-09 18:49:51.885305'::timestamp without time zone)
                     Filter: ((type)::text = 'Ci::Build'::text)
               ->  Nested Loop  (cost=4.82..27.30 rows=5 width=4)
                     ->  HashAggregate  (cost=4.26..4.31 rows=5 width=4)
                           Group Key: ci_job_artifacts.job_id
                           ->  Index Only Scan using index_ci_job_artifacts_on_expire_at_and_job_id on ci_job_artifacts  (cost=0.14..4.24 rows=6 width=4)
                                 Index Cond: (expire_at < '2018-02-09 18:49:51.885977+00'::timestamp with time zone)
                     ->  Index Scan using ci_builds_pkey on ci_builds ci_builds_2  (cost=0.56..4.59 rows=1 width=4)
                           Index Cond: (id = ci_job_artifacts.job_id)
                           Filter: (((artifacts_file IS NULL) OR (artifacts_file = ''::text)) AND ((type)::text = 'Ci::Build'::text))
   ->  Index Scan using ci_builds_pkey on ci_builds  (cost=0.56..4.59 rows=1 width=1441)
         Index Cond: (id = ci_builds_1.id)
         Filter: ((type)::text = 'Ci::Build'::text)
(18 rows)

The plans for the other queries are not as great though they're better than they were. Hopefully they'll be good when other where clauses are added like the current project or job or whatever.

Eventually all this UNION stuff is going to be ripped out when the migration is finished and then these indexes will be even more likely to be used wherever they can be.

Database Checklist

When adding migrations:

  • Updated db/schema.rb
  • Added a down method so the migration can be reverted
  • Added the output of the migration(s) to the MR body
  • Added tests for the migration in spec/migrations if necessary (e.g. when migrating data)

When adding or modifying queries to improve performance:

  • Included data that shows the performance improvement, preferably in the form of a benchmark
  • Included the output of EXPLAIN (ANALYZE, BUFFERS) of the relevant queries

When adding foreign keys to existing tables:

  • Included a migration to remove orphaned rows in the source table before adding the foreign key
  • Removed any instances of dependent: ... that may no longer be necessary

When adding tables:

  • Ordered columns based on the Ordering Table Columns guidelines
  • Added foreign keys to any columns pointing to data in other tables
  • Added indexes for fields that are used in statements such as WHERE, ORDER BY, GROUP BY, and JOINs

When removing columns, tables, indexes or other structures:

  • Removed these in a post-deployment migration
  • Made sure the application no longer uses (or ignores) these structures

General Checklist

Edited by Grzegorz Bizon

Merge request reports