Skip to content

Add index to ci_job_artifacts

Ref: #362640 (closed)

What does this MR do and why?

Following up !88427 (merged) where we introduced a new index on ci_job_artifacts: (project_id, id) during low traffic this weekend.

This MR is adding a migration which will create our index for users outside of gitlab.com as explained in our documentation.

Why are we doing this?

This index will be used to order job artifacts by id instead of created_at to recalculate job artifacts size which is currently broken for some projects.

Database queries

Verification that our index exists on production
Table "public.ci_job_artifacts"
      Column       |           Type           | Collation | Nullable |                   Default                    | Storage  | Stats target | Description 
-------------------+--------------------------+-----------+----------+----------------------------------------------+----------+--------------+-------------
 project_id        | integer                  |           | not null |                                              | plain    |              | 
 file_type         | integer                  |           | not null |                                              | plain    |              | 
 size              | bigint                   |           |          |                                              | plain    |              | 
 created_at        | timestamp with time zone |           | not null |                                              | plain    |              | 
 updated_at        | timestamp with time zone |           | not null |                                              | plain    |              | 
 expire_at         | timestamp with time zone |           |          |                                              | plain    |              | 
 file              | character varying        |           |          |                                              | extended |              | 
 file_store        | integer                  |           |          | 1                                            | plain    |              | 
 file_sha256       | bytea                    |           |          |                                              | extended |              | 
 file_format       | smallint                 |           |          |                                              | plain    |              | 
 file_location     | smallint                 |           |          |                                              | plain    |              | 
 id                | bigint                   |           | not null | nextval('ci_job_artifacts_id_seq'::regclass) | plain    |              | 
 job_id            | bigint                   |           | not null |                                              | plain    |              | 
 locked            | smallint                 |           |          | 2                                            | plain    |              | 
 original_filename | text                     |           |          |                                              | extended |              | 
Indexes:
    "ci_job_artifacts_pkey" PRIMARY KEY, btree (id)
    "index_ci_job_artifacts_on_job_id_and_file_type" UNIQUE, btree (job_id, file_type)
    "index_ci_job_artifacts_for_terraform_reports" btree (project_id, id) WHERE file_type = 18
    "index_ci_job_artifacts_id_for_terraform_reports" btree (id) WHERE file_type = 18
    "index_ci_job_artifacts_on_expire_at_and_job_id" btree (expire_at, job_id)
    "index_ci_job_artifacts_on_expire_at_for_removal" btree (expire_at) WHERE locked = 0 AND expire_at IS NOT NULL
    "index_ci_job_artifacts_on_file_store" btree (file_store)
    "index_ci_job_artifacts_on_file_type_for_devops_adoption" btree (file_type, project_id, created_at) WHERE file_type = ANY (ARRAY[5, 6, 8, 23])
    "index_ci_job_artifacts_on_id_project_id_and_created_at" btree (project_id, created_at, id)
    "index_ci_job_artifacts_on_id_project_id_and_file_type" btree (project_id, file_type, id)
    "index_ci_job_artifacts_on_project_id" btree (project_id)
+   "index_ci_job_artifacts_on_project_id_and_id" btree (project_id, id)
    "index_ci_job_artifacts_on_project_id_for_security_reports" btree (project_id) WHERE file_type = ANY (ARRAY[5, 6, 7, 8])
    "tmp_index_ci_job_artifacts_on_expire_at_where_locked_unknown" btree (expire_at, job_id) WHERE locked = 2 AND expire_at IS NOT NULL
    "tmp_index_ci_job_artifacts_on_id_where_trace_and_expire_at" btree (id) WHERE file_type = 3 AND (expire_at = ANY (ARRAY['2021-04-22 00:00:00+00'::timestamp with time zone, '2021-05-22 00:00:00+00'::timestamp with time zone, '2021-06-22 00:00:00+00'::timestamp with time zone, '2022-01-22 00:00:00+00'::timestamp with time zone, '2022-02-22 00:00:00+00'::timestamp with time zone, '2022-03-22 00:00:00+00'::timestamp with time zone, '2022-04-22 00:00:00+00'::timestamp with time zone]))
Check constraints:
    "check_27f0f6dbab" CHECK (file_store IS NOT NULL)
    "check_85573000db" CHECK (char_length(original_filename) <= 512)
Foreign-key constraints:
    "fk_rails_c5137cb2c1" FOREIGN KEY (job_id) REFERENCES ci_builds(id) ON DELETE CASCADE
Referenced by:
    TABLE "ci_build_trace_metadata" CONSTRAINT "fk_21d25cac1a" FOREIGN KEY (trace_artifact_id) REFERENCES ci_job_artifacts(id) ON DELETE CASCADE
    TABLE "ci_job_artifact_states" CONSTRAINT "fk_rails_80a9cba3b2" FOREIGN KEY (job_artifact_id) REFERENCES ci_job_artifacts(id) ON DELETE CASCADE
Access method: heap

Migration output

ci: == 20220530074027 AddProjectIdIndexToJobArtifacts: migrating ==================
ci: -- transaction_open?()
ci:    -> 0.0000s
ci: -- index_exists?(:ci_job_artifacts, [:project_id, :id], {:name=>"index_ci_job_artifacts_on_project_id_and_id", :algorithm=>:concurrently})
ci:    -> 0.0053s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0001s
ci: -- add_index(:ci_job_artifacts, [:project_id, :id], {:name=>"index_ci_job_artifacts_on_project_id_and_id", :algorithm=>:concurrently})
ci:    -> 0.0015s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0001s
ci: == 20220530074027 AddProjectIdIndexToJobArtifacts: migrated (0.0112s) =========

-------------------------- ROLLBACK --------------------------------

ci: == 20220530074027 AddProjectIdIndexToJobArtifacts: reverting ==================
ci: -- transaction_open?()
ci:    -> 0.0000s
ci: -- indexes(:ci_job_artifacts)
ci:    -> 0.0062s
ci: -- execute("SET statement_timeout TO 0")
ci:    -> 0.0001s
ci: -- remove_index(:ci_job_artifacts, {:algorithm=>:concurrently, :name=>"index_ci_job_artifacts_on_project_id_and_id"})
ci:    -> 0.0010s
ci: -- execute("RESET statement_timeout")
ci:    -> 0.0001s
ci: == 20220530074027 AddProjectIdIndexToJobArtifacts: reverted (0.0116s) =========

NEW query without index

SELECT "ci_job_artifacts"."id", "ci_job_artifacts"."size"
FROM "ci_job_artifacts"
WHERE "ci_job_artifacts"."project_id" = 278964
AND "ci_job_artifacts"."id" > 500000
AND "ci_job_artifacts"."id" <= 700000
ORDER BY "ci_job_artifacts"."id" ASC
LIMIT 1000;

NEW query with index

SELECT "ci_job_artifacts"."id", "ci_job_artifacts"."size"
FROM "ci_job_artifacts"
WHERE "ci_job_artifacts"."project_id" = 278964
AND "ci_job_artifacts"."id" > 500000
AND "ci_job_artifacts"."id" <= 700000
ORDER BY "ci_job_artifacts"."id" ASC
LIMIT 1000;

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Max Orefice

Merge request reports

Loading