Async drop redundant index index_ci_job_variables_on_partition_id_job_id
What does this MR do and why?
This MR prepares the async removal of the index index_ci_job_variables_on_partition_id_job_id on table ci_job_variables. By dropping the index we can reclaim at least 12 GB of disk space.
It will be removed synchronously as part of the same issue #544929 (closed).
Index removal justification
We can remove:
"index_ci_job_variables_on_partition_id_job_id" btree (partition_id, job_id)
because:
- This index was originally introduced to support our CI partitioning efforts related to FKs. It is no longer required for this purpose. Refs #498968 (comment 2454944661), #498968 (comment 2497789884):
We added these at the beginning of the partitioning work because we had a database test that made it mandatory for each FK to have an accompanying index.
The ones from the table are droppable because we added them for FKs, but they are not needed, the FKs can delete fine without them.
- We already have coverage on
job_idwith"index_ci_job_variables_on_job_id" btree (job_id), which makes the index withpartition_idredundant because:
- Its relationship with
p_ci_buildsis non-unique, so we don't need a composite withpartition_idto ensure uniqueness. - The table
ci_job_variablesis non-partitioned, so filtering byjob_idis sufficient. Includingpartition_iddoesn't significantly improve query performance.
- We do see some usage on the index in Grafana, but it's relatively low compared to the index on just
job_id.
- This low usage can be accounted for by Postgres' occasional preference for
index_ci_job_variables_on_partition_id_job_idfor certain queries, but those same queries should be almost as effectively supported byindex_ci_job_variables_on_job_idas explained above.
- We don't have any queries on
ci_job_variablesthat filter only onpartition_idand notjob_id.
References
- Issue: #544929 (closed)
How to set up and validate locally
Follow the steps in Verify indexes removed asynchronously. You may have to repeatedly run bundle exec rails gitlab:db:reindex since it only processes 2 pending async indexes at a time by default.
The following shows my local results.
Before dropping index:
gitlabhq_development=# \d+ index_ci_job_variables_on_partition_id_job_id
Index "public.index_ci_job_variables_on_partition_id_job_id"
Column | Type | Key? | Definition | Storage | Stats target
--------------+--------+------+--------------+---------+--------------
partition_id | bigint | yes | partition_id | plain |
job_id | bigint | yes | job_id | plain |
btree, for table "public.ci_job_variables"
After dropping index:
gitlabhq_development=# \d+ index_ci_job_variables_on_partition_id_job_id
Did not find any relation named "index_ci_job_variables_on_partition_id_job_id".
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #544929 (closed)