Cleanup unused indexes in CI tables

A large % of our disk usage for CI tables are the indexes. This issue is to provide a preliminary analysis of the indexes that we could remove/optimize, along with estimates of disk space savings we expect to see.

Index evaluation guide

[UPDATE 2025-05-01]: This guide has been added to our development docs (!189872 (merged)). Refer to Dropping unused indexes for the latest updates and further details.

The Index usage investigation steps and Query investigation steps below are a suggested guide to evaluate the index and ensure it is unused prior to removal. This guide is not exhaustive and you may choose to deviate from it; however, with any approach, please keep in mind:

⚠️ Dropping the wrong indexes could result in significant query performance degradation or timeouts, potentially leading to incidents. So we must be confident that the index can be safely removed.

Considerations

It's possible that an index has zero usage on GitLab.com but not on self-managed. It's important to thoroughly investigate the history of the index and existing code to ensure there are no queries that depend on it.
Indexes that show non-zero but relatively low usage may still be dropped if it's determined that other existing indexes would sufficiently support the queries using it. This is because Postgres determines which index to use based on data distribution statistics; in some cases it may slightly prefer one index over another even if both indexes adequately support the query.
For partitioned tables, we must check that the corresponding indexes (i.e. "child indexes") on all partitions are unused. Also note: only the parent index can be dropped. Postgres does not allow child indexes to be independently dropped.
For reference, partitioned CI tables often (but not always!) follow these parent-child naming patterns:

Parent: p_ci_my_table

Index: p_ci_my_table_column_names_idx

Children:

First partition: ci_my_table
- The p_ prefix is not present because this was an old ci table that was later attached to the parent as the first partition.
- Index: index_my_table_on_column_names
Dynamic partitions: ci_my_table_102, ci_my_table_103, ci_my_table_104, etc.
- Index: ci_my_table_102_column_names_idx

Index usage investigation steps

Based on: #520213 (comment 2443401100)

Check if the index is present in the latest postgres.ai Unused Indexes report.
For partitioned tables, we need to check all corresponding indexes on partitions (these "child indexes" are usually named differently than the parent index). We should also verify their definitions on Production to avoid an index mismatch situation. Here are a few resources to help:

Manually inspect db/structure.sql.
Check the Production schemas with \d commands on postgres.ai Database Lab Engine (ref: #520213 (comment 2452364282)).
To quickly list all child indexes, run \d+ <PARENT_INDEX_NAME>.
To see the full parent-child index structure with more detail, you can run the following query in PSQL console:

SELECT
  parent_idx.relname AS parent_index,
  child_tbl.relname AS child_table,
  child_idx.relname AS child_index,
  dep.deptype,
  pg_get_indexdef(child_idx.oid) AS child_index_def
FROM
  pg_class parent_idx
JOIN pg_depend dep ON dep.refobjid = parent_idx.oid
JOIN pg_class child_idx ON child_idx.oid = dep.objid
JOIN pg_index i ON i.indexrelid = child_idx.oid
JOIN pg_class child_tbl ON i.indrelid = child_tbl.oid
WHERE
  parent_idx.relname = '<PARENT_INDEX_NAME>';

In Grafana, query the metric pg_stat_user_indexes_idx_scan filtered by the relevant index name(s) for at least the last 6 months. Example: https://dashboards.gitlab.net/goto/cJCiggJHg?orgId=1 (you can optionally scope it to a single DB instance.)
Conduct the Query investigation steps below to fully verify that no queries depend on the index exclusively (i.e. they can be sufficiently covered by other existing indexes).

Query investigation steps

Partial example: #520213 (comment 2443401412)

Investigate why the index was introduced in the first place and determine if it may only be applicable to SM instances. Dig through the commit history and related MRs/issues that introduced the index.
Check queries outputted from the rspec:merge-auto-explain-logs CI job. This job collects and analyzes queries triggered through tests.
Check production queries outputted on Postgres Kibana logs. This is just in case the queries generated through tests are not exhaustive. This also helps to inform the frequency of each query execution. Caveat: We only keep the last 7 days of data and these logs do not apply to SM usage.
Manually search through the codebase for related queries. This is to confirm that there are no other queries we missed from the steps above, especially ones that may only be applicable to SM instances.

Edited May 02, 2025 by Leaminn Ma