Skip to content

index_deployments_on_project_id_and_iid on table deployments is costly to maintain but underutilized

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

I was doing some WAL analysis on our typical database traffic to see which indexes are responsible for the most WAL volume, and the index index_deployments_on_project_id_and_iid popped up as consuming approximately 6% of all WAL volume. However, this graph indicates that while the index is used, it is not used all that often. (This graph is a plot of 24 hours over July 8 for the number of scans per second.)

image Source

So while this is not a pressing issue, I believe it would be worth asking:

  1. Can we justify the existence of this index?
  2. Is this an index that was created as part of a feature whose use is expected to grow substantially?

If we were to drop this index, a trivial query against this column becomes significantly more expensive, however the actual execution time remains in milliseconds:

gitlabhq_dblab=# EXPLAIN (analyze) SELECT 1 FROM deployments WHERE project_id = 1 AND iid = 3;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using index_deployments_on_project_id_and_id on deployments  (cost=0.57..8645.76 rows=7 width=4) (actual time=0.022..0.022 rows=0 loops=1)
   Index Cond: (project_id = 1)
   Filter: (iid = 3)
 Planning Time: 4.822 ms
 Execution Time: 0.085 ms

The biggest question here is: Do we make queries against iid and project_id perform slightly worse in exchange for getting back a substantial chunk of WAL volume, which is a limited resource?

Edited by 🤖 GitLab Bot 🤖