Staging: Drop unexpected index on routes table

The outage in production#2885 (closed) was caused by rendering an index unusable due to reindexing. Upon further investigation, this led to gitlab-org/gitlab#280826 (closed) which describes that the offending index isn't being tracked in our codebase (but it should be).

What's more here is that there are significant inconsistencies with respect to the database structure for both staging and GitLab.com production as compared to what we expect to have in the codebase. This is because this is the longest living installation of GitLab and there have been inconsistencies with database migrations in the past. See gitlab-org&3928 for a bigger picture.

This issue is about another difference in staging as compared to the codebase. In staging, we have a seemingly temporary index on the routes table:

    "stark_tmp2" btree (lower(path::text) varchar_pattern_ops)

This is presumably a left-over from manual testing from a long time ago (judging by the name, perhaps 2018).

Now the problematic query that was observed during the outage production#2885 (closed) has a plan that picks up exactly that temporary index (query+plan: https://explain.depesz.com/s/qgZD).

Incident recap and testing reindexing in staging

We've asked whether or not we would have observed the same issue as during the outage in staging. Had we reindexed index_on_routes_lower_path (which is what caused the outage) in staging, we wouldn't have seen the same issue in staging. That is, because the stark_tmp2 would have been used anyways.

Suggested fix

  • DROP INDEX CONCURRENTLY stark_tmp2 in staging

I strongly suggest to review the database structure in both staging and production and work towards consolidating that with what we track in the codebase: gitlab-org&3928