Index creation failed on 2022-10-01 leaving an index in INVALID state – "index_vulnerability_reads_on_location_image_trigram"
Raised by @mparuszewski in Slack (internal) and in MR gitlab-org/gitlab!98131 (comment 1124367739):
index index_vulnerability_reads_on_location_image_trigram
is not at INVALID state, the question is how to fix it – I suppose the corresponding DB migration is also failed.
Some details:
- index is indeed in INVALID state – verified on gprd main:
"index_vulnerability_reads_on_location_image_trigram" gin (location_image gin_trgm_ops) WHERE (report_type = ANY (ARRAY[2, 7])) AND location_image IS NOT NULL INVALID
- Logs: https://log.gprd.gitlab.net/goto/fde12680-45bd-11ed-b0ec-930003e0679c. Index build attempt started on
patroni-main-2004-01-db-gprd
at2022-10-01 00:12:23 GMT
:statement: /*application:web,db_config_name:main*/ CREATE INDEX CONCURRENTLY "index_vulnerability_reads_on_location_image_trigram" ON "vulnerability_reads" USING gin ("location_image" gin_trgm_ops) WHERE report_type = ANY (ARRAY[2, 7]) AND location_image IS NOT NULL
- Then a bunch of messages about tmp files generated // side-note: a sign that we should consider increasing
maintenance_work_mem
-- cc @rhenchen.gitlab @alexander-sosna @Finotto @vitabaks - No signs of the final message – I expected to find an error, but Kibana doesn't show any. Why could it be? Lost connection? Worth further investigation.
What to do, proposals:
- right now: ensure that migration is not marked as successful, prepare to retry it but before a retry attempt, manually drop the index with
DROP INDEX CONCURRENTLY
and only then retry the migration. - longer term:
- consider increasing
maintenance_work_mem
(separate issue) - define a workflow for such cases – I think retries could be automated in this case (unless it was already discussed / created yet; cc @alexives)
- consider increasing