Missing Reset Trigger causing errors during migrations
Summary
MR gitlab-org/gitlab!149681 (merged) contained a reset_trigger which was the same as the one on another MR
This led to one trigger not being reset as it should, leading to errors.
This is a result of simple copy/pasting code and forgetting to make the change
Problem: tables with triggers
from ahegyi
If the trigger is correctly reset using the reset_trigger_function
, we would have seen the same problem with the merge_request_metrics
table because there is another trigger on the table that modifies the pipeline_id column: nullify_merge_request_metrics_build_data()
How to reproduce it:
Session 1: invoke the query below a few times
update merge_request_metrics set pipeline_id = (random() * 100000)::integer where id = 12;
Session 2: change the column type:
alter table merge_request_metrics alter column pipeline_id type integer;
Session 1: verify the error
update merge_request_metrics set pipeline_id = (random() * 100000)::integer where id = 12;
Session 2: reset the trigger function
alter function trigger_fd041fe2d1a7 reset all;
Session 1: verify the error is still there
update merge_request_metrics set pipeline_id = (random() * 100000)::integer where id = 12;
Session 2: reset the 2nd trigger function
alter function nullify_merge_request_metrics_build_data reset all;
Session 1: now it works
update merge_request_metrics set pipeline_id = (random() * 100000)::integer where id = 12;
How to improve the tooling
Introduce a new helper method called reset_trigger_functions(table, reset: [], skip: [])
. The method requires us to list the trigger function names and decide whether we want to reset the function or skip the reset. If a trigger function is missing from the list, we raise error.
Alternatively, we can forcefully reset all trigger functions on the table. Maybe DB team could weigh in if it's safe or not.
In both cases, we must update the documentation here: https://docs.gitlab.com/ee/development/database/avoiding_downtime_in_migrations.html#swap-the-columns-release-n--1
Related Incident(s)
Originating issue(s): production#17921
Desired Outcome/Acceptance Criteria
- Future cases of duplication/missing resets should be caught
Associated Services
Corrective Action Issue Checklist
-
Link the incident(s) this corrective action arose from -
Give context for what problem this corrective action is trying to prevent re-occurring -
Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4') -
Assign a priority (this will default to 'Reliability::P4' but should match the severity of the related incident) -
Assign a service label