GroupDestroyWorker failure - deployment_merge_requests statement timeout
Summary
Related to &7171
On GitLab.com, GroupDestroyWorker sometimes fails because a project in the group cannot be deleted. The failures are tracked in this Kibana dashboard.
We have narrowed the failures down to ten distinct Project#delete_error values. This issue covers project deletion errors caused by the following statement timeout:
PG::QueryCanceled: ERROR: canceling statement due to statement timeout
CONTEXT: SQL statement "DELETE FROM ONLY "public"."deployment_merge_requests" WHERE $1 OPERATOR(pg_catalog.=) "merge_request_id""
The full list of delete_error values can be found here: https://gitlab.com/gitlab-org/gitlab/-/issues/342692#note_737332055
Impact
In the past week, GroupDestroyWorker has failed ~1500 times because deletion of the same 138 projects was attempted over and over again.
Recommendation
Verification
Activity
- Serena Fang added Engineering Allocation infradev type::bug labels
added Eng-Consumer::Infrastructure Eng-Producer::Development labels
- Maintainer
@serenafang, please ensure the following labels are present for Engineering Allocation:
  - An ~Eng-Consumer::* label
  - An ~Eng-Producer::* label
  - A ~priority::* label
  - A ~severity::* label when the type is ~"bug"
- Serena Fang added to epic &7171
- Serena Fang changed the description
- Maintainer
All infradev issues need to have a proper severity label, a priority label, and a milestone set. Please add those to this issue. For more details, see the handbook.
- 🤖 GitLab Bot 🤖 added automation:infradev-missing-labels label
added Category:Source Code Management group::source code labels
- Maintainer
Setting label(s) devops::create section::dev based on group::source code.
- 🤖 GitLab Bot 🤖 added devops::create section::dev labels
- Darva Satcher assigned to @sean_carroll
- Sean Carroll assigned to @vyaklushin
- Sean Carroll unassigned @sean_carroll
- Sean Carroll changed milestone to %14.6
- Sean Carroll added Deliverable label
- Developer
@vyaklushin this is a new infradev issue that has been pulled into %14.6, please work on it as a priority (although feel free to finish up in-flight work first).
For tracking purposes it would be great to have a weight added, although this can be done after work has commenced and you have a better idea of what is needed.
Edited by Sean Carroll
- Sean Carroll added priority::2 severity::2 labels
- Maintainer
Kibana link to the error message: https://log.gprd.gitlab.net/goto/ff633688847ae190d06357314d911d30
I think the problem is similar to !59754 (merged). We rely on the database's DELETE CASCADE to delete project dependencies, but we can hit statement timeouts when there are too many dependent rows. We can use the same approach here and explicitly delete deployment_merge_requests one step before we delete the project itself.
I need to reproduce this problem locally and check whether this solution works. I'd put weight 2 on it.
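A minimal sketch of the idea in the comment above: remove the dependent deployment_merge_requests rows in small, separately committed batches, then delete the project so that the cascade has little left to do. This is not GitLab's implementation. It uses Python with an in-memory SQLite database instead of Rails and PostgreSQL, and the column names (project_id, deployment_id), the function names, and the tiny batch size are assumptions made for illustration only.

```python
import sqlite3

# Assumed, simplified schema: only the table names come from the issue.
SCHEMA = """
CREATE TABLE projects (id INTEGER PRIMARY KEY);
CREATE TABLE merge_requests (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE
);
CREATE TABLE deployment_merge_requests (
    deployment_id INTEGER NOT NULL,
    merge_request_id INTEGER NOT NULL
        REFERENCES merge_requests(id) ON DELETE CASCADE
);
"""


def delete_deployment_links(conn, project_id, batch_size):
    """Delete a project's deployment_merge_requests rows batch by batch.

    Every batch is its own committed transaction, so no single statement
    has to touch all rows. Returns the number of non-empty batches.
    """
    batches = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM deployment_merge_requests
            WHERE rowid IN (
                SELECT d.rowid
                FROM deployment_merge_requests AS d
                JOIN merge_requests AS m ON m.id = d.merge_request_id
                WHERE m.project_id = ?
                LIMIT ?
            )
            """,
            (project_id, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            return batches
        batches += 1


def destroy_project(conn, project_id, batch_size=2):
    """Hypothetical destroy flow: explicit batched cleanup, then the project."""
    batches = delete_deployment_links(conn, project_id, batch_size)
    # The cascade still runs here, but the heavy relation is already empty.
    conn.execute("DELETE FROM projects WHERE id = ?", (project_id,))
    conn.commit()
    return batches


def build_demo():
    """Create a small in-memory database with two projects."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO projects (id) VALUES (1), (2)")
    conn.executemany(
        "INSERT INTO merge_requests (id, project_id) VALUES (?, ?)",
        [(10, 1), (11, 1), (20, 2)],
    )
    conn.executemany(
        "INSERT INTO deployment_merge_requests (deployment_id, merge_request_id) "
        "VALUES (?, ?)",
        [(100, 10), (101, 10), (102, 11), (103, 11), (104, 11), (200, 20)],
    )
    conn.commit()
    return conn
```

Calling destroy_project(build_demo(), 1) removes the first project's rows in several short statements and leaves the second project untouched. Whether this actually avoids the timeout in production depends on the root cause, which the sketch does not address.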
- Developer
That's great, thank you for the quick response on this @vyaklushin
- Vasilii Iakliushin set weight to 2
- 🤖 GitLab Bot 🤖 removed automation:infradev-missing-labels label
- Sean Carroll added workflow::in dev label
- Maintainer
@sean_carroll It's currently blocked until we find an answer to this comment. I'm not sure that extracting deployment_merge_requests from the delete flow will actually solve the problem. We need to identify the root cause first.
- Vasilii Iakliushin added workflow::blocked label and removed workflow::in dev label
- Maintainer
Here is a redacted list of triggers for one of the projects, obtained by running explain in dblab. I picked only the triggers that take more than 1 second.

Delete on public.projects (cost=0.44..3.46 rows=1 width=6) (actual time=1.207..1.207 rows=0 loops=1)
  -> Index Scan using projects_pkey on public.projects (cost=0.44..3.46 rows=1 width=6) (actual time=0.044..0.045 rows=1 loops=1)
       Output: ctid
       Index Cond: (projects.id = _)
Planning Time: 3.366 ms
Trigger RI_ConstraintTrigger_a_23991 for constraint fk_rails_f601258b28 on projects: time=10332.771 calls=1
Trigger RI_ConstraintTrigger_a_20539 for constraint fk_310d714958 on merge_requests: time=4737.057 calls=2411
Trigger RI_ConstraintTrigger_a_20844 for constraint fk_8483f3258f on merge_requests: time=13051.745 calls=2411
Trigger RI_ConstraintTrigger_a_21014 for constraint fk_a23be95014 on merge_requests: time=1087.250 calls=2411
Trigger RI_ConstraintTrigger_a_21554 for constraint fk_rails_004ce82224 on merge_requests: time=3967.408 calls=2411
Trigger RI_ConstraintTrigger_a_22249 for constraint fk_rails_443443ce6f on merge_requests: time=3257.003 calls=2411
Trigger RI_ConstraintTrigger_a_22264 for constraint fk_rails_458eda8667 on merge_requests: time=1255.230 calls=2411
Trigger RI_ConstraintTrigger_a_22869 for constraint fk_rails_86a6d8bf12 on merge_requests: time=1683.598 calls=2411
Trigger RI_ConstraintTrigger_a_23014 for constraint fk_rails_92dd0e705c on merge_requests: time=5263.188 calls=2411
Trigger RI_ConstraintTrigger_a_23064 for constraint fk_rails_9851a00031 on merge_requests: time=1536.541 calls=2411
Trigger RI_ConstraintTrigger_a_23244 for constraint fk_rails_aa1b2961b1 on merge_requests: time=1359.613 calls=2411
Trigger RI_ConstraintTrigger_a_23816 for constraint fk_rails_e6d7c24d1b on merge_requests: time=2713.346 calls=2411
Trigger RI_ConstraintTrigger_a_1208325681 for constraint fk_36c74129da on events: time=27044.279 calls=14129
Trigger RI_ConstraintTrigger_a_20354 for constraint fk_06067f5644 on merge_request_diffs: time=5166.029 calls=5213
Trigger RI_ConstraintTrigger_a_22039 for constraint fk_rails_316aaceda3 on merge_request_diffs: time=24418.137 calls=5213
Trigger RI_ConstraintTrigger_a_22364 for constraint fk_rails_501aa0a391 on merge_request_diffs: time=17948.358 calls=5213
Trigger RI_ConstraintTrigger_a_22469 for constraint fk_rails_5b2ecf6139 on approval_merge_request_rules: time=1118.373 calls=2092
Trigger RI_ConstraintTrigger_a_22614 for constraint fk_rails_6577725edb on approval_merge_request_rules: time=2321.448 calls=2092
Trigger RI_ConstraintTrigger_a_22849 for constraint fk_rails_80e6801803 on approval_merge_request_rules: time=5075.111 calls=2092
Trigger RI_ConstraintTrigger_a_23801 for constraint fk_rails_e605a04f76 on approval_merge_request_rules: time=1440.312 calls=2092
Execution Time: 142306.636 ms
(280 rows)
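To see which relations dominate a plan like the one above, the trigger lines can be summed per table. The following is a rough sketch in Python, not a tool used in this issue: the regular expression only assumes the "Trigger ... for constraint ... on <table>: time=... calls=..." shape visible in the output, the function name is hypothetical, and the sample is a four-line excerpt of the plan.

```python
import re
from collections import defaultdict

# Matches the trigger lines of EXPLAIN ANALYZE output as shown in the plan.
TRIGGER_RE = re.compile(
    r"Trigger (?P<trigger>\S+) for constraint (?P<constraint>\S+) "
    r"on (?P<table>\w+): time=(?P<ms>[\d.]+) calls=(?P<calls>\d+)"
)


def trigger_time_by_table(plan_text):
    """Sum trigger time (in ms) per table, slowest table first."""
    totals = defaultdict(float)
    for match in TRIGGER_RE.finditer(plan_text):
        totals[match.group("table")] += float(match.group("ms"))
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Excerpt of the plan above, used as sample input.
SAMPLE = """
Trigger RI_ConstraintTrigger_a_23991 for constraint fk_rails_f601258b28 on projects: time=10332.771 calls=1
Trigger RI_ConstraintTrigger_a_1208325681 for constraint fk_36c74129da on events: time=27044.279 calls=14129
Trigger RI_ConstraintTrigger_a_22039 for constraint fk_rails_316aaceda3 on merge_request_diffs: time=24418.137 calls=5213
Trigger RI_ConstraintTrigger_a_22364 for constraint fk_rails_501aa0a391 on merge_request_diffs: time=17948.358 calls=5213
"""

if __name__ == "__main__":
    for table, total_ms in trigger_time_by_table(SAMPLE):
        print(f"{table}: {total_ms / 1000:.1f} s")
```

Because the function uses finditer rather than splitting on newlines, it also works on plan text that has been pasted as a single line.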
- Vasilii Iakliushin added workflow::in dev label and removed workflow::blocked label
- Vasilii Iakliushin mentioned in issue #346169 (closed)
- Vasilii Iakliushin mentioned in epic &7171
- Maintainer
I'm going to work on #346169 (closed) first. There is a chance that it will resolve this issue as well.
- Maintainer
Update
The fix from #346169 (closed) did not completely resolve the problem, but it decreased the number of deployment_merge_requests failures. I see only 1 case in the last 7 days. I believe we should continue extracting heavy relations (like merge_requests) into separate batch delete transactions.
I still think that the current issue will be resolved automatically as a by-product of the other fix.
- Maintainer
Is there an updated status on this issue?
Edited by Darva Satcher
- Maintainer
@dsatcher I think we can mark it as resolved. This issue is caused by a random error that disappeared after fixes in #346169 (closed). Fixes to other database relations (especially #346166 (closed)) should lower the chance that this error occurs again.
- Developer
Thank you for the update @vyaklushin, this is good news.
- Maintainer
@vyaklushin, I agree with @sean_carroll, this is really good news!
- Sean Carroll marked this issue as related to #346169 (closed)
- Sean Carroll removed the relation with #346169 (closed)
- Sean Carroll marked this issue as related to #346169 (closed)
- 🤖 GitLab Bot 🤖 changed milestone to %14.7
- 🤖 GitLab Bot 🤖 added missed-deliverable missed:14.6 labels
- Vasilii Iakliushin closed
- 🤖 GitLab Bot 🤖 mentioned in issue gl-retrospectives/create-stage/source-code#45 (closed)
- Kerri Miller mentioned in issue #347073 (closed)
- Kerri Miller marked this issue as related to #347073 (closed)
- Kerri Miller removed the relation with #347073 (closed)
- Bruno Freitas mentioned in issue #352511 (closed)