Find way to `dependent: :destroy` without creating transactions
We have several examples of cross-database modifications (eg. executing queries against 2 different databases within the context of a transaction) that are caused by the use of dependent: :destroy. One example is:
class Project
has_many :builds, class_name: 'Ci::Build', inverse_of: :project, dependent: :destroy # rubocop:disable Cop/ActiveRecordDependent
end
The usual use case for such things is cleaning up data that lives outside of Postgres. One common example being files that live in object storage that relate to builds/artifacts and so on.
When you execute project.destroy it does:
- Start a transaction
- Loop through all builds calling
destroyon each - Destroy the project
- Commit the transaction
This code will all work still when we have builds on a separate DB but we will lose the transactional guarantee that any failure during steps 2/3 will mean the entire transaction is rolled back and nothing is deleted. When we move to 2 separate databases we are just going to have to live with possible data inconsistencies like this.
Possible solutions
- We are implementing an alternative to Postgres's cascading deletes in #338535 (closed) and there has been some discussion about extending this mechanism to support the
destroycase as well (ie. we load the models in our cleanup worker and call#destroyon them allowing them to do whatever cleanup they need) but it was mentioned that this may not work if the parent object no longer exists as data from the parent object may be necessary for the cleanup. We should investigate to see if there is any way to handle these cases. We may still want to implement this as an option for cases where it will work and then fan out more complicated cases to the relevant team to come up with a specific solution to their case - Schedule the cleanup async in sidekiq. This would give us better guarantee that the work will complete eventually (due to sidekiq retries) than doing it sync in the single request context as short timeouts could cause this to lead to partial cleanup regularly
- Schedule the cleanup after the first transaction completes
TODO
-
List all examples that violate cross-db modification -
For each example describe high level what the purpose of dependent: :destroyis -
Can examples be made async? Do they need a parent object present and if so why? -
Can the deletes remain sync but be done after the transaction keeping parent objects in memory? Worst case scenario is partial cleanup? Is this bad? Loose FK constraints will enforce DB cleanup later but there may remain partial cleanup of object storage? Can we even guarantee perfect object storage cleanup today? -
Also look into FastDestroyAlland usages and see if it has the same problem -
What code paths are calling FastDestroyAll.fast_destroy_alldirectly and do they actually cross-db boundaries in a transaction?
Examples
-
app/models/project.rb: has_many :builds, class_name: 'Ci::Build', inverse_of: :project, dependent: :destroy- Fanned out to grouppipeline authoring #340256 (closed)
-
app/models/project.rb: has_one :ci_cd_settings, class_name: 'ProjectCiCdSetting', inverse_of: :project, autosave: true, dependent: :destroySeems to be have been added in gitlab-foss!18745 (diffs) as some kind of workaround for import/export issues which weirdly related toproject.update(ci_cd_settings: nil). This appears to not actually be critically necessary for it to be adestroyand operation as there is no destroying needed. I suspect we'll be able to implement a different workaround here.- This is actually redundant. Since the table is called
project_ci_cd_settings(and notci_cd_settingslike I thought) then it won't actually be moving to the new CI database. I did actually create an MR to remove this !69947 (closed) but realised it's not necessary. The code is still redundant and we could remove it but it's not a blocker so I'll leave this alone for now.
-
app/models/user.rb: has_many :triggers, dependent: :destroy, class_name: 'Ci::Trigger', foreign_key: :owner_id- I was able to trace this back to this comment gitlab-foss!9713 (comment 24835526) but the MR gives no indication about why the cascading delete is not sufficient.
- MR to fix it !69721 (merged)
-
User.has_many :builds, dependent: :nullify, class_name: 'Ci::Build'andUser.has_many :pipelines, dependent: :nullify, class_name: 'Ci::Pipeline'\- Extracted to #340260 (closed)
FastDestroyAll examples
-
app/models/project.rb: use_fast_destroy :build_trace_chunks- If using the
fast_destroy_allmethod then this could theoretically have the same problem since it callsdelete_all. I've yet to determine the code path that uses it but it does not seem to be the same path of deleting aProject - For the
Project.use_fast_destroy :build_trace_chunkscase it seems it does not actually have this transaction problem as it doesn't ever actually calldeleteordelete_allon theci_build_trace_chunks. Checking the output fromProject#destroyI can't see any deletes happening onci_build_trace_chunksand so I think it seems to be relying on Postgres cascading deletes. The code path is:
- If using the