Package stage: object storage handling
## 🔥 Problem
The package registry and the container registry rely heavily on object storage to store different kinds of files and artifacts.

We very often stumble upon the following situation:
- A parent model has many child models (a `has_many` association).
- Each child model has a reference pointing to a file on object storage.
- We can destroy:
  - A single child.
  - Many children.
  - The parent model.
- In all destruction cases, the file on object storage needs to be removed.
On top of that, we have the sharding key, which can be a `group_id` or a `project_id`. The additional problem here is that the sharding key can also be the parent key. The sharding key brings an additional constraint: it can never be set to `NULL`.
Lastly, we have the statistics challenge. Object storage usage is usually monitored and measured to produce a metric: the object storage usage. This means that when a file is removed from object storage, its size should be deducted from the related statistic (located on the Project or the root Group).
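To make this setup concrete, here is a schematic sketch of the shape of the models involved. It assumes a Rails/ActiveRecord context and all names are hypothetical; it only illustrates the relationships described above.

```ruby
# Hypothetical models, for illustration only.
class Parent < ApplicationRecord
  # `project_id` (or `group_id`) is the sharding key: it can never be NULL.
  belongs_to :project

  has_many :children
end

class Child < ApplicationRecord
  belongs_to :parent
  # The sharding key again; on some models it is also the parent key.
  belongs_to :project

  # A column (for example, `file`) references an object on object storage.
  # Destroying a single child, many children, or the parent must all end up
  # removing that object and deducting its size from the Project's (or root
  # Group's) storage statistics.
end
```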
Destroying a single child is usually not a problem; we can overcome the above challenges easily.
Destroying multiple records at once is where things get interesting.
### What we already tried
- We tried using a mark (a `status` column) so that the actual destruction of the records is delayed to a background job (which can take as long as it needs to walk through the marked records). See the first sketch after this list.
  - This way, bulk destroying is as costly as bulk updating.
  - The model (package file) on which we tried this has an object storage key that depends on the parent id and even the grandparent id, which makes the whole thing more complex to handle (we can't delete a record without the parent id and grandparent id). More information in #358941.
  - Using a mark means that all regular accesses need to filter out the marked records.
- We tried using `SET NULL ON DELETE` on the parent id column: basically, a nullable foreign key. In this case, we have a mark (the null foreign key) and, at the same time, the objects are disconnected from the parent, so we don't need to update the read queries. See the second sketch after this list.
  - This, however, makes it challenging to keep the statistics in sync, since the parent id is `NULL`.
  - This doesn't work in cases where the parent id is also the sharding key (the sharding key can't be `NULL`).
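As a rough sketch of the first attempt (hypothetical names, assuming ActiveRecord), the mark and its impact on the read path look roughly like this:

```ruby
class PackageFile < ApplicationRecord
  # The mark: a `status` column processed later by a background job.
  enum status: { default: 0, pending_destruction: 1 }

  # Every regular read path has to exclude the marked records.
  scope :installable, -> { where.not(status: :pending_destruction) }
end

# Bulk destruction becomes a bulk update: mark all the records, then let
# the background job destroy them (and their files) later.
PackageFile.where(package_id: package.id)
           .update_all(status: PackageFile.statuses[:pending_destruction])
```

The second attempt roughly corresponds to a nullable foreign key declared with `ON DELETE SET NULL`, which in a Rails migration would look like this (table and column names are, again, only illustrative):

```ruby
class AddNullifyingForeignKeyToPackageFiles < ActiveRecord::Migration[7.0]
  def change
    # The parent id column must be nullable, which is exactly why this
    # approach breaks when the parent id is also the NOT NULL sharding key.
    add_foreign_key :package_files, :packages, column: :package_id, on_delete: :nullify
  end
end
```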
## 🚒 Solution
We think we should have a global solution that covers all cases (whether or not the parent id is the sharding key, and whether or not statistics are involved).

This might be achievable with the following:
- Use stable object storage keys. The record that is going to be destroyed should carry everything required to know its object storage key. The easiest solution here is to store the object storage key in the database: we set its value when the record is created and that value doesn't change until we destroy the record.
- We need a mark to delay the object storage destruction (eg. to a background cleaning task). This mark can't be the parent id, so we have no choice but to use a `status` column.
  - Thus, we accept the downside: all read queries need to filter out records marked for destruction.
- We need a way to quickly update the mark when destroying records in bulk or, in particular, when destroying the parent.
  - Here, we think we could leverage the loose foreign key solution to describe the parent foreign key.
  - With a loose foreign key, we need to describe what to do when the parent id doesn't exist anymore (eg. the parent has been destroyed). Here we could introduce a new action: `update_column_to`. This action would update a target column to a target value. As you may have guessed, we could update the `status` column to the `pending_destruction` value. See the sketch after this list.
  - Doubt: can we introduce such a new action without impacting the performance of the loose foreign key logic?
  - Benefit: because the sharding key is kept, the background cleanup job can handle the statistics updates.
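To make the `update_column_to` idea more concrete, here is a purely hypothetical sketch of what such a loose foreign key definition could look like. It assumes the existing `config/gitlab_loose_foreign_keys.yml` format; the `update_column_to` action and the `target_column`/`target_value` keys do not exist today, they only illustrate the proposal above.

```yaml
# Hypothetical entry: when a row of the parent table (`packages_packages`) is
# deleted, mark the orphaned `packages_package_files` rows instead of deleting
# or nullifying them.
packages_package_files:
  - table: packages_packages
    column: package_id
    on_delete: update_column_to  # proposed new action, does not exist yet
    target_column: status
    target_value: 1              # the enum value of `pending_destruction`
```

Because the sharding key stays in place on the marked rows, a background cleanup job can then, per record, delete the file on object storage using the stored (stable) key, deduct its size from the related statistic, and finally delete the row. A minimal sketch, where `delete_from_object_storage` and `decrement_statistics` are placeholders for whatever the real implementation would use:

```ruby
class CleanupPendingDestructionWorker
  def perform
    PackageFile.pending_destruction.find_each do |file|
      delete_from_object_storage(file.object_storage_key) # stable key stored at creation
      decrement_statistics(file.project_id, file.size)    # the sharding key is still present
      file.delete
    end
  end
end
```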
One last point: why can't we simply add a new `destroy_object_storage_file` action to the loose foreign key logic? This might be possible, but we're assuming here that this logic works with database queries as much as possible to get the fastest performance. The problem with destroying object storage files is that we need to trigger an API request to the object storage provider with the key to delete. We suspect that this part would massively slow down the loose foreign key logic.
## 👣 Next steps
- Investigate the loose foreign key logic to assess what is possible and what kind of new `on_delete` action can be introduced.