
Maven virtual registry: use LFK update_column_to

🔥 Problem

In the Maven virtual registries, cached responses are destroyed through a mark system: records that need to be destroyed carry a mark that flags them as "ready to be destroyed".

A background job then walks through the marked records and actually destroys them. This lets us cope better with object storage references (a cached response is a file on object storage) and with scalability (we could be destroying 1000s of cached responses).

Destroying a single record is not a challenge, but things get interesting when parent objects are destroyed. The associations to consider are:

  1. upstream - 1:n -> cached_response.
  2. group - 1:n -> cached_response.

(1.) is already handled by Loose Foreign Key nullify. In this case, the "mark" is a NULL value in the upstream_id column of cached response records.

(2.) is not currently covered, and we can't use the same approach: the group_id column is the sharding key, so we can't set it to NULL.
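To make the difference between the two associations concrete, here is a minimal sketch that models the cached responses table in an in-memory SQLite database. It is not the GitLab implementation: the table layout, the `nullify_upstream` and `nullify_group` helpers, and the sample ids are all assumptions made for illustration, and the NOT NULL constraint on group_id stands in for the sharding key requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cached_responses (
        id INTEGER PRIMARY KEY,
        group_id INTEGER NOT NULL,  -- stands in for the sharding key: never NULL
        upstream_id INTEGER         -- nullable: NULL doubles as the destruction mark
    )
    """
)
conn.executemany(
    "INSERT INTO cached_responses (id, group_id, upstream_id) VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 100), (3, 20, 200)],
)


def nullify_upstream(conn, upstream_id):
    """Hypothetical helper: mark the children of a deleted upstream with NULL."""
    conn.execute(
        "UPDATE cached_responses SET upstream_id = NULL WHERE upstream_id = ?",
        (upstream_id,),
    )


def nullify_group(conn, group_id):
    """Hypothetical helper: the same approach applied to the group association."""
    conn.execute(
        "UPDATE cached_responses SET group_id = NULL WHERE group_id = ?",
        (group_id,),
    )


# Association (1.): the upstream is deleted, its cached responses get the mark.
nullify_upstream(conn, 100)
marked = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM cached_responses WHERE upstream_id IS NULL ORDER BY id"
    )
]

# Association (2.): the same trick is rejected by the NOT NULL constraint.
try:
    nullify_group(conn, 20)
    group_nullify_failed = False
except sqlite3.IntegrityError:
    group_nullify_failed = True
```

In this model the upstream case yields marked rows that a cleanup job can find with `upstream_id IS NULL`, while the group case fails outright, which is why a different kind of mark is needed.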

🚒 Solution

In #475204 (closed), we're implementing a new action for Loose Foreign Keys: update_column_to. Instead of nullifying a given column, this action lets us set any desired value on any column.

Thus, we can have a status column that can be updated to pending_destruction.

(1.) should be updated to use that approach too. Because the cached response then stays attached to the upstream, we should update the read queries to filter out cached responses in any status other than default.

This way, the cleanup background job can simply look for records with the status pending_destruction (instead of looking for 2 different "marks").
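The status-based flow can be sketched as follows, again with an in-memory SQLite table rather than the real schema. Everything named here is an assumption for illustration: the integer values behind default and pending_destruction, the `mark_for_destruction`, `readable_ids` and `cleanup` helpers, and the batch size. The UPDATE only mimics the effect that update_column_to is meant to have on child records.

```python
import sqlite3

# Hypothetical integer values for the two statuses.
STATUS_DEFAULT = 0
STATUS_PENDING_DESTRUCTION = 1

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE cached_responses (
        id INTEGER PRIMARY KEY,
        group_id INTEGER NOT NULL,
        upstream_id INTEGER,
        status INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.executemany(
    "INSERT INTO cached_responses (id, group_id, upstream_id) VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 10, 101), (3, 20, 200), (4, 20, 201), (5, 30, 200)],
)


def mark_for_destruction(conn, parent_column, parent_id):
    """Set the status mark on every child of a deleted parent."""
    if parent_column not in ("group_id", "upstream_id"):
        raise ValueError(f"unexpected parent column: {parent_column}")
    conn.execute(
        f"UPDATE cached_responses SET status = ? WHERE {parent_column} = ?",
        (STATUS_PENDING_DESTRUCTION, parent_id),
    )


def readable_ids(conn):
    """Read query: only cached responses in the default status are visible."""
    rows = conn.execute(
        "SELECT id FROM cached_responses WHERE status = ? ORDER BY id",
        (STATUS_DEFAULT,),
    )
    return [row[0] for row in rows]


def cleanup(conn, batch_size=2):
    """Destroy marked records in batches and return how many were destroyed."""
    destroyed = 0
    while True:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM cached_responses WHERE status = ? ORDER BY id LIMIT ?",
                (STATUS_PENDING_DESTRUCTION, batch_size),
            )
        ]
        if not ids:
            return destroyed
        # A real job would also remove the file from object storage here.
        conn.executemany(
            "DELETE FROM cached_responses WHERE id = ?", [(i,) for i in ids]
        )
        destroyed += len(ids)


mark_for_destruction(conn, "group_id", 10)      # group 10 deleted: rows 1 and 2
mark_for_destruction(conn, "upstream_id", 200)  # upstream 200 deleted: rows 3 and 5

visible = readable_ids(conn)
destroyed = cleanup(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM cached_responses")]
```

Both parent deletions end up writing the same mark, so the cleanup query needs a single condition on status, and the sharding key is never touched.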

Edited by David Fernandez