Geo: Sync expired (but not yet deleted) artifacts
Problem
- Have database replication lag of more than 1 second (I am assuming this was the case, and that the lag helped surface a race condition)
- Trigger jobs that create artifacts which expire after 1 second
The artifacts never get synced to the secondary, even though you can still browse and download the artifacts on the primary.
What is the behavior of expired artifacts?
The job page on the primary says: "These artifacts are the latest. They will not be deleted (even if expired) until newer artifacts are available." So expired artifacts are not the same as deleted artifacts.
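To make the distinction concrete, here is a minimal plain-Ruby sketch of the retention rule quoted above. The Artifact struct and its fields (expire_at, latest) are hypothetical stand-ins, not GitLab's actual model; the only rule encoded is the one from the job page: an expired artifact is kept while it is still the latest.

```ruby
# Hypothetical, simplified stand-in for a job artifact (not GitLab's model).
Artifact = Struct.new(:id, :expire_at, :latest, keyword_init: true)

# An artifact counts as expired once its expiry time has passed.
def expired?(artifact, now)
  !artifact.expire_at.nil? && artifact.expire_at <= now
end

# Per the message on the job page, expired artifacts that are still the
# latest are kept until newer artifacts exist, so only expired, non-latest
# artifacts are eligible for deletion.
def deletable?(artifact, now)
  expired?(artifact, now) && !artifact.latest
end

now = Time.at(1_000)
kept = Artifact.new(id: 1, expire_at: Time.at(999), latest: true)
gone = Artifact.new(id: 2, expire_at: Time.at(999), latest: false)

puts expired?(kept, now)   # => true
puts deletable?(kept, now) # => false (expired, but still the latest)
puts deletable?(gone, now) # => true
```

An artifact like kept is exactly the case this issue is about: expired, still browsable on the primary, and therefore still worth having on a secondary.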
Proposal
Should expired artifacts be synced by Geo secondaries? They haven't been since 11.0, when syncing them was disabled to resolve failures when attempting to sync expired artifacts (#5357 (closed)).
We could cause them to be synced by removing the not_expired scope from https://gitlab.com/gitlab-org/gitlab/-/blob/v13.4.1-ee/ee/app/models/ee/ci/job_artifact.rb#L80
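As a rough illustration of what removing the scope would change, the sketch below models the set of artifacts a secondary would try to sync as a plain Ruby filter. It is not the real scope chain in job_artifact.rb; the replicable_ids helper and its include_expired: flag are hypothetical, and the only point is that an expired-but-kept artifact drops out of the set when the not_expired condition is applied.

```ruby
# Hypothetical artifact record: only the fields needed for this sketch.
Artifact = Struct.new(:id, :expire_at, keyword_init: true)

# Models the not_expired condition: no expiry set, or expiry in the future.
def not_expired?(artifact, now)
  artifact.expire_at.nil? || artifact.expire_at > now
end

# Hypothetical helper: IDs a secondary would attempt to sync.
# include_expired: false mimics keeping the not_expired scope;
# include_expired: true mimics removing it.
def replicable_ids(artifacts, now, include_expired:)
  selected = include_expired ? artifacts : artifacts.select { |a| not_expired?(a, now) }
  selected.map(&:id)
end

now = Time.at(1_000)
artifacts = [
  Artifact.new(id: 1, expire_at: nil),            # never expires
  Artifact.new(id: 2, expire_at: Time.at(2_000)), # expires later
  Artifact.new(id: 3, expire_at: Time.at(500))    # expired, but still on the primary
]

p replicable_ids(artifacts, now, include_expired: false) # => [1, 2]
p replicable_ids(artifacts, now, include_expired: true)  # => [1, 2, 3]
```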
I assume removing the scope is safe because if and when these artifacts are deleted from the primary, a delete event will be created and processed by secondaries. We probably need to fire more delete events from the primary, e.g. from services that mass-delete artifacts. Update: this is now covered by #297472 (closed).
I think this would make secondaries more accurate representations of the primary.
This is no longer relevant to this issue.
Update: See #258918 (comment 484092073) and #297472 (closed)
A minor related problem
If the expiration is long enough that the artifacts do get synced, then when they expire they are not deleted, so no delete event is created. The background workers on the secondary eventually discover the expired artifact and delete its registry, but not the file: the delete jobs triggered by this process are not given the file path, so they only remove the registry. If and when the artifact is later deleted on the primary, a delete event is created, but as FileRegistryRemovalService is written, it exits without deleting the file when there is no registry.
Summary: Artifacts become orphaned on secondaries after expiration.
I don't want to create a separate issue for this problem yet, since it would be resolved by removing the not_expired scope.
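The orphaning sequence described above can be modelled in a few lines. This is a deliberately simplified, hypothetical model of a secondary's state (a registry table and a set of files on disk), not the real FileRegistryRemovalService or cleanup workers; the Secondary class and its method names are invented for illustration, and passing the file path with the delete event is an assumption of the model. It shows why the file is orphaned: the cleanup path removes the registry without knowing the file path, and the later delete event returns early because the registry is already gone.

```ruby
require "set"

# Hypothetical, simplified model of one Geo secondary's state.
class Secondary
  attr_reader :registries, :files

  def initialize
    @registries = {}  # artifact_id => file path recorded at sync time
    @files = Set.new  # paths present on the secondary's disk
  end

  # Syncing an artifact writes the file and records a registry entry.
  def sync(artifact_id, path)
    @files << path
    @registries[artifact_id] = path
  end

  # Models background cleanup of an expired artifact: the delete job is
  # not given the file path, so it can only remove the registry.
  def cleanup_expired(artifact_id)
    @registries.delete(artifact_id)
  end

  # Models processing a delete event from the primary. Mirroring the
  # behaviour described above, it exits early when there is no registry,
  # even though the event (in this model) carries the file path.
  def process_delete_event(artifact_id, path)
    return unless @registries.key?(artifact_id)

    @files.delete(path)
    @registries.delete(artifact_id)
  end
end

path = "/artifacts/42/archive.zip"
secondary = Secondary.new
secondary.sync(42, path)
secondary.cleanup_expired(42)            # artifact expires; registry removed
secondary.process_delete_event(42, path) # primary deletes it; early return

p secondary.registries # => {}
p secondary.files.to_a # => ["/artifacts/42/archive.zip"] (orphaned)
```

If the not_expired scope were removed, cleanup_expired would no longer run for these artifacts, so the registry would still exist when the delete event arrives and the file would be removed.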