GitLab indiscriminately purges old artifacts from self-hosted instances
TL;DR
The migration added in !47723 (merged), which was written to satisfy a gitlab.com requirement [1], will eventually cause silent, unrecoverable data loss for all self-hosted instances.
Problems
- The migration backfill_artifact_expiry_date.rb, added in !47723 (merged), retroactively sets an expiration date on all "old" artifacts which do not have an expiration date. This includes artifacts for tagged builds, which many workflows deem to be precious.
  This was never mentioned in the GitLab 13.8 release blog post!
- The only reason most users have not yet seen this problem is that the service which actually destroys expired artifacts is broken: it does not work when it encounters large "slugs" of artifacts that all have the same expiration date. See #330378, "DestroyAllExpiredService calls each_batch in an unsafe manner".
  Once #330378 (closed) is fixed, self-hosted admins will start getting complaints from their users that their old artifacts have inexplicably been deleted. 💥
- The migration does not filter on file_type; it even deletes job logs, which are normally excluded from artifact expiration.
  Adding to the frustration, deleted artifacts also have their rows deleted from the database, so without database backups there is no way to determine which files or rows have been removed.
- The migration wasn't even implemented correctly and had to be backed out and rescheduled:
  - @stanhu pointed out that it was failing when it was pushed to production: !47723 (diffs, comment 486021780)
  - !51821 (merged), !51822 (merged), !55093 (merged)
- While this migration set ci_job_artifacts.expire_at, it did not set ci_builds.artifacts_expire_at (which is terribly redundant).
  The result is that this code:
  - Does not tell you that the artifacts will expire if the current time is before the artifactopocalypse
  - Does not let you Keep artifacts
  - Does not tell you when artifacts were removed if the current time is after the artifactopocalypse (whether or not DestroyAllExpiredService was actually able to remove them)

  (Added based on this comment and GitLab 13.9.5)
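
To make the file_type and tagged-build points concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not the actual migration: the two tables are cut down to the columns named above plus a few assumed ones (job_id, created_at, tag), the numeric file type codes are assumptions, and the cutoff and expiry dates are hypothetical values picked for illustration. It compares a backfill that touches every old artifact with one that skips job logs and tagged builds, and counts builds whose ci_builds.artifacts_expire_at no longer agrees with their artifacts.

```python
import sqlite3

# Assumptions: numeric codes standing in for the archive and job-log
# ("trace") file types. Illustrative only.
ARCHIVE = 1
TRACE = 3

# Hypothetical cutoff and expiry dates, chosen only for this sketch.
CUTOFF = "2020-06-22"
NEW_EXPIRY = "2021-04-22"


def make_db():
    """Build an in-memory stand-in for the two tables discussed above."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE ci_builds (
            id INTEGER PRIMARY KEY,
            tag INTEGER NOT NULL,            -- 1 for tagged builds
            artifacts_expire_at TEXT
        );
        CREATE TABLE ci_job_artifacts (
            id INTEGER PRIMARY KEY,
            job_id INTEGER NOT NULL,
            file_type INTEGER NOT NULL,
            created_at TEXT NOT NULL,
            expire_at TEXT
        );
        """
    )
    # Build 1 is tagged, build 2 is not.
    db.executemany("INSERT INTO ci_builds VALUES (?, ?, NULL)", [(1, 1), (2, 0)])
    db.executemany(
        "INSERT INTO ci_job_artifacts VALUES (?, ?, ?, ?, NULL)",
        [
            (1, 1, ARCHIVE, "2019-01-01"),  # old archive from a tagged build
            (2, 2, ARCHIVE, "2019-01-01"),  # old archive from an untagged build
            (3, 2, TRACE, "2019-01-01"),    # old job log
            (4, 2, ARCHIVE, "2021-01-01"),  # newer than the cutoff
        ],
    )
    return db


def backfill_unfiltered(db):
    """Set an expiry on every old artifact that has none, whatever its type."""
    cur = db.execute(
        "UPDATE ci_job_artifacts SET expire_at = ? "
        "WHERE expire_at IS NULL AND created_at < ?",
        (NEW_EXPIRY, CUTOFF),
    )
    return cur.rowcount


def backfill_filtered(db):
    """Same backfill, but skip job logs and artifacts of tagged builds."""
    cur = db.execute(
        "UPDATE ci_job_artifacts SET expire_at = ? "
        "WHERE expire_at IS NULL AND created_at < ? "
        "AND file_type != ? "
        "AND job_id NOT IN (SELECT id FROM ci_builds WHERE tag = 1)",
        (NEW_EXPIRY, CUTOFF, TRACE),
    )
    return cur.rowcount


def builds_out_of_sync(db):
    """Count builds whose artifacts expire but whose own expiry column is unset."""
    row = db.execute(
        "SELECT COUNT(DISTINCT b.id) FROM ci_builds b "
        "JOIN ci_job_artifacts a ON a.job_id = b.id "
        "WHERE a.expire_at IS NOT NULL AND b.artifacts_expire_at IS NULL"
    ).fetchone()
    return row[0]
```

On this toy data the unfiltered backfill marks the tagged build's archive and the job log for expiry along with the ordinary archive, while the filtered variant touches only the ordinary archive. Neither variant updates ci_builds.artifacts_expire_at, which is the mismatch described in the last bullet.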
Complaints
This migration was 100% written with gitlab.com needs in mind, and with complete disregard for GitLab customers who choose to run self-hosted instances.
When artifact expiration was first introduced, I begged for some sort of control that would prevent tagged builds from being deleted. As with many GitLab features, the MVP was rolled out with the promise that something like that would be implemented in the future. Yet here we are: GitLab still doesn't give a shit about the tagged artifacts on self-hosted instances. Related:
I find it incredibly disturbing that the author of this migration, and everyone involved in the approvals process, find it acceptable to dictate a data retention policy for all of GitLab's on-premise customers.
Why the hell was this implemented as a migration anyway? If gitlab.com wants to enforce a policy on its own instance, why wasn't that done via the API, or directly against the database?
Why should self-hosted users be affected by gitlab.com's policies?
cc @matteeyah @iroussos @abrandl @ayufan @jheimbuck_gl
[1] See #263234 (closed), which starts "To support the change at https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/10177 ..."