Stale / Orphaned Pages Artifacts are not removed on Geo Secondary
Summary
- Stale pages artifacts on the Geo Primary are cleaned up regularly.
- The file cleanup was updated in 16.5 to use cron.
- A bug in file deletion via cron has since been fixed.
- The pages artifact files are synced to the Geo Secondary.
- The stale files on the Geo Secondary are not cleaned up.
- Files pile up on the Geo Secondary, eventually saturating the disk.
Workaround: There is no known automated workaround. Manual cleanup can be done by running the script mentioned in this comment.
Steps to reproduce
- Set up Geo with Pages
- Create multiple pages deployments on the Geo Primary
- By default, pages artifacts are placed in /var/opt/gitlab/gitlab-rails/shared/pages/@hashed
- Over time, the stale files will be removed by a cron job
- Check the Geo Secondary
- Pages artifacts inside /var/opt/gitlab/gitlab-rails/shared/pages/@hashed are never cleaned up
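To confirm the behavior, it may help to count the artifacts on each site over time. The following is a minimal sketch, assuming the default storage path given above; the `PAGES_ROOT` variable and the `count_pages_artifacts` helper are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Default Pages storage path, as given in the steps above. Override PAGES_ROOT
# if your installation stores Pages artifacts elsewhere.
PAGES_ROOT="${PAGES_ROOT:-/var/opt/gitlab/gitlab-rails/shared/pages/@hashed}"

# Print how many artifacts.zip files exist under the given directory.
count_pages_artifacts() {
  find "$1" -type f -name artifacts.zip | wc -l | tr -d ' '
}

# Only report when the directory exists, so the sketch is safe to run anywhere.
if [ -d "$PAGES_ROOT" ]; then
  echo "artifacts.zip files: $(count_pages_artifacts "$PAGES_ROOT")"
  echo "disk usage: $(du -sh "$PAGES_ROOT" | cut -f1)"
fi
```

Running this on both the Geo Primary and the Geo Secondary at intervals should show the count on the secondary growing while the count on the primary stays roughly stable, if the bug is present.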
Example Project
n/a
What is the current bug behavior?
Stale pages artifacts on Geo Secondary servers are not cleaned up properly.
What is the expected correct behavior?
Stale pages artifacts are cleaned up regularly on the Geo Secondary.
Possible fixes
I haven't looked into the details of why this is happening, but my guess is that it has something to do with our cron process.
My guess is that the cron job relies on data from the PagesDeployment table. When the cron job runs on the Geo Primary, it deletes the files as well as the corresponding rows in the table. The database is replicated directly to the Geo Secondary, so the Geo Secondary no longer has any data on which files to delete, and the cron job on the Geo Secondary does not clean up any files.
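The guess above can be illustrated with a toy model. This is not GitLab code: each "site" is a temporary directory in which a `db` file stands in for the PagesDeployment table and a `files/` directory stands in for the artifact storage. The functions `cleanup_site`, `replicate_db`, and `simulate` are hypothetical names, and the sketch assumes the cleanup is driven only by table rows, which is the unverified hypothesis.

```shell
#!/bin/sh
# Toy model only: `db` holds lines of "<id> <state>"; `files/` holds <id>.zip.

# Hypothesised cleanup: delete the file of every stale row, then drop the row.
cleanup_site() {
  site="$1"
  for id in $(awk '$2 == "stale" { print $1 }' "$site/db"); do
    rm -f "$site/files/$id.zip"
  done
  awk '$2 != "stale"' "$site/db" > "$site/db.tmp"
  mv "$site/db.tmp" "$site/db"
}

# Database replication: the secondary's table becomes a copy of the primary's.
replicate_db() {
  cp "$1/db" "$2/db"
}

simulate() {
  root="$(mktemp -d)"
  mkdir -p "$root/primary/files" "$root/secondary/files"
  printf '1 active\n2 stale\n' > "$root/primary/db"
  touch "$root/primary/files/1.zip" "$root/primary/files/2.zip"
  # The files were synced to the secondary before the cleanup ran.
  cp "$root/primary/files/"*.zip "$root/secondary/files/"

  cleanup_site "$root/primary"                    # removes 2.zip and its row
  replicate_db "$root/primary" "$root/secondary"  # the stale row is gone here too
  cleanup_site "$root/secondary"                  # finds no stale row, deletes nothing

  echo "primary: $(ls "$root/primary/files" | xargs)"
  echo "secondary: $(ls "$root/secondary/files" | xargs)"
  rm -rf "$root"
}

simulate
```

In this model the primary ends up with only `1.zip`, while the secondary keeps both `1.zip` and `2.zip`, which matches the observed pile-up. Whether the real cron job behaves this way still needs to be confirmed in the code.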
Possible workaround
To delete orphaned pages deployments as a one-time cleanup:
Copied and adapted from #432777 (comment 1694034947).
- SSH into an affected secondary Geo site.
- Find valid pages deployment files and write them to pages_deployments:
gitlab-rails runner - << EOF > pages_deployments
PagesDeployment.find_each do |p|
  puts "/#{p.id}/artifacts.zip"
end
EOF
We will use this pages_deployments file as input to a find invocation, in order to exclude valid files.
- List invalid pages deployment files:
find /var/opt/gitlab/gitlab-rails/shared/pages/@hashed -name artifacts.zip | grep -vf pages_deployments | less
- This command will delete invalid pages deployment files:
find /var/opt/gitlab/gitlab-rails/shared/pages/@hashed -name artifacts.zip | grep -vf pages_deployments | xargs -L1 rm
- Open a Rails console on the secondary site:
gitlab-rails console
- Resync any pages deployment files which we should not have deleted. If less than 2 hours have passed since you began the first step, use 2.hours.ago. If less than 1 hour has passed, use 1.hour.ago.
Geo::PagesDeploymentRegistry.where("last_synced_at > ?", 2.hours.ago).update_all(state: 0, last_synced_at: nil)
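One risk in the list and delete steps above: if pages_deployments ends up empty (for example because the runner command failed), `grep -v -f` with an empty pattern file selects every line, so every artifacts.zip would be treated as invalid. The following is a minimal sketch of a guarded variant; the `list_orphaned_artifacts` helper is a hypothetical name introduced here and is not part of the original script.

```shell
#!/bin/sh
# List artifacts.zip files under ROOT whose path matches no line of KEEP_LIST.
# Refuses to run when KEEP_LIST is missing or empty, because `grep -v -f` with
# an empty pattern file would report every file as orphaned.
list_orphaned_artifacts() {
  root="$1"
  keep_list="$2"
  if [ ! -s "$keep_list" ]; then
    echo "refusing to continue: $keep_list is missing or empty" >&2
    return 1
  fi
  # -F treats each line as a fixed string, so the "." in "artifacts.zip" is
  # literal. `|| true` keeps the exit status 0 when nothing is orphaned.
  find "$root" -type f -name artifacts.zip | grep -vFf "$keep_list" || true
}

# Dry run (review the output first):
#   list_orphaned_artifacts /var/opt/gitlab/gitlab-rails/shared/pages/@hashed pages_deployments | less
# Delete after review, using the same xargs invocation as the step above:
#   list_orphaned_artifacts /var/opt/gitlab/gitlab-rails/shared/pages/@hashed pages_deployments | xargs -L1 rm
```

The matching logic is otherwise the same as in the original commands; only the empty-file guard and the fixed-string matching are added.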