
Stale / Orphaned Pages Artifacts are not removed on Geo Secondary

Summary

Workaround: There is no known workaround that prevents stale artifacts from accumulating. Manual cleanup is possible with the one-time procedure under Possible workaround below.

Steps to reproduce

  1. Set up Geo with Pages.
  2. Create multiple Pages deployments on the Geo Primary.
  3. By default, Pages artifacts are stored in /var/opt/gitlab/gitlab-rails/shared/pages/@hashed.
  4. Over time, stale files on the Geo Primary are removed by a cron job.
  5. Check the Geo Secondary.
  6. Pages artifacts inside /var/opt/gitlab/gitlab-rails/shared/pages/@hashed on the Geo Secondary are never cleaned up.
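
One way to observe the difference is to count the artifact archives on each site and compare the numbers over time. The following is a minimal sketch, not part of the original report: count_pages_artifacts is a hypothetical helper name, and the default path is the one given in the steps above.

```shell
# Hypothetical helper (not part of GitLab): count artifacts.zip files
# under a Pages storage root. Falls back to the default path from the
# steps above when no argument is given.
count_pages_artifacts() {
  find "${1:-/var/opt/gitlab/gitlab-rails/shared/pages/@hashed}" \
    -type f -name artifacts.zip | wc -l | tr -d ' '
}
```

Running count_pages_artifacts on the Geo Primary and on the Geo Secondary at intervals should show the Primary count dropping after cleanup while the Secondary count only grows, if the behavior described here is present.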

Example Project

n/a

What is the current bug behavior?

Stale pages artifacts on Geo Secondary servers are not cleaned up properly.

What is the expected correct behavior?

Stale pages artifacts are cleaned up regularly on Geo Secondary servers.

Possible fixes

I haven't looked into the details of why this is happening, but my guess is that it has something to do with our cron process.

Specifically, I suspect the cron job relies on data from the PagesDeployment table. When it runs on the Geo Primary, it deletes both the files and the corresponding rows in the table. The database is replicated directly to the Geo Secondary, so by the time the cron job runs there, no rows remain to tell it which files to delete, and nothing is cleaned up.

Internal only zd ref

Possible workaround

To delete orphaned pages deployment files as a one-time cleanup:

Copied and adapted from #432777 (comment 1694034947).

  1. SSH into an affected secondary Geo site

  2. Find valid pages deployment files and write them to pages_deployments:

    gitlab-rails runner - << 'EOF' > pages_deployments
    # Print one grep pattern per deployment that still exists in the database
    PagesDeployment.find_each do |p|
      puts "/#{p.id}/artifacts.zip"
    end
    EOF

    This pages_deployments file is used in the next steps as the pattern file for grep -vf, which filters the valid files out of the find output.

  3. List invalid pages deployment files:

    find /var/opt/gitlab/gitlab-rails/shared/pages/@hashed -name artifacts.zip | grep -vf pages_deployments | less

  4. Delete invalid pages deployment files:

    find /var/opt/gitlab/gitlab-rails/shared/pages/@hashed -name artifacts.zip | grep -vf pages_deployments | xargs -L1 rm

  5. Open Rails console on the secondary site:

    gitlab-rails console

  6. Resync any pages deployment files that should not have been deleted, that is, files synced after the list in step 2 was generated. Choose a cutoff that covers the whole procedure: if less than 1 hour has passed since you began step 1, use 1.hour.ago; if less than 2 hours, use 2.hours.ago.

    Geo::PagesDeploymentRegistry.where("last_synced_at > ?", 2.hours.ago).update_all(state: 0, last_synced_at: nil)
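
The filtering used in steps 3 and 4 can be tried out on a scratch directory before it is run against real data. The following is a minimal sketch, not part of the original procedure: list_orphaned_artifacts is a hypothetical helper name, and -F (fixed-string matching) is an addition here so that the dot in artifacts.zip is matched literally rather than as a regular-expression wildcard.

```shell
# Hypothetical helper (not part of GitLab): print every artifacts.zip under
# ROOT whose path does not contain any line of VALID_LIST as a substring.
# Usage: list_orphaned_artifacts ROOT VALID_LIST
list_orphaned_artifacts() {
  # -v inverts the match, -F treats patterns as fixed strings,
  # -f reads one pattern per line from the given file.
  find "$1" -type f -name artifacts.zip | grep -vFf "$2"
}
```

For example, list_orphaned_artifacts /var/opt/gitlab/gitlab-rails/shared/pages/@hashed pages_deployments | less corresponds to the listing in step 3. Note that with GNU grep an empty pattern file matches nothing, so every file would be listed as orphaned; it is worth checking that pages_deployments is non-empty before deleting anything.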
Edited by Michael Kozono