Upload Purging Scales Poorly as Number of Registry Instances Increases
Context
Problem
Currently, upload purging is managed by a goroutine that is spawned by the constructor function for the handlers.App struct. As a result, every instance of the registry periodically sweeps the same object storage for partial uploads that can be cleaned up. With N instances, this potentially results in N-1 more sweeps than are necessary.
Possible Solutions
Standalone: It should be relatively simple to break out this logic into a standalone command, similar to the garbage collection command. This would introduce a manual maintenance operation, however.
Lock File: It should also be possible to use a lock file that the upload purger checks before running a sweep. This file would contain the timestamp of the last completed run, so that the interval set in the config is honored across all instances. The initial run for a dataset, when no lock file exists yet, would need a special case, but that is manageable. This change is a little more complicated, but it is transparent to the user.