There was an issue (https://gitlab.com/gitlab-org/gitlab-ce/issues/45425) where object store uploads were not deleted when their parent resource was deleted (e.g. when deleting a project). Although https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/18698 fixes this issue, we may need to provide a rake task that prints a list of existing objects in the object store which are not referenced by any upload in the DB. This is probably most relevant to gitlab.com (which is the primary "canary" for object store usage).
@ayufan @jprovaznik how much urgency do you think we should put on this? I assume the main implementation concern here is getting the list of objects from the object storage?
I don't think that this is critical either. I'm not sure how to run that efficiently. We have the same problem in CI, which is something that @dosuken123 is working on.
For checking orphaned files on object storage, we can do something like this.
Correct me if I am wrong, but the example there always searches for a specific file on the object storage. It never tries to get all files on that storage. I am sure that is possible, but I don't think that example helps (and I don't know how to do it yet either, but I hope I'll figure that out).
We need something like ::Fog::Storage.new(cred).list_objects(bucket_name) to fetch all objects in remote storage. I'm not sure whether this is performant, as we have tons of files on object storage (S3/GCS).
However, like @dosuken123, I am concerned about performance.
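One way to keep the cost bounded might be to walk the remote listing lazily and check each batch of keys against the DB in a single lookup, rather than loading everything into memory. Below is a minimal sketch of that idea in plain Ruby. Everything in it is an assumption for illustration: `each_orphaned_key` is a hypothetical helper, `remote_keys` stands in for whatever enumerable the storage client returns, and `lookup` stands in for a query against the uploads table.

```ruby
require 'set'

# Hypothetical helper (not existing GitLab code): yields every key that is
# present in the remote listing but unknown to the database.
#
# remote_keys - any Enumerable of object keys (assumed to be lazily paged)
# lookup      - callable that takes an Array of keys and returns the subset
#               that the database knows about (assumed: one query per batch)
def each_orphaned_key(remote_keys, batch_size: 1000, lookup:)
  unless block_given?
    return enum_for(:each_orphaned_key, remote_keys,
                    batch_size: batch_size, lookup: lookup)
  end

  remote_keys.each_slice(batch_size) do |batch|
    known = lookup.call(batch).to_set
    batch.each { |key| yield key unless known.include?(key) }
  end
end

# Stand-in data for illustration only.
db_paths = Set.new(%w[uploads/a.png uploads/b.png])
lookup = ->(keys) { keys.select { |key| db_paths.include?(key) } }
remote = %w[uploads/a.png uploads/b.png uploads/c.png uploads/d.png].each

orphans = each_orphaned_key(remote, batch_size: 2, lookup: lookup).to_a
puts orphans
```

With this shape, memory use depends on `batch_size` rather than on the total number of objects, although the full listing still has to be traversed once.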
Setting performance aside for now, what is the scope of this issue in the end? The description covers only one case: the rake task should only print the list of objects that are stored on object storage but are not in the uploads table.
@jarka I think we can pick one that we think will be useful now, and create separate issues for the others. Which do you think is best / easiest to get started with?
I think we should start with the one described at the beginning: "print a list of existing objects stored in object store which are not referenced by any upload in DB". Do we want to remove them as well?
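If removal is wanted later, one option could be to keep printing as the default and make deletion an explicit opt-in. A rough sketch of that, where `FakeBucket` and `cleanup_orphans` are hypothetical names used only for illustration and the bucket is an in-memory stand-in rather than a real storage client:

```ruby
require 'stringio'

# Hypothetical in-memory stand-in for a remote bucket.
class FakeBucket
  attr_reader :keys

  def initialize(keys)
    @keys = keys.dup
  end

  def delete(key)
    @keys.delete(key)
  end
end

# Hypothetical helper: prints each orphaned key, and deletes it only when
# dry_run is explicitly set to false.
def cleanup_orphans(bucket, orphaned_keys, dry_run: true, out: $stdout)
  orphaned_keys.each do |key|
    if dry_run
      out.puts "Would delete #{key}"
    else
      bucket.delete(key)
      out.puts "Deleted #{key}"
    end
  end
end

bucket = FakeBucket.new(%w[uploads/a.png uploads/c.png])

# Default: report only, nothing is removed.
cleanup_orphans(bucket, %w[uploads/c.png])

# Explicit opt-in: the orphaned object is removed.
cleanup_orphans(bucket, %w[uploads/c.png], dry_run: false)
```

Defaulting to a dry run means the first version of the task matches the "print only" scope, and removal can be added without changing how the task is invoked.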
@jarka Please note that !20863 (merged) handles a subset of uploads, just the project markdown ones. It would take some more effort to cover Case 6 completely.