Feature Request: Add Rake task to clean up references to missing remote uploads
Problem to solve
The current documentation covers only one scenario: cleaning up project upload files that do not exist in the GitLab database. It does not cover the opposite scenario, where a remote object has been permanently deleted but is still referenced in the GitLab database. A cleanup rake task for this case would be helpful.
Intended users
User experience goal
Goal
The user should be able to clean up references to remote uploads that were deleted externally.
User experience workflow
- User faces the problem that their remote uploads were deleted and cannot be recovered.
- Running the command gitlab-rake gitlab:uploads:check VERBOSE=1 shows that the references to remote uploads still exist in the GitLab database.
- User searches the GitLab documentation, reaches the Clean up project upload files from object storage page, and finds a section that matches their scenario: remote uploads deleted externally but still referenced in the GitLab database.
- User follows the steps recommended in the documentation to clean up all references to remote objects that do not exist.
Proposal
Currently, there is no rake task to clean up references to remote uploads that no longer exist in the system.
We propose to have a new rake task to delete these references to remote uploads that are no longer available.
Example: sudo gitlab-rake gitlab:cleanup:remote_upload_files_references
or something similar.
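As a rough illustration of what such a task could do, here is a minimal sketch of the core logic with a dry-run default, so that nothing is removed unless explicitly requested. FakeUpload, its file_exists field, and the method name clean_missing_upload_references are hypothetical stand-ins invented for this example. In a real task the records would come from the Upload model, and the existence check would be upload.retrieve_uploader.file.exists?, as in the workaround further down.

```ruby
# Hypothetical stand-in for an Upload record, used only to make this sketch
# runnable outside GitLab. file_exists mimics the result of
# upload.retrieve_uploader.file.exists? from the workaround.
FakeUpload = Struct.new(:id, :file_exists, :destroyed) do
  def destroy!
    self.destroyed = true
  end
end

# Returns the uploads whose remote file is missing.
# Destroys them only when dry_run is false, so the default call changes nothing.
def clean_missing_upload_references(uploads, dry_run: true)
  missing = uploads.reject(&:file_exists)
  missing.each do |upload|
    if dry_run
      puts "Would remove reference for upload #{upload.id}"
    else
      upload.destroy!
      puts "Removed reference for upload #{upload.id}"
    end
  end
  missing
end

uploads = [
  FakeUpload.new(100, false, false), # remote object deleted externally
  FakeUpload.new(101, false, false), # remote object deleted externally
  FakeUpload.new(102, true, false)   # remote object still present
]

clean_missing_upload_references(uploads)                 # report only
clean_missing_upload_references(uploads, dry_run: false) # actually remove
```

Defaulting to a dry run mirrors the commented-out destroy! line in the workaround: the administrator can review the affected records before anything is deleted.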
Further details
A customer (ZD ticket - Internal Use) needed to delete references to remote uploads that had been deleted externally from AWS S3. The stale references show up when running the integrity check:
Example:
$ sudo gitlab-rake gitlab:uploads:check VERBOSE=1
Checking integrity of Uploads
- 100..434: Failures: 2
- Upload: 100: Remote object does not exist
- Upload: 101: Remote object does not exist
Done!
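To see how many references are affected, the output above can be parsed for the failing upload IDs. The following is a small sketch that assumes the line format shown in this example. The constant MISSING_REMOTE_PATTERN and the method missing_remote_upload_ids are names made up for illustration and are not part of GitLab.

```ruby
# Matches failure lines such as "- Upload: 100: Remote object does not exist".
# The format is assumed from the sample output in this issue.
MISSING_REMOTE_PATTERN = /^\s*-\s*Upload:\s*(\d+):\s*Remote object does not exist\s*$/

# Returns the IDs of uploads reported as missing from remote storage.
# Lines that do not match the pattern (headers, summaries, other errors) are ignored.
def missing_remote_upload_ids(check_output)
  check_output.each_line.filter_map do |line|
    match = MISSING_REMOTE_PATTERN.match(line)
    match && match[1].to_i
  end
end

sample = <<~OUTPUT
  Checking integrity of Uploads
  - 100..434: Failures: 2
  - Upload: 100: Remote object does not exist
  - Upload: 101: Remote object does not exist
  Done!
OUTPUT

p missing_remote_upload_ids(sample) # prints [100, 101]
```

Running the same extraction before and after a cleanup gives a simple way to confirm that the references are gone.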
Since there is no rake task to remove the entries in GitLab, we came up with a workaround to remove the references through the Rails console:
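uploads_deleted = 0
Upload.find_each do |upload|
  # Skip uploads whose file still exists in storage
  next if upload.retrieve_uploader.file.exists?
  uploads_deleted += 1
  p upload            # print the record to allow verification before destroy
  # p upload.destroy! # uncomment to actually destroy the record
end
# The count reflects destroyed records only if the destroy! line is uncommented
p "#{uploads_deleted} uploads had no file in storage."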
The code snippet is documented under Troubleshooting of Integrity check Rake task.
Permissions and Security
Permission to run sudo gitlab-rake on the application server that GitLab is running on.
Documentation
Add the new clean up rake task section to Clean Up Project Upload Files Documentation.
Availability & Testing
Test Plan:
- Set up a GitLab instance that uses Amazon S3 as object storage.
- Create an issue and attach a file.
- Delete the file from Amazon S3.
- Run the integrity check rake task (gitlab-rake gitlab:uploads:check VERBOSE=1) on the GitLab instance and check the output for the error Remote object does not exist.
- Run the new rake task to clean up references to the remote upload that does not exist.
- Run integrity check rake task again to confirm the reference has been deleted.
What does success look like, and how can we measure that?
System administrators who face the same problem are able to navigate to the Clean Up documentation and run the appropriate cleanup rake task.
What is the type of buyer?
All tiers
Is this a cross-stage feature?
Links / references
Docs feedback: Documentation to clean up/delete references to remote uploads is missing