Add local project uploads cleanup task
What does this MR do?
Adds a rake task that can list, or delete move to lost-and-found, orphaned local project upload files.
Are there points in the code the reviewer needs to double check?
Why was this MR needed?
It is necessary for https://gitlab.com/gitlab-org/gitlab-ce/issues/49630 and https://gitlab.com/gitlab-org/gitlab-ee/issues/6012
Screenshots (if relevant)
Example output:
$ sudo gitlab-rake gitlab:cleanup:project_uploads
I, [2018-07-27T12:08:28.671559 #89817] INFO -- : Looking for orphaned project uploads to clean up. Dry run...
E, [2018-07-27T12:08:28.689869 #89817] ERROR -- : Skipping... Unable to parse project upload path: "/opt/gitlab/embedded/service/gitlab-rails/public/uploads/test.out"
I, [2018-07-27T12:08:28.754259 #89817] INFO -- : Found correct path! /opt/gitlab/embedded/service/gitlab-rails/public/uploads/qux/foo/bar/89a0f7b0b97008a4a18cedccfdcd93fb/foo.txt
I, [2018-07-27T12:08:28.755624 #89817] INFO -- : Can be moved to parent: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/foo/bar/89a0f7b0b97008a4a18cedccfdcd93fb/foo.txt
I, [2018-07-27T12:08:28.760257 #89817] INFO -- : Can be moved to lost and found: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/foo/bar/1dd6f0f7eefd2acc4c2233f89a0f7b0b/image.png
I, [2018-07-27T12:08:28.764470 #89817] INFO -- : To cleanup these files run this command with DRY_RUN=false
$ sudo gitlab-rake gitlab:cleanup:project_uploads DRY_RUN=false
I, [2018-07-27T12:09:24.944414 #89936] INFO -- : Looking for orphaned project uploads to clean up...
E, [2018-07-27T12:09:24.962321 #89936] ERROR -- : Skipping... Unable to parse project upload path: "/opt/gitlab/embedded/service/gitlab-rails/public/uploads/test.out"
I, [2018-07-27T12:08:28.754259 #89817] INFO -- : Found correct path! /opt/gitlab/embedded/service/gitlab-rails/public/uploads/qux/foo/bar/89a0f7b0b97008a4a18cedccfdcd93fb/foo.txt
I, [2018-07-27T12:08:28.755624 #89817] INFO -- : Moved to parent: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/foo/bar/89a0f7b0b97008a4a18cedccfdcd93fb/foo.txt
I, [2018-07-27T12:08:28.760257 #89817] INFO -- : Moved to lost and found: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/foo/bar/1dd6f0f7eefd2acc4c2233f89a0f7b0b/image.png
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary -
Documentation created/updated -
Tests added for this feature/bug - Conforms to the code review guidelines
-
Has been reviewed by a Backend maintainer
-
-
Conforms to the merge request performance guidelines -
Conforms to the style guides -
Conforms to the database guides -
If you have multiple commits, please combine them into a few logically organized commits by squashing them -
End-to-end tests pass ( package-and-qa
manual pipeline job)
What are the relevant issue numbers?
https://gitlab.com/gitlab-org/gitlab-ce/issues/49630 and https://gitlab.com/gitlab-org/gitlab-ee/issues/6012
Merge request reports
Activity
marked the checklist item Changelog entry added, if necessary as completed
assigned to @mkozono
changed milestone to %11.2
- Resolved by Ash McKenzie
- Resolved by Ash McKenzie
@ash.mckenzie Could you please give an initial review?
(This is working for me locally, but I still have it WIP since I need to add documentation, and I plan to test it on staging.)
Edited by Michael Kozono- Resolved by Michael Kozono
- Resolved by Ash McKenzie
- Resolved by Michael Kozono
Looking good @mkozono :)
assigned to @mkozono
Ok I addressed some feedback and responded on some, and I've added documentation. I also added a test for skipping an unexpected file path, and I moved the delete logic into the class.
I used a logger, and now running the rake task outputs to both Rails.logger and stdout. I've updated the screenshot in the description.
Back to you @ash.mckenzie!
Edited by Michael KozonoIn case you are wondering, the latest squashed changes were:
- Change variable name from
finder
tocleaner
- Change default from
remove: true
toremove: false
(this mistake happened because I originally named the argumentdry_run
) - Change
logger
argument in initialize from required to optional (which was my original intention)
Also, my staging testing isn't going well.
@ash.mckenzie I know this is hard to read, but since my Unix knowledge is not strong, I can't explain this weirdness:
- On GSTG, find took 2 hours, and returned only 39 files
- On Azure staging, find took 10 mins, and returned only 27 files
Could you please give it a look?
# find command hangs mkozono@sidekiq-besteffort-01-sv-gstg.c.gitlab-staging-1.internal:~$ sudo /usr/bin/ionice -c Idle find -L /opt/gitlab/embedded/service/gitlab-rails/public/uploads -type f ! \( -path /opt/gitlab/embedded/service/gitlab-rails/public/uploads/-/* -prune \) ! \( -path /opt/gitlab/embedded/service/gitlab-rails/public/uploads/@hashed/* -prune \) ! \( -path /opt/gitlab/embedded/service/gitlab-rails/public/uploads/tmp/* -prune \) -print0 <waited 1 min> ^C # simpler find command hangs mkozono@sidekiq-besteffort-01-sv-gstg.c.gitlab-staging-1.internal:~$ sudo find -L /opt/gitlab/embedded/service/gitlab-rails/public/uploads -type f <waited 1 min> ^C <snip> # Find in subdirectory works mkozono@sidekiq-besteffort-01-sv-gstg.c.gitlab-staging-1.internal:~$ sudo /usr/bin/ionice -c Idle find -L /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest -type f /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/e7f8b7ec9b98c7a274e6bc0f53702f0d/bob.gif /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/cc0691eed0191a9fc6cfb70b9f20e9b7/_.jpg /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/5cf9a57f599059e6de4d346057831e60/Screen_Shot_2018-06-04_at_09.04.59.png /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/9851cafa822ddfbe38d73ca9dc42b505/rails_sample.jpg <snip> # Tried find in all uploads again, and waited longer, and received some output mkozono@sidekiq-besteffort-01-sv-gstg.c.gitlab-staging-1.internal:~$ sudo /usr/bin/ionice -c Idle find -L /opt/gitlab/embedded/service/gitlab-rails/public/uploads/ -type f /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/e7f8b7ec9b98c7a274e6bc0f53702f0d/bob.gif /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/cc0691eed0191a9fc6cfb70b9f20e9b7/_.jpg /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/5cf9a57f599059e6de4d346057831e60/Screen_Shot_2018-06-04_at_09.04.59.png /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/9851cafa822ddfbe38d73ca9dc42b505/rails_sample.jpg ^C <snip> mkozono@sidekiq-besteffort-01-sv-gstg.c.gitlab-staging-1.internal:~$ sudo gitlab-rails console <snip> # Tried my running my code again. Am still waiting for a first batch to process irb(main):157:0> cleaner = Gitlab::Cleanup::ProjectUploads.new(logger: Rails.logger) => #<Gitlab::Cleanup::ProjectUploads:0x00007f28bcd0ba10 @logger=#<ActiveSupport::Logger:0x00007f28d3cb0b80 @level=1, @progname=nil, @default_formatter=#<Logger::Formatter:0x00007f28d3cb0ae0 @datetime_format=nil>, @formatter=#<ActiveSupport::Logger::SimpleFormatter:0x00007f28d350b510 @datetime_format=nil>, @logdev=#<Logger::LogDevice:0x00007f28d3cb0a90 @shift_period_suffix=nil, @shift_size=nil, @shift_age=nil, @filename=nil, @dev=#<File:/opt/gitlab/embedded/service/gitlab-rails/log/production.log>, @mon_owner=nil, @mon_count=0, @mon_mutex=#<Thread::Mutex:0x00007f28d3cb0a40>>, @local_levels=#<ThreadSafe::Cache:0x00007f28d3cb09c8 @backend={}, @default_proc=nil>>> irb(main):158:0> cleaner.run! Looking for orphaned project uploads to remove. Dry run... find command: "/usr/bin/ionice -c Idle find -L /opt/gitlab/embedded/service/gitlab-rails/public/uploads -type f ! ( -path /opt/gitlab/embedded/service/gitlab-rails/public/uploads/-/* -prune ) ! ( -path /opt/gitlab/embedded/service/gitlab-rails/public/uploads/@hashed/* -prune ) ! ( -path /opt/gitlab/embedded/service/gitlab-rails/public/uploads/tmp/* -prune ) -print0" ...< still waiting here, at least 30 mins >... # Update, it completed after 2 hours Processing batch of 39 project upload file paths, starting with /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/e7f8b7ec9b98c7a274e6bc0f53702f0d/bob.gif Can be removed: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/e7f8b7ec9b98c7a274e6bc0f53702f0d/bob.gif Can be removed: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/mkozonotest/qa-test-gcp-migration/5cf9a57f599059e6de4d346057831e60/Screen_Shot_2018-06-04_at_09.04.59.png Can be removed: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/gitlab-qa-sandbox/qa-test-15-11-2017-16-15-05/awesome-project-95d0ad5043bfeb85/f7c6e8251da94493dfddf4e0e57dd70f/IMG_0300.JPG <snip 16 more lines of orphans to keep the hashes secret> => nil
I started it as well on Azure staging and it returned in about 5 or 10 minutes with unexpected results:
[ stg ] production> cleaner = Gitlab::Cleanup::ProjectUploads.new => #<Gitlab::Cleanup::ProjectUploads:0x00007f08911d3cb0 @logger=#<ActiveSupport::Logger:0x00007f0893883658 @level=1, @progname=nil, @default_formatter=#<Logger::Formatter:0x00007f08938835b8 @datetime_format=nil>, @formatter=#<ActiveSupport::Logger::SimpleFormatter:0x00007f08930ece08 @datetime_format=nil>, @logdev=#<Logger::LogDevice:0x00007f0893883568 @shift_period_suffix=nil, @shift_size=nil, @shift_age=nil, @filename=nil, @dev=#<File:/opt/gitlab/embedded/service/gitlab-rails/log/production.log>, @mon_owner=nil, @mon_count=0, @mon_mutex=#<Thread::Mutex:0x00007f0893883518>>, @local_levels=#<ThreadSafe::Cache:0x00007f08938834a0 @backend={}, @default_proc=nil>>> [ stg ] production> cleaner.run!; Time.now Looking for orphaned project uploads to remove. Dry run... find command: "/usr/bin/ionice -c Idle find -L /opt/gitlab/embedded/service/gitlab-rails/public/uploads -type f ! ( -path /opt/gitlab/embedded/service/gitlab-rails/public/uploads/-/* -prune ) ! ( -path /opt/gitlab/embedded/service/gitlab-rails/public/uploads/@hashed/* -prune ) ! ( -path /opt/gitlab/embedded/service/gitlab-rails/public/uploads/tmp/* -prune ) -print0" Processing batch of 27 project upload file paths, starting with /opt/gitlab/embedded/service/gitlab-rails/public/uploads/system.bak/user/avatar/171554/avatar.png Host 10.129.1.102 came back online Can be removed: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/system.bak/user/avatar/171554/avatar.png Can be removed: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/system.bak/project/avatar/2689262/ghipster.png Can be removed: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/system.bak/group/avatar/1317779/Screen_Shot_2017-06-01_at_13.13.10.png Can be removed: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/gitlab-org/gitlab-ci-multi-runner/7b41c4a74b36c1f45ddae6fd447cd328/image.png Can be removed: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/gitlab-org/gitlab-ci-multi-runner/a17454b9f21b35a0115308a75f2dd1f1/Screenshot_from_2017-09-21_12_01_45.png Can be removed: /opt/gitlab/embedded/service/gitlab-rails/public/uploads/gitlab-org/gitlab-ci-multi-runner/604873327c0f30d6998904fdaa64cb30/Screenshot_from_2017-09-21_12_17_55.png => 2018-07-28 00:25:25 +0000 [ stg ] production>
I'd like to test the command on production, since I've found that it sometimes acts differently when it comes to this stuff (especially involving the filesystem, I don't know why). But stop me if that's a bad idea. A quick google said that a huge directory can make
ls
andfind
hang, but if we are able to cancel it, could it do any harm?Edited by Michael Kozono- Change variable name from
cc @stanhu Perhaps you might have some ideas about the above testing in staging?
mentioned in issue #46535 (closed)
mentioned in issue gitlab-com/migration#731 (closed)
@mkozono I think you found something there: I/O on the Azure uploads server is significantly faster than the I/O on the Google server, most likely because the latter is using standard disk while the former is on SSD. I think we'd better fix that: gitlab-com/migration#731 (closed).
- Resolved by Michael Kozono
- Resolved by Michael Kozono
- Resolved by Michael Kozono
Great work @mkozono , only a few minor comments :)
assigned to @mkozono
@mkozono I think you found something there: I/O on the Azure uploads server is significantly faster than the I/O on the Google server, most likely because the latter is using standard disk while the former is on SSD. I think we'd better fix that: gitlab-com/migration#731 (closed).
Nice find @stanhu!
added 210 commits
-
3e1d9522...ff060ad4 - 209 commits from branch
master
- 8a5b907f - Add local project uploads cleanup task
-
3e1d9522...ff060ad4 - 209 commits from branch
@mkozono I think the machines have been upgraded to SSDs. Try your test again?
@stanhu
I am just about to test this again.@stanhu The task completed on GSTG in 38 minutes instead of 2 hours.
@mkozono Interesting, still slower than Azure?
Yep, but staging is weird so it may not be a valid test for comparing Azure vs GC.
I have now tried the task on GPRD, and it acts as I originally expected-- it hangs for only ~30 seconds, then quickly begins returning batches. Batches are processed faster than the verify uploads task, and we only search through a subset of uploads, so performance is not actually a concern for this MR IMO.
I still don't understand the mechanism for why staging acts so much differently, but I don't think it's actually a cause for concern.
marked the checklist item Documentation created/updated as completed
marked the checklist item Conforms to the merge request performance guidelines as completed
marked the checklist item If you have multiple commits, please combine them into a few logically organized commits by squashing them as completed
marked the checklist item Conforms to the database guides as completed