Skip to content
GitLab
Next
    • GitLab: the DevOps platform
    • Explore GitLab
    • Install GitLab
    • How GitLab compares
    • Get started
    • GitLab docs
    • GitLab Learn
  • Pricing
  • Talk to an expert
  • /
  • Help
    • Help
    • Support
    • Community forum
    • Submit feedback
    • Contribute to GitLab
    Projects Groups Topics Snippets
  • Register
  • Sign in
  • GitLab GitLab
  • Project information
    • Project information
    • Activity
    • Labels
    • Members
  • Repository
    • Repository
    • Files
    • Commits
    • Branches
    • Tags
    • Contributor statistics
    • Graph
    • Compare revisions
    • Locked files
  • Issues 50,435
    • Issues 50,435
    • List
    • Boards
    • Service Desk
    • Milestones
    • Iterations
    • Requirements
  • Merge requests 1,543
    • Merge requests 1,543
  • CI/CD
    • CI/CD
    • Pipelines
    • Jobs
    • Artifacts
    • Schedules
    • Test cases
  • Deployments
    • Deployments
    • Environments
    • Releases
  • Packages and registries
    • Packages and registries
    • Package Registry
    • Container Registry
    • Infrastructure Registry
  • Monitor
    • Monitor
    • Metrics
    • Incidents
  • Analytics
    • Analytics
    • Value stream
    • CI/CD
    • Code review
    • Insights
    • Issue
    • Repository
  • Snippets
    • Snippets
  • Activity
  • Graph
  • Create a new issue
  • Jobs
  • Commits
  • Issue Boards
Collapse sidebar
  • GitLab.orgGitLab.org
  • GitLabGitLab
  • Issues
  • #207808
Closed
Open
Issue created Feb 24, 2020 by Michael Kozono@mkozono2️⃣Maintainer

Geo: Bug: FileDownloadDispatchWorker may sometimes excessively resync files, and block new files from being synced until the next hour

Summary

In certain circumstances, FileDownloadDispatchWorker goes into an almost infinite loop of syncing and resyncing files that are already synced.

Steps to reproduce

  • Set up 2 fresh instances with ansible-gitlab-geo on GitLab Version 12.7.6-ee
  • Create a project
  • Clone it
  • git lfs install
  • git lfs track "*.txt"
  • Create 10 new .txt files (because File sync capacity defaults to 10)
    create_n_txt_files() {
      for n in {1..$1}; do
        echo "Hello $n" > "File$n.txt"
      done
    }
    create_n_txt_files 10
  • git add . && git commit -m "Add 10 LFS files" && git push

This is reproducible locally on master (which is working on 12.9) as well. FileDownloadDispatchWorker uses a different query if the feature geo_lfs_registry_ssot_sync is enabled or disabled, but it does the infinite loop behavior either way.

What is the current bug behavior?

  • SSH into the secondary and tail -f /var/log/gitlab/gitlab-rails/geo.log | grep File
  • Observe an unending loop of FileDownloadDispatchWorker enqueuing FileDownloadService for IDs that are already synced.
    • Until FileDownloadDispatchWorker's 1 hour timeout is reached

What is the expected correct behavior?

  • A single batch of 10 FileDownloadService log lines with message File download

Relevant logs and/or screenshots

{"severity":"INFO","time":"2020-02-24T08:46:47.272Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadDispatchWorker","message":"Loop 17","job_id":"ffbc8e9ed85831d8c7f8f0ad","enqueued":10,"pending":0,"scheduled":10,"capacity":10}
{"severity":"INFO","time":"2020-02-24T08:46:47.291Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":105,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.087}
{"severity":"INFO","time":"2020-02-24T08:46:47.294Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":106,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.088}
{"severity":"INFO","time":"2020-02-24T08:46:47.310Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":111,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.106}
{"severity":"INFO","time":"2020-02-24T08:46:47.331Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":114,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.096}
{"severity":"INFO","time":"2020-02-24T08:46:47.338Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":108,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.109}
{"severity":"INFO","time":"2020-02-24T08:46:47.345Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":110,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.121}
{"severity":"INFO","time":"2020-02-24T08:46:47.346Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":109,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.091}
{"severity":"INFO","time":"2020-02-24T08:46:47.348Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":112,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.084}
{"severity":"INFO","time":"2020-02-24T08:46:47.377Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":107,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.108}
{"severity":"INFO","time":"2020-02-24T08:46:47.392Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":113,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.094}
{"severity":"INFO","time":"2020-02-24T08:46:48.361Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadDispatchWorker","message":"Loop 18","job_id":"ffbc8e9ed85831d8c7f8f0ad","enqueued":10,"pending":0,"scheduled":10,"capacity":10}
{"severity":"INFO","time":"2020-02-24T08:46:48.406Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":106,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.087}
{"severity":"INFO","time":"2020-02-24T08:46:48.421Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":105,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.106}
{"severity":"INFO","time":"2020-02-24T08:46:48.423Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":110,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.099}
{"severity":"INFO","time":"2020-02-24T08:46:48.426Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":108,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.103}
{"severity":"INFO","time":"2020-02-24T08:46:48.431Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":114,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.107}
{"severity":"INFO","time":"2020-02-24T08:46:48.438Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":107,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.061}
{"severity":"INFO","time":"2020-02-24T08:46:48.470Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":112,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.093}
{"severity":"INFO","time":"2020-02-24T08:46:48.474Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":113,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.098}
{"severity":"INFO","time":"2020-02-24T08:46:48.483Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":111,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.162}
{"severity":"INFO","time":"2020-02-24T08:46:48.485Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":109,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.113}
{"severity":"INFO","time":"2020-02-24T08:46:49.470Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadDispatchWorker","message":"Loop 19","job_id":"ffbc8e9ed85831d8c7f8f0ad","enqueued":10,"pending":0,"scheduled":10,"capacity":10}
{"severity":"INFO","time":"2020-02-24T08:46:49.475Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":106,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.067}
{"severity":"INFO","time":"2020-02-24T08:46:49.484Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":111,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.069}
{"severity":"INFO","time":"2020-02-24T08:46:49.518Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":105,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.1}
{"severity":"INFO","time":"2020-02-24T08:46:49.519Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":108,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.084}
{"severity":"INFO","time":"2020-02-24T08:46:49.521Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":110,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.081}
{"severity":"INFO","time":"2020-02-24T08:46:49.522Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":114,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.085}
{"severity":"INFO","time":"2020-02-24T08:46:49.543Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":107,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.069}
{"severity":"INFO","time":"2020-02-24T08:46:49.557Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":112,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.089}
{"severity":"INFO","time":"2020-02-24T08:46:49.562Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":113,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.075}
{"severity":"INFO","time":"2020-02-24T08:46:49.570Z","correlation_id":"2ad51c871d2bcd529cb6e3a03fb80672","class":"Geo::FileDownloadService","object_type":"lfs","object_db_id":109,"message":"File download","mark_as_synced":true,"download_success":true,"bytes_downloaded":10,"failed_before_transfer":false,"primary_missing_file":false,"download_time_s":0.098}
  • I have found this get to Loop 3279 on one instance before it hit the 1 hour timeout.
  • If you push below the batch size, it does not trigger the loop.
  • If you push new files while it's in this loop, it does not sync those.
  • If you push 100 new LFS files while FileDownloadDispatchWorker is between runs because there wasn't much LFS activity, when it starts up again, it actually cycles through all 100 of the new files before starting from the beginning of the 100 files again.
    • Extrapolating from this, I expect LFS syncing becomes semi-up-to-date every hour, and if you have very low LFS activity for a couple hours, overnight for example, then the secondaries have a chance to break out of this behavior until somebody pushes 10 or more files between runs. This is probably why this bug has not been reported. It is typically not permanent, and while it's happening it gets near synced every hour. For this same reason, I think we can bump this down to ~S3. But I'd do ~P2.
  • If you restart sidekiq, it finishes that group of files that it was stuck on, but if there are 10 or more files already pending to be synced after that, then it will get stuck looping on those.

Results of GitLab environment info

A fresh Geo setup on GitLab Version 12.7.6-ee. This happens on newer versions too.

Possible fixes

Unconfirmed suspicion:

I think the LfsObjectRegistryFinder#find_unsynced is returning the same results during each FileDownloadDispatchWorker#load_pending_resources. The condition that is supposed to ensure these results are excluded is missing_registry at https://gitlab.com/gitlab-org/gitlab/-/blob/v12.7.6-ee/ee/app/finders/geo/lfs_object_registry_finder.rb#L44. I think the scopes are correct. I think the database data is correct. So my suspicion is the problem is in the area of the ORM layer, or how we are using FDW, or how we are handling database connections, etc.

I have a feeling the root cause is the same as what blocked me in !20002 (comment 266285571).

Edited Feb 25, 2020 by Michael Kozono
Assignee
Assign to
Time tracking