Mirrors won't download new LFS files
Summary
This is a rather specific bug introduced by !41770 (merged) and #214211 (closed).
The following error is not handled, so after the initial import of a pull mirror repository no new LFS objects are downloaded:
{"severity":"ERROR","time":"2021-05-19T07:53:08.653Z","correlation_id":"01F61STA29DWG7RS7XBXCE2RX4","user_id":1,"project_id":33,"import_url":"https://:*****@gitlab.com/hchouraria/lfs-test3.git","error_message":"Validation failed: Lfs object already exists in repository"}
Steps to reproduce
- Create a git repository anywhere outside your GitLab instance
- For example, on GitHub, or on a different GitLab instance (such as GitLab.com if using Self-Managed, or vice-versa)
- Push some initial LFS files to the repository, with a file size above 1.0 MiB
git lfs track "*.big"
git add .gitattributes
git commit -m "add lfs tracking entry"
# Creates a 1.1 MiB file
dd if=/dev/random of=file1.big bs=1024 count=1100
git add file1.big
git commit -m "add file1 to lfs"
git push origin main
- Set up mirroring of this repository in your GitLab.com or self-managed GitLab instance.
- To do this, choose New project, then "Run CI/CD for external repository".
- You can use either GitHub integration, or Repo by URL, the source does not matter.
- Wait for the initial import to complete. Try to download the LFS file file1.big from the repo UI. The download works and contains the file data.
- Now go back to the external source repo. Add another LFS file and push. File size does not matter now.
# Creates a 10 MiB file
dd if=/dev/random of=file2.big bs=1024 count=10000
git add file2.big
git commit -m "add file2 to lfs"
git push origin main
- Wait for the mirror repo to finish fetching the new changes.
- Observe that the following error appears in update_mirror_service_json.log:
{"severity":"ERROR","time":"2021-05-19T07:53:08.653Z","correlation_id":"01F61STA29DWG7RS7XBXCE2RX4","user_id":1,"project_id":33,"import_url":"https://:*****@gitlab.com/group/project.git","error_message":"Validation failed: Lfs object already exists in repository"}
- Observe that the repo UI now shows the file file2.big, but if you attempt to download it, it fails with a 404.
- Repeat the steps above to add as many new regular or LFS files as you like, and observe that the data for new LFS files is no longer downloaded automatically.
- Only the first LFS file, present during the initial import, is downloadable.
Example Project
File file12.big on https://gitlab.com/gitlab-gold/hchouraria/lfs-downloads-fail
404: https://gitlab.com/gitlab-gold/hchouraria/lfs-downloads-fail/-/raw/main/file12.big
What is the current bug behavior?
- No new LFS files added to the source repo are downloaded by the mirror automatically.
- New LFS files in the repo do not download in CI/CD, and return 404 when downloaded from the UI.
What is the expected correct behavior?
- All new LFS files added to the source repo must be downloaded by the mirror automatically.
- New LFS files in the repo must be available to CI/CD and downloadable from the UI.
Relevant logs and/or screenshots
{"severity":"ERROR","time":"2021-05-19T07:53:08.653Z","correlation_id":"01F61STA29DWG7RS7XBXCE2RX4","user_id":1,"project_id":33,"import_url":"https://:*****@gitlab.com/group/project.git","error_message":"Validation failed: Lfs object already exists in repository"}
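To check whether a mirror is hitting this error, the JSON log can be filtered for the message above. The following is a minimal sketch, not part of GitLab: the helper names `parse_entry` and `lfs_link_failures` are made up for illustration, and the log file path is passed as an argument because its location depends on the installation.

```ruby
require "json"

# Error text copied from the log entry in this report.
LFS_LINK_ERROR = "Validation failed: Lfs object already exists in repository"

# Hypothetical helper: parse one log line, returning nil if it is not valid JSON.
def parse_entry(line)
  JSON.parse(line)
rescue JSON::ParserError
  nil
end

# Hypothetical helper: return the parsed entries that carry the LFS link error.
def lfs_link_failures(lines)
  lines.filter_map do |line|
    entry = parse_entry(line)
    entry if entry.is_a?(Hash) && entry["error_message"] == LFS_LINK_ERROR
  end
end

# When run directly with a log path, print one summary line per matching entry.
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  lfs_link_failures(File.foreach(ARGV[0])).each do |e|
    puts "#{e['time']} project_id=#{e['project_id']} correlation_id=#{e['correlation_id']}"
  end
end
```

Run it with the path to update_mirror_service_json.log as the only argument. Each printed project_id identifies a mirror whose LFS download run was aborted.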
Output of checks
This bug happens on GitLab.com
Possible fixes
The issue is in file app/services/projects/lfs_pointers/lfs_download_service.rb on line:
return link_existing_lfs_object! if lfs_size > LARGE_FILE_SIZE && lfs_object
- When the LFS object list is retrieved, all the existing OIDs are part of the list.
- We try to perform a link_existing_lfs_object! for one such pre-existing file.
- But this fails with the object validation error, because a record for an LfsObjectsProject model already exists.
- The exception thrown breaks the entire download loop and aborts.
- No file beyond the first (repeated) OID is processed
- No new LFS files end up being truly downloaded
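The sequence above can be modelled in a few lines of plain Ruby. This is a standalone sketch, not GitLab's code: `FakeProject`, `AlreadyLinked`, `process`, `sync_current` and `sync_guarded` are hypothetical names, the 1 MiB value of `LARGE_FILE_SIZE` is assumed from the reproduction steps, and the guard shown is only one possible way to avoid the abort, not necessarily the fix that should be applied.

```ruby
require "set"

# Assumed threshold (1 MiB), based on the reproduction steps in this report.
LARGE_FILE_SIZE = 1024 * 1024

# Stand-in for the validation error raised when the link already exists.
class AlreadyLinked < StandardError; end

# Hypothetical model of a mirrored project: which OIDs are stored on the
# instance, which are linked to the project, and which were downloaded.
class FakeProject
  attr_reader :stored, :linked, :downloaded

  def initialize(stored:, linked:)
    @stored = Set.new(stored)
    @linked = Set.new(linked)
    @downloaded = []
  end

  # Mimics the uniqueness validation: linking an already linked OID raises.
  def link_existing_lfs_object!(oid)
    raise AlreadyLinked, "Lfs object already exists in repository" if @linked.include?(oid)

    @linked << oid
  end

  def download!(oid)
    @downloaded << oid
    @stored << oid
    @linked << oid
  end
end

# Mirrors the quoted line: large objects that already exist are linked, not downloaded.
def process(project, oid, size)
  return project.link_existing_lfs_object!(oid) if size > LARGE_FILE_SIZE && project.stored.include?(oid)

  project.download!(oid)
end

# Behaviour described in the report: the first exception ends the whole run.
# Returns the error message, or nil if every object was processed.
def sync_current(project, objects)
  objects.each { |oid, size| process(project, oid, size) }
  nil
rescue AlreadyLinked => e
  e.message
end

# One possible guard (an assumption): skip OIDs already linked to the project.
def sync_guarded(project, objects)
  objects.each do |oid, size|
    next if project.linked.include?(oid)

    process(project, oid, size)
  end
  nil
end
```

With file1.big already imported and file2.big newly pushed, `sync_current` stops at the first OID and downloads nothing, while `sync_guarded` skips the linked OID and goes on to download the new one.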
The only workarounds I can think of are rather impractical:
- Set up the mirror when there are no pre-existing LFS files in the source repo (or)
- Ensure no pre-existing LFS file is above 1 MiB ever (even in history of the repo)
This bug effectively breaks the use of LFS with mirroring (the use case being GitLab as the CI/CD endpoint for external repos), so it is pretty severe for anyone already using LFS on their source systems.