We're getting an email with the error `The LFS objects download list couldn't be imported. Error: Request Entity Too Large` for a GitHub to GitLab repository mirror (GitHub to GitLab direction), even though the repository is only 430 MB in size and we have other projects exceeding 800 MB that mirror fine.
More importantly, the repository is actually syncing fine: all my commits came through and deployed correctly. The email that was sent out is most likely a false alarm.
Steps to reproduce
(How one can reproduce the issue - this is very important)
What is the current bug behavior?
Getting an email with the "Request Entity Too Large" error.
What is the expected correct behavior?
Not getting an email.
Relevant logs and/or screenshots
Output of checks
This bug happens on GitLab.com
Regarding pull mirroring, the import of LFS objects is executed as the last step so that, at the very least, all the other refs get updated. I guess the only thing affected by this error could be any integrations attached to the repository update.
I'm wondering if the request we send to GitHub is above the `client_max_body_size` setting they have configured. Even if the repository is only 430 MB, a request asking for, say, 1,000 objects of 1 byte each may trigger this error. It depends on whether `1000 * lfs_header_size > client_max_body_size`, where `lfs_header_size` is the size of the entry we build for each LFS file, according to the standard.
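To make that concrete, here is a minimal sketch of a Batch API download request body (the shape follows the Git LFS Batch API specification; the sizes are illustrative): the body grows with the *number* of objects, not with the repository size, since each entry carries a 64-character OID.

```python
import json

def batch_request_body(oids):
    """Build a Git LFS Batch API download request for the given object IDs.

    Per the Batch API spec, each object entry carries at least an "oid"
    (64-char hex digest) and a "size", so the request body grows linearly
    with the number of objects requested.
    """
    return json.dumps({
        "operation": "download",
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": 1} for oid in oids],
    })

# 1,000 one-byte objects still yield a body well over 80 KB,
# regardless of how small the repository itself is.
oids = ["%064x" % i for i in range(1000)]
body = batch_request_body(oids)
print(len(body))
```

So a repository with many tiny LFS files can exceed a server-side body limit even when a much larger repository with few LFS files does not.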
@samuel.lin may I ask you to execute a couple of commands in your console inside the repository directory?
```shell
# all references
git lfs ls-files --all | wc -l
# from HEAD
git lfs ls-files | wc -l
```
I want to figure out the number of references you are requesting when you pull your mirror. Also, which protocol is configured in your mirror URL: http, https, ssh, ...?
@nick.thomas AFAIK, only the LFS locks API has something similar to pagination.
But I don't think that would be a problem. When we perform the request to that endpoint, we tell it which LFS objects we want the info for, so we can set a maximum number of LFS object pointers per request and fetch them in batches.
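A minimal sketch of that batching idea (the `MAX_OBJECTS_PER_BATCH` value is hypothetical and would be tuned to stay under the server's body limit; the payload shape follows the Batch API):

```python
import json

# Hypothetical cap, chosen to keep each request under client_max_body_size.
MAX_OBJECTS_PER_BATCH = 100

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def batch_payloads(objects):
    """Build one Batch API download payload per chunk of object pointers.

    `objects` is a list of {"oid": ..., "size": ...} dicts; instead of
    requesting all of them in a single oversized body, we emit several
    smaller request bodies.
    """
    return [
        json.dumps({
            "operation": "download",
            "transfers": ["basic"],
            "objects": chunk,
        })
        for chunk in chunked(objects, MAX_OBJECTS_PER_BATCH)
    ]

# Example: 250 pointers become 3 requests (100 + 100 + 50 objects).
payloads = batch_payloads([{"oid": "%064x" % i, "size": 1} for i in range(250)])
```

Each payload stays bounded no matter how many LFS objects the repository holds.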
@stanhu I have doubts. With the feature flag we'd have to add some more logic, because the behavior changes a little depending on whether it is active or not. Besides, we wouldn't be able to discover possible edge cases like this one.
I guess I'd prefer to allow the mirror process to succeed even if the LFS sync fails. We can add some logging there and analyze it later.
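Roughly, that proposal reads like this sketch (the function names are made up for illustration; the actual mirror code is GitLab's Ruby, not this):

```python
import logging

logger = logging.getLogger("mirror")

def update_mirror(repo, import_lfs_objects):
    """Run a pull-mirror update where an LFS failure is logged
    but does not fail the overall mirror (hypothetical sketch)."""
    # ... refs are updated first, as described above ...
    try:
        import_lfs_objects(repo)
    except Exception:
        # Degraded result: refs are up to date, LFS objects may be missing.
        logger.exception("LFS object import failed for %s", repo)
    return "success"
```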
@fjsanpedro I think we need to make it more obvious to users that LFS mirroring is failing, even if we can't fix this problem. Not communicating it could cause more confusion, and possibly data loss if the user is relying on the mirror to have complete records, couldn't it?
Would this kind of degraded repository be useful for those who have LFS objects in their repository?
I understand this is a recently added feature, but I'm a bit uneasy about making it fail silently for end users. I feel we need to look into batch processing to solve this problem for good, as much as possible. What do you think?
> I think we need to make it more obvious to the users that LFS mirroring is failing even if we can't fix this problem
Totally agree; in some scenarios the import system fails to return feedback to the user.
> Would this kind of degraded repository be useful for those who have LFS objects in their repository?
Mmm, it depends, but I would say no. Besides, I personally think we should fail the import in order to avoid confusion.
> I understand this is a recently added feature, but I'm a bit uneasy about making it fail silently to the end users. I feel we need to look into batch processing this to solve this problem for good as much as possible. What do you think?
The error is basically because we ask the API to return the list of objects to download (from all branches). In big projects, this list can be huge.
Recently, there have been some updates to the LFS API, like allowing the `ref` param. We can iterate over the different branches and request the list of objects in each branch.
I think this will narrow that list and avoid the error. Nevertheless, it doesn't completely solve the problem, because we can still have projects with tons of LFS objects in the same branch, but those cases will be fewer.
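The per-branch idea could look roughly like this (the branch-to-objects lookup is hypothetical and would come from something like `git lfs ls-files`; the `ref` field follows the Batch API request format):

```python
import json

def per_branch_payloads(branches):
    """Build one Batch API download request per branch, scoped via `ref`.

    `branches` maps a branch name to the list of LFS pointer dicts
    reachable from that branch, so each request body only carries the
    objects of a single branch instead of the whole repository.
    """
    return {
        name: json.dumps({
            "operation": "download",
            "transfers": ["basic"],
            "ref": {"name": "refs/heads/%s" % name},
            "objects": objects,
        })
        for name, objects in branches.items()
    }

payloads = per_branch_payloads({
    "main": [{"oid": "a" * 64, "size": 1}],
    "develop": [{"oid": "b" * 64, "size": 2}],
})
```

This shrinks each request, but as noted above, a single branch with many LFS objects could still exceed the limit.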
I'll start looking into splitting up the request somehow. I wonder if we need to go branch by branch, given that a single branch with many LFS objects could also fail. I don't know much about the LFS API, but I'll see if we can paginate the requests anyway. Thanks!
The problem is that we are tied to the LFS API specification. If you don't see any way to paginate, maybe the quickest way would be to create an issue in the official LFS repo.
I think within this issue we need to investigate a possible solution (for example #28725 (comment 215770967)) and either implement it, or allow the mirror process to succeed in case of LFS failure (#28725 (comment 215770974)) and create a separate issue for implementing the solution.