Blob.lazy batch loader requests blobs that have already been loaded
http://profiler.gitlap.com/20190412/19154b3c-80f5-4808-b98e-ce4bd1fa6241.html.gz is a profile for https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/21767/discussions.json.
It has this section, which is curious:
We call Gitlab::Git::Blob.binary? 3,100 times just from within this call stack. That adds up to several seconds of wall time in the request.
The reason for this appears to be that we batch these calls, but we don't deduplicate the batches. In the performance bar for that URL (from https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/21767), I see 41 matches for revision=>"9621bbb94cde3d33fce3c2c25fb98748f7a1824a", :path=>"app/views/projects/ci/builds/_build.html.haml" from different blob_service#get_blobs calls.
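To make the deduplication idea concrete, here is a minimal sketch of a batch loader that remembers which (revision, path) pairs it has already fetched and never puts them in a later batch. DedupingBlobLoader and its fetcher block are hypothetical names for illustration only; this is not how Blob.lazy or the batch loader is actually implemented.

```ruby
require 'set'

# Hypothetical batch loader (not GitLab's real one): it queues blob
# references, fetches them in batches, and skips anything already loaded.
class DedupingBlobLoader
  # The fetcher block is an assumption: it receives an array of
  # [revision, path] pairs and returns a hash keyed by those pairs.
  def initialize(&fetcher)
    @fetcher = fetcher
    @loaded = {}         # [revision, path] => blob, kept across batches
    @pending = Set.new   # unique references waiting for the next batch
  end

  # Queue a blob and return a lambda that resolves it on demand.
  def lazy(revision, path)
    key = [revision, path]
    @pending << key unless @loaded.key?(key)
    -> { resolve(key) }
  end

  private

  def resolve(key)
    flush unless @loaded.key?(key)
    @loaded[key]
  end

  # Send one request containing only references not seen before.
  def flush
    batch = @pending.to_a
    @pending.clear
    return if batch.empty?

    result = @fetcher.call(batch)
    # Record misses as nil too, so they are not requested again.
    batch.each { |key| @loaded[key] = result[key] }
  end
end
```

With something like this, asking for the same revision and path 41 times would put that blob in a single batch instead of 41 separate ones.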
I was trying to come up with a better description for this, but I don't get it. We're:
1. Making too many calls to get_blobs. (https://gitlab.com/gitlab-org/gitlab-ce/issues/58297)
2. Including far too many blobs in those requests. (this issue, see https://gitlab.com/gitlab-org/gitlab-ce/issues/60829#note_163484873)
3. Spending a lot of time in encoding detection as a result.
I think that is the order we should fix these in. If we can't fix that, we could also consider keying blobs in the request store so we don't have to do item 3, at least.
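As a rough illustration of the request store fallback, the sketch below caches the result of an expensive check per (revision, path) for the lifetime of one request. RequestBlobCache and CountingDetector are hypothetical stand-ins: the first plays the role of a request store, the second is a placeholder for the real encoding detection and simply looks for a NUL byte.

```ruby
# Hypothetical per-request cache, standing in for a real request store.
class RequestBlobCache
  def initialize
    @store = {}
  end

  # Returns the cached value for (revision, path), running the block
  # only the first time that key is seen. Cached false values count.
  def fetch(revision, path)
    key = [revision, path]
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

# Placeholder for the expensive encoding detection (an assumption, not
# the real check); it counts how often it is actually invoked.
class CountingDetector
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def binary?(data)
    @calls += 1
    data.include?("\0")
  end
end
```

Wrapping each check as cache.fetch(revision, path) { detector.binary?(data) } would mean the detection runs once per distinct blob in a request, even if the same blob still arrives in many get_blobs responses.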