WIP:Improve performance of LFS integrity check
What does this MR do?
It tries to improve the performance of the LFS integrity check by using git diff
and git cat-file
instead of git rev-list
(which returns all the revisions when pushing a new branch). The mentioned git
commands are already being used for other checks which means that we don't need to invoke the git
command multiple times since the content is cached in Repository#raw_changes_between
Are there points in the code the reviewer needs to double check?
Do we need to delete other unused code? Do you think we can get the same results by using git diff
instead of git rev-list
?
Why was this MR needed?
We were observing some performance issues in Grafana.
I've made a test by replacing the current implementation for checking LFS integrity by reusing the Repository#raw_changes_between
method along with Gitlab::Git::Blob.batch
and have noticed an improvement of 40%~ when testing the local push of the gitlab-ce
project.
First we need to make sure that the project has LFS enabled:
project = Project.find_by_full_path('namespace/repo')
project.lfs_enabled? #=> true
Then we can use the following commands to test the time spent:
# On a local copy of gitlab-ce
git remote add local user@localhost:/namespace/repo.git
time git push local master
Original implementation:
NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:18:22.055
FINISH: 2018-05-21T13:18:23.753
ms=1697.9999542236328
NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:20:45.875
FINISH: 2018-05-21T13:20:46.962
ms=1086.9998931884766
NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:23:20.993
FINISH: 2018-05-21T13:23:22.222
ms=1229.0000915527344
Updated implementation:
NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:27:32.380
FINISH: 2018-05-21T13:27:32.940
ms=559.999942779541
NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:30:03.724
FINISH: 2018-05-21T13:30:04.205
ms=480.9999465942383
NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:31:56.600
FINISH: 2018-05-21T13:31:57.248
ms=648.0000019073486
This approach has the advantage that we can reuse the objects generated by Repository#raw_changes_between
in the other checks like repository size or max file size.
Screenshots (if relevant)
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary -
Documentation created/updated -
API support added -
Tests added for this feature/bug - Conform by the code review guidelines
-
Has been reviewed by a UX Designer -
Has been reviewed by a Frontend maintainer -
Has been reviewed by a Backend maintainer -
Has been reviewed by a Database specialist
-
-
Conform by the merge request performance guides -
Conform by the style guides -
If you have multiple commits, please combine them into a few logically organized commits by squashing them -
Internationalization required/considered -
End-to-end tests pass ( package-and-qa
manual pipeline job)
What are the relevant issue numbers?
Closes #45062 (closed)