Skip to content

WIP:Improve performance of LFS integrity check

What does this MR do?

It tries to improve the performance of the LFS integrity check by using git diff and git cat-file instead of git rev-list (which returns all the revisions when pushing a new branch). The mentioned git commands are already being used for other checks which means that we don't need to invoke the git command multiple times since the content is cached in Repository#raw_changes_between

Are there points in the code the reviewer needs to double check?

Do we need to delete other unused code? Do you think we can get the same results by using git diff instead of git rev-list?

Why was this MR needed?

We were observing some performance issues in Grafana.

I've made a test by replacing the current implementation for checking LFS integrity by reusing the Repository#raw_changes_between method along with Gitlab::Git::Blob.batch and have noticed an improvement of 40%~ when testing the local push of the gitlab-ce project.

First we need to make sure that the project has LFS enabled:

project = Project.find_by_full_path('namespace/repo')

project.lfs_enabled? #=> true 

Then we can use the following commands to test the time spent:

# On a local copy of gitlab-ce
git remote add local user@localhost:/namespace/repo.git

time git push local master
Original implementation:
NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:18:22.055
FINISH: 2018-05-21T13:18:23.753
ms=1697.9999542236328

NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:20:45.875
FINISH: 2018-05-21T13:20:46.962
ms=1086.9998931884766

NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:23:20.993
FINISH: 2018-05-21T13:23:22.222
ms=1229.0000915527344
Updated implementation:
NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:27:32.380
FINISH: 2018-05-21T13:27:32.940
ms=559.999942779541

NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:30:03.724
FINISH: 2018-05-21T13:30:04.205
ms=480.9999465942383

NEW_REV: 95dacabe4b57f0820a75be02b26fffb9c45270f0
START: 2018-05-21T13:31:56.600
FINISH: 2018-05-21T13:31:57.248
ms=648.0000019073486

This approach has the advantage that we can reuse the objects generated by Repository#raw_changes_between in the other checks like repository size or max file size.

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Closes #45062 (closed)

Edited by 🤖 GitLab Bot 🤖

Merge request reports