
Add a Blame cache

Problem to solve

The file history for the tool I'm working on was migrated from CVS, and it contains some 25-year-old files with several thousand commits each (200k+ commits in the whole repo). For these files, git blame takes a very long time (up to 10 minutes) on a local machine, and GitLab times out when trying to display the blame information.

Further details

Some of our developers have to chase down issues in these files, and they have to look at the surrounding history to understand how and why changes have been made. With CVS, it only takes a second to display the file history.

Having a blame cache would make such perusal with Git much easier. Gerrit and Phabricator have implemented such caches.

If the cache is also exposed through the API, I could then write a CLI tool that fetches blame information from GitLab, which would make blame fast from the CLI as well.
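To make the CLI idea concrete, here is a minimal sketch of such a client. It assumes the file blame endpoint of the GitLab REST API (`GET /projects/:id/repository/files/:file_path/blame`, authenticated with a `PRIVATE-TOKEN` header) behaves as I understand it; the host name, project path, and the function names `build_blame_url` and `fetch_blame` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_blame_url(base_url: str, project: str, file_path: str, ref: str) -> str:
    """Build the blame URL. The project and file path are URL-encoded,
    so 'group/tool' becomes 'group%2Ftool'."""
    project_part = urllib.parse.quote(project, safe="")
    file_part = urllib.parse.quote(file_path, safe="")
    query = urllib.parse.urlencode({"ref": ref})
    return (
        f"{base_url.rstrip('/')}/api/v4/projects/{project_part}"
        f"/repository/files/{file_part}/blame?{query}"
    )


def fetch_blame(base_url: str, project: str, file_path: str, ref: str, token: str):
    """Fetch blame data as parsed JSON (assumed endpoint, see above)."""
    request = urllib.request.Request(
        build_blame_url(base_url, project, file_path, ref),
        headers={"PRIVATE-TOKEN": token},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call would look like `fetch_blame("https://gitlab.example.com", "group/tool", "src/legacy.c", "main", token)`. With a server-side cache behind the endpoint, the client itself would not need to change.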

Proposal

Implement a server-side blame cache. I'm not sure whether the cache should be enabled by default, per repo, or even per file (e.g. any file with more than X commits is cached).
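As a rough illustration of what I have in mind, here is a minimal in-memory sketch, not a proposal for the actual implementation. It assumes entries are keyed by a full commit SHA plus file path, since the blame of a given file at a given commit should not change, which means entries would not need invalidation. The names `BlameCache`, `run_git_blame`, and `should_cache` are hypothetical.

```python
import subprocess
from typing import Callable, Dict, Tuple


def run_git_blame(repo_dir: str, commit: str, path: str) -> str:
    """Run `git blame --porcelain` for one file at one commit (the slow part)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "blame", "--porcelain", commit, "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def should_cache(commit_count: int, threshold: int) -> bool:
    """Per-file policy: cache only files with more than `threshold` commits."""
    return commit_count > threshold


class BlameCache:
    """Cache blame output keyed by (repo, commit SHA, path).

    Assumption: `commit` is a full SHA, not a branch name. A branch name
    would have to be resolved to a SHA first, or entries could go stale.
    """

    def __init__(self, compute: Callable[[str, str, str], str] = run_git_blame):
        self._compute = compute
        self._entries: Dict[Tuple[str, str, str], str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, repo_dir: str, commit: str, path: str) -> str:
        key = (repo_dir, commit, path)
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        blame = self._compute(repo_dir, commit, path)
        self._entries[key] = blame
        return blame
```

A real server-side cache would presumably need persistent storage and an eviction policy. It could also reuse the blame of a parent commit to compute a new one incrementally, which this sketch does not attempt.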

This should be transparent to the end user, whether it's accessed through the Web UI or the API.

What does success look like, and how can we measure that?

It should not take more than 5 seconds to get the blame information, no matter how much history there is in the file (even the 30k+ LoC monster with 3k+ commits that I have here) or the repo.
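One simple way to check this target would be to time a blame request against a budget. The helper below is only a sketch; `time_call` and its `budget_seconds` parameter are hypothetical names, and the callable passed in would be whatever performs the blame (a local `git blame`, or an API request).

```python
import time


def time_call(func, *args, budget_seconds: float = 5.0):
    """Run func(*args) and report its result, the elapsed wall-clock time
    in seconds, and whether it stayed within the budget."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= budget_seconds
```

Running this once with a cold cache and once with a warm cache, on the largest files in the repo, would show whether the 5-second target holds in both cases.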

Links / references

Here are the implementations of Git Blame caches I could find:

  1. Phabricator: https://secure.phabricator.com/rPe8d3071452f0555053dc1d9e004e345c9bcf5654
  2. Gerrit: https://gerrit.googlesource.com/gitiles/+/v0.1-9/blame-cache/src/main/java/com/google/gitiles/blame
  3. Perl module: https://metacpan.org/pod/Git::Repository::Plugin::Blame::Cache

Thanks,

Clément

Edited by Clément Moyroud