Add a Blame cache
Problem to solve
The file history for the tool I'm working on was migrated from CVS, and there are some 25-year-old files in there with several thousand commits on them (200k+ commits in the whole repo). For these 25-year-old files, git blame takes a very long time (up to 10 minutes) on a local machine, and GitLab times out trying to display the blame information.
Further details
Some of our developers have to chase down issues in these files, and they have to look at the surrounding history to understand how and why changes have been made. With CVS, it only takes a second to display the file history.
Having a blame cache would make such perusal with Git much easier. Gerrit and Phabricator have implemented such caches.
If the cache were plugged into the API properly, I could then write a CLI tool that fetches blame information from GitLab, allowing fast blame from the command line as well (see the sketch below).
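To make the CLI idea concrete, here is a minimal sketch of such a tool in Python, assuming the existing GitLab REST blame endpoint (GET /projects/:id/repository/files/:file_path/blame) and that the proposed cache would sit transparently behind it. The names and output format here are illustrative only, not part of any existing tool.

```python
#!/usr/bin/env python3
"""Fetch blame for one file via the GitLab REST API (illustrative sketch)."""
import argparse
import os
import sys
import urllib.parse

import requests  # third-party; pip install requests


def fetch_blame(base_url, project, path, ref, token):
    # Project and file path must be URL-encoded (slashes included).
    encoded_project = urllib.parse.quote(project, safe="")
    encoded_path = urllib.parse.quote(path, safe="")
    url = (f"{base_url}/api/v4/projects/{encoded_project}"
           f"/repository/files/{encoded_path}/blame")
    resp = requests.get(url,
                        params={"ref": ref},
                        headers={"PRIVATE-TOKEN": token},
                        timeout=60)
    resp.raise_for_status()
    # The endpoint returns a list of ranges: {"commit": {...}, "lines": [...]}.
    return resp.json()


def main():
    parser = argparse.ArgumentParser(description="git blame via the GitLab API")
    parser.add_argument("project", help="project ID or namespace/name")
    parser.add_argument("path", help="file path inside the repository")
    parser.add_argument("--ref", default="master")
    parser.add_argument("--gitlab", default="https://gitlab.com")
    args = parser.parse_args()

    token = os.environ.get("GITLAB_TOKEN", "")
    lineno = 1
    for chunk in fetch_blame(args.gitlab, args.project, args.path,
                             args.ref, token):
        sha = chunk["commit"]["id"][:8]
        author = chunk["commit"]["author_name"]
        for line in chunk["lines"]:
            print(f"{sha} ({author:20.20} {lineno:5d}) {line}")
            lineno += 1


if __name__ == "__main__":
    sys.exit(main())
```

With a server-side cache behind that endpoint, this tool would need no changes to benefit from it.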
Proposal
Implement a server-side blame cache. I'm not sure whether the cache should be enabled by default, per repository, or maybe even per file (e.g. any file with more than X commits gets cached).
This should be transparent to the end user, whether blame is accessed through the Web UI or the API.
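To illustrate the keying and the transparency property (not how GitLab/Gitaly would actually implement it), here is a small sketch. It assumes that blame output for a given (commit, path) pair is immutable, so entries never need invalidation, only eviction, and uses a hypothetical MIN_COMMITS_TO_CACHE threshold for the "more than X commits" idea above.

```python
"""Sketch of a server-side blame cache keyed by (repo, commit, path)."""
import subprocess
from functools import lru_cache

MIN_COMMITS_TO_CACHE = 500  # hypothetical per-file threshold


def commit_count(repo, sha, path):
    # Number of commits touching the file up to this revision.
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--count", sha, "--", path],
        capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def blame_uncached(repo, sha, path):
    # Porcelain output is stable and easy to parse into line -> commit ranges.
    out = subprocess.run(
        ["git", "-C", repo, "blame", "--porcelain", sha, "--", path],
        capture_output=True, text=True, check=True)
    return out.stdout


# Results are immutable for a fixed commit, so a plain LRU (or Redis, or a
# Gitaly-local store) is enough: no invalidation logic, only eviction.
@lru_cache(maxsize=1024)
def blame_cached(repo, sha, path):
    return blame_uncached(repo, sha, path)


def blame(repo, sha, path):
    """Transparent entry point: Web UI and API callers get the same output
    whether or not the result came from the cache."""
    if commit_count(repo, sha, path) > MIN_COMMITS_TO_CACHE:
        return blame_cached(repo, sha, path)
    return blame_uncached(repo, sha, path)
```

Whatever the real implementation looks like, the key point is that callers never see the cache, only faster responses.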
What does success look like, and how can we measure that?
Getting the blame information should take no more than 5 seconds, no matter how much history there is in the file (even the 30k+ LoC monster with 3k+ commits that I have here) or in the repo.
Links / references
Here are the implementations of git blame caches I could find:
- Phabricator: https://secure.phabricator.com/rPe8d3071452f0555053dc1d9e004e345c9bcf5654
- Gerrit: https://gerrit.googlesource.com/gitiles/+/v0.1-9/blame-cache/src/main/java/com/google/gitiles/blame
- Perl module: https://metacpan.org/pod/Git::Repository::Plugin::Blame::Cache
Thanks,
Clément