Provide a flat list of files that have changed and the type of change that was made (delete, modify, add, renamed)
Desired Outcome
Provide a flat list of files that have changed, not necessarily commits. The key pieces of information that are needed are the full path of all the changed files as well as the type of change for each file (delete, modify, add, renamed).
Find a way to get the list of files that are changed in a commit from GitLab so that I can update our internal Codesearch application appropriately.
Use Case Problem to solve
I have a customer, coming from SVN to Git, who maintains an internal code search tool (recently migrated from SVN to Git). They are looking to call the GitLab API to request a full list of files that were added/deleted/modified during a commit; however, it looks like the API has a hardcoded limit of 1000.
I'm a developer internally focused on writing more tooling for our actual product developers to use. We are coming from SVN. We're looking to use GitLab as our backend infrastructure for Git and I'm prepping for that.
I'm running into some blocks regarding large diffs while looking into how to integrate the updater process of our internal Codesearch application with Git.
I’ve played around with LibGit2Sharp but unfortunately, the library requires that the repository be on disk to get the information needed to update our CodeSearch application.
Generally, for things that our Codesearch application has wanted to do in git without a local repository, we’ve been relying on the GitLab web APIs.
GitLab has some REST endpoints that at first glance looked like they would work. The closest API I can find is the commit/diff API, which is far more verbose than what I need: it returns not only the list of files that were changed in the commit but also the diff of each file.
I can use the GitLab web APIs for everything except getting the full list of files that were added/deleted/modified during a commit.
This web API has a hardcoded limit of 1000 file changes in a single commit (probably because it returns a diff in addition to the file names) and doesn’t support any sort of pagination (as far as I can tell). Reading the GitLab documentation (Commits API), it seems the limit is not configurable, so it doesn’t look like this is a limitation we can work around with configuration changes.
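For illustration, here is a minimal Python sketch of how the commit/diff response can be reduced to the flat list described above. It relies on the `old_path`, `new_path`, `new_file`, `renamed_file` and `deleted_file` fields that each entry of the response carries. The function names, and the `base_url`, `project_id` and `token` parameters, are placeholders of mine, not anything GitLab defines, and the result is still subject to the same server-side limit on the number of files.

```python
import json
import urllib.parse
import urllib.request


def change_type(entry):
    """Map one entry of the commit diff response to a change type."""
    if entry.get("new_file"):
        return "add"
    if entry.get("deleted_file"):
        return "delete"
    if entry.get("renamed_file"):
        return "rename"
    return "modify"


def flat_changes(diff_entries):
    """Reduce diff entries to (change type, old path, new path), dropping the diff text."""
    return [(change_type(e), e["old_path"], e["new_path"]) for e in diff_entries]


def fetch_commit_diff(base_url, project_id, sha, token):
    """Fetch the diff entries of one commit (hypothetical helper; truncated by the server limit)."""
    project = urllib.parse.quote(str(project_id), safe="")
    url = f"{base_url}/api/v4/projects/{project}/repository/commits/{sha}/diff"
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage would look like `flat_changes(fetch_commit_diff("https://gitlab.example.com", 42, sha, token))`, where the host and project ID are made up. This only trims the response on the client side; the diffs are still generated and transferred, which is exactly the overhead this request wants to avoid.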
I'd like to get this information without checking out all the code that Codesearch indexes to disk. It looks like GitLab uses a Postgres database under the hood, and I thought that could be exposed for read-only access but, after careful consideration, I'd like to avoid querying the backend Postgres database at all costs. That comes with a number of downsides because our CodeSearch application manages the database and makes schema changes on virtually every update, which occurs monthly.
An on-disk checkout is my last option, and I’d prefer to avoid it if at all possible, as it’s much more work to maintain a checked-out copy of all the code that our application indexes.
Maybe I'm trying to perform a specific operation in Git the way I did in SVN, and that's why I am unsuccessful? Do you know of another way to obtain the information I am looking for?
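For comparison, this is roughly what the on-disk fallback would involve: a Python sketch that shells out to `git diff-tree --name-status` against a local clone and parses the output into the same flat list. The `repo_dir` parameter and both function names are hypothetical. The sketch assumes tab-separated output with unquoted paths, and it does not handle merge commits, for which `git diff-tree` prints nothing by default.

```python
import subprocess

# Status letters printed by --name-status, mapped to the change types we need.
KINDS = {"A": "add", "D": "delete", "M": "modify", "R": "rename"}


def parse_name_status(output):
    """Parse `git diff-tree --name-status` output into (type, old path, new path)."""
    changes = []
    for line in output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status = fields[0]  # e.g. "M", "A", "D", or "R100" for a rename
        kind = KINDS.get(status[0], status)  # unknown statuses are passed through
        old_path = fields[1]
        new_path = fields[2] if len(fields) > 2 else fields[1]
        changes.append((kind, old_path, new_path))
    return changes


def changed_files(repo_dir, sha):
    """List the files changed by one commit in a local clone (hypothetical helper)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff-tree", "--no-commit-id",
         "--name-status", "-r", "-M", "--root", sha],
        check=True, capture_output=True, text=True,
    )
    return parse_name_status(result.stdout)
```

This has no limit on the number of files, but it requires keeping a clone of every indexed repository up to date, which is the maintenance burden described above.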
Question
Is there any way to change this setting? https://docs.gitlab.com/ee/api/commits.html#get-the-diff-of-a-commit
As far as I know, we don’t have an API endpoint that returns the filenames of the commit changes. We have https://docs.gitlab.com/ee/api/commits.html#get-a-single-commit that returns stats, but it's just numbers.
Proposed change
It seems to be hardcoded, but we’re working on increasing the limits: #219565 (closed). I believe that if we enable the increased_diff_limits feature flag (https://gitlab.com/gitlab-org/gitlab/-/issues/241185), we can increase the limit to 3000.
The only problem I see with that is that any arbitrary limit (even 3000) would still be a problem: they will hit that limit at times, and their code search tool would still need to handle those cases.
I’m guessing one of the reasons the limits were put in place is the amount of data included in the response. It includes a diff of every modified file, so if 1000 files were added in a commit, the full content of every file would be included in the response.
Ideally, they would like to have an endpoint that shows all the added/deleted/modified files during a commit without the file diffs and without any restrictions on the number of files modified.