Blob search should match on the blob id

Release notes

Code and Wiki search will now match documents on the underlying Git object ID.

Problem to solve

Currently, whenever a user searches for a blob (repository or wiki search), we only match the search terms against the document's file_name and content [1].

As Git is a content-addressable storage system, I think we should leverage that by allowing users to search for a Blob's object ID (oid), which uniquely identifies the file's full content as a single SHA (e.g. ffded2bb9b398af20fbc2f3e11c74b546f4c9764). This feature would be helpful for debugging, whenever we want to ensure a document is present in the index.

The best implementation path here would be a search filter, such as blob:<object-id> or oid:<object-id>.
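To make the filter idea concrete, here is a minimal sketch of pulling such a filter out of a query string. It is not GitLab's actual search code: the function name extract_oid_filter, the regular expression, and the choice to accept only full 40-character SHA-1 IDs are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical filter syntax taken from the proposal: "blob:<oid>" or "oid:<oid>".
# Assumption: only full 40-character hexadecimal SHA-1 object IDs are accepted.
OID_FILTER = re.compile(r"(?:^|\s)(?:blob|oid):([0-9a-fA-F]{40})(?=\s|$)")


def extract_oid_filter(query: str) -> Tuple[Optional[str], str]:
    """Split a search query into (object ID or None, remaining search terms)."""
    match = OID_FILTER.search(query)
    if match is None:
        return None, query.strip()
    oid = match.group(1).lower()
    # Remove the filter token and normalize whitespace in what is left.
    remaining = query[:match.start()] + " " + query[match.end():]
    return oid, " ".join(remaining.split())
```

With this sketch, a query such as "blob:ffded2bb9b398af20fbc2f3e11c74b546f4c9764 proxy" would yield the object ID plus the remaining term "proxy", so the ID could be matched exactly while the other terms go through the usual full-text matching.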

Intended users

I think this feature is mainly useful for debugging purposes, so Developers and Admins.

User experience goal

The goal is to have a simple way to ensure a document is present in the index.

Supposing the user has access to the document, running git hash-object <file-path> should yield the Blob's object ID. Searching for this same object ID in GitLab should return that exact document, if it exists.
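For reference, the object ID that git hash-object prints for a blob can be reproduced without Git. The sketch below assumes a SHA-1 repository and raw file bytes with no content filters applied (for example, no line-ending conversion); the function name git_blob_oid is made up for this example.

```python
import hashlib


def git_blob_oid(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a header "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known object ID.
    print(git_blob_oid(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the content, two files with identical bytes share one object ID regardless of their path or project.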

Proposal

Include a way for users to search for a specific Blob object ID.

Further details

Permissions and Security

There might be a case for a user to search for specific file content across GitLab to scan for vulnerabilities in public projects. Keep in mind that the Blob's object ID changes every time the content changes, and there are most likely other ways to run such scans already.

Documentation

Availability & Testing

What does success look like, and how can we measure that?

What is the type of buyer?

Is this a cross-stage feature?

Links / references

  1. https://gitlab.com/gitlab-org/gitlab/-/blob/489b294a23a5295d041c7dfc2188d4482ceb47ac/ee/lib/elastic/latest/git_class_proxy.rb#L133

Edited by Micaël Bergeron