[Rust Code Parser] Parse and chunk files into logical code elements

Context

For an overview of the Code Parsing and Chunking Strategy, please refer to #528770 (comment 2451039319).

In this issue:

- We need to introduce the code parsing function that chunks code files into logical code elements.
- The parser should chunk files into top-level code segments using tree-sitter.
- The parser will be introduced in a new Rust-based project called gitlab-code-parser.
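The sketch below shows what "top-level" chunking means. It assumes a simplified, hypothetical `SyntaxNode` stand-in instead of the real tree-sitter types, so that it runs without external crates. In the actual parser the root would come from `tree.root_node()`, and each node exposes `kind()`, `start_byte()` and `end_byte()`. Only direct children of the root become chunks, so a method nested inside a class stays part of the class chunk. The Ruby-style kind names and the `content_type_for` mapping are illustrative assumptions, not a decided set.

```rust
/// Hypothetical stand-in for a tree-sitter node (kind + byte range + children).
struct SyntaxNode {
    kind: &'static str,
    name: Option<&'static str>,
    start_byte: usize,
    end_byte: usize,
    children: Vec<SyntaxNode>,
}

/// One top-level chunk, mirroring part of the output contract.
#[derive(Debug, PartialEq)]
struct TopLevelChunk {
    content_type: &'static str,
    content_name: String,
    start_byte: usize,
    end_byte: usize,
}

/// Maps a grammar node kind to a chunk content type.
/// Kinds that are not listed (e.g. comments) are skipped. Assumed mapping.
fn content_type_for(kind: &str) -> Option<&'static str> {
    match kind {
        "class" => Some("class"),
        "module" => Some("module"),
        "method" | "singleton_method" => Some("method"),
        _ => None,
    }
}

/// Emits one chunk per recognised direct child of the root node.
/// Nested nodes are not visited: they belong to their parent's byte range.
fn chunk_top_level(root: &SyntaxNode) -> Vec<TopLevelChunk> {
    root.children
        .iter()
        .filter_map(|node| {
            let content_type = content_type_for(node.kind)?;
            Some(TopLevelChunk {
                content_type,
                content_name: node.name.unwrap_or("").to_string(),
                start_byte: node.start_byte,
                end_byte: node.end_byte,
            })
        })
        .collect()
}
```

Stopping at the root's direct children keeps chunks non-overlapping; whether nested methods should also get their own chunks is still an open design question for this issue.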

I/O Contract

The Go-based gitlab-elasticsearch-indexer will call the parser using a Code Parsing Chunker class.

As part of this issue, we need to finalize the I/O contract between the Code Parsing Chunker and the Rust Code Parser. The main things to consider are:

- We need to use the data structure that is most performant for FFI communication.
- The input (Code Parsing Chunker -> Rust Code Parser) should be an array of files, where each file has the following fields:
  - file_path
  - file_content
- The output (Rust Code Parser -> Code Parsing Chunker) should be an array of chunks, where each chunk has the following fields:
  - file_path
  - content_type (e.g. method, class, import)
  - content_name (e.g. ModuleName::ClassName::method_name)
  - language
  - start_byte
  - end_byte
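As a sketch, the fields above map onto Rust types like the ones below. The type names `FileInput` and `Chunk`, the helpers `qualified_name` and `is_valid_for`, and the use of `String`/`usize` are assumptions for illustration. The actual FFI representation (for example `#[repr(C)]` structs or a serialized buffer) is exactly what this issue still has to decide.

```rust
/// Input record: one file sent from the Code Parsing Chunker to the parser.
#[derive(Debug, Clone)]
pub struct FileInput {
    pub file_path: String,
    pub file_content: String,
}

/// Output record: one logical code element. The chunk text itself is not
/// included, only its byte range within `file_content`.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub file_path: String,
    pub content_type: String, // e.g. "method", "class", "import"
    pub content_name: String, // e.g. "ModuleName::ClassName::method_name"
    pub language: String,
    pub start_byte: usize,
    pub end_byte: usize,
}

/// Builds a `content_name` by joining enclosing scopes and the element name with `::`.
pub fn qualified_name(scopes: &[&str], name: &str) -> String {
    let mut parts: Vec<&str> = scopes.to_vec();
    parts.push(name);
    parts.join("::")
}

impl Chunk {
    /// Checks that the chunk belongs to `file` and that its byte range is
    /// ordered and lies within the file content.
    pub fn is_valid_for(&self, file: &FileInput) -> bool {
        self.file_path == file.file_path
            && self.start_byte <= self.end_byte
            && self.end_byte <= file.file_content.len()
    }
}
```

For example, `qualified_name(&["ModuleName", "ClassName"], "method_name")` produces the `content_name` format shown in the list above.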

Note that the Rust Code Parser doesn't need to return the content of the chunks. The Code Parsing Chunker class in the Go Elasticsearch Indexer can extract it from the file content using each chunk's start_byte and end_byte.
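Extracting a chunk is a byte-range slice of the original file content. Tree-sitter reports byte offsets (not character offsets) into the UTF-8 source, so both sides must agree on byte offsets. Go strings are byte-indexed, which fits this model. The Rust sketch below (the helper name `chunk_content` is hypothetical) shows the checks the consumer should apply. `str::get` returns `None` instead of panicking when the range is reversed, out of bounds, or splits a multi-byte character.

```rust
/// Returns the text of a chunk given its byte range in the file content.
/// `str::get` yields `None` if `start_byte > end_byte`, if the range exceeds
/// the content length, or if either offset is not on a UTF-8 char boundary.
fn chunk_content(file_content: &str, start_byte: usize, end_byte: usize) -> Option<&str> {
    file_content.get(start_byte..end_byte)
}
```

Returning `Option` rather than slicing directly means a stale or mismatched byte range surfaces as a recoverable error instead of a crash in the indexer.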

Prerequisites

The gitlab-code-parser project must be created first; see #534153 (comment 2440502912) and #536077 (closed).

References

Resources:

- Proposal: Create "One Parser" - A Unified Stati... (#534153 - closed)
- Code Parsing and Chunking Strategy proposal
- Rust tree-sitter documentation
- ast-grep: allows writing a single "polyglot" query that applies across languages
  - example: playground link
  - for more information, please check in with @michaelangeloio
- Experiments using Go + Rust + tree-sitter:
  - https://gitlab.com/michaelangeloio/gitlab-code-parser-go-ffi/
  - partiaga/gitlab-code-parser-go-ffi!1 (closed)

Reference Persons

@michaelangeloio
@partiaga

Proposal

TBA

Edited Jun 23, 2025 by 🤖 GitLab Bot 🤖