[Go Indexer] Introduce the Code Parsing Chunker

Context

(For an overview of the Code Parsing and Chunking Strategy, please refer to #528770 (comment 2451039319).)

Currently, the gitlab-elasticsearch-indexer uses a SizeChunker that splits code content by size. In this issue, we will replace it with the Code Chunker from the gitlab-code-parser, which exposes a Go package that can be used in the gitlab-elasticsearch-indexer.

Current Process Flow

1. gitlab-elasticsearch-indexer has a chunk mode indexer implemented in internal/mode/chunk/chunk.go.
2. In internal/mode/chunk/chunk.go, we create the SizeChunker and pass it to the ChunkIndexer.
3. We start indexing by calling ChunkIndexer.PerformIndexing.
4. ChunkIndexer.PerformIndexing fetches the files from Gitaly and passes them to the BulkDelete and BulkIndex functions.
5. BulkIndex uses the current chunker (a SizeChunker) to chunk the files.
6. BulkIndex then passes the chunks to the vectorStoreIndexer, which indexes them in the current vector store (Elasticsearch).
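
The flow above can be sketched roughly as follows. This is a minimal illustration, not the indexer's actual code: the Chunk fields, the Chunker and VectorStoreIndexer interfaces, the File type, the in-memory store, and every method signature are assumptions made for the example. Only the names SizeChunker, ChunkIndexer, and BulkIndex come from this issue.

```go
package main

import "fmt"

// Chunk is a hypothetical stand-in for the Chunk type in the chunker package.
type Chunk struct {
	Path      string
	Content   string
	StartByte int
}

// Chunker is an assumed interface; the real one may look different.
type Chunker interface {
	ChunkFile(path, content string) ([]Chunk, error)
}

// SizeChunker splits content into pieces of at most MaxBytes bytes.
// Cutting on raw byte offsets can split a multi-byte UTF-8 character;
// this sketch ignores that.
type SizeChunker struct{ MaxBytes int }

func (s SizeChunker) ChunkFile(path, content string) ([]Chunk, error) {
	if s.MaxBytes <= 0 {
		return nil, fmt.Errorf("size chunker: MaxBytes must be positive, got %d", s.MaxBytes)
	}
	var chunks []Chunk
	for start := 0; start < len(content); start += s.MaxBytes {
		end := start + s.MaxBytes
		if end > len(content) {
			end = len(content)
		}
		chunks = append(chunks, Chunk{Path: path, Content: content[start:end], StartByte: start})
	}
	return chunks, nil
}

// VectorStoreIndexer is an assumed interface for the vector store client.
type VectorStoreIndexer interface {
	Index(chunks []Chunk) error
}

// memoryStore is a hypothetical in-memory store used only for this example.
type memoryStore struct{ indexed []Chunk }

func (m *memoryStore) Index(chunks []Chunk) error {
	m.indexed = append(m.indexed, chunks...)
	return nil
}

// File is a hypothetical representation of a file fetched from Gitaly.
type File struct{ Path, Content string }

// ChunkIndexer holds whichever Chunker it was given (step 2 of the flow).
type ChunkIndexer struct {
	chunker Chunker
	store   VectorStoreIndexer
}

// BulkIndex chunks each file (step 5) and hands the chunks to the store (step 6).
func (ci ChunkIndexer) BulkIndex(files []File) error {
	for _, f := range files {
		chunks, err := ci.chunker.ChunkFile(f.Path, f.Content)
		if err != nil {
			return fmt.Errorf("chunking %s: %w", f.Path, err)
		}
		if err := ci.store.Index(chunks); err != nil {
			return fmt.Errorf("indexing %s: %w", f.Path, err)
		}
	}
	return nil
}

func main() {
	store := &memoryStore{}
	indexer := ChunkIndexer{chunker: SizeChunker{MaxBytes: 8}, store: store}
	files := []File{{Path: "main.go", Content: "package main\n"}}
	if err := indexer.BulkIndex(files); err != nil {
		fmt.Println("indexing failed:", err)
		return
	}
	fmt.Println("indexed chunks:", len(store.indexed))
}
```

Because ChunkIndexer depends only on the interface in this sketch, swapping the chunker is a change at the construction site and leaves BulkIndex untouched.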

Prerequisites

The Go-based gitlab-elasticsearch-indexer should introduce a "chunking" mode; see #536209 (closed).
The gitlab-code-parser should expose its Code Chunker as a Go package; see gitlab-org/rust/gitlab-code-parser!85 (merged).

References

Running the `gitlab-elasticsearch-indexer`

See guide

Planning discussions

Code Parsing and Chunking Strategy proposal
Code Embeddings blueprint
Please refer to the outcome in #536142 (closed)
Please refer to the Indexer & Chunker contract defined in #528770 (comment 2451039319)

Proposal

Introduce the gitlab-code-parser's Code Chunker into the gitlab-elasticsearch-indexer. It should be wrapped as a sub-package of the chunker package, similar to the SizeChunker.
In step 2 of the process flow defined above, replace the creation of the SizeChunker with the creation of the new Code Chunker.
Ensure that the Code Chunker outputs all the same Chunk fields as the SizeChunker, as defined in the parent chunker package.
Ensure that the output of the Code Chunker can be used in the call to the vectorStoreIndexer.

Edited Jul 31, 2025 by 🤖 GitLab Bot 🤖