Use FFI to call knowledge graph indexer

We eventually plan to call knowledge graph indexer from zoekt-indexer using FFI. The reason is that zoekt-indexer is written in Go, but knowledge graph indexer is written in Rust and we would like to avoid exec-ing another binary.

To make this possible we will need:

publish go-bindings for knowledge graph indexer (perhaps something similar to https://gitlab.com/michaelangeloio/gitlab-code-parser-go-ffi)
update zoekt-indexer to use these bindings

We should make sure that knowledge graph indexer and its bindings are stable to make sure it doesn't break zoekt build/release process, or make this dependency optional for this initial phase.

Slack thread of the related discussion

jprovaznik
jprovaznik  Yesterday at 16:53
@Omar Qunsul
 to follow-up on discussion about using CLI vs FFI - we will need FFI much sooner then "before release": we will need FFI approach for testing it on .com (I suspect it would be difficult to build/package separate binary in our deployment process). So CLI would be handy rather for local experiments/testing with the indexer.

1

9 replies


jprovaznik
jprovaznik  Yesterday at 16:55
We can sync up about this tomorrow after looking closer at the existing indexer code


Omar Qunsul
Omar Qunsul  Yesterday at 17:45
@jprovaznik
 there are different threads for me to look into:
(Incremental Indexing) Trying to get the changed file batch, for incremental update, and write them to files [As far as I understand, we don't want to proceed with this, but it depends on how complex FFI]
(Initial Indexing) Try to do git checkout from Gitaly, to pass the repo to the knowledge graph, as an argument to the CLI.
Look into FFI. For sure this is used for incremental indexing, but not yet sure if that's used for the Initial Indexing
Let me know tomorrow which one is the most important me to dig deep into. I feel I want to continue with (2), even if it was with git checkout. I even got some go code works that checks out the code from Gitaly (edited) 


jprovaznik
jprovaznik  Today at 09:58
@Omar Qunsul
 I was thinking about following iterative steps:
start with simplest way to do initial indexing - your "step 2":  investigate if we can easily "git clone" the repo with Gitaly and store it in a tmp dir and use CLI (external command) to execute indexing. This is useful for testing the indexing and whole process locally and it makes sense to do as long as it's simple to implement it. 
@dgruzd
 mentioned that initial version of zoekt-indexer created copy of repo locally initially so it might be worth to check this in zoekt-indexer git commits (as similar logic to get repository would be reused).
Use FFI to call the indexer (instead of exec-ing external CLI command) - the whole repo would be still cloned to a tmp dir (done in step 1), but we wouldn't need another binary dependency (knowledge graph CLI). Also some work on zoekt-indexer side will be needed to make sure that knoeldge-graph indexer is built as part of zoekt-indexer build process
incremental updates - instead of doing "git clone" to store whole repo in tmp dir, get only changed files and use FFI to call the indexer (and pass changed files to the indexer as params in batches).
The end goal is step (3) - ideally we want to use FFI to call the indexer both for full indexing and incremental indexing. I assume full indexing would fetch changed files same way as in incremental indexing but we would just tell gitaly "give me all files changed from the first commit". If steps 1 or 2 would actually require significant amount of time, we can focus directly on step 3 because steps 1 and 2 make sense only if these are easy to implement. But I think that preparing FFI calling will not be straightforward. WDYT?


dgruzd
dgruzd  Today at 10:02
start with simplest way to do initial indexing - your "step 2":  investigate if we can easily "git clone" the repo with Gitaly and store it in a tmp dir and use CLI (external command) to execute indexing. This is useful for testing the indexing and whole process locally and it makes sense to do as long as it's simple to implement it. 
@dgruzd
 mentioned that initial version of zoekt-indexer created copy of repo locally initially so it might be worth to check this in zoekt-indexer git commits (as similar logic to get repository would be reused).
We no longer use this indexer. We don't even have git command available
10:03
The only option is to load the content of the files from Gitaly and write these to a temporary directory. That's why I was saying that it's a bit wasteful compared to using FFI straight away (my personal opinion, non-blocking)


jprovaznik
jprovaznik  Today at 10:24
I was hoping there would be a handy command for getting whole repo in gitaly directly (when using grpc - https://pkg.go.dev/gitlab.com/gitlab-org/gitaly@v1.87.0/proto/go/gitalypb) but maybe not. Then I wonder if it would still make sense to do it in two steps:
get all changed files in batches using gitaly (for full indexing we would fetch basically all files) and store them in tmp dir. Use CLI to call it on this dir
replace CLI with FFI
But I agree that trying to write to tmp dir might be waste of time if means more work. The main reason I'm still tempted to start with CLI is that I suspect that FFI call will not be trivial amount of work - aside of knowledge graph indexer work, there will be also work needed to incorporate this in zoekt-indexer building process. Also because knowledge graph indexer is in an early phase there is a risk of breaking this process. Also I think production readiness should be done first, or at least infra team should approve this FFI plan first before we include it in zoekt-indexer. WDYT?

GitLab
Knowledge Graph - Production Readiness (#132) · Issues · GitLab.com / GitLab Infrastructure Team / readiness · GitLab
Production Readiness This issue serves as a tracking issue to guide you through the readiness review. It's not the production readiness...
Author
Jan Provaznik
Assignee
Jan Provaznik


dgruzd
dgruzd  Today at 10:26
I actually don't think that FFI requires an infra team approval. If we build bindings correctly, it would be another go library from the indexer perspective


jprovaznik
jprovaznik  Today at 10:30
hm, true


jprovaznik
jprovaznik  Today at 10:38
Let's wait for 
@Omar Qunsul
's findings how much difficult it would be to do step 1

1

1

Edited Jun 26, 2025 by Jan Provaznik