feat(db): adding server side repository processing with c bindings


Omar Qunsul requested to merge ffi into main Jul 07, 2025


What does this MR do and why?

Adds server-side repository processing through FFI / C bindings.

Addressing: gitlab-org/gitlab-zoekt-indexer#89 (closed)

Testing

From the directory crates/indexer-c-bindings, run the command cbindgen --output c_bindings.h to generate the C header.
From the root directory of the project, run cargo build --release -p indexer-c-bindings -v to generate the static library knowledge-graph/target/release/libindexer_c_bindings.a
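
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of function cbindgen turns into a C declaration. The function name, status codes, and behavior are hypothetical and are not taken from the indexer-c-bindings crate; the sketch only shows the general shape of an exported function that validates a C string before using it.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical status codes returned across the FFI boundary.
pub const STATUS_OK: i32 = 0;
pub const STATUS_INVALID_ARGUMENT: i32 = 1;

/// Hypothetical entry point, for illustration only.
/// (On the Rust 2024 edition the attribute is written `#[unsafe(no_mangle)]`.)
///
/// # Safety
/// `repo_path` must be null or point to a valid NUL-terminated C string.
#[no_mangle]
pub unsafe extern "C" fn example_index_repository(repo_path: *const c_char) -> i32 {
    // Reject null pointers before touching the memory behind them.
    if repo_path.is_null() {
        return STATUS_INVALID_ARGUMENT;
    }
    // SAFETY: the caller guarantees a valid NUL-terminated string (see above).
    let path = match unsafe { CStr::from_ptr(repo_path) }.to_str() {
        Ok(p) if !p.is_empty() => p,
        // Non-UTF-8 or empty paths are reported as an error code, not a panic.
        _ => return STATUS_INVALID_ARGUMENT,
    };
    // A real implementation would index the repository at `path` here.
    let _ = path;
    STATUS_OK
}
```

For a function like this, cbindgen would emit a declaration roughly of the form int32_t example_index_repository(const char *repo_path); in the generated header. Returning an integer status rather than panicking keeps Rust unwinding from crossing into the Go caller.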

On the Zoekt side:

Check out the branch 79-graph-zoekt-indexer-to-create-graph
In the file internal/graph_indexer/graph_indexer.go, change the CGO directives to point to the headers and static library on your machine. I am not sure yet how to make this cleaner; I will check whether our other Go codebases have examples of such flags and follow the best practices.
Make sure to stop all GDK gitlab-zoekt-indexer-* services
Run make build-unified
From the root directory of your GDK, run ./gitlab-zoekt-indexer/bin/gitlab-zoekt indexer -index_dir zoekt-data/development/index -graph_index_dir zoekt-data/development/graph-index -listen :6080 -secret_path <YOUR_GDK_ROOT_DIRECTORY>/gitlab-shell/.gitlab_shell_secret -self_url "http://localhost:6080" -search_url "http://localhost:6090" -gitlab_url http://gdk.test:3000
Make sure to enable the feature flag knowledge_graph_indexing in your Rails app.
Push to the default branch of any project
A few seconds later, you should see a Kuzu graph DB created in the GDK under zoekt-data/development/index, in a new directory called knowledge_graph_data.

Performance Checklist

Have you reviewed your memory allocations to ensure you're optimizing correctly? Are you cloning or copying unnecessary data?
Have you profiled with cargo bench or criterion to measure performance impact?
Are you using zero-copy operations where possible (e.g., &str instead of String, slice references)?
Have you considered using Cow<'_, T> for conditional ownership to avoid unnecessary clones?
Are iterator chains and lazy evaluation being used effectively instead of intermediate collections?
Are you reusing allocations where possible (e.g., Vec::clear() and reuse vs new allocation)?
Have you considered using SmallVec or similar for small, stack-allocated collections?
Are async operations properly structured to avoid blocking the executor?
Have you reviewed unsafe code blocks for both safety and performance implications?
Are you using appropriate data structures (e.g., HashMap vs BTreeMap vs IndexMap)?
Have you considered compile-time optimizations (e.g., const fn, generics instead of trait objects)?
Are debug assertions (debug_assert!) used instead of runtime checks where appropriate?
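
As a rough illustration of a few checklist items (Cow for conditional ownership, reusing an allocation with Vec::clear(), a lazy iterator chain, and debug_assert!), here is a small sketch. Both helper functions are hypothetical and do not come from this MR.

```rust
use std::borrow::Cow;

/// Hypothetical helper: normalizes path separators, allocating only
/// when the input actually contains a backslash.
pub fn normalize_path(path: &str) -> Cow<'_, str> {
    if path.contains('\\') {
        Cow::Owned(path.replace('\\', "/"))
    } else {
        // Already normalized: borrow the input, no allocation.
        Cow::Borrowed(path)
    }
}

/// Hypothetical helper: collects the `.rs` paths into a caller-provided
/// buffer so the same allocation can be reused across calls.
pub fn collect_rust_files<'a>(paths: &[&'a str], out: &mut Vec<&'a str>) {
    // `clear` drops the elements but keeps the capacity.
    out.clear();
    // Lazy iterator chain: no intermediate collection is built.
    out.extend(paths.iter().copied().filter(|p| p.ends_with(".rs")));
    // Checked in debug builds only; compiled out in release builds.
    debug_assert!(out.len() <= paths.len());
}
```

Whether any of these patterns pays off in a given code path is something to confirm with a benchmark, not assume.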

Edited Jul 14, 2025 by Omar Qunsul
