feat(db): adding server side repository processing with c bindings
What does this MR do and why?
Adds server-side repository processing via FFI / C bindings.
Addressing: gitlab-org/gitlab-zoekt-indexer#89 (closed)
Testing
- From the directory `crates/indexer-c-bindings`, run the command `cbindgen --output c_bindings.h`
- From the root directory of the project, run `cargo build --release -p indexer-c-bindings -v` to generate the static library `knowledge-graph/target/release/libindexer_c_bindings.a`
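For reviewers unfamiliar with the FFI side, here is a minimal sketch of the kind of function `cbindgen` picks up when generating a header: a public `extern "C"` function with an unmangled name. The function name `example_index_repository` and the status constants are hypothetical, chosen for illustration only; they are not the actual API of `indexer-c-bindings`.

```rust
use std::ffi::{c_char, CStr};

// Hypothetical status codes; the real crate may report errors differently.
pub const EXAMPLE_OK: i32 = 0;
pub const EXAMPLE_NULL_POINTER: i32 = 1;
pub const EXAMPLE_INVALID_UTF8: i32 = 2;

/// Hypothetical entry point, not the MR's actual function.
///
/// # Safety
/// `repo_path` must be null or point to a valid NUL-terminated C string
/// that stays alive for the duration of the call.
#[no_mangle] // spelled `#[unsafe(no_mangle)]` in Rust edition 2024
pub unsafe extern "C" fn example_index_repository(repo_path: *const c_char) -> i32 {
    // Reject null pointers before dereferencing anything.
    if repo_path.is_null() {
        return EXAMPLE_NULL_POINTER;
    }
    // SAFETY: the caller guarantees a valid NUL-terminated string.
    let path = unsafe { CStr::from_ptr(repo_path) };
    match path.to_str() {
        // Real repository processing would start here; this sketch only validates input.
        Ok(_path) => EXAMPLE_OK,
        Err(_) => EXAMPLE_INVALID_UTF8,
    }
}
```

Returning plain integer codes keeps the boundary simple for a cgo caller, which cannot receive a Rust `Result` directly.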
On Zoekt side:
- Check out the branch `79-graph-zoekt-indexer-to-create-graph`
- In the file `internal/graph_indexer/graph_indexer.go`, change the CGO instructions to point to the headers and static library on your computer. I am not sure yet how to make this cleaner; I will check whether our other Go codebases have examples of such flags and follow the best practices.
- Make sure to stop all GDK `gitlab-zoekt-indexer-*` services
- Run `make build-unified`
- From the root directory of your GDK, run `./gitlab-zoekt-indexer/bin/gitlab-zoekt indexer -index_dir zoekt-data/development/index -graph_index_dir zoekt-data/development/graph-index -listen :6080 -secret_path <YOUR_GDK_ROOT_DIRECTORY>/gitlab-shell/.gitlab_shell_secret -self_url "http://localhost:6080" -search_url "http://localhost:6090" -gitlab_url http://gdk.test:3000`
- Make sure to enable the FF `knowledge_graph_indexing` in your Rails app.
- Push to the default branch of any project
- A few seconds later you should see the Kuzu graph DB created in GDK under `zoekt-data/development/index`, in a new directory called `knowledge_graph_data`.
Performance Checklist
- Have you reviewed your memory allocations to ensure you're optimizing correctly? Are you cloning or copying unnecessary data?
- Have you profiled with `cargo bench` or `criterion` to measure performance impact?
- Are you using zero-copy operations where possible (e.g., `&str` instead of `String`, slice references)?
- Have you considered using `Cow<'_, T>` for conditional ownership to avoid unnecessary clones?
- Are iterator chains and lazy evaluation being used effectively instead of intermediate collections?
- Are you reusing allocations where possible (e.g., `Vec::clear()` and reuse vs new allocation)?
- Have you considered using `SmallVec` or similar for small, stack-allocated collections?
- Are async operations properly structured to avoid blocking the executor?
- Have you reviewed `unsafe` code blocks for both safety and performance implications?
- Are you using appropriate data structures (e.g., `HashMap` vs `BTreeMap` vs `IndexMap`)?
- Have you considered compile-time optimizations (e.g., `const fn`, generics instead of trait objects)?
- Are debug assertions (`debug_assert!`) used instead of runtime checks where appropriate?
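As an illustration of the `Cow<'_, T>` item above, here is a minimal sketch of conditional ownership. The helper `normalize_separators` is hypothetical and not part of this MR; it borrows the input when no change is needed and allocates only when a rewrite is required.

```rust
use std::borrow::Cow;

/// Hypothetical helper: converts backslashes to forward slashes,
/// allocating a new `String` only when the input contains a backslash.
pub fn normalize_separators(path: &str) -> Cow<'_, str> {
    if path.contains('\\') {
        // A rewrite is needed, so this branch pays for one allocation.
        Cow::Owned(path.replace('\\', "/"))
    } else {
        // Nothing to change: hand back the original slice without copying.
        Cow::Borrowed(path)
    }
}
```

Callers can treat the result as a `&str` through deref, so the common no-rewrite path stays allocation-free.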
Edited by Omar Qunsul