feat(db): add server-side repository processing with C bindings

What does this MR do and why?

Adds server-side repository processing, exposed through FFI (C bindings) so that it can be called from the Zoekt indexer.

Addressing: gitlab-org/gitlab-zoekt-indexer#89 (closed)

Testing

  1. From the crates/indexer-c-bindings directory, run cbindgen --output c_bindings.h to generate the C header.
  2. From the project root, run cargo build --release -p indexer-c-bindings -v to build the static library knowledge-graph/target/release/libindexer_c_bindings.a.
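For reference, cbindgen emits C declarations only for items it sees as C-compatible, such as `#[no_mangle] pub extern "C"` functions and public constants. Producing a `.a` file also requires `crate-type = ["staticlib"]` in the crate's Cargo.toml. The sketch below shows that shape with a hypothetical `index_repository` entry point that takes a repository path as a C string and returns a status code; the function name, signature, and status values are illustrative assumptions, not the API actually exported by indexer-c-bindings.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

// Hypothetical status codes returned across the FFI boundary.
pub const STATUS_OK: c_int = 0;
pub const STATUS_NULL_ARG: c_int = 1;
pub const STATUS_INVALID_UTF8: c_int = 2;
pub const STATUS_EMPTY_PATH: c_int = 3;

/// Hypothetical exported entry point. cbindgen would emit a prototype like
/// `int index_repository(const char *repo_path);` into c_bindings.h.
///
/// # Safety
/// `repo_path` must be null or point to a valid NUL-terminated string that
/// stays alive for the duration of the call.
#[no_mangle]
pub unsafe extern "C" fn index_repository(repo_path: *const c_char) -> c_int {
    // Reject null instead of dereferencing it.
    if repo_path.is_null() {
        return STATUS_NULL_ARG;
    }
    // Borrow the caller's buffer as &str without copying; fail on bad UTF-8.
    let path = match unsafe { CStr::from_ptr(repo_path) }.to_str() {
        Ok(p) => p,
        Err(_) => return STATUS_INVALID_UTF8,
    };
    if path.is_empty() {
        return STATUS_EMPTY_PATH;
    }
    // The real repository processing would start here (not shown).
    STATUS_OK
}
```

Validating null pointers and UTF-8 before using the string keeps malformed input from the Go caller from turning into undefined behavior or a panic inside an `extern "C"` function.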

On the Zoekt side:

  1. Check out the branch 79-graph-zoekt-indexer-to-create-graph.
  2. In internal/graph_indexer/graph_indexer.go, change the cgo directives so that they point to the generated header (c_bindings.h) and static library on your machine. I'm not yet sure how to make this cleaner; I will look for similar flags in our other Go codebases and follow the established practice.
  3. Stop all gitlab-zoekt-indexer-* GDK services.
  4. Run make build-unified.
  5. From the root directory of your GDK, run ./gitlab-zoekt-indexer/bin/gitlab-zoekt indexer -index_dir zoekt-data/development/index -graph_index_dir zoekt-data/development/graph-index -listen :6080 -secret_path <YOUR_GDK_ROOT_DIRECTORY>/gitlab-shell/.gitlab_shell_secret -self_url "http://localhost:6080" -search_url "http://localhost:6090" -gitlab_url http://gdk.test:3000
  6. Enable the knowledge_graph_indexing feature flag in your Rails app.
  7. Push to the default branch of any project.
  8. After a few seconds, a Kuzu graph database should appear in a new knowledge_graph_data directory under zoekt-data/development/index in your GDK.
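In the setup above, the Go indexer calls into the Rust static library through cgo. If an exported function returns heap memory (for example, a message string), the usual pattern is to export a matching free function from the same library: the `CString::into_raw` documentation requires such a pointer to be reclaimed with `CString::from_raw` rather than C's `free()`. The functions `graph_status_message` and `graph_free_string` below are hypothetical and not part of this MR; on the Go side, the caller would release the result with the exported free function (for example in a `defer`) instead of `C.free`.

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical exported function that returns a newly allocated,
/// NUL-terminated message. Ownership passes to the caller.
#[no_mangle]
pub extern "C" fn graph_status_message(node_count: u64) -> *mut c_char {
    let msg = format!("indexed {} nodes", node_count);
    match CString::new(msg) {
        // into_raw hands the buffer to the caller without freeing it.
        Ok(c) => c.into_raw(),
        // CString::new fails only on an interior NUL byte; signal with null.
        Err(_) => ptr::null_mut(),
    }
}

/// Hypothetical matching free function; null is ignored.
///
/// # Safety
/// `s` must be null or a pointer previously returned by
/// graph_status_message that has not already been freed.
#[no_mangle]
pub unsafe extern "C" fn graph_free_string(s: *mut c_char) {
    if s.is_null() {
        return;
    }
    // Rebuild the CString so its destructor releases the allocation.
    drop(unsafe { CString::from_raw(s) });
}
```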

Performance Checklist

  • Have you reviewed your memory allocations? Are you cloning or copying data unnecessarily?
  • Have you profiled with cargo bench or criterion to measure performance impact?
  • Are you using zero-copy operations where possible (e.g., &str instead of String, slice references)?
  • Have you considered using Cow<'_, T> for conditional ownership to avoid unnecessary clones?
  • Are iterator chains and lazy evaluation being used effectively instead of intermediate collections?
  • Are you reusing allocations where possible (e.g., Vec::clear() and reuse vs new allocation)?
  • Have you considered using SmallVec or similar for small, stack-allocated collections?
  • Are async operations properly structured to avoid blocking the executor?
  • Have you reviewed unsafe code blocks for both safety and performance implications?
  • Are you using appropriate data structures (e.g., HashMap vs BTreeMap vs IndexMap)?
  • Have you considered compile-time optimizations (e.g., const fn, generics instead of trait objects)?
  • Are debug assertions (debug_assert!) used instead of runtime checks where appropriate?
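To make two of the checklist items concrete (conditional ownership with `Cow<'_, T>` and reusing an allocation via `Vec::clear()`), here is a minimal sketch. The functions `normalize_separators` and `sorted_tokens_per_line` are hypothetical illustrations, not code from this MR.

```rust
use std::borrow::Cow;

/// Hypothetical helper: convert '\' path separators to '/'.
/// Returns a borrow when the input is already clean and allocates only
/// when a replacement is actually needed.
pub fn normalize_separators(path: &str) -> Cow<'_, str> {
    if path.contains('\\') {
        Cow::Owned(path.replace('\\', "/"))
    } else {
        Cow::Borrowed(path)
    }
}

/// Hypothetical helper: sort the whitespace-separated tokens of each line.
/// One scratch Vec is reused across lines; clear() drops the old contents
/// but keeps the capacity.
pub fn sorted_tokens_per_line(text: &str) -> Vec<String> {
    let mut scratch: Vec<&str> = Vec::new();
    let mut out = Vec::new();
    for line in text.lines() {
        scratch.clear();
        scratch.extend(line.split_whitespace());
        scratch.sort_unstable();
        out.push(scratch.join(" "));
    }
    out
}
```

`normalize_separators` allocates only when the input contains a backslash, and `sorted_tokens_per_line` keeps one scratch buffer whose capacity survives `clear()`, so later lines usually avoid a fresh allocation.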
Edited by Omar Qunsul
