Investigate server-side usage of kuzu DB

An alternative approach (instead of using one big graph DB to store all graphs) would be to use many separate embedded, file-based graph DBs. With this approach, the graph for each project would be stored in its own kuzu DB (in a separate directory on disk).

Pros:

  • projects are isolated from each other - this is especially important since we want to use LLM-generated queries
  • we wouldn't need to rewrite queries to add filtering by project
  • better scalability - we could easily scale the number of graph DBs without impacting query performance for individual repos
  • the same DB would be used on both server-side and client-side

Cons:

  • accessing a file-based DB might be slow
  • kuzu may not scale well when many connections are opened in parallel
  • kuzu doesn't support simultaneous read-write access to the same DB (all connections to the same DB must be read-only) - this complicates updating graphs, but it's not a blocker (we can prepare the new graph in a separate DB and then just replace the files)
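
The "prepare a new graph in a separate DB and then replace the files" idea from the last point could be sketched roughly as below. This is only an illustration under assumptions not stated in this issue: each project directory holds versioned sub-directories (one fully built DB per version) plus a `current` symlink that readers open, and the host is POSIX, where renaming over an existing symlink is atomic. The `publish_graph` name and the layout are hypothetical.

```python
import os
from pathlib import Path


def publish_graph(project_dir: Path, new_version_dir: Path) -> None:
    """Atomically repoint project_dir/'current' at a freshly built DB directory.

    Assumes new_version_dir is a direct child of project_dir and is already
    fully written (hypothetical layout, not something kuzu prescribes).
    """
    link = project_dir / "current"
    tmp_link = project_dir / f".current.tmp.{os.getpid()}"

    # Clean up a leftover temporary link from an earlier, interrupted run.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()

    # Create the new link under a temporary name, using a relative target
    # so the project directory can be moved as a whole.
    os.symlink(new_version_dir.name, tmp_link)

    # rename(2) replaces the old symlink in a single step, so readers
    # resolve either the old version or the new one, never a partial state.
    os.replace(tmp_link, link)
```

Readers that already have the old version open would keep reading the old files, so the old directory could only be deleted once those connections are closed. How kuzu behaves when its files are swapped under an open connection is something the investigation would still need to confirm.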

Some initial investigation was done in #517117 (comment 2432540741).

Goals:

  • investigate whether kuzu could be used on server-side to serve the knowledge graph for repositories
  • investigate whether a kuzu-based solution would scale for server-side needs: number of simultaneously open connections, memory usage, query times, project indexing time
  • because kuzu is an embedded DB, there needs to be some service API layer which would manage kuzu DB connections, serve incoming requests and take care of updating graphs - investigate what this should look like or whether we can re-use zoekt-webservice logic (which should do something similar already)
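
To make the connection-management part of the last goal more concrete, here is a minimal sketch of one possible shape: a pool that keeps a bounded number of per-project DB handles open and evicts the least recently used one. Everything here is an assumption for discussion, not a proposed design: `GraphDbPool`, `opener`, `closer` and `capacity` are hypothetical names, and the opener is injected so the sketch does not depend on any particular kuzu API (in practice it would presumably open the project's DB directory in read-only mode).

```python
import threading
from collections import OrderedDict
from typing import Any, Callable


class GraphDbPool:
    """Keeps at most `capacity` per-project DB handles open (LRU eviction)."""

    def __init__(
        self,
        opener: Callable[[int], Any],   # hypothetical: opens the DB for a project ID
        closer: Callable[[Any], None],  # hypothetical: releases a DB handle
        capacity: int,
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._opener = opener
        self._closer = closer
        self._capacity = capacity
        self._handles: "OrderedDict[int, Any]" = OrderedDict()
        self._lock = threading.Lock()

    def get(self, project_id: int) -> Any:
        """Return the handle for project_id, opening it on first use."""
        with self._lock:
            if project_id in self._handles:
                # Mark as most recently used.
                self._handles.move_to_end(project_id)
                return self._handles[project_id]

            handle = self._opener(project_id)
            self._handles[project_id] = handle

            # Evict least recently used handles beyond the capacity.
            while len(self._handles) > self._capacity:
                _, evicted = self._handles.popitem(last=False)
                self._closer(evicted)
            return handle
```

The sketch deliberately ignores in-flight queries: a real service would need reference counting (or similar) so that a handle is not closed while a request is still using it, and the capacity would be chosen from the memory and open-connection measurements listed in the goals above.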
Edited by Jan Provaznik