Investigate server-side usage of kuzu DB

An alternative to storing all graphs in one big graph DB would be to use many separate embedded, file-based graph DBs. With this approach, the graph for each project would be stored in its own kuzu DB, each in a separate directory on disk.
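As a rough illustration of the one-directory-per-project layout, the sketch below maps a project ID to a DB directory. The root path, the function name `project_db_path`, and the two-digit sharding scheme are assumptions made for this example, not part of any existing code.

```python
from pathlib import Path

# Hypothetical root directory for all per-project graph DBs.
GRAPH_ROOT = Path("/var/lib/knowledge-graph")


def project_db_path(project_id: int, root: Path = GRAPH_ROOT) -> Path:
    """Return the directory that would hold the graph DB of one project."""
    if project_id <= 0:
        raise ValueError("project_id must be a positive integer")
    # Shard by the last two digits so that no single directory ends up
    # with one entry per project (an assumption, not a requirement).
    shard = f"{project_id % 100:02d}"
    return root / shard / str(project_id)
```

Because the path is derived only from the project ID, a query for one project never has to name, or filter out, any other project.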

Pros:

projects are isolated from each other - this is especially important since we want to use LLM-generated queries
we wouldn't need to rewrite queries to add filtering by project
better scalability - we could easily scale the number of graph DBs without impacting query performance for individual repos
the same DB would be used on both the server side and the client side

Cons:

accessing a file-based DB might be slow
kuzu may not scale well when many connections are opened in parallel
kuzu doesn't support simultaneous read-write access to the same DB (all connections to the same DB must be read-only) - this complicates updating graphs, but it's not a blocker (we can prepare the new graph in a separate DB and then just replace the files)
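The "prepare a new graph, then replace the files" idea from the last point could be sketched as follows. This is only one possible way to do the swap: it assumes a POSIX filesystem, a per-project directory containing versioned DB directories, and a `current` symlink that readers open. The names `publish_graph` and `build` are hypothetical, and whether kuzu readers tolerate this scheme is part of what needs to be investigated.

```python
import os
import tempfile
from pathlib import Path


def publish_graph(project_dir: Path, build) -> Path:
    """Build a new graph version and atomically point `current` at it."""
    project_dir.mkdir(parents=True, exist_ok=True)

    # Build into a fresh directory that no reader has opened yet.
    new_version = Path(tempfile.mkdtemp(prefix="v-", dir=project_dir))
    build(new_version)  # hypothetical callback that writes the DB files

    # Create the symlink under a temporary name, then rename it over
    # `current`. The rename is atomic on POSIX, so a reader resolves
    # either the old version or the new one, never a partial state.
    tmp_link = project_dir / f".current-{new_version.name}"
    os.symlink(new_version.name, tmp_link)
    os.replace(tmp_link, project_dir / "current")
    return new_version
```

Readers that already opened the previous version keep using it until they reopen the DB, so old version directories can only be deleted once no reader holds them. That cleanup is left out of the sketch.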

Some initial investigation was done in #517117 (comment 2432540741).

Goals:

investigate whether kuzu could be used on the server side to serve the knowledge graph for repositories
investigate whether a kuzu-based solution would scale for server-side needs: number of simultaneously open connections, memory usage, query times, project indexing time
because kuzu is an embedded DB, there needs to be some service API layer which would manage kuzu DB connections, serve incoming requests and take care of updating graphs - investigate what this should look like and whether we can re-use the zoekt-webservice logic (which should already do something similar)
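One piece of such a service layer would be a bounded pool of open DB handles, so that the number of simultaneously open DBs (and their memory) stays under control. The sketch below is a minimal LRU pool under stated assumptions: `DatabasePool`, `open_db`, and `close_db` are hypothetical names, and the open/close callbacks stand in for whatever the real kuzu open and close calls turn out to be.

```python
from collections import OrderedDict
from threading import Lock


class DatabasePool:
    """Keep at most `max_open` per-project DB handles open (LRU eviction)."""

    def __init__(self, open_db, close_db, max_open=64):
        self._open_db = open_db      # hypothetical: project_id -> handle
        self._close_db = close_db    # hypothetical: handle -> None
        self._max_open = max_open
        self._open = OrderedDict()   # project_id -> handle, oldest first
        self._lock = Lock()

    def get(self, project_id):
        with self._lock:
            if project_id in self._open:
                # Mark as most recently used.
                self._open.move_to_end(project_id)
                return self._open[project_id]
            handle = self._open_db(project_id)
            self._open[project_id] = handle
            if len(self._open) > self._max_open:
                # Evict the least recently used handle. A real service
                # would also need to wait for in-flight queries on it.
                _, evicted = self._open.popitem(last=False)
                self._close_db(evicted)
            return handle

    def open_projects(self):
        with self._lock:
            return list(self._open)
```

Measuring how large `max_open` can get before memory usage or query times degrade would directly address the scaling questions listed above.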

Edited Apr 08, 2025 by Jan Provaznik