Investigate server-side usage of kuzu DB
An alternative approach (instead of using one big graph DB to store all graphs) would be to use many separate embedded, file-based graph DBs. With this approach, the graph for each project would be stored in its own kuzu DB (in a separate directory on disk).
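As a rough illustration of the one-directory-per-project layout, the sketch below maps a project ID to its DB directory. The function name `project_db_path`, the root path, and the two-digit sharding scheme are assumptions made for this example only; nothing here is prescribed by kuzu.

```python
from pathlib import Path


def project_db_path(root: Path, project_id: int) -> Path:
    """Return the directory that would hold the kuzu DB for one project.

    Hypothetical layout: <root>/<last two digits of ID>/<ID>.
    """
    if project_id <= 0:
        raise ValueError("project_id must be a positive integer")
    # Shard by the last two digits so no single directory collects every project.
    shard = f"{project_id % 100:02d}"
    return root / shard / str(project_id)
```

Because the path is derived only from a validated integer, a query for one project cannot be pointed at another project's directory.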
Pros:
- projects are isolated from each other - this is especially important since we want to use LLM-generated queries
- we wouldn't need to rewrite queries to add filtering by project
- better scalability - we could easily scale the number of graph DBs without impacting query performance for individual repos
- the same DB would be used on both the server side and the client side
Cons:
- accessing a file-based DB might be slow
- kuzu may not scale well when many connections are opened in parallel
- kuzu doesn't support simultaneous read-write access to the same DB (all connections to the same DB must be read-only) - this complicates updating graphs, but it's not a blocker (we can prepare the new graph in a separate DB and then just replace the files)
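The "prepare a new graph, then replace files" update from the last point could be sketched as an atomic symlink swap. This is only one possible approach, assuming a POSIX filesystem and a hypothetical layout in which each project directory holds versioned DB directories plus a `current` symlink; the function name `publish_graph` is also made up for this example.

```python
import os
from pathlib import Path


def publish_graph(project_dir: Path, new_version: Path) -> None:
    """Atomically point `project_dir/current` at a freshly built DB directory.

    `new_version` is assumed to be a direct child of `project_dir`.
    """
    current = project_dir / "current"
    tmp_link = project_dir / f".current.{os.getpid()}.tmp"
    # Clean up a leftover temporary link from an earlier interrupted run.
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    # Relative target, resolved against project_dir where the link lives.
    os.symlink(new_version.name, tmp_link)
    # Renaming over the old link is atomic on POSIX: readers see either
    # the old version or the new one, never a half-replaced DB.
    os.replace(tmp_link, current)
```

Readers that resolved the old version before the swap keep working on the old files, so the previous version directory should only be deleted once those readers have closed it.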
Some initial investigation was done in #517117 (comment 2432540741).
Goals:
- investigate whether kuzu could be used on the server side to serve the knowledge graph for repositories
- investigate whether a kuzu-based solution would scale for server-side needs: number of simultaneously open connections, memory usage, query times, project indexing time
- because kuzu is an embedded DB, there needs to be a service API layer that manages kuzu DB connections, serves incoming requests and takes care of updating graphs - investigate what this should look like and whether we can reuse zoekt-webservice logic (which should already do something similar)
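To make the connection-management part of such a service layer more concrete, here is a minimal sketch of a pool that caps the number of simultaneously open per-project DB handles and evicts the least recently used one. The class `HandlePool` and its parameters are hypothetical; the `opener` and `closer` callables stand in for whatever opens and closes a kuzu DB in read-only mode, so the sketch makes no claim about the kuzu API or about how zoekt-webservice works.

```python
from collections import OrderedDict
from threading import Lock
from typing import Callable, Generic, TypeVar

H = TypeVar("H")


class HandlePool(Generic[H]):
    """Keep at most `capacity` DB handles open, evicting the least recently used."""

    def __init__(self, opener: Callable[[int], H],
                 closer: Callable[[H], None], capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._opener = opener
        self._closer = closer
        self._capacity = capacity
        self._handles = OrderedDict()  # project_id -> handle, oldest first
        self._lock = Lock()

    def get(self, project_id: int) -> H:
        with self._lock:
            if project_id in self._handles:
                # Mark as most recently used.
                self._handles.move_to_end(project_id)
                return self._handles[project_id]
            handle = self._opener(project_id)
            self._handles[project_id] = handle
            if len(self._handles) > self._capacity:
                # Drop the least recently used handle to bound open DBs.
                _, evicted = self._handles.popitem(last=False)
                self._closer(evicted)
            return handle
```

A real service would also need to avoid closing a handle that an in-flight request is still using (for example with reference counting), which this sketch leaves out.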
Edited by Jan Provaznik