Extend gitlab-zoekt-indexer with "create graph" task
Extend the gitlab-zoekt-indexer project with a task that indexes a repository and creates a graph database for it. The task:
- checks for a "create graph" task (scheduling will be done in gitlab#540850 (closed))
- fetches changed repository files; both full repository re-indexing and incremental (partial) re-indexing should be supported
- calls the knowledge graph Rust library to parse source code files (the library implementation is tracked in &17517); the result will probably be an updated copy of the kuzu DB
- when indexing is finished, safely updates the kuzu graph DB (closes open connections to the DB, then replaces the DB)
Implementation details
There are two options for how the whole process could look, depending on how the knowledge graph library is executed (related to &17517 (comment 2480940813)):
- if the knowledge graph is executed as a standalone app:
  - zoekt-indexer makes a copy of the repository's kuzu DB
  - zoekt-indexer fetches repository files (either all files for a full re-index or only updated files for an incremental update) and stores them on disk, preserving the directory/file tree structure
  - zoekt-indexer executes the knowledge graph CLI command, passing it the path to the directory where the repository files were stored and the path to the copy of the repository's kuzu DB
  - the knowledge graph CLI parses the repository files, updates the kuzu DB copy, and returns a status code
  - if the knowledge graph CLI command was successful, zoekt-indexer safely replaces the repository's old kuzu DB with the updated one (first closes open connections, then replaces the kuzu directories)
- if the knowledge graph is called as a function (through Go bindings):
  - zoekt-indexer makes a copy of the repository's kuzu DB
  - zoekt-indexer fetches repository files (either all files for a full re-index or only updated files for an incremental update) in batches; for each batch it calls the knowledge graph function (through FFI) to parse the source files in that batch, passing the path to the kuzu DB copy
  - the knowledge graph function parses the batch, updates the kuzu DB copy, and returns a status code
  - if the knowledge graph function calls were successful, zoekt-indexer safely replaces the repository's old kuzu DB with the updated one (first closes open connections, then replaces the kuzu directories)
The scope of this issue is the set of steps performed on "graph node1" in the sequence diagram in &17765.
Related to gitlab#540850 (closed)
Related POC !472 (closed)
Edited by Jan Provaznik