Skip to content

[kuzu][server] Spike: Kuzu does not allow multi db queries via ATTACH functionality

Problem to Solve

The GKG team has discussed the implications of using Kuzu to store redacted SDLC data with arbitrary namespace depth. The primary bottleneck is not in giving each namespace it's own Kuzu DB, as a single Kuzu DB can handle massive scale, but rather querying across databases.

Kuzu has limited functionality in this matter, via the ATTACH Cypher command. Kuzu also has ambiguous documentation on what is considered a local vs. remote database, and the docs indicated that multiple local databases may be allowed to be queried via a single connection.

We have spun up a branch based on the Kuzu performance analysis branch to test the behavior of the ATTACH command.

ATTACH Findings

Testing ATTACH for database: test_main_db_1.db as attached_db_1
Query on attached_db_1 returned 1 rows:
  user_count: 40
Testing ATTACH for database: test_main_db_2.db as attached_db_2
Query on attached_db_2 returned 1 rows:
  user_count: 40
Testing ATTACH for database: test_main_db_3.db as attached_db_3
Query on attached_db_3 returned 1 rows:
  user_count: 40
Testing ATTACH for database: test_main_db_4.db as attached_db_4
Query on attached_db_4 returned 1 rows:
  user_count: 40
Attaching database: test_main_db_1.db as db_1
Successfully attached test_main_db_1.db as db_1
After attaching db_1, total users query returns:
  user_count: 40
Attaching database: test_main_db_2.db as db_2
Successfully attached test_main_db_2.db as db_2
After attaching db_2, total users query returns:
  user_count: 40
Attaching database: test_main_db_3.db as db_3
Successfully attached test_main_db_3.db as db_3
After attaching db_3, total users query returns:
  user_count: 40
Attaching database: test_main_db_4.db as db_4
Successfully attached test_main_db_4.db as db_4
After attaching db_4, total users query returns:
  user_count: 40

Multiple databases can indeed be attached at once (verifiable via the CALL SHOW_ATTACHED_DATABASES() RETURN *; command), but per the Kuzu docs, as soon as one database is at attached, it "unmounts" the database attached prior. Unlike other OLAP databases, we have no ability to query tables across multiple db's and UNION the results, for example:

// Desired behavior, not possible
MATCH (u:GitLabUser) RETURN count(u) as user_count, 'main' as database_name
UNION ALL
MATCH (u:db_1.GitLabUser) RETURN count(u) as user_count, 'db_1' as database_name  
UNION ALL
MATCH (u:db_2.GitLabUser) RETURN count(u) as user_count, 'db_2' as database_name

To properly query multiple databases that share identical schemas at once, we'd need to dispatch concurrent reads, and union/merge disparate queries server-side. This behavior points to a single SDLC graph with a robust "mirror" of GL's permissions hierarchy either via redaction or federation as a potential solution.

Edited by Michael Usachenko