
[design] defining schema for relationships between definition nodes

(WIP)

Background

We currently define three distinct node types: DirectoryNode, FileNode, and DefinitionNode. All three are similar, so we could potentially merge them into a single Node type with sparse columns, since Kuzu is a columnar database and automatically compresses sparse column segments. For now, let us assume that we keep all three.

To represent relationships between nodes, I propose a single relationships table, shared by every node type, with at most the following columns:

relationships

| Property | Type | Description |
| --- | --- | --- |
| source_id | UINT32 | Outgoing or "sender" node id |
| target_id | UINT32 | Incoming or "receiver" node id |
| type | UINT8 | Relationship type |
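As a rough illustration of how narrow this row is, the sketch below models one relationship in plain Python and checks each field against its declared width. The `Relationship` class and the `ROW_FORMAT` layout are hypothetical names introduced here; the packed size describes Python's `struct` encoding only and says nothing about how Kuzu lays out columns on disk.

```python
import struct
from dataclasses import dataclass

UINT8_MAX = 2**8 - 1
UINT32_MAX = 2**32 - 1

# Little-endian, no padding: two 4-byte unsigned ints plus one unsigned byte.
ROW_FORMAT = "<IIB"


@dataclass(frozen=True)
class Relationship:
    """Hypothetical in-memory mirror of one row of the relationships table."""

    source_id: int  # UINT32
    target_id: int  # UINT32
    type: int       # UINT8

    def __post_init__(self) -> None:
        # Reject values that would not fit the proposed column types.
        for name, value, limit in (
            ("source_id", self.source_id, UINT32_MAX),
            ("target_id", self.target_id, UINT32_MAX),
            ("type", self.type, UINT8_MAX),
        ):
            if not 0 <= value <= limit:
                raise ValueError(f"{name}={value} is outside 0..{limit}")

    def pack(self) -> bytes:
        """Serialize the row into its fixed-width byte form."""
        return struct.pack(ROW_FORMAT, self.source_id, self.target_id, self.type)

    @classmethod
    def unpack(cls, raw: bytes) -> "Relationship":
        """Rebuild a row from bytes produced by pack()."""
        return cls(*struct.unpack(ROW_FORMAT, raw))
```

With this layout, `struct.calcsize(ROW_FORMAT)` is 9, so an uncompressed row carries nine bytes of payload before any database overhead.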

Kuzu doesn't have an Enum type, so we'll be stuck with something like UINT8 (256 possible relationship types) and converting the integer to a string, or mapping a string to an integer when this is used for agentic RAG. Luckily, if we run out of UINT8 slots, we can always do a schema migration to UINT16.
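One way the application side could handle the integer/string conversion is an `IntEnum` kept next to the indexer. This is a minimal sketch: the integer values are assumptions picked for illustration (this document does not assign any), and `encode`/`decode` are hypothetical helper names.

```python
from enum import IntEnum, unique


@unique
class RelationshipType(IntEnum):
    # Values are illustrative assumptions; only the names come from this issue.
    MODULE_TO_CLASS = 0
    MODULE_TO_MODULE = 1
    MODULE_TO_METHOD = 2
    MODULE_TO_SINGLETON_METHOD = 3
    MODULE_TO_CONSTANT = 4
    MODULE_TO_LAMBDA = 5
    MODULE_TO_PROC = 6
    CLASS_TO_METHOD = 7
    CLASS_TO_ATTRIBUTE = 8
    CLASS_TO_CONSTANT = 9
    CLASS_INHERITS_FROM = 10
    CLASS_TO_SINGLETON_METHOD = 11
    CLASS_TO_CLASS = 12
    CLASS_TO_LAMBDA = 13
    CLASS_TO_PROC = 14
    METHOD_CALLS = 15
    METHOD_TO_BLOCK = 16
    SINGLETON_METHOD_TO_BLOCK = 17


# Every code must fit in the UINT8 column.
assert max(RelationshipType) <= 0xFF


def encode(name: str) -> int:
    """Map a relationship string (e.g. from an agent) to its stored integer."""
    return int(RelationshipType[name])


def decode(value: int) -> str:
    """Map a stored integer back to its relationship string."""
    return RelationshipType(value).name
```

Once values like these are written to the database they have to stay stable, so new types would be appended rather than renumbered.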

Note that we could store the names of the relationships below (originally defined in @michaelangeloio's E2E MR) entirely outside of Kuzu. Alternatively, we could add a relationship_metadata table if we also want to join on relationship strings like MODULE_TO_METHOD. Either way, keeping only the integer in the relationships table helps us avoid using too much memory when constructing and storing the DB, and the metadata table still gives us the DX nicety of building queries without writing join predicates like ON type=5.
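To make the relationship_metadata idea concrete, here is a small sketch that uses Python's built-in `sqlite3` as a stand-in for Kuzu. The column names in relationship_metadata, the sample rows, the type codes 2 and 15, and the `edges_of` helper are all assumptions made up for the example.

```python
import sqlite3

# sqlite3 is only a stand-in here; the real tables would live in Kuzu.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE relationships (source_id INTEGER, target_id INTEGER, type INTEGER);
    CREATE TABLE relationship_metadata (type INTEGER PRIMARY KEY, name TEXT UNIQUE);
    """
)

# Illustrative type codes; this issue does not fix any numbering.
conn.executemany(
    "INSERT INTO relationship_metadata VALUES (?, ?)",
    [(2, "MODULE_TO_METHOD"), (15, "METHOD_CALLS")],
)
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?, ?)",
    [(1, 10, 2), (1, 11, 2), (10, 11, 15)],
)


def edges_of(name: str) -> list:
    """Return (source_id, target_id) pairs for a relationship given by name."""
    cursor = conn.execute(
        "SELECT r.source_id, r.target_id "
        "FROM relationships AS r "
        "JOIN relationship_metadata AS m ON m.type = r.type "
        "WHERE m.name = ? "
        "ORDER BY r.source_id, r.target_id",
        (name,),
    )
    return cursor.fetchall()
```

A caller asks for `edges_of("MODULE_TO_METHOD")` and never sees the integer code, while the wide relationships table still stores a single small integer per row.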

Definition Relationships We Currently Capture

All relationships below are FROM DefinitionNode TO DefinitionNode.

Module Relationships

| Relationship Type | Description |
| --- | --- |
| MODULE_TO_CLASS | Module contains class definition |
| MODULE_TO_MODULE | Module contains/imports another module |
| MODULE_TO_METHOD | Module contains method definition |
| MODULE_TO_SINGLETON_METHOD | Module contains singleton method |
| MODULE_TO_CONSTANT | Module contains constant definition |
| MODULE_TO_LAMBDA | Module contains lambda definition |
| MODULE_TO_PROC | Module contains proc definition |

Class Relationships

| Relationship Type | Description |
| --- | --- |
| CLASS_TO_METHOD | Class contains method definition |
| CLASS_TO_ATTRIBUTE | Class contains attribute definition |
| CLASS_TO_CONSTANT | Class contains constant definition |
| CLASS_INHERITS_FROM | Class inheritance relationship |
| CLASS_TO_SINGLETON_METHOD | Class contains singleton method |
| CLASS_TO_CLASS | Class contains nested class |
| CLASS_TO_LAMBDA | Class contains lambda definition |
| CLASS_TO_PROC | Class contains proc definition |

Method Relationships

| Relationship Type | Description |
| --- | --- |
| METHOD_CALLS | Method calls another method |
| METHOD_TO_BLOCK | Method contains block definition |
| SINGLETON_METHOD_TO_BLOCK | Singleton method contains block |
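Most of the names above follow a `<SOURCE>_TO_<TARGET>` pattern, so the (source kind, target kind) pairs we capture can be derived from the names themselves. The sketch below assumes definition kinds are spelled the way they appear in the relationship names (MODULE, SINGLETON_METHOD, and so on); `relationship_for` is a hypothetical helper. CLASS_INHERITS_FROM and METHOD_CALLS do not follow the pattern and are deliberately left out.

```python
from typing import Optional

# The "_TO_" relationship names listed in the tables above.
TO_RELATIONSHIPS = [
    "MODULE_TO_CLASS",
    "MODULE_TO_MODULE",
    "MODULE_TO_METHOD",
    "MODULE_TO_SINGLETON_METHOD",
    "MODULE_TO_CONSTANT",
    "MODULE_TO_LAMBDA",
    "MODULE_TO_PROC",
    "CLASS_TO_METHOD",
    "CLASS_TO_ATTRIBUTE",
    "CLASS_TO_CONSTANT",
    "CLASS_TO_SINGLETON_METHOD",
    "CLASS_TO_CLASS",
    "CLASS_TO_LAMBDA",
    "CLASS_TO_PROC",
    "METHOD_TO_BLOCK",
    "SINGLETON_METHOD_TO_BLOCK",
]

# ("MODULE", "CLASS") -> "MODULE_TO_CLASS", and so on.
PAIR_TO_TYPE = {tuple(name.split("_TO_")): name for name in TO_RELATIONSHIPS}


def relationship_for(source_kind: str, target_kind: str) -> Optional[str]:
    """Return the relationship name for a kind pair, or None if not captured."""
    return PAIR_TO_TYPE.get((source_kind, target_kind))
```

A lookup like this could double as a validation step: a pair such as ("METHOD", "CLASS") returns `None`, which signals that no such relationship is currently captured.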
Edited by Michael Usachenko