
Draft: Replace predicate_id on statements with relation type

Marcel Konrad requested to merge predicate-id-as-relation-label into master

As discussed, this MR explores replacing the predicate_id property of statements with the relationship type. To evaluate the performance impact, I conducted four experiments, running the profileNeo4jRepositories task in each. The database was restarted before each experiment, and I waited until CPU usage had settled at around 0-1%.

| Experiment | Description |
|------------|-------------|
| 1 | master branch (15a2b810) |
| 2 | predicate-id-as-relation-label without any indexes |
| 3 | predicate-id-as-relation-label with indexes for r.statement_id on relation types, where COUNT(type(rel)) > 1000 |
| 4 | predicate-id-as-relation-label with indexes for r.statement_id on every relation type |
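
For illustration, the index setup for experiments 3 and 4 could be generated along these lines. This is a minimal sketch, not the script used for the experiments: the function name `index_statements`, the index names, and the input mapping of relation type to count are hypothetical, and the statement syntax assumes a Neo4j version that supports relationship property indexes (4.3 or later).

```python
import re

# Assumption: relation types derived from predicate IDs are plain
# identifiers such as "P32"; anything else is rejected.
_TYPE_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def index_statements(type_counts, threshold=1000):
    """Build one CREATE INDEX statement per relation type.

    Only types with more than `threshold` relationships get an index
    (experiment 3). With threshold=0, every type that occurs at least
    once gets one (experiment 4).
    """
    statements = []
    for rel_type, count in sorted(type_counts.items()):
        if count <= threshold:
            continue
        if not _TYPE_PATTERN.fullmatch(rel_type):
            raise ValueError(f"unexpected relation type: {rel_type!r}")
        statements.append(
            f"CREATE INDEX statement_id_{rel_type} IF NOT EXISTS "
            f"FOR ()-[r:`{rel_type}`]-() ON (r.statement_id)"
        )
    return statements


if __name__ == "__main__":
    # Hypothetical counts, for demonstration only.
    for statement in index_statements({"P32": 5000, "P1": 10}):
        print(statement)
```

The counts themselves would have to come from the database, for example from a query that groups relationships by type(rel).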

Migration: I did not include a Neo4j-Migrations migration, because the query would always run into a transaction timeout. However, the following query can be used to migrate the database (it takes a couple of minutes to execute):

MATCH (a)-[r:RELATED]->(b)
WITH r.predicate_id AS newLabel, COLLECT(r) AS rels
CALL apoc.refactor.rename.type("RELATED", newLabel, rels)
YIELD total
RETURN SUM(total)

Results: Overall performance is significantly worse than on the current master branch (290% slower even with indexes), but I suspect this is due to how little query optimization has been done so far. Most queries should run significantly faster if we write the desired relation type directly into the relationship pattern (e.g. ()-[:P32]->()) instead of filtering by relation type after the fact with WHERE type(r) = "P32". The one drawback is that relationship types cannot be passed as query parameters, so we cannot make use of query memoization. However, individual queries using direct relationship types run slightly faster. For a full performance breakdown, see the following files:
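
The difference between the two query styles can be sketched as follows. This is an illustrative example under stated assumptions, not code from this MR: the function name, the parameter name `$predicateId`, the returned column, and the ID pattern are all hypothetical. Because the relationship type cannot be a parameter, it has to be written into the query text, so the predicate ID should be validated before it is interpolated.

```python
import re

# Assumption: predicate IDs are plain identifiers such as "P32".
_PREDICATE_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Style 1: one query text for all predicates; the type is checked after
# the match, and the predicate ID is passed as a parameter.
FILTERED_QUERY = (
    "MATCH (s)-[r]->(o) WHERE type(r) = $predicateId "
    "RETURN r.statement_id AS id"
)


def typed_query(predicate_id):
    """Style 2: write the relation type directly into the pattern.

    Every distinct predicate ID produces a distinct query text, which is
    why this style cannot share a single parameterized query.
    """
    if not _PREDICATE_ID.fullmatch(predicate_id):
        # Reject anything that could alter the query structure.
        raise ValueError(f"invalid predicate id: {predicate_id!r}")
    return f"MATCH (s)-[r:`{predicate_id}`]->(o) RETURN r.statement_id AS id"


if __name__ == "__main__":
    print(FILTERED_QUERY)
    print(typed_query("P32"))
```

Since each predicate yields its own query string, any caching keyed on the query text would hold one entry per predicate rather than a single shared entry.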

compare-1-to-4.txt summary1.txt summary2.txt summary3.txt summary4.txt

This MR would also revert !666 (merged)
