Merge conflicts in `split_into_inserts_and_updates` when cached entity references a record without id
Difficult to reproduce, but during crawling I stumbled upon this nasty error:
Traceback (most recent call last):
  File "/home/fspreck_indiscale/indiscale/caosdb/dataset-crawler/test_pangaea_crawler.py", line 57, in <module>
    crawl_pangaea_dir()
  File "/home/fspreck_indiscale/indiscale/caosdb/dataset-crawler/test_pangaea_crawler.py", line 52, in crawl_pangaea_dir
    ins, ups = crawler.synchronize(unique_names=False)
  File "/home/fspreck_indiscale/.local/lib/python3.10/site-packages/caoscrawler/crawl.py", line 483, in synchronize
    return self._synchronize(self.crawled_data, commit_changes, unique_names=unique_names)
  File "/home/fspreck_indiscale/.local/lib/python3.10/site-packages/caoscrawler/crawl.py", line 959, in _synchronize
    to_be_inserted, to_be_updated = self.split_into_inserts_and_updates(crawled_data)
  File "/home/fspreck_indiscale/.local/lib/python3.10/site-packages/caoscrawler/crawl.py", line 721, in split_into_inserts_and_updates
    merge_entities(newrecord, record)
  File "/home/fspreck_indiscale/.local/lib/python3.10/site-packages/caosdb/apiutils.py", line 436, in merge_entities
    raise RuntimeError(
RuntimeError: Merge conflict:
Entity a (2911, None) has a Property 'license' with value=<Record name="CC-BY-3.0">
  <Parent name="license"/>
  <Property name="full_name" importance="FIX" flag="inheritance:FIX">Creative Commons Attribution 3.0 Unported</Property>
  <Property name="url" importance="FIX" flag="inheritance:FIX">https://creativecommons.org/licenses/by/3.0/</Property>
</Record>
Entity b (None, None) has a Property 'license' with value=<Record id="2624" name="CC-BY-3.0">
  <Parent name="license"/>
  <Property name="full_name" importance="FIX" flag="inheritance:FIX">Creative Commons Attribution 3.0 Unported</Property>
  <Property name="url" importance="FIX" flag="inheritance:FIX">https://creativecommons.org/licenses/by/3.0/</Property>
</Record>
Idea for reproducing
Record D1 (has an id) references L1 (no id).
Record D2 (no id) references L2 (has an id).
Apart from the ids, L1 and L2 are identical; D1 and D2 are identical too, or can at least be merged.
Calling `split_into_inserts_and_updates([L1, D1, D2, L2])` with appropriate caches and identifiables should then be a suitable unit test.
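To make the constellation concrete, here is a minimal, self-contained sketch in plain Python. It does not use caosdb or the crawler: `Rec`, `strict_merge` and `same_apart_from_id` are hypothetical stand-ins invented for this illustration, and the record name `dataset` is made up. It only shows how a merge that compares referenced records including their ids would report a conflict for D1 and D2, even though L1 and L2 agree in everything but the id. Whether `merge_entities` fails for exactly this reason is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Rec:
    """Hypothetical stand-in for a record; NOT the caosdb Record class."""
    name: str
    id: Optional[int] = None
    properties: Dict[str, Any] = field(default_factory=dict)


def same_apart_from_id(a: Any, b: Any) -> bool:
    """Compare two values while ignoring the ids of (nested) Rec objects."""
    if isinstance(a, Rec) and isinstance(b, Rec):
        if a.name != b.name or a.properties.keys() != b.properties.keys():
            return False
        return all(
            same_apart_from_id(a.properties[k], b.properties[k])
            for k in a.properties
        )
    return a == b


def strict_merge(target: Rec, source: Rec) -> None:
    """Merge source into target; differing values (ids included) conflict."""
    for key, value in source.properties.items():
        # The dataclass-generated == also compares the id field.
        if key in target.properties and target.properties[key] != value:
            raise RuntimeError(f"Merge conflict on property {key!r}")
        target.properties[key] = value


url = "https://creativecommons.org/licenses/by/3.0/"
L1 = Rec("CC-BY-3.0", properties={"url": url})            # no id
L2 = Rec("CC-BY-3.0", id=2624, properties={"url": url})   # has id
D1 = Rec("dataset", id=2911, properties={"license": L1})  # has id
D2 = Rec("dataset", properties={"license": L2})           # no id

try:
    strict_merge(D1, D2)
    conflict = False
except RuntimeError:
    conflict = True
```

In this toy model `conflict` ends up `True`, while `same_apart_from_id(L1, L2)` holds. A real unit test would build the four records with the actual crawler classes and identifiables instead.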
Possible fix
A WIP fix, at least for this specific case, is in branch `f-merge-conflict-id`.