Merge conflicts in `split_into_inserts_and_updates` when cached entity references a record without id

Difficult to reproduce, but during crawling I stumbled upon this nasty error:

Traceback (most recent call last):
  File "/home/fspreck_indiscale/indiscale/caosdb/dataset-crawler/test_pangaea_crawler.py", line 57, in <module>
    crawl_pangaea_dir()
  File "/home/fspreck_indiscale/indiscale/caosdb/dataset-crawler/test_pangaea_crawler.py", line 52, in crawl_pangaea_dir
    ins, ups = crawler.synchronize(unique_names=False)
  File "/home/fspreck_indiscale/.local/lib/python3.10/site-packages/caoscrawler/crawl.py", line 483, in synchronize
    return self._synchronize(self.crawled_data, commit_changes, unique_names=unique_names)
  File "/home/fspreck_indiscale/.local/lib/python3.10/site-packages/caoscrawler/crawl.py", line 959, in _synchronize
    to_be_inserted, to_be_updated = self.split_into_inserts_and_updates(crawled_data)
  File "/home/fspreck_indiscale/.local/lib/python3.10/site-packages/caoscrawler/crawl.py", line 721, in split_into_inserts_and_updates
    merge_entities(newrecord, record)
  File "/home/fspreck_indiscale/.local/lib/python3.10/site-packages/caosdb/apiutils.py", line 436, in merge_entities
    raise RuntimeError(
RuntimeError: Merge conflict:
Entity a (2911, None) has a Property 'license' with value=<Record name="CC-BY-3.0">
  <Parent name="license"/>
  <Property name="full_name" importance="FIX" flag="inheritance:FIX">Creative Commons Attribution 3.0 Unported</Property>
  <Property name="url" importance="FIX" flag="inheritance:FIX">https://creativecommons.org/licenses/by/3.0/</Property>
</Record>

Entity b (None, None) has a Property 'license' with value=<Record id="2624" name="CC-BY-3.0">
  <Parent name="license"/>
  <Property name="full_name" importance="FIX" flag="inheritance:FIX">Creative Commons Attribution 3.0 Unported</Property>
  <Property name="url" importance="FIX" flag="inheritance:FIX">https://creativecommons.org/licenses/by/3.0/</Property>
</Record>

Idea for reproducing

Record D1 (has an id) references L1 (no id). Record D2 (no id) references L2 (has an id).

Apart from their ids, L1 and L2 are identical; D1 and D2 are identical too, or can at least be merged.

split_into_inserts_and_updates([L1, D1, D2, L2]) with appropriate caches and identifiables should be a suitable unit test.
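To make the scenario concrete without a running server, here is a minimal, self-contained sketch of the D1/L1 vs. D2/L2 situation. Everything in it is an assumption for illustration: `Rec`, `same_ignoring_missing_id`, `merge` and `make_pair` are hypothetical stand-ins and not the caosdb or caoscrawler API, and the lenient comparison is just one conceivable way to treat a missing id, not necessarily what the WIP branch does. The ids 2911 and 2624 are taken from the traceback above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class Rec:
    """Hypothetical stand-in for a record; NOT the caosdb Entity class."""
    name: str
    id: Optional[int] = None
    props: Dict[str, Any] = field(default_factory=dict)


def same_ignoring_missing_id(a: Any, b: Any) -> bool:
    """Compare two values; for Rec values, ids only clash if both are set."""
    if isinstance(a, Rec) and isinstance(b, Rec):
        if a.id is not None and b.id is not None and a.id != b.id:
            return False
        if a.name != b.name or a.props.keys() != b.props.keys():
            return False
        return all(same_ignoring_missing_id(a.props[k], b.props[k])
                   for k in a.props)
    return a == b


def merge(a: Rec, b: Rec, strict: bool) -> Rec:
    """Merge b into a in place; raise RuntimeError on conflicting values.

    strict=True compares referenced records including their ids (this
    mimics the failure); strict=False tolerates one side lacking an id.
    """
    for key, value in b.props.items():
        if key not in a.props:
            a.props[key] = value
            continue
        existing = a.props[key]
        if strict:
            equal = existing == value
        else:
            equal = same_ignoring_missing_id(existing, value)
        if not equal:
            raise RuntimeError(f"Merge conflict on property {key!r}")
        # Keep the referenced record that already carries an id.
        if isinstance(existing, Rec) and isinstance(value, Rec) \
                and existing.id is None:
            a.props[key] = value
    if a.id is None:
        a.id = b.id
    return a


def make_pair() -> Tuple[Rec, Rec]:
    """Build D1 (id) -> L1 (no id) and D2 (no id) -> L2 (id)."""
    url = "https://creativecommons.org/licenses/by/3.0/"
    l1 = Rec("CC-BY-3.0", props={"url": url})
    l2 = Rec("CC-BY-3.0", id=2624, props={"url": url})
    d1 = Rec("dataset", id=2911, props={"license": l1})
    d2 = Rec("dataset", props={"license": l2})
    return d1, d2
```

With `strict=True`, merging the pair from `make_pair()` raises a `RuntimeError`, because L1 and L2 differ only in their id. With `strict=False` the merge goes through and the result references the license record that has an id. A real unit test would of course have to go through `split_into_inserts_and_updates` with the actual caches and identifiables.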

Possible fix

A WIP fix, at least for this specific case, is in branch f-merge-conflict-id.