Crawler overwrites and deletes existing data

I observed that the crawler overwrites and deletes properties during an update. For example, before crawling, there was a Person record:

Person:
  full_name: Some Name
  email: some.name@test.org
  legacy_id: 12

Now the crawler finds the same person in a different source, with an ORCID but without an email address or legacy id:

Person:
  full_name: Some Name
  ORCID: 12345

Since full_name is the identifiable, the existing record is updated. During the update, the old properties are removed, so that the record afterwards looks like

Person:
  full_name: Some Name
  ORCID: 12345

without email and legacy id.

This is probably a bug in Pylib's merge function and needs to be fixed there, but we need to investigate.
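
To make the difference between the observed and the expected behaviour concrete, here is a minimal sketch that uses plain Python dicts in place of records. The functions `overwrite` and `merge` are hypothetical and only illustrate the two behaviours. They are not the crawler's or Pylib's actual API, and the real merge function may have to handle more cases (e.g. conflicting values or list properties).

```python
def overwrite(existing: dict, crawled: dict) -> dict:
    """Observed behaviour: the crawled record replaces the stored one wholesale."""
    return dict(crawled)


def merge(existing: dict, crawled: dict) -> dict:
    """Expected behaviour: keep stored properties, add or update crawled ones."""
    merged = dict(existing)   # start from what is already stored
    merged.update(crawled)    # crawled values are added or take precedence
    return merged


# Hypothetical stand-ins for the stored and the newly crawled Person record.
existing = {"full_name": "Some Name", "email": "some.name@test.org", "legacy_id": 12}
crawled = {"full_name": "Some Name", "ORCID": 12345}

print(overwrite(existing, crawled))  # email and legacy_id are lost
print(merge(existing, crawled))      # email and legacy_id are kept, ORCID is added
```

With the expected behaviour, the Person record would keep email and legacy_id and gain the ORCID. Whether crawled values should win over stored ones on conflict is an assumption of this sketch and would need to be decided as part of the fix.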