Research on de-duplicating title records that were duplicated by the content registration logic
Research task: Understand the scale of the duplicated title records created prior to this fix. Can we pull a list of the ISBNs that have multiple CiteIDs in our system? Understanding the scale of the problem, as well as the steps Support uses to correct duplicates manually and the time this takes, will help us find a reasonable solution.
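As a starting point for pulling that list, here is a minimal sketch in Python. It assumes the title records can be exported as (ISBN, CiteID, title) tuples; the function names, the export shape, and the ISBN normalization (stripping hyphens and spaces) are all assumptions for illustration, not a description of how the system stores this data.

```python
import csv
import io
from collections import defaultdict


def find_duplicated_isbns(records):
    """Return {isbn: [(cite_id, title), ...]} for ISBNs with more than one CiteID.

    `records` is an iterable of (isbn, cite_id, title) tuples (assumed export shape).
    """
    by_isbn = defaultdict(dict)
    for isbn, cite_id, title in records:
        # Assumption: hyphenated and unhyphenated forms denote the same ISBN.
        key = isbn.replace("-", "").replace(" ", "").upper()
        by_isbn[key][cite_id] = title
    return {
        isbn: sorted(cites.items())  # sorted by CiteID
        for isbn, cites in by_isbn.items()
        if len(cites) > 1
    }


def to_csv(duplicates):
    """Render the duplicates as CSV text with columns ISBN, CiteID, Title."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ISBN", "CiteID", "Title"])
    for isbn in sorted(duplicates):
        for cite_id, title in duplicates[isbn]:
            writer.writerow([isbn, cite_id, title])
    return out.getvalue()
```

Counting the keys of the returned dictionary gives a rough measure of the scale: one entry per ISBN that needs a merge.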
More context:
Support has not yet received any new examples of title records being duplicated since Mike's fix in #984 (closed). That's the good news.
But we have received dozens of requests to clean up duplicated title records. Most recently, in the last couple of weeks, IEEE has requested that we merge about 40 different conference-level records (here's an example Zendesk ticket: https://crossref.zendesk.com/agent/tickets/371046).
Between IEEE, ASME, and ACS we could easily see hundreds of additional support requests to manually clean up the title records that were created before Mike's improvement.
This is what a duplicated title record looks like in the admin tool:
As you can see, the Cite IDs on these duplicated records are in chronological order and the titles are identical.
Could the technical team automate a merging process for these duplicate title records to save the support team from this toil? We'd need to merge all of the DOIs of each Cite ID together (in Support, we retain the oldest Cite ID and merge the newer Cite ID records into that initial record) and then delete each duplicate title record once no DOIs are registered against its Cite ID.
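The merge rule described above could be sketched roughly as follows. This is an in-memory illustration only: the `dois_by_cite` mapping and the `merge_cite_ids` function are hypothetical, and it assumes that the lowest Cite ID is the oldest, based on the observation that the Cite IDs on duplicated records are in chronological order.

```python
def merge_cite_ids(cite_ids, dois_by_cite):
    """Merge all DOIs of duplicate Cite IDs into the oldest one.

    cite_ids:      the Cite IDs that share one title/ISBN.
    dois_by_cite:  hypothetical mapping {cite_id: set of DOIs}; modified in place.
    Returns the Cite ID that was retained.
    """
    # Assumption: Cite IDs are assigned chronologically, so min() is the oldest.
    keep = min(cite_ids)
    kept_dois = dois_by_cite.setdefault(keep, set())
    for cite_id in sorted(cite_ids):
        if cite_id == keep:
            continue
        # Move every DOI from the newer record onto the retained record.
        kept_dois.update(dois_by_cite.get(cite_id, set()))
        dois_by_cite[cite_id] = set()
        # Delete the duplicate title record only once no DOIs remain on it.
        if not dois_by_cite[cite_id]:
            del dois_by_cite[cite_id]
    return keep
```

A real implementation would need to perform the move and the delete against the actual registration database, ideally with a dry-run mode so Support can review the planned merges first.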
How urgent
Moderately urgent. IEEE will have additional examples, as will other members.
Definition of ready
- Product owner: @SaraBowman
- Tech lead: @myalter
- Service:: or C:: label applied
- Definition of done updated
- Acceptance testing plan: list and findings delivered to @SaraBowman
- Weight applied
Definition of done
- Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
- Knowledge base reviewed and updated
- Acceptance criteria met
  - A list (csv?) of ISBNs that are attached to multiple CiteIDs and their titles: ISBN, CiteID, Title
  - Outline of the manual steps Support takes to correct these, and an estimate of the time to do them
- Acceptance testing passed