Research which DOIs have null uniclob IDs in the citationinfo table.
Background
There are 2099 items in the citationinfo table with a null uniclob ID, which means that we don't have their XML on file. It's possible that we do have the XML in the clobinfo table but that it's orphaned from the citationinfo table. This is exemplified by the following error (https://sentry.io/organizations/crossref/issues/1824960366/?project=1769602):
unable to process {\n "citationId": 65299718,\n "submissionId": 1362515871,\n "count": 1\n}
NotFoundException: unknown uniclob: unresolved={0}; uniclobId={1}
File "UnresolvedReferencesProcessor.java", line 222, in process
File "UnresolvedReferencesProcessor.java", line 198, in process
File "QueuedItemsProcessor.java", line 121, in run
File "SessionRunner.java", line 80, in run
File "ThreadPoolExecutor.java", line 1142, in runWorker
File "ThreadPoolExecutor.java", line 617, in run
File "Thread.java", line 745, in run
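The error above shows what missing XML looks like at processing time. The sketch below separates the two cases this issue cares about: a citationinfo row whose uniclob ID is NULL, and a row whose uniclob ID points at nothing in clobinfo. It models the tables as in-memory Python objects. `CitationInfo`, `classify`, `describe_failure` and the field names are hypothetical and are not the real schema or the processor's code. It also builds the failure message with its values filled in, because the logged message still contains the raw `{0}` and `{1}` placeholders and so does not show which uniclob ID failed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CitationInfo:
    # Hypothetical row shape for the citationinfo table.
    citation_id: int
    submission_id: int
    uniclob_id: Optional[int]  # None models a NULL uniclob ID


def classify(row: CitationInfo, clob_ids: set[int]) -> str:
    """Return 'null', 'dangling' or 'ok' for one citationinfo row.

    'null'     -> uniclob_id is NULL, so there is no XML to load.
    'dangling' -> uniclob_id is set, but no clobinfo row has that ID.
    'ok'       -> the referenced XML exists.
    """
    if row.uniclob_id is None:
        return "null"
    if row.uniclob_id not in clob_ids:
        return "dangling"
    return "ok"


def describe_failure(row: CitationInfo, clob_ids: set[int]) -> Optional[str]:
    # Build the message with the values actually substituted,
    # unlike the logged "unresolved={0}; uniclobId={1}".
    status = classify(row, clob_ids)
    if status == "ok":
        return None
    return (f"unknown uniclob: status={status}; citationId={row.citation_id}; "
            f"uniclobId={row.uniclob_id}")
```

For example, with `clob_ids = {101, 102}`, a row with `uniclob_id=None` is classified as `"null"` and a row with `uniclob_id=999` as `"dangling"`.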
Research:
- Which items have no XML? Are there common characteristics, e.g. publisher, prefix, date?
- Can we identify orphaned XML (i.e. XML in the clobinfo table that is not referenced by citationinfo)? How many of these are there?
- Can we redeposit or reprocess those submissions to recover the XML?
- Is it enough to simply redeposit, or do we need something more clever?
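For the first two research questions, a rough sketch: group the NULL-uniclob rows by DOI prefix and by deposit month to look for common characteristics, and find orphaned XML as the clobinfo IDs that no citationinfo row references (an anti-join). Against the real database this would presumably be SQL; the Python below only spells out the logic over in-memory rows. `Row`, its fields (especially `deposited`, since it isn't stated where a deposit date is stored), `profile_missing` and `orphaned_clobs` are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Row:
    # Hypothetical joined view of citationinfo plus submission metadata.
    citation_id: int
    doi: str
    deposited: date
    uniclob_id: Optional[int]  # None models a NULL uniclob ID


def doi_prefix(doi: str) -> str:
    # A DOI has the form "<prefix>/<suffix>", e.g. "10.5555/12345678".
    # The prefix identifies the registrant, so it is a cheap proxy for
    # grouping by publisher.
    return doi.split("/", 1)[0]


def profile_missing(rows: list[Row]) -> tuple[Counter, Counter]:
    """Count NULL-uniclob rows by DOI prefix and by deposit month (YYYY-MM)."""
    missing = [r for r in rows if r.uniclob_id is None]
    by_prefix = Counter(doi_prefix(r.doi) for r in missing)
    by_month = Counter(r.deposited.strftime("%Y-%m") for r in missing)
    return by_prefix, by_month


def orphaned_clobs(clob_ids: set[int], rows: list[Row]) -> set[int]:
    """clobinfo IDs that no citationinfo row references (an anti-join)."""
    referenced = {r.uniclob_id for r in rows if r.uniclob_id is not None}
    return clob_ids - referenced
```

`rows` is meant to be the whole citationinfo table here; passing only a subset would make `orphaned_clobs` over-report.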
Definition of ready
- Product owner: @SaraBowman
- Tech lead: @jonmstark
- Service:: label applied
- Definition of done updated
- Weight applied
Definition of done
- Unit tests identified, implemented, and passing
- Code reviewed
- Available via a staging URL
- Knowledge base reviewed and updated
- Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
- Acceptance criteria met
- Answer the research questions above
- Liaise with the support team / make an announcement
- Formulate an issue to perform the fixes, including a spec for the patch tool
Notes
Edited by Patrick Polischuk