Update Crossref agent and modify percolator in order to add reference metadata to Event Data
What
As a Scholix user, I want Crossref data citation events from 2020 onward.
Why
So that I can find connections between published articles and the datasets they rely on.
Background
The old agent retrieved data from OAI-PMH; we want to switch to the REST API.
Some external APIs may have changed, so the corresponding integrations may need re-implementing.
Because of issues#883 (closed), we removed the worktype lookup at index time, so the subject metadata now needs to be included by the agent. The agent also needs to request that the percolator look up the object worktype id. Both are needed to expose these events in the Scholix endpoint of the query API.
Definition of ready
- Product owner: @mrittman
- Tech lead: @ppandis
- Service:: label applied
- Definition of done updated
- Acceptance testing plan: input/output from local testing (from the agent up to the API Scholix endpoint)
- Weight applied
Definition of done
- Unit tests identified, implemented, and passing
- Code reviewed
- Provide a demo of correct output based on input
- Knowledge base reviewed and updated
- Public documentation reviewed and updated
- Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
- Acceptance criteria met
- All Crossref references for DOIs with open references are visible in the Event Data API
- Each reference corresponds to one event. The subject is always a Crossref DOI. The object may be a Crossref or DataCite DOI.
- The observation's input-content will be a DOI if the reference has one. Otherwise it will be the reference's unstructured content. The percolator will treat both as plain text and validate the DOI if one exists.
- The subject work type is derived from the Crossref metadata via the REST API
- The object worktype is derived from the API of the registration agency of the object (DataCite or Crossref)
- Closed references produce no events (the agent must check the reference visibility for the subject DOI metadata)
- The agent should be designed to be extensible in the future so that we can add relations as well (the activity id should be generated for future compatibility)
Prior to and during Backlog Refinement, consider the potential impacts this user story may have on the following areas:
- Billing/costs
- Internal documentation
- External documentation
- Schema
- Outputs
- Operations
- Support & Membership experience
- Outreach & Communications
- Testing
- Internationalization
- Accessibility
- Metrics, analytics, reporting
Additional details about the above items can be found here.
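The acceptance criteria on input-content and closed references could be sketched roughly as follows. This is a minimal illustration in Python, not the agent's actual code: the function names and the `references_open` flag are hypothetical, the `"plaintext"` observation type name is an assumption, and how reference visibility is read from the subject metadata is deliberately left out.

```python
from typing import Optional


def input_content(reference: dict) -> Optional[str]:
    """Pick the observation's input-content for one reference.

    Prefer the DOI; fall back to the unstructured citation text.
    Returns None when the reference has neither.
    """
    return reference.get("DOI") or reference.get("unstructured")


def observations_for_work(message: dict, references_open: bool) -> list:
    """Return one plain-text observation per usable reference.

    `references_open` is a hypothetical flag: the caller is assumed to have
    already checked the reference visibility of the subject DOI.
    """
    if not references_open:
        return []  # closed references produce no events
    observations = []
    for reference in message.get("reference", []):
        content = input_content(reference)
        if content is not None:
            # "plaintext" is an assumed observation type name.
            observations.append({"type": "plaintext", "input-content": content})
    return observations
```

A reference with both a DOI and unstructured text yields the DOI; a reference with only unstructured text yields that text, leaving DOI detection to the percolator.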
Notes
Combined input/output examples
- (Crossref journal-article to DataCite dataset using DOI) -> Scholix
- (Crossref journal-article to DataCite dataset using unstructured) -> Scholix
- (Crossref journal-article to Crossref journal-article) -> Ignored by Scholix
- (Crossref journal-article to Crossref dataset) -> Scholix
- (Crossref journal-article to DataCite article) -> Ignored by Scholix
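The five cases above suggest a simple filter, sketched here in Python. The rule itself is an assumption inferred from these examples (a link is exposed in Scholix only when it connects literature to a dataset), and the type mapping and function names are hypothetical rather than taken from the query API.

```python
# Hypothetical mapping from work_type_id to the Scholix type names
# that appear in the examples in this issue.
SCHOLIX_TYPE = {
    "journal-article": "literature",
    "text": "literature",
    "dataset": "dataset",
}


def is_scholix_link(subj_work_type: str, obj_work_type: str) -> bool:
    """Assumed rule: keep the event only if it links literature to a dataset."""
    types = {SCHOLIX_TYPE.get(subj_work_type), SCHOLIX_TYPE.get(obj_work_type)}
    return types == {"literature", "dataset"}
```

Unknown work types map to nothing and are therefore ignored, which errs on the side of not exposing a link.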
We will assume that all references are within the following DOI: https://api.crossref.org/works/10.1038/s41597-020-00742-5?mailto=ppandis@crossref.org
Each row below lists: DOI - reference DOI - API call.
Rows 1 and 2 use the same reference because that DOI is a good example of a DOI appearing in the unstructured text.
- 10.1038/s41597-020-00742-5 - 10.5061/dryad.fn2z34trn - https://api.datacite.org/works/10.5061/dryad.fn2z34trn?include=resource-type
- 10.1038/s41597-020-00742-5 - 10.5061/dryad.fn2z34trn - https://api.datacite.org/works/10.5061/dryad.fn2z34trn?include=resource-type (find a different one if possible)
- 10.1038/s41597-020-00742-5 - 10.2105/AJPH.2014.302190 - https://api.crossref.org/v1/works/10.2105/AJPH.2014.302190
- 10.7287/peerj.preprints.26962v1/supp-1 - null - https://api.crossref.org/v1/works/10.7287/peerj.preprints.26962v1/supp-1
- 10.1038/s41597-020-00742-5 - 10.1001/archpedi.159.5.470 - https://api.datacite.org/works/10.1001/archpedi.159.5.470?include=resource-type
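The API calls listed above follow two URL shapes, one per registration agency. A minimal Python sketch of building them is shown below; the function name and the `agency` argument are hypothetical, and how the agent determines the registration agency of a DOI is not covered.

```python
from urllib.parse import quote

# URL templates copied from the API calls listed in this issue.
WORKS_ENDPOINTS = {
    "crossref": "https://api.crossref.org/v1/works/{doi}",
    "datacite": "https://api.datacite.org/works/{doi}?include=resource-type",
}


def worktype_lookup_url(doi: str, agency: str) -> str:
    """Build the works URL used to look up a DOI's work type."""
    template = WORKS_ENDPOINTS[agency.lower()]
    # Keep "/" literal so the result matches the URLs above; escape the rest.
    return template.format(doi=quote(doi, safe="/"))
```

An unknown agency raises `KeyError`, so an unsupported registration agency fails loudly instead of producing a wrong URL.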
Cayenne output
It is not yet clear what the reference for 10.7287/peerj.preprints.26962v1/supp-1 would look like, so only the DOI is included for now.
One example here has an ISBN and no DOI. We can create an action from it, but currently no event. We should still check what the percolator returns, since it might use the URI in the ISBN field as the subject ID.
{
"status": "ok",
"message-type": "work",
"message-version": "1.0.0",
"message": {
"DOI": "10.1038/s41597-020-00742-5",
"type": "journal-article",
"reference": [
{
"key": "742_CR17",
"author": "C Zipfel",
"year": "2020",
"unstructured": "Zipfel C., Garnier R., Kuney M. & Bansal S. School exemptions in the United States. Dryad https://doi.org/10.5061/dryad.fn2z34trn (2020).",
"DOI": "10.5061/dryad.fn2z34trn",
"doi-asserted-by": "publisher"
},
{
"key": "742_CR27",
"author": "C Zipfel",
"year": "2020",
"unstructured": "Zipfel C., Garnier R., Kuney M. & Bansal S. School exemptions in the United States. Dryad https://doi.org/10.5061/dryad.fn2z34trn (2020)."
},
{
"key": "742_CR1",
"doi-asserted-by": "publisher",
"first-page": "e62",
"DOI": "10.2105/AJPH.2014.302190",
"volume": "104",
"author": "E Wang",
"year": "2014",
"unstructured": "Wang, E., Clymer, J., Davis-Hayes, C. & Buttenheim, A. Nonmedical exemptions from school immunization requirements: a systematic review. Am. J. Public. Health 104, e62–e84 (2014).",
"journal-title": "Am. J. Public. Health"
},
{
"DOI": "10.7287/peerj.preprints.26962v1/supp-1"
},
{
"key": "742_CR21",
"doi-asserted-by": "publisher",
"first-page": "470",
"DOI": "10.1001/archpedi.159.5.470",
"volume": "159",
"author": "DA Salmon",
"year": "2005",
"unstructured": "Salmon, D. A. et al. Factors associated with refusal of childhood vaccines among parents of school-aged children: a case-control study. Arch. Pediatr. Adolesc. Med. 159, 470–476 (2005).",
"journal-title": "Arch. Pediatr. Adolesc. Med."
},
{
"key": "ref=128",
"isbn-type": "print",
"volume-title": "Comportement au vent des ponts",
"author": "Christian Crémona",
"year": "2002",
"unstructured": "Crémona, Christian (2002), \"Comportement au vent des ponts\" , pp. 492",
"ISBN": "http://id.crossref.org/isbn/9782859783600"
}
]
}
}
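For the reference that carries only unstructured text, the DOI has to be found inside the citation string. The sketch below shows one way to do that in Python with a loose, commonly used DOI pattern. It is an illustration only and not the percolator's actual matching or validation logic; the pattern and function name are assumptions.

```python
import re

# Deliberately loose DOI pattern: "10.", a 4-9 digit prefix, "/", then a suffix
# that runs until whitespace, a quote, or an angle bracket.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def find_dois(text: str) -> list:
    """Return DOI-like strings found in free text, minus trailing punctuation."""
    return [match.rstrip(".,;:)") for match in DOI_PATTERN.findall(text)]
```

Candidates found this way would still need to be validated, for example by resolving them, before being used as an event object.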
Event
- (Crossref journal-article to DataCite dataset using DOI) -> Scholix
{
"license": "https://creativecommons.org/publicdomain/zero/1.0/",
"terms": "https://doi.org/10.13003/CED-terms-of-use",
"obj_id": "https://doi.org/10.5061/dryad.fn2z34trn",
"source_token": "36c35e23-8757-4a9d-aacf-345e9b7eb50d",
"occurred_at": "2014-05-18T13:26:49.000Z",
"subj_id": "https://doi.org/10.1038/s41597-020-00742-5",
"id": "962de995-17d7-482b-8caf-0a5b7d8f4d6f",
"action": "add",
"subj": {
"pid": "https://doi.org/10.1038/s41597-020-00742-5",
"work_type_id": "journal-article"
},
"source_id": "crossref",
"obj": {
"pid": "https://doi.org/10.5061/dryad.fn2z34trn",
"work_type_id": "dataset"
},
"timestamp": "2020-11-19T00:00:00Z",
"relation_type_id": "references"
}
- (Crossref journal-article to DataCite dataset using unstructured) -> Scholix
{
"license": "https://creativecommons.org/publicdomain/zero/1.0/",
"terms": "https://doi.org/10.13003/CED-terms-of-use",
"obj_id": "https://doi.org/10.5061/dryad.fn2z34trn",
"source_token": "36c35e23-8757-4a9d-aacf-345e9b7eb50d",
"occurred_at": "2014-05-18T13:26:49.000Z",
"subj_id": "https://doi.org/10.1038/s41597-020-00742-5",
"id": "4176dc6c-761e-4a68-909a-4597090565ea",
"action": "add",
"subj": {
"pid": "https://doi.org/10.1038/s41597-020-00742-5",
"work_type_id": "journal-article"
},
"source_id": "crossref",
"obj": {
"pid": "https://doi.org/10.5061/dryad.fn2z34trn",
"work_type_id": "dataset"
},
"timestamp": "2020-11-19T00:00:00Z",
"relation_type_id": "references"
}
- (Crossref journal-article to Crossref journal-article) -> Ignored by Scholix
{
"license": "https://creativecommons.org/publicdomain/zero/1.0/",
"terms": "https://doi.org/10.13003/CED-terms-of-use",
"obj_id": "https://doi.org/10.2105/AJPH.2014.302190",
"source_token": "36c35e23-8757-4a9d-aacf-345e9b7eb50d",
"occurred_at": "2014-05-18T13:26:49.000Z",
"subj_id": "https://doi.org/10.1038/s41597-020-00742-5",
"id": "6fce55d8-a0ba-4feb-a1be-3167ab212a3c",
"action": "add",
"subj": {
"pid": "https://doi.org/10.1038/s41597-020-00742-5",
"work_type_id": "journal-article"
},
"source_id": "crossref",
"obj": {
"pid": "https://doi.org/10.2105/AJPH.2014.302190",
"work_type_id": "text"
},
"timestamp": "2020-11-19T00:00:00Z",
"relation_type_id": "references"
}
- (Crossref journal-article to Crossref dataset) -> Scholix
{
"license": "https://creativecommons.org/publicdomain/zero/1.0/",
"terms": "https://doi.org/10.13003/CED-terms-of-use",
"obj_id": "https://doi.org/10.7287/peerj.preprints.26962v1/supp-1",
"source_token": "36c35e23-8757-4a9d-aacf-345e9b7eb50d",
"occurred_at": "2018-05-27T00:00:00.000Z",
"subj_id": "https://doi.org/10.1038/s41597-020-00742-5",
"id": "433e0fda-6fd4-49a7-b0f0-e86c9af57647",
"action": "add",
"subj": {
"pid": "https://doi.org/10.1038/s41597-020-00742-5",
"work_type_id": "journal-article"
},
"source_id": "crossref",
"obj": {
"pid": "https://doi.org/10.7287/peerj.preprints.26962v1/supp-1",
"work_type_id": "dataset"
},
"timestamp": "2020-11-19T00:00:00Z",
"relation_type_id": "references"
}
- (Crossref journal-article to DataCite article) -> Ignored by Scholix
{
"license": "https://creativecommons.org/publicdomain/zero/1.0/",
"terms": "https://doi.org/10.13003/CED-terms-of-use",
"obj_id": "https://doi.org/10.1001/archpedi.159.5.470",
"source_token": "36c35e23-8757-4a9d-aacf-345e9b7eb50d",
"occurred_at": "2014-05-18T13:26:49.000Z",
"subj_id": "https://doi.org/10.1038/s41597-020-00742-5",
"id": "efeb0a6d-322c-4dd8-b478-2b2b49d3cc8e",
"action": "add",
"subj": {
"pid": "https://doi.org/10.1038/s41597-020-00742-5",
"work_type_id": "journal-article"
},
"source_id": "crossref",
"obj": {
"pid": "https://doi.org/10.1001/archpedi.159.5.470",
"work_type_id": "text"
},
"timestamp": "2020-11-19T00:00:00Z",
"relation_type_id": "references"
}
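The event examples above share one shape, which can be sketched as a small Python builder. This is a hypothetical helper for producing test fixtures, not how the percolator assembles events: the function name and parameters are assumptions, the id is simply a random UUID, and `occurred_at`, `timestamp`, and `source_token` are passed in rather than derived.

```python
import uuid

# Constants copied from the example events in this issue.
LICENSE = "https://creativecommons.org/publicdomain/zero/1.0/"
TERMS = "https://doi.org/10.13003/CED-terms-of-use"
DOI_RESOLVER = "https://doi.org/"


def build_reference_event(subj_doi, subj_work_type, obj_doi, obj_work_type,
                          occurred_at, timestamp, source_token):
    """Build a 'references' event between two DOIs."""
    subj_pid = DOI_RESOLVER + subj_doi
    obj_pid = DOI_RESOLVER + obj_doi
    return {
        "license": LICENSE,
        "terms": TERMS,
        "id": str(uuid.uuid4()),  # random id, for illustration only
        "action": "add",
        "source_id": "crossref",
        "source_token": source_token,
        "relation_type_id": "references",
        "occurred_at": occurred_at,
        "timestamp": timestamp,
        # subj_id/obj_id and the nested pid are built from the same value,
        # so they cannot drift apart.
        "subj_id": subj_pid,
        "subj": {"pid": subj_pid, "work_type_id": subj_work_type},
        "obj_id": obj_pid,
        "obj": {"pid": obj_pid, "work_type_id": obj_work_type},
    }
```

Deriving `obj_id` and `obj.pid` from one variable avoids the kind of mismatch that is easy to introduce when editing sample events by hand.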
Scholix
- (Crossref journal-article to DataCite dataset using DOI) -> Scholix
{
"LinkPublicationDate": "2020-11-19T00:00:00Z",
"LinkProvider": [
{
"Name": "Crossref"
}
],
"RelationshipType": {
"Name": "References"
},
"LicenseURL": "https://creativecommons.org/publicdomain/zero/1.0/",
"Url": "https://api.eventdata.crossref.org/v1/events/scholix/962de995-17d7-482b-8caf-0a5b7d8f4d6f",
"Source": {
"Identifier": {
"ID": "10.1038/s41597-020-00742-5",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.1038/s41597-020-00742-5"
},
"Type": {
"Name": "literature"
}
},
"Target": {
"Identifier": {
"ID": "10.5061/dryad.fn2z34trn",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.5061/dryad.fn2z34trn"
},
"Type": {
"Name": "dataset"
}
}
}
- (Crossref journal-article to DataCite dataset using unstructured) -> Scholix
{
"LinkPublicationDate": "2020-11-19T00:00:00Z",
"LinkProvider": [
{
"Name": "Crossref"
}
],
"RelationshipType": {
"Name": "References"
},
"LicenseURL": "https://creativecommons.org/publicdomain/zero/1.0/",
"Url": "https://api.eventdata.crossref.org/v1/events/scholix/4176dc6c-761e-4a68-909a-4597090565ea",
"Source": {
"Identifier": {
"ID": "10.1038/s41597-020-00742-5",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.1038/s41597-020-00742-5"
},
"Type": {
"Name": "literature"
}
},
"Target": {
"Identifier": {
"ID": "10.5061/dryad.fn2z34trn",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.5061/dryad.fn2z34trn"
},
"Type": {
"Name": "dataset"
}
}
}
- (Crossref journal-article to Crossref journal-article) -> Ignored by Scholix
- (Crossref journal-article to Crossref dataset) -> Scholix
{
"LinkPublicationDate": "2020-11-19T00:00:00Z",
"LinkProvider": [
{
"Name": "Crossref"
}
],
"RelationshipType": {
"Name": "References"
},
"LicenseURL": "https://creativecommons.org/publicdomain/zero/1.0/",
"Url": "https://api.eventdata.crossref.org/v1/events/scholix/433e0fda-6fd4-49a7-b0f0-e86c9af57647",
"Source": {
"Identifier": {
"ID": "10.1038/s41597-020-00742-5",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.1038/s41597-020-00742-5"
},
"Type": {
"Name": "literature"
}
},
"Target": {
"Identifier": {
"ID": "10.7287/peerj.preprints.26962v1/supp-1",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.7287/peerj.preprints.26962v1/supp-1"
},
"Type": {
"Name": "dataset"
}
}
}
- (Crossref journal-article to DataCite article) -> Ignored by Scholix
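The relationship between the event examples and the Scholix examples can be sketched as a field-by-field mapping. The Python below is inferred from the samples in this issue only; the function names and the work type mapping are hypothetical, and the real Scholix endpoint may derive these fields differently.

```python
SCHOLIX_URL = "https://api.eventdata.crossref.org/v1/events/scholix/"
DOI_RESOLVER = "https://doi.org/"
# Hypothetical mapping covering only the work types used in the examples.
SCHOLIX_TYPE = {"journal-article": "literature", "dataset": "dataset"}


def scholix_entity(entity: dict) -> dict:
    """Convert an event's subj/obj block into a Scholix Source/Target block."""
    pid = entity["pid"]
    doi = pid[len(DOI_RESOLVER):] if pid.startswith(DOI_RESOLVER) else pid
    return {
        "Identifier": {"ID": doi, "IDScheme": "DOI", "IDUrl": pid},
        "Type": {"Name": SCHOLIX_TYPE[entity["work_type_id"]]},
    }


def event_to_scholix(event: dict) -> dict:
    """Map one event to the Scholix link shape shown in the examples."""
    return {
        "LinkPublicationDate": event["timestamp"],
        "LinkProvider": [{"Name": "Crossref"}],
        "RelationshipType": {"Name": event["relation_type_id"].capitalize()},
        "LicenseURL": event["license"],
        "Url": SCHOLIX_URL + event["id"],
        "Source": scholix_entity(event["subj"]),
        "Target": scholix_entity(event["obj"]),
    }
```

Because `ID` and `IDUrl` are both derived from the event's `pid`, and `Url` from the event's `id`, the Scholix output stays consistent with the event it came from.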