As a Scholix user I would like all data citations to be provided in a Scholix endpoint
...with data taken from the Manifold database and as rich metadata as possible, in order to identify and use data citations.
Background
We run a Scholix endpoint, but because of infrastructure issues it exposes only a limited number of the data citations we know about. Manifold will resolve these issues and allow us to build a richer Scholix endpoint.
Observed behavior
Some data citations are provided with minimal metadata.
Expected behavior
All data citations that we know about are provided with as much metadata as we can feasibly manage. See the proposed output below.
How urgent
Supporting data citation is a high organisational priority.
Discussion of output
First, an example of current output:
{
"status": "ok",
"message-type": "link-package-list",
"message": {
"next-cursor": "fa3499ce-b4d2-4707-8c79-e87131ed4f5e",
"total-results": 960284,
"items-per-page": 10,
"link-packages": [
{
"LinkPublicationDate": "2021-01-01T01:17:41Z",
"LinkProvider": [
{
"Name": "datacite"
}
],
"RelationshipType": {
"Name": "IsRelatedTo"
},
"LicenseURL": "https://creativecommons.org/publicdomain/zero/1.0/",
"Url": "https://api.eventdata.crossref.org/v1/events/scholix/282f3a39-11a5-4bca-997f-b039d0066c55",
"Source": {
"Identifier": {
"ID": "10.15468/k8j44v",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.15468/k8j44v"
},
"Type": {
"Name": "dataset"
}
},
"Target": {
"Identifier": {
"ID": "10.3897/mycokeys.76.58465",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.3897/mycokeys.76.58465"
},
"Type": {
"Name": "literature"
}
}
},
{
"LinkPublicationDate": "2021-01-01T01:17:58Z",
"LinkProvider": [
{
"Name": "datacite"
}
],
"RelationshipType": {
"Name": "IsRelatedTo"
},
"LicenseURL": "https://creativecommons.org/publicdomain/zero/1.0/",
"Url": "https://api.eventdata.crossref.org/v1/events/scholix/a46a66e5-1076-493e-8418-390997d2c210",
"Source": {
"Identifier": {
"ID": "10.15468/q7wd45",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.15468/q7wd45"
},
"Type": {
"Name": "dataset"
}
},
"Target": {
"Identifier": {
"ID": "10.3897/mycokeys.76.58406",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.3897/mycokeys.76.58406"
},
"Type": {
"Name": "literature"
}
}
},
...
A new endpoint
Each Scholix link package has six sections. We'll look at each in turn.
Some points to consider:
- There seems to be inconsistent capitalisation of fields in the Scholix schema: name is not capitalised for publishers, although it is in the PDF.
- Why IDScheme but SubTypeSchema (Scheme vs Schema)?
License URL
There seems to be no need to change this from what we currently provide, i.e. just the link to a Creative Commons CC0 'license'. "LicenseURL": "https://creativecommons.org/publicdomain/zero/1.0/",
Link publication date
Scholix recommends W3CDTF, a profile of ISO 8601, e.g. 2021-01-01T12:01:34Z. This is the date the link package was created, which differs from the publication dates of the items involved. We can use the earliest created date from Manifold, i.e. the equivalent of the created date in the Event Data API. "LinkPublicationDate": "2021-01-01T01:17:41Z",
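As a rough sketch of the date handling described above, the following Python picks the earliest created timestamp and formats it as W3CDTF in UTC. The function name `link_publication_date` and the idea that Manifold yields timezone-aware datetimes are assumptions made for illustration, not details taken from Manifold.

```python
from datetime import datetime, timezone


def link_publication_date(created_dates):
    """Return the earliest created date as a W3CDTF string in UTC.

    `created_dates` is a hypothetical list of timezone-aware datetimes;
    how Manifold actually exposes created dates is an assumption here.
    """
    earliest = min(created_dates)
    return earliest.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Timestamps taken from the current-output example in this issue.
dates = [
    datetime(2021, 1, 1, 1, 17, 58, tzinfo=timezone.utc),
    datetime(2021, 1, 1, 1, 17, 41, tzinfo=timezone.utc),
]
print(link_publication_date(dates))
```

Normalising to UTC before formatting keeps the trailing Z accurate even if a source record carries a different offset.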
Link Provider
Currently we only provide the name of the link provider. However, providers will be items in Manifold, so we can add a ROR identifier. Scholix also allows multiple providers, which may be useful if we have several chains of data. We should consider including Crossref in all records.
"LinkProvider": [
{
"Name": "datacite",
"Identifier": {
"IDURL": "https://ror.org/04wxnsj81",
"IDScheme": "ROR"
}
},
{
"Name": "crossref",
"Identifier": {
"IDURL": "https://ror.org/02twcfp32",
"IDScheme": "ROR"
}
}
]
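One possible way to assemble the provider list is sketched below in Python. The `ROR_IDS` lookup and the `link_providers` helper are hypothetical; in practice the identifiers would presumably come from Manifold. Always appending Crossref reflects the suggestion above and is not a settled decision.

```python
# ROR identifiers copied from the example above; a real implementation
# would presumably look these up in Manifold (assumption).
ROR_IDS = {
    "datacite": "https://ror.org/04wxnsj81",
    "crossref": "https://ror.org/02twcfp32",
}


def link_providers(names):
    """Build a LinkProvider list, adding Crossref if it is not already present."""
    # dict.fromkeys removes duplicates while preserving order.
    ordered = list(dict.fromkeys([*names, "crossref"]))
    providers = []
    for name in ordered:
        provider = {"Name": name}
        ror = ROR_IDS.get(name)
        if ror is not None:
            # Only attach an identifier when a ROR ID is known.
            provider["Identifier"] = {"IDURL": ror, "IDScheme": "ROR"}
        providers.append(provider)
    return providers


print(link_providers(["datacite"]))
```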
Relationship type
Currently we include only the Scholix relationship type. We can additionally include the type as defined by the Crossref or DataCite schema.
"RelationshipType": {
"Name": "IsRelatedTo",
"SubType": "references",
"SubTypeSchema": "https://data.crossref.org/schemas/relations.xsd"
},
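A minimal sketch of how the relationship block might be built, in Python. The `relationship_type` helper is hypothetical, and defaulting the Scholix name to IsRelatedTo simply mirrors the example above rather than any agreed mapping from Crossref or DataCite types.

```python
def relationship_type(sub_type=None, sub_type_schema=None, name="IsRelatedTo"):
    """Return a RelationshipType block.

    The SubType fields are added only when both the source-schema type
    and the schema URL are known, so the two always appear together.
    """
    rel = {"Name": name}
    if sub_type and sub_type_schema:
        rel["SubType"] = sub_type
        rel["SubTypeSchema"] = sub_type_schema
    return rel


print(relationship_type(
    "references",
    "https://data.crossref.org/schemas/relations.xsd",
))
```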
Source and Target
Similar to the relationship type, we can include the Crossref or DataCite type name. We can also pull in the title where known.
- We could consider putting the ID in URL form and removing the IDUrl field.
- Can include authors. Use the format "first name, last name" and include ORCID where known.
"Source": {
"Identifier": {
"ID": "10.15468/k8j44v",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.15468/k8j44v"
},
"Type": {
"Name": "dataset",
"SubType": "dataset",
"SubTypeSchema": "https://data.crossref.org/schemas/crossref5.3.1.xsd"
},
"Title": "Morpho-phylogenetic evidence reveals new species in Rhytismataceae (Rhytismatales, Leotiomycetes, Ascomycota) from Guizhou Province, China",
"PublicationDate": "2020-01-01T00:00:00Z",
"Creator": [
{
"Name": "Jin-Feng, Zhang",
"Identifier": [
{
"ID": "https://orcid.org/0000-0002-4969-255X",
"IDScheme": "ORCID"
}
]
}
],
"Publisher": [
{
"Name": "Pensoft",
"Identifier": [
{
"ID": "https://ror.org/01znaqx63",
"IDScheme": "ROR"
}
]
}
]
}
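The identifier and creator parts of a Source or Target could be generated along these lines. This is a Python sketch only: `doi_identifier` and `creator` are hypothetical helper names, and the field names follow the example above rather than a confirmed reading of the Scholix schema.

```python
def doi_identifier(doi):
    """Return an Identifier block for a DOI, including its URL form."""
    return {
        "ID": doi,
        "IDScheme": "DOI",
        "IDUrl": f"https://doi.org/{doi}",
    }


def creator(first_name, last_name, orcid=None):
    """Return a Creator entry using the "first name, last name" format.

    `orcid` is the bare ORCID iD (e.g. 0000-0002-4969-255X); the
    Identifier list is omitted when no ORCID is known.
    """
    entry = {"Name": f"{first_name}, {last_name}"}
    if orcid:
        entry["Identifier"] = [
            {"ID": f"https://orcid.org/{orcid}", "IDScheme": "ORCID"}
        ]
    return entry


print(doi_identifier("10.15468/k8j44v"))
print(creator("Jin-Feng", "Zhang", "0000-0002-4969-255X"))
```

If the ID were later switched to URL form, as suggested in the first bullet above, only `doi_identifier` would need to change.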
URL
This is an extra field that isn't in the schema, but that we do include in our output. "Url": "https://api.eventdata.crossref.org/v1/events/scholix/282f3a39-11a5-4bca-997f-b039d0066c55",
Header
We also need a header. It should look similar to the Metadata REST API output. The only element currently missing is the message-version.
{
"status": "ok",
"message-type": "link-package-list",
"message-version": "2.0.0",
"message": {
"next-cursor": "fa3499ce-b4d2-4707-8c79-e87131ed4f5e",
"total-results": 960284,
"items-per-page": 10,
"link-packages": [
Putting it all together
A full record would look like this:
{
"status": "ok",
"message-type": "link-package-list",
"message-version": "2.0.0",
"message": {
"next-cursor": "fa3499ce-b4d2-4707-8c79-e87131ed4f5e",
"total-results": 960284,
"items-per-page": 10,
"link-packages": [
{
"LicenseURL": "https://creativecommons.org/publicdomain/zero/1.0/",
"LinkPublicationDate": "2021-01-01T01:17:41Z",
"LinkProvider": [
{
"Name": "datacite",
"Identifier": {
"IDURL": "https://ror.org/04wxnsj81",
"IDScheme": "ROR"
}
},
{
"Name": "crossref",
"Identifier": {
"IDURL": "https://ror.org/02twcfp32",
"IDScheme": "ROR"
}
}
],
"RelationshipType": {
"Name": "IsRelatedTo",
"SubType": "references",
"SubTypeSchema": "https://data.crossref.org/schemas/relations.xsd"
},
"Source": {
"Identifier": {
"ID": "10.15468/k8j44v",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.15468/k8j44v"
},
"Type": {
"Name": "dataset",
"SubType": "dataset",
"SubTypeSchema": "https://data.crossref.org/schemas/crossref5.3.1.xsd"
},
"Title": "Morpho-phylogenetic evidence reveals new species in Rhytismataceae (Rhytismatales, Leotiomycetes, Ascomycota) from Guizhou Province, China",
"PublicationDate": "2020-01-01T00:00:00Z",
"Creator": [
{
"Name": "Jin-Feng, Zhang",
"Identifier": [
{
"ID": "https://orcid.org/0000-0002-4969-255X",
"IDScheme": "ORCID"
}
]
}
],
"Publisher": [
{
"Name": "Pensoft",
"Identifier": [
{
"ID": "https://ror.org/01znaqx63",
"IDScheme": "ROR"
}
]
}
]
},
"Target": {
"Identifier": {
"ID": "10.3897/mycokeys.76.58465",
"IDScheme": "DOI",
"IDUrl": "https://doi.org/10.3897/mycokeys.76.58465"
},
"Type": {
"Name": "dataset",
"SubType": "dataset",
"SubTypeSchema": "https://data.crossref.org/schemas/crossref5.3.1.xsd"
},
"Title": "Morpho-phylogenetic evidence reveals new species in Rhytismataceae (Rhytismatales, Leotiomycetes, Ascomycota) from Guizhou Province, China",
"PublicationDate": "2020-01-01T00:00:00Z",
"Creator": [
{
"Name": "Jin-Feng, Zhang",
"Identifier": [
{
"ID": "https://orcid.org/0000-0002-4969-255X",
"IDScheme": "ORCID"
}
]
}
],
"Publisher": [
{
"Name": "Pensoft",
"Identifier": [
{
"ID": "https://ror.org/01znaqx63",
"IDScheme": "ROR"
}
]
}
]
},
"Url": "https://api.eventdata.crossref.org/v1/events/scholix/282f3a39-11a5-4bca-997f-b039d0066c55"
}
]
}
}
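For illustration, the envelope around the link packages in the record above could be assembled as in this Python sketch. The `envelope` function and its parameters are hypothetical; the keys and the "2.0.0" version string are copied from the header example in this issue.

```python
import json


def envelope(link_packages, total_results, next_cursor, items_per_page=10):
    """Wrap a page of link packages in the REST-API-style header."""
    return {
        "status": "ok",
        "message-type": "link-package-list",
        "message-version": "2.0.0",
        "message": {
            "next-cursor": next_cursor,
            "total-results": total_results,
            "items-per-page": items_per_page,
            "link-packages": link_packages,
        },
    }


# An empty page is used here only to keep the example short.
response = envelope([], 960284, "fa3499ce-b4d2-4707-8c79-e87131ed4f5e")
print(json.dumps(response, indent=2))
```

Serialising through `json.dumps` rather than assembling strings by hand avoids unbalanced braces and trailing commas in the output.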
Definition of ready
- Product owner:
- Tech lead:
- Service:: or C:: label applied
- Definition of done updated
- Acceptance testing plan:
- Weight applied
Definition of done
- Unit tests identified, implemented, and passing
- Code reviewed
- Available for acceptance testing via a staging URL, or otherwise
- Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
- Knowledge base reviewed and updated
- Public documentation reviewed and updated
- Acceptance criteria met
  - AC 1
  - AC 2
- Acceptance testing passed
- Deployed to production