Validation of ROR and other institution IDs on input
As Crossref, I want to make sure the institutional identifiers registered with me are as free from errors as possible, because bad metadata is not very useful.
What
Validation of institution identifiers supplied in members' submission XML. We currently support 3 identifiers (ROR, ISNI, Wikidata).
Long-term, we would like to move this sort of validation to Schematron, but for now, since we are not able to validate within the XSD, we will apply some validation rules when submissions are processed. The XSD file does require IDs to begin with https:// or HTTPS://.
ROR
From https://ror.org/facts/
ROR ID (Example: https://ror.org/03yrm5c26)
- Expressed as a URL that resolves to the organization’s record
- Unique and opaque character string: leading 0 followed by 6 characters (excludes I,L,O) and a 2-digit checksum based on the Crockford base-32 url library and ISO-7064
- Crosswalks with other identifiers for the organization (GRID, ISNI, Crossref Funder Registry, Wikidata)
Example valid ROR ID:
- https://ror.org/01k4yrm29
Example invalid ROR IDs:
- http://ror.org/11k4yrm29 (does not start with https://)
- https://ror.org/11k4yrm29 (does not start with 0, checksum is almost certainly wrong)
- https://ror.org/01k34yrm29 (wrong number of characters, checksum is almost certainly wrong)
- https://ror.org/01kOyrm29 (contains invalid character O, checksum is almost certainly wrong)
- https://ror.org/01k4yrm30 (invalid checksum)
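The ROR rules above could be checked along these lines. This is a minimal Python sketch under stated assumptions, not the implementation: the name `is_valid_ror` and the constants are hypothetical, and the checksum formula (98 minus the decoded value times 100, modulo 97) is an assumed reading of the ISO-7064 scheme that happens to reproduce both valid examples in this issue. The alphabet is Crockford base-32, which omits U as well as I, L and O. The sketch also assumes lowercase IDs and a case-sensitive https:// prefix.

```python
import re

ROR_PREFIX = "https://ror.org/"

# Crockford base-32 alphabet (lowercase): digits and letters without i, l, o, u.
# Assumption: the issue lists only I, L, O; Crockford base-32 also drops U.
CROCKFORD = "0123456789abcdefghjkmnpqrstvwxyz"

# Leading 0, then 6 base-32 characters, then a 2-digit checksum.
ROR_BODY = re.compile(rf"0([{CROCKFORD}]{{6}})([0-9]{{2}})")


def is_valid_ror(url: str) -> bool:
    """Return True if url looks like a well-formed ROR ID (hypothetical helper)."""
    if not url.startswith(ROR_PREFIX):
        return False
    match = ROR_BODY.fullmatch(url[len(ROR_PREFIX):])
    if match is None:
        return False
    encoded, checksum = match.groups()
    # Decode the 6 base-32 characters into an integer.
    n = 0
    for ch in encoded:
        n = n * 32 + CROCKFORD.index(ch)
    # Assumed checksum rule: 98 - ((n * 100) mod 97), written as 2 digits.
    return int(checksum) == 98 - (n * 100) % 97
```

With this sketch, `is_valid_ror("https://ror.org/01k4yrm29")` returns `True`, and each of the five invalid examples above returns `False`.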
ISNI
From https://isni.org/page/faqs/
An ISNI is made up of 16 digits, the last character being a check character. The check character may be either a decimal digit or the character “X”. There are therefore one hundred thousand billion possible combinations.
Example valid ISNI:
- https://www.isni.org/isni/0000000405062673
Example invalid ISNI:
- https://www.isni.org/isni/000000040506263 (wrong number of digits)
- https://www.isni.org/isni/0000000405062679 (check character invalid)
- https://www.isnig.org/0000000405062673 (does not start with https://www.isni.org/isni)
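The ISNI check could be sketched as follows. The issue does not name the check-character algorithm, so this Python sketch assumes ISO 7064 MOD 11-2, which reproduces the check character of the valid example above. The names `isni_check_character` and `is_valid_isni` are hypothetical, and the prefix is matched case-sensitively.

```python
import re

ISNI_PREFIX = "https://www.isni.org/isni/"

# 15 decimal digits followed by a check character (digit or X).
ISNI_BODY = re.compile(r"[0-9]{15}[0-9X]")


def isni_check_character(first15: str) -> str:
    """Compute the check character for 15 digits (assumes ISO 7064 MOD 11-2)."""
    total = 0
    for digit in first15:
        total = (total + int(digit)) * 2
    value = (12 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_isni(url: str) -> bool:
    """Return True if url looks like a well-formed ISNI URL (hypothetical helper)."""
    if not url.startswith(ISNI_PREFIX):
        return False
    body = url[len(ISNI_PREFIX):]
    if ISNI_BODY.fullmatch(body) is None:
        return False
    return isni_check_character(body[:15]) == body[15]
```

With this sketch, the valid example above passes and the three invalid examples are rejected for the stated reasons (length, check character, prefix).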
Wikidata
From https://www.wikidata.org/wiki/Wikidata:Identifiers
Wikidata identifiers
Each Wikidata entity is identified by an entity ID, which is a number prefixed by a letter.
- items, also known as Q-items, are prefixed with Q (e.g. Q12345),
- properties are prefixed by P (e.g. P569) and
- lexemes are prefixed by L (e.g. L1).
Entity IDs can also be used as globally unique URIs that follow the pattern http://www.wikidata.org/entity/ID where ID is an entity ID.
Example valid Wikidata entity ID:
- https://www.wikidata.org/entity/Q5188229
Example invalid Wikidata entity IDs:
- https://www.wikidata.org/Q5188229 (does not start with https://www.wikidata.org/entity/)
- https://www.wikidata.org/entity/H5188229 (letter is not one of Q,P,L)
- https://www.wikidata.org/entity/QPLPPL (end of ID is not a number)
- https://www.wikidata.org/entity/QP5188229 (contains 2 letters, not 1)
Note that we are enforcing https, even though the Wikidata URI pattern quoted above uses http.
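The Wikidata rule is purely a pattern match, so a regular expression may be enough. The following Python sketch assumes the https prefix is matched case-sensitively and that any run of one or more digits is acceptable after the letter; the name `is_valid_wikidata_entity` is hypothetical.

```python
import re

# https prefix, one letter from Q, P, L, then one or more digits.
WIKIDATA_ENTITY = re.compile(r"https://www\.wikidata\.org/entity/[QPL][0-9]+")


def is_valid_wikidata_entity(url: str) -> bool:
    """Return True if url matches the Wikidata entity URI pattern (hypothetical helper)."""
    return WIKIDATA_ENTITY.fullmatch(url) is not None
```

Using `fullmatch` rather than `match` means trailing characters after the digits are also rejected.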
Why
I want to help our members supply good metadata. Bad ROR or other IDs undercut the value of our metadata.
How urgent
Very
Definition of ready
- Product owner: @SaraBowman
- Tech lead: @myalter
- Service:: or C:: label applied
- Definition of done updated
- Acceptance testing plan: Check output of test cases
- Weight applied
Definition of done
- Unit tests identified, implemented, and passing. Be sure to include noted test cases.
- SONAR on merge request branch checked by tech lead
- SONAR on merge request branch checked by reviewer
- Code reviewed
- Available for acceptance testing via a staging URL, or otherwise
- Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
- Knowledge base reviewed and updated
- Public documentation reviewed and updated
- Acceptance criteria met
  - System check validates ROR ID as having pattern https://ror.org/0 followed by 6 characters (excluding I,L,O), and finally a 2-digit checksum (based on the Crockford base-32 url library and ISO-7064) (see example valid/invalid ROR IDs above)
  - System check validates ISNI as having pattern https://www.isni.org/isni/ followed by 16 characters, the last of which is a check character (see example valid/invalid ISNIs above)
  - System check validates Wikidata entity ID as having pattern https://www.wikidata.org/entity/ followed by a letter (either Q, P, or L) and then one or more digits (see example valid/invalid Wikidata entity IDs above)
- Acceptance testing passed
- Deployed to production
Prior to and during Backlog Refinement, consider the potential impacts this user story may have on the following areas:
- Billing/costs
- Internal documentation
- External documentation
- Schema
- Outputs
- Operations
- Support & Membership experience
- Outreach & Communications
- Testing
- Internationalization
- Accessibility
- Metrics, analytics, reporting
Additional details about the above items can be found here.