Imposing checks or limitations on Relationships between DOIs to prevent bad metadata
# Background

[comment]: # (If this is a bug, fill in the following sections)

A recent inquiry from a member revealed that it's very easy to enter bogus relationship metadata in Metadata Manager, and the nature of Metadata Manager means that users feel compelled to fill in all the fields, even if they don't know what those fields mean. Moreover, once an incorrect relation is made, it's possible to remove it from the bibliographic metadata, but not from the crm-item. We may also want to think about ways to structurally limit the most nonsensical sorts of relationships. For example, an article cannot be its own preprint.

This is a little bit sprawly. It may need to be split into several tickets: one specific to Metadata Manager; one about removing the crm-item relationship when the relationship has been removed from the deposited metadata; and potentially one about a broader check that applies to all deposits.

# Observed behavior

[comment]: # (Please provide as many details as you can about steps to reproduce, operating system, browser, and version used, screenshots, example submission IDs, example XML, example queries, etc)

Member The Bhopal School of Social Sciences contacted us because they had received the following Preprint/VoR notification email, and they didn't know what it meant.

> Member The Bhopal School of Social Sciences has deposited DOI 10.51767/jc1101
> (http://doi.org/10.51767/jc1101)
> which may be the VoR for your posted content DOI 10.51767/jc1101.
>
> Please display a link to the Version of Record from your posted content online.
> Linking postedContent to the published record is critical to enabling the full history of scholarly results,
> and ensuring that the citation record is clear and up-to-date.
>
> If you have questions please contact support@crossref.org and one of our colleagues
> (in the EST timezone) will get back to you.
>
> Many thanks,
> Crossref

The email indicates that 10.51767/jc1101 is a preprint of itself.
This was triggered because The Bhopal School of Social Sciences added that relationship in Metadata Manager when they initially registered 10.51767/jc1101.

![Screen_Shot_2021-02-12_at_3.49.49_PM](/uploads/e6e35248f2cd60c422627aaa53accee6/Screen_Shot_2021-02-12_at_3.49.49_PM.png)

The fact that they did this is probably a combination of two factors:

1) Metadata Manager encourages people to fill in as many fields as they can, even if they don't know what they mean or have any relevant data (that's how we get things like "N/A" and "none" from both MM and Web Deposit users too), and
2) Lots of users misunderstand what a preprint is and assume that it means the same thing as a published accepted manuscript.

Isaac removed that relationship from 10.51767/jc1101 in Metadata Manager and redeposited it. The relationship was successfully overwritten in the bibliographic metadata, but it remains in the crm-item.

[XML API Query for 10.51767/jc1101](http://doi.crossref.org/search/doi?pid=support@crossref.org&format=unixsd&doi=10.51767%2Fjc1101)

`<crm-item name="relation" type="doi" claim="hasPreprint">10.51767/jc1101</crm-item>`

# Expected behavior

[comment]: # (What did you expect to happen before you observed the anomalous behavior?)

The most (only?) straightforward expectation is that, if the relationship metadata is removed, the relationship `crm-item` should also be removed.

Beyond that, I don't know if this is "expected" exactly, since we don't have a consistent approach to "saving users from themselves" when it comes to submitting bad metadata. We do it for some things (ISSNs most notably) and not for others. But it's worth considering how to do that with relationship metadata. The most obvious check might be that we shouldn't allow a DOI to assert a relationship with itself. This seems like it should not just be a limitation imposed in Metadata Manager, but one applied to all deposits.
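
To make the self-reference check concrete, here is a minimal sketch of what it could look like. It is not taken from the deposit system: `Relation`, `normalize_doi`, and `find_self_relations` are hypothetical names, and the only assumption about the data is that each relation carries a claim (such as `hasPreprint`) and a target DOI. DOIs are compared case-insensitively and with any resolver prefix stripped, so `http://doi.org/10.51767/JC1101` and `10.51767/jc1101` count as the same DOI.

```python
from dataclasses import dataclass

# Prefixes a depositor might paste in front of a bare DOI (assumed list, not exhaustive).
_RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver prefix, if any."""
    cleaned = doi.strip().lower()
    for prefix in _RESOLVER_PREFIXES:
        if cleaned.startswith(prefix):
            return cleaned[len(prefix):]
    return cleaned


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for one deposited relationship."""
    claim: str       # e.g. "hasPreprint"
    target_doi: str  # the DOI the relationship points at


def find_self_relations(deposited_doi: str, relations: list[Relation]) -> list[Relation]:
    """Return every relation whose target is the deposited DOI itself."""
    own = normalize_doi(deposited_doi)
    return [r for r in relations if normalize_doi(r.target_doi) == own]


# Example using the DOI from this ticket; the second relation is made up.
relations = [
    Relation("hasPreprint", "10.51767/jc1101"),
    Relation("isReviewOf", "10.5555/12345678"),
]
rejected = find_self_relations("http://doi.org/10.51767/JC1101", relations)
```

A deposit-time check could reject the submission, or drop the offending relation with a warning, whenever the returned list is non-empty. Which of those is appropriate is a product decision this sketch does not settle.
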
In Metadata Manager, we might want to think about removing the Related Items fields entirely, or limiting the relationships to DOIs that have a prefix different from the depositor's prefix. (There are legit use cases where two DOIs with the same prefix would have relationship metadata linking them, but that doesn't seem to be what MM users are doing.) Or, for as long as MM only accepts journal article content, we could also consider removing the relationship types that couldn't possibly pertain to journal articles (e.g. IsReviewOf, IsPreprint, etc.) from the list of relationships that it's possible to add.

# How urgent

[comment]: # (There are myriad factors that go into prioritizing and scheduling development work, but any information you can provide to help us understand severity, urgency, relative priority, or deadlines, is much appreciated.)

Not allowing self-referential relationships seems like the most urgent of these issues. I don't know how often it happens, but given how little relationship metadata there still is overall, these mistakes have a disproportionate impact.

[comment]: # (No need to update the Definition of ready when filing issues, but feel free to have a go if you're familiar with the territory.)

# Definition of ready

- [x] Product owner: @SaraBowman
- [x] Tech lead: @myalter
- [x] Service:: label applied
- [ ] Definition of done updated
- [ ] Acceptance testing plan:
- [ ] Weight applied

[comment]: # (Feel free to leave this as is, or suggest changes. We'll update these during Backlog Refinement, prior to bringing this into a sprint.)

# Definition of done

- [ ] Unit tests identified, implemented, and passing
- [ ] Code reviewed
- [ ] Available for acceptance testing via a staging URL, or otherwise
- [ ] Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
- [ ] Knowledge base reviewed and updated
- [ ] Public documentation reviewed and updated
- [ ] Acceptance criteria met
  - [ ] AC 1
  - [ ] AC 2
- [ ] Acceptance testing passed

# Notes

[comment]: # (By default all issues need to be labeled Planning::New, only remove if you know what you're doing)