#1533
Created Oct 13, 2021 by Dominika Tkaczyk (@dtkaczyk, Maintainer). 14 of 19 checklist items completed.

Cayenne should not remove characters from DOIs during ingestion

Background

Cayenne removes some characters, including non-printable and Unicode characters, from DOIs during ingestion. This happens when XML is transformed into an ItemTree. The number of affected DOIs is unknown.

The change was introduced in rest_api@923086f5 and affects both the old and the new REST API.

Observed behavior

For example, DOI 10.2741/Ortéga was transformed into 10.2741/Ortga during ingestion: https://api.crossref.org/works/10.2741/Ortga

So we have two unwanted effects: 1) some DOIs are not accessible through the REST API or the JSON snapshot (10.2741/Ortéga), and 2) the REST API and the JSON snapshot contain DOIs that do not exist (10.2741/Ortga).
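
To make the failure mode concrete, here is a minimal Python sketch of a lossy normalization that would produce the observed result. The function name and the regular expression are assumptions for illustration only; the actual Cayenne code in rest_api@923086f5 is not reproduced here and may strip a different set of characters.

```python
import re


def strip_non_ascii_printable(doi: str) -> str:
    """Hypothetical lossy step: drop every character outside printable ASCII.

    This is an assumed stand-in for the character removal described above,
    not the actual Cayenne implementation.
    """
    return re.sub(r"[^\x21-\x7e]", "", doi)


# The registered DOI, with "é" written as the single code point U+00E9.
registered = "10.2741/Ort\u00e9ga"
ingested = strip_non_ascii_printable(registered)

print(ingested)                 # 10.2741/Ortga
print(ingested == registered)   # False: the indexed DOI is not the registered one
```

Because the removed character cannot be recovered from the stripped string, the only safe behavior is to index the DOI exactly as it was registered.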

Expected behavior

Cayenne should not modify DOIs in any way. When a DOI appears in the bucket, it means it is successfully registered, so Cayenne should ingest and index it as is.

See also @gbilder's comment: #1231 (comment 702197157)

How urgent

Important also in the context of the Manifold.

Definition of ready

  • Product owner: @ppolischuk1
  • Tech lead: @dtkaczyk
  • Service:: or C:: label applied
  • Definition of done updated
  • Acceptance testing plan:
  • Weight applied

Definition of done

  • Unit tests identified, implemented, and passing
  • Code reviewed
  • Available for acceptance testing via a staging URL, or otherwise
  • Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
  • Knowledge base reviewed and updated
  • Public documentation reviewed and updated
  • Acceptance criteria met
    • update the code so that no characters are removed from DOIs
    • determine the list of DOIs that were affected by removal of characters in the past
    • delete from the ES index the modified DOIs that resulted from character removal
    • generate a mapping of deleted and replacement DOIs
  • Acceptance testing passed
  • Deployed to production
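
The last three acceptance criteria (finding affected DOIs, deleting modified ones, and generating a mapping) could be approached roughly as in the sketch below. It assumes a list of registered DOIs is available from some authoritative source and reuses a hypothetical stripping function as a stand-in for the past behavior; both are assumptions, and all names are illustrative rather than part of Cayenne.

```python
import re
from collections import defaultdict


def strip_non_ascii_printable(doi: str) -> str:
    """Hypothetical stand-in for the past character removal (an assumption)."""
    return re.sub(r"[^\x21-\x7e]", "", doi)


def build_mapping(registered_dois):
    """Map each modified DOI to the registered DOI(s) it was derived from."""
    mapping = defaultdict(list)
    for doi in registered_dois:
        modified = strip_non_ascii_printable(doi)
        if modified != doi:  # only DOIs that actually lost characters
            mapping[modified].append(doi)
    return dict(mapping)


def dois_to_delete(mapping, registered_dois):
    """Modified DOIs that are safe to delete from the index.

    A modified DOI that happens to equal a genuinely registered DOI is
    excluded, since deleting it would remove a valid record.
    """
    registered = set(registered_dois)
    return sorted(d for d in mapping if d not in registered)


registered_dois = ["10.2741/Ort\u00e9ga", "10.5555/12345678"]
mapping = build_mapping(registered_dois)
print(mapping)
print(dois_to_delete(mapping, registered_dois))
```

Keeping a list of originals per modified DOI covers the case where two registered DOIs collapse to the same stripped string.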

Notes

Edited May 05, 2022 by Patrick Polischuk