
CR-953: Refactor out UNIXSD-derived data

Joe Wass requested to merge feature/CR-953 into main

UNIXML-supplied "publisher" gets set as a property (with :name and :location) on the item's container, in accordance with the UNIXML schema.

The UNIXSD-supplied publisher is kept as a "steward" relationship.

The parsers for the following XML elements, which may contain the publisher element, were updated:

  • book_metadata
  • book_series_metadata
  • book_set_metadata
  • database_metadata
  • proceedings_metadata
  • proceedings_series_metadata
  • report-paper_metadata
  • report-paper_series_metadata
  • standard_metadata

The rendering behaviour of citeproc-json is unchanged. For the publisher JSON field it consults the publisher element of the container, falling back to the UNIXSD crm-item.

This behaviour is moved from the XML parser to the citeproc-json renderer, which keeps the item tree correct.
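
The fallback described above can be sketched roughly as follows. This is only an illustration in Python, not the project's renderer: the function name `render_publisher` and the dict layout (a `container` list carrying a `publisher` property, and a `steward` list) are assumptions made for the example.

```python
from typing import Any, Dict, Optional


def render_publisher(item: Dict[str, Any]) -> Optional[str]:
    """Choose the value for the citeproc-json "publisher" field.

    The item layout used here is hypothetical, for illustration only.
    """
    # Prefer the UNIXML-supplied publisher stored as a property on the container.
    for container in item.get("container", []):
        name = (container.get("publisher") or {}).get("name")
        if name:
            return name
    # Otherwise fall back to the UNIXSD-derived steward relationship.
    for steward in item.get("steward", []):
        if steward.get("name"):
            return steward["name"]
    return None
```

Keeping this choice in the renderer means the item tree can hold both values as they were supplied, without the parser having to pick one.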

Item tree refactor: change the UNIXSD parser's item tree structure, and change the item-tree-to-Elasticsearch converter so that its output doesn’t change. There is therefore no breaking change.

Remove ‘publisher’ field from the item tree.

Split the ‘publisher’ out into separate items and relationships:

  • steward (publisher name and id)
  • prefix (owner prefix)
  • container (book-id or journal-id)

Attach those relationships directly to the item identified by the DOI in the crm-item (e.g. the article), not to the top-level node (e.g. the journal).
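
A minimal sketch of the split, assuming a dict-based item and a flattened crm-item. The function name `attach_publisher_parts` and every key on the crm-item side (`publisher-name`, `publisher-id`, `owner-prefix`, `journal-id`, `book-id`) are hypothetical names chosen for the example, not the parser's real field names.

```python
from typing import Any, Dict


def attach_publisher_parts(item: Dict[str, Any], crm_item: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the DOI-identified item with the old combined
    'publisher' field replaced by three separate relationships."""
    # Drop the combined 'publisher' field; the input item is left untouched.
    result = {key: value for key, value in item.items() if key != "publisher"}
    # steward: publisher name and id
    result["steward"] = [{
        "name": crm_item.get("publisher-name"),
        "id": crm_item.get("publisher-id"),
    }]
    # prefix: owner prefix
    result["prefix"] = [{"value": crm_item.get("owner-prefix")}]
    # container: journal-id or book-id, whichever is present
    container_id = crm_item.get("journal-id") or crm_item.get("book-id")
    result["container"] = [{"id": container_id}] if container_id else []
    return result
```

The relationships land on the item that the DOI identifies (the article in this example), so nothing has to be looked up on the journal node.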

The following fields are now attached to the item directly:

  • steward
  • prefix
  • container
  • relations
  • deposited (moved to be a property)
  • first-deposited (moved to be a property)
  • cited-count (moved to be a property)
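
For orientation, the resulting shape of an item might look something like the sketch below. The `Item` class, its representation of relationships as lists of dicts, and all the sample values are made up for illustration; only the field names come from the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Item:
    """Hypothetical in-memory shape of an item after this change."""
    # Relationships to other items, attached directly to this item.
    steward: List[Dict[str, Any]] = field(default_factory=list)
    prefix: List[Dict[str, Any]] = field(default_factory=list)
    container: List[Dict[str, Any]] = field(default_factory=list)
    relations: List[Dict[str, Any]] = field(default_factory=list)
    # Plain values that moved to be properties of the item itself.
    properties: Dict[str, Any] = field(default_factory=dict)


# Sample values below are invented for the example.
article = Item(
    steward=[{"name": "Example Press", "id": "99"}],
    prefix=[{"value": "10.5555"}],
    container=[{"id": "1234"}],
    properties={
        "deposited": "2020-01-02",
        "first-deposited": "2019-06-30",
        "cited-count": 3,
    },
)
```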

Updated parser regression test files. Updated convert to elastic regression test files.

feat(parser): remove the OAI-PMH format from the test suite and parser, as it's not used and won't be useful in future. This enables CR-73, our larger epic for refactoring the tree.

CR-133 CR-953 CR-961

Edited by Joe Wass
