Build tool for generating public data files
Background
Crossref is committed to publishing public data files on a regular basis. So far we have published one per year, in 2020 and 2021, though that frequency may change.
Currently, Dominika uses custom scripts to remove limited references from the JSON snapshot we use to generate the file. She also splits the tar.gz into multiple smaller tar.gz archives to make it easier to host as a torrent.
We should build a tool to automate as much of this as makes sense.
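As a starting point for discussion, here is a minimal sketch of the two manual steps described above (dropping limited references, then splitting the output into smaller tar.gz archives). It is not Dominika's script. Everything specific in it is an assumption: that each record is a JSON object with an optional `reference` key, that records are wrapped as `{"items": [...]}`, and that the caller can supply an `is_limited` predicate. All function names are hypothetical.

```python
import io
import json
import tarfile
from typing import Callable, Iterator


def strip_limited_references(record: dict, is_limited: Callable[[dict], bool]) -> dict:
    """Return the record without its "reference" key if is_limited(record) is true.

    Assumption: references live under a top-level "reference" key.
    """
    if is_limited(record) and "reference" in record:
        # Build a copy so the caller's record is not mutated.
        record = {k: v for k, v in record.items() if k != "reference"}
    return record


def chunk(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_archives(records: list, per_archive: int, out_prefix: str) -> list:
    """Write records into numbered tar.gz archives and return their paths.

    Assumption: one JSON file per archive, shaped as {"items": [...]}.
    """
    paths = []
    for index, batch in enumerate(chunk(records, per_archive)):
        path = f"{out_prefix}-{index:04d}.tar.gz"
        payload = json.dumps({"items": batch}).encode("utf-8")
        info = tarfile.TarInfo(name=f"{index}.json")
        info.size = len(payload)  # tar headers need the member size up front
        with tarfile.open(path, "w:gz") as archive:
            archive.addfile(info, io.BytesIO(payload))
        paths.append(path)
    return paths
```

A real tool would stream the snapshot rather than hold all records in memory, and would probably split by archive size rather than record count; both are left out here to keep the sketch short.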
How urgent
Urgency depends on whether we want the tool ready for the 2022 public data file, the 2023 public data file, or some release in between.
Definition of ready
- Product owner: @ppolischuk1
- Tech lead: @dtkaczyk
- Service:: label applied
- Definition of done updated
- Acceptance testing plan:
- Weight applied
Definition of done
- Unit tests identified, implemented, and passing
- Code reviewed
- Available for acceptance testing via a staging URL, or otherwise
- Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
- Knowledge base reviewed and updated
- Public documentation reviewed and updated
- Acceptance criteria met
  - AC 1
  - AC 2
- Acceptance testing passed
- Deployed to production