Add test corpus with cleaned references
This commit adds a test corpus that is used for automated tests. As part of this commit I have had to regenerate a lot of the assertion data, since much of it has changed because of the lack of references.
There are also some minor differences caused by upstream changes to the DOI.