Remove Scholix metadata retrieval at index time
This is a known bottleneck in the indexing of Events, and it hasn't scaled with the new volume of DataCite records. It may not be the only problem, but it's certainly a blocker.
Steps:
- Adjust the Event Data Query API indexer code to no longer retrieve metadata.
- Re-deploy.
- Manually remove the metadata cache index from Elastic as it will no longer be used.
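For the manual removal step, a minimal sketch of the request involved, assuming the cluster is reachable over HTTP and using Elasticsearch's standard delete-index call (`DELETE /<index>`). The base URL and the index name `metadata-cache` are placeholders, not the real production values. The request is only built here, not sent.

```python
import urllib.request


def build_delete_index_request(base_url, index_name):
    """Build (but do not send) a DELETE request for one Elasticsearch index."""
    url = f"{base_url.rstrip('/')}/{index_name}"
    return urllib.request.Request(url, method="DELETE")


# Placeholder values: substitute the real cluster URL and index name.
request = build_delete_index_request("http://localhost:9200", "metadata-cache")

# To actually delete the index, send the request:
# urllib.request.urlopen(request)
```

Keeping the send step separate makes it easy to double-check the target URL before anything is removed from production.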
Observed behavior
Indexing is too slow with current volume.
Expected behavior
Remove metadata retrieval from the indexer so it can be implemented upstream in Agents. (TODO)
Definition of ready
- Product owner: @mrittman
- Tech lead: @afandian
- Service:: label applied
- Definition of done updated
- Acceptance testing plan: Unit tests. Manual smoke test.
- Weight applied
Definition of done
- Unit tests identified, implemented, and passing
- Code reviewed
- Knowledge base reviewed and updated
- Public documentation reviewed and updated
- Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
- Acceptance criteria met
- Event Data Query API indexer code no longer retrieves metadata.
- Metadata cache removed from production index in Elastic as it will no longer be used
- Compare Pingdom availability before and after, and response time in AWS.
Notes
- This involves the removal of the scholix Elastic Search index. Events will be transformed on the way out of the index, rather than on the way in. This means there's one less index to store.
- This also involves the removal of the work-cache index, as it was used to maintain the work types, which are now being added to events upstream.
- The Scholix endpoint will query the standard index. An additional indexed flag will be added to the document in the standard index to mark inclusion in the Scholix dataset. The API endpoint will need to add additional Elastic Search filters to scope it down to the Scholix dataset. This will be invisible to the user.
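The filtering described in the last note could look roughly like the sketch below. It wraps whatever query the endpoint would otherwise run in a `bool` query and adds a `term` filter on the flag. The field name `scholix` and the example `source` field are assumptions for illustration; the actual names in the standard index may differ.

```python
from copy import deepcopy

# Hypothetical field name: the note only says "an additional indexed flag".
SCHOLIX_FLAG_FIELD = "scholix"


def scope_to_scholix(query_body):
    """Return a copy of an Elasticsearch request body limited to flagged documents."""
    body = deepcopy(query_body)
    # Fall back to match_all when the caller supplied no query of their own.
    original = body.get("query", {"match_all": {}})
    body["query"] = {
        "bool": {
            "must": [original],
            # Filter context: restricts results without affecting scoring.
            "filter": [{"term": {SCHOLIX_FLAG_FIELD: True}}],
        }
    }
    return body


# Example user-facing query (field name is illustrative only).
user_query = {"query": {"term": {"source": "datacite"}}, "size": 100}
scholix_query = scope_to_scholix(user_query)
```

Because the filter is added server-side, the user's query parameters stay the same, which matches the intent that the change be invisible to the user.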
Edited by Joe Wass