As a Metadata Plus user, I'd like the cursor timeout to be increased (a 5-minute expiration is too short)
My understanding is that cursoring works differently in Elasticsearch than it did in Solr: Solr's cursorMark is a stateless encoded sort position that never expires, while an Elasticsearch cursor holds server-side state that is discarded after a keep-alive window, which is why users are hitting a timeout at all. One Metadata Plus user (Informatics Publishing - Zendesk 377133) and one Public pool user (National Institutes of Health - Zendesk 376078) have contacted us about increasing the current five-minute cursor timeout.
We in support do not have a concrete recommendation on the proper timeout; we're thinking somewhere between 10 and 60 minutes. As you can see below, NIH is requesting 1-2 days, which seems excessive and likely out of line with best practice. Perhaps the tech team has a best-practice recommendation for the increase?
In Informatics Publishing's case, support will get them configured to use Plus, but once Plus is cut over, the same timeout problem will simply follow them there.
This is the call they are using:
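Their exact query isn't shown above, but for reference, a cursored request from a Plus account generally takes the following shape. This is a minimal sketch in Python; the filter and token are placeholders, not Informatics Publishing's actual call:

```python
import requests

# Illustrative sketch only -- not the member's actual query. Plus requests
# hit the same /works endpoint as the public API, authenticated with the
# Crossref-Plus-API-Token header.
PLUS_TOKEN = "Bearer <plus-token-here>"  # placeholder

resp = requests.get(
    "https://api.crossref.org/works",
    params={
        "filter": "from-update-date:2021-01-01",  # placeholder filter
        "rows": 1000,
        "cursor": "*",  # "*" opens a new deep-paging cursor
    },
    headers={"Crossref-Plus-API-Token": PLUS_TOKEN},
    timeout=60,
)
resp.raise_for_status()
message = resp.json()["message"]
# "next-cursor" must be sent back as cursor= within the expiry window --
# the five-minute limit this story is about.
print(message["next-cursor"])
```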
The NIH query is less straightforward: they're trying to retrieve all works. Shayn is working with them to get them started from the public data file instead. I'm pasting the entirety of their recent message to support below, as most of it is relevant, and it also includes some interesting detail about what they're seeing from our API post-migration:
The downloading of Crossref data is part of our pipeline. A Python script downloads data from your site and retries when there are network errors. Another shell script invokes and monitors the Python script; when the Python script fails, the shell script restarts it. But the whole process may still fail if we have to run it for several weeks. We can't monitor the pipeline 24 hours every day, so if the pipeline fails at night, we resume it the next morning. Up to now, the Crossref downloading has been very stable compared to the other data sources, which is not easy for this huge data set. We really appreciate your hard work. In the past, the whole process took three weeks and was interrupted only 1-3 times.
After you upgraded your system recently, I found that the downloading was much faster; for example, we downloaded 30% in the past 24 hours. I don't know whether this speed can be maintained in the future.
It would be great if you could provide a bulk version for downloading every year, but last time you said you didn't have a plan yet. We have tried batch downloading based on update dates, as we have done for another unstable data source, but this makes our script more complicated and won't save any time. And because your data keeps changing, the update dates keep changing, so there will be overlap among batches.
Obviously, five minutes is not enough for us to resume our pipeline. For now I have a third script that checks the downloading status every 2 minutes, but this is inefficient and just a temporary solution. Is it possible to set the cursor expiration time based on the count of records that will be downloaded? For a 100M-record download, could the expiration time be set to 1-2 days?
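The pain point is easier to see in code: a cursor token is only honored server-side for the length of the timeout, so a checkpoint saved to disk goes stale after five minutes of downtime and the whole crawl has to start over. Below is a minimal sketch of the resume pattern NIH describes; the checkpoint file and the error handling are assumptions for illustration, not their actual scripts:

```python
import json

import requests

BASE = "https://api.crossref.org/works"  # public pool endpoint NIH uses
CHECKPOINT = "cursor_checkpoint.json"    # hypothetical checkpoint file

def load_cursor():
    """Resume from the last saved cursor, or start fresh with '*'."""
    try:
        with open(CHECKPOINT) as f:
            return json.load(f)["cursor"]
    except FileNotFoundError:
        return "*"

cursor = load_cursor()
while True:
    resp = requests.get(BASE, params={"rows": 1000, "cursor": cursor}, timeout=60)
    if 400 <= resp.status_code < 500:
        # Assumption: an expired cursor surfaces as a client error. With a
        # five-minute expiry, any outage longer than that (e.g. overnight)
        # invalidates the checkpoint and forces a restart from "*".
        cursor = "*"
        continue
    resp.raise_for_status()
    message = resp.json()["message"]
    if not message["items"]:
        break  # an empty page marks the end of the result set
    # ... persist message["items"] to disk here ...
    cursor = message["next-cursor"]
    with open(CHECKPOINT, "w") as f:
        json.dump({"cursor": cursor}, f)  # survive a process restart
```

A longer timeout widens the window in which this checkpoint stays usable; it doesn't remove the need for retries, but it means a nightly failure no longer throws away weeks of progress.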
What
Increase the cursor timeout.
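A hedged note on where the change likely lives: if the REST API's cursors are backed by Elasticsearch scroll contexts, the keep-alive is passed on each request rather than set once cluster-wide, and Elasticsearch caps it at 24 hours by default (search.max_keep_alive), which is another argument against NIH's 1-2 day request. A sketch with the official Python client; the index name, connection, and the 60-minute value are illustrative:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # illustrative connection

# The scroll keep-alive is set on the initial search and renewed on every
# subsequent scroll call; "5m" is what users are hitting today, and bumping
# the value (e.g. to "60m") is the change this story asks for.
page = es.search(index="works", scroll="60m", size=1000,
                 query={"match_all": {}})
scroll_id = page["_scroll_id"]

while page["hits"]["hits"]:
    # Each scroll call also renews the context for another keep-alive window.
    page = es.scroll(scroll_id=scroll_id, scroll="60m")
    scroll_id = page["_scroll_id"]

es.clear_scroll(scroll_id=scroll_id)  # free the server-side context
```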
Why
Although some API users could certainly use more efficient data-retrieval processes, simply increasing the timeout seems like a good starting point and would reduce the load on support (though support, with guidance from the tech team, will likely continue to suggest optimizations to users).
How urgent
Moderately. Two requests for this just this week.
Definition of ready
- Product owner: @ppolischuk1
- Tech lead: @dtkaczyk
- Service:: or C:: label applied
- Definition of done updated
- Acceptance testing plan:
- Weight applied
Definition of done
- Unit tests identified, implemented, and passing
- SONAR on merge request branch checked by tech lead
- SONAR on merge request branch checked by reviewer
- Code reviewed
- Available for acceptance testing via a staging URL, or otherwise
- Consider any impacts to current or future architecture/infrastructure, and update specifications and documentation as needed
- Knowledge base reviewed and updated
- Public documentation reviewed and updated
- Acceptance criteria met:
  - Respond to and close Zendesk ticket 376078
  - Respond to and close Zendesk ticket 377133
  - Cursor timeout increased
- Acceptance testing passed
- Deployed to production
Prior to and during Backlog Refinement, consider the potential impacts this user story may have on the following areas:
- Billing/costs
- Internal documentation
- External documentation
- Schema
- Outputs
- Operations
- Support & Membership experience
- Outreach & Communications
- Testing
- Internationalization
- Accessibility
- Metrics, analytics, reporting
Additional details about the above items can be found here.