post_api doesn't retry on 500 errors unlike query_api
We've been hitting consistent failures in our scheduled triage jobs with v1.49.0 when posting comments to epics on a self-hosted instance with a Premium license. The job fails immediately on the first POST request with InternalServerError, but GET requests work fine.
The problem is actually quite old, but I finally found a solution to it.
After digging into it, I found that query_api retries on Errors::Network::InternalServerError:
# rest_api_network.rb lines 37-39
response = execute_with_retry(
  exception_types: [Net::ReadTimeout, Errors::Network::InternalServerError],
  ...
But post_api only retries on Net::ReadTimeout:
# rest_api_network.rb lines 68-70
response = execute_with_retry(
  exception_types: Net::ReadTimeout,
  ...
Same for delete_api.
The thing is, when I manually retry the exact same POST request right after the failure, it works. So GitLab is returning transient 500s, and the same request would succeed on retry.
I patched the gem locally to add InternalServerError to the retry list for post_api and delete_api, and the triage job completed successfully.
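To illustrate the difference in behavior, here is a minimal standalone sketch. It does not use the gem's code: `with_retry`, `InternalServerError`, and `flaky_post` are hypothetical stand-ins I made up for the example, and the real `execute_with_retry` may differ in its signature and retry limits. It only shows that a request that fails once with a 500 succeeds when the error class is in the retry list, and aborts when it is not.

```ruby
require "net/http" # provides Net::ReadTimeout

# Hypothetical stand-in for Errors::Network::InternalServerError.
class InternalServerError < StandardError; end

# Hypothetical retry helper (an assumption, not the gem's execute_with_retry).
# Re-runs the block when one of the listed exceptions is raised,
# up to max_attempts times, then re-raises the last error.
def with_retry(exception_types:, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue *Array(exception_types)
    retry if attempts < max_attempts
    raise
  end
end

# Simulated POST: the first call fails with a transient 500, later calls succeed.
calls = 0
flaky_post = lambda do
  calls += 1
  raise InternalServerError, "500 Internal Server Error" if calls == 1
  :created
end

# With InternalServerError in the list (like query_api), the retry succeeds.
result = with_retry(exception_types: [Net::ReadTimeout, InternalServerError]) do
  flaky_post.call
end
puts "result=#{result.inspect} after #{calls} calls"
```

With only `Net::ReadTimeout` in the list (like post_api today), the same first failure propagates straight to the caller, which matches the job aborting on the first POST.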
Steps to reproduce
- Run gitlab-triage against a group with many epics (we have ~160 pending closure)
- The first POST to create a note returns 500
- The job fails
Proposed fix
 def post_api(url, body)
   response = execute_with_retry(
-    exception_types: Net::ReadTimeout,
+    exception_types: [Net::ReadTimeout, Errors::Network::InternalServerError],
     backoff_exceptions: Errors::Network::TooManyRequests, debug: options.debug) do
Same change for delete_api.
