post_api doesn't retry on 500 errors, unlike query_api

We've been hitting consistent failures in our scheduled triage jobs with v1.49.0 when posting comments to epics on a self-hosted instance with a Premium license. The job fails immediately on the first POST request with InternalServerError, while GET requests work fine.

The problem is actually quite old, but I have finally found a solution to it.

After digging into it, I found that query_api retries on Errors::Network::InternalServerError:

# rest_api_network.rb line 37-39
response = execute_with_retry(
  exception_types: [Net::ReadTimeout, Errors::Network::InternalServerError],
  ...

But post_api only retries on Net::ReadTimeout:

# rest_api_network.rb line 68-70
response = execute_with_retry(
  exception_types: Net::ReadTimeout,
  ...

Same for delete_api.

The thing is, when I manually retry the exact same POST request right after the failure, it works. So GitLab is returning transient 500s that would succeed on retry.

I patched the gem locally to add InternalServerError to the retry list for post_api and delete_api, and the triage job completed successfully.
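
To illustrate why the exception list matters, here is a minimal, self-contained sketch. It is not the gem's code: retry_on and flaky_post are hypothetical helpers, and the Errors::Network::InternalServerError class below is a local stand-in that only mirrors the name used in the gem. It assumes a retry helper that rescues just the listed exception types, which is how I understand execute_with_retry to behave.

```ruby
require "net/http" # provides Net::ReadTimeout

# Local stand-in for the gem's error class (assumption: same name, no behaviour).
module Errors
  module Network
    InternalServerError = Class.new(StandardError)
  end
end

# Hypothetical helper: re-runs the block only when one of the listed
# exception types is raised, up to max_attempts times in total.
def retry_on(exception_types:, max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *Array(exception_types) => e
    raise e if attempts >= max_attempts
    retry
  end
end

# Simulated endpoint: a transient 500 on the first call, success afterwards.
def flaky_post(attempt)
  raise Errors::Network::InternalServerError, "500" if attempt == 1
  "created"
end

# With InternalServerError in the list, the second attempt succeeds.
with_500 = retry_on(
  exception_types: [Net::ReadTimeout, Errors::Network::InternalServerError]
) { |n| flaky_post(n) }

# With only Net::ReadTimeout, the first 500 propagates and nothing is retried.
only_timeout = begin
  retry_on(exception_types: Net::ReadTimeout) { |n| flaky_post(n) }
rescue Errors::Network::InternalServerError
  :failed_immediately
end
```

In this sketch with_500 ends up as "created" and only_timeout as :failed_immediately, which matches what we see: the POST fails on the first 500, yet the same request succeeds when repeated.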

Steps to reproduce

  1. Run gitlab-triage against a group with many epics (we have ~160 pending close)
  2. First POST to create a note returns 500
  3. Job fails

Proposed fix

def post_api(url, body)
  response = execute_with_retry(
-   exception_types: Net::ReadTimeout,
+   exception_types: [Net::ReadTimeout, Errors::Network::InternalServerError],
    backoff_exceptions: Errors::Network::TooManyRequests, debug: options.debug) do

Same change for delete_api.

Edited Mar 02, 2026 by Dawid Rycerz