What does this MR do and why?

Allows more time for the status update to recover if the sha/pipeline is locked.

A missed commit status update to a completed status can leave jobs stuck as pending/running, which can block users.

Recently we raised the TTL on the lock to 60 seconds. This was based on a few outlier requests taking that long. A shorter lock led to duplicates, which was a bug, since the application expects one 'current' status of a given name per sha. Note: we can have multiple non-current statuses, which are marked as 'retried=true'.

We also started returning 409 Conflict, which signals that the HTTP request should be retried, unlike the 500 that was returned previously. I've documented that in this MR.

Prior to this change, users could only expect a 500 from a conflict, so they may not have implemented retries on the client end, especially across the many integrations.
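For integrations that do not yet retry, here is a minimal sketch of a client-side retry on 409. The helper name `with_conflict_retry`, its parameters, and the linear backoff are all assumptions for illustration. None of this is part of GitLab or of this MR.

```ruby
# Hypothetical client-side helper (not a GitLab API): retries the given block
# while it returns HTTP status 409, up to max_attempts attempts in total.
# max_attempts and base_delay are illustrative values, not recommended defaults.
def with_conflict_retry(max_attempts: 5, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  loop do
    attempt += 1
    status = yield(attempt) # the block performs the request and returns the status code
    # Stop on any non-409 status, or once the attempts are used up.
    return status unless status == 409 && attempt < max_attempts

    sleeper.call(base_delay * attempt) # linear backoff before the next attempt
  end
end
```

In a real client, the block would send the commit status request and return the response code. The `sleeper` argument only exists so the delay can be stubbed in tests.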

Today we retry for only 2 seconds based on the configured number of retries and sleep_sec:

    def pipeline_lock_params
      {
        ttl: (Feature.enabled?(:long_pipeline_lock_ttl, project) ? 1.minute : 5.seconds),
        sleep_sec: 0.1.seconds,
        retries: 20
      }
    end

We should increase sleep_sec so that the request has a better chance of recovering if the pipeline is locked. Changing sleep_sec to 0.5 while keeping retries at 20 gives us 10 seconds of total retry time (0.5 × 20), enough to cover the 99th percentile (~2 seconds) with plenty of buffer. For longer requests, clients can retry on conflict.
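As a rough check of the retry-window arithmetic, here is a minimal sketch. The helper name `max_retry_wait` is hypothetical, and it counts only the sleep time between attempts, not the time each lock attempt takes. It assumes a proposed sleep_sec of 0.5, the value that yields 10 seconds over 20 retries.

```ruby
# Hypothetical helper: worst-case time (in seconds) spent sleeping between
# lock attempts, given the sleep interval and the number of retries.
def max_retry_wait(sleep_sec:, retries:)
  sleep_sec * retries
end

# Current settings: 0.1 s sleep, 20 retries.
current_window = max_retry_wait(sleep_sec: 0.1, retries: 20)

# Assumed proposed settings: 0.5 s sleep, 20 retries.
proposed_window = max_retry_wait(sleep_sec: 0.5, retries: 20)

puts "current: #{current_window}s, proposed: #{proposed_window}s"
```

The current settings give roughly a 2-second window and the assumed proposed settings roughly a 10-second one.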

Ultimately, a better architecture would be: #575990

Long-running requests like the 60-second ones can affect reliability. This effect should be well contained, since the 99th percentile is around 2 seconds.

Metrics

https://log.gprd.gitlab.net/app/r/s/Cm5pW - We should see the number of 409s reduced.

Edited Oct 09, 2025 by Allison Browne

Reduce number of Conflicts returned by Commit Status api
