Draft: Adaptive request concurrency: do not treat retried requests as capacity

What does this MR do?

Fixes a signal-misclassification bug in the adaptive request concurrency controller (FF_USE_ADAPTIVE_REQUEST_CONCURRENCY).

Before this MR, buildsHelper.releaseRequest inferred "the server has spare capacity" from the final HTTP response only. When the network retry layer transparently retried past a 429 (or 5xx) and eventually received a 201 with a job, the controller treated that as a clean success and scaled concurrency up — exactly the wrong direction, since the server had just told us to slow down.

After this MR, the retry layer reports whether any retries happened during a request, and the controller treats a retried request as an explicit slow-down signal — the adaptive limit halves on the spot (multiplicative decrease, AIMD-style), instead of growing as if the request had been a clean success.

Why this matters

Under sustained server-side back-pressure (either a gateway rate-limiter returning 429s or transient 5xx from the API itself), a runner with adaptive concurrency enabled could climb to its configured request_concurrency cap and stay there, amplifying the pressure rather than relieving it. For large fleets sharing an egress IP, this turns an ordinary rate-limit response into a cascading event because each runner continues issuing concurrent requests at the maximum allowed rate despite receiving explicit slow-down signals.

The fix is scoped: it corrects a misinterpretation of the existing control signal. It does not introduce new policy (no new feature flag, no new knobs, no behaviour change for the default AIMD shape when the server is healthy).

Behaviour change

releaseRequest(runner, hasJob, retried) now distinguishes between a passive non-capacity signal (an empty/failed clean response) and an explicit server-driven slow-down signal (retries happened):

| hasJob | retried | Effect on adaptive limit |
| --- | --- | --- |
| true | false | ×1.1 (+10%) — unchanged, healthy success |
| true | true | ×0.5 (−50%) — explicit slow-down, AIMD multiplicative decrease |
| false | false | ×0.95 (−5%) — unchanged, no work right now |
| false | true | ×0.5 (−50%) — explicit slow-down |

The split between ×0.95 and ×0.5 echoes TCP congestion control: a gentle trim on passive signals (no work to do), and an aggressive multiplicative decrease on explicit congestion signals (the server pushed back). When the feature flag is off, behaviour is unchanged.
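
The update rule in the table can be sketched as a single pure function. This is an illustrative stand-in, not the controller's actual code: nextAdaptiveLimit, adaptiveFloor, and adaptiveCap are hypothetical names, and the floor of 1 and cap of 20 stand in for the configured request_concurrency.

```go
package main

import "fmt"

const (
	adaptiveFloor = 1.0  // limit never drops below one in-flight request
	adaptiveCap   = 20.0 // stands in for the configured request_concurrency
)

// nextAdaptiveLimit applies the AIMD-style update from the table:
// a retried request always halves the limit; otherwise hasJob grows
// it by 10% and an empty clean response trims it gently by 5%.
func nextAdaptiveLimit(limit float64, hasJob, retried bool) float64 {
	switch {
	case retried:
		limit *= 0.5 // explicit slow-down: multiplicative decrease
	case hasJob:
		limit *= 1.1 // healthy success: grow
	default:
		limit *= 0.95 // no work right now: gentle decrease
	}
	if limit < adaptiveFloor {
		limit = adaptiveFloor
	}
	if limit > adaptiveCap {
		limit = adaptiveCap
	}
	return limit
}

func main() {
	// A warm-high limit collapses to the floor within a handful of
	// retried requests: 20 → 10 → 5 → 2.5 → 1.25 → 1.
	limit := 20.0
	for i := 0; i < 6; i++ {
		limit = nextAdaptiveLimit(limit, true, true)
	}
	fmt.Printf("%.4f\n", limit) // 1.0000
}
```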

How the retry signal is surfaced

network.WithRetryTracker(ctx) returns a derived context and an *atomic.Bool: it attaches a writable flag to a request context. The retry layer sets the flag when it observes a retriable status and decides to retry. The caller reads it after the request completes.

This mirrors the pattern used by net/http/httptrace.WithClientTrace — a request-scoped observer installed via context. It avoids rippling a new return value through client.do → doJSON → doMeasuredJSON → RequestJob → Network (six signatures plus the mocked interface).

Observed behaviour

Measured over a 10-second window driving the real network client and adaptive controller against a fake API endpoint that returns retriable statuses mixed with successes, with request_concurrency=20 and a fast feed cadence simulating many runners behind one egress IP.

Requests the server saw (adaptive FF on)

Lower is less pressure on the API gateway.

```mermaid
xychart-beta
    title "Server-observed requests over 10s (adaptive=on)"
    x-axis ["clean", "429-then-job (cold)", "429-then-empty (cold)", "503-then-job (cold)", "429-then-job (warm)"]
    y-axis "requests" 0 --> 500
    bar "before fix" [200, 113,  20, 378, 480]
    bar "after fix"  [200,  20,  20, 134, 162]
```

Max adaptive concurrency limit observed (adaptive FF on)

```mermaid
xychart-beta
    title "Max adaptive concurrency limit (adaptive=on)"
    x-axis ["clean", "429-then-job (cold)", "429-then-empty (cold)", "503-then-job (cold)", "429-then-job (warm)"]
    y-axis "adaptive limit" 0 --> 22
    bar "before fix" [20, 20,  1, 20, 20]
    bar "after fix"  [20,  1,  1,  1, 20]
```

Headline changes:

| Scenario | Requests before | Requests after | Δ | Adaptive limit at end of run (before → after) |
| --- | --- | --- | --- | --- |
| 429-then-job (cold start) | 113 | 20 | −82% | 16.29 → 1.00 |
| 503-then-job (cold start) | 378 | 134 | −65% | 20.00 → 1.00 |
| 429-then-job (warm start) | 480 | 162 | −66% | 15.48 → 1.00 |

The clean and 429-then-empty scenarios are unchanged, confirming the fix is scoped to the misclassification and not a policy change for healthy runners or for already-correct paths. The warm start row reaches the same maximum adaptive limit (20) before pressure begins — the difference shows up after pressure starts: pre-fix the limit barely moves (final 15.48 of 20); post-fix it collapses to the floor (1.0) within a few retried requests.

Testing

New unit tests:

  • commands/builds_helper_test.go:
    • TestBuildsHelper_ReleaseRequest_AdaptiveGrowsOnCleanSuccess — clean first-try successes grow to the configured cap (unchanged behaviour).
    • TestBuildsHelper_ReleaseRequest_RetriedDoesNotGrowAdaptive — retried+hasJob requests keep the limit clamped at 1 (fix).
    • TestBuildsHelper_ReleaseRequest_RetriedAfterWarmupShrinksAdaptiveFast — multiplicative decrease collapses a warm-high limit to the floor in ~6 retried requests.
    • TestBuildsHelper_ReleaseRequest_RetriedDecreasesFasterThanEmptyResponse — explicit slow-down signal shrinks adaptive faster than passive empty responses.
    • TestBuildsHelper_ReleaseRequest_RetriedIgnoredWhenAdaptiveDisabled — FF=off is unaffected.
  • network/retry_signal_test.go: table-driven TestWithRetryTracker covering first-try success (no flag), 429-then-success (flag set), 503-then-success (flag set, confirms not 429-specific), and no-tracker-installed (no panic).

All existing tests in the touched packages pass; the releaseRequest signature change is propagated through commands/multi.go and existing test call sites.

Edited by Arran Walker
