[Feature flag] Convert limiter queue to adopt LIFO strategy (UseResizableSemaphoreLifoStrategy Feature Flag)
## What
Copied from gitlab!122775 (comment 1425182274)
At several points in time we have discussed whether we want to change queueing semantics. Right now we admit queued processes from the head of the queue (FIFO), whereas it was proposed several times that it might be preferable to admit processes from the back (LIFO).
With a FIFO queue and high server load, the requests may time out (or the user aborts them) while waiting in the queue, and thus your highly-loaded server will not actually get work done (any work half-done is thrown away).
With a LIFO queue, this does not happen. And when the load is not so high, the queue algorithm doesn't really matter.
For more detail, see, for example, https://medium.com/swlh/fifo-considered-harmful-793b76f98374
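The argument above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not Gitaly's limiter: the `simulate` function, its parameters, and the tick-based model are all hypothetical, and it assumes that a request admitted after waiting longer than the timeout still consumes a slot whose work is thrown away.

```go
package main

import "fmt"

// request records only the tick at which it entered the queue.
type request struct{ arrivedAt int }

// simulate runs a toy limiter for the given number of ticks. Each tick,
// `arrivals` requests join the queue and up to `capacity` requests are
// admitted, taken from the head (FIFO) or from the tail (LIFO).
// Assumption: an admitted request is useful only if it waited at most
// `timeout` ticks; otherwise the slot is counted as wasted.
func simulate(lifo bool, ticks, arrivals, capacity, timeout int) (useful, wasted int) {
	var queue []request
	for now := 0; now < ticks; now++ {
		for i := 0; i < arrivals; i++ {
			queue = append(queue, request{arrivedAt: now})
		}
		for i := 0; i < capacity && len(queue) > 0; i++ {
			var r request
			if lifo {
				// Admit the most recently queued request.
				r = queue[len(queue)-1]
				queue = queue[:len(queue)-1]
			} else {
				// Admit the request that has waited the longest.
				r = queue[0]
				queue = queue[1:]
			}
			if now-r.arrivedAt <= timeout {
				useful++
			} else {
				wasted++
			}
		}
	}
	return useful, wasted
}

func main() {
	// Overload: 2 arrivals per tick against a capacity of 1 per tick.
	for _, lifo := range []bool{false, true} {
		useful, wasted := simulate(lifo, 100, 2, 1, 5)
		fmt.Printf("overload lifo=%v useful=%d wasted=%d\n", lifo, useful, wasted)
	}
	// Light load: arrivals match capacity, so the strategy makes no difference.
	for _, lifo := range []bool{false, true} {
		useful, wasted := simulate(lifo, 100, 1, 1, 5)
		fmt.Printf("light load lifo=%v useful=%d wasted=%d\n", lifo, useful, wasted)
	}
}
```

In this model, sustained overload makes the FIFO head wait grow until every admitted request has already timed out, while LIFO keeps admitting fresh requests. When arrivals match capacity, both strategies behave identically.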
Enable the :use_resizable_semaphore_lifo_strategy feature flag ...
## Owners
- Team: Gitaly
- Most appropriate Slack channel to reach out to: #g_gitaly
- Best individual to reach out to: @echui-gitlab
## Expectations
### What release does this feature occur in first?
Milestone v16.10.
### What are we expecting to happen?
We expect better performance under relatively high traffic. With adaptive limiting turned on, slightly more requests should be served, and queued requests should spend less time waiting before they are served.
### What might happen if this goes wrong?
Since we are switching the queuing mechanism for concurrency limiters from FIFO to LIFO, if things go wrong we might see more cancelled requests or more requests dropped after timing out. This is because LIFO does not treat requests as fairly as FIFO, where the request that arrives first is handled first. LIFO unfairly favours the most recent request.
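The fairness concern can be sketched with a second toy model. Everything here is an assumption made for illustration and not part of Gitaly: the `droppedAfterBurst` function and its parameters are hypothetical, the limiter admits exactly one request per tick, one new request arrives per tick after an initial burst at tick 0, and any request that has waited longer than the timeout is dropped.

```go
package main

import "fmt"

// droppedAfterBurst models a limiter that admits one request per tick while
// one new request arrives per tick, after an initial burst at tick 0.
// Requests that have waited longer than timeout ticks are dropped.
// It returns how many requests were dropped within the simulated ticks.
func droppedAfterBurst(lifo bool, burst, ticks, timeout int) int {
	queue := make([]int, 0, burst+1) // arrival tick of each queued request
	for i := 0; i < burst; i++ {
		queue = append(queue, 0)
	}
	dropped := 0
	for now := 0; now < ticks; now++ {
		queue = append(queue, now) // steady arrival for this tick

		// Drop every queued request whose wait exceeded the timeout.
		kept := queue[:0]
		for _, arrivedAt := range queue {
			if now-arrivedAt > timeout {
				dropped++
			} else {
				kept = append(kept, arrivedAt)
			}
		}
		queue = kept

		// Admit one request: the newest under LIFO, the oldest under FIFO.
		if len(queue) == 0 {
			continue
		}
		if lifo {
			queue = queue[:len(queue)-1]
		} else {
			queue = queue[1:]
		}
	}
	return dropped
}

func main() {
	for _, lifo := range []bool{false, true} {
		fmt.Printf("lifo=%v dropped=%d\n", lifo, droppedAfterBurst(lifo, 3, 20, 5))
	}
}
```

In this model FIFO works through the burst with a bounded wait for everyone, whereas LIFO keeps serving the newest arrival, so the burst requests sit at the bottom of the queue until they time out. This is the kind of increase in dropped requests the rollout should watch for.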
### What can we monitor to detect problems with this?
The number of dropped requests, the number of requests served, the time spent being rate limited, request latency, and the CPU and memory usage of the node.
## Roll Out Steps
- Enable on staging
  - Is the required code deployed on staging? (howto)
  - Enable on staging (howto)
  - Add the featureflag::staging label to this issue (howto)
  - Test on staging (howto)
  - Verify the feature flag was used by checking Prometheus metric gitaly_feature_flag_checks_total
- Enable on production
  - Is the required code deployed on production? (howto)
  - Progressively enable in production (howto)
  - Add the featureflag::production label to this issue
  - Verify the feature flag was used by checking Prometheus metric gitaly_feature_flag_checks_total
- Create subsequent issues
  - To default enable the feature flag (optional, only required if backwards-compatibility concerns exist)
    - Create issue using the Feature Flag Default Enable template.
    - Set milestone to current+1 release
  - To Remove feature flag
    - Create issue using the Feature Flag Removal template.
    - Set milestone to current+1 (+2 if we created an issue to default enable the flag).
Please refer to the documentation of feature flags for further information.