Skip to content

limithandler: Fix queueing mechanism in the concurrency limiter

Patrick Steinhardt requested to merge pks-limithandler-fix-queueing into master

The concurrency limiter has two different mechanisms:

1. Callers first get put into a queue that is global across all
   limiting keys. This queue is depth- and time-limited, which means
   that callers will get rejected if it is full and will be evicted
   when they took too long to acquire the concurrency token.

2. The concurrency limit itself that will only allow a set number of
   callers at the same time. This functionality is per limiting key.

While the intent is usually to concurrency-limit either by user or by repository, this is sabotaged by the fact that the queue is key-agnostic and thus global. So even if almost all calls would apply to a single repository only, if the queue is enabled and full then it would become impossible to invoke the function for any other repository now. As a result the queue is rather useless as a concurrency-limiting primitive.

But there is another big design flaw: as callers need to be in the queue to obtain the concurrency-limiting token, and the number of callers in the queue is strictly limited, we essentially have a global concurrency limit to obtain the concurrency-limiting tokens. Suppose you have two calls to a function that has a maximum queue depth of 1 and two calls to this function at the same time. Even if the concurrency limit would now theoretically allow for both functions to run at the same time, there is a race window where both callers might try to enter the queue at the same point in time. If this race is lost, then one of both callers will be rejected due to the queue being full while the other one is trying to obtain the concurrency token. This issue in fact surfaces in our tests, where the TestStreamLimitHandler test is frequently failing because the race is often lost.

This second design flaw cannot easily be fixed while the queue remains global: we need to remain in the queue when trying to acquire the concurrency token, or otherwise it wouldn't really be limiting anything at all.

Convert the queue to be per-key so that each resource identified by a key essentially has its own queue. This fixes both issues:

- If the queue is full then we only limit access to the specific
  resource for which it is full. This makes the queueing mechanism
  useful again.

- We can now change the queueing logic to allow as many callers into
  the queue as the concurrency limit would allow _plus_ the allowed
  depth of the queue itself. This fixes the race described aboved.

Fixes #4625 (closed).

Fixes #4697 (closed).

Fixes https://gitlab.com/gitlab-org/gitaly/-/issues/4411.

Edited by Patrick Steinhardt

Merge request reports