fix: FastAPI runs new threads for dependency resolutions

Shinya Maeda requested to merge async-dependency-resolution into main

What does this merge request do and why?

This MR fixes an issue where many threads are spun up when multiple requests are received concurrently, which could be the root cause of https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/409+ and https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17366+.

Currently, we're using FastAPI's Depends and Dependency Injector's Wiring (a.k.a. Provide/@inject) in the following way:

async def chat(
    request: Request,
    chat_request: ChatRequest,
    anthropic_claude_factory: Factory[AnthropicModel] = Depends(
        Provide[ContainerApplication.chat.anthropic_claude_factory.provider]
    ),
):
    ...

This follows an example usage provided by Dependency Injector. However, since the Provider object is not async/coroutine compatible, FastAPI runs a new thread from its pool in order to resolve the dependency. See the solve_dependencies module in FastAPI:

        elif is_coroutine_callable(call):
            solved = await call(**sub_values)
        else:
            solved = await run_in_threadpool(call, **sub_values)

https://github.com/tiangolo/fastapi/blob/0.108.0/fastapi/dependencies/utils.py#L600
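The dispatch above can be sketched with a stdlib-only simulation (hypothetical helper names; `asyncio.to_thread` stands in for Starlette's `run_in_threadpool`): a plain callable gets resolved on a worker thread, while a coroutine stays on the event loop's thread.

```python
import asyncio
import inspect
import threading

def sync_dep():
    # Plain callable: FastAPI would offload this to a threadpool worker.
    return threading.current_thread().name

async def async_dep():
    # Coroutine: FastAPI awaits it directly on the event loop's thread.
    return threading.current_thread().name

async def resolve(call):
    # Simplified stand-in for FastAPI's solve_dependencies dispatch.
    if inspect.iscoroutinefunction(call):
        return await call()
    return await asyncio.to_thread(call)  # ~ run_in_threadpool

async def main():
    print("sync dep resolved on:", await resolve(sync_dep))
    print("async dep resolved on:", await resolve(async_dep))

asyncio.run(main())
```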

This is problematic because most of the dependencies managed by Dependency Injector are not thread-safe.

We fix this issue by passing an async def callable to Depends, so that FastAPI resolves the Dependency Injector provider on the event loop's main thread instead of a worker thread.
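A minimal sketch of the before/after pattern, with a hypothetical `FakeProvider` standing in for a real Dependency Injector provider (this is not the exact code from the MR):

```python
import asyncio
import inspect

class FakeProvider:
    """Hypothetical stand-in for a Dependency Injector provider."""
    def __call__(self):
        return "model-factory"

provider = FakeProvider()

# Before: Depends(provider) receives a plain callable, so FastAPI
# resolves it via run_in_threadpool on a worker thread.
assert not inspect.iscoroutinefunction(provider)

# After: wrap the provider call in an async def and pass that to
# Depends. FastAPI awaits the coroutine on the event loop's thread,
# and no threadpool worker is involved.
async def get_factory():
    return provider()

assert inspect.iscoroutinefunction(get_factory)
print(asyncio.run(get_factory()))
```

In a route, the change amounts to `Depends(get_factory)` replacing `Depends(Provide[...].provider)`.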

Related to https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/409+

How to set up and validate locally

  1. Enable thread monitoring:
# Instrumentators
AIGW_INSTRUMENTATOR__THREAD_MONITORING_ENABLED=True
AIGW_INSTRUMENTATOR__THREAD_MONITORING_INTERVAL=1
  2. Start the server: poetry run ai_gateway
  3. Simulate concurrent requests:
for i in {1..10}
do
curl -X 'POST' \
  'http://0.0.0.0:5052/v1/chat/agent' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt_components": [
    {
      "type": "string",
      "metadata": {
        "source": "string",
        "version": "string"
      },
      "payload": {
        "content": "\n\nHuman: Hi, How are you?\n\nAssistant:",
        "provider": "anthropic",
        "model": "claude-2.0",
        "params": {
          "stop_sequences": [
            "\n\nHuman",
            "Observation:"
          ],
          "temperature": 0.2,
          "max_tokens_to_sample": 2048
        }
      }
    }
  ],
  "stream": false
}' &
done

(FYI, the trailing & runs each curl in the background, so the requests execute concurrently.)

  4. Make sure that the threads_count in the modelgateway_debug.log doesn't increase.

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.