fix: FastAPI runs new threads for dependency resolution
## What does this merge request do and why?
This MR fixes an issue where many threads are spun up when multiple requests are received concurrently, which could be a root cause of https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/409+ and https://gitlab.com/gitlab-com/gl-infra/production/-/issues/17366+.
Currently, we're using FastAPI's `Depends` together with Dependency Injector's wiring (a.k.a. `Provide`/`@inject`) in the following way:

```python
async def chat(
    request: Request,
    chat_request: ChatRequest,
    anthropic_claude_factory: Factory[AnthropicModel] = Depends(
        Provide[ContainerApplication.chat.anthropic_claude_factory.provider]
    ),
```
This is actually an example usage provided by Dependency Injector; however, since the `Provider` object is not an async/coroutine-compatible callable, FastAPI runs a new thread from the pool in order to resolve the dependency. See the `solve_dependencies` function in FastAPI:

```python
elif is_coroutine_callable(call):
    solved = await call(**sub_values)
else:
    solved = await run_in_threadpool(call, **sub_values)
```

https://github.com/tiangolo/fastapi/blob/0.108.0/fastapi/dependencies/utils.py#L600
This is not a good practice because most of the dependencies inside the Dependency Injector container are not thread-safe.
We fix this issue by passing an `async def` function to `Depends`, so that the Dependency Injector provider is resolved in the main thread on the event loop.
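A minimal sketch of why this works (the function names here are illustrative, not the actual gateway code): FastAPI checks whether the dependency callable is a coroutine function, and only coroutine functions are awaited on the event loop; plain callables go to the thread pool. Wrapping the provider call in an `async def` flips that check:

```python
import asyncio
import inspect

# Illustrative stand-in for a Dependency Injector provider: calling it
# builds the dependency synchronously (and is not thread-safe).
def anthropic_claude_provider():
    return "AnthropicModel instance"

# Before: passing the provider itself to Depends() hands FastAPI a plain
# callable, which it resolves via run_in_threadpool (a new pool thread).

# After: an async wrapper is a coroutine function, so FastAPI awaits it
# directly on the event loop -- no extra thread is spawned.
async def get_anthropic_claude():
    return anthropic_claude_provider()

# FastAPI's dispatch decision boils down to a check like this:
assert not inspect.iscoroutinefunction(anthropic_claude_provider)
assert inspect.iscoroutinefunction(get_anthropic_claude)

print(asyncio.run(get_anthropic_claude()))  # resolved without a pool thread
```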
Related to https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/409+
## How to set up and validate locally
- Enable thread monitoring and start the server:

  ```shell
  # Instrumentators
  AIGW_INSTRUMENTATOR__THREAD_MONITORING_ENABLED=True
  AIGW_INSTRUMENTATOR__THREAD_MONITORING_INTERVAL=1
  ```

  ```shell
  poetry run ai_gateway
  ```
- Simulate concurrent requests:

  ```shell
  for i in {1..10}
  do
    curl -X 'POST' \
      'http://0.0.0.0:5052/v1/chat/agent' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "prompt_components": [
        {
          "type": "string",
          "metadata": {
            "source": "string",
            "version": "string"
          },
          "payload": {
            "content": "\n\nHuman: Hi, How are you?\n\nAssistant:",
            "provider": "anthropic",
            "model": "claude-2.0",
            "params": {
              "stop_sequences": [
                "\n\nHuman",
                "Observation:"
              ],
              "temperature": 0.2,
              "max_tokens_to_sample": 2048
            }
          }
        }
      ],
      "stream": false
    }' &
  done
  ```
  (FYI, the trailing `&` runs each request in a background subprocess so they execute concurrently.)
- Make sure that the `threads_count` in the `modelgateway_debug.log` doesn't increase.
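One way to eyeball this is to extract the reported counts from the log. The field name `threads_count` comes from this MR, but the surrounding JSON layout below is an assumption; the snippet fabricates two sample log lines purely to demonstrate the extraction, so adjust the pattern to your actual log format:

```shell
# Write two fabricated structured-log lines (the real file is produced by
# the gateway; the JSON shape here is assumed for illustration only).
printf '%s\n' \
  '{"event": "threads", "threads_count": 12}' \
  '{"event": "threads", "threads_count": 12}' > /tmp/modelgateway_debug.log

# Extract the distinct thread counts; with the fix applied, the real log
# should show a flat count across concurrent requests.
grep -o '"threads_count": [0-9]*' /tmp/modelgateway_debug.log | sort -u
```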
## Merge request checklist

- [ ] Tests added for new functionality. If not, please raise an issue to follow up.
- [ ] Documentation added/updated, if needed.