Skip to content

fix: Avoid running a new thread per invalid request

What does this merge request do and why?

This MR fixes the issue that the server spins up a new thread for each invalid request in order to fill the response_body field in the access log. These threads linger even after the response is sent (and it seems waiting for acquiring an exclusive lock), thus I suspect that it could pressurize the CPU utilization.

Related to https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/409+ and Consolidate 401 error logs and log response con... (!62 - merged)

This MR introduces a new auth_error_details field to the access log, so that we can keep monitoring the authentication error messages. Here is an example of a log:

{
    "url": "http://0.0.0.0:5052/dummy",
    "path": "/dummy",
    "status_code": 401,
    "method": "GET",
    "correlation_id": "36e3a617c138489e979d2703408dd447",
    "http_version": "1.1",
    "client_ip": "127.0.0.1",
    "client_port": 54632,
    "duration_s": 0.000851937998959329,
    "cpu_s": 0.000851090999999915,
    "user_agent": "curl/8.4.0",
    "gitlab_instance_id": null,
    "gitlab_global_user_id": null,
    "gitlab_host_name": null,
    "gitlab_saas_namespace_ids": null,
    "gitlab_realm": null,
    "auth_error_details": "No authorization header presented",
    "logger": "api.access",
    "level": "info",
    "type": "mlops",
    "stage": "main",
    "timestamp": "2024-01-24T06:54:34.951007Z",
    "message": "127.0.0.1:54632 - \"GET /dummy HTTP/1.1\" 401"
}

This MR also introduces a new http_exception_details field to the access log, so that we can keep monitoring the http exception messages. This is a supplemental information to log the exceptions happening at the endpoint layer. For example, this test case is expected to encounter auth error due to the @requires(<scope-name>) check at the endpoint. This exception is raised as HTTPException and can be caught by FastAPI's exception handlers. Here is an example of a log:

{
    "url": "http://0.0.0.0:5052/v1/chat/agent",
    "path": "/v1/chat/agent",
    "status_code": 502,
    "method": "POST",
    "correlation_id": "cb7a438c39944bef993fa271f99ff009",
    "http_version": "1.1",
    "client_ip": "127.0.0.1",
    "client_port": 47430,
    "duration_s": 0.00301417999980913,
    "cpu_s": 0.0030138370000001302,
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "gitlab_instance_id": null,
    "gitlab_global_user_id": null,
    "gitlab_host_name": null,
    "gitlab_saas_namespace_ids": null,
    "gitlab_realm": null,
    "meta.feature_category": "duo_chat",
    "http_exception_details": "502: Anthropic API Status Error.",
    "logger": "api.access",
    "level": "info",
    "type": "mlops",
    "stage": "main",
    "timestamp": "2024-01-24T10:24:54.022925Z",
    "message": "127.0.0.1:47430 - \"POST /v1/chat/agent HTTP/1.1\" 502"
}

How to set up and validate locally

  1. Enable thread monitoring:
# Instrumentators
AIGW_INSTRUMENTATOR__THREAD_MONITORING_ENABLED=True
AIGW_INSTRUMENTATOR__THREAD_MONITORING_INTERVAL=1
  1. poetry run ai_gateway
  2. Simulate concurrent invalid requests:
for i in {1..10}
do
curl http://0.0.0.0:5052/dummy &
done

(FYI, & is to execute the requests in subprocesses concurrently.)

  1. Make sure that the threads_count in the modelgateway_debug.log doesn't increase.

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Shinya Maeda

Merge request reports