fix: Avoid running a new thread per invalid request
What does this merge request do and why?
This MR fixes the issue that the server spins up a new thread for each invalid request in order to fill the response_body
field in the access log. These threads linger even after the response is sent (and it seems waiting for acquiring an exclusive lock), thus I suspect that it could pressurize the CPU utilization.
Related to https://gitlab.com/gitlab-org/modelops/applied-ml/code-suggestions/ai-assist/-/issues/409+ and Consolidate 401 error logs and log response con... (!62 - merged)
This MR introduces a new auth_error_details
field to the access log, so that we can keep monitoring the authentication error messages. Here is an example of a log:
{
"url": "http://0.0.0.0:5052/dummy",
"path": "/dummy",
"status_code": 401,
"method": "GET",
"correlation_id": "36e3a617c138489e979d2703408dd447",
"http_version": "1.1",
"client_ip": "127.0.0.1",
"client_port": 54632,
"duration_s": 0.000851937998959329,
"cpu_s": 0.000851090999999915,
"user_agent": "curl/8.4.0",
"gitlab_instance_id": null,
"gitlab_global_user_id": null,
"gitlab_host_name": null,
"gitlab_saas_namespace_ids": null,
"gitlab_realm": null,
"auth_error_details": "No authorization header presented",
"logger": "api.access",
"level": "info",
"type": "mlops",
"stage": "main",
"timestamp": "2024-01-24T06:54:34.951007Z",
"message": "127.0.0.1:54632 - \"GET /dummy HTTP/1.1\" 401"
}
This MR also introduces a new http_exception_details
field to the access log, so that we can keep monitoring the http exception messages. This is a supplemental information to log the exceptions happening at the endpoint layer. For example, this test case is expected to encounter auth error due to the @requires(<scope-name>)
check at the endpoint. This exception is raised as HTTPException
and can be caught by FastAPI's exception handlers. Here is an example of a log:
{
"url": "http://0.0.0.0:5052/v1/chat/agent",
"path": "/v1/chat/agent",
"status_code": 502,
"method": "POST",
"correlation_id": "cb7a438c39944bef993fa271f99ff009",
"http_version": "1.1",
"client_ip": "127.0.0.1",
"client_port": 47430,
"duration_s": 0.00301417999980913,
"cpu_s": 0.0030138370000001302,
"user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"gitlab_instance_id": null,
"gitlab_global_user_id": null,
"gitlab_host_name": null,
"gitlab_saas_namespace_ids": null,
"gitlab_realm": null,
"meta.feature_category": "duo_chat",
"http_exception_details": "502: Anthropic API Status Error.",
"logger": "api.access",
"level": "info",
"type": "mlops",
"stage": "main",
"timestamp": "2024-01-24T10:24:54.022925Z",
"message": "127.0.0.1:47430 - \"POST /v1/chat/agent HTTP/1.1\" 502"
}
How to set up and validate locally
- Enable thread monitoring:
# Instrumentators
AIGW_INSTRUMENTATOR__THREAD_MONITORING_ENABLED=True
AIGW_INSTRUMENTATOR__THREAD_MONITORING_INTERVAL=1
poetry run ai_gateway
- Simulate concurrent invalid requests:
for i in {1..10}
do
curl http://0.0.0.0:5052/dummy &
done
(FYI, &
is to execute the requests in subprocesses concurrently.)
- Make sure that the
threads_count
in themodelgateway_debug.log
doesn't increase.
Merge request checklist
-
Tests added for new functionality. If not, please raise an issue to follow up. -
Documentation added/updated, if needed.