Use stricter timeouts for model gateway calls (!126852) · GitLab.org / GitLab

Matthias Käppler requested to merge 418893-mg-timeouts into master Jul 18, 2023

What does this MR do and why?

Use stricter timeouts for model gateway calls. Blocking a Puma worker thread for too long can significantly reduce that worker process's ability to serve other requests, so we should treat any code suggestions request that takes too long as failed. See #418203 (closed) for how these problems materialize.
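For reference, here is a minimal sketch of what stricter timeouts look like with plain Net::HTTP from the Ruby standard library. It is not the code changed in this MR (the client-side backtrace shows the real call going through Gitlab::HTTP). The host, port, and timeout values are assumptions picked for illustration:

```ruby
require "net/http"

# Hypothetical values for illustration only; not the limits chosen in this MR.
GATEWAY_HOST = "localhost"
GATEWAY_PORT = 5000
OPEN_TIMEOUT_S = 1 # max seconds to establish the TCP connection
READ_TIMEOUT_S = 2 # max seconds to wait for each read of the response

# Builds a client that fails fast instead of blocking the calling (Puma)
# thread. Net::HTTP.new does not open a connection yet.
def build_gateway_client(host: GATEWAY_HOST, port: GATEWAY_PORT)
  http = Net::HTTP.new(host, port)
  http.open_timeout = OPEN_TIMEOUT_S
  http.read_timeout = READ_TIMEOUT_S
  http.write_timeout = READ_TIMEOUT_S # write_timeout= needs Ruby 2.6+
  http
end

client = build_gateway_client
puts "open=#{client.open_timeout}s read=#{client.read_timeout}s write=#{client.write_timeout}s"
```

Plain Net::HTTP defaults all three timeouts to 60 seconds, so without explicit limits a thread can wait that long on a slow upstream.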

Errors resulting from these timeouts will surface in the logs of both gitlab-rails (because Net::HTTP times out) and the model gateway (because the client hangs up before the server can respond).
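The client half of this behavior can be reproduced in isolation. The sketch below uses a local stand-in (an assumption, not the real model gateway): a TCP server that reads the request and never answers, so Net::HTTP gives up after its read timeout and raises Net::ReadTimeout, the same exception class as in the client-side log further down:

```ruby
require "net/http"
require "socket"

# Hypothetical stand-in for a slow model gateway: it accepts the connection,
# reads the request, then never answers.
server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
port = server.addr[1]
open_conns = []
server_thread = Thread.new do
  conn = server.accept
  open_conns << conn # keep the socket open so the client has to wait
  conn.readpartial(4096) rescue nil # consume the request, then stay silent
  sleep
end

http = Net::HTTP.new("127.0.0.1", port)
http.open_timeout = 1
http.read_timeout = 0.5 # much shorter than the stand-in "gateway" will take

result =
  begin
    http.post("/v2/completions", "{}", "Content-Type" => "application/json")
    :got_response
  rescue Net::ReadTimeout => e
    e.class # same exception class that gitlab-rails logs client-side
  end

puts result # prints Net::ReadTimeout

# Clean up the stand-in server.
server_thread.kill
open_conns.each(&:close)
server.close
```

On the real gateway, the matching symptom is the "No response returned." error in the server-side log, since the client has already closed the connection.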

One challenge is that self-managed instances will soon use this too, and depending on where they are deployed, they will see different latency to the model gateway than SaaS does. How we will account for that is an open question.

Potential follow-up improvements

The gateway logs could be clearer. Perhaps we want to catch these exceptions and produce an explicit ClientTimeoutError that we log instead, along with which client was calling (self-managed or SaaS).
Rails logs from self-managed instances are not available to us. To get the client-side picture, could we increment Redis counters whenever a request fails due to a timeout and ship those through Service Ping?
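The Redis counter idea could look roughly like the following. This is a hypothetical sketch: a mutex-guarded in-memory hash stands in for Redis, the key names are made up, and Service Ping reporting is left out entirely. Timeouts are counted and then re-raised, so the request still fails as described above:

```ruby
require "net/http"

# Hypothetical in-memory stand-in for the Redis counters; a real version
# would use Redis INCR so counts are shared across Puma processes.
class TimeoutCounter
  def initialize
    @counts = Hash.new(0)
    @mutex = Mutex.new # Puma serves requests from several threads
  end

  def increment(key)
    @mutex.synchronize { @counts[key] += 1 }
  end

  def snapshot
    @mutex.synchronize { @counts.dup }
  end
end

# Hypothetical key names, one per Net::HTTP timeout class.
TIMEOUT_KEYS = {
  Net::OpenTimeout => "code_suggestions_open_timeout",
  Net::ReadTimeout => "code_suggestions_read_timeout",
  Net::WriteTimeout => "code_suggestions_write_timeout"
}.freeze

# Runs the block; on a timeout, records it and re-raises so the request
# is still treated as failed.
def with_timeout_tracking(counter)
  yield
rescue Net::OpenTimeout, Net::ReadTimeout, Net::WriteTimeout => e
  counter.increment(TIMEOUT_KEYS.fetch(e.class))
  raise
end

counter = TimeoutCounter.new
begin
  with_timeout_tracking(counter) { raise Net::ReadTimeout }
rescue Net::ReadTimeout
  # The caller still sees the failure.
end
puts counter.snapshot.fetch("code_suggestions_read_timeout") # prints 1
```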

How to set up and validate locally

Run gitlab-rails and the model gateway locally
Add 2+ seconds of latency to a model, e.g. by letting the thread sleep

Request code suggestions:

   curl -v -H'Authorization: Bearer <snip>' -H'X-gitlab-oidc-token: <snip>' -H'content-type: application/json' -d'{
  "prompt_version": 1,
  "current_file": {
    "file_name": "test.py",
    "content_above_cursor": "def is_even(n: int) ->",
    "content_below_cursor": ""
  }
}' localhost:3000/api/v4/code_suggestions/completions
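If it is more convenient to script the request from Ruby, the same call can be built with Net::HTTP. This is a sketch, not part of the MR: the token values stay elided as in the curl command, and the 5-second read timeout in the commented-out send line is an arbitrary assumption:

```ruby
require "json"
require "net/http"

# Same payload as the curl example above.
payload = {
  prompt_version: 1,
  current_file: {
    file_name: "test.py",
    content_above_cursor: "def is_even(n: int) ->",
    content_below_cursor: ""
  }
}

uri = URI("http://localhost:3000/api/v4/code_suggestions/completions")
request = Net::HTTP::Post.new(uri)
request["Authorization"] = "Bearer <snip>" # token elided, as in the original
request["X-gitlab-oidc-token"] = "<snip>"   # token elided, as in the original
request["Content-Type"] = "application/json"
request.body = JSON.generate(payload)

# Sending it requires the local setup described above:
# response = Net::HTTP.start(uri.host, uri.port, read_timeout: 5) { |http| http.request(request) }
```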

To demonstrate, I injected some extra latency into the model response so the client would always time out.

Client-side error (gitlab-rails log):

{
    "severity": "ERROR",
    "time": "2023-07-18T12:18:01.469Z",
    "correlation_id": "01H5MEZ5K7QGH5QXHXPER6VS8G",
    "meta.caller_id": "POST /api/:version/code_suggestions/completions",
    "meta.remote_ip": "192.168.80.1",
    "meta.feature_category": "code_suggestions",
    "meta.user": "root",
    "meta.user_id": 1,
    "meta.client_id": "user/1",
    "exception.class": "Net::ReadTimeout",
    "exception.message": "Net::ReadTimeout with #\u003cTCPSocket:(closed)\u003e",
    "exception.backtrace": [
        "lib/gitlab/buffered_io.rb:32:in `readuntil'",
        "lib/gitlab/http.rb:65:in `perform_request'",
        "ee/lib/api/code_suggestions.rb:90:in `block (3 levels) in \u003cclass:CodeSuggestions\u003e'",
        "..."
    ],
    "user.username": "root",
    "tags.program": "web",
    "tags.locale": "en",
    "tags.feature_category": "code_suggestions",
    "tags.correlation_id": "01H5MEZ5K7QGH5QXHXPER6VS8G"
}

Server-side error (model gateway log):

{
    "url": "http://ai-gateway:5000/v2/completions",
    "path": "/v2/completions",
    "status_code": 500,
    "method": "POST",
    "correlation_id": "a9ae7adb00fb4bb58a1a708a09c1483a",
    "http_version": "1.1",
    "client_ip": "192.168.80.2",
    "client_port": 58110,
    "duration_s": 2.0054025259996706,
    "cpu_s": 0.006960481000000129,
    "user_agent": "Ruby",
    "get_suggestions_duration_s": 2.001935721999871,
    "exception.message": "No response returned.",
    "exception.backtrace": "Traceback (most recent call last):\n  File \"/opt/venv/codesuggestions-9TtSrW0h-py3.9/lib/python3.9/site-packages/anyio/streams/memory.py\", line 98, in receive\n    return self.receive_nowait()\n  File \"/opt/venv/codesuggestions-9TtSrW0h-py3.9/lib/python3.9/site-packages/anyio/streams/memory.py\", line 93, in receive_nowait\n    raise WouldBlock\nanyio.WouldBlock\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/venv/codesuggestions-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/middleware/base.py\", line 43, in call_next\n    message = await recv_stream.receive()\n  File \"/opt/venv/codesuggestions-9TtSrW0h-py3.9/lib/python3.9/site-packages/anyio/streams/memory.py\", line 118, in receive\n    raise EndOfStream\nanyio.EndOfStream\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/app/codesuggestions/api/middleware.py\", line 103, in dispatch\n    response = await call_next(request)\n  File \"/opt/venv/codesuggestions-9TtSrW0h-py3.9/lib/python3.9/site-packages/starlette/middleware/base.py\", line 47, in call_next\n    raise RuntimeError(\"No response returned.\")\nRuntimeError: No response returned.\n",
    "logger": "api.access",
    "level": "info",
    "type": "mlops",
    "stage": "main",
    "timestamp": "2023-07-18T12:18:01.453366Z",
    "message": "192.168.80.2:58110 - \"POST /v2/completions HTTP/1.1\" 500"
}

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Related to #418893 (closed)

Edited Jul 18, 2023 by Matthias Käppler
