Calculate proper limits for context window when serializing resource in Chat Identifier tool
Problem
With the move of tool prompts to AI Gateway and the introduction of custom models, the monolith is no longer responsible for choosing the right model for a task. When we execute any of the identifier tools in Chat (for answering questions about issues, epics, merge requests, CI jobs, and commits), we just send parameters to AI Gateway, which is responsible for routing the request to the right model. As a result, the GitLab monolith knows nothing about the supported context window.
However, some numbers are still hardcoded in the monolith, and they are still used when we serialize the resource that is returned to the Chat executor once the Identifier has finished its work.
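For illustration, the hardcoded limit looks roughly like the sketch below. The `Llm::AnthropicPrompt` module name and the 4-characters-per-token heuristic are assumptions for the sketch; only the `MAX_CHARACTERS` constant and the 100_000-token figure come from this issue:

```ruby
# Hypothetical sketch of the hardcoded limit described above; module and
# constant layout are illustrative, not the exact monolith code.
module Llm
  module AnthropicPrompt
    # Assumed token budget for the Anthropic model (see "Bug behaviour" below).
    TOTAL_MODEL_TOKEN_LIMIT = 100_000

    # Rough 4-characters-per-token heuristic (an assumption) used to turn the
    # token budget into a character budget for serialized resources.
    MAX_CHARACTERS = TOTAL_MODEL_TOKEN_LIMIT * 4
  end
end
```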
Quick recap of how it works
1. User asks a question.
2. Rails Chat executor sends a request to AI Gateway with the user question and some context information.
3. AI Gateway prepares a prompt and sends a request to the model.
4. The model decides to use an Identifier tool, let's say `issue_reader`.
5. The model's response is forwarded to the Rails Chat executor.
6. Rails Chat executor chooses `IssueReader` to make the next step.
7. `IssueReader` sends a request to the LLM to identify the resource from the user question.
8. When the response is received, `IssueReader` searches for the requested resource (issue) in the database.
9. The issue is serialized, but its content is limited based on the `MAX_CHARACTERS` provided by the "selected prompt provider" (see the sketch after this list).
10. The serialized issue is returned to the Rails Chat executor, which sends the serialized issue plus the previous conversation back to AI Gateway -> LLM.
11. If the LLM is able to give a final answer, it's displayed in the UI.
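A minimal sketch of step 9, assuming a plain JSON serializer; the real monolith code is more involved (it includes notes, labels, and so on), and `IssueReader#serialize` here is illustrative rather than the actual method signature. The point is only that the cap comes from a provider-specific hardcoded constant:

```ruby
require 'json'

# Hypothetical sketch of step 9: the reader tool serializes the resource it
# found and truncates it with the provider's hardcoded character budget.
class IssueReader
  MAX_CHARACTERS = 100_000 * 4 # assumed Anthropic-only budget

  # Serialize the issue and cap its size so the combined prompt is expected
  # to fit the (assumed) model context window.
  def serialize(issue)
    json = JSON.generate(issue)
    json[0, MAX_CHARACTERS]
  end
end

# Usage: IssueReader.new.serialize(title: "Bug", description: "...")
```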
Bug behaviour
The incorrect behaviour occurs at step 9. Currently, the Rails monolith is aware only of the Anthropic provider and doesn't take any other possible models into account. Moreover, the data in the helper is outdated or needs to be verified:
- The link in the code comment is broken and leads nowhere. It should probably be changed to this one.
- Currently, it states that the model token limit is 100_000, but according to the new documentation, for Claude 3.5 it's 200_000.

It's possible that we don't use the model to its full capacity, or that this creates potential issues for custom models.
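Assuming the same rough 4-characters-per-token heuristic as above (the exact ratio in the helper may differ), the outdated constant would leave about half of Claude 3.5's window unused:

```ruby
# Back-of-the-envelope illustration of the gap (4 chars/token assumption):
hardcoded_budget  = 100_000 * 4 # => 400_000 characters under the current limit
claude_3_5_budget = 200_000 * 4 # => 800_000 characters the model could accept
puts claude_3_5_budget / hardcoded_budget # => 2, i.e. half the window goes unused
```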
Solution
This should be investigated. One possible solution could be to send a bit more content to the AI Gateway and do the trimming there.
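A minimal sketch of that direction on the monolith side, assuming a hypothetical `SAFETY_CAP`: the monolith stops guessing the model's window and only applies a generous cap, while the AI Gateway, which knows the selected model, does the final per-model trimming:

```ruby
require 'json'

# Hypothetical sketch of the proposed direction, not a committed design.
class IssueReader
  # Assumed generous cap that is not tied to any particular model; the AI
  # Gateway would trim the payload down to the real context window.
  SAFETY_CAP = 1_000_000

  def serialized_payload(issue)
    # Send up to the safety cap; model-specific trimming happens in AI Gateway.
    JSON.generate(issue)[0, SAFETY_CAP]
  end
end
```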