
Calculate proper limits for context window when serializing resource in Chat Identifier tool

Problem

With the move of tool prompts to AI Gateway and the introduction of custom models, the monolith is no longer responsible for choosing the right model for a task. When we execute any of the identifier tools in Chat (for answering questions about issues, epics, merge requests, CI jobs, and commits), we just send parameters to AI Gateway, which is responsible for routing the request to the right model. As a result, the GitLab monolith knows nothing about the supported context window.

However, we still have some numbers hardcoded in the monolith, and they are used when we serialize the resource that is returned to the Chat executor once the Identifier finishes its work.

Quick recap of how it works

  1. User asks a question.
  2. The Rails Chat executor sends a request to AI Gateway with the user question and some context information.
  3. AI Gateway prepares a prompt and sends a request to the model.
  4. The model decides to use an Identifier tool, let's say issue_reader.
  5. The response is forwarded to the Rails Chat executor.
  6. The Rails Chat executor chooses IssueReader to make the next step.
  7. IssueReader sends a request to the LLM to identify the resource from the user question.
  8. When the response is received, IssueReader searches for the requested resource (an issue) in the database.
  9. The issue is serialized, but its content is limited based on MAX_CHARACTERS provided by the "selected prompt provider" (see the sketch after this list).
  10. The serialized issue is returned to the Rails Chat executor, which sends the serialized issue + previous conversation back to AI Gateway -> LLM.
  11. If the LLM is able to give a final answer, it's displayed in the UI.
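
A minimal sketch of the current step-9 behaviour, assuming hypothetical class, method, and constant names (the real ones in the monolith differ):

```ruby
require 'json'

# Hypothetical sketch of step 9: the Identifier tool serializes the resource
# and trims it to a hardcoded, provider-specific character budget.
class IssueSerializer
  # Character budget derived from the hardcoded Anthropic assumption.
  # The value and name are illustrative; this is the number that is no longer
  # guaranteed to match the model AI Gateway actually routes the request to.
  MAX_CHARACTERS = 100_000

  def serialize(issue)
    content = {
      title: issue[:title],
      description: issue[:description],
      comments: issue[:comments]
    }.to_json

    # Trim to the provider-specific limit before returning the resource to
    # the Chat executor, regardless of the model's real context window.
    content[0, MAX_CHARACTERS]
  end
end
```

Because the budget is fixed for one provider, a model with a larger context window gets less content than it could handle, and a custom model with a smaller one may get too much.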

Bug behaviour

The incorrect behaviour occurs at step 9. Currently, the Rails monolith is aware only of the Anthropic provider and doesn't take into account all other possible models. Moreover, the data in the helper is outdated, or at least has to be verified:

  • The link in the code comment is broken and leads nowhere. It should probably be changed to this one.
  • Currently, it states that the model token limit is 100_000, but according to the new documentation, for Claude 3.5 it's 200_000.

It's possible that we don't use the model to its full capacity, or that this creates potential issues for custom models.
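
For a rough sense of the impact, a back-of-the-envelope calculation assuming a ~4 characters-per-token heuristic (the exact ratio depends on the tokenizer):

```ruby
# Illustrative arithmetic only; the chars-per-token ratio is an assumption.
CHARS_PER_TOKEN = 4

hardcoded_limit_tokens  = 100_000 # value currently assumed in the monolith helper
claude_3_5_limit_tokens = 200_000 # per the newer Anthropic documentation

hardcoded_chars = hardcoded_limit_tokens * CHARS_PER_TOKEN   # => 400_000
available_chars = claude_3_5_limit_tokens * CHARS_PER_TOKEN  # => 800_000

unused_fraction = 1.0 - hardcoded_chars.to_f / available_chars
puts format("Roughly %.0f%% of the context window may go unused", unused_fraction * 100)
# => "Roughly 50% of the context window may go unused"
```

For a custom model with a smaller context window, the mismatch points the other way: the serialized resource could exceed what the model accepts.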

Solution

This should be investigated. One possible solution could be to send a bit more content to AI Gateway and do the trimming there (a rough sketch follows).
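
A minimal sketch of that direction on the Rails side, under the assumption (not decided in this issue) that AI Gateway would own per-model trimming; all names are hypothetical:

```ruby
# Illustrative sketch only: instead of trimming to a provider-specific
# MAX_CHARACTERS in the monolith, the Identifier tool forwards the serialized
# resource with only a generous safety cap, and AI Gateway trims against the
# context window of whichever model it routes the request to.
SAFETY_CAP_CHARACTERS = 1_000_000 # deliberately generous; keeps payloads sane

def payload_for_ai_gateway(serialized_issue)
  {
    content: serialized_issue[0, SAFETY_CAP_CHARACTERS],
    # Signal that per-model trimming is delegated to AI Gateway.
    trimming: 'delegated_to_ai_gateway'
  }
end
```

The trade-off is larger payloads between the monolith and AI Gateway, in exchange for the trimming decision living next to the component that actually knows which model (and which context window) is in use.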
