Spike: Investigate character limits for Duo Chat
A customer wanted to know what happens when an MR contains too many files to fit in the context window.
Lesley already found that the character limit for merge request-related questions on Duo Chat is 96,000. That seems low, as the token limit for Claude 3.5 Sonnet is 200,000. (Usually there are about 4 characters per token.)
We also realized that we have other questions about how this truncation works in Duo Chat in general.
Acceptance criteria
The following should be recorded:
- What is the character limit for MR, issue, epic, GitLab Documentation, and general questions on Duo Chat?
- Why do we have those limits?
- What happens if the limit is exceeded? E.g. during the conversation, does it truncate the newest messages or the oldest messages? In other words, is there a point in the conversation where Duo Chat is always answering old questions, as the newer ones are cut off?
- Do we track how often it happens that we truncate a conversation or a question? Apparently not, according to @nateweinshenker's comment in https://gitlab.slack.com/archives/C051K31F30R/p1721318066274069?thread_ts=1721072092.059389&cid=C051K31F30R. (Related issues: #472681 (closed) and gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#554)
- Report the findings to @tlinz so we can decide if any adjustments are needed.
Dev Note
To test the second question, hardcode a small limit so you can observe the effects more easily.
Results
Character Limits
Per-message Limit (outdated - no longer exists due to this MR)
There is a 400,000 character limit for any single message on Duo Chat (code).
Total Input Limit
There is a limit of 200,000 tokens (about 680k characters) total for the input to the model. Unlike the per-message limit, this limit:
- Includes the conversation history
- Comes directly from Anthropic
Serialized Content Limits
The serialized content (issues, epics, MRs) has a limit of 192,000 characters (code). For merge requests, this is split into 96,000 characters for the diff and 96,000 characters for comments (code).
This means that if an issue, epic, MR, etc. has content that goes above this limit, it is cut off from the end. The user gets no indication that this has happened.
The 192,000-character limit is based on an assumption that the Anthropic limit is 100k tokens (code). This is outdated, and the assumed limit can be safely bumped up to 200k. However, some would still argue that it's best not to use up almost the entire limit with a single serialized resource.
The exact math is:
- Total token limit: 100,000 tokens
- Input token limit: 80,000 tokens (100,000 × 0.8)
  - Uses 80% of total limit as safety buffer
- Maximum characters: 320,000 characters (80,000 tokens × 4 chars/token)
  - Based on estimate of 4 characters per token
- Resource serialization: 192,000 characters (320,000 × 0.6)
  - Serializes context with character limit of 60% of max characters to leave room for prompt
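The arithmetic above can be reproduced with a short sketch. This is only an illustration in Python: the constant and function names are hypothetical and are not the identifiers used in the GitLab codebase. The last line shows what the same formula would yield if the assumed total were raised to 200k tokens, which is a derived figure and not a limit that exists today.

```python
# Hypothetical names; the values come from the breakdown above.
INPUT_BUFFER_PERCENT = 80    # share of the total token limit used for input
CHARS_PER_TOKEN = 4          # rough estimate of characters per token
SERIALIZATION_PERCENT = 60   # share of max characters left for the serialized resource


def serialization_char_limit(total_token_limit: int) -> int:
    """Return the character limit for a serialized resource."""
    input_token_limit = total_token_limit * INPUT_BUFFER_PERCENT // 100
    max_characters = input_token_limit * CHARS_PER_TOKEN
    return max_characters * SERIALIZATION_PERCENT // 100


current_limit = serialization_char_limit(100_000)
print(current_limit)                      # 192000
print(current_limit // 2)                 # 96000 each for MR diff and MR comments
print(serialization_char_limit(200_000))  # 384000 under the same formula
```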
What Happens When Limits Are Exceeded
Limit Type | Result |
---|---|
Per-message | User gets an 'A9999' error |
Total input | User receives: "I'm sorry, you've entered too many prompts. Please run /clear or /reset before asking the next question." |
Serialized content | Content gets truncated when it exceeds the limit. |
Only the 'total input' and 'serialized content' limits still exist.
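To make the serialized content behavior concrete, here is a minimal Python sketch of cutting content off from the end at a character limit. The function name and the returned flag are assumptions for illustration only; the current implementation truncates silently and does not report anything back to the user.

```python
def truncate_serialized(content: str, limit: int = 192_000) -> tuple[str, bool]:
    """Keep the first `limit` characters and report whether anything was dropped.

    The boolean flag is hypothetical; it only shows how truncation could be detected.
    """
    was_truncated = len(content) > limit
    return content[:limit], was_truncated


# A small hardcoded limit makes the effect easy to observe.
text, was_truncated = truncate_serialized("abcdefghij", limit=4)
print(text)           # abcd
print(was_truncated)  # True
```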
Tracking
Currently, no limits are specifically tracked. However, total input limit exceptions can be monitored through Kibana logs.