Classic Duo Chat - Output Truncated
### Summary
When classic Duo Chat is prompted to generate a large message, the output can be cut off halfway through and appear truncated because the response exceeds the max_tokens limit.
Similar issues for the slash commands are:
- [/fix](https://gitlab.com/gitlab-org/gitlab/-/work_items/582842) — fixed by having the LLM identify and fix the most important issues first (priority order provided to the LLM)
- [/refactor](https://gitlab.com/gitlab-org/gitlab/-/work_items/579683) — fixed by having the LLM choose a snippet and refactor only that snippet
- [/tests](https://gitlab.com/gitlab-org/gitlab/-/work_items/575987) — fixed by having the LLM output the tests in chunks over multiple messages
### Steps to reproduce
1. Type into classic duo chat (Web or IDE) "Generate a code snippet over 800 lines"
2. Output will likely cut off midway and look like:
*(screenshot of truncated chat output)*
A real-world example:
1. In classic chat in an IDE, highlight the entire `user.rb` file of the Rails monolith (~3,000 lines)
2. Type "provide a summary of every method in this file with an example of when you would call them and a code snippet showing the method call"
3. Output will look like:
```
...
Abuse & Trust
trusted?
When to call: Bypassing spam checks
unless user.trusted?
SpamCheckService.new(issue).execute
end
abuse_metadata
When to call: Reporting abuse
AbuseReport.create(
user: reported_user,
reporter: current_user,
metadata: reported_user.abuse_metadata
)
**CI
```
### What is the current _bug_ behavior?
The chat message is not always completed; long responses are silently cut off mid-generation when the max_tokens limit is reached.
### What is the expected _correct_ behavior?
The chat output is completed or, at minimum, a clear error is displayed asking the user to reduce the context.
### Possible fixes
1. Have the LLM identify when a prompt is too large and inform the user
2. Automatically detect when the max_tokens budget has been exhausted (e.g. from the model's reported stop/finish reason) and display an error to the user
3. There are likely other options as well
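Fix 2 above could be sketched as follows. This is a minimal illustration, not GitLab's implementation: it assumes an Anthropic-style response hash whose `stop_reason` is `"max_tokens"` when generation is cut off early, and the helper name and notice text are hypothetical.

```ruby
# Hypothetical helper: inspect an LLM response and decide whether the
# output was truncated because the token budget ran out.
# Assumes an Anthropic-style payload where `stop_reason` is "max_tokens"
# when generation stops early, and "end_turn" on normal completion.
TRUNCATION_NOTICE =
  "The response was cut off because it reached the output limit. " \
  "Try reducing the context or asking for the answer in smaller parts."

def annotate_truncation(response)
  truncated = response[:stop_reason] == "max_tokens"
  {
    content: response[:content],
    truncated: truncated,
    # Surface a clear, user-facing notice instead of failing silently.
    notice: truncated ? TRUNCATION_NOTICE : nil
  }
end
```

The chat frontend could then render `notice` beneath the message whenever `truncated` is true, rather than presenting the partial output as if it were complete.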