Code Suggestions API returns 500 error for payloads ≥100k characters due to outdated TOTAL_MODEL_TOKEN_LIMIT
## Problem
The Code Suggestions API (`/api/v4/code_suggestions/completions`) returns an HTTP 500 error when a request's `content_above_cursor` or `content_below_cursor` field contains 100,000 or more characters, even though the documentation states that Claude 3.5 Sonnet supports a 200,000 token context window (~800,000 characters for code generation).
### Observed Behavior
- Requests with `content_above_cursor` < 100,000 characters: ✅ Success (200 OK)
- Requests with `content_above_cursor` ≥ 100,000 characters: ❌ Failure (500 Internal Server Error)
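
A minimal reproduction sketch for the two cases above. The `current_file` wrapper, the `file_name` key, the `Bearer` authorization header, and the `GITLAB_URL` / `GITLAB_TOKEN` environment variables are assumptions made for illustration; only the endpoint path and the `content_above_cursor` / `content_below_cursor` field names come from this report.

```ruby
require "json"
require "net/http"
require "uri"

# Build a request body whose content_above_cursor has exactly `size` characters.
# The `current_file` wrapper and `file_name` key are assumed, not confirmed here.
def build_payload(size)
  line = "puts 'x'\n"
  content = (line * (size / line.length + 1))[0, size]
  {
    current_file: {
      file_name: "example.rb",
      content_above_cursor: content,
      content_below_cursor: ""
    }
  }
end

# Send one request and return the HTTP status code as an Integer.
def completion_status(base_url, token, size)
  uri = URI.join(base_url, "/api/v4/code_suggestions/completions")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request["Authorization"] = "Bearer #{token}" # assumed auth scheme
  request.body = JSON.generate(build_payload(size))
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.code.to_i
end

# Only contacts a server when both (hypothetical) environment variables are set.
if ENV["GITLAB_URL"] && ENV["GITLAB_TOKEN"]
  [99_999, 100_000].each do |size|
    status = completion_status(ENV["GITLAB_URL"], ENV["GITLAB_TOKEN"], size)
    puts "#{size} characters: HTTP #{status}"
  end
end
```

Running it against an affected instance with sizes on either side of 100,000 should make the threshold easy to confirm or rule out.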
### Expected Behavior
The API should accept payloads up to the documented 200k token limit (~800k characters) without crashing, or gracefully truncate content like other LLM APIs do.
## Root Cause
The limit is hardcoded in `ee/lib/gitlab/llm/chain/concerns/anthropic_prompt.rb` at line 10:
```ruby
# 100_000 tokens limit documentation: https://docs.anthropic.com/claude/reference/selecting-a-model
TOTAL_MODEL_TOKEN_LIMIT = 100_000
# leave a 20% for cases where 1 token does not exactly match to 4 characters
INPUT_TOKEN_LIMIT = (TOTAL_MODEL_TOKEN_LIMIT * 0.8).to_i.freeze # = 80,000 tokens
# approximate that one token is ~4 characters.
MAX_CHARACTERS = (INPUT_TOKEN_LIMIT * CHARACTERS_IN_TOKEN).to_i.freeze # = 320,000 characters
```
**This constant is based on the 100,000-token context window of older Claude models**, whereas Claude 3.5 Sonnet supports **200,000 tokens**.
Current calculation:
- `TOTAL_MODEL_TOKEN_LIMIT = 100_000` tokens (outdated)
- `INPUT_TOKEN_LIMIT = 80,000` tokens (80% of 100k)
- `MAX_CHARACTERS = 320,000` characters (80k × 4)
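
The same arithmetic can be sketched as a standalone Ruby function. `derived_limits` and its `safety_ratio` parameter are hypothetical names, and `CHARACTERS_IN_TOKEN = 4` is assumed from the "~4 characters" comment, since the constant's definition is not shown in the snippet.

```ruby
# Assumed value, based on the "one token is ~4 characters" comment.
CHARACTERS_IN_TOKEN = 4

# Derive the input token budget and character cap from a total token limit,
# keeping the same 20% safety margin as the quoted constants.
def derived_limits(total_model_token_limit, safety_ratio: 0.8)
  input_token_limit = (total_model_token_limit * safety_ratio).to_i
  {
    input_token_limit: input_token_limit,
    max_characters: input_token_limit * CHARACTERS_IN_TOKEN
  }
end

limits = derived_limits(100_000)
puts "input tokens: #{limits[:input_token_limit]}, max characters: #{limits[:max_characters]}"
```

Calling `derived_limits(200_000)` instead yields 160,000 input tokens and 640,000 characters.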
## Impact
This prevents users from:
- Analyzing or generating large files
- Using the full context window capabilities of Claude 3.5 Sonnet
- Leveraging the documented 200k token limit for code generation use cases
Additionally, **unlike other LLM APIs that gracefully truncate content**, GitLab Duo returns a 500 error, forcing developers to manually truncate content before sending requests.
## Proposed Solution
Update `TOTAL_MODEL_TOKEN_LIMIT` to match Claude 3.5 Sonnet's actual capabilities:
```ruby
# 200_000 tokens limit for Claude 3.5 Sonnet: https://docs.anthropic.com/en/docs/about-claude/models
TOTAL_MODEL_TOKEN_LIMIT = 200_000
```
This would result in:
- `INPUT_TOKEN_LIMIT = 160,000` tokens (80% of 200k, maintaining the 20% safety buffer)
- `MAX_CHARACTERS = 640,000` characters (160k × 4)
### Additional Improvements
1. **Implement graceful truncation**: Instead of returning 500 errors, automatically truncate content to fit within limits (like other LLM providers do)
2. **Return proper HTTP status codes**: If limits must be enforced, return `413 Payload Too Large` with clear error messages instead of `500 Internal Server Error`
3. **Update documentation**: Ensure docs accurately reflect actual implementation limits
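
Improvements 1 and 2 could look roughly like the sketch below. It is not GitLab's implementation: `truncate_around_cursor` and `size_check` are hypothetical helpers, `MAX_CHARACTERS` is a standalone copy of the value quoted earlier, and reserving up to a quarter of the budget for the text below the cursor is an arbitrary choice.

```ruby
# Hypothetical standalone copy of the character cap quoted in this issue.
MAX_CHARACTERS = 320_000

# Trim the content to at most max_chars characters, keeping the text nearest
# the cursor: the end of the prefix and the start of the suffix.
def truncate_around_cursor(above, below, max_chars: MAX_CHARACTERS)
  return [above, below] if above.length + below.length <= max_chars

  below_reserved = [below.length, max_chars / 4].min
  above_budget = [above.length, max_chars - below_reserved].min
  below_budget = [below.length, max_chars - above_budget].min
  [above[above.length - above_budget, above_budget], below[0, below_budget]]
end

# Alternative when truncation is not wanted: describe a client error
# (413) rather than letting the request end in a 500.
def size_check(above, below, max_chars: MAX_CHARACTERS)
  total = above.length + below.length
  return { status: 200 } if total <= max_chars

  { status: 413, message: "Content is #{total} characters; the limit is #{max_chars}." }
end
```

For example, `truncate_around_cursor("abcdefghij", "0123456789", max_chars: 8)` returns `["efghij", "01"]`: the six characters just before the cursor and the two just after it.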
## References
- Support ticket: https://gitlab.zendesk.com/agent/tickets/684659
- Documentation: https://docs.gitlab.com/ee/user/project/repository/code_suggestions/#context-and-privacy
- Code location: https://gitlab.com/gitlab-org/gitlab/blob/e66c5c6cd09df14eb32891f163263cf1710ce7dc/ee/lib/gitlab/llm/chain/concerns/anthropic_prompt.rb#L11
- Claude 3.5 Sonnet specs: https://docs.anthropic.com/en/docs/about-claude/models