Code Suggestions API returns 500 error for payloads ≥100k characters due to outdated TOTAL_MODEL_TOKEN_LIMIT

## Problem

The Code Suggestions API (`/api/v4/code_suggestions/completions`) returns HTTP 500 errors when sending payloads with `content_above_cursor` or `content_below_cursor` fields containing ≥100,000 characters, despite documentation stating that Claude 3.5 Sonnet supports a 200,000 token context window (~800,000 characters for code generation).

### Observed Behavior

- Requests with `content_above_cursor` < 100,000 characters: ✅ Success (200 OK)
- Requests with `content_above_cursor` ≥ 100,000 characters: ❌ Failure (500 Internal Server Error)

### Expected Behavior

The API should accept payloads up to the documented 200k token limit (~800k characters) without crashing, or gracefully truncate content like other LLM APIs do.

## Root Cause

The limit is hardcoded in `ee/lib/gitlab/llm/chain/concerns/anthropic_prompt.rb` at line 10:

```ruby
# 100_000 tokens limit documentation: https://docs.anthropic.com/claude/reference/selecting-a-model
TOTAL_MODEL_TOKEN_LIMIT = 100_000

# leave a 20% for cases where 1 token does not exactly match to 4 characters
INPUT_TOKEN_LIMIT = (TOTAL_MODEL_TOKEN_LIMIT * 0.8).to_i.freeze # = 80,000 tokens

# approximate that one token is ~4 characters.
MAX_CHARACTERS = (INPUT_TOKEN_LIMIT * CHARACTERS_IN_TOKEN).to_i.freeze # = 320,000 characters
```

**This constant is based on the OLD Claude model context window of 100,000 tokens**, but Claude 3.5 Sonnet supports **200,000 tokens**.

Current calculation:

- `TOTAL_MODEL_TOKEN_LIMIT = 100_000` tokens (outdated)
- `INPUT_TOKEN_LIMIT = 80,000` tokens (80% of 100k)
- `MAX_CHARACTERS = 320,000` characters (80k × 4)

## Impact

This prevents users from:

- Analyzing or generating large files
- Using the full context window capabilities of Claude 3.5 Sonnet
- Leveraging the documented 200k token limit for code generation use cases

Additionally, **unlike other LLM APIs that gracefully truncate content**, GitLab Duo returns a 500 error, forcing developers to manually truncate content before sending requests.
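
The arithmetic under Root Cause can be reproduced with a small standalone Ruby sketch. The helper `derived_limits` and the constant `INPUT_RATIO` are hypothetical names introduced for illustration only. `CHARACTERS_IN_TOKEN = 4` is an assumption taken from the "~4 characters" comment in the snippet above, since the constant's definition is not shown there.

```ruby
# Hypothetical helper, not part of the GitLab codebase.
# Assumes ~4 characters per token, as the source comment suggests.
CHARACTERS_IN_TOKEN = 4
INPUT_RATIO = 0.8 # keep a 20% safety buffer, mirroring the snippet above

# Derive the input token limit and character limit from a total token limit.
def derived_limits(total_model_token_limit)
  input_token_limit = (total_model_token_limit * INPUT_RATIO).to_i
  max_characters = input_token_limit * CHARACTERS_IN_TOKEN
  { input_token_limit: input_token_limit, max_characters: max_characters }
end

current = derived_limits(100_000)
puts current[:input_token_limit] # 80000
puts current[:max_characters]    # 320000

# Compare the observed failure threshold with the derived character limit.
observed_failure_threshold = 100_000
puts observed_failure_threshold <= current[:max_characters] # true
```

Note that the derived 320,000-character ceiling sits above the 100,000-character threshold at which failures were observed, so the failing path may involve an additional check beyond this constant. The sketch only reproduces the arithmetic and does not attempt to model that path.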

## Proposed Solution

Update `TOTAL_MODEL_TOKEN_LIMIT` to match Claude 3.5 Sonnet's actual capabilities:

```ruby
# 200_000 tokens limit for Claude 3.5 Sonnet: https://docs.anthropic.com/en/docs/about-claude/models
TOTAL_MODEL_TOKEN_LIMIT = 200_000
```

This would result in:

- `INPUT_TOKEN_LIMIT = 160,000` tokens (80% of 200k, maintaining the 20% safety buffer)
- `MAX_CHARACTERS = 640,000` characters (160k × 4)

### Additional Improvements

1. **Implement graceful truncation**: Instead of returning 500 errors, automatically truncate content to fit within limits (like other LLM providers do)
2. **Return proper HTTP status codes**: If limits must be enforced, return `413 Payload Too Large` with clear error messages instead of `500 Internal Server Error`
3. **Update documentation**: Ensure docs accurately reflect actual implementation limits

## References

- Support ticket: https://gitlab.zendesk.com/agent/tickets/684659
- Documentation: https://docs.gitlab.com/ee/user/project/repository/code_suggestions/#context-and-privacy
- Code location: https://gitlab.com/gitlab-org/gitlab/blob/e66c5c6cd09df14eb32891f163263cf1710ce7dc/ee/lib/gitlab/llm/chain/concerns/anthropic_prompt.rb#L11
- Claude 3.5 Sonnet specs: https://docs.anthropic.com/en/docs/about-claude/models
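
As a rough illustration of the first two items under Additional Improvements, here is a minimal Ruby sketch of cursor-aware truncation and a 413-style rejection. Everything in it is hypothetical: `truncate_around_cursor`, `check_payload`, and the returned hash shape are illustrative names, not GitLab's actual implementation. The 640,000-character default is taken from the proposed calculation above.

```ruby
# Hypothetical sketch; these names are illustrative, not GitLab's actual API.
MAX_CHARACTERS = 640_000 # from the proposed 200_000-token limit

# Keep the text closest to the cursor: the tail of the content above it
# and the head of the content below it. Each side gets up to half of the
# budget, and whatever one side does not use is handed to the other.
def truncate_around_cursor(above, below, max_characters: MAX_CHARACTERS)
  return [above, below] if above.length + below.length <= max_characters

  below_budget = [below.length, max_characters / 2].min
  above_budget = [above.length, max_characters - below_budget].min
  below_budget = [below.length, max_characters - above_budget].min

  [above[-above_budget, above_budget], below[0, below_budget]]
end

# Alternative: reject oversized payloads with a 413-style result
# instead of letting the request fail with a 500.
def check_payload(above, below, max_characters: MAX_CHARACTERS)
  total = above.length + below.length
  return { status: 200 } if total <= max_characters

  {
    status: 413,
    message: "Payload Too Large: #{total} characters exceeds the limit of #{max_characters}"
  }
end

above, below = truncate_around_cursor("a" * 100, "b" * 3, max_characters: 20)
puts above.length # 17
puts below.length # 3
puts check_payload("a" * 30, "", max_characters: 20)[:status] # 413
```

Trimming the far end of each side, rather than cutting from the cursor outward, is a design assumption: the text nearest the cursor is usually the most relevant context for a completion.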