Code Suggestions API returns 500 error for payloads ≥100k characters due to outdated TOTAL_MODEL_TOKEN_LIMIT
Problem
The Code Suggestions API (/api/v4/code_suggestions/completions) returns HTTP 500 errors when sending payloads with content_above_cursor or content_below_cursor fields containing ≥100,000 characters, despite documentation stating that Claude 3.5 Sonnet supports a 200,000 token context window (~800,000 characters for code generation).
Observed Behavior
- Requests with content_above_cursor < 100,000 characters: ✅ Success (200 OK)
- Requests with content_above_cursor ≥ 100,000 characters: ❌ Failure (500 Internal Server Error)
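The failure can be reproduced with a payload built like the sketch below. The request schema here is assumed from the field names quoted in this issue (a current_file wrapper with file_name is hypothetical), not taken from the API reference:

```ruby
require 'json'

# Build a completions request body at the failing threshold.
# 'a' * 100_000 puts content_above_cursor exactly at 100k characters.
content = 'a' * 100_000

payload = {
  current_file: {
    file_name: 'app/models/large_model.rb', # placeholder name
    content_above_cursor: content,
    content_below_cursor: ''
  }
}.to_json

# POSTing this body to /api/v4/code_suggestions/completions returns a 500
# instead of a completion or a 4xx error.
puts payload.bytesize
```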
Expected Behavior
The API should accept payloads up to the documented 200k token limit (~800k characters) without crashing, or gracefully truncate content like other LLM APIs do.
Root Cause
The limit is hardcoded in ee/lib/gitlab/llm/chain/concerns/anthropic_prompt.rb at line 10:
# 100_000 tokens limit documentation: https://docs.anthropic.com/claude/reference/selecting-a-model
TOTAL_MODEL_TOKEN_LIMIT = 100_000
# leave a 20% for cases where 1 token does not exactly match to 4 characters
INPUT_TOKEN_LIMIT = (TOTAL_MODEL_TOKEN_LIMIT * 0.8).to_i.freeze # = 80,000 tokens
# approximate that one token is ~4 characters.
CHARACTERS_IN_TOKEN = 4
MAX_CHARACTERS = (INPUT_TOKEN_LIMIT * CHARACTERS_IN_TOKEN).to_i.freeze # = 320,000 characters
This constant is based on the OLD Claude model context window of 100,000 tokens, but Claude 3.5 Sonnet supports 200,000 tokens.
Current calculation:
- TOTAL_MODEL_TOKEN_LIMIT = 100_000 tokens (outdated)
- INPUT_TOKEN_LIMIT = 80,000 tokens (80% of 100k)
- MAX_CHARACTERS = 320,000 characters (80k × 4)
Impact
This prevents users from:
- Analyzing or generating large files
- Using the full context window capabilities of Claude 3.5 Sonnet
- Leveraging the documented 200k token limit for code generation use cases
Additionally, unlike other LLM APIs that gracefully truncate content, GitLab Duo returns a 500 error, forcing developers to manually truncate content before sending requests.
Proposed Solution
Update TOTAL_MODEL_TOKEN_LIMIT to match Claude 3.5 Sonnet's actual capabilities:
# 200_000 tokens limit for Claude 3.5 Sonnet: https://docs.anthropic.com/en/docs/about-claude/models
TOTAL_MODEL_TOKEN_LIMIT = 200_000
This would result in:
- INPUT_TOKEN_LIMIT = 160,000 tokens (80% of 200k, maintaining the 20% safety buffer)
- MAX_CHARACTERS = 640,000 characters (160k × 4)
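The before/after numbers can be sanity-checked by mirroring the module's own arithmetic (4 characters per token, 20% safety buffer):

```ruby
# Mirror of the limit arithmetic used in anthropic_prompt.rb.
CHARACTERS_IN_TOKEN = 4

def derived_limits(total_model_token_limit)
  # Reserve 20% for cases where 1 token does not map to exactly 4 characters.
  input_token_limit = (total_model_token_limit * 0.8).to_i
  {
    input_tokens: input_token_limit,
    max_characters: input_token_limit * CHARACTERS_IN_TOKEN
  }
end

current  = derived_limits(100_000) # => 80,000 tokens / 320,000 characters
proposed = derived_limits(200_000) # => 160,000 tokens / 640,000 characters
```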
Additional Improvements
- Implement graceful truncation: Instead of returning 500 errors, automatically truncate content to fit within limits (like other LLM providers do)
- Return proper HTTP status codes: If limits must be enforced, return 413 Payload Too Large with a clear error message instead of 500 Internal Server Error
- Update documentation: Ensure docs accurately reflect actual implementation limits
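Graceful truncation could look like the sketch below (truncate_above_cursor is a hypothetical helper, not GitLab's actual code). It trims from the front so the characters nearest the cursor, which carry the most relevant context, are kept:

```ruby
MAX_CHARACTERS = 320_000 # current limit; would become 640,000 under the proposal

# Hypothetical helper: trim content_above_cursor from the front, preserving
# the most recent context (the text closest to the cursor).
def truncate_above_cursor(content, max_characters: MAX_CHARACTERS)
  return content if content.length <= max_characters

  content[-max_characters, max_characters]
end
```

If truncation is not acceptable, the same length check could instead raise an error that the API layer maps to 413 Payload Too Large, rather than letting it surface as an unhandled 500.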
References
- Support ticket: https://gitlab.zendesk.com/agent/tickets/684659
- Documentation: https://docs.gitlab.com/ee/user/project/repository/code_suggestions/#context-and-privacy
- Code location: https://gitlab.com/gitlab-org/gitlab/blob/e66c5c6cd09df14eb32891f163263cf1710ce7dc/ee/lib/gitlab/llm/chain/concerns/anthropic_prompt.rb#L11
- Claude 3.5 Sonnet specs: https://docs.anthropic.com/en/docs/about-claude/models