
Code Suggestions API returns 500 error for payloads ≥100k characters due to outdated TOTAL_MODEL_TOKEN_LIMIT

Problem

The Code Suggestions API (/api/v4/code_suggestions/completions) returns HTTP 500 when a payload's content_above_cursor or content_below_cursor field contains ≥100,000 characters, even though the documentation states that Claude 3.5 Sonnet supports a 200,000-token context window (roughly 800,000 characters) for code generation.

Observed Behavior

  • Requests with content_above_cursor < 100,000 characters: ✅ Success (200 OK)
  • Requests with content_above_cursor ≥ 100,000 characters: ❌ Failure (500 Internal Server Error; see the reproduction sketch below)
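
A minimal reproduction sketch in Ruby. The endpoint is as named above; the payload shape (a current_file object carrying content_above_cursor/content_below_cursor), the instance URL, and the GITLAB_TOKEN variable are assumptions for illustration and may differ by GitLab version:

require 'net/http'
require 'json'
require 'uri'

uri = URI('https://gitlab.example.com/api/v4/code_suggestions/completions')

# Build a payload whose context exceeds the failing threshold.
payload = {
  current_file: {
    file_name: 'main.py',
    content_above_cursor: "x = 1\n" * 20_000, # 120,000 characters: observed to fail
    content_below_cursor: ''
  }
}

request = Net::HTTP::Post.new(uri)
request['Authorization'] = "Bearer #{ENV['GITLAB_TOKEN']}"
request['Content-Type'] = 'application/json'
request.body = payload.to_json

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
puts response.code # expected 200; observed 500 once the context reaches 100,000 characters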

Expected Behavior

The API should accept payloads up to the documented 200k token limit (~800k characters) without crashing, or gracefully truncate content like other LLM APIs do.

Root Cause

The limit is hardcoded in ee/lib/gitlab/llm/chain/concerns/anthropic_prompt.rb at line 10:

# approximate that one token is ~4 characters
CHARACTERS_IN_TOKEN = 4

# 100_000 tokens limit documentation: https://docs.anthropic.com/claude/reference/selecting-a-model
TOTAL_MODEL_TOKEN_LIMIT = 100_000

# leave 20% headroom for cases where 1 token does not map to exactly 4 characters
INPUT_TOKEN_LIMIT = (TOTAL_MODEL_TOKEN_LIMIT * 0.8).to_i.freeze  # => 80,000 tokens

MAX_CHARACTERS = (INPUT_TOKEN_LIMIT * CHARACTERS_IN_TOKEN).to_i.freeze  # => 320,000 characters

This constant reflects the 100,000-token context window of older Claude models; Claude 3.5 Sonnet supports 200,000 tokens.

Current calculation:

  • TOTAL_MODEL_TOKEN_LIMIT = 100_000 tokens (outdated)
  • INPUT_TOKEN_LIMIT = 80,000 tokens (80% of 100k)
  • MAX_CHARACTERS = 320,000 characters (80k × 4)

Impact

This prevents users from:

  • Analyzing or generating large files
  • Using the full context window capabilities of Claude 3.5 Sonnet
  • Leveraging the documented 200k token limit for code generation use cases

Additionally, unlike other LLM APIs that gracefully truncate content, GitLab Duo returns a 500 error, forcing developers to manually truncate content before sending requests.

Proposed Solution

Update TOTAL_MODEL_TOKEN_LIMIT to match Claude 3.5 Sonnet's actual capabilities:

# 200_000 tokens limit for Claude 3.5 Sonnet: https://docs.anthropic.com/en/docs/about-claude/models
TOTAL_MODEL_TOKEN_LIMIT = 200_000

This would result in:

  • INPUT_TOKEN_LIMIT = 160,000 tokens (80% of 200k, maintaining the 20% safety buffer)
  • MAX_CHARACTERS = 640,000 characters (160k × 4)
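
As a sanity check, the existing derivation with the updated constant reproduces exactly these values (keeping the 20% buffer and the 4-characters-per-token approximation):

CHARACTERS_IN_TOKEN = 4
TOTAL_MODEL_TOKEN_LIMIT = 200_000
INPUT_TOKEN_LIMIT = (TOTAL_MODEL_TOKEN_LIMIT * 0.8).to_i        # => 160_000
MAX_CHARACTERS = (INPUT_TOKEN_LIMIT * CHARACTERS_IN_TOKEN).to_i # => 640_000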

Additional Improvements

  1. Implement graceful truncation: Instead of returning 500 errors, automatically truncate content to fit within limits, as other LLM providers do (see the sketch after this list)
  2. Return proper HTTP status codes: If limits must be enforced, return 413 Payload Too Large with clear error messages instead of 500 Internal Server Error
  3. Update documentation: Ensure docs accurately reflect actual implementation limits
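
A minimal sketch of improvements 1 and 2, using hypothetical helper names (truncate_context, validate_context_size!, render_api_error!) rather than GitLab's actual internals:

MAX_CHARACTERS = 640_000

# Improvement 1: truncate oversized context instead of failing.
# Dropping the oldest text keeps the characters nearest the cursor,
# which carry the most signal for a completion.
def truncate_context(content, limit: MAX_CHARACTERS)
  return content if content.length <= limit

  content[-limit..] # keep only the trailing `limit` characters
end

# Improvement 2: if the limit must be enforced instead, reject with
# 413 Payload Too Large and an actionable message rather than a 500.
def validate_context_size!(content, limit: MAX_CHARACTERS)
  return if content.length <= limit

  render_api_error!(
    "content_above_cursor exceeds #{limit} characters; truncate and retry",
    413
  )
end

Truncating from the front (keeping the trailing characters) matches how the content is consumed: the text closest to the cursor is the most relevant context for a completion.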

References

  • Support ticket: https://gitlab.zendesk.com/agent/tickets/684659
  • Documentation: https://docs.gitlab.com/ee/user/project/repository/code_suggestions/#context-and-privacy
  • Code location: https://gitlab.com/gitlab-org/gitlab/blob/e66c5c6cd09df14eb32891f163263cf1710ce7dc/ee/lib/gitlab/llm/chain/concerns/anthropic_prompt.rb#L11
  • Claude 3.5 Sonnet specs: https://docs.anthropic.com/en/docs/about-claude/models