
Code Suggestions API returns 500 error for payloads ≥100k characters due to outdated TOTAL_MODEL_TOKEN_LIMIT

Problem

The Code Suggestions API (/api/v4/code_suggestions/completions) returns HTTP 500 when a payload's content_above_cursor or content_below_cursor field contains ≥100,000 characters, even though the documentation states that Claude 3.5 Sonnet supports a 200,000-token context window (roughly 800,000 characters) for code generation.

Observed Behavior

  • Requests with content_above_cursor < 100,000 characters: ✅ Success (200 OK)
  • Requests with content_above_cursor ≥ 100,000 characters: ❌ Failure (500 Internal Server Error; see the reproduction sketch below)
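
A minimal reproduction sketch in Ruby. The endpoint is as named above; the payload shape (a current_file object carrying content_above_cursor/content_below_cursor), the instance URL, and the GITLAB_TOKEN variable are assumptions for illustration and may differ by GitLab version:

require 'net/http'
require 'json'
require 'uri'

uri = URI('https://gitlab.example.com/api/v4/code_suggestions/completions')

# Build a payload whose context exceeds the failing threshold.
payload = {
  current_file: {
    file_name: 'main.py',
    content_above_cursor: "x = 1\n" * 20_000, # 120,000 characters: observed to fail
    content_below_cursor: ''
  }
}

request = Net::HTTP::Post.new(uri)
request['Authorization'] = "Bearer #{ENV['GITLAB_TOKEN']}"
request['Content-Type'] = 'application/json'
request.body = payload.to_json

response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
puts response.code # expected 200; observed 500 once the context reaches 100,000 characters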

Expected Behavior

The API should accept payloads up to the documented 200k token limit (~800k characters) without crashing, or gracefully truncate content like other LLM APIs do.

Root Cause

The limit is hardcoded in ee/lib/gitlab/llm/chain/concerns/anthropic_prompt.rb at line 10:

# approximate that one token is ~4 characters
CHARACTERS_IN_TOKEN = 4

# 100_000 tokens limit documentation: https://docs.anthropic.com/claude/reference/selecting-a-model
TOTAL_MODEL_TOKEN_LIMIT = 100_000

# leave 20% headroom for cases where 1 token does not map to exactly 4 characters
INPUT_TOKEN_LIMIT = (TOTAL_MODEL_TOKEN_LIMIT * 0.8).to_i.freeze  # => 80,000 tokens

MAX_CHARACTERS = (INPUT_TOKEN_LIMIT * CHARACTERS_IN_TOKEN).to_i.freeze  # => 320,000 characters

This constant reflects the 100,000-token context window of older Claude models; Claude 3.5 Sonnet supports 200,000 tokens.

Current calculation:

  • TOTAL_MODEL_TOKEN_LIMIT = 100_000 tokens (outdated)
  • INPUT_TOKEN_LIMIT = 80,000 tokens (80% of 100k)
  • MAX_CHARACTERS = 320,000 characters (80k × 4)

Impact

This prevents users from:

  • Analyzing or generating large files
  • Using the full context window capabilities of Claude 3.5 Sonnet
  • Leveraging the documented 200k token limit for code generation use cases

Additionally, unlike other LLM APIs that gracefully truncate content, GitLab Duo returns a 500 error, forcing developers to manually truncate content before sending requests.

Proposed Solution

Update TOTAL_MODEL_TOKEN_LIMIT to match Claude 3.5 Sonnet's actual capabilities:

# 200_000 tokens limit for Claude 3.5 Sonnet: https://docs.anthropic.com/en/docs/about-claude/models
TOTAL_MODEL_TOKEN_LIMIT = 200_000

This would result in:

  • INPUT_TOKEN_LIMIT = 160,000 tokens (80% of 200k, maintaining the 20% safety buffer)
  • MAX_CHARACTERS = 640,000 characters (160k × 4)
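
As a sanity check, the existing derivation with the updated constant reproduces exactly these values (keeping the 20% buffer and the 4-characters-per-token approximation):

CHARACTERS_IN_TOKEN = 4
TOTAL_MODEL_TOKEN_LIMIT = 200_000
INPUT_TOKEN_LIMIT = (TOTAL_MODEL_TOKEN_LIMIT * 0.8).to_i        # => 160_000
MAX_CHARACTERS = (INPUT_TOKEN_LIMIT * CHARACTERS_IN_TOKEN).to_i # => 640_000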

Additional Improvements

  1. Implement graceful truncation: Instead of returning 500 errors, automatically truncate content to fit within limits, as other LLM providers do (see the sketch after this list)
  2. Return proper HTTP status codes: If limits must be enforced, return 413 Payload Too Large with clear error messages instead of 500 Internal Server Error
  3. Update documentation: Ensure docs accurately reflect actual implementation limits
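
A minimal sketch of improvements 1 and 2, using hypothetical helper names (truncate_context, validate_context_size!, render_api_error!) rather than GitLab's actual internals:

MAX_CHARACTERS = 640_000

# Improvement 1: truncate oversized context instead of failing.
# Dropping the oldest text keeps the characters nearest the cursor,
# which carry the most signal for a completion.
def truncate_context(content, limit: MAX_CHARACTERS)
  return content if content.length <= limit

  content[-limit..] # keep only the trailing `limit` characters
end

# Improvement 2: if the limit must be enforced instead, reject with
# 413 Payload Too Large and an actionable message rather than a 500.
def validate_context_size!(content, limit: MAX_CHARACTERS)
  return if content.length <= limit

  render_api_error!(
    "content_above_cursor exceeds #{limit} characters; truncate and retry",
    413
  )
end

Truncating from the front (keeping the trailing characters) matches how the content is consumed: the text closest to the cursor is the most relevant context for a completion.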

References

  • Support ticket: https://gitlab.zendesk.com/agent/tickets/684659
  • Documentation: https://docs.gitlab.com/ee/user/project/repository/code_suggestions/#context-and-privacy
  • Code location: https://gitlab.com/gitlab-org/gitlab/blob/e66c5c6cd09df14eb32891f163263cf1710ce7dc/ee/lib/gitlab/llm/chain/concerns/anthropic_prompt.rb#L11
  • Claude 3.5 Sonnet specs: https://docs.anthropic.com/en/docs/about-claude/models