
Increase max_token_size for Claude 3.5 sonnet AI Gateway request

Definition

Anthropic announced that the maximum output token size for their Claude 3.5 Sonnet model has been increased. See the Slack thread for the announcement of the new model: https://gitlab.slack.com/archives/C051K31F30R/p1721072092059389. Claude 3.5 Sonnet now supports a default of 8,192 output tokens, based on Anthropic's documentation.

We recently enabled the Claude 3.5 feature flag use_sonnet_35 for all GitLab team members and are moving toward GA soon, at which point all tools will default to Claude 3.5 Sonnet (unless they explicitly use a different model). This issue will help us weigh the pros and cons of increasing the token size in our requests across all tools that use Claude 3.5 Sonnet.

Feature benefits/risks

| Pro | Additional notes |
|---|---|
| Improve Spec Generation | @michaelangeloio |
| Less overall traffic (users may ask fewer follow-up questions if they instruct the model appropriately or the answer satisfies their needs without additional questions) | @michaelangeloio |
| Valuable for AI actions with longer outputs, to reduce latency. We currently only offer an output max_token_size of 2048 (Model Comparison) | @nateweinshenker |

| Con | Additional notes |
|---|---|
| Potentially higher costs due to increased token usage | @michaelangeloio |

Proposal/Action Items

  • The new feature requires us to add the header `anthropic-beta: max-tokens-3-5-sonnet-2024-07-15` to our API calls. Make sure the AI Gateway can handle this small change.
  • Modify any tools that use Claude 3.5 Sonnet to update their default max token size.
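To make the first action item concrete, here is a minimal sketch of the request shape, assuming a direct call to Anthropic's Messages API rather than our AI Gateway internals. The endpoint, header names, and model ID come from Anthropic's public documentation; `build_request` is a hypothetical helper for illustration only.

```python
def build_request(api_key: str, prompt: str, max_tokens: int = 8192) -> dict:
    """Build a Messages API request that opts into the raised output token limit."""
    return {
        "url": "https://api.anthropic.com/v1/messages",
        "headers": {
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            # Beta header required to unlock 8,192 output tokens on Claude 3.5 Sonnet.
            "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15",
            "content-type": "application/json",
        },
        "json": {
            "model": "claude-3-5-sonnet-20240620",
            "max_tokens": max_tokens,  # previously capped at 4,096 output tokens
            "messages": [{"role": "user", "content": prompt}],
        },
    }
```

In the AI Gateway itself the equivalent change would be adding the `anthropic-beta` header wherever the Anthropic client is constructed, plus raising the default `max_tokens` for tools that opt in.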

Link References

https://x.com/alexalbert__/status/1812921642143900036

Edited by Nathan Weinshenker