fix: approximate tokens count using tiktoken

What does this merge request do and why?

This merge request (MR) replaces the current approach to counting tokens with tiktoken, OpenAI's token-counting library.

This MR is needed because the current token counter, ApproximateTokenCounter, can severely undercount in some test cases, creating "prompt is too long" errors.

Token Counting Strategy

The old ApproximateTokenCounter used a simple formula (len // 4 * 1.5) that worked decently for English text but yielded inaccurate counts for Unicode and emojis, with up to 60% undercounting in some cases (see issue).
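To illustrate the failure mode, here is a minimal sketch of the old heuristic (the function name is illustrative, not the actual class API). Because it counts characters rather than tokens, CJK text and emojis — where a single character often maps to one or more tokens — get badly undercounted:

```python
def approximate_token_count(text: str) -> int:
    # Old heuristic: roughly one token per 4 characters,
    # scaled by 1.5 as a safety margin.
    return int(len(text) // 4 * 1.5)

# 50 chars of CJK text; the OpenAI tokenizer reports ~30 tokens,
# but the character-based estimate yields only 18.
unicode_text = "你好世界!" * 10
print(approximate_token_count(unicode_text))  # 18 -> undercounts by ~40%
```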

This MR replaces it with tiktoken, but since tiktoken is slow on large strings (~17ms for 10KB), we use a hybrid approach inspired by stratified sampling.

The logic is as follows:

  • Strings up to 1500 chars: exact tiktoken counting (accuracy matters most here).
  • Strings over 1500 chars: sample 500 chars from the start, middle, and end, count those, and extrapolate.

The sampling works well for large content because token density tends to be consistent throughout a file. We sample from three positions to handle cases where the start looks different from the rest (like imports in code files).
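The strategy above can be sketched as follows. This is an illustrative sketch, not the MR's actual implementation: the thresholds match the description, but the function names are hypothetical, and the exact token counter is passed in as a parameter (in the real code it would be tiktoken's encoder) so the sketch stays library-agnostic:

```python
from typing import Callable

EXACT_LIMIT = 1500   # strings up to this length are counted exactly
SAMPLE_SIZE = 500    # chars sampled from start, middle, and end

def hybrid_count(text: str, exact_count: Callable[[str], int]) -> int:
    """Count tokens exactly for short strings; sample and extrapolate otherwise."""
    if len(text) <= EXACT_LIMIT:
        return exact_count(text)
    # Stratified sampling: count three 500-char windows, then
    # extrapolate by the sampled token density (tokens per character).
    mid = len(text) // 2
    samples = [
        text[:SAMPLE_SIZE],
        text[mid - SAMPLE_SIZE // 2 : mid + SAMPLE_SIZE // 2],
        text[-SAMPLE_SIZE:],
    ]
    sampled_chars = sum(len(s) for s in samples)
    sampled_tokens = sum(exact_count(s) for s in samples)
    return round(sampled_tokens / sampled_chars * len(text))
```

With uniform token density the extrapolation is exact, which is why the approach works well on homogeneous content like large files.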

Latency Benchmark

Below are the latency results for the old ApproximateTokenCounter, pure tiktoken, and the proposed approach, tiktoken with sampling:

| Test Case | ApproximateTokenCounter (ms) | tiktoken (ms) | tiktoken with sampling (this MR, ms) | Sampling speedup |
|---|---|---|---|---|
| Short text (50 chars) | 0.0001 | 0.0043 | 0.0047 | 0.9x |
| Medium text (1KB) | 0.0001 | 0.1936 | 0.1932 | 1.0x |
| Large text (10KB) | 0.0001 | 17.0386 | 0.1631 | 104.5x |
| Unicode heavy | 0.0001 | 0.1134 | 0.1129 | 1.0x |
| Emoji heavy | 0.0001 | 1.8347 | 1.8407 | 1.0x |
| JSON data | 0.0001 | 0.2024 | 0.1491 | 1.4x |

The proposed approach in this MR strikes a good middle ground between the speed of ApproximateTokenCounter and the accuracy of pure tiktoken.
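For anyone wanting to reproduce these latency figures locally, a rough timing harness looks like the sketch below (absolute numbers will vary by machine; `count` stands in for whichever counter is being measured):

```python
import timeit

def bench(count, text, runs=100):
    """Return the mean per-call latency of count(text) in milliseconds."""
    total_s = timeit.timeit(lambda: count(text), number=runs)
    return total_s / runs * 1000  # seconds -> ms per call

# Example: time a trivial character-based counter on ~10KB of text.
ms = bench(lambda t: len(t) // 4, "a" * 10_000, runs=10)
print(f"{ms:.4f} ms per call")
```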

Accuracy Benchmark

Below are the accuracy results for the old ApproximateTokenCounter, pure tiktoken, and the proposed approach, tiktoken with sampling:

| Test Case | tiktoken | ApproximateTokenCounter | tiktoken with sampling (this MR) |
|---|---|---|---|
| Short text (50 chars) | 13 | 16 (+3) | 13 (+0) |
| Medium text (1KB) | 128 | 384 (+256) | 128 (+0) |
| Large text (10KB) | 1280 | 3840 (+2560) | 1290 (+10) |
| Unicode heavy | 600 | 375 (-225) | 600 (+0) |
| Emoji heavy | 1600 | 375 (-1225) | 1600 (+0) |
| JSON data | 800 | 730 (-70) | 800 (+0) |

The proposed approach in this MR achieves significantly better accuracy than the old ApproximateTokenCounter, with only minimal deviation from the ideal case of pure tiktoken.

How to set up and validate locally

  1. Check out this merge request's branch.

  2. Ensure you comply with the Prerequisites.

  3. Install dependencies:

    poetry install
  4. Run the token counter tests:

    poetry run pytest tests/duo_workflow_service/token_counter/test_tiktoken_counter.py -v
  5. Verify token counting accuracy manually by running the following script:

    from duo_workflow_service.token_counter.tiktoken_counter import TikTokenCounter
    
    counter = TikTokenCounter("executor")
    
    # Test with Unicode; the OpenAI tokenizer reports 30 tokens for this string
    text = "你好世界!" * 10
    print(f"Unicode: {counter.count_string_content(text)} tokens")
    
    # Test with emojis; the OpenAI tokenizer reports 50 tokens for this string
    text = "🚀✨🎉" * 10
    print(f"Emojis: {counter.count_string_content(text)} tokens")
    
    # Test with plain English; the OpenAI tokenizer reports 8 tokens for this string
    text = "Hello, how are you doing today?"
    print(f"English: {counter.count_string_content(text)} tokens")

    Compare the results with what you get at https://platform.openai.com/tokenizer, selecting gpt-4o.

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
  • If this change requires executor implementation: verified that issues/MRs exist for both Go executor and Node executor or confirmed that changes are backward-compatible and don't break existing executor functionality.

Closes #1740

Edited by Fabrizio J. Piva
