Draft: Add LRU cache for model.bind_tools to resolve CPU bottleneck
## What does this merge request do and why?

This MR implements a production-ready in-memory LRU cache for `bind_tools()` operations to resolve a performance bottleneck in the Duo Workflow Service.

### The Problem

Google Cloud Profiler analysis revealed that `bind_tools()` operations consume 28.4 ms (6.03% of CPU time) per request. This operation:
- Occurs 3-4 times per request (once per agent initialization)
- Runs synchronously in the asyncio event loop, blocking all other requests
- Has no caching, causing repeated expensive schema format conversions
- Limits theoretical maximum throughput to ~12 RPS
At 15 RPS in load testing, the service experienced a 49% failure rate due to this bottleneck.

Root Cause: LangChain's `bind_tools()` performs an expensive schema format conversion (each tool is converted to an OpenAI-style function schema, involving repeated `pydantic.create_model()` calls) on every invocation, with no caching of the result.
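For context, the sketch below shows one way to observe this conversion cost in isolation. It is illustrative only: `convert_to_openai_tool` stands in here for the per-tool conversion that `bind_tools()` triggers, and the exact code path varies by provider.

```python
# Illustrative micro-benchmark (not part of this MR): times the per-tool schema
# conversion that bind_tools() repeats on every call, using a toy tool.
import time

from langchain_core.tools import tool
from langchain_core.utils.function_calling import convert_to_openai_tool


@tool
def run_command(command: str, directory: str) -> str:
    """Run a shell command in the given directory."""
    return ""


start = time.perf_counter()
for _ in range(1_000):
    convert_to_openai_tool(run_command)  # schema conversion, recomputed each time
elapsed_ms = (time.perf_counter() - start) * 1_000
print(f"1,000 conversions: {elapsed_ms:.1f} ms")
```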
### The Solution
This MR adds a thread-safe LRU cache (sketched below) that:
- Caches the result of `bind_tools()` operations by `(model_id, tool_signature, tool_choice)`
- Uses order-independent SHA256 hashing for stable cache keys
- Implements LRU eviction policy with configurable max size (default: 128 entries)
- Includes Prometheus metrics for monitoring (hits, misses, duration, size, evictions)
- Provides structured logging for debugging
- Is fully configurable via environment variables
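The following is a minimal, self-contained sketch of that strategy. It is not the code in `ai_gateway/prompts/bind_tools_cache.py`; the names (`make_cache_key`, `BindToolsCache`, `get_or_bind`) and the exact key contents are assumptions based on the description above.

```python
# Sketch only: illustrates the cache-key scheme and LRU eviction described above,
# not the implementation shipped in this MR.
import hashlib
import json
import threading
from collections import OrderedDict
from typing import Any, Callable


def make_cache_key(model_id: str, tool_schemas: list[dict], tool_choice: str | None) -> str:
    """Order-independent key: hash each tool schema, sort the digests, then hash
    (model_id, sorted digests, tool_choice) so tool ordering does not change the key."""
    digests = sorted(
        hashlib.sha256(json.dumps(schema, sort_keys=True).encode()).hexdigest()
        for schema in tool_schemas
    )
    payload = json.dumps([model_id, digests, tool_choice]).encode()
    return hashlib.sha256(payload).hexdigest()


class BindToolsCache:
    """Thread-safe LRU cache; evicts the least recently used entry past max_size."""

    def __init__(self, max_size: int = 128):  # 128 matches the MR's default
        self._max_size = max_size
        self._entries: OrderedDict[str, Any] = OrderedDict()
        self._lock = threading.Lock()

    def get_or_bind(self, key: str, bind: Callable[[], Any]) -> Any:
        with self._lock:
            if key in self._entries:              # cache hit
                self._entries.move_to_end(key)    # mark as most recently used
                return self._entries[key]
        value = bind()                            # cache miss: run the expensive bind_tools()
        with self._lock:
            self._entries[key] = value
            self._entries.move_to_end(key)
            if len(self._entries) > self._max_size:
                self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

In `base.py`, the Prompt class would then consult the cache before binding, along the lines of `cache.get_or_bind(key, lambda: model.bind_tools(tools, tool_choice=tool_choice))`; that call site is likewise an assumption, not a quote from this MR. Metrics and logging are omitted from the sketch.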
### Implementation Details
Files Created:
- `ai_gateway/prompts/bind_tools_cache.py` - Core LRU cache implementation

Files Modified:
- `ai_gateway/prompts/base.py` - Integration into the Prompt class
- `example.env` - Configuration options (3 new env vars; see the illustrative snippet below)
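The authoritative variable names live in `example.env` in this MR; the two shown below are placeholders that only illustrate the kind of wiring the description implies (an enable flag and the 128-entry size limit).

```python
# Hypothetical settings wiring; variable names are placeholders, not the ones
# added in example.env.
import os

cache_enabled = os.environ.get("BIND_TOOLS_CACHE_ENABLED", "true").lower() in ("1", "true")
cache_max_size = int(os.environ.get("BIND_TOOLS_CACHE_MAX_SIZE", "128"))
```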
### Related Issues
- Resolves: gitlab-org/gitlab#578158 - "Performance Bottleneck: Repeated pydantic.create_model() calls in bind_tools() blocking asyncio loop"
- Related: https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/3794 - "Agentic AI performance test execution and data gathering"
## How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
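One possible flow, assuming a standard local setup of the service (the log fields and metric names below are assumptions; use the ones defined in `bind_tools_cache.py`):

1. Start the Duo Workflow Service locally with the cache enabled via the new `example.env` variables.
2. Trigger a workflow that initializes several agents, so `bind_tools()` runs more than once with the same tool set.
3. Check the structured logs for a cache miss on the first bind and hits afterwards.
4. Confirm the Prometheus endpoint exports the new cache series (hits, misses, duration, size, evictions); a quick check is sketched below.
5. Optionally rerun the 15 RPS load test from the linked quality issue and compare against the 49% failure-rate baseline.

```python
# Quick check (illustrative): print any cache-related series from the metrics
# endpoint. Adjust host/port to your local setup; the "bind_tools" substring is a guess.
import urllib.request

body = urllib.request.urlopen("http://localhost:8000/metrics").read().decode()
print("\n".join(line for line in body.splitlines() if "bind_tools" in line))
```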
## Merge request checklist
- [ ] Tests added for new functionality. If not, please raise an issue to follow up.
- [ ] Documentation added/updated, if needed.
- [ ] If this change requires executor implementation: verified that issues/MRs exist for both Go executor and Node executor, or confirmed that changes are backward-compatible and don't break existing executor functionality.