
Draft: Add LRU cache for model.bind_tools to resolve CPU bottleneck

What does this merge request do and why?

This MR implements a production-ready LRU in-memory cache for bind_tools() operations to resolve a performance bottleneck in the Duo Workflow Service.

The Problem

Google Cloud Profiler analysis revealed that bind_tools() operations consume 28.4 ms per request, accounting for 6.03% of CPU time. This operation:

  • Occurs 3-4 times per request (once per agent initialization)
  • Runs synchronously in the asyncio event loop, blocking all other requests
  • Has no caching, causing repeated expensive schema format conversions
  • Limits theoretical maximum throughput to ~12 RPS

During load testing at 15 RPS, the service experienced a 49% failure rate due to this bottleneck.

Root Cause: LangChain's bind_tools() performs expensive schema format conversion (OpenAI ↔️ Anthropic) on every agent initialization, with no built-in caching.
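
For illustration only, the uncached call pattern looks roughly like this (the function and argument names are placeholders, not the actual Duo Workflow Service code):

```python
def build_agent(model, tools):
    # LangChain's bind_tools() converts every tool schema to the provider's
    # format on each call; with 3-4 agent initializations per request, the
    # same conversion is repeated for identical (model, tools) inputs.
    return model.bind_tools(tools)
```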

The Solution

This MR adds a thread-safe LRU cache (sketched after the list below) that:

  • Caches the result of bind_tools() operations by (model_id, tool_signature, tool_choice)
  • Uses order-independent SHA256 hashing for stable cache keys
  • Implements LRU eviction policy with configurable max size (default: 128 entries)
  • Includes Prometheus metrics for monitoring (hits, misses, duration, size, evictions)
  • Provides structured logging for debugging
  • Is fully configurable via environment variables
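
A minimal sketch of the approach, assuming tools are passed as JSON-serialisable schema dicts; the class, method, and counter names are illustrative and not the actual contents of bind_tools_cache.py, which also wires the counters into Prometheus metrics and structured logging:

```python
import hashlib
import json
import threading
from collections import OrderedDict


class BindToolsLRUCache:
    """Thread-safe LRU cache keyed by (model_id, tool signature, tool_choice)."""

    def __init__(self, max_size: int = 128):
        self._max_size = max_size
        self._lock = threading.Lock()
        self._entries = OrderedDict()  # cache key -> bound model
        # Plain counters here; the real implementation exports Prometheus metrics.
        self.hits = self.misses = self.evictions = 0

    @staticmethod
    def _key(model_id: str, tool_schemas: list[dict], tool_choice) -> str:
        # Order-independent key: canonicalise each schema, sort, then hash, so
        # the same tool set yields the same SHA256 digest regardless of order.
        canonical = sorted(json.dumps(s, sort_keys=True) for s in tool_schemas)
        payload = json.dumps([model_id, canonical, tool_choice], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_bind(self, model, model_id: str, tool_schemas: list[dict], tool_choice=None):
        key = self._key(model_id, tool_schemas, tool_choice)
        with self._lock:
            if key in self._entries:
                self.hits += 1
                self._entries.move_to_end(key)   # mark as most recently used
                return self._entries[key]
            self.misses += 1
        # Expensive schema conversion happens outside the lock.
        bound = model.bind_tools(tool_schemas, tool_choice=tool_choice)
        with self._lock:
            self._entries[key] = bound
            if len(self._entries) > self._max_size:
                self._entries.popitem(last=False)  # evict least recently used entry
                self.evictions += 1
        return bound
```

Performing the bind_tools() call outside the lock keeps concurrent requests from serialising on a cache miss, at the cost of occasionally doing the same conversion twice before the first result lands in the cache.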

Implementation Details

Files Created:

  • ai_gateway/prompts/bind_tools_cache.py - Core LRU cache implementation

Files Modified:

  • ai_gateway/prompts/base.py - Integration into the Prompt class (see the integration sketch below)
  • example.env - Configuration options (3 new env vars)
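
Roughly how the integration and configuration could fit together; the environment variable names and Prompt attributes below are assumptions for illustration, and the real keys are the three added to example.env:

```python
import os

# Class name taken from the sketch above; the real module may expose a different name.
from ai_gateway.prompts.bind_tools_cache import BindToolsLRUCache

# Hypothetical env var names -- check example.env for the actual keys.
CACHE_ENABLED = os.environ.get("BIND_TOOLS_CACHE_ENABLED", "true").lower() == "true"
CACHE_MAX_SIZE = int(os.environ.get("BIND_TOOLS_CACHE_MAX_SIZE", "128"))

# Single process-wide cache instance shared by all Prompt objects.
_bind_tools_cache = BindToolsLRUCache(max_size=CACHE_MAX_SIZE)


class Prompt:
    # Only the cache-related path is shown; the real Prompt class in base.py
    # has many more responsibilities.
    def __init__(self, model, model_id: str, tool_schemas: list[dict], tool_choice=None):
        self.model = model
        self.model_id = model_id
        self.tool_schemas = tool_schemas
        self.tool_choice = tool_choice

    def bound_model(self):
        if not CACHE_ENABLED:
            return self.model.bind_tools(self.tool_schemas, tool_choice=self.tool_choice)
        return _bind_tools_cache.get_or_bind(
            self.model, self.model_id, self.tool_schemas, self.tool_choice
        )
```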

Related Issues

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
  • If this change requires executor implementation: verified that issues/MRs exist for both Go executor and Node executor or confirmed that changes are backward-compatible and don't break existing executor functionality.