Anthropic API Features - Prompt Caching

Problem to solve

Prompt caching enables developers to cache frequently used context between API calls, reducing costs by up to 90% and latency by up to 85% for long prompts. This feature is particularly valuable for applications that repeatedly reference large documents, extensive instruction sets, or comprehensive codebases. F

Proposal

  • Use Anthropic Messages API endpoint /v1/messages with anthropic-beta: prompt-caching-2024-07-31 header
  • Add cache_control: {"type": "ephemeral"} to content blocks that should be cached
  • Implement cache-aware request patterns where large context (documents, instructions, examples) is marked for caching
  • Ensure cached sections are identical and marked with cache_control in the same locations across calls, verify calls are made within cache lifetime (5 minutes by default)
  • Handle cache invalidation scenarios and implement fallback for cache misses
  • Consider extended 1-hour TTL caching for long-running workflows at additional cost

Intended users

Duo users

Feature Usage Metrics

Definition of Done

  • Prompt caching implemented with proper beta headers
    • Cache control markup added to appropriate content blocks
    • Feature flag configured for gradual rollout
    • Cache hit/miss monitoring implemented