Anthropic API Features - Prompt Caching
Problem to solve
Prompt caching enables developers to cache frequently used context between API calls, reducing costs by up to 90% and latency by up to 85% for long prompts. This feature is particularly valuable for applications that repeatedly reference large documents, extensive instruction sets, or comprehensive codebases. F
Proposal
- Use Anthropic Messages API endpoint
/v1/messageswithanthropic-beta: prompt-caching-2024-07-31header - Add
cache_control: {"type": "ephemeral"}to content blocks that should be cached - Implement cache-aware request patterns where large context (documents, instructions, examples) is marked for caching
- Ensure cached sections are identical and marked with cache_control in the same locations across calls, verify calls are made within cache lifetime (5 minutes by default)
- Handle cache invalidation scenarios and implement fallback for cache misses
- Consider extended 1-hour TTL caching for long-running workflows at additional cost
Intended users
Duo users
Feature Usage Metrics
Definition of Done
- Prompt caching implemented with proper beta headers
-
Cache control markup added to appropriate content blocks -
Feature flag configured for gradual rollout -
Cache hit/miss monitoring implemented
-