Move model version logic to AI Gateway and implement fallback strategies
Description
In response to the recent Anthropic outage, which lasted more than 10 hours, we need robust fallback strategies for our AI-powered features. This issue covers moving the model version logic from Rails to the AI Gateway and establishing criteria for automatic and manual fallbacks.
Key objectives
- Move model version logic from Rails to AI Gateway
- Establish criteria and thresholds for fallbacks
- Create manual override options for feature teams
Implementation steps
- Relocate model version logic to AI Gateway
- Define fallback criteria and thresholds (e.g., error rates, response times)
- Implement automatic model version fallback logic
- Develop automatic provider fallback mechanisms
- Create an interface for manual overrides
- Enhance logging and monitoring capabilities
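As a starting point for discussion, the automatic model version and provider fallback steps could be sketched as an ordered chain that is tried in sequence. This is only an illustration under assumptions: `ModelTarget`, `AllTargetsFailed`, `complete_with_fallback`, and the `call` callback are hypothetical names, not existing AI Gateway code, and a real implementation would likely catch narrower error types than `Exception`.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class ModelTarget:
    """One entry in the fallback chain (all values are hypothetical labels)."""
    provider: str
    model: str


class AllTargetsFailed(Exception):
    """Raised when every target in the fallback chain has failed."""


def complete_with_fallback(
    prompt: str,
    chain: Sequence[ModelTarget],
    call: Callable[[ModelTarget, str], str],
) -> Tuple[ModelTarget, str]:
    """Try each target in order and return the first successful response.

    Ordering the chain as "other versions of the same provider first, other
    providers last" gives version fallback before provider fallback.
    """
    errors: List[Tuple[ModelTarget, Exception]] = []
    for target in chain:
        try:
            return target, call(target, prompt)
        except Exception as exc:  # assumption: any error triggers fallback
            errors.append((target, exc))
    raise AllTargetsFailed(errors)
```

Returning the target alongside the response lets the caller log which model actually served the request, which supports the logging and monitoring step.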
Fallback criteria and thresholds
- Error rate exceeds X% over Y minutes
- Response time surpasses Z seconds for N consecutive requests
- Availability drops below A% over B minutes
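To make the criteria above concrete, here is a minimal sketch of how the first two could be evaluated. It assumes X is expressed as a fraction, Y is converted to seconds, and timestamps are passed in explicitly by the caller. `Thresholds` and `FallbackMonitor` are hypothetical names. The availability criterion is left out; it could plausibly reuse the same sliding-window approach with its own A and B values.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Tuple


@dataclass(frozen=True)
class Thresholds:
    max_error_rate: float  # X, as a fraction (0.05 means 5%)
    error_window_s: float  # Y, converted to seconds
    max_latency_s: float   # Z
    slow_streak: int       # N


class FallbackMonitor:
    """Tracks recent requests and reports when a threshold is crossed."""

    def __init__(self, thresholds: Thresholds) -> None:
        self.thresholds = thresholds
        self._samples: Deque[Tuple[float, bool]] = deque()  # (timestamp, ok)
        self._slow_streak = 0

    def record(self, timestamp: float, ok: bool, latency_s: float) -> None:
        """Record one request outcome and drop samples older than the window."""
        self._samples.append((timestamp, ok))
        cutoff = timestamp - self.thresholds.error_window_s
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()
        # A single fast request resets the consecutive-slow counter.
        if latency_s > self.thresholds.max_latency_s:
            self._slow_streak += 1
        else:
            self._slow_streak = 0

    def error_rate(self) -> float:
        if not self._samples:
            return 0.0
        failures = sum(1 for _, ok in self._samples if not ok)
        return failures / len(self._samples)

    def should_fall_back(self) -> bool:
        return (
            self.error_rate() > self.thresholds.max_error_rate
            or self._slow_streak >= self.thresholds.slow_streak
        )
```

One open design question this sketch does not address is a minimum sample count: with very little traffic, a single failed request could push the error rate over X.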
Manual actions
- Feature toggle for enabling/disabling automatic fallbacks
- Interface for manually switching between model versions or providers
- Emergency override for critical situations
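One way the manual actions above might interact with automatic fallback is a simple precedence rule. The sketch below assumes the order emergency override, then manual selection, then automatic fallback (only when the toggle is on), then the default. That ordering and the names `OverrideState` and `resolve_target` are assumptions for illustration, not a decided design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OverrideState:
    """Manual controls a feature team could set (hypothetical fields)."""
    auto_fallback_enabled: bool = True
    manual_target: Optional[str] = None
    emergency_target: Optional[str] = None


def resolve_target(
    default: str,
    auto_fallback: Optional[str],
    state: OverrideState,
) -> str:
    """Pick the model target for a request.

    `auto_fallback` is the target proposed by the automatic logic, or None
    when no threshold has been crossed.
    """
    if state.emergency_target is not None:
        return state.emergency_target
    if state.manual_target is not None:
        return state.manual_target
    if state.auto_fallback_enabled and auto_fallback is not None:
        return auto_fallback
    return default
```

For example, with the toggle disabled, `resolve_target("model-v2", "model-v1", OverrideState(auto_fallback_enabled=False))` keeps the default `"model-v2"` even though the automatic logic proposed a fallback.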
Edited by David O'Regan