Move model version logic to AI Gateway and implement fallback strategies

Description

In response to the recent Anthropic outage, which lasted more than 10 hours, we need to implement robust fallback strategies for our AI-powered features. This issue covers moving the model version logic from Rails to the AI Gateway and establishing criteria for automatic and manual fallbacks.

Key objectives

  1. Move model version logic from Rails to AI Gateway
  2. Establish criteria and thresholds for fallbacks
  3. Create manual override options for feature teams

Implementation steps

  1. Relocate model version logic to AI Gateway
  2. Define fallback criteria and thresholds (e.g., error rates, response times)
  3. Implement automatic model version fallback logic
  4. Develop automatic provider fallback mechanisms
  5. Create an interface for manual overrides
  6. Enhance logging and monitoring capabilities
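
As a rough illustration of steps 3 and 4, the sketch below tries an ordered list of targets, first other versions on the same provider and then other providers. Everything in it is hypothetical: `Target`, `complete_with_fallback`, `AllTargetsFailed`, and the `call` and `is_healthy` callables are illustrative names, not existing AI Gateway code.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Target:
    """One (provider, model version) pair. Hypothetical type for this sketch."""
    provider: str
    model: str


class AllTargetsFailed(Exception):
    """Raised when every target was skipped or raised an error."""


def complete_with_fallback(
    prompt: str,
    targets: Sequence[Target],
    call: Callable[[Target, str], str],
    is_healthy: Callable[[Target], bool] = lambda target: True,
) -> Tuple[Target, str]:
    """Try targets in order and return the first successful (target, response).

    `targets` is assumed to be ordered by preference: model version
    fallbacks on the primary provider first, then other providers.
    """
    errors: List[Tuple[Target, str]] = []
    for target in targets:
        if not is_healthy(target):
            # Skip targets already flagged by monitoring instead of waiting
            # for another failed request.
            errors.append((target, "skipped: unhealthy"))
            continue
        try:
            return target, call(target, prompt)
        except Exception as exc:  # a real gateway would catch narrower errors
            errors.append((target, repr(exc)))
    # Keep the per-target errors so they can be logged (step 6).
    raise AllTargetsFailed(errors)
```

Returning the target alongside the response lets the caller log which model actually served the request, which supports the logging and monitoring step.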

Fallback criteria and thresholds

  • Error rate exceeds X% over Y minutes
  • Response time surpasses Z seconds for N consecutive requests
  • Availability drops below A% over B minutes
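
The three criteria above could be evaluated along the following lines. This is a minimal sketch under stated assumptions: the class and field names are hypothetical, availability is assumed to be measured from periodic health probes, and the thresholds X, Y, Z, N, A, and B are deliberately left without defaults because this issue has not yet chosen values for them.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


@dataclass(frozen=True)
class FallbackThresholds:
    """Placeholders from the list above; values are still to be decided."""
    max_error_rate: float         # X, as a fraction between 0.0 and 1.0
    error_window_s: float         # Y, converted to seconds
    max_latency_s: float          # Z
    slow_streak: int              # N
    min_availability: float       # A, as a fraction between 0.0 and 1.0
    availability_window_s: float  # B, converted to seconds


@dataclass
class HealthTracker:
    """Tracks request and probe outcomes for one model or provider."""
    thresholds: FallbackThresholds
    _requests: Deque[Tuple[float, bool]] = field(default_factory=deque)
    _probes: Deque[Tuple[float, bool]] = field(default_factory=deque)
    _slow_streak: int = 0

    def record_request(self, now: float, ok: bool, latency_s: float) -> None:
        self._requests.append((now, ok))
        # The streak resets as soon as one request is fast enough.
        slow = latency_s > self.thresholds.max_latency_s
        self._slow_streak = self._slow_streak + 1 if slow else 0

    def record_probe(self, now: float, up: bool) -> None:
        self._probes.append((now, up))

    @staticmethod
    def _failure_ratio(samples: Deque[Tuple[float, bool]], now: float, window_s: float) -> float:
        # Drop samples that have aged out of the sliding window.
        while samples and samples[0][0] < now - window_s:
            samples.popleft()
        if not samples:
            return 0.0  # no data is treated as healthy in this sketch
        return sum(1 for _, ok in samples if not ok) / len(samples)

    def should_fall_back(self, now: float) -> bool:
        t = self.thresholds
        error_rate = self._failure_ratio(self._requests, now, t.error_window_s)
        availability = 1.0 - self._failure_ratio(self._probes, now, t.availability_window_s)
        return (
            error_rate > t.max_error_rate
            or self._slow_streak >= t.slow_streak
            or availability < t.min_availability
        )
```

Treating an empty window as healthy is a design choice of this sketch; a low-traffic feature might instead need a minimum sample count before the error-rate criterion can trigger.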

Manual actions

  • Feature toggle for enabling/disabling automatic fallbacks
  • Interface for manually switching between model versions or providers
  • Emergency override for critical situations
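
One way the manual actions could interact with automatic fallbacks is sketched below. The precedence order (emergency override, then manual switch, then automatic fallback, then the primary target) and the default of automatic fallbacks being enabled are assumptions made for illustration, not decisions recorded in this issue. `OverrideState` and `resolve_target` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OverrideState:
    """Manual controls a feature team could set. Hypothetical shape."""
    auto_fallback_enabled: bool = True      # feature toggle (assumed default)
    manual_target: Optional[str] = None     # manually selected model or provider
    emergency_target: Optional[str] = None  # emergency override


def resolve_target(
    primary: str,
    fallback: str,
    primary_unhealthy: bool,
    state: OverrideState,
) -> str:
    """Pick the target to use for the next request."""
    # Assumed precedence: the emergency override always wins.
    if state.emergency_target is not None:
        return state.emergency_target
    # A manual switch wins over any automatic decision.
    if state.manual_target is not None:
        return state.manual_target
    # Automatic fallback applies only while the toggle is on.
    if state.auto_fallback_enabled and primary_unhealthy:
        return fallback
    return primary
```

Keeping the resolution in one pure function makes the precedence easy to test and to log whenever an override is active.
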
Edited by David O'Regan