Consolidate model clients to ChatLiteLLM to centralize cache control via cache_control_injection_points

Problem

The current issue with cache_control_injection_points is that it doesn't work for other clients such as ai_gateway.models.v2.anthropic_claude.ChatAnthropic. To go down this path, we need to make sure that all selectable models (ref ai_gateway/model_selection/models.yml and ai_gateway/model_selection/unit_primitives.yml) use ai_gateway.models.v2.chat_litellm.ChatLiteLLM; otherwise, cache control annotations need to be maintained in two places:

  1. cache_control_injection_points (perhaps in the AIGW Prompt layer) for the ChatLiteLLM path
  2. The Prompt Builder (e.g. ChatAgentPromptTemplate) for the ChatAnthropic path

To derisk reliability in production, it's better to keep this as simple as possible.

ref: !3800 (comment 2873627150)
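
For context, here is a minimal sketch of how cache_control_injection_points behaves on the litellm path (which ChatLiteLLM wraps): litellm injects the cache_control annotation into matching messages before the provider call, so the prompt template itself needs no hand-placed annotations. The model name, messages, and API-key setup below are illustrative, not taken from AIGW code.

```python
# Minimal sketch of litellm's cache_control_injection_points.
# Assumes ANTHROPIC_API_KEY is set in the environment.
from litellm import completion

response = completion(
    model="anthropic/claude-3-5-sonnet-20240620",  # illustrative model name
    messages=[
        {"role": "system", "content": "Long, stable system prompt..."},
        {"role": "user", "content": "User question"},
    ],
    # litellm adds {"cache_control": {"type": "ephemeral"}} to every
    # system message before sending the request to Anthropic.
    cache_control_injection_points=[
        {"location": "message", "role": "system"},
    ],
)
```

On the ChatAnthropic path, by contrast, the equivalent annotation has to be written into the messages by the Prompt Builder itself, which is why the annotation logic currently ends up duplicated.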

Proposal

  • Remove all model clients except ai_gateway.models.v2.chat_litellm.ChatLiteLLM, making it the only model client used in AIGW/DWS.
  • Add cache_control_injection_points to the model configuration file. With that, we can replace the prompt_caching param in the prompt files with it (see the sketch after this list).
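
A hypothetical sketch of what the second bullet could look like in the model configuration file. All field names apart from cache_control_injection_points are assumptions for illustration; the actual schema is whatever ai_gateway/model_selection/models.yml defines.

```yaml
# Hypothetical models.yml entry -- structure is illustrative only.
models:
  - name: claude_sonnet
    params:
      model: claude-sonnet-4-20250514
      # Moving cache control here keeps the annotation in one place,
      # replacing the per-prompt prompt_caching param.
      cache_control_injection_points:
        - location: message
          role: system
```

Keeping this in the model configuration (rather than per prompt) means a single source of truth for which messages get cached, regardless of which prompt uses the model.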