Consolidate model client to ChatLiteLLM for centralizing cache control via cache_control_injection_points
## Problem
The current issue with `cache_control_injection_points` is that it does not work for other clients such as `ai_gateway.models.v2.anthropic_claude.ChatAnthropic`. To go down this path, we need to make sure that all selectable models (ref `ai_gateway/model_selection/models.yml` and `ai_gateway/model_selection/unit_primitives.yml`) use `ai_gateway.models.v2.chat_litellm.ChatLiteLLM`; otherwise, cache control annotations need to be maintained in two places:

1. `cache_control_injection_points` (perhaps in the AIGW Prompt layer) for the `ChatLiteLLM` path
2. Prompt Builder (e.g. `ChatAgentPromptTemplate`) for the `ChatAnthropic` path

To de-risk reliability on production, it is better to keep this as simple as possible.
ref: !3800 (comment 2873627150)
## Proposal
- Remove all model clients except `ai_gateway.models.v2.chat_litellm.ChatLiteLLM`, in order to make sure that `ai_gateway.models.v2.chat_litellm.ChatLiteLLM` is the only model client used in AIGW/DWS.
- Add `cache_control_injection_points` to the model configuration file. With that, we can replace the `prompt_caching` param in the prompt file with it.
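
As a rough sketch, the model configuration entry might carry the injection points alongside the other LiteLLM parameters. This is an illustrative fragment only: the `name` and `params` keys are assumptions about the `models.yml` schema, while `cache_control_injection_points` itself follows LiteLLM's documented shape (a list of `location`/`role` entries).

```yaml
# Hypothetical models.yml entry (schema keys are illustrative):
models:
  - name: claude_sonnet
    params:
      model: claude-sonnet-4
      # LiteLLM-style injection points; replaces the prompt_caching
      # param previously kept in the prompt file.
      cache_control_injection_points:
        - location: message
          role: system
```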