[Investigation] Client <> LLM Architecture changes for Code Suggestions (Post GA)
Currently, a request for a code suggestion makes a series of "hops" between the IDE that issues it and the model:
IDE --> Language Server --> Monolith --> AI Gateway --> LLM
The fastest possible route would be directly from the IDE to the LLM.
The idea is for the IDE to call the Monolith/AI Gateway as soon as the user starts typing, fetching everything needed up front: credentials for the LLM and routing information that identifies which LLM to call. Then, when it is time to make a suggestion, the IDE can call the LLM directly.
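A minimal sketch of that prefetch flow, assuming the gateway returns a short-lived token plus routing information (all names and fields here are illustrative, not an actual GitLab API):

```python
import time

class SuggestionSession:
    """Caches LLM credentials/routing fetched when the user starts typing."""

    def __init__(self, gateway_fetch, ttl_seconds=300):
        self._fetch = gateway_fetch  # call to Monolith/AI Gateway (hypothetical)
        self._ttl = ttl_seconds
        self._route = None           # e.g. {"llm_url": ..., "token": ...}
        self._expires_at = 0.0

    def route(self):
        # Refresh credentials/routing only when stale, so the hot path
        # (IDE -> LLM) skips the Monolith/AI Gateway hops entirely.
        if self._route is None or time.monotonic() >= self._expires_at:
            self._route = self._fetch()
            self._expires_at = time.monotonic() + self._ttl
        return self._route

def fake_gateway_fetch():
    # Stand-in for the Monolith/AI Gateway round trip.
    return {"llm_url": "https://llm.example/v1/complete", "token": "short-lived"}

session = SuggestionSession(fake_gateway_fetch)
route = session.route()   # fetched once on first keystroke
cached = session.route()  # reused until the TTL lapses
```

The TTL forces a periodic re-fetch, so revoked credentials or routing changes on the gateway side still take effect within one cache window.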
One thing we need to figure out is authentication: we need a way for the IDE to make secure calls to various LLMs without holding long-lived secrets. Perhaps the AI Gateway could sign part of the request, and the LLM provider could verify that signature?
Edited by Jörg Heilig