Investigate streaming for code generation API
The latest code generation models will also support streaming: https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/code-generation?hl=en&authuser=1#stream_response_from_generative_ai_models
To prepare for supporting this, we need to investigate the following:
- Time from sending the request until the first streamed chunk arrives
- Changes needed in the IDEs, the Monolith, and the model gateway to enable it
- How cleaning and deduplication would work in that scenario
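For the first point, a minimal sketch of how time to first chunk could be measured against any streaming source. Everything named here is an assumption for illustration: `time_to_first_chunk` and `fake_stream` are hypothetical helpers, and `fake_stream` only simulates a streaming response with sleeps. It is not the Vertex AI client or the model gateway API.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def time_to_first_chunk(
    start_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, str]:
    """Return (seconds until first chunk, total seconds, concatenated text).

    `start_stream` is any callable that sends the request and returns an
    iterable of text chunks. The first element is None if no chunk arrived.
    """
    started = clock()
    first_chunk_at = None
    parts = []
    for chunk in start_stream():
        if first_chunk_at is None:
            # Record the moment the first chunk is received.
            first_chunk_at = clock()
        parts.append(chunk)
    finished = clock()
    ttfc = None if first_chunk_at is None else first_chunk_at - started
    return ttfc, finished - started, "".join(parts)


def fake_stream(delay: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a streaming code generation response."""
    for piece in ["def add(a, b):", "\n    return a + b", "\n"]:
        time.sleep(delay)  # simulated network/model latency per chunk
        yield piece


if __name__ == "__main__":
    ttfc, total, text = time_to_first_chunk(fake_stream)
    print(f"first chunk after {ttfc:.3f}s, complete after {total:.3f}s")
```

Swapping `fake_stream` for a callable that wraps the real streaming request would let the same measurement run against the IDE, Monolith, or model gateway hop individually, which may help show where latency is added.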
Edited by Tim Zallmann