Instrument latency and adjust prompt size and log truncation strategy for latency concerns
Problem
GA requirements include (source):
- Target response time of <5 seconds, instrumented for visibility but not a hard requirement. If response time is >5 seconds then feature teams need to document why we are making the trade-off for higher latency. Timing exceptions allowed with UX validation reasoning. Feature teams do not need to worry about meeting any specific p50/p90 metrics.
Note: This is actually our target for all requests throughout the product, and we will violate the error budget if we go above it.
Proposal
Once we release the MVC, we should use the production logs to measure how long each request takes, treat that as a benchmark, and compare any later changes against it. Retrieving the log from the gitlab.com backend and sending it over will take a different amount of time in a live production system than on an isolated project like the model validation, especially since we use Gitlab::Ci::Trace::Stream#raw to stream the logs line by line. Still, the Anthropic request will likely account for the majority of the time spent, so we can also determine ballpark estimates locally.
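To make the benchmark concrete, here is a minimal sketch of how the two phases (log retrieval and the Anthropic request) could be timed separately and summarized against the 5 second target. It is illustrative only: `PhaseTimer`, `percentile`, `share_over_target`, and the phase names are hypothetical and do not correspond to existing GitLab code, and Python is used purely for the illustration.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager

TARGET_SECONDS = 5.0  # response time target from the GA requirements


class PhaseTimer:
    """Collects wall-clock durations per named phase (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.samples = defaultdict(list)

    @contextmanager
    def measure(self, phase):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the wrapped block raises.
            self.samples[phase].append(self._clock() - start)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of durations."""
    if not values:
        raise ValueError("no samples recorded")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def share_over_target(totals, target=TARGET_SECONDS):
    """Fraction of requests whose total duration exceeded the target."""
    return sum(1 for t in totals if t > target) / len(totals)
```

Wrapping each phase in `timer.measure("log_retrieval")` and `timer.measure("model_request")` would show how much of the total is spent outside the model call, which is the part most likely to differ between production and a local run.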
We should also adjust the prompts in production based on how quickly requests to Anthropic come back, balancing accuracy against latency for longer logs. (Some of this may come out of the model validation exercise?)
Providing more context should increase latency, so we should confirm that sending more of the log actually gives the user a better result.
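One possible truncation strategy to evaluate is keeping only the tail of the log within a character budget. The sketch below assumes that the lines most relevant to a failure sit near the end of the log, which is an assumption to validate rather than an established fact; `truncate_log_tail` and `max_chars` are hypothetical names, not part of the current implementation.

```python
def truncate_log_tail(log_text, max_chars):
    """Keep the last whole lines of log_text that fit within max_chars.

    Assumption: the end of the log carries the most useful context.
    Lines are never cut in the middle; a line that does not fit is dropped
    together with everything before it.
    """
    if max_chars <= 0:
        return ""
    kept = []
    used = 0
    for line in reversed(log_text.splitlines()):
        cost = len(line) + 1  # +1 accounts for the newline separator
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    return "\n".join(reversed(kept))
```

Running the same set of logs through several budgets and comparing both latency and answer quality for each would show whether the extra context is worth the added time.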