Requests and limits for the model-gateway
We currently do not have any resource requests or limits set for the model-gateway service in ai-assist.
This means the HPA will be unable to scale replicas as traffic increases. We saw this recently in 2023-05-18: Code suggestions service is down (production#14451 - closed), where the number of replicas was too low for the traffic the feature was receiving.
As a first iteration, we should set conservative requests per pod.
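To illustrate why requests matter here, below is a rough sketch of the scaling calculation described in the Kubernetes HPA documentation, assuming our HPA targets CPU utilization (this issue does not state which metric it uses). Utilization is measured as a percentage of the pod's request, so with no request there is nothing to divide by and the autoscaler takes no action for that metric. The function name and all numbers are made up for illustration; the real controller also applies a tolerance and min/max replica bounds, which are left out.

```python
import math
from typing import Optional


def desired_replicas(
    current_replicas: int,
    avg_usage_millicores: float,
    request_millicores: Optional[float],
    target_percent: float,
) -> Optional[int]:
    """Simplified HPA calculation (hypothetical helper, not a real API).

    Returns None when no request is set, mirroring the case where
    utilization is undefined and the HPA cannot make a scaling decision.
    """
    if request_millicores is None or request_millicores <= 0:
        # No request: utilization cannot be computed.
        return None
    # Utilization is expressed relative to the request, not to node capacity.
    utilization_percent = 100 * avg_usage_millicores / request_millicores
    # desired = ceil(current * currentMetricValue / desiredMetricValue)
    return math.ceil(current_replicas * utilization_percent / target_percent)


# Illustrative values only: 3 pods averaging 900m CPU each.
print(desired_replicas(3, 900, None, 60))  # no request set
print(desired_replicas(3, 900, 500, 60))   # 500m request, 60% target
```

With a 500m request, the same load that leaves the replica count untouched today would produce a scale-up, which is why even a conservative first value is useful.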
cc @tle_gitlab