Investigate degradation in GPT AI test results with higher RPS
Summary
Degradation might be caused by the fact that all environments currently use the same machine size for the AI mock backend - ai_gateway_machine_type = "n1-highcpu-2"
The aim of this issue is to debug this further and identify what needs to be done to align results across environments of various sizes. If improving the specs of the AI backend doesn't improve performance, a new performance issue should be raised for the product following the template.
Details
Performance tests for AI features are failing under higher load on bigger environments. For example, below are the 10k / 200 RPS results:
NAME | RPS | RPS RESULT | TTFB AVG | TTFB P90 | REQ STATUS | RESULT
----------------------------------------------|-------|---------------------|-----------|---------------------|----------------|-------
api_v4_code_suggestions_completions | 200/s | 100.32/s (>48.00/s) | 1841.46ms | 4271.04ms (<8000ms) | 100.00% (>99%) | Passed
api_v4_code_suggestions_completions_streaming | 200/s | 101.04/s (>48.00/s) | 1826.49ms | 4236.84ms (<8000ms) | 100.00% (>99%) | Passed
api_v4_code_suggestions_generations | 200/s | 101.74/s (>48.00/s) | 1823.70ms | 5540.01ms (<8000ms) | 100.00% (>99%) | Passed
whereas the 1k / 20 RPS results show significantly lower latencies:
NAME | RPS | RPS RESULT | TTFB AVG | TTFB P90 | REQ STATUS | RESULT
----------------------------------------------|------|--------------------|-----------|--------------------|----------------|---------
api_v4_code_suggestions_completions | 20/s | 19.46/s (>16.00/s) | 84.84ms | 80.84ms (<500ms) | 100.00% (>99%) | Passed
api_v4_code_suggestions_completions_streaming | 20/s | 19.93/s (>16.00/s) | 76.92ms | 79.32ms (<500ms) | 100.00% (>99%) | Passed
api_v4_code_suggestions_generations | 20/s | 19.86/s (>16.00/s) | 85.14ms | 87.66ms (<500ms) | 100.00% (>99%) | Passed
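To quantify the gap between the two environments, the achieved RPS can be compared against each target. A minimal sketch using the figures from the tables above (environment labels are illustrative):

```python
# Compare achieved RPS against the target for each environment.
# Figures are taken from the result tables above.
results = {
    "10k/200rps": {"target": 200, "achieved": 100.32, "ttfb_p90_ms": 4271.04},
    "1k/20rps": {"target": 20, "achieved": 19.46, "ttfb_p90_ms": 80.84},
}

for env, r in results.items():
    ratio = r["achieved"] / r["target"]
    print(f"{env}: {ratio:.0%} of target RPS, TTFB P90 {r['ttfb_p90_ms']}ms")
```

The 10k environment reaches only about half of its target RPS with a TTFB P90 roughly 50x higher, while the 1k environment is within a few percent of its target - consistent with the mock backend, not the application, being the bottleneck.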
This might be caused by the fact that all environments currently use the same machine size for the AI mock backend - ai_gateway_machine_type = "n1-highcpu-2"
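If the fixed machine size is confirmed as the bottleneck, one possible fix is to scale the mock backend's machine type with the environment size. A minimal sketch of such an override, assuming GitLab Environment Toolkit-style Terraform variables (the module layout and the per-environment size shown are illustrative assumptions, not validated recommendations):

```hcl
# environments/10k/main.tf - illustrative sketch only.
# Sizes and module structure are assumptions to show the idea.
module "gitlab_ref_arch_gcp" {
  source = "../../modules/gitlab_ref_arch_gcp"

  # Scale the AI mock backend with the environment instead of the
  # fixed "n1-highcpu-2" currently used everywhere.
  ai_gateway_machine_type = "n1-highcpu-8"
}
```

Before any resizing, it would be worth confirming CPU saturation on the mock backend during a 200 RPS run to verify this hypothesis.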