Ensure checks are included in successRate stats for code completions
I noted in https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/2599#note_1879567462 that the choices array may be present but contain no entries. We should therefore extend the checks here to cover that case.
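To illustrate the checks involved, here is a minimal sketch in plain JavaScript rather than the actual k6 script. The check names are taken from the sample output in this description; the `completionChecks` and `runChecks` names and the response shape (`status`, parsed `body`) are assumptions made for illustration.

```javascript
// Hypothetical sketch: the check names mirror the k6 output in this MR,
// but these functions and the response shape are assumptions.
const completionChecks = {
  'is status 200': (res) => res.status === 200,
  'verify response has choices': (res) =>
    res.body !== null && typeof res.body === 'object' && 'choices' in res.body,
  'choices is an array': (res) => Array.isArray(res.body?.choices),
  // The additional guardrail: an empty array should not count as a good response.
  'choices is not zero-length': (res) =>
    Array.isArray(res.body?.choices) && res.body.choices.length > 0,
};

// Run every check and return a map of check name -> boolean.
function runChecks(res) {
  return Object.fromEntries(
    Object.entries(completionChecks).map(([name, fn]) => [name, Boolean(fn(res))])
  );
}
```

With a response such as `{ status: 200, body: { choices: [] } }`, the first three checks pass and only 'choices is not zero-length' fails, which is the case the new check is meant to catch.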
Prior to this change, failed checks did not affect the result of the test, so the test report could state 'Passed' despite checks having failed. We should instead set the successRate based on the output of these checks, since it is misleading to report that the test passed when they failed.
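The idea of tying the success rate to the check results can be sketched as follows. This is plain JavaScript, not the code in this MR: `SuccessRate` is a hypothetical stand-in for a k6 Rate metric, `recordRequest` and `meetsThreshold` are illustrative names, and the 0.99 default mirrors the `>99%` REQ STATUS threshold shown in the results table. In k6 itself this would presumably amount to passing the boolean returned by `check()` to the rate's `add()`, but the diff is the authority on that.

```javascript
// Hypothetical stand-in for a k6 Rate metric: tracks passed vs. total samples.
class SuccessRate {
  constructor() {
    this.passed = 0;
    this.total = 0;
  }

  add(ok) {
    this.total += 1;
    if (ok) this.passed += 1;
  }

  // An empty rate is reported as 0 here; that is a choice of this sketch
  // and does not model how k6 treats a rate with no samples.
  get value() {
    return this.total === 0 ? 0 : this.passed / this.total;
  }
}

// A request counts as successful only if every check passed.
function recordRequest(rate, checkResults) {
  const allPassed = Object.values(checkResults).every(Boolean);
  rate.add(allPassed);
  return allPassed;
}

// Assumed threshold, mirroring the '>99%' shown in the results table.
function meetsThreshold(rate, minimum = 0.99) {
  return rate.value > minimum;
}
```

Under these assumptions, a run in which every request fails at least one check yields a rate of 0 and misses the threshold, so the overall result can no longer be reported as passed.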
In the longer term I think this case should probably return an error, as noted in gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#443 (closed). However, when targeting some models we might simply not get a response back, depending on the specific payload, model, and so on.
With that in mind, I think there is value in adding this check as an additional guardrail to ensure we get 'good' responses when running these tests, in case some unforeseen scenario arises.
Before
Note in the sample output that although `✗ choices is an array` is logged, the `REQ STATUS | RESULT` columns in the table would previously have reported Passed. If you weren't paying full attention to the entire log, that subtlety could be missed.
█ setup
█ API - Code Suggestions - Completions
✓ is status 200
✓ verify response has choices
✗ choices is an array
↳ 0% — ✓ 0 / ✗ 11
checks.........................: 66.66% ✓ 22 ✗ 11
data_received..................: 9.9 kB 1.9 kB/s
data_sent......................: 4.3 kB 831 B/s
group_duration.................: avg=797.84ms min=165.59ms med=962.04ms max=1024.22ms p(90)=1016.54ms p(95)=1020.38ms
http_req_blocked...............: avg=0.13ms min=0.00ms med=0.01ms max=0.71ms p(90)=0.68ms p(95)=0.69ms
http_req_connecting............: avg=0.11ms min=0.00ms med=0.00ms max=0.62ms p(90)=0.61ms p(95)=0.61ms
http_req_duration..............: avg=159.01ms min=115.97ms med=164.38ms max=190.73ms p(90)=185.81ms p(95)=188.27ms
{ expected_response:true }...: avg=159.01ms min=115.97ms med=164.38ms max=190.73ms p(90)=185.81ms p(95)=188.27ms
http_req_failed................: 0.00% ✓ 0 ✗ 11
http_req_receiving.............: avg=0.09ms min=0.07ms med=0.08ms max=0.13ms p(90)=0.10ms p(95)=0.12ms
http_req_sending...............: avg=0.04ms min=0.03ms med=0.03ms max=0.06ms p(90)=0.05ms p(95)=0.06ms
http_req_tls_handshaking.......: avg=0.00ms min=0.00ms med=0.00ms max=0.00ms p(90)=0.00ms p(95)=0.00ms
✓ http_req_waiting...............: avg=158.89ms min=115.86ms med=164.20ms max=190.61ms p(90)=185.65ms p(95)=188.13ms
✓ http_reqs......................: 11 2.142125/s
iteration_duration.............: avg=731.43ms min=0.37ms med=955.49ms max=1024.25ms p(90)=1015.43ms p(95)=1020.05ms
iterations.....................: 11 2.142125/s
✓ successful_requests............: 0.00% ✓ 0 ✗ 0
vus............................: 1 min=1 max=2
vus_max........................: 2 min=2 max=2
running (05.1s), 0/2 VUs, 11 complete and 0 interrupted iterations
default ✓ [ 100% ] 0/2 VUs 5s
All k6 tests have finished after 6.46s!
█ Results summary
* Environment: Test Environment
* Environment Version: 16.11.0-pre `e4f650d206c`
* Option: 5s_2rps
* Date: 2024-04-24
* Run Time: 6.46s (Start: 23:30:19 UTC, End: 23:30:25 UTC)
* GPT Version: v2.14.0
❯ Overall Results Score: 0.0%
NAME | RPS | RPS RESULT | TTFB AVG | TTFB P90 | REQ STATUS | RESULT
------------------------------------|-----|------------------|----------|-------------------|--------------|----------------
api_v4_code_suggestions_completions | 2/s | 2.14/s (>1.60/s) | 158.89ms | 185.65ms (<500ms) | 0.00% (>99%) | Passed
After
█ setup
█ API - Code Suggestions - Completions
✓ is status 200
✓ verify response has choices
✓ choices is an array
✗ choices is not zero-length
↳ 0% — ✓ 0 / ✗ 11
checks.........................: 75.00% ✓ 33 ✗ 11
data_received..................: 9.8 kB 1.9 kB/s
data_sent......................: 4.3 kB 833 B/s
group_duration.................: avg=797.84ms min=165.59ms med=962.04ms max=1024.22ms p(90)=1016.54ms p(95)=1020.38ms
http_req_blocked...............: avg=0.13ms min=0.00ms med=0.01ms max=0.71ms p(90)=0.68ms p(95)=0.69ms
http_req_connecting............: avg=0.11ms min=0.00ms med=0.00ms max=0.62ms p(90)=0.61ms p(95)=0.61ms
http_req_duration..............: avg=159.01ms min=115.97ms med=164.38ms max=190.73ms p(90)=185.81ms p(95)=188.27ms
{ expected_response:true }...: avg=159.01ms min=115.97ms med=164.38ms max=190.73ms p(90)=185.81ms p(95)=188.27ms
http_req_failed................: 0.00% ✓ 0 ✗ 11
http_req_receiving.............: avg=0.09ms min=0.07ms med=0.08ms max=0.13ms p(90)=0.10ms p(95)=0.12ms
http_req_sending...............: avg=0.04ms min=0.03ms med=0.03ms max=0.06ms p(90)=0.05ms p(95)=0.06ms
http_req_tls_handshaking.......: avg=0.00ms min=0.00ms med=0.00ms max=0.00ms p(90)=0.00ms p(95)=0.00ms
✓ http_req_waiting...............: avg=158.89ms min=115.86ms med=164.20ms max=190.61ms p(90)=185.65ms p(95)=188.13ms
✓ http_reqs......................: 11 2.142125/s
iteration_duration.............: avg=731.43ms min=0.37ms med=955.49ms max=1024.25ms p(90)=1015.43ms p(95)=1020.05ms
iterations.....................: 11 2.145824/s
✗ successful_requests............: 0.00% ✓ 0 ✗ 11
vus............................: 1 min=1 max=2
vus_max........................: 2 min=2 max=2
running (05.1s), 0/2 VUs, 11 complete and 0 interrupted iterations
default ✓ [ 100% ] 0/2 VUs 5s
time="2024-04-24T23:57:35Z" level=error msg="thresholds on metrics 'successful_requests' have been crossed"
All k6 tests have finished after 13.77s!
█ Results summary
* Environment: Test Environment
* Environment Version: 16.11.0-pre `e4f650d206c`
* Option: 5s_2rps
* Date: 2024-04-24
* Run Time: 13.77s (Start: 23:57:21 UTC, End: 23:57:35 UTC)
* GPT Version: v2.14.0
❯ Overall Results Score: 0.0%
NAME | RPS | RPS RESULT | TTFB AVG | TTFB P90 | REQ STATUS | RESULT
------------------------------------|-----|------------------|----------|-------------------|--------------|-----------------
api_v4_code_suggestions_completions | 2/s | 2.15/s (>1.60/s) | 133.14ms | 166.57ms (<500ms) | 0.00% (>99%) | FAILED²