Ensure checks are included in successRate stats for code completions (!578)

John McDonnell requested to merge jmd/update-checks-for-code-suggestions-tests into main Apr 25, 2024

I noted in https://gitlab.com/gitlab-org/quality/quality-engineering/team-tasks/-/issues/2599#note_1879567462 that the choices array may be present but contain no entries. We should therefore expand the checks here to cover that case.
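As a rough illustration of the case described above, the sketch below evaluates the three choices-related checks against an already-parsed response body. It is plain JavaScript rather than the actual test code: the helper name evaluateChoicesChecks and the shape of the body are assumptions, and only the check labels are taken from the k6 output in this description.

```javascript
// Hypothetical helper (not from the MR): evaluates the choices-related
// checks against an already-parsed response body.
function evaluateChoicesChecks(body) {
  const hasChoices =
    body !== null && typeof body === 'object' && 'choices' in body;
  const isArray = hasChoices && Array.isArray(body.choices);
  const isNonEmpty = isArray && body.choices.length > 0;

  // Keys mirror the check labels shown in the k6 output.
  return {
    'verify response has choices': hasChoices,
    'choices is an array': isArray,
    'choices is not zero-length': isNonEmpty,
  };
}

// A body whose choices array exists but is empty passes the first two
// checks and fails only the zero-length check.
const emptyChoices = evaluateChoicesChecks({ choices: [] });
console.log(emptyChoices['choices is an array']);        // true
console.log(emptyChoices['choices is not zero-length']); // false
```

Layering the checks this way means an empty array is reported as its own failure instead of being hidden behind a passing "is an array" check.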

Prior to this change, failed checks did not affect the result of the test, so the test report could state 'Passed' despite checks having failed. We should instead set the successRate based on the output of these checks, since it is misleading to say the test passed when they failed.
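The idea of tying the success rate to the check output can be sketched in plain JavaScript as below. This is not the code from this MR: SuccessRate is a hypothetical stand-in for a k6 rate metric (in k6 itself, check() returns a boolean that can be passed to a Rate metric's add()), and allChecksPassed is an assumed helper. The check labels and the count of 11 iterations come from the sample output in this description.

```javascript
// Hypothetical stand-in for a rate metric: counts truthy samples.
class SuccessRate {
  constructor() {
    this.passes = 0;
    this.total = 0;
  }

  // Record one sample; truthy counts as a success.
  add(ok) {
    this.total += 1;
    if (ok) this.passes += 1;
  }

  // Fraction of successful samples (0 when nothing was recorded).
  get rate() {
    return this.total === 0 ? 0 : this.passes / this.total;
  }
}

// Hypothetical helper: an iteration succeeds only if every check passed.
function allChecksPassed(results) {
  return Object.values(results).every(Boolean);
}

const successRate = new SuccessRate();
for (let i = 0; i < 11; i++) {
  // Simulated per-iteration check results: only the new check fails.
  const results = {
    'is status 200': true,
    'verify response has choices': true,
    'choices is an array': true,
    'choices is not zero-length': false,
  };
  successRate.add(allChecksPassed(results));
}

console.log(`${successRate.passes} / ${successRate.total}`); // 0 / 11
```

With every iteration recorded, a failing check shows up as failed samples in the rate, which matches the ✓ 0 ✗ 11 reported for successful_requests in the After output, whereas the Before output shows ✓ 0 ✗ 0 for the same metric.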

In the longer term I think this case should probably return an error, as noted in gitlab-org/modelops/applied-ml/code-suggestions/ai-assist#443 (closed). However, when targeting some models we might simply not get a response back, depending on the specific payload, model, and so on.
With that in mind, I think there is value in adding this check as an extra guardrail to ensure we get 'good' responses when running these tests, in case of some unforeseen scenario.

Before

Note that in the sample output, although ✗ choices is an array is logged, the REQ STATUS | RESULT columns in the table would previously have reported Passed. If you weren't paying full attention to the entire log, that subtlety could be missed.


     █ setup

     █ API - Code Suggestions - Completions

       ✓ is status 200
       ✓ verify response has choices
       ✗ choices is an array
        ↳  0% — ✓ 0 / ✗ 11

     checks.........................: 66.66% ✓ 22       ✗ 11 
     data_received..................: 9.9 kB 1.9 kB/s
     data_sent......................: 4.3 kB 831 B/s
     group_duration.................: avg=797.84ms min=165.59ms med=962.04ms max=1024.22ms p(90)=1016.54ms p(95)=1020.38ms
     http_req_blocked...............: avg=0.13ms   min=0.00ms   med=0.01ms   max=0.71ms    p(90)=0.68ms    p(95)=0.69ms   
     http_req_connecting............: avg=0.11ms   min=0.00ms   med=0.00ms   max=0.62ms    p(90)=0.61ms    p(95)=0.61ms   
     http_req_duration..............: avg=159.01ms min=115.97ms med=164.38ms max=190.73ms  p(90)=185.81ms  p(95)=188.27ms 
       { expected_response:true }...: avg=159.01ms min=115.97ms med=164.38ms max=190.73ms  p(90)=185.81ms  p(95)=188.27ms 
     http_req_failed................: 0.00%  ✓ 0        ✗ 11 
     http_req_receiving.............: avg=0.09ms   min=0.07ms   med=0.08ms   max=0.13ms    p(90)=0.10ms    p(95)=0.12ms   
     http_req_sending...............: avg=0.04ms   min=0.03ms   med=0.03ms   max=0.06ms    p(90)=0.05ms    p(95)=0.06ms   
     http_req_tls_handshaking.......: avg=0.00ms   min=0.00ms   med=0.00ms   max=0.00ms    p(90)=0.00ms    p(95)=0.00ms   
   ✓ http_req_waiting...............: avg=158.89ms min=115.86ms med=164.20ms max=190.61ms  p(90)=185.65ms  p(95)=188.13ms 
   ✓ http_reqs......................: 11     2.142125/s
     iteration_duration.............: avg=731.43ms min=0.37ms   med=955.49ms max=1024.25ms p(90)=1015.43ms p(95)=1020.05ms
     iterations.....................: 11     2.142125/s
   ✓ successful_requests............: 0.00%  ✓ 0        ✗ 0  
     vus............................: 1      min=1      max=2
     vus_max........................: 2      min=2      max=2


running (05.1s), 0/2 VUs, 11 complete and 0 interrupted iterations
default ✓ [ 100% ] 0/2 VUs  5s
All k6 tests have finished after 6.46s!

█ Results summary

* Environment:                Test Environment
* Environment Version:        16.11.0-pre `e4f650d206c`
* Option:                     5s_2rps
* Date:                       2024-04-24
* Run Time:                   6.46s (Start: 23:30:19 UTC, End: 23:30:25 UTC)
* GPT Version:                v2.14.0

❯ Overall Results Score: 0.0%

NAME                                | RPS | RPS RESULT       | TTFB AVG | TTFB P90          | REQ STATUS   | RESULT         
------------------------------------|-----|------------------|----------|-------------------|--------------|----------------
api_v4_code_suggestions_completions | 2/s | 2.14/s (>1.60/s) | 158.89ms | 185.65ms (<500ms) | 0.00% (>99%) | Passed

After

     █ setup

     █ API - Code Suggestions - Completions

       ✓ is status 200
       ✓ verify response has choices
       ✓ choices is an array
       ✗ choices is not zero-length
        ↳  0% — ✓ 0 / ✗ 11

     checks.........................: 75.00% ✓ 33       ✗ 11 
     data_received..................: 9.8 kB 1.9 kB/s
     data_sent......................: 4.3 kB 833 B/s
     group_duration.................: avg=797.84ms min=165.59ms med=962.04ms max=1024.22ms p(90)=1016.54ms p(95)=1020.38ms
     http_req_blocked...............: avg=0.13ms   min=0.00ms   med=0.01ms   max=0.71ms    p(90)=0.68ms    p(95)=0.69ms   
     http_req_connecting............: avg=0.11ms   min=0.00ms   med=0.00ms   max=0.62ms    p(90)=0.61ms    p(95)=0.61ms   
     http_req_duration..............: avg=159.01ms min=115.97ms med=164.38ms max=190.73ms  p(90)=185.81ms  p(95)=188.27ms 
       { expected_response:true }...: avg=159.01ms min=115.97ms med=164.38ms max=190.73ms  p(90)=185.81ms  p(95)=188.27ms 
     http_req_failed................: 0.00%  ✓ 0        ✗ 11 
     http_req_receiving.............: avg=0.09ms   min=0.07ms   med=0.08ms   max=0.13ms    p(90)=0.10ms    p(95)=0.12ms   
     http_req_sending...............: avg=0.04ms   min=0.03ms   med=0.03ms   max=0.06ms    p(90)=0.05ms    p(95)=0.06ms   
     http_req_tls_handshaking.......: avg=0.00ms   min=0.00ms   med=0.00ms   max=0.00ms    p(90)=0.00ms    p(95)=0.00ms   
   ✓ http_req_waiting...............: avg=158.89ms min=115.86ms med=164.20ms max=190.61ms  p(90)=185.65ms  p(95)=188.13ms 
   ✓ http_reqs......................: 11     2.142125/s
     iteration_duration.............: avg=731.43ms min=0.37ms   med=955.49ms max=1024.25ms p(90)=1015.43ms p(95)=1020.05ms
     iterations.....................: 11     2.145824/s
   ✗ successful_requests............: 0.00%  ✓ 0        ✗ 11 
     vus............................: 1      min=1      max=2
     vus_max........................: 2      min=2      max=2


running (05.1s), 0/2 VUs, 11 complete and 0 interrupted iterations
default ✓ [ 100% ] 0/2 VUs  5s
time="2024-04-24T23:57:35Z" level=error msg="thresholds on metrics 'successful_requests' have been crossed"
All k6 tests have finished after 13.77s!

█ Results summary

* Environment:                Test Environment
* Environment Version:        16.11.0-pre `e4f650d206c`
* Option:                     5s_2rps
* Date:                       2024-04-24
* Run Time:                   13.77s (Start: 23:57:21 UTC, End: 23:57:35 UTC)
* GPT Version:                v2.14.0

❯ Overall Results Score: 0.0%

NAME                                | RPS | RPS RESULT       | TTFB AVG | TTFB P90          | REQ STATUS   | RESULT          
------------------------------------|-----|------------------|----------|-------------------|--------------|-----------------
api_v4_code_suggestions_completions | 2/s | 2.15/s (>1.60/s) | 133.14ms | 166.57ms (<500ms) | 0.00% (>99%) | FAILED²

Edited Apr 25, 2024 by John McDonnell
