Explore xk6-exec POC implementation

📜 Summary

This issue explores the viability of implementing git-ssh performance testing using xk6-exec, based on @john.mcdonnell's recommendation. We have concerns about potential limitations (filesystem I/O under load, scalability constraints, etc.), so this POC deliberately stress-tests the approach to uncover gotchas and failure modes rather than building a production-ready solution.

🥅 Goal

Build a functional POC implementation using xk6-exec
Stress-test the approach to identify failure modes and constraints (filesystem bottlenecks, load generator stability, etc.)
Document discovered limitations and workarounds to inform next iteration decisions
Determine whether the identified issues are solvable within this approach or require the custom k6 plugin path

🏁 Exit Criteria

POC implementation completed
Load tests executed specifically to trigger suspected failure modes
Findings documented with discovered gotchas, limitations, and potential solutions
Clear recommendation on whether to iterate on xk6-exec approach or pivot to custom plugin development

Results

Recommendation

Continue with the xk6-exec approach. The POC successfully demonstrated:

Viable integration with GPT testing framework
Ability to test both SSH and HTTP protocols
No fundamental technical blockers discovered

Analysis

We successfully created a POC script that tests git clone --depth 1 via both SSH and HTTP using the native git client, comparing protocol performance. We ran multiple load levels against a GET RA 10k environment.

Here is a breakdown of the results found:

run configuration	expected iterations	actual iterations	dropped iterations	Maximum VUsers	actual VUsers	run duration	SSH clone min/avg/max (s)	HTTP clone min/avg/max (s)
30s_2rps	30	10	20	10	10	1.5 min	12.8 / 31.1 / 37.8	23.6 / 30.3 / 35.6
60s_2rps	60	11	49	10	10	1.6 min	9.3 / 30.0 / 38.5	6.0 / 30.1 / 37.6
60s_10rps	60	28	32	50	28	8.2 min	15.3 / 83.4 / 118.2	60.8 / 79.5 / 108.4
60s_20rps	60	28	32	100	28	8.3 min	13.6 / 84.0 / 111.5	63.6 / 84.0 / 111.6
60s_40rps	120	55	65	200	56	18.9 min	66.3 / 201.4 / 252.4	157.9 / 207.0 / 248.0
60s_80rps	180	83	97	400	84	32.6 min	179.1 / 349.4 / 446.6	273.4 / 350.7 / 406.0

Performance degraded roughly 3x at just 10 RPS (clone times rose from ~30s to ~80s) and roughly 12x at 80 RPS (to ~350s). Both load levels are far below the infrastructure's rated 200 RPS capacity.
The number of VUsers never approached the configured maximums because clone operations became so slow (100-400+ seconds) that the 60-second arrival window ended before k6 could ramp additional VUs.
None of the runs completed its expected number of iterations, so none reached its target load level.
Average SSH and HTTP clone times stayed within about 5% of each other at every load level (for example, 83.4s vs 79.5s at 10 RPS), confirming SSH protocol overhead is NOT the bottleneck.
Zero SSH connection failures occurred across all tests, despite up to 84 concurrent VUs, confirming SSH key reuse is not a limiting factor.
The script ran better than expected: it neither hit SSH connection limits nor saturated the test generator's disk.
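
The ratios quoted above can be re-derived from the results table. The following is a minimal Python sketch, not part of the POC script. It assumes the 60s_2rps run is the baseline and uses the SSH average as the reference clone time; the function name `summarize` and the field names are illustrative only.

```python
# Values copied from the results table:
# (run, expected iterations, actual iterations, dropped iterations,
#  SSH avg clone seconds, HTTP avg clone seconds)
RUNS = [
    ("30s_2rps", 30, 10, 20, 31.1, 30.3),
    ("60s_2rps", 60, 11, 49, 30.0, 30.1),
    ("60s_10rps", 60, 28, 32, 83.4, 79.5),
    ("60s_20rps", 60, 28, 32, 84.0, 84.0),
    ("60s_40rps", 120, 55, 65, 201.4, 207.0),
    ("60s_80rps", 180, 83, 97, 349.4, 350.7),
]


def summarize(runs, baseline_index=1):
    """Derive slowdown, SSH/HTTP gap, and drop rate for each run.

    baseline_index=1 (the 60s_2rps run) is an assumption made for this sketch.
    """
    base_ssh = runs[baseline_index][4]
    rows = []
    for name, expected, actual, dropped, ssh_avg, http_avg in runs:
        rows.append({
            "run": name,
            # SSH average clone time relative to the baseline run.
            "slowdown": ssh_avg / base_ssh,
            # Relative gap between the SSH and HTTP averages, in percent.
            "gap_pct": abs(ssh_avg - http_avg) / max(ssh_avg, http_avg) * 100,
            # Share of planned iterations that were dropped, in percent.
            "dropped_pct": dropped / expected * 100,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(f"{row['run']:<10} slowdown={row['slowdown']:.1f}x "
              f"gap={row['gap_pct']:.1f}% dropped={row['dropped_pct']:.0f}%")
```

With the table values, the SSH slowdown comes out near 2.8x at 10 RPS and 11.6x at 80 RPS, and the SSH/HTTP gap stays under 5% in every run. More than half of the planned iterations were dropped in each run.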

Tooling findings

GPT can be configured to run SSH tests as well as HTTP tests.
The xk6-exec module is not included in GPT's k6 implementation, so a custom k6 build is required (one-time setup).

Open questions

What are our real KPIs for git-ssh (response time, byte throughput)?
What git commands are of interest?
- Is a shallow git clone of interest or should we do a full clone?
- How do we conduct a git push with data?
- How do we do a git pull where the data on the server differs from the local copy?
Are these clone times (30s baseline, 350s at load) expected for the test environment, or do they indicate a GitLab configuration/infrastructure issue? Is the performance cliff expected?

Next steps

Investigate the root cause of the 10 RPS performance cliff via Grafana metrics (Gitaly queue depth, network bandwidth, Git-specific bottlenecks)
Continue iterating on the xk6-exec path
- Explore how to implement the new k6 build in GPT rather than a one-off implementation
- Explore how to implement git pull and git push
Explore what other metrics we can gather. Inferring clone size from load duration is suspect; can we capture throughput a different way?
Compare the native git results against the API-based testing we do currently
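
One possible way to capture throughput directly, sketched in Python outside of k6, is to time the clone and divide the size of the resulting `.git` directory by the elapsed time. This is an assumption-laden illustration, not the POC's method. The helper names (`dir_size_bytes`, `timed_clone`, `throughput_mib_per_s`) are hypothetical, and on-disk size is only a proxy for bytes actually transferred.

```python
import os
import subprocess
import time


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def timed_clone(repo_url, dest):
    """Run a shallow clone and return (elapsed_seconds, git_dir_bytes).

    repo_url is supplied by the caller; it may be an SSH or an HTTP URL.
    """
    start = time.monotonic()
    subprocess.run(
        ["git", "clone", "--depth", "1", "--quiet", repo_url, dest],
        check=True,  # raise CalledProcessError if the clone fails
    )
    elapsed = time.monotonic() - start
    # Measure only .git, which holds the received objects, as a rough
    # stand-in for the amount of data pulled from the server.
    return elapsed, dir_size_bytes(os.path.join(dest, ".git"))


def throughput_mib_per_s(size_bytes, elapsed_seconds):
    """Convert a byte count and a duration into MiB per second."""
    return size_bytes / (1024 * 1024) / elapsed_seconds
```

A caller would pass the result of `timed_clone(url, dest)` to `throughput_mib_per_s(size, elapsed)`. Whether this proxy is accurate enough compared with server-side or network-level byte counts would still need to be checked.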

Edited Oct 02, 2025 by Andy Hohenner