Validate topology service performance under production load conditions
Overview
The topology service is a critical component of the Cells infrastructure that needs to handle production-scale load and support future growth. Before deploying to production, we need to validate that the service can meet performance requirements under realistic load conditions.
This issue tracks the execution of comprehensive load testing following GitLab's performance guidelines to ensure production readiness.
Related Epic: &4 (closed)
Goal
Validate that the topology service can handle production load and scale to support growth by:
- Executing load tests following GitLab performance testing guidelines
- Establishing baseline performance metrics (throughput, latency, resource utilization)
- Identifying performance bottlenecks and capacity limits
- Documenting production readiness findings
Proposal
- Define test scenarios based on expected production usage patterns:
  - Normal load conditions
  - Peak load scenarios
  - Growth projections (e.g., 2x, 5x current expected load)
- Set up the load testing environment following GitLab guidelines:
  - Configure realistic test data
  - Set up monitoring and observability
  - Prepare load generation tools
- Execute load tests, measuring:
  - Request throughput (requests/second)
  - Response latency (p50, p95, p99)
  - Resource utilization (CPU, memory, I/O)
  - Error rates under load
- Analyze the results and document:
  - Performance baselines
  - Identified bottlenecks
  - Capacity limits
  - Recommendations for optimization or scaling
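The throughput, latency, and error-rate measurements listed above can be reduced from raw per-request samples with a small helper. The sketch below is hypothetical: it assumes the load generator records each request's latency in milliseconds and whether it succeeded, and it uses the nearest-rank percentile method. The names `RequestSample`, `percentile`, and `summarize_run` are illustrative and not part of any GitLab tool.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One request as recorded by the load generator (hypothetical schema)."""
    latency_ms: float
    ok: bool  # False for errors, timeouts, and non-success responses


def percentile(sorted_values: list[float], p: int) -> float:
    """Nearest-rank percentile of an ascending, non-empty list (0 < p <= 100)."""
    if not sorted_values:
        raise ValueError("no samples")
    # Multiplying before dividing keeps the rank exact for whole-number p.
    rank = math.ceil(p * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize_run(samples: list[RequestSample], duration_s: float) -> dict[str, float]:
    """Reduce one load-test run to throughput, latency percentiles, and error rate."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # Failed requests are kept in the latency distribution so slow errors stay visible.
    latencies = sorted(s.latency_ms for s in samples)
    errors = sum(1 for s in samples if not s.ok)
    return {
        "throughput_rps": len(samples) / duration_s,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_rate": errors / len(samples),
    }
```

Running it once per scenario yields one summary per load level for the analysis step. Nearest-rank was chosen because it always reports a latency that was actually observed; a monitoring stack that computes percentiles from histogram buckets may report slightly different values.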
Exit Criteria
- Load test scenarios defined based on production requirements
- Load testing environment configured with monitoring
- Load tests executed for normal, peak, and growth scenarios
- Performance metrics documented (throughput, latency, resource usage)
- Bottlenecks and capacity limits identified and documented
- Production readiness assessment completed with recommendations