Validate topology service performance under production load conditions

Overview

The topology service is a critical component of the Cells infrastructure that needs to handle production-scale load and support future growth. Before deploying to production, we need to validate that the service can meet performance requirements under realistic load conditions.

This issue tracks the execution of comprehensive load testing following GitLab's performance guidelines to ensure production readiness.

Related Epic: &4 (closed)

Goal

Validate that the topology service can handle production load and scale to support growth by:

  • Executing load tests following GitLab performance testing guidelines
  • Establishing baseline performance metrics (throughput, latency, resource utilization)
  • Identifying performance bottlenecks and capacity limits
  • Documenting production readiness findings

Proposal

  1. Define test scenarios based on expected production usage patterns:

    • Normal load conditions
    • Peak load scenarios
    • Growth projections (e.g., 2x, 5x current expected load)
  2. Set up load testing environment following GitLab guidelines:

    • Configure realistic test data
    • Set up monitoring and observability
    • Prepare load generation tools
  3. Execute load tests measuring:

    • Request throughput (requests/second)
    • Response latency (p50, p95, p99)
    • Resource utilization (CPU, memory, I/O)
    • Error rates under load
  4. Analyze results and document:

    • Performance baselines
    • Identified bottlenecks
    • Capacity limits
    • Recommendations for optimization or scaling
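
As a starting point for steps 1 and 3, the sketch below shows one way to derive target rates for the growth scenarios and to reduce a single test run to the metrics listed above. It is illustrative only: the names `scenario_targets`, `summarize`, and `LoadSummary` are hypothetical, the baseline rate is a placeholder to be replaced with the expected production figure, and the nearest-rank percentile method is an assumption rather than anything prescribed by the GitLab guidelines. Resource utilization is expected to come from monitoring and is not covered here.

```python
import math
from dataclasses import dataclass


@dataclass
class LoadSummary:
    """Metrics for one load test run (hypothetical structure)."""
    throughput_rps: float
    error_rate: float
    p50_ms: float
    p95_ms: float
    p99_ms: float


def scenario_targets(baseline_rps, multipliers=(1, 2, 5)):
    """Map each growth multiplier to a target request rate.

    baseline_rps is an assumed placeholder for the expected production load.
    """
    return {f"{m}x": baseline_rps * m for m in multipliers}


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty, ascending-sorted list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms, errors, duration_s):
    """Summarize one run.

    latencies_ms holds the latency of each successful request,
    errors is the number of failed requests, and duration_s is
    the wall-clock length of the run in seconds.
    """
    if not latencies_ms or duration_s <= 0:
        raise ValueError("need at least one latency sample and a positive duration")
    ordered = sorted(latencies_ms)
    total = len(ordered) + errors
    return LoadSummary(
        throughput_rps=total / duration_s,
        error_rate=errors / total,
        p50_ms=percentile(ordered, 50),
        p95_ms=percentile(ordered, 95),
        p99_ms=percentile(ordered, 99),
    )
```

Recording one `LoadSummary` per scenario (normal, peak, and each growth multiplier) would give directly comparable baselines for the analysis in step 4.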

Exit Criteria

  • Load test scenarios defined based on production requirements
  • Load testing environment configured with monitoring
  • Load tests executed for normal, peak, and growth scenarios
  • Performance metrics documented (throughput, latency, resource usage)
  • Bottlenecks and capacity limits identified and documented
  • Production readiness assessment completed with recommendations
