Testing gitlab-sshd in staging

Problem Statement

Twice now we've attempted to roll out gitlab-sshd and have been forced to roll back. The first attempt was rolled back after extremely high memory consumption led to failed Pods. In the second attempt, only canary was taking a small fraction of traffic, but the number of Context cancelled errors was abnormally high.

It's clear at this point that the testing done inside gitlab-shell has been insufficient. While it is widely known that staging differs from production in various ways, we should be able to discover issues like these in staging ahead of our next rollout attempt.

Milestones

Link to the existing testing strategies that were used, so they can be reviewed
Discuss whether the testing performed so far is sufficient
Discuss additional testing strategies that could demonstrate that we've covered the potential failure scenarios
Document or create issues for actionable items