autoscaling gitlab-shell causes dropped connections
Summary
Noticed during load testing of gitlab-shell today.
I was attempting to clone a copy of the linux kernel, and I kept getting an EOF about halfway through the clone. Looking at the logs on the node the pods were running on, I could see that the EOFs coincided with the ingresses reloading, and that the reloads were triggered by more gitlab-shell pods coming up.
Setting shell's min and max pods to the same value in the HPA made the problem go away.
Still need to investigate whether this happens after any nginx ingress reload event: check whether ssh gets an EOF when an unrelated ingress is added, or whether it only happens for changes to the nginx tcp mapping for shell.
Hopefully it only happens when shell scales. For now we have to remove the auto-scaling for shell, with a follow-up issue to see if we can find the root cause.
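For the reload check described above, a small probe might be easier to repeat than a full kernel clone. This is only a rough sketch, not something I have run against the cluster: it holds a plain TCP connection open to the ssh port and reports whether the far end closed it. The function name `watch_connection` and its return values are made up for illustration. It also assumes the ssh daemon keeps an unauthenticated connection open for longer than the watch window, which depends on its login grace setting.

```python
import socket
import time


def watch_connection(host, port, duration=60.0):
    """Hold a TCP connection open and report how it ended.

    Returns "eof" if the peer closed the connection cleanly,
    "reset" if it was reset, and "alive" if it was still open
    when the watch window ran out.
    """
    deadline = time.monotonic() + duration
    with socket.create_connection((host, port), timeout=5.0) as sock:
        # Short read timeout so the deadline is re-checked regularly.
        sock.settimeout(1.0)
        while time.monotonic() < deadline:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                continue  # nothing received, connection still open
            except ConnectionResetError:
                return "reset"
            if data == b"":
                return "eof"  # peer closed its side of the connection
            # Any bytes received (e.g. the ssh banner) are ignored.
    return "alive"
```

The idea would be to start the probe against the ssh endpoint, add an unrelated ingress while it is running, and see whether it comes back with "eof" or "alive". Then repeat while scaling shell to compare.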