Pod capacity drops during deployments
During deployments at times of higher load, our Apdex score takes a hit. Both ServiceAPI and ServiceWeb appear to be negatively impacted.
Sometimes the drop is severe enough to alert the EOC.
Use this issue to investigate why this happens. Ideally, the deployment should be tuned so that new Pods roll over cleanly without any negative impact on this metric.
Questions
- Is the application suffering when serving its first few requests?
- Are the Kubernetes deployments poorly tuned?
- Are the metrics capturing incorrect data?
- Is there an imbalance of traffic?
- ...
Results of this Issue
The cause has been found. The Deployment objects for some of our workloads (container registry, webservice) contain the spec.replicas definition. Because that field is applied on every deploy and configuration change, Kubernetes resets the replica count to the value in the chart, overriding whatever count the Deployment had been scaled to. This in turn removes a set of Pods from service whenever the Deployment changes.
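To make the mechanism concrete, here is a deliberately simplified model in Python. It assumes that an applied manifest which declares spec.replicas always wins over the live replica count, and that omitting the field leaves the live count alone; it ignores the details of kubectl/Helm merge semantics. The function names and the Pod counts (10 live, 2 declared) are hypothetical and only illustrate the arithmetic.

```python
def apply_manifest(live_replicas: int, manifest: dict) -> int:
    """Replica count after applying a Deployment manifest (simplified model).

    If the manifest declares spec.replicas, the applied value wins over the
    live count (e.g. one an autoscaler set). If the field is absent, the live
    count is left untouched.
    """
    declared = manifest.get("spec", {}).get("replicas")
    return live_replicas if declared is None else declared


def pods_removed(live_replicas: int, manifest: dict) -> int:
    """How many running Pods the apply scales away (never negative)."""
    return max(0, live_replicas - apply_manifest(live_replicas, manifest))


if __name__ == "__main__":
    # Hypothetical numbers: the Deployment has been scaled to 10 Pods,
    # while the chart hard-codes replicas: 2.
    with_field = {"kind": "Deployment", "spec": {"replicas": 2}}
    without_field = {"kind": "Deployment", "spec": {}}
    print(pods_removed(10, with_field))     # 8
    print(pods_removed(10, without_field))  # 0
```

Under this model, every deploy during a busy period momentarily drops the Deployment back to the chart's count, and the remaining Pods absorb the load until it is scaled back up, which matches the Apdex dips described above.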
The fun investigative conversation can be found in this thread: #1992 (comment 674659356)
After modifying our chart to remove the spec.replicas counts, subsequent deploys no longer show this issue; the results can be found in this thread: #1992 (comment 678236039)
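The fix itself lived in the chart templates. As an illustration of the same idea, here is a minimal Python sketch of a hypothetical post-render step that drops spec.replicas from any Deployment targeted by a HorizontalPodAutoscaler, so the autoscaler rather than the chart owns the count. It assumes the rendered manifests are already parsed into dicts (for example by a YAML loader), that the autoscaler is an HPA, and it ignores namespaces; the helper name and the resource names in the usage example are hypothetical.

```python
import copy


def strip_replicas_for_hpa_targets(docs: list[dict]) -> list[dict]:
    """Return copies of rendered manifests with spec.replicas removed from
    every Deployment that a HorizontalPodAutoscaler targets."""
    # Collect the names of Deployments referenced by an HPA's scaleTargetRef.
    targets = set()
    for d in docs:
        if d.get("kind") != "HorizontalPodAutoscaler":
            continue
        ref = d.get("spec", {}).get("scaleTargetRef", {})
        if ref.get("kind") == "Deployment":
            targets.add(ref.get("name"))

    out = []
    for d in docs:
        d = copy.deepcopy(d)  # never mutate the caller's manifests
        name = d.get("metadata", {}).get("name")
        if d.get("kind") == "Deployment" and name in targets:
            # Let the HPA own the replica count instead of the chart.
            d.get("spec", {}).pop("replicas", None)
        out.append(d)
    return out


if __name__ == "__main__":
    rendered = [
        {"kind": "Deployment", "metadata": {"name": "webservice"},
         "spec": {"replicas": 2}},
        {"kind": "Deployment", "metadata": {"name": "hypothetical-worker"},
         "spec": {"replicas": 3}},
        {"kind": "HorizontalPodAutoscaler", "metadata": {"name": "webservice"},
         "spec": {"scaleTargetRef": {"apiVersion": "apps/v1",
                                     "kind": "Deployment",
                                     "name": "webservice"}}},
    ]
    for doc in strip_replicas_for_hpa_targets(rendered):
        print(doc["kind"], doc["metadata"]["name"], doc["spec"].get("replicas"))
```

Deployments without an HPA keep their declared count, since nothing else would set it for them.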