CHALLENGE: Why did helm remove the Service loadBalancerIP configuration?
Recently the staging auto-deploy became blocked because the diff job detected an unwanted change. Reference: https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/jobs/5069316
Change in question:
gitlab, gitlab-gitlab-shell, Service (v1) has changed:
...
    annotations:
      cloud.google.com/load-balancer-type: Internal
  spec:
    type: LoadBalancer
    ports:
      - port: 2222
        targetPort: 2222
        protocol: TCP
        name: ssh
    externalTrafficPolicy: Cluster
+   loadBalancerIP: 10.224.46.6
    selector:
      app: gitlab-shell
      release: gitlab
So where did this come from? There has been no work related to service IPs on GitLab-Shell, and no deploys or configuration rollouts between the last auto-deploy and this new one. If we go back and look at the last auto-deploy that was rolled out, everything looks normal. Reference: https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/jobs/5068570
Nowhere in that job output did helm report that we removed the IP address from this service.
BUT
We did.
If you go into that cluster and run a helm diff between the two revisions, you can clearly see that this did actually happen.
% helm history gitlab
% helm diff revision gitlab 1541 1542 > diff.yaml
And if you look at the output, sure enough:
 227 ^[[0;33mgitlab, gitlab-gitlab-shell, Service (v1) has changed:^[[0m
 228   # Source: gitlab/charts/gitlab/charts/gitlab-shell/templates/service.yaml
 229   apiVersion: v1
 230   kind: Service
 231   metadata:
 232     name: gitlab-gitlab-shell
 233     namespace: gitlab
 234     labels:
 235       app: gitlab-shell
 236       chart: gitlab-shell-5.3.2
 237       release: gitlab
 238       heritage: Helm
 239 ······
 240       deployment: "gitlab-shell"
 241       shard: "default"
 242       stage: "main"
 243       tier: "sv"
 244       type: "git"
 245 ······
 246     annotations:
 247       cloud.google.com/load-balancer-type: Internal
 248   spec:
 249     type: LoadBalancer
 250     ports:
 251       - port: 2222
 252         targetPort: 2222
 253         protocol: TCP
 254         name: ssh
 255     externalTrafficPolicy: Cluster
 256 ^[[0;31m-   loadBalancerIP: 10.224.46.6^[[0m
 257     selector:
 258       app: gitlab-shell
 259       release: gitlab
Aside from the shell color escape codes, we definitely see the removal of that IP address. This raises two questions:
- Why was this NOT seen on the CI job that pushed out this change? Helm templates this out prior to performing the actual release, so we SHOULD have seen this inside of CI.
- What is the appropriate way to fix this, SHOULD THIS HAPPEN in production?
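The color codes in `diff.yaml` end up in the file because the colored output was redirected as-is. A minimal sketch for making such a file easier to scan, assuming the codes are standard ANSI color sequences (ESC, `[`, digits and semicolons, `m`). The names `strip_ansi` and `changed_lines` are hypothetical helpers, not part of helm or the diff plugin:

```shell
# Hypothetical helper: remove ANSI color sequences from stdin.
strip_ansi() {
  esc=$(printf '\033')              # literal ESC byte, displayed as ^[ by some viewers
  sed "s/${esc}\[[0-9;]*m//g"
}

# Hypothetical helper: keep only the "has changed" headers and the
# added (+) or removed (-) lines of a diff read from stdin.
changed_lines() {
  strip_ansi | grep -E '^[+-]|has changed:'
}

# Example usage against the file produced earlier:
#   changed_lines < diff.yaml
```

Unchanged lines in the diff are indented, so YAML list items such as `- port: 2222` are not mistaken for removals.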
In staging, since it's a minor environment, we performed a rollback:
% helm rollback -n gitlab gitlab 1541
This won't be a safe operation in production.
Note that remediation by manually editing the service object is NOT going to work in this situation. The helm release data won't have this information; therefore, when helm performs the diff, the change will continue to show up. We attempted this and auto-deploy remained blocked.
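One way to confirm this mismatch is to compare the manifest helm recorded for each revision with the live object. The sketch below assumes the release `gitlab` in namespace `gitlab` and the revisions 1541 and 1542 mentioned above. `lb_ip_in_manifest` is a hypothetical helper that only does a text match and does not parse YAML, so it prints the field for every Service in the manifest:

```shell
# Hypothetical helper: print any loadBalancerIP values found in a manifest on stdin.
# Plain text match, not a YAML parser.
lb_ip_in_manifest() {
  sed -n 's/^[[:space:]]*loadBalancerIP:[[:space:]]*//p'
}

# Manifest helm stored for each revision (requires cluster access):
#   helm get manifest gitlab -n gitlab --revision 1541 | lb_ip_in_manifest
#   helm get manifest gitlab -n gitlab --revision 1542 | lb_ip_in_manifest
#
# Value on the live Service object, which is what a manual edit changes:
#   kubectl get service gitlab-gitlab-shell -n gitlab -o jsonpath='{.spec.loadBalancerIP}'
```

If the stored manifest for the latest revision lacks the field while the live object has it, that would be consistent with the diff continuing to show the addition after a manual edit.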
Use this issue to track two things:
- Determine how this could have happened
- Figure out remediation steps that would be safe to apply to production environments
  - Document those steps