[Controller] removeInitContainerEnvVar algorithm causes runtime error: index out of range panic

Summary

Controller gets into a crash loop due to 'index out of range panic'.

Steps to reproduce

Deploy with configuration specified below.

Configuration used

apiVersion: apps.gitlab.com/v1beta1
kind: GitLab
metadata:
  name: gitlab-acme
  namespace: gitlab-system
spec:
  chart:
    version: "5.7.0" # select a version from the CHART_VERSIONS file in the root of this project
    values:
      gitlab:
        toolbox:
          backups:
            objectStorage:
              config:
                secret: storage-config
                key: config
      postgresql:
        install: false
      redis:
        install: false
      registry:
        enabled: false
      nginx-ingress:
        controller:
          config:
            # pass the X-Forwarded-* headers directly from the upstream
            use-forwarded-headers: "true"
          service:
            annotations:
              # Layer 7, injects X-Forwarded-* headers
              service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
              service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"
              # Configure ACM certifiates
              service.beta.kubernetes.io/aws-load-balancer-ssl-cert: ...
              # Configure which ports are to terminate SSL.
              service.beta.kubernetes.io/aws-load-balancer-ssl-ports: https
              service.beta.kubernetes.io/aws-load-balancer-internal: "true"
            targetPorts:
              https: http # the ELB will send HTTP to 443
      global:
        appConfig:
          lfs:
            bucket: acme-gitlab
            connection:
              secret: storage-config
              key: config
          artifacts:
            bucket: acme-gitlab
            connection:
              secret: storage-config
              key: config
          uploads:
            bucket: acme-gitlab
            connection:
              secret: storage-config
              key: config
          packages:
            bucket: acme-gitlab
            connection:
              secret: storage-config
              key: config
          backups:
            bucket: acme-gitlab
            tmpBucket: acme-gitlab
            connection:
              secret: storage-config
              key: connection
        hosts:
          domain: acme.com # use a real domain here
        ingress:
          tls:
            enabled: false
        minio:
          enabled: false
        psql:
          host: <host>
          username: gitlab
          password:
            secret: gitlab-postgresql-password
            key: postgres-password
        redis:
          host: my-redis-master
          password:
            secret: my-redis
            key: redis-password
      certmanager-issuer:
        email: e@mail.com # use your real email address here

Current behavior

Controller is in a crash loop due to index out of range panic.

Expected behavior

Controller does not crash.

Versions

  • Operator: f4e17887
  • Platform:
    • Cloud: EKS
  • Kubernetes: (kubectl version)
    • Client: v1.22.4
    • Server:v1.21.5-eks-bc4871b

Relevant logs

panic: runtime error: index out of range [37] with length 37

goroutine 425 [running]:
gitlab.com/gitlab-org/cloud-native/gitlab-operator/controllers.removeInitContainerEnvVar(0xc001b74000, 0x202cdd0, 0xc, 0x2039eb5, 0x15)
	/workspace/controllers/upgrade.go:182 +0x4d5
gitlab.com/gitlab-org/cloud-native/gitlab-operator/controllers.(*GitLabReconciler).rollingUpdateDeployments(0xc00065d400, 0x22e3128, 0xc000648690, 0x22fb5f8, 0xc000648e70, 0xc000a6e950, 0x1, 0x1, 0x0, 0xc0002f5680)
	/workspace/controllers/upgrade.go:80 +0x1a6
gitlab.com/gitlab-org/cloud-native/gitlab-operator/controllers.(*GitLabReconciler).rollingUpdateWebserviceDeployments(0xc00065d400, 0x22e3128, 0xc000648690, 0x22fb5f8, 0xc000648e70, 0x0, 0xc000756780)
	/workspace/controllers/upgrade.go:91 +0x93
gitlab.com/gitlab-org/cloud-native/gitlab-operator/controllers.(*GitLabReconciler).rollingUpdateWebserviceAndSidekiqIfEnabled(0xc00065d400, 0x22e3128, 0xc000648690, 0x22fb5f8, 0xc000648e70, 0x22f02a8, 0xc00013a690, 0x0, 0x0)
	/workspace/controllers/upgrade.go:162 +0x191

The problem is in algorithm while iterating through an array and removing it's elements without decreasing the counter. Probable fix:

diff --git a/controllers/upgrade.go b/controllers/upgrade.go
index e665334..6ef8582 100644
--- a/controllers/upgrade.go
+++ b/controllers/upgrade.go
@@ -179,8 +179,9 @@ func removeInitContainerEnvVar(deployment *appsv1.Deployment, initContainerName,
        for i := range deployment.Spec.Template.Spec.InitContainers {
                if deployment.Spec.Template.Spec.InitContainers[i].Name == initContainerName {
                        for j := range deployment.Spec.Template.Spec.InitContainers[i].Env {
-                               if deployment.Spec.Template.Spec.InitContainers[i].Env[j].Name == envVarName {
+                               if (j < len(deployment.Spec.Template.Spec.InitContainers[i].Env)) && (deployment.Spec.Template.Spec.InitContainers[i].Env[j].Name == envVarName) {
                                        deployment.Spec.Template.Spec.InitContainers[i].Env = removeEnvVar(deployment.Spec.Template.Spec.InitContainers[i].Env, j)
+                                       j = j - 1
                                }
                        }
                }
Edited by Mitchell Nielsen