ERROR: error while generating S3 pre-signed URL error=Not authorized to perform sts:AssumeRoleWithWebIdentity after upgrading Helm chart

Status Update 2023-08-23

After investigation, we concluded that the documentation should be updated to show how to use IRSA with the runner Helm chart.

Related comment: gitlab-org/gitlab-runner#36788 (comment 1525675743)


I recently upgraded a set of runners running in EKS from gitlab-runner 14.6.0 to 16.2.0. This upgrade consisted of changing the image and helper_image to 16.2.0, for example:

gitlab-runner:
  image: ptt-docker-dev.artifactory.mycorp.com/gitlab/gitlab-runner:alpine-v16.2.0

  runners:
    config: |
      [[runners]]
        [runners.kubernetes]
          helper_image = "ptt-docker-dev.artifactory.mycorp.com/gitlab/gitlab-runner-helper:x86_64-v16.2.0"

These runners use AWS IRSA (IAM Roles for Service Accounts) to get permission to read and write the S3 bucket that we use for S3-based distributed CI/CD caching. After this upgrade of the runner and helper images, the S3-based distributed caching still worked fine. This is how the S3 distributed caching with IRSA is configured:

gitlab-runner:
  image: ptt-docker-dev.artifactory.mycorp.com/gitlab/gitlab-runner:alpine-v16.2.0

  rbac:
    create: true
    serviceAccountAnnotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/app/123456789012-s3-ifi-bazel-runner-sa-role

  runners:
    config: |
      [[runners]]
        [runners.kubernetes]
          namespace = "{{.Release.Namespace}}"
          helper_image = "ptt-docker-dev.artifactory.mycorp.com/gitlab/gitlab-runner-helper:x86_64-v16.2.0"

        [runners.cache]
          Type = "s3"
          Path = "runner-cache"
          Shared = true
          [runners.cache.s3]
            ServerAddress = "s3.amazonaws.com"
            BucketName = "123456789012-s3-ifi-bazel-runner"
            BucketLocation = "us-east-1"
            Insecure = false
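
For context, the rbac settings above rely on the chart rendering an annotated ServiceAccount for the runner manager pod. A minimal sketch of what that object should roughly look like follows. The name `my-release-gitlab-runner` and the namespace `gitlab-runner` are hypothetical placeholders, because the real values depend on the release name and the chart's templates. Only the annotation is taken from the values above.

```yaml
# Sketch only: metadata.name and metadata.namespace are hypothetical placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-release-gitlab-runner   # assumption: actual name depends on the release name and chart templates
  namespace: gitlab-runner         # assumption: the namespace the release is installed into
  annotations:
    # Copied from rbac.serviceAccountAnnotations in the values above
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/app/123456789012-s3-ifi-bazel-runner-sa-role
```

IRSA only works if the runner manager pod actually runs under this ServiceAccount, and the IAM role's trust policy is typically scoped to the matching `system:serviceaccount:<namespace>:<name>` subject for sts:AssumeRoleWithWebIdentity. A change in the ServiceAccount name or its annotation between chart versions would therefore be one thing to check.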

Then I upgraded the same runners' Helm chart to 0.55.0, the chart version that corresponds to runner 16.2.0, and CI jobs with caching started displaying:

No URL provided, cache will not be uploaded to shared cache server. Cache will be stored only locally.

In the runner log, the error was:

ERROR: error while generating S3 pre-signed URL error=Not authorized to perform sts:AssumeRoleWithWebIdentity

I checked the sample values.yml for 0.55.0. It looks like there is a new runners.cache setting, and there are examples of using a K8s Secret to store an AWS access key and secret key, but nothing on how to use IRSA or even good old instance profiles. The runner config docs do mention IRSA, but the Helm docs do not.

I tried adding an empty runners.cache object, similar to the example in the sample values.yml, but that didn't work. This is what I added:

  runners:
    cache: {}