S3 Cache with AssumeRoleWithWebIdentity fails with incorrect STS URL in AWS Beijing region

Summary

We run GitLab Runner on EKS in the EU, US, and China AWS regions with S3 caching. Everything works correctly in the EU and US regions, but in China we get the following error:

ERROR: error while generating S3 pre-signed URL     error=Post https://sts.cn-north-1.amazonaws.com?Action=AssumeRoleWithWebIdentity&RoleArn=arn%3Aaws-cn%3Aiam%3A%3A<AWS ACCT NUMBER REDACTED>%3Arole%2Feks-ops-cn-gl-runners-secrets-csi-driver&RoleSessionName=1625589185585289293&Version=2011-06-15&WebIdentityToken=   <REDACTED>: dial tcp: lookup sts.cn-north-1.amazonaws.com   on 172.20.0.10:53: no such host

As the error shows, the generated STS URL is https://sts.cn-north-1.amazonaws.com, but for the China regions it should be https://sts.cn-north-1.amazonaws.com.cn (note the .cn suffix).
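
To make the expected behavior concrete, here is a minimal sketch of how a regional STS endpoint could be derived from the region name. The helper `sts_endpoint` and both constants are hypothetical names used for illustration only; this is not GitLab Runner's or the AWS SDK's actual code, and it only distinguishes the standard and China partitions.

```python
# Hypothetical helper for illustration; not GitLab Runner or AWS SDK code.
# Assumption: only the standard ("aws") and China ("aws-cn") partitions matter here.
CHINA_DNS_SUFFIX = "amazonaws.com.cn"
DEFAULT_DNS_SUFFIX = "amazonaws.com"


def sts_endpoint(region: str) -> str:
    """Return the regional STS endpoint URL for the given region name."""
    # China regions (cn-north-1, cn-northwest-1) use a different DNS suffix.
    suffix = CHINA_DNS_SUFFIX if region.startswith("cn-") else DEFAULT_DNS_SUFFIX
    return f"https://sts.{region}.{suffix}"


if __name__ == "__main__":
    for region in ("eu-west-1", "us-east-1", "cn-north-1"):
        print(region, "->", sts_endpoint(region))
```

The URL in the error above corresponds to applying the default suffix to cn-north-1, which is why the DNS lookup fails.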

We have worked with AWS Support and, after running some tests, they believe the issue lies in the GitLab Runner pod.

Steps to reproduce

Set up the S3 cache details in the Kubernetes chart (truncated to show only the relevant section):

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - command:
        - /bin/bash
        - /scripts/entrypoint
        env:
        - name: CACHE_TYPE
          value: s3
        - name: CACHE_PATH
          value: gitlab-caches
        - name: CACHE_SHARED
          value: "true"
        - name: CACHE_S3_SERVER_ADDRESS
          value: s3.amazonaws.com.cn
        - name: CACHE_S3_BUCKET_NAME
          value: gitlab-cache
        - name: CACHE_S3_BUCKET_LOCATION
          value: cn-north-1

The GitLab Runner service account is annotated with:

apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws-cn:iam::111111111111:role/pod-role
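
When reproducing, it can help to confirm from inside the runner pod that the annotation took effect. The sketch below assumes the EKS pod identity webhook has injected the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` environment variables. The function names `missing_irsa_vars` and `role_partition` are hypothetical and not part of any library.

```python
import os

# Assumption: these are the variables injected by the EKS pod identity webhook
# for a service account annotated with eks.amazonaws.com/role-arn.
EXPECTED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_irsa_vars(env=None):
    """Return the expected variables that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


def role_partition(role_arn: str) -> str:
    """Return the partition field of an ARN, e.g. 'aws-cn' for China regions."""
    parts = role_arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {role_arn!r}")
    return parts[1]


if __name__ == "__main__":
    missing = missing_irsa_vars()
    if missing:
        print("missing variables:", ", ".join(missing))
    else:
        print("role partition:", role_partition(os.environ["AWS_ROLE_ARN"]))
```

A partition of `aws-cn` in the role ARN indicates that the STS endpoint should use the amazonaws.com.cn suffix.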

.gitlab-ci.yml:

#####
# Global
#####
cache: &global_cache
    key: "${CI_COMMIT_REF_SLUG}-${CI_COMMIT_SHA}"
    paths:
      - ${TF_DIR}/.terraform/modules
      - ${TF_DIR}/.terraform/plugins

Actual behavior

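The cache fails because the generated STS URL is incorrect for the China regions: it uses the amazonaws.com suffix instead of amazonaws.com.cn, so the DNS lookup fails.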
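
The DNS failure can be checked independently of GitLab Runner. The following is a rough sketch, meant to be run from a pod in the affected cluster, that tests whether each candidate hostname resolves. The function name `resolves` and its injectable `resolver` parameter are illustrative choices, not an existing API.

```python
import socket


def resolves(hostname: str, resolver=socket.getaddrinfo) -> bool:
    """Return True if hostname resolves, False if the DNS lookup fails."""
    try:
        resolver(hostname, 443)
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    candidates = (
        "sts.cn-north-1.amazonaws.com",     # hostname from the error message
        "sts.cn-north-1.amazonaws.com.cn",  # hostname expected for China
    )
    for host in candidates:
        print(host, "resolves" if resolves(host) else "does not resolve")
```

The `resolver` parameter exists only so the check can be exercised without network access; by default it uses the standard library's `socket.getaddrinfo`.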

Expected behavior

Caching should succeed, with the runner connecting to the S3 bucket successfully.

Environment description

We run custom shared group runners for GitLab.com, hosted on AWS EKS 1.20.

Used GitLab Runner version

Running with gitlab-runner 14.0.1 (c1edb478)
  on gitlab-runner-gitlab-runner-6db4c58df5-999h9 39LNSLEh