ERROR: error while generating S3 pre-signed URL
Summary
When using IAM Roles for Pods in EKS (v1.21), the S3 cache fails with the following error:
ERROR: error while generating S3 pre-signed URL
I reviewed #16097 (closed), #27152 (closed), #28085 (closed), and #28099 before filing this bug report.
Steps to reproduce
The runner manager is deployed successfully to a Fargate pod, in its own namespace on the EKS cluster, via the Runner Helm chart (v0.37.2). An IAM role with permissions to the designated S3 bucket is created and assigned via a ServiceAccount. Worker pods initialize properly on a NodeGroup of EC2 instances.
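For reference, the IAM role is attached to the runner's ServiceAccount with the standard IRSA annotation, roughly like this (a minimal sketch; the name and namespace are taken from the trust policy below, and the role ARN is redacted):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-runner-service
  namespace: gitlab
  annotations:
    # IRSA annotation pointing at the role whose trust policy is shown below
    eks.amazonaws.com/role-arn: arn:aws:iam::<REDACTED>:role/<REDACTED>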
.gitlab-ci.yml
cache:
  key: $CI_COMMIT_REF_SLUG
  untracked: false
  policy: pull-push
  paths:
    - workspace/node_modules
    - workspace/automation/.terraform
Actual behavior
ERROR: error while generating S3 pre-signed URL
Expected behavior
S3 cache is properly initialized and used.
Relevant logs and/or screenshots
job log
Restoring cache
Checking cache for <REDACTED>...
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
Successfully extracted cache
Environment description
IAM Role Trust Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ServiceAccount",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<REDACTED>:oidc-provider/oidc.eks.<REDACTED>.amazonaws.com/id/<REDACTED>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<REDACTED>.amazonaws.com/id/<REDACTED>:sub": "system:serviceaccount:gitlab:gitlab-runner-service"
        }
      }
    }
  ]
}
IAM Role Permissions Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3",
      "Effect": "Allow",
      "Action": [
        "s3:*Object*",
        "s3:*MultipartUpload*"
      ],
      "Resource": "arn:aws:s3:::<REDACTED>/*"
    },
    {
      "Sid": "KMSList",
      "Effect": "Allow",
      "Action": "kms:ListKeys",
      "Resource": "*"
    },
    {
      "Sid": "KMS",
      "Effect": "Allow",
      "Action": [
        "kms:GetPublicKey",
        "kms:GenerateDataKey",
        "kms:Encrypt",
        "kms:DescribeKey",
        "kms:Decrypt"
      ],
      "Resource": "arn:aws:kms:<REDACTED>:<REDACTED>:key/<REDACTED>"
    }
  ]
}
NOTE: We're using Terraform to deploy the Helm chart.
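Since the chart is driven from Terraform, the config.toml below is rendered inside the Terraform configuration, which is why it contains ${...} interpolations. A minimal sketch of what that deployment looks like, assuming the chart's runners.config value and illustrative resource, variable, and local names:

# Sketch only: names, variables, and locals are illustrative, not our exact module.
# The rendered TOML (the config.toml shown below) is passed to the chart via
# the runners.config value.
resource "helm_release" "gitlab_runner" {
  name       = "gitlab-runner"
  repository = "https://charts.gitlab.io"
  chart      = "gitlab-runner"
  version    = "0.37.2"
  namespace  = kubernetes_namespace.gitlab.metadata[0].name

  values = [
    yamlencode({
      gitlabUrl               = var.gitlab_url
      runnerRegistrationToken = var.runner_registration_token
      runners = {
        config = local.runner_config_toml # the TOML below, with ${...} resolved
      }
    })
  ]
}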
config.toml contents
[session_server]
  session_timeout = 1800

[[runners]]
  [runners.cache]
    Path = "${local.env_name}-${local.region_code}"
    Shared = true
    Type = "s3"
    [runners.cache.s3]
      BucketName = "${aws_s3_bucket.runner_cache.id}"
      BucketRegion = "${var.aws_region}"
      Insecure = false
      # AuthenticationType = "iam"
  [runners.kubernetes]
    image = "alpine:latest"
    privileged = true
    pull_policy = "always"
    namespace = "${kubernetes_namespace.project.metadata[0].name}"
    service_account = "${kubernetes_service_account.cache_role.metadata[0].name}" # Service account for the jobs/executors
    [runners.kubernetes.node_selector]
      large_nodes = "true"
    [runners.kubernetes.node_tolerations]
      "large_nodes=true" = "NoSchedule"
    [runners.kubernetes.pod_labels]
      type = "job"
    [runners.kubernetes.pod_annotations]
      "iam.amazonaws.com/role" = "${aws_iam_role.service.arn}"
    [[runners.kubernetes.volumes.host_path]]
      name = "docker"
      mount_path = "/var/run/docker.sock"
      host_path = "/var/run/docker.sock"
Used GitLab Runner version
alpine-v14.7.0
Possible fixes
N/A