FF_KUBERNETES_HONOR_ENTRYPOINT feature not working
Summary
Enabling FF_KUBERNETES_HONOR_ENTRYPOINT in a job causes the job to fail immediately.
Steps to reproduce
Whenever we set this flag, either in a job or in the runner's environment variables, the job fails. The tf_deploy image has an entrypoint set in its Dockerfile, like so: ENTRYPOINT ["/usr/local/bin/tf_deploy.sh"]
.gitlab-ci.yml
# deploy the all/tooling target
deploy_all_tooling:
  extends: .tf_deploy_template
  tags:
    - tooling-sandbox-infra-runner
  environment:
    name: all/tooling
  resource_group: all_tooling
  variables:
    TARGET: all/tooling
    GIT_SUBMODULE_STRATEGY: normal
    EKS_CLUSTER: tooling-sandbox-infra
    FF_KUBERNETES_HONOR_ENTRYPOINT: "true"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never
    - if: $CI_SERVER_HOST != "gitlab.login.gov"
      when: never
    - if: $CI_PIPELINE_SOURCE == "schedule"
      when: never
    # XXX change this to main when we are done
    - if: $CI_COMMIT_BRANCH == "tspencer/non_idp_terraform_environments"
      when: always
  script:
    # XXX We want to turn on FF_KUBERNETES_HONOR_ENTRYPOINT in the runner. See infra-runner-values.yaml
    - /usr/local/bin/tf_deploy.sh
# deploy job template for tf-deploy
.tf_deploy_template:
  image:
    name: $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/cd/tf_deploy/blessed@$TF_DEPLOY_IMAGE_DIGEST
  stage: deploy
  artifacts:
    name: "$CI_ENVIRONMENT_NAME-$CI_COMMIT_SHA"
    paths:
      - terraform.plan
      - plan.txt
    expire_in: 1 year
    reports:
      terraform: plan.json
  script:
    - echo "Yay deploys!" ; exit 1
Actual behavior
The build container appears to die immediately:
Running with gitlab-runner 16.5.0 (853330f9)
on tooling-sandbox-infra-gitlab-infra-runner-gitlab-runner-6d5g6tf 62zGgYi6, system ID: r_ZZqSDxQWTouy
feature flags: FF_KUBERNETES_HONOR_ENTRYPOINT:true
Resolving secrets
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image [MASKED].dkr.ecr.[MASKED].amazonaws.com/cd/tf_deploy/blessed@sha256:6488f2c2690c93d9beed80185b94928a5fb848b1ac81636aa441dee7a6702597 ...
Using attach strategy to execute scripts...
Preparing environment
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 1h0m0s...
Waiting for pod gitlab/runner-62zggyi6-project-21-concurrent-0-asgpqyn2 to be running, status is Pending
ERROR: Job failed (system failure): prepare environment: setting up trapping scripts on emptyDir: unable to upgrade connection: container not found ("build"). Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
Expected behavior
When FF_KUBERNETES_HONOR_ENTRYPOINT is set, I expect the runner to start the build container without overriding its command, so that the image's entrypoint runs regardless of any command, args, script, or entrypoint set in the job. The job should then succeed, just as it does when FF_KUBERNETES_HONOR_ENTRYPOINT is turned off, as shown below:
Running with gitlab-runner 16.5.0 (853330f9)
on tooling-sandbox-infra-gitlab-infra-runner-gitlab-runner-7dnn4cj is1RksBs, system ID: r_AHy9RejbA9Iq
Resolving secrets
00:00
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image [MASKED].dkr.ecr.[MASKED].amazonaws.com/cd/tf_deploy/blessed@sha256:6488f2c2690c93d9beed80185b94928a5fb848b1ac81636aa441dee7a6702597 ...
Using attach strategy to execute scripts...
Preparing environment
00:05
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 1h0m0s...
Waiting for pod gitlab/runner-is1rksbs-project-21-concurrent-0-2i8r38eu to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [kuma-sidecar build helper]"
ContainersNotReady: "containers with unready status: [kuma-sidecar build helper]"
Running on runner-is1rksbs-project-21-concurrent-0-2i8r38eu via tooling-sandbox-infra-gitlab-infra-runner-gitlab-runner-7dnn4cj...
Getting source from Git repository
00:07
Fetching changes with git depth set to 20...
Initialized empty Git repository in /builds/lg/identity-devops/.git/
Created fresh repository.
Checking out a3e0ae53 as detached HEAD (ref is tspencer/non_idp_terraform_environments)...
Updating/initializing submodules with git depth set to 20...
Submodule 'identity-devops-private' (https://gitlab-ci-token:[MASKED]@gitlab.login.gov/lg/identity-devops-private.git) registered for path 'identity-devops-private'
Synchronizing submodule url for 'identity-devops-private'
Cloning into '/builds/lg/identity-devops/identity-devops-private'...
Submodule path 'identity-devops-private': checked out '479ba929bd4a9ecb27042392f99652b423e98c7c'
Updated submodules
Entering 'identity-devops-private'
Entering 'identity-devops-private'
Executing "step_script" stage of the job script
02:29
$ /usr/local/bin/tf_deploy.sh
TARGET is valid format: all/tooling
...the rest of the properly functioning job output is trimmed here...
Relevant logs and/or screenshots
See above for the relevant job logs.
Environment description
We run a self-hosted GitLab 16.5.0 instance with Kubernetes runners in a dedicated EKS cluster.
config.toml contents
image:
  registry: ${accountid}.dkr.ecr.us-west-2.amazonaws.com
  image: ecr-public/gitlab/gitlab-runner
  tag: ubi-fips-v16.5.0
podAnnotations:
  kuma.io/mesh: gitlab
concurrent: 5
gitlabUrl: https://gitlab.login.gov/
rbac:
  create: true
logLevel: debug
runners:
  config: |
    [[runners]]
      url = "https://gitlab.login.gov"
      # XXX we really want to set this, but this is broken.
      environment = ["FF_KUBERNETES_HONOR_ENTRYPOINT=true"]
      [runners.kubernetes]
        namespace = "gitlab"
        service_account = "${irsa_sa}"
        helper_image = "${accountid}.dkr.ecr.us-west-2.amazonaws.com/ecr-public/gitlab/gitlab-runner-helper:ubi-fips-x86_64-v16.5.0"
  secret: gitlab-runner-secret
  podAnnotations:
    kuma.io/mesh: gitlab
Used GitLab Runner version
Running with gitlab-runner 16.5.0 (853330f9)
on tooling-sandbox-infra-gitlab-infra-runner-gitlab-runner-6d5g6tf 62zGgYi6, system ID: r_ZZqSDxQWTouy
feature flags: FF_KUBERNETES_HONOR_ENTRYPOINT:true
Resolving secrets
Preparing the "kubernetes" executor
Possible fixes
I thought #29172 (comment 1676276649) was the problem, so I built helper and gitlab-runner images with explicit entrypoints, using Dockerfiles like this:
FROM 217680906704.dkr.ecr.us-west-2.amazonaws.com/ecr-public/gitlab/gitlab-runner-helper:ubi-fips-x86_64-v16.5.0
ENTRYPOINT ["/usr/bin/dumb-init", "/entrypoint"]
This did not fix the problem. Changing FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY did not help either. I also looked at #30713 (closed), which may be related, but its reporters seem to want to change how the feature works. That puzzles me, because their setup already sounds like it behaves the way we want. We really need FF_KUBERNETES_HONOR_ENTRYPOINT to force the runner to honor the entrypoint supplied in the image, regardless of which commands the job supplies.
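For comparison, the image entrypoint in this report (tf_deploy.sh) appears to run the deploy itself. The block below is a minimal sketch of a pass-through entrypoint. It rests on an assumption that has not been verified here against the runner source: with FF_KUBERNETES_HONOR_ENTRYPOINT enabled, the runner still starts the build container with its own shell as the container's arguments, then attaches to that process to deliver the job script. If that assumption holds, an entrypoint that finishes its setup and then execs its arguments keeps the build container alive for the attach step. An entrypoint that ignores its arguments and exits would leave nothing to attach to, which would fit the 'container not found ("build")' error above. The file path and the function names (prepare_environment, run_entrypoint) are hypothetical.

```shell
#!/bin/sh
# Hypothetical pass-through entrypoint, e.g. installed as
# /usr/local/bin/entrypoint.sh (path and function names are assumptions).
# It does image-specific setup, then replaces itself with whatever
# command the container was started with.
set -e

# Placeholder for setup that must happen before any job command runs.
# It deliberately does NOT run the deploy; that stays in the job's script.
prepare_environment() {
  echo "entrypoint: environment prepared" >&2
}

run_entrypoint() {
  prepare_environment
  # With arguments: exec replaces this shell with the given command, so that
  # command (assumed to be the runner-supplied shell) keeps the container
  # running until it exits.
  # With no arguments: exec is a no-op and the script simply ends.
  exec "$@"
}

run_entrypoint "$@"
```

With this approach, the image would declare ENTRYPOINT ["/usr/local/bin/entrypoint.sh"] (hypothetical path). The deploy itself would keep running from the job's script line (/usr/local/bin/tf_deploy.sh), as the deploy_all_tooling job above already does.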
I contacted your federal support team, and they asked me to open this issue. The support ticket is: https://federal-support.gitlab.com/hc/en-us/requests/7049?page=1
Let me know if there's anything else I can get you to figure this out!