Enabling FF_KUBERNETES_HONOR_ENTRYPOINT in a job causes it to fail immediately.
Steps to reproduce
Any time we set this in a job or in the runner's env vars, the job fails. The tf_deploy image has an entrypoint set in its Dockerfile like so: ENTRYPOINT ["/usr/local/bin/tf_deploy.sh"]
.gitlab-ci.yml
```yaml
# deploy the all/tooling target
deploy_all_tooling:
  extends: .tf_deploy_template
  tags:
    - tooling-sandbox-infra-runner
  environment:
    name: all/tooling
  resource_group: all_tooling
  variables:
    TARGET: all/tooling
    GIT_SUBMODULE_STRATEGY: normal
    EKS_CLUSTER: tooling-sandbox-infra
    FF_KUBERNETES_HONOR_ENTRYPOINT: "true"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never
    - if: $CI_SERVER_HOST != "gitlab.login.gov"
      when: never
    - if: $CI_PIPELINE_SOURCE == "schedule"
      when: never
    # XXX change this to main when we are done
    - if: $CI_COMMIT_BRANCH == "tspencer/non_idp_terraform_environments"
      when: always
  script:
    # XXX We want to turn on FF_KUBERNETES_HONOR_ENTRYPOINT in the runner. See infra-runner-values.yaml
    - /usr/local/bin/tf_deploy.sh

# deploy job template for tf-deploy
.tf_deploy_template:
  image:
    name: $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/cd/tf_deploy/blessed@$TF_DEPLOY_IMAGE_DIGEST
  stage: deploy
  artifacts:
    name: "$CI_ENVIRONMENT_NAME-$CI_COMMIT_SHA"
    paths:
      - terraform.plan
      - plan.txt
    expire_in: 1 year
    reports:
      terraform: plan.json
  script:
    - echo "Yay deploys!" ; exit 1
```
Actual behavior
The build container seems to immediately die.
```
Running with gitlab-runner 16.5.0 (853330f9)
  on tooling-sandbox-infra-gitlab-infra-runner-gitlab-runner-6d5g6tf 62zGgYi6, system ID: r_ZZqSDxQWTouy
  feature flags: FF_KUBERNETES_HONOR_ENTRYPOINT:true
Resolving secrets
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image [MASKED].dkr.ecr.[MASKED].amazonaws.com/cd/tf_deploy/blessed@sha256:6488f2c2690c93d9beed80185b94928a5fb848b1ac81636aa441dee7a6702597 ...
Using attach strategy to execute scripts...
Preparing environment
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 1h0m0s
...
Waiting for pod gitlab/runner-62zggyi6-project-21-concurrent-0-asgpqyn2 to be running, status is Pending
ERROR: Job failed (system failure): prepare environment: setting up trapping scripts on emptyDir: unable to upgrade connection: container not found ("build"). Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
```
Expected behavior
When I set FF_KUBERNETES_HONOR_ENTRYPOINT, I expect the runner to start the container without a command set, and thus to run the entrypoint in the image regardless of what command/args/scripts/entrypoints are set. It should run like it does when we turn off FF_KUBERNETES_HONOR_ENTRYPOINT, as shown below:
```
Running with gitlab-runner 16.5.0 (853330f9)
  on tooling-sandbox-infra-gitlab-infra-runner-gitlab-runner-7dnn4cj is1RksBs, system ID: r_AHy9RejbA9Iq
Resolving secrets                                    00:00
Preparing the "kubernetes" executor                  00:00
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image [MASKED].dkr.ecr.[MASKED].amazonaws.com/cd/tf_deploy/blessed@sha256:6488f2c2690c93d9beed80185b94928a5fb848b1ac81636aa441dee7a6702597 ...
Using attach strategy to execute scripts...
Preparing environment                                00:05
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 1h0m0s
...
Waiting for pod gitlab/runner-is1rksbs-project-21-concurrent-0-2i8r38eu to be running, status is Pending
  ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
  ContainersNotReady: "containers with unready status: [kuma-sidecar build helper]"
  ContainersNotReady: "containers with unready status: [kuma-sidecar build helper]"
Running on runner-is1rksbs-project-21-concurrent-0-2i8r38eu via tooling-sandbox-infra-gitlab-infra-runner-gitlab-runner-7dnn4cj...
Getting source from Git repository                   00:07
Fetching changes with git depth set to 20...
Initialized empty Git repository in /builds/lg/identity-devops/.git/
Created fresh repository.
Checking out a3e0ae53 as detached HEAD (ref is tspencer/non_idp_terraform_environments)...
Updating/initializing submodules with git depth set to 20...
Submodule 'identity-devops-private' (https://gitlab-ci-token:[MASKED]@gitlab.login.gov/lg/identity-devops-private.git) registered for path 'identity-devops-private'
Synchronizing submodule url for 'identity-devops-private'
Cloning into '/builds/lg/identity-devops/identity-devops-private'...
Submodule path 'identity-devops-private': checked out '479ba929bd4a9ecb27042392f99652b423e98c7c'
Updated submodules
Entering 'identity-devops-private'
Entering 'identity-devops-private'
Executing "step_script" stage of the job script      02:29
$ /usr/local/bin/tf_deploy.sh
TARGET is valid format: all/tooling
...the rest of the properly functioning job output is trimmed here...
```
Relevant logs and/or screenshots
See above for the relevant job logs.
Environment description
We are running a self-hosted GitLab 16.5.0 with k8s runners in a dedicated EKS cluster.
config.toml contents
```yaml
image:
  registry: ${accountid}.dkr.ecr.us-west-2.amazonaws.com
  image: ecr-public/gitlab/gitlab-runner
  tag: ubi-fips-v16.5.0
podAnnotations:
  kuma.io/mesh: gitlab
concurrent: 5
gitlabUrl: https://gitlab.login.gov/
rbac:
  create: true
logLevel: debug
runners:
  config: |
    [[runners]]
      url = "https://gitlab.login.gov"
      # XXX we really want to set this, but this is broken.
      environment = ["FF_KUBERNETES_HONOR_ENTRYPOINT=true"]
      [runners.kubernetes]
        namespace = "gitlab"
        service_account = "${irsa_sa}"
        helper_image = "${accountid}.dkr.ecr.us-west-2.amazonaws.com/ecr-public/gitlab/gitlab-runner-helper:ubi-fips-x86_64-v16.5.0"
  secret: gitlab-runner-secret
  podAnnotations:
    kuma.io/mesh: gitlab
```
Used GitLab Runner version
```
Running with gitlab-runner 16.5.0 (853330f9)
  on tooling-sandbox-infra-gitlab-infra-runner-gitlab-runner-6d5g6tf 62zGgYi6, system ID: r_ZZqSDxQWTouy
  feature flags: FF_KUBERNETES_HONOR_ENTRYPOINT:true
Resolving secrets
Preparing the "kubernetes" executor
```
Possible fixes
I thought that #29172 (comment 1676276649) was the problem, but I created helper and gitlab-runner images with entrypoints using Dockerfiles like so:
```dockerfile
FROM 217680906704.dkr.ecr.us-west-2.amazonaws.com/ecr-public/gitlab/gitlab-runner-helper:ubi-fips-x86_64-v16.5.0
ENTRYPOINT ["/usr/bin/dumb-init", "/entrypoint"]
```
and it still didn't fix it. We tried changing FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY, but that didn't help. I also looked at #30713 (closed), which looks like it might be related, but they also seem to want to change how the feature works, and I'm mystified, because it sounds like their stuff actually works the way we want it to. We really need FF_KUBERNETES_HONOR_ENTRYPOINT to actually force the runner to honor the entrypoint supplied in the image, regardless of what commands are supplied in the job.
@JamesRLopes is the entrypoint issue happening with GitLab Runner v16.6? If yes, it is most likely the one described in #37205 (closed). As mentioned in this comment, we are taking steps to revert the changes made in v16.6 and restore the old v16.5 behaviour.
However, based on @timspencer's issue, we still have a problem with v16.5. I will investigate that too to identify the root cause.
Two different images were used, with the following entrypoints:
entrypoint 1
#!/bin/shecho"Number of arguments: $#"echo"All arguments: \$@ is $@"i=1while[$i-le 10 ];doecho"Iteration $i - Current Time: $(date +"%T")"i=$((i +1))sleep 1done
entrypoint 2
#!/bin/shecho"Number of arguments: $#"echo"All arguments: \$@ is $@"i=1while[$i-le 10 ];doecho"Iteration $i - Current Time: $(date +"%T")"i=$((i +1))sleep 1doneexec"$@"
AFAICT, the only time I was able to reproduce the issue is when the entrypoint lacks the exec "$@" command at the end of the script.
In that situation, what is actually happening is:
entrypoint 1
The entrypoint executes quite rapidly, and by the time we get to the job script execution, the build container already has the Completed status, so the whole job fails.
entrypoint 2
The job script gets executed and passes as expected.
In your case the error is slightly different; I suspect the entrypoint finishes its execution while the Runner is trying to save a stage script on the build container.
Having exec "$@" at the end of the entrypoint allows us to open a shell on the build container and keep it from completing.
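For illustration, a minimal entrypoint following that pattern might look like this (the setup step is a placeholder, not something from a real image):

```sh
#!/bin/sh
# Do whatever initialization the image needs first
# (placeholder -- substitute the image's real setup work).
/usr/local/bin/do-image-setup.sh

# Then hand control to whatever arguments the runner passed in.
# In attach mode these arguments are a shell-detection command, and
# exec-ing them opens the shell that keeps the build container alive
# until the job script has run.
exec "$@"
```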
Do you have such an instruction at the end of your entrypoint?
I do not have an exec at the end of our entrypoint script. Our entrypoint script should take 5-15 minutes to execute, though, so if you are trying to do an exec into the container to do extra things, there should be plenty of time for it to do that. For us, it just looks like it doesn't execute the job at all, or it is exiting immediately before it can check out code or whatever.
But I will try to add an exec at the end of the script and see what happens. Is this a thing that is documented somewhere? I've never seen any mention of us needing to do an exec "$@" at the end of our container scripts.
It basically creates a container with a few aws tools and kubectl, so that we can execute commands against our EKS clusters. The entrypoint.sh script contains the following:
We need to set KUBECONFIG in the .gitlab-ci.yml file, otherwise it will be overridden by some other process we do not control.
We need to set FF_KUBERNETES_HONOR_ENTRYPOINT, otherwise the credentials will not be set and it fails.
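As a rough, hypothetical sketch of that kind of entrypoint (illustrative only, not the actual entrypoint.sh; it assumes the cluster name and region arrive as EKS_CLUSTER and AWS_REGION variables):

```sh
#!/bin/sh
# Hypothetical sketch only -- not the entrypoint.sh from this report.
set -e

# Write a kubeconfig for the target cluster; EKS_CLUSTER and AWS_REGION
# are assumed to be provided as CI/CD variables.
aws eks update-kubeconfig \
  --name "$EKS_CLUSTER" \
  --region "$AWS_REGION" \
  --kubeconfig /root/.kube/config

# Hand control back to the runner so the job script can still run.
exec "$@"
```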
Here's what happens:
Variables set and a 15.8.3 runner:
```
Running with gitlab-runner 15.8.3 (080abeab)
  on runner-15-8-3 xxmtfKew, system ID: r_mh4GD7NqpOpb
  feature flags: FF_KUBERNETES_HONOR_ENTRYPOINT:true
...
$ aws s3 ls
2024-01-08 10:11:23 a05093c145b41f356171cbcb6af40a2a8049e65b
...
$ echo $KUBECONFIG
/root/.kube/config
$ kubectl get pods -n kube-system
NAME                                            READY   STATUS   RESTARTS   AGE
aws-load-balancer-controller-85769fd4d5-7ccvz   1/1
...
Cleaning up project directory and file based variables   00:00
Job succeeded
```
All works well. Now if we do the same thing, but with a 16.8.0 runner:
```
Running with gitlab-runner 16.8.0 (c72a09b6)
  on runner-16-8-0 zsNpnZTV, system ID: r_RqwFr96tELCL
  feature flags: FF_KUBERNETES_HONOR_ENTRYPOINT:true
...
$ aws s3 ls
2024-01-08 10:11:23 a05093c145b41f356171cbcb6af40a2a8049e65b
...
$ echo $KUBECONFIG
/root/.kube/config
$ kubectl get pods -n kube-system
W0208 15:42:24.102200      63 loader.go:222] Config not found: /root/.kube/config
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:gitlab-runner:default" cannot list resource "pods" in API group "" in the namespace "kube-system"
Cleaning up project directory and file based variables   00:01
ERROR: Job failed: command terminated with exit code 1
```
OK. I put an exec "$0" at the end of my script, and it did something different at least. Though it is possible that this new behavior is because we upgraded to 16.8.2 for gitlab, and 16.8.1 for the runner. I wish we could have kept them the same as what we had before, but we are forced to upgrade pretty quickly because of our compliance people.
Now, it just seems to be hanging. It checks everything out, but then the job just waits:
```
Running on runner-k4dznj9z-project-21-concurrent-0-hckfosq8 via tooling-sandbox-infra-gitlab-infra-runner-gitlab-runner-6cnzfr5...
Getting source from Git repository   00:06
Fetching changes with git depth set to 20...
Initialized empty Git repository in /builds/lg/identity-devops/.git/
Created fresh repository.
Checking out 5190a707 as detached HEAD (ref is tspencer/non_idp_terraform_environments)...
Updating/initializing submodules with git depth set to 20...
Submodule 'identity-devops-private' (https://gitlab-ci-token:[MASKED]@[MASKED]/lg/identity-devops-private.git) registered for path 'identity-devops-private'
Synchronizing submodule url for 'identity-devops-private'
Cloning into '/builds/lg/identity-devops/identity-devops-private'...
Submodule path 'identity-devops-private': checked out '979a3094cef9a1945185bf39fded397561936b0a'
Updated submodules
Entering 'identity-devops-private'
Entering 'identity-devops-private'
Executing "step_script" stage of the job script
```
When I look at the pod, the actual build container seems to have already exited.
HOWEVER! I just looked at the logs from the build container, and apparently I did something wrong, because my script exited very quickly with an error message. That's great, because I believe that things are now working. I am willing to bet that if I fix my script's problem, everything will be good.
If that is true, then I'll let you know.
But it seems strange that I had to go do a kubectl logs pod/runner-k4dznj9z-project-21-concurrent-0-hckfosq8 -n gitlab -c build to find that error message from my script rather than seeing it in the job log. And it also seems wrong that it didn't notice that the build container exited with an exit code of 1, and instead it just hangs until the 1h timeout happens.
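For reference, the kind of poking around I had to do looks roughly like this (pod name is from the log above; the jsonpath query is just one way to pull the build container's terminated state):

```sh
# See what the image entrypoint wrote inside the build container
kubectl logs pod/runner-k4dznj9z-project-21-concurrent-0-hckfosq8 -n gitlab -c build

# Check whether the build container has already terminated, and with what exit code
kubectl get pod runner-k4dznj9z-project-21-concurrent-0-hckfosq8 -n gitlab \
  -o jsonpath='{.status.containerStatuses[?(@.name=="build")].state.terminated}'
```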
I will let you know what I find. Thanks for your help so far! I look forward to your thoughts on why the build container that exited quickly seems to not be handled well by the runner.
OK. I put an exec "$0" at the end of my script, and it did something different at least. Though it is possible that this new behavior is because we upgraded to 16.8.2 for gitlab, and 16.8.1 for the runner. I wish we could have kept them the same as what we had before, but we are forced to upgrade pretty quickly because of our compliance people.
I will compare v15.8.3 and v16.8.1 again to see if I can spot any changes that could explain this change of behaviour (I hope I will be more successful this time).
But it seems strange that I had to go do a kubectl logs pod/runner-k4dznj9z-project-21-concurrent-0-hckfosq8 -n gitlab -c build to find that error message from my script rather than seeing it in the job log
When reviewing !4545 (merged), I realized that although we monitor the overall status of the Pod, we do not consider individual container failures. This monitoring was added specifically for service containers.
For the build container, we rely on the trap command to catch any failure and forward the exit code to GitLab Runner. In this particular case, I suspect that the entrypoint failure is not being handled, as we do not receive the exit code from the trap command, and the script never returns due to the failure of the build container.
Have you observed any events related to the failure of the build container? If not, enabling FF_PRINT_POD_EVENTS should forward all job-Pod-related events to the job log. If there are no such events, I believe we may need to explicitly monitor state changes of the build container to catch entrypoint failures and automatically cancel the ongoing job.
I have not observed any "events", just the output from my job exiting right away saying "TARGET not specified: aborting" in the kubectl logs from the pod.
I finally got clear of other obligations, and should have some time to fix my script that the job runs and get you some more info. I will also try FF_PRINT_POD_EVENTS, which really looks like something that I should be using more often when debugging this stuff. :-). Thanks!
We had to upgrade to 16.9.1 last week, and this time around it behaves differently. Instead of hanging for 1h until the job times out, it exits right away.
```
Running on runner-zrohszst-project-21-concurrent-0-mb6tia1k via tooling-sandbox-infra-gitlab-infra-runner-gitlab-runner-6cnzfr5...
Getting source from Git repository   00:07
Fetching changes with git depth set to 20...
Initialized empty Git repository in /builds/lg/identity-devops/.git/
Created fresh repository.
Checking out cbe103b7 as detached HEAD (ref is tspencer/non_idp_terraform_environments)...
Updating/initializing submodules with git depth set to 20...
Submodule 'identity-devops-private' (https://gitlab-ci-token:[MASKED]@gitlab.login.gov/lg/identity-devops-private.git) registered for path 'identity-devops-private'
Synchronizing submodule url for 'identity-devops-private'
Cloning into '/builds/lg/identity-devops/identity-devops-private'...
Submodule path 'identity-devops-private': checked out 'XXX'
Updated submodules
Entering 'identity-devops-private'
Entering 'identity-devops-private'
Normal UpdatedKumaDataplane Updated Kuma Dataplane: runner-zrohszst-project-21-concurrent-0-mb6tia1k
Executing "step_script" stage of the job script   00:00
Uploading artifacts for failed job   00:00
Uploading artifacts...
WARNING: plan.json: no matching files. Ensure that the artifact path is relative to the working directory (/builds/lg/identity-devops)
ERROR: No files to upload
Cleaning up project directory and file based variables   00:01
ERROR: Job failed (system failure): unable to upgrade connection: container not found ("build")
```
So it seems like the problem that we saw in 16.8.2 is gone, because it identified that the container was dead right away and failed the job. Yay! Software is getting better!
Well, I am now confused, because in debugging my entrypoint script, I noticed that $CI_PROJECT_DIR did not exist, which was why my script was aborting right away. So I put a sleep at the top of my script, and it now works:
```sh
until [ -d "$CI_PROJECT_DIR" ]; do
  echo waiting for "$CI_PROJECT_DIR" to appear
  sleep 10
done
```
So there's some sort of race condition where the build is running before the runner (presumably) execs into the container and checks out all the code and so on. This is fine, though. I can handle that. Though if you know if there's a better way to know when the code checkout and other setup is done for me to wait on, I would love to know it.
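In the meantime, the only tighter workaround I can think of (a sketch; it assumes the job script still runs and can drop a marker file as its first command) is something like this:

```sh
# The first command of the job script in .gitlab-ci.yml would create the marker:
#   touch "$CI_PROJECT_DIR/.ci-setup-done"

# The entrypoint then waits on that marker instead of guessing at timing:
until [ -f "$CI_PROJECT_DIR/.ci-setup-done" ]; do
  echo "waiting for checkout/setup to finish in $CI_PROJECT_DIR"
  sleep 10
done
```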
However, I decided to check that the entrypoint was actually being honored, so I told the job to just run echo entrypoint not honored, and instead of running my script, it ran the echo. So it appears as if there's still something wrong, because I was expecting it to honor the entrypoint:
```yaml
# deploy the all/tooling job
deploy_all_tooling:
  extends: .tf_deploy_template
  tags:
    - tooling-sandbox-infra-runner
  environment:
    name: all/tooling
  resource_group: all_tooling
  variables:
    TARGET: all/tooling
    GIT_SUBMODULE_STRATEGY: normal
    EKS_CLUSTER: tooling-sandbox-infra
    FF_PRINT_POD_EVENTS: "true"
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never
    - if: $CI_SERVER_HOST != "gitlab.login.gov"
      when: never
    - if: $CI_PIPELINE_SOURCE == "schedule"
      when: never
    # XXX change this to main when we are done
    - if: $CI_COMMIT_BRANCH == "tspencer/non_idp_terraform_environments"
      when: always
  script:
    # XXX We need to turn on FF_KUBERNETES_HONOR_ENTRYPOINT in the runner. See infra-runner-values.yaml
    # - /usr/local/bin/tf_deploy.sh
    - echo bad entrypoint
```
What can I do to help figure this out? Thanks again!
I think I figured out what is going on! When the job runs, the build container runs and properly executes its entrypoint. I can see the logs of it running its task through kubectl logs runner-zrohszst-project-21-concurrent-0-imin268i -n gitlab -c build, and it's running the proper entrypoint script specified in the image.
However, it is also running the script specified in the job, and the output of that is what is shown in the job log in GitLab and is what determines whether the job is successful or not. I suspect that this is run with a kubectl exec after it has checked out all the code and so on.
I would have expected a runner with FF_KUBERNETES_HONOR_ENTRYPOINT set to just run the build container and watch the output and exit value from that default entrypoint, and NOT be allowed to exec any user-supplied commands into the container.
Yes, sure enough, when I run kubectl logs -f, I see the default entrypoint script running, but no output on the job page. But at the end, it runs the script specified in the job:
```
<most of my entrypoint script is edited out here>
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
terraform apply completed on Tue Mar 5 00:39:42 UTC 2024
{"script": "/scripts-21-1061417/step_script"}
$ echo "Entrypoint not honored!" ; exit 1
Entrypoint not honored!
{"command_exit_code": 1, "script": "/scripts-21-1061417/step_script"}
```
I do see the step_script output in the job, though, and it is what determines the exit status of the job:
Executing "step_script" stage of the job script$ echo "Entrypoint not honored!" ; exit 1Entrypoint not honored!Uploading artifacts for failed jobUploading artifacts...plan.json: found 1 matching artifact files and directories Uploading artifacts as "terraform" to coordinator... 201 Created id=1061417 responseStatus=201 Created token=glcbt-64Cleaning up project directory and file based variablesERROR: Job failed: command terminated with exit code 1
Any thoughts? This seems like the bug. We should be able to see the logs from the container, and we shouldn't be having the step script run if we have FF_KUBERNETES_HONOR_ENTRYPOINT set, right?
Apologies for the delay in getting back to you. I hope it is still March 29th on your side.
So there's some sort of race condition where the build is running before the runner (presumably) execs into the container and checks out all the code and so on. This is fine, though. I can handle that. Though if you know if there's a better way to know when the code checkout and other setup is done for me to wait on, I would love to know it.
That makes sense: the entrypoint is executed as soon as the init container is completed, so there is no guarantee that the git clone will be done in time if it is needed by the image entrypoint.
I think I figured out what is going on! When the job runs, the build container runs and properly executes it's entrypoint. I can see the logs of it running it's task through kubectl logs runner-zrohszst-project-21-concurrent-0-imin268i -n gitlab -c build, and it's running the proper entrypoint script specified in the image.
However it is also running the script specified in the job, and the output of that is what is shown in the job log in gitlab, and is what determines whether the job is successful or not. I suspect that this is run with a kubectl exec after it's checked out all the code and so on.
So, everything you observed and described above is accurate. Without FF_KUBERNETES_HONOR_ENTRYPOINT, GitLab Runner overrides the image's existing entrypoint and replaces it with detect_script_shell, which opens a shell to keep the container open until the job script is executed.
When FF_KUBERNETES_HONOR_ENTRYPOINT is set, GitLab Runner lets the image entrypoint be executed (expecting it to keep the container running), but the job script is still executed in the step_script stage (in exec or attach mode).
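Concretely, the arguments handed to the build container are a shell-detection snippet along these lines (the same snippet that shows up in the job logs quoted later in this thread):

```sh
sh -c 'if [ -x /usr/local/bin/bash ]; then exec /usr/local/bin/bash
elif [ -x /usr/bin/bash ]; then exec /usr/bin/bash
elif [ -x /bin/bash ]; then exec /bin/bash
elif [ -x /usr/local/bin/sh ]; then exec /usr/local/bin/sh
elif [ -x /usr/bin/sh ]; then exec /usr/bin/sh
elif [ -x /bin/sh ]; then exec /bin/sh
elif [ -x /busybox/sh ]; then exec /busybox/sh
else echo shell not found; exit 1; fi'
```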
I would have expected a runner with FF_KUBERNETES_HONOR_ENTRYPOINT set to just run the build container and watch the output and exit value from that default entrypoint, and NOT be allowed to exec any user-supplied commands into the container.
The behaviour described is not a bug but the expected behaviour.
As the image entrypoint does not write to these logs, you won't see its output in the job log. I don't think redirecting the image entrypoint's output into this file is the way to go. Too many things could go wrong:
Race condition (unless you create the file first, at the right location and with 777 permissions, and even then I am not sure nothing would go wrong)
Log collision: everything will be messed up, potentially exposing some masked variables, if any
Probably other things I can't think of.
My suggestion for your use case, with the existing features of GitLab Runner, would be to:
OK. That's interesting. So I was expecting FF_KUBERNETES_HONOR_ENTRYPOINT to function like disable_entrypoint_overwrite does for the docker runners. Thank you for the info on how this stuff works under the hood. I guess I was expecting your code checkout and other initialization stuff to get done with init containers and then you would get logs from the k8s log endpoint. It's probably too much to ask for you to rewrite things to do it that way rather than your helper exec system, right? :-) That way, you could just make it so that if FF_KUBERNETES_HONOR_ENTRYPOINT was set, you'd just remove the command: from the pod spec, and everything would be easy for me! :-)
Unfortunately, if I follow your suggestion, that would break our security model, which is to have runners that will only run containers with the hardcoded entrypoint (and also to be able to view the logs from that entrypoint in the GitLab job log). We do not want them to be able to run arbitrary commands that the developers can put into the job definition. This works fine with our docker runners right now with disable_entrypoint_overwrite, but we need something like this for Kubernetes.
If not this feature, then is there another way to duplicate the docker runner disable_entrypoint_overwrite functionality in kubernetes?
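For reference, on our docker runners this is just a registration/config option; it looks roughly like this (URL and token are placeholders, other flags omitted):

```sh
gitlab-runner register \
  --executor docker \
  --url https://gitlab.login.gov \
  --registration-token "$REGISTRATION_TOKEN" \
  --docker-image alpine:latest \
  --docker-disable-entrypoint-overwrite
```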
It's probably too much to ask for you to rewrite things to do it that way rather than your helper exec system, right? :-)
It was previously done this way, but there was an issue with kubectl logs where the logs would hang. I don't have all the history, but I have already asked this question in the past.
If not this feature, then is there another way to duplicate the docker runner disable_entrypoint_overwrite functionality in kubernetes?
I am not familiar with the docker executor, so I will have to see how it works before answering. I first need to check how the logs are streamed with the docker executor and see what can be done there.
Thank you for your help! Let me know if there's anything I can do to help figure out a solution for this, and I look forward to hearing what you find out.
I am still trying to figure out David's issue. Once it is done (and fixed), I am going to close this issue and create a new one that reflects our discussion in this thread (allow streaming of the image entrypoint logs in the job log for the ~"executor::kubernetes"). Does that work for you?
I guess that would work. I just need a way to make it so that if FF_KUBERNETES_HONOR_ENTRYPOINT is set, any commands set in the job definition won't get run, and the output and the exit value of the container entrypoint will be visible and used as the overall exit value for the job.
And ideally, you would be running the code checkout and other initialization stuff in an init container too so I wouldn't have to implement a "wait until the code checkout stuff is done" loop in my script, but that's a stretch goal. :-)
I had a look at disable_entrypoint_overwrite with the docker executor, and even there the job script/commands are being executed.
```
Running with gitlab-runner development version (HEAD) on Local GitLab Runner for tests and debugging 7SzKHLyus, system ID: s_b1aacad1f7fa
  feature flags: FF_NETWORK_PER_BUILD:true, FF_SCRIPT_SECTIONS:true, FF_KUBERNETES_HONOR_ENTRYPOINT:true, FF_USE_ADVANCED_POD_SPEC_CONFIGURATION:true, FF_PRINT_POD_EVENTS:true, FF_USE_DUMB_INIT_WITH_KUBERNETES_EXECUTOR:true
Preparing the "docker" executor   00:02
Using Docker executor with image aaro301/alpine:custom-loop-2 ...
Starting service redis:latest ...
Using locally found image version due to "if-not-present" pull policy
Using docker image sha256:f361c7d940d45e7c8101d0cc356c21014b5c954444486b346f16df57cf1bed9e for redis:latest with digest redis@sha256:e647cfe134bf5e8e74e620f66346f93418acfc240b71dd85640325cb7cd01402 ...
Waiting for services to be up and running (timeout 30 seconds)...
[service:redis-redis_service] 2024-04-05T15:00:55.951256259Z 1:C 05 Apr 2024 15:00:55.951 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[service:redis-redis_service] 2024-04-05T15:00:55.953017761Z 1:C 05 Apr 2024 15:00:55.952 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
[service:redis-redis_service] 2024-04-05T15:00:55.953025637Z 1:C 05 Apr 2024 15:00:55.952 * Redis version=7.2.4, bits=64, commit=00000000, modified=0, pid=1, just started
[service:redis-redis_service] 2024-04-05T15:00:55.953027428Z 1:C 05 Apr 2024 15:00:55.952 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
[service:redis-redis_service] 2024-04-05T15:00:55.953496220Z 1:M 05 Apr 2024 15:00:55.953 * monotonic clock: POSIX clock_gettime
[service:redis-redis_service] 2024-04-05T15:00:55.954010305Z 1:M 05 Apr 2024 15:00:55.953 * Running mode=standalone, port=6379.
[service:redis-redis_service] 2024-04-05T15:00:55.954571305Z 1:M 05 Apr 2024 15:00:55.954 * Server initialized
[service:redis-redis_service] 2024-04-05T15:00:55.955080223Z 1:M 05 Apr 2024 15:00:55.954 * Ready to accept connections tcp
Using locally found image version due to "if-not-present" pull policy
Using docker image sha256:adec2dddd12a0e84020f78bc2045c5c9701a9852f2468d4a08af5b2bf6bf02b8 for aaro301/alpine:custom-loop-2 with digest aaro301/alpine@sha256:8ce4fc90ed6c4c4b976d979c83507701a968d19a16bcd0ff8ba4f73fe7a235e9 ...
Preparing environment   00:00
Running on runner-7szkhlyus-project-25452826-concurrent-0 via touni-mbp...
Getting source from Git repository   00:01
Fetching changes with git depth set to 2...
Reinitialized existing Git repository in /builds/ra-group2/playground-bis/.git/
Checking out a4c64cf8 as detached HEAD (ref is master)...
Removing cache.txt
Skipping Git submodules setup
Executing "step_script" stage of the job script   00:21
Using docker image sha256:adec2dddd12a0e84020f78bc2045c5c9701a9852f2468d4a08af5b2bf6bf02b8 for aaro301/alpine:custom-loop-2 with digest aaro301/alpine@sha256:8ce4fc90ed6c4c4b976d979c83507701a968d19a16bcd0ff8ba4f73fe7a235e9 ...
Number of arguments: 3
All arguments: $@ is sh -c if [ -x /usr/local/bin/bash ]; then exec /usr/local/bin/bash elif [ -x /usr/bin/bash ]; then exec /usr/bin/bash elif [ -x /bin/bash ]; then exec /bin/bash elif [ -x /usr/local/bin/sh ]; then exec /usr/local/bin/sh elif [ -x /usr/bin/sh ]; then exec /usr/bin/sh elif [ -x /bin/sh ]; then exec /bin/sh elif [ -x /busybox/sh ]; then exec /busybox/sh else echo shell not found exit 1 fi
Iteration 1 - Current Time: 15:00:57
Iteration 2 - Current Time: 15:00:58
Iteration 3 - Current Time: 15:00:59
Iteration 4 - Current Time: 15:01:00
Iteration 5 - Current Time: 15:01:01
Iteration 6 - Current Time: 15:01:02
Iteration 7 - Current Time: 15:01:03
Iteration 8 - Current Time: 15:01:04
Iteration 9 - Current Time: 15:01:05
Iteration 10 - Current Time: 15:01:06
$ i=1
Iteration for step_script 1 - Current Time: 15:01:07
Iteration for step_script 2 - Current Time: 15:01:08
Iteration for step_script 3 - Current Time: 15:01:09
Iteration for step_script 4 - Current Time: 15:01:10
Iteration for step_script 5 - Current Time: 15:01:11
Iteration for step_script 6 - Current Time: 15:01:12
Iteration for step_script 7 - Current Time: 15:01:13
Iteration for step_script 8 - Current Time: 15:01:14
Iteration for step_script 9 - Current Time: 15:01:15
Iteration for step_script 10 - Current Time: 15:01:16
Cleaning up project directory and file based variables   00:00
Job succeeded
```
As you pointed out, the logs generated by the container's entrypoint are also displayed in the job log. I've opened an issue for this: #37468 (closed). Please feel free to add any additional information you believe is relevant.
Are you sure that the script is run? When I set this up, it does not run the script defined in the job, but instead runs the entrypoint script:
```yaml
run_unallowed_script:
  stage: test
  allow_failure: true
  image:
    name: XXX.dkr.ecr.XXX.amazonaws.com/cd/env_deploy/blessed@sha256:XXX
  script:
    - touch unallowed_script.txt
  artifacts:
    paths:
      - "*.txt"

test_unallowed_script:
  stage: test
  needs:
    - run_unallowed_script
  script:
    - echo looking for unallowed_script.txt
    - test ! -f unallowed_script.txt
```
Here's the run_unallowed_script output:
Executing "step_script" stage of the job script00:00Using docker image sha256:XXX for XXX.dkr.ecr.[MASKED].amazonaws.com/cd/env_deploy/blessed@sha256:XXX with digest XXX.dkr.ecr.[MASKED].amazonaws.com/cd/env_deploy/blessed@sha256:XXX ...+ set -e+ '[' -z /builds/timothy.spencer/test-script ']'+ '[' bravo '!=' '' ']'+ echo 'gitlab is asking us to deploy to , but I am in bravo. Aborting'gitlab is asking us to deploy to , but I am in bravo. Aborting+ exit 2Cleaning up project directory and file based variables00:00ERROR: Job failed: exit code 2
And here's the output of the test_unallowed_script job:
```
Skipping Git submodules setup
Executing "step_script" stage of the job script   00:01
Using docker image sha256:XXX for [MASKED].dkr.ecr.[MASKED].amazonaws.com/ecr-public/docker/library/alpine:latest with digest [MASKED].dkr.ecr.[MASKED].amazonaws.com/ecr-public/docker/library/alpine@sha256:XXX ...
$ echo looking for unallowed_script.txt
looking for unallowed_script.txt
$ test ! -f unallowed_script.txt
Cleaning up project directory and file based variables   00:00
Job succeeded
```
So it looks like the docker runner with --docker-disable-entrypoint-overwrite enabled isn't running the script in the job, but instead is running the entrypoint for the image. @ratchade, I'm not sure what the iteration script thing above is showing, but that is what I'm seeing. Let me know what you think!
After a little bit of fiddling, it looks like the docker executor may be trying to run the job script, but it's doing it by essentially running docker run my/image sh -c 'if [ -x /usr/local/bin/bash ]; then ...', which does not bypass the entrypoint.
If you run this image:
```dockerfile
FROM alpine:latest
COPY fun.sh /fun.sh
ENTRYPOINT ["/fun.sh"]
```
where fun.sh is:
#!/bin/shecho "arguments are: $@"echo "but we are sleeping 10 anyways"sleep 10
with these jobs:
```yaml
run_unallowed_script:
  stage: test
  allow_failure: true
  image:
    name: gsatspencer/tspencersleepten
  script:
    - touch unallowed_script.txt
  artifacts:
    paths:
      - "*.txt"

test_unallowed_script:
  stage: test
  needs:
    - run_unallowed_script
  script:
    - echo looking for unallowed_script.txt
    - test ! -f unallowed_script.txt
```
It will give you this output:
```
Skipping Git submodules setup
Executing "step_script" stage of the job script   00:11
Using docker image sha256:25d057cc32e70847338f5b1f9613e43e4302f7240e173803052412b45b442a7e for gsatspencer/tspencersleepten with digest gsatspencer/tspencersleepten@sha256:c29a87fd74990e3c80e8e21e3017e94a812c74a0531645b840e507b0212ab296 ...
arguments are: sh -c if [ -x /usr/local/bin/bash ]; then exec /usr/local/bin/bash elif [ -x /usr/bin/bash ]; then exec /usr/bin/bash elif [ -x /bin/bash ]; then exec /bin/bash elif [ -x /usr/local/bin/sh ]; then exec /usr/local/bin/sh elif [ -x /usr/bin/sh ]; then exec /usr/bin/sh elif [ -x /bin/sh ]; then exec /bin/sh elif [ -x /busybox/sh ]; then exec /busybox/sh else echo shell not found exit 1 fi
but we are sleeping 10 anyways
Uploading artifacts for successful job   00:00
Uploading artifacts...
WARNING: *.txt: no matching files. Ensure that the artifact path is relative to the working directory (/builds/timothy.spencer/identity-idp-wargames)
ERROR: No files to upload
Cleaning up project directory and file based variables   00:01
Job succeeded
```
Which shows that the entrypoint is not bypassed. Is there a way to do this sort of thing with the k8s runner?
Just tried the 16.9.1 runner... nothing is solved. The last working runner for us is still 15.8.3.
Here's the relevant piece of logging from the pipelines:
With 16.9.1:
```
$ echo $KUBECONFIG
/root/.kube/config
$ kubectl get pods -n kube-system
W0308 07:47:24.827205      64 loader.go:222] Config not found: /root/.kube/config
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:gitlab-runner:default" cannot list resource "pods" in API group "" in the namespace "kube-system"
```
With 15.8.3:
```
$ echo $KUBECONFIG
/root/.kube/config
$ kubectl get pods -n kube-system
NAME                                            READY   STATUS    RESTARTS   AGE
aws-load-balancer-controller-85769fd4d5-7ccvz   1/1     Running   0          59d
```
I don't think we're ever going to be able to move away from 15.8.3.
You know, the more I think about this, I believe that the bug is still a valid bug. The documentation says that when FF_KUBERNETES_HONOR_ENTRYPOINT is set, Kubernetes will run the entrypoint. This implies that the job script will not be executed. Otherwise, what is the point of the feature? Presumably the entrypoint is being honored either to constrain what the job can execute, or perhaps to perform some sort of pod initialization. But relying on the entrypoint to initialize things sets you up for race conditions where the entrypoint is faster or slower than the job script, so things are likely to break. Thus, I think the point of the feature is to constrain what the job can execute, just like the --docker-disable-entrypoint-overwrite option does for the docker runner (#2625 (comment 87272957)).
This also resolves the ambiguity around what logs should be watched: when FF_KUBERNETES_HONOR_ENTRYPOINT is set, you should watch the entrypoint logs, instead of the job script logs. There should be no job script logs.
It may be true that the existing k8s execution system is functioning as you expect, but I think the FF_KUBERNETES_HONOR_ENTRYPOINT feature is still broken, and should mirror --docker-disable-entrypoint-overwrite. This issue should still be open.
You know, the more I think about this, I believe that the bug is still a valid bug. The documentation says that when FF_KUBERNETES_HONOR_ENTRYPOINT is set, kubernetes will run the entrypoint. This implies that the job script will not be executed.
Even though I agree with the points made in the first paragraph WRT race conditions, the conclusion is not correct. When honoring the image entrypoint, our contract with the image is that it always needs to provide a valid shell (before or after some needed initialization, depending on the user's needs) so that we can execute the job script.
I am thinking about updating the documentation to clarify this and avoid any confusion on how the entrypoint itself works
Thus, I think the point of the feature is to constrain what the job can execute, just like the --docker-disable-entrypoint-overwrite option does for the docker runner ( #2625 (comment 87272957)).
Somehow, I am not able to see the comment link. I checked the comment links in this issue and none is the one linked. It should have been one of these two comments, #2625 (comment 81692416) or #2625 (comment 87606183), but I can't see it. Would you mind copying it here?
You know, I can't see it anymore either. Yet I pasted it in somehow. I don't know where it went or in detail what it said anymore. I am sure that you would have been 100% convinced by this comment to my side, though! :-)
So the comment there was basically restating what was said in the discussion in #2625 (comment 53423172) and #2625 (comment 62049700), which outlines the need for us to be able to have a more secure setup where people cannot specify arbitrary commands that the runner will run.
This is a common security practice for allowing users to execute narrowly defined privileged commands, like sudo, where you can specify commands that can be run by users as root, or the ssh command= option in authorized_keys. If the docker runner has this functionality, then the k8s runner should too.
You know, I can't see it anymore either. Yet I pasted it in somehow. I don't know where it went or in detail what it said anymore. I am sure that you would have been 100% convinced by this comment to my side, though! :-)
I really thought something was wrong with me as I couldn't find this comment
Yeah, I would rather they not add those extra arguments in either, but it's a bug that I can live with, because I set my entrypoints to ignore all arguments and just do their job.
I really hesitate to suggest a solution, because I don't know the code well enough yet, but surely it wouldn't be too hard to put a check in when you are creating the script to exec into the pod, and if FF_KUBERNETES_HONOR_ENTRYPOINT is set, to not write the job script in, or to replace it with echo FF_KUBERNETES_HONOR_ENTRYPOINT is set, not executing job script?
It's not as good as using real init containers and removing the command in the pod spec, but I'd imagine that this would be a smaller/easier change for you to write.
The first issue I am seeing here is that we might not be able to know when to stop the job, at least with the kubernetes executor, as:
GitLab Runner is not aware of the entrypoint log
We rely on a specific json message in the log in attach mode to detect the end of an ongoing stage
All this to say that it might not be as easy as it seems.
I also asked other team members during our call today, as there is already some ongoing work that would enable running a job without setting any script.
A workaround was suggested, but I want to see if it would actually work with the Kubernetes executor (especially because of the JSON thing) and get back to you.
Yeah, I guess I forgot that you'd still have to do the work to get the logs from the container and the exit value and so on. I guess there isn't an easy solution. Thanks for listening to my ignorant ideas! :-)
Or actually, I would be totally fine if you created a new FF_KUBERNETES_DISABLE_ENTRYPOINT_OVERWRITE feature that implemented this, if you believe that FF_KUBERNETES_HONOR_ENTRYPOINT is doing what you want it to do.
We just need the same functionality that the docker runner has, as expressed in #2625 (comment 87272957).
Well, as I said before, we just need the same functionality as the docker runner's disable_entrypoint_overwrite feature, for the same reasons that the people in that issue were talking about. The key features being:
the ability for the runner to be configured so that it would run jobs without executing job scripts, no matter what was in the .gitlab-ci.yml file.
To be able to see the output from the entrypoint and use its exit value for the job status.
This is what FF_KUBERNETES_HONOR_ENTRYPOINT looks like it's supposed to do. Updating the documentation may be a way to sidestep this issue, but I would argue that this is still something that needs to be addressed, and should still get some engineering time. The same types of people who are using the disable_entrypoint_overwrite feature are going to want the same functionality in their k8s runners.
Thanks for reading all my words! I know that there is no easy way to make me happy here, but I hope you will consider working at it anyways! :-)
I was going to do a summary of what is expected so thank you for doing it.
The same types of people who are using the disable_entrypoint_overwrite feature are going to want the same functionality in their k8s runners.
That is a good point
I feel like I should also create another issue to keep track of the first point.
The documentation update is mostly to avoid any misunderstanding TBH.
Thanks for reading all my words! I know that there is no easy way to make me happy here, but I hope you will consider working at it anyways! :-)
Thanks too for all the feedback provided. As I said above, point 1 is already being implemented in https://gitlab.com/gitlab-org/step-runner, so it's unlikely to be implemented in Runner in its current form.
I'm confused here. The step-runner just looks like a special container that you can use to run steps on. It doesn't limit what runners will execute. It looks like all the steps that the step-runner executes are, or at least can be, defined in the .gitlab-ci.yml file too. How does this solve point 1?
Maybe I should be more detailed in my definition for point 1:
We need: the ability for the runner to be configured so that it will only execute the entrypoint of the containers it runs. The GitLab server must not be able to specify what script the jobs on that runner will run, whether via job scripts, entrypoint overrides, or any other config in .gitlab-ci.yml. The GitLab server is only trusted to specify what image to run. The runner itself enforces this policy.
So when you say "it's unlikely to be implemented in Runner in its current form", are you saying that you will not fix this bug? Is there a way forward for those of us who use the disable_entrypoint_overwrite feature?
I am referring to the ability to run a job without running the job script, as ongoing work should cover this point.
Is there a way forward for those of us who use the disable_entrypoint_overwrite feature?
I added to my todo list to understand how disable_entrypoint_overwrite actually works and see if it is possible to mimic this with the kubernetes executor. You highlighted a great point previously (below).
The same types of people who are using the disable_entrypoint_overwrite feature are going to want the same functionality in their k8s runners.
@ratchade I wanted to flag this issue for you since @timspencer has some additional topics to work through. Thanks a ton for the engagement so far on this!
@ratchade & @DarrenEastman I wanted to check in with you about this issue. What's our next step here, given that this is a major blocker for our customer? CC: @rcain
@ssharer1 I see there is an open MR to revert the change. Romauld is back from PTO next week so will be able to provide more details as to the next set of engineering tasks.
Hi @DarrenEastman and @ratchade wanted to check in to see what the next steps are here. Sorry to bother you but the customer is interested in getting this resolved as soon as possible. I see that there's an open MR for this but I haven't seen much activity on it lately. CC @rcain
In the following MR !4736, I started working on the ability to introduce a setting similar to DisableEntrypointOverwrite like Docker does (see comment !4736 (comment 1898018093))
I unfortunately got pulled onto another emergency, but I believe I will be able to get back to it during this milestone.
I don't know yet though how this issue #37417 will impact my implementation.
@ratchade @DarrenEastman Just wanted to follow back up on the status of this issue and the associated MR, as the milestone was recently pushed back to %17.6.
The customer has a temporary workaround in place, but it is not ideal to maintain during upgrade cycles and they would like to know when they might be able to expect this fix to be delivered. Thanks!