
Kubernetes GitLab Runner jobs being deleted by kube-scheduler during PodInitializing

Summary

Hi,

We use the GitLab Runner Kubernetes executor to run our CI jobs in Kubernetes, where each job is created and executed as a pod. We are currently struggling with a problem: these pods are deleted by the kube-scheduler while they are still in the PodInitializing phase. Any ideas on how to prevent the deletion so that we can analyze the problem further? How can we find the reason for the deletion, or why the pod is deleted during the PodInitializing phase?

Thanks everyone 😄

Relevant logs and/or screenshots

kube-apiserver audit log (job/pod deletion event)
{
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "RequestResponse",
    "auditID": "98fbb934-a476-481b-b024-c2e4f3d4ae10",
    "stage": "ResponseComplete",
    "requestURI": "/api/v1/namespaces/default/pods/runner-jb-gtjeb-project-33850-concurrent-2mw5ks",
    "verb": "delete",
    "user": {
        "username": "system:kube-scheduler",
        "groups": [
            "system:authenticated"
        ]
    },
    "sourceIPs": [
        "XXX"
    ],
    "userAgent": "kube-scheduler/v1.22.17 (linux/amd64) kubernetes/47b89ea/scheduler",
    "objectRef": {
        "resource": "pods",
        "namespace": "default",
        "name": "runner-jb-gtjeb-project-33850-concurrent-2mw5ks",
        "apiVersion": "v1"
    },
    "responseStatus": {
        "metadata": {},
        "code": 200
    },
    "requestObject": {
        "kind": "DeleteOptions",
        "apiVersion": "v1"
    },
    "responseObject": {
        "kind": "Pod",
        "apiVersion": "v1",
        "metadata": {
            "name": "runner-jb-gtjeb-project-33850-concurrent-2mw5ks",
            "generateName": "runner-jb-gtjeb-project-33850-concurrent-2",
            "namespace": "default",
            "uid": "05150dcb-c273-4cb8-8269-9fa065a64e26",
            "resourceVersion": "537970282",
            "creationTimestamp": "2023-03-27T03:12:26Z",
            "deletionTimestamp": "2023-03-27T03:13:12Z",
            "deletionGracePeriodSeconds": 0,
            "managedFields": [
                {
                    "manager": "gitlab-runner 15.4.0 (15-4-stable; go1.17.9; linux",
                    "operation": "Update",
                    "apiVersion": "v1",
                    "time": "2023-03-27T03:12:26Z",
                    "fieldsType": "FieldsV1"
                },
                {
                    "manager": "kube-scheduler",
                    "operation": "Update",
                    "apiVersion": "v1",
                    "time": "2023-03-27T03:12:26Z",
                    "fieldsType": "FieldsV1",
                    "subresource": "status"
                }
            ]
        },
        "spec": {
            "initContainers": [
                {
                    "name": "init-permissions",
                    "image": "XXX.dkr.ecr.eu-central-1.amazonaws.com/mov-base/nil/default/movbase:gitlab-runner-helper_latest",
                    "command": [
                        "sh",
                        "-c",
                        "touch /logs-33850-121061407/output.log \u0026\u0026 (chmod 777 /logs-33850-121061407/output.log || exit 0)"
                    ],
                    "terminationMessagePath": "/dev/termination-log",
                    "terminationMessagePolicy": "File",
                    "imagePullPolicy": "Always"
                }
            ],
            "restartPolicy": "Never",
            "terminationGracePeriodSeconds": 0,
            "dnsPolicy": "ClusterFirst",
            "serviceAccountName": "eks-cluster-beta-runner-cicd-gitlab-runner",
            "serviceAccount": "eks-cluster-beta-runner-cicd-gitlab-runner",
            "nodeName": "XXX.eu-central-1.compute.internal",
            "securityContext": {},
            "affinity": {},
            "schedulerName": "default-scheduler",
            "priority": 0,
            "enableServiceLinks": true,
            "preemptionPolicy": "PreemptLowerPriority"
        },
        "status": {
            "phase": "Pending",
            "conditions": [
                {
                    "type": "Initialized",
                    "status": "False",
                    "lastProbeTime": null,
                    "lastTransitionTime": "2023-03-27T03:13:12Z",
                    "reason": "ContainersNotInitialized",
                    "message": "containers with incomplete status: [init-permissions]"
                },
                {
                    "type": "Ready",
                    "status": "False",
                    "lastProbeTime": null,
                    "lastTransitionTime": "2023-03-27T03:13:12Z",
                    "reason": "ContainersNotReady",
                    "message": "containers with unready status: [build helper]"
                },
                {
                    "type": "ContainersReady",
                    "status": "False",
                    "lastProbeTime": null,
                    "lastTransitionTime": "2023-03-27T03:13:12Z",
                    "reason": "ContainersNotReady",
                    "message": "containers with unready status: [build helper]"
                },
                {
                    "type": "PodScheduled",
                    "status": "True",
                    "lastProbeTime": null,
                    "lastTransitionTime": "2023-03-27T03:13:12Z"
                }
            ],
            "hostIP": "10.XXX",
            "startTime": "2023-03-27T03:13:12Z",
            "initContainerStatuses": [
                {
                    "name": "init-permissions",
                    "state": {
                        "waiting": {
                            "reason": "PodInitializing"
                        }
                    },
                    "lastState": {},
                    "ready": false,
                    "restartCount": 0,
                    "image": "XXX.dkr.ecr.eu-central-1.amazonaws.com/mov-base/nil/default/movbase:gitlab-runner-helper_latest",
                    "imageID": ""
                }
            ],
            "containerStatuses": [
                {
                    "name": "build",
                    "state": {
                        "waiting": {
                            "reason": "PodInitializing"
                        }
                    },
                    "lastState": {},
                    "ready": false,
                    "restartCount": 0,
                    "image": "XXX.dkr.ecr.eu-central-1.amazonaws.com/mov-base/nil/default/movbase:eksdeployment_latest",
                    "imageID": "",
                    "started": false
                },
                {
                    "name": "helper",
                    "state": {
                        "waiting": {
                            "reason": "PodInitializing"
                        }
                    },
                    "lastState": {},
                    "ready": false,
                    "restartCount": 0,
                    "image": "XXX.dkr.ecr.eu-central-1.amazonaws.com/mov-base/nil/default/movbase:gitlab-runner-helper_latest",
                    "imageID": "",
                    "started": false
                }
            ],
            "qosClass": "Burstable"
        }
    },
    "requestReceivedTimestamp": "2023-03-27T03:13:12.313005Z",
    "stageTimestamp": "2023-03-27T03:13:12.353819Z",
    "annotations": {
        "authorization.k8s.io/decision": "allow",
        "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"system:kube-scheduler\" of ClusterRole \"system:kube-scheduler\" to User \"system:kube-scheduler\""
    }
}
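
In case it helps anyone looking at the same symptom, here is a minimal sketch of how events like the one above could be pulled out of an audit log. It assumes the audit log is written as one JSON event per line and uses only fields that appear in the event above. The function name `pod_deletions` and the summary keys are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def pod_deletions(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a short summary for every completed pod delete in an audit log."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        ref = event.get("objectRef") or {}
        # Keep only finished DELETE requests that target pods.
        if event.get("verb") != "delete" or ref.get("resource") != "pods":
            continue
        if event.get("stage") != "ResponseComplete":
            continue
        # responseObject is only present at the RequestResponse audit level.
        status = (event.get("responseObject") or {}).get("status") or {}
        waiting = sorted({
            c["state"]["waiting"].get("reason", "")
            for key in ("initContainerStatuses", "containerStatuses")
            for c in status.get(key, [])
            if "waiting" in c.get("state", {})
        })
        yield {
            "time": event.get("requestReceivedTimestamp"),
            "user": (event.get("user") or {}).get("username"),
            "pod": f'{ref.get("namespace")}/{ref.get("name")}',
            "phase": status.get("phase"),
            "waiting_reasons": waiting,
        }


# Example usage (the file path is an assumption):
# with open("audit.log") as fh:
#     for deletion in pod_deletions(fh):
#         print(deletion)
```

Grouping the results by `user` should show quickly whether all of the unexpected deletions come from `system:kube-scheduler` or whether other clients are involved as well.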
GitLab Logs of failing job

```
Running with gitlab-runner 15.4.0 (43b2dc3d)
  on eks-cluster-beta-runner-cicd-gitlab-runner-5d459d557b-6dcll jB-gtjeB
Preparing the "kubernetes" executor 00:00
Using Kubernetes namespace: default
Using Kubernetes executor with image 996594559435.dkr.ecr.eu-central-1.amazonaws.com/mov-base/nil/default/movbase:eksdeployment_latest ...
Using attach strategy to execute scripts...
Preparing environment 00:48
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
Waiting for pod default/runner-jb-gtjeb-project-33850-concurrent-2mw5ks to be running, status is Pending
    Unschedulable: "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
ERROR: Job failed (system failure): prepare environment: waiting for pod running: pods "runner-jb-gtjeb-project-33850-concurrent-2mw5ks" not found. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
```
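
The `Unschedulable` message repeated in the job log can be broken down into per-reason node counts, which makes it easier to compare across failing jobs. The following is only a rough sketch: `parse_unschedulable` is a hypothetical helper, and it handles just the simple comma-separated form seen in this log, not messages whose reasons contain commas themselves.

```python
import re


def parse_unschedulable(message: str) -> dict:
    """Split '0/10 nodes are available: 6 Insufficient memory, ...' into counts."""
    match = re.match(r"(\d+)/(\d+) nodes are available: (.*?)\.?$", message.strip())
    if match is None:
        raise ValueError(f"unrecognized scheduler message: {message!r}")
    reasons = {}
    for part in match.group(3).split(", "):
        count, _, reason = part.partition(" ")
        if count.isdigit() and reason:
            reasons[reason] = int(count)  # number of nodes rejected for this reason
    return {
        "available": int(match.group(1)),
        "total": int(match.group(2)),
        "reasons": reasons,
    }


result = parse_unschedulable(
    "0/10 nodes are available: 6 Insufficient memory, 9 Insufficient cpu."
)
print(result)
```

For the message above, the per-reason counts add up to 15 on a 10-node cluster, so presumably some nodes are counted under both reasons.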

GitLab Runner v15.4.0