Warning FailedCreatePodSandBox blocks the execution of actions
This issue appears in the latest release of Ryax.
The problem has been seen on all three of staging.ryax.io, empyrean.ryax.io and ai.ryax.io, for some actions in different workflows. For example, it has been observed with the workflows "video detection" and "Finetune LLM on custom data", but I'm sure it can be reproduced with others as well.
Once the workflow is launched, the action seems to be stuck in the ContainerCreating state, as reported by:
kubectl get pods -n ryaxns-execs
Describing the pod then shows the following:
....
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m17s default-scheduler 0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
Normal Scheduled 2m53s default-scheduler Successfully assigned ryaxns-execs/wiki-article-fetcher-1-1-5-3ko2-658df6447-l8px8 to scw-k8s-mystifying-alvin-cpu-pool-370786e2e227
Normal TriggeredScaleUp 4m10s cluster-autoscaler pod triggered scale-up: [{e9d3f12e-90b0-48a6-a6b0-972ea7870faa 3->4 (max: 5)}]
Warning FailedCreatePodSandBox 2m38s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "73d52a4344acdd9ae98e3d066490840de8e9f04ab4196890fca757e45d6e8c59": cannot start a stopped process: unknown
Warning FailedCreatePodSandBox 2m20s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "55841c189e03b209c1a332f7323d0c0542571906b4d5f0eb4a995a2f07dfde52": cannot start a stopped process: unknown
Warning FailedCreatePodSandBox 2m5s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "cb526782018c471971ed0e66c4b4971310216321f080db20fcd6e34f2a9b6527": cannot start a stopped process: unknown
Warning FailedCreatePodSandBox 113s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "a619b59fca5cc613281e313e7f288e55ee3847ccf9ced05787324977dc3b53fd": cannot start a stopped process: unknown
Warning FailedCreatePodSandBox 100s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "e2bcb21d9c14aed68181a066fb0145715f2e9b871214f17743d00ec79e8056d2": cannot start a stopped process: unknown
Warning FailedCreatePodSandBox 86s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "a04459ed0dfb8b3704a92e9d8cbb9ef69b7c99d075805eabce4a9d435f0564c3": cannot start a stopped process: unknown
Warning FailedCreatePodSandBox 74s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "0848b018f725146d51a86a064da2d5eea6cd2c7ab365d2c22978f78d220e300c": cannot start a stopped process: unknown
Warning FailedCreatePodSandBox 62s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "f4f18c9e53699d4c75d976d5377f112ebd1c453d746ef87a49d38570e6992eaf": cannot start a stopped process: unknown
Warning FailedCreatePodSandBox 49s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "6784962d5541253564375957e962f698adc8de3ce80a44dd39c907733276e9a3": cannot start a stopped process: unknown
Warning FailedCreatePodSandBox 9s (x3 over 37s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "f02147994b185eddb570c2e72ce80e120b2c3ab63412670c60e401237ee13afb": cannot start a stopped process: unknown
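Since the problem shows up across several platforms and workflows, a small filter over the namespace events could list every affected pod without describing each one. This is a minimal sketch, assuming the JSON printed by kubectl get events -n ryaxns-execs -o json is passed in as a string; the function name sandbox_failures is hypothetical and not part of Ryax.

```python
import json
from collections import Counter


def sandbox_failures(events_json: str) -> Counter:
    """Hypothetical helper: count FailedCreatePodSandBox events per pod.

    Expects the JSON output of `kubectl get events -n ryaxns-execs -o json`.
    """
    data = json.loads(events_json)
    failures: Counter = Counter()
    for event in data.get("items", []):
        if event.get("reason") != "FailedCreatePodSandBox":
            continue
        obj = event.get("involvedObject", {})
        if obj.get("kind") != "Pod":
            continue
        # "count" aggregates repeated occurrences (the "x3 over 37s" seen in
        # describe); it may be missing, in which case the event counts once.
        failures[obj.get("name", "<unknown>")] += event.get("count") or 1
    return failures
```

Calling sandbox_failures(sys.stdin.read()) from a small script and piping the kubectl command into it would give a per-pod tally, with the most affected pods available via most_common().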
Initially we thought it was a Scaleway infrastructure problem, but we now think it is most probably related to the very low memory allocation set by Intelliscale.
Looking at the full describe output, we see this:
$ kubectl describe pods wiki-article-fetcher-1-1-5-3ko2-658df6447-l8px8 -n ryaxns-execs
Name: wiki-article-fetcher-1-1-5-3ko2-658df6447-l8px8
Namespace: ryaxns-execs
Priority: 0
Service Account: default
Node: scw-k8s-mystifying-alvin-cpu-pool-370786e2e227/172.16.8.18
Start Time: Thu, 30 Oct 2025 16:51:18 +0100
Labels: action_deployment_id=ActionDeployment-1761839172-9ziv3ko2
kind=ACTION
pod-template-hash=658df6447
ryax=action
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/wiki-article-fetcher-1-1-5-3ko2-658df6447
Containers:
wiki-article-fetcher-1-1-5-3ko2:
Container ID:
Image: registry.ai.ryax.io/25160b03-1a27-49a5-b453-8db52a993e57:1.1.5
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Limits:
memory: 2722610
Requests:
memory: 2722610
Environment:
The memory limits and requests are extremely low and were almost certainly set by Intelliscale, since this happens starting from the 2nd execution, never on the 1st one.
Why did Intelliscale set about 2.6 MiB (2722610 bytes)? We should avoid providing such a low limit; it has to be at least 64Mi.
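The value in the pod spec has no unit suffix, so Kubernetes reads it as a plain byte count. To double-check the order of magnitude against the 64Mi floor, here is a minimal sketch of a memory quantity converter; it covers only plain numbers and the common decimal (k, M, G, ...) and binary (Ki, Mi, Gi, ...) suffixes, and the name memory_bytes is hypothetical.

```python
import re
from decimal import ROUND_CEILING, Decimal

# Decimal and binary suffixes for Kubernetes memory quantities.
# Exponent forms (e.g. "1e6") and the milli suffix "m" are not handled here.
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}
_QUANTITY = re.compile(r"^(\d+(?:\.\d+)?)([A-Za-z]*)$")


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '64Mi' or '2722610' to bytes.

    Fractional byte counts are rounded up.
    """
    match = _QUANTITY.match(quantity.strip())
    if not match or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    value = Decimal(number) * _SUFFIXES[suffix]
    return int(value.to_integral_value(rounding=ROUND_CEILING))


MIB = 2**20
observed = memory_bytes("2722610")
print(f"{observed / MIB:.2f} MiB")       # prints "2.60 MiB"
print(observed < memory_bytes("64Mi"))  # prints "True"
```

So the limit set on the pod is roughly 25 times smaller than the 64Mi minimum proposed here.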
In the meantime, let's propose a quick hotfix, because this issue is blocking the new release.
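One possible shape for such a hotfix is to clamp whatever Intelliscale recommends to the 64Mi floor before it is written into the container resources. This is a sketch under assumptions: it assumes the recommendation is available as a byte count (as in the pod above) and that requests and limits stay equal, as observed; clamp_memory and memory_resources are hypothetical names, not existing Ryax or Intelliscale functions.

```python
from typing import Dict

# Floor proposed in this issue: never go below 64Mi.
MIN_MEMORY_BYTES = 64 * 2**20


def clamp_memory(recommended_bytes: int, floor_bytes: int = MIN_MEMORY_BYTES) -> int:
    """Hypothetical: return the recommendation, raised to the floor if too low."""
    if recommended_bytes < 0:
        raise ValueError("memory recommendation must be non-negative")
    return max(recommended_bytes, floor_bytes)


def memory_resources(recommended_bytes: int) -> Dict[str, Dict[str, str]]:
    """Hypothetical: build a container resources block with requests == limits.

    A plain byte count is a valid Kubernetes memory quantity, so the clamped
    value is emitted without a unit suffix, like the original value.
    """
    memory = str(clamp_memory(recommended_bytes))
    return {"requests": {"memory": memory}, "limits": {"memory": memory}}
```

With the value observed above, memory_resources(2722610) returns {'requests': {'memory': '67108864'}, 'limits': {'memory': '67108864'}}, while recommendations already above 64Mi pass through unchanged. Clamping at the point where the recommendation is applied keeps the fix small and leaves the Intelliscale estimation logic itself untouched until the root cause is understood.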