Warning FailedCreatePodSandBox blocks the execution of action

This issue has been observed in the latest release of Ryax.

The problem has been seen on staging.ryax.io, empyrean.ryax.io and ai.ryax.io, for some actions and across different workflows. It has been observed, for example, with the "video detection" and "Finetune LLM on custom data" workflows, but I'm sure it can be reproduced with others as well.

Once the workflow is launched, the action appears to be stuck in the ContainerCreating state when we run:

kubectl get pods -n ryaxns-execs

When we then describe the pod, we see the following:

....
Events:
  Type     Reason                  Age               From                Message
  ----     ------                  ----              ----                -------
  Warning  FailedScheduling        4m17s             default-scheduler   0/6 nodes are available: 3 node(s) didn't match Pod's node affinity/selector, 3 node(s) were unschedulable. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Normal   Scheduled               2m53s             default-scheduler   Successfully assigned ryaxns-execs/wiki-article-fetcher-1-1-5-3ko2-658df6447-l8px8 to scw-k8s-mystifying-alvin-cpu-pool-370786e2e227
  Normal   TriggeredScaleUp        4m10s             cluster-autoscaler  pod triggered scale-up: [{e9d3f12e-90b0-48a6-a6b0-972ea7870faa 3->4 (max: 5)}]
  Warning  FailedCreatePodSandBox  2m38s             kubelet             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "73d52a4344acdd9ae98e3d066490840de8e9f04ab4196890fca757e45d6e8c59": cannot start a stopped process: unknown
  Warning  FailedCreatePodSandBox  2m20s             kubelet             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "55841c189e03b209c1a332f7323d0c0542571906b4d5f0eb4a995a2f07dfde52": cannot start a stopped process: unknown
  Warning  FailedCreatePodSandBox  2m5s              kubelet             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "cb526782018c471971ed0e66c4b4971310216321f080db20fcd6e34f2a9b6527": cannot start a stopped process: unknown
  Warning  FailedCreatePodSandBox  113s              kubelet             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "a619b59fca5cc613281e313e7f288e55ee3847ccf9ced05787324977dc3b53fd": cannot start a stopped process: unknown
  Warning  FailedCreatePodSandBox  100s              kubelet             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "e2bcb21d9c14aed68181a066fb0145715f2e9b871214f17743d00ec79e8056d2": cannot start a stopped process: unknown
  Warning  FailedCreatePodSandBox  86s               kubelet             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "a04459ed0dfb8b3704a92e9d8cbb9ef69b7c99d075805eabce4a9d435f0564c3": cannot start a stopped process: unknown
  Warning  FailedCreatePodSandBox  74s               kubelet             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "0848b018f725146d51a86a064da2d5eea6cd2c7ab365d2c22978f78d220e300c": cannot start a stopped process: unknown
  Warning  FailedCreatePodSandBox  62s               kubelet             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "f4f18c9e53699d4c75d976d5377f112ebd1c453d746ef87a49d38570e6992eaf": cannot start a stopped process: unknown
  Warning  FailedCreatePodSandBox  49s               kubelet             Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "6784962d5541253564375957e962f698adc8de3ce80a44dd39c907733276e9a3": cannot start a stopped process: unknown
  Warning  FailedCreatePodSandBox  9s (x3 over 37s)  kubelet             (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "f02147994b185eddb570c2e72ce80e120b2c3ab63412670c60e401237ee13afb": cannot start a stopped process: unknown

Initially we thought it was a Scaleway infrastructure problem, but we now think it is most probably related to Intelliscale assigning too little memory.

Looking at the describe output in more detail, we see this:

$ kubectl describe pods wiki-article-fetcher-1-1-5-3ko2-658df6447-l8px8 -n ryaxns-execs
Name:             wiki-article-fetcher-1-1-5-3ko2-658df6447-l8px8
Namespace:        ryaxns-execs
Priority:         0
Service Account:  default
Node:             scw-k8s-mystifying-alvin-cpu-pool-370786e2e227/172.16.8.18
Start Time:       Thu, 30 Oct 2025 16:51:18 +0100
Labels:           action_deployment_id=ActionDeployment-1761839172-9ziv3ko2
                  kind=ACTION
                  pod-template-hash=658df6447
                  ryax=action
Annotations:      <none>
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    ReplicaSet/wiki-article-fetcher-1-1-5-3ko2-658df6447
Containers:
  wiki-article-fetcher-1-1-5-3ko2:
    Container ID:   
    Image:          registry.ai.ryax.io/25160b03-1a27-49a5-b453-8db52a993e57:1.1.5
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      memory:  2722610
    Requests:
      memory:  2722610
    Environment:

The memory limits and requests seem particularly low and were almost certainly set by Intelliscale, since this happens starting from the 2nd execution, never the 1st one.
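To check whether other pods are affected in the same way, the memory limits can be scanned programmatically. The following is only a minimal sketch: it assumes the pod list comes from `kubectl get pods -n ryaxns-execs -o json`, handles only plain-integer and common suffixed memory quantities, and the names `parse_memory` and `low_memory_containers` are hypothetical helpers, not part of Ryax or Intelliscale.

```python
import re

# Multipliers for the Kubernetes quantity suffixes handled by this sketch.
# Other notations (exponents, milli-units) are deliberately not supported.
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}

MIN_MEMORY_BYTES = 64 * 2**20  # 64Mi, the floor proposed in this issue


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '2722610' or '64Mi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    return int(number) * _SUFFIXES[suffix or ""]


def low_memory_containers(pod_list: dict, floor: int = MIN_MEMORY_BYTES):
    """Yield (pod, container, limit_bytes) for memory limits below the floor."""
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for container in pod["spec"]["containers"]:
            limit = container.get("resources", {}).get("limits", {}).get("memory")
            if limit is None:
                continue  # no memory limit set on this container
            limit_bytes = parse_memory(limit)
            if limit_bytes < floor:
                yield pod_name, container["name"], limit_bytes
```

The pod list would typically be loaded with `json.load(sys.stdin)` after piping the `kubectl` output into the script; any container reported here is a candidate for the same sandbox failure.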

Why did Intelliscale set about 2.6Mi (2722610 bytes)? We should avoid providing such a low limit. It has to start from at least 64Mi.
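One possible shape for the fix is to clamp whatever value is recommended to a fixed floor before it is written into the pod spec. This is a sketch under assumptions only: `clamp_memory` and `format_mib` are hypothetical names, and how Intelliscale actually computes or applies its recommendation is not shown in this issue.

```python
MIN_MEMORY_BYTES = 64 * 2**20  # 64Mi floor proposed in this issue


def clamp_memory(recommended_bytes: int, floor_bytes: int = MIN_MEMORY_BYTES) -> int:
    """Return the recommendation, raised to the floor if it is lower."""
    return max(recommended_bytes, floor_bytes)


def format_mib(num_bytes: int) -> str:
    """Render a byte count in Mi with one decimal, for logs and debugging."""
    return f"{num_bytes / 2**20:.1f}Mi"


if __name__ == "__main__":
    observed = 2722610  # value seen in the describe output
    fixed = clamp_memory(observed)
    print(format_mib(observed), "->", format_mib(fixed))  # 2.6Mi -> 64.0Mi
```

Recommendations already above the floor pass through unchanged, so the clamp would only affect the pathological case described here.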

But let's try to propose a quick hotfix, because this issue is blocking the new release.

Edited by Yiannis Georgiou