[gpu-driver] switch order of execution of init containers

Tariq Ibrahim requested to merge switch-order into master

In Kubernetes, init containers run sequentially, in the order in which they are declared in the manifest. In this PR, we have the k8s-driver-manager execute before the mofed-validation container. This way, the k8s-driver-manager has the opportunity to clean up all the stale nvidia modules first, and only then does mofed-validation block until MOFED has been installed successfully. This change resolves a deadlock in which the gpu-driver pod is blocked on the mofed pod, while the mofed pod is blocked by the nvpeermem module, which is loaded by the gpu-driver pod.
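
For illustration, the intended ordering could look roughly like the fragment below. This is a minimal sketch and not the actual manifest from this change: only the two init container names come from the description above, while the pod name, the main container name, and all image references are placeholders assumed for the example.

```yaml
# Illustrative sketch only. Names and images marked "placeholder" are assumptions,
# not values taken from the real gpu-driver manifest.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-driver-example            # placeholder name
spec:
  initContainers:
    # Declared first, so it runs first: cleans up stale nvidia modules.
    - name: k8s-driver-manager
      image: example.com/k8s-driver-manager:placeholder
    # Declared second: starts only after the container above has exited
    # successfully, then blocks until MOFED is installed.
    - name: mofed-validation
      image: example.com/mofed-validation:placeholder
  containers:
    - name: driver                    # placeholder main container
      image: example.com/driver:placeholder
```

Because each init container must complete successfully before the next one starts, swapping the two entries is enough to change which step gets to run before the other blocks.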

Signed-off-by: Tariq Ibrahim <tibrahim@nvidia.com>

Edited by Tariq Ibrahim