Prototype dynamic adaptation of actions' resource limits & requests for better bin-packing of actions within nodes ("Google Autopilot"-like workload autoscaling)

This prototype will provide a first study and an initial POC of bin-packing optimizations through "Google Autopilot"-like workload autoscaling in Ryax. For this we will build on the code recently developed by Yuqiang during the Empyrean project internship. Since we cannot modify Kubernetes directly, the new optimization techniques will observe the current Ryax workload and propose adaptations through a wrapper on top of Kubernetes.

In more detail, we will perform the following:

  • a) Decide whether it is better to keep each action execution as a Kubernetes Deployment or to drop the Deployment and manipulate the pods directly.
  • b) Track the pods' resource (CPU/RAM) limits and requests and their evolution over time, and feed this data to the algorithms implemented by Yuqiang (a collection sketch is given after this list).
  • c) Consider the two workload autoscaling approaches implemented by Yuqiang: i) rule-based workload autoscaling and ii) AI-based workload autoscaling, both of which calculate new adapted values for limits and requests. Feed the proposed values to the Ryax wrapper, which will try to modify the pods' limits and requests dynamically (a rule-based recommendation sketch follows the list).
  • d) Implement the Ryax wrapper that modifies the limits and requests of the pods (Ryax actions) dynamically. We need to decide whether dynamicity here means deploying another pod and migrating the execution onto it with the new requests/limits, or waiting for the pod to finish and applying the adapted values to the next execution (see the wrapper sketch after this list).
  • e) Propose a synthetic workload to test and showcase the benefits of the new Ryax workload autoscaling techniques for system utilization and bin-packing, ideally showing that they allow us to reserve (and pay for) fewer compute nodes than the initial Kubernetes deployment (a trace-generation sketch closes this list).
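
A minimal sketch for step b), using the official Kubernetes Python client to snapshot the requests/limits of Ryax action pods and, assuming metrics-server is installed, their current CPU/RAM usage. The `ryax` namespace and the `ryax.io/action` label selector are assumptions, not confirmed Ryax conventions.

```python
# Sketch for step (b): snapshot requests/limits of action pods and, assuming
# metrics-server is available, their current CPU/RAM usage.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
core = client.CoreV1Api()
metrics = client.CustomObjectsApi()

NAMESPACE = "ryax"                 # assumed namespace for action pods
LABEL_SELECTOR = "ryax.io/action"  # assumed label identifying action pods

def snapshot_pod_resources():
    """Return {pod_name: {container: {"requests": ..., "limits": ...}}}."""
    snapshot = {}
    pods = core.list_namespaced_pod(NAMESPACE, label_selector=LABEL_SELECTOR)
    for pod in pods.items:
        snapshot[pod.metadata.name] = {
            c.name: {
                "requests": dict(c.resources.requests or {}),
                "limits": dict(c.resources.limits or {}),
            }
            for c in pod.spec.containers
        }
    return snapshot

def snapshot_pod_usage():
    """Current usage as reported by metrics-server (metrics.k8s.io/v1beta1)."""
    usage = {}
    pod_metrics = metrics.list_namespaced_custom_object(
        "metrics.k8s.io", "v1beta1", NAMESPACE, "pods")
    for item in pod_metrics["items"]:
        usage[item["metadata"]["name"]] = {
            c["name"]: c["usage"] for c in item["containers"]
        }
    return usage
```

Sampling these two functions periodically yields the usage history that the autoscaling algorithms consume.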
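For the rule-based variant of step c), a possible shape is an Autopilot-style recommender: take a percentile of recent usage and add a safety margin. The percentiles and margin below are placeholder values, not the ones used in Yuqiang's implementation.

```python
# Sketch for step (c), rule-based variant: derive new requests/limits from a
# history of usage samples (percentile of recent usage plus a safety margin).
# Percentiles and margin are illustrative placeholders.

def percentile(samples, p):
    """p-th percentile of a non-empty list of numbers (nearest rank)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(round(p / 100.0 * (len(ordered) - 1))))
    return ordered[index]

def recommend(cpu_samples_millicores, ram_samples_bytes,
              request_pct=90, limit_pct=99, margin=1.15):
    """Return recommended (requests, limits) dicts for one container."""
    cpu_req = percentile(cpu_samples_millicores, request_pct) * margin
    ram_req = percentile(ram_samples_bytes, request_pct) * margin
    cpu_lim = percentile(cpu_samples_millicores, limit_pct) * margin
    ram_lim = percentile(ram_samples_bytes, limit_pct) * margin
    requests = {"cpu": f"{int(cpu_req)}m", "memory": f"{int(ram_req)}"}
    limits = {"cpu": f"{int(cpu_lim)}m", "memory": f"{int(ram_lim)}"}
    return requests, limits
```

The AI-based variant would plug into the same interface: usage history in, proposed requests/limits out.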
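For step d), a sketch of the wrapper side, assuming actions stay behind Deployments. Because resizing a running pod in place requires the InPlacePodVerticalScaling feature gate (alpha as of Kubernetes 1.27), this sketch takes the second option discussed above: patch the owning Deployment so the adapted values apply to the next execution. The deployment and container names are illustrative.

```python
# Sketch for step (d): apply a recommendation by patching the owning
# Deployment, so new pods pick up the adapted requests/limits.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

def apply_recommendation(deployment, container, requests, limits,
                         namespace="ryax"):
    """Strategic-merge-patch the pod template of an action's Deployment."""
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": container,
                        "resources": {"requests": requests, "limits": limits},
                    }]
                }
            }
        }
    }
    apps.patch_namespaced_deployment(deployment, namespace, patch)

# Example (names are hypothetical):
# requests, limits = recommend(cpu_history, ram_history)
# apply_recommendation("my-action-deployment", "action", requests, limits)
```

If the pod-migration option is chosen instead, the same recommendation would feed the creation of a replacement pod rather than a Deployment patch.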
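For step e), one cheap starting point is a synthetic usage trace for a single action: mostly low, steady usage with occasional bursts, which is exactly the pattern where static requests waste capacity. The distribution parameters below are arbitrary.

```python
# Sketch for step (e): generate a synthetic (cpu_millicores, ram_bytes) trace
# to feed the recommender and compare the resulting bin-packing against
# static requests/limits. All parameters are illustrative.
import random

def synthetic_trace(n_samples=500, base_cpu_m=150, burst_cpu_m=600,
                    base_ram=256 * 2**20, burst_ram=512 * 2**20,
                    burst_prob=0.05, seed=42):
    """Mostly low usage with occasional bursts."""
    rng = random.Random(seed)
    trace = []
    for _ in range(n_samples):
        if rng.random() < burst_prob:
            trace.append((burst_cpu_m, burst_ram))
        else:
            trace.append((int(rng.gauss(base_cpu_m, 20)), base_ram))
    return trace

cpu_history = [cpu for cpu, _ in synthetic_trace()]
ram_history = [ram for _, ram in synthetic_trace()]
```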

Note: This issue considers only CPU and RAM resources; GPU-level workload autoscaling will be tackled in a separate issue.
