Backend: Add logic for using the agent's default/max cpu/memory during reconciliation
MR: Use default and max workspace resources on work... (!139209 - merged)
Description
As a user, I want to be able to specify the default and max cpu/memory to be used for all workspaces provisioned through an agent.
If any container in the generated Kubernetes Deployment does not have cpu/memory requests/limits specified, the default values from the agent's default_resources_per_workspace_container are added to it.
A Kubernetes Resource Quota is also generated using the agent's max_resources_per_workspace.
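As a rough illustration of the data involved, here is a minimal Go sketch that models the two agent settings as Kubernetes-style resource requirements. The variable names and concrete quantities below are purely illustrative assumptions, not values shipped by GitLab.

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// default_resources_per_workspace_container: applied per container when a
// cpu/memory request or limit is missing. (Values are illustrative only.)
var defaultResourcesPerWorkspaceContainer = corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("250m"),
		corev1.ResourceMemory: resource.MustParse("256Mi"),
	},
	Limits: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("500m"),
		corev1.ResourceMemory: resource.MustParse("512Mi"),
	},
}

// max_resources_per_workspace: the ceiling for the whole workspace, enforced
// through the generated Resource Quota. (Values are illustrative only.)
var maxResourcesPerWorkspace = corev1.ResourceRequirements{
	Requests: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("2"),
		corev1.ResourceMemory: resource.MustParse("4Gi"),
	},
	Limits: corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("4"),
		corev1.ResourceMemory: resource.MustParse("8Gi"),
	},
}

func main() {}
```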
Acceptance Criteria
- If any container in the generated Kubernetes Deployment does not have cpu/memory requests/limits specified, the default values from the agent's default_resources_per_workspace_container are added to it.
- A Kubernetes Resource Quota is generated using the agent's max_resources_per_workspace. It is only sent during full reconciliation or if force_include_all_resources is true.
- Creating a workspace from a devfile with cpu/memory requests/limits higher than what the agent allows results in the workspace being reported as Starting for 600 seconds, then as Failed after 600 seconds.
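The first two criteria could look roughly like the following Go sketch, written against the Kubernetes API types. This is not the actual GitLab backend code; the function names (applyContainerDefaults, buildResourceQuota) and the quota object name are assumptions made for illustration.

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// applyContainerDefaults fills in any cpu/memory request or limit that is
// missing on a container, using the agent's
// default_resources_per_workspace_container value. Anything already set on
// the container is left untouched.
func applyContainerDefaults(c *corev1.Container, defaults corev1.ResourceRequirements) {
	if c.Resources.Requests == nil {
		c.Resources.Requests = corev1.ResourceList{}
	}
	if c.Resources.Limits == nil {
		c.Resources.Limits = corev1.ResourceList{}
	}
	for name, qty := range defaults.Requests {
		if _, ok := c.Resources.Requests[name]; !ok {
			c.Resources.Requests[name] = qty
		}
	}
	for name, qty := range defaults.Limits {
		if _, ok := c.Resources.Limits[name]; !ok {
			c.Resources.Limits[name] = qty
		}
	}
}

// buildResourceQuota converts the agent's max_resources_per_workspace value
// into a Resource Quota for the workspace namespace. Per the acceptance
// criteria, it would only be sent during full reconciliation or when
// force_include_all_resources is true.
func buildResourceQuota(namespace string, max corev1.ResourceRequirements) *corev1.ResourceQuota {
	return &corev1.ResourceQuota{
		ObjectMeta: metav1.ObjectMeta{Name: "workspace-quota", Namespace: namespace},
		Spec: corev1.ResourceQuotaSpec{
			Hard: corev1.ResourceList{
				corev1.ResourceRequestsCPU:    max.Requests[corev1.ResourceCPU],
				corev1.ResourceRequestsMemory: max.Requests[corev1.ResourceMemory],
				corev1.ResourceLimitsCPU:      max.Limits[corev1.ResourceCPU],
				corev1.ResourceLimitsMemory:   max.Limits[corev1.ResourceMemory],
			},
		},
	}
}

func main() {}
```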
Impact Assessment
If default_resources_per_workspace_container or max_resources_per_workspace is updated (and successfully stored in the DB), the change immediately applies to all existing workspaces and results in their restart.
Reason for not setting the defaults during workspace creation
If we used the defaults (default_resources_per_workspace_container) configured on the agent during workspace creation (by modifying the devfile before storing it in the DB), any later change to default_resources_per_workspace_container would not be reflected in existing workspaces, and they would not be restarted.
However, this behaviour would contrast with the way max_resources_per_workspace is handled. Since the Kubernetes Resource Quota (like the Kubernetes Deployment) is generated during each reconciliation, any update to max_resources_per_workspace results in an updated Kubernetes Resource Quota. By its nature, a Resource Quota only applies to newly created Pods, so in that sense it would not affect existing workspaces. But if the workspace pod is rescheduled by Kubernetes (which can happen at any time for various reasons), the updated Resource Quota comes into effect and the workspace ends up in a Failed state if it violates the max cpu/memory requests/limits.
To make this behaviour more consistent, applying default_resources_per_workspace_container during workspace reconciliation, as opposed to workspace creation, forces a new workspace pod to be created. Because a new pod is created, the updated Resource Quota comes into effect, and the workspace ends up in a Failed state if it violates the max cpu/memory requests/limits. Since this behaviour is deterministic, the agent administrator can schedule the activity of updating default_resources_per_workspace_container and max_resources_per_workspace: all workspaces associated with the agent are then restarted, and any workspace that violates the max cpu/memory requests/limits ends up in a Failed state.
Reason for not using Limit Range to set the defaults during workspace reconciliation
If you specify a container's limit but not its request (in the devfile) and rely on a Kubernetes Limit Range to set the defaults, the container's memory request is set to match its memory limit; the container is not assigned the default memory request value from the Limit Range.
This is standard Kubernetes behaviour and does not make sense in our case. Instead, we will deep merge the container's resources with the agent's default resources such that any key already present in the container's resources takes precedence.
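A small, hypothetical Go example of this precedence (again using Kubernetes API types rather than the actual backend code): the container below specifies only a memory limit, so after the merge it keeps that limit and receives the agent's default memory request, instead of having the request copied from its limit as a Limit Range would do. The quantities and the helper name mergeWithDefaults are illustrative.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// mergeWithDefaults deep merges the agent defaults into the container's
// resources: keys already present on the container win, missing keys are
// taken from the defaults.
func mergeWithDefaults(res, defaults corev1.ResourceRequirements) corev1.ResourceRequirements {
	merged := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{},
		Limits:   corev1.ResourceList{},
	}
	for name, qty := range defaults.Requests {
		merged.Requests[name] = qty
	}
	for name, qty := range res.Requests {
		merged.Requests[name] = qty // container value takes precedence
	}
	for name, qty := range defaults.Limits {
		merged.Limits[name] = qty
	}
	for name, qty := range res.Limits {
		merged.Limits[name] = qty // container value takes precedence
	}
	return merged
}

func main() {
	// Devfile container: memory limit only, no memory request.
	container := corev1.ResourceRequirements{
		Limits: corev1.ResourceList{corev1.ResourceMemory: resource.MustParse("1Gi")},
	}
	// Agent defaults (illustrative values).
	defaults := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("250m"),
			corev1.ResourceMemory: resource.MustParse("256Mi"),
		},
		Limits: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("500m"),
			corev1.ResourceMemory: resource.MustParse("512Mi"),
		},
	}

	merged := mergeWithDefaults(container, defaults)

	memRequest := merged.Requests[corev1.ResourceMemory]
	memLimit := merged.Limits[corev1.ResourceMemory]
	// Prints "256Mi 1Gi": the request comes from the agent default and the
	// devfile's limit is kept, whereas a Limit Range would have set the
	// request to 1Gi to match the limit.
	fmt.Println(memRequest.String(), memLimit.String())
}
```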