Use default and max workspace resources on workspace reconcile

Vishal Tak requested to merge vtak/resources_reoncile into master

What does this MR do and why?

Issue: Backend: Add logic for using the agent's defaul... (#427144 - closed)

Use default and max workspace resources on workspace reconcile.

With the Workspace config_version 2 migration (!131402 - merged), we no longer need desired_config_generator_prev1 and devfile_parser_prev1, since all non-terminated workspaces have been migrated to config version 2. However, this change requires introducing a new config version. So instead of removing the files in one MR and then reintroducing them here, I've updated them directly in this MR.

This MR is broken down into 4 commits for easier reviewing.

  • Update previous versions of workspace resources generation
    • Update desired_config_generator_prev1 with contents of desired_config_generator
    • Update devfile_parser_prev1 with contents of devfile_parser
    • Update remote development shared contexts
  • Apply container resource defaults and create resource quota
    • Generate a resource quota using the agent's max_resources_per_workspace.
    • Add an annotation containing the SHA256 of max_resources_per_workspace to force a workspace restart when the value changes (see the sketch after this list).
    • Deep merge the default_resources_per_workspace_container into the containers and init containers of the workspace to apply the defaults.
  • Update version of newly created workspaces
  • Update naming convention for config versions (Related: Revisit versioning of #create_config_to_apply i... (#425227 - closed))
    • Rename all _prev1 files to _v2 to make the workspace config version explicit in these files
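
As an aside on the second commit, the restart-on-change behaviour works by stamping the workspace pod with a hash of the agent setting, so any change to max_resources_per_workspace yields a different pod spec and Kubernetes recreates the pod. The annotation key and value below are hypothetical placeholders to illustrate the shape, not the exact names used in this MR:

    metadata:
      annotations:
        # Hypothetical annotation key; the real key name in the MR may differ.
        # The value is the SHA256 hex digest of the agent's
        # max_resources_per_workspace value, recomputed on every reconcile.
        workspaces.gitlab.com/max-resources-per-workspace-sha256: "<sha256-of-max_resources_per_workspace>"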

This MR will be followed by a migration (Rails: Migrate workspaces with config_version=2... (#434494 - closed)) to move non-terminated workspaces from config version 2 to 3.

How to set up and validate locally

Setup

  1. Set the remote development agent config so that max_resources_per_workspace and default_resources_per_workspace_container are not set:

    remote_development:
      enabled: true
      dns_zone: workspaces.localdev.me
      network_policy:
        enabled: true
        egress:
          - allow: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
          - allow: 172.16.123.1/32
  2. Create a devfile in a project

    schemaVersion: 2.2.0
    components:
    - name: gitlab-ui
      attributes:
        gl/inject-editor: true
      container:
        image: registry.gitlab.com/gitlab-org/remote-development/gitlab-remote-development-docs/debian-bullseye-ruby-3.2.patched-golang-1.20-rust-1.65-node-18.16-postgresql-15@sha256:216b9bf0555349f4225cd16ea37d7a627f2dad24b7e85aa68f4d364319832754
        env:
        - name: STORYBOOK_HOST
          value: "0.0.0.0"
        endpoints:
        - name: storybook
          targetPort: 9001
          secure: true
          protocol: http
        memoryLimit: "2048Mi"
        cpuLimit: "2.3"
  3. Create a new workspace for this project. Open the workspace, create a new file in it called TEST.md, and type something into it. This file will be used later to verify that no workspace data is lost across restarts.

Verifying default_resources_per_workspace_container behaviour

  1. Set the default_resources_per_workspace_container in the remote development agent config

    remote_development:
      enabled: true
      dns_zone: workspaces.localdev.me
      network_policy:
        enabled: true
        egress:
        - allow: '0.0.0.0/0'
          except:
          - '10.0.0.0/8'
          - '172.16.0.0/12'
          - '192.168.0.0/16'
        - allow: '172.16.123.1/32'
      default_resources_per_workspace_container:
        limits:
          cpu: "1.5"
          memory: "786Mi"
        requests:
          cpu: "0.6"
          memory: "512Mi"
  2. This will result in the pod for the existing workspace being terminated and a new pod being created, because default_resources_per_workspace_container is merged into the containers' resources when the workspace's config is generated during reconciliation. You can verify this by running kubectl describe po and checking the container's resources.requests, which should match the agent's default_resources_per_workspace_container.requests (see the example after this list).

  3. Once the workspace is ready, open the workspace and verify that it contains the TEST.md file.

  4. Thus, any change in the agent's default_resources_per_workspace_container results in all workspaces being immediately restarted and the value being enforced without losing any data in the workspace.
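
To illustrate the merge, with the agent config above and the setup devfile (which sets only memoryLimit and cpuLimit), the main container's effective resources should come out roughly as below. This is a hand-worked sketch of the expected deep-merge result, assuming devfile values win over agent defaults (consistent with the quota violation in the next section), not output captured from a cluster:

    resources:
      limits:
        cpu: "2.3"      # from the devfile's cpuLimit; not overridden by the default
        memory: 2048Mi  # from the devfile's memoryLimit
      requests:
        cpu: "0.6"      # from the agent's default_resources_per_workspace_container
        memory: 512Mi   # from the agent's default_resources_per_workspace_container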

Verifying max_resources_per_workspace behaviour

  1. Set the max_resources_per_workspace in the remote development agent config

    remote_development:
      enabled: true
      dns_zone: workspaces.localdev.me
      network_policy:
        enabled: true
        egress:
        - allow: '0.0.0.0/0'
          except:
          - '10.0.0.0/8'
          - '172.16.0.0/12'
          - '192.168.0.0/16'
        - allow: '172.16.123.1/32'
      default_resources_per_workspace_container:
        limits:
          cpu: "1.5"
          memory: "786Mi"
        requests:
          cpu: "0.6"
          memory: "512Mi"
      max_resources_per_workspace:
        limits:
          cpu: "5"
          memory: "5Gi"
        requests:
          cpu: "3"
          memory: "3Gi"
  2. This will result in the pod for the existing workspace being terminated and a new pod being created, because max_resources_per_workspace has changed and it is used to generate the Kubernetes ResourceQuota during reconciliation. The restart happens because we add an annotation on the workspace pod containing the SHA256 of the agent's max_resources_per_workspace value, so any change to the value changes the pod spec. You can verify this by running kubectl describe po and checking the pod's annotations, and you can inspect the generated resource quota by running kubectl describe resourcequota (see the example after this list).

  3. Once the workspace is ready, open the workspace and verify that it contains the TEST.md file.

  4. Thus, any change in the agent's max_resources_per_workspace results in all workspaces being immediately restarted and the value being enforced without losing any data in the workspace.

  5. Update the agent's max_resources_per_workspace.limits.cpu to 2 and max_resources_per_workspace.requests.cpu to 1.8.

  6. This will result in the pod for the existing workspace being terminated but no new pod being created. The workspace's devfile has cpuLimit set to 2.3, while the agent's max_resources_per_workspace.limits.cpu is now 2, so the pod violates the quota. You can verify this by running kubectl get rs -o yaml and checking the status of the latest ReplicaSet, which will contain a message similar to - message: 'pods "workspace-10-1-1mljkt-7bbcb9698-qp95n" is forbidden: exceeded quota. Eventually (after 10 minutes), the workspace will have an actual state of Failed.
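
For reference, a ResourceQuota generated from the updated agent config above would look roughly like the following. The metadata values are placeholders (the actual name and namespace are chosen by the agent), while the spec.hard keys are the standard Kubernetes compute quota names:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: workspace-quota            # placeholder name
      namespace: <workspace-namespace> # placeholder namespace
    spec:
      hard:
        limits.cpu: "2"
        limits.memory: 5Gi
        requests.cpu: "1.8"
        requests.memory: 3Gi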

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
