
Kubernetes: explicit cpu/memory service resources overwrites

What does this MR do?

Allow explicit overwriting of service container resource requests/limits via normalized job environment variables, for services that have an alias. Example:

job1:
  services:
    - name: redis:5
      alias: my-redis1.foo
    - name: private.registry.io/image:tag
      alias: business-app
    # No aliases means no overriding through environment variables
    - name: postgres:11
  variables:
    # Double quote integers where applicable
    KUBERNETES_SERVICE_CPU_LIMIT_MY_REDIS1_FOO: "2"
    KUBERNETES_SERVICE_CPU_REQUEST_MY_REDIS1_FOO: 500m
    KUBERNETES_SERVICE_MEMORY_LIMIT_MY_REDIS1_FOO: 500M
    KUBERNETES_SERVICE_MEMORY_REQUEST_MY_REDIS1_FOO: 64Mi
    KUBERNETES_SERVICE_CPU_LIMIT_BUSINESS_APP: "3"
    KUBERNETES_SERVICE_EPHEMERAL_STORAGE_REQUEST_BUSINESS_APP: "100Mi"
    KUBERNETES_SERVICE_EPHEMERAL_STORAGE_LIMIT_BUSINESS_APP: "1Gi"

These per-service values override the global service resource requests/limits, and they still respect the maximum allowed overrides for service resource requests/limits. Both settings were introduced by !2108 (merged).

This MR also prepares for setting service container resource requests/limits directly from the services: section in .gitlab-ci.yml in the future (another MR on gitlab-ci.yml schema validation would probably be needed for that):

job1:
  services:
    - name: postgres:11
      alias: postgres
      resources:
        cpu:
          limit: "2"
          request: 500m
        memory:
          limit: 1Gi
          request: 128Mi
        ephemeral_storage:
          limit: 1Gi
          request: 100Mi
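
To make the nested shape above concrete, here is a minimal Go sketch of how an image/service definition could carry per-service resources. The type names ResourceSpec and ServiceResources, the JSON tags, and the parseService helper are hypothetical and only mirror the YAML keys shown above; they are not the actual fields added by this MR. The sketch also assumes the job definition reaches the runner as JSON.

```go
package main

import "encoding/json"

// ResourceSpec is a hypothetical limit/request pair for one resource type.
// Values stay strings so Kubernetes quantities such as "500m" or "1Gi"
// can be passed through unchanged.
type ResourceSpec struct {
	Limit   string `json:"limit,omitempty"`
	Request string `json:"request,omitempty"`
}

// ServiceResources is a hypothetical container for the three resource
// types shown in the YAML example (cpu, memory, ephemeral_storage).
type ServiceResources struct {
	CPU              ResourceSpec `json:"cpu,omitempty"`
	Memory           ResourceSpec `json:"memory,omitempty"`
	EphemeralStorage ResourceSpec `json:"ephemeral_storage,omitempty"`
}

// Image is a reduced, illustrative stand-in for the runner's Image struct,
// extended with an optional nested Resources field.
type Image struct {
	Name      string            `json:"name"`
	Alias     string            `json:"alias,omitempty"`
	Resources *ServiceResources `json:"resources,omitempty"`
}

// parseService decodes a single service definition from JSON.
// Resources stays nil when the service does not declare any.
func parseService(payload []byte) (Image, error) {
	var img Image
	err := json.Unmarshal(payload, &img)
	return img, err
}
```

Using a pointer for Resources keeps "no resources declared" distinguishable from "resources declared but empty", which matters when deciding whether to fall back to the global service settings.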

The excellent work of !2108 (merged) needs to be merged first

Why was this MR needed?

Setting uniform service requests/limits for all services within a job works well when the services are similar and have the same needs. However, precious resources can be wasted if the service topology is heterogeneous: a lightweight redis service does not need the same resource requests as a mongodb service. Conversely, another database service may be CPU throttled if the global service CPU limit is too restrictive, yet a user cannot risk raising that global limit, because doing so may lead to node CPU contention if the CPU requests were not precisely tailored. (Update: the same goes for services' ephemeral storage overwrites, which this MR now also handles.)

Are there points in the code the reviewer needs to double check?

  • Environment variable normalization (uppercase conversion, plus replacing . and - with _) may be too basic to always generate a valid shell environment variable name.
  • The extension of the Image struct to allow for nested Resources.
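
As a rough illustration of the first point, the normalization described above can be sketched in Go as follows. The function names normalizeAlias, serviceOverrideVar and isValidEnvName are hypothetical and are not taken from the MR's code; the validity check assumes the usual shell rule that a variable name matches [A-Za-z_][A-Za-z0-9_]*.

```go
package main

import (
	"regexp"
	"strings"
)

// envNameRe encodes the assumed shell rule for variable names:
// a letter or underscore, followed by letters, digits or underscores.
var envNameRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// normalizeAlias applies the basic normalization described above:
// uppercase the alias, then replace '.' and '-' with '_'.
func normalizeAlias(alias string) string {
	upper := strings.ToUpper(alias)
	return strings.NewReplacer(".", "_", "-", "_").Replace(upper)
}

// serviceOverrideVar builds a variable name such as
// KUBERNETES_SERVICE_CPU_LIMIT_MY_REDIS1_FOO from a resource ("CPU"),
// a kind ("LIMIT" or "REQUEST") and a service alias.
func serviceOverrideVar(resource, kind, alias string) string {
	return "KUBERNETES_SERVICE_" + resource + "_" + kind + "_" + normalizeAlias(alias)
}

// isValidEnvName reports whether name is a valid shell variable name
// under the assumed rule above.
func isValidEnvName(name string) bool {
	return envNameRe.MatchString(name)
}
```

With these helpers, the alias my-redis1.foo yields KUBERNETES_SERVICE_CPU_LIMIT_MY_REDIS1_FOO, which is a valid name. An alias containing any other character, for example cache@v2, would pass through this normalization unchanged apart from uppercasing, and the resulting name would fail the isValidEnvName check, which is the case the reviewer is asked to consider.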

Does this MR meet the acceptance criteria?

  • Documentation created/updated
  • Added tests for this feature/bug
  • In case of conflicts with master - branch was rebased

What are the relevant issue numbers?

#25317 (closed)

Edited by Olivier Boukili
