Allow Operational Container Scanning maximum memory setting to be configured via Agent configuration to avoid OOMKilled errors (#384238) · Issues · GitLab.org / GitLab

### Proposal

[Operational Container Scanning](https://docs.gitlab.com/ee/user/clusters/agent/vulnerabilities.html) no longer depends on having the Starboard Operator installed and can be scheduled via the [Agent configuration](https://docs.gitlab.com/ee/user/clusters/agent/vulnerabilities.html#enable-via-agent-configuration). However, there is currently no way to configure the maximum amount of memory available to the scanner pods. This setting is essential to prevent pods from failing with `OOMKilled` errors when the images are large (at present the memory limit is [hard-coded to 500MB](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/internal/module/starboard_vulnerability/agent/starboard_config.go#L31)).

## Updates

Note that the implementation plan has been updated based on feedback from [this thread](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/merge_requests/949#note_1381989637) in the MR.

## Implementation Plan

1. Add `resource_requirements` to the `container_scanning` config of the [`agent config` file](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/eac853913da2c08a37f293469da207e4a0453b93/pkg/agentcfg/agentcfg.proto#L246)
   - Example config

     ```plaintext
     container_scanning:
       cadence: '10 * * * *'
       vulnerability_report:
         namespaces:
           - default
       resource_requirements:
         limits:
           cpu: 100m
           memory: 500Mi
         requests:
           cpu: 100m
           memory: 500Mi
     ```

2. Update module [logic](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/b32973971e1c94f392e477d5413d4bf4c980d510/internal/module/starboard_vulnerability/agent/module.go#L40-68) to parse `resource_requirements` and `scan config`^
   - If only `agent_config` is configured, and it has only `scan config`
     * Scanner should use `agent_config's` `scan config` with **default** `resource_requirements`
   - If only `agent_config` is configured, and it has both `scan config` and `resource_requirements`
     * Scanner should use `agent_config's` `scan config` as well as **configured** `resource_requirements`
   - If only `scan_execution_policy` is configured
     * Scanner should use `scan_execution_policy's` `scan config` with **default** `resource_requirements`
   - If `scan_execution_policy` is configured and `agent_config` has both `scan config` and `resource_requirements`
     * Scanner should use `scan_execution_policy's` `scan config` as well as **configured** `resource_requirements`
   - If `scan_execution_policy` is configured and `agent_config` has only `resource_requirements`
     * Scanner should use `scan_execution_policy's` `scan config` with **configured** `resource_requirements`

   ^ `scan config` refers to `cadence` and `vulnerability_report`

3. Update operational container scanning docs to:
   - Specify that `scan_execution_policy` takes precedence over `agent_config` if both are configured
   - Include instructions on configuring resource requirements

## ~~Implementation Plan~~

1. ~~In~~ [~~starboard_config.go~~](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/1569c792d74dc5cd937ce9fc0ce06fc76069cd78/internal/module/starboard_vulnerability/agent/starboard_config.go)~~, check if the `trivy.resources` config values have one of the following environment variables set. If they do, override the default with the set values.~~
   * ~~`TRIVY_CPU_RESOURCE_REQUEST`~~
   * ~~`TRIVY_CPU_RESOURCE_LIMIT`~~
   * ~~`TRIVY_MEMORY_RESOURCE_REQUEST`~~
   * ~~`TRIVY_MEMORY_RESOURCE_LIMIT`~~
2. ~~Add new values to the helm chart~~ [~~values.yml~~](https://gitlab.com/gitlab-org/charts/gitlab-agent/-/blob/25cdd3771e1132c542fa61c1e17d6c8dff934cc0/values.yaml)

   ```yaml
   container_scanning:
     trivy:
       resources: {}
         # limits:
         #   cpu: 100m
         #   memory: 128Mi
         # requests:
         #   cpu: 100m
         #   memory: 128Mi
   ```

3. ~~Use~~ [~~the deployment template~~](https://gitlab.com/gitlab-org/charts/gitlab-agent/-/blob/25cdd3771e1132c542fa61c1e17d6c8dff934cc0/templates/deployment.yaml#L75) ~~to add these values to the pod's environment variables.~~
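
For reference, the precedence rules in step 2 of the updated (non-struck-through) implementation plan could be sketched roughly as follows. This is an illustrative, language-agnostic sketch in Python, not the agent's actual Go code: `ScanConfig`, `AgentConfig`, `resolve`, and `DEFAULT_RESOURCES` are hypothetical names. The default values are copied from the example config in this issue and are not taken from the agent source. Returning no scan config when neither source provides one is also an assumption, since the plan does not cover that case.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical defaults, copied from the example config in this issue.
DEFAULT_RESOURCES = {
    "limits": {"cpu": "100m", "memory": "500Mi"},
    "requests": {"cpu": "100m", "memory": "500Mi"},
}


@dataclass
class ScanConfig:
    """The `scan config` from the plan: cadence plus vulnerability_report namespaces."""
    cadence: str
    namespaces: List[str]


@dataclass
class AgentConfig:
    """Hypothetical view of the `container_scanning` section of the agent config."""
    scan_config: Optional[ScanConfig] = None
    resource_requirements: Optional[dict] = None


def resolve(
    agent_config: AgentConfig,
    policy_scan_config: Optional[ScanConfig],
) -> Tuple[Optional[ScanConfig], dict]:
    """Pick the scan config and resource requirements the scanner should use."""
    # scan_execution_policy takes precedence over agent_config for the scan config.
    if policy_scan_config is not None:
        scan = policy_scan_config
    else:
        scan = agent_config.scan_config

    # resource_requirements only ever come from agent_config; otherwise use defaults.
    if agent_config.resource_requirements is not None:
        resources = agent_config.resource_requirements
    else:
        resources = DEFAULT_RESOURCES

    return scan, resources
```

The two decisions are independent: the scan config comes from `scan_execution_policy` whenever one is present, while `resource_requirements` are read from `agent_config` regardless of where the scan config came from. That covers all five cases listed in step 2.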