Allow Operational Container Scanning maximum memory setting to be configured via Agent configuration to avoid OOMKilled errors
### Proposal

[Operational Container Scanning](https://docs.gitlab.com/ee/user/clusters/agent/vulnerabilities.html) is no longer dependent on having the [Starboard Operator]() installed and can be scheduled via the [Agent configuration](https://docs.gitlab.com/ee/user/clusters/agent/vulnerabilities.html#enable-via-agent-configuration). However, there is currently no way to configure the maximum amount of memory available to the scanner pods. This setting is essential to prevent pods from failing with `OOMKilled` errors when the images are large (at present the memory limit is [hard coded to 500MB](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/master/internal/module/starboard_vulnerability/agent/starboard_config.go#L31)).

## Updates

Note that the implementation plan has been updated based on feedback from [this thread](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/merge_requests/949#note_1381989637) in the MR.

## Implementation Plan

1. Add `resource_requirements` to the `container_scanning` config of the [`agent config` file](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/eac853913da2c08a37f293469da207e4a0453b93/pkg/agentcfg/agentcfg.proto#L246)
   - Example config

     ```yaml
     container_scanning:
       cadence: '10 * * * *'
       vulnerability_report:
         namespaces:
           - default
       resource_requirements:
         limits:
           cpu: 100m
           memory: 500Mi
         requests:
           cpu: 100m
           memory: 500Mi
     ```
2. Update module [logic](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/b32973971e1c94f392e477d5413d4bf4c980d510/internal/module/starboard_vulnerability/agent/module.go#L40-68) to parse `resource_requirements` and `scan config`^
   - If only `agent_config` is configured, with `scan config` but no `resource_requirements`
     * Scanner should use `agent_config`'s `scan config` with **default** `resource_requirements`
   - If only `agent_config` is configured, with both `scan config` and `resource_requirements`
     * Scanner should use `agent_config`'s `scan config` as well as **configured** `resource_requirements`
   - If only `scan_execution_policy` is configured
     * Scanner should use `scan_execution_policy`'s `scan config` with **default** `resource_requirements`
   - If `scan_execution_policy` is configured and `agent_config` has both `scan config` and `resource_requirements`
     * Scanner should use `scan_execution_policy`'s `scan config` as well as **configured** `resource_requirements`
   - If `scan_execution_policy` is configured and `agent_config` has only `resource_requirements`
     * Scanner should use `scan_execution_policy`'s `scan config` with **configured** `resource_requirements`

   ^ `scan config` refers to `cadence` and `vulnerability_report`
3. Update operational container scanning docs to:
   - Specify that `scan_execution_policy` takes precedence over `agent_config` if both are configured
   - Include instructions on configuring resource requirements

## ~~Implementation Plan~~

1. ~~In~~ [~~starboard_config.go~~](https://gitlab.com/gitlab-org/cluster-integration/gitlab-agent/-/blob/1569c792d74dc5cd937ce9fc0ce06fc76069cd78/internal/module/starboard_vulnerability/agent/starboard_config.go)~~, check if the `trivy.resources` config values have one of the following environment variables set. If they do, override the default with the set values.~~
   * ~~`TRIVY_CPU_RESOURCE_REQUEST`~~
   * ~~`TRIVY_CPU_RESOURCE_LIMIT`~~
   * ~~`TRIVY_MEMORY_RESOURCE_REQUEST`~~
   * ~~`TRIVY_MEMORY_RESOURCE_LIMIT`~~
2. ~~Add new values to the helm chart~~ [~~values.yml~~](https://gitlab.com/gitlab-org/charts/gitlab-agent/-/blob/25cdd3771e1132c542fa61c1e17d6c8dff934cc0/values.yaml)

   ```yaml
   container_scanning:
     trivy:
       resources: {}
         # limits:
         #   cpu: 100m
         #   memory: 128Mi
         # requests:
         #   cpu: 100m
         #   memory: 128Mi
   ```
3. ~~Use~~ [~~the deployment template~~](https://gitlab.com/gitlab-org/charts/gitlab-agent/-/blob/25cdd3771e1132c542fa61c1e17d6c8dff934cc0/templates/deployment.yaml#L75) ~~to add these values to the pod's environment variables.~~
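
### Sketch of the precedence rules

The five cases in step 2 of the current (not struck-through) implementation plan reduce to two independent choices: the `scan config` comes from `scan_execution_policy` when one exists, otherwise from `agent_config`; the `resource_requirements` come from `agent_config` when set, otherwise from the defaults. The snippet below is a minimal, language-neutral sketch of that reading, written in Python for brevity (the agent module itself is Go). `ScanConfig`, `resolve`, and `DEFAULT_RESOURCES` are hypothetical names, and the default values are placeholders copied from the example config, not the agent's confirmed defaults.

```python
from dataclasses import dataclass, field
from typing import Optional

# Placeholder defaults mirroring the example config in step 1.
# Assumption: the real defaults in the agent may differ.
DEFAULT_RESOURCES = {
    "limits": {"cpu": "100m", "memory": "500Mi"},
    "requests": {"cpu": "100m", "memory": "500Mi"},
}


@dataclass
class ScanConfig:
    """Hypothetical stand-in for `scan config`: cadence plus vulnerability_report."""
    cadence: str
    namespaces: list = field(default_factory=list)


def resolve(agent_scan: Optional[ScanConfig],
            agent_resources: Optional[dict],
            policy_scan: Optional[ScanConfig]):
    """Return (scan_config, resource_requirements), or None if nothing schedules a scan."""
    # scan_execution_policy takes precedence over agent_config for the scan config.
    scan = policy_scan if policy_scan is not None else agent_scan
    if scan is None:
        # Assumption: resource_requirements alone do not schedule a scan.
        return None
    # resource_requirements only ever come from agent_config; otherwise use defaults.
    resources = agent_resources if agent_resources is not None else DEFAULT_RESOURCES
    return scan, resources


if __name__ == "__main__":
    policy = ScanConfig("0 * * * *", ["kube-system"])
    custom = {"limits": {"memory": "2Gi"}, "requests": {"memory": "1Gi"}}
    # Policy scan config combined with the agent's configured resources.
    print(resolve(None, custom, policy))
```

This sketch replaces the defaults wholesale when `resource_requirements` is present. Whether a partially specified block (for example, only `limits.memory`) should be merged with the defaults is not stated in the plan and would need to be decided in the MR.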