tenant-observability-stack: Add support for node selector in tenant-observability-config-manager job
Problem
The tenant-observability-config-manager job in the tenant-observability-stack module doesn't support node selector configuration, which causes issues in heterogeneous clusters with different node architectures (e.g., ARM64 vs x86/AMD64).
Currently, when the observability pool is configured to use a different architecture than other nodes in the cluster, the job can be scheduled on incompatible nodes, resulting in image pull errors like:
Warning Failed 14m (x4 over 15m) kubelet Failed to pull image "registry.gitlab.com/gitlab-com/gl-infra/observability/tenant-observability/config-manager:v1.9.11": rpc error: code = NotFound desc = failed to pull and unpack image "registry.gitlab.com/gitlab-com/gl-infra/observability/tenant-observability/config-manager:v1.9.11": no match for platform in manifest: not found
Proposal
Add configurable node selector support to the kubernetes_job resource for tenant_observability_config_manager in the module. This would allow users to specify which node pool the job should run on, ensuring compatibility with the architecture-specific container image.
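A minimal sketch of what this could look like, assuming the job is defined with the Terraform Kubernetes provider's kubernetes_job resource. The variable name config_manager_node_selector, the namespace, and the container name are hypothetical and chosen only for illustration; the module's actual inputs and job definition may differ.

```hcl
# Hypothetical input variable; the real module may name or shape this differently.
variable "config_manager_node_selector" {
  description = "Node labels that the config-manager job's pod must match."
  type        = map(string)
  default     = {} # empty map: no scheduling constraint, current behavior is kept
}

resource "kubernetes_job" "tenant_observability_config_manager" {
  metadata {
    name      = "tenant-observability-config-manager"
    namespace = "monitoring" # assumed namespace, for illustration only
  }

  spec {
    template {
      metadata {}

      spec {
        # Restricts scheduling to nodes carrying all of the given labels.
        node_selector = var.config_manager_node_selector

        container {
          name  = "config-manager" # assumed container name
          image = "registry.gitlab.com/gitlab-com/gl-infra/observability/tenant-observability/config-manager:v1.9.11"
        }

        restart_policy = "Never"
      }
    }
  }
}
```

With an input like this, a caller could pin the job to x86 nodes by passing config_manager_node_selector = { "kubernetes.io/arch" = "amd64" }, or use whatever label identifies the observability node pool in their cluster.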
Here's an example fix: gitlab-com/gl-infra/terraform-modules/observability/tenant-observability-stack@v2.10.0...nodeselector-2.10.0. It resolved the issue described in https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/instrumentor/-/merge_requests/5242#note_2444826115, where the job was scheduled on the Sidekiq node pool (ARM) instead of the Observability pool (x86).
This configuration will be needed if Dedicated switches to hybrid ARM support.