tenant-observability-stack: Add support for node selector in tenant-observability-config-manager job

Problem

The tenant-observability-config-manager job in the tenant-observability-stack module doesn't support node selector configuration, which causes issues in heterogeneous clusters with different node architectures (e.g., ARM64 vs x86/AMD64).

Currently, when the observability pool is configured to use a different architecture than other nodes in the cluster, the job can be scheduled on incompatible nodes, resulting in image pull errors like:

   Warning  Failed  14m (x4 over 15m)  kubelet  Failed to pull image "registry.gitlab.com/gitlab-com/gl-infra/observability/tenant-observability/config-manager:v1.9.11": rpc error: code = NotFound desc = failed to pull and unpack image "registry.gitlab.com/gitlab-com/gl-infra/observability/tenant-observability/config-manager:v1.9.11": no match for platform in manifest: not found

Proposal

Add configurable node selector support to the kubernetes_job resource for tenant_observability_config_manager in the module. This would allow users to specify which node pool the job should run on, ensuring compatibility with the architecture-specific container image.
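
As a rough illustration of the proposal, the sketch below wires a node selector into the job's pod spec. It assumes the module uses the Terraform Kubernetes provider's `kubernetes_job` resource with the resource name mentioned above. The variable name `config_manager_node_selector`, the job and container names, and the restart policy are hypothetical and are not taken from the module.

```hcl
# Hypothetical variable name; the module's actual interface may differ.
variable "config_manager_node_selector" {
  description = "Node labels that the config-manager job pod must match."
  type        = map(string)
  default     = {}
}

resource "kubernetes_job" "tenant_observability_config_manager" {
  metadata {
    # Assumed job name, for illustration only.
    name = "tenant-observability-config-manager"
  }

  spec {
    template {
      metadata {}

      spec {
        # An empty map adds no scheduling constraint, which matches
        # the behaviour described in the Problem section.
        node_selector = var.config_manager_node_selector

        container {
          # Assumed container name; the image is the one from the error above.
          name  = "config-manager"
          image = "registry.gitlab.com/gitlab-com/gl-infra/observability/tenant-observability/config-manager:v1.9.11"
        }

        restart_policy = "Never"
      }
    }
  }
}
```

A caller could then pin the job to x86 nodes by passing, for example, `config_manager_node_selector = { "kubernetes.io/arch" = "amd64" }`, or use whatever label identifies the observability node pool in their cluster.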

Here's an example fix: gitlab-com/gl-infra/terraform-modules/observability/tenant-observability-stack@v2.10.0...nodeselector-2.10.0. It resolved the issue described in https://gitlab.com/gitlab-com/gl-infra/gitlab-dedicated/instrumentor/-/merge_requests/5242#note_2444826115, where the job was scheduled to the Sidekiq node pool (ARM) instead of the Observability pool (x86).

This configuration will be needed if Dedicated switches to hybrid ARM support.