Enable monitoring and Grafana BGP dashboard for MetalLB frr-k8s mode (!7504) · Merge requests · Sylva-projects / sylva-core

What does this MR do and why?

Context

Currently, MetalLB monitoring and alerting are only fully supported in native BGP mode. When MetalLB runs in frr-k8s mode, BGP metrics are exposed with a different prefix (frrk8s_bgp_*), so the existing dashboards and alert rules miss the relevant data. Additionally, ServiceMonitor resources can cause deployment errors if the required CRDs are not present.

Goals

Ensure that Prometheus can scrape BGP metrics from frr-k8s pods.
Adapt the MetalLB Grafana dashboard to visualize BGP metrics from both native and frr-k8s modes.
Update Thanos/Prometheus alert rules to support both metric types.
Prevent ServiceMonitor deployment errors in clusters without the required CRDs.

Tasks

Prometheus Integration: Configure Prometheus to scrape BGP-related metrics from frr-k8s pods by setting the appropriate Helm values.
Grafana Dashboard: Update the MetalLB Grafana dashboard to support both metallb_bgp_session_up and frrk8s_bgp_session_up metrics.
Alert Rules: Patch the MetalLB BGP alert rules in Thanos/Prometheus to include both metric types, ensuring alerts fire in both modes.
ServiceMonitor Handling: Adapt Helm values and deployment logic to disable ServiceMonitor creation when the CRD is not present, avoiding installation errors.
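
As a rough illustration of the alert rule task, a rule covering both metric types could look like the sketch below. Only the two metric names come from this MR; the resource name, group name, alert name, `for` duration, and severity label are hypothetical placeholders, and the actual rules patched by this MR may be structured differently.

```yaml
# Hypothetical PrometheusRule sketch; all names and thresholds are illustrative.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: metallb-bgp-session-example    # assumed name
spec:
  groups:
    - name: metallb-bgp-example.rules  # assumed group name
      rules:
        - alert: MetalLBBGPSessionDown # assumed alert name
          # Matches a down session reported under either prefix:
          # metallb_bgp_session_up (native mode) or
          # frrk8s_bgp_session_up (frr-k8s mode).
          expr: metallb_bgp_session_up == 0 or frrk8s_bgp_session_up == 0
          for: 5m                      # assumed duration
          labels:
            severity: warning          # assumed severity
          annotations:
            summary: A MetalLB BGP session is down
```

Using `or` keeps a single rule for both modes: whichever metric is absent in a given cluster simply contributes no series to the expression.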

Acceptance Criteria

BGP metrics from frr-k8s are visible in Prometheus and Grafana.
The MetalLB dashboard displays BGP session status for both native and frr-k8s modes.
BGP alert rules trigger correctly for both metric types.
No ServiceMonitor-related errors occur during deployment in clusters without the CRD.

Closes #3896 (closed)


Test coverage

Tested in the Grafana, Prometheus, and Thanos UIs from a Windows VM. Also tested in the capo CI pipeline. A fresh deployment with these changes applied no longer gets stuck at the first node being deployed.

Example of values to enable frr-k8s:

metallb:
  bgp_lbs:
    l3_options:
      bgp_peers:
        ext-router1:
          local_asn: 64513
          peer_asn: 64513
          peer_address: 172.20.219.241
          advertised_pools:
            - pool1
          receive_routes:
            mode: all
    address_pools:
      pool1:
        addresses:
          - 192.168.1.1-192.168.1.2

Tested in CI with the misc deployment option for the metallb values:

crustgather-job-14266725387 ~> flux debug hr metallb --show-values | yq .frr-k8s.prometheus
namespace: cattle-monitoring-system
rbacPrometheus: true
rbacProxy:
  repository: quay.io/brancz/kube-rbac-proxy
  tag: v0.18.1
serviceAccount: rancher-monitoring-prometheus
serviceMonitor:
  enabled: true
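
The output above comes from a cluster where the ServiceMonitor CRD is available. As a sketch of the opposite case, and assuming the same key path seen in that output, the values for a cluster without the CRD would be expected to resolve to something like the following. How sylva-core computes this flag is not shown in this description.

```yaml
# Illustrative only: expected shape of the frr-k8s values when the
# ServiceMonitor CRD is absent. The key path mirrors the yq output above.
frr-k8s:
  prometheus:
    serviceMonitor:
      enabled: false   # skip ServiceMonitor creation to avoid install errors
```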

CI configuration

Below you can choose test deployment variants to run in this MR's CI.


Legend:

Icon	Meaning	Available values
☁️	Infra Provider	`capd`, `capo`, `capm3`
🚀	Bootstrap Provider	`kubeadm` (alias `kadm`), `rke2`, `okd`, `ck8s`
🐧	Node OS	`ubuntu`, `suse`, `na`, `leapmicro`
🛠️	Deployment Options	Deployment option list and description
🎬	Pipeline Scenarios	Available scenario list and description
🟢	Enabled units	Any available unit name. By default applies to both management and workload clusters; prefix with `mgmt:` or `wkld:` to apply to only one cluster type
🔴	Disabled units	Any available unit name. By default applies to both management and workload clusters; prefix with `mgmt:` or `wkld:` to apply to only one cluster type
🏗️	Target platform	Can be used to select a specific deployment environment. Available platform list and description

Global config for deployment pipelines

autorun pipelines
allow failure on pipelines
record sylvactl events

Notes:

Enabling autorun makes deployment pipelines run automatically, without human interaction.
Disabling allow failure makes deployment pipelines mandatory for pipeline success.
If both autorun and allow failure are disabled, deployment pipelines need manual triggering and will block the pipeline.

Be aware: after a configuration change, the pipeline is not triggered automatically. Run it manually (by clicking the run pipeline button in the Pipelines tab) or push new code.

Edited May 13, 2026 by Andra-Simona Delicostea
