Update dependency https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules.git to v0.2.3 (main)

This MR contains the following updates:

Package: https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules.git
Update type: patch
Change: 0.2.2 -> 0.2.3

⚠️ Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules (https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules.git)

v0.2.3: sylva-prometheus-rules: 0.2.3

Compare Source

Merge Requests integrated in this release

4 merge requests were integrated in this repo between 0.2.2 and 0.2.3. These notes don't account for the MRs merged in secondary repos.

Monitoring

CI

Contributors

1 person contributed.

Alin H

sylva-prometheus-rules

Generate PrometheusRule objects for consumption by Prometheus

Overview

There are two mechanisms that control which rules are deployed:

  1. createRules selects which directories are considered.
  2. optional_rules selects which files in those directories are added to the ConfigMap (see the values sketch below).
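
A minimal sketch of the chart values combining both mechanisms; the optional_rules keys shown here are hypothetical and only illustrate the structure (check the chart's values.yaml for the actual keys):

createRules:
  allclusters: true          # consider alert-rules/allclusters/
  management-cluster: true   # consider alert-rules/management-cluster/
optional_rules:
  goldpinger: true           # hypothetical key: enable an optional rule file
  harbor: false              # hypothetical key: keep an optional rule file disabled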

Rules overrides

.Values.createRules controls which cluster rules are checked; its keys represent the directories under alert-rules/.

If .Values.createRules.allclusters is true (the default), the alert-rules/allclusters/*.yaml rules are parsed last, regardless of which other clusters are specified.

This allows rules to be overridden. For example, given these values:

createRules:
  allclusters: true
  management-cluster: true

and the following rule files in the chart:

alert-rules/allclusters/health-alerts.yaml
alert-rules/allclusters/dummy.yaml
alert-rules/management-cluster/flux.yaml
alert-rules/management-cluster/health-alerts.yaml
alert-rules/management-cluster/minio.yaml
  • First, the PrometheusRule objects named flux, minio and health-alerts from management-cluster are created.
  • Then health-alerts and dummy from allclusters are parsed. Since health-alerts has already been applied from management-cluster, it is not applied again. dummy is applied since it doesn't override anything.

This, in effect, allows the user to override the health-alerts rules from allclusters with the health-alerts rules from management-cluster.
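
To verify the result of the override pass, the created objects can be listed with kubectl; the namespace placeholder below is an assumption, substitute whatever namespace the chart was installed into:

kubectl get prometheusrules -n <release-namespace>
# expected: flux, minio and health-alerts (from management-cluster) plus dummy (from allclusters)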

Rules activation

.Values.optional_rules controls which rules are enabled for optional components.
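
As a usage sketch, an optional rule file can also be toggled at deploy time; the optional_rules.goldpinger key below is a guess at the naming scheme and the local chart path (.) is an assumption:

helm upgrade --install sylva-prometheus-rules . \
  --set createRules.management-cluster=true \
  --set optional_rules.goldpinger=true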

Details about rules
alert-rules/allclusters/snmp-dell-idrac.yaml
Alert Name For Severity Type Description
SNMP_DELL_iDRAC_globalSystemStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - globalSystemStatus is NOK. Current state is: {{ $labels.globalSystemStatus }}
SNMP_DELL_iDRAC_systemStateBatteryStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateBatteryStatus is NOK. Current state is: {{ $labels.systemStateBatteryStatusCombined }}. Check RAID Controller BBU or CMOS battery in iDRAC.
SNMP_DELL_iDRAC_systemStateCoolingDeviceStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateCoolingDeviceStatus is NOK. Current state is: {{ $labels.systemStateCoolingDeviceStatusCombined }}. Check system fans in iDRAC.
SNMP_DELL_iDRAC_systemStateCoolingUnitStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateCoolingUnitStatus is NOK. Current state is: {{ $labels.systemStateCoolingUnitStatusCombined }}. Check system fans in iDRAC.
SNMP_DELL_iDRAC_systemStateMemoryDeviceStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateMemoryDeviceStatus is NOK. Current state is: {{ $labels.systemStateMemoryDeviceStatusCombined }}. Check system volatile memory in iDRAC.
SNMP_DELL_iDRAC_systemStatePowerSupplyStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStatePowerSupplyStatus is NOK. Current state is: {{ $labels.systemStatePowerSupplyStatusCombined }}. Check system power supply in iDRAC.
SNMP_DELL_iDRAC_systemStatePowerUnitStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStatePowerUnitStatus is NOK. Current state is: {{ $labels.systemStatePowerUnitStatusCombined }}. Check system power supply or external power delivery in iDRAC.
SNMP_DELL_iDRAC_systemStateProcessorDeviceStatusCombined_NOK 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateProcessorDeviceStatus is NOK. Current state is: {{ $labels.systemStateProcessorDeviceStatusCombined }}. Check system processor in iDRAC.
SNMP_DELL_iDRAC_systemStateTemperatureStatisticsStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateTemperatureStatisticsStatus is NOK. Current state is: {{ $labels.systemStateTemperatureStatisticsStatusCombined }}. Check system temperatures in iDRAC.
SNMP_DELL_iDRAC_systemStateTemperatureStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateTemperatureStatus is NOK. Current state is: {{ $labels.systemStateTemperatureStatusCombined }}. Check system temperatures in iDRAC.
SNMP_DELL_iDRAC_systemStateVoltageStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateVoltageStatus is NOK. Current state is: {{ $labels.systemStateVoltageStatusCombined }}. Check system voltage in iDRAC.
SNMP_DELL_iDRAC_systemStateAmperageStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateAmperageStatus is NOK. Current state is: {{ $labels.systemStateAmperageStatusCombined }}. Check system amperage in iDRAC.
SNMP_DELL_iDRAC_controllerRollUpStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - controllerRollUpStatus is NOK for controllerNumber {{ $labels.controllerNumber }} ( {{ $labels.controllerName }}). Current state is: {{ $labels.controllerRollUpStatus }}.
SNMP_DELL_iDRAC_controllerComponentStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - controllerComponentStatus is NOK for controllerNumber {{ $labels.controllerNumber }} ( {{ $labels.controllerName }}). Current state is: {{ $labels.controllerComponentStatus }}.
SNMP_DELL_iDRAC_physicalDiskState_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskState is NOK for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Current state is: {{ $labels.physicalDiskState }}.
SNMP_DELL_iDRAC_physicalDiskComponentStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskComponentStatus is NOK for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Current state is: {{ $labels.physicalDiskComponentStatus }}.
SNMP_DELL_iDRAC_physicalDiskSmartAlertIndication_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskSmartAlertIndication is NOK for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}).
SNMP_DELL_iDRAC_physicalDiskRemainingRatedWriteEndurance_WARNING 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskRemainingRatedWriteEndurance is less than 40 for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Value: {{ humanize $value }}
SNMP_DELL_iDRAC_physicalDiskRemainingRatedWriteEndurance_CRITICAL 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskRemainingRatedWriteEndurance is less than 20 for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Value: {{ humanize $value }}
SNMP_DELL_iDRAC_virtualDiskState_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - virtualDiskState is NOK for virtualDiskNumber {{ $labels.virtualDiskNumber }} ( {{ $labels.virtualDiskDisplayName }}). Current state is: {{ $labels.virtualDiskState }}.
SNMP_DELL_iDRAC_virtualDiskComponentStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - virtualDiskComponentStatus is NOK for virtualDiskNumber {{ $labels.virtualDiskNumber }} ( {{ $labels.virtualDiskDisplayName }}). Current state is: {{ $labels.virtualDiskComponentStatus }}.
SNMP_DELL_iDRAC_virtualDiskBadBlocksDetected 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - virtualDiskBadBlocksDetected for virtualDiskNumber {{ $labels.virtualDiskNumber }} ( {{ $labels.virtualDiskDisplayName }}).
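
To make the table columns concrete, below is a sketch of how a single row from the table above would be expected to materialize inside a generated PrometheusRule object. Only the alert name, for, severity, type and description come from the table; the metadata name, group name, the assumption that the Type column maps to a "type" label, and the expr are placeholders invented for illustration:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: snmp-dell-idrac               # placeholder object name
spec:
  groups:
    - name: snmp-dell-idrac           # placeholder group name
      rules:
        - alert: SNMP_DELL_iDRAC_globalSystemStatus_NOK
          expr: idrac_global_system_status_ok == 0   # placeholder expression, not the chart's actual query
          for: 5m
          labels:
            severity: warning
            type: hardware            # assumes the "Type" column maps to a label named "type"
          annotations:
            description: >-
              Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" /
              address: "{{ $labels.instance }}" ] - globalSystemStatus is NOK.
              Current state is: {{ $labels.globalSystemStatus }}
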
alert-rules/allclusters/snmp-hp-cpq.yaml
Alert Name For Severity Type Description
SNMP_HP_CPQ_Overall_Health_NOK 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Overall health status is NOK. Value: "{{ $labels.cpqHeMibCondition }}"
SNMP_HP_CPQ_Event_Log_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Event Log Condition is NOK. Value: "{{ $labels.cpqHeEventLogCondition }}"
SNMP_HP_CPQ_CPU_Health_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - CPU status is NOK. Value: "{{ $labels.cpqSeCpuCondition }}"
SNMP_HP_CPQ_Thermal_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Thermal condition status is NOK. Value: "{{ $labels.cpqHeThermalCondition }}"
SNMP_HP_CPQ_Power_Supply_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Power supply condition status is NOK. Value: "{{ $labels.cpqHeFltTolPwrSupplyCondition }}"
SNMP_HP_CPQ_Storage_Subsystem_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Storage subsystem condition status is NOK. Value: "{{ $labels.cpqSsMibCondition }}"
SNMP_HP_CPQ_Controller_Overall_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Controller "{{ $labels.cpqDaCntlrIndex }}" status is NOK. Value: "{{ $labels.cpqDaCntlrCondition }}". This value represents the overall condition of this controller, and any associated logical drives, physical drives, and array accelerator.
SNMP_HP_CPQ_iLO_LicenseKey_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - HP iLO interface is missing its License activation.
alert-rules/allclusters/snmp-lenovo-xcc.yaml
Alert Name For Severity Type Description
SNMP_Lenovo_XCC_systemHealthStat_NOK 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemHealthStat is not "normal". Current state is: {{ $labels.systemHealthStat }}
SNMP_Lenovo_XCC_cpuVpdHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - cpuVpdHealthStatus for CPU "{{ $labels.cpuVpdDescription }}" is not "normal". Current state is: {{ $labels.cpuVpdHealthStatus }}
SNMP_Lenovo_XCC_raidDriveHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - raidDriveHealthStatus for "{{ $labels.raidDriveName }}" is not "Normal". Current state is: {{ $labels.raidDriveHealthStatus }}
SNMP_Lenovo_XCC_memoryHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - memoryHealthStatus for DIMM "{{ $labels.memoryVpdDescription }}" is not "Normal". Current state is: {{ $labels.memoryHealthStatus }}
SNMP_Lenovo_XCC_fanHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - fanHealthStatus for Fan "{{ $labels.fanDescr }}" is not "Normal". Current state is: {{ $labels.fanHealthStatus }}
SNMP_Lenovo_XCC_voltHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - voltHealthStatus for System Component "{{ $labels.voltDescr }}" is not "Normal". Current state is: {{ $labels.voltHealthStatus }}
alert-rules/management-cluster/flux.yaml
Alert Name For Severity Type Description
Flux_Kustomization_Failing 15m warning deployment Flux Kustomization "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" fails to reconcile.
Flux_Kustomization_Failing_Cluster 60m warning deployment Flux Kustomization "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" fails to reconcile.
Flux_HelmRelease_Failing 15m warning deployment Flux HelmRelease "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" fails to reconcile.
Flux_Source_Failing 15m warning deployment Flux Source "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" fails to reconcile.
Flux_Resource_Suspended 2h warning deployment Flux Resource "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" is suspended.
alert-rules/management-cluster/goldpinger.yaml
Alert Name For Severity Type Description
Goldpinger_Node_Unhealthy 5m critical network Goldpinger reports unhealthy nodes: "{{ $labels.node }}"
alert-rules/management-cluster/harbor.yaml
Alert Name For Severity Type Description
Harbor_Component_Status_NOK 5m warning tools Harbor component "{{ $labels.component }}" status is DOWN.
alert-rules/management-cluster/keycloak.yaml
Alert Name For Severity Type Description
Keycloak-CNPG_WAL_Disk_Usage_High 5m warning tools WAL directory usage on "{{ $labels.pod }}" has exceeded 2GiB
alert-rules/management-cluster/minio.yaml
Alert Name For Severity Type Description
MinIO_Cluster_Health_Status_NOK 5m critical monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" health status not OK.
MinIO_Cluster_Health_Status_Unknown 5m critical monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" health status is Unknown. The cluster does not return cluster metrics. Check pods logs for error messages.
MinIO_Cluster_Disk_Offline 5m critical monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" disk offline.
MinIO_Cluster_Disk_Space_Usage 5m warning monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" available disk space is less than 20%.
MinIO_Cluster_Disk_Space_Usage 5m critical monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" available disk space is less than 10%.
MinIO_Cluster_Disk_Space_Will_Fill_Up_Soon 5m warning monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}": at the current rate of utilization, the available disk space will run out within the next 2 days.
MinIO_Cluster_Tolerance 5m critical monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" has lost quorum on pool "{{ $labels.pool }}" / set "{{ $labels.set }}" for more than 5 minutes.
MinIO_Nodes_Offline 5m warning monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" has offline nodes.
alert-rules/management-cluster/thanos.yaml
Alert Name For Severity Type Description
Thanos-Compact_Multiple_Running 5m warning monitoring More than one Thanos Compact instance is running. Current number of instances: {{ $value }}.
Thanos-Compact_Halted 5m critical monitoring Thanos Compact has failed to run and now is halted.
Thanos-Compact_Compaction_Failures_Rate_High 15m warning monitoring Thanos Compact is failing to execute {{ $value
Thanos-Compact_Bucket_Operation_Failures_Rate_High 15m warning monitoring Thanos Compact Bucket is failing to execute {{ $value
Thanos-Compact_Has_Not_Run 5m warning monitoring Thanos Compact has not uploaded anything for 24 hours.
Thanos-Query_Store_Endpoints_Missing 5m critical monitoring Thanos Query is missing "{{ $labels.store_type }}" store type. Metrics served by this store type will not be available which can lead to alerting rules not evaluating properly.
Thanos-Query_HTTP_Request_Query_Error_Rate_High 5m critical monitoring Thanos Query is failing to handle {{ $value
Thanos-Query_HTTP_Request_QueryRange_Error_Rate_High 5m critical monitoring Thanos Query is failing to handle {{ $value
Thanos-Query_GRPC_Server_Error_Rate_High 5m warning monitoring Thanos Query is failing to handle {{ $value
Thanos-Query_GRPC_Client_Error_Rate_High 5m warning monitoring Thanos Query is failing to send {{ $value
Thanos-Query_Endpoint_DNS_Lookup_Failure_Rate_High 15m warning monitoring Thanos Query has {{ $value
Thanos-Query_Endpoint_Groups_DNS_Lookup_Failure_Rate_High 15m warning monitoring Thanos Query has {{ $value
Thanos-Query_Instant_Latency_High 10m critical monitoring Thanos Query has a 99th percentile latency of {{ $value }} seconds for instant queries.
Thanos-Query_Range_Latency_High 10m critical monitoring Thanos Query has a 99th percentile latency of {{ $value }} seconds for range queries.
Thanos-Query_Overload 15m warning monitoring Thanos Query has been overloaded for more than 15 minutes. This may be a symptom of excessive simultaneous complex requests, low performance of the Prometheus API, or failures within these components. Assess the health of the Thanos Query instances and the connected Prometheus instances, look for potential senders of these requests, and then contact support.
Thanos-Receive_HTTP_Request_Error_Rate_High 5m critical monitoring Thanos Receive is failing to handle {{ $value
Thanos-Receive_HTTP_Request_Latency_High 10m critical monitoring Thanos Receive has a 99th percentile latency of {{ $value }} seconds for requests.
Thanos-Receive_Replication_Failures_Rate_High 5m warning monitoring Thanos Receive is failing to replicate {{ $value
Thanos-Receive_Forward_Request_Failures_Rate_High 5m info monitoring Thanos Receive is failing to forward {{ $value
Thanos-Receive_Hashring_File_Refresh_Failures_Rate_High 15m warning monitoring Thanos Receive is failing to refresh hashring file, {{ $value
Thanos-Receive_Config_Reload_Failure 5m warning monitoring Thanos Receive has not been able to reload hashring configurations.
Thanos-Receive_No_Upload 3h critical monitoring Thanos Receive {{ $labels.pod }} has not uploaded latest data to object storage.
Thanos-Receive_Limits_Config_Reload_Failure 5m warning monitoring Thanos Receive has not been able to reload the limits configuration.
Thanos-Receive_Bucket_Operation_Failures_Rate_High 15m warning monitoring Thanos Receive Bucket is failing to execute {{ $value
Thanos-Store_GRPC_Error_Rate_High 5m warning monitoring Thanos Store is failing to handle {{ $value
Thanos-Store_Series_Gate_Latency_High 10m warning monitoring Thanos Store has a 99th percentile latency of {{ $value }} seconds for store series gate requests.
Thanos-Store_Bucket_Operation_Failures_Rate_High 15m warning monitoring Thanos Store Bucket is failing to execute {{ $value
Thanos-Store_Objstore_Operation_Latency_High 10m warning monitoring Thanos Store Bucket has a 99th percentile latency of {{ $value }} seconds for the bucket operations.
Thanos-Store_Block_Drop_Rate_high 10m warning monitoring Thanos Store is evicting blocks from its in-memory cache at a high rate. This may increase query latency and indicate that the index cache size is insufficient for your workload.
Thanos-Ruler_Queue_is_Dropping_Alerts 5m critical monitoring Thanos Rule {{ $labels.pod }} is failing to queue alerts.
Thanos-Ruler_Sender_is_Failing_Alerts 5m critical monitoring Thanos Rule {{ $labels.pod }} is failing to send alerts to alertmanager.
Thanos-Ruler_Rule_Evaluation_Failures_Rate_High 5m critical monitoring Thanos Rule {{ $labels.pod }} is failing to evaluate rules.
Thanos-Ruler_Rule_Evaluation_Warnings_Rate_High 15m info monitoring Thanos Rule {{ $labels.pod }} has high number of evaluation warnings.
Thanos-Ruler_Rule_Evaluation_Latency_High 5m warning monitoring Thanos Rule {{ $labels.pod }} has higher evaluation latency than interval for {{ $labels.rule_group }}.
Thanos-Ruler_GRPC_Error_Rate_High 5m warning monitoring Thanos Ruler {{ $labels.pod }} is failing to handle {{ $value
Thanos-Ruler_Config_Reload_Failure 5m info monitoring Thanos Ruler {{ $labels.pod }} has not been able to reload its configuration.
Thanos-Ruler_Query_DNS_Lookup_Failure_Rate_High 15m warning monitoring Thanos Ruler {{ $labels.pod }} has {{ $value
Thanos-Ruler_Alertmanager_DNS_Failure_Lookup_Rate_High 15m warning monitoring Thanos Rule {{ $labels.pod }} has {{ $value
Thanos-Ruler_No_Evaluation_For_10_Intervals 5m info monitoring Thanos Ruler {{ $labels.pod }} has rule groups that did not evaluate for at least 10x of their expected interval.
Thanos-Ruler_No_Rule_Evaluations 5m critical monitoring Thanos Ruler {{ $labels.pod }} did not perform any rule evaluations in the past 10 minutes.
Thanos-Rule_Bucket_Operation_Failures_Rate_High 15m warning monitoring Thanos Rule Bucket is failing to execute {{ $value
Thanos-Component_Compact_is_Down 5m critical monitoring Thanos Compact has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_Query_is_Down 5m critical monitoring Thanos Query has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_QueryFrontend_is_Down 5m critical monitoring Thanos Query Frontend has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_Receive_is_Down 5m critical monitoring Thanos Receive has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_Rule_is_Down 5m critical monitoring Thanos Ruler has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_Store_is_Down 5m critical monitoring Thanos Store has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_Block_Meta_Sync_Failures 10m critical monitoring Thanos "{{ $labels.container }}" has failed to fetch or parse some block metadata from object storage in the last 10 minutes. This may cause missing data in queries or failed compactions.

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻️ Rebasing: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this MR and you won't be reminded about this update again.


  • If you want to rebase/retry this MR, check this box

This MR has been generated by Renovate Bot Sylva instance.

CI configuration couldn't be handled in the MR description. A dedicated comment has been posted to control it.

If no checkbox is checked, a default pipeline will be enabled (capm3, or capo if the capo label is set).
