Update dependency https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules.git to v0.2.3 (main)

This MR contains the following updates:

Package: https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules.git
Update type: patch
Change: 0.2.2 -> 0.2.3

⚠️ Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules (https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules.git)

v0.2.3: sylva-prometheus-rules: 0.2.3

Compare Source

Merge Requests integrated in this release

4 merge requests were integrated in this repo between 0.2.2 and 0.2.3. These notes don't account for the MRs merged in secondary repos.

Monitoring

CI

Contributors

1 person contributed.

Alin H

sylva-prometheus-rules

Generate PrometheusRule objects for consumption by Prometheus

Overview

There are two mechanisms that control which rules are deployed:

  1. createRules selects which directories are considered.
  2. optional_rules selects which files in those directories are added to the ConfigMap (see the values sketch below).
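
A minimal sketch of the chart values combining both mechanisms; the optional_rules keys shown here are hypothetical and only illustrate the structure (check the chart's values.yaml for the actual keys):

createRules:
  allclusters: true          # consider alert-rules/allclusters/
  management-cluster: true   # consider alert-rules/management-cluster/
optional_rules:
  goldpinger: true           # hypothetical key: enable an optional rule file
  harbor: false              # hypothetical key: keep an optional rule file disabled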

Rules overrides

.Values.createRules controls which cluster rules are checked; its keys represent the directories under alert-rules/.

If .Values.createRules.allclusters is true (the default), the alert-rules/allclusters/*.yaml rules are parsed last, regardless of which other clusters are specified.

This allows rules to be overridden. For example, given these values:

createRules:
  allclusters: true
  management-cluster: true

and the following rule files in the chart:

alert-rules/allclusters/health-alerts.yaml
alert-rules/allclusters/dummy.yaml
alert-rules/management-cluster/flux.yaml
alert-rules/management-cluster/health-alerts.yaml
alert-rules/management-cluster/minio.yaml
  • First, the PrometheusRule objects named flux, minio and health-alerts from management-cluster are created.
  • Then health-alerts and dummy from allclusters are parsed. Since health-alerts has already been applied from management-cluster, it is not applied again. dummy is applied since it doesn't override anything.

This, in effect, allows the user to override the health-alerts rules from allclusters with the health-alerts rules from management-cluster.
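
To verify the result of the override pass, the created objects can be listed with kubectl; the namespace placeholder below is an assumption, substitute whatever namespace the chart was installed into:

kubectl get prometheusrules -n <release-namespace>
# expected: flux, minio and health-alerts (from management-cluster) plus dummy (from allclusters)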

Rules activation

.Values.optional_rules controls which rules are enabled for optional components.
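
As a usage sketch, an optional rule file can also be toggled at deploy time; the optional_rules.goldpinger key below is a guess at the naming scheme and the local chart path (.) is an assumption:

helm upgrade --install sylva-prometheus-rules . \
  --set createRules.management-cluster=true \
  --set optional_rules.goldpinger=true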

Details about rules
alert-rules/allclusters/snmp-dell-idrac.yaml
Alert Name For Severity Type Description
SNMP_DELL_iDRAC_globalSystemStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - globalSystemStatus is NOK. Current state is: {{ $labels.globalSystemStatus }}
SNMP_DELL_iDRAC_systemStateBatteryStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateBatteryStatus is NOK. Current state is: {{ $labels.systemStateBatteryStatusCombined }}. Check RAID Controller BBU or CMOS battery in iDRAC.
SNMP_DELL_iDRAC_systemStateCoolingDeviceStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateCoolingDeviceStatus is NOK. Current state is: {{ $labels.systemStateCoolingDeviceStatusCombined }}. Check system fans in iDRAC.
SNMP_DELL_iDRAC_systemStateCoolingUnitStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateCoolingUnitStatus is NOK. Current state is: {{ $labels.systemStateCoolingUnitStatusCombined }}. Check system fans in iDRAC.
SNMP_DELL_iDRAC_systemStateMemoryDeviceStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateMemoryDeviceStatus is NOK. Current state is: {{ $labels.systemStateMemoryDeviceStatusCombined }}. Check system volatile memory in iDRAC.
SNMP_DELL_iDRAC_systemStatePowerSupplyStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStatePowerSupplyStatus is NOK. Current state is: {{ $labels.systemStatePowerSupplyStatusCombined }}. Check system power supply in iDRAC.
SNMP_DELL_iDRAC_systemStatePowerUnitStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStatePowerUnitStatus is NOK. Current state is: {{ $labels.systemStatePowerUnitStatusCombined }}. Check system power supply or external power delivery in iDRAC.
SNMP_DELL_iDRAC_systemStateProcessorDeviceStatusCombined_NOK 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateProcessorDeviceStatus is NOK. Current state is: {{ $labels.systemStateProcessorDeviceStatusCombined }}. Check system processor in iDRAC.
SNMP_DELL_iDRAC_systemStateTemperatureStatisticsStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateTemperatureStatisticsStatus is NOK. Current state is: {{ $labels.systemStateTemperatureStatisticsStatusCombined }}. Check system temperatures in iDRAC.
SNMP_DELL_iDRAC_systemStateTemperatureStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateTemperatureStatus is NOK. Current state is: {{ $labels.systemStateTemperatureStatusCombined }}. Check system temperatures in iDRAC.
SNMP_DELL_iDRAC_systemStateVoltageStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateVoltageStatus is NOK. Current state is: {{ $labels.systemStateVoltageStatusCombined }}. Check system voltage in iDRAC.
SNMP_DELL_iDRAC_systemStateAmperageStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateAmperageStatus is NOK. Current state is: {{ $labels.systemStateAmperageStatusCombined }}. Check system amperage in iDRAC.
SNMP_DELL_iDRAC_controllerRollUpStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - controllerRollUpStatus is NOK for controllerNumber {{ $labels.controllerNumber }} ( {{ $labels.controllerName }}). Current state is: {{ $labels.controllerRollUpStatus }}.
SNMP_DELL_iDRAC_controllerComponentStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - controllerComponentStatus is NOK for controllerNumber {{ $labels.controllerNumber }} ( {{ $labels.controllerName }}). Current state is: {{ $labels.controllerComponentStatus }}.
SNMP_DELL_iDRAC_physicalDiskState_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskState is NOK for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Current state is: {{ $labels.physicalDiskState }}.
SNMP_DELL_iDRAC_physicalDiskComponentStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskComponentStatus is NOK for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Current state is: {{ $labels.physicalDiskComponentStatus }}.
SNMP_DELL_iDRAC_physicalDiskSmartAlertIndication_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskSmartAlertIndication is NOK for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}).
SNMP_DELL_iDRAC_physicalDiskRemainingRatedWriteEndurance_WARNING 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskRemainingRatedWriteEndurance is less than 40 for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Value: {{ humanize $value }}
SNMP_DELL_iDRAC_physicalDiskRemainingRatedWriteEndurance_CRITICAL 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskRemainingRatedWriteEndurance is less than 20 for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Value: {{ humanize $value }}
SNMP_DELL_iDRAC_virtualDiskState_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - virtualDiskState is NOK for virtualDiskNumber {{ $labels.virtualDiskNumber }} ( {{ $labels.virtualDiskDisplayName }}). Current state is: {{ $labels.virtualDiskState }}.
SNMP_DELL_iDRAC_virtualDiskComponentStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - virtualDiskComponentStatus is NOK for virtualDiskNumber {{ $labels.virtualDiskNumber }} ( {{ $labels.virtualDiskDisplayName }}). Current state is: {{ $labels.virtualDiskComponentStatus }}.
SNMP_DELL_iDRAC_virtualDiskBadBlocksDetected 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - virtualDiskBadBlocksDetected for virtualDiskNumber {{ $labels.virtualDiskNumber }} ( {{ $labels.virtualDiskDisplayName }}).
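
To make the table columns concrete, below is a sketch of how a single row from the table above would be expected to materialize inside a generated PrometheusRule object. Only the alert name, for, severity, type and description come from the table; the metadata name, group name, the assumption that the Type column maps to a "type" label, and the expr are placeholders invented for illustration:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: snmp-dell-idrac               # placeholder object name
spec:
  groups:
    - name: snmp-dell-idrac           # placeholder group name
      rules:
        - alert: SNMP_DELL_iDRAC_globalSystemStatus_NOK
          expr: idrac_global_system_status_ok == 0   # placeholder expression, not the chart's actual query
          for: 5m
          labels:
            severity: warning
            type: hardware            # assumes the "Type" column maps to a label named "type"
          annotations:
            description: >-
              Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" /
              address: "{{ $labels.instance }}" ] - globalSystemStatus is NOK.
              Current state is: {{ $labels.globalSystemStatus }}
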
alert-rules/allclusters/snmp-hp-cpq.yaml
Alert Name For Severity Type Description
SNMP_HP_CPQ_Overall_Health_NOK 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Overall health status is NOK. Value: "{{ $labels.cpqHeMibCondition }}"
SNMP_HP_CPQ_Event_Log_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Event Log Condition is NOK. Value: "{{ $labels.cpqHeEventLogCondition }}"
SNMP_HP_CPQ_CPU_Health_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - CPU status is NOK. Value: "{{ $labels.cpqSeCpuCondition }}"
SNMP_HP_CPQ_Thermal_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Thermal condition status is NOK. Value: "{{ $labels.cpqHeThermalCondition }}"
SNMP_HP_CPQ_Power_Supply_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Power supply condition status is NOK. Value: "{{ $labels.cpqHeFltTolPwrSupplyCondition }}"
SNMP_HP_CPQ_Storage_Subsystem_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Storage subsystem condition status is NOK. Value: "{{ $labels.cpqSsMibCondition }}"
SNMP_HP_CPQ_Controller_Overall_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Controller "{{ $labels.cpqDaCntlrIndex }}" status is NOK. Value: "{{ $labels.cpqDaCntlrCondition }}". This value represents the overall condition of this controller, and any associated logical drives, physical drives, and array accelerator.
SNMP_HP_CPQ_iLO_LicenseKey_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - HP iLO interface is missing its License activation.
alert-rules/allclusters/snmp-lenovo-xcc.yaml
Alert Name For Severity Type Description
SNMP_Lenovo_XCC_systemHealthStat_NOK 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemHealthStat is not "normal". Current state is: {{ $labels.systemHealthStat }}
SNMP_Lenovo_XCC_cpuVpdHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - cpuVpdHealthStatus for CPU "{{ $labels.cpuVpdDescription }}" is not "normal". Current state is: {{ $labels.cpuVpdHealthStatus }}
SNMP_Lenovo_XCC_raidDriveHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - raidDriveHealthStatus for "{{ $labels.raidDriveName }}" is not "Normal". Current state is: {{ $labels.raidDriveHealthStatus }}
SNMP_Lenovo_XCC_memoryHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - memoryHealthStatus for DIMM "{{ $labels.memoryVpdDescription }}" is not "Normal". Current state is: {{ $labels.memoryHealthStatus }}
SNMP_Lenovo_XCC_fanHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - fanHealthStatus for Fan "{{ $labels.fanDescr }}" is not "Normal". Current state is: {{ $labels.fanHealthStatus }}
SNMP_Lenovo_XCC_voltHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - voltHealthStatus for System Component "{{ $labels.voltDescr }}" is not "Normal". Current state is: {{ $labels.voltHealthStatus }}
alert-rules/management-cluster/flux.yaml
Alert Name For Severity Type Description
Flux_Kustomization_Failing 15m warning deployment Flux Kustomization "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" fails to reconcile.
Flux_Kustomization_Failing_Cluster 60m warning deployment Flux Kustomization "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" fails to reconcile.
Flux_HelmRelease_Failing 15m warning deployment Flux HelmRelease "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" fails to reconcile.
Flux_Source_Failing 15m warning deployment Flux Source "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" fails to reconcile.
Flux_Resource_Suspended 2h warning deployment Flux Resource "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" is suspended.
alert-rules/management-cluster/goldpinger.yaml
Alert Name For Severity Type Description
Goldpinger_Node_Unhealthy 5m critical network Goldpinger reports unhealthy nodes: "{{ $labels.node }}"
alert-rules/management-cluster/harbor.yaml
Alert Name For Severity Type Description
Harbor_Component_Status_NOK 5m warning tools Harbor component "{{ $labels.component }}" status is DOWN.
alert-rules/management-cluster/keycloak.yaml
Alert Name For Severity Type Description
Keycloak-CNPG_WAL_Disk_Usage_High 5m warning tools WAL directory usage on "{{ $labels.pod }}" has exceeded 2GiB
alert-rules/management-cluster/minio.yaml
Alert Name For Severity Type Description
MinIO_Cluster_Health_Status_NOK 5m critical monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" health status not OK.
MinIO_Cluster_Health_Status_Unknown 5m critical monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" health status is Unknown. The cluster does not return cluster metrics. Check pods logs for error messages.
MinIO_Cluster_Disk_Offline 5m critical monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" disk offline.
MinIO_Cluster_Disk_Space_Usage 5m warning monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" available disk space is less than 20%.
MinIO_Cluster_Disk_Space_Usage 5m critical monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" available disk space is less than 10%.
MinIO_Cluster_Disk_Space_Will_Fill_Up_Soon 5m warning monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}": at the current rate of utilization, the available disk space will run out within the next 2 days.
MinIO_Cluster_Tolerance 5m critical monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" has lost quorum on pool "{{ $labels.pool }}" / set "{{ $labels.set }}" for more than 5 minutes.
MinIO_Nodes_Offline 5m warning monitoring MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" has offline nodes.
alert-rules/management-cluster/thanos.yaml
Alert Name For Severity Type Description
Thanos-Compact_Multiple_Running 5m warning monitoring More than one Thanos Compact instance is running. Current number of instances: {{ $value }}.
Thanos-Compact_Halted 5m critical monitoring Thanos Compact has failed to run and now is halted.
Thanos-Compact_Compaction_Failures_Rate_High 15m warning monitoring Thanos Compact is failing to execute {{ $value
Thanos-Compact_Bucket_Operation_Failures_Rate_High 15m warning monitoring Thanos Compact Bucket is failing to execute {{ $value
Thanos-Compact_Has_Not_Run 5m warning monitoring Thanos Compact has not uploaded anything for 24 hours.
Thanos-Query_Store_Endpoints_Missing 5m critical monitoring Thanos Query is missing "{{ $labels.store_type }}" store type. Metrics served by this store type will not be available which can lead to alerting rules not evaluating properly.
Thanos-Query_HTTP_Request_Query_Error_Rate_High 5m critical monitoring Thanos Query is failing to handle {{ $value
Thanos-Query_HTTP_Request_QueryRange_Error_Rate_High 5m critical monitoring Thanos Query is failing to handle {{ $value
Thanos-Query_GRPC_Server_Error_Rate_High 5m warning monitoring Thanos Query is failing to handle {{ $value
Thanos-Query_GRPC_Client_Error_Rate_High 5m warning monitoring Thanos Query is failing to send {{ $value
Thanos-Query_Endpoint_DNS_Lookup_Failure_Rate_High 15m warning monitoring Thanos Query has {{ $value
Thanos-Query_Endpoint_Groups_DNS_Lookup_Failure_Rate_High 15m warning monitoring Thanos Query has {{ $value
Thanos-Query_Instant_Latency_High 10m critical monitoring Thanos Query has a 99th percentile latency of {{ $value }} seconds for instant queries.
Thanos-Query_Range_Latency_High 10m critical monitoring Thanos Query has a 99th percentile latency of {{ $value }} seconds for range queries.
Thanos-Query_Overload 15m warning monitoring Thanos Query has been overloaded for more than 15 minutes. This may be a symptom of excessive simultaneous complex requests, low performance of the Prometheus API, or failures within these components. Assess the health of the Thanos Query instances and the connected Prometheus instances, look for potential senders of these requests, and then contact support.
Thanos-Receive_HTTP_Request_Error_Rate_High 5m critical monitoring Thanos Receive is failing to handle {{ $value
Thanos-Receive_HTTP_Request_Latency_High 10m critical monitoring Thanos Receive has a 99th percentile latency of {{ $value }} seconds for requests.
Thanos-Receive_Replication_Failures_Rate_High 5m warning monitoring Thanos Receive is failing to replicate {{ $value
Thanos-Receive_Forward_Request_Failures_Rate_High 5m info monitoring Thanos Receive is failing to forward {{ $value
Thanos-Receive_Hashring_File_Refresh_Failures_Rate_High 15m warning monitoring Thanos Receive is failing to refresh hashring file, {{ $value
Thanos-Receive_Config_Reload_Failure 5m warning monitoring Thanos Receive has not been able to reload hashring configurations.
Thanos-Receive_No_Upload 3h critical monitoring Thanos Receive {{ $labels.pod }} has not uploaded latest data to object storage.
Thanos-Receive_Limits_Config_Reload_Failure 5m warning monitoring Thanos Receive has not been able to reload the limits configuration.
Thanos-Receive_Bucket_Operation_Failures_Rate_High 15m warning monitoring Thanos Receive Bucket is failing to execute {{ $value
Thanos-Store_GRPC_Error_Rate_High 5m warning monitoring Thanos Store is failing to handle {{ $value
Thanos-Store_Series_Gate_Latency_High 10m warning monitoring Thanos Store has a 99th percentile latency of {{ $value }} seconds for store series gate requests.
Thanos-Store_Bucket_Operation_Failures_Rate_High 15m warning monitoring Thanos Store Bucket is failing to execute {{ $value
Thanos-Store_Objstore_Operation_Latency_High 10m warning monitoring Thanos Store Bucket has a 99th percentile latency of {{ $value }} seconds for the bucket operations.
Thanos-Store_Block_Drop_Rate_high 10m warning monitoring Thanos Store is evicting blocks from its in-memory cache at a high rate. This may increase query latency and indicate that the index cache size is insufficient for your workload.
Thanos-Ruler_Queue_is_Dropping_Alerts 5m critical monitoring Thanos Rule {{ $labels.pod }} is failing to queue alerts.
Thanos-Ruler_Sender_is_Failing_Alerts 5m critical monitoring Thanos Rule {{ $labels.pod }} is failing to send alerts to alertmanager.
Thanos-Ruler_Rule_Evaluation_Failures_Rate_High 5m critical monitoring Thanos Rule {{ $labels.pod }} is failing to evaluate rules.
Thanos-Ruler_Rule_Evaluation_Warnings_Rate_High 15m info monitoring Thanos Rule {{ $labels.pod }} has high number of evaluation warnings.
Thanos-Ruler_Rule_Evaluation_Latency_High 5m warning monitoring Thanos Rule {{ $labels.pod }} has higher evaluation latency than interval for {{ $labels.rule_group }}.
Thanos-Ruler_GRPC_Error_Rate_High 5m warning monitoring Thanos Ruler {{ $labels.pod }} is failing to handle {{ $value
Thanos-Ruler_Config_Reload_Failure 5m info monitoring Thanos Ruler {{ $labels.pod }} has not been able to reload its configuration.
Thanos-Ruler_Query_DNS_Lookup_Failure_Rate_High 15m warning monitoring Thanos Ruler {{ $labels.pod }} has {{ $value
Thanos-Ruler_Alertmanager_DNS_Failure_Lookup_Rate_High 15m warning monitoring Thanos Rule {{ $labels.pod }} has {{ $value
Thanos-Ruler_No_Evaluation_For_10_Intervals 5m info monitoring Thanos Ruler {{ $labels.pod }} has rule groups that did not evaluate for at least 10x of their expected interval.
Thanos-Ruler_No_Rule_Evaluations 5m critical monitoring Thanos Ruler {{ $labels.pod }} did not perform any rule evaluations in the past 10 minutes.
Thanos-Rule_Bucket_Operation_Failures_Rate_High 15m warning monitoring Thanos Rule Bucket is failing to execute {{ $value
Thanos-Component_Compact_is_Down 5m critical monitoring Thanos Compact has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_Query_is_Down 5m critical monitoring Thanos Query has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_QueryFrontend_is_Down 5m critical monitoring Thanos Query Frontend has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_Receive_is_Down 5m critical monitoring Thanos Receive has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_Rule_is_Down 5m critical monitoring Thanos Ruler has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_Store_is_Down 5m critical monitoring Thanos Store has disappeared. Prometheus target for the component cannot be discovered.
Thanos-Component_Block_Meta_Sync_Failures 10m critical monitoring Thanos "{{ $labels.container }}" has failed to fetch or parse some block metadata from object storage in the last 10 minutes. This may cause missing data in queries or failed compactions.

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻️ Rebasing: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this MR and you won't be reminded about this update again.


  • If you want to rebase/retry this MR, check this box

This MR has been generated by Renovate Bot Sylva instance.

CI configuration couldn't be handled in the MR description. A dedicated comment has been posted to control it.

If no checkbox is checked, a default pipeline will be enabled (capm3, or capo if the capo label is set).
