Update dependency https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules.git to v0.1.3 (release-1.5)

This MR contains the following updates:

Package | Update | Change
https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules.git | patch | 0.1.0 -> 0.1.3

⚠️ Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules (https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-prometheus-rules.git)

v0.1.3: sylva-prometheus-rules: 0.1.3

Compare Source

Merge Requests integrated in this release

CI

  • Update dependency renovate-bot/renovate-runner to v22 !96 renovate
  • Update dependency sylva-projects/sylva-elements/ci-tooling/ci-templates to v1.0.40 !97 renovate

Other

  • Add Harbor rule !98

Contributors

Alin H

sylva-prometheus-rules

Generate PrometheusRule objects for consumption by Prometheus

Overview

There are two mechanisms that control which rules are deployed:

  1. createRules selects which directories under alert-rules/ are considered
  2. optional_rules selects which files in those directories are added to the ConfigMap
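
As a rough sketch, a values override exercising both mechanisms could look like the following. The createRules keys match the examples later in this README; the optional_rules entries shown here are hypothetical placeholder names, not the chart's actual keys:

# Sketch only: the createRules keys are documented below; the optional_rules
# component names here are placeholders, not the chart's real values.
createRules:
  allclusters: true            # directory alert-rules/allclusters/ is considered
  management-cluster: true     # directory alert-rules/management-cluster/ is considered
optional_rules:
  harbor: true                 # hypothetical: enable rule files for an optional component
  thanos: false                # hypothetical: leave this component's optional rules out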

Rules overrides

.Values.createRules controls which clusters' rules are rendered; its keys correspond to the directories under alert-rules/.

If .Values.createRules.allclusters is true (the default), the alert-rules/allclusters/*.yaml rules are parsed last, regardless of which other clusters are specified.

This allows for rule overriding. Example:

createRules:
  allclusters: true
  management-cluster: true

With these rule files present in the chart:

alert-rules/allclusters/health-alerts.yaml
alert-rules/allclusters/dummy.yaml

alert-rules/management-cluster/flux.yaml
alert-rules/management-cluster/health-alerts.yaml
alert-rules/management-cluster/minio.yaml
  • First, the PrometheusRule objects named flux, minio and health-alerts from management-cluster are created.
  • Then health-alerts and dummy from allclusters are parsed. Since health-alerts has already been applied from management-cluster, it is not applied again; dummy is applied since it does not override anything.

In effect, this allows the user to override health-alerts from allclusters with health-alerts from management-cluster.
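
Conceptually, the override can be pictured as the templates keeping track of which rule file names have already been rendered and skipping repeats, along the lines of the sketch below. This is a simplified illustration, not the chart's actual template, and it hard-codes the cluster ordering from the example above:

{{- /* Sketch: render cluster-specific rules first, allclusters last, skip file names already seen */}}
{{- $seen := dict }}
{{- range $cluster := list "management-cluster" "allclusters" }}
  {{- if index $.Values.createRules $cluster }}
    {{- range $path, $bytes := $.Files.Glob (printf "alert-rules/%s/*.yaml" $cluster) }}
      {{- $name := base $path }}
      {{- if not (hasKey $seen $name) }}
        {{- $_ := set $seen $name true }}
---
{{ $.Files.Get $path }}
      {{- end }}
    {{- end }}
  {{- end }}
{{- end }}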

Rules activation

.Values.optional_rules controls which rules are enabled for optional components

Details about rules
alert-rules/allclusters/health-alerts.yaml
Alert Name | For | Severity | Type | Description
KubeJobFailedAllClusters 15m warning k8s Job "{{ $labels.namespace }}"/ "{{ $labels.job_name }}" failed to complete. Removing failed job after investigation should clear this alert.
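
To relate a table row to an actual rule, the KubeJobFailedAllClusters entry above would map onto roughly the following group under a PrometheusRule spec. The alert name, for, severity, type and description are taken from the table; the expr is only an illustrative kube-state-metrics query and is not confirmed to be the one shipped in the chart:

groups:
  - name: health-alerts
    rules:
      - alert: KubeJobFailedAllClusters
        # Illustrative expression only (kube-state-metrics job failure gauge);
        # the chart's real expr may differ.
        expr: kube_job_status_failed > 0
        for: 15m
        labels:
          severity: warning
          type: k8s
        annotations:
          description: >-
            Job "{{ $labels.namespace }}"/"{{ $labels.job_name }}" failed to complete.
            Removing failed job after investigation should clear this alert.
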
alert-rules/allclusters/snmp-dell-idrac.yaml
Alert Name | For | Severity | Type | Description
SNMP_DELL_iDRAC_globalSystemStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - globalSystemStatus is NOK. Current state is: {{ $labels.globalSystemStatus }}
SNMP_DELL_iDRAC_systemStateBatteryStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateBatteryStatus is NOK. Current state is: {{ $labels.systemStateBatteryStatusCombined }}. Check RAID Controller BBU or CMOS battery in iDRAC.
SNMP_DELL_iDRAC_systemStateCoolingDeviceStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateCoolingDeviceStatus is NOK. Current state is: {{ $labels.systemStateCoolingDeviceStatusCombined }}. Check system fans in iDRAC.
SNMP_DELL_iDRAC_systemStateCoolingUnitStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateCoolingUnitStatus is NOK. Current state is: {{ $labels.systemStateCoolingUnitStatusCombined }}. Check system fans in iDRAC.
SNMP_DELL_iDRAC_systemStateMemoryDeviceStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateMemoryDeviceStatus is NOK. Current state is: {{ $labels.systemStateMemoryDeviceStatusCombined }}. Check system volatile memory in iDRAC.
SNMP_DELL_iDRAC_systemStatePowerSupplyStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStatePowerSupplyStatus is NOK. Current state is: {{ $labels.systemStatePowerSupplyStatusCombined }}. Check system power supply in iDRAC.
SNMP_DELL_iDRAC_systemStatePowerUnitStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStatePowerUnitStatus is NOK. Current state is: {{ $labels.systemStatePowerUnitStatusCombined }}. Check system power supply or external power delivery in iDRAC.
SNMP_DELL_iDRAC_systemStateProcessorDeviceStatusCombined_NOK 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateProcessorDeviceStatus is NOK. Current state is: {{ $labels.systemStateProcessorDeviceStatusCombined }}. Check system processor in iDRAC.
SNMP_DELL_iDRAC_systemStateTemperatureStatisticsStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateTemperatureStatisticsStatus is NOK. Current state is: {{ $labels.systemStateTemperatureStatisticsStatusCombined }}. Check system temperatures in iDRAC.
SNMP_DELL_iDRAC_systemStateTemperatureStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateTemperatureStatus is NOK. Current state is: {{ $labels.systemStateTemperatureStatusCombined }}. Check system temperatures in iDRAC.
SNMP_DELL_iDRAC_systemStateVoltageStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateVoltageStatus is NOK. Current state is: {{ $labels.systemStateVoltageStatusCombined }}. Check system voltage in iDRAC.
SNMP_DELL_iDRAC_systemStateAmperageStatusCombined_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemStateAmperageStatus is NOK. Current state is: {{ $labels.systemStateAmperageStatusCombined }}. Check system voltage in iDRAC.
SNMP_DELL_iDRAC_controllerRollUpStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - controllerRollUpStatus is NOK for controllerNumber {{ $labels.controllerNumber }} ( {{ $labels.controllerName }}). Current state is: {{ $labels.controllerRollUpStatus }}.
SNMP_DELL_iDRAC_controllerComponentStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - controllerComponentStatus is NOK for controllerNumber {{ $labels.controllerNumber }} ( {{ $labels.controllerName }}). Current state is: {{ $labels.controllerComponentStatus }}.
SNMP_DELL_iDRAC_physicalDiskState_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskState is NOK for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Current state is: {{ $labels.physicalDiskState }}.
SNMP_DELL_iDRAC_physicalDiskComponentStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskComponentStatus is NOK for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Current state is: {{ $labels.physicalDiskComponentStatus }}.
SNMP_DELL_iDRAC_physicalDiskSmartAlertIndication_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskSmartAlertIndication is NOK for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}).
SNMP_DELL_iDRAC_physicalDiskRemainingRatedWriteEndurance_WARNING 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskRemainingRatedWriteEndurance is less than 40 for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Value: {{ humanize $value }}
SNMP_DELL_iDRAC_physicalDiskRemainingRatedWriteEndurance_CRITICAL 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - physicalDiskRemainingRatedWriteEndurance is less than 20 for physicalDiskNumber {{ $labels.physicalDiskNumber }} ( {{ $labels.physicalDiskDisplayName }}). Value: {{ humanize $value }}
SNMP_DELL_iDRAC_virtualDiskState_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - virtualDiskState is NOK for virtualDiskNumber {{ $labels.virtualDiskNumber }} ( {{ $labels.virtualDiskDisplayName }}). Current state is: {{ $labels.virtualDiskState }}.
SNMP_DELL_iDRAC_virtualDiskComponentStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - virtualDiskComponentStatus is NOK for virtualDiskNumber {{ $labels.virtualDiskNumber }} ( {{ $labels.virtualDiskDisplayName }}). Current state is: {{ $labels.virtualDiskComponentStatus }}.
SNMP_DELL_iDRAC_virtualDiskBadBlocksDetected 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - virtualDiskBadBlocksDetected for virtualDiskNumber {{ $labels.virtualDiskNumber }} ( {{ $labels.virtualDiskDisplayName }}).
alert-rules/allclusters/snmp-hp-cpq.yaml
Alert Name | For | Severity | Type | Description
SNMP_HP_CPQ_Overall_Health_NOK 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Overall health status is NOK. Value: "{{ $labels.cpqHeMibCondition }}"
SNMP_HP_CPQ_Event_Log_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Event Log Condition is NOK. Value: "{{ $labels.cpqHeEventLogCondition }}"
SNMP_HP_CPQ_CPU_Health_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - CPU status is NOK. Value: "{{ $labels.cpqSeCpuCondition }}"
SNMP_HP_CPQ_Thermal_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Thermal condition status is NOK. Value: "{{ $labels.cpqHeThermalCondition }}"
SNMP_HP_CPQ_Power_Supply_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Power supply condition status is NOK. Value: "{{ $labels.cpqHeFltTolPwrSupplyCondition }}"
SNMP_HP_CPQ_Storage_Subsystem_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Storage subsystem condition status is NOK. Value: "{{ $labels.cpqSsMibCondition }}"
SNMP_HP_CPQ_Controller_Overall_Condition_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - Controller "{{ $labels.cpqDaCntlrIndex }}" status is NOK. Value: "{{ $labels.cpqDaCntlrCondition }}". This value represents the overall condition of this controller, and any associated logical drives, physical drives, and array accelerator.
SNMP_HP_CPQ_iLO_LicenseKey_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - HP iLO interface is missing its License activation.
alert-rules/allclusters/snmp-lenovo-xcc.yaml
Alert Name | For | Severity | Type | Description
SNMP_Lenovo_XCC_systemHealthStat_NOK 5m critical hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - systemHealthStat is not "normal". Current state is: {{ $labels.systemHealthStat }}
SNMP_Lenovo_XCC_cpuVpdHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - cpuVpdHealthStatus for CPU "{{ $labels.cpuVpdDescription }}" is not "normal". Current state is: {{ $labels.cpuVpdHealthStatus }}
SNMP_Lenovo_XCC_raidDriveHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - raidDriveHealthStatus for "{{ $labels.raidDriveName }}" is not "Normal". Current state is: {{ $labels.raidDriveHealthStatus }}
SNMP_Lenovo_XCC_memoryHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - memoryHealthStatus for DIMM "{{ $labels.memoryVpdDescription }}" is not "Normal". Current state is: {{ $labels.memoryHealthStatus }}
SNMP_Lenovo_XCC_fanHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - fanHealthStatus for Fan "{{ $labels.fanDescr }}" is not "Normal". Current state is: {{ $labels.fanHealthStatus }}
SNMP_Lenovo_XCC_voltHealthStatus_NOK 5m warning hardware Target "{{ $labels.alias }}" [ cluster: "{{ $labels.cluster_name }}" / address: "{{ $labels.instance }}" ] - voltHealthStatus for System Component "{{ $labels.voltDescr }}" is not "Normal". Current state is: {{ $labels.voltHealthStatus }}
alert-rules/management-cluster/flux.yaml
Alert Name | For | Severity | Type | Description
Flux_Kustomization_Failing 15m warning deployment Flux Kustomization "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" fails to reconcile.
Flux_Kustomization_Failing_Cluster 60m warning deployment Flux Kustomization "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" fails to reconcile.
Flux_HelmRelease_Failing 15m warning deployment Flux HelmRelease "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace}}" fails to reconcile.
Flux_Source_Failing 15m warning deployment Flux Source "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace}}" fails to reconcile.
Flux_Resource_Suspended 2h warning deployment Flux Resource "{{ $labels.name }}" in namespace "{{ $labels.exported_namespace }}" suspended.
alert-rules/management-cluster/harbor.yaml
Alert Name | For | Severity | Type | Description
Harbor_Component_Status_NOK 5m warning tools Harbor component "{{ $labels.component }}" status is DOWN.
alert-rules/management-cluster/health-alerts.yaml
Alert Name | For | Severity | Type | Description
KubeContainerWaitingManagement 1h critical k8s Pod "{{ $labels.namespace }}" / "{{ $labels.pod }}" has been in waiting state for more than 1 hour.
alert-rules/management-cluster/minio.yaml
Alert Name | For | Severity | Type | Description
MinIO_Cluster_Health_Status_NOK 5m critical storage MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" health status not OK.
MinIO_Cluster_Health_Status_Unknown 5m critical storage MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" health status is Unknown. The cluster does not return cluster metrics. Check pods logs for error messages.
MinIO_Cluster_Disk_Offline 5m critical storage MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" disk offline.
MinIO_Cluster_Disk_Space_Usage 5m warning storage MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" available disk space is less than 30%.
MinIO_Cluster_Disk_Space_Usage 5m critical storage MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" available disk space is less than 10%.
MinIO_Cluster_Disk_Space_Will_Fill_Up_Soon 5m warning storage MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" at the current rate of utilization the available disk space will run out in the next 2 days.
MinIO_Cluster_Tolerance 5m critical storage MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" has lost quorum on pool "{{ $labels.pool }}" / set "{{ $labels.set }}" for more than 5 minutes.
MinIO_Nodes_Offline 5m warning storage MinIO cluster "{{ $labels.minio_tenant }}" in namespace "{{ $labels.namespace }}" has offline nodes.
alert-rules/management-cluster/thanos.yaml
Alert Name | For | Severity | Type | Description
ThanosQueryStoreEndpointsMissing 5m critical monitoring Thanos Query is missing "{{ $labels.store_type }}" store type. Metrics served by this store type will not be available which can lead to alerting rules not evaluating properly.
ThanosCompactMultipleRunning 5m warning monitoring More than one Thanos Compact instance is running. Current number of instances: {{ $value }}.
ThanosCompactHalted 5m warning monitoring Thanos Compact {{ $labels.job }} has failed to run and now is halted.
ThanosCompactHighCompactionFailures 15m warning monitoring Thanos Compact {{ $labels.job }} is failing to execute {{ $value
ThanosCompactBucketHighOperationFailures 15m warning monitoring Thanos Compact {{ $labels.job }} Bucket is failing to execute {{ $value
ThanosCompactHasNotRun 5m warning monitoring Thanos Compact {{ $labels.job }} has not uploaded anything for 24 hours.
ThanosQueryHttpRequestQueryErrorRateHigh 5m critical monitoring Thanos Query {{ $labels.job }} is failing to handle {{ $value
ThanosQueryHttpRequestQueryRangeErrorRateHigh 5m critical monitoring Thanos Query {{ $labels.job }} is failing to handle {{ $value
ThanosQueryGrpcServerErrorRate 5m warning monitoring Thanos Query {{ $labels.job }} is failing to handle {{ $value
ThanosQueryGrpcClientErrorRate 5m warning monitoring Thanos Query {{ $labels.job }} is failing to send {{ $value
ThanosQueryHighDNSFailures 15m warning monitoring Thanos Query {{ $labels.job }} have {{ $value
ThanosQueryInstantLatencyHigh 10m critical monitoring Thanos Query {{ $labels.job }} has a 99th percentile latency of {{ $value }} seconds for instant queries.
ThanosQueryRangeLatencyHigh 10m critical monitoring Thanos Query {{ $labels.job }} has a 99th percentile latency of {{ $value }} seconds for range queries.
ThanosQueryOverload 15m warning monitoring Thanos Query {{ $labels.job }} has been overloaded for more than 15 minutes. This may be a symptom of excessive simultaneous complex requests, low performance of the Prometheus API, or failures within these components. Assess the health of the Thanos query instances, the connected Prometheus instances, look for potential senders of these requests and then contact support.
ThanosReceiveHttpRequestErrorRateHigh 5m critical monitoring Thanos Receive {{ $labels.job }} is failing to handle {{ $value
ThanosReceiveHttpRequestLatencyHigh 10m critical monitoring Thanos Receive {{ $labels.job }} has a 99th percentile latency of {{ $value }} seconds for requests.
ThanosReceiveHighReplicationFailures 5m warning monitoring Thanos Receive {{ $labels.job }} is failing to replicate {{ $value
ThanosReceiveHighForwardRequestFailures 5m info monitoring Thanos Receive {{ $labels.job }} is failing to forward {{ $value
ThanosReceiveHighHashringFileRefreshFailures 15m warning monitoring Thanos Receive {{ $labels.job }} is failing to refresh hashring file, {{ $value
ThanosReceiveConfigReloadFailure 5m warning monitoring Thanos Receive {{ $labels.job }} has not been able to reload hashring configurations.
ThanosReceiveNoUpload 3h critical monitoring Thanos Receive {{ $labels.pod }} has not uploaded latest data to object storage.
ThanosReceiveLimitsConfigReloadFailure 5m warning monitoring Thanos Receive {{ $labels.job }} has not been able to reload the limits configuration.
ThanosReceiveLimitsHighMetaMonitoringQueriesFailureRate 5m warning monitoring Thanos Receive {{ $labels.job }} is failing for {{ $value
ThanosReceiveTenantLimitedByHeadSeries 5m warning monitoring Thanos Receive tenant {{ $labels.tenant }} is limited by head series.
ThanosStoreGrpcErrorRate 5m warning monitoring Thanos Store {{ $labels.job }} is failing to handle {{ $value
ThanosStoreSeriesGateLatencyHigh 10m warning monitoring Thanos Store {{ $labels.job }} has a 99th percentile latency of {{ $value }} seconds for store series gate requests.
ThanosStoreBucketHighOperationFailures 15m warning monitoring Thanos Store {{ $labels.job }} Bucket is failing to execute {{ $value
ThanosStoreObjstoreOperationLatencyHigh 10m warning monitoring Thanos Store {{ $labels.job }} Bucket has a 99th percentile latency of {{ $value }} seconds for the bucket operations.
ThanosRuleQueueIsDroppingAlerts 5m critical monitoring Thanos Rule {{ $labels.pod }} is failing to queue alerts.
ThanosRuleSenderIsFailingAlerts 5m critical monitoring Thanos Rule {{ $labels.pod }} is failing to send alerts to alertmanager.
ThanosRuleHighRuleEvaluationFailures 5m critical monitoring Thanos Rule {{ $labels.pod }} is failing to evaluate rules.
ThanosRuleHighRuleEvaluationWarnings 15m info monitoring Thanos Rule {{ $labels.pod }} has high number of evaluation warnings.
ThanosRuleRuleEvaluationLatencyHigh 5m warning monitoring Thanos Rule {{ $labels.pod }} has higher evaluation latency than interval for {{ $labels.rule_group }}.
ThanosRuleGrpcErrorRate 5m warning monitoring Thanos Ruler {{ $labels.pod }} is failing to handle {{ $value
ThanosRuleConfigReloadFailure 5m info monitoring Thanos Ruler {{ $labels.pod }} has not been able to reload its configuration.
ThanosRuleQueryHighDNSFailures 15m warning monitoring Thanos Ruler {{ $labels.pod }} has {{ $value
ThanosRuleAlertmanagerHighDNSFailures 15m warning monitoring Thanos Rule {{ $labels.pod }} has {{ $value
ThanosRuleNoEvaluationFor10Intervals 5m info monitoring Thanos Ruler {{ $labels.pod }} has rule groups that did not evaluate for at least 10x of their expected interval.
ThanosNoRuleEvaluations 5m critical monitoring Thanos Ruler {{ $labels.pod }} did not perform any rule evaluations in the past 10 minutes.
ThanosBucketReplicateErrorRate 5m critical monitoring Thanos Replicate is failing to run, {{ $value
ThanosBucketReplicateRunLatency 5m critical monitoring Thanos Replicate {{ $labels.job }} has a 99th percentile latency of {{ $value }} seconds for the replicate operations.
ThanosCompactIsDown 5m critical monitoring Thanos Compact has disappeared. Prometheus target for the component cannot be discovered.
ThanosQueryIsDown 5m critical monitoring Thanos Query has disappeared. Prometheus target for the component cannot be discovered.
ThanosQueryFrontendIsDown 5m critical monitoring Thanos Query Frontend has disappeared. Prometheus target for the component cannot be discovered.
ThanosReceiveIsDown 5m critical monitoring Thanos Receive has disappeared. Prometheus target for the component cannot be discovered.
ThanosRuleIsDown 5m critical monitoring Thanos Ruler has disappeared. Prometheus target for the component cannot be discovered.
ThanosStoreIsDown 5m critical monitoring Thanos Store has disappeared. Prometheus target for the component cannot be discovered.
alert-rules/my-workload-cluster/health-alerts.yaml
Alert Name | For | Severity | Type | Description
KubeJobFailedWorkload 15m warning k8s Job {{ $labels.namespace }}/{{ $labels.job_name }} failed to complete. Removing failed job after investigation should clear this alert.

v0.1.2: sylva-prometheus-rules: 0.1.2 - withdrawn, use 0.2.1 instead

Compare Source

⚠️ withdrawn, use 0.2.1 instead ⚠️

v0.1.1: sylva-prometheus-rules: 0.1.1 - withdrawn, use 0.2.1 instead

Compare Source

⚠️ withdrawn, use 0.2.1 instead ⚠️


Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻️ Rebasing: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this MR and you won't be reminded about this update again.


  • If you want to rebase/retry this MR, check this box

This MR has been generated by Renovate Bot Sylva instance.

CI configuration couldn't be handled via the MR description. A dedicated comment has been posted to control it.

If no checkbox is checked, a default pipeline will be enabled (capm3, or capo if the capo label is set).

