Update dependency https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-thanos-rules.git to v0.1.0 (main)
This MR contains the following updates:
| Package | Update | Change |
|---|---|---|
| https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-thanos-rules.git | minor | `0.0.3` -> `0.1.0` |
Release Notes
sylva-projects/sylva-elements/helm-charts/sylva-thanos-rules (https://gitlab.com/sylva-projects/sylva-elements/helm-charts/sylva-thanos-rules.git)
v0.1.0: sylva-thanos-rules: 0.1.0
Merge Requests integrated in this release
CI
- Update dependency renovate-bot/renovate-runner to v20.1.0
  - Update dependency renovate-bot/renovate-runner to v19.77.0 !65 renovate
  - Update dependency renovate-bot/renovate-runner to v19.84.1 !66 renovate
  - Update dependency renovate-bot/renovate-runner to v19.94.0 !69 renovate
  - Update dependency renovate-bot/renovate-runner to v19.107.1 !72 renovate
  - Update dependency renovate-bot/renovate-runner to v19.111.4 !73 renovate
  - Update dependency renovate-bot/renovate-runner to v20 !74 renovate
  - Update dependency renovate-bot/renovate-runner to v20.1.0 !76 renovate
- Update dependency sylva-projects/sylva-elements/ci-tooling/ci-templates to v1.0.38
  - Update dependency sylva-projects/sylva-elements/ci-tooling/ci-templates to v1.0.34 !67 renovate
  - Update dependency sylva-projects/sylva-elements/ci-tooling/ci-templates to v1.0.35 !68 renovate
  - Update dependency sylva-projects/sylva-elements/ci-tooling/ci-templates to v1.0.36 !70 renovate
  - Update dependency sylva-projects/sylva-elements/ci-tooling/ci-templates to v1.0.37 !75 renovate
  - Update dependency sylva-projects/sylva-elements/ci-tooling/ci-templates to v1.0.38 !79 renovate
- Update dependency to-be-continuous/gitleaks to v2.7.0 !71 renovate
- Automatically update readme file !78 docs
Contributors
sylva-thanos-rules
Generate ConfigMap object for consumption by Thanos Ruler
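As an illustrative sketch (the ConfigMap name, labels, and file keys below are hypothetical, not taken from the chart's templates), the rendered output might look like a ConfigMap wrapping Prometheus-format rule files, using one of the alerts documented below:

```yaml
# Hypothetical sketch: actual metadata and keys are defined by the
# chart's templates; only the alert shown is taken from the tables below.
apiVersion: v1
kind: ConfigMap
metadata:
  name: sylva-thanos-rules          # illustrative name
  labels:
    thanos-ruler: observed          # a label Thanos Ruler could select on
data:
  clusters_state_rules.yml: |
    groups:
      - name: clusters-state
        rules:
          - alert: Sylva_clusters_metric_absent
            expr: absent(capi_cluster_info)   # assumed expression for an absent-metric alert
            for: 45m
            labels:
              severity: error
              type: deployment
            annotations:
              description: Metric "capi_cluster_info" from the management cluster is not exposed by "kube-state-metrics".
```

Thanos Ruler evaluates such rule files against its configured query endpoints, so alerts fire centrally across all clusters sending data to Thanos.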
Details about rules
rules/_helper_kubernetes_metadata.yml
| Alert Name | For | Severity | Type | Description |
|---|---|---|---|---|
| k8s-Metamonitoring_configuration_error_kube_namespace_labels | 45m | error | deployment | Metric "kube_namespace_labels" from cluster "{{ $labels.capi_cluster_name }}" is not exposed by "kube-state-metrics". |
| k8s-Metamonitoring_configuration_error_rancher_project_info | 45m | error | deployment | Metric "rancher_project_info" from the management cluster is not exposed by "kube-state-metrics". |
rules/clusters_state_rules.yml
| Alert Name | For | Severity | Type | Description |
|---|---|---|---|---|
| Sylva_cluster_Prometheus_not_Sending_Data_management | 45m | critical | deployment | Prometheus server from the management cluster has not sent data in the last 45m. |
| Sylva_cluster_Prometheus_not_Sending_Data | 45m | critical | deployment | Prometheus server from cluster "{{ $labels.capi_cluster_name }}" in namespace "{{ $labels.capi_cluster_namespace }}" has not sent data in the last 45m. |
| Sylva_clusters_different_number | 45m | critical | deployment | A cluster is not properly provisioned in Rancher; check all clusters to verify that cattle-agent is properly deployed. |
| Sylva_clusters_metric_absent | 45m | error | deployment | Metric "capi_cluster_info" from the management cluster is not exposed by "kube-state-metrics". |
rules/kubernetes_capacity.yml
| Alert Name | For | Severity | Type | Description |
|---|---|---|---|---|
| k8s-Cluster_CPU_Overcommitted | 5m | warning | k8s | Kubernetes cluster "{{ $labels.cluster }}" has allocated over 75% of the allocatable CPUs. Node failures may cause Pods to be unschedulable due to lack of resources. |
| k8s-Cluster_Memory_Overcommitted | 5m | warning | k8s | Kubernetes cluster "{{ $labels.cluster }}" has allocated over 75% of the allocatable Memory. Node failures may cause Pods to be unschedulable due to lack of resources. |
| k8s-Too_Many_Pods | 15m | warning | k8s | Kubernetes cluster "{{ $labels.cluster }}" node {{ $labels.node }} is running over 90% of its Pod count limit. Value: {{ humanize $value }}% |
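A capacity alert like the CPU one above could be expressed by comparing requested CPU against allocatable CPU per cluster. This is a hedged sketch (the chart's actual expression may differ, e.g. in recording-rule usage or label matching):

```yaml
# Illustrative rule fragment, not the chart's exact expression.
- alert: k8s-Cluster_CPU_Overcommitted
  expr: |
    sum by (cluster) (kube_pod_container_resource_requests{resource="cpu"})
      /
    sum by (cluster) (kube_node_status_allocatable{resource="cpu"})
      > 0.75
  for: 5m
  labels:
    severity: warning
    type: k8s
```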
rules/kubernetes_cluster_components.yml
| Alert Name | For | Severity | Type | Description |
|---|---|---|---|---|
| k8s-Version_Mismatch | 4h | warning | k8s | Kubernetes cluster "{{ $labels.cluster }}" has different versions of Kubernetes components running. Value: {{ $value }} |
| k8s-Client_Errors | 15m | warning | k8s | Kubernetes cluster "{{ $labels.cluster }}" API server client "{{ $labels.instance }}" job "{{ $labels.job }}" is experiencing errors. Value: {{ printf "%0.0f" $value }}% |
| k8s-API_Global_Error_Rate_High | 15m | warning | k8s | Kubernetes cluster "{{ $labels.cluster }}" API server is returning errors for over 3% of requests. Value: {{ humanize $value }}% |
| k8s-API_Error_Rate_High | 15m | warning | k8s | Kubernetes cluster "{{ $labels.cluster }}" API server is returning errors for over 10% of requests for {{ $labels.verb }} {{ $labels.resource }} {{ $labels.subresource }}. Value: {{ humanize $value }}% |
rules/kubernetes_jobs.yml
| Alert Name | For | Severity | Type | Description |
|---|---|---|---|---|
| k8s-CronJob_Status_Failed | 5m | warning | k8s | CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} in cluster "{{ $labels.cluster }}" failed. Last job has failed multiple times. Value: {{ $value }} |
| k8s-CronJob_Taking_Too_Long | 0m | warning | k8s | CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} in cluster "{{ $labels.cluster }}" is taking too long to complete (over its deadline). Value: {{ humanizeDuration $value }} |
| k8s-Job_not_Completed | 15m | warning | k8s | Job {{ $labels.namespace }}/{{ $labels.job_name }} in cluster "{{ $labels.cluster }}" is taking more than 12h to complete. |
| k8s-Job_Failed | 15m | warning | k8s | Job {{ $labels.namespace }}/{{ $labels.job_name }} in cluster "{{ $labels.cluster }}" failed to complete. Removing failed job after investigation should clear this alert. |
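The job failure alert above is typically driven by kube-state-metrics' failure gauge. A minimal sketch, assuming that metric (the chart's real expression may add label filters):

```yaml
# Illustrative rule fragment for a failed-Job alert.
- alert: k8s-Job_Failed
  expr: kube_job_status_failed > 0
  for: 15m
  labels:
    severity: warning
    type: k8s
```

Because the metric stays nonzero until the failed Job object is deleted, removing the Job after investigation clears the alert, as the description notes.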
rules/kubernetes_nodes.yml
| Alert Name | For | Severity | Type | Description |
|---|---|---|---|---|
| k8s-Node_status_OutOfDisk | 5m | warning | k8s | Node {{ $labels.node }} in cluster "{{ $labels.cluster }}" is almost out of disk space. |
| k8s-Node_status_MemoryPressure | 5m | warning | k8s | Node {{ $labels.node }} in cluster "{{ $labels.cluster }}" is under memory pressure. |
| k8s-Node_status_DiskPressure | 5m | warning | k8s | Node {{ $labels.node }} in cluster "{{ $labels.cluster }}" is under disk pressure. |
| k8s-Node_status_PIDPressure | 5m | warning | k8s | Node {{ $labels.node }} in cluster "{{ $labels.cluster }}" is under PID pressure. |
| k8s-Node_status_NotReady | 5m | critical | k8s | Node {{ $labels.node }} in cluster "{{ $labels.cluster }}" has not been Ready for more than an hour. |
| k8s-Node_status_NetworkUnavailable | 5m | warning | k8s | Node {{ $labels.node }} in cluster "{{ $labels.cluster }}" has NetworkUnavailable condition. |
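Node-condition alerts like these usually key off `kube_node_status_condition`. A hedged sketch of the NotReady case (the chart's exact expression and window may differ):

```yaml
# Illustrative rule fragment for a not-Ready node.
- alert: k8s-Node_status_NotReady
  expr: kube_node_status_condition{condition="Ready", status="true"} == 0
  for: 5m
  labels:
    severity: critical
    type: k8s
```

The other conditions in the table (MemoryPressure, DiskPressure, PIDPressure, NetworkUnavailable) would follow the same pattern with `status="true"` and the corresponding `condition` label compared to 1.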
rules/kubernetes_pods.yml
| Alert Name | For | Severity | Type | Description |
|---|---|---|---|---|
| k8s-Kube_Pod_not_Ready | 15m | warning | k8s | Pod {{ $labels.namespace }}/{{ $labels.pod }} in cluster "{{ $labels.cluster }}" has been in a non-ready state for longer than 15 minutes. |
| k8s-Pod_OOMKilled | 0m | critical | k8s | Pod {{ $labels.namespace }}/{{ $labels.pod }} in cluster "{{ $labels.cluster }}" has been restarted due to OOMKilled reason in the last hour. Value: {{ humanize $value }} |
| k8s-Pod_CrashLooping | 15m | warning | k8s | Pod {{ $labels.namespace }}/{{ $labels.pod }} in cluster "{{ $labels.cluster }}" was restarted more than 5 times within the last hour. Value: {{ humanize $value }} |
| k8s-init_Container_CrashLooping | 15m | warning | k8s | Init Container from Pod {{ $labels.namespace }}/{{ $labels.pod }} in cluster "{{ $labels.cluster }}" was restarted more than 5 times within the last hour. Value: {{ humanize $value }} |
| k8s-StatefulSet_Replicas_not_Ready | 15m | warning | k8s | StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} in cluster "{{ $labels.cluster }}" has replicas in "notReady" state. Value: {{ humanize $value }} |
| k8s-StatefulSet_Generation_Mismatch | 15m | warning | k8s | StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} in cluster "{{ $labels.cluster }}" generation does not match. This indicates that the StatefulSet has failed but has not been rolled back. |
| k8s-StatefulSet_Update_not_Rolled_Out | 15m | warning | k8s | StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }} in cluster "{{ $labels.cluster }}" update has not been rolled out |
| k8s-ReplicaSet_Replicas_Mismatch | 15m | warning | k8s | ReplicaSet {{ $labels.namespace }}/{{ $labels.replicaset }} in cluster "{{ $labels.cluster }}" has not matched the expected number of replicas |
| k8s-Deployment_Replicas_Mismatch | 15m | warning | k8s | Deployment {{ $labels.namespace }}/{{ $labels.deployment }} in cluster "{{ $labels.cluster }}" has not matched the expected number of replicas. Value: {{ $value }} / {{ printf "kube_deployment_spec_replicas{deployment=\"%s\", cluster=\"%s\"}" $labels.deployment $labels.cluster \| query \| first \| value }} |
| k8s-Deployment_Generation_Mismatch | 15m | warning | k8s | Deployment {{ $labels.namespace }}/{{ $labels.deployment }} in cluster "{{ $labels.cluster }}" generation does not match expected one |
| k8s-Deployment_Replicas_Not_Updated | 15m | warning | k8s | Deployment {{ $labels.namespace }}/{{ $labels.deployment }} in cluster "{{ $labels.cluster }}" replicas are not updated and available for deployment |
| k8s-Deployment_Rollout_Stuck | 15m | warning | k8s | Deployment {{ $labels.namespace }}/{{ $labels.deployment }} in cluster "{{ $labels.cluster }}" is not progressing for longer than 15 minutes. |
| k8s-DaemonSet_Rollout_Stuck | 15m | warning | k8s | DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} in cluster "{{ $labels.cluster }}" has less than 100% of desired pods scheduled and ready. Value: {{ humanize $value }}% |
| k8s-DaemonSet_Not_Scheduled | 15m | warning | k8s | DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} in cluster "{{ $labels.cluster }}" has unscheduled pods. Value: {{ humanize $value }} |
| k8s-DaemonSet_MisScheduled | 15m | warning | k8s | DaemonSet pods {{ $labels.namespace }}/{{ $labels.daemonset }} in cluster "{{ $labels.cluster }}" are running where they are not supposed to. Value: {{ humanize $value }} |
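Restart-based alerts such as k8s-Pod_CrashLooping above commonly use the container restart counter over a sliding window. A minimal sketch, assuming the standard kube-state-metrics counter (the chart's real expression may aggregate or filter differently):

```yaml
# Illustrative rule fragment: fires when a container restarted
# more than 5 times within the last hour.
- alert: k8s-Pod_CrashLooping
  expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
  for: 15m
  labels:
    severity: warning
    type: k8s
```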
rules/kubernetes_storage.yml
| Alert Name | For | Severity | Type | Description |
|---|---|---|---|---|
| k8s-Persistent_Volume_Usage_Critical | 5m | warning | k8s | PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} in cluster "{{ $labels.cluster }}" is over 80% used. Value: {{ printf "%0.2f" $value }}% |
| k8s-Persistent_Volume_Full_in_4_days | 5m | warning | k8s | PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} in cluster "{{ $labels.cluster }}" will fill up in 4 days at the current rate of utilization. Value: {{ printf "%0.2f" $value }}% available |
| k8s-Persistent_Volume_Errors | 5m | warning | k8s | PersistentVolume "{{ $labels.persistentvolume }}" in cluster "{{ $labels.cluster }}" has status "{{ $labels.phase }}" |
| k8s-Orphan_Persistent_Volume_Claim | 3h | warning | k8s | PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} in cluster "{{ $labels.cluster }}" is not used by any pod |
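The "full in 4 days" alert above is the classic linear-extrapolation pattern: project the available-bytes trend forward and alert if it crosses zero. A hedged sketch using kubelet volume stats (the chart's actual window and guards, e.g. against nearly-static volumes, may differ):

```yaml
# Illustrative rule fragment: predicts whether the volume's free space,
# extrapolated from the last 6h of samples, reaches zero within 4 days.
- alert: k8s-Persistent_Volume_Full_in_4_days
  expr: predict_linear(kubelet_volume_stats_available_bytes[6h], 4 * 24 * 3600) < 0
  for: 5m
  labels:
    severity: warning
    type: k8s
```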
Configuration
- [ ] If you want to rebase/retry this MR, check this box
This MR has been generated by Renovate Bot Sylva instance.
CI configuration can't be handled in the MR description; a dedicated comment has been posted to control it.
If no checkbox is checked, a default pipeline will be enabled (capm3, or capo if the capo label is set).