[ops] Update grafana/mimir Docker tag to v2.14.0

Soos requested to merge renovate/ops-grafana-mimir-2.x into master

This MR contains the following updates:

Package                  Type  Update  Change
grafana/mimir (source)   ops   minor   2.12.0 -> 2.14.0

Warning

Some dependencies could not be looked up. Check the warning logs for more information.


Release Notes

grafana/mimir (grafana/mimir)

v2.14.0

Grafana Mimir
  • [CHANGE] Update the minimum supported version of Go to 1.22. #​9134
  • [CHANGE] Store-gateway / querier: enable streaming chunks from store-gateways to queriers by default. #​6646
  • [CHANGE] Querier: honor the start/end time range specified in the read hints when executing a remote read request. #​8431
  • [CHANGE] Querier: return only samples within the queried start/end time range when executing a remote read request using "SAMPLES" mode. Previously, samples outside of the range could have been returned. Samples outside of the queried time range may still be returned when executing a remote read request using "STREAMED_XOR_CHUNKS" mode. #​8463
  • [CHANGE] Querier: Set a minimum of 4 for -querier.max-concurrent to prevent queue starvation with the querier-worker queue prioritization algorithm; values below 4 are ignored and the minimum of 4 is used instead. #​9054
  • [CHANGE] Store-gateway: enabled -blocks-storage.bucket-store.max-concurrent-queue-timeout by default with a timeout of 5 seconds. #​8496
  • [CHANGE] Store-gateway: enabled -blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout by default with a timeout of 5 seconds. #​8667
  • [CHANGE] Distributor: Incoming OTLP requests were previously size-limited using the limit from the -distributor.max-recv-msg-size option. A new -distributor.max-otlp-request-size option now limits OTLP requests, with a default value of 100 MiB. #​8574
  • [CHANGE] Distributor: remove metric cortex_distributor_sample_delay_seconds. #​8698
  • [CHANGE] Query-frontend: Remove deprecated frontend.align_queries_with_step YAML configuration. The configuration option has been moved to per-tenant and default limits since Mimir 2.12. #​8733 #​8735
  • [CHANGE] Store-gateway: Change default of -blocks-storage.bucket-store.max-concurrent to 200. #​8768
  • [CHANGE] Added the new metric cortex_compactor_disk_out_of_space_errors_total, which counts how many times a compaction failed because the compactor ran out of disk space. Alert if it increases even once. #​8237 #​8278
  • [CHANGE] Store-gateway: Remove experimental parameter -blocks-storage.bucket-store.series-selection-strategy. The default strategy is now worst-case. #​8702
  • [CHANGE] Store-gateway: Rename -blocks-storage.bucket-store.series-selection-strategies.worst-case-series-preference to -blocks-storage.bucket-store.series-fetch-preference and promote to stable. #​8702
  • [CHANGE] Querier, store-gateway: remove deprecated -querier.prefer-streaming-chunks-from-store-gateways=true. Streaming from store-gateways is now always enabled. #​8696
  • [CHANGE] Ingester: remove deprecated -ingester.return-only-grpc-errors. #​8699 #​8828
  • [CHANGE] Distributor, ruler: remove deprecated -ingester.client.report-grpc-codes-in-instrumentation-label-enabled. #​8700
  • [CHANGE] Ingester client: experimental support for client-side circuit breakers, their configuration options (-ingester.client.circuit-breaker.*) and metrics (cortex_ingester_client_circuit_breaker_results_total, cortex_ingester_client_circuit_breaker_transitions_total) were removed. #​8802
  • [CHANGE] Ingester: circuit breakers no longer open on per-instance limit errors. They can now open only when push and pull requests exceed the configured duration. #​8854
  • [CHANGE] Query-frontend: Return 413 Request Entity Too Large if a response shard for an /active_series request is too large. #​8861
  • [CHANGE] Distributor: Promote replying with Retry-After header on retryable errors to stable and set -distributor.retry-after-header.enabled=true by default. #​8694
  • [CHANGE] Distributor: Replace -distributor.retry-after-header.max-backoff-exponent and -distributor.retry-after-header.base-seconds with -distributor.retry-after-header.min-backoff and -distributor.retry-after-header.max-backoff for easier configuration. #​8694
  • [CHANGE] Ingester: increase the default inactivity timeout of active series (-ingester.active-series-metrics-idle-timeout) from 10m to 20m. #​8975
  • [CHANGE] Distributor: Remove -distributor.enable-otlp-metadata-storage flag, which was deprecated in version 2.12. #​9069
  • [CHANGE] Ruler: Removed the -ruler.drain-notification-queue-on-shutdown option; draining the notification queue on shutdown is now enabled by default. #​9115
  • [CHANGE] Querier: allow wrapping errors with context errors only when the former actually correspond to context.Canceled and context.DeadlineExceeded. #​9175
  • [CHANGE] Query-scheduler: Remove the experimental -query-scheduler.use-multi-algorithm-query-queue flag. The new multi-algorithm tree queue is always used for the scheduler. #​9210
  • [CHANGE] Distributor: reject incoming requests until the distributor service has started. #​9317
  • [CHANGE] Ingester, Distributor: Remove deprecated -ingester.limit-inflight-requests-using-grpc-method-limiter and -distributor.limit-inflight-requests-using-grpc-method-limiter. The feature was deprecated and enabled by default in Mimir 2.12. #​9407
  • [CHANGE] Querier: Remove deprecated -querier.max-query-into-future. The feature was deprecated in Mimir 2.12. #​9407
  • [CHANGE] Cache: Deprecate experimental support for Redis as a cache backend. Support is scheduled for removal in the next major release. #​9453
  • [FEATURE] Alertmanager: Added -alertmanager.log-parsing-label-matchers to control logging when parsing label matchers. This flag is intended to be used with -alertmanager.utf8-strict-mode-enabled to validate that UTF-8 strict mode is working as intended. The default value is false. #​9173
  • [FEATURE] Alertmanager: Added -alertmanager.utf8-migration-logging-enabled to enable logging of tenant configurations that are incompatible with UTF-8 strict mode. The default value is false. #​9174
  • [FEATURE] Querier: add experimental streaming PromQL engine, enabled with -querier.query-engine=mimir. #​8422 #​8430 #​8454 #​8455 #​8360 #​8490 #​8508 #​8577 #​8660 #​8671 #​8677 #​8747 #​8850 #​8872 #​8838 #​8911 #​8909 #​8923 #​8924 #​8925 #​8932 #​8933 #​8934 #​8962 #​8986 #​8993 #​8995 #​9008 #​9017 #​9018 #​9019 #​9120 #​9121 #​9136 #​9139 #​9140 #​9145 #​9191 #​9192 #​9194 #​9196 #​9201 #​9212 #​9225 #​9260 #​9272 #​9277 #​9278 #​9280 #​9281 #​9342 #​9343 #​9371
  • [FEATURE] Experimental Kafka-based ingest storage. #​6888 #​6894 #​6929 #​6940 #​6951 #​6974 #​6982 #​7029 #​7030 #​7091 #​7142 #​7147 #​7148 #​7153 #​7160 #​7193 #​7349 #​7376 #​7388 #​7391 #​7393 #​7394 #​7402 #​7404 #​7423 #​7424 #​7437 #​7486 #​7503 #​7508 #​7540 #​7621 #​7682 #​7685 #​7694 #​7695 #​7696 #​7697 #​7701 #​7733 #​7734 #​7741 #​7752 #​7838 #​7851 #​7871 #​7877 #​7880 #​7882 #​7887 #​7891 #​7925 #​7955 #​7967 #​8031 #​8063 #​8077 #​8088 #​8135 #​8176 #​8184 #​8194 #​8216 #​8217 #​8222 #​8233 #​8503 #​8542 #​8579 #​8657 #​8686 #​8688 #​8703 #​8706 #​8708 #​8738 #​8750 #​8778 #​8808 #​8809 #​8841 #​8842 #​8845 #​8853 #​8886 #​8988
    • What it is:
      • When the new ingest storage architecture is enabled, distributors write incoming write requests to a Kafka-compatible backend, and the ingesters asynchronously replay ingested data from Kafka. In this architecture, the write and read paths are decoupled through the Kafka-compatible backend. The load on the write path and on Kafka is a function of incoming write traffic, while the load on the read path is a function of received queries. Whatever the load on the read path, it doesn't affect the write path.
    • New configuration options:
      • -ingest-storage.enabled
      • -ingest-storage.kafka.*: configures Kafka-compatible backend and how clients interact with it.
      • -ingest-storage.ingestion-partition-tenant-shard-size: configures the per-tenant shuffle-sharding shard size used by partitions ring.
      • -ingest-storage.read-consistency: configures the default read consistency.
      • -ingest-storage.migration.distributor-send-to-ingesters-enabled: enables tee-ing writes to both classic ingesters and Kafka, used during a live migration to the new ingest storage architecture.
      • -ingester.partition-ring.*: configures partitions ring backend.
  • [FEATURE] Querier: added support for limitk() and limit_ratio() experimental PromQL functions. Experimental functions are disabled by default, but can be enabled by setting -querier.promql-experimental-functions-enabled=true in the query-frontend and querier. #​8632
  • [FEATURE] Querier: experimental support for the X-Mimir-Chunk-Info-Logger header, which triggers logging of information about TSDB chunks loaded from ingesters and store-gateways in the querier. The header should contain a comma-separated list of labels whose values will be included in the logs. #​8599
  • [FEATURE] Query frontend: added a new query pruning middleware to enable pruning dead code (e.g. expressions that cannot produce any results) and simplifying expressions (e.g. expressions that can be evaluated immediately) in queries. #​9086
  • [FEATURE] Ruler: added experimental configuration, -ruler.rule-evaluation-write-enabled, to disable writing the result of rule evaluation to ingesters. This feature can be used for testing purposes. #​9060
  • [FEATURE] Ingester: added experimental configuration ingester.ignore-ooo-exemplars. When set to true, out-of-order exemplars are no longer reported to the remote write client. #​9151
  • [ENHANCEMENT] Compactor: Add cortex_compactor_compaction_job_duration_seconds and cortex_compactor_compaction_job_blocks histogram metrics to track duration of individual compaction jobs and number of blocks per job. #​8371
  • [ENHANCEMENT] Rules: Added a per-namespace limit on the maximum number of rules per rule group. The maximum number of rules per rule group for all namespaces continues to be configured by -ruler.max-rules-per-rule-group, but it can now be superseded on a per-namespace basis by the new -ruler.max-rules-per-rule-group-by-namespace option. This new limit can be overridden per tenant using the overrides mechanism. #​8378
  • [ENHANCEMENT] Rules: Added a per-namespace limit on the maximum number of rule groups per tenant. The maximum number of rule groups per tenant for all namespaces continues to be configured by -ruler.max-rule-groups-per-tenant, but it can now be superseded on a per-namespace basis by the new -ruler.max-rule-groups-per-tenant-by-namespace option. This new limit can be overridden per tenant using the overrides mechanism. #​8425
  • [ENHANCEMENT] Ruler: Added support to protect rules namespaces from modification. The -ruler.protected-namespaces flag can be used to specify namespaces that are protected from rule modifications. The header X-Mimir-Ruler-Override-Namespace-Protection can be used to override the protection. #​8444
  • [ENHANCEMENT] Query-frontend: allow blocking remote read queries via the per-tenant runtime override blocked_queries. #​8372 #​8415
  • [ENHANCEMENT] Query-frontend: added remote_read to the supported values of the op label for the cortex_query_frontend_queries_total metric. #​8412
  • [ENHANCEMENT] Query-frontend: log the overall length of remote read requests, and the offsets of their start and end times from the current time. The start and end times are calculated as the minimum and maximum times of the individual queries in the remote read request. #​8404
  • [ENHANCEMENT] Storage Provider: Added option -<prefix>.s3.dualstack-enabled, which allows preventing the S3 client from resolving the AWS S3 endpoint into a dual-stack IPv4/IPv6 endpoint. Defaults to true. #​8405
  • [ENHANCEMENT] HA Tracker: Added reporting of most recent elected replica change via cortex_ha_tracker_last_election_timestamp_seconds gauge, logging, and a new column in the HA Tracker status page. #​8507
  • [ENHANCEMENT] Use sd_notify to send events to systemd at start and stop of Mimir services. The default systemd mimir.service config now waits for those events with a configurable timeout (TimeoutStartSec, 3 minutes by default) to handle long start times (e.g. store-gateway). #​8220 #​8555 #​8658
  • [ENHANCEMENT] Alertmanager: Reloading config and templates no longer needs to hit the disk. #​4967
  • [ENHANCEMENT] Compactor: Added experimental -compactor.in-memory-tenant-meta-cache-size option to set the size of the in-memory cache (in number of items) for parsed meta.json files. This can help when a tenant has many meta.json files and parsing them before each compaction cycle uses a lot of CPU time. #​8544
  • [ENHANCEMENT] Distributor: Interrupt OTLP write request translation when context is canceled or has timed out. #​8524
  • [ENHANCEMENT] Ingester, store-gateway: optimised regular expression matching for patterns like 1.*|2.*|3.*|...|1000.*. #​8632
  • [ENHANCEMENT] Query-frontend: Add header_cache_control to query stats. #​8590
  • [ENHANCEMENT] Query-scheduler: Introduce query-scheduler.use-multi-algorithm-query-queue, which allows use of an experimental queue structure, with no change in external queue behavior. #​7873
  • [ENHANCEMENT] Query-scheduler: Improve CPU/memory performance of experimental query-scheduler. #​8871
  • [ENHANCEMENT] Expose a new s3.trace.enabled configuration option to enable detailed logging of operations against S3-compatible object stores. #​8690
  • [ENHANCEMENT] memberlist: locally-generated messages (e.g. ring updates) are sent to the gossip network before forwarded messages. Introduced the -memberlist.broadcast-timeout-for-local-updates-on-shutdown option to modify how long to wait, when shutting down, until the queue of locally-generated messages is empty. Previously this was hard-coded to 10s, and the wait included all messages (locally-generated and forwarded). Now it defaults to 10s, and 0 means no timeout. Increasing this value may help avoid problems where ring updates on shutdown are not propagated to other nodes and the ring entry is left in a wrong state. #​8761
  • [ENHANCEMENT] Querier: allow using both raw numbers of seconds and duration literals in queries where previously only one or the other was permitted. For example, predict_linear now accepts a duration literal (e.g. predict_linear(..., 4h)), and range vector selectors now accept a number of seconds (e.g. rate(metric[2])). #​8780
  • [ENHANCEMENT] Ruler: Add ruler.max-independent-rule-evaluation-concurrency to allow independent rules of a tenant to be run concurrently. You can control the amount of concurrency per tenant via the -ruler.max-independent-rule-evaluation-concurrency-per-tenan limit. Setting -ruler.max-independent-rule-evaluation-concurrency to 0 disables the feature for all tenants. By default, this feature is disabled. A rule is eligible for concurrency as long as it doesn't depend on any other rules, doesn't have any other rules that depend on it, and has a total rule group runtime that exceeds 50% of its interval by default. The threshold can be adjusted with -ruler.independent-rule-evaluation-concurrency-min-duration-percentage. #​8146 #​8858 #​8880 #​8884
    • This work introduces the following metrics:
      • cortex_ruler_independent_rule_evaluation_concurrency_slots_in_use
      • cortex_ruler_independent_rule_evaluation_concurrency_attempts_started_total
      • cortex_ruler_independent_rule_evaluation_concurrency_attempts_incomplete_total
      • cortex_ruler_independent_rule_evaluation_concurrency_attempts_completed_total
  • [ENHANCEMENT] Expose a new s3.session-token configuration option to enable using temporary security credentials. #​8952
  • [ENHANCEMENT] Add HA deduplication features to the mimir-microservices-mode development environment. #​9012
  • [ENHANCEMENT] Remove experimental -query-frontend.additional-query-queue-dimensions-enabled and -query-scheduler.additional-query-queue-dimensions-enabled. Mimir now always includes "query components" as a queue dimension. #​8984 #​9135
  • [ENHANCEMENT] Add a new ingester endpoint to prepare instances to downscale. #​8956
  • [ENHANCEMENT] Query-scheduler: Add query-scheduler.prioritize-query-components which, when enabled, will primarily prioritize dequeuing fairly across queue components, and secondarily prioritize dequeuing fairly across tenants. When disabled, tenant fairness is primarily prioritized. query-scheduler.use-multi-algorithm-query-queue must be enabled in order to use this flag. #​9016 #​9071
  • [ENHANCEMENT] Runtime configuration can now be read from gzip-compressed files with a .gz extension. #​9074
  • [ENHANCEMENT] Ingester: add cortex_lifecycler_read_only metric which is set to 1 when ingester's lifecycler is set to read-only mode. #​9095
  • [ENHANCEMENT] Add a new field, encode_time_seconds, to query stats log messages to record the amount of time it takes the query-frontend to encode a response. This does not include any serialization time for downstream components. #​9062
  • [ENHANCEMENT] OTLP: If the flag -distributor.otel-created-timestamp-zero-ingestion-enabled is true, OTel start timestamps are converted to Prometheus zero samples to mark series start. #​9131
  • [ENHANCEMENT] Querier: attach logs emitted during query consistency check to trace span for query. #​9213
  • [ENHANCEMENT] Query-scheduler: Experimental -query-scheduler.prioritize-query-components flag enables the querier-worker queue priority algorithm to take precedence over tenant rotation when dequeuing requests. #​9220
  • [ENHANCEMENT] Add application credential arguments for OpenStack Swift storage backend. #​9181
  • [BUGFIX] Ruler: add support for draining any outstanding alert notifications before shutting down. This can be enabled with the -ruler.drain-notification-queue-on-shutdown=true CLI flag. #​8346
  • [BUGFIX] Query-frontend: fix -querier.max-query-lookback enforcement when -compactor.blocks-retention-period is not set, and vice versa. #​8388
  • [BUGFIX] Ingester: fix sporadic not found error causing an internal server error if label names are queried with matchers during head compaction. #​8391
  • [BUGFIX] Ingester, store-gateway: fix case-insensitive regular expressions not correctly matching some Unicode characters. #​8391
  • [BUGFIX] Query-frontend: "query stats" log now includes the actual status_code when the request fails due to an error occurring in the query-frontend itself. #​8407
  • [BUGFIX] Store-gateway: fixed a case where, on a quick subsequent restart, the previous lazy-loaded index header snapshot was overwritten by a partially loaded one. #​8281
  • [BUGFIX] Ingester: fixed timestamp reported in the "the sample has been rejected because its timestamp is too old" error when the write request contains only histograms. #​8462
  • [BUGFIX] Store-gateway: store sparse index headers atomically to disk. #​8485
  • [BUGFIX] Query scheduler: fix a panic in request queueing. #​8451
  • [BUGFIX] Querier: fix issue where "context canceled" is logged for trace spans for requests to store-gateways that return no series when chunks streaming is enabled. #​8510
  • [BUGFIX] Alertmanager: Fix per-tenant silence limits not being reloaded at runtime. #​8456
  • [BUGFIX] Alertmanager: Fixes a number of bugs in silences which could cause an existing silence to be deleted/expired when updating the silence failed. This could happen when the replacing silence was invalid or exceeded limits. #​8525
  • [BUGFIX] Alertmanager: Fix help message for utf-8-strict-mode. #​8572
  • [BUGFIX] Query-frontend: Ensure that internal errors result in an HTTP 500 response code instead of 422. #​8595 #​8666
  • [BUGFIX] Configuration: Multi-line environment variables are flattened during injection to be compatible with YAML syntax.
  • [BUGFIX] Querier: fix issue where queries can return incorrect results if a single store-gateway returns overlapping chunks for a series. #​8827
  • [BUGFIX] HA Tracker: store correct timestamp for last received request from elected replica. #​8821
  • [BUGFIX] Querier: do not return grpc: the client connection is closing errors as HTTP 499. #​8865 #​8888
  • [BUGFIX] Compactor: fix a race condition between different compactor replicas that may cause a deleted block to be still referenced as non-deleted in the bucket index. #​8905
  • [BUGFIX] Querier: fix issue where some native histogram-related warnings were not emitted when rate() was used over native histograms. #​8918
  • [BUGFIX] Ruler: map invalid org-id errors to 400 status code. #​8935
  • [BUGFIX] Querier: Fix invalid query results when multiple chunks are being merged. #​8992
  • [BUGFIX] Query-frontend: return annotations generated during evaluation of sharded queries. #​9138
  • [BUGFIX] Querier: Support optional start and end times on /prometheus/api/v1/labels, /prometheus/api/v1/label/<label>/values, and /prometheus/api/v1/series when max_query_into_future: 0. #​9129
  • [BUGFIX] Alertmanager: Fix config validation gap around unreferenced templates. #​9207
  • [BUGFIX] Alertmanager: Fix goroutine leak when stored config fails to apply and there is no existing tenant alertmanager #​9211
  • [BUGFIX] Querier: fix issue where both recently compacted blocks and their source blocks can be skipped during querying if store-gateways are restarting. #​9224
  • [BUGFIX] Alertmanager: fix receiver firewall to detect 0.0.0.0 and IPv6 interface-local multicast address as local addresses. #​9308
Mixin
  • [CHANGE] Dashboards: set default auto-refresh rate to 5m. #​8758
  • [ENHANCEMENT] Dashboards: allow switching between using classic or native histograms in dashboards.
    • Overview dashboard: status, read/write latency and queries/ingestion per sec panels, cortex_request_duration_seconds metric. #​7674 #​8502 #​8791
    • Writes dashboard: cortex_request_duration_seconds metric. #​8757 #​8791
    • Reads dashboard: cortex_request_duration_seconds metric. #​8752
    • Rollout progress dashboard: cortex_request_duration_seconds metric. #​8779
    • Alertmanager dashboard: cortex_request_duration_seconds metric. #​8792
    • Ruler dashboard: cortex_request_duration_seconds metric. #​8795
    • Queries dashboard: cortex_request_duration_seconds metric. #​8800
    • Remote ruler reads dashboard: cortex_request_duration_seconds metric. #​8801
  • [ENHANCEMENT] Alerts: MimirRunningIngesterReceiveDelayTooHigh alert has been tuned to be more reactive to high receive delay. #​8538
  • [ENHANCEMENT] Dashboards: improve end-to-end latency and strong read consistency panels when experimental ingest storage is enabled. #​8543 #​8830
  • [ENHANCEMENT] Dashboards: Add panels for monitoring ingester autoscaling when not using ingest-storage. These panels are disabled by default, but can be enabled using the autoscaling.ingester.enabled: true config option. #​8484
  • [ENHANCEMENT] Dashboards: Add panels for monitoring store-gateway autoscaling. These panels are disabled by default, but can be enabled using the autoscaling.store_gateway.enabled: true config option. #​8824
  • [ENHANCEMENT] Dashboards: add panels to show writes to experimental ingest storage backend in the "Mimir / Ruler" dashboard, when _config.show_ingest_storage_panels is enabled. #​8732
  • [ENHANCEMENT] Dashboards: show all series in tooltips on time series dashboard panels. #​8748
  • [ENHANCEMENT] Dashboards: add compactor autoscaling panels to "Mimir / Compactor" dashboard. The panels are disabled by default, but can be enabled by setting _config.autoscaling.compactor.enabled to true. #​8777
  • [ENHANCEMENT] Alerts: added MimirKafkaClientBufferedProduceBytesTooHigh alert. #​8763
  • [ENHANCEMENT] Dashboards: added "Kafka produced records / sec" panel to "Mimir / Writes" dashboard. #​8763
  • [ENHANCEMENT] Alerts: added MimirStrongConsistencyOffsetNotPropagatedToIngesters alert, and renamed MimirIngesterFailsEnforceStrongConsistencyOnReadPath alert to MimirStrongConsistencyEnforcementFailed. #​8831
  • [ENHANCEMENT] Dashboards: remove "All" option for namespace dropdown in dashboards. #​8829
  • [ENHANCEMENT] Dashboards: add Kafka end-to-end latency outliers panel in the "Mimir / Writes" dashboard. #​8948
  • [ENHANCEMENT] Dashboards: add "Out-of-order samples appended" panel to "Mimir / Tenants" dashboard. #​8939
  • [ENHANCEMENT] Alerts: RequestErrors and RulerRemoteEvaluationFailing have been enriched with a native histogram version. #​9004
  • [ENHANCEMENT] Dashboards: add 'Read path' selector to 'Mimir / Queries' dashboard. #​8878
  • [ENHANCEMENT] Dashboards: add annotation indicating active series are being reloaded to 'Mimir / Tenants' dashboard. #​9257
  • [ENHANCEMENT] Dashboards: limit results on the 'Failed evaluations rate' panel of the 'Mimir / Tenants' dashboard to 50 to avoid crashing the page when there are many failing groups. #​9262
  • [FEATURE] Alerts: add MimirGossipMembersEndpointsOutOfSync alert. #​9347
  • [BUGFIX] Dashboards: fix "current replicas" in autoscaling panels when HPA is not active. #​8566
  • [BUGFIX] Alerts: do not fire MimirRingMembersMismatch during the migration to experimental ingest storage. #​8727
  • [BUGFIX] Dashboards: avoid over-counting of ingesters metrics when migrating to experimental ingest storage. #​9170
  • [BUGFIX] Dashboards: fix job_prefix not being used in jobSelector. #​9155
Jsonnet
  • [CHANGE] Changed the following config options when the experimental ingest storage is enabled: #​8874
    • ingest_storage_ingester_autoscaling_min_replicas changed to ingest_storage_ingester_autoscaling_min_replicas_per_zone
    • ingest_storage_ingester_autoscaling_max_replicas changed to ingest_storage_ingester_autoscaling_max_replicas_per_zone
  • [CHANGE] Changed the overrides configmap generation to remove any field with null value. #​9116
  • [CHANGE] $.replicaTemplate function now takes replicas and labelSelector parameters. #​9248
  • [CHANGE] Renamed ingest_storage_ingester_autoscaling_replica_template_custom_resource_definition_enabled to replica_template_custom_resource_definition_enabled. #​9248
  • [FEATURE] Add support for automatically deleting compactor, store-gateway, ingester and read-write mode backend PVCs when the corresponding StatefulSet is scaled down. #​8382 #​8736
  • [FEATURE] Automatically set GOMAXPROCS on ingesters. #​9273
  • [ENHANCEMENT] Added the following config options to set the number of partition ingester replicas when migrating to experimental ingest storage. #​8517
    • ingest_storage_migration_partition_ingester_zone_a_replicas
    • ingest_storage_migration_partition_ingester_zone_b_replicas
    • ingest_storage_migration_partition_ingester_zone_c_replicas
  • [ENHANCEMENT] Distributor: increase -distributor.remote-timeout when the experimental ingest storage is enabled. #​8518
  • [ENHANCEMENT] Memcached: Update to Memcached 1.6.28 and memcached-exporter 0.14.4. #​8557
  • [ENHANCEMENT] Rollout-operator: Allow the rollout-operator to be used as a Kubernetes StatefulSet webhook to enable no-downscale and prepare-downscale annotations to be used on ingesters or store-gateways. #​8743
  • [ENHANCEMENT] Do not deploy ingester-zone-c when experimental ingest storage is enabled and ingest_storage_ingester_zones is configured to 2. #​8776
  • [ENHANCEMENT] Added the config option ingest_storage_migration_classic_ingesters_no_scale_down_delay to disable the downscale delay on classic ingesters when migrating to experimental ingest storage. #​8775 #​8873
  • [ENHANCEMENT] Configure experimental ingest storage on query-frontend too when enabled. #​8843
  • [ENHANCEMENT] Allow overriding the Kafka client ID on a per-component basis. #​9026
  • [ENHANCEMENT] Rollout-operator's access to ReplicaTemplate is now configured via config option rollout_operator_replica_template_access_enabled. #​9252
  • [ENHANCEMENT] Added support for a new way of downscaling ingesters, using rollout-operator's resource-mirroring feature and read-only mode of ingesters. This can be enabled using the ingester_automated_downscale_v2_enabled config option. This is mutually exclusive with both ingester_automated_downscale_enabled (previous downscale mode) and ingest_storage_ingester_autoscaling_enabled (autoscaling for ingest-storage).
  • [ENHANCEMENT] Update rollout-operator to v0.19.1. #​9388
  • [BUGFIX] Added missing node affinity matchers to write component. #​8910
Mimirtool
  • [CHANGE] Disable colored output on mimirtool when the output is not to a terminal. #​9423
  • [CHANGE] Add --force-color flag to be able to enable colored output when the output is not to a terminal. #​9423
  • [CHANGE] Analyze Rules: Count recording rules used in rules group as used. #​6133
  • [CHANGE] Remove deprecated --rule-files flag in favor of CLI arguments for the following commands: #​8701
    • mimirtool rules load
    • mimirtool rules sync
    • mimirtool rules diff
    • mimirtool rules check
    • mimirtool rules prepare
  • [ENHANCEMENT] Remote read and backfill now support the experimental native histograms. #​9156
Mimir Continuous Test
  • [CHANGE] Use test metrics that do not pass through 0 to make identifying incorrect results easier. #​8630
  • [CHANGE] Allowed authentication to Mimir using both Tenant ID and basic/bearer auth. #​9038
  • [FEATURE] Experimental support for the -tests.send-chunks-debugging-header boolean flag to send the X-Mimir-Chunk-Info-Logger: series_id header with queries. #​8599
  • [ENHANCEMENT] Include human-friendly timestamps in diffs logged when a test fails. #​8630
  • [ENHANCEMENT] Add histograms to measure latency of read and write requests. #​8583
  • [ENHANCEMENT] Log successful test runs in addition to failed test runs. #​8817
  • [ENHANCEMENT] Series emitted by continuous-test now distribute more uniformly across ingesters. #​9218 #​9243
  • [ENHANCEMENT] Configure User-Agent header for the Mimir client via -tests.client.user-agent. #​9338
  • [BUGFIX] Initialize test result metrics to 0 at startup so that alerts can correctly identify the first failure after startup. #​8630
Query-tee
  • [CHANGE] If a preferred backend is configured, then query-tee always returns its response, regardless of the response status code. Previously, query-tee would only return the response from the preferred backend if it did not have a 5xx status code. #​8634
  • [ENHANCEMENT] Emit trace spans from query-tee. #​8419
  • [ENHANCEMENT] Log trace ID (if present) with all log messages written while processing a request. #​8419
  • [ENHANCEMENT] Log user agent when processing a request. #​8419
  • [ENHANCEMENT] Add time parameter to proxied instant queries if it is not included in the incoming request. This is optional but enabled by default, and can be disabled with -proxy.add-missing-time-parameter-to-instant-queries=false. #​8419
  • [ENHANCEMENT] Add support for sending only a proportion of requests to all backends, with the remainder only sent to the preferred backend. The default behaviour is to send all requests to all backends. This can be configured with -proxy.secondary-backends-request-proportion. #​8532
  • [ENHANCEMENT] Check annotations emitted by both backends are the same when comparing responses from two backends. #​8660
  • [ENHANCEMENT] Compare native histograms in query results when comparing results between two backends. #​8724
  • [ENHANCEMENT] Don't consider responses to be different during response comparison if both backends' responses contain different series, but all samples are within the recent sample window. #​8749 #​8894
  • [ENHANCEMENT] When the expected and actual response for a matrix series is different, the full set of samples for that series from both backends will now be logged. #​8947
  • [ENHANCEMENT] Wait up to -server.graceful-shutdown-timeout for inflight requests to finish when shutting down, rather than immediately terminating inflight requests on shutdown. #​8985
  • [ENHANCEMENT] Optionally consider equivalent error messages the same when comparing responses. Enabled by default, disable with -proxy.require-exact-error-match=true. #​9143 #​9350 #​9366
  • [BUGFIX] Ensure any errors encountered while forwarding a request to a backend (e.g. DNS resolution failures) are logged. #​8419
  • [BUGFIX] The comparison of the results should not fail when either side contains extra samples from within SkipRecentSamples duration. #​8920
  • [BUGFIX] When -proxy.compare-skip-recent-samples is enabled, compare sample timestamps with the time the query requests were made, rather than the time at which the comparison is occurring. #​9416
Documentation
  • [ENHANCEMENT] Specify in which component the configuration flags -compactor.blocks-retention-period, -querier.max-query-lookback, -query-frontend.max-total-query-length, -query-frontend.max-query-expression-size-bytes are applied and that they are applied to remote read as well. #​8433
  • [ENHANCEMENT] Provide more detailed recommendations on how to migrate from classic to native histograms. #​8864
  • [ENHANCEMENT] Clarify that {namespace} and {groupName} path segments in the ruler config API should be URL-escaped. #​8969
  • [ENHANCEMENT] Include stalled compactor network drive information in runbooks. #​9297
  • [ENHANCEMENT] Document /ingester/prepare-partition-downscale and /ingester/prepare-instance-ring-downscale endpoints. #​9132
  • [ENHANCEMENT] Describe read-only mode of ingesters in component documentation. #​9132
Tools
  • [CHANGE] wal-reader: Renamed -series-entries to -print-series. Renamed -print-series-with-samples to -print-samples. #​8568
  • [FEATURE] query-bucket-index: add new tool to query a bucket index file and print the blocks that would be used for a given query time range. #​8818
  • [FEATURE] kafkatool: add new CLI tool to operate Kafka. Supported commands: #​9000
    • brokers list-leaders-by-partition
    • consumer-group commit-offset
    • consumer-group copy-offset
    • consumer-group list-offsets
    • create-partitions
  • [ENHANCEMENT] wal-reader: References to unknown series from Samples, Exemplars, histogram or tombstones records are now always logged. #​8568
  • [ENHANCEMENT] tsdb-series: added -stats option to print min/max time of chunks, total number of samples and DPM for each series. #​8420
  • [ENHANCEMENT] tsdb-print-chunk: print counter reset information for native histograms. #​8812
  • [ENHANCEMENT] grpcurl-query-ingesters: print counter reset information for native histograms. #​8820
  • [ENHANCEMENT] grpcurl-query-ingesters: concurrently query ingesters. #​9102
  • [ENHANCEMENT] grpcurl-query-ingesters: sort series and chunks in output. #​9180
  • [ENHANCEMENT] grpcurl-query-ingesters: print full chunk timestamps, not just time component. #​9180
  • [ENHANCEMENT] tsdb-series: Added -json option to generate JSON output for easier post-processing. #​8844
  • [ENHANCEMENT] tsdb-series: Added -min-time and -max-time options to filter samples that are used for computing data-points per minute. #​8844
  • [ENHANCEMENT] mimir-rules-action: Added new input to support matching target namespaces by regex. #​9244
  • [ENHANCEMENT] mimir-rules-action: Added new inputs to support ignoring namespaces and ignoring namespaces by regex. #​9258 #​9324
  • [BUGFIX] copyblocks, undelete-blocks, copyprefix: use a multipart upload to server-side copy objects greater than 5GiB in size on S3. #​9357
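The copyblocks / undelete-blocks / copyprefix fix (#9357) exists because S3's single-request server-side copy (CopyObject) is limited to 5 GiB. Larger objects must be copied part by part with UploadPartCopy, where each part names a byte range of the source object. The sketch below covers only the range planning, with no AWS SDK calls. The helper names are hypothetical, and the part-count check uses S3's limit of 10,000 parts per multipart upload.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	maxSingleCopySize = 5 << 30 // 5 GiB: largest object a single CopyObject can copy
	maxParts          = 10000   // S3 limit on parts per multipart upload
)

// needsMultipartCopy reports whether an object is too large for CopyObject.
func needsMultipartCopy(size int64) bool { return size > maxSingleCopySize }

// copyRanges splits an object of the given size into inclusive byte ranges in
// the "bytes=first-last" form used by the x-amz-copy-source-range header.
func copyRanges(size, partSize int64) ([]string, error) {
	if partSize <= 0 {
		return nil, errors.New("part size must be positive")
	}
	if (size+partSize-1)/partSize > maxParts {
		return nil, errors.New("too many parts; increase the part size")
	}
	var ranges []string
	for start := int64(0); start < size; start += partSize {
		end := start + partSize - 1
		if end >= size {
			end = size - 1 // the last part may be shorter
		}
		ranges = append(ranges, fmt.Sprintf("bytes=%d-%d", start, end))
	}
	return ranges, nil
}

func main() {
	const size = 12 << 30 // a hypothetical 12 GiB object
	fmt.Println("multipart needed:", needsMultipartCopy(size))
	ranges, err := copyRanges(size, 5<<30)
	if err != nil {
		panic(err)
	}
	for _, r := range ranges {
		fmt.Println(r)
	}
}
```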

v2.13.0

Grafana Mimir
  • [CHANGE] Build: grafana/mimir docker image is now based on gcr.io/distroless/static-debian12 image. Alpine-based docker image is still available as grafana/mimir-alpine, until Mimir 2.15. #​8204 #​8235
  • [CHANGE] Ingester: /ingester/flush endpoint is now only allowed to execute while the ingester is in Running state. The 503 status code is returned if the endpoint is called while the ingester is not in Running state. #​7486
  • [CHANGE] Distributor: Include label name in err-mimir-label-value-too-long error message. #​7740
  • [CHANGE] Ingester: enabled log sampling of 1 out of 10 errors by default. All discarded samples are still tracked by the cortex_discarded_samples_total metric. The feature can be configured via -ingester.error-sample-rate (0 to log all errors). #​7807
  • [CHANGE] Query-frontend: Query results caching and experimental query blocking now utilize the PromQL string-formatted query format rather than the unvalidated query as submitted to the frontend. #​7742
    • Query results caching should be more stable as all equivalent queries receive the same cache key, but there may be cache churn on first deploy with the updated format
    • Query blocking can no longer be circumvented with an equivalent query in a different format; see Configure queries to block
  • [CHANGE] Query-frontend: stop using -validation.create-grace-period to clamp how far into the future a query can span. #​8075
  • [CHANGE] Clamp GOMAXPROCS to runtime.NumCPU. #​8201
  • [CHANGE] Anonymous usage statistics tracking: add CPU usage percentage tracking. #​8282
  • [CHANGE] Added new metric cortex_compactor_disk_out_of_space_errors_total which counts how many times a compaction failed due to the compactor being out of disk. #​8237
  • [CHANGE] Anonymous usage statistics tracking: report active series in addition to in-memory series. #​8279
  • [CHANGE] Ruler: evaluation_delay field in the rule group configuration has been deprecated. Please use query_offset instead (it has the same exact meaning and behaviour). #​8295
  • [CHANGE] General: remove -log.buffered. The configuration option has been enabled by default and deprecated since Mimir 2.11. #​8395
  • [CHANGE] Ruler: promote tenant federation from experimental to stable. #​8400
  • [CHANGE] Ruler: promote -ruler.recording-rules-evaluation-enabled and -ruler.alerting-rules-evaluation-enabled from experimental to stable. #​8400
  • [CHANGE] General: promote -tenant-federation.max-tenants from experimental to stable. #​8400
  • [FEATURE] Continuous-test: now runnable as a module with mimir -target=continuous-test. #​7747
  • [FEATURE] Store-gateway: Allow specific tenants to be enabled or disabled via -store-gateway.enabled-tenants or -store-gateway.disabled-tenants CLI flags or their corresponding YAML settings. #​7653
  • [FEATURE] New -<prefix>.s3.bucket-lookup-type flag configures the bucket lookup style used to access buckets in S3-compatible providers. #​7684
  • [FEATURE] Querier: add experimental streaming PromQL engine, enabled with -querier.promql-engine=mimir. #​7693 #​7898 #​7899 #​8023 #​8058 #​8096 #​8121 #​8197 #​8230 #​8247 #​8270 #​8276 #​8277 #​8291 #​8303 #​8340 #​8256 #​8348
  • [FEATURE] New /ingester/unregister-on-shutdown HTTP endpoint allows dynamic access to ingesters' -ingester.ring.unregister-on-shutdown configuration. #​7739
  • [FEATURE] Server: added experimental PROXY protocol support. The PROXY protocol support can be enabled via -server.proxy-protocol-enabled=true. When enabled, the support is added both to HTTP and gRPC listening ports. #​7698
  • [FEATURE] Query-frontend, querier: new experimental /cardinality/active_native_histogram_metrics API to get active native histogram metric names with statistics about active native histogram buckets. #​7982 #​7986 #​8008
  • [FEATURE] Alertmanager: Added -alertmanager.max-silences-count and -alertmanager.max-silence-size-bytes to set limits on per tenant silences. Disabled by default. #​8241 #​8249
  • [FEATURE] Ingester: add experimental support for the server-side circuit breakers when writing to and reading from ingesters. This can be enabled using -ingester.push-circuit-breaker.enabled and -ingester.read-circuit-breaker.enabled options. Further -ingester.push-circuit-breaker.* and -ingester.read-circuit-breaker.* options for configuring circuit-breaker are available. Added metrics cortex_ingester_circuit_breaker_results_total, cortex_ingester_circuit_breaker_transitions_total, cortex_ingester_circuit_breaker_current_state and cortex_ingester_circuit_breaker_request_timeouts_total. #​8180 #​8285 #​8315 #​8446
  • [FEATURE] Distributor, ingester: add new setting -validation.past-grace-period to limit how old (based on the wall clock minus OOO window) the ingested samples can be. The default 0 value disables this limit. #​8262
  • [ENHANCEMENT] Distributor: add metrics cortex_distributor_samples_per_request and cortex_distributor_exemplars_per_request to track samples/exemplars per request. #​8265
  • [ENHANCEMENT] Reduced memory allocations in functions used to propagate contextual information between gRPC calls. #​7529
  • [ENHANCEMENT] Distributor: add experimental limit for exemplars per series per request, enabled with -distributor.max-exemplars-per-series-per-request. The number of discarded exemplars is tracked with cortex_discarded_exemplars_total{reason="too_many_exemplars_per_series_per_request"}. #​7989 #​8010
  • [ENHANCEMENT] Store-gateway: merge series from different blocks concurrently. #​7456
  • [ENHANCEMENT] Store-gateway: Add stage="wait_max_concurrent" to cortex_bucket_store_series_request_stage_duration_seconds which records how long the query had to wait for its turn for -blocks-storage.bucket-store.max-concurrent. #​7609
  • [ENHANCEMENT] Querier: add cortex_querier_federation_upstream_query_wait_duration_seconds to observe time from when a querier picks up a cross-tenant query to when work begins on its single-tenant counterparts. #​7209
  • [ENHANCEMENT] Compactor: Add cortex_compactor_block_compaction_delay_seconds metric to track how long it takes to compact blocks since the blocks are created. #​7635
  • [ENHANCEMENT] Store-gateway: add outcome label to cortex_bucket_stores_gate_duration_seconds histogram metric. Possible values for the outcome label are: rejected_canceled, rejected_deadline_exceeded, rejected_other, and permitted. #​7784
  • [ENHANCEMENT] Query-frontend: use zero-allocation experimental decoder for active series queries via -query-frontend.use-active-series-decoder. #​7665
  • [ENHANCEMENT] Go: updated to 1.22.2. #​7802
  • [ENHANCEMENT] Query-frontend: support limit parameter on /prometheus/api/v1/label/{name}/values and /prometheus/api/v1/labels endpoints. #​7722
  • [ENHANCEMENT] Expose TLS configuration for the S3 backend client. #​7959
  • [ENHANCEMENT] Rules: Support expansion of native histogram values when using rule templates. #​7974
  • [ENHANCEMENT] Rules: Add metric cortex_prometheus_rule_group_last_restore_duration_seconds which measures how long it takes to restore rule groups using the ALERTS_FOR_STATE series. #​7974
  • [ENHANCEMENT] OTLP: Improve remote write format translation performance by using label set hashes for metric identifiers instead of string based ones. #​8012
  • [ENHANCEMENT] Querying: Remove OpEmptyMatch from regex concatenations. #​8012
  • [ENHANCEMENT] Store-gateway: add -blocks-storage.bucket-store.max-concurrent-queue-timeout. When set, queries at the store-gateway's query gate will not wait longer than that to execute. If a query reaches the wait timeout, then the querier will retry the blocks on a different store-gateway. If all store-gateways are unavailable, then the query will fail with err-mimir-store-consistency-check-failed. #​7777 #​8149
  • [ENHANCEMENT] Store-gateway: add -blocks-storage.bucket-store.index-header.lazy-loading-concurrency-queue-timeout. When set, loads of index-headers at the store-gateway's index-header lazy load gate will not wait longer than that to execute. If a load reaches the wait timeout, then the querier will retry the blocks on a different store-gateway. If all store-gateways are unavailable, then the query will fail with err-mimir-store-consistency-check-failed. #​8138
  • [ENHANCEMENT] Ingester: Optimize querying with regexp matchers. #​8106
  • [ENHANCEMENT] Distributor: Introduce -distributor.max-request-pool-buffer-size to allow configuring the maximum size of the request pool buffers. #​8082
  • [ENHANCEMENT] Store-gateway: improve performance when streaming chunks to queriers is enabled (-querier.prefer-streaming-chunks-from-store-gateways=true) and the query selects fewer than -blocks-storage.bucket-store.batch-series-size series (defaults to 5000 series). #​8039
  • [ENHANCEMENT] Ingester: active series are now updated along with owned series. They decrease when series change ownership between ingesters. This helps provide a more accurate total of active series when ingesters are added. This is only enabled when -ingester.track-ingester-owned-series or -ingester.use-ingester-owned-series-for-limits are enabled. #​8084
  • [ENHANCEMENT] Query-frontend: include route name in query stats log lines. #​8191
  • [ENHANCEMENT] OTLP: Speed up conversion from OTel to Mimir format by about 8% and reduce memory consumption by about 30%. Can be disabled via -distributor.direct-otlp-translation-enabled=false #​7957
  • [ENHANCEMENT] Ingester/Querier: Optimise regexps with long lists of alternates. #​8221, #​8234
  • [ENHANCEMENT] Ingester: Include more detail in tracing of queries. #​8242
  • [ENHANCEMENT] Distributor: add insight=true to remote-write and OTLP write handlers when the HTTP response status code is 4xx. #​8294
  • [ENHANCEMENT] Ingester: reduce locked time while matching postings for a label, improving the write latency and compaction speed. #​8327
  • [ENHANCEMENT] Ingester: reduce the amount of locks taken during the Head compaction's garbage-collection process, improving the write latency and compaction speed. #​8327
  • [ENHANCEMENT] Query-frontend: log the start, end time and matchers for remote read requests to the query stats logs. #​8326 #​8370 #​8373
  • [BUGFIX] Distributor: Prometheus retries on 5xx and 429 errors, while the OTLP collector only retries on 429, 502, 503 and 504. The OTLP endpoint therefore maps other 5xx errors to retryable status codes. #​8324 #​8339
  • [BUGFIX] Distributor: make OTLP endpoint return marshalled proto bytes as response body for 4xx/5xx errors. #​8227
  • [BUGFIX] Rules: improve error handling when querier is local to the ruler. #​7567
  • [BUGFIX] Querier, store-gateway: Protect against panics raised during snappy encoding. #​7520
  • [BUGFIX] Ingester: Prevent timely compaction of empty blocks. #​7624
  • [BUGFIX] Querier: Don't cache context.Canceled errors for bucket index. #​7620
  • [BUGFIX] Store-gateway: account for "other" time in LabelValues and LabelNames requests. #​7622
  • [BUGFIX] Query-frontend: Don't panic when using the -query-frontend.downstream-url flag. #​7651
  • [BUGFIX] Ingester: when receiving multiple exemplars for a native histogram via remote write, sort them and only report an error if all are older than the latest exemplar as this could be a partial update. #​7640 #​7948 #​8014
  • [BUGFIX] Ingester: don't retain blocks if they finish exactly on the boundary of the retention window. #​7656
  • [BUGFIX] Bug-fixes and improvements to experimental native histograms. #​7744 #​7813
  • [BUGFIX] Querier: return an error when a query uses label_join with an invalid destination label name. #​7744
  • [BUGFIX] Compactor: correct outstanding job estimation in metrics and compaction-planner tool when block labels differ. #​7745
  • [BUGFIX] Ingester: turn native histogram validation errors in TSDB into soft ingester errors that result in returning 4xx to the end-user instead of 5xx. In the case of TSDB validation errors, the counter cortex_discarded_samples_total will be increased with the reason label set to "invalid-native-histogram". #​7736 #​7773
  • [BUGFIX] Do not wrap error message with sampled 1/<frequency> if it's not actually sampled. #​7784
  • [BUGFIX] Store-gateway: do not track cortex_querier_blocks_consistency_checks_failed_total metric if the query has been canceled or interrupted due to an error unrelated to a failed blocks consistency check. #​7752
  • [BUGFIX] Ingester: ignore instances with no tokens when calculating local limits to prevent discards during ingester scale-up. #​7881
  • [BUGFIX] Ingester: do not reuse exemplars slice in the write request if there are more than 10 exemplars per series. This should help to reduce the in-use memory in case of few requests with a very large number of exemplars. #​7936
  • [BUGFIX] Distributor: fix down scaling of native histograms in the distributor when timeseries unmarshal cache is in use. #​7947
  • [BUGFIX] Distributor: fix cardinality API to return more accurate number of in-memory series when number of zones is larger than replication factor. #​7984
  • [BUGFIX] All: fix config validation for non-ingester modules, when ingester's ring is configured with spread-minimizing token generation strategy. #​7990
  • [BUGFIX] Ingester: copy LabelValues strings out of mapped memory to avoid a segmentation fault if the region becomes unmapped before the result is marshaled. #​8003
  • [BUGFIX] OTLP: Don't generate target_info unless at least one identifying label is defined. #​8012
  • [BUGFIX] OTLP: Don't generate target_info unless there are metrics. #​8012
  • [BUGFIX] Query-frontend: Experimental query queue splitting: fix issue where offset and range selector duration were not considered when predicting query component. #​7742
  • [BUGFIX] Querying: Empty matrix results were incorrectly returning null instead of []. #​8029
  • [BUGFIX] All: don't increment thanos_objstore_bucket_operation_failures_total metric for cancelled requests. #​8072
  • [BUGFIX] Query-frontend: fix empty metric name matcher not being applied under certain conditions. #​8076
  • [BUGFIX] Querying: Fix regex matching of multibyte runes with dot operator. #​8089
  • [BUGFIX] Querying: matrix results returned from instant queries were not sorted by series. #​8113
  • [BUGFIX] Query scheduler: Fix a crash in result marshaling. #​8140
  • [BUGFIX] Store-gateway: Allow long-running index scans to be interrupted. #​8154
  • [BUGFIX] Query-frontend: fix splitting of queries using @ start() and @ end() modifiers on a subquery. Previously, start() and end() would be evaluated using the start and end of the split query instead of the original query. #​8162
  • [BUGFIX] Distributor: Don't discard time series with invalid exemplars, just drop affected exemplars. #​8224
  • [BUGFIX] Ingester: fixed in-memory series count when replaying a corrupted WAL. #​8295
  • [BUGFIX] Ingester: fix context cancellation handling when a query is busy looking up series in the TSDB index and -blocks-storage.tsdb.head-postings-for-matchers-cache* or -blocks-storage.tsdb.block-postings-for-matchers-cache* are in use. #​8337
  • [BUGFIX] Querier: fix edge case where bucket indexes are sometimes cached forever instead of with the expected TTL. #​8343
  • [BUGFIX] OTLP handler: fix errors returned by OTLP handler when used via httpgrpc tunneling. #​8363
  • [BUGFIX] Update github.com/hashicorp/go-retryablehttp to address CVE-2024-6104. #​8539
  • [BUGFIX] Alertmanager: Fixes a number of bugs in silences which could cause an existing silence to be deleted/expired when updating the silence failed. This could happen when the replacing silence was invalid or exceeded limits. #​8525
  • [BUGFIX] Alertmanager: Fix per-tenant silence limits not reloaded during runtime. #​8456
  • [BUGFIX] Alertmanager: Fix help message for utf-8-strict-mode. #​8572
  • [BUGFIX] Upgrade golang to 1.22.5 to address CVE-2024-24791. #​8600
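The ingester log sampling from #7807 (-ingester.error-sample-rate) lets one error in every N through to the log, while cortex_discarded_samples_total still counts all of them. Below is a minimal counter-based sketch, assuming a simple modulo strategy. `errorSampler` is a hypothetical type, not Mimir's implementation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// errorSampler lets one out of every rate errors through to the log.
// A rate of 0 (or 1) logs every error, mirroring "0 to log all errors".
// Hypothetical type; the counter-based strategy is an assumption.
type errorSampler struct {
	rate  int64
	count atomic.Int64
}

// Sample reports whether the current error should be logged.
// It is safe for concurrent use.
func (s *errorSampler) Sample() bool {
	if s.rate <= 1 {
		return true
	}
	// The 1st, (rate+1)th, (2*rate+1)th ... calls are logged.
	return (s.count.Add(1)-1)%s.rate == 0
}

func main() {
	s := &errorSampler{rate: 10}
	logged := 0
	for i := 0; i < 100; i++ {
		// A real ingester would increment the discard metric here on every iteration.
		if s.Sample() {
			logged++
		}
	}
	fmt.Printf("logged %d of 100 errors\n", logged)
}
```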
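The OTLP status-code mapping from #8324 / #8339 can be sketched as a small function. The OTLP/HTTP specification lists 429, 502, 503 and 504 as the retryable status codes. The entry does not say which code Mimir maps other 5xx errors to. The choice of 503 below is therefore an assumption, as is the helper name `otlpStatusCode`.

```go
package main

import (
	"fmt"
	"net/http"
)

// otlpStatusCode rewrites a response status for the OTLP endpoint so that
// OTLP clients, which only retry 429, 502, 503 and 504, still retry on other
// server-side errors. Mapping to 503 is an assumption made for this sketch.
func otlpStatusCode(code int) int {
	switch code {
	case http.StatusTooManyRequests, http.StatusBadGateway,
		http.StatusServiceUnavailable, http.StatusGatewayTimeout:
		return code // already retryable by OTLP clients
	}
	if code >= 500 && code <= 599 {
		return http.StatusServiceUnavailable
	}
	return code // 2xx and non-retryable 4xx pass through unchanged
}

func main() {
	for _, c := range []int{200, 400, 429, 500, 501, 504} {
		fmt.Printf("%d -> %d\n", c, otlpStatusCode(c))
	}
}
```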
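Regarding the empty-matrix fix (#8029): in Go, a common source of this class of bug is that `encoding/json` encodes a nil slice as `null` and an empty, non-nil slice as `[]`. The entry does not state that this was the exact root cause. The types below are hypothetical stand-ins, not Mimir's response types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sampleStream is a trimmed-down, hypothetical matrix series type.
type sampleStream struct {
	Metric map[string]string `json:"metric"`
}

// matrixData is a hypothetical query result payload.
type matrixData struct {
	ResultType string         `json:"resultType"`
	Result     []sampleStream `json:"result"`
}

// encode marshals d to a JSON string.
func encode(d matrixData) string {
	b, err := json.Marshal(d)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	var nilResult []sampleStream // a nil slice encodes as JSON null
	fmt.Println(encode(matrixData{ResultType: "matrix", Result: nilResult}))
	// {"resultType":"matrix","result":null}
	fmt.Println(encode(matrixData{ResultType: "matrix", Result: []sampleStream{}}))
	// {"resultType":"matrix","result":[]}
}
```

Initialising the result to an empty slice before encoding is the usual way to guarantee `[]`.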
Mixin
  • [CHANGE] Alerts: Removed obsolete MimirQueriesIncorrect alert that used test-exporter metrics. Test-exporter support was removed in the Mimir 2.0 release. #​7774
  • [CHANGE] Alerts: Change threshold for MimirBucketIndexNotUpdated alert to fire before queries begin to fail due to bucket index age. #​7879
  • [FEATURE] Dashboards: added 'Remote ruler reads networking' dashboard. #​7751
  • [FEATURE] Alerts: Add MimirIngesterStuckProcessingRecordsFromKafka alert. #​8147
  • [ENHANCEMENT] Alerts: allow configuring alerts range interval via _config.base_alerts_range_interval_minutes. #​7591
  • [ENHANCEMENT] Dashboards: Add panels for monitoring distributor and ingester when using ingest-storage. These panels are disabled by default, but can be enabled using show_ingest_storage_panels: true config option. Similarly, existing panels used when distributors and ingesters use gRPC for forwarding requests can be disabled by setting show_grpc_ingestion_panels: false. #​7670 #​7699
  • [ENHANCEMENT] Alerts: add the following alerts when using ingest-storage: #​7699 #​7702 #​7867
    • MimirIngesterLastConsumedOffsetCommitFailed
    • MimirIngesterFailedToReadRecordsFromKafka
    • MimirIngesterKafkaFetchErrorsRateTooHigh
    • MimirStartingIngesterKafkaReceiveDelayIncreasing
    • MimirRunningIngesterReceiveDelayTooHigh
    • MimirIngesterFailsToProcessRecordsFromKafka
    • MimirIngesterFailsEnforceStrongConsistencyOnReadPath
  • [ENHANCEMENT] Dashboards: add in-flight queries scaling metric panel for ruler-querier. #​7749
  • [ENHANCEMENT] Dashboards: renamed rows in the "Remote ruler reads" and "Remote ruler reads resources" dashboards to match the actual component names. #​7750
  • [ENHANCEMENT] Dashboards: allow switching between using classic or native histograms in dashboards. #​7627
    • Overview dashboard, Status panel, cortex_request_duration_seconds metric.
  • [ENHANCEMENT] Alerts: exclude 529 and 598 status codes from failure codes in MimirRequestsError. #​7889
  • [ENHANCEMENT] Dashboards: renamed "TCP Connections" panel to "Ingress TCP Connections" in the networking dashboards. #​8092
  • [ENHANCEMENT] Dashboards: update the use of deprecated "table (old)" panels to "table". #​8181
  • [ENHANCEMENT] Dashboards: added a component variable to "Slow queries" dashboard to allow checking the slow queries of the remote ruler evaluation query path. #​8309
  • [BUGFIX] Dashboards: fix regular expression for matching read-path gRPC ingester methods to include querying of exemplars, label-related queries, or active series queries. #​7676
  • [BUGFIX] Dashboards: fix user id abbreviations and column heads for Top Tenants dashboard. #​7724
  • [BUGFIX] Dashboards: fix incorrect query used for "queue length" panel on "Ruler" dashboard. #​8006
  • [BUGFIX] Dashboards: fix disk space utilization panels when running with a recent version of kube-state-metrics. #​8212
Jsonnet
  • [CHANGE] Memcached: Change default read timeout for chunks and index caches to 750ms from 450ms. #​7778
  • [CHANGE] Fine-tuned terminationGracePeriodSeconds for the following components: #​7364
    • Querier: changed from 30 to 180
    • Query-scheduler: changed from 30 to 180
  • [CHANGE] Change TCP port exposed by mimir-continuous-test deployment to match with updated defaults of its container image (see changes below). #​7958
  • [FEATURE] Add support to deploy Mimir with experimental ingest storage enabled. #​8028 #​8222
  • [ENHANCEMENT] Compactor: add $._config.cortex_compactor_concurrent_rollout_enabled option (disabled by default) that makes use of rollout-operator to speed up the rollout of compactors. #​7783 #​7878
  • [ENHANCEMENT] Shuffle-sharding: add $._config.shuffle_sharding.ingest_storage_partitions_enabled and $._config.shuffle_sharding.ingester_partitions_shard_size options, that allow configuring partitions shard size in ingest-storage mode. #​7804
  • [ENHANCEMENT] Update rollout-operator to v0.17.0. #​8399
  • [ENHANCEMENT] Add _config.autoscaling_querier_predictive_scaling_enabled to scale querier based on inflight queries 7 days ago. #​7775
  • [ENHANCEMENT] Add support to autoscale ruler-querier replicas based on in-flight queries too (in addition to CPU and memory based scaling). #​8060 #​8188
  • [ENHANCEMENT] Distributor: improved distributor HPA scaling metric to only take into account ready pods. This requires the metric kube_pod_status_ready to be available in the data source used by KEDA to query scaling metrics (configured via _config.autoscaling_prometheus_url). #​8251
  • [BUGFIX] Guard against missing samples in KEDA queries. #​7691
  • [BUGFIX] Alertmanager: Set -server.http-idle-timeout to avoid EOF errors in ruler. #​8192
Mimirtool
  • [CHANGE] Deprecated --rule-files flag in favor of CLI arguments. #​7756
  • [FEATURE] mimirtool: Add runtime-config verify sub-command, for verifying Mimir runtime config files. #​8123
  • [ENHANCEMENT] mimirtool promql format: Format PromQL query with Prometheus' string or pretty-print formatter. #​7742
  • [ENHANCEMENT] Add mimir-http-prefix configuration to set the Mimir URL prefix when using legacy routes. #​8069
  • [ENHANCEMENT] Add option --output-dir to mimirtool rules get and mimirtool rules print to allow persisting rule groups to a file for edit and re-upload. #​8142
  • [BUGFIX] Fix panic in loadgen subcommand. #​7629
  • [BUGFIX] mimirtool rules prepare: do not add aggregation label to on() clause if already present in group_left() or group_right(). #​7839
  • [BUGFIX] Analyze Grafana: fix parsing queries with variables. #​8062
  • [BUGFIX] mimirtool rules sync: detect a change when the query_offset or the deprecated evaluation_delay configuration changes. #​8297
Mimir Continuous Test
  • [CHANGE] mimir-continuous-test has been deprecated and replaced by a Mimir module that can be run as a target from the mimir binary using mimir -target=continuous-test. #​7753
  • [CHANGE] -server.metrics-port flag is no longer available for use in the module run of mimir-continuous-test, including the grafana/mimir-continuous-test Docker image which uses the new module. Configuring this port is still possible in the binary, which is deprecated. #​7747
  • [CHANGE] Allowed authentication to Mimir using both tenant ID and basic/bearer auth. #​7619
  • [BUGFIX] Set User-Agent header for all requests sent from the testing client. #​7607
Query-tee
  • [ENHANCEMENT] Log queries that take longer than proxy.log-slow-query-response-threshold when compared to other backends. #​7346
  • [ENHANCEMENT] Add two new metrics for measuring the relative duration between backends: #​7782 #​8013 #​8330
    • cortex_querytee_backend_response_relative_duration_seconds
    • cortex_querytee_backend_response_relative_duration_proportional
Documentation
  • [ENHANCEMENT] Clarify Compactor and its storage volume when configured under Kubernetes. #​7675
  • [ENHANCEMENT] Add OTLP route to Mimir routes by path runbooks section. #​8074
  • [ENHANCEMENT] Document option server.log-source-ips-full. #​8268
Tools
  • [ENHANCEMENT] ulidtime: add option to show random part of ULID, timestamp in milliseconds and header. #​7615
  • [ENHANCEMENT] copyblocks: add a flag to configure part-size for multipart uploads in s3 client-side copying. #​8292
  • [ENHANCEMENT] copyblocks: enable pprof HTTP endpoints. #​8292

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this MR and you won't be reminded about this update again.


  • If you want to rebase/retry this MR, check this box

This MR has been generated by Renovate Bot.
