Error Mimir Sample timestamp too old / Error Mimir sample out of order

During the review of the following tickets:

Some of the metrics are missing in one or both environments:

Quick troubleshooting revealed the following issue in gstg (many errors like this one):

{
  "caller": "dedupe.go:112",
  "component": "remote",
  "count": 2000,
  "err": "server returned HTTP status 400 Bad Request: failed pushing to ingester: user=gitlab-gstg: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-01-29T13:44:19.997Z and is from series {__name__=\"ebpf_exporter_bio_size_bytes_bucket\", cluster=\"gstg-gitlab-gke\", device=\"loop4\", env=\"gstg\", environment=\"gstg\", fqdn=\"redis-cluster-feature-flag-shard-02-02-db-gstg\", instance=\"redis-cluster-feature-flag-shard-02-02-db-gstg\", job=\"scrapeConfig/monitoring/prometheus-agent-ebpf\", le=\"8.388608e+06\", machine_type=\"n1-standard-1\", monitor=\"default\", operation=\"read\", pet_name=\"redis-cluster-feature-flag-shard-02\", port=\"9435\", prometheus=\"monitoring/gitlab-rw-prometheus\", provider=\"gcp\", region=\"us-east1\", service=\"redis\", type=\"redis-cluster-feature-flag\", zone=\"us-east1-d\"}",
  "exemplarCount": 0,
  "level": "error",
  "msg": "non-recoverable error",
  "remote_name": "mimir",
  "ts": "2024-01-29T15:02:19.056Z",
  "url": "https://mimir.ops.gke.gitlab.net/api/v1/push"
}

https://grafana.com/docs/mimir/latest/manage/mimir-runbooks/#err-mimir-sample-timestamp-too-old

This error occurs when the ingester rejects a sample because its timestamp is too old as compared to the most recent timestamp received for the same tenant across all its time series.

How it works:

If the incoming timestamp is more than 1 hour older than the most recent timestamp ingested for the tenant, the sample will be rejected.
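
As a rough illustration of the rule above, the following sketch applies the 1-hour check to the timestamps from the log entry in this issue. It is not Mimir's actual implementation: the helper names `parse_ts` and `is_too_old` are hypothetical, and the log line's `ts` field is used only as an approximation of the most recent timestamp ingested for the tenant, which the log itself does not report.

```python
from datetime import datetime, timedelta, timezone

# Window taken from the description above (1 hour). The effective value
# depends on the Mimir configuration, so treat this constant as an assumption.
MAX_SAMPLE_AGE = timedelta(hours=1)


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2024-01-29T13:44:19.997Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_too_old(sample_ts: datetime, newest_tenant_ts: datetime,
               max_age: timedelta = MAX_SAMPLE_AGE) -> bool:
    """Return True if the sample is more than max_age older than the
    newest timestamp seen for the tenant (hypothetical helper)."""
    return newest_tenant_ts - sample_ts > max_age


# Timestamp of the rejected sample, copied from the "err" field.
sample_ts = parse_ts("2024-01-29T13:44:19.997Z")

# Assumption: the log entry's "ts" stands in for the newest tenant timestamp.
newest_ts = parse_ts("2024-01-29T15:02:19.056Z")

lag = newest_ts - sample_ts
print(f"lag={lag} too_old={is_too_old(sample_ts, newest_ts)}")
```

Under that assumption the sample lags by roughly 1 hour 18 minutes, which is beyond the 1-hour window and consistent with the rejection seen in the log.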

Edited Jan 29, 2024 by Raúl Naveiras