Error Mimir Sample timestamp too old / Error Mimir sample out of order
During the review of the following tickets:
Some of the metrics are missing in one or both environments:
Quick troubleshooting revealed the following issue in gstg (many errors like this one):
{
"caller": "dedupe.go:112",
"component": "remote",
"count": 2000,
"err": "server returned HTTP status 400 Bad Request: failed pushing to ingester: user=gitlab-gstg: the sample has been rejected because its timestamp is too old (err-mimir-sample-timestamp-too-old). The affected sample has timestamp 2024-01-29T13:44:19.997Z and is from series {__name__=\"ebpf_exporter_bio_size_bytes_bucket\", cluster=\"gstg-gitlab-gke\", device=\"loop4\", env=\"gstg\", environment=\"gstg\", fqdn=\"redis-cluster-feature-flag-shard-02-02-db-gstg\", instance=\"redis-cluster-feature-flag-shard-02-02-db-gstg\", job=\"scrapeConfig/monitoring/prometheus-agent-ebpf\", le=\"8.388608e+06\", machine_type=\"n1-standard-1\", monitor=\"default\", operation=\"read\", pet_name=\"redis-cluster-feature-flag-shard-02\", port=\"9435\", prometheus=\"monitoring/gitlab-rw-prometheus\", provider=\"gcp\", region=\"us-east1\", service=\"redis\", type=\"redis-cluster-feature-flag\", zone=\"us-east1-d\"}",
"exemplarCount": 0,
"level": "error",
"msg": "non-recoverable error",
"remote_name": "mimir",
"ts": "2024-01-29T15:02:19.056Z",
"url": "https://mimir.ops.gke.gitlab.net/api/v1/push"
}
https://grafana.com/docs/mimir/latest/manage/mimir-runbooks/#err-mimir-sample-timestamp-too-old
This error occurs when the ingester rejects a sample because its timestamp is too old as compared to the most recent timestamp received for the same tenant across all its time series.
How it works:
- If the incoming timestamp is more than 1 hour older than the most recent timestamp ingested for the tenant, the sample will be rejected.
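As a rough illustration of the rule above, the following sketch applies the 1-hour check to the rejected sample from the log. It is not Mimir's implementation. The names `parse_ts`, `is_too_old` and `MAX_SAMPLE_AGE` are hypothetical. The log does not say what the newest ingested timestamp for the tenant was, so the log's own `ts` field is used as a stand-in, which is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Acceptance window taken from the "more than 1 hour older" rule above.
MAX_SAMPLE_AGE = timedelta(hours=1)


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2024-01-29T13:44:19.997Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_too_old(sample_ts: datetime, newest_tenant_ts: datetime,
               max_age: timedelta = MAX_SAMPLE_AGE) -> bool:
    """Return True if the sample is more than max_age older than the
    newest timestamp already ingested for the tenant."""
    return newest_tenant_ts - sample_ts > max_age


# Timestamp of the rejected sample, copied from the "err" field of the log.
sample_ts = parse_ts("2024-01-29T13:44:19.997Z")

# ASSUMPTION: the log's "ts" field approximates the newest timestamp
# ingested for the tenant; the real value is not present in the log.
newest_ts = parse_ts("2024-01-29T15:02:19.056Z")

lag = newest_ts - sample_ts
print(f"lag={lag} too_old={is_too_old(sample_ts, newest_ts)}")
```

Under that assumption the sample is about 1 hour 18 minutes behind, which is outside the 1-hour window and consistent with the rejection seen in gstg.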
Edited by Raúl Naveiras