Tune Fluentd/Loki buffer management and sync for better resilience
Closes #2766 (closed)
This MR tries to fix a couple of issues with logging: currently fluentd and root-fluentd are getting OOM killed.
From observation of the logs:
- Rejection errors due to oversized chunks sent by fluentd to loki-gateway; the current limit is 30MB.
- Errors due to the per-stream rate limit, which is currently 3MB/s.
- The current memory settings for root-fluentd and fluentd are: request 128MB, limit 500MB.
Related logs:

From the fluentd logs:
```
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx/1.27.5</center>
</body>
</html>
)

level=error ts=2025-08-16T06:27:08.949808267Z caller=manager.go:50 component=ingester path=write msg="write operation failed" details="Per stream rate limit exceeded (limit: 3MB/sec) while attempting to ingest for stream '{app=\"log-spammer\", cluster=\"management-cluster\", container=\"spam\", host=\"management-cluster-cp-d780ce53ac-dt7m7\", namespace=\"log-stress\", pod=\"log-spammer-6b68bf58f6-w5xvj\", service_name=\"log-spammer\"}'
```

From loki-gateway (the rejected chunk size is approx. 32MB):
```
2025/08/21 09:33:34 [error] 9#9: *5795 client intended to send too large body: 32275213 bytes, client: 100.72.23.30, server: , request: "POST /loki/api/v1/push HTTP/1.1", host: "loki-gateway.loki.svc.cluster.local"
```
To summarize, there are two ingestion issues, compounded by a resource issue:
- 413 Request Entity Too Large: anything above 30MB was rejected because of the loki-gateway `client_max_body_size`.
- Per stream rate limit exceeded (limit: 3MB/sec): as set in the Loki write config, `per_stream_rate_limit: 3MB`.
- The current memory limits are not enough to handle bursts of logs.
To address the above issues, we tune a few parameters on the fluentd and Loki side. The following parameters will be changed (an illustrative sketch of where these settings land follows the lists below):
On the fluentd buffer:
- `CHUNK_LIMIT_SIZE`: 8m (file buffer; default 256MB)
- `FLUSH_THREAD_COUNT`: "8" (default 1)
- `FLUSH_INTERVAL`: 2s (default 60s)
- `FLUSH_MODE`: interval (default lazy)
fluentd/root-fluentd resources:
```yaml
resources:
  limits:
    memory: 1Gi   # default: 500MB
  requests:
    memory: 300M  # default: 128MB
```
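To make the mapping concrete, here is a minimal sketch of where these settings land when expressed directly as logging-operator CRDs (main branch). The resource names, namespace and Loki endpoint are illustrative assumptions; in Sylva these values are wired through the unit/chart values rather than written as raw manifests.

```yaml
# Sketch only: mapping of the parameters above onto logging-operator CRDs.
# Names, namespace and URL below are assumptions for illustration.
apiVersion: logging.banzaicloud.io/v1beta1
kind: ClusterOutput
metadata:
  name: loki
  namespace: logging
spec:
  loki:
    url: http://loki-gateway.loki.svc.cluster.local
    buffer:
      type: file               # file buffer backed by a PVC
      chunk_limit_size: 8m     # CHUNK_LIMIT_SIZE
      flush_thread_count: 8    # FLUSH_THREAD_COUNT
      flush_interval: 2s       # FLUSH_INTERVAL
      flush_mode: interval     # FLUSH_MODE
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: root-fluentd           # same shape applies to the workload-cluster fluentd
spec:
  controlNamespace: logging    # assumption
  fluentd:
    resources:
      requests:
        memory: 300M
      limits:
        memory: 1Gi
```

The sketch is only meant to show which fluentd buffer directives the parameters correspond to; the actual values are set through the Sylva unit values.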
In this MR we increase the memory request and limit for both root-fluentd and fluentd. The reasoning: with 8m chunks and 8 flush threads, up to 64MiB can be in flight at any moment during a burst (data being sent towards Loki). Parsing/filtering is another factor that consumes memory. So the general formula becomes:

Total memory = (2 × chunk_size (raw format) × number of flush threads) + parsing/filtering overhead (inbound; the incoming log volume can vary in bursts) + any plugins

Here the factor 2 accounts for the chunk held in memory before it is written to the disk buffer plus the chunk that is about to be flushed to Loki.
The request is increased from 128Mi to 300Mi because of: 128Mi (2 × 8 × 8) + the memory required for filtering/parsing the incoming data + any memory for the plugins used (in general).
The limit is raised to 2 times the default of 500MB, so that log bursts coming from the Fluent Bit side can be absorbed easily. The remaining chunk parameters make sure data is flushed more frequently, so that the PVC used for the file buffer does not fill up. Also, we have not put any constraint on Fluent Bit to slow down or back off.
Based on this, a request of 300 MiB is appropriate for steady-state usage, while a limit of 1 GiB provides sufficient room for bursts (spikes, retries, multiple open chunk keys, and parsing/filtering overhead).
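Plugging the numbers into the formula above (the overhead terms are rough allowances, not measured values):

```math
2 \times 8\,\mathrm{MiB} \times 8\ \text{threads} = 128\,\mathrm{MiB}\ \text{(chunks in flight)}
```
```math
128\,\mathrm{MiB} + \text{parsing/filtering overhead} + \text{plugin overhead} \approx 300\,\mathrm{MiB}\ \text{(request)},\qquad \text{limit} = 1\,\mathrm{GiB}
```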
These parameters (buffer size and resource settings) were tested with 100 replica pods, each producing ~500-byte log lines (multiple lines per second), on a cluster with 3 control plane and 2 worker nodes (CAPO deployment). This setup was treated as a burst scenario. Under this load, Fluentd's resource usage was observed at approximately 700 MiB memory and 600m CPU. Memory values can be fine-tuned further depending on the specific environment and expected burst patterns. During testing, no backoff or throttle settings were applied on Fluent Bit.
On Loki:
- `per_stream_rate_limit`: 10MB (earlier 3MB) # raises the per-stream rate limit
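For reference, this is roughly where the setting lives in the Loki configuration (a minimal sketch; the exact Helm values path used by the Sylva Loki chart may differ):

```yaml
# Sketch only: Loki limits_config with the raised per-stream rate limit.
limits_config:
  per_stream_rate_limit: 10MB   # was 3MB (the Loki default)
```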
Compared to Loki's current ingestion rate of 100 MB/s, the configured parameters yield a much lower theoretical throughput. Using the general formula:

(FLUSH_THREAD_COUNT × CHUNK_LIMIT_SIZE) / FLUSH_INTERVAL
(8 × 8 MB) / 2 s = 32 MB/s # overall ingestion across all streams

With these CHUNK_LIMIT_SIZE and FLUSH_THREAD_COUNT values, the calculated throughput still remains below Loki's ingestion rate of 100 MB/s. Additionally, with CHUNK_LIMIT_SIZE set to 8 MB, each push request stays safely below the configured client_max_body_size of 30 MB (loki-gateway).
Note: This MR is specifically for the main branch and should not be backported as-is. The MR for release-1.4 is !5252 (merged). release-1.4 uses rancher-logging, while main uses logging-operator.
Test coverage
CI configuration
Below you can choose test deployment variants to run in this MR's CI.
Legend:

| Icon | Meaning | Available values |
|---|---|---|
| ☁️ | Infra Provider | capd, capo, capm3 |
| 🚀 | Bootstrap Provider | kubeadm (alias kadm), rke2, okd, ck8s |
| 🐧 | Node OS | ubuntu, suse, na |
| 🛠️ | Deployment Options | light-deploy, dev-sources, ha, misc, maxsurge-0, logging, no-logging |
| 🎬 | Pipeline Scenarios | Available scenario list and description |
- 🎬 preview ☁️ capd 🚀 kadm 🐧 ubuntu
- 🎬 preview ☁️ capo 🚀 rke2 🐧 suse
- 🎬 preview ☁️ capm3 🚀 rke2 🐧 ubuntu
- ☁️ capd 🚀 kadm 🛠️ light-deploy 🐧 ubuntu
- ☁️ capd 🚀 rke2 🛠️ light-deploy 🐧 suse
- ☁️ capo 🚀 rke2 🐧 ubuntu
- ☁️ capo 🚀 rke2 🐧 ubuntu 🛠️ ha,logging
- ☁️ capo 🚀 kadm 🐧 ubuntu
- ☁️ capo 🚀 rke2 🎬 rolling-update 🛠️ ha 🐧 ubuntu
- ☁️ capo 🚀 kadm 🎬 wkld-k8s-upgrade 🐧 ubuntu
- ☁️ capo 🚀 rke2 🎬 rolling-update-no-wkld 🛠️ ha 🐧 suse
- ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.4.x 🛠️ ha 🐧 ubuntu
- ☁️ capo 🚀 rke2 🎬 sylva-upgrade-from-1.4.x 🛠️ ha,misc 🐧 ubuntu
- ☁️ capo 🚀 rke2 🛠️ ha,misc 🐧 ubuntu
- ☁️ capm3 🚀 rke2 🐧 suse
- ☁️ capm3 🚀 rke2 🐧 ubuntu 🛠️ ha,logging
- ☁️ capm3 🚀 kadm 🐧 ubuntu
- ☁️ capm3 🚀 ck8s 🐧 ubuntu
- ☁️ capm3 🚀 kadm 🎬 rolling-update-no-wkld 🛠️ ha,misc 🐧 ubuntu
- ☁️ capm3 🚀 rke2 🎬 wkld-k8s-upgrade 🛠️ ha 🐧 suse
- ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 ubuntu
- ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.4.x 🛠️ ha 🐧 suse
- ☁️ capm3 🚀 rke2 🛠️ misc,ha 🐧 suse
- ☁️ capm3 🚀 rke2 🎬 sylva-upgrade-from-1.4.x 🛠️ ha,misc 🐧 suse
- ☁️ capm3 🚀 kadm 🎬 rolling-update 🛠️ ha 🐧 suse
- ☁️ capm3 🚀 ck8s 🎬 rolling-update 🛠️ ha 🐧 ubuntu
- ☁️ capm3 🚀 rke2|okd 🎬 no-update 🐧 ubuntu|na
- ☁️ capm3 🚀 kadm 🛠️ ha,logging 🐧 ubuntu
Global config for deployment pipelines
- autorun pipelines
- allow failure on pipelines
- record sylvactl events
Notes:
- Enabling `autorun` will make deployment pipelines run automatically without human interaction.
- Disabling `allow failure` will make deployment pipelines mandatory for pipeline success.
- If both `autorun` and `allow failure` are disabled, deployment pipelines will need manual triggering but will block the pipeline.
Be aware: after a configuration change, the pipeline is not triggered automatically.
Please run it manually (by clicking the "Run pipeline" button in the Pipelines tab) or push new code.
