td-agent is crashing due to a misconfiguration
Summary
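td-agent is crashing across the fleet because a configuration change left a <parse> section without the required '@type' parameter, reducing log visibility.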
Timeline
All times UTC.
2020-05-12
- 17:27 - SRE on-call was notified that there was an issue with logging
- 17:32 - Incident declared from Slack
- 17:32 - gitlab-cookbooks/gitlab_fluentd!153 was determined to be the cause
- 17:33 - https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/3399 was created to revert
- 17:35 - Revert was merged
- 17:38 - Revert being applied to prod
- 17:39 - chef-client being run on api-01-sv-gprd to verify fix
- 17:40 - Fix confirmed to work. Deciding how best to run chef-client on affected hosts.
- 18:26 - The final node finished converging
Details
td-agent is crashing across the fleet due to a misconfiguration, thus reducing log visibility.
2020-05-11 17:07:14 +0000 [error]: config error in:
<parse>
expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"(?:\s+(?<gzip_ratio>[^ ]+))?)?$/
time_format %d/%b/%Y:%H:%M:%S %z
unmatched_lines
</parse>
2020-05-11 17:07:14 +0000 [error]: config error file="/etc/td-agent/td-agent.conf" error_class=Fluent::ConfigError error="'@type' parameter is required, in section parse"
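The config error above says that the <parse> section has no '@type' parameter, which td-agent treats as fatal at startup. A check along the following lines could flag such a section before a change is rolled out. This is a minimal sketch, not Fluentd's own validation: the function name parse_sections_missing_type and the line-based scanning are assumptions made for illustration, and it ignores includes and the rest of the config syntax.

```python
import re

# Hypothetical helper, not part of Fluentd or td-agent: a rough line-based
# scan that flags <parse> sections lacking an '@type' parameter.
_OPEN = re.compile(r"^<(\w+)(?:\s+[^>]*)?>$")
_CLOSE = re.compile(r"^</(\w+)>$")


def parse_sections_missing_type(conf_text):
    """Return 1-based line numbers of <parse> sections with no '@type'."""
    missing = []
    stack = []  # each entry: [section name, opening line number, has @type]
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        close = _CLOSE.match(line)
        if close:
            # Only pop when the closing tag matches the innermost open section.
            if stack and stack[-1][0] == close.group(1):
                name, start, has_type = stack.pop()
                if name == "parse" and not has_type:
                    missing.append(start)
            continue
        opened = _OPEN.match(line)
        if opened:
            stack.append([opened.group(1), lineno, False])
            continue
        # A parameter line: record '@type' against the innermost section.
        if stack and line.split(None, 1)[0] == "@type":
            stack[-1][2] = True
    return missing


if __name__ == "__main__":
    # Path taken from the error message above.
    with open("/etc/td-agent/td-agent.conf") as handle:
        for lineno in parse_sections_missing_type(handle.read()):
            print(f"line {lineno}: <parse> section has no @type")
```

Run against a config like the one in the log excerpt, the function would report the line on which the offending <parse> section opens. An empty result only means this particular mistake is absent, not that the config is valid.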
Source
Incident declared by alex in Slack via the /incident declare command.
Resources
- If the Situation Zoom room was utilised, the recording will be automatically uploaded to the Incident room Google Drive folder (private)
Edited by Alex Hanselka