Multiline logging with fluentd

We ship logs to ELK and StackDriver using fluentd. Postgres emits multi-line log entries when, for example, the logged query itself spans multiple lines.

For syslog, this resulted in:

2019-06-05_14:11:09 patroni-02-db-gstg postgres[18733]:  [4-1] 2019-06-05 14:11:09 GMT [18733]: [1-1] LOG:  duration: 1001.765 ms  statement: select 1,
2019-06-05_14:11:09 patroni-02-db-gstg postgres[18733]:  [4-2] #011pg_sleep(1),
2019-06-05_14:11:09 patroni-02-db-gstg postgres[18733]:  [4-3] #011'foobar'
2019-06-05_14:11:09 patroni-02-db-gstg postgres[18733]:  [4-4] #011;

These lines do adhere to the log format fluentd expects. However, each line shows up as an individual record in ELK, which isn't exactly helpful either.

Once logging_collector is turned on, %l-1 isn't evaluated anymore (%l is, but the -1 part isn't). We would see:

2019-06-05_14:11:09 patroni-02-db-gstg postgres[18733]:  [4-1] 2019-06-05 14:11:09 GMT [18733]: [1-1] LOG:  duration: 1001.765 ms  statement: select 1,
    pg_sleep(1),
    'foobar'
    ;

fluentd parses the first line fine (and emits a record), but fails to parse the other lines because the format is different.
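For reference, a log_line_prefix along these lines would produce prefixes like the ones above (an assumption; the actual setting in use isn't shown here):

    # postgresql.conf (assumed setting, matching the observed prefix format)
    log_line_prefix = '%t [%p]: [%l-1] '

Here %t is the timestamp, %p the backend PID, and %l the per-session log line number; the literal -1 is what syslog-style continuation numbering hangs off.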

Enabling csvlog isn't any better:

2019-06-05 14:33:46.443 GMT,"gitlab-superuser","gitlabhq_production",15271,"127.0.0.1:53870",5cf7d2b8.3ba7,1,"SELECT",2019-06-05 14:33:28 GMT,9/0,0,LOG,00000,"duration: 1002.224 ms  statement: select 1,
pg_sleep(1),
'csv is great isnt it?'
;",,,,,,,,,"psql"

fluentd supports CSV, but it parses with CSV.parse_line (http://ruby-doc.org/stdlib-2.4.1/libdoc/csv/rdoc/CSV.html#method-c-parse_line), feeding it only one line at a time.
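To see why line-at-a-time parsing breaks on csvlog output, here's a small demonstration (the records are abbreviated versions of the one above):

```ruby
require "csv"

# A complete single-line csvlog record parses fine:
row = CSV.parse_line('2019-06-05 14:33:46.443 GMT,"gitlab-superuser","gitlabhq_production",15271,LOG,"duration: 1 ms"')

# But feed only the first physical line of a multi-line record and the
# quoted statement field is left unterminated, so the parser raises
# CSV::MalformedCSVError instead of returning a row:
truncated = '2019-06-05 14:33:46.443 GMT,"gitlab-superuser","gitlabhq_production",15271,LOG,"duration: 1002.224 ms  statement: select 1,'
begin
  CSV.parse_line(truncated)
  unterminated = false
rescue CSV::MalformedCSVError
  unterminated = true
end
```

The quoting is valid CSV; the record is simply not complete until the closing quote several physical lines later, which a per-line parser never sees.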

fluentd does offer multiline parsing - but it looks like the formatN patterns only support a fixed number of lines, which has to be known a priori.
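For reference, an untested sketch of what that would look like for the syslog-style output (path and tag are assumptions). format_firstline decides where a record starts, but the record body still has to be covered by the fixed formatN patterns:

    <source>
      @type tail
      path /var/log/postgresql/postgresql.log   # assumed path
      tag postgres.log
      <parse>
        @type multiline
        # a new record starts with the timestamp prefix
        format_firstline /^\d{4}-\d{2}-\d{2}[_ ]\d{2}:\d{2}:\d{2}/
        format1 /^(?<time>\d{4}-\d{2}-\d{2}[_ ]\d{2}:\d{2}:\d{2}) (?<message>.*)/
      </parse>
    </source>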

The question here is: How do we ship those multi-line logs properly to ELK?