Skip to content

Fix trace short writes for large log lines

Arran Walker requested to merge ajwalker/trace-short-writes into main

What does this MR do?

Fixes an issue where trace's buffer.Write() can return n < len(p) due to the masking algorithm delaying/buffering writes.

This can be triggered and influenced by several things:

  • Sequential Write() calls that just happen to be at problematic boundaries (https://github.com/golang/go/issues/46892). For this, we're now performing additional Write() calls for the remaining data, as long as data is being processed (as indicated by the length written variable).
  • We were previously not processing as much data as we could have been due favouring the last "safe token", rather than an additional safe place to flush at a further offset. This bug influenced the problem.
  • The internal buffer of text/transform is 4096. Replacing secret phrases larger than this is impossible if they're written across write boundaries, as the internal buffer is not large enough. This will be fixed in !2979 (merged), but larger secrets in general increase the chance of data being written over a write boundary and causing issues.

Why was this MR needed?

Trace logs were being truncated and causing jobs to fail.

What's the best way to test this MR?

  • We've been using a known failing job during testing, details here: #28001 (comment 607645940). However, this will still occasionally pass.

  • The new Fuzz function and using the previous implementation it found these problems in under 2 minutes.

    You can test fuzzing locally with:

    docker run -v $(pwd):/code --rm -it golang:1.13.8
    go get github.com/dvyukov/go-fuzz/go-fuzz && go get github.com/dvyukov/go-fuzz/go-fuzz-build
    cd /code/helpers/trace
    mkdir corpus
    cp testdata/corpus/* corpus/
    go-fuzz-build
    go-fuzz

    The fuzz includes large secrets that will also crash input, but will be fixed in !2979 (merged). The pipeline for fuzzing jobs is allowed to fail and having in this MR is not a problem. For local testing, remove the following lines from buffer_fuzz.go to avoid these specific crashes.

    strings.Repeat("A", 1024),
    strings.Repeat("B", 4*1024),
    strings.Repeat("C", 8*1024),

What are the relevant issue numbers?

Closes #28001 (closed)

Closes #27861 (closed)

Edited by Steve Xuereb

Merge request reports