Limit job log to ensure it contains UTF-8 valid data

Stan Hu requested to merge sh-limit-log-utf8-aware into main

Previously, when a job log was truncated, the cut could land in the middle of a multi-byte UTF-8 character. The resulting invalid UTF-8 led to 500 errors when Rails attempted to show the job log.
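To make the failure concrete, here is a small Python illustration (not GitLab code; the helper name `is_valid_utf8` is hypothetical). It shows that a plain byte-count cut can leave a dangling lead byte, so the result no longer decodes as UTF-8. How Rails turns that into a 500 is not shown here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# "é" is encoded as two bytes, 0xC3 0xA9, at the very end of this log.
log = "build ok é".encode("utf-8")

# A naive byte-count limit that cuts between those two bytes.
naive = log[: len(log) - 1]

print(is_valid_utf8(log))    # True
print(is_valid_utf8(naive))  # False: the trailing 0xC3 has no continuation byte
```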

We know that a UTF-8 character can be at most 4 bytes, and each continuation byte has its high bit set to 1. The lead byte of a multi-byte character also has its high bit set; only single-byte (ASCII) characters have it set to 0. We can therefore rewind up to 4 bytes until we find a byte with the high bit set to 0 and truncate the log there. This means one multi-byte UTF-8 character might be lopped off unnecessarily, but it keeps the function simple.
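A minimal Python sketch of the rewind heuristic described above follows. The function name `truncate_utf8` is hypothetical and the MR's actual code may differ. The sketch assumes the input is valid UTF-8 before truncation and reads "truncate the log there" as keeping the ASCII byte that was found. The description does not say what happens when none of the last 4 bytes is ASCII (for example, CJK text), so the sketch adds an assumed fallback that cuts just before the nearest lead byte.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut `data` to at most `limit` bytes without splitting a UTF-8 character.

    Hypothetical sketch; assumes `data` is valid UTF-8 to begin with.
    """
    if len(data) <= limit:
        return data
    if limit <= 0:
        return b""

    cut = data[:limit]
    # Indices of (at most) the last 4 bytes of the cut, newest first,
    # since a UTF-8 character is at most 4 bytes long.
    window = range(len(cut) - 1, max(len(cut) - 5, -1), -1)

    # Heuristic from the description: rewind to a byte whose high bit is 0
    # (an ASCII character) and keep everything up to and including it.
    for i in window:
        if cut[i] & 0x80 == 0:
            return cut[: i + 1]

    # Assumed fallback (not described in the MR): cut just before the nearest
    # lead byte, i.e. any byte that is not a continuation byte (0b10xxxxxx).
    for i in window:
        if cut[i] & 0xC0 != 0x80:
            return cut[:i]

    raise ValueError("input is not valid UTF-8")


print(truncate_utf8("aéb".encode("utf-8"), 2))             # b'a'
print(truncate_utf8("中中".encode("utf-8"), 5).decode())   # 中
```

The first example shows the trade-off mentioned above: cutting "aéb" to 2 bytes leaves `a` plus the lead byte of "é", and rewinding to the ASCII `a` drops the partial "é". Because valid UTF-8 never has more than 3 continuation bytes in a row, any 4 consecutive bytes contain a non-continuation byte, so the fallback always finds a cut point for valid input.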

Relates to gitlab#336356 (closed)

Edited by Stan Hu
