
flawfinder fails in SAST CI job with UnicodeDecodeError

Summary

The SAST job runs flawfinder on C/C++ code, and flawfinder fails with a Python traceback caused by a UnicodeDecodeError.

Steps to reproduce

Configure a SAST CI job on a project with C/C++ code. Trigger the CI job with a push.

Example Project

See https://gitlab.com/selsky/ntpsec/-/jobs/84646428 for a failing sample job.

What is the current bug behavior?

flawfinder exits with a Python traceback ending in UnicodeDecodeError, and the job fails.

What is the expected correct behavior?

No traceback. Instead, the JSON report file should be generated.

Relevant logs and/or screenshots

Traceback (most recent call last):
  File "/usr/local/bin/flawfinder", line 2188, in <module>
    sys.exit(flawfind())
  File "/usr/local/bin/flawfinder", line 2181, in flawfind
    if process_files():
  File "/usr/local/bin/flawfinder", line 2020, in process_files
    process_file_args(files, patch_infos)
  File "/usr/local/bin/flawfinder", line 1775, in process_file_args
    maybe_process_file(f, patch_infos)
  File "/usr/local/bin/flawfinder", line 1720, in maybe_process_file
    maybe_process_file(os.path.join(f, dir_entry), patch_infos)
  File "/usr/local/bin/flawfinder", line 1720, in maybe_process_file
    maybe_process_file(os.path.join(f, dir_entry), patch_infos)
  File "/usr/local/bin/flawfinder", line 1744, in maybe_process_file
    process_c_file(f, patch_infos)
  File "/usr/local/bin/flawfinder", line 1486, in process_c_file
    text = "".join(my_input.readlines())
  File "/usr/local/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 5932: invalid start byte
2018/07/26 15:13:14 exit status 1
2018/07/26 15:13:14 Container exited with non zero status code
ERROR: Job failed: exit code 1
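
The traceback shows that the failure happens while a source file is being decoded as UTF-8. As a rough sketch for locating the offending files before the SAST job runs, the following walks a directory tree and reports every C/C++ file that is not valid UTF-8, along with the byte offset of the first bad byte. The helper name `find_non_utf8_files` and the extension list are assumptions made for illustration; they are not part of flawfinder or GitLab SAST.

```python
import os

# Hypothetical extension list for illustration; adjust it to the project.
C_EXTENSIONS = (".c", ".h", ".cc", ".cpp", ".cxx", ".hpp")


def find_non_utf8_files(root):
    """Return sorted (path, byte_offset) pairs for files that are not valid UTF-8."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(C_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            # Read raw bytes so that no decoding happens implicitly.
            with open(path, "rb") as fh:
                data = fh.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as exc:
                # exc.start is the offset of the first undecodable byte.
                bad.append((path, exc.start))
    return sorted(bad)
```

Calling `find_non_utf8_files(".")` from the repository root returns an empty list when every matching file decodes cleanly; any entry it does return should correspond to a file that would trigger the error above.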

Output of checks

This bug happens on GitLab.com

Possible fixes

A possible workaround, per https://www.dwheeler.com/flawfinder/flawfinder.pdf, is to force the use of Python 2 instead of Python 3. I tried setting LANG=C, but that did not seem to help. Forcing Python 2 is hacky, but it should be reliable in the face of uncertain input.

Planned Fix

  • Document how to fix the encoding issues by using iconv or cvt2utf to convert all files to UTF-8.
  • Parse flawfinder's output and print a helpful GitLab-specific message if it fails to run because of a character encoding issue.
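
As an illustration of the first item, here is a minimal Python sketch of the conversion step, roughly what `iconv -f ISO-8859-1 -t UTF-8` does for a single file. The function name `convert_to_utf8` is hypothetical, and the default source encoding of ISO-8859-1 is only an assumption: the actual encoding of the affected files is not stated in this issue and has to be determined per project.

```python
def convert_to_utf8(path, source_encoding="iso-8859-1"):
    """Rewrite the file at path as UTF-8 and return True if it was changed.

    source_encoding is an assumption supplied by the caller; this sketch
    does not try to detect the original encoding.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    try:
        data.decode("utf-8")
        return False  # Already valid UTF-8, so leave the file untouched.
    except UnicodeDecodeError:
        pass
    # Decode with the assumed legacy encoding, then write back as UTF-8.
    text = data.decode(source_encoding)
    with open(path, "wb") as fh:
        fh.write(text.encode("utf-8"))
    return True
```

Files that are already valid UTF-8 are skipped, so running the conversion twice does not double-encode anything. Choosing the wrong source encoding will still yield valid UTF-8 but with the wrong characters, so the converted files are worth reviewing before they are committed.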
Edited by Daniel Paul Searles