flawfinder fails in SAST CI job with UnicodeDecodeError
Summary
The SAST job runs flawfinder on C/C++ code, and flawfinder failed with a Python traceback ending in a UnicodeDecodeError.
Steps to reproduce
Configure a SAST CI job on a project with C/C++ code. Trigger the CI job with a push.
Example Project
See https://gitlab.com/selsky/ntpsec/-/jobs/84646428 for a failing sample job.
What is the current bug behavior?
flawfinder exits with a Python traceback ending in UnicodeDecodeError, and the job fails.
What is the expected correct behavior?
No traceback. The job should generate the JSON report file.
Relevant logs and/or screenshots
Traceback (most recent call last):
File "/usr/local/bin/flawfinder", line 2188, in <module>
sys.exit(flawfind())
File "/usr/local/bin/flawfinder", line 2181, in flawfind
if process_files():
File "/usr/local/bin/flawfinder", line 2020, in process_files
process_file_args(files, patch_infos)
File "/usr/local/bin/flawfinder", line 1775, in process_file_args
maybe_process_file(f, patch_infos)
File "/usr/local/bin/flawfinder", line 1720, in maybe_process_file
maybe_process_file(os.path.join(f, dir_entry), patch_infos)
File "/usr/local/bin/flawfinder", line 1720, in maybe_process_file
maybe_process_file(os.path.join(f, dir_entry), patch_infos)
File "/usr/local/bin/flawfinder", line 1744, in maybe_process_file
process_c_file(f, patch_infos)
File "/usr/local/bin/flawfinder", line 1486, in process_c_file
text = "".join(my_input.readlines())
File "/usr/local/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 5932: invalid start byte
2018/07/26 15:13:14 exit status 1
2018/07/26 15:13:14 Container exited with non zero status code
ERROR: Job failed: exit code 1
Output of checks
This bug happens on GitLab.com
Possible fixes
A possible workaround, per https://www.dwheeler.com/flawfinder/flawfinder.pdf, is to run flawfinder under Python 2 instead of Python 3. I tried setting LANG=C, but that did not seem to help. Forcing Python 2 is hacky, but it should be reliable when the input encoding is uncertain.
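To find out which files trigger the error before choosing a workaround, something like the following could help. It is only a sketch: the function names and the list of file extensions are my own assumptions, not part of flawfinder or the SAST job. It walks a source tree and reports files that do not decode as UTF-8, along with the offset of the first bad byte.

```python
import os


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None


def find_non_utf8_files(root, extensions=(".c", ".h", ".cc", ".cpp", ".hpp")):
    """Yield (path, offset) for source files under root that are not valid UTF-8.

    The extension list is an assumption; adjust it to match the project.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                offset = first_invalid_utf8_offset(handle.read())
            if offset is not None:
                yield path, offset
```

Calling `list(find_non_utf8_files("."))` from the repository root should list the offending files, which can then be converted or excluded from the scan.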
Planned Fix
- Document how to fix the encoding issues by using iconv or cvt2utf to convert all files to UTF-8.
- Parse and output a helpful GitLab-specific message if flawfinder fails to run due to a character encoding issue.
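Both planned items could be sketched roughly as below. This is not the actual GitLab implementation. The function names, the wording of the hint, and the `latin-1` default are assumptions for illustration. The real source encoding of the affected files is unknown and would have to be determined per project, as with the `-f` option of iconv.

```python
def convert_to_utf8(data: bytes, source_encoding: str = "latin-1") -> bytes:
    """Re-encode data as UTF-8, leaving already valid UTF-8 untouched.

    source_encoding is an assumption about the original file encoding.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode(source_encoding).encode("utf-8")


def explain_flawfinder_failure(output: str):
    """Return a hint if the output contains a UnicodeDecodeError, else None.

    The message text is hypothetical, not an existing GitLab message.
    """
    for line in output.splitlines():
        if line.strip().startswith("UnicodeDecodeError:"):
            return (
                "flawfinder could not read a source file because it is not "
                "valid UTF-8. Convert the file to UTF-8 (for example with "
                "iconv or cvt2utf) and retry. Original error: " + line.strip()
            )
    return None
```

Running `explain_flawfinder_failure` over the captured flawfinder output would let the job print the hint instead of only the raw traceback.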
Edited by Daniel Paul Searles