flawfinder fails in SAST CI job with UnicodeDecodeError
### Summary

SAST calls flawfinder on C/C++ code, and this failed with a python backtrace related to Unicode.

### Steps to reproduce

Configure a SAST CI job on a project with C/C++ code. Trigger the CI job with a push.

### Example Project

See https://gitlab.com/selsky/ntpsec/-/jobs/84646428 for a failing sample job.

### What is the current *bug* behavior?

Python traceback related to UnicodeDecodeError.

### What is the expected *correct* behavior?

No traceback, and instead I should see the report json file generated.

### Relevant logs and/or screenshots

```
Traceback (most recent call last):
  File "/usr/local/bin/flawfinder", line 2188, in <module>
    sys.exit(flawfind())
  File "/usr/local/bin/flawfinder", line 2181, in flawfind
    if process_files():
  File "/usr/local/bin/flawfinder", line 2020, in process_files
    process_file_args(files, patch_infos)
  File "/usr/local/bin/flawfinder", line 1775, in process_file_args
    maybe_process_file(f, patch_infos)
  File "/usr/local/bin/flawfinder", line 1720, in maybe_process_file
    maybe_process_file(os.path.join(f, dir_entry), patch_infos)
  File "/usr/local/bin/flawfinder", line 1720, in maybe_process_file
    maybe_process_file(os.path.join(f, dir_entry), patch_infos)
  File "/usr/local/bin/flawfinder", line 1744, in maybe_process_file
    process_c_file(f, patch_infos)
  File "/usr/local/bin/flawfinder", line 1486, in process_c_file
    text = "".join(my_input.readlines())
  File "/usr/local/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 5932: invalid start byte
2018/07/26 15:13:14 exit status 1
2018/07/26 15:13:14 Container exited with non zero status code
ERROR: Job failed: exit code 1
```

### Output of checks

This bug happens on GitLab.com

### Possible fixes

A possible work-around per https://www.dwheeler.com/flawfinder/flawfinder.pdf is to force the usage of python2, instead of python3.
I attempted to set LANG=C, but that didn't seem to help. Forcing python2 is hacky, but it should be reliable in the face of uncertain input.

## Planned Fix

* [x] Document how to fix the encoding issues by using iconv or cvt2utf to convert all files to utf8.
* [x] Parse and output a helpful GitLab specific message if flawfinder fails to run due to a character encoding issue.
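
For the first item above, here is a rough sketch of how one might locate the offending files and convert them before the SAST job runs. This is not the documented GitLab procedure. The function names, the extension list, and the `latin-1` default are all assumptions made for illustration. The real source encoding has to be determined per file, and a byte like 0x81 may well indicate something other than Latin-1.

```python
import os

# Hypothetical extension list; adjust it to the project's source layout.
C_EXTENSIONS = (".c", ".h", ".cc", ".cpp", ".cxx", ".hpp")


def find_non_utf8_files(root):
    """Return sorted paths of C/C++ files under root that are not valid UTF-8."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(C_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                data = fh.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)


def convert_to_utf8(path, source_encoding="latin-1"):
    """Rewrite path in place as UTF-8.

    source_encoding is an assumption supplied by the caller; a wrong guess
    yields valid UTF-8 that contains the wrong characters.
    """
    with open(path, "rb") as fh:
        text = fh.read().decode(source_encoding)
    # Binary mode avoids any newline translation when writing back.
    with open(path, "wb") as fh:
        fh.write(text.encode("utf-8"))
```

Calling `find_non_utf8_files(".")` from the repository root lists the files flawfinder would choke on under python3. Running `convert_to_utf8` on each one is roughly what `iconv -f ISO-8859-1 -t UTF-8` does, with the same caveat about guessing the source encoding.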