Investigate unexpected overhead when scanning HTML files with GitLab Advanced SAST
Summary
As mentioned here, adding HTML to `lightzSupportedLangs` currently introduces scan overhead, as shown in the With HTML support table. The overhead occurs even when there are no rules for HTML, as mentioned here, which is unexpected behaviour and a potential bug.
The purpose of this issue is to investigate what causes the scan to slow down on projects with many HTML files.
Steps to reproduce
Follow the steps in this comment
What is the current bug behavior?
lightz-aio has an overhead when scanning projects containing HTML files, even if there are no HTML rules.
What is the expected correct behavior?
There should be no overhead in lightz-aio when scanning projects containing HTML files, if there are no HTML rules.
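The expected behaviour can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from lightz-aio: `EXTENSION_TO_LANG`, `languages_with_rules`, `files_to_scan`, and the rule dictionary shape are all hypothetical names invented for this example. The idea is that a file should only be picked up for scanning if its language is both supported and targeted by at least one loaded rule, so HTML files would be skipped up front when no HTML rules exist.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-language map; not the analyzer's real table.
EXTENSION_TO_LANG = {
    ".html": "html",
    ".htm": "html",
    ".py": "python",
    ".go": "go",
}


def languages_with_rules(rules):
    """Return the set of languages targeted by at least one rule."""
    return {lang for rule in rules for lang in rule["languages"]}


def files_to_scan(paths, supported_langs, rules):
    """Keep only files whose language is supported AND has at least one rule."""
    active = set(supported_langs) & languages_with_rules(rules)
    selected = []
    for path in paths:
        # Unknown extensions map to None, which is never in `active`.
        lang = EXTENSION_TO_LANG.get(PurePosixPath(path).suffix.lower())
        if lang in active:
            selected.append(path)
    return selected


if __name__ == "__main__":
    # Assumed rule shape: an id plus the list of languages it applies to.
    rules = [{"id": "py-example", "languages": ["python"]}]
    paths = ["app/main.py", "web/index.html", "web/about.htm"]
    # HTML is "supported" here, but no rule targets it, so HTML files are skipped.
    print(files_to_scan(paths, ["python", "html"], rules))
```

If the real analyzer already filters this way, the overhead is more likely to come from an earlier phase such as file discovery or parsing, which is what the investigation would need to confirm.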
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)
Possible fixes
/cc @thiagocsf