ModSecurity WAF causing high latency during request resolution
Summary
There have been both internal and external reports that our addition of the ModSecurity Web Application Firewall may be causing erratic behavior. While this feature exists behind an instance-level feature flag, when the flag is enabled it applies to all nginx-ingress cluster deployments within that instance.
The feature flag is currently enabled for the entire instance of gitlab.com, but defaults to "off" for self-hosted instances.
As was proposed by @danielgruesso, we should consider changing the default behavior to opt-in to loading the modsecurity nginx module.
The default behavior should result in passive logging of rule violations only, however we have seen several reports of unstability that may be attributed to the loading of the modsecurity nginx connector. We have so far been unable to narrow down the exact cause.
Previously proposed was to use the documented configuration variable AUTO_DEVOPS_MODSECURITY_SEC_RULE_ENGINE
to disable the processing behavior within modsecurity itself, however this does not appear to have resolved the issue. It appears linked to the top-level loading of the modsecurity capability instead, rather than per server block
Steps to reproduce
(How one can reproduce the issue - this is very important)
Example Project
What is the current bug behavior?
nginx appears to be hanging when attempting to establish TLS connections. This happens erratically but appears to get worse over time.
Quoting from gitlab-org/gitlab-services/design.gitlab.com#442 (comment 248879526)
Having a closer look this afternoon I was able to actually kind of reproduce the issue on demand, or at least, see it fail consistently in a way that was interesting. Basically what would happen is curls against design.gitlab.com would start failing semi consistently, hung on attempting to establish tls sessions. What was more interesting was the failure itself was happening from inside pods against the k8s internal service dns entry that's automatically created. ... What is more interesting is how the connections just hang on
SSL HELLO
, and that things work fine then slowly over time get worse and worse. I wonder if this is either
What is the expected correct behavior?
Enabling modsecurity in DetectionOnly
mode should not interfere with request resolution or contribute to significant request latency.
Relevant logs and/or screenshots
gitlab-org/gitlab-services/design.gitlab.com#442 (comment 248879526)
Output of checks
This bug happens on GitLab.com
Results of GitLab environment info
Expand for output related to GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:env:info
)(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production
)
Results of GitLab application Check
Expand for output related to the GitLab application check
(For installations with omnibus-gitlab package run and paste the output of:
sudo gitlab-rake gitlab:check SANITIZE=true
)(For installations from source run and paste the output of:
sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true
)(we will only investigate if the tests are passing)
Possible fixes
(If you can, link to the line of code that might be responsible for the problem)
Development Log
Status
-
backend MR to add
enable_modsecurity
column to ingress managed app - backend frontend work to enable toggleable setting (replacing feature flag) !21966 (merged)