# Web Application Firewall for Kubernetes Cluster MVC
## Problem to solve
Organizations today deploy web applications that are exposed to many kinds of network-based attacks, a large share of which occur at the HTTP level, beyond the reach of traditional firewalls. We want to identify malicious HTTP traffic from the contents of HTTP messages before that traffic reaches the rest of the web application, so that we can either log or drop it.
<!-- Original text, which is a solution not a problem statement, commented for now.
Users of our Kubernetes cluster management would benefit from available tools that provide Web Application Firewall functionality logging known rules that identify malicious requests to applications deployed to the cluster.
-->
### Intended users
<!-- Who will use this feature? If known, include any of the following: types of users (e.g. Developer), personas, or specific company roles (e.g. Release Manager). It's okay to write "Unknown" and fill this field in later.
Personas can be found at https://about.gitlab.com/handbook/marketing/product-marketing/roles-personas/ -->
1. Sidney (Systems Administrator)
1. Sam (Security Analyst)
1. Devon (DevOps Engineer)
### Details
At a high level, this issue is about installing the [ModSecurity](https://www.modsecurity.org/) WAF into the Nginx Ingress controller of a Kubernetes cluster. Configure the WAF in report-only mode, both to demonstrate that it is working correctly and so that we can start learning from what it reports.

Do _not_ block traffic automatically: our [Security Paradigm](https://about.gitlab.com/direction/secure/#security-paradigm) is not to block unless explicitly asked to. Enabling blocking is out of scope for this MVC and will be handled in future issues, because we first need to work through its implications for UX, for informing users about blocked traffic, and for giving users control over false positives.
<!-- We want to introduce a Web Application Firewall (WAF) to protect applications that are deployed to Kubernetes using our GitLab integration. -->
<!-- We should enable ModSecurity when we install the ingress, and allow users to leverage it. -->
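As a rough illustration of the report-only setup described above, the sketch below builds the `data` section of the ingress-nginx controller ConfigMap. It assumes the controller's documented `enable-modsecurity`, `enable-owasp-modsecurity-crs`, and `modsecurity-snippet` keys are the mechanism used; the function name is hypothetical. The snippet sets `SecRuleEngine DetectionOnly` explicitly so that rules are evaluated and logged but never block a request, rather than relying on whatever default the bundled configuration ships with.

```python
import json
from typing import Dict


# Hypothetical helper: builds ConfigMap data for the ingress-nginx controller.
# Key names follow the ingress-nginx ConfigMap documentation; all values are strings.
def modsecurity_configmap_data() -> Dict[str, str]:
    snippet = "\n".join(
        [
            # Evaluate rules and log matches, but never interrupt a request.
            "SecRuleEngine DetectionOnly",
            # Only write audit entries for transactions ModSecurity deems relevant.
            "SecAuditEngine RelevantOnly",
        ]
    )
    return {
        "enable-modsecurity": "true",
        # Load the OWASP Core Rule Set bundled with the controller.
        "enable-owasp-modsecurity-crs": "true",
        "modsecurity-snippet": snippet,
    }


if __name__ == "__main__":
    print(json.dumps(modsecurity_configmap_data(), indent=2))
```

The printed JSON can be transcribed into the controller's ConfigMap; whether GitLab applies it through Helm values or another mechanism is left open here.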
### Proposal
1. Allow users to install ModSecurity into the Ingress controller of a Kubernetes cluster in detection-only mode.
   * Allow this for both existing clusters and clusters created with our GKE integration.
   * [x] I propose that this is another "Application" under the Kubernetes cluster configuration screen. Would love input on this point if there is a better spot, though.
1. Pre-configure the WAF with the [default OWASP rules](https://coreruleset.org/).
   * Always configure the WAF with this rule set.
   * Do not expose a configuration option that lets users provide their own rules. (That will be done in a future issue.)
1. Allow users to uninstall the WAF once it is installed.
   * Provide this action in the same location where the user chose to install the WAF.
   * Support this for both existing clusters and clusters created with our GKE integration.
1. Expose the logs produced by the WAF to the user.
   * [x] Get input from the team on the best place and way to do this: viewing the logs of a pod, a dedicated screen or tab (perhaps on the cluster management or environment screen), a downloadable log file, or something else?
1. Implement adoption and usage metrics.
   - Report when the WAF has been installed in a project.
   - Report when the WAF has been removed from a project.
   - Leverage the existing product reporting mechanisms.
   - [ ] Is anything more specific needed to differentiate between usage ping and GitLab.com?
<!--
Logs will be created to track malicious requests for deployed applications. We should define if we want to enable rules for all the sites during the installation, or to allow applications to enable tracking (Auto DevOps will enable by default).
-->
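Proposal items 1 and 3 imply that install and uninstall should be inverse operations. A minimal sketch of that idea, assuming the WAF is turned on purely through the ingress-nginx ConfigMap keys named below (the function names and the `saved` bookkeeping are hypothetical), records any pre-existing values on install so that uninstall restores the exact previous configuration instead of simply deleting keys.

```python
from typing import Dict, Optional, Tuple

# ConfigMap settings assumed to make up the WAF "application" (detection-only mode).
WAF_SETTINGS: Dict[str, str] = {
    "enable-modsecurity": "true",
    "enable-owasp-modsecurity-crs": "true",
    "modsecurity-snippet": "SecRuleEngine DetectionOnly\n",
}


def install_waf(
    data: Dict[str, str],
) -> Tuple[Dict[str, str], Dict[str, Optional[str]]]:
    """Return the new ConfigMap data plus the values it replaced (None = key was absent)."""
    saved = {key: data.get(key) for key in WAF_SETTINGS}
    return {**data, **WAF_SETTINGS}, saved


def uninstall_waf(
    data: Dict[str, str], saved: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Undo install_waf: restore replaced values and drop keys that did not exist before."""
    restored = dict(data)
    for key, previous in saved.items():
        if previous is None:
            restored.pop(key, None)
        else:
            restored[key] = previous
    return restored


if __name__ == "__main__":
    before = {"proxy-body-size": "10m", "enable-modsecurity": "false"}
    after, saved = install_waf(before)
    assert after["enable-modsecurity"] == "true"
    assert uninstall_waf(after, saved) == before  # round trip leaves no trace
```

Persisting `saved` alongside the cluster record is one way to support the testing requirement that the application behave identically after uninstall; the actual persistence mechanism is not specified here.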
### Documentation
<!-- See the Feature Change Documentation Workflow https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html
Add all known Documentation Requirements here, per https://docs.gitlab.com/ee/development/documentation/feature-change-workflow.html#documentation-requirements -->
Documentation should be created and/or updated to cover:
- The problem a WAF solves
- What a WAF is at a high-level
- What the GitLab WAF is pre-configured to detect and report on
- How the GitLab WAF fits inside the customer's application architecture
- How to enable the GitLab WAF inside of an application
- How to configure the GitLab WAF
- How to remove the GitLab WAF
- How to consume the results produced by the GitLab WAF
### Testing
<!-- What risks does this change pose? How might it affect the quality of the product? What additional test coverage or changes to tests will be needed? Will it require cross-browser testing? See the test engineering process for further guidelines: https://about.gitlab.com/handbook/engineering/quality/guidelines/test-engineering/ -->
Testing for this MVC should focus on several areas that roughly map to the different parts of the user experience:
- Installation and configuration of the WAF
  - Users should be able to successfully install and configure the WAF if they use the reference application architecture and deployment model.
  - GitLab should fail gracefully if the user has an incompatible application architecture or deployment model.
- Use of the application with the WAF installed
  - The application should continue to function properly for legitimate traffic.
  - The WAF should not block malicious traffic unless it has been specifically configured to do so.
  - The interfaces for consuming WAF results should display events correctly as they occur.
- Removal of the WAF
- Use of the application *after the WAF has been uninstalled*
  - We need to confirm that uninstalling the WAF has no lasting adverse effects on the application: it should behave exactly as it did before the WAF was ever installed.
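Live traffic tests are the real check that the WAF does not block, but a cheap complementary unit test can assert that the generated configuration never switches the rule engine to blocking. The sketch below is hypothetical: it assumes the effective mode is the last `SecRuleEngine` directive in the rendered configuration text, and it only recognizes the three documented values `On`, `Off`, and `DetectionOnly`.

```python
import re
from typing import Optional

# Matches a whole-line SecRuleEngine directive and captures its value.
# Commented-out lines (starting with '#') do not match.
_RULE_ENGINE = re.compile(
    r"^[ \t]*SecRuleEngine[ \t]+(On|Off|DetectionOnly)[ \t]*$", re.MULTILINE
)


def effective_rule_engine(config_text: str) -> Optional[str]:
    """Return the value of the last SecRuleEngine directive, or None if there is none."""
    matches = _RULE_ENGINE.findall(config_text)
    return matches[-1] if matches else None


def is_report_only(config_text: str) -> bool:
    """True only if the configuration explicitly ends up in DetectionOnly mode."""
    return effective_rule_engine(config_text) == "DetectionOnly"
```

Treating a missing directive as "not report-only" is deliberately conservative: the test fails rather than silently relying on whatever default the controller's bundled configuration provides.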
- [ ] @twoodham Schedule a deep dive on requirements
### What does success look like, and how can we measure that?
Percentage of newly created clusters with Ingress installed that still have the WAF enabled within 30 days of our release. Target: at least 75%
- Because the WAF is added to a cluster when Ingress is installed, we expect a high percentage of clusters to keep using the WAF if it is successful. If many users disable or remove the WAF, that is a signal we need to investigate.

Percentage of all GKE-integration clusters that have the WAF within 30 days of our release. Target: at least 10%

Percentage of the above users who continue using the WAF in their deployed application for at least 30 days. Target: at least 75%
- This metric shows whether customers get enough value from the GitLab WAF to keep using it beyond an initial exploration period.
- Initially measure 60 days after release, then re-measure periodically.

Number of Ingress controllers with ModSecurity enabled.
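To make the first metric concrete, here is one possible way to compute it. It is a sketch under stated assumptions: the `ClusterSnapshot` record and its fields are hypothetical, "newly created" is approximated by the date Ingress was installed, and the cohort is taken to be clusters whose Ingress was installed in the 30 days after release (inclusive).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class ClusterSnapshot:
    """Hypothetical per-cluster record taken at measurement time."""

    ingress_installed_on: Optional[date]
    waf_enabled: bool


def waf_retention_pct(
    clusters: Iterable[ClusterSnapshot], release: date, window_days: int = 30
) -> float:
    """Percentage of clusters with Ingress installed in the window that still have the WAF."""
    window_end = release + timedelta(days=window_days)
    cohort = [
        c
        for c in clusters
        if c.ingress_installed_on is not None
        and release <= c.ingress_installed_on <= window_end
    ]
    if not cohort:
        raise ValueError("no clusters had Ingress installed in the measurement window")
    retained = sum(1 for c in cohort if c.waf_enabled)
    return 100.0 * retained / len(cohort)


if __name__ == "__main__":
    release = date(2020, 1, 1)  # placeholder, not the actual release date
    sample = [  # made-up records for illustration only
        ClusterSnapshot(date(2020, 1, 5), True),
        ClusterSnapshot(date(2020, 1, 20), True),
        ClusterSnapshot(date(2020, 1, 25), False),
        ClusterSnapshot(date(2020, 3, 1), False),  # outside the window, ignored
    ]
    pct = waf_retention_pct(sample, release)
    print(f"{pct:.1f}% (target: at least 75%)")
```

Raising on an empty cohort, instead of returning 0%, avoids reporting a misleading failure when there is simply no data yet.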
### Links / references
* [ModSecurity WAF](https://modsecurity.org/)
* Product discovery at https://gitlab.com/gitlab-org/gitlab-ee/issues/9520.
* The Nginx Ingress controller supports ModSecurity and allows enabling it through annotations:
  - https://kubernetes.github.io/ingress-nginx/user-guide/third-party-addons/modsecurity/
  - https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#modsecurity
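The per-Ingress annotation route referenced above can be sketched as a pure function over an Ingress manifest represented as a dict. The annotation keys come from the linked ingress-nginx documentation; the helper name and the example manifest are hypothetical. Depending on the controller version and its `allow-snippet-annotations` setting, snippet annotations such as `modsecurity-snippet` may be rejected, so treat this as a starting point to verify rather than a drop-in configuration.

```python
import copy
from typing import Any, Dict

# Per-Ingress ModSecurity annotations, in report-only (DetectionOnly) mode.
MODSECURITY_ANNOTATIONS: Dict[str, str] = {
    "nginx.ingress.kubernetes.io/enable-modsecurity": "true",
    "nginx.ingress.kubernetes.io/enable-owasp-core-rules": "true",
    "nginx.ingress.kubernetes.io/modsecurity-snippet": "SecRuleEngine DetectionOnly\n",
}


def with_modsecurity(ingress: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the Ingress manifest with the ModSecurity annotations merged in."""
    updated = copy.deepcopy(ingress)  # never mutate the caller's manifest
    metadata = updated.setdefault("metadata", {})
    annotations = metadata.get("annotations") or {}  # tolerate a null annotations field
    annotations.update(MODSECURITY_ANNOTATIONS)
    metadata["annotations"] = annotations
    return updated


if __name__ == "__main__":
    ingress = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": "my-app", "annotations": {"kubernetes.io/ingress.class": "nginx"}},
    }
    print(with_modsecurity(ingress)["metadata"]["annotations"])
```

Existing annotations are preserved and only the ModSecurity keys are added or overwritten, which keeps the change easy to reverse.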
### Development Log
### Status
- [x] gitlab-ce~2492649 work to add `modsecurity` to ingress integration ~~https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/15774~~ https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/32905
- [ ] ~~Consider upgrading `nginx-ingress` to get latest rules https://gitlab.com/gitlab-org/gitlab-ce/issues/61355~~ *DEFERRED*
- [ ] ~~Expose `gitlab-managed-apps` within Pod Logs~~ *DEFERRED*
### Decisions
- Ship behind a feature flag so that the WAF can be disabled if a user considers its performance impact significant
- Initially ship as a configuration of the existing `Ingress` managed application, not as a separate application
- Ideally, logs will be fetchable through a UI such as Pod Logs, but for now they must be read by manually tailing the log in the Ingress controller pod
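Until such a UI exists, the tailed audit log has to be read by hand, and a small helper can turn it into a list of rule matches. This sketch assumes ModSecurity v3 is writing its audit log as JSON (`SecAuditLogFormat JSON`), one entry per line, with `transaction.client_ip`, `transaction.request.uri`, and a `transaction.messages` list whose items carry `message` and `details.ruleId`. That schema is an assumption to verify against real controller output, and the helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class WafEvent(NamedTuple):
    client_ip: str
    uri: str
    rule_id: str
    message: str


def parse_audit_lines(lines: Iterable[str]) -> Iterator[WafEvent]:
    """Yield one WafEvent per rule match found in JSON audit-log lines."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON (e.g. native-format output)
        if not isinstance(entry, dict):
            continue
        tx = entry.get("transaction") or {}
        request = tx.get("request") or {}
        for match in tx.get("messages") or []:
            if not isinstance(match, dict):
                continue
            details = match.get("details") or {}
            yield WafEvent(
                client_ip=str(tx.get("client_ip", "")),
                uri=str(request.get("uri", "")),
                rule_id=str(details.get("ruleId", "")),
                message=str(match.get("message", "")),
            )


if __name__ == "__main__":
    # Made-up sample entry for illustration, not real controller output.
    sample = json.dumps({
        "transaction": {
            "client_ip": "203.0.113.7",
            "request": {"uri": "/?q=<script>"},
            "messages": [{"message": "example rule match", "details": {"ruleId": "941100"}}],
        }
    })
    for event in parse_audit_lines([sample, "not json"]):
        print(event)
```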