Engineering Proposal for Host-based IDS/IPS Protection

Overview

For more details and context, refer to the proof-of-concept research issue that preceded this proposal.

Product Requirements

Engineering will propose an architecture that meets the following customer requirements:

Must Have

File integrity monitoring - The ability to monitor and prevent changes to files in the container after the container has started.
Application allow listing - The ability to monitor and prevent processes from starting in the container if they do not match a predefined list of process names or hashes. This could also include the ability to monitor and prevent any new processes from starting after the container has fully started.
Active response / blocking - This is the "prevent" capability described in the requirements above. Fundamentally, it is the ability to stop unwanted activity (processes, network communications, file changes, etc.) from occurring.
Acceptable performance overhead - This needs to be considered from two perspectives: 1) performance overhead should not interfere with the smooth operation of the customer's application, and 2) performance overhead should not cause the customer's cloud bill to increase dramatically.

Nice to Have

Malware scanning - The ability to scan all container file systems for known malware while the containers are running (that is, after they have started).
Vulnerability scanning - The ability to identify known vulnerabilities in the packages of a container after it has been started and is running in production. Note: An acceptable solution would be to re-scan images that are running in production using the Secure stage capabilities (rather than scanning the running container itself).
Configuration vulnerability detection - The ability to identify known vulnerabilities in the way that running applications are configured after a container has been started and while the application is running in production. Note: An acceptable solution would be to re-scan images that are running in production using the Secure stage capabilities (rather than scanning the running container itself).
No need to give GitLab credentials to production Kubernetes or container environments - Customers have the option of a manual install to avoid concerns about GitLab storing credentials for a production environment.

Engineering Architecture Proposal

Priority	T-Shirt Size	Technology	Requirements Addressed
1	M	Falco	Prerequisite to all other technologies below
2	L	AppArmor + Pod Security Policy	Inline Blocking/Prevention, Application Allow Listing, File Integrity Monitoring
3	M	Falcosidekick	Active Response Options (create GitLab issue, send Slack message, run shell script, etc.)
4	S	GitLab Scheduled Pipeline running Secure Scans	Vulnerability Scanning, Configuration Vulnerability Scanning
5	M	ClamAV	Malware Scanning

Technologies Considered

TECHNOLOGY	REQUIREMENTS ADDRESSED	PROS	CONS	ITERATION PLAN	POSSIBLE ADDITIONAL FEATURES BUILT WITH THAT TECHNOLOGY	T-SHIRT SIZE	DEPENDENCIES
Falco	File Integrity Monitoring	Cloud-native solution (part of the CNCF); integrates easily with a k8s cluster (as a DaemonSet); easily extensible with macros/rules; gives insight into unexpected behavior in the cluster; popular in the industry, with a large and active open-source community (we could contribute back to the Falco community by sharing our rules/macros).	Not every cloud provider may allow loading the Falco kernel module.	Allow users to install Falco with Helm in a single click, with a default set of rules (or a set extended by GitLab) that report to Prometheus or Fluentd. Add Falco to GitLab CI/CD apps and allow users to modify rules/macros there.	Container Behavior Analytics - Falco will help us achieve not only File Integrity Monitoring, but could also help us detect malicious behavior happening in pods. Events are easily exportable to Prometheus, and we could potentially build some ML around these events as well.	M	None
Pod Security Policy + AppArmor	Application allow listing	Pod Security Policy allows setting up capabilities and managing AppArmor profiles cluster-wide. Instead of using the cluster-wide profiles, profiles can be set at the container level through annotations. Files and their permissions can be set as allowed, logged, blocked, or blocked & logged.	We would have to maintain a Helm chart with the DaemonSet that loads the policies.	Create and host a Helm chart with the profile loader. Allow users to deploy the Helm chart with a custom profile as a cluster-managed app via CI/CD. Allow users to set a cluster-wide Pod Security Policy per deployment. Allow users to define which profile is used per container.	Pod Security Policy also allows defining permissions related to capabilities and privileges (user, volumes, network), in addition to seccomp and sysctl profiles.	L	None
Simple Go app	Active response / blocking	Extensibility, simplicity, configurability (scripts and alerts are configured the same way as for Falco, using ConfigMaps); the application could become an OSS extension to Falco that allows reacting to or blocking events.	Written in-house: simple, but we would still have to maintain it (plus its Helm charts) and fix any issues.	Write a simple Go application that runs scripts when a certain event is triggered, plus a Helm chart that installs it into the cluster. Extend the application to react only when a certain threshold is met (e.g. 10 events in the last 60s).	Capabilities are effectively open-ended, since customers can write their own scripts to be executed when a certain event is triggered (e.g. send a Slack update, make API requests, etc.).	L	Falco
NATS + Kubeless	Active response / blocking	Extensibility, configurability; a more comprehensive solution that would allow users to do more with their clusters.	We would have to install and maintain two additional applications and test that their Helm charts install properly; we would need to let users add/remove Kubeless functions; adds unnecessary complexity.	Allow users to install NATS/Kubeless in the cluster and add the ability to add/delete Kubeless functions. Add a UI to manage Kubeless functions.	Potential new feature for GitLab: deploying single functions (serverless). Users would also be able to use NATS for their own applications.	XL	Falco
GitLab (scheduled pipeline)	Malware scanning	Uses the same tools we currently use for MRs; this is already possible today (though it is unclear where the results are saved).	Will not detect malware introduced at container runtime (although we should be able to limit malicious behavior with AppArmor/Pod Security Policy).	Make sure that we can detect malware and that malware detected during the scheduled pipeline is visible on the project page.		S	None
ClamAV	Malware scanning	Has a stable Helm chart that can be used to scan at the node level, or within a container if the proper mapping is set.		Allow users to install it as part of the CI/CD pipeline. Update deployment templates to set the mapping so that ClamAV can run within the container.		M	Falco
Dagda	Malware scanning	Uses ClamAV for malware detection, in addition to other features.	It does not have a Helm chart, so one would have to be created and maintained by our team.	Create and publish the Helm chart. Allow users to install Dagda as part of CI/CD. Allow users' containers to be scanned by Dagda.	It has other features, such as static analysis and anomaly detection (through its integration with Falco).	L	Falco
GitLab (scheduled pipeline)	Vulnerability scanning	Uses the same tools we currently use for MRs; this is already possible today (though it is unclear where the results are saved).	Will not detect vulnerabilities introduced at container runtime (although we should be able to limit malicious behavior with AppArmor/Pod Security Policy).	Make sure that all vulnerabilities discovered during the scheduled pipeline are visible on the project page.		S	None
Dagda	Vulnerability scanning	Can scan running containers for vulnerabilities.	It does not have a Helm chart, so one would have to be created and maintained by our team.	Create and publish the Helm chart. Allow users to install Dagda as part of CI/CD. Allow users' containers to be scanned by Dagda.	It has other features, such as static analysis, anomaly detection, and malware detection (through its integrations with Falco and ClamAV).	L	Falco / ClamAV
OpenSCAP - GitLab Secure Stage	Configuration vulnerability detection	Can detect whether packages installed in the container are configured in compliance with security standards.	Can only be run against the image (not the running container), so configuration vulnerabilities introduced at runtime will not be detected (AppArmor should help here).	Add OpenSCAP to the CI/CD pipeline (at the point where the container image is checked).		L/XL	None
No available solution works for running containers (we could possibly extend OpenSCAP to support this)	Configuration vulnerability detection	N/A	N/A	N/A	N/A	?	None
kube-bench / kubeaudit (optional)	Configuration vulnerability detection (for Kubernetes cluster)	Detects problems with the K8s installation, so users can recognize issues and secure the cluster before it is compromised.		Add kube-bench / kubeaudit to the CI/CD process; parse and collect the vulnerabilities (a new vulnerability report type may have to be added).		L	None
Kubesec	Configuration vulnerability management (Passwords/Secrets)	Allows users to encrypt their secrets within K8s.				L	None

Edited May 15, 2020 by Sam White