MVC: Install a default NetworkPolicy object into GitLab-managed Kubernetes clusters (Container network security)

Problem to solve & Use Cases

Kubernetes clusters orchestrate many containers at once across different pods and nodes. These containers and pods can be configured to accept traffic from other containers and pods, but by default they accept traffic from any other pod or container inside the cluster, regardless of which specific pod or container that traffic comes from.

As a result, a pod in the same cluster can connect to another application's pod, even if it has nothing to do with that application, and potentially attack it. This could happen if the cluster is shared between multiple users or if an attacker has managed to escalate their permissions in the cluster.

Preventing this requires that additional logic and security controls be added at the application layer by developers or at the cluster level by operators to mitigate potential attacks. This is both time-consuming and difficult to do correctly.

Intended users

- Devon
- Sidney
  1. Operators who are responsible for creating and managing Kubernetes clusters
- Sam
  1. Security team members who are responsible for ensuring that controls are put in place to prevent abuse

Proposal

As the first MVC for container network security, we should start by letting users enable our chosen container network security solution. In keeping with our "listen and record first, then act" principle, we should not enforce any default policy that could potentially impact any deployed cluster.

We cannot know all existing use cases for our users' deployed applications, so the initial default must be no enforcement. We can offer a suggested NetworkPolicy via documentation, commented out in a configuration, or both. However, for this MVC it is ultimately up to the user to define the desired policy.

Previous Proposal:

As the first MVC for container network security, we should set a default set of policies designed to only let an application's pods accept network traffic from other pods in the application (and external traffic through an Ingress controller), but not any other pods in the cluster.

Do this by adding a default NetworkPolicy item inside the Kubernetes cluster designed to provide basic isolation for the deployed application and its containers and pods.
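
For illustration, the isolation described above could be expressed as a NetworkPolicy manifest along the lines of the one this sketch builds. It is a minimal sketch, not an agreed default: the `app` pod label, the policy name, the namespace name, and the label used to identify the Ingress controller's namespace are all assumptions made for the example. Only the `networking.k8s.io/v1` NetworkPolicy structure itself is standard Kubernetes.

```python
import json


def build_isolation_policy(app_label: str, namespace: str,
                           ingress_namespace_labels: dict) -> dict:
    """Build a NetworkPolicy manifest that isolates one application's pods.

    All names and labels passed in are illustrative assumptions,
    not GitLab or Auto DevOps defaults.
    """
    app_selector = {"matchLabels": {"app": app_label}}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{app_label}-isolation", "namespace": namespace},
        "spec": {
            # The policy applies to the application's own pods.
            "podSelector": app_selector,
            # Only incoming traffic is restricted; egress is left untouched.
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Entries in "from" are OR-ed together.
                    "from": [
                        # Pods of the same application in the same namespace.
                        {"podSelector": app_selector},
                        # Any pod in the (assumed) Ingress controller namespace.
                        {"namespaceSelector": {"matchLabels": ingress_namespace_labels}},
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    policy = build_isolation_policy(
        app_label="my-app",                                    # hypothetical
        namespace="my-app-production",                         # hypothetical
        ingress_namespace_labels={"name": "ingress-controller"},  # hypothetical
    )
    # Kubernetes accepts manifests in JSON as well as YAML.
    print(json.dumps(policy, indent=2))
```

Because the policy only takes effect when a network provider that enforces NetworkPolicy is installed, a manifest like this can be shipped disabled or commented out without affecting existing clusters.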

Minimal

Integrate a NetworkPolicy provider with the Kubernetes cluster GitLab provisions for users.
- This is needed because Kubernetes accepts NetworkPolicy objects but does not enforce them out of the box; enforcement requires a network provider that supports them
- Allow users to install a cluster application for the network provider that GitLab chooses
- Confirm the specific technology to use. Two options I found:
  - Cilium is now the chosen option based on 2019-12-02 discussion
  - ~~Project Calico was chosen. Discussion is in the comments below.~~
    - Logging appears to be a paid feature
- Focus on GCP as the first cloud provider. AWS should follow within the MVC if possible.
Provide users a way to configure NetworkPolicy objects but do not apply any policy by default. It is acceptable to include a sane, permissive suggested policy (such as the one outlined above and below) in the configuration, as long as it is disabled or commented out by default. ~~Provide users a way to disable the installation of default NetworkPolicy objects~~
- Consider how we approached this with WAF and if a similar approach makes sense to do here
~~For AutoDevOps and clusters that have a valid network provider installed, apply a default set of NetworkPolicy objects when deploying the application~~
- ~~[ ] Get input from team on contents for the default policy.~~
- ~~Initial proposal: All pods from the deployed app can communicate with each other and external traffic through an Ingress controller, but no other pods in the cluster can talk to those pods.~~
  - It must be possible to use other GitLab Managed Apps, such as Prometheus or Jupyter, when the default NetworkPolicy object has been applied. We expect many users who want NetworkPolicy support to also use the other GitLab Managed Apps, and we shouldn't force an either/or decision between them.
  - ~~(Solution idea) What about using a label for pods to determine if they should be allowed access? What about using namespaces?~~ This will be post-MVC
Log traffic that would have been blocked (audit mode) and display it to users
- At a minimum, this should be visible as text
- Note: this requires a feasibility investigation, as Cilium provides drop-only logging out of the box.
Documentation and information for users
1. Explain the problem we're solving
2. Explain how to enable & disable the functionality.
3. Explain how to consume the results.
~~Usage analytics added~~ - moved to #199071 (closed)
1. ~~Report when the support has been installed~~
2. ~~Report when the support has been uninstalled~~
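
The audit-mode item above (log traffic that would have been blocked, without actually dropping it, and show it as text) can be illustrated with a small simulation. This is only a sketch of the idea under stated assumptions: the `Flow` record, its fields, and the log line format are hypothetical, and nothing here reflects how Cilium or any other provider represents or logs flows.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """One observed connection attempt (hypothetical record shape)."""
    source_pod: str
    source_labels: dict
    destination_pod: str
    port: int


def would_be_blocked(flow: Flow, allowed_selectors: list) -> bool:
    """Return True when no allowed label selector matches the source pod.

    A selector matches when every key/value pair in it is present in the
    source pod's labels, so an empty selector matches every pod.
    """
    return not any(
        all(flow.source_labels.get(key) == value for key, value in selector.items())
        for selector in allowed_selectors
    )


def audit(flows: list, allowed_selectors: list) -> list:
    """Return one text line per flow that the policy would have dropped.

    Nothing is blocked here; the flows are only reported.
    """
    return [
        f"AUDIT would-drop {flow.source_pod} -> {flow.destination_pod}:{flow.port}"
        for flow in flows
        if would_be_blocked(flow, allowed_selectors)
    ]


if __name__ == "__main__":
    allowed = [{"app": "my-app"}]  # hypothetical suggested policy
    observed = [
        Flow("my-app-web-1", {"app": "my-app"}, "my-app-db-0", 5432),
        Flow("other-team-job-7", {"app": "other"}, "my-app-db-0", 5432),
    ]
    for line in audit(observed, allowed):
        print(line)
```

Reporting in this way lets users see what a policy would affect before they choose to enforce it.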

Next steps / post-MVC

Externally expose/publish the NetworkPolicy objects that we're applying so users can copy & paste the objects if their project does not use AutoDevOps.
Ability to specify custom NetworkPolicy objects
Ability to import pre-defined NetworkPolicy objects from other projects or from a URL
Ability to create a Finding or Vulnerability when traffic has been blocked/dropped.
Richer and more robust ways to display data & integrate with dashboards
Allow NetworkPolicy objects to work with serverless deployments. #32701 (closed)

What does success look like, and how can we measure that?

% of users who create a GitLab-managed Kubernetes cluster within 30 days of release and install the NetworkPolicy support. Target => 50%
- Measure adoption of the capability by our target users.
% of users who keep using the NetworkPolicy support without uninstalling it for at least 90 days. Target => 80%
- Measure whether users install and then keep using the capability. If they uninstall it, that indicates it is not providing enough value, is introducing problems, or something else. In all cases, it is a signal to investigate why it was uninstalled.
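
As a rough sketch of how the two percentages above could be computed, assuming one record per cluster with a creation date and optional install and uninstall dates. The `ClusterRecord` shape, its field names, and the example dates are hypothetical; the issue does not specify how this data is collected. Installs younger than 90 days are excluded from the retention figure here because they cannot be judged yet, which is an assumption of this sketch rather than something the issue states.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ClusterRecord:
    """Hypothetical per-cluster usage record."""
    cluster_created: date
    installed: Optional[date] = None
    uninstalled: Optional[date] = None


def adoption_rate(records, release: date, window_days: int = 30) -> float:
    """Percent of clusters created within the window that installed the support."""
    cutoff = release + timedelta(days=window_days)
    eligible = [r for r in records if release <= r.cluster_created <= cutoff]
    if not eligible:
        return 0.0
    adopters = sum(1 for r in eligible if r.installed is not None)
    return 100.0 * adopters / len(eligible)


def retention_rate(records, as_of: date, min_days: int = 90) -> float:
    """Percent of installs, at least min_days old, kept for min_days or more."""
    min_age = timedelta(days=min_days)
    mature = [
        r for r in records
        if r.installed is not None and as_of - r.installed >= min_age
    ]
    if not mature:
        return 0.0
    retained = sum(
        1 for r in mature
        if r.uninstalled is None or r.uninstalled - r.installed >= min_age
    )
    return 100.0 * retained / len(mature)


if __name__ == "__main__":
    release = date(2020, 1, 1)  # hypothetical release date
    records = [
        ClusterRecord(date(2020, 1, 5), installed=date(2020, 1, 6)),
        ClusterRecord(date(2020, 1, 10), installed=date(2020, 1, 12),
                      uninstalled=date(2020, 2, 1)),
        ClusterRecord(date(2020, 1, 20)),
        ClusterRecord(date(2020, 1, 25)),
    ]
    print(f"adoption:  {adoption_rate(records, release):.1f}%")
    print(f"retention: {retention_rate(records, date(2020, 6, 30)):.1f}%")
```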

What is the type of buyer?

GitLab Ultimate

Questions

What is a good NetworkPolicy provider to start with? Some options include:
1. Project Calico
  - Logging appears to be a paid capability
2. ~~Cilium~~
What is a good set of default rules to use for the NetworkPolicy?
1. Is there a best practice set of defaults we can use instead of writing our own? If there is, we can install them by default then make the NetworkPolicy support opt-out instead of opt-in.
What is the data retention policy for logs in terms of time and size?

Decisions

Cilium is now the chosen option based on 2019-12-02 discussion

Links / References

Edited Jan 27, 2020 by Matt Wilson