MVC: Install a default NetworkPolicy object into GitLab-managed Kubernetes clusters (Container network security)
Problem to solve & Use Cases
Kubernetes clusters orchestrate many containers at once across different pods and nodes. These containers and pods can be configured to accept traffic from other containers and pods, but by default they accept traffic from any other pod or container inside the cluster, regardless of which specific pod or container that traffic comes from.
As a result, a pod inside the same cluster could connect to another application's pod, even if it has nothing to do with that application, and potentially attack it. This could happen if the cluster is shared between multiple users or if an attacker had somehow been able to escalate their permissions in the cluster.
Preventing this requires that additional logic and security controls be added at the application layer by developers or at the cluster level by operators to mitigate potential attacks. This is both time-consuming and difficult to do correctly.
Intended users
- Devon
- Sidney
  - Operators who are responsible for creating and managing Kubernetes clusters
- Sam
  - Security team members who are responsible for ensuring that controls are put in place to prevent abuse
Proposal
As the first MVC for container network security, we should start by letting users enable our chosen container network security solution. Keeping with our listen and record first, then act principle, we should not enforce any default policy that could potentially impact any deployed cluster.
We cannot know all existing use cases for our users' deployed applications, so we must initially make no enforcement the default. We can offer a suggested NetworkPolicy either via documentation, commented out in a configuration, or both. However, it will ultimately be up to the user to define the desired policy for this MVC.
Previous Proposal:
As the first MVC for container network security, we should set a default set of policies designed to only let an application's pods accept network traffic from other pods in the application (and external traffic through an Ingress controller), but not any other pods in the cluster.
Do this by adding a default NetworkPolicy object inside the Kubernetes cluster, designed to provide basic isolation for the deployed application and its containers and pods.
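For illustration, the isolation described in this previous proposal could be expressed roughly as the NetworkPolicy below. This is a sketch under stated assumptions, not a policy GitLab ships: it assumes each application is deployed into its own namespace, and the policy name, the namespace name, and the namespace label used to identify the Ingress controller are hypothetical placeholders.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-isolation              # hypothetical name
  namespace: my-app-production     # hypothetical: the namespace the app is deployed into
spec:
  # An empty podSelector applies the policy to every pod in this namespace.
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Allow traffic from any pod in the same namespace (the application's own pods).
        - podSelector: {}
        # Allow traffic from pods in namespaces carrying this label,
        # assumed here to be where the Ingress controller runs.
        - namespaceSelector:
            matchLabels:
              example.com/role: ingress   # hypothetical label
```

Because the policy selects all pods in the namespace and lists only these two sources, ingress from any other pod in the cluster is dropped once a NetworkPolicy provider is enforcing it. Without such a provider the object is accepted by the API server but has no effect.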
Minimal
- Integrate a NetworkPolicy provider with the Kubernetes cluster GitLab provisions for users.
  - This is needed since Kubernetes can parse NetworkPolicy object files, but does not provide a back-end for supporting them out of the box
  - Allow users to install a cluster application for the network provider that GitLab chooses
  - Confirm the specific technology to use. Two options I found:
    - Cilium is now the chosen option based on 2019-12-02 discussion
    - Project Calico was chosen previously. Discussion is in the comments below.
      - Logging appears to be a paid feature
  - Should focus on GCP as the cloud provider first. AWS should be second if possible during the MVC.
- Provide users a way to configure NetworkPolicy objects but do not apply any policy by default. It is acceptable to include a sane, permissive suggested policy (like outlined above and below) in the configuration if it is disabled/commented out by default.
  - Provide users a way to disable the installation of default NetworkPolicy objects
  - Consider how we approached this with WAF and if a similar approach makes sense to do here
- For AutoDevOps and clusters that have a valid network provider installed, apply a default set of NetworkPolicy objects when deploying the application
  - [ ] Get input from team on contents for the default policy.
    - Initial proposal: All pods from the deployed app can communicate with each other and external traffic through an Ingress controller, but no other pods in the cluster can talk to those pods.
  - It must be possible to use other GitLab Managed Apps, such as Prometheus or Jupyter, when the default NetworkPolicy object has been applied. We will see many users who want to use NetworkPolicy support also using the other GitLab Managed Apps and we shouldn't make it an either/or decision of which to use.
    - (Solution idea) What about using a label for pods to determine if they should be allowed access? What about using namespaces?
  - This will be post-MVC
- Log traffic that would have been blocked (audit mode) and display it to users
  - At a minimum, this should be visible as text
  - Note: this requires feasibility investigation as Cilium provides drop-only logging OOTB.
- Documentation and information for users
  - Explain the problem we're solving
  - Explain how to enable & disable the functionality.
  - Explain how to consume the results.
- Usage analytics added - moved to #199071 (closed)
  - Report when the support has been installed
  - Report when the support has been uninstalled
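The label and namespace idea raised in the list above, for keeping other GitLab Managed Apps such as Prometheus usable, might look something like the following sketch. Nothing here is a decided design: the policy name, the namespace name, and both label keys are hypothetical, and it assumes the managed apps run in a namespace that can be labeled.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-labeled-managed-apps   # hypothetical name
  namespace: my-app-production       # hypothetical: the namespace the app is deployed into
spec:
  # Applies to every pod in the application's namespace.
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        # One entry with both selectors: a source pod must carry the pod label
        # AND run in a namespace carrying the namespace label.
        - namespaceSelector:
            matchLabels:
              example.com/managed-apps: "true"        # hypothetical label
          podSelector:
            matchLabels:
              example.com/network-access: "allowed"   # hypothetical label
```

NetworkPolicy objects are additive, so a policy like this could sit alongside a stricter isolation policy and only widen access for the labeled pods.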
Next steps / post-MVC
- Externally expose/publish the NetworkPolicy objects that we're applying so users can copy & paste the objects if their project does not use AutoDevOps.
- Ability to specify custom NetworkPolicy objects
- Ability to import pre-defined NetworkPolicy objects from other projects or from a URL
- Ability to create a Finding or Vulnerability when traffic has been blocked/dropped.
- Richer and more robust ways to display data & integrate with dashboards
- Allow NetworkPolicy objects to work with serverless deployments. #32701 (closed)
What does success look like, and how can we measure that?
- % of users who create a GitLab-managed Kubernetes cluster within 30 days of release who install the NetworkPolicy support. Target => 50%
  - Measure adoption of the capability by our target users.
- % of users who keep using the NetworkPolicy support without uninstalling it for at least 90 days. Target => 80%
  - Measure if users install and then keep using the capability. If they uninstall it, that indicates it is not providing enough value, is introducing problems, or something else. In all cases, it is an indicator to investigate more to understand why it was uninstalled.
What is the type of buyer?
Questions
- What is a good NetworkPolicy provider to start with? Some options include
  - Project Calico
    - Logging appears to be a paid capability
  - Cilium
- What is a good set of default rules to use for the NetworkPolicy?
  - Is there a best practice set of defaults we can use instead of writing our own? If there is, we can install them by default then make the NetworkPolicy support opt-out instead of opt-in.
- What is the data retention policy for logs in terms of time and size?
Decisions
- Cilium is now the chosen option based on 2019-12-02 discussion