UX Discovery: Cluster NetworkPolicy & Intrusion Detection System statistics

Problem to solve

We will introduce new security controls, such as the intrusion detection system (IDS) and NetworkPolicy, which produce various kinds of information. Users will have to connect these different pieces of information and logging systems to understand their risk and security posture. Having this information spread across multiple places and buried in low-level log files makes the process difficult and time-consuming.

Intended users

  • Devon (DevOps Engineer)
  • Sidney (Systems Administrator)
  • Sam (Security Analyst)

Further details

Tool overviews: IDS

What is IDS?

An intrusion detection system (IDS) is a device or software application that monitors a network or systems for malicious activity or policy violations.

IRL:

A network-based IDS is what stopped Samuel L. Jackson from accessing the Jurassic Park network, preventing him from saving the day and, ultimately, the park.

Even more details here

Any malicious activity or violation detected by an IDS is typically reported either to an administrator or collected centrally using a security information and event management (SIEM) system. A SIEM system combines outputs from multiple sources and uses alarm filtering techniques to distinguish malicious activity from false alarms.

IDS types range in scope from single computers to large networks. The most common classifications are network intrusion detection systems (NIDS) and host-based intrusion detection systems (HIDS). A system that monitors important operating system files is an example of an HIDS, while a system that analyzes incoming network traffic is an example of an NIDS.

It is also possible to classify IDS by detection approach. The most well-known variants are signature-based detection (recognizing bad patterns, such as malware) and anomaly-based detection (detecting deviations from a model of "good" traffic, which often relies on machine learning). Another common variant is reputation-based detection (recognizing the potential threat according to reputation scores).

Some IDS products can respond to detected intrusions. Systems with response capabilities are typically referred to as intrusion prevention systems (IPS). An IDS can also be augmented with custom tools for specific purposes, such as using a honeypot to attract and characterize malicious traffic.
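To make the signature-based and anomaly-based approaches described above more concrete, here is a minimal Python sketch. The signature strings, the request-rate baseline, and the function names `signature_match` and `is_anomalous` are all hypothetical and chosen only for illustration. A simple standard-deviation check stands in for the model of "good" traffic; real products use far richer models.

```python
from statistics import mean, pstdev
from typing import List, Sequence

# Hypothetical known-bad patterns; real signature sets are much larger.
SIGNATURES = ("/etc/passwd", "' OR 1=1", "<script>")


def signature_match(payload: str, signatures: Sequence[str] = SIGNATURES) -> List[str]:
    """Signature-based detection: return the known-bad patterns found in the payload."""
    return [sig for sig in signatures if sig in payload]


def is_anomalous(value: float, baseline: Sequence[float], threshold: float = 3.0) -> bool:
    """Anomaly-based detection: flag values far from the baseline mean.

    A value counts as anomalous when it is more than `threshold`
    population standard deviations away from the mean of `baseline`.
    """
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical requests-per-minute observed during normal operation.
baseline_rpm = [100, 102, 98, 101, 99]

hits = signature_match("GET /?q=' OR 1=1 --")
spike = is_anomalous(500, baseline_rpm)
normal = is_anomalous(101, baseline_rpm)
```

The first function only catches what it already knows about, while the second can flag unfamiliar behavior at the cost of more false alarms, which is the trade-off between the two approaches.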

Even more here

What is the relationship between the IDS and WAF?

Although they both relate to network security, an IDS differs from a firewall in that a firewall looks outwardly for intrusions in order to stop them from happening. Firewalls limit access between networks to prevent intrusion and do not signal an attack from inside the network. An IDS describes a suspected intrusion once it has taken place and signals an alarm. An IDS also watches for attacks that originate from within a system. This is traditionally achieved by examining network communications, identifying heuristics and patterns (often known as signatures) of common computer attacks, and taking action to alert operators. A system that terminates connections is called an intrusion prevention system, and performs access control like an application layer firewall.

What tools are we leveraging for IDS?

Falco

Falco is a behavioral activity monitor designed to detect anomalous activity in your applications. Using powerful system call capture technology originally built by Sysdig, Falco lets you continuously monitor and detect container, application, host, and network activity, all in one place, from one source of data, with one set of rules.

Is Falco also an IPS (intrusion prevention system)?

No. Falco cannot serve as a complete IDPS (intrusion detection and prevention system). Falco only logs and alerts on intrusion events.

How do our IDS tools work within the confines of GitLab?

Similar to the WAF, the Falco IDS has some prerequisites that the user must meet before using the tool. After the initial install and configuration phase, the user creates or ports a rules YAML file for the IDS to function. Once the rules are in place, the tool begins reporting intrusions and alerting the user. Below is an expanded view of these phases.

1. Install

We will install Falco in a Kubernetes cluster. To do so, deploy a DaemonSet to the Kubernetes cluster. A Falco installation on Kubernetes monitors the cluster, its worker nodes, and running containers for abnormal behavior.

🛠 Install instructions

2. Configuration

Falco’s configuration file is a YAML file containing a collection of key: value or key: [value list] pairs.

⚙ Read more about configuration here

3. Rule setting

📋

4. Reporting and defense

🏰

What information do we get back from our IDS tool(s)?

When Falco detects suspicious behavior, it sends alerts via one or more channels:

  • Writing to standard error
  • Writing to a file
  • Writing to syslog
  • Piping to a spawned program. A common use of this output type would be to send an email for every Falco notification.

JSON Output:

For all output channels, you can switch to JSON output either in the configuration file or on the command line. For each alert, Falco will print a JSON object, on a single line, containing the following properties:

  • time: the time of the alert, in ISO 8601 format.
  • rule: the rule that resulted in the alert.
  • priority: the priority of the rule that generated the alert.
  • output: the formatted output string for the alert.
  • output_fields: for each templated value in the output expression, the value of that field from the event that triggered the alert.

Single line:
{"output":"16:31:56.746609046: Error File below a known binary directory opened for writing (user=root command=touch /bin/hack file=/bin/hack)","priority":"Error","rule":"Write below binary dir","time":"2017-10-09T23:31:56.746609046Z", "output_fields": {"evt.time":1507591916746609046,"fd.name":"/bin/hack","proc.cmdline":"touch /bin/hack","user.name":"root"}}

Pretty-printed:
{
   "output" : "16:31:56.746609046: Error File below a known binary directory opened for writing (user=root command=touch /bin/hack file=/bin/hack)",
   "priority" : "Error",
   "rule" : "Write below binary dir",
   "time" : "2017-10-09T23:31:56.746609046Z",
   "output_fields" : {
      "user.name" : "root",
      "evt.time" : 1507591916746609046,
      "fd.name" : "/bin/hack",
      "proc.cmdline" : "touch /bin/hack"
   }
}
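
As a rough illustration of how a consumer (for example, a program spawned by the pipe output channel) might read this output, the Python sketch below parses one JSON alert line into a small record. The `FalcoAlert` class and `parse_alert` function are hypothetical names introduced here and are not part of Falco. The sketch assumes only the five properties listed above.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class FalcoAlert:
    """Hypothetical container for the five documented alert properties."""
    time: str
    rule: str
    priority: str
    output: str
    output_fields: Dict[str, Any] = field(default_factory=dict)


def parse_alert(line: str) -> FalcoAlert:
    """Parse one single-line JSON alert into a FalcoAlert."""
    data = json.loads(line)
    return FalcoAlert(
        time=data["time"],
        rule=data["rule"],
        priority=data["priority"],
        output=data["output"],
        output_fields=data.get("output_fields", {}),
    )


# Sample alert based on the example above.
SAMPLE = (
    '{"output":"16:31:56.746609046: Error File below a known binary directory '
    'opened for writing (user=root command=touch /bin/hack file=/bin/hack)",'
    '"priority":"Error","rule":"Write below binary dir",'
    '"time":"2017-10-09T23:31:56.746609046Z",'
    '"output_fields":{"evt.time":1507591916746609046,"fd.name":"/bin/hack",'
    '"proc.cmdline":"touch /bin/hack","user.name":"root"}}'
)

alert = parse_alert(SAMPLE)
print(alert.rule, alert.priority, alert.output_fields["fd.name"])
```

Keeping `output_fields` as a plain dictionary is a deliberate choice in this sketch, since its keys depend on the templated values in each rule's output expression.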

Dashboard visualizations recommended by Sysdig for Falco: https://sysdig.com/blog/kubernetes-security-logging-fluentd-falco/

  • Area chart of all Falco alerts over time.
    • Aggregation: Date Histogram
    • Field: timestamp
    • Interval: Minute
  • Pie chart of top 10 Falco rules triggered.
    • Split Slices
    • Aggregation: Significant Terms
    • Field: rule.keyword
    • Count: 10
  • Pie chart of alerts by priority.
    • Split Slices
    • Aggregation: Significant Terms
    • Field: priority.keyword
    • Count: 10
  • Table of top 20 Falco rules triggered.
    • Split Rows
    • Aggregation: Significant Terms
    • Field: rule.keyword
    • Count: 20
  • Table of Falco alerts by Kubernetes Node.
    • Split Rows
    • Aggregation: Significant Terms
    • Field: kubernetes.host.keyword
    • Count: 20
  • Table of Falco alerts by Kubernetes Pod.
    • Split Rows
    • Aggregation: Significant Terms
    • Field: output_fields.k8s.pod.name.keyword
    • Count: 20
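
To give a feel for what these visualizations compute, the Python sketch below derives a few of them from a list of already-parsed alerts. It is an approximation under stated assumptions: plain frequency counts stand in for the "Significant Terms" aggregation, the alert `time` property is used in place of the `timestamp` field, and the function names and the second rule name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Alert = Dict[str, object]


def top_values(alerts: Iterable[Alert], key: str, count: int) -> List[Tuple[str, int]]:
    """Return the `count` most frequent values of `key` (pie chart or table rows)."""
    counter = Counter(str(a[key]) for a in alerts if key in a)
    return counter.most_common(count)


def alerts_per_minute(alerts: Iterable[Alert]) -> Dict[str, int]:
    """Bucket alerts by minute (area chart), using the ISO 8601 prefix YYYY-MM-DDTHH:MM."""
    buckets = Counter(str(a["time"])[:16] for a in alerts)
    return dict(sorted(buckets.items()))


# Hypothetical sample data shaped like Falco's JSON output.
alerts = [
    {"time": "2017-10-09T23:31:56.746609046Z", "rule": "Write below binary dir", "priority": "Error"},
    {"time": "2017-10-09T23:31:58.100000000Z", "rule": "Write below binary dir", "priority": "Error"},
    {"time": "2017-10-09T23:32:10.000000000Z", "rule": "Hypothetical rule B", "priority": "Warning"},
]

top_rules = top_values(alerts, "rule", 10)
by_priority = top_values(alerts, "priority", 10)
per_minute = alerts_per_minute(alerts)
```

The same `top_values` helper could cover the per-node and per-pod tables once the alerts carry those fields.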

What technical barriers or limitations, if any, do we face due to our integration?

Tool overviews: NetworkPolicy

What is NetworkPolicy?

What tools are we leveraging for NetworkPolicy?

Cilium

How does NetworkPolicy work within the confines of GitLab?

[Screenshot: Screen_Shot_2019-12-11_at_1.33.36_PM]

What information do we get back from our NetworkPolicy integrations?

Network state

  • Dropped Egress Packets (time / ops) Stacked line
    • Invalid destination MAC
    • Invalid source IP
    • Missed tail call
    • Policy denied (L3)
    • Service backend not found
    • Unknown L3 target address
  • Dropped Egress Traffic (time / kbps) Stacked line
    • Invalid destination MAC
    • Invalid source IP
    • Missed tail call
    • Policy denied (L3)
    • Service backend not found
    • Unknown L3 target address
  • Cilium drops Ingress (time / ops) Stacked line
    • Invalid packet
    • Not a local target address
    • Policy denied (L3)
  • Dropped Ingress Traffic (time / bps) Stacked line
    • Invalid packet
    • Not a local target address
    • Policy denied (L3)
  • L7 Forwarded request (time / requests) Stacked line
    • L7 Forwarded
    • L7 Denied
  • Proxy response time (time / ms) AVG Stacked line
    • processingTime AVG: 1ms
    • upstreamTime AVG: 7ms
    • Parse errors AVG: 0
  • Proxy response time (time / ms) MAX Stacked line
    • processingTime MAX (Avg)
    • upstreamTime MAX (Avg)
  • Policies Per Node (time / amount) Stacked line
    • Min Current: (amt)
    • Avg Current: (amt)
    • Max Current: (amt)
    • Policy import errors: (amt)
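
As a sketch of how the dropped-packet charts above could be assembled, the Python below groups drop readings into one series per drop reason for a given direction, which is the shape a stacked line chart needs. The `DropSample` record, its fields, and the sample values are hypothetical and do not describe how Cilium actually exports its metrics.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class DropSample(NamedTuple):
    """Hypothetical record for one drop counter reading."""
    minute: str      # time bucket, e.g. "13:33"
    direction: str   # "EGRESS" or "INGRESS"
    reason: str      # e.g. "Policy denied (L3)"
    packets: int


def stacked_series(samples: Iterable[DropSample], direction: str) -> Dict[str, Dict[str, int]]:
    """Return {reason: {minute: packets}} for one direction, one line per reason."""
    series: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sample in samples:
        if sample.direction == direction:
            series[sample.reason][sample.minute] += sample.packets
    return {reason: dict(points) for reason, points in series.items()}


# Hypothetical readings for illustration only.
samples = [
    DropSample("13:33", "EGRESS", "Policy denied (L3)", 4),
    DropSample("13:33", "EGRESS", "Invalid source IP", 1),
    DropSample("13:34", "EGRESS", "Policy denied (L3)", 2),
    DropSample("13:33", "INGRESS", "Invalid packet", 3),
    DropSample("13:33", "EGRESS", "Policy denied (L3)", 1),
]

egress = stacked_series(samples, "EGRESS")
ingress = stacked_series(samples, "INGRESS")
```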

Policy state

  • Policy Trigger Duration (time/ms) area
    • Min (Avg)
    • Avg (Avg)
    • Max (Avg)
  • Policy Trigger Runs (time / OPM) line chart
    • Min trigger
    • Avg trigger
    • Max trigger
  • Endpoints policy enforcement status (endpoint / amount) bar chart
    • both
    • egress
    • ingress
    • none
  • Proxy Redirects (time / redirects) area
    • min
    • avg
    • max
  • Proxy revision (time / amount) area
    • min
    • avg
    • max

What technical barriers or limitations, if any, do we face due to our integration?


Research to do

JTBD

What are the core JTBD for IDS users?

IDS: (look into Falco users for reference)

What are the core JTBD for NetworkPolicy users?

Journey Maps

What are the journeys of IDS feature users?
What are the journeys of NetworkPolicy feature users?

Competitive analysis

What do these features look and feel like today in other platforms?

Keep in mind:

  • Setup
  • Configuration
  • Data output
  • Data interaction = Grouping of other features along with IDS and NetworkPolicies

IDS:

  • What do traditional IDSs look like in the space?
  • What do cluster-based IDSs look like in the space?

NetworkPolicy:

  • What do cluster-based network policies look like in the space?
  • What do Kubernetes cluster-based network policies look like in the space?

Design to consider

Information presentation of IDS and NetworkPolicy:

Unified Threat Monitoring experience approach with:

  • WAF stats
  • IDS data
  • NetworkPolicies

"Consider this as an opportunity for the Unified Dashboard approach from UX."

Be sure to link this issue in that initiative [link] as well.


Product

Can these journeys and JTBD be broken down into category maturity?

Minimal (Required)

IDS
  • When my company needs to monitor malicious activity in its applications, I want to implement an intrusion detection system, so I can assure them that anomalies will be reported.

  • When I am installing my IDS for the first time, I want to follow proper documentation procedures, so I can save time by not having to troubleshoot undocumented problems.

  • When I am installing my IDS for the first time, I want to configure it for a staging or test environment, so I don't impact production and can verify it works to my specifications.

  • When I am installing the IDS for the first time, I want to be able to set and configure policies based on my company's needs, so I can monitor alerts that are relevant to the application.

  • When I am using the IDS, I want to receive monitoring information, so I can tune the configuration and respond to alerts in real time.

  • Journeys

Viable (Required)
  • JTBD

  • Journeys

Complete (Nice to have)
  • JTBD

  • Journeys

Loveable (Nice to have)
  • JTBD

  • Journeys


Some specific use cases we could start with:

  • Presentation of high-level information
    • Similar to WAF, we could surface enough meaningful info to help users understand "yes, it's turned on and configured correctly"
    • We could also explore ways to link and correlate results together. An example could be identifying alerts that all occurred within 3 minutes of each other, which may indicate the same attacker being caught by multiple systems.
      • This is likely future work and covered by some of our UEBA/machine learning ideas
  • Configuration & settings
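
The correlation idea in the use cases above (alerts that all occurred within 3 minutes of each other) could be sketched in Python roughly as follows. The `SecurityEvent` record, the `correlate` function, and the sample events are hypothetical. The sketch assumes alerts from each control have already been normalized into a common shape, which is itself part of the work this issue proposes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class SecurityEvent:
    """Hypothetical normalized alert from any security control."""
    source: str      # e.g. "IDS", "WAF", "NetworkPolicy"
    time: datetime
    summary: str


def correlate(events: List[SecurityEvent],
              window: timedelta = timedelta(minutes=3)) -> List[List[SecurityEvent]]:
    """Group events that fall within `window` of the first event in their group,
    keeping only groups reported by more than one source."""
    groups: List[List[SecurityEvent]] = []
    for event in sorted(events, key=lambda e: e.time):
        if groups and event.time - groups[-1][0].time <= window:
            groups[-1].append(event)
        else:
            groups.append([event])
    return [g for g in groups if len({e.source for e in g}) > 1]


# Hypothetical events for illustration only.
events = [
    SecurityEvent("IDS", datetime(2019, 12, 11, 12, 0, 0), "Write below binary dir"),
    SecurityEvent("WAF", datetime(2019, 12, 11, 12, 1, 30), "Hypothetical WAF rule hit"),
    SecurityEvent("NetworkPolicy", datetime(2019, 12, 11, 12, 2, 59), "Policy denied (L3)"),
    SecurityEvent("IDS", datetime(2019, 12, 11, 12, 10, 0), "Write below binary dir"),
]

correlated = correlate(events)
```

Measuring the window from the first event in each group guarantees that every pair of events in a group is at most 3 minutes apart. Time proximity alone is a weak signal, so a real implementation would likely also compare attributes such as source address.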

Proposal

  • Research and validate the specific types of information we can and want to present to users based on their problems to be solved.
  • Create a plan for how users can interact with the results from the IDS & NetworkPolicy at a higher level than raw log files.
    • Identify how to present this information in a unified way with other security controls, like WAF.

Permissions and Security

Documentation

Testing

What does success look like, and how can we measure that?

  • Clarify and outline information regarding IDS and NetworkPolicy for all to understand
  • List of jobs to be done relative to the applicable security controls
  • Journey Maps of the e2e experience for user validation and technical feedback.
  • Creation and validation of prototypes / mock-ups for interfaces to be built in follow-up issues.
  • Follow-up issues created & linked to this issue.

What is the type of buyer?

These capabilities will be applicable to GitLab Ultimate subscribers.

Links / references

  • IDS MVC
  • NetworkPolicy MVC
  • IDS Statistics
  • NetworkPolicy Statistics
Edited Dec 13, 2019 by Andy Volpe