Add packet capture scripts for GKE nodes (!4616) · GitLab.com / Runbooks

Matt Smiley requested to merge add_tcpdump_for_gke_node into master May 20, 2022

This MR adds tcpdump wrapper scripts for capturing network traffic on a GKE node.

These scripts accept arbitrary tcpdump arguments.

This MR adds 3 new scripts:

tcpdump_on_gke_node.sh [max_duration_seconds] [tcpdump_options]
tcpdump_on_gke_node_for_pod_id.using_pod_iface.sh [pod_id] [max_duration_seconds] [tcpdump_options]
tcpdump_on_gke_node_for_pod_id.using_pod_netns.sh [pod_id] [max_duration_seconds] [tcpdump_options]
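All three usage lines share one pattern: a maximum duration, followed by tcpdump options passed through unchanged. Below is a minimal sketch of how such a wrapper could bound a capture. It assumes GNU `timeout` and `tcpdump` are on the PATH. `capture_with_timeout`, `is_positive_int`, and the `DRY_RUN` switch are hypothetical names and are not taken from the MR's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a time-bounded tcpdump wrapper (not the MR's actual script).

# Succeeds if $1 is a positive integer (a number of seconds).
is_positive_int() {
  [[ "$1" =~ ^[1-9][0-9]*$ ]]
}

# capture_with_timeout MAX_SECONDS [TCPDUMP_OPTIONS...]
# Runs tcpdump with the remaining arguments passed through unchanged, and sends
# it SIGINT (the same signal as Ctrl-C) once MAX_SECONDS have elapsed.
# If DRY_RUN is non-empty, prints the command instead of running it.
capture_with_timeout() {
  if [[ $# -lt 1 ]] || ! is_positive_int "$1"; then
    echo "usage: capture_with_timeout max_duration_seconds [tcpdump_options...]" >&2
    return 2
  fi
  local max_seconds="$1"
  shift
  ${DRY_RUN:+echo} timeout --signal=INT "$max_seconds" tcpdump "$@"
}

# Example (needs root and tcpdump on the node):
#   capture_with_timeout 60 -i eth0 -nn -w /tmp/capture.pcap 'tcp port 443'
```

The sketch sends SIGINT so that a timed-out capture ends the same way an interactive Ctrl-C would.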

The 1st script captures on the host's main network interface, in the root network namespace. It sees traffic for all pods but does not see loopback traffic local to the host itself.
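The main interface name can vary by node image. One way to find it without hard-coding a name such as `eth0` is to read the default route in the root network namespace. This is a hedged sketch. `parse_default_iface` is a hypothetical helper, and it assumes the iproute2 output format `default via <gw> dev <iface> ...`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the node's main interface from the default route.

# Reads `ip route show default` output on stdin and prints the interface name
# that follows the "dev" keyword on the first default route.
parse_default_iface() {
  awk '$1 == "default" {
    for (i = 2; i < NF; i++) {
      if ($i == "dev") { print $(i + 1); exit }
    }
  }'
}

# Example (run in the node's root network namespace):
#   iface="$(ip route show default | parse_default_iface)"
#   timeout --signal=INT 60 tcpdump -i "$iface" -nn -w /tmp/node.pcap
```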

The 2nd and 3rd scripts each capture a single pod's traffic, but they use different approaches. I expect either to suffice for most use cases. In some cases, though, the namespace-based approach may be better because it also includes the pod's loopback traffic.
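For the interface-based approach, the pod's `eth0` is typically one end of a virtual Ethernet pair, and the other end lives in the root namespace. The following sketch shows one way to locate that host-side interface. It assumes iproute2's `ip -o link` format, where a paired interface appears as `name@ifN`. Both helper names are hypothetical, and this is not necessarily how the MR's `using_pod_iface` script finds the interface.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: find the host-side interface paired with a pod's eth0.

# Reads `ip -o link show eth0` output captured inside the pod's network
# namespace and prints the peer interface index that follows "@if".
peer_index_from_pod_link() {
  awk -F': ' 'match($2, /@if[0-9]+$/) { print substr($2, RSTART + 3); exit }'
}

# Reads `ip -o link` output from the root namespace and prints the name of the
# interface whose index is $1, dropping any "@..." peer suffix.
iface_name_for_index() {
  awk -F': ' -v idx="$1" '$1 == idx { sub(/@.*/, "", $2); print $2; exit }'
}

# Example, assuming $pid is any process inside the target pod:
#   idx="$(nsenter --target "$pid" --net ip -o link show eth0 | peer_index_from_pod_link)"
#   iface="$(ip -o link | iface_name_for_index "$idx")"
#   timeout --signal=INT 60 tcpdump -i "$iface" -nn -w /tmp/pod.pcap
```

The sketch queries the peer index with `ip` inside the pod's network namespace rather than with `/sys/class/net`. The reason is that sysfs keeps showing the namespace it was mounted from, so reading it after entering only the network namespace would report the host's interfaces.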

Testing summary:

This set of scripts successfully covers a wide variety of use cases, including capturing traffic for only a specific pod or container port.

Each pod typically runs in an isolated network namespace, with listening processes bound to a container port within that namespace. When capturing on the host's root network namespace, filtering to a port will match traffic for all of the pods bound to that port, regardless of namespace.
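On the root namespace's interface, a filter such as `tcp port 8080` matches every pod using that port. The following sketch uses an alternative technique that is not part of the MR: it narrows the root-namespace filter by also matching the pod's IP with the `host` primitive from pcap-filter. `pod_port_filter` is a hypothetical helper. The `crictl inspectp` template path in the example comment is an assumption about the runtime's output and should be checked on the node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a tcpdump filter for one pod IP and TCP port.

# pod_port_filter POD_IP PORT
# Prints "host POD_IP and tcp port PORT"; rejects ports outside 1..65535.
pod_port_filter() {
  local pod_ip="$1" port="$2" re='^[1-9][0-9]{0,4}$'
  if ! [[ "$port" =~ $re ]] || (( port > 65535 )); then
    echo "invalid port: $port" >&2
    return 2
  fi
  printf 'host %s and tcp port %s\n' "$pod_ip" "$port"
}

# Example (root namespace; the template path is an assumption):
#   pod_ip="$(crictl inspectp -o go-template --template '{{.status.network.ip}}' "$pod_id")"
#   tcpdump -i eth0 -nn 'tcp port 8080'                    # every pod on port 8080
#   tcpdump -i eth0 -nn "$(pod_port_filter "$pod_ip" 8080)"  # only this pod
```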

In contrast, to capture traffic for a single pod, we can attach to that pod's network namespace or its virtual interface.
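To attach to a pod's network namespace, one common approach is to run tcpdump under `nsenter --net`, targeting any process in the pod. Because only the network namespace is entered, output paths still refer to the host's filesystem. This is a minimal sketch, not the MR's implementation. `pod_netns_capture` and `DRY_RUN` are hypothetical, and the sketch takes a PID rather than the MR's `pod_id`. The `crictl inspectp` template path used to obtain that PID is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: capture inside one pod's network namespace.

# pod_netns_capture PID MAX_SECONDS [TCPDUMP_OPTIONS...]
# Runs tcpdump in the network namespace of process PID (any process in the
# target pod), so "-i any" also sees that pod's loopback traffic.
# If DRY_RUN is non-empty, prints the command instead of running it.
pod_netns_capture() {
  if [[ $# -lt 2 ]] || ! [[ "$1" =~ ^[1-9][0-9]*$ && "$2" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: pod_netns_capture pid max_duration_seconds [tcpdump_options...]" >&2
    return 2
  fi
  local pid="$1" max_seconds="$2"
  shift 2
  ${DRY_RUN:+echo} nsenter --target "$pid" --net -- timeout --signal=INT "$max_seconds" tcpdump "$@"
}

# Example (needs root; the template path is an assumption about crictl output):
#   pid="$(crictl inspectp -o go-template --template '{{.info.pid}}' "$pod_id")"
#   pod_netns_capture "$pid" 60 -i any -nn -w /tmp/pod_netns.pcap
```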

The docs in this MR include a quick-reference list of commands, a usage summary, some contextual background, and a demo. For more details, see:

Edited May 21, 2022 by Matt Smiley

