eBPF-Based Security Monitoring: Tetragon for Process, Network, and File Observability

eBPF-Based Security Monitoring: Tetragon for Process, Network, and File Observability

Problem

Falco monitors syscalls for runtime detection. Tetragon (CNCF/Cilium) goes deeper: it monitors process execution, network connections, and file access at the eBPF level with enforcement capability, it can block actions in real time, not just detect them. Most engineers do not know Tetragon exists or how it complements Falco.

Key differences:

  • Falco: Detection-only (alert on suspicious activity). Kernel module or eBPF driver. Rule-based matching against syscall events.
  • Tetragon: Detection AND enforcement (alert and/or block). eBPF-only (no kernel module). TracingPolicy CRDs for Kubernetes-native configuration. Can SIGKILL a process that violates policy.

Use both: Falco for broad behavioural detection rules, Tetragon for targeted enforcement on specific high-risk actions.

Target systems: Kubernetes 1.29+ with kernel 5.8+ (eBPF requirement). Tetragon 1.2+.

Threat Model

  • Adversary: Attacker with code execution inside a container, attempting: reverse shell, credential theft, container escape, crypto mining, or data exfiltration.
  • Key advantage of Tetragon over detection-only: Tetragon can kill the malicious process before the attack completes, not just alert after it succeeds.

Configuration

Deployment

helm repo add cilium https://helm.cilium.io
helm repo update

helm install tetragon cilium/tetragon \
  --namespace kube-system \
  --set tetragon.grpc.enabled=true \
  --set tetragon.exportFilename=/var/run/cilium/tetragon/tetragon.log \
  --set tetragon.resources.requests.cpu=100m \
  --set tetragon.resources.requests.memory=256Mi \
  --set tetragon.resources.limits.cpu=1000m \
  --set tetragon.resources.limits.memory=512Mi
# Install the tetra CLI for event viewing
GOOS=$(go env GOOS)
GOARCH=$(go env GOARCH)
curl -L "https://github.com/cilium/tetragon/releases/latest/download/tetra-${GOOS}-${GOARCH}.tar.gz" | tar -xz
sudo mv tetra /usr/local/bin/

# Verify Tetragon is running:
kubectl get pods -n kube-system -l app.kubernetes.io/name=tetragon
# Expected: one pod per node, Running

# View live events:
kubectl exec -n kube-system ds/tetragon -c tetragon -- tetra getevents -o compact

TracingPolicy: Process Execution Monitoring

# process-monitor.yaml
# Monitor all process executions in production namespaces.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: monitor-process-execution
spec:
  kprobes:
    - call: "sys_execve"
      syscall: true
      return: false
      args:
        - index: 0
          type: "string"  # filename (binary path)
      selectors:
        - matchNamespaces:
            - namespace: production
              operator: In
# View process execution events:
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
  tetra getevents -o compact --namespaces production

# Example output:
# πŸš€ process production/nginx-7d8f9b/nginx  /usr/sbin/nginx
# πŸš€ process production/nginx-7d8f9b/nginx  /bin/sh -c "echo test"  ← SUSPICIOUS

TracingPolicy: Block Specific Binaries (Enforcement)

# block-crypto-miners.yaml
# Kill any known crypto mining binary immediately.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-crypto-miners
spec:
  kprobes:
    - call: "sys_execve"
      syscall: true
      args:
        - index: 0
          type: "string"
      selectors:
        - matchArgs:
            - index: 0
              operator: "In"
              values:
                - "/usr/bin/xmrig"
                - "/tmp/xmrig"
                - "/usr/local/bin/minerd"
                - "/tmp/minerd"
                - "/usr/bin/cpuminer"
          matchActions:
            - action: Sigkill  # Kill the process immediately

Warning: Enforcement (Sigkill) has no undo. If the policy matches a legitimate process, it is killed instantly. Test enforcement policies in monitoring-only mode first (remove matchActions), verify zero false positives for 7 days, then enable Sigkill.

TracingPolicy: Network Connection Monitoring

# network-monitor.yaml
# Monitor outbound TCP connections from production pods.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: monitor-outbound-connections
spec:
  kprobes:
    - call: "tcp_connect"
      syscall: false
      args:
        - index: 0
          type: "sock"
      selectors:
        - matchNamespaces:
            - namespace: production
              operator: In

TracingPolicy: Sensitive File Access

# file-access-monitor.yaml
# Monitor access to sensitive files in containers.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: monitor-sensitive-files
spec:
  kprobes:
    - call: "fd_install"
      syscall: false
      args:
        - index: 0
          type: "int"
        - index: 1
          type: "file"
      selectors:
        - matchArgs:
            - index: 1
              operator: "Prefix"
              values:
                - "/etc/shadow"
                - "/etc/passwd"
                - "/run/secrets/kubernetes.io/serviceaccount/token"
                - "/proc/1/environ"
          matchNamespaces:
            - namespace: production
              operator: In

Event Export to Grafana Cloud

# vector-tetragon.yaml - ship Tetragon events to centralized logging
sources:
  tetragon_events:
    type: file
    include:
      - /var/run/cilium/tetragon/tetragon.log

transforms:
  parse_tetragon:
    type: remap
    inputs: [tetragon_events]
    source: |
      . = parse_json!(.message)
      .source = "tetragon"

sinks:
  grafana_cloud:
    type: loki
    inputs: [parse_tetragon]
    endpoint: "https://logs-prod-us-central1.grafana.net"
    auth:
      strategy: basic
      user: "${LOKI_USER}"
      password: "${LOKI_API_KEY}"
    labels:
      source: tetragon
      event_type: "{{ process_exec.process.binary }}"
      namespace: "{{ process_exec.process.pod.namespace }}"

Tetragon vs Falco: When to Use Each

Scenario Tool Reason
Broad behavioural detection (unexpected processes) Falco Falco’s rule language is more expressive for behavioural patterns
Enforce: kill crypto mining binaries Tetragon Enforcement capability (Sigkill)
Network flow baselines and anomaly detection Tetragon + Hubble Native integration with Cilium networking
Sensitive file access monitoring Tetragon Lower overhead per-file monitoring
Container escape detection Both Falco for broad patterns, Tetragon for specific enforcement
Multi-cluster rule management Falco + Sysdig (#122) Better managed rule distribution ecosystem

Recommendation: Deploy both. Falco for broad detection and alert routing (via Falcosidekick). Tetragon for targeted enforcement on known-dangerous actions and deep file/network visibility.

Expected Behaviour

  • Tetragon DaemonSet running on all nodes (pod count = node count)
  • Process execution events visible in tetra getevents
  • Enforcement policies kill targeted binaries within milliseconds
  • Sensitive file access events exported to Grafana Cloud (#108)
  • Network connection events correlated with Hubble flow data
  • CPU overhead below 2% per node
# Verify enforcement:
kubectl exec -n production deploy/test -- /tmp/test-binary
# If the binary matches a block policy: process is killed immediately
# tetra events show: πŸ’€ exit production/test/test /tmp/test-binary SIGKILL

Trade-offs

Control Impact Risk Mitigation
Enforcement (Sigkill) Instant process termination Kills legitimate processes if policy is too broad Monitor-only for 7 days before enabling enforcement. Use precise path matching.
Process monitoring (all execve) Complete visibility into all process execution High event volume on busy nodes (1000+ events/sec) Filter by namespace. Export only security-relevant events.
File access monitoring Detect credential access, config modification I/O overhead for high-throughput file access Monitor specific paths only (not all file access).
eBPF requirement (kernel 5.8+) No kernel module needed Some older cloud provider images have kernel <5.8 Check node kernel version before deployment. Upgrade node image if needed.

Failure Modes

Failure Symptom Detection Recovery
TracingPolicy syntax error Policy not applied; no events generated kubectl get tracingpolicies shows error status; kubectl describe shows validation error Fix YAML syntax. Reapply. Tetragon validates policies at admission.
Enforcement kills legitimate process Application service disrupted Service monitoring detects pod crash; Tetragon event log shows Sigkill Remove or narrow the enforcement policy. Restart the affected pod. Test policy in monitor-only mode for longer.
Tetragon OOM on busy node Tetragon pod crashloops; monitoring gap Pod restart count increases; DaemonSet availability drops Increase memory limits. Reduce TracingPolicy scope (fewer monitored namespaces or paths).
Event export pipeline fails Events generated but not shipped to centralized storage Tetragon events visible locally but not in Grafana Cloud; Vector delivery metrics show failures Fix Vector pipeline. Events are buffered locally and ship when the pipeline recovers.

When to Consider a Managed Alternative

Tetragon event volume at scale requires managed storage. TracingPolicy lifecycle management across clusters requires centralized distribution.

  • Isovalent (#54) Cilium Enterprise: Managed Tetragon with policy lifecycle management, UI for TracingPolicy creation, and multi-cluster policy distribution.
  • Sysdig (#122): eBPF-based monitoring with managed platform. Not Tetragon-native but provides similar eBPF process/network/file monitoring.
  • Grafana Cloud (#108): For Tetragon event storage and dashboarding.

Premium content pack: Tetragon TracingPolicy collection. policies for process execution monitoring (per container image), crypto miner enforcement (block + kill), sensitive file access detection, network connection monitoring, and Grafana dashboards for Tetragon events.