Container Escape Detection: Runtime Signals, Kernel Indicators, and Response Automation
Problem
Container escapes are among the highest-impact attacks in Kubernetes. A single compromised pod that escapes its container gains access to the underlying node, and from there to every other pod on that node, the kubelet credentials, and potentially the entire cluster. Detection must catch the escape attempt before it succeeds, because once the attacker has node-level access, they can disable the monitoring that would detect them.
The specific challenges:
- Escape techniques are kernel-level. Container escapes exploit namespace manipulation (`nsenter`, `unshare`), cgroup breakouts, `/proc` filesystem abuse, and mounted host paths. Detecting these requires kernel-level instrumentation, not application-level logging.
- Legitimate admin operations look like escapes. `nsenter` into a container namespace is a normal debugging tool. Mounting host paths is required for log collection and storage. Detection rules must distinguish between authorized admin actions and attack techniques.
- New escape techniques appear regularly. CVE-2024-21626 (runc `WORKDIR` escape), CVE-2022-0185 (filesystem context escape), and Leaky Vessels (2024) each introduced new escape vectors. Static detection rules that check for known techniques miss novel exploits.
- Privileged containers bypass all protections. Containers running with `privileged: true` or with specific dangerous capabilities (`SYS_ADMIN`, `SYS_PTRACE`) can escape trivially. Detection for privileged containers is a different problem than detection for standard containers.
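The last point is easy to demonstrate. A pod spec like the following (names and image are illustrative) can escape trivially and is useful only for validating detection rules in a disposable test cluster:

```yaml
# escape-test-pod.yaml (illustrative): a trivially escapable pod.
# Do not deploy outside a throwaway test cluster.
apiVersion: v1
kind: Pod
metadata:
  name: escape-test
spec:
  containers:
    - name: shell
      image: alpine:3.19
      command: ["sleep", "infinity"]
      securityContext:
        privileged: true          # full device and capability access to the host
      volumeMounts:
        - name: host-root
          mountPath: /host        # host root filesystem mounted into the pod
  volumes:
    - name: host-root
      hostPath:
        path: /
```

A shell in this pod can `chroot /host` directly; no exploit is required, which is why privileged-pod creation is treated as its own detection problem below.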
This article covers Falco rules for known escape techniques, Tetragon TracingPolicies for kernel-level detection and blocking, Kubernetes audit log patterns, and automated response.
Target systems: Kubernetes clusters with Falco or Tetragon deployed as DaemonSets. Prometheus + Alertmanager. Cilium for network-level response.
Threat Model
- Adversary: An attacker who has gained code execution inside a container (through application vulnerability, supply chain compromise, or compromised image). They attempt to break out of the container namespace to reach the host node.
- Blast radius: A successful container escape gives the attacker root on the node. From there: access to kubelet credentials (can impersonate the node in the cluster), access to all pods on the node (including secrets mounted as volumes), ability to pivot to other nodes via the cluster network, and potential access to the cloud provider metadata service for further privilege escalation.
Configuration
Falco Rules for Known Escape Techniques
```yaml
# falco-rules-container-escape.yaml
# Rules detecting common container escape techniques.

# Rule 1: nsenter or unshare execution inside a container.
# nsenter allows entering another namespace (escape to host namespace).
# unshare creates new namespaces (can be used to gain capabilities).
- rule: Namespace Manipulation in Container
  desc: >
    Detected nsenter or unshare execution inside a container.
    This is a strong indicator of a container escape attempt.
  condition: >
    spawned_process
    and container
    and (proc.name in (nsenter, unshare))
    and not (k8s.ns.name in (kube-system, monitoring))
  output: >
    Namespace manipulation in container
    (command=%proc.cmdline container=%container.name
    image=%container.image.repository namespace=%k8s.ns.name
    pod=%k8s.pod.name user=%user.name)
  priority: CRITICAL
  tags: [container-escape, namespace]

# Rule 2: mount syscall from a non-init process in a container.
# Mounting filesystems inside a container is unusual and may indicate
# an attempt to mount the host filesystem.
- rule: Unexpected Mount in Container
  desc: >
    A process inside a container executed a mount syscall.
    This may indicate an attempt to mount host filesystems.
  condition: >
    evt.type = mount
    and container
    and proc.pid != 1
    and not (proc.pname in (mount, umount, systemd))
    and not (k8s.ns.name in (kube-system))
  output: >
    Mount syscall in container
    (command=%proc.cmdline container=%container.name
    image=%container.image.repository namespace=%k8s.ns.name)
  priority: CRITICAL
  tags: [container-escape, mount]

# Rule 3: write to sensitive /proc paths.
# /proc/sysrq-trigger can reboot the host.
# /proc/*/mem allows rewriting process memory.
- rule: Write to Sensitive Proc Path
  desc: >
    A container process wrote to a sensitive /proc path that could
    affect the host or other processes.
  condition: >
    open_write
    and container
    and (fd.name startswith /proc/sysrq-trigger
         or fd.name startswith /proc/self/mem
         or fd.name startswith /host/proc)
  output: >
    Write to sensitive /proc path
    (file=%fd.name command=%proc.cmdline container=%container.name
    namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: CRITICAL
  tags: [container-escape, proc]

# Rule 4: access to Docker socket or containerd socket.
# Direct socket access allows creating new privileged containers.
- rule: Container Runtime Socket Access
  desc: >
    A process inside a container accessed the container runtime socket.
    This allows creating new containers with arbitrary privileges.
  condition: >
    (open_read or open_write)
    and container
    and (fd.name in (/var/run/docker.sock, /run/containerd/containerd.sock,
                     /var/run/crio/crio.sock))
  output: >
    Container runtime socket access
    (socket=%fd.name command=%proc.cmdline container=%container.name
    namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: CRITICAL
  tags: [container-escape, runtime-socket]

# Rule 5: cgroup escape attempt (notify_on_release).
# Classic cgroup v1 escape: write to notify_on_release and release_agent.
- rule: Cgroup Escape Attempt
  desc: >
    A container process wrote to cgroup notify_on_release or release_agent,
    which is the classic cgroup v1 container escape technique.
  condition: >
    open_write
    and container
    and (fd.name contains notify_on_release or fd.name contains release_agent)
  output: >
    Cgroup escape attempt
    (file=%fd.name command=%proc.cmdline container=%container.name
    namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: CRITICAL
  tags: [container-escape, cgroup]

# Rule 6: access to host filesystem via mounted paths.
# Pods with hostPath mounts may access sensitive host files.
- rule: Sensitive Host Path Access
  desc: >
    A container accessed sensitive files through a host path mount.
  condition: >
    (open_read or open_write)
    and container
    and (fd.name startswith /host/etc/shadow
         or fd.name startswith /host/etc/kubernetes
         or fd.name startswith /host/root/.ssh
         or fd.name startswith /host/var/lib/kubelet)
  output: >
    Sensitive host path access
    (file=%fd.name command=%proc.cmdline container=%container.name
    namespace=%k8s.ns.name pod=%k8s.pod.name)
  priority: CRITICAL
  tags: [container-escape, host-path]
```
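To handle the "legitimate admin operations look like escapes" problem without disabling Rule 1 outright, Falco's exceptions mechanism can carve out known-good tooling. A sketch (the image repository and namespace are placeholders; exception syntax varies between Falco versions, so verify against your deployed release):

```yaml
# Exception for an authorized node-debugging DaemonSet (illustrative values).
# Only this image in this namespace may run nsenter/unshare without alerting.
- rule: Namespace Manipulation in Container
  exceptions:
    - name: authorized_debug_tooling
      fields: [container.image.repository, k8s.ns.name]
      comps: [=, =]
      values:
        - [registry.example.com/ops/node-debugger, ops-tools]
```

Pinning exceptions to the image repository plus namespace is narrower than excluding a whole namespace, which keeps Rule 1 active against everything else running there.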
Tetragon TracingPolicies for Real-Time Blocking
Tetragon can block escape attempts at the kernel level, not just detect them:
```yaml
# tetragon-escape-policy.yaml
# TracingPolicy that kills the process attempting a container escape.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: container-escape-prevention
spec:
  kprobes:
    # Kill setns (the syscall behind nsenter) when called from a process
    # outside the host PID namespace.
    - call: "__x64_sys_setns"
      syscall: true
      args:
        - index: 0
          type: "int"
        - index: 1
          type: "int"
      selectors:
        - matchNamespaces:
            - namespace: Pid
              operator: NotIn
              values:
                - "host_ns"
          matchActions:
            - action: Sigkill
    # Kill mount syscalls from non-init processes in container
    # (non-host) mount namespaces.
    - call: "__x64_sys_mount"
      syscall: true
      selectors:
        - matchPIDs:
            - operator: NotIn
              followForks: true
              values:
                - 1
          matchNamespaces:
            - namespace: Mnt
              operator: NotIn
              values:
                - "host_ns"
          matchActions:
            - action: Sigkill
    # Kill any process opening the cgroup v1 escape paths.
    - call: "security_file_open"
      args:
        - index: 0
          type: "file"
      selectors:
        - matchArgs:
            - index: 0
              operator: "Postfix"
              values:
                - "notify_on_release"
                - "release_agent"
          matchActions:
            - action: Sigkill
```
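As the trade-offs table below recommends, run new policies in audit mode first. The same hook can report matches instead of killing processes by swapping the action. A sketch of the audit-mode variant for the setns hook:

```yaml
# Audit-mode sketch: same kprobe, but Post (report) instead of Sigkill.
# Observe matches with `tetra getevents -o compact` before enforcing.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: container-escape-audit
spec:
  kprobes:
    - call: "__x64_sys_setns"
      syscall: true
      selectors:
        - matchNamespaces:
            - namespace: Pid
              operator: NotIn
              values:
                - "host_ns"
          matchActions:
            - action: Post
```

Run this for at least a week of normal traffic, add exclusions for anything legitimate it flags, and only then switch `Post` back to `Sigkill`.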
Kubernetes Audit Log Patterns
Detect escape-adjacent activity through the API server:
```yaml
# Prometheus alerting rules based on Kubernetes audit log events.
groups:
  - name: container-escape-audit
    rules:
      # Alert: exec into a pod with suspicious commands.
      - alert: SuspiciousPodExec
        expr: >
          sum by (user, namespace, pod) (
            rate(apiserver_audit_event_total{
              verb="create",
              resource="pods/exec",
              request_uri=~".*command=(nsenter|chroot|mount|unshare).*"
            }[5m])
          ) > 0
        for: 1m
        labels:
          severity: critical
          detection_type: container_escape
        annotations:
          summary: >
            Suspicious exec: {{ $labels.user }} ran escape-related
            command in {{ $labels.namespace }}/{{ $labels.pod }}
      # Alert: pod created with privileged security context.
      - alert: PrivilegedPodCreated
        expr: >
          sum by (user, namespace) (
            rate(apiserver_audit_event_total{
              verb="create",
              resource="pods",
              request_object=~".*privileged.*true.*"
            }[5m])
          ) > 0
        for: 1m
        labels:
          severity: warning
          detection_type: privilege_escalation
        annotations:
          summary: >
            Privileged pod created by {{ $labels.user }}
            in {{ $labels.namespace }}
```
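These alerts only fire if the API server's audit policy actually records the relevant requests. A minimal audit policy sketch that captures exec commands and pod creation (adjust levels to your log volume budget):

```yaml
# audit-policy.yaml (sketch): capture the events the alerts above depend on.
# RequestResponse on pods/exec records the command= query parameters;
# Request on pod creation records the submitted securityContext.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: RequestResponse
    verbs: ["create"]
    resources:
      - group: ""
        resources: ["pods/exec", "pods/attach"]
  - level: Request
    verbs: ["create"]
    resources:
      - group: ""
        resources: ["pods"]
```

Without the first rule, `pods/exec` events are logged at whatever default level applies and the `command=` query string may be absent, making the SuspiciousPodExec alert silently useless.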
Automated Response
```yaml
# Falcosidekick configuration: auto-respond to container escape events.
config:
  kubernetesPolicyReport:
    enabled: true
    minimumpriority: "critical"
  webhook:
    address: "http://response-automation:8080/falco"
    minimumpriority: "critical"

# Response actions for container escape:
#   1. Apply the quarantine label while the pod still exists, so the
#      network policy matching it takes effect before anything else.
#   2. Kill the offending pod immediately.
#   3. Cordon the node (prevent new pods from scheduling).
#   4. Page the security team with forensic context.
---
# response-actions.yaml (webhook handler configuration)
actions:
  container_escape:
    rules:
      - "Namespace Manipulation in Container"
      - "Cgroup Escape Attempt"
      - "Container Runtime Socket Access"
    steps:
      - type: kubectl
        command: "label pod {{ .pod }} -n {{ .namespace }} security.quarantine=true"
      - type: kubectl
        command: "delete pod {{ .pod }} -n {{ .namespace }} --grace-period=0"
      - type: kubectl
        command: "cordon {{ .node }}"
      - type: alert
        severity: critical
        channel: "#security-incidents"
```

Note the ordering: the quarantine label must be applied before the pod is deleted; labeling a pod that has already been force-deleted fails, and the network quarantine never takes effect.
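The webhook handler behind `response-automation:8080` can be sketched as a small translation layer from Falco event JSON to ordered `kubectl` commands. A minimal Python sketch (rule names mirror the Falco rules above; the field names assume Falco's default JSON output, and `k8s.node.name` assumes Falco is configured to emit the node):

```python
# response_handler.py -- sketch of the escape-response webhook logic.
import json

# Falco rules that trigger the container_escape action set.
ESCAPE_RULES = {
    "Namespace Manipulation in Container",
    "Cgroup Escape Attempt",
    "Container Runtime Socket Access",
}

def build_response(event: dict) -> list[str]:
    """Map a Falco alert to ordered kubectl commands; [] if no rule matches."""
    if event.get("rule") not in ESCAPE_RULES:
        return []
    fields = event.get("output_fields", {})
    pod = fields.get("k8s.pod.name")
    ns = fields.get("k8s.ns.name")
    node = fields.get("k8s.node.name")  # assumption: Falco emits the node name
    cmds = [
        # Quarantine label first, while the pod still exists, so the
        # network policy matching security.quarantine=true applies.
        f"kubectl label pod {pod} -n {ns} security.quarantine=true --overwrite",
        f"kubectl delete pod {pod} -n {ns} --grace-period=0 --force",
    ]
    if node:
        cmds.append(f"kubectl cordon {node}")
    return cmds

if __name__ == "__main__":
    event = json.loads(
        '{"rule": "Cgroup Escape Attempt", "output_fields": '
        '{"k8s.pod.name": "web-7f9c", "k8s.ns.name": "prod", '
        '"k8s.node.name": "node-3"}}'
    )
    for cmd in build_response(event):
        print(cmd)
```

In production this logic would sit behind an HTTP endpoint and execute via the Kubernetes API rather than shelling out, but the ordering (quarantine, delete, cordon) is the part worth getting right.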
Expected Behaviour
- `nsenter`, `unshare`, and `mount` execution inside containers triggers a CRITICAL alert within seconds
- Writes to `/proc/sysrq-trigger`, cgroup escape paths, and container runtime sockets are detected and blocked
- Tetragon kills escape processes at the kernel level before the escape completes
- Suspicious `kubectl exec` commands are flagged through Kubernetes audit log monitoring
- Privileged pod creation generates a WARNING alert
- Automated response kills the pod, quarantines the workload, and cordons the node within 30 seconds
- False positive rate below 1 per week after excluding `kube-system` and monitoring namespaces
Trade-offs
| Decision | Impact | Risk | Mitigation |
|---|---|---|---|
| Tetragon Sigkill on escape attempt | Blocks escape in real time, before completion | False positive kills a legitimate process | Exclude kube-system, monitoring, and other trusted namespaces. Test rules in audit mode (action: Post) before enabling Sigkill. |
| Auto-cordon node on escape detection | Prevents attacker from scheduling new pods on compromised node | Reduces cluster capacity; may cause scheduling pressure | Auto-uncordon after security team clears the node (within SLA). Ensure sufficient capacity to absorb one cordoned node. |
| Falco + Tetragon (both deployed) | Falco for visibility and alerting; Tetragon for blocking | Two DaemonSets add resource overhead (100-200MB RAM per node) | Use Falco for detection/alerting only. Use Tetragon for enforcement. Do not duplicate rules between them. |
| Excluding kube-system from rules | Reduces false positives from system components | Attacker could deploy malicious workload in kube-system | Restrict kube-system namespace with RBAC and admission control. Alert on any non-system workload deployed to kube-system. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Falco DaemonSet not running on a node | No escape detection on that node | `absent(falco_events_total{node="X"})` or DaemonSet pod count mismatch | Check node taints/tolerations. Ensure the Falco DaemonSet has appropriate tolerations for all nodes. |
| Tetragon policy not loaded | Escape attempts detected by Falco but not blocked | Tetragon logs show a policy parse error; the escape process is not killed | Validate the TracingPolicy with `kubectl describe tracingpolicy`. Check Tetragon agent logs for BPF program load errors. |
| Escape technique uses an unknown vector | No rule matches; the escape succeeds undetected | Post-incident investigation reveals the new technique | Subscribe to container security advisories. Update rules within 48 hours of new CVE disclosure. Add generic behavioural rules (e.g. unexpected capability usage). |
| Auto-response kills a legitimate pod | Service disruption; the pod restarts in a loop | Pod restart count increases; service health checks fail | Review the triggering event. Add an exception for the specific workload if it legitimately needs the detected behaviour. |
| Node cordoned but not uncordoned | Cluster capacity shrinks over time as nodes accumulate cordons | Number of nodes in a schedulable state decreasing | Set a TTL on cordon actions (auto-uncordon after 4 hours unless the security team extends). Alert on nodes cordoned for more than 2 hours. |
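The first and last failure modes can be monitored directly. A Prometheus rule sketch (metric names assume kube-state-metrics is installed and the Falco DaemonSet is named `falco`):

```yaml
# failure-mode-alerts.yaml (sketch): watch the detection pipeline itself.
groups:
  - name: escape-detection-health
    rules:
      # Falco is not ready on every node it should be scheduled on.
      - alert: FalcoDaemonSetGap
        expr: >
          kube_daemonset_status_desired_number_scheduled{daemonset="falco"}
          != kube_daemonset_status_number_ready{daemonset="falco"}
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Falco is not running on every node
      # A node has been unschedulable (cordoned) past the review threshold.
      - alert: NodeCordonedTooLong
        expr: kube_node_spec_unschedulable == 1
        for: 2h
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.node }} cordoned for more than 2 hours"
```

Monitoring the monitor matters here: a silent Falco gap on one node is exactly the blind spot an attacker with node access would try to create.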
When to Consider a Managed Alternative
Self-managed container escape detection requires Falco and/or Tetragon DaemonSet operation, rule maintenance for new CVEs, automated response infrastructure, and regular rule tuning (6-8 hours/month).
- Sysdig (#122): Managed Falco rules with automatic updates for new escape techniques. Drift detection that identifies unexpected changes inside containers. Multi-cluster rule management from a single console.
- Aqua (#123): Runtime protection with container escape prevention built in. Enforcement mode blocks escape attempts without custom rule writing. Integrates vulnerability scanning with runtime detection.
Premium content pack: Container escape Falco rule pack. 20+ rules covering nsenter, unshare, mount, cgroup breakout, proc filesystem abuse, runtime socket access, and host path exploitation. Includes Tetragon TracingPolicies, automated response configurations, and a testing framework for validating detection rules.