eBPF-Based Security Monitoring: Tetragon for Process, Network, and File Observability
Problem
Falco monitors syscalls for runtime detection. Tetragon (CNCF/Cilium) goes deeper: it monitors process execution, network connections, and file access at the eBPF level with enforcement capability, it can block actions in real time, not just detect them. Most engineers do not know Tetragon exists or how it complements Falco.
Key differences:
- Falco: Detection-only (alert on suspicious activity). Kernel module or eBPF driver. Rule-based matching against syscall events.
- Tetragon: Detection AND enforcement (alert and/or block). eBPF-only (no kernel module). TracingPolicy CRDs for Kubernetes-native configuration. Can SIGKILL a process that violates policy.
Use both: Falco for broad behavioural detection rules, Tetragon for targeted enforcement on specific high-risk actions.
Target systems: Kubernetes 1.29+ with kernel 5.8+ (eBPF requirement). Tetragon 1.2+.
Threat Model
- Adversary: Attacker with code execution inside a container, attempting: reverse shell, credential theft, container escape, crypto mining, or data exfiltration.
- Key advantage of Tetragon over detection-only: Tetragon can kill the malicious process before the attack completes, not just alert after it succeeds.
Configuration
Deployment
helm repo add cilium https://helm.cilium.io
helm repo update
helm install tetragon cilium/tetragon \
--namespace kube-system \
--set tetragon.grpc.enabled=true \
--set tetragon.exportFilename=/var/run/cilium/tetragon/tetragon.log \
--set tetragon.resources.requests.cpu=100m \
--set tetragon.resources.requests.memory=256Mi \
--set tetragon.resources.limits.cpu=1000m \
--set tetragon.resources.limits.memory=512Mi
# Install the tetra CLI for event viewing
GOOS=$(go env GOOS)
GOARCH=$(go env GOARCH)
curl -L "https://github.com/cilium/tetragon/releases/latest/download/tetra-${GOOS}-${GOARCH}.tar.gz" | tar -xz
sudo mv tetra /usr/local/bin/
# Verify Tetragon is running:
kubectl get pods -n kube-system -l app.kubernetes.io/name=tetragon
# Expected: one pod per node, Running
# View live events:
kubectl exec -n kube-system ds/tetragon -c tetragon -- tetra getevents -o compact
TracingPolicy: Process Execution Monitoring
# process-monitor.yaml
# Monitor all process executions in production namespaces.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: monitor-process-execution
spec:
kprobes:
- call: "sys_execve"
syscall: true
return: false
args:
- index: 0
type: "string" # filename (binary path)
selectors:
- matchNamespaces:
- namespace: production
operator: In
# View process execution events:
kubectl exec -n kube-system ds/tetragon -c tetragon -- \
tetra getevents -o compact --namespaces production
# Example output:
# π process production/nginx-7d8f9b/nginx /usr/sbin/nginx
# π process production/nginx-7d8f9b/nginx /bin/sh -c "echo test" β SUSPICIOUS
TracingPolicy: Block Specific Binaries (Enforcement)
# block-crypto-miners.yaml
# Kill any known crypto mining binary immediately.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: block-crypto-miners
spec:
kprobes:
- call: "sys_execve"
syscall: true
args:
- index: 0
type: "string"
selectors:
- matchArgs:
- index: 0
operator: "In"
values:
- "/usr/bin/xmrig"
- "/tmp/xmrig"
- "/usr/local/bin/minerd"
- "/tmp/minerd"
- "/usr/bin/cpuminer"
matchActions:
- action: Sigkill # Kill the process immediately
Warning: Enforcement (Sigkill) has no undo. If the policy matches a legitimate process, it is killed instantly. Test enforcement policies in monitoring-only mode first (remove matchActions), verify zero false positives for 7 days, then enable Sigkill.
TracingPolicy: Network Connection Monitoring
# network-monitor.yaml
# Monitor outbound TCP connections from production pods.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: monitor-outbound-connections
spec:
kprobes:
- call: "tcp_connect"
syscall: false
args:
- index: 0
type: "sock"
selectors:
- matchNamespaces:
- namespace: production
operator: In
TracingPolicy: Sensitive File Access
# file-access-monitor.yaml
# Monitor access to sensitive files in containers.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
name: monitor-sensitive-files
spec:
kprobes:
- call: "fd_install"
syscall: false
args:
- index: 0
type: "int"
- index: 1
type: "file"
selectors:
- matchArgs:
- index: 1
operator: "Prefix"
values:
- "/etc/shadow"
- "/etc/passwd"
- "/run/secrets/kubernetes.io/serviceaccount/token"
- "/proc/1/environ"
matchNamespaces:
- namespace: production
operator: In
Event Export to Grafana Cloud
# vector-tetragon.yaml - ship Tetragon events to centralized logging
sources:
tetragon_events:
type: file
include:
- /var/run/cilium/tetragon/tetragon.log
transforms:
parse_tetragon:
type: remap
inputs: [tetragon_events]
source: |
. = parse_json!(.message)
.source = "tetragon"
sinks:
grafana_cloud:
type: loki
inputs: [parse_tetragon]
endpoint: "https://logs-prod-us-central1.grafana.net"
auth:
strategy: basic
user: "${LOKI_USER}"
password: "${LOKI_API_KEY}"
labels:
source: tetragon
event_type: "{{ process_exec.process.binary }}"
namespace: "{{ process_exec.process.pod.namespace }}"
Tetragon vs Falco: When to Use Each
| Scenario | Tool | Reason |
|---|---|---|
| Broad behavioural detection (unexpected processes) | Falco | Falcoβs rule language is more expressive for behavioural patterns |
| Enforce: kill crypto mining binaries | Tetragon | Enforcement capability (Sigkill) |
| Network flow baselines and anomaly detection | Tetragon + Hubble | Native integration with Cilium networking |
| Sensitive file access monitoring | Tetragon | Lower overhead per-file monitoring |
| Container escape detection | Both | Falco for broad patterns, Tetragon for specific enforcement |
| Multi-cluster rule management | Falco + Sysdig (#122) | Better managed rule distribution ecosystem |
Recommendation: Deploy both. Falco for broad detection and alert routing (via Falcosidekick). Tetragon for targeted enforcement on known-dangerous actions and deep file/network visibility.
Expected Behaviour
- Tetragon DaemonSet running on all nodes (pod count = node count)
- Process execution events visible in
tetra getevents - Enforcement policies kill targeted binaries within milliseconds
- Sensitive file access events exported to Grafana Cloud (#108)
- Network connection events correlated with Hubble flow data
- CPU overhead below 2% per node
# Verify enforcement:
kubectl exec -n production deploy/test -- /tmp/test-binary
# If the binary matches a block policy: process is killed immediately
# tetra events show: π exit production/test/test /tmp/test-binary SIGKILL
Trade-offs
| Control | Impact | Risk | Mitigation |
|---|---|---|---|
| Enforcement (Sigkill) | Instant process termination | Kills legitimate processes if policy is too broad | Monitor-only for 7 days before enabling enforcement. Use precise path matching. |
| Process monitoring (all execve) | Complete visibility into all process execution | High event volume on busy nodes (1000+ events/sec) | Filter by namespace. Export only security-relevant events. |
| File access monitoring | Detect credential access, config modification | I/O overhead for high-throughput file access | Monitor specific paths only (not all file access). |
| eBPF requirement (kernel 5.8+) | No kernel module needed | Some older cloud provider images have kernel <5.8 | Check node kernel version before deployment. Upgrade node image if needed. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| TracingPolicy syntax error | Policy not applied; no events generated | kubectl get tracingpolicies shows error status; kubectl describe shows validation error |
Fix YAML syntax. Reapply. Tetragon validates policies at admission. |
| Enforcement kills legitimate process | Application service disrupted | Service monitoring detects pod crash; Tetragon event log shows Sigkill | Remove or narrow the enforcement policy. Restart the affected pod. Test policy in monitor-only mode for longer. |
| Tetragon OOM on busy node | Tetragon pod crashloops; monitoring gap | Pod restart count increases; DaemonSet availability drops | Increase memory limits. Reduce TracingPolicy scope (fewer monitored namespaces or paths). |
| Event export pipeline fails | Events generated but not shipped to centralized storage | Tetragon events visible locally but not in Grafana Cloud; Vector delivery metrics show failures | Fix Vector pipeline. Events are buffered locally and ship when the pipeline recovers. |
When to Consider a Managed Alternative
Tetragon event volume at scale requires managed storage. TracingPolicy lifecycle management across clusters requires centralized distribution.
- Isovalent (#54) Cilium Enterprise: Managed Tetragon with policy lifecycle management, UI for TracingPolicy creation, and multi-cluster policy distribution.
- Sysdig (#122): eBPF-based monitoring with managed platform. Not Tetragon-native but provides similar eBPF process/network/file monitoring.
- Grafana Cloud (#108): For Tetragon event storage and dashboarding.
Premium content pack: Tetragon TracingPolicy collection. policies for process execution monitoring (per container image), crypto miner enforcement (block + kill), sensitive file access detection, network connection monitoring, and Grafana dashboards for Tetragon events.