Kubernetes Audit Log Pipeline Design: From API Server to SIEM
Problem
Kubernetes audit logging at the RequestResponse level captures everything: every API call, every request body, every response payload. On an active cluster this generates terabytes per week. At the None level, you have zero visibility into who did what. The challenge is designing an audit policy that captures security-relevant events at the right level of detail while keeping storage costs manageable.
The specific problems:
- Default audit logging is off. Most Kubernetes distributions ship with no audit policy. You have no record of who created a privileged pod, who read a secret, or who modified RBAC bindings.
- RequestResponse for everything is unaffordable. Full request and response bodies for every API call generate 500MB-2GB per node per day. At 50 nodes, that is 25-100GB per day before indexing overhead.
- Secrets appear in audit logs. Without field redaction, Secret values are logged in plaintext when someone creates or updates a Secret at the RequestResponse level. This turns your audit log into a credential store.
- Managed Kubernetes restricts access. EKS, GKE, and AKS each expose audit logs differently. EKS sends them to CloudWatch (expensive to query). GKE sends them to Cloud Logging (different schema). AKS requires a diagnostic setting. You cannot simply configure the audit policy file.
- Volume estimation is guesswork. Without understanding how policy levels map to log volume for your specific workload, capacity planning is impossible.
This article provides a production audit policy, volume estimation approach, field redaction, and pipeline design from API server to SIEM.
Target systems: Self-managed Kubernetes (kubeadm, k3s, RKE2) with direct audit policy control. Guidance for EKS, GKE, AKS log access patterns.
Threat Model
- Adversary: An insider or external attacker with valid cluster credentials (stolen kubeconfig, compromised service account). They read secrets, escalate RBAC privileges, create privileged pods, or delete workloads.
- Blast radius: Without audit logs, post-incident investigation is impossible. You cannot determine what the attacker accessed, what they modified, or how they escalated privileges. With properly designed audit logging, every security-relevant API call is recorded with the actor identity, resource, and timestamp.
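For orientation, here is roughly what a single Metadata-level event for a secret read looks like in the audit.k8s.io/v1 format (abridged; usernames, IPs, and timestamps are illustrative):

```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/prod/secrets/db-creds",
  "verb": "get",
  "user": { "username": "jane@example.com", "groups": ["system:authenticated"] },
  "sourceIPs": ["10.0.1.23"],
  "objectRef": { "resource": "secrets", "namespace": "prod", "name": "db-creds" },
  "responseStatus": { "code": 200 },
  "requestReceivedTimestamp": "2024-05-01T12:00:00.000000Z",
  "stageTimestamp": "2024-05-01T12:00:00.002000Z"
}
```

Note that even at Metadata level you get actor, resource, outcome, and timing; only the request and response bodies are omitted.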
Configuration
Audit Policy Design
The policy applies the most verbose level only to security-critical resources:
# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
# Omit stages that are not useful for security analysis.
omitStages:
- "RequestReceived"
rules:
# --- HIGH PRIORITY: RequestResponse for security-critical resources ---
# Secrets: log full request and response to track who created/read/updated.
# WARNING: values are redacted downstream in the shipper (see Vector config);
# the local audit file still contains them, so restrict its permissions.
- level: RequestResponse
resources:
- group: ""
resources: ["secrets"]
verbs: ["create", "update", "patch", "delete"]
# Secret reads: Metadata only (response body contains the secret value).
- level: Metadata
resources:
- group: ""
resources: ["secrets"]
verbs: ["get", "list", "watch"]
# RBAC: full request/response for all mutations.
- level: RequestResponse
resources:
- group: "rbac.authorization.k8s.io"
resources: ["clusterroles", "clusterrolebindings", "roles", "rolebindings"]
# ServiceAccounts: track creation and token reviews.
- level: RequestResponse
resources:
- group: ""
resources: ["serviceaccounts"]
- group: "authentication.k8s.io"
resources: ["tokenreviews"]
# Token issuance (TokenRequest, the serviceaccounts/token subresource):
# Metadata only, because the response body contains the issued token.
- level: Metadata
resources:
- group: ""
resources: ["serviceaccounts/token"]
# --- MEDIUM PRIORITY: Request level for workload mutations ---
# Pod/Deployment/DaemonSet mutations: log the request body.
# Note: pods/exec and pods/portforward are deliberately NOT listed here.
# Rules are evaluated in order and the first match wins, so listing them
# would shadow the dedicated RequestResponse rule below.
- level: Request
resources:
- group: ""
resources: ["pods"]
- group: "apps"
resources: ["deployments", "daemonsets", "statefulsets", "replicasets"]
verbs: ["create", "update", "patch", "delete"]
# Pod exec and port-forward deserve special attention.
- level: RequestResponse
resources:
- group: ""
resources: ["pods/exec", "pods/portforward"]
# --- LOW PRIORITY: Metadata for reads ---
# All other resource reads: just metadata (who, what, when).
- level: Metadata
resources:
- group: ""
resources: ["configmaps", "services", "endpoints", "persistentvolumeclaims"]
verbs: ["get", "list", "watch"]
# --- EXCLUDED: None for noisy, low-value endpoints ---
# Health checks and metrics: no security value, extreme volume.
- level: None
nonResourceURLs:
- "/healthz*"
- "/readyz*"
- "/livez*"
- "/metrics"
- "/openapi/*"
# System components: drop kube-proxy, scheduler, and controller-manager
# churn on endpoints and services.
- level: None
users:
- "system:kube-proxy"
- "system:kube-scheduler"
- "system:kube-controller-manager"
resources:
- group: ""
resources: ["endpoints", "services", "services/status"]
# Catch-all: Metadata for everything else.
- level: Metadata
omitStages:
- "RequestReceived"
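Once the policy is live, a quick sanity check is to count how events distribute across levels: the split should roughly match your volume estimate. The snippet below runs against an inline three-event sample so it is self-contained; in production, point LOG at the audit log path configured in the API server flags.

```shell
# Count audit events per level. LOG uses an inline sample here; in
# production set it to /var/log/kubernetes/audit/audit.log.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
{"level":"Metadata","verb":"get","objectRef":{"resource":"pods"}}
{"level":"Metadata","verb":"list","objectRef":{"resource":"configmaps"}}
{"level":"RequestResponse","verb":"create","objectRef":{"resource":"secrets"}}
EOF
grep -o '"level":"[A-Za-z]*"' "$LOG" | sort | uniq -c | sort -rn
rm -f "$LOG"
```

If RequestResponse dominates the counts on a real cluster, the policy is matching more than intended and the volume estimates below will be badly off.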
API Server Configuration
# kube-apiserver flags (in the static pod manifest or systemd unit)
spec:
containers:
- command:
- kube-apiserver
- --audit-policy-file=/etc/kubernetes/audit-policy.yaml
- --audit-log-path=/var/log/kubernetes/audit/audit.log
- --audit-log-maxage=7
- --audit-log-maxbackup=3
- --audit-log-maxsize=200
# Webhook backend for real-time streaming (alternative to file).
# - --audit-webhook-config-file=/etc/kubernetes/audit-webhook.yaml
# - --audit-webhook-batch-max-wait=5s
volumeMounts:
- mountPath: /etc/kubernetes/audit-policy.yaml
name: audit-policy
readOnly: true
- mountPath: /var/log/kubernetes/audit
name: audit-log
volumes:
- name: audit-policy
hostPath:
path: /etc/kubernetes/audit-policy.yaml
type: File
- name: audit-log
hostPath:
path: /var/log/kubernetes/audit
type: DirectoryOrCreate
Log Shipping with Vector
# vector.yaml: ship audit logs to your SIEM backend.
sources:
k8s_audit:
type: file
include:
- /var/log/kubernetes/audit/audit.log
read_from: beginning
transforms:
parse_audit:
type: remap
inputs: ["k8s_audit"]
source: |
. = parse_json!(.message)
# Redact secret values from request/response objects.
if .objectRef.resource == "secrets" {
del(.requestObject.data)
del(.responseObject.data)
del(.requestObject.stringData)
}
# Normalize fields for SIEM ingestion.
.actor = .user.username
.groups = .user.groups
# VRL has no "//" null-coalescing operator; coalesce missing fields
# with string() plus the ?? error-fallback operator instead.
res = string(.objectRef.resource) ?? "nonResource"
ns = string(.objectRef.namespace) ?? "cluster"
name = string(.objectRef.name) ?? "unknown"
.resource = res + "/" + ns + "/" + name
.action = .verb
.timestamp = .requestReceivedTimestamp
filter_noise:
type: filter
inputs: ["parse_audit"]
condition: |
# Drop events already excluded by policy that slip through.
.verb != "watch" || .objectRef.resource == "secrets"
sinks:
elasticsearch:
type: elasticsearch
inputs: ["filter_noise"]
endpoints:
- "https://elasticsearch.internal:9200"
bulk:
index: "k8s-audit-%Y.%m.%d"
auth:
strategy: basic
user: "${ES_USER}"
password: "${ES_PASSWORD}"
tls:
verify_certificate: true
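After the transform, a secret-create event arrives in Elasticsearch with the data fields stripped and the normalized fields appended, roughly like this (values illustrative):

```json
{
  "level": "RequestResponse",
  "verb": "create",
  "user": { "username": "ci-deployer@example.com" },
  "objectRef": { "resource": "secrets", "namespace": "prod", "name": "db-creds" },
  "requestObject": { "kind": "Secret", "metadata": { "name": "db-creds" } },
  "actor": "ci-deployer@example.com",
  "resource": "secrets/prod/db-creds",
  "action": "create",
  "timestamp": "2024-05-01T12:00:00Z"
}
```

The who/what/when is preserved for investigation; the credential itself never leaves the node.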
Volume Estimation
Use this formula to estimate daily audit log volume before enabling the policy:
Daily volume = (API requests/day) x (average event size) x (policy multiplier)
Policy multipliers (approximate):
None: 0 bytes
Metadata: ~500 bytes per event
Request: ~2 KB per event
RequestResponse: ~5-10 KB per event
Example (50-node cluster, moderate activity):
Total API calls/day: 2,000,000
Breakdown by policy:
None (health/metrics): 800,000 x 0 = 0
Metadata (reads): 900,000 x 500B = 450 MB
Request (mutations): 250,000 x 2 KB = 500 MB
RequestResponse (RBAC): 50,000 x 5 KB = 250 MB
Total: ~1.2 GB/day
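The example arithmetic can be checked, or re-run with your own counts, in a single awk call. The per-event sizes below use 2048 and 5120 bytes for the 2 KB and 5 KB figures, which is close enough for capacity planning:

```shell
# Recompute the 50-node example. Swap in your own counts and sizes.
awk 'BEGIN {
  meta = 900000 * 500     # Metadata reads, ~500 B each
  req  = 250000 * 2048    # Request-level mutations, ~2 KB each
  rr   =  50000 * 5120    # RequestResponse (RBAC, secrets), ~5 KB each
  printf "%.1f GB/day\n", (meta + req + rr) / 1e9
}'
# prints "1.2 GB/day"
```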
Managed Kubernetes Differences
# EKS: enable audit logs via cluster logging configuration.
aws eks update-cluster-config \
--name production \
--logging '{"clusterLogging":[{"types":["audit"],"enabled":true}]}'
# Logs go to CloudWatch Log Group: /aws/eks/production/cluster
# Cost: $0.50/GB ingested + $0.03/GB stored/month
# GKE: audit logs are enabled by default in Cloud Audit Logs.
# Admin Activity logs: free, always on.
# Data Access logs: must be enabled, billed at Cloud Logging rates.
gcloud projects get-iam-policy PROJECT_ID \
--format=json | jq '.auditConfigs'
# AKS: enable via diagnostic settings.
az monitor diagnostic-settings create \
--name aks-audit \
--resource "/subscriptions/.../managedClusters/production" \
--logs '[{"category":"kube-audit-admin","enabled":true}]' \
--workspace "/subscriptions/.../workspaces/security-logs"
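On EKS, where the audit stream lands in CloudWatch rather than a file, the equivalent of grepping the log is a Logs Insights query against the cluster log group. A sketch for recent secret reads (field names follow the audit event schema shown earlier):

```
fields @timestamp, user.username, objectRef.namespace, objectRef.name
| filter objectRef.resource = "secrets" and verb = "get"
| sort @timestamp desc
| limit 50
```

Insights queries are billed per GB scanned, so constrain the time range before running them over a high-volume log group.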
Expected Behaviour
- Security-critical resources (secrets, RBAC, service accounts) logged at RequestResponse level
- Pod exec and port-forward commands logged with full request and response
- Health checks and system component activity excluded (80% volume reduction)
- Secret values redacted from all log entries before shipping
- Daily log volume between 1-3 GB for a 50-node cluster (vs 25-100 GB at full RequestResponse)
- Audit events available in SIEM within 60 seconds of API call
Trade-offs
| Decision | Impact | Risk | Mitigation |
|---|---|---|---|
| Metadata-only for secret reads | Avoids logging secret values on read operations | Cannot see which specific secret version was read | Combine with Vault audit logs if detailed secret access tracking is needed. |
| None for health checks and system components | 80% volume reduction | Misses attacks that abuse health check endpoints | Monitor health check endpoint response codes separately with Prometheus. |
| File-based audit backend (not webhook) | Simpler setup; survives backend outages | Log delay if shipping agent falls behind | Monitor file size growth rate. Alert if audit log file exceeds 100MB (shipping lag). |
| 7-day local retention (maxage=7) | Limits disk usage on control plane nodes | Local logs lost after 7 days if shipping fails | Central SIEM is the primary store. Local retention is backup only. Alert on shipping failures. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Audit policy file missing or malformed | API server refuses to start or starts without audit | API server logs show policy parse error; no audit events in SIEM | Lint the policy YAML before rollout (the policy file is not a cluster resource, so kubectl cannot validate it). Keep a known-good policy as fallback. |
| Audit log volume exceeds disk capacity | Control plane node disk full; API server crashes | Disk usage alert on control plane nodes | Reduce maxsize and maxbackup. Add None rules for additional high-volume, low-value resources. |
| Vector shipping lag | Audit events delayed by minutes or hours | Lag metric in Vector dashboard; SIEM freshness alert | Scale Vector resources. Increase batch size. Check network throughput to SIEM backend. |
| Secret values not redacted | Secrets visible in SIEM to anyone with log access | Periodic audit: search SIEM for objectRef.resource:secrets AND requestObject.data:* | Fix Vector transform. Re-index affected time range with redaction. Rotate exposed secrets. |
| Managed K8s cost explosion | CloudWatch or Cloud Logging bill spikes unexpectedly | Billing alerts on logging cost | Add subscription-level log filters in CloudWatch/Cloud Logging to drop non-security events before storage. |
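The "secret values not redacted" check is easy to script. The snippet below runs against an inline two-event sample standing in for a JSON-lines export of recent SIEM events (a hypothetical file; in a real check, feed it the export). It flags any secret event that still carries a data field:

```shell
# Spot-check shipped events for secret payloads that escaped redaction.
# SAMPLE stands in for a JSON-lines export of recent SIEM events.
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
{"verb":"create","objectRef":{"resource":"secrets"},"requestObject":{"kind":"Secret","metadata":{"name":"db-creds"}}}
{"verb":"get","objectRef":{"resource":"configmaps","name":"app-config"}}
EOF
# A correctly redacted pipeline leaves no "data" field on secret events.
if grep '"resource":"secrets"' "$SAMPLE" | grep -q '"data":'; then
  echo "UNREDACTED SECRET DATA FOUND"
else
  echo "clean"
fi
rm -f "$SAMPLE"
# prints "clean"
```

Run it on a schedule against a fresh export; any hit means the Vector transform regressed and the exposed secrets need rotating.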
When to Consider a Managed Alternative
Self-managed Kubernetes audit logging requires API server configuration, log shipping infrastructure, storage capacity planning, and ongoing policy tuning (2-4 hours/month).
- Grafana Cloud (#108): Managed Loki backend for audit log storage. Pre-built dashboards for K8s audit analysis. No Elasticsearch cluster management.
- Axiom (#112): Schemaless ingestion handles audit log format changes across K8s versions. Cost-effective storage with fast query performance. No index management.
- Sysdig (#122): K8s-native audit analysis with pre-built detection rules. Automatic correlation of audit events with runtime security events.
Premium content pack: Kubernetes audit policy templates. Policies tuned for CIS Benchmark, SOC 2, and PCI-DSS compliance requirements. Includes Vector transforms, Elasticsearch index templates, and Grafana dashboards.