Kubernetes Audit Log Analysis: What to Log, How to Query, and What to Alert On
Problem
Kubernetes audit logs record every request to the API server: who made the request, what they asked for, and whether it succeeded. In an active cluster, this generates 1-10 GB per day at the Metadata level and 5-50 GB per day at the RequestResponse level. Without filtering, storage costs are prohibitive. Without analysis, security-relevant events are buried in millions of routine API calls.
The core challenges are:
- Logging everything is too expensive. The RequestResponse level records the full request and response body for every API call. For a cluster with frequent pod scheduling, configmap reads, and health checks, this produces terabytes of data per month. Most of it is noise.
- Logging nothing is blind. The None level disables all audit logging. You have no record of who created a privileged pod, who read a secret, or who modified RBAC. Incident response becomes guesswork.
- The useful signal is in the middle. Secret access, RBAC changes, exec into pods, service account token creation, and node status changes are high-value events. Health checks, metrics scraping, and leader election updates are noise. The audit policy must distinguish between the two.
- Shipping and querying at scale requires infrastructure. Audit logs must be forwarded to a SIEM or log aggregation system, parsed, indexed, and queryable. The Kubernetes API server writes logs to files or webhooks; getting them into Loki, Elasticsearch, or a managed backend requires additional configuration.
This article covers audit policy design, per-resource filtering, sensitive field redaction, log shipping, and the top 10 suspicious patterns to alert on.
Target systems: Kubernetes 1.29+ (self-managed). Managed providers expose audit logs through their own interfaces (CloudWatch for EKS, Cloud Logging for GKE, Azure Monitor for AKS).
Threat Model
- Adversary: Insider with legitimate cluster access attempting privilege escalation, or an external attacker who has compromised a ServiceAccount or user credential.
- Access level: Valid Kubernetes API credentials (user certificate, ServiceAccount token, or OIDC token) with some level of RBAC permissions.
- Objective: Escalate privileges (create ClusterRoleBindings), access sensitive data (list secrets), establish persistence (create new ServiceAccounts, modify webhooks), or cover tracks (delete audit-relevant resources).
- Blast radius: Without audit logging, these actions leave no trace. With proper audit logging and alerting, each suspicious action triggers a detection, enabling incident response within minutes instead of days or weeks.
Configuration
Step 1: Audit Policy Design
The audit policy defines four logging levels per resource type:
| Level | What is logged | Storage impact |
|---|---|---|
| None | Nothing | Zero |
| Metadata | Request metadata (user, verb, resource, timestamp) | Low (200-500 bytes per event) |
| Request | Metadata + request body | Medium (1-5 KB per event) |
| RequestResponse | Metadata + request body + response body | High (5-50 KB per event) |
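At the Metadata level, each audit entry is a single JSON object. A representative event, abbreviated for illustration (real events also carry annotations, stage timestamps, and source IPs), and the who/what/when fields that nearly every detection query keys on:

```python
import json

# Abbreviated Metadata-level audit event (illustrative sample)
event_json = '''
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  "verb": "get",
  "user": {"username": "jane", "groups": ["developers"]},
  "objectRef": {"resource": "pods", "namespace": "prod", "name": "api-7f9"},
  "responseStatus": {"code": 200},
  "requestReceivedTimestamp": "2024-05-01T12:00:00.000000Z"
}
'''

event = json.loads(event_json)
# The who/what/when triple used by almost every detection query:
who = event["user"]["username"]
what = f'{event["verb"]} {event["objectRef"]["resource"]}/{event["objectRef"]["name"]}'
when = event["requestReceivedTimestamp"]
print(who, what, when)  # jane get pods/api-7f9 2024-05-01T12:00:00.000000Z
```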
# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# ============================================
# Level 1: Skip noise (None)
# ============================================
# Skip all requests to health check and readiness endpoints
- level: None
nonResourceURLs:
- /healthz*
- /readyz*
- /livez*
- /metrics
# Skip high-volume reads of endpoints and events (constant controller traffic)
- level: None
resources:
- group: ""
resources: ["endpoints", "events"]
verbs: ["get", "list", "watch"]
# Skip watch requests (long-running, generate massive volume).
# Note: rules match in order, so this also suppresses watches on secrets;
# move the secrets rule above this one if secret watches must be logged.
- level: None
verbs: ["watch"]
# Skip kube-scheduler and kube-controller-manager lease reads
# (constant heartbeats and lease updates)
- level: None
users:
- "system:kube-scheduler"
- "system:kube-controller-manager"
verbs: ["get", "list"]
resources:
- group: "coordination.k8s.io"
resources: ["leases"]
# ============================================
# Level 2: High-value targets (RequestResponse)
# ============================================
# Log full request and response for secrets
- level: RequestResponse
resources:
- group: ""
resources: ["secrets"]
omitStages:
- RequestReceived
# Log full RBAC changes
- level: RequestResponse
resources:
- group: "rbac.authorization.k8s.io"
resources: ["clusterroles", "clusterrolebindings", "roles", "rolebindings"]
omitStages:
- RequestReceived
# Log full request for service account token creation
- level: RequestResponse
resources:
- group: ""
resources: ["serviceaccounts/token"]
omitStages:
- RequestReceived
# ============================================
# Level 3: Mutation tracking (Request)
# ============================================
# Log request body for pod, deployment, and daemonset mutations
- level: Request
resources:
- group: ""
resources: ["pods", "pods/exec", "pods/portforward"]
- group: "apps"
resources: ["deployments", "daemonsets", "statefulsets"]
verbs: ["create", "update", "patch", "delete"]
omitStages:
- RequestReceived
# Log webhook configuration changes (persistence vector)
- level: Request
resources:
- group: "admissionregistration.k8s.io"
resources: ["mutatingwebhookconfigurations", "validatingwebhookconfigurations"]
omitStages:
- RequestReceived
# Log namespace lifecycle
- level: Request
resources:
- group: ""
resources: ["namespaces"]
verbs: ["create", "delete"]
omitStages:
- RequestReceived
# ============================================
# Level 4: Everything else (Metadata)
# ============================================
# Default: log metadata for all other requests
- level: Metadata
omitStages:
- RequestReceived
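One subtlety worth internalizing: rules are evaluated top to bottom and the first match wins, which is why the None rules precede the Metadata catch-all. The toy evaluator below (a sketch of the semantics, not the real implementation) shows a side effect of this ordering: the blanket watch exclusion also suppresses watches on secrets, despite the later RequestResponse rule.

```python
# Toy first-match evaluation of an audit policy: the level of a request
# is decided by the FIRST rule whose fields all match (sketch only).
RULES = [
    {"level": "None", "verbs": {"watch"}},
    {"level": "RequestResponse", "resources": {"secrets"}},
    {"level": "Metadata"},  # catch-all, like the final rule above
]

def audit_level(verb: str, resource: str) -> str:
    for rule in RULES:
        if "verbs" in rule and verb not in rule["verbs"]:
            continue
        if "resources" in rule and resource not in rule["resources"]:
            continue
        return rule["level"]
    return "Metadata"

# A watch on secrets is dropped by the earlier None rule, even though
# a later rule logs secrets at RequestResponse:
print(audit_level("watch", "secrets"))  # None
print(audit_level("get", "secrets"))    # RequestResponse
print(audit_level("get", "pods"))       # Metadata
```

If secret watches must be audited in your environment, move the secrets rule above the watch exclusion.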
Step 2: Enable Audit Logging on the API Server
For kubeadm clusters:
# /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
name: kube-apiserver
namespace: kube-system
spec:
containers:
- name: kube-apiserver
command:
- kube-apiserver
- --audit-policy-file=/etc/kubernetes/audit-policy.yaml
- --audit-log-path=/var/log/kubernetes/audit/audit.log
- --audit-log-maxage=30
- --audit-log-maxbackup=10
- --audit-log-maxsize=100
# ... other existing flags ...
volumeMounts:
- name: audit-policy
mountPath: /etc/kubernetes/audit-policy.yaml
readOnly: true
- name: audit-log
mountPath: /var/log/kubernetes/audit
volumes:
- name: audit-policy
hostPath:
path: /etc/kubernetes/audit-policy.yaml
type: File
- name: audit-log
hostPath:
path: /var/log/kubernetes/audit
type: DirectoryOrCreate
For k3s:
# Create the audit policy
sudo mkdir -p /var/lib/rancher/k3s/server/
sudo cp audit-policy.yaml /var/lib/rancher/k3s/server/audit-policy.yaml
# Add to k3s server configuration
# /etc/rancher/k3s/config.yaml
# kube-apiserver-arg:
# - "audit-policy-file=/var/lib/rancher/k3s/server/audit-policy.yaml"
# - "audit-log-path=/var/log/kubernetes/audit/audit.log"
# - "audit-log-maxage=30"
# - "audit-log-maxbackup=10"
# - "audit-log-maxsize=100"
sudo systemctl restart k3s
Step 3: Sensitive Field Redaction
Prevent passwords and tokens from appearing in audit logs, even at the RequestResponse level:
# Add to the audit policy (top-level field alongside rules)
apiVersion: audit.k8s.io/v1
kind: Policy
omitManagedFields: true
rules:
# Log secrets at RequestResponse but redact the data field
- level: RequestResponse
resources:
- group: ""
resources: ["secrets"]
omitStages:
- RequestReceived
# Note: the audit Policy API has no field-level redaction, so secret
# values still appear in RequestResponse events. Strip them in the
# log pipeline, or use a webhook backend that removes sensitive
# fields before forwarding to the SIEM
# ... rest of the policy ...
To keep secret values out of the SIEM, strip them in the log pipeline:
# Fluent Bit filter to redact secret data from audit logs
# fluent-bit-config.yaml
[FILTER]
Name lua
Match kube-audit.*
Script /fluent-bit/scripts/redact-secrets.lua
Call redact_secrets
-- /fluent-bit/scripts/redact-secrets.lua
function redact_secrets(tag, timestamp, record)
if record["objectRef"] and record["objectRef"]["resource"] == "secrets" then
if record["requestObject"] and record["requestObject"]["data"] then
record["requestObject"]["data"] = "[REDACTED]"
end
if record["responseObject"] and record["responseObject"]["data"] then
record["responseObject"]["data"] = "[REDACTED]"
end
end
return 1, timestamp, record
end
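The same logic, mirrored in Python, lets you unit-test the redaction against captured log samples before deploying the Fluent Bit filter (field names match the Lua script above):

```python
def redact_secrets(record: dict) -> dict:
    """Blank out secret data fields, mirroring the Lua filter above."""
    obj_ref = record.get("objectRef") or {}
    if obj_ref.get("resource") == "secrets":
        for key in ("requestObject", "responseObject"):
            body = record.get(key)
            if body and "data" in body:
                body["data"] = "[REDACTED]"
    return record

# Sample RequestResponse event for a secret (illustrative):
sample = {
    "objectRef": {"resource": "secrets", "name": "db-creds"},
    "responseObject": {"data": {"password": "aHVudGVyMg=="}},
}
print(redact_secrets(sample)["responseObject"]["data"])  # [REDACTED]
```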
Step 4: Ship Audit Logs to a SIEM
Option A: Fluent Bit to Loki
# fluent-bit-daemonset.yaml (audit log shipper on control plane nodes)
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: audit-log-shipper
namespace: monitoring
spec:
selector:
matchLabels:
app: audit-log-shipper
template:
metadata:
labels:
app: audit-log-shipper
spec:
nodeSelector:
node-role.kubernetes.io/control-plane: ""
tolerations:
- key: node-role.kubernetes.io/control-plane
effect: NoSchedule
containers:
- name: fluent-bit
image: fluent/fluent-bit:3.1
volumeMounts:
- name: audit-log
mountPath: /var/log/kubernetes/audit
readOnly: true
- name: config
mountPath: /fluent-bit/etc
env:
- name: LOKI_URL
value: "http://loki.monitoring.svc.cluster.local:3100/loki/api/v1/push"
volumes:
- name: audit-log
hostPath:
path: /var/log/kubernetes/audit
type: Directory
- name: config
configMap:
name: fluent-bit-audit-config
---
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-audit-config
namespace: monitoring
data:
fluent-bit.conf: |
[SERVICE]
Flush 5
Log_Level info
Parsers_File parsers.conf
[INPUT]
Name tail
Path /var/log/kubernetes/audit/audit.log
Parser json
Tag kube-audit
Refresh_Interval 5
[OUTPUT]
Name loki
Match kube-audit
Host loki.monitoring.svc.cluster.local
Port 3100
Labels job=kube-audit
Auto_Kubernetes_Labels off
parsers.conf: |
# Mounting this ConfigMap at /fluent-bit/etc shadows the image's
# default parsers.conf, so the json parser must be redefined here
[PARSER]
Name json
Format json
Option B: Webhook backend (direct streaming)
# /etc/kubernetes/audit-webhook.yaml
apiVersion: v1
kind: Config
clusters:
- name: audit-backend
cluster:
# The webhook POSTs batches of audit events as an EventList JSON object;
# point it at a receiver that accepts this format (an Elasticsearch
# _bulk endpoint will not parse it directly)
server: https://audit-ingest.internal.example.com:8443/audit
certificate-authority: /etc/kubernetes/pki/siem-ca.crt
contexts:
- name: audit
context:
cluster: audit-backend
current-context: audit
# Add to kube-apiserver flags:
# --audit-webhook-config-file=/etc/kubernetes/audit-webhook.yaml
# --audit-webhook-batch-max-wait=5s
# --audit-webhook-batch-max-size=100
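The webhook backend POSTs batches of events wrapped in an `EventList` object. A minimal stdlib receiver sketch showing the payload shape (the port is an assumption; a real receiver must terminate TLS with the certificate authority referenced in the kubeconfig above):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class AuditSink(BaseHTTPRequestHandler):
    """Accepts audit EventList POSTs from the kube-apiserver webhook."""

    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        event_list = json.loads(body)  # kind: EventList, apiVersion: audit.k8s.io/v1
        for event in event_list.get("items", []):
            user = event.get("user", {}).get("username", "?")
            verb = event.get("verb", "?")
            print(f"audit: {user} {verb}")  # forward to the SIEM here
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):  # silence per-request stderr noise
        pass

# To run: HTTPServer(("0.0.0.0", 8443), AuditSink).serve_forever()
```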
Step 5: Top 10 Suspicious Patterns to Alert On
These are the highest-signal audit log patterns that indicate potential security incidents:
| # | Pattern | LogQL Query (Loki) | Severity |
|---|---|---|---|
| 1 | Secret list across namespaces | {job="kube-audit"} \| json \| objectRef_resource="secrets" and verb="list" and objectRef_namespace="" | Critical |
| 2 | ClusterRoleBinding creation | {job="kube-audit"} \| json \| objectRef_resource="clusterrolebindings" and verb="create" | Critical |
| 3 | Exec into pod | {job="kube-audit"} \| json \| objectRef_subresource="exec" and verb="create" | High |
| 4 | ServiceAccount token request | {job="kube-audit"} \| json \| objectRef_resource="serviceaccounts" and objectRef_subresource="token" and verb="create" | High |
| 5 | Webhook configuration change | {job="kube-audit"} \| json \| objectRef_resource=~"mutatingwebhookconfigurations\|validatingwebhookconfigurations" | Critical |
| 6 | Privileged pod creation | {job="kube-audit"} \|= "\"privileged\":true" \| json \| objectRef_resource="pods" and verb="create" | Critical |
| 7 | Namespace deletion | {job="kube-audit"} \| json \| objectRef_resource="namespaces" and verb="delete" | High |
| 8 | Anonymous or unauthenticated requests | {job="kube-audit"} \| json \| user_username="system:anonymous" | Critical |
| 9 | RBAC escalation (bind/escalate verbs) | {job="kube-audit"} \| json \| verb=~"bind\|escalate" | Critical |
| 10 | Node status patch from unexpected source | {job="kube-audit"} \| json \| objectRef_resource="nodes" and verb="patch" and user_username!~"system:node:.*" | High |
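The same patterns can be checked offline against the raw audit.log, which contains one JSON event per line. Pattern 1 as a short script (field names follow the audit Event schema; the sample lines are illustrative):

```python
import json

def secret_list_all_namespaces(lines):
    """Yield (user, timestamp) for cross-namespace secret lists (pattern 1)."""
    for line in lines:
        event = json.loads(line)
        ref = event.get("objectRef") or {}
        if (ref.get("resource") == "secrets"
                and event.get("verb") == "list"
                and not ref.get("namespace")):  # no namespace = all namespaces
            yield event["user"]["username"], event.get("requestReceivedTimestamp")

# Two sample log lines: one hit, one routine namespaced read
sample = [
    '{"verb":"list","user":{"username":"jane"},"objectRef":{"resource":"secrets"}}',
    '{"verb":"get","user":{"username":"ci"},"objectRef":{"resource":"pods","namespace":"prod"}}',
]
print(list(secret_list_all_namespaces(sample)))  # [('jane', None)]
```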
Example alerting rules in PrometheusRule format. Note that the expressions are LogQL, so they must be evaluated by the Loki ruler (which accepts this format), not by Prometheus itself:
# audit-alerts.yaml (PrometheusRule)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: kubernetes-audit-alerts
namespace: monitoring
spec:
groups:
- name: kubernetes-audit
interval: 30s
rules:
- alert: SecretListAllNamespaces
expr: |
count by (user_username) (
count_over_time(
{job="kube-audit"} | json
| objectRef_resource="secrets"
| verb="list"
| objectRef_namespace=""
[5m]
)
) > 0
for: 0m
labels:
severity: critical
annotations:
summary: "User {{ $labels.user_username }} listed secrets across all namespaces"
description: "Listing secrets without a namespace filter is a common reconnaissance technique."
- alert: ClusterRoleBindingCreated
expr: |
count by (user_username) (
count_over_time(
{job="kube-audit"} | json
| objectRef_resource="clusterrolebindings"
| verb="create"
[5m]
)
) > 0
for: 0m
labels:
severity: critical
annotations:
summary: "User {{ $labels.user_username }} created a ClusterRoleBinding"
description: "ClusterRoleBinding creation grants cluster-wide permissions. Verify this was authorized."
- alert: ExecIntoPod
expr: |
sum by (user_username, objectRef_namespace) (
count_over_time(
{job="kube-audit"} | json
| objectRef_subresource="exec"
| verb="create"
[5m]
)
) > 3
for: 0m
labels:
severity: high
annotations:
summary: "User {{ $labels.user_username }} exec'd into pods in {{ $labels.objectRef_namespace }} more than 3 times in 5 minutes"
Step 6: Storage Estimation
Use these baselines to estimate audit log storage requirements:
| Cluster Activity | Metadata Only | Request (mutations) | RequestResponse (secrets + RBAC) |
|---|---|---|---|
| Small (50 pods, 5 deploys/day) | 500 MB/day | 1 GB/day | 2 GB/day |
| Medium (200 pods, 20 deploys/day) | 2 GB/day | 5 GB/day | 10 GB/day |
| Large (1000 pods, 100 deploys/day) | 10 GB/day | 25 GB/day | 50 GB/day |
With the tiered policy from Step 1 (None for noise, Metadata for reads, Request for mutations, RequestResponse for secrets and RBAC), expect 40-60% reduction compared to a flat RequestResponse policy.
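The table's figures are just events/day × bytes/event. A back-of-the-envelope estimator for the tiered policy (the event rates and per-event sizes below are assumptions; measure your own with a short trial run):

```python
def daily_gb(events_per_day: float, bytes_per_event: float) -> float:
    """Daily audit log volume in GB for one event class."""
    return events_per_day * bytes_per_event / 1e9

# Medium cluster under the tiered policy (assumed event mix):
reads_metadata    = daily_gb(5_000_000, 350)     # Metadata-level reads, ~350 B each
mutations_request = daily_gb(50_000, 3_000)      # Request-level mutations, ~3 KB each
secrets_rbac_full = daily_gb(10_000, 20_000)     # RequestResponse events, ~20 KB each
total = reads_metadata + mutations_request + secrets_rbac_full
print(f"~{total:.1f} GB/day")  # ~2.1 GB/day
```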
Expected Behaviour
After implementing audit logging and analysis:
- Health checks, metrics endpoints, and watch requests produce no audit log entries (filtered to None)
- Secret access and RBAC changes produce full RequestResponse entries for forensic analysis
- Pod mutations, exec operations, and webhook changes produce Request-level entries
- All other API calls produce Metadata-level entries (who, what, when, result)
- Audit logs ship to the SIEM within 5-10 seconds of the API call
- Alerts fire within 30-60 seconds of a suspicious pattern match
- Secret data fields are redacted in the log pipeline, preventing credential exposure in the SIEM
Trade-offs
| Control | Impact | Risk | Mitigation |
|---|---|---|---|
| RequestResponse for secrets | Full request/response bodies for secret operations (5-50 KB per event) | Storage cost increase; secret data in audit logs creates a secondary exposure risk | Redact secret data in the log pipeline. Set retention limits (30 days for RequestResponse, 90 days for Metadata) |
| None for health checks | No audit trail for health check endpoints | If an attacker discovers an unauthenticated health endpoint that leaks information, there is no log | Monitor health endpoints separately. Ensure health endpoints do not expose sensitive data |
| Webhook backend | API server depends on SIEM availability for audit delivery | If the webhook backend is unavailable, audit events are lost (unless buffered) | Use file-based logging as primary, webhook as secondary. Configure batch settings to buffer during short outages |
| Aggressive filtering | Reduced log volume and cost | Filtered events are invisible; new attack patterns targeting filtered resources will not be detected | Review and update the audit policy quarterly. Compare against updated threat models |
| Alert rules for top 10 patterns | Early detection of common attack techniques | Alert fatigue if patterns trigger on legitimate operations (e.g., platform engineers exec into pods regularly) | Exclude known-good identities from alerts. Use escalating severity (first occurrence: info; repeated: critical) |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Audit policy syntax error | API server fails to start; static pod enters CrashLoopBackOff | API server container logs show "failed to load audit policy"; kubectl commands fail | Fix the policy YAML syntax. On kubeadm, the kubelet restarts the static pod automatically after the file is corrected |
| Audit log disk full | API server stops writing audit logs; if --audit-log-path is on the same partition as etcd, etcd may also fail | Disk usage alerts on control plane nodes; audit log file stops growing | Increase disk size. Reduce --audit-log-maxbackup and --audit-log-maxsize. Archive old logs to object storage |
| Fluent Bit shipper OOM | Audit logs accumulate on disk but are not forwarded to the SIEM | Fluent Bit pod in OOM/CrashLoopBackOff; gap in SIEM audit data | Increase Fluent Bit memory limits. Reduce batch size. Check for log parsing errors causing memory leaks |
| SIEM backend unreachable | Webhook-based shipping drops events; file-based shipping continues locally | Webhook mode: API server logs show webhook delivery errors. File mode: no symptoms on API server side, but SIEM shows data gap | Restore SIEM connectivity. For webhook mode, consider switching to file-based shipping with a separate shipper for reliability |
| Alert rule too broad | Constant alerts for legitimate operations (e.g., CI/CD service accounts exec into pods) | Alert fatigue; team starts ignoring audit alerts | Add exclusions for known service accounts. Use separate alert rules for human users vs service accounts |
When to Consider a Managed Alternative
Transition point: Building an audit log pipeline (policy design, log shipping, SIEM integration, alert rules, storage management) requires 20-40 hours of initial setup and ongoing maintenance. The audit policy needs updates as your cluster evolves. The SIEM needs capacity planning. The alert rules need tuning to reduce noise. For teams without a dedicated security engineer, this is a significant and permanent operational commitment.
Recommended providers:
- Grafana Cloud (#108): Managed Loki for audit log storage with built-in LogQL querying. Eliminates the need to run Loki infrastructure. Includes alerting via Grafana Alerting with no additional tooling.
- Axiom (#112): Managed log aggregation with zero-configuration ingestion. Supports direct webhook ingestion from the Kubernetes API server, eliminating the need for a separate log shipper.
- Sysdig (#122): Kubernetes-native security platform that includes audit log analysis with pre-built detection rules. Provides the top 10 alert patterns from this article as out-of-the-box detections.
What you still control: The audit policy (what to log at which level) remains your configuration regardless of where logs are stored. The SIEM provider handles storage, indexing, and querying infrastructure. You define the alert rules and response procedures.
Premium content pack: Alert rule pack for Kubernetes audit events, including Prometheus alerting rules, Grafana dashboard for audit log visualization, and LogQL queries for the top 10 suspicious patterns. Includes a Fluent Bit configuration for shipping audit logs to Loki, Elasticsearch, and S3.