Kubernetes Node Hardening: From OS Configuration to kubelet Lockdown
Problem
A Kubernetes node is a Linux machine running kubelet, a container runtime, and your workloads. If the node is compromised, every pod on that node is compromised. Node hardening spans four layers: the base operating system, kernel parameters, the container runtime (containerd or CRI-O), and the kubelet configuration. Most hardening guides cover one of these layers but miss the others.
The challenges are concrete:
- Default OS installations include unnecessary packages. Ubuntu Server ships with curl, wget, gcc, python3, and hundreds of other utilities. A compromised container that escapes to the host finds a full toolkit for lateral movement.
- Default kernel parameters favour compatibility over security. IP forwarding, source routing, and ICMP redirects are enabled by default. These are useful for routers, not for Kubernetes nodes.
- kubelet defaults are permissive. Anonymous authentication is enabled, read-only port 10255 is open, and the kubelet API allows node-level operations without strong authorization.
- Container runtime defaults trust all images. containerd and CRI-O ship with permissive configurations that do not enforce image signing, runtime classes, or seccomp defaults.
This article covers all four layers with production-ready configuration for each. The estimated effort is 4-8 hours per node image, which is why managed Kubernetes providers are the right answer for many teams.
Target systems: Kubernetes 1.29+ on Ubuntu 24.04 LTS, Flatcar Container Linux, or Talos Linux. containerd 1.7+ or CRI-O 1.29+.
Threat Model
- Adversary: Attacker who has achieved container escape (via kernel exploit, runtime vulnerability, or misconfigured privileged container) and now has access to the node.
- Access level: Unprivileged or root shell on the host operating system, depending on the escape vector.
- Objective: Persist on the node, access secrets from other pods, pivot to the control plane or other nodes, exfiltrate data, or deploy cryptominers.
- Blast radius: Without node hardening, a compromised node exposes all pods on that node, the kubelet credentials (which can list cluster resources), and potentially the container runtime socket (which allows spawning new containers). With hardening, the attacker faces a minimal OS with no tools, restricted kernel interfaces, a locked-down kubelet that rejects unauthorized requests, and a runtime that limits container capabilities.
Configuration
Step 1: Minimal Operating System
Choose a container-optimized OS that ships with only the components needed to run kubelet and containers.
Option A: Ubuntu 24.04 Minimal
# Start with ubuntu-24.04-live-server-amd64.iso (minimal installation)
# After installation, remove unnecessary packages:
apt purge -y \
snapd \
unattended-upgrades \
apport \
popularity-contest \
ubuntu-advantage-tools \
gcc \
g++ \
make \
python3-pip
# Remove binaries that aid post-exploitation:
rm -f /usr/bin/wget # Keep curl for health checks if needed
# Lock down the package list
apt autoremove -y
apt clean
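After the purge, verify that none of the removed tools slipped back in as dependencies of other packages. The sketch below is an illustrative helper (the function name and tool list are examples, not standard tooling); run it in the node image build pipeline and fail the build on any FOUND line:

```shell
# audit_tools: report which of the given binaries are still present on the
# host. A hardened node image should print "absent" for every one of them.
audit_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "FOUND: $tool"
    else
      echo "absent: $tool"
    fi
  done
}

# Check the binaries removed in the purge step above
audit_tools gcc make wget pip3
```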
Option B: Talos Linux (immutable, API-managed)
Talos has no SSH, no shell, no package manager. All configuration is done via its API:
# talos-machine-config.yaml (relevant security sections)
machine:
  install:
    disk: /dev/sda
    image: ghcr.io/siderolabs/installer:v1.9.0
  kubelet:
    extraArgs:
      rotate-server-certificates: "true"
      protect-kernel-defaults: "true"
    extraConfig:
      serverTLSBootstrap: true
  kernel:
    modules:
      - name: br_netfilter
  sysctls:
    net.ipv4.ip_forward: "1"
    net.bridge.bridge-nf-call-iptables: "1"
    net.bridge.bridge-nf-call-ip6tables: "1"
    kernel.panic: "10"
    vm.overcommit_memory: "1"
Option C: Flatcar Container Linux
Flatcar is immutable with automatic updates. Configuration happens via Ignition:
{
  "ignition": { "version": "3.4.0" },
  "storage": {
    "files": [
      {
        "path": "/etc/sysctl.d/99-kubernetes.conf",
        "contents": {
          "source": "data:,net.ipv4.ip_forward%3D1%0Anet.bridge.bridge-nf-call-iptables%3D1%0Anet.ipv4.conf.all.rp_filter%3D1"
        }
      }
    ]
  }
}
Step 2: Kernel Parameters for Container Nodes
Apply sysctl settings that harden network behaviour and restrict kernel features:
# /etc/sysctl.d/99-kubernetes-hardening.conf
# Required for Kubernetes networking
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
# Disable source routing (prevents IP spoofing)
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv6.conf.all.accept_source_route = 0
# Disable ICMP redirects (prevent MITM via routing table manipulation)
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv6.conf.all.accept_redirects = 0
# Enable reverse path filtering (drop packets with spoofed source IPs)
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
# Ignore ICMP broadcast requests (prevent Smurf attacks)
net.ipv4.icmp_echo_ignore_broadcasts = 1
# Log martian packets (packets with impossible source addresses)
net.ipv4.conf.all.log_martians = 1
# Restrict kernel pointer leaks (prevent KASLR bypass)
kernel.kptr_restrict = 2
# Restrict dmesg access to root
kernel.dmesg_restrict = 1
# Restrict eBPF to privileged users
kernel.unprivileged_bpf_disabled = 1
# Restrict user namespaces (reduce kernel attack surface)
# Note: set to 0 only if your runtime does not need unprivileged user namespaces
# user.max_user_namespaces = 0
# Restrict ptrace to parent processes only
kernel.yama.ptrace_scope = 1
# Disable SysRq key (prevent console-based attacks on physical/VM nodes)
kernel.sysrq = 0
# Apply without reboot
sysctl --system
# Verify critical settings
sysctl net.ipv4.conf.all.accept_redirects # Should be 0
sysctl kernel.kptr_restrict # Should be 2
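Spot-checking individual keys does not scale across a fleet. A drift checker along these lines can compare running values against the baseline; this is a sketch, and the SYSCTL_ROOT override exists only so it can be exercised outside a hardened node:

```shell
# check_sysctl: compare one running sysctl value against its expected value.
# Reads from /proc/sys by default; set SYSCTL_ROOT to test against a fake tree.
check_sysctl() {
  key=$1
  expected=$2
  path="${SYSCTL_ROOT:-/proc/sys}/$(printf '%s' "$key" | tr . /)"
  actual=$(cat "$path" 2>/dev/null)
  if [ "$actual" = "$expected" ]; then
    echo "OK    $key = $actual"
  else
    echo "DRIFT $key = ${actual:-unreadable} (want $expected)"
  fi
}

# Keys from /etc/sysctl.d/99-kubernetes-hardening.conf above
check_sysctl kernel.kptr_restrict 2
check_sysctl kernel.dmesg_restrict 1
check_sysctl net.ipv4.conf.all.accept_redirects 0
```

Wire this into node monitoring so drift after an OS update surfaces before protectKernelDefaults stops the kubelet.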
Step 3: Kernel Boot Parameters
Add security-relevant boot parameters to the kernel command line:
# /etc/default/grub (Ubuntu)
GRUB_CMDLINE_LINUX="apparmor=1 security=apparmor \
vsyscall=none \
page_poison=1 \
slab_nomerge \
init_on_alloc=1 \
init_on_free=1 \
randomize_kstack_offset=on"
# Apply the changes
update-grub
# Reboot required for boot parameters
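After the reboot, confirm the parameters actually reached the kernel command line. The check below is a sketch; the function name and the optional file argument (which defaults to /proc/cmdline and exists only for testing) are illustrative:

```shell
# check_cmdline: verify that each required boot parameter appears in the
# kernel command line. Returns non-zero if any parameter is missing.
check_cmdline() {
  file=${1:-/proc/cmdline}
  missing=0
  for p in vsyscall=none page_poison=1 slab_nomerge init_on_alloc=1 init_on_free=1; do
    if ! grep -qw -- "$p" "$file"; then
      echo "missing boot param: $p"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "all hardening boot params present"
  return "$missing"
}

# Fail loudly in CI; on a live node this just reports status
check_cmdline || echo "review GRUB_CMDLINE_LINUX and reboot"
```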
| Parameter | Purpose |
|---|---|
| vsyscall=none | Disables the vsyscall page, removing a known ROP gadget target |
| page_poison=1 | Fills freed pages with a pattern to detect use-after-free |
| slab_nomerge | Prevents slab cache merging, making heap exploitation harder |
| init_on_alloc=1 | Zero-fills allocated memory pages |
| init_on_free=1 | Zero-fills freed memory pages |
| randomize_kstack_offset=on | Randomizes the kernel stack offset per syscall, hampering stack-based exploits |
Step 4: kubelet Configuration
Lock down the kubelet with a configuration file instead of command-line flags:
# /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Authentication: disable anonymous access, require webhook auth
authentication:
  anonymous:
    enabled: false
  webhook:
    enabled: true
    cacheTTL: 2m0s
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
# Authorization: use webhook (API server decides)
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
# Disable read-only port (10255 exposes node info without auth)
readOnlyPort: 0
# Enable certificate rotation
rotateCertificates: true
serverTLSBootstrap: true
# Protect kernel defaults (kubelet refuses to start if sysctl values
# do not match expected values)
protectKernelDefaults: true
# Event recording rate limits
eventRecordQPS: 5
eventBurst: 10
# Enable the default seccomp profile for all workloads
seccompDefault: true
# Streaming connection timeouts
streamingConnectionIdleTimeout: 5m0s
# Keep the authenticated HTTPS server (port 10250) enabled so the API
# server can reach logs/exec; the auth settings above gate access to it
enableServer: true
# Verify kubelet is using the config file
systemctl status kubelet
# Check for: --config=/var/lib/kubelet/config.yaml
# Verify anonymous auth is disabled
curl -sk https://localhost:10250/pods
# Expected: 401 Unauthorized (not a list of pods)
# Verify read-only port is closed
curl -s http://localhost:10255/healthz
# Expected: connection refused
Step 5: containerd Hardening
# /etc/containerd/config.toml
version = 2
[plugins."io.containerd.grpc.v1.cri"]
  # Do not expose the CRI service over TCP (Unix socket only)
  disable_tcp_service = true

  [plugins."io.containerd.grpc.v1.cri".containerd]
    # Set default runtime to runc
    default_runtime_name = "runc"

    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
        # Enable systemd cgroup driver (matches kubelet)
        SystemdCgroup = true

  [plugins."io.containerd.grpc.v1.cri".registry]
    # Restrict to approved registries only
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
        endpoint = ["https://registry-1.docker.io"]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.example.com"]
        endpoint = ["https://registry.example.com"]
# Restrict the containerd socket permissions
chmod 0660 /run/containerd/containerd.sock
chown root:containerd /run/containerd/containerd.sock
# Restart containerd
systemctl restart containerd
# Verify the configuration
containerd config dump | grep SystemdCgroup
# Expected: SystemdCgroup = true
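Socket permissions are easy to regress during runtime upgrades, so they deserve the same drift check as the sysctls. A sketch, assuming GNU stat (the function name and path argument are illustrative):

```shell
# check_socket_mode: report whether a runtime socket is tighter than or equal
# to 0660. Prints OK/WARN/ERROR; takes the socket path as an argument.
check_socket_mode() {
  sock=$1
  mode=$(stat -c '%a' "$sock" 2>/dev/null)
  if [ -z "$mode" ]; then
    echo "ERROR: $sock not found"
  elif [ "$mode" = "660" ] || [ "$mode" = "600" ]; then
    echo "OK $sock mode $mode"
  else
    echo "WARN $sock mode $mode (want 0660 or tighter)"
  fi
}

check_socket_mode /run/containerd/containerd.sock
```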
Step 6: CRI-O Hardening (Alternative Runtime)
# /etc/crio/crio.conf.d/99-hardening.conf
[crio.runtime]
# Default seccomp profile for all containers
seccomp_profile = "/usr/share/containers/seccomp.json"
# Set default capabilities (drop all, add only what is needed)
default_capabilities = [
  "CHOWN",
  "DAC_OVERRIDE",
  "FSETID",
  "FOWNER",
  "SETGID",
  "SETUID",
  "NET_BIND_SERVICE"
]
# Set allowed registries (block all others)
[crio.image]
allowed_registries = [
  "registry.example.com",
  "docker.io",
  "quay.io",
  "registry.k8s.io"
]
Step 7: Node-Level Network Restrictions
Use iptables rules on the node to restrict access to the kubelet API and metadata endpoints:
# /etc/iptables/rules.v4 (or use nftables equivalent)
# Allow the control plane to reach the kubelet API. This ACCEPT must come
# before the DROP rules: iptables evaluates rules in order, and the
# control plane CIDR may fall inside one of the dropped private ranges
iptables -A INPUT -p tcp --dport 10250 -s <control-plane-cidr> -j ACCEPT
# Block kubelet API access from pod and node networks
iptables -A INPUT -p tcp --dport 10250 -s 10.0.0.0/8 -j DROP
iptables -A INPUT -p tcp --dport 10250 -s 172.16.0.0/12 -j DROP
iptables -A INPUT -p tcp --dport 10250 -s 192.168.0.0/16 -j DROP
# Block access to the cloud metadata endpoint from the pod network
# (prevents SSRF-based credential theft). A DROP in the FORWARD chain is
# more reliable than a DNAT to loopback, which the kernel rejects by
# default (route_localnet=0)
iptables -A FORWARD -p tcp -d 169.254.169.254 --dport 80 \
  -s 10.244.0.0/16 -j DROP
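Hand-maintaining these rules per environment invites ordering mistakes, since iptables matches rules in the order they were appended. A small generator along these lines keeps the ordering fixed; the function name and CIDR handling are illustrative, not part of any standard tool:

```shell
# kubelet_fw_rules: emit iptables commands that allow the control plane CIDR
# to reach the kubelet API and drop everything else from private ranges.
# The ACCEPT is emitted first so it matches before the broader DROPs.
kubelet_fw_rules() {
  cp_cidr=$1
  echo "iptables -A INPUT -p tcp --dport 10250 -s $cp_cidr -j ACCEPT"
  for net in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16; do
    echo "iptables -A INPUT -p tcp --dport 10250 -s $net -j DROP"
  done
}

# Example with a documentation-range control plane CIDR
kubelet_fw_rules 192.0.2.0/24
```

Pipe the output into sh (or a rules file) from the node provisioning pipeline so every node gets identical ordering.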
Expected Behaviour
After completing all hardening steps:
- The node runs a minimal OS with no compilers, scripting languages, or unnecessary network tools
- Kernel parameters block source routing, ICMP redirects, and kernel pointer leaks
- kubelet rejects anonymous requests and does not expose a read-only port
- The container runtime enforces a default seccomp profile and restricts image registries
- Pod network traffic cannot reach the kubelet API directly
- protectKernelDefaults: true causes kubelet to refuse to start if kernel parameters are reverted, acting as a drift detection mechanism
- Certificate rotation keeps kubelet TLS credentials fresh without manual intervention
Trade-offs
| Control | Impact | Risk | Mitigation |
|---|---|---|---|
| Minimal OS (no debug tools) | Debugging production issues is harder without tcpdump, strace, curl | Increased mean time to resolution during incidents | Use ephemeral debug containers (kubectl debug node/...) or maintain a separate debug toolkit image |
| protectKernelDefaults: true | kubelet refuses to start if sysctl values do not match | Node fails to join cluster if sysctl configuration is wrong | Test sysctl settings in staging. Include sysctl configuration in the node image build pipeline |
| Disabling read-only kubelet port | Monitoring tools that scrape metrics from port 10255 break | Loss of node-level metrics until monitoring is reconfigured | Reconfigure Prometheus to scrape the authenticated port 10250 with appropriate ServiceAccount tokens |
| Registry restrictions in containerd/CRI-O | Legitimate images from unapproved registries fail to pull | Deployment failures when teams use new image sources | Maintain an internal registry mirror. Establish a process for adding approved registries |
| Kernel boot parameters | Slight performance overhead from memory zeroing (init_on_alloc/free) | 1-3% memory allocation overhead | Benchmark with your workloads. Disable init_on_free if overhead is unacceptable (keep init_on_alloc) |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| sysctl settings missing after OS update | kubelet refuses to start due to protectKernelDefaults check | Node shows NotReady; kubelet logs show "kernel defaults not matching" | Re-apply sysctl settings from /etc/sysctl.d/99-kubernetes-hardening.conf and restart kubelet |
| containerd socket permissions too restrictive | kubelet cannot communicate with the container runtime | kubelet logs show "connection refused" for the containerd socket; pods stuck in ContainerCreating | Fix socket permissions: chmod 0660 /run/containerd/containerd.sock and verify the kubelet user is in the containerd group |
| Registry restriction blocks system images | kube-proxy, CoreDNS, or other system components fail to pull updated images | System pods in ImagePullBackOff; kubectl describe pod shows registry access denied | Add registry.k8s.io and docker.io to the allowed registries list. Restart the container runtime |
| kubelet certificate rotation fails | kubelet TLS certificate expires; API server rejects kubelet communication | Node shows NotReady; kubelet logs show TLS handshake errors | Check that the cluster-signing CA is available. Manually approve pending CSRs with kubectl certificate approve |
| Firewall rules block legitimate kubelet traffic | API server cannot reach kubelet for logs, exec, port-forward | kubectl logs and kubectl exec return errors; node shows Ready but pod operations fail | Review iptables rules. Ensure the control plane CIDR is in the ACCEPT rule for port 10250 |
When to Consider a Managed Alternative
Transition point: Node hardening requires 4-8 hours per node image to implement correctly, plus ongoing maintenance for OS updates, kernel upgrades, and runtime patches. Every time you update the base image, you must verify that all hardening controls survive the update. For teams running 3+ nodes, this is 12-24 hours of initial effort plus a recurring maintenance burden on every OS update cycle.
Recommended providers:
- Civo (#22), DigitalOcean (#21), Vultr (#12), and Linode (#13): Managed Kubernetes services handle node OS selection, kernel configuration, kubelet hardening, and runtime configuration. You deploy workloads; they maintain the node image. This eliminates the entire surface area covered in this article.
What you still control: Pod Security Standards, network policy, seccomp profiles, and application-level security remain your responsibility regardless of whether nodes are managed. The managed provider handles the infrastructure layer; you handle the workload layer.
Premium content pack: Packer templates for building hardened node images (Ubuntu 24.04 and Flatcar) with all sysctl, kubelet, and runtime configurations from this article pre-applied. Includes a CI pipeline for rebuilding images monthly with the latest security patches.