Hardening the Linux Kernel Attack Surface with sysctl and Boot Parameters

Problem

Linux kernels ship with defaults optimised for compatibility, not security. On a stock Ubuntu 24.04 or RHEL 9 installation:

The network stack accepts ICMP redirects, allowing an attacker on the same network segment to reroute traffic through a host they control.
Source routing is enabled, letting an attacker specify the path a packet takes through the network, bypassing firewall rules.
SYN flood protections are enabled by default on most modern distributions, but other network hardening parameters are not.
Kernel pointers are exposed to unprivileged users through /proc/kallsyms, providing the exact memory layout needed to bypass KASLR.
dmesg is readable by all users, leaking kernel addresses, hardware details, and driver information useful for targeted exploitation.
Memory protections like init_on_alloc and init_on_free are disabled, leaving freed memory contents accessible to subsequent allocations.

These defaults persist in production because administrators either do not know which parameters to change, fear breaking running services, or cannot find a single reference that covers the settings, their costs, and their failure modes in one place.

This article is that reference.

Target systems: Ubuntu 24.04 LTS, Debian 12, RHEL 9 / Rocky Linux 9, and any kernel 5.15+.

Threat Model

Adversary: Network-adjacent attacker who can send packets to the host (e.g., shared VPC, compromised neighbour), or unprivileged local user with shell access via a compromised application (e.g., RCE in a web service, compromised dependency).
Access level: Network access to exposed services, or unprivileged shell on the host.
Objective: Reconnaissance (kernel pointer leaks for KASLR bypass, hardware fingerprinting via dmesg), network manipulation (SYN floods, IP spoofing, ICMP redirect for traffic interception), or privilege escalation (leveraging weak memory protections, exploiting use-after-free bugs with uninitialised memory).
Blast radius: Single host compromise. On Kubernetes nodes, a compromised host means access to all pods on that node, kubelet credentials, and potentially the ability to move laterally to other nodes.

Configuration

Network Stack Hardening

These settings harden the IPv4 and IPv6 network stack against spoofing, redirect attacks, and flood-based denial of service.

Create /etc/sysctl.d/60-net-hardening.conf:

# /etc/sysctl.d/60-net-hardening.conf
# Network stack hardening for production systems
# Target: Ubuntu 24.04 LTS, RHEL 9, Debian 12, kernel 5.15+

# --- IPv4: Anti-spoofing ---
# Strict reverse path filtering. Drops packets where the source address
# would not be routable back through the interface they arrived on.
# Use =2 (loose mode) only if this host has asymmetric routing.
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

# --- IPv4: Disable source routing ---
# Source-routed packets let the sender specify the route, bypassing
# your network topology and potentially your firewall rules.
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.accept_source_route = 0

# --- IPv4: ICMP redirect prevention ---
# Accepting redirects allows a network neighbour to change your routing table.
# Sending redirects can leak your routing topology.
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0

# --- IPv4: Logging and flood protection ---
# Log packets with impossible source addresses (spoofed, martian).
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1

# SYN flood protection. Enabled by default on most modern kernels,
# but set explicitly to ensure it is not disabled.
net.ipv4.tcp_syncookies = 1

# TCP timestamps are REQUIRED for SYN cookies to work. Many hardening
# guides incorrectly recommend disabling timestamps. Do not disable them.
net.ipv4.tcp_timestamps = 1

# Ignore ICMP echo requests sent to broadcast addresses (Smurf attack prevention).
net.ipv4.icmp_echo_ignore_broadcasts = 1

# Ignore bogus ICMP error responses.
net.ipv4.icmp_ignore_bogus_error_responses = 1

# --- IPv6: Hardening ---
# Disable ICMP redirects for IPv6.
net.ipv6.conf.all.accept_redirects = 0
net.ipv6.conf.default.accept_redirects = 0

# Disable IPv6 source routing.
net.ipv6.conf.all.accept_source_route = 0
net.ipv6.conf.default.accept_source_route = 0

# Disable Router Advertisement acceptance on servers.
# WARNING: Only set this on hosts with static IPv6 configuration.
# Hosts relying on SLAAC for IPv6 addressing will lose connectivity.
net.ipv6.conf.all.accept_ra = 0
net.ipv6.conf.default.accept_ra = 0

Why tcp_timestamps = 1 must stay enabled: A common misconception in older hardening guides is that TCP timestamps leak uptime information and should be disabled. This is wrong for two reasons: (1) The information leakage is minimal and easily obtained through other means. (2) SYN cookies (tcp_syncookies) require TCP timestamps to function. Disabling timestamps disables your SYN flood protection. Keep timestamps enabled.

Memory and Kernel Protections

These settings restrict access to kernel internals and harden memory management against exploitation.

Create /etc/sysctl.d/60-kernel-hardening.conf:

# /etc/sysctl.d/60-kernel-hardening.conf
# Kernel memory and information disclosure protections
# Target: Ubuntu 24.04 LTS, RHEL 9, Debian 12, kernel 5.15+

# --- Address Space Layout Randomisation ---
# 2 = full randomisation (stack, VDSO, shared libraries, mmap, heap).
# This is the default on most modern kernels. Set explicitly to prevent regression.
kernel.randomize_va_space = 2

# --- Kernel pointer restriction ---
# 0 = pointers visible to all users (default on some distros)
# 1 = pointers hidden from non-privileged users
# 2 = pointers hidden from all users including root
# Use 2 for production. Use 1 if monitoring tools need kernel pointers.
kernel.kptr_restrict = 2

# --- Restrict dmesg access ---
# Prevents unprivileged users from reading kernel ring buffer.
# dmesg contains kernel addresses, hardware info, and driver details.
kernel.dmesg_restrict = 1

# --- Restrict perf_event access ---
# 3 = disallow all perf event access for unprivileged users.
# Perf events can be used for side-channel attacks (Spectre variants).
kernel.perf_event_paranoid = 3

# --- Restrict ptrace ---
# 0 = any process can ptrace any other (dangerous)
# 1 = only parent processes can ptrace children (default on Ubuntu)
# 2 = only processes with CAP_SYS_PTRACE can ptrace
# 3 = no process can ptrace (breaks debuggers entirely)
# Use 1 for production. Use 2 if no debugging is needed on this host.
kernel.yama.ptrace_scope = 1

# --- Disable unprivileged BPF ---
# Prevents unprivileged users from loading BPF programs.
# BPF can be used for kernel exploitation. Privileged BPF (root, CAP_BPF)
# is still available for tools like tcpdump and container runtimes.
# WARNING: Test with your container runtime. Some older versions of
# containerd/CRI-O use unprivileged BPF. Modern versions (containerd 1.7+,
# CRI-O 1.28+) work with this set to 1.
kernel.unprivileged_bpf_disabled = 1

# --- Harden BPF JIT ---
# When BPF JIT is enabled, harden the compiled code against
# JIT spraying attacks.
net.core.bpf_jit_harden = 2

# --- Disable kexec ---
# Prevents loading a new kernel at runtime. An attacker with root
# could use kexec to load a kernel without your security settings.
kernel.kexec_load_disabled = 1

# --- Restrict userfaultfd ---
# userfaultfd is used in kernel exploitation (race condition stabilisation).
# 1 = only privileged users can create userfaultfd.
vm.unprivileged_userfaultfd = 0

Filesystem Protections

Create /etc/sysctl.d/60-fs-hardening.conf:

# /etc/sysctl.d/60-fs-hardening.conf
# Filesystem protections against link-based attacks
# Target: Ubuntu 24.04 LTS, RHEL 9, Debian 12, kernel 5.15+

# Prevent hardlink creation to files the user does not own.
# Mitigates hardlink-based privilege escalation in world-writable directories.
fs.protected_hardlinks = 1

# Prevent symlink following in world-writable sticky directories
# unless the owner of the symlink matches the owner of the directory
# or the target. Mitigates symlink attacks in /tmp.
fs.protected_symlinks = 1

# Restrict FIFO and regular file creation in world-writable sticky
# directories to prevent data spoofing attacks.
# 2 = also applies when the directory owner does not own the existing file.
fs.protected_fifos = 2
fs.protected_regular = 2

# Prevent core dumps from setuid programs.
# Core dumps from privileged programs can contain sensitive data.
fs.suid_dumpable = 0

Boot Parameters

These kernel command-line parameters must be set in the bootloader (GRUB) and require a reboot to take effect.

Edit /etc/default/grub and add parameters to GRUB_CMDLINE_LINUX:

# Add these to the existing GRUB_CMDLINE_LINUX value in /etc/default/grub.
# Do not replace the existing value - append to it.

GRUB_CMDLINE_LINUX="$EXISTING_VALUES init_on_alloc=1 init_on_free=1 page_alloc.shuffle=1 slab_nomerge vsyscall=none lockdown=confidentiality"

Parameter reference:

Parameter	What it does	Performance impact
`init_on_alloc=1`	Zeroes memory on allocation, preventing data leaks from freed objects	1-3% throughput reduction on allocation-heavy workloads
`init_on_free=1`	Zeroes memory on free, preventing use-after-free data leaks	3-5% additional overhead. Skip on latency-sensitive systems
`page_alloc.shuffle=1`	Randomises page allocator freelists, making heap layout unpredictable	Negligible
`slab_nomerge`	Prevents merging of slab caches with similar object sizes, reducing cross-cache exploitation	5-15% increased memory usage
`vsyscall=none`	Disables the legacy vsyscall page, which is a known exploitation target	None. Breaks very old binaries (pre-glibc 2.14, circa 2011)
`lockdown=confidentiality`	Prevents root from reading kernel memory, loading unsigned modules, accessing /dev/mem, and using kexec	Blocks: NVIDIA unsigned drivers, hibernation, some BPF operations

Apply the GRUB changes:

# On Debian/Ubuntu:
sudo update-grub

# On RHEL/Rocky:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

# Reboot to apply boot parameters:
sudo systemctl reboot

About lockdown=confidentiality: This is the most impactful boot parameter. It prevents even root from accessing raw kernel memory, loading unsigned modules, or using kexec. If you use unsigned kernel modules (NVIDIA proprietary drivers, ZFS DKMS), use lockdown=integrity instead (weaker but allows unsigned modules) or sign your modules. Test this in staging before production.

Applying and Persisting sysctl Settings

Apply all sysctl settings immediately without rebooting:

# Apply all settings from /etc/sysctl.d/
sudo sysctl --system

# Output will show each setting being applied:
# * Applying /etc/sysctl.d/60-net-hardening.conf ...
# * Applying /etc/sysctl.d/60-kernel-hardening.conf ...
# * Applying /etc/sysctl.d/60-fs-hardening.conf ...

File naming convention: Files in /etc/sysctl.d/ are applied in lexicographic order. Using the 60- prefix ensures our hardening runs after distribution defaults (usually 10- or 20-) but before any application-specific tuning (90- or 99-). This allows application-specific overrides to take precedence.

Verification Script

Save as /usr/local/bin/verify-sysctl-hardening.sh:

#!/bin/bash
# Verify sysctl hardening settings are active.
# Exit code 0 = all settings correct. Exit code 1 = one or more settings wrong.

FAIL=0

check() {
    local key="$1"
    local expected="$2"
    local actual
    actual=$(sysctl -n "$key" 2>/dev/null)
    if [ "$actual" != "$expected" ]; then
        echo "FAIL: $key = $actual (expected $expected)"
        FAIL=1
    else
        echo "OK:   $key = $actual"
    fi
}

echo "=== Network Stack ==="
check net.ipv4.conf.all.rp_filter 1
check net.ipv4.conf.all.accept_source_route 0
check net.ipv4.conf.all.accept_redirects 0
check net.ipv4.conf.all.send_redirects 0
check net.ipv4.conf.all.log_martians 1
check net.ipv4.tcp_syncookies 1
check net.ipv4.tcp_timestamps 1
check net.ipv4.icmp_echo_ignore_broadcasts 1
check net.ipv6.conf.all.accept_redirects 0
check net.ipv6.conf.all.accept_ra 0

echo ""
echo "=== Kernel Protections ==="
check kernel.randomize_va_space 2
check kernel.kptr_restrict 2
check kernel.dmesg_restrict 1
check kernel.perf_event_paranoid 3
check kernel.yama.ptrace_scope 1
check kernel.unprivileged_bpf_disabled 1
check net.core.bpf_jit_harden 2
check kernel.kexec_load_disabled 1
check vm.unprivileged_userfaultfd 0

echo ""
echo "=== Filesystem ==="
check fs.protected_hardlinks 1
check fs.protected_symlinks 1
check fs.protected_fifos 2
check fs.protected_regular 2
check fs.suid_dumpable 0

echo ""
echo "=== Boot Parameters ==="
for param in init_on_alloc=1 init_on_free=1 page_alloc.shuffle=1 slab_nomerge vsyscall=none; do
    if grep -q "$param" /proc/cmdline; then
        echo "OK:   boot param $param present"
    else
        echo "FAIL: boot param $param missing from /proc/cmdline"
        FAIL=1
    fi
done

echo ""
if [ $FAIL -eq 0 ]; then
    echo "ALL CHECKS PASSED"
    exit 0
else
    echo "SOME CHECKS FAILED"
    exit 1
fi

sudo chmod +x /usr/local/bin/verify-sysctl-hardening.sh
sudo verify-sysctl-hardening.sh

Expected Behaviour

After applying all sysctl settings and rebooting with the new boot parameters:

sudo verify-sysctl-hardening.sh returns exit code 0 with all checks passing
cat /proc/kallsyms as a non-root user shows all addresses as 0000000000000000
dmesg as a non-root user returns dmesg: read kernel buffer failed: Operation not permitted
cat /proc/cmdline shows all boot parameters present
sysctl -a 2>/dev/null | grep rp_filter confirms strict mode on all interfaces
Network services (web server, database, SSH) function normally
Container workloads (Docker, containerd, CRI-O with version 1.7+/1.28+) start and run without errors

Testing network hardening (requires a second host on the same network):

# From another host, attempt to send a spoofed packet:
sudo hping3 -S -a 192.0.2.1 -p 80 TARGET_IP
# Expected: packet is dropped (rp_filter). No response from target.

# Attempt a SYN flood:
sudo hping3 -S --flood -p 80 TARGET_IP
# Expected: SYN cookies activate. Legitimate connections still succeed.
# Check with: netstat -s | grep "SYNs to LISTEN"

Trade-offs

Setting	Performance Impact	Compatibility Risk	Recommendation
`init_on_alloc=1`	1-3% throughput reduction on allocation-heavy workloads (benchmarked with `sysbench memory`)	None known	Enable everywhere. The overhead is negligible for most workloads.
`init_on_free=1`	3-5% additional overhead on top of `init_on_alloc`	None known	Enable on security-critical systems. Skip on latency-sensitive workloads (real-time processing, high-frequency trading).
`slab_nomerge`	5-15% increased kernel memory usage	None known	Enable on security-critical systems. Skip on memory-constrained hosts (<2GB RAM).
`lockdown=confidentiality`	None	Blocks unsigned module loading (NVIDIA, ZFS DKMS), hibernation, `/dev/mem` access, some BPF operations	Use `lockdown=integrity` if you need unsigned modules. Test in staging first.
`kernel.unprivileged_bpf_disabled=1`	None	Older container runtimes (containerd <1.7, CRI-O <1.28) may use unprivileged BPF	Test with your container runtime before applying. Modern runtimes are fine.
`net.ipv6.conf.all.accept_ra=0`	None	Hosts using SLAAC for IPv6 addressing will lose IPv6 connectivity	Only set on hosts with static IPv6 configuration.
`rp_filter=1` (strict)	None	Breaks asymmetric routing (traffic enters on one interface, would exit on another)	Use `rp_filter=2` (loose mode) only on interfaces with known asymmetric routing.

Failure Modes

Failure	Symptom	Detection	Recovery
`rp_filter=1` breaks asymmetric routing	Legitimate traffic dropped on multi-homed hosts	`dmesg` shows `martian source` logs with valid source IPs; monitoring shows packet loss on specific interfaces	Set `rp_filter=2` on the affected interface only: `sysctl net.ipv4.conf.eth1.rp_filter=2`
`lockdown=confidentiality` blocks NVIDIA driver	`modprobe nvidia` fails; GPU not available	`dmesg` shows `Lockdown: modprobe: unsigned module loading is restricted`; `nvidia-smi` returns error	Option 1: Sign the module (`scripts/sign-file`). Option 2: Use `lockdown=integrity`. Option 3: Remove `lockdown` from boot params and reboot
`unprivileged_bpf_disabled=1` breaks container runtime	containerd or CRI-O fails to start or pods fail to schedule	Container runtime logs show BPF-related permission errors; `journalctl -u containerd` shows `EPERM`	Set `kernel.unprivileged_bpf_disabled=0` and restart the runtime. Upgrade the runtime to a version that uses privileged BPF.
`accept_ra=0` breaks IPv6 connectivity	IPv6 stops working on hosts using SLAAC	`ip -6 route show` shows no default route; IPv6 connections fail	Set `net.ipv6.conf.<interface>.accept_ra=1` on interfaces needing SLAAC. Better: migrate to static IPv6 configuration.
`init_on_free=1` causes latency regression	P99 latency increases 3-5% on allocation-heavy workloads	Application latency metrics increase after reboot; `perf stat` shows increased page zeroing time	Remove `init_on_free=1` from boot params. Keep `init_on_alloc=1` (lower overhead, still valuable). Reboot.
sysctl settings reset after reboot	Settings revert to defaults	`verify-sysctl-hardening.sh` reports failures after reboot	Check that files exist in `/etc/sysctl.d/` and are not overridden by later files. Run `sysctl --system` and check output for conflicts.

When to Consider a Managed Alternative

Transition point: When you are managing sysctl consistency across more than 10-20 hosts and spending more than 2 hours per month verifying compliance or investigating drift.

What managed providers handle:

Managed Kubernetes providers (Civo (#22), DigitalOcean (#21), Vultr (#12), Linode (#13)) configure node-level kernel parameters as part of their node images. When you run workloads on managed Kubernetes, you do not manage sysctl on the underlying nodes. The provider handles kernel hardening, patching, and configuration consistency across all nodes in your cluster.

Runtime security platforms (Sysdig (#122) and Aqua (#123)) can verify sysctl compliance across a fleet of hosts and alert on configuration drift. If a host’s sysctl settings change (manually, through a package update, or through a configuration management error), the platform detects the deviation and alerts.

What you still control: Application-level sysctl tuning remains your responsibility even on managed infrastructure. Settings like net.core.somaxconn (maximum socket backlog for high-connection workloads) or vm.max_map_count (required by Elasticsearch) are workload-specific and set at the pod level using init containers or security context capabilities.

Automation path: For self-managed infrastructure, use the verification script from this article in a cron job or CI pipeline. For fleet-wide application, see Automated OS Hardening with Ansible (Article #15) for a production-ready playbook that applies these settings across all hosts with staged rollout and canary verification.