Hardening /proc and /sys: Restricting Kernel Information Disclosure

Problem

/proc and /sys are virtual filesystems that expose kernel internals, hardware details, and process information to userspace. On a stock Linux system, every unprivileged user can read:

/proc/kallsyms – the addresses of every symbol in the running kernel. With this data, an attacker can bypass KASLR (Kernel Address Space Layout Randomisation) and precisely target kernel exploitation.
/proc/kcore – a virtual file representing the physical memory of the system. Root can read the full contents of RAM through this file, including encryption keys, credentials, and other secrets.
/proc/[pid]/ directories for every process on the system. Any user can see the command-line arguments, environment variables (which often contain secrets), memory maps, and file descriptors of every other user’s processes.
/sys/kernel/ files that expose kernel configuration details, security module state, and hardware topology useful for fingerprinting.
/proc/sysrq-trigger – the magic SysRq interface that can reboot the machine, kill all processes, or dump memory, accessible to root without authentication.

These information leaks are prerequisites for most local privilege escalation attacks. The attacker first reads /proc to learn the kernel’s memory layout, identify running services, and find processes with interesting credentials, then uses that information to craft a targeted exploit.

Target systems: Ubuntu 24.04 LTS, Debian 12, RHEL 9 / Rocky Linux 9, kernel 5.15+.

Threat Model

Adversary: Unprivileged local user with shell access (compromised web application, stolen SSH credentials, container escape into the host namespace).
Access level: Unprivileged shell on the host, or a container with the host’s /proc mounted (misconfigured container or privileged mode).
Objective: Reconnaissance (kernel addresses for KASLR bypass, process enumeration, credential harvesting from environment variables), or direct system manipulation via /proc/sysrq-trigger.
Blast radius: Information gathered from /proc and /sys enables further attacks (privilege escalation, targeted exploitation). If /proc/sysrq-trigger is accessible, immediate denial of service or data exfiltration is possible.

Configuration

Hiding Process Information with hidepid

The hidepid mount option on /proc controls which processes are visible to unprivileged users:

Value	Effect
`hidepid=0`	Default. All users can read all `/proc/[pid]/` directories.
`hidepid=1`	Users can see all `/proc/[pid]/` entries but cannot access `/proc/[pid]/cmdline`, `/proc/[pid]/status`, etc. for other users’ processes.
`hidepid=2`	Users can only see their own processes in `/proc`. Other users’ PID directories are invisible.
`hidepid=invisible`	Same as `hidepid=2` on kernels 5.8+. Clearer naming.

Apply hidepid=2 via /etc/fstab:

# /etc/fstab - add or modify the /proc mount line
proc    /proc    proc    defaults,hidepid=2,gid=proc    0    0

The gid=proc option allows members of the proc group to see all processes. This is essential for monitoring agents and tools that need full process visibility.

# Create the proc group if it doesn't exist
sudo groupadd -r proc 2>/dev/null

# Add monitoring users to the proc group
sudo usermod -aG proc prometheus
sudo usermod -aG proc node_exporter
sudo usermod -aG proc zabbix

# Apply immediately without rebooting
sudo mount -o remount,hidepid=2,gid=proc /proc

Verify:

# As root - should see all processes
ps aux | wc -l

# As an unprivileged user - should see only their own processes
su - testuser -c "ps aux | wc -l"
# Expected: far fewer processes than root sees

Restricting Kernel Pointer Exposure

Kernel pointers in /proc/kallsyms are the primary target for KASLR bypass. Restrict them with kptr_restrict:

# /etc/sysctl.d/60-proc-hardening.conf

# Hide kernel pointers from all users (even root)
# 0 = visible to all (insecure default on some distros)
# 1 = hidden from unprivileged users, visible to root
# 2 = hidden from all users including root
kernel.kptr_restrict = 2

# Restrict access to dmesg (kernel ring buffer)
# Contains kernel addresses, hardware details, driver information
kernel.dmesg_restrict = 1

# Restrict perf_event access to prevent side-channel attacks
kernel.perf_event_paranoid = 3

# Disable the SysRq magic key (prevents reboot/crash via /proc/sysrq-trigger)
# 0 = disable all SysRq functions
# 1 = enable all SysRq functions (insecure)
# 176 = allow only sync and remount-ro (useful for emergency recovery)
kernel.sysrq = 0

Apply:

sudo sysctl --system

Restricting /proc/kcore

/proc/kcore provides a raw view of physical memory. While only root can read it by default, a compromised root account (through sudo misconfiguration or a container escape to host namespaces) can dump the entire contents of RAM:

# Check current permissions
ls -la /proc/kcore
# -r-------- 1 root root ... /proc/kcore (readable by root only by default)

On systems with Secure Boot and lockdown=confidentiality, access to /proc/kcore is blocked even for root. If you cannot use lockdown mode, restrict access with an AppArmor or SELinux policy.

AppArmor (Ubuntu/Debian):

# /etc/apparmor.d/proc-kcore
profile proc-kcore /proc/kcore {
    deny /proc/kcore r,
}

Hardening /sys Filesystem Access

The /sys filesystem exposes kernel configuration, device information, and security module interfaces. Key paths to restrict:

# Restrict access to security module interfaces
sudo chmod 700 /sys/kernel/security 2>/dev/null

# Restrict access to kernel debug interface
sudo chmod 700 /sys/kernel/debug 2>/dev/null

For persistent restrictions, create a systemd tmpfiles rule:

# /etc/tmpfiles.d/sys-hardening.conf
# Restrict /sys/kernel/security to root only
z /sys/kernel/security 0700 root root -
z /sys/kernel/debug 0700 root root -

sudo systemd-tmpfiles --create

Container Runtime procfs Masking

Container runtimes mask certain /proc and /sys paths to prevent containers from accessing sensitive host information. However, the specific paths masked differ between runtimes.

Paths masked by default in containerd and CRI-O:

Path	Why it is masked
`/proc/acpi`	Hardware ACPI tables (host fingerprinting)
`/proc/kcore`	Physical memory access
`/proc/keys`	Kernel keyring (encryption keys)
`/proc/latency_stats`	Kernel scheduling information
`/proc/sched_debug`	Scheduler debug output
`/proc/scsi`	SCSI device information
`/proc/timer_list`	Kernel timer information
`/proc/timer_stats`	Timer statistics
`/sys/firmware`	Firmware tables (host fingerprinting)

Verify container procfs masking:

# From inside a container, these should return "Permission denied" or show empty/fake data
docker run --rm alpine cat /proc/kcore
# Expected: "Permission denied"

docker run --rm alpine cat /proc/acpi/wakeup
# Expected: "Permission denied" or "No such file or directory"

If you run containers with --privileged, all procfs masking is disabled. Never use --privileged in production. Instead, grant specific capabilities:

# Kubernetes security context - restrictive defaults
securityContext:
  privileged: false
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  procMount: Default  # Uses the runtime's default masking

Verification Script

#!/bin/bash
# verify-proc-hardening.sh

FAIL=0

check_sysctl() {
    local key="$1"
    local expected="$2"
    local actual
    actual=$(sysctl -n "$key" 2>/dev/null)
    if [ "$actual" != "$expected" ]; then
        echo "FAIL: $key = $actual (expected $expected)"
        FAIL=1
    else
        echo "OK:   $key = $actual"
    fi
}

echo "=== sysctl Settings ==="
check_sysctl kernel.kptr_restrict 2
check_sysctl kernel.dmesg_restrict 1
check_sysctl kernel.perf_event_paranoid 3
check_sysctl kernel.sysrq 0

echo ""
echo "=== /proc Mount Options ==="
if findmnt -n -o OPTIONS /proc | grep -q "hidepid=2\|hidepid=invisible"; then
    echo "OK:   /proc mounted with hidepid=2 or hidepid=invisible"
else
    echo "FAIL: /proc not mounted with hidepid"
    FAIL=1
fi

echo ""
echo "=== Kernel Pointer Exposure ==="
KALLSYMS=$(cat /proc/kallsyms 2>/dev/null | head -1)
if echo "$KALLSYMS" | grep -q "^0000000000000000"; then
    echo "OK:   /proc/kallsyms addresses are zeroed"
else
    echo "FAIL: /proc/kallsyms exposes kernel addresses"
    FAIL=1
fi

echo ""
echo "=== dmesg Access ==="
if dmesg 2>&1 | grep -q "Operation not permitted"; then
    echo "OK:   dmesg restricted for unprivileged users"
else
    echo "INFO: Run this check as a non-root user to verify dmesg restriction"
fi

echo ""
if [ $FAIL -eq 0 ]; then
    echo "ALL CHECKS PASSED"
    exit 0
else
    echo "SOME CHECKS FAILED"
    exit 1
fi

Expected Behaviour

After applying /proc and /sys hardening:

cat /proc/kallsyms as a non-root user shows all addresses as 0000000000000000
cat /proc/kallsyms as root also shows zeroed addresses (with kptr_restrict=2)
dmesg as a non-root user returns “Operation not permitted”
ps aux as a non-root user shows only that user’s processes (with hidepid=2)
echo b > /proc/sysrq-trigger as root does nothing (with sysrq=0)
Monitoring agents in the proc group can still see all processes and collect metrics
Container processes cannot read /proc/kcore, /proc/keys, or /proc/acpi
System services (SSH, web servers, databases) function normally
systemd-cgtop, htop (as root), and top (as root) display all processes correctly

Trade-offs

Control	Benefit	Cost	Mitigation
`hidepid=2`	Users cannot see other users’ processes, preventing enumeration of running services and command-line secrets	`ps aux` as non-root shows only own processes. Some tools that expect full process visibility break.	Add monitoring and admin users to the `proc` group via the `gid=proc` mount option.
`kptr_restrict=2`	Kernel addresses hidden from everyone, including root. Prevents KASLR bypass even after root compromise.	Root cannot debug kernel issues that require symbol addresses. `perf` and `bpftrace` cannot resolve kernel symbols.	Use `kptr_restrict=1` if root needs kernel symbols for debugging. On dedicated development/debugging hosts, keep at 0.
`sysrq=0`	Prevents abuse of the SysRq interface for denial of service or data exfiltration	Cannot use SysRq for emergency recovery (sync, remount-ro, reboot)	Set `sysrq=176` to allow only safe SysRq functions (sync and remount-ro). Useful for emergency situations on physical hardware.
`dmesg_restrict=1`	Prevents unprivileged access to kernel ring buffer (addresses, hardware info, driver details)	Users cannot run `dmesg` for troubleshooting	Grant `CAP_SYSLOG` to specific debugging users or tools. Or use `journalctl -k` with appropriate journal permissions.
Container procfs masking	Containers cannot access sensitive host kernel information	Some monitoring containers need access to host `/proc` paths	Mount specific host paths read-only into monitoring containers instead of disabling procfs masking entirely.

Failure Modes

Failure	Symptom	Detection	Recovery
Monitoring agent cannot read `/proc`	Metrics collection stops. Dashboards show gaps. Alerts fire for missing metrics.	Prometheus scrape errors. Agent logs show “permission denied” on `/proc` paths.	Add the monitoring agent’s user to the `proc` group: `usermod -aG proc <agent_user>`. Restart the agent.
`hidepid=2` breaks application that reads other processes	Application fails with “no such file or directory” when reading `/proc/<pid>` of another process	Application error logs reference `/proc` paths. `strace` shows `ENOENT` or `EACCES` on `/proc/[pid]/` access.	Add the application user to the `proc` group. Or run the application with `CAP_SYS_PTRACE` capability (grants `/proc` access).
`kptr_restrict=2` breaks debugging tools	`perf report` shows unresolved symbols. `bpftrace` cannot map kernel addresses to function names.	Debugging output shows hex addresses instead of symbol names.	Temporarily set `sysctl kernel.kptr_restrict=1` for the debugging session. Reset to 2 when done.
`sysrq=0` prevents emergency recovery	Cannot use Alt+SysRq+S (sync) or Alt+SysRq+B (reboot) on a hung system	System is hung and the only option is a hard power cycle	Set `sysrq=176` instead of 0 to allow sync and remount-ro. For remote systems, use IPMI/BMC for emergency reboot.
Container runs with `--privileged` bypassing all masking	Container can read all `/proc` and `/sys` paths, including kernel memory	Kubernetes audit log shows privileged container creation. Pod security admission rejects the pod (if PSA is enforced).	Never use `--privileged`. Use Kubernetes Pod Security Admission (or a policy engine) to reject privileged containers at the admission level.

When to Consider a Managed Alternative

Transition point: When you run containers at scale and need consistent procfs masking across multiple container runtimes and runtime versions, or when container runtime upgrades change the default masking behaviour and you need to verify compliance after each update.

What managed providers handle:

Managed Kubernetes providers (Civo (#22), DigitalOcean (#21), Vultr (#12), Linode (#13)) configure container runtimes with appropriate procfs masking on their node images. The provider handles the runtime configuration and ensures that containers cannot access sensitive host paths by default. When the provider upgrades the container runtime, they verify that masking policies are maintained.

Falco (open source) and Sysdig (#122) detect suspicious access patterns to /proc and /sys paths at runtime. If a container attempts to read /proc/kcore or access a masked path, these tools generate an alert. This provides detection even if a masking configuration is accidentally weakened.

What you still control: Host-level /proc hardening (hidepid, kptr_restrict, dmesg_restrict) is your responsibility on self-managed infrastructure. Pod security contexts and admission policies that prevent privileged containers are your responsibility on any Kubernetes deployment, including managed clusters.

Automation path: For self-managed infrastructure, apply the sysctl and fstab configurations from this article through your configuration management tool. Run the verification script on a schedule to detect drift. For Kubernetes, enforce Pod Security Standards at the namespace level to prevent containers from running with elevated procfs access.