Hardening /proc and /sys: Restricting Kernel Information Disclosure
Problem
/proc and /sys are virtual filesystems that expose kernel internals, hardware details, and process information to userspace. On a stock Linux system, the following are readable by local users or, where noted, by root:
- /proc/kallsyms: the addresses of every symbol in the running kernel. Wherever these addresses are readable, an attacker can bypass KASLR (Kernel Address Space Layout Randomisation) and target kernel exploits precisely.
- /proc/kcore: a virtual file representing the memory of the system. Root can read the full contents of RAM through this file, including encryption keys, credentials, and other secrets.
- /proc/[pid]/ directories for every process on the system. Any user can see the command-line arguments (which often contain secrets) and status of every other user’s processes. Environment variables, memory maps, and file descriptors are readable by the process owner and by root.
- /sys/kernel/ files that expose kernel configuration details, security module state, and hardware topology useful for fingerprinting.
- /proc/sysrq-trigger: the magic SysRq interface that can reboot the machine, kill all processes, or dump memory, accessible to root without authentication.
These information leaks are prerequisites for most local privilege escalation attacks. The attacker first reads /proc to learn the kernel’s memory layout, identify running services, and find processes with interesting credentials, then uses that information to craft a targeted exploit.
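Process visibility is the easiest of these exposures to measure from an unprivileged shell. The following is a minimal sketch, assuming GNU or BusyBox `stat`; the helper name `count_foreign_pids` is illustrative and not part of any standard tool. It counts the /proc/[pid] directories that belong to other users and are visible to the caller.

```shell
#!/bin/bash
# Count PID directories under a procfs root that are owned by someone else.
# Arguments (both optional): procfs root, UID to treat as "own".
count_foreign_pids() {
    local root="${1:-/proc}" uid="${2:-$(id -u)}" count=0 dir owner
    for dir in "$root"/[0-9]*; do
        [ -d "$dir" ] || continue
        owner=$(stat -c '%u' "$dir" 2>/dev/null) || continue
        if [ "$owner" != "$uid" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

echo "Other users' processes visible in /proc: $(count_foreign_pids)"
```

A non-zero count for an unprivileged account means that account can enumerate other users’ processes.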
Target systems: Ubuntu 24.04 LTS, Debian 12, RHEL 9 / Rocky Linux 9, kernel 5.15+.
Threat Model
- Adversary: Unprivileged local user with shell access (compromised web application, stolen SSH credentials, container escape into the host namespace).
- Access level: Unprivileged shell on the host, or a container with the host’s /proc mounted (misconfigured container or privileged mode).
- Objective: Reconnaissance (kernel addresses for KASLR bypass, process enumeration, credential harvesting from environment variables), or direct system manipulation via /proc/sysrq-trigger.
- Blast radius: Information gathered from /proc and /sys enables further attacks (privilege escalation, targeted exploitation). If /proc/sysrq-trigger is accessible, immediate denial of service or data exfiltration is possible.
Configuration
Hiding Process Information with hidepid
The hidepid mount option on /proc controls which processes are visible to unprivileged users:
| Value | Effect |
|---|---|
| hidepid=0 | Default. All users can read all /proc/[pid]/ directories. |
| hidepid=1 | Users can see all /proc/[pid]/ entries but cannot access /proc/[pid]/cmdline, /proc/[pid]/status, etc. for other users’ processes. |
| hidepid=2 | Users can only see their own processes in /proc. Other users’ PID directories are invisible. |
| hidepid=invisible | Same as hidepid=2 on kernels 5.8+. Clearer naming. |
Apply hidepid=2 via /etc/fstab:
# /etc/fstab - add or modify the /proc mount line
proc /proc proc defaults,hidepid=2,gid=proc 0 0
The gid=proc option allows members of the proc group to see all processes. This is essential for monitoring agents and tools that need full process visibility.
# Create the proc group if it doesn't exist
sudo groupadd -r proc 2>/dev/null
# Add monitoring users to the proc group
sudo usermod -aG proc prometheus
sudo usermod -aG proc node_exporter
sudo usermod -aG proc zabbix
# Apply immediately without rebooting
sudo mount -o remount,hidepid=2,gid=proc /proc
Verify:
# As root - should see all processes
ps aux | wc -l
# As an unprivileged user - should see only their own processes
su - testuser -c "ps aux | wc -l"
# Expected: far fewer processes than root sees
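The remount can also be confirmed from the mount options themselves rather than inferred from process counts. A minimal sketch, assuming `findmnt` from util-linux is installed; `hidepid_from_opts` is a hypothetical helper name. Note that newer kernels may report the value by name (`invisible`) even when it was set as `2`.

```shell
#!/bin/bash
# Extract the hidepid value from a comma-separated mount option string.
# Prints "0" when the option is absent, because hidepid=0 is the default.
hidepid_from_opts() {
    local value
    value=$(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^hidepid=//p' | head -n 1)
    echo "${value:-0}"
}

# Live check against the current /proc mount, if findmnt is available.
if command -v findmnt >/dev/null 2>&1; then
    echo "hidepid on /proc: $(hidepid_from_opts "$(findmnt -n -o OPTIONS /proc 2>/dev/null)")"
fi
```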
Restricting Kernel Pointer Exposure
Kernel pointers in /proc/kallsyms are the primary target for KASLR bypass. Restrict them with kptr_restrict:
# /etc/sysctl.d/60-proc-hardening.conf
# Hide kernel pointers from all users (even root)
# 0 = visible to all (insecure default on some distros)
# 1 = hidden from unprivileged users, visible to root
# 2 = hidden from all users including root
kernel.kptr_restrict = 2
# Restrict access to dmesg (kernel ring buffer)
# Contains kernel addresses, hardware details, driver information
kernel.dmesg_restrict = 1
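# Restrict perf_event access to prevent side-channel attacks
# 3 blocks all unprivileged use on kernels carrying the Debian/Ubuntu patch; mainline kernels treat values above 2 like 2
kernel.perf_event_paranoid = 3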
# Disable the SysRq magic key combinations (Alt+SysRq+<key>)
# Note: this sysctl only governs keyboard invocation; root can still write to /proc/sysrq-trigger
# 0 = disable all SysRq functions
# 1 = enable all SysRq functions (insecure)
# 176 = allow sync (16), remount read-only (32), and reboot/poweroff (128)
kernel.sysrq = 0
Apply:
sudo sysctl --system
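Values of kernel.sysrq other than 0 and 1 are bitmasks, so it helps to decode a candidate value before deploying it. The sketch below uses the bit assignments from the kernel’s SysRq admin guide; the function name `decode_sysrq` and the short group labels are illustrative choices.

```shell
#!/bin/bash
# Decode a kernel.sysrq value into the SysRq function groups it enables.
decode_sysrq() {
    local value="$1" out="" pair bit name
    case "$value" in
        0) echo "disabled"; return 0 ;;
        1) echo "all"; return 0 ;;
    esac
    # bit:label pairs, in ascending bit order
    for pair in 2:loglevel 4:keyboard 8:dumps 16:sync 32:remount-ro \
                64:signal 128:reboot 256:rt-nice; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $((value & bit)) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"
}

decode_sysrq 176   # prints: sync remount-ro reboot
decode_sysrq 48    # prints: sync remount-ro

# Decode the live setting, if readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    echo "kernel.sysrq=$current -> $(decode_sysrq "$current")"
fi
```

As the output shows, 176 includes the reboot/poweroff group; 48 is the value that permits sync and remount read-only alone.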
Restricting /proc/kcore
/proc/kcore provides a raw view of physical memory. While only root can read it by default, a compromised root account (through sudo misconfiguration or a container escape to host namespaces) can dump the entire contents of RAM:
# Check current permissions
ls -la /proc/kcore
# -r-------- 1 root root ... /proc/kcore (readable by root only by default)
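On systems with Secure Boot and lockdown=confidentiality, access to /proc/kcore is blocked even for root. If you cannot use lockdown mode, restrict access with an AppArmor or SELinux policy. AppArmor confines programs rather than files, so the deny rule must be added to the profile of each service you want to restrict; unconfined root processes are not affected.
AppArmor (Ubuntu/Debian):
# /etc/apparmor.d/local/<profile-name> - local additions to an existing profile
# Deny reads of /proc/kcore for the program confined by that profile
deny /proc/kcore r,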
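Whether lockdown already protects /proc/kcore can be read from securityfs. A minimal sketch, assuming the lockdown LSM is built in and securityfs is mounted at /sys/kernel/security; the helper name `active_lockdown` is hypothetical. The kernel marks the active mode with square brackets, for example `none [integrity] confidentiality`.

```shell
#!/bin/bash
# Print the active lockdown mode from the contents of the lockdown file.
active_lockdown() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

LOCKDOWN_FILE=/sys/kernel/security/lockdown
if [ -r "$LOCKDOWN_FILE" ]; then
    mode=$(active_lockdown "$(cat "$LOCKDOWN_FILE")")
    if [ "$mode" = "confidentiality" ]; then
        echo "OK: lockdown=confidentiality, /proc/kcore is blocked"
    else
        echo "WARN: lockdown mode is '$mode', root can still read /proc/kcore"
    fi
else
    echo "INFO: $LOCKDOWN_FILE not readable (run as root, or lockdown LSM absent)"
fi
```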
Hardening /sys Filesystem Access
The /sys filesystem exposes kernel configuration, device information, and security module interfaces. Key paths to restrict:
# Restrict access to security module interfaces
sudo chmod 700 /sys/kernel/security 2>/dev/null
# Restrict access to kernel debug interface
sudo chmod 700 /sys/kernel/debug 2>/dev/null
For persistent restrictions, create a systemd tmpfiles rule:
# /etc/tmpfiles.d/sys-hardening.conf
# Restrict /sys/kernel/security to root only
z /sys/kernel/security 0700 root root -
z /sys/kernel/debug 0700 root root -
Apply the rule immediately:
sudo systemd-tmpfiles --create /etc/tmpfiles.d/sys-hardening.conf
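To confirm that the permissions took effect, and that they survive a reboot, compare the directory modes against the expected value. A minimal sketch, assuming `stat -c` (GNU coreutils or BusyBox); `check_mode` is an illustrative helper name.

```shell
#!/bin/bash
# Compare a path's octal permission bits against an expected value.
# Returns 1 on mismatch; a missing or inaccessible path is reported and skipped.
check_mode() {
    local path="$1" expected="$2" actual
    actual=$(stat -c '%a' "$path" 2>/dev/null) || {
        echo "SKIP: $path not present"
        return 0
    }
    if [ "$actual" = "$expected" ]; then
        echo "OK: $path mode $actual"
    else
        echo "FAIL: $path mode $actual (expected $expected)"
        return 1
    fi
}

for p in /sys/kernel/security /sys/kernel/debug; do
    check_mode "$p" 700 || echo "  -> tighten with: chmod 700 $p"
done
```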
Container Runtime procfs Masking
Container runtimes mask certain /proc and /sys paths to prevent containers from accessing sensitive host information. However, the specific paths masked differ between runtimes.
Paths masked by default in containerd and CRI-O:
| Path | Why it is masked |
|---|---|
| /proc/acpi | Hardware ACPI tables (host fingerprinting) |
| /proc/kcore | Physical memory access |
| /proc/keys | Kernel keyring (encryption keys) |
| /proc/latency_stats | Kernel scheduling information |
| /proc/sched_debug | Scheduler debug output |
| /proc/scsi | SCSI device information |
| /proc/timer_list | Kernel timer information |
| /proc/timer_stats | Timer statistics |
| /sys/firmware | Firmware tables (host fingerprinting) |
Verify container procfs masking:
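# From inside a container, masked files read as empty and masked directories appear empty
docker run --rm alpine cat /proc/kcore
# Expected: no output, because runtimes typically mask files by bind-mounting /dev/null over them (some configurations return "Permission denied" instead)
docker run --rm alpine cat /proc/acpi/wakeup
# Expected: "No such file or directory" (the directory is covered by an empty read-only tmpfs) or "Permission denied"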
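Reading individual files only spot-checks the mask. Because runtimes usually implement masking as mounts placed over each path, a more systematic check is to look for those mount points in /proc/self/mountinfo from inside the container. The sketch below rests on that assumption; `is_masked` is a hypothetical helper, and the path list is copied from the table above.

```shell
#!/bin/bash
# Return 0 if a mount point exists exactly at the given path.
# Field 5 of each mountinfo line is the mount point.
is_masked() {
    local path="$1" mountinfo="${2:-/proc/self/mountinfo}"
    awk -v p="$path" '$5 == p { found = 1 } END { exit !found }' "$mountinfo" 2>/dev/null
}

for p in /proc/acpi /proc/kcore /proc/keys /proc/latency_stats /proc/sched_debug \
         /proc/scsi /proc/timer_list /proc/timer_stats /sys/firmware; do
    if is_masked "$p"; then
        echo "masked:     $p"
    elif [ -e "$p" ]; then
        echo "NOT masked: $p"
    else
        echo "absent:     $p"
    fi
done
```

Run it inside the container under test; on the host itself most paths are expected to report as not masked.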
If you run containers with --privileged, all procfs masking is disabled. Never use --privileged in production. Instead, grant specific capabilities:
# Kubernetes security context - restrictive defaults
securityContext:
  privileged: false
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  procMount: Default  # Uses the runtime's default masking
Verification Script
#!/bin/bash
# verify-proc-hardening.sh
FAIL=0

# Compare a live sysctl value against the expected hardened value
check_sysctl() {
    local key="$1"
    local expected="$2"
    local actual
    actual=$(sysctl -n "$key" 2>/dev/null)
    if [ "$actual" != "$expected" ]; then
        echo "FAIL: $key = $actual (expected $expected)"
        FAIL=1
    else
        echo "OK: $key = $actual"
    fi
}

echo "=== sysctl Settings ==="
check_sysctl kernel.kptr_restrict 2
check_sysctl kernel.dmesg_restrict 1
check_sysctl kernel.perf_event_paranoid 3
check_sysctl kernel.sysrq 0

echo ""
echo "=== /proc Mount Options ==="
if findmnt -n -o OPTIONS /proc | grep -q "hidepid=2\|hidepid=invisible"; then
    echo "OK: /proc mounted with hidepid=2 or hidepid=invisible"
else
    echo "FAIL: /proc not mounted with hidepid"
    FAIL=1
fi

echo ""
echo "=== Kernel Pointer Exposure ==="
# Only the first symbol is sampled; with kptr_restrict=2 every address is zeroed
KALLSYMS=$(head -n 1 /proc/kallsyms 2>/dev/null)
if echo "$KALLSYMS" | grep -q "^0000000000000000"; then
    echo "OK: /proc/kallsyms addresses are zeroed"
else
    echo "FAIL: /proc/kallsyms exposes kernel addresses"
    FAIL=1
fi

echo ""
echo "=== dmesg Access ==="
if dmesg 2>&1 | grep -q "Operation not permitted"; then
    echo "OK: dmesg restricted for unprivileged users"
else
    echo "INFO: Run this check as a non-root user to verify dmesg restriction"
fi

echo ""
if [ "$FAIL" -eq 0 ]; then
    echo "ALL CHECKS PASSED"
    exit 0
else
    echo "SOME CHECKS FAILED"
    exit 1
fi
Expected Behaviour
After applying /proc and /sys hardening:
- cat /proc/kallsyms as a non-root user shows all addresses as 0000000000000000
- cat /proc/kallsyms as root also shows zeroed addresses (with kptr_restrict=2)
- dmesg as a non-root user returns “Operation not permitted”
- ps aux as a non-root user shows only that user’s processes (with hidepid=2)
- Alt+SysRq key combinations do nothing (with sysrq=0). echo b > /proc/sysrq-trigger as root still works, because kernel.sysrq only governs keyboard invocation
- Monitoring agents in the proc group can still see all processes and collect metrics
- Container processes cannot read /proc/kcore, /proc/keys, or /proc/acpi
- System services (SSH, web servers, databases) function normally
- systemd-cgtop, htop (as root), and top (as root) display all processes correctly
Trade-offs
| Control | Benefit | Cost | Mitigation |
|---|---|---|---|
| hidepid=2 | Users cannot see other users’ processes, preventing enumeration of running services and command-line secrets | ps aux as non-root shows only own processes. Some tools that expect full process visibility break. | Add monitoring and admin users to the proc group via the gid=proc mount option. |
| kptr_restrict=2 | Kernel addresses hidden from everyone, including root. Prevents KASLR bypass even after root compromise. | Root cannot debug kernel issues that require symbol addresses. perf and bpftrace cannot resolve kernel symbols. | Use kptr_restrict=1 if root needs kernel symbols for debugging. On dedicated development/debugging hosts, keep at 0. |
| sysrq=0 | Prevents abuse of SysRq key combinations at the console for denial of service or data exfiltration | Cannot use SysRq keys for emergency recovery (sync, remount-ro, reboot) | Set sysrq=176 to allow sync, remount-ro, and reboot/poweroff, or sysrq=48 to allow only sync and remount-ro. Useful for emergency situations on physical hardware. |
| dmesg_restrict=1 | Prevents unprivileged access to kernel ring buffer (addresses, hardware info, driver details) | Users cannot run dmesg for troubleshooting | Grant CAP_SYSLOG to specific debugging users or tools. Or use journalctl -k with appropriate journal permissions. |
| Container procfs masking | Containers cannot access sensitive host kernel information | Some monitoring containers need access to host /proc paths | Mount specific host paths read-only into monitoring containers instead of disabling procfs masking entirely. |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Monitoring agent cannot read /proc | Metrics collection stops. Dashboards show gaps. Alerts fire for missing metrics. | Prometheus scrape errors. Agent logs show “permission denied” on /proc paths. | Add the monitoring agent’s user to the proc group: usermod -aG proc <agent_user>. Restart the agent. |
| hidepid=2 breaks application that reads other processes | Application fails with “no such file or directory” when reading /proc/<pid> of another process | Application error logs reference /proc paths. strace shows ENOENT or EACCES on /proc/[pid]/ access. | Add the application user to the proc group. Or run the application with CAP_SYS_PTRACE capability (grants /proc access). |
| kptr_restrict=2 breaks debugging tools | perf report shows unresolved symbols. bpftrace cannot map kernel addresses to function names. | Debugging output shows hex addresses instead of symbol names. | Temporarily set sysctl kernel.kptr_restrict=1 for the debugging session. Reset to 2 when done. |
| sysrq=0 prevents emergency recovery | Cannot use Alt+SysRq+S (sync) or Alt+SysRq+B (reboot) on a hung system | System is hung and the only option is a hard power cycle | Set sysrq=176 instead of 0 to allow sync, remount-ro, and reboot. For remote systems, use IPMI/BMC for emergency reboot. |
| Container runs with --privileged bypassing all masking | Container can read all /proc and /sys paths, including kernel memory | Kubernetes audit log shows privileged container creation. Pod security admission rejects the pod (if PSA is enforced). | Never use --privileged. Use Kubernetes Pod Security Admission (or a policy engine) to reject privileged containers at the admission level. |
When to Consider a Managed Alternative
Transition point: When you run containers at scale and need consistent procfs masking across multiple container runtimes and runtime versions, or when container runtime upgrades change the default masking behaviour and you need to verify compliance after each update.
What managed providers handle:
Managed Kubernetes providers (Civo, DigitalOcean, Vultr, Linode) configure container runtimes with appropriate procfs masking on their node images. The provider handles the runtime configuration and ensures that containers cannot access sensitive host paths by default. When the provider upgrades the container runtime, they verify that masking policies are maintained.
Falco (open source) and Sysdig detect suspicious access patterns to /proc and /sys paths at runtime. If a container attempts to read /proc/kcore or access a masked path, these tools generate an alert. This provides detection even if a masking configuration is accidentally weakened.
What you still control: Host-level /proc hardening (hidepid, kptr_restrict, dmesg_restrict) is your responsibility on self-managed infrastructure. Pod security contexts and admission policies that prevent privileged containers are your responsibility on any Kubernetes deployment, including managed clusters.
Automation path: For self-managed infrastructure, apply the sysctl and fstab configurations from this article through your configuration management tool. Run the verification script on a schedule to detect drift. For Kubernetes, enforce Pod Security Standards at the namespace level to prevent containers from running with elevated procfs access.
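Drift detection for the sysctl settings can be sketched as a comparison between the hardening file and the live values. This is an illustrative helper, not a replacement for the verification script: `sysctl_drift` is a hypothetical name, it handles single-value keys only, and its second argument (the sysctl tree root, /proc/sys by default) exists mainly so the function can be exercised against a test directory.

```shell
#!/bin/bash
# Report keys whose live value differs from the value in a sysctl .conf file.
# Returns 1 if any drift is found.
sysctl_drift() {
    local conf="$1" root="${2:-/proc/sys}" drift=0 key value actual
    while IFS='=' read -r key value || [ -n "$key" ]; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        value=$(printf '%s' "$value" | tr -d '[:space:]')
        # Skip blank lines and comments
        case "$key" in ''|'#'*|';'*) continue ;; esac
        # kernel.kptr_restrict -> <root>/kernel/kptr_restrict
        actual=$(cat "$root/$(printf '%s' "$key" | tr '.' '/')" 2>/dev/null)
        if [ "$actual" != "$value" ]; then
            echo "DRIFT: $key = ${actual:-<unreadable>} (expected $value)"
            drift=1
        fi
    done < "$conf"
    return "$drift"
}

CONF=/etc/sysctl.d/60-proc-hardening.conf
if [ -r "$CONF" ]; then
    sysctl_drift "$CONF" || echo "sysctl drift detected" >&2
fi
```

Scheduled from cron or a systemd timer, a non-zero exit status from `sysctl_drift` can feed the same alerting path as the verification script.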