Hardening WebSocket Connections: Authentication, Rate Limiting, and Origin Validation
Problem
WebSocket connections start as an HTTP upgrade request and then persist as a long-lived, full-duplex channel. This persistence creates a fundamentally different security surface than standard HTTP:
- No per-request authentication. Once the upgrade handshake completes, every subsequent message on that connection is trusted. If the initial handshake is weak (session cookie without origin check, no token validation), the connection stays open indefinitely with full access.
- Rate limiting gaps. Standard HTTP rate limiting counts requests per second. A WebSocket connection is a single “request” that can carry thousands of messages per second. Per-connection rate limiting requires stateful inspection that most reverse proxies do not provide by default.
- Origin bypass. Cross-Site WebSocket Hijacking (CSWSH) exploits the fact that browsers send cookies with WebSocket upgrade requests. An attacker page at evil.com can open a WebSocket to your-api.com, and the browser attaches the user’s session cookie automatically.
- Resource exhaustion. Each WebSocket connection holds a file descriptor and memory on the server. Without connection limits per IP or per user, an attacker can open thousands of connections and exhaust server resources.
- Message size abuse. WebSocket frames can be arbitrarily large. A single oversized message can consume all available memory on the server.
Target systems: Any application serving WebSocket connections, whether directly from the application server, through NGINX, or through Envoy/service mesh.
Threat Model
- Adversary: External attacker with browser-level access (for CSWSH) or direct TCP access (for connection flooding and message abuse). May also be an authenticated user abusing the connection.
- Access level: Unauthenticated for connection-level attacks. Authenticated (via stolen session or CSWSH) for message-level abuse.
- Objective: Cross-Site WebSocket Hijacking to read or send messages as the victim user. Resource exhaustion through connection flooding. Data exfiltration through an established WebSocket tunnel. Denial of service through oversized messages.
- Blast radius: All users sharing the same WebSocket server or backend process. If WebSocket handlers share memory or event loops with HTTP handlers, the blast radius extends to the entire application.
Configuration
Origin Validation
The most critical defence against Cross-Site WebSocket Hijacking is strict origin checking during the upgrade handshake. The server must reject upgrade requests from origins it does not explicitly trust.
NGINX configuration to enforce origin checking at the proxy layer:
# /etc/nginx/conf.d/websocket.conf
# Map to validate the Origin header during WebSocket upgrades.
# Only allow connections from your own domain(s).
map $http_origin $ws_origin_allowed {
    default                       0;
    "https://app.example.com"     1;
    "https://www.example.com"     1;
    "https://staging.example.com" 1;
}

# Map to handle the WebSocket upgrade headers
map $http_upgrade $connection_upgrade {
    default upgrade;
    ""      close;
}

server {
    listen 443 ssl;
    # Note: NGINX does not support WebSocket over HTTP/2 (RFC 8441
    # Extended CONNECT); browsers perform the upgrade over a separate
    # HTTP/1.1 connection, so http2 only affects plain HTTPS traffic.
    http2 on;
    server_name ws.example.com;

    # TLS configuration (see Article #39 for full TLS hardening)
    ssl_certificate     /etc/nginx/certs/ws.example.com.crt;
    ssl_certificate_key /etc/nginx/certs/ws.example.com.key;

    location /ws {
        # Reject WebSocket upgrades from disallowed origins.
        # This blocks Cross-Site WebSocket Hijacking.
        if ($ws_origin_allowed = 0) {
            return 403;
        }

        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # WebSocket-specific timeouts.
        # proxy_read_timeout controls how long NGINX waits between
        # reads from the backend. For WebSocket, this is the idle
        # timeout: if no message is sent for this duration, NGINX
        # closes the connection. 300s (5 minutes) is typical.
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
Application-level origin validation (Node.js with the ws library):
// server.js - WebSocket server with origin validation
const WebSocket = require('ws');

const ALLOWED_ORIGINS = new Set([
  'https://app.example.com',
  'https://www.example.com',
]);

const wss = new WebSocket.Server({
  port: 8080,
  // Validate origin before accepting the upgrade
  verifyClient: (info, callback) => {
    const origin = info.origin || info.req.headers.origin;
    if (!origin || !ALLOWED_ORIGINS.has(origin)) {
      callback(false, 403, 'Forbidden: invalid origin');
      return;
    }
    callback(true);
  },
  // Maximum message size: 64KB
  maxPayload: 64 * 1024,
});
Authentication During the Upgrade Handshake
WebSocket connections must be authenticated before the upgrade completes, or by the first message immediately after it. The two common patterns are token-in-query-string and token-in-first-message. Query-string tokens are the simplest to validate during the upgrade, but URLs can end up in proxy and access logs, so use short-lived, single-purpose tokens rather than long-lived session credentials:
// Pattern 1: Token in query string (validated during upgrade)
// Client connects: wss://ws.example.com/ws?token=<jwt>
const url = require('url');
const jwt = require('jsonwebtoken');

const wss = new WebSocket.Server({
  port: 8080,
  verifyClient: (info, callback) => {
    // Validate origin
    const origin = info.origin || info.req.headers.origin;
    if (!ALLOWED_ORIGINS.has(origin)) {
      callback(false, 403, 'Forbidden');
      return;
    }

    // Extract and validate token from query string
    const params = url.parse(info.req.url, true).query;
    if (!params.token) {
      callback(false, 401, 'Unauthorized: missing token');
      return;
    }

    try {
      const decoded = jwt.verify(params.token, process.env.JWT_SECRET);
      // Attach user info to the request for later use
      info.req.user = decoded;
      callback(true);
    } catch (err) {
      callback(false, 401, 'Unauthorized: invalid token');
    }
  },
  maxPayload: 64 * 1024,
});

wss.on('connection', (ws, req) => {
  // req.user is available from verifyClient
  ws.userId = req.user.sub;

  ws.on('message', (data) => {
    // All messages on this connection are from an authenticated user
    handleMessage(ws, ws.userId, data);
  });
});
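The second pattern, token-in-first-message, keeps the token out of the URL entirely: the server accepts the upgrade, then requires a valid auth message within a short window before processing anything else. A minimal sketch of the gating logic, written as a plain function so the socket plumbing stays out of the way; `verifyToken` is a hypothetical stand-in for `jwt.verify` with your secret:

```javascript
// Pattern 2 (sketch): the connection is accepted unauthenticated, but the
// first message must carry a valid token or the caller closes the socket.
function createAuthGate(verifyToken) {
  let user = null;
  return function handle(rawData) {
    if (user !== null) {
      // Already authenticated: pass the message through with its user.
      return { type: 'message', user, data: rawData };
    }
    try {
      const msg = JSON.parse(rawData);
      user = verifyToken(msg.token);
      return { type: 'auth_ok', user };
    } catch (err) {
      // Malformed JSON or invalid token: caller should close the socket.
      return { type: 'close', code: 4401 };
    }
  };
}

// Wiring (sketch):
// wss.on('connection', (ws) => {
//   const gate = createAuthGate((t) => jwt.verify(t, process.env.JWT_SECRET));
//   const timer = setTimeout(() => ws.close(4401, 'auth timeout'), 5000);
//   ws.on('message', (data) => {
//     const result = gate(data);
//     if (result.type === 'close') return ws.close(result.code, 'unauthorized');
//     if (result.type === 'auth_ok') return clearTimeout(timer);
//     handleMessage(ws, result.user.sub, result.data);
//   });
// });
```

Close codes in the 4000-4999 range are reserved for application use, so 4401 here is a private convention, not a standard code. The timeout matters: without it, an attacker can hold unauthenticated sockets open indefinitely.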
Connection Limits Per IP
NGINX connection limiting applies to WebSocket connections the same way it applies to HTTP:
# In the http {} block
# Track WebSocket connections per client IP.
# Separate zone from HTTP to allow independent limits.
limit_conn_zone $binary_remote_addr zone=ws_conn_per_ip:10m;

# Same ws.example.com server block as above, condensed to highlight
# the connection-limit directives.
server {
    listen 443 ssl;
    server_name ws.example.com;

    location /ws {
        # Maximum 10 simultaneous WebSocket connections per IP.
        # Legitimate clients rarely need more than 2-3.
        limit_conn ws_conn_per_ip 10;
        limit_conn_status 429;

        # Origin check (from map above)
        if ($ws_origin_allowed = 0) {
            return 403;
        }

        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
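Because limit_conn counts by IP, it penalizes users behind a shared NAT. The complementary control is a per-user cap at the application layer, where the authenticated user ID is known. A minimal sketch (the class name and limits are illustrative, not from the original configuration):

```javascript
// Per-user connection cap, enforced at the application layer where the
// authenticated user ID is known (NGINX only sees the client IP).
class UserConnectionLimiter {
  constructor(maxPerUser) {
    this.maxPerUser = maxPerUser;
    this.counts = new Map(); // userId -> open connection count
  }

  tryAcquire(userId) {
    const current = this.counts.get(userId) || 0;
    if (current >= this.maxPerUser) return false;
    this.counts.set(userId, current + 1);
    return true;
  }

  release(userId) {
    const current = this.counts.get(userId) || 0;
    if (current <= 1) this.counts.delete(userId);
    else this.counts.set(userId, current - 1);
  }
}

// Wiring (sketch):
// const userLimiter = new UserConnectionLimiter(3);
// wss.on('connection', (ws, req) => {
//   if (!userLimiter.tryAcquire(req.user.sub)) {
//     return ws.close(4429, 'too many connections');
//   }
//   ws.on('close', () => userLimiter.release(req.user.sub));
// });
```

Pairing release with the close event is the critical part: a missed release leaks the slot and eventually locks the user out.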
Per-Connection Message Rate Limiting
NGINX cannot inspect individual WebSocket frames after the upgrade. Message-level rate limiting must happen at the application layer:
// Per-connection message rate limiter
class ConnectionRateLimiter {
  constructor(maxMessagesPerSecond, maxMessagesPerMinute) {
    this.maxPerSecond = maxMessagesPerSecond;
    this.maxPerMinute = maxMessagesPerMinute;
    this.secondCount = 0;
    this.minuteCount = 0;

    // Reset counters on intervals
    this.secondTimer = setInterval(() => { this.secondCount = 0; }, 1000);
    this.minuteTimer = setInterval(() => { this.minuteCount = 0; }, 60000);
  }

  tryConsume() {
    if (this.secondCount >= this.maxPerSecond) return false;
    if (this.minuteCount >= this.maxPerMinute) return false;
    this.secondCount++;
    this.minuteCount++;
    return true;
  }

  destroy() {
    clearInterval(this.secondTimer);
    clearInterval(this.minuteTimer);
  }
}

wss.on('connection', (ws, req) => {
  // Allow 10 messages/second, 200 messages/minute per connection
  const limiter = new ConnectionRateLimiter(10, 200);

  ws.on('message', (data) => {
    if (!limiter.tryConsume()) {
      ws.send(JSON.stringify({
        error: 'rate_limited',
        message: 'Too many messages. Slow down.'
      }));
      return;
    }
    handleMessage(ws, data);
  });

  ws.on('close', () => {
    limiter.destroy();
  });
});
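The fixed-window counters above admit a burst of up to twice maxPerSecond straddling a window boundary and cost two timers per connection. A token bucket (sketch; the class and the injectable clock are illustrative) refills lazily from elapsed time, so it needs no timers and smooths bursts:

```javascript
// Token bucket: capacity bounds the burst, refillPerSecond bounds the
// sustained rate. Tokens are refilled lazily on each call from the
// elapsed time, so no setInterval is needed per connection.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now; // injectable clock, handy for testing
    this.tokens = capacity;
    this.lastRefill = this.now();
  }

  tryConsume() {
    const t = this.now();
    const elapsedSeconds = (t - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = t;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Usage: new TokenBucket(20, 10) allows bursts of up to 20 messages
// and a sustained rate of 10 messages/second, with no destroy() needed.
```

Dropping the per-connection timers also removes the failure mode where a missed destroy() leaks intervals for every disconnected client.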
Envoy WebSocket Proxy Configuration
# Envoy WebSocket configuration with connection limits
static_resources:
  listeners:
  - name: ws_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 8443
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ws_ingress
          codec_type: AUTO
          # Enable WebSocket upgrades
          upgrade_configs:
          - upgrade_type: websocket
            enabled: true
          # Idle timeout for WebSocket connections
          stream_idle_timeout: 300s
          route_config:
            name: ws_route
            virtual_hosts:
            - name: ws_backend
              domains: ["ws.example.com"]
              routes:
              - match:
                  prefix: "/ws"
                route:
                  cluster: ws_cluster
                  timeout: 0s
                  idle_timeout: 300s
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: ws_cluster
    connect_timeout: 5s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    # Circuit breaker limits simultaneous connections
    circuit_breakers:
      thresholds:
      - priority: DEFAULT
        max_connections: 10000
        max_pending_requests: 1000
    load_assignment:
      cluster_name: ws_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 10.0.1.10
                port_value: 8080
Message Size Limits
At the application layer, enforce maximum message sizes to prevent memory exhaustion:
// Node.js ws library: maxPayload in bytes
const wss = new WebSocket.Server({
  port: 8080,
  maxPayload: 64 * 1024,  // 64 KB max message size
  backlog: 100,           // Max pending connections
  clientTracking: true,   // Track connected clients
});

// Go (gorilla/websocket) equivalent:
// conn.SetReadLimit(65536) // 64 KB
Expected Behaviour
After applying the WebSocket hardening configuration:
# Verify origin checking blocks cross-origin requests
curl -s -o /dev/null -w "%{http_code}" \
  -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Origin: https://evil.com" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  https://ws.example.com/ws
# Expected: 403

# Verify legitimate origin is accepted. After the 101 the connection
# stays open, so cap the wait; curl reports the status it received.
curl -s -o /dev/null -w "%{http_code}" \
  -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Origin: https://app.example.com" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  --max-time 2 \
  https://ws.example.com/ws
# Expected: 101 (Switching Protocols)

# Verify connection limit (open 11 connections from same IP)
for i in $(seq 1 11); do
  curl -s -o /dev/null -w "%{http_code} " \
    -H "Upgrade: websocket" \
    -H "Connection: Upgrade" \
    -H "Origin: https://app.example.com" \
    -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
    -H "Sec-WebSocket-Version: 13" \
    --max-time 2 \
    https://ws.example.com/ws &
done
wait
# Expected: first 10 return 101, 11th returns 429
Trade-offs
| Control | Impact | Risk | Mitigation |
|---|---|---|---|
| Origin validation via `map` | Blocks cross-origin WebSocket connections | Legitimate integrations from partner domains are blocked | Add partner origins to the allow list; use a dynamic origin check at the application layer for multi-tenant setups |
| `proxy_read_timeout 300s` | Idle connections close after 5 minutes | Long-idle connections (dashboard tabs left open) disconnect | Implement application-level ping/pong to keep connections alive within the timeout window |
| `limit_conn ws_conn_per_ip 10` | Limits simultaneous connections per IP | Users behind corporate NAT share the connection pool | Increase the limit or switch to per-user connection limiting at the application layer |
| Application-level rate limiting | Adds CPU overhead per message for the rate check | Slight increase in message latency | Use an efficient token bucket implementation; the overhead is negligible compared to message processing |
| `maxPayload: 64 * 1024` | Messages larger than 64 KB are rejected | Binary data transfers (images, files) over WebSocket fail | Use HTTP upload endpoints for large payloads; WebSocket should carry control messages and small data |
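The ping/pong keepalive named as a mitigation above follows the ws library's standard heartbeat idiom. A sketch with the per-tick logic split out as a plain function over the client set; the 120-second interval is illustrative and must stay below proxy_read_timeout:

```javascript
// One heartbeat pass: terminate peers that never answered the previous
// ping, then ping everyone else. Each ws carries isAlive bookkeeping.
function heartbeatTick(clients) {
  clients.forEach((ws) => {
    if (ws.isAlive === false) {
      ws.terminate(); // peer missed a full ping cycle
      return;
    }
    ws.isAlive = false; // reset; the pong handler flips it back to true
    ws.ping();
  });
}

// Wiring (sketch):
// wss.on('connection', (ws) => {
//   ws.isAlive = true;
//   ws.on('pong', () => { ws.isAlive = true; });
// });
// const timer = setInterval(() => heartbeatTick(wss.clients), 120000);
// wss.on('close', () => clearInterval(timer));
```

Protocol-level pings also traverse NGINX as WebSocket frames, so they reset proxy_read_timeout without the client sending application messages.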
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Origin allow list missing a legitimate domain | Users from that domain cannot establish WebSocket connections | Support reports from users on the missing domain; 403 spike in WebSocket upgrade logs | Add the domain to the origin map and reload NGINX |
| `proxy_read_timeout` too short | Active connections close during natural idle periods | Application monitoring shows frequent WebSocket reconnections; client-side reconnect storms | Increase the timeout or implement ping/pong keepalive at the application layer |
| Connection limit too low for NAT users | Corporate users sharing a NAT IP cannot all connect | Support tickets from specific office locations; 429 responses correlated with NAT IP ranges | Increase `limit_conn` for known NAT ranges or switch to per-authenticated-user limits |
| Rate limiter rejects legitimate burst traffic | Users performing rapid valid actions (typing, scrolling) get rate-limited | Application logs show rate-limit events from active users; user complaints about dropped messages | Increase the burst allowance; use a sliding window instead of a fixed interval |
| `maxPayload` too small for application data | Application features that send larger messages fail silently | Client-side errors; WebSocket close events with code 1009 (message too big) | Increase `maxPayload` to match the application’s actual maximum message size |
When to Consider a Managed Alternative
Transition point: When your WebSocket infrastructure exceeds 10,000 concurrent connections and you are spending significant time on connection management, scaling, and abuse detection rather than application features, or when you need geographic distribution of WebSocket endpoints.
What managed providers handle:
- Cloudflare (#29): WebSocket connections are proxied through Cloudflare’s edge network with automatic DDoS protection. Connection-level rate limiting and IP reputation filtering apply to WebSocket upgrades. The free plan supports WebSocket proxying; Pro adds more granular controls.
What you still control: Origin validation, authentication during the upgrade handshake, per-connection message rate limiting, and message-level authorization are application concerns that no edge provider handles for you. Cloudflare protects the connection layer; you protect the message layer.
Architecture: Cloudflare terminates TLS and filters abuse at the edge. Your WebSocket server behind Cloudflare handles authentication, origin validation, and message-level security. The edge absorbs connection floods; your application enforces business logic on established connections.