Hardening WebSocket Connections: Authentication, Rate Limiting, and Origin Validation

Problem

WebSocket connections start as an HTTP upgrade request and then persist as a long-lived, full-duplex channel. This persistence creates a fundamentally different security surface than standard HTTP:

  • No per-request authentication. Once the upgrade handshake completes, every subsequent message on that connection is trusted. If the initial handshake is weak (session cookie without origin check, no token validation), the connection stays open indefinitely with full access.
  • Rate limiting gaps. Standard HTTP rate limiting counts requests per second. A WebSocket connection is a single “request” that can carry thousands of messages per second. Per-connection rate limiting requires stateful inspection that most reverse proxies do not provide by default.
  • Origin bypass. Cross-Site WebSocket Hijacking (CSWSH) exploits the fact that browsers attach cookies to WebSocket upgrade requests and apply no CORS-style restrictions to them. An attacker page at evil.com can open a WebSocket to your-api.com, the browser sends the user’s session cookie automatically, and unless the server checks the Origin header, the hijacked connection acts with the victim’s privileges.
  • Resource exhaustion. Each WebSocket connection holds a file descriptor and memory on the server. Without connection limits per IP or per user, an attacker can open thousands of connections and exhaust server resources.
  • Message size abuse. WebSocket frames can be arbitrarily large. A single oversized message can consume all available memory on the server.

Target systems: Any application serving WebSocket connections, whether directly from the application server, through NGINX, or through Envoy/service mesh.

Threat Model

  • Adversary: External attacker with browser-level access (for CSWSH) or direct TCP access (for connection flooding and message abuse). May also be an authenticated user abusing the connection.
  • Access level: Unauthenticated for connection-level attacks. Authenticated (via stolen session or CSWSH) for message-level abuse.
  • Objective: Cross-Site WebSocket Hijacking to read or send messages as the victim user. Resource exhaustion through connection flooding. Data exfiltration through an established WebSocket tunnel. Denial of service through oversized messages.
  • Blast radius: All users sharing the same WebSocket server or backend process. If WebSocket handlers share memory or event loops with HTTP handlers, the blast radius extends to the entire application.

Configuration

Origin Validation

The most critical defence against Cross-Site WebSocket Hijacking is strict origin checking during the upgrade handshake. The server must reject upgrade requests from origins it does not explicitly trust.

NGINX configuration to enforce origin checking at the proxy layer:

# /etc/nginx/conf.d/websocket.conf

# Map to validate the Origin header during WebSocket upgrades.
# Only allow connections from your own domain(s).
map $http_origin $ws_origin_allowed {
    default 0;
    "https://app.example.com" 1;
    "https://www.example.com" 1;
    "https://staging.example.com" 1;
}

# Map to handle the WebSocket upgrade headers
map $http_upgrade $connection_upgrade {
    default upgrade;
    "" close;
}

server {
    listen 443 ssl;
    # The WebSocket Upgrade handshake requires HTTP/1.1, and NGINX does
    # not proxy WebSocket over HTTP/2, so http2 is left off this listener.
    server_name ws.example.com;

    # TLS configuration (see Article #39 for full TLS hardening)
    ssl_certificate /etc/nginx/certs/ws.example.com.crt;
    ssl_certificate_key /etc/nginx/certs/ws.example.com.key;

    location /ws {
        # Reject WebSocket upgrades from disallowed origins.
        # This blocks Cross-Site WebSocket Hijacking.
        if ($ws_origin_allowed = 0) {
            return 403;
        }

        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # WebSocket-specific timeouts.
        # proxy_read_timeout controls how long NGINX waits between
        # reads from the backend. For WebSocket, this is the idle
        # timeout: if no message is sent for this duration, NGINX
        # closes the connection. 300s (5 minutes) is typical.
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}

Application-level origin validation (Node.js with the ws library):

// server.js - WebSocket server with origin validation
const WebSocket = require('ws');

const ALLOWED_ORIGINS = new Set([
  'https://app.example.com',
  'https://www.example.com',
]);

const wss = new WebSocket.Server({
  port: 8080,
  // Validate origin before accepting the upgrade
  verifyClient: (info, callback) => {
    const origin = info.origin || info.req.headers.origin;

    if (!origin || !ALLOWED_ORIGINS.has(origin)) {
      callback(false, 403, 'Forbidden: invalid origin');
      return;
    }

    callback(true);
  },
  // Maximum message size: 64KB
  maxPayload: 64 * 1024,
});

Authentication During the Upgrade Handshake

WebSocket connections must be authenticated before the upgrade completes. The two common patterns are token-in-query-string and token-in-first-message. Query-string tokens can end up in proxy and access logs, so if you use that pattern, issue short-lived, single-purpose tokens:

// Pattern 1: Token in query string (validated during upgrade)
// Client connects: wss://ws.example.com/ws?token=<jwt>
const url = require('url');
const jwt = require('jsonwebtoken');

const wss = new WebSocket.Server({
  port: 8080,
  verifyClient: (info, callback) => {
    // Validate origin
    const origin = info.origin || info.req.headers.origin;
    if (!ALLOWED_ORIGINS.has(origin)) {
      callback(false, 403, 'Forbidden');
      return;
    }

    // Extract and validate token from query string
    const params = url.parse(info.req.url, true).query;
    if (!params.token) {
      callback(false, 401, 'Unauthorized: missing token');
      return;
    }

    try {
      const decoded = jwt.verify(params.token, process.env.JWT_SECRET);
      // Attach user info to the request for later use
      info.req.user = decoded;
      callback(true);
    } catch (err) {
      callback(false, 401, 'Unauthorized: invalid token');
    }
  },
  maxPayload: 64 * 1024,
});

wss.on('connection', (ws, req) => {
  // req.user is available from verifyClient
  ws.userId = req.user.sub;

  ws.on('message', (data) => {
    // All messages on this connection are from an authenticated user
    handleMessage(ws, ws.userId, data);
  });
});
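The second pattern accepts the socket unauthenticated and requires a valid auth message within a short deadline, so the token never appears in URLs or access logs. A minimal sketch; `authenticateFirstMessage`, the `auth` message shape, and the 4401 close code are illustrative conventions (codes 4000-4999 are reserved for applications), and `verifyToken` is a placeholder for your real check (e.g. jwt.verify):

```javascript
// Pattern 2: token in the first message (sketch).
// Close the socket unless a valid auth message arrives within timeoutMs.
function authenticateFirstMessage(ws, verifyToken, timeoutMs = 5000) {
  // Reap sockets that never attempt to authenticate.
  const deadline = setTimeout(() => ws.close(4401, 'auth timeout'), timeoutMs);

  ws.once('message', (data) => {
    clearTimeout(deadline);
    let user = null;
    try {
      const msg = JSON.parse(data);
      if (msg.type === 'auth') user = verifyToken(msg.token);
    } catch (err) { /* malformed message: fall through to close */ }

    if (!user) {
      ws.close(4401, 'unauthorized');
      return;
    }
    ws.user = user;                  // later handlers can trust this
    ws.emit('authenticated', user);  // hand off to the app's message handlers
  });
}
```

The cost of this pattern is a short window where unauthenticated sockets stay open, so keep the deadline tight and count pending sockets against the per-IP connection limit.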

Connection Limits Per IP

NGINX connection limiting applies to WebSocket connections the same way it applies to HTTP:

# In the http {} block

# Track WebSocket connections per client IP.
# Separate zone from HTTP to allow independent limits.
limit_conn_zone $binary_remote_addr zone=ws_conn_per_ip:10m;

server {
    listen 443 ssl;
    server_name ws.example.com;

    location /ws {
        # Maximum 10 simultaneous WebSocket connections per IP.
        # Legitimate clients rarely need more than 2-3.
        limit_conn ws_conn_per_ip 10;
        limit_conn_status 429;

        # Origin check (from map above)
        if ($ws_origin_allowed = 0) {
            return 403;
        }

        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;

        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
    }
}
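Per-IP limits penalise users behind shared NAT. A complementary control caps connections per authenticated user at the application layer; a minimal sketch (the class name and default limit are illustrative):

```javascript
// Per-user connection cap. Assumes each socket has already been
// authenticated and carries a userId before this check runs.
class UserConnectionLimiter {
  constructor(maxPerUser = 3) {
    this.maxPerUser = maxPerUser;
    this.counts = new Map(); // userId -> open connection count
  }

  // Returns true if the user may open another connection.
  tryAdd(userId) {
    const current = this.counts.get(userId) || 0;
    if (current >= this.maxPerUser) return false;
    this.counts.set(userId, current + 1);
    return true;
  }

  // Call from the socket's 'close' handler.
  remove(userId) {
    const current = this.counts.get(userId) || 0;
    if (current <= 1) this.counts.delete(userId);
    else this.counts.set(userId, current - 1);
  }
}
```

In the connection handler, reject over-limit sockets with close code 1008 (policy violation) and call remove() on close; this catches abuse that per-IP limits miss, such as one stolen token opening connections from many IPs.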

Per-Connection Message Rate Limiting

NGINX cannot inspect individual WebSocket frames after the upgrade. Message-level rate limiting must happen at the application layer:

// Per-connection message rate limiter using fixed one-second and
// one-minute windows (simple, but bursts can straddle a window boundary)
class ConnectionRateLimiter {
  constructor(maxMessagesPerSecond, maxMessagesPerMinute) {
    this.maxPerSecond = maxMessagesPerSecond;
    this.maxPerMinute = maxMessagesPerMinute;
    this.secondCount = 0;
    this.minuteCount = 0;

    // Reset counters on intervals
    this.secondTimer = setInterval(() => { this.secondCount = 0; }, 1000);
    this.minuteTimer = setInterval(() => { this.minuteCount = 0; }, 60000);
  }

  tryConsume() {
    if (this.secondCount >= this.maxPerSecond) return false;
    if (this.minuteCount >= this.maxPerMinute) return false;
    this.secondCount++;
    this.minuteCount++;
    return true;
  }

  destroy() {
    clearInterval(this.secondTimer);
    clearInterval(this.minuteTimer);
  }
}

wss.on('connection', (ws, req) => {
  // Allow 10 messages/second, 200 messages/minute per connection
  const limiter = new ConnectionRateLimiter(10, 200);

  ws.on('message', (data) => {
    if (!limiter.tryConsume()) {
      ws.send(JSON.stringify({
        error: 'rate_limited',
        message: 'Too many messages. Slow down.'
      }));
      return;
    }

    handleMessage(ws, data);
  });

  ws.on('close', () => {
    limiter.destroy();
  });
});
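The fixed-window counters above admit bursts of up to twice the limit when a burst straddles a window boundary. A token bucket smooths this out and needs no timers, so there is nothing to clean up on close; a sketch (rate and burst values are illustrative):

```javascript
// Token bucket: refills `rate` tokens per second, up to `burst` capacity.
// Tokens are replenished lazily on each call instead of via setInterval.
class TokenBucket {
  constructor(rate, burst, now = Date.now) {
    this.rate = rate;    // tokens added per second
    this.burst = burst;  // bucket capacity (max burst size)
    this.tokens = burst;
    this.now = now;      // injectable clock, useful for testing
    this.last = now();
  }

  tryConsume() {
    const t = this.now();
    const elapsedSeconds = (t - this.last) / 1000;
    this.last = t;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSeconds * this.rate);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

This drops into the message handler in place of ConnectionRateLimiter; because there are no intervals, the 'close' handler no longer needs a destroy() call. Connections that keep violating the limit should eventually be closed rather than answered, since each error reply is itself outbound work.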

Envoy WebSocket Proxy Configuration

# Envoy WebSocket configuration with connection limits
static_resources:
  listeners:
    - name: ws_listener
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8443
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ws_ingress
                codec_type: AUTO
                # Enable WebSocket upgrades
                upgrade_configs:
                  - upgrade_type: websocket
                    enabled: true
                # Idle timeout for WebSocket connections
                stream_idle_timeout: 300s
                route_config:
                  name: ws_route
                  virtual_hosts:
                    - name: ws_backend
                      domains: ["ws.example.com"]
                      routes:
                        - match:
                            prefix: "/ws"
                          route:
                            cluster: ws_cluster
                            timeout: 0s
                            idle_timeout: 300s
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router

  clusters:
    - name: ws_cluster
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      # Circuit breaker limits simultaneous connections
      circuit_breakers:
        thresholds:
          - priority: DEFAULT
            max_connections: 10000
            max_pending_requests: 1000
      load_assignment:
        cluster_name: ws_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 10.0.1.10
                      port_value: 8080

Message Size Limits

At the application layer, enforce maximum message sizes to prevent memory exhaustion:

// Node.js ws library: maxPayload in bytes
const wss = new WebSocket.Server({
  port: 8080,
  maxPayload: 64 * 1024,       // 64 KB max message size
  backlog: 100,                 // Max pending connections
  clientTracking: true,         // Track connected clients
});

// Go (gorilla/websocket)
// conn.SetReadLimit(65536) // 64 KB
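The proxy idle timeouts configured above (proxy_read_timeout, stream_idle_timeout) close quiet connections. The ws library's ping/pong support can keep legitimate idle clients alive inside that window while reaping dead peers; a sketch along the lines of the pattern the ws project documents (function names and the sweep interval are illustrative):

```javascript
// Heartbeat: ping every client periodically; terminate any client that
// did not answer the previous ping with a pong.

// Call once per connection, in the 'connection' handler.
function markAlive(ws) {
  ws.isAlive = true;
  ws.on('pong', () => { ws.isAlive = true; });
}

// One sweep over the client set; returns the sockets it terminated.
function sweep(clients) {
  const dead = [];
  for (const ws of clients) {
    if (ws.isAlive === false) {
      ws.terminate();       // never answered the last ping
      dead.push(ws);
      continue;
    }
    ws.isAlive = false;     // must be confirmed by the next pong
    ws.ping();
  }
  return dead;
}

// Wiring: const timer = setInterval(() => sweep(wss.clients), 60_000);
// clearInterval(timer) when the server shuts down.
```

Keep the sweep interval well below the proxy idle timeout (60 seconds against the 300-second proxy_read_timeout above) so healthy idle connections are never cut by the proxy.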

Expected Behaviour

After applying the WebSocket hardening configuration:

# Verify origin checking blocks cross-origin requests
curl -s -o /dev/null -w "%{http_code}" \
  -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Origin: https://evil.com" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  https://ws.example.com/ws
# Expected: 403

# Verify legitimate origin is accepted.
# --max-time is needed because a successful upgrade leaves the
# connection open; curl still prints the status code on timeout.
curl -s -o /dev/null -w "%{http_code}" \
  -H "Upgrade: websocket" \
  -H "Connection: Upgrade" \
  -H "Origin: https://app.example.com" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  -H "Sec-WebSocket-Version: 13" \
  --max-time 5 \
  https://ws.example.com/ws
# Expected: 101 (Switching Protocols)

# Verify connection limit (open 11 connections from same IP)
for i in $(seq 1 11); do
  curl -s -o /dev/null -w "%{http_code} " \
    -H "Upgrade: websocket" \
    -H "Connection: Upgrade" \
    -H "Origin: https://app.example.com" \
    -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
    -H "Sec-WebSocket-Version: 13" \
    --max-time 2 \
    https://ws.example.com/ws &
done
wait
# Expected: first 10 return 101, 11th returns 429

Trade-offs

  • Origin validation via map. Impact: blocks cross-origin WebSocket connections. Risk: legitimate integrations from partner domains are blocked. Mitigation: add partner origins to the allow list; use a dynamic origin check at the application layer for multi-tenant setups.
  • proxy_read_timeout 300s. Impact: idle connections close after 5 minutes. Risk: long-idle connections (dashboard tabs left open) disconnect. Mitigation: implement application-level ping/pong to keep connections alive within the timeout window.
  • limit_conn ws_conn_per_ip 10. Impact: limits simultaneous connections per IP. Risk: users behind corporate NAT share the connection pool. Mitigation: increase the limit or switch to per-user connection limiting at the application layer.
  • Application-level rate limiting. Impact: adds CPU overhead per message for the rate check. Risk: slight increase in message latency. Mitigation: use an efficient token bucket implementation; the overhead is negligible compared to message processing.
  • maxPayload: 64 * 1024. Impact: messages larger than 64 KB are rejected. Risk: binary data transfers (images, files) over WebSocket fail. Mitigation: use HTTP upload endpoints for large payloads; WebSocket should carry control messages and small data.

Failure Modes

  • Origin allow list missing a legitimate domain. Symptom: users from that domain cannot establish WebSocket connections. Detection: support reports from users on the missing domain; a 403 spike in WebSocket upgrade logs. Recovery: add the domain to the origin map and reload NGINX.
  • proxy_read_timeout too short. Symptom: active connections close during natural idle periods. Detection: application monitoring shows frequent WebSocket reconnections; client-side reconnect storms. Recovery: increase the timeout or implement ping/pong keepalive at the application layer.
  • Connection limit too low for NAT users. Symptom: corporate users sharing a NAT IP cannot all connect. Detection: support tickets from specific office locations; 429 responses correlated with NAT IP ranges. Recovery: increase limit_conn for known NAT ranges or switch to per-authenticated-user limits.
  • Rate limiter rejects legitimate burst traffic. Symptom: users performing rapid valid actions (typing, scrolling) get rate-limited. Detection: application logs show rate limit events from active users; user complaints about dropped messages. Recovery: increase the burst allowance; use a sliding window or token bucket instead of a fixed interval.
  • maxPayload too small for application data. Symptom: application features that send larger messages fail silently. Detection: client-side errors; WebSocket close events with code 1009 (message too big). Recovery: increase maxPayload to match the application’s actual maximum message size.

When to Consider a Managed Alternative

Transition point: When your WebSocket infrastructure exceeds 10,000 concurrent connections and you are spending significant time on connection management, scaling, and abuse detection rather than application features, or when you need geographic distribution of WebSocket endpoints.

What managed providers handle:

  • Cloudflare (#29): WebSocket connections are proxied through Cloudflare’s edge network with automatic DDoS protection. Connection-level rate limiting and IP reputation filtering apply to WebSocket upgrades. The free plan supports WebSocket proxying; Pro adds more granular controls.

What you still control: Origin validation, authentication during the upgrade handshake, per-connection message rate limiting, and message-level authorization are application concerns that no edge provider handles for you. Cloudflare protects the connection layer; you protect the message layer.

Architecture: Cloudflare terminates TLS and filters abuse at the edge. Your WebSocket server behind Cloudflare handles authentication, origin validation, and message-level security. The edge absorbs connection floods; your application enforces business logic on established connections.