Rate Limiting at the Ingress Layer: NGINX, Envoy, and Cloud Load Balancers Compared
Problem
Rate limiting is the first line of defence against abuse, credential stuffing, API scraping, and denial-of-service attacks. However, production rate limiting is harder than it appears:
- NGINX `limit_req` is per-instance. If you run 3 NGINX replicas behind a load balancer, an attacker gets 3x the intended rate limit because each instance tracks state independently. At 5 replicas, the effective limit is 5x.
- Per-IP limiting breaks behind NAT and proxies. Corporate offices, mobile carriers, and VPN providers share a single public IP across thousands of users. A per-IP rate limit of 10 requests per second affects every user behind that IP.
- Per-API-key limiting outgrows NGINX entirely. NGINX has no native concept of API keys. Implementing per-key limits requires either Lua scripting, an external rate-limit service, or moving to an API gateway.
- Connection-level and request-level limiting are confused. `limit_conn` restricts simultaneous connections; `limit_req` restricts request rate. They protect against different attack patterns and are not interchangeable.
- Cloud load balancer rate limiting varies wildly. AWS ALB, GCP Cloud Armor, and Azure Front Door each have different granularity, pricing, and limitations. Choosing the wrong layer wastes budget or leaves gaps.
The result: most teams either skip rate limiting, deploy per-instance limits that provide false confidence, or hard-code aggressive limits that block legitimate traffic.
Target systems: NGINX 1.24+, Envoy 1.28+, Kubernetes Ingress controllers, and cloud load balancers (AWS, GCP, Azure).
Threat Model
- Adversary: External attacker or automated bot with unauthenticated or authenticated HTTP access. May use distributed IPs (botnet, cloud VM rotation) to bypass per-IP limits.
- Access level: Network access to public-facing endpoints. For API abuse, the attacker may have valid API credentials (scraped, leaked, or from a free tier account).
- Objective: Credential stuffing against login endpoints, API scraping to extract bulk data, denial of service through request flooding, or resource exhaustion by triggering expensive backend operations (search queries, report generation).
- Blast radius: Without rate limiting, a single attacker can consume all backend capacity, affecting every legitimate user. With per-IP-only limiting, an attacker using distributed IPs bypasses all controls.
Configuration
NGINX: Per-IP Rate Limiting with limit_req
NGINX uses the leaky bucket algorithm. Incoming requests fill a bucket that drains at the configured rate. Excess requests are either delayed (queued) or rejected.
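To make the burst accounting concrete, here is a minimal Python sketch of this drain-and-burst bookkeeping (illustrative only; the class and variable names are mine, not NGINX's, though the allowed/rejected counts match NGINX's documented `burst` + `nodelay` behaviour):

```python
class LeakyBucket:
    """Sketch of NGINX-style limit_req accounting (not the real implementation).

    rate:  allowed requests per second (the rate= parameter)
    burst: extra requests tolerated above the rate (the burst= parameter)
    """
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.excess = 0.0   # how far ahead of the allowed rate this key is
        self.last = 0.0     # timestamp of the previous request

    def allow(self, now):
        # The bucket drains at `rate` requests per second.
        self.excess = max(0.0, self.excess - (now - self.last) * self.rate)
        self.last = now
        if self.excess > self.burst:
            return False    # over burst: NGINX would return 429
        self.excess += 1
        return True

# 35 back-to-back requests against rate=10, burst=20 (as in the config below):
bucket = LeakyBucket(rate=10, burst=20)
results = [bucket.allow(now=0.0) for _ in range(35)]
print(results.count(True))  # 21 pass instantly (1 + burst); the rest are rejected
```

Note the instantaneous allowance is `1 + burst`, not `rate + burst`: the rate only frees capacity as time passes (one slot every 1/rate seconds).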
```nginx
# /etc/nginx/nginx.conf - http {} block

# Define rate limit zones. Each zone tracks a key (client IP) and
# enforces a rate. 10m = ~160,000 unique IPs tracked.

# General API rate: 10 requests/second per IP.
limit_req_zone $binary_remote_addr zone=api_general:10m rate=10r/s;

# Strict rate for authentication endpoints: 3 requests/second per IP.
limit_req_zone $binary_remote_addr zone=auth_strict:10m rate=3r/s;

# Webhook receiver: 50 requests/second per IP (partner integrations).
limit_req_zone $binary_remote_addr zone=webhook:10m rate=50r/s;

# Return 429 (not the default 503) when rate limit is exceeded.
limit_req_status 429;
```
Apply rate limits per location:
```nginx
# /etc/nginx/conf.d/app.conf - server {} block

# General API endpoints.
location /api/ {
    # burst=20: allow 20 requests above the rate before rejecting.
    # nodelay: process burst requests immediately instead of queuing.
    limit_req zone=api_general burst=20 nodelay;
    proxy_pass http://api-backend;
}

# Authentication endpoints: stricter limits.
location /api/auth/login {
    limit_req zone=auth_strict burst=5 nodelay;
    proxy_pass http://auth-backend;
}

location /api/auth/register {
    limit_req zone=auth_strict burst=3 nodelay;
    proxy_pass http://auth-backend;
}

# Webhook endpoints: higher limits for trusted senders.
location /webhooks/ {
    limit_req zone=webhook burst=100 nodelay;
    proxy_pass http://webhook-backend;
}
```
Connection limiting (separate from request rate):
```nginx
# http {} block
limit_conn_zone $binary_remote_addr zone=conn_per_ip:10m;

# server {} block
# Max 20 simultaneous connections per IP.
limit_conn conn_per_ip 20;
limit_conn_status 429;
```
NGINX: Per-API-Key Rate Limiting
NGINX can rate-limit by API key using `map` to extract the key from a header:
```nginx
# http {} block

# Extract API key from the X-API-Key header.
# If no key is present, fall back to client IP.
map $http_x_api_key $rate_limit_key {
    default $binary_remote_addr;
    "~.+"   $http_x_api_key;
}

# Rate limit zone keyed by API key (or IP if no key).
limit_req_zone $rate_limit_key zone=api_by_key:10m rate=20r/s;

# server {} block
location /api/ {
    limit_req zone=api_by_key burst=40 nodelay;
    limit_req_status 429;
    proxy_pass http://api-backend;
}
```
Limitation: This is still per-instance. With multiple NGINX replicas, each tracks keys independently.
Envoy: Distributed Rate Limiting with External Service
Envoy delegates rate limiting to an external gRPC service backed by Redis. This provides true distributed rate limiting across all Envoy instances.
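The core idea behind the shared service is a counter in Redis keyed by descriptor and time window, so every Envoy replica consults the same count. A minimal Python sketch (a dict stands in for Redis here; the class and key names are mine, not the envoyproxy/ratelimit API):

```python
import time

class FixedWindowLimiter:
    """Sketch of fixed-window counters like those envoyproxy/ratelimit
    keeps in Redis. In production each increment is a Redis INCR with an
    expiry, which is what makes the limit consistent across replicas."""

    def __init__(self, requests_per_unit, unit_seconds=1):
        self.limit = requests_per_unit
        self.unit = unit_seconds
        self.counters = {}  # (descriptor, window) -> count

    def should_rate_limit(self, descriptor, now=None):
        now = time.time() if now is None else now
        window = int(now // self.unit)           # current time bucket
        key = (descriptor, window)
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] > self.limit   # True -> Envoy returns 429

limiter = FixedWindowLimiter(requests_per_unit=10)
decisions = [limiter.should_rate_limit("remote_address:203.0.113.7", now=1000.0)
             for _ in range(15)]
print(decisions.count(True))  # 5 of 15 requests in one window are limited
```

Fixed windows allow a brief 2x burst at window boundaries (end of one window plus start of the next); the real service accepts this trade-off in exchange for a single cheap Redis operation per request.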
Rate limit service configuration (ratelimit service using envoyproxy/ratelimit):
```yaml
# ratelimit-config.yaml
# Configuration for the envoyproxy/ratelimit service.
domain: production
descriptors:
  # Per-IP rate limit: 10 requests/second.
  - key: remote_address
    rate_limit:
      unit: second
      requests_per_unit: 10
  # Per-API-key rate limit: 50 requests/second.
  - key: header_match
    value: api-key
    descriptors:
      - key: api_key
        rate_limit:
          unit: second
          requests_per_unit: 50
  # Strict limit for auth endpoints: 3 requests/second per IP.
  - key: header_match
    value: auth-endpoint
    descriptors:
      - key: remote_address
        rate_limit:
          unit: second
          requests_per_unit: 3
```
Envoy filter configuration:
```yaml
# envoy-ratelimit-filter.yaml
http_filters:
  - name: envoy.filters.http.ratelimit
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
      domain: production
      # Fail open: if the rate limit service is unreachable, allow requests.
      failure_mode_deny: false
      # Budget for the rate limit RPC; bounds the latency added per request.
      timeout: 0.05s
      rate_limit_service:
        grpc_service:
          envoy_grpc:
            cluster_name: rate_limit_service
        transport_api_version: V3
```
Rate limit service deployment (Kubernetes):
```yaml
# ratelimit-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratelimit
  namespace: ingress
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ratelimit
  template:
    metadata:
      labels:
        app: ratelimit
    spec:
      containers:
        - name: ratelimit
          image: envoyproxy/ratelimit:v1.4.0
          env:
            - name: RUNTIME_ROOT
              value: /data
            - name: RUNTIME_SUBDIRECTORY
              value: ratelimit
            - name: REDIS_SOCKET_TYPE
              value: tcp
            - name: REDIS_URL
              value: redis.ingress.svc.cluster.local:6379
            - name: USE_STATSD
              value: "false"
          ports:
            - containerPort: 8081
              name: grpc
          volumeMounts:
            - name: config
              mountPath: /data/ratelimit/config
      volumes:
        - name: config
          configMap:
            name: ratelimit-config
---
apiVersion: v1
kind: Service
metadata:
  name: ratelimit
  namespace: ingress
spec:
  selector:
    app: ratelimit
  ports:
    - port: 8081
      targetPort: 8081
      name: grpc
```
Cloud Load Balancer Comparison
AWS WAF Rate-Based Rules (on ALB or CloudFront):
```json
{
  "Name": "RateLimitPerIP",
  "Priority": 1,
  "Action": { "Block": {} },
  "Statement": {
    "RateBasedStatement": {
      "Limit": 2000,
      "AggregateKeyType": "IP"
    }
  },
  "VisibilityConfig": {
    "SampledRequestsEnabled": true,
    "CloudWatchMetricsEnabled": true,
    "MetricName": "RateLimitPerIP"
  }
}
```
AWS limitation: Minimum evaluation window is 5 minutes. The Limit value is the maximum requests per 5-minute window, not per second. A limit of 2000 means 2000 requests per 5 minutes, which averages to ~6.7 requests per second but allows bursts within the window.
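The arithmetic behind that burst caveat, using the Limit value from the rule above:

```python
# AWS WAF rate-based rules count requests over a trailing window
# (here the 5-minute minimum = 300 seconds). The same Limit expresses
# a much looser constraint than a per-second limiter with equal average:
limit, window_seconds = 2000, 300
average_rate = limit / window_seconds
print(round(average_rate, 1))  # 6.7 requests/second on average

# Nothing spreads those requests out within the window: a client can
# send all 2000 in the first second and is only blocked afterwards.
```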
GCP Cloud Armor rate limiting:
```bash
gcloud compute security-policies rules create 1000 \
  --security-policy=my-policy \
  --expression="true" \
  --action=rate-based-ban \
  --rate-limit-threshold-count=100 \
  --rate-limit-threshold-interval-sec=60 \
  --ban-duration-sec=600 \
  --conform-action=allow \
  --exceed-action=deny-429
```
GCP Cloud Armor counts requests over a configurable interval (60 seconds in this example) and can ban offenders for a configurable duration. This is more granular than AWS WAF's 5-minute window, but it requires a security policy attached to a backend service.
Comparison Table
| Feature | NGINX limit_req | Envoy + ratelimit | AWS WAF | GCP Cloud Armor |
|---|---|---|---|---|
| Distributed | No (per-instance) | Yes (Redis-backed) | Yes (managed) | Yes (managed) |
| Granularity | Per-second | Per-second | Per 5-minute window | Configurable interval |
| Key types | IP, header, variable | IP, header, path, custom | IP, header, query string | IP, header, region |
| Per-API-key | Via map/Lua | Native | Via custom header | Via custom header |
| Cost | Free (self-managed) | Redis + compute | $1/rule/month + $0.60/million requests | $0.006/rule/month + $0.75/million requests |
| Latency added | Sub-millisecond | 1-5ms (Redis lookup) | None (inline) | None (inline) |
Expected Behaviour
```bash
# Test NGINX rate limiting (10r/s with burst=20).
# Send 35 requests rapidly from a single IP.
for i in $(seq 1 35); do
  curl -s -o /dev/null -w "%{http_code} " https://your-domain.com/api/test
done
echo ""
# Expected: roughly the first 21 return 200 (1 + burst of 20), the rest 429.
# Sequential curl adds latency between requests, letting the bucket drain;
# run the requests in parallel if the limit never triggers.

# Test authentication endpoint (3r/s with burst=5).
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code} " -X POST \
    -d '{"user":"test","pass":"test"}' \
    https://your-domain.com/api/auth/login
done
echo ""
# Expected: roughly the first 6 return 200 (or 401), the rest 429.

# Verify rate limit headers are returned (if configured).
curl -sI https://your-domain.com/api/test | grep -i "retry-after"
# Note: NGINX does not send Retry-After by default.
# Envoy and API gateways typically include it.

# Verify connection limiting.
# Open 25 concurrent connections from one IP.
for i in $(seq 1 25); do
  curl -s -o /dev/null -w "%{http_code} " --max-time 5 \
    "https://your-domain.com/api/slow-endpoint" &
done
wait
# Expected: up to 20 return 200; connections beyond the limit return 429.
```
Monitoring rate limit effectiveness:
```bash
# Count 429 responses in NGINX access log (last hour).
awk -v date="$(date -d '1 hour ago' '+%d/%b/%Y:%H')" \
  '$4 ~ date && $9 == 429' /var/log/nginx/access.log | wc -l

# If using structured JSON logging:
# (use `.status == 429` if the status is logged as a number)
jq -r 'select(.status == "429") | .remote_addr' \
  /var/log/nginx/access.json | sort | uniq -c | sort -rn | head -20
```
Trade-offs
| Control | Impact | Risk | Mitigation |
|---|---|---|---|
| Per-IP limiting at 10r/s | Blocks aggressive scrapers and credential stuffing | Users behind shared NAT (corporate, mobile carrier) all share one limit | Use per-API-key limiting for authenticated endpoints; increase limits for known office IP ranges |
| NGINX per-instance limiting | No additional infrastructure required | Effective limit multiplied by replica count; 5 replicas = 5x the intended limit | Use Envoy with external rate limit service for true distributed limiting |
| Envoy + Redis rate limit | True distributed limiting across all instances | Redis becomes a single point of failure; adds 1-5ms latency per request | Deploy Redis in HA mode (Sentinel or Cluster); set failure_mode_deny: false so requests pass when Redis is down |
| `nodelay` on burst | Burst requests processed immediately | Backend receives full burst at once | Remove `nodelay` to queue burst requests (adds latency but smooths traffic) |
| AWS WAF 5-minute window | Simple to configure, fully managed | Allows large bursts within the window; 2000/5min allows 2000 requests in the first second | Combine with CloudFront caching or application-level limiting for burst-sensitive endpoints |
| Auth endpoint strict limits | Effective against credential stuffing | Locks out legitimate users who mistype passwords | Implement account lockout at the application layer instead of relying solely on IP-based rate limiting |
Failure Modes
| Failure | Symptom | Detection | Recovery |
|---|---|---|---|
| Rate limit zone memory exhausted | Oldest entries evicted; some IPs bypass limits | NGINX error log: `limit_req zone "api_general" is full` | Increase zone size (10m to 32m) or reduce the number of tracked keys |
| Redis down (Envoy rate limiting) | If `failure_mode_deny: false`, all requests pass unlimited; if `true`, all requests blocked | Redis health checks fail; rate limit metrics drop to zero | Deploy Redis with Sentinel; configure appropriate failure mode based on risk tolerance |
| Rate limit set too low | Legitimate API consumers receive 429 during normal usage | Support tickets; monitoring shows 429 rate increases during business hours (not attack pattern) | Increase rate and burst values; implement tiered limits per API key |
| Rate limit set too high | Attackers stay under the limit while still causing damage | Backend performance degrades despite rate limiting being active | Lower limits; add per-endpoint limits for expensive operations |
| `$binary_remote_addr` behind proxy | All requests appear from the load balancer IP; entire site rate-limited as one user | All users hit rate limits simultaneously | Use `$http_x_forwarded_for` or the realip module to extract the actual client IP; set `set_real_ip_from` for trusted proxies |
| Missing `set_real_ip_from` | NGINX trusts any `X-Forwarded-For` header, allowing attackers to spoof their IP | Attacker bypasses rate limits by sending a fake `X-Forwarded-For` | Configure `set_real_ip_from` to only trust your load balancer's IP range |
Critical NGINX configuration for environments behind a load balancer:
```nginx
# http {} block

# Only trust X-Forwarded-For from your load balancer.
# Replace with your actual load balancer CIDR.
set_real_ip_from 10.0.0.0/8;
set_real_ip_from 172.16.0.0/12;
real_ip_header X-Forwarded-For;
real_ip_recursive on;
```
Without this configuration, all rate limiting by $binary_remote_addr is useless because every request appears to come from the load balancer’s IP.
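The spoofing risk is easiest to see by walking the `X-Forwarded-For` chain by hand. The sketch below is an illustrative Python model of what the realip module does with `real_ip_recursive on` (function name and addresses are mine); use the module itself, not application code, in production:

```python
import ipaddress

def real_client_ip(xff_header, peer_ip, trusted_cidrs):
    """Walk the X-Forwarded-For chain right to left, skipping hops that
    fall inside trusted proxy ranges; the first untrusted address is
    treated as the real client."""
    trusted = [ipaddress.ip_network(c) for c in trusted_cidrs]
    hops = [h.strip() for h in xff_header.split(",")] + [peer_ip]
    for hop in reversed(hops):  # start at the directly connected peer
        if not any(ipaddress.ip_address(hop) in net for net in trusted):
            return hop
    return peer_ip  # every hop was a trusted proxy

trusted = ["10.0.0.0/8"]
# Legitimate chain: client -> load balancer (10.0.0.5) -> NGINX.
print(real_client_ip("198.51.100.9", "10.0.0.5", trusted))
# -> 198.51.100.9 (the real client, one hop behind the trusted LB)

# Spoof attempt: attacker connects directly and sends a fake XFF header.
print(real_client_ip("203.0.113.1, 198.51.100.9", "192.0.2.44", trusted))
# -> 192.0.2.44 (the untrusted peer itself; the forged header is ignored)
```

Because the walk stops at the first untrusted hop, a forged header from an untrusted peer never influences the rate-limit key, which is exactly what `set_real_ip_from` buys you.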
When to Consider a Managed Alternative
Transition point: When you need distributed rate limiting across multiple instances and maintaining the Redis-backed rate limit service costs more than a managed solution. Or when you need per-API-key limiting with tiered plans (free tier: 100r/min, paid: 1000r/min) and building this into NGINX is not sustainable.
What managed providers handle:
- Cloudflare (#29): Distributed rate limiting at the edge with no origin infrastructure required. Rules can target by IP, path, header, cookie, or ASN. Pricing starts at $0.05 per 10,000 requests evaluated. The edge enforcement means rate-limited traffic never reaches your infrastructure.
- Kong (#86): Rate limiting plugin with Redis or database backing for distributed enforcement. Supports per-consumer, per-route, and per-service limits. The rate-limiting-advanced plugin (Enterprise) adds sliding window counters and response header customization.
- APISIX (#89): `limit-req`, `limit-conn`, and `limit-count` plugins with Redis backing for distributed enforcement. Supports per-consumer and per-route configuration. Open-source with no enterprise paywall for rate limiting features.
What you still control: Rate limit thresholds must be tuned based on your application’s traffic patterns. No provider can auto-detect the correct rate for your login endpoint versus your search API. You define the policies; the provider handles distributed enforcement and the infrastructure.