Layered rate limiting in Nginx — from limit_req_zone to Cloudflare and back

Single-layer rate limiting fails open. If your only defence is limit_req_zone at the Nginx layer, you've already lost when a botnet sends 50,000 requests per second from 30,000 distinct IPs — Nginx will accept the TCP connections, do the TLS handshakes, parse the requests, and then decide to 429 most of them. The 502s downstream are gone but the CPU bill is not.

Rate limiting only works as a layered story. This post walks through the three-layer setup we ship for every managed Nginx customer — edge (Cloudflare or equivalent), perimeter (Nginx), and origin (the app itself) — and the specific Nginx directives that do the heavy lifting at each level.

Layer 1: the edge

The first job at the edge is to never let pathological traffic reach Nginx. We use Cloudflare for most customers (occasionally AWS WAF or GCP Cloud Armor depending on the cloud). The edge handles:

Volumetric L3/L4 floods — the cloud edge absorbs these as part of the service. The traffic never touches your origin.
Known-bad ASNs and IPs — automated lists, plus per-customer additions
Bot challenges for traffic patterns matching scraping or credential-stuffing signatures
Geo-based filters where the customer's threat model permits it
A static per-IP rate limit at 600 requests/minute (10 RPS sustained) — anything above that hits a challenge before reaching us

The numbers vary by customer — an e-commerce checkout origin gets a tighter Cloudflare rule than a public CMS — but the principle doesn't: the cloud edge handles the volume, Nginx handles the precision. We tune the Cloudflare rules so that, by the time a request crosses the back-to-origin TLS handshake, it has at least a plausible claim to being a real user.

This is why the realip configuration from the previous post matters so much. Without set_real_ip_from + real_ip_header CF-Connecting-IP, every limit at the Nginx layer is keyed by the address of whichever Cloudflare edge server proxied the request, so thousands of unrelated clients share a handful of rate-limit buckets. With the realip module configured correctly, Nginx sees the actual client IP and rate-limits per client. This is the most common rate-limiting bug we fix.
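As a reminder of the shape of that block, here is a minimal sketch. The two CIDR ranges are placeholders taken from the reserved documentation ranges, not real Cloudflare addresses; an actual deployment would list every range Cloudflare currently publishes.

```nginx
# http{} or server{} level. Requires ngx_http_realip_module.
# Placeholder ranges (assumption): replace with Cloudflare's published edge ranges.
set_real_ip_from 192.0.2.0/24;
set_real_ip_from 198.51.100.0/24;

# Take the client address from the header Cloudflare sets, but only for
# connections that arrive from the trusted ranges above.
real_ip_header CF-Connecting-IP;
```

With this in place, $remote_addr and $binary_remote_addr hold the client address, so the zones in the next section key on it without further changes.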

Layer 2: Nginx

This is where things get interesting. Nginx has two rate-limiting modules — limit_req (requests per second) and limit_conn (concurrent connections). We use both, and we use them at different scopes.

Per-route request rate limiting

# Global zones — declared once at http{} level
limit_req_zone $binary_remote_addr zone=api_general:10m rate=20r/s;
limit_req_zone $binary_remote_addr zone=api_auth:10m rate=5r/s;
limit_req_zone $binary_remote_addr zone=api_search:10m rate=10r/s;
limit_req_zone $binary_remote_addr zone=static:10m rate=100r/s;
 
# Status code for rate-limited responses
limit_req_status 429;
 
server {
    listen 443 ssl http2;
    server_name app.example.com;
 
    # Auth endpoints — tight limit, brute-force protection
    location ~ ^/api/(login|register|reset-password) {
        limit_req zone=api_auth burst=10 nodelay;
        proxy_pass http://app_backend;
    }
 
    # Search — more permissive but still bounded
    location /api/search {
        limit_req zone=api_search burst=20 nodelay;
        proxy_pass http://app_backend;
    }
 
    # General API
    location /api/ {
        limit_req zone=api_general burst=40 nodelay;
        proxy_pass http://app_backend;
    }
 
    # Static — bulk allowance, mostly for assets
    location ~* \.(css|js|png|jpg|svg|woff2)$ {
        limit_req zone=static burst=200 nodelay;
        try_files $uri =404;
    }
}

A few details that aren't obvious from the Nginx docs:

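$binary_remote_addr not $remote_addr. The binary form is always 4 bytes (IPv4) or 16 bytes (IPv6); the string form is 7 to 15 bytes for IPv4 and up to 39 bytes for IPv6. With the binary key, each limit_req state has a fixed size (64 bytes on 32-bit platforms, 128 bytes on 64-bit), so a 10 MB zone holds roughly 160,000 states on 32-bit hosts and roughly 80,000 on 64-bit hosts. The variable-length string key is never smaller and can need a larger allocation per state, so the binary form is the safe default when the zone fills up during an attack.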

burst=N nodelay. This is the most misunderstood pair of parameters. burst=40 means "let up to 40 requests beyond the rate queue instead of being rejected". nodelay means "forward those queued requests immediately instead of spacing them out at the configured rate". Together they behave like a leaky bucket whose burst is served at once: the common case (a user clicking around) is never throttled, but a sustained flood is, because burst slots are only freed at the configured rate.

Without nodelay, the burst requests get queued and trickled out at the rate, which adds latency the user can feel. We almost always want nodelay.
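To make the difference concrete, the sketch below applies one zone three ways. The zone name demo, the port, and the three paths are hypothetical; app_backend is the upstream name used in the config above and is assumed to be defined elsewhere. The delay= parameter (Nginx 1.15.7 and later) is a middle ground that is not used elsewhere in this post.

```nginx
limit_req_zone $binary_remote_addr zone=demo:10m rate=20r/s;

server {
    listen 8080;

    # burst only: excess requests are queued and released at 20r/s,
    # so the last request of a full burst waits about 2 seconds.
    location /queued/ {
        limit_req zone=demo burst=40;
        proxy_pass http://app_backend;
    }

    # burst + nodelay: the whole burst is forwarded at once;
    # the 40 slots still free up at only 20 per second.
    location /nodelay/ {
        limit_req zone=demo burst=40 nodelay;
        proxy_pass http://app_backend;
    }

    # two-stage: the first 20 excess requests are forwarded at once,
    # the next 20 are paced, and anything beyond that is rejected.
    location /twostage/ {
        limit_req zone=demo burst=40 delay=20;
        proxy_pass http://app_backend;
    }
}
```

Sharing a single zone across the three locations is only for illustration; it means a client's requests to all three paths count against the same state.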

Zone sizing. 10m gives us roughly 80k IP slots on 64-bit hosts (128 bytes per state). For a customer with global traffic, we go to 50m or 100m. The zone is shared across all workers, with no per-worker doubling, so the cost is just the memory.
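One way to check a candidate rate and zone size against real traffic before enforcing them is dry-run mode (limit_req_dry_run, Nginx 1.17.1 and later). This is a sketch rather than part of the config above: the zone name candidate, its size, and the port are hypothetical, and app_backend is assumed to be defined elsewhere.

```nginx
limit_req_zone $binary_remote_addr zone=candidate:50m rate=20r/s;

server {
    listen 8080;

    location /api/ {
        limit_req zone=candidate burst=40 nodelay;

        # Account for requests and log would-be rejections,
        # but do not actually reject anything.
        limit_req_dry_run on;
        limit_req_log_level warn;

        proxy_pass http://app_backend;
    }
}
```

Requests that would have been rejected then appear in the error log, and $limit_req_status reads REJECTED_DRY_RUN for them, which makes it possible to count affected clients before turning enforcement on.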

Per-route connection limits

limit_req controls request rate; limit_conn controls concurrent connections. They protect against different things:

limit_conn_zone $binary_remote_addr zone=perip:10m;
limit_conn_zone $server_name zone=perserver:10m;
 
server {
    # No single client opens more than 50 simultaneous connections
    limit_conn perip 50;
 
    # No more than 5000 concurrent connections to this server in total
    limit_conn perserver 5000;
}

Why do this when limit_req already exists? Two reasons:

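Slowloris-style slow clients: clients that open connections and trickle bytes. limit_req doesn't help, because it counts requests, not how long each one holds a connection. limit_conn counts a connection once its full request header has been read, so it caps clients that hold many connections open with slow bodies or slow reads; connections still trickling their headers are not counted yet and fall to client_header_timeout instead.
Long-running endpoints: file uploads, downloads, SSE, WebSocket. These aren't "requests per second" workloads; they're "concurrent connections" workloads.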

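Combined with client_header_timeout 10s and client_body_timeout 10s, this neutralises basic slow-client attacks. A client has 10 seconds to deliver its complete request header, and may not pause for more than 10 seconds between two successive reads of the request body; if it misses either deadline, Nginx ends the request with a 408.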
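Put together, the slow-client settings might look like the sketch below. The port is hypothetical and app_backend is assumed to be defined elsewhere. limit_conn_status is an addition that the configs above do not set; it is included on the assumption that connection-limit rejections should use the same 429 as limit_req_status rather than the default 503.

```nginx
limit_conn_zone $binary_remote_addr zone=perip:10m;

server {
    listen 8080;

    # The whole request header must arrive within 10 seconds.
    client_header_timeout 10s;

    # No more than 10 seconds between two successive reads of the body.
    client_body_timeout 10s;

    # Cap simultaneous connections per client address.
    limit_conn perip 50;

    # limit_conn rejects with 503 by default; use 429 to match limit_req.
    limit_conn_status 429;

    location / {
        proxy_pass http://app_backend;
    }
}
```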

Keyed limits beyond IP

Sometimes the right key isn't $binary_remote_addr. Common alternatives:

# Rate-limit by API key (header) for authenticated traffic.
# Requests without an X-API-Key header have an empty key and are not counted by this zone.
limit_req_zone $http_x_api_key zone=api_key:10m rate=100r/s;
 
# Rate-limit anonymous traffic by IP, authenticated by user
map $http_authorization $rate_key {
    default $binary_remote_addr;
    "~^Bearer " $http_authorization;
}
limit_req_zone $rate_key zone=mixed:20m rate=50r/s;

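The map block is the trick: it lets you choose the rate-limit key dynamically. Anonymous traffic gets limited per IP (because that's all we know), and authenticated traffic gets limited per token (so a single user behind a NAT doesn't punish their colleagues). Nginx does not validate the token, so a client can get a fresh bucket by sending a new Bearer value on each request; keep a per-IP limit in place alongside this zone.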
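The snippet above declares the mixed zone but not where it is applied. A minimal sketch of one way to wire it in follows. The ip_ceiling zone, the port, and all burst values are assumptions for illustration, and app_backend is assumed to be defined elsewhere.

```nginx
map $http_authorization $rate_key {
    default      $binary_remote_addr;
    "~^Bearer "  $http_authorization;
}

limit_req_zone $rate_key           zone=mixed:20m      rate=50r/s;
# Hypothetical backstop zone, always keyed by client address.
limit_req_zone $binary_remote_addr zone=ip_ceiling:10m rate=100r/s;

server {
    listen 8080;

    location /api/ {
        # Per token for authenticated requests, per IP for anonymous ones.
        limit_req zone=mixed burst=40 nodelay;

        # Several limit_req directives can apply to one location, and a
        # request has to pass all of them. This one bounds any single
        # address even if it sends a different token on every request.
        limit_req zone=ip_ceiling burst=100 nodelay;

        proxy_pass http://app_backend;
    }
}
```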

Layer 3: the origin

The third layer is in your application, and Nginx can't do it for you. But the Nginx layer can signal to the application that traffic is suspect — for example, by setting a custom header when traffic is over the soft limit but under the hard limit, so the application can shed less-critical work (skip cache warming, defer analytics writes, etc.).

The pattern looks like this:

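limit_req_zone $binary_remote_addr zone=soft:10m rate=30r/s;
limit_req_zone $binary_remote_addr zone=hard:10m rate=60r/s;

# "yes" when limit_req paced this request, "no" otherwise.
# $limit_req_status needs Nginx 1.17.6 or later.
map $limit_req_status $soft_throttled {
    default "no";
    DELAYED "yes";
}

location /api/ {
    # Hard limit: 429 immediately
    limit_req zone=hard burst=20 nodelay;

    # Soft limit: pace excess requests instead of rejecting them
    limit_req zone=soft burst=30;

    # Evaluated when the request is proxied, after limit_req has run
    proxy_set_header X-Soft-Throttled $soft_throttled;
    proxy_pass http://app_backend;
}

(limit_req runs in the preaccess phase, after rewrite-phase directives such as if and set, so $limit_req_status is still empty when an if block tests it. Reading it through a map inside proxy_set_header defers the lookup until the request is proxied. This is still an approximation: the soft zone adds real delay to the requests it flags, and it rejects them itself once its burst is used up. In practice we use a slightly cleaner Lua block when the customer's edge build includes the Lua module.)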

When you're already under attack

The configs above are preventive. When the on-call gets paged because traffic is up 50x and 5xx errors are climbing, the response is different. Our server-under-threat playbook covers the full sequence, but the Nginx-specific moves are:

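Reduce limits aggressively. Drop the rate-limit zones to a fraction of normal, check the result with nginx -t, then run nginx -s reload. The reload is graceful: new workers pick up the lower limits, typically in under a second, while the old workers finish their in-flight requests.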
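One way to pre-stage that change, sketched here with hypothetical zone names and numbers, is to declare a tighter zone ahead of time so that switching is a one-line edit rather than a retune under pressure. The port is an assumption and app_backend is assumed to be defined elsewhere.

```nginx
# http{} level: both zones always exist, so switching is a one-line change.
limit_req_zone $binary_remote_addr zone=api_general:10m        rate=20r/s;
limit_req_zone $binary_remote_addr zone=api_general_attack:10m rate=2r/s;

server {
    listen 8080;

    location /api/ {
        # Normal mode.
        limit_req zone=api_general burst=40 nodelay;

        # Attack mode: comment out the line above, uncomment this one,
        # then run nginx -t followed by nginx -s reload.
        # limit_req zone=api_general_attack burst=5 nodelay;

        proxy_pass http://app_backend;
    }
}
```

The second zone costs its declared size in shared memory even while unused, which is the trade-off for not having to edit http{}-level declarations during an incident.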

Add a temporary IP allowlist. If the legitimate user base is identifiable (corporate IPs, known partner ranges), switch to allowlist-mode for the duration.

geo $is_allowed {
    default 0;
    192.0.2.0/24 1;
    203.0.113.0/24 1;
}
 
server {
    if ($is_allowed = 0) { return 429; }
    # ... normal config
}

Push the limit to the edge. Cloudflare's Under Attack Mode and AWS Shield Advanced both let you tighten the edge limits in seconds. The Nginx layer is the last place you want to be doing volumetric defence.

Capture and triage. Tail access.log filtered to 429s. Look for the ASN, user-agent, request path — most attacks have a fingerprint. Add a per-fingerprint rule (e.g., if ($http_user_agent ~* "python-requests") { return 429; }) and reload.
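A sketch of both halves of that step follows: a dedicated log that only receives 429s, and a map that keeps fingerprint rules in one place instead of a growing chain of if blocks. The log paths, the port, and the single user-agent pattern are assumptions for illustration, and app_backend is assumed to be defined elsewhere.

```nginx
# http{} level.
# 1 for responses that were rate-limited, 0 otherwise.
map $status $is_limited {
    default 0;
    429     1;
}

# Hypothetical fingerprint list: 1 for user agents seen in the attack.
map $http_user_agent $blocked_agent {
    default             0;
    "~*python-requests" 1;
}

server {
    listen 8080;

    # Declaring access_log here replaces any inherited one, so the main log
    # is repeated alongside the 429-only log.
    access_log /var/log/nginx/access.log  combined;
    access_log /var/log/nginx/limited.log combined if=$is_limited;

    # Reject fingerprinted clients before any proxying happens.
    if ($blocked_agent) {
        return 429;
    }

    location / {
        proxy_pass http://app_backend;
    }
}
```

Adding a new fingerprint is then one more line in the map followed by a reload.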

We carry these moves as part of our cybersecurity offering — a 24/7 on-call rotation, a documented playbook, and the muscle memory to run them without thrashing.

What we ship by default

For every new managed Nginx customer:

A limit_req_zone per logical route group (auth, API, search, static), sized by observed traffic
limit_conn zones keyed per IP (limit of 50) and per server (a customer-specific ceiling)
Slow-client mitigations (client_body_timeout, client_header_timeout, client_max_body_size)
A realip block that ensures keys are the real client IP, not the address of the edge proxy
An attack-mode include file that's pre-staged but commented out, ready to switch in seconds
Cloudflare or WAF rules that complement the Nginx layer, not duplicate it

It costs the shared-memory zones themselves (declared once and shared by all workers rather than duplicated per worker) and adds a single-digit-microsecond cost per request. In exchange, your perimeter has a story for everything from a single curl-in-a-loop to a 30k-IP botnet, and the on-call has a runbook rather than a panic.

If you want a baseline review of what your current limits actually do under load, our pricing for a one-off audit is on the pricing page. It's usually a half-day of work and produces a config diff you can apply yourself.

Sudhanshu K. leads cybersecurity at EdgeServers (RemotIQ Pty Ltd, ABN 91 682 628 128). She has watched too many limit_req_zones key on the wrong variable and considers $binary_remote_addr vs $remote_addr a hill worth dying on.