The Nginx reverse proxy patterns we actually run in production

Most Nginx reverse proxies we inherit from a new customer were copy-pasted from a Stack Overflow answer in 2019. They work — in the sense that traffic reaches the backend — but they leak client IPs to the wrong header, open a fresh TCP connection per request, and forward Host: localhost to an upstream that then renders broken URLs. The site is up. The configuration is wrong.

This post walks through the reverse-proxy patterns we standardise on for every managed Nginx customer. It is opinionated, it has specific numbers, and the snippets are the actual ones we ship — not a textbook example. If your current config doesn't look something like this, there's a good chance it's costing you latency, observability, or correctness.

The minimum-viable upstream block

A reverse proxy that doesn't use keepalive on its upstream connections is making a measurable mistake. Without keepalive, each incoming request opens a fresh TCP (and possibly TLS) connection to the backend. At 500 RPS, that's 500 connection setups and teardowns per second across the proxy, not 500 per worker. The backend's accept queue fills up, latency spikes, and the easiest possible win, connection reuse, is left on the table.

upstream app_backend {
    server 10.0.1.10:8080 max_fails=3 fail_timeout=10s;
    server 10.0.1.11:8080 max_fails=3 fail_timeout=10s;
    server 10.0.1.12:8080 max_fails=3 fail_timeout=10s backup;
 
    keepalive 64;
    keepalive_requests 1000;
    keepalive_timeout 60s;
}
 
server {
    # On nginx 1.25.1+, prefer "listen 443 ssl;" plus a separate "http2 on;".
    listen 443 ssl http2;
    server_name app.example.com;
 
    location / {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
 
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Port $server_port;
 
        proxy_connect_timeout 5s;
        proxy_send_timeout 30s;
        proxy_read_timeout 30s;
 
        proxy_buffering on;
        proxy_buffer_size 8k;
        proxy_buffers 16 8k;
        proxy_busy_buffers_size 16k;
    }
}

There are five things in there that matter, and they are the ones people get wrong:

keepalive 64: the number of idle connections each worker process keeps open to the upstream. It is not a cap on total connections. Set it to roughly the 95th-percentile concurrent upstream connection count. For most customer apps that lands between 32 and 128.

proxy_http_version 1.1 + proxy_set_header Connection "": both are mandatory for keepalive to actually work. Nginx defaults to HTTP/1.0 with Connection: close for upstream requests, which silently kills any keepalive intent. We see this misconfiguration on roughly half of the configs we audit.

proxy_set_header Host $host: forwards the hostname the client requested. Without this, Nginx sends Host: app_backend (the upstream name from proxy_pass), and any application code that branches on hostname will misbehave.

max_fails / fail_timeout: if a backend fails three attempts within 10 seconds, Nginx stops sending it traffic for 10 seconds. By default only connection errors and timeouts count as failed attempts; a 502 or 504 response counts only if you list http_502 or http_504 in proxy_next_upstream. These are conservative values that work for most workloads.

backup: that third server is only used when the other two are unavailable. It's a cheap way to have a warm spare during a partial outage.
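
What counts as a failed attempt for max_fails is decided by proxy_next_upstream. A minimal sketch, assuming you want 502 and 504 responses to count alongside connection errors and timeouts. The tries and timeout caps are illustrative values, not ones taken from our bundle:

```nginx
location / {
    proxy_pass http://app_backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # Conditions under which the request is passed to the next server.
    # error and timeout always count toward max_fails; http_502 and
    # http_504 count only because they are listed here.
    # The nginx default is "error timeout".
    proxy_next_upstream error timeout http_502 http_504;

    # Illustrative caps so retries cannot multiply load during an outage.
    proxy_next_upstream_tries 2;
    proxy_next_upstream_timeout 10s;
}
```

Note that Nginx does not retry non-idempotent requests (POST, for example) on another server unless non_idempotent is also listed, which is usually the behaviour you want.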

The X-Forwarded-For chain — the most-misunderstood part

If your application logs X-Forwarded-For: 192.0.2.1, 10.0.0.5, 172.16.0.1 and writes the first entry into the audit log as the client IP, you have a spoofable audit log. Anyone can send a request with X-Forwarded-For: 1.2.3.4 and have it forever recorded as the originating client.

The chain works like this. Each proxy appends the address of the peer that connected to it. So a request that traverses Cloudflare → Nginx → app server arrives at the app with:

X-Forwarded-For: <anything the client sent>, <real client>, <Cloudflare edge>

The rightmost entry is the one your own Nginx appended via $proxy_add_x_forwarded_for, so it is trustworthy. Nginx's own address never appears in the header; the app sees it as the TCP peer. Anything to the left of the entries added by proxies you trust is whatever the client chose to send and is, by default, attacker-controlled.

The correct way to handle this in Nginx is the realip module:

# Trust Cloudflare's published edge IP ranges
set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
set_real_ip_from 103.22.200.0/22;
# ... full list at https://www.cloudflare.com/ips/
set_real_ip_from 2400:cb00::/32;
 
real_ip_header CF-Connecting-IP;
real_ip_recursive on;

With this, $remote_addr is rewritten to the real client IP if and only if the connection came from a trusted proxy. If it came from somewhere else, the header is ignored and $remote_addr stays as the raw connecting IP. Your logs, rate limiters, and audit trail all work off $remote_addr and stop being spoofable.
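
As a sketch of what "rate limiters work off $remote_addr" means in practice, the fragment below keys a request limit on the post-realip address. It assumes the realip directives above sit at http or server level; the zone name, zone size, and rates are illustrative:

```nginx
# http context
limit_req_zone $binary_remote_addr zone=per_client:10m rate=20r/s;

server {
    # listen / server_name / TLS settings omitted for brevity

    location / {
        # $binary_remote_addr reflects the address set by realip, so each
        # real client gets its own bucket instead of sharing one bucket
        # per Cloudflare edge IP.
        limit_req zone=per_client burst=40 nodelay;
        proxy_pass http://app_backend;
    }
}
```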

If you're not behind Cloudflare and instead behind an AWS ALB, the same pattern applies with the ALB's VPC CIDR in set_real_ip_from and real_ip_header X-Forwarded-For. For our AWS managed customers, we generate this list from the live VPC config so it doesn't drift.
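
A minimal sketch of the ALB variant. The 10.0.0.0/16 range is a placeholder for whatever CIDR your load balancer nodes actually live in, not a value from a real customer:

```nginx
# Assumption: 10.0.0.0/16 stands in for the CIDR of the ALB's subnets.
set_real_ip_from 10.0.0.0/16;

real_ip_header X-Forwarded-For;

# X-Forwarded-For can hold several addresses. With recursive search on,
# Nginx uses the last address that is NOT in a trusted range, so entries
# a client injected at the left of the header are never picked.
real_ip_recursive on;
```

Here real_ip_recursive does real work: with it off, Nginx takes the last address in the header as-is, which is only correct when exactly one trusted proxy sits in front.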

Buffering: when to turn it off

By default, Nginx buffers the upstream response and relays it to the client at the client's pace. For 99% of HTTP traffic, that's exactly what you want, because it frees the backend worker as quickly as possible. But two cases break it:

Server-Sent Events (SSE) and long-polling. Buffering means the client doesn't see events until a buffer fills. The whole point of SSE is real-time delivery. Disable buffering on those locations:

location /events {
    proxy_pass http://app_backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
 
    proxy_buffering off;
    proxy_cache off;
    proxy_read_timeout 24h;
 
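    # proxy_buffering off (above) is what makes SSE work through Nginx.
    # X-Accel-Buffering is a response header: the backend can send
    # "X-Accel-Buffering: no" to disable buffering per response instead.
    chunked_transfer_encoding on;  # already the default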
}

WebSockets. Same pattern but with the Upgrade and Connection headers preserved:

location /ws {
    proxy_pass http://app_backend;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 3600s;
    proxy_send_timeout 3600s;
}

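A common bug we fix: the WebSocket location defines no proxy_set_header of its own, so it inherits proxy_set_header Connection "" from a parent block, which strips the upgrade signal. The backend never sees an upgrade request, and the handshake fails instead of returning 101 Switching Protocols. The inverse bug is just as common: proxy_set_header directives are inherited only when the current level defines none, so adding a single header in a location silently drops every header the parent set. Be explicit at every level.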
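
If the same location has to serve both WebSocket and ordinary HTTP requests, the pattern from the Nginx documentation is to derive the Connection value from the client's Upgrade header with a map. A sketch reusing the /ws location from above; $connection_upgrade is just a variable name chosen for the map:

```nginx
# http context
map $http_upgrade $connection_upgrade {
    default upgrade;   # client sent an Upgrade header
    ''      close;     # plain HTTP request, no upgrade
}

server {
    # listen / server_name / TLS settings omitted for brevity

    location /ws {
        proxy_pass http://app_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;
    }
}
```

The trade-off is that non-upgrade requests through this location are sent with Connection: close, so they do not benefit from upstream keepalive.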

Health checks and graceful failover

Open-source Nginx doesn't have active health checks — those are an Nginx Plus feature. We work around this with max_fails/fail_timeout (passive checks) plus an external monitor that hits a /healthz endpoint on each backend and removes failing nodes from the upstream via a generated include file.

# Generated by our health-monitor; reloaded on change
include /etc/nginx/conf.d/upstreams/app_backend.conf;

The monitor writes a fresh app_backend.conf whenever a backend's status changes, then issues nginx -s reload. The reload is graceful — existing connections complete on the old worker, new connections go to the new worker. Typical reload time is under 200ms. We run this for every customer who isn't on Nginx Plus, and it's been load-bearing infrastructure for years.
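
For illustration, this is roughly what the generated file could look like after one backend has failed its /healthz probe. It assumes the file holds the whole upstream block; the exact layout our monitor emits is not shown in this post:

```nginx
# /etc/nginx/conf.d/upstreams/app_backend.conf
# Hypothetical output after 10.0.1.11 failed its health check.
upstream app_backend {
    server 10.0.1.10:8080 max_fails=3 fail_timeout=10s;
    # 10.0.1.11:8080 removed by the monitor until it passes /healthz again
    server 10.0.1.12:8080 max_fails=3 fail_timeout=10s backup;

    keepalive 64;
    keepalive_requests 1000;
    keepalive_timeout 60s;
}
```

Running nginx -t against the new file before the reload is a cheap guard: a syntax error is then caught by the pipeline rather than discovered in the error log.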

For customers who want zero-touch failover, we offer it as part of our managed operations package — monitor, reload pipeline, and metrics shipped as a standard module.

Timeouts: be specific

The default Nginx proxy timeouts (60s across the board) are wrong for almost every real workload. We tune three separately:

proxy_connect_timeout 5s: time allowed to establish the connection to the backend. Five seconds is generous; anything longer and the backend is effectively down. We page on connect_timeout errors.
proxy_send_timeout 30s: the longest Nginx waits between two successive writes to the backend, not a limit on the whole request. Mostly irrelevant for small request bodies; matters for file uploads.
proxy_read_timeout 30s: the longest Nginx waits between two successive reads from the backend, not a limit on the whole response. This is the one that matters most. For sync HTTP APIs, 30 seconds is plenty. For long-running endpoints (report generation, anything that legitimately takes minutes), override per-location.

If you set a single global proxy_read_timeout 600s because one endpoint is slow, every other endpoint will now keep a client and an upstream connection waiting for up to 10 minutes when the backend hangs. Tune per-location.
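
A minimal sketch of a per-location override. The /reports/ path is a hypothetical slow endpoint, and the proxy headers are left out to keep the example short:

```nginx
server {
    # listen / server_name / TLS settings omitted for brevity

    # Defaults for ordinary API traffic, inherited by every location.
    proxy_connect_timeout 5s;
    proxy_send_timeout 30s;
    proxy_read_timeout 30s;

    location / {
        proxy_pass http://app_backend;
    }

    # Hypothetical slow endpoint: only this location gets the long
    # read timeout; connect and send timeouts are still inherited.
    location /reports/ {
        proxy_pass http://app_backend;
        proxy_read_timeout 600s;
    }
}
```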

What we standardise on

For every customer's managed Nginx edge, we ship a config bundle that includes:

An upstreams.conf template with keepalive, max_fails, and backup-server slots
A proxy_headers.conf snippet included into every location block (so X-Forwarded-* headers can't be forgotten)
A realip.conf generated from the customer's actual front-of-edge (Cloudflare, ALB, GCP load balancer)
A health-check sidecar that maintains the live upstream membership
Standard logging that includes both $remote_addr (post-realip) and $realip_remote_addr (the raw connection), so audit trails are unambiguous
Per-location timeout overrides for known slow endpoints
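
As an example of the logging item, a log format along these lines records both addresses side by side. The format name edge_audit, the field labels, and the log path are illustrative, not the exact ones in our bundle:

```nginx
# http context
log_format edge_audit '$remote_addr raw=$realip_remote_addr '
                      '[$time_local] "$request" $status $body_bytes_sent '
                      'xff="$http_x_forwarded_for" '
                      'upstream=$upstream_addr '
                      'rt=$request_time urt=$upstream_response_time';

access_log /var/log/nginx/access.log edge_audit;
```

When the two addresses differ, the request came through a trusted proxy; when they match, it reached Nginx directly.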

It's about 200 lines of config across six files. It takes about 90 minutes to install and validate on a new customer's edge. The result is a reverse proxy that is observable, debuggable, and behaves correctly under the failure modes that show up in production — not the happy path.

If your existing Nginx config evolved organically and nobody is sure which patterns were intentional and which were copy-pasted, reach out and we'll do an audit. It usually surfaces three or four corrections worth making, and one or two that would have caused an incident eventually.

Sudhanshu K. is a Senior SRE at EdgeServers (RemotIQ Pty Ltd, ABN 91 682 628 128). She has rewritten more X-Forwarded-For chains than she cares to count and considers proxy_buffering the most under-appreciated Nginx directive.