PHP-FPM pool tuning in production — the static vs dynamic vs ondemand decision we keep relitigating

When a customer rings us at 2am because "the site is slow," the script we run first isn't top or htop. It's a quick look at pm.status, the FPM slowlog, and the FPM error log. Eighty per cent of the time, the bottleneck isn't PHP code, isn't MySQL, isn't even Redis — it's that the PHP-FPM pool is sized wrong for the box, configured in the wrong process-manager mode, or quietly running out of children while requests queue up behind it.

This post is the cheat sheet we use internally when we onboard a new managed PHP workload — Laravel, WordPress, Symfony, custom — onto AWS, GCP, Azure, or DigitalOcean. It covers the three process manager modes, the sizing maths for pm.max_children, the slowlog discipline most teams skip, and how the pool config interacts with OPcache.

The three process manager modes, plainly

PHP-FPM ships with three pm modes. Most config you'll find on the internet picks one without explaining why, then sets the numbers to whatever the author had in /etc/php on their laptop. Here is what they actually do:

pm = static: FPM forks exactly pm.max_children workers at startup. They live for the life of the pool (or until pm.max_requests recycles them), all accepting connections from the same listen socket. Memory usage is predictable: max_children × average_worker_rss. Latency is the lowest of the three modes because there's never a fork on the request path.

pm = dynamic: FPM starts with pm.start_servers workers and keeps between pm.min_spare_servers and pm.max_spare_servers of them idle, scaling up to pm.max_children under load. Idle workers above the spare ceiling are killed. Useful when the box has multiple roles and you don't want PHP to hog memory it isn't using.

pm = ondemand: FPM starts zero workers. It forks one when a request arrives and no worker is free, and kills it after pm.process_idle_timeout. Latency on a cold worker is awful (the first request pays the fork cost). Memory at idle is near zero. Useful for low-traffic dev hosts and that's about it.
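To make the memory trade-off between the three modes concrete, here is a rough sketch of idle versus peak memory for one pool. It is a simplification, not how FPM accounts for memory: it assumes every worker sits at the same average RSS, and fpm_mode_memory is a hypothetical helper, not an FPM tool.

```shell
# Hypothetical helper: rough idle and peak memory for one pool, in MB.
# Assumes every worker sits at the same average RSS, which real pools don't.
# Usage: fpm_mode_memory MODE RSS_MB MAX_CHILDREN [MIN_SPARE MAX_SPARE]
fpm_mode_memory() {
  peak=$(( $3 * $2 ))
  case "$1" in
    static)
      # Every worker is forked at startup, so idle cost equals peak cost.
      echo "idle=${peak}MB peak=${peak}MB" ;;
    dynamic)
      # With no traffic, FPM keeps between min and max spare workers alive.
      echo "idle=$(( ${4:-0} * $2 ))-$(( ${5:-0} * $2 ))MB peak=${peak}MB" ;;
    ondemand)
      # Idle workers are reaped after pm.process_idle_timeout.
      echo "idle=0MB peak=${peak}MB" ;;
    *)
      echo "unknown pm mode: $1" >&2
      return 1 ;;
  esac
}
```

For example, `fpm_mode_memory dynamic 90 30 2 6` prints `idle=180-540MB peak=2700MB`. Dynamic only saves memory while the box is quiet; the peak you have to plan for is the same in every mode.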

In production, we run static on every dedicated PHP host. The reasons:

The memory cost is paid upfront and is predictable. You never get a surprise OOM at 3pm because traffic spiked and FPM forked another 40 workers.
The fork cost — small per request, but real — is gone from the hot path.
Per-worker warm-up only has to happen once per worker lifetime, and workers live until pm.max_requests recycles them.
The pool config is the single source of truth for "how much concurrency does this box support" — which makes capacity planning a multiplication, not a guess.

We use dynamic only on shared customer hosts where multiple roles share the same box. We don't use ondemand in production at all. If you're reading a guide that recommends ondemand for a busy site, it's wrong.

Sizing pm.max_children: the maths, not the guess

The single most common misconfiguration in the PHP world is pm.max_children = 50 on a 2GB VM. At a typical 90 MB per worker, 50 children want about 4.5 GB on a box that has 2, so it will OOM under load every single time.

The correct sizing formula is straightforward:

pm.max_children = (TOTAL_RAM - OS_RESERVE - OPCACHE - OTHER_SERVICES) / AVG_WORKER_RSS

Worked example — a 4GB DigitalOcean Droplet running a Laravel app:

TOTAL_RAM = 4096 MB
OS_RESERVE (kernel, sshd, monitoring agent, log shipper) = 400 MB
OPCACHE (opcache.memory_consumption + interned_strings_buffer) = 384 MB
OTHER_SERVICES (Redis, Nginx) = 600 MB
AVG_WORKER_RSS for the Laravel app = 90 MB

(4096 - 400 - 384 - 600) / 90 = 2712 / 90 ≈ 30

So pm.max_children = 30. Not 50. Not 100. Thirty.
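The formula and the worked example translate into two small shell helpers, sketched here on the assumption that every input is already measured in MB. The names fpm_max_children and fpm_check_children are hypothetical. The first rounds down, which is the safe direction; the second checks an existing pm.max_children against the same budget.

```shell
# Hypothetical helper: the sizing formula, all arguments in MB.
# Usage: fpm_max_children TOTAL_RAM OS_RESERVE OPCACHE OTHER_SERVICES AVG_WORKER_RSS
fpm_max_children() {
  awk -v total="$1" -v os="$2" -v opc="$3" -v other="$4" -v rss="$5" 'BEGIN {
    budget = total - os - opc - other
    if (rss <= 0 || budget <= 0) { print 0; exit }
    print int(budget / rss)   # round down: a partial worker is an OOM risk
  }'
}

# Hypothetical helper: does a configured pm.max_children fit the same budget?
# Usage: fpm_check_children TOTAL_RAM OS_RESERVE OPCACHE OTHER_SERVICES AVG_WORKER_RSS MAX_CHILDREN
fpm_check_children() {
  awk -v total="$1" -v os="$2" -v opc="$3" -v other="$4" -v rss="$5" -v n="$6" 'BEGIN {
    budget = total - os - opc - other
    peak = n * rss            # memory if every child is busy at average RSS
    printf "peak=%dMB budget=%dMB %s\n", peak, budget, (peak <= budget ? "ok" : "OVERCOMMITTED")
  }'
}
```

With the Droplet figures, `fpm_max_children 4096 400 384 600 90` prints 30, and `fpm_check_children 4096 400 384 600 90 50` reports `peak=4500MB budget=2712MB OVERCOMMITTED`.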

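To get AVG_WORKER_RSS for your app (and please, measure it, do not guess), run the app under realistic load and use the command below. Bear in mind that RSS also counts the shared OPcache pages each worker has touched, so this average overstates private memory and the resulting max_children errs on the safe side.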

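# RSS (KB) is column 8 of `ps -ly` output; NR > 1 skips the header row.
ps -ylC php-fpm8.3 --sort=rss | \
  awk 'NR > 1 { sum += $8; count++ } END { printf "Avg RSS: %.1f MB across %d processes (master included)\n", sum / count / 1024, count }'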
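If you want the worst worker as well as the average, and want the master excluded from the count, the same ps output can go through a slightly longer awk program. This is a sketch that assumes the -l -y column layout (RSS in KB in column 8) and that the FPM master runs as root (UID 0) while pool workers do not; fpm_rss_summary is a hypothetical name.

```shell
# Hypothetical helper: summarise worker RSS from `ps -ylC php-fpm8.3` output on stdin.
# Assumes RSS (KB) is column 8 and that only the master runs as UID 0.
fpm_rss_summary() {
  awk 'BEGIN { max = 0 }
       NR > 1 && $2 != 0 {            # skip the header row and the root-owned master
         sum += $8; n++
         if ($8 > max) max = $8
       }
       END {
         if (n == 0) { print "no workers found"; exit 1 }
         printf "workers=%d avg_mb=%.1f max_mb=%.1f\n", n, sum / n / 1024, max / 1024
       }'
}
```

Run it as `ps -ylC php-fpm8.3 --sort=rss | fpm_rss_summary` while the app is under realistic load. The gap between avg_mb and max_mb shows how far a few heavy requests can push past the average you sized for.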

A vanilla Laravel app sits at 70-100 MB per worker. A WordPress site with WooCommerce and a half-dozen plugins is usually 120-180 MB. A Symfony API with Doctrine warmed up can be 90-130 MB. JIT enabled adds 5-10 MB per worker but reduces CPU.

The reason this matters: once all pm.max_children workers are busy, new connections from Nginx (or whatever fronts FPM) wait in the kernel's accept queue on the FPM socket, whose length is set by listen.backlog (511 by default on Linux). Latency goes up before throughput goes down, and by the time the user feels it, the queue has been growing for minutes. We monitor pm.status continuously across every customer pool (accepted conn, listen queue, active processes, and max active processes since start) because queue depth is the leading indicator that tells us we need to scale; the latency alert lags behind it.
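pm.status's default plain-text output is one `name: value` line per field (accepted conn, listen queue, active processes, total processes and so on), so a quick check needs nothing more than awk. A minimal sketch, assuming the status page is reachable over HTTP through the web server (pm.status_path only names the path; Nginx still needs a location that passes it to FPM) and that you pass pm.max_children in yourself; fpm_status_summary is a hypothetical name.

```shell
# Hypothetical helper: summarise plain-text pm.status output read from stdin.
# Usage: ... | fpm_status_summary MAX_CHILDREN
fpm_status_summary() {
  awk -v max="$1" '{
    i = index($0, ":"); if (i == 0) next      # split on the first colon only,
    key = substr($0, 1, i - 1)                # because values like "start time"
    val = substr($0, i + 1)                   # contain colons of their own
    gsub(/^[ \t]+|[ \t]+$/, "", val)
    s[key] = val
  }
  END {
    active = s["active processes"] + 0
    pct = (max > 0) ? int(100 * active / max) : 0
    printf "active=%d/%d (%d%%) listen_queue=%d max_children_reached=%d\n",
      active, max, pct, s["listen queue"], s["max children reached"]
  }'
}
```

For example `curl -s http://127.0.0.1/fpm-status | fpm_status_summary 30`, where the URL is an assumption about your Nginx setup. A listen_queue above zero means requests are already waiting for a free worker.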

The slowlog: the most underused diagnostic in PHP

If you take only one thing from this article, take this. Configure the slowlog, and read it.

; /etc/php/8.3/fpm/pool.d/www.conf
request_slowlog_timeout = 2s
slowlog = /var/log/php-fpm/slow.log
slowlog_trace_depth = 40

What it does: when a request has been running for 2 seconds, FPM dumps the worker's PHP backtrace (up to slowlog_trace_depth frames) to the slowlog. Not the HTTP request, but the PHP call stack at the moment the timeout fires, so you see exactly where the worker is stuck. The request itself is not killed; it keeps running.

We tune the threshold per pool: 2s for WordPress, 1s for an API. The setting only takes whole seconds (with s, m, h or d suffixes), so 1s is as tight as it goes, even for hot endpoints. The output looks like this:

[12-May-2026 08:31:14] [pool www] pid 1842
script_filename = /var/www/app/public/index.php
[0x00007f...] execute() /var/www/app/vendor/.../Connection.php:243
[0x00007f...] select() /var/www/app/vendor/.../QueryBuilder.php:178
[0x00007f...] get() /var/www/app/app/Http/Controllers/OrderController.php:67
[0x00007f...] index() /var/www/app/app/Http/Controllers/OrderController.php:42

That's the smoking gun. The request had reached the 2-second threshold, and the worker was sitting in PDOStatement::execute(), called from line 243 of the DB connection wrapper. Now you know the slow request is a slow query, not a slow render, not a slow Redis call, not garbage collection. Half our performance audits start by turning on the slowlog and waiting 24 hours; the resulting log file usually contains the entire incident report.
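Once the slowlog has a day of entries in it, the useful question is which innermost frame turns up most often. A sketch that counts them, assuming the entry layout shown above (a header line, a script_filename line, then frames innermost first); fpm_slowlog_top is a hypothetical helper, and it reads a file argument or standard input.

```shell
# Hypothetical helper: rank the innermost frame of each slowlog entry by frequency.
# Usage: fpm_slowlog_top [SLOWLOG_FILE]   (reads stdin when no file is given)
fpm_slowlog_top() {
  awk '
    /^script_filename = / { want = 1; next }   # the next frame line is the innermost
    want && /^\[0x/ {
      sub(/^\[0x[^ ]*] /, "")                  # drop the frame address
      count[$0]++
      want = 0
    }
    END { for (f in count) printf "%d %s\n", count[f], f }
  ' "$@" | sort -rn
}
```

`fpm_slowlog_top /var/log/php-fpm/slow.log | head` gives a ranked list of where stuck workers were sitting, which is usually enough to pick the first query or API call to fix.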

Pool config that we actually ship

For a dedicated PHP host on a 4GB instance (DigitalOcean Droplet, EC2 t3.medium, or equivalent), running a Laravel application with OPcache and JIT enabled, our default pool config:

; /etc/php/8.3/fpm/pool.d/www.conf
[www]
user = www-data
group = www-data
listen = /run/php/php8.3-fpm.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
listen.backlog = 4096
 
pm = static
pm.max_children = 30
pm.max_requests = 1000
 
request_terminate_timeout = 60s
request_slowlog_timeout = 2s
slowlog = /var/log/php-fpm/slow.log
slowlog_trace_depth = 40
 
pm.status_path = /fpm-status
ping.path = /fpm-ping
 
php_admin_value[memory_limit] = 256M
php_admin_value[error_log] = /var/log/php-fpm/error.log
php_admin_flag[log_errors] = on
clear_env = no
catch_workers_output = yes
decorate_workers_output = no

Some notes on the less obvious bits:

pm.max_requests = 1000: each worker is recycled after 1000 requests. This is the safety valve against memory leaks in third-party libraries; we'd happily run it at 0 (no recycling) if everyone wrote leak-free code. They don't. Set it lower (250-500) if you're running WordPress with a lot of plugins.
listen.backlog = 4096: the default of 511 is too low for any real workload. The Linux kernel silently caps the backlog at net.core.somaxconn (128 on kernels before 5.4, 4096 from 5.4 on), so raise that too; we set 65535.
request_terminate_timeout = 60s: kills runaway requests. Set this slightly above your slowest legitimate request, never below.
clear_env = no: keeps the environment FPM was started with visible to PHP. The default is yes, which wipes it; leave the default in place and you'll spend an evening wondering why getenv() returns nothing.
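Because the kernel cap is silent, it is worth checking what backlog a pool actually gets. A minimal sketch, assuming listen.backlog sits on its own line in the pool file and falling back to FPM's Linux default of 511 when it isn't set; fpm_effective_backlog is a hypothetical name, and it reads the live value from /proc/sys/net/core/somaxconn unless you pass one in.

```shell
# Hypothetical helper: effective listen backlog = min(listen.backlog, somaxconn).
# Usage: fpm_effective_backlog POOL_FILE [SOMAXCONN]
fpm_effective_backlog() {
  somax=${2:-$(cat /proc/sys/net/core/somaxconn)}
  awk -v somax="$somax" -F'=' '
    /^[ \t]*listen\.backlog[ \t]*=/ { gsub(/[ \t]/, "", $2); backlog = $2 }
    END {
      if (backlog == "") backlog = 511        # FPM default on Linux
      eff = (backlog + 0 < somax + 0) ? backlog : somax
      printf "listen.backlog=%s somaxconn=%s effective=%s\n", backlog, somax, eff
    }' "$1"
}
```

`fpm_effective_backlog /etc/php/8.3/fpm/pool.d/www.conf` on a pre-5.4 kernel that still has somaxconn at 128 will report an effective backlog of 128, however high listen.backlog is set.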

The interaction with OPcache (and why static wins again)

OPcache keeps compiled PHP bytecode in a shared memory segment that the FPM master sets up at startup, and the interned strings buffer and the JIT buffer live in that same segment. That state survives workers being forked and recycled; it is only thrown away when the master restarts or OPcache resets itself. What a fresh worker does have to rebuild is its own per-process state: the realpath cache, any persistent connections, and the fork itself.

In static mode, workers live until pm.max_requests recycles them, so that warm-up amortises across thousands of requests. In dynamic and ondemand modes, every newly forked worker pays it again while requests wait. The difference is measurable: on a Laravel API we benchmarked, p99 latency on the first 100 requests after a dynamic-mode scale-up was 4-7x higher than steady state, purely due to cold-worker warm-up. With static, that's a non-event.

This is the reason we bias hard toward static even on workloads where the memory cost feels excessive: you're paying for predictable performance, and that's the thing customers actually notice.

What we monitor on every pool

Once the pool is sized, the monitoring matters more than the config. On every managed PHP host we run, the following are wired into Prometheus and alerted on:

Active processes vs max_children: alert at 80% sustained for 5 minutes
Listen queue depth: alert if > 0 for more than 30 seconds (requests are waiting for a free worker)
Slow request rate: alert if slowlog entries > 5/minute
Worker exit rate: alert if workers exit faster than pm.max_requests recycling explains (request rate ÷ pm.max_requests); the excess is timeouts, crashes or OOM kills, and OOM kills usually point at a memory leak
OPcache hits vs misses: alert if hit ratio drops below 99%
Worker RSS p95: alert on growth over time (memory leak signal)
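Four of those checks reduce to simple thresholds on point-in-time numbers. A sketch that evaluates them for one pool, assuming you already have the inputs: active processes from pm.status, pm.max_children from the pool file, slowlog entries per minute, and the hits and misses counters from opcache_get_status() in a PHP page served by FPM (the CLI has its own OPcache). The "sustained for 5 minutes" and "30 seconds" windows, the worker recycle or exit rate, and the RSS trend all need history, so they belong in the alerting rules themselves (a Prometheus rule's `for:` clause covers the duration part). fpm_alert_check is a hypothetical helper.

```shell
# Hypothetical helper: instantaneous threshold checks for one pool.
# Usage: fpm_alert_check ACTIVE MAX_CHILDREN LISTEN_QUEUE SLOW_PER_MIN OPCACHE_HITS OPCACHE_MISSES
fpm_alert_check() {
  awk -v active="$1" -v maxc="$2" -v queue="$3" -v slow="$4" \
      -v hits="$5" -v misses="$6" 'BEGIN {
    n = 0
    if (maxc > 0 && active / maxc >= 0.8) {
      print "WARN active processes at " int(100 * active / maxc) "% of pm.max_children"; n++
    }
    if (queue > 0) { print "WARN listen queue depth " queue; n++ }
    if (slow > 5)  { print "WARN slow requests " slow "/min"; n++ }
    lookups = hits + misses
    if (lookups > 0 && hits / lookups < 0.99) {
      printf "WARN opcache hit ratio %.2f%%\n", 100 * hits / lookups; n++
    }
    if (n == 0) print "OK"
  }'
}
```

`fpm_alert_check 27 30 4 2 980000 20000` prints three WARN lines: 90% of pm.max_children busy, a listen queue of 4, and a 98.00% OPcache hit ratio.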

These are not optional. A pool you can't see is a pool that fails silently. We've taken over too many sites where the previous setup was "it's been fine for two years," which translated to "nobody has actually looked at PHP-FPM in two years and it's been quietly degrading the whole time."

When to scale out vs tune in

Once you've tuned the pool and you're still hitting max_children regularly, the answer isn't more children — it's more boxes. PHP scales horizontally very well; trying to cram 200 children into a single 8GB host is a fight against memory pressure that you will lose under traffic spikes.

We default to a minimum of two FPM hosts behind a load balancer, even for modest sites, because it gives you rolling deploys without dropping requests, OS patching without downtime, and headroom for the failure case. The Laravel and Symfony shops we work with usually run 3-6 FPM hosts at steady state and scale to 10-15 under campaign traffic. That's also the architecture we ship by default on our Laravel managed stacks.

The pool config is the same on each host. The only thing that changes is the box count.
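Since the per-host config is fixed, sizing the fleet is the multiplication again. A sketch, assuming you know peak concurrent requests (Little's law gives a first estimate: peak requests per second multiplied by mean response time in seconds), one spare host for the failure case, and the two-host floor; fpm_hosts_needed is a hypothetical name.

```shell
# Hypothetical helper: FPM hosts needed for a given peak concurrency.
# Usage: fpm_hosts_needed PEAK_CONCURRENT MAX_CHILDREN_PER_HOST [SPARE_HOSTS]
fpm_hosts_needed() {
  awk -v peak="$1" -v per="$2" -v spare="${3:-1}" 'BEGIN {
    hosts = int((peak + per - 1) / per)   # ceiling division
    if (hosts < 1) hosts = 1
    total = hosts + spare                 # headroom for losing a host
    if (total < 2) total = 2              # never go below two hosts
    print total
  }'
}
```

`fpm_hosts_needed 200 30` prints 8: seven hosts of 30 children to cover 200 concurrent requests, plus one spare.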

If your PHP-FPM pool has been running on the defaults from your distro's package (and most of them have), your max_children is almost certainly wrong for the box (Debian and Ubuntu ship 5, RHEL-family packages ship 50), listen.backlog is under-provisioned, and the slowlog has never been read. Have us audit it; the first 24 hours usually pay for the engagement in CPU savings alone.

Sudhanshu K. is a Principal Site Reliability Engineer at EdgeServers (RemotIQ Pty Ltd, ABN 91 682 628 128). He has been running PHP in production since PHP 5.2 and is still occasionally surprised by what pm = static does to a Friday afternoon graph.