MySQL slow query tuning: the EXPLAIN-driven workflow we use on customer databases

The MySQL performance ticket is usually the same shape. The application is slow, dashboards show high database CPU, the application team has a guess about which query is at fault, and the guess is wrong. The actual offender is usually a query that runs five thousand times a minute at 80ms each, not the single 3-second report query that everyone is staring at.

This post is the workflow we use when a customer hands us a managed MySQL database with "it's slow, please fix it." It scales from "I've never tuned MySQL" through to "I read mysql-internals on weekends," and the early steps are the ones that give you 80% of the wins.

Step 1: Turn on the slow query log properly

Half the customer environments we inherit have slow_query_log = OFF. The other half have it on but with long_query_time = 10 (the default), which means anything taking less than 10 seconds is invisible. Ten seconds is forever in OLTP terms; the slow queries that matter live below the 1-second line.

# /etc/my.cnf
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0.5
log_queries_not_using_indexes = 1
log_slow_admin_statements = 1
log_slow_replica_statements = 1
min_examined_row_limit = 100

long_query_time = 0.5 is the right starting point on most production OLTP databases. You can dial down to 0.1 once you've cleared the >0.5s class. min_examined_row_limit = 100 filters out trivial queries that happen to take a moment due to lock waits.

These can all be set dynamically — no restart required:

SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 0.5;
SET GLOBAL log_queries_not_using_indexes = 1;

For cloud-managed instances on GCP Cloud SQL or RDS, these go in the parameter group / database flags. They apply per-session, so set them globally to catch all traffic.

Step 2: Aggregate with pt-query-digest

The slow log is enormous and noisy. pt-query-digest (from Percona Toolkit) is the right tool for finding the highest-impact queries:

pt-query-digest /var/log/mysql/slow.log --limit 20 > slow-report.txt

The report ranks by total time consumed, which is what matters. A query taking 0.6 seconds running 10,000 times an hour is far more important than a query taking 30 seconds once a day.

Output looks like:

# Profile
# Rank Query ID                            Response time  Calls
# ==== =================================== ============== =====
#    1 0xBA1E94B5BBD2A8C8                  3492.2 24.1%   42183
#    2 0x2A11D6F2D5B7E9A1                  1804.6 12.5%      27
#    3 0xE5F4AD2B41C8D0F1                   942.3  6.5%   18922

Query #1 — 42,183 calls totalling 3,492 seconds — is where you start. Query #2's 27 calls totalling 1,800 seconds is also worth examining (probably a giant batch query), but query #1 is what's heating the database.

Performance Schema gives you the same data live, without parsing log files:

SELECT 
  digest_text,
  count_star,
  ROUND(sum_timer_wait/1e12, 2) AS total_seconds,
  ROUND(avg_timer_wait/1e9, 2) AS avg_ms,
  ROUND(sum_rows_examined / count_star) AS avg_rows_examined,
  ROUND(sum_rows_sent / count_star) AS avg_rows_sent
FROM performance_schema.events_statements_summary_by_digest
WHERE digest_text IS NOT NULL
ORDER BY sum_timer_wait DESC
LIMIT 20;

avg_rows_examined / avg_rows_sent is the most important ratio in MySQL tuning. A query that examines 100,000 rows to return 10 is reading 10,000x more data than it returns. That's almost always a missing index or a wrong index choice.

Step 3: EXPLAIN every top offender

Run EXPLAIN ANALYZE on each query in the top 20. The output tells you everything that matters: what indexes are used (or not), how many rows are read at each step, where time actually goes.

EXPLAIN ANALYZE
SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
  AND u.status = 'active'
GROUP BY u.id, u.email
ORDER BY order_count DESC
LIMIT 50;

What you're looking for in the output:

type: ALL or type: index — full table scan or full index scan. Almost always bad on big tables.
rows: 1,000,000+ — examining millions of rows for a single query. Almost always a missing index.
Using filesort — sorting on disk, expensive on big result sets. Often fixable with an index that already orders the data.
Using temporary — materialising an intermediate result, expensive. Often a GROUP BY without a covering index.
Extra: Using where alone is fine; combined with high rows it means the WHERE is being applied after a big scan.

EXPLAIN ANALYZE (MySQL 8.x) actually runs the query and reports per-step timings. It's slower than plain EXPLAIN but tells you which step is slow — sometimes the plan looks fine but a particular index lookup is taking 200ms because the secondary index isn't in the buffer pool.

Step 4: Index strategy

The bulk of MySQL tuning is index tuning. The rules we apply, in order:

Composite indexes follow the equality-range pattern

For a query like:

SELECT * FROM orders 
WHERE customer_id = 12345 
  AND status = 'pending' 
  AND created_at > '2026-05-01'
ORDER BY created_at DESC;

The right index is (customer_id, status, created_at). The rule: equality columns first, range columns last, ORDER BY columns aligned with the index ordering if possible.

(customer_id, created_at, status) would work for the range filter but not for the ORDER BY (because the range scan breaks the sort property).

Covering indexes save round trips

If a query reads only a few columns, an index covering those columns avoids the table lookup entirely:

-- Query that runs constantly
SELECT user_id, last_login FROM sessions WHERE token_hash = ?;
 
-- Covering index
ALTER TABLE sessions ADD INDEX idx_token_covering (token_hash, user_id, last_login);

The query becomes index-only. On a 50-million-row sessions table this is typically the difference between 30ms and 0.5ms.

Don't over-index

Every index has a write cost. A table with 12 indexes that gets 1,000 inserts a second is paying 12,000 index-write operations per second on top of the data writes. We see customers add indexes for every slow query they ever encounter, and the cumulative write cost eventually dominates.

Drop indexes that aren't used. Performance Schema makes this easy:

SELECT object_schema, object_name, index_name
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND count_star = 0
  AND object_schema NOT IN ('mysql', 'sys', 'performance_schema', 'information_schema');

Indexes with count_star = 0 since restart are candidates for removal. Cross-reference with production usage over a longer window before dropping — some indexes are used only by month-end reports.

Step 5: The seven antipatterns we find on every audit

These are remarkably consistent across customers, regardless of the framework or ORM in use.

1. `SELECT *` in OLTP queries

The query reads 30 columns when the application only uses 3. The 30 includes a TEXT column with the full HTML of a description field, blowing up the result set size by 100x. Fix: ask the application team to project only what's needed.

2. `LIKE '%foo%'` queries

Leading-wildcard LIKE cannot use a B-tree index. On a million-row table this is a full scan, every time. Fix: full-text indexes (FULLTEXT INDEX) or an external search index (Elasticsearch, Meilisearch, Typesense).

3. Implicit type conversion

SELECT * FROM users WHERE phone = 5551234567;  -- phone is VARCHAR

MySQL converts every row's phone to a number to compare; the index on phone is unusable. Fix: parameterise correctly so the literal is a string.

4. `OFFSET` pagination on big tables

LIMIT 50 OFFSET 100000 requires MySQL to scan 100,050 rows and discard 100,000 of them. Fix: keyset pagination using the last seen primary key (WHERE id > :last_id LIMIT 50).

5. N+1 queries from ORMs

The application fetches a list of 50 orders and then issues 50 SELECT * FROM users WHERE id = ? queries. Each one is fast individually; the round-trip overhead kills throughput. Fix: eager loading at the ORM level. Surface this in the slow log by enabling log_queries_not_using_indexes — N+1 patterns light up as repetitive primary-key lookups.

6. ORDER BY a non-indexed column with LIMIT

SELECT * FROM events ORDER BY priority DESC LIMIT 10;

If priority isn't indexed, MySQL sorts the entire events table before applying LIMIT. Even with the optimisation, a million-row sort to return 10 rows is wasted. Fix: index the sort column, possibly with the filter column for a covering composite index.

7. `COUNT(*)` on huge tables for pagination

The "show total page count" feature counts every row in the table on every request. On a 100-million-row table this is a multi-second query, every page load. Fix: cache the count, use an approximate count from information_schema.tables, or remove the total page indicator entirely.

Step 6: performance_schema for live diagnostics

Beyond query-level data, performance_schema surfaces:

-- Wait events: where is time actually going?
SELECT event_name, SUM_TIMER_WAIT/1e12 AS seconds
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE event_name LIKE 'wait/io/%' OR event_name LIKE 'wait/lock/%'
ORDER BY seconds DESC LIMIT 20;
 
-- Lock waits: who's blocked on whom?
SELECT * FROM performance_schema.data_lock_waits;
 
-- Connection-level stats
SELECT user, current_connections, total_connections, sum_rows_sent
FROM performance_schema.users
ORDER BY total_connections DESC;

Wait-event analysis is what tells you whether your bottleneck is CPU (most aggregations on warm data), I/O (cold-cache scans), locks (write contention), or network (a chatty application).

The managed MySQL dashboards we ship put these metrics into Grafana with a per-customer baseline — the absolute numbers matter less than the deltas from "normal."

Step 7: Buffer pool sizing and hit ratio

The single biggest performance lever after indexing is the InnoDB buffer pool:

innodb_buffer_pool_size = 24G    # roughly 70-75% of system RAM
innodb_buffer_pool_instances = 8

Hit ratio target: >99% on OLTP, >95% on mixed workloads. Calculate it:

SELECT 
  ROUND(100 * (1 - 
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status 
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status 
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
  ), 2) AS buffer_pool_hit_ratio;

A hit ratio below 95% means cold reads from disk are happening constantly. Options: bigger buffer pool, smaller working set (data partitioning, archiving cold rows), or better indexes that make hot data fit.

What we ship by default

For every managed MySQL customer:

Slow log at 0.5s with pt-query-digest running daily, top-20 report sent to customer dashboard
Performance Schema enabled with sensible consumers
Grafana dashboards covering query digest, lock waits, buffer pool ratio, connection counts
Quarterly index review: drop unused indexes, identify missing ones from slow log
An on-call alert when the top query digest changes — usually a sign of an application deploy that introduced a new query pattern

Performance tuning is iterative. A new feature ships, a new slow query appears, an index needs adjusting. Doing this once and then ignoring it for a year is how databases gradually slow down without anyone noticing.

If your MySQL is creeping toward "we'll just throw a bigger instance at it," reach out. A two-day tuning engagement against a representative production workload typically yields a 30-50% reduction in database CPU and a measurable drop in tail latency — usually without any application changes at all.

Sudhanshu K. is a Senior Database Engineer at EdgeServers (RemotIQ Pty Ltd, ABN 91 682 628 128). He has spent more hours staring at EXPLAIN output than is healthy and has the strong opinions about composite index ordering to show for it.