RHEL 9 to RHEL 10 with Leapp — the pre-flight checks and the gotchas we hit on real fleets

In-place major version upgrades are now genuinely viable on RHEL. They are not, however, fire-and-forget. Here's the Leapp workflow we run, the issues we surface, and when we still prefer fresh installs.

May 28, 2026 · 11 min · by Sudhanshu K.

Once upon a time the answer to "how do we upgrade RHEL major version" was "you don't — you build a new host and migrate." That changed in earnest with Leapp around RHEL 7.6, and by RHEL 9 → 10 it's a genuinely production-grade tool. We've now run the 9-to-10 upgrade on a few hundred hosts across our managed RHEL fleet, and we've collected the gotchas worth knowing before you start.

This is not a "Leapp can do everything" piece. Leapp is good, but every fleet we've upgraded has produced at least one surprise per twenty hosts. The work is in knowing where to look.

What Leapp actually does

Leapp is Red Hat's in-place major version upgrade tool. The mechanics:

  1. leapp preupgrade runs ~150 actor checks against the host and produces a report of inhibitors (blocking issues) plus lower-severity findings
  2. You fix the blockers
  3. leapp upgrade downloads the RHEL 10 content, builds a special initramfs, and reboots into a transitional environment
  4. The transitional environment performs the package upgrade, configuration migration, and SELinux relabel
  5. The host reboots again into RHEL 10

On a clean host with no third-party content, the whole thing takes about 90 minutes elapsed, with maybe 20 minutes of actual downtime around the reboots. On a real production host with custom configs, the elapsed time is more like 4-6 hours including the pre-flight remediation.

The pre-flight, before you ever run leapp

We do not run leapp preupgrade cold. We do these checks first, because the Leapp report is easier to read when the obvious issues are already cleared.

Inventory the third-party RPMs

# Anything not from a Red Hat repo
rpm -qa --qf '%{NAME} %{VENDOR}\n' | grep -v "Red Hat" | sort

These are the packages most likely to break the upgrade. Common offenders: vendor-bundled monitoring agents (Datadog, NewRelic, Dynatrace), older versions of Postgres or MongoDB from upstream repos, in-house RPMs from rpm-build. Each one needs a plan: remove before upgrade, replace with RHEL 10 equivalent after, or trust the vendor's documentation.
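To turn that raw list into a plan, it can help to group the packages by vendor first. The following is a minimal sketch, not a definitive tool: summarize_vendors is a hypothetical helper name, and it assumes the query output is tab-separated so that vendor names containing spaces stay intact.

```shell
# Hypothetical helper: reads "NAME<TAB>VENDOR" lines on stdin and prints
# a per-vendor package count for everything not built by Red Hat.
# Packages with no vendor tag show up under "(none)".
summarize_vendors() {
    awk -F'\t' '
        $2 !~ /Red Hat/ { count[$2]++ }
        END { for (v in count) printf "%d\t%s\n", count[v], v }
    ' | sort -k1,1nr -k2
}

# Usage on a RHEL 9 host:
#   rpm -qa --qf '%{NAME}\t%{VENDOR}\n' | summarize_vendors
```

Vendors with many packages usually deserve one decision for the whole set rather than one per package.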

Check the cgroups situation

RHEL 10 is cgroups v2 only. RHEL 9 defaulted to v2 but could be switched back to v1. If your fleet was using v1 (some Java workloads, older container runtimes), that breaks. Confirm:

mount | grep cgroup
# Expected on RHEL 10-ready: cgroup2

If you see cgroup (v1) anywhere, fix it on RHEL 9 first by removing systemd.unified_cgroup_hierarchy=0 from the kernel command line and rebooting. Validate everything still works there, then proceed.
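A less ambiguous check than reading mount output is the filesystem type of /sys/fs/cgroup, which is cgroup2fs on the unified hierarchy and tmpfs on v1 or hybrid setups. A small sketch under that assumption; cgroup_mode is a hypothetical helper name, and the grubby call in the usage notes is one way to drop the override, not the only one.

```shell
# Hypothetical helper: classify the filesystem type reported for /sys/fs/cgroup.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1-or-hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

# Usage:
#   cgroup_mode "$(stat -fc %T /sys/fs/cgroup)"
# If it reports v1-or-hybrid, look for the override, remove it, then reboot:
#   grep -o 'systemd.unified_cgroup_hierarchy=[01]' /proc/cmdline
#   grubby --update-kernel=ALL --remove-args="systemd.unified_cgroup_hierarchy=0"
```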

Python 3.9 vs Python 3.12

RHEL 9 ships Python 3.9 as the system default. RHEL 10 ships Python 3.12. Any internal scripts that rely on Python need to be Python 3.12 compatible. The Leapp report will flag this, but we audit ahead of time:

# Find python scripts
find /opt /usr/local -name "*.py" 2>/dev/null
# Find shebangs
grep -rE '#!.*python3?(\.[0-9]+)?' /usr/local/bin /opt 2>/dev/null

The most common breakage we've seen is standard-library modules that were deprecated for years and are finally removed in 3.12 (asynchat, imp, distutils for real this time). If your internal tooling stuck a few import distutils lines in, they need to come out.
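A quick way to find those imports ahead of time is a recursive grep. This is a rough sketch rather than a complete audit: find_removed_imports is a hypothetical helper, the module list covers only the ones named above plus asyncore, and it will miss multi-module lines such as "import os, imp".

```shell
# Hypothetical helper: list import lines for modules removed in Python 3.12.
# Matches "import X" and "from X..." at the start of a line (after indentation).
find_removed_imports() {
    grep -rnE --include='*.py' \
        '^[[:space:]]*(import|from)[[:space:]]+(asynchat|asyncore|imp|distutils)([[:space:].,]|$)' \
        "$@" 2>/dev/null
}

# Usage:
#   find_removed_imports /opt /usr/local
```

The trailing character class keeps "import importlib" from matching the imp pattern.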

OpenSSL and crypto policies

RHEL 10 hardens default crypto policies further. TLS 1.2 is still available but a chunk of the older cipher suites are gone. Old SHA-1 certificates are rejected by default. If your fleet talks to an old SAP system or a 2014-vintage internal CA, you'll discover it the hard way.

# Check what the current policy is
update-crypto-policies --show
# LEGACY policies still exist on RHEL 10 but you really shouldn't be there

The right time to confirm this is during pre-flight, not 30 seconds after the upgrade reboot when monitoring lights up.
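One concrete pre-flight check is to look for SHA-1 signatures on the certificates your hosts trust or present. The sketch below assumes PEM files and the usual "Signature Algorithm:" line in openssl's text output; sig_uses_sha1 is a hypothetical helper name.

```shell
# Hypothetical helper: reads `openssl x509 -noout -text` output on stdin and
# succeeds (exit 0) if the first Signature Algorithm line mentions SHA-1.
sig_uses_sha1() {
    grep -m1 'Signature Algorithm:' | grep -qi 'sha1'
}

# Usage (the anchors directory is one place to look; add your own cert paths):
#   for c in /etc/pki/ca-trust/source/anchors/*; do
#       openssl x509 -in "$c" -noout -text 2>/dev/null | sig_uses_sha1 \
#           && echo "SHA-1 signed: $c"
#   done
```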

Storage and LVM

RHEL 10 dropped support for some older filesystem and LVM configurations. The big one: dm-cache. If you've got LVM cache volumes anywhere, they need to be decached on RHEL 9 first. leapp preupgrade will flag this but the remediation is non-trivial (you need spare space on the slow device to hold the cached data).

Running the preupgrade

With the obvious pre-flight done, run the actual tool:

# Install Leapp on RHEL 9
dnf install leapp-upgrade
 
# Pre-answer a known Leapp question (this does not fetch data files).
# Section and key names vary by Leapp version; check /var/log/leapp/answerfile
leapp answer --section check_vdo.no_vdo_devices=True
 
# Run preupgrade
leapp preupgrade

The report lands in /var/log/leapp/leapp-report.txt and /var/log/leapp/leapp-report.json. The JSON is what we feed into our automation; the text is what a human reads. Categories of finding:

  • Inhibitor — must be fixed before upgrade proceeds. Hard stop.
  • High — should be fixed; upgrade might succeed but you'll regret it.
  • Medium / Low / Info — advisory; read them, decide case by case.

For managed customers we run the preupgrade across a representative sample of hosts (one per role) and use the combined report to plan the fleet upgrade. The same set of inhibitors tends to recur — fix them in your golden image and they go away across the fleet at once.
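Combining the sample reports can be as simple as counting how often each inhibitor title appears. The JSON report is the better input for real automation; this sketch uses the text report only to stay dependency-free. It assumes each finding has a "Risk Factor: ... (inhibitor)" line followed by a "Title:" line, and count_inhibitors is a hypothetical helper name.

```shell
# Hypothetical helper: given leapp-report.txt files collected from several
# hosts, print each inhibitor title with the number of times it appears.
count_inhibitors() {
    awk '
        /^Risk Factor:/ { inhibitor = ($0 ~ /\(inhibitor\)/) }
        /^Title:/ && inhibitor {
            sub(/^Title:[[:space:]]*/, "")
            print
            inhibitor = 0
        }
    ' "$@" | sort | uniq -c | sort -rn
}

# Usage, with reports copied into one directory per host (layout is an example):
#   count_inhibitors reports/*/leapp-report.txt
```

Titles at the top of the output are the ones worth fixing once in the golden image.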

The actual upgrade

Once preupgrade is clean:

leapp upgrade

This pulls the RHEL 10 content, builds the initramfs, and prompts you to reboot. The reboot is not optional and is not safe to skip — the upgrade only runs in the transitional environment, which only exists during that specific boot.

shutdown -r now

The host now boots into the upgrade initramfs, runs the upgrade transaction (taking 20-40 minutes for an average host), then reboots again into RHEL 10. Console access is essential — if anything goes wrong during the upgrade boot, you need the console to see what happened.

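For managed RHEL on AWS, this means the EC2 Serial Console (or, at minimum, the instance console output and screenshots) rather than SSH. SSH won't work during the transitional boot because sshd hasn't started yet, and that also rules out EC2 Instance Connect, which is SSH underneath, and SSM Session Manager, which needs its agent running inside the OS. Plan for this; we've seen engineers panic when they couldn't connect during the 25-minute upgrade window.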

After the upgrade — verify, don't trust

# Confirm version
cat /etc/redhat-release
# Red Hat Enterprise Linux release 10.0 (Coughlan)
 
# Check kernel
uname -r
 
# Confirm subscription is correct for RHEL 10
subscription-manager refresh
subscription-manager status
 
# Check enabled repos
subscription-manager repos --list-enabled

Then the actual application-level smoke tests. We have a standard post-upgrade checklist that runs:

  • Every systemd unit reports active (running) (or appropriate state)
  • Every container/pod starts cleanly under the new podman/cri-o versions
  • The application's own health endpoints return 200
  • A representative database query completes in approximately the expected time
  • SELinux is enforcing and there are no fresh AVC denials

Anything that fails goes into the upgrade report. Roughly one host in 10 produces at least one finding worth follow-up.
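A checklist like this is easy to wrap in a tiny runner so every host produces the same PASS/FAIL lines. This is an illustrative sketch only: run_check and the failures counter are hypothetical names, and the example checks in the usage notes are stand-ins to adjust per role.

```shell
# Hypothetical mini-runner: run a named check, print PASS or FAIL,
# and keep a running count of failures in $failures.
failures=0
run_check() {
    name=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $name"
    else
        echo "FAIL  $name"
        failures=$((failures + 1))
    fi
}

# Usage on an upgraded host:
#   run_check "release is 10.x"   grep -q 'release 10\.' /etc/redhat-release
#   run_check "no failed units"   sh -c '[ -z "$(systemctl --failed --no-legend)" ]'
#   run_check "SELinux enforcing" sh -c '[ "$(getenforce)" = Enforcing ]'
#   echo "$failures check(s) failed"
```

Call run_check directly rather than inside a command substitution, otherwise the failure count is updated in a subshell and lost.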

The gotchas we actually hit

Network configuration regenerated

NetworkManager keyfiles change format slightly between major versions. We've had hosts come back online with DNS resolving but the search domain missing, or with a default route that picked up an unexpected route metric. The fix is straightforward (nmcli con mod ...) but you need to actually check, not assume.

Custom systemd units running as the wrong user

Specifically, units written without an explicit User= directive. On RHEL 9 they happened to run as root because the install order put them there; on RHEL 10 they ran as the unit-default user, which sometimes didn't exist. Add explicit User= everywhere.
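To find candidates for an explicit User= before the upgrade, a simple scan of the unit directory is enough to start with. A minimal sketch: units_without_user is a hypothetical helper, it only reads the main unit files and ignores drop-ins, and myapp.service in the usage notes is a placeholder.

```shell
# Hypothetical helper: list service unit files in the given directories
# that contain no User= line. Drop-ins (*.d/*.conf) are not considered.
units_without_user() {
    for dir in "$@"; do
        for unit in "$dir"/*.service; do
            [ -f "$unit" ] || continue
            grep -qE '^[[:space:]]*User=' "$unit" || echo "$unit"
        done
    done
}

# Usage:
#   units_without_user /etc/systemd/system
#   systemctl show -p User myapp.service   # effective value; empty means unset
```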

Python virtualenvs broken

Virtualenvs created on RHEL 9 against Python 3.9 do not magically become Python 3.12 virtualenvs. They keep their broken Python 3.9 shebangs and crash on use. The fix is to rebuild every virtualenv from requirements.txt. This is usually obvious in advance but we've watched a customer team spend hours on it post-upgrade.
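Finding the affected virtualenvs in advance is mostly a matter of reading each pyvenv.cfg. The sketch below assumes the interpreter version is recorded under a version or version_info key, which is what the venv and virtualenv tools normally write; find_py39_venvs is a hypothetical helper and the rebuild paths are placeholders.

```shell
# Hypothetical helper: print virtualenv directories whose pyvenv.cfg records
# a Python 3.9 interpreter. Checks both "version" and "version_info" keys.
find_py39_venvs() {
    find "$@" -name pyvenv.cfg -type f 2>/dev/null | while IFS= read -r cfg; do
        if grep -qE '^version(_info)?[[:space:]]*=[[:space:]]*3\.9(\.|$)' "$cfg"; then
            dirname "$cfg"
        fi
    done
}

# Usage:
#   find_py39_venvs /opt /srv /home
# Rebuild each one after the upgrade:
#   python3 -m venv --clear /opt/myapp/venv
#   /opt/myapp/venv/bin/pip install -r /opt/myapp/requirements.txt
```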

MariaDB / PostgreSQL major version bumps

RHEL 10 ships newer database versions than the RHEL 9 defaults (MariaDB 10.11, PostgreSQL 16). If your application was using the system Postgres, the data directory will need a pg_upgrade step that Leapp does not perform for you. Pin the database version to a non-system stream or do the database upgrade explicitly.

# Right way
dnf module reset postgresql
dnf module enable postgresql:16
# Then migrate the data directory with postgresql-setup --upgrade (needs the postgresql-upgrade package)

SELinux denial after relabel

The Leapp transition includes a full SELinux relabel, but if you'd been carrying custom contexts manually applied (without semanage fcontext -a), they get wiped. Anything in a non-standard location that depended on a non-default label needs to be re-applied. We covered the workflow for this in the SELinux post — it's worth a re-read before you upgrade.

When we still don't use Leapp

For all of Leapp's maturity, there are cases where we still recommend a fresh install:

  • Hosts running for more than 3-4 years. Accumulated drift, undocumented changes, possibly some package state Leapp can't make sense of. The risk of "upgrade succeeds but something subtle is wrong" is much higher.
  • Hosts where the OS layer is genuinely small. If a host's role is "run a single container," there's no value in upgrading it — re-deploy it from your IaC and you're done in 10 minutes.
  • Hosts with significant third-party storage stacks. Vendor-supplied LVM/SAN tooling that pre-dates Leapp testing is a high-risk surface.
  • Compliance-driven fleets where re-attestation is cheaper than in-place modification. Some PCI and government environments need every host to be re-baselined after a major change; for them, a fresh install is the right operational answer.

For everything else — standard application servers, managed Postgres hosts, web/api fleets — Leapp is now our default, and it works.

What we ship for upgrade engagements

For customers running managed RHEL who are due to move from 9 to 10, our typical engagement looks like:

  1. Inventory across the fleet — third-party packages, custom configs, role classification
  2. Leapp preupgrade across a representative sample, with the findings consolidated
  3. Remediation of common findings centrally (in the golden image, Ansible role, or Satellite content view)
  4. A canary upgrade of 2-3 non-critical hosts
  5. Phased rollout, typically 10% per week, with checkpoint reviews
  6. Documented provisioning of any new RHEL 10 hosts from then on

End-to-end, a fleet of 100 hosts is a 4-6 week engagement. Most of that time is monitoring and verifying, not actually running Leapp.

The summary

Leapp has gone from "interesting experiment" to "the way we move RHEL between major versions" in the span of two releases. RHEL 9 to 10 is the cleanest in-place upgrade we've run yet — but it's not zero-effort. The pre-flight matters, the post-upgrade verification matters more, and the discipline of fixing issues in the golden image rather than per-host is what makes a fleet-wide upgrade tractable.

If you're sitting on a RHEL 9 fleet and looking at a 2026 upgrade calendar, that's a conversation we're happy to scope. We've made enough mistakes on other people's hosts that we have a fairly good map of where the cliffs are.

Sudhanshu K. is a Staff DevOps engineer at EdgeServers (RemotIQ Pty Ltd, ABN 91 682 628 128). He has been doing RHEL major-version upgrades since the rebuild-from-scratch days and remains quietly impressed every time Leapp succeeds.