Why are RHEL/Oracle Linux upgrades still so unreliable?

 

The RHEL distribution is still really popular in the enterprise Linux market. The stats sometimes look worse because the users are split across both RHEL and all of its many clones, but it still represents a massive chunk of the enterprise market. With that in mind, why are RHEL upgrades still so unreliable?

My history of OS upgrades

Over the years I’ve done loads of operating system upgrades. In “recent” times I’ve done Windows upgrades (8 -> 8.1 -> 10 -> 11) and loads of Intel macOS upgrades with minimal drama. That’s not the case with Linux.

About a decade ago I was using Fedup to upgrade between versions of Fedora. That was later replaced by the DNF upgrade process. More recently I’ve tried my hand at using Leapp to upgrade Oracle Linux. I’m no expert, but these upgrades always feel really janky. You never really know what you are going to get until the upgrade is complete. The mileage varies depending on the configuration of the server, and what software has been installed on it. In some cases you have a system that is running fine. In some cases you have a running server, but a bunch of the configuration has to be redone to make your application work. In some cases you have bricked your server. Not exactly confidence building.

So what is the alternative?

My overall feeling has always been: don’t upgrade! Build new kit, migrate across to it, then ditch the old kit.

In the past we would typically make that process fit the lifespan of the physical kit. Now that we use virtual machines, a VM could potentially outlive the physical kit many times over, so the need to migrate is less pressing, and a reliable in-place upgrade process is more desirable.

Migrating to a new server comes with a whole bunch of overheads. It’s a lot more work than an in-place OS upgrade.

Infrastructure as Code

If you have true infrastructure as code, you can build new systems really quickly, including all the VMs, networking and firewalls, which just leaves you with the data migration to worry about. If you are not in that position, it’s a nightmare of service tickets and endless waiting.

Questions

Is it too much to ask for such a dominant operating system to have reliable upgrades?

I would love to know your experiences of OS upgrades for RHEL and its clones. Have I just been really unlucky with Leapp, or are upgrades really as bad as I think?

Cheers

Tim…

Update 1: I ran a poll on Twitter. There weren’t many responses, but at least I can see I’m not completely alone. 🙂

Update 2: There are some interesting responses in the comments. Well worth you giving them a look.

Author: Tim...

DBA, Developer, Author, Trainer.

4 thoughts on “Why are RHEL/Oracle Linux upgrades still so unreliable?”

  1. Hi Tim,

    Long time reader, first time writing 🙂

    I’ve performed around 200 in-place upgrades in the last year (RHEL7 to RHEL8) and it has been great really. Apart from making sure the main application running on the server is supported/certified for RHEL8, everything should be fine. We perform some prechecks with Ansible before considering any server. Checks like these:

    – Do we have Red Hat’s Tomcat? In RHEL7, Tomcat is available from the RHEL repository, but in RHEL8, you’ll need to purchase JBoss Web Server for continued support and patches.
    – Do we have more than one NIC with eth* naming convention?
    – Do we have RHEL’s High-Availability add-on? In-place upgrades are not supported for it.
    – Are there any unsigned packages from RHEL on the system?
    – Are there any packages installed on the system without an associated repo?
    – Do we have the required free space in /usr and /var?
    – Are there any NFS or CIFS shares mounted?
    etc…

    Some of these you’ll get as warnings during the “leapp preupgrade” process. We’re at a point where, based on these prechecks we perform, we always get a “GREEN” pass from leapp preupgrade & upgrade.

    Thank you!


    Jose
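
As an illustration only, two of the prechecks Jose lists (more than one eth* NIC, and free space in /usr and /var) could be sketched in Python as below. This is not Jose's Ansible playbook. The function names and the 2 GiB threshold are assumptions made for the example, since the comment does not say how much free space is required.

```python
import re
import shutil

# Hypothetical threshold: the comment does not state the required free space.
MIN_FREE_BYTES = {"/usr": 2 * 1024**3, "/var": 2 * 1024**3}

# Kernel-style interface names such as eth0, eth1.
ETH_NAME = re.compile(r"^eth\d+$")


def legacy_eth_nics(interface_names):
    """Return the interface names that use the eth* naming convention."""
    return sorted(name for name in interface_names if ETH_NAME.match(name))


def nic_check(interface_names):
    """Pass (True) when at most one NIC uses eth* naming; also return the matches."""
    found = legacy_eth_nics(interface_names)
    return len(found) <= 1, found


def free_space_check(path, min_free_bytes, disk_usage=shutil.disk_usage):
    """Pass (True) when `path` has at least `min_free_bytes` free.

    `disk_usage` is injectable so the check can be exercised without
    touching the real filesystem.
    """
    return disk_usage(path).free >= min_free_bytes
```

On a real Linux host the interface names could come from os.listdir("/sys/class/net"), and the space check could be run as free_space_check(path, limit) for each entry in MIN_FREE_BYTES. Checks like these only decide whether a server is a candidate. They do not replace the report produced by "leapp preupgrade".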

  2. Ciao!

    We have no infrastructure as code, and every new machine is a pain of tickets (the beloved “new generation application firewall” is a kick in the ass). On top of that, we had to move from OEL to RHEL, so we first did the OEL to RHEL conversion and then the release upgrade. Surprisingly, this double exercise works, and it’s our preferred path to update these machines (fortunately we have no 7×24 requirements).

    Regards
    Fabrizio

  3. Hi Tim,

    Long time reader, first time commenter….

    I feel your pain with RHEL-based system upgrades. Non-RHEL systems have worked totally fine for decades. Luckily, we have many customers that seldom upgrade their Oracle databases due to the lack of third party support (applications running on Oracle) and are not often confronted by RHEL upgrade path problems. Unfortunately, we’re still running 11.2.0.4 (and also 12.1.0.2, 12.2.0.1) on too many databases and still have to wait until we can finally upgrade. Therefore, the machines still run on their old RHEL systems and only get upgraded along with new hardware, a new OS and a new database version.
    In the last year, we have seen more pressure to upgrade those systems, yet the migration itself is still a long-lasting effort. We often see efforts to secure the legacy systems even more in order to keep them running longer, including more expensive hardware support contracts. Another problem is the certification of operating systems on hardware. If e.g. OL7 is supported on a machine, the machine does not get recertified for OL8 and therefore cannot be used in a fully-certified stack, so we’re unable to upgrade the OS. A hardware switch is then the only way forward.

    Why do we often use “real” hardware you might ask? Licensing. Most of our customers do not want to get trapped in the licensing-on-$YourHypervisor hell.

    We also played around with running Oracle Database inside Docker, but nothing on a production level. A RAC or SEHA is still much better. However, we use it heavily for testing.

    Best,
    Andreas

  4. Jose: That gives me some new hope. I definitely think we’ve got a bunch of servers that could upgrade successfully, because the operating systems are quite clean. I hope so, because migrating all of them will be painful.

    Fabrizio: I feel your pain! Fingers crossed it all goes well for you.

    Andreas: I’m lucky we’ve managed to keep everyone on relatively new versions of the software. Although I sometimes see the appeal of never touching servers again, and letting them grow old gracefully. 🙂

    Cheers

    Tim…
