I’ve written a number of posts about how important automation is (here), but thought I would mention something that happened on Friday…
If you’ve followed the blog you will know I recently released a hands-off installation of 18 RAC. On Friday I received a pull request on GitHub suggesting a change to the GI software installation, switching the public and private network device names from “enp0s8” & “enp0s9” to “eth1” & “eth2”. I hadn’t seen ethernet device names like this since RHEL5/OL5, so I was convinced it was a mistake. I was in the middle of writing a comment on the pull request, but I thought I had better do my homework, rather than make an assumption.
When I got home I fired up the existing RAC on my Linux server and the device names were “enp0s8” & “enp0s9” as I expected. I fired up a new DNS server on my laptop and the public network device name was “enp0s8”. At this point I was convinced I was correct, but I thought I had better just make sure. I was using the latest VirtualBox and Vagrant releases, but I noticed the output from the build said there was a newer version of the “bento/oracle-7.5” Vagrant box. I did a “vagrant box update”, “vagrant destroy -f” and “vagrant up” and the new DNS server was built with a public network device name of “eth1”.
I went back to my Linux server, did the same and the device names were “eth1” & “eth2” on there too. So the “bento/oracle-7.5” box update had caused the ethernet device naming to change. As a result of this I accepted the pull request, made a similar change to another config file, then destroyed and started a rebuild of my RAC. I went out to visit some mates and watched the first two episodes of Ozark season 2. By the time I came back I had a new RAC up and running.
Why does this mean automation sucks? Well it doesn’t really, but it does show you how your automation can be broken by things outside of your control. That could be a change to a Vagrant box, a Docker image, the Kickstart file from your system admins, or even a cloud service. You have to keep on top of this stuff. You can’t just define it and forget about it. Having said that, I would rather find a problem like this in an automated build than in a manual process… 🙂
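One way to catch this kind of drift early is a small pre-flight check that fails the build with a clear message if the device names the config expects are not actually there. What follows is only a rough sketch, not what my RAC build does. The function names are made up for illustration, and it assumes a Linux guest where the interfaces show up under “/sys/class/net”.

```python
import os


def list_interfaces(sys_net_dir="/sys/class/net"):
    """Return the network interface names found on a Linux host.

    Assumption: interfaces are exposed as entries under /sys/class/net.
    Returns an empty list if that directory does not exist.
    """
    try:
        return sorted(os.listdir(sys_net_dir))
    except FileNotFoundError:
        return []


def missing_interfaces(expected, present):
    """Return the expected interface names that are not present, in order."""
    present_set = set(present)
    return [name for name in expected if name not in present_set]


def check_interfaces(expected, present):
    """Raise RuntimeError naming any expected interface that is missing."""
    missing = missing_interfaces(expected, present)
    if missing:
        raise RuntimeError(
            "Expected network devices not found: " + ", ".join(missing)
        )
```

Calling something like check_interfaces(["eth1", "eth2"], list_interfaces()) at the start of a build would have flagged the naming change straight away, rather than leaving it to surface part way through the GI installation.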