Vagrant Box Drama

I had a little bit of VirtualBox and Vagrant drama today.

I was doing my normal thing of recreating some test VMs and I started to get errors like this during the first part of the VM build, before the config scripts ran.

==> default: Machine booted and ready!
Got different reports about installed GuestAdditions version:
Virtualbox on your host claims:
VBoxService inside the vm claims: 6.1.12
Going on, assuming VBoxService is correct…
[default] GuestAdditions seems to be installed (6.1.12) correctly, but not running.
Got different reports about installed GuestAdditions version:
Virtualbox on your host claims:
VBoxService inside the vm claims: 6.1.12
Going on, assuming VBoxService is correct…
Redirecting to /bin/systemctl start vboxadd.service
Job for vboxadd.service failed because the control process exited with error code.
See "systemctl status vboxadd.service" and "journalctl -xe" for details.
Redirecting to /bin/systemctl start vboxadd-service.service
Job for vboxadd-service.service failed because the control process exited with error code.
See "systemctl status vboxadd-service.service" and "journalctl -xe" for details.
Got different reports about installed GuestAdditions version:
Virtualbox on your host claims:
VBoxService inside the vm claims: 6.1.12
Going on, assuming VBoxService is correct…
VirtualBox Guest Additions: Starting.
VirtualBox Guest Additions: Building the VirtualBox Guest Additions kernel
modules. This may take a while.
VirtualBox Guest Additions: To build modules for other installed kernels, run
VirtualBox Guest Additions: /sbin/rcvboxadd quicksetup
VirtualBox Guest Additions: or
VirtualBox Guest Additions: /sbin/rcvboxadd quicksetup all
VirtualBox Guest Additions: Kernel headers not found for target kernel
5.4.17-2011.5.3.el8uek.x86_64. Please install them and execute
/sbin/rcvboxadd setup
ValueError: File context for /opt/VBoxGuestAdditions-6.1.12/other/mount.vboxsf already defined
modprobe vboxguest failed
The log file /var/log/vboxadd-setup.log may contain further information.
==> default: Checking for guest additions in VM…
default: No guest additions were detected on the base box for this VM! Guest
default: additions are required for forwarded ports, shared folders, host only
default: networking, and more. If SSH fails on this machine, please install
default: the guest additions and repackage the box to continue.
default:
default: This is not an error message; everything may continue to work properly,
default: in which case you may ignore this message.
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!
/usr/sbin/rcvboxadd setup
Stdout from the command:
VirtualBox Guest Additions: Starting.
VirtualBox Guest Additions: Building the VirtualBox Guest Additions kernel
modules. This may take a while.
VirtualBox Guest Additions: To build modules for other installed kernels, run
VirtualBox Guest Additions: /sbin/rcvboxadd quicksetup
VirtualBox Guest Additions: or
VirtualBox Guest Additions: /sbin/rcvboxadd quicksetup all
VirtualBox Guest Additions: Kernel headers not found for target kernel
5.4.17-2011.5.3.el8uek.x86_64. Please install them and execute
/sbin/rcvboxadd setup
Stderr from the command:
ValueError: File context for /opt/VBoxGuestAdditions-6.1.12/other/mount.vboxsf already defined
modprobe vboxguest failed
The log file /var/log/vboxadd-setup.log may contain further information.

The VM had booted, but because the guest additions weren’t working it couldn’t mount the shared folders, so none of the setup scripts had run.

Reading the output I figured the kernel-uek-devel package was missing from the Vagrant box, so I did the following…

I connected to the VM, installed the “kernel-uek-devel” package and exited from the VM.

vagrant ssh

sudo dnf install -y kernel-uek-devel
exit

Then I restarted the VM.

vagrant halt
vagrant up

During this second startup of the VM the problem kernel module was rebuilt, and the rest of the configuration steps ran the way you would expect from a normal first-time startup.

One of the issues about using someone else’s Vagrant box is you are at the mercy of what they decide to do with it. In this case I was using the ‘bento/oracle-8’ Vagrant box, which was built using Oracle Linux 8.2 with UEK 6, but the installed guest additions were not happy, and the packages were not present to allow the kernel module to be rebuilt on the fly.

If you are trying to use my Vagrant builds, which mostly use the ‘bento/oracle-8’ Vagrant box, and you are getting this type of issue, now you know what to do about it. Hopefully the next release of this Vagrant box will be less problematic.

Cheers

Tim…

Update: I spent some time figuring out Packer, and now I’ve switched all my OL8 builds to use my own image called ‘oraclebase/oracle-8’.

scsi_id and UDEV issues (update)…

Last month I wrote about a problem I saw with scsi_id and UDEV in  OL5.8. As it screwed up all my UDEV rules is was a pretty important issue for me. It turned out this was due to a mainline security fix (CVE-2011-4127) affecting the latest kernels of both RHEL/OL5 and RHEL/OL6. The comments on the previous post show a couple of workarounds.

Over the weekend I started to update a couple of articles that mentioned UDEV rules (here and here) and noticed the problem had dissapeared. I updated two VMs (OL5.8 and OL6.2) with the latest changes, including the UEK updates and ran the tests again and here’s what I got.

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
# uname -r
2.6.39-100.6.1.el5uek
# scsi_id -g -u -s /block/sda/sda1
SATA_VBOX_HARDDISK_VB535d493d-7a44eb0f_
#

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
# uname -r
2.6.39-100.6.1.el6uek.x86_64
# /sbin/scsi_id -g -u /dev/sda1
1ATA_VBOX_HARDDISK_VB2b5dc561-4ae6e154
#

So it looked like normal service had been resumed. 🙂 Unfortunately, the MOS Note 1438604.1 associated with this issue is still not public, so I couldn’t tell if this was a unilateral change in UEK, or part of a mainline fix for the previous change.

To check I fired up a CentOS 6.2 VM with the latest kernel updates and switched an Oracle Linux VM to the latest RHEL compatible kernel and did the test on both. As you can see, they both still don’t report the scsi_id for partitions.

# cat /etc/redhat-release
CentOS release 6.2 (Final)
# uname -r
2.6.32-220.13.1.el6.x86_64
# /sbin/scsi_id -g -u /dev/sda1
#

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
# uname -r
2.6.32-220.13.1.el6.x86_64
# /sbin/scsi_id -g -u /dev/sda1
#

It could be the associated fix has not worked through the mainline to RHEL and CentOS yet. I’ll do a bit of digging around to see what is going on here.

Cheers

Tim…

Update: It appears the reversion of this functionality may not be permanent, so I’ve updated my articles to use a “safer” method of referencing the parent (disk) device, rather than the partition device.

Oracle Linux 5.8 and UDEV issues…

I just did an update from Oracle Linux 5.7 to 5.8 on one of my VirtualBox RAC installations and things are not looking to clever at the moment. After a reboot, the ASM instances and therefore the database instances wouldn’t restart. A quick look showed the ASM disks were not visible. On this installation I was using UDEV, rather than ASMLib. In checking the UDEV rules I noticed the scsi_id command on OL5.8 doesn’t report an ID for partitions on disks, only the disks themselves. For example, on OL5.7 I get this,

# /sbin/scsi_id -g -u -s /block/sdb/sdb1
SATA_VBOX_HARDDISK_VBd306dbe0-df3367e3_
#

On OL5.8 I get this,

# /sbin/scsi_id -g -u -s /block/sdb/sdb1
#

If I run it against the disk, rather than the partition it works fine.

This has literally just happened, so I’ve done no further investigation, but I thought it was worth putting out there in case anyone was about to start an OS update on something they cared about. 🙂

At this point I’m not discounting that I’ve screwed up somewhere. My next plan is to install three clean VMs (OL 5.6, 5.7 and 5.8) and check the output of scsi_id on each of them. If that turns out OK, then I’ve screwed something else and you can probably ignore this post. I might not get to try it out until tomorrow. Either way, I’ll update this post with the results of that test.

Cheers

Tim…

Update 1: It’s definitely changed. See the following.

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
# /sbin/scsi_id -g -u -s /block/sda/sda1
SATA_VBOX_HARDDISK_VB54dff07f-931ce4d7_
#

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.7 (Tikanga)
# /sbin/scsi_id -g -u -s /block/sda/sda1
SATA_VBOX_HARDDISK_VBx180d717-f896e661_
#

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
# /sbin/scsi_id -g -u -s /block/sda/sda1
#

Update 2: As John Sobecki correctly pointed out in the comments, the title of the post is misleading. UDEV is not at fault here. The problem is the “/sbin/scsi_id” command is behaving differently, which is making my rules useless. The UDEV issue is the symptom, not the cause. The post is clearly focusing on the scsi_id issue, but I’ve picked a pretty bad title to go with it. 🙂

Update 3: John Sobecki pointed me at “[block] fail SCSI passthrough ioctls on partition devices CVE-2011-4127”, a mainline kernel security fix that seems to be the cause of this. It affects all new kernels which include this change (RHEL5/6, UEK etc). Oracle are testing the impact of this. Initially ASMLib and OCFS seem unaffected.

Update 4: MOS Note 1438604.1 (currently in review) contains more information about this issue. ASMLib and OCFS are unaffected by CVE-2011-4127, so ASMLib should probably be used in preference to UDEV with newer kernels.

Update 5: I’ve altered all the articles on my site to reference the parent (disk) device, rather than the partition device, which makes the UDEV rules work fine again. Thanks to Bryan Wood and Joachim for their suggestions.

Fedora 15: First big problem…

Yesterday I hit a pretty major problem with Fedora 15. I did a reboot and the login screen came up fine, but when I tried to log in I got a message saying,

failed to load session ‘gnome’

No options or alternatives. Just back to the login screen. ??

I started the machine up in “Full multiuser mode” by hitting the “a” key during boot and adding “3” on to the boot parameters. Once at the login prompt I could now log in as root. Since it looked like it might be a GNOME problem I uninstalled and reinstalled GNOME.

yum -y groupremove "GNOME Desktop Environment"
yum -y groupinstall "GNOME Desktop Environment"

No change!

My next thought was to install KDE, so at least I would have a desktop. I did this using,

yum -y groupinstall kde

I made KDE the default window manager by editing the “/etc/sysconfig/desktop” file to contain.

DISPLAYMANAGER=KDE

The machine now rebooted and I got KDM as the display manager. This allowed me to start KDE, but surprisingly, also allowed me to start GNOME as my window manager.

Now I figured it was probably an issue with GDM, not GNOME itself, so I reinstalled GDM.

yum -y remove gdm
yum -y install gdm
yum -y install gdm-plugin-fingerprint

Bingo. I was now able to switch back to GDM as my display manager by editing the “/etc/sysconfig/desktop” file to contain.

DISPLAYMANAGER=GNOME

I have no idea what happened to cause this problem in the first place. Googling for a solution wasn’t much help because most posts are really old and the new ones just said reinstall.

If anyone else has misfortune to run into this issue, you now know how I got out of it.

Incidentally, my brief time on KDE did not fill me with a desire to switch. I think I prefer GNOME. I am however a little nervous about the stability of Fedora 15 after this incident. Maybe I did something dumb to cause it, but if I did, I have no idea what it was. I’m just running a browser and VirtualBox VMs for the most part.

Cheers

Tim…

WordPress 2.5 linking problem…

I guess anyone following this blog will have noticed the problems I have when trying to link to other pages on this domain. It has been a common occurence for the first comment on every post about a new article to point out the link to the article doesn’t work…

It would appear that when I place a link to an article on oracle-base.com, WordPress tries to strip off the domain to make it a relative reference. I guess that makes sense when linking to other blog posts because it means you can move the blog to a new domain and not break all the links. The problem occurs when the link is to the main website, which is effectively an external site. More often than not, this results in a broken link. When I link to pages not part of oracle-base.com, it work fine.

This only seems to have happened since I moved to WordPress 2.5…

Is anyone else having this problem? 

Cheers

Tim…