scsi_id and UDEV issues (update)…

 

Last month I wrote about a problem I saw with scsi_id and UDEV in OL5.8. As it screwed up all my UDEV rules, it was a pretty important issue for me. It turned out this was due to a mainline security fix (CVE-2011-4127) affecting the latest kernels of both RHEL/OL5 and RHEL/OL6. The comments on the previous post show a couple of workarounds.

Over the weekend I started to update a couple of articles that mentioned UDEV rules (here and here) and noticed the problem had disappeared. I updated two VMs (OL5.8 and OL6.2) with the latest changes, including the UEK updates, and ran the tests again. Here’s what I got.

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
# uname -r
2.6.39-100.6.1.el5uek
# scsi_id -g -u -s /block/sda/sda1
SATA_VBOX_HARDDISK_VB535d493d-7a44eb0f_
#

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
# uname -r
2.6.39-100.6.1.el6uek.x86_64
# /sbin/scsi_id -g -u /dev/sda1
1ATA_VBOX_HARDDISK_VB2b5dc561-4ae6e154
#

So it looked like normal service had been resumed. :) Unfortunately, the MOS Note 1438604.1 associated with this issue is still not public, so I couldn’t tell if this was a unilateral change in UEK, or part of a mainline fix for the previous change.

To check, I fired up a CentOS 6.2 VM with the latest kernel updates, switched an Oracle Linux VM to the latest RHEL-compatible kernel, and ran the test on both. As you can see, neither of them reports the scsi_id for partitions.

# cat /etc/redhat-release
CentOS release 6.2 (Final)
# uname -r
2.6.32-220.13.1.el6.x86_64
# /sbin/scsi_id -g -u /dev/sda1
#

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
# uname -r
2.6.32-220.13.1.el6.x86_64
# /sbin/scsi_id -g -u /dev/sda1
#

It could be the associated fix has not worked through the mainline to RHEL and CentOS yet. I’ll do a bit of digging around to see what is going on here.

Cheers

Tim…

Update: It appears the reversion of this functionality may not be permanent, so I’ve updated my articles to use a “safer” method of referencing the parent (disk) device, rather than the partition device.
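As a rough sketch of what "referencing the parent device" means at the shell level, the helper below strips the trailing partition number from a device name and runs scsi_id against the result. The function names parent_disk and disk_scsi_id are hypothetical, and the sketch assumes simple sdXN-style names and the OL6-style scsi_id syntax shown above.

```shell
# Hypothetical helper: map a partition name to its parent disk
# by removing the trailing digits (sda1 -> sda, /dev/sdb12 -> /dev/sdb).
# Assumes sdXN-style names; other naming schemes are not handled.
parent_disk() {
  printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Hypothetical wrapper: query the SCSI ID of the parent disk instead of
# the partition, using the OL6-style invocation from this post.
disk_scsi_id() {
  /sbin/scsi_id -g -u "$(parent_disk "$1")"
}
```

Calling disk_scsi_id /dev/sda1 would then query /dev/sda, which should return an ID regardless of whether the kernel includes the CVE-2011-4127 change.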

Oracle Linux 5.8 and UDEV issues…

 

I just did an update from Oracle Linux 5.7 to 5.8 on one of my VirtualBox RAC installations and things are not looking too clever at the moment. After a reboot, the ASM instances, and therefore the database instances, wouldn’t restart. A quick look showed the ASM disks were not visible. On this installation I was using UDEV, rather than ASMLib. In checking the UDEV rules I noticed the scsi_id command on OL5.8 doesn’t report an ID for partitions on disks, only the disks themselves. For example, on OL5.7 I get this,

# /sbin/scsi_id -g -u -s /block/sdb/sdb1
SATA_VBOX_HARDDISK_VBd306dbe0-df3367e3_
#

On OL5.8 I get this,

# /sbin/scsi_id -g -u -s /block/sdb/sdb1
#

If I run it against the disk, rather than the partition it works fine.

This has literally just happened, so I’ve done no further investigation, but I thought it was worth putting out there in case anyone was about to start an OS update on something they cared about. :)

At this point I’m not discounting that I’ve screwed up somewhere. My next plan is to install three clean VMs (OL 5.6, 5.7 and 5.8) and check the output of scsi_id on each of them. If that turns out OK, then I’ve screwed something else up and you can probably ignore this post. I might not get to try it out until tomorrow. Either way, I’ll update this post with the results of that test.

Cheers

Tim…

Update 1: It’s definitely changed. See the following.

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
# /sbin/scsi_id -g -u -s /block/sda/sda1
SATA_VBOX_HARDDISK_VB54dff07f-931ce4d7_
#

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.7 (Tikanga)
# /sbin/scsi_id -g -u -s /block/sda/sda1
SATA_VBOX_HARDDISK_VBx180d717-f896e661_
#

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.8 (Tikanga)
# /sbin/scsi_id -g -u -s /block/sda/sda1
#

Update 2: As John Sobecki correctly pointed out in the comments, the title of the post is misleading. UDEV is not at fault here. The problem is the “/sbin/scsi_id” command is behaving differently, which is making my rules useless. The UDEV issue is the symptom, not the cause. The post is clearly focusing on the scsi_id issue, but I’ve picked a pretty bad title to go with it. :)

Update 3: John Sobecki pointed me at “[block] fail SCSI passthrough ioctls on partition devices CVE-2011-4127”, a mainline kernel security fix that seems to be the cause of this. It affects all new kernels which include this change (RHEL5/6, UEK etc). Oracle are testing the impact of this. Initially ASMLib and OCFS seem unaffected.

Update 4: MOS Note 1438604.1 (currently in review) contains more information about this issue. ASMLib and OCFS are unaffected by CVE-2011-4127, so ASMLib should probably be used in preference to UDEV with newer kernels.

Update 5: I’ve altered all the articles on my site to reference the parent (disk) device, rather than the partition device, which makes the UDEV rules work fine again. Thanks to Bryan Wood and Joachim for their suggestions.
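For illustration, a rule that references the parent device might be generated like this. It is only a sketch: the function name make_asm_rule is hypothetical, the OWNER, GROUP and MODE values are assumptions, and it relies on UDEV's $parent substitution together with the OL5-style scsi_id syntax used earlier in this post.

```shell
# Hypothetical helper: print an OL5-style UDEV rule for the first partition
# of a disk, matching on the SCSI ID of the parent disk ($parent) rather
# than the partition itself.
#   $1 = expected scsi_id result for the disk
#   $2 = device name to create
# OWNER, GROUP and MODE are illustrative assumptions.
make_asm_rule() {
  printf 'KERNEL=="sd?1", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s /block/$parent", RESULT=="%s", NAME="%s", OWNER="oracle", GROUP="dba", MODE="0660"\n' "$1" "$2"
}

# Example using the disk ID reported on OL5.7 above.
make_asm_rule "SATA_VBOX_HARDDISK_VBd306dbe0-df3367e3_" "asm-disk1"
```

The output would be appended to whichever rules file you use under /etc/udev/rules.d, followed by a reload of the UDEV rules.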

Fedora 15: First big problem…

 

Yesterday I hit a pretty major problem with Fedora 15. I did a reboot and the login screen came up fine, but when I tried to log in I got a message saying,

failed to load session ‘gnome’

No options or alternatives. Just back to the login screen. ??

I started the machine up in “Full multiuser mode” by hitting the “a” key during boot and adding “3” on to the boot parameters. Once at the login prompt I could now log in as root. Since it looked like it might be a GNOME problem I uninstalled and reinstalled GNOME.

yum -y groupremove "GNOME Desktop Environment"
yum -y groupinstall "GNOME Desktop Environment"

No change!

My next thought was to install KDE, so at least I would have a desktop. I did this using,

yum -y groupinstall kde

I made KDM the default display manager by editing the “/etc/sysconfig/desktop” file to contain the following.

DISPLAYMANAGER=KDE

The machine now rebooted and I got KDM as the display manager. This allowed me to start KDE, but surprisingly, also allowed me to start GNOME as my desktop.

Now I figured it was probably an issue with GDM, not GNOME itself, so I reinstalled GDM.

yum -y remove gdm
yum -y install gdm
yum -y install gdm-plugin-fingerprint

Bingo. I was now able to switch back to GDM as my display manager by editing the “/etc/sysconfig/desktop” file to contain.

DISPLAYMANAGER=GNOME
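If you would rather script the edit than open the file by hand, something like the following sketch would do it. The function name set_display_manager is hypothetical, and it assumes the file uses plain KEY=VALUE lines as shown above.

```shell
# Hypothetical helper: set DISPLAYMANAGER in a sysconfig-style file,
# replacing any existing setting and leaving other lines untouched.
#   $1 = path to the file (e.g. /etc/sysconfig/desktop)
#   $2 = value to set (e.g. KDE or GNOME)
set_display_manager() {
  file=$1
  value=$2
  tmp=$(mktemp) || return 1
  # Keep every existing line except a previous DISPLAYMANAGER setting.
  if [ -f "$file" ]; then
    grep -v '^DISPLAYMANAGER=' "$file" > "$tmp" || true
  fi
  printf 'DISPLAYMANAGER=%s\n' "$value" >> "$tmp"
  # Write back through cat so the original file keeps its ownership and mode.
  cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, set_display_manager /etc/sysconfig/desktop GNOME, run as root, would switch back to GDM on the next reboot.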

I have no idea what happened to cause this problem in the first place. Googling for a solution wasn’t much help because most posts are really old and the new ones just said reinstall.

If anyone else has the misfortune to run into this issue, you now know how I got out of it.

Incidentally, my brief time on KDE did not fill me with a desire to switch. I think I prefer GNOME. I am however a little nervous about the stability of Fedora 15 after this incident. Maybe I did something dumb to cause it, but if I did, I have no idea what it was. I’m just running a browser and VirtualBox VMs for the most part.

Cheers

Tim…

WordPress 2.5 linking problem…

 

I guess anyone following this blog will have noticed the problems I have when trying to link to other pages on this domain. It has been a common occurrence for the first comment on every post about a new article to point out the link to the article doesn’t work…

It would appear that when I place a link to an article on oracle-base.com, WordPress tries to strip off the domain to make it a relative reference. I guess that makes sense when linking to other blog posts because it means you can move the blog to a new domain and not break all the links. The problem occurs when the link is to the main website, which is effectively an external site. More often than not, this results in a broken link. When I link to pages that are not part of oracle-base.com, it works fine.

This only seems to have happened since I moved to WordPress 2.5…

Is anyone else having this problem? 

Cheers

Tim…