2 become 0, occasionally…

It’s been an “interesting” couple of days.

Since we moved to the new configuration we’ve had some intermittent server crashes, including one last night where both nodes crashed simultaneously. We’ve applied an OS patch that is meant to solve the problem, but we’re still in the “fingers crossed” stage at the moment. It’s meant some busy nights and annoyed users, but we’re moving forward.

You’ve gotta love it…. Not!

Cheers

Tim…

3 become 2, RAC nodes that is…

This weekend our production system was switched from a 3-node to a 2-node RAC.

We were originally using a 2-node RAC (2 CPUs per node) and we added a third node because the system was struggling to cope with the workload. The third node helped us out in some ways, but it caused a lot of trouble in others. Ever since its inclusion it has been impossible to take one node out of the RAC without bringing the lot crashing down. So much for high availability. In addition, a substantial proportion (about 30%) of the wait states on the system were due to inter-node communication. Now I expected that with more nodes there would be more inter-node communication, but this seems a bit excessive. Heaven only knows what would happen in a 4-node cluster…
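The post doesn’t say how the 30% figure was measured. As a rough sketch, assuming a 10g database where wait events are grouped into wait classes, a query along these lines against GV$SYSTEM_EVENT gives a comparable breakdown since instance startup. The “Cluster” wait class is the one that covers inter-node (global cache) waits; the column aliases are made up for illustration.

```sql
-- Share of non-idle wait time per wait class, summed across all RAC instances.
-- TIME_WAITED is recorded in centiseconds, hence the division by 100.
-- Figures are cumulative since each instance started, not for a specific interval.
SELECT wait_class,
       ROUND(SUM(time_waited) / 100) AS seconds_waited,
       ROUND(100 * RATIO_TO_REPORT(SUM(time_waited)) OVER (), 1) AS pct_of_waits
FROM   gv$system_event
WHERE  wait_class <> 'Idle'
GROUP  BY wait_class
ORDER  BY seconds_waited DESC;
```

A high percentage against the “Cluster” class would be consistent with the inter-node overhead described above, although CPU time is not included in these figures.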

After a lot of banter with Oracle and HP we’ve finally decided to try a 2-node RAC again, but this time with 3 CPUs per node. OK, it’s actually 4 CPUs per node, but one CPU in each node is permanently offlined, so as not to affect our current Oracle licensing.

All the hardware modifications are complete and all tests indicate that the system is up and running normally. Of course the true test will happen tomorrow morning when the users log in and start to break things 🙂

The best news of all is that the move back to a 2-node cluster means that we can once again shut down one node at a time if we need to do maintenance. This is a big plus.

If everything goes quiet over the next few days it means that I’m fire-fighting and the switchover didn’t go well.

I’d be curious to see how many people out there are using RAC on more than 2 nodes. I’ve only done this on Tru64 with 10g Release 1, but I can say without a shadow of a doubt that it doesn’t work properly. I’m curious if this is a Tru64-specific problem or if there is a fundamental flaw in RAC for clusters with more than 2 nodes.

Cheers

Tim…

The ever changing Oracle pricing model…

You’ve got to take your hat off to Oracle. Just when you thought they couldn’t make their pricing model any more ridiculous they come up with this gem. I guess we should be grateful that they’ve conceded this much. Prior to this announcement a single dual core processor was charged the same as two individual processors, even though it didn’t have the same performance.

Of course nobody pays list price for Oracle, but it is used to calculate the support and updates costs, so it’s in their interest to keep it artificially high.

I always find the TPC pricings a laugh. Oracle sounds real cheap until you realize they’ve only included a 3 year license. I’ve worked with Oracle products for over 10 years and I’ve yet to work for a company that has bought Oracle this way. They have all bought perpetual licenses for some serious wonga! Maybe we’re living in the dark ages in the UK 🙂

We’ve got some 3rd party applications running on MySQL and they work really well. Nice and cheap too. Well actually it’s free. We’ve got a few 3rd party applications running against SQL Server and they do the job nicely too. We’re looking to switch another project from Oracle to SQL Server part way through the implementation after some confusion over the Oracle licensing costs. It doesn’t take many incidents like this within a company before some momentum builds up and people start opting for cheaper alternatives.

Whenever anyone mentions licensing costs to me I hide under the desk. I’m a grunt, not an accountant!

Cheers

Tim…

Life is funny!

My osteopath just told me two funny stories today.

Fireworks
He treated a guy this week who had been drafted in as part of a team to organize a large firework display in Birmingham. The fireworks were all stored safely, but the guys took their homemade detonators back to the hotel with them. A cleaner walked in, saw all the gear, flipped and called the police. Next thing you know the police storm in, arrest the guys and evacuate the center of Birmingham. The papers claim local businesses (mostly pubs) lost more than 1 million pounds of business as a result of the evacuation. If this sounds familiar see Welcome to my world!

You can imagine the scene down at the police station.

Firework guy: We’re not linked to Al Qaeda! We’re just preparing for Fireworks Fantasia!
Police: Now listen here sonny Jim, we weren’t born yesterday…

I know it’s not a funny subject, but it made me laugh.

Penguin
One of his patients took their 20 year old Down’s Syndrome son to a local zoo & theme park recently. Whilst there he repeatedly asked to go on the log flume ride, which they refused because they didn’t want him to get wet. At one point he went off on his own and when he returned he was soaked through. Naturally they assumed he had gone on the log flume ride, which he denied. When they got home the boy went to his room leaving his mother to unpack the bags. When she opened her son’s rucksack she found a penguin staring at her!

She rang the zoo, who said, “We’ll check if it’s one of ours!”, to which she replied, “Where else would he get a penguin from in Birmingham?”. Anyway, they fed the penguin pilchards until a man from the zoo came to pick it up.

I can only assume that the guy has seriously quick reflexes or it was one tame penguin!

I cannot guarantee that either of these stories is true, but they had me in stitches between blood-curdling crunches of my spine!

Cheers

Tim…

PS. My back feels good now! I reckon I’ll be OK to go to Karate tomorrow 🙂

Oracle Support Sucks…. Again.

Once again Oracle provides a less than perfect service on the support front. Let’s take a look at my latest encounter.

20-JUN-2005 – I raised an iTAR because the CC and BCC lists of the UTL_MAIL.SEND procedure don’t work. Emails are sent properly for people listed in the RECIPIENTS list, but mail to the CC and BCC lists never gets sent. I also sent this example code:

BEGIN
  UTL_MAIL.send(sender     => 'me@mycompany.com',
                recipients => 'person1@yourcompany.com',
                cc         => 'person2@yourcompany.com',
                bcc        => 'person3@yourcompany.com',
                subject    => 'UTL_MAIL Test',
                message    => 'If you get this message it worked!');
END;
/

Assuming these were real email addresses person1 would receive a mail, while person2 and person3 would not. I was able to repeat this issue on Tru64 and Windows.

Within a couple of hours support requested an OWC (Oracle Web Conferencing) session to investigate further. Unfortunately I never received the email of the iTAR update so I didn’t reply.

13-JUL-2005 – The iTAR gets updated asking me for an OWC session again. This time I get an email so I respond saying I don’t think an OWC session is necessary as there is nothing to show. The sample code says it all. In this case the OWC session seemed like a complete waste of both our time.

14-JUL-2005 – The iTAR is updated requesting an OWC session again. I say OK and connect. During the 10 minute conference (accompanied by a phone call) my only input was to show the sample code, which was already in the iTAR. First I’m told the issue can’t be progressed as I’m not on the latest patch, to which I reply that 10.1.0.3.0 is the latest patch for Tru64. After that the support guy searches and finds a generic bug listed as being fixed in 10.1.0.4.0. If this bug had been public I would have found it and not raised the iTAR in the first place.

The iTAR is now closed.

Now I realize that the majority of the time wasted here is down to me waiting for an email that never came, rather than checking the iTAR status directly. Obviously, if this had been an important issue for me I would not have let it drag on so long, but the whole process took nearly 4 weeks to inform me that my problem was an existing bug. I think that’s pretty shocking, especially since the bug was found using the information from the original iTAR, not the subsequent OWC session.

Conclusion – I assume the support people work to quotas. By replying to ask a follow-up question or request an OWC session they can tick a box to say they’ve responded. I’m sure the statistics relating to response times at Oracle support make very impressive reading, but I believe the truth is very different.

I don’t have a problem with the support people themselves. Some are great and some are not. I just think the support process sucks! We pay a ridiculous amount of money for what I can only describe as a crappy service. These days I raise iTARs in an attempt to improve the product/documentation, not because I expect to get an answer. I’m more likely to do that by visiting a free forum or searching on Google.

I suppose I should be grateful. At least you get an answer to DB support requests. That’s more than can be said for AS10g support requests! We close those out of boredom 🙂

Cheers

Tim…

New Article – Partitioning an Existing Table

In a recent forum thread someone asked me to outline a method for Partitioning an Existing Table using the DBMS_REDEFINITION package. I figured this might be useful to other people so I wrote it up as an article.
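For readers who haven’t seen the package before, here is a rough sketch of the general shape of an online redefinition with DBMS_REDEFINITION. It is not copied from the article. The schema (SCOTT) and table names (BIG_TABLE, BIG_TABLE2) are made up for illustration, and it assumes BIG_TABLE has a primary key and that BIG_TABLE2 has already been created as an empty partitioned table with the same columns.

```sql
-- Hypothetical names: schema SCOTT, original table BIG_TABLE,
-- pre-created partitioned interim table BIG_TABLE2.

-- 1. Check the table can be redefined online (raises an error if it can't).
BEGIN
  DBMS_REDEFINITION.can_redef_table(
    uname => 'SCOTT',
    tname => 'BIG_TABLE');
END;
/

-- 2. Start the redefinition. Existing rows are copied into the interim table.
BEGIN
  DBMS_REDEFINITION.start_redef_table(
    uname      => 'SCOTT',
    orig_table => 'BIG_TABLE',
    int_table  => 'BIG_TABLE2');
END;
/

-- 3. Create any constraints, indexes and grants on the interim table here
--    (not shown), then resynchronise to pick up changes made during the copy.
BEGIN
  DBMS_REDEFINITION.sync_interim_table(
    uname      => 'SCOTT',
    orig_table => 'BIG_TABLE',
    int_table  => 'BIG_TABLE2');
END;
/

-- 4. Complete the redefinition. The two tables swap names, so BIG_TABLE
--    is now the partitioned table.
BEGIN
  DBMS_REDEFINITION.finish_redef_table(
    uname      => 'SCOTT',
    orig_table => 'BIG_TABLE',
    int_table  => 'BIG_TABLE2');
END;
/

-- 5. Once you are happy with the result, the old unpartitioned data
--    (now named BIG_TABLE2) can be dropped manually.
```

If something goes wrong between steps 2 and 4, DBMS_REDEFINITION.abort_redef_table can be called with the same three parameters to back out of the redefinition.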

I always approach partitioning with caution. Both the article and the forum thread warn against partitioning for the sake of it.

Cheers

Tim…

Oracle 10g Application Server, what’s the deal?

I’m beginning to dislike Oracle 10g Application Server. That’s my polite and understated way of saying I loathe, detest and hate it!

Before I move on I want to make it clear I’m a major fan of Oracle databases. I think Oracle consistently hit the nail on the head with respect to new database releases. Yes, they have a habit of adding chaff and bloat, but the core functionality is on the money every time.

I’ve been using Oracle’s application servers for a little over 2 years. My first experience was with 9iAS and if I ever see an installation of that again I will probably go on a killing spree. It’s like Oracle took a bunch of cool software, cobbled it together and made it totally unusable. If people ask me what 9iAS is like my immediate response is, “It’s an abortion!”.

When AS10g was released we moved to it right away. We had no choice, 9iAS didn’t work. For some months I basked in the glow of its brilliance, but little did I know the horrors that were waiting round the corner. Rather than list what’s wrong with AS10g let’s look at it from another angle, let’s list what we want from an application server:

  • Reliability.
  • Speedy deployment of new applications.
  • Easy configuration.
  • High availability.
  • Simple problem diagnostics.
  • Simple performance monitoring.

The problem is AS10g gives me none of these. Let’s take these points one by one.

Reliability – We have logged untold numbers of bugs against AS10g, most of which have never been fixed to our satisfaction.

Speedy deployment of new applications – Our applications are pretty small and not exactly rocket science, but deployments to our 5 node application server cluster can take hours. You think I’m joking, don’t you? I’m not! It’s not unheard of for us to lose our entire production system for a couple of hours during a deployment. Invariably a couple of nodes don’t deploy properly. By the time we’ve undeployed and redeployed the application, along with a few reboots, the users have packed up and gone home.

Easy configuration – Ok, it’s not the worst thing in the world, but there are so many products and layers to deal with that it becomes a nightmare if you want to do anything but the simplest application. I’ve just checked with one of my production app servers and it has 296 distinct log files. When someone asks me, “Are there any errors in the logs?”, it always brings a smile to my face.

High availability – I’ve already told you what happens when we deploy new applications! We have a 5 node cluster to make our application more resilient and maintain availability. Pity we have to reboot before and after every application deployment. Until recently we were rebooting each app server once a day, but we’ve managed to get that down to once a week, provided we’re not deploying new versions of the application.

Simple problem diagnostics – Too many log files. Too many layers. We were hoping that grid control would come to our rescue, but it doesn’t work properly. I don’t even want to go there. You can read my earlier posts about that crap.

Simple performance monitoring – See previous answer. We’ve ended up writing some of our own tools. Sad I know!

I’m starting to depress myself so I’m going to knock this post on the head soon, but suffice to say, if I had my way we would ditch the lot and use Apache and PHP. No overcomplicated application servers and no J2EE. Simple, reliable and free!

I guess I can dream…

Cheers

Tim…

PS. For those of you that are assuming we’re just using it wrong, the consultants we’ve had in from HP and Oracle can’t make it work any better, so I guess we’re in good company 🙂

Welcome to my world!

I went to watch “War of the Worlds” at the cinema on Saturday. Visually fantastic, but a rather so-so film in all, and Tom Cruise is seriously lightweight in it. When I came out I found the roads blocked by police and lots of people walking away from the city center. It turns out there was a bomb scare in the center of Birmingham so they were evacuating everyone. I guess the events in London last week have made everyone a little jumpy. Straight out of a disaster movie and into a real life disaster…. almost.

So that’s the state of play in my world, but let’s not forget the really important news, Harry Potter’s ‘Half Blood Prince’ Leaked.

Now consider the following words carefully. IT’S A CHILDREN’S BOOK!!!!!

Who gives a monkey’s if a few copies are sold early! If this were a piece of software it would have been released two years ago and bug-fixed to its current level with several service packs.

Every now and then events happen that bring a certain clarity with them. All I can think is that if the early release of a kids book is that important, we are doomed.

Cheers

Tim…

PS. The rather grumpy tone of this post is in part due to a lack of sleep caused by a dodgy back. I’m going to hit the pain killers later, which should improve my mood 🙂

My Back Hurts!

OK. The pattern usually goes something like this:

  1. I injure my back doing something stupid, like Karate.
  2. The osteopath cracks it back into position.
  3. Yoga keeps my back flexible, strong and happy.
  4. Everything goes well for a few months.
  5. Goto 1.

Wow. That’s the first time I’ve used a goto statement since I was a kid programming on a ZX81 🙂

Anyway, on Friday one of the girls at Karate asked me if my back was OK. She said I wasn’t moving well and thought my back might have gone out again. Everything felt good so I thought nothing more of it.

Next day, bang. My back felt nasty. Added to that, I had promised to teach two Yoga classes for a friend who was away for his wedding anniversary. I managed to muddle through the classes OK, but my sciatic nerve was firing like crazy making my left leg freak out.

I’ve just got to survive a day of playing with my nephews today before a hasty trip to the osteopath so I can start the cycle again.

You gotta laugh…

Cheers

Tim…

More 10g R2 Install Guides

I’ve finished another couple of 10g Release 2 installation guides:

Oracle Database 10g Release 2 (10.2.0.1) Installation On Fedora Core 3
Oracle Database 10g Release 2 (10.2.0.1) Installation On RedHat Advanced Server 3.0

The installation on RHEL 3.0 is only supported on Update 3 and upwards, but it does install OK on the initial OS release.

Cheers

Tim…