3 become 2, RAC nodes that is…
This weekend our production system was switched from a 3-node to a 2-node RAC.

We were originally using a 2-node RAC (2 CPUs per node) and we added a third node because the system was struggling to cope with the workload. The third node helped us out in some ways, but it caused a lot of trouble in others. Ever since its inclusion it became impossible to take one node out of the RAC without bringing the lot crashing down. So much for high availability. In addition, a substantial proportion (about 30%) of the wait events on the system were due to inter-node communication. I expected more inter-node communication with more nodes, but this seems a bit excessive. Heaven only knows what would happen in a 4-node cluster…

After a lot of banter with Oracle and HP we’ve finally decided to try a 2-node RAC again, but this time with 3 CPUs per node. OK, it’s actually 4 CPUs per node, but one CPU in each node is permanently offlined, so as not to affect our current Oracle licensing.

All the hardware modifications are complete and all tests indicate that the system is up and running normally. Of course the true test will happen tomorrow morning when the users log in and start to break things 🙂

The best news of all is that the move back to a 2-node cluster means that we can once again shut down one node at a time if we need to do maintenance. This is a big plus.

If everything goes quiet over the next few days it means that I’m fire-fighting and the switchover didn’t go well.

I’d be curious to see how many people out there are using RAC on more than 2 nodes. I’ve only done this on Tru64 with 10g Release 1, but I can say without a shadow of a doubt that it doesn’t work properly. I’m curious whether this is a Tru64-specific problem or a fundamental flaw in RAC for clusters with more than 2 nodes.

Cheers

Tim…

Author: Tim...

DBA, Developer, Author, Trainer.

4 thoughts on “3 become 2, RAC nodes that is…”

  1. Tim,

    K Gopalakrishnan recently posted on a thread (oracle-l) that at most 3 parties are ever involved. I am quoting him here …

    Raj:

    There won’t technically be any difference from 3 nodes to 64 nodes, as in RAC a maximum of 3 parties are involved in ANY resource management
    (master-holder-requester), be it 3 nodes or 64 nodes or 128 nodes. If the application scales for 3 nodes, it will definitely scale for 64 nodes. I don’t understand why people have the wrong opinion about additional numbers of nodes. I have a customer running 8 nodes without any performance penalties and not many gc waits either. The application
    is NOT partitioned, btw.

    Best Regards,
    K Gopalakrishnan

    So could it be Tru64 specific?
    Raj

  2. I believe that a two-node cluster is a special case. With only two nodes some optimizations can be made that reduce the load on both servers. Put simply, my understanding is that in a two-node cluster each node knows there is only ever one other place it will have to look for data if it cannot find it in its own caches. Three nodes is the first case where things get more complicated, hence a longer code path and more communication between nodes.

  3. It may well be that 3-node is worse than 2-node, but beyond 3-node the impact is not as great with each subsequent node…

    Of course, this doesn’t detract from the fact that with three nodes we can’t take out a single node without the cluster going nuts. I guess that must be a Tru64 thing 🙂

    Cheers

    Tim…

Comments are closed.