Oracle Database Patching (Poll Results Discussed)

Having recently put out a post about database patching, I was interested to know what people out in the world were doing, so I went to Twitter to ask.

As always, the sample size is small and my followers have an Oracle bias, so you can decide how representative you think these numbers are…

Patching Frequency

Here was the first question.

How often do you patch your production Oracle GI/DB installations? (Pick the nearest that applies)

There was a fairly even spread of answers, with about a third of people doing quarterly patching, and a quarter doing six-monthly patching. I feel like both these options are reasonable. About 20% were doing yearly patching, which is starting to sound a little risky to me. The real downer was that over 22% of people never patch their databases. This is interesting when you consider the recent announcement about monthly recommended patches (MRPs).

For those people that never patch, I can think of a few reasons off the top of my head why that might be.

  • Lack of testing resource. I think patch frequency has more to do with testing than any other factor. If you have a lot of databases, the testing resource to get through a patching cycle can be quite considerable. This is why you have to invest some time and money into automated testing.
  • If it ain’t broke, don’t fix it. The problem is, it is broken! How long after your system has been compromised will it be before you notice? How are your customers going to feel when you have a data breach and they find out you haven’t even taken basic steps to protect them? I don’t envy you explaining this…
  • Fear of downtime. I know downtime is a real issue to some companies, but there are several ways to mitigate this, and you have to balance the pros and the cons. I think if most people are honest, they can afford the downtime to patch their systems. They are just using this as an excuse.
  • Patching is risky. I understand that patches can introduce new issues, but that is why there are multiple ways to patch, with some being more conservative from a risk perspective. I think this is just another excuse.
  • Out of support database versions. I think this is a big factor. A lot of people run really old versions of the database that are no longer in support, and are no longer receiving patches. I don’t even think I need to explain why this is a terrible idea. Once again, how are you going to explain this to your customers?
  • Lack of skills. We like to think that every system is looked after by a qualified DBA, but the reality is that is just not true. I get a lot of questions from people who are SQL Server and MySQL DBAs that have been given some Oracle databases to look after, and they freely admit to not having the skills to look after them. Even amongst Oracle DBAs there is a massive variation in skills. Oracle patching has improved over the years, but it is still painful compared to other database engines. Just saying.

Type of Patching

This was the second question.

When patching your production Oracle GI/DB installations, which method do you use?
In-Place = Current ORACLE_HOME
Out-Of-Place = New ORACLE_HOME

This was a fairly even split, with In-Place winning by a small margin. Oracle recommend Out-Of-Place patching, but I think both options are fine if you understand the implications. I discussed these in my previous post.

Conclusion

I think of patch frequency in a similar way to upgrade frequency. If you do it very rarely, it’s really scary, and because nobody remembers what they did last time, there are a bunch of problems that occur, which makes everyone nervous about the next patch/upgrade. There are two ways to respond to this. The first is to delay patching and upgrades as long as possible, which will result in the next big disaster project. The second is to increase your patch/upgrade frequency, so everyone becomes well versed in what they have to do, and it becomes a well-oiled machine. You get good at what you do frequently. As you might expect, I prefer the second option. I’ve fought long and hard to get my company into a quarterly patching schedule, and it will only decrease in frequency over my dead body!

Assuming the results of these polls are representative of the wider community, I feel like Oracle need to sit up and take notice. Patching is better than it was, but “less bad” is not the same as “good”. It is still too complicated, and too prone to introducing new issues IMHO!

Cheers

Tim…

Database Patching : It’s a difficult subject

If you came here hoping I was going to say there are valid reasons not to patch, you are out of luck. There is never a valid reason not to patch…

Instead this post is more about the general approach to patching. I’ve spent 22+ years writing about Oracle, including how to install it, but I’ve written practically nothing about how to patch a database. My stock answer is “read the patch notes”, and to be honest that is probably the best thing anyone can do. Although patching is a lot more standardized these days, it’s still worth reading the patch notes in case something unexpected happens. In this post I just want to talk about a few top-level things…

Patching to a new ORACLE_HOME

There are two big reasons for patching to a new ORACLE_HOME, or out-of-place patching.

  1. You can apply the binary patches to the new home while the database is still running in the old home, so you reduce the total amount of downtime.
  2. You have a natural fallback in the event of wanting to revert the patch. You don’t have to wait for the patch rollback to complete.

There are some downsides though.

  1. It requires extra space to hold both the unpatched and patched homes, until you reach a point where you are happy to remove the unpatched home.
  2. If you have any scripts that reference the ORACLE_HOME, they will need to be updated. Hopefully you’ve centralized this into a single environment setup script, like the example after this list.
  3. I guess it’s a little more complicated, and the patch notes are not that helpful.
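
As an aside, this is the sort of thing I mean by centralising it. One small file that everything else sources, so switching homes is a one-line change. The path and SID here are just examples.

# /home/oracle/scripts/setenv.sh : sourced by all other scripts and cron jobs.
export ORACLE_SID=orcl
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1
export PATH=$ORACLE_HOME/bin:$PATH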

So should you follow the recommendation of patching to a new home or not? The answer as always is “it depends”.

The reduction in downtime for a single instance database is good, but if you are running RAC or Data Guard, this isn’t really an issue as the database remains online for most of the patching anyway. Having a quick fallback is great, but once again if you are running RAC or Data Guard this isn’t a big deal.

If you are running without RAC or Data Guard, you have made a decision that you can tolerate a certain level of downtime, so is taking the system down for an hour every quarter that big a deal? I’ve heard of folks who use RAC and/or Data Guard who still bring the whole system offline to patch, so the decision is probably going to be very different for people, depending on their environment and the constraints they are working with.

I hope you’re taking OS and database backups before patching. If something catastrophic happens, such that a rollback of the patch is not possible, you can recover your original home and database from the backups. Clearly this could take a long time, depending on how your backups are done, but the risk of loss is low. So the question is, can you tolerate the additional downtime?

You have to make a decision on the pros and cons of each approach for you, and of course deal with the consequences. If in doubt, go with the recommendation and patch to a new home.
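
To make that a bit more concrete, here’s a very rough sketch of out-of-place patching a single instance database. All paths, names and patch locations are placeholders, and the installer options have been trimmed right down, so treat it as illustration only and read the patch notes.

export NEW_HOME=/u01/app/oracle/product/19.0.0/dbhome_2

# 1) While the database is still running out of the old home, do a software-only
#    installation into the new home, applying the RU as part of the installation.
cd $NEW_HOME
./runInstaller -applyRU /u01/software/db_ru_patch

# 2) Downtime starts here. Stop the database, repoint the environment
#    (oratab, central environment script) at the new home, then restart it.
export ORACLE_HOME=$NEW_HOME
export PATH=$ORACLE_HOME/bin:$PATH

# 3) Finish the job inside the database.
$ORACLE_HOME/OPatch/datapatch -verbose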

Read-only Oracle homes

Read-only Oracle homes were introduced in 18c (here) as an option, and are the default from Oracle 21c onward. One of the benefits of read-only Oracle homes is they make switching homes so much easier. You haven’t got to worry about copying configuration files between homes, as they are already located outside the home.
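
If you’ve not seen them before, this is a hedged example of flipping a software-only installation over to a read-only home, along with the little utilities that show you where the configuration now lives. It has to be done before any databases are created in that home.

# Enable the read-only Oracle home (the default from 21c onward).
$ORACLE_HOME/bin/roohctl -enable

# The config (dbs, network/admin etc.) now lives under ORACLE_BASE_HOME,
# which these utilities report.
$ORACLE_HOME/bin/orabasehome
$ORACLE_HOME/bin/orabaseconfig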

Release Update (RU) or Release Update Revision (RUR)?

You have a choice between patching using a Release Update (RU), or a Release Update Revision (RUR). To put it simply, a RU contains not only the latest security patches and regression fixes, but may also include additional functionality, so the risk of introducing a new bug is higher. A RUR contains just the security patches and regression fixes. Unlike the Critical Patch Updates (CPUs) of the past, which ran on endlessly, RURs are tied to specific RUs, so you still end up applying the content of the RUs, just at a later date, when hopefully the bugs have been sorted out by the RUR…

The folks at Oracle suggest applying the RUs, which is what I (currently) do. Some in the Oracle community suggest applying RURs is the safer strategy. If you look at the “Known Issues” for each RU, and the list of recommended one-off patches that should be applied after the RU, you can see why some people are nervous of going directly to RUs.

Once again, this comes down to you and your experience of patching with the feature set you use. If you are finding RUs are too problematic, go with the RUR approach. You can always change your mind at any time…

Monthly Recommended Patches (MRPs)

There’s a new kid on the block starting with 19.17 on Linux: monthly recommended patches (MRPs). They replace RURs. There are 6 MRPs per RU, with each MRP containing the RU and the current batch of recommended one-off patches, as documented in MOS Note 555.1.

I’m assuming these are rolling and standby-first patches, but I can’t confirm that yet.

RAC Patching : Rolling Patches

Rolling patches can be applied one node at a time, so there are always database instances running, which means the database remains available for the whole of the patching process.

Release Updates (RUs) and Release Update Revisions (RURs) are always rolling patches, so it makes sense to take advantage of this approach. If you are applying one-off patches, these may not be rolling patches, so always check the patch notes to make sure.

Even when rolling patches are available, you can still make the decision to take the whole system offline to apply the patches. I’m not sure why you would want to do this, but the option is there for you.
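
For context, rolling patching of a RU with OPatchAuto looks something like this sketch. It’s run as the root user, one node at a time, and the grid home and patch paths are placeholders.

# Run on each node in turn, as root, while the other nodes keep servicing work.
export PATH=$PATH:/u01/app/19.0.0/grid/OPatch
opatchauto apply /u01/software/gi_ru_patch

# If you really do want a full outage, there is a non-rolling mode instead.
# opatchauto apply /u01/software/gi_ru_patch -nonrolling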

Data Guard : Standby-First Patches

Release Updates (RUs) and Release Update Revisions (RURs) are always standby-first patches. This gives you some flexibility on how you approach patching your system. Here are two scenarios with a two node Data Guard setup, where node 1 is the primary and node 2 is the standby.

Scenario 1 : Switchovers

  • Patch the node 2 binaries (not datapatch) and bring the standby back into recovery mode.
  • Switchover roles, making node 2 the primary and node 1 the standby.
  • Patch the node 1 binaries (not datapatch) and bring the standby back into recovery mode.
  • Run datapatch against node 2 (the primary database).
  • Optionally switchover roles again, making node 1 the primary database.

Scenario 2 : No switchovers

  • Patch the node 2 binaries (not datapatch), but don’t start the standby.
  • Patch the node 1 binaries (not datapatch) and start the database.
  • Start the standby on node 2.
  • Run datapatch on node 1 (the primary).

Scenario 1 reduces downtime, as the primary is always running while the standby is having the binaries patched. Scenario 2 is simpler, but has a more extensive downtime as the primary is out of action while the binaries are being patched.
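
As a hedged sketch, scenario 1 with a broker configuration boils down to something like this, where “db1” (node 1) starts as the primary and “db2” (node 2) is the standby. The names and connection details are placeholders, and the binary patching steps themselves are omitted.

# Binaries patched on node 2, standby restarted in the patched home, then swap roles.
dgmgrl sys@db1 "switchover to db2"

# Binaries patched on node 1, brought back up as the standby, then complete the
# patch in the data dictionary (run on node 2, the current primary).
$ORACLE_HOME/OPatch/datapatch -verbose

# Optionally swap the roles back again.
dgmgrl sys@db2 "switchover to db1"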

Remember, one-off patches may not be standby-first patches, so you may only have the option of scenario 2 when applying them. You have to read the patch notes.

OJVM Patching : Which approach?

Oracle 21c has simplified the OJVM patching situation. In previous releases the OJVM patches were completely separate. The grid infrastructure (GI) and database patches for 21c include the OJVM patches. For 19c the OJVM patches are still separate.

The separate 19c OJVM patches come with additional restrictions. They are not standby-first patches, and according to the patch notes, they can only be applied as RAC rolling patches if you use out-of-place patching.

Why don’t you write about patching much?

Writing about patching is difficult, because everyone has a unique environment, and their own constraints placed on them by their business. I’ve always avoided writing too much about patching because I know it opens me up to criticism. Whatever you say, someone will always disagree because of their unique situation, or demand yet another patching scenario because of their unique environment. You’re damned if you do, and damned if you don’t.

I’ve recently written a few patching articles for specific scenarios (here). I may add some more, but it’s not going to be a complete list, and don’t expect me to write articles about stuff I don’t use, like Exadata. These are purely meant as inspiration for new people. Ultimately, you need to read the patch notes and decide what is best for you!

Let the cloud do it!

If all this is too much hassle, you do have the option of moving your database to the cloud and letting them worry about patching it. 🙂

Conclusion

Read the patch notes!

Cheers

Tim…

Vagrant : “SSH auth method: private key” – Timed out…

Out of nowhere I recently started to get problems with Vagrant running on a Windows 11 host. The “vagrant up” command would always hang at the “SSH auth method: private key” stage. You can see an example of the output here.

    default: SSH address: 127.0.0.1:2222
    default: SSH username: vagrant
    default: SSH auth method: private key
Timed out while waiting for the machine to boot. This means that
Vagrant was unable to communicate with the guest machine within
the configured ("config.vm.boot_timeout" value) time period.

If you look above, you should be able to see the error(s) that
Vagrant had when attempting to connect to the machine. These errors
are usually good hints as to what may be wrong.

If you're using a custom box, make sure that networking is properly
working and you're able to connect to the machine. It is a common
problem that networking isn't setup properly in these boxes.
Verify that authentication configurations are also setup properly,
as well.

If the box appears to be booting properly, you may want to increase
the timeout ("config.vm.boot_timeout") value.

I didn’t get the same issue on other hosts (Windows 10 and macOS), so I figured it was something specific to Windows 11.

That started me Googling, which turned up a bunch of results, each of which appeared to work for some people, but not all. I’ve just found one that worked for me, so I figured I would write this post to bring together the fixes I tried, whether they worked for me or not. If you are experiencing this issue, maybe one of these will fix things for you.

Kernel Panic – Add more vCPUs

This was my issue, and of course it was the last suggestion I found. Typical! After trying all the proposed fixes below, I eventually started watching the VirtualBox console during boot and noticed a kernel panic was being reported, which was stalling the boot. This seemed to be different to all the other problems people were encountering. When I started to Google for VirtualBox kernel panics, I came across a forum thread where people said they were seeing this with VMs that had a single vCPU. It just so happened my problems were occurring during a RAC build, where the DNS node has a single vCPU. Since you build the DNS node first, I was being blocked. I never thought to try other builds, as the DNS build was the simplest and quickest. 🙂

As soon as I switched the DNS node to use 2 vCPUs the kernel panics stopped and I was able to move forward.
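
For reference, this is the type of change I mean in the Vagrantfile. The provider block mirrors what my builds use, and the memory value is just an example.

config.vm.provider "virtualbox" do |vb|
    vb.cpus   = 2
    vb.memory = 2048
end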

Some references suggest maxing out the video memory also. This can be done by adding the following entry to the Vagrantfile.

vb.customize ["modifyvm", :id, "--vram", "128"]

Using a single vCPU for the DNS node worked fine on Windows 10. My Windows 10 machine has the same version of VirtualBox and Vagrant installed, so it wasn’t anything to do with those software versions directly.

Extend Boot Timeout

The message produced by the timeout suggests increasing it by adding the following to the Vagrantfile.

config.vm.boot_timeout = 600

Needless to say this was the first thing I tried and it didn’t work for me.

BIOS Settings

Some people experienced this symptom when the virtualization features were disabled in their BIOS. I checked my BIOS and all the virtualization features were enabled, so this wasn’t my issue.

Some people updated their BIOS, which fixed the issue for them.

Windows Hypervisor Clash

Some people experienced a similar symptom when they had Hyper-V enabled on their PCs. The recommendations vary, but many people suggest turning off the following features.

  • Hyper-V (this was not present on my PC)
  • Virtual Machine Platform (already disabled for me)
  • Windows Hypervisor Platform (already disabled for me)

I found several threads discussing this, but this is the thread I remember seeing first.

If you want to check this, click the Windows button and type “Turn Windows features on or off” to open the dialog.

VirtualBox and Vagrant Software Versions

Some people experienced a similar symptom after upgrading VirtualBox, Vagrant or both. I had recently upgraded to VirtualBox 6.1.38 and Vagrant 2.3.0, but both had worked fine (I think) since the upgrade. Despite this I tried the following:

  • Removing VirtualBox and Vagrant and installing the previous versions.
  • Removing VirtualBox and Vagrant and installing the previous version of VirtualBox, and the latest version of Vagrant.
  • Removing VirtualBox and Vagrant and installing the latest version of VirtualBox, and the previous version of Vagrant.

None of these made a difference for me, so I concluded it wasn’t related to the version of VirtualBox and Vagrant specifically.

Cable Connected for Network Adapter

For some people their virtual networks were not marked as connected, as shown by checking the network adapters in the VirtualBox settings for the resulting VM. To fix this they added the following to their Vagrantfile.

config.vm.provider "virtualbox" do |vb|
    vb.customize ["modifyvm", :id, "--cableconnected1", "on"]
end

My network adapters were all marked as “Cable Connected”, even without this Vagrantfile setting, so this wasn’t my problem.

SSH Keys

Some people were having general problems using the default insecure keys. In case you didn’t know, Vagrant boxes are built using a known insecure key. Once you create a VM from a box a new secure key is created and the insecure key is removed. It’s just a way to make sure Vagrant can always connect without a password the first time it boots a new VM.

For some people this default action was causing them problems, so they registered their own secure key and booted the boxes directly with this, setting the secure key in the Vagrantfile.
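
If you want to try that, it’s a small Vagrantfile change. This is just a hedged example of the sort of thing people did. The key path is a placeholder, and the box needs the matching public key in place already.

# Stop Vagrant swapping out the insecure key, and supply your own key instead.
# The insecure key is left in the list so the very first boot of a stock box
# can still connect.
config.ssh.insert_key = false
config.ssh.private_key_path = ["~/.ssh/my_vagrant_key", "~/.vagrant.d/insecure_private_key"]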

Conclusion

It’s unfortunate that several situations can result in the same symptom, which led me to chase my tail for several days. Fortunately I had other machines I could use for builds, so I wasn’t totally stumped.

Hopefully you won’t have to, but if you run into this issue, give some of these a try and see if any of them fix things for you.

Cheers

Tim…

VirtualBox 6.1.38

VirtualBox 6.1.38 has been released.

The downloads and changelog are in the usual places.

I’ve installed it on Windows 11 and macOS Big Sur hosts with no dramas. I’ve updated my Oracle Linux 7, 8 and 9 Vagrant boxes.

It’s probably going to take me a few days to run through my builds, but I’m sure they will be fine. 🙂

Cheers

Tim…

I don’t accept guest posts (except this)

I get a lot of requests from people wanting to post their articles on oracle-base.com, and they are always answered with a firm no! One of the things I’ve always said is I like the fact it is one person’s voice. I think that means something…

A few weeks ago someone asked me if I had written anything about Oracle partitioning from a design perspective. Not how to do it, I’ve got loads of those types of articles (here). Since I had nothing suitable, I went to Twitter to ask for advice, and Patrick Wolf pointed me to this article by Lothar Flatz. It’s in German, and the person who asked me the original question was a German speaker, so it was a fortunate turn of events. 🙂

Google Translate did a pretty good job at translating it, which allowed me to read it, but speaking with Lothar we agreed an English translation of the article would be nice. It’s really easy to overlook posts that aren’t written in your native language, even with Google Translate available…

Fast forward a few weeks and Lothar was looking for somewhere to publish his translated post. I suggested a few options he should check out, and in a rather unusual turn of events I also offered to publish it on my website. 🙂

After speaking to some other people he came back to me and here is the first, and probably only, guest post on my website. I hope you like it.

Partitioning From a Design Perspective (A Practical Guide) – By Lothar Flatz

As a general point, I’m a firm believer in owning your content, so I would always suggest people publish on their own blog.

As for my website, my stance is still the same. I don’t accept guest posts (except this). 🙂

Cheers

Tim…

Test Cases Are Important : Again…

Over the weekend I was reminded of the importance of test cases again. I’ve written about this before, with probably the most consistent post here.

If you want my opinions on test cases, go and read that. In this post I want to tell a little story to demonstrate why I think test cases are important. I’m going to keep things a bit vague, because I don’t want to openly criticise the person in question. They actually did an OK job of expressing their issue, but it did highlight some things.

The issue

I got a question that suggested a recent upgrade on Autonomous Database had altered the behaviour of something. Every time you patch or upgrade software there is a possibility of change, whether it is an intentional behaviour change, or a bug. The person had provided some evidence that did seem to suggest there was an issue, so my interest was piqued. Unfortunately there wasn’t a test case, but I have an article that includes a test case that was similar, so I was able to knock something together pretty quickly.

The test case

The first thing I did was try out my test case in an on-prem installation. Yes, I know the potential issue related to autonomous database, but I wanted to see the test case working, and prove to myself that what I believed to be true actually was true. Think of this as an experimental control. The test case ran as expected on-prem, which was good.

I then moved to trying to replicate this issue on autonomous database. Most cloud databases come with some restrictions on what you can do, so my test case setup was not ideal for running on autonomous database. I had to revise the setup a little. APEX to the rescue. Before you ask, yes, I did rerun the on-prem test with the new setup to make sure the control was still valid. 🙂 Having set up the base data, I was able to run the code for my test case, and it ran just the same in autonomous database as it did on-prem.

Test case vs in situ

In the original question, the issue was directed specifically at one feature, but my test case seemed to prove that feature was working as expected. When you are doing scientific experiments you try to reduce the number of variables. Too many variables and you have no idea what caused the result, so you can’t come to any reasonable conclusion. I was trying to prove a feature works as expected, so I reduced the possible variables to the point where I was specifically testing that feature, and it seems to work as expected.

So that’s the end of it right? Well not really. I’ll use an example from biology to explain. Biology is complicated because living things are complicated. When you are doing chemistry, it’s possible to isolate specific compounds and put them together in a controlled manner to observe an interaction. Kind-of like my test case, this is a very controlled approach. Living things have loads of working parts, and you can’t isolate things without killing the organism, so you have to deal with the fact you are working in the middle of a whole bunch of interactions. You still try to minimise your variables, but you have to accept that you can’t always do that to the extent you would like. You may define experimental controls that discount the other possible reasons for the result. This distinction between running things in isolation and in situ is really important. What has this got to do with my test case?

My test case is run in isolation. The original poster clearly has an issue in their system. Perhaps there is something in their system that affects the way this feature works, so although the feature works in isolation, maybe there is an issue in situ. My test case hasn’t resolved the issue. It has just ticked one thing off the list of possible causes.

What next?

Having ticked the base functionality off the list of possible causes, we then have to move up one step higher and incorporate more elements of the system, to see how that affects things. That could be as simple as us using different session parameters, or it could be something more fundamental with the design of their system. Hell, it might be their data is corrupt for all I know (I really hope not).

It’s also possible that when looking at the “next layer”, we notice something that shows the original test case is invalid. That sort of thing happens.

What’s the point of this post?

I often get the impression some people think problem solving is some kind of witchcraft. In reality it is painstaking, meticulous work. I look at all the people I think are good and they have one thing in common. They put in the work and grind through this stuff. Yes, you get quicker the more experienced you get, but you still have to put in the effort. People are often looking for the “magic button” to solve their problem, but there isn’t one. If it were that simple, it would already be built into every piece of software you use. 🙂

You need a test case, even if all it does is prove your initial conclusion was wrong, and allows you to focus your attention elsewhere.

Once again, the question that prompted this post was not bad. The person did an OK job of expressing themselves. This is just a post that was triggered by that interaction. If we get to the bottom of their issue, and it proves to be interesting, I will probably write up something more specific about it. 🙂

You might find it useful to read these, as they are relevant to this post.

Cheers

Tim…

Update: This looks like it is a data/understanding issue. It’s starting to sound like the data isn’t stored in the format the original poster expected, so they are trying to do something with it that is impossible. If this is the case, it’s nothing to do with the upgrade.

User Experience – A Little Rant Again

I had a bit of a negative post yesterday, and it got me thinking of these two posts.

I’ve said some of this stuff before, but I want to bring it all into a slightly different context.

Good user experience is…

Good user experience is not about forcing me to follow your atomic implementation of a feature. What do I mean by this? Let’s take a look at some examples of getting it right (IMHO) from Oracle.

An Oracle REST Data Services (ORDS) web service is made up of a module with one or more templates, each with one or more handlers. We could define our service by defining a module, template and a handler separately, because that’s how the underlying implementation of an ORDS web service works. It’s fine, but it’s a bit over the top if I just want a quick little web service based on a query. That’s why we have been given the DEFINE_SERVICE procedure, allowing us to do all that other stuff in a single call (see here). For simple services this is all you need.
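
To give you a feel for it, here’s a hedged example of the single-call approach. It assumes the schema is already REST enabled, and the module name, paths and query are made up for illustration.

BEGIN
  ORDS.define_service(
    p_module_name => 'emp_api',
    p_base_path   => 'emp/',
    p_pattern     => 'employees/',
    p_method      => 'GET',
    p_source_type => ORDS.source_type_collection_feed,
    p_source      => 'SELECT empno, ename, job FROM emp');

  COMMIT;
END;
/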

The database scheduler is a complex beast. We can define loads of things like schedules, programs, arguments, job classes, windows and of course jobs. That’s fine, but 99% of the time we just want a simple job, and the CREATE_JOB procedure allows us to create one in a single call (see here).
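
Here’s a hedged example of what I mean. The job name, schedule and the procedure being called are made up for illustration.

BEGIN
  DBMS_SCHEDULER.create_job(
    job_name        => 'nightly_cleanup_job',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN my_cleanup_proc; END;',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2;',
    enabled         => TRUE);
END;
/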

In both cases we can choose between doing things the long/verbose way, or use the “cheat code” and do stuff in a single call. This is exactly the sort of thing I like when I’m using a feature. I want to know the flexibility is there if I need it, but if 99% of my requirements don’t, I want the cheat code so I can do what I need to do and move on. This also makes the feature more accessible to new people…

Good user experience is not…

As I mentioned above, good user experience is not about forcing me to follow your atomic implementation of a feature. Someone should take a step back and ask what would “normal” users really like? The answer is probably giving them an option to zone out and get all the prerequisites and config done for them. It’s not making them spend a weekend trying to figure out how to enable a feature, then finding it doesn’t really work properly anyway…

I’m a generalist. I have to work with lots of different products. When I open the docs and I see a list of prerequisites, and then multiple commands to actually set stuff up my heart sinks. I want a “we’ll do everything for you” option. That might sound funny because of my history, and if companies did that it would make my website redundant, but I feel we need to progress. We’ve been doing this nuts & bolts crap for too long. If I can automate it, Oracle can automate it. If Oracle can automate it, why don’t they?

I don’t want to name and shame. I’ve made some positive comments about Oracle in the previous section, but you know there are a whole bunch of Oracle things I could use as examples of what not to do. Oracle aren’t alone here. It applies to lots of other companies too.

But Tim, I want to…

I can already hear people typing their responses about their need to be in control and their obsessive configuration disorder. Shut up. I don’t care. The chances are, if you are reading this post, you are probably one of the people that can cope with all this tech, but there are many people who can’t, or don’t want to.

Won’t someone think of the children customers

I am a customer. My company is a customer. I can think of two things my company refuse to pay for because the functionality in question is unsupportable if I’m not available. Those are features we need, but won’t buy because they are overly complex for normal people to do well.

Now you can argue that cloud services will solve all these issues, but cloud adoption varies between regions, and maybe people will not pick your cloud. My company are a perfect example of that. We’ve consolidated on Azure, and although we don’t run any Oracle databases there yet, if we run Oracle on the cloud, it will probably be on Azure.

If you heard someone say, “I used to get a punch in the face every day, but now it’s only once a week. Things are good!”, you would think they were crazy. Less bad is not the same as good. I often think companies bring out tools and utilities that are “less bad” than what they had before. Not actually “good”. If you have been in the trenches, “less bad” might feel “good”, but it’s not.

I realise this is another rant, but I think it’s a subject that is worth a rant. I use a wide variety of tech from a number of companies, and some of them get on my nerves at times, because it feels like user experience is an afterthought. You can’t expect everyone to no-life the learning curve for your products. I’m just saying how I feel, and I’m pretty sure I’m not alone here!

Cheers

Tim…

PS. I’m playing a bit fast and loose with the term user experience in this post, but hopefully you get what I mean…

DG PDB : Oracle Data Guard for Pluggable Databases in 21c, and why you shouldn’t use it!

Last month you may have noticed the announcement of DG PDB. It’s Data Guard for PDBs, rather than CDBs, introduced in the Oracle 21.7 release update.

How do you use it?

I’ve had a play around with it, which resulted in this article.

I also did a Vagrant build, which includes the build of the servers, the database software installations, database creations and the prerequisites, so you can jump straight to the DG PDB configuration section in the article. You can find that build here.

So that’s the basic how-to covered, and I really do mean “basic”. There is a lot more people might want to do with it, but it’s beyond the scope of my little Vagrant build.

What do I think about it?

Well I guess you know how this is going to go, based on the title of this post. I don’t like it (yet), but I’m going to try and be a bit more constructive than that.

  • It is buggy! : I know 21c is an innovation release, but this is a HA/DR solution, so it needs to be bulletproof, and it’s not. There are a number of issues when you come to use it, which will most likely be fixed in a future release update, or database version, but for now this is a production release and I don’t feel like it is a safe pair of hands for real PDBs. That is a *very* bad look for a product of this type.
  • Is it Data Guard? Really? : Once again, I know this is the first release of this functionality, but there are so many restrictions associated with it that I wonder if it is even deserving of the Data Guard name. I feel like it should have been a little further along the development cycle before it got associated with the name Data Guard. The first time someone has a problem with DG PDB, and they definitely will, they are going to say some choice words about Data Guard. I know this because I was throwing around some expletives when I was having issues with it. That’s not a feeling you want associated with one of your HA/DR products…
  • Is this even scriptable? : The “add pluggable database” step in the DGMGRL utility prompts for a password. Maybe I’ve missed something, but I didn’t see a way to supply this silently. If it needs human interaction it is not finished. If someone can explain to me what I’ve missed, that would be good. If I’m correct and this can’t be done silently, it needs some new arguments. It doesn’t help that it consistently fails the first time you call it, but works the second time. Ouch!
  • Is the standby PDB created or not? : When you run the “add pluggable database” command (and it eventually works) it creates the standby PDB, but there are no datafiles associated with it. You have to copy those across yourself. The default action should be to copy the files across. Oracle could do it quite easily with the DBMS_FILE_TRANSFER package, or some variant of a hot clone. There should still be an option to not do the datafile copy, as some people might want to move the files manually, and that is fine, but to not have a way to include the file copy seems a bit crappy.
  • Ease of use : Oracle 21c introduced the PREPARE FOR DATA GUARD command, which automates a whole bunch of prerequisites for Data Guard setup, which is a really nice touch. Of course DG PDB has many of the same prerequisites, so you can use PREPARE FOR DATA GUARD to get yourself in a good place to start, but I still feel like there are too many moving parts to get going. I really want it to be a single command that takes me from zero to hero. I could say this about many other Oracle features too, but that’s the subject of another blog post.
  • Overall : A few times I got myself into such a mess the only thing I could do was rebuild the whole environment. That’s not a good look for a HA/DR product!

Conclusion

I’m sorry if I’ve pissed off any of the folks that worked on this feature. It wasn’t my intention. I just don’t think this is ready to be included in a production release yet. I’m hoping I can sing the praises of a future release of this functionality!

Cheers

Tim…

PS. I’m reminded of this post about The Definition of Done.

What a “simple” service request looks like on My Oracle Support (MOS)

I had a strop on Twitter again about My Oracle Support, and I just want to document why.

Someone on Twitter pointed out to me that the “active” real-time SQL monitoring reports weren’t working. It’s really easy to demonstrate, so I ran through it, and sure enough it was broken. When you opened the report in a browser, some of the dependent files (Javascript and a Flash movie) were missing, giving a 404 error.

I opened a service request (SR) on My Oracle Support (MOS). The first hurdle was the “Problem Type”. It’s mandatory, and the list of options is crap. Keep in mind that what you pick launches you into a load of automations, so picking the wrong thing is painful, but you typically have to because there is no “none of the above” option to pick… I decided on a performance related option, since the issue was to do with the real-time SQL monitoring feature, even though I knew this was not about the performance of my database…

I gave a link to a working example, and uploaded a generated report and the output from the Chrome developer console, showing all the 404 errors. It was a pretty self-contained thing, so I was sure it would be understood as soon as a human saw it. This is what happened next…

An automated response.

Another automated response.

Another automated response.

Another automated response.

By now you can imagine I’m getting a little annoyed, so this is my response.

This was followed by a repost of my original post. Word for word. Seemed rather strange, but ok…

This was followed by three automated messages saying the same thing.

I was about to go supernova at this point, but I managed not to lose my shit.

Finally we have someone claiming to be a human!

I get asked to look at something that wasn’t directly related to the point of my SR.

I can feel the rage building.

The following day the person actually runs my report, which was uploaded in my initial SR opener, and it works for them. They update the SR. Once I see the update I try it again, and sure enough it works for me now too. At some point between me opening the SR and this last interaction the “missing” files on http://download.oracle.com/otn_software/ have become available again. I tell them to close the SR.

Thoughts

Remember, this is about as simple an SR as you can get. I posted a test case to demonstrate it. It was really self-contained and simple. I don’t want to think what would happen with a “difficult” issue.

The “Problem Type” selection during SR creation is a problem. I understand MOS want to use it to help automate the gathering of information, but it needs a “none of the above” option, or you are *forced* to pick something you know is wrong.

The automated messages are terrible. I ranted about this before, which resulted in this post.

MOS Auto Responses : What’s my problem with them?

After that post I got a message from Oracle wanting to talk about it. That resulted in this post.

My Oracle Support (MOS) : Where do we go from here?

The messages that came out of that meeting were really positive, but it’s over 2.5 years later and if anything it’s got worse…

Someone from Oracle Support has reached out to me, so I have a call with them tomorrow. I’m not sure this call will result in any positive change. It feels a bit like groundhog day…

Overall, My Oracle Support (MOS) is a terrible user experience, and Oracle should not be charging this much money for this terrible service.

Cheers

Tim…

PS. It’s 2022. Why can’t people call me “Tim” rather than “Timothy”?

PPS. Please Oracle, stop breaking URLs…

PPPS. Please read the follow-up post here.

DBA and PL/SQL Development Tools (Poll Results Discussed)

I’ve been thinking about my DBA and PL/SQL tool choices recently, so I thought I would go out to Twitter and ask the masses what they are using.

As always, the sample size is small and my followers have an Oracle bias, so you can decide how representative you think these numbers are…

Here was the first question.

What tool do you do *most* of your Oracle DBA work with?

I expected SQL*Plus and SQLcl to be the winner here, and I was right. A lot of DBAs are still “old school” where administration is concerned. It may be tough for a beginner to use these command line tools, but over time you build up a list of scripts that mean it is much quicker than using GUI tools for most jobs.

SQL Developer had a pretty good showing at nearly 28%. I’m glad people are finding value in the DBA side of SQL Developer. TOAD/other were not doing so well. I know there are a lot of companies out there trying to make money with DBA tools, but maybe this is a tough market for them. Of course there are cross platform tools that may do well with other engines, even though they don’t register so well with the Oracle crowd.

I guess the real surprise was less than 8% using EM Cloud Control. Having said that, I’m considering ditching it myself. I like the performance pages and we use it as a centralized scheduler for backups, but I’m not sure our usage justifies the crazy bloat that is Cloud Control. It would be nice to remove all those agents and clean up! This figure of less than 8% is all the more surprising when you consider it is free (no cost option). Of course total cost of ownership is not just about the price tag…

This was the next question.

What tool do you do most of your PL/SQL development with?

I was expecting SQL Developer to do well here, but I was surprised by how low TOAD was in the list. I’ve worked at a few companies over the years where TOAD was a staple. I guess the consistent improvements to SQL Developer and a price tag of “free” have broken the TOAD stranglehold.

There were a few comments about Allround Automations PL/SQL Developer, which I used in one company many years ago. If I could have added an extra line in the poll, I would have put that as an option, because I know it is still popular. There were also mentions of DataGrip and a number of people using VS Code with assorted extensions, including Oracle Developer Tools for VS Code.

Sadly, but understandably, SQL*Plus and SQLcl were low on this list. I’m an old timer, so I’ve had jobs where this was the only option. At one job I wrote my own editor in Visual Basic, then rewrote it in Java. Once SQL Developer (known as Raptor at the time) was released I stopped working on my editor…

When you’re doing “proper” PL/SQL development, it’s hard not to use an IDE. They just come with so much cool stuff to make you more productive. These days I tend to mostly write little utilities, or support other coders, so I find myself writing scripts in UltraEdit and compiling them in SQLcl. If I went back to hard core PL/SQL development, I would use an IDE though…

For fun I ended with this question.

SQL Developer and TOAD have a fight to the death. Who wins?

SQL Developer won, but it came out with a detached retina and some broken ribs!

Remember, you are most productive using the tools that suit your working style, but you should always keep your eyes open for better ways of working. Choice is a wonderful thing!

Cheers

Tim…