Log4j Vulnerabilities : My Random Thoughts

For technical information, keep checking the Apache Log4j Security Vulnerabilities page for updates.

Someone on Twitter asked me to write something about the Log4j issue and my response was I’m not really qualified to do that. After some consideration I thought maybe my uneducated opinion would be useful to others, so here goes…

Basic Context

This is a variation of something I wrote on an internal P1 incident, to give people some context. Remember, there are a range of people reading this P1, so it was written to be understandable to a wide audience.

  • Log4j is an open source library used for logging in many Java applications. If you are not using Java apps, you are not using Log4j, so you are safe. If you are using Java apps, the vendor may not have used Log4j to do logging. This is why it is important to scan servers and check with the vendor to see if their software is vulnerable or not.
  • This is not an issue with Apache HTTP Server. Apache is a software foundation, which manages many commonly used open source software products, including the Apache HTTP Server. When you see “Apache Log4j”, the word “Apache” is a reference to the software foundation, not the HTTP server. As a result, it’s not safe to assume that if you don’t use Apache HTTP Server you are safe.
  • Client applications running on your PC are low risk compared to server applications. Most of the attacks are based around sending requests containing dodgy payloads to application servers. Your local PC applications don’t handle such requests, so are extremely unlikely to be affected. They should still be patched as soon as patches are available, but you don’t need to obsess about them.

Here’s a quick summary:

  • Not a Java application. Don’t worry.
  • Java application that doesn’t use Log4j. Don’t worry.
  • Java application that uses Log4j 1.x. Don’t worry about these vulnerabilities. Of course, older code may be susceptible to other vulnerabilities.
  • Java application that uses Log4j 2.x with Java 8 (or later). Upgrade to Log4j release 2.17.1*.
  • Java application that uses Log4j 2.x with Java 7. Upgrade to Log4j release 2.12.4*.
  • If upgrading Log4j is not an immediate option (maybe you are waiting for a vendor to release a patch), consider mitigations until upgrades are possible.

* These versions were correct at the time of writing. Keep checking the Apache Log4j Security Vulnerabilities page for updates.

Mitigations are not Solutions

Upgrading Log4j is the only way to be sure.

  • Java 8 (or later) users should upgrade to Log4j release 2.17.1.
  • Java 7 users should upgrade to Log4j release 2.12.4.

In the early days of the vulnerabilities, most people focused on mitigations. Probably the most common was to add this JVM parameter.

-Dlog4j2.formatMsgNoLookups=true

Or to set this environment variable.

LOG4J_FORMAT_MSG_NO_LOOKUPS=true

These worked for the initial vulnerability, but don’t stop all attacks. They are listed as “discredited” on the Apache Log4j Security Vulnerabilities page. It’s still worth doing this while you wait for patches from vendors, but this only limits your exposure. It’s not a complete fix. Do not do this and assume it’s game over!
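If you do apply them while waiting for patches, the flag and environment variable need to be set wherever the application's JVM is launched. Here's a minimal sketch for a Tomcat instance, assuming a standard $CATALINA_HOME layout; your own startup scripts will differ.

```shell
# Hypothetical example: $CATALINA_HOME/bin/setenv.sh for a Tomcat instance.
# Both the system property and the environment variable are shown; either works
# for the initial vulnerability, but remember this is a mitigation, not a fix.
export CATALINA_OPTS="$CATALINA_OPTS -Dlog4j2.formatMsgNoLookups=true"
export LOG4J_FORMAT_MSG_NO_LOOKUPS=true
```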

Another option was to remove the JndiLookup.class completely. This is still listed as a valid mitigation if you are not able to upgrade. It may seem scary, but if a vendor patch is not forthcoming, you need to weigh up the risks.

zip -q -d log4j-core-*.jar org/apache/logging/log4j/core/lookup/JndiLookup.class
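After doing that to each affected JAR, it's worth confirming the class is actually gone. Here's a rough check, assuming the patched JARs are in the current directory.

```shell
# Sketch: list each log4j-core JAR and flag any that still contain JndiLookup.class.
for jar in log4j-core-*.jar; do
  [ -f "$jar" ] || continue   # skip if the glob matched nothing
  if unzip -l "$jar" | grep -q "JndiLookup.class"; then
    echo "STILL PRESENT: $jar"
  else
    echo "REMOVED: $jar"
  fi
done
```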

In addition to the direct mitigations, you also need to consider the bigger picture. Applications that are available to the outside world are clearly at enormous risk. If applications are only available inside your company network, then the risk is reduced. I’m not saying ignore internal applications, but prioritise the higher risk systems first maybe?

Which of my systems are at risk?

If you work in a mixed shop with lots of 3rd party products, that is not necessarily an easy question to answer. You can’t just search the file system for *log4j* and think that’s good enough. The log4j libraries are often deployed inside other JAR files or zip files.
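To give a feel for why a filename search isn't enough, here's a crude manual check that looks inside archives for log4j-core entries. The path /u01 is just an example, and note that proper scanners also handle JARs nested inside other JARs, which this sketch doesn't.

```shell
# Crude sketch: look inside top-level JAR/WAR/EAR files for log4j-core entries.
# A plain "find -name '*log4j*'" would miss all of these.
find /u01 -type f \( -name "*.jar" -o -name "*.war" -o -name "*.ear" \) 2>/dev/null |
while IFS= read -r archive; do
  if unzip -l "$archive" 2>/dev/null | grep -qi "log4j-core"; then
    echo "Possible log4j: $archive"
  fi
done
```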

I’ve been using the log4j-detector tool by MergeBase to scan servers. They’ve been releasing new versions regularly since this issue started. It seems to work well, but I’m sure other tools are available.

On Linux servers all my software is under “/u01”, so I scan like this.

java -jar /tmp/log4j-detector-2021.12.22.jar /u01

I don’t have many Windows servers, but here’s an example of a command I used to scan an Artifactory server. Notice I’m not using a standalone Java installation on this server, but the one shipped as part of Artifactory.

"E:\jfrog\artifactory\app\third-party\java\bin\java.exe" -jar log4j-detector-2021.12.22.jar E:\jfrog

I would suggest you scan systems, even if your vendor says they are safe. You never know what additional software has been installed by someone.

Log4j Developers

I’ve read a number of comments where people have criticised the Log4j developers, and for the most part I think they are totally out of order. The vast majority of companies have no engagement with open source software. They don’t commit code, and they don’t donate money to the projects they rely on. If you are just taking without giving anything back, I feel like you are not in a position to complain.

I understand some of the criticisms from a technical perspective, but hindsight is a wonderful thing. You could have spent time looking through the source code and highlighted stuff you didn’t like, but you didn’t. You could have got involved, but you didn’t.

I suspect there are developers of other common libraries checking their code for “exotic” features that need to be turned off by default…

Open Source

I’ve seen some people using the recent Log4j issues as a way to attack open source on a more general level. If people couldn’t see the source, they wouldn’t find the exploits right? I don’t buy this. Security through obscurity is not security. I wonder how much longer these vulnerabilities would have existed if the source code was not freely available?

Vendor Reactions

The reaction of vendors has been really interesting. Some have been really quick to react. Some, not so much. At times like this it’s really important that vendors release a statement as soon as possible, even if that is a message that says they are aware of the issue and are investigating. Your typical “watch this space” message. If your vendors were slow to react, or didn’t react at all, then I think you need to question whether you should be working with their products.

This is true for vendors of products that don’t even use Java. In addition to scanning, we have been compiling statements from vendors regardless of their technology stack. For a vulnerability this high profile, I think it’s important all vendors release a statement. It may sound ridiculous to you, but not every person involved in the process has a grasp of what technology stack is used by each product. If a vendor provides a clear statement, then it makes life a lot easier.

Oracle Advisory

The Oracle advisory came out pretty quickly, and has been updated frequently over the last week as more patches have been released. Keep an eye on it over the next few days, as I expect some existing patches will be reissued with Log4j 2.17.

https://www.oracle.com/security-alerts/alert-cve-2021-44228.html

You still need to use your brain when determining the risk. The Oracle database is marked as not vulnerable, but some items shipped with the database use vulnerable Log4j versions. For example, SQL Developer is shipped with the database, but this is a client tool. It is not receiving requests from users, so it’s not a threat. There are patched versions of it available, but do you care? In a similar manner, Enterprise Manager is vulnerable, and you should patch it, but it shouldn’t be accessible publicly, so the threat is lower. The chances are only your DBAs have firewall access to this server, so it represents a smaller threat than a public-facing application server.

Conclusion

It has been a shit show, and there is little sign of it calming down much before Christmas, but you have to use this as a learning experience. Please apply patches as soon as they are available. If your vendor is slow off the mark, apply the mitigations while you wait.

As I’ve said, I’m not an expert, just someone trying to cope with these issues. If you see anything you think is factually incorrect, please tell me so I can correct it.

Cheers

Tim…

Windows 11 : My first few days…

I’ve been using Windows 11 for a few days now and I thought I would give my impressions so far.

Installation

I picked the upgrade from my Windows Update screen and it just worked. I didn’t have any dramas from the upgrade. After the upgrade I had two or three rounds of Windows Updates that needed reboots, but I kind-of expected that.

I’m sure people with older kit will have some different experiences, but on this Dell XPS 15″ with an i9 (6 cores), 32G RAM and an NVMe M.2 drive things went fine.

First Impressions

I have macOS now… 🙂

The most striking thing is the change to the taskbar. It’s reminiscent of the macOS dock when it is idle. All the items are centralised, but you can move them to the left if you prefer that. When you compare Windows 11 to macOS Big Sur they look nothing like each other, but you get the vibe Microsoft were “inspired” by that look.

When you click the Windows button/key you get a much more streamlined start menu, which was a bit of a shock at first, but I think I prefer it. One gripe is all the stuff I had pinned to the start menu was lost after the upgrade, and replaced with bullshit I don’t care about. It only took a few minutes to sort that though.

Once you start using the OS it feels like Windows 10, but with rounded corners. There is a lot more consistency with the “design language” of the interface. Many of the common dialogs have been reworked to be consistent with the new look and feel, but there are still a bunch of things that never seem to change. Open up “Computer Management” and it feels kind-of jarring. It doesn’t follow the theme and it feels like you’ve switched back several versions of Windows. It’s not a problem, as most of the common dialogs are fine, but it is a little disappointing.

Unlike the super-glassy finish of Windows Vista, there is some transparency on certain menus in Windows 11, but it is very subtle.

Hiccups

I had a few hiccups along the way. They were all quite minor really.

  • The upgrade killed the VPN client I use for work. I had to uninstall it and install it again. The solution was pretty simple, but I was kind-of tense for a while.
  • The upgrade uninstalled “Teams for Work and School” and replaced it with the consumer version of Teams. That meant I couldn’t connect with anyone from work. I downloaded and installed “Teams for Work and School” and it was all good.
  • As I mentioned before, all the things I had pinned to the start menu were lost and I had to remove a load of crap and re-pin things.

None of these things were a big drama, but if you were under a time constraint you may find yourself swearing at the computer!

Heavy Usage

Minecraft works! 🙂

Most of my heavy use revolves around VirtualBox, Vagrant and Packer. I’ve built some new Vagrant boxes using Packer, and used those boxes for Vagrant builds of VirtualBox VMs, and I haven’t run into any problems yet.

I record and edit videos using Camtasia, and it seems happy running on Windows 11.

Most of my life is not spent doing process heavy things. I spend most of my time in a browser or a shell prompt. I connect to Linux boxes at home and at work using MobaXTerm. I’ve had no dramas with this day-to-day stuff.

I had a look on the interwebs and a few gamers have been complaining about Windows 11, so if you are a PC gamer, now might not be a good time to make the switch from Windows 10.

Overall Impressions

It’s the same, but different. The safe approach is to stick with Windows 10 for a few more years. I don’t think you are missing out on anything by doing that. If you fancy the jump to Windows 11 and you have reasonably new kit, go for it.

Cheers

Tim…

Continuous Delivery : Is there a problem with the sales pitch?

I saw Simon Haslam point to this very short video of Charity Majors speaking about Continuous Delivery.

This answer to why companies are not using continuous delivery is perfect!

“It’s a failure of will, and a failure of courage on the part of our technical leadership.”

I love this answer, because this is exactly how I feel about it!

After seeing that, it got me thinking about why technical leadership are so disengaged from continuous integration/continuous delivery (CI/CD), and I started to wonder if it was actually a problem with the sales pitch.

Have you ever been in a discussion where you provide compelling evidence for your stance, then say one stupid thing, which allows people with the opposing view to jump all over you, and effectively ignore all the stuff you said previously? Been there! Done that! I wonder if the same thing is happening during the CI/CD sales pitch.

When people write or speak about this stuff, they will often bring up things that provide an instant get-out for people. Let’s imagine I am trying to convince someone that CD is the way forward. I might say things like,

  • Automation means it’s not dependent on a specific person being there to complete the deployment.
  • We can eliminate human error from the delivery process.
  • It makes delivery more reliable, as we have a well tested and proven process.
  • That proven reliability makes both us and our customers more confident that deployments will be successful, so it reduces the fear, uncertainty and doubt that often surround deployments.
  • As a result of all of the above, it makes the delivery process quicker and more efficient.

That all sounds great, and surely seals the deal, but then we hit them with this.

  • Amazon does 23,000 production deployments a day!

And now you’ve lost your audience. The room of people who are scared of change, and will look for any reason to justify their stagnation, will likely go through this thought process.

  • Amazon use CI/CD to get 23,000 production deployments a day.
  • We don’t need to do 23,000 production deployments a day.
  • Therefore we don’t need CI/CD.

I know this sounds stupid, but I’m convinced it happens.

I’ve read a bunch of stuff over the years and I’m invested in this subject, but I still find myself irritated by some of the things I read because they focus on the end result, rather than the core values that got them to that end result. Statements like, “Amazon does 23,000 production deployments a day” or “this is what Twitter does”, are unhelpful to say the least. I feel like the core values should be consistent between companies, even if the end result is very different.

This is just a thought and I could be talking complete crap, but I’m going to try and declutter myself of all this hype bullshit and try to focus on the core values of stuff, and hopefully stop giving people a reason to ignore me…

Cheers

Tim…

DBmarlin : New Liquibase Integration

I had a call with some folks from DBmarlin back in June and they gave me a demo of their new database performance monitoring product, which supports any mix of on-prem and cloud databases. You can see the kind of stuff they do in the videos at the end of the post. One of the things I thought was cool was it tracks changes and overlays them on their timeline, so you can see if a change results in a significant difference in performance. Neat!

I mentioned this on Twitter at the time and Ilmar Kerm responded with this message.

“Good idea to overlay dB changes… this gives me an idea to start collecting all changes from liquibase databasechangelog tables and annotate grafana dashboards with them”

The next day he came back with this post.

“Done and deployed 🙂 Collecting Liquibase databasechangelog entries into central InfluxDB measurement. Added these events as annotations to grafana + dashboard just to browse all individual executed changesets as a text log. I like it.”

You can see an image of his dashboard here.

A few days ago I got an email from the DBmarlin folks again asking if I wanted to check out their Liquibase integration, inspired by Ilmar, so on Friday last week we had that call.

Their implementation is different to Ilmar’s, as they don’t reference the DATABASECHANGELOG table. Instead they’ve got a jar file that integrates with the Liquibase client to give them the change information, and allow them to link through to Liquibase Hub. The Liquibase integration is available from this repo on GitHub.

If you click on the image below to expand it, you will see the timeline with the changes (see my red arrows), and the list of changes at the bottom. Notice the Liquibase change I’ve highlighted.

Clicking on the link in the Liquibase change takes you to Liquibase Hub, where you can get more details about the change.

I really like this.

This is hot off the press. They are looking for people to kick the tyres on this and give them some feedback, so if you’re currently using the Liquibase client and Liquibase Hub with your databases (Oracle, PostgreSQL, SQL Server, MySQL, CockroachDB etc.), give Russell (russell.luke@applicationperformance.com) a shout and have a play! You could get a free copy of DBmarlin for your feedback.

By the way, they also do Jenkins integration. See here. 🙂

Cheers

Tim…

BTW, this is not a sponsored post. I just like what they are doing with this product, so I thought it was worth a shout out.

Videos

DBmarlin for Liquibase

DBmarlin for Jenkins

DBmarlin short demo

Check out their YouTube channel here.

AdoptOpenJDK to Adoptium

This post is a “note to self”….

If you’ve been using AdoptOpenJDK to download OpenJDK, you will have noticed for the last few months there has been a message at the top of the screen with this link.

That message recently changed to include the following message.

“Our July 2021 and future releases will come from Adoptium.net”

When you go to the link you will see a very familiar looking page layout, with some slightly different branding. 🙂

The July updates are scheduled for the end of July, so you’ll have to wait a bit. In previous quarters the releases have come within 3 working days of the initial security announcement, but I guess the reorganization has delayed things somewhat.

So just remember to switch from this:

https://adoptopenjdk.net/

To this:

https://adoptium.net/

Happy patching!

Cheers

Tim…

Update: The OpenJDK 8 and 16 downloads are now present. Still waiting for the OpenJDK 11 downloads.

Performance/Flow : Focusing on the Constraint

Over a decade ago Cary Millsap was doing talks at Oracle conferences called “Thinking Clearly About Performance”. One of the points he discussed was identifying the big bottlenecks and dealing with those, rather than focusing on the little things that weren’t causing a problem. For example, if a task is made up of two operations where one takes 60 seconds to complete and the other one takes 1 second to complete, which one will give the most benefit if optimized?
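To put numbers on that example: even a modest improvement to the 60 second step dwarfs eliminating the 1 second step entirely. A quick illustration:

```shell
# Worked example: a task made up of a 60s operation and a 1s operation.
awk 'BEGIN {
  total = 60 + 1
  big_fix   = total - (60/2 + 1)   # halve the 60s step: saves 30s
  small_fix = total - (60 + 0)     # remove the 1s step entirely: saves 1s
  printf "Halving the big step saves %ds; removing the small step saves %ds\n", big_fix, small_fix
}'
```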

This is the same issue when we are looking at automation to improve flow in DevOps. There are a whole bunch of things we might consider automating, but it makes sense to fix the things that are causing the biggest problems first, as they will give the best return. In DevOps and Lean terms, that is focusing on the constraint: the weakest link in the chain (see the Theory of Constraints).

Lost in Automation

The reason I mention this is I think it’s really easy to get lost during automation. We often focus on what we can automate, rather than what needs automating. With my DBA hat on it’s clear my focus will be on the automation of provisioning and patching databases and application servers, but how important is that for the company?

  • If the developers want to “build & burn” environments, including databases, for their CI/CD pipelines, then automation of database and application server provisioning is really important as it might happen multiple times a day for automated testing.
  • If the developers use a more traditional dev, test, prod approach to environments, then the speed of provisioning new systems may be a lot less important to the overall performance of the company.

In both cases the automation gives benefits, but in the first case the benefits are much greater. Even then, is this the constraint? Maybe the problem is it takes 14 days to get approval to run the automation? 🙂

It’s sometimes hard for techies to have a good idea of where they fit in the value chain. We are often so focused on what we do that we don’t have a clue about the bigger picture.

Before we launch into automation, we need to focus on where the big problems are. Deal with the constraints first. That might mean stopping what you’re doing and going to help another team…

Don’t Automate Bad Process

We need to streamline processes before automating them. It’s a really bad idea to automate bad processes, because they will become embedded for life. It’s hard enough to get rid of bad processes because the “it’s what we’ve always done” inertia is difficult to overcome. If we add automation around a bad process we will never get rid of it, because now people will complain we are breaking the automation if we alter the process.

Another thing Cary talked about was removing redundant steps. You can’t make something faster than not doing it in the first place. 🙂 It’s surprising how much useless crap becomes embedded in processes as they evolve over the years.

The process of continuous improvement involves all aspects of the business. We have to be willing to revise our processes to make sure they are optimal, and build our automation around those optimised processes.

I’m not advocating compulsory tuning disorder. We’ve got to be sensible about this stuff.

Know When to Cut Your Losses

The vast majority of examples of automation and DevOps are focussed on software delivery in software development focused companies. It can be very frustrating listening to people harp on about this stuff when you work in a mixed environment with a load of 3rd party applications that will never be automated because they can’t be. They can literally break every rule you have in place and you are still stuck with them because the business relies on them.

You have to know where to cut your losses and move on. There will be some systems that will remain manual and crappy for as long as they are in your company. You can still try and automate around them, and maybe end up in a semi-automated state, but forget wasting your time trying to get to 1000 deployments a day. 🙂

I wrote about this in a post called I’m 2% DevOps, 3% agile and 4% automated because of 3rd party apps.

Try to Understand the Bigger Picture

I think it makes sense for us to all to try and get a better understanding of the bigger picture. It can be frustrating when you’ve put in a lot of work to automate something and nobody cares, because it wasn’t perceived as a problem in the big picture. I’m not suggesting we all have to be business analysts and system architects, but it’s important we know enough about the big picture so we can direct our efforts to get the maximum results.

I wrote a series of posts about automation here.

Cheers

Tim…

Does anyone care about enterprise application servers anymore?

A few years ago it seemed like everywhere you turned there was a vendor extolling the virtues of their enterprise application server. The message was very much “big is beautiful”. The more complicated the better. The more features the better. Fast forward to the present day and is that message still the same?

In a recent conversation I was asked about my thoughts on this and my response was I don’t think these “do it all” application servers matter anymore. This post outlines my feelings on the matter.

Full disclosure

Because of my job I was forced to learn WebLogic, and I’ve always disliked it. I think it’s over-complicated, a pig to manage, and every quarter the list of vulnerabilities looks like a horror show. Keep this in mind when reading the rest of this post. Basically, I’m saying this is my biased opinion and you don’t have to agree with me. I’m not throwing shade at any particular company. In my opinion, the mindset at the time was different to now, which resulted in similar products from multiple companies.

Also, it’s worth saying I’m not an application server guru, so that probably affects my views on the subject.

Third-party application vendors

The company I work for at the moment uses a lot of third party apps. From what I can see, the third party application vendors that were previously looking to offer solutions on products like WebSphere and WebLogic are quite happily getting rid of them in favour of more streamlined solutions. They are willing to forgo the additional features, or implement them with additional libraries in their code.

Maybe that’s just the application vendors I come into contact with. Maybe your experience is different. I would be interested to know, in case I’m totally off the mark here.

Containers

With the rise of containers we’ve become accustomed to small and lightweight pieces of infrastructure that focus on doing one thing well, rather than trying to be all things to all people.

I know you can run these enterprise application servers in containers and on Kubernetes, but it just feels like trying to force monoliths into containers. That’s not my idea of progress.

When I talk about lightweight, I’m not just thinking about the size of the footprint, but the complexity of it also.

Administration

The administration of the monoliths is too complicated. Given the choice of training up a new person on WebLogic or Tomcat, I know which I would pick.

Even after all these years, when we get an issue on a WebLogic server a piece of me dies because I know I’ll be rooting around for ages to get the answers. In comparison, Tomcat is so much simpler.

We go back to my definition of the size of the footprint again. The complexity of some of these products comes at a cost.

But what about product X?

I realise that some products still need these enterprise application servers. If you use a product that needs them, that’s fine. Continue doing so. I just wouldn’t be looking to buy new products that require that much investment in time and money. It’s just my opinion though.

Moving to the cloud

One of the things I’ve mentioned several times is the move to the cloud changes things significantly. To a certain extent, I don’t care about the underlying technology used by a cloud service. I just care that it meets my requirements. Does it perform well enough? Does it have the required availability? Does it come at a reasonable price? Is it really easy to administer, or preferably need no real administration? Does it support the programming language/framework I care about? At no point am I really thinking about the underlying tech.

If a cloud provider builds out their service using an enterprise application server, that’s fine. I just don’t want to pay those licensing fees, and I don’t want to see any of the management. I want the most streamlined experience I can get.

What do you think?

I’m really interested to know what you think. I come to this conversation with a lot of bias, so I understand if a lot of people don’t agree with me. Are you looking to simplify this layer of your infrastructure, or do you still lean on the side of “big is beautiful”?

Cheers

Tim…

High Availability : How much availability do you really need?

I had a discussion with a few folks about high availability (HA) and I thought I would write down some of my thoughts on it. I’m sure I’ve made many of these points before, but maybe not collectively in this form.

Before we start

This is not specifically about Oracle products. It applies equally well to any technology, but I will use Oracle as an example, because that’s what I know.

When I speak about a specific product or technology I am only talking about it in the context of HA. I don’t care what other benefits it brings, as they are not relevant to this discussion.

This is not about me saying, “you don’t need to buy/use X”. It’s me asking you to ask yourself if you need X, before you spend money and time on it.

How much downtime can you really tolerate?

This is a really simple question to ask, but not one you will always get a considered answer to. Without thinking, people will demand 24×7 operation with zero downtime, yet you ask for a downtime window to perform a task and it will get approved. Clearly this contradicts the 24×7 stance.

As a company you have to get a good grasp of what downtime you can *really* tolerate. It might be different for every system. Think about interfaces and dependencies. If system A is considered “low importance”, but it is referenced by system B that is considered “high importance”, that may alter your perception of system A, and its HA requirements.

There are clearly some companies that require as close to 100% availability as possible, but also a lot that don’t. Many can get away with planned downtime, and provided failures don’t happen too often, can work through them with little more than a few grumbles. We are not all the same. Don’t get led astray by thinking you are Netflix.

The more downtime you can tolerate, the more HA options are available to you, and the simpler and cheaper your solutions can become.
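As a rough guide, it helps to translate availability percentages into actual downtime. A quick back-of-the-envelope calculation (using a 365-day year and ignoring leap years):

```shell
# Permitted downtime per year for common uptime targets.
# (100 - target)% of the 8760 hours in a year.
for target in 99 99.9 99.99 99.999; do
  awk -v t="$target" 'BEGIN {
    hours = (100 - t) / 100 * 365 * 24
    printf "%s%% uptime allows about %.1f hours of downtime per year\n", t, hours
  }'
done
```

So "three nines" still allows the best part of a working day of downtime a year, while "five nines" allows barely five minutes. That difference is where the cost comes from.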

What is the true cost of your downtime?

The customers of some companies have no brand loyalty. If the site is down, the customers will go elsewhere. Some companies have extreme brand loyalty and people will tolerate being messed around.

If Amazon is down, I will wait until it is back online and make the purchase. There could be a business impact in terms of the flow of work downstream, but they are not going to lose me as a customer. So you can argue Amazon can’t tolerate downtime, or you can argue they can.

I used to play World of Warcraft (WOW), and it always irritated me when they did the Wednesday server restarts, but I just grumbled and waited. Once again, their customer base could tolerate planned downtime.

In some cases you are talking about reputational damage. If an Oracle website is down it’s kind-of embarrassing when they are a company that sells HA solutions. Reputational damage can be quite costly.

This cost of downtime for planned maintenance and failures has to factor into your decision about how much downtime you can tolerate.

Can you afford the uptime you are demanding?

High availability costs money. The greater the uptime you demand, the more it’s going to cost you. The costs are multi-factored. There is the cost of the kit, the licenses and the people skills. More about people later.

If you want a specific level of availability, you have to be willing to invest the money to get it. If you are on a budget, good luck with that 99.99+% uptime… 🙂

Do you have the skills to minimize downtime?

It’s rare that HA comes for free from a skills perspective. Let’s look at some scenarios involving Oracle databases.

  • Single instance on VM: You are relying on your virtual infrastructure to handle failure. Your DBAs can have less HA experience, but you need to know your virtualization folks are on form.
  • Data Guard: Your DBAs have to know all the usual skills, but also need good Data Guard skills. There is no point having a standby database if you don’t know how to use it, or it doesn’t work when you need it.
  • Real Application Clusters (RAC): Now your DBAs need RAC skills. I think most people would agree that RAC done badly will give you less availability than a single instance database, so your people have to know what they are doing.
  • RAC plus Data Guard: I think you get the point.
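Part of having those skills is verifying the protection actually works. As a minimal sketch (assuming a standard Data Guard configuration), a DBA might periodically run something like this on the standby to confirm its role and how far behind the primary it is:

```sql
-- Run on the standby database. Confirm it really is a standby,
-- and check the apply and transport lag.
SELECT database_role, open_mode FROM v$database;

SELECT name, value, time_computed
FROM   v$dataguard_stats
WHERE  name IN ('apply lag', 'transport lag');
```

If nobody ever looks at this sort of output, or practices an actual switchover, the standby is just an expensive comfort blanket.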

We often hear about containers and microservices as the solution to all things performance and HA related, but that’s going to fail badly unless you have the correct skills.

Some of these skills can be ignored if you are willing to use a cloud service that does it for you, but if not you have to staff it! That’s either internal staff, or an external consultancy. If you skimp on the skills, your HA will fail!

What are you protecting against?

The terms high availability (HA) and disaster recovery (DR) can kind-of merge in some conversations, and I don’t want to get into a war about it. The important point is people need to understand what their HA/DR solutions can do and what they can’t.

  • Failure of a process/instance on a single host.
  • Failure of a host in a cluster located in a single data centre.
  • Failover between data centres in the same geographical region.
  • Failover between data centres in different geographical regions.
  • Failover between planets in the same solar system.

You get the idea. It’s easy to put your money down and think you’ve got HA sorted, but have you really? I think we’ve all seen (or lived through) the stories about systems being designed to failover between data centres, only to find one data centre contains a vital piece of the architecture that breaks everything if it is missing.

Are all your layers highly available?

A chain is only as strong as the weakest link. What’s the point of spending a fortune on sorting out your database HA if your application layer is crap? What’s the point of having a beautifully architected HA solution in your application layer if your database HA is screwed?

Teams will often obsess about their own little piece of the puzzle, but a failure is a failure to the users. They aren’t going to say, “I bet it wasn’t the database though!”

Maybe your attention needs to be on the real problems, not performing HA masturbation on a layer that is generally working fine.

Who are you being advised by, and what is their motivation?

Not everyone is coming to the table with the same motivations.

  • Some vendors just want to sell licenses.
  • Some consultants want to charge you for expensive skills and training.
  • Some consultants and staff want to get a specific skill on their CV, and are happy for you to pay them to do that.
  • Some vendors, consultants and staff don’t engage their brain, and just offer the same solution to every company they encounter.
  • Some people genuinely care about finding the best solution to meet your needs.

Over my career I’ve seen all of these. Call me cynical, but I would suggest you always question the motives of the people giving you advice. Not everyone has your best interests at heart.

So what do you do?

In my current company we use our virtual infrastructure for basic HA. The databases (Oracle, MySQL and SQL Server) typically run as single instances in a VM, and failover to a different host in the same data centre or a different data centre depending on the nature of the failure. There are some SQL Server databases that use AlwaysOn, but I see little benefit to it for us.

Every so often the subject of better database HA comes up. We can tolerate a certain level of downtime for planned maintenance and failures, and the cost and skills required for better HA are not practical for us at this point. This position is correct for us as we currently stand. It may not be the correct option in future, and if so we will revisit it.

For the middle tier we do the normal thing of multiple app servers (VMs or containers) behind load balancers.
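As a rough sketch of that pattern (nginx used purely as an example here; the hostnames and ports are made up), the load balancer side can be this simple:

```nginx
# Hypothetical pool of identical app servers (VMs or containers).
upstream app_servers {
    server app1.example.com:8080;
    server app2.example.com:8080;
    # A server that keeps failing is temporarily taken out of rotation.
    server app3.example.com:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    location / {
        proxy_pass http://app_servers;
    }
}
```

The point is the app servers are interchangeable, so losing one is a non-event.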

I could probably build a very convincing case to do things differently to make my job a little “sexier”, but that would be a dick move. As it happens I want to move everything to the cloud so I can stop obsessing about the boring stuff and let the cloud provider worry about it. 🙂

Conclusion

There is no “one size fits all” solution to anything in life. As the mighty Tom Kyte said many times, the answer is always, “it depends”. If people are making decisions without discussing the sort of things I’ve mentioned here, I would suggest their decision process is flawed. Answers like, “but that’s what Twitter does”, are unacceptable.

Cheers

Tim…

When Algorithms Attack (Twitter Edition)

This morning Piet de Visser put out a tweet linking to this short film.

It’s a work of fiction, but it’s all about the level of privacy we’ve given up without even knowing it, due to the real-life collaborations between companies (Amazon, Google etc.) and institutions like the police and the NHS. In the film, algorithms take that data to make judgements about people. It’s well worth a watch…

Roger MacNicol replied to Piet’s tweet saying it was very good, and I replied to the pair of them saying the following.

When Algorithms Attack

Very soon after tweeting this I was kicked out of Twitter. When I tried to get back in I was directed to a page that forced me to delete the tweet and told me I was going to be blocked from tweeting for 12 hours. I can still read tweets and DM. I just can’t tweet. I also received an email about it.

My first reaction was to burst out laughing for a couple of reasons.

  1. I was commenting about a film where algorithms were making judgements about people and their actions without taking context into account.
  2. I think anyone who follows me on Twitter knows my views on COVID-19 and vaccinations. I am very much in favour of vaccinations, and I’m not at all a believer in conspiracy theories surrounding any vaccination, including the COVID-19 vaccinations. I’ve had both shots of the Pfizer vaccine. I guess Twitter should know that because I tweeted about it.

Now clearly, in isolation, that tweet looks like I’m one of those folks I’m always complaining about, but looking at my history of tweets you would know I often reply to things with shit-posts and don’t always include smilies when I do. I think I wear my views on my sleeve, so I don’t really worry about people taking a single tweet out of context, but clearly the Twitter algorithm is another matter…

The algorithm isn’t very smart and just appears to flag up specific word combinations. OK. Nice AI Twitter! 🙂

What am I going to do about it?

Nothing, stupid! Twitter can play by whatever rules they want. I’ve just got to wait 12 hours before I can resume shit-posting on Twitter. 🙂

I guess the only thing that is annoying is I can’t post a message to say I’m blocked from tweeting. It would be kind-of nice if I could do that. Having said that, I’m on holiday today and I could easily waste all day talking crap on Twitter, so they’ve probably done me a favour! 🙂

In conclusion, we are all doomed! 🙂

Cheers

Tim…

PS. I once received a 12 hour ban when arguing with an anti-vaxxer. I think the combination of words in my tweets flagged me as one of those nutters. About 5 minutes later my account was unblocked, so I think maybe a human was doing some sanity checking at the time, or maybe the AI got smarter. 🙂

PPS. The 12 hours is now up…

Updates to Vagrant and Docker Builds (Oracle Patches and Upgrades)

Unless you’ve been living under a rock, you will know there have been a load of software patches and updates released recently. As a result I’ve been constantly updating my Vagrant and Docker builds as each one has dropped. With the release of ORDS 21.1, the main push for this quarter is done.

This is just a heads-up of what’s been happening.

Packer : My Packer builds of OL7 and OL8 Vagrant boxes have been updated and pushed to Vagrant Cloud. This ended up happening twice due to the quick release of VirtualBox 6.1.22 a few days after 6.1.20.

Vagrant : All relevant builds now have the latest Java 11, Tomcat 9, ORDS 21.1 and SQLcl 21.1 versions. Where necessary the database patches are included. I mostly try to do builds with stock releases, so people without a support contract can still use them, but some things require the patches to function properly. If you follow the blog you will already know the Oracle Enterprise Manager Cloud Control 13.5 builds have now been included.

Docker/Containers : Similar to Vagrant, all relevant builds now have the latest Java 11, Tomcat 9, ORDS 21.1 and SQLcl 21.1 versions. Database patches are updated where necessary.

There is still some stuff on the horizon though. With the new version of APEX dropping on apex.oracle.com, I expect a new on-prem release soon (see update). There is also the on-prem release of Oracle database 21c, which I’m hoping drops soon. Once it does I will be adding those builds…

Cheers

Tim…

Update: APEX 21.1 dropped today (12-May-2021) just after publishing this post. It’s been added to all the builds now. 🙂
