Performance/Flow : Focusing on the Constraint

Over a decade ago Cary Millsap was giving talks at Oracle conferences called “Thinking Clearly About Performance”. One of the points he discussed was identifying the big bottlenecks and dealing with those, rather than focusing on the little things that weren’t causing a problem. For example, if a task is made up of two operations, where one takes 60 seconds to complete and the other takes 1 second, which one will give the most benefit if optimized?
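
To put some rough numbers on that (my own back-of-the-envelope arithmetic, not something from the talk itself), here’s a tiny Python sketch. Even eliminating the 1 second operation completely barely moves the needle, while a fairly lazy improvement to the 60 second operation dominates.

```python
# Back-of-the-envelope: a task made of two serial operations.
big, small = 60, 1            # seconds
total = big + small           # 61 seconds

# Best possible outcome from tuning the small operation: remove it entirely.
best_case_small = small / total          # ~1.6% faster overall
# A modest outcome from tuning the big operation: halve it.
modest_case_big = (big / 2) / total      # ~49% faster overall

print(f"Eliminating the 1s operation saves at most {best_case_small:.1%}")
print(f"Halving the 60s operation saves {modest_case_big:.1%}")
```

A perfect fix of the small thing can never beat a mediocre fix of the big thing. That’s the whole point of focusing on the constraint.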

This is the same issue when we are looking at automation to improve flow in DevOps. There are a whole bunch of things we might consider automating, but it makes sense to fix the things that are causing the biggest problems first, as they will give the best return. In DevOps and Lean terms, that means focusing on the constraint: the weakest link in the chain (see the Theory of Constraints).

Lost in Automation

The reason I mention this is I think it’s really easy to get lost during automation. We often focus on what we can automate, rather than what needs automating. With my DBA hat on it’s clear my focus will be on the automation of provisioning and patching databases and application servers, but how important is that for the company?

  • If the developers want to “build & burn” environments, including databases, for their CI/CD pipelines, then automation of database and application server provisioning is really important as it might happen multiple times a day for automated testing.
  • If the developers use a more traditional dev, test, prod approach to environments, then the speed of provisioning new systems may be a lot less important to the overall performance of the company.

In both cases the automation gives benefits, but in the first case the benefits are much greater. Even then, is this the constraint? Maybe the problem is it takes 14 days to get approval to run the automation? 🙂

It’s sometimes hard for techies to have a good idea of where they fit in the value chain. We are often so focused on what we do that we don’t have a clue about the bigger picture.

Before we launch into automation, we need to focus on where the big problems are. Deal with the constraints first. That might mean stopping what you’re doing and going to help another team…

Don’t Automate Bad Process

We need to streamline processes before automating them. It’s a really bad idea to automate bad processes, because they will become embedded for life. It’s hard enough to get rid of bad processes because the “it’s what we’ve always done” inertia is difficult to overcome. If we add automation around that bad process we will never get rid of it, because now people will complain we are breaking the automation if we alter the process.

Another thing Cary talked about was removing redundant steps. You can’t make something faster than not doing it in the first place. 🙂 It’s surprising how much useless crap becomes embedded in processes as they evolve over the years.

The process of continuous improvement involves all aspects of the business. We have to be willing to revise our processes to make sure they are optimal, and build our automation around those optimised processes.

I’m not advocating compulsive tuning disorder. We’ve got to be sensible about this stuff.

Know When to Cut Your Losses

The vast majority of examples of automation and DevOps are focused on software delivery in companies whose main business is software development. It can be very frustrating listening to people harp on about this stuff when you work in a mixed environment with a load of 3rd party applications that will never be automated, because they can’t be. They can literally break every rule you have in place and you are still stuck with them, because the business relies on them.

You have to know when to cut your losses and move on. There will be some systems that will remain manual and crappy for as long as they are in your company. You can still try and automate around them, and maybe end up in a semi-automated state, but don’t waste your time trying to get to 1000 deployments a day. 🙂

I wrote about this in a post called “I’m 2% DevOps, 3% agile and 4% automated because of 3rd party apps”.

Try to Understand the Bigger Picture

I think it makes sense for us all to try and get a better understanding of the bigger picture. It can be frustrating when you’ve put in a lot of work to automate something and nobody cares, because it wasn’t perceived as a problem in the big picture. I’m not suggesting we all have to be business analysts and system architects, but it’s important we know enough about the big picture so we can direct our efforts to get the maximum results.

I wrote a series of posts about automation here.

Cheers

Tim…

Does anyone care about enterprise application servers anymore?

A few years ago it seemed like everywhere you turned there was a vendor extolling the virtues of their enterprise application server. The message was very much “big is beautiful”. The more complicated the better. The more features the better. Fast forward to the present and is that message still the same?

In a recent conversation I was asked about my thoughts on this and my response was I don’t think these “do it all” application servers matter anymore. This post outlines my feelings on the matter.

Full disclosure

Because of my job I was forced to learn WebLogic, and I’ve always disliked it. I think it’s over-complicated, a pig to manage, and every quarter the list of vulnerabilities looks like a horror show. Keep this in mind when reading the rest of this post. Basically, I’m saying this is my biased opinion and you don’t have to agree with me. I’m not throwing shade at any particular company. In my opinion, the mindset at the time was different to now, which resulted in similar products from multiple companies.

Also, it’s worth saying I’m not an application server guru, so that probably affects my views on the subject.

Third-party application vendors

The company I work for at the moment uses a lot of third party apps. From what I can see, the 3rd party application vendors that were previously looking to offer solutions on products like WebSphere and WebLogic are quite happily getting rid of them in favour of more streamlined solutions. They are quite happy to forgo the additional features, or implement them with additional libraries in their code.

Maybe that’s just the application vendors I come into contact with. Maybe your experience is different. I would be interested to know, in case I’m totally off the mark here.

Containers

With the rise of containers we’ve become accustomed to small and lightweight pieces of infrastructure that focus on doing one thing well, rather than trying to be all things to all people.

I know you can run these enterprise application servers in containers and on Kubernetes, but it just feels like trying to force monoliths into containers. That’s not my idea of progress.

When I talk about lightweight, I’m not just thinking about the size of the footprint, but also its complexity.

Administration

The administration of the monoliths is too complicated. Given the choice of training up a new person on WebLogic or Tomcat, I know which I would pick.

Even after all these years, when we get an issue on a WebLogic server a piece of me dies because I know I’ll be rooting around for ages to get the answers. In comparison, Tomcat is so much simpler.

This brings us back to my definition of the footprint: the complexity of some of these products comes at a cost.

But what about product X?

I realise that some products still need these enterprise application servers. If you use a product that needs them, that’s fine. Continue doing so. I just wouldn’t be looking to buy new products that require that much investment in time and money. It’s just my opinion though.

Moving to the cloud

One of the things I’ve mentioned several times is the move to the cloud changes things significantly. To a certain extent, I don’t care about the underlying technology used by a cloud service. I just care that it meets my requirements. Does it perform well enough? Does it have the required availability? Does it come at a reasonable price? Is it really easy to administer, or preferably need no real administration? Does it support the programming language/framework I care about? At no point am I really thinking about the underlying tech.

If a cloud provider builds out their service using an enterprise application server, that’s fine. I just don’t want to pay those licensing fees, and I don’t want to see any of the management. I want the most streamlined experience I can get.

What do you think?

I’m really interested to know what you think. I come to this conversation with a lot of bias, so I understand if a lot of people don’t agree with me. Are you looking to simplify this layer of your infrastructure, or do you still lean on the side of “big is beautiful”?

Cheers

Tim…

High Availability : How much availability do you really need?

I had a discussion with a few folks about high availability (HA) and I thought I would write down some of my thoughts on it. I’m sure I’ve made many of these points before, but maybe not collectively in this form.

Before we start

This is not specifically about Oracle products. It applies equally well to any technology, but I will use Oracle as an example, because that’s what I know.

When I speak about a specific product or technology I am only talking about it in the context of HA. I don’t care what other benefits it brings, as they are not relevant to this discussion.

This is not about me saying, “you don’t need to buy/use X”. It’s me asking you to ask yourself if you need X, before you spend money and time on it.

How much downtime can you really tolerate?

This is a really simple question to ask, but not one you will always get a considered answer to. Without thinking, people will demand 24×7 operation with zero downtime, yet when you ask for a downtime window to perform a task, it gets approved. Clearly this contradicts the 24×7 stance.

As a company you have to get a good grasp of what downtime you can *really* tolerate. It might be different for every system. Think about interfaces and dependencies. If system A is considered “low importance”, but it is referenced by system B that is considered “high importance”, that may alter your perception of system A, and its HA requirements.

There are clearly some companies that require as close to 100% availability as possible, but there are also a lot that don’t. Many can get away with planned downtime, and provided failures don’t happen too often, can work through them with little more than a few grumbles. We are not all the same. Don’t get led astray by thinking you are Netflix.

The more downtime you can tolerate, the more HA options are available to you, and the simpler and cheaper your solutions can become.
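
It helps to translate availability targets into actual clock time. This is just my own rough arithmetic (ignoring how the figure is measured and whether planned maintenance counts against it), but it shows how quickly the “nines” get scary:

```python
# Rough guide to what the "nines" allow in downtime per year.
minutes_per_year = 365.25 * 24 * 60   # ~525,960 minutes

for pct in (99, 99.9, 99.99, 99.999):
    allowed = minutes_per_year * (1 - pct / 100)
    print(f"{pct}% uptime allows ~{allowed:,.0f} minutes of downtime per year")

# 99%     -> ~5,260 minutes (~3.7 days)
# 99.9%   -> ~526 minutes   (~8.8 hours)
# 99.99%  -> ~53 minutes
# 99.999% -> ~5 minutes
```

If a two hour patching window gets approved every quarter without a fuss, you are not really a 99.99% shop, whatever the requirements document says.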

What is the true cost of your downtime?

The customers of some companies have no brand loyalty. If the site is down, the customers will go elsewhere. Some companies have extreme brand loyalty and people will tolerate being messed around.

If Amazon is down, I will wait until it is back online and make the purchase. There could be a business impact in terms of the flow of work downstream, but they are not going to lose me as a customer. So you can argue Amazon can’t tolerate downtime, or you can argue they can.

I used to play World of Warcraft (WOW), and it always irritated me when they did the Wednesday server restarts, but I just grumbled and waited. Once again, their customer base could tolerate planned downtime.

In some cases you are talking about reputational damage. If an Oracle website is down it’s kind-of embarrassing when they are a company that sells HA solutions. Reputational damage can be quite costly.

The cost of downtime for planned maintenance and failures has to be factored into your decision about how much downtime you can tolerate.

Can you afford the uptime you are demanding?

High availability costs money. The greater the uptime you demand, the more it’s going to cost you. The costs are multi-factored. There is the cost of the kit, the licenses and the people skills. More about people later.

If you want a specific level of availability, you have to be willing to invest the money to get it. If you are on a budget, good luck with that 99.99+% uptime… 🙂

Do you have the skills to minimize downtime?

It’s rare that HA comes for free from a skills perspective. Let’s look at some scenarios involving Oracle databases.

  • Single instance on VM: You are relying on your virtual infrastructure to handle failure. Your DBAs can have less HA experience, but you need to know your virtualization folks are on form.
  • Data Guard: Your DBAs have to know all the usual skills, but also need good Data Guard skills. There is no point having a standby database if you don’t know how to use it, or it doesn’t work when you need it.
  • Real Application Clusters (RAC): Now your DBAs need RAC skills. I think most people would agree that RAC done badly will give you less availability than a single instance database, so your people have to know what they are doing.
  • RAC plus Data Guard: I think you get the point.

We often hear about containers and microservices as the solution to all things performance and HA related, but that’s going to fail badly unless you have the correct skills.

Some of these skills can be ignored if you are willing to use a cloud service that does it for you, but if not you have to staff it! That’s either internal staff, or an external consultancy. If you skimp on the skills, your HA will fail!

What are you protecting against?

The terms high availability (HA) and disaster recovery (DR) can kind-of merge in some conversations, and I don’t want to get into a war about it. The important point is people need to understand what their HA/DR solutions can do and what they can’t.

  • Failure of a process/instance on a single host.
  • Failure of a host in a cluster located in a single data centre.
  • Failover between data centres in the same geographical region.
  • Failover between data centres in different geographical regions.
  • Failover between planets in the same solar system.

You get the idea. It’s easy to put your money down and think you’ve got HA sorted, but have you really? I think we’ve all seen (or lived through) the stories about systems being designed to fail over between data centres, only to find one data centre contains a vital piece of the architecture that breaks everything if it is missing.

Are all your layers highly available?

A chain is only as strong as the weakest link. What’s the point of spending a fortune on sorting out your database HA if your application layer is crap? What’s the point of having a beautifully architected HA solution in your application layer if your database HA is screwed?
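
To put some made-up numbers on the weakest link idea, here’s a crude sketch. It assumes every layer must be up for the service to be up, and that failures are independent, both of which are generous assumptions:

```python
# Crude model: the service is only up when every layer is up,
# so availability multiplies across the layers.
layers = {
    "load balancer": 0.9999,    # four nines
    "app tier":      0.999,     # three nines
    "database":      0.99999,   # five nines
}

end_to_end = 1.0
for availability in layers.values():
    end_to_end *= availability

minutes_per_year = 365.25 * 24 * 60
downtime = minutes_per_year * (1 - end_to_end)
print(f"End-to-end availability: {end_to_end:.5f} (~{downtime:,.0f} minutes down per year)")
# ~0.99889, or roughly 9-10 hours a year, despite the five nines database.
```

The five nines on the database barely register, because the three nines application tier dominates the result. That’s where the money and the attention should be going.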

Teams will often obsess about their own little piece of the puzzle, but a failure is a failure to the users. They aren’t going to say, “I bet it wasn’t the database though!”

Maybe your attention needs to be on the real problems, not performing HA masturbation on a layer that is generally working fine.

Who are you being advised by, and what is their motivation?

Not everyone is coming to the table with the same motivations.

  • Some vendors just want to sell licenses.
  • Some consultants want to charge you for expensive skills and training.
  • Some consultants and staff want to get a specific skill on their CV, and are happy for you to pay them to do that.
  • Some vendors, consultants and staff don’t engage their brain, and just offer the same solution to every company they encounter.
  • Some people genuinely care about finding the best solution to meet your needs.

Over my career I’ve seen all of these. Call me cynical, but I would suggest you always question the motives of the people giving you advice. Not everyone has your best interests at heart.

So what do you do?

In my current company we use our virtual infrastructure for basic HA. The databases (Oracle, MySQL and SQL Server) typically run as single instances in a VM, and fail over to a different host in the same data centre or a different data centre depending on the nature of the failure. There are some SQL Server databases that use AlwaysOn, but I see little benefit to it for us.

Every so often the subject of better database HA comes up. We can tolerate a certain level of downtime for planned maintenance and failures, and the cost and skills required for better HA are not practical for us at this point. This position is correct for us as we currently stand. It may not be the correct option in future, and if so we will revisit it.

For the middle tier we do the normal thing of multiple app servers (VMs or containers) behind load balancers.

I could probably build a very convincing case to do things differently to make my job a little “sexier”, but that would be a dick move. As it happens I want to move everything to the cloud so I can stop obsessing about the boring stuff and let the cloud provider worry about it. 🙂

Conclusion

There is no “one size fits all” solution to anything in life. As the mighty Tom Kyte said many times, the answer is always, “it depends”. If people are making decisions without discussing the sort of things I’ve mentioned here, I would suggest their decision process is flawed. Answers like, “but that’s what Twitter does”, are unacceptable.

Cheers

Tim…

Updates to Vagrant and Docker Builds (Oracle Patches and Upgrades)

Unless you’ve been living under a rock, you will know there have been a load of software patches and updates released recently. As a result I’ve been constantly updating my Vagrant and Docker builds as each one has dropped. With the release of ORDS 21.1, the main push for this quarter is done.

This is just a heads-up of what’s been happening.

Packer : My Packer builds of OL7 and OL8 Vagrant boxes have been updated and pushed to Vagrant Cloud. This ended up happening twice due to the quick release of VirtualBox 6.1.22 a few days after 6.1.20.

Vagrant : All relevant builds now have the latest Java 11, Tomcat 9, ORDS 21.1 and SQLcl 21.1 versions. Where necessary the database patches are included. I mostly try to do builds with stock releases, so people without a support contract can still use them, but some things require the patches to function properly. If you follow the blog you will already know the Oracle Enterprise Manager Cloud Control 13.5 builds have now been included.

Docker/Containers : Similar to Vagrant, all relevant builds now have the latest Java 11, Tomcat 9, ORDS 21.1 and SQLcl 21.1 versions. Database patches are updated where necessary.

There is still some stuff on the horizon though. With the new version of APEX dropping on apex.oracle.com, I expect a new on-prem release soon (see update). There is also the on-prem release of Oracle database 21c, which I’m hoping drops soon. Once it does I will be adding those builds…

Cheers

Tim…

Update: APEX 21.1 dropped today (12-May-2021) just after publishing this post. It’s been added to all the builds now. 🙂

What are you actually comparing? (an extended Twitter rant)

I had a bit of a rant on a Twitter thread yesterday about this post.

The Transition to Trunk-Based Development

I’m going to repeat that rant here, then expand on it below.

The Rant

This kind-of gets on my tits.

It reads like, “we were doing everything badly and blamed it on GitFlow. Then we switched to trunk-based development and started to do everything properly. Trunk-based dev rules.”

You went from:

  • No automation to automated delivery.
  • Manual testing to automated testing.
  • Huge, long-lived feature branches to deploying smaller units of work.

None of this is about GitFlow vs trunk-based. It’s all about doing modern development properly.

This is such a lazy argument. I would be interested to see you fix all that stuff in GitFlow, then move to trunk-based and see what the difference/improvement was. That is a better comparison.

Now I realise sometimes you have to have a big step-change project to get people to do something different, and maybe trunk-based development was that trigger for your company. If so, that’s good for you. But as I said, this is not GitFlow vs. trunk-based.

So once again. I’m not saying one method is better than the other. I’m just saying be honest about the major factors that contributed to your increased velocity.

None of the factors listed were specific to these styles of development. They are basic Agile and DevOps things.

Expanding

I see this sort of thing all the time.

I remember being in a session about Exadata when it was first released, and being told about 1000x performance when one project was moved from old conventional kit to Exadata. Someone much brighter than me asked the question about what code changes were implemented as part of the migration. It turned out a bunch of row-by-row (slow-by-slow) procedural data loads were ripped out and replaced by parallel DML operations using external tables. So as well as moving from old crappy kit to Exadata, they switched from crappy procedural code to set-based data loads. Relatively speaking, relational databases tend to be pretty crap at procedural work, and amazing at set-based processing. There was of course no comparison of how well the new data load approach worked on the old kit, so who knows where the biggest performance gains were made?

This isn’t about me slagging off trunk-based development, or Exadata. It’s about people making flawed comparisons. Imagine what you would say if I said advanced driving courses are amazing and here’s the proof. I did a lap of the track in a Ford Fiesta and it took 5 minutes. I then did an advanced driving course and was able to improve my time to 3 minutes in a Ferrari. I credit the advanced driving course for all the performance improvements! It’s clearly a terrible comparison and a misleading conclusion.

Wrapping It Up

Going back to the post that initiated this rant, I don’t think they were purposely trying to be misleading. Please don’t send any hate, but I feel like the real conclusion was not about GitFlow versus trunk-based development. It was about two things.

  • Manual process versus automation.
  • A cultural shift to encourage large, long-lived enhancements to be broken down into smaller user stories and story points, and a willingness to ship small sprints of work to production, rather than waiting for the entire enhancement to be complete before deploying.

The first point is very much what “The Principle of Flow” in DevOps focuses on. The second point is common to discussions about DevOps and Agile in general. This is what resulted in the increased velocity IMHO.

So when you are reading stuff, try to engage your brain and read between the lines.

Cheers

Tim…

How will my content change?

I was on a call a few weeks ago and the subject of technical content came up. As someone who produces a certain type of content, I had some thoughts related to cloud services. Specifically how cloud services make my style of content less meaningful these days.

A large part of my content could be described as “traditional” DBA content. As we move more systems to the cloud, and start to use increasingly automated database services, the “traditional” DBA is becoming less relevant, and therefore a certain proportion of my content is becoming less relevant with it. We are due to get an on-prem release of Oracle database 21c soon, and I’ll certainly be doing some installation articles for that, but how many more releases after that will need installation articles? How many more releases will require traditional DBA content? At some point we’ll be using cloud-based data services, so people like me won’t be installing or patching stuff anymore. What does that mean for my content?

Of course, if I’m still working I will still be producing content. As followers of the blog know, writing stuff is part of my learning process, so every time I’m learning something new, you are likely to see some articles appear on that subject. The issue is, if the traditional DBA content stops being necessary, or I’m just not doing DBA work anymore, what sort of content will I be producing?

The short answer is I don’t really know. I don’t think any DBA knows what they will be doing in five years. I suspect I will be some form of developer, but I don’t really know what type of developer. I would imagine it would be data-related, but who knows…

If this post raises any questions in your mind, I’m afraid I’m not the person to answer them. It’s just a really odd time…

Cheers

Tim…

VirtualBox 6.1.20 & Vagrant 2.2.15

VirtualBox 6.1.20 has been released.

The downloads and changelog are in the usual locations.

While I was playing around with VirtualBox I noticed Vagrant 2.2.15 has been released. You can download it here.

I’ve installed both of those on Windows 10, macOS Big Sur and Oracle Linux 7 hosts. So far so good.

With the release of the Oracle patches I’ll be doing a lot of Vagrant and Docker builds in the coming days, so I should get to exercise this pretty well.

I’ll also do the Packer builds of my Vagrant boxes with the new versions of the guest additions. They take a while to upload, so they should appear on Vagrant Cloud in the next couple of days.

Happy upgrading!

Cheers

Tim…

Learning new stuff is good, even if you don’t use it…

I mentioned on Twitter that I was on a training course last week for a non-Oracle cloud-based database engine.

I went into the week feeling quite nervous, because I like to learn things at my own pace. Having to work at the same pace as others doesn’t work well for me. I’m quick at some things, and slow at others. That said, it was a really fun week. It flew by, which is always a good sign.

The main point of this post is not about the product or the course itself. It’s a more general point. I think it’s important to expose yourself to new bits of tech all the time. I have no idea how involved I will be in the project using this product, but it’s still good to learn more about it. When you learn something new it makes you question how you approach your existing role, and how you use your existing tech.

Of course, the Oracle database is what I’ve invested most of my time into, and it’s the thing I know the most about, but it’s far from the only thing I do on a day to day basis. I often say people should try the latest version of the Oracle database, because it might make them alter their approach to the older versions. This is true of different technology stacks too. Over the last few years I’ve spent a lot of time using and writing about things that are not core Oracle database tech, but that has had a positive influence on my approach to the database.

So going back to the title of the post, it’s important you learn new things, even if you don’t use them often, as they will change you and give you a broader understanding of your goal.

Cheers

Tim…

PS. How you approach learning new stuff depends on you. I wrote a series of posts about Learning New Things.

PPS. It’s not all about tech. You should also expose yourself to different elements of projects, so you better understand where other people are coming from.

You don’t need to be an expert to be useful!

I come from a time when you could be an expert at one thing and be really useful to a company, but I think that time is long gone. If you only have one skill, no matter how good you are at it, you probably can’t achieve anything in tech without waiting weeks to get people to help you with all the stuff you don’t understand. In recent years, being a tech all-rounder seems to be much more useful than being an expert. Maybe it always was.

Of course, this poses its own set of problems. How do you learn all this stuff? That’s the hard part and there aren’t any short cuts. There isn’t a “full stack developer course” that will teach you. You’ve just got to work on your Google-Fu and start getting your hands dirty. On the positive side, there is a lot of good information out there to get you started. Blog posts and videos that will get you from zero to adequate in a short amount of time if you put in the effort.

Over the last few years I’ve played with a lot of different technology, and I still find new stuff interesting, but it’s taken me a long time to deal with the fact I’m crap at most of it. Good enough to get the job done and fool people into thinking I know what I’m doing, but ultimately only one weekend of playing with the tech and a couple of Google searches ahead of some other people.

So my advice to people in tech is:

  • Try and get involved in as many aspects of tech as possible.
  • Forget trying to be an expert in any of them. Just try to get good enough to be useful.
  • Be humble enough to realise that what you say and do today may change tomorrow when you’ve Googled a bit more.
  • Try to understand the big picture. How things fit together and how processes work. Programming languages and services change all the time, but understanding the goal and the processes to get there doesn’t change as much as you might think.

Remember, it’s just my opinion!

Cheers

Tim…

VirtualBox 6.1.18

VirtualBox 6.1.18 has been released.

The downloads and changelog are in the usual places.

I’ve installed it on Windows 10, macOS Big Sur and Oracle Linux 7 hosts with no problems.

I’ll be running new Packer builds for the oraclebase/oracle-7 and oraclebase/oracle-8 vagrant boxes, so they should appear with the new version of the guest additions over the next day or so.

Cheers

Tim…

Links: