Continuous Delivery : Is there a problem with the sales pitch?

I saw Simon Haslam point to this very short video of Charity Majors speaking about Continuous Delivery.

This answer to why companies are not using continuous delivery is perfect!

“It’s a failure of will, and a failure of courage on the part of our technical leadership.”

I love this answer, because this is exactly how I feel about it!

After seeing that, it got me thinking about why technical leadership are so disengaged from continuous integration/continuous delivery (CI/CD), and I started to wonder if it was actually a problem with the sales pitch.

Have you ever been in a discussion where you provide compelling evidence for your stance, then say one stupid thing, which allows people with the opposing view to jump all over you, and effectively ignore all the stuff you said previously? Been there! Done that! I wonder if the same thing is happening during the CI/CD sales pitch.

When people write or speak about this stuff, they will often bring up things that provide an instant get-out for people. Let’s imagine I am trying to convince someone that CD is the way forward. I might say things like,

  • Automation means it’s not dependent on a specific person being there to complete the deployment.
  • We can eliminate human error from the delivery process.
  • It makes delivery more reliable, as we have a well tested and proven process.
  • That proven reliability makes both us and our customers more confident that deployments will be successful, so it reduces the fear, uncertainty and doubt that often surround deployments.
  • As a result of all of the above, it makes the delivery process quicker and more efficient.

That all sounds great, and surely seals the deal, but then we hit them with this.

  • Amazon does 23,000 production deployments a day!

And now you’ve lost your audience. The room of people who are scared of change, and will look for any reason to justify their stagnation, will likely go through this thought process.

  • Amazon use CI/CD to get 23,000 production deployments a day.
  • We don’t need to do 23,000 production deployments a day.
  • Therefore we don’t need CI/CD.

I know this sounds stupid, but I’m convinced it happens.

I’ve read a bunch of stuff over the years and I’m invested in this subject, but I still find myself irritated by some of the things I read because they focus on the end result, rather than the core values that got them to that end result. Statements like, “Amazon does 23,000 production deployments a day” or “this is what Twitter does”, are unhelpful to say the least. I feel like the core values should be consistent between companies, even if the end result is very different.

This is just a thought and I could be talking complete crap, but I’m going to try and declutter myself of all this hype bullshit and try to focus on the core values of stuff, and hopefully stop giving people a reason to ignore me…

Cheers

Tim…

DBmarlin : New Liquibase Integration

I had a call with some folks from DBmarlin back in June and they gave me a demo of their new database performance monitoring product, which supports any mix of on-prem and cloud databases. You can see the kind of stuff they do in the videos at the end of the post. One of the things I thought was cool was it tracks changes and overlays them on their timeline, so you can see if a change results in a significant difference in performance. Neat!

I mentioned this on Twitter at the time and Ilmar Kerm responded with this message.

“Good idea to overlay dB changes… this gives me an idea to start collecting all changes from liquibase databasechangelog tables and annotate grafana dashboards with them”

The next day he came back with this post.

“Done and deployed πŸ™‚ Collecting Liquibase databasechangelog entries into central InfluxDB measurement. Added these events as annotations to grafana + dashboard just to browse all individual executed changesets as a text log. I like it.”

You can see an image of his dashboard here.

A few days ago I got an email from the DBmarlin folks again asking if I wanted to check out their Liquibase integration, inspired by Ilmar, so on Friday last week we had that call.

Their implementation is different to Ilmar’s, as they don’t reference the DATABASECHANGELOG table. Instead they’ve got a jar file that integrates with the Liquibase client to give them the change information, and allow them to link through to Liquibase Hub. The Liquibase integration is available from this repo on GitHib.

If you click on the image below to expand it, you will see the timeline with the changes (see my red arrows), and the list of changes at the bottom. Notice the Liquibase change I’ve highlighted.

Clicking on the link in the Liquibase change takes you to Liquibase Hub, where you can get more details about the change.

I really like this.

This is hot off the press. They are looking for people to kick the tyres on this and give them some feedback, so if you’re currently using the Liquibase client and Liquibase Hub with your databases (Oracle, PostreSQL, SQL Server, MySQL, CockroachDB etc.), give Russell (russell.luke@applicationperformance.com) a shout and have a play! You could get a free copy of DBmarlin for your feedback.

By the way, they also do Jenkins integration. See here. πŸ™‚

Cheers

Tim…

BTW, this is not a sponsored post. I just like what they are doing with this product, so I thought it was worth a shout out.

Videos

DBmarlin for Liquibase

DBmarlin for Jenkins

DBmarlin short demo

Check out their YouTube channel here.

AdoptOpenJDK to Adoptium

This post is a “note to self”….

If you’ve been using AdoptOpenJDK to download OpenJDK, you will have noticed for the last few months there has been a message at the top of the screen with this link.

That message recently changed to include the following message.

“Our July 2021 and future releases will come from Adoptium.net

When you go to the link you will see a very familiar looking page layout, with some slightly different branding. πŸ™‚

They July updates are scheduled for the end of July, so you’ll have to wait a bit. In previous quarters they’ve been less than 3 working days of the initial security announcement, but I guess the reorganization has delayed things somewhat.

So just remember to switch from this:

https://adoptopenjdk.net/

To this:

https://adoptium.net/

Happy patching!

Cheers

Tim…

Update: The OpenJDK 8 and 16 downloads are now present. Still waiting for the OpenJDK 11 downloads.

Performance/Flow : Focusing on the Constraint

Over a decade ago Cary Millsap was doing talks at Oracle conferences called “Thinking Clearly About Performance”. One of the points he discussed was identifying the big bottlenecks and dealing with those, rather than focusing on the little things that weren’t causing a problem. For example, if a task is made up of two operations where one takes 60 seconds to complete and the other one takes 1 second to complete, which one will give the most benefit if optimized?

This is the same issue when we are looking at automation to improve flow in DevOps. There are a whole bunch of things we might consider automating, but it makes sense to try and fix the things that are causing the biggest problem first, as they will give the best return. In DevOps and Lean terms that is focusing on the constraint. The weakest link in the chain. (see Theory of Constraints).

Lost in Automation

The reason I mention this is I think it’s really easy to get lost during automation. We often focus on what we can automate, rather than what needs automating. With my DBA hat on it’s clear my focus will be on the automation of provisioning and patching databases and application servers, but how important is that for the company?

  • If the developers want to “build & burn” environments, including databases, for their CI/CD pipelines, then automation of database and application server provisioning is really important as it might happen multiple times a day for automated testing.
  • If the developers use a more traditional dev, test, prod approach to environments, then the speed of provisioning new systems may be a lot less important to the overall performance of the company.

In both cases the automation gives benefits, but in the first case the benefits are much greater. Even then, is this the constraint? Maybe the problems is it takes 14 days for approval to run the automation? πŸ™‚

It’s sometimes hard for techies to have a good idea of where they fit in the value chain. We are often so focused on what we do, and don’t have a clue about the bigger picture.

Before we launch into automation, we need to focus on where the big problems are. Deal with the constraints first. That might mean stopping what you’re doing and going to help another team…

Don’t Automate Bad Process

We need to streamline processes before automating them. It’s a really bad idea to automate bad processes, because they will become embedded for life. It’s hard enough to get rid of bad processes because the, “it’s what we’ve always done”, inertia is difficult to overcome. If we add automation around that bad process we will never get rid of it, because now people will complain we are breaking the automation if we alter the process.

Another thing Cary talked about was removing redundant steps. You can’t make something faster than not doing it in the first place. πŸ™‚ It’s surprising how much useless crap becomes embedded in processes as they evolve over the years.

The process of continuous improvement involves all aspects of the business. We have to be willing to revise our processes to make sure they are optimal, and build our automation around those optimised processes.

I’m not advocating compulsory tuning disorder. We’ve got to be sensible about this stuff.

Know When to Cut Your Losses

The vast majority of examples of automation and DevOps are focussed on software delivery in software development focused companies. It can be very frustrating listening to people harp on about this stuff when you work in a mixed environment with a load of 3rd party applications that will never be automated because they can’t be. They can literally break every rule you have in place and you are still stuck with them because the business relies on them.

You have to know where to cut your losses and move on. There will be some systems that will remain manual and crappy for as long as they are in your company. You can still try and automate around them, and maybe end up in a semi-automated state, but forget wasting your time trying to get to 1000 deployments a day. πŸ™‚

I wrote about this in a post called I’m 2% DevOps, 3% agile and 4% automated because of 3rd party apps.

Try to Understand the Bigger Picture

I think it makes sense for us to all to try and get a better understanding of the bigger picture. It can be frustrating when you’ve put in a lot of work to automate something and nobody cares, because it wasn’t perceived as a problem in the big picture. I’m not suggesting we all have to be business analysts and system architects, but it’s important we know enough about the big picture so we can direct our efforts to get the maximum results.

I wrote a series of posts about automation here.

Cheers

Tim…

Does anyone care about enterprise application servers anymore?

A few years ago it seemed like everywhere you turned there was a vendor extolling the virtues of their enterprise application server. The message was very much “big is beautiful”. The more complicated the better. The more features the better. Fast forward to present time and is that message still the same?

In a recent conversation I was asked about my thoughts on this and my response was I don’t think these “do it all” application servers matter anymore. This post outlines my feelings on the matter.

Full disclosure

Because of my job I was forced to learn WebLogic, and I’ve always disliked it. I think it’s over-complicated, a pig to manage, and every quarter the list of vulnerabilities look like a horror show. Keep this in mind when reading the rest of this post. Basically, I’m saying this is my biased opinion and you don’t have to agree with me. I’m not throwing shade at any particular company. In my opinion, the mindset at the time was different to now, which resulted in similar products from multiple companies.

Also, it’s worth saying I’m not an application server guru, so that probably affects my views on the subject.

Third-party application vendors

The company I work for at the moment uses a lot of third party apps. From what I can see, the 3rd party application vendors that were previously looking to offer solutions on products like Websphere and WebLogic are quite happily getting rid of them in favour of more streamlined solutions. They are quite happy to forgo the additional features, or implement them with additional libraries in their code.

Maybe that’s just the application vendors I come into contact with. Maybe your experience is different. I would be interested to know, in case I’m totally off the mark here.

Containers

With the rise of containers we’ve become accustomed to small and lightweight pieces of infrastructure that focus on doing one thing well, rather than trying to be all things to all people.

I know you can run these enterprise application servers in containers and on Kubernetes, but it just feels like trying to force monoliths into containers. That’s not my idea of progress.

When I talk about lightweight, I’m not just thinking about the size of the footprint, but the complexity of it also.

Administration

The administration of the monoliths is too complicated. Given the choice of training up a new person on WebLogic or Tomcat, I know which I would pick.

Even after all these years, when we get an issue on a WebLogic server a piece of me dies because I know I’ll be rooting around for ages to get the answers. In comparison, Tomcat is so much simpler.

We go back to my definition of the size of the footprint again. The complexity of some of these products comes at a cost.

But what about product X?

I realise that some products still need these enterprise application servers. If you use a product that needs them, that’s fine. Continue doing so. I just wouldn’t be looking to buy new products that require that much investment in time and money. It’s just my opinion though.

Moving to the cloud

One of the things I’ve mentioned several times is the move to the cloud changes things significantly. To a certain extent, I don’t care about the underlying technology used by a cloud service. I just care that it meets my requirements. Does it perform well enough? Does it have the required availability? Does it come at a reasonable price? Is it really easy to administer, or preferably need no real administration? Does it support the programming language/framework I care about? At no point am I really thinking about the underlying tech.

If a cloud provider builds out there service using an enterprise application server, that’s fine. I just don’t want to pay those licensing fees, and I don’t want to see any of the management. I want the most streamlined experience I can get.

What do you think?

I’m really interested to know what you think. I come to this conversation with a lot of bias, so I understand if a lot of people don’t agree with me. Are you looking to simplify this layer of your infrastructure, or do you still lean on the side of “big is beautiful”?

Cheers

Tim…

High Availability : How much availability do you really need?

I had a discussion with a few folks about high availability (HA) and I thought I would write down some of my thoughts on it. I’m sure I’ve made many of these points before, but maybe not collectively in this form.

Before we start

This is not specifically about Oracle products. It applies equally well to any technology, but I will use Oracle as an example, because that’s what I know.

When I speak about a specific product or technology I am only talking about it in the context of HA. I don’t care what other benefits it brings, as they are not relevant to this discussion.

This is not about me saying, “you don’t need to buy/use X”. It’s me asking you to ask yourself if you need X, before you spend money and time on it.

How much downtime can you really tolerate?

This is a really simple question to ask, but not one you will always get a considered answer to. Without thinking, people will demand 24×7 operation with zero downtime, yet you ask for a downtime window to perform a task and it will get approved. Clearly this contradicts the 24×7 stance.

As a company you have to get a good grasp of what downtime you can *really* tolerate. It might be different for every system. Think about interfaces and dependencies. If system A is considered “low importance”, but it is referenced by system B that is considered “high importance”, that may alter your perception of system A, and its HA requirements.

There are clearly some companies that require as close to 100% availability as possible, but also a lot don’t. Many can get away with planned downtime, and provided failures don’t happen too often, can work through them with little more than a few grumbles. We are not all the same. Don’t get lead astray by thinking you are Netflix.

The more downtime you can tolerate, the more HA options are available to you, and the simpler and cheaper your solutions can become.

What is the true cost of your downtime?

The customers of some companies have no brand loyalty. If the site is down, the customers will go elsewhere. Some companies have extreme brand loyalty and people will tolerate being messed around.

If Amazon is down, I will wait until it is back online and make the purchase. There could be a business impact in terms of the flow of work downstream, but they are not going to lose me as a customer. So you can argue Amazon can’t tolerate downtime, or you can argue they can.

I used to play World of Warcraft (WOW), and it always irritated me when they did the Wednesday server restarts, but I just grumbled and waited. Once again, their customer base could tolerate planned downtime.

In some cases you are talking about reputational damage. If an Oracle website is down it’s kind-of embarrassing when they are a company that sells HA solutions. Reputational damage can be quite costly.

This cost of downtime for planned maintenance and failures has to factor into your decision about how much downtime you can tolerate.

Can you afford the uptime you are demanding?

High availability costs money. The greater the uptime you demand, the more it’s going to cost you. The costs are multi-factored. There is the cost of the kit, the licenses and the people skills. More about people later.

If you want a specific level of availability, you have to be willing to invest the money to get it. If you are on a budget, good luck with that 99.99+% uptime… πŸ™‚

Do you have the skills to minimize downtime?

It’s rare that HA comes for free from a skills perspective. Let’s look at some scenarios involving Oracle databases.

  • Single instance on VM: You are relying on your virtual infrastructure to handle failure. Your DBAs can have less HA experience, but you need to know your virtualization folks are on form.
  • Data Guard: Your DBAs have to know all the usual skills, but also need good Data Guard skills. There is no point having a standby database if you don’t know how to use it, or it doesn’t work when you need it.
  • Real Application Clusters (RAC): Now your DBAs need RAC skills. I think most people would agree that RAC done badly will give you less availability than a single instance database, so your people have to know what they are doing.
  • RAC plus Data Guard: I think you get the point.

We often hear about containers and microservices as the solution to all things performance and HA related, but that’s going to fail badly unless you have the correct skills.

Some of these skills can be ignored if you are willing to use a cloud service that does it for you, but if not you have to staff it! That’s either internal staff, or an external consultancy. If you skimp on the skills, your HA will fail!

What are you protecting against?

The terms high availability (HA) and disaster recovery (DR) can kind-of merge in some conversations, and I don’t want to get into a war about it. The important point is people need to understand what their HA/DR solutions can do and what they can’t.

  • Failure of a process/instance on a single host.
  • Failure of a host in a cluster located in a single data centre.
  • Failover between data centres in the same geographical region.
  • Failover between data centres in different geographical regions.
  • Failover between planets in the same solar system.

You get the idea. It’s easy to put your money down and think you’ve got HA sorted, but have you really? I think we’ve all seen (or lived through) the stories about systems being designed to failover between data centres, only to find one data centre contains a vital piece of the architecture that breaks everything if it is missing.

Are all your layers highly available?

A chain is only as strong as the weakest link. What’s the point of spending a fortune on sorting out your database HA if your application layer is crap? What’s the point of having a beautifully architected HA solution in your application layer if your database HA is screwed?

Teams will often obsess about their own little piece of the puzzle, but a failure is a failure to the users. They aren’t going to say, “I bet it wasn’t the database though!”

Maybe your attention needs to be on the real problems, not performing HA masturbation on a layer that is generally working fine.

Who are you being advised by, and what is their motivation?

Not everyone is coming to the table with the same motivations.

  • Some vendors just want to sell licenses.
  • Some consultants want to charge you for expensive skills and training.
  • Some consultants and staff want to get a specific skill on their CV, and are happy for you to pay them to do that.
  • Some vendors, consultants and staff don’t engage their brain, and just offer the same solution to every company they encounter.
  • Some people genuinely care about finding the best solution to meet your needs.

Over my career I’ve seen all of these. Call me cynical, but I would suggest you always question the motives of the people giving you advice. Not everyone has your best interests at heart.

So what do you do?

In my current company we use our virtual infrastructure for basic HA. The databases (Oracle, MySQL and SQL Server) typically run as single instances in a VM, and failover to a different host in the same data centre or a different data centre depending on the nature of the failure. There are some SQL Server databases that use AlwaysOn, but I see little benefit to it for us.

Every so often the subject of better database HA comes up. We can tolerate a certain level of downtime for planned maintenance and failures, and the cost and skills required for better HA are not practical for us at this point. This position is correct for us as we currently stand. It may not be the correct option in future, and if so we will revisit it.

For the middle tier we do the normal thing of multiple app servers (VMs or containers) behind load balancers.

I could probably build a very convincing case to do things differently to make my job a little “sexier”, but that would be a dick move. As it happens I want to move everything to the cloud so I can stop obsessing about the boring stuff and let the cloud provider worry about it. πŸ™‚

Conclusion

There is no “one size fits all” solution to anything in life. As the mighty Tom Kyte said many times, the answer is always, “it depends”. If people are making decisions without discussing the sort of things I’ve mentioned here, I would suggest their decision process is flawed. Answers like, “but that’s what Twitter does”, are unacceptable.

Cheers

Tim..

Updates to Vagrant and Docker Builds (Oracle Patches and Upgrades)

Unless you’ve been living under a rock, you will know there have been a load of software patches and updates released recently. As a result I’ve been constantly updating my Vagrant and Docker builds as each one has dropped. With the release of ORDS 21.1, the main push for this quarter is done.

This is just a heads-up of what’s been happening.

Packer : My Packer builds of OL7 and OL8 Vagrant boxes have been updated and pushed to Vagrant Cloud. This ended up happening twice due to the quick release of VirtualBox 6.1.22 a few days after 6.1.20.

Vagrant : All relevant builds now have the latest Java 11, Tomcat 9, ORDS 12.1 and SQLcl 21.1 versions. Where necessary the database patches are included. I mostly try to do builds with stock releases, so people without a support contract can still use them, but some things require the patches to function properly. If you follow the blog you will already know the Oracle Enterprise Manager Cloud Control 13.5 builds have now been included.

Docker/Containers : Similar to Vagrant, all relevant builds now have the latest Java 11, Tomcat 9, ORDS 12.1 and SQLcl 21.1 versions. Database patches are updated where necessary.

There is still some stuff on the horizon though. With the new version of APEX dropping on the apex.oracle.com, I expect a new on-prem release soon (see update). There is also the on-prem release of Oracle database 21c, which I’m hoping drops soon. Once it does I will be adding those builds…

Cheers

Tim…

Update: APEX 21.1 dropped today (12-May-2021) just after publishing this post. It’s been added to all the builds now. πŸ™‚

What are you actually comparing? (an extended Twitter rant)

I had a bit of a rant on a Twitter thread yesterday about this post.

The Transition to Trunk-Based Development

I’m going to repeat that rant here, then expand on it below.

The Rant

This kind-of gets on my tits.

It reads like, “we were doing everything badly and blamed it on GitFlow. Then we switched to trunk-based development and started to do everything properly. Trunk-based dev rules.”

You went from:

  • No automation to automated delivery.
  • Manual testing to automated testing.
  • Huge, long-lived feature branches to deploying smaller units of work.

None of this is about GitFlow vs trunk-based. It’s all about doing modern development properly.

This is such a lazy argument. I would be interested to see you fix all that stuff in GitFlow, then move to trunk-based and see what the difference/improvement was. That is a better comparison.

Now I realise sometimes you have to have a big step-change project to get people to do something different, and maybe trunk-based development was that trigger for your company, and if so, that’s good for you, but as I said, this is not GitFlow vs. trunk-based.

So once again. I’m not saying one method is better than the other. I’m just saying be honest about the major factors that contributed to your increased velocity.

None of the factors listed were specific to these styles of development. They are basic Agile and DevOps things.

Expanding

I see this sort of thing all the time.

I remember being in a session about Exadata when it was first released, and being told about 1000x performance when one project was moved from old conventional kit to Exadata. Someone much brighter than me asked the question about what code changes were implemented as part of the migration. It turned out a bunch of row-by-row (slow-by-slow) procedural data loads were ripped out and replaced by parallel DML operations using external tables. So as well as moving from old crappy kit to Exadata, they switched from crappy procedural code to set-based data loads. Relatively speaking, relational databases tend to be pretty crap at procedural work, and amazing at set-based processing. There was of course no comparison of how well the new data load approach worked on the old kit, so who knows where the biggest performance gains were made?

This isn’t about me slagging off trunk-based development, or Exadata. It’s about people making flawed comparisons. Imagine what you would say if I said advanced driving courses are amazing and here’s the proof. I did a lap of the track in a Ford Fiesta and it took 5 minutes. I then did an advanced driving course and was able to improve my time to 3 minutes in a Ferrari. I credit the advanced driving course for all the performance improvements! It’s clearly a terrible comparison and a misleading conclusion.

Wrapping It Up

Going back to the post that initiated this rant, I don’t think they were purposely trying to be misleading. Please don’t send any hate, but I feel like the real conclusion was not about GitFlow verses trunk-based development. It was about two things.

  • Manual process verses automation.
  • A cultural shift to encourage large, long-lived enhancements to be broken down into smaller user stories and story points, and being willing to ship small sprints of work to production, rather than waiting for the entire enhancement to be complete before deploying.

The first point is very much what “The Principle of Flow” in Devops focusses on. The second point is common to discussions about DevOps and Agile in general. This is what resulted in the increased velocity IMHO.

So when you are reading stuff, try to engage your brain and read between the lines.

Cheers

Tim…

How will my content change?

I was on a call a few weeks ago and the subject of technical content came up. As someone who produces a certain type of content, I had some thoughts related to cloud services. Specifically how cloud services make my style of content less meaningful these days.

A large part of my content could be described as “traditional” DBA content. As we move more systems to the cloud, and start to use increasingly automated database services, the “traditional” DBA is becoming less relevant, and therefore a certain proportion of my content is becoming less relevant with it. We are due to get an on-prem release of Oracle database 21c soon, and I’ll certainly be doing some installation articles for that, but how many more releases after that will need installation articles? How many more releases will require traditional DBA content? At some point we’ll be using cloud-based data services, so people like me won’t be installing or patching stuff anymore. What does that mean for my content?

Of course, if I’m still working I will still be producing content. As followers of the blog know, writing stuff is part of my learning process, so every time I’m learning something new, you are likely to see some articles appear on that subject. The issue is, if the traditional DBA content stops being necessary, or I’m just not doing DBA work anymore, what sort of content will I be producing?

The short answer is I don’t really know. I don’t think any DBA knows what they will be doing in five years. I suspect I will be some form of developer, but I don’t really know what type of developer. I would imagine it would be data-related, but who knows…

If this post raises any questions in your mind, I’m afraid I’m not the person to answer them. It’s just a really odd time…

Cheers

Tim…

VirtualBox 6.1.20 & Vagrant 2.2.15

VirtualBox 6.1.20 has been released.

The downloads and changelog are in the usual locations.

While I was playing around with VirtualBox I noticed Vagrant 2.2.15 has been released. You can download it here.

I’ve installed both of those on Windows 10, macOS Big Sur and Oracle Linux 7 hosts. So far so good.

With the release of the Oracle patches I’ll be doing a lot of Vagrant and Docker builds in the coming days, so I should get to exercise this pretty well.

I’ll also do the Packer builds of my Vagrant boxes with the new versions of the guest additions. They take a while to upload, so they should appear on Vagrant Cloud in the next couple of days.

Happy upgrading!

Cheers

Tim…