DevOps : Why do you focus on Flow and Automation?

A few weeks ago I did a DevOps “Lunch & Learn” talk inside my company. I’m not trying to claim I’m “Billy DevOps”, but I need our company to move in that direction or we will die. While I was preparing for that talk I did some Googling for the common complaints about DevOps talks and training courses, hoping to avoid them, and what I got back was a bunch of people complaining about the heavy focus on “The Principle of Flow”, specifically the automation piece of that.

Take a look at any conference agenda and the DevOps talks are mostly focused on automation, whether that’s builds (Ansible, Terraform, Vagrant, Cloud) or Continuous Integration/Deployment (CI/CD). Automation is certainly a part of DevOps, but DevOps isn’t just automation. So why do people focus on the automation aspect of DevOps so much?

I started off my talk by saying something like this,

“I Googled the common complaints about DevOps talks and training, and most people complained about too much focus on the Principle of Flow and automation. I’m going to do the same thing!”

So why did I come to this conclusion? Well, there are a few reasons.

The principle of feedback relies on you getting that feedback and doing something with it, like improving applications or processes etc. I realise I’m being simplistic, but how can you do anything with that feedback unless you have flow sorted? If it takes you weeks/months/years to effect basic changes because of bad flow, the feedback becomes almost irrelevant. It only serves to demotivate you, as you identify all the problems, with no way of fixing them.

The principle of continual learning and experimentation relies on flow being sorted. If you can’t quickly and reliably build kit and deploy apps to it, how do you expect to be able to experiment and learn new things? I discussed this point in a post called Why Automation Matters : Reducing the Cost of Failure.

It feels like the majority of people I speak to don’t have basic automation sorted yet. In public they talk a good talk, but behind the scenes the processes in their company suck just as bad as ours. Either that, or they have one aspect of automation sorted, which they talk about all the time, forgetting to mention the other manual processes that persist.

Let’s not forget that most of the people I see talking about DevOps come from a technical background, and I suspect are probably more interested in the automation aspects of DevOps than the process side of things. Also, in their current roles they have the ability to influence automation more than they do process change, so they are going to focus on a fight they have a chance of winning.

I think it’s important to emphasise to people that automation isn’t the be-all and end-all of DevOps, but that doesn’t stop it being the fun bit for me. 🙂

Check out the rest of the series here.

Cheers

Tim…

Video : Vagrant : Oracle Database Build (19c on OL8)

Today’s video is an example of using Vagrant to perform an Oracle database build.

In this example I was using Oracle 19c on Oracle Linux 8. It also installs APEX 19.1, ORDS 19.2, SQLcl 19.2, with ORDS running on Tomcat 9 and OpenJDK 12.

If you’re new to Vagrant, there is an introduction video here. There’s also an article if you prefer to read that.

If you want to play around with some of my other Vagrant builds, you can find them here.

If you want to read about some of the individual pieces that make up this build, you can find them here.

The star of today’s video is Noel Portugal. It’s been far too long since I’ve seen you dude!

Cheers

Tim…

I’m 2% DevOps, 3% agile and 4% automated because of 3rd party apps…

I was having a discussion with my boss about the impact of 3rd party apps on the way we work, and how difficult things are when you have to deal with 3rd party apps, as opposed to just writing your own software.

It’s easier to do things well when you are in control of all the pieces. Most of the examples you see are people writing their own software, typically on new projects. That’s very different to dealing with old projects and 3rd party apps. I’ll give you some examples, without trashing the companies responsible for this.

Example 1

Our student system is provided by a 3rd party. The company in question has a really antiquated way of delivering applications. In recent years they’ve tried to resolve this by writing their own delivery mechanism, made up of some custom software and Jenkins. The problem is, this is just a wrapper over the old process, so it is not the most reliable tool in the world. Someone like me would describe it as putting lipstick on a pig.

In addition to that, you have to use a GUI to perform the operations. At this point there is no API to allow you to script operations, which makes building them into a bigger process really problematic. We have internal development which is gradually moving to something resembling CI/CD, but it will never truly meet that goal, because the limitations of the 3rd party software force us to keep some manual steps in the process.

I’m sure long-term customers see the new delivery mechanism as a great improvement, but it’s not something you would deliver for a new product. It’s less painful than it was, but not really good.

Example 2

We have a publishing system that is written in Java and runs on Tomcat. It is so close to being hands-off, but there are a couple of problems.

  • When you deploy a new version, it starts in maintenance mode and you need manual interaction to click an OK button a few times on a web-based maintenance screen. I’ve never “not clicked” the OK button, so I just want a “just do it” option, so I can let it get on with it.
  • When some features are enabled by the power users, the next restart of the application flips you into maintenance mode. We’ve had P1 incidents because a host failure has caused the VM to start on a new host, and because a user has enabled a new feature in the app, the automatic startup stalls, waiting for me to click the OK button a few times.

There are some other annoyances, which impact on availability and possible topology, as well. There is no way to resolve these because of limitations in the application. All we can do is raise enhancement requests with the vendor.

I could go on with more examples, but I think you get the message.

So what do you do?

It can be quite disheartening when you want to do things well, but you have to keep compromising because of factors outside your control. You have to try not to give up, and just keep plugging away.

  • Don’t make unrealistic comparisons between your environment and others. There’s no point comparing your mixed environment to a software house. I’ve worked in both. They are very different. Take what works. Ditch what doesn’t.
  • Semi-automated processes are better than processes that are 100% manual. Maybe you can use Robotic Process Automation (RPA) to automate what is essentially a manual process.
  • Try to make sure these considerations become part of your procurement process, or you will keep buying crap.
  • Try to be creative and find workarounds, don’t just bury your head in the sand. There’s always *something* you can do to improve things.
  • Even if something is terrible, that doesn’t stop you improving the processes around it.

I guess you should focus on the values, rather than trying to exactly match some prescriptive ideal.

Good luck!

Cheers

Tim…

PS. I’m pretty sure my boss is reading this laughing, as I’m following none of this advice myself, but instead stomping round the place like a thirteen-year-old having a strop because, “Everything is crap!” 🙂

Why Automation Matters : Reducing the Cost of Failure

Recently I watched a video called The Future of Faster Enterprises by AWS Enterprise Strategist, Miriam McLemore. I think it’s a really good video, even if you don’t care about AWS or cloud in general. There is a wider message there.

One of the points Miriam raised was “Reducing the cost of failure”, which sparked a conversation between me and a colleague. When you’re trying to improve the way you work, you are going to have to try new things. Not all of those things are going to work out. The important point is that you try them and see if they work for you. If they do, great. If they don’t, you throw them away and move on. Reducing the cost of failure is a really important part of encouraging the culture of experimentation needed for continuous improvement.

Recently I wrote a post called you have to keep working just to stand still. Now add to that the work required to move your company forward and I think you’ll see why any barrier to progress is a problem.

So what factors affect the cost of failure? Here are a few.

  • Lack of automation. If humans are involved in providing infrastructure, it’s going to increase the time it takes to set things up (see lost time), and they will get disgruntled when you ask them to throw it away 2 hours after you’ve got it. You need to be able to build and burn kit rapidly to have any hope of experimenting. This is why the focus on the automation part of flow in DevOps is so important, for both business as usual and experimentation.
  • Bloated waterfall process. If your company expects a detailed plan of action before you so much as fart, you are going to fail. You have to be agile. I’m not using the term agile in the, “I’m too lazy to plan”, sense. I mean proper agile.
  • Time. Your company has to value progress and be willing to allocate time to it. You can’t rely on the fact Beryl and Bert go home every night and no-life their way through learning something new, so the business can reap the benefit of it for free. Yes that happens, but companies that rely on it will fail.
  • Be accepting of failure. I’m not talking about being happy to be rubbish. I’m not talking about being accepting of failure in well defined business as usual (BAU) work. I’m talking about being accepting of failure during experimentation. Not everything will work. Not everything will be the right solution for you or your company. You have to be willing to try and fail or you will fall at the first hurdle.

Check out the rest of the series here.

Cheers

Tim…

My Vagrant Habit

I’ve posted a lot about automation and Vagrant over the last year. It’s got to the point where I find it quite difficult/annoying to create a VM manually anymore. I hadn’t really noticed this until a couple of days ago…

I wanted to try some stuff out with Fedora 30, which is currently in beta. I had a look and couldn’t find any Vagrant boxes for Fedora 30, so I downloaded the ISO image and started to do a manual creation of a VM. It wasn’t very long before I got really annoyed, because it felt so clumsy, and there were so many silly little things I had to do that Vagrant either does for me, or are really simple to configure with Vagrant. After a few minutes I threw my toys out of the pram and started to read up on creating a Vagrant base box. In all this time I had never created one for myself. Turns out it’s really simple.

Once I had that in my Vagrant box list, I could quickly bang out a number of tests. Happy days…

So what did this teach me? It seems I’ve become totally and utterly intolerant of doing anything manually! 🙂

Cheers

Tim…

Why Automation Matters : Consistent Test Environments

I’ve already made this point in a previous post, but I thought it was worth mentioning in a little more detail.

One of the neat things about automation is it gives you the ability to quickly build/replace test environments, so you know you have a consistent starting point. This is especially important for automated testing (unit, integration etc.), but it also applies to your learning experience.

I’m currently learning about a bunch of Oracle 18c new features. Some of those features are limited to engineered systems and Oracle Database Cloud Services. Not to worry, there is a little hack that gets you round some of those restrictions for testing purposes.

alter system set "_exadata_feature_on"=true scope=spfile;
shutdown immediate;
startup;

In some cases I’m enabling extended data types. I’m also building additional test instances, and multiple test users, each requiring different levels of privileges.

So I finish learning about feature X and I move on to learning about feature Y. What am I bringing along with me for the ride? What problems will I run into, or not run into, as a result of the hacks I put in place for the previous test? I have no way of knowing.

This is where automation comes in really useful. You can quickly burn and build your test environment and start with a clean slate. This can also be really useful to check your understanding, by rerunning your tests on clean kit. Did you really remember to write down everything you did? 🙂
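As a rough illustration of that burn-and-build cycle, here is a minimal Python sketch. It assumes Vagrant is installed and that the directory you pass in contains a Vagrantfile. The function names and the dry-run flag are my own invention for the example, not part of any of my builds.

```python
import subprocess
from typing import List


def rebuild_commands() -> List[List[str]]:
    """Return the Vagrant commands for one burn-and-build cycle, in order."""
    return [
        ["vagrant", "destroy", "-f"],  # burn: remove the VM without prompting
        ["vagrant", "up"],             # build: recreate and provision the VM
    ]


def rebuild(vagrant_dir: str, dry_run: bool = True) -> List[str]:
    """List the rebuild commands, and run them in vagrant_dir unless dry_run is set.

    vagrant_dir is a hypothetical path to a directory holding a Vagrantfile.
    """
    executed = []
    for cmd in rebuild_commands():
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True stops the cycle as soon as a step fails.
            subprocess.run(cmd, cwd=vagrant_dir, check=True)
    return executed


if __name__ == "__main__":
    # Dry run by default, so nothing is destroyed by accident.
    print(rebuild(".", dry_run=True))
```

Calling rebuild with dry_run=False in the relevant directory would actually destroy and recreate the VM, so the default is deliberately the safe option.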

It’s kind-of obvious I know, but it’s really surprising how often I’m rebuilding my testing kit these days. I’m literally talking multiple times a day just when I’m messing about with stuff. Earlier in the week someone asked me a question about RAC builds and I did the following in a 3 hour period, while I was doing other stuff. 🙂

  • Oracle 18c RAC build on a Windows 10 host.
  • Oracle 12.2 RAC build on a Windows 10 host.
  • Oracle 18c RAC build on an Oracle Linux 7.6 host.
  • Oracle 18c RAC build on a macOS Mojave host.

There’s no way I could have contemplated that without automation.

When you are learning new stuff, the last thing you need to worry about is being thrown off target by crap left over from previous tests, so just start again with a clean slate!

Check out the rest of the series here.

Cheers

Tim…

PS. I know sometimes you can learn interesting stuff by making mistakes, like finding out that feature X and feature Y are incompatible, but I think you should approach those sorts of tests in a controlled and conscious manner. Learning the basics first is far more important in my opinion.

Why Automation Matters : The Series

A few months ago I decided to write a post about the lost time associated with the hand-offs between teams. It was relevant to a conversation I wanted to have, and I wanted to order my thoughts before I went into that conversation. That post accidentally became a series of posts, which I’ve listed below.

I’m not an expert at automation and I’m far from being an expert at DevOps. These were just a useful exercise for me, so I thought they might be of interest to other people.

I’m not sure if I’ll write any more, but if I do, I’ll add them to this page.

I’ve added an Automation category to the blog, which I’ve been using to categorise these posts, and other things like my posts about Docker and Vagrant.

Cheers

Tim…

Why Automation Matters : Technical Debt

I was going to include Technical Debt in yesterday’s post about Unplanned Work, but I thought it deserved a post of its own.

What is it? You can read a definition here, but essentially it comes down to a short-termism approach to solving problems. It can be applied to many situations, but here are two.

  • You have a bunch of applications written in Oracle Forms 6i. A new requirement comes in, and rather than biting the bullet and writing the new application in something more up to date, you write it in Forms 6i and ship it.
  • You have to build a new server, which involves manual processes for building the VM, OS and other software (app server, DB etc.). You go ahead and do it the way you always have, rather than using this as an opportunity to take a step back and start working on automation first.

In both these cases, it might actually be the correct decision to just move forward, as you may not have the necessary time and skills yet to do something “better”. It’s not the specific decision that matters as much as the recognition of the implications of that decision. By moving forward with this, you have to recognise you’ve added to your technical debt.

In the case of the development example it’s quite obvious. You now have yet another application that will have to be upgraded/rewritten in the future. You’ve added to your future workload.

In the case of the server it may be less obvious. If everything were done properly, with no human errors, you may have a beautifully consistent and perfect server, but the reality is that isn’t going to happen. You’ve just added another “non-standard” server to your organisation, which will probably result in more unplanned work later. It should immediately go on the list of things that need replacing, once an automated and standardised approach is created.

Technical debt is insidious because it’s so easy to justify that you made the right decision, and turn a blind eye to the problems down the road.

What’s this got to do with automation? In this case it’s about removing obstacles. Improving your infrastructure delivery and your application delivery pipeline makes it far easier to make changes in the future, and one thing we know about working in technology is everything is constantly changing. I see automation as an enabler of change, which can help you make decisions that won’t add to your technical debt.

Check out the rest of the series here.

Cheers

Tim…

Why Automation Matters : Unplanned Work and Death of Productivity!

The first time I heard the term “unplanned work” it was like a light bulb switched on in my head. I’m not sure my brain even recognised the distinction between different types of work before I heard this term. It was all just work to me.

You’ve probably read the descriptions of the types of work from one methodology or another before, but just to summarise you could break it down to three types:

  • External Projects. This is project work done for your customers. Depending on where you work, your customers could be external to the company, or a different department in the company.
  • Internal Projects. This is project work done within the team to improve your situation. It could be refactoring stuff, patching, upgrades, improving your tooling and processes, or automation of tasks. You get the idea.
  • Unplanned Work. This is stuff that comes out of the blue and forces you to drop what you are doing and look at it. Maybe a priority 1 incident.

The first two are planned work, since they are both projects, where you should have an idea of the resources required and the time they will take. As the name suggests, unplanned work is unplanned. 🙂 You can add some slippage time into your projects to allow for interruptions by unplanned work, but ultimately you never know if it’s going to be enough.

So what’s this got to do with automation? Well to put it bluntly, when you do stuff manually you are going to screw up and have inconsistencies between your environments. Later you will do something that you’ve done “a million times before”, knowing exactly how long it’s going to take, and it will fail. You’ll lose a bunch of time trying to figure out what went wrong and how to fix it. While you are doing that, all the work you should have been doing is not happening and the list of people screaming at you is getting longer by the minute.

It can easily get to the point where you are constantly firefighting, bouncing from one problem to the next, and never make any headway with your projects. Welcome to the world of unplanned work, population you!

I’m not trying to make out automation is 100% guaranteed to prevent unplanned work, but there are a whole bunch of cases where it’s going to reduce the incidence of problems, or make resolution of those problems simpler.
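To make the “inconsistencies between your environments” point a bit more concrete, here is a small Python sketch that compares the recorded settings of two environments and reports what has drifted. The setting names and values are made up purely for illustration, and find_drift is a hypothetical helper, not something from a real tool.

```python
from typing import Dict, Optional, Tuple

Settings = Dict[str, str]


def find_drift(reference: Settings, other: Settings) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every setting that differs, as {name: (reference_value, other_value)}.

    A setting that is missing from one side shows up with None on that side.
    """
    drift = {}
    for name in sorted(set(reference) | set(other)):
        ref_value = reference.get(name)
        other_value = other.get(name)
        if ref_value != other_value:
            drift[name] = (ref_value, other_value)
    return drift


# Made-up example data: a test box that was built by hand and a prod box.
test_env = {"java": "12", "tomcat": "9.0.21", "max_heap": "2g"}
prod_env = {"java": "11", "tomcat": "9.0.21"}

drift = find_drift(test_env, prod_env)
# drift == {"java": ("12", "11"), "max_heap": ("2g", None)}
```

When both environments come out of the same automated build, a check like this should come back empty, which is rather the point.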

Check out the rest of the series here.

Cheers

Tim…

Why Automation Matters : Can’t the cloud do it for you?

One of the comments on my previous post in the series mentioned using the cloud may solve a lot of these issues, implying you don’t have to bother with your own automation. Cursed with the ability to see both sides to any argument, I both agree and disagree with this. 🙂

Cloud providers bring a lot to the table as far as automation is concerned. Firing up new VMs and containers is really simple, and of course platforms such as RDS and the Oracle Autonomous Database family take over many of the operational aspects. So I can forget about automation right? Not so fast…

We typically see demos of cloud services that involve clicking buttons on web pages and it all works and looks great, but it’s not the way we really want to work. We want our infrastructure as code, and you can’t check button presses into your version control. 🙂 Also, if we are promoting self-service in the company, the last thing we want to do is give everyone access to our cloud account.

The cloud providers have got our back here. They allow us to use CLIs, web services and tools like Terraform to define whole chunks of infrastructure based on their services. You can use these tools to create your own self-service portals within your company. But that’s a new bunch of stuff you have to learn to become effective using this platform. It hasn’t freed you up from having to think about automation completely. It’s just altered your focus.
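As a hedged example of what “not clicking buttons” might look like, here is a minimal Python sketch that drives the Terraform CLI from a script. It assumes Terraform is installed and that config_dir, a hypothetical path, holds your version-controlled Terraform configuration. The function names and the default plan file name are my own choices for the example.

```python
import subprocess
from typing import List


def terraform_steps(plan_file: str = "tfplan") -> List[List[str]]:
    """Return the Terraform commands for a non-interactive init, plan and apply."""
    return [
        ["terraform", "init", "-input=false"],
        # Save the plan to a file, so apply runs exactly what was planned.
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],
        ["terraform", "apply", "-input=false", plan_file],
    ]


def provision(config_dir: str, dry_run: bool = True) -> List[str]:
    """List the steps, and run them in config_dir unless dry_run is set."""
    executed = []
    for cmd in terraform_steps():
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True stops the run if any step fails.
            subprocess.run(cmd, cwd=config_dir, check=True)
    return executed
```

A script like this is the sort of thing a self-service portal could call on someone’s behalf, so people get their infrastructure without needing direct access to the cloud account.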

What’s more, a cloud provider will not be able to provide every solution you need, configured exactly the way you want it. They may provide many of the building blocks or platforms, but you are still going to have to do some of the work yourself, whether it’s application configuration or change management. All of this still needs to be automated if you want to live up to the infrastructure as code mantra.

We also have companies at various stages in the cloud journey to consider. Some companies are still not considering cloud. Some are part way through the journey. Some will almost certainly be running in mixed environments, made up of on-prem and multiple cloud providers, for a long time, or even forever. Automation allows you to abstract some of the working parts, giving you some consistency in these mixed environments.

I think this all comes down to levels. You may never have to install or patch a database again, but that isn’t the whole story as far as automation is concerned.

Check out the rest of the series here.

Cheers

Tim…