Why Automation Matters : The Series

A few months ago I decided to write a post about the lost time associated with the hand-offs between teams. It was relevant to a conversation I wanted to have, and I wanted to order my thoughts before I went into that conversation. That post accidentally became a series of posts, which I’ve listed below.

I’m not an expert at automation and I’m far from being an expert at DevOps. These posts were just a useful exercise for me, so I thought they might be of interest to other people.

I’m not sure if I’ll write any more, but if I do, I’ll add them to this page.

I’ve added an Automation category to the blog, which I’ve been using to categorise these posts, and other things like my posts about Docker and Vagrant.

Cheers

Tim…

Why Automation Matters : Technical Debt

I was going to include Technical Debt in yesterday’s post about Unplanned Work, but I thought it deserved a post of its own.

What is it? You can read a definition here, but essentially it comes down to a short-term approach to solving problems. It can be applied to many situations, but here are two.

  • You have a bunch of applications written in Oracle Forms 6i. A new requirement comes in, and rather than biting the bullet and writing the new application in something more up to date, you write it in Forms 6i and ship it.
  • You have to build a new server, which involves manual processes for building the VM, OS and other software (app server, DB etc.). You go ahead and do it the way you always have, rather than using this as an opportunity to take a step back and start working on automation first.

In both these cases, it might actually be the correct decision to just move forward, as you may not have the necessary time and skills yet to do something “better”. It’s not the specific decision that matters as much as the recognition of the implications of that decision. By moving forward with this, you have to recognise you’ve added to your technical debt.

In the case of the development example it’s quite obvious. You now have yet another application that will have to be upgraded/rewritten in the future. You’ve added to your future workload.

In the case of the server it may be less obvious. If everything were done properly, with no human errors, you might end up with a beautifully consistent and perfect server. The reality is that isn’t going to happen. You’ve just added another “non-standard” server to your organisation, which will probably result in more unplanned work later, and should immediately go on the list of things that need replacing once an automated and standardised approach is created.

Technical debt is insidious because it’s so easy to justify that you made the right decision, and turn a blind eye to the problems down the road.

What’s this got to do with automation? In this case it’s about removing obstacles. Improving your infrastructure delivery and your application delivery pipeline makes it far easier to make changes in the future, and one thing we know about working in technology is everything is constantly changing. I see automation as an enabler of change, which can help you make decisions that won’t add to your technical debt.

Cheers

Tim…

Why Automation Matters : Unplanned Work and Death of Productivity!

The first time I heard the term “unplanned work” it was like a light bulb switched on in my head. I’m not sure my brain even recognised the distinction between different types of work before I heard this term. It was all just work to me.

You’ve probably read the descriptions of the types of work from one methodology or another before, but just to summarise you could break it down to three types:

  • External Projects. This is project work done for your customers. Depending on where you work, your customers could be external to the company, or a different department in the company.
  • Internal Projects. This is project work done within the team to improve your situation. It could be refactoring stuff, patching, upgrades, improving your tooling and processes, or automation of tasks. You get the idea.
  • Unplanned Work. This is stuff that comes out of the blue and forces you to drop what you are doing and look at it. Maybe a priority 1 incident.

The first two are planned work, since they are both projects, where you should have an idea of the resources required and the time they will take. As the name suggests, unplanned work is unplanned. 🙂 You can add some slippage time into your projects to allow for interruptions by unplanned work, but ultimately you never know if it’s going to be enough.

So what’s this got to do with automation? Well to put it bluntly, when you do stuff manually you are going to screw up and have inconsistencies between your environments. Later you will do something that you’ve done “a million times before”, knowing exactly how long it’s going to take, it will fail and you’ll lose a bunch of time trying to figure out what went wrong and how to fix it. While you are doing that, all the work you should have been doing is not happening and the list of people screaming at you is getting longer by the minute.

It can easily get to the point where you are constantly firefighting, bouncing from one problem to the next, and never making any headway with your projects. Welcome to the world of unplanned work, population you!

I’m not trying to make out automation is 100% guaranteed to prevent unplanned work, but there are a whole bunch of cases where it’s going to reduce the incidence of problems, or make resolution of those problems simpler.

Cheers

Tim…

Why Automation Matters : Can’t the cloud do it for you?

One of the comments on my previous post in the series mentioned using the cloud may solve a lot of these issues, implying you don’t have to bother with your own automation. Cursed with the ability to see both sides to any argument, I both agree and disagree with this. 🙂

Cloud providers bring a lot to the table as far as automation is concerned. Firing up new VMs and containers is really simple, and of course platforms such as RDS and the Oracle Autonomous Database family take over many of the operational aspects. So I can forget about automation right? Not so fast…

We typically see demos of cloud services that involve clicking buttons on web pages and it all works and looks great, but it’s not the way we really want to work. We want our infrastructure as code, and you can’t check button presses into your version control. 🙂 Also, if we are promoting self-service in the company, the last thing we want to do is give everyone access to our cloud account.

The cloud providers have got our back here. They allow us to use CLIs, web services and tools like Terraform to define whole chunks of infrastructure based on their services. You can use these tools to create your own self-service portals within your company. But that’s a new bunch of stuff you have to learn to become effective using this platform. It hasn’t freed you up from having to think about automation completely. It’s just altered your focus.

What’s more, a cloud provider will not be able to provide every solution you need, configured exactly the way you want it. They may provide many of the building blocks or platforms, but you are still going to have to do some of the work yourself, whether it’s application configuration or change management. All of this still needs to be automated if you want to live up to the infrastructure as code mantra.

We also have companies at various stages in the cloud journey to consider. Some companies are still not considering cloud. Some are part way through the journey. Some will almost certainly be running in mixed environments, made up of on-prem and multiple cloud providers, for a long time, or even forever. Automation allows you to abstract some of the working parts, giving you some consistency in these mixed environments.

I think this all comes down to levels. You may never have to install or patch a database again, but that isn’t the whole story as far as automation is concerned.

Cheers

Tim…

Why Automation Matters : Continuous Improvement and Buying Time For Yourself

In a previous post I talked about lost time associated with manual processes and hand-offs between teams, but in this post I want to look at time from a different perspective…

One of the big arguments I hear against automation is, “We don’t have time to work on automation!” If you don’t think you have time now, how are you going to make time when you have to deal with another 10, 100, 1000 servers? I don’t know about you, but every week I have to deal with more stuff, not less. If I waited for a convenient opportunity to work on automation, it would never happen.

I think a lot of this comes from a flawed mindset as far as automation is concerned. There seems to be this attitude that we have to get from where we are now to a full blown private cloud solution in a single step/project. Instead we should be trying to incrementally improve things. This idea of continuous improvement has been part of agile and DevOps for years. It doesn’t have to be great leaps. It can be small incremental changes, that over time amount to something big.

As DBAs we might think of these baby steps along the path.

  1. Stop doing GUI software installations. Instead focus on silent installations of software. This is probably the easiest thing a DBA can automate because Oracle have done all the hard work for you. Silent installations of most Oracle products are really easy. What’s more you can put your scripts into Git and you have a proper record of what you did. It’s surprising how many people have no record of what they did and how they did it!
  2. Stop doing GUI database creation. Just like the silent installations, Oracle has done all the hard work for you here. You can use the DBCA in silent mode and once again put your scripts into Git.
  3. Once you’ve got 1 & 2 sorted you can start thinking about scripting post installation and post DB creation tasks including patching and other operational tasks.
  4. Once that’s all running, you have some basic automation in place that you can improve over time. At that point you might want to try out some alternatives, like switching from shell scripting to something like Ansible.
  5. Once you’ve got some stable and reliable automation, you can start trying to integrate it with your System Administrator’s build and patching processes.
  6. At some point you might want to make some of these operations self-service, so users/developers don’t even have to ask you anymore, they initiate the automation themselves. You will still be responsible for creating and maintaining the automation, but you don’t have to be there 24×7 to manually run the scripts.

If all you have time to do is steps 1 & 2, you will still have saved yourself some time, as you can start a script and do something else until it finishes. That could be working on improving your automation. Added to that you’ve improved the reliability of those steps of the process, so you won’t have to redo things if you’ve made mistakes, or live with those mistakes forever.
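
To make step 2 a little more concrete, here is a minimal sketch of how a silent DBCA call might be kept as a small script you can put into Git. It only builds and prints the command, it doesn’t run anything. The function name, the database name and the placeholder passwords are my own inventions for illustration, and you should check the options against the DBCA documentation for your release before using anything like this.

```python
import shlex


def build_dbca_command(db_name, sys_password, system_password,
                       template="General_Purpose.dbc",
                       character_set="AL32UTF8"):
    """Return the argument list for a silent DBCA database creation.

    Hypothetical helper for illustration only. The values passed in are
    assumptions, and real passwords should not be hard-coded in a script.
    """
    return [
        "dbca", "-silent", "-createDatabase",
        "-templateName", template,
        "-gdbname", db_name,
        "-sid", db_name,
        "-characterSet", character_set,
        "-sysPassword", sys_password,
        "-systemPassword", system_password,
    ]


# Print the command so it can be reviewed before anyone runs it.
command = build_dbca_command("cdb1", "PlaceholderPwd1", "PlaceholderPwd1")
print(shlex.join(command))
```

The point isn’t the script itself. It’s that the command now lives in version control, so there is a record of what was done and how.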

I understand that company politics or internal company structure can make some things difficult. Believe me, I run into this all the time. I can build whole systems with a single command at home, but at work I have to break up some of my automation processes into separate steps because other teams have to perform certain tasks, and they haven’t exposed their work to me as a service. As frustrating as that can be, it doesn’t stop you improving your work, and maybe trying to gently nudge those around you to join in.

Remember, each time you save some time by automating something, invest some of that “saved” time into improving your automation, and automation skill set. Over time this will allow you to take on more work with the same number of staff, or even branch out into some new areas, so you aren’t left out on a limb when everything becomes autonomous. 🙂

Cheers

Tim…

Why Automation Matters : Keep Your Auditors Happy

We were having some of our systems audited recently. I’ve been part of this sort of thing a few times over the years, but I was pleasantly surprised by a number of the questions that were being asked during this most recent session. I’ll paraphrase some of their questions and my answers.

  • How do you document your build processes? We have silent build scripts (where possible). The same build scripts are used for each build, with the differences just being environment variables. If a silent build is not possible, we do a semi-silent build, and use screen grabs for the manual bits.
  • How do you keep control of your builds and configuration? Everything goes into a cloud-based Git repository, and we have a local Git server as a backup of the cloud service.
  • How do you manage change through your systems? Requests, Incidents, Enhancements and Tasks are raised and placed in a Task Board, which is kind of like a Kanban board, in ServiceNow. Progression of changes to production requires a Change Request (CR), which may need to be agreed by the Change Advisory Board (CAB), depending on the nature of the change.
  • Are changes applied manually, or using automation? This was followed by a long discussion about what we can and can’t automate because of our internal company structure and politics. It also covered the differences between automation of changes to infrastructure and in the development process. 🙂
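
As a rough illustration of the first answer, this is one way the “same script, different environment variables” idea could look. It’s a sketch, not what we actually run. ORACLE_BASE and ORACLE_SID are standard Oracle environment variables, but DATA_DIR, the default values and the function name are assumptions I’ve made up for the example.

```python
import os


def load_build_settings(env=None):
    """Read build settings from environment variables, with defaults.

    Hypothetical example. DATA_DIR and all the default values are
    assumptions made for illustration.
    """
    env = os.environ if env is None else env
    return {
        "oracle_base": env.get("ORACLE_BASE", "/u01/app/oracle"),
        "oracle_sid": env.get("ORACLE_SID", "cdb1"),
        "data_dir": env.get("DATA_DIR", "/u02/oradata"),
    }


# The build logic stays the same. Only the environment differs per build.
settings = load_build_settings({"ORACLE_SID": "test1"})
print(settings["oracle_sid"])
```

Passing the environment in as a parameter makes it easy to see, and test, exactly what differs between two builds.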

There was a lot more than this, but this is enough to make my point.

The reactions to the answers can be summarised as follows.

  • When we had a repeatable automated process we got a thumbs up.
  • When we had a process that was semi-automated, because full automation was impractical (because of additional constraints), we got a thumbs up.
  • When we had a manual process, we got a thumbs down, because maintaining consistency and preventing human error is really hard when using manual processes.

In a sentence I guess I could say, if you are using DevOps you pass. If you are not using DevOps you fail. 🙂

Now I am coming to this with a certain level of bias in favour of DevOps, and that bias may be skewing my interpretation of the situation somewhat, but that is how it felt to me.

As I said earlier, I was pleasantly surprised by this angle. It’s nice to see the auditors giving me some extra leverage, and it certainly feels like automation is a good way to keep the auditors happy! 🙂

Cheers

Tim…

PS. This is just one part of the whole auditing process.

Autonomous Database : “Hand-tuning doesn’t scale”

I was at a talk by Chris Thalinger at Oracle Code One called “Performance tuning Twitter services with Graal and machine learning”. One of the things he said was, “Hand-tuning doesn’t scale”, and it brought into focus some of the things that have been going on in the Autonomous Database, which is closer to my world. 🙂

In my post called It’s not all about you! I discussed the reaction to a new feature mentioned in the ACE Director briefing. It has been spoken about publicly now, so I guess I’m allowed to mention it by name. The feature in question was Automatic Index Tuning that (insert Safe Harbour slide) might be in Oracle 19c, or in an autonomous database cloud service in the future. Once this feature was mentioned, the list of questions started to pile up, before we even knew what it was or how it was implemented. I mentioned my own reaction to this specific feature, but let’s look at this in the broader sense of autonomous services generally.

As I mentioned, watching Chris’ session brought all this into focus for me. Sorry if I’m stating the obvious, but here goes.

  • Even if I were capable of doing a better job than an automatic performance tuning feature, and I’m not sure I can, that is just me. Is everyone else I work with at my level of understanding or better? Is everyone else who works with the database across the world at my level of understanding or better? If the answer to that is no, then there is a need for feature X, whatever it is.
  • Let’s say I have a group of really skilled people that can do better than automatic feature X. Are they constantly looking at the system, trying to get the best performance possible, or are they working on hundreds or thousands of different targets, and actually spending very little time on each? As their workload grows, which it invariably will, will they be able to spend more or less time looking at each specific feature?

I know there are some consultants that get to go in and solve specific problems on specific systems, and maybe those folks will look down on automatic performance tuning features, but I have to look after loads of disparate systems and I get 30 seconds to get something done before I have to move on. I like to think I’m pretty good at Oracle database stuff, but I need all the help I can get if I want to keep things running smoothly.

When a new automatic feature is announced we always get super intense about it, which usually results in a lot of wailing and gnashing of teeth. Sometimes this is for very good reason, as the early incarnations of some features have been problematic, but over time they often become the norm. Think about the following, and what life would be like without them…

For some people reading this, they may never have experienced life without these features. Believe me, it wasn’t pretty! 🙂

Whether it’s a specific automatic feature, like Automatic Index Tuning, or a grander vision, like the Autonomous Database family of cloud services, this is part of the natural evolution of the database. At *some point* in the future I can see all my databases running on the cloud and all of them being some form of autonomous service, regardless of which cloud provider is running them.

Cheers

Tim…

PS. I hope people understand the spirit of what I’m saying, but I feel the need to include a few statements, as some people on Twitter seemed to get the wrong end of the stick.

  • I’m not saying you can do a rubbish job and leave it up to an automatic tuning feature to fix your crap application. Bad software always runs badly, no matter what you do with it. You might be able to mask some of the problems, but you don’t fix them.
  • I’m not suggesting the development process shouldn’t include proper testing, including unit, integration, UAT and performance testing. See previous point.
  • The more you know about your platform, the better job you can do, even if you have automatic features to help you.

Why Automation Sucks

I’ve written a number of posts about how important automation is (here), but thought I would mention something that happened on Friday…

If you’ve followed the blog you will know I recently released a hands-off installation of 18c RAC. On Friday I received a pull request on GitHub suggesting a change to the GI software installation, switching the public and private network device names from “enp0s8” & “enp0s9” to “eth1” & “eth2”. I hadn’t seen ethernet device names like this since RHEL5/OL5, so I was convinced it was a mistake. I was in the middle of writing a comment on the pull request, but I thought I’d better do my homework, rather than making an assumption.

When I got home I fired up the existing RAC on my Linux server and the device names were “enp0s8” & “enp0s9” as I expected. I fired up a new DNS server on my laptop and the public network device name was “enp0s8”. At this point I was convinced I was correct, but I thought I’d better just make sure. I was using the latest VirtualBox and Vagrant releases, but I noticed the output from the build said there was a newer version of the “bento/oracle-7.5” Vagrant box. I did a “vagrant box update”, “vagrant destroy -f” and “vagrant up”, and the new DNS server was built with a public network device name of “eth1”.

I went back to my Linux server, did the same and the device names were eth1 and eth2 on there too. So the “bento/oracle-7.5” box update had caused the ethernet device naming to change. As a result of this I accepted the pull request, made a similar change to another config file, then destroyed and started a rebuild of my RAC. I went out to visit some mates and watched the first two episodes of Ozark season 2. By the time I came back I had a new RAC up and running.

Why does this mean automation sucks? Well it doesn’t really, but it does show you how your automation can be broken by things outside of your control. That could be a change to a Vagrant box, a Docker image, the Kickstart file from your system admins, or even a cloud service. You have to keep on top of this stuff. You can’t just define it and forget about it. Having said that, I would rather find a problem like this in an automated build than in a manual process… 🙂
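
One way to catch this sort of thing early is a small pre-flight check that compares the device names your config expects with the ones that actually exist. This is only a sketch of the idea, not something from my builds. The function names are made up for the example, and it assumes a host where Python’s socket.if_nameindex() is available.

```python
import socket


def missing_devices(expected, available):
    """Return the expected device names that are not present, in order."""
    present = set(available)
    return [name for name in expected if name not in present]


def current_devices():
    """List the network interface names on this host."""
    return [name for _index, name in socket.if_nameindex()]


# Example with the names from this post: a config that still expects the
# old names, checked against a box that now uses the new ones.
print(missing_devices(["enp0s8", "enp0s9"], ["lo", "eth0", "eth1", "eth2"]))
```

In a real build you would pass current_devices() as the second argument and stop the build with a clear message if anything is missing, rather than letting the installation fail halfway through.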

Cheers

Tim…

Why Automation Matters : You’re only a tweak away!

Once you start on the automation path it becomes progressively easier to automate new things because you will build up a collection of stuff you can tweak to create the new stuff. Here’s an example…

A little while ago I did a hands-off build of 18c RAC using VirtualBox and Vagrant (here). I had to solve a few little problems, but for the most part it was piecing together a bunch of stuff I already had, like silent installations and database creations, so no big drama. Probably the most complicated thing was deciding how I wanted to organise things, which I’m sure will change over time…

Fast forward to a few days ago when I wanted to play around with 18c Data Guard. I actually took the RAC build and used that as the basis of this little project. Obviously some things were chopped out and some things were added, but a lot of it was just reused, which saved a bunch of thinking and hassle.

Once I had the 18c build working, a couple of changes in the config files and I had a 12cR2 build up and running (here). Some config file changes and a couple of minor scripting changes and I had a 12cR1 build up and running. You get the picture.

Of course you will occasionally have to do something that constitutes a step change, or you will decide to take a completely new approach* and have to go back to basics, but a lot of the time you are only a tweak away from the next automation.

Cheers

Tim…

* I’m playing around with Ansible at the moment, so maybe I’ll end up redoing these using Ansible. Maybe not. We’ll see. 🙂

Why Automation Matters : ITIL

ITIL is quite a divisive subject in the geek world. Once the subject is raised most of us geeks start channelling our inner cowboy/cowgirl thinking we don’t need the shackles of a formal process, because we know what we are doing and don’t make mistakes. Once something goes wrong everyone looks around saying, “I didn’t do anything!”

Despite how annoying it can seem at times, you need something like ITIL for a couple of reasons:

  • It’s easy to be blinkered. I see so many people who can’t see beyond their own goals, even if that means riding roughshod over other projects and the needs of the business. You need something in place to control this.
  • You need a paper trail. As soon as something goes wrong you need to know what’s changed. If you ask people you will hear a resounding chorus of “I’ve not changed anything!”, sometimes followed by, “… except…”. It’s a lot easier to get to the bottom of issues if you know exactly what has happened and in what order.

So what’s this got to do with automation? The vast majority of ITIL related tasks I’m forced to do should be invisible to me. Imagine the build and deployment of a new version of an application to a development server. The process might look like this.

  • Someone requests a new deployment manually, or it is done automatically on a schedule or triggered by a commit.
  • A new deployment request is raised.
  • The code is pulled from source control.
  • The build is completed and the result of the build is recorded in the deployment request.
  • Automated testing is used to test the new build. Let’s assume it’s all successful for the rest of the list. The results of the testing are recorded in the deployment request.
  • Artefacts from the build are stored in some form of artefact store.
  • The newly built application is deployed to the application server.
  • The result of the deployment is recorded in the deployment request.
  • Any necessary changes to the CMDB are recorded.
  • The deployment request is closed as successful.

None of those tasks require a human. For a development server the changes are all pre-approved, and all the ITIL “work” is automated, so you have the full paper trail, even for your development servers.
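
The list above could be sketched something like this. It’s a toy example under my own assumptions, not a real ITIL tool or CI system. The DeploymentRequest class, the step names and the stand-in actions are all hypothetical. The only point is that every step writes its result to the deployment request, so the paper trail is produced without anyone filling in a form.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentRequest:
    """A hypothetical deployment record that collects its own paper trail."""
    application: str
    status: str = "open"
    log: list = field(default_factory=list)

    def record(self, step, outcome):
        """Append a (step, outcome) entry to the paper trail."""
        self.log.append((step, outcome))


def run_deployment(application, steps):
    """Run each (name, action) step in order, recording every result.

    Stops at the first failing step and closes the request as failed.
    """
    request = DeploymentRequest(application)
    for name, action in steps:
        try:
            outcome = action()
        except Exception as exc:
            request.record(name, f"failed: {exc}")
            request.status = "closed (failed)"
            return request
        request.record(name, outcome)
    request.status = "closed (successful)"
    return request


# Stand-in actions. Real ones would call source control, the build tool,
# the test suite, the artefact store and the CMDB.
steps = [
    ("pull code", lambda: "ok"),
    ("build", lambda: "ok"),
    ("automated tests", lambda: "ok"),
    ("store artefacts", lambda: "ok"),
    ("deploy", lambda: "ok"),
    ("update CMDB", lambda: "ok"),
]
request = run_deployment("my-app", steps)
print(request.status)
for step, outcome in request.log:
    print(step, outcome)
```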

It’s hard to be annoyed by ITIL if most of it is invisible to you! 🙂

IMHO the biggest problem with ITIL is bad implementation: over-complication, an emphasis on manual operations and a lack of continuous improvement. If ITIL is hindering your progress you are doing it wrong. The same could be said about lots of things. 🙂 One way of solving this is to automate the problem out of existence.

Cheers

Tim…