Automation : Increasing pressure on an existing constraint

Yesterday I tweeted that I was reminded of this post.

I was reminded of it because of something that is happening to me at work, so I thought I would talk about it here.

Production lines

If you’ve read anything about DevOps you will know it came from manufacturing. If you didn’t know that, check out The Goal, which was the basis for The Phoenix Project.

Manufacturing typically uses production lines made up of multiple stations, where each station performs a specific task, and the product moves forward through the stations until it is complete. If one station is slower than the others, it will become a blocker. Product will start to queue up behind it, and downstream stations will become starved. So production lines only work well if they are planned to enable a consistent flow.

What’s more, you can only sell the product when it is completed, so we could describe the product as having no value until it is finished and with the customer.

It’s not just about manufacturing

The processes born out of manufacturing also work really well for other industries. I would suggest most things can be described like a production line, and as soon as you do that, a similar approach can be adopted to identify and fix the constraints.

Tech is an obvious one, and variations on DevOps have grown in popularity because of that. My sister in law works in a medical practice, and we’ve discussed the processes used in the administration side of it. As a result of our discussions she’s started to use Kanban boards to visualise the flow of work.

Back to my problem

So with the concept of production lines and constraints in mind, we jump back to my problem…

We are in the process of replacing loads of Oracle Linux 7 servers with either Oracle Linux 8 or Oracle Linux 9, depending on vendor support. The first three links in that production line are as follows.

  • VMs are provisioned.
  • Operating system customisations are run.
  • Database or app server is installed and configured.

We are not perfect, but we’ve got pretty good at this part of the production line. When we are finished, the systems have to be tested, and go through various processes to get them live and used by the business. Those parts of the production line that follow us are slow due to a number of factors. So our improvements in the production line have just made things harder for those steps that follow us. A simplified view of the Kanban board looks something like this.

The obvious thing to do here is focus on the constraints and start working on downstream links in the chain, to improve the overall flow. That’s where we hit organisation and culture barriers, so we are pretty much stumped…

Thoughts

I’m pretty happy with what we’ve done over the last few years. We’ve definitely improved several aspects of our systems because of automation, but at the same time I can’t help thinking we’ve achieved nothing because ultimately the work is not getting completed as fast as it’s started.

I wrote here about reframing the goal, and I have to do that as copium. Unfortunately copium only goes so far… 🙂

Cheers

Tim…

DevOps : Why do you focus on Flow and Automation?

A few weeks ago I did a DevOps “Lunch & Learn” talk inside my company. I’m not trying to claim I’m “Billy DevOps”, but I need our company to move in that direction or we will die. While I was preparing for that talk I did some Googling for the common complaints about DevOps talks and training courses, hoping to avoid them, and what I got back was a bunch of people complaining about the heavy focus on “The Principle of Flow”, specifically the automation piece of that.

Take a look at any conference agenda and the DevOps talks are mostly focused on automation, whether that’s builds (Ansible, Terraform, Vagrant, Cloud) or Continuous Integration/Deployment (CI/CD). Automation is certainly a part of DevOps, but DevOps isn’t just automation. So why do people focus on the automation aspect of DevOps so much?

I started off my talk by saying something like this,

“I Googled the common complaints about DevOps talks and training, and most people complained about too much focus on the Principle of Flow and automation. I’m going to do the same thing!”

So why did I come to this conclusion? Well, there are a few reasons.

The principle of feedback relies on you getting that feedback and doing something with it, like improving applications or processes etc. I realise I’m being simplistic, but how can you do anything with that feedback unless you have flow sorted? If it takes you weeks/months/years to effect basic changes because of bad flow, the feedback becomes almost irrelevant. It only serves to demotivate you, as you identify all the problems, with no way of fixing them.

The principle of continual learning and experimentation relies on flow being sorted. If you can’t quickly and reliably build kit and deploy apps to it, how do you expect to be able to experiment and learn new things? I discussed this point in a post called Why Automation Matters : Reducing the Cost of Failure.

It feels like the majority of people I speak to don’t have basic automation sorted yet. In public they talk a good talk, but behind the scenes the processes in their company suck just as bad as ours. Either that, or they have one aspect of automation sorted, which they talk about all the time, forgetting to mention the other manual processes that persist.

Let’s not forget that most of the people I see talking about DevOps come from a technical background, and I suspect are probably more interested in the automation aspects of DevOps than the process side of things. Also, in their current roles they have the ability to influence automation more than they do process change, so they are going to focus on a fight they have a chance of winning.

I think it’s important to emphasise to people that automation isn’t the be-all and end-all of DevOps, but that doesn’t stop it being the fun bit for me. 🙂

Check out the rest of the series here.

Cheers

Tim…

Birmingham Digital & DevOps Meetup : August 2019

Yesterday evening I went along to the Birmingham Digital & DevOps Meetup for the first time. It followed the usual meetup format of quick intro, talk, break, talk then home.

First up was Elton Stoneman from Docker with “Just What Is A “Service Mesh”, And If I Get One Will It Make Everything OK?” The session started by describing the problems associated with communication between the building blocks of a system, and how a service mesh can alleviate some of them. It then moved on to some service mesh demos using Istio. These included examples of altering the routing of traffic to do canary testing and targeting specific groups etc.

Elton was really honest about the learning curve, issues and overhead associated with this sort of setup. One comment I really liked was when he showed a slide containing the following, saying that often people assume there is a progression from left to right.

Meaning people assume you learn Docker, then you need some form of orchestration so you learn Swarm. From there you naturally progress to Kubernetes and once you understand that, you will inevitably move on to a service mesh using something like Istio. Elton’s point was you don’t *have to* continue on this progression. You can step off at any point once you’ve achieved the functionality you need. I think this is a really important point and I can see it reflected in what I do with Docker. We’ve got some things that stop at just using Docker containers, with no orchestration at all. I work on a project that requires some orchestration, so we use Swarm, which is really easy to use. So far I’ve had no reason to go beyond Swarm, and even considering a service mesh is so far down the line for us. I’m not discounting the relevance of these for everyone, but they don’t make sense for me at this point.

It was a really good session and I learned a lot. You can check out Elton’s blog here.

After the break it was James Relph with “Container Security Fundamentals”. This started of with a basic introduction to containers, using that as an entry point to explain how containers can be problematic from a security perspective, and what you can do to reduce the impact. He covered a lot of stuff, some of which I already do, some I know about and some stuff that was new to me. This is not an exhaustive list.

  • Don’t automatically trust images from Docker hub. Do your due diligence, even when they are from a reputable source.
  • Use your own image repository. He mentioned ECR amongst others. This can be used for your own images, but also base images from Docker Hub, which you have verified.
  • Don’t use “latest”, but use specific tagged versions. Latest gives you all the latest fixes, but all the latest bugs too. You should test and verify before you let images out into your infrastructure.
  • Multi-stage builds to reduce the size of containers and minimise the attack surface. Basically, copy out what you need and leave the crap behind.
  • Using sidecar containers to provide specific services, allowing your application images to remain more focused. The sidecar images can be maintained by feature experts to make sure they are as secure as possible.
  • Scanning images using Clair, amongst other things, to check for dodgy software. One of the audience mentioned Anchore.
  • Using microVMs like Firecracker to provide additional isolation, whilst retaining the ease of use of containers. I’ve not played with this, but I have tried Kata Containers, which seems to do pretty much the same.

There was a lot in there!

I was a bit nervous going into the event thinking it would all go over my head, and some of it probably did, but it was cool. I got to speak to a few people before the event, during the break and at the end. It seemed like there were quite a mix of people there from beginners in these areas upward, so I didn’t feel out of place.

A few times I found myself thinking, that’s great, but what do I do about my 3rd party applications? I’ve written before (here) about how 3rd party apps screw everything up. 🙂

Thanks to Elton Stoneman and James Relph for taking the time to come and speak to us. Thanks to the folks from BrumDigitalDevOps for organising the event, and to Capgemini UK for sponsoring the event.

Cheers

Tim…

I’m 2% DevOps, 3% agile and 4% automated because of 3rd party apps…

I was having a discussion with my boss about the impact of 3rd party apps on the way we work, and how difficult things are when you have to deal with 3rd party apps, as opposed to just writing your own software.

It’s easier to do things well when you are in control of all the pieces. Most of the examples you see are people writing their own software, typically on new projects. That’s very different to dealing with old projects and 3rd party apps. I’ll give you some examples, without trashing the companies responsible for this.

Example 1

Our student system is provided by a 3rd party. The company in question has a really antiquated way of delivering applications. In recent years they’ve tried to resolve this by writing their own delivery mechanism, made up of some custom software and Jenkins. The problem is, this is just a wrapper over the old process, so it is not the most reliable tool in the world. Someone like me would describe it as putting lipstick on a pig.

In addition to that, you have to use a GUI to perform the operations. At this point there is no API to allow you to script operations, which makes building them into a bigger process really problematic. We have internal development which is gradually moving to something resembling CI/CD, but it will never truly meet that goal, because we have to include manual management of things because of the limitations of the 3rd party software.

I’m sure long-term customers see the new delivery mechanism as a great improvement, but it’s not something you would deliver for a new product. It’s less painful than it was, but not really good.

Example 2

We have a publishing system that is written in Java and runs on Tomcat. It is so close to being hands-off, but there are a couple of problems.

  • When you deploy a new version, it starts in maintenance mode and you need manual interaction to click an OK button a few times on a web-based maintenance screen. I’ve never “not clicked” the OK button, so I just want a “just do it” option, so I can let it get on with it.
  • When some features are enabled by the power users, the next restart of the application flips you into maintenance mode. We’ve had P1 incidents because a host failure has caused the VM to start on a new host, and because a user has enabled a new feature in the app, the automatic startup stalls, waiting for me to click the OK button a few times.

There are some other annoyances, which impact on availability and possible topology, as well. There is no way to resolve these because of limitations in the application. All we can do is raise enhancement requests with the vendor.

I could go on with more examples, but I think you get the message.

So what do you do?

It can be quite disheartening when you want to do things well, but you have to keep compromising because of factors outside your control. You have to try not to give up, and just keep plugging away.

  • Don’t make unrealistic comparisons between your environment and others. There’s no point comparing your mixed environment to a software house. I’ve worked in both. They are very different. Take what works. Ditch what doesn’t.
  • Semi-automated processes are better than processes that are 100% manual. Maybe you can use Robotic Process Automation (RPA) to automate what is essentially a manual process.
  • Try to make sure these considerations become part of your procurement process, or you will keep buying crap.
  • Try to be creative and find workarounds, don’t just bury your head in the sand. There’s always *something* you can do to improve things.
  • Even if something is terrible, that doesn’t stop you improving the processes around it.

I guess you should focus on the values, rather than trying to exactly match some prescriptive ideal.

Good luck!

I wrote a series of posts about automation here.

Cheers

Tim…

PS. I’m pretty sure my boss is reading this laughing, as I’m following none of this advice myself, but instead stomping round the place like a thirteen year old having a strop because, “Everything is crap!” 🙂

Why Automation Matters : Lost Time

Sorry for stating what I hope is the obvious, but automation matters. It’s mattered for a long time, but the constant mention of Cloud and DevOps over the last few years has thrown even more emphasis on automation.

If you are not sure why automation matters, I would just like to give you an example of the bad old days, which might be the current time for some who are still doing everything manually, with separate teams responsible for each stage of the process.

Lost Time : Handover/Handoff Lag

In the diagram below we can see all the stages you might go through to deploy a new application server. Every time the colour of the box changes, it means a handover to a different team.

So there are a few things to consider here.

  • Each team is likely to have different priorities, so a handover between teams is not necessarily instantaneous. The next stage may be waiting on a queue for a long time. Potentially days. Don’t even get me started on things waiting for people to return from holiday…
  • Even if an individual team has created build scripts and has done their best to automate their tasks, if it is relying on them to pick something off a queue to initiate it, there will still be a handover delay.
  • When things are done manually people make mistakes. It doesn’t matter how good the people are, they will mess up occasionally. That is why the diagram includes a testing failure, and the process being redirected back through several teams to diagnose and fix the issue. This results in even more work. Specifically, unplanned work.
  • Manual processes are just slower. Running an installer and clicking the “Next” button a few times takes longer than just running a script. If you have to type responses and make choices it’s going to take even more time, and don’t forget that bit about human error…

Let’s contrast this to the “perfect” automated setup, where the request triggers an automated process to deliver the new service.

In this example, the request initiates an automated workflow that completes the action and delivers the finished product without any human intervention along the path. The automation takes as long as it takes, and ultimately has to do most of the same work, but there is no added handover lag in this process.

I think it’s fair to say you would be expecting a modern version of this process to complete in a matter of minutes, but I’ve seen the manual process take weeks or even months, not because of “work time”, but because of the idle handover time and human processes involved…

They Took Our Jobs!

At first glance it might seem like this is a problem if you are employed in any of the teams responsible for doing the manual tasks. Surely the automation is going to mean job cuts right? That depends really. In order to fully automate the delivery of any service you are going to have to design and build the blocks that will be threaded together to produce the final solution. This is not always simple. Depending on your current setup this might mean having fewer, more highly skilled people, or it might require more people in total. It’s impossible to know without knowing the requirements and the current staffing levels. Also, cloud provides a lot of the building blocks for you, so if you go that route there may be less work to do in total.

Even if the number of people doesn’t change as part of the automation process, you are getting work through the door quicker, so you are adding value to the business at a higher rate. From a DevOps perspective you have not added value to the business until you’ve delivered something to them. All the hours spent getting part of the build done equate to zero value to the business…

But we are doing OK without automation!

No you’re not! You’re drowning! You just don’t know it yet!

I never hear people saying they haven’t got enough projects waiting. I always hear people saying they have to shelve things because they don’t have time staff/resources/time to do them.

As your processes get more efficient you should be able to reallocate staff to projects that add value to the business, rather than wasting their lives on clicking the “Next” button.

If your process stays inefficient you will always be saying you are short of staff and every new project will require yet another round of internal recruitment or outsourcing.

Is this DevOps?

I’m hesitant to use the term DevOps as it can be a bit of a divisive term. I struggle to see how anyone who understands DevOps can’t see the benefits, but I think many people don’t know what it means, and without the understanding the word is useless…

Certainly automation is one piece of the DevOps puzzle, but equally if you have company resistance to the term DevOps, feel free to ignore it and focus on trying to sell the individual benefits of DevOps, one of which is improved automation…

Check out the rest of the series here.

Cheers

Tim…

PS. Conway’s Law – Melvin Conway 1967

“organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”

In language I can understand.

If you have 10 departments, each process will have 10 sections, with a hand-off between them.

Do you even [ Agile | DevOps ] bruh?

It seems I can’t turn around without getting myself involved in some discussion about Agile or DevOps these days.

agile-devops-meme2

I agree with many of the concepts and the aims of Agile, DevOps, Continuous Delivery etc. I find it hard to believe anyone wouldn’t see value in what they are trying to promote. As always, it is how people interpret and implement them that makes all the difference.

It’s just like religion. They all seem to be pretty sound at heart, but let a few lunatics and fundamentalists loose on them and next thing you know…

Things like Agile and DevOps have arisen to address perceived problems. If your organisation doesn’t suffer from those problems, you may not need to consider them, or you may already be doing something like them without knowing you are. 🙂

Your company can be agile, without following Scrum or Kanban. You will inevitably have arrived at similar patterns I guess. Likewise, your streamlining of process, automation of testing and deployment, good communication between silos (if present) may leave you wondering what all the DevOps fuss is about.

I am both a fan and hater of Agile and DevOps. I’m a fan of what they are able to achieve when used correctly. I’m a hater of all the bullshit that surrounds them!

Rant over. 🙂

Cheers

Tim…