Birmingham Digital & DevOps Meetup : August 2019

Yesterday evening I went along to the Birmingham Digital & DevOps Meetup for the first time. It followed the usual meetup format of quick intro, talk, break, talk then home.

First up was Elton Stoneman from Docker with “Just What Is A “Service Mesh”, And If I Get One Will It Make Everything OK?” The session started by describing the problems associated with communication between the building blocks of a system, and how a service mesh can alleviate some of them. It then moved on to some service mesh demos using Istio. These included examples of altering the routing of traffic to do canary testing and targeting specific groups etc.

Elton was really honest about the learning curve, issues and overhead associated with this sort of setup. One comment I really liked was when he showed a slide containing the following, saying that often people assume there is a progression from left to right.

Meaning people assume you learn Docker, then you need some form of orchestration so you learn Swarm. From there you naturally progress to Kubernetes and once you understand that, you will inevitably move on to a service mesh using something like Istio. Elton’s point was you don’t *have to* continue on this progression. You can step off at any point once you’ve achieved the functionality you need. I think this is a really important point and I can see it reflected in what I do with Docker. We’ve got some things that stop at just using Docker containers, with no orchestration at all. I work on a project that requires some orchestration, so we use Swarm, which is really easy to use. So far I’ve had no reason to go beyond Swarm, and even considering a service mesh is so far down the line for us. I’m not discounting the relevance of these for everyone, but they don’t make sense for me at this point.

It was a really good session and I learned a lot. You can check out Elton’s blog here.

After the break it was James Relph with “Container Security Fundamentals”. This started of with a basic introduction to containers, using that as an entry point to explain how containers can be problematic from a security perspective, and what you can do to reduce the impact. He covered a lot of stuff, some of which I already do, some I know about and some stuff that was new to me. This is not an exhaustive list.

  • Don’t automatically trust images from Docker hub. Do your due diligence, even when they are from a reputable source.
  • Use your own image repository. He mentioned ECR amongst others. This can be used for your own images, but also base images from Docker Hub, which you have verified.
  • Don’t use “latest”, but use specific tagged versions. Latest gives you all the latest fixes, but all the latest bugs too. You should test and verify before you let images out into your infrastructure.
  • Multi-stage builds to reduce the size of containers and minimise the attack surface. Basically, copy out what you need and leave the crap behind.
  • Using sidecar containers to provide specific services, allowing your application images to remain more focused. The sidecar images can be maintained by feature experts to make sure they are as secure as possible.
  • Scanning images using Clair, amongst other things, to check for dodgy software. One of the audience mentioned Anchore.
  • Using microVMs like Firecracker to provide additional isolation, whilst retaining the ease of use of containers. I’ve not played with this, but I have tried Kata Containers, which seems to do pretty much the same.

There was a lot in there!

I was a bit nervous going into the event thinking it would all go over my head, and some of it probably did, but it was cool. I got to speak to a few people before the event, during the break and at the end. It seemed like there were quite a mix of people there from beginners in these areas upward, so I didn’t feel out of place.

A few times I found myself thinking, that’s great, but what do I do about my 3rd party applications? I’ve written before (here) about how 3rd party apps screw everything up. 🙂

Thanks to Elton Stoneman and James Relph for taking the time to come and speak to us. Thanks to the folks from BrumDigitalDevOps for organising the event, and to Capgemini UK for sponsoring the event.

Cheers

Tim…

I’m 2% DevOps, 3% agile and 4% automated because of 3rd party apps…

I was having a discussion with my boss about the impact of 3rd party apps on the way we work, and how difficult things are when you have to deal with 3rd party apps, as opposed to just writing your own software.

It’s easier to do things well when you are in control of all the pieces. Most of the examples you see are people writing their own software, typically on new projects. That’s very different to dealing with old projects and 3rd party apps. I’ll give you some examples, without trashing the companies responsible for this.

Example 1

Our student system is provided by a 3rd party. The company in question has a really antiquated way of delivering applications. In recent years they’ve tried to resolve this by writing their own delivery mechanism, made up of some custom software and Jenkins. The problem is, this is just a wrapper over the old process, so it is not the most reliable tool in the world. Someone like me would describe it as putting lipstick on a pig.

In addition to that, you have to use a GUI to perform the operations. At this point there is no API to allow you to script operations, which makes building them into a bigger process really problematic. We have internal development which is gradually moving to something resembling CI/CD, but it will never truly meet that goal, because we have to include manual management of things because of the limitations of the 3rd party software.

I’m sure long-term customers see the new delivery mechanism as a great improvement, but it’s not something you would deliver for a new product. It’s less painful than it was, but not really good.

Example 2

We have a publishing system that is written in Java and runs on Tomcat. It is so close to being hands-off, but there are a couple of problems.

  • When you deploy a new version, it starts in maintenance mode and you need manual interaction to click an OK button a few times on a web-based maintenance screen. I’ve never “not clicked” the OK button, so I just want a “just do it” option, so I can let it get on with it.
  • When some features are enabled by the power users, the next restart of the application flips you into maintenance mode. We’ve had P1 incidents because a host failure has caused the VM to start on a new host, and because a user has enabled a new feature in the app, the automatic startup stalls, waiting for me to click the OK button a few times.

There are some other annoyances, which impact on availability and possible topology, as well. There is no way to resolve these because of limitations in the application. All we can do is raise enhancement requests with the vendor.

I could go on with more examples, but I think you get the message.

So what do you do?

It can be quite disheartening when you want to do things well, but you have to keep compromising because of factors outside your control. You have to try not to give up, and just keep plugging away.

  • Don’t make unrealistic comparisons between your environment and others. There’s no point comparing your mixed environment to a software house. I’ve worked in both. They are very different. Take what works. Ditch what doesn’t.
  • Semi-automated processes are better than processes that are 100% manual. Maybe you can use Robotic Process Automation (RPA) to automate what is essentially a manual process.
  • Try to make sure these considerations become part of your procurement process, or you will keep buying crap.
  • Try to be creative and find workarounds, don’t just bury your head in the sand. There’s always *something* you can do to improve things.
  • Even if something is terrible, that doesn’t stop you improving the processes around it.

I guess you should focus on the values, rather than trying to exactly match some prescriptive ideal.

Good luck!

Cheers

Tim…

PS. I’m pretty sure my boss is reading this laughing, as I’m following none of this advice myself, but instead stomping round the place like a thirteen year old having a strop because, “Everything is crap!” 🙂

Why Automation Matters : Lost Time

Sorry for stating what I hope is the obvious, but automation matters. It’s mattered for a long time, but the constant mention of Cloud and DevOps over the last few years has thrown even more emphasis on automation.

If you are not sure why automation matters, I would just like to give you an example of the bad old days, which might be the current time for some who are still doing everything manually, with separate teams responsible for each stage of the process.

Lost Time : Handover/Handoff Lag

In the diagram below we can see all the stages you might go through to deploy a new application server. Every time the colour of the box changes, it means a handover to a different team.

So there are a few things to consider here.

  • Each team is likely to have different priorities, so a handover between teams is not necessarily instantaneous. The next stage may be waiting on a queue for a long time. Potentially days. Don’t even get me started on things waiting for people to return from holiday…
  • Even if an individual team has created build scripts and has done their best to automate their tasks, if it is relying on them to pick something off a queue to initiate it, there will still be a handover delay.
  • When things are done manually people make mistakes. It doesn’t matter how good the people are, they will mess up occasionally. That is why the diagram includes a testing failure, and the process being redirected back through several teams to diagnose and fix the issue. This results in even more work. Specifically, unplanned work.
  • Manual processes are just slower. Running an installer and clicking the “Next” button a few times takes longer than just running a script. If you have to type responses and make choices it’s going to take even more time, and don’t forget that bit about human error…

Let’s contrast this to the “perfect” automated setup, where the request triggers an automated process to deliver the new service.

In this example, the request initiates an automated workflow that completes the action and delivers the finished product without any human intervention along the path. The automation takes as long as it takes, and ultimately has to do most of the same work, but there is no added handover lag in this process.

I think it’s fair to say you would be expecting a modern version of this process to complete in a matter of minutes, but I’ve seen the manual process take weeks or even months, not because of “work time”, but because of the idle handover time and human processes involved…

They Took Our Jobs!

At first glance it might seem like this is a problem if you are employed in any of the teams responsible for doing the manual tasks. Surely the automation is going to mean job cuts right? That depends really. In order to fully automate the delivery of any service you are going to have to design and build the blocks that will be threaded together to produce the final solution. This is not always simple. Depending on your current setup this might mean having fewer, more highly skilled people, or it might require more people in total. It’s impossible to know without knowing the requirements and the current staffing levels. Also, cloud provides a lot of the building blocks for you, so if you go that route there may be less work to do in total.

Even if the number of people doesn’t change as part of the automation process, you are getting work through the door quicker, so you are adding value to the business at a higher rate. From a DevOps perspective you have not added value to the business until you’ve delivered something to them. All the hours spent getting part of the build done equate to zero value to the business…

But we are doing OK without automation!

No you’re not! You’re drowning! You just don’t know it yet!

I never hear people saying they haven’t got enough projects waiting. I always hear people saying they have to shelve things because they don’t have time staff/resources/time to do them.

As your processes get more efficient you should be able to reallocate staff to projects that add value to the business, rather than wasting their lives on clicking the “Next” button.

If your process stays inefficient you will always be saying you are short of staff and every new project will require yet another round of internal recruitment or outsourcing.

Is this DevOps?

I’m hesitant to use the term DevOps as it can be a bit of a divisive term. I struggle to see how anyone who understands DevOps can’t see the benefits, but I think many people don’t know what it means, and without the understanding the word is useless…

Certainly automation is one piece of the DevOps puzzle, but equally if you have company resistance to the term DevOps, feel free to ignore it and focus on trying to sell the individual benefits of DevOps, one of which is improved automation…

Check out the rest of the series here.

Cheers

Tim…

Do you even [ Agile | DevOps ] bruh?

It seems I can’t turn around without getting myself involved in some discussion about Agile or DevOps these days.

agile-devops-meme2

I agree with many of the concepts and the aims of Agile, DevOps, Continuous Delivery etc. I find it hard to believe anyone wouldn’t see value in what they are trying to promote. As always, it is how people interpret and implement them that makes all the difference.

It’s just like religion. They all seem to be pretty sound at heart, but let a few lunatics and fundamentalists loose on them and next thing you know…

Things like Agile and DevOps have arisen to address perceived problems. If your organisation doesn’t suffer from those problems, you may not need to consider them, or you may already be doing something like them without knowing you are. 🙂

Your company can be agile, without following Scrum or Kanban. You will inevitably have arrived at similar patterns I guess. Likewise, your streamlining of process, automation of testing and deployment, good communication between silos (if present) may leave you wondering what all the DevOps fuss is about.

I am both a fan and hater of Agile and DevOps. I’m a fan of what they are able to achieve when used correctly. I’m a hater of all the bullshit that surrounds them!

Rant over. 🙂

Cheers

Tim…