Automation: Increasing pressure on an existing constraint

Yesterday I tweeted that I was reminded of this post.

I was reminded of it because of something that is happening to me at work, so I thought I would talk about it here.

Production lines

If you’ve read anything about DevOps, you will know it came from manufacturing. If you didn’t know that, check out The Goal, which was the basis for The Phoenix Project.

Manufacturing typically uses production lines made up of multiple stations, where each station performs a specific task, and the product moves forward through the stations until it is complete. If one station is slower than the others, it will become a blocker. Product will start to queue up behind it, and downstream stations will become starved. So production lines only work well if they are planned to enable a consistent flow.
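To make the bottleneck idea concrete, here is a minimal Python sketch of a three-station line. The station names, capacities and arrival rate are hypothetical, and the model is deliberately crude. Each tick, new work arrives at the first station, each station finishes up to its capacity, and a unit moves at most one station per tick. It shows that output is capped by the slowest station, that work piles up in front of it, and that speeding up a station that is not the constraint does not increase output.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    capacity: int   # units this station can complete per tick (hypothetical)
    queue: int = 0  # units waiting in front of this station


def run_line(stations: list[Station], arrivals_per_tick: int, ticks: int) -> int:
    """Simulate a simple production line and return how many units were finished."""
    finished = 0
    for _ in range(ticks):
        stations[0].queue += arrivals_per_tick  # new raw work enters the line
        # Walk the line from the end backwards, so work handed to the next
        # station is not processed again in the same tick.
        for i in reversed(range(len(stations))):
            station = stations[i]
            done = min(station.capacity, station.queue)
            station.queue -= done
            if i + 1 < len(stations):
                stations[i + 1].queue += done  # hand over to the next station
            else:
                finished += done  # the product is complete and can be sold
    return finished


def make_line(a: int, b: int, c: int) -> list[Station]:
    """Three stations with hypothetical capacities a, b and c."""
    return [Station("A", a), Station("B", b), Station("C", c)]


baseline = make_line(5, 2, 5)  # B is the slowest station
print(run_line(baseline, arrivals_per_tick=5, ticks=10))  # 16
print(baseline[1].queue)  # 32 units queued in front of B
print(run_line(make_line(10, 2, 5), 5, 10))  # 16: a faster A changes nothing
print(run_line(make_line(5, 5, 5), 5, 10))   # 40: fixing B raises output
```

Only the change to station B moves the finished count, which is the whole argument for finding and working on the constraint first.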

What’s more, you can only sell the product when it is completed, so we could describe the product as having no value until it is finished and with the customer.

It’s not just about manufacturing

The processes born out of manufacturing also work really well for other industries. I would suggest most things can be described like a production line, and as soon as you do that, a similar approach can be adopted to identify and fix the constraints.

Tech is an obvious one, and variations on DevOps have grown in popularity because of that. My sister-in-law works in a medical practice, and we’ve discussed the processes used in the administration side of it. As a result of our discussions she’s started to use Kanban boards to visualise the flow of work.

Back to my problem

So with the concept of production lines and constraints in mind, we jump back to my problem…

We are in the process of replacing loads of Oracle Linux 7 servers with either Oracle Linux 8 or Oracle Linux 9, depending on vendor support. The first three links in that production line are as follows.

  • VMs are provisioned.
  • Operating system customisations are run.
  • Database or app server is installed and configured.

We are not perfect, but we’ve got pretty good at this part of the production line. When we are finished, the systems have to be tested, and go through various processes to get them live and used by the business. Those parts of the production line that follow us are slow due to a number of factors. So our improvements in the production line have just made things harder for those steps that follow us. A simplified view of the Kanban board looks something like this.
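The effect of speeding up our part of the line can be sketched in a few lines of Python. This is a toy model with hypothetical figures, not how we actually measure flow: assume that before automation we handed over 3 servers a week, after it 6, while the downstream testing and go-live steps can still only finish 3 a week.

```python
def simulate_handover(built_per_week: int, tested_per_week: int, weeks: int) -> tuple[int, int]:
    """Return (servers finished, servers still waiting) after the given number of weeks.

    built_per_week:  servers our automated build hands over each week (hypothetical)
    tested_per_week: servers the downstream steps can test and take live each week (hypothetical)
    """
    waiting = 0
    finished = 0
    for _ in range(weeks):
        waiting += built_per_week             # new servers join the downstream queue
        done = min(tested_per_week, waiting)  # downstream capacity is the constraint
        waiting -= done
        finished += done
    return finished, waiting


TESTED_PER_WEEK = 3  # hypothetical downstream capacity

before = simulate_handover(3, TESTED_PER_WEEK, weeks=8)
after = simulate_handover(6, TESTED_PER_WEEK, weeks=8)
print(before)  # (24, 0)
print(after)   # (24, 24): same number live, bigger backlog

# Assuming a first-in, first-out queue, the last server handed over in
# week 8 waits this many more weeks before it goes live.
print(after[1] / TESTED_PER_WEEK)  # 8.0
```

Doubling our output leaves the number of servers going live unchanged. All it does is grow the queue in front of the downstream steps, and with it the wait for anything joining that queue.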

The obvious thing to do here is focus on the constraints and start working on downstream links in the chain, to improve the overall flow. That’s where we hit organisational and cultural barriers, so we are pretty much stumped…

Thoughts

I’m pretty happy with what we’ve done over the last few years. We’ve definitely improved several aspects of our systems because of automation, but at the same time I can’t help thinking we’ve achieved nothing because ultimately the work is not getting completed as fast as it’s started.

I wrote here about reframing the goal, and I have to fall back on that as copium. Unfortunately copium only goes so far… 🙂

Cheers

Tim…

Business As Usual (BAU) vs Project Work

I’ve had this conversation so many times over the years, and I’m sure I’ve written about elements of it several times, but I’m not sure I’ve written about it specifically before, so here goes…

In every organisation there are conflicting demands from project work and business as usual (BAU) tasks. In case you’ve not heard the term BAU, here’s a definition.

Business as usual (BAU), the normal execution of standard functional operations within an organisation, forms a possible contrast to projects or programmes which might introduce change. BAU may also stand in contradistinction to external events which may have the effect of unsettling or distracting those inside an organisation.

Wikipedia

Swimming to stay still

I’ve written about this before in a rather angry post here. Working in tech is like swimming upstream. As soon as you stop swimming, you’re moving backwards. Let’s say today I have fully patched, secure and supported systems. How long can I do nothing before that is no longer the case?

  • Patches: For the operating system we may be talking days. For the database or an application server it may be months.
  • Support: Depending on where we are in the product support cycle, this is likely to be years, but we are fast approaching some important deadlines for Oracle Linux 7 and Oracle Database 19c, so we can’t wait much longer before we have a lot of work on our hands.

Standing still takes effort. If you are not putting in that effort, you are moving backwards, even if you don’t realise it.

BAU is invisible. Project work is shiny!

The problem with BAU is that it is invisible to the users. Often you patch or upgrade a system and what they get after all that work is “exactly” what they had before. Of course, it’s not exactly the same, but from their perspective it is. That can seem like a lot of time and effort for no perceivable gain, especially if you are asking for their resources to test things.

In comparison, project work often gives them something new and shiny to play with. It has perceivable value.

Faced with allocating resources between the two, you know there is going to be a lot of pressure to deliver new and shiny stuff over keeping the lights on…

Automation doesn’t solve all BAU tasks

Automation can certainly help with a lot of BAU work, but not everything. Even if you could magically upgrade a system without any downtime, somebody still needs to test the system afterwards. Automation also brings with it some additional BAU. Here are some examples I’ve seen recently.

Terraform: Providers change on a regular basis, which means you might be provisioning your kit using an old version of a provider. Over time you will start to see deprecation warnings, and have to update your provider. In some cases this will break your builds and you will have to do some refactoring. You need to revisit your Terraform builds on a regular basis, or put your automation at risk. Even updates to the Terraform executable can introduce issues. One upgrade desupported a backend we were using, which broke our project.

TeamCity: We use TeamCity for some on-prem automations. There are regular updates to this tool, usually because of security issues in some of the components such as Java or Tomcat. We have similar issues with Jenkins.

GitHub Actions: Have you seen the list of warnings and deprecation notices for actions that are currently working fine? You are going to have to revisit those, or your lovely automations will break!

Cloud Platforms: If done well, cloud platforms can alleviate a lot of the operational BAU work, but they are not immune to issues around upgrades and deprecations. Many of us have lived through the desupport of previous generations of cloud architecture. Upgrades of the underlying tech still have to happen, and your systems have to be tested when they do.

This is not meant to be an exhaustive list. Just examples.

BAU as internal projects

Just process your BAU as internal projects, and then they can be scheduled like any other project, right? That sounds fine, but someone still has to prioritise the projects, and BAU is not shiny! It’s still going to come in second place.

Education is the key

The only answer to this is education. The business has to understand that BAU is not negotiable. You have to be strong enough to push back on unrealistic demands, to make sure that systems remain up to date and safe. This can only be successful if you educate everyone on the importance of this boring and often invisible stuff…

A word about Oracle

It would seem wrong to finish this post without a mention of Oracle.

Most of the database upgrades I’ve done in my life have only happened because of the pressure of needing to stay in support. They have not been because people want the shiny features that are offered by the new release. That’s not to say they won’t end up being used down the line, but that is not the driving force a lot of the time. Stable and bug free beats new features every time!

In Oracle, just like any other company, there are competing pressures. I think most of us customers need to have 23c available so we can start the upgrade process to stay in support long term. There are no doubt a small number of important customers demanding features that will delay the release, and probably introduce bugs that affect all of us. There are also features that would sound cool for the sales teams and in keynote presentations. Who wins? Probably not me, as I’m not working for an important customer, and I’m not in sales and marketing. 🙂

Conclusion

BAU is boring and often invisible, but it has to be done!

Cheers

Tim…