Why Automation Matters : Your automation is your documentation

How many times have you been following a process defined in a knowledge base note, only to find something has been omitted or is unclear? This may be because of empire building, laziness or, more often, oversight, but the result is the same. Unless your processes are well documented, you always run the risk of progress grinding to a halt when “the right person” is not present.

One of the great things about automation is that, by definition, every step of the process must be defined. If person X is on holiday, you can be 100% sure all the steps to complete the automation are present.

Of course, this doesn’t stop people writing stupid, ugly and hard-to-understand code, but your development process should have some control over that. Even if it doesn’t, you know the answer is there. It must be there, because the process works.

Does that mean you don’t need to document automations?

No. The automations should be self-documenting. I don’t mean that in the sense that “my code is so good it’s self-documenting”, which is the calling card of the lazy developer. I mean that automation code in your source control system should be documented. Markdown is a quick and easy way to document our code, and the good thing about it is that it stays close to the code. It’s right next to it in the repository. When we change our code, we should revise our documentation where necessary. The documentation becomes a living document, rather than some 1000-page Word document that nobody ever reads and nobody updates.

But documentation sucks!

Documentation gets a really bad rap because most people are doing it wrong. They fall into one of these traps.

They produce too little, which means people are unlikely to find what they are looking for.
They produce too much, which makes it daunting to look at, so nobody bothers.
It’s overly formal, which is dry and boring.
It’s hidden, or at least separate from the code, so people might not even know it exists.

Basic pointers and how-to examples are good enough for 90% of the cases, so make these the focus of your documentation. You can always give links to more detailed documentation for those people that need a little more. The context is slightly different, but this post on Structuring Content should give you some clues about how to structure your documentation. After all, documentation is content. 🙂

Conclusion

For some companies, an automation or infrastructure-as-code project may well be the first time in their history that they have got everything about a process documented. That has to be a positive result for the company!

Check out the rest of the series here.

Why Automation Matters

Cheers

Tim…

Judgement of Worth : I got an award, but…

Our company has some yearly awards. We nominate people who we think have made a significant contribution to some aspect of the company. There is a longlist of nominees for each category. Some of those people get shortlisted, and a winner is picked for each category. The longlist and shortlist are also published internally, so people get to see if they’ve been nominated, which is nice…

This year I won an award for Continuous Improvement. That’s good for my ego, which is no doubt in part why I am mentioning it, and it shows you *can* teach an old dog new tricks. But it made me want to talk about the judgement of worth again, which I’ve mentioned before in posts like visibility vs results. Judging worth is really difficult, and it can expose a whole load of bias.

For someone like me it is really easy to stand out. I like to think I’m pretty good at what I do, but I’ve also developed good written and verbal communication skills over the years. On top of that, I’m always talking about what I’m doing, so I’m no shrinking violet. That gives me an unfair advantage over someone who may be doing better work than me, but is not so loud about it. This is why I ask to be removed from the running each year…

From a personal development perspective, I give some advice about improving your writing and speaking skills in the visibility vs results post. Please develop these skills so you get what you deserve!

From a management perspective, it’s really important you start judging your staff based on what they actually deliver, not on what they say. It’s the only way to be sure you are not being biased when you are thinking about who is productive and who is not. This story, “‘Self-promoters’ do nothing but still get ahead at work”, shows that it’s really easy to make people think you are working hard, even if you are not.

This links back to my gripes about the negative stories related to productivity and working from home. Managers have to be able to track results/deliverables. It’s the only way to know if someone is doing busy work, or actually being productive.

I’d just like to say thank you to my colleagues who nominated me for this award. I don’t want to sound ungrateful, but next time perhaps it would be better to use your nomination to lift up someone who needs the exposure a little more than me. 😉

Cheers

Tim…

I don’t accept guest posts (except this)

I get a lot of requests from people wanting to post their articles on oracle-base.com, and they are always answered with a firm no! One of the things I’ve always said is that I like the fact it is one person’s voice. I think that means something…

A few weeks ago someone asked me if I had written anything about Oracle partitioning from a design perspective. Not how to do it. I’ve got loads of those types of articles (here). Since I had nothing suitable, I went to Twitter to ask for advice, and Patrick Wolf pointed me to this article by Lothar Flatz. It’s in German, and the person who asked me the original question was a German speaker, so it was a fortunate turn of events. 🙂

Google Translate did a pretty good job of translating it, which allowed me to read it, but when I spoke with Lothar we agreed an English translation of the article would be nice. It’s really easy to overlook posts that aren’t written in your native language, even with Google Translate available…

Fast forward a few weeks and Lothar was looking for somewhere to publish his translated post. I suggested a few options he should check out, and in a rather unusual turn of events I also offered to publish it on my website. 🙂

After speaking to some other people he came back to me and here is the first, and probably only, guest post on my website. I hope you like it.

Partitioning From a Design Perspective (A Practical Guide) – By Lothar Flatz

As a general point, I’m a firm believer in owning your content, so I would always suggest people publish on their own blog.

As for my website, my stance is still the same. I don’t accept guest posts (except this). 🙂

Cheers

Tim…

Test Cases Are Important : Again…

Over the weekend I was reminded of the importance of test cases again. I’ve written about this before, with probably the most consistent post here.

Learning New Things : We don’t need no freakin’ test case!

If you want my opinions of test cases, go and read that. In this post I want to tell a little story to demonstrate why I think test cases are important. I’m going to keep things a bit vague, as I don’t want to openly criticise the person in question. They actually did an OK job of expressing their issue, but it did highlight some things.

The issue

I got a question that suggested a recent upgrade on Autonomous Database had altered the behaviour of something. Every time you patch or upgrade software there is a possibility of change, whether it is an intentional behaviour change or a bug. The person had provided some evidence that did seem to suggest there was an issue, so my interest was piqued. Unfortunately there wasn’t a test case, but I have an article that includes a similar test case, so I was able to knock something together pretty quickly.

The test case

The first thing I did was try out my test case in an on-prem installation. Yes, I know the potential issue related to Autonomous Database, but I wanted to see the test case working, and prove to myself that what I believed to be true actually was true. Think of this as an experimental control. The test case ran as expected on-prem, which was good.

I then moved on to trying to replicate this issue on Autonomous Database. Most cloud databases come with some restrictions on what you can do, so my test case setup was not ideal for running on Autonomous Database. I had to revise the setup a little. APEX to the rescue. Before you ask, yes, I did rerun the on-prem test with the new setup to make sure the control was still valid. 🙂 Having set up the base data, I was able to run the code for my test case, and it ran just the same in Autonomous Database as it did on-prem.

Test case vs in situ

In the original question, the issue was directed specifically at one feature, but my test case seemed to prove that feature was working as expected. When you are doing scientific experiments you try to reduce the number of variables. Too many variables and you have no idea what caused the result, so you can’t come to any reasonable conclusion. I was trying to prove a feature works as expected, so I reduced the possible variables to the point where I was specifically testing that feature, and it seems to work as expected.

So that’s the end of it, right? Well, not really. I’ll use an example from biology to explain. Biology is complicated because living things are complicated. When you are doing chemistry, it’s possible to isolate specific compounds and put them together in a controlled manner to observe an interaction. Kind of like my test case, this is a very controlled approach. Living things have loads of working parts, and you can’t isolate things without killing the organism, so you have to deal with the fact you are working in the middle of a whole bunch of interactions. You still try to minimise your variables, but you have to accept that you can’t always do that to the extent you would like. Instead, you may define experimental controls that rule out the other possible causes of the result. This distinction between running things in isolation and in situ is really important. What has this got to do with my test case?

My test case is run in isolation. The original poster clearly has an issue in their system. Perhaps there is something in their system that affects the way this feature works, so although the feature works in isolation, maybe there is an issue in situ. My test case hasn’t resolved the issue. It has just ticked one thing off the list of possible causes.

What next?

Having ticked the base functionality off the list of possible causes, we then have to move up a level and incorporate more elements of the system, to see how that affects things. That could be as simple as them using different session parameters, or it could be something more fundamental with the design of their system. Hell, it might be their data is corrupt for all I know (I really hope not).

It’s also possible that when looking at the “next layer”, we notice something that shows the original test case is invalid. That sort of thing happens.

What’s the point of this post?

I often get the impression some people think problem solving is some kind of witchcraft. In reality it is painstaking, meticulous work. I look at all the people I think are good, and they have one thing in common. They put in the work and grind through this stuff. Yes, you get quicker the more experienced you get, but you still have to put in the effort. People are often looking for the “magic button” to solve their problem, but there isn’t one. If it were that simple, it would already be built into every piece of software you use. 🙂

You need a test case, even if all it does is prove your initial conclusion was wrong and allow you to focus your attention elsewhere.

Once again, the question that prompted this post was not bad. The person did an OK job of expressing themselves. This is just a post that was triggered by that interaction. If we get to the bottom of their issue, and it proves to be interesting, I will probably write up something more specific about it. 🙂

You might find it useful to read these, as they are relevant to this post.

Cheers

Tim…

Update: This looks like it is a data/understanding issue. It’s starting to sound like the data isn’t stored in the format the original poster expected, so they are trying to do something with it that is impossible. If this is the case, it’s nothing to do with the upgrade.

User Experience – A Little Rant Again

I had a bit of a negative post yesterday, and it got me thinking of these two posts.

I’ve said some of this stuff before, but I want to bring it all into a slightly different context.

Good user experience is…

Good user experience is not about forcing me to follow your atomic implementation of a feature. What do I mean by this? Let’s take a look at some examples of getting it right (IMHO) from Oracle.

An Oracle REST Data Services (ORDS) web service is made up of a module with one or more templates, each with one or more handlers. We could define our service by defining the module, template and handler separately, because that’s how the underlying implementation of an ORDS web service works. That’s fine, but it’s a bit over the top if I just want a quick little web service based on a query. That’s why we have been given the DEFINE_SERVICE procedure, which lets us do all that other stuff in a single call (see here). For simple services this is all you need.

The database scheduler is a complex beast. We can define loads of things like schedules, programs, arguments, job classes, windows and of course jobs. That’s fine, but 99% of the time we just want a simple job, and the CREATE_JOB procedure allows us to create one in a single call (see here).

In both cases we can choose between doing things the long/verbose way, or use the “cheat code” and do stuff in a single call. This is exactly the sort of thing I like when I’m using a feature. I want to know the flexibility is there if I need it, but if 99% of my requirements don’t, I want the cheat code so I can do what I need to do and move on. This also makes the feature more accessible to new people…
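The “cheat code” idea is essentially the facade pattern: keep the layered building blocks, but add one convenience call that assembles them for the common case. Below is a minimal Python sketch of that design. The Module, Template and Handler classes and the define_* functions are hypothetical stand-ins loosely modelled on the module/template/handler hierarchy described above. They are not the real ORDS or DBMS_SCHEDULER APIs.

```python
from dataclasses import dataclass, field


# Hypothetical layered model, loosely mirroring an ORDS-style hierarchy:
# a module holds templates, and a template holds handlers.
@dataclass
class Handler:
    method: str   # e.g. "GET"
    source: str   # the query the handler runs


@dataclass
class Template:
    pattern: str
    handlers: list[Handler] = field(default_factory=list)


@dataclass
class Module:
    name: str
    base_path: str
    templates: list[Template] = field(default_factory=list)


# The "long/verbose way": one call per layer.
def define_module(name: str, base_path: str) -> Module:
    return Module(name, base_path)


def define_template(module: Module, pattern: str) -> Template:
    template = Template(pattern)
    module.templates.append(template)
    return template


def define_handler(template: Template, method: str, source: str) -> Handler:
    handler = Handler(method, source)
    template.handlers.append(handler)
    return handler


# The "cheat code": a facade that builds all three layers in a single call,
# while the per-layer functions stay available for anyone who needs them.
def define_service(name: str, base_path: str, pattern: str,
                   method: str, source: str) -> Module:
    module = define_module(name, base_path)
    define_handler(define_template(module, pattern), method, source)
    return module


verbose = define_module("hr", "/hr/")
define_handler(define_template(verbose, "employees/"), "GET", "SELECT * FROM emp")

quick = define_service("hr", "/hr/", "employees/", "GET", "SELECT * FROM emp")
print(quick == verbose)  # dataclasses compare field by field, so this prints True
```

The point of the design is that the facade adds no new capability. It just removes the ceremony for the 99% case, and the flexibility is still there underneath.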

Good user experience is not…

As I mentioned above, good user experience is not about forcing me to follow your atomic implementation of a feature. Someone should take a step back and ask what would “normal” users really like? The answer is probably giving them an option to zone out and get all the prerequisites and config done for them. It’s not making them spend a weekend trying to figure out how to enable a feature, then finding it doesn’t really work properly anyway…

I’m a generalist. I have to work with lots of different products. When I open the docs and see a list of prerequisites, followed by multiple commands to actually set stuff up, my heart sinks. I want a “we’ll do everything for you” option. That might sound funny because of my history, and if companies did that it would make my website redundant, but I feel we need to progress. We’ve been doing this nuts & bolts crap for too long. If I can automate it, Oracle can automate it. If Oracle can automate it, why don’t they?

I don’t want to name and shame. I’ve made some positive comments about Oracle in the previous section, but you know there are a whole bunch of Oracle things I could use as examples of what not to do. Oracle aren’t alone here. It applies to lots of other companies too.

But Tim, I want to…

I can already hear people typing their responses about their need to be in control and their obsessive configuration disorder. Shut up. I don’t care. The chances are, if you are reading this post, you are probably one of the people that can cope with all this tech, but there are many people who can’t, or don’t want to.

Won’t someone think of the children customers

I am a customer. My company is a customer. I can think of two things my company refuse to pay for because the functionality in question is unsupportable if I’m not available. Those are features we need, but won’t buy because they are overly complex for normal people to do well.

Now you can argue that cloud services will solve all these issues, but cloud adoption varies between regions, and maybe people will not pick your cloud. My company are a perfect example of that. We’ve consolidated on Azure, and although we don’t run any Oracle databases there yet, if we run Oracle on the cloud, it will probably be on Azure.

If you heard someone say, “I used to get a punch in the face every day, but now it’s only once a week. Things are good!”, you would think they were crazy. Less bad is not the same as good. I often think companies bring out tools and utilities that are “less bad” than what they had before. Not actually “good”. If you have been in the trenches, “less bad” might feel “good”, but it’s not.

I realise this is another rant, but I think it’s a subject that is worth a rant. I use a wide variety of tech from a number of companies, and some of them get on my nerves at times, because it feels like user experience is an afterthought. You can’t expect everyone to no-life the learning curve for your products. I’m just saying how I feel, and I’m pretty sure I’m not alone here!

Cheers

Tim…

PS. I’m playing a bit fast and loose with the term user experience in this post, but hopefully you get what I mean…

DG PDB : Oracle Data Guard for Pluggable Databases in 21c, and why you shouldn’t use it!

Last month you may have noticed the announcement of DG PDB. It’s Data Guard for PDBs, rather than CDBs, introduced in the Oracle 21.7 release update.

How do you use it?

I’ve had a play around with it, which resulted in this article.

DG PDB : Oracle Data Guard per Pluggable Database in Oracle Database 21c (21.7 Onward)

I also did a Vagrant build, which includes the build of the servers, the database software installations, the database creations and the prerequisites, so you can jump straight to the DG PDB configuration section in the article. You can find that build here.

https://github.com/oraclebase/vagrant/tree/master/dataguard/ol8_21_dgpdb

So that’s the basic how-to covered, and I really do mean “basic”. There is a lot more people might want to do with it, but it’s beyond the scope of my little Vagrant build.

What do I think about it?

Well I guess you know how this is going to go, based on the title of this post. I don’t like it (yet), but I’m going to try and be a bit more constructive than that.

It is buggy! : I know 21c is an innovation release, but this is a HA/DR solution, so it needs to be bulletproof, and it’s not. There are a number of issues when you come to use it, which will most likely be fixed in a future release update or database version, but for now this is a production release and I don’t feel like it is a safe pair of hands for real PDBs. That is a *very* bad look for a product of this type.
Is it Data Guard? Really? : Once again, I know this is the first release of this functionality, but there are so many restrictions associated with it that I wonder if it even deserves the Data Guard name. I feel like it should have been a little further along the development cycle before it got associated with the name Data Guard. The first time someone has a problem with DG PDB, and they definitely will, they are going to say some choice words about Data Guard. I know this because I was throwing around some expletives when I was having issues with it. That’s not a feeling you want associated with one of your HA/DR products…
Is this even scriptable? : The “add pluggable database” step in the DGMGRL utility prompts for a password. Maybe I’ve missed something, but I didn’t see a way to supply this silently. If it needs human interaction it is not finished. If someone can explain to me what I’ve missed, that would be good. If I’m correct and this can’t be done silently, it needs some new arguments. It doesn’t help that it consistently fails the first time you call it, but works the second time. Ouch!
Is the standby PDB created or not? : When you run the “add pluggable database” command (and it eventually works) it creates the standby PDB, but there are no datafiles associated with it. You have to copy those across yourself. The default action should be to copy the files across. Oracle could do it quite easily with the DBMS_FILE_TRANSFER package, or some variant of a hot clone. There should still be an option to not do the datafile copy, as some people might want to move the files manually, and that is fine, but to not have a way to include the file copy seems a bit crappy.
Ease of use : Oracle 21c introduced the PREPARE FOR DATA GUARD command, which automates a whole bunch of prerequisites for Data Guard setup, which is a really nice touch. Of course DG PDB has many of the same prerequisites, so you can use PREPARE FOR DATA GUARD to get yourself in a good place to start, but I still feel like there are too many moving parts to get going. I really want it to be a single command that takes me from zero to hero. I could say this about many other Oracle features too, but that’s the subject of another blog post.
Overall : A few times I got myself into such a mess the only thing I could do was rebuild the whole environment. That’s not a good look for a HA/DR product!

Conclusion

I’m sorry if I’ve pissed off any of the folks that worked on this feature. It wasn’t my intention. I just don’t think this is ready to be included in a production release yet. I’m hoping I can sing the praises of a future release of this functionality!

Cheers

Tim…

PS. I’m reminded of this post about The Definition of Done.

The Efficiency Paradox : Same Term, Different Meanings?

I’ve recently come across the term “Efficiency Paradox” being used by different people, in different contexts, and giving it different meanings. I thought I would share them…

The Efficiency Paradox in Economics

In 1865 William Stanley Jevons postulated that the more efficient a process gets in terms of resource usage, the higher the demand you will see for that resource. This seems counterintuitive, as you might think the more efficient a process is, the fewer resources it requires, and therefore total resource usage would go down. Instead, as a process becomes more efficient, costs drop and that drives demand, which can eventually result in more of the resource being needed. This is the heart of the Jevons Paradox, which some sources also refer to as the Efficiency Paradox.
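To see how the arithmetic can work out, here is a minimal Python sketch of this rebound effect. It assumes a constant-elasticity demand curve for the service the resource provides (for example, heat from coal). That is a modern textbook simplification chosen for illustration, not Jevons’s own formulation, and the numbers are arbitrary. Under this assumption, resource use scales with efficiency to the power (elasticity − 1). So when demand is elastic enough (elasticity above 1), doubling efficiency increases total resource use. Below 1, resource use still falls.

```python
def resource_used(efficiency: float, elasticity: float,
                  resource_price: float = 1.0, k: float = 100.0) -> float:
    """Resource consumed under an ASSUMED constant-elasticity demand curve.

    efficiency: units of useful service produced per unit of resource.
    elasticity: how strongly demand for the service responds to its cost.
    k: arbitrary scale factor for the size of the market.
    """
    cost_per_service = resource_price / efficiency   # efficiency makes the service cheaper
    service_demand = k * cost_per_service ** (-elasticity)
    return service_demand / efficiency                # resource needed to meet that demand


for e in (0.5, 1.5):
    before = resource_used(efficiency=1.0, elasticity=e)
    after = resource_used(efficiency=2.0, elasticity=e)
    print(f"elasticity={e}: {before:.1f} -> {after:.1f}")
# elasticity=0.5: 100.0 -> 70.7
# elasticity=1.5: 100.0 -> 141.4
```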

Cost is always an important factor. We are currently going through a cost of living crisis in the UK. One of the factors affecting this is the cost of power. People are looking at ways to save money by reducing their power usage. When power was cheaper many people didn’t pay any attention to saving power. Now it is expensive, every little bit matters.

The Efficiency Paradox in Gaming

I watched a video by Josh Strife Hays, where he discussed the impact of guides and wikis on the enjoyment of playing video games. The term “grinding” refers to highly repetitive tasks that you must do to achieve a goal. Grinding can be exhausting, but when you achieve your goal there is a sense of satisfaction. Some games require a certain amount of detective work, where you try to figure out how to progress. Once again, the effort of trying to figure out how to progress can be exhausting, but the satisfaction of completing the task is high.

With the advent of the internet, there are loads of videos, wikis and websites dedicated to helping you play games in the most efficient manner possible. They might tell you how to minimise grinding, or flat out give you the answer to puzzles. These guides reduce the amount of time it takes to complete a task in a game, making you more efficient, but because you never have to deal with the adversity, you never get the same satisfaction when you complete a task.

So the efficiency paradox in gaming is, the more efficient you make the game play in an attempt to help the player, the less satisfying the game may become. Of course, if it is too difficult, they might leave before completing the task. There is a balance…

The Efficiency Paradox in Lean/DevOps

The previous versions of the efficiency paradox are interesting to me, but it’s this version that is really the subject of this post. In Lean and DevOps, people often use the term efficiency paradox in subtly different ways, but invariably they are talking about resource efficiency vs. flow efficiency: specifically, how a focus on maximising resource efficiency results in lower overall efficiency.

Lost Time : I’ve written about lost time before here. Lost time is about work waiting in queues while passing between siloed teams. Each team believe they are working efficiently because they have maximised their resource usage. All their staff are busy, but the flow of work through the chain of teams is really slow, making the flow efficiency low, and reducing the quality of work.

To counter this, some companies reorganise into self-sufficient teams that can progress a piece of work from conception to delivery, thereby reducing the hand-offs between teams. Some may retain the silos, but use automation to deliver self-service tools and APIs that others can pick up and run with. Regardless of the approach taken, they are attempting to reduce the constraints on the flow of work to improve flow efficiency.
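Flow efficiency is commonly measured as the time a piece of work is actively being worked on, divided by its total lead time, including the time spent waiting in queues. Here is a minimal Python sketch of that calculation, using made-up wait and work times for three hypothetical siloed teams. The team names and hours are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    team: str
    wait_hours: float   # time the work sits in this team's queue
    work_hours: float   # time someone is actively working on it


def flow_efficiency(stages: list[Stage]) -> float:
    """Active work time divided by total lead time across all hand-offs."""
    active = sum(s.work_hours for s in stages)
    lead_time = sum(s.wait_hours + s.work_hours for s in stages)
    return active / lead_time


# Hypothetical chain of hand-offs between siloed teams.
chain = [
    Stage("database", wait_hours=40, work_hours=2),
    Stage("network", wait_hours=30, work_hours=1),
    Stage("middleware", wait_hours=25, work_hours=2),
]
print(f"{flow_efficiency(chain):.1%}")  # 5 active hours out of 100 -> 5.0%
```

Every team in this example could report that its staff are fully busy, yet the work item is only being actively worked on for 5% of its 100-hour journey. Removing hand-offs or queues attacks the 95%, which is where the time is actually going.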

Work in Process (WIP) : I’ve written about WIP before here. Most people can’t multitask well. Some think they can, but they just end up doing multiple things badly. Problem solving requires concentration, and it’s really hard to concentrate when you are being distracted by multiple projects competing for your attention. In an ideal world your WIP would be 1. You would work on a single task to completion, then move to another task. This can be tricky if you are constantly being blocked by other people and teams/silos, but it’s also complicated when a company wants to see staff being “busy” all the time.

In an effort to maximise resource (staff) usage, they increase the WIP, so there is always something for people to do. On the surface this increased resource usage looks like it is increasing efficiency, but often the work degenerates to the point where people are spinning plates, without actually achieving much. Also, the reduced attention on a specific task results in a lower quality of work. You should always try to keep WIP low, even if that means some people have idle time. If the idle time is excessive, it probably means there is a problem somewhere else in the organisation that needs to be fixed. Deal with the root cause, not the symptom!
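One way to see why high WIP slows delivery is Little’s Law, a standard queueing result: for a stable system, average lead time equals average WIP divided by average throughput. The sketch below assumes a hypothetical team whose throughput stays at 5 tasks a week regardless of WIP. That assumption is generous, because context switching usually lowers throughput as well.

```python
def average_lead_time(wip: float, throughput_per_week: float) -> float:
    """Little's Law for a stable system: lead time = WIP / throughput."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_week


# Hypothetical team that finishes 5 tasks a week, whatever its WIP.
for wip in (5, 20):
    print(f"WIP={wip}: {average_lead_time(wip, 5):.1f} weeks")
# WIP=5: 1.0 weeks
# WIP=20: 4.0 weeks
```

Quadrupling WIP makes everyone look busier, but each task now takes four times as long to come out the other end, and not a single extra task is finished per week.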

Ultimately, we have to forget about resource efficiency and focus on flow efficiency. We can often see this in our normal working lives. We have some processes we know are going to take weeks to complete. Then there is a “Priority 1” incident that means we need to complete something ASAP. The P1 instantly aligns every team, giving them the same priorities, and we race through and complete the work in a few hours. Once the P1 is over, every person goes back to their silo, with their differing priorities, and the process returns to taking weeks to complete. We have proved it can be done in hours, but because of politics and the internal company organisation, fast never becomes the norm.

Conclusion

I thought it was interesting that the term efficiency paradox came up in three different contexts in the space of a few days, so I thought I would write about it. The important point is that in all three cases people are often making incorrect assumptions about efficiency. People are doing things they think will improve efficiency, but they are not getting the desired result.

Cheers

Tim…

Why Automation Matters : It doesn’t get much simpler than this!

Just a little story about something that happened recently…

The July Oracle quarterly security patches dropped, along with the downstream OpenJDK releases a little later. As soon as the OpenJDK release dropped, I updated all our non-production Oracle REST Data Services (ORDS) middle tiers to the latest and greatest. Production will follow shortly. I figured that was the end of it…

On Friday evening I noticed a post by Jeff Smith saying version 22.2.1 of SQLcl and ORDS had dropped. Now it was Friday evening, and I typically don’t work on Fridays, but I really didn’t want to have this hanging over me all weekend, so this is what I did…

Download the software.
Upload the software to Artifactory.
Change a couple of files in Git, specifying the new software versions.
Press a button in TeamCity to rebuild the ORDS Docker image and push the new image to JFrog Platform.
Press a button in TeamCity to redeploy all the non-production ORDS containers.

The whole process took a few minutes, and I was confident it would work because it’s all automated and works every time. I sent this message to Jeff.

I {expletive} love automation. I just noticed your post about 22.2.1 of SQLcl and ORDS. I logged on to work. Updated a couple of files in git and pressed a button in TeamCity. New ORDS Docker image built and pushed to all non-prod systems. Boom…

This is why automation matters. The combination of picking the right platform, in this case containers, and investing a little time in the automation means we can react really quickly to new releases. It doesn’t get much simpler than this!
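Even the “change a couple of files in Git” step can be scripted. Here is a minimal Python sketch, assuming the software versions live in a simple KEY=value file. The file layout, key names and version values are hypothetical illustrations, not the actual files used in the build described above.

```python
import re
from pathlib import Path


def bump_versions(text: str, new_versions: dict[str, str]) -> str:
    """Rewrite KEY=value lines for the given keys; leave other lines alone."""
    for key, version in new_versions.items():
        replacement = f"{key}={version}"
        # ^ and $ match at each line boundary because of re.MULTILINE.
        text, count = re.subn(
            rf"^{re.escape(key)}=.*$",
            lambda _match: replacement,  # a function avoids backslash escapes in repl
            text,
            flags=re.MULTILINE,
        )
        if count == 0:
            raise ValueError(f"{key} not found, refusing to guess")
    return text


def bump_file(path: Path, new_versions: dict[str, str]) -> None:
    """Apply bump_versions to a file in place (e.g. before a git commit)."""
    path.write_text(bump_versions(path.read_text(), new_versions))


# Hypothetical versions file; key names and values are illustrative only.
before = "ORDS_VERSION=22.2.0\nSQLCL_VERSION=22.2.0\nJAVA_VERSION=17\n"
after = bump_versions(before, {"ORDS_VERSION": "22.2.1", "SQLCL_VERSION": "22.2.1"})
print(after, end="")
```

Failing loudly on a missing key is a deliberate choice. A silent no-op would let the pipeline rebuild the old version and report success.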

Check out the rest of the series here.

Why Automation Matters

Cheers

Tim…

PS. This post explains the application topology we use for ORDS and APEX.