Debugging Code: Problem Solving Revisited

A couple of incidents/discussions happened recently that made me think about this topic again. Here are some random thoughts on a subject that should definitely not be approached in a random manner. 🙂

Systematic Approaches Pay Off

I know it is really boring, but a systematic and meticulous approach will always yield better results than randomly jumping at stuff. I’ve discussed this before here.

It’s easy to become focused on what *you know* is the problem, just because of a gut feeling, without any supporting evidence. When you eventually find the real issue, you feel a bit stupid for looking at the wrong thing for so long.

Sometimes you end up focusing on the symptom, not the root cause. “If I do this, it all works again. Great!” Then the same problem happens the next day. Before you know it you have a bunch of voodoo operational tasks to keep the system running, with nobody knowing how or why it works.

It really does pay to take a scientific approach to fixing things.

A Leap of Faith

Over time you get to spot patterns, which will sometimes allow you to jump straight to the root cause of a problem without doing the necessary legwork. There is no problem doing this, provided you are willing to accept it won’t always pay off, and you don’t become controlled by your hunches. You have to know when to accept your hunch could be wrong, and take a step back to a more meticulous approach.

This is not a contradiction of the first point. It’s something that you will learn to do because of prolonged use of a systematic approach. Be careful when working with more experienced people, as it is easy to believe their seemingly random approach to problem solving is just that. Random.

I mentioned this here.

Instrument Everything

I can’t emphasise enough how important instrumentation is.

You should be able to determine what went wrong just by looking at the instrumentation, without having to know or look at the code. In my opinion if you are doing it correctly, non-developers should be able to figure it out from your instrumentation.

We have a perfect example of this in Oracle. I have never seen any source code for the database, but I can diagnose and fix issues by using the instrumentation built into that code. Things like SQL Trace, Real-Time SQL Monitoring, ASH, AWR and ADDM are all possible because of instrumentation in the code.

The problem with Googling solutions is you often see cut-down code examples, which can promote bad programming practices. I have almost no instrumentation in the examples on my website. That’s because I’m trying to keep them small and lightweight. I don’t want you to have to install a bunch of tracing, logging and unit testing packages before you paste in a 10 line bit of example code. That doesn’t mean those things are not important in your real solutions. It’s all about context.
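To give a feel for what instrumentation in your own code could look like, here is a minimal sketch in Python using the standard logging module. The names instrumented and load_order, the logger name and the message layout are all made up for illustration. The idea is simply that every step records when it started, how it ended, how long it took and what it was working on, so someone reading the log can tell what went wrong without opening the code.

```python
import logging
import time
from contextlib import contextmanager

# Hypothetical logger name; use whatever fits your application.
logger = logging.getLogger("app.instrumentation")


@contextmanager
def instrumented(step, **context):
    """Log the start, outcome and elapsed time of a named step."""
    details = " ".join(f"{k}={v!r}" for k, v in sorted(context.items()))
    started = time.perf_counter()
    logger.info("START %s %s", step, details)
    try:
        yield
    except Exception:
        elapsed_ms = (time.perf_counter() - started) * 1000
        # logger.exception also records the traceback.
        logger.exception("FAIL %s %s elapsed_ms=%.1f", step, details, elapsed_ms)
        raise
    else:
        elapsed_ms = (time.perf_counter() - started) * 1000
        logger.info("END %s %s elapsed_ms=%.1f", step, details, elapsed_ms)


def load_order(order_id):
    """Hypothetical business function, wrapped in instrumentation."""
    with instrumented("load_order", order_id=order_id):
        if order_id < 0:
            raise ValueError("order_id must be non-negative")
        return {"order_id": order_id, "status": "NEW"}
```

Wrapping each meaningful step like this is one option among many. The point is that the log answers “what was it doing, with what input, and how long did it take?” for you.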

A Fresh Pair of Eyes

Your brain is a weird thing. You work on something and get nowhere. You walk away and do something completely different and you get a flash of inspiration. All that time your brain has been churning it over, and it has come up with the solution. Sometimes walking away is enough to solve the problem.

You can also call someone in to help you. Talking through the problem can help for a couple of reasons.

  1. They don’t have the mental baggage you have, so they might spot something obvious you are refusing to see. 🙂
  2. In explaining the issue to them, you are ordering your thoughts and effectively explaining it to yourself. The net result is you sometimes answer the question for yourself. This is one of the reasons why you should learn to ask questions properly, especially on forums. In formulating the proper question, you may answer the question for yourself.

I wrote about the second point here.



Problem Solving (Breaking Things Down)

Some people are great at problem solving, others not so much. The people I meet who are good at problem solving always have one very important skill: the ability to break stuff down into its constituent parts. With practice, it can seem like they are making massive leaps of faith, but that is based on their experience. That experience came from breaking problems down and dealing with the little stuff. Here are some examples, including some you may not consider classic problem solving, but which illustrate the point.

Books: Ask somebody to write a book and they will crap themselves. The thought of writing a book is really daunting. Ask them to write a chapter and they might still be scared, but less so. Ask them to write a page and most people would probably grudgingly do it. A book is a collection of pages. If you can write a page, you can write a book. I’m not saying it’s a good book of course, but you get my point. So the problem of writing a book can be broken down into very manageable pieces.

Development: I’ve been involved in some really complicated development projects in my career. When you’ve finished, you take a look back and think, how the hell did we manage to do that? When you look at the individual bits, they are all pretty simple. The skill is breaking down that massively complex development into manageable chunks. The classic top-down or bottom-up approaches to programming encouraged this. Agile, when done properly, also encourages this approach of breaking down problems to small units of work, delivered on a regular basis. So the big problem is broken down to little bits, that get put together and you end up with something that seems bigger than the sum of its parts.

Infrastructure: When you are dealing with multi-tier architectures, finding the cause of a problem can be quite complicated. You get questions like, “this URL is not working, why not?”. Based on experience of your environment, you might know likely candidates, but if not, you may have to take the long route, which may look like this:

  • Can I connect to the database?
  • If I connect directly to the App Server, rather than via the web layer (Load balancer, Reverse Proxy, web server) does the application work?
  • If I connect via the web layer, does the application work?
  • If I connect from different network zones, does the application work?

Based on the answers to those questions, I will know which part of the chain is broken and I can look at that specific section and break it down further, eventually finding the root cause and getting the relevant people involved to fix it.
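The questions above can be sketched as an ordered list of checks. This is only an illustration in Python: the host names, ports and check names in example_checks are hypothetical, and a plain TCP connect is a crude stand-in for “does the application work?”. In a real environment each check would be whatever test makes sense for that layer.

```python
import socket
from typing import Callable, Dict, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(checks: List[Check]) -> Dict[str, bool]:
    """Run every check in order; a check that raises counts as a failure."""
    results: Dict[str, bool] = {}
    for name, check in checks:
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def first_broken_link(results: Dict[str, bool]) -> Optional[str]:
    """Return the name of the first failed check, or None if all passed."""
    for name, ok in results.items():
        if not ok:
            return name
    return None


def example_checks() -> List[Check]:
    """Hypothetical hosts and ports, ordered from the database outwards."""
    return [
        ("database", lambda: tcp_reachable("db.example.internal", 1521)),
        ("app server (direct)", lambda: tcp_reachable("app.example.internal", 8080)),
        ("via web layer", lambda: tcp_reachable("www.example.internal", 443)),
        # The network zone question means running the same checks from another host.
        ("load balancer", lambda: tcp_reachable("lb.example.internal", 443)),
    ]
```

Calling first_broken_link(run_checks(example_checks())) gives you the first layer that failed, which is the section to break down further.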

SQL Tuning: Looking at a 10 page execution plan is really scary. What is a plan made up of? Lots of individual steps that combine to form the whole plan. If you work your way through the plan in order, operation by operation, it is often very obvious what is going wrong. Typically, a bad cardinality estimate somewhere causes the optimizer to make a bad decision, which kind-of propagates up the plan. Fix that cardinality estimate, or help the optimizer by “shaping” the plan in a more sensible fashion, and things often fall into place. If you get a chance to see Jonathan Lewis talking about SQL tuning and shaping execution plans, or Kyle Hailey speaking about his approach to SQL tuning, you will see they both focus on breaking stuff down into its constituent parts, and you will realise the “black magic” they perform is actually very doable by mere mortals like us if we take a consistent and meticulous approach.
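As a rough illustration of walking a plan operation by operation, the Python sketch below looks for the first step where the estimated row count and the actual row count differ by a large factor. The PlanStep class, the threshold of 10 and the idea of feeding it a list already sorted into execution order are all assumptions made for this example. How you get estimated and actual row counts depends on your database and tools, and this is not a description of how anyone mentioned above actually works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanStep:
    """One operation from an execution plan (hypothetical structure)."""
    op_id: int
    operation: str
    est_rows: int
    actual_rows: int


def misestimate_ratio(step: PlanStep) -> float:
    """How far apart estimate and actual are, as a factor >= 1."""
    est = max(step.est_rows, 1)       # treat 0 as 1 to avoid dividing by zero
    act = max(step.actual_rows, 1)
    return max(est, act) / min(est, act)


def first_bad_estimate(
    steps_in_execution_order: List[PlanStep], threshold: float = 10.0
) -> Optional[PlanStep]:
    """Return the earliest executed step whose estimate is off by >= threshold."""
    for step in steps_in_execution_order:
        if misestimate_ratio(step) >= threshold:
            return step
    return None
```

Steps later in the plan often inherit the bad estimate, which is why the sketch stops at the earliest one. That is usually the place to start asking why the optimizer got it wrong.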

Life is all about breaking down big and daunting tasks into smaller, more manageable tasks. You can either get used to it, or spend the rest of your life achieving nothing and wondering what magic button everyone else is using! 🙂



PS. I understand there are lots of ways to achieve a goal and successful people will find a way of working that suits them. 🙂