Fantastic bugs and how to resolve them ep1: Heisenbugs

Welcome, fellow developer. I can see you’ve traveled a long road; why don’t you stay a while and listen? I’ve got some fantastic stories to share: lessons to imbue your debugging skills with power and wisdom, adding at least 1000 XP to take you to the next level and make your future travels much safer.

Hmm, now, where should we start? Have you already faced the terrifying Heisenbugs? They are truly fantastic.

Heisenbugs

Heisenbugs are bugs that seem to disappear or change when one tries to debug them. Magnificent bastards indeed. Named after the famous physicist Werner Heisenberg, in homage to his uncertainty principle.

Experienced developers dread them, since they know this type of bug will undoubtedly be hard to study, understand, and resolve. They also know that, because of Heisenbugs’ ethereal nature, a fix, once achieved, is less likely to cover all cases.

There are two subtypes of Heisenbugs: those that are actually affected by the debugging capabilities, and those that seem to be affected but are in fact just manifesting randomly/statistically.


“I've spent every Tuesday for 2 months trying to track down a bug that only shows up in production on Tuesdays.” -- Flufcake on Reddit

Now you see me, now you don’t 🤦‍♂

Both types of Heisenbug are as painful as a root canal, but one of them, the kind that disappears only when you debug, is definitely more evil.

This type of Heisenbug comes with a built-in sense of despair, as you discover that most (if not all) tools at your disposal for approaching the problem are worthless. That despair is often the reward you get for spending a long time trying to figure out why you’re not seeing anything when you attach a debugger/profiler or redeploy with more code.

The disease is spreading

To make matters worse, Heisenbugs, which used to be rare nightmares, are becoming more and more common these days. With the rise of distributed systems (e.g. Kubernetes, serverless, microservices), software is growing in scale, complexity, and asynchronicity. These are the perfect spawning grounds for the vicious Heisenbugs. As a SaaS company from the DevOps ecosystem, we tend to hear about Heisenbugs more often from customers who are migrating to microservices.

Image credit: cloud.google.com/kubernetes-engine/kubernetes-comic/

The greater the playing field and the more interconnected moving parts it has, the more places Heisenbugs have to come into existence. That’s just pure statistics. With added complexity and encapsulation, there are more layers that can conceal simple bugs and “upgrade” them into vicious Heisenbugs.

Most prominent is the effect of asynchronicity. Heisenbugs are often close cousins of another type of bug, the race condition (to be covered in a separate post). With more and more components connecting in an asynchronous fashion, the number of possible software states increases dramatically. Essentially, it is the Cartesian product of the states of all concurrent software elements. As a result, any specific software state or configuration becomes rare and fleeting, directly leading to more Heisenbugs springing into being.
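
To get a feel for how fast that space grows, here is a minimal Python sketch. The function names and the example numbers are made up for illustration. It simply counts the combined states (the Cartesian product) and the possible orderings of steps across concurrent tasks.

```python
from math import factorial, prod


def combined_states(states_per_component):
    """Size of the Cartesian product of each component's local states."""
    return prod(states_per_component)


def interleavings(steps_per_task):
    """Number of distinct orderings in which the steps of independent
    tasks can interleave, keeping each task's own steps in order."""
    total = factorial(sum(steps_per_task))
    for steps in steps_per_task:
        total //= factorial(steps)
    return total


if __name__ == "__main__":
    # Hypothetical system: 3 services with 4 local states each.
    print(combined_states([4, 4, 4]))   # 64
    # Two tasks of 3 steps each vs. four tasks of 3 steps each.
    print(interleavings([3, 3]))        # 20
    print(interleavings([3, 3, 3, 3]))  # 369600
```

Doubling the number of tasks here takes the count of possible orderings from 20 to 369,600, which is why a bug that needs one particular ordering can stay hidden for a long time.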


Not today, Satan! How to avoid and resolve Heisenbugs

Use the right tools - surgical ones

The heavier or clumsier your observability tools are, the more likely they are to affect the running code, causing bugs to disappear or change while you’re debugging. Be extremely cautious of tools that pause, freeze, or slow down execution (including classical debuggers); tools that allocate a lot of memory or have a high CPU overhead; and tools that change the execution or networking layout (including proxies and service-mesh solutions).
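
One way to keep observation light is to record events into a small, bounded in-memory buffer and inspect it after the fact, rather than pausing the process or writing to disk on every event. The sketch below is a minimal illustration in Python; `TraceBuffer` and its methods are hypothetical names, not part of any particular tool.

```python
import time
from collections import deque


class TraceBuffer:
    """Bounded in-memory event log: appends are cheap and old entries
    are dropped automatically, so memory use stays fixed."""

    def __init__(self, capacity=1024):
        self._events = deque(maxlen=capacity)

    def record(self, label, **fields):
        # No I/O and no pausing here: just a timestamp and an append.
        self._events.append((time.monotonic(), label, fields))

    def dump(self):
        """Return a snapshot of the events, oldest first."""
        return list(self._events)


if __name__ == "__main__":
    trace = TraceBuffer(capacity=3)
    for i in range(5):
        trace.record("request", id=i)
    # Only the 3 most recent events are kept.
    print([fields["id"] for _, _, fields in trace.dump()])  # [2, 3, 4]
```

Even a buffer like this shifts timing a little, so it lowers the risk of scaring the bug away rather than removing it.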

Know your prime suspects

We’ve covered where Heisenbugs prosper; to avoid them, you must know your software. Know its most complex, encapsulated, asynchronous, distributed parts; you’ll have an easier time finding the bastards there.

Use the right workflows and debug flows to avoid creating Heisenbugs

If your debugging flows include restarting, redeploying, or significantly changing server layouts, it shouldn’t surprise you that Heisenbugs will soon pop up and you can then expect a world of hurt. Have protocols and observability solutions in place so that they can be put into play without all the ruckus. This way, when these ugly beasts rear their heads you can chop them right off.

Know your code and know how to do static analysis

Debugging is roughly split into two parts: executing the code in order to observe it (dynamic analysis), and reading through the code to find patterns in it (static analysis). Heisenbugs can evade detection only in execution; if you don’t run them, they can’t run from you.
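
As a small illustration of pattern-hunting without execution, the sketch below uses Python’s standard `ast` module to flag functions that write to variables they declare `global`, a common ingredient in race conditions. The helper name `find_global_writes` and the sample source are made up for this example, and the check is deliberately simplistic.

```python
import ast


def find_global_writes(source):
    """Return (function_name, variable_name, line_number) for each
    assignment to a name the enclosing function declares `global`."""
    findings = []
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Collect the names this function declares as global.
        declared = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Global):
                declared.update(node.names)
        # Report plain and augmented assignments to those names.
        for node in ast.walk(func):
            if isinstance(node, ast.AugAssign):
                targets = [node.target]
            elif isinstance(node, ast.Assign):
                targets = node.targets
            else:
                continue
            for target in targets:
                if isinstance(target, ast.Name) and target.id in declared:
                    findings.append((func.name, target.id, node.lineno))
    return findings


SAMPLE = """
counter = 0

def handle_request():
    global counter
    counter += 1

def read_only():
    return counter
"""

if __name__ == "__main__":
    print(find_global_writes(SAMPLE))  # [('handle_request', 'counter', 6)]
```

Nothing in `SAMPLE` is ever executed, so there is no timing for the bug to hide behind. The scan only points you at code worth reading closely.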

Never trust a Heisenbug

In most cases, it’s hard to know for sure if you’ve solved a Heisenbug, even if it whispers in your ear, “It’s ok now. It’s gone… shhh... it’s gone.” It’s better to stay on the safer side of suspicion and have your debugging kit at the ready.
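
For the randomly manifesting kind, a rough back-of-the-envelope calculation helps set expectations. Assuming, as a simplification, that the bug used to show up independently in each run with some estimated probability, this Python sketch computes how many consecutive clean runs you would need before a fix starts to look convincing. The function name and the numbers are illustrative.

```python
import math


def clean_runs_needed(failure_rate, confidence=0.95):
    """Smallest n such that an unfixed bug with the given per-run failure
    rate would have appeared at least once in n runs with probability
    >= confidence. Assumes runs are independent."""
    if not 0 < failure_rate < 1 or not 0 < confidence < 1:
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    # P(no failure in n runs) = (1 - failure_rate) ** n <= 1 - confidence
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))


if __name__ == "__main__":
    # A bug that used to appear in roughly 1% of runs.
    print(clean_runs_needed(0.01))  # 299
```

In other words, a handful of green runs says very little about a bug that only fired once in a hundred. Real runs are rarely independent, so treat the number as a floor.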

Happy hunting and happy travels!
