4 Debugging lessons I learned from House M.D.

In recent years, dev culture got some love from Hollywood and our favorite network shows. The developer image has gone through quite a transformation as well. In the ’80s and ’90s, the prevailing stereotype was the geeky, socially awkward developer living in their parents’ basement. Today, developers are the geeky, socially awkward, adorkable characters who hack into the NSA mainframe, found Facebook and argue over tabs versus spaces.

As I see it, the only thing missing from those movies and TV shows is capturing the drama of developing, testing and debugging in the same way that House, M.D. was able to capture the drama of diagnosis and treatment. Most of us don’t have medical degrees, but we were still able to follow the plot. Similarly, people who aren’t software engineers may be able to follow a team of too-attractive developers troubleshooting a race condition before it wrecks production in the middle of the night.

After all, everything I needed to know about debugging, I learned from Dr. Gregory House.

Everybody Lies

One thing that makes House, M.D. a pleasure to watch is the way the show makes diagnosing a patient feel like a murder mystery. Similarly, as Filipe Fortes once tweeted, “Debugging is like being the detective in a crime movie where you are also the murderer.”

Whether you are debugging your own code or someone else’s, you know the code will behave in strange, unpredictable ways. And it will become even more unpredictable when you try to debug a cloud-native app or an app that behaves differently in staging or in production. When you face a bug that simply Does Not Reproduce, you start questioning your log lines, your tests, even your own code. So what would House do?

House would doubt everything, especially his patient. House would send his staff to question the patient’s family, to investigate the patient’s home, to find hard evidence that the patient is lying. Similarly, as you debug a sneaky bug, you must visit its “home” (the environment where it’s running) and accept nothing at face value. Track its behavior step by step. Look into every log line. Examine the value of each and every variable on the stack frame as if it were a murder suspect, or a clue to the mystery, or both.

It may require a lot of patience, and you will definitely need the right tools to be able to do that in remote, dynamic environments. But hey, if it were easy, anyone could have been like Dr. House.
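To make the "examine every variable on the stack frame" advice a bit more concrete, here is a minimal Python sketch. It assumes you can catch the exception inside the process where the bug lives. The names `snapshot_frames`, `checkout` and `apply_discount` are hypothetical, invented purely for illustration, and the bug is deliberately planted.

```python
def snapshot_frames(exc):
    """Collect function name, line number and local variables for every
    frame in the exception's traceback, outermost frame first."""
    frames = []
    tb = exc.__traceback__
    while tb is not None:
        frame = tb.tb_frame
        frames.append({
            "function": frame.f_code.co_name,
            "line": tb.tb_lineno,
            # repr() keeps the snapshot printable even for odd objects
            "locals": {name: repr(value) for name, value in frame.f_locals.items()},
        })
        tb = tb.tb_next
    return frames


# Hypothetical application code with a planted bug.
def apply_discount(price, percent):
    rate = percent / 100
    return price / (1 - rate)  # divides by zero when percent == 100


def checkout(cart):
    total = sum(cart)
    return apply_discount(total, percent=100)


try:
    checkout([20, 30])
except ZeroDivisionError as exc:
    report = snapshot_frames(exc)

# Interrogate every suspect, from the outermost call to the crash site.
for entry in report[1:]:
    print(entry["function"], entry["line"], entry["locals"])
```

The last entry in `report` is the frame where the error was raised, so its locals (`percent`, `rate`) are usually the first suspects to question. In a real remote environment you would ship this snapshot to your logs rather than print it.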

Tests take time. Treatment is quicker.

Another thing everyone knows about Dr. House is that he doesn’t play by the rules. When Dr. Cuddy is worried about preventing a possible lawsuit, Dr. House dismisses her and does everything in his power to save the patient. That usually means treating the patient for one suspected disease in an attempt to dismiss another. All for the sake of saving time.

When a key customer faces an urgent issue, or when an unknown exception is preventing a bunch of clients from making online purchases, common sense tells you to stay calm and add log lines. Push a bunch of log lines, covering every single snippet that may be related to the issue. Use these new log lines to trace the root cause of the problem.

This works well, in theory. In practice, you will be going through your CI/CD cycle every time you add a few log lines. After adding them, you’ll learn the bug hasn’t been caught yet, so you’ll expand your search area by adding more log lines and waiting for yet another CI/CD cycle. And so on, and so forth. To make matters worse, adding and collecting too many log lines will impact your application’s performance. Which means that much like Dr. House, you may end up killing your patient in an attempt to isolate the disease that is killing him.

If only there were a way to add and remove log lines with a click of a button. A way to bypass the CI/CD pipeline and avoid the flood of log lines that kills your app. Only a 10X debugger like Gregory House, M.D. (Medical Debugger) would know about such a tool. ;)
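As a rough approximation of that idea using only Python’s standard `logging` module, the sketch below switches a logger’s verbosity while the process keeps running, with no redeploy. It is not the tool hinted at above: the debug statements must already exist in the code, and only their visibility is toggled. The logger name `shop.checkout` and the helpers `charge`, `set_verbose` and `ListHandler` are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Keeps emitted messages in memory so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("shop.checkout")  # hypothetical logger name
logger.setLevel(logging.WARNING)             # quiet by default
logger.propagate = False                     # keep the demo self-contained
handler = ListHandler()
logger.addHandler(handler)


def charge(order_id, amount):
    # This line costs almost nothing while the logger sits at WARNING.
    logger.debug("charging order=%s amount=%s", order_id, amount)
    return amount > 0


def set_verbose(name, enabled):
    """Flip a logger between DEBUG and WARNING at runtime."""
    level = logging.DEBUG if enabled else logging.WARNING
    logging.getLogger(name).setLevel(level)


charge(1, 10)                        # silent: logger is at WARNING
set_verbose("shop.checkout", True)   # the "click of a button"
charge(2, 25)                        # now recorded
set_verbose("shop.checkout", False)  # back to quiet once the data is in
charge(3, 40)                        # silent again

print(handler.messages)
```

In a real service, `set_verbose` would typically be wired to something you can reach in production, such as an admin endpoint or a signal handler. That wiring is left out here because it depends entirely on your stack.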

Look at her eyes

About two thirds into every episode, the team would think they had found the root cause and saved the patient, only to learn that the treatment they gave had exposed another, seemingly unrelated symptom, which told them they were wrong all along. A line like “Look at her eyes. She’s completely jaundiced. Her liver is failing.” would ramp up the drama of the episode even further.

The same happens too often with devs after we push a supposed fix to our production environment. Initially, things look calm and we congratulate ourselves for solving the problem. But soon enough things start crashing and burning, and we look at our APM dashboard just as dramatically as House would be looking into his dying patient’s eyes.

Common sense tells us to roll back to the last stable build we saw. Spend days or weeks isolating the problem locally, and then push another fix. It may even work. But right now, our production environment is showing us where the problem is. If only we had the log lines and House’s superior intellect to see it.

Look at her eyes. Her Response Time is spiking. She’s completely crashing!

As we prepare to roll back, we do what we can to fetch every possible log line from the areas where the problem reproduces. We do it with a click of a button, and the log lines are immediately streamed into our log aggregator and tracing tools. The increased observability helps us “Pull a House” and dramatically find the root cause just in time to save the patient. Cue dramatic music, House riding away on his motorcycle, as his team looks bewildered at his genius. Fade to Awesome.

It’s never Lupus

In production debugging, as in House M.D., we know only one thing for sure: It’s never Lupus. Your code may appear to be lying to you, but if you are able to debug it remotely just as you debug it locally, you’ll end up finding the truth. Common sense tells you to push a bunch of log lines, wait for the CI/CD flow, and look for the needle in the haystack. But you know better than that. You add log lines with a click of a button and only add the ones you need.

And as you look deep into your application’s eyes (or, well, its dashboards), you know one thing for sure: only you can save her. You can do that by playing by your own rules. By being smarter than everyone else and deflecting your deep emotional involvement via wit and sarcasm. By debugging in production just as if you were debugging locally. And of course, by consuming vast amounts of a substance that helps you stay focused and alert as you debug a crash at 2 am. Yes, I know coffee isn’t as dramatic as Vicodin, but hey, it’s a show about developers.

I know I would watch that show. Especially if they cast Hugh Laurie as the lead, the 10x debugger. How about you? Who would you have them cast as yourself?
