A Bittersweet Production Debugging Memoir

One day in 1997, my then-boyfriend, who was studying computer science, came home and said, “I had a good interview today.”
“What does the company do?” I asked.
“Not entirely sure,” he replied, “but it lets you see when your friends are surfing the net. And the cool part is that it sounds like an elephant!” The product was called ICQ.

Within a few months, I was a savvy ICQ user. My boyfriend introduced me to Glenda, a devoted Swiss “tester” of the live Mac version of ICQ. “I can trust Glenda to report bugs and glitches more reliably than our internal QA team and faster than our support team,” he claimed. To show his appreciation, we invited Glenda to spend her summer vacation at our place on the Mediterranean. A well-deserved reward for her hard and fruitful work!

Fast forward to 2018.

Today, almost every company is a software company, but where are the enthusiasts like Glenda, who save you time and trouble by finding bugs and reporting them quickly? You may still have those enthusiastic users, but you can’t really count on their help to uncover every production issue.

No matter how hard you try to catch and resolve bugs early in the SDLC (Software Development Life Cycle), bugs, glitches and security holes still happen. And when they do, for every extra minute that they persist, you lose business, see customer satisfaction plummet and waste pricey developer time. If all hell really breaks loose and your issue becomes a public matter, it can damage not only your reputation but also shareholder value. Ouch!

Often-quoted charts showing that the cost of fixing a bug increases by a factor of 4–6 between testing and production may seem outdated. However, recent studies show massive increases in losses due to software errors, mainly bugs, in production systems. Scalability comes at a cost: when a very popular product breaks, more people are affected. It only takes a few unhappy users to shame you on Twitter or Facebook and set things on fire!

True, finding bugs during the design, dev, and test stages is always better and much less costly than discovering them in production. But the reality is that some bugs will always find their way into production. Since the cost of these bugs is drastically higher, and most users aren’t as tolerant as dear Glenda was, you had better be able to debug them quickly.

Here’s another fact: back in the days when Glenda would kindly point out that something was wrong, your app was running on a server that sat under your desk or in the room next to you. It wasn’t in the cloud, it wasn’t containerized, and it most certainly wasn’t popping up and then disappearing on serverless. In other words, gaining visibility into production bugs nowadays is no simple matter, and the evolving ecosystem around observability is clear proof.

Monitoring, logs, exception management tools, and the like are all trying to give you better control over your somewhat elusive (yet fantastic, fast, agile and scalable!) production environment and to provide more clarity about your live code’s behavior.

Rookout adds a critical layer of in-depth visibility on top of these tools by letting you immediately collect any type of data from production, even without pre-instrumentation or further redeployment. So when a Glenda-bot pages you in the middle of the night, you’re now much more prepared to hunt production issues. Go get them!

Getting Started is a Breeze