Eating Our Own Dog Food: Production Debugging

February 3, 2022

Rookout’s Live Debugger is a product that’s created for developers, by developers. As such, our R&D team knows firsthand the actual challenges that developers face daily when debugging in production. It is a constant struggle to gain an understanding of what’s happening in their complex environments without accidentally breaking something, waiting for another deployment, or having to write additional lines of code.

 

But Rookout developers have a leg up on these challenges, as they have Rookout. With the introduction of the Golang SDK, our developers were able to dive even deeper into the issues that arose and resolve them quickly. This was (and is) especially useful – and both time- and cost-effective – when helping customers troubleshoot in their own environments.

 

Dan Sela, Rookout R&D Team Lead, shared with us a behind-the-scenes look at just how useful the Rookout Live Debugger is to Rookout’s own team.

When It Only Happens In Production

 

There’s no better time to use a live debugging tool than when a client is experiencing an issue that only occurs in production.

 

Specifically, this happened when helping a Rookout client add a user to a specific org. Each time the client attempted to do so on their own, it failed.

 

Yet, when Rookout developers did so in the staging environment, it worked. They were also able to add other combinations of users; the problem was limited to this one specific user. They understood that the issue was only happening in production – and they weren’t able to reproduce it.

 

Luckily, that day, the Rookout Go SDK had been pushed to staging. Understanding that it was their only hope in helping the customer resolve the issue, the team immediately pushed it to production. 

 

Rookout’s developers found the bug by placing a non-breaking breakpoint, adding the user that caused the issue, and seeing where the code stopped running. They then went deeper and found the line that was producing the error. Finally, they fixed the bug, deployed to production, and – success! – the user could be added as normal. “We were able to find the root cause and deploy the fix in less than half an hour”, mentioned Dan.

 


 

Shooting In The Dark

 

That wasn’t the only time that the Live Debugger has been useful for our developers. Rather, they know that they have an extremely useful tool at their fingertips to employ when helping customers troubleshoot issues and debug in production.

 

“One of the best features of Rookout is that we allow you to send your data to different targets, creating an easy and seamless experience for using multiple tools to really understand and troubleshoot your code”, said Dan.

 

However, for one specific user, this wasn’t the reality. They approached our customer success team and told them about their inability to use the Datadog target through Rookout in their production environment.

 

Dan and his team immediately set up a live session with the user to understand what was happening and resolve it quickly. After setting non-breaking breakpoints in the Datadog target code, they saw that they were getting a 403 error, meaning the requests were being rejected as unauthorized.

 

The developers then connected to their production controllers and placed a non-breaking breakpoint in the code that sends data to Datadog. They began by creating a different token and reviewing all of its permissions. The user’s own token still didn’t work, while a token from Rookout’s demo Datadog environment did. So the team dug deeper.

 

The pressure was building for the user while waiting for the Rookout developers to find and fix the issue. There was nothing in the logs and no indicative error. That’s when they turned to Rookout. Using Rookout, they found the source of the problem in just a few minutes: the data wasn’t being sent to the right place. The team quickly added the option for users to choose the specific data center their data is sent to, so that the issue would not recur for any user. “It was great”, said Dan. “Using Rookout felt like we were turning on a light. We were finally able to see things that we wouldn’t have been able to otherwise. Through our use of Rookout, we were able to quickly find the source of the issue.”
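To make the "right place" problem concrete, here is a minimal Go sketch of what choosing a data center can look like for a Datadog target. It is not Rookout's actual implementation. It assumes that each Datadog site (for example `datadoghq.com` or `datadoghq.eu`) has its own API host and that an API key issued on one site is rejected by another, which would be consistent with the 403 the team saw. The helper names `validateURL` and `newValidateRequest`, and the allow-list of sites, are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"net/http"
)

// knownSites is a hypothetical allow-list of Datadog site domains
// for this sketch; a real integration would keep its own list.
var knownSites = map[string]bool{
	"datadoghq.com":     true,
	"datadoghq.eu":      true,
	"us3.datadoghq.com": true,
	"us5.datadoghq.com": true,
}

// validateURL builds the API-key validation URL for the chosen site.
// The key point: the host depends on the site, so the same key must
// be sent to the data center that issued it.
func validateURL(site string) (string, error) {
	if !knownSites[site] {
		return "", fmt.Errorf("unknown Datadog site %q", site)
	}
	return "https://api." + site + "/api/v1/validate", nil
}

// newValidateRequest prepares (but does not send) a request that
// checks an API key against the selected site.
func newValidateRequest(site, apiKey string) (*http.Request, error) {
	u, err := validateURL(site)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("DD-API-KEY", apiKey)
	return req, nil
}

func main() {
	// Print the request line for two sites; nothing is sent over the network.
	for _, site := range []string{"datadoghq.com", "datadoghq.eu"} {
		req, err := newValidateRequest(site, "example-key")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(req.Method, req.URL.String())
	}
}
```

With a per-target site setting like this, a user whose Datadog account lives in a different data center can point the target at the matching host instead of a single hard-coded one.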

 

Results

 

In both situations, the bug the customer was experiencing couldn’t be reproduced, yet our developers were able to help get to the root cause quickly. In both cases, Rookout proved to be an efficient tool for navigating legacy code and tracking down issues that couldn’t be reproduced.

 

“I always forget on the day-to-day, when taking care of other things, how incredible using Rookout in production is, and every time I sit down to help a customer or fix one of our own bugs, well, I really can’t imagine going back to any of the classic debugging methods that were previously used”, said Dan.  

 

“We are able to resolve issues for our clients much faster, because we are now able to gain insight into issues that otherwise we wouldn’t have been able to reproduce”, continued Dan. “By using Rookout ourselves, we are better able to understand the pain of developers who work without it. It better equips us to build better features that we know and feel the need for.”