Working with Rookout customers, I have noticed a significant pattern in how they describe their engineering routines in the days before our software became part of their daily workflow. It shows up in a variety of engineering tasks, such as developing new features, reproducing and fixing bugs, or even just documenting the existing system and how best to utilize it. It is also consistent across industries and tech stacks.
I want to take this opportunity to share this pattern with you, one which I find to be much in line with my own experience not only as an engineer, but also as an engineering manager.
As an engineering team working on an existing code base, your first and foremost source of truth is the code itself. After all, software documentation is notoriously difficult to maintain and is predictably out of date when you need it most. This is even more pronounced now that modern methodologies such as Infrastructure as Code and Database as Code move traditionally manual work into the code itself.
That being said, it’s important to remember that reading source code only tells half the story of what’s happening in the software as it’s running (and running it locally doesn’t help all that much, though that’s an entirely different blog post). This is where other data sources come into play, most notably the Observability and Monitoring tooling in place: logging, tracing, metrics, and BI.
Unfortunately, more often than not, engineers lack the data required to design and execute their day-to-day assignments to the best of their ability. Still, getting more data requires writing more code, getting it integrated, and deploying it to the relevant environment, all of which can be just as expensive (and sometimes as risky) as doing their assigned tasks in the first place. This brings us to the Engineer’s Dilemma:
Do I develop the task ahead of me with the information I already have, or do I develop a feature that will get me more data?
Reading this, you might be wondering: what are the missing pieces of information that all those engineers can’t get without writing more code? To answer that, let’s look at how each side of the dilemma plays out in practice.
The most straightforward option for engineers is to do the job in front of them with the information they already have. Naturally, performing a task while lacking critical information is hardly the best way forward.
The classic example is developers attempting to resolve a bug. The lack of data leaves them with little ability to pinpoint the root cause, so they quite literally change code at random in the hope that it fixes the bug. Worse, when lacking the data to understand and/or reproduce the bug, the team has no way to verify the bug has even been fixed.
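To make the verification problem concrete, here is a minimal sketch in Python. The function, the bug, and the failing input are all hypothetical, invented for illustration; the point is that the regression test proving a fix only becomes writable once collected data has revealed what actually went wrong in production.

```python
def parse_quantity(raw: str) -> int:
    """Parse an order quantity from user input.

    Hypothetical fix: the original code called int(raw) directly and
    crashed on whitespace-padded values sent by one client.
    """
    return int(raw.strip())

# This regression test can only exist because production data revealed
# the actual failing input (" 3\n"). Without that data, there is
# nothing concrete to assert, and no way to show the bug is gone.
assert parse_quantity(" 3\n") == 3
assert parse_quantity("7") == 7
```

Without the offending input in hand, the team is left asserting only that the happy path still works, which proves nothing about the bug.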
Yet even when developing a new feature, an incomplete understanding of the existing code base and how it is actually used presents a hurdle for engineers. Extra time and effort get spent handling potential “what ifs” that may never be relevant, all the while failing to address real issues that will inevitably arise in production. Overall, this leads to more expensive, slower-to-develop features with higher failure rates when rolled out.
Alternatively, engineers can dive down the rabbit hole and chase those missing pieces of data. Once a missing piece of data has been identified, they then have to develop an entirely new feature, one that collects the data they need.
While this is (usually) a relatively simple feature, it is a feature nonetheless. One has to figure out where in the code to collect the data, how to process it, and where to send it, be it a new log line, a new alert, or a new metric. The new feature has to be integrated into the software’s mainline and verified to work properly, alongside regression tests for the new version. Last but not least, the new version has to be approved and deployed through whatever organizational processes are in place, and such changes carry their own set of risks.
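As a sketch of how small such a data-collection “feature” typically is, and yet how much process it still triggers, consider this hypothetical Python example (the function, the coupon table, and the log format are all assumptions for illustration, not from any real system). A single log line is added purely so the team can observe real production inputs after the next deploy:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("checkout")  # hypothetical service name

# Hypothetical rate table standing in for real business logic.
COUPON_RATES = {"SAVE10": 0.10, "SAVE25": 0.25}

def apply_discount(order_total: float, coupon: str) -> float:
    rate = COUPON_RATES.get(coupon, 0.0)
    # The entire "data collection feature" is this one log line, added
    # solely to see which totals and coupons actually occur in production.
    logger.info("apply_discount total=%.2f coupon=%s rate=%.2f",
                order_total, coupon, rate)
    return order_total * (1 - rate)
```

Even a one-line change like this must still be merged into the mainline, covered by tests, approved, and deployed before it yields a single data point.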
Unfortunately, even after going through this process, engineers may find that they failed to capture the piece of data they were looking for, or that the new data doesn’t provide as much clarity as they had hoped, and they may have to endure the whole process again.
Over and over, we have heard software engineers and architects lamenting that this process is so cumbersome and expensive in their own organizations that individual contributors prefer to skip it and act on whatever little data they already have.
I’m sure at this point you are asking yourself: how can this be? Organizations spend a fortune on the aforementioned Observability and Monitoring tools. How can these tools fail to fix these problems?
Well, the truth of the matter is that those tools were never meant to solve those problems. They were built for a different set of use cases entirely.
There’s a very good reason these use cases have been prioritized and solved by these tools: the financial incentives for solving them are very clear, and the ROI calculations tend to be straightforward. At the same time, these use cases have little to do with the day-to-day work of the majority of the engineering workforce.
I have seen this pattern come up time after time in every organization we have worked with. Day after day, engineers make suboptimal choices based on poor information, due to the sheer difficulty of collecting the additional data that would educate them. Besides causing deep individual frustration, this has a big impact on software development velocity and quality.
That’s the very reason I founded Rookout. We strive to empower engineers to collect the data they need on the fly, without compromising correctness, performance, availability, security, or compliance. Reach out to learn more about the huge difference this can make for you.