Why Developers Should Care About Resilience
Originally published on DevOps.com
Recently, a friend reminded me of a joke we used to have when we were both developers at a huge software corporation (we won’t mention names, but back when printers were a thing, you probably owned one of theirs). We didn’t develop printers. We developed performance testing and monitoring tools.
We were the dev team, which was completely separate from the QA team and from the Ops team (yes, I’m that old – we didn’t even call it DevOps back then). Support was part of a completely different organization and “business” was something that happened in an altogether different galaxy. That is, until a customer had an issue.
When an important customer had an issue, we definitely felt it at R&D. Managers called hourly to check on status, and we would have intense conversations with the support team who would then, in turn, have intense discussions with the customer. If we, the software engineers developing the product, ended up on the same call with both support and the customer, it was a sure sign that things were bad. You knew that millions of dollars were on the line, and, consequently, that pressure was high. But most of all: you knew that we would do anything we could to solve the problem, because we needed those millions of dollars to be paid, so that we could keep getting paid. And so, we kept the coffee flowing.
This is where the joke part comes in. We used to have a system for managing support tickets: collecting information, tracking time and assigning each ticket to the relevant software or support engineer, so everyone would know who was in charge. Today, you probably use Zendesk, ServiceNow or one of their alternatives. We used a homegrown system, of course.
Our joke was this: the purpose of the system was not to track and measure and propagate the handling of customer issues. The purpose of the system was to indicate someone else is now responsible for solving the problem. Or, if you would rather paraphrase one of my favorite Douglas Adams jokes, the issue is now under a Somebody Else’s Problem field.
Solving customer issues is difficult and frustrating, and being pressured by the business can ruin a software engineer’s day. When you’re a happy code monkey, and all you want to worry about is writing beautiful code, releasing cool features and drinking some coffee, solving a customer issue gets in your way.
You end up spending too much time sifting through endless logs, crossing your fingers and trying to reproduce an issue that only happens at 2 a.m. on a Friday, usually when a blue zebra is passing by. Spending your time on a heated support call is less fun than, say, drinking coffee quietly with your friends.
As R&D managers, we understand that our developers want nothing to do with solving customer issues. We want to make sure they are motivated and happy and efficient, and that they are free to challenge themselves by learning the latest buzz-worthy technologies and thinking up ways to introduce those into our product in a way that will make our customers happy.
At the same time, as R&D managers, we are far more likely to understand the business impact of having too many support tickets or having support tickets that take too long to resolve. We know that the loss in such cases is twofold:
1. The customer that faced an issue and is waiting for a solution may become frustrated, and the company may lose their business.
2. Our engineers are busy solving that customer issue instead of delivering new value to the business.
Therefore, as R&D managers, it is our responsibility to make sure that our engineers have the tools to handle these issues as efficiently as possible when they come up. It is our responsibility to make sure they are motivated, and that they understand the business impact of solving a customer issue as quickly as possible. And whenever possible, it is our job to make sure that they make our application as reliable and stable as possible to begin with.
That means having test coverage, and the right knowledge, practices and tools, to keep the number of customer issues as low as possible.
Happily, changes in dev culture usually mean that “support” and “business” and even “customers” are not as far away as they used to be. When developing a SaaS platform, your developers often have an almost intimate, real-time ability to track what users are doing and how it impacts their experience. The culture of DevOps has done a lot to break down the silos that used to separate the dev team from the support team, and to make sure that all members of the team engage with and understand how the code they write and the issues they resolve help our customers.
I hope you have already found a way to motivate your team to solve such issues, and that they have enough time to drink coffee. If you and your team are still struggling, I suggest bringing in a live debugging tool that can help them solve customer issues faster, by making sure they have all the data they need at their fingertips, and that they can instantly reproduce any issue in a live, remote environment. Live debugging tools give your engineers a developer-friendly experience and the power to reduce issue resolution time. Then, all you need to worry about is that your engineers have enough coffee.