Pushing Forward With Complex Code
As Eagle began migrating to a modernized, containerized tech stack, they realized they had a problem. They struggled to see what was going on in their production environment and to troubleshoot errors when they did occur, and resolution times were long as a consequence.
Even worse, these difficulties had forced the team at Eagle into a lose-lose situation. To keep as many of their customers as possible happy, they had to deprioritize tickets where the issue was especially difficult to reproduce.
The resolution time for these tickets often exceeded the team’s target, because resolving them meant first reproducing the issue in a dev environment. Doing that work would have left the customer support team able to answer fewer requests per day, critically slowing their response time and hurting customer satisfaction.
According to David Julia, Head of Engineering at Eagle, “Working on support resolution was like working in the dark ages. We had to grab logs from individual EC2 instances and had to try to figure out the problem and recreate the situation locally. That would involve grabbing an export of data, an export of the charts, the workspaces, and the different configurations the user had made. To make it worse, none of this was guaranteed because sometimes in IoT you have these weird conditions such as a device losing connectivity or something unusual happening that you simply aren’t able to recreate in a local environment. So, we would try our best, but there was often no guarantee we would be able to recreate the issue.”