Logs. Do we really need them? As developers, we write the code and the tests, and everything seems to be looking great. Why then, do we need to spend even more time writing a bunch of lines nobody is ever going to read?
Recently I did a talk on the subject at StatsCraft (check out the slides here). This post, a summary of the talk I gave, will discuss why we need good, informative logs, and how we can write them well enough. I’ll also cover different tools that might just take your logging to the next level.
Usually, one can divide the logging journey into three stages. First is no logs at all, aka ‘logs are for losers’: at this point, the developers simply don’t believe in the importance of logs. They often realize their mistake only when it’s too late, both figuratively and literally: usually sometime in the middle of the night when something fails for no apparent reason.
The next stage is bad logging: the developer is adding some unhelpful log message. For instance: Something bad has happened. This isn’t really useful, but it is still better than no logging at all.
And finally, the stage we all want to reach - good logging: here we understand logging is important and start writing descriptive messages. Something like:
Org name is not valid, org_name=”hello world”, timestamp is 1989/11/11
So how do you make sure you’re always at the ‘good logging’ stage? It might help to remember the following:
The world of logging for machines is full of possibilities. In most cases, data is going to arrive at some sort of centralized logging system such as ELK, Logz, Splunk, etc. We now have visualization, alerts, aggregation with other logs, smart search, and much more. As you may know, these features are missing when writing a log that is intended to be readable by a human. So how can we make logs better for machines? A good start would be to remember that the King of logging for machines is, of course, JSON.
In order to better understand our logs’ context, it’s wise to add as much information as possible:
Python - Structlog: `structlog` makes logging in Python more powerful and less painful. With it, you can add structure to your log entries. If you wish, `structlog` can take care of your log entries output. You can also choose to forward them to a preferred logging system.
There are several advantages to smart logging:
Similar logging infra exist in any language: Golang - Logrus, JS - Winston, etc.
Distributed tracing: or as it’s often called -- request tracing. This method can be used for profiling and monitoring apps, mostly those based on a microservices architecture. Using distributed tracing can help track down the causes of poor performance and the exact spots where failures had occurred. This method allows devs to track requests through multiple services, providing various metadata which can later be reassembled into a complete picture of the app’s behavior at runtime.
Logging data in advance can be challenging. So what happens if you forgot to set a log line and things come crashing down? Or when the only log you have says Something bad happened? Luckily, there are tools you can use to quickly get the data you need from your application, while it’s running.
Stackdriver: If your app is running on Google cloud you can use Stackdriver to get data from your code using breakpoints in production. However, it is geared towards helping devs integrate with the Google cloud platform, and isn’t meant to be used as a general data collection tool.
Rookout: Set non-breaking breakpoints in your production code and get any data you might need - in an instant. This means you don’t have to write extra code, restart or redeploy your application. Just click and get the exact data you need in real-time. Rookout works on all clouds and platforms seamlessly: On-Prem, Monoliths, GCP, Azure, AWS, AWS Lambda, ECS, Fargate, etc. You can also pipeline your data anywhere it might be needed: DBs, APM, logging, exception manager, alerting, etc. Moreover, your data doesn't have to go through the cloud at all, or even through Rookout's server. You can drive data directly to your final target, even within your internal network.
Among the unique aspects of Rookout as an infrastructure solution is adding a temporal aspect to logs. Suddenly, developers can say and apply things like “I want this log for only a week.” “I want these data snapshots only when a specific condition happens.” “I want this data to be collected until an event happens,” or “I want this service to collect data until another service decides otherwise”. Rookout completely changes the way we can think about logs.
Driving logs down the river of time, we’ve come a very long way. From simple human text lines, through smart, best practices, structured data, processing pipelines, tracing, to on-demand logging for everyone with a click.
Every step in this journey is a step in the evolution of software, both as a whole and per project. No one can be certain if this journey has an end, but the ride and every step of the way sure gets easier with each log line.