K8s: Why it’s time to ditch legacy debugging

Kubernetes is a highly distributed, microservices-oriented technology that allows devs to run code at scale. K8s revolutionized cloud infrastructure and made our lives a whole lot easier in many respects. Developers don’t have to do much beyond writing code and wrapping it in a Docker container for K8s to handle. But even its greatest enthusiasts will admit that debugging Kubernetes pods is still a pain.

In such a highly distributed system, reproducing the exact state in which an error occurred is very difficult. In this post, I’m going to break down the existing approaches to troubleshooting and debugging Kubernetes applications, from classic local debugging to the newer methods of debugging remotely (directly in the cloud and in production), reviewing the pros and cons of each and taking a glimpse at the future.

Our forefathers' legacy: Debugging locally

Local debugging is the good old legacy we all grew up on as developers. It’s a crucial part of the development cycle and we do it pretty much every day. However, when it comes to K8s and the complexities of a microservices architecture, it becomes immensely difficult.

Each microservice both serves other microservices and consumes services from them. To add a new microservice to this complex architecture, you have to simulate the entire infrastructure and all of the relevant components on your own machine, and you have to do the same to debug it. There are currently four popular approaches for simulating the different microservices and all of their dependencies locally:

  1. Automation Script - Usually provided by DevOps or a lead developer, the script lets devs run the microservices on their own machines by executing a series of commands in order. The script often breaks, however, because you have to keep the configuration under control and keep the branch you’re working on aligned with the other branches running on your machine. For developers, this can be a very iterative and frustrating process.
  2. Hotel - [OpenSource] - Acts as a local process manager for running microservices. Devs can start and stop services and see all of the logs on a single screen in the browser. It has the same disadvantages as the Automation Script. Plus, it forces the dev team to get familiar with a new tool.
  3. Docker Compose - A tool for defining and running multi-container Docker applications. Its YAML file has to be maintained as the architecture changes, and a more advanced Kubernetes configuration can be difficult to replicate in Compose. Another minor disadvantage is that it writes the logs from all microservices to one place, which forces developers to use grep to isolate the logs of the microservice they want to focus on.
  4. Minikube - An official Kubernetes project which allows you to easily spin up a Kubernetes instance on your machine. Surprisingly, your K8s configurations will quite often not work out of the box in Minikube and may require some minor tweaking. Even worse, during the development process devs often need to make changes to the K8s configuration, such as adding or removing services. The learning curve required to use K8s with Minikube can be quite intimidating to some devs.
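
To make the Docker Compose logging point concrete, here is a minimal Python sketch that groups a combined log stream by service instead of grepping for each one. It assumes every line looks like `<container name> | <message>`, which is how Compose usually prefixes its output; the function name `split_compose_logs` and the sample lines are made up for illustration.

```python
from collections import defaultdict


def split_compose_logs(lines):
    """Group combined Compose-style log lines by their container prefix.

    Assumes each line looks like "<container name> | <message>".
    Lines without a "|" separator are skipped.
    """
    per_service = defaultdict(list)
    for line in lines:
        # Split on the first "|" only, so pipes inside the message survive.
        prefix, separator, message = line.partition("|")
        if not separator:
            continue
        per_service[prefix.strip()].append(message.strip())
    return dict(per_service)


# Hypothetical sample of what a combined log might contain.
sample = [
    "web_1  | GET /health 200",
    "db_1   | ready to accept connections",
    "web_1  | GET /cart 500 | retry",
    "no separator here",
]

logs = split_compose_logs(sample)
for service, messages in logs.items():
    print(service, messages)
```

In practice you could feed it the lines of a saved log file or a pipe; the point is only that the per-service filtering is something every developer ends up redoing by hand.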

Sailing on a cloud: Debugging K8s remotely

While you’re able to debug microservices hosted by your cloud provider, K8s has its own orchestration mechanisms and optimization methodologies. Those methodologies make K8s great, but they also make debugging a pain. Accessing pods is a very unstable operation: if you SSH into a pod to run your debugging tools, K8s might kill it a second before you get the data you wanted. So what are your current options?

  1. `logger.info("Got Exit Signal: {}".format(sig))` - The oldest trick in the book.
  2. Attaching to a process - This can be hard, since the debugger and the application have to share a process ID namespace across the containers inside the pod (for example, by setting `shareProcessNamespace` in the pod spec).
  3. Redirecting traffic from the cluster to the developer's machine - This helps you recreate an issue, but it isn’t secure and has other disadvantages. If lots of data is pipelined through your system, your local computer might not be able to handle it. In addition:
    a. Sometimes you need to install a DaemonSet on each node, which is privileged and mounts the container runtime socket.
    b. A privileged service running on each node is able to see all processes on all nodes.
    c. The traffic redirection capability exposes data to the internet.
  4. Service mesh (Istio, Linkerd, etc.) - This term describes the network of microservices and the interactions between them. A service mesh can track your microservices without any changes to your code. It proxies both inbound and outbound traffic, which makes it an ideal place to add debugging and tracing capabilities. Its out-of-the-box distributed tracing lets you see the full flow of a request through your microservices stack and pinpoint problematic requests or microservices. You can also get success rates, requests per second, and latency percentiles out of the box, and send them directly to a metrics DB such as Prometheus. The main downside of service mesh debugging is that it can’t find the root cause of an issue. It can tell you that microservice A is slow, but it won’t tell you why. You will often have to dive back into the code with other tools to get to the bottom of it.
  5. Adding logs at runtime - Of the options here, this is the easiest to deploy to your K8s architecture: deployment means only adding an SDK to your code. It then lets you add more logs on the fly and get your data instantly, without writing more code and redeploying. It is, in effect, a dynamic way to get logs and application data from your code in real time. Solutions in this space include Stackdriver Debugger for GCP and Rookout (all clouds).
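
For context on the first option, the `logger.info` one-liner usually lives inside a signal handler. Kubernetes sends SIGTERM to a container before stopping it, so logging the signal is often the only trace you get of why a pod went away. Below is a minimal sketch of that pattern; the names `handle_exit`, `install_exit_logging` and `received_signals` are illustrative and not taken from any particular codebase.

```python
import logging
import signal

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("worker")

# Signals seen so far, so the main loop can decide when to shut down.
received_signals = []


def handle_exit(sig, frame):
    """Log the signal that asked the process to exit and remember it."""
    logger.info("Got Exit Signal: {}".format(sig))
    received_signals.append(sig)


def install_exit_logging():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_exit)  # sent by Kubernetes on pod shutdown
    signal.signal(signal.SIGINT, handle_exit)   # Ctrl+C when running locally


if __name__ == "__main__":
    install_exit_logging()
```

A real service would call `install_exit_logging()` at startup and have its main loop check `received_signals` to finish in-flight work before exiting.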

Meet Rookout: on-demand, live datapoint collection

Rookout implements the fifth approach, adding logs and debug snapshots at runtime, and provides a solution for rapid debugging in dev, staging and production environments. It allows you to get the data you need from your Kubernetes application without writing more code, restarting, or redeploying. And best of all, it works the same way for local, remote, and even production deployments.

It feels just like working with a regular debugger. You set a breakpoint and get data instantly, only it never stops your application. Rookout collects the data and pipelines it on the fly while the application keeps running. Today it supports Node.js, JVM-based languages (Java, Scala, Kotlin, etc.) and Python applications (2 and 3, on both the PyPy and CPython interpreters), and will gradually cover everything (vote for your favorite).
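
To give a feel for what a breakpoint that never stops the application means, here is a toy Python sketch built on the standard `sys.settrace` hook. It is not Rookout's implementation or API, only an illustration of the general idea: when execution reaches a chosen line, the local variables are copied and the code keeps running. The names `collect_snapshots` and `apply_discount` are hypothetical.

```python
import sys


def collect_snapshots(func, line_offset, *args, **kwargs):
    """Run func and copy its locals each time execution reaches line_offset.

    line_offset is counted from the function's `def` line (0 is the def line).
    The function is never paused; we only record data as it runs.
    """
    snapshots = []
    target_code = func.__code__

    def local_tracer(frame, event, arg):
        # "line" events fire just before the line executes.
        if event == "line" and frame.f_lineno - target_code.co_firstlineno == line_offset:
            snapshots.append(dict(frame.f_locals))
        return local_tracer

    def global_tracer(frame, event, arg):
        # Only trace frames that run the target function's code.
        if event == "call" and frame.f_code is target_code:
            return local_tracer
        return None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return result, snapshots


def apply_discount(price, percent):
    discount = price * percent / 100
    total = price - discount
    return total


# Snapshot the locals right before `return total` (offset 3 from the def line).
result, snapshots = collect_snapshots(apply_discount, 3, 200, 10)
print(result, snapshots)
```

Tracing every line this way is far too slow for production, which is part of why dedicated tools exist; the sketch only shows the shape of the data you get back.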

As you can see in the video, I just discovered a bug in my K8S app.

Want to see how I fix it in just a couple of minutes? Register to check out the full video-guide to debugging K8S.
