A few weeks ago we installed Datadog in our staging and production environments.
All in all, it was a smooth ride, with a few small hiccups that we resolved along the way. If you’re about to install Datadog and your environment is similar to ours (with Kubernetes, Python and these other goodies) you should find this post handy.
As a brief introduction, Rookout’s SaaS solution offers Dev/Ops teams some sleek and handy tools for rapid production debugging, including the ability to collect ad-hoc custom metrics and send them to Datadog.
When customer adoption started soaring and we were getting millions of messages per day from our clients, we figured it was high time to take our SaaS performance and availability monitoring to the next level by adding Datadog to our own setup.
Datadog’s monitoring solution is renowned for its ease of use and friendly pricing. That makes it a perfect match for our needs as an early-stage startup. They offer 3 levels of monitoring capabilities:
All three levels are relevant to our business. Each requires a different degree of effort and tweaking to integrate with our existing orchestration tools.
Rookout’s web-facing production environment is based on the following components:
Helm provides useful functionality on top of Kubernetes:
At Rookout, our application is defined as a Helm chart and deployed multiple times to the same cluster (production, staging, etc.). We also use Helm to deploy infrastructure services such as Fluentd.
Datadog integration with GCP is pretty straightforward: add a service account with the necessary permissions to your GCP account. Easy-to-follow instructions can be found here. To monitor additional elements of GCP (in our case, GKE), simply install the relevant integrations from the Datadog integrations page.
A ready-to-use Helm chart is available here for the Datadog agent. If Helm is installed you can install the Datadog agent on your current cluster simply by running the following:
A quick explanation of the command:
Note! This super-convenient installation does not create a Datadog agent service on our Kubernetes cluster. Instead, it relies on exposing the host’s port.
This one takes a few steps, so be patient.
Start by adding the PyPI packages for the Datadog APM (ddtrace) and the Datadog SDK (datadog) to your requirements.txt file. While the Datadog SDK is not strictly needed, we’ll put it to good use.
Load the Datadog APM and connect it to the Datadog agent. Connecting the Datadog APM to the agent’s exposed port can be a bit tricky in our use case, since we do not know the agent’s IP address or hostname.
Fortunately, Datadog solves this problem nicely in their more mature Datadog SDK with a simple, container-oriented configuration. While we can’t use the same configuration for the Datadog APM, we can reuse the same code:
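To make the idea concrete, here is a minimal sketch of one way to find the agent from inside a container. It assumes the container-oriented configuration in question is a default-route lookup, and that the pod's default gateway is the node exposing the agent's port; neither is guaranteed in every cluster network setup. The function names and the sample routing table are ours, for illustration only.

```python
import socket
import struct


def parse_default_gateway(route_table):
    """Return the default gateway IP from /proc/net/route-style text, or None."""
    for line in route_table.splitlines():
        fields = line.split()
        # Columns are: Iface, Destination, Gateway, ...
        # The default route is the one with an all-zero destination.
        if len(fields) >= 3 and fields[1] == "00000000":
            # The gateway is a 32-bit hex value, decoded here as little-endian
            # (an assumption that holds on common x86/ARM hosts).
            return socket.inet_ntoa(struct.pack("<L", int(fields[2], 16)))
    return None


def default_gateway(path="/proc/net/route"):
    """Read the kernel routing table (Linux only); None if it is unavailable."""
    try:
        with open(path) as route_file:
            return parse_default_gateway(route_file.read())
    except OSError:
        return None


# Illustrative routing table, as it might look inside a container.
SAMPLE = (
    "Iface\tDestination\tGateway\tFlags\tRefCnt\tUse\tMetric\tMask\tMTU\tWindow\tIRTT\n"
    "eth0\t00000000\t010011AC\t0003\t0\t0\t0\t00000000\t0\t0\t0\n"
    "eth0\t000011AC\t00000000\t0001\t0\t0\t0\t0000FFFF\t0\t0\t0\n"
)
print(parse_default_gateway(SAMPLE))  # 172.17.0.1
```

The address returned by default_gateway() could then be handed to the APM, for example through tracer.configure(hostname=..., port=8126) in the ddtrace releases of that period, 8126 being the agent's default trace port.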
The Datadog APM handles environment variables inconsistently. Some take effect only when the application is launched from the command line, and quite often they aren’t properly documented.
The DATADOG_ENV variable is one such environment variable, so if we want it to take effect, we must set it manually (copied from here):
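As a rough sketch of what setting it manually can look like: read DATADOG_ENV yourself and attach it to the tracer as the env tag. The helper name apply_env_tag is hypothetical; it only relies on the tracer object having a set_tags method, which is what ddtrace's global tracer provides.

```python
import os


def apply_env_tag(tracer, environ=os.environ):
    """Tag every trace with the environment named in DATADOG_ENV, if it is set.

    `tracer` is expected to expose set_tags(dict), as ddtrace's tracer does.
    Returns the environment name that was applied, or None.
    """
    env = environ.get("DATADOG_ENV")
    if env:
        tracer.set_tags({"env": env})
    return env


# Typical use in the application's startup code (assumes ddtrace is installed):
#   from ddtrace import tracer
#   apply_env_tag(tracer)
```

Passing the tracer in as an argument keeps the helper easy to exercise with a stand-in object, without needing a running agent.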
To add web framework support, update the patch_all command to the following:
Flying colors? Not quite yet. After setting this configuration (which works perfectly!) we encountered an underlying Tornado bug.
The tornado.web.FallbackHandler is the recommended way to use WSGI containers in Tornado applications. However, it did not properly call RequestHandler.on_finish, which the Datadog APM uses for tracing. As a quick workaround, we subclassed FallbackHandler:
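The shape of such a workaround can be sketched without Tornado itself. Below, LegacyFallbackHandler is a hypothetical stand-in that mimics a handler whose prepare() never reaches on_finish(); the mixin and class names are ours, and this is an illustration of the pattern rather than the exact code we shipped.

```python
class CallOnFinishMixin:
    """Invoke on_finish() once the wrapped prepare() has handled the request."""

    def prepare(self):
        super().prepare()
        self.on_finish()


class LegacyFallbackHandler:
    """Hypothetical stand-in: prepare() delegates to the fallback app
    but never calls on_finish(), like the behaviour described above."""

    def __init__(self, fallback):
        self.fallback = fallback
        self.events = []

    def prepare(self):
        self.fallback("request")
        self.events.append("prepare")

    def on_finish(self):
        # In the real setup, this is the hook the APM uses to close its trace.
        self.events.append("on_finish")


class TracedFallbackHandler(CallOnFinishMixin, LegacyFallbackHandler):
    pass


handler = TracedFallbackHandler(fallback=lambda request: None)
handler.prepare()
print(handler.events)  # ['prepare', 'on_finish']
```

With Tornado, the base class would be tornado.web.FallbackHandler instead of the stand-in. Newer Tornado releases may already call on_finish() here, in which case this override would invoke it twice, so check the behaviour of your version before applying it.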
And used it to call the WSGIContainer:
As a DevOps expert, you’ve probably had the sometimes dubious pleasure of installing products. So you know that it can get tricky at times -- in fact, so tricky that you might be tempted to stop the installation and just do without it.
It’s important to remember that the tips, tricks, and workarounds that you develop to overcome these challenges are valuable resources. Be generous about sharing them, and check around carefully for smart tips and tricks like the ones we shared here.
At Rookout, we’re delighted to be working with amazing resources and solutions and will keep sharing the tips we develop to make integrations as smooth and easy as they can possibly be. We look forward to hearing great tips from our partners as well!
Wishing you a smooth integration :)