Monitoring your infrastructure with StatsD and Graphite

November 8, 2019


Table of contents

1. Introduction

          1.1 Graphite web application

          1.2 Carbon

          1.3 Whisper

          1.4 StatsD

          1.5 Setup

          1.6 Bucket

          1.7 Value

          1.8 Type

          1.9 Counters

          1.10 Timers

          1.11 Gauges

2. Integration with Node.js

3. Integration with Java

4. Horizontally Scaling StatsD

5. Conclusion


1. Introduction

Collecting metrics about your servers, applications and traffic is a critical part of an application development project. There are many things which can go wrong in production systems, and collecting and organizing data can help you pinpoint bottlenecks and problems in your infrastructure.

In this article, we will discuss Graphite and StatsD, and how they can help form the basis of monitoring infrastructure. 

Graphite is a monitoring tool made up of several components. We’ll take a brief look at each component here:

1.1 Graphite web application

The Graphite web application is where you create graphs and plot data. It also allows you to save graph properties and layouts.

1.2 Carbon

Carbon is the storage backend for Graphite. It is essentially a daemon that can be configured to listen on TCP/UDP ports. To handle increasing load and to configure replication and sharding, multiple Carbon daemons can be run on the same host or on multiple hosts, with the load distributed among them using carbon-relay.

1.3 Whisper

Whisper is the database format that Graphite uses to store its data.

Whisper allows for higher resolutions (seconds per point) of recent data to degrade into lower resolutions for long-term retention of historical data.
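The idea of degrading resolution can be illustrated with a small sketch. It assumes that older points are rolled up by averaging (Whisper's aggregation method is configurable, and averaging is its default); the function name and the `pointsPerBucket` parameter are made up for this example and are not part of Whisper.

```javascript
// Roll up high-resolution points into a coarser series by averaging.
// `pointsPerBucket` is a hypothetical parameter: how many fine-grained
// points collapse into one coarse point (e.g. six 10-second points -> 1 minute).
function downsample(values, pointsPerBucket) {
  const rolledUp = [];
  for (let i = 0; i < values.length; i += pointsPerBucket) {
    const chunk = values.slice(i, i + pointsPerBucket);
    const sum = chunk.reduce((total, v) => total + v, 0);
    rolledUp.push(sum / chunk.length);
  }
  return rolledUp;
}

// Twelve 10-second samples become two 1-minute averages.
const tenSecondSamples = [1, 2, 3, 4, 5, 6, 10, 10, 10, 10, 10, 10];
console.log(downsample(tenSecondSamples, 6)); // [ 3.5, 10 ]
```

The trade-off is the usual one: the coarse series needs far less storage, but short spikes in the original data are smoothed away.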

Now that we have discussed Graphite, let’s turn to StatsD.

1.4 StatsD

StatsD is a Node.js application. It was created to transmit data points about networks, servers and applications, which can then be rendered into graphs.

1.5 Setup

We will use the docker image located at: https://hub.docker.com/r/graphiteapp/docker-graphite-statsd/

Here is the very simple docker-compose.yml:

Code listing: https://gist.github.com/denshirenji/f2b3c9ab4c365718f4e61cb77797b3ed.js


After starting this Docker image, browse to http://localhost. The browser will load the Graphite web application as shown below:

Graphite web application


At this point, the Graphite metrics should be empty. Let’s test the deployment by sending a simple metric to our StatsD daemon using the command below:

          echo "deploys.test.myservice:1|c" | nc -w 1 -u localhost 8125

Here the syntax is as follows:

          bucket:value|type

1.6 Bucket

The bucket is an identifier for the metric. Metric datagrams with the same bucket and the same type are considered occurrences of the same event by the server. In the example above, we used “deploys.test.myservice” as our bucket.

1.7 Value

The value is a number that is associated with the metric. Values have different meanings depending on the metric’s type.

1.8 Type

The type determines what kind of metric is being sent. There are several metric types, such as counters, timers, gauges and histograms.
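Putting the three parts together, a metric line can be split back into its bucket, value and type. The following is a minimal sketch of such a parser, not the code StatsD itself uses; it deliberately ignores optional extras such as sample rates and packets carrying several metrics, and the function name is invented for this example.

```javascript
// Parse one metric line of the form "bucket:value|type".
// Simplified on purpose: sample rates and multi-metric packets are not handled.
function parseMetric(line) {
  const match = /^([^:|]+):([^|]+)\|([a-z]+)$/.exec(line.trim());
  if (!match) {
    throw new Error(`Not a valid metric line: ${line}`);
  }
  const [, bucket, rawValue, type] = match;
  const value = Number(rawValue);
  if (Number.isNaN(value)) {
    throw new Error(`Value is not a number: ${rawValue}`);
  }
  return { bucket, value, type };
}

console.log(parseMetric("deploys.test.myservice:1|c"));
// { bucket: 'deploys.test.myservice', value: 1, type: 'c' }
```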

1.9 Counters

Counters, as the name suggests, simply count occurrences. For example, if you want to count how many times an order has been created, you would create a counter such as “order.create” and increment it every time an order is created within your application. This metric becomes useful when observed over a time interval, which gives you the number of orders created within a day, an hour, and so on.
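To make the "observed over a time interval" part concrete, here is a rough sketch of how counter increments received during one window could be summed and turned into a rate. The function and the `flushIntervalSeconds` parameter are illustrative names; the 10 second window mirrors StatsD's default flush interval.

```javascript
// Sum counter increments per bucket for one window and derive a per-second rate.
// `flushIntervalSeconds` is an assumed parameter for this illustration.
function aggregateCounters(increments, flushIntervalSeconds) {
  const totals = new Map();
  for (const { bucket, value } of increments) {
    totals.set(bucket, (totals.get(bucket) || 0) + value);
  }
  const result = {};
  for (const [bucket, count] of totals) {
    result[bucket] = { count, perSecond: count / flushIntervalSeconds };
  }
  return result;
}

// Three datagrams for the same bucket arrive within one 10-second window.
const receivedIncrements = [
  { bucket: "order.create", value: 1 },
  { bucket: "order.create", value: 1 },
  { bucket: "order.create", value: 3 },
];
console.log(aggregateCounters(receivedIncrements, 10));
// { 'order.create': { count: 5, perSecond: 0.5 } }
```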

1.10 Timers

Timers differ from counters in that they measure a duration. For example, if you want to measure how long a REST API took to respond, you would use a timer. A single timer value, for example 100 ms, is not very useful on its own. It becomes more useful when combined over a time interval, such as 6 hours. Several submetrics are automatically computed for each timer, such as the mean, standard deviation, 50th percentile, 90th percentile, 95th percentile, etc.

          echo "deploys.test.myservice.time:55|ms" | nc -w 1 -u localhost 8125
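The kind of submetrics derived from a batch of timer values can be sketched as follows. This is an illustration under stated assumptions, not StatsD's implementation: percentiles use the simple nearest-rank method and the standard deviation is the population form, so the numbers StatsD reports may differ in detail. All names are invented for the example.

```javascript
// Summarise the timer values (in ms) collected during one interval.
// Assumption: nearest-rank percentiles and population standard deviation.
function summariseTimings(timingsMs) {
  if (timingsMs.length === 0) {
    throw new Error("No timings to summarise");
  }
  const sorted = [...timingsMs].sort((a, b) => a - b);
  const count = sorted.length;
  const mean = sorted.reduce((total, t) => total + t, 0) / count;
  const variance =
    sorted.reduce((total, t) => total + (t - mean) ** 2, 0) / count;
  const percentile = (p) =>
    sorted[Math.max(0, Math.ceil((p / 100) * count) - 1)];
  return {
    count,
    lower: sorted[0],
    upper: sorted[count - 1],
    mean,
    stddev: Math.sqrt(variance),
    p50: percentile(50),
    p90: percentile(90),
  };
}

// Five response times: the mean is 70, p50 is 65 and p90 is 100.
console.log(summariseTimings([55, 40, 100, 65, 90]));
```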

1.11 Gauges

Gauges hold an arbitrary value at a point in time, which can either increase or decrease. For example, you could use a gauge to represent the number of threads in an application, or the number of jobs in a queue.
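Unlike a counter, a gauge keeps its last value until it is changed. The sketch below models that behaviour, including StatsD's convention that a value with an explicit sign adjusts the gauge instead of replacing it. The function name and bucket names are hypothetical.

```javascript
// Apply gauge updates in arrival order. A plain number replaces the stored
// value; a value with an explicit sign ("+4", "-10") adjusts it instead.
function applyGaugeUpdates(updates) {
  const gauges = {};
  for (const { bucket, value } of updates) {
    const isDelta = /^[+-]/.test(value);
    const current = gauges[bucket] || 0;
    gauges[bucket] = isDelta ? current + Number(value) : Number(value);
  }
  return gauges;
}

console.log(
  applyGaugeUpdates([
    { bucket: "queue.jobs", value: "12" },
    { bucket: "queue.jobs", value: "+4" },
    { bucket: "queue.jobs", value: "-10" },
    { bucket: "app.threads", value: "8" },
  ])
);
// { 'queue.jobs': 6, 'app.threads': 8 }
```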

Here is our Graphite web application showing both the counter and timer values on a single graph:


2. Integration with Node.js

So far, we have seen how to send metrics from the command line. In real life, metrics are generated by applications, such as those running on Node.js or Java-based servers.

Let’s see how an application written in Node.js can send metrics. Consider an Express server running on port 3000 as seen below:


Code listing: https://gist.github.com/denshirenji/b06e88c5e7bfa0ce2bd58ee78e390d12.js


First, we need to install node-statsd using npm:


          npm i node-statsd --save


We then create an instance of StatsD client as follows:


          const StatsD = require("node-statsd"),
              client = new StatsD();


The StatsD constructor takes a number of optional arguments such as the host and port of the machine which is running the StatsD server. The full documentation can be found at https://github.com/sivy/node-statsd

In my case, I was running StatsD with the default options, which are host localhost and port 8125.

Once we have created an instance of the client, we can call various methods to send metrics from our application. For example, we can track the count and the timing of the API invocations as follows:


Code listing: https://gist.github.com/denshirenji/dd6283e552090be0467df63d0f4523cf.js
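Under the hood, a StatsD client does little more than format a string and send it as a UDP datagram. The sketch below shows that idea using only Node's built-in dgram module. It is not how node-statsd is implemented, and the function and bucket names are hypothetical; the host and port defaults are the ones used in this article.

```javascript
const dgram = require("dgram");

// Build the wire format described earlier: "bucket:value|type".
function formatMetric(bucket, value, type) {
  return `${bucket}:${value}|${type}`;
}

// Send one metric as a UDP datagram, then close the socket.
// Host and port default to the values used in this article.
function sendMetric(bucket, value, type, host = "localhost", port = 8125) {
  const socket = dgram.createSocket("udp4");
  const payload = Buffer.from(formatMetric(bucket, value, type));
  socket.send(payload, port, host, (err) => {
    if (err) console.error("Failed to send metric:", err.message);
    socket.close();
  });
}

console.log(formatMetric("api.requests", 1, "c")); // api.requests:1|c

// Usage, e.g. inside a request handler (hypothetical bucket names):
// sendMetric("api.requests", 1, "c");
// sendMetric("api.response_time", 42, "ms");
```

Opening a socket per metric keeps the sketch short; a real client would normally reuse one socket for the lifetime of the process.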


As soon as I type http://localhost:3000 in my browser, the API gets invoked and the StatsD client gets into action. I am able to see the updated metrics in my graphite web application.





For all the methods available on the client instance, check out the documentation at https://github.com/sivy/node-statsd.


3. Integration with Java

Integration with Java-based clients is very similar to Node.js. If you are using a build system such as Maven or Gradle (which is highly recommended), there is a utility jar available to make this integration easier. Add the following to the build configuration to have it included automatically:


For Maven:

          <dependency>
              <groupId>com.timgroup</groupId>
              <artifactId>java-statsd-client</artifactId>
              <version>3.1.0</version>
          </dependency>


For Gradle:

          implementation group: 'com.timgroup', name: 'java-statsd-client', version: '3.1.0'


Once the client library is imported, we create an instance of the StatsDClient interface using the implementation class NonBlockingStatsDClient, providing our desired prefix and the hostname and port on which our StatsD server is running.

As shown below, the interface offers straightforward methods such as time(), incrementCounter(), etc., which send the metrics to the StatsD server. For the full documentation, refer to https://github.com/tim-group/java-statsd-client.


Code listing: https://gist.github.com/denshirenji/714b96f9b066f6c37acb33fa13b56133.js


4. Horizontally Scaling StatsD

In a large infrastructure, a single StatsD server may not be able to handle all of the load and will eventually need horizontal scaling. Horizontal scaling with StatsD cannot rely on simple round-robin load balancing, because StatsD itself aggregates metrics. If data points for the same key are spread across multiple nodes, no single StatsD instance sees all of them, so none can aggregate that metric accurately.

Hence, the creators of StatsD have released a StatsD Cluster Proxy, which uses consistent hashing to make sure that the same metrics are always sent to the same instance.
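To show why consistent hashing keeps a metric on one instance, here is a simplified hash ring. It is only an illustration of the technique, not the proxy's actual code: the MD5-based positions, the `replicas` parameter and the node addresses are all assumptions made for this sketch.

```javascript
const crypto = require("crypto");

// Map a string to a 32-bit unsigned integer position on the ring.
function ringPosition(text) {
  return crypto.createHash("md5").update(text).digest().readUInt32BE(0);
}

// Build a ring with several virtual points per node (`replicas` is an
// illustrative parameter) so that keys spread more evenly across nodes.
function buildRing(nodes, replicas = 50) {
  const ring = [];
  for (const node of nodes) {
    for (let i = 0; i < replicas; i++) {
      ring.push({ position: ringPosition(`${node}#${i}`), node });
    }
  }
  return ring.sort((a, b) => a.position - b.position);
}

// Pick the first ring point at or after the key's position, wrapping around.
function nodeForKey(ring, key) {
  const position = ringPosition(key);
  const point = ring.find((p) => p.position >= position) || ring[0];
  return point.node;
}

// Hypothetical StatsD instances behind the proxy.
const ring = buildRing(["127.0.0.1:8127", "127.0.0.1:8129", "127.0.0.1:8131"]);
console.log(nodeForKey(ring, "order.create"));
console.log(nodeForKey(ring, "order.create")); // same node as the line above
```

A useful property of this scheme is that removing a node only moves the keys that node owned; every other key keeps its instance, which is what makes health-check based removal of nodes practical.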

Below is a very simple configuration for StatsD cluster proxy:


Code listing: https://gist.github.com/denshirenji/d0a6fd6edabae129ed473d9936c579de.js


Once the config file is set up, simply run the proxy as follows:


          node proxy.js proxyConfig.js


proxy.js is available at the root of the StatsD installation directory.

Some of the configuration keys deserve an explanation:

  • checkInterval: determines the health check interval. If a node is offline, the cluster proxy takes it out of the configuration.
  • server: the server implementation that the proxy loads to receive incoming metrics, which it then forwards to the instances listed in the “nodes” configuration.

5. Conclusion

StatsD and Graphite make an excellent choice for monitoring your infrastructure. All of the code and configuration mentioned above is available in the GitHub repository.

Some of the key advantages are:

  • Low memory footprint: StatsD is a very simple Node.js-based server with a very low memory footprint, which makes it easy to kickstart this setup in your infrastructure.
  • Network efficient: Since StatsD can work over UDP, which is a connectionless protocol, large amounts of data can be transferred in a very short amount of time.

This post was written by our guest blogger Madhur Ahuja. Follow him on Twitter for more great ideas and information about monitoring!
