Graphite and Metric Collisions

This article was originally published on March 3, 2015, by Charlie von Metzradt, co-founder of Hosted Graphite, for the Hosted Graphite blog. Since then, Hosted Graphite has become MetricFire but our goal has stayed the same: Monitoring should be accessible. For more information and for updates on new features, book a time with our team!

One of the really useful things about Graphite (and maybe the standout feature behind its wide adoption) is that you can just fire a new metric at the collector: Graphite will happily accept it, and you get useful graphs. Add some code to your app, or configure a plugin for collectd or Diamond, restart your app, and your new metrics quickly appear like magic!
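To make "fire a new metric at the collector" concrete, here is a minimal sketch using Graphite's plaintext protocol, where each line has the form `<metric path> <value> <timestamp>`. The host, the metric name, and the helper names `format_metric` and `send_metric` are assumptions for illustration; port 2003 is Carbon's default plaintext listener, but your setup may differ.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single metric over TCP.

    host and port are assumptions: a Carbon listener running locally
    on its default plaintext port.
    """
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling `send_metric("myapp.signups.count", 1)` with a hypothetical metric name like this one is all it takes: no schema change and no registration step, which is exactly why new names are so easy to create.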

There are two possible issues with this:

  1. Lack of basic controls - if someone commits a chunk of unreviewed code that puts a rapidly changing or random element in a metric name, you’re going to end up with a whole lot of junk in your system. Add a username to your metric name in a system with a few million unique users, and you get a few million metrics. Whoops!
  2. Metric name collisions - if you have more than one server sending any given metric name at the same time, it’s like the movie Highlander (with fewer Freddie Mercury/lightning effects): THERE CAN BE ONLY ONE!

As Jason Dixon summed it up in a recent post on the Graphite-dev mailing list:

It’s assumed that you avoid namespace collisions in each backend cluster. Otherwise, whichever backend returns the query first, “wins”.

Let’s say someone accidentally uses the same metric name in a few different places - picture trying to get useful information from two completely separate sets of data that have been interpolated side-by-side rather than collected together and processed.
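A toy model can show what that mixed-up picture looks like. The sketch below assumes a store that keeps a single value per metric name and timestamp, so a later write replaces an earlier one. It is not Graphite's actual code path, and the server names, metric name, and values are all made up.

```python
def record(store, name, timestamp, value):
    """Keep exactly one value per (name, timestamp); a later write replaces an earlier one."""
    store[(name, timestamp)] = value


def series(store, name):
    """Return the stored values for one metric name, ordered by timestamp."""
    points = [(ts, v) for (n, ts), v in store.items() if n == name]
    return [v for ts, v in sorted(points)]


# Hypothetical arrivals as (server, timestamp, value), in the order they reach the store.
# web1 reports values around 100, web2 reports values around 3, under the same name.
ARRIVALS = [
    ("web1", 0, 100),
    ("web2", 0, 3),
    ("web2", 60, 4),
    ("web1", 60, 110),
    ("web1", 120, 105),
    ("web2", 120, 2),
]

store = {}
for _server, ts, value in ARRIVALS:
    record(store, "app.requests", ts, value)

collided = series(store, "app.requests")  # a mix of both servers' values
```

In this model the resulting series hops between the two servers depending on whose write landed last, so the graph describes neither of them.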

Well, that sucks. When building out the backend for Hosted Graphite, we spent a lot of time trying to figure out the best and worst parts of Graphite so we could focus on the good and eliminate or mitigate the bad. In the usual love-hate relationship that people have with Graphite, the fast metric creation is great, and collisions are just sort of annoying behavior.

In our setup, we have control over where we collect from - and have removed any issues with metric collisions. We’re greedy! If you send the same metric name from multiple servers, we collect all the data. By default, we display the average, but we also collect all the data points and give you a true sum, minimum, and maximum, as well as a few more exotic views like a random sampling of data points for calculating percentiles.
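As a rough illustration of the "collect everything" idea, the sketch below keeps every data point sent for a timestamp and derives the average, sum, minimum, and maximum from them. It is only a sketch of the concept, not Hosted Graphite's implementation; the `aggregate` function and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(points):
    """Group (timestamp, value) points from any number of senders and summarize each timestamp."""
    by_ts = defaultdict(list)
    for ts, value in points:
        by_ts[ts].append(value)  # keep every point instead of overwriting
    return {
        ts: {
            "avg": mean(vals),
            "sum": sum(vals),
            "min": min(vals),
            "max": max(vals),
            "count": len(vals),
        }
        for ts, vals in sorted(by_ts.items())
    }


# Hypothetical points for one metric name, sent by two servers at the same timestamps.
points = [(0, 100), (0, 3), (60, 110), (60, 4)]
summary = aggregate(points)
```

Because nothing is discarded at write time, the choice between average, sum, minimum, and maximum can be made when the data is read rather than when it is sent.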

Not suffering from metric namespace collisions is particularly useful if you don’t want to pre-aggregate your data somewhere yourself, or you’re looking to count something quickly across servers. No weird interpolations, just data that does what it’s supposed to.
