What is Graphite Monitoring?



Today we are going to touch on why Graphite monitoring is essential. In today’s climate of extreme competition, service reliability is crucial to the success of a business. Downtime or a degraded user experience is simply not an option, as dissatisfied customers will jump ship in an instant.


Operations teams must be able to monitor their systems organically, paying particular attention to Service Level Indicators (SLIs) pertaining to the availability of the system. Like an F1 pit team, the stakes are high, and precise tooling is crucial. 


This article will focus on the monitoring tool Graphite.


Graphite monitoring provides operations teams with visibility on varying levels of granularity concerning the behavior and mannerisms of the systems and applications. This leads to error detection, resolution, and continuous improvement.


MetricFire specializes in providing a Hosted Graphite service for monitoring. With minimal configuration, you can gain in-depth insight into your systems. If you would like to learn more about it please book a demo with us, or sign up for the free trial today.



Key Takeaways

  1. Graphite monitoring offers insights into system behavior and application performance at varying levels of granularity, aiding in error detection, resolution, and continuous improvement.
  2. Graphite stores numeric time-series data, allowing for trend analysis, pattern recognition, and even future event forecasting.
  3. Graphite accepts data through protocols like Plaintext, Pickle, and AMQP, with StatsD and CollectD commonly used as data collectors.
  4. Carbon is Graphite's backend that stores metrics temporarily before flushing to disk in Whisper's database format. Whisper is a fixed-size, file-based time-series database designed for fast and reliable storage.
  5. Graphite-web is a web app for visualizing metrics, and the Graphite composer allows for on-the-fly data interpretation. Grafana is another popular alternative for visualization.

What is time series data?

Graphite stores numeric time-series data (metric, value, epoch timestamp) and renders graphs of this data on demand. A time series is a sequence of observations taken sequentially in time. Time series analysis reveals trends and patterns associated with external factors and anomalies. With adequate graphing tools and enough time series data, it is even possible to forecast future events intuitively.


As a general rule of thumb, a time series database should meet the following requirements.


  • High availability, even amid a high volume of concurrent reads and writes. The nature of time series data means that writes are far more frequent (95-99% of operations) than reads.
  • The ability to maintain low latency queries in the face of high throughput.
  • The capability for massive-scale data volume with cold and hot data separation. It is common for an ingestion service to operate on 5 billion metrics per minute storing up to 1.5 petabytes of time-series data.
  • Distributed architecture: Considering the requirements of data write and storage, it is recommended for the underlying layer to have distributed architecture capability.



Protocols and Collectors

As Graphite's design is oriented towards modularity and doing one thing very well, it has no direct data collection support. Carbon, one of the three Graphite components, listens passively for data. Solutions such as StatsD and CollectD collect data and pass it upstream to Graphite using different protocols.



It is worth discussing the protocols Graphite accepts: Plaintext, Pickle, and AMQP. Plaintext messages take the structure <metric path> <metric value> <metric timestamp>. This protocol is best used for trivial scripts or test purposes, as it requires no additional formatting.
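As a minimal sketch of the plaintext structure described above, the following Python builds one message line and sends it over TCP. The metric path, host, and function names are hypothetical, and the port assumes Carbon's default plaintext listener (2003) has not been changed.

```python
import socket
import time


def format_plaintext(path, value, timestamp=None):
    """Build one plaintext line: '<metric path> <metric value> <metric timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())  # epoch seconds
    return f"{path} {value} {int(timestamp)}\n"


def send_plaintext(line, host="localhost", port=2003):
    """Send a line to Carbon's plaintext listener (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling send_plaintext(format_plaintext("servers.web01.cpu.load", 0.42)) would submit a single datapoint, provided a Carbon instance is listening at the assumed address.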


If sizeable amounts of data are involved, one should pack pickled data into a packet containing a simple header, and send the data over a TCP socket to Carbon's pickle receiver (by default, port 2004). Graphite can also accept data using AMQP (The Advanced Message Queuing Protocol).
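The packing step for the pickle protocol can be sketched as follows, assuming the commonly documented layout: a list of (path, (timestamp, value)) tuples, pickled and prefixed with a 4-byte big-endian length header. The function name and metric path are hypothetical, and the pickle protocol version is an assumption chosen for broad compatibility.

```python
import pickle
import struct


def build_pickle_message(datapoints):
    """Pack [(path, (timestamp, value)), ...] into header + pickled payload."""
    payload = pickle.dumps(datapoints, protocol=2)
    # The header is the payload length as an unsigned 32-bit big-endian integer.
    header = struct.pack("!L", len(payload))
    return header + payload


message = build_pickle_message([
    ("servers.web01.cpu.load", (1700000000, 0.42)),
    ("servers.web01.mem.used", (1700000000, 512.0)),
])
```

The resulting bytes would then be written to a TCP socket connected to Carbon's pickle receiver.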


AMQP ensures reliable data transfer using a message broker, which acts as a middleman in a distributed system. Machines use the broker as a central point of contact: it orders the messages in a queue, and the client collects them when it has capacity available. This avoids dead time (blocking calls), as sender and receiver do not rely on each other to continue working, which enables asynchronous communication.


If your team is already using a broker such as RabbitMQ to publish and consume data, it is possible to integrate Graphite by forking an incoming stream of messages into another queue.



Applications use a collector client to feed device metrics upstream to a Graphite server; common collectors are StatsD and CollectD.


StatsD is an event counter/aggregation service. It listens on a UDP port for incoming metrics data and periodically sends aggregated events upstream to a back end such as Graphite. Today, StatsD refers both to the original protocol written at Etsy and to the myriad of services that now implement it.
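To illustrate what an application sends to StatsD, here is a small sketch of the widely used line format (bucket, value, and a type suffix such as c for counters or ms for timers) sent as a UDP datagram. The bucket names, host, and helper functions are hypothetical; port 8125 is the usual StatsD default and is assumed here.

```python
import socket


def format_statsd(bucket, value, metric_type="c"):
    """Build a StatsD line such as 'api.requests:1|c'."""
    return f"{bucket}:{value}|{metric_type}"


def send_statsd(message, host="localhost", port=8125):
    """Fire-and-forget UDP send; no reply is expected from the daemon."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(message.encode("ascii"), (host, port))
    finally:
        sock.close()
```

Because UDP is connectionless, the sender does not block on the daemon, which is one reason this pattern adds little overhead to the instrumented application.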


CollectD is a statistics collection daemon that regularly polls various sources, such as your OS, CPU, RAM, and network, before sending the data upstream.


These two services complement each other very well. For a more in-depth discussion and comparison of StatsD and CollectD you can access the article here.



Carbon and Whisper


Graphite's back end is a daemon process named Carbon (carbon-cache). It listens for inbound metric submissions and stores the metrics temporarily in a memory buffer cache before flushing to disk in Whisper's database format. It is built on top of Twisted, which is a highly scalable event-driven I/O framework for Python.


Twisted allows for efficient asynchronous communication with many clients and can handle an extensive amount of traffic with low overhead. The optional carbon relay receives metrics from clients and applies a set of rules (regular expressions) to determine which carbon-cache server to relay the data to, which provides a type of replication.


Still, there is no synchronization in place; the visual representation will be corrupt in the case of a node failure. It is possible to configure a re-synchronization process, but the scripts provided by Graphite require significant trial and error to get right before the setup is ready for a production environment.
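The rule-based relaying described above can be pictured with a short sketch. This is not Carbon's implementation, only an illustration of matching a metric path against ordered regular-expression rules; the patterns, destination addresses, and names below are all hypothetical.

```python
import re

# Hypothetical rules: the first matching pattern decides the destinations.
RULES = [
    (re.compile(r"^collectd\."), ["10.0.0.1:2004"]),
    (re.compile(r"^stats\."), ["10.0.0.2:2004"]),
]
DEFAULT_DESTINATIONS = ["10.0.0.3:2004"]


def route(metric_path, rules=RULES, default=DEFAULT_DESTINATIONS):
    """Return the carbon-cache destinations for a metric path."""
    for pattern, destinations in rules:
        if pattern.search(metric_path):
            return destinations
    return default
```

Listing more than one destination for a rule would send the same metric to several carbon-cache servers, which is the kind of replication the text refers to.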



Whisper is a fixed-size, file-based time-series database. Applications are able to store and retrieve data in Whisper using a small set of operations (create, update, and fetch). The design of Whisper shares many attributes with an RRD (round-robin database), providing fast, reliable storage of numeric data.
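One way to see why a Whisper file is fixed-size is to count the slots each archive needs: the retention period divided by the seconds per point. The sketch below assumes a retention string in the precision:history style (for example 10s:6h,1m:7d); the parser is a simplified, hypothetical helper that only handles the s, m, h, and d suffixes.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def to_seconds(text):
    """Convert a value like '10s' or '7d' to seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def archive_slots(retentions):
    """Return (seconds_per_point, number_of_points) for each archive."""
    slots = []
    for archive in retentions.split(","):
        precision, history = archive.split(":")
        step = to_seconds(precision)
        slots.append((step, to_seconds(history) // step))
    return slots


slots = archive_slots("10s:6h,1m:7d")
```

Since the number of slots is known when the file is created, new datapoints overwrite the oldest ones rather than growing the file, in the round-robin manner mentioned above.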


The design handles the files on disk and downsamples* for long-term retention, storing high-precision raw data for a finite amount of time, and lower precision, summarised data, for more extended time frames.


* The process of converting high-resolution time-series data into low-resolution time-series data.
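Downsampling can be sketched in a few lines: group high-resolution points into coarser time buckets and summarise each bucket. Averaging is used here as an assumption; other aggregation methods (such as sum or max) may suit a given metric better. The function name and sample data are hypothetical.

```python
def downsample(points, step):
    """Average (timestamp, value) points into buckets of `step` seconds."""
    buckets = {}
    for timestamp, value in points:
        # Align each point to the start of its coarser interval.
        bucket_start = timestamp - (timestamp % step)
        buckets.setdefault(bucket_start, []).append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


# Eight 10-second points reduced to 60-second resolution.
raw = [(0, 1), (10, 2), (20, 3), (30, 4), (40, 5), (50, 6), (60, 10), (70, 20)]
coarse = downsample(raw, 60)
```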



However, problems have emerged for Graphite in the cloud era; despite running multiple carbon agents and running on SSD drives, performance does not improve without expert tuning. Storage is the primary deficiency and remedies such as sharding across multiple nodes introduce too much complexity with the current design.


The community has responded with Ceres, a redesign of the round-robin database format that is intended to replace Whisper as the default back end. In contrast to Whisper, Ceres is not a fixed-size database and is designed to better support sparse data of arbitrary fixed-size resolutions. This allows Graphite to distribute individual time series across multiple servers or mounts.


It is a good alternative for users who want to maintain their current system architecture rather than disrupt operations with a migration to an alternative storage backend. Unfortunately, development is still ongoing, and only a small percentage of the community is running Ceres in production.


MetricFire’s Hosted Graphite solution mitigates this problem by replacing Whisper storage, allowing seamless scaling with multiple redundant copies of your data. To learn more from one of our experts, book a demo with us. Alternatively, you can try it out for yourself with a 14-day free trial.




Graphite-web is a Django-based web app that provides a simple user interface for visualizing the stored metrics in a graph format. It exposes an intuitive URL-based API for immediate graphing. As it uses Cairo for rendering graphs, it depends on several graphics-related libraries typically absent from standard VMs. Make sure to run the dependencies script in the configuration to avoid unnecessary complications during installation.


The Graphite composer is the best way to learn Graphite's visualization capabilities. All of the metrics are presented in a hierarchical tree structure on the left-hand side; clicking on a metric adds its data series to the composer canvas. From here, it is easy to apply transformative functions for novel on-the-fly interpretation of the data.


Graphite's features are exposed via the API, and the UI consumes the same endpoints. As this design is so clean, many alternative visualization tools are compatible with Graphite. The most popular is Grafana, which provides a much sleeker aesthetic.


There are many ways to create and display graphs, including a simple URL API for rendering that makes it easy to embed graphs in web pages. This allows for easy sharing between teammates who can make adjustments and pass back the new URLs which are immediately loaded into the composer, allowing for quick discussion and comparison.
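As a rough illustration of the URL API, the sketch below assembles a render URL from a target and a few common query parameters (target, from, format, width, height). The base URL and metric path are placeholders, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def render_url(base_url, target, time_from="-1h", fmt="png", width=800, height=400):
    """Build a Graphite render URL for a single target."""
    query = urlencode({
        "target": target,
        "from": time_from,   # relative time, e.g. the last hour
        "format": fmt,       # image by default; a data format can be requested instead
        "width": width,
        "height": height,
    })
    return f"{base_url.rstrip('/')}/render?{query}"


url = render_url("http://graphite.example.com", "servers.web01.cpu.load")
```

A URL built this way can be pasted into a chat, an image tag, or a dashboard, which is what makes graphs easy to share and tweak.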


MetricFire specializes in monitoring systems by using both Graphite and Grafana as a service.




Graphite provides a comprehensive library of statistical and transformative rendering functions capable of turning series data streams into critical gauges of system activity. One of the most prominent features of Graphite's render API is the ability to chain functions together, allowing engineers to build up progressively more refined views of the data.


As each series on a chart is associated with a stream of data, it is possible to pipe the output of one processing function into the next, and piped and nested function calls can be combined. An example is shown below:


sumSeries(stats_global.production.counters.api.requests.*.count)|scaleToSeconds(60)|movingAverage(30)|alias('api.avg')
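The piped expression above reads left to right, and each stage receives the previous result as its first argument. To make that relationship explicit, this hypothetical helper rewrites a chain of calls into the equivalent nested form; it only manipulates strings and does not contact Graphite.

```python
def nest(series, calls):
    """Wrap `series` in each (function_name, extra_args) call, innermost first."""
    expression = series
    for name, args in calls:
        rendered = ", ".join([expression] + [repr(arg) for arg in args])
        expression = f"{name}({rendered})"
    return expression


nested = nest(
    "stats_global.production.counters.api.requests.*.count",
    [
        ("sumSeries", []),
        ("scaleToSeconds", [60]),
        ("movingAverage", [30]),
        ("alias", ["api.avg"]),
    ],
)
```

The nested string is harder to read than the piped one, which is the main argument for chaining when several functions are involved.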


There are always cases when a custom function is necessary; every production system has its quirks and anomalies. It is possible to add custom processing functions to the Graphite API. Custom functions are packaged as Python modules and are loaded by Graphite when placed in the /opt/graphite/webapp/graphite/functions/custom folder. More information on writing and using custom functions is available here.


The following is a simple example of replacing underscores in metric names:


from graphite.functions.params import Param, ParamTypes


def formatHostLegend(requestContext, seriesList):
    """Custom function that prints a prettified legend name."""
    for series in seriesList:
        # Split the name at ".perfdata"; if the marker is absent,
        # treat the whole name as the host part.
        pos = series.name.find(".perfdata")
        if pos == -1:
            pos = len(series.name)
        first = series.name[:pos]
        second = series.name[pos:]

        # Replace underscores with dots in the host part only
        series.name = first.replace('_', '.') + second

    return seriesList


# Define group
formatHostLegend.group = 'Custom'

# Define parameters for the callback
formatHostLegend.params = [
    Param('seriesList', ParamTypes.seriesList, required=True),
]

# Register the callback function
SeriesFunctions = {
    'formathostlegend': formatHostLegend,
}



Old and new school engineers alike love Graphite, and few monitoring tools are as malleable. The evidence lies in the diverse range of companies using the tool in their production systems: Twitch, Etsy, GitHub, and SendGrid, to name a few.


However, these teams have experts who know Graphite inside out, and they know how to tune this tool to merge and mutate with their current systems. Most organizations do not have the resources or expertise to do this.


This is where MetricFire can help. We can provide this expertise for your team and deliver a fully hosted Graphite solution tailored to the needs and nuances of your system. Your team will not have to worry about scalability, releases, plugins, maintenance, tuning, or backups. Everything will work out of the box tailored to your needs with 24/7, 365 continuous automated monitoring from around the world.


We took the best parts of open-source Graphite and supercharged them. We also added everything that is missing in vanilla Graphite: a built-in agent, team accounts, granular dashboard permissions, and integrations to other technologies and services like AWS, Heroku, logging tools, and more.


If you would like to learn more about Graphite monitoring you can book a demo with us, or sign up for the free trial today.            
