Graphite Monitoring Tool Tutorial

Introduction: Graphite monitoring

In this post, we will go through the process of installing and configuring Graphite on an Ubuntu machine.


What is Graphite Monitoring?


In short, Graphite collects, stores, and visualizes time-series data in real time. It provides operations teams with instrumentation, giving visibility into the behavior of the system at varying levels of granularity. This supports error detection, resolution, and continuous improvement. Graphite is composed of the following components.


  • Carbon: receives metrics over the network and writes to disk using a storage backend.
  • Whisper: file-based time-series database. 
  • Web: Django app which renders graphs and dashboards.
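
To make Carbon's role concrete, here is a minimal Python sketch of the plaintext line format that Carbon's line receiver accepts: a metric path, a value, and a Unix timestamp separated by spaces. The helper names, the metric path, and the 127.0.0.1 host are illustrative assumptions; port 2003 matches the Graphite port used later in the StatsD configuration.

```python
import socket
import time


def format_carbon_line(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_to_carbon(line, host="127.0.0.1", port=2003):
    """Send one line to Carbon over TCP (host and port are assumptions)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric path; this only prints, so no Carbon server is needed.
    print(format_carbon_line("demo.cpu.load", 0.42, 1700000000), end="")
```

Calling send_to_carbon with such a line only works once Carbon is installed and running, which the rest of this tutorial covers.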


Sign up for the MetricFire free trial to set up Graphite and build your Grafana dashboard. You can also book a demo and talk to the MetricFire team on how you can best set up your monitoring stack.



Key Takeaways

  1. Graphite is a tool that stores, collects, and visualizes time-series data in real-time. It offers granular visibility into system behavior, aiding in error detection, resolution, and continuous improvement.
  2. You can customize the Graphite web app's user interface, including graph dimensions and themes, to suit your preferences.


Prerequisites: an Ubuntu 20.04 machine with at least 2GB of RAM.


System Update


sudo apt update
sudo apt upgrade -y


Graphite Stack Installation

First, we must satisfy build dependencies for the various Graphite monitoring tool components. This is done via the command line:


sudo apt -y install python3-dev python3-pip libcairo2-dev libffi-dev build-essential


Set PYTHONPATH to augment the default search path for module files.


export PYTHONPATH="/opt/graphite/lib/:/opt/graphite/webapp/"


Install Whisper, the data storage engine.


sudo -H pip3 install --no-binary=:all:


Install the Carbon data-caching daemon.


sudo -H pip3 install --no-binary=:all:


Install Graphite-web, the web-based visualization frontend.


sudo -H pip3 install --no-binary=:all:


Install and Configure Database

Graphite uses SQLite as the default database to store Django attributes such as dashboards, preferences, and graphs; metric data is not stored here. In this tutorial, however, we will demonstrate PostgreSQL integration. The following software is required for communication between Graphite and PostgreSQL.


sudo apt-get install postgresql libpq-dev python3-psycopg2


The next step is to create a database with a username and password. The TeamPassword password generator helps here.


sudo -u postgres psql


Graphite Web Configuration

Graphite-web loads its runtime configuration from a settings file that is imported from the web app module. We must copy the example template before adding our desired configuration to the web app.


cd /opt/graphite/webapp/graphite
sudo nano /etc/graphite/


Uncomment and edit the SECRET_KEY, TIME_ZONE, USE_REMOTE_USER_AUTHENTICATION, DEBUG, and DATABASES settings as outlined below.




Set SECRET_KEY to a long, random, unique string to use as the secret key for this install. This key salts the hashes used in auth tokens, CSRF middleware, cookie storage, and so on. It should be set identically among instances if used behind a load balancer. A tool such as uuidgen can generate a suitable value.
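
As an alternative to uuidgen, a secret key can also be generated from Python's standard library. This is a minimal sketch; the helper name make_secret_key and the default length are assumptions, not part of Graphite.

```python
import secrets
import uuid


def make_secret_key(nbytes=50):
    """Return a URL-safe random string built from nbytes of randomness."""
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Either value can be pasted into the SECRET_KEY setting.
    print(make_secret_key())
    print(uuid.uuid4())  # comparable to the output of the uuidgen command
```

Generate the key once and reuse it on every instance behind the load balancer, rather than generating a new one at each start.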


TIME_ZONE = 'Europe/Amsterdam'


Set your local timezone (Django's default is America/Chicago). If your graphs appear to be offset by a couple of hours, then this probably needs to be explicitly set to your local time zone.


DEBUG = True


We also set DEBUG to True here because current versions of Django will not serve static files (JavaScript, images, and so on) from the development server we are using in our demonstration unless it is enabled. A more formal installation would leave the DEBUG setting disabled.




REMOTE_USER authentication. See:


DATABASES = {
   'default': {
     'NAME': 'fire',
     'ENGINE': 'django.db.backends.postgresql_psycopg2',
     'USER': 'metric',
     'HOST': '',
     'PORT': ''
   }
}


Above is an example of using PostgreSQL. The default database engine is SQLite ('django.db.backends.sqlite3').


PostgreSQL, MySQL, SQLite, and Oracle are all compatible with Graphite.


Graphite Schema

It is necessary to set up an initial Graphite schema with the following command.


sudo -H PYTHONPATH=/opt/graphite/webapp django-admin migrate --settings=graphite.settings --run-syncdb


At this point, the database is empty, so we need a user that has complete access to the administration system. The django-admin command below, run with the “createsuperuser” argument, will prompt you for a username, e-mail, and password, creating an admin user for managing other users on the web front end.


sudo -H PYTHONPATH=/opt/graphite/webapp django-admin createsuperuser --settings=graphite.settings


Static Content

/opt/graphite/static is the default location for Graphite-web’s static content. One must manually populate the directory with the following command:


sudo -H PYTHONPATH=/opt/graphite/webapp django-admin collectstatic --noinput --settings=graphite.settings


Carbon Configuration

Next, there are two configuration files that Carbon uses to control its cache and aggregation abilities, as well as the output storage format. We must copy the example configuration files as a template for carbon.conf and storage-schemas.conf.


sudo cp /opt/graphite/conf/carbon.conf.example /opt/graphite/conf/carbon.conf
sudo cp /opt/graphite/conf/storage-schemas.conf.example 


Add the following to storage-schemas.conf to define retention and downsampling requirements, as recommended by StatsD.


sudo nano /opt/graphite/conf/storage-schemas.conf

[stats]
pattern = ^stats.*
retentions = 10s:6h,1m:6d,10m:1800d


The above translates to: for all metrics starting with 'stats' (i.e. all metrics sent by StatsD), capture:

  • Six hours of 10-second data (what we consider "near-real-time")
  • Six days of 1-minute data
  • About five years (1800 days) of 10-minute data
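
To see what this retention string means in terms of storage, the following Python sketch converts each precision:duration pair into a step in seconds and a number of data points. It is an illustration only: the function names are assumptions, and it handles just the unit-suffix form shown above rather than everything Whisper accepts.

```python
# Seconds per unit suffix used in retention definitions.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec):
    """Convert a value such as '10s' or '6h' to seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def parse_retentions(retentions):
    """Return a list of (seconds_per_point, number_of_points) per archive."""
    archives = []
    for archive in retentions.split(","):
        precision, duration = archive.strip().split(":")
        step = to_seconds(precision)
        span = to_seconds(duration)
        archives.append((step, span // step))
    return archives


if __name__ == "__main__":
    for step, points in parse_retentions("10s:6h,1m:6d,10m:1800d"):
        print(f"one point every {step}s, {points} points kept")
```

For the retention string above this gives 2160 points at 10-second resolution, 8640 points at 1-minute resolution, and 259200 points at 10-minute resolution.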


The recommendations also outline aggregation specifications that match these patterns, preventing data from being corrupted or discarded when downsampled.


Edit the conf/storage-aggregation.conf file to mimic the following.


[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.upper(_\d+)?$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average


For metrics ending with .lower or .upper, only the minimum and the maximum value, respectively, are retained. See StatsD for more details.
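
The interaction between xFilesFactor and aggregationMethod can be sketched in Python as follows. This is a simplified model of downsampling, not Whisper's actual code: the function name is an assumption, and None stands for a missing data point.

```python
def aggregate(points, method, x_files_factor):
    """Roll several fine-grained points up into one coarser point.

    Returns None when too few points are known, i.e. when the fraction
    of non-missing points is below x_files_factor.
    """
    known = [p for p in points if p is not None]
    if not known or len(known) / len(points) < x_files_factor:
        return None
    if method == "min":
        return min(known)
    if method == "max":
        return max(known)
    if method == "sum":
        return sum(known)
    if method == "average":
        return sum(known) / len(known)
    raise ValueError(f"unsupported aggregation method: {method}")


if __name__ == "__main__":
    # Six 10-second points roll up into one 1-minute point.
    sparse = [4, None, None, None, None, None]
    print(aggregate(sparse, "sum", 0))        # kept: any known point is enough
    print(aggregate(sparse, "average", 0.3))  # dropped: only 1 of 6 points known
```

This shows why counts use xFilesFactor = 0: a single recorded event must survive downsampling, whereas an average built from one sixth of the expected points is considered too unreliable to store.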


At this point, we can do a quick test to ensure the setup is correct. Run the web interface under the Django development server with the following commands.


cd /opt/graphite 
sudo PYTHONPATH=`pwd`/whisper ./bin/ 
--libs=`pwd`/webapp/ /opt/graphite/


By default, the server listens on port 8080, so point your web browser to port 8080 on your server.


The Graphite interface should appear. If it does not, the debug mode configuration should provide enough information to diagnose the problem; otherwise, tail the latest process log.


tail -f /opt/graphite/storage/log/webapp/*.log



We will now expose the web application using Nginx, which will proxy requests to Gunicorn, which in turn listens locally on port 8080 and serves the web app (a Django application).


sudo apt install gunicorn nginx
sudo ln -s /usr/local/bin/gunicorn /opt/graphite/bin/gunicorn


Create Nginx log files and add the correct permissions.


sudo touch /var/log/nginx/graphite.access.log
sudo touch /var/log/nginx/graphite.error.log
sudo chmod 640 /var/log/nginx/graphite.*
sudo chown www-data:www-data /var/log/nginx/graphite.*


Create a configuration file called /etc/nginx/sites-available/graphite and add the following content. Change the HOSTNAME to match your server name.


upstream graphite {
    server fail_timeout=0;
}

server {
    listen 80 default_server;

    server_name HOSTNAME;

    root /opt/graphite/webapp;

    access_log /var/log/nginx/graphite.access.log;
    error_log  /var/log/nginx/graphite.error.log;

    location = /favicon.ico {
        return 204;
    }

    # serve static content from the "content" directory
    location /static {
        alias /opt/graphite/webapp/content;
        expires max;
    }

    location / {
        try_files $uri @graphite;
    }

    location @graphite {
        proxy_pass_header Server;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_connect_timeout 10;
        proxy_read_timeout 10;
        proxy_pass http://graphite;
    }
}


We need to enable the server block by creating a symbolic link from this file to the sites-enabled directory, which Nginx reads during startup.


sudo ln -s /etc/nginx/sites-available/graphite /etc/nginx/sites-enabled
sudo rm -f /etc/nginx/sites-enabled/default


Then validate Nginx configuration.


sudo nginx -t 
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful


Finally, restart the Nginx service.


sudo systemctl restart nginx



Applications use a collector client to feed device metrics upstream to a Graphite server, typically StatsD or collectd. StatsD is an event counter and aggregation service: it listens on a UDP port for incoming metrics data and periodically sends aggregated events upstream to a back end such as Graphite.


Today, StatsD refers to the original protocol written at Etsy and to the myriad of services that now implement this protocol.


StatsD requires Node; to install, use the following commands.


curl -L -s | sudo bash
sudo apt install -y nodejs git
sudo ln -s /usr/bin/node /usr/local/bin/node


Clone StatsD from the Etsy repository.


sudo git clone /opt/statsd


Add the following configuration for Graphite integration.


sudo nano /opt/statsd/localConfig.js

{
   graphitePort: 2003,
   graphiteHost: "",
   port: 8125,
   backends: [ "./backends/graphite" ]
}



We will use Supervisor to manage the Carbon, StatsD, and Gunicorn processes. Each process requires a configuration file, as outlined below.


sudo apt install -y supervisor




sudo nano /etc/supervisor/conf.d/statsd.conf


[program:statsd]
command=/usr/local/bin/node /opt/statsd/stats.js /opt/statsd/localConfig.js








sudo nano /etc/supervisor/conf.d/gunicorn.conf


[program:gunicorn]
command = /opt/graphite/bin/gunicorn -b -w 2 --pythonpath /opt/graphite/webapp/ wsgi:application

directory = /opt/graphite/webapp/



redirect_stderr = true




sudo nano /etc/supervisor/conf.d/carbon.conf


[program:carbon]
command = /opt/graphite/bin/ --debug start



redirect_stderr = true


Restart Supervisor so that the new configuration is loaded.


sudo systemctl restart supervisor
sudo systemctl enable supervisor


The following command will reveal if the processes are running successfully or not.


sudo supervisorctl 

carbon                           RUNNING   pid 1320, uptime 1:41:29
gunicorn                         RUNNING   pid 1321, uptime 1:41:29
statsd                           RUNNING   pid 1322, uptime 1:41:29
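
If you want to check this output from a script rather than by eye, a small parser is enough. The sketch below assumes the column layout shown above (process name, then state) and uses hypothetical function names.

```python
def parse_status(output):
    """Map each process name to its state from supervisorctl-style output."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


def not_running(output):
    """Return the sorted names of processes whose state is not RUNNING."""
    states = parse_status(output)
    return sorted(name for name, state in states.items() if state != "RUNNING")


if __name__ == "__main__":
    sample = (
        "carbon                           RUNNING   pid 1320, uptime 1:41:29\n"
        "gunicorn                         RUNNING   pid 1321, uptime 1:41:29\n"
        "statsd                           RUNNING   pid 1322, uptime 1:41:29\n"
    )
    print(not_running(sample) or "all processes running")
```

In practice the text could come from running sudo supervisorctl status through Python's subprocess module instead of the hard-coded sample.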


If there is an error you can debug with the following.


systemctl status supervisor 
tail -f /var/log/supervisor/supervisord.log


Exploring StatsD and Graphite Interaction

Now that we are up and running, we can send data to StatsD and examine the results in the Graphite web app. StatsD accepts the following format.


echo "metric_name:metric_value|type_specification" | nc -u -w0 8125
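
The same line format can be produced and sent from application code. The following Python sketch is one possible way to do it with the standard library; the function names and the 127.0.0.1 host are assumptions, while port 8125 is the StatsD port configured earlier.

```python
import socket


def format_metric(name, value, metric_type, sample_rate=None):
    """Build a StatsD line such as 'demo.gauge:100|g' or 'demo.count:1|c|@0.1'."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one StatsD line as a UDP datagram (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    line = format_metric("demo.gauge", 100, "g")
    print(line)
    send_metric(line)  # UDP: this does not fail even if nothing is listening
```

Because UDP is connectionless, the sender gets no confirmation, so a missing graph is more likely a wrong host or port than an error in the application.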


The metric name and value are self-explanatory. The commonly used data types and their applications are listed below:

  • Gauges
  • Timers
  • Counters
  • Sets


Gauges hold a constant value and are best used for instrumentation, for example the current load of the system. They are not subject to averaging, and they don’t change unless you directly alter them.


echo "demo.gauge:100|g" | nc -u -w0 8125


The new stat is accessible under stats > gauges > demo in the tree hierarchy on the left-hand side.

Wait 10 seconds (flush rate) and send another data point.


echo "demo.gauge:125|g" | nc -u -w0 8125


Notice how it maintains its value until the next one is set.
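
This "last value wins and persists" behavior can be modeled in a few lines of Python. The class below is a toy illustration with assumed names, not StatsD's implementation, and it assumes gauges are kept between flushes as described above.

```python
class GaugeStore:
    """Toy model of gauges: each flush reports the most recent value."""

    def __init__(self):
        self.values = {}

    def set(self, name, value):
        # A new reading simply overwrites the previous one.
        self.values[name] = value

    def flush(self):
        # Gauges are not reset on flush, so the value carries over.
        return dict(self.values)


if __name__ == "__main__":
    store = GaugeStore()
    store.set("demo.gauge", 100)
    print(store.flush())  # first flush reports 100
    print(store.flush())  # no new data: still 100
    store.set("demo.gauge", 125)
    print(store.flush())  # now 125
```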


Timers measure the duration of a process, crucial for measuring application performance, database calls, render times, etc.


echo "demo.timer:250|ms" | nc -u -w0 8125
echo "demo.timer:258|ms" | nc -u -w0 8125
echo "demo.timer:175|ms" | nc -u -w0 8125


StatsD will provide us with percentiles, average (mean), standard deviation, sum, and lower and upper bounds for the flush interval; vital information for modeling and understanding how a system behaves in the wild.
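
To get a feel for these derived values, the sketch below computes comparable statistics in Python for the three timer samples sent above. It is an approximation for illustration: the function name is an assumption, and the percentile uses a simple nearest-rank rule that may round differently from StatsD itself.

```python
import math
import statistics


def summarize_timer(samples_ms, percentile=90):
    """Summarize timer samples collected during one flush interval."""
    values = sorted(samples_ms)
    count = len(values)
    # Nearest-rank: keep the fastest `rank` samples for percentile stats.
    rank = max(1, math.ceil(percentile / 100 * count))
    within = values[:rank]
    return {
        "count": count,
        "lower": values[0],
        "upper": values[-1],
        "sum": sum(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        f"upper_{percentile}": within[-1],
        f"mean_{percentile}": statistics.fmean(within),
    }


if __name__ == "__main__":
    for key, value in summarize_timer([250, 258, 175]).items():
        print(f"{key}: {value}")
```

With only three samples the 90th percentile covers all of them, so the percentile values match the overall upper bound and mean; with more samples they exclude the slowest outliers.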


Counters are the most basic type and the default, and are used to measure how frequently an event occurs, for example failed login attempts. The following example counts the number of calls to an endpoint.


<metric name>:<value>|c[|@<rate>]

echo "demo.count:1|c" | nc -u -w0 8125
echo "demo.count:1|c" | nc -u -w0 8125
echo "demo.count:1|c" | nc -u -w0 8125
echo "demo.count:1|c" | nc -u -w0 8125
echo "demo.count:1|c" | nc -u -w0 8125


When viewing the graph, we can observe the average number of events per second, while the count metric shows us the number of occurrences within the flush interval.
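
The relationship between count and rate can be worked through in Python. This sketch assumes the 10-second flush interval mentioned earlier and models the optional sample rate from the syntax line above by scaling each increment by 1/rate; the function name is hypothetical.

```python
def flush_counter(events, flush_interval_s=10):
    """Compute count and per-second rate for one flush interval.

    `events` is a list of (value, sample_rate) pairs; a sample rate of
    0.1 means the client sent only one in ten events.
    """
    count = sum(value / sample_rate for value, sample_rate in events)
    return {"count": count, "rate": count / flush_interval_s}


if __name__ == "__main__":
    # The five "demo.count:1|c" packets above, all within one interval.
    result = flush_counter([(1, 1.0)] * 5)
    print(result["count"], result["rate"])
```

Five increments in one 10-second interval give a count of 5 and a rate of 0.5 events per second.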


Sets count the number of unique occurrences between flushes. When a metric sends a unique value, an event is counted. For example, it is possible to count the number of users accessing your system as a UID accessing multiple times will only be counted once. By cross-referencing the graph with the commands below, we can see only two recorded values.


echo "demo.set:100|s" | nc -u -w0 8125
echo "demo.set:100|s" | nc -u -w0 8125
echo "demo.set:100|s" | nc -u -w0 8125
echo "demo.set:8|s" | nc -u -w0 8125
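
The uniqueness rule is easy to reproduce in Python with a built-in set. The sketch below parses the four packets above and reports the number of distinct values per metric for one flush interval; the function name is an assumption.

```python
def flush_sets(packets):
    """Count the unique values seen per set metric in one flush interval."""
    sets = {}
    for packet in packets:
        name, rest = packet.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type == "s":
            sets.setdefault(name, set()).add(value)
    return {name: len(members) for name, members in sets.items()}


if __name__ == "__main__":
    packets = ["demo.set:100|s", "demo.set:100|s", "demo.set:100|s", "demo.set:8|s"]
    print(flush_sets(packets))
```

The value 100 is sent three times but counted once, so together with the value 8 the set reports two unique occurrences.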


Dashboard Configuration

It is possible to tailor the Graphite web app UI to our preferences. First, we need to create the configuration files by copying the default template files.


cd /opt/graphite/conf
sudo cp dashboard.conf.example dashboard.conf
sudo cp graphTemplates.conf.example graphTemplates.conf


We can modify the dashboards to have larger tile sizes to prevent eye strain when reading the data.


sudo nano /opt/graphite/conf/dashboard.conf

[ui]
default_graph_width = 450
default_graph_height = 450
automatic_variants = true
refresh_interval = 60
autocomplete_delay = 375
merge_hover_delay = 750


We can also modify the theme and aesthetics. For example, the following set of attributes gives us a solarized dark-style theme.


sudo nano /opt/graphite/conf/graphTemplates.conf

background = #002b36
foreground = #839496
majorLine = #fdf6e3
minorLine = #eee8d5
lineColors = 268bd2aa,859900aa,dc322faa,d33682aa,db4b16aa,b58900aa,2aa198aa,6c71c4aa
fontName = Sans
fontSize = 10



As you can see, setting up Graphite can become an installation maze. Getting the best out of Graphite requires mastery, which takes time in the trenches learning the ins and outs of the system.


MetricFire can provide this expertise for your team and deliver a fully hosted Graphite solution tailored to the needs and nuances of your system. Your team will not have to worry about scalability, releases, plugins, maintenance, tuning or backups. Everything will work out of the box tailored to your needs with 24/7, 365 continuous automated monitoring from around the world.

We took the best parts of open-source Graphite and supercharged them. We also added everything that is missing in vanilla Graphite: a built-in agent, team accounts, granular dashboard permissions, and integrations to other technologies and services like AWS, Heroku, logging tools, and more.


MetricFire’s Hosted Graphite will help you visualize your data without any setup hassles. Sign up for your free trial to get started, or contact us for a quick and easy demo and learn from one of our MetricFire engineers!
