Graphite Monitoring Tool Tutorial.

Introduction: Graphite monitoring

In this post, we will go through the process of installing and configuring Graphite on an Ubuntu machine.

                                      

What is Graphite Monitoring?

 

In short, Graphite collects, stores, and visualizes time-series data in real time. It gives operations teams instrumentation, with visibility at varying levels of granularity into how the system behaves. This supports error detection, resolution, and continuous improvement. Graphite is composed of the following components:

                       

  • Carbon: receives metrics over the network and writes to disk using a storage backend.
  • Whisper: file-based time-series database. 
  • Web: Django app which renders graphs and dashboards.

                                     

Sign up for the MetricFire free trial to set up Graphite and build your Grafana dashboard. You can also book a demo and talk to the MetricFire team on how you can best set up your monitoring stack.

                           

                                         

Key Takeaways

  1. Graphite is a tool that stores, collects, and visualizes time-series data in real-time. It offers granular visibility into system behavior, aiding in error detection, resolution, and continuous improvement.
  2. You can customize the Graphite web app's user interface, including graph dimensions and themes, to suit your preferences.

Prerequisites

Ubuntu 20.04 with at least 2GB of RAM.

                            

System Update

                             

sudo apt update
sudo apt upgrade -y

                       

Graphite Stack Installation

First, we must satisfy build dependencies for the various Graphite monitoring tool components. This is done via the command line:

                

sudo apt -y install python3-dev python3-pip libcairo2-dev libffi-dev build-essential

              

Set PYTHONPATH to augment the default search path for module files.

                  

export PYTHONPATH="/opt/graphite/lib/:/opt/graphite/webapp/"

              

Install the data storage engine.

              

sudo -H pip3 install --no-binary=:all: https://github.com/graphite-project/whisper/tarball/master

                  

Install Carbon data-caching daemon.

             

sudo -H pip3 install --no-binary=:all: https://github.com/graphite-project/carbon/tarball/master

               

Install the web-based visualization frontend.

            

sudo -H pip3 install --no-binary=:all: https://github.com/graphite-project/graphite-web/tarball/master

            

Install and Configure Database

Graphite uses SQLite as the default database to store Django attributes such as dashboards, preferences, and graphs. Metric data is not stored here. However, here we will demonstrate PostgreSQL integration. The following is the software required for communication between Graphite and PostgreSQL.

          

sudo apt-get install postgresql libpq-dev python3-psycopg2

                

The next step is to create a database with a username and password. The TeamPassword password generator helps here.

            

sudo -u postgres psql
CREATE USER metric WITH PASSWORD '$SECURE_PASS';
CREATE DATABASE fire WITH OWNER metric;
\q

        

Graphite Web Configuration

Graphite-web uses the convention of importing a local_settings.py file from the web app settings.py module - Graphite-web’s runtime configuration loads from here. We must copy an example template before adding our desired configuration to the web app.

           

cd /opt/graphite/webapp/graphite
sudo cp local_settings.py.example local_settings.py
sudo nano /opt/graphite/webapp/graphite/local_settings.py

           

Uncomment and edit the SECRET_KEY, TIME_ZONE, DEBUG, USE_REMOTE_USER_AUTHENTICATION, and DATABASES settings as outlined below.

                

SECRET_KEY = '$SECURE_PASS'

                 

Set this to a long, random, unique string to use as the secret key for this install. The key salts the hashes used in auth tokens, CSRF middleware, cookie storage, and so on. It should be set identically among instances if used behind a load balancer. A tool such as uuidgen can generate one.
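As an alternative to uuidgen, a key can be generated with Python's standard library. This is a minimal sketch; the function name generate_secret_key and the 48-byte length are illustrative assumptions, not Graphite requirements.

```python
import secrets


def generate_secret_key(num_bytes: int = 48) -> str:
    """Return a random, URL-safe string that can be pasted into SECRET_KEY.

    num_bytes is the amount of randomness; the resulting string is longer
    because it is base64-encoded.
    """
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Print one key; copy it into local_settings.py by hand.
    print(generate_secret_key())
```

Generate the key once and reuse the same value on every instance that sits behind the same load balancer.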

              

TIME_ZONE = 'Europe/Amsterdam'

                   

Set your local timezone (Django's default is America/Chicago). If your graphs appear to be offset by a couple of hours, then this probably needs to be explicitly set to your local time zone.

              

DEBUG = True

              

We also set DEBUG to True here because current versions of Django will not serve static files (JavaScript, images, and so on.) from the development server we are using in our demonstration. A more formal installation would leave the DEBUG setting disabled.

                 

USE_REMOTE_USER_AUTHENTICATION = True

                   

REMOTE_USER authentication. See: 

https://docs.djangoproject.com/en/dev/howto/auth-remote-user/

             

DATABASES = {
   'default': {
     'NAME': 'fire',
     'ENGINE': 'django.db.backends.postgresql',
     'USER': 'metric',
     'PASSWORD': '$SECURE_PASS',
     'HOST': '127.0.0.1',
     'PORT': ''
   }
}

             

Above is an example of using PostgreSQL. The default database is SQLite ('django.db.backends.sqlite3').

          

PostgreSQL, MySQL, SQLite, and Oracle are all compatible with Graphite.

              

Graphite Schema

It is necessary to set up an initial Graphite schema with the following command.

           

sudo -H PYTHONPATH=/opt/graphite/webapp django-admin migrate --settings=graphite.settings --run-syncdb

               

At this point, the database is empty, so we need a user that has complete access to the administration system. The django-admin command below, run with the "createsuperuser" argument, will prompt you for a username, e-mail, and password, creating an admin user for managing other users on the web front end.

              

sudo -H PYTHONPATH=/opt/graphite/webapp django-admin createsuperuser --settings=graphite.settings

                 

Static Content

/opt/graphite/static is the default location for Graphite-web’s static content. One must manually populate the directory with the following command:

                 

sudo -H PYTHONPATH=/opt/graphite/webapp django-admin collectstatic --noinput --settings=graphite.settings

             

Carbon Configuration

Next, there are two configuration files that Carbon uses to control its cache and aggregation abilities, as well as the output storage format. We must copy the example configuration files as a template for carbon.conf and storage-schemas.conf.

          

sudo cp /opt/graphite/conf/carbon.conf.example /opt/graphite/conf/carbon.conf
sudo cp /opt/graphite/conf/storage-schemas.conf.example /opt/graphite/conf/storage-schemas.conf

          

Add the following to storage-schemas.conf to define retention and downsampling requirements, as recommended by StatsD.

          

sudo nano /opt/graphite/conf/storage-schemas.conf



[stats]
pattern = ^stats.*
retentions = 10s:6h,1m:6d,10m:1800d

                          

The above means that, for all metrics starting with 'stats' (i.e. all metrics sent by StatsD), Graphite keeps:

  • Six hours of 10-second data (what we consider "near-real-time")
  • Six days of 1-minute data
  • About five years (1,800 days) of 10-minute data
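To see what this retention string costs on disk, the number of data points per archive can be worked out by hand or with a short script. The sketch below is illustrative only: the helper names are hypothetical, and the size estimate assumes Whisper's layout of a 16-byte file header, 12 bytes of metadata per archive, and 12 bytes per data point.

```python
import re
from typing import List

# Seconds per unit suffix used in retention definitions such as "10s:6h".
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(spec: str) -> int:
    """Convert a component such as '10s' or '1800d' into seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", spec)
    if match is None:
        raise ValueError(f"unsupported retention component: {spec!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def points_per_archive(retentions: str) -> List[int]:
    """Return the number of data points each 'precision:duration' archive holds."""
    points = []
    for archive in retentions.split(","):
        precision, duration = archive.strip().split(":")
        points.append(to_seconds(duration) // to_seconds(precision))
    return points


def estimated_file_size(retentions: str) -> int:
    """Estimate the size in bytes of one Whisper file (layout assumed as above)."""
    points = points_per_archive(retentions)
    return 16 + 12 * len(points) + 12 * sum(points)


POINTS = points_per_archive("10s:6h,1m:6d,10m:1800d")
SIZE_BYTES = estimated_file_size("10s:6h,1m:6d,10m:1800d")
```

Under these assumptions the three archives hold 2,160, 8,640, and 259,200 points, which comes to roughly 3.1 MiB per metric. Whisper allocates the whole file up front, so this figure applies to every metric matching the pattern, however little data it receives.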

                   

The recommendations also outline aggregation specifications with matching patterns, which prevent data from being corrupted or discarded when it is downsampled.

             

Edit the conf/storage-aggregation.conf file (create it from storage-aggregation.conf.example if it does not exist yet) to match the following.

            

[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.upper(_\d+)?$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average

                   

For metrics ending with .lower or .upper, only the minimum or the maximum value, respectively, is retained when the data is downsampled. See StatsD for more details.
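The interplay between xFilesFactor and aggregationMethod is easier to follow with a small model. The sketch below is not Whisper's code; it assumes that a downsampled point is written only when the fraction of known (non-empty) source points is at least xFilesFactor, and that a window with no known points is never written. The function and variable names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Aggregation functions keyed by the names used in storage-aggregation.conf.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "average": lambda values: sum(values) / len(values),
    "sum": sum,
    "min": min,
    "max": max,
}


def downsample(points: Sequence[Optional[float]], method: str,
               x_files_factor: float) -> Optional[float]:
    """Collapse one window of fine-grained points into a single coarse point.

    None marks a missing source point. The result is None when too few
    points are known to satisfy x_files_factor.
    """
    known = [p for p in points if p is not None]
    if not known:
        return None
    if len(known) / len(points) < x_files_factor:
        return None
    return AGGREGATORS[method](known)


# Six 10-second slots rolled up into one 1-minute point.
sparse = [4.0, None, None, None, None, None]
AVERAGE_SPARSE = downsample(sparse, "average", 0.3)   # 1/6 known, below 0.3
MAX_SPARSE = downsample(sparse, "max", 0.1)           # 1/6 known, above 0.1
SUM_SPARSE = downsample([None, 1.0, None, None, None, 2.0], "sum", 0.0)
```

With the configuration above, a sparse timer window still keeps its .upper value (xFilesFactor 0.1) and a counter keeps its total (xFilesFactor 0), while a default-averaged metric with the same gaps is dropped at the coarser resolution.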

               

At this point, we can do a quick test to ensure the setup is correct. Run the web interface under the Django development server with the following commands.

               

cd /opt/graphite
sudo PYTHONPATH=`pwd`/whisper ./bin/run-graphite-devel-server.py --libs=`pwd`/webapp/ /opt/graphite/

                    

By default, the server listens on port 8080. Point your web browser to http://127.0.0.1:8080.

                 

The Graphite interface should appear. If it does not, the debug mode output should provide enough information; failing that, tail the latest process log.

                

tail -f /opt/graphite/storage/log/webapp/*.log

                   

Nginx

We will now expose the web application using Nginx, which will proxy requests to Gunicorn, which in turn listens locally on port 8080 and serves the Django web app.

                     

sudo apt install gunicorn nginx
sudo ln -s "$(command -v gunicorn)" /opt/graphite/bin/gunicorn

                   

Create Nginx log files and add the correct permissions.

                 

sudo touch /var/log/nginx/graphite.access.log
sudo touch /var/log/nginx/graphite.error.log
sudo chmod 640 /var/log/nginx/graphite.*
sudo chown www-data:www-data /var/log/nginx/graphite.*

                     

Create a configuration file called /etc/nginx/sites-available/graphite and add the following content. Change the HOSTNAME to match your server name.

                   

upstream graphite {
    server 127.0.0.1:8080 fail_timeout=0;
}

server {
    listen 80 default_server;

    server_name HOSTNAME;

    root /opt/graphite/webapp;

    access_log /var/log/nginx/graphite.access.log;
    error_log  /var/log/nginx/graphite.error.log;

    location = /favicon.ico {
        return 204;
    }

    # serve static content from the "content" directory
    location /static {
        alias /opt/graphite/webapp/content;
        expires max;
    }

    location / {
        try_files $uri @graphite;
    }

    location @graphite {
        proxy_pass_header Server;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_connect_timeout 10;
        proxy_read_timeout 10;
        proxy_pass http://graphite;
    }
}

                  

We need to enable the server block files by creating symbolic links from these files to the sites-enabled directory, which Nginx reads from during startup.

                   

sudo ln -s /etc/nginx/sites-available/graphite /etc/nginx/sites-enabled
sudo rm -f /etc/nginx/sites-enabled/default

                       

Then validate Nginx configuration.

                

sudo nginx -t 
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

                      

Finally, restart the Nginx service.

                     

sudo systemctl restart nginx

                   

StatsD

Applications use a collector client to feed metrics upstream to a Graphite server, typically StatsD or CollectD. StatsD is an event counter and aggregation service: it listens on a UDP port for incoming metrics data and periodically sends aggregated events upstream to a back end such as Graphite.

                             

Today, StatsD refers to the original protocol written at Etsy and to the myriad of services that now implement this protocol.

                

StatsD requires Node; to install, use the following commands.

               

curl -L -s https://deb.nodesource.com/setup_10.x | sudo bash
sudo apt install -y nodejs git
sudo ln -s /usr/bin/node /usr/local/bin/node

                 

Clone StatsD from the Etsy repository.

                 

sudo git clone https://github.com/etsy/statsd.git /opt/statsd

                    

Add the following configuration for Graphite integration.

                  

sudo nano /opt/statsd/localConfig.js

{
   graphitePort: 2003,
   graphiteHost: "127.0.0.1",
   port: 8125,
   backends: [ "./backends/graphite" ]
}
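The graphitePort value of 2003 corresponds to Carbon's default plaintext listener, which accepts one metric per line in the form "metric.path value timestamp". The sketch below shows what StatsD effectively sends on each flush; the function names and the demo.direct metric are hypothetical, and the host and port are taken from the configuration above.

```python
import socket
import time
from typing import Optional


def format_plaintext(path: str, value: float, timestamp: Optional[int] = None) -> bytes:
    """Build one line of Carbon's plaintext protocol, newline-terminated."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n".encode("ascii")


def send_to_carbon(path: str, value: float,
                   host: str = "127.0.0.1", port: int = 2003) -> None:
    """Open a TCP connection to Carbon and send a single metric line."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(format_plaintext(path, value))


# Example line for a fixed timestamp (no network access needed to build it).
EXAMPLE_LINE = format_plaintext("demo.direct", 42, 1700000000)
```

Calling send_to_carbon("demo.direct", 42) once Carbon is running is a quick way to confirm that the cache daemon accepts data independently of StatsD.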

                     

Supervisord

We will use supervisor to manage the Carbon, StatsD and Gunicorn processes. A configuration file is required for each process; outlined below.

                      

sudo apt install -y supervisor

                  

StatsD.

              

sudo nano /etc/supervisor/conf.d/statsd.conf



[program:statsd]
command=/usr/local/bin/node /opt/statsd/stats.js /opt/statsd/localConfig.js
process_name=%(program_name)s
autostart=true
autorestart=true
stopsignal=QUIT

               

Gunicorn.

                    

sudo nano /etc/supervisor/conf.d/gunicorn.conf



[program:gunicorn]
command = /opt/graphite/bin/gunicorn -b 127.0.0.1:8080 -w 2 --pythonpath /opt/graphite/webapp/ wsgi:application
directory = /opt/graphite/webapp/
autostart=true
autorestart=true
redirect_stderr = true

                       

Carbon.

                     

sudo nano /etc/supervisor/conf.d/carbon.conf



[program:carbon]
command = /opt/graphite/bin/carbon-cache.py --debug start
autostart=true
autorestart=true
redirect_stderr = true

Restart supervisor for the new configuration to be reloaded.

sudo systemctl restart supervisor
sudo systemctl enable supervisor

                    

The following command will reveal if the processes are running successfully or not.

                

sudo supervisorctl 



carbon                           RUNNING   pid 1320, uptime 1:41:29
gunicorn                         RUNNING   pid 1321, uptime 1:41:29
statsd                           RUNNING   pid 1322, uptime 1:41:29

                    

If there is an error you can debug with the following.

                 

systemctl status supervisor 
tail -f /var/log/supervisor/supervisord.log

              

Exploring StatsD and Graphite Interaction

Now that we are up and running, we can send data to StatsD and examine the results in the Graphite web app. StatsD accepts the following format.

              

echo "metric_name:metric_value|type_specification" | nc -u -w0 127.0.0.1 8125

                

The metric name and value are self-explanatory. The commonly used data types, each described below, are:

  • Gauges
  • Timers
  • Counters
  • Sets
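The same messages can be sent from application code instead of nc. The following is a minimal sketch of a StatsD client using only the standard library; the function names are hypothetical, and the host and port match the StatsD configuration used in this post.

```python
import socket


def format_metric(name: str, value, metric_type: str) -> bytes:
    """Build a StatsD datagram of the form name:value|type."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(name: str, value, metric_type: str,
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric to StatsD over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_metric(name, value, metric_type), (host, port))


# Datagrams equivalent to the nc examples in this section.
GAUGE = format_metric("demo.gauge", 100, "g")
TIMER = format_metric("demo.timer", 250, "ms")
```

For example, send_metric("demo.gauge", 100, "g") has the same effect as the first nc command below. Because UDP is connectionless, the call does not confirm that StatsD received the datagram.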

             

Gauges hold a constant value and are best used for instrumentation; an example would be the current load of the system. They are not subject to averaging, and they don’t change unless you directly alter them.

        

echo "demo.gauge:100|g" | nc -u -w0 127.0.0.1 8125

           

The new stat is accessible under stats > gauges > demo with the tree hierarchy on the left-hand side.

Wait 10 seconds (flush rate) and send another data point.

                

echo "demo.gauge:125|g" | nc -u -w0 127.0.0.1 8125

             

Notice how it maintains its value until the next one is set.
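The stored values can also be read back as JSON through Graphite's render API rather than through the tree view. The sketch below only builds the URL. It assumes the web app is reachable at http://127.0.0.1 and that the gauge is stored as stats.gauges.demo.gauge, following the hierarchy described above; adjust both to your setup.

```python
from urllib.parse import urlencode


def render_url(target: str, base_url: str = "http://127.0.0.1",
               window: str = "-10min") -> str:
    """Build a render API URL that returns the target's datapoints as JSON."""
    query = urlencode({"target": target, "from": window, "format": "json"})
    return f"{base_url}/render?{query}"


GAUGE_URL = render_url("stats.gauges.demo.gauge")
```

Opening GAUGE_URL in a browser, or fetching it with urllib.request.urlopen, returns a list of series whose datapoints are [value, timestamp] pairs, which makes it easy to confirm that the gauge holds its last value between updates.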

                 

Timers measure the duration of a process, crucial for measuring application performance, database calls, render times, etc.

                 

echo "demo.timer:250|ms" | nc -u -w0 127.0.0.1 8125
echo "demo.timer:258|ms" | nc -u -w0 127.0.0.1 8125
echo "demo.timer:175|ms" | nc -u -w0 127.0.0.1 8125

               

StatsD will provide us with percentiles, average (mean), standard deviation, sum, and lower and upper bounds for the flush interval; vital information for modeling and understanding how a system behaves in the wild.
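To make the derived timer metrics concrete, the sketch below computes a few of them for the three samples sent above. It is an approximation written for illustration, not StatsD's implementation: percentile-based metrics are left out, and the standard deviation is assumed to be the population form.

```python
import statistics
from typing import Dict, Sequence


def summarize_timer(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize the timer samples received within one flush interval."""
    ordered = sorted(samples_ms)
    return {
        "count": len(ordered),
        "lower": ordered[0],
        "upper": ordered[-1],
        "sum": sum(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "std": statistics.pstdev(ordered),
    }


SUMMARY = summarize_timer([250, 258, 175])
```

For these samples the lower bound is 175 ms, the upper bound is 258 ms, the sum is 683 ms, and the mean is about 227.7 ms. The three samples must arrive within the same flush interval to be summarized together.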

       

Counters are the most basic and default type and are used to measure the frequency of an event, for example, failed login attempts. The following example counts the number of calls to an endpoint.

               

<metric name>:<value>|c[|@<rate>]



echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125

                

When viewing the graph, the rate metric shows the average number of events per second over the flush interval, while the count metric shows the number of occurrences within the flush interval.
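The relationship between the count and the per-second rate can be sketched as follows. This is a simplified model, not StatsD's code: it assumes the 10-second flush interval mentioned earlier and that a sampled increment (the optional @rate field) is scaled up by the inverse of its sample rate. The function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def flush_counter(increments: Iterable[Tuple[float, float]],
                  flush_interval_s: float = 10.0) -> Dict[str, float]:
    """Aggregate (value, sample_rate) pairs received in one flush interval.

    Returns the total count and the average number of events per second.
    """
    count = sum(value / sample_rate for value, sample_rate in increments)
    return {"count": count, "rate": count / flush_interval_s}


# The five unsampled increments sent above, all within one flush interval.
FIVE_CALLS = flush_counter([(1, 1.0)] * 5)

# Two increments sent with a sample rate of 0.1, i.e. "demo.count:1|c|@0.1".
SAMPLED = flush_counter([(1, 0.1)] * 2)
```

Under these assumptions the five calls give a count of 5 and a rate of 0.5 events per second, while two increments sampled at 0.1 are counted as 20 events.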

                

Sets count the number of unique occurrences between flushes. When a metric sends a unique value, an event is counted. For example, it is possible to count the number of users accessing your system, as a UID accessing it multiple times will only be counted once. By cross-referencing the graph with the commands below, we can see that only two unique values are recorded.

                   

echo "demo.set:100|s" | nc -u -w0 127.0.0.1 8125
echo "demo.set:100|s" | nc -u -w0 127.0.0.1 8125
echo "demo.set:100|s" | nc -u -w0 127.0.0.1 8125
echo "demo.set:8|s" | nc -u -w0 127.0.0.1 8125
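The set behavior amounts to counting distinct values per flush interval, which a Python set models directly. This is an illustrative sketch with a hypothetical function name, assuming all four values arrive in the same flush interval.

```python
from typing import Iterable


def flush_set(values: Iterable[str]) -> int:
    """Return the number of distinct values received in one flush interval."""
    return len(set(values))


# The four values sent above: "100" three times and "8" once.
UNIQUE = flush_set(["100", "100", "100", "8"])
```

Here UNIQUE is 2, matching the two recorded values on the graph.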

                 

Dashboard Configuration

It is possible to modify the Graphite web app UI to suit our own preferences. First, we need to create the configuration files by copying the default template files.

                  

cd /opt/graphite/conf
sudo cp dashboard.conf.example dashboard.conf
sudo cp graphTemplates.conf.example graphTemplates.conf

             

We can modify the dashboards to have larger tile sizes to prevent eye strain when reading the data.

             

sudo nano /opt/graphite/conf/dashboard.conf



[ui]
default_graph_width = 450
default_graph_height = 450
automatic_variants = true
refresh_interval = 60
autocomplete_delay = 375
merge_hover_delay = 750

                    

We can also modify the theme and aesthetics. For example, the following set of attributes gives us a solarized dark-style theme.

                      

sudo nano /opt/graphite/conf/graphTemplates.conf



[solarized-dark]
background = #002b36
foreground = #839496
majorLine = #fdf6e3
minorLine = #eee8d5
lineColors = 268bd2aa,859900aa,dc322faa,d33682aa,db4b16aa,b58900aa,2aa198aa,6c71c4aa
fontName = Sans
fontSize = 10

              

Conclusion

As you can see, the process of setting up Graphite can become an installation maze. Getting the best out of Graphite requires mastery, and that takes time in the trenches learning the ins and outs of the system.

              

MetricFire can provide this expertise for your team and deliver a fully hosted Graphite solution tailored to the needs and nuances of your system. Your team will not have to worry about scalability, releases, plugins, maintenance, tuning or backups. Everything will work out of the box tailored to your needs with 24/7, 365 continuous automated monitoring from around the world.
                 

We took the best parts of open-source Graphite and supercharged them. We also added everything that is missing in vanilla Graphite: a built-in agent, team accounts, granular dashboard permissions, and integrations to other technologies and services like AWS, Heroku, logging tools, and more.

                  

MetricFire’s Hosted Graphite will help you visualize your data without any setup hassles. Start your free trial, or contact us for a quick and easy demo with one of our MetricFire engineers!
