Graphite Monitoring Tool Tutorial

Table of Contents

  1. Introduction: Graphite monitoring
  2. Prerequisites
  3. System Update
  4. Graphite Stack Installation
  5. Install and Configure Database
  6. Graphite Web Configuration
  7. Graphite Schema
  8. Static Content
  9. Carbon Configuration
  10. Nginx
  11. StatsD
  12. Supervisord
  13. Exploring StatsD and Graphite interaction
  14. Dashboard Configuration
  15. Conclusion

Introduction: Graphite monitoring

In this post, we will go through the process of installing and configuring Graphite on an Ubuntu machine.

                                     

What is Graphite Monitoring?

In short, Graphite collects, stores, and visualizes time-series data in real time. It gives operations teams instrumentation, providing visibility into the behavior of the system at varying levels of granularity. This leads to error detection, resolution and continuous improvement. Graphite is composed of the following components:

                      

  • Carbon: receives metrics over the network and writes to disk using a storage backend.
  • Whisper: file-based time-series database. 
  • Web: Django app which renders graphs and dashboards.

                                    

Sign up for the MetricFire free trial to set up Graphite and build your Grafana dashboard. You can also book a demo and talk to the MetricFire team on how you can best set up your monitoring stack.

                          

                                        

Prerequisites

Ubuntu 20.04 with at least 2GB of RAM.

                           

System Update

                            

sudo apt update
sudo apt upgrade -y

                      

Graphite Stack Installation

First, we must satisfy build dependencies for the various Graphite monitoring tool components. This is done via the command line:

               

sudo apt -y install python3-dev python3-pip libcairo2-dev libffi-dev build-essential

             

Set PYTHONPATH to augment the default search path for module files.

                 

export PYTHONPATH="/opt/graphite/lib/:/opt/graphite/webapp/"

             

Install the data storage engine.

             

sudo -H pip3 install --no-binary=:all: https://github.com/graphite-project/whisper/tarball/master

                 

Install Carbon data-caching daemon.

            

sudo -H pip3 install --no-binary=:all: https://github.com/graphite-project/carbon/tarball/master

              

Install the web-based visualization frontend.

           

sudo -H pip3 install --no-binary=:all: https://github.com/graphite-project/graphite-web/tarball/master

           

Install and Configure Database

Graphite uses SQLite as the default database to store Django attributes such as dashboards, preferences and graphs. Metric data is not stored here. However, here we will demonstrate PostgreSQL integration. The following is the software required for communication between Graphite and PostgreSQL.

         

sudo apt-get install postgresql libpq-dev python3-psycopg2

               

The next step is to create a database with a user and password. The TeamPassword password generator helps here.

           

sudo -u postgres psql
CREATE USER metric WITH PASSWORD '$SECURE_PASS';
CREATE DATABASE fire WITH OWNER metric;
\q

       

Graphite Web Configuration

Graphite-web uses the convention of importing a local_settings.py file from the web app settings.py module - Graphite-web’s runtime configuration loads from here. We must copy an example template before adding our desired configuration to the web app.

          

cd /opt/graphite/webapp/graphite
sudo cp local_settings.py.example local_settings.py
sudo nano /opt/graphite/webapp/graphite/local_settings.py

           

Uncomment and edit the SECRET_KEY, TIME_ZONE, DEBUG, USE_REMOTE_USER_AUTHENTICATION and DATABASES settings as outlined below.

               

SECRET_KEY = '$SECURE_PASS'

                

Set this to a long, random, unique string to use as the secret key for this install. The key salts the hashes used in auth tokens, CSRF middleware, cookie storage, and so on. If Graphite runs behind a load balancer, set the key identically on every instance. A tool such as uuidgen can generate a value.
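As an alternative to uuidgen, here is a minimal sketch that generates a key with Python's standard secrets module. The function name generate_secret_key and the 48-byte default are illustrative assumptions, not Graphite requirements.

```python
import secrets


def generate_secret_key(num_bytes: int = 48) -> str:
    """Return a random URL-safe string for use as SECRET_KEY.

    num_bytes is an assumed default; any sufficiently long random value works.
    """
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed value into local_settings.py as SECRET_KEY.
    print(generate_secret_key())
```

Generate the key once and reuse the same value on every instance behind a load balancer.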

             

TIME_ZONE = 'Europe/Amsterdam'

                  

Set your local timezone (Django's default is America/Chicago). If your graphs appear to be offset by a couple of hours, then this probably needs to be explicitly set to your local timezone.

             

DEBUG = True

             

We also set DEBUG to True here because Django's development server, which we use in this demonstration, serves static files (JavaScript, images, and so on) only when DEBUG is enabled. A production installation should leave DEBUG disabled.

                

USE_REMOTE_USER_AUTHENTICATION = True

                  

This enables REMOTE_USER authentication. See https://docs.djangoproject.com/en/dev/howto/auth-remote-user/ for details.

            

DATABASES = {
   'default': {
     'NAME': 'fire',
     'ENGINE': 'django.db.backends.postgresql',
     'USER': 'metric',
     'PASSWORD': '$SECURE_PASS',
     'HOST': '127.0.0.1',
     'PORT': ''
   }
}

            

Above is an example of using PostgreSQL. The default engine is SQLite, 'django.db.backends.sqlite3'.

PostgreSQL, MySQL, SQLite and Oracle are all compatible with Graphite.
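One possible refinement, sketched below, keeps the database password out of the settings file by reading it from the environment. The variable name GRAPHITE_DB_PASSWORD is a hypothetical choice for illustration; Graphite does not define it, and the process that starts Graphite-web would need to export it.

```python
import os

# Hypothetical environment variable; it is not something Graphite defines.
DB_PASSWORD = os.environ.get("GRAPHITE_DB_PASSWORD", "")

DATABASES = {
    "default": {
        "NAME": "fire",
        # Name of the PostgreSQL backend in Django 1.9 and later.
        "ENGINE": "django.db.backends.postgresql",
        "USER": "metric",
        "PASSWORD": DB_PASSWORD,
        "HOST": "127.0.0.1",
        "PORT": "",
    }
}
```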

             

Graphite Schema

It is necessary to set up an initial Graphite schema with the following command.

          

sudo -H PYTHONPATH=/opt/graphite/webapp django-admin migrate --settings=graphite.settings --run-syncdb

              

At this point, the database is empty, so we need a user with complete access to the administration system. The django-admin command below, run with the createsuperuser argument, prompts you for a username, e-mail address, and password, and creates an admin user for managing other users on the web front end.

             

sudo -H PYTHONPATH=/opt/graphite/webapp django-admin createsuperuser --settings=graphite.settings

                

Static Content

/opt/graphite/static is the default location for Graphite-web’s static content. One must manually populate the directory with the following command:

                

sudo -H PYTHONPATH=/opt/graphite/webapp django-admin collectstatic --noinput --settings=graphite.settings

            

Carbon Configuration

Next, there are two configuration files that carbon uses to control its cache and aggregation abilities, as well as the output storage format. We must copy the example configuration files as a template for carbon.conf and storage-schemas.conf.

         

sudo cp /opt/graphite/conf/carbon.conf.example /opt/graphite/conf/carbon.conf
sudo cp /opt/graphite/conf/storage-schemas.conf.example /opt/graphite/conf/storage-schemas.conf

         

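Add the following to storage-schemas.conf to define retention and downsampling requirements, as recommended by StatsD. Carbon uses the first section whose pattern matches a metric name, so place this section above any catch-all entry.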

         

sudo nano /opt/graphite/conf/storage-schemas.conf



[stats]
pattern = ^stats.*
retentions = 10s:6h,1m:6d,10m:1800d

                         

The above translates to the following: for all metrics starting with 'stats' (i.e. all metrics sent by StatsD), capture:

  • Six hours of 10-second data (what we consider "near-real-time")
  • Six days of 1-minute data
  • Roughly five years (1800 days) of 10-minute data
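To make the arithmetic behind the retention string concrete, the sketch below parses it into (seconds per point, number of points) pairs. It is a simplified illustration, not Carbon's own parser; the helper names and the unit table are assumptions made for this example.

```python
from typing import List, Tuple

# Seconds per unit suffix, simplified for illustration.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(token: str) -> int:
    """Convert a token such as '10s' or '6h' to seconds."""
    return int(token[:-1]) * UNIT_SECONDS[token[-1]]


def parse_retentions(spec: str) -> List[Tuple[int, int]]:
    """Return one (seconds_per_point, points) pair per archive."""
    archives = []
    for part in spec.split(","):
        precision, duration = part.strip().split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives


for step, points in parse_retentions("10s:6h,1m:6d,10m:1800d"):
    days = step * points / 86400
    print(f"{step}s per point, {points} points, {days:.2f} days")
```

Each archive stores a fixed number of points, so the coarser the resolution, the longer the period a given amount of disk space covers.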

                  

The StatsD recommendations also specify aggregation rules with matching patterns, which prevent data from being corrupted or discarded when it is downsampled.

Edit the /opt/graphite/conf/storage-aggregation.conf file to mimic the following.

           

[min]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

[max]
pattern = \.upper(_\d+)?$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average

                  

For metrics ending in .lower or .upper, only the minimum or maximum value, respectively, is retained. See StatsD for more details.
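The sketch below illustrates how these rules interact when fine-grained points are rolled up into one coarser point: the first matching section supplies the aggregation method, and xFilesFactor sets the minimum fraction of known points required. It is a simplified model written for this tutorial, not Carbon or Whisper code, and the function names are assumptions.

```python
import re
from typing import List, Optional, Sequence, Tuple

# (section, pattern, xFilesFactor, aggregationMethod), in file order.
RULES: List[Tuple[str, str, float, str]] = [
    ("min", r"\.lower$", 0.1, "min"),
    ("max", r"\.upper(_\d+)?$", 0.1, "max"),
    ("sum", r"\.sum$", 0.0, "sum"),
    ("count", r"\.count$", 0.0, "sum"),
    ("count_legacy", r"^stats_counts.*", 0.0, "sum"),
    ("default_average", r".*", 0.3, "average"),
]

AGGREGATORS = {
    "min": min,
    "max": max,
    "sum": sum,
    "average": lambda values: sum(values) / len(values),
}


def match_rule(metric: str) -> Tuple[float, str]:
    """Return (xFilesFactor, aggregationMethod) of the first matching section."""
    for _name, pattern, x_files_factor, method in RULES:
        if re.search(pattern, metric):
            return x_files_factor, method
    raise LookupError(metric)


def downsample(metric: str, points: Sequence[Optional[float]]) -> Optional[float]:
    """Collapse fine-grained points into one coarse point, or None."""
    x_files_factor, method = match_rule(metric)
    known = [p for p in points if p is not None]
    # Too few known points: the coarse slot stays empty.
    if not known or len(known) / len(points) < x_files_factor:
        return None
    return AGGREGATORS[method](known)
```

For example, six 10-second slots with a single known value would leave the 1-minute slot empty under the default average rule (1/6 is below 0.3), while a counter matched by the sum rule keeps it because its xFilesFactor is 0.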

              

At this point, we can do a quick test to ensure the setup is correct. Run the web interface under the Django development server with the following commands.

              

cd /opt/graphite
sudo PYTHONPATH=`pwd`/whisper ./bin/run-graphite-devel-server.py --libs=`pwd`/webapp/ /opt/graphite/

                   

By default, the server listens on port 8080. Point your web browser to http://127.0.0.1:8080.

The Graphite interface should appear. If it does not, the debug output should provide enough information to diagnose the problem; failing that, tail the latest process log.

               

tail -f /opt/graphite/storage/log/webapp/*.log

                  

Nginx

We will now expose the web application using Nginx, which proxies requests to Gunicorn. Gunicorn in turn listens locally on port 8080 and serves the web app (a Django application).

                    

sudo apt install gunicorn nginx
sudo ln -s "$(command -v gunicorn)" /opt/graphite/bin/gunicorn

                  

Create Nginx log files and add the correct permissions.

                

sudo touch /var/log/nginx/graphite.access.log
sudo touch /var/log/nginx/graphite.error.log
sudo chmod 640 /var/log/nginx/graphite.*
sudo chown www-data:www-data /var/log/nginx/graphite.*

                    

Create a configuration file called /etc/nginx/sites-available/graphite and add the following content. Change the HOSTNAME to match your server name.

                  

upstream graphite {
    server 127.0.0.1:8080 fail_timeout=0;
}

server {
    listen 80 default_server;

    server_name HOSTNAME;

    root /opt/graphite/webapp;

    access_log /var/log/nginx/graphite.access.log;
    error_log  /var/log/nginx/graphite.error.log;

    location = /favicon.ico {
        return 204;
    }

    # serve static content from the "content" directory
    location /static {
        alias /opt/graphite/webapp/content;
        expires max;
    }

    location / {
        try_files $uri @graphite;
    }

    location @graphite {
        proxy_pass_header Server;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_connect_timeout 10;
        proxy_read_timeout 10;
        proxy_pass http://graphite;
    }
}

                 

We need to enable the server block by creating a symbolic link from this file to the sites-enabled directory, which Nginx reads during startup.

                  

sudo ln -s /etc/nginx/sites-available/graphite /etc/nginx/sites-enabled
sudo rm -f /etc/nginx/sites-enabled/default

                      

Then validate Nginx configuration.

               

sudo nginx -t 
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

                     

Finally, restart the Nginx service.

                    

sudo systemctl restart nginx

                  

StatsD

Applications use a collector client to feed metrics upstream to a Graphite server, typically StatsD or collectd. StatsD is an event counter and aggregation service: it listens on a UDP port for incoming metrics data and periodically sends aggregated events upstream to a back end such as Graphite.

                            

Today, StatsD refers to the original protocol written at Etsy and to the myriad of services that now implement this protocol.

               

StatsD requires Node; to install, use the following commands.

              

curl -L -s https://deb.nodesource.com/setup_10.x | sudo bash
sudo apt install -y nodejs git
sudo ln -s /usr/bin/node /usr/local/bin/node

                

Clone StatsD from the Etsy repository.

                

sudo git clone https://github.com/etsy/statsd.git /opt/statsd

                   

Add the following configuration for Graphite integration.

                 

sudo nano /opt/statsd/localConfig.js

{
   graphitePort: 2003,
   graphiteHost: "127.0.0.1",
   port: 8125,
   backends: [ "./backends/graphite" ]
}
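For context, port 2003 is where Carbon accepts its plaintext protocol, in which each line has the form "path value timestamp". The sketch below shows how a metric could be sent to Carbon directly, bypassing StatsD. The function names and the demo.direct metric path are assumptions made for illustration.

```python
import socket
import time
from typing import Optional


def format_carbon_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one plaintext-protocol line: path, value, Unix timestamp."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_to_carbon(line: str, host: str = "127.0.0.1", port: int = 2003) -> None:
    """Send one line to Carbon's plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Once Carbon is running, calling send_to_carbon(format_carbon_line("demo.direct", 42)) should create the hypothetical demo.direct metric.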

                    

Supervisord

We will use supervisor to manage the Carbon, StatsD and Gunicorn processes. A configuration file is required for each process; outlined below.

                     

sudo apt install -y supervisor

                 

StatsD.

             

sudo nano /etc/supervisor/conf.d/statsd.conf



[program:statsd]
command=/usr/local/bin/node /opt/statsd/stats.js /opt/statsd/localConfig.js
process_name=%(program_name)s
autostart=true
autorestart=true
stopsignal=QUIT

              

Gunicorn.

                   

sudo nano /etc/supervisor/conf.d/gunicorn.conf



[program:gunicorn]
command = /opt/graphite/bin/gunicorn -b 127.0.0.1:8080 -w 2 --pythonpath /opt/graphite/webapp/ wsgi:application
directory = /opt/graphite/webapp/
autostart=true
autorestart=true
redirect_stderr = true

                      

Carbon.

                    

sudo nano /etc/supervisor/conf.d/carbon.conf



[program:carbon]
command = /opt/graphite/bin/carbon-cache.py --debug start
autostart=true
autorestart=true
redirect_stderr = true

Restart supervisor to load the new configuration.

sudo systemctl restart supervisor
sudo systemctl enable supervisor

                   

The following command will reveal if the processes are running successfully or not.

               

sudo supervisorctl status



carbon                           RUNNING   pid 1320, uptime 1:41:29
gunicorn                         RUNNING   pid 1321, uptime 1:41:29
statsd                           RUNNING   pid 1322, uptime 1:41:29

                   

If there is an error you can debug with the following.

                

systemctl status supervisor 
tail -f /var/log/supervisor/supervisord.log

             

Exploring StatsD and Graphite interaction

Now that we are up and running, we can send data to StatsD and examine the feedback in the graphite web app. StatsD accepts the following format.

             

echo "metric_name:metric_value|type_specification" | nc -u -w0 127.0.0.1 8125

              

Metric name and value are self-explanatory. The commonly used data types, whose applications are described below, are:

  • Gauges
  • Timers
  • Counters
  • Sets
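The same wire format can be produced from application code. The following is a minimal sketch of a client that builds a StatsD line and sends it over UDP; the function names are assumptions for illustration, and a production application would normally use an existing StatsD client library.

```python
import socket
from typing import Optional, Union


def format_statsd(
    name: str,
    value: Union[int, float, str],
    metric_type: str,
    sample_rate: Optional[float] = None,
) -> str:
    """Build a line such as 'demo.gauge:100|g' or 'demo.count:1|c|@0.1'."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric line to StatsD over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For instance, send_statsd(format_statsd("demo.gauge", 100, "g")) mirrors the netcat command for a gauge. Because UDP is connectionless, the call does not report whether StatsD received the packet.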

           

Gauges hold an arbitrary value that persists until it is changed. They are best used for instrumentation; an example would be the current load of the system. They are not subject to averaging, and they don’t change unless you directly alter them.

       

echo "demo.gauge:100|g" | nc -u -w0 127.0.0.1 8125

          

The new stat is accessible under stats > gauges > demo with the tree hierarchy on the left-hand side.

Wait 10 seconds (the flush interval) and send another data point.

               

echo "demo.gauge:125|g" | nc -u -w0 127.0.0.1 8125

            

Notice how it maintains its value until the next one is set.
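Besides the tree view, the stored values can be read back through Graphite-web's render endpoint. The sketch below builds such a request and is offered under assumptions: the base URL points at the local server on port 8080, and the metric path stats.gauges.demo.gauge is inferred from the tree hierarchy described above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(base_url: str, target: str, since: str = "-5min") -> str:
    """Build a render URL that returns the series as JSON."""
    query = urlencode({"target": target, "from": since, "format": "json"})
    return f"{base_url.rstrip('/')}/render?{query}"


def fetch_series(url: str):
    """Fetch and decode the JSON response from Graphite-web."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


print(render_url("http://127.0.0.1:8080", "stats.gauges.demo.gauge"))
```

Passing the printed URL to fetch_series, or opening it in a browser, should return the recent data points for the gauge once the stack is running.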

                

Timers measure the duration of a process, crucial for measuring application performance, database calls, render times, etc.

                

echo "demo.timer:250|ms" | nc -u -w0 127.0.0.1 8125
echo "demo.timer:258|ms" | nc -u -w0 127.0.0.1 8125
echo "demo.timer:175|ms" | nc -u -w0 127.0.0.1 8125

              

StatsD will provide us with percentiles, average (mean), standard deviation, sum, lower and upper bounds for the flush interval; vital information for modelling and understanding how a system is behaving in the wild.
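To illustrate what those aggregates look like for the three samples sent above, the sketch below computes them in plain Python. It is not StatsD's implementation: percentiles are omitted for brevity, and the use of the population standard deviation is an assumption.

```python
import math
from typing import Dict, Sequence


def summarize_timer(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize the timer samples received in one flush interval."""
    count = len(samples_ms)
    total = sum(samples_ms)
    mean = total / count
    # Population standard deviation (an assumption about the exact formula).
    std = math.sqrt(sum((s - mean) ** 2 for s in samples_ms) / count)
    return {
        "count": count,
        "lower": min(samples_ms),
        "upper": max(samples_ms),
        "sum": total,
        "mean": mean,
        "std": std,
    }


print(summarize_timer([250, 258, 175]))
```

For these samples the lower bound is 175, the upper bound is 258, and the sum is 683, giving a mean of roughly 227.7 ms.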

      

Counters are the most basic and default type. They measure how often an event occurs, for example failed login attempts. The wire format accepts an optional sample rate; the example below counts calls to an endpoint.

              

<metric name>:<value>|c[|@<rate>]



echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125
echo "demo.count:1|c" | nc -u -w0 127.0.0.1 8125

               

When viewing the graph, the rate metric shows the average number of events per second, while the count metric shows the number of occurrences within the flush interval.
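The relationship between the count and the per-second rate can be sketched as follows. This is a simplified model that assumes a 10-second flush interval and scales each sampled value by the inverse of its sample rate; the function name is hypothetical.

```python
from typing import Iterable, Tuple


def flush_counter(
    events: Iterable[Tuple[float, float]], flush_interval_s: float = 10.0
) -> Tuple[float, float]:
    """Take (value, sample_rate) pairs and return (count, per-second rate)."""
    # A value sampled at rate 0.1 stands in for ten real events.
    count = sum(value / sample_rate for value, sample_rate in events)
    return count, count / flush_interval_s


count, rate = flush_counter([(1, 1.0)] * 5)
print(count, rate)
```

Five unsampled increments within one 10-second interval give a count of 5 and a rate of 0.5 events per second.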

               

Sets count the number of unique values received between flushes; a value is counted only the first time it appears. For example, you can count the number of users accessing your system, because a UID that accesses it multiple times is counted only once. Cross-referencing the graph with the commands below, we see only two unique values recorded.

                  

echo "demo.set:100|s" | nc -u -w0 127.0.0.1 8125
echo "demo.set:100|s" | nc -u -w0 127.0.0.1 8125
echo "demo.set:100|s" | nc -u -w0 127.0.0.1 8125
echo "demo.set:8|s" | nc -u -w0 127.0.0.1 8125
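The counting behind a set can be modeled in a couple of lines. This is a sketch with a hypothetical helper name, not StatsD code.

```python
from typing import Iterable


def flush_set(values: Iterable[str]) -> int:
    """Return the number of distinct values seen in one flush interval."""
    return len(set(values))


# The four packets sent above carry two distinct values: "100" and "8".
print(flush_set(["100", "100", "100", "8"]))
```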

                

Dashboard Configuration

It is possible to modify the graphite web app UI to our bespoke preferences. First, we need to create the configuration files by copying the default template files.

                 

cd /opt/graphite/conf
sudo cp dashboard.conf.example dashboard.conf
sudo cp graphTemplates.conf.example graphTemplates.conf

            

We can modify the dashboards to have larger tile sizes to prevent eye strain when reading the data.

            

sudo nano /opt/graphite/conf/dashboard.conf



[ui]
default_graph_width = 450
default_graph_height = 450
automatic_variants = true
refresh_interval = 60
autocomplete_delay = 375
merge_hover_delay = 750

                   

We can also modify the theme and aesthetics. For example, the following set of attributes gives us a solarized dark style theme.

                     

sudo nano /opt/graphite/conf/graphTemplates.conf



[solarized-dark]
background = #002b36
foreground = #839496
majorLine = #fdf6e3
minorLine = #eee8d5
lineColors = 268bd2aa,859900aa,dc322faa,d33682aa,cb4b16aa,b58900aa,2aa198aa,6c71c4aa
fontName = Sans
fontSize = 10

             

Conclusion

As you can see, setting up Graphite can become an installation maze. Getting the best out of Graphite requires mastery, and this requires time in the trenches, learning the ins and outs of the system.

             

MetricFire can provide this expertise for your team and deliver a fully hosted Graphite solution tailored to the needs and nuances of your system. Your team will not have to worry about scalability, releases, plugins, maintenance, tuning or backups. Everything will work out of the box tailored to your needs with 24/7, 365 continuous automated monitoring from around the world.
                

We took the best parts of open-source Graphite, and supercharged them. We also added everything that is missing in vanilla Graphite: a built-in agent, team accounts, granular dashboard permissions, and integrations to other technologies and services like AWS, Heroku, logging tools and more.

                

MetricFire’s Hosted Graphite will help you visualize your data without any setup hassles. Go ahead and start your free trial, or contact us for a quick and easy demo and learn from one of our MetricFire engineers!
