Grafana alerting

Table of Contents

  1. Starting with your monitoring environment
  2. How to create and configure Alert Rules
  3. Useful alerts for monitoring infrastructure and network
  4. Summary

Many organizations use Grafana to visualize data and to get notified about events in their infrastructure. In this article, we show how to create and configure Grafana alert rules.

To get started, log in to the MetricFire free trial, where you can send metrics and build Grafana dashboards directly in our platform.

Starting with your monitoring environment

To demonstrate how to use Grafana Alerts, we first need to set up our monitoring environment.

In this article, we will use Elasticsearch as the data source for MetricFire’s Grafana as a Service. It is a convenient way to get Grafana installed and configured without the issues that can come with an on-premise installation. If you want more information about using Grafana with Elasticsearch, check out our tutorial.

You can also set up your MetricFire free trial with other data sources. The most common data sources (such as StatsD, collectd, AWS, and New Relic) are available as add-ons within the product. We also have a MetricFire Agent that you can download to pull metrics from your system into the platform. In the case of Prometheus, you can change your prometheus.yml file to allow MetricFire to read data through an API key. Instructions can be found in our docs.

To have something to monitor in Grafana, we will generate random data with a simple Python script (a producer) and send it to an Elasticsearch index. To put the data in the index, we use Logstash. Although the data is random, we keep it within a specific range. Occasionally, the script generates values outside that range, and these are the cases in which the Grafana Alerts should be triggered.

The script will generate random values in the range from -5 to 34 for parameter_1. For parameter_2, it will generate random numbers in the range from 100 to 209, but every 20th second the number generated will be in the range from 250 to 254. Our script will send new data to Logstash every second. This will give us a good data set to practice making Grafana Alerts with. 
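The producer described above could look roughly like the sketch below. It is not the original script: the Logstash host, port, and the use of a TCP input with JSON lines are assumptions, and so is the choice of integer values. Only parameter_1 and parameter_2 are generated, because those are the two fields whose ranges are described here.

```python
import json
import random
import socket
import time

# Assumptions (not from the article): Logstash accepts newline-delimited JSON
# over TCP on this hypothetical host and port.
LOGSTASH_HOST = "localhost"
LOGSTASH_PORT = 5000


def make_record(tick):
    """Build one data point for the given second counter."""
    if tick % 20 == 0:
        # Every 20th second, emit an out-of-range spike for parameter_2.
        parameter_2 = random.randint(250, 254)
    else:
        parameter_2 = random.randint(100, 209)
    return {
        "parameter_1": random.randint(-5, 34),
        "parameter_2": parameter_2,
    }


def run(host=LOGSTASH_HOST, port=LOGSTASH_PORT):
    """Send one JSON line per second to Logstash until interrupted."""
    with socket.create_connection((host, port)) as conn:
        tick = 1
        while True:
            line = json.dumps(make_record(tick)) + "\n"
            conn.sendall(line.encode("utf-8"))
            tick += 1
            time.sleep(1)
```

Calling run() starts the loop. Keeping make_record() separate from the sending code makes it easy to check the generated ranges without a running Logstash instance.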

How to create and configure Alert Rules

If you want to receive notifications about alerts, you should define at least one notification channel. This is how the information about a triggered alert will be delivered to you. Several notification channels, such as Slack and PagerDuty, are available as add-ons on MetricFire. Here is the tutorial for using Slack as the notification channel for MetricFire’s product, Hosted Graphite. However, notification channels are not a requirement for alerting.

Alert rules are created independently for each dashboard panel. We have a dashboard that monitors three fields (parameter_1, parameter_2, parameter_3) from our Elasticsearch index:

[Screenshot: the dashboard monitoring the three fields]

An alert rule can be set while the panel is being created. If you already have the panel and want to add an alert rule to it, edit the panel:

[Screenshot: editing the panel]

Then click on the Alert button (the bell icon):

[Screenshot: the Alert button with the bell icon]

Finally, click on the Create Alert button:

[Screenshot: the Create Alert button]

Now we need to configure the alert:

[Screenshot: the alert configuration form]

As you can see, we give the alert a name, set the evaluation frequency, and define the specific conditions. For this alert, we want to be notified when the average value of parameter_1 falls outside the range [-2 : 32]. Because the Python script produces values between -5 and 34, the value will sometimes be outside that range.

In the Conditions section, you can see the query(A, 1s, now) part. Let’s explain what these parameters mean. “A” is the query used to visualize the metric. You could see where this query was defined in one of the previous images (before clicking on the button with the bell). In our case, it is the average of parameter_1 over the last second. The parameters “1s” and “now” set the time range: from 1 second ago to now. In other words, we evaluate the data from the last second. Below the Conditions section, you can also configure how the alert behaves when data is missing or when errors occur. This matters because gaps in the data can be frequent.
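To make the pieces of the condition concrete, here is a minimal sketch that writes the parameter_1 rule as data and evaluates it over a window of values. The dictionary layout mirrors how we understand older Grafana versions to store alert conditions in the dashboard JSON. Treat the field names as an assumption and compare them with your own exported dashboard. The is_firing() helper is hypothetical and only illustrates the logic; it is not Grafana code.

```python
from statistics import mean

# The condition from the text. Field names are an assumption modeled on the
# dashboard JSON of older Grafana versions; verify against your own export.
condition = {
    "type": "query",
    "query": {"params": ["A", "1s", "now"]},  # query A, from 1s ago to now
    "reducer": {"type": "avg", "params": []},
    "evaluator": {"type": "outside_range", "params": [-2, 32]},
}


def is_firing(values, condition):
    """Reduce the window to its average, then test it against the range."""
    if not values:
        # No data in the window: this case is governed by the separate
        # "no data" setting, so the range test is not applied.
        return None
    low, high = condition["evaluator"]["params"]
    reduced = mean(values)
    # Assumed: the bounds themselves count as inside the range.
    return reduced < low or reduced > high
```

For example, is_firing([33, 34], condition) reports that the alert fires, while is_firing([0, 10], condition) does not.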

The graph below visualizes the alert’s conditions:

[Screenshot: graph visualizing the alert conditions]

If you scroll down the page, you will see the section for specifying the notification channel. We will use the “example email” channel, which we created earlier:

[Screenshot: selecting the notification channel]

To apply the changes, save the dashboard. After we run the Python script and wait for a while, we start to receive emails with notifications about the alert. Here is an example of such an email:

[Screenshot: example notification email]

Similarly, we can create other alert rules. Below you can see the condition of the alert for parameter_2. In this case, we want to receive notifications when the maximum value computed over the last 10 seconds is above 253.

[Screenshot: alert condition for parameter_2]

Besides the “out-of-bounds” and “above/below” conditions, there is a third condition type: missing values. Here is how it can be configured:

[Screenshot: condition for missing values]

Remember that you can create complex conditions that consist of several blocks. To do this, click on the “Plus” button under the first condition block. Condition blocks can be combined using the “AND” or “OR” operators. As a result, you can get something like this:

[Screenshot: alert with several condition blocks]
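The sketch below shows one way to think about how stacked condition blocks combine. It assumes the blocks are folded strictly from top to bottom with no operator precedence. Check this assumption against the Grafana version you run. The combine() function is a hypothetical illustration, not part of Grafana.

```python
def combine(results):
    """Fold (operator, fired) pairs in order, without operator precedence.

    `results` is a list of (operator, fired) tuples, one per condition block.
    The operator of the first block is ignored because there is no earlier
    result to join it with.
    """
    firing = None
    for index, (operator, fired) in enumerate(results):
        if index == 0:
            firing = fired
        elif operator == "or":
            firing = firing or fired
        else:  # treat anything else as "and"
            firing = firing and fired
    return bool(firing)
```

Under this assumption, the order of the blocks matters: a false block followed by an “AND” block and then a true “OR” block evaluates as (false AND true) OR true.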

 

Note that there are many different functions for evaluation: count, sum, median, diff, min, max, etc. You can also set up alerts on other queries (instead of just “A” as in our examples above).
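As a rough illustration of what these evaluation functions do, the sketch below reduces a window of points to a single number and then applies the parameter_2 rule (maximum above 253). The REDUCERS table and the sample window are hypothetical, and the meaning of diff (newest value minus oldest value) is our assumption.

```python
from statistics import mean, median

# Hypothetical reducer table; the names follow the functions listed above.
REDUCERS = {
    "avg": mean,
    "min": min,
    "max": max,
    "sum": sum,
    "count": len,
    "median": median,
    # Assumed meaning of diff: newest value minus oldest value in the window.
    "diff": lambda values: values[-1] - values[0],
}


def reduce_window(values, reducer):
    """Collapse the points in the evaluation window to one number."""
    return REDUCERS[reducer](values)


# Made-up parameter_2 values for a 10-second window that contains one spike.
window = [120, 187, 254, 101, 143, 209, 166, 130, 198, 175]
alert = reduce_window(window, "max") > 253
```

With the same window, an average-based rule would not fire, because a single spike barely moves the mean. This is why max suits the parameter_2 alert.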

For example, suppose that we have two queries: A and B (see the image below). Query A reflects the average value of parameter_1 over the specified period of time. Query B reflects the sum of the values of parameter_2 over the specified period of time.

[Screenshot: panel with queries A and B]

When you have several different queries, you can create alerts based on them:

[Screenshot: alert conditions based on queries A and B]

Useful alerts for monitoring infrastructure and network

For people who monitor infrastructure and networks, several types of alerts can be useful. They can monitor server load, request latency, error rates, and memory usage. If you want to monitor the performance of an application, there can be even more use-case-specific metrics to watch. For example, there could be an alert about a large number of new user registrations over a short period of time. Remember that in the panel query (which was named “A” in our examples) you can include a custom request to the data source. To do this, use the Query field (see the image below). With the Elasticsearch data source, this should be a Lucene query.

[Screenshot: the Query field of the panel]
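As a small illustration of what could go into the Query field, the sketch below builds Lucene range clauses for the fields used in this article. The lucene_range() helper is hypothetical, and the thresholds are only examples. The resulting strings are what you would paste into the Query field.

```python
def lucene_range(field, low="*", high="*"):
    """Return a Lucene inclusive range clause such as field:[250 TO *]."""
    return f"{field}:[{low} TO {high}]"


# Hypothetical queries against the fields used in this article.
spikes_only = lucene_range("parameter_2", low=250)
normal_band = lucene_range("parameter_2", low=100, high=209)
either_high = f"{spikes_only} OR {lucene_range('parameter_1', low=30)}"
```

Here spikes_only is the string parameter_2:[250 TO *], which restricts the panel query to the spike values before any alert condition is evaluated.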

 

The ability to create custom requests lets you develop more complex alert conditions.

Summary

In this post, we looked at how to work with Grafana Alerts. The focus was on creating and configuring alert conditions. If you need to be notified about alerts, you should set up a notification channel. Also, take a look at our favorite Grafana Dashboards, our article about Grafana plugins, and our Grafana Dashboard tutorial.

If you want to know how MetricFire can help with your monitoring needs, book a demo and talk to us directly. And don’t forget you can use our 14-day free trial, and make your own Grafana Dashboards within a few minutes. 
