Grafana is one of the most popular dashboarding and visualization tools for metrics, and Grafana dashboards are an important part of infrastructure and application instrumentation. In this post, we will take a deep dive into Grafana dashboards. We will build a dashboard for a VM’s most important metrics, create advanced dashboards with variables that filter metrics across multiple instances, import and export dashboards, configure time ranges and auto-refresh, set up alerts, and learn about plugins.
To start, we need a metrics source to visualize in Grafana. We will use Prometheus as the data source, with node-exporter exposing a VM’s metrics for Prometheus to scrape; Grafana then queries Prometheus. We suggest signing up for MetricFire’s Hosted Graphite free trial for your monitoring, but this post covers Prometheus. You can set up your own Grafana dashboards right on our platform and apply what you learn from this article.
Installing Prometheus and Node-Exporter on Debian-based systems
sudo apt-get install prometheus prometheus-node-exporter
This installs both Prometheus and node-exporter and runs them as systemd services. By default, Prometheus is configured to scrape metrics from node-exporter. Refer to our previous post for installing Grafana. Now that everything is up and running, let’s add Prometheus as a data source and create a dashboard.
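node-exporter serves its metrics over HTTP, by default at http://localhost:9100/metrics, in Prometheus’ plain-text exposition format, and Prometheus scrapes that endpoint. To get a feel for what Prometheus receives, here is a minimal Python sketch that parses a few sample lines offline. The sample values are made up, and the parser is deliberately simplified (it ignores timestamps and does not unescape label values), so treat it as an illustration rather than a complete implementation of the format.

```python
import re
from typing import Dict, List, Tuple

# Illustrative lines in the Prometheus text exposition format. The values are
# made up; a real node-exporter serves many more series at :9100/metrics.
SAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="user"} 234.5
node_memory_MemTotal_bytes 8.254e+09
"""

# metric name, optional {label block}, value (a trailing timestamp is ignored)
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# key="value" pairs; escaped characters are kept as-is in this sketch
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Return (metric name, labels, value) for every sample line."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, label_block, value = match.groups()
        labels = dict(LABEL_RE.findall(label_block or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_exposition(SAMPLE):
        print(name, labels, value)
```

Each node_cpu_seconds_total sample is one counter per CPU and mode, which is why the CPU panels later in this post apply irate() to it rather than plotting it directly.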
Add data source
Grafana supports many different storage backends, called data sources, each providing its own way to query and visualize data. Each data source exposes its own query language: for example, Prometheus exposes PromQL for time series data, and the MySQL data source uses SQL. Now let’s add Prometheus as a data source in Grafana.
Go to Configuration → Data Sources and click “Add data source”.
Select Prometheus and fill in the URL (Prometheus listens on http://localhost:9090 by default), authentication, scrape interval, and a name for the data source. Click Save & Test. If Grafana can connect to Prometheus, it shows “Data source is working”.
An important detail is the Access drop-down, which has two options: Server and Browser. With Server, every request to the data source goes to the Grafana backend, which forwards it to the data source. With Browser, the user’s browser sends requests to the data source directly. Server is the recommended option for secure access because it does not expose data source credentials to every user. Now that we have a data source set up, let’s visualize a VM’s metrics.
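If you prefer to script this step, Grafana’s HTTP API can create the same data source. The sketch below only builds the request for POST /api/datasources without sending it. It assumes Grafana runs on its default port 3000 and that you have an API token; the token value is a hypothetical placeholder. In the API, the Server access mode is spelled "proxy" and Browser is "direct".

```python
import json
import urllib.request

GRAFANA_URL = "http://localhost:3000"   # assumption: Grafana's default port
API_TOKEN = "REPLACE_WITH_API_TOKEN"    # hypothetical placeholder token


def datasource_payload(name: str, prometheus_url: str, server_mode: bool = True) -> dict:
    """Build the JSON body for creating a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        # "proxy" = Server access (requests go through the Grafana backend),
        # "direct" = Browser access (the browser calls Prometheus itself).
        "access": "proxy" if server_mode else "direct",
        "isDefault": True,
    }


def build_request(payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/datasources request."""
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(datasource_payload("Prometheus", "http://localhost:9090"))
    print(req.get_method(), req.full_url)
    # Passing req to urllib.request.urlopen() would actually send it.
```

Keeping "proxy" as the default mirrors the recommendation above: credentials stay on the Grafana server instead of being shipped to every browser.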
A dashboard is a group of widgets (panels), but it also provides many more features, such as folders, variables (for changing what every panel visualizes at once), time ranges, and auto-refresh. Click the + icon on the left side of the home page and create a dashboard named “Nodes Metrics”.
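Under the hood, a dashboard is just a JSON model. As a minimal sketch, assuming Grafana’s HTTP API endpoint POST /api/dashboards/db and assuming Grafana fills in defaults for the fields left out here, the same empty “Nodes Metrics” dashboard could be described like this:

```python
import json


def new_dashboard(title: str) -> dict:
    """A minimal dashboard model; Grafana is assumed to fill in the other defaults."""
    return {
        "id": None,    # serialised as null so Grafana creates a new dashboard
        "uid": None,   # null asks Grafana to generate a uid
        "title": title,
        "panels": [],  # rows and panels are appended here later
    }


def save_request_body(dashboard: dict, overwrite: bool = False) -> str:
    """JSON body for POST /api/dashboards/db on the Grafana HTTP API."""
    return json.dumps({"dashboard": dashboard, "overwrite": overwrite})


if __name__ == "__main__":
    print(save_request_body(new_dashboard("Nodes Metrics")))
```

Setting overwrite to False is the cautious choice: the API should refuse to replace an existing dashboard rather than silently overwrite it.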
Adding a row
A row is a logical divider within a dashboard that groups panels together. Rows can also be repeated dynamically for each value of a variable; we will cover variables in the next section.
Click the + icon (the first icon in the top right) and select “Convert to row”.
After the row is added, click its settings icon to open the edit pop-up, and name the row “Overview”. This row will hold basic metrics such as memory, CPU, and network usage.
Next, we will create panels and place them in the Overview row.
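In the dashboard JSON model used since Grafana 5, a row is itself an entry in the panels list with type "row". While the row is expanded, its panels follow it in the top-level list; when it is collapsed, Grafana stores them in the row’s own "panels" list. The sketch below builds such a row and lays panels out beneath it on Grafana’s 24-column grid. The helper names, the two-panels-per-line layout, and the panel sizes are illustrative assumptions, and "repeat" is included only to show how a row can be repeated per value of a variable.

```python
from typing import List, Optional


def make_row(title: str, y: int, repeat: Optional[str] = None) -> dict:
    """A row entry as it appears in a dashboard's top-level panels list."""
    row = {
        "type": "row",
        "title": title,
        "collapsed": False,
        "gridPos": {"h": 1, "w": 24, "x": 0, "y": y},  # full 24-column width
        "panels": [],  # only holds child panels while the row is collapsed
    }
    if repeat:
        row["repeat"] = repeat  # repeat the row once per value of this variable
    return row


def layout_under_row(row: dict, panels: List[dict], width: int = 12, height: int = 8) -> List[dict]:
    """Place panels left to right, then top to bottom, directly below an expanded row."""
    per_line = 24 // width
    top = row["gridPos"]["y"] + row["gridPos"]["h"]
    placed = []
    for i, panel in enumerate(panels):
        grid = {
            "h": height,
            "w": width,
            "x": (i % per_line) * width,
            "y": top + (i // per_line) * height,
        }
        placed.append(dict(panel, gridPos=grid))  # copy the panel with its position
    return [row] + placed


if __name__ == "__main__":
    overview = make_row("Overview", y=0)
    for p in layout_under_row(overview, [{"title": "CPU"}, {"title": "Memory"}, {"title": "Disk"}]):
        print(p["title"], p["gridPos"])
```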
Variables make Grafana dashboards dynamic. They can be used to select different instances of a metric: for example, if you receive metrics from multiple machines, a variable can provide a drop-down for selecting one machine’s metrics. Panel queries reference the variable, so changing the selection updates every panel that uses it. Let’s add a variable for VM names so that we can switch between VMs.
Variables can be of different types, such as:
- Data source: dynamically changes the data source used by the panels.
- Query: fetches values such as instance names and server hostnames from the data source.
- Interval: lets queries aggregate over a selectable window (for example, CPU and memory usage over the last 1m, 5m, or 10m) without adding extra panels.
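Conceptually, Grafana substitutes the selected variable values into each panel query before sending it to the data source. The hypothetical helper below sketches that idea for single values, supporting the $name and ${name} forms; the variable names instance and interval are our own choices. Real Grafana also handles multi-value and “All” selections (for Prometheus, typically as a regex alternation), which this sketch ignores, and it leaves unknown names such as the global $__interval untouched.

```python
import re
from typing import Dict

# Matches ${name} or $name
VAR_RE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def interpolate(query: str, variables: Dict[str, str]) -> str:
    """Replace $name / ${name} with the selected value; keep unknown names as-is."""
    def replace(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        return variables.get(name, match.group(0))
    return VAR_RE.sub(replace, query)


TEMPLATE = 'avg(irate(node_cpu_seconds_total{mode="idle",instance=~"$instance"}[$interval]))'

if __name__ == "__main__":
    print(interpolate(TEMPLATE, {"instance": "vm-1:9100", "interval": "5m"}))
```

With an interval variable in the range selector, one panel can show the 1m, 5m, or 10m view depending on the drop-down, which is exactly what the Interval type is for.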
Open the dashboard settings (top right), go to Variables, and add a new variable.
Select the Query type and enter a query that returns all the node-exporter hostnames, so that we can view the stats of each VM.
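In the dashboard JSON model, variables live under "templating" → "list". The sketch below shows one plausible shape for the instance variable plus an interval variable. The query label_values(node_uname_info, instance) is one common way to list node-exporter instances with Grafana’s Prometheus data source, but it is our assumption, not necessarily the query used in this post, and the exact field set varies between Grafana versions.

```python
import json
from typing import List


def query_variable(name: str, datasource: str, query: str) -> dict:
    """A Query-type variable whose values come from the data source."""
    return {
        "name": name,
        "type": "query",
        "datasource": datasource,
        "query": query,
        "refresh": 1,        # 1 = refresh the values when the dashboard loads
        "multi": False,      # single selection keeps the drop-down simple
        "includeAll": False,
    }


def interval_variable(name: str, choices: List[str]) -> dict:
    """An Interval-type variable; the choices are a comma-separated list."""
    return {"name": name, "type": "interval", "query": ",".join(choices)}


templating = {
    "list": [
        # assumption: node_uname_info is exported by node-exporter on every VM
        query_variable("instance", "Prometheus", "label_values(node_uname_info, instance)"),
        interval_variable("interval", ["1m", "5m", "10m"]),
    ]
}

if __name__ == "__main__":
    print(json.dumps(templating, indent=2))
```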
A panel is the basic visualization building block in Grafana. Built-in visualizations include Graph, Singlestat, Dashlist, Table, and Text, and plugins add many more. Before we create any panels, let’s decide what we want to monitor and which panels suit each metric.
The metrics you choose to monitor should answer two questions: what is broken, and why? Since we want to monitor a VM, we need basic metrics such as CPU, memory, network, disk I/O, disk space, uptime, and the number of running processes. If we were monitoring a web application, we would instead track the number of incoming requests, response times, response codes, resources used to serve a single request, queued and rejected requests, and so on.
Now let’s add some visualizations. Below, we explain which panels to use for CPU, memory, filesystem, and network metrics.
a. CPU: We want to see both the current CPU usage and CPU usage over time. For current usage, we will use the Gauge visualization. A gauge works like a speedometer: its needle moves up or down within a fixed range. Add a new panel and click Add query.
The left pane shows three tabs: Query, Visualization, and General. Go to Visualization and select Gauge. Under Display, set Show to Calculation and set Calc to Last, since we want the latest value. Under Field, set the unit to percent (many other units are available, such as KBps and miles). Under Thresholds, choose the value above which the gauge turns red; in this example, anything over 80% CPU usage is shown in red.
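The gauge settings above boil down to two small rules: reduce the series to its latest value, then pick a colour from ascending threshold steps. The sketch below models both in Python. The field names follow the fieldConfig layout of Grafana 7 and later, which is an assumption (older versions such as Grafana 6 store these options differently), and the helpers threshold_color and last_value are hypothetical, written only to show the logic.

```python
from typing import List, Optional

# Threshold steps, assumed to be in ascending order. The first step has no
# value and acts as the base colour.
STEPS = [
    {"color": "green", "value": None},
    {"color": "red", "value": 80},
]

# Roughly how a Grafana 7+ gauge stores these options (assumed schema).
gauge_field_config = {
    "defaults": {
        "unit": "percent",
        "min": 0,
        "max": 100,
        "thresholds": {"mode": "absolute", "steps": STEPS},
    }
}


def threshold_color(value: float, steps: List[dict]) -> str:
    """Colour of the highest step whose value is at or below the reading."""
    color = steps[0]["color"]
    for step in steps[1:]:
        if value >= step["value"]:
            color = step["color"]
    return color


def last_value(series: List[Optional[float]]) -> Optional[float]:
    """Newest non-null point in the series, or None if there is none."""
    for point in reversed(series):
        if point is not None:
            return point
    return None


if __name__ == "__main__":
    reading = last_value([12.0, 47.5, 83.2])
    print(reading, threshold_color(reading, STEPS))
```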
To derive average CPU usage, we apply irate() to node-exporter’s per-CPU time counter, node_cpu (renamed node_cpu_seconds_total in node-exporter 0.16 and later), over a 5m range and filter it with the instance variable: instance=~"$instance". To learn more, check out our articles on rate() and irate(). When we pick a different instance from the drop-down, the panel updates automatically.
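The query below is one widely used form of this calculation, assuming node-exporter 0.16 or later; it is not necessarily the exact query from the screenshots. irate() looks only at the last two samples inside the [5m] window, so the 5m sets how far back Prometheus may look for them rather than averaging over five minutes. The hypothetical Python helpers re-implement that arithmetic on hand-made samples: the idle rate per CPU, averaged across CPUs and subtracted from 100%.

```python
from typing import List, Tuple

# One common node-exporter CPU usage query (metric name from node-exporter >= 0.16).
CPU_QUERY = (
    '100 - (avg by (instance) '
    '(irate(node_cpu_seconds_total{mode="idle",instance=~"$instance"}[5m])) * 100)'
)

Samples = List[Tuple[float, float]]  # (unix timestamp in seconds, counter value)


def irate(samples: Samples) -> float:
    """Per-second rate from the last two samples, as irate() does."""
    if len(samples) < 2:
        raise ValueError("irate needs at least two samples")
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    # A drop means the counter was reset; count the new value as the increase.
    increase = v2 if v2 < v1 else v2 - v1
    return increase / (t2 - t1)


def cpu_usage_percent(idle_samples_per_cpu: List[Samples]) -> float:
    """100 minus the average idle fraction across CPUs, as a percentage."""
    idle = [irate(s) for s in idle_samples_per_cpu]
    return 100 - (sum(idle) / len(idle)) * 100


if __name__ == "__main__":
    cpu0 = [(0, 100.0), (15, 103.0)]  # idle 3 s out of 15 s -> 20% idle
    cpu1 = [(0, 50.0), (15, 56.0)]    # idle 6 s out of 15 s -> 40% idle
    print(round(cpu_usage_percent([cpu0, cpu1]), 2))
```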
The General tab is where you set the panel title, description, links, and so on. Let’s set the title to “CPU Usage [5m]”.
Next, let’s add a time series graph of CPU usage. Create a new panel, add a query, and choose Graph in the Visualization tab. Graphs offer many options, such as bars, points, stacking, line width, axis units, and sample size.
Grafana automatically calculates the time interval between points in each graph and exposes it, along with the time range, through the global variables $__interval, $__from, and $__to. For example, when viewing the last 6 months of data it may group points into 1-day intervals, whereas for the last hour it may use 1-minute intervals; the exact interval also depends on the panel width and the Max data points setting.
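The idea behind $__interval can be sketched as: divide the time range by the number of points the panel can display, never go below the minimum interval, and round up to a “nice” value. Grafana’s real rounding table differs from the hypothetical NICE_INTERVALS_S list below, so this is an approximation of the behaviour, not its implementation.

```python
import math

# Hypothetical "nice" step sizes in seconds (Grafana uses its own table).
NICE_INTERVALS_S = [1, 2, 5, 10, 15, 30, 60, 120, 300, 600, 900, 1800,
                    3600, 7200, 10800, 21600, 43200, 86400]


def approx_interval(range_s: float, max_data_points: int, min_interval_s: float = 1) -> int:
    """Approximate $__interval in seconds for a time range and point budget."""
    raw = max(range_s / max_data_points, min_interval_s)
    for step in NICE_INTERVALS_S:
        if step >= raw:
            return step
    # Beyond one day, round up to a whole number of days.
    return math.ceil(raw / 86400) * 86400


if __name__ == "__main__":
    print(approx_interval(3600, 1000))          # last hour, ~1000 px wide panel
    print(approx_interval(180 * 86400, 1000))   # roughly the last 6 months
```

A wider panel (more data points) therefore gets a finer interval for the same time range, which is why the examples above are only typical values.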
Now that we have CPU metrics visualizations, let’s add some more metrics.
b. Memory: Let’s add a visualization of total memory and current memory usage. This time we add multiple queries to the same panel: one for available memory and one for total memory.
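As a sketch, the two queries could look like the targets below, assuming node-exporter 0.16 or later on Linux (where the metrics are node_memory_MemTotal_bytes and node_memory_MemAvailable_bytes); the legend names are our own choice. The used_percent helper is hypothetical and shows how the two series combine into a usage figure.

```python
# Two queries on one panel; refId is how Grafana tells the queries apart.
MEMORY_TARGETS = [
    {"refId": "A",
     "expr": 'node_memory_MemTotal_bytes{instance=~"$instance"}',
     "legendFormat": "Total"},
    {"refId": "B",
     "expr": 'node_memory_MemAvailable_bytes{instance=~"$instance"}',
     "legendFormat": "Available"},
]


def used_percent(total_bytes: float, available_bytes: float) -> float:
    """Share of memory in use, derived from the Total and Available series."""
    if total_bytes <= 0:
        raise ValueError("total memory must be positive")
    return (total_bytes - available_bytes) / total_bytes * 100


if __name__ == "__main__":
    gib = 2 ** 30
    print(used_percent(8 * gib, 2 * gib))
```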
c. Filesystem: We will graph free disk space against total disk space, using node-exporter’s node_filesystem_free and node_filesystem_size metrics (node_filesystem_free_bytes and node_filesystem_size_bytes in node-exporter 0.16 and later). We filter on the mount point / to show the root filesystem.
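Here is a sketch of the two queries, assuming node-exporter 0.16 or later metric names. If you would rather show a single percentage, PromQL can divide one series by the other because both carry the same labels; FS_FREE_PERCENT is our illustrative alternative, not a query from the original panel, and free_percent is a hypothetical helper doing the same arithmetic.

```python
# Free vs total space on the root filesystem for the selected instance.
FS_TARGETS = [
    {"refId": "A",
     "expr": 'node_filesystem_free_bytes{instance=~"$instance",mountpoint="/"}',
     "legendFormat": "Free"},
    {"refId": "B",
     "expr": 'node_filesystem_size_bytes{instance=~"$instance",mountpoint="/"}',
     "legendFormat": "Size"},
]

# Alternative single query: label sets match one-to-one, so division works.
FS_FREE_PERCENT = (
    '100 * node_filesystem_free_bytes{instance=~"$instance",mountpoint="/"}'
    ' / node_filesystem_size_bytes{instance=~"$instance",mountpoint="/"}'
)


def free_percent(free_bytes: float, size_bytes: float) -> float:
    """Percentage of the filesystem that is still free."""
    if size_bytes <= 0:
        raise ValueError("filesystem size must be positive")
    return free_bytes / size_bytes * 100


if __name__ == "__main__":
    print(free_percent(25 * 10 ** 9, 100 * 10 ** 9))
```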
d. Network: Here we visualize the data received and transmitted on each network interface. We set the y-axis unit to megabytes, but you can change it to gigabytes or terabytes in the Visualization tab (that’s a lot of traffic).
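Network byte counts are counters, so a panel usually graphs their per-second rate. The targets below assume node-exporter 0.16 or later metric names and a legend of our own choosing. The simple_rate helper is a hypothetical simplification of PromQL’s rate(): it averages the increase over the window and handles counter resets, but unlike Prometheus it does not extrapolate to the window edges. to_megabytes uses decimal megabytes (10^6 bytes).

```python
from typing import List, Tuple

NETWORK_TARGETS = [
    {"refId": "A",
     "expr": 'rate(node_network_receive_bytes_total{instance=~"$instance"}[5m])',
     "legendFormat": "{{device}} received"},
    {"refId": "B",
     "expr": 'rate(node_network_transmit_bytes_total{instance=~"$instance"}[5m])',
     "legendFormat": "{{device}} transmitted"},
]


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Average per-second increase over (timestamp, counter) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts near zero, so cur is the increase.
        increase += cur if cur < prev else cur - prev
    return increase / (samples[-1][0] - samples[0][0])


def to_megabytes(n_bytes: float) -> float:
    """Decimal megabytes (1 MB = 1,000,000 bytes)."""
    return n_bytes / 1_000_000


if __name__ == "__main__":
    rx = [(0, 0.0), (60, 30e6), (120, 60e6)]
    print(to_megabytes(simple_rate(rx)), "MB/s")
```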
Finally, drag all the new panels into the Overview row.
Time intervals and automatic refresh
Grafana’s time picker lets us view metrics for any time range. The drop-down in the top right corner offers common presets, such as Last 5 minutes, Last 1 hour, and Last 12 hours, as well as custom ranges for any date or time.
Auto-refresh reloads the dashboard data at a fixed interval, such as every 5s or every 1m.
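Both settings are saved in the dashboard JSON: "time" holds the selected range as relative expressions such as now-12h, and "refresh" holds the auto-refresh interval. The sketch below shows those fields together with the list of intervals offered in the refresh drop-down ("timepicker" → "refresh_intervals"). The relative_to_seconds helper is hypothetical and handles only the simple now-<n><unit> form; Grafana’s syntax also supports rounding (such as now/d) and month and year units, which it ignores.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

# Fields as stored in the dashboard JSON model.
dashboard_settings = {
    "time": {"from": "now-12h", "to": "now"},
    "refresh": "1m",
    "timepicker": {"refresh_intervals": ["5s", "1m", "5m", "15m"]},
}


def relative_to_seconds(expr: str) -> int:
    """Seconds before now for 'now' or a 'now-<n><unit>' expression."""
    if expr == "now":
        return 0
    match = re.fullmatch(r"now-(\d+)([smhdw])", expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


if __name__ == "__main__":
    t = dashboard_settings["time"]
    span = relative_to_seconds(t["from"]) - relative_to_seconds(t["to"])
    print(f"showing {span} s, refreshing every {dashboard_settings['refresh']}")
```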
Visualizing metrics is really useful, but nobody can sit at a computer watching a dashboard all day! Alerts can notify us about critical conditions, such as high memory usage.
Let’s set up a Slack alert on the dashboard we just built so that we are notified when CPU usage is high. Click Alerting on the home page, go to Notification channels, and add a new Slack notification channel. It requires credentials (a Slack webhook URL), the Slack channel name, and a username.
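For scripted setups, Grafana’s legacy alerting (the notification-channel system shown here, before unified alerting) exposes POST /api/alert-notifications. The sketch below builds a plausible request body. The settings keys url, recipient, and username are our assumption of the Slack notifier’s field names and should be checked against your Grafana version’s documentation; the channel name, webhook URL, and username in the example are hypothetical.

```python
import json


def slack_channel_payload(name: str, webhook_url: str, channel: str, username: str) -> dict:
    """Body for creating a Slack notification channel (assumed field names)."""
    return {
        "name": name,
        "type": "slack",
        "isDefault": False,
        "settings": {
            "url": webhook_url,    # the Slack incoming-webhook URL (the credentials)
            "recipient": channel,  # channel to post to, e.g. "#alerts"
            "username": username,  # name the messages appear under
        },
    }


if __name__ == "__main__":
    body = slack_channel_payload(
        "MetricFire Slack",
        "https://hooks.slack.com/services/XXX",  # placeholder, not a real hook
        "#alerts",
        "grafana",
    )
    print(json.dumps(body, indent=2))
```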
With the notification channel in place, open the panel’s alert icon to define the alert. We use the following condition: if the average CPU usage over the last 5 minutes is greater than 90, the alert fires. We also choose where the alert is sent; here, the MetricFire Slack channel we just added. The rule is evaluated every 1m. For the no-data case we set the state to Alerting, so the alert also fires when no data is available.
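The rule above can be modelled in a few lines: at each evaluation (every minute), average the points from the last 5 minutes, compare with 90, and fall back to the configured no-data state when the window is empty. This hypothetical evaluate function is a sketch of that logic only; Grafana’s own evaluator also supports pending periods and other reducers, which are left out here.

```python
from statistics import mean
from typing import List, Tuple


def evaluate(points: List[Tuple[float, float]], now: float,
             window_s: float = 300, threshold: float = 90.0,
             no_data_state: str = "alerting") -> str:
    """Return 'alerting' or 'ok' for avg() over the window, or the no-data state."""
    recent = [v for t, v in points if now - window_s <= t <= now]
    if not recent:
        return no_data_state  # configured above to fire when there is no data
    return "alerting" if mean(recent) > threshold else "ok"


if __name__ == "__main__":
    busy = [(t, 95.0) for t in range(300, 601, 60)]
    print(evaluate(busy, now=600))  # average 95 over the last 5 minutes
    print(evaluate([], now=600))    # no data: uses the no-data state
```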
So now we have a working dashboard and alerting set up.
Dashboard import and export
Every Grafana dashboard is defined as a JSON document. Dashboards can be exported to a JSON file, and imported from a JSON file or from Grafana’s dashboard repository.
Let’s export the dashboard we just created. Go to Dashboard settings → JSON Model to see its JSON document. This document is a complete dashboard definition that can be imported into any other Grafana instance. Let’s save it as node-exporter.json.
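When moving an exported dashboard to another Grafana instance through the HTTP API (POST /api/dashboards/db), a common precaution is to clear the numeric "id", which is local to the instance that created it, while keeping the "uid". The sketch below shows that round trip in memory; export_dashboard and prepare_import are hypothetical helpers, and the uid in the example is made up.

```python
import json


def export_dashboard(dashboard: dict) -> str:
    """Serialise a dashboard model, e.g. for saving as node-exporter.json."""
    return json.dumps(dashboard, indent=2, sort_keys=True)


def prepare_import(exported: str, overwrite: bool = False) -> dict:
    """Turn exported JSON into a request body for another Grafana instance."""
    dashboard = json.loads(exported)
    # Numeric ids belong to the source instance; let the target assign its own.
    dashboard["id"] = None
    return {"dashboard": dashboard, "overwrite": overwrite}


if __name__ == "__main__":
    original = {"id": 7, "uid": "nodes-metrics", "title": "Nodes Metrics", "panels": []}
    body = prepare_import(export_dashboard(original))
    print(body["dashboard"]["id"], body["dashboard"]["uid"])
```

In practice you would write the string returned by export_dashboard to node-exporter.json and read it back on the target machine before calling prepare_import.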
Now let’s import a dashboard. Click the + icon in the top left and select Import. You can paste the JSON we saved earlier, or enter the ID of a dashboard from Grafana’s dashboard repository, and Grafana will import it.
To duplicate a dashboard, open its settings and use Save As with a different name; you will get a copy of the dashboard.
Plugins extend Grafana beyond its built-in features, adding new data sources, panels, apps (which can bundle dashboards), and more.
Plugins can be installed with the Grafana CLI. Search the Grafana plugin repository for the plugin you need, and open its Installation tab to find the plugin ID (it also appears in the URL path). Then install the plugin as follows:
grafana-cli plugins install <plugin-id>
Then restart the Grafana server:
sudo service grafana-server restart
List the installed plugins to verify that the installation succeeded:
grafana-cli plugins ls
If a plugin is not available in the Grafana plugin repository, it can be installed from a custom URL or a local path, for example:
grafana-cli --pluginUrl https://plugins.example.com/grafana/plugins/<plugin-id>-<plugin-version>.zip plugins install <plugin-id>
Some important plugins and dashboards
- WorldPing: Monitor internet apps for availability.
- Azure Monitor: A data source for Azure Monitor insights.
- PagerDuty: A PagerDuty data source.
- Plotly: Plotly is a data visualization framework; its plugin offers a powerful set of visualization panels.
- Kubernetes: This app shows all the metrics collected by Prometheus on Kubernetes clusters, in a pre-built dashboard.
Where to go from here
The community has already published Grafana dashboards for many popular data sources and metrics, which saves you the hassle of building them yourself. Explore plugins like Plotly to use the full power of Grafana dashboards.
Our Hosted Grafana service already has dashboards set up for you, and every new plugin comes with a bunch of ready-to-go dashboards. Check out our Grafana as a Service page to learn more. Get a free trial with MetricFire and start making Grafana dashboards right away. You can also book a demo and talk to us directly about what Grafana can do for you.