Collectd is data-collection software that fetches metrics from the machine being monitored and pushes them to back ends such as Graphite or Prometheus. Everything is done by plugins: collectd plugins can collect metrics on CPU, memory, Postgres, the JVM, and much more. Plugins can also push these metrics to Graphite, aggregate data, send alerts, and deliver notifications by email.
In this article, we’ll build a test system with collectd plugins that monitors a Linux machine and pushes its metrics to Graphite. At the end, we will verify that everything works on a Grafana dashboard.
Collectd is software designed to collect metrics. It is also a platform that loads and manages the plugins that do all the actual work. Collectd should be installed on each machine that hosts an application you need to monitor.
So, what can the plugins do? The most common plugins monitor CPU, memory, network, and swap usage. There are also many plugins for monitoring software such as Nginx, Apache, the JVM, and Postgres.
There are also plugins that can aggregate, alert, notify, and push metrics to a database. Collectd is very flexible because each plugin can be configured individually.
Let's check how it works in practice.
Having a lot of metrics is great, but they are hard to interpret without appropriate visualization. That is why we attach Grafana to our system. Graphite provides its own web interface, which can be enough in some cases, but it is also useful to attach a more advanced dashboarding system.
Let's define a task:
To keep the local machine free of unnecessary packages, we will install only collectd locally. All other services will run under Docker Compose, which also makes the system easy to deploy.
In this section we will set up Graphite and Grafana; the next section describes the collectd installation and plugin configuration.
Let’s define a docker-compose.yml file:
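A minimal sketch of such a file, based on the ports and options described below. The image names (graphiteapp/graphite-statsd, grafana/grafana) are assumptions about which images are used; adjust them to your own setup:

```yaml
version: "3"
services:
  graphite:
    image: graphiteapp/graphite-statsd   # assumed image: Graphite with StatsD included
    restart: always
    ports:
      - "80:80"          # Graphite web interface
      - "2003:2003"      # carbon plaintext receiver (collectd sends metrics here)
      - "8125:8125/udp"  # StatsD, not used in this article
      - "8126:8126"      # StatsD admin, not used in this article
  grafana:
    image: grafana/grafana
    network_mode: host   # Grafana serves its UI on port 3000 and can reach Graphite on localhost
    depends_on:
      - graphite
```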
This configuration describes 2 services: Graphite and Grafana.
To install Graphite, we will use an image that bundles Graphite with StatsD. For the purposes of this article, we only make use of Graphite. The image also exposes several ports that will be accessible from our container.
StatsD ports 8125 and 8126 are exposed as well; they can be useful, but not in the context of this article. We will also skip the advanced configuration of the carbon aggregator and use only the receiver.
If you’re interested in StatsD, check out our other article for more information on the differences between collectd and StatsD.
Next, we create a Grafana service. First, we expose port 3000, which is used to access the user interface. Grafana also needs to reach Graphite to import data, so we enable this with network_mode: host. Finally, we specify that this service depends on Graphite.
collectd will collect metrics from the local machine and send them to Graphite on port 2003. We will then add Graphite as a data source in Grafana and visualize how everything works.
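The data source can be added through the Grafana UI, or declared up front with Grafana's provisioning mechanism. A sketch of a provisioning file for this setup (the file path is Grafana's standard provisioning directory; the data source name is an arbitrary choice):

```yaml
# /etc/grafana/provisioning/datasources/graphite.yml
apiVersion: 1
datasources:
  - name: Graphite          # arbitrary display name
    type: graphite
    url: http://localhost:80  # the Graphite web port exposed by the compose file
    access: proxy
```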
Start the services described in docker-compose.yml with:
docker-compose up -d
After configuring collectd (covered in the next section), restart it with:
sudo service collectd restart
Wait a while to let collectd gather metrics and pipe them into Graphite (a one-minute time span is not very interesting to analyze).
Then check Graphite by going to http://localhost/.
And create your first Dashboard:
Let’s install collectd on the local machine with:
sudo apt install collectd
Next, we need to configure the collectd service and its plugins. To do so, stop the collectd service:
sudo service collectd stop
Then open the configuration file (it is a good idea to back up the original first):
sudo gedit /etc/collectd/collectd.conf
Here is an example of our file:
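A sketch of the global section at the top of the file, matching the description below. The hostname value is an arbitrary example, and the 10-second interval is an assumption for this setup:

```
Hostname "my-test-host"   # arbitrary name for this node; can be omitted on a real server
FQDNLookup true           # when enabled, the hostname is resolved to the node's FQDN
Interval 10               # how often metrics are collected, in seconds
```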
The first line defines the name of the current host (for now, any name will do), and the second line controls how the hostname is chosen: when enabled, the node's hostname is set to its fully qualified domain name (FQDN). On a real server, the first line can be omitted.
In this file you can also set the metric collection interval and much more.
Then we configure the plugins that we want to use with the following syntax:
<p>CODE: https://gist.github.com/denshirenji/0508d549a3d01c643c4d63323e528fe2.js </p>
For this example, we will use the plugins without any special configuration. In the code example above, you can also find a commented-out configuration for sending data to a Prometheus instance. The plugin is called write_prometheus; look for the line # <LoadPlugin "write_prometheus">.
Let’s describe these collectd plugins:
The CPU plugin collects the amount of time spent by the CPU in various states, most notably executing user code, executing system code, waiting for I/O operations, and being idle.
The CPU plugin does not collect percentages by default; it collects "jiffies", the units of scheduling. However, it can be configured to report percentages as follows:
<p> CODE: https://gist.github.com/denshirenji/959f5a5061b72a6d7ae499be61b10ba5.js </p>
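For reference, a configuration of this kind might look like the sketch below, using the cpu plugin's documented options (ValuesPercentage requires collectd 5.5 or later):

```
LoadPlugin cpu
<Plugin cpu>
  ReportByCpu true       # report each CPU/core separately
  ReportByState true     # report user/system/idle/wait states separately
  ValuesPercentage true  # report percentages instead of jiffies
</Plugin>
```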
The metrics collected are system dependent. This is an example of the metrics that would be collected when running on Linux:
Metrics are collected for each CPU separately, with cores numbered from 0 to N.
The Disk plugin collects performance statistics of hard-disks and partitions.
<p>CODE: https://gist.github.com/denshirenji/80ffc5ec0afbf94dae869510805d3efe.js </p>
This is also a platform-dependent plugin. On Linux, the collected metrics typically include values such as disk_octets, disk_ops, disk_time, and disk_merged.
The DF plugin collects file-system usage information (used and free space on mounted partitions). The output is similar to that of the Linux ‘df’ command.
You can configure it to monitor only certain devices, file-system types, or mount points, or simply monitor everything. Here is an example:
<p> CODE: https://gist.github.com/denshirenji/82f56fc4b2a732974c2b87c06c660bfb.js </p>
This metric consists of two values (Free and Used) for each mounted partition.
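As a sketch of the selective monitoring mentioned above, the df plugin can be restricted to a single mount point. The values chosen here (the root mount point, ext4) are examples, not part of the original setup:

```
LoadPlugin df
<Plugin df>
  MountPoint "/"         # monitor only the root partition
  FSType "ext4"          # example file-system type filter
  IgnoreSelected false   # false = monitor only the entries selected above
</Plugin>
```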
The Swap plugin shows the amount of swap used.
Example of configuration:
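A minimal sketch of a swap plugin configuration, using the plugin's documented options (the values shown are assumptions for this setup):

```
LoadPlugin swap
<Plugin swap>
  ReportByDevice false  # aggregate swap usage rather than per device
  ReportBytes true      # report bytes instead of pages
</Plugin>
```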
Here you can find three values (Free, Cached, and Used).
The Memory plugin collects the physical memory usage.
Under Linux, you will typically find metrics such as used, buffered, cached, and free.
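A minimal sketch of a memory plugin configuration, using the plugin's documented options (the values shown are assumptions for this setup):

```
LoadPlugin memory
<Plugin memory>
  ValuesAbsolute true     # report absolute byte counts
  ValuesPercentage false  # set to true to report percentages instead
</Plugin>
```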
collectd is a platform built around plugins that do all the actual work of metric collection. For example, you can use it to monitor infrastructure or applications such as the JVM, Nginx, or MongoDB. The system is flexible and configurable, and each plugin can be configured individually. The variety of collectd plugins means you can always collect, process, and push metrics to a collection system. You can also configure alerting, with features such as mail notification.
Running collectd monitoring requires some knowledge to design and set up the monitoring architecture, which can become difficult to maintain at production scale. In addition, some plugins are not well documented. The best way to avoid costly processes and save money on infrastructure monitoring is to use services like MetricFire, where the plugins are already installed and there is good documentation. For example, MetricFire has a Hosted Graphite Agent that allows users to skip using collectd or StatsD completely.