In this article, we will explain what system performance metrics are and why you need to monitor them. Then we will look at Graphite and Grafana monitoring systems, which make it easy to collect, save and visualize metrics. Finally, we will consider why you should choose MetricFire to monitor your system’s metrics.
What are system performance metrics?
System performance metrics are the indicators that can be used to determine how accurately, quickly, and efficiently a system performs its functions.
Metrics are numerical data that can be provided by the operating system, hardware, various applications, and websites. Some examples of metrics that are generated by an operating system are CPU usage, available disk space, and used memory. Programs can create metrics about resource usage, performance, or user behavior. Websites can generate information about the number of active users on the site or the time it takes to load a web page.
Metrics are usually collected automatically at a fixed interval, for example once per second, once per hour, or at any other specified interval.
Let’s list the main system performance metrics.
- Availability: The proportion of time a system is operational. Availability is closely tied to reliability: as reliability increases, so does availability. It can also be improved through better testability and maintainability.
- Response time: The total time it takes to respond to a service request. It is the sum of the following values:
  - Service time: The time it takes to complete the requested work.
  - Wait time: How long a request must wait in the queue behind earlier requests before it is run.
  - Transmission time: The time it takes to send the request to the computer doing the work and to send the response back to the requester.
- Channel capacity: The maximum rate at which information can be transferred over a channel with an arbitrarily small error probability.
- Latency: Latency is the time delay between the cause and effect of an observed physical change in the system. Latency is a result of the limited speed at which any physical interaction can occur. This speed is always less than or equal to the speed of light. Therefore, every physical system that has non-zero spatial dimensions will experience some delay.
- OS load: The amount of computational work that a computer system is performing.
- Disk usage (GB): The amount of disk space in use, tracked over time to show how it grows or shrinks.
- Disk throughput: Average disk throughput for reading and writing operations, measured in megabytes per second.
- Disk speed: Average disk speed for reading and writing operations.
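Several of the metrics listed above can be read directly from the operating system. The following is a minimal Python sketch, not a full monitoring agent. The function names and the metric names (such as `disk.used_gb`) are hypothetical and chosen only for illustration.

```python
import os
import shutil
import time


def collect_system_metrics(path="/"):
    """Return a snapshot of a few system metrics as a {name: value} dict."""
    usage = shutil.disk_usage(path)  # total, used and free space in bytes
    gb = 10 ** 9
    metrics = {
        "disk.total_gb": usage.total / gb,
        "disk.used_gb": usage.used / gb,
        "disk.free_gb": usage.free / gb,
    }
    # os.getloadavg() exists only on Unix-like systems and may fail there too.
    if hasattr(os, "getloadavg"):
        try:
            metrics["os.load_1min"] = os.getloadavg()[0]
        except OSError:
            pass
    return metrics


def measure_response_time(func, *args, **kwargs):
    """Call func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

Calling `collect_system_metrics()` at a fixed interval yields a time series for each metric. `measure_response_time` measures the response time seen by the caller, which includes service, wait, and transmission time together.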
A lot of servers and programs generate their own metrics that can be collected and analyzed. You can configure your applications to generate the metrics you need and have your monitoring system collect them.
Why is it necessary to monitor system performance metrics?
System performance monitoring helps you keep a close eye on required and used system resources. This data can be used to manage your systems effectively and to detect degradation in system performance.
The main benefits of monitoring system performance metrics are:
- The ability to easily find the cause and source of system errors, as well as the time when the error occurred.
- The ability to detect and fix problems before users notice them.
- The ability to save and analyze historical data to predict and improve system performance.
Using Telegraf to send metrics to Graphite
Graphite is a tool that allows you to collect and save your system performance metrics. It has basic tools to graphically display stored metrics. Alternatively, you can connect Graphite to other more advanced data visualization systems such as Grafana.
Telegraf is a monitoring client that you can easily install and configure. It runs on Windows, macOS, and Linux (including distributions such as Red Hat and CentOS) and has a Graphite output plugin. Installation and configuration details differ between operating systems, but the basic steps are:
- Install Telegraf.
- Create a configuration file.
- Set the connection details in the Graphite output section of the generated file.
- Launch Telegraf.
After that, the metrics will appear in your MetricFire account. In addition to sending system metrics, you can also use Telegraf to send metrics from local processes using the Procstat input plugin.
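To give an idea of what the Graphite section of the Telegraf configuration file might contain, here is a small Python sketch that renders the relevant TOML fragments. The option names (`servers`, `prefix`, `template`, `pattern`) belong to Telegraf's Graphite output and Procstat input plugins. The helper functions, the server address, and the prefix are placeholders and assumptions, so check the values against your own account and the Telegraf documentation.

```python
def render_graphite_output(server, prefix="", template="host.tags.measurement.field"):
    """Render an [[outputs.graphite]] section for telegraf.conf as TOML text."""
    return (
        "[[outputs.graphite]]\n"
        f'  servers = ["{server}"]\n'
        f'  prefix = "{prefix}"\n'
        f'  template = "{template}"\n'
    )


def render_procstat_input(pattern):
    """Render an [[inputs.procstat]] section that matches processes by name pattern."""
    return (
        "[[inputs.procstat]]\n"
        f'  pattern = "{pattern}"\n'
    )


# Placeholder values: replace them with your own endpoint and prefix.
config_fragment = (
    render_graphite_output("graphite.example.com:2003", prefix="YOUR-PREFIX")
    + "\n"
    + render_procstat_input("nginx")
)
```

Printing `config_fragment` gives text that can be pasted into the generated configuration file before launching Telegraf.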
Sending system performance metrics to Graphite
There are three main methods for loading data into Graphite: plaintext, pickle, and AMQP. All data sent to Graphite is received by Carbon, which writes it to Whisper storage.
The plaintext protocol is the simplest protocol supported by Carbon. Data must be sent in the following format: <metric path> <metric value> <metric timestamp>. Carbon translates this line of text into a format that suits the web interface and Whisper.
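As an illustration of the plaintext protocol, the sketch below formats a metric line and sends it over TCP. It assumes a Carbon receiver listening on `localhost` at port 2003 (Carbon's default plaintext port). The function names and the metric path in the usage note are hypothetical.

```python
import socket
import time


def format_plaintext(path, value, timestamp=None):
    """Build one '<metric path> <metric value> <metric timestamp>' line."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_plaintext(lines, host="localhost", port=2003, timeout=5.0):
    """Send already formatted lines to a Carbon plaintext receiver."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

For example, `send_plaintext([format_plaintext("servers.web01.cpu.usage", 42.5)])` would send a single data point stamped with the current time, provided a receiver is reachable at the assumed address.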
The pickle protocol is a much more efficient alternative to the plaintext protocol and supports sending batches of metrics to Carbon in one go. The general idea is that the data forms a list of nested tuples: [(path, (timestamp, value)), ...]. Once you have built the list and pickled it, send the data over a socket to the Carbon pickle receiver.
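A batch for the pickle protocol can be sketched as follows. Carbon expects the pickled payload to be preceded by a 4-byte, big-endian length header. The host, the port (2004 is Carbon's default pickle port), and the function names are assumptions made for this example.

```python
import pickle
import socket
import struct


def build_pickle_message(metrics):
    """Pack [(path, (timestamp, value)), ...] into a length-prefixed pickle."""
    payload = pickle.dumps(metrics, protocol=2)
    header = struct.pack("!L", len(payload))  # unsigned 32-bit, network byte order
    return header + payload


def send_pickle(metrics, host="localhost", port=2004, timeout=5.0):
    """Send one batch of metrics to a Carbon pickle receiver."""
    message = build_pickle_message(metrics)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message)
```

Batching many data points into one message is what makes this protocol cheaper than sending one plaintext line per metric.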
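With AMQP, the message format depends on the AMQP_METRIC_NAME_IN_BODY setting in your carbon.conf file. When it is set to True, the data must be in the same format as the plaintext protocol. When it is set to False, omit the metric path from the message body and send only the value and timestamp; Carbon then takes the metric name from the message's routing key.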
There are a lot of ready-made tools for sending data to Graphite. The most popular among them are collectd, collectl, diamond, graphite-pollers, and SqlToGraphite.
Building a Grafana dashboard
After creating the Grafana dashboard, you need to add information panels to it. There are different types of panels in Grafana that you can use for different purposes. For example:
- Graph Panel: Time charts display data points on the time axis. You can compare different indicators on the same chart, and use the timeshift functions to compare current values with any past time period.
- Single stat: Displays a single value resulting from a query, or maps that value to text using value mappings.
- Gauge: Takes one input series and displays the value in terms of the position it occupies within predefined lower and upper bounds.
- Polystat panel: Displays small hexagons or circles representing single or composite indicators and their enabled states.
- Table: Provides discrete key data for metrics, supports multiple modes for time series, annotations, and JSON data.
- Alert list: Displays the alerts triggered across your dashboards in one place, independently of the panels that define them.
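Grafana dashboards are stored as JSON, so a dashboard with a few panels can also be described in code. The following Python sketch builds a minimal dashboard definition with a graph panel and a gauge panel backed by Graphite queries. The helper functions and metric paths are hypothetical, and the exact set of fields Grafana accepts depends on its version, so treat this as a starting point rather than a complete schema.

```python
import json


def graphite_panel(title, panel_type, targets, x, y, w=12, h=8):
    """Describe one panel; each target is a Graphite query string."""
    return {
        "title": title,
        "type": panel_type,
        "gridPos": {"x": x, "y": y, "w": w, "h": h},  # Grafana's grid is 24 columns wide
        "targets": [
            {"refId": chr(ord("A") + i), "target": t} for i, t in enumerate(targets)
        ],
    }


def build_dashboard(title, panels):
    """Wrap panels in a minimal dashboard definition."""
    return {"title": title, "time": {"from": "now-6h", "to": "now"}, "panels": panels}


cpu = "servers.web01.cpu.usage"  # hypothetical metric path
dashboard = build_dashboard(
    "System performance",
    [
        # Current CPU usage next to the same series shifted by one day.
        graphite_panel("CPU usage", "graph", [cpu, f'timeShift({cpu}, "1d")'], x=0, y=0),
        graphite_panel("Disk used (GB)", "gauge", ["servers.web01.disk.used_gb"], x=12, y=0),
    ],
)
dashboard_json = json.dumps(dashboard, indent=2)
```

The resulting `dashboard_json` string can be adapted and imported through the Grafana UI. The second target of the graph panel uses Graphite's `timeShift` function to compare current values with a past period.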
The advantages of MetricFire
MetricFire offers a Hosted Graphite solution that you can use to collect and store your system’s performance metrics. MetricFire installs, configures, and maintains Graphite for you, so you can use it as a web application without having to worry about keeping it running.
Key benefits of using MetricFire:
- Data availability: MetricFire provides permanent access to your data, which you can export at any time.
- Affordable price: MetricFire provides a wide range of pricing plans to suit your needs.
- Saving time and money: There is no need to invest in deploying and running monitoring systems.
- Reliable support: If you have any questions while working with MetricFire, our support team is ready to provide comprehensive answers either via our support channels or video conference.
We looked at why it is important to monitor system performance metrics, how to send the collected metrics to Graphite, and how to visualize them using Grafana. Also, we listed the main benefits of using MetricFire.