Monitoring High Cardinality Metrics



Introduction

Monitoring is all about data. When you adopt a monitoring tool, you need to be sure it can handle the data you send it. Today, data arrives at high speed and in large volumes, and it comes in diverse forms, which makes ingestion more complex. This is why monitoring vendors advertise, among other things, their data processing capacity: a platform that can handle large, diverse data arriving at high velocity has a clear advantage.

               

When experts discuss data processing in monitoring, one topic that comes up is high cardinality metrics. Many companies treat support for high cardinality as a requirement when they evaluate a monitoring tool. In this article, we explain the concept of high cardinality and how to monitor high cardinality metrics.

              

MetricFire’s Hosted Graphite solution can handle high cardinality to fit your needs. Talk to our team today by booking a demo.

                     

Key Takeaways

  1. Monitoring solution providers gain a competitive edge by being able to handle large and diverse data at high speeds.
  2. High cardinality describes data with a large number of distinct values, and it plays a significant role in monitoring.
  3. High cardinality can hurt system performance, causing resource usage spikes and potentially system malfunctions.
  4. To mitigate the impact of high cardinality, teams can store old data separately, plan and measure the effects, and choose a suitable monitoring solution like Graphite.

 

What Is High Cardinality?

First, let’s understand the concept of cardinality. Wikipedia defines it this way: “In mathematics, the cardinality of a set is a measure of the number of elements of the set.” In other words, the higher a dataset’s cardinality, the more distinct values it contains. In monitoring, suppose you have traffic source location values coming into your monitoring system.

                

These location values are, for example, “classroom_1”, “classroom_2”, “adminoffice_1”, and two others. Since there are five distinct values, we can say the traffic source data has a cardinality of 5.
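A minimal sketch of the same idea: cardinality is simply the count of distinct values, so a set does the work. The last two location names below are placeholders, since the example above does not name them.

```python
# Hypothetical stream of traffic-source values; repeats are expected in real data.
# "lab_1" and "adminoffice_2" are placeholder names for the "two others".
incoming_sources = [
    "classroom_1", "classroom_2", "adminoffice_1", "classroom_1",
    "lab_1", "adminoffice_2", "classroom_2",
]

def cardinality(values):
    """Return the number of distinct values, i.e. the cardinality of the set."""
    return len(set(values))

print(cardinality(incoming_sources))  # → 5
```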

            

In your monitoring system, these values are stored as time-series data, where each point maps a timestamp to a value. Time-series data is therefore a labeled set of timestamp and value pairs over time. If your monitoring system receives memory usage data, it looks like the following:

os.mem.util = [(2022-08-10 5:31, 78%), (2022-08-10 5:32, 76%), (2022-08-10 5:33, 83%)...]
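To make the structure concrete, the sketch below models the same series as a name plus a list of (timestamp, value) pairs and renders it in Carbon’s plaintext format (`<metric path> <value> <unix timestamp>`), one of the formats Graphite accepts. Treating the timestamps as UTC and storing the percentages as plain numbers are assumptions; the example above specifies neither.

```python
from datetime import datetime, timezone

# One series: a metric name plus (timestamp, value) pairs.
# Assumption: timestamps are UTC; values are percentages stored as floats.
metric_name = "os.mem.util"
points = [
    (datetime(2022, 8, 10, 5, 31, tzinfo=timezone.utc), 78.0),
    (datetime(2022, 8, 10, 5, 32, tzinfo=timezone.utc), 76.0),
    (datetime(2022, 8, 10, 5, 33, tzinfo=timezone.utc), 83.0),
]

def to_plaintext(name, pairs):
    """Render each (timestamp, value) pair as a Carbon plaintext line:
    '<metric path> <value> <unix timestamp>'."""
    return [f"{name} {value} {int(ts.timestamp())}" for ts, value in pairs]

for line in to_plaintext(metric_name, points):
    print(line)  # first line: os.mem.util 78.0 1660109460
```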

        

This is a simple example with very low cardinality. Let’s make it more realistic by adding labels, which raises the cardinality.

os.mem.util = [(2022-08-10 5:31, 78%, source="classroom_1", machine="computer_2"), (2022-08-10 5:32, 76%), (2022-08-10 5:33, 83%, source="adminoffice_2", machine="computer_11")...]

          

As you can see, each point can now carry two labels (dimensions): source and machine. Each label has its own cardinality. Building metrics from this data requires more dimensions than the basic example did, which means more tags and more time series, because every distinct combination of label values becomes its own series. The potential number of series is the product of the label cardinalities, so when even one label has high cardinality while the others stay low, the number of series grows sharply and places a heavy burden on your monitoring system.
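The arithmetic behind that burden can be sketched as follows. The label cardinalities are made-up numbers and the `user_id` label is hypothetical; the point is that the potential series count is the product of the label cardinalities, while the series you actually store are the distinct label combinations you observe.

```python
from math import prod

# Hypothetical label cardinalities for os.mem.util (illustrative numbers only).
label_cardinality = {"source": 5, "machine": 20}

def max_series(cardinalities):
    """Upper bound on distinct series for one metric name:
    every combination of label values can become its own series."""
    return prod(cardinalities.values())

print(max_series(label_cardinality))  # → 100

# Adding one high-cardinality label multiplies the total, even if the others stay small.
with_user = {**label_cardinality, "user_id": 10_000}
print(max_series(with_user))  # → 1000000

# In practice, the stored series are the distinct label combinations observed.
observed = [
    {"source": "classroom_1", "machine": "computer_2"},
    {"source": "adminoffice_2", "machine": "computer_11"},
    {"source": "classroom_1", "machine": "computer_2"},
]
distinct_series = {frozenset(labels.items()) for labels in observed}
print(len(distinct_series))  # → 2
```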

                    

                  

How High Cardinality Affects Your System

When monitoring data with low or medium cardinality starts to have high cardinality, the number of data dimensions can spike, which can significantly affect your system’s performance and your bottom line. When new labels are added to time-series data, the resulting increase in cardinality can cause a sudden jump in the monitoring system’s resource usage, leading to memory errors, CPU spikes, and outages. These problems can spread through the applications and networks a company relies on and ultimately affect its customers. Even if you avoid malfunctions when cardinality jumps, you can still face higher costs, because you need more resources to run your monitoring system and other applications.
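To put rough numbers on the extra cost, the sketch below estimates disk usage under Graphite’s default Whisper storage, which keeps one fixed-size file per series. The byte sizes reflect Whisper’s on-disk layout as we understand it (a 16-byte header, 12 bytes per archive descriptor, 12 bytes per data point); the retention policy and series counts are hypothetical, and hosted or non-Whisper backends will differ.

```python
# Approximate Whisper layout sizes (one file per series).
HEADER_BYTES = 16        # aggregation type, max retention, xFilesFactor, archive count
ARCHIVE_INFO_BYTES = 12  # offset, seconds per point, number of points
POINT_BYTES = 12         # 4-byte timestamp + 8-byte double value

def whisper_file_bytes(archives):
    """Estimate one series' file size.
    archives: list of (seconds_per_point, retention_seconds) pairs."""
    points = sum(retention // step for step, retention in archives)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(archives) + POINT_BYTES * points

# Hypothetical retention: 10-second points for 1 day, 1-minute points for 30 days.
DAY = 86_400
retention = [(10, DAY), (60, 30 * DAY)]

per_series = whisper_file_bytes(retention)
print(per_series)  # → 622120

for series in (100, 1_000_000):
    print(series, "series ~", round(series * per_series / 1e9, 2), "GB")
```

Because the file size depends only on the retention policy, disk usage grows linearly with the number of series, which is exactly what a cardinality jump inflates.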

       

Despite these negative impacts, adding cardinality may be necessary to give end users complete monitoring metrics. If an increase in cardinality is unavoidable, how can teams prevent or minimize its detrimental effects on their systems? There are several options.

      

  • Keep old data separately: If new values added to your monitoring data make older data irrelevant, consider moving that data to more cost-effective storage. Cloud infrastructure providers often offer storage services for this purpose; for example, AWS offers the Amazon S3 Glacier storage classes for storing infrequently accessed data at low cost.
  • Plan and measure impact: Teams need to understand that adding cardinality can have a broad impact. Before adding it, discuss the change with the relevant team members and measure its potential impact, especially whether your monitoring system can handle it.
  • Choose the right solution: Even with good execution plans, a monitoring system that is not robust enough will eventually block you from scaling up. Some solutions only work well with small and medium-sized, low-cardinality workloads. For high-volume, high-cardinality monitoring data, consider using Graphite.
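The “plan and measure impact” step can be as simple as projecting the series count before a new label ships. This is a minimal sketch: `series_budget` stands in for whatever capacity limit your team agrees on, and the label names and counts are hypothetical.

```python
from math import prod

def projected_series(current_cardinalities, new_label_values):
    """Upper bound on series for a metric if a label with `new_label_values`
    distinct values is added to every existing series."""
    return prod(current_cardinalities.values()) * new_label_values

def review_label_change(current_cardinalities, label, new_label_values, series_budget):
    """Return (approved, message) for a proposed label, compared against a budget."""
    projected = projected_series(current_cardinalities, new_label_values)
    approved = projected <= series_budget
    verdict = "within budget" if approved else "over budget, discuss before rollout"
    return approved, f"adding '{label}': up to {projected:,} series ({verdict})"

# Hypothetical current labels and agreed budget.
current = {"source": 5, "machine": 20}
BUDGET = 50_000

for label, values in [("region", 4), ("session_id", 100_000)]:
    approved, message = review_label_change(current, label, values, BUDGET)
    print(message)
```

Treating the product as an upper bound keeps the check conservative: the real count is usually lower, but a label like a session ID can approach it quickly.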

              

How MetricFire Handles High Cardinality

MetricFire provides a comprehensive monitoring solution that can ingest your high cardinality data. Using MetricFire’s solution, you can monitor networks, servers, and applications without having to build a monitoring platform on your own. MetricFire provides hosted Graphite that allows you to monitor your infrastructure with a powerful time-series ingestion engine and database.

                    

What is Graphite?

Graphite is open-source software designed to monitor and visualize time-series and performance data from multiple sources. It runs reliably on low-spec hardware as well as on cloud infrastructure. Chris Davis created Graphite at Orbitz in 2006 as a side project, and in 2008 Orbitz released it as open source under the Apache 2.0 license. Since then, many enterprises have chosen Graphite to monitor their production e-commerce services.

            

Major Features

Graphite offers the following major features.

  • It has a simple architecture and strong performance.
  • It displays time-series metrics in intuitive graphs and charts.
  • It is relatively easy to learn compared to other monitoring tools.
  • It has a large, active community and is widely used.
  • It suits diverse use cases such as analytics, DevOps, prediction, and more.

             

Hosted Graphite

Although open-source Graphite is a powerful monitoring tool, setting up and maintaining the Graphite infrastructure requires significant effort from your teams. To remove this maintenance burden and let your teams focus on more valuable tasks, MetricFire provides Hosted Graphite.

            

Hosted Graphite has the following benefits.

  • Redundant storage: Graphite’s default storage is file-based and dated. MetricFire offers triple-redundant storage for seamless scaling and better data protection.
  • Control by APIs: MetricFire’s APIs let you control and automate your Hosted Graphite resources.
  • Tagged metrics: Hosted Graphite supports tagged metrics, which let you view and organize metrics in data views.
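For reference, open-source Graphite (version 1.1 and later) expresses tags in the series name itself, as `name;tag1=value1;tag2=value2`. The sketch below builds such names and a matching plaintext line. Whether Hosted Graphite uses exactly this syntax is an assumption to check against its documentation, the helper names are our own, and the sketch does not validate characters that Graphite disallows in tag names or values.

```python
def tagged_path(name, tags):
    """Build a Graphite-style tagged series name: name;tag1=value1;tag2=value2."""
    # Sorting is our own choice so the same label set always yields the same string.
    parts = [name] + [f"{key}={value}" for key, value in sorted(tags.items())]
    return ";".join(parts)

def plaintext_line(name, tags, value, timestamp):
    """One line of Carbon's plaintext protocol: '<path> <value> <timestamp>' plus newline."""
    return f"{tagged_path(name, tags)} {value} {int(timestamp)}\n"

line = plaintext_line(
    "os.mem.util", {"source": "classroom_1", "machine": "computer_2"}, 78, 1660109460
)
print(line, end="")  # → os.mem.util;machine=computer_2;source=classroom_1 78 1660109460
```

Each distinct tag combination produces a distinct series name, so the cardinality arithmetic from earlier applies directly to tagged metrics.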

        

Hosted Graphite keeps all the benefits of open-source Graphite and enhances the tool further with a built-in agent, team accounts, granular dashboard permissions, and integrations with other major platforms and services such as AWS, Heroku, logging tools, and more.

           

Hosted Graphite also offers benefits specific to handling high cardinality data.

  • It handles massive data volumes by separating cold and hot data. With Hosted Graphite, you can ingest 5 billion metrics per minute and store up to 1.5 petabytes of time-series data.
  • It keeps query latency low even under high throughput, so you can still retrieve insights from high cardinality data.
  • It stays highly available under a high volume of concurrent reads and writes, which matters because time-series workloads are, by nature, more write-heavy than read-heavy.

        

On top of these benefits, MetricFire offers:

  • Hosted Graphite backs up your user data and dashboards every hour.
  • You are in good hands with MetricFire's technical experts.
  • MetricFire offers an extensive range of options for users of all sizes with plans and customization to meet your needs.
  • MetricFire’s on-call team is available 24/7, 365 days a year. Our team monitors your Hosted Graphite from around the world using an automated monitoring system.
  • We have been ingesting billions of data points per day since 2012. Our mature metric processing and storage capabilities are trusted by thousands of engineers. 
  • You can directly send metrics from your application without additional dependencies or aggregation services.
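As an illustration of sending metrics without extra dependencies, the sketch below uses only Python’s standard `socket` module to write Carbon plaintext lines over TCP. The host name is a placeholder, 2003 is the default plaintext port of a self-managed Carbon server, and any authentication or endpoint details that Hosted Graphite requires are not covered here.

```python
import socket

def send_metrics(sock, lines):
    """Write Carbon plaintext lines ('<path> <value> <timestamp>') to a connected socket,
    making sure each line ends with a newline."""
    payload = "".join(line if line.endswith("\n") else line + "\n" for line in lines)
    sock.sendall(payload.encode("utf-8"))

def send_to_carbon(host, lines, port=2003, timeout=5.0):
    """Open a TCP connection to a Carbon plaintext listener, send the lines, and close it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        send_metrics(sock, lines)
```

For example, `send_to_carbon("graphite.example.com", ["os.mem.util 78 1660109460"])` would send a single point to that hypothetical host. Keeping the connection handling separate from `send_metrics` makes the formatting easy to test against a local socket pair.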

                 

Conclusion

Data is one of the most important digital assets today. If your monitoring system can digest large, diverse time-series data, you can prevent incidents that damage your digital resources and your business reputation. MetricFire’s hosted solution gives users the flexibility and scalability to handle high cardinality data, with powerful time-series handling and managed support for its customers.

                

You can use MetricFire products with minimal configuration to gain in-depth and complete insight into your digital assets. If you would like to learn more about these services, feel free to book a demo to speak with one of our experts or sign up for the free trial today.
