AWS CloudWatch Custom Metrics vs Prometheus Custom Metrics


Introduction

Understanding the state of your systems and their underlying infrastructure at all times is paramount for ensuring the stability and reliability of your services. Up-to-date information about the performance and health of your deployments not only helps your team react to issues in real time, but it also gives them the security to make changes with confidence and to safely forecast system failures or performance hiccups even before they occur. 

Two very popular monitoring applications in the world of cloud computing are AWS CloudWatch, the principal monitoring application on the AWS suite, and Prometheus, a massive open-source monitoring application originally developed at SoundCloud.

When using CloudWatch and Prometheus, we are given a wide range of built-in metrics to choose from. But sometimes we need to monitor more than the standard set of metrics that Prometheus or CloudWatch gives us. It is important that our monitoring platforms allow users to define their own metrics, tailored to the system they are working with.

The purpose of this article is to provide an educational comparison of exposing and using custom metrics with these two popular monitoring applications: AWS CloudWatch and Prometheus.

If you are looking for a Prometheus alternative, jump on to the MetricFire free trial, where you can build your own Graphite custom metrics. Our platform lets you try Graphite and Grafana directly, and you can build your own custom dashboards. You can also integrate AWS CloudWatch with MetricFire, and monitor your AWS metrics in our platform. This is very helpful for AWS users who are looking for a second platform with greater flexibility and dashboarding options. Check out how to integrate AWS CloudWatch with Grafana on our Hosted Graphite documentation.

 

Key Takeaways

  1. In AWS CloudWatch, custom metrics can be created and published using the AWS Command Line Interface (CLI) or the AWS API. Monitoring scripts can be used for compute services like Elastic Beanstalk and Elastic Compute Cloud (EC2).
  2. Prometheus allows users to specify custom metrics of four main types: Counter, Gauge, Histogram, and Summary. It supports integrations with various systems through exporters, which expose metrics from third-party software for Prometheus to scrape.
  3. CloudWatch is ideal for AWS-based applications, as it easily integrates with other AWS services and provides a centralized location to view metric data.
  4. Prometheus, being open-source, offers greater flexibility and can integrate with a wide variety of applications beyond AWS services.
  5. Prometheus's use of exporters makes custom monitoring easier and allows users to create their own custom exporters.

What are custom metrics?

Simply put, custom metrics are metrics defined by the application user. They are different from built-in system metrics and their purpose is to allow users or system administrators to define whatever they want to monitor or track from their systems, even if this data is not natively exposed by the said system. 

CloudWatch Custom Metrics

You can create and publish custom metrics to CloudWatch using the AWS Command Line Interface (CLI) tools or AWS API. These custom metrics can be created to collect all sorts of data, from application performance data not natively exposed by default, to business metrics like purchases made in a sales application.
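As a quick sketch of the CLI route (the namespace, metric, and dimension names below are made up for illustration, and the command assumes you have AWS credentials configured):

```shell
# Publish one data point for a hypothetical "PageViews" metric
# in a made-up "MyApp" namespace.
aws cloudwatch put-metric-data \
    --namespace "MyApp" \
    --metric-name "PageViews" \
    --dimensions Service=WebFrontend \
    --value 42 \
    --unit Count
```

Each invocation sends a single data point; repeated calls over time build up the metric's time series in CloudWatch.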

Custom metrics can be created for any application running on an AWS service, with slightly different processes and requirements depending on the service. For example, compute services like Elastic Beanstalk and Elastic Compute Cloud (EC2) allow the use of CloudWatch Monitoring Scripts, which are essentially Perl scripts that create and report custom metrics for you. 

These scripts define which metrics they collect and how, abstracting those details away from the user. Using them is as easy as installing and running them on the compute instances whose data you wish to collect.

CloudWatch Monitoring Scripts provide an amazing amount of flexibility and reusability with custom metrics, as you can very easily install and run these scripts on any compute instances you wish to monitor. The metrics collected by these scripts are then graphed in the CloudWatch console, allowing you to see all your custom metrics at a glance, all in one location. 

One downside of using these scripts is that you have to predefine what metrics you wish to collect, before installing and running them on a compute instance. A detailed example of such scripts can be found in the EC2 docs.
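As a sketch of what running the memory-monitoring script from the EC2 docs looks like on an instance (the exact flags are documented there; the cron schedule below is just one common choice):

```shell
# After downloading and unpacking the CloudWatch Monitoring Scripts,
# a single run that reports memory metrics to CloudWatch looks like:
./mon-put-instance-data.pl --mem-util --mem-used --mem-avail

# To report continuously, schedule the script with cron, e.g. every
# 5 minutes:
# */5 * * * * ~/aws-scripts-mon/mon-put-instance-data.pl --mem-util --from-cron
```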

Another way of creating custom CloudWatch metrics is through the AWS API, allowing us to create metrics directly from within our application code. Say, for example, we want to track certain user interactions on an e-commerce website (our Key Performance Indicator) that would give us insight into business performance over time. Leveraging the AWS SDK, we could write code to create custom metrics that track this data and execute it in a Lambda function whenever the relevant event occurs. This code might be similar to the snippet below: 

 

import boto3

def lambda_handler(event, context):
    cloudwatch = boto3.client('cloudwatch')
    response = cloudwatch.put_metric_data(
        MetricData=[
            {
                'MetricName': 'KeyPerformanceIndex',
                'Dimensions': [
                    {
                        'Name': 'SALES_SERVICE',
                        'Value': 'SalesService'
                    },
                    {
                        'Name': 'APP_VERSION',
                        'Value': '1.0'
                    },
                ],
                'Unit': 'None',
                # Replace with the actual value from your application;
                # the 'kpi_value' event key here is illustrative.
                'Value': event['kpi_value']
            },
        ],
        Namespace='SalesApp'
    )
    print(response)

 

In simple terms, the Lambda function above creates a custom metric called KeyPerformanceIndex which records whatever data we specify in the Value field of the MetricData. Ideally, this data would be fed from our application and would represent the metric value at each submission. So, in essence, we could feed into this metric any data whose trend we want to keep an eye on. This function uses the boto3 library, the AWS SDK for Python, to publish metrics to CloudWatch via the put_metric_data (PutMetricData) API call. 

Worthy of note is the fact that the monitoring scripts discussed above also make use of this PutMetricData API call under the hood. Metrics produced by AWS services have standard resolution by default (one-minute granularity), while custom metrics can also be published at high resolution (one-second granularity). Keep in mind that every PutMetricData call for a custom metric is charged, so calling PutMetricData more often for a high-resolution metric can lead to a much higher cost. CloudWatch pricing is notorious for escalating beyond what was expected; for more details, see CloudWatch pricing.
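To make the cost implication concrete, here is a rough back-of-envelope calculation of how many PutMetricData calls a single metric generates per month at each resolution (assuming one data point per call with no batching; actual cost depends on current CloudWatch per-request pricing):

```python
# How many PutMetricData calls one metric generates in a 30-day month
# at standard (60 s) vs. high (1 s) resolution, one value per call.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000

def calls_per_month(resolution_seconds: int) -> int:
    """Number of PutMetricData calls if one value is sent per interval."""
    return SECONDS_PER_MONTH // resolution_seconds

standard = calls_per_month(60)  # one-minute granularity
high_res = calls_per_month(1)   # one-second granularity

print(standard)               # 43200 calls per month
print(high_res)               # 2592000 calls per month
print(high_res // standard)   # 60x as many billed API calls
```

The sixty-fold jump in billed calls is why high-resolution custom metrics deserve a cost review before rollout.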

Prometheus Custom Metrics

Prometheus is a very popular open-source monitoring tool with a deceptively simple use case: tell it where to find metrics by configuring a series of scrape jobs, with each job specifying a set of endpoints to be scraped. Prometheus then scrapes these endpoints for metric data at regular intervals, persists the data locally, and uses it to display visual metric charts. 

You can display your metrics in the built-in Prometheus Expression Browser or export them to other graphing UIs such as Grafana. For the most part, Prometheus custom metrics are similar to CloudWatch custom metrics in that they give users the flexibility to specify and monitor whatever aspect of their application they wish. However, Prometheus is more restrictive in the kinds of metrics it supports, limiting them to four main types:

  • Counter: A counter is a cumulative metric representing a single value that can only increase or be reset to zero on application restart. You can use such a metric to represent the number of requests served, or the number of tasks completed, just to name a few.
  • Gauge: A gauge represents a single numerical value that can go up and down, and can be used to measure things like current memory usage or the number of concurrent requests.
  • Histogram: A histogram samples observations and counts them in configurable buckets, allowing certain levels of aggregation. Mostly used to measure the distribution of things like request durations or response time.
  • Summary: A summary samples observations and reports a total count and sum of observed values, along with configurable quantiles over a sliding time window. It is used in situations where the aggregate value is important, for instance, total payload sizes or request-latency quantiles.

See more about Prometheus data structures in our article on how to query with PromQL.
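To make the Histogram type concrete, here is a minimal pure-Python sketch of how cumulative bucket counting works. This mimics Prometheus's `le` ("less than or equal") bucket semantics; it is an illustration, not the official client library:

```python
import math

def histogram_buckets(observations, bounds):
    """Count observations into cumulative buckets, Prometheus-style:
    each bucket labelled le=<bound> counts every observation <= bound,
    and the +Inf bucket always equals the total observation count."""
    buckets = {}
    for bound in list(bounds) + [math.inf]:
        buckets[bound] = sum(1 for o in observations if o <= bound)
    return buckets

# Request durations in seconds, with latency-style bucket bounds:
durations = [0.05, 0.2, 0.2, 0.7, 1.5, 3.0]
print(histogram_buckets(durations, [0.1, 0.5, 1.0, 2.5]))
# {0.1: 1, 0.5: 3, 1.0: 4, 2.5: 5, inf: 6}
```

Note that the buckets are cumulative: each bound's count includes every smaller bucket, which is what lets PromQL compute quantile estimates from them.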

So in order for us to create custom metrics, we'd have to specify one of the above types in our code and expose it to a dedicated endpoint, then have Prometheus scrape that endpoint at configurable intervals for the metric data. 
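The scraped endpoint returns metrics in Prometheus's plain-text exposition format. Here is a small sketch of what rendering one counter in that format looks like (the metric and label names are made up; real applications typically use an official client library rather than hand-formatting this):

```python
def render_counter(name, help_text, value, labels=None):
    """Render one counter in the Prometheus text exposition format:
    a # HELP line, a # TYPE line, then the sample itself."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name}{label_str} {value}\n"
    )

print(render_counter("orders_total", "Total orders processed.", 42,
                     {"service": "sales"}))
```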

However, the sheer number of integration options Prometheus offers makes up for this slight lack of flexibility. Prometheus does this through exporters and other integrations, which expose metrics from third-party software for Prometheus to scrape with very little fuss. Being an open-source project, these exporters can be written and maintained as part of the Prometheus GitHub organization, or written and hosted outside it. Check out our article on Prometheus exporters here.

This is powerful in that it allows you to write your own exporter for your system and use it to expose your metrics for Prometheus to scrape. As a result, custom monitoring is a whole lot easier with Prometheus. Also, because Prometheus integrates with a huge variety of systems, the actual API we use in our code will depend on the client library available for our system. Here is a list of some of the client libraries in use with Prometheus.
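A minimal custom exporter can be sketched in a few dozen lines of standard-library Python. The metric name and value below are made up for illustration; a real exporter would read its values from the system it instruments:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

# Made-up metric for illustration; a real exporter would read this
# from the system being instrumented (a queue depth, a counter, ...).
REQUESTS_SERVED = 0

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        body = (
            "# HELP app_requests_served_total Requests served by the app.\n"
            "# TYPE app_requests_served_total counter\n"
            f"app_requests_served_total {REQUESTS_SERVED}\n"
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass

def serve(port=8000):
    """Start the exporter on a background thread and return the server."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After calling serve(8000), pointing a Prometheus scrape job at 127.0.0.1:8000/metrics would pull the counter in on every scrape interval.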

Comparing custom metrics

Having looked at both systems, how their custom metrics work, and how to create and publish them, it is obvious that they share striking similarities. Custom metrics, as a concept, exist to help developers and other stakeholders gain valuable insight into their applications: they let you specify and track literally any aspect of a system, then feed that data into a monitoring application where it is processed and displayed in an easy-to-use manner. 

Both AWS CloudWatch and Prometheus custom metrics satisfy this requirement sufficiently well. However, they do have a few differences which foster the use of one in some scenarios over the other:

  • Given that CloudWatch runs on AWS and very easily integrates with other AWS services, this would be the ideal solution if your application is hosted on the AWS platform.
  • However, Prometheus' open-source nature allows it to support integrations with a massive variety of other applications. CloudWatch is largely tied to the AWS ecosystem (though its agent can also report from on-premises servers), whereas Prometheus is extremely open and flexible. 
  • CloudWatch provides a convenient centralized location for viewing all metric data from all your applications, making it extremely easy to see all your custom metrics at a glance as easy-to-read graphs on a fixed dashboard. Prometheus, being more customizable, provides fewer automatically built dashboards; but through its integration with Grafana, you can build up dashboards in unique ways that boost your productivity, rather than being limited to the fixed dashboard CloudWatch offers. 
  • CloudWatch custom metrics can be built out of anything. If it can be represented as a value in the code, a metric can be created from it, whereas Prometheus restricts metric creation to just the aforementioned four metric types. However, using Prometheus with a collection engine like Logstash mitigates this limitation.
  • Prometheus can be abstracted using containerization such that it's able to run on any system platform, no matter its underlying technology, with very little configuration. The fact that it's Open Source makes its flexibility even more attractive because you can pretty much customize the Prometheus source code to best fit your use case. CloudWatch, being a proprietary piece of software, doesn't allow such flexibility.
  • Prometheus takes custom metrics up a notch with the use of exporters, which are basically libraries that allow exporting of metrics from third-party systems into Prometheus. It not only allows you to use the already vast catalog of exporters available, but also to build your own custom exporter, and have it export metrics from your system into Prometheus. How cool is that?
  • Last, but certainly not least, Prometheus is free to use. You can pretty much download its Docker image and have it working with your system in minutes, with very little expenditure to worry about. Every call to the CloudWatch PutMetricData API, on the other hand, is charged, which could lead to a massive bill if you intend to do a lot of custom monitoring. 

Conclusion

Custom metrics are a powerful feature, no matter which monitoring platform is used to collect and process them, and despite the trade-offs between the two platforms discussed above, both have massive potential that's just waiting to be applied.

There is an AWS CloudWatch integration with MetricFire, so one popular method is to send CloudWatch metrics over to MetricFire, where it's easy to do customized monitoring and dashboarding.

Try the MetricFire free trial to start monitoring with Prometheus today. Also, if you want to talk to the MetricFire team directly, book a demo and get us on a video call. We're always happy to talk about the best monitoring solutions for your company. 
