Top 10 cAdvisor Metrics for Prometheus

Oct 12, 2023 ∙ 7 min read

Lauren Barnes

Introduction to cAdvisor
Key Takeaways
cAdvisor metrics overview
Conclusion

Introduction to cAdvisor

cAdvisor (container advisor) is an open-source container-monitoring platform developed and maintained by Google. It runs as a background daemon process for collecting, processing and aggregating data into performance characteristics, resource usage statistics, and related information about running containers.

With built-in support for Docker and out-of-the-box support for most other container types, cAdvisor can be used to collect data on virtually any type of running container. This data can then be scraped by larger, system-wide monitoring platforms to give users information on the health or status of their containers in the context of their entire system.

It is increasingly common to use cAdvisor with Prometheus to monitor containerized services. The cAdvisor web user interface is useful for a quick look at certain things cAdvisor monitors, such as container CPU and memory usage, but it doesn't provide a means for querying or exploring the full set of container metrics it collects.

This is where Prometheus comes in. Prometheus scrapes these container metrics, processes them, and makes them available through dashboards using its expression browser, or through third-party integrations such as Grafana.
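As a rough illustration of what Prometheus receives on each scrape, the Python sketch below parses a few lines of the plain-text exposition format. The sample text, the label values, and the parse_samples helper are made up for illustration, and the regular expressions cover only the simple cases (escaped label values are not unescaped), so an established client-library parser is a safer choice in practice.

```python
import re

# Hypothetical scrape output; the container id and name are invented.
SAMPLE_TEXT = """\
# HELP container_memory_usage_bytes Current memory usage in bytes.
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{id="/docker/abc",name="web"} 1.048576e+06 1695000000000
container_processes{id="/docker/abc",name="web"} 3
"""

# One sample line: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?\s*$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no sample
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_samples(SAMPLE_TEXT):
        print(name, labels.get("name"), value)
```

Each parsed sample carries the metric name, its labels, and a numeric value, which is the shape of data the metrics in the rest of this article take.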

When monitoring with MetricFire, both Hosted Graphite and hosted Grafana are included in one package. MetricFire's Hosted Graphite is a hosted monitoring solution that saves you the pain of running your own monitoring stack. You can sign up for the MetricFire free trial here.

Key Takeaways

cAdvisor, developed by Google, is an open-source container monitoring platform that collects and processes data on running containers, making it a crucial tool for containerized services.
cAdvisor has built-in support for Docker and can collect data on virtually any type of running container, making it compatible with a wide range of container technologies.
cAdvisor is commonly used with Prometheus, which scrapes container metrics, processes them, and provides insights through dashboards using its expression browser or third-party integrations like Grafana.
The importance of these metrics depends on the specific use case of the container. Not all metrics are relevant for every scenario, so it's essential to select the most relevant ones for your monitoring needs.
cAdvisor metrics are essential for monitoring and managing containerized services, and understanding these top metrics can provide valuable insights into the health and performance of your containers when used in conjunction with tools like Prometheus and hosted Graphite from MetricFire.

cAdvisor metrics overview

cAdvisor exports a variety of container metrics for Prometheus, allowing you to monitor virtually every aspect of your running containers. Although the importance of certain metrics over others would largely depend on the actual processes running on the container, this article aims to provide the top 10 most important cAdvisor metrics for a general use case.

container_cpu_cfs_throttled_seconds_total

This measures the total time, in seconds, that a container's CPU usage has been throttled. CPU usage is generally throttled to prevent a single busy container from choking other containers by taking all the available CPU resources.
Throttling is usually a good way to keep processing power available for essential services on all running containers. Watching how quickly this counter grows shows which containers regularly hit their CPU limit, which is the information you need to reallocate resources to specific containers. In Docker, for example, this can be done by adjusting the container's CPU limit (the --cpus or --cpu-quota setting). CPU shares only set relative priority under contention and do not cause throttling on their own.
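Because this metric is a cumulative counter, the raw value matters less than how fast it grows. The Python sketch below shows one way to turn two counter readings into a per-second throttling rate. The function name and the sample numbers are hypothetical, and the handling of counter resets is deliberately simplistic.

```python
def throttled_seconds_per_second(earlier, later):
    """Rate of throttling between two readings of the counter.

    Each reading is a (unix_timestamp_seconds, throttled_seconds_total) pair.
    Returns None when the window is empty or the counter went backwards
    (for example, because the container was recreated).
    """
    (t0, v0), (t1, v1) = earlier, later
    elapsed = t1 - t0
    if elapsed <= 0 or v1 < v0:
        return None
    return (v1 - v0) / elapsed


if __name__ == "__main__":
    # Invented readings taken 60 seconds apart.
    rate = throttled_seconds_per_second((1000.0, 12.0), (1060.0, 27.0))
    print(rate)
```

A rate that stays well above zero suggests the container's CPU limit is too tight for its workload.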

container_cpu_load_average_10s

This measures the container's CPU load average over the last 10 seconds. Monitoring CPU load is vital for ensuring the CPU is being used effectively. It also gives insight into which container processes are compute-intensive, which helps guide future CPU allocation.

container_fs_io_time_seconds_total

This measures the cumulative count of seconds spent doing I/O. It can be used as a baseline to judge how much time the processes running in your container spend on disk I/O, and to guide future optimization efforts.

container_memory_usage_bytes

This measures the current memory usage, including all memory regardless of when it was accessed. Tracking this on a per-container basis keeps you informed of the memory footprint of the processes on each container while aiding future optimization or resource allocation efforts.

container_memory_failcnt

This measures the number of times a container's memory usage limit is hit. It is good practice to set container memory usage limits to prevent memory-intensive tasks from starving other containers on the same server by using all the available memory.
This way, each container has a maximum amount of memory it can use. Tracking how many times a container hits its memory usage limit helps you understand whether the limit needs to be increased, or whether debugging is needed to find the reason for the high memory consumption.
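Since the fail count only ever grows until the container is recreated, what usually matters is how much it increased over a window. The Python sketch below adds up the increases across a series of readings and treats any drop as a counter reset. The helper name and the readings are hypothetical, and this is a simplified version of the idea rather than the exact calculation any monitoring system performs.

```python
def counter_increase(values):
    """Total increase across consecutive readings of a cumulative counter.

    A reading lower than the previous one is treated as a reset to zero,
    so the new reading itself is counted as the increase for that step.
    """
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        total += (curr - prev) if curr >= prev else curr
    return total


if __name__ == "__main__":
    # Invented container_memory_failcnt readings; the drop to 0 is a restart.
    readings = [3, 3, 5, 0, 2]
    print(counter_increase(readings))
```

An increase of zero over a long window means the limit is comfortable. A steadily growing count points to either a limit that is too low or a process that is consuming more memory than expected.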

container_network_receive_errors_total

This measures the cumulative count of errors encountered while receiving data over the network. Networking on containers can get tricky, so it's essential to keep an eye on failures when they occur. This metric lets you know how many failures occurred on inbound traffic, which gives you an idea of where to look when debugging.

container_network_transmit_errors_total

This measures the cumulative count of errors encountered while transmitting. Similar to the metric directly above, it aids debugging by keeping track of the total number of failures that occurred during transmission.
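A raw error count is easier to interpret when set against the amount of traffic. The Python sketch below compares two snapshots and reports errors as a share of packets for either direction. It relies on cAdvisor's packet counters (container_network_receive_packets_total and container_network_transmit_packets_total), which are not covered in this article. The snapshot dictionaries and the function name are hypothetical.

```python
def error_ratio(before, after, direction):
    """Errors per packet between two snapshots for one direction.

    `before` and `after` map metric names to counter values.
    `direction` is "receive" or "transmit".
    Returns None when no packets were seen or a counter went backwards.
    """
    errors_key = f"container_network_{direction}_errors_total"
    packets_key = f"container_network_{direction}_packets_total"
    errors = after[errors_key] - before[errors_key]
    packets = after[packets_key] - before[packets_key]
    if packets <= 0 or errors < 0:
        return None
    return errors / packets


if __name__ == "__main__":
    # Invented snapshots for a single container and interface.
    before = {
        "container_network_receive_errors_total": 2,
        "container_network_receive_packets_total": 1000,
    }
    after = {
        "container_network_receive_errors_total": 4,
        "container_network_receive_packets_total": 1200,
    }
    print(error_ratio(before, after, "receive"))
```

A handful of errors on a busy interface is rarely a concern, while the same number on a nearly idle one may be worth investigating.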

container_processes

This metric keeps track of the number of processes currently running inside the container. Knowing the exact state of your containers at all times is essential to keeping them up and running. Knowing how many processes are currently running in a specific container provides insight into whether things are functioning normally or whether something is wrong.

container_tasks_state

This metric tracks the number of tasks or processes in a given state (sleeping, running, stopped, uninterruptible, or waiting on I/O) in a container. At a glance, this gives real-time information on the status or health of the container and its processes.

container_start_time_seconds

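This metric records a container's start time as a Unix timestamp in seconds. Although subtle, it can provide an early indication of trouble: a start time that keeps changing means the container is restarting repeatedly, while a start time that stays the same indicates a healthy, long-running container instance.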
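One way to put the start time to work is to count how often it changes and to derive uptime from it. The Python sketch below does both. The function names and sample values are hypothetical, and it assumes the readings all belong to the same container.

```python
def count_restarts(start_times):
    """Number of times the reported start time changed between readings."""
    return sum(
        1 for prev, curr in zip(start_times, start_times[1:]) if curr != prev
    )


def uptime_seconds(start_time, now):
    """Seconds since the container started, never negative."""
    return max(0.0, now - start_time)


if __name__ == "__main__":
    # Invented container_start_time_seconds readings from successive scrapes.
    readings = [100.0, 100.0, 250.0, 250.0, 400.0]
    print(count_restarts(readings))
    print(uptime_seconds(readings[-1], now=1000.0))
```

Several restarts in a short period, or an uptime that never grows beyond a few minutes, usually means the container is crash-looping.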

How important a metric is to you largely depends on your use case. It wouldn’t make sense to pay attention to the container_fs_writes_total metric (which tracks the cumulative count of write operations completed) for a container that is only intended to perform read operations.

Conclusion

We suggest looking at this cAdvisor documentation page to find all the metrics exposed for Prometheus and, depending on your use case, figuring out which ones deserve more attention. That said, the above metrics can be used in most container monitoring scenarios to provide easy insight into the status of your running containers.

With MetricFire’s hosted Graphite offering, monitoring is made much easier as you don’t have to worry about setting up, configuring, or maintaining your Prometheus instance. It's instantly available for you to use, making integrations with external applications like Grafana much easier. Sign up for a free trial of our hosted Graphite offering or book a demo to talk to the team directly about how MetricFire can help you.
