Top 10 cAdvisor Metrics for Prometheus

Table of Contents

  1. Introduction to cAdvisor
  2. cAdvisor metrics overview
    1. container_cpu_cfs_throttled_seconds_total
    2. container_cpu_load_average_10s
    3. container_fs_io_time_seconds_total
    4. container_memory_usage_bytes
    5. container_memory_failcnt
    6. container_network_receive_errors_total
    7. container_network_transmit_errors_total
    8. container_processes
    9. container_tasks_state
    10. container_start_time_seconds

Introduction to cAdvisor

cAdvisor (Container Advisor) is an open-source container-monitoring platform developed and maintained by Google. It runs as a background daemon that collects, processes, and aggregates data about running containers into performance characteristics, resource usage statistics, and related information.

With native support for Docker and out-of-the-box compatibility with most other container runtimes, cAdvisor can collect data on virtually any type of running container. This data can then be scraped by larger, system-wide monitoring platforms to give users information on the health and status of their containers in the context of their entire system.

It is an increasingly common practice to pair cAdvisor with Prometheus when monitoring containerized services. The cAdvisor web user interface is handy for live views of things like container CPU and memory usage, but it offers no way to store, query, or alert on the container metrics it collects over time.

This is where Prometheus comes in. Prometheus scrapes these container metrics, processes them, and makes them available through dashboards using its expression browser, or through third-party integrations such as Grafana.
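
For example, once cAdvisor is configured as a scrape target, a quick sanity check in the expression browser might look like the sketch below, which shows each container's CPU usage as a per-second rate (the name label filter simply drops the cgroup-level series that don't correspond to a named container):

    rate(container_cpu_usage_seconds_total{name=~".+"}[5m])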

When monitoring with MetricFire, both hosted Graphite and hosted Grafana are included in one package, so you can save yourself the pain of running your own monitoring stack. You can sign up for the MetricFire free trial here.

cAdvisor metrics overview

cAdvisor exports a variety of container metrics for Prometheus, allowing you to monitor virtually every aspect of your running containers. Which metrics matter most depends largely on the processes actually running in your containers, but this article covers the 10 cAdvisor metrics that are most useful in the general case.

container_cpu_cfs_throttled_seconds_total

This measures the total amount of time a container's CPU usage has been throttled. Container CPU usage is generally throttled to prevent a single busy container from starving the others by taking all the available CPU resources.
Throttling is usually a good way to guarantee a minimum amount of processing power for essential services on every running container. Observing this metric tells you when a container is hitting its CPU quota, which is the information you need to reallocate resources properly, for example by adjusting the CPU shares setting in Docker.
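
As a rough sketch, a PromQL query like the following surfaces the five most-throttled containers right now (as above, the name filter drops cgroup-level series without a container name):

    topk(5, rate(container_cpu_cfs_throttled_seconds_total{name=~".+"}[5m]))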

container_cpu_load_average_10s

This measures the value of the container CPU load average over the last 10 seconds. Monitoring CPU usage is vital for ensuring it is being used effectively. It also shows which container processes are compute-intensive, and can therefore help guide future CPU allocation.
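
For instance, a sketch for spotting the busiest containers at a glance (note that, depending on the cAdvisor version and flags, this metric may only report meaningful values when cAdvisor's load reader is enabled):

    topk(5, container_cpu_load_average_10s{name=~".+"})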

container_fs_io_time_seconds_total

This measures the cumulative number of seconds spent doing I/O. It can be used as a baseline to judge the speed of the processes running in your container, and to inform future optimization efforts.
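
A simple baseline could be the per-second I/O time, broken out per container and device, sketched as:

    sum by (name, device) (rate(container_fs_io_time_seconds_total{name=~".+"}[5m]))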

container_memory_usage_bytes

This measures current memory usage, including all memory regardless of when it was last accessed. Tracking this on a per-container basis keeps you informed of the memory footprint of the processes in each container while aiding future optimization and resource allocation efforts.
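
For example, the first query below shows the raw footprint per container, and the second sketches it as a fraction of the configured limit (the > 0 filter is a guard for containers where no meaningful limit is reported):

    # current memory footprint of each named container
    container_memory_usage_bytes{name=~".+"}

    # usage as a fraction of the configured limit
    container_memory_usage_bytes{name=~".+"}
      / (container_spec_memory_limit_bytes{name=~".+"} > 0)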

container_memory_failcnt

This measures the number of times a container has hit its memory usage limit. It is good practice to set container memory limits, to prevent memory-intensive tasks from starving other containers on the same host by using all the available memory.
This way, each container has a maximum amount of memory it can use, and tracking how often a container hits that limit tells you whether the limit needs to be raised, or whether debugging is needed to find the cause of the high memory consumption.
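
A sketch for flagging containers that hit their memory limit within the last hour:

    increase(container_memory_failcnt{name=~".+"}[1h]) > 0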

container_network_receive_errors_total

This measures the cumulative count of errors encountered while receiving bytes over the network. Networking in containers can get tricky, so it's essential to notice failures when they occur. This metric tells you how many failures occurred on inbound traffic, which gives you an idea of where to start debugging.
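
For example, a per-interface error rate for each container might be sketched as:

    sum by (name, interface) (rate(container_network_receive_errors_total{name=~".+"}[5m]))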

container_network_transmit_errors_total

This measures the cumulative count of errors encountered while transmitting. Like the metric directly above, it aids debugging efforts by tracking the total number of failures, this time on outbound traffic.
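
Since inbound and outbound errors are usually watched together, a combined per-container error rate could look like this sketch:

    sum by (name) (rate(container_network_receive_errors_total{name=~".+"}[5m]))
      + sum by (name) (rate(container_network_transmit_errors_total{name=~".+"}[5m]))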

container_processes

This metric keeps track of the number of processes currently running inside the container. Knowing the exact state of your containers at all times is essential to keeping them up and running, and an unexpected process count in a specific container is often the first sign that something is wrong.
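
For instance, you might compare the live process count against what you expect for a given workload (the container name below is purely hypothetical):

    # process count for one container; "my-app" is a hypothetical name
    container_processes{name="my-app"}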

container_tasks_state

This metric tracks the number of tasks or processes in a given state (sleeping, running, stopped, uninterruptible, or waiting) in a container. At a glance, this information provides real-time insight into the status and health of the container and its processes.
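
A quick breakdown across all containers, grouped by state, might be sketched as:

    sum by (state) (container_tasks_state{name=~".+"})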

container_start_time_seconds

Although subtle, this metric records a container's start time in seconds since the epoch. From it you can derive container uptime, and a start time that keeps resetting is an early indication of a restarting, potentially crash-looping container, while a stable one points to a healthy instance.
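
For example, the sketches below derive per-container uptime and flag anything that started (or restarted) within the last five minutes, a handy way to catch crash loops:

    # seconds since each container started
    time() - container_start_time_seconds{name=~".+"}

    # containers that started less than five minutes ago
    (time() - container_start_time_seconds{name=~".+"}) < 300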

How important a metric is to you largely depends on your use case. It wouldn't make sense to pay attention to the container_fs_writes_total metric (which tracks the cumulative count of completed write operations) for a container that is only intended to perform read operations.

We suggest looking at this cAdvisor documentation page to find all the metrics exposed for Prometheus and, depending on your use case, deciding which ones deserve the most attention. That said, the metrics above can be used in most container monitoring scenarios to provide easy insight into the status of your running containers.

With MetricFire’s hosted Graphite offering, monitoring is made much easier as you don’t have to worry about setting up, configuring, or maintaining your Prometheus instance. It's instantly available for you to use, making integrations with external applications like Grafana much easier. Sign up for a free trial of our hosted Graphite offering or book a demo to talk to the team directly about how MetricFire can help you.
