cAdvisor (container advisor) is an open source container monitoring tool developed and maintained by Google. It runs as a background daemon that collects, aggregates, processes, and exports performance characteristics, resource usage statistics, and related information about running containers.
cAdvisor has native support for Docker containers and should work with just about any other container type out of the box, so it can be used to collect data on virtually any running container. This data can then be scraped by larger, system-wide monitoring platforms, such as Prometheus, to give users information on the health or status of their containers in the context of their entire system.
It is an increasingly common practice to use cAdvisor with Prometheus to monitor containerized services. The cAdvisor web user interface is useful for a quick look at what cAdvisor monitors, such as current container CPU and memory usage, but it doesn't provide an interface for querying the container metrics it collects or for exploring them over time.
This is where Prometheus comes in. Prometheus scrapes these container metrics from cAdvisor, stores them as time series, and makes them available for querying through its expression browser, or for dashboards through third party integrations such as Grafana.
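To make the scraping step more concrete, here is a minimal sketch of what reading cAdvisor's output looks like. cAdvisor serves its metrics in the Prometheus text exposition format, one sample per line. The sample text below is hypothetical (the container names, ids, and values are invented for illustration), and the parser is deliberately simplified: it does not unescape label values and it ignores optional timestamps.

```python
import re

# Hypothetical excerpt of a cAdvisor metrics response; names and values
# are invented for illustration only.
SAMPLE = '''\
# HELP container_memory_usage_bytes Current memory usage in bytes.
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{id="/docker/abc",name="web"} 5.24288e+07
container_memory_usage_bytes{id="/docker/def",name="db"} 2.097152e+08
# TYPE container_processes gauge
container_processes{id="/docker/abc",name="web"} 4
'''

# metric_name, optional {labels}, then the sample value.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# A single key="value" pair inside the braces.
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_samples(SAMPLE):
        print(name, labels.get("name"), value)
```

In practice Prometheus does this parsing for you on every scrape; the sketch is only meant to show that each line is a metric name, a set of labels identifying the container, and a numeric value.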
When monitoring with MetricFire, both hosted Prometheus and hosted Grafana are included in one package. You can monitor your cAdvisor metrics in Prometheus right in our web app, with no configuration or setup. You should sign up for the MetricFire free trial here.
cAdvisor exports a large variety of container metrics for Prometheus, allowing you to monitor virtually every aspect of your running containers. Although the importance of certain metrics over others would largely depend on the actual processes running on the container, this article aims to provide the top 10 most important cAdvisor metrics for a general use case.
- container_cpu_cfs_throttled_seconds_total: This measures the total amount of time a certain container has been throttled. Generally, container CPU usage can be throttled to prevent a single busy container from essentially choking other containers by taking away all the available CPU resources.
Throttling is usually a good way to ensure that processing power remains available for essential services on all running containers. A throttled-time counter that keeps climbing tells you that a container regularly needs more CPU than its limit allows, which is the information you need to reallocate resources to specific containers. This can be done, for example, by adjusting the container's CPU quota in Docker (the --cpus or --cpu-quota options), since CFS throttling is triggered by the quota rather than by CPU shares.
- container_cpu_load_average_10s: This measures the value of container CPU load average over the last 10 seconds. Monitoring CPU load is vital for ensuring the CPU is being used effectively. It also gives insight into which container processes are compute intensive, and as such, helps advise future CPU allocation. Note that cAdvisor only populates this value when its CPU load reader is enabled (the --enable_load_reader flag); otherwise it reports 0.
- container_fs_io_time_seconds_total: This measures the cumulative count of seconds spent doing I/Os. A value that grows quickly relative to wall-clock time suggests that the processes in the container spend much of their time on disk I/O, so it can serve as a baseline for judging I/O-bound workloads and help advise future optimization efforts.
- container_memory_usage_bytes: This measures the current memory usage, including all memory regardless of when it was accessed. Tracking this on a per container basis keeps you informed of the memory footprint of the processes on each container, while aiding future optimization or resource allocation efforts.
- container_memory_failcnt: This measures the number of times a container’s memory usage limit is hit. It is good practice to set container memory usage limits, to prevent memory intensive tasks from essentially starving other containers on the same server by using all the available memory.
This way, each container has a maximum amount of memory it can use. Tracking how many times a container hits its memory usage limit helps you understand whether the container's memory limit needs to be increased, or whether debugging is needed to find the reason for the high memory consumption.
- container_network_receive_errors_total: This measures the cumulative count of errors encountered while receiving bytes over your network. Networking on containers can get tricky, so it’s essential to keep an eye on failures when they occur. This metric lets you know how many failures occurred on inbound traffic, which gives you an idea of where to start debugging.
- container_network_transmit_errors_total: This measures the cumulative count of errors encountered while transmitting. Like the metric directly above, it aids debugging efforts by keeping track of the total number of failures that occurred on outbound traffic.
- container_processes: This metric keeps track of the number of processes currently running inside the container. Knowing the exact state of our containers at all times is essential in keeping them up and running. As such, knowing how many processes are currently running in a specific container would provide insight into whether things are functioning normally, or whether there’s something wrong.
- container_tasks_state: This metric tracks the number of tasks or processes in a given state (sleeping, running, stopped, uninterruptible, or iowaiting) in a container. At a glance, this provides real-time information on the status or health of the container and its processes; for example, a growing number of tasks in the uninterruptible or iowaiting state can point to an I/O bottleneck.
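- container_start_time_seconds: Although subtle, this metric records a container’s start time as seconds since the Unix epoch. A start time that keeps changing can be an early indication of trouble, such as a container that is repeatedly restarting, while a stable value indicates a container instance that has been running without interruption.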
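Several of the metrics above (the ones ending in _total, and container_memory_failcnt) are cumulative counters, so the raw number matters less than how fast it grows. The following is a minimal sketch of how such values can be interpreted from two scraped samples. It is loosely similar in spirit to what a rate query in Prometheus gives you, but heavily simplified, and the function names and sample numbers are hypothetical, chosen only for illustration.

```python
def counter_rate(earlier, later):
    """Per-second increase between two (timestamp, value) counter samples.

    If the counter went down, we assume it was reset (for example because
    the container restarted) and treat the later value as the increase
    since the reset.
    """
    (t0, v0), (t1, v1) = earlier, later
    elapsed = t1 - t0
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / elapsed


def uptime_seconds(start_time_seconds, now):
    """Seconds a container has been up, given container_start_time_seconds."""
    return now - start_time_seconds


def restarted(previous_start, current_start):
    """True if the start time moved forward between two scrapes."""
    return current_start > previous_start


if __name__ == "__main__":
    # Hypothetical samples of container_cpu_cfs_throttled_seconds_total,
    # taken 60 seconds apart: 6 seconds of throttling in one minute.
    print(counter_rate((100.0, 12.0), (160.0, 18.0)))
    # Hypothetical start times from two consecutive scrapes.
    print(restarted(1_700_000_000, 1_700_000_450))
```

A throttling rate of 0.1 here would mean the container spent roughly a tenth of each second throttled over that window. The same helper applies to the network error counters, where any sustained non-zero rate is usually worth a look.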
How important a metric is to you largely depends on your use case. It wouldn’t make sense to pay attention to the container_fs_writes_total metric (which tracks the cumulative count of write operations completed) for a container which is only intended to perform read operations.
We suggest looking at this cAdvisor documentation page to find all the metrics exposed for Prometheus, and depending on your use case, figure out which to pay more attention to. That said, the above metrics can be used in most container monitoring scenarios to provide easy insight into the status of your running containers.
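If you would rather see what your own cAdvisor instance exposes, the metrics output itself lists every metric family in its "# TYPE" comment lines. The sketch below, which assumes you have already saved that output as text, collects those families and filters them by prefix. The embedded sample is a hypothetical excerpt, and metric_families and with_prefix are illustrative helper names, not part of any library.

```python
# Hypothetical excerpt of the "# TYPE" lines from a cAdvisor metrics response.
SAMPLE_TYPES = """\
# TYPE container_fs_writes_total counter
# TYPE container_memory_usage_bytes gauge
# TYPE container_network_receive_errors_total counter
# TYPE container_network_transmit_errors_total counter
"""


def metric_families(text):
    """Map each metric family name to its declared type (counter, gauge, ...)."""
    families = {}
    for line in text.splitlines():
        parts = line.split()
        # A type declaration looks like: "# TYPE <name> <type>"
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            families[parts[2]] = parts[3]
    return families


def with_prefix(families, prefix):
    """Return the sorted family names that start with the given prefix."""
    return sorted(name for name in families if name.startswith(prefix))


if __name__ == "__main__":
    families = metric_families(SAMPLE_TYPES)
    # Narrow the list down to the area you care about, e.g. networking.
    for name in with_prefix(families, "container_network_"):
        print(name, families[name])
```

Filtering by prefix (container_network_, container_fs_, container_memory_, and so on) is a quick way to shortlist the metrics that match your use case before deciding which ones to alert on.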
With MetricFire’s hosted Prometheus offering, monitoring is made much easier as you don’t have to worry about setting up, configuring or maintaining your Prometheus instance. It's instantly available for you to use, making integrations with external applications like Grafana much easier. Sign up for a free trial of our hosted Prometheus offering or book a demo to talk to the team directly about how MetricFire can help you.