This document describes the base system metrics exported by the Hosted Graphite agent.
We focus on the default “base” dashboard, and also provide notes on related metrics not displayed there.
Because the Diamond collectors rely heavily on /proc data, many of the notes below are drawn from the Linux kernel documentation, e.g. proc.txt.
We list each metric’s unit - percentage, count, bytes, etc. - in brackets after its description.
If you find anything unclear or incorrect here, please let us know!
These metrics are found under:
hg_agent.hostname.cpu.cpuid.*
and represent the percentage of time each cpuid spends in particular states.
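If you’re curious where figures like these come from: the kernel exposes cumulative per-state jiffy counters for each CPU in /proc/stat, and percentages fall out of differencing two samples. Here’s a minimal illustrative sketch in Python - not the agent’s actual implementation, which may compute things differently:

```python
#!/usr/bin/env python
# Sketch only: derive per-CPU "percentage of time in state" figures
# from two samples of /proc/stat. The agent's collector may differ.
import time

FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]

def read_cpu_times():
    """Return {cpu_id: [cumulative jiffies per state, ...]} from /proc/stat."""
    times = {}
    with open("/proc/stat") as f:
        for line in f:
            # per-CPU lines look like "cpu0 ...", the aggregate is "cpu  ..."
            if line.startswith("cpu") and line[3].isdigit():
                parts = line.split()
                times[parts[0]] = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return times

before = read_cpu_times()
time.sleep(1)
after = read_cpu_times()

for cpu, now in sorted(after.items()):
    deltas = [n - b for n, b in zip(now, before.get(cpu, now))]
    total = sum(deltas) or 1
    pct = {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}
    print(cpu, {k: round(v, 1) for k, v in pct.items()})
```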
We display two of the most interesting on the dashboard:
Others you can use in your own graphs or investigations:
These metrics are found under:
hg_agent.hostname.loadavg.*
Load average, roughly speaking, is the average number of tasks waiting with “something to do” over a period of time:
Since the interpretation of load average is affected by the number of cores a machine has, you might like to use these “normalized” versions in your own graphs or investigations:
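Normalization here just means dividing the raw load figure by the number of cores, so that 1.0 reads as “all cores busy” regardless of machine size. A minimal illustrative sketch (not the agent’s actual code):

```python
#!/usr/bin/env python
# Sketch only: "normalized" load average = load / number of CPUs.
import os

with open("/proc/loadavg") as f:
    load1, load5, load15 = (float(x) for x in f.read().split()[:3])

cores = os.cpu_count() or 1
print("1min:  %.2f (normalized %.2f)" % (load1, load1 / cores))
print("5min:  %.2f (normalized %.2f)" % (load5, load5 / cores))
print("15min: %.2f (normalized %.2f)" % (load15, load15 / cores))
```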
These metrics are found under:
hg_agent.hostname.loadavg.*
These are simple “snapshot” counts of process numbers. Note that the number running will typically be capped at the number of cores.
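For reference, the kernel exposes a running/total pair as the fourth field of /proc/loadavg (e.g. “2/345”), which is one straightforward place to read counts like these from. A tiny illustrative sketch:

```python
#!/usr/bin/env python
# Sketch only: the fourth field of /proc/loadavg is "running/total".
with open("/proc/loadavg") as f:
    running, total = f.read().split()[3].split("/")

print("running processes:", int(running))
print("total processes:  ", int(total))
```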
These metrics are found under:
hg_agent.hostname.memory.*
In the “memory activity” graph, we display some of the metrics most relevant to physical memory usage:
And “swap activity” displays:
There are several other metrics available under memory.*. If you’re digging further, you can find out what they mean in the docs for /proc/meminfo.
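If you do want to poke at /proc/meminfo directly, a minimal parser looks something like this (illustrative only - most values are reported in kB, and the agent’s memory collector handles more cases than this):

```python
#!/usr/bin/env python
# Sketch only: parse /proc/meminfo into byte values.
def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            parts = rest.split()
            value = int(parts[0])
            if len(parts) > 1 and parts[1] == "kB":
                value *= 1024          # normalize kB fields to bytes
            info[key] = value
    return info

m = meminfo()
for key in ("MemTotal", "MemFree", "Buffers", "Cached",
            "SwapTotal", "SwapFree"):
    print("%-10s %d bytes" % (key, m[key]))
```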
These metrics are found under:
hg_agent.hostname.vmstat.*
These are metrics from /proc/vmstat and give some insight into the activity of the Linux virtual memory system. Unfortunately, the counters are a little underdocumented.
First, pages in and out:
Note that because everything goes through the page cache, these are recorded for essentially all pages read from or written to disk, so if you’re doing a lot of IO they’ll be elevated.
Next, swap usage, which you generally want to keep low or nonexistent. See this article for more information.
Finally, page faults made by the virtual memory system to page memory into process address spaces:
Note that page faults will stimulate paging in, so you can expect these to correlate.
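All of these are cumulative counters in /proc/vmstat, so per-second rates come from differencing two samples. An illustrative sketch - the counter names below are standard /proc/vmstat fields, but the metric names the agent actually exports may differ:

```python
#!/usr/bin/env python
# Sketch only: per-second rates from /proc/vmstat counter deltas.
import time

WANTED = ("pgpgin", "pgpgout", "pswpin", "pswpout", "pgfault", "pgmajfault")

def read_vmstat():
    counters = {}
    with open("/proc/vmstat") as f:
        for line in f:
            name, value = line.split()
            if name in WANTED:
                counters[name] = int(value)
    return counters

interval = 5.0
before = read_vmstat()
time.sleep(interval)
after = read_vmstat()

for name in WANTED:
    rate = (after[name] - before[name]) / interval
    print("%-10s %.1f/s" % (name, rate))
```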
These metrics are found under:
hg_agent.hostname.memory.*
When you change disk-backed memory in the page cache, it’s not written to disk immediately, just marked as “dirty”. This graph lets you see how much dirty memory is building up and being written back over time.
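The underlying kernel figures here are the Dirty and Writeback lines of /proc/meminfo (which of these the agent exports may depend on configuration). If you want to eyeball them outside of graphs, something like this works (illustrative only):

```python
#!/usr/bin/env python
# Sketch only: read the Dirty and Writeback lines of /proc/meminfo (in kB).
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith(("Dirty:", "Writeback:")):
            name, kb = line.split()[:2]
            print("%-10s %d bytes" % (name.rstrip(":"), int(kb) * 1024))
```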
These metrics are found under:
hg_agent.hostname.iostat.*
These metrics are per-disk, and are gathered from /proc/diskstats.
There are many other iostat metrics exported per disk; you can browse your metric tree to see which, and compare with /proc/diskstats and the ‘diskusage’ Diamond collector.
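By way of illustration, here’s how per-disk rates can be derived from two samples of /proc/diskstats (a sketch only; field positions follow the kernel’s iostats documentation, sectors are 512 bytes, and the agent’s iostat collector computes a much richer set of figures):

```python
#!/usr/bin/env python
# Sketch only: per-disk read/write rates from /proc/diskstats deltas.
import time

def read_diskstats():
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            dev = fields[2]
            reads, sectors_read = int(fields[3]), int(fields[5])
            writes, sectors_written = int(fields[7]), int(fields[9])
            stats[dev] = (reads, sectors_read, writes, sectors_written)
    return stats

interval = 5.0
before = read_diskstats()
time.sleep(interval)
after = read_diskstats()

for dev, (r, sr, w, sw) in sorted(after.items()):
    r0, sr0, w0, sw0 = before.get(dev, (r, sr, w, sw))
    print("%-8s reads/s %.1f  read_bytes/s %.0f  writes/s %.1f  write_bytes/s %.0f"
          % (dev, (r - r0) / interval, (sr - sr0) * 512 / interval,
             (w - w0) / interval, (sw - sw0) * 512 / interval))
```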
These metrics are found under:
hg_agent.hostname.diskspace.*
Again, these metrics are per-disk.
Apart from this useful graphed value, there are also some more available to you:
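For comparison, figures of this kind can be reproduced with statvfs() on each mount point, much as df does. A rough illustrative sketch - reading mounts from /proc/mounts and skipping pseudo-filesystems this crudely is our assumption, not necessarily how the agent does it:

```python
#!/usr/bin/env python
# Sketch only: per-mount disk space figures via os.statvfs().
import os

SKIP_FS = {"proc", "sysfs", "devtmpfs", "tmpfs", "cgroup", "overlay"}

with open("/proc/mounts") as f:
    for line in f:
        device, mount, fstype = line.split()[:3]
        if fstype in SKIP_FS or not device.startswith("/dev/"):
            continue
        st = os.statvfs(mount)
        total = st.f_blocks * st.f_frsize
        free = st.f_bavail * st.f_frsize           # free space for non-root users
        used = total - (st.f_bfree * st.f_frsize)
        pct_used = 100.0 * used / total if total else 0.0
        print("%-20s total %d  used %d  free %d  (%.1f%% used)"
              % (mount, total, used, free, pct_used))
```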
These metrics are found under:
hg_agent.hostname.network.*
These metrics are per-interface. We graph the following:
There are many other network metrics exported per interface; you can browse your metric tree to see which, and compare with /proc/net/dev (which is fairly self-explanatory) and the ‘network’ Diamond collector.
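As an illustration, per-interface byte rates can be derived from two samples of /proc/net/dev like so (a sketch only, not the collector’s actual code):

```python
#!/usr/bin/env python
# Sketch only: per-interface rx/tx byte rates from /proc/net/dev deltas.
import time

def read_netdev():
    stats = {}
    with open("/proc/net/dev") as f:
        for line in f.readlines()[2:]:              # skip the two header lines
            iface, data = line.split(":", 1)
            fields = data.split()
            stats[iface.strip()] = (int(fields[0]), int(fields[8]))  # rx, tx bytes
    return stats

interval = 5.0
before = read_netdev()
time.sleep(interval)
after = read_netdev()

for iface, (rx, tx) in sorted(after.items()):
    rx0, tx0 = before.get(iface, (rx, tx))
    print("%-10s rx_bytes/s %.0f  tx_bytes/s %.0f"
          % (iface, (rx - rx0) / interval, (tx - tx0) / interval))
```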
These metrics are found under:
hg_agent.hostname.sockets.*
They’re drawn from /proc/net/sockstat, which is under-documented.
Others you can use in your own graphs or investigations:
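If you’d like to see the raw source, /proc/net/sockstat is easy to parse by hand: each line is a protocol followed by name/value pairs (e.g. “sockets: used N”, “TCP: inuse N orphan N tw N …”). A minimal illustrative sketch - the flattened names below are our own, not necessarily the agent’s metric names:

```python
#!/usr/bin/env python
# Sketch only: flatten /proc/net/sockstat into name/value counters.
counters = {}
with open("/proc/net/sockstat") as f:
    for line in f:
        proto, _, rest = line.partition(":")
        fields = rest.split()
        # fields alternate name/value, e.g. ["inuse", "5", "orphan", "0", ...]
        for name, value in zip(fields[0::2], fields[1::2]):
            counters["%s_%s" % (proto.lower(), name)] = int(value)

for name in sorted(counters):
    print("%-20s %d" % (name, counters[name]))
```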