Metric Names
A metric name…
- …must comply with the data model for valid characters.
- …should have a (single-word) application prefix relevant to the domain the metric belongs to. The prefix is sometimes referred to as `namespace` by client libraries. For metrics specific to an application, the prefix is usually the application name itself. Sometimes, however, metrics are more generic, like standardized metrics exported by client libraries. Examples:
  - `prometheus_notifications_total` (specific to the Prometheus server)
  - `process_cpu_seconds_total` (exported by many client libraries)
  - `http_request_duration_seconds` (for all HTTP requests)
- …must have a single unit (i.e. do not mix seconds with milliseconds, or seconds with bytes).
- …should use base units (e.g. seconds, bytes, meters; not milliseconds, megabytes, kilometers).
- …should have a suffix describing the unit, in plural form. Note that an accumulating count has `total` as a suffix, in addition to the unit if applicable.
  - http_request_duration_**seconds**
  - node_memory_usage_**bytes**
  - http_requests_**total** (for a unit-less accumulating count)
  - process_cpu_**seconds_total** (for an accumulating count with unit)
  - foobar_build**_info** (for a pseudo-metric that provides metadata about the running binary)
- …should represent the same logical thing-being-measured across all label dimensions. Examples:
  - request duration
  - bytes of data transfer
  - instantaneous resource usage as a percentage
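Some of the rules in the list above can be checked mechanically: valid characters, base units, and the unit and `_total` suffixes. The sketch below shows a tiny linter for those rules. It assumes the traditional metric-name character set `[a-zA-Z_:][a-zA-Z0-9_:]*` from the data model documentation; newer Prometheus versions may accept a wider set. `lint_metric_name` and `NON_BASE_UNITS` are hypothetical names, not part of any Prometheus library. The prefix check is only a rough heuristic: it confirms the name has at least two words, but it cannot judge whether the first word is a meaningful domain prefix.

```python
import re

# Traditional metric-name character set (assumption: pre-UTF-8 rules).
# Colons are conventionally reserved for recording rules.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")

# Hypothetical, non-exhaustive map of non-base units to their base units.
NON_BASE_UNITS = {
    "milliseconds": "seconds",
    "microseconds": "seconds",
    "kilobytes": "bytes",
    "megabytes": "bytes",
    "kilometers": "meters",
}


def lint_metric_name(name, unit=None, counter=False):
    """Return a list of naming problems (an empty list means none were found).

    unit: plural base unit the metric is measured in (e.g. "seconds"), or None.
    counter: True if the metric is an accumulating count.
    """
    problems = []
    if not METRIC_NAME_RE.fullmatch(name):
        problems.append("invalid characters")
        return problems

    words = name.split("_")
    # Heuristic only: a prefix implies at least two underscore-separated words.
    if len(words) < 2:
        problems.append("missing application prefix")

    # Flag non-base units anywhere in the name.
    for word in words:
        if word in NON_BASE_UNITS:
            problems.append(
                f"use base unit {NON_BASE_UNITS[word]!r} instead of {word!r}"
            )

    # The unit suffix comes first, then "_total" for accumulating counts.
    expected_suffix = ""
    if unit:
        expected_suffix += "_" + unit
    if counter:
        expected_suffix += "_total"
    if expected_suffix and not name.endswith(expected_suffix):
        problems.append(f"expected suffix {expected_suffix!r}")

    return problems


print(lint_metric_name("process_cpu_seconds_total", unit="seconds", counter=True))
# []
print(lint_metric_name("http_request_duration_milliseconds"))
# ["use base unit 'seconds' instead of 'milliseconds'"]
print(lint_metric_name("http_requests", counter=True))
# ["expected suffix '_total'"]
```

Rules such as "a single unit per metric" and "the same thing-being-measured across all label dimensions" depend on meaning, so a name-only check like this cannot enforce them.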
As a rule of thumb, either the `sum()` or the `avg()` over all dimensions of a given metric should be meaningful (though not necessarily useful). If it is not meaningful, split the data up into multiple metrics. For example, having the capacity of various queues in one metric is good, while mixing the capacity of a queue with the current number of elements in the queue is not.
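The queue example can be made concrete with a small sketch that mimics what PromQL's `sum(<metric>)` does: add up every series of a metric across all of its label dimensions. The metric names `myapp_queue_capacity`, `myapp_queue_length` and `myapp_queue`, the `kind` label, and all sample values are hypothetical and serve only as illustration.

```python
from collections import defaultdict

# Hypothetical samples as (metric name, labels, value) tuples.
# Recommended: capacity and current length are separate metrics,
# each split only by a "queue" label.
GOOD_SAMPLES = [
    ("myapp_queue_capacity", {"queue": "emails"}, 100.0),
    ("myapp_queue_capacity", {"queue": "jobs"}, 50.0),
    ("myapp_queue_length", {"queue": "emails"}, 12.0),
    ("myapp_queue_length", {"queue": "jobs"}, 3.0),
]

# Discouraged: one metric whose "kind" label mixes capacity with length.
BAD_SAMPLES = [
    ("myapp_queue", {"queue": "emails", "kind": "capacity"}, 100.0),
    ("myapp_queue", {"queue": "jobs", "kind": "capacity"}, 50.0),
    ("myapp_queue", {"queue": "emails", "kind": "length"}, 12.0),
    ("myapp_queue", {"queue": "jobs", "kind": "length"}, 3.0),
]


def sum_over_all_dimensions(samples):
    """Rough analogue of PromQL sum(<metric>): add up every series per metric name."""
    totals = defaultdict(float)
    for name, _labels, value in samples:
        totals[name] += value
    return dict(totals)


print(sum_over_all_dimensions(GOOD_SAMPLES))
# {'myapp_queue_capacity': 150.0, 'myapp_queue_length': 15.0}
print(sum_over_all_dimensions(BAD_SAMPLES))
# {'myapp_queue': 165.0}
```

In the first design, each total has a clear reading: the total capacity of all queues, and the total number of queued items. The 165 from the second design adds capacities to item counts and has no meaning, which signals that the metric should be split.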