Cardinality

Prometheus performance almost always comes down to one thing: label cardinality.

Cardinality is the number of unique values something takes. For example, a label containing HTTP methods would have a cardinality of 2 if your application only used GET and POST.
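To make the definition concrete, here is a minimal sketch that counts the distinct values of a label. The `label_cardinality` helper and the sample label sets are hypothetical, made up purely for illustration; they are not part of any Prometheus client library.

```python
def label_cardinality(label_sets, label_name):
    """Count the distinct values a label takes across a collection of label sets."""
    return len({labels[label_name] for labels in label_sets if label_name in labels})


# Hypothetical label sets, as they might appear on three series of one metric.
series = [
    {"method": "GET", "path": "/"},
    {"method": "POST", "path": "/login"},
    {"method": "GET", "path": "/login"},
]

print(label_cardinality(series, "method"))  # 2 (GET and POST)
print(label_cardinality(series, "path"))    # 2 ("/" and "/login")
```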

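It’s fairly common that things start out reasonable. You might have a histogram covering 2 HTTP methods, 7 HTTP paths, and 5 machines; a Prometheus typically only monitors one environment and datacenter, so those dimensions add nothing. The remaining factor of 12 is presumably the number of series the histogram itself exposes per label combination (for example, 10 buckets plus a sum and a count). So that’s 2x7x5x12 = 840 series, well within the capabilities of a single Prometheus.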

What tends to catch you out is that things usually don’t grow in only one dimension. Increased traffic means more machines, and more users usually means more features and therefore new endpoints. So you might now have, say, 3x8x6x12. That is an increase of just 1 in each of the first three factors, yet it results in 1728 series, more than double the original!
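The arithmetic is easy to check. The sketch below simply multiplies the per-dimension cardinalities; `series_count` is a hypothetical helper name, and the final factor of 12 is taken from the text as-is.

```python
from math import prod


def series_count(*cardinalities):
    """Worst-case number of series: the product of each dimension's cardinality."""
    return prod(cardinalities)


before = series_count(2, 7, 5, 12)  # methods x paths x machines x 12
after = series_count(3, 8, 6, 12)   # each of the first three factors grown by 1

print(before, after, round(after / before, 2))  # 840 1728 2.06
```

Because the dimensions multiply rather than add, a small bump in each one compounds into a much larger jump in the total.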

It’s still small overall, but this is just one metric, from one subsystem, and this is only one minor growth spurt. Over time growth accumulates and compounds, and can gradually bring you to a point where your Prometheus starts to creak. No single change caused it, but it still needs to be dealt with before your monitoring falls over. A Prometheus 2.x can handle somewhere north of ten million series over a time window, which is rather generous, but unwise label choices can eat that surprisingly quickly.
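As a rough illustration of how quickly headroom disappears, the sketch below asks how many metrics of a given size would fit in a budget of ten million series. Treating the "north of ten million" figure as a hard limit is an assumption made for the sake of the example, and `metrics_that_fit` is a hypothetical helper.

```python
# Assumption: use the rough ten-million figure from the text as a fixed budget.
SERIES_BUDGET = 10_000_000


def metrics_that_fit(series_per_metric, budget=SERIES_BUDGET):
    """How many metrics of this size fit within the series budget."""
    return budget // series_per_metric


print(metrics_that_fit(840))   # 11904
print(metrics_that_fit(1728))  # 5787
```

One modest growth spurt roughly halves the number of such metrics a single Prometheus could carry, before any other subsystem has grown at all.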

Scrape Interval

Reference
