[Prometheus] Prometheus - Node Exporter

Posted by 西维蜀黍 on 2020-07-22, Last Modified on 2022-04-01

Installing Node Exporter

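In Prometheus's architecture, the Prometheus Server does not monitor specific targets directly. Its main jobs are collecting data, storing it, and serving queries against it. So to monitor something like a host's CPU usage, we need an Exporter. Prometheus periodically pulls (scrapes) monitoring samples from the HTTP endpoint that the Exporter exposes, usually /metrics.

An exporter is a fairly open concept: it can be a standalone program that runs separately from the monitored target, or it can be built directly into the target. All that matters is that it provides Prometheus with monitoring samples in the standard format.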
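To make the "exporter" idea concrete, here is a minimal sketch of a standalone exporter written with only the Python standard library. It is not Node Exporter; the metric name `demo_uptime_seconds` and port 8000 are hypothetical choices for illustration. All it does is answer GET /metrics with text in the Prometheus exposition format, which is exactly the contract Prometheus relies on.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()


def render_metrics(uptime: float) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    # demo_uptime_seconds is a hypothetical metric name used only for illustration.
    return (
        "# HELP demo_uptime_seconds Seconds since this demo exporter started.\n"
        "# TYPE demo_uptime_seconds gauge\n"
        f"demo_uptime_seconds {uptime}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string; only the /metrics path is served.
        if self.path.split("?", 1)[0] != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(time.time() - START_TIME).encode("utf-8")
        self.send_response(200)
        # Content type of the text exposition format, version 0.0.4.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for this sketch.
    HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

Running it and then `curl http://localhost:8000/metrics` should return the three lines built by `render_metrics`. Real exporters usually rely on an official client library rather than hand-writing the format.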

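To collect host metrics such as CPU, memory, and disk usage, we can use Node Exporter.

Like Prometheus itself, Node Exporter is written in Go and has no third-party dependencies: just download, extract, and run it. The latest Node Exporter binary release is available from https://prometheus.io/download/.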

macOS

$ curl -OL https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.darwin-amd64.tar.gz
$ tar -xzf node_exporter-1.0.1.darwin-amd64.tar.gz

# via brew
$ brew install node_exporter
# run
$ brew services start node_exporter
# check the port
$ lsof -i:9100
COMMAND    PID   USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
node_expo 2944 shiwei    3u  IPv6 0xe44152304252ea31      0t0  TCP *:hp-pdl-datastr (LISTEN)
$ curl http://localhost:9100/metrics

Linux

$ curl -OL https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
$ tar -xzf node_exporter-1.0.1.linux-amd64.tar.gz

# or
$ sudo apt install prometheus-node-exporter
$ sudo systemctl enable prometheus-node-exporter; sudo systemctl status prometheus-node-exporter

FreeBSD

$ pkg install node_exporter
$ sysrc node_exporter_enable=YES
$ service node_exporter start

# Test
$ curl 127.0.0.1:9100/metrics

To run Node Exporter:

$ cd node_exporter-1.0.1.darwin-amd64
$ ./node_exporter

# Or
$ nohup /home/parallels/node_exporter-1.0.1.linux-amd64/node_exporter &

Once it has started successfully, you should see output like the following:

level=info ts=2020-07-22T15:10:28.758Z caller=node_exporter.go:177 msg="Starting node_exporter" version="(version=1.0.1, branch=HEAD, revision=3715be6ae899f2a9b9dbfd9c39f3e09a7bd4559f)"
level=info ts=2020-07-22T15:10:28.758Z caller=node_exporter.go:178 msg="Build context" build_context="(go=go1.14.4, user=root@1f76dbbcfa55, date=20200616-12:44:12)"
level=info ts=2020-07-22T15:10:28.759Z caller=node_exporter.go:105 msg="Enabled collectors"
level=info ts=2020-07-22T15:10:28.759Z caller=node_exporter.go:112 collector=arp
level=info ts=2020-07-22T15:10:28.759Z caller=node_exporter.go:112 collector=bcache
level=info ts=2020-07-22T15:10:28.759Z caller=node_exporter.go:112 collector=bonding
level=info ts=2020-07-22T15:10:28.759Z caller=node_exporter.go:112 collector=btrfs
level=info ts=2020-07-22T15:10:28.759Z caller=node_exporter.go:112 collector=conntrack
level=info ts=2020-07-22T15:10:28.759Z caller=node_exporter.go:112 collector=cpu
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=cpufreq
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=diskstats
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=edac
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=entropy
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=filefd
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=filesystem
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=hwmon
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=infiniband
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=ipvs
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=loadavg
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=mdadm
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=meminfo
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=netclass
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=netdev
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=netstat
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=nfs
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=nfsd
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=powersupplyclass
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=pressure
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=rapl
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=schedstat
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=sockstat
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=softnet
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=stat
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=textfile
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=thermal_zone
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=time
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=timex
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=udp_queues
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=uname
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=vmstat
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=xfs
level=info ts=2020-07-22T15:10:28.760Z caller=node_exporter.go:112 collector=zfs
level=info ts=2020-07-22T15:10:28.761Z caller=node_exporter.go:191 msg="Listening on" address=:9100
level=info ts=2020-07-22T15:10:28.761Z caller=tls_config.go:170 msg="TLS is disabled and it cannot be enabled on the fly." http2=false
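The startup log above is in logfmt (space-separated key=value pairs, with quoted values where needed), so the list of enabled collectors can be pulled out of it programmatically. Below is a rough sketch, assuming the log lines look exactly like the ones above; the helper names `parse_logfmt` and `enabled_collectors` are hypothetical, and this is a deliberately tiny parser, not a complete logfmt implementation.

```python
import re

# key=value pairs; a value is either a double-quoted string or a bare token.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line: str) -> dict:
    """Parse one logfmt line into a dict (good enough for the startup log above)."""
    fields = {}
    for key, value in _PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # strip surrounding quotes
        fields[key] = value
    return fields


def enabled_collectors(log_text: str) -> list:
    """Return the names from every line that carries a collector=... field."""
    names = []
    for line in log_text.splitlines():
        fields = parse_logfmt(line)
        if "collector" in fields:
            names.append(fields["collector"])
    return names
```

The log format can change between Node Exporter versions, so treat this as a convenience for a quick check rather than a stable interface.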

Visiting http://localhost:9100/ shows a simple Node Exporter landing page with a link to the metrics.

Start on Boot on Ubuntu

$ curl -OL https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
$ tar -xzf node_exporter-1.0.1.linux-amd64.tar.gz
$ sudo cp node_exporter-1.0.1.linux-amd64/node_exporter /usr/local/bin

Adding a node_exporter user

Next, add a dedicated user to run the exporter. It is a system user (-r) that cannot get a login shell (-s /bin/false). Once the user exists, hand the binary over to it:

sudo useradd -rs /bin/false node_exporter
sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Setting up the Node Exporter as a service

Now create a systemd service for Node Exporter by creating the file

sudo vim /etc/systemd/system/node_exporter.service

with the following contents:

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Save the file and exit the editor (in vim, type :wq).

Start the service

Reload systemd so it picks up the new unit file, then start the service, enable it so it starts on boot, and check its status to make sure it started correctly:

sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
sudo systemctl status node_exporter
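Besides `systemctl status`, it is worth confirming that the exporter actually answers on its port. A minimal sketch using the Python standard library; it assumes the default listen address (port 9100) and treats any HTTP 200 from /metrics as healthy. The function name `exporter_is_up` is hypothetical.

```python
import urllib.error
import urllib.request


def exporter_is_up(url: str = "http://127.0.0.1:9100/metrics", timeout: float = 3.0) -> bool:
    """Return True if the metrics endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status (HTTPError is a URLError).
        return False


if __name__ == "__main__":
    print("node_exporter up:", exporter_is_up())
```

This only checks reachability; it does not verify that any particular collector is producing data.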

Compiling from Source

Building:

git clone https://github.com/prometheus/node_exporter.git
cd node_exporter
make
./node_exporter <flags>

To see all available configuration flags:

./node_exporter -h

A First Look at Node Exporter Metrics

Visit http://localhost:9100/metrics to see all the monitoring data Node Exporter currently collects from the host.

Each metric is preceded by lines of the following form:

# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="idle"} 362812.7890625
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 3.0703125

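Here, HELP explains what a metric means and TYPE gives its data type. In the example above, the HELP text for node_cpu says it records the seconds each CPU spent in each mode; the sample shown is the cumulative time cpu0 has spent in idle mode. CPU time only ever increases, and accordingly the TYPE line declares node_cpu as a counter, which matches what the metric means. (Since Node Exporter 0.16.0 this metric is named node_cpu_seconds_total, which is what the 1.0.1 release installed above exposes.) Likewise, node_load1 reflects the host's load average over the last minute. Load rises and falls with resource usage, so node_load1 describes the current state and can go up or down; its TYPE is gauge, again consistent with what it measures.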
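The HELP/TYPE/sample structure is simple enough to read by hand. The following is a simplified sketch of a parser for it, assuming well-formed input with no escaped quotes or braces inside label values and no trailing timestamps; the name `parse_exposition` is hypothetical. For real use, the official Python client ships a full parser (`prometheus_client.parser.text_string_to_metric_families`).

```python
import re

# metric_name{label="value",...} value   (labels optional, no timestamp)
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_exposition(text: str):
    """Return (metadata, samples) parsed from Prometheus text exposition format."""
    meta = {}     # metric name -> {"help": ..., "type": ...}
    samples = []  # (name, labels dict, float value)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# HELP ") or line.startswith("# TYPE "):
            parts = line.split(" ", 3) + [""]  # pad in case HELP text is empty
            kind, name, rest = parts[1], parts[2], parts[3]
            meta.setdefault(name, {})[kind.lower()] = rest
            continue
        if line.startswith("#"):
            continue  # other comment lines are ignored
        m = _SAMPLE.match(line)
        if m:
            name, raw_labels, value = m.groups()
            labels = dict(_LABEL.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))  # float() also accepts NaN/+Inf
    return meta, samples
```

Fed the six lines shown above, this yields a counter `node_cpu` with labels `cpu` and `mode`, and a gauge `node_load1` with no labels.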

Beyond these, depending on the host's operating system, you may also see metrics such as:

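  • node_boot_time_seconds: system boot time
  • node_cpu_seconds_total: CPU usage (seconds per CPU and mode)
  • node_disk_*: disk I/O
  • node_filesystem_*: filesystem usage
  • node_load1: system load (1-minute load average)
  • node_memory_*: memory usage
  • node_network_*: network traffic
  • node_time_seconds: current system time
  • go_*: Go runtime metrics of the node exporter process
  • process_*: runtime metrics about the node exporter process itself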
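Several of these metrics are most useful in combination. For example, host uptime is the current time minus the boot time (both Unix timestamps in seconds), commonly written in PromQL as `node_time_seconds - node_boot_time_seconds`; and on Linux, the fraction of memory in use is often estimated as `1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes`. The same arithmetic as a small Python sketch; the function names are hypothetical and the inputs would be sample values read from /metrics.

```python
def uptime_seconds(node_time: float, node_boot_time: float) -> float:
    """Host uptime: current time minus boot time, both Unix timestamps in seconds."""
    return node_time - node_boot_time


def memory_used_ratio(mem_available_bytes: float, mem_total_bytes: float) -> float:
    """Fraction of memory in use, based on MemAvailable (Linux /proc/meminfo)."""
    return 1.0 - mem_available_bytes / mem_total_bytes


def format_duration(seconds: float) -> str:
    """Render a duration as days, hours and minutes (seconds are dropped)."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, _ = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m"
```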

Viewing Metrics in the Prometheus UI (Expression Browser)

Your locally running Prometheus instance needs to be properly configured in order to access Node Exporter metrics. The following scrape_config block (in a prometheus.yml configuration file) will tell the Prometheus instance to scrape from the Node Exporter via localhost:9100:

scrape_configs:
- job_name: 'node'
  static_configs:
  - targets: ['localhost:9100']
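Each entry under targets is a host:port pair. Prometheus combines it with the job's scheme (default http) and metrics_path (default /metrics) to form the URL it scrapes, and by default attaches the labels job="node" and instance="localhost:9100" to the scraped series. A small sketch of that mapping, with the YAML above mirrored as a plain Python dict (an assumption for illustration; params, relabeling, and auth settings are ignored, and the helper names are hypothetical):

```python
from urllib.parse import urlunsplit

# Mirrors the scrape_configs YAML above as plain Python data.
scrape_configs = [
    {"job_name": "node", "static_configs": [{"targets": ["localhost:9100"]}]},
]


def scrape_url(target: str, scheme: str = "http", metrics_path: str = "/metrics") -> str:
    """URL for one static target, using Prometheus's default scheme and path."""
    return urlunsplit((scheme, target, metrics_path, "", ""))


def scrape_targets(configs):
    """Yield (job, instance, url) for every static target of every job."""
    for job in configs:
        scheme = job.get("scheme", "http")
        path = job.get("metrics_path", "/metrics")
        for static in job.get("static_configs", []):
            for target in static.get("targets", []):
                yield job["job_name"], target, scrape_url(target, scheme, path)


if __name__ == "__main__":
    for job, instance, url in scrape_targets(scrape_configs):
        print(f'job="{job}" instance="{instance}" -> {url}')
```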

Metrics specific to the Node Exporter are prefixed with node_ and include metrics like node_cpu_seconds_total and node_exporter_build_info.

Some example expressions to try in the expression browser:

  • rate(node_cpu_seconds_total{mode="system"}[1m]): the average amount of CPU time spent in system mode, per second, over the last minute (in seconds)
  • node_filesystem_avail_bytes: the filesystem space available to non-root users (in bytes)
  • rate(node_network_receive_bytes_total[1m]): the average network traffic received, per second, over the last minute (in bytes)
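rate() turns an ever-increasing counter into a per-second average. As a simplified sketch of the idea: take the (timestamp, value) samples of one counter inside the range window, treat any drop in value as a counter reset, and divide the total increase by the time between the first and last sample. Prometheus's actual rate() additionally extrapolates towards the window boundaries, so its results will differ somewhat; the function name `simple_rate` is hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second increase of a counter.

    samples: list of (timestamp_seconds, value) pairs, oldest first.
    Handles counter resets but, unlike PromQL rate(), does not
    extrapolate to the edges of the range window.
    """
    if len(samples) < 2:
        return None  # rate() likewise needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter restarted from zero (e.g. a process restart).
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed
```

Applied to node_cpu_seconds_total{mode="system"} for one CPU, a result of 0.2 would mean that CPU spent roughly 20% of its time in system mode over the window.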

Viewing Metrics in Grafana

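Although Prometheus comes with its own web UI, it is usually paired with Grafana, a more specialized tool, for visualizing metrics. Grafana offers many dashboard templates that present the metrics in a friendlier way.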
