Monitoring Architecture - node_exporter System Monitoring Component
[NOTE] Updated January 20, 2020. This article may have outdated content or subject matter.
0x00 Origin
Notes I kept while learning node_exporter.
0x01 Collector
- View the current host's monitoring data at `http://IP:9100/metrics`
- node_exporter gathers system performance metrics through its Collector code modules
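As a rough illustration of the first point, the endpoint can also be read from a script. The sketch below assumes an exporter listening on the default port 9100; the helper names `metrics_url` and `fetch_metrics` and the `opener` parameter are made up for this example.

```python
import urllib.request


def metrics_url(host, port=9100):
    """Build the URL of a node_exporter metrics endpoint."""
    return f"http://{host}:{port}/metrics"


def fetch_metrics(host, port=9100, opener=urllib.request.urlopen):
    """Return the non-empty lines served by the exporter.

    `opener` is a hypothetical hook so the function can be tried
    without a running exporter.
    """
    with opener(metrics_url(host, port)) as response:
        body = response.read().decode("utf-8")
    return [line for line in body.split("\n") if line]
```

Calling `fetch_metrics("127.0.0.1")` against a running exporter should return the same lines that the browser shows at that URL.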
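The following is quoted from the Prometheus/node_exporter README. Collector support varies by operating system; the tables below list all existing collectors and the systems they support.
Pass `--collector.<name>` to enable the corresponding collector.
Pass `--no-collector.<name>` to disable a collector that is enabled by default.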
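The two flag forms can be assembled mechanically. Below is a minimal sketch, assuming the binary is invoked as `node_exporter`; the helper `collector_flags` is hypothetical and only builds the argument list, it does not start anything.

```python
def collector_flags(enable=(), disable=()):
    """Translate collector names into node_exporter command-line flags."""
    overlap = set(enable) & set(disable)
    if overlap:
        raise ValueError(f"collectors both enabled and disabled: {sorted(overlap)}")
    flags = [f"--collector.{name}" for name in enable]
    flags += [f"--no-collector.{name}" for name in disable]
    return flags


# Example: turn on systemd (off by default) and turn off wifi (on by default).
argv = ["node_exporter"] + collector_flags(enable=["systemd"], disable=["wifi"])
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run` or pasted into a service unit.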
Collectors enabled by default
Name | Description | OS |
---|---|---|
arp | Exposes ARP statistics from /proc/net/arp . | Linux |
bcache | Exposes bcache statistics from /sys/fs/bcache/ . | Linux |
bonding | Exposes the number of configured and active slaves of Linux bonding interfaces. | Linux |
conntrack | Shows conntrack statistics (does nothing if no /proc/sys/net/netfilter/ present). | Linux |
cpu | Exposes CPU statistics | Darwin, Dragonfly, FreeBSD, Linux |
diskstats | Exposes disk I/O statistics. | Darwin, Linux |
edac | Exposes error detection and correction statistics. | Linux |
entropy | Exposes available entropy. | Linux |
exec | Exposes execution statistics. | Dragonfly, FreeBSD |
filefd | Exposes file descriptor statistics from /proc/sys/fs/file-nr . | Linux |
filesystem | Exposes filesystem statistics, such as disk space used. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
hwmon | Expose hardware monitoring and sensor data from /sys/class/hwmon/ . | Linux |
infiniband | Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. | Linux |
ipvs | Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats . | Linux |
loadavg | Exposes load average. | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris |
mdadm | Exposes statistics about devices in /proc/mdstat (does nothing if no /proc/mdstat present). | Linux |
meminfo | Exposes memory statistics. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
netdev | Exposes network interface statistics such as bytes transferred. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
netstat | Exposes network statistics from /proc/net/netstat . This is the same information as netstat -s . | Linux |
nfs | Exposes NFS client statistics from /proc/net/rpc/nfs . This is the same information as nfsstat -c . | Linux |
nfsd | Exposes NFS kernel server statistics from /proc/net/rpc/nfsd . This is the same information as nfsstat -s . | Linux |
sockstat | Exposes various statistics from /proc/net/sockstat . | Linux |
stat | Exposes various statistics from /proc/stat . This includes boot time, forks and interrupts. | Linux |
textfile | Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. | any |
time | Exposes the current system time. | any |
timex | Exposes selected adjtimex(2) system call stats. | Linux |
uname | Exposes system information as provided by the uname system call. | Linux |
vmstat | Exposes statistics from /proc/vmstat . | Linux |
wifi | Exposes WiFi device and station statistics. | Linux |
xfs | Exposes XFS runtime statistics. | Linux (kernel 4.4+) |
zfs | Exposes ZFS performance statistics. | Linux |
Collectors disabled by default
Name | Description | OS |
---|---|---|
buddyinfo | Exposes statistics of memory fragments as reported by /proc/buddyinfo. | Linux |
devstat | Exposes device statistics | Dragonfly, FreeBSD |
drbd | Exposes Distributed Replicated Block Device statistics (to version 8.4) | Linux |
interrupts | Exposes detailed interrupts statistics. | Linux, OpenBSD |
ksmd | Exposes kernel and system statistics from /sys/kernel/mm/ksm . | Linux |
logind | Exposes session counts from logind. | Linux |
meminfo_numa | Exposes memory statistics from /proc/meminfo_numa . | Linux |
mountstats | Exposes filesystem statistics from /proc/self/mountstats . Exposes detailed NFS client statistics. | Linux |
ntp | Exposes local NTP daemon health to check time | any |
qdisc | Exposes queuing discipline statistics | Linux |
runit | Exposes service status from runit. | any |
supervisord | Exposes service status from supervisord. | any |
systemd | Exposes service and system status from systemd. | Linux |
tcpstat | Exposes TCP connection status information from /proc/net/tcp and /proc/net/tcp6 . (Warning: the current version has potential performance issues in high load situations.) | Linux |
Command-line flags
Disabling specific collector modules
- Modules are checked and registered through prometheus.NewRegistry
- The `--no-collector` flag keeps the specified module from starting
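The idea behind the two points above can be modelled in a few lines. This is a simplified Python model, not node_exporter's Go source: the `Registry` class is a toy stand-in for a metrics registry, and `DEFAULTS`, `resolve_enabled`, and `build_registry` are names invented for the sketch. Only a handful of collectors from the tables are included.

```python
# Default state per collector, taken from a few rows of the tables above.
DEFAULTS = {"cpu": True, "meminfo": True, "wifi": True, "systemd": False, "ntp": False}


def resolve_enabled(argv, defaults=DEFAULTS):
    """Apply --collector.<name> / --no-collector.<name> flags to the defaults."""
    state = dict(defaults)
    for arg in argv:
        if arg.startswith("--no-collector."):
            name, value = arg[len("--no-collector."):], False
        elif arg.startswith("--collector."):
            name, value = arg[len("--collector."):], True
        else:
            continue  # unrelated flag
        # Sub-options such as --collector.textfile.directory are not
        # collector names, so anything unknown is ignored in this sketch.
        if name in state:
            state[name] = value
    return state


class Registry:
    """Toy stand-in for a metrics registry: it only remembers names."""

    def __init__(self):
        self.collectors = []

    def register(self, name):
        self.collectors.append(name)


def build_registry(argv):
    """Register only the collectors that end up enabled."""
    registry = Registry()
    for name, enabled in sorted(resolve_enabled(argv).items()):
        if enabled:  # disabled collectors are never registered
            registry.register(name)
    return registry
```

With `["--no-collector.wifi", "--collector.systemd"]` the registry holds cpu, meminfo, and systemd; with no flags it holds the three defaults.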
Components started by default
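0x02 Collecting and Analyzing Host Metrics
- Parts of the following are quoted from GPE 监控预警系统-node_exporter (GPE Monitoring and Alerting System: node_exporter)
Text format
Before discussing exporters, it is worth introducing the Prometheus text data format, because an exporter essentially converts the data it collects into this text format and serves it over HTTP.
The text an exporter produces is line-oriented, with lines separated by \n. Empty lines are ignored, and the content must end with a newline.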
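Comments
A line that starts with # is normally a comment.
A line that starts with # HELP gives the help text for a metric.
A line that starts with # TYPE declares the metric type: counter, gauge, histogram, summary, or untyped.
Any other # line is a plain comment for human readers and is ignored by Prometheus.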
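These rules are easy to express in code. The following is a minimal sketch of a line classifier; the function name `classify_line` and the sample text are illustrative, not captured output from a real exporter.

```python
def classify_line(line):
    """Return 'blank', 'help', 'type', 'comment', or 'sample' for one line."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if not stripped.startswith("#"):
        return "sample"
    tokens = stripped[1:].split()
    if tokens and tokens[0] == "HELP":
        return "help"
    if tokens and tokens[0] == "TYPE":
        return "type"
    return "comment"


# Illustrative input only.
text = (
    "# HELP node_load1 1m load average.\n"
    "# TYPE node_load1 gauge\n"
    "# scraped for illustration\n"
    "\n"
    "node_load1 0.5\n"
)
kinds = [classify_line(line) for line in text.splitlines()]
print(kinds)
```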
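Samples
Any line that does not start with # is a sample. It usually follows directly after the TYPE line and consists of a metric name, an optional set of labels in braces, a value, and an optional timestamp.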
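To make the sample layout concrete, here is a small parsing sketch. It assumes the usual shape `metric_name{label="value",...} value [timestamp]`; the regular expressions and the function `parse_sample` are written for this example, escape sequences inside label values are left as-is, and the metric names in the usage note are illustrative.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>.*)\})?'                # optional {label="value",...}
    r'\s+(?P<value>\S+)'                      # sample value
    r'(?:\s+(?P<timestamp>-?\d+))?\s*$'       # optional timestamp
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (name, labels, value, timestamp)."""
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    timestamp = match.group("timestamp")
    return (
        match.group("name"),
        labels,
        float(match.group("value")),  # float() also accepts +Inf and NaN
        int(timestamp) if timestamp is not None else None,
    )
```

For example, `parse_sample('node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5')` yields the name, a two-entry label dict, the value 1234.5, and `None` for the missing timestamp.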
- Fetch the metrics with `curl http://127.0.0.1:9100/metrics`, which returns the current host's data in the text format described above.
- Note in particular: if a sample's metric is named x and x is a histogram or summary, the following conditions must hold:
  - The sum of the samples is exposed as x_sum
  - The number of samples is exposed as x_count
  - The quantiles of a summary are exposed as x{quantile="y"}
  - The bucket counts of a histogram are exposed as x_bucket{le="y"}
  - A histogram must include x_bucket{le="+Inf"}, whose value equals x_count
  - The quantile values of a summary and the le values of a histogram must appear in increasing order
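The histogram conditions can be checked programmatically. The sketch below is one possible validator, assuming samples are already available as `(metric_name, labels, value)` tuples; the function `check_histogram` and the data for metric `x` are hypothetical.

```python
def check_histogram(name, samples):
    """Validate (metric_name, labels, value) samples for histogram `name`.

    Returns a list of problems; an empty list means the rules hold.
    """
    buckets = []
    total = count = None
    for metric, labels, value in samples:
        if metric == f"{name}_bucket":
            buckets.append((float(labels["le"]), value))
        elif metric == f"{name}_sum":
            total = value
        elif metric == f"{name}_count":
            count = value

    problems = []
    if total is None:
        problems.append(f"missing {name}_sum")
    if count is None:
        problems.append(f"missing {name}_count")
    bounds = [le for le, _ in buckets]
    if bounds != sorted(bounds):
        problems.append("le values are not in increasing order")
    inf_values = [v for le, v in buckets if le == float("inf")]
    if not inf_values:
        problems.append('missing le="+Inf" bucket')
    elif count is not None and inf_values[0] != count:
        problems.append("+Inf bucket differs from _count")
    return problems


# Made-up samples for a histogram named x.
good = [
    ("x_bucket", {"le": "0.1"}, 2),
    ("x_bucket", {"le": "1"}, 5),
    ("x_bucket", {"le": "+Inf"}, 6),
    ("x_sum", {}, 3.7),
    ("x_count", {}, 6),
]
print(check_histogram("x", good))
```

A summary could be checked the same way by comparing `quantile` labels instead of `le` and dropping the +Inf rule.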