Monitoring Architecture - node_exporter System Monitoring Component
[NOTE] Updated January 20, 2020. This article may have outdated content or subject matter.
0x00 Origin
Notes saved while studying node_exporter.
0x01 Collector
- Open http://IP:9100/metrics to view the monitoring data of the current host.
- node_exporter gathers system performance metrics through its Collector modules.
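As a rough illustration of reading that endpoint from a script, here is a minimal Python sketch. The helper names metrics_url and fetch_metrics are made up for this example; only port 9100 and the /metrics path come from the text above.

```python
from urllib.request import urlopen


def metrics_url(host: str, port: int = 9100, path: str = "/metrics") -> str:
    """Build the scrape URL of a node_exporter instance (hypothetical helper)."""
    return f"http://{host}:{port}{path}"


def fetch_metrics(host: str, timeout: float = 5.0) -> str:
    """Download the metrics page and return it as text.

    Assumes a node_exporter is reachable on the given host.
    """
    with urlopen(metrics_url(host), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_metrics("127.0.0.1") on a machine that runs node_exporter should return the whole page as one string, which can then be split into lines.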
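The following is quoted from the prometheus/node_exporter README. Collector support differs from one operating system to another. The tables below list all existing collectors and the systems they support.
Enable a collector with the --collector.<name> flag.
Disable a collector that is enabled by default with the --no-collector.<name> flag.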
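To make the flag pattern concrete, the sketch below assembles an argument list from two lists of collector names. The function collector_flags is a hypothetical helper written for this post, not part of node_exporter; the collector names are taken from the tables in this section.

```python
def collector_flags(enable=(), disable=()):
    """Return --collector.<name> / --no-collector.<name> flags (hypothetical helper)."""
    overlap = set(enable) & set(disable)
    if overlap:
        raise ValueError(f"collectors both enabled and disabled: {sorted(overlap)}")
    flags = [f"--collector.{name}" for name in enable]
    flags += [f"--no-collector.{name}" for name in disable]
    return flags


# Turn on two default-off collectors and turn off one default-on collector.
args = ["node_exporter"] + collector_flags(
    enable=["systemd", "tcpstat"], disable=["wifi"]
)
```

The resulting list can be handed to subprocess.run or joined into a shell command; whether a given collector exists depends on the node_exporter version and the operating system.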
Collectors enabled by default
| Name | Description | OS |
|---|---|---|
| arp | Exposes ARP statistics from /proc/net/arp. | Linux |
| bcache | Exposes bcache statistics from /sys/fs/bcache/. | Linux |
| bonding | Exposes the number of configured and active slaves of Linux bonding interfaces. | Linux |
| conntrack | Shows conntrack statistics (does nothing if no /proc/sys/net/netfilter/ present). | Linux |
| cpu | Exposes CPU statistics | Darwin, Dragonfly, FreeBSD, Linux |
| diskstats | Exposes disk I/O statistics. | Darwin, Linux |
| edac | Exposes error detection and correction statistics. | Linux |
| entropy | Exposes available entropy. | Linux |
| exec | Exposes execution statistics. | Dragonfly, FreeBSD |
| filefd | Exposes file descriptor statistics from /proc/sys/fs/file-nr. | Linux |
| filesystem | Exposes filesystem statistics, such as disk space used. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| hwmon | Expose hardware monitoring and sensor data from /sys/class/hwmon/. | Linux |
| infiniband | Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. | Linux |
| ipvs | Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats. | Linux |
| loadavg | Exposes load average. | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris |
| mdadm | Exposes statistics about devices in /proc/mdstat (does nothing if no /proc/mdstat present). | Linux |
| meminfo | Exposes memory statistics. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| netdev | Exposes network interface statistics such as bytes transferred. | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD |
| netstat | Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. | Linux |
| nfs | Exposes NFS client statistics from /proc/net/rpc/nfs. This is the same information as nfsstat -c. | Linux |
| nfsd | Exposes NFS kernel server statistics from /proc/net/rpc/nfsd. This is the same information as nfsstat -s. | Linux |
| sockstat | Exposes various statistics from /proc/net/sockstat. | Linux |
| stat | Exposes various statistics from /proc/stat. This includes boot time, forks and interrupts. | Linux |
| textfile | Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. | any |
| time | Exposes the current system time. | any |
| timex | Exposes selected adjtimex(2) system call stats. | Linux |
| uname | Exposes system information as provided by the uname system call. | Linux |
| vmstat | Exposes statistics from /proc/vmstat. | Linux |
| wifi | Exposes WiFi device and station statistics. | Linux |
| xfs | Exposes XFS runtime statistics. | Linux (kernel 4.4+) |
| zfs | Exposes ZFS performance statistics. | Linux |
Collectors disabled by default
| Name | Description | OS |
|---|---|---|
| buddyinfo | Exposes statistics of memory fragments as reported by /proc/buddyinfo. | Linux |
| devstat | Exposes device statistics | Dragonfly, FreeBSD |
| drbd | Exposes Distributed Replicated Block Device statistics (to version 8.4) | Linux |
| interrupts | Exposes detailed interrupts statistics. | Linux, OpenBSD |
| ksmd | Exposes kernel and system statistics from /sys/kernel/mm/ksm. | Linux |
| logind | Exposes session counts from logind. | Linux |
| meminfo_numa | Exposes memory statistics from /proc/meminfo_numa. | Linux |
| mountstats | Exposes filesystem statistics from /proc/self/mountstats. Exposes detailed NFS client statistics. | Linux |
| ntp | Exposes local NTP daemon health to check time | any |
| qdisc | Exposes queuing discipline statistics | Linux |
| runit | Exposes service status from runit. | any |
| supervisord | Exposes service status from supervisord. | any |
| systemd | Exposes service and system status from systemd. | Linux |
| tcpstat | Exposes TCP connection status information from /proc/net/tcp and /proc/net/tcp6. (Warning: the current version has potential performance issues in high load situations.) | Linux |
Command-line flags
Disabling specific collector modules
- prometheus.NewRegistry is used when checking and registering the relevant modules.
- Pass the --no-collector.<name> flag.
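The idea of "defaults plus overrides" can be sketched as a toy model in Python. This is not node_exporter's Go code: DEFAULTS and active_collectors are names invented here, and the default states are copied from a few rows of the tables above.

```python
# Default state for a handful of collectors, copied from the tables above.
DEFAULTS = {
    "cpu": True,
    "meminfo": True,
    "wifi": True,
    "systemd": False,
    "tcpstat": False,
}


def active_collectors(argv, defaults=DEFAULTS):
    """Apply --collector.<name> / --no-collector.<name> overrides to the defaults."""
    state = dict(defaults)
    for arg in argv:
        if arg.startswith("--no-collector."):
            name, enabled = arg[len("--no-collector."):], False
        elif arg.startswith("--collector."):
            name, enabled = arg[len("--collector."):], True
        else:
            continue
        if name in state:  # skip anything that is not a known collector name
            state[name] = enabled
    return sorted(name for name, enabled in state.items() if enabled)
```

Only the collectors left in the returned list would be registered and exposed on /metrics in this model.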
Starting with specific modules disabled
Components started by default
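0x02 Collecting and analyzing host metrics
- Parts of the following are quoted from the article "GPE 监控预警系统-node_exporter" (GPE Monitoring and Alerting System: node_exporter).
Text format
Before discussing exporters, it is worth introducing the Prometheus text format, because an exporter essentially converts the data it collects into this text format and serves it over HTTP.
The text an exporter produces is line-oriented, with lines separated by a line feed (\n). Empty lines are ignored, and the last line must end with a line feed.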
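Comments
A line that starts with # is normally a comment.
A line that starts with # HELP carries the help text of a metric.
A line that starts with # TYPE declares the type of a metric: counter, gauge, histogram, summary, or untyped.
Any other # line is an ordinary comment for human readers and is ignored by Prometheus.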
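The rules above can be turned into a small line classifier. The following Python sketch is only an illustration: classify_line is a name chosen here, and the sample text is made-up input rather than real exporter output.

```python
def classify_line(line: str) -> str:
    """Return 'blank', 'help', 'type', 'comment', or 'sample' for one line."""
    stripped = line.strip()
    if not stripped:
        return "blank"          # empty lines are ignored by Prometheus
    if not stripped.startswith("#"):
        return "sample"         # anything else is sample data
    tokens = stripped[1:].split(None, 1)
    if tokens and tokens[0] == "HELP":
        return "help"
    if tokens and tokens[0] == "TYPE":
        return "type"
    return "comment"


# Illustrative input only, not captured from a real host.
text = (
    "# HELP node_load1 1m load average.\n"
    "# TYPE node_load1 gauge\n"
    "\n"
    "# just a note for humans\n"
    "node_load1 0.21\n"
)
kinds = [classify_line(line) for line in text.split("\n")]
```

Because the text ends with a line feed, splitting on "\n" yields a final empty string, which the classifier reports as blank.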
Samples
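A line that does not start with # is a sample. It usually comes right after the type definition line and consists of a metric name, an optional label set in braces, a value, and an optional timestamp, for example metric_name{label_name="label_value"} value timestamp.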
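To show how a sample line breaks down into metric name, labels, value, and optional timestamp, here is a simplified Python parser. It is a sketch under assumptions: parse_sample and both regular expressions are written for this post, escaped characters inside label values are matched but not unescaped, and the metric lines in the usage note are illustrative.

```python
import re

SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>.*)\})?'                # optional {label="value",...}
    r'\s+(?P<value>\S+)'                      # sample value
    r'(?:\s+(?P<timestamp>-?\d+))?\s*$'       # optional integer timestamp
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line: str):
    """Split one sample line into (name, labels, value, timestamp)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    timestamp = match.group("timestamp")
    return (
        match.group("name"),
        labels,
        float(match.group("value")),          # float() also accepts +Inf and NaN
        int(timestamp) if timestamp is not None else None,
    )
```

For instance, parse_sample('node_cpu_seconds_total{cpu="0",mode="idle"} 1234.5') yields the name, a two-entry label dictionary, the float value, and None for the missing timestamp.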
- Use curl http://127.0.0.1:9100/metrics to retrieve the current metrics.
- Note that if a metric named x is a histogram or a summary, it must satisfy the following conditions:
  - The sum of the observed values is exposed as x_sum.
  - The number of observations is exposed as x_count.
  - Each quantile of a summary is exposed as x{quantile="y"}.
  - Each bucket of a histogram is exposed as x_bucket{le="y"}.
  - A histogram must include x_bucket{le="+Inf"}, and its value must equal the value of x_count.
  - In both summaries and histograms, the quantile and le values must appear in ascending order.
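The histogram conditions lend themselves to a quick consistency check. The Python sketch below assumes the buckets have already been parsed into (le, cumulative count) pairs; check_histogram and the example numbers are invented for illustration.

```python
import math


def check_histogram(buckets, count):
    """Check x_bucket samples against x_count.

    buckets: list of (le, cumulative_count) pairs in the order they were
             exposed, with le as a float and +Inf given as math.inf.
    count:   the value of x_count.
    Returns a list of problems; an empty list means all checks passed.
    """
    problems = []
    bounds = [le for le, _ in buckets]
    if bounds != sorted(bounds):
        problems.append("le values are not in increasing order")
    inf_counts = [c for le, c in buckets if le == math.inf]
    if not inf_counts:
        problems.append('missing x_bucket{le="+Inf"}')
    elif inf_counts[0] != count:
        problems.append('x_bucket{le="+Inf"} does not equal x_count')
    return problems


# Made-up bucket data for a metric x with 7 observations.
good = [(0.1, 2), (0.5, 5), (math.inf, 7)]
```

Running check_histogram(good, 7) returns an empty list, while dropping the +Inf bucket or passing a different count produces a message for the violated rule.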