Monitoring Architecture - Blackbox_exporter: Active Probing of Port and Service Status
[NOTE] Updated January 20, 2020. This article may have outdated content or subject matter.
0x00
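The goal is to find a service that can check whether a component's port is reachable and send that data to Prometheus. blackbox_exporter is a member of the Prometheus family; it actively probes targets according to its configuration file.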
0x01 Deploy
blackbox_exporter is one of the official Prometheus exporters. It collects monitoring data by probing over HTTP, DNS, TCP, and ICMP.
Binary
- Release Download
./bin/blackbox_exporter --web.listen-address=:9115 --log.level=info --config.file=conf/blackbox.yml
Docker
docker pull prom/blackbox-exporter
docker run -d -p 9115:9115 --name blackbox_exporter -v `pwd`:/config prom/blackbox-exporter --config.file=/config/blackbox.yml
blackbox.yml template
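- Define the details of each module in blackbox.yml
- Reference the module in the Prometheus configuration file and list the target hosts to be monitored
blackbox.yml configuration file
Unless you have special requirements, the default configuration file is sufficient. blackbox-good.yml is the official example configuration file.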
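As a rough orientation, a reduced blackbox.yml in the style of the default file might look like the sketch below. It keeps only the three modules used later in this article; the `timeout` values are assumptions, not taken from the original post.

```yaml
# Minimal blackbox.yml sketch: one module per prober used in this article.
modules:
  http_2xx:          # HTTP probe; by default any 2xx status counts as success
    prober: http
    timeout: 5s      # assumed value
  tcp_connect:       # succeeds when a TCP connection can be opened
    prober: tcp
    timeout: 5s      # assumed value
  icmp:              # ping probe; usually needs raw-socket privileges
    prober: icmp
    timeout: 5s      # assumed value
```

The module names (`http_2xx`, `tcp_connect`, `icmp`) are what Prometheus passes in the `module` parameter of a probe request.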
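0x02 Functional Testing
- HTTP tests
  - Define request header information
  - Check the HTTP status / HTTP response headers / HTTP body content
- TCP tests
  - Monitor the port status of business components
  - Define and check application-layer protocols
- ICMP tests
  - Host liveness detection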
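The capabilities listed above map onto module options in blackbox.yml. The sketch below is illustrative only: the module names `http_custom` and `tcp_banner`, the header value, and the regular expressions are hypothetical, and the field names follow the current upstream configuration reference, so older releases may use different names.

```yaml
modules:
  http_custom:                       # hypothetical module name
    prober: http
    http:
      method: GET
      headers:                       # custom request headers
        Host: example.com            # hypothetical value
      valid_status_codes: [200]      # check the HTTP status
      fail_if_header_not_matches:    # check a response header
        - header: Content-Type
          regexp: "text/html"
      fail_if_body_not_matches_regexp:   # check the response body
        - "ok"
  tcp_banner:                        # hypothetical module name
    prober: tcp
    tcp:
      query_response:                # application-layer check after connecting
        - expect: "^SSH-2.0-"
  icmp:                              # host liveness via ping
    prober: icmp
```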
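Ping test
- Read icmp.go to understand how the ICMP module works
- This section draws on the article "用 Prometheus 进行网络质量 Ping 监控" (network-quality ping monitoring with Prometheus)
- Add the relevant code block to the Prometheus configuration file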
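A minimal sketch of such a scrape job, assuming the `icmp` module is defined in blackbox.yml and that blackbox_exporter listens on 172.16.10.65:9115 (the address used in the probe URL later in this article). The job name and the target address are hypothetical.

```yaml
scrape_configs:                      # merge into the existing scrape_configs list
  - job_name: "blackbox_icmp"        # hypothetical job name
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets:
          - 192.0.2.10               # hypothetical host to ping
    relabel_configs:
      # Pass the original target to the exporter as ?target=...
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed host as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Scrape blackbox_exporter itself instead of the target
      - target_label: __address__
        replacement: 172.16.10.65:9115
```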
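TCP test
- Read tcp_test.go to understand how the TCP module works
- Probe the service ports of TiDB, TiKV, and PD to determine whether each service is online
- Add the relevant code block to the Prometheus configuration file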
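A sketch of a TCP scrape job for these components, assuming the `tcp_connect` module and the default ports (TiDB 4000, TiKV 20160, PD 2379). The host addresses and the job name are hypothetical; replace them with your own.

```yaml
scrape_configs:                      # merge into the existing scrape_configs list
  - job_name: "blackbox_tcp"         # hypothetical job name
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 192.0.2.11:4000          # TiDB (hypothetical host, default port)
          - 192.0.2.12:20160         # TiKV (hypothetical host, default port)
          - 192.0.2.13:2379          # PD   (hypothetical host, default port)
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.16.10.65:9115   # blackbox_exporter address
```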
HTTP test
- http_test.go
- Add the relevant code block to the Prometheus configuration file
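A sketch of an HTTP scrape job, assuming the `http_2xx` module. The target `prometheus.io` is the one probed in the debug example in this article; the job name is hypothetical.

```yaml
scrape_configs:                      # merge into the existing scrape_configs list
  - job_name: "blackbox_http"        # hypothetical job name
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - http://prometheus.io     # URL to probe
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.16.10.65:9115   # blackbox_exporter address
```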
0x03 Alert Testing
To observe a probe step by step, request
http://172.16.10.65:9115/probe?target=prometheus.io&module=http_2xx&debug=true
The debug output looks like this:
Logs for the probe:
ts=2018-04-13T03:47:36.354717876Z caller=main.go:116 module=http_2xx target=prometheus.io level=info msg="Beginning probe" probe=http timeout_seconds=9.5
ts=2018-04-13T03:47:36.354917698Z caller=utils.go:42 module=http_2xx target=prometheus.io level=info msg="Resolving target address" preferred_ip_protocol=ip6
ts=2018-04-13T03:47:36.403356427Z caller=utils.go:65 module=http_2xx target=prometheus.io level=info msg="Resolved target address" ip=2400:cb00:2048:1::6818:783c
ts=2018-04-13T03:47:36.40345284Z caller=http.go:282 module=http_2xx target=prometheus.io level=info msg="Making HTTP request" url=http://[2400:cb00:2048:1::6818:783c] host=prometheus.io
ts=2018-04-13T03:47:36.414825723Z caller=http.go:297 module=http_2xx target=prometheus.io level=error msg="Error for HTTP request" err="Get http://[2400:cb00:2048:1::6818:783c]: dial tcp [2400:cb00:2048:1::6818:783c]:80: connect: network is unreachable"
ts=2018-04-13T03:47:36.426234543Z caller=http.go:354 module=http_2xx target=prometheus.io level=info msg="Response timings for roundtrip" roundtrip=0 start=2018-04-13T11:47:36.414648186+08:00 dnsDone=2018-04-13T11:47:36.414648186+08:00 connectDone=2018-04-13T11:47:36.414794718+08:00 gotConn=0001-01-01T00:00:00Z responseStart=0001-01-01T00:00:00Z end=2018-04-13T11:47:36.414814678+08:00
ts=2018-04-13T03:47:36.426339526Z caller=main.go:129 module=http_2xx target=prometheus.io level=error msg="Probe failed" duration_seconds=0.071537196

Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.048474718
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.071537196
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length 0
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0
probe_http_duration_seconds{phase="processing"} 0
probe_http_duration_seconds{phase="resolve"} 0.048474718
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 0
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 0
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 0
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 0
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 6
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0

Module configuration:
prober: http
http:
    method: GET
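In this debug output the target resolved to an IPv6 address (`preferred_ip_protocol=ip6`) and the request failed with "network is unreachable", so `probe_success` is 0. On a host without IPv6 connectivity, one possible remedy is a module that prefers IPv4. A minimal sketch, where the module name `http_2xx_ipv4` is hypothetical:

```yaml
modules:
  http_2xx_ipv4:                     # hypothetical module name
    prober: http
    http:
      method: GET
      preferred_ip_protocol: "ip4"   # resolve and connect over IPv4 first
```

After reloading blackbox_exporter, the same probe URL can be requested with `module=http_2xx_ipv4` to compare the result.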
Alert rules
Alerting is driven by the value of the probe_success metric:
- Success == 1
- Failure == 0
Create a port.rules.yml file and write the alert rules into it.
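A minimal sketch of what port.rules.yml could contain, written in the Prometheus 2.x rule format. The group name, alert name, `for` duration, and severity label are assumptions chosen for illustration.

```yaml
groups:
  - name: port.rules                 # assumed group name
    rules:
      - alert: ProbeFailed           # assumed alert name
        expr: probe_success == 0     # 0 means the probe failed
        for: 1m                      # assumed: must fail for 1 minute before firing
        labels:
          severity: critical         # assumed label
        annotations:
          summary: "Probe failed for {{ $labels.instance }}"
          description: "Job {{ $labels.job }} reports probe_success == 0."
```

The `for` clause avoids alerting on a single failed probe; shorten or drop it if you need immediate notification.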
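Reporting to Prometheus
- Add port.rules.yml under rule_files in the Prometheus configuration file, then restart Prometheus for the change to take effect
- If Prometheus is already connected to Alertmanager, an alert fires once Prometheus detects that a port is down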
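The corresponding part of the Prometheus configuration might look like the sketch below. The relative rule file path and the Alertmanager address `localhost:9093` are assumptions; adjust them to your deployment.

```yaml
rule_files:
  - "port.rules.yml"                 # path relative to the Prometheus config file (assumed)

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093"       # hypothetical Alertmanager address
```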