[NOTE] Updated July 3, 2020. This article may have outdated content or subject matter.
0x00 陷入江局
回顾下前两篇文档 《纸上谈兵 - TiDB 性能调优》& 《沙盘点将 - TiDB 监控知识初始化》;处理问题的时候从总是一脸懵逼到偶尔无证懵逼;然后尝试给某个组件加个「无证之罪」先绕过这个棘手的问题。
![陷入江局](./jiangju.png)
程序在干什么?观察 tidb 监控和 profile 获取 SQL 运行在 TiDB Cluster 的情况
系统在干什么?使用 linux 命令和 node export 查看运行负载
程序和系统一起在干什么?比如 disk util 高的原因?CPU / MEM 占用高的原因?
重点回顾;《一条 SQL 的使命》
- TiDB
- QPS / duration / statement / slow query
- parser / compile / PD tso query
- kv count / kv error / kv duration
- TiKV
- GRPC「TiKV 进出口港」 / thread cpu / cluster / error
- coprocessor / schedule prewrite & commit
- coprocessor 与 store readpool 是两种查询方式,store readpool 主要应用不回表主键点查
- task / Raft / RocksDB
- overview
- CPU / Memory / Disk performance / OPEN FD / Network IO
- SQL 在 TiDB cluster 内运行的过程均有打点记录,资源/耗时 可在 Grafana 上分析后查证
- SQL 进入 TiDB 之前、TiKV 与磁盘的交互;该怎么和 SQL 做一一对应呢?
0x01 Slow Query
slow query 文件支持 pt-query-digest 工具分析
PingCAP 官网-slow query 慢日志;TiDB 会将执行时间超过 slow-threshold(默认值为 300 毫秒)的语句输出到 slow-query-file(默认值:“tidb-slow.log”)日志文件中,用于帮助用户定位慢查询语句,分析和解决 SQL 执行的性能问题
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
| # Time: 2020-07-02T16:45:48.948834124+08:00 // 执行 query 时间
# Txn_start_ts: 417773531665268886 // 事物 tso ,类似 mysql gtid
# User: dms@172.16.4.51 // 执行用户与客户端 IP「前置为 Proxy 时,此处显示 Proxy IP」
# Conn_ID: 327382 // 链接 ID
# Query_time: 32.452025271 // query 从 begin 至 commit 总时长,begin 开启后发送 select 语句后,空置一段时间后提交也会被计入
# Parse_time: 0.056166789 // TiDB-server AST 解析 text to sql 耗时
# Compile_time: 0.000314969 // TiDB-server Plan 编译耗时
# Process_time: 0.001 Wait_time: 0.001 Request_count: 2 Prewrite_time: 0.003820812 Commit_time: 0.003007832 Get_commit_ts_time: 0.000107308 Write_keys: 1 Write_size: 539 Prewrite_region: 1
# DB: dms // 数据库信息
# Index_names: [t_data_0210:index_id] // index 使用信息
# Is_internal: false
# Digest: 26becfd8135c8f31011bdb317406018a8de4473c26af18180ef97a91c224a77e
# Stats: t_data_0210:pseudo // # Stats: t_data_0210:417773784057512074
# Num_cop_tasks: 2
# Cop_proc_avg: 0.0005 Cop_proc_p90: 0.001 Cop_proc_max: 0.001 Cop_proc_addr: 172.16.4.51:20160 // store readpool 不显示该部分信息
# Cop_wait_avg: 0.0005 Cop_wait_p90: 0.001 Cop_wait_max: 0.001 Cop_wait_addr: 172.16.4.51:20160
# Mem_max: 27644
# Prepared: false
# Has_more_results: false
# Succ: true
# Plan: tidb_decode_plan('2gHwaTAJMzFfNgkwCTEJCjEJMTNfNAkxCTEJdGFibGU6VF9EQVRBXzAyMTkxMDE3MDc0NDk4NjAsIGluZGV4OmRldmljZV9pZCwgcmFuZ2U6WyIyMDAxMTgwMDc2NDE4MSIsIjIwMDExODAwNzYFEYBdLCBrZWVwIG9yZGVyOmZhbHNlLCBzdGF0czpwc2V1ZG8BhQgwXzWOhQB6SQA=')
UPDATE t_data_0210 SET `pos`='114' WHERE (`id`='800764181');
|
exPlan
1
2
3
4
5
6
7
8
9
| (root@127.0.0.1) [(none)]>select tidb_decode_plan('2gHwaTAJMzFfNgkwCTEJCjEJMTNfNAkxCTEJdGFibGU6VF9EQVRBXzAyMTkxMDE3MDc0NDk4NjAsIGluZGV4OmRldmljZV9pZCwgcmFuZ2U6WyIyMDAxMTgwMDc2NDE4MSIsIjIwMDExODAwNzYFEYBdLCBrZWVwIG9yZGVyOmZhbHNlLCBzdGF0czpwc2V1ZG8BhQgwXzWOhQB6SQA=');
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tidb_decode_plan('2gHwaTAJMzFfNgkwCTEJCjEJMTNfNAkxCTEJdGFibGU6VF9EQVRBXzAyMTkxMDE3MDc0NDk4NjAsIGluZGV4OmRldmljZV9pZCwgcmFuZ2U6WyIyMDAxMTgwMDc2NDE4MSIsIjIwMDExODAwNzYFEYBdLCBrZWVwIG9yZGVyOmZhbHNlLCBzdGF0czpwc2V1ZG8BhQgwXzWOhQB6SQA=') |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| TableReader_6 root 1
├─IndexScan_4 cop 1 table:t_data_0210, index:id, range:["800764181","800764181"], keep order:false, stats:pseudo
└─TableScan_5 cop 1 table:t_data_0210, keep order:false, stats:pseudo |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
|
Analyze
PingCAP 官网-Analyze;EXPLAIN ANALYZE 语句的工作方式类似于 EXPLAIN,主要区别在于前者实际上会执行语句。这样可以将查询计划中的估计值与执行时所遇到的实际值进行比较。如果估计值与实际值显著不同,那么应考虑在受影响的表上运行 ANALYZE TABLE。
explain analyze 是将 SQL 语句在数据库中执行了一遍,然后通过打点收集到的数据;如果是 insert / update /delete 会被实际运行并且操作数据
1
2
3
4
5
6
7
8
9
10
| explain analyze select id from T_DATA_0210 WHERE (`id`='800764181');
+---------------------+-------+------+-------------------------------------------------------------------------------------------------------------+----------------------------------+---------+
| id | count | task | operator info | execution info | memory |
+---------------------+-------+------+-------------------------------------------------------------------------------------------------------------+----------------------------------+---------+
| Projection_4 | 2.00 | root | paas_dms.t_data_0210.id | time:1.195879ms, loops:2, rows:1 | N/A |
| └─IndexLookUp_7 | 2.00 | root | | time:1.189899ms, loops:2, rows:1 | 1.25 KB |
| ├─IndexScan_5 | 2.00 | cop | table:t_data_0210, index:id, range:["800764181","800764181"], keep order:false | time:0s, loops:1, rows:1 | N/A |
| └─TableScan_6 | 2.00 | cop | table:t_data_0210, keep order:false | time:0s, loops:1, rows:1 | N/A |
+---------------------+-------+------+-------------------------------------------------------------------------------------------------------------+----------------------------------+---------+
4 rows in set (0.00 sec)
|
1
2
3
4
5
6
7
8
9
10
11
12
| // explain analyze select count(*)/count(distinct date_format(account_mdate,'%Y-%m-%d')) from account;
// 看起来只做了一次 table full scan、同时 distinct 与 date_format 无法下推到 tikv 上执行
+-----------------------------+--------------+-----------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-----------------------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+-----------------------------+--------------+-----------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-----------------------+------+
| Projection_4 | 1.00 | 1 | root | | time:18.899834996s, loops:2, Concurrency:OFF | div(cast(Column#19, decimal(20,0) BINARY), cast(Column#20, decimal(20,0) BINARY))->Column#21 | 744 Bytes | N/A |
| └─StreamAgg_6 | 1.00 | 1 | root | | time:18.899801597s, loops:2 | funcs:count(1)->Column#19, funcs:count(distinct Column#23)->Column#20 | 17.5087890625 KB | N/A |
| └─Projection_11 | 100036000.00 | 100036000 | root | | time:670.978479ms, loops:98163, Concurrency:8 | date_format(findpt.account.account_mdate, %Y-%m-%d)->Column#23 | 208.76953125 KB | N/A |
| └─TableReader_10 | 100036000.00 | 100036000 | root | | time:528.386518ms, loops:98163, rpc num: 560, rpc max:684.898903ms, min:70.020085ms, avg:427.347923ms, p80:474.446604ms, p95:546.15945ms, proc keys max:255761, p95:178374 | data:TableFullScan_9 | 113.44037055969238 MB | N/A |
| └─TableFullScan_9 | 100036000.00 | 100036000 | cop[tikv] | table:account | proc max:659ms, min:65ms, p80:459ms, p95:529ms, iters:100403, tasks:560 | keep order:false | N/A | N/A |
+-----------------------------+--------------+-----------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+-----------------------+------+
5 rows in set (18.98 sec)
|
Trace
PingCAP 官网-Trace;TRACE 语句用于提供查询执行的详细信息,可通过 TiDB 服务器状态端口所公开的图形界面进行查看。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
| trace format = 'row' select id from T_DATA_0210 WHERE (`id`='800764181');
+---------------------------+-----------------+------------+
| operation | startTS | duration |
+---------------------------+-----------------+------------+
| session.getTxnFuture | 17:10:42.721708 | 10.991µs |
| ├─session.Execute | 17:10:42.721705 | 978.341µs |
| ├─session.ParseSQL | 17:10:42.722064 | 28.325µs |
| ├─executor.Compile | 17:10:42.722106 | 191.556µs |
| ├─session.runStmt | 17:10:42.722318 | 314.567µs |
| ├─session.CommitTxn | 17:10:42.722616 | 2.942µs |
| ├─recordSet.Next | 17:10:42.722694 | 812.092µs |
| ├─projection.Next | 17:10:42.722696 | 801.486µs |
| ├─tableReader.Next | 17:10:42.723073 | 394.131µs |
| ├─tableReader.Next | 17:10:42.723484 | 947ns |
| ├─recordSet.Next | 17:10:42.723516 | 12.577µs |
| └─projection.Next | 17:10:42.723518 | 1.747µs |
+---------------------------+-----------------+------------+
12 rows in set (0.01 sec)
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
| // 以下信息省略 19 万行,通过 trace 清晰的发现 tidb 与 tikv 之间的网络交互频率
trace format = 'row' select count(*)/count(distinct date_format(account_mdate,'%Y-%m-%d')) from account;
+------------------------------------------------------+-----------------+-----------------+
| operation | startTS | duration |
+------------------------------------------------------+-----------------+-----------------+
| trace | 11:37:08.052125 | 3m58.620078133s |
| ├─session.Execute | 11:37:08.052129 | 1.714218ms |
| │ ├─session.ParseSQL | 11:37:08.052134 | 21.379µs |
| │ ├─executor.Compile | 11:37:08.052177 | 270.936µs |
| │ │ └─session.getTxnFuture | 11:37:08.052184 | 2.886µs |
| │ └─session.runStmt | 11:37:08.052468 | 1.353082ms |
| │ ├─TableReaderExecutor.Open | 11:37:08.052518 | 1.215215ms |
| │ │ └─distsql.Select | 11:37:08.052529 | 1.193386ms |
| │ │ ├─rpcClient.SendRequest | 11:37:08.053832 | 404.531258ms |
| │ │ ├─rpcClient.SendRequest | 11:39:53.082502 | 269.187122ms |
| │ │ ├─rpcClient.SendRequest | 11:39:53.785311 | 240.928959ms |
| │ │ └─rpcClient.SendRequest * N | 11:39:54.539536 | 331.98845ms |
| │ └─session.CommitTxn | 11:37:08.053803 | 13.513µs |
| │ └─session.doCommitWitRetry | 11:37:08.053806 | 2.235µs |
| ├─*executor.ProjectionExec.Next | 11:37:08.053865 | 3m58.617880366s |
| │ └─*executor.StreamAggExec.Next | 11:37:08.053867 | 3m58.614767904s |
| │ ├─*executor.ProjectionExec.Next | 11:37:08.053869 | 26.269835ms |
| │ │ ├─*executor.TableReaderExecutor.Next | 11:37:08.053915 | 25.530341ms |
| │ │ ├─*executor.TableReaderExecutor.Next | 11:37:08.079471 | 8.329µs |
| │ │ ├─*executor.TableReaderExecutor.Next | 11:37:08.079491 | 3.369µs |
| │ │ ├─*executor.TableReaderExecutor.Next | 11:41:06.572221 | 14.026µs |
| │ │ ├─*executor.TableReaderExecutor.Next | 11:41:06.577805 | 31.832µs |
| │ │ └─*executor.TableReaderExecutor.Next * N | 11:41:06.583194 | 19.7µs |
| │ ├─*executor.ProjectionExec.Next | 11:37:08.080339 | 5.499µs |
| │ ├─*executor.ProjectionExec.Next | 11:37:08.080509 | 5.065µs |
| │ ├─*executor.ProjectionExec.Next | 11:37:08.080681 | 5.46µs |
| │ ├─*executor.ProjectionExec.Next | 11:37:08.080855 | 9.297µs |
| │ ├─*executor.ProjectionExec.Next | 11:41:06.663323 | 2.028µs |
| │ └─*executor.ProjectionExec.Next * N | 11:41:06.665968 | 1.844µs |
| └─*executor.ProjectionExec.Next | 11:41:06.671764 | 11.448µs |
| └─*executor.StreamAggExec.Next | 11:41:06.671767 | 568ns |
+------------------------------------------------------+-----------------+-----------------+
196901 rows in set (5 min 56.31 sec)
|
0x02 SSR
PingCAP 官网-Cluster info;集群拓扑表 CLUSTER_INFO 提供集群当前的拓扑信息,以及各个节点的版本信息、版本对应的 Git Hash、各节点的启动时间、各实例的运行时间。
TiDB Dashboard是 TiDB 自 4.0 版本起提供的图形化界面,可用于监控及诊断 TiDB 集群。TiDB Dashboard 内置于 TiDB 的 PD 组件中,无需独立部署。
0x03 Profile
Profile 用于分析函数占用内存、CPU 资源信息,TiDB 4.0 版本可以在 tidb-dashboard 上一键获取;以下为分析 tidb-dashboard 后挖掘到的 HTTP API 信息
1
2
3
4
5
6
7
8
9
10
| curl http://{PD IP}:{PD 2379}/pd/api/v1/debug/pprof/profile?seconds={int} > CPU-profile
// 上述命令获取 CPU profile 信息
// PD 所有 Profile api 请求,每条 api 输出后二进制文件,使用 go tool pprof 解析
apiRouter.HandleFunc("/debug/pprof/profile", pprof.Profile)
apiRouter.Handle("/debug/pprof/heap", pprof.Handler("heap"))
apiRouter.Handle("/debug/pprof/mutex", pprof.Handler("mutex"))
apiRouter.Handle("/debug/pprof/allocs", pprof.Handler("allocs"))
apiRouter.Handle("/debug/pprof/block", pprof.Handler("block"))
apiRouter.Handle("/debug/pprof/goroutine", pprof.Handler("goroutine"))
|
- 获取 TiDB-server Profile
- 该接口可以获取 tidb-server profile 压缩包,内涵 cpu、内存等信息
- 内容来自 TiDB http api 文档 Download TiDB debug info
1
2
3
4
5
6
7
8
9
10
| curl http://{TiDB IP}:{TiDB status_port}/debug/pprof/profile?seconds={int} > debug.zip
// zip file will include:
// Go heap pprof(after GC)
// Go cpu pprof(10s)
// Go mutex pprof
// Full goroutine
// TiDB config and version
// Param:
// seconds: profile time(s), default is 10s.
|
1
| curl http://{TiKV IP}:{TiKV status_port}/debug/pprof/profile?seconds={int} > tikv.svg
|
1
2
3
4
5
6
7
| yum install go
// 安装 go 程序环境
go tool pprof -http 172.16.4.51:890 <profile-name>
// 使用 go tool pprof 工具解析 profile 数据,并开启一个网页以便在浏览器上查看
浏览器打开该网页访问并分析信息 http://172.16.4.51:890/
|
0x04 Linux
- vmstat / top / htop
- dstat / iotop / iostat
- lsof
- pstack // 该命令会阻断程序正常运行,禁止在 TiKV 使用;会导致 tikv 状态变为 Disconnected,从而丢失所有 region leader
- strace
- gdb
Sar
SAR(System Activity Reporter),可以从 14 个大方面对系统的活动进行报告,包括文件的读写情况、系统调用的使用情况、串口、CPU 效率、内存使用状况、进程活动及 IPC 有关的活动等,使用也是较为复杂。
追溯过去的统计数据(默认)
周期性的查看当前数据
怀疑 CPU 存在瓶颈,可用 sar -u 和 sar -q 等来查看
怀疑内存存在瓶颈,可用 sar -B、sar -r 和 sar -W 等来查看
怀疑 I/O 存在瓶颈,可用 sar -b、sar -u 和 sar -d 等来查看
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
| -A 汇总所有的报告
-a 报告文件读写使用情况
-B 报告附加的缓存的使用情况
-b 报告缓存的使用情况
-c 报告系统调用的使用情况
-d 报告磁盘的使用情况
-g 报告串口的使用情况
-h 报告关于 buffer 使用的统计数据
-m 报告 IPC 消息队列和信号量的使用情况
-n 报告命名 cache 的使用情况
-p 报告调页活动的使用情况
-q 报告运行队列和交换队列的平均长度
-R 报告进程的活动情况
-r 报告没有使用的内存页面和硬盘块
-u 报告 CPU 的利用率
-v 报告进程、i 节点、文件和锁表状态
-w 报告系统交换活动状况
-y 报告 TTY 设备活动状况
|
0x05 不友好的
- useCursorFetch=True&defaultFetchSize>0
- update batchexecutor
Author
Atman
LastMod
2020-07-03
License
CC BY-NC-ND 4.0 | 本文不带任何官方色彩,最终解释权归本站所有