0x01 precheck

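This document grew out of Step 7: Configure the CPUfreq governor mode on the target machine.
Working backwards from the errors reported during a TiDB-ansible install, it figures out what has to be prepared on a physical machine before TiDB is installed.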

help-tools-man

0x02 OS

Start by collecting what is already known from the official guide on deploying a TiDB cluster with TiDB Ansible.

Net / Port

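Whether you deploy on physical machines, virtual machines, or cloud hosts, make sure the internal firewall opens the ports below. If you are unsure about the source and destination addresses, allow any-to-any inside the internal network.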

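| Component | Port variable | Default port | Description |
|---|---|---|---|
| TiDB | tidb_port | 4000 | Communication port for applications and DBA tools |
| TiDB | tidb_status_port | 10080 | Port for reporting TiDB status (communicates with Prometheus) |
| TiKV | tikv_port | 20160 | TiKV communication port |
| TiKV | tikv_status_port | 20180 | Port for reporting TiKV status (communicates with Prometheus) |
| PD | pd_client_port | 2379 | Port for communication between TiDB and PD (communicates with Prometheus) |
| PD | pd_peer_port | 2380 | Port for communication between PD cluster nodes |
| Pump | pump_port | 8250 | Pump communication port (Binlog component; communicates with Prometheus) |
| Prometheus | prometheus_port | 9090 | Prometheus service communication port |
| Node_exporter | node_exporter_port | 9100 | Port for reporting system information of every TiDB cluster node |
| Blackbox_exporter | blackbox_exporter_port | 9115 | Blackbox_exporter communication port, used to monitor TiDB cluster ports |
| Grafana | grafana_port | 3000 | Port for the web monitoring service and client (browser) access |
| Kafka_exporter | kafka_exporter_port | 9308 | Kafka_exporter communication port, used to monitor the binlog Kafka cluster (not installed by default) |
| Pushgateway | pushgateway_port | 9091 | Aggregation and reporting port for TiDB, TiKV, and PD monitoring (not installed by default) |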
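
One quick way to confirm that the firewall really lets these ports through is to attempt a TCP connection from another node. The sketch below is only an illustration: it assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are installed, and the function names check_port and report_ports are made up here, not part of tidb-ansible.

```shell
#!/usr/bin/env bash

# check_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
# The connection attempt runs in a child bash because /dev/tcp is a bash feature.
check_port() {
  local host=$1 port=$2 wait=${3:-2}
  timeout "$wait" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# report_ports HOST PORT...
# Prints one line per port: "open" or "blocked/closed".
report_ports() {
  local host=$1 port
  shift
  for port in "$@"; do
    if check_port "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port blocked/closed"
    fi
  done
}
```

For example, report_ports 192.168.1.10 4000 10080 2379 2380 20160 20180 probes the TiDB, PD, and TiKV defaults on one host (the IP is a placeholder). A port only shows as open when a service is already listening on it, so before the cluster is deployed a "blocked/closed" result cannot tell a firewall drop from an idle port; a temporary listener started with nc on the target helps in that case.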

SSH-key

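Follow step 2 of the guide (create the tidb user on the control machine and generate an SSH key) to set up passwordless SSH trust between the control machine and the deployment nodes. This step requires the Linux root password. If the company uses a jump server (bastion host), you may have to create a low-privilege user by hand and grant it passwordless sudo (the jump server administrator can create such users in bulk).

If SSH key trust and passwordless sudo are not allowed, you need to know a few Ansible tricks. For example: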

  • If all target hosts share the same password, pass the SSH remote user's password with the -k option

    -k, --ask-pass      ask for connection password
    -u REMOTE_USER, --user=REMOTE_USER
    
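  • If the passwords differ between hosts, set the password per host in the inventory

    • The password is stored in plain text here; special characters such as # $ ! in the password may cause the SSH login to fail
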
    ansible_host=192.168.1.10 ansible_connection=ssh ansible_ssh_user=vagrant ansible_ssh_pass=vagrant10
    ansible_host=192.168.1.11 ansible_connection=ssh ansible_ssh_user=vagrant ansible_ssh_pass=vagrant11
    ansible_host=192.168.1.12 ansible_connection=ssh ansible_ssh_user=vagrant ansible_ssh_pass=vagrant12
    
  • Privilege escalation options for the remote user

    • Note the difference between sudo and su, and which user sudo switches to

    -s, --sudo          run operations with sudo (nopasswd) (deprecated, use
                        become)
    -U SUDO_USER, --sudo-user=SUDO_USER
                        desired sudo user (default=root) (deprecated, use
                        become)
    -S, --su            run operations with su (deprecated, use become)
    -R SU_USER, --su-user=SU_USER
                        run operations with su as this user (default=None)
                        (deprecated, use become)
    -b, --become        run operations with become (does not imply password
                        prompting)
    -K, --ask-become-pass
    

NTP

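CentOS 7 switched its default NTP service to Chrony, whereas tidb-ansible uses the ntpd service.
If the company's hosts already sync against an internal NTP server, change the enable_ntpd = True variable in tidb-ansible/inventory.ini to False.
If there is no NTP server on the internal network and no internal Yum repository either, the service has to be installed by hand.

A distributed system needs an implicit common standard for judging information in real time, and time is clearly the best candidate.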

CPU mode

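To get the most out of the CPU, set the CPUfreq governor to performance mode. For more about CPUfreq, see the documentation on using CPUFREQ governors.

Adjust the motherboard settings in the BIOS. If rebooting the server is inconvenient, the following command works as a temporary workaround (the setting is lost after the server reboots).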

ansible all -m shell -a "cpupower frequency-set --governor performance" -b
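
To verify the result on a node, the current governor of each core can be read from sysfs. The following is a minimal sketch; the function name non_performance_cpus and its optional root-directory argument are illustrative assumptions, and the default path is the standard Linux cpufreq sysfs location.

```shell
# non_performance_cpus [CPU_SYSFS_ROOT]
# Prints every core whose scaling governor is not "performance".
# Prints nothing when all cores are set correctly, or when the cpufreq
# files do not exist (common on virtual machines).
non_performance_cpus() {
  local root=${1:-/sys/devices/system/cpu}
  local f governor
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue
    governor=$(cat "$f")
    if [ "$governor" != "performance" ]; then
      echo "$f: $governor"
    fi
  done
}
```

Calling non_performance_cpus with no argument on a target machine lists the cores that still need attention; empty output means there is nothing left to fix.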

Disk mount

  • The mount options of the TiKV data directory must include nodelalloc

UUID=c51eb23b-195c-4061-92a9-3fad812cc12f /data1 ext4 defaults,nodelalloc,noatime 0 2
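
Whether the option is actually in effect can be checked against /proc/mounts after mounting. This is a small sketch under assumptions: the function name has_mount_option is hypothetical, and /data1 is simply the mount point from the fstab line above.

```shell
# has_mount_option MOUNT_POINT OPTION [MOUNTS_FILE]
# Succeeds (exit 0) when MOUNT_POINT is mounted with OPTION.
# MOUNTS_FILE defaults to /proc/mounts, where field 2 is the mount point
# and field 4 is the comma-separated option list.
has_mount_option() {
  local mount_point=$1 option=$2 mounts_file=${3:-/proc/mounts}
  awk -v mp="$mount_point" -v opt="$option" '
    $2 == mp {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$mounts_file"
}
```

Usage: has_mount_option /data1 nodelalloc && echo ok || echo "nodelalloc missing on /data1".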

Disk fio

For the disk performance check, read the tidb-ansible playbooks to see how it is done. The code lives in tidb-ansible/roles/machine_benchmark.

This check only applies to the tikv_servers hosts.

  • default values (roles/machine_benchmark/defaults/main.yml)

    # fio randread iops
    min_ssd_randread_iops: 40000
    
    # fio mixed randread and sequential write
    min_ssd_mix_randread_iops: 10000
    min_ssd_mix_write_iops: 10000
    
    # fio mixed randread and sequential write lat
    max_ssd_mix_randread_lat: 250000
    max_ssd_mix_write_lat: 30000
    
    # fio test file size
    benchmark_size: 10G
    

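The tests use the fio command with the psync IO engine and a 32k block size. With -fdatasync=1 every write is followed by an fdatasync so it reaches the disk; this is not the same as -direct=1, so reads can still be served from the page cache.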

  • Random read test: fio_randread.yml

    • An error is raised if the measured IOPS is below 40000

    ./fio -ioengine=psync -bs=32k -fdatasync=1 -thread -rw=randread -size={{ benchmark_size }} -filename=fio_randread_test.txt -name='fio randread test' -iodepth=4 -runtime=60 -numjobs=4 -group_reporting --output-format=json --output=fio_randread_result.json
    
  • Mixed scenario, random read plus sequential write: fio_randread_write.yml

    • An error is raised if random-read IOPS is below 10000 or sequential-write IOPS is below 10000

    ./fio -ioengine=psync -bs=32k -fdatasync=1 -thread -rw=randrw -percentage_random=100,0 -size={{ benchmark_size }} -filename=fio_randread_write_test.txt -name='fio mixed randread and sequential write test' -iodepth=4 -runtime=60 -numjobs=4 -group_reporting --output-format=json --output=fio_randread_write_test.json
    
  • Read/write latency in the mixed scenario: fio_randread_write_latency.yml

    • An error is raised if random-read latency exceeds 250000 ns (0.25 ms) or sequential-write latency exceeds 30000 ns (0.03 ms)

    ./fio -ioengine=psync -bs=32k -fdatasync=1 -thread -rw=randrw -percentage_random=100,0 -size={{ benchmark_size }} -filename=fio_randread_write_latency_test.txt -name='fio mixed randread and sequential write test' -iodepth=1 -runtime=60 -numjobs=1 -group_reporting --output-format=json --output=fio_randread_write_latency_test.json
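
The pass/fail logic behind these three tests is a plain threshold comparison. The sketch below reproduces it by hand for numbers read off a fio report; it is not the playbook's own code (which works from the JSON output), the function names are invented for illustration, and the thresholds are copied from roles/machine_benchmark/defaults/main.yml as listed above.

```shell
# meets_minimum VALUE THRESHOLD
# Succeeds when VALUE >= THRESHOLD (used for IOPS checks).
meets_minimum() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 >= t + 0) }'
}

# within_maximum VALUE THRESHOLD
# Succeeds when VALUE <= THRESHOLD (used for latency checks, in ns).
within_maximum() {
  awk -v v="$1" -v t="$2" 'BEGIN { exit !(v + 0 <= t + 0) }'
}

# check_mixed_iops READ_IOPS WRITE_IOPS
# Applies min_ssd_mix_randread_iops and min_ssd_mix_write_iops (both 10000).
check_mixed_iops() {
  local min_read=10000 min_write=10000
  if meets_minimum "$1" "$min_read" && meets_minimum "$2" "$min_write"; then
    echo pass
  else
    echo fail
    return 1
  fi
}
```

Fed with the summary lines of the two runs in the Demo section, check_mixed_iops 18700 18800 (Intel P4610) prints pass, while check_mixed_iops 2263 2269 (the SATA SSD) prints fail.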
    

sysctl

Look at roles/bootstrap/tasks/root_tasks.yml to learn which system parameters tidb-ansible changes.

  • selinux

    cat /etc/selinux/config
    # Check the SELinux status
    vi /etc/selinux/config
    # Edit the SELinux config file and set the parameter below to disabled; takes effect after a reboot
    SELINUX=disabled
    
  • Firewalld

    systemctl status firewalld.service
    # Check the firewall status
    systemctl stop firewalld.service
    # Stop the firewall service
    systemctl disable firewalld.service
    # Disable start on boot
    
    iptables-save
    # Dump the iptables configuration; clean up iptables rules as needed
    
  • swap

    swapoff -a
    # Turn swap off with this command
    # Edit /etc/fstab and remove the swap partition entry
    
  • sysctl

    { name: 'net.core.somaxconn', value: 32768 }
    { name: 'vm.swappiness', value: 0 }
    { name: 'net.ipv4.tcp_syncookies', value: 0 }
    { name: 'fs.file-max', value: 1000000 }
    { name: 'net.ipv4.tcp_tw_recycle', value: 0 }
    
  • Transparent huge pages: [never] in the file means they are disabled

    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag
    
  • irqbalance

    • Irqbalance is a daemon to help balance the cpu load generated by interrupts across all of a systems cpus.
  • ulimit

    {{ deploy_user }}        soft        nofile        1000000
    {{ deploy_user }}        hard        nofile        1000000
    {{ deploy_user }}        soft        stack         10240
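
The kernel parameters and the transparent huge page setting from the list above can be verified without Ansible by reading /proc/sys and the sysfs files directly. The helpers below are a rough sketch; check_sysctl and thp_active_mode are hypothetical names, and the optional root argument exists only so the logic can be exercised against a scratch directory.

```shell
# check_sysctl NAME EXPECTED [PROC_SYS_ROOT]
# Compares a kernel parameter with its expected value.
# Returns 0 on match, 1 on mismatch, 2 when the parameter does not exist.
check_sysctl() {
  local name=$1 expected=$2 root=${3:-/proc/sys}
  local path actual
  # net.core.somaxconn -> /proc/sys/net/core/somaxconn
  path="$root/$(printf '%s' "$name" | tr '.' '/')"
  if [ ! -r "$path" ]; then
    echo "$name: not present on this kernel"
    return 2
  fi
  actual=$(cat "$path")
  if [ "$actual" = "$expected" ]; then
    echo "$name: ok ($actual)"
  else
    echo "$name: expected $expected, got $actual"
    return 1
  fi
}

# thp_active_mode FILE
# Prints the bracketed (active) value of a transparent_hugepage file,
# e.g. "never" for a file containing "always madvise [never]".
thp_active_mode() {
  sed -n 's/.*\[\([a-z+]*\)\].*/\1/p' "$1"
}
```

Typical calls are check_sysctl vm.swappiness 0 and thp_active_mode /sys/kernel/mm/transparent_hugepage/enabled. The "not present" branch matters for net.ipv4.tcp_tw_recycle, which was removed in Linux 4.12 and therefore does not exist on newer kernels. Keep in mind that the echo never commands shown above do not survive a reboot.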
    

0x03 VIP

Manual tests. For a cross-datacenter deployment, network latency, bandwidth, and similar metrics need targeted testing.

Net nc

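Suppose the datacenters are in Shanghai and Beijing. The distance is large and distributed applications are demanding about network quality, so re-check by hand with tools such as ping, nc, and iftop.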

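nc -l -p 1234 < /dev/zero
# Start the server side in the Shanghai datacenter to generate data

nc 192.168.0.11 1234 > /dev/null
# Run this in the Beijing datacenter, filling in the server's IP and port, to receive the data

Install iftop, or use dstat, to watch the network traffic on both servers.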

0x04 Demo

Intel P4610 disk performance test

[root@jumphost bin]# ./fio -ioengine=psync -bs=32k -fdatasync=1 -thread -rw=randrw -percentage_random=100,0 -size=10G -filename=/data4/fio_randread_test.txt -name='fio randread test' -iodepth=4 -runtime=60 -numjobs=4 -group_reporting
fio randread test: (g=0): rw=randrw, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioengine=psync, iodepth=4
...
fio-3.8
Starting 4 threads
fio randread test: Laying out IO file (1 file / 10240MiB)
Jobs: 4 (f=4): [m(4)][97.2%][r=814MiB/s,w=823MiB/s][r=26.1k,w=26.3k IOPS][eta 00m:01s]
fio randread test: (groupid=0, jobs=4): err= 0: pid=1349: Wed May 13 16:57:53 2020
   read: IOPS=18.7k, BW=586MiB/s (614MB/s)(19.0GiB/34923msec)
    clat (usec): min=3, max=2480, avg=111.13, stdev=168.36
     lat (usec): min=3, max=2480, avg=111.24, stdev=168.36
    clat percentiles (usec):
     |  1.00th=[    7],  5.00th=[    7], 10.00th=[    7], 20.00th=[    8],
     | 30.00th=[    8], 40.00th=[    8], 50.00th=[    9], 60.00th=[    9],
     | 70.00th=[  163], 80.00th=[  212], 90.00th=[  330], 95.00th=[  474],
     | 99.00th=[  717], 99.50th=[  816], 99.90th=[ 1090], 99.95th=[ 1270],
     | 99.99th=[ 1696]
   bw (  KiB/s): min=93568, max=241856, per=25.02%, avg=150055.38, stdev=36947.18, samples=274
   iops        : min= 2924, max= 7558, avg=4689.19, stdev=1154.60, samples=274
  write: IOPS=18.8k, BW=587MiB/s (616MB/s)(20.0GiB/34923msec)
    clat (usec): min=7, max=140, avg=22.46, stdev=10.45
     lat (usec): min=8, max=142, avg=25.41, stdev=10.96
    clat percentiles (nsec):
     |  1.00th=[10176],  5.00th=[10944], 10.00th=[11968], 20.00th=[14144],
     | 30.00th=[15424], 40.00th=[17280], 50.00th=[19072], 60.00th=[21120],
     | 70.00th=[25984], 80.00th=[31360], 90.00th=[36608], 95.00th=[44800],
     | 99.00th=[54016], 99.50th=[60160], 99.90th=[67072], 99.95th=[70144],
     | 99.99th=[74240]
   bw (  KiB/s): min=94400, max=242752, per=25.02%, avg=150419.79, stdev=37342.50, samples=274
   iops        : min= 2950, max= 7586, avg=4700.58, stdev=1166.95, samples=274
  lat (usec)   : 4=0.01%, 10=31.10%, 20=27.91%, 50=21.30%, 100=1.10%
  lat (usec)   : 250=11.00%, 500=5.54%, 750=1.66%, 1000=0.32%
  lat (msec)   : 2=0.07%, 4=0.01%
  fsync/fdatasync/sync_file_range:
    sync (nsec): min=756, max=1383.8k, avg=34933.13, stdev=25435.09
    sync percentiles (nsec):
     |  1.00th=[   956],  5.00th=[  1096], 10.00th=[  1176], 20.00th=[  1480],
     | 30.00th=[ 28032], 40.00th=[ 31872], 50.00th=[ 34048], 60.00th=[ 38144],
     | 70.00th=[ 42752], 80.00th=[ 50432], 90.00th=[ 65280], 95.00th=[ 78336],
     | 99.00th=[117248], 99.50th=[130560], 99.90th=[154624], 99.95th=[162816],
     | 99.99th=[189440]
  cpu          : usr=5.38%, sys=29.77%, ctx=2365485, majf=0, minf=739
  IO depths    : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=654644,656076,0,0 short=1310712,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=586MiB/s (614MB/s), 586MiB/s-586MiB/s (614MB/s-614MB/s), io=19.0GiB (21.5GB), run=34923-34923msec
  WRITE: bw=587MiB/s (616MB/s), 587MiB/s-587MiB/s (616MB/s-616MB/s), io=20.0GiB (21.5GB), run=34923-34923msec

Disk stats (read/write):
  nvme1n1: ios=243805/930966, merge=0/132, ticks=66016/21901, in_queue=87969, util=95.36%

Test of an SSD attached to a PCIe SATA 6Gb/s disk backplane

root@tidb-04:/data1# /root/fio -ioengine=psync -bs=32k -fdatasync=1 -thread -rw=randrw -percentage_random=100,0 -size=10G -filename=/data1/fio_randread_test.txt -name='fio randread test' -iodepth=4 -runtime=60 -numjobs=4 -group_reporting
fio randread test: (g=0): rw=randrw, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioengine=psync, iodepth=4
...
fio-3.8
Starting 4 threads
fio randread test: Laying out IO file (1 file / 10240MiB)
Jobs: 2 (f=2): [E(1),m(1),E(1),m(1)][100.0%][r=97.7MiB/s,w=95.9MiB/s][r=3127,w=3069 IOPS][eta 00m:00s]
fio randread test: (groupid=0, jobs=4): err= 0: pid=51784: Wed May 13 16:52:38 2020
   read: IOPS=2263, BW=70.7MiB/s (74.2MB/s)(4245MiB/60003msec)
    clat (usec): min=7, max=51915, avg=1036.31, stdev=923.09
     lat (usec): min=8, max=51916, avg=1036.96, stdev=923.09
    clat percentiles (usec):
     |  1.00th=[    9],  5.00th=[    9], 10.00th=[   10], 20.00th=[   11],
     | 30.00th=[   12], 40.00th=[  840], 50.00th=[ 1139], 60.00th=[ 1319],
     | 70.00th=[ 1467], 80.00th=[ 1745], 90.00th=[ 2180], 95.00th=[ 2573],
     | 99.00th=[ 3458], 99.50th=[ 3818], 99.90th=[ 4555], 99.95th=[ 4883],
     | 99.99th=[10159]
   bw (  KiB/s): min= 8832, max=26944, per=24.96%, avg=18083.41, stdev=3456.47, samples=478
   iops        : min=  276, max=  842, avg=565.09, stdev=108.01, samples=478
  write: IOPS=2269, BW=70.9MiB/s (74.4MB/s)(4256MiB/60003msec)
    clat (usec): min=8, max=9672, avg=16.64, stdev=49.33
     lat (usec): min=10, max=9674, avg=18.53, stdev=49.33
    clat percentiles (usec):
     |  1.00th=[   10],  5.00th=[   11], 10.00th=[   11], 20.00th=[   13],
     | 30.00th=[   13], 40.00th=[   15], 50.00th=[   16], 60.00th=[   17],
     | 70.00th=[   18], 80.00th=[   20], 90.00th=[   24], 95.00th=[   28],
     | 99.00th=[   35], 99.50th=[   38], 99.90th=[   48], 99.95th=[   64],
     | 99.99th=[  104]
   bw (  KiB/s): min=10432, max=25536, per=24.96%, avg=18129.09, stdev=3458.77, samples=478
   iops        : min=  326, max=  798, avg=566.51, stdev=108.08, samples=478
  lat (usec)   : 10=11.21%, 20=45.86%, 50=9.87%, 100=0.04%, 250=0.01%
  lat (usec)   : 500=0.37%, 750=1.88%, 1000=2.98%
  lat (msec)   : 2=20.95%, 4=6.67%, 10=0.16%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  fsync/fdatasync/sync_file_range:
    sync (usec): min=14, max=28325, avg=346.05, stdev=407.58
    sync percentiles (usec):
     |  1.00th=[   18],  5.00th=[   19], 10.00th=[   21], 20.00th=[  101],
     | 30.00th=[  151], 40.00th=[  190], 50.00th=[  237], 60.00th=[  269],
     | 70.00th=[  338], 80.00th=[  486], 90.00th=[  914], 95.00th=[ 1123],
     | 99.00th=[ 1614], 99.50th=[ 1942], 99.90th=[ 2671], 99.95th=[ 3097],
     | 99.99th=[ 9765]
  cpu          : usr=1.46%, sys=6.18%, ctx=680612, majf=0, minf=379
  IO depths    : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=135831,136197,0,0 short=272023,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
   READ: bw=70.7MiB/s (74.2MB/s), 70.7MiB/s-70.7MiB/s (74.2MB/s-74.2MB/s), io=4245MiB (4451MB), run=60003-60003msec
  WRITE: bw=70.9MiB/s (74.4MB/s), 70.9MiB/s-70.9MiB/s (74.4MB/s-74.4MB/s), io=4256MiB (4463MB), run=60003-60003msec

Disk stats (read/write):
  sdb: ios=89553/406331, merge=0/41, ticks=137840/62920, in_queue=200510, util=99.27%