0x00 话不多说

Grafana 用于 TiDB-cluster 集群监控展示功能(如果不了解的同学可以看下以前的文档),如果每套集群都部署一个 Grafana,使用不方便且运维也繁琐。因此想通过单套 Grafana 管理多个 TiDB-cluster 集群。
辣么大概是以下结构:

Provisioning Grafana

0x01 想

通过 Google 一顿搜索发现其他人也有类似想法 I’ve got a new datasource I would like an existing dashboard to use.Dashboard Level Datasource #2748 等、还在 Github 发现了一些 sqllite3 修改 grafana.db 的黑科技。
但是这些都不太适合懒人且还需要二次修改定制化,不方便随时随地操作。
抽着烟,试图在 Grafana docs 找到神迹……

抽着烟,试图在 Grafana docs 找到神迹……

偶然机会看见别人讨论 tidb-ansible 4.0 安装 grafana 删除 dashboard 的问题,挖掘 tidb-ansible & tidb-cluster 等代码后发现 Grafana 6.1 版本的新功能 Provisioning;用过 tidb-ansible 应该知道 tidb-ansible/scripts 下面有个 grafana-copy.py 脚本,该脚本通过 Grafana API 把 datasource & dashboard 导入进 Grafana 服务做初始化步骤。但是经常会报错出现问题……

0x02 动动

然后进入了测试阶段:

  1. 部署一个 Grafana 6.1 以上的版本

  2. 按照模板写 datasource

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    
    apiVersion: 1
    deleteDatasources:
      - name: cluster-1  # 此处传递的 tidb-ansible inventory.ini 文件中的 cluster-name
    datasources:
      - name: cluster-1  # datasource 名称,在配置文件中唯一
        type: prometheus
        access: proxy
        url: http://10.10.10.4:14090
        withCredentials: false
        isDefault: false
        tlsAuth: false
        tlsAuthWithCACert: false
        version: 1
        editable: true
      - name: cluster-2 # 这个是测试手写的
        type: prometheus
        access: proxy
        url: http://10.10.10.4:14090
        withCredentials: false
        isDefault: false
        tlsAuth: false
        tlsAuthWithCACert: false
        version: 1
        editable: true
    
  3. dashboard 配置文件

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    
    apiVersion: 1
    providers:
      - name: cluster-1
        folder: cluster-1  # 在 Grafana 页面中显示
        type: file
        disableDeletion: false
        editable: true
        updateIntervalSeconds: 30
        options:
        path: /home/tmpuser/tidb-deploy/grafana-14409/cluster-1
      - name: cluster-2
        folder: cluster-2
        type: file
        disableDeletion: false
        editable: true
        updateIntervalSeconds: 30
        # <bool> allow updating provisioned dashboards from the UI
        allowUiUpdates: false # 该功能需要 6.7 以上版本支持
        options:
        path: /home/tmpuser/tidb-deploy/grafana-14409/cluster-2
    
  4. Grafana 目录结构大致如下

    • 运行目录 /home/user
      • bin/grafana-server
      • conf/grafana.ini
      • provisioning/dashboards/dashboard.yml
      • provisioning/datasources/datasource.yml
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    
    // bin/bin/grafana-server
    ├── bin
    │   ├── bin
    │   │   └── grafana-server  # 运行程序
    │   ├── node.json           # dashboard 模版
    │   ├── public
    ├── conf
    │   ├── dashboard.yml       # provisioning/dashboard 模版
    │   ├── datasource.yml      # provisioning/datasources 模版
    │   └── grafana.ini
    ├── cluster-1               # 要导入集群 1 的 dashboard
    │   ├── node.json
    ├── cluster-2               # 要导入集群 2 的 dashboard
    │   └── node.json
    ├── data                    # Grafana sqllite3 数据库
    │   ├── grafana.db
    │   └── png
    ├── logs                    # 日志目录
    │   ├── grafana.log
    ├── plugins
    ├── provisioning            # 默认 provisioning 实际生效的地方
    │   ├── dashboards
    │   │   └── dashboard.yml
    │   └── datasources
    │   │   └── datasource.yml
    └── scripts
        └── run_grafana.sh
    

Run Grafana sh

Grafana 启动脚本,脚本中的 mkdir、cp、find + send 选配,非必选项。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
# 新建 dashboard、provisioning 相关目录
mkdir -p /home/tmpuser/tidb-deploy/grafana-14409/plugins
mkdir -p /home/tmpuser/tidb-deploy/grafana-14409/dashboards
mkdir -p /home/tmpuser/tidb-deploy/grafana-14409/provisioning/dashboards
mkdir -p /home/tmpuser/tidb-deploy/grafana-14409/provisioning/datasources

# copy 模版文件到 provisioning 相关目录
cp /home/tmpuser/tidb-deploy/grafana-14409/bin/*.json /home/tmpuser/tidb-deploy/grafana-14409/dashboards/
cp /home/tmpuser/tidb-deploy/grafana-14409/conf/datasource.yml /home/tmpuser/tidb-deploy/grafana-14409/provisioning/datasources
cp /home/tmpuser/tidb-deploy/grafana-14409/conf/dashboard.yml /home/tmpuser/tidb-deploy/grafana-14409/provisioning/dashboards

# 批量修改 & 替换 json 模版文件中的关键词【方便在页面观赏】
find /home/tmpuser/tidb-deploy/grafana-14409/dashboards/ -type f -exec sed -i "s/\${DS_.*-CLUSTER}/cluster-1/g" {} \;
find /home/tmpuser/tidb-deploy/grafana-14409/dashboards/ -type f -exec sed -i "s/\${DS_LIGHTNING}/cluster-1/g" {} \;
find /home/tmpuser/tidb-deploy/grafana-14409/dashboards/ -type f -exec sed -i "s/test-cluster/cluster-1/g" {} \;
find /home/tmpuser/tidb-deploy/grafana-14409/dashboards/ -type f -exec sed -i "s/Test-Cluster/cluster-1/g" {} \;

# 启动命令
LANG=en_US.UTF-8 \
exec bin/bin/grafana-server \
    --homepath="/home/tmpuser/tidb-deploy/grafana-14409/bin" \
    --config="/home/tmpuser/tidb-deploy/grafana-14409/conf/grafana.ini"

Start log

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
lvl=info msg="Starting Grafana" logger=server version=6.1.6 commit=cf9cb45 branch=HEAD compiled=2019-04-29T21:29:28+0800
lvl=info msg="Config loaded from" logger=settings file=/home/tmpuser/tidb-deploy/grafana-14409/bin/conf/defaults.ini
lvl=info msg="Config loaded from" logger=settings file=/home/tmpuser/tidb-deploy/grafana-14409/conf/grafana.ini
lvl=info msg="Path Home" logger=settings path=/home/tmpuser/tidb-deploy/grafana-14409/bin
lvl=info msg="Path Data" logger=settings path=/home/tmpuser/tidb-deploy/grafana-14409/data
lvl=info msg="Path Logs" logger=settings path=/home/tmpuser/tidb-deploy/grafana-14409/logs
lvl=info msg="Path Plugins" logger=settings path=/home/tmpuser/tidb-deploy/grafana-14409/plugins
lvl=info msg="Path Provisioning" logger=settings path=/home/tmpuser/tidb-deploy/grafana-14409/provisioning
lvl=info msg="App mode production" logger=settings
lvl=info msg="Initializing HTTPServer" logger=server
lvl=info msg="Initializing SqlStore" logger=server
lvl=info msg="Connecting to DB" logger=sqlstore dbtype=sqlite3
lvl=info msg="Starting DB migration" logger=migrator
lvl=info msg="Initializing InternalMetricsService" logger=server
lvl=info msg="Initializing SearchService" logger=server
lvl=info msg="Initializing PluginManager" logger=server
lvl=info msg="Starting plugin search" logger=plugins
lvl=info msg="Initializing RenderingService" logger=server
lvl=info msg="Initializing AlertingService" logger=server
lvl=info msg="Initializing DatasourceCacheService" logger=server
lvl=info msg="Initializing HooksService" logger=server
lvl=info msg="Initializing LoginService" logger=server
lvl=info msg="Initializing QuotaService" logger=server
lvl=info msg="Initializing RemoteCache" logger=server
lvl=info msg="Initializing ServerLockService" logger=server
lvl=info msg="Initializing TracingService" logger=server
lvl=info msg="Initializing UsageStatsService" logger=server
lvl=info msg="Initializing UserAuthTokenService" logger=server
lvl=info msg="Initializing CleanUpService" logger=server
lvl=info msg="Initializing NotificationService" logger=server
lvl=info msg="Initializing ProvisioningService" logger=server

# 响应配置文件需求每次启动的时候都会删除并重新注册 cluster-1
lvl=info msg="deleted datasource based on configuration" logger=provisioning.datasources name=cluster-1  
lvl=info msg="inserting datasource from configuration " logger=provisioning.datasources name=cluster-1

# 注册 cluster-2 datasource 信息
lvl=info msg="inserting datasource from configuration " logger=provisioning.datasources name=cluster-2

# provisioning 支持三个功能datasourcesdashboardalert notifiers然而我这里没有配置告警信息但是不影响启动
lvl=eror msg="Can't read alert notification provisioning files from directory" logger=provisioning.notifiers path=/home/tmpuser/tidb-deploy/grafana-14409/provisioning/notifiers error="open /home/tmpuser/tidb-deploy/grafana-14409/provisioning/notifiers: no such file or directory"

lvl=info msg="Initializing Stream Manager"
lvl=info msg="HTTP Server Listen" logger=http.server address=0.0.0.0:14409 protocol=http subUrl= socket=

0x04 issue

使用 provisioning/datasources/datasource.yml 该功能注意事项
Could not find datasource Data source not found 告警
alert Could not find datasource Data source not found

添加 Datasources 会占用一个 DatasourcesID,Datasources 无法使用重复名字。 deleteDatasources 是执行删除 Datasources 操作(此时 datasource ID 1 被删除了)。重复执行 deleteDatasources 就会导致以上现象。

1
2
3
apiVersion: 1
deleteDatasources:
    - name: cluster-1

从 Grafana repo 粗略看了下 alert 绑定了 DatasourceID。
systemdctl restart grafana-3000 操作从内容看只会重复执行 datasource.yml ,没有执行 dashboard.yml 文件。老的 dashboard 绑定的信息还是 datasource ID = 1,所以会提示 Could not find datasource Data source not found

1
2
3
4
5
6
7
8
// AlertQuery contains information about what datasource a query
// should be sent to and the query object.
type AlertQuery struct {
    Model        *simplejson.Json
    DatasourceID int64
    From         string
    To           string
}

解决方案

  1. 停止 grafana-service

  2. 删掉 {{deploy_dir}}/data/grafana.db「删除后所有自定义设置丢失」

  3. {{deploy_dir}}/conf/datasource.yml 文件中的以下内容注视或者删除

    1
    2
    
    deleteDatasources:
      - name: temp
    
  4. 重新启动 grafana-service