1. Download the kafka_exporter package:
wget https://github.com/danielqsj/kafka_exporter/releases/download/v1.4.2/kafka_exporter-1.4.2.linux-amd64.tar.gz
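2. Extract the archive

Note: a Kafka cluster (this also applies to a CDH Kafka cluster) needs only one kafka_exporter. In this setup kafka_exporter runs on a Kafka node, because the service file in step 3 connects to localhost:9092.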

tar -xf kafka_exporter-1.4.2.linux-amd64.tar.gz
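
The service file in step 3 expects the binary at /home/bpaas/soft/kafka_exporter/kafka_exporter, while the archive unpacks into a versioned directory. As a minimal sketch, the archive could be unpacked straight into a fixed directory. The helper name install_exporter is hypothetical, and the sketch assumes the archive contains a single top-level directory.

```shell
# Hypothetical helper: unpack a release tarball into a fixed directory,
# dropping the archive's top-level versioned directory.
install_exporter() {
    tarball="$1"   # e.g. kafka_exporter-1.4.2.linux-amd64.tar.gz
    dest="$2"      # e.g. /home/bpaas/soft/kafka_exporter
    mkdir -p "$dest" &&
        tar -xf "$tarball" -C "$dest" --strip-components=1
}
```

For example: install_exporter kafka_exporter-1.4.2.linux-amd64.tar.gz /home/bpaas/soft/kafka_exporter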
3. Create the systemd service file
vim /usr/lib/systemd/system/kafka-export.service
[Unit]
Description=kafka_exporter stats exporter for Prometheus
Documentation=https://github.com/danielqsj/kafka_exporter/

[Service]
ExecStart=/home/bpaas/soft/kafka_exporter/kafka_exporter --kafka.server=localhost:9092

[Install]
WantedBy=multi-user.target
4. Start the service and enable it at boot
systemctl daemon-reload
systemctl start kafka-export.service
systemctl enable kafka-export.service
systemctl status kafka-export.service
systemctl restart kafka-export.service
5. Access the kafka_exporter metrics

kafka_exporter listens on port 9308 by default.

http://**.**.**.125:9308/metrics
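
The endpoint returns plain text in the Prometheus exposition format, mixing the exporter's Kafka metrics with Go runtime metrics. As a rough sketch, a small filter can list just the Kafka metric names. The function name kafka_metric_names is hypothetical, and it assumes the Kafka metrics carry the kafka_ prefix.

```shell
# Hypothetical helper: read exposition-format text on stdin and print
# the unique metric names that start with "kafka_".
kafka_metric_names() {
    grep -v '^#' |            # drop HELP/TYPE comment lines
        sed -e 's/[{ ].*$//' |  # strip labels and sample values
        grep '^kafka_' |
        sort -u
}
```

It can be fed from curl, for example: curl -s http://localhost:9308/metrics | kafka_metric_names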
6. Add a scrape job for the kafka_exporter metrics under scrape_configs in prometheus.yml
  - job_name: 'kafka'
    static_configs:
      - targets: ['**.**.**.125:9308']
7. Validate the configuration and reload Prometheus
cd /soft/prometheus
./promtool check config prometheus.yml
systemctl reload prometheus.service
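
The two commands above can be chained so that a broken configuration is never reloaded. This is only a sketch: the function name reload_if_valid and the PROMTOOL variable are hypothetical, and it assumes the prometheus unit defines a reload action (ExecReload); without one, systemctl reload fails and a restart is needed instead.

```shell
# Hypothetical helper: reload Prometheus only when the config passes validation.
# PROMTOOL defaults to ./promtool, matching the commands above.
reload_if_valid() {
    config="${1:-prometheus.yml}"
    if "${PROMTOOL:-./promtool}" check config "$config"; then
        systemctl reload prometheus.service
    else
        echo "config check failed, not reloading" >&2
        return 1
    fi
}
```

Run it from /soft/prometheus as: reload_if_valid prometheus.yml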
8. Check the Prometheus Targets page
http://localhost:9090/targets

9. Configure an alert rule and send DingTalk alerts

This is a rule I wrote myself to alert when a Kafka node goes down. It is for reference only.

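groups:
  - name: kafka_general
    rules:
      - alert: Kafka InstanceDown # alert name
        # 5 is the expected broker count of this cluster; adjust it to yours.
        # "by (instance, job)" keeps the labels used in the annotations; a bare sum() drops them.
        expr: sum by (instance, job) (kafka_brokers) < 5
        for: 15s # how long the condition must hold before the alert fires
        labels: # labels attached to the alert
          severity: error
        annotations: # detailed description of the alert
          summary: "Instance {{ $labels.instance }} down"
          description: "kafka {{ $labels.instance }} of cluster {{ $labels.job }} has been down for more than 15 seconds."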
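
Before relying on the rule, the same broker-count condition can be checked by hand against the exporter output. The following is a sketch only: the function name brokers_below is hypothetical, and it assumes kafka_brokers is exposed as an unlabeled sample, as kafka_exporter does at the time of writing.

```shell
# Hypothetical helper: read exposition-format text on stdin and succeed
# (exit 0) when the summed kafka_brokers value is below the threshold.
# If no kafka_brokers sample is present it fails, just as the alert
# expression yields no result when the series is missing.
brokers_below() {
    threshold="$1"
    awk -v t="$threshold" '
        $1 == "kafka_brokers" { sum += $2; seen = 1 }
        END { exit ((seen && sum < t) ? 0 : 1) }
    '
}
```

For example: curl -s http://localhost:9308/metrics | brokers_below 5 && echo "alert condition is true"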
10. Configure DingTalk alerting

For details, see: "Configuring Alertmanager alerts for Prometheus: DingTalk alerts".
