1 Fault Description
On September 22, one of the Kafka brokers in the nationwide Kafka cluster went down after running out of disk space. The business was affected: messages could neither be produced nor consumed. The application logged the following error:
WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] 1 partitions have leader brokers without a matching listener, including [baidd-0] (org.apache.kafka.clients.NetworkClient)
2 Fault Simulation
2.1 Scenario: the topic's partitions have replicas = 1
# Produce messages
[root@Centos7-Mode-V8 kafka]# bin/kafka-console-producer.sh --broker-list 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic baidd
>aa
>bb
# Consume messages:
[root@Centos7-Mode-V8 kafka]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic baidd
aa
bb
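For reference, a single-replica test topic like the one used above can be created as follows (a minimal sketch; that baidd was created with --replication-factor 1 and --partitions 1 is an assumption based on this section's premise, and the ZooKeeper addresses are the same ones used elsewhere in this article):
# Create a test topic with only one replica (illustrative)
bin/kafka-topics.sh --create --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --replication-factor 1 --partitions 1 --topic baidd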
2.1.1 Simulation: stopping the node hosting the topic's partition leader
# Use Kafka Tool to check which node hosts the topic partition's leader
/*
You can also check this with the Kafka command line:
bin/kafka-topics.sh --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --describe
The output is as follows (the original screenshot is not included):
*/
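Since the original output screenshot is not included, here is what the describe output for a single-replica topic typically looks like (the broker ids shown for Leader/Replicas/Isr are illustrative, not the actual values from the incident):
Topic:baidd PartitionCount:1 ReplicationFactor:1 Configs:
Topic: baidd Partition: 0 Leader: 1 Replicas: 1 Isr: 1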
After stopping the leader node, the producer and all consumer processes kept printing the following message:
[2021-09-23 17:09:53,495] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] 1 partitions have leader brokers without a matching listener, including [baidd-0] (org.apache.kafka.clients.NetworkClient)
Messages could neither be sent nor consumed.
2.1.2 Simulation: stopping a non-leader node
Sometimes the consumer process reports: [2021-09-23 17:21:22,480] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] Connection to node 2147483645 (/192.168.144.253:9193) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
While this error is occurring, messages can still be produced normally, but the data produced during this window cannot be consumed.
2.1.3 Summary
When a partition has replicas = 1, stopping any node affects the business.
Specifically, when the node hosting a partition's leader goes down, both producing and consuming messages are affected.
When a non-leader node goes down, consuming messages is affected.
2.2 Scenario: partitions with multiple replicas
It is understandable that the business is affected when a partition has no other replicas, so we tried configuring multiple replicas for the topic, and found that the business was, surprisingly, still affected:
# Create a topic with three replicas
bin/kafka-topics.sh --create --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --replication-factor 3 --partitions 1 --topic song
# Check replica information
[root@Centos7-Mode-V8 kafka]# bin/kafka-topics.sh --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --describe --topic song
Topic:song PartitionCount:1 ReplicationFactor:3 Configs:
Topic: song Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
# Produce messages
bin/kafka-console-producer.sh --broker-list 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song
# Consumer process 1
bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song --group g1
# Consumer process 2
bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song --group g2
# Simulate stopping the node hosting the topic's partition leader
Messages could still be produced, and the "1 partitions have leader brokers without a matching listener" error no longer appeared. However, after the consumers lost the connection to the topic leader, they sometimes reported:
[2021-09-24 19:01:06,316] WARN [Consumer clientId=consumer-1, groupId=console-consumer-27609] Connection to node 2147483647 (/192.168.144.247:9193) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Some of the data produced during this period did not come through; messages produced while the node was down could not be consumed.
So why are messages still lost on node failure even though there are multiple replicas?
Answer: __consumer_offsets has only 1 replica, which prevents high availability even for topics that themselves have multiple replicas.
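This can be confirmed with the same --describe command used earlier (a sketch; if each partition of __consumer_offsets lists only a single broker id under Replicas/Isr, the topic has no redundancy):
# Check the replica assignment of the built-in offsets topic
bin/kafka-topics.sh --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --describe --topic __consumer_offsets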
# Later, after increasing the replicas of Kafka's built-in __consumer_offsets topic, the other regular topics became highly available. Stopping a node still produces the "Broker may not be available" warning, but the business is no longer affected.
3 Root Cause Analysis
The Kafka configuration files did not set default.replication.factor=3. This parameter defaults to 1, meaning there are no additional replicas, so each partition was effectively a single point of failure.
4 Solution
4.1 Modify the default.replication.factor parameter
Edit the configuration file on every Kafka node and increase the default replication factor for topics (the parameter defaults to 1):
default.replication.factor=3
When default.replication.factor=3 is set, offsets.topic.replication.factor also defaults to 3.
Note: do not set default.replication.factor=3 and at the same time set offsets.topic.replication.factor=1, because for __consumer_offsets the value of offsets.topic.replication.factor overrides default.replication.factor.
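As a minimal sketch, the relevant part of server.properties after the change might look like this (listing offsets.topic.replication.factor explicitly is optional and shown here only for clarity):
# server.properties (excerpt)
default.replication.factor=3
offsets.topic.replication.factor=3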
# Restart Kafka so the configuration takes effect
systemctl restart kafka
4.2 Increase replicas for existing regular topics
See https://blog.csdn.net/yabingshi_tech/article/details/120443647
4.3 Increase replicas for __consumer_offsets
The procedure is the same as above; the JSON file is as follows (the commands to apply it are shown after the file):
{
  "version": 1,
  "partitions": [
    {"topic": "__consumer_offsets", "partition": 0, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 1, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 2, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 3, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 4, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 5, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 6, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 7, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 8, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 9, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 10, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 11, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 12, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 13, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 14, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 15, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 16, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 17, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 18, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 19, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 20, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 21, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 22, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 23, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 24, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 25, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 26, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 27, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 28, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 29, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 30, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 31, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 32, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 33, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 34, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 35, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 36, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 37, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 38, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 39, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 40, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 41, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 42, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 43, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 44, "replicas": [2, 0, 1]},
    {"topic": "__consumer_offsets", "partition": 45, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 46, "replicas": [0, 1, 2]},
    {"topic": "__consumer_offsets", "partition": 47, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 48, "replicas": [1, 2, 0]},
    {"topic": "__consumer_offsets", "partition": 49, "replicas": [2, 0, 1]}
  ]
}
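Assuming the JSON above is saved to a file such as expand_offsets_replicas.json (the file name is only an example), the reassignment can be applied and then checked with kafka-reassign-partitions.sh:
# Execute the reassignment
bin/kafka-reassign-partitions.sh --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --reassignment-json-file expand_offsets_replicas.json --execute
# Verify the reassignment progress
bin/kafka-reassign-partitions.sh --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --reassignment-json-file expand_offsets_replicas.json --verify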
-- This article references: 《Kafka突然宕机了?稳住,莫慌!》