1 Fault Description

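On September 22, one broker in our nationwide Kafka cluster went down because it ran out of disk space. The business was affected: messages could be neither produced nor consumed. The applications reported: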

WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] 1 partitions have leader brokers without a matching listener, including [baidd-0] (org.apache.kafka.clients.NetworkClient)

2 Reproducing the Fault

2.1 Topic partitions with a single replica (replicas = 1)

#Produce messages

[root@Centos7-Mode-V8 kafka]# bin/kafka-console-producer.sh --broker-list 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic baidd

>aa

>bb

#Consume messages:

[root@Centos7-Mode-V8 kafka]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic baidd

aa

bb

2.1.1 Simulating a shutdown of the topic's leader broker

#Use Kafka Tool to find which broker is the leader of the topic's partition

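/*

The same information is also available from the Kafka command line:

bin/kafka-topics.sh --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --describe

The output lists the Leader, Replicas and Isr of every partition.

*/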

After the leader broker was shut down, the producer and all consumer processes kept printing the following:

[2021-09-23 17:09:53,495] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] 1 partitions have leader brokers without a matching listener, including [baidd-0] (org.apache.kafka.clients.NetworkClient)

Messages could be neither sent nor consumed.

2.1.2 Simulating a shutdown of a non-leader broker

The consumer process sometimes reported:

[2021-09-23 17:21:22,480] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] Connection to node 2147483645 (/192.168.144.253:9193) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

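While these errors were appearing, messages could still be produced normally, but the data produced during that period could not be consumed.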
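The node ID in this warning, 2147483645, is not a real broker ID. The Java client opens its connection to a consumer group's coordinator under the ID Integer.MAX_VALUE (2147483647) minus the coordinator's broker ID, so this warning points at the group coordinator on broker 2 (192.168.144.253), not at a partition leader. By the same rule, node 2147483647 in the warning in 2.2 is broker 0. A minimal Python sketch of the decoding, assuming this Java-client convention; the function names are illustrative:

```python
import re
from typing import Optional, Tuple

JAVA_INT_MAX = 2**31 - 1  # Integer.MAX_VALUE = 2147483647

# Matches "Connection to node <id> (<host>/<ip>:<port>)"; the host part before "/" may be empty.
CONNECTION_WARNING = re.compile(r"Connection to node (-?\d+) \([^/]*/([^:]+):(\d+)\)")


def coordinator_broker_id(node_id: int) -> int:
    """Assumes node_id was formed as Integer.MAX_VALUE - <coordinator broker id>."""
    return JAVA_INT_MAX - node_id


def parse_connection_warning(line: str) -> Optional[Tuple[int, str, int]]:
    """Extract (node_id, ip, port) from a NetworkClient warning, or None if absent."""
    m = CONNECTION_WARNING.search(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3))


if __name__ == "__main__":
    line = ("WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] "
            "Connection to node 2147483645 (/192.168.144.253:9193) could not be "
            "established. Broker may not be available.")
    node_id, ip, port = parse_connection_warning(line)
    print(coordinator_broker_id(node_id), ip, port)  # 2 192.168.144.253 9193
```

This is why stopping a broker that leads none of the topic's partitions can still break consumption: if it hosts the group's coordinator and nothing else can take over, the group stalls.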

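2.1.3 Summary

When partitions have only one replica (replicas = 1), stopping any single broker affects the business.

If the broker hosting a partition's leader goes down, both producing and consuming are affected.

If a non-leader broker goes down, consuming is affected.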

2.2 Partitions with multiple replicas

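With no other replicas, it is understandable that the business is affected. So we configured multiple replicas for a topic and found, surprisingly, that the business was still affected: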

#Create a topic with three replicas

bin/kafka-topics.sh --create --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --replication-factor 3 --partitions 1 --topic song

#Check the replica information

[root@Centos7-Mode-V8 kafka]# bin/kafka-topics.sh  --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --describe --topic song

Topic:song PartitionCount:1 ReplicationFactor:3 Configs:

Topic: song Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1

#Send messages

bin/kafka-console-producer.sh --broker-list 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song

#Consumer process 1

bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song --group g1

#Consumer process 2

bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song --group g2

#Simulate shutting down the topic's leader broker

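Messages could still be produced, and the "1 partitions have leader brokers without a matching listener" error no longer appeared. However, after the consumers lost their connection to the stopped leader broker, they sometimes reported: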

[2021-09-24 19:01:06,316] WARN [Consumer clientId=consumer-1, groupId=console-consumer-27609] Connection to node 2147483647 (/192.168.144.247:9193) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

Data produced during this period sometimes never arrived: messages produced while the broker was down could not be consumed.

So why does a broker failure still lose messages even when the topic has multiple replicas?

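Answer: the internal __consumer_offsets topic had only one replica. Each consumer group is managed by the coordinator on the broker that leads the group's __consumer_offsets partition, so with a single replica, losing that broker leaves the group without a coordinator. As a result, even a topic with multiple replicas cannot be highly available.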

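#After we increased the replicas of Kafka's built-in __consumer_offsets topic, ordinary topics became highly available. Stopping a broker still logs "Broker may not be available", but the business is no longer affected.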
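Which __consumer_offsets partition serves a group (and therefore which broker acts as its coordinator) is decided by hashing the group ID: Kafka's broker computes abs(groupId.hashCode()) % offsets.topic.num.partitions, and the partition count defaults to 50, matching the 50 partitions listed in 4.3. A minimal Python sketch that reproduces Java's String.hashCode, assuming the group ID contains only BMP characters (true for ASCII IDs such as g1) and the default 50 partitions:

```python
def java_string_hashcode(s: str) -> int:
    """Java String.hashCode(): s[0]*31^(n-1) + ... + s[n-1] as a signed 32-bit int."""
    h = 0
    for ch in s:
        # Assumes one Python character == one UTF-16 code unit (BMP only).
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def kafka_abs(n: int) -> int:
    # Kafka's Utils.abs maps Integer.MIN_VALUE to 0 instead of overflowing.
    return 0 if n == -(1 << 31) else abs(n)


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """__consumer_offsets partition that stores this group's offsets."""
    return kafka_abs(java_string_hashcode(group_id)) % num_partitions


if __name__ == "__main__":
    for group in ("g1", "g2"):
        print(group, "->", offsets_partition_for(group))  # g1 -> 42, g2 -> 43
```

With the partition number known, `kafka-topics.sh --describe --topic __consumer_offsets` shows which brokers hold that partition; if Replicas lists a single broker, stopping it stalls the group.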

3 Root Cause

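The Kafka configuration files did not set default.replication.factor=3. This parameter defaults to 1, meaning no additional replicas, so topics created with the default were effectively single points of failure.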
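To find such single points of failure in an existing cluster, the `kafka-topics.sh --describe` output (format as in 2.2) can be scanned for partitions whose Replicas list has one entry. A minimal Python sketch, assuming the "Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ..." per-partition line format shown above; the function names are hypothetical:

```python
import re
from typing import List, NamedTuple, Set

# Per-partition lines, e.g. "Topic: song Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1".
# The per-topic summary line ("Topic:song PartitionCount:1 ...") does not match.
PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(-?\d+|none)\s+"
    r"Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)"
)


class PartitionInfo(NamedTuple):
    topic: str
    partition: int
    leader: int  # -1 when the partition currently has no leader
    replicas: List[int]
    isr: List[int]


def _ids(csv: str) -> List[int]:
    return [int(x) for x in csv.split(",") if x]


def parse_describe(output: str) -> List[PartitionInfo]:
    parts = []
    for line in output.splitlines():
        m = PARTITION_LINE.search(line)
        if m is None:
            continue
        topic, partition, leader, replicas, isr = m.groups()
        parts.append(PartitionInfo(topic, int(partition),
                                   -1 if leader == "none" else int(leader),
                                   _ids(replicas), _ids(isr)))
    return parts


def single_replica(parts: List[PartitionInfo]) -> List[PartitionInfo]:
    """Partitions with no redundancy at all."""
    return [p for p in parts if len(p.replicas) < 2]


def offline_if_down(parts: List[PartitionInfo], down: Set[int]) -> List[PartitionInfo]:
    """Partitions left without any in-sync replica if the brokers in `down` stop.
    Assumes unclean leader election is disabled, so only ISR members can take over."""
    return [p for p in parts if not set(p.isr) - down]


if __name__ == "__main__":
    # The __consumer_offsets line is a hypothetical single-replica partition.
    sample = """Topic:song PartitionCount:1 ReplicationFactor:3 Configs:
Topic: song Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
Topic: __consumer_offsets Partition: 42 Leader: 2 Replicas: 2 Isr: 2"""
    parts = parse_describe(sample)
    print([(p.topic, p.partition) for p in single_replica(parts)])
    print([(p.topic, p.partition) for p in offline_if_down(parts, {2})])
```

Running it against the describe output of all topics, including __consumer_offsets, shows which partitions would go offline if a given broker stopped.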

4 Solution

4.1 Change the default.replication.factor parameter

Edit the configuration file on every Kafka broker and raise the default replication factor for topics (the parameter defaults to 1):

default.replication.factor=3

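default.replication.factor applies to automatically created topics. The replication factor of __consumer_offsets is controlled separately by offsets.topic.replication.factor; its built-in default is 3, but the sample server.properties shipped with Kafka sets it to 1.

Note: do not set default.replication.factor=3 while keeping offsets.topic.replication.factor=1. The two parameters are independent, so __consumer_offsets would still get only one replica. Both take effect only when a topic is created; existing topics, including an existing __consumer_offsets, must be expanded as described in 4.2 and 4.3.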
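Because the two parameters are independent, it can help to check both in every broker's config file. A small Python sketch, assuming a plain key=value server.properties with # or ! comments, and falling back to Kafka's documented defaults (default.replication.factor=1, offsets.topic.replication.factor=3) when a key is absent; the function names and the target of 3 are illustrative:

```python
from typing import Dict, List

# Kafka's built-in defaults, used when a key is missing from the file.
BUILTIN_DEFAULTS = {
    "default.replication.factor": 1,
    "offsets.topic.replication.factor": 3,
}


def parse_properties(text: str) -> Dict[str, str]:
    """Simplified parser: handles key=value lines only (no ':' separators or continuations)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def replication_warnings(text: str, wanted: int = 3) -> List[str]:
    """List replication settings whose effective value is below `wanted`."""
    props = parse_properties(text)
    warnings = []
    for key, default in BUILTIN_DEFAULTS.items():
        value = int(props.get(key, default))
        if value < wanted:
            warnings.append(f"{key}={value} (< {wanted})")
    return warnings


if __name__ == "__main__":
    sample = "default.replication.factor=3\noffsets.topic.replication.factor=1\n"
    print(replication_warnings(sample))  # ['offsets.topic.replication.factor=1 (< 3)']
```

The target should not exceed the number of brokers; in this three-broker cluster, 3 is the maximum.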

#Restart Kafka so the configuration takes effect

systemctl restart kafka

4.2 Add replicas to existing ordinary topics

See https://blog.csdn.net/yabingshi_tech/article/details/120443647

4.3 Add replicas to __consumer_offsets

The method is the same as above; the JSON file is as follows:

{
    "version": 1, 
    "partitions": [
        {
            "topic": "__consumer_offsets", 
            "partition": 0, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 1, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 2, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 3, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 4, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 5, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 6, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 7, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 8, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 9, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 10, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 11, 
            "replicas": [
                0, 
                1, 
                2
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 12, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 13, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 14, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 15, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 16, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 17, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 18, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 19, 
            "replicas": [
                0, 
                1, 
                2                
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 20, 
            "replicas": [
                0, 
                1, 
                2                
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 21, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 22, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 23, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 24, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 25, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 26, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 27, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 28, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 29, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 30, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 31, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 32, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 33, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 34, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 35, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 36, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 37, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 38, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 39, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 40, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 41, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 42, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 43, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 44, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 45, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 46, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 47, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 48, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 49, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        }
    ]
}
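Rather than writing the 50 entries by hand, the file can be generated. A minimal Python sketch, assuming broker IDs 0, 1, 2 and the default 50 partitions. It rotates the replica list by partition % 3, so its assignment differs from the hand-written file above (which rotates in blocks of three partitions), but it likewise gives every broker a share of preferred leaders. The output file name reassign.json is hypothetical; the file would then be passed to kafka-reassign-partitions.sh with --reassignment-json-file and --execute.

```python
import json
from typing import List


def build_reassignment(topic: str, num_partitions: int, brokers: List[int]) -> dict:
    """Place every partition on all brokers, rotating the order so the first
    (preferred leader) replica is spread evenly across brokers."""
    n = len(brokers)
    partitions = []
    for p in range(num_partitions):
        shift = p % n
        replicas = brokers[shift:] + brokers[:shift]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    plan = build_reassignment("__consumer_offsets", 50, [0, 1, 2])
    # Redirect stdout to a file, e.g. `python gen.py > reassign.json`.
    print(json.dumps(plan, indent=4))
```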

--This article draws on: Kafka突然宕机了?稳住,莫慌! ("Kafka suddenly went down? Stay calm, don't panic!")
