1 Problem Description

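A colleague reported that when one server in our three-node Kafka cluster went down, the business was affected: messages could no longer be produced or consumed. The application logged: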

WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] 1 partitions have leader brokers without a matching listener, including [baidd-0] (org.apache.kafka.clients.NetworkClient)

2 Reproducing the Failure

2.1 Partitions with a replication factor of 1

#Produce messages

[root@Centos7-Mode-V8 kafka]# bin/kafka-console-producer.sh --broker-list 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic baidd

>aa

>bb

#Under normal conditions the consumer receives the messages:

[root@Centos7-Mode-V8 kafka]# bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic baidd

aa

bb

2.1.1 Shutting down the node that hosts the topic's partition leader

#Use Kafka Tool to find which node hosts the leader of the topic's partition
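If Kafka Tool is not available, the leader can also be read from the output of kafka-topics.sh --describe (the command used in 2.2). The following is a minimal sketch: the function name leaders_from_describe is hypothetical, and it assumes the "Topic: ... Partition: N Leader: M" layout shown in 2.2.

```shell
#!/bin/bash
# Hypothetical helper: read "kafka-topics.sh --describe" output on stdin and
# print the leader broker of every partition line it finds.
leaders_from_describe() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "Topic:") topic = $(i + 1)
            if ($i == "Partition:") part = $(i + 1)
            if ($i == "Leader:") print topic "-" part " leader: broker " $(i + 1)
        }
    }'
}

# Example usage (ZooKeeper address taken from this article):
# bin/kafka-topics.sh --zookeeper 192.168.144.247:3292 --describe --topic baidd | leaders_from_describe
```

The topic header line (Topic:song PartitionCount:1 ...) contains none of the matched field names, so only partition lines produce output.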

After shutting down the leader node, the producer and all consumer processes kept printing the following message:

[2021-09-23 17:09:53,495] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] 1 partitions have leader brokers without a matching listener, including [baidd-0] (org.apache.kafka.clients.NetworkClient)

Messages could neither be sent nor consumed.

2.1.2 Shutting down a non-leader node

The consumer process sometimes reported:

[2021-09-23 17:21:22,480] WARN [Consumer clientId=consumer-1, groupId=console-consumer-55928] Connection to node 2147483645 (/192.168.144.253:9193) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

While this error persisted, messages could still be produced normally, but the data produced during that period could not be consumed.
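The node ID in this warning (2147483645) is not a broker ID. The Java consumer refers to its group coordinator by a synthetic node ID equal to Integer.MAX_VALUE (2147483647) minus the coordinator's broker ID, so that it can keep a separate connection to the coordinator. Subtracting recovers the broker. This is a small sketch, and the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: map the coordinator node ID printed by the Java consumer
# back to a broker ID (coordinator node ID = 2147483647 - broker ID).
coordinator_broker_id() {
    echo $(( 2147483647 - $1 ))
}

coordinator_broker_id 2147483645   # node ID from the 2.1.2 warning, prints 2
coordinator_broker_id 2147483647   # node ID from the 2.2 warning, prints 0
```

Here 2147483645 maps to broker 2 (192.168.144.253 in the log). This suggests that the node stopped in 2.1.2 was acting as the consumer group's coordinator, which would explain why consumption was affected even though it was not the topic's leader.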

2.1.3 Summary

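With only one replica per partition, stopping any single node affects the business.

Specifically, when the node hosting a partition's leader goes down, both producing and consuming are affected.

When a non-leader node goes down, consuming is affected. This is likely because that node hosts the consumer group's coordinator, that is, the leader of the single-replica __consumer_offsets partition that stores the group's offsets (see 2.2).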

2.2 Partitions with multiple replicas

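When a partition has no other replicas, it is understandable that losing it affects the business. So I configured multiple replicas for the topic and found, surprisingly, that the business was still affected: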

#Create a topic with three replicas

bin/kafka-topics.sh --create --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --replication-factor 3 --partitions 1 --topic song

#Inspect the replica assignment

[root@Centos7-Mode-V8 kafka]# bin/kafka-topics.sh  --zookeeper 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292 --describe --topic song

Topic:song PartitionCount:1 ReplicationFactor:3 Configs:

Topic: song Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
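The same describe output can also reveal partitions elsewhere in the cluster that have fewer replicas than intended, including partitions of __consumer_offsets. This is a minimal sketch. It assumes the "Partition: N ... Replicas: a,b,c" layout shown above, and the function name replicas_below is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: read "kafka-topics.sh --describe" output on stdin and
# report every partition whose replica list is shorter than the wanted count.
replicas_below() {
    local want="$1"
    awk -v want="$want" '{
        for (i = 1; i < NF; i++) {
            if ($i == "Topic:") topic = $(i + 1)
            if ($i == "Partition:") part = $(i + 1)
            if ($i == "Replicas:") {
                n = split($(i + 1), ids, ",")   # count the comma-separated broker IDs
                if (n < want) print topic "-" part " has " n " replica(s)"
            }
        }
    }'
}

# Example usage (omitting --topic describes every topic in the cluster):
# bin/kafka-topics.sh --zookeeper 192.168.144.247:3292 --describe | replicas_below 3
```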

#Send messages

bin/kafka-console-producer.sh --broker-list 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song

#Consumer process 1

bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song --group g1

#Consumer process 2

bin/kafka-console-consumer.sh --bootstrap-server 192.168.144.247:9193,192.168.144.251:9193,192.168.144.253:9193 --topic song --group g2

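#Simulate shutting down the node that hosts the topic's partition leader

Messages could still be produced, and the "1 partitions have leader brokers without a matching listener" error no longer appeared. However, after losing their connection to the topic leader, the consumers sometimes reported: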

[2021-09-24 19:01:06,316] WARN [Consumer clientId=consumer-1, groupId=console-consumer-27609] Connection to node 2147483647 (/192.168.144.247:9193) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

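Messages produced during this period sometimes did not arrive, and the consumers could not consume the messages produced while the node was down.

So why, even with multiple replicas, do messages still go missing when a node goes down?

Answer: __consumer_offsets had only one replica. This internal topic stores the consumer groups' committed offsets, and the leader of the __consumer_offsets partition assigned to a group acts as that group's coordinator. When the broker holding that single replica goes down, the group loses its coordinator and can no longer fetch or commit offsets. Consumption therefore stalls even though the business topic itself has three replicas.

#After the replicas of Kafka's built-in __consumer_offsets topic were expanded, the normal topics became highly available.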
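You can work out which broker a given group depends on. Kafka places a group's offsets in the __consumer_offsets partition abs(groupId.hashCode()) % offsets.topic.num.partitions (50 by default), and that partition's leader is the group's coordinator. The following is a minimal bash sketch. It assumes ASCII group IDs, because Java's String.hashCode works on UTF-16 code units, and it assumes the default 50 partitions. The function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: compute the __consumer_offsets partition that holds a
# consumer group's offsets, mirroring abs(groupId.hashCode()) % partitions.
# Assumes an ASCII group ID; the partition count defaults to 50.
offsets_partition_for() {
    local group="$1" partitions="${2:-50}" h=0 i code
    for (( i = 0; i < ${#group}; i++ )); do
        printf -v code '%d' "'${group:i:1}"        # character code of the i-th char
        h=$(( (h * 31 + code) & 0xFFFFFFFF ))      # Java int arithmetic wraps at 32 bits
    done
    if (( h >= 0x80000000 )); then h=$(( h - 0x100000000 )); fi   # reinterpret as signed int
    # abs(), with Integer.MIN_VALUE mapped to 0 as Kafka's Utils.abs does
    if (( h == -2147483648 )); then h=0; elif (( h < 0 )); then h=$(( -h )); fi
    echo $(( h % partitions ))
}

offsets_partition_for g1   # prints 42
offsets_partition_for g2   # prints 43
```

To see which broker that is, describe __consumer_offsets and read the Leader of that partition. With a single replica, losing that broker stalls the whole group.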

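3 Root Cause

default.replication.factor was not set in the Kafka configuration file. This parameter defaults to 1, so topics created without an explicit replication factor (for example, auto-created topics) each had a single replica, which made every partition a single point of failure. In addition, as found in 2.2, the internal __consumer_offsets topic also had only one replica.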
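To check what a broker's configuration file actually sets, the sketch below prints the effective value of both replication settings from a server.properties file. When a key is absent, it falls back to the documented broker defaults: 1 for default.replication.factor and 3 for offsets.topic.replication.factor. The function names are hypothetical, it only handles key=value lines, and the path in the usage comment is an assumption based on the install directory used by the script later in this article.

```shell
#!/bin/bash
# Hypothetical helper: print the last value assigned to key $2 in properties
# file $1 (empty if the key is absent or only appears commented out).
prop_value() {
    awk -F= -v k="$2" '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == k && NF > 1 {
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            val = v
        }
        END { print val }' "$1"
}

# Hypothetical helper: report both settings, falling back to broker defaults.
report_replication_config() {
    local file="$1" v
    v=$(prop_value "$file" default.replication.factor)
    echo "default.replication.factor=${v:-1}"          # broker default: 1
    v=$(prop_value "$file" offsets.topic.replication.factor)
    echo "offsets.topic.replication.factor=${v:-3}"    # broker default: 3
}

# Example usage (the path is an assumption):
# report_replication_config /usr/local/kafka/config/server.properties
```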

4 Solution

  • Edit the Kafka configuration file and raise the default replication factor for topics (it defaults to 1):

default.replication.factor=3

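Note that default.replication.factor does not control __consumer_offsets. The replica count of __consumer_offsets comes from the separate broker setting offsets.topic.replication.factor, which does not follow default.replication.factor and has its own default of 3.

Do not set offsets.topic.replication.factor=1. The sample server.properties shipped with Kafka sets exactly that, and it is this value, not default.replication.factor, that decides how many replicas __consumer_offsets gets. Set it to 3 as well:

offsets.topic.replication.factor=3

Both settings only take effect when a topic is created. Existing topics, including an already-created __consumer_offsets, keep their current replication factor, so their replicas must be expanded as described below.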

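#Restart Kafka

  • Expand the replicas of existing normal topics

See https://blog.csdn.net/yabingshi_tech/article/details/120443647

  • Expand the replicas of __consumer_offsets

Use the same method as above, with the following JSON file (it covers the default 50 partitions, 0 to 49, and assumes broker IDs 0, 1 and 2):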

{
    "version": 1, 
    "partitions": [
        {
            "topic": "__consumer_offsets", 
            "partition": 0, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 1, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 2, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 3, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 4, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 5, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 6, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 7, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 8, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 9, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 10, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 11, 
            "replicas": [
                0, 
                1, 
                2
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 12, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 13, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 14, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 15, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 16, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 17, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 18, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 19, 
            "replicas": [
                0, 
                1, 
                2                
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 20, 
            "replicas": [
                0, 
                1, 
                2                
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 21, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 22, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 23, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 24, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 25, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 26, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 27, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 28, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 29, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 30, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 31, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 32, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 33, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 34, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 35, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 36, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 37, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 38, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 39, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 40, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 41, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 42, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 43, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 44, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 45, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 46, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 47, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 48, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 49, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        }
    ]
}
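Writing 50 entries by hand is error-prone, so the file can be generated instead. The sketch below assumes broker IDs 0, 1 and 2 and takes the partition count as an argument. Unlike the hand-written file, which rotates the preferred leader in blocks, it rotates the first replica as partition % 3; every replica list still contains all three brokers. The function name gen_reassign_json is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print a reassignment JSON giving every partition of a
# topic three replicas on brokers 0, 1 and 2 (assumed broker IDs), with the
# preferred leader (first replica) rotated as partition % 3.
gen_reassign_json() {
    local topic="$1" partitions="$2" p sep
    echo '{'
    echo '    "version": 1,'
    echo '    "partitions": ['
    for (( p = 0; p < partitions; p++ )); do
        if (( p == partitions - 1 )); then sep=''; else sep=','; fi   # no comma after the last entry
        printf '        {"topic": "%s", "partition": %d, "replicas": [%d, %d, %d]}%s\n' \
            "$topic" "$p" $(( p % 3 )) $(( (p + 1) % 3 )) $(( (p + 2) % 3 )) "$sep"
    done
    echo '    ]'
    echo '}'
}

# Example: produce the file passed to kafka-reassign-partitions.sh --reassignment-json-file
# gen_reassign_json __consumer_offsets 50 > __consumer_offsets.json
```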

--Script to expand the replicas of the topics above in batch

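#!/bin/bash
#This script sets the replication factor of the existing topics in the Kafka cluster to 3.
#It assumes broker IDs 0, 1 and 2 and a Kafka installation in /usr/local/kafka.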

if [[ -z "$1" ]]
then
    echo 'Pass the ZooKeeper address of the Kafka cluster when calling this script, in the form ip1:port,ip2:port,ip3:port. Example:'
    echo 'bash expand_replica.sh 192.168.144.247:3292,192.168.144.251:3292,192.168.144.253:3292'
    exit 1
else
    echo '1. Creating a directory for the replica-expansion JSON files...'
    mkdir -p /opt/kafka_json
    cd /opt/kafka_json || exit 1

    echo '2. Expanding the replicas of normal topics...'
    #List the normal topics
    str1=$(/usr/local/kafka/bin/kafka-topics.sh --zookeeper "$1" --list | grep -v __consumer_offsets)
    echo "$str1" > topic.txt

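    #Generate a replica-expansion JSON file for each topic
    for TopicName in $(cat /opt/kafka_json/topic.txt)
    do
        echo ' '
        echo "Expanding the replicas of $TopicName (if \"Successfully started reassignment of partitions\" appears below, the reassignment was started)..."
        #Read the partition count so that every partition is reassigned, not only partition 0
        PartitionCount=$(/usr/local/kafka/bin/kafka-topics.sh --zookeeper "$1" --describe --topic "$TopicName" | grep -o 'PartitionCount: *[0-9]*' | grep -o '[0-9][0-9]*$')
        #Write the JSON file: each partition gets replicas on brokers 0, 1 and 2,
        #with the preferred leader rotated across the brokers
        {
            echo '{"version": 1, "partitions": ['
            for (( p = 0; p < PartitionCount; p++ ))
            do
                if [ "$p" -gt 0 ]; then echo ','; fi
                echo "{\"topic\": \"$TopicName\", \"partition\": $p, \"replicas\": [$(( p % 3 )), $(( (p + 1) % 3 )), $(( (p + 2) % 3 ))]}"
            done
            echo ']}'
        } > "${TopicName}.json"

        #Expand the replicas of the normal topic
        /usr/local/kafka/bin/kafka-reassign-partitions.sh --zookeeper "$1" --reassignment-json-file "${TopicName}.json" --execute
    done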

    echo '3. Expanding the replicas of __consumer_offsets...'
    #Generate the JSON file
    cat>__consumer_offsets.json<<EOF
{
    "version": 1, 
    "partitions": [
        {
            "topic": "__consumer_offsets", 
            "partition": 0, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 1, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 2, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 3, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 4, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 5, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 6, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 7, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 8, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 9, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 10, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 11, 
            "replicas": [
                0, 
                1, 
                2
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 12, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 13, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 14, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 15, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 16, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 17, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 18, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 19, 
            "replicas": [
                0, 
                1, 
                2                
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 20, 
            "replicas": [
                0, 
                1, 
                2                
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 21, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 22, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 23, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 24, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 25, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 26, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 27, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 28, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 29, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 30, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 31, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 32, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 33, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 34, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 35, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 36, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 37, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 38, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 39, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 40, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 41, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 42, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 43, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 44, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 45, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 46, 
            "replicas": [
                0, 
                1, 
                2 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 47, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 48, 
            "replicas": [
                1, 
                2, 
                0 
            ]
        },
        {
            "topic": "__consumer_offsets", 
            "partition": 49, 
            "replicas": [
                2, 
                0, 
                1 
            ]
        }
    ]
}
EOF
    
    #Expand the replicas
    /usr/local/kafka/bin/kafka-reassign-partitions.sh --zookeeper "$1" --reassignment-json-file __consumer_offsets.json --execute

fi

--This article draws on the article "Kafka突然宕机了?稳住,莫慌!" ("Kafka suddenly went down? Stay calm, don't panic!").
