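Today we ran into a problem on a project: a web service that had been running for a while lost its ZooKeeper connection. We traced it to a write to a znode whose data was too large (a little over 1 MB), while ZooKeeper's default limit is about 1 MB. That caused the session to drop, followed by a series of related problems. The root cause is mainly tied to one JVM system property: jute.maxbuffer.
The zk client in use is org.I0Itec.zkclient.ZkClient.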

Error 1. The client reports java.io.IOException: Packet len1830457 is out of range!

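Clearly the packet is too large and exceeds the limit.
Solution: increase the value of this JVM property. Either of the following two approaches works: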

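1) In code: during project initialization, before connecting to zk, add:
// Raise to 4 MB; choose the actual size for your own workload
System.setProperty("jute.maxbuffer", String.valueOf(1024 * 4096));
2) Add the -Djute.maxbuffer=4194304 option to the startup command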

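For example (the -D option must come before -jar; after it, it would be passed to the application as a program argument instead): nohup /usr/local/jdk/bin/java -Djute.maxbuffer=4194304 -jar xxx.jar &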
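A minimal sketch of approach 1), assuming the value should only be applied when approach 2) has not already supplied one on the command line. `JuteMaxBufferConfig` and `ensureMaxBuffer` are hypothetical names; only the `jute.maxbuffer` property and the JDK's `System.getProperty`/`System.setProperty` come from the text. The ZooKeeper client may read the property only once (when its classes or configuration objects are created), so the call is placed before any ZkClient is constructed.

```java
// Hypothetical helper: sets jute.maxbuffer before any ZooKeeper/ZkClient object is created.
final class JuteMaxBufferConfig {
    static final String JUTE_MAXBUFFER = "jute.maxbuffer";
    static final int FOUR_MB = 4 * 1024 * 1024; // 4194304 bytes, same as 1024 * 4096

    private JuteMaxBufferConfig() {}

    /**
     * Sets jute.maxbuffer to {@code bytes} unless a value was already supplied
     * (for example with -Djute.maxbuffer=... on the command line).
     * Returns the value in effect afterwards.
     */
    static int ensureMaxBuffer(int bytes) {
        if (bytes <= 0) {
            throw new IllegalArgumentException("jute.maxbuffer must be positive: " + bytes);
        }
        String existing = System.getProperty(JUTE_MAXBUFFER);
        if (existing == null || existing.trim().isEmpty()) {
            System.setProperty(JUTE_MAXBUFFER, Integer.toString(bytes));
            return bytes;
        }
        // Keep the command-line value; Integer.parseInt throws if it is not a number.
        return Integer.parseInt(existing.trim());
    }

    public static void main(String[] args) {
        int effective = ensureMaxBuffer(FOUR_MB);
        System.out.println("jute.maxbuffer in effect: " + effective);
        // Only after this point create the client, e.g.
        // ZkClient zkClient = new ZkClient("127.0.0.1:2181");
        // (ZkClient dependency not included, to keep the sketch self-contained)
    }
}
```

Started with `-Djute.maxbuffer=8388608`, the helper keeps 8388608, so the command-line flag from approach 2) still takes precedence over the in-code default.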

Error 2. The client reports java.io.IOException: Broken pipe

The full error output is as follows:

2022-02-10 15:51:53.380  INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn          : Unable to read additional data from server sessionid 0x100000057630138, likely server has closed socket, closing socket connection and attempting reconnect
2022-02-10 15:51:53.480  INFO 108020 --- [main-EventThread] org.I0Itec.zkclient.ZkClient             : zookeeper state changed (Disconnected)
2022-02-10 15:51:53.480  INFO 108020 --- [taskScheduler-1] org.I0Itec.zkclient.ZkClient             : Waiting for keeper state SyncConnected
2022-02-10 15:51:53.480  INFO 108020 --- [org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1] org.I0Itec.zkclient.ZkClient             : Waiting for keeper state SyncConnected
2022-02-10 15:51:53.480  INFO 108020 --- [https-jsse-nio-8441-exec-3] org.I0Itec.zkclient.ZkClient             : Waiting for keeper state SyncConnected
2022-02-10 15:51:53.480  INFO 108020 --- [https-jsse-nio-8441-exec-10] org.I0Itec.zkclient.ZkClient             : Waiting for keeper state SyncConnected
2022-02-10 15:51:54.303  INFO 108020 --- [myThreadPool thread:1] c.s.dbsec.commons.utils.ProcessUtils     : 22/02/10 15:51:54 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2022-02-10 15:51:55.034  INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn          : Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2022-02-10 15:51:55.034  INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn          : Socket connection established, initiating session, client: /127.0.0.1:59100, server: localhost/127.0.0.1:2181
2022-02-10 15:51:55.035  INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn          : Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100000057630138, negotiated timeout = 30000
2022-02-10 15:51:55.035  INFO 108020 --- [main-EventThread] org.I0Itec.zkclient.ZkClient             : zookeeper state changed (SyncConnected)
2022-02-10 15:51:55.037  WARN 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn          : Session 0x100000057630138 for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect


java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_221]
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_221]
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_221]
        at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_221]
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) ~[na:1.8.0_221]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:123) ~[zookeeper-3.5.7.jar!/:3.5.7]
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363) ~[zookeeper-3.5.7.jar!/:3.5.7]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) ~[zookeeper-3.5.7.jar!/:3.5.7]
2022-02-10 15:51:55.138  INFO 108020 --- [main-EventThread] org.I0Itec.zkclient.ZkClient             : zookeeper state changed (Disconnected)
2022-02-10 15:51:55.138  INFO 108020 --- [org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1] org.I0Itec.zkclient.ZkClient             : Waiting for keeper state SyncConnected
2022-02-10 15:51:55.138  INFO 108020 --- [taskScheduler-1] org.I0Itec.zkclient.ZkClient             : Waiting for keeper state SyncConnected
2022-02-10 15:51:55.138  INFO 108020 --- [https-jsse-nio-8441-exec-3] org.I0Itec.zkclient.ZkClient             : Waiting for keeper state SyncConnected
2022-02-10 15:51:55.138  INFO 108020 --- [https-jsse-nio-8441-exec-10] org.I0Itec.zkclient.ZkClient             : Waiting for keeper state SyncConnected
2022-02-10 15:51:56.487  INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn          : Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2022-02-10 15:51:56.487  INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn          : Socket connection established, initiating session, client: /127.0.0.1:59102, server: localhost/127.0.0.1:2181
2022-02-10 15:51:56.487  INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn          : Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100000057630138, negotiated timeout = 30000
2022-02-10 15:51:56.488  INFO 108020 --- [main-EventThread] org.I0Itec.zkclient.ZkClient             : zookeeper state changed (SyncConnected)
2022-02-10 15:51:56.489  INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn          : Unable to read additional data from server sessionid 0x100000057630138, likely server has closed socket, closing socket connection and attempting reconnect
2022-02-10 15:51:56.590  INFO 108020 --- [main-EventThread] org.I0Itec.zkclient.ZkClient             : zookeeper state changed (Disconnected)
2022-02-10 15:51:56.590  INFO 108020 --- [taskScheduler-1] org.I0Itec.zkclient.ZkClient             : Waiting for keeper state SyncConnected
2022-02-10 15:51:56.590  INFO 108020 --- [org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1] org.I0Itec.zkclient.ZkClient             : Waiting for keeper state SyncConnected

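The error output only shows that the socket was closed and that the client kept reconnecting without success. Opinions online vary, but they roughly fall into these categories:
1. Network problems: network instability drops the session (firewall, port 2181 unreachable, etc.)
2. The ZooKeeper server responds too slowly: a request takes so long that the session timeout is reached, and by the time the server answers, the session has already expired
3. The ZooKeeper service has stopped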

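In our case the zk client and server ran on the same machine, the ZooKeeper service was healthy, and ZooKeeper's bundled zkCli.sh client could connect to it, which ruled out hypotheses 1 and 3. Given our workload and the logs, hypothesis 2 was also very unlikely: the client log shows each reconnect completing session establishment before the socket is closed again. So we turned to the ZooKeeper server log.
The server log was as follows:
[Screenshot: zkServerLog]
The code that throws the exception:
[Screenshot: error location]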
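The screenshots above are not reproduced in this text. As a stand-in, here is a simplified, stdlib-only model of the kind of check involved: a reader of 4-byte length-prefixed frames that rejects any frame longer than a configured maximum. It is not ZooKeeper's code; `LengthPrefixedReader`, `readFrame`, `frame`, and the exception message are hypothetical. It also suggests one plausible reading of the client log: each reconnect completes session establishment, ZkClient (whose threads are "Waiting for keeper state SyncConnected") presumably retries the oversized request, the server rejects it and closes the socket, and the client's next write fails with Broken pipe.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Simplified model (NOT ZooKeeper's implementation) of a length-prefixed frame reader
// that refuses frames larger than a configured maximum.
final class LengthPrefixedReader {
    private final int maxBuffer;

    LengthPrefixedReader(int maxBuffer) {
        this.maxBuffer = maxBuffer;
    }

    /** Reads one frame: a big-endian int length followed by that many bytes. */
    byte[] readFrame(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0 || len > maxBuffer) {
            // A server would typically log this and close the socket;
            // the peer then fails on its next read or write.
            throw new IOException("frame of " + len + " bytes exceeds maximum " + maxBuffer);
        }
        byte[] body = new byte[len];
        in.readFully(body);
        return body;
    }

    /** Encodes a body as a length-prefixed frame. */
    static byte[] frame(byte[] body) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(body.length);
        out.write(body);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        LengthPrefixedReader reader = new LengthPrefixedReader(0xfffff); // ~1 MB default from the article
        byte[] big = new byte[1830457]; // size taken from Error 1, for illustration
        try {
            reader.readFrame(new DataInputStream(new ByteArrayInputStream(frame(big))));
            System.out.println("accepted");
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the limit raised to 4194304, the same 1830457-byte frame is accepted, which mirrors why raising jute.maxbuffer on the server stops the disconnect loop.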
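It is the familiar jute.maxbuffer setting again. Its default is 0xfffff, i.e. 1048575 bytes (2^20 - 1), roughly 1 MB, so the ZooKeeper server also defaults to about 1 MB. The server-side value can be set by editing xxx/zookeeper/bin/zkServer.sh. The modified script is as follows:
(A custom variable, JUTE_MAXBUFFER, has been added to the script; search for that name to find the changes.)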
zkServer.sh

#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

#
# If this scripted is run out of /usr/bin or some other system bin directory
# it should be linked to and not copied. Things like java jar files are found
# relative to the canonical path of this script.
#

# Set to 4 MB (4 * 1024 * 1024 bytes)
JUTE_MAXBUFFER="-Djute.maxbuffer=4194304"

# use POSIX interface, symlink is followed automatically
ZOOBIN="${BASH_SOURCE-$0}"
ZOOBIN="$(dirname "${ZOOBIN}")"
ZOOBINDIR="$(cd "${ZOOBIN}"; pwd)"

if [ -e "$ZOOBIN/../libexec/zkEnv.sh" ]; then
  . "$ZOOBINDIR"/../libexec/zkEnv.sh
else
  . "$ZOOBINDIR"/zkEnv.sh
fi

# See the following page for extensive details on setting
# up the JVM to accept JMX remote management:
# http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# by default we allow local JMX connections
if [ "x$JMXLOCALONLY" = "x" ]
then
    JMXLOCALONLY=false
fi

if [ "x$JMXDISABLE" = "x" ] || [ "$JMXDISABLE" = 'false' ]
then
  echo "ZooKeeper JMX enabled by default" >&2
  if [ "x$JMXPORT" = "x" ]
  then
    # for some reason these two options are necessary on jdk6 on Ubuntu
    #   accord to the docs they are not necessary, but otw jconsole cannot
    #   do a local attach
    ZOOMAIN="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=$JMXLOCALONLY org.apache.zookeeper.server.quorum.QuorumPeerMain"
  else
    if [ "x$JMXAUTH" = "x" ]
    then
      JMXAUTH=false
    fi
    if [ "x$JMXSSL" = "x" ]
    then
      JMXSSL=false
    fi
    if [ "x$JMXLOG4J" = "x" ]
    then
      JMXLOG4J=true
    fi
    echo "ZooKeeper remote JMX Port set to $JMXPORT" >&2
    echo "ZooKeeper remote JMX authenticate set to $JMXAUTH" >&2
    echo "ZooKeeper remote JMX ssl set to $JMXSSL" >&2
    echo "ZooKeeper remote JMX log4j set to $JMXLOG4J" >&2
    ZOOMAIN="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=$JMXPORT -Dcom.sun.management.jmxremote.authenticate=$JMXAUTH -Dcom.sun.management.jmxremote.ssl=$JMXSSL -Dzookeeper.jmx.log4j.disable=$JMXLOG4J org.apache.zookeeper.server.quorum.QuorumPeerMain"
  fi
else
    echo "JMX disabled by user request" >&2
    ZOOMAIN="org.apache.zookeeper.server.quorum.QuorumPeerMain"
fi

if [ "x$SERVER_JVMFLAGS" != "x" ]
then
    JVMFLAGS="$SERVER_JVMFLAGS $JVMFLAGS"
fi

if [ "x$2" != "x" ]
then
    ZOOCFG="$ZOOCFGDIR/$2"
fi

# if we give a more complicated path to the config, don't screw around in $ZOOCFGDIR
if [ "x$(dirname "$ZOOCFG")" != "x$ZOOCFGDIR" ]
then
    ZOOCFG="$2"
fi

if $cygwin
then
    ZOOCFG=`cygpath -wp "$ZOOCFG"`
    # cygwin has a "kill" in the shell itself, gets confused
    KILL=/bin/kill
else
    KILL=kill
fi

echo "Using config: $ZOOCFG" >&2

case "$OSTYPE" in
*solaris*)
  GREP=/usr/xpg4/bin/grep
  ;;
*)
  GREP=grep
  ;;
esac
ZOO_DATADIR="$($GREP "^[[:space:]]*dataDir" "$ZOOCFG" | sed -e 's/.*=//')"
ZOO_DATADIR="$(echo -e "${ZOO_DATADIR}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
ZOO_DATALOGDIR="$($GREP "^[[:space:]]*dataLogDir" "$ZOOCFG" | sed -e 's/.*=//')"

# iff autocreate is turned off and the datadirs don't exist fail
# immediately as we can't create the PID file, etc..., anyway.
if [ -n "$ZOO_DATADIR_AUTOCREATE_DISABLE" ]; then
    if [ ! -d "$ZOO_DATADIR/version-2" ]; then
        echo "ZooKeeper data directory is missing at $ZOO_DATADIR fix the path or run initialize"
        exit 1
    fi

    if [ -n "$ZOO_DATALOGDIR" ] && [ ! -d "$ZOO_DATALOGDIR/version-2" ]; then
        echo "ZooKeeper txnlog directory is missing at $ZOO_DATALOGDIR fix the path or run initialize"
        exit 1
    fi
    ZOO_DATADIR_AUTOCREATE="-Dzookeeper.datadir.autocreate=false"
fi

if [ -z "$ZOOPIDFILE" ]; then
    if [ ! -d "$ZOO_DATADIR" ]; then
        mkdir -p "$ZOO_DATADIR"
    fi
    ZOOPIDFILE="$ZOO_DATADIR/zookeeper_server.pid"
else
    # ensure it exists, otw stop will fail
    mkdir -p "$(dirname "$ZOOPIDFILE")"
fi

if [ ! -w "$ZOO_LOG_DIR" ] ; then
mkdir -p "$ZOO_LOG_DIR"
fi

ZOO_LOG_FILE=zookeeper-$USER-server-$HOSTNAME.log
_ZOO_DAEMON_OUT="$ZOO_LOG_DIR/zookeeper-$USER-server-$HOSTNAME.out"

case $1 in
start)
    echo  -n "Starting zookeeper ... "
    if [ -f "$ZOOPIDFILE" ]; then
      if kill -0 `cat "$ZOOPIDFILE"` > /dev/null 2>&1; then
         echo $command already running as process `cat "$ZOOPIDFILE"`.
         exit 1
      fi
    fi
    nohup "$JAVA" $ZOO_DATADIR_AUTOCREATE "$JUTE_MAXBUFFER" "-Dzookeeper.log.dir=/var/log/zookeeper" \
    "-Dzookeeper.log.file=${ZOO_LOG_FILE}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
    -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -9 %p' \
    -cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &
    if [ $? -eq 0 ]
    then
      case "$OSTYPE" in
      *solaris*)
        /bin/echo "${!}\\c" > "$ZOOPIDFILE"
        ;;
      *)
        /bin/echo -n $! > "$ZOOPIDFILE"
        ;;
      esac
      if [ $? -eq 0 ];
      then
        sleep 1
        pid=$(cat "${ZOOPIDFILE}")
        if ps -p "${pid}" > /dev/null 2>&1; then
          echo STARTED
        else
          echo FAILED TO START
          exit 1
        fi
      else
        echo FAILED TO WRITE PID
        exit 1
      fi
    else
      echo SERVER DID NOT START
      exit 1
    fi
    ;;
start-foreground)
    ZOO_CMD=(exec "$JAVA")
    if [ "${ZOO_NOEXEC}" != "" ]; then
      ZOO_CMD=("$JAVA")
    fi
    "${ZOO_CMD[@]}" $ZOO_DATADIR_AUTOCREATE "$JUTE_MAXBUFFER" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" \
    "-Dzookeeper.log.file=${ZOO_LOG_FILE}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
    -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -9 %p' \
    -cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG"
    ;;
print-cmd)
    echo "\"$JAVA\" $ZOO_DATADIR_AUTOCREATE $JUTE_MAXBUFFER -Dzookeeper.log.dir=\"${ZOO_LOG_DIR}\" \
    -Dzookeeper.log.file=\"${ZOO_LOG_FILE}\" -Dzookeeper.root.logger=\"${ZOO_LOG4J_PROP}\" \
    -XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -9 %p' \
    -cp \"$CLASSPATH\" $JVMFLAGS $ZOOMAIN \"$ZOOCFG\" > \"$_ZOO_DAEMON_OUT\" 2>&1 < /dev/null"
    ;;
stop)
    echo -n "Stopping zookeeper ... "
    if [ ! -f "$ZOOPIDFILE" ]
    then
      echo "no zookeeper to stop (could not find file $ZOOPIDFILE)"
    else
      $KILL $(cat "$ZOOPIDFILE")
      rm "$ZOOPIDFILE"
      sleep 1
      echo STOPPED
    fi
    exit 0
    ;;
version)
    ZOOMAIN=org.apache.zookeeper.version.VersionInfoMain
    $JAVA -cp "$CLASSPATH" $ZOOMAIN 2> /dev/null
    ;;
restart)
    shift
    "$0" stop ${@}
    sleep 3
    "$0" start ${@}
    ;;
status)
    # -q is necessary on some versions of linux where nc returns too quickly, and no stat result is output
    isSSL="false"
    clientPortAddress=`$GREP "^[[:space:]]*clientPortAddress[^[:alpha:]]" "$ZOOCFG" | sed -e 's/.*=//'`
    if ! [ $clientPortAddress ]
    then
	      clientPortAddress="localhost"
    fi
    clientPort=`$GREP "^[[:space:]]*clientPort[^[:alpha:]]" "$ZOOCFG" | sed -e 's/.*=//'`
    if ! [[ "$clientPort"  =~ ^[0-9]+$ ]]
    then
      dataDir=`$GREP "^[[:space:]]*dataDir" "$ZOOCFG" | sed -e 's/.*=//'`
      myid=`cat "$dataDir/myid" 2> /dev/null`
      if ! [[ "$myid" =~ ^[0-9]+$ ]] ; then
        echo "myid could not be determined, will not able to locate clientPort in the server configs."
      else
        clientPortAndAddress=`$GREP "^[[:space:]]*server.$myid=.*;.*" "$ZOOCFG" | sed -e 's/.*=//' | sed -e 's/.*;//'`
        if [ ! "$clientPortAndAddress" ] ; then
          echo "Client port not found in static config file. Looking in dynamic config file."
          dynamicConfigFile=`$GREP "^[[:space:]]*dynamicConfigFile" "$ZOOCFG" | sed -e 's/.*=//'`
          clientPortAndAddress=`$GREP "^[[:space:]]*server.$myid=.*;.*" "$dynamicConfigFile" | sed -e 's/.*=//' | sed -e 's/.*;//'`
        fi
        if [ ! "$clientPortAndAddress" ] ; then
          echo "Client port not found in the server configs"
        else
          if [[ "$clientPortAndAddress" =~ ^.*:[0-9]+ ]] ; then
            if [[ "$clientPortAndAddress" =~ \[.*\]:[0-9]+ ]] ; then
              # Extracts address from address:port for example extracts 127::1 from "[127::1]:2181"
              clientPortAddress=`echo "$clientPortAndAddress" | sed -e 's|\[||' | sed -e 's|\]:.*||'`
            else
              clientPortAddress=`echo "$clientPortAndAddress" | sed -e 's/:.*//'`
            fi
          fi
          clientPort=`echo "$clientPortAndAddress" | sed -e 's/.*://'`
        fi
      fi
    fi
    if [ ! "$clientPort" ] ; then
      echo "Client port not found. Looking for secureClientPort in the static config."
      secureClientPort=`$GREP "^[[:space:]]*secureClientPort[^[:alpha:]]" "$ZOOCFG" | sed -e 's/.*=//'`
      if [ "$secureClientPort" ] ; then
        isSSL="true"
        clientPort=$secureClientPort
      else
        echo "Unable to find either secure or unsecure client port in any configs. Terminating."
        exit 1
      fi
    fi
    echo "Client port found: $clientPort. Client address: $clientPortAddress. Client SSL: $isSSL."
    STAT=`"$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" "-Dzookeeper.log.file=${ZOO_LOG_FILE}" \
          -cp "$CLASSPATH" $CLIENT_JVMFLAGS $JVMFLAGS org.apache.zookeeper.client.FourLetterWordMain \
          $clientPortAddress $clientPort srvr $isSSL 2> /dev/null    \
          | $GREP Mode`
    if [ "x$STAT" = "x" ]
    then
      if [ "$isSSL" = "true" ] ; then
        echo " "
        echo "Note: We used secureClientPort ($secureClientPort) to establish connection, but we failed. The 'status'"
        echo "  command establishes a client connection to the server to execute diagnostic commands. Please make sure you"
        echo "  provided all the Client SSL connection related parameters in the CLIENT_JVMFLAGS environment variable! E.g.:"
        echo "  CLIENT_JVMFLAGS=\"-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty"
        echo "  -Dzookeeper.ssl.trustStore.location=/tmp/clienttrust.jks -Dzookeeper.ssl.trustStore.password=password"
        echo "  -Dzookeeper.ssl.keyStore.location=/tmp/client.jks -Dzookeeper.ssl.keyStore.password=password"
        echo "  -Dzookeeper.client.secure=true\" ./zkServer.sh status"
        echo " "
      fi
      echo "Error contacting service. It is probably not running."
      exit 1
    else
      echo $STAT
      exit 0
    fi
    ;;
*)
    echo "Usage: $0 [--config <conf-dir>] {start|start-foreground|stop|version|restart|status|print-cmd}" >&2

esac

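After editing the script, restart the ZooKeeper service and then restart the zk client. If the server still reports java.io.IOException: Connection reset by peer… when the client reads or writes a large znode (over 1 MB but under 4 MB),
then the client has not set jute.maxbuffer, or has set it too small. Raise jute.maxbuffer in the client environment as described in the solution to Error 1 above.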
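The rule described here can be written down as a small check. This is a hypothetical helper (`MaxBufferCheck` and `sidesToRaise` are illustrative names, not ZooKeeper or ZkClient APIs) built on a conservative assumption: a large znode that is written is usually also read back, so both the client's and the server's jute.maxbuffer should cover the largest packet. The packet is somewhat larger than the znode data itself, since it also carries the path and request headers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the sides whose jute.maxbuffer is below the largest
// packet the application expects to send or receive.
final class MaxBufferCheck {
    private MaxBufferCheck() {}

    static List<String> sidesToRaise(long largestPacketBytes, long clientLimit, long serverLimit) {
        List<String> sides = new ArrayList<>();
        if (largestPacketBytes > clientLimit) {
            sides.add("client"); // symptom in this article: "Packet len... is out of range!"
        }
        if (largestPacketBytes > serverLimit) {
            sides.add("server"); // symptom in this article: server closes the socket, client sees Broken pipe
        }
        return sides;
    }

    public static void main(String[] args) {
        long oneMb = 0xfffff;            // default per the article
        long fourMb = 4L * 1024 * 1024;  // value used in this article
        long packet = 1830457;           // size from Error 1
        System.out.println(sidesToRaise(packet, oneMb, oneMb));   // [client, server]
        System.out.println(sidesToRaise(packet, oneMb, fourMb));  // [client]
        System.out.println(sidesToRaise(packet, fourMb, fourMb)); // []
    }
}
```

The second case corresponds to the situation in this paragraph: the server has already been raised to 4 MB, but the client is still on the default.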

Summary

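1. The zk connection and read/write problems here were mainly caused by oversized znode data. The zk client and server both default to roughly 1 MB; this example raises the limit to 4 MB. Size it for your own workload: bigger is not always better, because an excessively large value hurts performance.
2. Regarding Error 2: as a beginner, I wasted a lot of time early on because I could not find the ZooKeeper server's runtime log. For some reason, nothing was written to the log file path set in the zk configuration. If you cannot find the log, check whether there is a logs directory at the same level as xxx/zookeeper/bin. Only by combining the client and server logs can you pin down the cause quickly.
3. Note that to read or write large znodes (over 1 MB), you may need to change both the ZooKeeper server configuration and the client configuration.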
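To keep an oversized write from ever reaching ZooKeeper, an application could check the serialized payload against the configured limit first, and derive the limit from its largest expected payload rather than picking an arbitrary large value. A minimal sketch under stated assumptions: `ZnodePayloadGuard`, its methods, the example path `/demo`, and the 64 KiB headroom for the path and request headers are all hypothetical, not values taken from ZooKeeper. With ZkClient, the check would apply to the bytes produced by its serializer, not to the original object.

```java
// Hypothetical guard: fail fast in the application instead of letting an oversized
// write reach ZooKeeper and trigger the disconnect/reconnect loop.
final class ZnodePayloadGuard {
    static final int HEADROOM_BYTES = 64 * 1024; // assumption: room for path + request headers
    static final int MIB = 1024 * 1024;

    private ZnodePayloadGuard() {}

    /** Throws if the data plus headroom would exceed the given jute.maxbuffer value. */
    static void checkPayload(String path, byte[] data, int maxBuffer) {
        long needed = (long) data.length + HEADROOM_BYTES;
        if (needed > maxBuffer) {
            throw new IllegalArgumentException(String.format(
                    "znode %s: %d bytes of data (+%d headroom) exceed jute.maxbuffer=%d",
                    path, data.length, HEADROOM_BYTES, maxBuffer));
        }
    }

    /** Smallest whole number of MiB that fits the largest expected payload plus headroom. */
    static int suggestMaxBuffer(int largestPayloadBytes) {
        long needed = (long) largestPayloadBytes + HEADROOM_BYTES;
        long mib = (needed + MIB - 1) / MIB; // round up to a whole MiB
        return (int) (mib * MIB);
    }

    public static void main(String[] args) {
        System.out.println(suggestMaxBuffer(1830457));  // 2097152
        checkPayload("/demo", new byte[1000], 0xfffff); // passes silently
    }
}
```

Treating the 1830457 bytes from Error 1 as the payload, `suggestMaxBuffer` yields 2 MiB (2097152 bytes), below the 4 MB used in this article. Rounding to whole MiB is only a readability choice; the value still has to be applied on both the client and the server.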
