Notes on a ZooKeeper connection error: java.io.IOException: Broken pipe
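Today we hit a problem on a project: a web service that had been running for a while lost its ZooKeeper connection. We traced it to a write of slightly more than 1 MB of data to a single znode, while ZooKeeper's default node size limit is 1 MB. That caused session disconnects and a chain of related failures. The root cause comes down to one JVM system property: jute.maxbuffer.
The ZooKeeper client in use is org.I0Itec.zkclient.ZkClient.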
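Error 1. The client reports java.io.IOException: Packet len1830457 is out of range!
Clearly the packet is too large and exceeds the limit.
Solution: raise the value of this JVM property. Either of the following works:
1) In code: set the property during project initialization, before the ZooKeeper connection is created: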
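The numbers behind this error, assuming the limit in force was the 0xfffff default that this article gives for jute.maxbuffer further down, and using the 4 MB value chosen in the fix below:

```latex
\underbrace{1\,830\,457\ \text{B}}_{\text{reported packet length}} \approx 1.75\ \text{MiB}
\;>\; \texttt{0xfffff} = 2^{20} - 1 = 1\,048\,575\ \text{B},
\qquad
4 \times 1024 \times 1024 = 4\,194\,304\ \text{B} = 4\ \text{MiB}
```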
// Set to 4 MB (1024 * 4096 bytes); choose a size that fits your own data
System.setProperty("jute.maxbuffer", String.valueOf(1024 * 4096));
2) Add the -Djute.maxbuffer=4194304 parameter to the startup command, for example:
nohup /usr/local/jdk/bin/java -Djute.maxbuffer=4194304 -jar xxx.jar &
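The two options can be combined so that a value passed on the command line (option 2) takes precedence over the one set in code (option 1). A minimal sketch; the class name ClientJuteConfig is hypothetical, and it assumes the ZooKeeper client reads jute.maxbuffer when the client object is created, which is why this must run before the ZkClient is constructed.

```java
// Minimal sketch: set jute.maxbuffer on the client before any ZooKeeper connection
// is created. Assumption: the ZooKeeper client reads the property when the client
// object is initialized, so changing it afterwards does not affect an existing client.
final class ClientJuteConfig {
    static final String KEY = "jute.maxbuffer";
    static final int FOUR_MB = 4 * 1024 * 1024; // 4194304, same value as the -D example

    /**
     * Sets jute.maxbuffer only if it was not already passed with -D, so the
     * startup command keeps precedence. Returns the value now in effect.
     * Throws NumberFormatException if an existing value is not a number.
     */
    static int ensureMaxBuffer(int bytes) {
        String existing = System.getProperty(KEY);
        if (existing == null || existing.trim().isEmpty()) {
            System.setProperty(KEY, Integer.toString(bytes));
            return bytes;
        }
        return Integer.decode(existing.trim());
    }

    public static void main(String[] args) {
        int effective = ensureMaxBuffer(FOUR_MB);
        System.out.println(KEY + " = " + effective);
        // Create the client only after this point, e.g.
        // new org.I0Itec.zkclient.ZkClient("127.0.0.1:2181"); (not compiled in this sketch)
    }
}
```

Keeping the -D value authoritative means operators can raise or lower the limit per environment without rebuilding the application.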
Error 2. The client reports java.io.IOException: Broken pipe
The full error output:
2022-02-10 15:51:53.380 INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn : Unable to read additional data from server sessionid 0x100000057630138, likely server has closed socket, closing socket connection and attempting reconnect
2022-02-10 15:51:53.480 INFO 108020 --- [main-EventThread] org.I0Itec.zkclient.ZkClient : zookeeper state changed (Disconnected)
2022-02-10 15:51:53.480 INFO 108020 --- [taskScheduler-1] org.I0Itec.zkclient.ZkClient : Waiting for keeper state SyncConnected
2022-02-10 15:51:53.480 INFO 108020 --- [org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1] org.I0Itec.zkclient.ZkClient : Waiting for keeper state SyncConnected
2022-02-10 15:51:53.480 INFO 108020 --- [https-jsse-nio-8441-exec-3] org.I0Itec.zkclient.ZkClient : Waiting for keeper state SyncConnected
2022-02-10 15:51:53.480 INFO 108020 --- [https-jsse-nio-8441-exec-10] org.I0Itec.zkclient.ZkClient : Waiting for keeper state SyncConnected
2022-02-10 15:51:54.303 INFO 108020 --- [myThreadPool thread:1] c.s.dbsec.commons.utils.ProcessUtils : 22/02/10 15:51:54 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2022-02-10 15:51:55.034 INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn : Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2022-02-10 15:51:55.034 INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn : Socket connection established, initiating session, client: /127.0.0.1:59100, server: localhost/127.0.0.1:2181
2022-02-10 15:51:55.035 INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn : Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100000057630138, negotiated timeout = 30000
2022-02-10 15:51:55.035 INFO 108020 --- [main-EventThread] org.I0Itec.zkclient.ZkClient : zookeeper state changed (SyncConnected)
2022-02-10 15:51:55.037 WARN 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn : Session 0x100000057630138 for server localhost/127.0.0.1:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.8.0_221]
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.8.0_221]
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.8.0_221]
at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.8.0_221]
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) ~[na:1.8.0_221]
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:123) ~[zookeeper-3.5.7.jar!/:3.5.7]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:363) ~[zookeeper-3.5.7.jar!/:3.5.7]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1223) ~[zookeeper-3.5.7.jar!/:3.5.7]
2022-02-10 15:51:55.138 INFO 108020 --- [main-EventThread] org.I0Itec.zkclient.ZkClient : zookeeper state changed (Disconnected)
2022-02-10 15:51:55.138 INFO 108020 --- [org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1] org.I0Itec.zkclient.ZkClient : Waiting for keeper state SyncConnected
2022-02-10 15:51:55.138 INFO 108020 --- [taskScheduler-1] org.I0Itec.zkclient.ZkClient : Waiting for keeper state SyncConnected
2022-02-10 15:51:55.138 INFO 108020 --- [https-jsse-nio-8441-exec-3] org.I0Itec.zkclient.ZkClient : Waiting for keeper state SyncConnected
2022-02-10 15:51:55.138 INFO 108020 --- [https-jsse-nio-8441-exec-10] org.I0Itec.zkclient.ZkClient : Waiting for keeper state SyncConnected
2022-02-10 15:51:56.487 INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn : Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2022-02-10 15:51:56.487 INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn : Socket connection established, initiating session, client: /127.0.0.1:59102, server: localhost/127.0.0.1:2181
2022-02-10 15:51:56.487 INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn : Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100000057630138, negotiated timeout = 30000
2022-02-10 15:51:56.488 INFO 108020 --- [main-EventThread] org.I0Itec.zkclient.ZkClient : zookeeper state changed (SyncConnected)
2022-02-10 15:51:56.489 INFO 108020 --- [main-SendThread(127.0.0.1:2181)] org.apache.zookeeper.ClientCnxn : Unable to read additional data from server sessionid 0x100000057630138, likely server has closed socket, closing socket connection and attempting reconnect
2022-02-10 15:51:56.590 INFO 108020 --- [main-EventThread] org.I0Itec.zkclient.ZkClient : zookeeper state changed (Disconnected)
2022-02-10 15:51:56.590 INFO 108020 --- [taskScheduler-1] org.I0Itec.zkclient.ZkClient : Waiting for keeper state SyncConnected
2022-02-10 15:51:56.590 INFO 108020 --- [org.springframework.kafka.KafkaListenerEndpointContainer#4-0-C-1] org.I0Itec.zkclient.ZkClient : Waiting for keeper state SyncConnected
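The error only shows that the socket was closed and that reconnection kept failing. Opinions online vary and roughly fall into these camps:
1. Network problems: network instability breaks the session (firewall, port 2181 unreachable, etc.)
2. The ZooKeeper server responds too slowly: processing takes so long that the session timeout is reached, and by the time the server replies the session is already invalid
3. The ZooKeeper service has stopped
In our case the zk client and server run on the same machine, the ZooKeeper service status was normal, and the zkCli.sh client bundled with ZooKeeper could access the service, so explanations 1 and 3 were ruled out. Explanation 2 was also essentially ruled out by our workload and the logs: each re-established session fails within about 2 ms (for example, established at 15:51:55.035 and broken at 15:51:55.037), far below the negotiated 30000 ms timeout. So we moved on to the ZooKeeper server log.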
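Explanations 1 and 3 can also be checked from Java. The sketch below sends the srvr four-letter command, the same one the status branch of zkServer.sh sends, and extracts the Mode: line from the reply. Assumptions: the server listens on 127.0.0.1:2181 as in the logs, srvr is allowed by the server's 4lw.commands.whitelist, and the server closes the connection after replying. ZkProbe is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Minimal sketch: ask a ZooKeeper server for its "srvr" summary to confirm that
// it is reachable (explanation 1) and running (explanation 3).
final class ZkProbe {
    /** Sends "srvr" and returns the raw reply; throws IOException if unreachable. */
    static String srvr(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            OutputStream out = socket.getOutputStream();
            out.write("srvr".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            int n;
            // Read until the server closes the connection (assumed behavior for 4lw commands).
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Extracts the value of the "Mode:" line, or null if there is none. */
    static String parseMode(String srvrOutput) {
        for (String line : srvrOutput.split("\\r?\\n")) {
            if (line.startsWith("Mode:")) {
                return line.substring("Mode:".length()).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        try {
            String reply = srvr("127.0.0.1", 2181, 3000);
            System.out.println("Mode: " + parseMode(reply));
        } catch (IOException e) {
            System.out.println("Cannot reach ZooKeeper: " + e);
        }
    }
}
```

If this prints a mode while the application keeps logging Broken pipe, the server is up and reachable, which points back at what the session itself sends.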
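The server log:
The code that throws the exception:
Once again it is the familiar jute.maxbuffer setting. Its default is 0xfffff, i.e. 1048575 bytes, which works out to 1 MB (one byte short). In other words, the ZooKeeper server also defaults to 1 MB. The server-side value can be set by editing xxx/zookeeper/bin/zkServer.sh. The modified script is as follows:
(A custom variable, JUTE_MAXBUFFER, was added to the script and passed to java in the start and start-foreground branches; search for that keyword to find the changes.)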
zkServer.sh
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# If this scripted is run out of /usr/bin or some other system bin directory
# it should be linked to and not copied. Things like java jar files are found
# relative to the canonical path of this script.
#
# Set to 4 MB (4 * 1024 * 1024 bytes)
JUTE_MAXBUFFER="-Djute.maxbuffer=4194304"
# use POSIX interface, symlink is followed automatically
ZOOBIN="${BASH_SOURCE-$0}"
ZOOBIN="$(dirname "${ZOOBIN}")"
ZOOBINDIR="$(cd "${ZOOBIN}"; pwd)"
if [ -e "$ZOOBIN/../libexec/zkEnv.sh" ]; then
. "$ZOOBINDIR"/../libexec/zkEnv.sh
else
. "$ZOOBINDIR"/zkEnv.sh
fi
# See the following page for extensive details on setting
# up the JVM to accept JMX remote management:
# http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
# by default we allow local JMX connections
if [ "x$JMXLOCALONLY" = "x" ]
then
JMXLOCALONLY=false
fi
if [ "x$JMXDISABLE" = "x" ] || [ "$JMXDISABLE" = 'false' ]
then
echo "ZooKeeper JMX enabled by default" >&2
if [ "x$JMXPORT" = "x" ]
then
# for some reason these two options are necessary on jdk6 on Ubuntu
# accord to the docs they are not necessary, but otw jconsole cannot
# do a local attach
ZOOMAIN="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=$JMXLOCALONLY org.apache.zookeeper.server.quorum.QuorumPeerMain"
else
if [ "x$JMXAUTH" = "x" ]
then
JMXAUTH=false
fi
if [ "x$JMXSSL" = "x" ]
then
JMXSSL=false
fi
if [ "x$JMXLOG4J" = "x" ]
then
JMXLOG4J=true
fi
echo "ZooKeeper remote JMX Port set to $JMXPORT" >&2
echo "ZooKeeper remote JMX authenticate set to $JMXAUTH" >&2
echo "ZooKeeper remote JMX ssl set to $JMXSSL" >&2
echo "ZooKeeper remote JMX log4j set to $JMXLOG4J" >&2
ZOOMAIN="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=$JMXPORT -Dcom.sun.management.jmxremote.authenticate=$JMXAUTH -Dcom.sun.management.jmxremote.ssl=$JMXSSL -Dzookeeper.jmx.log4j.disable=$JMXLOG4J org.apache.zookeeper.server.quorum.QuorumPeerMain"
fi
else
echo "JMX disabled by user request" >&2
ZOOMAIN="org.apache.zookeeper.server.quorum.QuorumPeerMain"
fi
if [ "x$SERVER_JVMFLAGS" != "x" ]
then
JVMFLAGS="$SERVER_JVMFLAGS $JVMFLAGS"
fi
if [ "x$2" != "x" ]
then
ZOOCFG="$ZOOCFGDIR/$2"
fi
# if we give a more complicated path to the config, don't screw around in $ZOOCFGDIR
if [ "x$(dirname "$ZOOCFG")" != "x$ZOOCFGDIR" ]
then
ZOOCFG="$2"
fi
if $cygwin
then
ZOOCFG=`cygpath -wp "$ZOOCFG"`
# cygwin has a "kill" in the shell itself, gets confused
KILL=/bin/kill
else
KILL=kill
fi
echo "Using config: $ZOOCFG" >&2
case "$OSTYPE" in
*solaris*)
GREP=/usr/xpg4/bin/grep
;;
*)
GREP=grep
;;
esac
ZOO_DATADIR="$($GREP "^[[:space:]]*dataDir" "$ZOOCFG" | sed -e 's/.*=//')"
ZOO_DATADIR="$(echo -e "${ZOO_DATADIR}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
ZOO_DATALOGDIR="$($GREP "^[[:space:]]*dataLogDir" "$ZOOCFG" | sed -e 's/.*=//')"
# iff autocreate is turned off and the datadirs don't exist fail
# immediately as we can't create the PID file, etc..., anyway.
if [ -n "$ZOO_DATADIR_AUTOCREATE_DISABLE" ]; then
if [ ! -d "$ZOO_DATADIR/version-2" ]; then
echo "ZooKeeper data directory is missing at $ZOO_DATADIR fix the path or run initialize"
exit 1
fi
if [ -n "$ZOO_DATALOGDIR" ] && [ ! -d "$ZOO_DATALOGDIR/version-2" ]; then
echo "ZooKeeper txnlog directory is missing at $ZOO_DATALOGDIR fix the path or run initialize"
exit 1
fi
ZOO_DATADIR_AUTOCREATE="-Dzookeeper.datadir.autocreate=false"
fi
if [ -z "$ZOOPIDFILE" ]; then
if [ ! -d "$ZOO_DATADIR" ]; then
mkdir -p "$ZOO_DATADIR"
fi
ZOOPIDFILE="$ZOO_DATADIR/zookeeper_server.pid"
else
# ensure it exists, otw stop will fail
mkdir -p "$(dirname "$ZOOPIDFILE")"
fi
if [ ! -w "$ZOO_LOG_DIR" ] ; then
mkdir -p "$ZOO_LOG_DIR"
fi
ZOO_LOG_FILE=zookeeper-$USER-server-$HOSTNAME.log
_ZOO_DAEMON_OUT="$ZOO_LOG_DIR/zookeeper-$USER-server-$HOSTNAME.out"
case $1 in
start)
echo -n "Starting zookeeper ... "
if [ -f "$ZOOPIDFILE" ]; then
if kill -0 `cat "$ZOOPIDFILE"` > /dev/null 2>&1; then
echo $command already running as process `cat "$ZOOPIDFILE"`.
exit 1
fi
fi
nohup "$JAVA" $ZOO_DATADIR_AUTOCREATE "$JUTE_MAXBUFFER" "-Dzookeeper.log.dir=/var/log/zookeeper" \
"-Dzookeeper.log.file=${ZOO_LOG_FILE}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
-XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -9 %p' \
-cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null &
if [ $? -eq 0 ]
then
case "$OSTYPE" in
*solaris*)
/bin/echo "${!}\\c" > "$ZOOPIDFILE"
;;
*)
/bin/echo -n $! > "$ZOOPIDFILE"
;;
esac
if [ $? -eq 0 ];
then
sleep 1
pid=$(cat "${ZOOPIDFILE}")
if ps -p "${pid}" > /dev/null 2>&1; then
echo STARTED
else
echo FAILED TO START
exit 1
fi
else
echo FAILED TO WRITE PID
exit 1
fi
else
echo SERVER DID NOT START
exit 1
fi
;;
start-foreground)
ZOO_CMD=(exec "$JAVA")
if [ "${ZOO_NOEXEC}" != "" ]; then
ZOO_CMD=("$JAVA")
fi
"${ZOO_CMD[@]}" $ZOO_DATADIR_AUTOCREATE "$JUTE_MAXBUFFER" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" \
"-Dzookeeper.log.file=${ZOO_LOG_FILE}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
-XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -9 %p' \
-cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG"
;;
print-cmd)
echo "\"$JAVA\" $ZOO_DATADIR_AUTOCREATE $JUTE_MAXBUFFER -Dzookeeper.log.dir=\"${ZOO_LOG_DIR}\" \
-Dzookeeper.log.file=\"${ZOO_LOG_FILE}\" -Dzookeeper.root.logger=\"${ZOO_LOG4J_PROP}\" \
-XX:+HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -9 %p' \
-cp \"$CLASSPATH\" $JVMFLAGS $ZOOMAIN \"$ZOOCFG\" > \"$_ZOO_DAEMON_OUT\" 2>&1 < /dev/null"
;;
stop)
echo -n "Stopping zookeeper ... "
if [ ! -f "$ZOOPIDFILE" ]
then
echo "no zookeeper to stop (could not find file $ZOOPIDFILE)"
else
$KILL $(cat "$ZOOPIDFILE")
rm "$ZOOPIDFILE"
sleep 1
echo STOPPED
fi
exit 0
;;
version)
ZOOMAIN=org.apache.zookeeper.version.VersionInfoMain
$JAVA -cp "$CLASSPATH" $ZOOMAIN 2> /dev/null
;;
restart)
shift
"$0" stop ${@}
sleep 3
"$0" start ${@}
;;
status)
# -q is necessary on some versions of linux where nc returns too quickly, and no stat result is output
isSSL="false"
clientPortAddress=`$GREP "^[[:space:]]*clientPortAddress[^[:alpha:]]" "$ZOOCFG" | sed -e 's/.*=//'`
if ! [ $clientPortAddress ]
then
clientPortAddress="localhost"
fi
clientPort=`$GREP "^[[:space:]]*clientPort[^[:alpha:]]" "$ZOOCFG" | sed -e 's/.*=//'`
if ! [[ "$clientPort" =~ ^[0-9]+$ ]]
then
dataDir=`$GREP "^[[:space:]]*dataDir" "$ZOOCFG" | sed -e 's/.*=//'`
myid=`cat "$dataDir/myid" 2> /dev/null`
if ! [[ "$myid" =~ ^[0-9]+$ ]] ; then
echo "myid could not be determined, will not able to locate clientPort in the server configs."
else
clientPortAndAddress=`$GREP "^[[:space:]]*server.$myid=.*;.*" "$ZOOCFG" | sed -e 's/.*=//' | sed -e 's/.*;//'`
if [ ! "$clientPortAndAddress" ] ; then
echo "Client port not found in static config file. Looking in dynamic config file."
dynamicConfigFile=`$GREP "^[[:space:]]*dynamicConfigFile" "$ZOOCFG" | sed -e 's/.*=//'`
clientPortAndAddress=`$GREP "^[[:space:]]*server.$myid=.*;.*" "$dynamicConfigFile" | sed -e 's/.*=//' | sed -e 's/.*;//'`
fi
if [ ! "$clientPortAndAddress" ] ; then
echo "Client port not found in the server configs"
else
if [[ "$clientPortAndAddress" =~ ^.*:[0-9]+ ]] ; then
if [[ "$clientPortAndAddress" =~ \[.*\]:[0-9]+ ]] ; then
# Extracts address from address:port for example extracts 127::1 from "[127::1]:2181"
clientPortAddress=`echo "$clientPortAndAddress" | sed -e 's|\[||' | sed -e 's|\]:.*||'`
else
clientPortAddress=`echo "$clientPortAndAddress" | sed -e 's/:.*//'`
fi
fi
clientPort=`echo "$clientPortAndAddress" | sed -e 's/.*://'`
fi
fi
fi
if [ ! "$clientPort" ] ; then
echo "Client port not found. Looking for secureClientPort in the static config."
secureClientPort=`$GREP "^[[:space:]]*secureClientPort[^[:alpha:]]" "$ZOOCFG" | sed -e 's/.*=//'`
if [ "$secureClientPort" ] ; then
isSSL="true"
clientPort=$secureClientPort
else
echo "Unable to find either secure or unsecure client port in any configs. Terminating."
exit 1
fi
fi
echo "Client port found: $clientPort. Client address: $clientPortAddress. Client SSL: $isSSL."
STAT=`"$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" "-Dzookeeper.log.file=${ZOO_LOG_FILE}" \
-cp "$CLASSPATH" $CLIENT_JVMFLAGS $JVMFLAGS org.apache.zookeeper.client.FourLetterWordMain \
$clientPortAddress $clientPort srvr $isSSL 2> /dev/null \
| $GREP Mode`
if [ "x$STAT" = "x" ]
then
if [ "$isSSL" = "true" ] ; then
echo " "
echo "Note: We used secureClientPort ($secureClientPort) to establish connection, but we failed. The 'status'"
echo " command establishes a client connection to the server to execute diagnostic commands. Please make sure you"
echo " provided all the Client SSL connection related parameters in the CLIENT_JVMFLAGS environment variable! E.g.:"
echo " CLIENT_JVMFLAGS=\"-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty"
echo " -Dzookeeper.ssl.trustStore.location=/tmp/clienttrust.jks -Dzookeeper.ssl.trustStore.password=password"
echo " -Dzookeeper.ssl.keyStore.location=/tmp/client.jks -Dzookeeper.ssl.keyStore.password=password"
echo " -Dzookeeper.client.secure=true\" ./zkServer.sh status"
echo " "
fi
echo "Error contacting service. It is probably not running."
exit 1
else
echo $STAT
exit 0
fi
;;
*)
echo "Usage: $0 [--config <conf-dir>] {start|start-foreground|stop|version|restart|status|print-cmd}" >&2
esac
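After editing the script, restart the ZooKeeper service and then restart the zk client. If, when the client reads or writes large nodes (over 1 MB but under 4 MB), the server still reports java.io.IOException: Connection reset by peer…
then jute.maxbuffer is either not set on the client or set too small. Apply the fix from Error 1 to the client environment.
Summary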
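To check whether a client JVM really has the setting, the sketch below reports both the -Djute.maxbuffer startup argument (read through the standard java.lang.management API) and the current system property. The two can differ when the value is set in code as in option 1 of Error 1. JuteFlagCheck is a hypothetical name, and it only inspects the JVM it runs in, so it has to run inside the client application, for example logged once at startup.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Minimal sketch: report how jute.maxbuffer reached the current (client) JVM.
final class JuteFlagCheck {
    static final String FLAG_PREFIX = "-Djute.maxbuffer=";

    /** Returns the value of the last -Djute.maxbuffer=... argument, if any. */
    static Optional<String> findFlag(List<String> jvmArgs) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG_PREFIX)) {
                value = arg.substring(FLAG_PREFIX.length());
            }
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("startup flag: " + findFlag(jvmArgs).orElse("(not passed)"));
        System.out.println("system property: " + System.getProperty("jute.maxbuffer", "(unset)"));
    }
}
```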
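1. The connection and read/write problems here were caused by oversized node data. The zk client and server generally both default to 1 MB; the example raises this to 4 MB, but choose a size that suits your workload. Bigger is not always better: setting it too high hurts performance.
2. For problem 2: as a newcomer, I wasted a lot of time early on because I could not find the ZooKeeper server's runtime log. For some reason nothing was written to the log file path set in the zk configuration. If you cannot find it, check whether there is a logs directory alongside xxx/zookeeper/bin. Combining the client and server logs is what pins down the root cause quickly.
3. Note that to read or write large nodes (over 1 MB), you may need to change both the ZooKeeper server configuration and the client configuration.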
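Since oversized nodes are the root cause, a client can also reject such writes up front instead of losing its session. A minimal sketch, assuming a fixed 64 KiB headroom for the path and request headers (an arbitrary illustrative figure, not ZooKeeper's own packet accounting) and that jute.maxbuffer falls back to the 0xfffff default described above. NodeSizeGuard and the path /example/node are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch: refuse to write node data that would come too close to jute.maxbuffer.
// Assumption: HEADROOM is an illustrative safety margin, not ZooKeeper's exact overhead.
final class NodeSizeGuard {
    static final int DEFAULT_MAX_BUFFER = 0xfffff; // 1048575 bytes
    static final int HEADROOM = 64 * 1024;         // arbitrary margin for path and headers

    /** Reads jute.maxbuffer from system properties, falling back to the default. */
    static int limit() {
        return Integer.getInteger("jute.maxbuffer", DEFAULT_MAX_BUFFER);
    }

    /** Throws IllegalArgumentException if data plus headroom would exceed the limit. */
    static void checkFits(String path, byte[] data, int limit) {
        long estimated = (long) data.length
                + path.getBytes(StandardCharsets.UTF_8).length + HEADROOM;
        if (estimated > limit) {
            throw new IllegalArgumentException(String.format(
                    "node %s: %d bytes of data (about %d with headroom) exceeds jute.maxbuffer=%d",
                    path, data.length, estimated, limit));
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[1830457]; // the packet size reported in Error 1
        try {
            checkFits("/example/node", payload, limit());
            System.out.println("ok to write");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Calling checkFits right before the actual write turns the failure into a clear exception at the call site, which is easier to diagnose than the Broken pipe and reconnect loop shown in Error 2.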