2021-05-03 14:16:47
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException: Connection unexpectedly closed by remote task manager 'iZbp138pkl8lo4rs8xtvkhZ/172.16.211.184:42932'. This might indicate that the remote task manager was lost.
	at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.channelInactive(CreditBasedPartitionRequestClientHandler.java:167)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at org.apache.flink.shaded.netty4.io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
	at org.apache.flink.runtime.io.network.netty.ZeroCopyNettyMessageDecoder.channelInactive(ZeroCopyNettyMessageDecoder.java:274)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
	at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
	at org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947)
	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:826)
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
	at org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:474)
	at org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
	at java.lang.Thread.run(Thread.java:834)

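The error log shows that a TaskManager (TM) timed out and lost its connection. Likely causes are insufficient memory or a TM communication timeout that is too short. The following parameters were adjusted in response: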

state.backend.incremental=true;
taskmanager.memory.managed.fraction=0.3;
state.backend.rocksdb.block.blocksize=64 kb;
state.backend.rocksdb.block.cache-size=128 mb;
state.backend.rocksdb.files.open=-1;
state.backend.rocksdb.writebuffer.size=128 mb;
state.backend.rocksdb.writebuffer.count=4;
state.backend.rocksdb.writebuffer.number-to-merge=2;

state.backend.rocksdb.compaction.style=level;
state.backend.rocksdb.thread.num=4;
state.backend.rocksdb.metrics.block-cache-usage=true;
state.backend.rocksdb.checkpoint.transfer.thread.num=8;


table.dynamic-table-options.enabled=true;
table.exec.mini-batch.enabled=true;
table.exec.mini-batch.size=35000;
table.optimizer.distinct-agg.split.enabled=true;
table.exec.mini-batch.allow-latency=15 s;
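
The settings listed above tune memory, RocksDB, and mini-batching, but none of them changes a timeout. If a short TM communication timeout is the suspected cause, the timeouts themselves can be raised as well. The fragment below is only a sketch in the same format: `heartbeat.interval`, `heartbeat.timeout`, and `akka.ask.timeout` are standard Flink options that do not appear in the original list, and the values are illustrative assumptions rather than tested recommendations.

```
# Assumed values for illustration only; tune them for the actual cluster.
# Heartbeat interval and timeout are in milliseconds (Flink defaults: 10000 and 50000).
heartbeat.interval=10000;
heartbeat.timeout=180000;
# RPC ask timeout (Flink default: 10 s).
akka.ask.timeout=60 s;
```

Longer timeouts only hide the symptom if the TaskManager is actually being killed for exceeding its memory limit, so it is worth checking the TaskManager and container logs for the lost node first.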
