http://nicethemes.cn/news/txtlist_i28391v.html

这次来分享一下ES报错:java.io.IOException: Connection reset by peer 的解决经历

问题描述

本人最近负责了定时获取Prometheus Metrics并发送到ES做持久化存储的任务。然而在Metrics采集粒度从3分钟变为1小时后(这里使用的是Springboot的定时采集组件Scheduler),报了如下的错误:

java.io.IOException: Connection reset by peer
	at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:793) ~[elasticsearch-rest-client-7.4.0.jar!/:7.4.0]
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:218) ~[elasticsearch-rest-client-7.4.0.jar!/:7.4.0]
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:205) ~[elasticsearch-rest-client-7.4.0.jar!/:7.4.0]
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1454) ~[elasticsearch-rest-high-level-client-7.4.0.jar!/:7.4.0]
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1439) ~[elasticsearch-rest-high-level-client-7.4.0.jar!/:7.4.0]
	at org.elasticsearch.client.IndicesClient.exists(IndicesClient.java:785) ~[elasticsearch-rest-high-level-client-7.4.0.jar!/:7.4.0]
	at com.free4inno.scheduler.adapter.service.elasticsearch.GenericEsService.isIndexExists(GenericEsService.java:60) ~[classes!/:0.0.1-SNAPSHOT]
	at com.free4inno.scheduler.adapter.service.elasticsearch.MetricsToEsService.getTodayIndex(MetricsToEsService.java:56) ~[classes!/:0.0.1-SNAPSHOT]
	at com.free4inno.scheduler.adapter.service.elasticsearch.MetricsToEsService.insert(MetricsToEsService.java:49) ~[classes!/:0.0.1-SNAPSHOT]
	at com.free4inno.scheduler.adapter.service.Scheduler.sendMetrics(Scheduler.java:64) [classes!/:0.0.1-SNAPSHOT]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_261]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_261]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_261]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_261]
	at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) [spring-context-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) [spring-context-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93) [spring-context-5.2.8.RELEASE.jar!/:5.2.8.RELEASE]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_261]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_261]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_261]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_261]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_261]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_261]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_261]
Caused by: java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_261]
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_261]
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_261]
	at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[?:1.8.0_261]
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:378) ~[?:1.8.0_261]
	at org.apache.http.impl.nio.reactor.SessionInputBufferImpl.fill(SessionInputBufferImpl.java:231) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
	at org.apache.http.impl.nio.codecs.AbstractMessageParser.fillBuffer(AbstractMessageParser.java:136) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:241) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.13.jar!/:4.4.13]
	... 1 more

更令人费解的是,在采集Metrics发送至ES的过程中,在单数的整点写入ES是正常的,然而一到双数的整点就不正常了,例如0:00开始发送是可以发送到ES的,1:00就发送不到报错,2:00就重新恢复正常了…
后面经过调研,是因为esClient自动设置的KeepAlive时间为-1,也就是持续连接,然而这回受到外界的影响比如Firewall,会将TCP连接单方面断开,从而会导致Connection reset by peer的报错。

解决方案

在设置RestHighLevelClient的时候手动设置一下setKeepAliveStrategy的方法,在这里设置的是3分钟的时间(不要太长)

@Bean
    public RestHighLevelClient restHighLevelClient(@Autowired RestClientBuilder restClientBuilder){
        return new RestHighLevelClient(restClientBuilder.setHttpClientConfigCallback(requestConfig ->
                requestConfig.setKeepAliveStrategy((response, context) -> TimeUnit.MINUTES.toMillis(3))));
    }

或者也可以使用心跳,隔一段时间就去获取一次es的连接状态,以此来保持连接的活跃,个人认为这是下策,这里就没有进行尝试。

https://blog.csdn.net/qq_33999844/article/details/113843845

Elasticsearch出现Connection reset by peer分析

1.异常:

Caused by: java.io.IOException: Connection reset by peer
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:793)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:218)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:205)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1454)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1424)
2.分析

RestHighLevelClient客户端,使用的是apache httpclient,版本为4.x,keepAlive默认为-1(客户端会一直保持session);

当服务端因为超时或者其他原因关闭session,客户端仍然认为长连接存在,抛出异常;

注:当ES服务端的keepAlive短于ES客户端的keepAlive,也会导致:服务端已经关闭了连接,客户端继续复用该连接,抛出异常。

3.解决

手动设置KeepAliveStrategy来配置keepAlive,保证客户端keepAlive小于服务端keepAlive,让客户端先于服务端关闭连接
————————————————
版权声明:本文为CSDN博主「夜-NULL(zmc)」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/qq_33999844/article/details/113843845

Logo

为开发者提供学习成长、分享交流、生态实践、资源工具等服务,帮助开发者快速成长。

更多推荐