Error after switching to G1

Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>]

ES: 7.5.0

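While tuning the GC configuration for ES today I hit another pitfall. I had used G1 on 7.1 before and was happy with the results, so on 7.5 I switched straight to G1 as well. The jvm.options configuration was: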

-Xms24g
-Xmx24g

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1GC is only supported on JDK version 10 or later.
# To use G1GC uncomment the lines below.
10-:-XX:-UseConcMarkSweepGC
10-:-XX:-UseCMSInitiatingOccupancyOnly
10-:-XX:+UseG1GC
10-:-XX:InitiatingHeapOccupancyPercent=75

After the switch, nodes in the cluster would occasionally fail queries with this error:

Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [24714355312/23gb], which is larger than the limit of [24481313587/22.7gb], real usage: [24714353312/23gb], new bytes reserved: [2000/1.9kb], usages [request=0/0b, fielddata=11807/11.5kb, in_flight_requests=2932/2.8kb, accounting=45086480/42.9mb]
	at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.5.0.jar:7.5.0]
	at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.5.0.jar:7.5.0]
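
The figures in this message can be checked by hand. The following is a minimal Python sketch, assuming the parent breaker limit is 95% of the heap (an assumption that happens to reproduce the logged limit exactly for a 24 GiB heap); the function names are illustrative and not part of Elasticsearch.

```python
# Sanity check of the figures in the CircuitBreakingException above.
# Assumption: the parent breaker limit is 95% of the heap, which reproduces
# the logged limit for a 24 GiB heap (-Xmx24g).

HEAP_BYTES = 24 * 1024**3


def parent_limit(heap_bytes, limit_percent=95):
    """Parent breaker limit in bytes, computed with integer arithmetic."""
    return heap_bytes * limit_percent // 100


def would_be(real_usage, new_bytes_reserved):
    """Usage the breaker evaluates: current real usage plus the new reservation."""
    return real_usage + new_bytes_reserved


limit = parent_limit(HEAP_BYTES)
total = would_be(24714353312, 2000)

print(limit)          # 24481313587, the "limit of" value in the log
print(total)          # 24714355312, the "would be" value in the log
print(total > limit)  # True: the request is rejected
```

Note that the child breakers in the same message add up to only about 43 MB (almost all of it `accounting`), so nearly the whole 23 GB of "real usage" appears to be plain heap occupancy rather than memory that ES itself is tracking.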

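The error shows that the cluster's circuit breaker was tripped. I did not pay much attention at first, but then found that it reproduced easily even under low traffic, which suggested the problem was not a simple one. Searching on the error keywords turned up this official explanation: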

G1 GC were setup to use an InitiatingHeapOccupancyPercent of 75. This
could leave used memory at a very high level for an extended duration,
triggering the real memory circuit breaker even at low activity levels.
The value is a threshold for old generation usage relative to total heap
size and thus it should leave room for the new generation. Default in
G1 is to allow up to 60 percent for new generation and this could mean that the
threshold was effectively at 135% heap usage. GC would still kick in of course and
eventually enough mixed collections would take place such that adaptive adjustment
of IHOP kicks in.

The JVM has adaptive setting of the IHOP, but this does not kick in
until it has sampled a few collections. A newly started, relatively
quiet server with primarily new generation activity could thus
experience heap above 95% frequently for a duration.

The changes here are two-fold:

1. Use 30% default for IHOP (the JVM default of 45 could still mean
   105% heap usage threshold and did not fully ensure not to hit the
   circuit breaker with low activity)
2. Set G1ReservePercent=25. This is used by the adaptive IHOP mechanism,
   meaning old/mixed GC should kick in no later than at 75% heap. This
   ensures IHOP stays compatible with the real memory circuit breaker also
   after being adjusted by adaptive IHOP.

In other words:

10-:-XX:InitiatingHeapOccupancyPercent=75

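This setting is the problem. It sets the old-generation occupancy threshold, measured against the total heap, at which G1 starts an old-generation (mixed) collection cycle. But G1 allows the young generation to use up to 60% of the JVM heap by default, so the effective threshold can reach 135% of the heap. From G1's point of view the threshold has not been reached, so that collection may not be triggered, while from ES's point of view heap usage has already crossed the real memory circuit breaker limit. As a result, circuit breaker errors can show up frequently even when request volume is low. Worth watching out for.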
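
The percentages quoted above follow from simple addition. Here is a small Python sketch of that arithmetic, assuming the simplified worst-case model from the quoted text (IHOP threshold plus the 60% maximum young generation) and a 95% breaker limit as implied by the log message; it is not a model of how G1 actually sizes its generations.

```python
# Worst-case "effective threshold" model from the quoted explanation:
# old-gen threshold (IHOP) plus the maximum young generation share.
# Both defaults below are assumptions taken from the text, not read from a JVM.

MAX_NEW_GEN_PERCENT = 60    # "allow up to 60 percent for new generation"
BREAKER_LIMIT_PERCENT = 95  # 22.7gb limit on a 24gb heap in the log


def effective_threshold(ihop_percent, max_new_gen_percent=MAX_NEW_GEN_PERCENT):
    """Heap usage (percent) that may be reached before old-gen GC is triggered."""
    return ihop_percent + max_new_gen_percent


for ihop in (75, 45, 30):
    threshold = effective_threshold(ihop)
    safe = threshold <= BREAKER_LIMIT_PERCENT
    print(f"IHOP={ihop}% -> effective threshold {threshold}% (below breaker: {safe})")
```

Only the 30% value keeps the worst case (90%) under the breaker limit, which is consistent with the defaults chosen in the quoted change.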

The fix is:

10-:-XX:-UseConcMarkSweepGC
10-:-XX:-UseCMSInitiatingOccupancyOnly
10-:-XX:+UseG1GC
10-:-XX:G1ReservePercent=25
10-:-XX:InitiatingHeapOccupancyPercent=30

The settings that were added and changed:

10-:-XX:G1ReservePercent=25
10-:-XX:InitiatingHeapOccupancyPercent=30
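
To catch this before a restart, a configuration can be checked offline. The following is a rough Python sketch, not an official tool: `active_flags` and `g1_warnings` are hypothetical helpers, the JDK major version 11 is just an example value, and the 60% and 95% figures are the same assumptions as in the explanation above. It relies on the jvm.options prefix convention in which `10-:` means "JDK 10 or later".

```python
import re

# Optional JDK version prefix ("10-:", "8:", "8-9:") followed by the JVM flag.
LINE_RE = re.compile(r"^(?:(\d+)(-)?(\d+)?:)?(-.+)$")
IHOP_RE = re.compile(r"-XX:InitiatingHeapOccupancyPercent=(\d+)")


def active_flags(text, jdk_major):
    """Return the flags in jvm.options text that apply to the given JDK major version."""
    flags = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(line)
        if not match:
            continue
        low, dash, high, flag = match.groups()
        if low is not None:
            lo = int(low)
            # "8:" is an exact version, "8-:" is open-ended, "8-9:" is a range.
            hi = lo if dash is None else (int(high) if high else None)
            if jdk_major < lo or (hi is not None and jdk_major > hi):
                continue
        flags.append(flag)
    return flags


def g1_warnings(flags, max_new_gen_percent=60, breaker_percent=95):
    """Warn when G1 is enabled and IHOP plus the max young gen exceeds the breaker limit."""
    if "-XX:+UseG1GC" not in flags:
        return []
    warnings = []
    for flag in flags:
        match = IHOP_RE.fullmatch(flag)
        if match and int(match.group(1)) + max_new_gen_percent > breaker_percent:
            warnings.append(f"{flag}: effective threshold may exceed {breaker_percent}% of heap")
    return warnings


OLD = """
-XX:+UseConcMarkSweepGC
10-:-XX:-UseConcMarkSweepGC
10-:-XX:+UseG1GC
10-:-XX:InitiatingHeapOccupancyPercent=75
"""

FIXED = """
-XX:+UseConcMarkSweepGC
10-:-XX:-UseConcMarkSweepGC
10-:-XX:+UseG1GC
10-:-XX:G1ReservePercent=25
10-:-XX:InitiatingHeapOccupancyPercent=30
"""

print(g1_warnings(active_flags(OLD, 11)))    # one warning for IHOP=75
print(g1_warnings(active_flags(FIXED, 11)))  # no warnings
```

The check only looks at `InitiatingHeapOccupancyPercent`; it does not evaluate `G1ReservePercent` or flags that appear more than once.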

What a trap. I had only just finished restarting the cluster, and now I have to restart it all over again 😭
