ES version: 7.5.0

New node cannot join the ES cluster: timed out while waiting for initial discovery state - timeout: 30s

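  The ES cluster needed dedicated master nodes. After the master nodes were added, the elasticsearch.yml on each data node had to be changed to make it a dedicated data node, and the node restarted. This is where the problem appeared: after the restart, a data node could not rejoin the cluster and instead formed a cluster of its own.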

Part of the elasticsearch.yml configuration:

node.master: false
node.data: true

network.bind_host: 0.0.0.0
network.publish_host: 0.0.0.0
network.host: 0.0.0.0
transport.tcp.port: 9300
transport.tcp.compress: true
http.port: 9200
http.cors.enabled: true
http.cors.allow-origin: "*"
http.max_content_length: 100mb
discovery.seed_hosts: ["es01", "es6", "es10"]
cluster.initial_master_nodes: ["es01", "es6", "es10"]

The log after the service started:


[2021-11-04T21:38:16,053][INFO ][o.e.e.NodeEnvironment    ] [es07] using [1] data paths, mounts [[/data (/dev/vdb)]], net usable_space [499.7gb], net total_space [499.7gb], types [xfs]
[2021-11-04T21:38:16,053][INFO ][o.e.e.NodeEnvironment    ] [es07] heap size [24gb], compressed ordinary object pointers [true]
[2021-11-04T21:38:16,055][INFO ][o.e.n.Node               ] [es07] node name [es07], node ID [pzyBT-bpT5G-iLebP6ooAw], cluster name [maisearch-algo]
[2021-11-04T21:38:16,055][INFO ][o.e.n.Node               ] [es07] version[7.5.0], pid[8281], build[default/rpm/e9ccaed468e2fac2275a3761849cbee64b39519f/2019-11-26T01:06:52.518245Z], OS[Linux/4.19.0-9.el7.ucloud.x86_64/amd64], JVM[AdoptOpenJDK/OpenJDK 64-Bit Server VM/13.0.1/13.0.1+9]
[2021-11-04T21:38:16,056][INFO ][o.e.n.Node               ] [es07] JVM home [/usr/share/elasticsearch/jdk]
[2021-11-04T21:38:16,056][INFO ][o.e.n.Node               ] [es07] JVM arguments [-Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=COMPAT, -Xms24g, -Xmx24g, -XX:NewRatio=3, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:-UseConcMarkSweepGC, -XX:-UseCMSInitiatingOccupancyOnly, -XX:+UseG1GC, -XX:InitiatingHeapOccupancyPercent=75, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.io.tmpdir=/tmp/elasticsearch-14433504592868401375, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -XX:ErrorFile=/data/elasticsearch/log/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=/data/elasticsearch/log/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -XX:UseAVX=2, -XX:MaxDirectMemorySize=12884901888, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=rpm, -Des.bundled_jdk=true]
[2021-11-04T21:38:17,778][INFO ][o.e.p.PluginsService     ] [es07] loaded module [aggs-matrix-stats]
[2021-11-04T21:38:17,779][INFO ][o.e.p.PluginsService     ] [es07] loaded module [analysis-common]
....
....
....
[2021-11-04T21:38:17,780][INFO ][o.e.p.PluginsService     ] [es07] loaded module [x-pack-rollup]
[2021-11-04T21:38:17,781][INFO ][o.e.p.PluginsService     ] [es07] loaded module [x-pack-security]
[2021-11-04T21:38:17,781][INFO ][o.e.p.PluginsService     ] [es07] loaded module [x-pack-sql]
[2021-11-04T21:38:20,449][INFO ][o.e.x.s.a.s.FileRolesStore] [es07] parsed [0] roles from file [/etc/elasticsearch/roles.yml]
[2021-11-04T21:38:20,878][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [es07] [controller/8512] [Main.cc@110] controller (64 bit): Version 7.5.0 (Build 17d1c724ca38a1) Copyright (c) 2019 Elasticsearch BV
[2021-11-04T21:38:21,367][DEBUG][o.e.a.ActionModule       ] [es07] Using REST wrapper from plugin org.elasticsearch.xpack.security.Security
[2021-11-04T21:38:21,468][INFO ][o.e.d.DiscoveryModule    ] [es07] using discovery type [zen] and seed hosts providers [settings]
[2021-11-04T21:38:22,148][INFO ][o.e.n.Node               ] [es07] initialized
[2021-11-04T21:38:22,148][INFO ][o.e.n.Node               ] [es07] starting ...
[2021-11-04T21:38:22,261][INFO ][o.e.t.TransportService   ] [es07] publish_address {10.19.114.93:9300}, bound_addresses {0.0.0.0:9300}
[2021-11-04T21:38:22,345][INFO ][o.e.b.BootstrapChecks    ] [es07] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2021-11-04T21:38:52,363][WARN ][o.e.n.Node               ] [es07] timed out while waiting for initial discovery state - timeout: 30s
[2021-11-04T21:38:52,385][INFO ][o.e.h.AbstractHttpServerTransport] [es07] publish_address {10.19.114.93:9200}, bound_addresses {0.0.0.0:9200}
[2021-11-04T21:38:52,385][INFO ][o.e.n.Node               ] [es07] started
[2021-11-04T21:38:58,964][INFO ][o.e.x.s.a.AuthenticationService] [es07] Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]
[2021-11-04T21:39:01,053][INFO ][o.e.x.s.a.AuthenticationService] [es07] Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]
[2021-11-04T21:39:02,913][INFO ][o.e.x.s.a.AuthenticationService] [es07] Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]

Here we can see that the node reported authentication errors after it started:

[2021-11-04T21:39:01,053][INFO ][o.e.x.s.a.AuthenticationService] [es07] Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]


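This was very misleading. It made me think a user had been configured on this node, but I checked and none had been. Thinking it over more carefully, these were most likely client requests: the node had failed to join the cluster and was serving on its own as a single node, so client requests that reached it failed authentication. The error did not come from talking to the other nodes in the cluster.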

Looking at the log more carefully, there is this line:

[2021-11-04T21:38:52,363][WARN ][o.e.n.Node               ] [es07] timed out while waiting for initial discovery state - timeout: 30s
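
A WARN line like this is easy to miss among the INFO lines around it. The following is a minimal Python sketch, not part of the original troubleshooting, that keeps only the WARN and higher entries of a log in the format shown above. The function name `warnings_and_errors` and the regular expression are assumptions written for this log layout; other layouts would need a different pattern.

```python
import re

# Matches lines such as:
# [2021-11-04T21:38:52,363][WARN ][o.e.n.Node               ] [es07] message
LOG_LINE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"      # timestamp
    r"\[(?P<level>[A-Z]+)\s*\]"        # log level, padded with spaces
    r"\[(?P<logger>[^\]]+?)\s*\]"      # abbreviated logger name, padded
    r"\s+\[(?P<node>[^\]]+)\]"         # node name
    r"\s+(?P<message>.*)$"             # rest of the line
)

# Levels treated as worth a closer look (an assumption for this sketch).
IMPORTANT_LEVELS = {"WARN", "ERROR", "FATAL"}


def warnings_and_errors(lines):
    """Return parsed entries whose level is WARN or higher."""
    entries = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in IMPORTANT_LEVELS:
            entries.append(match.groupdict())
    return entries
```

It could be run over the node's log file, for example `warnings_and_errors(open(path, encoding="utf-8"))`, where `path` is the location of the Elasticsearch log on that machine.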


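This means discovery timed out and the node never connected to the cluster. I added a few more hosts, which had no effect, and I was close to giving up. Then it occurred to me to replace the hostnames with IPs to save some DNS time, and that actually worked.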

discovery.seed_hosts: ["10.19.5.106", "10.19.43.55","es5", "10.19.74.130"]

It is still rather puzzling 😭: the previous few nodes all joined successfully, and only this one failed.
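
To check the DNS theory on the affected machine, one option is to time how long each seed hostname takes to resolve. The sketch below is only a rough aid under stated assumptions: the function names `time_dns_lookup` and `check_seed_hosts` are hypothetical, port 9300 is taken from `transport.tcp.port` in the config above, and the lookup goes through Python and the OS resolver rather than the Elasticsearch JVM, so the timings are indicative at best.

```python
import socket
import time


def time_dns_lookup(host, port=9300):
    """Resolve one seed host and report how long the lookup took."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        # Each entry ends with a sockaddr tuple whose first item is the IP.
        addresses = sorted({info[4][0] for info in infos})
        error = None
    except socket.gaierror as exc:
        # The name could not be resolved; keep the reason for the report.
        addresses = []
        error = str(exc)
    return {
        "host": host,
        "addresses": addresses,
        "seconds": time.monotonic() - start,
        "error": error,
    }


def check_seed_hosts(hosts, port=9300):
    """Time the lookup of every host in a discovery.seed_hosts style list."""
    return [time_dns_lookup(host, port) for host in hosts]
```

Calling `check_seed_hosts(["es01", "es6", "es10"])` on the node that failed to join would show whether any of the original seed hostnames resolves slowly or not at all. An IP literal passes straight through without a DNS query, which matches the workaround that fixed the problem.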

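I could not find a setting in the official documentation for extending this timeout, and the settings in the older article I found are too outdated to have any effect here. My only guess is that the master nodes were under heavy load and therefore slow to respond?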
