Elasticsearch Cluster Health Problems (Red, Yellow): Root-Cause Analysis
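Note: some of the concept descriptions below are taken from online sources.
I. The three Elasticsearch cluster states:
Green - all data is available; every primary and replica shard is allocated.
Yellow - all data is available, but some replica shards are not yet allocated. Queries are unaffected, but recovery may be: if a node in the cluster fails, some data may be unavailable until that node is repaired.
Red - at least one primary shard is unassigned for some reason, so some data is unavailable and queries are affected.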
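The three definitions above can be expressed as a small rule. The following is a minimal sketch, not Elasticsearch's own implementation: the function name cluster_status and the (is_primary, state) tuple layout are assumptions made for illustration, and the real health calculation handles extra cases (for example newly created indices) that are ignored here.

```python
def cluster_status(shards):
    """Derive a health colour from (is_primary, state) pairs.

    `shards` is an iterable of (is_primary: bool, state: str) tuples, where
    state is a shard state as shown by _cat/shards, e.g. "STARTED",
    "RELOCATING", "INITIALIZING" or "UNASSIGNED". (Hypothetical input format.)
    """
    status = "green"
    for is_primary, state in shards:
        if state in ("STARTED", "RELOCATING"):
            continue  # an active copy does not degrade the status
        if is_primary:
            return "red"  # an inactive primary means some data is unavailable
        status = "yellow"  # an inactive replica only reduces redundancy
    return status


print(cluster_status([(True, "STARTED"), (False, "UNASSIGNED")]))  # yellow
```

Red takes precedence over Yellow, so the function returns as soon as it sees an inactive primary.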
II. Finding the cause of a Yellow index
1. Check cluster health, including per-index status
GET /_cluster/health?level=indices
{
"cluster_name" : "elasticsearch-1",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
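# number of active primary shards
"active_primary_shards" : 28,
# total number of active primary and replica shards
"active_shards" : 55,
# number of shards currently relocating
"relocating_shards" : 0,
# number of shards currently initializing
"initializing_shards" : 0,
# number of unassigned shards
"unassigned_shards" : 3,
# number of shards whose allocation is delayed by the timeout setting
"delayed_unassigned_shards" : 0,
# number of cluster-level changes that have not yet been executed
"number_of_pending_tasks" : 0,
# number of unfinished (in-flight) fetches
"number_of_in_flight_fetch" : 0,
# how long the earliest pending task has been waiting in the queue, in milliseconds
"task_max_waiting_in_queue_millis" : 0,
# percentage of shards in the cluster that are active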
"active_shards_percent_as_number" : 100.0,
"indices" : {
"elasticsearch-1" : {
"status" : "green",
"number_of_shards" : 3,
"number_of_replicas" : 3,
"active_primary_shards" : 5,
"active_shards" : 10,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 3
}
}
}
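When the response covers many indices, it can help to filter it down to the ones that actually have unassigned shards. The sketch below assumes the response has already been fetched and parsed into a dict; the helper name indices_with_unassigned is hypothetical, and the sample is a trimmed copy of the output above.

```python
import json


def indices_with_unassigned(health):
    """Return {index_name: unassigned_count} for indices with unassigned shards."""
    return {
        name: info.get("unassigned_shards", 0)
        for name, info in health.get("indices", {}).items()
        if info.get("unassigned_shards", 0) > 0
    }


# Trimmed copy of the health response shown above.
sample = json.loads("""
{
  "cluster_name": "elasticsearch-1",
  "unassigned_shards": 3,
  "indices": {
    "elasticsearch-1": {
      "number_of_shards": 3,
      "number_of_replicas": 3,
      "unassigned_shards": 3
    }
  }
}
""")

print(indices_with_unassigned(sample))  # {'elasticsearch-1': 3}
```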
2. Check shard allocation on each node in the cluster
GET /_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host      ip        node
    19       86.7kb    36.9gb     95.2gb    132.2gb           27 127.0.0.1 127.0.0.1 master
    18       73.1kb    36.9gb     95.2gb    132.2gb           27 127.0.0.1 127.0.0.1 node-003
    18       67.8kb    36.9gb     95.2gb    132.2gb           27 127.0.0.1 127.0.0.1 node-002
     3                                                                               UNASSIGNED
# unassigned_shards=3 confirms that replica shards are unassigned, which is what turns the cluster Yellow
3. Find out why the shards are unassigned
GET /_cluster/allocation/explain?pretty
{
"index" : "elasticsearch-1",
"shard" : 3,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2022-04-20T11:01:43.051Z",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "no",
# the reason allocation failed
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions" : [
{
"node_id" : "NfmBH4nSSpGmtf7aPNuvXQ",
"node_name" : "master",
"transport_address" : "127.0.0.1:9300",
"node_decision" : "no",
"deciders" : [{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists"
}]
}]
}
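Every node reports the same reason: it already holds a copy of this shard, so another copy cannot be placed on it. With number_of_replicas set to 3, each shard needs 4 copies (1 primary + 3 replicas), but the cluster has only 3 data nodes and Elasticsearch never puts two copies of the same shard on one node. One replica of each of the 3 shards therefore stays unassigned, which matches unassigned_shards = 3.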
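The arithmetic behind this can be checked with a few lines of code. This is a sketch under one assumption: the same_shard rule is the only constraint in play (no disk watermarks, allocation filtering, or awareness settings). Both function names are hypothetical.

```python
def unassignable_replicas(number_of_shards, number_of_replicas, data_nodes):
    """Replica copies that can never be placed because each node may hold
    at most one copy (primary or replica) of a given shard."""
    copies_per_shard = 1 + number_of_replicas  # primary + replicas
    surplus_per_shard = max(0, copies_per_shard - data_nodes)
    return number_of_shards * surplus_per_shard


def max_useful_replicas(data_nodes):
    """Largest number_of_replicas that can be fully assigned."""
    return max(0, data_nodes - 1)


# Values from the example above: 3 shards, 3 replicas, 3 data nodes.
print(unassignable_replicas(3, 3, 3))  # 3
print(max_useful_replicas(3))          # 2
```

For a 3-node cluster the largest replica count that can be fully assigned is 2.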
4. List all shards
GET _cat/shards?h=index,shard,prirep,state,unassigned.reason
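On a cluster with many shards, the plain-text output of this request is easier to read once it is grouped by reason. The sketch below assumes the five columns requested above, in that order; the helper name unassigned_reasons and the sample text are illustrative, not real cluster output.

```python
from collections import Counter


def unassigned_reasons(cat_shards_text):
    """Count unassigned shards per (index, prirep, reason).

    Expects lines with the columns: index shard prirep state unassigned.reason
    (the reason column is empty for shards that are assigned).
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "UNASSIGNED":
            reason = fields[4] if len(fields) > 4 else "UNKNOWN"
            counts[(fields[0], fields[2], reason)] += 1
    return counts


# Hypothetical sample in the shape returned by the request above.
sample_text = """\
elasticsearch-1 0 p STARTED
elasticsearch-1 0 r STARTED
elasticsearch-1 0 r UNASSIGNED CLUSTER_RECOVERED
elasticsearch-1 1 p STARTED
elasticsearch-1 1 r UNASSIGNED CLUSTER_RECOVERED
"""

for key, count in unassigned_reasons(sample_text).items():
    print(key, count)
```

In the prirep column, p marks a primary and r a replica, so a result containing only r entries points to a Yellow rather than a Red cluster.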
5. Change the index's replica count
PUT /elasticsearch-1/_settings
{
"number_of_replicas": 2
}
6. Query again after the change
GET /_cluster/health?level=indices
"unassigned_shards" : 0
III. Summary (Red, Yellow)
When the cluster turns Red or Yellow, you can investigate at the following levels:
Cluster level: curl -s 172.31.30.28:9200/_cat/nodes or GET /_cluster/health
Index level: GET /_cluster/health?pretty&level=indices
Shard level: GET /_cluster/health?pretty&level=shards
Recovery progress: GET /_recovery?pretty
1. Troubleshooting unassigned shards:
Diagnose first: GET /_cluster/allocation/explain
Reallocate: POST /_cluster/reroute
If the shards still cannot be allocated, rebuild the index:
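1.1 Create a backup index:
curl -XPUT 'http://xxxx:9200/a_index_copy' -H 'Content-Type: application/json' -d '{ "settings": { "index": { "number_of_shards": 3, "number_of_replicas": 1 } } }'
1.2 Use the reindex API to copy the data from a_index to a_index_copy:
POST _reindex { "source": { "index": "a_index" }, "dest": { "index": "a_index_copy", "op_type": "create" } }
1.3 Delete the a_index index. This must be done before the next step; otherwise the alias cannot be added, because an alias cannot share its name with an existing index.
curl -XDELETE 'http://xxxx:9200/a_index'
1.4 Add the alias a_index to a_index_copy:
curl -XPOST 'http://xxxx:9200/_aliases' -H 'Content-Type: application/json' -d '{ "actions": [ {"add": {"index": "a_index_copy", "alias": "a_index"}} ] }'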
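The four rebuild steps can also be written down as an ordered plan, which makes the required ordering (delete before alias) explicit. This is a sketch only: rebuild_plan and to_request are hypothetical helpers, the base URL is a placeholder, and nothing is sent to a cluster. The code only builds request objects from the Python standard library.

```python
import json
from urllib.request import Request


def rebuild_plan(index, copy, shards=3, replicas=1):
    """Return the rebuild steps as ordered (method, path, body) tuples."""
    return [
        ("PUT", f"/{copy}", {"settings": {"index": {
            "number_of_shards": shards, "number_of_replicas": replicas}}}),
        ("POST", "/_reindex", {"source": {"index": index},
                               "dest": {"index": copy, "op_type": "create"}}),
        # The old index must be gone before its name can be reused as an alias.
        ("DELETE", f"/{index}", None),
        ("POST", "/_aliases", {"actions": [
            {"add": {"index": copy, "alias": index}}]}),
    ]


def to_request(base_url, step):
    """Build (but do not send) a urllib Request for one step of the plan."""
    method, path, body = step
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(base_url + path, data=data, headers=headers, method=method)


plan = rebuild_plan("a_index", "a_index_copy")
for step in plan:
    req = to_request("http://localhost:9200", step)  # placeholder address
    print(req.get_method(), req.full_url)
```

Each request could then be sent with urllib.request.urlopen. Before running the DELETE step, it is worth checking that the reindex completed and that the document counts match, since the deletion cannot be undone.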