Common causes of unassigned shards in Elasticsearch and how to fix them

A summary of the causes behind unassigned shards in Elasticsearch indices, with a fix for each

ArchitecTang · Published 2022-09-05 16:35:06

Finding out why shards are unassigned

GET /_cluster/health
GET /_cluster/allocation/explain?pretty
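
The explain response can be long, so it helps to pull out only the deciders that blocked allocation. The following is a minimal Python sketch, assuming the response has already been fetched and parsed into a dict; the function name `summarize_explain` and the sample dict are made up for illustration, and the sample is a hand-written, heavily abbreviated stand-in for a real response.

```python
from typing import Any, Dict, List


def summarize_explain(resp: Dict[str, Any]) -> List[str]:
    """Return one 'node: [decider] explanation' line per blocking decider."""
    lines: List[str] = []
    for node in resp.get("node_allocation_decisions", []):
        name = node.get("node_name", "?")
        for decider in node.get("deciders", []):
            # Deciders that answered YES are not the reason the shard is stuck.
            if decider.get("decision") != "YES":
                lines.append(
                    f"{name}: [{decider.get('decider')}] {decider.get('explanation')}"
                )
    # Some causes (for example, no valid shard copy) are reported at the top level.
    if not lines and "allocate_explanation" in resp:
        lines.append(resp["allocate_explanation"])
    return lines


# Hand-written sample for illustration only; a real response has many more fields.
sample = {
    "index": "indexName",
    "shard": 0,
    "primary": False,
    "node_allocation_decisions": [
        {
            "node_name": "node-1",
            "deciders": [
                {
                    "decider": "disk_threshold",
                    "decision": "NO",
                    "explanation": "the node is above the high watermark cluster setting",
                }
            ],
        }
    ],
}

for line in summarize_explain(sample):
    print(line)
```

Each line printed this way can be compared against the messages quoted in the numbered sections that follow.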

1. Disk full

the node is above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=95%], using more disk space than the maximum allowed [95.0%], actual free: [4.055101177689788%]

Fix: expand the disk, or delete data that is no longer needed on a regular schedule (define a retention period)

DELETE /indexName

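When a disk fills up, ES protects cluster stability by marking every index that has a shard on that node as read-only (the index.blocks.read_only_allow_delete block). From ES 7.4 onward the block is released automatically once disk space has been freed; on earlier versions it has to be removed manually with the API below:

PUT /indexName/_settings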
{
  "index": {
    "blocks": {
      "read_only_allow_delete": "false"
    }
  }
}
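
To make the numbers in the error message concrete: the node reports 4.055% free, which means about 95.94% used, above the 95% high watermark. The Python sketch below redoes that arithmetic and estimates how much space must be freed to get back under a watermark. The function names and the 1000/40 example figures are assumptions for illustration, not values from a real cluster.

```python
import math


def used_percent(free_percent: float) -> float:
    """Convert the 'actual free' percentage from the message into used percent."""
    return 100.0 - free_percent


def exceeds_watermark(free_percent: float, watermark_percent: float) -> bool:
    """True when disk usage is above the given watermark."""
    return used_percent(free_percent) > watermark_percent


def space_to_free(total: float, free: float, watermark_percent: float) -> int:
    """Amount (same unit as the inputs) to free so usage drops to the watermark."""
    required_free = total * (100.0 - watermark_percent) / 100.0
    return max(0, math.ceil(required_free - free))


free_from_message = 4.055101177689788
print(round(used_percent(free_from_message), 2))   # 95.94
print(exceeds_watermark(free_from_message, 95.0))  # True

# Hypothetical disk: 1000 GB total with 40 GB free, 95% watermark.
print(space_to_free(1000, 40, 95.0))               # 10
```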

2. Document count exceeds the maximum for a shard

failure IllegalArgumentException[number of documents in the index cannot exceed 2147483519

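Fix: write to a new index (for example, create one index per day) and size the shards so that no single shard approaches the limit. The limit comes from Lucene and applies to each shard, since every shard is one Lucene index.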
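
The number in the message is Lucene's per-index document limit, 2^31 - 1 - 128. The following Python sketch shows where it comes from and gives a rough way to pick a primary shard count and a daily index name. It assumes documents are spread evenly across shards; the `headroom` parameter, the function names, and the `logs` prefix are illustrative choices, not anything prescribed by Elasticsearch.

```python
import math
from datetime import date

# Integer.MAX_VALUE - 128, the value quoted in the error message.
MAX_DOCS_PER_SHARD = 2**31 - 1 - 128


def min_primary_shards(expected_docs: int, headroom: float = 0.5) -> int:
    """Smallest shard count keeping each shard under headroom * the limit.

    Assumes an even spread of documents across shards.
    """
    budget = int(MAX_DOCS_PER_SHARD * headroom)
    return max(1, math.ceil(expected_docs / budget))


def daily_index_name(prefix: str, day: date) -> str:
    """Build a per-day index name such as 'logs-2022.09.05'."""
    return f"{prefix}-{day:%Y.%m.%d}"


print(MAX_DOCS_PER_SHARD)                          # 2147483519
print(min_primary_shards(5_000_000_000))           # 5
print(daily_index_name("logs", date(2022, 9, 5)))  # logs-2022.09.05
```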

3. The node holding the primary shard has gone offline

cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster

Fix: find out why the node went offline, bring it back into the cluster, and wait for the shards to recover

PUT /_cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.include._ip": "IP address"
  }
}

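4. Index attributes do not match node attributes

node does not match index setting [index.routing.allocation.require] filters [temperature:"warm",_id:"comdNq4ZSd2Y6ycB9Oubsg"]

Fix: change the index's hot/warm attribute so that it matches the nodes. Changing the node attribute instead requires a node restart. The temperature attribute that the index requires of its nodes can be changed through the settings API: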

PUT /indexName/_settings
{
  "index": {
    "routing": {
      "allocation": {
        "require": {
          "temperature": "warm"
        }
      }
    }
  }
}
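
The check that fails here is, in essence, "every required attribute must be present on the node with the same value". The Python sketch below is a deliberately simplified model of that rule (exact string matches only, no wildcards or multiple values); `unmet_requirements` is a made-up helper name, and the attribute dicts are examples.

```python
from typing import Dict


def unmet_requirements(node_attrs: Dict[str, str], require: Dict[str, str]) -> Dict[str, str]:
    """Return the required attributes that the node does not satisfy.

    Simplified model: a requirement is met only by an exact value match.
    """
    return {key: value for key, value in require.items() if node_attrs.get(key) != value}


index_require = {"temperature": "warm"}

# A hot node cannot hold the shard; a warm node can.
print(unmet_requirements({"temperature": "hot"}, index_require))   # {'temperature': 'warm'}
print(unmet_requirements({"temperature": "warm"}, index_require))  # {}
```

An empty result means the node passes the filter; anything else lists what has to change on either the index or the node.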

5. A node rejoins the cluster after a long outage and brings stale data with it

cannot allocate because all found copies of the shard are either stale or corrupt

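Fix: use the reroute API to promote a stale copy to primary. This accepts the loss of any writes that copy missed, so use it only when no up-to-date copy can be recovered.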

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "indexName",
        "shard": 0,
        "node": "nodeName",
        "accept_data_loss": true
      }
    }
  ]
}
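
Because this command can discard data, it may be worth building the request body in code with an explicit guard instead of typing it by hand. The Python sketch below only constructs the JSON body; sending it to the cluster is left out. The helper name `allocate_stale_primary_body` is hypothetical, and `indexName` and `nodeName` are the same placeholders used in the request above.

```python
import json
from typing import Any, Dict


def allocate_stale_primary_body(
    index: str, shard: int, node: str, accept_data_loss: bool = False
) -> Dict[str, Any]:
    """Build the reroute body for promoting a stale copy to primary."""
    # Force the caller to acknowledge the data loss explicitly.
    if not accept_data_loss:
        raise ValueError("allocate_stale_primary requires accept_data_loss=True")
    if shard < 0:
        raise ValueError("shard must be a non-negative shard number")
    return {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }


body = allocate_stale_primary_body("indexName", 0, "nodeName", accept_data_loss=True)
print(json.dumps(body, indent=2))
```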

6. Too many unassigned shards: the shard recovery limit has been reached and the remaining shards have to queue

reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])

Fix: use the cluster settings API to raise the concurrency and speed of shard recovery

PUT /_cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "200mb",
    "cluster.routing.allocation.node_concurrent_recoveries": 5,
    "cluster.routing.allocation.cluster_concurrent_rebalance": 5
  }
}
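
A rough feel for what these settings buy can be had from simple arithmetic. The Python sketch below estimates how many rounds of recovery one node needs at a given concurrency, and how long a single shard takes to copy if the recovery throttle were the only bottleneck. Both are back-of-the-envelope figures under that assumption; the shard count and shard size are invented examples, and "mb" is treated as 1024 * 1024 bytes.

```python
import math


def recovery_rounds(queued_shards: int, concurrent_recoveries: int) -> int:
    """Number of rounds needed when only N shards recover at once on a node."""
    return math.ceil(queued_shards / concurrent_recoveries)


def transfer_seconds(shard_bytes: int, max_mb_per_sec: float) -> float:
    """Time to copy one shard if the recovery throttle is the only limit."""
    return shard_bytes / (max_mb_per_sec * 1024 * 1024)


# Hypothetical node with 20 shards waiting to recover.
print(recovery_rounds(20, 2))  # 10 rounds at the limit from the message
print(recovery_rounds(20, 5))  # 4 rounds with the raised setting

# Hypothetical 50 GB shard at the 200mb throttle from the request above.
print(transfer_seconds(50 * 1024**3, 200))  # 256.0
```

Real recoveries also depend on disk and network capacity, so raising these values beyond what the hardware can sustain may slow down normal indexing and search.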

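Compiled from: Elasticsearch Cluster Planning and Performance Optimization in Practice (notes) (original title: Elasticsearch集群规划及性能优化实践（笔记）)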
