Finding out why shards are unassigned

GET /_cluster/health
GET _cluster/allocation/explain?pretty
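
The explain response can be long, so it helps to boil it down to the deciders that actually block allocation. The following is a minimal sketch, assuming the response has already been fetched and parsed into a dict; the helper name `summarize_explain` is hypothetical, and only the `unassigned_info`, `node_allocation_decisions` and `deciders` fields of the response are read.

```python
from typing import Any


def summarize_explain(explain: dict[str, Any]) -> list[str]:
    """Return one line for the unassigned reason plus one line per blocking decider."""
    lines: list[str] = []

    # Why the shard became unassigned in the first place (e.g. NODE_LEFT).
    info = explain.get("unassigned_info", {})
    if info:
        lines.append(f"unassigned reason: {info.get('reason', 'unknown')}")

    # Per-node decisions: keep only deciders that did not answer YES.
    for node in explain.get("node_allocation_decisions", []):
        name = node.get("node_name", node.get("node_id", "?"))
        for decider in node.get("deciders", []):
            if decider.get("decision") == "YES":
                continue
            lines.append(
                f"{name} [{decider.get('decider')}]: {decider.get('explanation')}"
            )
    return lines


if __name__ == "__main__":
    # Hand-written sample shaped like an explain response, not real cluster output.
    sample = {
        "unassigned_info": {"reason": "NODE_LEFT"},
        "node_allocation_decisions": [
            {
                "node_name": "node-1",
                "deciders": [
                    {
                        "decider": "disk_threshold",
                        "decision": "NO",
                        "explanation": "the node is above the high watermark",
                    }
                ],
            }
        ],
    }
    for line in summarize_explain(sample):
        print(line)
```

Each of the numbered cases below corresponds to one kind of explanation text that shows up in these decider entries.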
1. Disk is full
the node is above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=95%], using more disk space than the maximum allowed [95.0%], actual free: [4.055101177689788%]

Fix: expand the disk, or delete unneeded data on a schedule (set a data retention period)

DELETE /indexName
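
A retention period is easiest to enforce when index names carry a date. The sketch below picks the indices that have aged out, assuming a hypothetical naming scheme of `logs-YYYY.MM.DD`; the function name, prefix and date format are illustrative assumptions, not something the cluster dictates. It only selects names; each one would still have to be removed with the DELETE request shown above.

```python
from datetime import date, datetime, timedelta


def expired_indices(names, today, keep_days, prefix="logs-", fmt="%Y.%m.%d"):
    """Return the index names whose date suffix is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of the dated indices we manage
        try:
            day = datetime.strptime(name[len(prefix):], fmt).date()
        except ValueError:
            continue  # suffix is not a date, leave the index alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    names = ["logs-2024.01.01", "logs-2024.01.09", "other-2024.01.01"]
    print(expired_indices(names, today=date(2024, 1, 10), keep_days=7))
```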

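Typically, when a disk fills up, ES protects cluster stability by marking every index that has a shard on that node as read-only (the index.blocks.read_only_allow_delete block). From ES 7.4 onward the block is released automatically once disk usage drops back below the high watermark; on earlier versions you have to remove it manually with the API below: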

PUT indexName/_settings
{
  "index": {
    "blocks": {
      "read_only_allow_delete": "false"
    }
  }
}
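
The watermark message at the start of this section reports free space, while the threshold is expressed as used space, which makes the comparison easy to misread. Here is a small worked sketch of that arithmetic; the function names are hypothetical, and the numbers in the usage example are the ones from the message above.

```python
import math


def disk_used_percent(free_percent: float) -> float:
    """Convert the 'actual free' percentage into a used percentage."""
    return 100.0 - free_percent


def exceeds_watermark(free_percent: float, watermark_percent: float) -> bool:
    """True if used space is above the watermark (both given in percent)."""
    return disk_used_percent(free_percent) > watermark_percent


def bytes_to_free(total_bytes: int, free_bytes: int, watermark_percent: float) -> int:
    """Bytes that must be freed to bring usage down to the watermark (0 if already below)."""
    used = total_bytes - free_bytes
    allowed = total_bytes * watermark_percent / 100.0
    return max(0, math.ceil(used - allowed))


if __name__ == "__main__":
    free = 4.055101177689788   # "actual free" from the message
    high = 95.0                # cluster.routing.allocation.disk.watermark.high
    print(disk_used_percent(free), exceeds_watermark(free, high))
```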
2. Document count exceeds the maximum
failure IllegalArgumentException[number of documents in the index cannot exceed 2147483519

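Fix: write to a new index (for example, create a new index every day) and size the shards accordingly. The limit of 2147483519 documents applies to each shard (a shard is one Lucene index), so spreading the data over more indices or more primary shards keeps every shard below it.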
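
To make the sizing concrete, here is a minimal sketch that derives a daily index name and a lower bound on the number of primary shards from the expected document count. The function names, the `prefix-YYYY.MM.DD` naming scheme and the `headroom` safety factor are assumptions for illustration; the only value taken from the source is the limit quoted in the error message.

```python
import math
from datetime import date

# Per-shard document limit quoted in the error message above.
MAX_DOCS_PER_SHARD = 2_147_483_519


def daily_index_name(prefix: str, day: date) -> str:
    """Build a dated index name such as 'logs-2024.01.05' (naming scheme is an assumption)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def min_primary_shards(expected_docs: int, headroom: float = 0.5) -> int:
    """Smallest primary shard count that keeps each shard under headroom * limit."""
    usable = int(MAX_DOCS_PER_SHARD * headroom)
    return max(1, math.ceil(expected_docs / usable))


if __name__ == "__main__":
    print(daily_index_name("logs", date(2024, 1, 5)))
    print(min_primary_shards(3_000_000_000))
```

This assumes documents are spread evenly across shards; a skewed routing key would need more margin.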

3. The node holding the primary shard went offline
cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster

Fix: find out why the node dropped out, bring it back into the cluster, and wait for the shards to recover

PUT /_cluster/settings
{
  "transient" : {
    "cluster.routing.allocation.include._ip": "IP address"
  }
}
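
"Waiting for the shards to recover" can be scripted by polling the cluster health shown at the top of this note. The sketch below is one way to do that, under stated assumptions: `fetch_health` is a hypothetical callable that returns the parsed `GET /_cluster/health` response as a dict (how it talks to the cluster is left open), and only the `status` and `unassigned_shards` fields are read.

```python
import time
from typing import Callable


def wait_for_recovery(
    fetch_health: Callable[[], dict],
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll cluster health until it is green with no unassigned shards, or time out."""
    deadline = clock() + timeout_s
    while True:
        health = fetch_health()
        if health.get("status") == "green" and health.get("unassigned_shards", 0) == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


if __name__ == "__main__":
    # Canned responses stand in for a real cluster in this demo.
    canned = iter([
        {"status": "red", "unassigned_shards": 3},
        {"status": "yellow", "unassigned_shards": 1},
        {"status": "green", "unassigned_shards": 0},
    ])
    print(wait_for_recovery(lambda: next(canned), poll_s=0.0))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without a live cluster.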
4. Index attributes do not match node attributes
node does not match index setting [index.routing.allocation.require] filters [temperature:"warm",_id:"comdNq4ZSd2Y6ycB9Oubsg"]

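Fix: change the index's hot/warm attribute so that it matches the nodes. If you change the node attribute instead, the node has to be restarted. The temperature attribute that the index requires of its nodes can be changed through the API: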

PUT /indexName/_settings
{
  "index": {
    "routing": {
      "allocation": {
        "require": {
          "temperature": "warm"
        }
      }
    }
  }
}
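
Before changing the setting, it can be useful to see exactly which required attribute a node fails to satisfy. The following is a simplified sketch: the function name is hypothetical, attributes are compared by exact string equality, and built-in filters such as `_id` or `_ip`, wildcards and comma-separated values are deliberately not handled.

```python
def unmet_requirements(node_attrs: dict, require: dict) -> dict:
    """Return the required attributes the node lacks or has with a different value."""
    return {
        key: value
        for key, value in require.items()
        if node_attrs.get(key) != value
    }


if __name__ == "__main__":
    require = {"temperature": "warm"}           # index.routing.allocation.require.*
    hot_node = {"temperature": "hot"}           # custom node attributes
    warm_node = {"temperature": "warm"}
    print(unmet_requirements(hot_node, require))   # mismatch is reported
    print(unmet_requirements(warm_node, require))  # nothing unmet
```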
5. A node rejoins the cluster after a long outage and brings stale data with it
cannot allocate because all found copies of the shard are either stale or corrupt

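Fix: use the reroute API. Note that allocate_stale_primary with accept_data_loss set to true promotes a stale copy to primary, so any writes that copy missed are lost.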

POST /_cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "IndexName",
        "shard": 0,
        "node": "nodeName",
        "accept_data_loss": true
      }
    }
  ]
}
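
Because this command can discard data, it is worth building the request body in code that refuses to proceed without an explicit acknowledgement. The sketch below only constructs the JSON body; the function name is hypothetical, while the field names (`index`, `shard`, `node`, `accept_data_loss`) are those of the reroute API's `allocate_stale_primary` command.

```python
import json


def stale_primary_command(index: str, shard: int, node: str,
                          accept_data_loss: bool = False) -> dict:
    """Build the reroute body that promotes a stale shard copy to primary."""
    if not accept_data_loss:
        # The API itself rejects the command unless data loss is accepted.
        raise ValueError("allocate_stale_primary requires accept_data_loss=True")
    if shard < 0:
        raise ValueError("shard number must be non-negative")
    return {
        "commands": [
            {
                "allocate_stale_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }


if __name__ == "__main__":
    body = stale_primary_command("IndexName", 0, "nodeName", accept_data_loss=True)
    print(json.dumps(body, indent=2))
```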

6. Too many unassigned shards: the limit on concurrent shard recoveries has been reached, so the remaining shards have to queue

reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])

Fix: use the cluster settings API to raise the concurrency and the speed of shard recovery

PUT /_cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "200mb",
    "cluster.routing.allocation.node_concurrent_recoveries": 5,
    "cluster.routing.allocation.cluster_concurrent_rebalance": 5
  }
}
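
To judge whether a higher `indices.recovery.max_bytes_per_sec` is worth it, a rough lower bound on recovery time can be computed from the amount of data to copy. This is a back-of-the-envelope sketch with hypothetical helper names; it assumes the throttle is the only bottleneck and treats size units as powers of 1024. Real recoveries are also bounded by disk, network and the concurrency settings above.

```python
import re

# Size suffixes interpreted as powers of 1024.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def parse_size(text: str) -> int:
    """Parse a size string such as '200mb' into bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgt]?b)\s*", text.lower())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def recovery_seconds(total_bytes: int, max_bytes_per_sec: str) -> float:
    """Lower bound on the time to copy total_bytes at the given throttle."""
    return total_bytes / parse_size(max_bytes_per_sec)


if __name__ == "__main__":
    data = parse_size("100gb")  # hypothetical amount of shard data to recover
    print(recovery_seconds(data, "200mb"))
```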

Compiled with reference to: "Elasticsearch Cluster Planning and Performance Optimization in Practice (notes)"
