If you run a self-hosted Elasticsearch cluster, losing shards is actually a fairly common occurrence.

1. If the cluster loses a primary shard

  • The cluster health goes straight to red.
  • In severe cases writes are affected: the primary shard is gone, but the master node still records that the shard exists, so indexing requests against it will fail.

2. If the cluster loses a replica shard

  • The cluster health turns yellow.
  • Writes and queries are not affected.
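A quick way to tell which of these situations you are in is the cluster health and cat shards APIs (both standard Elasticsearch APIs):

GET _cluster/health

GET _cat/shards?v

A red status means at least one primary shard is unassigned; yellow means every primary is assigned but some replicas are not. The _cat/shards output lists each shard's state (e.g. STARTED or UNASSIGNED), which helps you spot the problem shard.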

So what exactly can cause a cluster to lose shards?

 Based on a year of running Elasticsearch and the failures I have hit, here is my summary. My cluster has nine nodes; the largest index currently occupies 5.2 TB on disk, 10+ TB including replicas, and holds about 1.3 billion documents.

1. The cluster itself misbehaves at runtime, leaving a shard that can no longer be routed to.

2. A power outage is dangerous: it can easily leave the cluster in an inconsistent state, because the version persisted to disk no longer matches the version held in memory before the outage.

3. Elasticsearch itself has bugs, which can introduce data inconsistency while the cluster is running. It is not as fragile as you might imagine, but not as safe as you might imagine either.

A bag of tricks for fixing an Elasticsearch cluster in red status

In my experience, a red cluster can almost always be resolved through the following three steps, listed in order of priority.

First, a warm-up: let's check the official documentation to see whether this approach is safe.

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/7.9/cluster-reroute.html

commands

(Required, array of objects) Defines the commands to perform. Supported commands are:

Properties of commands

move

Move a started shard from one node to another node. Accepts index and shard for index name and shard number, from_node for the node to move the shard from, and to_node for the node to move the shard to.

cancel

Cancel allocation of a shard (or recovery). Accepts index and shard for index name and shard number, and node for the node to cancel the shard allocation on. This can be used to force resynchronization of existing replicas from the primary shard by cancelling them and allowing them to be reinitialized through the standard recovery process. By default only replica shard allocations can be cancelled. If it is necessary to cancel the allocation of a primary shard then the allow_primary flag must also be included in the request.

allocate_replica

Allocate an unassigned replica shard to a node. Accepts index and shard for index name and shard number, and node to allocate the shard to. Takes allocation deciders into account.

Two more commands are available that allow the allocation of a primary shard to a node. These commands should however be used with extreme care, as primary shard allocation is usually fully automatically handled by Elasticsearch. Reasons why a primary shard cannot be automatically allocated include the following:

  • A new index was created but there is no node which satisfies the allocation deciders.
  • An up-to-date shard copy of the data cannot be found on the current data nodes in the cluster. To prevent data loss, the system does not automatically promote a stale shard copy to primary.

The following two commands are dangerous and may result in data loss. They are meant to be used in cases where the original data can not be recovered and the cluster administrator accepts the loss. If you have suffered a temporary issue that can be fixed, please see the retry_failed flag described above. To emphasise: if these commands are performed and then a node joins the cluster that holds a copy of the affected shard then the copy on the newly-joined node will be deleted or overwritten.

allocate_stale_primary

Allocate a primary shard to a node that holds a stale copy. Accepts the index and shard for index name and shard number, and node to allocate the shard to. Using this command may lead to data loss for the provided shard id. If a node which has the good copy of the data rejoins the cluster later on, that data will be deleted or overwritten with the data of the stale copy that was forcefully allocated with this command. To ensure that these implications are well-understood, this command requires the flag accept_data_loss to be explicitly set to true.

allocate_empty_primary

Allocate an empty primary shard to a node. Accepts the index and shard for index name and shard number, and node to allocate the shard to. Using this command leads to a complete loss of all data that was indexed into this shard, if it was previously started. If a node which has a copy of the data rejoins the cluster later on, that data will be deleted. To ensure that these implications are well-understood, this command requires the flag accept_data_loss to be explicitly set to true.

Examples

This is a short example of a simple reroute API call:

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "test", "shard": 0,
        "from_node": "node1", "to_node": "node2"
      }
    },
    {
      "allocate_replica": {
        "index": "test", "shard": 1,
        "node": "node3"
      }
    }
  ]
}

1. Use the reroute API that Elasticsearch provides. Run the command below in Kibana; this usually resolves the problem.

POST /_cluster/reroute?retry_failed=true

 Wait a short while after running it. If it works, you're done; if not, reach for the next trick.
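To check whether the retry worked, you can wait on the cluster health (wait_for_status and timeout are standard parameters of the health API):

GET /_cluster/health?wait_for_status=yellow&timeout=30s

Yellow is enough to confirm that all primary shards are assigned again; if the call times out, inspect the returned status field and move on to the next step.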

2. Still using the built-in API, reroute the shard manually.

Run the command below in Kibana. Take care to substitute your own values: index is the problem index, shard is the problem shard number, and node is the data node that still holds a (stale) copy of the shard. How to find these values is explained below.

POST _cluster/reroute
{
  "commands": [
    {
      "allocate_stale_primary": {
        "index": "device_search_20201204",
        "shard": 153,
        "node": "reading_10.10.2.75_node2",
        "accept_data_loss": true
      }
    }
  ]
}

If you don't know which index, shard, or node has the problem, run the following in Kibana:

GET /_cluster/allocation/explain?pretty

The response tells you which shard is unassigned and why.
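For reference, an explain response looks roughly like the sketch below. This is an illustrative example only; your values will differ:

{
  "index": "device_search_20201204",
  "shard": 153,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT"
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster"
}

The index, shard, and primary fields give you the values to plug into the reroute command; can_allocate and allocate_explanation tell you why Elasticsearch refuses to allocate the shard automatically.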

3. Force the shard back up, abandoning its data

 At this point you can't have it both ways: in production you sometimes have to accept losing part of the data in exchange for restoring the cluster quickly.

 Don't use this method unless you truly have no alternative.

 The big problem is that the shard's data is lost for good, so use it with great caution.

POST _cluster/reroute?pretty
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "device_search_20201204",
        "shard": 40,
        "node": "reading_10.10.2.75_node1",
        "accept_data_loss": true
      }
    }
  ]
}


Prevention beats hindsight: keep the loss to a minimum

There is little we can do to stop shards from being lost, but from the recovery angle we can make sure as little data as possible is gone for good.

Snapshot the data at regular intervals, so that when something goes wrong the data is still there. True real-time protection is hard to achieve this way, though.

Keep at least one replica of the data.
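The two measures above can be sketched with the standard snapshot and index settings APIs. The repository name my_backup and the location path are placeholders; an fs repository also requires that path to be listed under path.repo in elasticsearch.yml on every node:

PUT _snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_backups"
  }
}

PUT _snapshot/my_backup/snapshot_20201204?wait_for_completion=true

PUT device_search_20201204/_settings
{
  "number_of_replicas": 1
}

For the "every so often" part, recent Elasticsearch versions ship snapshot lifecycle management (SLM), which can schedule and retain snapshots automatically.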
