Part 1: Elasticsearch Problems and Solutions


程序员阿鸿 · published 2021-12-16 10:54:39

This article is based on Elasticsearch 7.10.

1. Using a script to update one property of every object in an object array.

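Background: the business team enabled CDN acceleration for product images, so the domain of every existing image URL had to change. The image URLs in the database could be updated with a single SQL statement, but the data in our Elasticsearch cluster is not synced from the database binlog; it is written from messages sent by the business side. So the image domain in Elasticsearch had to be updated by hand.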

Goal: in the emails index, replace the domain in the imageUrl property of every object in the images field with the new domain.

Mapping of the emails index:

{
  "emails" : {
    "mappings" : {
      "dynamic" : "strict",
      "properties" : {
        "images" : {
          "properties" : {
            "imageUrl" : {
              "type" : "keyword"
            },
            "isMaster" : {
              "type" : "long"
            },
            "order" : {
              "type" : "long"
            }
          }
        }
      }
    }
  }
}

Elasticsearch DSL request:

POST emails/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": """
    def images = ctx._source.images;
    if (images != null) {
      for (def image : images) {
        // Skip entries without an imageUrl to avoid a null pointer exception
        def url = image['imageUrl'];
        if (url != null && url.startsWith('https://www.baidu.com')) {
          image['imageUrl'] = url.replace('https://www.baidu.com', 'https://www.toutiao.com');
        }
      }
      ctx._source.images = images;
    }
    """,
    "lang": "painless"
  }
}
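To make the loop's behavior easy to check outside the cluster, here is a minimal Java sketch of the same rewrite applied to an in-memory list of image maps. It is an illustration only, not something Elasticsearch runs: the class and method names are hypothetical, the domains are the ones from the script above, and unlike the script's replace(), which rewrites every occurrence of the old domain in the URL, this sketch swaps only the leading prefix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ImageUrlRewrite {
    // Domains taken from the example script above
    static final String OLD_PREFIX = "https://www.baidu.com";
    static final String NEW_PREFIX = "https://www.toutiao.com";

    // Rewrites imageUrl in place for every image whose URL starts with OLD_PREFIX.
    // Only the leading prefix is swapped, not later occurrences of the old domain.
    static void rewriteImageUrls(List<Map<String, Object>> images) {
        if (images == null) {
            return; // the document has no images field
        }
        for (Map<String, Object> image : images) {
            Object url = image.get("imageUrl");
            // instanceof is false for null, so images without a URL are skipped
            if (url instanceof String && ((String) url).startsWith(OLD_PREFIX)) {
                String s = (String) url;
                image.put("imageUrl", NEW_PREFIX + s.substring(OLD_PREFIX.length()));
            }
        }
    }

    // Hypothetical helper that builds one image object like those in _source
    static Map<String, Object> image(String url, long order) {
        Map<String, Object> m = new HashMap<>();
        m.put("imageUrl", url);
        m.put("order", order);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> images = new ArrayList<>();
        images.add(image("https://www.baidu.com/a.jpg", 1L));
        images.add(image("https://cdn.example.com/b.jpg", 2L)); // hypothetical URL
        rewriteImageUrls(images);
        System.out.println(images.get(0).get("imageUrl")); // https://www.toutiao.com/a.jpg
        System.out.println(images.get(1).get("imageUrl")); // https://cdn.example.com/b.jpg
    }
}
```

Swapping only the prefix is the more conservative choice: a URL that happens to contain the old domain later in its path is left alone.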

2. Appending values to an array field of a simple type.

Mapping of the emails index:

{
  "emails" : {
    "mappings" : {
      "dynamic" : "strict",
      "properties" : {
        "tagId" : {
          "type" : "long"
        }
      }
    }
  }
}

Elasticsearch DSL request:

POST emails/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": """
    def a = new ArrayList();
    if (ctx._source.tagId != null) {
      def oldTagIds = ctx._source.tagId;
      // A single value is stored as a scalar, several values as a list
      if (oldTagIds instanceof Collection) {
        a.addAll(oldTagIds);
      } else {
        a.add(oldTagIds);
      }
    }
    a.addAll(params.x);
    ctx._source.tagId = a;
    """,
    "params": {
      "x": [1, 2, 3, 4, 5]
    },
    "lang": "painless"
  }
}

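Be sure to check whether the old value is a collection. A field indexed with a single value is stored in _source as a scalar rather than an array, so both cases occur. If the old value is a collection, add it with addAll(); calling add() instead puts the old list inside the new list as a single element, so you end up with a nested list rather than one flat list.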
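The difference between the two branches can be seen in a small Java sketch that mimics the script on plain Java values. The method name appendTagIds is hypothetical, and its first argument stands in for whatever _source holds for tagId (null, a single number, or a list).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class AppendTagIds {
    // Mirrors the Painless script: copy the old value(s), then append the new ids
    static List<Object> appendTagIds(Object oldValue, List<?> newIds) {
        List<Object> result = new ArrayList<>();
        if (oldValue != null) {
            if (oldValue instanceof Collection) {
                result.addAll((Collection<?>) oldValue); // flatten the existing array
            } else {
                result.add(oldValue);                    // single value stored as a scalar
            }
        }
        result.addAll(newIds);
        return result;
    }

    public static void main(String[] args) {
        List<Integer> params = Arrays.asList(1, 2, 3);
        System.out.println(appendTagIds(Arrays.asList(7, 8), params)); // [7, 8, 1, 2, 3]
        System.out.println(appendTagIds(9, params));                   // [9, 1, 2, 3]
        System.out.println(appendTagIds(null, params));                // [1, 2, 3]

        // The pitfall: add() on a collection nests it instead of flattening it
        List<Object> wrong = new ArrayList<>();
        wrong.add(Arrays.asList(7, 8));
        wrong.addAll(params);
        System.out.println(wrong);                                     // [[7, 8], 1, 2, 3]
    }
}
```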

3. Exceptions you may hit when using the flattened field type.

Background: a business table has an extension field that holds a map, and keys can be added to or removed from that map over time.

1) The following exception occurred:

Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception 
[type=max_bytes_length_exceeded_exception , reason=bytes can be at most 32766 in length;
 got 38881]

The cause was that, in the business data, one key in the extension map had a very long value. The official documentation explains the following.

The flattened type has an ignore_above parameter, which the documentation describes as:

Leaf values longer than this limit will not be indexed. By default, there is no limit and all values 
will be indexed. Note that this limit applies to the leaf values within the flattened object field,
and not the length of the entire field.


This option is also useful for protecting against Lucene’s term byte-length limit of 32766.

The documentation's advice on choosing a value for ignore_above:

The value for ignore_above is the character count, but Lucene counts bytes. 
If you use UTF-8 text with many non-ASCII characters, you may want to set the 
limit to 32766 / 4 = 8191 since UTF-8 characters may occupy at most 4 bytes.
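The gap between a character count and Lucene's byte limit can be checked with a small Java sketch. It assumes that ignore_above is compared with the Java String length (UTF-16 code units), which is our reading of "character count", and it treats a leaf value as a single term, ignoring any extra bytes Elasticsearch may add when indexing flattened fields. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class TermLengthCheck {
    // Lucene's maximum byte length for a single indexed term, per the docs quoted above
    static final int LUCENE_MAX_TERM_BYTES = 32766;

    // Number of bytes the value occupies once encoded as UTF-8
    static int utf8Bytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the value, indexed as one term, would exceed Lucene's term length limit
    static boolean exceedsLuceneLimit(String value) {
        return utf8Bytes(value) > LUCENE_MAX_TERM_BYTES;
    }

    // Assumption: ignore_above is compared with String.length() (UTF-16 code units)
    static boolean skippedByIgnoreAbove(String value, int ignoreAbove) {
        return value.length() > ignoreAbove;
    }

    // The documentation's rule of thumb: a UTF-8 character takes at most 4 bytes
    static int maxSafeIgnoreAbove() {
        return LUCENE_MAX_TERM_BYTES / 4;
    }

    static String repeat(String s, int times) {
        StringBuilder sb = new StringBuilder(s.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ascii = repeat("a", 20000);        // 20000 chars, 20000 bytes
        String chinese = repeat("\u4e2d", 11000); // 11000 chars, 33000 bytes (3 bytes each)
        System.out.println(exceedsLuceneLimit(ascii));           // false
        System.out.println(exceedsLuceneLimit(chinese));         // true
        System.out.println(maxSafeIgnoreAbove());                // 8191
        System.out.println(skippedByIgnoreAbove(chinese, 1000)); // true
    }
}
```

The Chinese string is shorter than the ASCII one in characters yet breaks the byte limit, which is exactly why the documentation suggests dividing by 4.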

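Final solution: the values we search on never need to be very long, so we set ignore_above to 1000. Longer values are still kept in _source but are not indexed, so they cannot be searched.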

POST emails/_mapping
{
  "properties": {
    "featureFlat": {
      "type": "flattened",
      "ignore_above":1000
    }
  }
}

2) If entries are removed from a flattened field, updating the document with the update API leaves the removed entries in place.

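This comes down to the difference between full-document and partial-document updates in Elasticsearch.

The index API (PUT or POST to <index>/_doc/<id>) replaces the entire document, whereas the update API (POST <index>/_update/<id> with a doc body) merges the supplied fields into the existing _source, so keys that are missing from the new map are left untouched.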

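Final solution: write the document with the index API, i.e. an IndexRequest, which replaces the whole document (note: I use elasticsearch-rest-high-level-client to work with Elasticsearch).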
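A simplified in-memory model shows why the removed keys survive a partial update. This is a sketch under stated assumptions, not Elasticsearch's actual implementation: the hypothetical partialUpdate models the update API merging a doc into the stored _source (nested objects merged key by key), the hypothetical fullReplace models the index API, and the field name featureFlat is borrowed from the mapping above with made-up values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpdateVsIndex {
    // Model of a partial update: keys in the partial doc overwrite or extend the stored
    // source, nested maps are merged recursively, and keys absent from the partial doc stay.
    @SuppressWarnings("unchecked")
    static Map<String, Object> partialUpdate(Map<String, Object> stored, Map<String, Object> partial) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        for (Map.Entry<String, Object> e : partial.entrySet()) {
            Object oldValue = merged.get(e.getKey());
            Object newValue = e.getValue();
            if (oldValue instanceof Map && newValue instanceof Map) {
                merged.put(e.getKey(), partialUpdate(
                        (Map<String, Object>) oldValue, (Map<String, Object>) newValue));
            } else {
                merged.put(e.getKey(), newValue);
            }
        }
        return merged;
    }

    // Model of the index API: the new document simply replaces the stored one
    static Map<String, Object> fullReplace(Map<String, Object> stored, Map<String, Object> newDoc) {
        return new LinkedHashMap<>(newDoc);
    }

    public static void main(String[] args) {
        Map<String, Object> oldFeatures = new LinkedHashMap<>();
        oldFeatures.put("color", "red");
        oldFeatures.put("size", "XL");
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("featureFlat", oldFeatures);

        // The business side removed "size" from the extension map
        Map<String, Object> newFeatures = new LinkedHashMap<>();
        newFeatures.put("color", "blue");
        Map<String, Object> newDoc = new LinkedHashMap<>();
        newDoc.put("featureFlat", newFeatures);

        System.out.println(partialUpdate(stored, newDoc)); // {featureFlat={color=blue, size=XL}}
        System.out.println(fullReplace(stored, newDoc));   // {featureFlat={color=blue}}
    }
}
```

The stale "size" key only disappears in the full-replace case, which matches the behavior described above.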

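That's all for today. I will keep updating this post, and likes, comments, and bookmarks are welcome. This is my first technical article; thank you all for your support.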

Personal motto: master one more thing today than I did yesterday.
