Elasticsearch: index-time analysis with keyword and text, query-time analysis with term and match

yzh_1346983557 · Published 2022-05-16 15:23:57

1. Index-time analysis: keyword and text

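String type text: full-text indexing, analyzed

By default the value is split into tokens by the standard analyzer, and each token is added to the inverted index.

Aggregations and sorting are not supported by default (fielddata is disabled on text fields).

Intended for full-text (fuzzy) matching; accepts both term and match queries.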

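String type keyword: exact-value indexing, not analyzed

The value is not tokenized; the complete string is stored in the inverted index as a single term.

Aggregations and sorting are supported.

A single term can be at most 32766 bytes long once encoded as UTF-8 (a Lucene limit), not 32766 characters. Set ignore_above to cap the accepted length in characters: a longer value is not indexed, so a term query cannot find it, although it remains in _source.

Intended for exact matching; accepts both term and match queries.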
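The difference between characters and UTF-8 bytes, and the effect of ignore_above, can be sketched in a few lines of Python. This is a toy model only: utf8_bytes and keyword_terms are hypothetical helper names, not part of any Elasticsearch client.

```python
def utf8_bytes(value):
    """Length of the value once encoded as UTF-8 (what the term limit counts)."""
    return len(value.encode("utf-8"))


def keyword_terms(value, ignore_above):
    """Toy model of a keyword field: a value longer than ignore_above
    characters produces no term, so a term query cannot find it."""
    return [value] if len(value) <= ignore_above else []


value = "我的code1"
print(len(value), utf8_bytes(value))           # 7 11
print(keyword_terms(value, ignore_above=256))  # ['我的code1']
print(keyword_terms(value, ignore_above=5))    # []
```

Each Chinese character here takes 3 bytes in UTF-8, which is why the 7-character value occupies 11 bytes.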

Example: how keyword and text fields are analyzed:

DELETE /yzh
PUT /yzh
{
  "mappings": {
    "properties": {
      "code":{
        "type": "keyword"
      },
      "desc":{
        "type": "text"
      }
    }
  }
}

POST /yzh/_doc
{
  "code":"我的code1",
  "desc":"我的第一个code啊"
}

POST /yzh/_doc
{
  "code":"我的code2",
  "desc":"我的第二个code啊"
}

Analyze the data with the _analyze API:

POST /yzh/_analyze
{
  "field":"code",
  "text":"我的code1"
}

POST /yzh/_analyze
{
  "field":"desc",
  "text":"我的第二个code啊"
}

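Comparing the two results below: the keyword field is not tokenized and the whole value goes into the inverted index as one term, while the text field is split into tokens before it is indexed: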

{
  "tokens" : [
    {
      "token" : "我的code1",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    }
  ]
}

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "的",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "第",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "二",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "个",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "code",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 5
    },
    {
      "token" : "啊",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    }
  ]
}
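The two token lists above can be reproduced with a small Python sketch. It is only a rough stand-in for the standard analyzer that happens to agree with it on this sample data (one token per CJK ideograph, one token per run of ASCII letters and digits, lowercased); the real analyzer follows Unicode text segmentation rules. The names TOKEN, analyze_text and analyze_keyword are made up for illustration.

```python
import re

# Assumption: good enough for the sample strings in this article only.
TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")


def analyze_text(text):
    """text field: return (token, start_offset, end_offset, position) tuples."""
    return [(m.group(), m.start(), m.end(), pos)
            for pos, m in enumerate(TOKEN.finditer(text.lower()))]


def analyze_keyword(text):
    """keyword field: the whole value is emitted as a single token."""
    return [(text, 0, len(text), 0)]


print(analyze_keyword("我的code1"))  # [('我的code1', 0, 7, 0)]
for token in analyze_text("我的第二个code啊"):
    print(token)                     # ('我', 0, 1, 0) ... ('啊', 9, 10, 6)
```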

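Example: combining keyword and text:

Add a field name to index yzh: name itself is of type text and is analyzed, while its sub-field keyword (addressed as name.keyword) is of type keyword and is not analyzed.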

PUT /yzh/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}

GET /yzh/_mapping

Result:
{
  "yzh" : {
    "mappings" : {
      "properties" : {
        "code" : {
          "type" : "keyword"
        },
        "desc" : {
          "type" : "text"
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Example: querying a combined keyword and text field:

POST /yzh/_doc
{
  "code":"我的code3",
  "desc":"我的第三个code啊",
  "name":"我看看咋结合的"
}

POST /yzh/_doc
{
  "code":"我的code4",
  "desc":"我的第四个code啊",
  "name":"我看看咋结合的2"
}

POST /yzh/_analyze
{
  "field":"name.keyword",
  "text":"我看看咋结合的"
}

Result:
{
  "tokens" : [
    {
      "token" : "我看看咋结合的",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "word",
      "position" : 0
    }
  ]
}

POST /yzh/_analyze
{
  "field":"name",
  "text":"我看看咋结合的"
}

Result:
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "看",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "看",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "咋",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "结",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "合",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "的",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    }
  ]
}

term exact query: name (name is of type text and is analyzed)

GET /yzh/_search
{
  "query": {
    "term": {
      "name": "我看看咋结合的"
    }
  }
}

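The result is empty: the index holds only the single-character tokens of name, and term does not analyze the search string, so the whole string matches no indexed term.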

term exact query: name.keyword (name.keyword is of type keyword and is not analyzed)

GET /yzh/_search
{
  "query": {
    "term": {
      "name.keyword": "我看看咋结合的"
    }
  }
}

Result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.6931472,
    "hits" : [
      {
        "_index" : "yzh",
        "_type" : "_doc",
        "_id" : "HQzrwYABJSFE9aEtrO9q",
        "_score" : 0.6931472,
        "_source" : {
          "code" : "我的code3",
          "desc" : "我的第三个code啊",
          "name" : "我看看咋结合的"
        }
      }
    ]
  }
}

match full-text query: name (name is of type text and is analyzed)

GET /yzh/_search
{
  "query": {
    "match": {
      "name": "我"
    }
  }
}

Result:
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.4481319,
    "hits" : [
      {
        "_index" : "yzh",
        "_type" : "_doc",
        "_id" : "HQzrwYABJSFE9aEtrO9q",
        "_score" : 1.4481319,
        "_source" : {
          "code" : "我的code3",
          "desc" : "我的第三个code啊",
          "name" : "我看看咋结合的"
        }
      },
      {
        "_index" : "yzh",
        "_type" : "_doc",
        "_id" : "7QwDwoABJSFE9aEtae_F",
        "_score" : 1.3795623,
        "_source" : {
          "code" : "我的code4",
          "desc" : "我的第四个code啊",
          "name" : "我看看咋结合的2"
        }
      }
    ]
  }
}
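The three query results in this section can be traced with a toy inverted index in Python. This is a simplified model under stated assumptions, not how Elasticsearch is implemented: the tokenizer only approximates the standard analyzer on this sample data, the document ids 3 and 4 are arbitrary, and term and match are hypothetical helper functions.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")  # rough standard-analyzer stand-in


def analyze(text):
    return TOKEN.findall(text.lower())


docs = {3: "我看看咋结合的", 4: "我看看咋结合的2"}

name_index = defaultdict(set)     # name (text): one posting list per token
keyword_index = defaultdict(set)  # name.keyword: one posting list per whole value
for doc_id, name in docs.items():
    for token in analyze(name):
        name_index[token].add(doc_id)
    keyword_index[name].add(doc_id)


def term(index, value):
    """term query: look the value up as-is, without analysis."""
    return sorted(index.get(value, set()))


def match(index, text):
    """match query (default or): analyze the text, then union the posting lists."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


print(term(name_index, "我看看咋结合的"))     # []
print(term(keyword_index, "我看看咋结合的"))  # [3]
print(match(name_index, "我"))               # [3, 4]
```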

2. Query-time analysis: term and match

term: exact match

An exact query: the search string is not analyzed before the lookup.

GET /yzh/_search
{
  "query": {
    "term": {
      "code": "我的code1"
    }
  }
}

GET /yzh/_search
{
  "query": {
    "term": {
      "desc": "我"
    }
  }
}

terms: exact match on several values (or)

term is an exact match on a single value. To match any of several values exactly, use terms.

GET /yzh/_search
{
  "query": {
    "terms": {
      "desc": ["我","二"]
    }
  }
}

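The values in the terms array [ ] are combined with or: a document is returned as soon as it matches any one of them. To require all values at once, combine term queries under bool must, as follows: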

GET /yzh/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "desc": "我"
          }
        },
        {
          "term": {
            "desc": "二"
          }
        }
      ]
    }
  }
}
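In terms of posting lists, terms is a union and bool must is an intersection. The Python sketch below illustrates this on the four desc values indexed so far; it is a toy model, the tokenizer only approximates the standard analyzer on this data, and terms_query and bool_must are hypothetical helper names.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")  # rough standard-analyzer stand-in

descs = {1: "我的第一个code啊", 2: "我的第二个code啊",
         3: "我的第三个code啊", 4: "我的第四个code啊"}

desc_index = defaultdict(set)  # token -> ids of documents containing it
for doc_id, desc in descs.items():
    for token in TOKEN.findall(desc.lower()):
        desc_index[token].add(doc_id)


def terms_query(index, values):
    """terms: a document matches if it contains ANY of the values (union)."""
    postings = [index.get(v, set()) for v in values]
    return sorted(set().union(*postings))


def bool_must(index, values):
    """bool must of term queries: a document must contain ALL values (intersection)."""
    postings = [index.get(v, set()) for v in values]
    return sorted(set.intersection(*postings)) if postings else []


print(terms_query(desc_index, ["我", "二"]))  # [1, 2, 3, 4]
print(bool_must(desc_index, ["我", "二"]))    # [2]
```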

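match: analyzed match (or)

The search string is analyzed first, and the resulting tokens are then matched. On a keyword field the string stays a single term, so the first query below behaves like term.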

GET /yzh/_search
{
  "query": {
    "match": {
      "code": "我的code1"
    }
  }
}

GET /yzh/_search
{
  "query": {
    "match": {
      "desc": "我1"
    }
  }
}

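match: analyzed match in and mode

After match splits the search string, a document is returned as soon as it contains any one of the tokens. To require every token, set operator to and.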

GET /yzh/_search
{
  "query": {
    "match": {
      "desc": {
        "query": "我一",
        "operator": "and"
      }
    }
  }
}
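The effect of operator can be sketched in Python: the search string is analyzed, then the posting lists are unioned (or) or intersected (and). This is a toy model on the four desc values, not Elasticsearch's implementation; the tokenizer is only an approximation of the standard analyzer and match is a hypothetical helper.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")  # rough standard-analyzer stand-in


def analyze(text):
    return TOKEN.findall(text.lower())


descs = {1: "我的第一个code啊", 2: "我的第二个code啊",
         3: "我的第三个code啊", 4: "我的第四个code啊"}

desc_index = defaultdict(set)
for doc_id, desc in descs.items():
    for token in analyze(desc):
        desc_index[token].add(doc_id)


def match(index, text, operator="or"):
    """Analyze the search string, then combine the per-token posting lists."""
    postings = [index.get(token, set()) for token in analyze(text)]
    if not postings:
        return []
    combine = set.intersection if operator == "and" else set.union
    return sorted(combine(*postings))


print(match(desc_index, "我1"))                  # [1, 2, 3, 4] ("我" alone is enough)
print(match(desc_index, "我一", operator="and"))  # [1]
```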

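match_phrase: phrase match (analyzed, and, same order)

match_phrase requires every token to appear in the document, adjacent and in the same order as in the search string. The second query below therefore returns nothing, because 的 sits between 我 and 第 in every document.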

GET /yzh/_search
{
  "query": {
    "match_phrase": {
      "desc": "我的第二个"
    }
  }
}

GET /yzh/_search
{
  "query": {
    "match_phrase": {
      "desc": "我第"
    }
  }
}
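Phrase matching with no slop amounts to finding the query tokens as a contiguous run in the document's token sequence. The Python sketch below scans token lists directly, which is a simplification: Elasticsearch uses the positions stored in the index instead. The tokenizer only approximates the standard analyzer on this data, and match_phrase is a hypothetical helper.

```python
import re

TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")  # rough standard-analyzer stand-in


def analyze(text):
    return TOKEN.findall(text.lower())


descs = {1: "我的第一个code啊", 2: "我的第二个code啊",
         3: "我的第三个code啊", 4: "我的第四个code啊"}


def match_phrase(docs, phrase):
    """Return ids of documents whose tokens contain the phrase tokens
    consecutively and in order."""
    query = analyze(phrase)
    n = len(query)
    hits = []
    for doc_id, text in docs.items():
        tokens = analyze(text)
        if n and any(tokens[i:i + n] == query for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return sorted(hits)


print(match_phrase(descs, "我的第二个"))  # [2]
print(match_phrase(descs, "我第"))        # []
```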

multi_match: match across several fields

Use multi_match when a document should be returned if any one of several fields matches the search string.

GET /yzh/_search
{
  "query": {
    "multi_match": {
      "query": "我q",
      "fields": [
        "code",
        "desc"
      ]
    }
  }
}
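A rough Python model of the query above: the search string is analyzed per field (kept whole for the keyword field code, tokenized for the text field desc), and a document is returned if any field matches. Scoring is ignored, the tokenizer only approximates the standard analyzer on this data, and FIELD_TYPES, analyze and multi_match are hypothetical names.

```python
import re

TOKEN = re.compile(r"[a-z0-9]+|[\u4e00-\u9fff]")  # rough standard-analyzer stand-in

FIELD_TYPES = {"code": "keyword", "desc": "text"}

docs = {
    1: {"code": "我的code1", "desc": "我的第一个code啊"},
    2: {"code": "我的code2", "desc": "我的第二个code啊"},
}


def analyze(field, text):
    """keyword fields keep the whole string; text fields are tokenized."""
    if FIELD_TYPES[field] == "keyword":
        return [text]
    return TOKEN.findall(text.lower())


def multi_match(query, fields):
    """Return ids of documents where at least one field shares a token with the query."""
    hits = []
    for doc_id, source in docs.items():
        for field in fields:
            indexed = set(analyze(field, source[field]))
            if indexed & set(analyze(field, query)):
                hits.append(doc_id)
                break
    return sorted(hits)


print(multi_match("我q", ["code", "desc"]))  # [1, 2] (matched through desc only)
print(multi_match("我的code1", ["code"]))    # [1]
```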
