Elasticsearch DSL Queries

wudl5566 · Published 2022-07-10 18:45:01

1. Overall classification

2. DSL syntax

2.1 Match all (match_all query)

Syntax

POST /wubigdata/_search
{
  "query": {
    "match_all": {}
  }
}

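# query: the query object
# match_all: match every document
# Response fields
#   took: time the query took, in milliseconds
#   timed_out: whether the query timed out
#   _shards: shard information
#   hits: summary object for the search results
#     total: total number of matching documents
#     max_score: highest score among all matching documents
#     hits: array of matching documents, one element per hit
#       _index: index name
#       _type: document type
#       _id: document ID
#       _score: document score
#       _source: the document's source data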

2.2 Full-text queries

2.2.1 Match query

A match query accepts text, numbers, or dates, analyzes the input, and builds a boolean query from the resulting terms. The operator parameter sets how the terms are combined: or (the default) or and.

A. OR relationship

POST /wudbes/_search
{
  "query": {
    "match": {
      "title": "小米电视"
    }
  }
}
# Response
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.2044649,
    "hits" : [
      {
        "_index" : "wudbes",
        "_type" : "_doc",
        "_id" : "_C9i54EBVssqw8qzOZhJ",
        "_score" : 1.2044649,
        "_source" : {
          "title" : "小米电视4A",
          "images" : "http://image.com/12479122.jpg",
          "price" : 4288
        }
      },
      {
        "_index" : "wudbes",
        "_type" : "_doc",
        "_id" : "_S9i54EBVssqw8qzR5gA",
        "_score" : 0.52354836,
        "_source" : {
          "title" : "小米手机",
          "images" : "http://image.com/12479622.jpg",
          "price" : 2699
        }
      }
    ]
  }
}

B. AND relationship

For a more precise search, set the operator to and, so that a document must contain every term.

POST /wudbes/_search
{
  "query": {
    "match": {
      "title": {
        "query": "小米电视",
        "operator": "and"
      }
    }
  }
}
# Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.2044649,
    "hits" : [
      {
        "_index" : "wudbes",
        "_type" : "_doc",
        "_id" : "_C9i54EBVssqw8qzOZhJ",
        "_score" : 1.2044649,
        "_source" : {
          "title" : "小米电视4A",
          "images" : "http://image.com/12479122.jpg",
          "price" : 4288
        }
      }
    ]
  }
}
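
To make the difference between the two operators concrete, here is a toy sketch in Python. It is not how Elasticsearch evaluates or scores a match query. The token lists are an assumption about what an analyzer might produce for the two titles, and `matches` and `search` are hypothetical helpers introduced only for this illustration.

```python
# Toy model of the match query's operator:
# "or" needs any query term, "and" needs all of them.
# The token lists are assumed analyzer output, not real Elasticsearch output.
DOCS = {
    "小米电视4A": ["小米", "电视", "4a"],
    "小米手机": ["小米", "手机"],
}


def matches(query_terms, doc_terms, operator="or"):
    """Return True if the document terms satisfy the query under the operator."""
    hits = [term in doc_terms for term in query_terms]
    return all(hits) if operator == "and" else any(hits)


def search(query_terms, operator="or"):
    """Return the titles of matching documents, in insertion order."""
    return [title for title, terms in DOCS.items()
            if matches(query_terms, terms, operator)]


if __name__ == "__main__":
    print(search(["小米", "电视"], "or"))   # both titles match
    print(search(["小米", "电视"], "and"))  # only the TV matches
```

Under these assumptions the or search returns both documents and the and search returns only 小米电视4A, which mirrors the two responses shown above.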

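2.3 Match phrase query

A match_phrase query runs a phrase search on a field. You can set the analyzer and slop, which controls how many positions apart the terms may be while still matching.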

GET /wudbes/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "小米 4A",
        "slop": 2
      }
    }
  }
}



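2.3.1 query_string query

The query_string query is an advanced query that matches against the whole document without naming a field. You can also restrict it to specific fields.

# Default search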
GET /wudbes/_search
{
  "query": {
    "query_string": {
      "query": "2699"
    }
  }
}
GET /wudbes/_search
{
  "query": {
    "query_string": {
      "query": "2699",
      "default_field": "title"
    }
  }
}



# Boolean operators

GET /wudbes/_search
{
  "query": {
    "query_string": {
      "query": "手机 OR 小米",
      "default_field": "title"
    }
  }
}

GET /wudbes/_search
{
  "query": {
    "query_string": {
      "query": "手机 AND 小米",
      "default_field": "title"
    }
  }
}



# Fuzzy query
GET /wudbes/_search
{
  "query": {
    "query_string": {
      "query": "大米~1",
      "default_field": "title"
    }
  }
}



# Multiple fields
GET /wudbes/_search
{
  "query": {
    "query_string": {
      "query": "2699",
      "fields": [
        "title",
        "price"
      ]
    }
  }
}

2.4 Multi-field match (multi_match query)

To run a text search across several fields, use multi_match, which extends match to query multiple fields.

GET /wudbes/_search
{
  "query": {
    "multi_match": {
      "query": "2699",
      "fields": [
        "title","price"
      ]
    }
  }
}

# Use * as a wildcard to match several field names

GET /wudbes/_search
{
  "query": {
    "multi_match": {
      "query": "http://image.com/12479622.jpg",
      "fields": [
        "title",
        "ima*"
      ]
    }
  }
}

2.5 Term-level queries

Use term-level queries to find documents by exact values in structured data.

Create the mapping and add data
Mapping

PUT /wudl_book
{
  "settings": {},
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "name": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "price": {
        "type": "float"
      },
      "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

Add data

PUT /wudl_book/_doc/1
{
  "name": "lucene",
  "description": "Lucene Core is a Java library providing powerful indexing and search features, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities. The PyLucene sub project provides Python bindings for Lucene Core. ",
  "price": 100.45,
  "timestamp": "2022-07-10 19:11:35"
}


PUT /wudl_book/_doc/2
{
  "name": "solr",
  "description": "Solr is highly scalable, providing fully fault tolerant distributed indexing, search and analytics. It exposes Lucenes features through easy to use JSON/HTTP interfaces or native clients for Java and other languages.",
  "price": 320.45,
  "timestamp": "2022-07-10 17:11:35"
}
  
  
PUT /wudl_book/_doc/3
{
  "name": "Hadoop",
  "description": "The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.",
  "price": 620.45,
  "timestamp": "2022-07-22 19:18:35"
}
PUT /wudl_book/_doc/4
{
  "name": "ElasticSearch",
  "description": "Elasticsearch是一个基于Lucene的搜索服务器。它提供了一个分布式多用户能力的全文搜索引擎，基于RESTful web接口。Elasticsearch是用Java语言开发的，并作为Apache许可条款下的开放源码发布，是一种流行的企业级搜索引擎。Elasticsearch用于云计算中，能够达到实时搜索，稳定，可靠，快速，安装使用方便。官方客户端在Java、.NET（C#）、PHP、Python、Apache Groovy、Ruby和许多其他语言中都是可用的。根据DB-Engines的排名显示，Elasticsearch是最受欢迎的企业搜索引擎，其次是Apache Solr，也是基于Lucene。",
  "price": 999.99,
  "timestamp": "2022-08-15 10:11:35"
}

2.5.1 Term query

A term query finds documents whose specified field contains the given term. The search term itself is not analyzed.

POST /wudl_book/_search
{
  "query": {
    "term": {
      "name": "solr"
    }
  }
}

2.5.2 Terms query

A terms query finds documents whose specified field contains any of the given terms.

GET /wudl_book/_search
{
  "query": {
    "terms": {
      "name": [
        "solr",
        "elasticsearch"
      ]
    }
  }
}

2.5.3 Range query

gte: greater than or equal to
gt: greater than
lte: less than or equal to
lt: less than
boost: query weight

GET /wudl_book/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 10,
        "lte": 200,
        "boost": 2
      }
    }
  }
}


GET /wudl_book/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-2d/d",
        "lt": "now/d"
      }
    }
  }
}
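
The date-math expressions in the request above can be hard to read at first. The Python sketch below works out the window that "gte": "now-2d/d" and "lt": "now/d" describe, on the assumption that both bounds round down to the start of the day and that the cluster evaluates them in UTC. `day_window` is a hypothetical helper for illustration, not part of any Elasticsearch client.

```python
from datetime import datetime, timedelta, timezone


def day_window(now, days_back=2):
    """Approximate {"gte": "now-<days_back>d/d", "lt": "now/d"} in UTC.

    Returns (start, end): start is inclusive, end is exclusive.
    Assumes both bounds round down to midnight.
    """
    today_start = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return today_start - timedelta(days=days_back), today_start


if __name__ == "__main__":
    # A fixed "now" keeps the example reproducible.
    now = datetime(2022, 7, 12, 9, 30, tzinfo=timezone.utc)
    start, end = day_window(now)
    print(start.isoformat(), "<= timestamp <", end.isoformat())
```

With that fixed clock the window runs from 2022-07-10 00:00 up to, but not including, 2022-07-12 00:00, so the two sample documents dated 2022-07-10 would fall inside it.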


GET wudl_book/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "18/08/2020",
        "lte": "2021",
        "format": "dd/MM/yyyy||yyyy"
      }
    }
  }
}

2.5.4 Exists query

Finds documents in which the specified field has a non-null value, comparable to column IS NOT NULL in SQL.

GET /wudl_book/_search
{
  "query": {
    "exists": {
      "field": "price"
    }
  }
}

2.5.5 Prefix query

GET /wudl_book/_search
{
  "query": {
    "prefix": {
      "name": "so"
    }
  }
}

2.5.6 Wildcard query

GET /wudl_book/_search
{
  "query": {
    "wildcard": {
      "name": "so*r"
    }
  }
}
GET /wudl_book/_search
{
  "query": {
    "wildcard": {
      "name": {
        "value": "lu*",
        "boost": 2
      }
    }
  }
}

2.5.7 Regexp query

GET /wudl_book/_search
{
  "query": {
    "regexp": {
      "name": "s.*"
    }
  }
}

GET /wudl_book/_search
{
  "query": {
    "regexp": {
      "name": {
        "value": "s.*",
        "boost": 1.2
      }
    }
  }
}

2.5.8 Fuzzy query


GET /wudl_book/_search
{
  "query": {
    "fuzzy": {
      "name": "so"
    }
  }
}
GET /wudl_book/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "so",
        "boost": 1,
        "fuzziness": 2
      }
    }
  }
}

GET /wudl_book/_search
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "sorl",
        "boost": 1,
        "fuzziness": 2
      }
    }
  }
}
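
The fuzziness value in the requests above is a maximum edit distance between the search term and an indexed term. As a rough illustration, the Python sketch below computes an edit distance in which swapping two adjacent characters counts as a single edit, which is my understanding of the fuzzy query's default behavior. `osa_distance` is a hypothetical helper written for this example and is not Elasticsearch's implementation.

```python
def osa_distance(a, b):
    """Edit distance allowing insert, delete, substitute, and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a
    for j in range(cols):
        d[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # swap of two adjacent characters
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    print(osa_distance("sorl", "solr"))  # 1: one adjacent swap
    print(osa_distance("so", "solr"))    # 2: two insertions
```

Both distances are within the fuzziness of 2 used in the last two requests, which is why "so" and "sorl" can reach the term "solr".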

2.5.9 IDs query

GET /wudl_book/_search
{
  "query": {
    "ids": {
      "type": "_doc",
      "values": [
        "1",
        "3"
      ]
    }
  }
}

4. Compound queries

4.1 constant_score query

GET /wudl_book/_search
{
  "query": {
    "term": {
      "description": "solr"
    }
  }
}
GET /wudl_book/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "description": "solr"
        }
      },
      "boost": 1.2
    }
  }
}

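4.2 Bool query

A bool query combines multiple query clauses into one query using boolean logic. The available occurrence types are:
	must: the clause must match and contributes to the score
	filter: the clause must match, but it runs in filter context and does not affect the score
	should: the clause should match (logical OR)
	must_not: the clause must not match; it runs in filter context and does not affect the score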

POST /wudl_book/_search
{
  "query": {
    "bool": {
      "should": {
        "match": {
          "description": "java"
        }
      },
      "filter": {
        "term": {
          "name": "solr"
        }
      },
      "must_not": {
        "range": {
          "price": {
            "gte": 200,
            "lte": 300
          }
        }
      },
      "minimum_should_match": 1,
      "boost": 1
    }
  }
}
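
To see how the clauses of the request above interact, here is a small Python sketch that applies the same conditions to in-memory records. It ignores scoring and analysis entirely. The shortened descriptions and the substring check for "java" are assumptions made for this illustration, and `bool_match` is a hypothetical helper.

```python
# Shortened copies of three sample documents; the descriptions are abbreviated.
BOOKS = [
    {"name": "lucene", "price": 100.45,
     "description": "a Java library for indexing and search"},
    {"name": "solr", "price": 320.45,
     "description": "native clients for Java and other languages"},
    {"name": "hadoop", "price": 620.45,
     "description": "distributed processing of large data sets"},
]


def bool_match(doc):
    """Mimic the bool query's clauses with plain Python checks."""
    in_filter = doc["name"] == "solr"        # filter: term name = solr
    excluded = 200 <= doc["price"] <= 300    # must_not: price in 200..300
    should_hits = int("java" in doc["description"].lower())  # should clause
    # minimum_should_match: 1 requires at least one should clause to match
    return in_filter and not excluded and should_hits >= 1


if __name__ == "__main__":
    print([doc["name"] for doc in BOOKS if bool_match(doc)])  # ['solr']
```

Only the solr record passes: it satisfies the filter, its price of 320.45 lies outside the excluded range, and its description satisfies the single should clause.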

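4.3 Sorting

Sorting by relevance score
By default, results are returned sorted by relevance, with the most relevant documents first. Leaving aside for now what relevance means and how it is calculated, let us first look at the sort parameter and how to use it.
To sort by relevance, relevance has to be expressed as a number. In Elasticsearch the relevance score is a floating-point number returned in the search results as _score, and the default sort order is _score descending. The first request below uses that default; the second sorts by relevance score in ascending order.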


POST /wudl_book/_search
{
  "query": {
    "match": {
      "description": "solr"
    }
  }
}

POST /wudl_book/_search
{
  "query": {
    "match": {
      "description": "solr"
    }
  },
  "sort": [
    {
      "_score": {
        "order": "asc"
      }
    }
  ]
}

Sorting by field value

POST /wudl_book/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

Multi-level sorting

POST /wudl_book/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    },
    {
      "timestamp": {
        "order": "desc"
      }
    }
  ]
}

4.4 Pagination

size: number of documents per page
from: offset of the first document on the current page, int start = (pageNum - 1) * size

POST /wudl_book/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ],
  "size": 2,
  "from": 2
}
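
The offset formula can be wrapped in a small helper. This is a minimal Python sketch, assuming 1-based page numbers; `page_params` is a hypothetical name and not part of any Elasticsearch client.

```python
def page_params(page_num, size):
    """Translate a 1-based page number into "from" and "size" values."""
    if page_num < 1 or size < 1:
        raise ValueError("page_num and size must be >= 1")
    return {"from": (page_num - 1) * size, "size": size}


if __name__ == "__main__":
    # Page 2 with 2 documents per page, as in the request above.
    print(page_params(2, 2))  # {'from': 2, 'size': 2}
```

The returned dictionary can be merged into a search request body next to query and sort.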

4.5 Highlighting

POST /wudl_book/_search
{
  "query": {
    "match": {
      "name": "elasticsearch"
    }
  },
  "highlight": {
    "pre_tags": "<font color='pink'>",
    "post_tags": "</font>",
    "fields": [
      {
        "name": {}
      }
    ]
  }
}



POST /wudl_book/_search
{
  "query": {
    "match": {
      "name": "elasticsearch"
    }
  },
  "highlight": {
    "pre_tags": "<font color='pink'>",
    "post_tags": "</font>",
    "fields": [
      {
        "name": {}
      },
      {
        "description": {}
      }
    ]
  }
}

POST /wudl_book/_search
{
  "query": {
    "query_string": {
      "query": "elasticsearch"
    }
  },
  "highlight": {
    "pre_tags": "<font color='pink'>",
    "post_tags": "</font>",
    "fields": [
      {
        "name": {}
      },
      {
        "description": {}
      }
    ]
  }
}

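4.6 Batch document operations (bulk and mget)

mget: batch retrieval
A single document is fetched with GET /test_index/_doc/1. Fetching many documents by ID one request at a time incurs too much network overhead, so mget retrieves them in a single request.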


GET /_mget
{
  "docs": [
    {
      "_index": "wudl_book",
      "_id": 1
    },
    {
      "_index": "wudl_book",
      "_id": 2
    }
  ]
}

Batch retrieval within a single index:

GET /wudl_book/_mget
{
  "docs": [
    {
      "_id": 2
    },
    {
      "_id": 3
    }
  ]
}

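bulk: batch create, update, and delete
A bulk request carries a series of create, update, and delete operations in a single request, which reduces the number of network round trips.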

POST /_bulk
{ "delete": { "_index": "wudl_book", "_id": "1" }}
{ "create": { "_index": "wudl_book", "_id": "5" }} 
{ "name": "test14","price":100.99 }
{ "update": { "_index": "wudl_book", "_id": "2"} }
{ "doc" : {"name" : "test"} }

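Parameters:

delete: deletes a document. It needs only the action line; a delete has no request body line.
create: forces creation and fails if the document already exists, like PUT /index/_create/id (PUT /index/type/id/_create before 7.0)
index: an ordinary PUT, which either creates the document or replaces it in full
update: performs a partial update

Format: a JSON object must not contain line breaks, and adjacent JSON objects must be separated by a newline.
Isolation: operations do not affect one another. A failed operation returns its error in the corresponding item of the response.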

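In practice: do not make a single bulk request too large. The whole request is held in memory, so an oversized request hurts performance. A few thousand operations and a few megabytes per request is about right.
Because bulk loads the data to be processed into memory, the amount of data per request is limited. There is no fixed optimal size; it depends on your hardware, the size and complexity of your documents, and your indexing and search load.
A common recommendation is 1,000-5,000 documents, or 5-15 MB, per request. A request cannot exceed 100 MB by default; the limit can be changed in the Elasticsearch configuration file (elasticsearch.yml under the config directory), for example:
http.max_content_length: 10mb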
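
The one-object-per-line format is easy to get wrong by hand, so it can help to generate the body. The Python sketch below builds a bulk body from a list of operations and splits a long list into batches. `build_bulk_body` and `chunked` are hypothetical helpers, the default batch size of 1000 is just the lower end of the range suggested above, and actually sending the body to the cluster is left out.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a bulk request body.

    source is None for actions such as delete that have no document line.
    """
    lines = []
    for action, metadata, source in operations:
        # json.dumps emits no line breaks by default: one object per line.
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


def chunked(items, size=1000):
    """Yield successive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    ops = [
        ("delete", {"_index": "wudl_book", "_id": "1"}, None),
        ("create", {"_index": "wudl_book", "_id": "5"},
         {"name": "test14", "price": 100.99}),
        ("update", {"_index": "wudl_book", "_id": "2"},
         {"doc": {"name": "test"}}),
    ]
    for batch in chunked(ops):
        print(build_bulk_body(batch), end="")
```

Run as a script, this prints the same five lines as the bulk request shown earlier, each JSON object on its own line.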

4.7 Filter DSL

# Syntax before 5.0
POST /wudl_book/_search
{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "price": {
            "gte": 20,
            "lte": 1000
          }
        }
      }
    }
  }
}
# Syntax from 5.0 onward
POST /wudl_book/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "price": {
            "gte": 200,
            "lte": 1000
          }
        }
      }
    }
  }
}
