Elasticsearch Study Notes (ES 7.4.2, from the Guli Mall project)
一、Basics
1. Index, type, document
ES is analogous to MySQL: an index, a type, and a document in ES correspond to a database, a table, and a row in MySQL. Document data is written as JSON; each record is one JSON string, whose keys play the role of MySQL column names and whose values the column values.
2. Inverted index
After we store data in ES, ES analyzes it and splits it into terms. For example, 红海行动 is split into 红海 and 行动, 探索红海行动 into 探索, 红海, 行动, and 红海特别行动 into 红海, 特别, 行动, and so on. The terms are listed on the left of a table, with the records containing each term on the right. When we search, results are ordered by a relevance score, as in the figure below:
Based on that figure, if we search for "红海特工行动", the query is split into the three terms 红海, 特工, 行动. Only 特工 fails to match any record. Among the matched terms, records 1, 2, 3, and 5 each match 2 terms, while record 4 matches only 1. We can compute each record's relevance score as follows:
| Record | Relevance score |
| --- | --- |
| Record 1 | 2/2 |
| Record 2 | 2/3 |
| Record 3 | 2/3 |
| Record 4 | 1/2 |
| Record 5 | 2/4 |
The relevance score (document score) above takes the number of matched terms in a record as the numerator and the total number of terms the record was split into as the denominator; the higher the score, the earlier the record appears in search results.
The description of tokenization above is simplified for the sake of explaining the inverted index. In reality it is not that simple: 红海特别行动 may be split not only into 红海, 特别, 行动 but possibly also into 特别行动 and so on. The idea is the same, just applied with finer granularity in practice, since the exact number of terms is hard to pin down.
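The naive scoring rule above (matched terms divided by the record's total term count) can be sketched in Python. The tokenization here is hard-coded sample data mirroring the examples above, not a real ES analyzer:

```python
def naive_score(query_terms, doc_terms):
    """Matched query terms divided by the document's total term count."""
    query_set = set(query_terms)
    matched = sum(1 for t in doc_terms if t in query_set)
    return matched / len(doc_terms)

# The query "红海特工行动" tokenized into three terms
query = ["红海", "特工", "行动"]

# Pre-tokenized sample documents
docs = {
    "红海行动": ["红海", "行动"],                  # score 2/2
    "探索红海行动": ["探索", "红海", "行动"],       # score 2/3
    "红海特别行动": ["红海", "特别", "行动"],       # score 2/3
}

scores = {title: naive_score(query, terms) for title, terms in docs.items()}
# Rank by descending score, the way search results are ordered
ranking = sorted(scores, key=scores.get, reverse=True)
```

Real ES scoring (BM25) is far more elaborate, but the intuition of "more matched terms relative to document length scores higher" carries over.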
3. Access URLs
elasticsearch: http://192.168.56.10:9200
kibana: http://192.168.56.10:5601
二、Common operations
1. Querying node and cluster information
(1) List all ES nodes
http://192.168.56.10:9200/_cat/nodes
Result:
127.0.0.1 62 93 5 0.35 0.29 0.38 dilm * 8e383449ab38
Explanation:
The * marks the current master node, and 8e383449ab38 is the node name, which can also be seen by visiting http://192.168.56.10:9200
(2) Check cluster health
http://192.168.56.10:9200/_cat/health
Result:
1609038775 03:12:55 elasticsearch green 1 1 3 3 0 0 0 0 - 100.0%
Explanation:
green means the cluster is fully healthy; the numbers that follow are cluster and shard statistics
(3) Show the master node
http://192.168.56.10:9200/_cat/master
Result:
manSbe-MSkyNN5WMByvsgQ 127.0.0.1 127.0.0.1 8e383449ab38
Explanation:
8e383449ab38 is the master node's name, which can also be seen by visiting http://192.168.56.10:9200
(4) List all indices
http://192.168.56.10:9200/_cat/indices
Result:
green open .kibana_task_manager_1 uaBoceJ2RcOePTGlYFqfjA 1 0 2 0 12.5kb 12.5kb
green open .apm-agent-configuration FH7Ig-YZQBmihJE8HLLkWw 1 0 0 0 283b 283b
green open .kibana_1 oqIZOOuCRhKYadlo_1821Q 1 0 6 0 29kb 29kb
Explanation:
These system indices store configuration and the like; the command is roughly equivalent to running show databases in MySQL
(5) Show node statistics
http://192.168.56.10:9200/_nodes/stats
Result:
{
"_nodes": {
"total": 1,
"successful": 1,
"failed": 0
},
"cluster_name": "elasticsearch",
"nodes": {
"7gI4uiYxRF2bJILMiG69tg": {
"timestamp": 1652883914938,
"name": "LAPTOP-OTED0HAJ",
"transport_address": "127.0.0.1:9300",
"host": "127.0.0.1",
"ip": "127.0.0.1:9300",
"roles": [
"data",
"ingest",
"master",
"ml",
"remote_cluster_client",
"transform"
]
……
(6) Show brief shard information
http://192.168.56.10:9200/_cat/shards
Result:
kms.wiki.hotlemmas 3 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki.hotlemmas 1 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki.hotlemmas 2 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki.hotlemmas 4 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki.hotlemmas 0 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
.ds-.logs-deprecation.elasticsearch-default-2024.03.29-000001 0 p STARTED 127.0.0.1 DESKTOP-D51CQ5R
.ds-ilm-history-5-2024.03.29-000001 0 p STARTED 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki 2 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki 4 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki 3 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki 7 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki 8 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki 1 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki 6 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki 5 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
kms.wiki 0 p STARTED 0 226b 127.0.0.1 DESKTOP-D51CQ5R
(7) Show detailed cluster state (including shard allocation)
http://192.168.56.10:9200/_cluster/state
Result:
{
"cluster_name": "elasticsearch",
"cluster_uuid": "bzk5iojSRgecxmy3kwaIRQ",
"version": 393,
"state_uuid": "Qn5de73TS7OvvfbhXHrIHQ",
"master_node": "7gI4uiYxRF2bJILMiG69tg",
"blocks": {},
"nodes": {
"7gI4uiYxRF2bJILMiG69tg": {
"name": "LAPTOP-OTED0HAJ",
"ephemeral_id": "ajpLYZfWTj-4nJGCs12JGA",
"transport_address": "127.0.0.1:9300",
"attributes": {
"ml.machine_memory": "51278336000",
"xpack.installed": "true",
"transform.node": "true",
"ml.max_open_jobs": "20"
}
}
},
"metadata": {
"cluster_uuid": "bzk5iojSRgecxmy3kwaIRQ",
"cluster_uuid_committed": true,
"cluster_coordination": {
"term": 21,
"last_committed_config": [
"7gI4uiYxRF2bJILMiG69tg"
],
"last_accepted_config": [
"7gI4uiYxRF2bJILMiG69tg"
],
"voting_config_exclusions": []
},
"templates": {
……
2. Index-level queries
(1) Query index details
Request method:
GET
Request path:
http://192.168.56.10:9200/<index-name>
(2) Query the total document count of an index
Request method:
GET
Request path:
http://192.168.56.10:9200/<index-name>/_count
(3) Count documents matching a condition
Request method:
GET
Request path:
http://192.168.56.10:9200/<index-name>/_count
Request body:
// Any Query DSL works here — the same query syntax used with _search, so it is not repeated
Result:
{
"count": 299117,
"_shards": {
"total": 9,
"successful": 9,
"skipped": 0,
"failed": 0
}
}
3. Indexing (saving) documents
(1) Saving a document with PUT
Request method:
PUT
Request path:
http://192.168.56.10:9200/customer/external/1
Request body:
{
"name": "John Doe"
}
Result:
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 0,
"_primary_term": 1
}
Explanation:
Here customer is the index name, external is the type name, and 1 is the unique id (identifying the document). The data saved is JSON, and every field prefixed with _ in the response is metadata. Note that a PUT request must carry an id (the part after /customer/external/), otherwise the request cannot be sent. If no document with that id exists under the index/type yet, the first request creates one and result in the response is created; executing the same request again performs an update, which is covered in the update section below.
(2) Saving a document with POST
Request method:
POST
Request path:
// Option 1: without an id (the part after /customer/external/ is the id)
http://192.168.56.10:9200/customer/external
// Option 2: with an id
http://192.168.56.10:9200/customer/external/2
Request body:
{
"name": "John Doe"
}
Result (without id):
{
"_index": "customer",
"_type": "external",
"_id": "eKJLpHYBFLB86FefIxDx",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 11,
"_primary_term": 1
}
Explanation:
1. Without a unique id, ES generates one automatically and result is created — every execution of the same request creates a new document.
2. With a unique id, if no document with that id exists under the index/type, the first request creates one and result is created; executing the same request again performs an update, covered in the update section below.
4. Updating documents
(1) Full update with PUT
Request method:
PUT
Request path:
http://192.168.56.10:9200/customer/external/1
Request body:
{
"name": "John Doe"
}
Result:
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 2,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 2,
"_primary_term": 1
}
Explanation:
This request was already executed once above (see "3. Indexing (saving) documents → (1) Saving a document with PUT"); the second identical request is an update, and result in the response is updated
(2) Full update with POST (without _update)
Request method:
POST
Request path:
// with an id (the part after /customer/external/ is the id)
http://192.168.56.10:9200/customer/external/1
Request body:
{
"name": "John Doe"
}
Result:
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 3,
"result": "updated",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 3,
"_primary_term": 1
}
Explanation:
This request was already executed once above (see "3. Indexing (saving) documents → (2) Saving a document with POST"); the second identical request is an update, and result in the response is updated
(3) Partial update with POST (only the given fields change; uses _update)
Request method:
POST
Request path:
// Option 1:
// with an id (the part after /customer/external/ is the id)
http://192.168.56.10:9200/customer/external/1/_update
// Option 2 (ES 7.x style):
http://192.168.56.10:9200/customer/_update/1
Request body:
{
"doc": {
"name": "John Doe"
}
}
Result:
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 4,
"result": "noop",
"_shards": {
"total": 0,
"successful": 0,
"failed": 0
},
"_seq_no": 4,
"_primary_term": 1
}
Explanation:
http://192.168.56.10:9200/customer/external/1 had already been executed at least once, so running it again against the same index/type/id performs an update, and result is updated or noop. This update differs from the two above: those do not check whether the value actually changed — they always write, bumping _version, _seq_no, and _shards. With _update, ES first checks whether the value really changed. If it did, _version, _seq_no, and _shards change and result is updated; if the new value equals the old one, nothing changes and result is noop. So _update lets ES decide whether a write is actually needed. Two things to note:
1) With POST, the path carries not only the id (the part after /customer/external/) but also _update; this syntax is fixed.
2) The data must be wrapped in "doc": {}, not placed directly in {}.
(4) Incrementing a numeric field
Request method:
POST
Request path:
// Option 1:
// with an id (the part after /customer/external/ is the id)
http://192.168.56.10:9200/customer/external/1/_update
// Option 2 (ES 7.x style):
http://192.168.56.10:9200/customer/_update/1
Request body:
{
"script" : "ctx._source.clickCount+=1"
}
Explanation:
ctx._source is fixed syntax, clickCount is the (numeric) field name, and 1 is the amount to add — any number works, including negative values.
(5) Updating fields by condition
Request method:
POST
Request path:
// customer: index name; external: type name
http://192.168.56.10:9200/customer/external/_update_by_query
Request body:
1. Single-field update
{
"query": {
"term": {
"indexId": "ecb50a4579324f9fa35e6c8fa4d8807b"
}
},
"script": {
"source": "ctx._source.kgCategoryName = params.categoryName",
"params": {
"categoryName": "科技强国"
}
}
}
2. Multi-field update
{
"query": {
"term": {
"indexId": "ecb50a4579324f9fa35e6c8fa4d8807b"
}
},
"script": {
"source": "ctx._source.clickCount = params.clickCount;ctx._source.collectCount = params.collectCount",
"params": {
"clickCount": 1,
"collectCount": 2
}
}
}
Explanation:
For multiple assignments, separate the statements with semicolons
(6) Optimistic locking on update
Responses contain two fields: _seq_no (a concurrency-control field, incremented on every update — used for optimistic locking) and _primary_term (changes when the primary shard is reassigned, e.g. after a restart). They can be attached to an update. Suppose two requests are sent concurrently against an existing document; taking the PUT update as the example (the other two update styles work too), still with id=1 and body {"name": "John Doe"}, the request URL is:
http://192.168.56.10:9200/customer/external/1?if_seq_no=13&if_primary_term=1
if_seq_no is compared with the document's _seq_no and if_primary_term with its _primary_term. The update executes only if both pairs match; if either does not, the update fails, with a result like:
{
"error": {
"root_cause": [
{
"type": "version_conflict_engine_exception",
"reason": "[1]: version conflict, required seqNo [13], primary term [1]. current document has seqNo [14] and primary term [1]",
"index_uuid": "ANbuGAD0TYC9_oNMvIskmA",
"shard": "0",
"index": "customer"
}
],
"type": "version_conflict_engine_exception",
"reason": "[1]: version conflict, required seqNo [13], primary term [1]. current document has seqNo [14] and primary term [1]",
"index_uuid": "ANbuGAD0TYC9_oNMvIskmA",
"shard": "0",
"index": "customer"
},
"status": 409
}
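The compare-and-swap behavior behind if_seq_no/if_primary_term can be mimicked in plain Python. This is an illustration of the optimistic-locking idea only, not real client code; all names here are made up:

```python
class Conflict(Exception):
    """Stands in for ES's 409 version_conflict_engine_exception."""

def conditional_update(doc, new_source, if_seq_no, if_primary_term):
    """Apply the update only if the caller's seq_no/primary_term still match."""
    if doc["_seq_no"] != if_seq_no or doc["_primary_term"] != if_primary_term:
        raise Conflict(f"required seqNo [{if_seq_no}], "
                       f"current document has seqNo [{doc['_seq_no']}]")
    doc["_source"] = new_source
    doc["_seq_no"] += 1          # every successful write bumps _seq_no
    doc["_version"] += 1
    return doc

doc = {"_seq_no": 13, "_primary_term": 1, "_version": 14,
       "_source": {"name": "John Doe"}}

# First writer wins: its if_seq_no=13 matches the document
conditional_update(doc, {"name": "Jane"}, if_seq_no=13, if_primary_term=1)

# Second writer still holds if_seq_no=13, but the doc is now at 14 -> conflict
try:
    conditional_update(doc, {"name": "Joe"}, if_seq_no=13, if_primary_term=1)
    conflicted = False
except Conflict:
    conflicted = True
```

On conflict, a real client would re-read the document to get the fresh _seq_no and retry.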
5. Fetching a document by id
Request method:
GET
Request path:
http://192.168.56.10:9200/customer/external/1
Result:
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 4,
"_seq_no": 4,
"_primary_term": 1,
"found": true,
"_source": {
"name": "John Doe"
}
}
6. Deleting documents
Option 1 (delete by document id):
Request method:
DELETE
Request path:
http://192.168.56.10:9200/customer/external/1
Result:
{
"_index": "customer",
"_type": "external",
"_id": "1",
"_version": 15,
"result": "deleted",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 20,
"_primary_term": 1
}
Explanation:
result in the response is deleted, meaning the deletion succeeded. We must specify index → type → unique id (i.e. the document): in /customer/external/1, customer is the index, external the type, and 1 the unique id
Option 2 (delete by query):
POST twitter/_delete_by_query
{
"query": {
"match": {
"message": "some message"
}
}
}
Explanation: deletes by condition; see https://www.elastic.co/guide/en/elasticsearch/reference/6.0/docs-delete-by-query.html
7. Deleting an index
Request method:
DELETE
Request path:
http://192.168.56.10:9200/customer
Result:
{
"acknowledged": true
}
8. Bulk operations
Note: all the operations above were run in Postman, but bulk requests are awkward to write there, so the following are all run in Kibana
(1) A simple bulk insert
Request:
POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}
Result:
{
"took" : 159,
"errors" : false,
"items" : [
{
"index" : {
"_index" : "customer",
"_type" : "external",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1,
"status" : 201
}
},
{
"index" : {
"_index" : "customer",
"_type" : "external",
"_id" : "2",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 201
}
}
]
}
Explanation:
The request method is POST; in the path, customer is the index, external the type, and _bulk marks this as a bulk request. The request body has a distinctive syntax:
{action:{metadata-name: metadata-value, ……}}
{request body}
The action can be index (save), create (save), update, delete, and so on. The metadata fields are the underscore-prefixed values seen earlier (see "5. Fetching a document by id" for their meanings). Here _id, the unique document id, is specified; the request-body line carries the same data used in the save or update operations.
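The action/body line pairs form an NDJSON payload: each line is one compact JSON object and the whole body must end with a newline. A minimal builder, using the sample documents from the request above:

```python
import json

def build_bulk_body(operations):
    """operations: list of (action_dict, source_dict_or_None) pairs.
    delete actions carry no source line, so pass None for them."""
    lines = []
    for action, source in operations:
        lines.append(json.dumps(action, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"   # _bulk requires a trailing newline

body = build_bulk_body([
    ({"index": {"_id": "1"}}, {"name": "John Doe"}),
    ({"index": {"_id": "2"}}, {"name": "John Doe"}),
])
```

This body would be sent as-is to /customer/external/_bulk with Content-Type application/x-ndjson.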
(2) Bulk index / create / update / delete operations
Request:
POST /_bulk
{"index":{"_index":"website","_type":"blog"}}
{"title":"My second blog post"}
{"index":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"My second blog post"}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"My first blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"My updated blog post"}}
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
Result:
{
"took" : 18,
"errors" : true,
"items" : [
{
"index" : {
"_index" : "website",
"_type" : "blog",
"_id" : "faLYpHYBFLB86FefzxCh",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1,
"status" : 201
}
},
{
"index" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 5,
"_primary_term" : 1,
"status" : 201
}
},
{
"create" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"status" : 409,
"error" : {
"type" : "version_conflict_engine_exception",
"reason" : "[123]: version conflict, document already exists (current version [1])",
"index_uuid" : "Je9tgYdORkCHICn2yGJYWA",
"shard" : "0",
"index" : "website"
}
}
},
{
"update" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 6,
"_primary_term" : 1,
"status" : 200
}
},
{
"delete" : {
"_index" : "website",
"_type" : "blog",
"_id" : "123",
"_version" : 3,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 1,
"status" : 200
}
}
]
}
(3) Bulk-loading test data
Request body download link: 点击我
Request:
POST bank/account/_bulk
<paste the request body here>
9. Querying multiple indices
Separate the index names with commas:
Request:
GET cis_bookstore_book,knowledge/_search
三、Query DSL (DSL = domain-specific language)
1. What is Query DSL
(1) URI + request body (the body is the Query DSL)
Request:
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"account_number": {
"order": "asc"
}
}
]
}
Explanation:
_search in bank/_search is fixed and means a search; "match_all": {} is the query condition — to match everything, {} can be left empty; sort orders by the account_number field in ascending order.
DSL definition: Elasticsearch provides a JSON-style DSL (domain-specific language) for expressing queries, known as the Query DSL. The request body above is DSL. The query language is very comprehensive and can feel complex at first; the way to really learn it is to start from basic examples.
(2) URI + request parameters
Besides the URI + request body style, there is another search style that puts the parameters in the URI itself. An example with the same effect as the Query DSL above:
GET /bank/_search?q=*&sort=account_number:asc
q=* matches everything, and sort orders by account_number ascending. The result is identical to the query above, so it is not shown again; in practice the Query DSL is used far more often
2. Basic syntax
(1) The typical structure of a query
{
QUERY_NAME:{
ARGUMENT:VALUE,
ARGUMENT:VALUE,
……
}
}
Here QUERY_NAME is something like query, ARGUMENT is an ES keyword such as match_all or match, and the value can itself contain further content, such as field names (more on this later). For example:
{
"query": {"match_all": {}}
}
No ARGUMENT:VALUE pairs are used here, but they could be
(2) Operating on document fields
Structure:
{
QUERY_NAME:{
FIELD_NAME:[
{
ARGUMENT:VALUE,
ARGUMENT:VALUE,
……
},
……
]
}
}
Here QUERY_NAME is something like sort, FIELD_NAME is a field of the document, and ARGUMENT is an ES keyword, such as order in the example below.
For example:
{
……
"sort": [
{
"account_number": {
"order": "asc"
}
}
]
}
(3) Using query
①、Match everything
GET bank/_search
{
"query": {
"match_all": {}
}
}
Explanation:
match_all matches all documents, but only 10 are returned by default
②、Exact match on non-string values
GET bank/_search
{
"query": {
"term": {
"account_number": 20
}
}
}
Explanation:
account_number is the field name. This exact match behaves like a WHERE clause in MySQL: term treats the field value as a single whole to look up, without any analysis — no tokenization is needed. For exact matches on non-string values, prefer term; match can do the same job, but match is meant for full-text search
③、Exact match on string values
GET bank/_search
{
"query": {
"match": {
"address.keyword": "282 Kings Place"
}
}
}
Explanation:
Every string field can be suffixed with .keyword; with .keyword, the value is looked up as a single whole, without analysis. term could also do this (even without .keyword), but term is conventionally used for non-string exact matches, so for string exact matches prefer .keyword. This is still different from match_phrase: a phrase match also treats the phrase as a whole without splitting it, but it matches against the analyzed (inverted) index, so a document matches even if only part of its address equals the phrase. Every string field (type text) gets a keyword sub-field by default
④、Full-text search on string values (i.e. tokenized matching)
GET bank/_search
{
"query": {
"match": {
"address": "mill lane"
}
}
}
Result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 19,
"relation" : "eq"
},
"max_score" : 9.507477,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 9.507477,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "winnieholland@neteria.com",
"city" : "Urie",
"state" : "IL"
}
},
……
}
Explanation:
total.value of 19 means 19 records matched; max_score of 9.507477 is the highest score among them, and higher-scoring records come first — the first record returned indeed has _score 9.507477. At query time the string mill lane is analyzed into mill and lane, so any document whose address contains mill or lane matches, case-insensitively. When the data was stored, ES likewise analyzed it and built an inverted index; the relevance score comes from how many of the query terms appear in a document's address relative to that address's total term count. Since both the query string and the stored strings are analyzed and scored via the inverted index, a query term that does not exist in the inverted index scores 0 and finds nothing. For example, searching mil finds no data: even though mil is a substring of every address containing mill, matching works on the terms in the inverted index, not on raw substrings, and no term mil exists there
⑤、Phrase match on string values (the string is not tokenized; it is matched as a whole)
GET bank/_search
{
"query": {
"match_phrase": {
"address": "mill lane"
}
}
}
Result:
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 9.507477,
"hits" : [
{
"_index" : "bank",
"_type" : "account",
"_id" : "136",
"_score" : 9.507477,
"_source" : {
"account_number" : 136,
"balance" : 45801,
"firstname" : "Winnie",
"lastname" : "Holland",
"age" : 38,
"gender" : "M",
"address" : "198 Mill Lane",
"employer" : "Neteria",
"email" : "winnieholland@neteria.com",
"city" : "Urie",
"state" : "IL"
}
}
]
}
}
Explanation:
match_phrase does not tokenize the string mill lane; it is looked up as a whole phrase. Only documents whose address, as analyzed at index time, contains the phrase mill lane are returned; matching is case-insensitive
⑥、Multi-field match
GET bank/_search
{
"query": {
"multi_match": {
"query": "mill movico",
"fields": ["address","city"]
}
}
}
Explanation:
The query string mill movico is first tokenized; a document matches if either the address or the city field contains mill or movico. This again goes through the inverted index, and matching is case-insensitive
⑦、Prefix / wildcard / regexp queries
(1) Prefix query
GET bank/_search
{
"query": {
"prefix": {
"address": "Hines"
}
}
}
Explanation: matches values that start with Hines; if the field is of type keyword, the prefix is matched against the whole field value
(2) Wildcard query
GET /my_index/address/_search
{
"query": {
"wildcard": {
"postcode": "W?F*HW"
}
}
}
Explanation: ? matches any single character and * matches 0 or more characters; if the field is of type keyword, the pattern is matched against the whole field value
(3) Regexp query
GET /my_index/address/_search
{
"query": {
"regexp": {
"postcode": "W[0-9].+"
}
}
}
Explanation: regular expressions work here just as they do elsewhere; if the field is of type keyword, the pattern is matched against the whole field value
Summary: the field used with these three queries should be not_analyzed, i.e. of type keyword, which supports this kind of matching; if the field is analyzed, the pattern is matched against each individual term instead
⑧、Compound (bool) queries
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "mill"
}
}
],
"must_not": [
{
"match": {
"age": "28"
}
}
],
"should": [
{
"match": {
"lastname": "Hines"
}
}
],
"filter": {
"range": {
"age": {
"gte": 10,
"lte": 40
}
}
}
}
}
}
Explanation:
bool supports four clauses: must, must_not, should, and filter, meaning all conditions must match, none of the conditions may match, conditions that should ideally match, and pure filtering, respectively.
must and should contribute to the relevance score; filter does not, and must_not is executed as a filter too, so it does not contribute either.
The query above looks for documents whose gender contains M, whose address contains mill, whose age is not 28, and preferably whose lastname contains Hines — a document without Hines still matches, just with a lower _score. Finally, the filter keeps only documents whose age is between 10 and 40 inclusive
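Since the Query DSL is just JSON, a bool query like the one above can also be assembled programmatically. The sketch below rebuilds that exact request body as a Python dict (the helper name is made up):

```python
import json

def bool_query(must=None, must_not=None, should=None, filter_=None):
    """Assemble a bool query body, omitting any empty clause."""
    clauses = {"must": must, "must_not": must_not,
               "should": should, "filter": filter_}
    return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}

body = bool_query(
    must=[{"match": {"gender": "M"}}, {"match": {"address": "mill"}}],
    must_not=[{"match": {"age": "28"}}],
    should=[{"match": {"lastname": "Hines"}}],
    filter_={"range": {"age": {"gte": 10, "lte": 40}}},
)
payload = json.dumps(body, indent=2)  # what would be sent to bank/_search
```

Building queries as dicts instead of string templates avoids quoting bugs and makes clauses easy to add or drop conditionally.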
⑨、Scroll queries
Initial query:
// the scroll parameter sets the scroll id's time-to-live (here in minutes); expired contexts are cleaned up by ES automatically
GET /kms.wiki/_search?scroll=2m
{
"query": {
"match_all": {}
},
"size": 20,
"_source": "title"
}
Result:
{
"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoCQAAAAAAAAAKFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAACxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAAAwWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAANFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADhZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABIWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAARFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABAWZldNV0hQTVlRdWkwNDBGb3NUTkduUQ==",
"took": 20,
"timed_out": false,
"_shards": {
"total": 9,
"successful": 9,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4075,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "kms.wiki",
"_type": "_doc",
"_id": "46b3f4af5ead47a69622e2d13186cf01",
"_score": 1,
"_source": {
"title": "前南斯拉夫国防学院"
}
},
……
]
}
}
Subsequent queries:
// the scroll value is again 2 minutes
// scroll_id is the _scroll_id returned by the previous query; thanks to it, the index name and query condition need not be repeated. When hits comes back empty, the scroll has reached the end — stop querying
GET /_search/scroll?scroll=2m&scroll_id=DnF1ZXJ5VGhlbkZldGNoCQAAAAAAAAAKFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAACxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAAAwWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAANFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADhZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABIWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAARFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABAWZldNV0hQTVlRdWkwNDBGb3NUTkduUQ==
Result:
{
"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoCQAAAAAAAAAKFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAACxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAAAwWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAANFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADhZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABIWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAARFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABAWZldNV0hQTVlRdWkwNDBGb3NUTkduUQ==",
"took": 12,
"timed_out": false,
"terminated_early": true,
"_shards": {
"total": 9,
"successful": 9,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 4075,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "kms.wiki",
"_type": "_doc",
"_id": "328a36aa33b444dfa1c1b379ee0a8d47",
"_score": 1,
"_source": {
"title": "卡-52武装直升机"
}
},
……
]
}
}
Clearing scroll ids:
// scroll contexts hold a lot of memory, so release them promptly once the query is done; the array below holds _scroll_id values — multiple ids exist because multiple queries were run
DELETE /_search/scroll
{
"scroll_id": [
"DnF1ZXJ5VGhlbkZldGNoCQAAAAAAAAAeFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAHRZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABwWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAAfFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAIBZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAACEWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAAiFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAIxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAACQWZldNV0hQTVlRdWkwNDBGb3NUTkduUQ==",
"DnF1ZXJ5VGhlbkZldGNoCQAAAAAAAAAeFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAHRZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABwWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAAfFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAIBZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAACEWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAAiFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAIxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAACQWZldNV0hQTVlRdWkwNDBGb3NUTkduUQ=="
]
}
Note:
- The query must not use the from property, or this error occurs: Validation Failed: 1: using [from] is not allowed in a scroll context
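The client-side pattern above — initial search, repeat with the returned scroll id until hits comes back empty, then delete the scroll — can be sketched against a stand-in fetch function. The page size and data here are made up; only the loop shape matters:

```python
def scroll_all(first_page, next_page, clear):
    """Drain a scroll: keep fetching until a page comes back empty."""
    docs = []
    scroll_id, hits = first_page()
    while hits:                      # empty hits => the scroll is exhausted
        docs.extend(hits)
        scroll_id, hits = next_page(scroll_id)
    clear(scroll_id)                 # release the scroll context promptly
    return docs

# Stand-in "server": pages a list in chunks of 20, like "size": 20 above;
# the scroll id is simulated by a plain offset
DATA = [{"title": f"doc-{i}"} for i in range(45)]
PAGE = 20
cleared = []

def first_page():
    return 0, DATA[:PAGE]

def next_page(offset):
    start = offset + PAGE
    return start, DATA[start:start + PAGE]

results = scroll_all(first_page, next_page, cleared.append)
```

With a real client, first_page would be the GET …/_search?scroll=2m call, next_page the GET /_search/scroll call, and clear the DELETE /_search/scroll call.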
(4) Using sort
Request:
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"account_number": {
"order": "asc"
}
}
]
}
Explanation:
Sorts by account_number in ascending order, like MySQL's ORDER BY <field> <direction>. The form above is the canonical one, but it can be shortened, e.g.
"account_number": {
"order": "asc"
}
can be written as:
"account_number": "asc"
Multi-field sort:
GET bank/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"account_number": {
"order": "asc"
}
},
{
"age": {
"order": "desc"
}
}
]
}
(5) Using from and size
Request:
GET bank/_search
{
"query": {
"match_all": {}
},
"from": 0,
"size": 5
}
Explanation:
The result is omitted. The query matches all documents; from means start at offset 0 (the default starting offset), and size means return 5 documents — the same idea as MySQL's LIMIT start, size
(6) Using _source
Result containing a single field:
{
"query": {
"match_all": {}
},
"_source": "title"
}
Result containing several specified fields:
{
"query": {
"match_all": {}
},
"_source": ["title", "price"]
}
Result excluding certain fields:
{
"query": {
"match_all": {}
},
"_source": {
"excludes": ["firstname", "lastname"]
}
}
Explanation:
_source is similar to SELECT <fields> in MySQL
(7) Using highlighting
By default matches are wrapped in em tags:
{
"query": {
"match": {
"title": "小米"
}
},
"highlight": {
"fields": {
"title": {}
}
}
}
pre_tags and post_tags can specify custom wrapping tags:
{
"query": {
"match": {
"title": "小米"
}
},
"highlight": {
"pre_tags": "<b color='red'>",
"post_tags": "</b>",
"fields": {
"title": {}
}
}
}
(8) Using boost
// 1. With prefix
{
"query": {
"prefix": {
"website": {
"value": "/坦克/",
"boost": 100
}
}
}
}
// 2. With match_phrase
{
"query": {
"match_phrase": {
"title": {
"query": "中国",
"boost": 5
}
}
}
}
// 3. With term and terms
{
"query": {
"bool": {
"should": [{
"term": {
"title.keyword": {
"value": "中国",
"boost": 2999
}
}
},
{
"prefix": {
"website": {
"value": "/中国/",
"boost": 2888
}
}
},
{
"terms": {
"title.keyword": ["中华人民共和国"],
"boost": 1000
}
},
{
"multi_match": {
"query": "中国",
"fields": ["title^10", "content"],
"minimum_should_match": "100%"
}
}
],
"minimum_should_match": 1
}
},
"highlight": {
"pre_tags": "<font>",
"post_tags": "</font>",
"fields": {
"title": {},
"title.keyword": {},
"content": {}
}
},
"_source": ["indexId", "title", "content", "date"],
"from": 0,
"size": 10
}
(9) Using exists
Request:
GET bank/_search
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "kgCategoryName"
}
}
}
}
}
Explanation:
With must_not + exists, a document matches if the field value is null or [] — but not if it is: (1) an empty string such as "" or "-"; (2) an array containing null together with another value, such as [null, "foo"]; (3) a custom null_value. For details see: exists-query
(10) Using range and format
Request:
GET /kms.wiki/_count
{
"query": {
"range": {
"date": {
"gte": "2020-01-01",
"lte": "2021-01-01",
"format": "yyyy-MM-dd"
}
}
}
}
Explanation:
This shows range and, more importantly, format: format declares the date pattern used by gte and lte so ES knows how to parse them
四、Aggregations
1. Basic aggregation usage
GET bank/_search
{
"query": {
"match": {
"address": "mill"
}
},
"aggs": {
"balanceCount": {
"terms": {
"field": "balance",
"size": 10
}
},
"blanceSum": {
"sum": {
"field": "balance"
}
},
"balanceAvg":{
"avg": {
"field": "balance"
}
},
"balanceMin":{
"min": {
"field": "balance"
}
},
"balanceMax":{
"max": {
"field": "balance"
}
}
},
"size": 0
}
Result:
{
"took" : 34,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"balanceMax" : {
"value" : 45801.0
},
"blanceSum" : {
"value" : 100832.0
},
"balanceMin" : {
"value" : 9812.0
},
"balanceCount" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 9812,
"doc_count" : 1
},
{
"key" : 19648,
"doc_count" : 1
},
{
"key" : 25571,
"doc_count" : 1
},
{
"key" : 45801,
"doc_count" : 1
}
]
},
"balanceAvg" : {
"value" : 25208.0
}
}
}
Explanation:
First the query finds documents whose address contains mill; the aggregations then run over those documents:
aggs: declares the aggregations; balanceCount, blanceSum, balanceAvg, balanceMin, balanceMax are the aggregation names
terms: groups by a field; inside it, field names the grouping field and size caps the number of buckets shown at 10 (freely adjustable)
sum: computes a total over the field named by field
avg: computes the average of the field named by field
min: computes the minimum of the field named by field
max: computes the maximum of the field named by field
The final size: 0 means return only the aggregations, not the matching documents themselves
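The same statistics can be reproduced in plain Python over the four matched balance values from the result above (9812, 19648, 25571, 45801), which makes each aggregation's meaning concrete:

```python
from collections import Counter

# The balance values of the 4 documents matching address: mill
balances = [9812, 19648, 25571, 45801]

agg = {
    "balanceCount": Counter(balances),            # terms: value -> doc_count
    "blanceSum": sum(balances),                   # sum
    "balanceAvg": sum(balances) / len(balances),  # avg
    "balanceMin": min(balances),                  # min
    "balanceMax": max(balances),                  # max
}
```

The computed values match the aggregations block of the response above (sum 100832, avg 25208, min 9812, max 45801, and one document per bucket).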
2. Simple sub-aggregations
Problem:
Aggregate by age and compute the average salary of the employees in each age bucket
Request:
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageCount": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
Explanation:
The first aggs buckets documents by age, showing at most 100 buckets; each bucket is itself a set of documents. To aggregate further within those sets we use a sub-aggregation, nested under the parent aggregation's name: inside ageCount there is a child aggs named balanceAvg, which computes the average balance within each bucket
3. Nested sub-aggregations
Problem:
Aggregate by age; within each age bucket, find the average salary for gender M and F separately, plus the bucket's overall average salary
Request:
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageCount": {
"terms": {
"field": "age",
"size": 100
},
"aggs": {
"genderCount": {
"terms": {
"field": "gender.keyword",
"size": 100
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
Result:
………………
"aggregations" : {
"ageCount" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 463,
"buckets" : [
{
"key" : 31,
"doc_count" : 61,
"genderCount" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "M",
"doc_count" : 35,
"balanceAvg" : {
"value" : 29565.628571428573
}
},
{
"key" : "F",
"doc_count" : 26,
"balanceAvg" : {
"value" : 26626.576923076922
}
}
]
},
"balanceAvg" : {
"value" : 28312.918032786885
}
},
………………
Explanation:
For brevity only one bucket is shown: there are 61 employees aged 31 with an average salary of 28312.918032786885; 35 of them are M with an average salary of 29565.628571428573, and 26 are F with an average salary of 26626.576923076922
4. Additional aggregation usage
(1) Using min_doc_count
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"age_agg": {
"terms": {
"field": "age",
"min_doc_count": 2
}
}
},
"size": "0"
}
Note: min_doc_count filters buckets by their document count; the example above keeps only the age buckets containing at least 2 documents
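The effect of min_doc_count is simply a threshold on bucket counts. In Python terms, with made-up sample ages:

```python
from collections import Counter

ages = [31, 31, 39, 26, 26, 26, 45]   # hypothetical sample data
min_doc_count = 2

# Keep only the buckets whose doc_count reaches the threshold
buckets = {age: n for age, n in Counter(ages).items() if n >= min_doc_count}
```

Buckets for 39 and 45 (one document each) are dropped, just as ES would omit them from the terms result.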
五、Mapping
1. Viewing an index's mapping
Request:
GET bank/_mapping
Result:
{
"bank" : {
"mappings" : {
"properties" : {
"account_number" : {
"type" : "long"
},
"address" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"age" : {
"type" : "long"
},
"balance" : {
"type" : "long"
},
………………
Explanation:
bank is the index name and _mapping is fixed. If no field mappings are specified, ES guesses each field's type: non-string values generally become long, and strings become text
2. Creating field mappings
Request:
// my_index is the index name
PUT /my_index
{
"mappings": {
"properties": {
"age":{"type": "integer"},
"email":{"type":"keyword"},
"name":{"type": "text"}
}
}
}
Result:
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "my_index"
}
Explanation:
Before inserting data into an index, we can define the type of each field. There are many field types; see https://blog.csdn.net/hello_world123456789/article/details/95341515 for details. Mapping email as keyword means the field is matched exactly — the whole value is compared directly rather than matched through analyzed terms.
String data can be typed text or keyword: text is analyzed (tokenized), keyword is not. Besides type, a field can also set the index property, which defaults to true; set to false, the field cannot be used in query conditions, e.g. "email":{"type":"keyword", "index": false}
Supplement:
Creating a more complex index definition, with not only mappings but also aliases and settings:
PUT /kms.wiki
{
"aliases": {
"kms.wiki.alias": {}
},
"mappings": {
"properties": {
"basicInfo": {
"type": "text"
},
"browsedCount": {
"type": "long"
},
"catalog": {
"type": "text"
},
"clickCount": {
"type": "long"
},
"collectCount": {
"type": "long"
},
"content": {
"type": "text",
"analyzer": "ik_smart"
},
"contentUrl": {
"type": "text"
},
"date": {
"type": "date"
},
"fileId": {
"type": "keyword"
},
"handleLink": {
"type": "long"
},
"htmlContent": {
"type": "text"
},
"imageUrl": {
"type": "keyword"
},
"indexId": {
"type": "keyword"
},
"kGraphView": {
"type": "nested",
"properties": {
"currentInstance": {
"type": "nested",
"properties": {
"instanceId": {
"type": "keyword"
},
"instanceName": {
"type": "keyword"
},
"objectName": {
"type": "keyword"
}
}
},
"instances": {
"type": "nested",
"properties": {
"instanceId": {
"type": "keyword"
},
"instanceName": {
"type": "keyword"
},
"objectName": {
"type": "keyword"
}
}
},
"relations": {
"type": "nested",
"properties": {
"relationId": {
"type": "keyword"
},
"relationName": {
"type": "keyword"
},
"sourceId": {
"type": "keyword"
},
"targetId": {
"type": "keyword"
}
}
}
}
},
"kgCategoryName": {
"type": "keyword"
},
"kgId": {
"type": "keyword"
},
"labels": {
"type": "nested",
"properties": {
"label": {
"type": "keyword"
},
"type": {
"type": "keyword"
}
}
},
"normalTitle": {
"type": "keyword"
},
"shareCount": {
"type": "long"
},
"source": {
"type": "keyword"
},
"summary": {
"type": "text"
},
"summaryUrl": {
"type": "text"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"pinyin": {
"type": "text",
"analyzer": "pinyin_analyzer"
},
"synonyms": {
"type": "text",
"analyzer": "ik_synon_max_word"
}
},
"analyzer": "ik_max_word"
},
"titleMeaning": {
"type": "keyword"
},
"url": {
"type": "keyword"
},
"website": {
"type": "keyword"
}
}
},
"settings": {
"index": {
"number_of_shards": "9",
"analysis": {
"filter": {
"pinyin_filter": {
"keep_none_chinese_in_first_letter": "true",
"lowercase": "true",
"keep_original": "false",
"keep_first_letter": "true",
"trim_whitespace": "true",
"type": "pinyin",
"keep_none_chinese": "true",
"limit_first_letter_length": "16",
"keep_full_pinyin": "true"
},
"word_sync": {
"type": "synonym",
"synonyms_path": "analysis-ik/synonym.txt"
}
},
"analyzer": {
"ik_synon_max_word": {
"filter": [
"word_sync"
],
"type": "custom",
"tokenizer": "ik_max_word"
},
"ik_synon_smart": {
"filter": [
"word_sync"
],
"type": "custom",
"tokenizer": "ik_smart"
},
"pinyin_analyzer": {
"filter": [
"pinyin_filter"
],
"tokenizer": "whitespace"
}
}
},
"number_of_replicas": "2"
}
}
}
3. Using nested to prevent flattening
Code:
PUT product
{
"mappings": {
"properties": {
……………………
"catalogName": {
"type": "keyword",
"index": false,
"doc_values": false
},
"attrs": {
"type": "nested",
"properties": {
"attrId": {
"type": "long"
},
"attrName": {
"type": "keyword",
"index": false,
"doc_values": false
},
"attrValue": {
"type": "keyword"
}
}
}
}
}
}
Explanation:
attrs contains properties of its own, so it is a complex (object) property. Without "type": "nested", ES flattens complex properties: all attrId values end up in one array, so querying for a single value matches every document whose array contains any of the values — not what we want. Adding "type": "nested" prevents this flattening. Querying a nested property requires special syntax; see the nested-query examples ("Example query"). Although those examples use nested directly under query, it can equally be used inside filter, must, and so on
4. Adding field mappings
Request:
PUT /my_index/_mapping
{
"properties": {
"employee-id": {
"type": "keyword",
"index": "false"
}
}
}
Result:
{
"acknowledged" : true
}
Explanation:
Since the age, email, and name field types were already defined, the index-creation syntax cannot be used again; new fields must be added with this request instead. The new employee-id field has type keyword (the whole value is matched exactly, without analysis) and index set to false, meaning it cannot be used in query conditions; index defaults to true, i.e. searchable
5. Updating field mappings
Once created, a field mapping cannot be changed. The only option is to create a new index with the desired mappings and then migrate the data over, which achieves the same effect as an update
6. Changing the number of index replicas
Request:
PUT /kms.wiki.hotlemmas/_settings
{
"number_of_replicas": "0"
}
Result:
{
"acknowledged" : true
}
7、Data migration
(1) Why ES 7 makes the type optional and ES 8 removes it completely:
Starting with ES 7 the type in the URL is optional: indexing a document no longer requires a document type, and data is stored directly under the index. In a relational database such as MySQL, tables are independent of each other, so same-named columns in different tables never interfere. Elasticsearch is different: it is built on Lucene, and same-named fields in different types of one index are handled by Lucene as a single field. If those same-named fields have different mappings (different field types, etc.), conflicts arise and Lucene's processing efficiency drops. Since we cannot force every type in an index to define same-named fields identically, the type is being removed and documents are stored directly under the index. In short, dropping the type improves ES's data-processing efficiency.
(2) Concrete steps
Task:
Migrate all documents of the account type under the bank index into the new newbank index.
Create the new index's mapping:
PUT /newbank
{
"mappings": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text"
},
"age": {
"type": "integer"
},
"balance": {
"type": "long"
},
"city": {
"type": "keyword"
},
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"employer": {
"type": "keyword"
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"gender": {
"type": "keyword"
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "keyword"
}
}
}
}
Migrating data from ES 6 to ES 7:
POST _reindex
{
"source": {
"index": "bank",
"type": "account"
},
"dest": {
"index": "newbank"
}
}
In source, the index value (bank) is the old index name and the type value (account) is the type under that old index. This is the ES 6 form, where real types still exist; for an ES 7 source you only write the index names, old in source and new in dest. Since we are running ES 7.4.2, we do not use a type.
Migrating data from ES 7 to ES 7:
POST _reindex
{
  "source": {
    "index": "old-index-name"
  },
  "dest": {
    "index": "new-index-name"
  }
}
Summary:
Whether migrating from ES 6 to ES 7 or from ES 7 to ES 7, first create the new index's mapping and make sure its field names match those of the old index; they can be checked with GET bank/_mapping, where bank is the index name. Then run _reindex in the form shown above; an ES 6 source additionally needs the type, which can be found with GET bank/_search. After the migration, GET newbank/_search shows the index and type: you will see "_type" : "_doc". A type still appears, but it is only a placeholder, not a real type.
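Before running _reindex, it is worth verifying that the new mapping covers the same field names as the old one. A small helper along these lines (my own sketch; it works on the properties objects you would copy out of the GET <index>/_mapping responses) can do the comparison:

```python
# properties objects as copied from GET <index>/_mapping (sample data).
old_props = {"account_number": {"type": "long"}, "city": {"type": "keyword"}}
new_props = {"account_number": {"type": "long"}, "city": {"type": "keyword"},
             "state": {"type": "keyword"}}

def mapping_diff(old, new):
    """Report field names missing from, or newly added to, the new mapping."""
    return {"missing_in_new": sorted(set(old) - set(new)),
            "added_in_new": sorted(set(new) - set(old))}

diff = mapping_diff(old_props, new_props)
print(diff)  # {'missing_in_new': [], 'added_in_new': ['state']}
```

An empty missing_in_new list means every old field has a destination; extra fields in the new index are harmless for the migration itself.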
六、Using analyzers
1、Downloading the ik and pinyin analyzers
pinyin analyzer download: https://github.com/medcl/elasticsearch-analysis-pinyin/
ik analyzer download: https://github.com/medcl/elasticsearch-analysis-ik
How to download:
On the GitHub page click tags, find the version that matches your Elasticsearch, click the Downloads button, and download the zip.
How to install:
Windows: unzip the archive into the plugins directory of the Elasticsearch installation.
k8s container: unzip the archive into the /usr/share/elasticsearch/plugins directory.
2、The default analyzer
POST _analyze
{
"analyzer": "standard",
"text": "我是中国人"
}
Explanation:
The standard analyzer handles English only; it cannot segment Chinese, and Chinese text is split into individual characters.
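The character-by-character splitting can be mimicked in Python (a rough approximation of the standard analyzer's CJK handling for illustration, not its real implementation):

```python
import re

def standard_like_tokens(text):
    """Approximation: runs of ASCII letters/digits become word tokens,
    while each CJK character becomes its own single-character token."""
    return re.findall(r"[a-zA-Z0-9]+|[\u4e00-\u9fff]", text.lower())

print(standard_like_tokens("我是中国人"))  # ['我', '是', '中', '国', '人']
print(standard_like_tokens("Hello 中国"))  # ['hello', '中', '国']
```

This is why searching Chinese text with the default analyzer behaves like single-character matching rather than word matching.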
3、The ik_smart analyzer
POST _analyze
{
"analyzer": "ik_smart",
"text": "我是中国人"
}
Result:
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中国人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
}
]
}
4、The ik_max_word analyzer
POST _analyze
{
"analyzer": "ik_max_word",
"text": "我是中国人"
}
Result:
{
"tokens" : [
{
"token" : "我",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0
},
{
"token" : "是",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 1
},
{
"token" : "中国人",
"start_offset" : 2,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "中国",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 3
},
{
"token" : "国人",
"start_offset" : 3,
"end_offset" : 5,
"type" : "CN_WORD",
"position" : 4
}
]
}
Explanation:
From now on, set up the ik analyzer before creating an index, since we need to specify an ik analyzer instead of the default one; the default analyzer's support for Chinese is far too poor.
5、The pinyin analyzer
POST _analyze
{
"analyzer": "pinyin",
"text": "我是中国人"
}
Result:
{
"tokens": [
{
"token": "wo",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 0
},
{
"token": "wszgr",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 0
},
{
"token": "shi",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 1
},
{
"token": "zhong",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 2
},
{
"token": "guo",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 3
},
{
"token": "ren",
"start_offset": 0,
"end_offset": 0,
"type": "word",
"position": 4
}
]
}
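The wszgr token above is the first-letter abbreviation of the full pinyin. Given the per-character pinyin, it can be derived as follows (a sketch with a hard-coded five-character lookup; the real pinyin plugin ships a full dictionary):

```python
# Hypothetical minimal lookup table; the real plugin uses a full dictionary.
PINYIN = {"我": "wo", "是": "shi", "中": "zhong", "国": "guo", "人": "ren"}

def full_pinyin(text):
    """One pinyin syllable per character (keep_full_pinyin=true)."""
    return [PINYIN[ch] for ch in text]

def first_letter_token(text):
    """One token made of the initials (keep_first_letter=true)."""
    return "".join(p[0] for p in full_pinyin(text))

print(full_pinyin("我是中国人"))       # ['wo', 'shi', 'zhong', 'guo', 'ren']
print(first_letter_token("我是中国人"))  # 'wszgr'
```

The analyzer output above is exactly the union of these two token sets, which is what the keep_full_pinyin and keep_first_letter options in the earlier pinyin_filter configuration control.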
6、Setting up a custom remote dictionary repository for the ik analyzer
See the document virtualbox和vagrant.docx.
七、Data migration
See my separate article on Elasticsearch data migration.
八、Examples
GET _search
{
"query": {
"match_all": {}
}
}
POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}
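The _bulk body above is newline-delimited JSON: one action line followed by one source line per document. Such a payload can be assembled generically in Python (my own sketch; no ES client library is assumed):

```python
import json

def bulk_payload(docs):
    """Build an NDJSON _bulk body: action line + source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # A _bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"

payload = bulk_payload([("1", {"name": "John Doe"}),
                        ("2", {"name": "John Doe"})])
print(payload)
```

The resulting string can be POSTed to /customer/external/_bulk with Content-Type application/x-ndjson.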
GET /bank/_search?q=*&sort=account_number:asc
GET bank/_search
{
"query": {
"match_all": {}
},
"_source": "balance"
}
GET bank/_search
{
"query": {
"match_all": {}
},
"_source": "balance"
}
GET bank/_search
{
"query": {
"match": {
"address": "mil"
}
}
}
GET bank/_search
{
"query": {
"match": {
"address": "mill lane"
}
}
}
GET bank/_search
{
"query": {
"multi_match": {
"query": "mill movico",
"fields": ["address","city"]
}
}
}
GET bank/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"gender": "M"
}
},
{
"match": {
"address": "mill"
}
}
],
"must_not": [
{
"match": {
"age": "28"
}
}
],
"should": [
{
"match": {
"lastname": "Hines"
}
}
],
"filter": {
"range": {
"age": {
"gte": 10,
"lte": 40
}
}
}
}
}
}
GET bank/_search
{
"query": {
"match": {
"balance": 16
}
}
}
GET bank/_search
{
"query": {
"term": {
"address.keyword": "282 Kings Place"
}
}
}
# Basic aggregation usage
GET bank/_search
{
"query": {
"match": {
"address": "mill"
}
},
"aggs": {
"balanceCount": {
"terms": {
"field": "balance",
"size": 10
}
},
"balanceSum": {
"sum": {
"field": "balance"
}
},
"balanceAvg":{
"avg": {
"field": "balance"
}
},
"balanceMin":{
"min": {
"field": "balance"
}
},
"balanceMax":{
"max": {
"field": "balance"
}
}
}
}
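What the metric aggregations above compute can be reproduced over a plain list of balances (illustrative values, not real bank data):

```python
# Balances of the matching hits (illustrative sample values).
balances = [25571, 9812, 40540, 3516]

# Equivalent of the sum/avg/min/max metric aggregations in the request above.
aggs = {
    "balanceSum": sum(balances),
    "balanceAvg": sum(balances) / len(balances),
    "balanceMin": min(balances),
    "balanceMax": max(balances),
}
print(aggs)
```

The terms aggregation (balanceCount) additionally buckets the hits by distinct balance value and counts each bucket.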
# Sub-aggregation: group by age and compute the average balance within each age bucket
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageCount": {
"terms": {
"field": "age",
"size": 20
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"balanceAvg": {
"avg": {
"field": "balance"
}
}
},
"size": 0
}
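A sub-aggregation is a group-by with a per-group metric: each terms bucket carries its doc count plus its own avg. In Python terms (illustrative data only):

```python
from collections import defaultdict

# (age, balance) pairs standing in for the documents (sample values).
records = [(30, 1000), (30, 3000), (25, 2000)]

# terms aggregation on age: bucket the balances by age value.
buckets = defaultdict(list)
for age, balance in records:
    buckets[age].append(balance)

# Each bucket gets its doc_count and an avg sub-aggregation (balanceAvg).
age_count = {age: {"doc_count": len(bs), "balanceAvg": sum(bs) / len(bs)}
             for age, bs in buckets.items()}
print(age_count)
```

The top-level balanceAvg in the request, by contrast, averages over all hits rather than per bucket.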
# Group by age; within each age bucket compute the average balance per gender (M/F) as well as the bucket's overall average balance
GET bank/_search
{
"query": {
"match_all": {}
},
"aggs": {
"ageCount": {
"terms": {
"field": "age",
"size": 10
},
"aggs": {
"genderCount": {
"terms": {
"field": "gender.keyword",
"size": 100
},
"aggs": {
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
},
"balanceAvg": {
"avg": {
"field": "balance"
}
}
}
}
},
"size": 0
}
GET bank/_mapping
PUT /my_index
{
"mappings": {
"properties": {
"age":{"type": "integer"},
"email":{"type":"keyword"},
"name":{"type": "text"}
}
}
}
PUT /my_index/_mapping
{
"properties": {
"employee-id": {
"type": "keyword",
"index": "false"
}
}
}
GET bank/_mapping
GET bank/_search
PUT /newbank
{
"mappings": {
"properties": {
"account_number": {
"type": "long"
},
"address": {
"type": "text"
},
"age": {
"type": "integer"
},
"balance": {
"type": "long"
},
"city": {
"type": "keyword"
},
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"employer": {
"type": "keyword"
},
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"gender": {
"type": "keyword"
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"state": {
"type": "keyword"
}
}
}
}
POST _reindex
{
"source": {
"index": "bank",
"type": "account"
},
"dest": {
"index": "newbank"
}
}
GET newbank/_mapping
GET newbank/_search
POST _analyze
{
"analyzer": "standard",
"text": "我是中国人"
}
POST _analyze
{
"analyzer": "ik_smart",
"text": "乔碧萝殿下"
}
POST _analyze
{
"analyzer": "ik_max_word",
"text": "尚硅谷电商项目"
}
GET product/_search
POST _reindex
{
"source": {
"index": "product"
},
"dest": {
"index": "gulimall_product"
}
}
GET gulimall_product/_search
PUT gulimall_product
{
"mappings": {
"properties": {
"skuId": {
"type": "long"
},
"spuId": {
"type": "keyword"
},
"skuTitle": {
"type": "text",
"analyzer": "ik_smart"
},
"skuPrice": {
"type": "double"
},
"skuImg": {
"type": "keyword"
},
"saleCount": {
"type": "long"
},
"hasStock": {
"type": "boolean"
},
"hotScore": {
"type": "long"
},
"brandId": {
"type": "long"
},
"catalogId": {
"type": "long"
},
"brandName": {
"type": "keyword"
},
"brandImg": {
"type": "keyword"
},
"catalogName": {
"type": "keyword"
},
"attrs": {
"type": "nested",
"properties": {
"attrId": {
"type": "long"
},
"attrName": {
"type": "keyword"
},
"attrValue": {
"type": "keyword"
}
}
}
}
}
}
GET gulimall_product/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"skuTitle": "华为"
}
}
],
"filter": [
{
"term": {
"catalogId": 225
}
},
{
"terms": {
"brandId": [
1,
2
]
}
},
{
"nested": {
"path": "attrs",
"query": {
"bool": {
"must": [
{
"term": {
"attrs.attrId": {
"value": 1
}
}
},
{
"term": {
"attrs.attrValue": {
"value": "ELS-AN10"
}
}
}
]
}
}
}
},
{
"term": {
"hasStock": true
}
},
{
"range": {
"skuPrice": {
"gte": 0,
"lte": 10000
}
}
}
]
}
},
"sort": [
{
"skuPrice": {
"order": "desc"
}
}
],
"from": 0,
"size": 1,
"highlight": {
"pre_tags": "<b color='red'>",
"post_tags": "</b>",
"fields": {
"skuTitle": {}
}
},
"aggs": {
"brand_agg": {
"terms": {
"field": "brandId",
"size": 32
},
"aggs": {
"brand_name_agg": {
"terms": {
"field": "brandName",
"size": 32
}
},
"brand_img_agg": {
"terms": {
"field": "brandImg",
"size": 32
}
}
}
},
"catalog_agg": {
"terms": {
"field": "catalogId",
"size": 14
},
"aggs": {
"catalog_name_agg": {
"terms": {
"field": "catalogName",
"size": 14
}
}
}
},
"attr_agg": {
"nested": {
"path": "attrs"
},
"aggs": {
"attr_id_agg": {
"terms": {
"field": "attrs.attrId",
"size": 10
},
"aggs": {
"attr_name_agg": {
"terms": {
"field": "attrs.attrName",
"size": 10
}
},
"attr_value_agg": {
"terms": {
"field": "attrs.attrValue",
"size": 10
}
}
}
}
}
}
}
}
GET /gulimall_product/_mapping
GET /gulimall_product/_search
GET gulimall_product/_search
{
"query": {
"match": {
"brandId": "1"
}
},
"aggs": {
"brand_agg": {
"terms": {
"field": "brandId",
"size": 32
},
"aggs": {
"brand_name_agg": {
"terms": {
"field": "brandName",
"size": 32
}
},
"brand_img_agg": {
"terms": {
"field": "brandImg",
"size": 32
}
}
}
},
"catalog_agg": {
"terms": {
"field": "catalogId",
"size": 14
},
"aggs": {
"catalog_name_agg": {
"terms": {
"field": "catalogName",
"size": 14
}
}
}
},
"attr_agg": {
"nested": {
"path": "attrs"
},
"aggs": {
"attr_id_agg": {
"terms": {
"field": "attrs.attrId",
"size": 10
},
"aggs": {
"attr_name_agg": {
"terms": {
"field": "attrs.attrName",
"size": 10
}
},
"attr_value_agg": {
"terms": {
"field": "attrs.attrValue",
"size": 10
}
}
}
}
}
}
},
"size": 0
}
GET /gulimall_product/_search
{
"from": 0,
"size": 2,
"query": {
"bool": {
"must": [{
"match": {
"skuTitle": {
"query": "华为",
"operator": "OR",
"prefix_length": 0,
"max_expansions": 50,
"fuzzy_transpositions": true,
"lenient": false,
"zero_terms_query": "NONE",
"auto_generate_synonyms_phrase_query": true,
"boost": 1.0
}
}
}],
"filter": [{
"term": {
"catalogId": {
"value": 225,
"boost": 1.0
}
}
}, {
"terms": {
"brandId": [1, 2],
"boost": 1.0
}
}, {
"nested": {
"query": {
"bool": {
"must": [{
"term": {
"attrs.attrId": {
"value": "1",
"boost": 1.0
}
}
}, {
"terms": {
"attrs.attrValue": ["ELS-AN10", "123"],
"boost": 1.0
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"path": "attrs",
"ignore_unmapped": false,
"score_mode": "none",
"boost": 1.0
}
}, {
"nested": {
"query": {
"bool": {
"must": [{
"term": {
"attrs.attrId": {
"value": "4",
"boost": 1.0
}
}
}, {
"terms": {
"attrs.attrValue": ["华为 HUAWEI P40 Pro+"],
"boost": 1.0
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"path": "attrs",
"ignore_unmapped": false,
"score_mode": "none",
"boost": 1.0
}
}, {
"term": {
"hasStock": {
"value": true,
"boost": 1.0
}
}
}, {
"range": {
"skuPrice": {
"from": null,
"to": "8000",
"include_lower": true,
"include_upper": true,
"boost": 1.0
}
}
}],
"adjust_pure_negative": true,
"boost": 1.0
}
},
"sort": [{
"skuPrice": {
"order": "desc"
}
}],
"aggregations": {
"brand_agg": {
"terms": {
"field": "brandId",
"size": 32,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
},
"aggregations": {
"brand_name_agg": {
"terms": {
"field": "brandName",
"size": 32,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
}
},
"brand_img_agg": {
"terms": {
"field": "brandImg",
"size": 32,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
}
}
}
},
"catalog_agg": {
"terms": {
"field": "catalogId",
"size": 14,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
},
"aggregations": {
"catalog_name_agg": {
"terms": {
"field": "catalogName",
"size": 14,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
}
}
}
},
"attr_agg": {
"nested": {
"path": "attrs"
},
"aggregations": {
"attr_id_agg": {
"terms": {
"field": "attrs.attrId",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
},
"aggregations": {
"attr_name_agg": {
"terms": {
"field": "attrs.attrName",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
}
},
"attr_value_agg": {
"terms": {
"field": "attrs.attrValue",
"size": 10,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false,
"order": [{
"_count": "desc"
}, {
"_key": "asc"
}]
}
}
}
}
}
}
},
"highlight": {
"pre_tags": ["<b color='red'>"],
"post_tags": ["</b>"],
"fields": {
"skuTitle": {}
}
}
}
九、Using these requests in Postman
Link: https://pan.baidu.com/s/1j88rbrT6F06zv-Thp7q-tQ?pwd=x6fd
Extraction code: x6fd