ES基本知识语法以及kibana操作
本文通过Kibana来操作ES Rest API. ElasticSearch版本为5.2.0知识点思维导图基础增删改查语法(1) 创建/全量替换记录不存在就是创建, 否则是全量替换.PUT /index/type/id{"属性名" : "va...
本文通过Kibana来操作ES Rest API. ElasticSearch版本为5.2.0
知识点思维导图
基础增删改查
语法
(1) 创建/全量替换
记录不存在就是创建, 否则是全量替换.
PUT /index/type/id
{
"属性名" : "value"
...
}
- 1
- 2
- 3
- 4
- 5
(2) 创建/修改
-
创建
//如果不传id, 则系统自动生成一个UUID POST /index/type/ { "属性名":修改值 } 或者 POST /index/type/id { "属性名":修改值 }
-
修改
//没有带上的属性会被清除 POST /index/type/id { "属性名":修改值 }
(3) 查询
GET /index/type/id
(4) 删除
只是逻辑删除, 将其标记为delete, 当数据越来越多时, ES会自动物理删除.
DELETE /index/type/id
实例
(1) 新增
PUT /company/employee/1
{
"age":25,
"salary":"20k",
"skill":["java","mysql","es"]
}
或者
POST /company/employee
{
"age":30,
"salary":"30k",
"skill":["python","redis","hadoop"]
}
POST /company/employee/2
{
“age”:22,
“salary”:“10k”,
“skill”:[“python”,“java”]
}
POST 命令新增数据时, 如果不传id, 则系统自动生成一个UUID.
(2) 查询
GET /company/employee/1
(3) 修改
POST /company/employee/1
{
"age":40,
"salary":"100k"
}
或
PUT /company/employee/1
{
"age":18,
"salary":"50K",
"skill":["c++"]
}
PUT 是全量替换, 且是幂等操作. 上面的POST修改, skill属性被清除了, 属性现在只有age和salary.
幂等操作: GET, PUT, DELETE都是幂等操作, 执行多次与执行一次结果一样.
(4) 删除
DELETE /company/employee/1
Partial Update
部分更新.
(1) 语法
post /index/type/id/_update
{
"doc": {
"属性名" : "value"
...
}
}
(2) 与全量替换和普通修改的区别
在应用程序中, 全量替换是先从ES中查询出记录, 然后在修改, 一个属性都不能少. 而部分更新就不用先从ES查询记录了, 可以直接修改, 且不用所有属性都列出.
部分更新如果只包含部分属性, 那么其他没有被包含的属性仍然存在, 但普通修改其他没有被包含的属性就直接清除了.
(3) 实例
a. 先添加一条数据
PUT /school/student/1
{
"name": "alice",
"age": 17
}
b. 测试部分更新
//部分更新
POST /school/student/1/_update
{
"doc": {
"name" : "Tom"
}
}
//查询
GET /school/student/1
//结果
{
“_index”: “school”,
“_type”: “student”,
“_id”: “1”,
“_version”: 7,
“found”: true,
“_source”: {
“name”: “Tom”,
“age”: 17
}
}
c. 测试普通修改
//普通修改
POST /school/student/1
{
"name" : "Linda"
}
//查询
GET school/student/1
//结果
{
“_index”: “school”,
“_type”: “student”,
“_id”: “1”,
“_version”: 8,
“found”: true,
“_source”: {
“name”: “Linda”
}
}
Bulk Operation
批量查询和批量增删改的语法不同, 所以分开来介绍.
批量查询
GET /_mget
{
"docs" : [
{
"_index" : "company",
"_type" : "employee",
"_id" : 1
},
{
"_index" : "company",
"_type" : "employee",
"_id" : 2
}
]
}
GET /company/_mget
{
“docs” : [
{
“_type” : “employee”,
“_id” : 1
},
{
“_type” : “employee”,
“_id” : 2
}
]
}
GET /company/employee/_mget
{
“ids” : [1,2]
}
批量增删改
(1) 语法
每一个操作要两个json串,语法如下:
POST /index/type/_bulk
{"action": {"metadata"}}
{"data"}
index和type可以放入metadata中. 每个json串不能换行, 只能放一行. 同时一个json串和一个json串之间, 必须有一个换行.
(2) action类型
-
delete
删除.
-
create
强制创建. PUT /index/type/id/_create
-
index
创建或替换.
-
update
属于部分更新.
(3) 实例
POST /_bulk
{"delete" : {"_index":"company", "_type":"employee","_id":"1"}}
{"create" :{"_index":"company","_type":"employee","_id":"2"}}
{"name":"tyshawn", "age":18}
{"index":{"_index":"company","_type":"employee","_id":"3"}}
{"name":"lee", "age":24}
{"update":{"_index":"company","_type":"employee","_id":"2"}}
{"doc":{"age":30}}
搜索
添加搜索实例数据.
PUT /website/article/1
{
"post_date": "2017-01-01",
"title": "my first article",
"content": "this is my first article in this website",
"author_id": 11401,
"tags": [
"java",
"c"
]
}
PUT /website/article/2
{
“post_date”: “2017-01-02”,
“title”: “my second article”,
“content”: “this is my second article in this website”,
“author_id”: 11402,
“tags”: [
“redis”,
“linux”
]
}
PUT /website/article/3
{
“post_date”: “2017-01-03”,
“title”: “my third article”,
“content”: “this is my third article in this website”,
“author_id”: 11403,
“tags”: [
“elaticsearch”,
“kafka”
]
}
mapping结构如下:
GET /website/_mapping/article
{
“website”: {
“mappings”: {
“article”: {
“properties”: {
“author_id”: {
“type”: “long”
},
“content”: {
“type”: “text”,
“fields”: {
“keyword”: {
“type”: “keyword”,
“ignore_above”: 256
}
}
},
“post_date”: {
“type”: “date”
},
“tags”: {
“type”: “text”,
“fields”: {
“keyword”: {
“type”: “keyword”,
“ignore_above”: 256
}
}
},
“title”: {
“type”: “text”,
“fields”: {
“keyword”: {
“type”: “keyword”,
“ignore_above”: 256
}
}
}
}
}
}
}
}
当type = text时, ES默认会设置两个field, 一个分词, 一个不分词. 如content会分词, 而content.keyword不分词, content.keyword最多保留256个字符.
通用语法
(1) 搜索所有index数据.
GET /_search
- 1
(2) 搜索指定index, type下的数据(index和type可以有多个)
GET /index1/_search
GET /index1,index2/_search
GET /index1/type1/_search
GET /index1/type1,type2/_search
GET /index1,index2/type1,type2/_search
(3) 搜索所有index下的指定type的数据.
GET /_all/employee,product/_search
下面再介绍Elasticsearch搜索的条件语法.
query string search
这个查询就类似于HTTP里的GET请求, 参数放在URL上.
(1) 语法
GET /index/type/_search?q=属性名:属性值
GET /index/type/_search?q=+属性名:属性值
GET /index/type/_search?q=-属性名:属性值
也可以省略属性名, 直接q=属性值
GET /index/type/_search?q=属性值
(2) + 和 - 的区别
默认是+, 指必须包含该字段, - 指不包含该字段.
(3) 实例
GET /website/article/_search?q=author_id:11403
GET /website/article/_search?q=-author_id:11403
GET /website/article/_search?q=11403
(4) _all metadata的原理
GET /index/type/_search?q=属性值.
这个语句是从所有属性中搜索包含指定的关键字的数据. 那么ES在搜索时是遍历所有document中的每一个field吗? 不是的, 我们在插入一条document时, ES会自动将多个field的值, 全部用字符串的方式串联起来, 变成一个长的字符串(以空格作为分隔符),作为_all field的值,同时进行分词建立倒排索引. 如果在搜索时没有指定属性名, 就会默认搜索_all field. (生产环境不使用)
query DSL
DSL, Domain Specified Language,特定领域的语言. 这个查询就类似于HTTP里的POST请求, 参数放在body中.
实例: 从website索引中查询所有文章
GET /website/article/_search
{
"query": {
"match_all": {}
}
}
full-text search
全文搜索.
(1) 基础使用
a. 搜索标题中包含first或second的文章
GET /website/article/_search
{
"query": {
"match": {
"title": "first second"
}
}
}
或者
GET /website/article/_search
{
“query”: {
“match”: {
“title”: {
“query”: “first second”
, “operator”: “or”
}
}
}
}
或者
GET /website/article/_search
{
“query”: {
“bool”: {
“should”: [
{“match”: {
“title”:“first”
}},
{“match”: {
“title”:“second”
}}
]
}
}
}
b. 搜索标题中包含first和second的文章
GET /website/article/_search
{
"query": {
"match": {
"title": {
"query": "first second"
, "operator": "and"
}
}
}
}
或者
GET /website/article/_search
{
“query”: {
“bool”: {
“must”: [
{“match”: {
“title”: “first”
}},
{“match”: {
“title”: “second”
}}
]
}
}
}
c. 搜索标题中至少包含first, second, third, fourth中三个单词的文章.
GET /website/article/_search
{
"query": {
"match": {
"title": {
"query": "first second third fourth",
"minimum_should_match": "75%"
}
}
}
}
或者
GET /website/article/_search
{
“query”: {
“bool”: {
“should”: [
{“match”: {
“title”: “first”
}},
{“match”: {
“title”: “second”
}},
{“match”: {
“title”: “third”
}},
{“match”: {
“title”: “fourth”
}}
],
“minimum_number_should_match”: 3
}
}
}
d. 从website索引中查询, 标题必须包含elasticsearch,内容可以包含elasticsearch也可以不包含,作者id必须不为111.
GET /website/article/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title":"elasticsearch"
}
}
],
"should": [
{
"match": {
"content":"elasticsearch"
}
}
],
"must_not": [
{
"match": {
"author_id":"111"
}
}
]
}
}
}
e. 从website索引中查询, 标题包含first, 同时按作者id降序排序
GET /website/article/_search
{
"query": {
"match": {
"title": "first"
}
},
"sort": [
{
"author_id": {
"order": "desc"
}
}
]
}
f. 从website索引中分页查询,总共3篇文章,假设每页就显示1篇文章,现在显示第2页
GET /website/article/_search
{
"query": {
"match_all": {}
},
"from": 1,
"size": 1
}
g. 从website索引中查询所有文章, 只显示post_date, title两个属性.
GET /website/article/_search
{
"query": {
"match_all": {}
},
"_source": ["post_date","title"]
}
h. 搜索标题中包含 article 的文章, 如果标题中包含first或second就优先搜索出来, 同时, 如果一个文章标题包含first article, 另一个文章标题包含second article, 包含first article的文章要优先搜索出来.
//通过加权重来处理, 默认权重为1
GET /website/article/_search
{
"query": {
"bool": {
"must": [
{"match": {
"title": "article"
}}
],
"should": [
{"match": {
"title": {
"query": "first",
"boost" : 3
}
}},
{"match": {
"title": {
"query": "second",
"boost": 2
}
}}
]
}
}
}
(2) multi_match
multi_match 用于查询词匹配多个属性. 这里涉及到几种匹配策略:
- best-fields
doc的某个属性匹配尽可能多的关键词, 那么这个doc会优先返回. - most-fields
某个关键词匹配doc尽可能多的属性, 那么这个doc会优先返回. - cross_fields
跨越多个field搜索一个关键词.
best-fields和most-fields的区别:
比如, doc1的field1匹配的三个关键词, doc2的field1, field2都匹配上了同一个关键词. 如果是best-fields策略, 则doc1的相关度分数要更高, 如果是most-fields策略, 则doc2的相关度分数要更高.
实例:
a. 使用best_fields策略, 从title和content中搜索"my third article".
GET /website/article/_search
{
"query": {
"multi_match": {
"query": "my third article",
"type": "best_fields",
"fields": ["title","content"]
}
}
}
b. 从title和content中搜索"my third article", 且这三个单词要连在一起.
GET /website/article/_search
{
"query": {
"multi_match": {
"query": "my third article",
"type": "cross_fields",
"operator": "and",
"fields": ["title","content"]
}
}
}
(3) dis_max
在我们进行多个条件的全文搜索时, 最后的计算出的相关度分数是根据多个条件的匹配分数综合而来的, 比如score = (score1 + score2) / 2, 如果我们想让最终的相关度分数等于多个条件匹配分数中的最大值, 即score = max(score1, score2), 则可以使用dis_max.
实例: 搜索title或content中包含first或article的文章
GET /website/article/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": "first article"
}
},
{
"match": {
"content": "first article"
}
}
]
}
}
}
使用dis_max:
GET /website/article/_search
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"title": "first article"
}
},
{
"match": {
"content": "first article"
}
}
]
}
}
}
(4) tie_breaker
tie_breaker是与dis_max配套使用的. dis_max只取分数最大的那个条件的分数, 完全不考虑其他条件的分数, 但如果在某些场景下也需要考虑其他条件的分数呢? 我们可以指定一个系数值tie_breaker, 将其他条件的分数乘以tie_breaker, 然后和最大分数综合起来计算最终得分.
tie_breaker的取值为 0 ~ 1之间.
a. 将上面的实例优化下:
GET /website/article/_search
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"queries": [
{
"match": {
"title": "first article"
}
},
{
"match": {
"content": "first article"
}
}
]
}
}
}
b. 继续优化. 如果搜索词包含多个关键字, 我们要求至少匹配多个关键词, 且多个条件的权重不同.
GET /website/article/_search
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"queries": [
{
"match": {
"title": {
"query": "first article hello world",
"minimum_should_match": "50%",
"boost": 2
}
}
},
{
"match": {
"content": {
"query": "first article is my hero",
"minimum_should_match": "20%",
"boost": 1
}
}
}
]
}
}
}
phrase search
(1) 短语搜索, 与全文搜索有什么区别呢?
全文搜索会将"查询词"拆解开来, 去倒排索引中一一匹配, 只要能匹配上任意一个拆解后的关键词, 就可以作为结果返回. 而短语搜索在全文搜索的基础上, 要求关键词必须相邻. (注意短语搜索的"查询词"也是会被分词的)
GET /website/article/_search
{
"query": {
"match": {
"title": "first article"
}
}
}
//三条记录都会搜索出来
GET /website/article/_search
{
“query”: {
“match_phrase”: {
“title”: “first article”
}
}
}
//只有一条记录
(2) 原理:
短语搜索的原理实际上是相邻匹配(proximity match). Lucene建立的倒排索引结构为: 关键词 -> 文档号, 在文档中的位置, 在文档出现的频率等, 当一个"查询词"包含多个关键词时, Lucene先通过关键词找到对应的文档号, 判断多个关键词所在的文档号是否相同, 然后再判断在文档中的位置是否相邻.
(3) 实例
短语搜索默认是搜索相邻的关键词, 但也可以搜索间隔几个位置的关键词. 间隔越小相关度分数越高.
a. 从content中搜索"first website", first和website必须在同一个doc中, 且间隔不能超过10.
GET /website/article/_search
{
"query": {
"match_phrase": {
"content": {
"query": "first website",
"slop": 10
}
}
}
}
b. 全文搜索和短语搜索配合使用. 从content中搜索"first website", 在优先满足召回率的前提下, 尽可能提高精准度.
召回率: 从n个doc中搜索, 有多少个doc返回.
精准度: 让两个关键词间隔越小的doc相关度分数越高.
GET /website/article/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "first website"
}
}
],
"should": [
{
"match_phrase": {
"content": {
"query": "first website",
"slop": 10
}
}
}
]
}
}
}
(4) 优化短语搜索: rescore
短语搜索的性能要比全文搜索的性能低10倍以上, 所以一般我们要用短语搜索时都会配合全文搜索使用. 先通过全文搜索出匹配的doc, 然后对相关度分数最高的前n条doc进行rescore短语搜索. (这里只能用于分页搜索)
GET /website/article/_search
{
"query": {
"match": {
"content": "first website"
}
},
"rescore": {
"window_size": 20,
"query": {
"rescore_query": {
"match_phrase": {
"content": {
"query": "first website",
"slop": 10
}
}
}
}
}
}
(5) match_phrase_prefix
匹配短语的前缀, 用于做搜索推荐. 比如我们在百度输入一个关键词, 立马就会推荐一系列查询词, 这个就是搜索推荐.
这个功能不推荐使用, 因为性能太差, 我们一般通过ngram分词机制来实现搜索推荐.
GET /website/article/_search
{
"query": {
"match_phrase_prefix": {
"title": "my sec"
}
}
}
fuzzy search
模糊搜索, 自动将拼写错误的搜索文本进行纠正, 然后去匹配索引中的数据.
(1) 语法一
GET /website/article/_search
{
"query": {
"fuzzy": {
"title.keyword": {
"value": "my wecond article",
"fuzziness": 2
}
}
}
}
fuzziness代表最多纠正多少个字母, 默认为2. 搜索文本不会被分词.
(2) 语法二
GET /website/article/_search
{
"query": {
"match": {
"title": {
"query": "my wecond article",
"fuzziness": "auto",
"operator": "and"
}
}
}
}
fuzziness可以给定个数, 也可以设置为auto.
term search
term查询, 是一种结构化查询, "查询词"不会被分词, 结果要么存在要么不存在, 不关心结果的score相关度. 如果查询text属性, 需要改为查询filed.keyword.
(1) 实例
和短语搜索对比一下可以更好的理解:
1. phrase搜索
GET /website/article/_search
{
"query": {
"match_phrase": {
"title": "my first article"
}
}
}
//存在一条结果
-
term搜索text
GET /website/article/_search
{
“query”: {
“term”: {
“title”: {
“value”: “my first article”
}
}
}
}
//不存在结果, 原因是词语查询的value值不会被分词, 也就是直接查询"my first article". -
term搜索keyword
GET /website/article/_search
{
“query”: {
“term”: {
“title.keyword”: {
“value”: “my first article”
}
}
}
}
//存在一条结果, filed.keyword属性不分词.
(2) 常用语法
为了提高效率, term搜索一般与filter和constant_score联用. constant_score 以固定的评分来执行查询(默认为1), 而filter不计算score相关度, 因此执行效率非常高.
GET /website/article/_search
{
"query": {
"constant_score": {
"filter": {
"term": {
"post_date": "2017-01-03"
}
}
}
}
}
query filter
(1) 语法
query filter 用于过滤数据, 不参与score相关度计算, 效率很高. 适用于范围查询以及不计算相关度score的精确查询(filter + term)
(2) 实例
a. 从website索引中查询, 作者id必须大于等于11402,同时发表时间必须是2017-01-02.
GET /website/article/_search
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"post_date": "2017-01-02"
}
},
{
"range": {
"author_id": {
"gte": 11402
}
}
}
]
}
}
}
}
}
b. 搜索发布日期为2017-01-01, 或者文章标题为"my first article"的帖子, 同时要求文章的发布日期绝对不为2017-01-02.
GET /website/article/_search
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"term":{
"post_date": "2017-01-01"
}
},
{
"term":{
"title.keyword": "my first article"
}
}
],
"must_not": {
"term": {
"post_date": "2017-01-02"
}
}
}
}
}
}
}
c. 搜索文章标题为"my first article", 或者是文章标题为"my second article", 而且发布日期为"2017-01-01"的文章.
GET /website/article/_search
{
"query": {
"constant_score": {
"filter": {
"bool": {
"should": [
{
"term":{
"title.keyword": "my first article"
}
},
{
"bool": {
"must": [
{
"term":{
"title.keyword": "my second article"
}
},
{
"term":{
"post_date": "2017-01-01"
}
}
]
}
}
]
}
}
}
}
}
d. 搜索文章标题为"my first article"或"my second article"的文章
GET /website/article/_search
{
"query": {
"constant_score": {
"filter": {
"terms": {
"title.keyword": [
"my first article",
"my second article"
]
}
}
}
}
}
e. 搜索tags中包含java的帖子.
GET /website/article/_search
{
"query": {
"constant_score": {
"filter": {
"terms": {
"tags.keyword": [
"java"
]
}
}
}
}
}
注意, 这里必须用terms, 因为term不支持数组.
d. 搜索tags中只包含java的帖子.
如果想搜索tags中只包含java的帖子, 就需要新增一个字段tags_count, 表示tags中有几个tag, 否则就无法搜索.
GET /website/article/_search
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"terms": {
"tags.keyword": [
"java"
]
}
},
{
"term": {
"tags_count": "1"
}
}
]
}
}
}
}
}
对上面这几实例做个总结, should表示或, must表示且.
scroll search
scroll滚动搜索,可以先搜索一批数据,然后下次再搜索一批数据,以此类推,直到搜索出全部的数据来. 它的原理是每次查询时都生成一个游标scroll_id, 后续的查询根据这个游标去获取数据, 直到返回的hits字段为空. scroll_id相当于建立了一个历史快照, 在此之后的写操作不会影响到这个快照的结果, 也就意味着其不能用于实时查询.
滚动查询用来解决深度分页的问题, 就类似于sql语句: select * from comment where id > 1000 order by id asc limit 1000.
(1) 语法
- 查询时指定一个参数scroll, 代表scroll_id的有效期, 过期后scroll_id会被ES自动清除.
- 如果不需要特定的排序, 按照文档创建时间排序更高效.
- scroll_id只能使用一次, 使用过后会被自动删除.
- 最后一次查询, hits为空时也会返回一个scroll_id, 我们需要手动删除来释放资源.
(2) 实例
a. 首次查询
GET /website/article/_search?scroll=1s
{
"query": {
"match_all": {}
},
"sort": [
{
"_doc": {
"order": "asc"
}
}
],
"size": 1
}
b. 后续查询
scroll_id只能使用一次.
GET /_search/scroll?scroll=1s&scroll_id=DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAATqFmRBQ3FwUFVrUUw2VHgyU2I5UWRMRlEAAAAAAAAE6xZkQUNxcFBVa1FMNlR4MlNiOVFkTEZRAAAAAAAABOwWZEFDcXBQVWtRTDZUeDJTYjlRZExGUQAAAAAAAATtFmRBQ3FwUFVrUUw2VHgyU2I5UWRMRlEAAAAAAAAE7hZkQUNxcFBVa1FMNlR4MlNiOVFkTEZR
{
}
或者
GET /_search/scroll
{
“scroll”:“1s”, “scroll_id”:“DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAATqFmRBQ3FwUFVrUUw2VHgyU2I5UWRMRlEAAAAAAAAE6xZkQUNxcFBVa1FMNlR4MlNiOVFkTEZRAAAAAAAABOwWZEFDcXBQVWtRTDZUeDJTYjlRZExGUQAAAAAAAATtFmRBQ3FwUFVrUUw2VHgyU2I5UWRMRlEAAAAAAAAE7hZkQUNxcFBVa1FMNlR4MlNiOVFkTEZR”
}
c. 删除指定scroll_id
DELETE /_search/scroll
{
"scroll_id": "scroll_id=DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAiXFmRBQ3FwUFVrUUw2VHgyU2I5UWRMRlEAAAAAAAAImBZkQUNxcFBVa1FMNlR4MlNiOVFkTEZRAAAAAAAACJkWZEFDcXBQVWtRTDZUeDJTYjlRZExGUQAAAAAAAAiaFmRBQ3FwUFVrUUw2VHgyU2I5UWRMRlEAAAAAAAAImxZkQUNxcFBVa1FMNlR4MlNiOVFkTEZR=="
}
d. 删除所有scroll_id
DELETE /_search/scroll/_all
聚合
聚合包括分组和统计, 其中分组操作包括term, histogram, date_histogram, filter. 统计操作包括count, avg, max, min, sum, cardinality, percentiles, percentile_ranks等.
注意: 聚合的属性不能被分词.
语法
GET /index/type/_search
{
size: 0,
"aggs": {
"NAME": {
"AGG_TYPE": {
"field": "field_name"
}
}
}
}
NAME为聚合操作的名称, 可以取一个有参考意义的名称. AGG_TYPE为分组或统计操作, 当进行分组操作时, 会自动生成一个doc_count, 统计了组内数据的数量. 默认按照doc_count降序排列.
size=0的原因是不需要搜索结果, 如果需要搜索结果, 则去除size=0.
实例
(1) 实例数据
新增电视机销售记录, 用于接下来的实例分析.
POST /televisions/sales/_bulk
{ "index": {}}
{ "price" : 1000, "color" : "红色", "brand" : "长虹", "sold_date" : "2016-10-28" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2016-11-05" }
{ "index": {}}
{ "price" : 3000, "color" : "绿色", "brand" : "小米", "sold_date" : "2016-05-18" }
{ "index": {}}
{ "price" : 1500, "color" : "蓝色", "brand" : "TCL", "sold_date" : "2016-07-02" }
{ "index": {}}
{ "price" : 1200, "color" : "绿色", "brand" : "TCL", "sold_date" : "2016-08-19" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2016-11-05" }
{ "index": {}}
{ "price" : 8000, "color" : "红色", "brand" : "三星", "sold_date" : "2017-01-01" }
{ "index": {}}
{ "price" : 2500, "color" : "蓝色", "brand" : "小米", "sold_date" : "2017-02-12" }
mapping结构如下:
GET /televisions/_mapping/sales
{
"televisions": {
"mappings": {
"sales": {
"properties": {
"brand": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"color": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"price": {
"type": "long"
},
"sold_date": {
"type": "date"
}
}
}
}
}
}
(2) 基础聚合实例
a. 统计哪种颜色的电视销量最高
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"max_sales_color": {
"terms": {
"field": "color.keyword"
}
}
}
}
分组后会计算出每组数据个数(doc_count), 默认按照doc_count降序显示.
b. 统计每种颜色电视的平均价格
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
c. 统计每种颜色电视的平均价格, 以及统计每种颜色下每个品牌的平均价格.
这里就涉及到嵌套分组了, 也叫做多层下钻分析.
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"color_avg_price": {
"avg": {
"field": "price"
}
},
"group_by_brand": {
"terms": {
"field": "brand.keyword"
},
"aggs": {
"brand_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
d. 统计每种颜色电视机的最大最小价格
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"max_price": {
"max": {
"field": "price"
}
},
"min_price": {
"min": {
"field": "price"
}
}
}
}
}
}
e. 统计每种颜色电视机的总销售额
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
}
}
}
}
}
(2) 高级聚合实例
a. 按价格区间统计电视销量和销售额
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_price_range": {
"histogram": {
"field": "price",
"interval": 2000
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
}
}
}
}
}
b. 统计 2016-01-01 ~ 2017-12-31 范围内每个月的电视机销量.
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_sold_date": {
"date_histogram": {
"field": "sold_date",
"interval": "month",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-01-01",
"max": "2017-12-31"
}
}
}
}
}
c. 统计 2016-01-01 ~ 2017-12-31 范围内每个季度的销售额以及该季度下每个品牌的销售额
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_sold_date": {
"date_histogram": {
"field": "sold_date",
"interval": "quarter",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-01-01",
"max": "2017-12-31"
}
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
},
"group_by_brand": {
"terms": {
"field": "brand.keyword"
},
"aggs": {
"brand_sum_price": {
"sum": {
"field": "price"
}
}
}
}
}
}
}
}
d. 统计每种颜色电视的销售额, 按照销售额升序排序
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color.keyword",
"order": {
"sum_price": "asc"
}
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
}
}
}
}
}
e. 统计每种颜色下的每个品牌电视机的总销售额, 并按这个销售额升序排序.
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color.keyword"
},
"aggs": {
"group_by_brand": {
"terms": {
"field": "brand.keyword",
"order": {
"color_brand_sum_price": "asc"
}
},
"aggs": {
"color_brand_sum_price": {
"sum": {
"field": "price"
}
}
}
}
}
}
}
}
f. 统计每个月的电视销量, 并按品牌去重.
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_sold_date": {
"date_histogram": {
"field": "sold_date",
"interval": "month",
"format": "yyyy-MM-dd"
},
"aggs": {
"distinct_brand": {
"cardinality": {
"field": "brand.keyword",
"precision_threshold": 100
}
}
}
}
}
}
cardinality 去重采用的是近似估计的算法, 错误率在5%左右, 其中precision_threshold指定的值为100%准确去重的数量, 值设置的越大, 内存开销也就越大.
g. 统计50%, 90% 和 99%的电视的最大价格(一般用于统计api请求的最长延迟时间)
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"price_percentiles": {
"percentiles": {
"field": "price",
"percents": [
50,
90,
99
]
}
}
}
}
h. 统计每个品牌的电视机的价格, 在1000以内, 2000以内, 3000以内, 4000以内的所占比例.
GET /televisions/sales/_search
{
"size": 0,
"aggs": {
"group_by_brand": {
"terms": {
"field": "brand.keyword"
},
"aggs": {
"price_percentile_ranks": {
"percentile_ranks": {
"field": "price",
"values": [
1000,
2000,
3000,
4000
]
}
}
}
}
}
}
(3) 搜索+聚合
a. 统计指定品牌下(小米)每个颜色的销量
GET /televisions/sales/_search
{
"size": 0,
"query": {
"term": {
"brand.keyword": {
"value": "小米"
}
}
},
"aggs": {
"group_by_color": {
"terms": {
"field": "color.keyword"
}
}
}
}
或者
GET /televisions/sales/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"term": {
"brand.keyword": "小米"
}
}
}
},
"aggs": {
"group_by_color": {
"terms": {
"field": "color.keyword"
}
}
}
}
其实对于聚合来说, 因为不需要搜索结果, 可以直接用filter, 效率更高.
b. 统计单个品牌(长虹)与所有品牌销售额对比
GET /televisions/sales/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"term": {
"brand.keyword": "长虹"
}
}
}
},
"aggs": {
"single_brand_sum_price": {
"sum": {
"field": "price"
}
},
"all_brand": {
"global": {},
"aggs": {
"all_brand_sum_price": {
"sum": {
"field": "price"
}
}
}
}
}
}
global 表示将所有数据纳入聚合的scope,忽视前面的query过滤.
c. 统计指定品牌(长虹)最近一个月和最近半年的平均价格
GET /televisions/sales/_search
{
"size": 0,
"query": {
"term": {
"brand.keyword": {
"value": "长虹"
}
}
},
"aggs": {
"recent_one_month": {
"filter": {
"range": {
"sold_date": {
"gte": "now-1M"
}
}
},
"aggs": {
"recent_one_month_avg_price": {
"avg": {
"field": "price"
}
}
}
},
"recent_six_month": {
"filter": {
"range": {
"sold_date": {
"gte": "now-6M"
}
}
},
"aggs": {
"recent_six_month_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
更多推荐
所有评论(0)