ElasticSearch Aggregation (Part 3)
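Bucket aggregations
date histogram aggregation
The date histogram multi-bucket aggregation is very similar to the histogram aggregation, but it can only be used with date or date range values. Because Elasticsearch represents dates internally as long values, the regular histogram can also be applied to dates, though less accurately. The biggest difference between the two APIs is that the interval of a date histogram can be specified with date or time expressions.
As with the histogram aggregation, each value is rounded down into the nearest bucket. For example, with an interval of one day, 2020-01-03T07:00:01Z is rounded down to 2020-01-03T00:00:00Z. The rounding is computed as: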
bucket_key = Math.floor(value / interval) * interval
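To make the formula concrete, here is a minimal Python sketch. Python is used only for illustration; the helper names bucket_key and to_iso are hypothetical and not part of Elasticsearch. It assumes UTC timestamps in epoch milliseconds and a one-day interval.

```python
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds


def bucket_key(value_ms: int, interval_ms: int) -> int:
    """Round a timestamp down to the start of its bucket.

    Integer floor division mirrors Math.floor(value / interval) * interval.
    """
    return (value_ms // interval_ms) * interval_ms


def to_iso(ms: int) -> str:
    """Format epoch milliseconds as an ISO-8601 UTC string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# The example value from the text: 2020-01-03T07:00:01Z
value = int(datetime(2020, 1, 3, 7, 0, 1, tzinfo=timezone.utc).timestamp() * 1000)
key = bucket_key(value, DAY_MS)
print(to_iso(key))  # prints 2020-01-03T00:00:00Z
```

The value lands in the bucket that starts at midnight of the same day, which is why "rounded down" is a more accurate description than "rounded to the nearest".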
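A date histogram aggregation can specify its interval in two ways: calendar-aware intervals and fixed intervals.
Calendar-aware intervals understand that daylight saving time changes the length of specific days, that months have different numbers of days, and that leap seconds can be added to a particular year.
Fixed intervals, by contrast, are always multiples of SI units and do not change with calendar context.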
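The difference is easy to see by counting days. The sketch below uses Python's standard calendar module; calendar_month_days is a hypothetical helper name, and the sketch ignores time zones and daylight saving time.

```python
import calendar

DAY_MS = 24 * 60 * 60 * 1000  # a fixed interval of 1d is always this many milliseconds


def calendar_month_days(year: int, month: int) -> int:
    """Number of days in the given calendar month."""
    return calendar.monthrange(year, month)[1]


days_2015 = [calendar_month_days(2015, m) for m in (1, 2, 3)]
print(days_2015)                     # [31, 28, 31]
print(calendar_month_days(2016, 2))  # 29 (leap year)
```

A calendar month therefore has no single length in milliseconds, whereas a fixed interval such as 30d always spans exactly 30 * DAY_MS milliseconds.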
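Calendar intervals
Calendar-aware intervals are configured with the calendar_interval parameter. You can specify a calendar interval by unit name, such as month, or as a single unit quantity, such as 1M. For example, day and 1d are equivalent. Multiple quantities, such as 2d, are not supported.
Calendar intervals accept the following values:
minute, 1m
hour, 1h
day, 1d
week, 1w
month, 1M
quarter, 1q
year, 1y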
Calendar interval examples
The following aggregation request buckets documents by a calendar interval of one month.
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"sales_over_time": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
}
}
}
}
'
If you try to use a multiple of a calendar unit, the aggregation fails, because calendar intervals support only a single calendar unit.
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"sales_over_time": {
"date_histogram": {
"field": "date",
"calendar_interval": "2d"
}
}
}
}
'
{
"error" : {
"root_cause" : [...],
"type" : "x_content_parse_exception",
"reason" : "[1:82] [date_histogram] failed to parse field [calendar_interval]",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "The supplied interval [2d] could not be parsed as a calendar interval.",
"stack_trace" : "java.lang.IllegalArgumentException: The supplied interval [2d] could not be parsed as a calendar interval."
}
}
}
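Fixed intervals
Fixed intervals are configured with the fixed_interval parameter.
In contrast to calendar-aware intervals, a fixed interval is a fixed number of SI units and never deviates, regardless of where it falls on the calendar. A fixed interval can therefore be specified as any multiple of a supported unit.
However, fixed intervals cannot express other units such as months, because the length of a month is not fixed. Specifying a calendar unit such as month as a fixed interval throws an exception.
Fixed intervals accept the following units:
- milliseconds (ms)
- seconds (s)
- minutes (m)
- hours (h)
- days (d)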
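The rules for the two parameters can be summarized in a small client-side check. This is only a sketch of the rules listed in this article, not Elasticsearch's own parser; is_calendar_interval and is_fixed_interval are hypothetical helper names, and treating zero as invalid is an assumption.

```python
import re

# Values accepted by calendar_interval, as listed in this article.
CALENDAR_UNITS = {
    "minute": "1m", "hour": "1h", "day": "1d", "week": "1w",
    "month": "1M", "quarter": "1q", "year": "1y",
}

# A positive integer followed by one of the fixed units listed in this article.
FIXED_PATTERN = re.compile(r"^[1-9]\d*(ms|s|m|h|d)$")


def is_calendar_interval(value: str) -> bool:
    """True for a unit name (month) or a single-unit quantity (1M)."""
    return value in CALENDAR_UNITS or value in CALENDAR_UNITS.values()


def is_fixed_interval(value: str) -> bool:
    """True for any multiple of a supported fixed unit (30d, 90m, 500ms)."""
    return FIXED_PATTERN.match(value) is not None


for candidate in ("month", "1M", "2d", "30d"):
    print(candidate, is_calendar_interval(candidate), is_fixed_interval(candidate))
```

Note that 1m, 1h and 1d pass both checks, since a single minute, hour or day can be requested either as a calendar interval or as a fixed interval.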
Fixed interval example
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"sales_over_time": {
"date_histogram": {
"field": "date",
"fixed_interval": "30d"
}
}
}
}
'
keys
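Internally, Elasticsearch represents a date as a 64-bit long holding a timestamp in milliseconds since the epoch. These timestamps are returned as the key of each bucket. key_as_string is the same timestamp converted to a string in the specified format.
Tip: if you do not specify format, the first date format specified in the field mapping is used.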
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"sales_over_time": {
"date_histogram": {
"field": "date",
"calendar_interval": "1M",
"format": "yyyy-MM-dd"
}
}
}
}
'
Response:
{
...
"aggregations": {
"sales_over_time": {
"buckets": [
{
"key_as_string": "2015-01-01",
"key": 1420070400000,
"doc_count": 3
},
{
"key_as_string": "2015-02-01",
"key": 1422748800000,
"doc_count": 2
},
{
"key_as_string": "2015-03-01",
"key": 1425168000000,
"doc_count": 2
}
]
}
}
}
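The relationship between key and key_as_string in the response above can be checked with a few lines of Python. This is a sketch: key_as_string here is a hypothetical helper, and it uses Python's strftime pattern "%Y-%m-%d" as a stand-in for the yyyy-MM-dd format passed to Elasticsearch.

```python
from datetime import datetime, timezone


def key_as_string(key_ms: int, fmt: str = "%Y-%m-%d") -> str:
    """Convert a bucket key (epoch milliseconds, UTC) to a formatted date string."""
    return datetime.fromtimestamp(key_ms / 1000, tz=timezone.utc).strftime(fmt)


# Bucket keys taken from the response above.
keys = [1420070400000, 1422748800000, 1425168000000]
labels = [key_as_string(k) for k in keys]
print(labels)  # ['2015-01-01', '2015-02-01', '2015-03-01']
```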
keyed response
When keyed is set to true, a unique string key is associated with each bucket, and the buckets are returned as a hash rather than an array.
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"sales_over_time": {
"date_histogram": {
"field": "date",
"calendar_interval": "1M",
"format": "yyyy-MM-dd",
"keyed": true
}
}
}
}
'
Response:
{
...
"aggregations": {
"sales_over_time": {
"buckets": {
"2015-01-01": {
"key_as_string": "2015-01-01",
"key": 1420070400000,
"doc_count": 3
},
"2015-02-01": {
"key_as_string": "2015-02-01",
"key": 1422748800000,
"doc_count": 2
},
"2015-03-01": {
"key_as_string": "2015-03-01",
"key": 1425168000000,
"doc_count": 2
}
}
}
}
}
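If a client receives the array form but prefers the keyed shape, the conversion is straightforward. A minimal sketch, assuming bucket dictionaries shaped like the ones in the responses above; to_keyed is a hypothetical helper, not an Elasticsearch API.

```python
def to_keyed(buckets):
    """Turn a list of buckets into a dict indexed by each bucket's key_as_string."""
    return {bucket["key_as_string"]: bucket for bucket in buckets}


# Two buckets copied from the array-style response shown earlier.
array_buckets = [
    {"key_as_string": "2015-01-01", "key": 1420070400000, "doc_count": 3},
    {"key_as_string": "2015-02-01", "key": 1422748800000, "doc_count": 2},
]

keyed = to_keyed(array_buckets)
print(keyed["2015-02-01"]["doc_count"])  # 2
```

The keyed form makes lookups by date direct, while the array form preserves an explicit bucket order.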
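Scripts
If the data in your documents does not exactly match what you want to aggregate, use a runtime field. For example, revenue from promoted sales should be recognized one day after the sale date. Because the runtime date field emits epoch milliseconds, one day is added as 86400000 ms:
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"runtime_mappings": {
"date.promoted_is_tomorrow": {
"type": "date",
"script": "long date = doc[\u0027date\u0027].value.toInstant().toEpochMilli();\nif (doc[\u0027promoted\u0027].value) {\n date += 86400000;\n}\nemit(date);"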
}
},
"aggs": {
"sales_over_time": {
"date_histogram": {
"field": "date.promoted_is_tomorrow",
"calendar_interval": "1M"
}
}
}
}
'
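The effect of shifting a promoted sale by one day is most visible at a month boundary. The sketch below models the idea in Python; recognized_ms and month_of are hypothetical helpers, and timestamps are assumed to be UTC epoch milliseconds, so one day is 86,400,000 ms.

```python
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def recognized_ms(sale_ms: int, promoted: bool) -> int:
    """Promoted sales are recognized one day after the sale date."""
    return sale_ms + DAY_MS if promoted else sale_ms


def month_of(ms: int) -> str:
    """Calendar month (UTC) that a timestamp falls into, as YYYY-MM."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m")


# A hypothetical sale on the last day of January 2015.
sale = int(datetime(2015, 1, 31, tzinfo=timezone.utc).timestamp() * 1000)
print(month_of(recognized_ms(sale, promoted=False)))  # 2015-01
print(month_of(recognized_ms(sale, promoted=True)))   # 2015-02
```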
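Missing values
The missing parameter defines how documents that are missing a value are treated. By default they are ignored, but they can also be treated as if they had a value.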
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"sale_date": {
"date_histogram": {
"field": "date",
"calendar_interval": "year",
"missing": "2000/01/01"
}
}
}
}
'
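The behavior of missing can be modelled in a few lines. This is a simplified in-memory sketch, not how Elasticsearch computes the aggregation; year_histogram and the sample documents are hypothetical.

```python
from collections import Counter
from datetime import date


def year_histogram(docs, missing=None):
    """Count documents per year; documents without a date use `missing` if given."""
    counts = Counter()
    for doc in docs:
        value = doc.get("date", missing)
        if value is None:
            continue  # default behavior: documents without a value are ignored
        counts[value.year] += 1
    return dict(sorted(counts.items()))


# Hypothetical documents; the third one has no date field.
docs = [
    {"date": date(2015, 3, 1)},
    {"date": date(2015, 7, 9)},
    {},
    {"date": date(2016, 1, 1)},
]

print(year_histogram(docs))                             # {2015: 2, 2016: 1}
print(year_histogram(docs, missing=date(2000, 1, 1)))   # {2000: 1, 2015: 2, 2016: 1}
```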
Order
By default, the returned buckets are sorted by their key in ascending order, but you can control the order with the order setting.
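As a sketch of how the order setting fits into a request, the Python snippet below builds a request body that sorts buckets by key in descending order. The helper date_histogram_request is hypothetical; the field and aggregation names are reused from the earlier examples, and `_key` is the sort criterion for bucket keys.

```python
import json


def date_histogram_request(field, calendar_interval, order=None):
    """Build a date_histogram request body, optionally with an order setting."""
    agg = {"field": field, "calendar_interval": calendar_interval}
    if order is not None:
        agg["order"] = order
    return {"aggs": {"sales_over_time": {"date_histogram": agg}}}


# Newest buckets first instead of the default ascending key order.
body = date_histogram_request("date", "month", order={"_key": "desc"})
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the same _search request used in the earlier examples.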
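date range aggregation
A range aggregation dedicated to date values. The main difference from the regular range aggregation is that the from and to values can be expressed as Date Math expressions, and that a date format can be specified for the from and to fields in the response. Note that this aggregation includes the from value and excludes the to value of each range.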
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyyy",
"ranges": [
{ "to": "now-10M/M" },
{ "from": "now-10M/M" }
]
}
}
}
}
'
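The example above creates two buckets:
- documents dated earlier than ten months ago (rounded down to the start of the month)
- documents dated from ten months ago (rounded down to the start of the month) onward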
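Because from is inclusive and to is exclusive, a value that sits exactly on the boundary belongs to the second bucket only. A minimal sketch of that membership test, with in_range as a hypothetical helper and None standing for an open end:

```python
from datetime import date


def in_range(value, from_=None, to=None):
    """Half-open range check: from_ is inclusive, to is exclusive."""
    if from_ is not None and value < from_:
        return False
    if to is not None and value >= to:
        return False
    return True


boundary = date(2015, 10, 1)  # hypothetical boundary shared by the two ranges

print(in_range(boundary, to=boundary))      # False: `to` is excluded
print(in_range(boundary, from_=boundary))   # True: `from` is included
```

With this convention, adjacent ranges that share a boundary never count the same value twice.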
GET my-index-000001/_search
{
"size": 0,
"aggs": {
"range": {
"date_range": {
"field": "birthday",
"ranges": [
{
"from": "2015-01",
"to": "2015-12"
}
]
}
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"range" : {
"buckets" : [
{
"key" : "2015-01-01T00:00:00.000Z-2015-12-01T00:00:00.000Z",
"from" : 1.4200704E12,
"from_as_string" : "2015-01-01T00:00:00.000Z",
"to" : 1.448928E12,
"to_as_string" : "2015-12-01T00:00:00.000Z",
"doc_count" : 3
}
]
}
}
}
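Missing values
The missing parameter defines how documents that are missing a value are treated. By default they are ignored, but they can also be treated as if they had a value. This is done by adding a set of fieldname : value mappings that specify a default value for each field.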
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"missing": "1976/11/30",
"ranges": [
{
"key": "Older",
"to": "2016/02/01"
},
{
"key": "Newer",
"from": "2016/02/01",
"to" : "now/d"
}
]
}
}
}
}
'
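In the example above, "missing": "1976/11/30" means that documents without a date field are treated as if their date were 1976/11/30 when the date range aggregation buckets them, so they fall into the Older bucket.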
keyed response
Setting the keyed flag to true associates a unique string key with each bucket and returns the ranges as a hash rather than an array:
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyy",
"ranges": [
{ "to": "now-10M/M" },
{ "from": "now-10M/M" }
],
"keyed": true
}
}
}
}
'
Response:
{
...
"aggregations": {
"range": {
"buckets": {
"*-10-2015": {
"to": 1.4436576E12,
"to_as_string": "10-2015",
"doc_count": 7
},
"10-2015-*": {
"from": 1.4436576E12,
"from_as_string": "10-2015",
"doc_count": 0
}
}
}
}
}
You can also customize the key of each range:
curl -X POST "localhost:9200/sales/_search?size=0&pretty" -H 'Content-Type: application/json' -d'
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyy",
"ranges": [
{ "from": "01-2015", "to": "03-2015", "key": "quarter_01" },
{ "from": "03-2015", "to": "06-2015", "key": "quarter_02" }
],
"keyed": true
}
}
}
}
'
Response:
{
...
"aggregations": {
"range": {
"buckets": {
"quarter_01": {
"from": 1.4200704E12,
"from_as_string": "01-2015",
"to": 1.425168E12,
"to_as_string": "03-2015",
"doc_count": 5
},
"quarter_02": {
"from": 1.425168E12,
"from_as_string": "03-2015",
"to": 1.4331168E12,
"to_as_string": "06-2015",
"doc_count": 2
}
}
}
}
}
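filter aggregation
A filter aggregation is a single-bucket aggregation that narrows the set of documents: only documents matching the filter query fall into the bucket, and the sub-aggregations run on that filtered set. For example: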
curl -XGET "http://localhost:9200/my-index-000001/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"avg_age":{
"avg": {
"field": "age"
}
},
"filter_agg": {
"filter": {
"term": {
"email": "123456@qq.com"
}
},
"aggs": {
"avg_age": {
"avg": {
"field": "age"
}
}
}
}
}
}'
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"avg_age" : {
"value" : 39.75
},
"filter_agg" : {
"doc_count" : 2,
"avg_age" : {
"value" : 60.0
}
}
}
}
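The response above has two averages: one over all documents and one over the filtered bucket. The following in-memory sketch models that behavior. The sample documents are invented for illustration and are not the data behind the response; filter_agg is a hypothetical helper.

```python
from statistics import mean

# Hypothetical sample documents, invented for illustration only.
docs = [
    {"email": "123456@qq.com", "age": 50},
    {"email": "123456@qq.com", "age": 70},
    {"email": "110@qq.com", "age": 13},
]


def filter_agg(documents, predicate):
    """Single bucket: the matching documents plus an average sub-aggregation."""
    bucket = [d for d in documents if predicate(d)]
    return {
        "doc_count": len(bucket),
        "avg_age": mean(d["age"] for d in bucket) if bucket else None,
    }


# Top-level average runs on every document; the filtered one only on matches.
overall_avg = mean(d["age"] for d in docs)
result = filter_agg(docs, lambda d: d["email"] == "123456@qq.com")
print(result)
```

The sibling aggregation outside the filter is unaffected by it, which is why the two avg_age values in a single response can differ.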
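Use a top-level query to limit all aggregations
To limit the documents that every aggregation in a search runs on, use a top-level query. This is faster than a single filter aggregation with sub-aggregations. For example: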
curl -XGET "http://localhost:9200/my-index-000001/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"query": {
"term": {
"email": {
"value": "123456@qq.com"
}
}
},
"aggs": {
"age_avg": {
"avg": {
"field": "age"
}
}
}
}'
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"age_avg" : {
"value" : 60.0
}
}
}
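Use the filters aggregation for multiple filters
To group documents using multiple filters, use the filters aggregation. This is much faster than multiple separate filter aggregations. For example: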
curl -XGET "http://localhost:9200/my-index-000001/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"myfilters": {
"filters": {
"filters": {
"a": {"term": {
"email": "123456@qq.com"
}},
"b":{
"term": {
"email": "110@qq.com"
}
}
}
},
"aggs": {
"age_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}'
The following request produces equivalent results, but the filters request above performs much better:
curl -XGET "http://localhost:9200/my-index-000001/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"a": {
"filter": {"term": {
"email": "123456@qq.com"
}},
"aggs": {
"avg_age": {
"avg": {
"field": "age"
}
}
}
},
"b":{
"filter": {"term": {
"email": "110@qq.com"
}},
"aggs": {
"avg_age": {
"avg": {
"field": "age"
}
}
}
}
}
}'
Response (for the filters request):
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"myfilters" : {
"buckets" : {
"a" : {
"doc_count" : 2,
"age_avg" : {
"value" : 60.0
}
},
"b" : {
"doc_count" : 1,
"age_avg" : {
"value" : 13.0
}
}
}
}
}
}
filters aggregation
A multi-bucket aggregation in which each bucket contains the documents that match a query. For example:
curl -XGET "http://localhost:9200/my-index-000001/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"my_filters": {
"filters": {
"filters": {
"li": {
"match": {
"name": "li"
}
},
"wang":{
"match":{
"name":"wang"
}
}
}
},
"aggs": {
"age_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}'
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_filters" : {
"buckets" : {
"li" : {
"doc_count" : 2,
"age_avg" : {
"value" : 27.5
}
},
"wang" : {
"doc_count" : 2,
"age_avg" : {
"value" : 31.5
}
}
}
}
}
}
Anonymous filters
The filters field can also be provided as an array of filters, as in the following request:
GET my-index-000001/_search
{
"size": 0,
"aggs": {
"my_filters": {
"filters": {
"filters": [
{"match":{"name":"li"}},
{"match":{"name":"wang"}}
]
},
"aggs": {
"age_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_filters" : {
"buckets" : [
{
"doc_count" : 2,
"age_avg" : {
"value" : 27.5
}
},
{
"doc_count" : 2,
"age_avg" : {
"value" : 31.5
}
}
]
}
}
}
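Other bucket
The other_bucket parameter can be set to add a bucket to the response that contains all documents that do not match any of the given filters. Its value can be:
- false: does not compute the other bucket
- true: returns the other bucket. With anonymous filters it is the last bucket; with named filters its name is set by the other_bucket_key parameter (_other_ by default).
Setting other_bucket_key implicitly sets other_bucket to true.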
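The bucketing rule can be modelled in memory as follows. This is only a sketch of the behavior described here, not Elasticsearch's implementation: filters_agg and the sample documents are hypothetical, and a whole-word test stands in for the match query.

```python
def filters_agg(docs, filters, other_bucket_key=None):
    """Count documents per named filter, plus an optional other bucket."""
    buckets = {name: 0 for name in filters}
    other = 0
    for doc in docs:
        matched = False
        for name, predicate in filters.items():
            if predicate(doc):
                buckets[name] += 1  # a document may match several filters
                matched = True
        if not matched:
            other += 1
    if other_bucket_key is not None:
        buckets[other_bucket_key] = other
    return buckets


# Hypothetical sample documents, invented for illustration only.
docs = [
    {"name": "li lei"},
    {"name": "wang wei"},
    {"name": "zhang san"},
    {"name": "li wang"},
]
filters = {
    "li": lambda d: "li" in d["name"].split(),
    "wang": lambda d: "wang" in d["name"].split(),
}

print(filters_agg(docs, filters))
print(filters_agg(docs, filters, other_bucket_key="other_messages"))
```

Note that a document matching several filters is counted in each of them, so bucket counts can sum to more than the number of documents, while the other bucket only receives documents that match none.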
In the following example, which uses named filters, the other bucket is named other_messages.
GET my-index-000001/_search
{
"size": 0,
"aggs": {
"my_filters": {
"filters": {
"other_bucket_key": "other_messages",
"filters": {
"li": {
"match": {
"name": "li"
}
},
"wang": {
"match": {
"name": "wang"
}
}
}
},
"aggs": {
"age_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_filters" : {
"buckets" : {
"li" : {
"doc_count" : 2,
"age_avg" : {
"value" : 27.5
}
},
"wang" : {
"doc_count" : 2,
"age_avg" : {
"value" : 31.5
}
},
"other_messages" : {
"doc_count" : 0,
"age_avg" : {
"value" : null
}
}
}
}
}
}
The following example uses anonymous filters, so the last bucket is the other bucket:
curl -XGET "http://localhost:9200/my-index-000001/_search" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"my_filters": {
"filters": {
"other_bucket_key": "other_messages",
"filters": [
{
"match": {
"name": "li"
}
},
{
"match": {
"name": "wang"
}
}
]
},
"aggs": {
"age_avg": {
"avg": {
"field": "age"
}
}
}
}
}
}'
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"my_filters" : {
"buckets" : [
{
"doc_count" : 2,
"age_avg" : {
"value" : 27.5
}
},
{
"doc_count" : 2,
"age_avg" : {
"value" : 31.5
}
},
{
"doc_count" : 0,
"age_avg" : {
"value" : null
}
}
]
}
}
}