ES的复杂查询(二)——聚合、分组
ES的聚合查询(SUM、MAX、MIN、AVG)1.求和查询例如查询cust_name_s为张三的txn_amt_d的和,sql语句是select sum(txn_amt_d) as sum_number from trade_info.csrcb where cust_name_s = '张三'对应的es的查询就是(此处与sql中的sum一样,也有起别名以及聚合方式的参数,其中的aggs可以写全
ES的聚合查询(SUM、MAX、MIN、AVG)
1.求和查询
例如查询cust_name_s为张三的txn_amt_d的和,sql语句是
select sum(txn_amt_d) as sum_number from trade_info.csrcb where cust_name_s = '张三'
对应的es的查询就是(此处与sql中的sum一样,也有起别名以及聚合方式的参数,其中的aggs可以写全,写成aggregatins)
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
"query": {
"term": {
"cust_name_s.keyword": "张三"
}
},
"aggs": {
"sum_number": {
"sum": {
"field": "txn_amt_d"
}
}
}
} '
2.求平均值
例如查询cust_name_s为张三的txn_amt_d的平均值,sql语句是
select avg(txn_amt_d) as avg_number from trade_info.csrcb where cust_name_s = '张三'
同样在es的查询中只要指定聚合方式为avg即可
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
"query": {
"term": {
"cust_name_s.keyword": "张三"
}
},
"aggregations": {
"avg_number": {
"avg": {
"field": "txn_amt_d"
}
}
}
} '
3.最大最小值查询
例如查询cust_name_s为张三的txn_amt_d的最大值和最小值,sql语句是
select max(txn_amt_d) as max_number, min(txn_amt_d) as min_number from trade_info.csrcb where cust_name_s = '张三'
同样只要指定最大最小的别名,并且指定聚合方式为max以及min即可
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
"query": {
"term": {
"cust_name_s.keyword": "张三"
}
},
"aggregations": {
"max_number": {
"max": {
"field": "txn_amt_d"
}
},
"min_number": {
"min": {
"field": "txn_amt_d"
}
}
}
} '
4.唯一值查询(类似于sql中的distinct)
使用cardinality求唯一值,即不重复的字段有多少,同样,统计的字段不能分词,例如统计cust_name_s字段的不重复的值
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
"size": 0,
"query": {
"match_all": {}
},
"aggregations": {
"unique_num": {
"cardinality": {
"field": "cust_name_s.keyword"
}
}
}
} '
5.更为方便的统计查询,将记录数,最大最小,平均值,和都展示出来
使用stats聚合,可以将满足条件的记录直接统计最大最小、平均值、总和、记录数结果
例如查询cust_name_s为张三的统计,统计的字段为txn_amt_d
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search?pretty" -H "Content-Type:application/json" -d '{
"size": 0,
"query": {
"term": {"cust_name_s.keyword": "张三"}
},
"aggregations": {
"count_nums": {
"stats": {
"field": "txn_amt_d"
}
}
}
} '
ES的分组查询
1.单纯分组统计记录数
类似于sql中的count(),现在使用es统计txn_amt_d字段数值在[0,500),[500,1000),[1000,1500),[1500,2000),[2000,2500)之间的记录数,在aggs的自定义别名txn_amt_group中,使用range来做分组,field是以age为分组,分组使用ranges来做,from和to是范围,使用es查询的字段range,注意使用range就是做闭右开,即from为闭区间,to为开区间
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
"size": 0,
"query": {
"match_all": {}
},
"aggregations": {
"txn_amt_group": {
"range": {
"field": "txn_amt_d",
"ranges": [
{"from": "0","to": "500"},
{"from": "500","to": "1000"},
{"from": "1000","to": "1500"},
{"from": "1500","to": "2000"},
{"from": "2000","to": "2500"}
]
}
}
}
} '
上述案例可以理解成以每500为一个区间,查询txn_amt_d在对应范围内的数量,使用histogram桶聚合查询
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search?pretty" -H "Content-Type:application/json" -d '{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"txn_amt_d_section_num": {
"histogram": {
"field": "txn_amt_d",
"interval": 500,
"extended_bounds": {
"min":0,
"max":2500
}
}
}
}
} '
注意上面写的查询中的extended_bounds中的max不是指查询到这个最大值,而是如果文档中的最大值小于此,那么统计到此,若大于这个max的值,则一直聚合查询到文档中的最大值
2.根据刚刚的区间划分进行统计处理(使用stats)
也就是在分组的情形条件下再聚合统计即可
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
"size": 0,
"aggregations": {
"txn_amt_group": {
"range": {
"field": "txn_amt_d",
"ranges": [
{"from": "0","to": "500"},
{"from": "500","to": "1000"},
{"from": "1000","to": "1500"},
{"from": "1500","to": "2000"},
{"from": "2000","to": "2500"}
]
},
"aggs": {
"ststistics_num": {
"stats": {"field": "txn_amt_d"}
}
}
}
}
} '
3.分组查询统计(类似于sql中的group by)
例如sql是这样
select count(txn_amt_d),sum(txn_amt_d),max(txn_amt_d),min(txn_amt_d),avg(txn_amt_d) from trade_info.csrcb group by txn_type_s
对应的es的查询就是这样
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
"size": 0,
"aggregations": {
"txn_type_group": {
"terms": {
"field": "txn_type_s.keyword"
},
"aggs": {
"statstics_num": {
"stats": {
"field": "txn_amt_d"
}
}
}
}
}
} '
多字段分组的时候就想上面aggs里面嵌套一样
例如
select count(txn_amt_d),sum(txn_amt_d),max(txn_amt_d),min(txn_amt_d),avg(txn_amt_d) from trade_info.csrcb group by cust_name_s,txn_type_s
对应的es查询
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
"aggs":{
"cust_name_group":{
"terms":{
"field":"cust_name_s.keyword"
},
"aggs": {
"txn_type_group": {
"terms": {
"field": "txn_type_s.keyword"
},
"aggs": {
"statstics_num": {
"stats": {
"field": "txn_amt_d"
}
}
}
}
}
}
}
} '
4.分组查询后,展示最大(或最小)的文档的记录的信息
例如根据交易类型txn_type_s字段分组,分别展示不同组交易金额字段txn_amt_d最大的文档记录,结合size为1
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
"size": 0,
"aggregations": {
"txn_type_group": {
"terms": {
"field": "txn_type_s.keyword"
},
"aggs": {
"doc_top_1": {
"top_hits": {
"size": 1,
"sort": [{
"txn_amt_d": {
"order": "desc"
}
}]
}
}
}
}
}
} '
5.查询分组后的各组的平均值的最小值
例如,根据交易类型分组,统计各族的平均值,并统计平均值的最小值,sql应该是这样的
select min(a.avg_num) as min_avg_num from (select avg(txn_amt_d) as avg_num from trade_info group by txn_type_s) a
对应的es的查询
curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
"size": 0,
"aggregations": {
"txn_type_group": {
"terms": {
"field": "txn_type_s.keyword"
},
"aggs": {
"avg_num": {
"avg": {
"field": "txn_amt_d"
}
}
}
},
"min_avg_num": {
"min_bucket": {
"buckets_path": "txn_type_group>avg_num"
}
}
}
} '
注意:管道操作(txn_type_group>avg_num中的>不是大于的意思,描述的为路径)
脚本
聚合后保留数据两位小数
"aggs" : {
"total" : {
"sum" : {
"script" : "Math.round(doc['practiceObj.practiceValue'].value*100)/100.0"
}
}
}
更多推荐
所有评论(0)