ES的聚合查询(SUM、MAX、MIN、AVG)

1.求和查询

例如查询cust_name_s为张三的txn_amt_d的和,sql语句是

select sum(txn_amt_d) as sum_number from trade_info.csrcb where cust_name_s = '张三'

对应的es的查询就是(此处与sql中的sum一样,也有起别名以及聚合方式的参数,其中的aggs可以写全,写成aggregatins)

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "term": {
            "cust_name_s.keyword": "张三"
        }
    },
    "aggs": {
        "sum_number": {
            "sum": {
                "field": "txn_amt_d"
            }
        }
    }
} '

2.求平均值

例如查询cust_name_s为张三的txn_amt_d的平均值,sql语句是

select avg(txn_amt_d) as avg_number from trade_info.csrcb where cust_name_s = '张三'

同样在es的查询中只要指定聚合方式为avg即可

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "term": {
            "cust_name_s.keyword": "张三"
        }
    },
    "aggregations": {
        "avg_number": {
            "avg": {
                "field": "txn_amt_d"
            }
        }
    }
} '

3.最大最小值查询

例如查询cust_name_s为张三的txn_amt_d的最大值和最小值,sql语句是

select max(txn_amt_d) as max_number, min(txn_amt_d) as min_number from trade_info.csrcb where cust_name_s = '张三'

同样只要指定最大最小的别名,并且指定聚合方式为max以及min即可

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "query": {
        "term": {
            "cust_name_s.keyword": "张三"
        }
    },
    "aggregations": {
        "max_number": {
            "max": {
                "field": "txn_amt_d"
            }
        },
        "min_number": {
            "min": {
                "field": "txn_amt_d"
            }
        }
    }
} '

4.唯一值查询(类似于sql中的distinct)

使用cardinality求唯一值,即不重复的字段有多少,同样,统计的字段不能分词,例如统计cust_name_s字段的不重复的值

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggregations": {
        "unique_num": {
            "cardinality": {
                "field": "cust_name_s.keyword"
            }
        }
    }
} '

 

5.更为方便的统计查询,将记录数,最大最小,平均值,和都展示出来

使用stats聚合,可以将满足条件的记录直接统计最大最小、平均值、总和、记录数结果

例如查询cust_name_s为张三的统计,统计的字段为txn_amt_d

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search?pretty" -H "Content-Type:application/json" -d '{
    "size": 0,
    "query": {
        "term": {"cust_name_s.keyword": "张三"}
    },
    "aggregations": {
        "count_nums": {
            "stats": {
                "field": "txn_amt_d"
            }
        }
    }
} '

ES的分组查询

1.单纯分组统计记录数

类似于sql中的count(),现在使用es统计txn_amt_d字段数值在[0,500),[500,1000),[1000,1500),[1500,2000),[2000,2500)之间的记录数,在aggs的自定义别名txn_amt_group中,使用range来做分组,field是以age为分组,分组使用ranges来做,from和to是范围,使用es查询的字段range,注意使用range就是做闭右开,即from为闭区间,to为开区间

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggregations": {
        "txn_amt_group": {
            "range": {
                "field": "txn_amt_d",
                "ranges": [
		            {"from": "0","to": "500"},
                    {"from": "500","to": "1000"},
                    {"from": "1000","to": "1500"},
                    {"from": "1500","to": "2000"},
                    {"from": "2000","to": "2500"}
		        ]
            }
        }
    }
} '

上述案例可以理解成以每500为一个区间,查询txn_amt_d在对应范围内的数量,使用histogram桶聚合查询

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search?pretty" -H "Content-Type:application/json" -d '{
    "size": 0,
    "query": {
        "match_all": {}
    },
    "aggs": {
        "txn_amt_d_section_num": {
            "histogram": {
                "field": "txn_amt_d",
		        "interval": 500,
		        "extended_bounds": {
		            "min":0,
		            "max":2500
		        }
            }
        }
    }
} '

注意上面写的查询中的extended_bounds中的max不是指查询到这个最大值,而是如果文档中的最大值小于此,那么统计到此,若大于这个max的值,则一直聚合查询到文档中的最大值

2.根据刚刚的区间划分进行统计处理(使用stats)

也就是在分组的情形条件下再聚合统计即可

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "size": 0,
    "aggregations": {
        "txn_amt_group": {
            "range": {
                "field": "txn_amt_d",
                "ranges": [
		            {"from": "0","to": "500"},
                    {"from": "500","to": "1000"},
                    {"from": "1000","to": "1500"},
                    {"from": "1500","to": "2000"},
                    {"from": "2000","to": "2500"}
		        ]
            },
            "aggs": {
                "ststistics_num": {
                    "stats": {"field": "txn_amt_d"}
                }
            }
        }
    }
} '

3.分组查询统计(类似于sql中的group by)

例如sql是这样

select count(txn_amt_d),sum(txn_amt_d),max(txn_amt_d),min(txn_amt_d),avg(txn_amt_d) from trade_info.csrcb group by txn_type_s

对应的es的查询就是这样

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "size": 0,
    "aggregations": {
        "txn_type_group": {
            "terms": {
                "field": "txn_type_s.keyword"
            },
            "aggs": {
                "statstics_num": {
                    "stats": {
                        "field": "txn_amt_d"
                    }
                }
            }
        }
    }
} '

多字段分组的时候就想上面aggs里面嵌套一样

例如

select count(txn_amt_d),sum(txn_amt_d),max(txn_amt_d),min(txn_amt_d),avg(txn_amt_d) from trade_info.csrcb group by cust_name_s,txn_type_s

 对应的es查询

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "aggs":{
        "cust_name_group":{
            "terms":{
                "field":"cust_name_s.keyword"
            },
            "aggs": {
                "txn_type_group": {
                    "terms": {
                        "field": "txn_type_s.keyword"
                    },
                    "aggs": {
                        "statstics_num": {
                            "stats": {
                                "field": "txn_amt_d"
                            }
                        }
                    }
                }
            }
        }
    }    
} '

4.分组查询后,展示最大(或最小)的文档的记录的信息

例如根据交易类型txn_type_s字段分组,分别展示不同组交易金额字段txn_amt_d最大的文档记录,结合size为1

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "size": 0,
    "aggregations": {
        "txn_type_group": {
            "terms": {
                "field": "txn_type_s.keyword"
            },
            "aggs": {
                "doc_top_1": {
                    "top_hits": {
                        "size": 1,
                        "sort": [{
                            "txn_amt_d": {
                                "order": "desc"
                            }
                        }]
                    }
                }
            }
        }
    }
} '

5.查询分组后的各组的平均值的最小值

例如,根据交易类型分组,统计各族的平均值,并统计平均值的最小值,sql应该是这样的

select min(a.avg_num) as min_avg_num from (select avg(txn_amt_d) as avg_num from trade_info group by txn_type_s) a

对应的es的查询

curl -XPOST "http://192.168.236.131:9200/trade_info/csrcb/_search" -H "Content-Type:application/json" -d '{
    "size": 0,
    "aggregations": {
        "txn_type_group": {
            "terms": {
                "field": "txn_type_s.keyword"
            },
            "aggs": {
                "avg_num": {
                    "avg": {
                        "field": "txn_amt_d"
                    }
                }
            }
        },
        "min_avg_num": {
            "min_bucket": {
                "buckets_path": "txn_type_group>avg_num"
            }
        }
    }
} '

注意:管道操作(txn_type_group>avg_num中的>不是大于的意思,描述的为路径)

脚本

聚合后保留数据两位小数

"aggs" : {
        "total" : {  
            "sum" : { 
              "script" : "Math.round(doc['practiceObj.practiceValue'].value*100)/100.0"
               } 

        }
    }

可以参考elasticsearch - 如何向上舍入到小数点后两位 - elasticsearch - Thinbug

Logo

为开发者提供学习成长、分享交流、生态实践、资源工具等服务,帮助开发者快速成长。

更多推荐