【Elasticsearch】Bucket aggregations (aggs) and nested (second-level) aggregation queries in Elasticsearch


  • Bucket aggregations
    Bucket aggregations don’t calculate metrics over fields like the metrics aggregations do, but instead, they create buckets of documents. Each bucket is associated with a criterion (depending on the aggregation type) which determines whether or not a document in the current context “falls” into it. In other words, the buckets effectively define document sets. In addition to the buckets themselves, the bucket aggregations also compute and return the number of documents that “fell into” each bucket.
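    Bucket aggregations, as opposed to metrics aggregations, can hold sub-aggregations. These sub-aggregations will be aggregated for the buckets created by their “parent” bucket aggregation. (In short: documents are sorted into buckets according to the criterion you specify, and each bucket reports how many documents fell into it, plus the results of any sub-aggregations you define for it.)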

Official documentation
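To make the idea concrete, here is a minimal Python sketch of what a terms-style bucket aggregation with one sub-aggregation computes. It is only an in-memory illustration, not how Elasticsearch implements aggregations. The function name `bucket_by` and the toy documents are hypothetical, and the bucket ordering (doc_count descending, then key ascending) is chosen to match the outputs shown in this article.

```python
from collections import defaultdict


def bucket_by(docs, key_field, metric_field):
    """Put each doc into the bucket for its key_field value, then compute a
    doc count and an average of metric_field (a sub-aggregation) per bucket."""
    members = defaultdict(list)
    for doc in docs:
        members[doc[key_field]].append(doc)

    buckets = []
    for key, group in members.items():
        values = [d[metric_field] for d in group if d.get(metric_field) is not None]
        buckets.append({
            "key": key,
            "doc_count": len(group),  # every doc that "fell into" the bucket
            "avg": sum(values) / len(values) if values else None,  # sub-aggregation
        })
    # Order like the terms outputs in this article: doc_count desc, then key asc.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


toy_docs = [
    {"type": "bread", "price": 9.0},
    {"type": "bread", "price": 11.0},
    {"type": "phone", "price": 3299.0},
]
for bucket in bucket_by(toy_docs, "type", "price"):
    print(bucket)
# {'key': 'bread', 'doc_count': 2, 'avg': 10.0}
# {'key': 'phone', 'doc_count': 1, 'avg': 3299.0}
```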

Below are a few simple examples.

First, prepare the test data:

POST /product/_bulk
{"index":{"_id":1}}
{"name":"曼克顿全麦面包","desc":"曼克顿全麦面包,超级松软","price":9.00,"lv":"高端面包","type":"面包","createtime":"2022-11-01T08:00:00Z","tags":["松软","全麦","健康"]}
{"index":{"_id":2}}
{"name":"曼克顿高钙面包","desc":"曼克顿高钙面包,每片含0.01g钙","price":10.29,"lv":"高端面包","type":"面包","createtime":"2022-05-21T08:00:00Z","tags":["营养","高钙","儿童"]}
{"index":{"_id":3}}
{"name":"果子面包","desc":"百年义利,老字号品牌面包","price":9.99,"lv":"高端面包","type":"面包","createtime":"2022-06-20","tags":["品牌","老字号","百年义利用"]}
{"index":{"_id":4}}
{"name":"曼克顿巧克力","desc":"曼克顿巧克力,代脂巧克力","price":30,"lv":"代脂巧克力","type":"糖果,巧克力","createtime":"2022-06-23","tags":["代脂巧克力","巧克力","奶油巧克力"]}
{"index":{"_id":5}}
{"name":"百醇巧克力","desc":"百醇巧克力,纯享可香可","price":55,"type":"糖果,巧克力","lv":"可可巧克力","createtime":"2022-07-20","tags":["可可巧克力","巧克力","奶油巧克力"]}
{"index":{"_id":6}}
{"name":"小米手机10","desc":"充电贼快掉电更快,超级无敌望远镜,高刷电竞屏","price":"","lv":"旗舰机","type":"手机","createtime":"2021-07-27","tags":["120HZ刷新率","120W快充","120倍变焦"]}
{"index":{"_id":7}}
{"name":"飞翔SE2","desc":"除了飞行模式,一无是处","price":"3299","lv":"旗舰机","type":"手机","createtime":"2020-07-21","tags":["割韭菜","割韭菜","割新韭菜"]}
{"index":{"_id":8}}
{"name":"XS Max","desc":"听说要出新款12手机了,终于可以换掉手中的Xs了","price":4399,"lv":"旗舰机","type":"手机","createtime":"2020-08-19","tags":["5V1A","4G全网通","大"]}
{"index":{"_id":9}}
{"name":"小米电视","desc":"70寸性价比只选,不要一万八,要不要八千八,只要两千九百九十八","price":2998,"lv":"高端机","type":"耳机","createtime":"2020-08-16","tags":["巨馍","家庭影院","游戏"]}
{"index":{"_id":10}}
{"name":"红米电视","desc":"我比上边那个更划算,我也2998,我也70寸,但是我更好看","price":2999,"type":"电视","lv":"高端机","createtime":"2020-08-28","tags":["大片","蓝光8K","超薄"]}
{"index":{"_id":11}}
{"name":"红米电视","desc":"我比上边那个更划算,我也2998,我也70寸,但是我更好看","price":2998,"type":"电视","lv":"高端机","createtime":"2020-08-28","tags":["大片","蓝光8K","超薄"]}
  • A plain aggregation query (bucket by tag, sorted by document count in ascending order)
# Aggregation query (aggs)
GET product/_search
{
  "size": 0, 
  "aggs": {
    "aggs_tag":{
      "terms": {
        "field": "tags.keyword",
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}

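Search result. The terms aggregation returns 10 buckets by default, so the document counts of the remaining tags are summed into sum_other_doc_count. The -1 in doc_count_error_upper_bound means that no error bound is reported for this ascending-count order.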

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 11,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "aggs_tag": {
      "doc_count_error_upper_bound": -1,
      "sum_other_doc_count": 22,
      "buckets": [
        {
          "key": "120HZ刷新率",
          "doc_count": 1
        },
        {
          "key": "120W快充",
          "doc_count": 1
        },
        {
          "key": "120倍变焦",
          "doc_count": 1
        },
        {
          "key": "4G全网通",
          "doc_count": 1
        },
        {
          "key": "5V1A",
          "doc_count": 1
        },
        {
          "key": "代脂巧克力",
          "doc_count": 1
        },
        {
          "key": "健康",
          "doc_count": 1
        },
        {
          "key": "儿童",
          "doc_count": 1
        },
        {
          "key": "全麦",
          "doc_count": 1
        },
        {
          "key": "割新韭菜",
          "doc_count": 1
        }
      ]
    }
  }
}
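The bucket list above can be reproduced with a small Python sketch. It assumes a single shard and the default size of 10. It also assumes that a tag repeated within one document (such as 割韭菜 in document 7) counts that document only once. Ties on doc_count are broken by key, compared by Unicode code point, which matches the order in the output. The function name is hypothetical.

```python
from collections import Counter

# tags of documents 1-11 from the bulk request above
TAGS = [
    ["松软", "全麦", "健康"],
    ["营养", "高钙", "儿童"],
    ["品牌", "老字号", "百年义利用"],
    ["代脂巧克力", "巧克力", "奶油巧克力"],
    ["可可巧克力", "巧克力", "奶油巧克力"],
    ["120HZ刷新率", "120W快充", "120倍变焦"],
    ["割韭菜", "割韭菜", "割新韭菜"],
    ["5V1A", "4G全网通", "大"],
    ["巨馍", "家庭影院", "游戏"],
    ["大片", "蓝光8K", "超薄"],
    ["大片", "蓝光8K", "超薄"],
]


def terms_count_asc(tag_lists, size=10):
    """Count documents per tag, order by count asc then key asc, keep `size` buckets."""
    counts = Counter()
    for tags in tag_lists:
        counts.update(set(tags))  # one document counts at most once per tag
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    kept, rest = ordered[:size], ordered[size:]
    return {
        "sum_other_doc_count": sum(count for _, count in rest),
        "buckets": [{"key": k, "doc_count": c} for k, c in kept],
    }


result = terms_count_asc(TAGS)
print(result["sum_other_doc_count"])  # 22
print([b["key"] for b in result["buckets"]])
```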
  • Aggregation query: highest price
GET product/_search
{
  "size": 0, 
  "aggs": {
    "max_price":{
      "max": {
        "field": "price"
      }
    }
  }
}

The search result is as follows:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 11,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "max_price": {
      "value": 4399
    }
  }
}
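Two prices in the test data are not plain numbers: document 6 has "" and document 7 has the string "3299". The statistics later in this article (phones: doc_count 3 but count 2) are consistent with Elasticsearch coercing "3299" to a number and treating "" as a missing value. The Python sketch below mimics that assumption to show why the max is still 4399. `coerce_price` is a hypothetical helper, not an Elasticsearch API.

```python
def coerce_price(raw):
    """Assumed handling: numbers pass through, numeric strings are parsed,
    and an empty string means the field has no value."""
    if raw is None or raw == "":
        return None
    return float(raw)


# raw "price" values of documents 1-11 exactly as sent in the bulk request
RAW_PRICES = [9.00, 10.29, 9.99, 30, 55, "", "3299", 4399, 2998, 2999, 2998]

values = [p for p in map(coerce_price, RAW_PRICES) if p is not None]
print(len(values))  # 10 documents contribute a price
print(max(values))  # 4399.0, matching the max aggregation
```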

Next comes the main topic of this article: nested (second-level) aggregations with aggs.

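  • The product type with the lowest average price
  • Uses a pipeline aggregation: min_bucket is a sibling of product_type, and its buckets_path "product_type>price_type_avg" points to the price_type_avg average inside each type bucket, so it returns the lowest average together with the key of the bucket that holds it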
GET product/_search
{
  "size": 0,
  "aggs": {
    "product_type": {
      "terms": {
        "field": "type.keyword"
      },
      "aggs": {
        "price_type_avg": {
          "avg": {
            "field": "price"
          }
        }
      }
    },
    "min_avg_type":{
      "min_bucket": {
        "buckets_path": "product_type>price_type_avg"
      }
    }
  }
}

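Search result. Note that 手机 has doc_count 3 but its average is based on only two prices: document 6 was indexed with an empty price and contributes no value. The long decimals for 面包 come from price being dynamically mapped as a 32-bit float.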

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 11,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "product_type": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "手机",
          "doc_count": 3,
          "price_type_avg": {
            "value": 3849
          }
        },
        {
          "key": "面包",
          "doc_count": 3,
          "price_type_avg": {
            "value": 9.759999910990397
          }
        },
        {
          "key": "电视",
          "doc_count": 2,
          "price_type_avg": {
            "value": 2998.5
          }
        },
        {
          "key": "糖果,巧克力",
          "doc_count": 2,
          "price_type_avg": {
            "value": 42.5
          }
        },
        {
          "key": "耳机",
          "doc_count": 1,
          "price_type_avg": {
            "value": 2998
          }
        }
      ]
    },
    "min_avg_type": {
      "value": 9.759999910990397,
      "keys": [
        "面包"
      ]
    }
  }
}
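The two steps of this request can be sketched in Python: first an average per type bucket, then min_bucket over those averages. The sketch assumes price is stored as a 32-bit float (what dynamic mapping produces for a value like 9.00), which is why 面包 averages slightly below 9.76. The helper names are hypothetical.

```python
import struct
from collections import defaultdict


def as_float32(x):
    """Round x to 32-bit float precision, as a `float` field stores it."""
    return struct.unpack("f", struct.pack("f", x))[0]


# (type, price) of documents 1-11; None marks document 6, whose price was ""
DOCS = [
    ("面包", 9.00), ("面包", 10.29), ("面包", 9.99),
    ("糖果,巧克力", 30), ("糖果,巧克力", 55),
    ("手机", None), ("手机", 3299), ("手机", 4399),
    ("耳机", 2998), ("电视", 2999), ("电视", 2998),
]


def avg_per_type(docs):
    """terms on type + avg sub-aggregation: one average per type bucket."""
    prices = defaultdict(list)
    for type_, price in docs:
        bucket = prices[type_]  # the bucket exists even when the price is missing
        if price is not None:
            bucket.append(as_float32(price))
    return {t: sum(v) / len(v) for t, v in prices.items() if v}


def min_bucket(metric_by_key):
    """Sibling pipeline step: lowest metric value and the bucket key(s) holding it."""
    lowest = min(metric_by_key.values())
    return {"value": lowest, "keys": [k for k, v in metric_by_key.items() if v == lowest]}


averages = avg_per_type(DOCS)
print(averages["手机"])  # 3849.0: document 6 has no price, so only 2 values
print(min_bucket(averages)["keys"])  # ['面包']
```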
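  • Aggregate by type, then by level (lv) inside each type, and compute price stats plus a tags terms aggregation for each level. Finally, use max_bucket to find, within each type, the level bucket with the highest maximum price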
GET product/_search
{
  "size": 0,
  "aggs": {
    "product_type": {
      "terms": {
        "field": "type.keyword"
      },
      "aggs": {
        "lv_aggs": {
          "terms": {
            "field": "lv.keyword"
          },
          "aggs": {
            "lv_price": {
              "stats": {
                "field": "price"
              }
            },
            "tages_buckets": {
              "terms": {
                "field": "tags.keyword"
              }
            }
          }
        },
        "max_type": {
          "max_bucket": {
            "buckets_path": "lv_aggs>lv_price.max"
          }
        }
      }
    }
  }
}

The search result is as follows:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 11,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "product_type": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "手机",
          "doc_count": 3,
          "lv_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "旗舰机",
                "doc_count": 3,
                "tages_buckets": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "120HZ刷新率",
                      "doc_count": 1
                    },
                    {
                      "key": "120W快充",
                      "doc_count": 1
                    },
                    {
                      "key": "120倍变焦",
                      "doc_count": 1
                    },
                    {
                      "key": "4G全网通",
                      "doc_count": 1
                    },
                    {
                      "key": "5V1A",
                      "doc_count": 1
                    },
                    {
                      "key": "割新韭菜",
                      "doc_count": 1
                    },
                    {
                      "key": "割韭菜",
                      "doc_count": 1
                    },
                    {
                      "key": "大",
                      "doc_count": 1
                    }
                  ]
                },
                "lv_price": {
                  "count": 2,
                  "min": 3299,
                  "max": 4399,
                  "avg": 3849,
                  "sum": 7698
                }
              }
            ]
          },
          "max_type": {
            "value": 4399,
            "keys": [
              "旗舰机"
            ]
          }
        },
        {
          "key": "面包",
          "doc_count": 3,
          "lv_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "高端面包",
                "doc_count": 3,
                "tages_buckets": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "健康",
                      "doc_count": 1
                    },
                    {
                      "key": "儿童",
                      "doc_count": 1
                    },
                    {
                      "key": "全麦",
                      "doc_count": 1
                    },
                    {
                      "key": "品牌",
                      "doc_count": 1
                    },
                    {
                      "key": "松软",
                      "doc_count": 1
                    },
                    {
                      "key": "百年义利用",
                      "doc_count": 1
                    },
                    {
                      "key": "老字号",
                      "doc_count": 1
                    },
                    {
                      "key": "营养",
                      "doc_count": 1
                    },
                    {
                      "key": "高钙",
                      "doc_count": 1
                    }
                  ]
                },
                "lv_price": {
                  "count": 3,
                  "min": 9,
                  "max": 10.289999961853027,
                  "avg": 9.759999910990397,
                  "sum": 29.27999973297119
                }
              }
            ]
          },
          "max_type": {
            "value": 10.289999961853027,
            "keys": [
              "高端面包"
            ]
          }
        },
        {
          "key": "电视",
          "doc_count": 2,
          "lv_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "高端机",
                "doc_count": 2,
                "tages_buckets": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "大片",
                      "doc_count": 2
                    },
                    {
                      "key": "蓝光8K",
                      "doc_count": 2
                    },
                    {
                      "key": "超薄",
                      "doc_count": 2
                    }
                  ]
                },
                "lv_price": {
                  "count": 2,
                  "min": 2998,
                  "max": 2999,
                  "avg": 2998.5,
                  "sum": 5997
                }
              }
            ]
          },
          "max_type": {
            "value": 2999,
            "keys": [
              "高端机"
            ]
          }
        },
        {
          "key": "糖果,巧克力",
          "doc_count": 2,
          "lv_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "代脂巧克力",
                "doc_count": 1,
                "tages_buckets": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "代脂巧克力",
                      "doc_count": 1
                    },
                    {
                      "key": "奶油巧克力",
                      "doc_count": 1
                    },
                    {
                      "key": "巧克力",
                      "doc_count": 1
                    }
                  ]
                },
                "lv_price": {
                  "count": 1,
                  "min": 30,
                  "max": 30,
                  "avg": 30,
                  "sum": 30
                }
              },
              {
                "key": "可可巧克力",
                "doc_count": 1,
                "tages_buckets": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "可可巧克力",
                      "doc_count": 1
                    },
                    {
                      "key": "奶油巧克力",
                      "doc_count": 1
                    },
                    {
                      "key": "巧克力",
                      "doc_count": 1
                    }
                  ]
                },
                "lv_price": {
                  "count": 1,
                  "min": 55,
                  "max": 55,
                  "avg": 55,
                  "sum": 55
                }
              }
            ]
          },
          "max_type": {
            "value": 55,
            "keys": [
              "可可巧克力"
            ]
          }
        },
        {
          "key": "耳机",
          "doc_count": 1,
          "lv_aggs": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "高端机",
                "doc_count": 1,
                "tages_buckets": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "家庭影院",
                      "doc_count": 1
                    },
                    {
                      "key": "巨馍",
                      "doc_count": 1
                    },
                    {
                      "key": "游戏",
                      "doc_count": 1
                    }
                  ]
                },
                "lv_price": {
                  "count": 1,
                  "min": 2998,
                  "max": 2998,
                  "avg": 2998,
                  "sum": 2998
                }
              }
            ]
          },
          "max_type": {
            "value": 2998,
            "keys": [
              "高端机"
            ]
          }
        }
      ]
    }
  }
}
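The nesting of this request can be sketched in Python as nested grouping: type > lv > stats, with max_bucket evaluated inside each type bucket. This is an in-memory illustration under the same data assumptions as the output above (document 6 contributes no price). The tags sub-aggregation is left out, and the helper names are hypothetical.

```python
from collections import defaultdict

# (type, lv, price) of documents 1-11; None marks document 6, whose price was ""
DOCS = [
    ("面包", "高端面包", 9.00), ("面包", "高端面包", 10.29), ("面包", "高端面包", 9.99),
    ("糖果,巧克力", "代脂巧克力", 30), ("糖果,巧克力", "可可巧克力", 55),
    ("手机", "旗舰机", None), ("手机", "旗舰机", 3299), ("手机", "旗舰机", 4399),
    ("耳机", "高端机", 2998), ("电视", "高端机", 2999), ("电视", "高端机", 2998),
]


def stats(values):
    """Like the stats metric: count/min/max/avg/sum over the non-missing values."""
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}


def type_lv_aggs(docs):
    tree = defaultdict(lambda: defaultdict(list))  # type -> lv -> prices
    for type_, lv, price in docs:
        prices = tree[type_][lv]  # create the lv bucket even without a price
        if price is not None:
            prices.append(price)

    result = {}
    for type_, lv_buckets in tree.items():
        lv_stats = {lv: stats(p) for lv, p in lv_buckets.items()}
        # max_bucket with buckets_path "lv_aggs>lv_price.max", scoped to this type
        maxima = {lv: s["max"] for lv, s in lv_stats.items() if s["max"] is not None}
        best = max(maxima.values()) if maxima else None
        result[type_] = {
            "lv_aggs": lv_stats,
            "max_type": {"value": best, "keys": [lv for lv, v in maxima.items() if v == best]},
        }
    return result


result = type_lv_aggs(DOCS)
print(result["糖果,巧克力"]["max_type"])  # {'value': 55, 'keys': ['可可巧克力']}
print(result["手机"]["lv_aggs"]["旗舰机"]["count"])  # 2
```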

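The following small test shows that the query is applied before the aggregations: the aggregations only run over the documents matched by the query. You can therefore filter out part of the data in the query first and aggregate only what remains. This reduces Elasticsearch's memory usage and improves performance.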

GET product/_search
{
  "size": 10,
  "query": {
    "range": {
      "price": {
        "gte": 10,
        "lte": 60
      }
    }
  },
  "aggs": {
    "tags_bucket": {
      "terms": {
        "field": "tags.keyword"
      }
    }
  }
}

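The search result is as follows. Only the bread and candy products priced between 10 and 60 are returned, and the tag buckets are built from these three documents only: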

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "product",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "曼克顿高钙面包",
          "desc": "曼克顿高钙面包,每片含0.01g钙",
          "price": 10.29,
          "lv": "高端面包",
          "type": "面包",
          "createtime": "2022-05-21T08:00:00Z",
          "tags": [
            "营养",
            "高钙",
            "儿童"
          ]
        }
      },
      {
        "_index": "product",
        "_id": "4",
        "_score": 1,
        "_source": {
          "name": "曼克顿巧克力",
          "desc": "曼克顿巧克力,代脂巧克力",
          "price": 30,
          "lv": "代脂巧克力",
          "type": "糖果,巧克力",
          "createtime": "2022-06-23",
          "tags": [
            "代脂巧克力",
            "巧克力",
            "奶油巧克力"
          ]
        }
      },
      {
        "_index": "product",
        "_id": "5",
        "_score": 1,
        "_source": {
          "name": "百醇巧克力",
          "desc": "百醇巧克力,纯享可香可",
          "price": 55,
          "type": "糖果,巧克力",
          "lv": "可可巧克力",
          "createtime": "2022-07-20",
          "tags": [
            "可可巧克力",
            "巧克力",
            "奶油巧克力"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "tags_bucket": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "奶油巧克力",
          "doc_count": 2
        },
        {
          "key": "巧克力",
          "doc_count": 2
        },
        {
          "key": "代脂巧克力",
          "doc_count": 1
        },
        {
          "key": "儿童",
          "doc_count": 1
        },
        {
          "key": "可可巧克力",
          "doc_count": 1
        },
        {
          "key": "营养",
          "doc_count": 1
        },
        {
          "key": "高钙",
          "doc_count": 1
        }
      ]
    }
  }
}
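The order of operations in this test can be sketched in Python: the range query selects the hits first, and the terms aggregation then counts only the tags of those hits. This is a simplification under the same data assumptions as before, and the helper names are hypothetical.

```python
from collections import Counter

# (price, tags) of documents 1-11; None marks document 6, whose price was ""
DOCS = [
    (9.00, ["松软", "全麦", "健康"]),
    (10.29, ["营养", "高钙", "儿童"]),
    (9.99, ["品牌", "老字号", "百年义利用"]),
    (30, ["代脂巧克力", "巧克力", "奶油巧克力"]),
    (55, ["可可巧克力", "巧克力", "奶油巧克力"]),
    (None, ["120HZ刷新率", "120W快充", "120倍变焦"]),
    (3299, ["割韭菜", "割韭菜", "割新韭菜"]),
    (4399, ["5V1A", "4G全网通", "大"]),
    (2998, ["巨馍", "家庭影院", "游戏"]),
    (2999, ["大片", "蓝光8K", "超薄"]),
    (2998, ["大片", "蓝光8K", "超薄"]),
]


def range_query(docs, gte, lte):
    """Step 1: keep only documents whose price lies in [gte, lte]."""
    return [d for d in docs if d[0] is not None and gte <= d[0] <= lte]


def terms_on_tags(docs, size=10):
    """Step 2: bucket the remaining documents by tag, doc_count desc then key asc."""
    counts = Counter()
    for _, tags in docs:
        counts.update(set(tags))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


hits = range_query(DOCS, 10, 60)
print(len(hits))  # 3
print(terms_on_tags(hits))
```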
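  • This query contains a global aggregation. Because all_min_price is a global aggregation, its sub-aggregation runs over all documents in the index and ignores the query. By contrast, stat_data and muti_rang_price only see the documents matched by the query (price >= 3000). The official documentation also notes that a global aggregation can only be placed at the top level, since embedding it inside another bucket aggregation makes no sense.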

Official documentation: global aggregation

GET product/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "price": {
            "gte": 3000
          }
        }
      },
      "boost": 1.2
    }
  },
  "aggs": {
    "stat_data": {
      "stats": {
        "field": "price"
      }
    },
    "all_min_price": {
      "global": {},
      "aggs": {
        "global_min_price": {
          "min": {
            "field": "price"
          }
        }
      }
    },
    "muti_rang_price": {
      "filter": {
        "range": {
          "price": {
            "lte": 4000
          }
        }
      },
      "aggs": {
        "avg_lte_4000_price": {
          "avg": {
            "field": "price"
          }
        }
      }
    }
  }
}
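The article does not show the response for this request. As a rough illustration of the scoping rules, the Python sketch below computes what each top-level aggregation would see. It assumes the same price handling as before (document 6 has no price). The numbers come from the sketch, not from a real Elasticsearch response, and the function name is hypothetical.

```python
# price of documents 1-11; None marks document 6, whose price was ""
PRICES = [9.00, 10.29, 9.99, 30, 55, None, 3299, 4399, 2998, 2999, 2998]


def global_example(prices):
    all_values = [p for p in prices if p is not None]
    # query: constant_score + range price >= 3000 decides the hits
    hits = [p for p in all_values if p >= 3000]
    # stat_data: a normal aggregation, so it only sees the hits
    stat_data = {"count": len(hits), "min": min(hits), "max": max(hits), "sum": sum(hits)}
    # all_min_price: global, so its min sub-aggregation sees every document
    global_min = min(all_values)
    # muti_rang_price: filter aggregation, narrows the hits further (price <= 4000)
    filtered = [p for p in hits if p <= 4000]
    filtered_avg = sum(filtered) / len(filtered) if filtered else None
    return stat_data, global_min, filtered_avg


stat_data, global_min, filtered_avg = global_example(PRICES)
print(stat_data)     # {'count': 2, 'min': 3299, 'max': 4399, 'sum': 7698}
print(global_min)    # 9.0
print(filtered_avg)  # 3299.0
```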
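  • Multi-level aggregation with sorting
    With a second-level aggregation, the parent buckets can be sorted by a value that the sub-aggregation computes. This example sorts the product types in descending order of the sum of their prices.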
GET product/_search
{
  "size": 0,
  "aggs": {
    "type_args": {
      "terms": {
        "field": "type.keyword",
        "order": {
          "stat_data.sum": "desc"
        }
      },
      "aggs": {
        "stat_data": {
          "stats": {
            "field": "price"
          }
        }
      }
    }
  }
}

Result:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 11,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "type_args": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "手机",
          "doc_count": 3,
          "stat_data": {
            "count": 2,
            "min": 3299,
            "max": 4399,
            "avg": 3849,
            "sum": 7698
          }
        },
        {
          "key": "电视",
          "doc_count": 2,
          "stat_data": {
            "count": 2,
            "min": 2998,
            "max": 2999,
            "avg": 2998.5,
            "sum": 5997
          }
        },
        {
          "key": "耳机",
          "doc_count": 1,
          "stat_data": {
            "count": 1,
            "min": 2998,
            "max": 2998,
            "avg": 2998,
            "sum": 2998
          }
        },
        {
          "key": "糖果,巧克力",
          "doc_count": 2,
          "stat_data": {
            "count": 2,
            "min": 30,
            "max": 55,
            "avg": 42.5,
            "sum": 85
          }
        },
        {
          "key": "面包",
          "doc_count": 3,
          "stat_data": {
            "count": 3,
            "min": 9,
            "max": 10.289999961853027,
            "avg": 9.759999910990397,
            "sum": 29.27999973297119
          }
        }
      ]
    }
  }
}
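Ordering buckets by a sub-aggregation value, as "order": {"stat_data.sum": "desc"} does, means computing the metric for each bucket and then sorting the buckets by it. Below is a minimal Python sketch under the same data assumptions as before. The helper name is hypothetical, and breaking ties by key is this sketch's own choice.

```python
from collections import defaultdict

# (type, price) of documents 1-11; None marks document 6, whose price was ""
DOCS = [
    ("面包", 9.00), ("面包", 10.29), ("面包", 9.99),
    ("糖果,巧克力", 30), ("糖果,巧克力", 55),
    ("手机", None), ("手机", 3299), ("手机", 4399),
    ("耳机", 2998), ("电视", 2999), ("电视", 2998),
]


def types_by_price_sum_desc(docs):
    """terms on type, ordered by the sum sub-metric descending (ties: key asc)."""
    sums = defaultdict(float)
    for type_, price in docs:
        sums[type_] += price if price is not None else 0.0
    return sorted(sums.items(), key=lambda kv: (-kv[1], kv[0]))


for key, total in types_by_price_sum_desc(DOCS):
    print(key, total)
```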