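Aggregations (aggs)

Aggregations are typically used for statistical analysis of data, similar to GROUP BY in MySQL.

Aggregations rest on two basic concepts: buckets and metrics.

A bucket groups documents by some criterion; each group is one bucket. For example, grouping phones by brand yields a Xiaomi bucket and a Huawei bucket.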

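Bucket grouping methods

Date Histogram Aggregation: groups by a date interval; for example, with an interval of one week, each week automatically becomes its own group
Histogram Aggregation: groups by a numeric interval, analogous to the date version
Terms Aggregation: groups by term; documents whose term matches exactly fall into the same group
Range Aggregation: groups numeric or date values into ranges, each defined by an explicit start and end

ES grouping is quite flexible: a plain GROUP BY on a column in MySQL corresponds most closely to the Terms Aggregation, while ES also provides interval-based and range-based grouping as built-in bucket types.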

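Metrics

Metrics are similar to MySQL functions such as AVG and MAX; they compute values like the average or the maximum within each group.

Some commonly used metric aggregations:

Avg Aggregation: computes the average
Max Aggregation: computes the maximum
Min Aggregation: computes the minimum
Percentiles Aggregation: computes percentiles (for example, the median or the 95th percentile)
Stats Aggregation: returns avg, max, min, sum, and count together
Sum Aggregation: computes the sum
Top hits Aggregation: returns the top matching documents in each bucket
Value Count Aggregation: counts the number of values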
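
To make the idea of a metric concrete, here is a minimal plain-Java sketch of what a stats-style metric computes over one group of values. It does not call Elasticsearch; the class name StatsMetricSketch and the sample prices are assumptions made purely for illustration.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

class StatsMetricSketch {
    // Computes count, min, max, sum and average in a single pass,
    // which is the same set of values a stats metric reports.
    static DoubleSummaryStatistics stats(double... prices) {
        return DoubleStream.of(prices).summaryStatistics();
    }

    public static void main(String[] args) {
        // Hypothetical prices, not taken from the index used in this article.
        DoubleSummaryStatistics s = stats(3500, 4500, 5500);
        System.out.println("count=" + s.getCount()
                + " min=" + s.getMin()
                + " max=" + s.getMax()
                + " sum=" + s.getSum()
                + " avg=" + s.getAverage());
    }
}
```

In ES the same computation runs per bucket when the metric is nested under a bucket aggregation, or over all matching documents when it sits at the top level.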

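Terms buckets

Let's start with the simplest case, a terms bucket. In the query below, brand_aggs is a user-chosen name for the aggregation, terms selects the terms bucket type, and "field": "brand" buckets documents by the brand field. Setting size to 0 means no search hits are returned. This also shows that paging does not affect aggregation results, so a paged query and its aggregations can be returned in a single response.

The following query groups documents by brand name and counts them: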

GET /goods/_search
{
    "size" : 0,
    "aggs" : { 
        "brand_aggs" : { 
            "terms" : { 
              "field" : "brand"
            }
        }
    }
}

Query result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brand_aggs" : {                        // aggregation name
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [                        // aggregation result
        {
          "key" : "华为",                  // brand name, because documents are grouped by brand
          "doc_count" : 3                  // number of documents in the bucket
        },
        },
        {
          "key" : "小米",
          "doc_count" : 2
        }
      ]
    }
  }
}

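As the result shows, the document count per bucket is computed by default, without any metric. To get the average phone price per brand, a metric has to be added.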
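
The default doc_count of a terms bucket behaves like a group-and-count over exact values. The following is a rough in-memory analogy in plain Java, not Elasticsearch code; the class name TermsBucketSketch and the brand values are assumptions for illustration. The sketch sorts by key, whereas ES orders terms buckets by document count by default.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TermsBucketSketch {
    // Groups values by exact match and counts each group,
    // mirroring the key / doc_count pairs of a terms bucket.
    static Map<String, Long> countByTerm(List<String> terms) {
        return terms.stream()
                .collect(Collectors.groupingBy(t -> t, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical brand values for five documents.
        List<String> brands = Arrays.asList("Huawei", "Xiaomi", "Huawei", "Huawei", "Xiaomi");
        countByTerm(brands).forEach((brand, count) ->
                System.out.println(brand + " -> " + count));
    }
}
```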

Average metric

GET /goods/_search
{
    "size" : 0,
    "aggs" : { 
        "brand_aggs" : { 
            "terms" : { 
              "field" : "brand"
            },
            "aggs":{
                "avg_price": { 
                   "avg": {
                      "field": "price" 
                   }
                }
            }
        }
    }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "brand_aggs" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "华为",
          "doc_count" : 3,
          "avg_price" : {
            "value" : 4500.0
          }
        },
        {
          "key" : "小米",
          "doc_count" : 2,
          "avg_price" : {
            "value" : 5000.0
          }
        }
      ]
    }
  }
}

Code implementation

public void testAggs() {
    // Group by brand
    TermsAggregationBuilder aggregationBuilder = AggregationBuilders.terms("brand_aggs").field("brand");
    // Average metric: compute the average price within each brand bucket
    aggregationBuilder.subAggregation(AggregationBuilders.avg("avg_price").field("price"));
    NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
            .withPageable(PageRequest.of(0, 1))  // PageRequest requires a page size of at least 1
            .addAggregation(aggregationBuilder)
            .build();
    SearchHits<GoodsInfo> goodsInfos = elasticsearchRestTemplate.search(nativeSearchQuery, GoodsInfo.class);
    Terms brandTerms = goodsInfos.getAggregations().get("brand_aggs");
    brandTerms.getBuckets().forEach(bucket -> {
        System.out.println(bucket.getKey()); // brand name
        System.out.println(bucket.getDocCount()); // document count
        ParsedAvg avgPrice = bucket.getAggregations().get("avg_price"); // average price
        System.out.println(avgPrice.getValue());
    });
}
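
For readers who want to see the bucket-plus-metric nesting without a running cluster, here is a small plain-Java analogy: group by brand first, then average the price inside each group. The Phone class, the class name AvgPerBucketSketch, and the prices are hypothetical; the prices were picked so that the averages match the response shown above, since the article does not list the actual documents.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AvgPerBucketSketch {
    // Hypothetical document shape holding only the two fields the aggregation uses.
    static final class Phone {
        final String brand;
        final double price;

        Phone(String brand, double price) {
            this.brand = brand;
            this.price = price;
        }
    }

    // Outer step: bucket by brand. Inner step: average the price per bucket.
    static Map<String, Double> avgPriceByBrand(List<Phone> phones) {
        return phones.stream().collect(Collectors.groupingBy(
                p -> p.brand, TreeMap::new, Collectors.averagingDouble(p -> p.price)));
    }

    public static void main(String[] args) {
        // Assumed sample data, not the real contents of the goods index.
        List<Phone> phones = Arrays.asList(
                new Phone("Huawei", 3500), new Phone("Huawei", 4500), new Phone("Huawei", 5500),
                new Phone("Xiaomi", 4500), new Phone("Xiaomi", 5500));
        avgPriceByBrand(phones).forEach((brand, avg) ->
                System.out.println(brand + " avg_price=" + avg));
    }
}
```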

Interval buckets (Histogram)

The following example counts phones per price band, using an interval of 500:

GET /goods/_search
{
  "size":0,
  "aggs":{
    "price_histogram":{
      "histogram": {
        "field": "price",
        "interval": 500
      }
    }
  }
}

Result:

{
  "took" : 103,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_histogram" : {
      "buckets" : [
        {
          "key" : 3500.0,
          "doc_count" : 1
        },
        {
          "key" : 4000.0,
          "doc_count" : 0
        },
        {
          "key" : 4500.0,
          "doc_count" : 2
        },
        {
          "key" : 5000.0,
          "doc_count" : 0
        },
        {
          "key" : 5500.0,
          "doc_count" : 2
        }
      ]
    }
  }
}

Code:

public void testHistogram() {
    // One bucket per 500 of price
    HistogramAggregationBuilder aggregationBuilder = AggregationBuilders.histogram("price_histogram").field("price").interval(500);
    NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
            .withPageable(PageRequest.of(0, 1))  // PageRequest requires a page size of at least 1
            .addAggregation(aggregationBuilder)
            .build();
    SearchHits<GoodsInfo> goodsInfos = elasticsearchRestTemplate.search(nativeSearchQuery, GoodsInfo.class);
    ParsedHistogram priceHistogram = goodsInfos.getAggregations().get("price_histogram");
    priceHistogram.getBuckets().forEach(bucket -> {
        System.out.println(bucket.getKey()); // lower bound of the bucket
        System.out.println(bucket.getDocCount()); // document count
    });
}
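
How a value ends up in a histogram bucket can be sketched in a few lines: with no offset, the bucket key is the value rounded down to a multiple of the interval, and empty buckets between the smallest and largest key are reported with a count of 0 (the default min_doc_count for histogram is 0). This is a plain-Java illustration, not ES code; the class name HistogramBucketSketch and the prices are assumptions.

```java
import java.util.TreeMap;

class HistogramBucketSketch {
    // Rounds a value down to the start of its bucket (offset assumed to be 0).
    static double bucketKey(double value, double interval) {
        return Math.floor(value / interval) * interval;
    }

    // Counts values per bucket, then fills the gaps between the first
    // and last bucket with zero-count buckets.
    static TreeMap<Double, Long> histogram(double[] values, double interval) {
        TreeMap<Double, Long> buckets = new TreeMap<>();
        for (double v : values) {
            buckets.merge(bucketKey(v, interval), 1L, Long::sum);
        }
        if (!buckets.isEmpty()) {
            double last = buckets.lastKey();
            for (double key = buckets.firstKey(); key < last; key += interval) {
                buckets.putIfAbsent(key, 0L);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical prices chosen to show the rounding.
        double[] prices = {3500, 4500, 4999, 5500, 5999};
        histogram(prices, 500).forEach((key, count) ->
                System.out.println(key + " -> " + count));
    }
}
```

With these assumed prices the sketch yields five buckets from 3500.0 to 5500.0, two of them empty, which is the same shape as the response above.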

Range buckets (Range Aggregation)

Count the phones priced from 4000 (inclusive) to 6000 (exclusive):

GET /goods/_search
{
  "size": 0, 
  "aggs": {
    "price_range": {
      "range": {
        "field": "price",
        "ranges": [
          {
            "from": 4000,
            "to": 6000
          }
        ]
      }
    }
  }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "price_range" : {
      "buckets" : [
        {
          "key" : "4000.0-6000.0",
          "from" : 4000.0,
          "to" : 6000.0,
          "doc_count" : 4
        }
      ]
    }
  }
}

Code:

public void testRangeAggrs() {
    // Count documents in the range [4000, 6000)
    RangeAggregationBuilder aggregationBuilder = AggregationBuilders.range("price_range").field("price").addRange(4000, 6000);
    NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
            .withPageable(PageRequest.of(0, 1))  // PageRequest requires a page size of at least 1
            .addAggregation(aggregationBuilder)
            .build();
    SearchHits<GoodsInfo> goodsInfos = elasticsearchRestTemplate.search(nativeSearchQuery, GoodsInfo.class);
    ParsedRange priceRange = goodsInfos.getAggregations().get("price_range");
    priceRange.getBuckets().forEach(bucket -> {
        System.out.println(bucket.getKey()); // bucket key, e.g. 4000.0-6000.0
        System.out.println(bucket.getDocCount()); // document count
    });
}
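
The boundary rule of a range bucket is easy to get wrong: from is inclusive and to is exclusive. A minimal plain-Java sketch of that rule follows; the class name RangeBucketSketch and the prices are assumptions for illustration, and no Elasticsearch API is involved.

```java
import java.util.stream.DoubleStream;

class RangeBucketSketch {
    // A value belongs to the bucket when from <= value < to.
    static long countInRange(double[] values, double from, double to) {
        return DoubleStream.of(values)
                .filter(v -> v >= from && v < to)
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical prices; 6000 sits exactly on the upper bound and is excluded.
        double[] prices = {3500, 4000, 4500, 5500, 6000};
        System.out.println(countInRange(prices, 4000, 6000));
    }
}
```

A document priced exactly 6000 would therefore need a second range starting at 6000 to be counted.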

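Date buckets (Date Histogram)

The following query groups the documents of the cars index by the month of the sold field, formats each bucket key as yyyy-MM in the +08:00 time zone, and omits months without documents (min_doc_count is 1). Since Elasticsearch 7.2 the interval parameter of date_histogram is deprecated in favor of calendar_interval and fixed_interval.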

GET /cars/_search
{
  "size":0,
  "aggs" : {
      "date" : {
          "date_histogram" : {
              "field" : "sold",
              "interval" : "1M",
              "format" : "yyyy-MM",
              "time_zone": "+08:00",
              "min_doc_count": 1
          }
      }
  }
}

Result:

  "aggregations" : {
    "date" : {
      "buckets" : [
        {
          "key_as_string" : "2013-12",
          "key" : 1385859600000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-02",
          "key" : 1391216400000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-05",
          "key" : 1398906000000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-07",
          "key" : 1404176400000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-08",
          "key" : 1406854800000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-10",
          "key" : 1412125200000,
          "doc_count" : 1
        },
        {
          "key_as_string" : "2014-11",
          "key" : 1414803600000,
          "doc_count" : 2
        }
      ]
    }
  }
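
The effect of time_zone on monthly bucketing can be illustrated with java.time: each timestamp is first shifted to the +08:00 offset and then truncated to its year and month, so an instant late on the last day of a month in UTC can land in the following month's bucket. This is a plain-Java sketch under assumed sample timestamps, not the data of the cars index; the class name MonthBucketSketch is hypothetical.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

class MonthBucketSketch {
    // Same offset as the time_zone parameter in the query.
    static final ZoneOffset ZONE = ZoneOffset.ofHours(8);

    // Shifts the instant to the bucket time zone, then keeps only year and month.
    static YearMonth monthKey(Instant soldAt) {
        return YearMonth.from(soldAt.atOffset(ZONE));
    }

    // Only months that contain documents appear, similar to min_doc_count = 1.
    static TreeMap<YearMonth, Long> countByMonth(List<Instant> soldDates) {
        TreeMap<YearMonth, Long> buckets = new TreeMap<>();
        for (Instant sold : soldDates) {
            buckets.merge(monthKey(sold), 1L, Long::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical sale timestamps in UTC.
        List<Instant> sold = Arrays.asList(
                Instant.parse("2014-02-10T00:00:00Z"),
                Instant.parse("2014-10-31T20:00:00Z"), // already November at +08:00
                Instant.parse("2014-11-05T00:00:00Z"));
        countByMonth(sold).forEach((month, count) ->
                System.out.println(month + " -> " + count));
    }
}
```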
