Deduplicated paginated queries in Elasticsearch with cardinality + collapse
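What cardinality does
It counts the number of distinct values.
cardinality is the first approximate aggregation that Elasticsearch provides. It is a metric that returns the cardinality of a field, that is, the number of distinct (unique) values in that field.
It is similar to this MySQL query: SELECT COUNT(DISTINCT name) FROM TABLE
Elasticsearch equivalent: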
POST /index/_search
{
  "size": 0,
  "aggs": {
    "name_count": {
      "cardinality": {
        "field": "name",
        "precision_threshold": 100
      }
    }
  }
}
Response:
{
  "aggregations": {
    "name_count": {
      "value": 3
    }
  }
}
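The precision_threshold option trades memory for accuracy. It defines a unique-count threshold below which counts are expected to be close to accurate; above it, counts may become less exact. The maximum supported value is 40000, and any threshold above that has the same effect as 40000. The default is 3000.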
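To make the trade-off concrete, here is a minimal sketch of the exact computation that the cardinality aggregation approximates. It is plain Java, not Elasticsearch code, and the class and method names (ExactDistinctCount, countDistinct, effectivePrecisionThreshold) are made up for illustration. An exact count has to remember every unique value, which is the memory cost the approximate aggregation avoids.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ExactDistinctCount {
    // Exact equivalent of SELECT COUNT(DISTINCT name):
    // memory grows with the number of unique values.
    static long countDistinct(List<String> values) {
        Set<String> seen = new HashSet<>(values);
        return seen.size();
    }

    // Mirrors the cap described above: thresholds above 40000
    // behave the same as 40000. (Illustrative helper only.)
    static int effectivePrecisionThreshold(int requested) {
        return Math.min(requested, 40000);
    }

    public static void main(String[] args) {
        List<String> names =
            Arrays.asList("alice", "bob", "alice", "carol", "bob", "alice");
        System.out.println(countDistinct(names));                 // prints 3
        System.out.println(effectivePrecisionThreshold(100000));  // prints 40000
    }
}
```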
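What collapse does
collapse (field collapsing) is a feature added in Elasticsearch 5.x. In the versions this article targets, it cannot be combined with scroll, rescore, or search_after. Later releases relax some of these restrictions, so check the documentation for your version.
Elasticsearch query: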
GET /index/_search
{
  "query": {
    "match": {
      "message": "elasticsearch"
    }
  },
  "collapse": {
    "field": "user"
  }
}
user: the field to collapse on
Result:
{
  "hits": {
    "total": 2,
    "max_score": 4.276666,
    "hits": [{
      "_index": "index",
      "_type": "type",
      "_id": "kiRgoHsBtpTqY8pwLiMi",
      "_score": 4.276666,
      "_source": {
        // document data
      },
      "fields": {
        "message": [
          "elasticsearch"
        ]
      }
    }]
  }
}
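You can also use the inner_hits option to expand each collapsed top hit:
"collapse" : {
  "field" : "user", ①
  "inner_hits": {
    "name": "last_tweets", ②
    "size": 5, ③
    "sort": [{ "date": "asc" }] ④
  },
  "max_concurrent_group_searches": 4 ⑤
}
① Collapse the result set on the "user" field
② The name used for the inner hits section in the response
③ The number of inner hits to retrieve per collapse key
④ How to sort the documents within each group
⑤ The number of concurrent requests allowed to retrieve the inner_hits per group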
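As a rough mental model of what collapsing does to a ranked result list, the sketch below reproduces the behaviour in memory with plain Java. It is not how Elasticsearch implements it, and the names (CollapseSketch, Doc, collapseByUser, innerHits) are hypothetical. It also simplifies inner_hits by reusing the outer ranking instead of a separate sort.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CollapseSketch {
    // Hypothetical document shape, used only for this illustration.
    static final class Doc {
        final String user;
        final String message;
        final double score;

        Doc(String user, String message, double score) {
            this.user = user;
            this.message = message;
            this.score = score;
        }
    }

    // Keeps the first (best-ranked) document per user.
    // The input list must already be in ranked order.
    static List<Doc> collapseByUser(List<Doc> ranked) {
        Map<String, Doc> topPerUser = new LinkedHashMap<>();
        for (Doc d : ranked) {
            topPerUser.putIfAbsent(d.user, d);
        }
        return new ArrayList<>(topPerUser.values());
    }

    // Keeps up to `size` documents per user, in ranked order.
    // This roughly corresponds to the "size" setting of inner_hits.
    static Map<String, List<Doc>> innerHits(List<Doc> ranked, int size) {
        Map<String, List<Doc>> groups = new LinkedHashMap<>();
        for (Doc d : ranked) {
            List<Doc> group = groups.computeIfAbsent(d.user, k -> new ArrayList<>());
            if (group.size() < size) {
                group.add(d);
            }
        }
        return groups;
    }
}
```

Feeding it three ranked documents for users a, b, a would yield two collapsed hits (one per user), while innerHits with a size of 5 would still expose both of user a's documents.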
Paginated queries with cardinality + collapse
// esUtil is a project-specific helper (not part of the Elasticsearch client)
BoolQueryBuilder bq = esUtil.getBoolQueryBuilder(dto);
bq.must(QueryBuilders.termQuery("name", "name"));

SearchSourceBuilder ssb = new SearchSourceBuilder();
ssb.query(bq)
    .from(0)   // page offset
    .size(4)   // page size
    .sort(SortBuilders.fieldSort("lastDate").order(SortOrder.DESC))
    .collapse(new CollapseBuilder("name"))
    .aggregation(AggregationBuilders.cardinality("newName").field("name"));

SearchRequest searchRequest = new SearchRequest();
searchRequest.indices(index);
searchRequest.source(ssb);
// Newer high-level clients expect search(searchRequest, RequestOptions.DEFAULT)
SearchResponse response = restHighLevelClient.search(searchRequest);

// Deduplicated total: the name must match the aggregation defined above
Cardinality cardinality = response.getAggregations().get("newName");
long value = cardinality.getValue();

// Non-deduplicated total (all matching documents, before collapsing)
SearchHits hits = response.getHits();
long totalHits = hits.getTotalHits();
Some of the details here need to be configured for your own setup.
Elasticsearch query:
{
  "from": 0,
  "size": 4,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "name": {
              "value": "name"
            }
          }
        }
      ]
    }
  },
  "sort": [
    { "lastDate": { "order": "desc" } }
  ],
  "aggregations": {
    "newName": {
      "cardinality": {
        "field": "name"
      }
    }
  },
  "collapse": {
    "field": "name"
  }
}
Response:
{
  "hits": {
    "total": 6,
    "max_score": null,
    "hits": [{
        "_index": "index",
        "_type": "type",
        "_id": "dSRgoHsBtpTqY8pwFiN8",
        "_score": null,
        "_source": {
          // document data
        },
        "fields": {
          "name": [
            "name"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "type",
        "_id": "cyRgoHsBtpTqY8pwFiN8",
        "_score": null,
        "_source": {
          // document data
        },
        "fields": {
          "name": [
            "name"
          ]
        }
      },
      {
        "_index": "index",
        "_type": "type",
        "_id": "pSRgoHsBtpTqY8pwNCOT",
        "_score": null,
        "_source": {
          // document data
        },
        "fields": {
          "name": [
            "name"
          ]
        }
      }
    ]
  },
  "aggregations": {
    "newName": {
      "value": 3
    }
  }
}
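As you can see, the total hit count is 6, because hits.total counts every matching document and is not affected by collapsing. Three collapsed hits are returned, and the deduplicated count from the cardinality aggregation is also 3, so that value is the one to use as the total when paginating.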
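The pagination arithmetic that follows from this can be sketched in a few lines of plain Java. The class and method names (DedupPaging, fromOffset, totalPages) are hypothetical, and the numbers in main are the ones from the response above. Keep in mind that the cardinality value is approximate for large counts, so the page count can be slightly off there.

```java
final class DedupPaging {
    // Offset to send as "from" for a 1-based page number.
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // Number of pages needed for `total` items, rounding up.
    static long totalPages(long total, int size) {
        if (total < 0 || size < 1) {
            throw new IllegalArgumentException("total must be >= 0 and size >= 1");
        }
        return (total + size - 1) / size;
    }

    public static void main(String[] args) {
        int size = 4;
        long distinctCount = 3; // value of the cardinality aggregation
        long totalHits = 6;     // hits.total, counted before collapsing

        // Paging by the deduplicated count gives the real number of pages.
        System.out.println(totalPages(distinctCount, size)); // prints 1
        // Paging by hits.total would advertise a second page that is empty.
        System.out.println(totalPages(totalHits, size));     // prints 2
        System.out.println(fromOffset(2, size));             // prints 4
    }
}
```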