What cardinality does

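It counts the number of values after deduplication.
The first approximate aggregation that Elasticsearch provides is the cardinality metric. It returns the cardinality of a field, that is, the number of distinct (unique) values in that field.
It is comparable to the MySQL query: SELECT COUNT(DISTINCT name) FROM TABLE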
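To make the COUNT(DISTINCT) analogy concrete, here is a minimal in-memory sketch of an exact distinct count. It is not how Elasticsearch computes the value: the class and method names are hypothetical, and the point is only that an exact count has to remember every value it has seen, which is the cost an approximate aggregation avoids.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of any Elasticsearch API.
final class ExactDistinctCount {

    // Exact equivalent of SELECT COUNT(DISTINCT name): keep every value seen.
    // Memory grows with the number of distinct values.
    static long countDistinct(List<String> names) {
        Set<String> seen = new HashSet<>(names);
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("alice", "bob", "alice", "carol", "bob");
        System.out.println(countDistinct(names)); // prints 3
    }
}
```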
Elasticsearch syntax:

POST /index/_search
{
  "size":0,
  "aggs": {
    "name_count": {
      "cardinality": {
        "field": "name",
        "precision_threshold": 100
      }
    }
  }
}

Response:

{
  "aggregations": {
    "name_count": {
      "value": 3
    }
  }
}

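Here, the precision_threshold option trades memory for accuracy. It defines a unique-count threshold below which counts are expected to be close to accurate; above it, counts may become fuzzier. The maximum supported value is 40000, and thresholds above that number have the same effect as a threshold of 40000. The default value is 3000.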
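The memory-for-accuracy trade-off can be put into rough numbers. The sketch below assumes the rule of thumb from the Elasticsearch reference documentation that a threshold of c costs about c * 8 bytes per aggregation; the class and method names are hypothetical, and the result is an estimate rather than a measured value.

```java
// Hypothetical helper for illustration; not part of any Elasticsearch API.
final class PrecisionThresholdSketch {

    // Thresholds above this value behave like 40000.
    static final int MAX_THRESHOLD = 40_000;

    // The threshold that actually takes effect for a requested value.
    // Assumes the requested value is not negative.
    static int effectiveThreshold(int requested) {
        return Math.min(requested, MAX_THRESHOLD);
    }

    // Rough memory estimate, assuming about 8 bytes per unit of threshold.
    static long approxBytes(int requested) {
        return effectiveThreshold(requested) * 8L;
    }

    public static void main(String[] args) {
        System.out.println(approxBytes(100));    // threshold used in the request above
        System.out.println(approxBytes(3000));   // default threshold
        System.out.println(approxBytes(50_000)); // capped at the maximum
    }
}
```

With the default threshold this comes to roughly 24 KB per aggregation, and the cap keeps it at about 320 KB however large the requested value is.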

What collapse does

Field collapsing (collapse) is a feature introduced in the Elasticsearch 5.x line. It cannot be combined with scroll, rescore, or search after.
Elasticsearch syntax:

GET /index/_search
{
    "query": {
        "match": {
            "message": "elasticsearch"
        }
    },
    "collapse" : {
        "field" : "user" 
    }
}

user: the field to collapse on
Response:

{
	"hits": {
		"total": 2,
		"max_score": 4.276666,
		"hits": [{
			"_index": "index",
			"_type": "type",
			"_id": "kiRgoHsBtpTqY8pwLiMi",
			"_score": 4.276666,
			"_source": {
				// document data
			},
			"fields": {
				"message": [
					"elasticsearch"
				]
			}
		}]
	}
}
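Conceptually, collapsing keeps only the best-ranked document for each value of the collapse field. The sketch below mimics that on an in-memory list; it is an illustration of the idea, not Elasticsearch's implementation, and the Doc class and collapseByUser method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration; not part of any Elasticsearch API.
final class CollapseSketch {

    // Minimal document: an id, the collapse key, and a relevance score.
    static final class Doc {
        final String id;
        final String user;
        final double score;

        Doc(String id, String user, double score) {
            this.id = id;
            this.user = user;
            this.score = score;
        }
    }

    // Expects hits already sorted best-first; keeps the first hit per user
    // and preserves the original ranking order of the survivors.
    static List<Doc> collapseByUser(List<Doc> sortedHits) {
        Map<String, Doc> topPerUser = new LinkedHashMap<>();
        for (Doc d : sortedHits) {
            topPerUser.putIfAbsent(d.user, d);
        }
        return new ArrayList<>(topPerUser.values());
    }

    public static void main(String[] args) {
        List<Doc> hits = Arrays.asList(
            new Doc("1", "kimchy", 4.2),
            new Doc("2", "kimchy", 3.9),
            new Doc("3", "elastic", 3.1));
        for (Doc d : collapseByUser(hits)) {
            System.out.println(d.user + " -> " + d.id);
        }
    }
}
```

Note that hits.total in the real response still counts every matching document, not the number of collapsed groups.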

You can also use the inner_hits option to expand each collapsed top hit.

 "collapse" : {
        "field" : "user", ①
        "inner_hits": {
            "name": "last_tweets", ②
            "size": 5, ③
            "sort": [{ "date": "asc" }] ④
        },
        "max_concurrent_group_searches": 4 ⑤
    }

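① Collapse the result set on the "user" field
② Name used for the inner hits section in the response
③ Number of inner hits to retrieve per collapse key
④ How to sort the documents within each group
⑤ Number of concurrent requests allowed to retrieve the inner_hits per group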
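The effect of name, size, and sort in inner_hits can be pictured as a per-group top-N. The following is a minimal in-memory sketch under that reading, not Elasticsearch's implementation; the Tweet class and innerHits method are hypothetical names, and the sort mirrors the ascending date sort in the request above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model for illustration; not part of any Elasticsearch API.
final class InnerHitsSketch {

    // Minimal document: the collapse key and a date as epoch milliseconds.
    static final class Tweet {
        final String user;
        final long date;

        Tweet(String user, long date) {
            this.user = user;
            this.date = date;
        }
    }

    // Groups matches by user, sorts each group by date ascending,
    // and keeps at most `size` documents per group.
    static Map<String, List<Tweet>> innerHits(List<Tweet> matches, int size) {
        Map<String, List<Tweet>> groups = new LinkedHashMap<>();
        for (Tweet t : matches) {
            groups.computeIfAbsent(t.user, k -> new ArrayList<>()).add(t);
        }
        for (Map.Entry<String, List<Tweet>> e : groups.entrySet()) {
            List<Tweet> sorted = new ArrayList<>(e.getValue());
            sorted.sort(Comparator.comparingLong((Tweet t) -> t.date));
            int keep = Math.min(size, sorted.size());
            e.setValue(new ArrayList<>(sorted.subList(0, keep)));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Tweet> matches = Arrays.asList(
            new Tweet("kimchy", 30L), new Tweet("kimchy", 10L),
            new Tweet("elastic", 20L), new Tweet("kimchy", 20L));
        innerHits(matches, 2).forEach((user, tweets) ->
            System.out.println(user + ": " + tweets.size() + " inner hits"));
    }
}
```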

Paginated queries with cardinality + collapse

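// Imports assume the 6.x high-level REST client; in 7.x, Cardinality is in org.elasticsearch.search.aggregations.metrics.
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.cardinality.Cardinality;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.collapse.CollapseBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;

        BoolQueryBuilder bq = esUtil.getBoolQueryBuilder(dto);
        bq.must(QueryBuilders.termQuery("name", "name"));
        SearchSourceBuilder ssb = new SearchSourceBuilder();
        ssb.query(bq)
            .from(0)   // offset of the first collapsed hit on this page
            .size(4)   // collapsed hits per page
            .sort(SortBuilders.fieldSort("lastDate").order(SortOrder.DESC))
            .collapse(new CollapseBuilder("name"))
            // The aggregation name must match the one read from the response.
            .aggregation(AggregationBuilders.cardinality("reCompanyName").field("name"));
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices(index);
        searchRequest.source(ssb);

        SearchResponse response = restHighLevelClient.search(searchRequest);
        // Distinct total (approximate number of collapsed groups)
        Cardinality cardinality = response.getAggregations().get("reCompanyName");
        long value = cardinality.getValue();
        // Total hits before collapsing (not deduplicated)
        SearchHits hits = response.getHits();
        long totalHits = hits.getTotalHits();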

Some of the details in this snippet need to be configured for your own setup.

Elasticsearch syntax:

{
  "from": 0,
  "size": 4,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "name": {
              "value": "name"
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "reCompanyName": {
      "cardinality": {
        "field": "name"
      }
    }
  },
  "collapse": {
    "field": "name"
  }
}

Response:

{
	"hits": {
		"total": 6,
		"max_score": null,
		"hits": [{
				"_index": "index",
				"_type": "type",
				"_id": "dSRgoHsBtpTqY8pwFiN8",
				"_score": null,
				"_source": {
					// document data
				},
				"fields": {
					"name": [
						"name"
					]
				}
			},
			{
				"_index": "index",
				"_type": "type",
				"_id": "cyRgoHsBtpTqY8pwFiN8",
				"_score": null,
				"_source": {
					// document data
				},
				"fields": {
					"name": [
						"name"
					]
				}
			},
			{
				"_index": "index",
				"_type": "type",
				"_id": "pSRgoHsBtpTqY8pwNCOT",
				"_score": null,
				"_source": {
					// document data
				},
				"fields": {
					"name": [
						"name"
					]
				}
			}
		]
	},
	"aggregations": {
		"reCompanyName": {
			"value": 3
		}
	}
}

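You can see that the total hit count is 6, that three collapsed documents are returned, and that the deduplicated count from the cardinality aggregation is 3.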
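For pagination, the page count should be derived from the distinct count rather than from hits.total, because from and size apply to collapsed results. The sketch below shows that arithmetic; the class and method names are hypothetical, and since cardinality is an approximate aggregation, the page count is an estimate when there are many distinct values.

```java
// Hypothetical helper for illustration; not part of any Elasticsearch API.
final class CollapsedPaging {

    // Value for "from": how many collapsed hits to skip for a 1-based page number.
    static int fromOffset(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    // Number of pages, using the cardinality value as the number of groups.
    static long totalPages(long distinctCount, int pageSize) {
        if (distinctCount < 0 || pageSize < 1) {
            throw new IllegalArgumentException("invalid distinctCount or pageSize");
        }
        return (distinctCount + pageSize - 1) / pageSize; // ceiling division
    }

    public static void main(String[] args) {
        // Values from the example above: 3 distinct names, page size 4.
        System.out.println(fromOffset(1, 4)); // prints 0
        System.out.println(totalPages(3, 4)); // prints 1
        System.out.println(totalPages(6, 4)); // prints 2, what hits.total would wrongly suggest
    }
}
```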
