A Detailed Look at the search type Property in Elasticsearch


梦回从前 · Published 2021-01-25 17:58:22

The problem search faces:

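Elasticsearch was built for distribution from day one. Distribution is a double-edged sword: its scalability and performance give Elasticsearch powerful, efficient processing, but they also bring the usual problems of distributed systems. Data has to be computed where it lives, spread across nodes or instances (the typical distributed principle of moving computation rather than moving data), and in some scenarios this makes processing comparatively awkward.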

An Elasticsearch search runs in two steps:

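1. The search request is sent to every shard on every node, and each shard runs its part of the computation locally. This step is called Scatter (similar to map in MapReduce).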

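2. The results of the first step are collected on a single node, which runs one more computation over the combined data to produce the final result. This step is called Gather (similar to reduce in MapReduce).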

An example makes this concrete:

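Computing the Sum over all the data poses no problem: each node computes its own partial sum, the partial sums are gathered on one node, and a final sum over them gives the result. Everything falls into place.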
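The two steps, applied to Sum, can be sketched in plain Java. This is a toy model under stated assumptions, not Elasticsearch code: each "shard" is assumed to be just a list of numbers, and the class and method names are hypothetical.

```java
import java.util.List;

// Toy scatter/gather model (hypothetical names, not Elasticsearch code).
// Each "shard" is represented as a plain list of numeric field values.
class ScatterGatherSum {

    // Scatter: each shard computes a partial sum over its own data only.
    static long shardSum(List<Long> shard) {
        long sum = 0;
        for (long v : shard) {
            sum += v;
        }
        return sum;
    }

    // Gather: the coordinating node adds up the partial sums.
    static long totalSum(List<List<Long>> shards) {
        long total = 0;
        for (List<Long> shard : shards) {
            total += shardSum(shard);
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Long>> shards = List.of(
                List.of(1L, 2L, 3L),
                List.of(10L, 20L),
                List.of(100L));
        System.out.println(totalSum(shards)); // prints 136
    }
}
```

Sum works this cleanly because the sum of the partial sums is exactly the global sum; each shard only needs to send back one number.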

Top N is considerably more troublesome. There are two ways to implement it:

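1. Do it the same way as Sum: take the top N on each shard, gather all of those results on one node, and take the top N again to get the final result. This corresponds to QUERY_AND_FETCH below. The advantage is that each shard holding data is contacted only once. The drawback is I/O: every shard returns N full documents, so the coordinating node receives up to N × (number of shards) documents, a multiple of what is actually needed that grows with the shard count.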
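A minimal sketch of this first approach, assuming each shard is an in-memory list of scored documents. The names (`QueryAndFetchSketch`, `Doc`, `search`) are hypothetical and this is not Elasticsearch's implementation; it only counts how many full documents cross the network.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of the QUERY_AND_FETCH idea (hypothetical names, not Elasticsearch code):
// every shard returns its top n FULL documents; the coordinator merges and keeps n.
class QueryAndFetchSketch {

    static final class Doc {
        final String id;
        final double score;
        final String source; // stands in for the full document body

        Doc(String id, double score, String source) {
            this.id = id;
            this.score = score;
            this.source = source;
        }
    }

    static final class Result {
        final List<Doc> hits;
        final int fullDocsTransferred; // full documents sent from shards to the coordinator

        Result(List<Doc> hits, int fullDocsTransferred) {
            this.hits = hits;
            this.fullDocsTransferred = fullDocsTransferred;
        }
    }

    static final Comparator<Doc> BY_SCORE_DESC =
            Comparator.comparingDouble((Doc d) -> d.score).reversed();

    // Sort by score (highest first) and keep at most n documents.
    static List<Doc> topN(List<Doc> docs, int n) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(BY_SCORE_DESC);
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    static Result search(List<List<Doc>> shards, int n) {
        List<Doc> merged = new ArrayList<>();
        int transferred = 0;
        for (List<Doc> shard : shards) {
            List<Doc> local = topN(shard, n); // scatter: local top n, bodies included
            transferred += local.size();
            merged.addAll(local);
        }
        return new Result(topN(merged, n), transferred); // gather: global top n
    }
}
```

With 2 shards and n = 2, both shards ship 2 full documents each (4 in total), yet only 2 survive the final merge: the waste grows linearly with the shard count.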

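2. On each shard, collect only the metadata for its top N (the minimum information needed to rank the top N, not the full documents). After gathering this on one node, compute which documents make up the final top N, then go back to the shards that hold them to fetch the actual documents. This corresponds to QUERY_THEN_FETCH below. The advantage is less I/O overhead and waste; the drawback is that the shards must be contacted twice to get the final result.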
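The second approach can be sketched the same way, again as a toy model with hypothetical names rather than Elasticsearch's real code. The query phase ships only (shard, docId, score) entries; the fetch phase then asks the owning shard for just the n winning documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy model of the QUERY_THEN_FETCH idea (hypothetical names, not Elasticsearch code).
class QueryThenFetchSketch {

    static final class Shard {
        final Map<String, Double> scores;  // docId -> relevance score for this query
        final Map<String, String> sources; // docId -> full document body

        Shard(Map<String, Double> scores, Map<String, String> sources) {
            this.scores = scores;
            this.sources = sources;
        }
    }

    // Lightweight query-phase hit: which shard, which doc, what score. No body.
    static final class ScoreEntry {
        final int shard;
        final String docId;
        final double score;

        ScoreEntry(int shard, String docId, double score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    static final class Result {
        final List<String> sources;
        final int entriesTransferred;  // lightweight entries sent in the query phase
        final int fullDocsTransferred; // full documents sent in the fetch phase

        Result(List<String> sources, int entriesTransferred, int fullDocsTransferred) {
            this.sources = sources;
            this.entriesTransferred = entriesTransferred;
            this.fullDocsTransferred = fullDocsTransferred;
        }
    }

    static final Comparator<ScoreEntry> BY_SCORE_DESC =
            Comparator.comparingDouble((ScoreEntry e) -> e.score).reversed();

    // Query phase on one shard: local top n as (shard, docId, score) entries only.
    static List<ScoreEntry> queryPhase(int shardIdx, Shard shard, int n) {
        List<ScoreEntry> entries = new ArrayList<>();
        for (Map.Entry<String, Double> e : shard.scores.entrySet()) {
            entries.add(new ScoreEntry(shardIdx, e.getKey(), e.getValue()));
        }
        entries.sort(BY_SCORE_DESC);
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    static Result search(List<Shard> shards, int n) {
        List<ScoreEntry> merged = new ArrayList<>();
        for (int i = 0; i < shards.size(); i++) {
            merged.addAll(queryPhase(i, shards.get(i), n)); // first round trip
        }
        int entriesTransferred = merged.size();
        merged.sort(BY_SCORE_DESC); // coordinator picks the global top n
        List<ScoreEntry> winners = merged.subList(0, Math.min(n, merged.size()));
        List<String> sources = new ArrayList<>();
        for (ScoreEntry w : winners) {
            sources.add(shards.get(w.shard).sources.get(w.docId)); // second round trip
        }
        return new Result(sources, entriesTransferred, sources.size());
    }
}
```

Compared with the previous approach, the number of full documents moved drops to exactly n, at the price of the second round trip.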

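Since neither approach wins in every case, and different scenarios call for different choices, Elasticsearch leaves the decision to the user, who picks based on the actual situation.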

Search Type definition

search type, as the name suggests, is the type of search Elasticsearch uses when executing a search. Since Elasticsearch was created there have been four search types in total:

DFS_QUERY_THEN_FETCH，QUERY_THEN_FETCH，DFS_QUERY_AND_FETCH，QUERY_AND_FETCH

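DFS_QUERY_AND_FETCH and QUERY_AND_FETCH could be used in search requests before version 5.3. From 5.3 through 6.x, only DFS_QUERY_THEN_FETCH and QUERY_THEN_FETCH are supported: DFS_QUERY_AND_FETCH was removed and QUERY_AND_FETCH was deprecated. The default search type is QUERY_THEN_FETCH. In 7.x, QUERY_AND_FETCH was removed as well. The relevant code lives in the SearchType enum in the elasticsearch package; the source is shown below (version 6.6.2):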

package org.elasticsearch.action.search;

/**
 * Search type represent the manner at which the search operation is executed.
 *
 *
 */
public enum SearchType {
    /**
     * Same as {@link #QUERY_THEN_FETCH}, except for an initial scatter phase which goes and computes the distributed
     * term frequencies for more accurate scoring.
     */
    DFS_QUERY_THEN_FETCH((byte) 0),
    /**
     * The query is executed against all shards, but only enough information is returned (not the document content).
     * The results are then sorted and ranked, and based on it, only the relevant shards are asked for the actual
     * document content. The return number of hits is exactly as specified in size, since they are the only ones that
     * are fetched. This is very handy when the index has a lot of shards (not replicas, shard id groups).
     */
    QUERY_THEN_FETCH((byte) 1),
    // 2 used to be DFS_QUERY_AND_FETCH

    /**
     * Only used for pre 5.3 request where this type is still needed
     */
    @Deprecated
    QUERY_AND_FETCH((byte) 3);

    /**
     * The default search type ({@link #QUERY_THEN_FETCH}.
     */
    public static final SearchType DEFAULT = QUERY_THEN_FETCH;

    /**
     * Non-deprecated types
     */
    public static final SearchType [] CURRENTLY_SUPPORTED = {QUERY_THEN_FETCH, DFS_QUERY_THEN_FETCH};

    private byte id;

    SearchType(byte id) {
        this.id = id;
    }

    /**
     * The internal id of the type.
     */
    public byte id() {
        return this.id;
    }

    /**
     * Constructs search type based on the internal id.
     */
    public static SearchType fromId(byte id) {
        if (id == 0) {
            return DFS_QUERY_THEN_FETCH;
        } else if (id == 1
            || id == 3) { // TODO this bwc layer can be removed once this is back-ported to 5.3 QUERY_AND_FETCH is removed now
            return QUERY_THEN_FETCH;
        } else {
            throw new IllegalArgumentException("No search type for [" + id + "]");
        }
    }

    /**
     * The a string representation search type to execute, defaults to {@link SearchType#DEFAULT}. Can be
     * one of "dfs_query_then_fetch"/"dfsQueryThenFetch", "dfs_query_and_fetch"/"dfsQueryAndFetch",
     * "query_then_fetch"/"queryThenFetch" and "query_and_fetch"/"queryAndFetch".
     */
    public static SearchType fromString(String searchType) {
        if (searchType == null) {
            return SearchType.DEFAULT;
        }
        if ("dfs_query_then_fetch".equals(searchType)) {
            return SearchType.DFS_QUERY_THEN_FETCH;
        } else if ("query_then_fetch".equals(searchType)) {
            return SearchType.QUERY_THEN_FETCH;
        } else {
            throw new IllegalArgumentException("No search type for [" + searchType + "]");
        }
    }
    
}

Based on the comments in the source, the search types work as follows:

1. query and fetch

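A query is sent to every shard of the index, and each shard returns the documents themselves together with their computed ranking information. In terms of round trips this is the fastest approach: compared with the types below, it queries each shard only once. However, the combined number of results returned by all shards can be up to n times the size the user asked for, where n is the number of shards.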

2. query then fetch (the default)

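If you do not specify a search type, this is the one used. It works in roughly two steps. First, a request goes to every shard, and each shard returns only "enough" information (presumably what is needed for sorting, ranking, and scoring), not the documents themselves. The results are then re-sorted and re-ranked by the scores the shards returned, and the top size documents are selected. Second, the relevant shards are asked for those documents. The number of documents returned equals the size the user requested.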

3. DFS query and fetch

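Compared with the first type, this adds an initial scatter phase. That phase computes the distributed term frequencies, gathering term statistics from all shards before the query runs, so that each shard scores documents with index-wide statistics instead of only its own local ones, which makes scoring more accurate.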
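Why the extra phase matters can be shown with a small calculation. The sketch below assumes the IDF form used by Lucene's BM25 similarity, idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), and a hypothetical index with two shards on which the same term is rare and common respectively. It is a simplified illustration of the idea, not Elasticsearch's DFS code; `DfsIdfSketch` and `ShardStats` are made-up names.

```java
import java.util.List;

// Simplified illustration of the DFS idea (hypothetical names, not Elasticsearch code).
class DfsIdfSketch {

    // IDF in the form used by Lucene's BM25 similarity.
    static double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Per-shard statistics for one term.
    static final class ShardStats {
        final long docFreq;  // documents on this shard containing the term
        final long docCount; // documents on this shard

        ShardStats(long docFreq, long docCount) {
            this.docFreq = docFreq;
            this.docCount = docCount;
        }
    }

    // DFS-style step: sum the per-shard statistics, then compute one global IDF.
    static double globalIdf(List<ShardStats> shards) {
        long df = 0;
        long n = 0;
        for (ShardStats s : shards) {
            df += s.docFreq;
            n += s.docCount;
        }
        return idf(df, n);
    }

    public static void main(String[] args) {
        ShardStats shardA = new ShardStats(1, 1000);   // term is rare on shard A
        ShardStats shardB = new ShardStats(500, 1000); // term is common on shard B
        System.out.printf("local idf on A: %.3f%n", idf(shardA.docFreq, shardA.docCount));
        System.out.printf("local idf on B: %.3f%n", idf(shardB.docFreq, shardB.docCount));
        System.out.printf("global idf:     %.3f%n", globalIdf(List.of(shardA, shardB)));
    }
}
```

Without the DFS phase, a match on shard A would weigh the term far more heavily than an identical match on shard B, so scores from the two shards are not directly comparable when they are merged; with global statistics, both shards use the same weight.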

4. DFS query then fetch

Compared with the second type, this adds the same initial scatter phase.

Search Type summary

Whatever exists has a reason to exist: each search type has scenarios where it is useful. In short:

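1. If a search hits only a few shards, QUERY_AND_FETCH should in theory be more efficient; if it hits many shards, QUERY_THEN_FETCH should in theory be more efficient. The difference comes down to the extra I/O on one side versus the extra round trip on the other. Where the crossover lies has to be found by testing against your actual cluster and data.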
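A back-of-the-envelope model of this trade-off, counting only bytes sent to the coordinating node, is sketched below. Every parameter (document size, entry size, size) is a hypothetical input rather than a measured Elasticsearch figure, and real costs also include the extra round trip, which this model does not price.

```java
// Toy cost model (hypothetical parameters, not measured Elasticsearch numbers).
class SearchTypeCostModel {

    // QUERY_AND_FETCH: every shard ships its top `size` full documents.
    static long queryAndFetchBytes(int shards, int size, long docBytes) {
        return (long) shards * size * docBytes;
    }

    // QUERY_THEN_FETCH: every shard ships `size` lightweight (id, score) entries,
    // then only `size` full documents are fetched in the second round trip.
    static long queryThenFetchBytes(int shards, int size, long docBytes, long entryBytes) {
        return (long) shards * size * entryBytes + (long) size * docBytes;
    }

    public static void main(String[] args) {
        int size = 10;
        long docBytes = 2_000; // assumed average document size
        long entryBytes = 20;  // assumed size of one (id, score) entry
        for (int shards : new int[] {1, 2, 5, 20}) {
            System.out.printf("shards=%d  QAF=%d bytes  QTF=%d bytes (+1 round trip)%n",
                    shards,
                    queryAndFetchBytes(shards, size, docBytes),
                    queryThenFetchBytes(shards, size, docBytes, entryBytes));
        }
    }
}
```

Under these assumptions, with one shard QUERY_THEN_FETCH moves 20,200 bytes against 20,000 and still pays an extra round trip, so QUERY_AND_FETCH wins; with five shards the figures are 21,000 against 100,000. The real crossover depends on the cluster, which is why it has to be measured.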

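2. Judging by how SearchType has evolved in recent versions, after Elasticsearch's optimizations the cost of the extra round trip has clearly become lower than the cost of the extra I/O. Accordingly, the default SearchType became QUERY_THEN_FETCH, and QUERY_AND_FETCH has been dropped in newer versions. As of current versions, QUERY_THEN_FETCH has won this efficiency contest, at least for now.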

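3. For the DFS variants, my recommendation is this: unless you have very high demands on the accuracy of full-text relevance scoring, just use DEFAULT. That covers every scenario where full-text matching accuracy matters little or full-text matching is not used at all.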