The ES pinyin plugin is installed and used like most other Elasticsearch plugins. In brief, the steps are:

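1. Download the plugin from https://github.com/medcl/elasticsearch-analysis-pinyin/releases. The plugin version must match your ES version, so pick the matching release. If none matches, download the latest version, open it in IDEA, change the ES version number to the one you are running, and package it with Maven.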

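2. Install the plugin in ES. My ES is deployed manually on a Linux server. The ES installation has a /plugins/ directory; create a new folder in it, /plugins/pinyin, and put the plugin there. Then restart ES and check the log to confirm that the plugin loaded. ES releases new versions quickly, so the plugin has to be adapted to your own ES version, otherwise the restart may fail.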

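3. Define and create your own index. This step is important. I integrated the pinyin plugin into an existing project, so the index had to support pinyin while staying compatible with the existing business scenarios. Two configurations of the pinyin plugin are listed here. Adjust them to your own business logic when you read them: the plugin is simple to set up, but making it fit your case takes repeated tuning.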
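The version check in steps 1 and 2 can be sketched as a small helper. This is only an illustration: it assumes the release archive is named like `elasticsearch-analysis-pinyin-<version>.zip`, and the function names are hypothetical.

```python
import re


def plugin_version(zip_name: str) -> str:
    """Extract the version from a file name such as
    elasticsearch-analysis-pinyin-7.10.2.zip (assumed naming scheme)."""
    match = re.fullmatch(
        r"elasticsearch-analysis-pinyin-(\d+(?:\.\d+)*)\.zip", zip_name
    )
    if match is None:
        raise ValueError(f"unrecognized plugin file name: {zip_name}")
    return match.group(1)


def matches_es(zip_name: str, es_version: str) -> bool:
    """Return True when the plugin archive targets exactly this ES version."""
    return plugin_version(zip_name) == es_version


if __name__ == "__main__":
    print(matches_es("elasticsearch-analysis-pinyin-7.10.2.zip", "7.10.2"))
```

If the check fails, rebuilding the plugin against your ES version (as in step 1) is the route described above.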

        First, the parameters of the pinyin plugin

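keep_first_letter: when enabled, e.g. 刘德华 > ldh. Default: true.
keep_separate_first_letter: when enabled, the first letters are kept as separate terms, e.g. 刘德华 > l,d,h. Default: false. Note: query results may be too fuzzy because these terms are very frequent.
limit_first_letter_length: maximum length of the first_letter result. Default: 16.
keep_full_pinyin: when enabled, e.g. 刘德华 > [liu,de,hua]. Default: true.
keep_joined_full_pinyin: when enabled, e.g. 刘德华 > [liudehua]. Default: false.
keep_none_chinese: keep non-Chinese letters or digits in the result. Default: true.
keep_none_chinese_together: keep non-Chinese letters together. Default: true, e.g. DJ音乐家 -> DJ,yin,yue,jia; when set to false, e.g. DJ音乐家 -> D,J,yin,yue,jia. Note: keep_none_chinese must be enabled first.
keep_none_chinese_in_first_letter: keep non-Chinese letters in the first-letter result, e.g. 刘德华AT2016 -> ldhat2016. Default: true.
keep_none_chinese_in_joined_full_pinyin: keep non-Chinese letters in the joined full pinyin, e.g. 刘德华2016 -> liudehua2016. Default: false.
none_chinese_pinyin_tokenize: break non-Chinese letters into separate pinyin terms if they are pinyin. Default: true, e.g. liudehuaalibaba13zhuanghan -> liu,de,hua,a,li,ba,ba,13,zhuang,han. Note: keep_none_chinese and keep_none_chinese_together must be enabled first.
keep_original: when enabled, the original input is kept as well. Default: false.
lowercase: lowercase non-Chinese letters. Default: true.
trim_whitespace: Default: true.
remove_duplicated_term: when enabled, duplicated terms are removed to save index space, e.g. de的 > de. Default: false. Note: position-related queries may be affected.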
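As a rough aid for the tuning described here, the defaults listed above can be collected in one place and overridden per use case. This is a minimal sketch: `PINYIN_DEFAULTS` and `pinyin_config` are hypothetical names, not part of the plugin, and the values are written as JSON booleans and numbers rather than the quoted strings ES prints back.

```python
import json

# Defaults as listed in the parameter descriptions above.
PINYIN_DEFAULTS = {
    "keep_first_letter": True,
    "keep_separate_first_letter": False,
    "limit_first_letter_length": 16,
    "keep_full_pinyin": True,
    "keep_joined_full_pinyin": False,
    "keep_none_chinese": True,
    "keep_none_chinese_together": True,
    "keep_none_chinese_in_first_letter": True,
    "keep_none_chinese_in_joined_full_pinyin": False,
    "none_chinese_pinyin_tokenize": True,
    "keep_original": False,
    "lowercase": True,
    "trim_whitespace": True,
    "remove_duplicated_term": False,
}


def pinyin_config(**overrides):
    """Build a pinyin tokenizer/filter definition from the defaults.

    Unknown parameter names are rejected so that typos surface early.
    """
    unknown = set(overrides) - set(PINYIN_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown pinyin parameters: {sorted(unknown)}")
    return {"type": "pinyin", **PINYIN_DEFAULTS, **overrides}


if __name__ == "__main__":
    # Joined full pinyin only, e.g. 刘德华 -> liudehua.
    config = pinyin_config(
        keep_joined_full_pinyin=True,
        keep_full_pinyin=False,
        keep_first_letter=False,
    )
    print(json.dumps(config, indent=2))
```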

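These values need to be tuned until they suit your case. Business scenario: used together with the IK tokenizer, 刘德华 -> liudehua, 刘德华 (the user's input is kept).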

"full_pinyin" : {
              "keep_joined_full_pinyin" : "true",
              "keep_none_chinese_in_first_letter" : "false",
              "lowercase" : "true",
              "none_chinese_pinyin_tokenize" : "false",
              "keep_none_chinese_in_joined_full_pinyin" : "true",
              "keep_original" : "true",
              "keep_first_letter" : "false",
              "keep_separate_first_letter" : "false",
              "type" : "pinyin",
              "keep_none_chinese" : "true",
              "limit_first_letter_length" : "16",
              "keep_full_pinyin" : "false"
            }

These values need to be tuned until they suit your case. Business scenario: used together with the IK tokenizer, 刘德华 -> liudehua.

"full_pinyin_letter" : {
              "keep_joined_full_pinyin" : "true",
              "keep_none_chinese_in_first_letter" : "false",
              "lowercase" : "true",
              "none_chinese_pinyin_tokenize" : "false",
              "keep_none_chinese_in_joined_full_pinyin" : "true",
              "keep_original" : "false",
              "keep_first_letter" : "false",
              "keep_separate_first_letter" : "false",
              "type" : "pinyin",
              "keep_none_chinese" : "true",
              "limit_first_letter_length" : "16",
              "keep_full_pinyin" : "false"
            }
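To make the relationship between the two definitions explicit, the sketch below copies both into Python dictionaries and prints the keys whose values differ. The helper name `diff_configs` is hypothetical.

```python
FULL_PINYIN = {
    "keep_joined_full_pinyin": "true",
    "keep_none_chinese_in_first_letter": "false",
    "lowercase": "true",
    "none_chinese_pinyin_tokenize": "false",
    "keep_none_chinese_in_joined_full_pinyin": "true",
    "keep_original": "true",
    "keep_first_letter": "false",
    "keep_separate_first_letter": "false",
    "type": "pinyin",
    "keep_none_chinese": "true",
    "limit_first_letter_length": "16",
    "keep_full_pinyin": "false",
}

FULL_PINYIN_LETTER = {
    "keep_joined_full_pinyin": "true",
    "keep_none_chinese_in_first_letter": "false",
    "lowercase": "true",
    "none_chinese_pinyin_tokenize": "false",
    "keep_none_chinese_in_joined_full_pinyin": "true",
    "keep_original": "false",
    "keep_first_letter": "false",
    "keep_separate_first_letter": "false",
    "type": "pinyin",
    "keep_none_chinese": "true",
    "limit_first_letter_length": "16",
    "keep_full_pinyin": "false",
}


def diff_configs(left, right):
    """Return {key: (left_value, right_value)} for every differing key."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys if left.get(k) != right.get(k)}


if __name__ == "__main__":
    print(diff_configs(FULL_PINYIN, FULL_PINYIN_LETTER))
    # prints {'keep_original': ('true', 'false')}
```

The only difference is keep_original: full_pinyin also emits the original Chinese input, while full_pinyin_letter emits the joined pinyin alone.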

Both configurations fit my current business scenario.

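There are many ways to implement pinyin search; here is my own implementation.

        Pinyin combined with the completion suggester for autocomplete

This version is implemented with an analyzer built on a pinyin tokenizer; the other version uses a filter.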

  "suggest_v1" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "blocks" : {
          "read_only_allow_delete" : "false"
        },
        "provided_name" : "suggest_v1",
        "max_result_window" : "1000000",
        "creation_date" : "164444429959",
        "analysis" : {
          "analyzer" : {
            "test_ik" : {
              "filter" : [
                "lowercase"
              ],
              "char_filter" : [
                "html_strip"
              ],
              "tokenizer" : "ik_max_word"
            },
            "test_ik_search" : {
              "filter" : [
                "lowercase"
              ],
              "char_filter" : [
                "html_strip"
              ],
              "tokenizer" : "ik_smart"
            },
            "pinyin_analyzer" : {
              "tokenizer" : "pinyin_letter"
            }
          },
          "tokenizer" : {
            "pinyin_letter" : {
              "keep_joined_full_pinyin" : "true",
              "keep_none_chinese_in_first_letter" : "false",
              "lowercase" : "true",
              "none_chinese_pinyin_tokenize" : "false",
              "keep_none_chinese_in_joined_full_pinyin" : "true",
              "keep_original" : "false",
              "keep_first_letter" : "false",
              "keep_separate_first_letter" : "false",
              "type" : "pinyin",
              "keep_none_chinese" : "true",
              "limit_first_letter_length" : "16",
              "keep_full_pinyin" : "false"
            }
          }
        }
      
      }
    }
  }
"mappings" : {
      "dynamic" : "false",
      "properties" : {
        "qqqq" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword"
            },
            "pinyin_suggest" : {
              "type" : "completion",
              "analyzer" : "pinyin_analyzer",
              "preserve_separators" : true,
              "preserve_position_increments" : true,
              "max_input_length" : 50
            }
          },
          "analyzer" : "test_ik",
          "search_analyzer" : "test_ik_search"
        },
        "aaaasearch" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword"
            },
            "pinyin_suggest" : {
              "type" : "completion",
              "analyzer" : "pinyin_analyzer",
              "preserve_separators" : true,
              "preserve_position_increments" : true,
              "max_input_length" : 50
            }
          },
          "analyzer" : "test_ik",
          "search_analyzer" : "test_ik_search"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword"
            },
            "pinyin_suggest" : {
              "type" : "completion",
              "analyzer" : "pinyin_analyzer",
              "preserve_separators" : true,
              "preserve_position_increments" : true,
              "max_input_length" : 50
            }
          },
          "analyzer" : "test_ik",
          "search_analyzer" : "test_ik_search"
        }
      }
    }
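The settings and mappings above look like output read back from an existing index. A request body for creating such an index could be assembled roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and `provided_name`, `creation_date` and `blocks` are left out on the assumption that they are values ES reports rather than ones you supply at creation time.

```python
import json

# Tokenizer definition copied from the settings above.
PINYIN_LETTER = {
    "type": "pinyin",
    "keep_joined_full_pinyin": "true",
    "keep_none_chinese_in_first_letter": "false",
    "lowercase": "true",
    "none_chinese_pinyin_tokenize": "false",
    "keep_none_chinese_in_joined_full_pinyin": "true",
    "keep_original": "false",
    "keep_first_letter": "false",
    "keep_separate_first_letter": "false",
    "keep_none_chinese": "true",
    "limit_first_letter_length": "16",
    "keep_full_pinyin": "false",
}


def ik_analyzer(tokenizer):
    """Custom analyzer: strip HTML, tokenize with IK, lowercase."""
    return {
        "tokenizer": tokenizer,
        "char_filter": ["html_strip"],
        "filter": ["lowercase"],
    }


def suggest_field():
    """Text field with a keyword sub-field and a pinyin completion sub-field."""
    return {
        "type": "text",
        "analyzer": "test_ik",
        "search_analyzer": "test_ik_search",
        "fields": {
            "keyword": {"type": "keyword"},
            "pinyin_suggest": {
                "type": "completion",
                "analyzer": "pinyin_analyzer",
                "preserve_separators": True,
                "preserve_position_increments": True,
                "max_input_length": 50,
            },
        },
    }


def build_index_body(field_names):
    """Assemble the body for a create-index request (PUT /<index name>)."""
    return {
        "settings": {
            "index": {
                "number_of_shards": "1",
                "max_result_window": "1000000",
                "analysis": {
                    "analyzer": {
                        "test_ik": ik_analyzer("ik_max_word"),
                        "test_ik_search": ik_analyzer("ik_smart"),
                        "pinyin_analyzer": {"tokenizer": "pinyin_letter"},
                    },
                    "tokenizer": {"pinyin_letter": PINYIN_LETTER},
                },
            }
        },
        "mappings": {
            "dynamic": "false",
            "properties": {name: suggest_field() for name in field_names},
        },
    }


if __name__ == "__main__":
    body = build_index_body(["qqqq", "aaaasearch", "title"])
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Generating the three identical field mappings from one helper keeps them in sync when a parameter such as max_input_length changes.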

Prefix matching with the completion suggester then gives you search suggestions.
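A prefix query against one of the completion sub-fields might look like the sketch below, which only builds the `_search` request body and reads suggestion texts out of a response. The suggestion name `pinyin_suggest`, the default field `title.pinyin_suggest` and the helper names are illustrative choices, and sending the request (for example with an HTTP client) is left out.

```python
import json

SUGGESTION_NAME = "pinyin_suggest"  # arbitrary label chosen for this sketch


def suggest_body(prefix, field="title.pinyin_suggest", size=5):
    """Body for POST /<index>/_search using the completion suggester."""
    return {
        "suggest": {
            SUGGESTION_NAME: {
                "prefix": prefix,
                "completion": {
                    "field": field,
                    "size": size,
                    "skip_duplicates": True,
                },
            }
        }
    }


def suggestion_texts(response):
    """Collect the suggested texts from a _search response dictionary."""
    texts = []
    for entry in response.get("suggest", {}).get(SUGGESTION_NAME, []):
        for option in entry.get("options", []):
            texts.append(option["text"])
    return texts


if __name__ == "__main__":
    # The user has typed the beginning of "liudehua".
    print(json.dumps(suggest_body("liud"), indent=2))
```

Because the pinyin_letter tokenizer emits the joined pinyin, a typed prefix such as liud would be expected to match documents whose title is 刘德华; verify this against your own data.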

The second approach uses a filter

"analysis" : {
          "filter" : {
            "full_pinyin" : {
              "keep_joined_full_pinyin" : "true",
              "keep_none_chinese_in_first_letter" : "false",
              "lowercase" : "true",
              "none_chinese_pinyin_tokenize" : "false",
              "keep_none_chinese_in_joined_full_pinyin" : "true",
              "keep_original" : "true",
              "keep_first_letter" : "false",
              "keep_separate_first_letter" : "false",
              "type" : "pinyin",
              "keep_none_chinese" : "true",
              "limit_first_letter_length" : "16",
              "keep_full_pinyin" : "false"
            }
          },
          "analyzer" : {
            "test_ik" : {
              "filter" : [
                "lowercase",
               
                "full_pinyin"
              ],
              "char_filter" : [
                "html_strip"
              ],
              "tokenizer" : "ik_max_word"
            },
            "test_ik_search" : {
              "filter" : [
                "lowercase",
                
                "full_pinyin"
              ],
              "char_filter" : [
                "html_strip"
              ],
              "tokenizer" : "ik_smart"
            }
          }
        }
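One way to check what an analyzer such as test_ik produces is the `_analyze` API. The sketch below only builds the request body and extracts token strings from a response dictionary; the helper names are hypothetical and the HTTP call itself is omitted.

```python
import json


def analyze_body(text, analyzer="test_ik"):
    """Body for GET /<index>/_analyze with a named analyzer of that index."""
    return {"analyzer": analyzer, "text": text}


def token_texts(response):
    """Return the token strings from an _analyze response, in order."""
    return [token["token"] for token in response.get("tokens", [])]


if __name__ == "__main__":
    print(json.dumps(analyze_body("刘德华"), ensure_ascii=False))
```

With keep_original set to true in the full_pinyin filter, the token list for 刘德华 should contain both the original text and its joined pinyin; the exact tokens depend on how IK splits the input, so check them on your own cluster.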

Both approaches are implemented together with IK.
