ElasticSearch集群搭建,集成中文分词,建立全文检索索引(笔记)
准备工作:1.三个虚拟机节点,安装centos6x2.根据客户端的jdk情况,准备elasticsearch版本3.对应版本jdk4.elasticSearch对应版本的中文分词插件5.对应版本的head插件6.不考虑kibana,所以直接考虑chrome的sense插件1.虚拟机每个节点建立elastic用户和组groupadd elasticuseradd -m -g ela...
文章目录
准备工作:
1.三个虚拟机节点,安装centos6x
2.根据客户端的jdk情况,准备elasticsearch版本
3.对应版本jdk
4.elasticSearch对应版本的中文分词插件
5.对应版本的head插件
6.不考虑kibana,所以直接考虑chrome的sense插件
1.虚拟机每个节点建立elastic用户和组
groupadd elastic
useradd -m -g elastic elastic
2.把安装包们上传到一个节点里
有多种方式:
a.通过scp命令(针对宿主机是macbook或者Linux系统电脑)
b.通过ftp上传
注:
1.系统是传统spingmvc,jdk是1.7,所以选择elasticsearch2.4版本
2.中文分词路径https://github.com/medcl/elasticsearch-analysis-ik/,里面注明了对应版本关系。安装方式选择release版直接放到/home/elastic/env/elasticsearch-2.4.0/plugins/ik下
3.jdk选择jdk-7u80-linux-x64.tar.gz
4.head插件路径https://github.com/mobz/elasticsearch-head,在5.0之前的版本,不需要用到nodejs环境,直接放到/home/elastic/env/elasticsearch-2.4.0/plugins/head下面就行;
5.0之后要安装nodejs
3.配置jdk环境
export PATH
export JAVA_HOME=/home/elastic/env/jdk1.7.0_80
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
4.配置elasticsearch配置文件
172.16.30.100节点
vi elasticsearch.yml(关键配置)
cluster.name: blog_cluster
node.name: node-1
path.data: /home/elastic/data/elastic
path.logs: /home/elastic/data/elastic/log
network.host: 172.16.30.100 //绑定本地地址
http.port: 9200 //设置访问端口
discovery.zen.ping.unicast.hosts: ["172.16.30.100", "172.16.30.101","172.16.30.102"] //配置节点发现地址,自己的ip也配上
其余节点调整node_name、network.host这两个就可以了。
5.启动es
./bin/elasticsearch -d
(后台启动)
调试阶段可以./bin/elasticsearch
,日志输出在当前页面,方便排查问题
每个节点都启动起来
6.检查节点情况
网页访问http://172.16.30.100:9200/_plugin/head/
7.检查中文分词
http://172.16.30.100:9200/blog_index/_analyze?analyzer=ik_max_word&text=测试中文分词
{
"tokens": [
{
"token": "测试",
"start_offset": 0,
"end_offset": 2,
"type": "CN_WORD",
"position": 0
},
{
"token": "中文",
"start_offset": 2,
"end_offset": 4,
"type": "CN_WORD",
"position": 1
},
{
"token": "分词",
"start_offset": 4,
"end_offset": 6,
"type": "CN_WORD",
"position": 2
},
{
"token": "词",
"start_offset": 5,
"end_offset": 6,
"type": "CN_WORD",
"position": 3
}
]
}
8.全文检索
8.1建立索引
PUT /blog_index
{
"settings": {
"refresh_interval": "5s",
"number_of_shards" : 3,
"number_of_replicas" : 2
},
"mappings": {
"blog": {
"dynamic": false,
"properties": {
"title": {
"type": "string",
"analyzer": "ik_max_word",
"fields": {
"cn": {
"type": "string",
"analyzer": "ik_max_word"
},
"en": {
"type": "string",
"analyzer": "english"
}
}
},
"content": {
"type": "string",
"analyzer": "ik_max_word"
},
"depcode": {
"type": "string",
"index": "not_analyzed"
},
"blog_type": {
"type": "string",
"index": "not_analyzed"
},
"create_time": {
"type": "date"
},
"blog_id": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
8.2批量插入测试数据
POST /blog_index/blog/_bulk
{ "index": {}}
{ "content": "硬盘对集群非常重要,特别是建索引多的情况。磁盘是一个服务器最慢的系统,对于写比较重的集群,磁盘很容易成为集群的瓶颈。","title": "周星驰最新电影","blog_id": "a001","create_time": "2018-07-01","depcode": "3302","blog_type": "电影类"}
{ "index": {}}
{ "content": "如果可以承担的器SSD盘,最好使用SSD盘。如果使用SSD,最好调整I/O调度算法。RAID0是加快速度的不错方法。统计一下","title": "自动调整存储带宽","blog_id": "a002","create_time": "2018-07-02","depcode": "3302","blog_type": "其他类"}
8.3全文检索
POST /blog_index/blog/_search
{
"query": {
"multi_match": {
"query": "系统",
"fields": ["title","content"]
}
},
"highlight": {
"fields": {
"title":{},
"content":{}
},
"pre_tags": [
"<em>"
],
"post_tags": [
"</em>"
]
}
}
结果如下:
{
"took": 119,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.09228377,
"hits": [
{
"_index": "blog_index",
"_type": "blog",
"_id": "AWXDay9KXenDqWqMO57c",
"_score": 0.09228377,
"_source": {
"content": "硬盘对集群非常重要,特别是建索引多的情况。磁盘是一个服务器最慢的系统,对于写比较重的集群,磁盘很容易成为集群的瓶颈。",
"title": "周星驰最新电影",
"blog_id": "a001",
"create_time": "2018-07-01",
"depcode": "3302",
"blog_type": "电影类"
},
"highlight": {
"content": [
"硬盘对集群非常重要,特别是建索引多的情况。磁盘是一个服务器最慢的<em>系统</em>,对于写比较重的集群,磁盘很容易成为集群的瓶颈。"
]
}
}
]
}
}
至此,搭建测试完成。
更多推荐
所有评论(0)