1. Introduction

Elasticsearch is a distributed, open-source search and analytics engine for all types of data: text, numbers, geospatial, structured, and unstructured. It can retrieve the data we need from massive datasets in sub-second time, whereas searching a MySQL table that has grown to millions of rows becomes slow.

Use cases:

1. Application search

2. Website search

3. Enterprise search

4. Log processing and analysis

5. Infrastructure metrics and container monitoring

6. Application performance monitoring

7. Geospatial data analysis and visualization

8. Security analytics and business analytics

MySQL, by contrast, handles persistent data management (CRUD).

2. Basic Concepts

1. Index: as a verb, it is like MySQL's INSERT, indexing (inserting) a document into the store; as a noun, it is the counterpart of a MySQL database.
2. Type: within an index you can define one or more types, similar to MySQL tables; documents of the same type are grouped together.
3. Document: a single record of some type, stored under an index as JSON; a document is like a row inside a MySQL table, and a field such as id is called a property, the counterpart of a column.

An ES cluster might hold one index per "database" (say, for google, microsoft, megacorp), each index containing types such as employee and product, and each type containing documents such as {id:1,name:张三} and {id:2,name:李四}, where each document is one record.

ES concept 2: the inverted index. When MySQL saves a record it uses a forward index; ES additionally maintains an inverted index table that maps each term to the records containing it:

Term   Records
红海   1,2,3,4,5
行动   1,2,3
特工   5

Each query result carries a relevance score: record 3 hit two of the query's terms, so it ranks higher.
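To see scoring in action, here is a minimal sketch for Kibana Dev Tools (the movies index and its two documents are made up for illustration):

PUT movies/movie/1
{ "title": "红海行动" }

PUT movies/movie/2
{ "title": "特工行动" }

GET movies/_search
{
  "query": { "match": { "title": "红海行动" } }
}
## Both documents contain 行动, but document 1 also matches 红海,
## so it returns with the higher _score.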

3. Basic Environment Setup

docker pull elasticsearch
docker pull kibana  ## Kibana, for visualizing and querying data
mkdir -p /mydata/elasticsearch/config  # mount outside the container for easy editing
mkdir -p /mydata/elasticsearch/data    # mount outside the container for easy editing
echo "http.host: 0.0.0.0" >> /mydata/elasticsearch/config/elasticsearch.yml  # note the space after the colon (YAML); lets any remote machine reach ES
cat elasticsearch.yml
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms64m -Xmx128m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:latest
# 9200 serves REST API requests; 9300 is the inter-node transport port in a cluster.
# discovery.type=single-node runs in single-node mode (no spaces around =).
# ES_JAVA_OPTS caps the heap; in real production deployments the heap is typically around 32 GB.
# The three -v flags mount the config file, the data directory, and the plugins directory.
# Check an image's version, e.g.:
docker image inspect elasticsearch:latest | grep -i version

Create the elasticsearch container [the complete command; it sometimes works and sometimes doesn't, apparently depending on memory: by default Elasticsearch grabs all the memory it can]:
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-e ES_JAVA_OPTS="-Xms512m -Xmx1024m" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch:latest
# (the config file must be mounted to /usr/share/elasticsearch/config/elasticsearch.yml, not the parent directory)
After creation the container dies within about two seconds. Try enlarging the heap?? [That didn't help; just allocate more resources to the VM and leave out the heap settings entirely:]
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
-e "discovery.type=single-node" \
-v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
-v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
-v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
-d elasticsearch

free -m  # check the virtual machine's memory usage



Start Kibana:
##docker run --name kibana -e ELASTICSEARCH_HOSTS=192.168.52.130:9200 -p 5601:5601 -d kibana:latest
docker run --name kibana -e ELASTICSEARCH_URL=http://192.168.52.130:9200 -p 5601:5601 -d kibana:latest

http://192.168.52.130:5601/app/kibana#?_g=()

/elastic/elasticsearch/master/docs/src/test/resources/accounts.json

-- Remove the container:
docker stop elasticsearch
docker rm elasticsearch
Check the error log:
docker logs elasticsearch
-- The mounted directories need read/write permission: give every user and every group full access;
chmod -R 777 /mydata/elasticsearch
Before:
ll
drwxr-xr-x
After:
ll
drwxrwxrwx
Once the elasticsearch image starts successfully, visit:
http://192.168.52.130:9200/
{
  "name" : "ZeFr4rR",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "Hg9BNYkUQCy650XP26sYkw",
  "version" : {
    "number" : "5.6.12",
    "build_hash" : "cfe3d9f",
    "build_date" : "2018-09-10T20:12:43.732Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.1"
  },
  "tagline" : "You Know, for Search"
}

Common DSL Statements

Query node-related information:
http://192.168.52.130:9200/_cat
=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/tasks
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/thread_pool/{thread_pools}
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
/_cat/templates

http://192.168.52.130:9200/_cat/nodes
	127.0.0.1 12 62 0 0.01 0.07 0.12 mdi * ZeFr4rR  [in a cluster, the node marked with * is the master]
http://192.168.52.130:9200/_cat/health -- check the cluster's health
	1647341114 10:45:14 elasticsearch green 1 1 0 0 0 0 0 0 - 100.0%  [the numbers in the middle describe the cluster's shards]
http://192.168.52.130:9200/_cat/master -- show the master node
	ZeFr4rRZQz-w3ZK80Vuv0Q 127.0.0.1 127.0.0.1 ZeFr4rR
http://192.168.52.130:9200/_cat/indices -- list all indices (the equivalent of show databases)
2. Indexing a Document (Save)
To save a document, specify the index and type it goes under, and which unique id it uses.
Save document 1 under type external in index customer:
With a PUT request the id is mandatory:
http://192.168.52.130:9200/customer/external/1
JSON body:
{
	"name":"pansd"
}
Response:
{
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 1,       -- increments by 1 on each new request
    "result": "created", -- becomes "updated" on subsequent requests
    "_shards": { -- shard info; covered with clusters later
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "created": true
}
Reading the response: every field prefixed with _ is metadata. With PUT, the first request creates the document; subsequent requests update it.
With a POST request, the id can be omitted:
{
    "_index": "customer",
    "_type": "external",
    "_id": "AX-NOVshrEj9CP_oy5eV", -- auto-generated id; each id-less request creates a brand-new document with a fresh unique id
    "_version": 1,
    "result": "created",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "created": true
}
-- if an id is supplied, POST behaves the same as PUT.
Querying in ES:
GET customer/external/1
http://192.168.52.130:9200/customer/external/1
{
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 1,
	"_seq_no": 1, -- not returned by this image's version; increments on every update; used for optimistic locking.
	"_primary_term": 1, -- not returned by this image's version; changes when the primary shard is reassigned, e.g. after a restart.
    "found": true,
    "_source": {
        "name": "pansd"
    }
}
When updating ES concurrently, include the optimistic-locking parameters (or a version number) for concurrency control.
For example:
PUT request: http://192.168.52.130:9200/customer/external/1?if_seq_no=1&if_primary_term=1
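A sketch of what the conflict looks like, on versions that return _seq_no/_primary_term and accept the if_seq_no parameter (roughly 6.7 and later); assume both clients have read the document at _seq_no=1:

# Client A updates first and succeeds; _seq_no becomes 2:
PUT customer/external/1?if_seq_no=1&if_primary_term=1
{ "name": "client-A" }

# Client B sends the same stale condition and is rejected with
# 409 Conflict (version_conflict_engine_exception), so it must
# re-read the document and retry:
PUT customer/external/1?if_seq_no=1&if_primary_term=1
{ "name": "client-B" }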

Updating with _update:
POST request: http://192.168.52.130:9200/customer/external/1/_update  (note: the path is id/_update, not id_update)
{
	"doc":{
		"name":"pshdhx"
	}
}
1. The first update changes the version; if the identical request is sent again, neither the version nor the lock sequence number changes. [In short: _update compares against the existing document and skips no-op writes; see the sketch below.]
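A sketch of that no-op behavior:

# First time: the name actually changes, so the response says
# "result": "updated" and _version increments.
POST customer/external/1/_update
{ "doc": { "name": "pshdhx" } }

# The exact same request again: ES compares against the stored _source,
# finds nothing changed, and answers "result": "noop"; _version and
# _seq_no stay put.
POST customer/external/1/_update
{ "doc": { "name": "pshdhx" } }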
Deleting a document:
DELETE customer/external/1
{
    "found": true,
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "_version": 2,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    }
}
Searching for it again:
{
    "_index": "customer",
    "_type": "external",
    "_id": "1",
    "found": false
}
DELETE customer  [deletes the index. ES cannot delete a single type by itself; the alternative is to clear out all the data under that type, which effectively deletes it. See the sketch below.]
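A sketch of that clearing approach using _delete_by_query (available as a core API since ES 5.0):

POST customer/external/_delete_by_query
{
  "query": { "match_all": {} }
}
## removes every document under the external type while leaving the index itself in place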

Bulk Import API
The bulk payload is not one valid JSON document, and sending it as raw text breaks on the newlines, so ordinary REST clients struggle with it; use Kibana instead.
1. It must be a POST request
2. To customer/external/_bulk
action
body
Concretely: every two lines form one operation. Bulk-insert two documents by running the following in Kibana's Dev Tools:
POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"pansd"}
{"index":{"_id":"2"}}
{"name":"pshdhx"}
Response:
{
  "took": 62,
  "errors": false,
  "items": [
    {
      "index": {
        "_index": "customer",
        "_type": "external",
        "_id": "1",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "created": true,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "customer",
        "_type": "external",
        "_id": "2",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "created": true,
        "status": 201
      }
    }
  ]
}

Complex DSL Statements

More complex bulk operations:
POST /_bulk
{"delete":{"_index":"website","_type":"blog","_id":"123"}}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"my first blog"}
{"index":{"_index":"website","_type":"blog"}}
{"title":"my second blog"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"my updated blog post"}}
Response:
{
  "took": 24,
  "errors": false,
  "items": [
    {
      "delete": {
        "found": true,
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "_version": 6,
        "result": "deleted",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 200
      }
    },
    {
      "create": {
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "_version": 7,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "created": true,
        "status": 201
      }
    },
    {
      "index": {
        "_index": "website",
        "_type": "blog",
        "_id": "AX-OHvjg4kOgKKwsJNF9",
        "_version": 1,
        "result": "created",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "created": true,
        "status": 201
      }
    },
    {
      "update": {
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "_version": 8,
        "result": "updated",
        "_shards": {
          "total": 2,
          "successful": 1,
          "failed": 0
        },
        "status": 200
      }
    }
  ]
}

Sample data for the bulk import: https://github.com/elastic/elasticsearch/blob/5.0/docs/src/test/resources/accounts.json
POST /bank/account/_bulk
.....

After the import succeeds, try the advanced queries below. Also make the containers restart automatically:
docker update ... --restart=always

Refer to the official documentation: https://www.elastic.co/guide/index.html
https://www.elastic.co/guide/en/elastic-stack-get-started/7.5/index.html
https://www.elastic.co/guide/en/elasticsearch/reference/7.5/getting-started-search.html

GET bank/_search?q=*&sort=account_number:asc
The response also provides the following information about the search request:

took – how long Elasticsearch took to run the query, in milliseconds
timed_out – whether the search request timed out
_shards – how many shards were searched, with a breakdown of how many succeeded, failed, or were skipped
max_score – the score of the most relevant document found
hits.total.value – how many matching documents were found
hits.sort – the document's sort position (when not sorting by relevance score)
hits._score – the document's relevance score (not applicable when using match_all)

"total": 1000, yet hits holds only 10 documents: results come back paginated.
## match_all query: return all documents
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number":{
      "order": "desc"
    }

    }

  ],
  "from": 0,
  "size": 20,
  "_source": ["balance","firstname"]
}

## match query: full-text (fuzzy) search on a field
GET /bank/_search
{
  "query": {
    "match": {
      "address": "mill lane"  ## any document containing mill OR lane is returned
    }
  }
}

## match_phrase: only documents containing the exact phrase mill lane are returned
GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}

must and must_not are hard requirements; documents that additionally satisfy should score higher:

GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "40" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ],
      "should": [
        {
          "match": {
            "address": "MILL"
          }
        }
      ]
    }
  }
}
## filter does not contribute to the relevance score; must does.
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
## multi-field match
GET bank/_search
{
  "query":{
    "multi_match": {
      "query": "mill",
      "fields": ["state","address"]
    }
  }
}

Use term to query a field's exact value; it is the opposite of match (match is for full-text search):

GET bank/_search
{
  "query":{
    "term": {
      "age": {
        "value": "20"
      }
    }
  }
}
## exact-value query against the keyword sub-field (matches the whole stored value, case-sensitively)
GET bank/_search
{
  "query":{
    "match": {
      "address.keyword": "MILL"
    }
  }
}
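A caveat worth sketching: the keyword sub-field stores the entire original string, so it only matches the complete value, case-sensitively (the addresses below come from the sample accounts data):

# matches: "990 Mill Road" is the complete stored value
GET bank/_search
{ "query": { "match": { "address.keyword": "990 Mill Road" } } }

# no hits: "990 Mill" is only a fragment of the stored value
GET bank/_search
{ "query": { "match": { "address.keyword": "990 Mill" } } }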

Aggregations

Elasticsearch is, famously, a search and analytics engine; what follows is the analytics side.
Analyze results with aggregations
Search for everyone whose address contains mill, and compute their age distribution and average age (plus the average balance):
GET bank/_search
{
	"query":{
		"match":{
			"address":"mill"
		}
	},
	"aggs":{
	  "ageAgg":{
	    "terms": {
	      "field": "age",
	      "size": 10
	    }
	  },
	  "ageAvg":{
	    "avg": {
	      "field": "age"
	    }
	  },
	  "balanceAvg":{
	    "avg": {
	      "field": "balance"
	    }
	  }
	},
	## size 0 hides the hits themselves and returns only the aggregation results
	"size":0
}

Response (abridged):

  "aggregations": {
    "ageAgg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 38,
          "doc_count": 2
        },
        {
          "key": 28,
          "doc_count": 1
        },
        {
          "key": 32,
          "doc_count": 1
        }
      ]
    }
  }

## aggregate by age, and within each age bucket compute those people's average balance (a sub-aggregation)
GET bank/_search
{
  "query":{
    "match_all": {}
  },
  "aggs":{
    "ageAgg":{
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs":{
        "balanceAvg":{
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
Response (abridged):
"aggregations": {
    "ageAgg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 463,
      "buckets": [
        {
          "key": 31,
          "doc_count": 61,
          "balanceAvg": {
            "value": 28312.918032786885
          }
        },
        {
          "key": 39,
          "doc_count": 60,
          "balanceAvg": {
            "value": 25269.583333333332
          }
        },
## get the full age distribution, with the average balance per gender inside each age bucket, plus each bucket's overall average balance
GET bank/_search
{
  "query":{
    "match_all": {}
  },
  "aggs":{
    "ageAgg":{
      "terms": {
        "field": "age",
        "size": 10
      },
      "aggs": {
        "genderAgg": {
          "terms": {
            "field": "gender.keyword",
            "size": 10
          },
          "aggs": {
            "balanceAgg": {
              "avg": {
                "field": "balance"
              }
            }
          }
        },
        "balanceAvgAll":{
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
Response (abridged):
"buckets": [   -- 61 people aged 31: 35 men with average balance 29565.628571428573, 26 women with average balance 26626.576923076922
        {
          "key": 31,
          "doc_count": 61,
          "genderAgg": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "M",
                "doc_count": 35,
                "balanceAgg": {
                  "value": 29565.628571428573
                }
              },
              {
                "key": "F",
                "doc_count": 26,
                "balanceAgg": {
                  "value": 26626.576923076922
                }
              }
            ]
          },
          "balanceAvgAll": {
            "value": 28312.918032786885
          }
        },

Mapping

Elasticsearch mappings: 7.x removed types, and 8.x is expected to drop type support entirely. The motivation is to prevent same-named fields in different types of one index from colliding.

GET bank/_mapping

ES field data types: https://www.elastic.co/guide/en/elasticsearch/reference/7.5/mapping-types.html
string
	text and keyword
Numeric
	long, integer, short, byte, double, float, half_float, scaled_float
Date
	date
Date nanoseconds
	date_nanos
Boolean
	boolean
Binary
	binary
Range
	integer_range, float_range, long_range, double_range, date_range

There are also complex data types.

A caveat about mappings: they vary annoyingly between versions, and examples from one version's docs often don't line up with another's. The multi-type examples below use the older syntax from the docs' discussion of removing mapping types, ending with the recommended workaround of a custom type field.
GET bank/_mapping


PUT my_index
{
  "mappings": {
    "user": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "title": {
          "type": "text"
        },
        "name": {
          "type": "text"
        },
        "age": {
          "type": "integer"
        }
      }
    }
  }
}
PUT twitter
{
  "mappings": {
    "user": {
      "properties": {
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" }
      }
    },
    "tweet": {
      "properties": {
        "content": { "type": "text" },
        "user_name": { "type": "keyword" },
        "tweeted_at": { "type": "date" }
      }
    }
  }
}
PUT twitter/user/kimchy
{
  "name": "Shay Banon",
  "user_name": "kimchy",
  "email": "shay@kimchy.com"
}
PUT twitter/tweet/1
{
  "user_name": "kimchy",
  "tweeted_at": "2017-10-24T09:00:00Z",
  "content": "Types are going away"
}
GET twitter/tweet/_search
{
  "query": {
    "match": {
      "user_name": "kimchy"
    }
  }
}
PUT twitter
{
  "mappings": {
    "doc": {
      "properties": {
        "type": { "type": "keyword" },
        "name": { "type": "text" },
        "user_name": { "type": "keyword" },
        "email": { "type": "keyword" },
        "content": { "type": "text" },
        "tweeted_at": { "type": "date" }
      }
    }
  }

}

Modifying ES Mappings and Migrating Data

##create an index and specify its mapping rules
PUT /my_index
{
  "mappings": {
    "properties": {
      "age":{"type": "integer"},
      "email":{"type": "keyword"},
      "name":{"type": "text"}
    }
  }
}
##modify the index's mapping rules [adding new fields only]
PUT /my_index/_mapping
{
  "properties":{
    "employee-id":{"type":"keyword","index":false}
  }
}
GET /my_index/_mapping
##Fields that already exist in a mapping cannot be updated, only new fields added, because data may already be indexed under the old mapping; instead, we
create a new index and reindex the original data into it. [Data migration]
##create a new index to replace the old index's mapping types; index names must be lowercase (newbank, not newBank); fill in properties as in the sketch below
PUT /newbank
{
  "mappings": {
    "properties": {
      
    }
  }
}
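A sketch of what the filled-in mapping might look like, assuming the fields from the sample accounts.json data:

PUT /newbank
{
  "mappings": {
    "properties": {
      "account_number": { "type": "long" },
      "balance":        { "type": "long" },
      "firstname":      { "type": "text" },
      "lastname":       { "type": "text" },
      "age":            { "type": "integer" },
      "gender":         { "type": "keyword" },
      "address":        { "type": "text" },
      "employer":       { "type": "keyword" },
      "email":          { "type": "keyword" },
      "city":           { "type": "keyword" },
      "state":          { "type": "keyword" }
    }
  }
}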
##migrate the data
POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "newbank"
  }
}
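To verify the migration, the new index's document count should match the old one:

GET newbank/_search
## hits.total should equal the old bank index's count; in 7.x the migrated
## documents fall under the default _doc type
GET /_cat/indices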

4. ES Analyzers

How ES analyzes text:
A tokenizer receives a character stream, splits it into independent tokens (usually individual words), and outputs a token stream.
For example, the whitespace tokenizer splits text on whitespace characters: it turns the text "Quick brown fox" into [Quick, brown, fox].
The tokenizer is also responsible for recording each term's order or position (used for phrase queries and word-proximity queries), and the start and end character offsets of the original
word each term represents (used to highlight matched content).
ES provides many built-in tokenizers that can be used to build custom analyzers.
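For example, a custom analyzer is declared in an index's settings by pairing a tokenizer with token filters. A minimal sketch (the index name my_custom_index is made up):

PUT my_custom_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  }
}

POST my_custom_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "The quick Brown FOX"
}
## → [the, quick, brown, fox]: split on whitespace, then lowercased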

4.1. Installing the IK Analyzer

IK analyzer build matching ES 7.4.2:
https://pan.baidu.com/s/1DpWUZEFicNiYOJDCSDidVA (extraction code: pshd)


POST _analyze
{
  "analyzer": "whitespace",
  "text":     "The quick brown fox."
}
Install the Chinese IK analyzer: download the GitHub release into the plugins directory.
Enter the container:
docker exec -it elasticsearch /bin/bash
pwd
ls
exit  # leave the container
First install wget (the downloader) via yum:
yum install wget
Allow root login over SSH:
vi /etc/ssh/sshd_config
set: PasswordAuthentication yes
Restart the service: service sshd restart
1. The IK version must match your ES version;
2. Put it under the plugins directory;
3. Restart the container;
4. Test the tokenizer:
POST _analyze
{
  "analyzer": "ik_smart", ## or ik_max_word for finer-grained tokens
  "text":     "尚硅谷电商项目"
}
Response:
{
  "tokens": [
    {
      "token": "尚",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "硅谷",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "电",
      "start_offset": 3,
      "end_offset": 4,
      "type": "CN_CHAR",
      "position": 2
    },
    {
      "token": "商",
      "start_offset": 4,
      "end_offset": 5,
      "type": "CN_CHAR",
      "position": 3
    },
    {
      "token": "项目",
      "start_offset": 5,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 4
    }
  ]
}
Some words the tokenizer does not recognize, so a custom dictionary is needed:
1. Set up the network interface first:
add GATEWAY=xxxxx.1
DNS1=114.114.114.114
DNS2=8.8.8.8
(Recall: once match is used there is a max_score; querying numeric fields is exact matching, while querying strings is fuzzy matching [full-text search].)
On the custom dictionary:

4.2. Installing nginx

Install nginx and serve the dictionary file from it:
docker container cp nginx:/etc/nginx .  # copy the default config out of an nginx container
mv nginx conf   # rename the folder
mv conf nginx/  # move conf into the nginx folder
docker stop nginx;
docker rm nginx;
docker run -p 80:80 --name nginx \
-v /mydata/nginx/conf/html:/usr/share/nginx/html \
-v /mydata/nginx/conf/logs:/var/log/nginx \
-v /mydata/nginx/conf:/etc/nginx \
-d nginx:1.10
docker run -p80:80 --name nginx -v/mydata/nginx/html:/usr/share/nginx/html -v/mydata/nginx/logs:/var/log/nginx -v /mydata/nginx/conf:/etc/nginx -d nginx:1.10 ## [this is the one that works]
Create a custom fenci.txt under nginx's html directory (served at /es/fenci.txt).

Point the IK analyzer's dictionary at http://192.168.52.130/es/fenci.txt (the remote_ext_dict entry in the plugin's IKAnalyzer.cfg.xml), then restart the container.
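Then re-run the analyzer to confirm the custom word survives as a single token (assuming, say, 尚硅谷 was added to fenci.txt):

POST _analyze
{
  "analyzer": "ik_max_word",
  "text": "尚硅谷电商项目"
}
## 尚硅谷 should now come back as one token instead of 尚 + 硅谷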

 

5. Interacting with Elasticsearch from Java

5.1. pom.xml

<elasticsearch.version>7.4.2</elasticsearch.version>  <!-- spell the property exactly like this so it overrides Spring Boot's managed ES version -->
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.4.2</version>
</dependency>

5.2. Configuration class

package com.pshdhx.elasticsearch.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * @author pshdhx
 * @date 2022-03-28 15:09
 */
@Configuration
public class ElasticSearchConfig { // named to match the import used in the test class below

    public static final RequestOptions COMMON_OPTIONS;
    static {
        RequestOptions.Builder builder = RequestOptions.DEFAULT.toBuilder();
        //addHeader();
        //setHttpAsyncResponseConsumerFactory
        COMMON_OPTIONS = builder.build();
    }

    @Bean
    public RestHighLevelClient EsRestClient(){
        RestClientBuilder restClientBuilder = RestClient.builder(new HttpHost("82.xx.xx.xxxx", 9200, "http"));
        RestHighLevelClient client = new RestHighLevelClient(restClientBuilder);
        return client;
    }
}

5.3. Test class

package com.pshdhx.elasticsearch;

import com.alibaba.fastjson.JSON;
import com.pshdhx.elasticsearch.config.ElasticSearchConfig;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.ToString;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.aggregations.Aggregation;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.metrics.Avg;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

import javax.naming.directory.SearchResult;
import java.io.IOException;
import java.util.Map;


@SpringBootTest
class GuilimallEsApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    void testSearchEs() throws IOException {
        //1. Build the search request
        SearchRequest searchRequest = new SearchRequest();
        // specify the index
        searchRequest.indices("bank");
        // build the DSL search conditions
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.matchQuery("address","mill"));

        // try the aggregations: terms on age, avg on balance
        sourceBuilder.aggregation(AggregationBuilders.terms("ageAgg").field("age").size(10));
        sourceBuilder.aggregation(AggregationBuilders.avg("balanceAvg").field("balance"));

        System.out.println(sourceBuilder.toString());
        searchRequest.source(sourceBuilder);

        //2. Execute the search
        SearchResponse response = restHighLevelClient.search(searchRequest, ElasticSearchConfig.COMMON_OPTIONS);

        System.out.println(response.toString());
        Map map = JSON.parseObject(response.toString(), Map.class);

        SearchHits hits = response.getHits();
        SearchHit[] hits1 = hits.getHits();
        for (SearchHit hit : hits1) {
            String index = hit.getIndex();
            //Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            String sourceAsString = hit.getSourceAsString();
            Account account = JSON.parseObject(sourceAsString, Account.class);
            System.out.println("Account:"+account.toString());
        }

        // read back the aggregation results
        Aggregations aggregations = response.getAggregations();
        Terms ageAgg1 = aggregations.get("ageAgg");
        for (Terms.Bucket bucket : ageAgg1.getBuckets()) {
            String keyAsString = bucket.getKeyAsString();
            System.out.println("年龄"+keyAsString+"===》"+bucket.getDocCount());
        }
        Avg balanceAvg1 = aggregations.get("balanceAvg");
        System.out.println("平均薪资:"+balanceAvg1.getValue());
    }

    @Test
    void contextLoads() {
        System.out.println("111");
        System.out.println(restHighLevelClient);
    }

    @Test
    public void indexData() throws IOException {
        IndexRequest indexRequest = new IndexRequest("users");
        indexRequest.id("1");
//        indexRequest.source("userName","zhangsan","age",18,"gender","男");
        User user = new User("pshdhx",18,"man");
        String jsonString = JSON.toJSONString(user);
        IndexRequest index = indexRequest.source(jsonString, XContentType.JSON);
        // perform the index operation
        restHighLevelClient.index(indexRequest, ElasticSearchConfig.COMMON_OPTIONS);
        System.out.println(index);
    }


    @Data
    @AllArgsConstructor
    class User{
        private String userName;
        private Integer age;
        private String gender;
    }

    @Data
    @ToString
    static class Account {
        private int account_number;
        private int balance;
        private String firstname;
        private String lastname;
        private int age;
        private String gender;
        private String address;
        private String employer;
        private String email;
        private String city;
        private String state;
    }

}

Output

{"query":{"match":{"address":{"query":"mill","operator":"OR","prefix_length":0,"max_expansions":50,"fuzzy_transpositions":true,"lenient":false,"zero_terms_query":"NONE","auto_generate_synonyms_phrase_query":true,"boost":1.0}}},"aggregations":{"ageAgg":{"terms":{"field":"age","size":10,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]}},"balanceAvg":{"avg":{"field":"balance"}}}}
{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":4,"relation":"eq"},"max_score":5.4032025,"hits":[{"_index":"bank","_type":"_doc","_id":"970","_score":5.4032025,"_source":{"account_number":970,"balance":19648,"firstname":"Forbes","lastname":"Wallace","age":28,"gender":"M","address":"990 Mill Road","employer":"Pheast","email":"forbeswallace@pheast.com","city":"Lopezo","state":"AK"}},{"_index":"bank","_type":"_doc","_id":"136","_score":5.4032025,"_source":{"account_number":136,"balance":45801,"firstname":"Winnie","lastname":"Holland","age":38,"gender":"M","address":"198 Mill Lane","employer":"Neteria","email":"winnieholland@neteria.com","city":"Urie","state":"IL"}},{"_index":"bank","_type":"_doc","_id":"345","_score":5.4032025,"_source":{"account_number":345,"balance":9812,"firstname":"Parker","lastname":"Hines","age":38,"gender":"M","address":"715 Mill Avenue","employer":"Baluba","email":"parkerhines@baluba.com","city":"Blackgum","state":"KY"}},{"_index":"bank","_type":"_doc","_id":"472","_score":5.4032025,"_source":{"account_number":472,"balance":25571,"firstname":"Lee","lastname":"Long","age":32,"gender":"F","address":"288 Mill Street","employer":"Comverges","email":"leelong@comverges.com","city":"Movico","state":"MT"}}]},"aggregations":{"lterms#ageAgg":{"doc_count_error_upper_bound":0,"sum_other_doc_count":0,"buckets":[{"key":38,"doc_count":2},{"key":28,"doc_count":1},{"key":32,"doc_count":1}]},"avg#balanceAvg":{"value":25208.0}}}
Account:GuilimallEsApplicationTests.Account(account_number=970, balance=19648, firstname=Forbes, lastname=Wallace, age=28, gender=M, address=990 Mill Road, employer=Pheast, email=forbeswallace@pheast.com, city=Lopezo, state=AK)
Account:GuilimallEsApplicationTests.Account(account_number=136, balance=45801, firstname=Winnie, lastname=Holland, age=38, gender=M, address=198 Mill Lane, employer=Neteria, email=winnieholland@neteria.com, city=Urie, state=IL)
Account:GuilimallEsApplicationTests.Account(account_number=345, balance=9812, firstname=Parker, lastname=Hines, age=38, gender=M, address=715 Mill Avenue, employer=Baluba, email=parkerhines@baluba.com, city=Blackgum, state=KY)
Account:GuilimallEsApplicationTests.Account(account_number=472, balance=25571, firstname=Lee, lastname=Long, age=32, gender=F, address=288 Mill Street, employer=Comverges, email=leelong@comverges.com, city=Movico, state=MT)
age 38 ===> 2
age 28 ===> 1
age 32 ===> 1
average balance: 25208.0
