Elasticsearch 7 Study Notes: Mapping
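Background
A mapping in ES is similar to a schema in a database. It defines the names of the fields in an index, their data types, and the settings related to the fields and their inverted indexes.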
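Inverted index
Before studying mappings in ES, let's first look at the inverted index in ES.
Definition
An inverted index maps each term to the IDs of the documents that contain it. It is the reverse of a forward index, which maps each document ID to the terms in that document.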
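Core components
An inverted index consists of two parts:
Term dictionary: records every term that appears in the documents, with a pointer to that term's postings list. It is typically stored in a B+ tree or in a hash table with chaining.
Postings list: made up of postings. Each posting records the document ID, the term frequency (TF), the positions (where the term occurs in the document's token stream, used for phrase queries), and the offsets (the term's start and end character offsets, used for highlighting). For example, the postings list of the term Elastic holds one such entry for every document that contains Elastic.
Every field in an ES JSON document has its own inverted index. You can configure some fields not to be indexed. This saves disk space, but those fields can then no longer be searched.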
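The two components above can be sketched as a toy model in plain Python. This is not how Lucene actually stores its data. The outer dict stands in for the term dictionary (a hash table). Each inner dict is a postings list that holds tf, positions and offsets per document. The analyzer is a hypothetical simplification that splits on word characters and lowercases.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Toy analyzer: split on word characters and lowercase.
    Yields (term, position, start_offset, end_offset)."""
    for position, m in enumerate(re.finditer(r"\w+", text)):
        yield m.group().lower(), position, m.start(), m.end()

def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns {term: {doc_id: posting}}.
    The outer dict plays the term dictionary; each inner dict is a postings list."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, position, start, end in tokenize(text):
            posting = index[term].setdefault(
                doc_id, {"tf": 0, "positions": [], "offsets": []})
            posting["tf"] += 1
            posting["positions"].append(position)
            posting["offsets"].append((start, end))
    return dict(index)

docs = {1: "Elastic search is elastic", 2: "Kibana and Elastic"}
inverted = build_inverted_index(docs)
print(inverted["elastic"])
# {1: {'tf': 2, 'positions': [0, 3], 'offsets': [(0, 7), (18, 25)]}, 2: {'tf': 1, 'positions': [2], 'offsets': [(11, 18)]}}
```

Looking up a term is a single dictionary access. A phrase query could then compare consecutive positions, and highlighting would use the offsets.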
Data types in ES
Simple types: Text/Keyword, Date, Integer/Floating, Boolean, IPv4 & IPv6
Complex types: object and nested
Special types: geo_point & geo_shape / percolator
DynamicMapping
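When a document is written to an index that does not exist yet, ES creates the index automatically. Fields not yet in the mapping are added to it automatically, so we don't have to define the mapping by hand. By default, JSON values are mapped to ES types as follows:
null: no field is added
true / false: boolean
floating-point number: float
integer: long
object: object
array: depends on the first non-null value in the array
string: date if it passes date detection, a numeric type (for example long) if it passes numeric detection, otherwise text with a keyword sub-field
Index a document, then look at the resulting mapping: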
PUT mapping_test/_doc/1
{
"first_name":"song",
"last_name": "ye",
"login_date": "2020-07-11T16:01:48.182Z"
}
GET mapping_test/_mapping
The output is:
{
"mapping_test" : {
"mappings" : {
"properties" : {
"first_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"last_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"login_date" : {
"type" : "date"
}
}
}
}
}
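The login_date field was automatically mapped as date. The string fields were mapped as text with a keyword sub-field.
A boolean value wrapped in quotes is a JSON string, so by default it is mapped as text. Only an unquoted true/false is mapped as boolean: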
PUT mapping_test/_doc/1
{
"first_name":"song",
"last_name": "ye",
"login_date": "2020-07-11T16:01:48.182Z",
"is_vip": "true",
"is_admin": true
}
GET mapping_test/_mapping
The output is:
{
"mapping_test": {
"mappings": {
"properties": {
"first_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"is_admin": {
"type": "boolean"
},
"is_vip": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"login_date": {
"type": "date"
}
}
}
}
}
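The type inference seen in these two outputs can be approximated in a few lines. This is a simplified model, not ES's implementation. The date check uses regexes standing in for the default date formats. Numeric detection only recognizes integer strings. date_detection and numeric_detection are real ES mapping parameters, mirrored here as keyword arguments with their ES defaults (true and false).

```python
import copy
import re

# Simplified stand-ins for the default dynamic date formats
# (strict_date_optional_time and yyyy/MM/dd); NOT the real ES date parser.
DATE_PATTERNS = [
    re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?$"),
    re.compile(r"^\d{4}/\d{2}/\d{2}( \d{2}:\d{2}:\d{2})?$"),
]
INTEGER_PATTERN = re.compile(r"^[+-]?\d+$")  # only integer strings are modeled

TEXT_WITH_KEYWORD = {
    "type": "text",
    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
}

def infer_field(value, date_detection=True, numeric_detection=False):
    """Roughly infer the dynamic mapping of one JSON value (None = no field added)."""
    if value is None:
        return None
    if isinstance(value, bool):  # must precede int: bool subclasses int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, list):  # arrays take the type of the first non-null element
        first = next((v for v in value if v is not None), None)
        return infer_field(first, date_detection, numeric_detection)
    if isinstance(value, dict):
        return {"properties": infer_properties(value, date_detection, numeric_detection)}
    if isinstance(value, str):
        if date_detection and any(p.match(value) for p in DATE_PATTERNS):
            return {"type": "date"}
        if numeric_detection and INTEGER_PATTERN.match(value):
            return {"type": "long"}
        return copy.deepcopy(TEXT_WITH_KEYWORD)
    raise TypeError(f"not a JSON value: {value!r}")

def infer_properties(doc, date_detection=True, numeric_detection=False):
    """Infer the "properties" section for a whole JSON object."""
    props = {}
    for name, value in doc.items():
        field = infer_field(value, date_detection, numeric_detection)
        if field is not None:
            props[name] = field
    return props

props = infer_properties({"login_date": "2020-07-11T16:01:48.182Z",
                          "is_vip": "true", "is_admin": True})
print(props["login_date"], props["is_vip"]["type"], props["is_admin"])
# {'type': 'date'} text {'type': 'boolean'}
```

The quoted "true" falls through to the string branch and becomes text. Only the unquoted JSON true reaches the boolean branch.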
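Can a field's type in the mapping be changed?
For new fields: if dynamic is true, the mapping is updated as new fields are added. If dynamic is false, the mapping is not updated and the new field is not indexed, but its value still appears in _source. If dynamic is strict, the document write fails.
For existing fields: once data has been written, changing the field's type in the mapping is no longer supported, because an inverted index built by Lucene cannot be modified after it has been generated.
To change a field's type, you must create a new index and rebuild it with the Reindex API.
The reason is simple: changing a field's type would make the data already indexed for that field unsearchable. Adding a new field has no such effect.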
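dynamic set to false
The following example sets dynamic to false. The new field f1 is then not indexed (its value is still kept in _source), so searching on it returns nothing: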
PUT mapping_test/_mapping
{
"dynamic": false
}
PUT mapping_test/_doc/2
{
"f1": "v1"
}
POST mapping_test/_search
{
"query": {
"match": {
"f1": "v1"
}
}
}
The output shows no hits:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
dynamic set to strict
The following example sets dynamic to strict and then adds a new field:
PUT mapping_test/_mapping
{
"dynamic": "strict"
}
PUT mapping_test/_doc/2
{
"f2":"v2"
}
The output shows that adding the new field is rejected:
{
"error" : {
"root_cause" : [
{
"type" : "strict_dynamic_mapping_exception",
"reason" : "mapping set to strict, dynamic introduction of [f2] within [_doc] is not allowed"
}
],
"type" : "strict_dynamic_mapping_exception",
"reason" : "mapping set to strict, dynamic introduction of [f2] within [_doc] is not allowed"
},
"status" : 400
}
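Note that the dynamic setting does not affect writing to fields that already exist in the mapping. If f1 and f2 in the examples above are replaced with is_vip, both writes succeed.
Custom mappings
We can also define a document's mapping ourselves; see the API reference for details. Examples of custom mappings follow.
Controlling whether a field is searchable
The index parameter specifies whether a field can be searched: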
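The three dynamic modes can be sketched as a small decision function. This is a simplified model that only looks at top-level fields. It assigns new fields a placeholder text type instead of running real type inference. The function, the exception class and the error text (modeled on the strict error above) are all hypothetical names for illustration. In every mode the whole document would still be kept in _source.

```python
class StrictDynamicMappingError(Exception):
    """Hypothetical stand-in for ES's strict_dynamic_mapping_exception."""

def index_document(mapping, doc, dynamic="true"):
    """Model how the 'dynamic' setting treats a document's top-level fields.
    Returns (new_mapping, indexed_fields)."""
    new_fields = [name for name in doc if name not in mapping]
    if new_fields and dynamic == "strict":
        # strict: any unknown field rejects the whole document
        raise StrictDynamicMappingError(
            f"mapping set to strict, dynamic introduction of [{new_fields[0]}] is not allowed")
    updated = dict(mapping)
    if dynamic == "true":
        for name in new_fields:
            updated[name] = {"type": "text"}  # placeholder; real inference is richer
    # dynamic == "false": mapping unchanged, so new fields are simply not indexed
    indexed = [name for name in doc if name in updated]
    return updated, indexed

existing = {"is_vip": {"type": "text"}}
print(index_document(existing, {"f1": "v1"}, dynamic="false"))
# ({'is_vip': {'type': 'text'}}, [])
```

Writing to is_vip alone succeeds in all three modes, which matches the note above about existing fields.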
PUT users
{
"mappings": {
"properties": {
"first_name": {
"type": "text"
},
"second_name" : {
"type": "text"
},
"mobile": {
"type": "text",
"index": false
}
}
}
}
Then index a document and search on the mobile field. The search fails with a 400 error, because mobile is not indexed:
PUT users/_doc/1
{
"first_name": "song",
"second_name": "ye",
"mobile": "123444"
}
POST users/_search
{
"query": {
"match": {
"mobile": "123444"
}
}
}
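The effect of "index": false can be modeled as a field that is kept in the source but never given an inverted-index entry. This is a toy sketch with hypothetical function names. The lookup is an exact-value match without any text analysis. The exception only stands in for the 400 error ES returns.

```python
class FieldNotSearchableError(Exception):
    """Stands in for the 400 error returned when querying a non-indexed field."""

def build_index(mapping, docs):
    """Toy index build: docs is {doc_id: source}. Fields mapped with
    "index": False get no inverted-index entry; sources are kept as-is."""
    inverted = {}
    for doc_id, source in docs.items():
        for field, value in source.items():
            if mapping.get(field, {}).get("index", True) is False:
                continue  # stored in the source only, never indexed
            inverted.setdefault(field, {}).setdefault(value, set()).add(doc_id)
    return inverted

def search_exact(mapping, inverted, field, value):
    """Exact-value lookup (no analysis); refuses non-indexed fields."""
    if mapping.get(field, {}).get("index", True) is False:
        raise FieldNotSearchableError(f"field [{field}] is not indexed")
    return inverted.get(field, {}).get(value, set())

users_mapping = {"first_name": {"type": "text"}, "second_name": {"type": "text"},
                 "mobile": {"type": "text", "index": False}}
users = {1: {"first_name": "song", "second_name": "ye", "mobile": "123444"}}
users_index = build_index(users_mapping, users)
print(search_exact(users_mapping, users_index, "first_name", "song"))  # {1}
```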
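Null values (null_value)
The following sets null_value on the keyword field mobile, so an explicit JSON null is indexed as the string "NULL" (_source itself is unchanged). null_value is not supported on text fields, so mobile is mapped as keyword here. The users index from the previous example is deleted first, because creating an index that already exists fails:
DELETE users
PUT users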
{
"mappings": {
"properties": {
"first_name": {
"type": "text"
},
"second_name" : {
"type": "text"
},
"mobile": {
"type": "keyword",
"null_value": "NULL"
}
}
}
}
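Then index two documents and search for mobile equal to "NULL". Only song gong (document 2) is returned. Document 1 has no mobile field at all, and only an explicit null is replaced by null_value, not a missing field: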
PUT users/_doc/1
{
"first_name": "song",
"second_name": "ye"
}
PUT users/_doc/2
{
"first_name": "song",
"second_name": "gong",
"mobile": null
}
POST users/_search
{
"query": {
"match": {
"mobile": "NULL"
}
}
}
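The rule behind this result can be expressed as a small function. This is a hedged sketch, and indexed_value is a hypothetical helper. It shows that an explicit null is swapped for the configured null_value at index time, while a missing field contributes nothing. The source document itself is never modified.

```python
def indexed_value(field_mapping, source, field):
    """Value that would be indexed for `field`, or None if nothing is indexed.
    An explicit JSON null becomes null_value (if configured); a missing field does not."""
    if field not in source:
        return None
    value = source[field]
    if value is None:
        return field_mapping.get("null_value")
    return value

mobile_mapping = {"type": "keyword", "null_value": "NULL"}
user_docs = {
    1: {"first_name": "song", "second_name": "ye"},                    # no mobile field
    2: {"first_name": "song", "second_name": "gong", "mobile": None},  # explicit null
}
hits = [doc_id for doc_id, src in user_docs.items()
        if indexed_value(mobile_mapping, src, "mobile") == "NULL"]
print(hits)  # [2]
```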
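copy_to: combining fields
copy_to copies the values of several fields into a new field. The new field does not appear in _source and can only be used for querying. The existing users index is deleted first, because creating an index that already exists fails:
DELETE users
PUT users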
{
"mappings": {
"properties": {
"first_name": {
"type": "text",
"copy_to": "full_name"
},
"second_name" : {
"type": "text",
"copy_to": "full_name"
}
}
}
}
After indexing a document, query full_name. The only document in the index is returned:
PUT users/_doc/1
{
"first_name": "song",
"second_name": "ye"
}
GET users/_search?q=full_name:(song ye)
The output is below. Note that _source contains no full_name field:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.5753642,
"_source" : {
"first_name" : "song",
"second_name" : "ye"
}
}
]
}
}
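What copy_to does at index time can be sketched as gathering values into target fields that exist only in the index. This is a simplified model with a hypothetical helper name. copy_to may be a single field name or a list of names, and the sketch accepts both. The source dict is left untouched, mirroring the missing full_name in _source above.

```python
def copy_to_fields(mapping, source):
    """Collect values copied into target fields via copy_to.
    Returns {target_field: [values...]}; `source` itself is not modified."""
    targets = {}
    for field, value in source.items():
        target = mapping.get(field, {}).get("copy_to")
        if target is None:
            continue
        for name in ([target] if isinstance(target, str) else target):
            targets.setdefault(name, []).append(value)
    return targets

name_mapping = {
    "first_name": {"type": "text", "copy_to": "full_name"},
    "second_name": {"type": "text", "copy_to": "full_name"},
}
person = {"first_name": "song", "second_name": "ye"}
print(copy_to_fields(name_mapping, person))  # {'full_name': ['song', 'ye']}
```

Because full_name holds both values, a query for song ye on full_name can match terms that came from two different source fields.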
IndexTemplate
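An index template lets you define mappings and settings that are applied automatically to newly created indexes whose names match given patterns.
A template only takes effect when an index is created. Modifying a template does not affect indexes that already exist.
You can define several index templates. When more than one matches, their settings are merged, and the order parameter controls the merge: templates with a lower order are applied first, and templates with a higher order override them.
Default settings
The following template applies to all indexes and sets both the number of shards and the number of replicas to 1: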
PUT _template/template_default
{
"index_patterns": ["*"],
"order": 0,
"version": 1,
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
}
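Adding a template for test* indexes
For indexes whose names start with test, the following template (order 1, so it overrides template_default) sets the number of replicas to 2, disables date detection, and enables numeric detection: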
PUT /_template/template_test
{
"index_patterns": ["test*"],
"order": 1,
"settings": {
"number_of_shards": 1,
"number_of_replicas": 2
},
"mappings": {
"date_detection": false,
"numeric_detection": true
}
}
Viewing templates
Templates can be viewed with the following APIs:
GET /_template/template_default
GET /_template/template_test
Next, index a document into testtemplate (the index is created automatically and matches test*), then look at its mapping:
PUT testtemplate/_doc/2
{
"someNumber": "1",
"someDate": "2020/07/11"
}
GET testtemplate/_mapping/
The output shows that the date string was mapped as text and the numeric string as long:
{
"testtemplate" : {
"mappings" : {
"date_detection" : false,
"numeric_detection" : true,
"properties" : {
"someDate" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"someNumber" : {
"type" : "long"
}
}
}
}
}
Now look at the settings:
GET testtemplate/_settings
The output shows 2 replicas and 1 shard:
{
"testtemplate" : {
"settings" : {
"index" : {
"creation_date" : "1594468782269",
"number_of_shards" : "1",
"number_of_replicas" : "2",
"uuid" : "uQ7AzTYQRGyrxKueUideOA",
"version" : {
"created" : "7060099"
},
"provided_name" : "testtemplate"
}
}
}
}
Settings supplied by a template can be overridden explicitly when the index is created:
PUT testtemplate2/
{
"settings": {
"number_of_replicas": 4
}
}
GET testtemplate2/_settings
The output shows 4 replicas:
{
"testtemplate2": {
"settings": {
"index": {
"creation_date": "1594469098371",
"number_of_shards": "1",
"number_of_replicas": "4",
"uuid": "kqjdOulBQs6K6nKzMfyqyA",
"version": {
"created": "7060099"
},
"provided_name": "testtemplate2"
}
}
}
}
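The precedence seen in these outputs (order 0, then order 1, then explicit index settings) can be sketched as an ordered dictionary merge. The following is a simplified model of legacy-template resolution, and resolve_settings is a hypothetical helper. It only merges flat settings, not mappings. Python's fnmatch is used as an approximation of ES's wildcard matching in index_patterns.

```python
import fnmatch

def resolve_settings(index_name, templates, explicit_settings=None):
    """Merge settings from all templates whose index_patterns match index_name.
    Lower 'order' is applied first, so higher orders override it;
    settings given explicitly at index creation are applied last."""
    matching = [t for t in templates
                if any(fnmatch.fnmatchcase(index_name, p) for p in t["index_patterns"])]
    merged = {}
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        merged.update(template.get("settings", {}))
    merged.update(explicit_settings or {})
    return merged

template_default = {"index_patterns": ["*"], "order": 0,
                    "settings": {"number_of_shards": 1, "number_of_replicas": 1}}
template_test = {"index_patterns": ["test*"], "order": 1,
                 "settings": {"number_of_shards": 1, "number_of_replicas": 2}}
templates = [template_test, template_default]  # list order does not matter
print(resolve_settings("testtemplate2", templates, {"number_of_replicas": 4}))
# {'number_of_shards': 1, 'number_of_replicas': 4}
```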
DynamicTemplate
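A dynamic template sets the mapping of dynamically added fields based on the data type ES detects, combined with the field name.
Dynamic templates are defined in an index's mapping as an array. Each template has a name, matching conditions, and the mapping to apply to matched fields. Templates are checked in order, and the first match wins.
The following example maps string fields whose names start with is to boolean, and all other string fields to keyword: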
PUT my_index
{
"mappings": {
"dynamic_templates":[
{
"string_as_boolean":{
"match_mapping_type": "string",
"match": "is*",
"mapping": {
"type": "boolean"
}
}
},
{
"string_as_keywords":{
"match_mapping_type": "string",
"match": "*",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
Index a document to test it:
PUT my_index/_doc/1
{
"is_vip": "true",
"name": "szc"
}
GET my_index/_mapping
The output shows that is_vip became a boolean field and name became a keyword field:
{
"my_index" : {
"mappings" : {
"dynamic_templates" : [
{
"string_as_boolean" : {
"match" : "is*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "boolean"
}
}
},
{
"string_as_keywords" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "keyword"
}
}
}
],
"properties" : {
"is_vip" : {
"type" : "boolean"
},
"name" : {
"type" : "keyword"
}
}
}
}
}
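The first-match-wins behaviour of the dynamic_templates array can be sketched as a loop over the templates. This is a simplified model with hypothetical helper names. json_type only roughly mirrors the types ES uses for match_mapping_type (boolean, long, double, string, object). It ignores the fact that a string passing date detection is reported as date rather than string. fnmatch approximates the match and unmatch wildcards.

```python
import fnmatch

def json_type(value):
    """Rough equivalent of the detected type used by match_mapping_type."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")

def apply_dynamic_templates(templates, field_name, value):
    """Return the mapping of the first matching template, or None
    (in which case ES would fall back to its default dynamic mapping)."""
    for entry in templates:
        ((_name, rule),) = entry.items()  # each entry is {template_name: rule}
        wanted = rule.get("match_mapping_type", "*")
        if wanted != "*" and wanted != json_type(value):
            continue
        if "match" in rule and not fnmatch.fnmatchcase(field_name, rule["match"]):
            continue
        if "unmatch" in rule and fnmatch.fnmatchcase(field_name, rule["unmatch"]):
            continue
        return rule["mapping"]
    return None

dyn_templates = [
    {"string_as_boolean": {"match_mapping_type": "string", "match": "is*",
                           "mapping": {"type": "boolean"}}},
    {"string_as_keywords": {"match_mapping_type": "string", "match": "*",
                            "mapping": {"type": "keyword"}}},
]
print(apply_dynamic_templates(dyn_templates, "is_vip", "true"))  # {'type': 'boolean'}
print(apply_dynamic_templates(dyn_templates, "name", "szc"))     # {'type': 'keyword'}
```

Swapping the two templates would make the catch-all * rule win for is_vip as well. This is why the more specific rule is listed first.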
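The next example maps every sub-field under name except middle to text and copies its value into full_name. full_name is used only for searching and does not appear in _source. The my_index index from the previous example is deleted first, because creating an index that already exists fails:
DELETE my_index
PUT my_index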
{
"mappings": {
"dynamic_templates":[
{
"full_name": {
"path_match": "name.*",
"path_unmatch": "*.middle",
"mapping": {
"type": "text",
"copy_to": "full_name"
}
}
}
]
}
}
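Searching full_name for s or c finds this document, but searching full_name for the middle value z does not (name.middle itself still gets the default dynamic mapping):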
PUT my_index/_doc/1
{
"name": {
"first": "s",
"middle": "z",
"last": "c"
}
}
GET my_index/_search?q=full_name:s
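path_match and path_unmatch compare wildcard patterns against the full dotted path of a field (for example name.middle), not just its last segment. The sketch below illustrates that under stated assumptions. It is a simplified model with hypothetical helper names that applies a single template. fnmatch approximates ES's wildcard matching.

```python
import fnmatch

def flatten(obj, prefix=""):
    """Yield (dotted_path, value) for every leaf field of a JSON object."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value

def map_with_path_template(rule, doc):
    """Apply one path_match/path_unmatch template to a document.
    Returns (mapped_paths, copy_to_values); unmatched paths are left out."""
    mapped, copied = {}, {}
    for path, value in flatten(doc):
        if not fnmatch.fnmatchcase(path, rule["path_match"]):
            continue
        if "path_unmatch" in rule and fnmatch.fnmatchcase(path, rule["path_unmatch"]):
            continue
        mapped[path] = rule["mapping"]
        target = rule["mapping"].get("copy_to")
        if target:
            copied.setdefault(target, []).append(value)
    return mapped, copied

full_name_rule = {"path_match": "name.*", "path_unmatch": "*.middle",
                  "mapping": {"type": "text", "copy_to": "full_name"}}
name_doc = {"name": {"first": "s", "middle": "z", "last": "c"}}
print(map_with_path_template(full_name_rule, name_doc)[1])  # {'full_name': ['s', 'c']}
```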