Getting Started with HBase: Fuzzy Queries in the Shell

二爵爷点灯 · Published 2021-11-15 17:42:22

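Introduction:

This article records the basic HBase shell commands for fuzzy queries, covering their use as completely and in as much detail as possible. The commands involved are scan and get, and the focus is on fuzzy matching with them.

Preparation:

Add a column family named hobby to the test table (it holds hobbies such as favorite sports, favorite music genres, and favorite kinds of books), then insert some test data so the fuzzy-query commands are easy to demonstrate.

Command walkthrough: alter 'test','hobby' adds a new column family, hobby, to the test table.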

hbase(main):002:0> alter 'test','hobby'
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 2.5670 seconds                                                                                                                                                                                                                           
hbase(main):003:0> put 'test','row1','hobby:music', '粤语歌'
Took 0.0852 seconds                                                                                                                                                                                                                           
hbase(main):004:0> put 'test','row1','hobby:sports', '跑步'
Took 0.0048 seconds                                                                                                                                                                                                                           
hbase(main):005:0> put 'test','row2','hobby:music', '轻音乐'
Took 0.0042 seconds                                                                                                                                                                                                                           
hbase(main):006:0> put 'test','row2','hobby:sports', '足球'

Data preview:

First, look at what the test table currently contains. You can add test data of your own with put.

hbase(main):006:0> scan 'test', { FORMATTER => 'toString'}
ROW                                                          COLUMN+CELL                                                                                                                                                                      
 row1                                                        column=hobby:music, timestamp=2021-11-14T00:18:49.730, value=粤语歌                                                                                                                 
 row1                                                        column=hobby:sports, timestamp=2021-11-14T00:19:47.453, value=跑步                                                                                                                 
 row1                                                        column=liecuA:name, timestamp=2021-11-13T16:37:28.520, value=张三                                                                                                                  
 row2                                                        column=hobby:music, timestamp=2021-11-14T00:20:21.322, value=轻音乐                                                                                                                 
 row2                                                        column=hobby:sports, timestamp=2021-11-14T00:21:08.164, value=足球                                                                                                                 
 row3                                                        column=liecuA:name, timestamp=2021-11-13T17:28:14.094, value=3茉莉                                                                                                                 
 row4                                                        column=liecuA:name, timestamp=2021-11-15T14:43:09.051, value=粤云朋                                                                                                                 
 row粤语歌1                                                     column=hobby:musmusic-love1, timestamp=2021-11-15T14:35:08.761, value=海阔天空                                                                                                       
5 row(s)
Took 0.0137 seconds                                                                                                                                                                                                                           
hbase(main):007:0>

Fuzzy queries by column value

Goal 1: fuzzy-match cells in the test table whose value contains "粤" (row keys are not matched)

Command:

scan 'test', { FILTER=>"ValueFilter(=,'substring:粤')",FORMATTER => 'toString'}

hbase(main):005:0> scan 'test', { FILTER=>"ValueFilter(=,'substring:粤')",FORMATTER => 'toString'}
ROW                                                          COLUMN+CELL                                                                                                                                                                      
 row1                                                        column=hobby:music, timestamp=2021-11-14T00:18:49.730, value=粤语歌                                                                                                                 
 row4                                                        column=liecuA:name, timestamp=2021-11-15T14:43:09.051, value=粤云朋

Goal 2: exact-match cells in the test table whose value is "粤云朋" (row keys are not matched)

Command:

scan 'test', { FILTER=>"ValueFilter(=,'binary:粤云朋')",FORMATTER => 'toString'}

hbase(main):006:0> scan 'test', { FILTER=>"ValueFilter(=,'binary:粤云朋')",FORMATTER => 'toString'}
ROW                                                          COLUMN+CELL                                                                                                                                                                      
 row4                                                        column=liecuA:name, timestamp=2021-11-15T14:43:09.051, value=粤云朋                                                                                                                 
1 row(s)
Took 0.0054 seconds
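
To make the behavior of Goals 1 and 2 concrete, here is a minimal Ruby sketch that models what ValueFilter does. It is a toy in-memory model, not HBase code: the value_filter helper and the CELLS array are made up for illustration, with the data copied from the preview scan above. It assumes the substring comparator is case-insensitive, which makes no difference for this data.

```ruby
# Toy model of ValueFilter; this is NOT the HBase API.
# Each cell is [row_key, column, value], copied from the preview scan.
CELLS = [
  ['row1', 'hobby:music', '粤语歌'],
  ['row1', 'hobby:sports', '跑步'],
  ['row1', 'liecuA:name', '张三'],
  ['row2', 'hobby:music', '轻音乐'],
  ['row2', 'hobby:sports', '足球'],
  ['row3', 'liecuA:name', '3茉莉'],
  ['row4', 'liecuA:name', '粤云朋'],
  ['row粤语歌1', 'hobby:musmusic-love1', '海阔天空']
].freeze

# Hypothetical helper: keeps or drops each cell on its own value.
# The row key and column name are never inspected.
def value_filter(cells, comparator, pattern)
  cells.select do |_row, _column, value|
    case comparator
    when 'substring' then value.downcase.include?(pattern.downcase)
    when 'binary'    then value == pattern
    else raise ArgumentError, "unsupported comparator: #{comparator}"
    end
  end
end

substring_hits = value_filter(CELLS, 'substring', '粤')
binary_hits = value_filter(CELLS, 'binary', '粤云朋')
```

In this model the row key row粤语歌1 contains "粤" but its only cell value does not, so it is dropped. That matches the shell output above, where only row1 and row4 come back.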

Goal 3: name the column and value explicitly, exact-matching column family liecuA, qualifier name, value 张三

Command:

scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'binary:张三')",FORMATTER => 'toString'}

hbase(main):008:0* scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'binary:张三')",FORMATTER => 'toString'}
ROW                                                          COLUMN+CELL                                                                                                                                                                      
 row1                                                        column=hobby:music, timestamp=2021-11-14T00:18:49.730, value=粤语歌                                                                                                                 
 row1                                                        column=hobby:sports, timestamp=2021-11-14T00:19:47.453, value=跑步                                                                                                                 
 row1                                                        column=liecuA:name, timestamp=2021-11-13T16:37:28.520, value=张三                                                                                                                  
1 row(s)
Took 0.0086 seconds

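Goal 4: name the column explicitly and fuzzy-match column family liecuA, qualifier name, for values containing "张". Note that the substring comparator matches "张" anywhere in the value, not only at the start; for a strict "starts with" match, the binaryprefix comparator is the usual choice.

Command: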

scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'substring:张')",FORMATTER => 'toString'}

hbase(main):013:0> scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'substring:张')",FORMATTER => 'toString'}
ROW                                                          COLUMN+CELL                                                                                                                                                                      
 row1                                                        column=hobby:music, timestamp=2021-11-14T00:18:49.730, value=粤语歌                                                                                                                 
 row1                                                        column=hobby:sports, timestamp=2021-11-14T00:19:47.453, value=跑步                                                                                                                 
 row1                                                        column=liecuA:name, timestamp=2021-11-13T16:37:28.520, value=张三                                                                                                                  
 row2                                                        column=hobby:music, timestamp=2021-11-14T00:20:21.322, value=轻音乐                                                                                                                 
 row2                                                        column=hobby:sports, timestamp=2021-11-14T00:21:08.164, value=足球                                                                                                                 
 row2                                                        column=liecuA:name, timestamp=2021-11-15T16:58:08.497, value=张小斐                                                                                                                 
2 row(s)
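
The comparator prefix inside the filter string decides how the value is compared. The following Ruby sketch models three of them (binary, binaryprefix, substring) on plain strings. It is an illustration only: the matches? helper and the NAMES list are hypothetical, and the substring case-insensitivity is an assumption that does not affect this data.

```ruby
# Toy model of three comparator prefixes; this is NOT the HBase API.
NAMES = ['张三', '张小斐', '小张'].freeze # hypothetical values

# Hypothetical helper: comparator is written like in the shell, e.g. 'substring:张'.
def matches?(value, comparator)
  kind, pattern = comparator.split(':', 2)
  case kind
  when 'binary'       then value == pattern              # whole value must be equal
  when 'binaryprefix' then value.start_with?(pattern)    # value must begin with pattern
  when 'substring'    then value.downcase.include?(pattern.downcase) # pattern anywhere
  else raise ArgumentError, "unsupported comparator: #{kind}"
  end
end

substring_names = NAMES.select { |n| matches?(n, 'substring:张') }
prefix_names = NAMES.select { |n| matches?(n, 'binaryprefix:张') }
exact_names = NAMES.select { |n| matches?(n, 'binary:张三') }
```

Here substring:张 keeps all three names, including 小张, while binaryprefix:张 keeps only the two that begin with 张.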

Goal 5: multi-condition query on the test table: find people surnamed 张 (liecuA:name) who do not like 粤语歌 (hobby:music)

Command:

scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'substring:张') AND SingleColumnValueFilter('hobby', 'music', !=, 'binary:粤语歌')",FORMATTER => 'toString'}

hbase(main):013:0> scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'substring:张') AND SingleColumnValueFilter('hobby', 'music', !=, 'binary:粤语歌')",FORMATTER => 'toString'}
ROW                                                          COLUMN+CELL                                                                                                                                                                      
 row2                                                        column=hobby:music, timestamp=2021-11-14T00:20:21.322, value=轻音乐                                                                                                                 
 row2                                                        column=hobby:sports, timestamp=2021-11-14T00:21:08.164, value=足球                                                                                                                 
 row2                                                        column=liecuA:name, timestamp=2021-11-15T16:58:08.497, value=张小斐                                                                                                                 
1 row(s)
Took 0.0107 seconds

Note 1:

When combining several filters or conditions, join them with uppercase AND or OR; lowercase operators cause an error.

Example:

hbase(main):014:0> scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'substring:张') and SingleColumnValueFilter('hobby', 'music', !=, 'binary:粤语歌')",FORMATTER => 'toString'}

ROW                                                          COLUMN+CELL

ERROR: Filter Name and not supported

For usage try 'help "scan"'

Took 0.0016 seconds

Note 2:

SingleColumnValueFilter can return rows you did not expect.

This happens when one of the SingleColumnValueFilter parameters is overlooked.

The original passage from HBase: The Definitive Guide:

The filter class also exposes a few auxiliary methods you can use to
fine tune its behavior:
boolean getFilterIfMissing()
void setFilterIfMissing(boolean filterIfMissing)
boolean getLatestVersionOnly()
void setLatestVersionOnly(boolean latestVersionOnly)
The former controls what happens to rows that do not have the
column at all. By default they are included in the result, but you can
use setFilterIfMissing(true) to reverse that behavior, i.e., all
rows that do not have the reference column are dropped from the
result.
Note
You must include the column you want to filter by, i.e., the reference
column, into the families you query for - using addColumn() for
example. If you fail to do so the column is considered missing and the
result is either empty, or contains all rows, based on the
getFilterIfMissing() result.

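What to watch for when setting up SingleColumnValueFilter:
The column being filtered on should exist in the row. By default, rows that do not have that column are returned as well. To keep those rows out of the result, set the following:
filter.setFilterIfMissing(true);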
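
The effect of this flag can be sketched with a small Ruby model. This is a toy illustration, not HBase code: the ROWS data and the single_column_value_filter helper are hypothetical, and only equality matching is modeled.

```ruby
# Toy model of SingleColumnValueFilter row selection; this is NOT the HBase API.
# Hypothetical rows: row key => { 'family:qualifier' => value }.
ROWS = {
  'r1' => { 'liecuA:name' => '张三', 'hobby:music' => '粤语歌' },
  'r2' => { 'liecuA:name' => '李四' },
  'r3' => { 'hobby:music' => '轻音乐' } # no liecuA:name column at all
}.freeze

# Hypothetical helper: returns the keys of rows that pass the filter.
# A row that has the column passes only if the value is equal.
# A row that lacks the column passes unless filter_if_missing is true.
def single_column_value_filter(rows, column, expected, filter_if_missing: false)
  rows.select do |_key, cells|
    if cells.key?(column)
      cells[column] == expected
    else
      !filter_if_missing
    end
  end.keys
end

default_keys = single_column_value_filter(ROWS, 'liecuA:name', '张三')
strict_keys = single_column_value_filter(ROWS, 'liecuA:name', '张三', filter_if_missing: true)
```

With the default, r3 is returned even though it has no liecuA:name column; with filter_if_missing set to true, only r1 remains. In the shell, the filter string for SingleColumnValueFilter is documented as accepting two optional trailing booleans (filter-if-column-missing and latest-version-only), so a form such as SingleColumnValueFilter('liecuA', 'name', =, 'binary:张三', true, true) should be the shell-side equivalent. Treat that as an assumption to check against your HBase version.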

————————————————
Quoted from: https://blog.csdn.net/maixia24/article/details/11768753