Getting Started with HBase: Fuzzy Queries in the Shell
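Introduction:
This article records the basic HBase shell commands for fuzzy queries, covering their use as completely and in as much detail as possible. The commands involved are mainly scan and get, and how to run fuzzy queries with them.
Preparation:
Add a column family to the test table (hobby: holds preferences such as favorite sports, favorite music genres, and favorite kinds of books), then insert some test data so the fuzzy-query commands can be demonstrated.
Command explained: alter 'test','hobby' adds a new column family, hobby, to the test table.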
hbase(main):002:0> alter 'test','hobby'
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 2.5670 seconds
hbase(main):003:0> put 'test','row1','hobby:music', '粤语歌'
Took 0.0852 seconds
hbase(main):004:0> put 'test','row1','hobby:sports', '跑步'
Took 0.0048 seconds
hbase(main):005:0> put 'test','row2','hobby:music', '轻音乐'
Took 0.0042 seconds
hbase(main):006:0> put 'test','row2','hobby:sports', '足球'
Data preview:
First, look at what the test table contains. You can add more test data of your own with put.
hbase(main):006:0> scan 'test', { FORMATTER => 'toString'}
ROW COLUMN+CELL
row1 column=hobby:music, timestamp=2021-11-14T00:18:49.730, value=粤语歌
row1 column=hobby:sports, timestamp=2021-11-14T00:19:47.453, value=跑步
row1 column=liecuA:name, timestamp=2021-11-13T16:37:28.520, value=张三
row2 column=hobby:music, timestamp=2021-11-14T00:20:21.322, value=轻音乐
row2 column=hobby:sports, timestamp=2021-11-14T00:21:08.164, value=足球
row3 column=liecuA:name, timestamp=2021-11-13T17:28:14.094, value=3茉莉
row4 column=liecuA:name, timestamp=2021-11-15T14:43:09.051, value=粤云朋
row粤语歌1 column=hobby:musmusic-love1, timestamp=2021-11-15T14:35:08.761, value=海阔天空
5 row(s)
Took 0.0137 seconds
hbase(main):007:0>
Fuzzy queries by column value
Goal 1: fuzzy-query the test table for cells whose value contains "粤" (row keys are not matched)
Command:
scan 'test', { FILTER=>"ValueFilter(=,'substring:粤')",FORMATTER => 'toString'}
hbase(main):005:0> scan 'test', { FILTER=>"ValueFilter(=,'substring:粤')",FORMATTER => 'toString'}
ROW COLUMN+CELL
row1 column=hobby:music, timestamp=2021-11-14T00:18:49.730, value=粤语歌
row4 column=liecuA:name, timestamp=2021-11-15T14:43:09.051, value=粤云朋
Goal 2: exact query on the test table for cells whose value is exactly "粤云朋" (row keys are not matched)
Command:
scan 'test', { FILTER=>"ValueFilter(=,'binary:粤云朋')",FORMATTER => 'toString'}
hbase(main):006:0> scan 'test', { FILTER=>"ValueFilter(=,'binary:粤云朋')",FORMATTER => 'toString'}
ROW COLUMN+CELL
row4 column=liecuA:name, timestamp=2021-11-15T14:43:09.051, value=粤云朋
1 row(s)
Took 0.0054 seconds
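The difference between the substring and binary comparators in Goals 1 and 2 can be sketched with a small Python model. This is only a toy illustration of the matching rules, not HBase client code: the function names are hypothetical, and the cell list is a hand-copied subset of the test table. It assumes that ValueFilter works cell by cell, that substring matching is case-insensitive containment, and that binary with = means byte-for-byte equality.

```python
# Toy model of cell-level value filtering; this is NOT the HBase API.
# Each cell is (row key, column, value), copied from the test table above.
CELLS = [
    ("row1", "hobby:music", "粤语歌"),
    ("row1", "hobby:sports", "跑步"),
    ("row1", "liecuA:name", "张三"),
    ("row4", "liecuA:name", "粤云朋"),
    ("row粤语歌1", "hobby:musmusic-love1", "海阔天空"),
]


def substring_match(value: str, needle: str) -> bool:
    """Models 'substring:<needle>': case-insensitive containment."""
    return needle.lower() in value.lower()


def binary_match(value: str, expected: str) -> bool:
    """Models 'binary:<expected>' with '=': byte-for-byte equality."""
    return value.encode("utf-8") == expected.encode("utf-8")


def value_filter(cells, predicate):
    """Keeps only cells whose value passes; row key and column are never examined."""
    return [cell for cell in cells if predicate(cell[2])]


fuzzy = value_filter(CELLS, lambda v: substring_match(v, "粤"))
exact = value_filter(CELLS, lambda v: binary_match(v, "粤云朋"))

for row, column, value in fuzzy:
    print(row, column, value)
```

Note that the row whose key is row粤语歌1 is not returned by the fuzzy match, because only cell values are compared, which is consistent with the shell output in Goal 1.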
Goal 3: specify the column and value explicitly, exact query for column family liecuA, column name, value 张三
Command:
scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'binary:张三')",FORMATTER => 'toString'}
hbase(main):008:0* scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'binary:张三')",FORMATTER => 'toString'}
ROW COLUMN+CELL
row1 column=hobby:music, timestamp=2021-11-14T00:18:49.730, value=粤语歌
row1 column=hobby:sports, timestamp=2021-11-14T00:19:47.453, value=跑步
row1 column=liecuA:name, timestamp=2021-11-13T16:37:28.520, value=张三
1 row(s)
Took 0.0086 seconds
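Goal 4: specify the column and value explicitly, fuzzy query for column family liecuA, column name, value containing "张" (the substring comparator matches the text anywhere in the value, not only at the start)
Command: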
scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'substring:张')",FORMATTER => 'toString'}
hbase(main):013:0> scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'substring:张')",FORMATTER => 'toString'}
ROW COLUMN+CELL
row1 column=hobby:music, timestamp=2021-11-14T00:18:49.730, value=粤语歌
row1 column=hobby:sports, timestamp=2021-11-14T00:19:47.453, value=跑步
row1 column=liecuA:name, timestamp=2021-11-13T16:37:28.520, value=张三
row2 column=hobby:music, timestamp=2021-11-14T00:20:21.322, value=轻音乐
row2 column=hobby:sports, timestamp=2021-11-14T00:21:08.164, value=足球
row2 column=liecuA:name, timestamp=2021-11-15T16:58:08.497, value=张小斐
2 row(s)
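The substring comparator used here matches "张" anywhere in the value. If a strict "starts with" match is wanted, the HBase filter language also lists a binaryprefix comparator (for example 'binaryprefix:张'); whether it behaves as expected in your version is worth checking in the shell. The Python sketch below is a toy model of the two matching rules only. The function names are hypothetical, and the value 小张 is an invented example that does not exist in the test table.

```python
# Toy comparison of "contains" versus "starts with"; this is NOT the HBase API.

def substring_match(value: str, needle: str) -> bool:
    """Models 'substring:<needle>': the text may appear anywhere in the value."""
    return needle.lower() in value.lower()


def binary_prefix_match(value: str, prefix: str) -> bool:
    """Models 'binaryprefix:<prefix>': only the leading bytes are compared."""
    raw = value.encode("utf-8")
    head = prefix.encode("utf-8")
    return raw[:len(head)] == head


# 小张 is a hypothetical value added to show where the two rules differ.
NAMES = ["张三", "张小斐", "小张", "3茉莉"]

contains = [n for n in NAMES if substring_match(n, "张")]
starts = [n for n in NAMES if binary_prefix_match(n, "张")]

print("contains:", contains)
print("starts with:", starts)
```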
Goal 5: multi-condition query on the test table for people whose name contains 张 (liecuA:name) and who do not like 粤语歌 (hobby:music)
Command:
scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'substring:张') AND SingleColumnValueFilter('hobby', 'music', !=, 'binary:粤语歌')",FORMATTER => 'toString'}
hbase(main):013:0> scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'substring:张') AND SingleColumnValueFilter('hobby', 'music', !=, 'binary:粤语歌')",FORMATTER => 'toString'}
ROW COLUMN+CELL
row2 column=hobby:music, timestamp=2021-11-14T00:20:21.322, value=轻音乐
row2 column=hobby:sports, timestamp=2021-11-14T00:21:08.164, value=足球
row2 column=liecuA:name, timestamp=2021-11-15T16:58:08.497, value=张小斐
1 row(s)
Took 0.0107 seconds
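How AND combines two column conditions can be modeled roughly as follows. This is a simplified Python sketch, not HBase code: the helper names are hypothetical, and both rows deliberately contain both columns so that missing-column behavior does not come into play. It assumes that each condition is evaluated per row and that a row that passes is returned with all of its columns.

```python
# Toy model of combining two row-level conditions; this is NOT the HBase API.
ROWS = {
    "row1": {"liecuA:name": "张三", "hobby:music": "粤语歌"},
    "row2": {"liecuA:name": "张小斐", "hobby:music": "轻音乐"},
}


def name_contains_zhang(cells: dict) -> bool:
    return "张" in cells["liecuA:name"]


def music_is_not_cantonese(cells: dict) -> bool:
    return cells["hobby:music"] != "粤语歌"


def scan(rows: dict, conditions, combine) -> dict:
    """combine=all models AND, combine=any models OR; whole rows are kept."""
    return {
        key: cells
        for key, cells in rows.items()
        if combine(cond(cells) for cond in conditions)
    }


CONDITIONS = [name_contains_zhang, music_is_not_cantonese]
both = scan(ROWS, CONDITIONS, all)    # AND
either = scan(ROWS, CONDITIONS, any)  # OR

print("AND:", list(both))
print("OR:", list(either))
```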
Note 1:
When combining multiple filters or conditions, join them with uppercase AND or OR; otherwise the shell reports an error.
Example:
hbase(main):014:0> scan 'test', {FILTER => "SingleColumnValueFilter('liecuA', 'name', =, 'substring:张') and SingleColumnValueFilter('hobby', 'music', !=, 'binary:粤语歌')",FORMATTER => 'toString'}
ROW COLUMN+CELL
ERROR: Filter Name and not supported
For usage try 'help "scan"'
Took 0.0016 seconds
Note 2:
Filtering with SingleColumnValueFilter can return unexpected rows.
This happens when one of the filter's parameters, filterIfMissing, is overlooked.
From HBase: The Definitive Guide:
The filter class also exposes a few auxiliary methods you can use to
fine tune its behavior:
boolean getFilterIfMissing()
void setFilterIfMissing(boolean filterIfMissing)
boolean getLatestVersionOnly()
void setLatestVersionOnly(boolean latestVersionOnly)
The former controls what happens to rows that do not have the
column at all. By default they are included in the result, but you can
use setFilterIfMissing(true) to reverse that behavior, i.e., all
rows that do not have the reference column are dropped from the
result.
Note
You must include the column you want to filter by, i.e., the reference
column, into the families you query for - using addColumn() for
example. If you fail to do so the column is considered missing and the
result is either empty, or contains all rows, based on the
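getFilterIfMissing() result.
Points to note when setting up SingleColumnValueFilter:
The filter assumes the column being filtered on exists. If a row does not have that column, the row is returned anyway. To keep such rows out of the result, set the following:
filter.setFilterIfMissing(true);
Quoted from: https://blog.csdn.net/maixia24/article/details/11768753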
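The effect of filterIfMissing described in the quoted passage can be sketched with a small Python model. It is a toy illustration under stated assumptions, not the HBase implementation: the function name and the filter_if_missing parameter are hypothetical stand-ins for the behavior the quote describes, and the rows are a hand-picked subset in which row2 has no liecuA:name column.

```python
# Toy model of the missing-column behavior; this is NOT the HBase API.
ROWS = {
    "row1": {"hobby:music": "粤语歌", "liecuA:name": "张三"},
    "row2": {"hobby:music": "轻音乐"},   # no liecuA:name column
    "row4": {"liecuA:name": "粤云朋"},
}


def single_column_value_filter(rows, column, predicate, filter_if_missing=False):
    """Returns whole rows; rows lacking the column are kept unless filter_if_missing."""
    kept = {}
    for key, cells in rows.items():
        if column not in cells:
            if not filter_if_missing:
                kept[key] = cells  # default: a missing column lets the row through
            continue
        if predicate(cells[column]):
            kept[key] = cells
    return kept


def is_zhang_san(value: str) -> bool:
    return value == "张三"


default = single_column_value_filter(ROWS, "liecuA:name", is_zhang_san)
strict = single_column_value_filter(
    ROWS, "liecuA:name", is_zhang_san, filter_if_missing=True
)

print("default:", list(default))
print("filter_if_missing=True:", list(strict))
```

In the shell, the filter string is generally documented as accepting two optional trailing booleans for filterIfMissing and latestVersionOnly, for example SingleColumnValueFilter('liecuA', 'name', =, 'binary:张三', true, true). Treat that form as an assumption to verify against your HBase version before relying on it.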