0 Problem

ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region phm_default_lightunit,,1606205408615.397792fb6a31a2a183c3031d173c61d2. is not online on bd--4.jx.com,16020,1620637191420
	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3077)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1015)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2347)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)

Failed with exception java.io.IOException:org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Tue May 11 13:34:26 CST 2021, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68229: org.apache.hadoop.hbase.NotServingRegionException: Region phm_default_lightunit,,1606205408615.397792fb6a31a2a183c3031d173c61d2. is not online on bd--4.jx.com,16020,1620637191420
	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3077)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1015)
	at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2347)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32385)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
 row '' on table 'phm_default_lightunit' at region=phm_default_lightunit,,1606205408615.397792fb6a31a2a183c3031d173c61d2., hostname=bd--4.jx.com,16020,1620629554687, seqNum=759396


Troubleshooting the region

Check the table state:

hbase hbck -summary phm_default_lightunit

Output:


Summary:
Table phm_default_lightunit is okay.
    Number of regions: 0
    Deployed on: 
Table hbase:meta is okay.
    Number of regions: 1
    Deployed on:  bd--3.jx.com,16020,1620629563497
2 inconsistencies detected.
Status: INCONSISTENT

hbck detected 2 inconsistencies.
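You may want to compare hbck runs before and after a repair, for example when the inconsistency count drops from 2 to 1 later in this post. For that, it helps to extract the count and the status in a script. The following is a minimal bash sketch. It assumes the plain-text summary format shown above (HBase 1.x hbck). The function name hbck_summary_status is hypothetical.

```shell
# Read `hbase hbck -summary` output on stdin and print "<count> <status>".
# Assumes the HBase 1.x hbck text format, e.g.
#   2 inconsistencies detected.
#   Status: INCONSISTENT
hbck_summary_status() {
  awk '
    /inconsistenc(y|ies) detected/ { count = $1 }
    /^Status:/                      { status = $2 }
    END {
      if (count == "") count = 0   # no count line seen: report 0
      print count, status
    }'
}
```

Run it as `hbase hbck -summary phm_default_lightunit | hbck_summary_status`. For the output above it prints `2 INCONSISTENT`.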

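In addition, the HBase web UI showed every metric for this table's regions as 0, which is abnormal.

This confirms that the table's region data is in a bad state. The problematic regions reported in the logs are: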

  • ns:phm_default_lightunit,1b1e73ed71329daa72ffb28f42947ee7d40c9cb7872063682cb511fd16dbbfd1201810277ccbfc2577361956656a23f805afb859,1549491783878.1513a155f24ac2df20b4797077ed6ae0.
  • ns:phm_default_lightunit,1b37fe2afa75581bf5f7574336f1cdfbf0f902966db54bf38aa505665828929020181207edad0e69052f705b07331e32c00b5aae,1549491783878.d4a172571b229e74109df6c03a959588.

These entries are hbase:meta rowkeys. Following the meta rowkey format, take the encoded name (the last segment of the rowkey). Then check on HDFS whether the region's data directory exists:

hadoop fs -ls /user/hbase/data/phm_default_lightunit/1513a155f24ac2df20b4797077ed6ae0
ls: `/user/hbase/data/phm_default_lightunit/1513a155f24ac2df20b4797077ed6ae0`: No such file or directory

The region's data directory on HDFS is gone.
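The rowkey-to-path step can be scripted. The bash sketch below splits an hbase:meta rowkey into table, start key, region id and encoded name. It then builds the region directory, assuming the standard HBase 0.96+ layout ${hbase.rootdir}/data/<namespace>/<table>/<encoded name>. The function names and the rootdir argument are hypothetical; pass whatever hbase.rootdir is set to in your hbase-site.xml. On this layout, a table without an explicit namespace lives under the `default` namespace directory, so the path you list should include it.

```shell
# Split an hbase:meta rowkey of the form <table>,<startkey>,<regionid>.<encoded>.
# and print the four parts on separate lines (the start key line may be empty).
# Assumes printable keys; binary keys shown as \xNN escapes are not decoded.
parse_meta_rowkey() {
  local rk encoded regionid table startkey
  rk="${1%.}"               # drop the trailing "."
  encoded="${rk##*.}"       # text after the last "."  -> encoded name
  rk="${rk%.*}"             # drop ".<encoded>"
  regionid="${rk##*,}"      # text after the last ","  -> region id (timestamp)
  rk="${rk%,*}"             # drop ",<regionid>"
  table="${rk%%,*}"         # table names contain no ",", so this is the table
  startkey="${rk#*,}"       # whatever remains is the start key (may contain ",")
  printf '%s\n%s\n%s\n%s\n' "$table" "$startkey" "$regionid" "$encoded"
}

# Build the region's HDFS directory, assuming the layout
# <rootdir>/data/<namespace>/<table qualifier>/<encoded name>.
region_hdfs_dir() {
  local rootdir="${1%/}" rk="${2%.}"
  local table encoded ns qualifier
  table="${rk%%,*}"
  encoded="${rk##*.}"
  case "$table" in
    *:*) ns="${table%%:*}"; qualifier="${table#*:}" ;;   # "ns:table"
    *)   ns="default";      qualifier="$table" ;;        # no namespace given
  esac
  printf '%s/data/%s/%s/%s\n' "$rootdir" "$ns" "$qualifier" "$encoded"
}
```

For example, with RK holding one of the rowkeys listed above, `hadoop fs -test -d "$(region_hdfs_dir /hbase "$RK")" && echo present || echo missing` checks whether the directory exists.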

In the hbase shell, look up the hbase:meta row for the problematic rowkey:

get 'hbase:meta','phm_default_lightunit,1b1e73ed71329daa72ffb28f42947ee7d40c9cb7872063682cb511fd16dbbfd1201810277ccbfc2577361956656a23f805afb859,1549491783878.1513a155f24ac2df20b4797077ed6ae0.'

# Output:
info:regioninfo                        timestamp=1566891946444, value={ENCODED => 96dd94004c9dd9fce3f4eb80c885ad85, NAME => 'phm_default_lightunit,0778ba8d2889fe7343ffc120c4ae83da0778ba8d2889fe7343ffc120c4ae83da,1559425462970.96dd94004c9dd9fce3f4eb80c885ad85.', STARTKEY => '0778ba8d2889fe7343ffc120c4ae83da0778ba8d2889fe7343ffc120c4ae83da', ENDKEY => '085787e9323caa99fc45325a351797fdd4849891167b36bfc6241197b5a58cb1201802142fd6fdc3bf0b5b9cb8faf80d04b71d1a'}
info:seqnumDuringOpen                  timestamp=1566891946444, value=\x00\x00\x00\x00\x00\x00\x02'
info:server                            timestamp=1566891946444, value=node85-104:16020
info:serverstartcode                   timestamp=1566891946444, value=1563956428836
info:sn                                timestamp=1566891945886, value=node85-104,16020,1563956428836
info:state                             timestamp=1566891946444, value=OPEN

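The metadata entry itself looks intact, which pins down the cause:

hbase:meta still records the region as OPEN on a RegionServer, so clients route requests for that key range to it. However, the region's data directory no longer exists on HDFS. The region is therefore not actually being served, and the RegionServer throws NotServingRegionException.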

Appendix: hbase:meta table structure

The hbase:meta table is laid out as follows:

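  • rowkey: ${table name},${start key},${region timestamp}.${encoded name}.
  • info:state: region state; OPEN under normal conditions
  • info:serverstartcode: 13-digit (millisecond) startup timestamp of the hosting RegionServer
  • info:server: address and port of the hosting RegionServer, e.g. node85-47:16020
  • info:sn: server name built from the host, port and serverstartcode, e.g. node85-47,16020,1549491783878
  • info:seqnumDuringOpen: the region's sequence ID at the time it was opened, stored as an 8-byte binary long
  • info:regioninfo: region details such as ENCODED, NAME, STARTKEY and ENDKEY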

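regioninfo holds the most important details:

  • ENCODED: a 32-character MD5 hex string computed from ${table name},${start key},${region timestamp}. It is the region's unique ID and the name of its data directory on HDFS, so the value in hbase:meta lets you locate the region's exact HDFS path. The trailing ${encoded name} in the rowkey is this ENCODED value.
  • NAME: identical to the rowkey
  • STARTKEY: the region's start key
  • ENDKEY: the region's end key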
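The relationship between ENCODED and the rest of the rowkey can be checked directly. This bash sketch relies on one assumption, based on HBase's HRegionInfo.createRegionName for the default replica: the encoded name is the MD5 hex digest of the bytes `<table>,<startkey>,<regionid>`, i.e. the rowkey without its `.<encoded>.` suffix. It uses GNU coreutils md5sum and only works for printable keys. The function names are hypothetical.

```shell
# Recompute an encoded name from its three components (table, start key, region id).
region_encoded_name() {
  printf '%s,%s,%s' "$1" "$2" "$3" | md5sum | cut -d' ' -f1
}

# Succeed if the encoded name at the end of a meta rowkey equals the MD5 of the
# "<table>,<startkey>,<regionid>" prefix in front of it.
encoded_matches_rowkey() {
  local rk="${1%.}"            # drop the trailing "."
  local encoded="${rk##*.}"    # last segment: the recorded encoded name
  local name="${rk%.*}"        # everything before ".<encoded>"
  [ "$(printf '%s' "$name" | md5sum | cut -d' ' -f1)" = "$encoded" ]
}
```

A rowkey can fail this check for two reasons: it was copied incorrectly, or it does not follow the default-replica format assumed here.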

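Repair process

Using the metadata repair tool

First, try repairing directly with hbck. Note that -repair and -fixMeta are options of the HBase 1.x hbck tool; HBase 2.x no longer supports them and uses HBCK2 instead:

hbase hbck -repair phm_default_lightunit
hbase hbck -fixMeta

If that does not fix the problem, continue with the steps below: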


Use the HDFS tooling to check for corrupt or missing file blocks:

hdfs fsck /hbase
# result: healthy

Manually repairing the metadata

Back up the existing region information:

get 'hbase:meta','phm_default_lightunit,1b37fe2afa75581bf5f7574336f1cdfbf0f902966db54bf38aa505665828929020181207edad0e69052f705b07331e32c00b5aae,1549491783878.d4a172571b229e74109df6c03a959588.'
# save the output of the get above as a backup

# delete this region's entries
delete 'hbase:meta','phm_default_lightunit,1b37fe2afa75581bf5f7574336f1cdfbf0f902966db54bf38aa505665828929020181207edad0e69052f705b07331e32c00b5aae,1549491783878.d4a172571b229e74109df6c03a959588.','info:regioninfo'
delete 'hbase:meta','phm_default_lightunit,1b37fe2afa75581bf5f7574336f1cdfbf0f902966db54bf38aa505665828929020181207edad0e69052f705b07331e32c00b5aae,1549491783878.d4a172571b229e74109df6c03a959588.','info:server'
delete 'hbase:meta','phm_default_lightunit,1b37fe2afa75581bf5f7574336f1cdfbf0f902966db54bf38aa505665828929020181207edad0e69052f705b07331e32c00b5aae,1549491783878.d4a172571b229e74109df6c03a959588.','info:serverstartcode'
delete 'hbase:meta','phm_default_lightunit,1b37fe2afa75581bf5f7574336f1cdfbf0f902966db54bf38aa505665828929020181207edad0e69052f705b07331e32c00b5aae,1549491783878.d4a172571b229e74109df6c03a959588.','info:sn'
delete 'hbase:meta','phm_default_lightunit,1b37fe2afa75581bf5f7574336f1cdfbf0f902966db54bf38aa505665828929020181207edad0e69052f705b07331e32c00b5aae,1549491783878.d4a172571b229e74109df6c03a959588.','info:state'
delete 'hbase:meta','phm_default_lightunit,1b37fe2afa75581bf5f7574336f1cdfbf0f902966db54bf38aa505665828929020181207edad0e69052f705b07331e32c00b5aae,1549491783878.d4a172571b229e74109df6c03a959588.','info:seqnumDuringOpen'
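Typing the same long rowkey seven times is error-prone. A small bash generator can print the backup `get` and the six `delete` commands for one rowkey. Those commands can then be piped into `hbase shell`, the same way the meta scan is run later in this post. The function names and the backup file name in the note below are hypothetical.

```shell
# Print the hbase shell command that dumps one hbase:meta row (the backup step).
meta_backup_command() {
  printf "get 'hbase:meta','%s'\n" "$1"
}

# Print one delete per info: column removed in the manual procedure above.
# Assumes the rowkey contains no single quote.
meta_delete_commands() {
  local col
  for col in regioninfo server serverstartcode sn state seqnumDuringOpen; do
    printf "delete 'hbase:meta','%s','info:%s'\n" "$1" "$col"
  done
}
```

For example, run `meta_backup_command "$RK" | hbase shell > meta_backup.txt` and check meta_backup.txt. Then run `meta_delete_commands "$RK" | hbase shell`. If the row carries other columns as well, for instance info:splitA/info:splitB on a split parent, use the hbase shell command `deleteall 'hbase:meta','<rowkey>'`, which removes the whole row instead.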

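After refreshing the HBase web UI, region d4a172571b229e74109df6c03a959588 no longer appears.

Running hbck -summary again shows that the number of inconsistencies dropped from 2 to 1.

Then tried:

  • a manual scan in the hbase shell
  • re-querying the value through the API

Both returned results successfully.

After also deleting the hbase:meta entries for region 1513a155f24ac2df20b4797077ed6ae0, re-testing with the sample data returned correct results.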

Analysis

  • d4a172571b229e74109df6c03a959588
  • 1513a155f24ac2df20b4797077ed6ae0

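Both regions had been split. The daughter regions' data was created, registered in hbase:meta and brought online, and the daughters served requests normally. The parent regions' data was then deleted, but the parents' entries in hbase:meta were never updated (the reason is still under investigation).

As a result, lookups for data in those key ranges were still routed to the parent regions. Their data had already been deleted, so the reads failed.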

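The following command lists every region of this table recorded in hbase:meta. The trailing comma in the prefix keeps out other tables whose names merely start with phm_default_lightunit, such as backups. Use the list to check whether other regions' key ranges already cover the range of the broken regions:

echo "scan 'hbase:meta',{FILTER => org.apache.hadoop.hbase.filter.PrefixFilter.new(org.apache.hadoop.hbase.util.Bytes.toBytes('phm_default_lightunit,'))}" | hbase shell | awk '{print $1}' | grep phm_default_lightunit | sort -u

This shows whether the missing regions' key ranges are already being served by other regions.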
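Checking coverage by eye gets tedious with many regions. First pull each region's STARTKEY, ENDKEY and ENCODED values out of info:regioninfo. That extraction is not shown, and the tab-separated input format below is an assumption. A short awk script can then check that the regions tile the key space. Sorted by start key, each region should start exactly where the previous one ends. The first region should start at the empty key, and the last should end at it. A parent region left behind after a split, as in this incident, shows up as an overlap with its daughters. Keys are compared byte-wise under LC_ALL=C, which matches how HBase orders printable row keys. The function name is hypothetical.

```shell
# Input on stdin, one region per line (hypothetical format):
#   <startkey> TAB <endkey> TAB <encoded name>
# An empty start/end key means the start/end of the table.
# Prints HOLE/OVERLAP lines, or OK; exits non-zero if anything is wrong.
check_region_chain() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
  LC_ALL=C awk -F '\t' '
    {
      sk = $1 ""; ek = $2 ""          # force string (not numeric) comparison
      if (NR == 1) {
        if (sk != "") { printf "HOLE: keys before [%s] are not covered (first region %s)\n", sk, $3; bad = 1 }
      } else if (cover_end == "" || sk < cover_end) {
        # cover_end == "" means an earlier region already reaches the end of the table
        printf "OVERLAP: region %s starts at [%s], inside region %s\n", $3, sk, cover_enc; bad = 1
      } else if (sk > cover_end) {
        printf "HOLE: keys [%s, %s) are not covered (between %s and %s)\n", cover_end, sk, cover_enc, $3; bad = 1
      }
      # Track the furthest end key seen so far; "" stands for the end of the table.
      if (NR == 1 || (cover_end != "" && (ek == "" || ek > cover_end))) { cover_end = ek; cover_enc = $3 }
    }
    END {
      if (NR == 0) { print "NO REGIONS"; exit 1 }
      if (cover_end != "") { printf "HOLE: keys from [%s] to the end of the table are not covered\n", cover_end; bad = 1 }
      if (!bad) print "OK"
      exit (bad ? 1 : 0)
    }'
}
```

Tracking the furthest end key, rather than only the previous region's end, means both daughters are reported when a stale parent still spans the whole range.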

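A NotServingRegionException most likely means something went wrong during a region transition. hbase:meta and the region records it holds help pinpoint the problem. Knowing the meta table's structure and storage rules is therefore a very effective tool.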

After the repair, scanning the table returns data again:

hbase(main):005:0> scan "phm_default_lightunit",{LIMIT => 3}
ROW                                                     COLUMN+CELL
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:bulb_focpla_resis, timestamp=1606374487519, value=0.334972
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:gw_id, timestamp=1606374487519, value=1207924082503057408
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:msg_label, timestamp=1606374487519, value=0
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:msg_time, timestamp=1606374487519, value=1606374551432
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:row_key, timestamp=1606374487519, value=1207924082503057408{}1207939433961881600{}1606374551432
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:sensor_id, timestamp=1606374487519, value=1207939433961881600
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:sky_light_state, timestamp=1606374487519, value=0
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:slight_d_out_vol, timestamp=1606374487519, value=0.836828
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:slight_l_out_vol, timestamp=1606374487519, value=0.818128
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:slight_out_ic, timestamp=1606374487519, value=0.496074
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:slight_out_lm, timestamp=1606374487519, value=0
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:slight_state, timestamp=1606374487519, value=0
 1207924082503057408{}1207939433961881600{}1606374551432 column=d:slight_type, timestamp=1606374487519, value=0
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:bulb_focpla_resis, timestamp=1606374531009, value=0.190709
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:gw_id, timestamp=1606374531009, value=1207924082503057408
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:msg_label, timestamp=1606374531009, value=0
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:msg_time, timestamp=1606374531009, value=1606374594921
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:row_key, timestamp=1606374531009, value=1207924082503057408{}1207939433961881600{}1606374594921
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:sensor_id, timestamp=1606374531009, value=1207939433961881600
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:sky_light_state, timestamp=1606374531009, value=0
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:slight_d_out_vol, timestamp=1606374531009, value=0.244069
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:slight_l_out_vol, timestamp=1606374531009, value=0.912391
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:slight_out_ic, timestamp=1606374531009, value=0.566164
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:slight_out_lm, timestamp=1606374531009, value=0
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:slight_state, timestamp=1606374531009, value=0
 1207924082503057408{}1207939433961881600{}1606374594921 column=d:slight_type, timestamp=1606374531009, value=0
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:bulb_focpla_resis, timestamp=1606374574792, value=0.284280
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:gw_id, timestamp=1606374574792, value=1207924082503057408
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:msg_label, timestamp=1606374574792, value=0
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:msg_time, timestamp=1606374574792, value=1606374638704
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:row_key, timestamp=1606374574792, value=1207924082503057408{}1207939433961881600{}1606374638704
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:sensor_id, timestamp=1606374574792, value=1207939433961881600
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:sky_light_state, timestamp=1606374574792, value=0
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:slight_d_out_vol, timestamp=1606374574792, value=0.879013
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:slight_l_out_vol, timestamp=1606374574792, value=0.986644
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:slight_out_ic, timestamp=1606374574792, value=0.521261
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:slight_out_lm, timestamp=1606374574792, value=0
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:slight_state, timestamp=1606374574792, value=0
 1207924082503057408{}1207939433961881600{}1606374638704 column=d:slight_type, timestamp=1606374574792, value=0
3 row(s) in 0.2720 seconds
