Hadoop troubleshooting notes: DataNodes fail to come up after startup, or come up and then disappear after a while
A few days ago the Hadoop test cluster I run inside virtual machines broke. Starting the cluster from the NameNode appeared to work "normally": the NameNode and all the DataNode processes came up. But when I logged in to the HDFS admin page on port 50070, I saw the situation in the screenshot below (image not preserved),
and the DataNode process on the DataNode hosts had crashed. Searching Baidu and Google mostly turned up the same advice: the namespaceIDs recorded by the NameNode and the DataNodes are inconsistent, edit the config files, and so on.
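For reference, that namespaceID advice boils down to comparing the VERSION files in the NameNode's and DataNodes' storage directories. A minimal sketch of the check, as a shell function; the default paths under hadoop.tmp.dir shown in the comment are assumptions, so adjust them to your own dfs.name.dir / dfs.data.dir:

```shell
# Compare the namespaceID recorded by the NameNode and a DataNode.
# Takes the two VERSION file paths as arguments.
check_namespace_id() {
  nn_id=$(grep '^namespaceID=' "$1" | cut -d= -f2)
  dn_id=$(grep '^namespaceID=' "$2" | cut -d= -f2)
  [ "$nn_id" = "$dn_id" ] && echo "match: $nn_id" || echo "mismatch: nn=$nn_id dn=$dn_id"
}

# On a real 1.x cluster the files typically live under hadoop.tmp.dir, e.g.:
# check_namespace_id /tmp/hadoop/dfs/name/current/VERSION /tmp/hadoop/dfs/data/current/VERSION
```

In my case, though, this was not the problem.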
None of those attempts solved the problem, so I went through the log on slave1.
The DataNode log on slaves1:
2013-07-16 14:23:53,531 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = slaves1/192.168.20.136
STARTUP_MSG: args = []
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
2013-07-16 14:23:56,389 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2013-07-16 14:23:56,464 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2013-07-16 14:23:56,465 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2013-07-16 14:23:56,466 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2013-07-16 14:23:57,422 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2013-07-16 14:23:57,437 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-07-16 14:23:58,490 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-07-16 14:24:00,782 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 0 time(s).
2013-07-16 14:24:01,785 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 1 time(s).
2013-07-16 14:24:02,786 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 2 time(s).
2013-07-16 14:24:03,788 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 3 time(s).
2013-07-16 14:24:04,803 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 4 time(s).
2013-07-16 14:24:05,805 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 5 time(s).
2013-07-16 14:24:06,808 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 6 time(s).
2013-07-16 14:24:07,847 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 7 time(s).
2013-07-16 14:24:08,849 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 8 time(s).
2013-07-16 14:24:09,850 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 9 time(s).
2013-07-16 14:24:09,852 INFO org.apache.hadoop.ipc.RPC: Server at master/192.168.20.135:9000 not available yet, Zzzzz...
2013-07-16 14:24:11,855 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 0 time(s).
2013-07-16 14:24:12,856 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/192.168.20.135:9000. Already tried 1 time(s).
2013-07-16 14:24:30,484 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Incompatible build versions: namenode BV = ; datanode BV = 1393290
2013-07-16 14:24:30,775 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible build versions: namenode BV = ; datanode BV = 1393290
at org.apache.hadoop.hdfs.server.datanode.DataNode.handshake(DataNode.java:566)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:362)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)
2013-07-16 14:24:30,894 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at slaves1/192.168.20.136
************************************************************/
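On a busy node a DataNode log like the one above runs to hundreds of lines; a quick way to surface the fatal line is to grep for the FATAL/ERROR level. A small helper, assuming the 1.x default log location under $HADOOP_HOME/logs:

```shell
# Print the first few FATAL/ERROR lines from a DataNode log file.
find_datanode_failure() {
  grep -E ' (FATAL|ERROR) ' "$1" | head -n 3
}

# e.g. find_datanode_failure $HADOOP_HOME/logs/hadoop-hadoop-datanode-slaves1.log
```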
From the retries it looked like the DataNode could not reach the NameNode, yet ping still worked fine, which meant the problem was inside the cluster itself rather than the network.
While scratching my head over it, I suddenly remembered that, following a book, I had earlier compiled the Hadoop Eclipse plugin on the NameNode.
The second step of that procedure was running ant compile in the Hadoop installation directory.
Checking the Hadoop version in a terminal with the hadoop version command gave:
Hadoop 1.0.4-SNAPSHOT
Subversion -r
Compiled by jelon on Mon Jul 15 19:44:03 CST 2013
From source with checksum a34c7c3a1218f2023cb9ced9cd6033c0
The NameNode's Hadoop version had become Hadoop 1.0.4-SNAPSHOT, while the DataNodes were still running Hadoop 1.0.4. At the time I did not know what the extra SNAPSHOT suffix meant (it marks a locally compiled, unreleased build, which is exactly what ant compile produces). The fix was:
1. Copy the NameNode's entire Hadoop installation directory to every DataNode, so all nodes run the same build.
2. Delete everything under the tmp directory set by hadoop.tmp.dir in core-site.xml.
3. Reformat HDFS.
4. Restart Hadoop and log in to the 50070 admin page again.
The cluster started successfully this time, and checking with jps on the DataNode hosts confirmed the DataNode process stayed up.
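The recovery steps above can be sketched as a shell function. It is deliberately defined but not invoked, since reformatting destroys all HDFS data; the hostname slaves1, the install path /usr/local/hadoop, and hadoop.tmp.dir=/tmp/hadoop are assumptions drawn from this post's setup, so adjust them to your cluster before running:

```shell
# Sketch of the recovery: resync the install dir, wipe hadoop.tmp.dir, reformat, restart.
resync_and_reformat() {
  scp -r /usr/local/hadoop slaves1:/usr/local/   # copy the NameNode's install dir to the DataNode
  ssh slaves1 'rm -rf /tmp/hadoop/*'             # clear hadoop.tmp.dir on the DataNode
  rm -rf /tmp/hadoop/*                           # clear it on the NameNode too
  hadoop namenode -format                        # reformat HDFS (DESTROYS all HDFS data)
  start-all.sh                                   # restart the cluster
}
# Review the paths first, then run: resync_and_reformat
```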