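Preface

Taobao's Mdrill is billed as very powerful, and the hardware it normally runs on is just as powerful. For learners, though, a virtual machine is the most economical choice. This article only explains how to install and debug Mdrill on a clean virtual machine (CentOS 6.4). The underlying principles are not covered; for those, see the official document INSTALL.docx.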

Prerequisites

  1. CentOS 6.4 Final x86_64, Linux username: mdrill

  2. jdk1.6

  3. hadoop cdh3u3

  4. zookeeper-3.4.5

  5. zeromq-2.1.7

  6. jzmq 2.1.0

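  7. mdrill installation package, 0.20.8.3 standard edition

  8. eclipse-jee-kepler-SR1-win32

  9. Internet access from the machine

  10. VMware 9.0

Note that the versions above must match exactly; otherwise installation or runtime errors may occur.

For the installation files, contact 292672967@qq.com. The file list is as follows:

Dependency packages required for installation

Eclipse project containing the data import and JDBC call source code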

eclipse-jee-kepler-SR1-win32-with-hadoop-zookeeper-plugin.rar

jdk_1.6.0_31.tar.gz

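Mdrill单机版安装说明.docx (Mdrill standalone installation guide)

A virtual machine with mdrill already installed (VMware 9.0)

淘云盘打包下载_20140218_14_49.zip

1. Install CentOS 6.4 Final

Give the virtual machine at least 1 GB of memory. With VMware 9.0 you only need to enter the username mdrill at the very beginning, click Next, and wait for the installation to finish. When it completes, the CentOS desktop should appear.

Disable the system firewall

System -> Administration -> Firewall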


Configure hosts

vi /etc/hosts

Change the contents to:

127.0.0.1   localhost

191.168.3.149 mdrill

Replace 191.168.3.149 with the IP address of your own machine.
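
To confirm that the hosts entry took effect, a small lookup can be run against the file. The following is only a sketch: `lookup_host` is a hypothetical helper name, not part of Mdrill or CentOS, and it simply prints the first address mapped to a name in a hosts-format file.

```shell
# lookup_host FILE NAME: print the first IP address mapped to NAME in a
# hosts-format file; return non-zero when NAME is not listed.
# (Hypothetical helper for illustration only.)
lookup_host() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        {
            # field 1 is the address, the remaining fields are host names
            for (i = 2; i <= NF; i++) {
                if ($i == name) { print $1; found = 1; exit }
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# On the VM:
#   lookup_host /etc/hosts mdrill
```

If the command prints nothing for mdrill, the hosts file still needs the second line shown above.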

Configure the hostname

vi /etc/sysconfig/network

Change the network file to:

NETWORKING=yes

HOSTNAME=mdrill

 

Reboot

reboot

2. Install the JDK

Copy the downloaded JDK package to the /home/mdrill directory on the virtual machine:

[mdrill@mdrill ~]$ tar -xvf   jdk_1.6.0_31.tar.gz

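After the command finishes, the JDK installation folder is visible.

Configure the JAVA_HOME environment variable.

If you are not comfortable with vi, gedit works as well.

[mdrill@mdrill   ~]$ vi .bashrc

Add the following: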

JAVA_HOME=/home/mdrill/jdk_1.6.0_31

export JAVA_HOME

PATH=$PATH:$JAVA_HOME/bin

export PATH

Apply the configuration:

[mdrill@mdrill   ~]$ source .bashrc

Check the Java version:

[mdrill@mdrill   ~]$ java -version

java version "1.6.0_31"

Java(TM) SE Runtime Environment (build   1.6.0_31-b04)

Java HotSpot(TM) 64-Bit Server VM (build   20.6-b01, mixed mode)

If java version "1.6.0_31" appears, the installation succeeded.
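
Because this guide depends on exact versions, the check can also be scripted. The sketch below assumes the output format shown above; `java_version_of` is a hypothetical helper name.

```shell
# java_version_of: read `java -version` output on stdin and print the
# quoted version string from the "java version" line.
# (Hypothetical helper for illustration only.)
java_version_of() {
    sed -n 's/^java version "\(.*\)".*$/\1/p' | head -n 1
}

# On the VM (java -version writes to stderr, hence 2>&1):
#   [ "$(java -version 2>&1 | java_version_of)" = "1.6.0_31" ] && echo "JDK OK"
```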

3. Install Hadoop CDH3u3

First copy hadoop-0.20.2-cdh3u3.tar.gz to /home/mdrill.

Extract the Hadoop package:

[mdrill@mdrill   ~]$ tar -xvf hadoop-0.20.2-cdh3u3.tar.gz

3.1 Configure environment variables

[mdrill@mdrill ~]$ vi .bashrc

Add the following:

HADOOP_HOME=/home/mdrill/hadoop-0.20.2-cdh3u3

PATH=$PATH:$HADOOP_HOME/bin

export PATH

Apply the configuration:

[mdrill@mdrill ~]$ source .bashrc

Then verify:

[mdrill@mdrill ~]$ hadoop

Usage: hadoop [--config confdir] COMMAND

where COMMAND is one of:

    namenode -format     format the   DFS filesystem

    secondarynamenode    run the DFS   secondary namenode

    namenode             run the DFS   namenode

…..

If "Usage: hadoop [--config confdir] COMMAND" appears, the configuration succeeded.
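
The .bashrc edits in this guide can also be made without an editor. The sketch below uses a hypothetical helper, `append_once`, which adds a line only when that exact line is missing, so the step can be repeated without creating duplicates.

```shell
# append_once FILE LINE: add LINE to FILE only if that exact line is not
# already present. (Hypothetical helper for illustration only.)
append_once() {
    touch "$1"
    # -x: match whole lines, -F: treat LINE as a fixed string, not a regex
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# On the VM (single quotes keep $PATH and $HADOOP_HOME unexpanded):
#   append_once ~/.bashrc 'HADOOP_HOME=/home/mdrill/hadoop-0.20.2-cdh3u3'
#   append_once ~/.bashrc 'PATH=$PATH:$HADOOP_HOME/bin'
```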

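3.2 Configure passwordless SSH login to the local machine

Run ssh-keygen and press Enter at every prompt. When it finishes, the output looks like this: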

[mdrill@mdrill ~]$ ssh-keygen

Generating public/private rsa key pair.

Enter file in which to save the key   (/home/mdrill/.ssh/id_rsa):

Enter passphrase (empty for no   passphrase):

Enter same passphrase again:

Your identification has been saved in   /home/mdrill/.ssh/id_rsa.

Your public key has been saved in   /home/mdrill/.ssh/id_rsa.pub.

The key fingerprint is:

6e:0c:6f:94:09:7c:f3:44:4a:57:ce:ba:cb:aa:7d:e5   mdrill@mdrill.localdomain

The key's randomart image is:

+--[ RSA 2048]----+

|          . o..    |

|       . . + o     |

|        o + . o    |

|         o * .     |

|        . S o      |

|         *   ..    |

|          * .o     |

|         + ...E    |

|        ..ooo      |

+-----------------+

[mdrill@mdrill ~]$

Enter the .ssh directory:

[mdrill@mdrill ~]$ cd .ssh

[mdrill@mdrill .ssh]$ ls

id_rsa    id_rsa.pub

[mdrill@mdrill .ssh]$

Create the authorized_keys file:

[mdrill@mdrill .ssh]$ cat id_rsa.pub   >>authorized_keys

[mdrill@mdrill .ssh]$ ls

authorized_keys  id_rsa    id_rsa.pub

Set permissions on the authorized_keys file:

[mdrill@mdrill .ssh]$ chmod 700   authorized_keys

[mdrill@mdrill .ssh]$ ls

authorized_keys  id_rsa    id_rsa.pub

Test SSH:

[mdrill@mdrill .ssh]$ ssh mdrill

The authenticity of host 'mdrill (::1)'   can't be established.

RSA key fingerprint is   15:8f:e0:b5:37:43:60:0b:b1:fb:32:0a:a4:3b:6c:8d.

Are you sure you want to continue   connecting (yes/no)? yes

Warning: Permanently added 'mdrill' (RSA)   to the list of known hosts.

At the prompt "Are you sure you want to continue connecting (yes/no)", type yes and press Enter.

If the "ssh mdrill" command no longer asks for a password, the configuration succeeded.
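
The authorized_keys step can be wrapped so that running it twice does not duplicate the key. This is a sketch only: `authorize_key` is a hypothetical helper, and it sets mode 600 (the conventional mode for authorized_keys) rather than the 700 used above; both modes keep the file closed to group and other users.

```shell
# authorize_key PUBKEY_FILE AUTH_FILE: append the public key to AUTH_FILE
# unless an identical line is already there, then restrict permissions.
# (Hypothetical helper for illustration only.)
authorize_key() {
    pub=$1
    auth=$2
    touch "$auth"
    # -x: whole-line match, -F: fixed strings, -f: read patterns from PUBKEY_FILE
    if ! grep -qxFf "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 600 "$auth"
}

# On the VM:
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```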

3.3 Configure Hadoop services

Edit Hadoop's hadoop-env.sh and set:
export JAVA_HOME=/home/mdrill/jdk_1.6.0_31

Three files under /home/mdrill/hadoop-0.20.2-cdh3u3/conf need to be modified: core-site.xml, mapred-site.xml, and hdfs-site.xml.

Change core-site.xml to:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl"   href="configuration.xsl"?>

<!-- Put site-specific property   overrides in this file. -->

<configuration>

 <property>

     <name>hadoop.tmp.dir</name>

     <value>/home/mdrill/tmp</value>

     <description>A base for other temporary   directories.</description>

    </property>

<!-- file system properties -->

     <property>

       <name>fs.default.name</name>

       <value>hdfs://mdrill:9000</value>

      </property>

</configuration>

Change mapred-site.xml to:

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl"   href="configuration.xsl"?>

<!-- Put site-specific property   overrides in this file. -->

<configuration>

 <property>

          <name>mapred.job.tracker</name>

          <value>mdrill:9001</value>

      </property>

</configuration>

Change hdfs-site.xml to:

<?xml version="1.0"?>

<?xml-stylesheet   type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property   overrides in this file. -->

<configuration>

     <property>

      <name>dfs.replication</name>

      <value>1</value>

     </property>

</configuration>

Note: configuration names and their values must not contain spaces.
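
A stray space is easy to miss by eye, so a quick scan of the three files can help. The sketch below only catches leading or trailing whitespace inside single-line <name> and <value> elements, which is an assumption about the most likely slip; `find_padded_values` is a hypothetical helper name.

```shell
# find_padded_values FILE...: print (with line numbers) every <name> or
# <value> element whose text starts or ends with whitespace.
# Exit status 0 means at least one suspicious line was found.
# (Hypothetical helper for illustration only.)
find_padded_values() {
    grep -nE '<(name|value)>([[:space:]]+[^<]*|[^<]*[[:space:]]+)</(name|value)>' "$@"
}

# On the VM:
#   cd /home/mdrill/hadoop-0.20.2-cdh3u3/conf
#   find_padded_values core-site.xml mapred-site.xml hdfs-site.xml
```

No output means none of the checked elements has padding.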

3.4 Start Hadoop

After configuration, start Hadoop by running start-all.sh:

[mdrill@mdrill conf]$ start-all.sh

starting namenode, logging to ……

 

After startup, check the Hadoop processes with jps:

[mdrill@mdrill conf]$ jps

14231 JobTracker

14155 SecondaryNameNode

13931 NameNode

14037 DataNode

14366 Jps

14349 TaskTracker

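Seeing the five Hadoop processes above (JobTracker, SecondaryNameNode, NameNode, DataNode, and TaskTracker; Jps itself does not count) confirms that Hadoop started successfully.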
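
The same check can be automated by filtering the jps output. In this sketch, `missing_daemons` is a hypothetical helper that lists whichever of the five daemons is absent.

```shell
# missing_daemons: read `jps` output on stdin and print each expected
# Hadoop daemon that is not running; prints nothing when all five are up.
# (Hypothetical helper for illustration only.)
missing_daemons() {
    running=$(awk '{ print $2 }')          # second column is the class name
    for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # -x avoids counting SecondaryNameNode as a match for NameNode
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

# On the VM:
#   jps | missing_daemons
```

If a daemon is listed as missing, its log file under the Hadoop logs directory is the first place to look.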

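Opening mdrill:50070 in a browser shows the HDFS NameNode web page.

3.5 Use the Hadoop Eclipse plugin to browse and manage HDFS directories

Open Eclipse and click the icon in the upper-right corner to bring up the Open Perspective dialog.

Select Map/Reduce; the Hadoop connection configuration view appears.

Create a new configuration.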

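Use your own IP address, and take care not to swap the ports: with the configuration files above, the Map/Reduce master port is 9001 and the DFS master port is 9000.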

Once the configuration is created, the HDFS directory tree appears.

4. Install dependency packages

4.1 Install libtool

Use the command "yum -y install libtool":

[mdrill@mdrill zeromq-2.1.7]$ su

Password:

[root@mdrill zeromq-2.1.7]# yum -y   install libtool

Switching to root requires a password, which is the same as the mdrill user's password.

4.2 Install gcc

yum -y install gcc-c++

4.3 Install uuid-devel

yum -y install uuid-devel

yum -y install libuuid-devel

 

5. Install zeromq and jzmq

5.1 Install zeromq

Copy zeromq-2.1.7.tar.gz to /home/mdrill.

cd /home/mdrill

tar -xvf zeromq-2.1.7.tar.gz

cd zeromq-2.1.7

./autogen.sh

./configure

make

make install

5.2 Install jzmq

Copy jzmq-master.zip to /home/mdrill.

su mdrill

cd /home/mdrill

unzip jzmq-master.zip

cd jzmq-master

./autogen.sh

./configure

make

su

make install

5.3 Add LD_LIBRARY_PATH

[mdrill@mdrill perf]$ vi   /home/mdrill/.bashrc

Add one line:

export LD_LIBRARY_PATH=/usr/local/lib

source /home/mdrill/.bashrc
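
The export above replaces any LD_LIBRARY_PATH that was already set. On a clean VM that is harmless; if other libraries are needed later, a variant that keeps the existing value may be preferable. The following is a sketch, with `add_lib_dir` as a hypothetical helper name.

```shell
# add_lib_dir DIR: put DIR at the front of LD_LIBRARY_PATH unless it is
# already listed, keeping whatever was there before.
# (Hypothetical helper for illustration only.)
add_lib_dir() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;    # DIR is already on the path, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

add_lib_dir /usr/local/lib
```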

5.4 Verify jzmq

cd /home/mdrill/jzmq-master/perf

sh local_lat.sh   tcp://127.0.0.1:5000 1 100

Open a second console:

cd /home/mdrill/jzmq-master/perf

sh remote_lat.sh   tcp://127.0.0.1:5000 1 100

message size: 1 [B]

roundtrip count: 100

mean latency: 275.0 [us]

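If you see "message size: 1 [B]" and the lines that follow it, the configuration succeeded.

6. Install ZooKeeper

6.1 Configure the installation directory and environment variables

Copy zookeeper-3.4.5.tar.gz to /home/mdrill: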

cd /home/mdrill

tar -xvf    zookeeper-3.4.5.tar.gz

cd zookeeper-3.4.5

Configure environment variables:

vi   /home/mdrill/.bashrc

Add the following to .bashrc:

ZOOKEEPER_HOME=/home/mdrill/zookeeper-3.4.5

export ZOOKEEPER_HOME

PATH=$PATH:$ZOOKEEPER_HOME/bin

export PATH

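Apply the configuration:

source /home/mdrill/.bashrc

Verify (the zoo.cfg errors below are expected, because the file is only created in the next step):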

[mdrill@mdrill ~]$ zkServer.sh

JMX enabled by default

Using config:   /home/mdrill/zookeeper-3.4.5/bin/../conf/zoo.cfg

grep: /home/mdrill/zookeeper-3.4.5/bin/../conf/zoo.cfg:   No such file or directory

mkdir: cannot create directory `': No   such file or directory

Usage:   /home/mdrill/zookeeper-3.4.5/bin/zkServer.sh   {start|start-foreground|stop|restart|status|upgrade|print-cmd}

[mdrill@mdrill ~]$

6.2 Configure the ZooKeeper service

cd /home/mdrill/zookeeper-3.4.5/conf

cp zoo_sample.cfg zoo.cfg

vi zoo.cfg

Change line 12 of zoo.cfg to:

dataDir=/home/mdrill/zookeeperdata

Append at the end:

server.1=mdrill:2888:3888

After configuration, the zoo.cfg file looks like this:

# The number of milliseconds of each tick

tickTime=2000

# The number of ticks that the initial

# synchronization phase can take

initLimit=10

# The number of ticks that can pass   between

# sending a request and getting an   acknowledgement

syncLimit=5

# the directory where the snapshot is   stored.

# do not use /tmp for storage, /tmp here   is just

# example sakes.

dataDir=/home/mdrill/zookeeperdata

# the port at which the clients will   connect

clientPort=2181

#

# Be sure to read the maintenance section   of the

# administrator guide before turning on   autopurge.

#

#   http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance

#

# The number of snapshots to retain in   dataDir

#autopurge.snapRetainCount=3

# Purge task interval in hours

# Set to "0" to disable auto   purge feature

#autopurge.purgeInterval=1

server.1=mdrill:2888:3888

Create the ZooKeeper data directory:

mkdir /home/mdrill/zookeeperdata

vi /home/mdrill/zookeeperdata/myid

Write 1 into the file "myid".
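
The number in myid has to correspond to a server.N line in zoo.cfg (here server.1). The sketch below checks that pairing; `myid_matches` is a hypothetical helper name, and the paths in the usage comment are the ones used in this guide.

```shell
# myid_matches CFG DATADIR: succeed when the number stored in DATADIR/myid
# has a matching server.N line in the zoo.cfg file CFG.
# (Hypothetical helper for illustration only.)
myid_matches() {
    # strip the newline and any stray whitespace from the id
    id=$(tr -d '[:space:]' < "$2/myid") || return 1
    [ -n "$id" ] && grep -q "^server\.$id=" "$1"
}

# On the VM (echo writes the id without opening an editor):
#   echo 1 > /home/mdrill/zookeeperdata/myid
#   myid_matches /home/mdrill/zookeeper-3.4.5/conf/zoo.cfg /home/mdrill/zookeeperdata && echo "myid OK"
```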

6.3 Start and test ZooKeeper

Start

Use the command: zkServer.sh start

[mdrill@mdrill ~]$ zkServer.sh start

JMX enabled by default

Using config:   /home/mdrill/zookeeper-3.4.5/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED

[mdrill@mdrill ~]$ jps

35079 Jps

35049 QuorumPeerMain

If jps shows the QuorumPeerMain process, ZooKeeper started successfully.

Check the status

[mdrill@mdrill ~]$ zkServer.sh status

JMX enabled by default

Using config:   /home/mdrill/zookeeper-3.4.5/bin/../conf/zoo.cfg

Mode: standalone

[mdrill@mdrill ~]$

Start a client to test

Use the command: zkCli.sh -server mdrill:2181

[mdrill@mdrill conf]$ zkCli.sh -server mdrill:2181

Connecting to mdrill:2181

2014-03-13 03:03:22,880 [myid:] -   INFO  [main:Environment@100] - Client   environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT

……

2014-03-13 03:03:23,117 [myid:] -   INFO  [main-SendThread(mdrill:2181):ClientCnxn$SendThread@849]   - Socket connection established to mdrill/0:0:0:0:0:0:0:1:2181, initiating   session

[zk: mdrill:2181(CONNECTING) 0]   2014-03-13 03:03:23,366 [myid:] - INFO    [main-SendThread(mdrill:2181):ClientCnxn$SendThread@1207] - Session   establishment complete on server mdrill/0:0:0:0:0:0:0:1:2181, sessionid =   0x144badfe5a60000, negotiated timeout = 30000

 

WATCHER::

 

WatchedEvent state:SyncConnected   type:None path:null

To be continued
