hadoop

I originally planned to set up Hadoop with Docker, but I ran into the known systemctl problems of the centos/7 image inside Docker, and the image also ships without ssh, so I switched to running the cluster on VirtualBox VMs managed by Vagrant.
  • Environment

    • virtual box

    • centos box 文件

    • jdk

    • hadoop

    • Network

      • gateway: 192.168.33.1

      • netmask: 255.255.255.0

      • node IPs (a quick connectivity check is sketched after this list):

        hdp-node-01  192.168.33.101
        hdp-node-02  192.168.33.102
        hdp-node-03  192.168.33.103
    • Config files and code are on GitHub
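
    • Quick connectivity check (my own addition, not from the repo) to confirm the private network and the /etc/hosts entries once the VMs are up:

    for h in hdp-node-01 hdp-node-02 hdp-node-03; do
        ping -c 1 "$h" > /dev/null && echo "$h reachable"
    done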

  • Steps

    • Start the three CentOS VMs
    • Configuration files (edited on hdp-node-01, then copied straight to the other nodes; minimal sketches follow this list)
      • core-site.xml
      • hadoop-env.sh
      • hdfs-site.xml
      • mapred-site.xml
      • slaves
      • yarn-env.sh
      • yarn-site.xml
    • Update the environment variables in .bashrc
    • Set up ssh access to the other nodes
      • generate a key pair with ssh-keygen
      • push the public key to hdp-node-01, 02, and 03 with ssh-copy-id
    • Test Hadoop
      • requires root or sudo
      • wordcount example
      • ./start.sh
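
    • Sketch of the config files (illustrative only -- the real files are in env/ on GitHub; hdp-node-01 as NameNode and a replication factor of 2 are my assumptions):

    # illustrative sketch, not the repo's actual env/ contents
    cat > core-site.xml <<'EOF'
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hdp-node-01:9000</value>
      </property>
    </configuration>
    EOF

    cat > hdfs-site.xml <<'EOF'
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
    </configuration>
    EOF

    cat > slaves <<'EOF'
    hdp-node-01
    hdp-node-02
    hdp-node-03
    EOF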
  • See the README for usage

  • vagrant

    • file tree
    liudeMacBook-Pro:hadoop liu$ tree
    .
    ├── README.md
    ├── Vagrantfile
    ├── centos_7_virtualbox.box
    ├── debug
    ├── env
    │   ├── core-site.xml
    │   ├── hadoop-env.sh
    │   ├── hdfs-site.xml
    │   ├── mapred-site.xml
    │   ├── slaves
    │   ├── yarn-env.sh
    │   └── yarn-site.xml
    ├── hadoop-2.8.5.tar.gz
    ├── jdk-11_linux-x64_bin.tar.gz
    ├── sshd_config
    └── test
        ├── somewords.txt
        └── start.sh
    
    • Vagrantfile
    
    # -*- mode: ruby -*-
    # vi: set ft=ruby :
    
    #
    # provisioning script shared by the cluster nodes
    #
    # author: lhy
    # time: 2018.8.25
    #
    $clusters_script = <<-SCRIPT
    #!/bin/bash
    
    cd /home/vagrant   # the file provisioners upload everything here
    
    #---hosts---
    cat >> /etc/hosts <<EOF
    
    192.168.33.101  hdp-node-01
    192.168.33.102  hdp-node-02
    192.168.33.103  hdp-node-03
    
    EOF
    
    #---env---
    # quote the delimiter so $PATH is expanded when .bashrc is sourced,
    # not at provision time
    cat >> /root/.bashrc <<'EOF'
    export HADOOP_HOME=/home/vagrant/hadoop
    export JAVA_HOME=/home/vagrant/jdk
    export PATH=$PATH:/home/vagrant/hadoop/bin/
    EOF
    
    cat >> .bashrc <<'EOF'
    export HADOOP_HOME=/home/vagrant/hadoop
    export JAVA_HOME=/home/vagrant/jdk
    export PATH=$PATH:/home/vagrant/hadoop/bin/
    EOF
    
    #---hadoop---
    tar -zxf hadoop-2.8.5.tar.gz
    mv hadoop-2.8.5 hadoop
    mv env/* hadoop/etc/hadoop/
    rm hadoop-2.8.5.tar.gz
    
    #---jdk---
    tar -zxf jdk-11_linux-x64_bin.tar.gz
    mv jdk-11 jdk
    rm jdk-11_linux-x64_bin.tar.gz
    
    
    #---ssh---
    mv /home/vagrant/sshd_config /etc/ssh/sshd_config
    systemctl restart sshd.service
    
    SCRIPT
    
    
    Vagrant.configure("2") do |config|
    
        (1..3).each do |i|
            config.vm.define "hdp#{i}" do |node|
    
                # box for the VM
                node.vm.box = "centos/7"
                # hostname
                node.vm.hostname = "hdp-node-0#{i}"
                # static IP on the private network
                node.vm.network "private_network", ip: "192.168.33.#{100+i}"
    
                # shared folder between host and VM (disabled)
                #node.vm.synced_folder "~/Desktop/share", "/home/vagrant/share"
                # upload the dependencies (node.vm, not config.vm, so each
                # file provisioner is registered once per machine instead of
                # being re-added to the top-level config on every iteration)
                node.vm.provision "file", source: "./jdk-11_linux-x64_bin.tar.gz", destination: "/home/vagrant/jdk-11_linux-x64_bin.tar.gz"
                node.vm.provision "file", source: "./hadoop-2.8.5.tar.gz", destination: "/home/vagrant/hadoop-2.8.5.tar.gz"
                node.vm.provision "file", source: "./sshd_config", destination: "/home/vagrant/sshd_config"
                node.vm.provision "file", source: "./env", destination: "/home/vagrant/env"
                node.vm.provision "file", source: "./test", destination: "/home/vagrant/test"
    
                # VirtualBox-specific settings
                node.vm.provider "virtualbox" do |v|
                    # VM name
                    v.name = "hdp#{i}"
                    # memory in MB
                    v.memory = 1024
                    # CPU count
                    v.cpus = 1
                end
    
                # install and configure the software with the shell script above
                node.vm.provision "shell", inline: $clusters_script
            end
        end
    end
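
    • bringing the cluster up (a sketch; the machine names hdp1..hdp3 come from the define block above, and usage details are in the README):

    vagrant up          # create and provision all three VMs
    vagrant ssh hdp1    # log into the node that drives the cluster
    cd /home/vagrant/test
    ./start.sh          # keys, format, daemons, wordcount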
    
    
  • start.sh

#!/bin/bash

echo "-------------------"
echo "ssh key gen"
echo "-------------------"

# interactive: ssh-keygen prompts for a path/passphrase, and each
# ssh-copy-id asks for the target node's root password
sudo ssh-keygen
sudo ssh-copy-id -i /root/.ssh/id_rsa.pub root@hdp-node-01
sudo ssh-copy-id -i /root/.ssh/id_rsa.pub root@hdp-node-02
sudo ssh-copy-id -i /root/.ssh/id_rsa.pub root@hdp-node-03

echo "-------------------"
echo "hadoop start"
echo "-------------------"

# format a fresh NameNode ("hadoop namenode -format" still works in 2.x
# but is deprecated in favor of "hdfs namenode -format")
sudo $HADOOP_HOME/bin/hdfs namenode -format

sudo $HADOOP_HOME/sbin/start-dfs.sh
sudo $HADOOP_HOME/sbin/start-yarn.sh


echo "-------------------"
echo "hadoop example wordcount"
echo "-------------------"

# put the sample text into HDFS and run the bundled wordcount example
sudo $HADOOP_HOME/bin/hadoop fs -mkdir -p /wordcount/input
sudo $HADOOP_HOME/bin/hadoop fs -put /home/vagrant/test/somewords.txt /wordcount/input
sudo $HADOOP_HOME/bin/hadoop fs -ls /wordcount/input

cd /home/vagrant/hadoop/share/hadoop/mapreduce
sudo $HADOOP_HOME/bin/hadoop jar hadoop-mapreduce-examples-2.8.5.jar wordcount /wordcount/input /wordcount/output
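
  • inspecting the result (a sketch in the same environment as start.sh; wordcount's single reducer writes part-r-00000):

sudo $HADOOP_HOME/bin/hadoop fs -ls /wordcount/output
sudo $HADOOP_HOME/bin/hadoop fs -cat /wordcount/output/part-r-00000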

  • output
18/10/05 06:06:10 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=186
		FILE: Number of bytes written=316073
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=178
		HDFS: Number of bytes written=100
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=10769
		Total time spent by all reduces in occupied slots (ms)=7307
		Total time spent by all map tasks (ms)=10769
		Total time spent by all reduce tasks (ms)=7307
		Total vcore-milliseconds taken by all map tasks=10769
		Total vcore-milliseconds taken by all reduce tasks=7307
		Total megabyte-milliseconds taken by all map tasks=11027456
		Total megabyte-milliseconds taken by all reduce tasks=7482368
	Map-Reduce Framework
		Map input records=10
		Map output records=20
		Map output bytes=140
		Map output materialized bytes=186
		Input split bytes=118
		Combine input records=20
		Combine output records=20
		Reduce input groups=20
		Reduce shuffle bytes=186
		Reduce input records=20
		Reduce output records=20
		Spilled Records=40
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=183
		CPU time spent (ms)=1950
		Physical memory (bytes) snapshot=369668096
		Virtual memory (bytes) snapshot=4044595200
		Total committed heap usage (bytes)=176128000
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=60
	File Output Format Counters 
		Bytes Written=100

A few key issues

  • ssh setup on the centos/7 minimal image

    • In /etc/ssh/sshd_config, change or add the following:
    PubkeyAuthentication yes
    RSAAuthentication yes
    AuthorizedKeysCommandUser nobody
    
    • Restart the ssh service:
    sudo systemctl restart sshd.service
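
    • To verify the settings took effect (a quick check of my own, not from the original notes):
    sudo sshd -T | grep -i pubkeyauthentication
    ssh root@hdp-node-02 hostname   # should print the hostname without a password prompt once keys are copied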
    
  • Hadoop needs root privileges here: start.sh runs the format, the start scripts, and the HDFS commands through sudo, and the ssh keys are installed for root.
