Local Hadoop Installation

Hadoop can be installed in several ways; this article walks through a local (standalone) installation.

1. Software Preparation

You need a VMware 12 virtual machine; this article uses Ubuntu 16.
* Download JDK 1.8
The Oracle (Sun) JDK is required. Download from:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
* Download Hadoop 3.0
Download from: http://hadoop.apache.org/releases.html

2. Installation and Configuration

2.1 Installing JDK 1.8

Extract the JDK archive into ~/soft/, then move the directory to /usr/soft/: mv jdk1.8.0_111 /usr/soft/

Configure the environment variable by opening /etc/environment and adding:
JAVA_HOME=/usr/soft/jdk1.8.0_111
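
A quick way to append that line from a terminal (a sketch; assumes sudo privileges and the JDK path used above):

echo 'JAVA_HOME="/usr/soft/jdk1.8.0_111"' | sudo tee -a /etc/environment

/etc/environment is read at login, so the setting takes effect for new sessions.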

Ubuntu ships with OpenJDK by default, so the system must be configured to use the Oracle JDK instead; for details see:
https://blog.csdn.net/goodmentc/article/details/80959686
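
To make the Oracle JDK take precedence over the preinstalled OpenJDK, one common approach on Ubuntu (a sketch; the priority value 100 is arbitrary and the paths assume the layout above) is update-alternatives:

sudo update-alternatives --install /usr/bin/java java /usr/soft/jdk1.8.0_111/bin/java 100
sudo update-alternatives --install /usr/bin/javac javac /usr/soft/jdk1.8.0_111/bin/javac 100
sudo update-alternatives --config java    # choose the Oracle JDK interactively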

2.2 Installing Hadoop

Extract the Hadoop archive into /home/tc/soft/, then move the directory to /usr/soft/: mv hadoop-3.0.3 /usr/soft/
Next add the Hadoop variables to /etc/environment (the paths must point at the final install location under /usr/soft/, matching the move above):
HADOOP_INSTALL=/usr/soft/hadoop-3.0.3

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/soft/jdk1.8.0_111/bin:/usr/soft/hadoop-3.0.3/bin:/usr/soft/hadoop-3.0.3/sbin"

Open a terminal and run the following so the environment variables take effect in the current shell:
source /etc/environment
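
As a quick sanity check that the variables are visible in the current shell (hadoop should resolve to the bin directory just added to PATH):

echo $JAVA_HOME
which hadoop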

Verify the installation by running: hadoop version
If it succeeded, you will see the Hadoop version information:
tc@ubuntu:/usr/soft/hadoop-3.0.3$ hadoop version
Hadoop 3.0.3
Source code repository https://yjzhangal@git-wip-us.apache.org/repos/asf/hadoop.git -r 37fd7d752db73d984dc31e0cdfd590d252f5e075
Compiled by yzhang on 2018-05-31T17:12Z
Compiled with protoc 2.5.0
From source with checksum 736cdcefa911261ad56d2d120bf1fa
This command was run using /usr/soft/hadoop-3.0.3/share/hadoop/common/hadoop-common-3.0.3.jar

If the JAVA_HOME setting has not taken effect, the command reports an error:
tc@ubuntu:/usr/soft/hadoop-3.0.3$ hadoop version
ERROR: JAVA_HOME is not set and could not be found.

Workaround: reboot the virtual machine (or log out and back in) so /etc/environment is re-read at login; simply sourcing the file may not export the variables to child processes. After rebooting:
tc@ubuntu:~$ hadoop version
Hadoop 3.0.3
Source code repository https://yjzhangal@git-wip-us.apache.org/repos/asf/hadoop.git -r 37fd7d752db73d984dc31e0cdfd590d252f5e075
Compiled by yzhang on 2018-05-31T17:12Z
Compiled with protoc 2.5.0
From source with checksum 736cdcefa911261ad56d2d120bf1fa
This command was run using /usr/soft/hadoop-3.0.3/share/hadoop/common/hadoop-common-3.0.3.jar
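
If you would rather not reboot, a common alternative is to set JAVA_HOME in Hadoop's own environment file, which the hadoop scripts read on every invocation (the paths below assume the install locations used in this article):

echo 'export JAVA_HOME=/usr/soft/jdk1.8.0_111' >> /usr/soft/hadoop-3.0.3/etc/hadoop/hadoop-env.sh
hadoop version    # should now print the version information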

3. Usage Examples

Hadoop ships with an example MapReduce jar, $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar, which demonstrates basic MapReduce functionality and provides ready-to-run jobs such as wordcount, terasort, join, and grep.

You can see which MapReduce programs the jar supports by running it without arguments:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar
tc@ubuntu:~$ hadoop jar /usr/soft/hadoop-3.0.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar 
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
tc@ubuntu:~$   
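
Each program also prints its own usage string when invoked without further arguments, which is handy for checking the expected parameters (the exact wording may vary slightly between versions):

tc@ubuntu:~$ hadoop jar /usr/soft/hadoop-3.0.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar wordcount
Usage: wordcount <in> [<in>...] <out>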

3.1 Using grep

Create a directory named input with a file test.txt inside it (see the setup sketch below), then run the grep job:
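A minimal setup, assuming the file simply contains the word "test" (any content that matches the regex works):

mkdir -p ~/input
echo "test" > ~/input/test.txt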

tancan@ubuntu:~$ hadoop jar /usr/soft/hadoop-3.0.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar grep input output "test"  

On success, the job prints its counters; a log excerpt:

        ... ...
                Merged Map outputs=1
                GC time elapsed (ms)=0
                Total committed heap usage (bytes)=838860800
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=98
        File Output Format Counters 
                Bytes Written=8

View the results:
- List the output directory: ll output

tancan@ubuntu:~$ ll output/
total 16
drwxr-xr-x  2 tancan tancan 4096 Jul  7 06:02 ./
drwxr-xr-x 25 tancan tancan 4096 Jul  7 06:02 ../
-rw-r--r--  1 tancan tancan    0 Jul  7 06:02 part-r-00000
-rw-r--r--  1 tancan tancan    8 Jul  7 06:02 .part-r-00000.crc
-rw-r--r--  1 tancan tancan    0 Jul  7 06:02 _SUCCESS
-rw-r--r--  1 tancan tancan    8 Jul  7 06:02 ._SUCCESS.crc
- Print the results: cat output/*
tancan@ubuntu:~$ cat output/*
1   test

3.2 Using wordcount

Change the contents of input/test.txt to: This is test file
Remove the output directory left over from the grep run first, since MapReduce refuses to write into an existing output directory: rm -r output
Then run:

tancan@ubuntu:~$ hadoop jar /usr/soft/hadoop-3.0.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.3.jar wordcount input output/
2018-07-07 06:08:24,179 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2018-07-07 06:08:24,249 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2018-07-07 06:08:24,249 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2018-07-07 06:08:24,532 INFO input.FileInputFormat: Total input files to process : 1
2018-07-07 06:08:24,553 INFO mapreduce.JobSubmitter: number of splits:1
2018-07-07 06:08:24,716 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1885415322_0001
2018-07-07 06:08:24,719 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-07-07 06:08:24,853 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2018-07-07 06:08:24,854 INFO mapreduce.Job: Running job: job_local1885415322_0001
2018-07-07 06:08:24,855 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2018-07-07 06:08:24,863 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-07-07 06:08:24,863 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-07-07 06:08:24,865 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2018-07-07 06:08:24,916 INFO mapred.LocalJobRunner: Waiting for map tasks
2018-07-07 06:08:24,916 INFO mapred.LocalJobRunner: Starting task: attempt_local1885415322_0001_m_000000_0
2018-07-07 06:08:24,954 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-07-07 06:08:24,955 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-07-07 06:08:24,978 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2018-07-07 06:08:24,985 INFO mapred.MapTask: Processing split: file:/home/tancan/input/test.txt:0+18
2018-07-07 06:08:25,082 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2018-07-07 06:08:25,083 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2018-07-07 06:08:25,083 INFO mapred.MapTask: soft limit at 83886080
2018-07-07 06:08:25,083 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2018-07-07 06:08:25,083 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2018-07-07 06:08:25,089 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2018-07-07 06:08:25,102 INFO mapred.LocalJobRunner: 
2018-07-07 06:08:25,102 INFO mapred.MapTask: Starting flush of map output
2018-07-07 06:08:25,102 INFO mapred.MapTask: Spilling map output
2018-07-07 06:08:25,102 INFO mapred.MapTask: bufstart = 0; bufend = 34; bufvoid = 104857600
2018-07-07 06:08:25,102 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
2018-07-07 06:08:25,121 INFO mapred.MapTask: Finished spill 0
2018-07-07 06:08:25,135 INFO mapred.Task: Task:attempt_local1885415322_0001_m_000000_0 is done. And is in the process of committing
2018-07-07 06:08:25,137 INFO mapred.LocalJobRunner: map
2018-07-07 06:08:25,138 INFO mapred.Task: Task 'attempt_local1885415322_0001_m_000000_0' done.
2018-07-07 06:08:25,144 INFO mapred.Task: Final Counters for attempt_local1885415322_0001_m_000000_0: Counters: 18
    File System Counters
        FILE: Number of bytes read=316165
        FILE: Number of bytes written=784641
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=1
        Map output records=4
        Map output bytes=34
        Map output materialized bytes=48
        Input split bytes=97
        Combine input records=4
        Combine output records=4
        Spilled Records=4
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=216006656
    File Input Format Counters 
        Bytes Read=18
2018-07-07 06:08:25,145 INFO mapred.LocalJobRunner: Finishing task: attempt_local1885415322_0001_m_000000_0
2018-07-07 06:08:25,145 INFO mapred.LocalJobRunner: map task executor complete.
2018-07-07 06:08:25,149 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2018-07-07 06:08:25,150 INFO mapred.LocalJobRunner: Starting task: attempt_local1885415322_0001_r_000000_0
2018-07-07 06:08:25,166 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2018-07-07 06:08:25,166 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2018-07-07 06:08:25,166 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2018-07-07 06:08:25,170 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@4e822838
2018-07-07 06:08:25,172 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2018-07-07 06:08:25,200 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=322594400, maxSingleShuffleLimit=80648600, mergeThreshold=212912320, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2018-07-07 06:08:25,202 INFO reduce.EventFetcher: attempt_local1885415322_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2018-07-07 06:08:25,243 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1885415322_0001_m_000000_0 decomp: 44 len: 48 to MEMORY
2018-07-07 06:08:25,249 INFO reduce.InMemoryMapOutput: Read 44 bytes from map-output for attempt_local1885415322_0001_m_000000_0
2018-07-07 06:08:25,250 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 44, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->44
2018-07-07 06:08:25,251 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2018-07-07 06:08:25,252 INFO mapred.LocalJobRunner: 1 / 1 copied.
2018-07-07 06:08:25,256 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2018-07-07 06:08:25,267 INFO mapred.Merger: Merging 1 sorted segments
2018-07-07 06:08:25,268 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 37 bytes
2018-07-07 06:08:25,269 INFO reduce.MergeManagerImpl: Merged 1 segments, 44 bytes to disk to satisfy reduce memory limit
2018-07-07 06:08:25,270 INFO reduce.MergeManagerImpl: Merging 1 files, 48 bytes from disk
2018-07-07 06:08:25,271 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2018-07-07 06:08:25,271 INFO mapred.Merger: Merging 1 sorted segments
2018-07-07 06:08:25,273 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 37 bytes
2018-07-07 06:08:25,273 INFO mapred.LocalJobRunner: 1 / 1 copied.
2018-07-07 06:08:25,279 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2018-07-07 06:08:25,280 INFO mapred.Task: Task:attempt_local1885415322_0001_r_000000_0 is done. And is in the process of committing
2018-07-07 06:08:25,281 INFO mapred.LocalJobRunner: 1 / 1 copied.
2018-07-07 06:08:25,282 INFO mapred.Task: Task attempt_local1885415322_0001_r_000000_0 is allowed to commit now
2018-07-07 06:08:25,283 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1885415322_0001_r_000000_0' to file:/home/tancan/output
2018-07-07 06:08:25,284 INFO mapred.LocalJobRunner: reduce > reduce
2018-07-07 06:08:25,284 INFO mapred.Task: Task 'attempt_local1885415322_0001_r_000000_0' done.
2018-07-07 06:08:25,285 INFO mapred.Task: Final Counters for attempt_local1885415322_0001_r_000000_0: Counters: 24
    File System Counters
        FILE: Number of bytes read=316293
        FILE: Number of bytes written=784727
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Reduce input groups=4
        Reduce shuffle bytes=48
        Reduce input records=4
        Reduce output records=4
        Spilled Records=4
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=216006656
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Output Format Counters 
        Bytes Written=38
2018-07-07 06:08:25,286 INFO mapred.LocalJobRunner: Finishing task: attempt_local1885415322_0001_r_000000_0
2018-07-07 06:08:25,286 INFO mapred.LocalJobRunner: reduce task executor complete.
2018-07-07 06:08:25,861 INFO mapreduce.Job: Job job_local1885415322_0001 running in uber mode : false
2018-07-07 06:08:25,864 INFO mapreduce.Job:  map 100% reduce 100%
2018-07-07 06:08:25,866 INFO mapreduce.Job: Job job_local1885415322_0001 completed successfully
2018-07-07 06:08:25,883 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=632458
        FILE: Number of bytes written=1569368
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=1
        Map output records=4
        Map output bytes=34
        Map output materialized bytes=48
        Input split bytes=97
        Combine input records=4
        Combine output records=4
        Reduce input groups=4
        Reduce shuffle bytes=48
        Reduce input records=4
        Reduce output records=4
        Spilled Records=8
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=432013312
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=18
    File Output Format Counters 
        Bytes Written=38

View the results (wordcount emits keys in sorted byte order, which is why the capitalized "This" comes first):

tancan@ubuntu:~$ cat output/*
This    1
file    1
is  1
test    1
tancan@ubuntu:~$ 

4. Problems Encountered During Installation

4.1 Runtime error after installing on a 64-bit system

tancan@ubuntu:~$ hadoop fs -ls /
Java HotSpot(TM) Server VM warning: You have loaded library /usr/soft/hadoop-3.0.3/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2018-06-28 06:52:31,341 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
ls: Call From ubuntu/127.0.1.1 to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

Cause:
The native library bundled with the official release is 32-bit and cannot run on a 64-bit host. Download the Hadoop source and build it yourself; once the build succeeds, copy the files under the resulting native directory into ${HADOOP_HOME}/lib/native.
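
A sketch of that rebuild-and-copy step, assuming a Maven build with the native profile on the 64-bit host (the target path may differ depending on your build setup):

mvn package -Pdist,native -DskipTests -Dtar       # run from the Hadoop source root
cp -r hadoop-dist/target/hadoop-3.0.3/lib/native/* /usr/soft/hadoop-3.0.3/lib/native/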

5. Attachment

The native files I used:
https://download.csdn.net/download/goodmentc/10528791
